Environments
============

The ``rl_environments`` package ships pre-built Gymnasium environments for four
manipulator arms and three manipulation tasks. Each ``(robot, task)``
combination is registered as four env IDs covering the cross-product
{standard, goal-conditioned} × {simulation, real}. That gives **48 task env
IDs** (4 robots × 3 tasks × 4 variants), plus extra Kinect/ZED2 sensor variants
for RX200, for 54 IDs in total.

All envs are standard Gymnasium ``gym.Env`` instances. Standard envs use a
``Box`` observation space. Goal-conditioned envs follow the ``GoalEnv``
convention of a ``Dict`` observation space with ``observation``,
``desired_goal`` and ``achieved_goal`` keys, the layout that HER replay buffers
expect.

Robots
------

.. list-table::
   :widths: 18 12 18 18 34
   :header-rows: 1

   * - Robot
     - DoF
     - Gripper
     - Mount
     - Tasks
   * - :doc:`rx200/index` (Trossen ReactorX-200)
     - 5
     - 2 prismatic fingers (continuous)
     - Flush on cafe-table
     - Reach, Push, PnP
   * - :doc:`ned2/index` (Niryo Ned2)
     - 6
     - 2 prismatic jaws (binary open/close)
     - Flush on cafe-table
     - Reach, Push, PnP
   * - :doc:`vx300s/index` (Trossen ViperX-300 S)
     - 6
     - 2 prismatic fingers (continuous)
     - Flush on cafe-table
     - Reach, Push, PnP
   * - :doc:`ur5e/index` (Universal Robots UR5e + Robotiq 2F-85)
     - 6
     - 1 knuckle joint (continuous, mimic linkage)
     - 4-legged base + separate cafe-table
     - Reach, Push, PnP

Tasks
-----

Every robot supports the same three tasks. A given task shares its reward
shape, observation layout, and command interface across all robots; only the
joint count, gripper mechanics, and workspace geometry differ.

.. list-table::
   :widths: 14 86
   :header-rows: 1

   * - Task
     - Goal
   * - **Reach**
     - The end-effector (EE) must reach a 3D target sampled in the workspace.
       There is no cube and the gripper is not commanded. The goal is in the
       air; the achieved goal is the EE position.
   * - **Push**
     - A cube sits on the cafe-table. The closed gripper acts as a flat paddle,
       and the arm must push the cube to a goal point on the table. The gripper
       is not part of the action vector. The achieved goal is the cube position.
   * - **Pick-and-Place** (PnP)
     - A cube sits on the cafe-table. The action vector gains one extra scalar:
       an absolute gripper command. The arm must grasp the cube, lift it, and
       place it at a goal point, which can be in the air. The optional
       ``multi_goal`` curriculum splits the reward into a lift target followed
       by the final place target. The achieved goal is the cube position.

Variants: every (robot, task) is registered four ways
-------------------------------------------------------

.. list-table::
   :widths: 24 76
   :header-rows: 1

   * - Variant
     - Description
   * - ``{Robot}{Task}Sim-v0``
     - Standard, Gazebo simulation. ``Box`` observation space.
   * - ``{Robot}{Task}GoalSim-v0``
     - Goal-conditioned, Gazebo simulation. ``Dict`` observation space. Pair
       with HER (``sb3_ros_support.td3_goal``).
   * - ``{Robot}{Task}Real-v0``
     - Standard, real hardware. Same observation shape as the sim variant; adds
       a joint-state staleness gate and, for Push and PnP, a cube-pose
       subscriber.
   * - ``{Robot}{Task}GoalReal-v0``
     - Goal-conditioned, real hardware.

Common safety machinery
-----------------------

Every env runs a per-link forward-kinematics safety check before publishing a
joint trajectory. Each candidate joint target is passed through forward
kinematics on a PyKDL subchain for every safety-check link. If any link's
predicted world z-coordinate falls below the workspace floor, the action is
rejected and a penalty reward is returned. For the robots mounted flush on the
cafe-table (RX200, NED2, VX300S) the floor is the table surface. The UR5e sits
on a separate base, so it adds a second check against the cafe-table footprint.

All real-side trainers refuse to be constructed unless
``--allow-real-robot-motion`` is passed on the command line. The flag exports
``ALLOW_REAL_ROBOT_MOTION=1`` into the process environment so that subprocess
workers inherit the same consent. This propagates the one flag; it is not a
second, independent way to enable motion.

.. toctree::
   :hidden:
   :maxdepth: 2

   rx200/index
   ned2/index
   vx300s/index
   ur5e/index

Reference
---------

If you use these environments in your research, please cite the UniROS paper:

.. code-block:: bibtex

   @Article{s25185679,
     AUTHOR  = {Kapukotuwa, Jayasekara and Lee, Brian and Devine, Declan and Qiao, Yuansong},
     TITLE   = {UniROS: ROS-Based Reinforcement Learning Across Simulated and Real-World Robotics},
     JOURNAL = {Sensors},
     VOLUME  = {25},
     YEAR    = {2025},
     NUMBER  = {18},
     PAGES   = {5679},
     DOI     = {10.3390/s25185679},
   }
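
Enumerating env IDs
-------------------

The ``{Robot}{Task}{Variant}-v0`` naming pattern from the Variants table can be
expanded mechanically, which also confirms the 4 × 3 × 4 = 48 count. The sketch
below is illustrative only and is not part of the ``rl_environments`` API. The
robot and task tokens (``RX200``, ``NED2``, ``VX300S``, ``UR5e``, ``Reach``,
``Push``, ``PnP``) are assumed spellings, and the helpers ``task_env_ids``,
``is_goal_conditioned`` and ``is_real`` are hypothetical. The RX200
Kinect/ZED2 sensor variants are left out because their IDs are not listed on
this page; check the Gymnasium registry for the exact registered names.

.. code-block:: python

   from itertools import product

   # ASSUMPTION: illustrative robot/task tokens, not confirmed registered names.
   ROBOTS = ["RX200", "NED2", "VX300S", "UR5e"]
   TASKS = ["Reach", "Push", "PnP"]
   # {standard, goal-conditioned} x {simulation, real}, in Variants-table order.
   VARIANTS = ["Sim", "GoalSim", "Real", "GoalReal"]


   def task_env_ids():
       """Hypothetical helper: expand {Robot}{Task}{Variant}-v0 into all task IDs.

       Sensor variants (Kinect/ZED2 for RX200) are not generated here.
       """
       return [f"{robot}{task}{variant}-v0"
               for robot, task, variant in product(ROBOTS, TASKS, VARIANTS)]


   def _stem(env_id):
       # Drop the "-v0" version suffix.
       return env_id.rsplit("-", 1)[0]


   def is_goal_conditioned(env_id):
       """Hypothetical helper: True for GoalSim / GoalReal variants."""
       return _stem(env_id).endswith(("GoalSim", "GoalReal"))


   def is_real(env_id):
       """Hypothetical helper: True for Real / GoalReal (hardware) variants."""
       return _stem(env_id).endswith("Real")


   ids = task_env_ids()
   print(len(ids))  # 48 = 4 robots x 3 tasks x 4 variants
   print(ids[:4])   # the four RX200 Reach variants

Filtering with ``is_real`` picks out the 24 hardware IDs, whose trainers need
``--allow-real-robot-motion``; filtering with ``is_goal_conditioned`` picks out
the 24 ``Dict``-observation IDs intended for HER.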