Environments

The rl_environments package ships pre-built Gymnasium environments for four manipulator arms and three manipulation tasks. Each (robot, task) combination is registered under four env IDs covering the cross-product of {standard, goal-conditioned} × {simulation, real}. That gives 4 × 3 × 4 = 48 task env IDs, plus 6 extra Kinect/ZED2 sensor variants for RX200 (54 total).

All envs are standard Gymnasium `gym.Env` instances. Standard envs use a Box observation space; goal-conditioned envs use a GoalEnv-style Dict observation space with observation, desired_goal, and achieved_goal keys, which makes them suitable for HER replay buffers.
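To make the goal-conditioned layout concrete, here is a minimal sketch of a Dict-style observation and a HER-style relabel, using only the standard library. The vector contents, the `tol` threshold, and the sparse 0 / -1 reward are assumptions made for illustration; they are not taken from the package.

```python
import math

# Assumed sparse reward: 0.0 within `tol` of the goal, else -1.0.
# The envs' actual reward shape is not specified here; illustration only.
def sparse_reward(achieved_goal, desired_goal, tol=0.05):
    return 0.0 if math.dist(achieved_goal, desired_goal) <= tol else -1.0

def relabel_with_achieved(obs):
    """HER-style relabel: treat the achieved goal as if it had been the target."""
    relabelled = dict(obs)
    relabelled["desired_goal"] = list(obs["achieved_goal"])
    return relabelled

# A goal-conditioned observation with the three keys named above.
# The numbers and vector lengths are made up for the example.
obs = {
    "observation": [0.0, 0.1, -0.2, 0.3, 0.0],
    "achieved_goal": [0.20, 0.00, 0.15],
    "desired_goal": [0.35, 0.10, 0.05],
}

original_reward = sparse_reward(obs["achieved_goal"], obs["desired_goal"])
her_obs = relabel_with_achieved(obs)
her_reward = sparse_reward(her_obs["achieved_goal"], her_obs["desired_goal"])
```

The relabelled transition counts as a success under the assumed reward, which is why a replay buffer that relabels goals needs `achieved_goal` stored alongside `desired_goal`.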

Robots

Robot	DoF	Gripper	Mount	Tasks
RX200 (Trossen ReactorX-200)	5	2 prismatic fingers (continuous)	Flush on cafe-table	Reach, Push, PnP
NED2 (Niryo Ned2)	6	2 prismatic mors (binary open/close)	Flush on cafe-table	Reach, Push, PnP
VX300S (Trossen ViperX-300 S)	6	2 prismatic fingers (continuous)	Flush on cafe-table	Reach, Push, PnP
UR5e (Universal Robots UR5e + Robotiq 2F-85)	6	1 knuckle (continuous, mimic linkage)	4-legged base + separate cafe-table	Reach, Push, PnP

Tasks

Every robot supports the same three tasks. The same task on different robots shares its reward shape, observation layout, and command interface — only the joint count, gripper mechanics, and workspace geometry differ.

Task	Goal
Reach	The end-effector must reach a 3D target sampled in the workspace. No cube. Gripper is not commanded. Goal is in the air; achieved goal is the EE position.
Push	A cube sits on the cafe-table. The closed gripper acts as a flat paddle; the arm must push the cube to a goal point on the table. Gripper is not in the action vector. Achieved goal is the cube position.
Pick-and-Place (PnP)	A cube sits on the cafe-table. The action vector gains one extra scalar — an absolute gripper command. The arm must grasp the cube, lift it, and place it at a goal point (which can be in the air). Optional `multi_goal` curriculum splits the reward into a lift target then the final place target. Achieved goal is the cube position.
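The two-stage idea behind `multi_goal` can be sketched as a small latch that rewards progress toward a lift target first and switches to the place target once the lift target has been reached. The class name `TwoStageGoal`, the `tol` threshold, and the switching rule are all hypothetical; the package's own curriculum logic may differ.

```python
import math

class TwoStageGoal:
    """Hypothetical helper: selects which goal is currently being rewarded."""

    def __init__(self, lift_goal, place_goal, tol=0.03):
        self.lift_goal = list(lift_goal)
        self.place_goal = list(place_goal)
        self.tol = tol          # assumed distance threshold for "reached"
        self.lifted = False     # latches to True once the lift target is hit

    def update(self, cube_pos):
        # Switch stages the first time the cube gets within tol of the lift goal.
        if not self.lifted and math.dist(cube_pos, self.lift_goal) <= self.tol:
            self.lifted = True
        return self.place_goal if self.lifted else self.lift_goal

stages = TwoStageGoal(lift_goal=[0.30, 0.00, 0.15], place_goal=[0.30, 0.10, 0.02])
goal_on_table = stages.update([0.30, 0.00, 0.02])   # cube still on the table
goal_at_lift = stages.update([0.30, 0.00, 0.15])    # cube reaches the lift target
goal_afterwards = stages.update([0.30, 0.05, 0.08]) # cube on its way down
```

The latch matters when the place target lies below the lift height: without it, lowering the cube toward the final goal would flip the active target back to the lift stage.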

Variants — every (robot, task) is registered four ways

Variant	Description
`{Robot}{Task}Sim-v0`	Standard, Gazebo simulation. Box observation space.
`{Robot}{Task}GoalSim-v0`	Goal-conditioned, Gazebo. Dict observation space. Pair with HER (`sb3_ros_support.td3_goal`).
`{Robot}{Task}Real-v0`	Standard, real hardware. Same observation shape as the sim variant. The real-side machinery adds a joint-state staleness gate and, for Push and PnP, a cube-pose subscriber.
`{Robot}{Task}GoalReal-v0`	Goal-conditioned, real hardware.
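The naming pattern in the table can be expanded mechanically. The sketch below builds the cross-product of robots, tasks, and variant suffixes; the robot and task tokens (`"Reach"`, `"Push"`, `"PnP"`, and so on) are placeholders, since the exact spelling used inside the registered IDs is an assumption here.

```python
from itertools import product

# Placeholder tokens: the exact strings used in the registered IDs are assumed.
ROBOTS = ["RX200", "NED2", "VX300S", "UR5e"]
TASKS = ["Reach", "Push", "PnP"]
# Suffixes follow the four variant patterns listed in the table above.
VARIANT_SUFFIXES = ["Sim", "GoalSim", "Real", "GoalReal"]

def env_ids():
    """Expand {Robot}{Task}{Variant}-v0 over every combination."""
    return [
        f"{robot}{task}{variant}-v0"
        for robot, task, variant in product(ROBOTS, TASKS, VARIANT_SUFFIXES)
    ]

ids = env_ids()
goal_ids = [env_id for env_id in ids if "Goal" in env_id]
sim_ids = [env_id for env_id in ids if env_id.endswith("Sim-v0")]
```

An ID of this form would then be passed to `gymnasium.make`, assuming the package has been imported first so that its registration code has run.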

Common safety machinery

Every env enforces a per-link forward-kinematics safety check before publishing a joint trajectory. Each candidate joint target is run through forward kinematics on a PyKDL subchain for every safety-check link; if any link’s predicted world-frame z falls below the workspace floor, the action is rejected and a penalty reward is returned. For robots mounted flush on the cafe-table (RX200 / NED2 / VX300S) the floor is the table surface; UR5e adds a second check against the cafe-table footprint because it sits on a separate base.
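The floor check can be illustrated with a deliberately simplified sketch. A planar chain in the x-z plane stands in for the PyKDL subchains, and the function names, link lengths, and penalty value are hypothetical; only the accept/reject logic mirrors the description above.

```python
import math

def link_tip_heights(joint_angles, link_lengths, base_z):
    """World-z of each link tip for a planar chain (toy stand-in for PyKDL FK)."""
    heights = []
    z, pitch = base_z, 0.0
    for angle, length in zip(joint_angles, link_lengths):
        pitch += angle                   # joint angles accumulate along the chain
        z += length * math.sin(pitch)    # vertical contribution of this link
        heights.append(z)
    return heights

def check_joint_target(joint_target, link_lengths, base_z, floor_z, penalty=-1.0):
    """Return (accepted, penalty_or_None) for a candidate joint target."""
    heights = link_tip_heights(joint_target, link_lengths, base_z)
    if any(z < floor_z for z in heights):
        return False, penalty            # reject: some link would dip below the floor
    return True, None                    # accept: the task reward is computed elsewhere

LINKS = [0.2, 0.2]  # made-up link lengths in metres
safe = check_joint_target([0.5, 0.0], LINKS, base_z=0.1, floor_z=0.0)
unsafe = check_joint_target([-1.0, 0.0], LINKS, base_z=0.1, floor_z=0.0)
```

Checking every link rather than only the end-effector catches poses where an elbow dips below the floor while the gripper is still above it.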

All real-side trainers refuse to construct unless --allow-real-robot-motion is passed on the CLI. The flag exports ALLOW_REAL_ROBOT_MOTION=1 in the process so subprocess workers inherit the same consent — it is propagation of the flag, not a second independent channel.
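The consent mechanism can be sketched with `argparse` and `os.environ`. The flag and variable names come from the text above; the helper names `parse_args` and `require_consent` and the error message are hypothetical, and the real trainers may be structured differently.

```python
import argparse
import os

CONSENT_ENV_VAR = "ALLOW_REAL_ROBOT_MOTION"

def parse_args(argv):
    """Parse the CLI and export consent so child processes inherit it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--allow-real-robot-motion", action="store_true")
    args = parser.parse_args(argv)
    if args.allow_real_robot_motion:
        # Child processes receive a copy of os.environ by default, so
        # subprocess workers see the same consent as the parent.
        os.environ[CONSENT_ENV_VAR] = "1"
    return args

def require_consent(environ=os.environ):
    """Hypothetical guard a real-side trainer could call before constructing."""
    if environ.get(CONSENT_ENV_VAR) != "1":
        raise RuntimeError(
            "Refusing to move real hardware: pass --allow-real-robot-motion"
        )
```

A worker started with `subprocess` or `multiprocessing` would call the same guard and pass only if the parent was launched with the flag.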

Reference

If you use these environments in your research, please cite the UniROS paper:

@Article{s25185679,
  AUTHOR  = {Kapukotuwa, Jayasekara and Lee, Brian and Devine, Declan and Qiao, Yuansong},
  TITLE   = {UniROS: ROS-Based Reinforcement Learning Across Simulated and Real-World Robotics},
  JOURNAL = {Sensors},
  VOLUME  = {25},
  YEAR    = {2025},
  NUMBER  = {18},
  PAGES   = {5679},
  DOI     = {10.3390/s25185679},
}