Environments
The rl_environments package ships pre-built Gymnasium environments
for four manipulator arms and three manipulation tasks. Each
(robot, task) combination is registered under four env IDs, one per
element of the cross-product {standard, goal-conditioned} × {simulation,
real}. That gives 4 × 3 × 4 = 48 task env IDs; extra Kinect/ZED2 sensor
variants for RX200 bring the total to 54.
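To make the count concrete, here is a minimal sketch that enumerates the robot, task, observation-style, and backend combinations. The tuples are illustrative labels only; they are not the registered env ID strings, and the six extra RX200 sensor variants are not enumerated here.

```python
from itertools import product

# Labels taken from the tables on this page; NOT the actual registered env IDs.
ROBOTS = ["RX200", "NED2", "VX300S", "UR5e"]
TASKS = ["Reach", "Push", "PnP"]
OBS_STYLES = ["standard", "goal-conditioned"]
BACKENDS = ["simulation", "real"]

# One tuple per task env: (robot, task, observation style, backend).
combos = list(product(ROBOTS, TASKS, OBS_STYLES, BACKENDS))


def variants_for(robot, task):
    """Return the (observation style, backend) pairs for one (robot, task)."""
    return [(o, b) for r, t, o, b in combos if r == robot and t == task]


print(len(combos))                   # 4 robots x 3 tasks x 4 variants
print(variants_for("NED2", "Push"))  # the four variants of one combination
```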
All envs are standard Gymnasium gym.Env instances. Standard envs
use a Box observation space; goal-conditioned envs use the
Gymnasium GoalEnv Dict shape with observation /
desired_goal / achieved_goal keys (suitable for HER replay
buffers).
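The three-key layout is what makes hindsight relabeling possible. The sketch below uses plain Python dicts to show the idea behind the HER "final" strategy: replace desired_goal with the goal actually achieved at the end of the episode and recompute the reward. The sparse 0 / -1 reward and the 0.02 distance threshold are assumptions made for illustration; the package's own reward function is not described here.

```python
import math


def compute_reward(achieved_goal, desired_goal, threshold=0.02):
    """Assumed sparse reward: 0.0 within `threshold` of the goal, else -1.0."""
    return 0.0 if math.dist(achieved_goal, desired_goal) <= threshold else -1.0


def relabel_with_final_goal(episode):
    """Relabel every step as if the last achieved goal had been the target."""
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for obs in episode:
        # Copy the observation dict, swapping in the hindsight goal.
        new_obs = dict(obs, desired_goal=new_goal)
        reward = compute_reward(obs["achieved_goal"], new_goal)
        relabeled.append((new_obs, reward))
    return relabeled


# Two made-up goal-conditioned observations with the three standard keys.
episode = [
    {"observation": [0.0] * 4, "achieved_goal": (0.10, 0.00, 0.20),
     "desired_goal": (0.30, 0.10, 0.25)},
    {"observation": [0.1] * 4, "achieved_goal": (0.15, 0.05, 0.22),
     "desired_goal": (0.30, 0.10, 0.25)},
]
relabeled = relabel_with_final_goal(episode)
```

In practice an off-the-shelf HER replay buffer performs this relabeling; the sketch only shows why the env must expose achieved_goal and desired_goal separately.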
Robots
| Robot | DoF | Gripper | Mount | Tasks |
|---|---|---|---|---|
| RX200 (Trossen ReactorX-200) | 5 | 2 prismatic fingers (continuous) | Flush on cafe-table | Reach, Push, PnP |
| NED2 (Niryo Ned2) | 6 | 2 prismatic mors (binary open/close) | Flush on cafe-table | Reach, Push, PnP |
| VX300S (Trossen ViperX-300 S) | 6 | 2 prismatic fingers (continuous) | Flush on cafe-table | Reach, Push, PnP |
| UR5e (Universal Robots UR5e + Robotiq 2F-85) | 6 | 1 knuckle (continuous, mimic linkage) | 4-legged base + separate cafe-table | Reach, Push, PnP |
Tasks
Every robot supports the same three tasks. The same task on different robots shares its reward shape, observation layout, and command interface — only the joint count, gripper mechanics, and workspace geometry differ.
| Task | Goal |
|---|---|
| Reach | The end-effector must reach a 3D target sampled in the workspace. No cube. Gripper is not commanded. Goal is in the air; achieved goal is the EE position. |
| Push | A cube sits on the cafe-table. The closed gripper acts as a flat paddle; the arm must push the cube to a goal point on the table. Gripper is not in the action vector. Achieved goal is the cube position. |
| Pick-and-Place (PnP) | A cube sits on the cafe-table. The action vector gains one extra scalar, an absolute gripper command. The arm must grasp the cube, lift it, and place it at a goal point (which can be in the air). Optional |
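The per-task differences in the table can be summarised as a small lookup. This is an illustrative sketch, not the package's implementation: TaskSpec, action_dim, achieved_goal, and the arm_action_dim parameter are hypothetical names, and the PnP achieved goal is assumed to be the cube position by analogy with Push.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    has_cube: bool
    gripper_in_action: bool
    achieved_goal_source: str  # "ee" (end-effector) or "cube"


TASK_SPECS = {
    "Reach": TaskSpec(has_cube=False, gripper_in_action=False, achieved_goal_source="ee"),
    "Push": TaskSpec(has_cube=True, gripper_in_action=False, achieved_goal_source="cube"),
    # Assumption: PnP tracks the cube, as Push does.
    "PnP": TaskSpec(has_cube=True, gripper_in_action=True, achieved_goal_source="cube"),
}


def action_dim(task, arm_action_dim):
    """PnP appends one scalar (the absolute gripper command) to the arm action."""
    return arm_action_dim + (1 if TASK_SPECS[task].gripper_in_action else 0)


def achieved_goal(task, ee_position, cube_position=None):
    """Pick the position that counts as the achieved goal for this task."""
    if TASK_SPECS[task].achieved_goal_source == "ee":
        return ee_position
    if cube_position is None:
        raise ValueError(f"{task} needs a cube position")
    return cube_position
```

For example, with a hypothetical 5-dimensional arm action, action_dim("Push", 5) stays at 5 while action_dim("PnP", 5) becomes 6.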
Variants — every (robot, task) is registered four ways
| Variant | Description |
|---|---|
| | Standard, Gazebo simulation. Box observation space. |
| | Goal-conditioned, Gazebo. Dict observation space. Pair with HER ( |
| | Standard, real hardware. Same observation shape as the sim variant; the real side adds a joint-state staleness gate and, for Push and PnP, a cube-pose subscriber. |
| | Goal-conditioned, real hardware. |
Common safety machinery
Every env enforces a per-link forward-kinematics safety check before publishing a joint trajectory. Each candidate joint target is run through forward kinematics on PyKDL subchains, one per safety-check link; if any link's predicted world-frame z falls below the workspace floor, the action is rejected and a penalty reward is returned. For robots mounted flush on the cafe-table (RX200, NED2, VX300S) the floor is the table surface; UR5e sits on a separate base, so it adds a second check against the cafe-table footprint.
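The shape of that check can be sketched without ROS. The example below swaps PyKDL for a toy planar arm whose joints all pitch in the x-z plane, so it illustrates the reject-and-penalise logic rather than the package's kinematics. The function names, link lengths, base height, and penalty value are all hypothetical.

```python
import math


def link_heights(joint_angles, link_lengths, base_z=0.1):
    """Toy planar FK: world-z of the far end of each link, base to tip."""
    heights = []
    z, pitch = base_z, 0.0
    for angle, length in zip(joint_angles, link_lengths):
        pitch += angle                 # joint angles accumulate along the chain
        z += length * math.sin(pitch)  # vertical extent of this link
        heights.append(z)
    return heights


def check_action(joint_target, link_lengths, floor_z=0.0, penalty=-1.0):
    """Return (is_safe, reward). Unsafe targets are rejected with a penalty."""
    heights = link_heights(joint_target, link_lengths)
    if any(z < floor_z for z in heights):
        return False, penalty  # do not publish the trajectory
    return True, 0.0           # safe: the caller may publish the trajectory


# Raised elbow stays above the floor; pointing straight down does not.
print(check_action([math.pi / 4, -math.pi / 4], [0.2, 0.2]))
print(check_action([-math.pi / 2, 0.0], [0.2, 0.2]))
```

Checking every link rather than only the end-effector matters because an elbow can dip below the floor while the gripper is still above it.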
All real-side trainers refuse to construct unless --allow-real-robot-motion
is passed on the CLI. The flag exports ALLOW_REAL_ROBOT_MOTION=1
in the process so that subprocess workers inherit the same consent. The
environment variable only propagates the flag; it is not a second, independent channel.
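A minimal sketch of this consent pattern follows, using only the standard library. The flag and environment variable names come from the text above; parse_args and require_real_robot_consent are hypothetical helper names, and the real trainers may structure the check differently.

```python
import argparse
import os

ENV_VAR = "ALLOW_REAL_ROBOT_MOTION"


def parse_args(argv):
    """Parse the CLI and export consent so child processes inherit it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--allow-real-robot-motion", action="store_true")
    args = parser.parse_args(argv)
    if args.allow_real_robot_motion:
        # Subprocesses started after this point copy os.environ by default.
        os.environ[ENV_VAR] = "1"
    return args


def require_real_robot_consent():
    """Guard to call before constructing anything that can move hardware."""
    if os.environ.get(ENV_VAR) != "1":
        raise RuntimeError(
            "Refusing to move a real robot: pass --allow-real-robot-motion"
        )


# Example: with the flag present, the guard passes silently.
parse_args(["--allow-real-robot-motion"])
require_real_robot_consent()
```

Because the guard reads the environment variable rather than the parsed arguments, a worker process that never saw the command line still reaches the same decision as its parent.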
Reference
If you use these environments in your research, please cite the UniROS paper:
@Article{s25185679,
AUTHOR = {Kapukotuwa, Jayasekara and Lee, Brian and Devine, Declan and Qiao, Yuansong},
TITLE = {UniROS: ROS-Based Reinforcement Learning Across Simulated and Real-World Robotics},
JOURNAL = {Sensors},
VOLUME = {25},
YEAR = {2025},
NUMBER = {18},
PAGES = {5679},
DOI = {10.3390/s25185679},
}