RX200 — Reach
The arm must move its end-effector to a 3D target sampled in the workspace above the cafe-table. No cube; gripper not commanded.
Env IDs: RX200ReacherSim-v0 / RX200ReacherGoalSim-v0 / RX200ReacherReal-v0 / RX200ReacherGoalReal-v0.
Sim-only ZED 2 sensor variants: RX200Zed2ReacherSim-v0 / RX200Zed2ReacherGoalSim-v0.
Description
A Trossen ReactorX-200 5-DoF arm with two prismatic gripper fingers
sits flush on a cafe_table (top at z = 0.78). Reach ≈ 550 mm.
It supports joint-space or EE-space delta actions, per-link FK safety
checks, and real-time or MDP-pause step modes, following the same
architecture as the other robots’
reach env (see UR5e — Reach).
Action Space
Joint mode (default). Box(5,):
| Num | Action | Min | Max | Joint | Unit |
|---|---|---|---|---|---|
| 0 | waist delta | -3.14 | +3.14 | | rad |
| 1 | shoulder delta | -1.85 | +1.26 | | rad |
| 2 | elbow delta | -1.76 | +1.61 | | rad |
| 3 | wrist angle delta | -1.87 | +2.23 | | rad |
| 4 | wrist rotate delta | -3.14 | +3.14 | | rad |
When delta_action=True (default), the action is scaled by
delta_coeff = 0.05 and added to the current joint position.
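The delta update above can be sketched as follows. This is a minimal illustration, not the env's actual code: the function name apply_joint_delta is hypothetical, and clipping the result to the table's Min/Max values (treated here as absolute joint limits) is an assumption.

```python
import numpy as np

DELTA_COEFF = 0.05  # delta_coeff default quoted above

# Bounds copied from the action table (rad), ordered
# waist, shoulder, elbow, wrist angle, wrist rotate.
# Treating them as absolute joint limits is an assumption of this sketch.
JOINT_LOW = np.array([-3.14, -1.85, -1.76, -1.87, -3.14])
JOINT_HIGH = np.array([3.14, 1.26, 1.61, 2.23, 3.14])


def apply_joint_delta(current, action, coeff=DELTA_COEFF):
    """Return the joint target for one delta-mode step (hypothetical helper)."""
    current = np.asarray(current, dtype=float)
    action = np.asarray(action, dtype=float)
    # Scale the raw action, add it to the current joint position, then clip.
    target = current + coeff * action
    return np.clip(target, JOINT_LOW, JOINT_HIGH)


home = np.zeros(5)  # initial joint pose is all zeros
target = apply_joint_delta(home, [1.0, -1.0, 0.0, 0.5, 0.0])
```

With the default coefficient, a unit action moves a joint by 0.05 rad per step, so traversing the full waist range from home takes on the order of 60 steps.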
EE mode (ee_action_type=True). Box(3,) — Δ EE position in
the rx200/base_link frame.
Observation Space
Standard env. Box layout:
EE position (3, base frame, m)
Unit vector EE → goal (3, normalised)
Distance EE → goal (1, m)
Current joint positions (8, /rx200/joint_states.position, alphabetical: elbow, gripper continuous, left_finger, right_finger, shoulder, waist, wrist_angle, wrist_rotate)
Previous action (5 or 3)
Current joint velocities (8)
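As a rough illustration of the layout listed above, the flat observation could be assembled like this. The helper name build_observation and the zero-vector fallback when the EE sits exactly on the goal are assumptions of this sketch; the concatenation order simply follows the list.

```python
import numpy as np


def build_observation(ee_pos, goal, joint_pos, prev_action, joint_vel):
    """Concatenate the observation components in the listed order (hypothetical helper)."""
    ee_pos = np.asarray(ee_pos, dtype=float)
    goal = np.asarray(goal, dtype=float)
    diff = goal - ee_pos
    dist = float(np.linalg.norm(diff))
    # Assumption: fall back to a zero vector when the EE is exactly on the goal.
    unit = diff / dist if dist > 0.0 else np.zeros(3)
    return np.concatenate([
        ee_pos,                                  # EE position (3)
        unit,                                    # unit vector EE -> goal (3)
        [dist],                                  # distance EE -> goal (1)
        np.asarray(joint_pos, dtype=float),      # joint positions (8)
        np.asarray(prev_action, dtype=float),    # previous action (5 or 3)
        np.asarray(joint_vel, dtype=float),      # joint velocities (8)
    ])


obs = build_observation(
    ee_pos=[0.20, 0.0, 0.20],
    goal=[0.20, 0.0, 0.25],
    joint_pos=np.zeros(8),
    prev_action=np.zeros(5),  # joint mode; use 3 entries in EE mode
    joint_vel=np.zeros(8),
)
```

Under this layout the vector has 3 + 3 + 1 + 8 + 5 + 8 = 28 entries in joint mode and 26 in EE mode.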
Goal env. Dict with three keys. desired_goal and achieved_goal are
each Box(3,). The bounds below are the declared observation-space
bounds, mirroring position_desired_goal_min/max in
rx200_reach_task_config.yaml; for RX200 the per-episode sampling
support (position_goal_min/max) matches them exactly.
| Idx | Dim | Component | Min | Max |
|---|---|---|---|---|
| 0 | 1 | goal x | 0.15 | 0.25 |
| 1 | 1 | goal y | -0.15 | 0.15 |
| 2 | 1 | goal z | 0.15 | 0.25 |
(Goal coords are in the rx200/base_link frame; base is at world
z = 0.78. Values mirror position_(desired_)goal_min/max in
rx200_reach_task_config.yaml.)
Rewards
Sparse: 0.0 if ‖ee − goal‖ < 0.02 else -1.0.
Dense: distance-shaped term + reached-goal bonus + per-step penalty +
penalties for joint-limit violations, non-executed actions (none_exe),
and goals outside the goal space. Defaults from
config/rx200_reach_task_config.yaml: reach_tolerance=0.02,
multiplier_dist_reward=2.0, reached_goal_reward=20,
step_reward=-0.5, joint_limits_reward=-2.0,
none_exe_reward=-5.0, not_within_goal_space_reward=-2.0.
env = gym.make("RX200ReacherSim-v0", reward_type="Dense")
env = gym.make("RX200ReacherGoalSim-v0", reward_type="Sparse")
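The sparse rule stated above is simple enough to write out directly. The function name sparse_reward is hypothetical; the threshold is the reach_tolerance default from the config.

```python
import numpy as np


def sparse_reward(ee_pos, goal, reach_tolerance=0.02):
    """0.0 when the EE is within reach_tolerance of the goal, otherwise -1.0."""
    diff = np.asarray(ee_pos, dtype=float) - np.asarray(goal, dtype=float)
    dist = np.linalg.norm(diff)
    return 0.0 if dist < reach_tolerance else -1.0


near = sparse_reward([0.20, 0.0, 0.20], [0.20, 0.0, 0.21])  # 1 cm away
far = sparse_reward([0.20, 0.0, 0.20], [0.20, 0.0, 0.25])   # 5 cm away
```

Here near evaluates to 0.0 and far to -1.0, since only the first point lies inside the 2 cm tolerance.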
Starting State
Initial joint pose: zeros (Interbotix URDF home — safe for the on-table mount):
waist = 0.0
shoulder = 0.0
elbow = 0.0
wrist_angle = 0.0
wrist_rotate = 0.0
Goal sampling. desired_goal ∈ Box(3,) from
position_goal_min/max. Per-link FK rejects sampled goals that
can’t be reached without dipping a link below
table_z + safety_z_margin.
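Goal sampling with rejection can be sketched as below. The bounds come from the goal table; the sampler name, the retry limit, and the stand-in safety predicate are all hypothetical. In particular, toy_is_safe only compares the goal height against a threshold and does not perform the per-link FK check the env uses, and its table_z and safety_z_margin values are placeholders.

```python
import numpy as np

# position_goal_min/max as listed in the goal table (rx200/base_link frame, m)
GOAL_MIN = np.array([0.15, -0.15, 0.15])
GOAL_MAX = np.array([0.25, 0.15, 0.25])


def sample_goal(is_safe, rng, max_tries=100):
    """Draw uniform goals until the safety predicate accepts one (hypothetical helper)."""
    for _ in range(max_tries):
        goal = rng.uniform(GOAL_MIN, GOAL_MAX)
        if is_safe(goal):
            return goal
    raise RuntimeError("no safe goal found within max_tries")


def toy_is_safe(goal, table_z=0.0, safety_z_margin=0.16):
    # Stand-in for the per-link FK check: accept only goals above
    # table_z + safety_z_margin. Both values are placeholders.
    return goal[2] > table_z + safety_z_margin


rng = np.random.default_rng(0)
goal = sample_goal(toy_is_safe, rng)
```

Passing the predicate in as an argument keeps the sampler independent of how the safety check is computed, so a real FK-based check could be swapped in without touching the loop.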
Episode End
Truncation. max_episode_steps (default 100). Real env aborts
on stale /rx200/joint_states.
Termination. ‖ee − goal‖ < reach_tolerance (sparse only).
Arguments
Same kwargs as UR5e — Reach. Sim-only variants:
use_kinect (default) for /head_mount_kinect2/*;
RX200Zed2* IDs use the ZED 2 stereo camera instead.
Version History
v0: first release (rl_environments v0.1.0). Framework reference robot and the most-exercised env in the ecosystem benchmarks (Use Cases A–C in the UniROS paper).