RX200 — Reach

The arm must move its end-effector to a 3D target sampled in the workspace above the cafe-table. No cube; gripper not commanded.

Env IDs: RX200ReacherSim-v0 / RX200ReacherGoalSim-v0 / RX200ReacherReal-v0 / RX200ReacherGoalReal-v0. Sim-only ZED 2 sensor variants: RX200Zed2ReacherSim-v0 / RX200Zed2ReacherGoalSim-v0.

Description

A Trossen ReactorX-200 5-DoF arm with two prismatic gripper fingers sits flush on a cafe_table (top at world z = 0.78 m). Reach ≈ 550 mm. The env supports joint-space or EE-space deltas, per-link FK safety checks, and real-time or MDP-pause step modes: the same architecture as the other robots’ reach envs (see UR5e — Reach).

Action Space

Joint mode (default). Box(5,):

Num	Action	Min	Max	Joint	Unit
0	waist delta	-3.14	+3.14	`waist`	rad
1	shoulder delta	-1.85	+1.26	`shoulder`	rad
2	elbow delta	-1.76	+1.61	`elbow`	rad
3	wrist angle delta	-1.87	+2.23	`wrist_angle`	rad
4	wrist rotate delta	-3.14	+3.14	`wrist_rotate`	rad

When delta_action=True (default), each action is scaled by delta_coeff = 0.05 and added to the current joint positions to form the commanded target.
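The delta update described above can be sketched as follows. This is a minimal illustration, not the environment's code: `apply_delta_action` is a hypothetical helper, and it assumes that the Min/Max columns of the table double as joint limits and that out-of-range targets are clipped (the real env may instead reject the action and apply a penalty).

```python
import numpy as np

# Limits copied from the action table (rad), in action-index order.
# Assumption: the table's Min/Max columns double as joint limits.
JOINT_MIN = np.array([-3.14, -1.85, -1.76, -1.87, -3.14])
JOINT_MAX = np.array([3.14, 1.26, 1.61, 2.23, 3.14])
DELTA_COEFF = 0.05  # default delta_coeff


def apply_delta_action(current_q, action, delta_coeff=DELTA_COEFF):
    """Return the commanded joint target for one delta-mode step."""
    current_q = np.asarray(current_q, dtype=float)
    action = np.asarray(action, dtype=float)
    target = current_q + delta_coeff * action
    # Assumption: clip to the limits rather than reject the action.
    return np.clip(target, JOINT_MIN, JOINT_MAX)


q_home = np.zeros(5)  # all-zero home pose
action = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
print(apply_delta_action(q_home, action))
```

With the default coefficient, a unit action moves a joint by 0.05 rad per step, so traversing the full waist range from one limit to the other takes on the order of a hundred steps.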

EE mode (ee_action_type=True). Box(3,) — Δ EE position in the rx200/base_link frame.

Observation Space

Standard env. Box layout:

EE position (3, base frame, m)
Unit vector EE → goal (3, normalised)
Distance EE → goal (1, m)
Current joint positions (8, /rx200/joint_states.position, alphabetical: elbow, gripper continuous, left_finger, right_finger, shoulder, waist, wrist_angle, wrist_rotate)
Previous action (5 or 3)
Current joint velocities (8)
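To make the layout concrete, the sketch below maps each component to a slice of the flat observation vector. It assumes the components are concatenated in exactly the order listed above; `obs_layout` and the component names are hypothetical, not part of the env's API.

```python
import numpy as np


def obs_layout(action_dim=5):
    """Map component names to slices of the flat observation vector.

    Assumption: components are concatenated in the documented order.
    action_dim is 5 in joint mode and 3 in EE mode.
    """
    sizes = [
        ("ee_pos", 3),
        ("ee_to_goal_unit", 3),
        ("ee_to_goal_dist", 1),
        ("joint_pos", 8),
        ("prev_action", action_dim),
        ("joint_vel", 8),
    ]
    layout, start = {}, 0
    for name, size in sizes:
        layout[name] = slice(start, start + size)
        start += size
    return layout, start


layout, obs_dim = obs_layout(action_dim=5)
print(obs_dim)  # 28 in joint mode (26 with action_dim=3)

# Dummy observation, only used to show how a component is sliced out.
obs = np.arange(obs_dim, dtype=np.float32)
joint_pos = obs[layout["joint_pos"]]
print(joint_pos.shape)
```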

Goal env. Dict with three keys (observation, achieved_goal, desired_goal). desired_goal / achieved_goal = Box(3,). The bounds below are the declared observation-space bounds; they mirror position_desired_goal_min/max in rx200_reach_task_config.yaml. For RX200 the per-episode sampling support (position_goal_min/max) happens to match them exactly.

Idx	Dim	Component	Min	Max
0	1	goal x	0.15	0.25
1	1	goal y	-0.15	0.15
2	1	goal z	0.15	0.25

(Goal coordinates are in the rx200/base_link frame; the base sits at world z = 0.78.)
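As a small worked example of the frame convention, the sketch below checks a goal against the declared bounds and converts its height to the world frame. It assumes the base_link z axis is parallel to world z, so only the 0.78 offset is needed; both helper names are hypothetical.

```python
import numpy as np

# Declared goal bounds in the rx200/base_link frame (m), from the table.
GOAL_MIN = np.array([0.15, -0.15, 0.15])
GOAL_MAX = np.array([0.25, 0.15, 0.25])
BASE_Z_WORLD = 0.78  # height of the base in the world frame


def in_goal_space(goal_base):
    """True if a base-frame goal lies inside the declared bounds."""
    goal_base = np.asarray(goal_base, dtype=float)
    return bool(np.all(goal_base >= GOAL_MIN) and np.all(goal_base <= GOAL_MAX))


def goal_height_in_world(goal_base):
    """World-frame height of a base-frame goal.

    Assumption: base_link z is parallel to world z (no tilt).
    """
    return float(goal_base[2]) + BASE_Z_WORLD


goal = np.array([0.20, 0.00, 0.20])
print(in_goal_space(goal), goal_height_in_world(goal))
```

Under this assumption the goal band z ∈ [0.15, 0.25] in the base frame corresponds to roughly 0.93 to 1.03 in the world frame.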

Rewards

Sparse: 0.0 if ‖ee − goal‖ < 0.02 else -1.0.
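The sparse rule above translates directly into a few lines. The function name `sparse_reward` is illustrative; the threshold is the documented reach_tolerance default.

```python
import numpy as np


def sparse_reward(ee_pos, goal, reach_tolerance=0.02):
    """0.0 when the EE is within reach_tolerance of the goal, else -1.0."""
    ee_pos = np.asarray(ee_pos, dtype=float)
    goal = np.asarray(goal, dtype=float)
    dist = np.linalg.norm(ee_pos - goal)
    return 0.0 if dist < reach_tolerance else -1.0


print(sparse_reward([0.20, 0.0, 0.20], [0.20, 0.0, 0.21]))  # 1 cm away
print(sparse_reward([0.20, 0.0, 0.20], [0.20, 0.0, 0.25]))  # 5 cm away
```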

Dense: a distance-shaped term plus a reached-goal bonus and a per-step penalty, with additional penalties for joint-limit violations, non-executed actions, and positions outside the goal space. Defaults from config/rx200_reach_task_config.yaml: reach_tolerance=0.02, multiplier_dist_reward=2.0, reached_goal_reward=20, step_reward=-0.5, joint_limits_reward=-2.0, none_exe_reward=-5.0, not_within_goal_space_reward=-2.0.

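import gymnasium as gym  # assumption: Gymnasium API; legacy installs use "import gym"

# The package that registers the RX200 env IDs must be imported before gym.make.
env = gym.make("RX200ReacherSim-v0", reward_type="Dense")
env = gym.make("RX200ReacherGoalSim-v0", reward_type="Sparse")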

Starting State

Initial joint pose: zeros (Interbotix URDF home — safe for the on-table mount):

waist        = 0.0
shoulder     = 0.0
elbow        = 0.0
wrist_angle  = 0.0
wrist_rotate = 0.0

Goal sampling. desired_goal ∈ Box(3,) from position_goal_min/max. Per-link FK rejects sampled goals that can’t be reached without dipping a link below table_z + safety_z_margin.
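The sampling-with-rejection idea can be sketched as a simple loop. Everything here is illustrative: uniform sampling is an assumption, `sample_safe_goal` and `toy_link_heights` are hypothetical names, and the toy height function merely stands in for the env's per-link FK (which would solve for a joint configuration and return the real link heights). All heights are expressed in one common frame.

```python
import numpy as np


def sample_safe_goal(rng, goal_min, goal_max, link_heights_fn,
                     table_z, safety_z_margin, max_tries=100):
    """Draw goals until every link stays above table_z + safety_z_margin."""
    floor = table_z + safety_z_margin
    for _ in range(max_tries):
        # Assumption: goals are drawn uniformly inside the bounds.
        goal = rng.uniform(goal_min, goal_max)
        if np.min(link_heights_fn(goal)) >= floor:
            return goal
    raise RuntimeError("no safe goal found within max_tries")


def toy_link_heights(goal):
    """Stand-in for per-link FK: pretend one link hangs 0.12 m below the EE."""
    return np.array([goal[2], goal[2] - 0.12])


rng = np.random.default_rng(0)
safe_goal = sample_safe_goal(
    rng,
    goal_min=np.array([0.15, -0.15, 0.15]),
    goal_max=np.array([0.25, 0.15, 0.25]),
    link_heights_fn=toy_link_heights,
    table_z=0.0,           # table height in the same frame as the goals
    safety_z_margin=0.05,  # hypothetical margin value
)
print(safe_goal)
```

Bounding the number of tries keeps a misconfigured goal box from hanging the reset; raising an error makes that failure visible instead of silently returning an unsafe goal.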

Episode End

Truncation. The episode is truncated after max_episode_steps (default 100). The real env also aborts the episode when /rx200/joint_states goes stale.

Termination. The episode terminates when ‖ee − goal‖ < reach_tolerance (sparse only).
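The two end conditions can be summarised in one small function. This is a reading of the text above rather than the env's code: it interprets "sparse only" as termination applying only when reward_type is "Sparse", and it omits the real env's stale-topic abort. The name `episode_end_flags` is hypothetical.

```python
def episode_end_flags(dist_to_goal, step_count, reward_type="Sparse",
                      reach_tolerance=0.02, max_episode_steps=100):
    """Return (terminated, truncated) for the current step.

    Assumption: reaching the goal terminates the episode only with the
    sparse reward; the step limit applies to every reward type.
    """
    terminated = reward_type == "Sparse" and dist_to_goal < reach_tolerance
    truncated = step_count >= max_episode_steps
    return terminated, truncated


print(episode_end_flags(0.01, 10))    # goal reached early
print(episode_end_flags(0.10, 100))   # step limit hit
```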

Arguments

Same kwargs as UR5e — Reach. Sim-only variants: use_kinect (default) for /head_mount_kinect2/*; RX200Zed2* IDs use the ZED 2 stereo camera instead.

Version History

v0 — first release (rl_environments v0.1.0). Framework reference robot — most-exercised env in the ecosystem benchmarks (Use Cases A–C in the UniROS paper).