RX200 — Reach

The arm must move its end-effector to a 3D target sampled in the workspace above the cafe-table. No cube; gripper not commanded.

Env IDs: RX200ReacherSim-v0 / RX200ReacherGoalSim-v0 / RX200ReacherReal-v0 / RX200ReacherGoalReal-v0. Sim-only ZED 2 sensor variants: RX200Zed2ReacherSim-v0 / RX200Zed2ReacherGoalSim-v0.

Description

A Trossen ReactorX-200 5-DoF arm with two prismatic gripper fingers sits flush on a cafe_table (top at z = 0.78 m). Reach ≈ 550 mm. The env supports joint-space or EE-space delta actions, per-link FK safety checks, and real-time or MDP-pause step modes; the architecture is the same as the other robots’ reach envs (see UR5e — Reach).

Action Space

Joint mode (default). Box(5,):

Num  Action              Min    Max    Joint         Unit
0    waist delta         -3.14  +3.14  waist         rad
1    shoulder delta      -1.85  +1.26  shoulder      rad
2    elbow delta         -1.76  +1.61  elbow         rad
3    wrist angle delta   -1.87  +2.23  wrist_angle   rad
4    wrist rotate delta  -3.14  +3.14  wrist_rotate  rad

When delta_action=True (default), each action is scaled by delta_coeff = 0.05 and added to the current joint positions.
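The delta update described above can be sketched as follows. This is a minimal illustration, not the env's actual code: apply_delta_action is a hypothetical helper, and clipping the target to the joint limits is an assumption (the real env may instead reject the command and apply a penalty).

```python
import numpy as np

# Joint limits from the action table (rad), in the order:
# waist, shoulder, elbow, wrist_angle, wrist_rotate.
JOINT_MIN = np.array([-3.14, -1.85, -1.76, -1.87, -3.14])
JOINT_MAX = np.array([3.14, 1.26, 1.61, 2.23, 3.14])
DELTA_COEFF = 0.05  # default delta_coeff stated in the text


def apply_delta_action(current_q, action, delta_coeff=DELTA_COEFF):
    """Hypothetical helper: scale the action and add it to the current joints."""
    current_q = np.asarray(current_q, dtype=float)
    action = np.asarray(action, dtype=float)
    target = current_q + delta_coeff * action
    # Assumption: out-of-range targets are clipped to the joint limits.
    return np.clip(target, JOINT_MIN, JOINT_MAX)


# From the zero pose, a unit action moves every joint by 0.05 rad.
next_q = apply_delta_action(np.zeros(5), np.ones(5))
```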

EE mode (ee_action_type=True). Box(3,) — Δ EE position in the rx200/base_link frame.

Observation Space

Standard env. Box layout:

  • EE position (3, base frame, m)

  • Unit vector EE → goal (3, normalised)

  • Distance EE → goal (1, m)

  • Current joint positions (8, /rx200/joint_states.position, alphabetical: elbow, gripper (continuous), left_finger, right_finger, shoulder, waist, wrist_angle, wrist_rotate)

  • Previous action (5 or 3)

  • Current joint velocities (8)
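The layout above gives 3 + 3 + 1 + 8 + 5 + 8 = 28 values in joint mode and 26 in EE mode. The sketch below assembles such a vector in the listed order. build_observation is a hypothetical helper; the zero-distance handling and the float dtype are assumptions.

```python
import numpy as np


def build_observation(ee_pos, goal, joint_pos, prev_action, joint_vel):
    """Hypothetical helper: concatenate the components in the listed order."""
    ee_pos = np.asarray(ee_pos, dtype=float)
    diff = np.asarray(goal, dtype=float) - ee_pos
    dist = float(np.linalg.norm(diff))
    # Assumption: return a zero direction when the EE is exactly on the goal.
    unit = diff / dist if dist > 0.0 else np.zeros(3)
    return np.concatenate([
        ee_pos,                               # EE position (3)
        unit,                                 # unit vector EE -> goal (3)
        [dist],                               # distance EE -> goal (1)
        np.asarray(joint_pos, dtype=float),   # joint positions (8)
        np.asarray(prev_action, dtype=float), # previous action (5 or 3)
        np.asarray(joint_vel, dtype=float),   # joint velocities (8)
    ])


obs_joint = build_observation([0.2, 0.0, 0.2], [0.2, 0.0, 0.25],
                              np.zeros(8), np.zeros(5), np.zeros(8))
obs_ee = build_observation([0.2, 0.0, 0.2], [0.2, 0.0, 0.25],
                           np.zeros(8), np.zeros(3), np.zeros(8))
```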

Goal env. Dict with three keys. desired_goal / achieved_goal = Box(3,). The bounds below are the declared observation-space bounds (mirror position_desired_goal_min/max in rx200_reach_task_config.yaml); for RX200 the per-episode sampling support (position_goal_min/max) happens to match exactly.

Idx  Dim  Component  Min    Max
0    1    goal x     0.15   0.25
1    1    goal y     -0.15  0.15
2    1    goal z     0.15   0.25

(Goal coords are in the rx200/base_link frame; base is at world z = 0.78. Values mirror position_(desired_)goal_min/max in rx200_reach_task_config.yaml.)

Rewards

Sparse: 0.0 if ‖ee − goal‖ < 0.02 else -1.0.

Dense: distance-shaped term + reached-goal bonus + per-step penalty + joint-limit, non-execution, and out-of-goal-space penalties. Defaults from config/rx200_reach_task_config.yaml: reach_tolerance=0.02, multiplier_dist_reward=2.0, reached_goal_reward=20, step_reward=-0.5, joint_limits_reward=-2.0, none_exe_reward=-5.0, not_within_goal_space_reward=-2.0.

env = gym.make("RX200ReacherSim-v0", reward_type="Dense")
env = gym.make("RX200ReacherGoalSim-v0", reward_type="Sparse")
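To make the two reward modes concrete, here is a rough sketch using the default values listed above. The sparse rule follows the text directly. The dense function is only one plausible way the listed terms could combine: the linear distance term, the additive structure, and the flag names are assumptions, not the env's confirmed formula.

```python
# Defaults quoted from config/rx200_reach_task_config.yaml in the text above.
DEFAULTS = {
    "reach_tolerance": 0.02,
    "multiplier_dist_reward": 2.0,
    "reached_goal_reward": 20.0,
    "step_reward": -0.5,
    "joint_limits_reward": -2.0,
    "none_exe_reward": -5.0,
    "not_within_goal_space_reward": -2.0,
}


def sparse_reward(dist, reach_tolerance=DEFAULTS["reach_tolerance"]):
    """0.0 when the EE is within tolerance of the goal, otherwise -1.0."""
    return 0.0 if dist < reach_tolerance else -1.0


def dense_reward(dist, joint_limit_violated=False, not_executed=False,
                 goal_outside_space=False, cfg=DEFAULTS):
    """Hypothetical combination of the dense terms (assumed additive)."""
    reward = cfg["step_reward"]                      # per-step penalty
    reward -= cfg["multiplier_dist_reward"] * dist   # assumption: linear in distance
    if dist < cfg["reach_tolerance"]:
        reward += cfg["reached_goal_reward"]         # reached-goal bonus
    if joint_limit_violated:
        reward += cfg["joint_limits_reward"]
    if not_executed:
        reward += cfg["none_exe_reward"]
    if goal_outside_space:
        reward += cfg["not_within_goal_space_reward"]
    return reward
```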

Starting State

Initial joint pose: zeros (Interbotix URDF home — safe for the on-table mount):

waist        = 0.0
shoulder     = 0.0
elbow        = 0.0
wrist_angle  = 0.0
wrist_rotate = 0.0

Goal sampling. desired_goal ∈ Box(3,) from position_goal_min/max. Per-link FK rejects sampled goals that can’t be reached without dipping a link below table_z + safety_z_margin.
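The goal-sampling step amounts to rejection sampling, which can be sketched as below. The bounds are the ones from the goal table. Uniform sampling, the max_tries cap, and the is_reachable callback (a stand-in for the per-link FK check against table_z + safety_z_margin) are all assumptions made for illustration.

```python
import numpy as np

# Goal bounds in the rx200/base_link frame, taken from the goal table.
GOAL_MIN = np.array([0.15, -0.15, 0.15])
GOAL_MAX = np.array([0.25, 0.15, 0.25])


def sample_goal(rng, is_reachable, max_tries=100):
    """Hypothetical helper: draw goals until one passes the safety check."""
    for _ in range(max_tries):
        # Assumption: goals are drawn uniformly inside the box.
        goal = rng.uniform(GOAL_MIN, GOAL_MAX)
        if is_reachable(goal):
            return goal
    raise RuntimeError("no reachable goal found within max_tries")


rng = np.random.default_rng(0)
# Toy predicate standing in for the per-link FK check.
goal = sample_goal(rng, lambda g: g[2] > 0.2)
```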

Episode End

Truncation. max_episode_steps (default 100). Real env aborts on stale /rx200/joint_states.

Termination. ‖ee − goal‖ < reach_tolerance (sparse only).
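The two end conditions can be written as a small predicate. This sketch reads "sparse only" as meaning that reaching the goal terminates the episode only when reward_type is "Sparse"; that reading, and the helper name episode_flags, are assumptions.

```python
def episode_flags(dist, step_count, reward_type="Sparse",
                  reach_tolerance=0.02, max_episode_steps=100):
    """Hypothetical helper returning (terminated, truncated)."""
    # Assumption: goal-reached termination applies only in sparse mode.
    terminated = reward_type == "Sparse" and dist < reach_tolerance
    # Truncation once the step budget is used up.
    truncated = step_count >= max_episode_steps
    return terminated, truncated
```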

Arguments

Same kwargs as UR5e — Reach. Sim-only variants: use_kinect (default) for /head_mount_kinect2/*; RX200Zed2* IDs use the ZED 2 stereo camera instead.

Version History

  • v0 — first release (rl_environments v0.1.0). Framework reference robot — most-exercised env in the ecosystem benchmarks (Use Cases A–C in the UniROS paper).