RX200 — Pick-and-Place

The arm must grasp a 4 cm cube, lift it, and place it at a target point, which may be elevated above the table. The action vector gains one extra scalar at action[-1] that controls the gripper as an absolute left-finger position. is_grasped is derived from cube-to-EE proximity AND the left-finger position. The optional multi_goal setting enables a lift-then-place curriculum.

Env IDs: RX200PnPSim-v0 / RX200PnPGoalSim-v0 / RX200PnPReal-v0 / RX200PnPGoalReal-v0. Sim-only ZED 2 variants: RX200Zed2PnPSim-v0 / RX200Zed2PnPGoalSim-v0.

Description

The RX200 is flush-mounted on the cafe table. At reset the arm goes to its home pose, the gripper opens (init_open_gripper = [0.036, -0.036] m, the opposite of push's closed reset), and the cube spawns on the table. is_grasped becomes 1 when ‖cube − ee‖ < grasp_dist_thresh AND left_finger_pos < grasp_finger_thresh.
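The grasp condition above can be sketched as a small standalone function. This is an illustration of the documented rule, not the environment's actual code; the function name and argument layout are assumptions, and the thresholds are the defaults listed under Rewards.

```python
import numpy as np

# Defaults as documented for this environment (metres).
GRASP_DIST_THRESH = 0.05
GRASP_FINGER_THRESH = 0.020


def is_grasped(cube_pos, ee_pos, left_finger_pos,
               dist_thresh=GRASP_DIST_THRESH,
               finger_thresh=GRASP_FINGER_THRESH):
    """Hypothetical helper: True when the cube is close to the end effector
    AND the left finger has closed past the finger threshold."""
    cube_pos = np.asarray(cube_pos, dtype=float)
    ee_pos = np.asarray(ee_pos, dtype=float)
    near = np.linalg.norm(cube_pos - ee_pos) < dist_thresh
    closed = left_finger_pos < finger_thresh
    return bool(near and closed)
```

With the gripper at its open reset position (0.036 m) the check is False even if the end effector sits directly over the cube, because the finger condition is not met.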

Action Space

Joint mode (default). Box(6,):

Indices 0–4: 5-joint arm command (same as RX200 — Reach).
Index 5: gripper command (absolute left_finger position) ∈ [0.015, 0.037] m. The right finger is set to right_finger = -left_finger just before the command is published.
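As a rough sketch of the joint-mode layout, the helpers below assemble a 6-element action and derive the mirrored finger targets. The helper names are hypothetical, and clipping the gripper command to the documented range is an assumption about how out-of-range values might be handled, not a statement of what the environment does.

```python
import numpy as np

# Documented gripper command range (metres, left finger).
GRIPPER_MIN, GRIPPER_MAX = 0.015, 0.037


def make_joint_action(arm_cmd, gripper_cmd):
    """Hypothetical helper: build a Box(6,) action from 5 joint commands
    plus one gripper command stored at action[-1]."""
    arm_cmd = np.asarray(arm_cmd, dtype=np.float32)
    if arm_cmd.shape != (5,):
        raise ValueError("expected 5 arm joint commands")
    # Assumption: keep the gripper command inside the documented range.
    left = np.clip(gripper_cmd, GRIPPER_MIN, GRIPPER_MAX)
    return np.concatenate([arm_cmd, [left]]).astype(np.float32)


def finger_targets(action):
    """Mirror the left-finger command onto the right finger."""
    left = float(action[-1])
    return left, -left
```

For example, make_joint_action(np.zeros(5), 0.05) yields an action whose last element is limited to 0.037 under the clipping assumption.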

EE mode (ee_action_type=True). Box(4,) — Δ EE position + gripper command.

Observation Space

Standard env. Box. Extends push obs with is_grasped (1 dim, binary 0/1).

Goal env. Dict. desired_goal = Box(3,); z range extends up to 0.20 m above the table for place-at-height. achieved_goal = cube XYZ.

Rewards

Sparse: 0.0 if ‖cube − pnp_goal‖ < reach_tolerance else -1.0.
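The sparse rule can be written as a one-line check. The function below is a minimal sketch; the name is hypothetical, and reach_tolerance is passed in explicitly because its default is not stated here.

```python
import numpy as np


def sparse_reward(cube_pos, pnp_goal, reach_tolerance):
    """Hypothetical helper: 0.0 inside the tolerance ball around pnp_goal,
    -1.0 everywhere else."""
    dist = np.linalg.norm(np.asarray(cube_pos, dtype=float)
                          - np.asarray(pnp_goal, dtype=float))
    return 0.0 if dist < reach_tolerance else -1.0
```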

Dense with multi_goal=True:

reward = -multiplier_dist_reward * ‖cube − current_goal‖
       + grasp_bonus  if is_grasped
       + reached_goal_reward  if ‖cube − pnp_goal‖ < reach_tolerance
       + step_reward
       + (joint/none/goal-space penalties)

current_goal is the intermediate lift target (cube_init + [0, 0, lift_height=0.15]) until reached, then pnp_goal.
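The dense reward and the goal switch can be sketched together as follows. This is an illustrative reconstruction under stated assumptions, not the environment's implementation: the class names are hypothetical, every coefficient except lift_height is a made-up value, the intermediate goal is assumed to count as reached under the same reach_tolerance, and the joint/goal-space penalties are collapsed into a single penalties argument.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DenseRewardConfig:
    lift_height: float = 0.15            # documented default
    # Hypothetical values, chosen only for illustration.
    multiplier_dist_reward: float = 1.0
    grasp_bonus: float = 0.5
    reached_goal_reward: float = 10.0
    step_reward: float = -0.01
    reach_tolerance: float = 0.02


class LiftThenPlaceReward:
    """Tracks the current goal: lift target first, then pnp_goal."""

    def __init__(self, cube_init, pnp_goal, cfg=None):
        self.cfg = cfg or DenseRewardConfig()
        self.pnp_goal = np.asarray(pnp_goal, dtype=float)
        self.lift_goal = (np.asarray(cube_init, dtype=float)
                          + np.array([0.0, 0.0, self.cfg.lift_height]))
        self.lifted = False

    def current_goal(self, cube_pos):
        # Assumption: the lift target is "reached" within reach_tolerance.
        dist = np.linalg.norm(cube_pos - self.lift_goal)
        if not self.lifted and dist < self.cfg.reach_tolerance:
            self.lifted = True
        return self.pnp_goal if self.lifted else self.lift_goal

    def reward(self, cube_pos, is_grasped, penalties=0.0):
        cube_pos = np.asarray(cube_pos, dtype=float)
        goal = self.current_goal(cube_pos)
        r = -self.cfg.multiplier_dist_reward * np.linalg.norm(cube_pos - goal)
        if is_grasped:
            r += self.cfg.grasp_bonus
        if np.linalg.norm(cube_pos - self.pnp_goal) < self.cfg.reach_tolerance:
            r += self.cfg.reached_goal_reward
        return float(r + self.cfg.step_reward + penalties)
```

Keeping the lifted flag sticky means the distance term never falls back to the lift target once the cube has been raised, which matches the "until reached, then pnp_goal" wording.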

Defaults: grasp_dist_thresh=0.05, grasp_finger_thresh=0.020, lift_height=0.15.

# Standard env with dense reward and the lift-then-place curriculum
env = gym.make("RX200PnPSim-v0", reward_type="Dense", multi_goal=True)
# Goal env with sparse reward
env = gym.make("RX200PnPGoalSim-v0", reward_type="Sparse", multi_goal=True)

Starting State

Joint pose: zeros (URDF home). Gripper open at init_open_gripper = [0.036, -0.036] m.

Cube spawn. 4 cm red cube at default [0.25, 0.0, 0.015] in base frame.

Goal sampling. pnp_goal ∈ Box(3,) from position_goal_min/max. With multi_goal=True, intermediate_goal = cube_init + [0, 0, 0.15].
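A minimal sketch of the goal sampling described above. Uniform sampling inside the box is an assumption, the function name is hypothetical, and the bounds used in the usage note are placeholders rather than the environment's actual position_goal_min/max.

```python
import numpy as np


def sample_goals(rng, position_goal_min, position_goal_max, cube_init,
                 lift_height=0.15, multi_goal=True):
    """Hypothetical helper: draw pnp_goal from the goal box and, when
    multi_goal is on, compute the intermediate lift target."""
    low = np.asarray(position_goal_min, dtype=float)
    high = np.asarray(position_goal_max, dtype=float)
    # Assumption: goals are drawn uniformly between the min and max corners.
    pnp_goal = rng.uniform(low, high)
    intermediate_goal = None
    if multi_goal:
        intermediate_goal = (np.asarray(cube_init, dtype=float)
                             + np.array([0.0, 0.0, lift_height]))
    return pnp_goal, intermediate_goal
```

Called with rng = np.random.default_rng(0), placeholder bounds [0.15, -0.20, 0.015] and [0.35, 0.20, 0.20], and the default cube spawn [0.25, 0.0, 0.015], the intermediate goal sits 0.15 m directly above the spawn point.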

Episode End

Truncation. The episode is truncated after max_episode_steps (default 100). The real env also truncates when /rx200/joint_states goes stale.

Termination. ‖cube − pnp_goal‖ < reach_tolerance (sparse only).
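The two end conditions can be combined into one check, sketched below. The function and its joint_states_stale flag are hypothetical; how staleness of /rx200/joint_states is detected is not covered here, so it is passed in as a boolean.

```python
import numpy as np


def episode_end(step_count, cube_pos, pnp_goal, reach_tolerance,
                reward_type="Sparse", max_episode_steps=100,
                joint_states_stale=False):
    """Hypothetical helper returning (terminated, truncated)."""
    dist = np.linalg.norm(np.asarray(cube_pos, dtype=float)
                          - np.asarray(pnp_goal, dtype=float))
    # Termination applies to the sparse reward only.
    terminated = reward_type == "Sparse" and dist < reach_tolerance
    # Truncation: step budget exhausted, or (real env) stale joint states.
    truncated = step_count >= max_episode_steps or joint_states_stale
    return bool(terminated), bool(truncated)
```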

Arguments

Inherits RX200 — Reach and RX200 — Push kwargs plus pnp-specific:

multi_goal (bool, default True) — lift-then-place curriculum.
lift_height (float, default 0.15 m) — vertical offset of intermediate_goal above cube spawn.

Version History

v0 — first release (rl_environments v0.1.0). Grasp comparison: left_finger_pos < grasp_finger_thresh (prismatic finger convention).