RX200 — Pick-and-Place
The arm must grasp a 4 cm cube, lift it, and place it at a target
point (possibly elevated). The action vector gains one extra scalar,
action[-1], which controls the gripper (absolute left-finger
position). is_grasped is derived from cube-EE proximity AND
left-finger position. An optional multi_goal mode adds a
lift-then-place curriculum.
Env IDs: RX200PnPSim-v0 / RX200PnPGoalSim-v0 /
RX200PnPReal-v0 / RX200PnPGoalReal-v0. Sim-only ZED 2 variants:
RX200Zed2PnPSim-v0 / RX200Zed2PnPGoalSim-v0.
Description
RX200 flush-mounted on the cafe-table. At reset the arm goes to
home pose, the gripper opens
(init_open_gripper = [0.036, -0.036] m — opposite of push’s
closed reset), and the cube spawns on the table.
is_grasped flips on when ‖cube − ee‖ < grasp_dist_thresh
AND left_finger_pos < grasp_finger_thresh.
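The grasp test above can be sketched as a small predicate. This is a minimal illustration, not the environment's actual code: the function name and argument layout are assumptions, and the threshold values are the defaults listed under Rewards.

```python
import math

# Defaults quoted from this page; the function itself is an illustrative sketch.
GRASP_DIST_THRESH = 0.05     # m, max cube-to-EE distance
GRASP_FINGER_THRESH = 0.020  # m, left finger must be below this (closed)


def is_grasped(cube_pos, ee_pos, left_finger_pos,
               grasp_dist_thresh=GRASP_DIST_THRESH,
               grasp_finger_thresh=GRASP_FINGER_THRESH):
    """Return True when the cube is near the EE AND the left finger has closed."""
    near = math.dist(cube_pos, ee_pos) < grasp_dist_thresh
    closed = left_finger_pos < grasp_finger_thresh
    return near and closed
```

Both conditions are needed: proximity alone is also true while the open gripper hovers over the cube, and a closed finger alone is also true when the gripper closes on empty air.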
Action Space
Joint mode (default). Box(6,):
Indices 0–4: 5-joint arm command (same as RX200 — Reach).
Index 5: gripper command (absolute left_finger position) ∈ [0.015, 0.037] m.
right_finger = -left_finger is set just before publishing.
EE mode (ee_action_type=True). Box(4,) — Δ EE position +
gripper command.
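As a rough sketch of the joint-mode layout, the helper below splits a Box(6,) action into the arm command and a mirrored finger pair. The helper name is hypothetical, and clipping the gripper scalar to the documented range is an assumption; the environment may handle out-of-range commands differently.

```python
# Gripper command range quoted from this page (metres).
GRIPPER_MIN, GRIPPER_MAX = 0.015, 0.037


def split_joint_action(action):
    """Split a joint-mode action into (arm_cmd, (left_finger, right_finger)).

    Hypothetical helper: clipping to the documented range is an assumption.
    """
    if len(action) != 6:
        raise ValueError("joint-mode action must have 6 entries")
    arm_cmd = list(action[:5])                     # indices 0-4: arm joints
    left = min(max(action[-1], GRIPPER_MIN), GRIPPER_MAX)
    right = -left                                  # mirrored finger
    return arm_cmd, (left, right)
```

In EE mode the same idea would apply to a Box(4,) vector, with the first three entries read as the Δ EE position instead of joint commands.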
Observation Space
Standard env. Box. Extends push obs with is_grasped (1 dim,
binary 0/1).
Goal env. Dict. desired_goal = Box(3,); z range extends up
to 0.20 m above the table for place-at-height. achieved_goal =
cube XYZ.
Rewards
Sparse: 0.0 if ‖cube − pnp_goal‖ < reach_tolerance else
-1.0.
Dense with multi_goal=True:
reward = -multiplier_dist_reward * ‖cube − current_goal‖
+ grasp_bonus if is_grasped
+ reached_goal_reward if ‖cube − pnp_goal‖ < reach_tolerance
+ step_reward
+ (joint/none/goal-space penalties)
current_goal is the intermediate lift target
(cube_init + [0, 0, lift_height=0.15]) until reached, then
pnp_goal.
Defaults: grasp_dist_thresh=0.05, grasp_finger_thresh=0.020,
lift_height=0.15.
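The dense reward terms can be sketched as below. This is an illustration under stated assumptions: the coefficient defaults (multiplier_dist_reward, grasp_bonus, reached_goal_reward, step_reward, reach_tolerance) are placeholder values not given on this page, the joint/goal-space penalties are omitted, and both function names are hypothetical.

```python
import math


def lift_target(cube_init, lift_height=0.15):
    """Intermediate goal: cube spawn position raised by lift_height."""
    return (cube_init[0], cube_init[1], cube_init[2] + lift_height)


def dense_reward(cube, current_goal, pnp_goal, grasped,
                 multiplier_dist_reward=1.0,   # placeholder value
                 grasp_bonus=0.5,              # placeholder value
                 reached_goal_reward=10.0,     # placeholder value
                 step_reward=-0.01,            # placeholder value
                 reach_tolerance=0.02):        # placeholder value
    """Sum the dense terms listed above (penalty terms left out)."""
    reward = -multiplier_dist_reward * math.dist(cube, current_goal)
    if grasped:
        reward += grasp_bonus
    if math.dist(cube, pnp_goal) < reach_tolerance:
        reward += reached_goal_reward
    return reward + step_reward
```

The caller would pass lift_target(cube_init) as current_goal until the lift target is reached, then switch to pnp_goal, so the distance term always points at the active stage of the curriculum.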
env = gym.make("RX200PnPSim-v0", reward_type="Dense", multi_goal=True)
env = gym.make("RX200PnPGoalSim-v0", reward_type="Sparse", multi_goal=True)
Starting State
Joint pose: zeros (URDF home). Gripper open at
init_open_gripper = [0.036, -0.036] m.
Cube spawn. 4 cm red cube at default
[0.25, 0.0, 0.015] in base frame.
Goal sampling. pnp_goal is sampled from a Box(3,) bounded by position_goal_min and position_goal_max.
With multi_goal=True,
intermediate_goal = cube_init + [0, 0, 0.15].
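A minimal sketch of goal sampling follows. It assumes the goal is drawn uniformly and independently per axis, which this page does not state; the function name and the bounds in the usage lines are hypothetical stand-ins for position_goal_min and position_goal_max.

```python
import random


def sample_pnp_goal(goal_min, goal_max, rng=random):
    """Draw one XYZ goal inside [goal_min, goal_max] (uniform: an assumption)."""
    return tuple(rng.uniform(lo, hi) for lo, hi in zip(goal_min, goal_max))


# Hypothetical bounds in the base frame (metres); not the package defaults.
rng = random.Random(0)
goal = sample_pnp_goal((0.15, -0.20, 0.015), (0.35, 0.20, 0.215), rng)
```

Passing a seeded random.Random instance keeps the sampled goals reproducible across runs.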
Episode End
Truncation. max_episode_steps (default 100). The real env also
truncates when /rx200/joint_states goes stale.
Termination. ‖cube − pnp_goal‖ < reach_tolerance (sparse
only).
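The two end conditions can be sketched as one function returning the (terminated, truncated) pair. The function name and the reach_tolerance default are assumptions, and the real-env stale-topic check is left out.

```python
def episode_flags(dist_to_goal, step_count, reward_type,
                  reach_tolerance=0.02,      # placeholder value
                  max_episode_steps=100):    # default quoted from this page
    """Return (terminated, truncated) for the current step."""
    # Success terminates the episode only under the sparse reward.
    terminated = reward_type == "Sparse" and dist_to_goal < reach_tolerance
    # Time limit applies regardless of reward type.
    truncated = step_count >= max_episode_steps
    return terminated, truncated
```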
Arguments
Inherits RX200 — Reach and RX200 — Push kwargs plus pnp-specific:
multi_goal (bool, default True): lift-then-place curriculum.
lift_height (float, default 0.15 m): vertical offset of intermediate_goal above cube spawn.
Version History
v0: first release (rl_environments v0.1.0). Grasp comparison: left_finger_pos < grasp_finger_thresh (prismatic finger convention).