UR5e — Pick-and-Place

The arm must grasp a 4 cm cube sitting on the cafe-table, lift it, and place it at a target point — optionally elevated above the table (place-at-height). The action vector gains one extra scalar at action[-1] controlling the Robotiq knuckle. An is_grasped flag is derived from cube-EE proximity AND knuckle position; an optional multi_goal curriculum splits the reward signal into a lift target first, then the final place target.

This page covers four registered Gymnasium env IDs:

  • UR5ePnPSim-v0 — standard, Gazebo

  • UR5ePnPGoalSim-v0 — goal-conditioned (HER), Gazebo

  • UR5ePnPReal-v0 — standard, real hardware

  • UR5ePnPGoalReal-v0 — goal-conditioned, real hardware

Description

The hardware setup is identical to UR5e — Reach (UR5e on ur5_base, cafe-table at x = 0.7). At reset the arm folds up, the gripper opens (init_open_gripper = [0.0] rad, the opposite of push, which resets with a closed gripper), and the cube spawns on the cafe-table top. The agent commands the 6-DoF arm plus the gripper scalar; is_grasped flips on once the cube is within grasp_dist_thresh = 0.05 m of the EE AND the knuckle has closed past grasp_finger_thresh = 0.30 rad.

The Robotiq 2F-85 INVERTS the grasp comparison relative to the prismatic-finger robots (RX200, VX300S, NED2): the knuckle position GROWS as the gripper closes (range 0.0 open → 0.8 closed). The env reads the gripper_open_value and gripper_closed_value rosparams so the reward and observation code does not have to know which way the bounds map.
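A minimal sketch of how a grasp check could stay agnostic to the closing direction, assuming the open and closed values have already been read from the rosparams. The function name is_grasped and its signature are hypothetical and only illustrate the comparison described above; the env's actual implementation may differ.

```python
import numpy as np

def is_grasped(cube_pos, ee_pos, knuckle_pos,
               gripper_open_value=0.0, gripper_closed_value=0.8,
               grasp_dist_thresh=0.05, grasp_finger_thresh=0.30):
    """Hypothetical helper: cube near the EE AND gripper closed past the threshold."""
    cube_pos = np.asarray(cube_pos, dtype=float)
    ee_pos = np.asarray(ee_pos, dtype=float)
    near = np.linalg.norm(cube_pos - ee_pos) < grasp_dist_thresh
    if gripper_closed_value > gripper_open_value:
        # Robotiq-style: the joint value grows while closing.
        closed = knuckle_pos > grasp_finger_thresh
    else:
        # Prismatic-finger style: the joint value shrinks while closing.
        closed = knuckle_pos < grasp_finger_thresh
    return bool(near and closed)
```

With the UR5e defaults, a cube 2 cm from the EE and a knuckle at 0.5 rad counts as grasped, while the same cube with a knuckle at 0.1 rad does not.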

Action Space

Joint mode (default, ee_action_type=False). Box(7,):

| Num | Action | Min | Max | Joint | Unit |
|---|---|---|---|---|---|
| 0 | shoulder pan delta | -3.14 | +3.14 | shoulder_pan_joint | rad |
| 1 | shoulder lift delta | -3.14 | +3.14 | shoulder_lift_joint | rad |
| 2 | elbow delta | -3.14 | +3.14 | elbow_joint | rad |
| 3 | wrist 1 delta | -3.14 | +3.14 | wrist_1_joint | rad |
| 4 | wrist 2 delta | -3.14 | +3.14 | wrist_2_joint | rad |
| 5 | wrist 3 delta | -3.14 | +3.14 | wrist_3_joint | rad |
| 6 | gripper command (absolute) | 0.0 | 0.8 | robotiq_85_left_knuckle_joint | rad |

The gripper scalar is ALWAYS treated as an absolute knuckle position, regardless of delta_action: open/close is closer to a discrete decision than a small delta, and this matches FetchPickAndPlace’s gripper convention. A single value is published; the other 5 finger joints follow via the URDF mimic linkage.

EE mode (ee_action_type=True). Box(4,) — Δ EE position (3 dims) + gripper command (1 dim).
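As an illustration of the joint-mode layout, the sketch below assembles a Box(7,) action from six arm deltas and one absolute gripper command. The helper name make_joint_action is hypothetical, and the clipping only mirrors the documented bounds; it is not taken from the env's code.

```python
import numpy as np

ARM_LOW, ARM_HIGH = -3.14, 3.14          # documented arm delta bounds (rad)
GRIPPER_OPEN, GRIPPER_CLOSED = 0.0, 0.8  # documented knuckle range (rad)

def make_joint_action(arm_deltas, gripper_cmd):
    """Hypothetical helper: 6 arm deltas followed by an absolute knuckle position."""
    arm = np.asarray(arm_deltas, dtype=np.float32)
    if arm.shape != (6,):
        raise ValueError("expected 6 arm deltas")
    arm = np.clip(arm, ARM_LOW, ARM_HIGH)
    grip = np.clip(gripper_cmd, GRIPPER_OPEN, GRIPPER_CLOSED)
    # The gripper scalar always sits at action[-1].
    return np.concatenate([arm, [grip]]).astype(np.float32)

# Keep the arm still and close the gripper fully.
close_action = make_joint_action(np.zeros(6), GRIPPER_CLOSED)
```

An EE-mode action would follow the same pattern with three position deltas instead of six joint deltas.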

Observation Space

Standard env (UR5ePnPSim-v0 / UR5ePnPReal-v0). Box. Layout extends the push obs with grasp state:

| Idx | Dim | Component | Source | Unit |
|---|---|---|---|---|
| 0–2 | 3 | EE position | MoveIt FK | m |
| 3–5 | 3 | EE rpy | MoveIt | rad |
| 6–8 | 3 | Unit vector cube → current goal | normalized | unitless |
| 9 | 1 | Euclidean distance cube → current goal | ‖goal − cube‖ | m |
| 10–16 | 7 | Current joint positions | /ur5e/joint_states.position | rad |
| 17–23 | 7 (or 4) | Previous action | cached | matches action space (incl. gripper scalar) |
| 24–30 | 7 | Current joint velocities | /joint_states.velocity | rad/s |
| 31–33 | 3 | Cube position | Gazebo (sim) / /cube_pose (real) | m |
| 34–36 | 3 | Cube rpy | same source | rad |
| 37–39 | 3 | Cube linear velocity (finite-diff) | cached + dt | m/s |
| 40–42 | 3 | Cube angular velocity (finite-diff) | cached + dt | rad/s |
| 43–45 | 3 | Cube position relative to EE | cube_pos − ee_pos | m |
| 46 | 1 | is_grasped (derived binary) | cube_rel_to_ee + knuckle pos | 0.0 or 1.0 |
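For downstream code it can help to name the segments of the flat observation. The sketch below encodes the joint-mode layout (47 values, with a 7-dimensional previous action) as slices; the key names and the split_obs helper are hypothetical, and the indices would shift in EE mode, where the previous action has only 4 entries.

```python
import numpy as np

# Joint-mode layout of the standard env's Box observation (47 values).
OBS_SLICES = {
    "ee_pos": slice(0, 3),
    "ee_rpy": slice(3, 6),
    "goal_dir": slice(6, 9),
    "goal_dist": slice(9, 10),
    "joint_pos": slice(10, 17),
    "prev_action": slice(17, 24),
    "joint_vel": slice(24, 31),
    "cube_pos": slice(31, 34),
    "cube_rpy": slice(34, 37),
    "cube_lin_vel": slice(37, 40),
    "cube_ang_vel": slice(40, 43),
    "cube_rel_ee": slice(43, 46),
    "is_grasped": slice(46, 47),
}

def split_obs(obs):
    """Hypothetical helper: split a joint-mode observation into named parts."""
    obs = np.asarray(obs)
    if obs.shape[-1] != 47:
        raise ValueError("expected a 47-dimensional joint-mode observation")
    return {name: obs[..., s] for name, s in OBS_SLICES.items()}
```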

The “current goal” in rows 6–8 and 9 is intermediate_goal (the lift target) when multi_goal=True and the lift has not been reached yet; otherwise it is the final pnp_goal.
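The goal switch and the two goal-related observation entries can be sketched as follows. Both function names are hypothetical, and the small eps guard against division by zero is an assumption rather than documented behaviour.

```python
import numpy as np

def select_current_goal(pnp_goal, intermediate_goal, multi_goal, intermediate_reached):
    """Hypothetical helper: lift target first, final place target afterwards."""
    if multi_goal and not intermediate_reached:
        return np.asarray(intermediate_goal, dtype=float)
    return np.asarray(pnp_goal, dtype=float)

def goal_features(cube_pos, current_goal, eps=1e-8):
    """Unit vector cube -> goal (obs rows 6-8) and Euclidean distance (row 9)."""
    diff = np.asarray(current_goal, dtype=float) - np.asarray(cube_pos, dtype=float)
    dist = float(np.linalg.norm(diff))
    unit = diff / max(dist, eps)  # eps avoids dividing by zero at the goal
    return unit, dist
```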

Goal env (UR5ePnPGoalSim-v0 / UR5ePnPGoalReal-v0). Dict with three keys:

observation — Box, same as standard env’s Box minus goal columns.

desired_goal — Box(3,). Sampled XYZ target. Unlike push, can be elevated off the table for place-at-height:

| Idx | Dim | Component | Min | Max |
|---|---|---|---|---|
| 0 | 1 | goal x | 0.40 | 0.80 |
| 1 | 1 | goal y | -0.30 | 0.30 |
| 2 | 1 | goal z | 0.795 | 0.95 |

achieved_goal — Box(3,). Current cube XYZ.

Rewards

Sparse (required for HER):

reward = 0.0  if ‖cube − goal‖ < reach_tolerance else -1.0
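HER relabels goals in batches, so the sparse reward is usually written to accept arrays of achieved and desired goals. The sketch below is one way to express the rule above; the function name and signature are assumptions modelled on the common goal-env pattern, and reach_tolerance is passed in explicitly because its default is not stated here.

```python
import numpy as np

def compute_sparse_reward(achieved_goal, desired_goal, reach_tolerance):
    """Hypothetical helper: 0.0 inside the tolerance, -1.0 otherwise (batched)."""
    achieved_goal = np.asarray(achieved_goal, dtype=float)
    desired_goal = np.asarray(desired_goal, dtype=float)
    # The norm over the last axis handles single goals (3,) and batches (N, 3).
    dist = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.where(dist < reach_tolerance, 0.0, -1.0)
```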

Dense with multi_goal=True and grasp-aware shaping (default for the standard env):

reward = -multiplier_dist_reward * ‖cube − current_goal‖
       + (grasp_bonus         if is_grasped else 0)
       + (reached_goal_reward if ‖cube − pnp_goal‖ < reach_tolerance else 0)
       + step_reward
       + joint_limits_reward / none_exe_reward / not_within_goal_space_reward

When multi_goal=True, current_goal is the lift target (cube_init + [0, 0, lift_height] with lift_height = 0.15 m) until intermediate_reached flips on (when ‖cube − intermediate_goal‖ < reach_tolerance); after that, current_goal = pnp_goal.

grasp_finger_thresh = 0.30 rad (knuckle floor — closed enough to be grasping something). grasp_dist_thresh = 0.05 m (EE-to-cube ceiling).
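The main dense terms can be sketched as a single function. Every default value below is a placeholder chosen for illustration, not the env's actual default, and the penalty terms (joint_limits_reward, none_exe_reward, not_within_goal_space_reward) are left out because their trigger conditions are not described here. The function name dense_reward is hypothetical.

```python
import numpy as np

def dense_reward(cube_pos, current_goal, pnp_goal, grasped, *,
                 multiplier_dist_reward=1.0, grasp_bonus=0.5,
                 reached_goal_reward=10.0, step_reward=-0.01,
                 reach_tolerance=0.02):
    """Hypothetical sketch of the dense reward; all defaults are placeholders."""
    cube_pos = np.asarray(cube_pos, dtype=float)
    current_goal = np.asarray(current_goal, dtype=float)
    pnp_goal = np.asarray(pnp_goal, dtype=float)
    # Distance shaping toward the lift target or the final place target.
    reward = -multiplier_dist_reward * np.linalg.norm(cube_pos - current_goal)
    if grasped:
        reward += grasp_bonus
    # The success bonus always refers to the final goal, not the lift target.
    if np.linalg.norm(cube_pos - pnp_goal) < reach_tolerance:
        reward += reached_goal_reward
    reward += step_reward
    return float(reward)
```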

Code example:

import gymnasium as gym
env = gym.make("UR5ePnPSim-v0", reward_type="Dense", multi_goal=True)  # standard env, dense shaped reward
goal_env = gym.make("UR5ePnPGoalSim-v0", reward_type="Sparse", multi_goal=True)  # goal-conditioned env for HER

Starting State

Joint pose: folded upright (same as UR5e — Reach). Gripper: OPEN (init_open_gripper = [0.0] rad — opposite of push, where the gripper resets closed).

Cube spawn. 4 cm red cube at default (0.500, -0.150, 0.795) in world frame; randomised XY within the spawn box if random_cube_spawn=True. The cube is removed and re-spawned each reset.

Goal sampling. pnp_goal ∈ Box(3,) sampled from position_goal_min/max. With multi_goal=True, intermediate_goal = cube_init + [0, 0, 0.15].
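Goal sampling at reset can be sketched as below. The bounds are the desired_goal limits listed for the goal env; uniform sampling and the sample_goals helper are assumptions made for illustration, since the distribution used by the env is not specified.

```python
import numpy as np

# Bounds taken from the desired_goal limits (x, y, z in metres).
POSITION_GOAL_MIN = np.array([0.40, -0.30, 0.795])
POSITION_GOAL_MAX = np.array([0.80, 0.30, 0.95])

def sample_goals(cube_init, lift_height=0.15, rng=None):
    """Hypothetical helper: draw pnp_goal and derive the intermediate lift goal."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumed: independent uniform draw per axis inside the goal box.
    pnp_goal = rng.uniform(POSITION_GOAL_MIN, POSITION_GOAL_MAX)
    # The lift target sits straight above the cube's spawn position.
    lift_offset = np.array([0.0, 0.0, lift_height])
    intermediate_goal = np.asarray(cube_init, dtype=float) + lift_offset
    return pnp_goal, intermediate_goal

pnp_goal, intermediate_goal = sample_goals([0.500, -0.150, 0.795])
```

For the default cube spawn, the intermediate goal lands at z = 0.945 m, just inside the upper goal bound of 0.95 m.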

Episode End

Truncation. max_episode_steps (default 100). Real env also aborts on stale /joint_states.

Termination. ‖cube − pnp_goal‖ < reach_tolerance (sparse path only).

Arguments

Inherits all kwargs from UR5e — Reach and UR5e — Push, plus pnp-specific:

| Kwarg | Default | Meaning |
|---|---|---|
| multi_goal | True | Enable the lift-then-place curriculum (intermediate_goal reached → switch to pnp_goal). |
| lift_height | 0.15 (m) | Vertical offset of the intermediate lift goal above the cube spawn position. |

Version History

  • v0 — first release (rl_environments v0.1.0). Inverted grasp comparison (knuckle_pos > grasp_finger_thresh) + single-element gripper publish — both UR5e-specific vs the prismatic-finger robots.