UR5e — Pick-and-Place
The arm must grasp a 4 cm cube sitting on the cafe-table, lift it,
and place it at a target point — optionally elevated above the
table (place-at-height). The action vector gains one extra scalar
at action[-1] controlling the Robotiq knuckle. An is_grasped
flag is derived from cube-EE proximity AND knuckle position; an
optional multi_goal curriculum splits the reward signal into a
lift target first, then the final place target.
This page covers four registered Gymnasium env IDs:
UR5ePnPSim-v0: standard, Gazebo
UR5ePnPGoalSim-v0: goal-conditioned (HER), Gazebo
UR5ePnPReal-v0: standard, real hardware
UR5ePnPGoalReal-v0: goal-conditioned, real hardware
Description
Hardware setup identical to UR5e — Reach (UR5e on ur5_base,
cafe-table at x = 0.7). At reset the arm folds up, the gripper opens
(init_open_gripper = [0.0] rad — opposite of push’s closed
gripper), and the cube spawns on the cafe-table top. The agent
commands the 6-DoF arm plus the gripper scalar; is_grasped flips
on once the cube is within grasp_dist_thresh = 0.05 m of the EE
AND the knuckle has closed past grasp_finger_thresh ≈ 0.30 rad.
Robotiq 2F-85 INVERTS the grasp comparison versus the
prismatic-finger robots (RX200, VX300S, NED2): the knuckle
position GROWS as the gripper closes (range 0.0 open → 0.8 closed).
The env reads gripper_open_value and gripper_closed_value
rosparams so the reward + obs code doesn’t have to know which way
the bounds map.
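The direction-aware grasp check described above can be sketched as follows. This is a minimal illustration, not the env's actual code: the function name and signature are hypothetical, and only the threshold values and the gripper_open_value / gripper_closed_value names come from this page.

```python
import math


def is_grasped(cube_pos, ee_pos, knuckle_pos,
               gripper_open_value=0.0, gripper_closed_value=0.8,
               grasp_dist_thresh=0.05, grasp_finger_thresh=0.30):
    """Hypothetical helper: True when the cube is near the EE AND the gripper is closed enough."""
    near = math.dist(cube_pos, ee_pos) < grasp_dist_thresh
    if gripper_closed_value > gripper_open_value:
        # Robotiq 2F-85: the knuckle position grows as the gripper closes.
        closed = knuckle_pos > grasp_finger_thresh
    else:
        # Prismatic fingers: the finger position shrinks as the gripper closes.
        closed = knuckle_pos < grasp_finger_thresh
    return near and closed
```

Comparing the two bounds once, instead of hard-coding `>` or `<`, is one way to keep the reward and observation code independent of the gripper type.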
Action Space
Joint mode (default, ee_action_type=False). Box(7,):
| Num | Action | Min | Max | Joint | Unit |
|---|---|---|---|---|---|
| 0 | shoulder pan delta | -3.14 | +3.14 | | rad |
| 1 | shoulder lift delta | -3.14 | +3.14 | | rad |
| 2 | elbow delta | -3.14 | +3.14 | | rad |
| 3 | wrist 1 delta | -3.14 | +3.14 | | rad |
| 4 | wrist 2 delta | -3.14 | +3.14 | | rad |
| 5 | wrist 3 delta | -3.14 | +3.14 | | rad |
| 6 | gripper command (absolute) | 0.0 | 0.8 | | rad |
The gripper scalar is ALWAYS treated as an absolute knuckle
position (regardless of delta_action), because open/close is closer
to a discrete decision than a small delta; this matches
FetchPickAndPlace’s gripper convention. A single value is published;
the other 5 finger joints follow via the URDF mimic linkage.
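As a rough sketch of how a joint-mode action might be split into arm and gripper targets: the helper name, the clipping, and the treatment of delta_action=False as absolute joint targets are assumptions for illustration. Only the bounds and the always-absolute gripper rule come from this page.

```python
ARM_LIMIT = 3.14                      # per-joint action bound from the table (rad)
GRIPPER_MIN, GRIPPER_MAX = 0.0, 0.8   # knuckle range: open to closed (rad)


def apply_joint_action(joint_pos, action, delta_action=True):
    """Hypothetical helper: map a Box(7,) action to 6 arm targets plus 1 knuckle target."""
    if len(joint_pos) != 6 or len(action) != 7:
        raise ValueError("expected 6 arm joint positions and a 7-element action")
    # Clip the arm part of the action to the Box bounds.
    arm_cmd = [max(-ARM_LIMIT, min(ARM_LIMIT, a)) for a in action[:6]]
    if delta_action:
        # Delta mode: add the command to the current joint positions.
        arm_target = [q + d for q, d in zip(joint_pos, arm_cmd)]
    else:
        # Assumed behaviour: the command is used directly as the joint target.
        arm_target = arm_cmd
    # The gripper scalar is absolute in both modes.
    knuckle_target = max(GRIPPER_MIN, min(GRIPPER_MAX, action[-1]))
    return arm_target, knuckle_target
```

For example, `apply_joint_action([0.0] * 6, [0.1] * 6 + [1.5])` clips the out-of-range gripper command to 0.8 (fully closed) while the arm joints each move by 0.1 rad.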
EE mode (ee_action_type=True). Box(4,) — Δ EE position
(3 dims) + gripper command (1 dim).
Observation Space
Standard env (UR5ePnPSim-v0 / UR5ePnPReal-v0). Box. Layout extends the push obs with grasp state:
| Idx | Dim | Component | Source | Unit |
|---|---|---|---|---|
| 0–2 | 3 | EE position | MoveIt FK | m |
| 3–5 | 3 | EE rpy | MoveIt | rad |
| 6–8 | 3 | Unit vector cube → current goal | normalized | unitless |
| 9 | 1 | Euclidean distance cube → current goal | ‖goal − cube‖ | m |
| 10–16 | 7 | Current joint positions | | rad |
| 17–23 | 7 (or 4) | Previous action | cached | matches action space (incl. gripper scalar) |
| 24–30 | 7 | Current joint velocities | | rad/s |
| 31–33 | 3 | Cube position | Gazebo (sim) / | m |
| 34–36 | 3 | Cube rpy | same source | rad |
| 37–39 | 3 | Cube linear velocity (finite-diff) | cached + dt | m/s |
| 40–42 | 3 | Cube angular velocity (finite-diff) | cached + dt | rad/s |
| 43–45 | 3 | Cube position relative to EE | cube_pos − ee_pos | m |
| 46 | 1 | is_grasped flag | cube_rel_to_ee + knuckle pos | 0.0 or 1.0 |
The “current goal” in rows 6–9 is intermediate_goal (lift target)
when multi_goal=True and the lift hasn’t been reached yet;
otherwise it’s the final pnp_goal.
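The goal selection and the two goal-related observation components (unit vector and distance) could be computed along these lines. Both function names are hypothetical, and returning a zero vector when the cube sits exactly on the goal is an assumption made to avoid a division by zero.

```python
import math


def current_goal(pnp_goal, intermediate_goal, multi_goal, intermediate_reached):
    """Pick the lift target until it has been reached, then the final place target."""
    if multi_goal and not intermediate_reached:
        return intermediate_goal
    return pnp_goal


def goal_features(cube_pos, goal):
    """Return (unit vector cube -> goal, Euclidean distance cube -> goal)."""
    diff = [g - c for g, c in zip(goal, cube_pos)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist < 1e-9:
        # Assumed guard: no meaningful direction when the cube is at the goal.
        return [0.0, 0.0, 0.0], 0.0
    return [d / dist for d in diff], dist
```

With the default spawn (0.500, -0.150, 0.795) and a lift target 0.15 m above it, the unit vector points straight up and the distance is 0.15 m.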
Goal env (UR5ePnPGoalSim-v0 / UR5ePnPGoalReal-v0). Dict with three keys:
observation — Box, same as standard env’s Box minus goal columns.
desired_goal — Box(3,). Sampled XYZ target. Unlike push, can be
elevated off the table for place-at-height:
| Idx | Dim | Component | Min | Max |
|---|---|---|---|---|
| 0 | 1 | goal x | 0.40 | 0.80 |
| 1 | 1 | goal y | -0.30 | 0.30 |
| 2 | 1 | goal z | 0.795 | 0.95 |
achieved_goal — Box(3,). Current cube XYZ.
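To illustrate how the three keys relate to the standard env's flat vector, here is a small sketch. It assumes that "goal columns" means indices 6 to 9 (the unit vector and the distance) of the joint-mode layout; the helper name is hypothetical.

```python
GOAL_COLUMNS = range(6, 10)  # assumed: unit vector (6-8) and distance (9)


def to_goal_obs(flat_obs, cube_pos, pnp_goal):
    """Hypothetical helper: build the Dict observation from the flat standard-env vector."""
    observation = [v for i, v in enumerate(flat_obs) if i not in GOAL_COLUMNS]
    return {
        "observation": observation,         # flat vector without the goal columns
        "achieved_goal": list(cube_pos),    # current cube XYZ
        "desired_goal": list(pnp_goal),     # sampled XYZ target
    }
```

Under that assumption, a 47-element joint-mode vector yields a 43-element observation entry.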
Rewards
Sparse (required for HER):
reward = 0.0 if ‖cube − goal‖ < reach_tolerance else -1.0
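HER relabels stored transitions with different goals, so the reward has to be recomputable from an (achieved_goal, desired_goal) pair alone. A minimal sketch of such a function follows; the name is hypothetical, and reach_tolerance is passed explicitly because its default is not stated here.

```python
import math


def sparse_reward(achieved_goal, desired_goal, reach_tolerance):
    """0.0 when the cube is within tolerance of the goal, otherwise -1.0."""
    within = math.dist(achieved_goal, desired_goal) < reach_tolerance
    return 0.0 if within else -1.0
```

Because the function depends only on the two goal vectors, the same call works for the original goal and for any relabelled goal.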
Dense with multi_goal=True and grasp-aware shaping
(default for std env):
reward = -multiplier_dist_reward * ‖cube − current_goal‖
+ (grasp_bonus if is_grasped else 0)
+ (reached_goal_reward if ‖cube − pnp_goal‖ < reach_tolerance else 0)
+ step_reward
+ joint_limits_reward / none_exe_reward / not_within_goal_space_reward
When multi_goal=True, current_goal is the lift target
(cube_init + [0, 0, lift_height] with
lift_height = 0.15 m) until intermediate_reached flips on
(when ‖cube − intermediate_goal‖ < reach_tolerance); after that,
current_goal = pnp_goal.
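The dense reward with the lift-then-place switch can be sketched as below. This is an illustration under stated assumptions: the function name and the values in EXAMPLE_CFG are hypothetical, the penalty terms (joint_limits_reward, none_exe_reward, not_within_goal_space_reward) are left out, and the goal is assumed to switch in the same step in which the lift target is reached.

```python
import math

# Hypothetical values for illustration only; the real defaults are not listed here.
EXAMPLE_CFG = {
    "reach_tolerance": 0.02,
    "multiplier_dist_reward": 1.0,
    "grasp_bonus": 0.5,
    "reached_goal_reward": 10.0,
    "step_reward": -0.01,
}


def dense_step(cube_pos, pnp_goal, intermediate_goal, intermediate_reached, grasped, cfg):
    """Return (reward, updated intermediate_reached) for one step with multi_goal=True."""
    # Latch the flag: once the lift target has been reached it stays reached.
    if not intermediate_reached:
        if math.dist(cube_pos, intermediate_goal) < cfg["reach_tolerance"]:
            intermediate_reached = True
    goal = pnp_goal if intermediate_reached else intermediate_goal
    reward = -cfg["multiplier_dist_reward"] * math.dist(cube_pos, goal)
    if grasped:
        reward += cfg["grasp_bonus"]
    # The success bonus always refers to the final place target.
    if math.dist(cube_pos, pnp_goal) < cfg["reach_tolerance"]:
        reward += cfg["reached_goal_reward"]
    reward += cfg["step_reward"]
    return reward, intermediate_reached
```

The caller carries intermediate_reached from step to step and resets it to False at the start of each episode.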
grasp_finger_thresh = 0.30 rad (knuckle floor — closed enough
to be grasping something). grasp_dist_thresh = 0.05 m
(EE-to-cube ceiling).
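Code example:
import gymnasium as gym  # assumes the package that registers these env IDs is already imported
env = gym.make("UR5ePnPSim-v0", reward_type="Dense", multi_goal=True)  # standard env
goal_env = gym.make("UR5ePnPGoalSim-v0", reward_type="Sparse", multi_goal=True)  # goal-conditioned env for HER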
Starting State
Joint pose: folded upright (same as UR5e — Reach). Gripper:
OPEN (init_open_gripper = [0.0] rad — opposite of push,
where the gripper resets closed).
Cube spawn. 4 cm red cube at default (0.500, -0.150, 0.795)
in world frame; randomised XY within the spawn box if
random_cube_spawn=True. The cube is removed and re-spawned each
reset.
Goal sampling. pnp_goal ∈ Box(3,) sampled from
position_goal_min/max. With multi_goal=True,
intermediate_goal = cube_init + [0, 0, 0.15].
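Goal sampling might look like the sketch below. It assumes uniform sampling and that position_goal_min/max match the desired_goal bounds listed in the Observation Space section; the function name and the rng argument are hypothetical.

```python
import random

# Assumed to mirror the desired_goal bounds (x, y, z) in metres.
POSITION_GOAL_MIN = (0.40, -0.30, 0.795)
POSITION_GOAL_MAX = (0.80, 0.30, 0.95)
LIFT_HEIGHT = 0.15  # metres above the cube spawn position


def sample_goals(cube_init, rng, multi_goal=True):
    """Return (pnp_goal, intermediate_goal); intermediate_goal is None without multi_goal."""
    pnp_goal = tuple(
        rng.uniform(lo, hi) for lo, hi in zip(POSITION_GOAL_MIN, POSITION_GOAL_MAX)
    )
    intermediate_goal = None
    if multi_goal:
        # Lift target: straight above the cube's initial position.
        intermediate_goal = (cube_init[0], cube_init[1], cube_init[2] + LIFT_HEIGHT)
    return pnp_goal, intermediate_goal
```

Passing a seeded `random.Random` instance as rng keeps the sampled goals reproducible across resets.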
Episode End
Truncation. max_episode_steps (default 100). Real env also
aborts on stale /joint_states.
Termination. ‖cube − pnp_goal‖ < reach_tolerance (sparse
path only).
Arguments
Inherits all kwargs from UR5e — Reach and UR5e — Push, plus pnp-specific:
| Kwarg | Default | Meaning |
|---|---|---|
| multi_goal | | Enable the lift-then-place curriculum. |
| lift_height | 0.15 | Vertical offset of the intermediate lift goal above the cube spawn position. |
Version History
v0: first release (rl_environments v0.1.0). Inverted grasp comparison (knuckle_pos > grasp_finger_thresh) plus single-element gripper publish, both UR5e-specific versus the prismatic-finger robots.