UR5e — Pick-and-Place
=====================

The arm must grasp a 4 cm cube sitting on the cafe-table, lift it, and place it
at a target point — optionally elevated above the table (place-at-height). The
action vector gains one extra scalar at ``action[-1]`` controlling the Robotiq
knuckle. An ``is_grasped`` flag is derived from cube-EE proximity AND knuckle
position; an optional ``multi_goal`` curriculum splits the reward signal into a
lift target first, then the final place target.

This page covers four registered Gymnasium env IDs:

* ``UR5ePnPSim-v0`` — standard, Gazebo
* ``UR5ePnPGoalSim-v0`` — goal-conditioned (HER), Gazebo
* ``UR5ePnPReal-v0`` — standard, real hardware
* ``UR5ePnPGoalReal-v0`` — goal-conditioned, real hardware

Description
-----------

Hardware setup identical to :doc:`reach` (UR5e on ``ur5_base``, cafe-table at
x = 0.7). At reset the arm folds up, the gripper opens
(``init_open_gripper = [0.0]`` rad — *opposite* of push's closed gripper), and
the cube spawns on the cafe-table top. The agent commands the 6-DoF arm plus the
gripper scalar; ``is_grasped`` flips on once the cube is within
``grasp_dist_thresh = 0.05 m`` of the EE AND the knuckle has closed past
``grasp_finger_thresh ≈ 0.30 rad``.

Robotiq 2F-85 INVERTS the grasp comparison versus the prismatic-finger robots
(RX200, VX300S, NED2): the knuckle position GROWS as the gripper closes (range
0.0 open → 0.8 closed). The env reads ``gripper_open_value`` and
``gripper_closed_value`` rosparams so the reward + obs code doesn't have to know
which way the bounds map.

Action Space
------------

**Joint mode** (default, ``ee_action_type=False``). Box(7,):

.. list-table::
   :widths: 6 32 12 12 26 12
   :header-rows: 1

   * - Num
     - Action
     - Min
     - Max
     - Joint
     - Unit
   * - 0
     - shoulder pan delta
     - -3.14
     - +3.14
     - ``shoulder_pan_joint``
     - rad
   * - 1
     - shoulder lift delta
     - -3.14
     - +3.14
     - ``shoulder_lift_joint``
     - rad
   * - 2
     - elbow delta
     - -3.14
     - +3.14
     - ``elbow_joint``
     - rad
   * - 3
     - wrist 1 delta
     - -3.14
     - +3.14
     - ``wrist_1_joint``
     - rad
   * - 4
     - wrist 2 delta
     - -3.14
     - +3.14
     - ``wrist_2_joint``
     - rad
   * - 5
     - wrist 3 delta
     - -3.14
     - +3.14
     - ``wrist_3_joint``
     - rad
   * - 6
     - gripper command (absolute)
     - 0.0
     - 0.8
     - ``robotiq_85_left_knuckle_joint``
     - rad

The gripper scalar is ALWAYS treated as an absolute knuckle position (regardless
of ``delta_action``) — open/close is closer to a discrete decision than a small
delta; matches FetchPickAndPlace's gripper convention. A single value is
published; the other 5 finger joints follow via the URDF mimic linkage.

**EE mode** (``ee_action_type=True``). Box(4,) — Δ EE position (3 dims) +
gripper command (1 dim).

Observation Space
-----------------

**Standard env** (``UR5ePnPSim-v0`` / ``UR5ePnPReal-v0``). Box. Layout extends
the push obs with grasp state:

.. list-table::
   :widths: 8 14 38 28 12
   :header-rows: 1

   * - Idx
     - Dim
     - Component
     - Source
     - Unit
   * - 0–2
     - 3
     - EE position
     - MoveIt FK
     - m
   * - 3–5
     - 3
     - EE rpy
     - MoveIt
     - rad
   * - 6–8
     - 3
     - Unit vector cube → current goal
     - normalized
     - unitless
   * - 9
     - 1
     - Euclidean distance cube → current goal
     - ‖goal − cube‖
     - m
   * - 10–16
     - 7
     - Current joint positions
     - ``/ur5e/joint_states.position``
     - rad
   * - 17–23
     - 7 (or 4)
     - Previous action
     - cached
     - matches action space (incl. gripper scalar)
   * - 24–30
     - 7
     - Current joint velocities
     - ``/joint_states.velocity``
     - rad/s
   * - 31–33
     - 3
     - Cube position
     - Gazebo (sim) / ``/cube_pose`` (real)
     - m
   * - 34–36
     - 3
     - Cube rpy
     - same source
     - rad
   * - 37–39
     - 3
     - Cube linear velocity (finite-diff)
     - cached + dt
     - m/s
   * - 40–42
     - 3
     - Cube angular velocity (finite-diff)
     - cached + dt
     - rad/s
   * - 43–45
     - 3
     - Cube position relative to EE
     - cube_pos − ee_pos
     - m
   * - 46
     - 1
     - ``is_grasped`` (derived binary)
     - cube_rel_to_ee + knuckle pos
     - 0.0 or 1.0

The "current goal" in rows 6–9 is ``intermediate_goal`` (lift target) when
``multi_goal=True`` and the lift hasn't been reached yet; otherwise it's the
final ``pnp_goal``.

**Goal env** (``UR5ePnPGoalSim-v0`` / ``UR5ePnPGoalReal-v0``). Dict with three
keys:

``observation`` — Box, same as standard env's Box minus goal columns.

``desired_goal`` — Box(3,). Sampled XYZ target. Unlike push, can be elevated off
the table for place-at-height:

.. list-table::
   :widths: 8 16 32 22 22
   :header-rows: 1

   * - Idx
     - Dim
     - Component
     - Min
     - Max
   * - 0
     - 1
     - goal x
     - 0.40
     - 0.80
   * - 1
     - 1
     - goal y
     - -0.30
     - 0.30
   * - 2
     - 1
     - goal z
     - 0.795
     - 0.95

``achieved_goal`` — Box(3,). Current cube XYZ.

Rewards
-------

**Sparse** (required for HER):

.. code-block:: text

   reward = 0.0 if ‖cube − goal‖ < reach_tolerance else -1.0

**Dense** with ``multi_goal=True`` and grasp-aware shaping (default for std env):

.. code-block:: text

   reward = -multiplier_dist_reward * ‖cube − current_goal‖
          + (grasp_bonus if is_grasped else 0)
          + reached_goal_reward   if ‖cube − pnp_goal‖ < reach_tolerance
          + step_reward
          + joint_limits_reward / none_exe_reward / not_within_goal_space_reward

When ``multi_goal=True``, ``current_goal`` is the lift target
(``cube_init + [0, 0, lift_height]`` with ``lift_height = 0.15 m``) until
``intermediate_reached`` flips on (when
``‖cube − intermediate_goal‖ < reach_tolerance``); after that,
``current_goal = pnp_goal``.

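
The lift-then-place switch can be sketched in plain Python. This is a minimal
illustration under stated assumptions, not the env's actual code: the class name
``GoalSwitch`` and its methods are hypothetical, and the ``REACH_TOLERANCE``
value of 0.02 m is a placeholder chosen only for the example. The flag is
assumed to latch once set, as the wording "after that" suggests.

.. code-block:: python

   import math

   LIFT_HEIGHT = 0.15       # m, lift_height from the text
   REACH_TOLERANCE = 0.02   # m, assumed placeholder (not the env default)


   class GoalSwitch:
       """Hypothetical helper: picks the current goal for one episode."""

       def __init__(self, cube_init, pnp_goal, multi_goal=True):
           x, y, z = cube_init
           self.intermediate_goal = (x, y, z + LIFT_HEIGHT)
           self.pnp_goal = tuple(pnp_goal)
           self.multi_goal = multi_goal
           self.intermediate_reached = False

       def current_goal(self, cube_pos):
           """Return the lift target until it is reached, then the place goal."""
           if not self.multi_goal:
               return self.pnp_goal
           if not self.intermediate_reached:
               dist = math.dist(cube_pos, self.intermediate_goal)
               if dist < REACH_TOLERANCE:
                   # Assumed to latch for the rest of the episode.
                   self.intermediate_reached = True
           if self.intermediate_reached:
               return self.pnp_goal
           return self.intermediate_goal

With the default spawn ``(0.500, -0.150, 0.795)``, the lift target in this
sketch sits at z = 0.945 until the cube comes within the tolerance of it.
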
``grasp_finger_thresh = 0.30`` rad (knuckle floor — closed enough to be grasping
something). ``grasp_dist_thresh = 0.05`` m (EE-to-cube ceiling).

Code example:

.. code-block:: python

   import gymnasium as gym
   # The package that registers the UR5ePnP* IDs must be imported before gym.make.
   env = gym.make("UR5ePnPSim-v0", reward_type="Dense", multi_goal=True)
   goal_env = gym.make("UR5ePnPGoalSim-v0", reward_type="Sparse", multi_goal=True)

Starting State
--------------

Joint pose: folded upright (same as :doc:`reach`).

Gripper: **OPEN** (``init_open_gripper = [0.0]`` rad — opposite of push, where
the gripper resets closed).

**Cube spawn.** 4 cm red cube at default ``(0.500, -0.150, 0.795)`` in world
frame; randomised XY within the spawn box if ``random_cube_spawn=True``. The
cube is removed and re-spawned each reset.

**Goal sampling.** ``pnp_goal`` ∈ Box(3,) sampled from
``position_goal_min/max``. With ``multi_goal=True``,
``intermediate_goal = cube_init + [0, 0, 0.15]``.

Episode End
-----------

**Truncation.** ``max_episode_steps`` (default 100). Real env also aborts on
stale ``/joint_states``.

**Termination.** ``‖cube − pnp_goal‖ < reach_tolerance`` (sparse path only).

Arguments
---------

Inherits all kwargs from :doc:`reach` and :doc:`push`, plus pnp-specific:

.. list-table::
   :widths: 24 14 62
   :header-rows: 1

   * - Kwarg
     - Default
     - Meaning
   * - ``multi_goal``
     - ``True``
     - Enable the lift-then-place curriculum (``intermediate_goal`` reached →
       switch to ``pnp_goal``).
   * - ``lift_height``
     - ``0.15`` (m)
     - Vertical offset of the intermediate lift goal above the cube spawn
       position.

Version History
---------------

* ``v0`` — first release (``rl_environments`` v0.1.0). Inverted grasp comparison
  (``knuckle_pos > grasp_finger_thresh``) + single-element gripper publish —
  both UR5e-specific vs the prismatic-finger robots.
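
The inverted grasp comparison can be illustrated with a short, self-contained
sketch. The function signature below is hypothetical and written only for this
page; in particular, inferring the closing direction by comparing
``gripper_open_value`` with ``gripper_closed_value`` is an assumption about how
those rosparams might be used, not a description of the env's code. The
thresholds are the values quoted in the Rewards section.

.. code-block:: python

   import math

   GRASP_DIST_THRESH = 0.05     # m, EE-to-cube ceiling
   GRASP_FINGER_THRESH = 0.30   # rad, knuckle floor for the Robotiq 2F-85


   def is_grasped(cube_pos, ee_pos, knuckle_pos,
                  gripper_open_value=0.0, gripper_closed_value=0.8,
                  finger_thresh=GRASP_FINGER_THRESH):
       """Return 1.0 when the cube is near the EE and the gripper is closed."""
       near = math.dist(cube_pos, ee_pos) < GRASP_DIST_THRESH
       if gripper_closed_value > gripper_open_value:
           # UR5e / Robotiq: the knuckle position grows as the gripper closes.
           closed = knuckle_pos > finger_thresh
       else:
           # Assumed convention for grippers whose value shrinks when closing.
           closed = knuckle_pos < finger_thresh
       return 1.0 if (near and closed) else 0.0

For example, a cube 2 cm from the EE with the knuckle at 0.45 rad counts as
grasped in this sketch, while the same cube with the knuckle at 0.0 rad (fully
open) does not.
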