UR5e — Reach
============

The arm must move its end-effector to a 3D target sampled in the workspace
above the cafe-table. No cube; the gripper is not commanded. The achieved goal
is the EE position; the desired goal is the sampled target.

This page covers four registered Gymnasium env IDs:

* ``UR5eReacherSim-v0``: standard, Gazebo
* ``UR5eReacherGoalSim-v0``: goal-conditioned (HER), Gazebo
* ``UR5eReacherReal-v0``: standard, real hardware
* ``UR5eReacherGoalReal-v0``: goal-conditioned, real hardware

Description
-----------

A UR5e arm with a Robotiq 2F-85 gripper sits on a 4-legged ``ur5_base`` (top at
z = 0.59) next to a ``cafe_table`` workspace at world (0.7, 0, 0). The agent
commands joint-space deltas (default) or absolute joint positions, or
alternatively end-effector position deltas in EE mode
(``ee_action_type=True``). Every commanded action is checked link-by-link
against the workspace floor before being published. Actions that would dip a
link below the safety floor or into the cafe-table footprint are rejected with
a penalty reward.

The env loop runs at ``environment_loop_rate`` (default 10 Hz). In real-time
mode (``realtime_mode=True``, the default), Gazebo physics is never paused;
``step()`` reads the latest cached obs / reward / done values. Otherwise the
standard MDP loop pauses physics around each action.

Action Space
------------

**Joint mode** (default, ``ee_action_type=False``). Box(6,):

.. list-table::
   :widths: 6 28 12 12 30 12
   :header-rows: 1

   * - Num
     - Action
     - Min
     - Max
     - Joint
     - Unit
   * - 0
     - shoulder pan delta (or absolute, per ``delta_action``)
     - -3.14
     - +3.14
     - ``shoulder_pan_joint``
     - rad
   * - 1
     - shoulder lift delta
     - -3.14
     - +3.14
     - ``shoulder_lift_joint``
     - rad
   * - 2
     - elbow delta
     - -3.14
     - +3.14
     - ``elbow_joint``
     - rad
   * - 3
     - wrist 1 delta
     - -3.14
     - +3.14
     - ``wrist_1_joint``
     - rad
   * - 4
     - wrist 2 delta
     - -3.14
     - +3.14
     - ``wrist_2_joint``
     - rad
   * - 5
     - wrist 3 delta
     - -3.14
     - +3.14
     - ``wrist_3_joint``
     - rad

When ``delta_action=True`` (default), the action is scaled by
``delta_coeff = 0.05`` and added to the current joint position. When
``delta_action=False`` the action is the absolute joint target, clipped to the
box bounds.

**EE mode** (``ee_action_type=True``). Box(3,), ΔEE position in the robot's
base frame:

.. list-table::
   :widths: 6 28 12 12 42
   :header-rows: 1

   * - Num
     - Action
     - Min
     - Max
     - Notes
   * - 0
     - Δx (or absolute x)
     - -0.90
     - 1.20
     - EE x in base_link frame
   * - 1
     - Δy (or absolute y)
     - -0.90
     - 0.90
     - EE y
   * - 2
     - Δz (or absolute z)
     - 0.00
     - 1.50
     - EE z

The env solves IK against the target EE pose, then publishes the joint-space
trajectory through the same per-link safety check as joint mode.

Observation Space
-----------------

**Standard env** (``UR5eReacherSim-v0`` / ``UR5eReacherReal-v0``). Box(27,) by
default (24 if ``ee_action_type=True``):

.. list-table::
   :widths: 8 16 32 32 12
   :header-rows: 1

   * - Idx
     - Dim
     - Component
     - Source
     - Unit
   * - 0–2
     - 3
     - EE position
     - MoveIt FK
     - m
   * - 3–5
     - 3
     - Unit vector EE → goal
     - normalized
     - unitless
   * - 6
     - 1
     - Euclidean distance EE → goal
     - ‖goal − ee‖
     - m
   * - 7–13
     - 7
     - Current joint positions
     - ``/ur5e/joint_states.position``
     - rad
   * - 14–19
     - 6 (or 3)
     - Previous action
     - cached
     - matches action space
   * - 20–26
     - 7
     - Current joint velocities
     - ``/ur5e/joint_states.velocity``
     - rad/s

The 7-element joint vectors are in alphabetical order from ``/joint_states``:
``elbow_joint``, ``robotiq_85_left_knuckle_joint``, ``shoulder_lift_joint``,
``shoulder_pan_joint``, ``wrist_1_joint``, ``wrist_2_joint``,
``wrist_3_joint``.

**Goal env** (``UR5eReacherGoalSim-v0`` / ``UR5eReacherGoalReal-v0``).
Gymnasium ``Dict`` with three keys:

``observation``: Box(24,) (or 21 in EE mode). Same as the standard env's Box
minus the EE→goal feature columns (no goal info leaks into the policy's plain
observation).

``desired_goal``: Box(3,). Sampled target XYZ in base frame:

.. list-table::
   :widths: 8 16 32 32 12
   :header-rows: 1

   * - Idx
     - Dim
     - Component
     - Min
     - Max
   * - 0
     - 1
     - goal x
     - 0.40
     - 0.80
   * - 1
     - 1
     - goal y
     - -0.30
     - 0.30
   * - 2
     - 1
     - goal z
     - 0.85
     - 1.10

``achieved_goal``: Box(3,). Current EE XYZ (same coordinate frame as
``desired_goal``).

Rewards
-------

The env supports two reward modes selected by the ``reward_type`` kwarg.

**Sparse** (``reward_type="Sparse"``, required for HER on goal envs):

.. code-block:: text

   reward = 0.0 if ‖ee − goal‖ < reach_tolerance else -1.0

**Dense** (``reward_type="Dense"``, default for the standard env):

.. code-block:: text

   reward = -multiplier_dist_reward * ‖ee − goal‖   # step shaping
          + reached_goal_reward            if ‖ee − goal‖ < reach_tolerance
          + step_reward                    every step
          + joint_limits_reward            if action outside joint bounds
          + none_exe_reward                if MoveIt plan / FK safety rejects
          + not_within_goal_space_reward   if goal sampling failed

Defaults (from ``config/ur5e_reach_task_config.yaml``):
``reach_tolerance=0.02``, ``multiplier_dist_reward=2.0``,
``reached_goal_reward=20``, ``step_reward=-0.5``,
``joint_limits_reward=-2.0``, ``none_exe_reward=-5.0``,
``not_within_goal_space_reward=-2.0``.

Code example:

.. code-block:: python

   import uniros as gym
   import rl_environments  # noqa: F401 (triggers registration)

   # Standard env, dense reward
   env = gym.make("UR5eReacherSim-v0", reward_type="Dense")

   # Goal env, sparse reward (HER)
   env = gym.make("UR5eReacherGoalSim-v0", reward_type="Sparse")

Starting State
--------------

Initial joint pose (folded upright, set via
``gazebo_msgs/SetModelConfiguration`` while Gazebo is paused, then unpaused):

.. code-block:: text

   shoulder_pan_joint  =  0.000
   shoulder_lift_joint = -1.5707   (-90°, upper arm vertical up)
   elbow_joint         =  1.5707   (+90°, forearm horizontal forward)
   wrist_1_joint       = -1.5707
   wrist_2_joint       = -1.5707
   wrist_3_joint       =  0.000

The arm's all-zeros URDF pose puts it horizontal at base height (z = 0.59),
colliding with the cafe-table column at x = 0.7. The folded pose above puts
the EE over the workspace at roughly (0.40, 0, 0.95) in world coordinates.

**Goal sampling.** Each ``reset()`` draws a fresh ``desired_goal`` ∈ Box(3,)
from ``[position_goal_min, position_goal_max]``:

.. code-block:: text

   x ∈ [0.40, 0.80]
   y ∈ [-0.30, 0.30]
   z ∈ [0.85, 1.10]

This box sits above the cafe-table top (z = 0.775) and within the UR5e's
≈ 0.85 m reach from the arm base at (0, 0, 0.59).

Episode End
-----------

**Truncation.** Episodes truncate after ``max_episode_steps`` (default 100,
set at registration time; override via the ``TimeLimitWrapper`` in the train
scripts).
Episodes also end (terminate or truncate) if the joint-state staleness gate
fires on the real env, that is, if ``/joint_states`` has not been updated for
``joint_state_timeout_s = 0.5`` seconds.

**Termination.** The episode terminates when the EE reaches the goal
(``‖ee − goal‖ < reach_tolerance``). Termination is *only* set on the
sparse-reward path; on dense reward the agent keeps accumulating the shaping
signal even after reaching the goal, until the time limit.

Arguments
---------

Top-level kwargs to ``gym.make("UR5eReacher*-v0", ...)``. All have sensible
defaults; only ``gazebo_gui`` and ``reward_type`` are commonly overridden.

.. list-table::
   :widths: 24 14 62
   :header-rows: 1

   * - Kwarg
     - Default
     - Meaning
   * - ``seed``
     - ``None``
     - RNG seed for goal sampling.
   * - ``gazebo_gui``
     - ``False``
     - Set ``True`` to launch Gazebo with the GUI.
   * - ``reward_type``
     - ``"Dense"`` (standard) / ``"Sparse"`` (goal)
     - One of ``"Sparse"`` or ``"Dense"``.
   * - ``ee_action_type``
     - ``False``
     - ``True`` → Box(3,) EE action; ``False`` → Box(6,) joint action.
   * - ``delta_action``
     - ``True``
     - ``True`` → action interpreted as delta (× ``delta_coeff``);
       ``False`` → action is the absolute target.
   * - ``delta_coeff``
     - ``0.05``
     - Scale factor when ``delta_action=True``.
   * - ``environment_loop_rate``
     - ``10.0``
     - Rate in Hz of the internal env loop / obs cache update.
   * - ``action_cycle_time``
     - ``0.5``
     - Seconds the env waits between actions. Must be
       ≥ 1 / ``environment_loop_rate``.
   * - ``action_speed``
     - ``0.2`` (sim) / configurable (real)
     - Time the controller has to interpolate to the commanded joint target.
   * - ``realtime_mode``
     - ``True``
     - ``True`` → UniROS real-time loop (physics never paused).
       ``False`` → MDP-style pause-step-resume.
   * - ``use_kinect``
     - ``False``
     - Opt-in subscription to ``/head_mount_kinect2/*`` for RGB / depth.
   * - ``log_internal_state``
     - ``False``
     - Verbose ``rospy.loginfo`` output for debugging.
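
The ``delta_action`` / ``delta_coeff`` semantics and the ``action_cycle_time``
constraint listed above can be sketched in plain Python. This is a minimal
illustration under stated assumptions, not the environment's implementation:
the helper names ``resolve_joint_target`` and ``timing_is_valid`` are
hypothetical, and the ±3.14 rad bounds are taken from the joint-mode action
table.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration only; not part of rl_environments.


def resolve_joint_target(
    current: Sequence[float],
    action: Sequence[float],
    delta_action: bool = True,
    delta_coeff: float = 0.05,
    low: float = -3.14,
    high: float = 3.14,
) -> List[float]:
    """Turn a raw joint-mode action into a joint target."""
    if delta_action:
        # Delta mode: scale the action and add it to the current position.
        # Whether the env also clips the resulting target is not documented,
        # so this sketch leaves it unclipped.
        return [q + delta_coeff * a for q, a in zip(current, action)]
    # Absolute mode: the action is the target, clipped to the box bounds.
    return [min(max(a, low), high) for a in action]


def timing_is_valid(action_cycle_time: float, environment_loop_rate: float) -> bool:
    """Check the documented constraint action_cycle_time >= 1 / loop rate."""
    return action_cycle_time >= 1.0 / environment_loop_rate
```

With the defaults, a full-scale action of 1.0 moves each joint target by
0.05 rad per step, and ``action_cycle_time=0.5`` satisfies the constraint at
10 Hz because one loop period is 0.1 s.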

Real-only kwargs (``UR5eReacher*Real-v0``): these envs inherit the above, plus
the ``--allow-real-robot-motion`` gate enforced by
``rl_training_validation.utils.env_safety.check_env_constructable``.

Version History
---------------

* ``v0``: first release (``rl_environments`` v0.1.0). Per-link FK safety
  check; ``SetModelConfiguration`` init-pose path; 27-dim Box obs (standard)
  or 24-dim Box plus two 3-dim Boxes (goal).
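
The 27-dim size of the standard observation follows from the component table
in the Observation Space section. The arithmetic can be sketched as below; the
component names and the helper ``standard_obs_layout`` are hypothetical labels
chosen for illustration, and only the standard (non-goal) env is covered.

```python
from typing import Dict, Tuple

# Hypothetical labels for the documented components of the standard Box obs.
# A dim of None marks the previous action, whose size follows the action space.
COMPONENTS = [
    ("ee_position", 3),
    ("unit_vector_to_goal", 3),
    ("distance_to_goal", 1),
    ("joint_positions", 7),
    ("previous_action", None),
    ("joint_velocities", 7),
]


def standard_obs_layout(
    ee_action_type: bool = False,
) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """Return ({name: (start, stop)}, total_dim) with half-open index ranges."""
    action_dim = 3 if ee_action_type else 6  # Box(3,) EE vs Box(6,) joint action
    layout: Dict[str, Tuple[int, int]] = {}
    start = 0
    for name, dim in COMPONENTS:
        dim = action_dim if dim is None else dim
        layout[name] = (start, start + dim)
        start += dim
    return layout, start


layout, total = standard_obs_layout()
print(total, layout["joint_velocities"])
```

In joint mode the total is 3 + 3 + 1 + 7 + 6 + 7 = 27 and the joint
velocities occupy indices 20 to 26; in EE mode the previous action shrinks to
3 entries and the total becomes 24.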