UR5e — Reach

The arm must move its end-effector to a 3D target sampled in the workspace above the cafe-table. No cube; the gripper is not commanded. The achieved goal is the EE position; the desired goal is the sampled target.

This page covers four registered Gymnasium env IDs:

  • UR5eReacherSim-v0 — standard, Gazebo

  • UR5eReacherGoalSim-v0 — goal-conditioned (HER), Gazebo

  • UR5eReacherReal-v0 — standard, real hardware

  • UR5eReacherGoalReal-v0 — goal-conditioned, real hardware

Description

A UR5e arm with a Robotiq 2F-85 gripper sits on a 4-legged ur5_base (top at z = 0.59) next to a cafe_table workspace at world (0.7, 0, 0). The agent commands joint-space deltas (default) or absolute joint positions, or alternatively end-effector position deltas in EE mode (ee_action_type=True). Every commanded action is checked link-by-link against the workspace floor before being published — actions that would dip a link below the safety floor or into the cafe-table footprint are rejected with a penalty reward.
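
The per-link check described above can be pictured with a minimal sketch. It assumes the link positions have already been computed by forward kinematics in world coordinates. The table position and top height come from this page; the floor height, the footprint half-width, and the helper name links_are_safe are hypothetical values chosen for illustration, not the env's actual parameters.

```python
import numpy as np

# Values taken from this page.
TABLE_CENTER_XY = np.array([0.7, 0.0])  # cafe_table position in world x, y
TABLE_TOP_Z = 0.775                     # cafe-table top height

# Hypothetical values, chosen only for illustration.
SAFETY_FLOOR_Z = 0.0      # assumed height of the safety floor
TABLE_HALF_EXTENT = 0.45  # assumed half-width of the table footprint


def links_are_safe(link_positions):
    """Return True if no link is below the floor or inside the table volume."""
    links = np.asarray(link_positions, dtype=float).reshape(-1, 3)
    below_floor = links[:, 2] < SAFETY_FLOOR_Z
    # A link is over the table if its x, y fall inside the square footprint.
    over_table = np.all(
        np.abs(links[:, :2] - TABLE_CENTER_XY) <= TABLE_HALF_EXTENT, axis=1
    )
    in_table = over_table & (links[:, 2] < TABLE_TOP_Z)
    return not bool(np.any(below_floor | in_table))
```

An action whose predicted link positions fail this kind of test would be the one that is rejected and penalized rather than published.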

The env loop runs at environment_loop_rate (default 10 Hz). In real-time mode (realtime_mode=True, the default), Gazebo physics is never paused; step() reads the latest cached obs / reward / done values. Otherwise the standard MDP loop pauses physics around each action.

Action Space

Joint mode (default, ee_action_type=False). Box(6,):

Num  Action                                              Min    Max    Joint                Unit
0    shoulder pan delta (or absolute, per delta_action)  -3.14  +3.14  shoulder_pan_joint   rad
1    shoulder lift delta                                 -3.14  +3.14  shoulder_lift_joint  rad
2    elbow delta                                         -3.14  +3.14  elbow_joint          rad
3    wrist 1 delta                                       -3.14  +3.14  wrist_1_joint        rad
4    wrist 2 delta                                       -3.14  +3.14  wrist_2_joint        rad
5    wrist 3 delta                                       -3.14  +3.14  wrist_3_joint        rad

When delta_action=True (default), the action is scaled by delta_coeff = 0.05 and added to the current joint position. When delta_action=False the action is the absolute joint target, clipped to the box bounds.
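
As a rough illustration of the two joint-mode interpretations, the sketch below turns an action into a joint target. The helper name joint_target is hypothetical, and whether the env also clips the action before scaling in delta mode is not stated here, so no such clipping is shown.

```python
import numpy as np

# Box bounds from the joint-mode action table (radians).
ACTION_LOW = np.full(6, -3.14)
ACTION_HIGH = np.full(6, 3.14)


def joint_target(action, current_joints, delta_action=True, delta_coeff=0.05):
    """Hypothetical helper: turn a policy action into a joint-position target."""
    action = np.asarray(action, dtype=float)
    if delta_action:
        # Delta mode: scale the action and add it to the current joint position.
        return np.asarray(current_joints, dtype=float) + delta_coeff * action
    # Absolute mode: the action is the target, clipped to the box bounds.
    return np.clip(action, ACTION_LOW, ACTION_HIGH)


current = np.array([0.0, -1.5707, 1.5707, -1.5707, -1.5707, 0.0])
delta_target = joint_target(np.ones(6), current)                         # each joint moves by 0.05 rad
abs_target = joint_target(np.full(6, 4.0), current, delta_action=False)  # clipped to 3.14
```

With the default delta_coeff, an action of 1.0 on a joint therefore corresponds to a 0.05 rad step.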

EE mode (ee_action_type=True). Box(3,) — ΔEE position in the robot’s base frame:

Num  Action              Min    Max   Notes
0    Δx (or absolute x)  -0.90  1.20  EE x in base_link frame
1    Δy (or absolute y)  -0.90  0.90  EE y
2    Δz (or absolute z)  0.00   1.50  EE z

The env solves IK against the target EE pose, then publishes the joint-space trajectory through the same per-link safety check as joint mode.

Observation Space

Standard env (UR5eReacherSim-v0 / UR5eReacherReal-v0). Box(27,) by default (24 if ee_action_type=True):

Idx    Dim       Component                     Source                       Unit
0–2    3         EE position                   MoveIt FK                    m
3–5    3         Unit vector EE → goal         normalized                   unitless
6      1         Euclidean distance EE → goal  ‖goal − ee‖                  m
7–13   7         Current joint positions       /ur5e/joint_states.position  rad
14–19  6 (or 3)  Previous action               cached                       matches action space
20–26  7         Current joint velocities      /ur5e/joint_states.velocity  rad/s

The 7-element joint vectors are in alphabetical order from /joint_states: elbow_joint, robotiq_85_left_knuckle_joint, shoulder_lift_joint, shoulder_pan_joint, wrist_1_joint, wrist_2_joint, wrist_3_joint.
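
Because the observation uses alphabetical joint order while the joint-mode action uses kinematic order, it can help to remap one onto the other. The sketch below assumes the standard env in joint mode (27-dim observation, joint positions at indices 7–13); the helper name arm_joints_in_action_order is hypothetical.

```python
import numpy as np

# Alphabetical order, as listed above for the 7-element joint vectors.
JOINT_STATE_ORDER = [
    "elbow_joint",
    "robotiq_85_left_knuckle_joint",
    "shoulder_lift_joint",
    "shoulder_pan_joint",
    "wrist_1_joint",
    "wrist_2_joint",
    "wrist_3_joint",
]

# Order of the six arm joints in the joint-mode action table.
ACTION_ORDER = [
    "shoulder_pan_joint",
    "shoulder_lift_joint",
    "elbow_joint",
    "wrist_1_joint",
    "wrist_2_joint",
    "wrist_3_joint",
]

# Index into the 7-vector for each arm joint, in action order.
ARM_INDEX = [JOINT_STATE_ORDER.index(name) for name in ACTION_ORDER]


def arm_joints_in_action_order(joint_vector):
    """Pick the six arm joints out of a 7-element joint vector, dropping the gripper."""
    return np.asarray(joint_vector, dtype=float)[ARM_INDEX]


obs = np.arange(27.0)          # stand-in observation whose values equal their indices
joint_positions = obs[7:14]    # 7 joint positions, alphabetical order
arm = arm_joints_in_action_order(joint_positions)
```

The gripper knuckle joint (alphabetical index 1) is the only entry dropped by the remapping.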

Goal env (UR5eReacherGoalSim-v0 / UR5eReacherGoalReal-v0). Gymnasium Dict with three keys:

observation — Box(24,) (or 21 in EE mode). Same as the standard env’s Box minus the EE→goal feature columns (no goal info leaks into the policy’s plain observation).

desired_goal — Box(3,). Sampled target XYZ in base frame:

Idx  Dim  Component  Min    Max
0    1    goal x     0.40   0.80
1    1    goal y     -0.30  0.30
2    1    goal z     0.85   1.10

achieved_goal — Box(3,). Current EE XYZ (same coordinate frame as desired_goal).
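
Since achieved_goal and desired_goal share a frame, the EE → goal direction and distance that the standard env exposes at indices 3–6 can be recomputed from the Dict when needed. This is a minimal sketch: the helper name goal_features and the eps guard for a zero-length offset are assumptions, not the env's own implementation.

```python
import numpy as np


def goal_features(achieved_goal, desired_goal, eps=1e-8):
    """Return (unit vector EE -> goal, Euclidean distance) from the two goal entries."""
    offset = np.asarray(desired_goal, dtype=float) - np.asarray(achieved_goal, dtype=float)
    distance = float(np.linalg.norm(offset))
    # Guard against division by zero when the EE sits exactly on the goal.
    direction = offset / distance if distance > eps else np.zeros(3)
    return direction, distance


# Stand-in for the goal entries of a goal-env observation.
obs = {
    "achieved_goal": np.array([0.40, 0.0, 0.95]),
    "desired_goal": np.array([0.70, 0.0, 0.95]),
}
direction, distance = goal_features(obs["achieved_goal"], obs["desired_goal"])
```

In this example the goal lies 0.30 m ahead of the EE along x, so the direction is the unit x vector.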

Rewards

The env supports two reward modes selected by the reward_type kwarg.

Sparse (reward_type="Sparse", required for HER on goal envs):

reward = 0.0  if ‖ee − goal‖ < reach_tolerance else -1.0
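
HER relabels goals after the fact, so the sparse reward has to be recomputable from (achieved, desired) pairs, usually in batches. The sketch below shows one way to write the rule above so that it accepts either a single goal or a batch; the function name sparse_reward is hypothetical, and the default tolerance of 0.02 is the value from the task config.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, reach_tolerance=0.02):
    """0.0 inside the tolerance ball, -1.0 otherwise; works on single goals or batches."""
    achieved = np.asarray(achieved_goal, dtype=float)
    desired = np.asarray(desired_goal, dtype=float)
    # Norm over the last axis so a (N, 3) batch yields N rewards.
    distance = np.linalg.norm(achieved - desired, axis=-1)
    return np.where(distance < reach_tolerance, 0.0, -1.0)


single = sparse_reward([0.5, 0.0, 0.90], [0.5, 0.0, 0.91])   # 1 cm away: success
batch = sparse_reward(
    [[0.5, 0.0, 0.9], [0.5, 0.0, 0.9]],
    [[0.5, 0.0, 0.9], [0.6, 0.0, 0.9]],
)                                                            # one hit, one miss
```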

Dense (reward_type="Dense", default for the standard env):

reward = -multiplier_dist_reward * ‖ee − goal‖   # step shaping
       + reached_goal_reward     if ‖ee − goal‖ < reach_tolerance
       + step_reward             every step
       + joint_limits_reward     if action outside joint bounds
       + none_exe_reward         if MoveIt plan / FK safety rejects
       + not_within_goal_space_reward  if goal sampling failed

Defaults (from config/ur5e_reach_task_config.yaml): reach_tolerance=0.02, multiplier_dist_reward=2.0, reached_goal_reward=20, step_reward=-0.5, joint_limits_reward=-2.0, none_exe_reward=-5.0, not_within_goal_space_reward=-2.0.
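
To make the dense formula concrete, here is a small sketch that sums the terms with the defaults listed above. The function name dense_reward and the three boolean flags are hypothetical stand-ins for whatever the env tracks internally.

```python
def dense_reward(
    distance,
    joint_limits_violated=False,
    action_rejected=False,
    goal_sampling_failed=False,
    reach_tolerance=0.02,
    multiplier_dist_reward=2.0,
    reached_goal_reward=20.0,
    step_reward=-0.5,
    joint_limits_reward=-2.0,
    none_exe_reward=-5.0,
    not_within_goal_space_reward=-2.0,
):
    """Sum the dense-reward terms for one step."""
    reward = -multiplier_dist_reward * distance  # distance shaping
    reward += step_reward                        # applied every step
    if distance < reach_tolerance:
        reward += reached_goal_reward            # bonus inside the tolerance ball
    if joint_limits_violated:
        reward += joint_limits_reward            # action outside joint bounds
    if action_rejected:
        reward += none_exe_reward                # MoveIt plan / FK safety rejected
    if goal_sampling_failed:
        reward += not_within_goal_space_reward   # goal sampling failed
    return reward


far = dense_reward(0.10)                             # -2.0 * 0.10 - 0.5
reached = dense_reward(0.01)                         # -2.0 * 0.01 - 0.5 + 20
rejected = dense_reward(0.10, action_rejected=True)  # -2.0 * 0.10 - 0.5 - 5.0
```

Under these defaults a step 10 cm from the goal costs about 0.7, while a step inside the 2 cm tolerance yields about 19.48.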

Code example:

import uniros as gym
import rl_environments  # noqa: F401  (triggers registration)

# Standard env, dense reward
env = gym.make("UR5eReacherSim-v0", reward_type="Dense")
# Goal env, sparse reward (HER)
env = gym.make("UR5eReacherGoalSim-v0", reward_type="Sparse")

Starting State

Initial joint pose (folded upright, set via gazebo_msgs/SetModelConfiguration while Gazebo is paused, then unpaused):

shoulder_pan_joint  =  0.000
shoulder_lift_joint = -1.5707  (-90°, upper arm vertical up)
elbow_joint         =  1.5707  (+90°, forearm horizontal forward)
wrist_1_joint       = -1.5707
wrist_2_joint       = -1.5707
wrist_3_joint       =  0.000

The arm’s all-zeros URDF pose puts it horizontal at base height (z = 0.59), colliding with the cafe-table column at x = 0.7. The folded pose above puts the EE over the workspace at roughly (0.40, 0, 0.95) in world coordinates.

Goal sampling. Each reset() draws a fresh desired_goal ∈ Box(3,) from [position_goal_min, position_goal_max]:

x ∈ [0.40, 0.80]   y ∈ [-0.30, 0.30]   z ∈ [0.85, 1.10]

This box sits above the cafe-table top (z = 0.775) and within the UR5e’s ≈ 0.85 m reach from the arm base at (0, 0, 0.59).
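
A minimal sketch of the goal draw, assuming the target is sampled uniformly and independently per axis (the sampling distribution is not stated on this page). The constant names mirror position_goal_min / position_goal_max; the helper name sample_goal is hypothetical.

```python
import numpy as np

# Bounds of the goal box listed above (x, y, z).
POSITION_GOAL_MIN = np.array([0.40, -0.30, 0.85])
POSITION_GOAL_MAX = np.array([0.80, 0.30, 1.10])


def sample_goal(rng):
    """Draw one target uniformly from the axis-aligned goal box."""
    return rng.uniform(POSITION_GOAL_MIN, POSITION_GOAL_MAX)


rng = np.random.default_rng(0)  # a fixed seed gives a reproducible goal sequence
goal = sample_goal(rng)
```

Passing the seed kwarg to the env plays the same role as seeding rng here.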

Episode End

Truncation. Episodes truncate after max_episode_steps (default 100, set at registration time; override via the TimeLimitWrapper in the train scripts). Episodes also terminate / truncate if the joint-state staleness gate fires on the real env (/joint_states not updated for joint_state_timeout_s = 0.5 seconds).

Termination. The episode terminates when the EE reaches the goal (‖ee − goal‖ < reach_tolerance). Termination is set only on the sparse-reward path; with dense reward the agent keeps accumulating the shaping signal after reaching the goal, until the time limit.
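
The two end conditions can be summarized in a short sketch. It covers only the goal-reached and time-limit cases, not the joint-state staleness gate; the function name episode_flags is hypothetical.

```python
def episode_flags(distance, step_count, reward_type="Sparse",
                  reach_tolerance=0.02, max_episode_steps=100):
    """Return (terminated, truncated) for the current step."""
    reached = distance < reach_tolerance
    # Reaching the goal ends the episode only on the sparse-reward path.
    terminated = reached and reward_type == "Sparse"
    # The time limit applies to both reward modes.
    truncated = step_count >= max_episode_steps
    return terminated, truncated


sparse_hit = episode_flags(0.01, 10, reward_type="Sparse")
dense_hit = episode_flags(0.01, 10, reward_type="Dense")
timed_out = episode_flags(0.50, 100, reward_type="Dense")
```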

Arguments

Top-level kwargs to gym.make("UR5eReacher*-v0", ...). All have sensible defaults; only gazebo_gui and reward_type are commonly overridden.

Kwarg                  Default                               Meaning
seed                   None                                  RNG seed for goal sampling.
gazebo_gui             False                                 Set True to launch Gazebo with the GUI.
reward_type            "Dense" (standard) / "Sparse" (goal)  One of "Sparse" or "Dense".
ee_action_type         False                                 True → Box(3,) EE action; False → Box(6,) joint action.
delta_action           True                                  True → action interpreted as delta (× delta_coeff); False → action is the absolute target.
delta_coeff            0.05                                  Scale factor when delta_action=True.
environment_loop_rate  10.0                                  Hz for the internal env loop / obs cache update.
action_cycle_time      0.5                                   Seconds the env waits between actions. Must be ≥ 1 / environment_loop_rate.
action_speed           0.2 (sim) / configurable (real)       Time the controller has to interpolate to the commanded joint target.
realtime_mode          True                                  True → UniROS real-time loop (physics never paused). False → MDP-style pause-step-resume.
use_kinect             False                                 Opt-in subscribe to /head_mount_kinect2/* for RGB / depth.
log_internal_state     False                                 Verbose rospy.loginfo for debugging.

Real-only kwargs (UR5eReacher*Real-v0): these envs inherit all of the above and additionally require the --allow-real-robot-motion gate enforced by rl_training_validation.utils.env_safety.check_env_constructable.
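
The constraint between action_cycle_time and environment_loop_rate can be checked before constructing an env. This is a sketch with a hypothetical helper name, ticks_per_action; it is not part of the package.

```python
def ticks_per_action(environment_loop_rate=10.0, action_cycle_time=0.5):
    """Validate the timing kwargs and return how many env-loop periods fit in one action cycle."""
    loop_period = 1.0 / environment_loop_rate  # seconds per env-loop tick
    if action_cycle_time < loop_period:
        raise ValueError(
            "action_cycle_time must be >= 1 / environment_loop_rate "
            f"({action_cycle_time} < {loop_period})"
        )
    return action_cycle_time / loop_period


default_ticks = ticks_per_action()  # defaults: 0.5 s cycle over a 0.1 s loop period
```

With the defaults, the observation cache is refreshed about five times per action.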

Version History

  • v0 — first release (rl_environments v0.1.0). Per-link FK safety check; SetModelConfiguration init-pose path; 27-dim Box obs (standard) or 24-dim Box + 3-dim Box × 2 (goal).