UR5e — Push

The arm must push a 4 cm cube across the cafe-table to a goal point on the table top. The Robotiq gripper is held closed throughout (it acts as a flat paddle); the gripper is not in the action vector. Achieved goal is the cube position; desired goal is the sampled target on the table top.

This page covers four registered Gymnasium env IDs:

UR5ePushSim-v0 — standard, Gazebo
UR5ePushGoalSim-v0 — goal-conditioned (HER), Gazebo
UR5ePushReal-v0 — standard, real hardware
UR5ePushGoalReal-v0 — goal-conditioned, real hardware

Description

Same hardware geometry as UR5e — Reach (UR5e on 4-legged ur5_base next to the cafe-table at world (0.7, 0)). At reset the arm folds up, the gripper closes, and a red cube spawns on the cafe-table top at z = 0.795. The agent commands joint-space (default) or EE-space deltas; the closed gripper makes contact with the cube and slides it toward the target. Episodes truncate at the step limit; on the sparse-reward path they also terminate once the cube reaches the goal point (‖cube − goal‖ < reach_tolerance). See Episode End.

The per-link FK safety check, the joint-state staleness gate, and the --allow-real-robot-motion gate on real hardware are all identical to the reach env.

Action Space

Joint mode (default, ee_action_type=False). Box(6,) — same 6-joint arm command as UR5e — Reach. The gripper is NOT in the action vector for push.

EE mode (ee_action_type=True). Box(3,) — Δ EE position. The env solves IK and publishes the joint-space trajectory through the same per-link FK safety check.

See UR5e — Reach for the per-joint Min/Max table (identical here).

Observation Space

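Standard env (``UR5ePushSim-v0`` / ``UR5ePushReal-v0``). The Box observation adds cube state on top of the arm state used by reach. The full layout below is for the default joint mode; the previous-action block is 6 wide there and 3 wide in EE mode, so the listed indices after it apply to joint mode only: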

Idx	Dim	Component	Source	Unit
0–2	3	EE position (base frame)	MoveIt FK	m
3–5	3	EE rpy	MoveIt	rad
6–8	3	Unit vector cube → goal	normalized	unitless
9	1	Euclidean distance cube → goal	‖goal − cube‖	m
10–16	7	Current joint positions	`/ur5e/joint_states.position`	rad
17–22	6 (or 3)	Previous action	cached	matches action space
23–29	7	Current joint velocities	`/ur5e/joint_states.velocity`	rad/s
30–32	3	Cube position (base frame)	Gazebo `get_model_state` (sim) / `/cube_pose` (real)	m
33–35	3	Cube rpy	same source	rad
36–38	3	Cube linear velocity (finite-diff)	cached + dt	m/s
39–41	3	Cube angular velocity (finite-diff)	cached + dt	rad/s
42–44	3	Cube position relative to EE	cube_pos − ee_pos	m
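
The layout above can be turned into a small slicing helper. This is a minimal sketch, not code from the package: the component names and the `split_observation` function are hypothetical, and it assumes that in EE mode the only change is a 3-wide previous-action block, with every later component shifting down by 3.

```python
from typing import Dict, List, Sequence, Tuple

# (name, width) pairs in table order. Names are illustrative, not package identifiers.
PUSH_OBS_LAYOUT: List[Tuple[str, int]] = [
    ("ee_pos", 3),
    ("ee_rpy", 3),
    ("cube_to_goal_unit", 3),
    ("cube_to_goal_dist", 1),
    ("joint_pos", 7),
    ("prev_action", 6),  # replaced by action_dim below (6 joint mode, 3 EE mode)
    ("joint_vel", 7),
    ("cube_pos", 3),
    ("cube_rpy", 3),
    ("cube_lin_vel", 3),
    ("cube_ang_vel", 3),
    ("cube_rel_ee", 3),
]


def split_observation(obs: Sequence[float], action_dim: int = 6) -> Dict[str, List[float]]:
    """Slice a flat standard-env observation into named components."""
    parts: Dict[str, List[float]] = {}
    start = 0
    for name, width in PUSH_OBS_LAYOUT:
        if name == "prev_action":
            width = action_dim
        parts[name] = list(obs[start:start + width])
        start += width
    if start != len(obs):
        raise ValueError(f"expected {start} values, got {len(obs)}")
    return parts
```

With a 45-element joint-mode observation, `split_observation(obs)["cube_pos"]` returns the values at indices 30 to 32, matching the table.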

Goal env (``UR5ePushGoalSim-v0`` / ``UR5ePushGoalReal-v0``). Dict with three keys:

observation — Box, same as the standard env’s Box minus the goal-related columns (the cube → goal unit vector and distance), so no goal information leaks into the policy’s plain observation.

desired_goal — Box(3,). Sampled XYZ target on the cafe-table top:

Idx	Dim	Component	Min	Max
0	1	goal x	0.40	0.80
1	1	goal y	-0.30	0.30
2	1	goal z	0.785	0.80

achieved_goal — Box(3,). Current cube XYZ (not EE — push tracks the cube). Same coordinate frame as desired_goal.

Rewards

Sparse (required for HER):

reward = 0.0  if ‖cube − goal‖ < reach_tolerance else -1.0
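
As a minimal sketch of the sparse rule, the function below takes the achieved and desired goals explicitly, which is the form HER needs when it recomputes rewards for relabelled goals. The function name is hypothetical; the 0.05 default for `reach_tolerance` is the value listed under Defaults in this section.

```python
import math
from typing import Sequence


def sparse_push_reward(
    achieved_goal: Sequence[float],
    desired_goal: Sequence[float],
    reach_tolerance: float = 0.05,
) -> float:
    """Return 0.0 when the cube is within tolerance of the goal, else -1.0."""
    dist = math.dist(achieved_goal, desired_goal)  # Euclidean distance in metres
    return 0.0 if dist < reach_tolerance else -1.0
```

For example, a cube 2 cm from the goal scores 0.0, while a cube 10 cm away scores -1.0.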

Dense (default for std env):

reward = -multiplier_dist_reward * ‖cube − goal‖
       + reached_goal_reward     if ‖cube − goal‖ < reach_tolerance
       + step_reward             every step
       + joint_limits_reward     if action outside joint bounds
       + none_exe_reward         if MoveIt plan / FK safety rejects
       + not_within_goal_space_reward  if goal sampling failed

Defaults (from config/ur5e_push_task_config.yaml): reach_tolerance=0.05, multiplier_dist_reward=2.0, reached_goal_reward=20, step_reward=-0.5, joint_limits_reward=-2.0, none_exe_reward=-5.0, not_within_goal_space_reward=-2.0.
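
The dense terms can be combined as in the sketch below. It is an illustration under assumptions, not the package's implementation: `PushRewardConfig`, `dense_push_reward`, and the three boolean flags are hypothetical names, and the terms are assumed to be purely additive, as the formula suggests.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PushRewardConfig:
    # Defaults copied from the list above.
    reach_tolerance: float = 0.05
    multiplier_dist_reward: float = 2.0
    reached_goal_reward: float = 20.0
    step_reward: float = -0.5
    joint_limits_reward: float = -2.0
    none_exe_reward: float = -5.0
    not_within_goal_space_reward: float = -2.0


def dense_push_reward(
    cube: Sequence[float],
    goal: Sequence[float],
    cfg: Optional[PushRewardConfig] = None,
    *,
    outside_joint_limits: bool = False,
    execution_rejected: bool = False,
    goal_sampling_failed: bool = False,
) -> float:
    """Sum the dense reward terms for one step (assumed additive)."""
    cfg = cfg or PushRewardConfig()
    dist = math.dist(cube, goal)
    reward = -cfg.multiplier_dist_reward * dist + cfg.step_reward
    if dist < cfg.reach_tolerance:
        reward += cfg.reached_goal_reward
    if outside_joint_limits:
        reward += cfg.joint_limits_reward
    if execution_rejected:  # MoveIt plan or FK safety check rejected the action
        reward += cfg.none_exe_reward
    if goal_sampling_failed:
        reward += cfg.not_within_goal_space_reward
    return reward
```

With the defaults, a cube 0.2 m from the goal on an otherwise clean step earns -2.0 × 0.2 − 0.5 = -0.9; a step on which the cube sits exactly on the goal earns 0 − 0.5 + 20 = 19.5.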

Starting State

Same folded-upright joint pose as UR5e — Reach. After the joint configuration is applied, the gripper is commanded closed (init_close_gripper = [0.7], the knuckle joint angle in rad).

Cube spawn. A 4 cm red cube spawns at world coordinates (0.500, -0.150, 0.795) by default. random_cube_spawn=True (default) randomises the XY within cube_init_pos ± random_offset. The cube model is removed and re-spawned at every reset to clear residual physics.
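
A minimal sketch of the spawn randomisation follows. The source does not give the value of random_offset or the sampling distribution, so the ±0.05 m offset and the uniform draw here are placeholders, and `sample_cube_spawn` is a hypothetical helper.

```python
import random
from typing import Tuple

CUBE_INIT_POS = (0.500, -0.150, 0.795)  # default world-frame spawn from the text


def sample_cube_spawn(
    rng: random.Random,
    random_offset: Tuple[float, float] = (0.05, 0.05),  # placeholder value (m)
    random_cube_spawn: bool = True,
) -> Tuple[float, float, float]:
    """Return a spawn position; only X and Y are perturbed, Z stays on the table."""
    x, y, z = CUBE_INIT_POS
    if not random_cube_spawn:
        return CUBE_INIT_POS
    dx, dy = random_offset
    return (x + rng.uniform(-dx, dx), y + rng.uniform(-dy, dy), z)
```

Passing a seeded `random.Random` keeps spawn positions reproducible across runs.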

Goal sampling. desired_goal ∈ Box(3,) is drawn between the position_goal_min and position_goal_max rosparams (the bounds tabulated under desired_goal in Observation Space). Goals always sit on the cafe-table top (z ≈ 0.795).
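
The goal draw can be sketched as below, using the desired_goal bounds from the Observation Space table. Uniform sampling per axis is an assumption, and `sample_goal` and `within_goal_space` are hypothetical helper names.

```python
import random
from typing import Sequence, Tuple

POSITION_GOAL_MIN = (0.40, -0.30, 0.785)  # x, y, z lower bounds (m)
POSITION_GOAL_MAX = (0.80, 0.30, 0.80)    # x, y, z upper bounds (m)


def sample_goal(rng: random.Random) -> Tuple[float, ...]:
    """Draw one goal per axis between the min and max bounds (uniform assumed)."""
    return tuple(rng.uniform(lo, hi) for lo, hi in zip(POSITION_GOAL_MIN, POSITION_GOAL_MAX))


def within_goal_space(goal: Sequence[float]) -> bool:
    """Check that every coordinate lies inside the goal box."""
    return all(lo <= g <= hi for g, lo, hi in zip(goal, POSITION_GOAL_MIN, POSITION_GOAL_MAX))
```

A check like `within_goal_space` is the kind of test that would gate the not_within_goal_space_reward penalty, though the source does not say how the env performs it.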

Episode End

Truncation. Episodes truncate after max_episode_steps (default 100). The real env additionally aborts the current loop tick if /ur5e/joint_states has been stale for longer than joint_state_timeout_s (0.5 s).
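
The staleness gate amounts to a timestamp comparison. The sketch below is an illustration only; `joint_state_is_stale` is a hypothetical name and the env's actual clock source is not stated in the source.

```python
def joint_state_is_stale(
    last_msg_time_s: float,
    now_s: float,
    joint_state_timeout_s: float = 0.5,
) -> bool:
    """True when the newest joint-state message is older than the timeout."""
    return (now_s - last_msg_time_s) > joint_state_timeout_s
```

Both timestamps should come from the same clock, for example `time.monotonic()` in a standalone script.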

Termination. The episode terminates when ‖cube − goal‖ < reach_tolerance, on the sparse-reward path only. With the dense reward, shaping continues past the goal until the time limit.
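
The two end conditions map onto Gymnasium's `terminated` and `truncated` flags roughly as follows. This is a sketch under assumptions: `episode_flags` and its `sparse` argument are hypothetical, and the step limit is assumed to trigger once the step count reaches max_episode_steps.

```python
from typing import Tuple


def episode_flags(
    dist_to_goal: float,
    step_count: int,
    *,
    sparse: bool,
    reach_tolerance: float = 0.05,
    max_episode_steps: int = 100,
) -> Tuple[bool, bool]:
    """Return (terminated, truncated) for the current step."""
    # Goal-reached termination applies on the sparse-reward path only.
    terminated = sparse and dist_to_goal < reach_tolerance
    # The time limit applies on both reward paths.
    truncated = step_count >= max_episode_steps
    return terminated, truncated
```

With `sparse=False`, a cube resting on the goal at step 10 yields `(False, False)`, so the episode keeps running until the step limit.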

Arguments

Inherits all kwargs from UR5e — Reach, plus the following push-specific ones:

Kwarg	Default	Meaning
`random_cube_spawn`	`True`	Randomise cube XY within the spawn box each reset.
`random_goal`	`True`	Randomise the push goal each reset (else use the static `push_goal` from `_set_init_params`).
`cube_pose_topic` (real only)	`/cube_pose`	Topic the env subscribes to for cube pose (`geometry_msgs/PoseStamped`).
`auto_launch_cube_tracker` (real only)	`False`	If `True`, the env auto-launches `rl_envs_cube_tracker/<camera>.launch` (default kinect2).
`cube_tracker_camera` (real only)	`"kinect2"`	One of `"kinect2"`, `"zed2"`, `"d405"`.
`cube_tracker_target_frame` (real only)	`""`	If non-empty, TF-transforms `/cube_pose` into this frame (e.g. `"base_link"` for UR5e).

Version History

v0 — first release (rl_environments v0.1.0). Closed-gripper paddle (init_close_gripper = [0.7], knuckle joint angle in rad). Goal x_min = 0.40 (the UR5e’s near-base dead zone pushes the goal box further out than the Interbotix robots’ goal box).