VX300S — Push

The arm pushes a 4 cm cube across the cafe-table to a goal point on the table top. The closed gripper acts as a flat paddle and is not part of the action vector. The achieved goal is the cube position.

Env IDs: VX300SPushSim-v0 / VX300SPushGoalSim-v0 / VX300SPushReal-v0 / VX300SPushGoalReal-v0.
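The four IDs above follow a regular pattern (standard or goal variant, Sim or Real backend). A minimal sketch of that naming, where push_env_id is a hypothetical helper and not part of the package:

```python
def push_env_id(real: bool = False, goal: bool = False) -> str:
    """Compose one of the four push env IDs listed above.

    Hypothetical helper: it only mirrors the naming pattern of the IDs.
    """
    variant = "PushGoal" if goal else "Push"   # Dict-observation goal env or standard env
    backend = "Real" if real else "Sim"        # hardware or simulation
    return f"VX300S{variant}{backend}-v0"


if __name__ == "__main__":
    for real in (False, True):
        for goal in (False, True):
            print(push_env_id(real=real, goal=goal))
```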

Description

The VX300S is flush-mounted on the cafe-table (z = 0.78). At reset the arm moves to its home pose, the gripper closes (init_close_gripper = [0.025, -0.025] m), and a red cube spawns on the table top at z = 0.795. The agent commands joint-space or EE-space deltas; contact with the closed gripper slides the cube toward the goal.

Action Space

Joint mode (default). Box(6,) — same 6-joint command as VX300S — Reach (waist / shoulder / elbow / forearm_roll / wrist_angle / wrist_rotate). Gripper is NOT in the action vector for push.

EE mode (ee_action_type=True). Box(3,) — Δ EE position.

Observation Space

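Standard env (``VX300SPushSim-v0`` / ``VX300SPushReal-v0``). Box. Extends the reach obs with cube state. Indices below are for joint mode (6-D previous action, 49 values in total); in EE mode the previous-action block is 3-D.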

Idx	Dim	Component	Source	Unit
0–2	3	EE position	MoveIt FK	m
3–5	3	EE rpy	MoveIt	rad
6–8	3	Unit vector cube → goal	normalized	unitless
9	1	Distance cube → goal	‖goal − cube‖	m
10–18	9	Current joint positions	`/vx300s/joint_states`	rad / m
19–24	6 (or 3)	Previous action	cached	matches action space
25–33	9	Current joint velocities	`/vx300s/joint_states`	rad/s / m/s
34–36	3	Cube position (base frame)	Gazebo / `/cube_pose`	m
37–39	3	Cube rpy	same source	rad
40–42	3	Cube linear velocity (finite-diff)	cached + dt	m/s
43–45	3	Cube angular velocity	cached + dt	rad/s
46–48	3	Cube relative to EE	cube − ee	m
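To make the index table easier to use, the flat observation can be split into named blocks. The sketch below assumes the joint-mode layout (49 values); OBS_LAYOUT, OBS_DIM, split_obs and the block names are hypothetical and not part of the package.

```python
# Joint-mode layout copied from the table above (end indices are exclusive).
OBS_LAYOUT = {
    "ee_pos": slice(0, 3),
    "ee_rpy": slice(3, 6),
    "cube_to_goal_dir": slice(6, 9),
    "cube_to_goal_dist": slice(9, 10),
    "joint_pos": slice(10, 19),
    "prev_action": slice(19, 25),   # 6-D in joint mode; this sketch does not cover EE mode
    "joint_vel": slice(25, 34),
    "cube_pos": slice(34, 37),
    "cube_rpy": slice(37, 40),
    "cube_lin_vel": slice(40, 43),
    "cube_ang_vel": slice(43, 46),
    "cube_rel_ee": slice(46, 49),
}
OBS_DIM = 49


def split_obs(obs):
    """Return a dict of named components from a flat joint-mode observation."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(obs)}")
    return {name: list(obs[block]) for name, block in OBS_LAYOUT.items()}


if __name__ == "__main__":
    parts = split_obs(list(range(OBS_DIM)))
    print(parts["cube_pos"])
```

The function accepts any sequence that supports slicing, so a NumPy array returned by the env should work as well as a plain list.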

Goal env (``VX300SPushGoalSim-v0`` / ``VX300SPushGoalReal-v0``). Dict. desired_goal = Box(3,) on the table top, sampled in the vx300s/base_link frame (x ∈ [0.20, 0.35], y ∈ [-0.15, 0.15], z ≈ 0.015). achieved_goal = Box(3,) cube XYZ.
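As an illustration of the goal bounds above, the following sketch draws a goal inside that box. Uniform sampling and a fixed z are assumptions (the text only gives the bounds), and sample_push_goal and the constants are hypothetical names.

```python
import random

# Bounds from the text above, expressed in the vx300s/base_link frame.
GOAL_X = (0.20, 0.35)
GOAL_Y = (-0.15, 0.15)
GOAL_Z = 0.015  # assumed constant: the goal lies on the table top


def sample_push_goal(rng=None):
    """Draw a goal [x, y, z]; uniform sampling is an assumption of this sketch."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(*GOAL_X), rng.uniform(*GOAL_Y), GOAL_Z]


if __name__ == "__main__":
    print(sample_push_goal(random.Random(0)))
```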

Rewards

Sparse: 0.0 if ‖cube − goal‖ < reach_tolerance else -1.0.
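The sparse rule above can be written directly as a function. This is a minimal sketch: sparse_push_reward is a hypothetical name, and the tolerance used in the example call is illustrative, not the configured default.

```python
import math


def sparse_push_reward(cube_pos, goal_pos, reach_tolerance):
    """Return 0.0 when the cube is within reach_tolerance of the goal, else -1.0."""
    distance = math.dist(cube_pos, goal_pos)  # Euclidean norm of (cube - goal)
    return 0.0 if distance < reach_tolerance else -1.0


if __name__ == "__main__":
    # 0.02 m is an illustrative tolerance only.
    print(sparse_push_reward([0.25, 0.0, 0.015], [0.26, 0.0, 0.015], 0.02))
```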

Dense: same reward components as VX300S — Reach, but the distance is measured from the cube to the goal (not from the EE to the goal). Defaults from config/vx300s_push_task_config.yaml match the reach defaults and add push-specific reset and shaping parameters.

Starting State

Joint pose: zeros (Interbotix URDF home; safe on-table mount). Gripper closed at init_close_gripper = [0.025, -0.025] m.

Cube spawn. 4 cm red cube at default cube_init_pos = [0.25, 0.0, 0.015] in the base frame (= [0.45, 0.0, 0.795] in world). Randomised within random_cube_spawn box if enabled.
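The two cube positions above imply a base-to-world offset of [0.20, 0.0, 0.78]. The sketch below applies it, assuming a pure translation with no rotation between the frames (an assumption, not something the text states); BASE_ORIGIN_IN_WORLD and base_to_world are hypothetical names.

```python
# Offset inferred from the text: [0.45, 0.0, 0.795] - [0.25, 0.0, 0.015].
BASE_ORIGIN_IN_WORLD = (0.20, 0.0, 0.78)


def base_to_world(p_base):
    """Translate a base-frame point into the world frame (rotation assumed to be identity)."""
    return tuple(b + o for b, o in zip(p_base, BASE_ORIGIN_IN_WORLD))


if __name__ == "__main__":
    # Default cube_init_pos; rounded to hide floating-point noise.
    print([round(v, 3) for v in base_to_world((0.25, 0.0, 0.015))])
```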

Goal sampling. Push goal ∈ Box(3,) on the table top within position_goal_min/max. goal_x_min was raised from 0.15 to 0.20 because the VX300S’s longer reach pushes the near-base dead-zone further out than the RX200’s.

Episode End

Truncation. After max_episode_steps (default 100). The real env also truncates when /joint_states goes stale.

Termination. ‖cube − goal‖ < reach_tolerance (sparse reward only).
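The two end conditions above can be combined into one check. This is a sketch under stated assumptions: episode_end and its parameters are hypothetical, "sparse only" is read as "termination applies only with the sparse reward", and staleness detection is reduced to a boolean flag.

```python
def episode_end(distance, step_count, reach_tolerance,
                max_episode_steps=100, sparse_reward=True,
                joint_states_stale=False):
    """Return (terminated, truncated) following the rules described above."""
    # Success terminates the episode; assumed to apply only with the sparse reward.
    terminated = sparse_reward and distance < reach_tolerance
    # Step limit, or stale joint states (relevant to the real env only).
    truncated = step_count >= max_episode_steps or joint_states_stale
    return terminated, truncated


if __name__ == "__main__":
    # 0.02 m is an illustrative tolerance only.
    print(episode_end(distance=0.01, step_count=5, reach_tolerance=0.02))
```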

Arguments

Inherits VX300S — Reach kwargs plus push-specific: random_cube_spawn (bool), random_goal (bool), cube_pose_topic (real only, default ``/cube_pose``), cube_pose_timeout_s (real only, default 1.0 s), auto_launch_cube_tracker / cube_tracker_camera / cube_tracker_target_frame (real only).
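A possible way to pass these arguments, assuming the envs are registered with Gymnasium and that the package registering them has already been imported (neither is confirmed by the text). The values in PUSH_KWARGS are illustrative rather than defaults, and make_push_env is a hypothetical wrapper around gymnasium.make.

```python
# Push-specific kwargs named above; the values are illustrative, not the defaults.
PUSH_KWARGS = {
    "ee_action_type": False,      # joint mode (Box(6,)); True selects EE mode (Box(3,))
    "random_cube_spawn": True,
    "random_goal": True,
}

# Real-only kwargs with the defaults quoted above.
REAL_ONLY_KWARGS = {
    "cube_pose_topic": "/cube_pose",
    "cube_pose_timeout_s": 1.0,
}


def make_push_env(env_id, **overrides):
    """Create a push env; assumes the env IDs are already registered with Gymnasium."""
    import gymnasium as gym  # imported lazily so the dicts above work without it

    return gym.make(env_id, **{**PUSH_KWARGS, **overrides})


# Example (needs the simulation stack to be running):
# env = make_push_env("VX300SPushSim-v0", random_goal=False)
```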

Version History

v0 — first release (rl_environments v0.1.0). Closed-gripper paddle. goal_x_min=0.20 (bumped from 0.15 due to VX300S near-base dead-zone).