VX300S — Push
The arm must push a 4 cm cube across the cafe-table to a goal point on the table top. The closed gripper acts as a flat paddle and is not part of the action vector. The achieved goal is the cube position.
Env IDs: VX300SPushSim-v0 / VX300SPushGoalSim-v0 /
VX300SPushReal-v0 / VX300SPushGoalReal-v0.
Description
VX300S flush-mounted on the cafe-table (z = 0.78). At reset the arm
goes to home pose, the gripper closes
(init_close_gripper = [0.025, -0.025] m), and a red cube spawns
on the table top at z = 0.795. The agent commands joint-space or
EE-space deltas; contact with the closed gripper slides the cube
toward the goal.
Action Space
Joint mode (default). Box(6,) — same 6-joint command as VX300S — Reach (waist / shoulder / elbow / forearm_roll / wrist_angle / wrist_rotate). Gripper is NOT in the action vector for push.
EE mode (ee_action_type=True). Box(3,) — Δ EE position.
Observation Space
Standard env (``VX300SPushSim-v0`` / ``VX300SPushReal-v0``). Box. Extends the reach obs with cube state:
| Idx | Dim | Component | Source | Unit |
|---|---|---|---|---|
| 0–2 | 3 | EE position | MoveIt FK | m |
| 3–5 | 3 | EE rpy | MoveIt | rad |
| 6–8 | 3 | Unit vector cube → goal | normalized | unitless |
| 9 | 1 | Distance cube → goal | ‖goal − cube‖ | m |
| 10–18 | 9 | Current joint positions | | rad / m |
| 19–24 | 6 (or 3) | Previous action | cached | matches action space |
| 25–33 | 9 | Current joint velocities | | rad/s / m/s |
| 34–36 | 3 | Cube position (base frame) | Gazebo / | m |
| 37–39 | 3 | Cube rpy | same source | rad |
| 40–42 | 3 | Cube linear velocity (finite-diff) | cached + dt | m/s |
| 43–45 | 3 | Cube angular velocity | cached + dt | rad/s |
| 46–48 | 3 | Cube relative to EE | cube − ee | m |
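The index ranges in the table are written for joint mode, where the previous action has 6 entries. As a minimal sketch, the helper below rebuilds the layout from the component widths so the same names can be used in either action mode. The function name `push_obs_layout` and the component keys are hypothetical, and the sketch assumes that in EE mode every component after the previous action simply shifts down by 3 indices, which the table does not state explicitly.

```python
from typing import Dict


def push_obs_layout(action_dim: int = 6) -> Dict[str, slice]:
    """Map component names to index slices of the flat push observation.

    Hypothetical helper: names and the EE-mode shift are assumptions.
    """
    # (name, width) pairs in the order listed in the observation table.
    fields = [
        ("ee_pos", 3),
        ("ee_rpy", 3),
        ("cube_to_goal_unit", 3),
        ("cube_to_goal_dist", 1),
        ("joint_pos", 9),
        ("prev_action", action_dim),  # 6 in joint mode, 3 in EE mode
        ("joint_vel", 9),
        ("cube_pos", 3),
        ("cube_rpy", 3),
        ("cube_lin_vel", 3),
        ("cube_ang_vel", 3),
        ("cube_rel_ee", 3),
    ]
    layout: Dict[str, slice] = {}
    start = 0
    for name, width in fields:
        layout[name] = slice(start, start + width)
        start += width
    return layout


joint_layout = push_obs_layout(action_dim=6)
ee_layout = push_obs_layout(action_dim=3)
print(joint_layout["cube_pos"])  # slice(34, 37, None)
print(ee_layout["cube_pos"])     # slice(31, 34, None)
```

With a flat observation `obs`, the cube position would then be read as `obs[joint_layout["cube_pos"]]` rather than through hard-coded indices.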
Goal env (``VX300SPushGoalSim-v0`` / ``VX300SPushGoalReal-v0``). Dict.
desired_goal = Box(3,) on the table top (x ∈ [0.20, 0.35],
y ∈ [-0.15, 0.15], z ≈ 0.015). Goals are sampled in the
vx300s/base_link frame, so z is relative to the table-mounted arm base.
achieved_goal = Box(3,) cube XYZ.
Rewards
Sparse: 0.0 if ‖cube − goal‖ < reach_tolerance else
-1.0.
Dense: same reward components as VX300S — Reach but distance is
measured from the cube to the goal (not EE to goal). Defaults
come from config/vx300s_push_task_config.yaml and match the reach
defaults, with push-specific reset and shaping settings added.
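The sparse rule above can be sketched in a few lines. The function name `sparse_push_reward` is hypothetical, and the `reach_tolerance` value of 0.02 m used in the calls is only an illustrative number, not a documented default.

```python
import math
from typing import Sequence


def sparse_push_reward(
    cube: Sequence[float], goal: Sequence[float], reach_tolerance: float
) -> float:
    """Return 0.0 when the cube is within tolerance of the goal, else -1.0."""
    distance = math.dist(cube, goal)  # Euclidean norm of (cube - goal)
    return 0.0 if distance < reach_tolerance else -1.0


# Illustrative tolerance only; the real default lives in the task config.
far = sparse_push_reward([0.25, 0.0, 0.015], [0.30, 0.05, 0.015], reach_tolerance=0.02)
near = sparse_push_reward([0.30, 0.05, 0.015], [0.30, 0.06, 0.015], reach_tolerance=0.02)
print(far, near)  # -1.0 0.0
```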
Starting State
Joint pose: zeros (Interbotix URDF home; safe on-table mount).
Gripper closed at init_close_gripper = [0.025, -0.025] m.
Cube spawn. 4 cm red cube at default
cube_init_pos = [0.25, 0.0, 0.015] in the base frame (=
[0.45, 0.0, 0.795] in world). Randomised within
random_cube_spawn box if enabled.
Goal sampling. Push goal ∈ Box(3,) on the table top within
position_goal_min/max. The goal_x_min was bumped from 0.15
to 0.20 — the VX300S’s longer reach pushes the near-base
dead-zone further out than the RX200’s.
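The two frames and the goal box above can be tied together with a short sketch. It assumes the base frame is a pure translation of the world frame (the offset [0.20, 0.0, 0.78] is inferred from the two cube poses quoted above, not read from the URDF), that goals are drawn uniformly, and that goal z is fixed at 0.015. The names `BASE_IN_WORLD`, `base_to_world`, and `sample_goal` are hypothetical.

```python
import random
from typing import List, Optional, Sequence

# Assumed translation of vx300s/base_link in world, inferred from
# cube_init_pos [0.25, 0.0, 0.015] (base) == [0.45, 0.0, 0.795] (world).
BASE_IN_WORLD = (0.20, 0.0, 0.78)

# Goal box in the base frame, taken from the ranges documented above.
GOAL_MIN = (0.20, -0.15, 0.015)
GOAL_MAX = (0.35, 0.15, 0.015)


def base_to_world(p_base: Sequence[float]) -> List[float]:
    """Translate a base-frame point into world (assumes no rotation)."""
    return [p + o for p, o in zip(p_base, BASE_IN_WORLD)]


def sample_goal(rng: Optional[random.Random] = None) -> List[float]:
    """Draw a goal uniformly inside the base-frame goal box."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(lo, hi) for lo, hi in zip(GOAL_MIN, GOAL_MAX)]


cube_world = [round(v, 3) for v in base_to_world([0.25, 0.0, 0.015])]
print(cube_world)  # [0.45, 0.0, 0.795]

goal = sample_goal(random.Random(0))
print(goal)
```

Passing a seeded `random.Random` keeps the sampled goals reproducible across runs, which is convenient when comparing policies.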
Episode End
Truncation. max_episode_steps (default 100). The real env also
truncates when /joint_states goes stale.
Termination. ‖cube − goal‖ < reach_tolerance (sparse only).
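As a rough sketch, the two end conditions above could be combined as follows. The function name `episode_end` and the `sparse` and `joint_states_stale` flags are hypothetical, and the sketch reads "sparse only" as meaning that termination is checked only when the sparse reward is in use.

```python
import math
from typing import Sequence, Tuple


def episode_end(
    cube: Sequence[float],
    goal: Sequence[float],
    step_count: int,
    reach_tolerance: float,
    max_episode_steps: int = 100,
    sparse: bool = True,
    joint_states_stale: bool = False,
) -> Tuple[bool, bool]:
    """Return (terminated, truncated) for the current step."""
    # Success termination: assumed to apply only with the sparse reward.
    terminated = sparse and math.dist(cube, goal) < reach_tolerance
    # Truncation: step budget exhausted, or (real env) stale /joint_states.
    truncated = step_count >= max_episode_steps or joint_states_stale
    return terminated, truncated


print(episode_end([0.3, 0.0, 0.015], [0.3, 0.0, 0.015], 10, 0.02))   # (True, False)
print(episode_end([0.2, 0.0, 0.015], [0.3, 0.0, 0.015], 100, 0.02))  # (False, True)
```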
Arguments
Inherits VX300S — Reach kwargs plus push-specific:
random_cube_spawn (bool), random_goal (bool),
cube_pose_topic (real only, default ``/cube_pose``),
cube_pose_timeout_s (real only, default 1.0 s),
auto_launch_cube_tracker / cube_tracker_camera /
cube_tracker_target_frame (real only).
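For illustration, the push-specific arguments with documented defaults can be collected into a keyword dictionary. The values for `random_cube_spawn` and `random_goal` are arbitrary choices, and the commented `gymnasium.make` call is an assumption about how the env IDs are registered; it is not executed here because it needs the package that provides these environments.

```python
# Push-specific keyword arguments named in the list above.
push_kwargs = {
    "random_cube_spawn": True,        # arbitrary choice for this example
    "random_goal": True,              # arbitrary choice for this example
    "cube_pose_topic": "/cube_pose",  # real env only, documented default
    "cube_pose_timeout_s": 1.0,       # real env only, documented default
}

# Assumed usage (not run here; requires the package that registers the IDs):
# import gymnasium as gym
# env = gym.make("VX300SPushGoalReal-v0", **push_kwargs)

print(sorted(push_kwargs))
```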
Version History
v0: first release (rl_environments v0.1.0). Closed-gripper paddle. goal_x_min = 0.20 (bumped from 0.15 due to the VX300S near-base dead-zone).