VX300S — Reach
The arm must move its end-effector to a 3D target sampled in the workspace above the cafe-table. No cube; gripper not commanded.
This page covers four registered Gymnasium env IDs:
- VX300SReacherSim-v0: standard, Gazebo
- VX300SReacherGoalSim-v0: goal-conditioned (HER), Gazebo
- VX300SReacherReal-v0: standard, real hardware
- VX300SReacherGoalReal-v0: goal-conditioned, real hardware
Description
A ViperX-300 S 6-DoF arm with the standard Interbotix
two-prismatic-finger gripper sits flush on a cafe_table (top at
z = 0.78 m). The agent commands one of three action types:
joint-space deltas (default), absolute joint positions, or EE-space
deltas (ee_action_type=True). Every commanded action passes through
the per-link FK safety check before being published.
The env loop runs at environment_loop_rate (default 10 Hz). In
real-time mode (realtime_mode=True, default), Gazebo physics is
never paused and step() reads the latest cached obs.
Action Space
Joint mode (default, ee_action_type=False). Box(6,):
| Num | Action | Min | Max | Joint | Unit |
|---|---|---|---|---|---|
| 0 | waist delta | -3.14 | +3.14 | | rad |
| 1 | shoulder delta | -1.85 | +1.26 | | rad |
| 2 | elbow delta | -1.76 | +1.61 | | rad |
| 3 | forearm roll delta | -3.14 | +3.14 | | rad |
| 4 | wrist angle delta | -1.87 | +2.23 | | rad |
| 5 | wrist rotate delta | -3.14 | +3.14 | | rad |
When delta_action=True (default), the action is scaled by
delta_coeff = 0.05 and added to the current joint position.
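The scaling rule above can be sketched as follows. This is a minimal illustration, not the environment's code: `joint_target` is a hypothetical helper, the limits are copied from the action table, and clipping out-of-range targets is an assumption (the real env may reject or penalize them instead).

```python
import numpy as np

# Joint limits copied from the action table (rad), in action-index order.
JOINT_MIN = np.array([-3.14, -1.85, -1.76, -3.14, -1.87, -3.14])
JOINT_MAX = np.array([3.14, 1.26, 1.61, 3.14, 2.23, 3.14])


def joint_target(current, action, delta_action=True, delta_coeff=0.05):
    """Return the joint positions commanded for one step (hypothetical helper)."""
    current = np.asarray(current, dtype=float)
    action = np.asarray(action, dtype=float)
    if delta_action:
        # Delta mode: scale the action and add it to the current joint position.
        target = current + delta_coeff * action
    else:
        # Absolute mode: the action is the target itself.
        target = action
    # Assumption: out-of-range targets are clipped to the joint limits.
    return np.clip(target, JOINT_MIN, JOINT_MAX)
```

With the default `delta_coeff`, a full-scale waist action of 3.14 moves the waist target by 0.157 rad in a single step.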
EE mode (ee_action_type=True). Box(3,) — Δ EE position in
the robot’s base frame, x ∈ [-0.85, 0.85], y ∈ [-0.85, 0.85],
z ∈ [-0.85, 0.85] (loose obs bounds; actual table floor enforced
by workspace_min.z + safety_z_margin).
Observation Space
Standard env (``VX300SReacherSim-v0`` / ``VX300SReacherReal-v0``). Box layout:
| Idx | Dim | Component | Source | Unit |
|---|---|---|---|---|
| 0–2 | 3 | EE position (vx300s/base_link frame) | MoveIt FK | m |
| 3–5 | 3 | Unit vector EE → goal | normalized | unitless |
| 6 | 1 | Euclidean distance EE → goal | ‖goal − ee‖ | m |
| 7–15 | 9 | Current joint positions | | rad / m |
| 16–21 | 6 (or 3) | Previous action | cached | matches action space |
| 22–30 | 9 | Current joint velocities | | rad/s / m/s |
The 9-element joint vectors are in /joint_states order
(alphabetical): elbow, forearm_roll, gripper (continuous virtual
joint), left_finger, right_finger, shoulder, waist, wrist_angle,
wrist_rotate.
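For illustration, the layout above can be unpacked as follows. This is a sketch under stated assumptions: `split_observation` and `arm_joints` are hypothetical helpers, and the velocity block is assumed to shift down by three indices when the previous action is 3-dimensional (EE mode).

```python
import numpy as np

# /joint_states order as listed above (alphabetical).
JOINT_STATE_ORDER = [
    "elbow", "forearm_roll", "gripper", "left_finger", "right_finger",
    "shoulder", "waist", "wrist_angle", "wrist_rotate",
]
# Kinematic order of the six arm joints, matching the action table.
ARM_ORDER = ["waist", "shoulder", "elbow", "forearm_roll",
             "wrist_angle", "wrist_rotate"]


def split_observation(obs, action_dim=6):
    """Slice a flat standard-env observation into named components."""
    obs = np.asarray(obs, dtype=float)
    act_end = 16 + action_dim  # assumption: later blocks shift with action_dim
    return {
        "ee_pos": obs[0:3],
        "goal_dir": obs[3:6],
        "goal_dist": obs[6],
        "joint_pos": obs[7:16],
        "prev_action": obs[16:act_end],
        "joint_vel": obs[act_end:act_end + 9],
    }


def arm_joints(joint_vector):
    """Reorder a 9-element /joint_states vector into the 6 arm joints."""
    idx = [JOINT_STATE_ORDER.index(name) for name in ARM_ORDER]
    return np.asarray(joint_vector)[idx]
```

Reordering matters because the alphabetical order puts elbow first and waist seventh, which does not match the index order of the action space.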
Goal env (``VX300SReacherGoalSim-v0`` / ``VX300SReacherGoalReal-v0``).
Dict with three keys. observation is the standard Box minus the
goal-related columns. desired_goal and achieved_goal are
Box(3,):
| Idx | Dim | Component | Min | Max |
|---|---|---|---|---|
| 0 | 1 | goal x | 0.25 | 0.50 |
| 1 | 1 | goal y | -0.30 | 0.30 |
| 2 | 1 | goal z | 0.20 | 0.50 |
(Goal coordinates are in the vx300s/base_link frame, which is at
world z = 0.78, so a goal z of 0.20 is 0.98 m above the floor.)
Rewards
Sparse (required for HER): 0.0 if ‖ee − goal‖ < 0.02 else
-1.0.
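The sparse rule can be written in the vectorized form that HER relabeling typically needs, where rewards are recomputed for batches of substituted goals. This is a minimal sketch; `sparse_reward` is a hypothetical name and not necessarily how the environment implements it.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, reach_tolerance=0.02):
    """Return 0.0 where ‖ee − goal‖ < reach_tolerance, else -1.0.

    Accepts a single (3,) goal pair or a batch of shape (N, 3).
    """
    achieved = np.asarray(achieved_goal, dtype=float)
    desired = np.asarray(desired_goal, dtype=float)
    dist = np.linalg.norm(achieved - desired, axis=-1)
    return np.where(dist < reach_tolerance, 0.0, -1.0)
```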
Dense (default for std env): distance-shaped penalty + reached-goal
bonus + per-step penalty + joint-limit / non-executable /
not-in-goal-space penalties. Defaults from config/vx300s_reach_task_config.yaml:
reach_tolerance=0.02, multiplier_dist_reward=2.0,
reached_goal_reward=20, step_reward=-0.5,
joint_limits_reward=-2.0, none_exe_reward=-5.0,
not_within_goal_space_reward=-2.0.
env = gym.make("VX300SReacherSim-v0", reward_type="Dense")
env = gym.make("VX300SReacherGoalSim-v0", reward_type="Sparse")
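One plausible way the dense terms could combine is shown below. The page lists the terms and their defaults but not the exact formula, so the purely additive form, the linear distance penalty, and the flag names (`joint_limit_hit`, `executable`, `goal_in_space`) are all assumptions made for illustration.

```python
import numpy as np


def dense_reward(ee_pos, goal, *, joint_limit_hit=False, executable=True,
                 goal_in_space=True, reach_tolerance=0.02,
                 multiplier_dist_reward=2.0, reached_goal_reward=20.0,
                 step_reward=-0.5, joint_limits_reward=-2.0,
                 none_exe_reward=-5.0, not_within_goal_space_reward=-2.0):
    """Additive dense reward (assumed form, not the environment's code)."""
    diff = np.asarray(ee_pos, dtype=float) - np.asarray(goal, dtype=float)
    dist = float(np.linalg.norm(diff))
    # Per-step penalty plus a penalty proportional to the remaining distance.
    reward = step_reward - multiplier_dist_reward * dist
    if dist < reach_tolerance:
        reward += reached_goal_reward
    if joint_limit_hit:
        reward += joint_limits_reward
    if not executable:
        reward += none_exe_reward
    if not goal_in_space:
        reward += not_within_goal_space_reward
    return reward
```

Under these assumptions, a step that ends exactly on the goal scores 19.5, and a non-executable action taken 0.5 m from the goal scores -6.5.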
Starting State
Initial joint pose (set via MoveIt
set_trajectory_joints(init_pos) — the Interbotix URDF zero pose
is collision-free for the on-table mount):
waist = 0.0
shoulder = 0.0
elbow = 0.0
forearm_roll = 0.0
wrist_angle = 0.0
wrist_rotate = 0.0
Goal sampling. desired_goal ∈ Box(3,) drawn from the
position_goal_min/max rosparams (see table above). The
per-link FK safety check rejects sampled goals that aren’t reachable
without dipping the wrist below table_z + safety_z_margin.
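Goal sampling with rejection can be sketched as below. The bounds are copied from the goal table; uniform sampling, the `is_safe` callback standing in for the per-link FK safety check, and the `max_tries` cap are assumptions, since the page does not state the distribution or the retry policy.

```python
import numpy as np

# Bounds from the goal table, in the vx300s/base_link frame (m).
GOAL_MIN = np.array([0.25, -0.30, 0.20])
GOAL_MAX = np.array([0.50, 0.30, 0.50])


def sample_goal(rng, is_safe=lambda goal: True, max_tries=100):
    """Draw a goal uniformly from the box, retrying until is_safe accepts it."""
    for _ in range(max_tries):
        goal = rng.uniform(GOAL_MIN, GOAL_MAX)
        if is_safe(goal):
            return goal
    raise RuntimeError("no safe goal found within max_tries")


rng = np.random.default_rng(0)
goal = sample_goal(rng)
```

Passing a seeded `numpy.random.Generator` keeps goal sequences reproducible across runs.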
Episode End
Truncation. The episode is truncated after max_episode_steps
(default 100). The real env also aborts when the latest
/joint_states message is older than joint_state_timeout_s = 0.5 s.
Termination. ‖ee − goal‖ < reach_tolerance (sparse only).
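A rollout loop that distinguishes the two episode-end signals might look like the sketch below. It relies only on the standard Gymnasium `reset()` and `step()` return signatures; `run_episode` and `policy` are hypothetical names.

```python
def run_episode(env, policy, seed=None):
    """Run one episode and report how it ended.

    `policy` is any callable mapping an observation to an action.
    """
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
    # terminated: the goal was reached; truncated: step limit or abort.
    return total_reward, steps, terminated, truncated
```

Keeping the two flags separate matters for value bootstrapping: a truncated episode did not reach a terminal state, so its final value estimate should usually not be zeroed.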
Arguments
Same kwargs as UR5e — Reach (seed, gazebo_gui,
reward_type, ee_action_type, delta_action, delta_coeff,
environment_loop_rate, action_cycle_time, action_speed,
realtime_mode, use_kinect, log_internal_state).
Real-only kwargs: inherits the --allow-real-robot-motion gate from
rl_training_validation.utils.env_safety.
Version History
v0: first release (rl_environments v0.1.0).