VX300S — Reach

The arm must move its end-effector to a 3D target sampled in the workspace above the cafe-table. No cube; gripper not commanded.

This page covers four registered Gymnasium env IDs:

  • VX300SReacherSim-v0 — standard, Gazebo

  • VX300SReacherGoalSim-v0 — goal-conditioned (HER), Gazebo

  • VX300SReacherReal-v0 — standard, real hardware

  • VX300SReacherGoalReal-v0 — goal-conditioned, real hardware

Description

A ViperX-300 S 6-DoF arm with the standard Interbotix two-prismatic-finger gripper sits flush on a cafe_table (top at z = 0.78). The agent commands joint-space deltas (default) or absolute joint positions, or alternatively EE-space deltas (ee_action_type=True). Every commanded action passes through the per-link FK safety check before being published.

The env loop runs at environment_loop_rate (default 10 Hz). In real-time mode (realtime_mode=True, default), Gazebo physics is never paused and step() reads the latest cached obs.

Action Space

Joint mode (default, ee_action_type=False). Box(6,):

Num  Action              Min    Max    Joint         Unit
0    waist delta         -3.14  +3.14  waist         rad
1    shoulder delta      -1.85  +1.26  shoulder      rad
2    elbow delta         -1.76  +1.61  elbow         rad
3    forearm roll delta  -3.14  +3.14  forearm_roll  rad
4    wrist angle delta   -1.87  +2.23  wrist_angle   rad
5    wrist rotate delta  -3.14  +3.14  wrist_rotate  rad

When delta_action=True (default), the action is scaled by delta_coeff = 0.05 and added to the current joint position.
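The delta update can be sketched as follows. This is a minimal illustration, not the package's confirmed implementation: the function name apply_delta_action is hypothetical, and clipping the result to the joint limits from the table is an assumption.

```python
import numpy as np

# Joint limits from the action table, in action order:
# waist, shoulder, elbow, forearm_roll, wrist_angle, wrist_rotate
JOINT_MIN = np.array([-3.14, -1.85, -1.76, -3.14, -1.87, -3.14])
JOINT_MAX = np.array([3.14, 1.26, 1.61, 3.14, 2.23, 3.14])


def apply_delta_action(current, action, delta_coeff=0.05):
    """Scale the action by delta_coeff and add it to the current joint position."""
    current = np.asarray(current, dtype=float)
    action = np.asarray(action, dtype=float)
    target = current + delta_coeff * action
    # Assumption: the target is clipped to the joint limits (not stated in the docs).
    return np.clip(target, JOINT_MIN, JOINT_MAX)
```

With the default delta_coeff, an action of 1.0 on a joint moves its target by 0.05 rad per step.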

EE mode (ee_action_type=True). Box(3,): Δ EE position in the robot’s base frame, with x, y and z each ∈ [-0.85, 0.85] (loose obs bounds; the actual table floor is enforced by workspace_min.z + safety_z_margin).
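The floor rule for EE mode can be illustrated with a small check. This is a sketch under stated assumptions: the function name is hypothetical, the delta is taken as already scaled, and the values of workspace_min.z and safety_z_margin are not given on this page, so they are passed in as parameters.

```python
def ee_target_above_floor(ee_pos, ee_delta, workspace_min_z, safety_z_margin):
    """Return True if the commanded EE target stays at or above the table floor."""
    # Target height after applying the (already scaled) z delta.
    target_z = ee_pos[2] + ee_delta[2]
    # Floor height as described: workspace_min.z + safety_z_margin.
    floor_z = workspace_min_z + safety_z_margin
    return target_z >= floor_z
```

Whether the environment rejects such a target or clamps it is not specified here; the sketch only evaluates the condition.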

Observation Space

Standard env (VX300SReacherSim-v0 / VX300SReacherReal-v0). Box layout:

Idx    Dim       Component                             Source                         Unit
0–2    3         EE position (vx300s/base_link frame)  MoveIt FK                      m
3–5    3         Unit vector EE → goal                 normalized                     unitless
6      1         Euclidean distance EE → goal          ‖goal − ee‖                    m
7–15   9         Current joint positions               /vx300s/joint_states.position  rad / m
16–21  6 (or 3)  Previous action                       cached                         matches action space
22–30  9         Current joint velocities              /vx300s/joint_states.velocity  rad/s / m/s
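The layout above can be unpacked with plain slicing. The helper below is a hypothetical sketch (split_observation is not part of the package). It assumes that in EE mode, where the previous action has 3 elements, the joint-velocity block simply shifts down by 3 indices.

```python
import numpy as np


def split_observation(obs, action_dim=6):
    """Split the flat standard-env observation into named components."""
    obs = np.asarray(obs)
    # Assumption: the previous-action block has action_dim entries and the
    # velocity block follows it directly (indices 22-30 when action_dim=6).
    action_end = 16 + action_dim
    return {
        "ee_pos": obs[0:3],          # m, vx300s/base_link frame
        "goal_dir": obs[3:6],        # unit vector EE -> goal
        "goal_dist": obs[6],         # m
        "joint_pos": obs[7:16],      # 9 joints, /joint_states order
        "prev_action": obs[16:action_end],
        "joint_vel": obs[action_end:action_end + 9],
    }
```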

The 9-element joint vectors are in /joint_states order (alphabetical): elbow, forearm_roll, gripper (continuous virtual joint), left_finger, right_finger, shoulder, waist, wrist_angle, wrist_rotate.
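Because the /joint_states order differs from the action order, the six arm joints must be picked out by name rather than by position. A small sketch of that mapping (the constant and function names are illustrative, not from the package):

```python
# Order of the 9-element joint vectors, as listed above (alphabetical).
JOINT_STATE_ORDER = [
    "elbow", "forearm_roll", "gripper", "left_finger", "right_finger",
    "shoulder", "waist", "wrist_angle", "wrist_rotate",
]

# Order of the six arm joints in the joint-mode action space.
ACTION_ORDER = [
    "waist", "shoulder", "elbow", "forearm_roll", "wrist_angle", "wrist_rotate",
]

# Index of each action joint inside the /joint_states vector.
ARM_INDICES = [JOINT_STATE_ORDER.index(name) for name in ACTION_ORDER]


def arm_joints_in_action_order(joint_vector):
    """Pick the six arm joints out of a 9-element vector, in action order."""
    return [joint_vector[i] for i in ARM_INDICES]
```

The gripper and finger entries are dropped, which matches the task description (gripper not commanded).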

Goal env (VX300SReacherGoalSim-v0 / VX300SReacherGoalReal-v0). Dict with three keys. observation is the standard Box minus the goal-related columns. desired_goal and achieved_goal are Box(3,):

Idx  Dim  Component  Min    Max
0    1    goal x      0.25  0.50
1    1    goal y     -0.30  0.30
2    1    goal z      0.20  0.50

(Goal coordinates are in the vx300s/base_link frame, whose origin is at world z = 0.78, so a goal z of 0.20 is 0.98 m, roughly 1.0 m, above the floor.)
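The frame offset is a single addition. The sketch below assumes the base_link axes are aligned with the world axes and that the floor is at world z = 0; the names are illustrative.

```python
# Height of the vx300s/base_link origin in the world frame (table top).
BASE_LINK_WORLD_Z = 0.78


def base_to_world_z(z_base):
    """Convert a z coordinate from the base_link frame to the world frame."""
    # Assumption: frames differ only by a vertical translation.
    return z_base + BASE_LINK_WORLD_Z
```

For the lowest goal height, base_to_world_z(0.20) gives 0.98 m, i.e. about 1.0 m above the floor.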

Rewards

Sparse (required for HER): 0.0 if ‖goal − ee‖ < 0.02 else -1.0.
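A sketch of the sparse rule, written so that it also works on batches of goals as HER relabelling typically requires. The function name sparse_reward is illustrative; the env's own compute_reward may differ in signature.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, reach_tolerance=0.02):
    """Return 0.0 where the EE is within tolerance of the goal, else -1.0."""
    achieved_goal = np.asarray(achieved_goal, dtype=float)
    desired_goal = np.asarray(desired_goal, dtype=float)
    # Euclidean distance along the last axis, so (3,) and (N, 3) inputs both work.
    dist = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.where(dist < reach_tolerance, 0.0, -1.0)
```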

Dense (default for the standard env): distance-shaped penalty + reached-goal bonus + per-step penalty + joint-limit / non-executable / not-in-goal-space penalties. Defaults from config/vx300s_reach_task_config.yaml: reach_tolerance=0.02, multiplier_dist_reward=2.0, reached_goal_reward=20, step_reward=-0.5, joint_limits_reward=-2.0, none_exe_reward=-5.0, not_within_goal_space_reward=-2.0.

import gymnasium as gym

env = gym.make("VX300SReacherSim-v0", reward_type="Dense")
env = gym.make("VX300SReacherGoalSim-v0", reward_type="Sparse")
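One plausible way the dense terms could combine is sketched below. The page lists the terms and their default weights but not the exact formula, so the additive form, the linear distance penalty, and the function name dense_reward are all assumptions.

```python
def dense_reward(
    dist,
    hit_joint_limit=False,
    executable=True,
    within_goal_space=True,
    reach_tolerance=0.02,
    multiplier_dist_reward=2.0,
    reached_goal_reward=20.0,
    step_reward=-0.5,
    joint_limits_reward=-2.0,
    none_exe_reward=-5.0,
    not_within_goal_space_reward=-2.0,
):
    """Hypothetical additive combination of the dense reward terms."""
    # Per-step penalty plus a penalty proportional to the EE-goal distance.
    reward = step_reward - multiplier_dist_reward * dist
    if dist < reach_tolerance:
        reward += reached_goal_reward
    if hit_joint_limit:
        reward += joint_limits_reward
    if not executable:
        reward += none_exe_reward
    if not within_goal_space:
        reward += not_within_goal_space_reward
    return reward
```

Under these assumptions, a step that ends 0.1 m from the goal with no violations scores -0.5 - 2.0 * 0.1 = -0.7.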

Starting State

Initial joint pose (set via MoveIt set_trajectory_joints(init_pos) — the Interbotix URDF zero pose is collision-free for the on-table mount):

waist        =  0.0
shoulder     =  0.0
elbow        =  0.0
forearm_roll =  0.0
wrist_angle  =  0.0
wrist_rotate =  0.0

Goal sampling. desired_goal ∈ Box(3,) drawn from the position_goal_min/max rosparams (see table above). The per-link FK safety check rejects sampled goals that aren’t reachable without dipping the wrist below table_z + safety_z_margin.
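Goal sampling with rejection can be sketched as below. Uniform sampling, the retry limit, and the is_safe callback are assumptions; is_safe stands in for the per-link FK safety check, whose real interface is not shown on this page.

```python
import numpy as np

# Goal bounds from the goal table (vx300s/base_link frame, metres).
GOAL_MIN = np.array([0.25, -0.30, 0.20])
GOAL_MAX = np.array([0.50, 0.30, 0.50])


def sample_goal(rng, is_safe, max_tries=100):
    """Draw goals from the goal box until one passes the safety check."""
    for _ in range(max_tries):
        goal = rng.uniform(GOAL_MIN, GOAL_MAX)
        # is_safe is a hypothetical placeholder for the FK safety check.
        if is_safe(goal):
            return goal
    raise RuntimeError("no safe goal found within max_tries")
```

For example, sample_goal(np.random.default_rng(0), lambda g: True) returns a point inside the goal box.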

Episode End

Truncation. max_episode_steps (default 100). Real env also aborts on stale /joint_states (> joint_state_timeout_s = 0.5 s).
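The truncation conditions can be expressed as a single predicate. This is a sketch with a hypothetical function name; it assumes timestamps in seconds and that the step limit triggers once step_count reaches max_episode_steps.

```python
def should_truncate(
    step_count,
    last_joint_state_time_s,
    now_s,
    max_episode_steps=100,
    joint_state_timeout_s=0.5,
    real_robot=False,
):
    """Return True if the episode should be truncated."""
    if step_count >= max_episode_steps:
        return True
    # Real env only: abort when the last /joint_states message is too old.
    if real_robot and (now_s - last_joint_state_time_s) > joint_state_timeout_s:
        return True
    return False
```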

Termination. ‖goal − ee‖ < reach_tolerance (sparse only).

Arguments

Same kwargs as UR5e — Reach (seed, gazebo_gui, reward_type, ee_action_type, delta_action, delta_coeff, environment_loop_rate, action_cycle_time, action_speed, realtime_mode, use_kinect, log_internal_state).

Real-only: the real-hardware envs inherit the --allow-real-robot-motion gate from rl_training_validation.utils.env_safety.

Version History

  • v0 — first release (rl_environments v0.1.0).