VX300S — Reach
==============

The arm must move its end-effector to a 3D target sampled in the
workspace above the cafe table. There is no cube, and the gripper is
not commanded.

This page covers four registered Gymnasium env IDs:

* ``VX300SReacherSim-v0`` — standard, Gazebo
* ``VX300SReacherGoalSim-v0`` — goal-conditioned (HER), Gazebo
* ``VX300SReacherReal-v0`` — standard, real hardware
* ``VX300SReacherGoalReal-v0`` — goal-conditioned, real hardware

Description
-----------

A ViperX-300 S 6-DoF arm with the standard Interbotix
two-prismatic-finger gripper sits flush on a ``cafe_table`` (top at
z = 0.78 m). The agent commands joint-space deltas (default), absolute
joint positions, or EE-space deltas (``ee_action_type=True``). Every
commanded action passes through the per-link FK safety check before
being published.

The env loop runs at ``environment_loop_rate`` (default 10 Hz). In
real-time mode (``realtime_mode=True``, default), Gazebo physics is
never paused and ``step()`` reads the latest cached observation.

Action Space
------------

**Joint mode** (default, ``ee_action_type=False``). Box(6,):

.. list-table::
   :widths: 6 28 12 12 26 16
   :header-rows: 1

   * - Num
     - Action
     - Min
     - Max
     - Joint
     - Unit
   * - 0
     - waist delta
     - -3.14
     - +3.14
     - ``waist``
     - rad
   * - 1
     - shoulder delta
     - -1.85
     - +1.26
     - ``shoulder``
     - rad
   * - 2
     - elbow delta
     - -1.76
     - +1.61
     - ``elbow``
     - rad
   * - 3
     - forearm roll delta
     - -3.14
     - +3.14
     - ``forearm_roll``
     - rad
   * - 4
     - wrist angle delta
     - -1.87
     - +2.23
     - ``wrist_angle``
     - rad
   * - 5
     - wrist rotate delta
     - -3.14
     - +3.14
     - ``wrist_rotate``
     - rad

When ``delta_action=True`` (default), the action is scaled by
``delta_coeff = 0.05`` and added to the current joint position.

**EE mode** (``ee_action_type=True``). Box(3,): Δ EE position in the
robot's base frame, with x, y, z ∈ [-0.85, 0.85] m. These are loose
observation bounds; the actual table floor is enforced by
``workspace_min.z + safety_z_margin``.

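
The delta update described for joint mode can be sketched as follows.
This is a minimal illustration, not the environment's actual code:
``apply_delta_action`` is a hypothetical helper, and treating the
Min/Max columns of the table as joint limits to clip against is an
assumption (the real env may reject out-of-range targets instead of
clipping them).

.. code-block:: python

   import numpy as np

   DELTA_COEFF = 0.05  # default ``delta_coeff`` stated on this page

   # Assumption: the Min/Max columns of the joint-mode table, read as limits (rad).
   JOINT_MIN = np.array([-3.14, -1.85, -1.76, -3.14, -1.87, -3.14])
   JOINT_MAX = np.array([3.14, 1.26, 1.61, 3.14, 2.23, 3.14])


   def apply_delta_action(current, action, delta_coeff=DELTA_COEFF):
       """Hypothetical helper: scale the action and add it to the joint positions."""
       current = np.asarray(current, dtype=float)
       action = np.asarray(action, dtype=float)
       target = current + delta_coeff * action
       # Assumption: keep the target inside the tabulated range by clipping.
       return np.clip(target, JOINT_MIN, JOINT_MAX)


   # From the zero pose, a unit action moves every joint by delta_coeff rad.
   target = apply_delta_action(np.zeros(6), np.ones(6))

With the default coefficient, an action component of 1.0 corresponds to
a 0.05 rad change in the commanded joint position per step.
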
Observation Space
-----------------

**Standard env** (``VX300SReacherSim-v0`` / ``VX300SReacherReal-v0``).
Box layout:

.. list-table::
   :widths: 8 14 36 30 12
   :header-rows: 1

   * - Idx
     - Dim
     - Component
     - Source
     - Unit
   * - 0–2
     - 3
     - EE position (``vx300s/base_link`` frame)
     - MoveIt FK
     - m
   * - 3–5
     - 3
     - Unit vector EE → goal
     - normalized
     - unitless
   * - 6
     - 1
     - Euclidean distance EE → goal
     - ‖goal − ee‖
     - m
   * - 7–15
     - 9
     - Current joint positions
     - ``/vx300s/joint_states.position``
     - rad / m
   * - 16–21
     - 6 (or 3)
     - Previous action
     - cached
     - matches action space
   * - 22–30
     - 9
     - Current joint velocities
     - ``/vx300s/joint_states.velocity``
     - rad/s / m/s

The 9-element joint vectors are in ``/joint_states`` order
(alphabetical): elbow, forearm_roll, gripper (continuous virtual
joint), left_finger, right_finger, shoulder, waist, wrist_angle,
wrist_rotate.

**Goal env** (``VX300SReacherGoalSim-v0`` /
``VX300SReacherGoalReal-v0``). Dict with three keys. ``observation`` is
the standard Box minus the goal-related columns. ``desired_goal`` and
``achieved_goal`` are Box(3,):

.. list-table::
   :widths: 8 16 32 22 22
   :header-rows: 1

   * - Idx
     - Dim
     - Component
     - Min
     - Max
   * - 0
     - 1
     - goal x
     - 0.25
     - 0.50
   * - 1
     - 1
     - goal y
     - -0.30
     - 0.30
   * - 2
     - 1
     - goal z
     - 0.20
     - 0.50

(Goal coordinates are in the ``vx300s/base_link`` frame, which sits at
world z = 0.78 m, so a goal z of 0.20 is about 0.98 m above the floor.)

Rewards
-------

**Sparse** (required for HER): ``0.0`` if ``‖ee − goal‖ < 0.02`` else
``-1.0``.

**Dense** (default for the standard env): a distance-shaped penalty, a
reached-goal bonus, and a per-step penalty, plus joint-limit,
non-executable, and not-in-goal-space penalties. Defaults from
``config/vx300s_reach_task_config.yaml``: ``reach_tolerance=0.02``,
``multiplier_dist_reward=2.0``, ``reached_goal_reward=20``,
``step_reward=-0.5``, ``joint_limits_reward=-2.0``,
``none_exe_reward=-5.0``, ``not_within_goal_space_reward=-2.0``.

.. code-block:: python

   import gymnasium as gym

   env = gym.make("VX300SReacherSim-v0", reward_type="Dense")  # standard env
   goal_env = gym.make("VX300SReacherGoalSim-v0", reward_type="Sparse")  # HER

Starting State
--------------

Initial joint pose, set via MoveIt ``set_trajectory_joints(init_pos)``
(the Interbotix URDF zero pose is collision-free for the on-table
mount):

.. code-block:: text

   waist        = 0.0
   shoulder     = 0.0
   elbow        = 0.0
   forearm_roll = 0.0
   wrist_angle  = 0.0
   wrist_rotate = 0.0

**Goal sampling.** ``desired_goal`` ∈ Box(3,) is drawn from the
``position_goal_min/max`` rosparams (see the goal table above). The
per-link FK safety check rejects sampled goals that cannot be reached
without dipping the wrist below ``table_z + safety_z_margin``.

Episode End
-----------

**Truncation.** ``max_episode_steps`` (default 100). The real env also
aborts on stale ``/joint_states`` (older than
``joint_state_timeout_s = 0.5 s``).

**Termination.** ``‖ee − goal‖ < reach_tolerance`` (sparse only).

Arguments
---------

Same kwargs as :doc:`/envs/ur5e/reach` (``seed``, ``gazebo_gui``,
``reward_type``, ``ee_action_type``, ``delta_action``, ``delta_coeff``,
``environment_loop_rate``, ``action_cycle_time``, ``action_speed``,
``realtime_mode``, ``use_kinect``, ``log_internal_state``).

Real-only: the real envs inherit the ``--allow-real-robot-motion`` gate
from ``rl_training_validation.utils.env_safety``.

Version History
---------------

* ``v0`` — first release (``rl_environments`` v0.1.0).

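
As a supplement to the Rewards section, the sparse reward rule stated
there (``0.0`` within ``reach_tolerance`` of the goal, ``-1.0``
otherwise) can be sketched as below. ``sparse_reward`` is a
hypothetical name used for illustration only; the environment's own
reward function and its signature are not shown on this page.

.. code-block:: python

   import numpy as np

   REACH_TOLERANCE = 0.02  # default ``reach_tolerance`` from the task config (m)


   def sparse_reward(achieved_goal, desired_goal, tolerance=REACH_TOLERANCE):
       """Hypothetical sketch: 0.0 inside the tolerance sphere, -1.0 outside."""
       achieved = np.asarray(achieved_goal, dtype=float)
       desired = np.asarray(desired_goal, dtype=float)
       # The norm over the last axis handles single goals and batches alike.
       distance = np.linalg.norm(achieved - desired, axis=-1)
       return np.where(distance < tolerance, 0.0, -1.0)


   # Batched call, as used when relabeling several transitions at once.
   rewards = sparse_reward(
       [[0.30, 0.00, 0.30], [0.30, 0.00, 0.30]],
       [[0.30, 0.00, 0.31], [0.40, 0.00, 0.30]],
   )

Computing the reward from ``achieved_goal`` and ``desired_goal`` alone
is what allows HER to recompute rewards after substituting a different
goal into a stored transition.
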