RX200 — Reach
=============

The arm must move its end-effector to a 3D target sampled in the
workspace above the cafe table. There is no cube in the scene, and the
gripper is not commanded.

Env IDs: ``RX200ReacherSim-v0`` / ``RX200ReacherGoalSim-v0`` /
``RX200ReacherReal-v0`` / ``RX200ReacherGoalReal-v0``. Sim-only ZED 2
sensor variants: ``RX200Zed2ReacherSim-v0`` /
``RX200Zed2ReacherGoalSim-v0``.

Description
-----------

A Trossen ReactorX-200 5-DoF arm with two prismatic gripper fingers
sits flush on a ``cafe_table`` (top at world z = 0.78). Reach ≈ 550 mm.
The env supports joint-space or EE-space delta actions, per-link FK
safety checks, and real-time or MDP-pause step modes. The architecture
is the same as the other robots' reach envs
(see :doc:`/envs/ur5e/reach`).

Action Space
------------

**Joint mode** (default). Box(5,):

.. list-table::
   :widths: 6 28 12 12 26 16
   :header-rows: 1

   * - Num
     - Action
     - Min
     - Max
     - Joint
     - Unit
   * - 0
     - waist delta
     - -3.14
     - +3.14
     - ``waist``
     - rad
   * - 1
     - shoulder delta
     - -1.85
     - +1.26
     - ``shoulder``
     - rad
   * - 2
     - elbow delta
     - -1.76
     - +1.61
     - ``elbow``
     - rad
   * - 3
     - wrist angle delta
     - -1.87
     - +2.23
     - ``wrist_angle``
     - rad
   * - 4
     - wrist rotate delta
     - -3.14
     - +3.14
     - ``wrist_rotate``
     - rad

When ``delta_action=True`` (default), the action is scaled by
``delta_coeff = 0.05`` and added to the current joint position.
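
The delta update can be sketched in plain Python as below. This is an
illustration rather than the env's own code: the helper name
``apply_delta_action`` is hypothetical, and clipping the raw action to
the Box bounds from the table before scaling is an assumption about
preprocessing that the env may handle differently.

.. code-block:: python

   # Box bounds copied from the joint-mode action table, in index order.
   ACTION_BOUNDS = [
       (-3.14, 3.14),  # waist
       (-1.85, 1.26),  # shoulder
       (-1.76, 1.61),  # elbow
       (-1.87, 2.23),  # wrist_angle
       (-3.14, 3.14),  # wrist_rotate
   ]
   DELTA_COEFF = 0.05  # default delta_coeff quoted in the text


   def apply_delta_action(current, action, delta_coeff=DELTA_COEFF):
       """Return joint targets: current + delta_coeff * clipped action."""
       targets = []
       for q, a, (low, high) in zip(current, action, ACTION_BOUNDS):
           a = min(max(a, low), high)  # ASSUMPTION: clip to the Box bounds
           targets.append(q + delta_coeff * a)
       return targets


   # From the all-zero home pose, a unit action moves each joint by 0.05 rad.
   targets = apply_delta_action([0.0] * 5, [1.0] * 5)

With the default coefficient, a full-scale waist action of 3.14 shifts
the waist target by about 0.157 rad in one step.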

**EE mode** (``ee_action_type=True``). Box(3,) — Δ EE position in
the ``rx200/base_link`` frame.

Observation Space
-----------------

**Standard env.** Box layout:

* EE position (3, base frame, m)
* Unit vector EE → goal (3, normalised)
* Distance EE → goal (1, m)
* Current joint positions (8, ``/rx200/joint_states.position``,
  alphabetical: elbow, gripper (continuous), left_finger, right_finger,
  shoulder, waist, wrist_angle, wrist_rotate)
* Previous action (5 or 3)
* Current joint velocities (8)
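
As a rough illustration of the layout above, the following sketch
concatenates the components in the listed order and shows the resulting
vector length. The function ``build_observation`` is hypothetical, and
returning a zero direction vector when the EE sits exactly on the goal
is an assumption made here to avoid a division by zero.

.. code-block:: python

   import math


   def build_observation(ee_pos, goal, joint_pos, prev_action, joint_vel):
       """Concatenate the observation components in the documented order."""
       diff = [g - e for g, e in zip(goal, ee_pos)]
       dist = math.sqrt(sum(d * d for d in diff))
       # ASSUMPTION: use a zero vector when the EE is exactly at the goal.
       unit = [d / dist for d in diff] if dist > 0.0 else [0.0, 0.0, 0.0]
       return [*ee_pos, *unit, dist, *joint_pos, *prev_action, *joint_vel]


   obs = build_observation(
       ee_pos=[0.0, 0.0, 0.0],
       goal=[0.3, 0.0, 0.4],
       joint_pos=[0.0] * 8,
       prev_action=[0.0] * 5,  # joint mode; EE mode would pass 3 values
       joint_vel=[0.0] * 8,
   )
   print(len(obs))  # 3 + 3 + 1 + 8 + 5 + 8 = 28 in joint mode

In EE mode the previous action has 3 entries, so the same layout gives
26 values.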

**Goal env.** Dict with three keys: ``observation``, ``desired_goal``,
and ``achieved_goal``. The two goal entries are Box(3,). The bounds
below are the declared *observation-space* bounds (they mirror
``position_desired_goal_min/max`` in ``rx200_reach_task_config.yaml``);
for RX200 the per-episode *sampling* support
(``position_goal_min/max``) happens to match exactly.

.. list-table::
   :widths: 8 16 32 22 22
   :header-rows: 1

   * - Idx
     - Dim
     - Component
     - Min
     - Max
   * - 0
     - 1
     - goal x
     - 0.15
     - 0.25
   * - 1
     - 1
     - goal y
     - -0.15
     - 0.15
   * - 2
     - 1
     - goal z
     - 0.15
     - 0.25

(Goal coordinates are in the ``rx200/base_link`` frame; the base is at
world z = 0.78.)
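
To make the bounds and the frame offset concrete, here is a small
sketch that checks whether a goal lies inside the table's bounds and
converts its height to the world frame. Both helper names are
hypothetical, and the conversion assumes the base frame's z axis is
aligned with the world z axis; only the z offset is stated in this
page, so x and y are left in the base frame.

.. code-block:: python

   # Bounds copied from the goal table (base_link frame).
   GOAL_MIN = (0.15, -0.15, 0.15)
   GOAL_MAX = (0.25, 0.15, 0.25)
   BASE_Z_WORLD = 0.78  # world height of rx200/base_link


   def in_goal_space(goal):
       """True if every coordinate lies within the declared bounds."""
       return all(lo <= g <= hi for g, lo, hi in zip(goal, GOAL_MIN, GOAL_MAX))


   def goal_height_in_world(goal):
       """World-frame z of a base-frame goal (ASSUMES aligned z axes)."""
       return goal[2] + BASE_Z_WORLD


   goal = (0.20, 0.0, 0.15)
   print(in_goal_space(goal), goal_height_in_world(goal))

Under that assumption, the goal band of 0.15 to 0.25 in the base frame
corresponds to roughly 0.93 to 1.03 in world z.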

Rewards
-------

**Sparse**: ``0.0`` if ``‖ee − goal‖ < 0.02`` else ``-1.0``.
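
The sparse rule is simple enough to write out directly. The sketch
below uses a hypothetical ``sparse_reward`` helper with positions given
as plain 3-element sequences; it illustrates the formula and is not
taken from the env source.

.. code-block:: python

   import math


   def sparse_reward(ee_pos, goal, reach_tolerance=0.02):
       """0.0 when the EE is within reach_tolerance of the goal, else -1.0."""
       return 0.0 if math.dist(ee_pos, goal) < reach_tolerance else -1.0


   print(sparse_reward((0.20, 0.0, 0.20), (0.20, 0.0, 0.21)))  # within 0.02
   print(sparse_reward((0.20, 0.0, 0.20), (0.25, 0.0, 0.20)))  # 0.05 away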

**Dense**: a distance-shaped term plus a reached-goal bonus, a per-step
penalty, and joint-limit, non-execution, and goal-space penalties.
Defaults from ``config/rx200_reach_task_config.yaml``:
``reach_tolerance=0.02``, ``multiplier_dist_reward=2.0``,
``reached_goal_reward=20``, ``step_reward=-0.5``,
``joint_limits_reward=-2.0``, ``none_exe_reward=-5.0``,
``not_within_goal_space_reward=-2.0``.

.. code-block:: python

   env = gym.make("RX200ReacherSim-v0", reward_type="Dense")
   env = gym.make("RX200ReacherGoalSim-v0", reward_type="Sparse")
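
This page lists the dense reward's terms and default weights but not
how they are combined. The sketch below shows one plausible
combination, purely for intuition: it assumes the terms are summed and
that the distance term is ``-multiplier_dist_reward * dist``. Both are
assumptions, and ``dense_reward`` and its boolean flags are
hypothetical names, so check the env source for the actual shaping.

.. code-block:: python

   # Defaults quoted from config/rx200_reach_task_config.yaml.
   PARAMS = {
       "reach_tolerance": 0.02,
       "multiplier_dist_reward": 2.0,
       "reached_goal_reward": 20.0,
       "step_reward": -0.5,
       "joint_limits_reward": -2.0,
       "none_exe_reward": -5.0,
       "not_within_goal_space_reward": -2.0,
   }


   def dense_reward(dist, joint_limits_ok=True, executed=True,
                    within_goal_space=True, params=PARAMS):
       """One ASSUMED way to sum the dense reward terms."""
       # ASSUMPTION: distance shaping is linear and negative in dist.
       reward = params["step_reward"] - params["multiplier_dist_reward"] * dist
       if dist < params["reach_tolerance"]:
           reward += params["reached_goal_reward"]
       if not joint_limits_ok:
           reward += params["joint_limits_reward"]
       if not executed:
           reward += params["none_exe_reward"]
       if not within_goal_space:
           reward += params["not_within_goal_space_reward"]
       return reward


   print(dense_reward(0.10))  # far from the goal: step + distance terms only
   print(dense_reward(0.01))  # inside the tolerance: bonus applies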

Starting State
--------------

Initial joint pose: zeros (Interbotix URDF home — safe for the
on-table mount):

.. code-block:: text

   waist        = 0.0
   shoulder     = 0.0
   elbow        = 0.0
   wrist_angle  = 0.0
   wrist_rotate = 0.0

**Goal sampling.** ``desired_goal`` ∈ Box(3,) from
``position_goal_min/max``. Per-link FK rejects sampled goals that
can't be reached without dipping a link below
``table_z + safety_z_margin``.
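
The sampling step reads like rejection sampling, which might be
sketched as follows. The names ``sample_goal``, ``links_clear_table``,
and ``is_feasible`` are hypothetical. In particular, ``is_feasible``
stands in for the per-link FK check, whose real implementation is not
shown on this page, and the ``max_tries`` cap is an assumption.

.. code-block:: python

   import random


   def links_clear_table(link_heights, table_z, safety_z_margin):
       """True if every link stays at or above table_z + safety_z_margin."""
       floor = table_z + safety_z_margin
       return all(z >= floor for z in link_heights)


   def sample_goal(goal_min, goal_max, is_feasible, rng=None, max_tries=100):
       """Draw uniform goals from the box until one passes is_feasible."""
       rng = rng or random.Random()
       for _ in range(max_tries):
           goal = tuple(rng.uniform(lo, hi)
                        for lo, hi in zip(goal_min, goal_max))
           if is_feasible(goal):
               return goal
       raise RuntimeError("no feasible goal found within max_tries")


   # Stand-in predicate that accepts every goal; a real check would solve
   # for a joint pose and pass the link heights to links_clear_table.
   goal = sample_goal((0.15, -0.15, 0.15), (0.25, 0.15, 0.25),
                      is_feasible=lambda g: True, rng=random.Random(0))

Passing a seeded ``random.Random`` keeps the sampled goals reproducible
across runs.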

Episode End
-----------

**Truncation.** ``max_episode_steps`` (default 100). Real env aborts
on stale ``/rx200/joint_states``.

**Termination.** ``‖ee − goal‖ < reach_tolerance`` (sparse only).

Arguments
---------

Same kwargs as :doc:`/envs/ur5e/reach`. Sim-only camera options:
``use_kinect`` (default) reads from ``/head_mount_kinect2/*``, while
the ``RX200Zed2*`` IDs use the ZED 2 stereo camera instead.

Version History
---------------

* ``v0`` — first release (``rl_environments`` v0.1.0). Framework
  reference robot — most-exercised env in the ecosystem benchmarks
  (Use Cases A–C in the UniROS paper).