Creating a real-hardware environment
====================================

RealROS environments follow the same two-layer **Robot env → Task
env** pattern as MultiROS sim envs, but they talk to a physical
robot via the manufacturer's ROS driver (and typically MoveIt)
instead of Gazebo.

If you haven't built a sim env yet, read :doc:`env_creation_sim`
first — most of the task-layer reasoning carries over.


Differences from sim
--------------------

* **No Gazebo lifecycle.** No spawn / despawn / pause / unpause.
  The robot is the robot — you initialise its driver and assume
  it's running.
* **MoveIt is the usual control path.** Joint-space commands and
  cartesian goals are typically issued through MoveIt
  ``MoveGroupCommander`` rather than direct controller publishers.
* **No simulation reset.** "Reset" must be implemented as a safe
  motion to a home pose. Don't randomise initial joint positions
  without checking joint limits and collision space.
* **Wall-clock paced.** Real hardware runs at real time. Avoid any
  per-step ``time.sleep`` patterns that assume simulated time.
* **Real consequences.** Failed actions actually crash the robot.
  Test in sim first; smoke-test slowly on hardware.


1. Pick a starting point
------------------------

Templates live in ``realros/src/realros/templates/``:

* ``robot_envs/MyRealRobotEnv.py``
* ``robot_envs/MyRealRobotGoalEnv.py``
* ``task_envs/MyRealTaskEnv.py``
* ``task_envs/MyRealTaskGoalEnv.py``

Or — if you have a working sim env for the same robot/task —
**start from your sim env** and replace the Gazebo-specific bits
with MoveIt / direct-driver equivalents. The action-space and
reward function should be near-identical, which is the point of
the framework.


2. Implement the robot env
--------------------------

Extends :class:`realros.envs.RealBaseEnv.RealBaseEnv` (or its goal
variant). Required:

* ``_set_init_params(options=None)`` — move to a home pose, reset
  gripper, etc.
* Helpers the task env will call: ``get_joint_positions()``,
  ``move_to_joint_target(...)``, ``move_to_pose(...)``.

Bring up the manufacturer's driver outside the env (via
``roslaunch``). The env attaches to existing ROS topics; it does
not spawn the driver process.


3. Implement the task env
-------------------------

Same gymnasium contract as the sim version:

* ``observation_space`` / ``action_space`` defined in ``__init__``.
* ``_get_observation()``, ``_set_action(action)``,
  ``_compute_terminated(info=None)`` (and ``_compute_truncated`` if
  not relying on the ``TimeLimit`` wrapper),
  ``_get_reward(info=None)`` for non-goal envs or
  ``compute_reward(achieved_goal, desired_goal, info)`` for goal envs.
* ``_set_init_params(options=None)`` for any task-side reset (e.g.
  resample target).

Keep the same Gymnasium ID conventions you'd use in sim, but with a
``Real`` suffix:

.. code-block:: python

   register(
       id="MyRobotReachReal-v0",
       entry_point="my_pkg.task_envs.my_robot_reach_real:MyRobotReachReal",
       max_episode_steps=200,
   )


4. Bring up the robot, then launch the env
------------------------------------------

In one terminal:

.. code-block:: bash

   roslaunch my_robot_bringup robot.launch

In another:

.. code-block:: python

   import rospy
   import uniros as gym
   import my_pkg

   rospy.init_node("real_smoke_test")
   env = gym.make("MyRobotReachReal-v0")
   obs, info = env.reset(seed=0)
   for _ in range(5):
       obs, reward, term, trunc, info = env.step(env.action_space.sample())
       if term or trunc:
           obs, info = env.reset()
   env.close()

Start with a tiny number of steps and a hand near the e-stop.


Sim-to-real transfer
--------------------

The framework is designed so a model trained in simulation can be
validated on the matching real env with the *same* action/observation
spaces. The recommended workflow:

1. Train under ``...Sim-v0`` until you're happy with the policy.
2. Save the model.
3. Load it against ``...Real-v0`` and run validation (no training
   updates):

   .. code-block:: python

      model = SAC.load_trained_model(
          save_path + "trained_model_name",
          env=real_env,
          model_pkg="rl_training_validation",
          config_filename="rx200_reacher_sac.yaml",
      )

See :doc:`using_trained_models` for the full pattern, and
:doc:`training` for joint sim+real training (a single agent updating
on transitions from both environments concurrently).


Safety checklist
----------------

Before clicking "run" on hardware:

* The action space cannot command joint positions outside the
  robot's limits.
* ``_compute_reward`` doesn't have a code path that incentivises
  smashing into the table.
* Episode termination triggers reliably (collision, time limit,
  joint-limit violation).
* You can stop the script with Ctrl+C — the framework's cleanup
  hooks tear down the roscore and the script process gracefully.
* The e-stop is reachable.