Ecosystem overview
==================

The **core framework** is intentionally split into four packages: UniROS,
MultiROS, RealROS, and ``sb3_ros_support``. Each solves a clearly bounded
problem, and users compose them depending on whether they're training in
simulation, on hardware, or both.

Two additional **application** packages ship pre-built envs and working
training scripts on top: ``rl_environments`` (the gym envs themselves) and
``rl_training_validation`` (the training scripts that exercise them). They're
optional (the framework runs without them), but they're the easiest path to a
first running example.

Architecture at a glance
------------------------

.. code-block:: text

   +---------------------------+      +---------------------------+
   |         multiros          |      |          realros          |
   |  Gazebo simulation envs   |      |    Real-hardware envs     |
   |  - launch_gazebo          |      |  - direct controller I/O  |
   |  - parallel roscores      |      |  - MoveIt integration     |
   |  - physics tuning         |      |                           |
   +-------------+-------------+      +-------------+-------------+
                 |                                  |
                 |        both re-export from       |
                 v                                  v
             +------------------------------------------+
             |                 UniROS                   |
             |  uniros._proxy.GymProxy (canonical)      |
             |  uniros.utils.{ros_markers,              |
             |                ros_kinematics,           |
             |                ros_controllers}          |
             |  uniros.utils.ros_common                 |
             |    - port allocator                      |
             |    - managed-process registry            |
             |    - on-Ctrl+C cleanup                   |
             +------------------------------------------+
                                   ^
                                   |  trained by
                                   |
             +------------------------------------------+
             |             sb3_ros_support              |
             |  Stable Baselines 3 algorithm wrappers   |
             |  PPO  A2C  DDPG  TD3  SAC  DQN           |
             |  + goal-conditioned variants (HER)       |
             +------------------------------------------+

Why four packages?
------------------

**UniROS** is the abstraction layer. It hosts the multiprocessing gym-env proxy
class (:class:`uniros._proxy.GymProxy`) and the ROS utility modules shared by
every package. multiros and realros re-export the proxy under their historical
names (``MultirosGym``, ``RealrosGym``) so existing code keeps working; new
code can import ``uniros.GymProxy`` directly.

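
The proxy idea is easiest to see stripped down. The following is a minimal,
standard-library sketch of the pattern described under *Multiprocessing model*
on this page: a parent-side object forwards ``reset`` / ``step`` / attribute
access over a ``multiprocessing.Pipe`` to a worker process, worker exceptions
come back as formatted tracebacks, and ``close()`` escalates from a polite
request to SIGTERM to SIGKILL, in the spirit of the v0.3.2 reap change. It is
not the framework's :class:`uniros._proxy.GymProxy`: ``ProcessEnvProxy``,
``RemoteException``, and ``CounterEnv`` are hypothetical names, the grace
periods are assumed values, and the ``fork`` start method is an assumption
chosen because ROS 1 targets Linux.

.. code-block:: python

   # Illustrative sketch only; not uniros' GymProxy. All names are hypothetical.
   import multiprocessing as mp
   import traceback


   class RemoteException(Exception):
       """Carries a worker-side traceback back to the parent."""


   class CounterEnv:
       """Toy env using gymnasium's reset/step return shapes."""

       def __init__(self, limit=3):
           self.limit = limit
           self.t = 0

       def reset(self, seed=None, options=None):
           self.t = 0
           return self.t, {}

       def step(self, action):
           if action < 0:
               raise ValueError("negative action")
           self.t += action
           return self.t, 1.0, self.t >= self.limit, False, {}

       def close(self):
           pass


   def _worker(conn, env_factory):
       # Runs in the child: the env (and, under ROS, its node and
       # subscribers) is built here, so none of it exists in the parent.
       env = env_factory()
       while True:
           cmd, args, kwargs = conn.recv()
           try:
               if cmd == "close":
                   env.close()
                   conn.send(("ok", None))
                   return
               if cmd == "getattr":
                   result = getattr(env, args[0])
               else:
                   result = getattr(env, cmd)(*args, **kwargs)
               conn.send(("ok", result))
           except Exception:
               # Ship the traceback instead of dying, so the parent raises
               # rather than blocking forever on recv().
               conn.send(("error", traceback.format_exc()))


   class ProcessEnvProxy:
       """Parent-side stand-in that forwards env calls over a Pipe."""

       def __init__(self, env_factory):
           ctx = mp.get_context("fork")  # assumed: Linux, like ROS 1
           self._conn, child_conn = ctx.Pipe()
           self._proc = ctx.Process(
               target=_worker, args=(child_conn, env_factory), daemon=True
           )
           self._proc.start()
           child_conn.close()  # the child keeps its own copy of this end

       def _call(self, cmd, *args, **kwargs):
           self._conn.send((cmd, args, kwargs))
           status, payload = self._conn.recv()
           if status == "error":
               raise RemoteException(payload)
           return payload

       def reset(self, **kwargs):
           return self._call("reset", **kwargs)

       def step(self, action):
           return self._call("step", action)

       def __getattr__(self, name):
           # Only reached for names the proxy itself lacks, e.g. env.limit.
           if name.startswith("_"):
               raise AttributeError(name)
           return self._call("getattr", name)

       def close(self, grace=2.0):
           # Polite request first, without blocking on a stuck worker.
           try:
               self._conn.send(("close", (), {}))
               if self._conn.poll(grace):
                   self._conn.recv()
           except (EOFError, OSError):
               pass  # worker already gone
           self._proc.join(grace)
           if self._proc.is_alive():
               self._proc.terminate()  # SIGTERM
               self._proc.join(grace)
           if self._proc.is_alive():
               self._proc.kill()  # SIGKILL cannot be caught or ignored
               self._proc.join()
           self._conn.close()

Building the env inside ``_worker`` rather than in the parent is the design
choice that matters here: any per-process state (for ROS, the node created by
``rospy.init_node``) then lives only in the worker, and a worker that wedges
itself can still be reaped from the parent.
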

**multiros** focuses on Gazebo: launching simulators on arbitrary ports,
spawning multiple parallel envs against the same rosmaster, and tuning physics
parameters per env. It depends on UniROS for the shared utilities and the proxy
class.

**realros** is the real-world counterpart. It exposes the same gym API but
talks to physical robot drivers and (optionally) MoveIt instead of Gazebo. Code
written against ``multiros.make()`` can in many cases be switched to
``realros.make()`` with only configuration changes.

**sb3_ros_support** adapts Stable Baselines 3 to ROS-based training scripts.
Each algorithm subclass exposes the same training / validation / save / load
surface, so swapping PPO for SAC or TD3 is a YAML edit, not a code rewrite.

Multiprocessing model
---------------------

Every gym env created via ``multiros.make()`` / ``realros.make()`` runs inside
a worker ``multiprocessing.Process``. The parent holds a
:class:`uniros._proxy.GymProxy` instance that forwards ``step`` / ``reset`` /
``close`` / attribute access over a ``multiprocessing.Pipe`` to the worker.
This buys two important properties:

1. **Isolated rospy state per env.** rospy allows only one node per process,
   so each worker calls ``rospy.init_node`` itself and owns its own callbacks.
   Envs can therefore run in parallel against different rosmasters without
   bleeding subscriptions into each other.
2. **Crash isolation.** When a worker raises a Python exception, the framework
   catches it and ships it back to the parent as a
   :class:`uniros._proxy._RemoteException`. The parent re-raises it with the
   worker's traceback instead of hanging on the next ``recv()``.

Lifecycle / cleanup
-------------------

The framework tracks the roscores, Gazebo processes, and roslaunch / rosrun
xterms each script spawns, and tears them down on ``Ctrl+C`` or normal
interpreter exit. The mechanism is layered:

1. **Targeted signal handler.** ``register_managed_process`` (called internally
   by ``launch_roscore`` / ``launch_gazebo`` / ``ros_launch_launcher`` /
   ``ros_node_launcher``) installs a SIGINT handler plus an ``atexit`` hook.

2. **rospy shutdown hook.** ``rospy.init_node`` installs its own SIGINT
   handler, which would otherwise replace ours, so ``register_managed_process``
   also registers a ``rospy.on_shutdown`` callback. This is what guarantees
   cleanup runs in the common ``launch_gazebo; rospy.init_node; train`` flow.

3. **Targeted pkill + process-group kill.** Each tracked roscore is killed by
   its port (``pkill -f "roscore -p "``), and Gazebo's gzserver / gzclient PIDs
   are captured at launch and SIGTERM'd individually. Pre-existing ROS sessions
   on the host are not affected.

   As of v0.3.1, the xterm wrappers from ``ros_launch_launcher`` and
   ``ros_node_launcher`` are also spawned as process-group leaders
   (``start_new_session=True``). Cleanup detects session leaders and uses
   ``os.killpg(pgid, SIGTERM)`` to tear down the whole subtree (e.g.
   ``xterm → roslaunch → move_group → moveit_python_interface``) with a single
   signal. Previously, terminating only the xterm left grandchildren orphaned
   to init.

4. **Escape hatch.** After cleanup runs, SIGINT is reset to ``SIG_DFL`` so a
   subsequent Ctrl+C kills the script immediately, even if the training loop
   is stuck in a non-responsive C call.

Reliable launch verification (v0.3.1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``launch_roscore`` verifies that the spawned roscore is reachable (a TCP probe
to the chosen port) before returning. On failure, it retries with a fresh
kernel-allocated port up to three times, then raises ``RuntimeError``.

Previously, a silent xterm/roscore failure (port collision, stale process,
display issue) would still report "Roscore launched!" and point
``ROS_MASTER_URI`` at a phantom master.

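
Launch verification and process-group teardown fit together, so this
standard-library sketch shows both: reserve a kernel-allocated port, spawn the
command as a new session leader (``start_new_session=True``), probe the port
over TCP, and on failure kill the whole process group and retry on a fresh
port, raising ``RuntimeError`` after the last attempt. ``launch_verified``,
``kill_group``, ``cleanup``, and ``install_sigint_cleanup`` are hypothetical
names, not the ``ros_common`` API, and ``make_cmd`` stands in for whatever
builds the roscore / xterm command line. The sketch omits the
``rospy.on_shutdown`` registration and the roscore-specific ``pkill`` step.

.. code-block:: python

   # Hypothetical helpers sketching the pattern; not the ros_common API.
   import atexit
   import os
   import signal
   import socket
   import subprocess
   import time

   _MANAGED = []  # Popen handles this script is responsible for tearing down


   def pick_free_port():
       # Binding to port 0 lets the kernel choose an unused port.
       with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
           s.bind(("127.0.0.1", 0))
           return s.getsockname()[1]


   def wait_for_port(port, timeout=5.0, host="127.0.0.1"):
       """Return True as soon as host:port accepts a TCP connection."""
       deadline = time.monotonic() + timeout
       while time.monotonic() < deadline:
           try:
               with socket.create_connection((host, port), timeout=0.5):
                   return True
           except OSError:
               time.sleep(0.1)
       return False


   def kill_group(proc):
       # start_new_session=True made the child a session/group leader, so its
       # PID is also the group ID and one signal reaches every descendant.
       try:
           os.killpg(proc.pid, signal.SIGTERM)
       except ProcessLookupError:
           pass  # the whole group has already exited
       try:
           proc.wait(timeout=5)
       except subprocess.TimeoutExpired:
           os.killpg(proc.pid, signal.SIGKILL)
           proc.wait()


   def cleanup():
       while _MANAGED:
           kill_group(_MANAGED.pop())


   def install_sigint_cleanup():
       # Must be called from the main thread (a signal-module restriction).
       def handler(signum, frame):
           cleanup()
           # Escape hatch: restore the default so a second Ctrl+C exits at once.
           signal.signal(signal.SIGINT, signal.SIG_DFL)
           raise KeyboardInterrupt

       signal.signal(signal.SIGINT, handler)


   atexit.register(cleanup)


   def launch_verified(make_cmd, attempts=3, timeout=5.0):
       """Start make_cmd(port) and confirm it listens; retry on a fresh port."""
       for _ in range(attempts):
           port = pick_free_port()
           proc = subprocess.Popen(make_cmd(port), start_new_session=True)
           _MANAGED.append(proc)
           if wait_for_port(port, timeout):
               return proc, port
           _MANAGED.remove(proc)
           kill_group(proc)
       raise RuntimeError(f"no listener after {attempts} attempts")

Releasing the probe socket before the child binds leaves a small window in
which another process can take the port; the retry on a fresh port is what
absorbs that race.
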

Before this check existed, downstream ``wait_for_service`` calls blocked for
their full 30 s timeout against the phantom master, and ``roslaunch`` inside
any subsequent ``launch_gazebo`` xterm started its own rosmaster on an
auto-allocated port. gzserver then registered with that second master, and the
env never found the ``/gazebo/*`` services it expected.

Forceful worker subprocess reap (v0.3.2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:meth:`uniros._proxy.GymProxy.close` now escalates to SIGKILL if the worker
subprocess doesn't exit after the graceful close command and a SIGTERM.
Previously, a worker stuck in a non-responsive state (a PyKDL C call, a Gazebo
XMLRPC retry, or its own ``rospy.Timer`` thread spinning after ``gym.make``
raised mid-init) would survive the parent's ``close()`` and linger as an
orphaned, CPU-burning process. The last-resort SIGKILL guarantees the worker is
reaped within roughly 7 seconds of any ``proxy.close()`` call, regardless of
what state the worker is in.

If you want to clobber every ROS / Gazebo session on the machine (across users
and scripts), use the explicit host-wide helpers:
:func:`multiros.utils.ros_common.kill_all_host_ros_and_gazebo` or
:func:`realros.utils.ros_common.kill_all_host_ros_processes`. The ``host`` in
the name is the warning.

Three supported use cases
-------------------------

The UniROS paper (`Kapukotuwa et al., 2025 `_, Section IX) lays out three
workflows the framework is designed to support. The choice depends on whether
you have a simulator, real hardware, or both, and on whether you want a
sim-trained policy, a real-trained policy, or a generalised policy that
performs in both worlds:

.. list-table::
   :widths: 22 78
   :header-rows: 1

   * - Use case
     - What it looks like
   * - **Real-world only**
     - Train directly on the physical robot via RealROS. Every episode is a
       real episode. Slowest, but there is no reality gap. See
       :doc:`env_creation_real` and :doc:`training`.
   * - **Sim → real transfer**
     - Train under a MultiROS env, save the model, then validate on the
       matching RealROS env with no further updates. Reasonable when the
       simulator is close to reality. See :doc:`using_trained_models`.
   * - **Joint sim + real training**
     - Hold a sim env and a real env open at the same time, sample episodes
       from both, and update one policy from the combined replay buffer. The
       result is a single policy trained on experience from both domains. See
       :doc:`joint_sim_real_training`.

Framework-agnostic policies
---------------------------

The envs this framework produces are **standard gymnasium environments**, so
any reinforcement-learning library that accepts a ``gym.Env`` works: Stable
Baselines 3, CleanRL, Tianshou, RLlib, Tensorforce, or your own training loop.
``uniros.make()`` returns a proxy that behaves like a ``gym.Env`` while running
the underlying env in a worker process, so the downstream training code is
unchanged.

:doc:`/api/sb3_ros_support` is a convenience layer for SB3 users (YAML
hyperparameter loading, ROS-aware paths, HER ready for goal envs). It is **one
option**, not a requirement.

Reading further
---------------

* :doc:`/api/uniros`: the canonical class and shared utilities.
* :doc:`/api/multiros`: Gazebo-side API.
* :doc:`/api/realros`: real-hardware-side API.
* :doc:`/api/sb3_ros_support`: algorithm wrappers (one option for SB3 users;
  vanilla SB3 / CleanRL / Tianshou / RLlib all work).
* :doc:`testing`: how to run the regression test suites.
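
As a final illustration of the *Framework-agnostic policies* section: a
hand-written loop needs nothing beyond gymnasium's ``reset`` / ``step``
contract, in which ``reset`` returns ``(observation, info)`` and ``step``
returns ``(observation, reward, terminated, truncated, info)``. The minimal
sketch below uses a hypothetical ``CoinFlipEnv`` so it runs without gymnasium
or ROS installed; ``run_episode`` is likewise a hypothetical name. In practice
the ``env`` argument would be whatever ``multiros.make()`` / ``realros.make()``
returns.

.. code-block:: python

   # CoinFlipEnv and run_episode are hypothetical illustrations.
   import random


   class CoinFlipEnv:
       """Stand-in that mimics gymnasium's reset/step signatures."""

       def __init__(self, horizon=5):
           self.horizon = horizon
           self._rng = random.Random()
           self._t = 0

       def reset(self, seed=None, options=None):
           if seed is not None:
               self._rng.seed(seed)
           self._t = 0
           return 0, {}  # (observation, info)

       def step(self, action):
           self._t += 1
           reward = 1.0 if action == self._rng.randint(0, 1) else 0.0
           truncated = self._t >= self.horizon  # time limit, not a terminal state
           return self._t, reward, False, truncated, {}


   def run_episode(env, policy, seed=None):
       """Roll out one episode against any gymnasium-style env or proxy."""
       obs, info = env.reset(seed=seed)
       episode_return = 0.0
       while True:
           obs, reward, terminated, truncated, info = env.step(policy(obs))
           episode_return += reward
           if terminated or truncated:
               return episode_return

A learning algorithm slots in by replacing the constant ``policy`` and recording
each transition; the env-facing side stays the same whether the env runs
locally or behind a worker-process proxy.
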