Ecosystem overview

The core framework is intentionally split into four packages: UniROS, MultiROS, RealROS, and sb3_ros_support. Each solves a clearly bounded problem; users compose them depending on whether they’re training in simulation, on hardware, or both.

Two additional application packages ship pre-built envs and working training scripts on top: rl_environments (the gym envs themselves) and rl_training_validation (the training scripts that exercise them). They’re optional — the framework runs without them — but they’re the easiest path to a first running example.

Architecture at a glance

+---------------------------+      +---------------------------+
|        multiros           |      |         realros           |
|  Gazebo simulation envs   |      |  Real-hardware envs       |
|  - launch_gazebo          |      |  - direct controller I/O  |
|  - parallel roscores      |      |  - MoveIt integration     |
|  - physics tuning         |      |                           |
+-------------+-------------+      +-------------+-------------+
              |                                  |
              |  both re-export from             |
              v                                  v
        +------------------------------------------+
        |                  UniROS                  |
        |   uniros._proxy.GymProxy  (canonical)    |
        |   uniros.utils.{ros_markers,             |
        |                 ros_kinematics,          |
        |                 ros_controllers}         |
        |   uniros.utils.ros_common                |
        |     - port allocator                     |
        |     - managed-process registry           |
        |     - on-Ctrl+C cleanup                  |
        +------------------------------------------+
                        ^
                        | trained by
                        |
        +------------------------------------------+
        |             sb3_ros_support              |
        |  Stable Baselines 3 algorithm wrappers   |
        |  PPO  A2C  DDPG  TD3  SAC  DQN           |
        |  + goal-conditioned variants (HER)       |
        +------------------------------------------+

Why four packages?

UniROS is the abstraction layer. It hosts the multiprocessing gym-env proxy class (uniros._proxy.GymProxy) and the ROS utility modules shared by every package. multiros and realros re-export the proxy under their historical names (MultirosGym, RealrosGym) so existing code keeps working; new code can import uniros.GymProxy directly.

multiros focuses on Gazebo: launching simulators on arbitrary ports, spawning multiple parallel envs against the same rosmaster, tuning physics parameters per env. It depends on UniROS for the shared utilities and the proxy class.

realros is the real-world counterpart. Same gym API, but talks to physical robot drivers and (optionally) MoveIt instead of Gazebo. Code written against multiros.make() can in many cases be swapped to realros.make() with only configuration changes.

sb3_ros_support adapts Stable Baselines 3 to ROS-based training scripts. Each algorithm subclass exposes the same training / validation / save / load surface so swapping PPO for SAC for TD3 is a YAML edit, not a code rewrite.

Multiprocessing model

Every gym env created via multiros.make() / realros.make() runs inside a worker multiprocessing.Process. The parent holds a uniros._proxy.GymProxy instance that forwards step / reset / close / attribute access over a multiprocessing.Pipe to the worker.

This buys two important properties:

  1. Isolated rospy state per env. Each worker has its own rospy.init_node and its own callbacks; envs can run in parallel against different rosmasters without bleeding subscriptions into each other.

  2. Crash isolation. A misbehaving worker raises a Python exception, which the framework catches and ships back to the parent as a uniros._proxy._RemoteException. The parent re-raises with the worker’s traceback instead of hanging on the next recv().

Lifecycle / cleanup

The framework tracks the roscores, Gazebo processes, and roslaunch / rosrun xterms that a script spawns, and tears them down on Ctrl+C or normal interpreter exit. The mechanism is layered:

  1. Targeted signal handler. register_managed_process (called internally by launch_roscore / launch_gazebo / ros_launch_launcher / ros_node_launcher) installs a SIGINT handler plus an atexit hook.

  2. rospy shutdown hook. Because rospy.init_node installs its own SIGINT handler, which would otherwise overwrite ours, register_managed_process also registers a callback via rospy.on_shutdown. This is what guarantees that cleanup runs in the common flow of launch_gazebo, then rospy.init_node, then training.

  3. Targeted pkill + process-group kill. Each tracked roscore is killed by its port (pkill -f "roscore -p <port>"), and Gazebo’s gzserver / gzclient PIDs are captured at launch and SIGTERM’d individually. Pre-existing ROS sessions on the host are not affected. As of v0.3.1, xterm wrappers from ros_launch_launcher and ros_node_launcher are also spawned as process-group leaders (start_new_session=True), and cleanup auto-detects session leaders and uses os.killpg(pgid, SIGTERM) to tear down the whole subtree (e.g. an xterm together with the roslaunch, move_group and moveit_python_interface processes beneath it) in one signal. Previously, terminating only the xterm left grandchildren orphaned to init.

  4. Escape hatch. After cleanup runs, SIGINT is reset to SIG_DFL so a subsequent Ctrl+C kills the script immediately even if the training loop is stuck in a non-responsive C call.

Reliable launch verification (v0.3.1)

launch_roscore verifies that the spawned roscore is reachable (a TCP probe to the chosen port) before returning. On failure, it retries with a fresh kernel-allocated port up to three times, then raises RuntimeError. Previously, a silent xterm/roscore failure (port collision, stale process, display issue) would still report “Roscore launched!” and point ROS_MASTER_URI at a phantom master. Downstream wait_for_service calls then blocked for their full 30 s timeout, and roslaunch inside any subsequent launch_gazebo xterm would start its own rosmaster on an auto-allocated port, so gzserver registered there and the env never found the /gazebo/* services it expected.
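The verify-and-retry loop can be sketched as follows, assuming a hypothetical start_server(port) callback in place of spawning roscore in an xterm. Binding to port 0 is the standard way to let the kernel pick a free port, and socket.create_connection serves as the TCP probe. Whether “up to three times” means three retries after the first attempt is an assumption here, exposed as the retries parameter.

```python
import socket
import time


def free_port(host="127.0.0.1"):
    """Let the kernel pick an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def port_is_open(port, host="127.0.0.1", timeout=0.5):
    """TCP probe: True if something is accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def launch_verified(start_server, retries=3, wait_s=10.0, poll_s=0.1):
    """Call start_server(port) on a fresh port until the port answers a TCP probe.

    start_server is a hypothetical callback standing in for
    "spawn roscore -p <port> in an xterm"; its return value is passed back as-is.
    """
    for _attempt in range(1 + retries):
        port = free_port()
        handle = start_server(port)
        deadline = time.monotonic() + wait_s
        while time.monotonic() < deadline:
            if port_is_open(port):
                return port, handle
            time.sleep(poll_s)
        # a real launcher would also kill the failed attempt here before retrying
    raise RuntimeError(f"server never became reachable after {1 + retries} attempts")
```

Probing the port, rather than trusting that the launch command returned, is what catches the phantom-master case. There is also an unavoidable window between free_port() releasing the port and the server binding it, which retrying on a fresh port covers.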

Forceful worker subprocess reap (v0.3.2)

uniros._proxy.GymProxy.close() now escalates to SIGKILL if the worker subprocess doesn’t exit after the graceful close command and a SIGTERM. Previously, a worker stuck in a non-responsive state (a PyKDL C call, a Gazebo XMLRPC retry, or its own rospy.Timer thread still spinning after gym.make raised mid-init) would survive the parent’s close() and linger as an orphaned, CPU-burning process. The new last-resort SIGKILL guarantees the worker is reaped within ~7 seconds of any proxy.close() call, regardless of what state the worker is in.

If you do want to clobber every ROS / Gazebo session on the machine (across users and scripts), use the explicit host-wide helpers: multiros.utils.ros_common.kill_all_host_ros_and_gazebo() or realros.utils.ros_common.kill_all_host_ros_processes(). The word “host” in the name is the warning.

Three supported use cases

The UniROS paper (Kapukotuwa et al., 2025, Section IX) lays out three workflows the framework is designed to support. The choice depends on whether you have a simulator, real hardware, or both — and whether you want a sim-trained policy, a real-trained policy, or a generalised policy that performs in both worlds:

Use case                    What it looks like
--------------------------  ---------------------------------------------------
Real-world only             Train directly on the physical robot via RealROS.
                            Every episode is a real episode. Slowest, but with
                            no reality gap. See Creating a real-hardware
                            environment and Training a model.

Sim → real transfer         Train under a MultiROS env, save the model, and
                            validate it on the matching RealROS env with no
                            further updates. Reasonable when the simulator is
                            close to reality. See Using trained models.

Joint sim + real training   Hold a sim env and a real env open at the same
                            time, sample episodes from both, and update one
                            policy from the combined replay buffer. The result
                            is a single policy that is competent in both
                            domains by construction. See Joint sim + real
                            training.
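The data flow of joint training, where both envs feed one replay buffer and every update samples from the mix, can be sketched without any RL library. Everything here is hypothetical: ToyDomainEnv stands in for the MultiROS and RealROS envs, alternating sim and real episodes is just one possible schedule, and the policy update is left as a comment because it depends on the chosen algorithm.

```python
import random
from collections import deque


class ToyDomainEnv:
    """Hypothetical stand-in for a MultiROS (sim) or RealROS (real) env."""

    def __init__(self, domain, episode_len=5):
        self.domain = domain
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= self.episode_len, False, {}


class SharedReplayBuffer:
    """One FIFO buffer fed by every env; each transition is tagged with its domain."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, domain, obs, action, reward, next_obs, done):
        self.data.append((domain, obs, action, reward, next_obs, done))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def run_episode(env, buffer, policy):
    obs, _ = env.reset()
    while True:
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        buffer.add(env.domain, obs, action, reward, next_obs, terminated)
        obs = next_obs
        if terminated or truncated:
            return


def joint_training(sim_env, real_env, episodes, batch_size=32, seed=0):
    rng = random.Random(seed)
    buffer = SharedReplayBuffer(capacity=10_000)
    policy = lambda obs: 0.0  # placeholder policy
    batches = []
    for ep in range(episodes):
        env = sim_env if ep % 2 == 0 else real_env  # alternate domains
        run_episode(env, buffer, policy)
        batch = buffer.sample(batch_size, rng)      # minibatch mixes both domains
        # a real agent would compute its loss on `batch` and update `policy` here
        batches.append(batch)
    return buffer, batches
```

Tagging transitions with their domain is not required for training, but it makes it easy to check how much of each minibatch came from the real robot.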

Framework-agnostic policies

The envs this framework produces are standard gymnasium environments. Any reinforcement-learning library that accepts a gym.Env works: Stable Baselines 3, CleanRL, Tianshou, RLlib, Tensorforce, or your own training loop. uniros.make() returns a proxy that behaves like gym.Env while running the underlying env in a worker process; the downstream training code is unchanged.
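Because the envs expose the gymnasium interface, training code only needs reset() and step(). The rollout loop below is a minimal sketch that relies solely on gymnasium’s signatures (reset returns (obs, info); step returns (obs, reward, terminated, truncated, info)), so it should run unchanged against a plain gymnasium env or a proxy returned by uniros.make(). GymLike, rollout and CountdownEnv are illustrative names, not part of any package.

```python
from __future__ import annotations

from typing import Any, Callable, Optional, Protocol


class GymLike(Protocol):
    """The slice of the gymnasium.Env interface that a training loop touches."""

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None) -> tuple[Any, dict]: ...

    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]: ...


def rollout(env: GymLike, policy: Callable[[Any], Any], max_steps: int = 1000,
            seed: Optional[int] = None) -> float:
    """Run one episode with `policy` and return the undiscounted return."""
    obs, _info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += float(reward)
        if terminated or truncated:
            break
    return total


class CountdownEnv:
    """Toy env with gymnasium-style signatures; no gymnasium import needed."""

    def __init__(self, n: int = 3):
        self.n = n
        self.left = n

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        self.left = self.n
        return self.left, {}

    def step(self, action):
        self.left -= 1
        return self.left, 1.0, self.left == 0, False, {}
```

rollout(CountdownEnv(3), lambda obs: 0) returns 3.0, and passing gymnasium.make("CartPole-v1") instead requires no change to rollout itself.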

sb3_ros_support is a convenience layer for SB3 users (YAML hyperparameter loading, ROS-aware paths, HER ready for goal envs); see the sb3_ros_support API reference. It is one option, not a requirement.

Reading further