Using trained models

Once you’ve trained a model with Training a model, you can load the saved .zip file and run it against any env that exposes the same observation/action space.

Load and predict

The pattern matches Stable Baselines 3 itself, wrapped to read the same YAML config the training script used:

import rospy
import uniros as gym
import rl_environments

from sb3_ros_support.sac import SAC


rospy.init_node("rx200_validate_sim")
env = gym.make("RX200ReacherSim-v0", gazebo_gui=True)
obs, _ = env.reset()

# Load the trained model. ``save_path`` is the same value you
# used when saving; the .zip extension is added automatically.
model = SAC.load_trained_model(
    "/models/sac/trained_model_name",
    env=env,
    model_pkg="rl_training_validation",
    config_filename="rx200_reacher_sac.yaml",
)

# Run validation episodes
episodes = 50
epi_count = 0
while epi_count < episodes:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        epi_count += 1
        rospy.loginfo(f"Episode {epi_count} / {episodes}")
        obs, _ = env.reset()

env.close()

Validate sim-trained models on real hardware

The framework’s value proposition: an env trained under MyRobotReachSim-v0 can be validated on MyRobotReachReal-v0 without retraining, as long as the action and observation spaces match.

# Bring up the real robot's driver in a separate terminal first:
#   roslaunch my_robot_bringup robot.launch

import rospy
import uniros as gym
import rl_environments

from sb3_ros_support.sac import SAC


rospy.init_node("rx200_validate_real")

# Same env API, different backend
env = gym.make("RX200ReacherReal-v0")
obs, _ = env.reset()

model = SAC.load_trained_model(
    "/models/sac/trained_in_sim",
    env=env,
    model_pkg="rl_training_validation",
    config_filename="rx200_reacher_sac.yaml",
)

# Slow start — confirm the first action is sensible before
# letting the loop run freely.
for _ in range(3):
    action, _ = model.predict(obs, deterministic=True)
    rospy.loginfo(f"First action: {action}")
    obs, _, term, trunc, _ = env.step(action)
    if term or trunc:
        obs, _ = env.reset()

If the policy looks unsafe (joint velocities too high, target positions outside the workspace, etc.), stop the script and:

  • Tighten the action clip in the env’s _set_action().

  • Lower the SB3 action_noise even in validation (deterministic prediction should ignore noise, but double-check the YAML).

  • Retrain with reward shaping that penalises large actions.

Deterministic vs stochastic prediction

model.predict(obs, deterministic=True) returns the policy’s mean action — what you want for validation on hardware.

deterministic=False samples from the policy’s distribution. Useful during training-time evaluation when you want stochastic behaviour, but not when reproducing the trained policy.

Common pitfalls

  • Observation-space mismatch. Loading a model expects the env’s observation_space and action_space to match what was used at training. If you changed the obs space (e.g. added camera channels) you can’t reuse the old model.

  • Goal-env mismatch. A model trained on a goal env (HER) can’t be loaded onto a non-goal env. The classes are different.

  • Config-file changes. The YAML config you pass to load_trained_model must reference compatible policy_kwargs (architecture, normalisation). If you’ve changed the network shape, the load will fail.

  • Different gymnasium ID, same robot. Validating a Real-v0 model on a Sim-v0 env (or vice versa) only works if the action/observation spaces are byte-identical. The framework’s templates aim to keep them that way; verify before loading.