Using trained models
Once you’ve trained a model with Training a model, you can load the
saved .zip file and run it against any env that exposes the
same observation/action space.
Load and predict
The pattern matches Stable Baselines 3 itself, wrapped to read the same YAML config the training script used:
import rospy
import uniros as gym
import rl_environments
from sb3_ros_support.sac import SAC
rospy.init_node("rx200_validate_sim")
env = gym.make("RX200ReacherSim-v0", gazebo_gui=True)
obs, _ = env.reset()
# Load the trained model. ``save_path`` is the same value you
# used when saving; the .zip extension is added automatically.
model = SAC.load_trained_model(
"/models/sac/trained_model_name",
env=env,
model_pkg="rl_training_validation",
config_filename="rx200_reacher_sac.yaml",
)
# Run validation episodes
episodes = 50
epi_count = 0
while epi_count < episodes:
action, _ = model.predict(obs, deterministic=True)
obs, reward, terminated, truncated, info = env.step(action)
if terminated or truncated:
epi_count += 1
rospy.loginfo(f"Episode {epi_count} / {episodes}")
obs, _ = env.reset()
env.close()
Validate sim-trained models on real hardware
The framework’s value proposition: an env trained under
MyRobotReachSim-v0 can be validated on MyRobotReachReal-v0
without retraining, as long as the action and observation spaces
match.
# Bring up the real robot's driver in a separate terminal first:
# roslaunch my_robot_bringup robot.launch
import rospy
import uniros as gym
import rl_environments
from sb3_ros_support.sac import SAC
rospy.init_node("rx200_validate_real")
# Same env API, different backend
env = gym.make("RX200ReacherReal-v0")
obs, _ = env.reset()
model = SAC.load_trained_model(
"/models/sac/trained_in_sim",
env=env,
model_pkg="rl_training_validation",
config_filename="rx200_reacher_sac.yaml",
)
# Slow start — confirm the first action is sensible before
# letting the loop run freely.
for _ in range(3):
action, _ = model.predict(obs, deterministic=True)
rospy.loginfo(f"First action: {action}")
obs, _, term, trunc, _ = env.step(action)
if term or trunc:
obs, _ = env.reset()
If the policy looks unsafe (joint velocities too high, target positions outside the workspace, etc.), stop the script and:
Tighten the action clip in the env’s
_set_action().Lower the SB3
action_noiseeven in validation (deterministic prediction should ignore noise, but double-check the YAML).Retrain with reward shaping that penalises large actions.
Deterministic vs stochastic prediction
model.predict(obs, deterministic=True) returns the policy’s
mean action — what you want for validation on hardware.
deterministic=False samples from the policy’s distribution.
Useful during training-time evaluation when you want stochastic
behaviour, but not when reproducing the trained policy.
Common pitfalls
Observation-space mismatch. Loading a model expects the env’s
observation_spaceandaction_spaceto match what was used at training. If you changed the obs space (e.g. added camera channels) you can’t reuse the old model.Goal-env mismatch. A model trained on a goal env (HER) can’t be loaded onto a non-goal env. The classes are different.
Config-file changes. The YAML config you pass to
load_trained_modelmust reference compatiblepolicy_kwargs(architecture, normalisation). If you’ve changed the network shape, the load will fail.Different gymnasium ID, same robot. Validating a
Real-v0model on aSim-v0env (or vice versa) only works if the action/observation spaces are byte-identical. The framework’s templates aim to keep them that way; verify before loading.