Using trained models ==================== Once you've trained a model with :doc:`training`, you can load the saved ``.zip`` file and run it against any env that exposes the same observation/action space. Load and predict ---------------- The pattern matches Stable Baselines 3 itself, wrapped to read the same YAML config the training script used: .. code-block:: python import rospy import uniros as gym import rl_environments from sb3_ros_support.sac import SAC rospy.init_node("rx200_validate_sim") env = gym.make("RX200ReacherSim-v0", gazebo_gui=True) obs, _ = env.reset() # Load the trained model. ``save_path`` is the same value you # used when saving; the .zip extension is added automatically. model = SAC.load_trained_model( "/models/sac/trained_model_name", env=env, model_pkg="rl_training_validation", config_filename="rx200_reacher_sac.yaml", ) # Run validation episodes episodes = 50 epi_count = 0 while epi_count < episodes: action, _ = model.predict(obs, deterministic=True) obs, reward, terminated, truncated, info = env.step(action) if terminated or truncated: epi_count += 1 rospy.loginfo(f"Episode {epi_count} / {episodes}") obs, _ = env.reset() env.close() Validate sim-trained models on real hardware -------------------------------------------- The framework's value proposition: an env trained under ``MyRobotReachSim-v0`` can be validated on ``MyRobotReachReal-v0`` without retraining, as long as the action and observation spaces match. .. code-block:: python # Bring up the real robot's driver in a separate terminal first: # roslaunch my_robot_bringup robot.launch import rospy import uniros as gym import rl_environments from sb3_ros_support.sac import SAC rospy.init_node("rx200_validate_real") # Same env API, different backend env = gym.make("RX200ReacherReal-v0") obs, _ = env.reset() model = SAC.load_trained_model( "/models/sac/trained_in_sim", env=env, model_pkg="rl_training_validation", config_filename="rx200_reacher_sac.yaml", ) # Slow start — confirm the first action is sensible before # letting the loop run freely. for _ in range(3): action, _ = model.predict(obs, deterministic=True) rospy.loginfo(f"First action: {action}") obs, _, term, trunc, _ = env.step(action) if term or trunc: obs, _ = env.reset() If the policy looks unsafe (joint velocities too high, target positions outside the workspace, etc.), stop the script and: * Tighten the action clip in the env's ``_set_action()``. * Lower the SB3 ``action_noise`` even in validation (deterministic prediction should ignore noise, but double-check the YAML). * Retrain with reward shaping that penalises large actions. Deterministic vs stochastic prediction -------------------------------------- ``model.predict(obs, deterministic=True)`` returns the policy's mean action — what you want for validation on hardware. ``deterministic=False`` samples from the policy's distribution. Useful during training-time evaluation when you want stochastic behaviour, but not when reproducing the trained policy. Common pitfalls --------------- * **Observation-space mismatch.** Loading a model expects the env's ``observation_space`` and ``action_space`` to match what was used at training. If you changed the obs space (e.g. added camera channels) you can't reuse the old model. * **Goal-env mismatch.** A model trained on a goal env (HER) can't be loaded onto a non-goal env. The classes are different. * **Config-file changes.** The YAML config you pass to ``load_trained_model`` must reference compatible ``policy_kwargs`` (architecture, normalisation). If you've changed the network shape, the load will fail. * **Different gymnasium ID, same robot.** Validating a ``Real-v0`` model on a ``Sim-v0`` env (or vice versa) only works if the action/observation spaces are byte-identical. The framework's templates aim to keep them that way; verify before loading.