Creating a simulation environment

This guide walks through adding a new Gazebo-based environment end-to-end, from picking a starting template to registering the gymnasium ID and running a smoke test.

If you’re new to the two-layer pattern, read Environment templates first.

1. Pick a starting point

  • New robot, new task — copy both layers from the MultiROS templates (MyRobotEnv.py + MyTaskEnv.py). You’ll be doing the most work.

  • Existing robot, new task — copy only the task layer (MyTaskEnv.py / MyTaskGoalEnv.py) and inherit from the existing robot env.

  • Reach-like task — start from rl_environments/src/rl_environments/ rx200/sim/task_envs/reach/rx200_reach_sim.py and adapt.

Goal-conditioned tasks (suitable for HER) use the ...GoalEnv.py templates and the multiros.envs.GazeboGoalEnv.GazeboGoalEnv base class.

2. Implement the robot env

The robot env extends multiros.envs.GazeboBaseEnv.GazeboBaseEnv (or its goal variant). The base class handles the lifecycle — spawning the robot model in Gazebo, pausing/unpausing physics, loading ROS controllers — so your job is to fill in:

  • _check_if_done() (often unused at this layer; see below).

  • _set_episode_init_params() — robot-side reset (move to home pose, wait for controllers, etc.).

  • Optional helpers for the task layer (e.g. get_ee_pose(), move_to_joint_target(joint_array)).

Spawning is configured via the base class’s __init__ parameters. A typical robot env constructor looks like:

class MyRobotSim(GazeboBaseEnv):
    def __init__(self, **kwargs):
        super().__init__(
            robot_name="my_robot",
            robot_model_xml=load_urdf("my_robot_description"),
            robot_pos_x=0.0,
            robot_pos_y=0.0,
            robot_pos_z=0.0,
            robot_ori_w=1.0,        # NOT 0.0 — see env_templates
            ros_controllers_list=["my_arm_controller", ...],
            **kwargs,
        )

3. Implement the task env

The task env extends your robot env and defines the gymnasium contract. Required overrides:

  • _get_obs() — return the observation array (or dict for goal envs).

  • _set_action(action) — apply the action to the robot.

  • _check_if_done() — terminal-condition check.

  • _compute_reward(achieved_goal, desired_goal, info) for goal envs, or _get_reward() for non-goal envs.

  • _set_init_pose() — task-side reset (e.g. randomise target, reset object positions).

The base class also wants you to declare observation_space and action_space in __init__. Use gymnasium.spaces.Box (with seed= for reproducibility) or Dict for goal envs.

For reward design tips, examine rl_environments/src/rl_environments/rx200/sim/task_envs/reach/ rx200_reach_sim.py — it shows distance-to-target reward with optional sparse/dense modes.

4. Register with gymnasium

In your package’s __init__.py:

from gymnasium.envs.registration import register

register(
    id="MyRobotReachSim-v0",
    entry_point="my_pkg.task_envs.my_robot_reach_sim:MyRobotReachSim",
    max_episode_steps=200,
)

5. Smoke test

Without involving SB3, confirm the env runs end-to-end:

import rospy
from multiros.utils import gazebo_core
import uniros as gym
import my_pkg                   # triggers gymnasium.register

gazebo_core.launch_gazebo(launch_roscore=True, gui=True)
rospy.init_node("smoke_test")

env = gym.make("MyRobotReachSim-v0")
obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, term, trunc, info = env.step(env.action_space.sample())
    if term or trunc:
        obs, info = env.reset()
env.close()

If the env runs cleanly here, plug it into a training script — see Training a model.

Configuration via YAML

Production envs typically read their hyperparameters (timesteps, reward coefficients, action limits) from a YAML file under config/<env>_config.yaml. Look at rl_environments/config/rx200_reach_task_config.yaml for a template. Loading the file:

from multiros.utils.ros_common import ros_load_yaml
ros_load_yaml(pkg="my_pkg", file="config/my_task_config.yaml")
self.reward_coefficient = rospy.get_param("my_task/reward_coefficient")

Common pitfalls

  • Gazebo physics multiplier. If you set gazebo_update_rate_multiplier above 1.0 in your config, the simulator runs faster than wall-clock — which breaks any agent loop paced by wall time. Keep it at 1.0 unless you’ve consciously redesigned the timing.

  • Quaternion default. robot_ori_w=0.0 is a zero quaternion; use 1.0 for identity.

  • Seed leakage. Use self.np_random (populated by the base class via super().reset(seed=seed)), not np.random.

  • Stuck on reset. If reset hangs forever, check that your ROS controllers actually report running and that wait_for_service calls have timeouts.