Creating a simulation environment (MuJoCo)
Available (experimental)
The MuJoCo backend is available but experimental; it is usable today but not yet in a stable release, and APIs and defaults may change. For the stable Gazebo workflow see Creating a simulation environment; for the backend overview and install steps see MuJoCo backend (experimental).
This guide walks through adding a new MuJoCo-based environment end-to-end. It mirrors the Gazebo guide (Creating a simulation environment) — the two-layer robot-env / task-env pattern is identical — and focuses on the pieces that are specific to MuJoCo: the MJCF scene, stripping URDF transmissions the model does not have, and (for manipulation tasks) objects as free-joint bodies.
The worked example throughout is the vx300s_mujoco_envs package
(github.com/ncbdrck/vx300s_mujoco_envs), which implements
the ViperX-300 S reach, push, and pick-and-place tasks on
the MuJoCo backend.
1. Pick a starting point
New robot, new task — copy both layers from the MultiROS MuJoCo templates (
MyMujocoRobotEnv.py+MyMujocoTaskEnv.py).Existing robot, new task — copy only the task layer and inherit from the existing robot env (this is how
vx300s_mujoco_envs’s push task extends the same robot env the reach task uses).Reach-like task — start from the
vx300s_mujoco_envsreach task and adapt.
Goal-conditioned tasks (suitable for HER) use the ...GoalEnv.py
templates and the multiros.envs.MujocoGoalEnv.MujocoGoalEnv base
class.
The MuJoCo tooling mirrors the Gazebo tooling one-to-one
(mujoco_core / mujoco_models / mujoco_physics /
MujocoBaseEnv), so if you have built a Gazebo env the structure
will be familiar. See MuJoCo backend (experimental) for the full mapping.
2. Build the MJCF scene
This step has no Gazebo equivalent. Gazebo spawns models into a running world; MuJoCo loads a single MJCF/XML scene when the server starts. You provide that scene as a package asset.
Start from the robot’s MJCF (e.g. the MuJoCo Menagerie model). Remove the native
<actuator>block — joints are driven throughmujoco_ros_control, and leaving the MJCF position actuators in place would fight the controller.Compose scenes with
<include>.vx300s_mujoco_envsuses a layered set:vx300s.xml(arm only) →vx300s_table_scene.xml(arm + table + floor) →vx300s_push_scene.xml(table scene + a cube). Reuse keeps the arm/table identical across tasks.Mount the arm and the work surface consistently. In the example the arm base is at the world origin and the table top is at
z = 0, so the arm’s kinematics and the task goal sampling do not change when the table is added; the table slab extends downward and the ground plane is dropped to the foot of its legs.
Validate the scene compiles before wiring anything else:
import mujoco
mujoco.MjModel.from_xml_path("assets/<robot>_mjcf/<scene>.xml")
3. Implement the robot env
The robot env extends multiros.envs.MujocoBaseEnv.MujocoBaseEnv
(or its goal variant). The base class handles the lifecycle — bringing
up the robot description, the robot_state_publisher and the ROS
controllers, and pausing/unpausing (or deterministically stepping) the
simulation — so your job is to fill in robot-side helpers
(get_ee_pose(), move_arm_joints(...), forward kinematics, etc.)
and the robot-side reset.
Two MuJoCo-specific constructor arguments matter:
controlled_joints— the joints your task drives. The backend keeps only these joints’<transmission>elements inrobot_descriptionand strips the rest. This is required becausemujoco_ros_controlaborts if the URDF declares a transmission for a joint that is not in the MJCF (manufacturer URDFs commonly include a gripper or mimic fingers the MJCF omits).sim_step_mode/num_mujoco_steps— select the stepping regime (see step 6).
A typical robot env constructor forwards these to the base class:
class MyMujocoRobotEnv(MujocoBaseEnv):
def __init__(self, **kwargs):
super().__init__(
spawn_robot=load_robot,
urdf_pkg_name="my_robot_description",
urdf_file_name="my_robot.urdf.xacro",
controllers_file="my_robot_control.yaml",
controllers_list=["joint_state_controller", "arm_controller"],
controller_package_name="my_mujoco_envs",
controlled_joints=["joint1", "joint2", "joint3", ...],
**kwargs,
)
4. Implement the task env
The task env extends your robot env and defines the gymnasium contract. Required overrides mirror the Gazebo guide:
_set_action(action)— apply the action to the robot._get_observation()— return the observation array (or dict for goal envs)._get_reward()(orcompute_reward(...)for goal envs)._compute_terminated()/_compute_truncated()._set_init_params()— task-side reset (randomise the target, reposition objects).
Declare observation_space and action_space in __init__
with gymnasium.spaces.Box (pass seed= for reproducibility), or
Dict for goal envs.
Objects (push / pick-and-place). Add the object to the MJCF scene
as a body with a <freejoint/> so it moves under contact, then in
the task env reposition it each episode and read its pose for the
observation. There is no per-episode spawn/delete; you move the
existing body via the set_body_state service:
from multiros.utils import mujoco_models
# episode reset: place the cube in its spawn region
mujoco_models.mujoco_set_body_state(
body_name="cube",
pos_x=x, pos_y=y, pos_z=z,
set_pose=True, set_twist=True, # set_twist zeroes its velocity
server_name=self.server_name)
# observation: read the cube pose back
_, pose, _, ok = mujoco_models.mujoco_get_body_state(
body_name="cube", server_name=self.server_name)
This is the MuJoCo analogue of Gazebo’s set_model_state /
get_model_state. Keep the object’s observation bounds wider
than its spawn region — a pushed object travels into the goal
region, and the observation must still contain it.
5. Register with gymnasium
Register the env when its module is imported (the example does this at the top of each task module):
from gymnasium.envs.registration import register
register(
id="MyMujocoReachSim-v0",
entry_point="my_mujoco_envs.task_envs.reach.my_mujoco_reach:MyMujocoReachEnv",
max_episode_steps=100,
)
6. Smoke test
The env can bring the whole stack up itself (launch_mujoco=True,
new_roscore=True, load_robot=True — all default True), so a
smoke test is a single process:
import uniros as gym
import my_mujoco_envs # triggers gymnasium.register
env = gym.make("MyMujocoReachSim-v0") # self-launches roscore + server + controllers
obs, info = env.reset(seed=0)
for _ in range(100):
obs, reward, term, trunc, info = env.step(env.action_space.sample())
if term or trunc:
obs, info = env.reset()
env.close()
To attach to a stack you started separately (useful while debugging
the bring-up), launch the package’s launch file first, then construct
the env with launch_mujoco=False, new_roscore=False,
load_robot=False.
For manipulation tasks, random joint actions rarely bring the end-effector to the object, so they do not exercise contact. Add a short scripted “poke” (IK the end-effector through the object and check it moves) to confirm contact and the free-joint dynamics independently of the policy. Once the env runs cleanly, plug it into a training script — see Training a model.
7. Stepping regimes
MujocoBaseEnv supports three regimes, selected per task:
Real-time (default) — the UniROS paper real-time loop; physics is never paused and a timer refreshes observations. Matches the real robot, so policies transfer.
Paused MDP (
realtime_mode=False) — physics runs only during the action window, then pauses; every sample is the post-action world state.Deterministic fast-step (
sim_step_mode=2) — the simulation is advanced explicitly bynum_mujoco_stepswith no wall-clock sleep, so training runs faster than real time.
Configuration via YAML
As with the Gazebo envs, keep task parameters (action limits, object
regions, reward coefficients, tolerances) in a YAML file under
config/ and load it at construction:
from multiros.utils.ros_common import ros_load_yaml
ros_load_yaml(pkg_name="my_mujoco_envs",
file_name="my_task_config.yaml", ns="/")
self.reach_tolerance = rospy.get_param("/my_robot/reach_tolerance")
Keep the env package self-contained: ship its own task and training
configs rather than importing them from another package, so it can be
used with just MultiROS + sb3_ros_support.
Common pitfalls
Transmission mismatch. If
mujoco_ros_controllogs “Could not initialize robot simulation interface”, the URDF declares a transmission for a joint absent from the MJCF. Passcontrolled_joints(env) and use the launch helper$(find multiros)/scripts/mujoco_filtered_description.py(launch files) so only the driven joints’ transmissions are kept.use_sim_time and resets. The MuJoCo server publishes
/clock; a reset rewinds it, so anyrospy.sleepstraddling a reset must tolerateROSTimeMovedBackwardsException. The backend already guards its own sleeps; do the same in task code that sleeps.Degenerate observation bounds. A
Boxdimension whose min equals max (e.g. an object’s z fixed to the table height) trips the gymnasium checker. Give every dimension a real range.Object obs vs spawn region. Bound the object’s observation to cover spawn + goal + overshoot, not just the spawn region, or a pushed object leaves the observation space.
Safety floor for table tasks. The per-link floor check is shared with the Gazebo envs. Its defaults suit reach (which never approaches the surface); for push/pick-and-place tune
table_z/safety_z_marginin the task config so the arm can get low enough to contact the object without the action being rejected.Seed leakage. Use
self.np_random(populated by the base class viasuper().reset(seed=seed)), notnp.random.