Environment templates

Both MultiROS and RealROS ship template files you can copy as the starting point for a new environment. Each follows the two-layer Robot env → Task env pattern that the framework expects.

The two-layer pattern

Reinforcement-learning environments are split into two responsibilities:

Robot env — the layer that knows about the robot.

  • What controllers it exposes (joint position, joint velocity, end-effector pose).

  • How to read its joint states / sensors.

  • How to wait for it to be ready.

This layer is reusable across tasks: the same robot env can back a reach task, a push task, a pick-and-place task.

Task env — the layer that knows about the task.

  • The Gymnasium observation_space and action_space.

  • The reward function.

  • When an episode terminates.

  • What “reset to a sensible starting state” means for this task.

It builds on the robot env by inheriting from it, and adds task-specific behaviour on top.
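
The split described above can be sketched in plain Python. This is an illustration only, not the content of the shipped templates: the class names SketchRobotEnv and SketchReachTaskEnv and all of their methods are hypothetical, and the ROS calls and Gymnasium spaces are left out so the example runs on its own.

```python
import math


class SketchRobotEnv:
    """Robot layer (hypothetical stand-in): controllers, sensors, readiness."""

    def __init__(self):
        # A real robot env would subscribe to joint-state topics here;
        # this sketch keeps a two-joint state in memory instead.
        self.joint_positions = [0.0, 0.0]

    def wait_until_ready(self):
        # Placeholder for blocking until controllers and sensors are up.
        return True

    def get_joint_positions(self):
        return list(self.joint_positions)

    def move_joints(self, targets):
        # Placeholder for sending a joint position command.
        self.joint_positions = [float(t) for t in targets]


class SketchReachTaskEnv(SketchRobotEnv):
    """Task layer (hypothetical): reward, termination and reset for one task."""

    def __init__(self, goal=(0.5, -0.5), tolerance=0.05):
        super().__init__()
        self.goal = list(goal)
        self.tolerance = tolerance

    def reset(self):
        # "A sensible starting state" for this task: all joints at zero.
        self.wait_until_ready()
        self.move_joints([0.0, 0.0])
        return self.get_joint_positions()

    def step(self, action):
        # The task layer only uses robot-layer methods to act and observe.
        self.move_joints(action)
        distance = math.dist(self.get_joint_positions(), self.goal)
        reward = -distance
        terminated = distance < self.tolerance
        return self.get_joint_positions(), reward, terminated


env = SketchReachTaskEnv()
obs = env.reset()
obs, reward, terminated = env.step([0.5, -0.5])
```

Because the task class touches the robot only through the robot-layer methods, a push or pick-and-place task could subclass the same robot class and change only the reward, termination and reset logic.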

The base classes that wire the two layers together live in the framework.

Template files

MultiROS (simulation)

Located at multiros/src/multiros/templates/:

  • robot_envs/MyRobotEnv.py — non-goal robot env template.

  • robot_envs/MyRobotGoalEnv.py — goal-conditioned robot env.

  • task_envs/MyTaskEnv.py — non-goal task env.

  • task_envs/MyTaskGoalEnv.py — goal-conditioned task env (use with HER algorithms via sb3_ros_support).

RealROS (real hardware)

Located at realros/src/realros/templates/:

  • robot_envs/MyRealRobotEnv.py

  • robot_envs/MyRealRobotGoalEnv.py

  • task_envs/MyRealTaskEnv.py

  • task_envs/MyRealTaskGoalEnv.py

Common pitfalls

  • Quaternion default. The base envs default to robot_ori_w=1.0 (identity orientation). Don’t pass 0.0: with the other components at zero, that gives the zero quaternion, which is not a valid rotation. Gazebo will normalise it, but the behaviour is not portable across physics engines.

  • Seeded RNG. Use self.np_random (populated by super().reset(seed=seed)) for any random sampling. Don’t use the global np.random — that’s not seedable from the env API and breaks reproducibility.

  • MoveIt timeouts. rospy.wait_for_service calls in the framework already have 30 s timeouts (see uniros.utils.ros_controllers). If you add your own service calls in a task env, follow the same pattern.
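
The first two pitfalls can be illustrated with a short sketch. It makes several assumptions: quaternions are ordered (x, y, z, w), normalise_quaternion is a hypothetical helper, and SeededEnvSketch is a stand-in that only imitates how Gymnasium's Env.reset(seed=seed) reseeds self.np_random, so the example runs without ROS or Gymnasium installed.

```python
import numpy as np


def normalise_quaternion(q):
    """Scale q to unit length; reject the zero quaternion (hypothetical helper)."""
    q = np.asarray(q, dtype=float)
    norm = np.linalg.norm(q)
    if norm == 0.0:
        # There is no direction to scale, so no rotation can be recovered.
        raise ValueError("zero quaternion does not encode a rotation")
    return q / norm


class SeededEnvSketch:
    """Stand-in for an env whose reset(seed=...) reseeds self.np_random."""

    def __init__(self):
        self.np_random = np.random.default_rng()

    def reset(self, seed=None):
        if seed is not None:
            self.np_random = np.random.default_rng(seed)
        # Sample the start state from the env-owned generator,
        # never from the global np.random module.
        return self.np_random.uniform(-1.0, 1.0, size=3)


# Identity orientation: x = y = z = 0, w = 1 (assumed component order).
identity = normalise_quaternion([0.0, 0.0, 0.0, 1.0])

# Two envs reset with the same seed sample the same start state.
env_a, env_b = SeededEnvSketch(), SeededEnvSketch()
start_a = env_a.reset(seed=42)
start_b = env_b.reset(seed=42)
```

Passing [0.0, 0.0, 0.0, 0.0] to normalise_quaternion raises ValueError, which is the failure a stricter physics engine might surface. In a real task env, super().reset(seed=seed) takes the place of the manual reseeding shown here.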