diff --git a/docs/source/overview/gym/index.rst b/docs/source/overview/gym/index.rst
index e72a20fb..61deac36 100644
--- a/docs/source/overview/gym/index.rst
+++ b/docs/source/overview/gym/index.rst
@@ -1,26 +1,195 @@
 Gym
-===================
+===
 
 .. currentmodule:: embodichain.lab.gym
 
-The ``gym`` module provides a comprehensive framework for creating robot learning environments. It extends the Gymnasium interface to support multi-environment parallel execution, custom observations, and robotic-specific functionality.
+The ``embodichain.lab.gym`` module is the environment layer that turns
+simulation scenes into robot-learning tasks. It follows the Gymnasium interface
+while adding vectorized simulation, declarative scene configuration, manager
+systems for task logic, and tensor-based observations and actions for learning
+pipelines.
+
+The design mirrors the separation used in modern robot-learning frameworks such
+as Isaac Lab: simulation objects live in the scene, task behavior is composed
+through managers, and the environment exposes a stable ``reset`` / ``step`` API
+to reinforcement learning, imitation learning, data collection, and evaluation
+code.
 
 Environment Classes
 -------------------
 
-Base Environments
-~~~~~~~~~~~~~~~~~
+:class:`~embodichain.lab.gym.envs.base_env.BaseEnv` is the common
+Gymnasium-compatible foundation. It owns the
+:class:`~embodichain.lab.sim.SimulationManager`, manages the number of parallel
+environments, configures timing through ``sim_steps_per_control``, tracks
+episode length, exposes batched observation and action spaces, and defines the
+standard environment lifecycle.
+
+:class:`~embodichain.lab.gym.envs.embodied_env.EmbodiedEnv` builds on
+:class:`~embodichain.lab.gym.envs.base_env.BaseEnv` for configuration-driven
+embodied tasks. A
+single :class:`~embodichain.lab.gym.envs.embodied_env.EmbodiedEnvCfg` declares
+the robot, sensors, lights, background objects, interactive objects,
+articulations, and manager configs. The environment constructs the simulation
+scene from that config and then delegates task-specific behavior to managers and
+functors.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 24 38 38
+
+   * - Class
+     - Role
+     - Use it when
+   * - :class:`~embodichain.lab.gym.envs.base_env.BaseEnv`
+     - Provides Gymnasium compatibility, vectorized arena control, simulation
+       timing, spaces, reset, and step bookkeeping.
+     - You need a custom environment base with direct control over scene setup
+       and task methods.
+   * - :class:`~embodichain.lab.gym.envs.base_env.EnvCfg`
+     - Configures common environment settings such as ``num_envs``, simulation
+       config, seed, control frequency, and episode length.
+     - You are defining shared runtime parameters for any environment.
+   * - :class:`~embodichain.lab.gym.envs.embodied_env.EmbodiedEnv`
+     - Adds declarative scene creation and manager-based actions, events,
+       observations, rewards, and dataset recording.
+     - You are building robot manipulation, RL, IL, or data-generation tasks.
+   * - :class:`~embodichain.lab.gym.envs.embodied_env.EmbodiedEnvCfg`
+     - Declares the robot, controlled parts, sensors, objects, articulations,
+       lights, managers, and task extension fields.
+     - You want the task definition to live primarily in configuration.
+
+Architecture
+------------
+
+The Gym module sits above the simulation module and below training or data
+collection code:
+
+.. code-block:: text
+
+   RL trainer / IL recorder / evaluation script
+           |
+           v
+   Gymnasium API: reset(), step(action), observation_space, action_space
+           |
+           v
+   BaseEnv
+     |-- SimulationManager ownership
+     |-- vectorized arenas and timing
+     |-- episode state and auto reset
+     `-- batched observation/action spaces
+           |
+           v
+   EmbodiedEnv
+     |-- scene config: robot, sensors, objects, lights, articulations
+     |-- ActionManager: policy action -> robot command
+     |-- EventManager: startup, reset, and interval behavior
+     |-- ObservationManager: add or modify observation terms
+     |-- RewardManager: weighted scalar reward terms
+     `-- DatasetManager: episode recording and export
+
+This layering keeps task code small. A task can define only the scene
+configuration, reward or termination logic, and any task-specific parameters
+while relying on managers for reusable behavior.
+
+Manager and Functor Pattern
+---------------------------
+
+Managers are configured collections of functors. A functor is either a function
+or a callable class that receives the environment and optional parameters, then
+performs one well-scoped operation. The config objects identify the callable and
+its parameters, and
+:class:`~embodichain.lab.gym.envs.managers.cfg.SceneEntityCfg` resolves named
+robots, objects, joints, links, or bodies from the simulation scene.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 22 38 40
+
+   * - Manager
+     - Responsibility
+     - Typical use
+   * - Action manager
+     - Converts raw policy actions into robot commands such as joint targets,
+       velocity commands, force commands, or end-effector targets.
+     - RL policies that output normalized or task-space actions.
+   * - Event manager
+     - Applies startup, reset, and interval behaviors.
+     - Domain randomization, object placement, visual changes, and scripted
+       scene updates.
+   * - Observation manager
+     - Adds task-specific observations or modifies existing nested observation
+       entries.
+     - Object poses, target states, normalized proprioception, sensor-derived
+       terms, and keypoint projections.
+   * - Reward manager
+     - Evaluates weighted reward functors and sums them into the environment
+       reward.
+     - Distance, alignment, success, smoothness, and penalty terms for RL.
+   * - Dataset manager
+     - Records episode data through dataset functors.
+     - Imitation-learning demonstrations, LeRobot export, and offline dataset
+       generation.
+
+Typical Step Flow
+-----------------
+
+At runtime, an ``EmbodiedEnv`` step usually follows this high-level sequence:
+
+1. Receive a batched action from a policy, script, or teleoperation source.
+2. Use the Action Manager to convert it into robot control targets.
+3. Step the simulation for ``sim_steps_per_control`` physics steps.
+4. Update sensors and collect base observations from the robot and scene.
+5. Apply Observation Manager terms to add or transform observation entries.
+6. Evaluate rewards, success, failure, timeout, and reset conditions.
+7. Record transition data through the Dataset Manager when configured.
+8. Return Gymnasium-compatible ``obs``, ``reward``, ``terminated``,
+   ``truncated``, and ``info`` values.
+
+Task Authoring Workflow
+-----------------------
+
+For most new tasks, start from
+:class:`~embodichain.lab.gym.envs.embodied_env.EmbodiedEnv`:
+
+1. Define an
+   :class:`~embodichain.lab.gym.envs.embodied_env.EmbodiedEnvCfg` subclass with
+   the robot, objects, sensors, lights, and manager configs.
+2. Register the task with ``register_env`` so it can be constructed by ID.
+3. Configure actions for RL tasks, or dataset recording for IL tasks.
+4. Add observation, reward, and event functors instead of hard-coding reusable
+   logic in the environment class.
+5. Keep custom environment methods focused on task-specific success, failure,
+   reset, or demonstration behavior.
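Step 4 of the workflow above favors functors over logic hard-coded in the environment class. The following is a minimal, framework-independent sketch of that idea for rewards: a manager holds a list of weighted callables and sums their per-environment outputs. Every name in it (``ToyEnv``, ``RewardTermCfg``, ``ToyRewardManager``, and the two penalty functions) is hypothetical and chosen for illustration only; none of them is the EmbodiChain API, and plain Python lists stand in for batched tensors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyEnv:
    """Hypothetical stand-in for a vectorized env: one list entry per parallel env."""
    distance_to_goal: List[float]
    action_norm: List[float]


@dataclass
class RewardTermCfg:
    """Hypothetical term config: the callable to run, its weight, and its parameters."""
    func: Callable[..., List[float]]
    weight: float = 1.0
    params: Dict[str, float] = field(default_factory=dict)


def distance_penalty(env: ToyEnv, scale: float = 1.0) -> List[float]:
    """Functor: negative scaled distance to the goal, one value per env."""
    return [-scale * d for d in env.distance_to_goal]


def action_penalty(env: ToyEnv) -> List[float]:
    """Functor: negative action magnitude, one value per env."""
    return [-a for a in env.action_norm]


class ToyRewardManager:
    """Evaluates each configured functor and accumulates weight * value per env."""

    def __init__(self, terms: List[RewardTermCfg]) -> None:
        self.terms = terms

    def compute(self, env: ToyEnv) -> List[float]:
        total = [0.0] * len(env.distance_to_goal)
        for term in self.terms:
            values = term.func(env, **term.params)
            total = [t + term.weight * v for t, v in zip(total, values)]
        return total


# Two parallel environments, two weighted reward terms.
env = ToyEnv(distance_to_goal=[0.5, 2.0], action_norm=[1.0, 0.0])
manager = ToyRewardManager([
    RewardTermCfg(func=distance_penalty, weight=2.0, params={"scale": 1.0}),
    RewardTermCfg(func=action_penalty, weight=0.5),
])
rewards = manager.compute(env)
print(rewards)
```

Because each term is only a callable plus a weight and parameters, a task in this sketch can change its reward by editing the list of term configs and leave the environment class untouched. That is the property the workflow relies on.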
+
+Choosing Where to Start
+-----------------------
+
+- Start with :doc:`env` for the full
+  :class:`~embodichain.lab.gym.envs.embodied_env.EmbodiedEnv` configuration and
+  custom task guide.
+- Use :doc:`action_functors` when connecting policy outputs to robot control.
+- Use :doc:`event_functors` for reset randomization, visual randomization, and
+  scene perturbations.
+- Use :doc:`observation_functors` to add task observations without changing base
+  environment code.
+- Use :doc:`reward_functors` when composing RL reward terms.
+- Use :doc:`dataset_functors` when recording demonstrations or exporting
+  datasets.
 
-- :class:`envs.BaseEnv` - Foundational environment class that provides core functionality for all EmbodiChain RL environments
-- :class:`envs.EnvCfg` - Configuration class for basic environment settings
+Documentation Quality Notes
+---------------------------
 
-Embodied Environments
-~~~~~~~~~~~~~~~~~~~~~
+Gym documentation should make the runtime contract explicit: tensor shapes,
+whether data is batched by ``num_envs``, which manager mode invokes a functor,
+and which scene entity each functor expects. Prefer linking to the simulation
+overview for asset and sensor details, and keep task pages focused on the
+environment lifecycle, manager configuration, and learning interface.
 
-- :class:`envs.EmbodiedEnv` - Advanced environment class for complex Embodied AI tasks with configuration-driven architecture
-- :class:`envs.EmbodiedEnvCfg` - Configuration class for Embodied Environments
+See Also
+--------
 
 .. toctree::
    :maxdepth: 1
 
-   env.md
\ No newline at end of file
+   env.md
diff --git a/docs/source/overview/sim/index.rst b/docs/source/overview/sim/index.rst
index 60cdfd56..9025771b 100644
--- a/docs/source/overview/sim/index.rst
+++ b/docs/source/overview/sim/index.rst
@@ -1,22 +1,140 @@
-Simulation Framework
-==================
+Simulation Framework
+====================
 
-Overview of the Simulation Framework:
+.. currentmodule:: embodichain.lab.sim
 
-- Architecture
+The ``embodichain.lab.sim`` module is the runtime layer that connects robot
+assets, physics, rendering, sensing, kinematics, and motion generation. It is
+designed around a small set of composable components: a
+:class:`SimulationManager` owns the simulation lifecycle, asset classes
+represent objects in the scene, sensors produce batched observations, solvers
+convert between joint space and task space, planners generate feasible
+trajectories, and atomic actions package common manipulation skills.
 
-- Components
-
-  - Simulation Manager
-  - Simulation Assets
-  - Virtual Sensor
-  - Kinematics Solver
-  - Motion Generation
+Like EmbodiChain's environment and learning modules, the simulation framework is
+configuration driven. Scene elements are declared through config classes, spawned
+through the manager, and then stepped explicitly in a simulation loop. This makes
+the same scene description usable for interactive visualization, scripted data
+generation, and vectorized robot-learning environments.
+
+Architecture
+------------
+
+The simulation stack can be read from the bottom up:
+
+.. code-block:: text
+
+   SimulationManager
+     |-- global physics, rendering, arenas, stepping, USD import/export
+     |-- assets
+     |     |-- rigid objects and rigid object groups
+     |     |-- articulations and robots
+     |     |-- soft objects and cloth
+     |     `-- lights, materials, shapes, and gizmos
+     |-- sensors
+     |     |-- cameras and stereo cameras
+     |     `-- contact sensors
+     |-- solvers
+     |     |-- forward kinematics
+     |     |-- inverse kinematics
+     |     `-- differential kinematics
+     |-- planners
+     |     |-- joint-space and Cartesian trajectory generation
+     |     `-- time parameterization and sampling utilities
+     `-- atomic actions
+           `-- reusable manipulation primitives built from assets, solvers, and planners
+
+The :class:`SimulationManager` is the entry point for most workflows. It creates
+the physics world, configures rendering and time stepping, lays out multiple
+parallel arenas, and exposes ``add_*`` methods for scene construction. Every
+asset and sensor is registered with the manager so that state updates, resets,
+rendering, and batched queries remain synchronized.
+
+Submodule Relationships
+-----------------------
+
+.. list-table::
+   :header-rows: 1
+   :widths: 22 38 40
+
+   * - Submodule
+     - Responsibility
+     - Relationship to other modules
+   * - Simulation manager
+     - Owns lifecycle, stepping, rendering, multiple arenas, and scene import/export.
+     - Creates and coordinates assets, sensors, and physics state.
+   * - Assets
+     - Define the simulated entities: rigid objects, object groups,
+       articulations, robots, soft bodies, cloth, lights, shapes, and
+       materials.
+     - Provide the state and control surfaces consumed by sensors, solvers,
+       planners, environments, and atomic actions.
+   * - Sensors
+     - Produce perception data such as color, depth, segmentation, stereo disparity, and contacts.
+     - Attach to world frames, robot links, or monitored bodies and return
+       batched tensors for downstream policies or datasets.
+   * - Solvers
+     - Compute FK, IK, and differential kinematics for robots and articulations.
+     - Translate task-space goals into joint-space commands used by planners,
+       controllers, and actions.
+   * - Planners
+     - Generate joint-space or Cartesian trajectories with interpolation,
+       timing, and feasibility handling.
+     - Use robot state and solver results to produce trajectories that can be
+       replayed in the manager loop.
+   * - Atomic actions
+     - Package complete manipulation primitives such as move, pick, and place.
+     - Compose semantic targets, solvers, planners, and robot control into
+       reusable higher-level skills.
+
+Typical Data Flow
+-----------------
+
+A typical robot-learning or data-generation workflow follows this sequence:
+
+1. Create a :class:`SimulationManager` from :class:`SimulationManagerCfg`.
+2. Add assets such as objects, articulations, robots, lights, and materials.
+3. Add sensors for camera, stereo, or contact observations.
+4. Use solvers and planners to convert task goals into robot trajectories.
+5. Step the simulation with :meth:`SimulationManager.update` and collect state or sensor tensors.
+6. Wrap the same simulation logic in a Gym environment when training or evaluating agents.
+
+For manipulation tasks, atomic actions can replace the lower-level solver and
+planner calls. An action engine receives semantic targets or poses, resolves the
+motion primitive sequence, and returns a trajectory that can be replayed in the
+simulation.
+
+Choosing Where to Start
+-----------------------
+
+- Start with :doc:`sim_manager` when creating a new simulation scene or learning
+  how stepping, rendering, and parallel arenas work.
+- Use :doc:`sim_assets` when adding physical entities, materials, lights, or USD
+  assets. The asset pages underneath it cover each object family in detail.
+- Use :doc:`sim_sensor` when adding camera, stereo, or contact observations.
+- Use :doc:`solvers/index` when a robot needs FK, IK, or velocity-level
+  kinematics.
+- Use :doc:`planners/index` when a target pose or joint goal must become a
+  time-ordered trajectory.
+- Use :doc:`atomic_actions` when building scripted manipulation from reusable
+  motion primitives.
+
+Documentation Quality Notes
+---------------------------
+
+The pages in this section should stay organized around the same workflow:
+configuration, construction through :class:`SimulationManager`, runtime state or
+control APIs, and integration with sensors, planners, or Gym environments. When
+adding new simulation documentation, include tensor shapes for batched data,
+state the coordinate frame for poses and contacts, and link to the relevant
+object, solver, or planner page instead of duplicating API tables.
+
+See Also
+--------
 
 .. toctree::
    :maxdepth: 1
-   :glob:
-   
+
    sim_manager.md
    sim_assets.md
    sim_sensor.md