feat(robot): OpenPI policy harness, H.264 trace video, rollout batching against one agent by lukass16 · Pull Request #425 · hud-evals/hud-python

lukass16 · 2026-06-17T22:19:01Z

Issue

The v6 robot harness needed to drive real OpenPI policy servers, run concurrent rollouts efficiently, and stream camera data to traces without bloating each step with JPEG frames. Slow sim boots (e.g. Isaac Sim) also exceeded the default env connect timeout.

Solution

Add RemoteModel — WebSocket/msgpack client for OpenPI policy servers (lazy connect, supports actions / action response keys).
Add BatchedAgent / BatchedModel — coalesce concurrent ainfer() calls into stacked forwards for parallel rollouts.
Adopt OpenPI slash-delimited observation keys end-to-end; add OpenPIAdapter so a stock OpenPI server drives the harness with no agent changes.
Stream per-camera H.264/CMAF video via VideoStreamer (hud/agents/robot/video.py); numeric state stays on ObservationStep, frames go as VideoSegmentStep spans.
Raise RobotClient connect ready_timeout default to 240s for slow container boots.
Also includes Modal/Daytona eval runtime providers merged from lukass/modal-daytona-runtimes.

Outcome / Verification

Robot rollout against OpenPI policy server via RemoteModel + OpenPIAdapter
Concurrent rollouts via BatchedAgent(batch_size=N)
Trace shows video_segment spans with playable H.264 segments
Env connect succeeds on slow Isaac Sim boots

Note

Medium Risk
Robot rollout, inference batching, and trace shape change observability (video segments vs per-step images); connect timeout and init download behavior affect all env provisioning paths.

Overview
Robot harness gains an OpenPI path: RemoteModel talks to a policy server over WebSocket, OpenPIAdapter maps observations to OpenPI wire keys, and Model is now stateless with a fixed [N, T, A] batch contract (LeRobot inlined; Ensembler / lerobot_infer removed). BatchedModel / BatchedAgent coalesce concurrent ainfer calls into one forward for in-process models only (RemoteModel stays one agent per rollout).

Tracing stops embedding per-tick JPEGs on ObservationStep; RobotAgent runs VideoStreamer (PyAV/x264 CMAF) and emits VideoSegmentStep spans with optional trace_id on Step.emit. RobotClient.get_control_rate() drives encoder FPS. The robot extra now requires av>=12.

Platform polish: default connect(..., ready_timeout) rises 120s → 240s; hud init can download GitHub starter presets (--preset / TTY picker) with safe tarball extract; RL cookbook uses file-level MODEL / TASKSET instead of HUD_MODEL / HUD_TASKSET; new v6 Environments and Tasks docs plus .gitignore exception so docs/v6/build/ stays tracked; version 0.6.1.

Reviewed by Cursor Bugbot for commit 4c85e4a. Bugbot is set up for automated code reviews on this repo.

cursor · 2026-06-18T00:28:30Z

+                    except Exception:  # not found: build it under this name
+                        await daytona.snapshot.create(
+                            CreateSnapshotParams(name=self.snapshot_name, image=self._image)
+                        )


Daytona snapshot probe swallows errors

Medium Severity

DaytonaRuntime._ensure_snapshot treats any snapshot.get failure like a missing snapshot and always calls snapshot.create. Transient API or auth errors can trigger a redundant create attempt and mark the snapshot resolved, hiding the real failure until sandbox startup.

Reviewed by Cursor Bugbot for commit 446a05b.
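The concern in the comment above can be illustrated with a minimal sketch that creates the snapshot only on a "not found" error and lets every other failure propagate. SnapshotNotFound, FakeSnapshots, and ensure_snapshot are hypothetical names for illustration; they are not the Daytona SDK's API, and the sketch is synchronous for brevity where the PR code is async.

```python
class SnapshotNotFound(Exception):
    """Hypothetical stand-in for an SDK 'snapshot does not exist' error."""


class FakeSnapshots:
    """In-memory stand-in for a snapshot API (not the Daytona SDK)."""

    def __init__(self, existing=(), fail_with=None):
        self.existing = set(existing)
        self.fail_with = fail_with  # exception to raise from get(), if any
        self.created = []

    def get(self, name):
        if self.fail_with is not None:
            raise self.fail_with
        if name not in self.existing:
            raise SnapshotNotFound(name)
        return name

    def create(self, name):
        self.existing.add(name)
        self.created.append(name)


def ensure_snapshot(api, name):
    """Create the snapshot only when the probe reports it missing."""
    try:
        api.get(name)
    except SnapshotNotFound:
        api.create(name)
    # Any other exception (auth, network, ...) propagates to the caller
    # instead of being mistaken for a missing snapshot.
```

With this shape, a transient or auth error surfaces at the probe rather than later at sandbox startup.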

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

There are 4 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 82c1ef8.

cursor · 2026-06-19T19:22:08Z

+            if self._init_sent and btype == b"mdat":
+                self._dispatch(self._pending)
+                self._pending = b""
+        return len(b)  # return the number of bytes written


MP4 sink buffer grows unbounded

High Severity

SegmentEncoder.write advances _scan after extracting MP4 boxes but never discards consumed bytes from _buf, while _pos keeps growing with every mux write. Each camera encoder retains a full copy of all muxed output for the episode, so long rollouts or many cameras can inflate memory without bound.

Reviewed by Cursor Bugbot for commit 82c1ef8.
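A minimal sketch of the bounded-buffer behavior the comment above asks for: scan complete MP4 boxes out of a write stream and drop the consumed bytes afterwards. BoxScanner is a hypothetical class, not the PR's SegmentEncoder. It assumes append-only writes (no seeking, so no _pos bookkeeping) and plain 32-bit box sizes.

```python
import struct


class BoxScanner:
    """Splits a byte stream into MP4 boxes and drops bytes once consumed."""

    def __init__(self):
        self._buf = bytearray()
        self.boxes = []  # (type, payload_length) pairs, kept for illustration

    def write(self, data: bytes) -> int:
        self._buf += data
        scan = 0
        # A box header is a 4-byte big-endian size followed by a 4-byte type.
        while len(self._buf) - scan >= 8:
            size, btype = struct.unpack_from(">I4s", self._buf, scan)
            if size < 8:
                # Sizes 0 (to end of file) and 1 (64-bit size) are out of scope here.
                raise ValueError(f"unsupported box size {size}")
            if len(self._buf) - scan < size:
                break  # box is incomplete; wait for more bytes
            self.boxes.append((btype, size - 8))
            scan += size
        # Discard everything that was consumed so the buffer stays bounded.
        del self._buf[:scan]
        return len(data)

    @property
    def buffered(self) -> int:
        """Number of bytes still waiting for the rest of their box."""
        return len(self._buf)
```

After each write, the buffer holds at most one partial box, so memory no longer scales with episode length.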

cursor · 2026-06-19T19:22:08Z

+        # Start camera video at env's control rate; capture trace id for encoder span attribution.
+        self._video = video.VideoStreamer(
+            fps=client.get_control_rate(), trace_id=get_current_trace_id()
+        )


LeRobot policy not reset per episode

Medium Severity

Episode startup no longer calls policy.reset() on LeRobot checkpoints. The prior harness reset the policy (and optional ensembler) in on_episode_start; that hook was removed while reusing the same LeRobotModel across sequential rollouts, so internal episode state can carry into the next episode.

Additional Locations (1)

hud/agents/robot/model.py#L48-L83

Reviewed by Cursor Bugbot for commit 82c1ef8.

cursor · 2026-06-19T19:22:08Z

+        """Ship one request dict → the server's ``[T, A]`` chunk, returned as ``[1, T, A]``."""
+        self.connect()  # lazy connect on first call (blocks until the server is up)
+        chunk = np.asarray(self._client.infer(batch)[self.response_key], dtype=np.float32)
+        return chunk[None]  # add the leading N=1 batch dim


Shared RemoteModel lacks infer lock

Medium Severity

RemoteModel.infer uses one lazy WebSocket client with no serialization. Concurrent rollouts that share a single RemoteModel (common when fanning out parallel OpenPI evals) can interleave infer calls on the same connection and corrupt requests or responses.

Reviewed by Cursor Bugbot for commit 82c1ef8.
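One way to address the comment above is to serialize calls on the shared connection with a lock. The sketch below assumes a synchronous request/response client. LockedClient and EchoClient are hypothetical names, not part of the PR or of the OpenPI client.

```python
import threading
import time


class EchoClient:
    """Hypothetical single-connection client that records overlapping calls."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    def infer(self, request):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        time.sleep(0.01)  # simulate a network round trip
        self.in_flight -= 1
        return {"actions": request["id"]}


class LockedClient:
    """Allows only one infer() on the wrapped client at a time."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def infer(self, request):
        # Hold the lock for the whole request/response exchange so that
        # two rollouts cannot interleave frames on the same connection.
        with self._lock:
            return self._client.infer(request)
```

The trade-off is that rollouts sharing one model queue behind each other; giving each rollout its own connection avoids the lock but costs one connection per rollout.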

Feat: hud-python sdk v6

L/v6 template updates

Adds a -p/--preset flag (and an interactive picker on a TTY) so hud init can fetch the same starter environments as the platform's environments/new flow. Presets live in hud/cli/presets.py (blank, browser, deepresearch, cua, autonomous-businesses, verilog) and are materialized by downloading the repo's main tarball from codeload (no git, path-traversal-safe). With no preset in a non-interactive shell it still writes the minimal local scaffold. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Apply tar members' execute bits after write so starter entrypoints/scripts stay runnable. Pass preset=None in the direct-call init tests (typer Option defaults to OptionInfo when the command function is called directly). Co-authored-by: Cursor <cursoragent@cursor.com>
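A minimal sketch of path-traversal-safe extraction that also restores execute bits, as the two commit messages above describe. safe_extract is a hypothetical helper, not the code in hud/cli/presets.py. It assumes only regular files and directories are wanted, and it skips links and device entries.

```python
import os
import tarfile


def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract regular files and directories, refusing paths that escape dest."""
    dest = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest, member.name))
        # Reject entries such as "../evil" or absolute paths outside dest.
        if os.path.commonpath([dest, target]) != dest:
            raise ValueError(f"unsafe path in archive: {member.name}")
        if member.isdir():
            os.makedirs(target, exist_ok=True)
        elif member.isfile():
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            # Re-apply execute bits after the write so scripts stay runnable.
            exec_bits = member.mode & 0o111
            if exec_bits:
                os.chmod(target, os.stat(target).st_mode | exec_bits)
        # Symlinks, hardlinks, and device nodes are deliberately skipped.
```

The sketch raises on the first unsafe member, which can leave a partially written directory; a caller would need to clean that up on failure.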

feat(cli): hud init --preset to scaffold from GitHub starters

Docker for slow envs like Isaac Sim publishes the port before @env.initialize finishes, so hello retries can exceed 120s on slow container boots.
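To illustrate why a slow boot needs a larger ready_timeout, here is a minimal sketch of a deadline-bounded readiness probe. wait_until_ready, the hello callable, and the use of ConnectionError are assumptions for illustration; this is not RobotClient's connect logic.

```python
import time


def wait_until_ready(hello, ready_timeout=240.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Retry hello() until it succeeds or ready_timeout seconds elapse.

    Returns (result, attempts).
    """
    deadline = clock() + ready_timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            return hello(), attempts
        except ConnectionError:
            # The port can be open before the env finishes initializing,
            # so a failed hello is retried until the deadline passes.
            if clock() >= deadline:
                raise TimeoutError(f"env not ready after {ready_timeout}s")
            sleep(interval)
```

In this shape the timeout bounds the whole retry loop, so it has to cover the container's initialization time, not just the time until the port opens.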

Add a weightless Model that queries a remote policy server over the OpenPI msgpack/WebSocket protocol: the adapter builds the request dict, the server owns all pre/post-processing + the forward, and infer() ships it and returns the [T, A] chunk. connect() is lazy and idempotent (blocks until the server is up); response_key covers "actions" (stock OpenPI) vs "action" (Cosmos).
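The lazy, idempotent connect and the configurable response key described above can be sketched as follows. LazyRemoteModel and StubPolicyClient are hypothetical stand-ins, not the PR's RemoteModel or the OpenPI client. Plain nested lists stand in for arrays so the sketch has no dependencies.

```python
class StubPolicyClient:
    """Hypothetical policy client that returns a fixed [T, A] chunk."""

    def __init__(self, key):
        self._key = key

    def infer(self, request):
        return {self._key: [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]}  # T=3, A=2


class LazyRemoteModel:
    """Connects on first use and reads the chunk from a configurable key."""

    def __init__(self, make_client, response_key="actions"):
        self._make_client = make_client
        self._client = None
        self.response_key = response_key
        self.connects = 0  # counts real connects, for illustration

    def connect(self):
        # Idempotent: later calls are no-ops once a client exists.
        if self._client is None:
            self._client = self._make_client()
            self.connects += 1

    def infer(self, request):
        self.connect()  # lazy connect on first call
        chunk = self._client.infer(request)[self.response_key]
        return [chunk]  # [T, A] -> [1, T, A]: add the leading batch dim
```

Constructing the model is cheap because nothing touches the network until the first infer(); a server that replies under "action" only needs response_key="action".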

…erence BatchedModel wraps any Model and coalesces concurrent ainfer() calls into a single stacked forward: a lazily-started worker drains up to batch_size queued calls (or flushes after max_wait_s for the suite tail), runs one inner.infer, and scatters the [N, T, A] rows back to each caller. BatchedAgent wraps a RobotAgent and shallow-clones it per run so each rollout keeps isolated episode state while sharing the one batched model. Usage stays a one-liner: BatchedAgent(agent, batch_size=8) with max_concurrent set to match.
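The coalescing pattern described above can be sketched with asyncio alone. Coalescer is a hypothetical class, not the PR's BatchedModel. It assumes infer_fn takes a list of requests and returns one result row per request, in order.

```python
import asyncio


class Coalescer:
    """Groups concurrent ainfer() calls into one batched infer_fn call."""

    def __init__(self, infer_fn, batch_size, max_wait_s=0.05):
        self._infer_fn = infer_fn
        self._batch_size = batch_size
        self._max_wait_s = max_wait_s
        self._queue = None
        self._worker = None

    async def ainfer(self, request):
        if self._worker is None:  # start the worker lazily, inside the running loop
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((request, fut))
        return await fut

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block for the first request
            deadline = loop.time() + self._max_wait_s
            while len(batch) < self._batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break  # flush a partial batch (the tail of a suite)
            try:
                rows = self._infer_fn([req for req, _ in batch])
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), row in zip(batch, rows):
                if not fut.done():
                    fut.set_result(row)  # scatter row i back to caller i
```

Setting batch_size equal to the number of concurrent rollouts lets a full batch dispatch as soon as every rollout has a request queued; max_wait_s only matters once fewer rollouts remain.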

Migrate the robot harness to OpenPI-standard, slash-delimited observation keys end-to-end, and add a thin OpenPIAdapter so a generic OpenPI policy server drives the harness with no agent code changes.
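As an illustration of what slash-delimited keys look like, the sketch below flattens a nested observation dict into flat keys. flatten_keys and the example key names are assumptions for illustration; the PR's OpenPIAdapter and the keys a given policy server expects may differ.

```python
def flatten_keys(nested, prefix=""):
    """Flatten a nested dict into a flat dict with slash-delimited keys."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_keys(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical observation layout, for illustration only.
request = flatten_keys({
    "observation": {"state": [0.0, 0.1], "image": "<camera frame>"},
    "prompt": "pick up the cube",
})
```

Here request ends up with the keys "observation/state", "observation/image", and "prompt".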

Replace per-tick JPEG observation images with per-camera H.264/CMAF video streaming for robot traces:
- Add hud/agents/robot/video.py (SegmentEncoder/VideoStreamer): encode each camera on a background thread, emitting CMAF fragments as VideoSegmentStep spans without blocking the act loop.
- RobotAgent starts/finalizes the streamer at the env control rate; finalize in `finally` so a crashed run still leaves video.
- ObservationStep.from_obs records only numeric state now; camera frames travel as video.
- Step.emit accepts an explicit trace_id so the encoder thread (no contextvars trace context) attributes spans correctly.
- Add RobotClient.get_control_rate(); add "video_segment" RobotStepSource; add PyAV (av>=12) to the robot extra.
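The reason an explicit trace_id is needed can be sketched as follows: a value set in a ContextVar in the caller is generally not visible in a newly started thread, so the id has to be captured up front and passed along. current_trace_id, emit, and encode_in_thread are hypothetical names, not the PR's Step.emit or VideoStreamer.

```python
import contextvars
import threading

# Hypothetical ambient trace context, for illustration only.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def emit(step, trace_id=None):
    """Attribute a step to trace_id, falling back to the ambient context."""
    resolved = trace_id if trace_id is not None else current_trace_id.get()
    return {"step": step, "trace_id": resolved}


def encode_in_thread(trace_id, out):
    """Emit two steps from a worker thread, without and with an explicit id."""

    def worker():
        # On standard CPython builds a new thread starts with an empty
        # context, so the implicit lookup typically falls back to the default.
        out.append(emit("segment-implicit"))
        # The explicitly passed id is attributed correctly regardless.
        out.append(emit("segment-explicit", trace_id))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
```

Capturing the id once, where the streamer is constructed, keeps the encoder thread independent of whatever context it happens to run in.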

Remove the per-episode model.reset() hook (Model/LeRobotModel/RemoteModel/BatchedModel + agent.on_episode_start); per-episode state lives only on the agent, so a shared BatchedModel can no longer clear one rollout's policy state mid-episode. Document that RemoteModel is not batchable (OpenPI server has no batched-request shape) on RemoteModel, BatchedModel, and BatchedAgent.

…ship Spell out on Model.infer/ainfer that implementations must keep the leading batch dim N (ainfer indexes [0], BatchedModel scatters rows along it) and add a one-line assert in LeRobotModel.infer. Document that BatchedAgent mutates the passed-in agent in place, leaving it permanently batched. Co-authored-by: Cursor <cursoragent@cursor.com>

Clamp get_control_rate to max(1, round(...)) so sub-0.5 Hz contracts no longer emit 0 FPS on VideoSegmentStep. Init _hooks_done before add_capability in Environment.__init__. Load optional robot deps via importlib for pyright, add shim-test ignores, and ruff-format flagged files. Co-authored-by: Cursor <cursoragent@cursor.com>
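The clamp in the commit message above amounts to the following, shown as a small sketch. encoder_fps is a hypothetical name; only the max(1, round(...)) expression comes from the commit message.

```python
def encoder_fps(control_rate_hz: float) -> int:
    """Round a control rate to whole frames per second, never below 1."""
    return max(1, round(control_rate_hz))
```

Without the clamp, a 0.4 Hz contract would round to 0 and produce an invalid encoder frame rate.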

Wrap long lines, move NDArray to TYPE_CHECKING, noqa intentional 0.0.0.0 bind in LocalRuntime, and reformat legacy shim test imports. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 17, 2026

Comment thread hud/agents/robot/batching.py Outdated

Comment thread hud/agents/robot/batching.py

Comment thread hud/agents/robot/batching.py

cursor Bot reviewed Jun 17, 2026

Comment thread hud/agents/robot/model.py Outdated

Comment thread hud/agents/robot/batching.py

cursor Bot reviewed Jun 17, 2026

Comment thread hud/capabilities/robot.py Outdated

cursor Bot reviewed Jun 18, 2026

lukass16 force-pushed the v6-robot-3 branch from 51786b6 to 82c1ef8 on June 19, 2026 19:20

mintlify Bot deployed to staging - docs June 19, 2026 19:21

cursor Bot reviewed Jun 19, 2026

jdchawla29 and others added 19 commits June 19, 2026 14:58

Merge pull request #421 from hud-evals/v6

7a8955c

Feat: hud-python sdk v6

add updates and fix docs

6dcc40a

fix version

4cd60a0

Merge pull request #438 from hud-evals/l/v6-template-updates

56f561c

L/v6 template updates

chore: bump version to 0.6.1

363c0a2

Co-authored-by: Cursor <cursoragent@cursor.com>

chore: bump pyproject version to 0.6.1

1522c16

Co-authored-by: Cursor <cursoragent@cursor.com>

fix(cli): clean up partial dir on failed preset fetch; document hud init

4fb0a5d

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge pull request #441 from hud-evals/l/hud-init-presets

681ec80

feat(cli): hud init --preset to scaffold from GitHub starters

fix(clients): raise connect ready_timeout default to 240s

03a84cf

feat(robot): adopt OpenPI wire-key convention + OpenPIAdapter

3758adf

chore(robot): fix ruff lint failures in robot and runtime modules

4c85e4a

lukass16 force-pushed the v6-robot-3 branch from 82c1ef8 to 4c85e4a on June 20, 2026 01:55

mintlify Bot deployed to staging - docs June 20, 2026 01:56

lukass16 wants to merge 19 commits into v6 from v6-robot-3

3 participants
