Feat: hud-python sdk v6 #421
[codex] drop v4 task compatibility
Decouple agent native tools from environment primitives
# Conflicts:
#   docs/reference/agents.mdx
#   hud/environment/environment.py
#   hud/environment/tests/test_environment.py
#   hud/tools/computer/base.py
#   hud/tools/computer/gemini.py
#   hud/tools/executors/xdo.py
#   hud/tools/tests/test_computer.py
Refactor Agents
feat(eval): route HUDRuntime through runtime tunnel
[codex] add modal runtime provider wiring
hud/train/: TrainingClient (forward_backward, optim_step, step, custom forward/backward) over the HUD training service, keyed by model id. New 'hud models' CLI group (list, fork, checkpoints, head --set). settings: hud_rl_url; drop the old eval/training.py BYO helper. Docs: v6 training how-to rewritten for the managed trainer + new reference/training page; rl-training cookbook. Co-authored-by: Cursor <cursoragent@cursor.com>
Training POSTs (forward_backward/optim_step/backward) are non-idempotent, so make_request now uses max_retries=0 there (a silent retry would double-apply the optimizer/gradient or collide on the checkpoint name). Adds the 2048 RL cookbook example. Co-authored-by: Cursor <cursoragent@cursor.com>
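The retry rule described in this commit can be sketched roughly as follows. This is an illustration only, not the SDK's actual `make_request`: the helper name `max_retries_for`, the path suffixes, and the default of 3 retries are all assumptions made for the example.

```python
# Hypothetical path suffixes for the non-idempotent training calls.
NON_IDEMPOTENT_SUFFIXES = ("/forward_backward", "/optim_step", "/backward")


def max_retries_for(method: str, path: str, default: int = 3) -> int:
    """Return the retry budget for a request (hypothetical helper).

    A retried training POST could apply the same gradient or optimizer
    step twice, so those calls get no automatic retries at all.
    """
    is_post = method.upper() == "POST"
    if is_post and path.rstrip("/").endswith(NON_IDEMPOTENT_SUFFIXES):
        return 0
    return default


training_budget = max_retries_for("POST", "/models/demo/optim_step")
listing_budget = max_retries_for("GET", "/models")
```

With a budget of zero, a failed training call surfaces to the caller, who can decide whether re-sending is safe.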
Drop the divergent 'Task Run:' / 'Batch Run:' prefixes; default job names now use the bare subject (task id for a single task, '{taskset} (N tasks)' for a batch), matching the lone-rollout and chat paths and aligning with the platform's '{subject} on {model}' convention.
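As a rough sketch of the naming rule in this commit message: the function name `default_job_name`, its parameters, and the fallback label for a batch without a taskset name are assumptions for illustration, not the SDK's code.

```python
from typing import Optional, Sequence


def default_job_name(task_ids: Sequence[str], taskset: Optional[str] = None) -> str:
    """Build a default job name from the bare subject (hypothetical helper)."""
    if len(task_ids) == 1:
        # Single task: the bare task id, with no "Task Run:" prefix.
        return task_ids[0]
    # Batch: "{taskset} (N tasks)". The "tasks" fallback is an assumption.
    label = taskset or "tasks"
    return f"{label} ({len(task_ids)} tasks)"


single = default_job_name(["checkout-flow"])
batch = default_job_name(["a", "b", "c"], taskset="web-suite")
```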
Co-authored-by: Cursor <cursoragent@cursor.com>
…ames Standardize default job names
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77cd964ee9
    if resolved.is_dir() or resolved.suffix == ".py":
        return resolved
Serve the env source instead of tasks.py
When hud eval is run on the scaffolded tasks.py (which imports task factories from env.py and only exposes Task objects), this branch passes tasks.py to LocalRuntime. The child then runs load_environment(tasks.py, --env <task.env>), but that file has no Environment, so the default hud init workflow fails before any rollout can start. Use the task's captured _source/the containing env module (or the directory) for local placement instead of the task list file.
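One possible shape for the suggested fix, as a sketch under assumptions: `local_env_source` and its `task_source` parameter are hypothetical names standing in for the resolver and the task's captured source, and the real resolution logic may differ.

```python
from pathlib import Path
from typing import Optional


def local_env_source(resolved: Path, task_source: Optional[Path] = None) -> Path:
    """Pick what to hand to the local runtime (hypothetical helper).

    Prefer the source captured on the task, which is assumed to point at
    the module defining the Environment. Otherwise serve the directory
    rather than a task list file that has no Environment in it.
    """
    if task_source is not None:
        return task_source
    if resolved.is_dir():
        return resolved
    return resolved.parent


from_task = local_env_source(Path("proj/tasks.py"), Path("proj/env.py"))
from_dir = local_env_source(Path("proj/tasks.py"))
```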
    if cap.protocol in wanted and cap.protocol not in connections:
        connections[cap.protocol] = await run.client.open(cap.protocol)
Open capabilities by name, not protocol
If an env publishes more than one binding for the same protocol (for example two rfb/3.8 screens or multiple MCP tool servers), run.client.open(cap.protocol) calls HudClient.binding() with an ambiguous protocol ref and raises before the agent loop starts; even without the raise, the dict keyed by protocol would drop the extra binding. Iterate by capability name and keep distinct connections so same-protocol capabilities remain usable.
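The difference between keying by protocol and keying by capability name can be sketched with stand-in types. `Capability`, `fake_open`, and the capability names below are invented for the example; whether the real client accepts a capability name in `open` is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Capability:
    """Stand-in for a published binding (hypothetical type)."""
    name: str
    protocol: str


async def fake_open(name: str) -> str:
    # Stand-in for opening a connection; returns a label instead.
    await asyncio.sleep(0)
    return f"conn:{name}"


async def open_by_name(caps: Iterable[Capability], wanted: Set[str]) -> Dict[str, str]:
    connections: Dict[str, str] = {}
    for cap in caps:
        # Filter on protocol, but key on the unique capability name so two
        # bindings that share a protocol are both kept.
        if cap.protocol in wanted and cap.name not in connections:
            connections[cap.name] = await fake_open(cap.name)
    return connections


caps = [
    Capability("screen-a", "rfb/3.8"),
    Capability("screen-b", "rfb/3.8"),
    Capability("shell", "ssh"),
]
connections = asyncio.run(open_by_name(caps, {"rfb/3.8"}))
```

Keyed by protocol, the same input would yield a single entry, and the second screen would be lost.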
Docs PR opened: #436. Removed the broken v6 Build nav group and repointed six broken links to existing v5 and v6 reference pages.
Docs PR opened: #437. Rewrote short, generic SEO descriptions on 31 v6, platform, and migrate pages to unique 130–155 character summaries.
Note
High Risk
This is a major SDK and protocol shift (v5 agents cannot drive v6-served environments) plus CI test setup changes that drop browser/Playwright provisioning, which can hide regressions in computer-use paths if those tests still exist.
Overview
This PR ships HUD Python SDK v6 as the primary surface: environments expose a thin control channel with capabilities (ssh, mcp, cdp, rfb, robot) and tasks (@env.template() generators), while agent harnesses own the tools. User-facing narrative moves from v5 scenarios/MCP tools to protocol-first manifest → tasks.start → tasks.grade, with Task.run(agent) returning a Job/Run instead of hud.eval() / env("scenario", ...).
Documentation is restructured on Mintlify: default v6 nav (docs/v6/), v5 tagged Legacy under docs/v5/, redirects from old paths, new Migrate to v6 guide, agent skill doc, and refreshed site styling (docs.json, custom.css). Several long-form cookbooks are removed from the old tree and replaced or relocated (e.g. v6 coding-agent, ops-diagnostics, a2a-chat, robot-benchmark).
Runnable examples land under cookbooks/ (A2A chat server moved out of the SDK as reference code; codex-style agent; v6 chat_env using EvaluationResult and templates). README and CONTRIBUTING are rewritten for v6 workflows (hud init, hud deploy, hud eval without --rootdir=hud).
CI/dev ergonomics: GitHub Actions drops Xvfb/Playwright install from the test matrix; .githooks/pre-push is removed. .gitignore expands for local/experimental dirs. Adds AGENTS.md (and CLAUDE.md pointer) for contributor/agent guidance.
Reviewed by Cursor Bugbot for commit c673f40.