Feat: hud-python sdk v6 by Parth220 · Pull Request #421 · hud-evals/hud-python

Parth220 · 2026-06-15T18:30:29Z

Note

High Risk
This is a major SDK and protocol shift (v5 agents cannot drive v6-served environments) plus CI test setup changes that drop browser/Playwright provisioning, which can hide regressions in computer-use paths if those tests still exist.

Overview
This PR ships HUD Python SDK v6 as the primary surface: environments expose a thin control channel with capabilities (ssh, mcp, cdp, rfb, robot) and tasks (@env.template() generators), while agent harnesses own the tools. User-facing narrative moves from v5 scenarios/MCP tools to protocol-first manifest → tasks.start → tasks.grade, with Task.run(agent) returning a Job/Run instead of hud.eval() / env("scenario", ...).

Documentation is restructured on Mintlify: default v6 nav (docs/v6/), v5 tagged Legacy under docs/v5/, redirects from old paths, new Migrate to v6 guide, agent skill doc, and refreshed site styling (docs.json, custom.css). Several long-form cookbooks are removed from the old tree and replaced or relocated (e.g. v6 coding-agent, ops-diagnostics, a2a-chat, robot-benchmark).

Runnable examples land under cookbooks/ (A2A chat server moved out of the SDK as reference code; codex-style agent; v6 chat_env using EvaluationResult and templates). README and CONTRIBUTING are rewritten for v6 workflows (hud init, hud deploy, hud eval without --rootdir=hud).

CI/dev ergonomics: GitHub Actions drops Xvfb/Playwright install from the test matrix; .githooks/pre-push is removed. .gitignore expands for local/experimental dirs. Adds AGENTS.md (and CLAUDE.md pointer) for contributor/agent guidance.

Reviewed by Cursor Bugbot for commit c673f40. Bugbot is set up for automated code reviews on this repo.

[codex] drop v4 task compatibility

Decouple agent native tools from environment primitives

# Conflicts:
#   docs/reference/agents.mdx
#   hud/environment/environment.py
#   hud/environment/tests/test_environment.py
#   hud/tools/computer/base.py
#   hud/tools/computer/gemini.py
#   hud/tools/executors/xdo.py
#   hud/tools/tests/test_computer.py

Refactor Agents

…6-env

feat(eval): route HUDRuntime through runtime tunnel

[codex] add modal runtime provider wiring

hud/train/: TrainingClient (forward_backward, optim_step, step, custom forward/backward) over the HUD training service, keyed by model id. New 'hud models' CLI group (list, fork, checkpoints, head --set). settings: hud_rl_url; drop the old eval/training.py BYO helper. Docs: v6 training how-to rewritten for the managed trainer + new reference/training page; rl-training cookbook. Co-authored-by: Cursor <cursoragent@cursor.com>

Training POSTs (forward_backward/optim_step/backward) are non-idempotent, so make_request now uses max_retries=0 there (a silent retry would double-apply the optimizer/gradient or collide on the checkpoint name). Adds the 2048 RL cookbook example. Co-authored-by: Cursor <cursoragent@cursor.com>
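The retry rule described in this commit can be sketched as follows. This is a minimal illustration, not the SDK's `make_request`: the helper names `max_retries_for` and `call_with_retries`, the default of 3 retries, and the choice of `ConnectionError` as the retryable error are all assumptions made for the example. Only the operation names come from the commit message.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Operation names taken from the commit message; treating them as a set
# of "never retry" operations is an assumption of this sketch.
NON_IDEMPOTENT_OPS = {"forward_backward", "optim_step", "backward"}


def max_retries_for(op: str, default: int = 3) -> int:
    """Return 0 retries for non-idempotent training ops, else the default."""
    return 0 if op in NON_IDEMPOTENT_OPS else default


def call_with_retries(send: Callable[[], T], max_retries: int) -> T:
    """Call send() once, then retry up to max_retries times on ConnectionError."""
    last_error: Optional[Exception] = None
    for _ in range(max_retries + 1):
        try:
            return send()
        except ConnectionError as exc:  # hypothetical retryable error type
            last_error = exc
    assert last_error is not None
    raise last_error
```

With `max_retries=0`, a transient failure surfaces to the caller instead of silently resending a request that may already have applied a gradient or optimizer step.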

l/training

Drop the divergent 'Task Run:' / 'Batch Run:' prefixes; default job names now use the bare subject (task id for a single task, '{taskset} (N tasks)' for a batch), matching the lone-rollout and chat paths and aligning with the platform's '{subject} on {model}' convention. Co-authored-by: Cursor <cursoragent@cursor.com>
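The naming convention above can be sketched as a small function. The function name `default_job_name`, its signature, and the `"tasks"` fallback used when no taskset name is given are hypothetical; only the two output formats come from the commit message.

```python
from typing import Optional, Sequence


def default_job_name(task_ids: Sequence[str], taskset: Optional[str] = None) -> str:
    """Build a default job name from the bare subject, with no prefix."""
    if len(task_ids) == 1:
        # Single task: the job name is just the task id.
        return task_ids[0]
    # Batch: '{taskset} (N tasks)'. The "tasks" fallback is an assumption.
    subject = taskset or "tasks"
    return f"{subject} ({len(task_ids)} tasks)"
```

For example, `default_job_name(["checkout-flow"])` gives `checkout-flow`, and `default_job_name(["a", "b", "c"], "web-bench")` gives `web-bench (3 tasks)`.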

…ames Standardize default job names

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77cd964ee9


chatgpt-codex-connector · 2026-06-19T22:02:54Z

+    if resolved.is_dir() or resolved.suffix == ".py":
+        return resolved


Serve the env source instead of tasks.py

When hud eval is run on the scaffolded tasks.py (which imports task factories from env.py and only exposes Task objects), this branch passes tasks.py to LocalRuntime. The child then runs load_environment(tasks.py, --env <task.env>), but that file has no Environment, so the default hud init workflow fails before any rollout can start. Use the task's captured _source/the containing env module (or the directory) for local placement instead of the task list file.
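One way to read this suggestion is sketched below. It is an illustration only: `resolve_local_source` is a hypothetical helper, and its `task_source` parameter stands in for the task's captured `_source`. The sketch prefers that captured source and otherwise falls back to the directory containing the task list file, so the child process can find the module that defines the Environment.

```python
from pathlib import Path
from typing import Optional


def resolve_local_source(tasks_path: Path, task_source: Optional[Path] = None) -> Path:
    """Pick the path to hand to a local runtime (hypothetical helper)."""
    if task_source is not None:
        # Prefer the module the task was defined in, e.g. env.py.
        return task_source
    resolved = tasks_path.resolve()
    # Never return the task list file itself: fall back to its directory,
    # which also contains the env module in the scaffolded layout.
    return resolved if resolved.is_dir() else resolved.parent
```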


chatgpt-codex-connector · 2026-06-19T22:02:54Z

+                if cap.protocol in wanted and cap.protocol not in connections:
+                    connections[cap.protocol] = await run.client.open(cap.protocol)


Open capabilities by name, not protocol

If an env publishes more than one binding for the same protocol (for example two rfb/3.8 screens or multiple MCP tool servers), run.client.open(cap.protocol) calls HudClient.binding() with an ambiguous protocol ref and raises before the agent loop starts; even without the raise, the dict keyed by protocol would drop the extra binding. Iterate by capability name and keep distinct connections so same-protocol capabilities remain usable.
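The suggested fix can be sketched with stand-in types. `Capability`, `open_wanted`, and the `open_by_name` callback are hypothetical and not the SDK's API; the sketch only shows the keying change, where connections are stored under the capability name so that two bindings of the same protocol both survive.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class Capability:
    """Stand-in for a published capability binding (hypothetical type)."""
    name: str
    protocol: str


async def open_wanted(
    caps: Iterable[Capability],
    wanted: Set[str],
    open_by_name: Callable[[str], Awaitable[object]],
) -> Dict[str, object]:
    """Open every wanted capability, keyed by name rather than protocol."""
    connections: Dict[str, object] = {}
    for cap in caps:
        # Filter by protocol, but open and store by the unambiguous name.
        if cap.protocol in wanted and cap.name not in connections:
            connections[cap.name] = await open_by_name(cap.name)
    return connections
```

Keyed this way, an environment that publishes two `rfb/3.8` screens yields two entries instead of one, and each open call refers to exactly one binding.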


mintlify · 2026-06-19T22:04:46Z

Docs PR opened: #436

Removed the broken v6 Build nav group and repointed six broken links to existing v5 and v6 reference pages.

mintlify · 2026-06-19T22:09:30Z

Docs PR opened: #437

Rewrote short, generic SEO descriptions on 31 v6, platform, and migrate pages to unique 130–155 character summaries.

jdchawla29 and others added 30 commits April 27, 2026 16:07

drop v4 task compatibility

d2e8a8d

Merge pull request #403 from hud-evals/codex/drop-v4-support

2e937d4

[codex] drop v4 task compatibility

Align docs with v4 support removal

4f37307

Fix public docs SDK imports

0f19561

v5 regression tests

66afab0

Decouple agent native tools from environment primitives

a2bb01c

tool updates

63165d0

Merge pull request #407 from hud-evals/decouple-agent-tools

7bfbdc6

Decouple agent native tools from environment primitives

small gitignore

a43c5c0

refactor OpenAIChatAgent into openai_compatible package

eeef96f

agent updates

9366a1a

Merge pull request #413 from hud-evals/j/agent-updates

18306c5

Refactor Agents

add AGENTS.md

2330b9e

add init env

9442766

Merge branch 'v6' of https://github.com/hud-evals/hud-python into l/v…

1576dee

…6-env

simplify fx

78f5461

fx

0c84a19

Update .gitignore

9d7696f

Isolate agent run state

c8d3a1b

add more testing guideliens to AGENTS.md

89c3138

fix imports

4f494b0

simplify tool name handling

93ce003

agent context with top-level system prompt and citation options

70de8c7

tests updated

f92e707

restructure + claude [in progress, openai/gemini not done]

e1d420c

rfb + runnable test [in progress}

e285d66

refactor openai + gemini

beecc36

fx

8181d2e

imp and warmup

f33c7ee

jdchawla29 added 5 commits June 19, 2026 11:45

feat(eval): make HUDRuntime use runtime tunnel

67a48f4

fix(eval): address runtime tunnel review feedback

0ad7424

fix(cli): let runtime override remote flag

1afc5ad

fix(cli): reject conflicting runtime placement flags

47f1064

Merge pull request #430 from hud-evals/codex/cloud-runtime-tunnel-sdk

8ad2be2

feat(eval): route HUDRuntime through runtime tunnel

mintlify Bot deployed to staging - docs June 19, 2026 19:16

Parth220 and others added 14 commits June 19, 2026 12:43

fix: authlib deprecation warning

6979905

ad co8

584a6af

add modal runtime provider wiring

d3d5d19

Merge pull request #433 from hud-evals/codex/modal-runtime-provider

6cc2be7

[codex] add modal runtime provider wiring

add small notes

87306f3

add timeout safety

98c8792

also remote taskset run via cli, report trace info

19bfad3

fx

4d80d7b

fx 2

b6dd7be

fix scoring and timeouts

1b76c65

small fix

e85b66e

Merge pull request #426 from hud-evals/l/tinker-training

1905f42

l/training

mintlify Bot deployed to staging - docs June 19, 2026 21:22

lorenss-m and others added 2 commits June 19, 2026 14:44

Merge pull request #434 from hud-evals/feat/standardize-default-job-n…

77cd964

…ames Standardize default job names

jdchawla29 marked this pull request as ready for review June 19, 2026 21:55

jdchawla29 merged commit 7a8955c into main Jun 19, 2026
10 of 11 checks passed

chatgpt-codex-connector Bot reviewed Jun 19, 2026


mintlify Bot mentioned this pull request Jun 19, 2026

fix(docs): repair broken links from v6 SDK migration #436

Open

mintlify Bot mentioned this pull request Jun 19, 2026

docs: improve SEO descriptions across v6, platform, and migrate pages #437

Open

jdchawla29 merged 200 commits into main from v6


5 participants
