Merged
200 commits
d2e8a8d
drop v4 task compatibility
jdchawla29 Apr 27, 2026
2e937d4
Merge pull request #403 from hud-evals/codex/drop-v4-support
jdchawla29 Apr 27, 2026
4f37307
Align docs with v4 support removal
jdchawla29 Apr 28, 2026
0f19561
Fix public docs SDK imports
jdchawla29 Apr 29, 2026
66afab0
v5 regression tests
jdchawla29 May 1, 2026
a2bb01c
Decouple agent native tools from environment primitives
jdchawla29 May 1, 2026
63165d0
tool updates
jdchawla29 May 5, 2026
7bfbdc6
Merge pull request #407 from hud-evals/decouple-agent-tools
jdchawla29 May 8, 2026
5866ecb
Merge remote-tracking branch 'origin/main' into v6
jdchawla29 May 8, 2026
a43c5c0
small gitignore
lorenss-m May 8, 2026
eeef96f
refactor OpenAIChatAgent into openai_compatible package
jdchawla29 May 8, 2026
9366a1a
agent updates
jdchawla29 May 24, 2026
18306c5
Merge pull request #413 from hud-evals/j/agent-updates
jdchawla29 May 24, 2026
2330b9e
add AGENTS.md
jdchawla29 May 26, 2026
9442766
add init env
lorenss-m May 26, 2026
1576dee
Merge branch 'v6' of https://github.com/hud-evals/hud-python into l/v…
lorenss-m May 26, 2026
78f5461
simplify fx
lorenss-m May 26, 2026
0c84a19
fx
lorenss-m May 27, 2026
9d7696f
Update .gitignore
jdchawla29 May 26, 2026
c8d3a1b
Isolate agent run state
jdchawla29 May 26, 2026
89c3138
add more testing guidelines to AGENTS.md
jdchawla29 May 26, 2026
4f494b0
fix imports
jdchawla29 May 27, 2026
93ce003
simplify tool name handling
jdchawla29 May 27, 2026
70de8c7
agent context with top-level system prompt and citation options
jdchawla29 May 27, 2026
f92e707
tests updated
jdchawla29 May 27, 2026
e1d420c
restructure + claude [in progress, openai/gemini not done]
lorenss-m May 27, 2026
e285d66
rfb + runnable test [in progress]
lorenss-m May 27, 2026
beecc36
refactor openai + gemini
lorenss-m May 27, 2026
8181d2e
fx
lorenss-m May 27, 2026
f33c7ee
imp and warmup
lorenss-m May 27, 2026
3056a9f
mm fix
lorenss-m May 27, 2026
1751b40
claude sdk
lorenss-m May 27, 2026
ae04127
fx win outputs
lorenss-m May 27, 2026
9b0dec6
fx
lorenss-m May 27, 2026
e96ff9d
add inference-side instrumentation
jdchawla29 May 28, 2026
3921da2
fx
lorenss-m May 29, 2026
145759a
add bu fix claude
lorenss-m May 30, 2026
ea185ce
additions
lorenss-m Jun 1, 2026
fda0479
fxs
lorenss-m Jun 1, 2026
3a11712
add impl tinker api support + reward system
lorenss-m Jun 1, 2026
429ec15
Merge branch 'v6' of https://github.com/hud-evals/hud-python into v6-…
lorenss-m Jun 1, 2026
123fc16
temp: removing side-effects from importing hud.types
jdchawla29 Jun 1, 2026
d4b85b8
fix rollouts
lorenss-m Jun 1, 2026
8929f9b
temp: fix 2
jdchawla29 Jun 1, 2026
c07895e
fix running
lorenss-m Jun 1, 2026
c21f27d
add eval flows
lorenss-m Jun 1, 2026
6563750
telem
lorenss-m Jun 1, 2026
7e2b7df
small change
jdchawla29 Jun 2, 2026
542b7d4
add legacy improvements, cleanup
lorenss-m Jun 2, 2026
026fd9d
cleanup
lorenss-m Jun 2, 2026
52623b1
cleanup
lorenss-m Jun 3, 2026
3684598
fxs
lorenss-m Jun 3, 2026
b3fdb38
better legacy compat
lorenss-m Jun 3, 2026
9b44b85
tests time
lorenss-m Jun 3, 2026
4ba5a0f
fxs
lorenss-m Jun 3, 2026
29a0fb1
fix tests
lorenss-m Jun 4, 2026
4dcf91d
full tests and cleanup
lorenss-m Jun 4, 2026
cc7bb2d
Merge v6 into v6-agent-f-l (ours)
jdchawla29 Jun 5, 2026
2a356e3
Merge pull request #415 from hud-evals/v6-agent-f-l
jdchawla29 Jun 5, 2026
40d5db6
cleanup and add task cli
lorenss-m Jun 6, 2026
4c7c5f1
rm push
lorenss-m Jun 6, 2026
55b3ce8
improve readme and convert
lorenss-m Jun 7, 2026
bf60f0e
fxs
lorenss-m Jun 7, 2026
2fb7aef
V6 contrainer mgmt (#416)
lorenss-m Jun 7, 2026
9be51b2
refactor: decouple job registration from telemetry
jdchawla29 Jun 8, 2026
9bc8e78
docs
lorenss-m Jun 8, 2026
d67592f
Merge branch 'v6-contrainer-mgmt' of https://github.com/hud-evals/hud…
lorenss-m Jun 8, 2026
54cad0c
changes in task and environment structure, replacing references to 'v…
jdchawla29 Jun 9, 2026
75b380e
refactor 1
jdchawla29 Jun 9, 2026
8613869
consolidation
jdchawla29 Jun 9, 2026
cfead4f
consolidate 2
jdchawla29 Jun 9, 2026
495139c
remove hud build
jdchawla29 Jun 9, 2026
96ea421
refactor
jdchawla29 Jun 9, 2026
2ed744c
refactor 2
jdchawla29 Jun 9, 2026
55ebe7e
cleanup
jdchawla29 Jun 9, 2026
467c7a4
restructure
jdchawla29 Jun 10, 2026
f3041a3
clean
jdchawla29 Jun 10, 2026
1b01b65
cookbooks
jdchawla29 Jun 10, 2026
3876bb0
utils
jdchawla29 Jun 10, 2026
82fcff6
delt
jdchawla29 Jun 10, 2026
8223526
restructure
jdchawla29 Jun 10, 2026
f74ab32
works on my machine
jdchawla29 Jun 10, 2026
0577a25
small clean
jdchawla29 Jun 10, 2026
98a67c6
small docs improvements and cli ux
lorenss-m Jun 10, 2026
d5f1f57
fxs
lorenss-m Jun 10, 2026
95f61b5
rm skill
lorenss-m Jun 10, 2026
2735555
update docs
lorenss-m Jun 10, 2026
cab7ee4
robot: add robot capability, environment.robots, and episode recorder
lukass16 Jun 10, 2026
40ca44a
final
jdchawla29 Jun 11, 2026
55afb33
docs
jdchawla29 Jun 11, 2026
820f76c
agent-side robot concerns to sdk
lukass16 Jun 11, 2026
23100a9
Merge origin/v6: docs tone/structure + CLI UX, keeping simplify API
jdchawla29 Jun 11, 2026
c515164
docs 2
jdchawla29 Jun 11, 2026
f141da1
pyright
jdchawla29 Jun 11, 2026
b09f8b7
Refactor task ID handling to strip environment prefixes for local tas…
jdchawla29 Jun 11, 2026
bf78a10
Merge pull request #418 from hud-evals/simplify
jdchawla29 Jun 11, 2026
f33d0c7
change env side robot telemetry
lukass16 Jun 11, 2026
3516852
Merge origin/v6 into v6-robot
lukass16 Jun 11, 2026
326dbf7
update robot telemetry
lukass16 Jun 11, 2026
56dfef6
update robot docs
lukass16 Jun 11, 2026
1231204
docs and matching
lukass16 Jun 11, 2026
4fafa69
fix matching
lukass16 Jun 11, 2026
5c41356
add ensembler
lukass16 Jun 11, 2026
0245d56
fix queue
lukass16 Jun 11, 2026
b2ff1d8
remove arbitrary tests, update adapter
lukass16 Jun 12, 2026
62a1554
small reliability fixes
lorenss-m Jun 12, 2026
03f12b7
clean robot agent w/out tracing rewrite
lukass16 Jun 12, 2026
c722a9c
Align platform API client with the rewrite control plane
jdchawla29 Jun 12, 2026
8b15400
refactor datasaving
lukass16 Jun 12, 2026
ffdf742
undo delete
lukass16 Jun 12, 2026
e5f1edb
clean sim runner
lukass16 Jun 12, 2026
772d782
remove contracts
lukass16 Jun 12, 2026
5b7110a
Simplify the v6 contract surfaces
jdchawla29 Jun 12, 2026
a98b289
Merge remote-tracking branch 'origin/v6' into v6-robot
lukass16 Jun 12, 2026
1c7c058
Align v6 SDK and CLI surfaces with the rewrite control plane
jdchawla29 Jun 13, 2026
e553c9f
feat(eval): v6 placement model — Provider/HUDRuntime, run atom, agent…
jdchawla29 Jun 13, 2026
2f64fe7
fix(gateway): carry HUD key in x-goog-api-key for the gemini client
jdchawla29 Jun 13, 2026
ca1c834
test(eval): fix taskset export fixture to the canonical CP wire shape
jdchawla29 Jun 13, 2026
ec68b92
clean telemetry
lukass16 Jun 13, 2026
2e10145
Merge remote-tracking branch 'origin/v6' into v6-robot
lukass16 Jun 13, 2026
d5f7bc8
remove realtime for now
lukass16 Jun 13, 2026
67fd2b9
small fixes for platform
lukass16 Jun 13, 2026
e3520e2
remove data saving
lukass16 Jun 13, 2026
d62a651
keying and small ux updates, cleanup and dep mgmt
lorenss-m Jun 13, 2026
c308d1a
refactor and improve docs cadence
lorenss-m Jun 13, 2026
68ea5b6
update endpoint
lukass16 Jun 13, 2026
e72a3eb
docs
lorenss-m Jun 13, 2026
2a07225
docs adjustment
lorenss-m Jun 13, 2026
a472623
Merge branch 'v6-l-clean' of https://github.com/hud-evals/hud-python …
lorenss-m Jun 14, 2026
db58f86
align robot and docs, format and fixes
lorenss-m Jun 14, 2026
e34335c
fxs
lorenss-m Jun 14, 2026
9f67834
Merge pull request #419 from hud-evals/v6-robot
lorenss-m Jun 14, 2026
5962b07
thread runner add
lukass16 Jun 14, 2026
bc06c18
capability rename
lukass16 Jun 14, 2026
57aceb5
small tweak in proc + flush line
lukass16 Jun 14, 2026
b451efd
Merge pull request #420 from hud-evals/v6-robot-2
lorenss-m Jun 14, 2026
1aa4e17
linter
lorenss-m Jun 14, 2026
4925ec9
improve telem exporter
lorenss-m Jun 14, 2026
68007e6
docs fixes
lorenss-m Jun 14, 2026
39970b0
fix rubric based grader and windows local, add convenience imports
lorenss-m Jun 14, 2026
48309ff
local teleme export + windows local test
lorenss-m Jun 14, 2026
d7f6cc5
env var merge and proper win support
lorenss-m Jun 15, 2026
1f449da
upgrade settings links
lorenss-m Jun 15, 2026
a4a78c7
fix: env name resolution now uses env.py declared name, instead of se…
solvemproblr Jun 15, 2026
b72a944
Merge pull request #422 from hud-evals/asa/environment-name-fix
jdchawla29 Jun 15, 2026
88ba14d
improve local observability
lorenss-m Jun 16, 2026
704bca4
add better remote guidance, docs and bump version
lorenss-m Jun 17, 2026
c673f40
small adjustments
lorenss-m Jun 17, 2026
696d15f
feat(eval): add ModalRuntime provider for per-rollout Modal sandboxes
lukass16 Jun 17, 2026
bb53e47
feat(eval): add DaytonaRuntime provider for per-rollout Daytona sandb…
lukass16 Jun 17, 2026
166c2bf
fix(environment): set _hooks_done before adding constructor capabilities
lukass16 Jun 17, 2026
fb27f7f
chore(eval): silence S104 on intentional 0.0.0.0 bind in ModalRuntime
lukass16 Jun 17, 2026
dd1e391
fix(eval): derive DaytonaRuntime command from port to avoid tunnel mi…
lukass16 Jun 17, 2026
acc264e
fix(eval): type casting timeout to int for Modal and Daytona
lukass16 Jun 17, 2026
5977d5b
fix(eval): make Daytona sandboxes ephemeral by default
lukass16 Jun 17, 2026
4a31f50
fix(eval): fix exception handling in _ensure_snapshot
lukass16 Jun 17, 2026
2bf3f11
fix(eval): kill LocalRuntime process group to prevent orphan children
lukass16 Jun 18, 2026
30f7d13
chore(format): apply ruff formatting to claude sdk agent and cli init
lukass16 Jun 18, 2026
380dc40
fix(environment): set _hooks_done before constructor capabilities loop
lukass16 Jun 18, 2026
420718b
feat(eval): introduce RuntimeConfig for task-level resource management
jdchawla29 Jun 18, 2026
04c0344
fix(eval): address runtime config CI feedback
jdchawla29 Jun 18, 2026
d466034
adjustments
jdchawla29 Jun 18, 2026
d3775af
fix(eval): keep docker image shorthand
jdchawla29 Jun 18, 2026
ae79946
fix(eval): reject daytona run timeouts consistently
jdchawla29 Jun 18, 2026
566ecfe
Merge pull request #423 from hud-evals/lukass/modal-daytona-runtimes
jdchawla29 Jun 18, 2026
e4aa827
chore(eval): delete loose ORPHAN_BUG.md
lukass16 Jun 18, 2026
d87b34c
fix(eval): always SIGKILL LocalRuntime group, not only on timeout
lukass16 Jun 18, 2026
33e2037
Merge branch 'v6' into lukass/local-runtime-fixes
lukass16 Jun 18, 2026
a93caaf
chore(eval): delete loose test_local_runtime_orphan.py
lukass16 Jun 18, 2026
60e55bc
docs(eval): tighten _terminate process-group comment
lukass16 Jun 18, 2026
bbb614d
Merge pull request #428 from hud-evals/lukass/local-runtime-fixes
jdchawla29 Jun 18, 2026
b013ef8
feat(filetracking): workspace file-tracking capability + telemetry
lorenss-m Jun 19, 2026
99663d2
style(filetracking): move test-only pytest import under TYPE_CHECKING…
lorenss-m Jun 19, 2026
c35783c
fix(filetracking): address bugbot review on the observer + tracker
lorenss-m Jun 19, 2026
f5c4f54
fix(filetracking): keep skipped diffs pending; root-only gitignore
lorenss-m Jun 19, 2026
6cd857f
fix(filetracking): gate polling on successful observer setup
lorenss-m Jun 19, 2026
0e48355
fix(filetracking): degrade gracefully when capability open fails
lorenss-m Jun 19, 2026
81fc8dc
Merge pull request #429 from hud-evals/l/file-tracking
lorenss-m Jun 19, 2026
c13c3c5
Add cloud mode HUD runtime tunnel support
jdchawla29 Jun 17, 2026
67a48f4
feat(eval): make HUDRuntime use runtime tunnel
jdchawla29 Jun 19, 2026
0ad7424
fix(eval): address runtime tunnel review feedback
jdchawla29 Jun 19, 2026
1afc5ad
fix(cli): let runtime override remote flag
jdchawla29 Jun 19, 2026
47f1064
fix(cli): reject conflicting runtime placement flags
jdchawla29 Jun 19, 2026
8ad2be2
Merge pull request #430 from hud-evals/codex/cloud-runtime-tunnel-sdk
jdchawla29 Jun 19, 2026
6979905
fix: authlib deprecation warning
Parth220 Jun 19, 2026
584a6af
ad co8
lorenss-m Jun 19, 2026
d3d5d19
add modal runtime provider wiring
jdchawla29 Jun 19, 2026
6cc2be7
Merge pull request #433 from hud-evals/codex/modal-runtime-provider
jdchawla29 Jun 19, 2026
5bea22e
Add hud.TrainingClient + hud models CLI for managed RL training
lorenss-m Jun 18, 2026
87306f3
add small notes
lorenss-m Jun 18, 2026
f611497
hud.train: no retry on stateful training POSTs; add 2048 RL cookbook
lorenss-m Jun 18, 2026
98c8792
add timeout safety
lorenss-m Jun 18, 2026
19bfad3
also remote taskset run via cli, report trace info
lorenss-m Jun 18, 2026
4d80d7b
fx
lorenss-m Jun 18, 2026
b6dd7be
fx 2
lorenss-m Jun 18, 2026
1b76c65
fix scoring and timeouts
lorenss-m Jun 18, 2026
e85b66e
small fix
lorenss-m Jun 18, 2026
1905f42
Merge pull request #426 from hud-evals/l/tinker-training
jdchawla29 Jun 19, 2026
3db5250
Standardize default job names
lorenss-m Jun 19, 2026
77cd964
Merge pull request #434 from hud-evals/feat/standardize-default-job-n…
lorenss-m Jun 19, 2026
17 changes: 0 additions & 17 deletions .githooks/pre-push

This file was deleted.

15 changes: 1 addition & 14 deletions .github/workflows/ci.yml
@@ -23,21 +23,8 @@ jobs:
       - name: Install Python
         run: uv python install ${{ matrix.python-version }}

-      - name: Setup virtual display
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y xvfb
-          Xvfb :99 -screen 0 1920x1080x24 -ac &
-          sleep 3
-
-      - name: Install Playwright browsers
-        run: uv run --with=".[dev]" playwright install chromium
-
       - name: Run tests
-        env:
-          DISPLAY: :99
-          XAUTHORITY: /dev/null
-        run: uv run --python ${{ matrix.python-version }} --with=".[dev]" pytest --rootdir=hud --cov --cov-report=''
+        run: uv run --python ${{ matrix.python-version }} --with=".[dev]" pytest --cov --cov-report=''

   lint-ruff:
     runs-on: ubuntu-latest
12 changes: 10 additions & 2 deletions .gitignore
@@ -34,7 +34,6 @@ TODO.md
 /dev/

 .claude
-CLAUDE.md

 *.csv
 .rl_config_*.json
@@ -54,4 +53,13 @@ hud/rl/checkpoints_test/
 .ck/

 .hud_eval_config
-.hud_eval.toml
+.hud_eval.toml
+
+docs/internal
+
+environments/
+
+experiments/
+.memories/
+
+.codex/
3 changes: 3 additions & 0 deletions .hud/config.json
@@ -0,0 +1,3 @@
{
"tasksetId": "de5f3062-2587-4b33-a547-27995df213bd"
}
153 changes: 153 additions & 0 deletions AGENTS.md
@@ -0,0 +1,153 @@
# HUD Python Agent Guide

This repository is the Python SDK and CLI for HUD: environments, capabilities,
tasks, agents, the rollout engine, telemetry, and command-line workflows for
building and running agent evaluations.

Priorities: solve the requested problem, keep scope tight, preserve public SDK
behavior where it is actually shipped, and improve code quality rather than
adding local workarounds.

## Where To Look First

- `README.md` for the protocol, product concepts, and common CLI workflows.
- `docs/v6/` for the live SDK docs: quickstart, reference (environment, tasks,
capabilities, agents, graders, types, cli), run guides, and cookbooks.
- `CONTRIBUTING.md` for setup, test, lint, and type-check commands.
- `pyproject.toml` for supported Python versions, dependencies, optional extras,
ruff, pyright, pytest, and coverage configuration.
- Source files and colocated tests for exact behavior. Trust code and tests over
stale prose.
- `cookbooks/` for runnable end-to-end examples (each is its own uv project).

Keep this file stable. Do not turn it into a release runbook, command matrix, or
inventory of current incidents.

## Repository Map

- Core flow: `hud/environment/` (spec: capabilities, tasks, serving) →
`hud/eval/` (engine: rollout, runtimes, jobs) → `hud/agents/` (harnesses),
connected by `hud/capabilities/` and `hud/clients/`.
- `hud/cli/` is the Typer surface over the same modules.
- `hud/_legacy.py` and `hud/patches/` quarantine v5 compatibility.
- `cookbooks/` and `integrations/` live outside the `hud` package.

## Working Style

- Run commands from the repository root unless a tool explicitly requires a
subdirectory.
- Use `uv` for Python commands. Do not rely on an activated virtualenv.
- Read files before editing them and follow nearby patterns.
- Keep edits focused on the requested behavior. Do not clean up unrelated code.
- Prefer editing existing docs over creating new docs unless the user asks for a
new document.
- Do not introduce hacks, monkey patches, or partial workarounds. If a robust
solution needs missing support, add that support cleanly or report the blocker.
- Report any part of a change that is uncertain, fragile, or intentionally left
unverified.

## Setup And Checks

Use the commands in `CONTRIBUTING.md` as the source of truth. Common commands:

```bash
uv sync --extra dev
uv run pytest -q
uv run ruff format . --check
uv run ruff check .
uv run pyright
```

The shared pre-push hook lives in `.githooks/pre-push`, but agents should not
change local git config unless explicitly asked.

Tests run on Python 3.11 and 3.12 in CI. `pyproject.toml` currently supports
Python `>=3.11, <3.13`.

## Code Quality Bar

- Prefer direct, typed, maintainable code over clever or magical abstractions.
- Be ambitious about simplification. Look for ways to delete whole branches,
helper layers, modes, and special cases while preserving behavior.
- Fail fast and loudly. Avoid silent fallbacks, broad exception swallowing, and
defensive branches that hide broken invariants.
- Minimize branching. Every new `if`, `try`, compatibility path, or nullable mode
should earn its keep.
- Preserve documented public API and persisted behavior unless the task is an
intentional migration. Do not add compatibility layers for unshipped branch
work; replace the design cleanly.
- Reuse canonical helpers and local abstractions before adding new ones.
- Keep feature logic in the layer that owns the concept. Treat scattered
feature checks in shared paths as a design problem.
- Prefer explicit contracts over optional, loosely shaped, or cast-heavy data.
- Delete dead code. Do not keep obsolete paths around "just in case."
- Keep comments rare and useful. Explain non-obvious intent, not what the next
line mechanically does.
- Remove AI-generated slop before finishing: unnecessary comments, abnormal
defensive checks, broad `try` blocks, type bypasses, deep nesting, and thin
wrappers that do not reduce real complexity.
- Be suspicious of files pushed past 1000 lines. Decompose when there is a clear
focused module to extract.
- Avoid new core dependencies. If a dependency is only needed for optional
provider, tool, or integration behavior, put it behind the relevant extra.
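
As an illustration of the "fail fast and loudly" bullet above, here is a minimal sketch. The function name `parse_timeout_seconds` and its error message are hypothetical and are not taken from the HUD codebase; the point is only that invalid input raises instead of falling back to a default.

```python
def parse_timeout_seconds(raw: str) -> int:
    """Parse a positive timeout; raise instead of guessing a default.

    Hypothetical helper, used only to illustrate the guideline.
    """
    seconds = int(raw)  # int() raises ValueError on non-numeric input
    if seconds <= 0:
        # No silent fallback: a broken invariant surfaces at the call site.
        raise ValueError(f"timeout must be positive, got {seconds}")
    return seconds
```

The version this replaces would typically wrap the body in a broad `try` and return a default, which hides the bad value from whoever supplied it.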

## Typing And Imports

- Type public APIs and cross-module contracts. Prefer explicit Pydantic models or
typed structures over ad-hoc dictionaries at boundaries.
- `cast(...)` and `assert ...` are acceptable for real type narrowing. Broad
`# type: ignore` comments are not.
- Keep `Any` contained to genuinely dynamic payloads such as provider JSON,
metadata, or third-party integration blobs.
- Keep imports at the top of the module. Use inline imports only for an existing
lazy optional-dependency pattern or a documented circular-import constraint.
- Use `TYPE_CHECKING` imports for type-only imports that would otherwise add
runtime dependency cost or cycles.
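
The typing bullets above could look like the following sketch. `ToolCall` and `parse_tool_call` are hypothetical names, and a standard-library dataclass stands in for the Pydantic model the guide prefers so that the example has no third-party dependency.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Type-only import: skipped at runtime, so it adds no import cost or cycle.
    from collections.abc import Mapping


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical typed structure for a payload crossing a module boundary."""

    name: str
    arguments: dict[str, Any]  # Any stays contained to the dynamic payload


def parse_tool_call(payload: "Mapping[str, Any]") -> ToolCall:
    """Convert a loosely shaped mapping into an explicit contract."""
    name = payload["name"]  # KeyError if the field is missing
    arguments = payload["arguments"]
    if not isinstance(name, str):
        raise TypeError(f"name must be str, got {type(name).__name__}")
    if not isinstance(arguments, dict):
        raise TypeError(f"arguments must be dict, got {type(arguments).__name__}")
    return ToolCall(name=name, arguments=arguments)
```

The `isinstance` checks narrow the types for the checker without `# type: ignore`, and the quoted annotation keeps `Mapping` out of the runtime import graph.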

## Testing Expectations

- Add or update focused tests for behavior changes. Put tests near the module
they cover, following the existing `*/tests/` layout.
- Test behavior and contracts, not private implementation details.
- Regression tests should fail on the old behavior through the normal lifecycle
or public boundary. Do not manually seed private state such as internal maps,
caches, cursors, or prepared containers just to prove a changed line.
- If a bug involves internal state, reach it through real setup and execution:
construction, configuration, preparation, run loop, provider response, tool
execution, or public API call.
- Do not add hooks, helper methods, or abstraction layers only to make tests
easier. If a test needs that, reconsider the behavior boundary instead.
- Test names should describe the observable behavior or contract, not the
private mechanism.
- Mock external services, provider APIs, network, Docker, browser, and filesystem
boundaries as needed. Do not mock core logic just to make a test easy.
- Mark tests that require `HUD_API_KEY`, network access, or deployed services as
integration tests.
- Run the narrowest relevant tests first, then broader checks when the blast
radius is shared or user-facing.
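
A test written to these expectations might look like the sketch below. `StatusReporter` and its `client.get_job` method are hypothetical stand-ins for an SDK class and an external service client; only the external boundary is mocked, and the assertion goes through the public method.

```python
from unittest.mock import Mock


class StatusReporter:
    """Hypothetical class under test; `client` is the external boundary."""

    def __init__(self, client) -> None:
        self._client = client

    def summary(self, job_id: str) -> str:
        payload = self._client.get_job(job_id)
        return f"{payload['id']}: {payload['status']}"


def test_summary_reports_job_status() -> None:
    # Mock the network-facing client, not the logic being tested.
    client = Mock()
    client.get_job.return_value = {"id": "job-1", "status": "done"}

    reporter = StatusReporter(client)

    # Observable behavior through the public API, no private state seeded.
    assert reporter.summary("job-1") == "job-1: done"
    client.get_job.assert_called_once_with("job-1")
```

The test name describes the contract (the summary reports the job status) rather than the mechanism used to produce it.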

## Operational Debugging

- Follow the execution path instead of guessing from abstractions.
- For CLI issues, start with the command module, then config/settings, then the
SDK module being exercised.
- For agent/provider issues, inspect gateway resolution, provider adapter code,
capability-backed tool wiring, and recorded request/response shapes.
- For environment/task issues, inspect the task lifecycle (start/grade), the
control-channel server and client, and capability routing/tunneling.
- For execution issues, inspect the rollout engine: runtime provider
acquisition, `connect`, the `Run` lifecycle, and job/trace reporting.
- For telemetry issues, inspect instrumentation boundaries and exporter behavior
before changing call sites.
- Report what was verified, what remains inferred, and which file, test, trace,
or command output supports the conclusion.

## Decision Protocol

Ask first when scope, public API compatibility, or ownership is unclear.

Choose and flag when naming, test boundaries, or local structure are ambiguous
but the direction is straightforward.

Just do it when fixing formatting, applying an obvious bug fix with clear root
cause, tightening types, or removing slop that does not change behavior.
1 change: 1 addition & 0 deletions CLAUDE.md
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -25,7 +25,7 @@ git config core.hooksPath .githooks
 ### Running Tests

 ```bash
-uv run pytest --rootdir=hud -q
+uv run pytest -q
 ```

 Tests run on Python 3.11 and 3.12 in CI.