Closed
49 commits
d7e1b3b
docs(v6): restructure nav and expand protocol into its own page
lukass16 Jun 19, 2026
9ae6f46
docs(v6): rework landing flow, add runtime page, fix dark-mode theming
lukass16 Jun 19, 2026
3c9a71a
docs(v6): collapse landing parts into accordions and fix table styling
lukass16 Jun 19, 2026
9c2210e
docs(v6): rename reference section to core and sharpen protocol/envir…
lukass16 Jun 19, 2026
0f3931f
docs(v6): fix stale SDK references in chat, tasks, robot, and skill docs
lukass16 Jun 19, 2026
1adf347
fix(cli): use 'hud serve' in scaffolded Dockerfile.hud
lukass16 Jun 19, 2026
774b929
docs(v6): document runtime_config, Job.results, and cloud runtimes; f…
lukass16 Jun 19, 2026
d5646df
revert(cli): keep 'hud dev' in scaffolded Dockerfile.hud
lukass16 Jun 19, 2026
a29876a
docs(v6): tighten environment page and drop internal load_environment
lukass16 Jun 19, 2026
4707975
docs(v6): rework tasks & tasksets page and move placement detail to r…
lukass16 Jun 19, 2026
6229f6a
docs(v6): clarify how capabilities spin up and stay reachable
lukass16 Jun 19, 2026
70e7a8d
docs(v6): rework agents page and expand harness/Run guidance
lukass16 Jun 19, 2026
d27d539
Merge origin/v6 into lukass/docs-love
lukass16 Jun 19, 2026
7a8955c
Merge pull request #421 from hud-evals/v6
jdchawla29 Jun 19, 2026
0aeae44
docs(v6): update index
lukass16 Jun 19, 2026
6dcc40a
add updates and fix docs
lorenss-m Jun 19, 2026
4cd60a0
fix version
lorenss-m Jun 19, 2026
56f561c
Merge pull request #438 from hud-evals/l/v6-template-updates
lorenss-m Jun 19, 2026
6f3b9b7
docs(v6): motivation love
lukass16 Jun 19, 2026
3b126a9
Merge origin/main into lukass/docs-love
lukass16 Jun 19, 2026
1b86302
feat(cli): hud init --preset to scaffold from GitHub starters
lorenss-m Jun 19, 2026
363c0a2
chore: bump version to 0.6.1
lorenss-m Jun 19, 2026
1522c16
chore: bump pyproject version to 0.6.1
lorenss-m Jun 19, 2026
4fb0a5d
fix(cli): clean up partial dir on failed preset fetch; document hud init
lorenss-m Jun 19, 2026
d68591a
fix(cli): preserve executable bits in preset extraction; fix init tests
lorenss-m Jun 19, 2026
681ec80
Merge pull request #441 from hud-evals/l/hud-init-presets
lorenss-m Jun 19, 2026
937ddee
docs(v6): clean up robot
lukass16 Jun 20, 2026
03a84cf
fix(clients): raise connect ready_timeout default to 240s
lukass16 Jun 15, 2026
9904d54
feat(robot): add RemoteModel client for OpenPI-WebSocket policy servers
lukass16 Jun 15, 2026
19367d3
feat(robot): add BatchedAgent/BatchedModel for concurrent rollout inf…
lukass16 Jun 16, 2026
3758adf
feat(robot): adopt OpenPI wire-key convention + OpenPIAdapter
lukass16 Jun 16, 2026
1ad1254
feat(robot): stream camera frames as per-camera H.264 video
lukass16 Jun 17, 2026
93efb50
fix(robot): make inference Model stateless to fix shared reset race
lukass16 Jun 17, 2026
3aed868
docs(robot): document [N, T, A] infer contract and BatchedAgent owner…
lukass16 Jun 17, 2026
0cf3d78
fix(robot): clamp control rate and clear CI pyright/test failures
lukass16 Jun 18, 2026
4c85e4a
chore(robot): fix ruff lint failures in robot and runtime modules
lukass16 Jun 18, 2026
26ee4e7
feat(models): surface is_trainable + model id in `hud models`; bump 0…
lorenss-m Jun 20, 2026
dde7a80
Merge pull request #443 from hud-evals/l/hud-models-trainable
lorenss-m Jun 20, 2026
82dd7da
Merge branch 'lukass/docs-love' into v6-robot-3
lukass16 Jun 20, 2026
0f3a61c
Merge remote-tracking branch 'origin/main' into v6-robot-3
lukass16 Jun 20, 2026
7086986
Align v6 scaffold and docs with current CLI, add RL cookbooks.
lorenss-m Jun 20, 2026
49a15d8
Remove docs/AGENTS.md and apply ruff formatting.
lorenss-m Jun 20, 2026
4c7cc7e
feat(eval): link remote jobs to their synced taskset (Taskset.api_id …
lorenss-m Jun 20, 2026
4f2299b
style: ruff format job.py
lorenss-m Jun 20, 2026
8367609
chore: bump version to 0.6.3
lorenss-m Jun 20, 2026
d6ff51e
Merge remote-tracking branch 'origin/main' into l/small-updates-16
lorenss-m Jun 20, 2026
4b8063b
Merge pull request #445 from hud-evals/l/small-updates-16
lorenss-m Jun 20, 2026
faf59b2
Merge branch 'main' into v6-robot-3
lukass16 Jun 20, 2026
127606d
feat(robot): add opt-in LeRobot dataset recording to robot rollouts
lukass16 Jun 20, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -7,6 +7,10 @@ __pycache__
.pytest_cache
dist/
build/
# The broad build/ rule above also matches docs/v6/build/, which is real docs
# content (linked from docs.json). Keep tracking it so docs.hud.ai/v6/build/*
# does not 404.
!docs/v6/build/
*.egg-info/
uv.lock

114 changes: 114 additions & 0 deletions cookbooks/fireworks-rl-training/README.md
@@ -0,0 +1,114 @@
# Fireworks RL Training

An RL loop that calls the Fireworks Training API directly, over the same
arithmetic preview task used by `cookbooks/rl-training`.

This does **not** use Fireworks native datasets or RFT jobs. It follows the
Training API service path from the Fireworks docs:

1. `FiretitanServiceClient.from_firetitan_config(...)`
2. `create_deployment_sampler(...)` for high-parallel rollouts
3. local grading of HUD-style multiplication tasks
4. `forward_backward_custom(...)` + `optim_step(...)`
5. `save_weights_for_sampler(...)` + sampler refresh
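
Step 3 is the only part of the loop that runs entirely locally. A minimal
sketch of what such a grader might look like, assuming a binary reward on an
exact integer match; `make_task` and `grade` are hypothetical names for
illustration, not functions taken from `train.py`:

```python
import random
import re


def make_task(
    rng: random.Random, min_a: int, max_a: int, min_b: int, max_b: int
) -> tuple[str, int]:
    """Sample one multiplication prompt and its expected answer (hypothetical helper)."""
    a = rng.randint(min_a, max_a)
    b = rng.randint(min_b, max_b)
    prompt = f"What is {a} * {b}? Reply only with the integer."
    return prompt, a * b


def grade(completion: str, expected: int) -> float:
    """Return 1.0 if the completion is exactly the expected integer, else 0.0."""
    # Tolerate surrounding whitespace and thousands separators, nothing else.
    text = completion.strip().replace(",", "")
    if not re.fullmatch(r"-?\d+", text):
        return 0.0
    return 1.0 if int(text) == expected else 0.0
```

A strict match like this keeps the reward unambiguous: any extra words in the
completion score zero, which is in line with a "reply only with the integer"
style of prompt.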

References:

- Fireworks Training API introduction: https://docs.fireworks.ai/fine-tuning/training-api/introduction
- Training and sampling lifecycle: https://docs.fireworks.ai/fine-tuning/training-api/training-and-sampling
- Loss functions / GRPO reference: https://docs.fireworks.ai/fine-tuning/training-api/loss-functions

## Setup

The repo-level `.env` is loaded automatically. It must contain:

```bash
FIREWORKS_API_KEY=...
FIREWORKS_ACCOUNT_ID=...
```

Install the isolated cookbook environment:

```bash
uv sync --pre
```

## Calibrate task difficulty first

Calibration defaults to Fireworks' OpenAI-compatible inference API, so it does
**not** create a trainer, provision a Training API deployment, or call
`optim_step`. This is the cheap way to tune task difficulty before paying for a
Training API run.

The calibration model is separate from the training base model because the
`lorenss` key currently exposes only a small serverless inference catalog (no
Qwen3 8B deployment). Override it with `--inference-model` if you have a
deployed model that is closer to the training base model.

```bash
uv run train.py --calibrate-only --groups-per-step 8 --rollouts-per-prompt 8 --parallelism 32
```

The goal is a reward distribution with nonzero variance. If every reward is
zero, make the task easier:

```bash
uv run train.py --calibrate-only --min-a 10 --max-a 99 --min-b 2 --max-b 9
```

If every reward is one, make the task harder:

```bash
uv run train.py --calibrate-only --min-a 1000 --max-a 9999 --min-b 11 --max-b 99
```
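
Why variance matters can be seen in a small sketch, assuming a GRPO-style
group-normalized advantage like the one described in the loss-function
reference linked above. The loss actually used by `train.py` may differ, and
`group_advantages` is a hypothetical name:

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center and scale rewards within one prompt's rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    print(group_advantages([0.0] * 4))            # uniform group: all advantages are 0.0
    print(group_advantages([1.0, 0.0, 0.0, 0.0]))  # mixed group: nonzero advantages
```

Under this assumption, a group whose rewards are all zero or all one yields
zero advantage for every rollout, so it contributes no learning signal.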

The current defaults are calibrated for the visible `gpt-oss-120b` inference
model on the `lorenss` key: 2-digit by 1-digit multiplication with a direct
"reply only with the integer" prompt. A 32-rollout calibration gave a non-trivial
baseline (`reward_mean ~= 0.22`, `reward_std ~= 0.42`), while the original
3-digit by 2-digit range was all-zero.
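
As a sanity check on those numbers: they are consistent with a binary reward
where 7 of the 32 rollouts were correct. The count of 7 and the use of the
sample standard deviation are inferences made here, not values stated by the
calibration output.

```python
from statistics import mean, stdev

# Assumed: binary rewards, 7 correct out of 32 rollouts.
rewards = [1.0] * 7 + [0.0] * 25

reward_mean = mean(rewards)
reward_std = stdev(rewards)  # sample standard deviation (divides by n - 1)

print(round(reward_mean, 2), round(reward_std, 2))  # prints 0.22 0.42
```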

## Train

Once calibration shows non-trivial rewards, start a training run:

```bash
uv run train.py --steps 5 --groups-per-step 8 --rollouts-per-prompt 8 --parallelism 32
```

This uses the direct Training API managed service path. If you want calibration
to go through the managed deployment sampler too, pass
`--calibration-backend managed`; this provisions the same resources as training.

### Current Fireworks preview account blocker

On the `lorenss` preview account, trainer creation currently fails before the
first train step with:

```text
failed to ensure FIREWORKS_API_KEY secret: unkey inference api id is not configured
```

This happens even with `create_deployment=False`, so it is an account or
control-plane provisioning issue rather than a problem in the rollout or loss
code. Once Fireworks enables the missing Unkey inference API config for the
account, the same `uv run train.py ...` command should proceed to trainer
startup and the first `forward_backward_custom(...)` call.

Metrics are written to:

- `runs/fireworks-rl-preview/metrics.jsonl`
- `runs/fireworks-rl-preview/reward_loss.png` if `matplotlib` is installed
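
The JSON Lines file can be inspected without `matplotlib`. A minimal sketch
that makes no assumption about which fields each record contains;
`parse_jsonl` is a hypothetical helper, not part of the cookbook:

```python
import json
from collections.abc import Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines input into one dict per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


if __name__ == "__main__":
    path = "runs/fireworks-rl-preview/metrics.jsonl"
    with open(path, encoding="utf-8") as handle:
        records = parse_jsonl(handle)
    # Print the field names of the last record to see what was logged.
    print(len(records), sorted(records[-1]) if records else [])
```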

## Notes

- Defaults use Qwen3 8B full-parameter training:
  - `accounts/fireworks/models/qwen3-8b`
  - `Qwen/Qwen3-8B`
  - `accounts/fireworks/trainingShapes/qwen3-8b-128k`
- LoRA can be tested with `--lora-rank N`, but the validated Qwen3 8B training
  shape currently rejects LoRA mode on the `lorenss` preview account.
- The first checkpoint sync happens after step 0, and subsequent rollouts sample
  the updated weights through the same deployment.
- `--keep-trainer` and `--keep-deployment` are available for debugging. By
  default the trainer is cleaned up and the deployment scales to zero on exit.
19 changes: 19 additions & 0 deletions cookbooks/fireworks-rl-training/pyproject.toml
@@ -0,0 +1,19 @@
[project]
name = "fireworks-rl-training"
version = "0.1.0"
description = "Direct Fireworks Training API RL loop over HUD-style arithmetic tasks"
requires-python = ">=3.11,<3.13"
dependencies = [
    "fireworks-ai[training]",
    "hud-python",
    "matplotlib",
    "python-dotenv",
    "torch>=2",
    "transformers>=4.55",
]

[tool.uv]
package = false

[tool.uv.sources]
hud-python = { path = "../..", editable = true }