Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 24 additions & 4 deletions .claude/skills/add-tests-and-ci/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,15 +40,33 @@ if __name__ == "__main__":
- `run-ci-changed` extracts a top-level `NUM_GPUS = <N>` constant from added/modified `tests/test_*.py` and `tests/plugin_contracts/test_*.py`; if missing, it defaults to 8 GPUs. Set `NUM_GPUS = 0` for CPU-only tests.
- For GPU/e2e tests, follow the nearby file pattern (`prepare()`, `execute()`, `NUM_GPUS`, and any model/dataset constants).

### Step 3: Run Local Validation
### Step 3: Register Tests in GitHub CI

Whenever adding, moving, or renaming a test file, update the GitHub workflow template before finishing:

1. Add the test to the appropriate matrix in `.github/workflows/pr-test.yml.j2`.
- CPU-only pytest/unit tests usually belong in `cpu-unittest` with `num_gpus: 0`.
- GPU/e2e tests should be placed beside the nearest similar model/path test with the matching `num_gpus` and environment fields.
2. Regenerate workflows:

```bash
python .github/workflows/generate_github_workflows.py
```

3. Include both `.github/workflows/pr-test.yml.j2` and the generated `.github/workflows/pr-test.yml` in the change set.

Only skip fixed matrix registration when the test is intentionally helper-only or manually invoked; state that reason in the final response.

### Step 4: Run Local Validation

- Run the exact existing test files you changed, if any.
- For new registered tests, run the same shape CI will use, for example `python tests/test_new_file.py`.
- Run repository-wide checks only when they are already part of the task or workflow.
- Avoid documenting placeholder test commands that may not exist in the current tree.

### Step 4: Update Workflow Template Correctly
### Step 5: Keep Workflow Template as Source of Truth

For CI workflow changes:
For CI workflow changes unrelated to a new, moved, or renamed test:

1. Edit `.github/workflows/pr-test.yml.j2`
2. Regenerate workflows:
Expand All @@ -59,18 +77,20 @@ python .github/workflows/generate_github_workflows.py

3. Include both the template and generated workflow file in the change set (`.j2` and `.yml`). If the user asked for a commit, commit both.

### Step 5: Provide Verifiable PR Notes
### Step 6: Provide Verifiable PR Notes

Include:

- Which tests were added/changed
- Where each new/renamed test was registered in `.github/workflows/pr-test.yml.j2`
- Exact commands executed
- GPU assumptions for each test path
- Why this coverage protects against regression

## Common Mistakes

- Editing generated workflow file only
- Relying on `run-ci-changed` discovery for a new test that should run in the regular PR matrix
- Forgetting `NUM_GPUS = 0` on a CPU-only changed test, causing `run-ci-changed` to default to 8 GPUs
- Adding a CPU pytest file that passes under `pytest tests/foo.py` but fails under CI's `python tests/foo.py`
- Adding tests without following existing constants/conventions
Expand Down
8 changes: 4 additions & 4 deletions .github/workflows/pr-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,7 @@ jobs:
strategy:
fail-fast: false
matrix:
info: [{"enable_eval": "0", "num_gpus": 8, "test_file": "test_quick_start_glm4_9B.py"}, {"num_gpus": 8, "test_file": "test_glm4.7_30B_A3B_pd_mooncake.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_30B_A3B.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"num_gpus": 8, "test_file": "test_qwen3.6_35B_A3B_pd_mooncake.py", "use_deepep": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_30B_A3B_r3.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_4B_ppo.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_4B_ppo_disaggregate.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_4B_ppo_train_critic_only.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_moonlight_16B_A3B.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_moonlight_16B_A3B_r3.py"}, {"num_gpus": 8, "test_file": "test_mimo_7B_mtp_only_grad.py"}, {"num_gpus": 8, "test_file": "test_qwen3_0.6B_parallel_check.py"}, {"num_gpus": 8, "test_file": "test_qwen2.5_0.5B_debug_rollout_then_train.py"}, {"num_gpus": 8, "test_file": "test_qwen2.5_0.5B_opd_sglang.py"}, {"num_gpus": 4, "test_file": "test_qwen2.5_0.5B_fully_async_short.py"}, {"num_gpus": 8, "test_file": "test_qwen3_4B_streaming_partial_rollout.py"}, {"num_gpus": 4, "test_file": "test_qwen3.5_0.8B_gsm8k_short.py"}, {"num_gpus": 4, "test_file": "test_qwen3.5_0.8B_gsm8k_async_short.py"}, {"num_gpus": 8, "test_args": "--save-optimizer gpu --load-optimizer gpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--save-optimizer gpu --load-optimizer cpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--save-optimizer cpu --load-optimizer cpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--save-optimizer cpu --load-optimizer gpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--async-save", "test_file": "test_qwen3_4B_ckpt.py"}]
info: [{"enable_eval": "0", "num_gpus": 8, "test_file": "test_quick_start_glm4_9B.py"}, {"num_gpus": 8, "test_file": "test_glm4.7_30B_A3B_pd_mooncake.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_30B_A3B.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"num_gpus": 8, "test_file": "test_qwen3.6_35B_A3B_pd_mooncake.py", "use_deepep": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_30B_A3B_r3.py", "use_deepep": "1", "use_fp8_rollout": "1"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_4B_ppo.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_4B_ppo_disaggregate.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_qwen3_4B_ppo_train_critic_only.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_moonlight_16B_A3B.py"}, {"enable_eval": "0", "num_gpus": 8, "test_file": "test_moonlight_16B_A3B_r3.py"}, {"num_gpus": 8, "test_file": "test_mimo_7B_mtp_only_grad.py"}, {"num_gpus": 8, "test_file": "test_qwen3_0.6B_parallel_check.py"}, {"num_gpus": 8, "test_file": "test_qwen2.5_0.5B_debug_rollout_then_train.py"}, {"num_gpus": 8, "test_file": "test_qwen2.5_0.5B_opd_sglang.py"}, {"num_gpus": 6, "test_file": "test_qwen3_4B_external_pd.py"}, {"num_gpus": 4, "test_file": "test_qwen2.5_0.5B_fully_async_short.py"}, {"num_gpus": 8, "test_file": "test_qwen3_4B_streaming_partial_rollout.py"}, {"num_gpus": 4, "test_file": "test_qwen3.5_0.8B_gsm8k_short.py"}, {"num_gpus": 4, "test_file": "test_qwen3.5_0.8B_gsm8k_async_short.py"}, {"num_gpus": 8, "test_args": "--save-optimizer gpu --load-optimizer gpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--save-optimizer gpu --load-optimizer cpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--save-optimizer cpu --load-optimizer cpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--save-optimizer cpu --load-optimizer gpu", "test_file": "test_qwen3_4B_ckpt.py"}, {"num_gpus": 8, "test_args": "--async-save", "test_file": "test_qwen3_4B_ckpt.py"}]
defaults:
run:
working-directory: ${{ github.workspace }}
Expand Down Expand Up @@ -454,7 +454,7 @@ jobs:
strategy:
fail-fast: false
matrix:
info: [{"num_gpus": 0, "test_file": "test_megatron_argument_validation.py"}, {"num_gpus": 0, "test_file": "test_dp_schedule.py"}, {"num_gpus": 0, "test_file": "test_cp_utils.py"}, {"num_gpus": 0, "test_file": "test_metric_report.py"}, {"num_gpus": 0, "test_file": "test_metric_report_dist.py"}, {"num_gpus": 0, "test_file": "test_loss_cp_invariance.py"}, {"num_gpus": 0, "test_file": "test_value_temperature.py"}, {"num_gpus": 0, "test_file": "test_rm_f1.py"}, {"num_gpus": 0, "test_file": "test_rm_gpqa.py"}, {"num_gpus": 0, "test_file": "test_rm_math.py"}, {"num_gpus": 0, "test_file": "test_rm_math_dapo.py"}, {"num_gpus": 0, "test_file": "test_rm_deepscaler.py"}, {"num_gpus": 0, "test_file": "test_sample.py"}, {"num_gpus": 0, "test_file": "test_agent_trajectory.py"}, {"num_gpus": 0, "test_file": "test_rollout_validation.py"}, {"num_gpus": 0, "test_file": "utils/test_hf_checkpoint_saver.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_rollout_contracts.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_runtime_hook_contracts.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_path_loading_contracts.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_generate_contracts.py"}]
info: [{"num_gpus": 0, "test_file": "test_megatron_argument_validation.py"}, {"num_gpus": 0, "test_file": "test_dp_schedule.py"}, {"num_gpus": 0, "test_file": "test_cp_utils.py"}, {"num_gpus": 0, "test_file": "test_metric_report.py"}, {"num_gpus": 0, "test_file": "test_metric_report_dist.py"}, {"num_gpus": 0, "test_file": "test_loss_cp_invariance.py"}, {"num_gpus": 0, "test_file": "test_value_temperature.py"}, {"num_gpus": 0, "test_file": "test_rm_f1.py"}, {"num_gpus": 0, "test_file": "test_rm_gpqa.py"}, {"num_gpus": 0, "test_file": "test_rm_math.py"}, {"num_gpus": 0, "test_file": "test_rm_math_dapo.py"}, {"num_gpus": 0, "test_file": "test_rm_deepscaler.py"}, {"num_gpus": 0, "test_file": "test_sample.py"}, {"num_gpus": 0, "test_file": "test_agent_trajectory.py"}, {"num_gpus": 0, "test_file": "test_rollout_validation.py"}, {"num_gpus": 0, "test_file": "test_placement_group.py"}, {"num_gpus": 0, "test_file": "test_external_sglang_engines.py"}, {"num_gpus": 0, "test_file": "utils/test_hf_checkpoint_saver.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_rollout_contracts.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_runtime_hook_contracts.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_path_loading_contracts.py"}, {"num_gpus": 0, "test_file": "plugin_contracts/test_plugin_generate_contracts.py"}]
defaults:
run:
working-directory: ${{ github.workspace }}
Expand All @@ -481,7 +481,7 @@ jobs:
shell: bash
run: |
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install pytest numpy packaging pyyaml omegaconf tqdm httpx pybase64 pylatexenc sympy aiohttp pillow safetensors
pip install pytest numpy packaging pyyaml omegaconf tqdm httpx requests ray pybase64 pylatexenc sympy aiohttp pillow safetensors


- name: Install
Expand Down Expand Up @@ -547,7 +547,7 @@ jobs:
shell: bash
run: |
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install pytest numpy packaging pyyaml omegaconf tqdm httpx pybase64 pylatexenc sympy aiohttp pillow safetensors
pip install pytest numpy packaging pyyaml omegaconf tqdm httpx requests ray pybase64 pylatexenc sympy aiohttp pillow safetensors

pip install openai openai-agents anthropic

Expand Down
5 changes: 4 additions & 1 deletion .github/workflows/pr-test.yml.j2
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
{'test_file': 'test_qwen3_0.6B_parallel_check.py', 'num_gpus': 8},
{'test_file': 'test_qwen2.5_0.5B_debug_rollout_then_train.py', 'num_gpus': 8},
{'test_file': 'test_qwen2.5_0.5B_opd_sglang.py', 'num_gpus': 8},
{'test_file': 'test_qwen3_4B_external_pd.py', 'num_gpus': 6},
{'test_file': 'test_qwen2.5_0.5B_fully_async_short.py', 'num_gpus': 4},
{'test_file': 'test_qwen3_4B_streaming_partial_rollout.py', 'num_gpus': 8},
{'test_file': 'test_qwen3.5_0.8B_gsm8k_short.py', 'num_gpus': 4},
Expand Down Expand Up @@ -83,6 +84,8 @@
{'test_file': 'test_sample.py', 'num_gpus': 0},
{'test_file': 'test_agent_trajectory.py', 'num_gpus': 0},
{'test_file': 'test_rollout_validation.py', 'num_gpus': 0},
{'test_file': 'test_placement_group.py', 'num_gpus': 0},
{'test_file': 'test_external_sglang_engines.py', 'num_gpus': 0},
{'test_file': 'utils/test_hf_checkpoint_saver.py', 'num_gpus': 0},
{'test_file': 'plugin_contracts/test_plugin_rollout_contracts.py', 'num_gpus': 0},
{'test_file': 'plugin_contracts/test_plugin_runtime_hook_contracts.py', 'num_gpus': 0},
Expand Down Expand Up @@ -194,7 +197,7 @@ jobs:
shell: bash
run: |
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install pytest numpy packaging pyyaml omegaconf tqdm httpx pybase64 pylatexenc sympy aiohttp pillow safetensors
pip install pytest numpy packaging pyyaml omegaconf tqdm httpx requests ray pybase64 pylatexenc sympy aiohttp pillow safetensors
<% if config.get('extra_pip_deps') %>
pip install << config.extra_pip_deps >>
<% endif %>
Expand Down
15 changes: 10 additions & 5 deletions docs/en/advanced/sglang-config.md
Original file line number Diff line number Diff line change
Expand Up @@ -257,7 +257,7 @@ Overrides take **highest priority**, overriding both the base `--sglang-*` CLI a

### 7. Standalone SGLang Launcher

While `--sglang-config` is designed for slime's training pipeline, it also works as a powerful launcher for pure inference scenarios using the `--rollout-external` pattern or by configuring slime to focus solely on serving.
While `--sglang-config` is designed for slime's training pipeline, it also works as a powerful launcher for pure inference scenarios using external engine addresses or by configuring slime to focus solely on serving.

**Using external engines with a pre-launched topology:**

Expand All @@ -270,12 +270,17 @@ python -m sglang.launch_server --model-path /path/to/model --port 10091 ...
# Step 2: Connect slime to external engines
python train.py \
--rollout-external \
--rollout-external-engine-addrs host1:10090 host2:10091 \
...
```

> **Note:** `--sglang-config` and `--rollout-external` are mutually exclusive. Use `--sglang-config` when you want slime to manage the full engine lifecycle; use `--rollout-external` when engines are pre-deployed.
slime queries each external engine's `/server_info` endpoint to infer
`rollout_num_gpus`, per-engine GPU counts, SGLang parallel sizes, and
prefill/decode worker types. If no `--sglang-router-ip/--sglang-router-port`
is provided, slime launches its own router and registers the external engines
to it.

> **Note:** `--sglang-config` and `--rollout-external-engine-addrs` are mutually exclusive. Use `--sglang-config` when you want slime to manage the full engine lifecycle; use `--rollout-external-engine-addrs` when engines are pre-deployed.

---

Expand Down Expand Up @@ -332,7 +337,7 @@ When the config is loaded, slime applies the following resolution cascade:
| Flag | Conflict Reason |
|------|----------------|
| `--prefill-num-servers` | PD disaggregation is configured via `server_groups` in the YAML |
| `--rollout-external` | External engines have their own topology; config manages the lifecycle internally |
| `--rollout-external-engine-addrs` | External engines have their own topology; config manages the lifecycle internally |

---

Expand Down Expand Up @@ -446,7 +451,7 @@ Use `get_model_url(args, "model_name", "/endpoint")` from `slime.rollout.sglang_

### Q: Can I use `--sglang-config` without training (inference only)?

While `--sglang-config` is designed for slime's training loop, you can effectively use it for inference-only scenarios by configuring a rollout-only run. For fully standalone SGLang serving, consider using SGLang's native `launch_server` directly or the `--rollout-external` mode for connecting to pre-deployed engines.
While `--sglang-config` is designed for slime's training loop, you can effectively use it for inference-only scenarios by configuring a rollout-only run. For fully standalone SGLang serving, consider using SGLang's native `launch_server` directly or `--rollout-external-engine-addrs` for connecting to pre-deployed engines.

### Q: What is the relationship between `--sglang-config` and `--prefill-num-servers`?

Expand Down
14 changes: 9 additions & 5 deletions docs/zh/advanced/sglang-config.md
Original file line number Diff line number Diff line change
Expand Up @@ -257,7 +257,7 @@ sglang:

### 7. 独立 SGLang 启动器

虽然 `--sglang-config` 是为 slime 的训练流水线设计的,但它也可以作为纯推理场景的强大启动器,通过 `--rollout-external` 模式或配置 slime 仅关注推理服务。
虽然 `--sglang-config` 是为 slime 的训练流水线设计的,但它也可以作为纯推理场景的强大启动器,通过外部 engine 地址或配置 slime 仅关注推理服务。

**使用预启动的外部引擎:**

Expand All @@ -270,12 +270,16 @@ python -m sglang.launch_server --model-path /path/to/model --port 10091 ...
# 步骤 2:将 slime 连接到外部引擎
python train.py \
--rollout-external \
--rollout-external-engine-addrs host1:10090 host2:10091 \
...
```

> **注意:** `--sglang-config` 和 `--rollout-external` 互斥。当你希望 slime 管理完整的引擎生命周期时,使用 `--sglang-config`;当引擎已预部署时,使用 `--rollout-external`。
slime 会请求每个外部引擎的 `/server_info`,自动推断
`rollout_num_gpus`、单个 engine 的 GPU 数、SGLang 并行参数,以及
prefill/decode worker 类型。如果没有提供 `--sglang-router-ip/--sglang-router-port`,
slime 会自己启动 router,并把这些外部引擎注册进去。

> **注意:** `--sglang-config` 和 `--rollout-external-engine-addrs` 互斥。当你希望 slime 管理完整的引擎生命周期时,使用 `--sglang-config`;当引擎已预部署时,使用 `--rollout-external-engine-addrs`。

---

Expand Down Expand Up @@ -332,7 +336,7 @@ slime 自动为每个 sample 分配一个唯一的 `session_id`(存储在 `sam
| 选项 | 冲突原因 |
|------|----------|
| `--prefill-num-servers` | PD 分离通过 YAML 中的 `server_groups` 配置 |
| `--rollout-external` | 外部引擎有自己的拓扑;config 在内部管理生命周期 |
| `--rollout-external-engine-addrs` | 外部引擎有自己的拓扑;config 在内部管理生命周期 |

---

Expand Down Expand Up @@ -446,7 +450,7 @@ async def generate_with_models(args, sample, sampling_params):

### Q: 可以不训练,只用 `--sglang-config` 做推理吗?

虽然 `--sglang-config` 是为 slime 的训练循环设计的,但你可以通过配置仅 rollout 的运行来实现纯推理场景。对于完全独立的 SGLang 推理服务,建议直接使用 SGLang 原生的 `launch_server`,或使用 `--rollout-external` 模式连接预部署的引擎
虽然 `--sglang-config` 是为 slime 的训练循环设计的,但你可以通过配置仅 rollout 的运行来实现纯推理场景。对于完全独立的 SGLang 推理服务,建议直接使用 SGLang 原生的 `launch_server`,或使用 `--rollout-external-engine-addrs` 连接预部署的引擎

### Q: `--sglang-config` 和 `--prefill-num-servers` 是什么关系?

Expand Down
2 changes: 1 addition & 1 deletion slime/backends/megatron_utils/actor.py
Original file line number Diff line number Diff line change
Expand Up @@ -628,7 +628,7 @@ def update_weights(self) -> None:
self.weight_updater.update_weights()
print_memory("after update_weights")

if self.args.ci_test and len(rollout_engines) > 0:
if self.args.ci_test and len(rollout_engines) > 0 and self.weight_updater.weight_version > 0:
engine = random.choice(rollout_engines)
engine_version = ray.get(engine.get_weight_version.remote())
if str(engine_version) != str(self.weight_updater.weight_version):
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -577,11 +577,6 @@ def update_weights(self) -> None:
if not self._snapshot_seeded:
self._seed_snapshot()
self._snapshot_seeded = True
# Pin the engine's recorded version to ours (0) on the seed call so the
# CI version-equality check holds before any real sync has happened.
if dist.get_rank() == 0 and self.transport == "disk" and self.rollout_engines:
weight_version = str(self.weight_version)
ray.get([engine.set_weight_version.remote(weight_version) for engine in self.rollout_engines])
return

self.weight_version += 1
Expand Down
Loading
Loading