Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 81 additions & 5 deletions .github/workflows/codeowner-signoff-verify.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,12 @@
# this InferenceX PR.
# 4. An authorized maintainer has explicitly posted a `/reuse-sweep-run`
# command on the PR (the reuse-merge path requires it).
# 5. The sign-off uses the latest docs/PR_REVIEW_CHECKLIST.md template.
# 6. vLLM/SGLang configs run upstream images (hub.docker.com/u/vllm,
# hub.docker.com/u/lmsysorg) on established hardware, and vLLM/SGLang
# submissions land before additional frameworks (TRT-LLM, ATOM, ...).
# 7. No benchmark hacks that change the model architecture / cut FLOPs.
# 8. Speculative-decoding configs benchmark through chat templates.
# If any of those are not to standard, Claude posts a single comment that
# @-mentions the reviewer who signed off and explains exactly what is wrong.
#
Expand Down Expand Up @@ -229,9 +235,11 @@
A CODEOWNER (`${{ needs.gate.outputs.signoff-author }}`) just posted the reviewer
sign-off checklist (as a ${{ needs.gate.outputs.signoff-kind }}) that marks
PR #${{ needs.gate.outputs.pr-number }} as ready to merge. Your job is to
INDEPENDENTLY verify the checks below (0-3). Do not trust the reviewer's checkmarks
— re-derive every conclusion from CODEOWNERS, CI runs, the PR diff, and the linked
recipe yourself. Be rigorous and specific.
INDEPENDENTLY verify the checks below (0-8). Do not trust the reviewer's checkmarks
— re-derive every conclusion from CODEOWNERS, CI runs, the PR diff, the master
configs, and the linked recipe yourself. Be rigorous and specific. The checks encode
the merge standard in `docs/PR_REVIEW_CHECKLIST.md` (read it in the checked-out
default branch — it is the source of truth if wording here and there ever drifts).

Read the exact sign-off body first (especially its "Additional detail section").
The sign-off was posted as a ${{ needs.gate.outputs.signoff-kind }}, so fetch it with:
Expand Down Expand Up @@ -389,8 +397,76 @@
command has been posted on this PR" and remind the reviewer that an authorized
maintainer must comment `/reuse-sweep-run` before this PR can be merged via reuse.

## Check 5 — Sign-off uses the LATEST checklist template
The first item of the checklist has the reviewer affirm they used the latest version
of `docs/PR_REVIEW_CHECKLIST.md`. Verify it instead of trusting it: read the template
in `docs/PR_REVIEW_CHECKLIST.md` (checked-out default branch) and compare its items
against the sign-off body.
- PASS if every item in the current template has a corresponding checked (`[x]`) item
in the sign-off. Match items semantically — minor wording drift is fine; a missing
ITEM is not.
- FAIL if the sign-off is missing current-template items (stale copy) or left items
unchecked without an explanation in the additional detail section. Name the
missing/unchecked items and link the current template.

Check warning on line 410 in .github/workflows/codeowner-signoff-verify.yml

View check run for this annotation

Claude / Claude Code Review

Check 5 spuriously FAILs on clean sign-offs due to conditional escape-hatch item

Check 5's PASS/FAIL rule doesn't account for the conditional escape-hatch item on line 18 of `docs/PR_REVIEW_CHECKLIST.md` (`If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.`). On a perfectly clean sign-off, that box is intentionally left unchecked with no additional-detail entry — which literally matches Check 5's FAIL condition ("left items unchecked without an explanation in the additional detail section"), inverting its semantics. Sugge
Comment on lines +405 to +410

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Check 5's PASS/FAIL rule doesn't account for the conditional escape-hatch item on line 18 of docs/PR_REVIEW_CHECKLIST.md (If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.). On a perfectly clean sign-off, that box is intentionally left unchecked with no additional-detail entry — which literally matches Check 5's FAIL condition ("left items unchecked without an explanation in the additional detail section"), inverting its semantics. Suggest exempting the conditional item, e.g. "the escape-hatch If any of the above criteria cannot reasonably be satisfied line stays unchecked unless a genuine exception is documented — treat it as PASS in that case".

Extended reasoning...

What the bug is. Check 5 asks the verifier to compare the sign-off body against the current docs/PR_REVIEW_CHECKLIST.md template. Its rules are:

  • PASS if every item has a corresponding [x] in the sign-off.
  • FAIL if items are missing (stale copy) or "left items unchecked without an explanation in the additional detail section."

Every item in the template is a verification affirmation ("I have verified X") — except line 18, which is inherently conditional:

- [ ] If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

Its own wording says "IF any of the above criteria cannot reasonably be satisfied" — the reviewer is meant to tick it only when they need to note an exception, and the accompanying reasoning belongs in the additional-detail section.

Why the check fails on the happy path. On a perfectly clean sign-off — every criterion satisfied, no exception needed — line 18 is legitimately unchecked and the additional-detail section is empty (or contains only non-exception info like the recipe link). That is a literal three-of-three match for Check 5's FAIL rule: item exists in template, item unchecked, no explanation in additional-detail. So Check 5 would FAIL precisely the state that is most correct.

Why existing wording doesn't rescue it. Check 5 does say "Match items semantically — minor wording drift is fine," which gives the LLM auditor some latitude. But the FAIL clause is written mechanically ("left items unchecked without an explanation"), and a strict-reading verifier acting on that rule will trip. The safer fix is to remove the ambiguity, not rely on interpretive charity.

Step-by-step proof (happy-path sign-off):

  1. Reviewer submits a sign-off where every substantive item (lines 8-17 in the template) is [x].
  2. Reviewer leaves line 18 (- [ ] If any of the above criteria cannot reasonably be satisfied ...) unchecked because there are no exceptions to explain.
  3. The additional-detail section contains only the required recipe link — nothing addressing line 18.
  4. Check 5 runs: item 18 is in the current template ✓, item 18 is left unchecked ✓, there is no explanation for item 18 in the additional-detail section ✓ — the FAIL rule literally matches.
  5. The verifier reports Check 5 as FAIL, blocking merge on a correct sign-off. Reviewer's only workarounds are (a) tick a box whose text is a false claim ("criteria cannot reasonably be satisfied") or (b) add filler text to the additional-detail section. Both are worse than passing.

Impact. Nit-level, but pointed at the common case: false FAILs will land on precisely the sign-offs that are most correct. The verifier is Claude Fable-5, which will often reason semantically and pass anyway, so this won't fire every time — but the rule text invites it to fire, and even one spurious FAIL wastes a merge cycle. Nothing about merging as-is is unsafe.

How to fix. One clause in Check 5, e.g.:

The conditional item ("If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.") is an escape-hatch — it should be checked ONLY when a genuine exception is documented in the additional-detail section. Leaving it unchecked with no exception is expected and PASSes.


## Check 6 — Upstream vLLM/SGLang images, and engine-first ordering
The checklist makes upstream engine images a HARD guideline: on established hardware,
vLLM/SGLang submissions must run images published by the upstream projects, not
vendor forks. Established (NOT "new hardware") SKUs: NVIDIA H100, H200, B200, B300,
GB200, GB300; AMD MI300X, MI325X, MI355X.
Identify each master-config entry this PR adds/changes (in `configs/*-master.yaml`)
and read its `framework:`, `runner:`, and `image:` fields.
- (a) UPSTREAM IMAGE: for entries with `framework: vllm`, the image must come from the
upstream vLLM Docker Hub org (https://hub.docker.com/u/vllm) — in the master configs
that looks like `vllm/vllm-openai:<tag>` or `vllm/vllm-openai-rocm:<tag>`. For
`framework: sglang`, it must come from the upstream org
(https://hub.docker.com/u/lmsysorg) — `lmsysorg/sglang:<tag>` or
`lmsysorg/sglang-rocm:<tag>` (digest-pinned `@sha256:...` variants are fine).
Vendor/private forks — e.g. `rocm/sgl-dev`, `rocm/vllm-dev`, `ghcr.io#...`, or any
non-`vllm/`/non-`lmsysorg/` repo — FAIL on the established SKUs above unless the
sign-off's additional detail section documents a genuine exception: truly new
hardware (e.g. MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8) or a new model
architecture that upstream vLLM/SGLang does not fundamentally support yet (as
backed by vLLM/SGLang community maintainers). Name the offending image and the
missing justification when you FAIL. (Note: some entries write the registry
separator as `#`, e.g. `nvcr.io#nvidia/...` — treat `#` as `/`.)
- (b) ENGINE-FIRST ORDERING: if this PR adds a config entry for a NON-vLLM/SGLang
framework (`framework:` of `trtllm`, `atom`, dynamo variants, etc. — images like

Check warning on line 434 in .github/workflows/codeowner-signoff-verify.yml

View check run for this annotation

Claude / Claude Code Review

Check 6 framework value matching uses wrong literals (misses disagg/dynamo variants, references non-existent trtllm)

Check 6's framework matching is too literal for actual master-config values. (a) Only exact `framework: vllm`/`sglang` matches, missing `vllm-disagg`/`sglang-disagg`/`dynamo-vllm`/`dynamo-sglang` — which is exactly where vendor forks like `rocm/sgl-dev`/`rocm/vllm-dev` currently live (see `dsr1-fp8-mi355x-sglang-disagg` at configs/amd-master.yaml:967, `minimaxm3-fp4-mi355x-vllm-disagg` at :2534). (b) Lists `trtllm` as a framework value, but the actual YAML value is `trt` (and `dynamo-trt`) — `fr

Check warning on line 434 in .github/workflows/codeowner-signoff-verify.yml

View check run for this annotation

Claude / Claude Code Review

Check 6 framework matching uses wrong literals (misses disagg/dynamo variants, references non-existent trtllm)

Check 6's framework matching uses literal equality that misses real config values: (a) the upstream-image gate at lines 415-419 targets only `framework: vllm` / `framework: sglang`, silently exempting `vllm-disagg`, `sglang-disagg`, `dynamo-vllm`, and `dynamo-sglang` entries — which is exactly where vendor forks like `rocm/sgl-dev` and `rocm/vllm-dev` currently appear on established MI355X SKUs (e.g. `dsr1-fp8-mi355x-sglang-disagg` at configs/amd-master.yaml:967, `minimaxm3-fp4-mi355x-vllm-disag
Comment on lines +419 to +434

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Check 6's framework matching is too literal for actual master-config values. (a) Only exact framework: vllm/sglang matches, missing vllm-disagg/sglang-disagg/dynamo-vllm/dynamo-sglang — which is exactly where vendor forks like rocm/sgl-dev/rocm/vllm-dev currently live (see dsr1-fp8-mi355x-sglang-disagg at configs/amd-master.yaml:967, minimaxm3-fp4-mi355x-vllm-disagg at :2534). (b) Lists trtllm as a framework value, but the actual YAML value is trt (and dynamo-trt) — framework: trtllm appears nowhere. Also, categorizing dynamo variants as non-vLLM/SGLang in (b) contradicts (a)'s intent — a dynamo-vllm PR is still a vLLM submission. Recommend matching by substring (framework contains vllm/sglang) and naming the exact YAML values (trt, dynamo-trt, atom, vllm-disagg, etc.).

Extended reasoning...

What the check says

Check 6(a) at .github/workflows/codeowner-signoff-verify.yml:412-427 gates upstream-image usage on entries with framework: vllm or framework: sglang. The backtick-formatted literals plus the exact-equality wording tell the verifier to match those two strings.

Check 6(b) at :428-434 lists the NON-vLLM/SGLang bucket as framework: of trtllm, atom, dynamo variants, etc.

What the master configs actually contain

Grepping configs/*-master.yaml for framework: values:

  • vLLM/SGLang variants used on established SKUs: vllm, sglang, vllm-disagg, sglang-disagg, dynamo-vllm, dynamo-sglang
  • TensorRT-LLM: trt and dynamo-trttrtllm does not appear anywhere
  • ATOM: atom, atom-disagg

Concrete examples of vendor-fork images on disagg vLLM/SGLang variants — precisely the class Check 6 was designed to gate:

  • configs/amd-master.yaml:967-973dsr1-fp8-mi355x-sglang-disagg (framework: sglang-disagg) uses rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0227-2
  • configs/amd-master.yaml:2534-2541minimaxm3-fp4-mi355x-vllm-disagg (framework: vllm-disagg) uses rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625
  • configs/amd-master.yaml:1119-1127 — another sglang-disagg on rocm/sgl-dev

Why this matters

(a) — silent under-application. A literal reading of framework: vllm/framework: sglang exempts every disagg/dynamo variant from the upstream-image requirement. That means the gate silently passes exactly the configs it was designed to fail: MI355X sglang-disagg / vllm-disagg entries running rocm/sgl-dev / rocm/vllm-dev sail through Check 6(a) untouched.

(b) — wrong search term + self-contradiction. A verifier grepping framework: trtllm finds nothing and skips every actual TRT-LLM entry (the ~14 framework: trt entries in nvidia-master.yaml, plus dynamo-trt). Additionally, (b) puts dynamo variants in the non-vLLM/SGLang bucket, which is self-contradictory with (a): a dynamo-vllm PR is a vLLM submission, yet (b) would demand a plain-vllm entry come first.

Step-by-step proof (part a)

  1. A hypothetical PR modifies configs/amd-master.yaml:2534-2541 (minimaxm3-fp4-mi355x-vllm-disagg) and keeps the vendor-fork image rocm/vllm-dev:....
  2. Check 6(a) asks the verifier: does this entry have framework: vllm? No — it has framework: vllm-disagg. Literal match fails.
  3. Because the entry doesn't have framework: vllm, the (a) rule doesn't apply. The vendor-fork image is not flagged.
  4. Result: MI355X is an established SKU, the image is a vendor fork, no exception was documented — but the gate passes.

Step-by-step proof (part b)

  1. A hypothetical PR adds a TensorRT-LLM entry — framework: trt (or dynamo-trt) — before any vLLM/SGLang entry for that model+SKU exists.
  2. Check 6(b) tells the verifier to look for framework: of trtllm, atom, dynamo variants, etc.
  3. A verifier greps framework: trtllm in the diff / master configs → 0 matches.
  4. Result: TRT-LLM entry is not detected as non-vLLM/SGLang, engine-first ordering is not enforced.

Why this is a nit, not a blocker

This is a natural-language prompt for an LLM verifier that has interpretive latitude. A competent auditor reading the whole check — including the vendor-fork image examples (rocm/sgl-dev, rocm/vllm-dev) and the informal trtllm label — will likely still catch the obvious cases. The list in (b) is explicitly non-exhaustive (etc.), and (b) also supplies image-name identifiers (nvcr.io...tensorrt-llm...) that independently point at the right entries. So the gate keeps working in the common case; it's the corner cases (disagg variants using vendor forks, first-time TRT-LLM ordering) where a literal reading misses.

Suggested fix

  • (a): change wording from equality to substring — "for entries whose framework contains vllm or sglang (vllm, vllm-disagg, dynamo-vllm, sglang, sglang-disagg, dynamo-sglang)".
  • (b): replace trtllm with trt / dynamo-trt, and explicitly exclude dynamo-vllm / dynamo-sglang from the non-vLLM/SGLang bucket. Something like: "framework values that are NOT vLLM/SGLang engines — trt, dynamo-trt, atom, atom-disagg, etc."

Comment on lines +419 to +434

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Check 6's framework matching uses literal equality that misses real config values: (a) the upstream-image gate at lines 415-419 targets only framework: vllm / framework: sglang, silently exempting vllm-disagg, sglang-disagg, dynamo-vllm, and dynamo-sglang entries — which is exactly where vendor forks like rocm/sgl-dev and rocm/vllm-dev currently appear on established MI355X SKUs (e.g. dsr1-fp8-mi355x-sglang-disagg at configs/amd-master.yaml:967, minimaxm3-fp4-mi355x-vllm-disagg at :2534); (b) the engine-first ordering bullet at line 434 lists trtllm as a framework value, but grep confirms only trt (and dynamo-trt) appears in the master configs — trtllm matches nothing. Suggest matching by substring in (a) (framework contains vllm/sglang), and naming the exact YAML values (trt, dynamo-trt, atom, dynamo-* excluding dynamo-vllm/dynamo-sglang) in (b).

Extended reasoning...

Bug

Check 6 tries to gate two behaviors by literal framework-string matching, but the string set is out of sync with what the master configs actually contain.

Part (a) — upstream-image gate (line 415-419) misses -disagg / dynamo- variants.* The check text reads "for entries with framework: vllm, the image must come from the upstream vLLM Docker Hub org" and similarly for framework: sglang. The backtick code-fence formatting reads as an exact string match. But configs/amd-master.yaml and configs/nvidia-master.yaml also use compound framework values that are still fundamentally vLLM/SGLang engines: vllm-disagg, sglang-disagg, dynamo-vllm, dynamo-sglang.

Crucially, these are the exact entries currently using vendor forks — the class of entries Check 6 was written to catch:

  • configs/amd-master.yaml:967-973dsr1-fp8-mi355x-sglang-disagg uses framework: sglang-disagg + image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0227-2
  • configs/amd-master.yaml:1119-1127 — another sglang-disagg + rocm/sgl-dev on MI355X
  • configs/amd-master.yaml:2534-2541minimaxm3-fp4-mi355x-vllm-disagg uses framework: vllm-disagg + image: rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625

All three are on ESTABLISHED SKUs (MI355X is in the check's own established-hardware list). A verifier taking the check text at face value would search for framework: vllm / framework: sglang, not find these entries, and silently pass the exact configurations the gate was designed to fail.

Part (b) — engine-first ordering (line 434) references framework: trtllm, which does not exist. The bullet lists "framework: of trtllm, atom, dynamo variants, etc." as the NON-vLLM/SGLang bucket. But grep across configs/*-master.yaml confirms the actual YAML value for TensorRT-LLM entries is framework: trt (~14 entries in nvidia-master.yaml, e.g. lines 1793, 1814, 1885, 3247, 4429) and framework: dynamo-trt — there is no framework: trtllm anywhere in the configs.

Part (b) is also self-contradictory with (a). The bullet lumps "dynamo variants" into the NON-vLLM/SGLang bucket, so a dynamo-vllm PR would be flagged as needing a plain-vllm entry to land first — even though the PR IS a vLLM submission (just packaged as Dynamo). Meanwhile (a) exempts it from the upstream-image gate for the opposite reason.

Why existing code doesn't prevent it

This check is a natural-language prompt for an LLM auditor, not code. The LLM has reasoning latitude and will likely match trtllm → TensorRT-LLM → trt semantically, and may notice the vendor-fork example images (rocm/sgl-dev, rocm/vllm-dev) in the prompt itself and connect them to disagg entries. But the wording is ambiguous enough that a strict reading is plausible, and (b)'s explicit categorization of dynamo variants as non-vLLM/SGLang is stated outright, not merely implied.

Impact

Soft check under-applied. A verifier that reads (a) strictly would pass entries using rocm/sgl-dev / rocm/vllm-dev on MI355X disagg configs — the exact class of entries the check exists to gate. And (b) as written would demand a plain-vLLM submission come first for a dynamo-vLLM PR that IS a vLLM submission.

Step-by-step proof

Consider a hypothetical PR that adds a new MI355X sglang-disagg config with image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0227-2 (matching the pattern already at configs/amd-master.yaml:967-973):

  1. The verifier reads Check 6(a): "for entries with framework: vllm, the image must come from the upstream vLLM Docker Hub org ... For framework: sglang, it must come from the upstream org..."
  2. The verifier reads the new config entry: framework: sglang-disagg.
  3. A strict reading of the check finds no match: the entry's framework field is not sglang and not vllm.
  4. Check 6(a) is reported as N/A or PASS (no matching entries).
  5. The vendor-fork image on an established SKU is silently accepted.

Compare to the intended behavior: this is precisely the gate's purpose. The MI355X is in the check's own established-hardware list, sglang-disagg is a vLLM/SGLang engine, and rocm/sgl-dev is a vendor fork.

Fix

  • In (a), change the wording to match by substring/prefix — e.g. "for entries whose framework contains vllm or sglang (i.e. vllm, vllm-disagg, dynamo-vllm, sglang, sglang-disagg, dynamo-sglang), the image must come from...".
  • In (b), name the exact YAML values (framework: of trt, dynamo-trt, atom, dynamo-* variants excluding dynamo-vllm/dynamo-sglang) and drop the non-existent trtllm.

`rocm/atom*`, `nvcr.io...tensorrt-llm...`), check whether `configs/*-master.yaml`
already contains a vLLM or SGLang entry for the same model (`model-prefix`) and SKU
(`runner`). If none exists and no exception is documented in the sign-off, FAIL —
vLLM/SGLang submissions must land before additional frameworks. Otherwise PASS with
the matching entry named.
- N/A if the PR changes no master-config entries (state that in one line).

## Check 7 — No benchmark hacks that change the model architecture
Verify from the PR diff (server args in `benchmarks/**` and master-config changes)
that nothing alters the model architecture or reduces its FLOPs — e.g. `--hf-overrides`
that skip the indexer every N layers on a model that doesn't natively support it,
trimming layers/experts/heads, or otherwise skipping computation. The rule: making the
SAME computation run faster is fair game; FLOPs at lower precision is fine when evals
pass; REMOVING model-architecture FLOPs is not. Optimizations should be ones used in
production by accuracy-sensitive customers.
- Scan for architecture-override knobs: `--hf-overrides`, `hf_overrides`,
`--json-model-override-args`, config-editing `sed`/`jq` on the model files, etc. If
present, determine whether the override changes computed FLOPs vs the model's native
config and whether the model natively supports that mode.
- FAIL with the exact flag/value if architecture FLOPs are reduced without native
model support. PASS in one line otherwise; N/A if the PR touches no server args.

## Check 8 — Speculative-decoding configs benchmark through chat templates
If this PR adds or changes a speculative-decoding config (MTP / EAGLE / draft-model
flags such as `--speculative-config`, `--speculative-algorithm`, `spec-decode`, config
names ending in `-mtp`), verify the benchmark client exercises the model through its
chat template (e.g. chat-completions style endpoint/backend rather than raw
completions) so the acceptance-length distribution matches real-world traffic.
- FAIL if a spec-decode benchmark drives raw completions with no chat template —
name the config and script line.
- N/A if the PR has no speculative-decoding changes.

## Verdict and output
Decide PASS only if Checks 0, 1, 2, 3, and 4 ALL pass. Post EXACTLY ONE summary comment on
Decide PASS only if Checks 0-8 ALL pass (a check reported as `N/A` counts as a pass —
keep the `N/A — <reason>` row so the reviewer sees it was considered). Post EXACTLY ONE summary comment on
PR #${{ needs.gate.outputs.pr-number }} using `gh pr comment`. Start the comment with
the hidden marker so reruns are identifiable:
`<!-- codeowner-signoff-verify sha=${{ needs.gate.outputs.head-sha }} -->`
Expand All @@ -409,7 +485,7 @@
restating the checklist, no hedging ("if X then maybe Y" — make the call). Link the
run/recipe instead of describing it.

- If everything is to standard: post the verdict line + the five one-line PASS rows
- If everything is to standard: post the verdict line + the nine one-line PASS/N-A rows
(with the green run URL). Do NOT @-mention anyone on a pass.
- If anything is NOT to standard: the FIRST line (after the marker) must @-mention the
sign-off author as `@${{ needs.gate.outputs.signoff-author }}` with the blocking
Expand Down