-
Notifications
You must be signed in to change notification settings - Fork 217
[Klaud Cold] Align codeowner-signoff-verify with latest PR_REVIEW_CHECKLIST.md #2015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,12 @@ | |
| # this InferenceX PR. | ||
| # 4. An authorized maintainer has explicitly posted a `/reuse-sweep-run` | ||
| # command on the PR (the reuse-merge path requires it). | ||
| # 5. The sign-off uses the latest docs/PR_REVIEW_CHECKLIST.md template. | ||
| # 6. vLLM/SGLang configs run upstream images (hub.docker.com/u/vllm, | ||
| # hub.docker.com/u/lmsysorg) on established hardware, and vLLM/SGLang | ||
| # submissions land before additional frameworks (TRT-LLM, ATOM, ...). | ||
| # 7. No benchmark hacks that change the model architecture / cut FLOPs. | ||
| # 8. Speculative-decoding configs benchmark through chat templates. | ||
| # If any of those are not to standard, Claude posts a single comment that | ||
| # @-mentions the reviewer who signed off and explains exactly what is wrong. | ||
| # | ||
|
|
@@ -229,9 +235,11 @@ | |
| A CODEOWNER (`${{ needs.gate.outputs.signoff-author }}`) just posted the reviewer | ||
| sign-off checklist (as a ${{ needs.gate.outputs.signoff-kind }}) that marks | ||
| PR #${{ needs.gate.outputs.pr-number }} as ready to merge. Your job is to | ||
| INDEPENDENTLY verify the checks below (0-3). Do not trust the reviewer's checkmarks | ||
| — re-derive every conclusion from CODEOWNERS, CI runs, the PR diff, and the linked | ||
| recipe yourself. Be rigorous and specific. | ||
| INDEPENDENTLY verify the checks below (0-8). Do not trust the reviewer's checkmarks | ||
| — re-derive every conclusion from CODEOWNERS, CI runs, the PR diff, the master | ||
| configs, and the linked recipe yourself. Be rigorous and specific. The checks encode | ||
| the merge standard in `docs/PR_REVIEW_CHECKLIST.md` (read it in the checked-out | ||
| default branch — it is the source of truth if wording here and there ever drifts). | ||
|
|
||
| Read the exact sign-off body first (especially its "Additional detail section"). | ||
| The sign-off was posted as a ${{ needs.gate.outputs.signoff-kind }}, so fetch it with: | ||
|
|
@@ -389,8 +397,76 @@ | |
| command has been posted on this PR" and remind the reviewer that an authorized | ||
| maintainer must comment `/reuse-sweep-run` before this PR can be merged via reuse. | ||
|
|
||
| ## Check 5 — Sign-off uses the LATEST checklist template | ||
| The first item of the checklist has the reviewer affirm they used the latest version | ||
| of `docs/PR_REVIEW_CHECKLIST.md`. Verify it instead of trusting it: read the template | ||
| in `docs/PR_REVIEW_CHECKLIST.md` (checked-out default branch) and compare its items | ||
| against the sign-off body. | ||
| - PASS if every item in the current template has a corresponding checked (`[x]`) item | ||
| in the sign-off. Match items semantically — minor wording drift is fine; a missing | ||
| ITEM is not. | ||
| - FAIL if the sign-off is missing current-template items (stale copy) or left items | ||
| unchecked without an explanation in the additional detail section. Name the | ||
| missing/unchecked items and link the current template. | ||
|
Check warning on line 410 in .github/workflows/codeowner-signoff-verify.yml
|
||
|
|
||
| ## Check 6 — Upstream vLLM/SGLang images, and engine-first ordering | ||
| The checklist makes upstream engine images a HARD guideline: on established hardware, | ||
| vLLM/SGLang submissions must run images published by the upstream projects, not | ||
| vendor forks. Established (NOT "new hardware") SKUs: NVIDIA H100, H200, B200, B300, | ||
| GB200, GB300; AMD MI300X, MI325X, MI355X. | ||
| Identify each master-config entry this PR adds/changes (in `configs/*-master.yaml`) | ||
| and read its `framework:`, `runner:`, and `image:` fields. | ||
| - (a) UPSTREAM IMAGE: for entries with `framework: vllm`, the image must come from the | ||
| upstream vLLM Docker Hub org (https://hub.docker.com/u/vllm) — in the master configs | ||
| that looks like `vllm/vllm-openai:<tag>` or `vllm/vllm-openai-rocm:<tag>`. For | ||
| `framework: sglang`, it must come from the upstream org | ||
| (https://hub.docker.com/u/lmsysorg) — `lmsysorg/sglang:<tag>` or | ||
| `lmsysorg/sglang-rocm:<tag>` (digest-pinned `@sha256:...` variants are fine). | ||
| Vendor/private forks — e.g. `rocm/sgl-dev`, `rocm/vllm-dev`, `ghcr.io#...`, or any | ||
| non-`vllm/`/non-`lmsysorg/` repo — FAIL on the established SKUs above unless the | ||
| sign-off's additional detail section documents a genuine exception: truly new | ||
| hardware (e.g. MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8) or a new model | ||
| architecture that upstream vLLM/SGLang does not fundamentally support yet (as | ||
| backed by vLLM/SGLang community maintainers). Name the offending image and the | ||
| missing justification when you FAIL. (Note: some entries write the registry | ||
| separator as `#`, e.g. `nvcr.io#nvidia/...` — treat `#` as `/`.) | ||
| - (b) ENGINE-FIRST ORDERING: if this PR adds a config entry for a NON-vLLM/SGLang | ||
| framework (`framework:` of `trtllm`, `atom`, dynamo variants, etc. — images like | ||
|
Check warning on line 434 in .github/workflows/codeowner-signoff-verify.yml
|
||
|
Comment on lines
+419
to
+434
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Check 6's framework matching is too literal for actual master-config values. (a) Only exact Extended reasoning...What the check saysCheck 6(a) at .github/workflows/codeowner-signoff-verify.yml:412-427 gates upstream-image usage on entries with Check 6(b) at :428-434 lists the NON-vLLM/SGLang bucket as What the master configs actually containGrepping
Concrete examples of vendor-fork images on disagg vLLM/SGLang variants — precisely the class Check 6 was designed to gate:
Why this matters(a) — silent under-application. A literal reading of (b) — wrong search term + self-contradiction. A verifier grepping Step-by-step proof (part a)
Step-by-step proof (part b)
Why this is a nit, not a blockerThis is a natural-language prompt for an LLM verifier that has interpretive latitude. A competent auditor reading the whole check — including the vendor-fork image examples ( Suggested fix
Comment on lines
+419
to
+434
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Check 6's framework matching uses literal equality that misses real config values: (a) the upstream-image gate at lines 415-419 targets only Extended reasoning...BugCheck 6 tries to gate two behaviors by literal framework-string matching, but the string set is out of sync with what the master configs actually contain. Part (a) — upstream-image gate (line 415-419) misses -disagg / dynamo- variants.* The check text reads "for entries with Crucially, these are the exact entries currently using vendor forks — the class of entries Check 6 was written to catch:
All three are on ESTABLISHED SKUs (MI355X is in the check's own established-hardware list). A verifier taking the check text at face value would search for Part (b) — engine-first ordering (line 434) references Part (b) is also self-contradictory with (a). The bullet lumps "dynamo variants" into the NON-vLLM/SGLang bucket, so a Why existing code doesn't prevent itThis check is a natural-language prompt for an LLM auditor, not code. The LLM has reasoning latitude and will likely match ImpactSoft check under-applied. A verifier that reads (a) strictly would pass entries using Step-by-step proofConsider a hypothetical PR that adds a new MI355X
Compare to the intended behavior: this is precisely the gate's purpose. The MI355X is in the check's own established-hardware list, Fix
|
||
| `rocm/atom*`, `nvcr.io...tensorrt-llm...`), check whether `configs/*-master.yaml` | ||
| already contains a vLLM or SGLang entry for the same model (`model-prefix`) and SKU | ||
| (`runner`). If none exists and no exception is documented in the sign-off, FAIL — | ||
| vLLM/SGLang submissions must land before additional frameworks. Otherwise PASS with | ||
| the matching entry named. | ||
| - N/A if the PR changes no master-config entries (state that in one line). | ||
|
|
||
| ## Check 7 — No benchmark hacks that change the model architecture | ||
| Verify from the PR diff (server args in `benchmarks/**` and master-config changes) | ||
| that nothing alters the model architecture or reduces its FLOPs — e.g. `--hf-overrides` | ||
| that skip the indexer every N layers on a model that doesn't natively support it, | ||
| trimming layers/experts/heads, or otherwise skipping computation. The rule: making the | ||
| SAME computation run faster is fair game; FLOPs at lower precision is fine when evals | ||
| pass; REMOVING model-architecture FLOPs is not. Optimizations should be ones used in | ||
| production by accuracy-sensitive customers. | ||
| - Scan for architecture-override knobs: `--hf-overrides`, `hf_overrides`, | ||
| `--json-model-override-args`, config-editing `sed`/`jq` on the model files, etc. If | ||
| present, determine whether the override changes computed FLOPs vs the model's native | ||
| config and whether the model natively supports that mode. | ||
| - FAIL with the exact flag/value if architecture FLOPs are reduced without native | ||
| model support. PASS in one line otherwise; N/A if the PR touches no server args. | ||
|
|
||
| ## Check 8 — Speculative-decoding configs benchmark through chat templates | ||
| If this PR adds or changes a speculative-decoding config (MTP / EAGLE / draft-model | ||
| flags such as `--speculative-config`, `--speculative-algorithm`, `spec-decode`, config | ||
| names ending in `-mtp`), verify the benchmark client exercises the model through its | ||
| chat template (e.g. chat-completions style endpoint/backend rather than raw | ||
| completions) so the acceptance-length distribution matches real-world traffic. | ||
| - FAIL if a spec-decode benchmark drives raw completions with no chat template — | ||
| name the config and script line. | ||
| - N/A if the PR has no speculative-decoding changes. | ||
|
|
||
| ## Verdict and output | ||
| Decide PASS only if Checks 0, 1, 2, 3, and 4 ALL pass. Post EXACTLY ONE summary comment on | ||
| Decide PASS only if Checks 0-8 ALL pass (a check reported as `N/A` counts as a pass — | ||
| keep the `N/A — <reason>` row so the reviewer sees it was considered). Post EXACTLY ONE summary comment on | ||
| PR #${{ needs.gate.outputs.pr-number }} using `gh pr comment`. Start the comment with | ||
| the hidden marker so reruns are identifiable: | ||
| `<!-- codeowner-signoff-verify sha=${{ needs.gate.outputs.head-sha }} -->` | ||
|
|
@@ -409,7 +485,7 @@ | |
| restating the checklist, no hedging ("if X then maybe Y" — make the call). Link the | ||
| run/recipe instead of describing it. | ||
|
|
||
| - If everything is to standard: post the verdict line + the five one-line PASS rows | ||
| - If everything is to standard: post the verdict line + the nine one-line PASS/N-A rows | ||
| (with the green run URL). Do NOT @-mention anyone on a pass. | ||
| - If anything is NOT to standard: the FIRST line (after the marker) must @-mention the | ||
| sign-off author as `@${{ needs.gate.outputs.signoff-author }}` with the blocking | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Check 5's PASS/FAIL rule doesn't account for the conditional escape-hatch item on line 18 of
docs/PR_REVIEW_CHECKLIST.md(If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.). On a perfectly clean sign-off, that box is intentionally left unchecked with no additional-detail entry — which literally matches Check 5's FAIL condition ("left items unchecked without an explanation in the additional detail section"), inverting its semantics. Suggest exempting the conditional item, e.g. "the escape-hatchIf any of the above criteria cannot reasonably be satisfiedline stays unchecked unless a genuine exception is documented — treat it as PASS in that case".Extended reasoning...
What the bug is. Check 5 asks the verifier to compare the sign-off body against the current
docs/PR_REVIEW_CHECKLIST.mdtemplate. Its rules are:[x]in the sign-off.Every item in the template is a verification affirmation ("I have verified X") — except line 18, which is inherently conditional:
Its own wording says "IF any of the above criteria cannot reasonably be satisfied" — the reviewer is meant to tick it only when they need to note an exception, and the accompanying reasoning belongs in the additional-detail section.
Why the check fails on the happy path. On a perfectly clean sign-off — every criterion satisfied, no exception needed — line 18 is legitimately unchecked and the additional-detail section is empty (or contains only non-exception info like the recipe link). That is a literal three-of-three match for Check 5's FAIL rule: item exists in template, item unchecked, no explanation in additional-detail. So Check 5 would FAIL precisely the state that is most correct.
Why existing wording doesn't rescue it. Check 5 does say "Match items semantically — minor wording drift is fine," which gives the LLM auditor some latitude. But the FAIL clause is written mechanically ("left items unchecked without an explanation"), and a strict-reading verifier acting on that rule will trip. The safer fix is to remove the ambiguity, not rely on interpretive charity.
Step-by-step proof (happy-path sign-off):
[x].- [ ] If any of the above criteria cannot reasonably be satisfied ...) unchecked because there are no exceptions to explain.Impact. Nit-level, but pointed at the common case: false FAILs will land on precisely the sign-offs that are most correct. The verifier is Claude Fable-5, which will often reason semantically and pass anyway, so this won't fire every time — but the rule text invites it to fire, and even one spurious FAIL wastes a merge cycle. Nothing about merging as-is is unsafe.
How to fix. One clause in Check 5, e.g.: