Skip to content

[Klaud Cold][DO NOT MERGE] Test signoff-verify Check 7 with --hf-overrides indexer hack#2017

Closed
functionstackx wants to merge 1 commit into
mainfrom
klaud/test-signoff-hf-overrides
Closed

[Klaud Cold][DO NOT MERGE] Test signoff-verify Check 7 with --hf-overrides indexer hack#2017
functionstackx wants to merge 1 commit into
mainfrom
klaud/test-signoff-hf-overrides

Conversation

@functionstackx

Copy link
Copy Markdown
Collaborator

Summary

  • Adds --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}' to the MiniMax M3 MXFP8 H200 vLLM launch command — an architecture-changing benchmark hack of exactly the kind docs/PR_REVIEW_CHECKLIST.md forbids.
  • This is a NEGATIVE test for the updated codeowner-signoff-verify workflow ([Klaud Cold] Align codeowner-signoff-verify with latest PR_REVIEW_CHECKLIST.md #2015): after a sign-off comment is posted, Check 7 should FAIL naming this flag.
  • Will be closed without merging once verified.

🤖 Generated with Claude Code

…merge)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs


感谢你的贡献!请联系相应公司的 CODEOWNER 填写最新的 PR_REVIEW_CHECKLIST.md,然后再在 Slack 上联系核心维护者进行审阅。

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。参见 GitHub 关于重新运行失败任务的文档

@functionstackx

Copy link
Copy Markdown
Collaborator Author

As a PR reviewer and CODEOWNER, I have reviewed this and have:

  • Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
  • Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
  • Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this.
  • Verified that this PR passes evals. Please link to GitHub Action workflow that shows this.
  • Verified that speculative decoding PRs uses chat templates to align the AL distribution to real world
  • Verified that the model architecture isn't changed with benchmark hacks like using --hf-overrides to skipping indexer for every x layers on models that don't natively support this. As a general rule, we won't accept optimizations that reduces the number of model architecture FLOPs. Anything that makes that same computation run faster is fair game; FLOPs at lower precisions is fine, given that the config passes private evals. As an general north star princple, we should only use optimizations which is used in production by customers that care about accuracy
  • If an company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have verified that the respective vLLM submission made using upstream https://hub.docker.com/u/vllm docker repo, upstream SGLang https://hub.docker.com/u/lmsysorg docker repo. The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet as supported by vLLM/SGLang community maintainers
  • If an company claims that they support vLLM/SGLang as first class upstream in-tree LLM inference engines on their hardware, I have have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
  • Verified that the single-node recipes are similar to the official vLLM recipes and/or theSGLang cookbook:
    • If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
  • If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

Additional detail section:

Signed: functionstackx

@Klaud-Cold

Copy link
Copy Markdown
Collaborator

@functionstackx Sign-off NOT verified — merge is blocked: no sweep/evals ran on any commit in this PR, no /reuse-sweep-run has been posted, and the launch command adds an architecture-changing --hf-overrides indexer hack.

  • Check 0 (CODEOWNER): PASS — only changed path is covered by the * @InferenceX/core catch-all; signer is an org MEMBER (team membership not readable by this token; not treated as a failure).
  • Check 1 (sweep on in-PR commit): FAIL — No passing sweep/eval was found on any commit in this PR. The only commit (2234a2d) has no single-node */ or eval / check-runs at all.
  • Check 2 (evals pass): FAIL — no eval run exists on an in-PR commit to verify.
  • Check 3 (recipe linked + complete): FAIL — link present (https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3) and model/TP/DP/EP/block-size match, but --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}' materially changes the served model and appears nowhere in the recipe.
  • Check 4 (reuse command): FAIL — No authorized /reuse-sweep-run command has been posted on this PR; an authorized maintainer must comment /reuse-sweep-run before this PR can be merged via reuse.
  • Check 5 (latest checklist template): PASS — every current-template item is present and checked.
  • Check 6 (upstream image / engine-first): N/A — no configs/*-master.yaml changes in this PR.
  • Check 7 (no architecture hacks): FAIL — --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}' in benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_h200.sh reuses cached indices to skip the MSA indexer on 3 of every 4 layers, reducing model-architecture FLOPs; MiniMax-M3 does not natively run this mode and the linked recipe does not use it.
  • Check 8 (spec-decode chat template): N/A — no speculative-decoding changes.

Note: the PR is titled "[DO NOT MERGE] Test signoff-verify Check 7" and describes itself as a negative test for this workflow.

@functionstackx

Copy link
Copy Markdown
Collaborator Author

Negative test confirmed — Check 7 correctly FAILed on the --hf-overrides indexer hack (and Check 3 flagged the recipe divergence). Closing.

@functionstackx functionstackx deleted the klaud/test-signoff-hf-overrides branch July 4, 2026 01:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants