Skip to content

[Klaud Cold][DO NOT MERGE] Test signoff-verify Check 6 with vendor rocm/sgl-dev image on MI300X#2018

Closed
functionstackx wants to merge 1 commit into
mainfrom
klaud/test-signoff-vendor-image
Closed

[Klaud Cold][DO NOT MERGE] Test signoff-verify Check 6 with vendor rocm/sgl-dev image on MI300X#2018
functionstackx wants to merge 1 commit into
mainfrom
klaud/test-signoff-vendor-image

Conversation

@functionstackx

Copy link
Copy Markdown
Collaborator

Summary

  • Swaps dsr1-fp8-mi300x-sglang from upstream lmsysorg/sglang:v0.5.12-rocm700-mi30x to vendor fork rocm/sgl-dev:sglang-0.5.12-rocm700-mi30x.
  • NEGATIVE test for the updated codeowner-signoff-verify workflow ([Klaud Cold] Align codeowner-signoff-verify with latest PR_REVIEW_CHECKLIST.md #2015): MI300X is established hardware and DeepSeek-R1 is upstream-supported, so Check 6(a) should FAIL naming the non-upstream image.
  • Will be closed without merging once verified.

🤖 Generated with Claude Code

…t merge)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs


感谢你的贡献!请联系相应公司的 CODEOWNER 填写最新的 PR_REVIEW_CHECKLIST.md,然后再在 Slack 上联系核心维护者进行审阅。

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。参见 GitHub 关于重新运行失败任务的文档

2 similar comments
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs


感谢你的贡献!请联系相应公司的 CODEOWNER 填写最新的 PR_REVIEW_CHECKLIST.md,然后再在 Slack 上联系核心维护者进行审阅。

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。参见 GitHub 关于重新运行失败任务的文档

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs


感谢你的贡献!请联系相应公司的 CODEOWNER 填写最新的 PR_REVIEW_CHECKLIST.md,然后再在 Slack 上联系核心维护者进行审阅。

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。参见 GitHub 关于重新运行失败任务的文档

@functionstackx

Copy link
Copy Markdown
Collaborator Author

As a PR reviewer and CODEOWNER, I have reviewed this and have:

  • Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
  • Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
  • Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this.
  • Verified that this PR passes evals. Please link to GitHub Action workflow that shows this.
  • Verified that speculative decoding PRs uses chat templates to align the AL distribution to real world
  • Verified that the model architecture isn't changed with benchmark hacks like using --hf-overrides to skipping indexer for every x layers on models that don't natively support this. As a general rule, we won't accept optimizations that reduces the number of model architecture FLOPs. Anything that makes that same computation run faster is fair game; FLOPs at lower precisions is fine, given that the config passes private evals. As an general north star princple, we should only use optimizations which is used in production by customers that care about accuracy
  • If an company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have verified that the respective vLLM submission made using upstream https://hub.docker.com/u/vllm docker repo, upstream SGLang https://hub.docker.com/u/lmsysorg docker repo. The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet as supported by vLLM/SGLang community maintainers
  • If an company claims that they support vLLM/SGLang as first class upstream in-tree LLM inference engines on their hardware, I have have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
  • Verified that the single-node recipes are similar to the official vLLM recipes and/or theSGLang cookbook:
    • If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
  • If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

Additional detail section:

Signed: functionstackx

@Klaud-Cold

Copy link
Copy Markdown
Collaborator

@functionstackx Sign-off REJECTED — vendor image on established hardware, no passing sweep/evals on any in-PR commit, no /reuse-sweep-run posted, and the signer is not a CODEOWNER for the changed file.

  • Check 0: FAIL — configs/amd-master.yaml is owned by billishyahao, chunfangamd, seungrokj, yctseng0211, 1am9trash (last-matching CODEOWNERS line); functionstackx is not among them.
  • Check 1: FAIL — No passing sweep/eval on any commit in this PR: the sole commit 5e5ca5b has no single-node */ or eval / check-runs at all.
  • Check 2: FAIL — no eval run exists on an in-PR commit to verify accuracy against (blocked by Check 1).
  • Check 3: FAIL — the linked "recipe" (https://docs.sglang.io/cookbook/intro) is the generic cookbook index, not a corresponding DeepSeek-R1 recipe (no DeepSeek recipe is listed there), so major server args cannot be compared.
  • Check 4: FAIL — No authorized /reuse-sweep-run command has been posted on this PR; an authorized maintainer must comment it before this PR can be merged via reuse.
  • Check 5: PASS — sign-off matches the current docs/PR_REVIEW_CHECKLIST.md; all items checked.
  • Check 6: FAIL — dsr1-fp8-mi300x-sglang (framework: sglang, runner: mi300x — established SKU) is switched to vendor fork rocm/sgl-dev:sglang-0.5.12-rocm700-mi30x; upstream lmsysorg/sglang:* is required and no exception is documented in the sign-off. (b) N/A — no non-vLLM/SGLang framework entry added.
  • Check 7: PASS — image-only change; no server args or architecture-override flags touched.
  • Check 8: N/A — no speculative-decoding changes in this PR.

Note: the PR is titled "[DO NOT MERGE]" and self-describes as a negative test of this workflow.

@functionstackx

Copy link
Copy Markdown
Collaborator Author

Negative test confirmed — Check 6 correctly FAILed on the vendor rocm/sgl-dev image for an established SKU (MI300X) with no documented exception. Closing.

@functionstackx functionstackx deleted the klaud/test-signoff-vendor-image branch July 4, 2026 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants