[AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes / 重构配置并新增 MTP 配方 / 설정 리팩토링 및 MTP 레시피 추가 by seungrokj · Pull Request #2000 · SemiAnalysisAI/InferenceX

seungrokj · 2026-07-03T06:50:16Z

Summary

- Refactor server_atom.sh and models_atom.yaml to centralize model-specific ATOM config (block size, memory fraction, model length, quant config, hf overrides) and support per-role (prefill/decode) flags for TP+DPA, EP+DPA, and online_quant_config.
- Move hardcoded server tuning params from launch scripts into models_atom.yaml, with env-var override support.
- Add minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml for EAGLE3 speculative decoding (DECODE_MTP_SIZE=3).
- Remove redundant SPEC_DECODING gating; use MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0 directly.
- Split online_quant_config into DPA and non-DPA variants (FP8 uses different exclude patterns with DPA).
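
The variant selection and MTP gating described in the summary can be sketched as follows. This is a minimal illustration in Python, not the PR's shell code. The dictionary keys (`online_quant_config`, `online_quant_config_dpa`), the function names, the fallback to the non-DPA entry, and the placeholder flag string are all assumptions made for the example. Only the names MODEL_MTP_FLAGS and DECODE_MTP_SIZE come from the PR description.

```python
# Hypothetical sketch; key names, function names and fallback behaviour are assumed.

def select_quant_config(model_entry: dict, dpa_enabled: bool):
    """Pick the DPA or non-DPA online quant config for one role (prefill or decode)."""
    if dpa_enabled and "online_quant_config_dpa" in model_entry:
        return model_entry["online_quant_config_dpa"]
    # Assumption: without a DPA-specific entry, reuse the non-DPA config.
    return model_entry.get("online_quant_config")


def mtp_args(model_mtp_flags: str, decode_mtp_size: int) -> list:
    """Return MTP flags only when the model defines them and the MTP size is positive."""
    if model_mtp_flags and decode_mtp_size > 0:
        return model_mtp_flags.split()
    return []


if __name__ == "__main__":
    entry = {
        "online_quant_config": "<non-DPA config>",
        "online_quant_config_dpa": "<DPA config>",
    }
    print(select_quant_config(entry, dpa_enabled=True))
    print(mtp_args("--placeholder-mtp-flag", 3))  # flags are passed through
    print(mtp_args("--placeholder-mtp-flag", 0))  # size 0 disables MTP
```

With this shape, no separate SPEC_DECODING switch is needed: a model without MTP flags, or a run with DECODE_MTP_SIZE=0, simply produces no extra arguments.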

PR Review Checklist

- Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
- Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
- Verified that this PR has passed PR validation. Please link to the GitHub Action workflow that shows this.
- Verified that this PR passes evals. Please link to the GitHub Action workflow that shows this.
- Verified that speculative decoding PRs use chat templates to align the AL distribution with real-world usage
- If a company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
- Verified that the single-node recipes are similar to the official vLLM recipes and/or the SGLang cookbook:
  - If they are not, I have verified that a PR has been opened in the vLLM recipe repo or SGLang repo and linked it below in the additional detail section.
- If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

🤖 Generated with Claude Code

…, SPEC_DECODING guard

- Replace fragile eval "$(python3 -c "...")" with heredoc + source tempfile to avoid nested quote escaping issues that caused MODEL_ENVS to be empty at runtime
- Fix PREFILL/DECODE_ENABLE_EP comparison from numeric -gt 1 to string = "true" to match the "true"/"false" values set by launch scripts
- Fix SPEC_DECODING guard from hardcoded "mtp" to any non-none/non-empty value so EAGLE3 and future methods also activate SPEC_ARGS from models_atom.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
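
The two guard fixes in this commit message can be illustrated with a short sketch. The actual script is shell; the Python below is only an analogue, and the function names and the example value "eagle3" are assumptions. The old EP check is modelled as an integer parse, which cannot succeed on the strings "true"/"false" that the launch scripts set.

```python
# Hypothetical Python analogue of the shell guards; names are illustrative only.

def ep_enabled_old(flag: str) -> bool:
    """Old behaviour: numeric comparison, which never matches "true"/"false"."""
    try:
        return int(flag) > 1
    except ValueError:
        return False  # a non-numeric flag can never pass a numeric test


def ep_enabled(flag: str) -> bool:
    """Fixed behaviour: compare against the literal string the launch scripts set."""
    return flag == "true"


def spec_decoding_enabled(method) -> bool:
    """Fixed guard: any value other than unset, empty, or "none" enables SPEC_ARGS."""
    return method not in (None, "", "none")


if __name__ == "__main__":
    print(ep_enabled_old("true"), ep_enabled("true"))
    for method in ("mtp", "eagle3", "none", ""):
        print(repr(method), spec_decoding_enabled(method))
```

The point of the second guard is that it names what disables speculative decoding rather than what enables it, so a new method does not require another edit to the condition.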

…efaults to YAML

- Split MODEL_TP_DP_FLAGS and MODEL_EP_DP_FLAGS into prefill/decode variants
- Move BLOCK_SIZE, MEM_FRAC_STATIC, MAX_MODEL_LEN, MAX_NUM_SEQS, MAX_NUM_BATCHED_TOKENS from launch scripts into models_atom.yaml
- Add hf_overrides and online_quant_config (with DPA variant) to YAML
- Remove SPEC_DECODING gate; use MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0
- Add minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-07-03T06:50:24Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Shell defaults (BLOCK_SIZE=16, MEM_FRAC_STATIC=0.85) were set before YAML loading, so the YAML values (128, 0.8) were never substituted. Use three-tier fallback: env var > YAML > shell default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
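
The three-tier fallback named in this commit message (env var > YAML > shell default) can be sketched as below. This is an illustrative Python version, not the shell implementation; the function name and the lower-cased YAML key convention (`block_size`, `mem_frac_static`) are assumptions. The values 16, 0.85, 128 and 0.8 are the ones quoted in the commit message.

```python
import os

# Hypothetical sketch; function name and YAML key naming are assumed.

def resolve_setting(name, yaml_cfg, shell_default, env=None):
    """Resolve one tuning value with precedence: env var > YAML > shell default."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:  # a non-empty env var wins
        return value
    yaml_value = yaml_cfg.get(name.lower())
    if yaml_value is not None:  # otherwise use the model's YAML entry
        return str(yaml_value)
    return shell_default  # last resort: the hardcoded default


if __name__ == "__main__":
    yaml_cfg = {"block_size": 128, "mem_frac_static": 0.8}
    print(resolve_setting("BLOCK_SIZE", yaml_cfg, "16", env={}))
    print(resolve_setting("BLOCK_SIZE", yaml_cfg, "16", env={"BLOCK_SIZE": "64"}))
    print(resolve_setting("MAX_NUM_SEQS", yaml_cfg, "256", env={}))
```

The bug described above corresponds to assigning the shell default first and consulting YAML only if the variable is still unset, which it never is. Deferring the default to the last step avoids that.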

github-actions · 2026-07-03T12:02:05Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28645608307
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28645608307

github-actions · 2026-07-03T13:00:10Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28645608307
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28645608307

…efaults

job.slurm injects BLOCK_SIZE=16, MEM_FRAC_STATIC=0.85, MAX_NUM_SEQS=256 as Docker env vars with hardcoded defaults. The previous env-first fallback (env > YAML > default) meant YAML values were always shadowed. Flip all five server-tuning vars to YAML > env > default so models_atom.yaml entries (e.g. block_size=128 for MiniMax-M3-MXFP4) actually take effect. Also add set -x before YAML parsing for CI debuggability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
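
To see why the precedence had to be flipped, the following sketch compares the two orders when the environment always carries injected defaults. It is a hypothetical Python illustration: the helper names are assumptions, while the injected values and `block_size=128` are taken from the commit message above.

```python
# Hypothetical sketch comparing env-first and YAML-first precedence.

def first_set(*candidates):
    """Return the first candidate that is neither None nor empty, as a string."""
    for value in candidates:
        if value not in (None, ""):
            return str(value)
    return None


def env_first(name, env, yaml_cfg, default):
    """Previous order: env > YAML > default."""
    return first_set(env.get(name), yaml_cfg.get(name.lower()), default)


def yaml_first(name, env, yaml_cfg, default):
    """New order: YAML > env > default."""
    return first_set(yaml_cfg.get(name.lower()), env.get(name), default)


if __name__ == "__main__":
    # Values that job.slurm is described as injecting into the container.
    injected = {"BLOCK_SIZE": "16", "MEM_FRAC_STATIC": "0.85", "MAX_NUM_SEQS": "256"}
    yaml_cfg = {"block_size": 128}

    print(env_first("BLOCK_SIZE", injected, yaml_cfg, "16"))     # 16: YAML is shadowed
    print(yaml_first("BLOCK_SIZE", injected, yaml_cfg, "16"))    # 128: YAML takes effect
    print(yaml_first("MAX_NUM_SEQS", injected, yaml_cfg, "256")) # 256: falls back to env
```

The trade-off is that an explicit env override no longer beats a YAML entry, so a value that must vary per run has to be left out of the YAML for that model.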

seungrokj and others added 11 commits July 3, 2026 13:41

[AMD] refactor server_atom.sh and models_atom.yaml for model-specific ATOM config; add minimaxm3-fp4-mi355x-atom-disagg
3ef1380
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] add perf-changelog entry for minimaxm3-fp4-mi355x-atom-disagg and server_atom.sh refactor (PR #1940)
594f2bc
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] add env dump in server_atom.sh and minimaxm3-fp4-mi355x-atom-disagg launch script
8740b80
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] cap minimaxm3-fp8-mi355x-atom-disagg conc to 256; fix missing newline in models_atom.yaml
7cd3353
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] update amd-master.yaml: image bumps, search space tweaks for MiniMax-M3 ATOM recipes
8961125
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] restore minimaxm3-fp4/fp8-mi355x-atom recipes; bump all ATOM images to 20260623
48b9946
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] clean up minimaxm3-fp4-mi355x-atom search space; revert fp8-disagg image to 20260622
7f94d30
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] add amd-master.yaml config
1aa7ace
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

[AMD] remove amd-master.yaml config
ed5e874
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

seungrokj requested a review from a team July 3, 2026 06:50

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners July 3, 2026 06:50

github-project-automation Bot added this to InferenceMAX Board Jul 3, 2026

This was referenced Jul 3, 2026

[AMD] Add MiniMax-M3-FP4 MI355X ATOMESH update 0623 #1940

Closed

[AMD] Add MiniMax-M3-FP8 MI355X ATOMESH update 0623 #1930

Closed

seungrokj added the AMD and evals-only (suppress throughput and run only eval jobs; combine with all-evals to expand selection) labels Jul 3, 2026

seungrokj changed the title ~~[AMD] MiniMax-M3 FP4/FP8 MI355X ATOM disagg: refactor config & add MTP recipes~~ [AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes Jul 3, 2026

claude Bot reviewed Jul 3, 2026

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated

Comment thread benchmarks/multi_node/amd_utils/server_atom.sh

Comment thread benchmarks/multi_node/amd_utils/models_atom.yaml Outdated

seungrokj added the full-sweep-enabled label Jul 3, 2026

functionstackx reviewed Jul 3, 2026

Comment thread benchmarks/multi_node/amd_utils/models_atom.yaml Outdated

seungrokj and others added 2 commits July 3, 2026 16:15

[AMD] add perf-changelog entry for MiniMax-M3 ATOM disagg refactor (PR #2000)
8d57cde
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

[AMD] remove hf_overrides from models_atom.yaml and server_atom.sh
0fd4545
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

SemiAnalysisAI deleted a comment from github-actions Bot Jul 3, 2026

seungrokj and others added 2 commits July 4, 2026 13:09

Merge branch 'main' into amd/m3_atom_pd_fp4fp8_0701

79a7c65

functionstackx changed the title ~~[AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes~~ [AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes / 重构配置并新增 MTP 配方 / 설정 리팩토링 및 MTP 레시피 추가 Jul 4, 2026

seungrokj wants to merge 16 commits into main from amd/m3_atom_pd_fp4fp8_0701

2 participants
