[AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes / 重构配置并新增 MTP 配方 / 설정 리팩토링 및 MTP 레시피 추가 #2000

Open
seungrokj wants to merge 16 commits into main from amd/m3_atom_pd_fp4fp8_0701

seungrokj (Collaborator) commented Jul 3, 2026

Summary

  • Refactor server_atom.sh and models_atom.yaml to centralize model-specific ATOM config (block size, memory fraction, model length, quant config, hf overrides) and support per-role (prefill/decode) flags for TP+DPA, EP+DPA, and online_quant_config
  • Move hardcoded server tuning params from launch scripts into models_atom.yaml with env-var override support
  • Add minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml for EAGLE3 speculative decoding (DECODE_MTP_SIZE=3)
  • Remove redundant SPEC_DECODING gating; use MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0 directly
  • Split online_quant_config into DPA and non-DPA variants (FP8 uses different exclude patterns with DPA)
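
A minimal sketch of the simplified MTP gating described in the summary, assuming that "MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0" means speculative decoding is enabled only when both conditions hold. The helper name `mtp_args` and the flag string are hypothetical placeholders, not the actual contents of server_atom.sh or a real ATOM option.

```shell
#!/bin/sh
# mtp_args FLAGS SIZE
# Print FLAGS when the model defines MTP flags and SIZE is a positive
# integer; print nothing otherwise. (Hypothetical helper for illustration.)
mtp_args() {
    mtp_flags="$1"
    mtp_size="${2:-0}"
    # A non-numeric size makes the test fail quietly, which disables MTP.
    if [ -n "$mtp_flags" ] && [ "$mtp_size" -gt 0 ] 2>/dev/null; then
        printf '%s\n' "$mtp_flags"
    fi
}

# Placeholder flag string; DECODE_MTP_SIZE=3 is the value used by the new recipes.
mtp_args "--example-spec-flag" 3
```

With this shape there is no separate SPEC_DECODING switch to keep in sync: leaving the flags empty or setting the size to 0 is enough to turn MTP off.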

PR Review Checklist

  • Verified that as of the moment of typing this, this is the latest version of PR_REVIEW_CHECKLIST.md
  • Verified that the general code quality meets the InferenceX standard and does not make the code quality any worse.
  • Verified that this PR has passed PR validation. Please link to GitHub Action workflow that shows this.
  • Verified that this PR passes evals. Please link to GitHub Action workflow that shows this.
  • Verified that speculative decoding PRs use chat templates to align the AL distribution with real-world usage
  • If a company claims that they support vLLM/SGLang as first class LLM inference engines on their hardware, I have verified that the respective vLLM/SGLang submission has been made before additional frameworks (TRT-LLM, ATOM, etc.). The only exceptions are for new hardware, such as MI455X UALoE72, Vera Rubin NVL72, Rubin NVL8, etc., and for new model architectures where there is an actual reason why vLLM/SGLang does not fundamentally support them yet.
  • Verified that the single-node recipes are similar to the official vLLM recipes and/or the SGLang cookbook:
    • If they are not, I have verified that a PR has been opened in vLLM recipe repo or SGLang repo and linked it below in the additional detail section:
  • If any of the above criteria cannot reasonably be satisfied, I have provided additional reasoning below.

🤖 Generated with Claude Code

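Chinese description

Refactor server_atom.sh and models_atom.yaml to centralize model-specific ATOM configuration (block size, memory fraction, model length, quantization config, hf overrides), with support for setting TP+DPA, EP+DPA, and online_quant_config parameters separately per role (prefill/decode). Move the hardcoded tuning parameters from the launch scripts into models_atom.yaml, with environment-variable override support. Add the minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp configurations to amd-master.yaml to support EAGLE3 speculative decoding (DECODE_MTP_SIZE=3). Remove the redundant SPEC_DECODING check and control MTP directly with MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0. Split online_quant_config into DPA and non-DPA variants (FP8 uses a different exclude pattern in DPA mode).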

Korean description

Refactor server_atom.sh and models_atom.yaml to centralize per-model ATOM settings (block size, memory fraction, model length, quantization config, hf overrides) and support per-role (prefill/decode) TP+DPA, EP+DPA, and online_quant_config flags. Move the hardcoded server tuning parameters from the launch scripts into models_atom.yaml and support environment-variable overrides. Add the minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml to support EAGLE3 speculative decoding (DECODE_MTP_SIZE=3). Remove the unnecessary SPEC_DECODING gating and control MTP directly with MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0. Split online_quant_config into DPA and non-DPA variants (FP8 uses different exclude patterns in DPA mode).

seungrokj and others added 11 commits July 3, 2026 13:41
… ATOM config; add minimaxm3-fp4-mi355x-atom-disagg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nd server_atom.sh refactor (PR #1940)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sagg launch script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, SPEC_DECODING guard

- Replace fragile eval "$(python3 -c "...")" with heredoc + source tempfile to
  avoid nested quote escaping issues that caused MODEL_ENVS to be empty at runtime
- Fix PREFILL/DECODE_ENABLE_EP comparison from numeric -gt 1 to string = "true"
  to match the "true"/"false" values set by launch scripts
- Fix SPEC_DECODING guard from hardcoded "mtp" to any non-none/non-empty value
  so EAGLE3 and future methods also activate SPEC_ARGS from models_atom.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
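
For context, the "heredoc + source tempfile" pattern named in this commit can be sketched as follows. This is an illustration under assumptions, not the code in server_atom.sh: the inline `config` dictionary stands in for values read from models_atom.yaml, and the `EXAMPLE_*` entries are made up.

```shell
#!/bin/sh
# Generate shell assignments with Python, write them to a temp file, and
# source that file instead of eval-ing a nested-quote one-liner.
tmp_env="$(mktemp)"

# The quoted delimiter ('PY') stops the shell from expanding anything in the
# heredoc, so the Python source needs no extra escaping.
python3 - > "$tmp_env" <<'PY'
import shlex

# Hypothetical stand-in for values loaded from models_atom.yaml.
config = {
    "MODEL_ENVS": "EXAMPLE_A=1 EXAMPLE_B=two words",
    "BLOCK_SIZE": "128",
}

for name, value in config.items():
    # shlex.quote keeps spaces and quotes intact when the file is sourced.
    print(f"{name}={shlex.quote(value)}")
PY

# Load the generated assignments into the current shell, then clean up.
. "$tmp_env"
rm -f "$tmp_env"

echo "MODEL_ENVS=$MODEL_ENVS"
echo "BLOCK_SIZE=$BLOCK_SIZE"
```

Because the Python source never passes through a double-quoted shell string, an empty MODEL_ENVS caused by mis-escaped quotes cannot occur in this form, and the temp file can be inspected when debugging.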
…ewline in models_atom.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…niMax-M3 ATOM recipes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ages to 20260623

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…agg image to 20260622

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…efaults to YAML

- Split MODEL_TP_DP_FLAGS and MODEL_EP_DP_FLAGS into prefill/decode variants
- Move BLOCK_SIZE, MEM_FRAC_STATIC, MAX_MODEL_LEN, MAX_NUM_SEQS,
  MAX_NUM_BATCHED_TOKENS from launch scripts into models_atom.yaml
- Add hf_overrides and online_quant_config (with DPA variant) to YAML
- Remove SPEC_DECODING gate; use MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0
- Add minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
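
A rough sketch of how per-role flag selection could look after this split. The variable names, the `role_flags` helper, and the flag strings are all hypothetical; only the prefill/decode split and the "true"/"false" string convention for the EP switch come from the commit messages in this PR.

```shell
#!/bin/sh
# Hypothetical per-role flag strings; real values would come from models_atom.yaml.
PREFILL_TP_DP_FLAGS="--example-prefill-tp-dpa"
PREFILL_EP_DP_FLAGS="--example-prefill-ep-dpa"
DECODE_TP_DP_FLAGS="--example-decode-tp-dpa"
DECODE_EP_DP_FLAGS="--example-decode-ep-dpa"

# role_flags ROLE ENABLE_EP
# Print the parallelism flags for ROLE ("prefill" or "decode").
# ENABLE_EP is compared as the string "true", not as a number.
role_flags() {
    role="$1"
    enable_ep="$2"
    case "$role" in
        prefill)
            if [ "$enable_ep" = "true" ]; then
                printf '%s\n' "$PREFILL_EP_DP_FLAGS"
            else
                printf '%s\n' "$PREFILL_TP_DP_FLAGS"
            fi
            ;;
        decode)
            if [ "$enable_ep" = "true" ]; then
                printf '%s\n' "$DECODE_EP_DP_FLAGS"
            else
                printf '%s\n' "$DECODE_TP_DP_FLAGS"
            fi
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

role_flags prefill "false"
role_flags decode "true"
```

Keeping one variable per role and mode lets prefill and decode use different parallelism settings without branching inside the launch scripts.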
github-actions Bot (Contributor) commented Jul 3, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


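Thank you for your contribution! For vLLM and SGLang, please make sure your recipe is consistent with the official vLLM recipes and/or the SGLang cookbook.

If it is not, please open a PR there first, after which we can merge your single-node PR into the master branch. Let's keep the documentation first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs resolves them. If you choose to re-run failed jobs, you are responsible for making sure they eventually pass. See GitHub's documentation on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a general rule, PR authors should first request a review and obtain PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If more help is needed, PR authors can contact the core maintainers over Slack.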

@seungrokj added the AMD and evals-only labels on Jul 3, 2026 (evals-only: suppress throughput and run only eval jobs; combine with all-evals to expand selection)
@seungrokj changed the title from "[AMD] MiniMax-M3 FP4/FP8 MI355X ATOM disagg: refactor config & add MTP recipes" to "[AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes" on Jul 3, 2026
Comment thread benchmarks/multi_node/amd_utils/server_atom.sh Outdated
Comment thread benchmarks/multi_node/amd_utils/server_atom.sh
Comment thread benchmarks/multi_node/amd_utils/models_atom.yaml Outdated
Shell defaults (BLOCK_SIZE=16, MEM_FRAC_STATIC=0.85) were set before
YAML loading, so the YAML values (128, 0.8) were never substituted.
Use three-tier fallback: env var > YAML > shell default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
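
The three-tier fallback in this commit (env var > YAML > shell default) can be expressed with nested parameter expansion, applied only after the YAML values are loaded. A minimal sketch; `resolve_env_first` is a hypothetical helper and not necessarily how server_atom.sh implements it.

```shell
#!/bin/sh
# resolve_env_first ENV_VALUE YAML_VALUE DEFAULT
# Print the first non-empty value, in the order env var > YAML > shell default.
resolve_env_first() {
    printf '%s\n' "${1:-${2:-$3}}"
}

# No env override: the YAML value (128) is used instead of the shell default (16).
resolve_env_first "" 128 16

# An explicit env override still wins over the YAML value.
resolve_env_first 64 128 16

# Neither env nor YAML set: fall back to the shell default.
resolve_env_first "" "" 0.85
```

The key point is ordering: the default is applied at resolution time, so it can no longer mask a YAML value the way an early `BLOCK_SIZE=16` assignment did.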
Comment thread benchmarks/multi_node/amd_utils/models_atom.yaml Outdated
seungrokj and others added 2 commits July 3, 2026 16:15
…#2000)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@SemiAnalysisAI SemiAnalysisAI deleted a comment from github-actions Bot Jul 3, 2026
@SemiAnalysisAI SemiAnalysisAI deleted a comment from github-actions Bot Jul 3, 2026
seungrokj and others added 2 commits July 4, 2026 13:09
…efaults

job.slurm injects BLOCK_SIZE=16, MEM_FRAC_STATIC=0.85, MAX_NUM_SEQS=256
as Docker env vars with hardcoded defaults. The previous env-first fallback
(env > YAML > default) meant YAML values were always shadowed. Flip all
five server-tuning vars to YAML > env > default so models_atom.yaml
entries (e.g. block_size=128 for MiniMax-M3-MXFP4) actually take effect.

Also add set -x before YAML parsing for CI debuggability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
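
To illustrate why the order was flipped, the sketch below starts from the values that job.slurm injects and resolves each variable as YAML > env > default. The `YAML_*` names and the `pick_yaml_first` helper are hypothetical; the numeric values are the ones quoted in the commit messages.

```shell
#!/bin/sh
# pick_yaml_first YAML_VALUE ENV_VALUE DEFAULT
# Print the first non-empty value, in the order YAML > env var > default.
pick_yaml_first() {
    printf '%s\n' "${1:-${2:-$3}}"
}

# Values injected by job.slurm as Docker env vars (always set, per the commit message).
BLOCK_SIZE=16
MEM_FRAC_STATIC=0.85
MAX_NUM_SEQS=256

# Hypothetical values parsed from models_atom.yaml; an empty string means
# the model entry does not set that key.
YAML_BLOCK_SIZE=128
YAML_MEM_FRAC_STATIC=0.8
YAML_MAX_NUM_SEQS=""

BLOCK_SIZE="$(pick_yaml_first "$YAML_BLOCK_SIZE" "$BLOCK_SIZE" 16)"
MEM_FRAC_STATIC="$(pick_yaml_first "$YAML_MEM_FRAC_STATIC" "$MEM_FRAC_STATIC" 0.85)"
MAX_NUM_SEQS="$(pick_yaml_first "$YAML_MAX_NUM_SEQS" "$MAX_NUM_SEQS" 256)"

echo "BLOCK_SIZE=$BLOCK_SIZE MEM_FRAC_STATIC=$MEM_FRAC_STATIC MAX_NUM_SEQS=$MAX_NUM_SEQS"
```

The trade-off is that an injected env var can no longer override a value present in the YAML, so a per-run override would have to go through the YAML entry or a separate mechanism.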
@functionstackx changed the title from "[AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes" to "[AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes / 重构配置并新增 MTP 配方 / 설정 리팩토링 및 MTP 레시피 추가" on Jul 4, 2026

Labels

AMD, evals-only (suppress throughput and run only eval jobs; combine with all-evals to expand selection), full-sweep-enabled

2 participants