[AMD] MiniMax-M3 FP4/FP8 MI355X ATOMESH (disagg): refactor config & add MTP recipes #2000
seungrokj wants to merge 16 commits into
Conversation
… ATOM config; add minimaxm3-fp4-mi355x-atom-disagg Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nd server_atom.sh refactor (PR #1940) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sagg launch script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, SPEC_DECODING guard
- Replace fragile eval "$(python3 -c "...")" with heredoc + source tempfile to avoid nested quote escaping issues that caused MODEL_ENVS to be empty at runtime
- Fix PREFILL/DECODE_ENABLE_EP comparison from numeric -gt 1 to string = "true" to match the "true"/"false" values set by launch scripts
- Fix SPEC_DECODING guard from hardcoded "mtp" to any non-none/non-empty value so EAGLE3 and future methods also activate SPEC_ARGS from models_atom.yaml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
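The two guard fixes in this commit can be sketched roughly as below. This is a minimal illustration, not the code in server_atom.sh: the helper names ep_enabled and spec_enabled are hypothetical, and the value "eagle3" is an assumed spelling of the method name. Only PREFILL_ENABLE_EP, SPEC_DECODING, and the "true"/"false"/"none" conventions come from the commit message.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the two guard fixes.

# Succeeds only when the flag is the literal string "true".
# A numeric test such as [ "$flag" -gt 1 ] cannot match "true"/"false" input.
ep_enabled() {
    [ "${1:-false}" = "true" ]
}

# Succeeds for any speculative-decoding method that is neither empty nor
# "none", instead of matching only the hardcoded value "mtp".
spec_enabled() {
    [ -n "${1:-}" ] && [ "$1" != "none" ]
}

PREFILL_ENABLE_EP="true"
SPEC_DECODING="eagle3"   # assumed spelling, for illustration only

if ep_enabled "$PREFILL_ENABLE_EP"; then
    echo "prefill EP: enabled"
fi
if spec_enabled "$SPEC_DECODING"; then
    echo "speculative decoding: $SPEC_DECODING"
fi
```

Keeping both checks as string comparisons means a new method name needs no change to the guard itself.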
…ewline in models_atom.yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…niMax-M3 ATOM recipes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ages to 20260623 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…agg image to 20260622 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…efaults to YAML
- Split MODEL_TP_DP_FLAGS and MODEL_EP_DP_FLAGS into prefill/decode variants
- Move BLOCK_SIZE, MEM_FRAC_STATIC, MAX_MODEL_LEN, MAX_NUM_SEQS, MAX_NUM_BATCHED_TOKENS from launch scripts into models_atom.yaml
- Add hf_overrides and online_quant_config (with DPA variant) to YAML
- Remove SPEC_DECODING gate; use MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0
- Add minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
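One way to read "use MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0" is that the MTP flags are applied whenever they are defined for the model and the decode MTP size is positive, with no separate SPEC_DECODING switch. The sketch below assumes that reading. The function name mtp_args and the flag string are placeholders, not real ATOM options.

```shell
#!/bin/sh
# Emit the model's MTP flags only when both conditions hold:
#   $1 (flags) is non-empty and $2 (MTP size) is greater than zero.
mtp_args() {
    if [ -n "${1:-}" ] && [ "${2:-0}" -gt 0 ]; then
        printf '%s\n' "$1"
    fi
}

# Placeholder flag string; the real flags would come from models_atom.yaml.
MODEL_MTP_FLAGS="--placeholder-mtp-flag"
DECODE_MTP_SIZE=3

SPEC_ARGS="$(mtp_args "$MODEL_MTP_FLAGS" "$DECODE_MTP_SIZE")"
echo "SPEC_ARGS=${SPEC_ARGS}"
```

With DECODE_MTP_SIZE=0, or with no MTP flags defined for the model, SPEC_ARGS stays empty and the server would start without speculative decoding.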
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.
If additional help is needed, PR authors can reach out to core maintainers over Slack.
Shell defaults (BLOCK_SIZE=16, MEM_FRAC_STATIC=0.85) were set before YAML loading, so the YAML values (128, 0.8) were never substituted. Use three-tier fallback: env var > YAML > shell default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
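The ordering problem described in this commit can be reproduced in a few lines. This is an assumed reconstruction of the mechanism using ordinary `${VAR:-default}` expansion, not the actual code in server_atom.sh. YAML_BLOCK_SIZE is a hypothetical stand-in for the value parsed from models_atom.yaml.

```shell
#!/bin/sh
# Hypothetical stand-in for a value loaded from models_atom.yaml.
YAML_BLOCK_SIZE=128

# Ordering that loses the YAML value: the shell default is applied first,
# so the later YAML fallback never sees an empty variable.
unset BLOCK_SIZE
BLOCK_SIZE="${BLOCK_SIZE:-16}"
BLOCK_SIZE="${BLOCK_SIZE:-$YAML_BLOCK_SIZE}"
buggy="$BLOCK_SIZE"

# Three-tier fallback: env var > YAML > shell default, resolved in one step.
unset BLOCK_SIZE
BLOCK_SIZE="${BLOCK_SIZE:-${YAML_BLOCK_SIZE:-16}}"
fixed="$BLOCK_SIZE"

# An explicitly set env var still wins over the YAML value.
BLOCK_SIZE=64
BLOCK_SIZE="${BLOCK_SIZE:-${YAML_BLOCK_SIZE:-16}}"
overridden="$BLOCK_SIZE"

echo "buggy=$buggy fixed=$fixed overridden=$overridden"
# prints: buggy=16 fixed=128 overridden=64
```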
…#2000) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28645608307
…efaults
job.slurm injects BLOCK_SIZE=16, MEM_FRAC_STATIC=0.85, MAX_NUM_SEQS=256 as Docker env vars with hardcoded defaults. The previous env-first fallback (env > YAML > default) meant YAML values were always shadowed. Flip all five server-tuning vars to YAML > env > default so models_atom.yaml entries (e.g. block_size=128 for MiniMax-M3-MXFP4) actually take effect. Also add set -x before YAML parsing for CI debuggability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
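A minimal sketch of the flipped precedence follows, assuming the injected env vars behave as the commit describes. The helper pick and the YAML_* variables are hypothetical stand-ins for whatever server_atom.sh does after parsing models_atom.yaml. The numeric values are the ones quoted in the commit messages.

```shell
#!/bin/sh
# Print the first non-empty argument; fail if all are empty.
pick() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Values that job.slurm always injects, per the commit message.
export BLOCK_SIZE=16 MEM_FRAC_STATIC=0.85 MAX_NUM_SEQS=256

# Hypothetical stand-ins for entries parsed from models_atom.yaml.
# MAX_NUM_SEQS deliberately has no YAML entry in this example.
YAML_BLOCK_SIZE=128
YAML_MEM_FRAC_STATIC=0.8

# YAML first, then the injected env var, then the built-in default.
BLOCK_SIZE="$(pick "${YAML_BLOCK_SIZE:-}" "${BLOCK_SIZE:-}" 16)"
MEM_FRAC_STATIC="$(pick "${YAML_MEM_FRAC_STATIC:-}" "${MEM_FRAC_STATIC:-}" 0.85)"
MAX_NUM_SEQS="$(pick "${YAML_MAX_NUM_SEQS:-}" "${MAX_NUM_SEQS:-}" 256)"

echo "BLOCK_SIZE=$BLOCK_SIZE MEM_FRAC_STATIC=$MEM_FRAC_STATIC MAX_NUM_SEQS=$MAX_NUM_SEQS"
# prints: BLOCK_SIZE=128 MEM_FRAC_STATIC=0.8 MAX_NUM_SEQS=256
```

The trade-off of this order is that an env var can no longer override a value that the YAML defines. It only fills in where the YAML is silent.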
Summary
- Refactor server_atom.sh and models_atom.yaml to centralize model-specific ATOM config (block size, memory fraction, model length, quant config, hf overrides) and support per-role (prefill/decode) flags for TP+DPA, EP+DPA, and online_quant_config
- Move hardcoded server-tuning parameters from the launch scripts into models_atom.yaml with env-var override support
- Add minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml for EAGLE3 speculative decoding (DECODE_MTP_SIZE=3)
- Remove redundant SPEC_DECODING gating; use MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0 directly
- Split online_quant_config into DPA and non-DPA variants (FP8 uses different exclude patterns with DPA)
PR Review Checklist
🤖 Generated with Claude Code
Description (translated from the Chinese and Korean versions)
Refactor server_atom.sh and models_atom.yaml to centralize model-specific ATOM configuration (block size, memory fraction, model length, quantization config, hf overrides), with separate per-role (prefill/decode) settings for TP+DPA, EP+DPA, and online_quant_config. Move the hardcoded server-tuning parameters from the launch scripts into models_atom.yaml, with environment-variable overrides supported. Add the minimaxm3-fp4/fp8-mi355x-atom-disagg-mtp recipes to amd-master.yaml for EAGLE3 speculative decoding (DECODE_MTP_SIZE=3). Remove the redundant SPEC_DECODING check and control the feature directly through MODEL_MTP_FLAGS + DECODE_MTP_SIZE > 0. Split online_quant_config into DPA and non-DPA variants (FP8 uses a different exclude pattern in DPA mode).