-
Notifications
You must be signed in to change notification settings - Fork 218
[AMD] Enable AITER MoE for MiniMax-M3 MI355X vLLM MTP benchmarks / 为 MiniMax-M3 MI355X vLLM MTP 基准测试启用 AITER MoE #1955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
Fangzhou-Ai
wants to merge
4
commits into
main
Choose a base branch
from
amd/minimax-m3-mtp-aiter-moe
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
e655a4c
[AMD] enable AITER MoE for MiniMax-M3 MI355X vLLM MTP benchmarks
Fangzhou-Ai 75ed170
fix: set perf-changelog pr-link for #1955
Fangzhou-Ai 811bd4e
[AMD] pass --linear-backend emulation on MiniMax-M3 FP8 MTP
Fangzhou-Ai 1db8d73
chore(amd): bump MiniMax-M3 MI355X vLLM to latest ROCm nightly
Fangzhou-Ai File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -61,6 +61,10 @@ export VLLM_ENGINE_READY_TIMEOUT_S=3600 | |
| # Run with CUDA graphs (no --enforce-eager): VLLM_USE_BREAKABLE_CUDAGRAPH=0 | ||
| # avoids the M3-decode breakable-cudagraph path that previously forced eager. | ||
| export VLLM_USE_BREAKABLE_CUDAGRAPH=0 | ||
| export VLLM_ROCM_USE_AITER=1 | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will need to set VLLM_ROCM_USE_AITER=0 when enable ep |
||
| export VLLM_ROCM_USE_AITER_MOE=1 | ||
| export VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1 | ||
| export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 | ||
|
|
||
| if [ "${EVAL_ONLY}" = "true" ]; then | ||
| setup_eval_context | ||
|
|
@@ -176,7 +180,9 @@ vllm serve "$MODEL" --port "$PORT" \ | |
| --language-model-only \ | ||
| --max-model-len "$MAX_MODEL_LEN" \ | ||
| --kv-cache-dtype fp8 \ | ||
| --linear-backend emulation \ | ||
| --attention-backend TRITON_ATTN \ | ||
| --moe-backend aiter \ | ||
|
functionstackx marked this conversation as resolved.
|
||
| --speculative-config "{\"method\": \"eagle3\", \"model\": \"$DRAFT_MODEL\", \"num_speculative_tokens\": $NUM_SPEC_TOKENS}" \ | ||
| --tool-call-parser minimax_m3 \ | ||
| --reasoning-parser minimax_m3 \ | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Nit: line 8 says "MoE serving mirrors minimaxm3_fp4_mi355x_vllm.sh (AITER MoE, vllm#46419)." but on current main that STP recipe sets no AITER env vars and no
--moe-backend— its own header comment says it "lets vLLM select the MoE backend." Those STP knobs come from #1954, which hasn't landed. If #1955 merges first, this cross-reference is wrong. Consider pointing at #1954 directly, or just describing the AITER setup (with the vllm#46419 credit) without claiming the STP file already does the same.Extended reasoning...
What the comment claims vs. what is true on main
The new line 8 in
benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm_mtp.shreads:# MoE serving mirrors minimaxm3_fp4_mi355x_vllm.sh (AITER MoE, vllm#46419).But
benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm.shon current main does not yet setVLLM_ROCM_USE_AITER,VLLM_ROCM_USE_AITER_MOE,VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS, or pass--moe-backend aiter. Its line 6 explicitly states the opposite: "…lets vLLM select the MoE backend." Agit grep --moe-backendconfirms the only occurrences of--moe-backend aiterin the tree are the two MTP files introduced by this PR.Step-by-step proof
cat benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm.shon the current branch's view of main shows lines 5–6 saying the recipe "uses the text-only language-model path and lets vLLM select the MoE backend."grep -E 'VLLM_ROCM_USE_AITER|moe-backend' benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm.shreturns nothing.git log --all --oneline | grep -i 1954returns nothing — [AMD] Enable AITER MoE for MiniMax-M3 FP4 MI355X vLLM STP #1954 has not landed.minimaxm3_fp4_mi355x_vllm_mtp.shand following the line-8 pointer tominimaxm3_fp4_mi355x_vllm.shwill find a file whose header comment says vLLM chooses the backend — directly contradicting the "mirrors … AITER MoE" claim.Addressing the refutation
The refutation argues this is (a) a known dependency, (b) flagged in the test plan, (c) a nit, and (d) forward-looking design language is normal for paired PRs. Points (a)–(c) are accurate and are exactly why this is filed at nit severity, not blocking. The remaining concern is narrow: the comment's surface reading is a present-tense factual claim ("mirrors"), and merge order in this repo is not actually pinned — #1955 can land before, after, or instead of #1954. If #1954 is rebased, re-scoped, or abandoned, this comment ships indefinitely as a dangling cross-reference. The refutation's framing of "a comment that's true after the next-PR-in-the-stack merges" assumes a merge order that the PR description hopes for but does not enforce.
Impact
Documentation-only; no runtime effect. The AITER env vars and
--moe-backend aiterflag in this file are self-contained and correct on their own. The risk is purely reader confusion if/when they trace the cross-reference and find the STP recipe in a state that contradicts the comment.How to fix
One of three trivial options:
# MoE serving uses AITER MoE (vllm#46419).# MoE serving mirrors the STP AITER MoE setup from #1954.This is in-scope to flag because the comment is newly added by this PR, the fix is one-line, and option 1 makes the comment robust to any merge order without coupling the two PRs.