[AMD] Enable AITER MoE for MiniMax-M3 FP4 MI355X vLLM MTP (fix EP startup hang) #1964
Merged
+22
−1
5 commits
a2f7a8a: [AMD] Enable AITER MoE for MiniMax-M3 FP4 MI355X vLLM MTP (incl. EP) (Fangzhou-Ai)
dcc1331: Add perf-changelog entry for minimaxm3-fp4-mi355x-vllm-mtp AITER MoE fix (Fangzhou-Ai)
e33fc18: Merge branch 'main' into amd/minimax-m3-fp4-mtp-aiter-moe-ep (Fangzhou-Ai)
94fbcbc: Merge branch 'main' into amd/minimax-m3-fp4-mtp-aiter-moe-ep (functionstackx)
f8a1165: Restore dropped config-keys header for minimaxm3-fp4-b300-dynamo-vllm… (functionstackx)
🔴 This PR bumps the `minimaxm3-fp4-mi355x-vllm-mtp` image tag (`nightly-3f5a1e173...` → `nightly-4559c43a9...`) and adds `VLLM_ROCM_USE_AITER=1`, `VLLM_ROCM_USE_AITER_MOE=1`, `VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1`, and `--moe-backend aiter`, but does not append the required entry to `perf-changelog.yaml`. Without that entry the run-sweep workflow has no trigger record for the new config on this PR, so the EP-hang fix that this PR exists to deliver will not be re-benchmarked end-to-end by CI before merge. Mirror the sister STP entry at `perf-changelog.yaml` lines 4320–4325 (from #1954): a 4–6 line block under `config-keys: [minimaxm3-fp4-mi355x-vllm-mtp]` describing the image pin and AITER MoE enablement, with `pr-link` pointing at this PR.

Extended reasoning
What's missing
AGENTS.md §"Updating Docker images" (line 126) explicitly mandates: "Update the image tag in the relevant `.github/configs/*-master.yaml` and/or `benchmarks/*.sh`, update any related env vars / config params, and append a `perf-changelog.yaml` entry (required - triggers benchmarks)". AGENTS.md line 58 reinforces this: "Changes to `perf-changelog.yaml` trigger benchmark runs."

This PR makes both of the changelog-triggering changes AGENTS.md calls out: (1) it bumps the image tag at `.github/configs/amd-master.yaml:2669` from `nightly-3f5a1e1733200760169ff31ebe60a271072b199e` to `nightly-4559c43a9526597c00cbcc4f59979496500268d1`, and (2) it adds three `VLLM_ROCM_USE_AITER*` env vars plus `--moe-backend aiter` to `benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm_mtp.sh`. However, the PR diff modifies only those two files; `perf-changelog.yaml` is not touched.

Why this matters here (not generic policy concern)
The sister STP PR #1954, which makes byte-for-byte equivalent changes (same image pin, same three AITER env vars, same `--moe-backend aiter`) for the non-MTP variant `minimaxm3-fp4-mi355x-vllm`, did append a proper entry at `perf-changelog.yaml` lines 4320–4325 with the matching `config-keys: [minimaxm3-fp4-mi355x-vllm]` block and a `pr-link` to #1954. The MTP twin (this PR) omits the mirror entry. The most recent existing entry for `minimaxm3-fp4-mi355x-vllm-mtp` in `perf-changelog.yaml` (lines 4292–4298, from PR #1939) still pins the old hanging image `nightly-3f5a1e173...` and describes "automatic MoE backend selection", which is exactly the configuration this PR is trying to replace.

CI consequence (step-by-step)
1. `.github/workflows/run-sweep.yml` (push to main and pull_request triggers) gates on `paths: perf-changelog.yaml`, so only changes to that file cause the full sweep to fire for the affected configs.
2. This PR does not modify `perf-changelog.yaml`, so on push/PR events the run-sweep workflow has no `config-keys` delta naming `minimaxm3-fp4-mi355x-vllm-mtp` to trigger on.
3. […] `4559c43a9` + aiter tip; the CI re-sweep on the pinned docker image will confirm end-to-end." That CI re-sweep cannot happen without the changelog entry.

This is precisely the case run-sweep exists to catch: the entire reason for this PR is to fix an ~8-hour CI hang on EP configs of `minimaxm3-fp4-mi355x-vllm-mtp`. Merging without the changelog entry leaves the remediation unverified by the gate it was built to satisfy.

Fix
Append an entry to the end of `perf-changelog.yaml` mirroring the #1954 entry at lines 4320–4325. This is the same mechanical 4–6 line append the sister STP PR #1954 already used; copying its shape is the cleanest fix.
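For illustration only, the appended entry might look roughly like the sketch below. Only `config-keys` and `pr-link` are named in this review; the `description` field name, the list-item layout, and the wording of each line are assumptions, and the PR URL is left as a placeholder because the repository URL is not given here. The real #1954 entry in `perf-changelog.yaml` is the authoritative template.

```yaml
# Hypothetical sketch of the new perf-changelog.yaml entry.
# Copy the actual shape from the #1954 entry rather than from this block.
- config-keys: [minimaxm3-fp4-mi355x-vllm-mtp]
  description:   # assumed field name, not confirmed by this review
    - "Pin image tag to nightly-4559c43a9526597c00cbcc4f59979496500268d1"
    - "Set VLLM_ROCM_USE_AITER=1, VLLM_ROCM_USE_AITER_MOE=1, VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1"
    - "Pass --moe-backend aiter"
  pr-link: "<URL of this PR (#1964)>"   # placeholder
```

Because the sweep keys off `config-keys`, the key here should match the config name exactly, including the `-mtp` suffix that distinguishes it from the STP entry.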