
[feat] Enable MLA chunked prefill and KV cache reuse on SM121#15347

Open
CodersAcademy006 wants to merge 1 commit into NVIDIA:main from CodersAcademy006:feat/sm121-mla



CodersAcademy006 commented Jun 14, 2026


I updated the SM version check logic in py_executor_creator.py to allow MLA chunked prefill and KV cache block reuse on SM121 (Blackwell) GPUs. Specifically, I added 121 to the SM version allowlists used by both the enable_block_reuse and enable_chunked_context checks, so the executor no longer disables these features automatically on SM121 devices.

This resolves issue #15344: these optimizations were being disabled on SM121 devices because SM121 was missing from the Python-side validator's allowlist. The underlying C++ kernels already support SM121, so with this change the MLA optimizations can run on this hardware.
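The allowlist guard pattern described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the code in py_executor_creator.py: MlaFeatureFlags, apply_mla_sm_guards, sm_version_from_capability, and the placeholder allowlist {90, 100} are all invented for this sketch; only the idea of adding 121 to both allowlists comes from the PR. It also assumes the SM integer is formed as major * 10 + minor from the CUDA compute capability, so compute capability 12.1 maps to 121.

```python
import logging
from dataclasses import dataclass
from typing import FrozenSet, Tuple

logger = logging.getLogger("mla_sm_guard_sketch")


def sm_version_from_capability(capability: Tuple[int, int]) -> int:
    """Fold a CUDA compute capability such as (12, 1) into an SM integer like 121."""
    major, minor = capability
    return major * 10 + minor


@dataclass
class MlaFeatureFlags:
    # Hypothetical stand-ins for the enable_block_reuse and
    # enable_chunked_context settings named in this PR.
    enable_block_reuse: bool = True
    enable_chunked_context: bool = True


def apply_mla_sm_guards(sm_version: int,
                        flags: MlaFeatureFlags,
                        block_reuse_sms: FrozenSet[int],
                        chunked_prefill_sms: FrozenSet[int]) -> MlaFeatureFlags:
    """Turn off each MLA feature, with a warning, if sm_version is not allowlisted."""
    if flags.enable_block_reuse and sm_version not in block_reuse_sms:
        logger.warning("MLA KV cache block reuse is only supported on SM %s; disabling it.",
                       sorted(block_reuse_sms))
        flags.enable_block_reuse = False
    if flags.enable_chunked_context and sm_version not in chunked_prefill_sms:
        logger.warning("MLA chunked prefill is only supported on SM %s; disabling it.",
                       sorted(chunked_prefill_sms))
        flags.enable_chunked_context = False
    return flags


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Hypothetical placeholder allowlist; the real entries live in py_executor_creator.py.
    before = frozenset({90, 100})
    after = before | {121}  # the kind of change this PR makes
    sm = sm_version_from_capability((12, 1))
    print(apply_mla_sm_guards(sm, MlaFeatureFlags(), before, before))
    print(apply_mla_sm_guards(sm, MlaFeatureFlags(), after, after))
```

With the placeholder allowlist, the first call logs two warnings and returns flags with both features set to False; once 121 is added, the second call leaves both features enabled. Keeping one allowlist per guard mirrors the PR's description of two separate checks that each had to be extended.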


coderabbitai Bot commented Jun 14, 2026


📝 Walkthrough

In py_executor_creator.py, SM121 is added to the GPU SM version allowlists for two MLA feature guards inside create_py_executor: MLA KV cache block reuse and MLA chunked prefill. Both the conditional checks and their associated warning messages are updated to reflect SM121 as a supported architecture.

Changes

MLA SM121 allowlist expansion

| Layer / File(s) | Summary |
|---|---|
| SM121 added to MLA KV cache reuse and chunked prefill guards (tensorrt_llm/_torch/pyexecutor/py_executor_creator.py) | The SM version allowlists for the MLA KV cache block reuse guard and the MLA chunked prefill guard are each extended to include 121, and both warning messages are updated to list SM121 as a supported SM version. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[feat] Enable MLA chunked prefill and KV cache reuse on SM121' accurately summarizes the main change: adding SM121 support to two MLA optimization features. |
| Linked Issues check | ✅ Passed | The changes fully satisfy issue #15344 objectives: SM121 is added to both MLA KV block reuse and chunked prefill allowlists as required. |
| Out of Scope Changes check | ✅ Passed | All code changes in the PR are directly related to the linked issue #15344; the modifications only update the SM version allowlists as specified in the objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description provides clear context about the changes, references issue #15344, and explains the rationale for enabling SM121 support for MLA features. |


Signed-off-by: Srijan Upadhyay <srjnupadhyay@gmail.com>