[CK Tile] Prepare mixed batch-prefill FP8 KV contract #3745
Draft
ysmkone wants to merge 2 commits into
Co-authored-by: Cursor <cursoragent@cursor.com>
Make the unsupported BF16/FP16 Q with FP8 KV path explicit across dispatcher and codegen validation so future kernel plumbing has a single contract to extend.
Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor:
Thanks for the quick PR! I believe it needs to go into rocm-libraries (https://github.com/ROCm/rocm-libraries/pulls); the CK repo is just a mirror right now. @ecamartins @CongMa13 Do you mind putting it into rocm-libraries to run the preliminary CI? cc @illsilin @cgmillette
Summary
- … mixed_q_fp8_kv contract instead of letting it look like a normal BF16/FP16 batch-prefill request.
- … fp8bf16 batch-prefill path and adjacent all-BF16/all-FP16 paths.
- … q_data_type/kv_data_type codegen configs.

Current status
This remains a draft preparatory PR. It does not implement full mixed-dtype CK Tile batch-prefill kernels for #3744.
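A minimal Python sketch of what making the unsupported path explicit can look like at the dispatcher level, assuming dtypes arrive as plain string tokens. Contract, classify_batch_prefill, dispatch_batch_prefill, and the choice to treat uint8 K/V as FP8 storage are hypothetical names and choices for illustration, not the PR's actual dispatcher code. The point is that a BF16/FP16 Q with FP8 K/V request gets its own named contract and fails loudly, rather than being dispatched as if it were an ordinary BF16/FP16 request.

```python
from enum import Enum

# Hypothetical: K/V cache bytes stored as uint8 are reinterpreted as FP8.
FP8_KV_STORAGE = {"fp8", "uint8"}


class Contract(Enum):
    UNIFORM = "uniform"                # Q, K and V share one operand dtype
    MIXED_Q_FP8_KV = "mixed_q_fp8_kv"  # BF16/FP16 Q with FP8 K/V storage


def classify_batch_prefill(q_dtype: str, kv_dtype: str) -> Contract:
    """Name the request's dtype contract instead of guessing from q_dtype alone."""
    if q_dtype == kv_dtype:
        return Contract.UNIFORM
    if q_dtype in {"bf16", "fp16"} and kv_dtype in FP8_KV_STORAGE:
        return Contract.MIXED_Q_FP8_KV
    raise ValueError(f"unsupported dtype pair: q={q_dtype}, kv={kv_dtype}")


def dispatch_batch_prefill(q_dtype: str, kv_dtype: str) -> str:
    """Return the single data_type token for supported requests (hypothetical)."""
    contract = classify_batch_prefill(q_dtype, kv_dtype)
    if contract is Contract.MIXED_Q_FP8_KV:
        # Fail loudly until mixed-dtype kernel instances exist.
        raise NotImplementedError(
            f"{contract.value}: q={q_dtype} with FP8 KV has no kernel instance yet"
        )
    return q_dtype
```

Keeping classification separate from dispatch means a future kernel only has to replace the NotImplementedError branch; the contract name stays the same.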
The branch now makes the support boundary precise in dispatcher and codegen: CK Tile batch-prefill currently selects the Q/K/V operand types from a single data_type token, and fp8bf16 means FP8 Q/K/V with BF16 output. The requested contract needs a separate Q activation dtype, KV storage dtype, and output dtype, plus descale plumbing and generated kernel instances, before this PR can claim full support.

Tests
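The single-token limitation described above can be stated as a unittest-style check. This is a hypothetical sketch, not one of the PR's test modules: LEGACY_DATA_TYPES and OperandTypes are illustrative names, and the table only encodes what the description says (all-BF16, all-FP16, and fp8bf16 as FP8 Q/K/V with BF16 output). The second test captures why the mixed contract cannot be expressed today: every token decodes to identical Q, K and V types.

```python
import unittest
from typing import NamedTuple


class OperandTypes(NamedTuple):
    q: str
    k: str
    v: str
    o: str


# Hypothetical decoding of the single data_type token.
LEGACY_DATA_TYPES = {
    "fp16": OperandTypes("fp16", "fp16", "fp16", "fp16"),
    "bf16": OperandTypes("bf16", "bf16", "bf16", "bf16"),
    "fp8bf16": OperandTypes("fp8", "fp8", "fp8", "bf16"),  # FP8 Q/K/V, BF16 output
}


class LegacyTokenContractTest(unittest.TestCase):
    def test_fp8bf16_is_fp8_qkv_with_bf16_output(self):
        self.assertEqual(
            LEGACY_DATA_TYPES["fp8bf16"], OperandTypes("fp8", "fp8", "fp8", "bf16")
        )

    def test_no_token_expresses_mixed_q_and_kv(self):
        # One token picks one operand type for Q, K and V together.
        for token, ops in LEGACY_DATA_TYPES.items():
            with self.subTest(token=token):
                self.assertEqual(ops.q, ops.k)
                self.assertEqual(ops.k, ops.v)
```

Saved in a module, it runs under python -m unittest like the commands below.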
python -m unittest dispatcher.tests.test_fmha_utils dispatcher.tests.test_fmha_rules dispatcher.tests.test_fmha_codegen

Local validation limits
fleet, hipcc, and rocminfo were not available on PATH in this Windows environment, so I could not run a gfx942 compile/probe smoke test.

Remaining kernel work
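As a hedged illustration of the split the requested contract needs (separate Q activation dtype, KV storage dtype, and output dtype, plus descales), here is a minimal Python sketch of a codegen config. BatchPrefillDtypeConfig, o_data_type, the per-tensor k_descale/v_descale fields, and the kernel_instance_name format are all assumptions for illustration; only the q_data_type/kv_data_type naming comes from the summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchPrefillDtypeConfig:
    """Hypothetical codegen config with one dtype per role instead of one token."""
    q_data_type: str                   # Q activation dtype, e.g. "bf16" or "fp16"
    kv_data_type: str                  # K/V cache storage dtype, e.g. "fp8"
    o_data_type: str                   # output dtype (hypothetical field name)
    k_descale: Optional[float] = None  # assumed per-tensor descale for FP8 K
    v_descale: Optional[float] = None  # assumed per-tensor descale for FP8 V

    @property
    def is_mixed_q_fp8_kv(self) -> bool:
        return self.q_data_type in ("bf16", "fp16") and self.kv_data_type == "fp8"

    def validate(self) -> None:
        # FP8 storage is meaningless without the scales to recover real values.
        if self.kv_data_type == "fp8" and (self.k_descale is None or self.v_descale is None):
            raise ValueError("FP8 KV storage requires K and V descales")


def kernel_instance_name(cfg: BatchPrefillDtypeConfig) -> str:
    """Hypothetical name for a generated instance keyed on all three dtypes."""
    cfg.validate()
    return f"batch_prefill_q{cfg.q_data_type}_kv{cfg.kv_data_type}_o{cfg.o_data_type}"
```

For example, BatchPrefillDtypeConfig("bf16", "fp8", "bf16", k_descale=0.5, v_descale=0.25) names the instance batch_prefill_qbf16_kvfp8_obf16, while the same config without descales is rejected before any kernel is generated.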
Next validation spec
Run on MI300X/gfx942 with HIP tooling available:
- mha_batch_prefill_func / batch-prefill dispatcher
- batch=4, q_len=1, ctx_len=1024, num_q_heads=96, num_kv_heads=8, head_dim=128, GQA ratio 12
- page_size=1, SGLang-style lookup if available
- bf16 and fp16 variants, K/V cache uint8 storage interpreted as FP8 E4M3 with descales
- gfx942

Made with Cursor
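For sizing the validation run, the spec's shape parameters imply a few derived quantities. The sketch below is a back-of-the-envelope helper, not part of the PR: prefill_footprint is a hypothetical name, it assumes 2-byte BF16/FP16 Q and 1-byte FP8 K/V elements, counts one page per page_size tokens per sequence, and ignores descale tensors and page-table storage.

```python
def prefill_footprint(batch=4, q_len=1, ctx_len=1024, num_q_heads=96,
                      num_kv_heads=8, head_dim=128, page_size=1,
                      q_elem_bytes=2, kv_elem_bytes=1):
    """Sizes implied by the validation spec (defaults copied from it)."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("GQA needs num_q_heads to be a multiple of num_kv_heads")
    pages_per_seq = -(-ctx_len // page_size)  # ceiling division
    return {
        # Query heads sharing each KV head.
        "gqa_ratio": num_q_heads // num_kv_heads,
        "total_pages": batch * pages_per_seq,
        # K and V each hold ctx_len * num_kv_heads * head_dim elements per sequence.
        "kv_cache_bytes": batch * 2 * ctx_len * num_kv_heads * head_dim * kv_elem_bytes,
        "q_bytes": batch * q_len * num_q_heads * head_dim * q_elem_bytes,
    }


print(prefill_footprint())
```

With the spec defaults this gives a GQA ratio of 12, 4096 single-token pages, 8 MiB of FP8 K/V cache (2 MiB per sequence), and 96 KiB of BF16/FP16 Q, so the run is small and dominated by KV reads.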