Add reuseA/reuseB support to MFMA instruction for precise reuse bit #7753
gfx1250 tox passed. Generated XML file: /workspace/fork/rocm-libraries/projects/hipblaslt/tensilelite/python_tests.xml
Pull request overview
This PR adds explicit reuseA/reuseB support for WMMA/MFMA-family instructions to more precisely control the emitted matrix_a_reuse / matrix_b_reuse hints (targeting gfx1250), and propagates those hints through Tensile assembly emission and rocisa→stinkytofu conversion.
Changes:
- Extend rocisa `MFMAInstruction`/`MXMFMAInstruction` with `reuseA`/`reuseB` fields, update the nanobind bindings, and emit `matrix_{a,b}_reuse` in `toString()` when supported.
- Compute per-instruction reuse decisions in `KernelWriterAssembly.py` and pass them into `MFMAInstruction`/`MXMFMAInstruction`.
- Teach the stinkytofu rocisa string parser to extract `matrix_{a,b}_reuse` and store it in `MFMAModifiers`; remove an obsolete commented-out "hack" line.
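The per-instruction reuse decision can be pictured with a minimal sketch. It assumes (this is not taken from the PR) that setting `reuseA`/`reuseB` on an instruction means the next matrix instruction in program order reads the same A/B source registers. `MatrixOp` and `assign_reuse_bits` are hypothetical names, not the Tensile or rocisa API.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical, simplified model of one MFMA/WMMA instruction: only the
# source register ranges of the A and B operands matter for this sketch.
@dataclass
class MatrixOp:
    a_src: Tuple[int, int]  # (first VGPR, count) holding the A fragment
    b_src: Tuple[int, int]  # (first VGPR, count) holding the B fragment
    reuseA: bool = False
    reuseB: bool = False


def assign_reuse_bits(ops: List[MatrixOp]) -> None:
    """Set reuseA/reuseB on each op when the next op in program order
    reads the same A/B source registers (assumed hint semantics)."""
    for cur, nxt in zip(ops, ops[1:]):
        cur.reuseA = cur.a_src == nxt.a_src
        cur.reuseB = cur.b_src == nxt.b_src
    if ops:
        # The last op has no successor, so nothing is reused after it.
        ops[-1].reuseA = False
        ops[-1].reuseB = False


# A 2x2 tile walked row by row: A stays fixed across the inner loop.
ops = [MatrixOp(a_src=(0, 8), b_src=(16, 8)),
       MatrixOp(a_src=(0, 8), b_src=(24, 8)),
       MatrixOp(a_src=(8, 8), b_src=(16, 8)),
       MatrixOp(a_src=(8, 8), b_src=(24, 8))]
assign_reuse_bits(ops)
print([(op.reuseA, op.reuseB) for op in ops])
# [(True, False), (False, False), (True, False), (False, False)]
```

Comparing each instruction with its actual successor, instead of applying a blanket setting, is what makes the bits "precise": a bit is set only when the very next matrix op really reads the same operand.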
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| shared/stinkytofu/src/conversion/rocisa/ToStinkyTofuUtils.cpp | Parse matrix_a_reuse/matrix_b_reuse tokens from rocisa instruction strings into MFMAModifiers. |
| projects/hipblaslt/tensilelite/Tensile/SolutionStructs/Problem.py | Remove an unused commented-out TLUB “hack” line. |
| projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py | Compute reuseA/reuseB per MFMA and pass into rocisa instruction constructors. |
| projects/hipblaslt/tensilelite/rocisa/rocisa/src/instruction/mfma.cpp | Expose reuseA/reuseB as Python ctor args for MFMA/MXMFMA instructions. |
| projects/hipblaslt/tensilelite/rocisa/rocisa/include/instruction/mfma.hpp | Store reuseA/reuseB on MFMA/MXMFMA instructions and append matrix_{a,b}_reuse to emitted asm when supported. |
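The emit-then-parse round trip between rocisa and stinkytofu can be sketched in Python. This is an illustration under assumptions, not the C++ code from `mfma.hpp` or `ToStinkyTofuUtils.cpp`: `ReuseModifiers`, `emit_matrix_op`, and `parse_reuse_modifiers` are hypothetical names, and the mnemonic and register ranges are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReuseModifiers:
    """Hypothetical stand-in for the reuse fields of MFMAModifiers."""
    matrix_a_reuse: bool = False
    matrix_b_reuse: bool = False


def emit_matrix_op(mnemonic: str, operands: List[str],
                   reuseA: bool, reuseB: bool, supports_reuse: bool) -> str:
    """Render one instruction, appending the reuse modifiers only when the
    target supports them."""
    text = f"{mnemonic} {', '.join(operands)}"
    if supports_reuse:
        if reuseA:
            text += " matrix_a_reuse"
        if reuseB:
            text += " matrix_b_reuse"
    return text


def parse_reuse_modifiers(text: str) -> ReuseModifiers:
    """Recover the reuse flags by scanning the tokens after the mnemonic."""
    tokens = text.replace(",", " ").split()[1:]
    return ReuseModifiers(
        matrix_a_reuse="matrix_a_reuse" in tokens,
        matrix_b_reuse="matrix_b_reuse" in tokens,
    )


# Illustrative operands in D, A, B, C order.
asm = emit_matrix_op("v_wmma_f32_16x16x32_f16",
                     ["v[0:7]", "v[8:15]", "v[16:23]", "v[0:7]"],
                     reuseA=True, reuseB=False, supports_reuse=True)
print(asm)
# v_wmma_f32_16x16x32_f16 v[0:7], v[8:15], v[16:23], v[0:7] matrix_a_reuse
print(parse_reuse_modifiers(asm))
# ReuseModifiers(matrix_a_reuse=True, matrix_b_reuse=False)
```

Because the parser only looks for the two modifier tokens, strings emitted for targets without reuse support parse to all-false flags. This keeps the conversion safe on older architectures.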
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Coverage Diff:

| | develop | #7753 | +/- |
|---|---|---|---|
| Coverage | 61.87% | 61.98% | +0.11% |
| Files | 2086 | 2087 | +1 |
| Lines | 357038 | 357926 | +888 |
| Branches | 53806 | 54001 | +195 |
| Hits | 220892 | 221843 | +951 |
| Misses | 117348 | 117281 | -67 |
| Partials | 18798 | 18802 | +4 |
*This pull request uses carry forward flags.
Can we do this at the StinkyTofu stage?
Branch updated from 666331e to 2dd1db0, then from 2dd1db0 to 49885a8.
Hi @nakajee, since StinkyTofu currently preserves the relative order of WMMA instructions (WMMA instructions stay in the same order with respect to each other, even when other instructions are interleaved between them), there is no need to move this logic to the StinkyTofu stage. Thank you.
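The order-preservation argument can be checked with a small sketch. It assumes, as the comment implies, that a reuse bit relates each WMMA to the next WMMA in program order, with non-matrix instructions in between being irrelevant. The instruction tuples and `reuse_bits` are hypothetical and do not model the StinkyTofu scheduler itself.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction record: (kind, a_src, b_src), kind "wmma" or "other".
Instr = Tuple[str, Optional[int], Optional[int]]


def reuse_bits(stream: List[Instr]) -> List[Tuple[bool, bool]]:
    """Reuse bits per WMMA, comparing each WMMA only with the next WMMA
    in the stream and skipping non-matrix instructions in between."""
    wmmas = [ins for ins in stream if ins[0] == "wmma"]
    bits = []
    for idx, cur in enumerate(wmmas):
        nxt = wmmas[idx + 1] if idx + 1 < len(wmmas) else None
        bits.append((nxt is not None and cur[1] == nxt[1],
                     nxt is not None and cur[2] == nxt[2]))
    return bits


before = [("wmma", 0, 16), ("wmma", 0, 24), ("wmma", 8, 16), ("wmma", 8, 24)]
# Other instructions interleaved, WMMA relative order unchanged.
after = [("wmma", 0, 16), ("other", None, None), ("wmma", 0, 24),
         ("other", None, None), ("other", None, None),
         ("wmma", 8, 16), ("wmma", 8, 24), ("other", None, None)]
# Two WMMAs swapped: their relative order changes.
reordered = [before[0], before[2], before[1], before[3]]

print(reuse_bits(before) == reuse_bits(after))      # True
print(reuse_bits(before) == reuse_bits(reordered))  # False
```

Under this assumption, bits computed in `KernelWriterAssembly.py` stay valid after any schedule that only interleaves other instructions. They would need recomputation only if a later pass swapped WMMAs.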
Pull request was closed
AIHPBLAS-1403
Brief
Precisely set the reuse bits for gfx1250.
Implementations
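Compute reuseA/reuseB per MFMA/WMMA instruction in KernelWriterAssembly.py, pass them to the rocisa instruction constructors, and emit the matrix_a_reuse/matrix_b_reuse hints on targets that support them (gfx1250).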
Assembly output
Tests
All test cases passed on FFM and Gopher.
FFM (with emulated ECC):
- tox passed
- hipblaslt-test passed
Notes