Add logic for detecting loops with dynamic indices in RemoveRedundantCasts#2378
Add logic for detecting loops with dynamic indices in RemoveRedundantCasts#2378justinrosner wants to merge 2 commits into
RemoveRedundantCasts#2378Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #2378 +/- ##
===========================================
+ Coverage 79.50% 81.66% +2.16%
===========================================
Files 100 125 +25
Lines 31016 42474 +11458
Branches 4819 7020 +2201
===========================================
+ Hits 24659 34684 +10025
- Misses 4245 5189 +944
- Partials 2112 2601 +489
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR extends the rock-remove-redundant-casts pass to handle dynamic-index GEP patterns inside counted CFG loops. Previously the pass required every fptrunc store to op-dominate the load, which failed when stores were nested inside a writer-loop body even when full-buffer coverage was provable. The change recognizes canonical single-block counted loops (icmp slt/sgt/ult/ugt header + add/br latch) and proves that a dynamic GEP indexed by the loop's induction variable tiles [0, bufferSize), then uses block-level dominance to validate the load executes after the loop.
Changes:
- Introduces a counted-loop recognizer (
matchSingleBlockCountedLoop) with memoization keyed by the induction-variable block argument. - Splits
findMatchingFPTruncStores's dominance check: static indices keep op-level dominance; dynamic indices require full-buffer coverage plus header-dominates-load plus load-outside-loop. - Documents known pessimism in
selectConsistentWideBuffers/createWideStoreregarding pre-existing parallel wide stores inside the writer loop body.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| mlir/lib/Dialect/Rock/Transforms/RemoveRedundantCasts.cpp | Adds counted-loop recognizer, coverage proof, dynamic-index dominance branch, and explanatory comments. |
| mlir/test/Dialect/LLVMIR/remove-redundant-casts.mlir | Adds 19 new FileCheck tests covering safe and unsafe variations of the dynamic-loop transform. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Motivation
Extends
rock-remove-redundant-caststo handle the case where a narrow (f16/bf16) private buffer is filled by a counted CFG loop and then read back andfpext'd in a later block. Previously the pass required everyfptruncstore to op-dominate the load, which fails for stores nested inside a writer-loop body even when every path to the load provably executes every store. The motivating IR is the lowered convolution/attention epilogue: a counted loop writes eachvector<2xf16>tile of a private buffer, then a later counted loop reads those tiles back throughfpextfor bias-add / ReLU. Without this change the precision-preserving wide-buffer rewrite never fires on that pattern.Technical Details
The new logic recognizes a canonical single-block LLVM-CFG counted loop (
icmp slt/sgt/ult/ugt+cond_brheader, single-body block withadd+brlatch) and proves a dynamic GEP indexed by its induction variable collectively tiles[0, bufferSize)when the loop starts at zero, ends atbufferSize, and steps by the per-iteration access width.findMatchingFPTruncStoresnow splits the dominance check by index kind: static indices keep the existing op-level check, while dynamic indices require (1) full-buffer coverage by the writer loop, (2) the loop header dominates the load block, and (3) the load lives outside the loop's header and body, together guaranteeing every path to the load passes through every iteration. Recognizer results are memoized per induction-variable block argument so each writer loop is proven at most once.The transform reuses the existing wide-buffer creation path; in the dynamic case the inserted wide store lands inside the writer-loop body, which is safe via the same coverage argument and is called out in block comments on
createWideStoreandselectConsistentWideBuffers.Test Plan
Test Result
Submission Checklist