Skip to content

Add logic for detecting loops with dynamic indices in RemoveRedundantCasts#2378

Open
justinrosner wants to merge 2 commits into
developfrom
justinr-dynamic-indices
Open

Add logic for detecting loops with dynamic indices in RemoveRedundantCasts#2378
justinrosner wants to merge 2 commits into
developfrom
justinr-dynamic-indices

Conversation

@justinrosner
Copy link
Copy Markdown
Contributor

@justinrosner justinrosner commented May 19, 2026

Motivation

Extends rock-remove-redundant-casts to handle the case where a narrow (f16/bf16) private buffer is filled by a counted CFG loop and then read back and fpext'd in a later block. Previously the pass required every fptrunc store to op-dominate the load, which fails for stores nested inside a writer-loop body even when every path to the load provably executes every store. The motivating IR is the lowered convolution/attention epilogue: a counted loop writes each vector<2xf16> tile of a private buffer, then a later counted loop reads those tiles back through fpext for bias-add / ReLU. Without this change the precision-preserving wide-buffer rewrite never fires on that pattern.

Technical Details

The new logic recognizes a canonical single-block LLVM-CFG counted loop (icmp slt/sgt/ult/ugt + cond_br header, single-body block with add + br latch) and proves a dynamic GEP indexed by its induction variable collectively tiles [0, bufferSize) when the loop starts at zero, ends at bufferSize, and steps by the per-iteration access width. findMatchingFPTruncStores now splits the dominance check by index kind: static indices keep the existing op-level check, while dynamic indices require (1) full-buffer coverage by the writer loop, (2) the loop header dominates the load block, and (3) the load lives outside the loop's header and body, together guaranteeing every path to the load passes through every iteration. Recognizer results are memoized per induction-variable block argument so each writer loop is proven at most once.

The transform reuses the existing wide-buffer creation path; in the dynamic case the inserted wide store lands inside the writer-loop body, which is safe via the same coverage argument and is called out in block comments on createWideStore and selectConsistentWideBuffers.

Test Plan

  • PR CI
  • Nightly CI

Test Result

Submission Checklist

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 69.33333% with 46 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...b/Dialect/Rock/Transforms/RemoveRedundantCasts.cpp 69.33% 20 Missing and 26 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2378      +/-   ##
===========================================
+ Coverage    79.50%   81.66%   +2.16%     
===========================================
  Files          100      125      +25     
  Lines        31016    42474   +11458     
  Branches      4819     7020    +2201     
===========================================
+ Hits         24659    34684   +10025     
- Misses        4245     5189     +944     
- Partials      2112     2601     +489     
Flag Coverage Δ
gfx950 81.31% <69.33%> (?)
mfma ?
navi4x 81.60% <69.33%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...b/Dialect/Rock/Transforms/RemoveRedundantCasts.cpp 77.78% <69.33%> (ø)

... and 111 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@justinrosner justinrosner marked this pull request as ready for review May 19, 2026 12:48
@justinrosner justinrosner requested a review from causten as a code owner May 19, 2026 12:48
Copilot AI review requested due to automatic review settings May 19, 2026 12:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the rock-remove-redundant-casts pass to handle dynamic-index GEP patterns inside counted CFG loops. Previously the pass required every fptrunc store to op-dominate the load, which failed when stores were nested inside a writer-loop body even when full-buffer coverage was provable. The change recognizes canonical single-block counted loops (icmp slt/sgt/ult/ugt header + add/br latch) and proves that a dynamic GEP indexed by the loop's induction variable tiles [0, bufferSize), then uses block-level dominance to validate the load executes after the loop.

Changes:

  • Introduces a counted-loop recognizer (matchSingleBlockCountedLoop) with memoization keyed by the induction-variable block argument.
  • Splits findMatchingFPTruncStores's dominance check: static indices keep op-level dominance; dynamic indices require full-buffer coverage plus header-dominates-load plus load-outside-loop.
  • Documents known pessimism in selectConsistentWideBuffers / createWideStore regarding pre-existing parallel wide stores inside the writer loop body.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
mlir/lib/Dialect/Rock/Transforms/RemoveRedundantCasts.cpp Adds counted-loop recognizer, coverage proof, dynamic-index dominance branch, and explanatory comments.
mlir/test/Dialect/LLVMIR/remove-redundant-casts.mlir Adds 19 new FileCheck tests covering safe and unsafe variations of the dynamic-loop transform.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants