[BACKPORT] Fix bert performance regressions on release branch #2331

Merged: umangyadav merged 5 commits into release/rocm-rel-7.2 from perf-fix/rocm-rel-7.2 on Apr 7, 2026

Conversation

@umangyadav
Member

Cherry-pick several fixes to restore performance on BERT models.
More details are in ticket SWDEV-580287.

This cherry-pick restores the lost performance and comes out slightly ahead of QA's baseline.

justinrosner and others added 4 commits April 2, 2026 20:19
* Fix barrier placement for scheduleVersion = 1.
It should appear before LDSRead and not before GlobalLoad.
…gs for attention (#2237)

Full cherry-pick of 8b45d8a from develop.

* RockPipeline: replace pair-only swap with topological sort (Kahn's
  algorithm) to handle chained private-memory RAW dependencies
  (e.g. three-way stage rotations)
* BlockwiseGemmToThreadwise: fix rthreads calculation to avoid LDS
  aliasing when rthreads does not evenly divide rDimSize
* Add LIT test for rthreads fix in lowering_blockwise_broadcast_reduce
* Add LIT test for three-way swap in test_rock_pipeline
* Add E2E attention tests for gfx90a and gfx950
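
To illustrate why a pair-only swap is not enough for chained RAW dependencies, here is a minimal Python sketch of Kahn's algorithm. It is not the RockPipeline code: the function name `kahn_order` and the `(producer, consumer)` edge encoding are assumptions made for this example. Each edge means the consumer reads a value the producer writes, so the producer must be scheduled first.

```python
from collections import deque


def kahn_order(num_ops, raw_edges):
    """Order ops 0..num_ops-1 so every producer precedes its consumers.

    raw_edges is an iterable of (producer, consumer) pairs (hypothetical
    encoding of read-after-write dependencies). Raises ValueError if the
    dependencies form a cycle.
    """
    successors = [[] for _ in range(num_ops)]
    in_degree = [0] * num_ops
    for producer, consumer in raw_edges:
        successors[producer].append(consumer)
        in_degree[consumer] += 1

    # Start with every op that has no unscheduled producer.
    ready = deque(i for i in range(num_ops) if in_degree[i] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        # Scheduling op satisfies one dependency of each of its consumers.
        for nxt in successors[op]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    if len(order) != num_ops:
        raise ValueError("dependency cycle: no valid order exists")
    return order


# Three-op chain: op 0 reads what op 1 writes, op 1 reads what op 2 writes.
chain_order = kahn_order(3, [(1, 0), (2, 1)])
print(chain_order)
```

In this chain no single adjacent swap of the original order `0, 1, 2` yields a valid schedule, whereas the topological sort handles chains of any length and reports a cycle instead of silently producing a wrong order.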

Made-with: Cursor
@umangyadav
Member Author

@umangyadav umangyadav merged commit 6671a1e into release/rocm-rel-7.2 Apr 7, 2026
12 checks passed
@umangyadav umangyadav deleted the perf-fix/rocm-rel-7.2 branch April 7, 2026 14:23
@ahsan-ca
Contributor

ahsan-ca commented Apr 8, 2026

Cherry pick PR created: ROCm/AMDMIGraphX#4756

4 participants