rocsolver: replace remaining compile-time WarpSize assumptions in GPU reductions #7744
Copilot wants to merge 2 commits
Agent-Logs-Url: https://github.com/ROCm/rocm-libraries/sessions/1969379b-e4bc-4b53-8aa5-5bc733b566ab
Co-authored-by: EdDAzevedo <98369794+EdDAzevedo@users.noreply.github.com>
Pull request overview
This PR updates several rocSOLVER GPU reduction paths to avoid relying on a compile-time WarpSize constant and instead use the device runtime warpSize, while keeping shared-memory allocations safe via conservative compile-time sizing.
Changes:
- Removed the compile-time `WarpSize` constant and introduced helpers to (a) conservatively size per-warp shared accumulators at compile time and (b) compute warp counts at runtime.
- Updated LACN2 reduction helpers to use `warpSize` for reduction logic and to bound inter-warp reductions using runtime-derived warp counts.
- Updated multiple kernels' shared "warp accumulator" allocations to use the new conservative sizing helper.
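
To make the two kinds of helper concrete, here is a minimal host-side sketch. The names `MinWarpSize`, `MaxWarpCount<>`, and `num_warps()` come from the summary above, but their bodies are assumptions: the value 32 for `MinWarpSize` is assumed from the narrowest AMD wavefront width, and `num_warps()` takes its inputs as parameters here so the arithmetic can be checked without a GPU (on the device it would presumably read the block size and `warpSize` directly).

```cpp
#include <cassert>

// Hypothetical definitions: the names appear in the PR, the bodies are assumed.
constexpr int MinWarpSize = 32; // assumed narrowest supported wavefront width

// Compile-time upper bound on warps per block, usable as a shared-array extent.
template <int BlockSize>
constexpr int MaxWarpCount = (BlockSize + MinWarpSize - 1) / MinWarpSize;

// Warp count for the width the device actually reports at runtime.
// Parameters stand in for the block size and the runtime warpSize.
constexpr int num_warps(int block_size, int warp_size)
{
    return (block_size + warp_size - 1) / warp_size;
}

// A 64-wide device needs fewer slots than the bound; a 32-wide device needs all of them.
static_assert(num_warps(256, 64) == 4, "wave64: 4 warps in a 256-thread block");
static_assert(num_warps(256, 32) == MaxWarpCount<256>, "wave32 hits the bound");
```

Sizing by the smallest width wastes a few array slots on 64-wide devices, but the extent stays a compile-time constant, which static shared-memory declarations require.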
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| projects/rocsolver/library/src/include/lib_device_helpers.hpp | Removes compile-time WarpSize; adds MinWarpSize, MaxWarpCount<>, and num_warps() helpers to support runtime warp sizing safely. |
| projects/rocsolver/library/src/include/lapack_device_functions.hpp | Updates lacn2_* kernels’ shared storage sizing and reduction indexing/loop bounds to use runtime warpSize and num_warps(). |
| projects/rocsolver/library/src/auxiliary/rocauxiliary_lange.hpp | Switches shared per-warp accumulator arrays to MaxWarpCount<> sizing. |
| projects/rocsolver/library/src/auxiliary/rocauxiliary_latrd.hpp | Switches shared per-warp accumulator array to MaxWarpCount<> sizing. |
| projects/rocsolver/library/src/specialized/rocauxiliary_larf_specialized_kernels.hpp | Switches shared per-warp accumulator array to MaxWarpCount<> sizing for the specialized right-kernel path. |
| projects/rocsolver/library/src/specialized/rocauxiliary_larfg_specialized_kernels.hpp | Switches shared per-warp accumulator array to MaxWarpCount<> sizing. |
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (77.82%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7744 +/- ##
===========================================
- Coverage 61.95% 61.94% -0.00%
===========================================
Files 2086 2086
Lines 357056 357059 +3
Branches 53784 53784
===========================================
Hits 221180 221180
- Misses 117067 117070 +3
Partials 18809 18809
*This pull request uses carry forward flags.
`projects/rocsolver` still had GPU reduction paths that sized shared storage or bounded reduction loops using compile-time `WarpSize`. This updates those kernels to follow the device runtime warp size (`warpSize`) while keeping shared-memory allocation safe for supported warp widths.

- Runtime warp sizing in reduction helpers
  - Replaces `WarpSize` checks/indexing in `lacn2` device helpers with runtime `warpSize`
- Shared-memory sizing without compile-time wave assumptions
  - Removes the `WarpSize` constant from `lib_device_helpers.hpp`
- rocsolver kernel sites updated
  - `rocauxiliary_lange.hpp`
  - `rocauxiliary_latrd.hpp`
  - `rocauxiliary_larf_specialized_kernels.hpp`
  - `rocauxiliary_larfg_specialized_kernels.hpp`
  - `lapack_device_functions.hpp`: `lacn2_*` reductions use runtime `warpSize` throughout

Example of the change pattern:
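
A hedged illustration of that pattern, modeled on the host rather than taken from the PR's diff: `block_sum` is a hypothetical function that mimics a two-stage block reduction, with the `warp_size` argument standing in for the device's runtime `warpSize`. The "before" form in the comment and the value of `MinWarpSize` are assumptions.

```cpp
#include <cassert>
#include <vector>

constexpr int MinWarpSize = 32; // assumed narrowest supported wavefront width

template <int BlockSize>
constexpr int MaxWarpCount = (BlockSize + MinWarpSize - 1) / MinWarpSize;

// Host-side model of a block-wide sum; "thread" tid contributes vals[tid].
template <int BlockSize>
double block_sum(const std::vector<double>& vals, int warp_size)
{
    assert(static_cast<int>(vals.size()) >= BlockSize);

    // Before (assumed form): an extent such as BlockSize / WarpSize, which is
    // too small if the device's warps are narrower than the constant.
    // After: extent sized for the narrowest warp, so it is never too small.
    double warp_acc[MaxWarpCount<BlockSize>] = {};

    // Runtime warp count; it bounds the second stage instead of the array extent.
    const int nwarps = (BlockSize + warp_size - 1) / warp_size;
    assert(nwarps <= MaxWarpCount<BlockSize>);

    // Stage 1: one partial sum per warp.
    for(int tid = 0; tid < BlockSize; ++tid)
        warp_acc[tid / warp_size] += vals[tid];

    // Stage 2: combine only the slots that the runtime warps actually wrote.
    double total = 0.0;
    for(int w = 0; w < nwarps; ++w)
        total += warp_acc[w];
    return total;
}
```

Calling `block_sum<256>(vals, 32)` and `block_sum<256>(vals, 64)` on the same input gives the same total, which is the property the runtime-sized reductions are meant to preserve across warp widths.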
This pull request was created from Copilot chat.