rocsolver: replace remaining compile-time WarpSize assumptions in GPU reductions #7744

Draft: Copilot wants to merge 2 commits into develop from copilot/userseddazevedofix-warpsize

Copilot AI (Contributor) commented May 25, 2026

projects/rocsolver still had GPU reduction paths that sized shared storage or bounded reduction loops using compile-time WarpSize. This updates those kernels to follow the device runtime warp size (warpSize) while keeping shared-memory allocation safe for supported warp widths.

  • Runtime warp sizing in reduction helpers

    • replaced remaining compile-time WarpSize checks/indexing in lacn2 device helpers with runtime warpSize
    • switched reduction loop bounds from compile-time wave counts to runtime-derived warp counts
  • Shared-memory sizing without compile-time wave assumptions

    • removed the WarpSize constant from lib_device_helpers.hpp
    • added small helpers for:
      • conservative compile-time sizing of warp-accumulator arrays using the minimum supported warp width
      • runtime computation of the number of participating warps
  • rocsolver kernel sites updated

    • updated shared warp-accumulator storage in:
      • rocauxiliary_lange.hpp
      • rocauxiliary_latrd.hpp
      • rocauxiliary_larf_specialized_kernels.hpp
      • rocauxiliary_larfg_specialized_kernels.hpp
    • updated lapack_device_functions.hpp lacn2_* reductions to use runtime warpSize throughout

Example of the change pattern:

__shared__ S sval[MaxWarpCount<LACN2_BLOCKSIZE>];

if(tid % warpSize == 0)
    sval[tid / warpSize] = sum;

for(I k = 1; k < std::min(num_warps<I>(LACN2_BLOCKSIZE, warpSize), num_warps(n, warpSize)); k++)
    sum += sval[k];
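
The pattern above depends on helpers whose definitions are not shown on this page. The following is a host-side sketch of what such helpers might look like, not the PR's actual code: the names `MinWarpSize`, `MaxWarpCount<>`, and `num_warps()` come from the PR's file summary, but the value 32 and the exact signatures are assumptions (AMD GPUs run either 32-wide or 64-wide wavefronts, so 32 is taken here as the narrowest supported width).

```cpp
#include <cassert>

// Assumption: 32 is the narrowest warp width the kernels must support.
constexpr int MinWarpSize = 32;

// Compile-time upper bound on warps per block. Dividing by the narrowest
// width yields the largest possible warp count, so an array of this size
// is large enough for every supported runtime warp width.
template <int BlockSize>
constexpr int MaxWarpCount = (BlockSize + MinWarpSize - 1) / MinWarpSize;

// Runtime warp count: ceiling division of a thread count by the warp width.
// Assumed signature; both arguments must share the type I.
template <typename I>
constexpr I num_warps(I nthreads, I warp_size)
{
    assert(warp_size > 0);
    return (nthreads + warp_size - 1) / warp_size;
}

// The runtime count never exceeds the compile-time bound for widths 32 and 64.
static_assert(MaxWarpCount<1024> == 32, "bound uses the narrowest width");
static_assert(num_warps(1024, 64) == 16, "wave64 uses half the slots");
static_assert(num_warps(1024, 32) <= MaxWarpCount<1024>, "wave32 fits exactly");
```

With a wave64 device the array is over-allocated by a factor of two, which is the cost of keeping the shared-memory size a compile-time constant.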
Original prompt

Create a pull request in ROCm/rocm-libraries to update the projects/rocsolver component so GPU kernel code avoids relying on compile-time WarpSize and instead uses the runtime-defined warp size (warpSize) where appropriate.

Context:

  • Target repository: ROCm/rocm-libraries
  • Relevant subdirectory: projects/rocsolver
  • Base branch: develop
  • Requested remote branch name: users/EdDAzevedo/fix_WarpSize

Task details:

  • Identify uses in projects/rocsolver GPU kernel code where compile-time WarpSize is assumed or hard-coded in a way that should instead use the runtime-defined device warp size.
  • Update the implementation to use the runtime warp size (warpSize) where appropriate, while preserving correctness and performance as much as possible.
  • Limit changes to the rocsolver project unless a small shared fix is clearly required.
  • Ensure the code builds cleanly and update any related comments or helper logic if needed.
  • Create a PR with a concise summary of the change and its rationale.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: I would like to create a new PR to rocsolver that avoids the compile-time WarpSize and instead uses the execution-time-defined warpSize in GPU kernel code

Assistant: I can help with that, but I need the exact repository name in owner/repo format for the PR target.

You mentioned rocsolver as the component, but the only explicit repo provided so far is ROCm/rocm-libraries. If you want the PR opened in that repo, confirm:

ROCm/rocm-libraries

If instead rocsolver is a different repository, please provide its exact owner/repo name.

User: The repository is @ROCm/rocm-libraries/files/projects/rocsolver and the remote branch should be called users/EdDAzevedo/fix_WarpSize

Assistant: I’m waiting for your response to the confirmation dialog so I can continue creating the PR on ROCm/rocm-libraries from develop using branch users/EdDAzevedo/fix_WarpSize.

This pull request was created from Copilot chat.

assistant-librarian (bot) added the "external contribution" label May 25, 2026
Agent-Logs-Url: https://github.com/ROCm/rocm-libraries/sessions/1969379b-e4bc-4b53-8aa5-5bc733b566ab

Co-authored-by: EdDAzevedo <98369794+EdDAzevedo@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update rocsolver to use runtime-defined warp size" to "rocsolver: replace remaining compile-time WarpSize assumptions in GPU reductions" May 25, 2026
Copilot AI requested a review from EdDAzevedo May 25, 2026 20:08
EdDAzevedo requested a review from Copilot May 25, 2026 20:20

Copilot AI left a comment

Pull request overview

This PR updates several rocSOLVER GPU reduction paths to avoid relying on a compile-time WarpSize constant and instead use the device runtime warpSize, while keeping shared-memory allocations safe via conservative compile-time sizing.

Changes:

  • Removed the compile-time WarpSize constant and introduced helpers to (a) conservatively size per-warp shared accumulators at compile time and (b) compute warp counts at runtime.
  • Updated LACN2 reduction helpers to use warpSize for reduction logic and to bound inter-warp reductions using runtime-derived warp counts.
  • Updated multiple kernels’ shared “warp accumulator” allocations to use the new conservative sizing helper.
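
To illustrate why sizing the accumulator by the narrowest warp width stays safe while the reduction itself follows the runtime width, here is a host-side simulation of a two-stage block reduction. It is a sketch under stated assumptions, not rocSOLVER code: `block_sum` and `kMinWarpSize` are hypothetical names, the value 32 is assumed, and a plain loop stands in for the intra-warp reduction a real kernel would perform.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kMinWarpSize = 32; // assumption: narrowest supported warp width

// Simulates one thread block summing x with a warp width chosen at run time.
template <int BlockSize>
long block_sum(const std::vector<long>& x, int warp_size)
{
    // Conservative compile-time size, as a shared array would need.
    constexpr int max_warps = (BlockSize + kMinWarpSize - 1) / kMinWarpSize;
    long sval[max_warps] = {};

    const int n = static_cast<int>(x.size());
    auto ceil_div = [](int a, int b) { return (a + b - 1) / b; };

    // Stage 1: every thread sums a strided slice, and each warp's total
    // lands in its own slot (stand-in for an intra-warp reduction).
    for(int tid = 0; tid < BlockSize; ++tid)
    {
        long partial = 0;
        for(int i = tid; i < n; i += BlockSize)
            partial += x[i];
        assert(tid / warp_size < max_warps); // slot index stays in bounds
        sval[tid / warp_size] += partial;
    }

    // Stage 2: thread 0 starts from its own warp's total and adds the
    // totals of the other warps that actually received data.
    const int active = std::min(ceil_div(BlockSize, warp_size), ceil_div(n, warp_size));
    long sum = sval[0];
    for(int k = 1; k < active; ++k)
        sum += sval[k];
    return sum;
}
```

Calling `block_sum<256>(x, 32)` and `block_sum<256>(x, 64)` on the same input should return the same total; only the number of slots touched in `sval` differs.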

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| projects/rocsolver/library/src/include/lib_device_helpers.hpp | Removes compile-time WarpSize; adds MinWarpSize, MaxWarpCount<>, and num_warps() helpers to support runtime warp sizing safely. |
| projects/rocsolver/library/src/include/lapack_device_functions.hpp | Updates lacn2_* kernels’ shared storage sizing and reduction indexing/loop bounds to use runtime warpSize and num_warps(). |
| projects/rocsolver/library/src/auxiliary/rocauxiliary_lange.hpp | Switches shared per-warp accumulator arrays to MaxWarpCount<> sizing. |
| projects/rocsolver/library/src/auxiliary/rocauxiliary_latrd.hpp | Switches shared per-warp accumulator array to MaxWarpCount<> sizing. |
| projects/rocsolver/library/src/specialized/rocauxiliary_larf_specialized_kernels.hpp | Switches shared per-warp accumulator array to MaxWarpCount<> sizing for the specialized right-kernel path. |
| projects/rocsolver/library/src/specialized/rocauxiliary_larfg_specialized_kernels.hpp | Switches shared per-warp accumulator array to MaxWarpCount<> sizing. |

@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...csolver/library/src/include/lib_device_helpers.hpp | 0.00% | 3 Missing ⚠️ |

❌ Your project status has failed because the head coverage (77.82%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7744      +/-   ##
===========================================
- Coverage    61.95%   61.94%   -0.00%     
===========================================
  Files         2086     2086              
  Lines       357056   357059       +3     
  Branches     53784    53784              
===========================================
  Hits        221180   221180              
- Misses      117067   117070       +3     
  Partials     18809    18809              
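
The percentages in the diff above follow from the raw counts: hits stay at 221180 while the line count grows from 357056 to 357059. A small sketch of that arithmetic, where the helper name is hypothetical and round-to-nearest at two decimals is an assumption about how the report formats its figures:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: coverage in hundredths of a percent, rounded to nearest.
constexpr std::int64_t coverage_centi_percent(std::int64_t hits, std::int64_t lines)
{
    assert(lines > 0);
    // Compute thousandths of a percent, then round to hundredths.
    return (hits * 100000 / lines + 5) / 10;
}

// develop: 221180 / 357056 rounds to 61.95%
static_assert(coverage_centi_percent(221180, 357056) == 6195, "base coverage");
// this PR: 221180 / 357059 rounds to 61.94%
static_assert(coverage_centi_percent(221180, 357059) == 6194, "head coverage");
```

The true difference is well under 0.01 percentage points, which is consistent with the report showing "-0.00%" even though the two rounded figures differ in the last digit.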
| Flag | Coverage Δ | *Carryforward flag |
| --- | --- | --- |
| TensileLite | 26.45% <ø> (ø) | Carriedforward from 6824cec |
| hipBLAS | 90.65% <ø> (ø) | Carriedforward from 6824cec |
| hipBLASLt | 41.27% <ø> (ø) | Carriedforward from 6824cec |
| hipCUB | 82.68% <ø> (ø) | Carriedforward from 6824cec |
| hipDNN | 85.87% <ø> (ø) | Carriedforward from 6824cec |
| hipFFT | 50.00% <ø> (ø) | Carriedforward from 6824cec |
| hipRAND | 76.12% <ø> (ø) | Carriedforward from 6824cec |
| hipSOLVER | 69.24% <ø> (ø) | Carriedforward from 6824cec |
| hipSPARSE | 85.09% <ø> (ø) | Carriedforward from 6824cec |
| rocBLAS | 48.11% <ø> (ø) | Carriedforward from 6824cec |
| rocFFT | 52.07% <ø> (ø) | Carriedforward from 6824cec |
| rocRAND | 57.04% <ø> (ø) | Carriedforward from 6824cec |
| rocSOLVER | 77.82% <0.00%> (-0.01%) ⬇️ | |
| rocSPARSE | 72.68% <ø> (ø) | Carriedforward from 6824cec |

*This pull request uses carry forward flags. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...olver/library/src/auxiliary/rocauxiliary_lange.hpp | 79.46% <ø> (ø) |
| ...olver/library/src/auxiliary/rocauxiliary_latrd.hpp | 93.71% <ø> (ø) |
| ...er/library/src/include/lapack_device_functions.hpp | 62.07% <ø> (ø) |
| ...cialized/rocauxiliary_larf_specialized_kernels.hpp | 47.37% <ø> (ø) |
| ...ialized/rocauxiliary_larfg_specialized_kernels.hpp | 55.88% <ø> (ø) |
| ...csolver/library/src/include/lib_device_helpers.hpp | 21.28% <0.00%> (-1.45%) ⬇️ |

Labels: "external contribution", "project: rocsolver"