Refactor(barrier): Optimize affine direction handling with GPU snapshot by solos · Pull Request #1397 · NVIDIA/cuopt

solos · 2026-06-05T10:39:18Z

Description

Refactor(barrier): Optimize affine direction handling with GPU snapshot

Eliminated redundant CPU-GPU data transfers in Mehrotra step.
Introduced GPU-side snapshot to retain affine directions (dx, dy, etc.).
Removed costly synchronous copies and stream synchronizations.
Simplified code by deprecating manual host-side vector management.

Issue

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

copy-pr-bot · 2026-06-05T10:39:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-05T10:44:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 025da6ff-b0b7-4fd9-9fd7-6088e16bf072

📥 Commits

Reviewing files that changed from the base of the PR and between 6b28fde and d99a356.

📒 Files selected for processing (2)

cpp/src/barrier/barrier.cu
cpp/src/barrier/barrier.hpp

🚧 Files skipped from review as they are similar to previous changes (2)

cpp/src/barrier/barrier.hpp
cpp/src/barrier/barrier.cu

📝 Walkthrough

Walkthrough

Optionally snapshots affine search directions into dedicated device buffers before corrector overwrites; keeps intermediate direction data device-resident (removing redundant host↔device transfers); updates FINITE_CHECK to validate device-stored affine vectors and updates iteration call sites.

Changes

GPU Search Direction Affine Snapshot Optimization

Layer / File(s)	Summary
Function signature extension and snapshot helper `cpp/src/barrier/barrier.hpp`, `cpp/src/barrier/barrier.cu`	Extends `gpu_compute_search_direction` with `snapshot_affine_direction` boolean parameter (default false) and adds `copy_affine_direction_to_device_buffers()` to copy finalized affine directions (dw/dx/dy/dv/dz) into dedicated device buffers.
Device-resident direction buffer flow `cpp/src/barrier/barrier.cu`	Removes host←device and device→host transfers of dx/dy/dz/dv across augmented and non-augmented solve paths; keeps intermediate directions GPU-resident for downstream computations.
Affine snapshot invocation `cpp/src/barrier/barrier.cu`	Conditionally snapshots current affine directions into `d_*_aff_` when `snapshot_affine_direction` is true by calling the new helper before the corrector overwrites working buffers.
compute_target_mu device use `cpp/src/barrier/barrier.cu`	Removes the host→device copy that populated `d_*_aff_`; `compute_target_mu` now uses device-resident affine buffers.
FINITE_CHECK validation update `cpp/src/barrier/barrier.cu`	Changes finiteness validation to copy `d_dy_aff_` from device to host and check finiteness there instead of relying on host-side affine buffers.
Solve loop and sync removals `cpp/src/barrier/barrier.cu`	Solve loop passes `snapshot_affine_direction=true` for the affine predictor call and removes explicit stream synchronizations after affine and corrector `gpu_compute_search_direction` calls.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels: non-breaking, improvement

Suggested reviewers:

mlubin

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely describes the main refactoring: optimizing affine direction handling by introducing GPU snapshots, which aligns with the primary objective of eliminating CPU-GPU transfers.
Description check	✅ Passed	The description is directly related to the changeset, detailing the elimination of CPU-GPU data transfers, GPU-side snapshot introduction, and removal of synchronous operations, which matches the code changes documented in the raw summary.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 3488-3490: Remove the unconditional host-side sync (the
RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view_)) call) in the hot path;
instead rely on stream ordering for the producers of data.d_*_aff_ and the
consumers (compute_target_mu, compute_cc_rhs, gpu_compute_search_direction,
compute_final_direction) which already enqueue on the same stream_view_. If any
producer/consumer can run on a different stream, replace the sync with a CUDA
event-based stream synchronization (record on producer stream and wait on
consumer stream) rather than a host-side cudaStreamSynchronize; keep
RAFT_CUDA_TRY and stream_view_ usage intact for any event waits you add.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ba0b05f6-f2fb-4b54-88bc-69a1f0acf522

📥 Commits

Reviewing files that changed from the base of the PR and between cf90894 and d05dfe5.

📒 Files selected for processing (2)

cpp/src/barrier/barrier.cu
cpp/src/barrier/barrier.hpp

- Eliminated redundant CPU-GPU data transfers in Mehrotra step. - Introduced GPU-side snapshot to retain affine directions (dx, dy, etc.). - Removed costly synchronous copies and stream synchronizations. - Simplified code by deprecating manual host-side vector management.

Add docs Fix bug

solos requested a review from a team as a code owner June 5, 2026 10:39

solos requested review from chris-maes and kaatish June 5, 2026 10:39

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread cpp/src/barrier/barrier.cu Outdated

solos force-pushed the refactor-barrier branch 3 times, most recently from cacdd39 to 3043b45 Compare June 5, 2026 11:10

solos added 2 commits June 5, 2026 19:11

Add docs

6857d1a

Add docs Fix bug

solos force-pushed the refactor-barrier branch from 29461d5 to 6857d1a Compare June 5, 2026 11:12

Merge branch 'main' into refactor-barrier

d99a356

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor(barrier): Optimize affine direction handling with GPU snapshot#1397

Refactor(barrier): Optimize affine direction handling with GPU snapshot#1397
solos wants to merge 3 commits into
NVIDIA:mainfrom
solos:refactor-barrier

solos commented Jun 5, 2026

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

solos commented Jun 5, 2026

Description

Issue

Checklist

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading