Refactor(barrier): Optimize affine direction handling with GPU snapshot #1397
solos wants to merge 3 commits into
Walkthrough
Optionally snapshots affine search directions into dedicated device buffers before the corrector overwrites them; keeps intermediate direction data device-resident, removing redundant host↔device transfers; updates FINITE_CHECK to validate the device-stored affine vectors; and updates the iteration call sites.
Changes
GPU Search Direction Affine Snapshot Optimization
Estimated code review effort: 3 (Moderate) | ~25 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 3488-3490: Remove the unconditional host-side sync (the
RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view_)) call) in the hot path;
instead rely on stream ordering for the producers of data.d_*_aff_ and the
consumers (compute_target_mu, compute_cc_rhs, gpu_compute_search_direction,
compute_final_direction) which already enqueue on the same stream_view_. If any
producer/consumer can run on a different stream, replace the sync with a CUDA
event-based stream synchronization (record on producer stream and wait on
consumer stream) rather than a host-side cudaStreamSynchronize; keep
RAFT_CUDA_TRY and stream_view_ usage intact for any event waits you add.
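The event-based alternative suggested above can be sketched as follows. This is a minimal standalone illustration, not the code in barrier.cu: the kernel `fill_kernel`, the function `snapshot_with_event`, the `CUDA_CHECK` macro (a stand-in for RAFT_CUDA_TRY), and the buffer names `d_dx` and `d_dx_aff` are all hypothetical. It assumes the affine direction is produced on one stream and snapshotted on another; if both already run on the same stream, stream ordering alone is enough and no event is needed.

```cuda
#include <cstddef>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error check standing in for RAFT_CUDA_TRY.
// Early returns skip cleanup, which is acceptable only in a sketch.
#define CUDA_CHECK(call)                                                   \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
      return false;                                                        \
    }                                                                      \
  } while (0)

// Hypothetical producer: fills a vector with a constant "direction" value.
__global__ void fill_kernel(double* v, int n, double value)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) { v[i] = value; }
}

// Produces an affine direction on one stream, snapshots it on another,
// then overwrites the original. No host-side sync sits between the steps.
bool snapshot_with_event(int n, std::vector<double>& host_snapshot)
{
  if (n <= 0) { return false; }
  const std::size_t bytes = static_cast<std::size_t>(n) * sizeof(double);
  const int threads = 256;
  const int blocks  = (n + threads - 1) / threads;

  cudaStream_t producer = nullptr, consumer = nullptr;
  cudaEvent_t affine_ready = nullptr;
  double* d_dx     = nullptr;  // working direction, later overwritten
  double* d_dx_aff = nullptr;  // device-resident snapshot of the affine step

  CUDA_CHECK(cudaStreamCreateWithFlags(&producer, cudaStreamNonBlocking));
  CUDA_CHECK(cudaStreamCreateWithFlags(&consumer, cudaStreamNonBlocking));
  CUDA_CHECK(cudaEventCreateWithFlags(&affine_ready, cudaEventDisableTiming));
  CUDA_CHECK(cudaMalloc(&d_dx, bytes));
  CUDA_CHECK(cudaMalloc(&d_dx_aff, bytes));

  // Producer stream: compute the (stand-in) affine direction.
  fill_kernel<<<blocks, threads, 0, producer>>>(d_dx, n, 1.5);
  CUDA_CHECK(cudaGetLastError());
  // Mark the point at which the affine direction is complete.
  CUDA_CHECK(cudaEventRecord(affine_ready, producer));

  // Consumer stream: wait on the device, not on the host.
  CUDA_CHECK(cudaStreamWaitEvent(consumer, affine_ready, 0));
  CUDA_CHECK(cudaMemcpyAsync(d_dx_aff, d_dx, bytes,
                             cudaMemcpyDeviceToDevice, consumer));
  // Stand-in corrector overwrite, ordered after the snapshot on this stream.
  fill_kernel<<<blocks, threads, 0, consumer>>>(d_dx, n, -2.0);
  CUDA_CHECK(cudaGetLastError());

  // Host copy and sync exist only so the caller can inspect the result.
  host_snapshot.resize(static_cast<std::size_t>(n));
  CUDA_CHECK(cudaMemcpyAsync(host_snapshot.data(), d_dx_aff, bytes,
                             cudaMemcpyDeviceToHost, consumer));
  CUDA_CHECK(cudaStreamSynchronize(consumer));

  CUDA_CHECK(cudaFree(d_dx));
  CUDA_CHECK(cudaFree(d_dx_aff));
  CUDA_CHECK(cudaEventDestroy(affine_ready));
  CUDA_CHECK(cudaStreamDestroy(producer));
  CUDA_CHECK(cudaStreamDestroy(consumer));
  return true;
}
```

Calling `snapshot_with_event(1024, snap)` from a host `main` should leave the stand-in affine values in `snap` even though `d_dx` was overwritten afterwards. The only host-side synchronization is the final one used for inspection; the dependency between the two streams is expressed entirely through `cudaEventRecord` and `cudaStreamWaitEvent`.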
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ba0b05f6-f2fb-4b54-88bc-69a1f0acf522
📒 Files selected for processing (2)
cpp/src/barrier/barrier.cu
cpp/src/barrier/barrier.hpp
cacdd39 to 3043b45
- Eliminated redundant CPU-GPU data transfers in Mehrotra step.
- Introduced GPU-side snapshot to retain affine directions (dx, dy, etc.).
- Removed costly synchronous copies and stream synchronizations.
- Simplified code by deprecating manual host-side vector management.
Description
Refactor(barrier): Optimize affine direction handling with GPU snapshot
Issue
Checklist