Add a guard for huge bounds in bounds propagation by hlinsen · Pull Request #1390 · NVIDIA/cuopt

hlinsen · 2026-06-04T03:18:23Z

Semi-continuous var with big-M reformulations can create large row activities where the true residual is
smaller than double precision at that scale.

    row activity      = -2.476e26
    var contribution  = -2.476e26
    true residual     = -2.0e10
    double ULP there  =  3.44e10

  After cancellation:
    computed residual = -3.44e10
    coeff             = -1.23
    candidate bound   =  2.8e10

In that case, floating point cancellation can produce an unsafe candidate bound and later false infeasibility.

We add an usual guard on huge bounds derived. Note that it does not protect against the float cancellation error.

A full cancellation-aware guard would require tracking row activity scale or using compensated activity arithmetic, which would add extra reductions/storage or broader propagation changes.

copy-pr-bot · 2026-06-04T03:18:26Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-04T03:21:34Z

📝 Walkthrough

Walkthrough

This PR adds tolerance-aware validation to bound updates in the mixed-integer programming presolve routine. A new predicate function gates candidate lower and upper bound assignments, ensuring updates exceed a scaled absolute tolerance threshold relative to the bound's magnitude before being written back to the working interval.

Changes

Bound Update Validation

Layer / File(s)	Summary
Tolerance check predicate `cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh`	New device function `accept_candidate_bound_update` validates whether a candidate bound change is significant enough relative to its magnitude and the provided absolute tolerance.
Conditional bound updates in presolve `cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh`	`update_bounds_per_cnst` now conditionally applies `new_lb` and `new_ub` only after passing the acceptance check using the problem's absolute tolerance, preventing spurious bound tightening.

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Add a guard for huge bounds in bounds propagation' directly aligns with the main change: adding the accept_candidate_bound_update helper to guard against huge bounds in bounds propagation.
Description check	✅ Passed	The description explains the problem (floating-point cancellation in semi-continuous variables with big-M reformulations), provides a concrete example, and describes the solution (adding a guard on huge bounds), all of which relate to the actual code changes.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh`:
- Around line 46-47: The hardcoded acceptance scale (candidate_bound_scale =
f_t{1e-14}) in bounds_update_helpers.cuh makes bound gating inflexible; change
the code to derive this scale from the solver's configured tolerances (or accept
a passed-in relative tolerance) instead of the magic constant: replace the
literal with a parameter or a value read from the solver settings (e.g.,
solver_tolerances.relative_bound_tol or an added function parameter rel_tol) and
use that derived value in the existing comparison that uses
candidate_bound_scale, while keeping the existing variables bound and abs_tol
intact.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f63805a4-c0d7-438e-9fec-93725704b9f2

📥 Commits

Reviewing files that changed from the base of the PR and between d6d6f9e and b148a3b.

📒 Files selected for processing (1)

cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh

coderabbitai · 2026-06-04T03:21:37Z

+  constexpr f_t candidate_bound_scale = f_t{1e-14};
+  return abs(bound) * candidate_bound_scale <= abs_tol;


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid a hardcoded acceptance scale in bound gating.

Line 46 hardcodes 1e-14, which makes the guard policy fixed across models and can reject legitimate large-scale bound tightenings. Please derive this scale from configured solver tolerances (or pass a configurable relative tolerance) instead of embedding a magic constant.

As per coding guidelines, "Avoid hardcoded tolerances that fail on degenerate problems; ensure tolerance consistency with solver settings."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh` around lines 46 - 47, The hardcoded acceptance scale (candidate_bound_scale = f_t{1e-14}) in bounds_update_helpers.cuh makes bound gating inflexible; change the code to derive this scale from the solver's configured tolerances (or accept a passed-in relative tolerance) instead of the magic constant: replace the literal with a parameter or a value read from the solver settings (e.g., solver_tolerances.relative_bound_tol or an added function parameter rel_tol) and use that derived value in the existing comparison that uses candidate_bound_scale, while keeping the existing variables bound and abs_tol intact.

I would also adjust this according to abs_tol user might give very low abs tolerance. I think this also should related to the big-M value we derive in the semi-continuous variables. Something like candidate_bound_scale = feasibility_tolerance/big-M, even maybe a bit lower.

Apart from that, I think there might be real user bounds that are larger than 1e8. We should probably reject or give a warning if those bounds are given in the current state.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Search for big-M related code rg -n --type=cpp -C3 'big.?[Mm]|BigM|BIG_M' --include="*.cuh" --include="*.cu" --include="*.cpp" --include="*.h" --include="*.hpp" echo "=== semi-continuous ===" rg -n --type=cpp -C3 'semi.?cont|SemiCont|SEMI_CONT' --include="*.cuh" --include="*.cu" --include="*.cpp" --include="*.h" --include="*.hpp" echo "=== candidate_bound_scale or 1e-14 ===" rg -n -C3 'candidate_bound_scale|1e-14'

Length of output: 9504

🏁 Script executed:

#!/bin/bash # Search for big-M related code rg -n -C3 'big.?[Mm]|BigM|BIG_M' --type cu --type cpp echo "=== semi-continuous ===" rg -n -C3 'semi.?cont|SemiCont|SEMI_CONT' --type cu --type cpp echo "=== absolute_tolerance field ===" rg -n -C2 'absolute_tolerance' --type cu --type cpp | head -80 echo "=== tolerances struct ===" rg -n -C5 'struct.*[Tt]olerance|tolerances\s*{' --type cu --type cpp | head -60

Length of output: 753

🏁 Script executed:

#!/bin/bash # Search for big-M related code rg -n -C3 'big.?[Mm]|BigM|BIG_M' echo "=== semi-continuous ===" rg -n -C3 'semi.?cont|SemiCont|SEMI_CONT' echo "=== absolute_tolerance field ===" rg -n -C2 'absolute_tolerance' | head -100 echo "=== tolerances struct ===" rg -n -C5 'struct.*olerance' | head -80 echo "=== problem tolerances ===" rg -n -C3 'tolerances\.' | head -80

Length of output: 151911

@akifcorduk, both points are well-taken. Here's a breakdown of the issues and what would be needed to fix them properly:

Concern 1 — scale should depend on abs_tol and big_m

The physics of this guard is: the floating-point cancellation error in a big-M row is approximately eps_machine × big_M ≈ 2e-16 × 1e10 = 2e-6. The guard should therefore reject only bounds whose magnitude puts them in the noise floor of the big-M arithmetic. Your formula is exactly right:

candidate_bound_scale = abs_tol / big_m threshold = abs_tol / candidate_bound_scale = big_m

This makes the guard problem-aware: it only rejects derived bounds whose magnitude exceeds the big-M itself (i.e., they are clearly cancellation artifacts), regardless of what abs_tol the user chose.

Concern 2 — valid user bounds > 1e8 silently dropped

With the current hardcoded 1e-14 and default abs_tol = 1e-6, the threshold is only 1e8, but the default semi_continuous_big_m = 1e10. So legitimate derived bounds in the range [1e8, 1e10] are being silently discarded, potentially weakening propagation on large-scale SC problems. A bound exceeding the big-M value is the real signal of a cancellation artifact.

What needs to change

Implementing the fix requires semi_continuous_big_m to be accessible inside the GPU kernel — it is currently a solver-setting field, not part of the problem_t device view. It would need to either be threaded through as a kernel parameter or added to the problem view. The GPU-side warning case is also tricky since __device__ code cannot emit diagnostics directly; it would need a dedicated counter (similar to atomicAdd(upd.bounds_changed, ...)) surfaced to the CPU, or a CPU-side pre-check in the SC reformulation step before the presolve loop.

These changes go somewhat beyond the scope of this PR (which is a targeted guard to prevent catastrophic bounds). Would you like me to open a follow-up issue tracking:

Replacing the hardcoded 1e-14 with abs_tol / semi_continuous_big_m (requires plumbing big_m to kernel)

Adding a warning/rejection path when user-provided bounds exceed the effective big-M threshold

🧠 Learnings used

Learnt from: nguidotti Repo: NVIDIA/cuopt PR: 1401 File: cpp/include/cuopt/linear_programming/constants.h:117-130 Timestamp: 2026-06-08T08:13:20.179Z Learning: In NVIDIA/cuopt `cpp/include/cuopt/linear_programming/mip/diving_hyper_params.hpp`, the field `farkas_obj_dynamism_tol` is an internal parameter. It intentionally does NOT have a corresponding string constant in `cpp/include/cuopt/linear_programming/constants.h` and is NOT user-configurable via config-file parameter loading. Do NOT flag its absence from `constants.h` as a missing public API key.

Learnt from: aliceb-nv Repo: NVIDIA/cuopt PR: 986 File: cpp/src/branch_and_bound/branch_and_bound.cpp:8-8 Timestamp: 2026-03-23T11:33:23.998Z Learning: In this repo (NVIDIA/cuopt), treat nvcc as the supported CUDA toolchain; clang-based compilation/support is not required and may fail/break. During code reviews, do NOT request code changes or add blocking comments for errors that appear only under clang (e.g., header-resolution failures such as 'utilities/determinism_log.hpp not found')—these can be toolchain-related rather than real source issues.

Learnt from: bdice Repo: NVIDIA/cuopt PR: 1035 File: cpp/tests/utilities/base_fixture.hpp:29-47 Timestamp: 2026-04-19T16:49:17.616Z Learning: In NVIDIA/cuopt (and the upstream rapidsai/rmm) after the RMM → CCCL migration, the rmm::mr resource adaptors (e.g., rmm::mr::pool_memory_resource and rmm::mr::binning_memory_resource) are now owning: they take/hold their upstream resources by value instead of non-owning references. Therefore, direct construction of adaptor chains from temporaries/local values (e.g., pool_memory_resource(make_async(), size)) is safe and should NOT be flagged as potential dangling/lifetime bugs. Also, rmm::mr::make_owning_wrapper is no longer needed/available for this owning design, so do not suggest it in this codepath.

Learnt from: chris-maes Repo: NVIDIA/cuopt PR: 1353 File: cpp/src/barrier/sparse_cholesky.cuh:438-455 Timestamp: 2026-06-01T18:35:10.606Z Learning: In this repo’s cuDSS/CUOPt integration code, `cudssMatrixCreateCsr` is version-dependent: for cuDSS `>= 0.8` (use the guard `CUDSS_VERSION_MAJOR > 0 || (CUDSS_VERSION_MAJOR == 0 && CUDSS_VERSION_MINOR >= 8)`), the signature changes so that after the `values` pointer it takes **three** `cudaDataType_t`-like parameters (row-offset type, column-index type, and value type). This call is correct and should NOT be flagged as an arity/type mismatch when it is inside the `>= 0.8` branch; use the `CUDSS_R_*` constants (e.g., `CUDSS_R_32I, CUDSS_R_32I, CUDSS_R_64F`). For cuDSS `< 0.8`, expect only **two** type parameters after `values` using the `CUDA_R_*` constants. Only flag calls that don’t match the expected arity/type constants for the active version branch.

hlinsen · 2026-06-04T03:25:02Z

/ok to test b4ba3b1

Add a guard for huge bounds in bounds propagation

b148a3b

hlinsen requested a review from a team as a code owner June 4, 2026 03:18

hlinsen added the bug Something isn't working label Jun 4, 2026

hlinsen requested a review from akifcorduk June 4, 2026 03:18

hlinsen added the non-breaking Introduces a non-breaking change label Jun 4, 2026

hlinsen requested a review from nguidotti June 4, 2026 03:18

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Merge branch 'main' of github.com:NVIDIA/cuopt into add-bound-prop-check

b4ba3b1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a guard for huge bounds in bounds propagation#1390

Add a guard for huge bounds in bounds propagation#1390
hlinsen wants to merge 2 commits into
NVIDIA:mainfrom
hlinsen:add-bound-prop-check

hlinsen commented Jun 4, 2026

Uh oh!

copy-pr-bot Bot commented Jun 4, 2026

Uh oh!

coderabbitai Bot commented Jun 4, 2026

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 4, 2026 •

edited

Loading

Uh oh!

akifcorduk Jun 8, 2026

Uh oh!

coderabbitai Bot Jun 8, 2026

Uh oh!

hlinsen commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		constexpr f_t candidate_bound_scale = f_t{1e-14};
		return abs(bound) * candidate_bound_scale <= abs_tol;

Conversation

hlinsen commented Jun 4, 2026

Uh oh!

copy-pr-bot Bot commented Jun 4, 2026

Uh oh!

coderabbitai Bot commented Jun 4, 2026

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

akifcorduk Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

hlinsen commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot Jun 4, 2026 •

edited

Loading