Skip to content

Add a guard for huge bounds in bounds propagation#1390

Open
hlinsen wants to merge 2 commits into
NVIDIA:mainfrom
hlinsen:add-bound-prop-check
Open

Add a guard for huge bounds in bounds propagation#1390
hlinsen wants to merge 2 commits into
NVIDIA:mainfrom
hlinsen:add-bound-prop-check

Conversation

@hlinsen
Copy link
Copy Markdown
Contributor

@hlinsen hlinsen commented Jun 4, 2026

Semi-continuous var with big-M reformulations can create large row activities where the true residual is
smaller than double precision at that scale.

    row activity      = -2.476e26
    var contribution  = -2.476e26
    true residual     = -2.0e10
    double ULP there  =  3.44e10

  After cancellation:
    computed residual = -3.44e10
    coeff             = -1.23
    candidate bound   =  2.8e10

In that case, floating point cancellation can produce an unsafe candidate bound and later false infeasibility.

We add an usual guard on huge bounds derived. Note that it does not protect against the float cancellation error.

A full cancellation-aware guard would require tracking row activity scale or using compensated activity arithmetic, which would add extra reductions/storage or broader propagation changes.

@hlinsen hlinsen requested a review from a team as a code owner June 4, 2026 03:18
@hlinsen hlinsen added the bug Something isn't working label Jun 4, 2026
@hlinsen hlinsen requested a review from akifcorduk June 4, 2026 03:18
@hlinsen hlinsen added the non-breaking Introduces a non-breaking change label Jun 4, 2026
@hlinsen hlinsen requested a review from nguidotti June 4, 2026 03:18
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds tolerance-aware validation to bound updates in the mixed-integer programming presolve routine. A new predicate function gates candidate lower and upper bound assignments, ensuring updates exceed a scaled absolute tolerance threshold relative to the bound's magnitude before being written back to the working interval.

Changes

Bound Update Validation

Layer / File(s) Summary
Tolerance check predicate
cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh
New device function accept_candidate_bound_update validates whether a candidate bound change is significant enough relative to its magnitude and the provided absolute tolerance.
Conditional bound updates in presolve
cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh
update_bounds_per_cnst now conditionally applies new_lb and new_ub only after passing the acceptance check using the problem's absolute tolerance, preventing spurious bound tightening.

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add a guard for huge bounds in bounds propagation' directly aligns with the main change: adding the accept_candidate_bound_update helper to guard against huge bounds in bounds propagation.
Description check ✅ Passed The description explains the problem (floating-point cancellation in semi-continuous variables with big-M reformulations), provides a concrete example, and describes the solution (adding a guard on huge bounds), all of which relate to the actual code changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh`:
- Around line 46-47: The hardcoded acceptance scale (candidate_bound_scale =
f_t{1e-14}) in bounds_update_helpers.cuh makes bound gating inflexible; change
the code to derive this scale from the solver's configured tolerances (or accept
a passed-in relative tolerance) instead of the magic constant: replace the
literal with a parameter or a value read from the solver settings (e.g.,
solver_tolerances.relative_bound_tol or an added function parameter rel_tol) and
use that derived value in the existing comparison that uses
candidate_bound_scale, while keeping the existing variables bound and abs_tol
intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f63805a4-c0d7-438e-9fec-93725704b9f2

📥 Commits

Reviewing files that changed from the base of the PR and between d6d6f9e and b148a3b.

📒 Files selected for processing (1)
  • cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh

Comment on lines +46 to +47
constexpr f_t candidate_bound_scale = f_t{1e-14};
return abs(bound) * candidate_bound_scale <= abs_tol;
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid a hardcoded acceptance scale in bound gating.

Line 46 hardcodes 1e-14, which makes the guard policy fixed across models and can reject legitimate large-scale bound tightenings. Please derive this scale from configured solver tolerances (or pass a configurable relative tolerance) instead of embedding a magic constant.

As per coding guidelines, "Avoid hardcoded tolerances that fail on degenerate problems; ensure tolerance consistency with solver settings."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh` around lines 46 -
47, The hardcoded acceptance scale (candidate_bound_scale = f_t{1e-14}) in
bounds_update_helpers.cuh makes bound gating inflexible; change the code to
derive this scale from the solver's configured tolerances (or accept a passed-in
relative tolerance) instead of the magic constant: replace the literal with a
parameter or a value read from the solver settings (e.g.,
solver_tolerances.relative_bound_tol or an added function parameter rel_tol) and
use that derived value in the existing comparison that uses
candidate_bound_scale, while keeping the existing variables bound and abs_tol
intact.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also adjust this according to abs_tol user might give very low abs tolerance. I think this also should related to the big-M value we derive in the semi-continuous variables. Something like candidate_bound_scale = feasibility_tolerance/big-M, even maybe a bit lower.

Apart from that, I think there might be real user bounds that are larger than 1e8. We should probably reject or give a warning if those bounds are given in the current state.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for big-M related code
rg -n --type=cpp -C3 'big.?[Mm]|BigM|BIG_M' --include="*.cuh" --include="*.cu" --include="*.cpp" --include="*.h" --include="*.hpp"

echo "=== semi-continuous ==="
rg -n --type=cpp -C3 'semi.?cont|SemiCont|SEMI_CONT' --include="*.cuh" --include="*.cu" --include="*.cpp" --include="*.h" --include="*.hpp"

echo "=== candidate_bound_scale or 1e-14 ==="
rg -n -C3 'candidate_bound_scale|1e-14'

Length of output: 9504


🏁 Script executed:

#!/bin/bash
# Search for big-M related code
rg -n -C3 'big.?[Mm]|BigM|BIG_M' --type cu --type cpp

echo "=== semi-continuous ==="
rg -n -C3 'semi.?cont|SemiCont|SEMI_CONT' --type cu --type cpp

echo "=== absolute_tolerance field ==="
rg -n -C2 'absolute_tolerance' --type cu --type cpp | head -80

echo "=== tolerances struct ==="
rg -n -C5 'struct.*[Tt]olerance|tolerances\s*{' --type cu --type cpp | head -60

Length of output: 753


🏁 Script executed:

#!/bin/bash
# Search for big-M related code
rg -n -C3 'big.?[Mm]|BigM|BIG_M'

echo "=== semi-continuous ==="
rg -n -C3 'semi.?cont|SemiCont|SEMI_CONT'

echo "=== absolute_tolerance field ==="
rg -n -C2 'absolute_tolerance' | head -100

echo "=== tolerances struct ==="
rg -n -C5 'struct.*olerance' | head -80

echo "=== problem tolerances ==="
rg -n -C3 'tolerances\.' | head -80

Length of output: 151911



@akifcorduk, both points are well-taken. Here's a breakdown of the issues and what would be needed to fix them properly:

Concern 1 — scale should depend on abs_tol and big_m

The physics of this guard is: the floating-point cancellation error in a big-M row is approximately eps_machine × big_M ≈ 2e-16 × 1e10 = 2e-6. The guard should therefore reject only bounds whose magnitude puts them in the noise floor of the big-M arithmetic. Your formula is exactly right:

candidate_bound_scale = abs_tol / big_m
threshold = abs_tol / candidate_bound_scale = big_m

This makes the guard problem-aware: it only rejects derived bounds whose magnitude exceeds the big-M itself (i.e., they are clearly cancellation artifacts), regardless of what abs_tol the user chose.

Concern 2 — valid user bounds > 1e8 silently dropped

With the current hardcoded 1e-14 and default abs_tol = 1e-6, the threshold is only 1e8, but the default semi_continuous_big_m = 1e10. So legitimate derived bounds in the range [1e8, 1e10] are being silently discarded, potentially weakening propagation on large-scale SC problems. A bound exceeding the big-M value is the real signal of a cancellation artifact.

What needs to change

Implementing the fix requires semi_continuous_big_m to be accessible inside the GPU kernel — it is currently a solver-setting field, not part of the problem_t device view. It would need to either be threaded through as a kernel parameter or added to the problem view. The GPU-side warning case is also tricky since __device__ code cannot emit diagnostics directly; it would need a dedicated counter (similar to atomicAdd(upd.bounds_changed, ...)) surfaced to the CPU, or a CPU-side pre-check in the SC reformulation step before the presolve loop.

These changes go somewhat beyond the scope of this PR (which is a targeted guard to prevent catastrophic bounds). Would you like me to open a follow-up issue tracking:

  1. Replacing the hardcoded 1e-14 with abs_tol / semi_continuous_big_m (requires plumbing big_m to kernel)
  2. Adding a warning/rejection path when user-provided bounds exceed the effective big-M threshold

🧠 Learnings used
Learnt from: nguidotti
Repo: NVIDIA/cuopt PR: 1401
File: cpp/include/cuopt/linear_programming/constants.h:117-130
Timestamp: 2026-06-08T08:13:20.179Z
Learning: In NVIDIA/cuopt `cpp/include/cuopt/linear_programming/mip/diving_hyper_params.hpp`, the field `farkas_obj_dynamism_tol` is an internal parameter. It intentionally does NOT have a corresponding string constant in `cpp/include/cuopt/linear_programming/constants.h` and is NOT user-configurable via config-file parameter loading. Do NOT flag its absence from `constants.h` as a missing public API key.

Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 986
File: cpp/src/branch_and_bound/branch_and_bound.cpp:8-8
Timestamp: 2026-03-23T11:33:23.998Z
Learning: In this repo (NVIDIA/cuopt), treat nvcc as the supported CUDA toolchain; clang-based compilation/support is not required and may fail/break. During code reviews, do NOT request code changes or add blocking comments for errors that appear only under clang (e.g., header-resolution failures such as 'utilities/determinism_log.hpp not found')—these can be toolchain-related rather than real source issues.

Learnt from: bdice
Repo: NVIDIA/cuopt PR: 1035
File: cpp/tests/utilities/base_fixture.hpp:29-47
Timestamp: 2026-04-19T16:49:17.616Z
Learning: In NVIDIA/cuopt (and the upstream rapidsai/rmm) after the RMM → CCCL migration, the rmm::mr resource adaptors (e.g., rmm::mr::pool_memory_resource and rmm::mr::binning_memory_resource) are now owning: they take/hold their upstream resources by value instead of non-owning references. Therefore, direct construction of adaptor chains from temporaries/local values (e.g., pool_memory_resource(make_async(), size)) is safe and should NOT be flagged as potential dangling/lifetime bugs. Also, rmm::mr::make_owning_wrapper is no longer needed/available for this owning design, so do not suggest it in this codepath.

Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 1353
File: cpp/src/barrier/sparse_cholesky.cuh:438-455
Timestamp: 2026-06-01T18:35:10.606Z
Learning: In this repo’s cuDSS/CUOPt integration code, `cudssMatrixCreateCsr` is version-dependent: for cuDSS `>= 0.8` (use the guard `CUDSS_VERSION_MAJOR > 0 || (CUDSS_VERSION_MAJOR == 0 && CUDSS_VERSION_MINOR >= 8)`), the signature changes so that after the `values` pointer it takes **three** `cudaDataType_t`-like parameters (row-offset type, column-index type, and value type). This call is correct and should NOT be flagged as an arity/type mismatch when it is inside the `>= 0.8` branch; use the `CUDSS_R_*` constants (e.g., `CUDSS_R_32I, CUDSS_R_32I, CUDSS_R_64F`). For cuDSS `< 0.8`, expect only **two** type parameters after `values` using the `CUDA_R_*` constants. Only flag calls that don’t match the expected arity/type constants for the active version branch.

@hlinsen
Copy link
Copy Markdown
Contributor Author

hlinsen commented Jun 4, 2026

/ok to test b4ba3b1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants