Add a guard for huge bounds in bounds propagation#1390
Conversation
📝 WalkthroughWalkthroughThis PR adds tolerance-aware validation to bound updates in the mixed-integer programming presolve routine. A new predicate function gates candidate lower and upper bound assignments, ensuring updates exceed a scaled absolute tolerance threshold relative to the bound's magnitude before being written back to the working interval. ChangesBound Update Validation
🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh`:
- Around line 46-47: The hardcoded acceptance scale (candidate_bound_scale =
f_t{1e-14}) in bounds_update_helpers.cuh makes bound gating inflexible; change
the code to derive this scale from the solver's configured tolerances (or accept
a passed-in relative tolerance) instead of the magic constant: replace the
literal with a parameter or a value read from the solver settings (e.g.,
solver_tolerances.relative_bound_tol or an added function parameter rel_tol) and
use that derived value in the existing comparison that uses
candidate_bound_scale, while keeping the existing variables bound and abs_tol
intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f63805a4-c0d7-438e-9fec-93725704b9f2
📒 Files selected for processing (1)
cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh
| constexpr f_t candidate_bound_scale = f_t{1e-14}; | ||
| return abs(bound) * candidate_bound_scale <= abs_tol; |
There was a problem hiding this comment.
Avoid a hardcoded acceptance scale in bound gating.
Line 46 hardcodes 1e-14, which makes the guard policy fixed across models and can reject legitimate large-scale bound tightenings. Please derive this scale from configured solver tolerances (or pass a configurable relative tolerance) instead of embedding a magic constant.
As per coding guidelines, "Avoid hardcoded tolerances that fail on degenerate problems; ensure tolerance consistency with solver settings."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/mip_heuristics/presolve/bounds_update_helpers.cuh` around lines 46 -
47, The hardcoded acceptance scale (candidate_bound_scale = f_t{1e-14}) in
bounds_update_helpers.cuh makes bound gating inflexible; change the code to
derive this scale from the solver's configured tolerances (or accept a passed-in
relative tolerance) instead of the magic constant: replace the literal with a
parameter or a value read from the solver settings (e.g.,
solver_tolerances.relative_bound_tol or an added function parameter rel_tol) and
use that derived value in the existing comparison that uses
candidate_bound_scale, while keeping the existing variables bound and abs_tol
intact.
There was a problem hiding this comment.
I would also adjust this according to abs_tol user might give very low abs tolerance. I think this also should related to the big-M value we derive in the semi-continuous variables. Something like candidate_bound_scale = feasibility_tolerance/big-M, even maybe a bit lower.
Apart from that, I think there might be real user bounds that are larger than 1e8. We should probably reject or give a warning if those bounds are given in the current state.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for big-M related code
rg -n --type=cpp -C3 'big.?[Mm]|BigM|BIG_M' --include="*.cuh" --include="*.cu" --include="*.cpp" --include="*.h" --include="*.hpp"
echo "=== semi-continuous ==="
rg -n --type=cpp -C3 'semi.?cont|SemiCont|SEMI_CONT' --include="*.cuh" --include="*.cu" --include="*.cpp" --include="*.h" --include="*.hpp"
echo "=== candidate_bound_scale or 1e-14 ==="
rg -n -C3 'candidate_bound_scale|1e-14'Length of output: 9504
🏁 Script executed:
#!/bin/bash
# Search for big-M related code
rg -n -C3 'big.?[Mm]|BigM|BIG_M' --type cu --type cpp
echo "=== semi-continuous ==="
rg -n -C3 'semi.?cont|SemiCont|SEMI_CONT' --type cu --type cpp
echo "=== absolute_tolerance field ==="
rg -n -C2 'absolute_tolerance' --type cu --type cpp | head -80
echo "=== tolerances struct ==="
rg -n -C5 'struct.*[Tt]olerance|tolerances\s*{' --type cu --type cpp | head -60Length of output: 753
🏁 Script executed:
#!/bin/bash
# Search for big-M related code
rg -n -C3 'big.?[Mm]|BigM|BIG_M'
echo "=== semi-continuous ==="
rg -n -C3 'semi.?cont|SemiCont|SEMI_CONT'
echo "=== absolute_tolerance field ==="
rg -n -C2 'absolute_tolerance' | head -100
echo "=== tolerances struct ==="
rg -n -C5 'struct.*olerance' | head -80
echo "=== problem tolerances ==="
rg -n -C3 'tolerances\.' | head -80Length of output: 151911
@akifcorduk, both points are well-taken. Here's a breakdown of the issues and what would be needed to fix them properly:
Concern 1 — scale should depend on abs_tol and big_m
The physics of this guard is: the floating-point cancellation error in a big-M row is approximately eps_machine × big_M ≈ 2e-16 × 1e10 = 2e-6. The guard should therefore reject only bounds whose magnitude puts them in the noise floor of the big-M arithmetic. Your formula is exactly right:
candidate_bound_scale = abs_tol / big_m
threshold = abs_tol / candidate_bound_scale = big_m
This makes the guard problem-aware: it only rejects derived bounds whose magnitude exceeds the big-M itself (i.e., they are clearly cancellation artifacts), regardless of what abs_tol the user chose.
Concern 2 — valid user bounds > 1e8 silently dropped
With the current hardcoded 1e-14 and default abs_tol = 1e-6, the threshold is only 1e8, but the default semi_continuous_big_m = 1e10. So legitimate derived bounds in the range [1e8, 1e10] are being silently discarded, potentially weakening propagation on large-scale SC problems. A bound exceeding the big-M value is the real signal of a cancellation artifact.
What needs to change
Implementing the fix requires semi_continuous_big_m to be accessible inside the GPU kernel — it is currently a solver-setting field, not part of the problem_t device view. It would need to either be threaded through as a kernel parameter or added to the problem view. The GPU-side warning case is also tricky since __device__ code cannot emit diagnostics directly; it would need a dedicated counter (similar to atomicAdd(upd.bounds_changed, ...)) surfaced to the CPU, or a CPU-side pre-check in the SC reformulation step before the presolve loop.
These changes go somewhat beyond the scope of this PR (which is a targeted guard to prevent catastrophic bounds). Would you like me to open a follow-up issue tracking:
- Replacing the hardcoded
1e-14withabs_tol / semi_continuous_big_m(requires plumbingbig_mto kernel) - Adding a warning/rejection path when user-provided bounds exceed the effective big-M threshold
🧠 Learnings used
Learnt from: nguidotti
Repo: NVIDIA/cuopt PR: 1401
File: cpp/include/cuopt/linear_programming/constants.h:117-130
Timestamp: 2026-06-08T08:13:20.179Z
Learning: In NVIDIA/cuopt `cpp/include/cuopt/linear_programming/mip/diving_hyper_params.hpp`, the field `farkas_obj_dynamism_tol` is an internal parameter. It intentionally does NOT have a corresponding string constant in `cpp/include/cuopt/linear_programming/constants.h` and is NOT user-configurable via config-file parameter loading. Do NOT flag its absence from `constants.h` as a missing public API key.
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 986
File: cpp/src/branch_and_bound/branch_and_bound.cpp:8-8
Timestamp: 2026-03-23T11:33:23.998Z
Learning: In this repo (NVIDIA/cuopt), treat nvcc as the supported CUDA toolchain; clang-based compilation/support is not required and may fail/break. During code reviews, do NOT request code changes or add blocking comments for errors that appear only under clang (e.g., header-resolution failures such as 'utilities/determinism_log.hpp not found')—these can be toolchain-related rather than real source issues.
Learnt from: bdice
Repo: NVIDIA/cuopt PR: 1035
File: cpp/tests/utilities/base_fixture.hpp:29-47
Timestamp: 2026-04-19T16:49:17.616Z
Learning: In NVIDIA/cuopt (and the upstream rapidsai/rmm) after the RMM → CCCL migration, the rmm::mr resource adaptors (e.g., rmm::mr::pool_memory_resource and rmm::mr::binning_memory_resource) are now owning: they take/hold their upstream resources by value instead of non-owning references. Therefore, direct construction of adaptor chains from temporaries/local values (e.g., pool_memory_resource(make_async(), size)) is safe and should NOT be flagged as potential dangling/lifetime bugs. Also, rmm::mr::make_owning_wrapper is no longer needed/available for this owning design, so do not suggest it in this codepath.
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 1353
File: cpp/src/barrier/sparse_cholesky.cuh:438-455
Timestamp: 2026-06-01T18:35:10.606Z
Learning: In this repo’s cuDSS/CUOPt integration code, `cudssMatrixCreateCsr` is version-dependent: for cuDSS `>= 0.8` (use the guard `CUDSS_VERSION_MAJOR > 0 || (CUDSS_VERSION_MAJOR == 0 && CUDSS_VERSION_MINOR >= 8)`), the signature changes so that after the `values` pointer it takes **three** `cudaDataType_t`-like parameters (row-offset type, column-index type, and value type). This call is correct and should NOT be flagged as an arity/type mismatch when it is inside the `>= 0.8` branch; use the `CUDSS_R_*` constants (e.g., `CUDSS_R_32I, CUDSS_R_32I, CUDSS_R_64F`). For cuDSS `< 0.8`, expect only **two** type parameters after `values` using the `CUDA_R_*` constants. Only flag calls that don’t match the expected arity/type constants for the active version branch.
|
/ok to test b4ba3b1 |
Semi-continuous var with big-M reformulations can create large row activities where the true residual is
smaller than double precision at that scale.
In that case, floating point cancellation can produce an unsafe candidate bound and later false infeasibility.
We add an usual guard on huge bounds derived. Note that it does not protect against the float cancellation error.
A full cancellation-aware guard would require tracking row activity scale or using compensated activity arithmetic, which would add extra reductions/storage or broader propagation changes.