Skip to content

Incorporate determinism into GPU heuristics + parallel determinsitic B&B+GPU exploration#7

Closed
aliceb-nv wants to merge 292 commits into
release/26.04from
gpudet
Closed

Incorporate determinism into GPU heuristics + parallel determinsitic B&B+GPU exploration#7
aliceb-nv wants to merge 292 commits into
release/26.04from
gpudet

Conversation

@aliceb-nv
Copy link
Copy Markdown
Owner

Test PR to trigger coderabbit.

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 22, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4ca2e378-42a2-499f-970a-d8fdc38148df

📥 Commits

Reviewing files that changed from the base of the PR and between 3951105 and 1678640.

📒 Files selected for processing (31)
  • benchmarks/linear_programming/cuopt/run_mip.cpp
  • cpp/CMakeLists.txt
  • cpp/include/cuopt/linear_programming/cuopt_c.h
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/mip_heuristics/diversity/population.cu
  • cpp/src/mip_heuristics/diversity/population.cuh
  • cpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuh
  • cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu
  • cpp/src/mip_heuristics/local_search/local_search.cu
  • cpp/src/mip_heuristics/local_search/local_search.cuh
  • cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu
  • cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh
  • cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu
  • cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cu
  • cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cuh
  • cpp/src/mip_heuristics/presolve/lb_probing_cache.cu
  • cpp/src/mip_heuristics/presolve/probing_cache.cu
  • cpp/src/mip_heuristics/presolve/probing_cache.cuh
  • cpp/src/mip_heuristics/solution/solution.cu
  • cpp/src/mip_heuristics/solution_callbacks.cuh
  • cpp/src/mip_heuristics/solve.cu
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/utilities/determinism_log.hpp
  • cpp/tests/mip/determinism_test.cu
  • cpp/tests/mip/diversity_test.cu
  • cpp/tests/mip/feasibility_jump_tests.cu
  • cpp/tests/mip/local_search_test.cu
  • cpp/tests/mip/presolve_test.cu

📝 Walkthrough

Walkthrough

Extended MIP solver with deterministic execution support: added determinism bitflags and solution-origin metadata, unified termination/work-limit infrastructure, rewired callback/publication/injection paths, propagated work-aware timers across heuristics and presolve, and introduced deterministic logging, hashing, and extensive test coverage.

Changes

Cohort / File(s) Summary
Determinism core & flags
cpp/include/cuopt/linear_programming/constants.h, cpp/src/utilities/determinism_log.hpp
Added determinism bitflags (CUOPT_DETERMINISM_*), mode aliases, and MIP solution origin enum; added no-op determinism logging macro.
Termination / work-limit plumbing
cpp/src/utilities/termination_checker.hpp, cpp/src/utilities/work_limit_timer.hpp, cpp/src/utilities/work_limit_context.hpp, cpp/src/utilities/work_unit_scheduler.*
Introduced termination_checker_t (alias work_limit_timer_t), work-limit context with producer sync and scaling, scheduler base-offset, and helpers for recording/querying work and remaining units.
Callback & solution publication/injection
cpp/include/cuopt/linear_programming/cuopt_c.h, cpp/include/cuopt/linear_programming/utilities/internals.hpp, cpp/src/mip_heuristics/solution_callbacks.cuh, cpp/src/pdlp/cuopt_c.cpp, cpp/src/dual_simplex/simplex_solver_settings.hpp
Added extended C callback info (cuOptMIPSolutionCallbackInfo) and ext callback API; new polymorphic callback type, solution_publication and solution_injection helpers, and C wrapper/registration for extended callbacks.
Branch-and-bound determinism refactor
cpp/src/branch_and_bound/... (branch_and_bound.cpp, .hpp, deterministic_workers.hpp, mip_node.hpp, pseudo_costs.*)
Major deterministic B&B changes: origin-aware incumbent flow, deterministic replay/retire, work-timestamp plumbing, API signature updates (set_new_solution, queue_external_solution_deterministic, report_heuristic), node move semantics/destructor, and pseudo-costs reliable-selection with work-context.
GPU heuristics / diversity & recombiners
cpp/src/mip_heuristics/diversity/*, recombiners/*, weights.cuh, recombiner_stats.hpp
Propagated work/timer context into diversity manager and recombiners; recombiner returns now include work (tuple<solution,bool,double>), deterministic enablement, per-recombiner work tracking, origin-tagged drained solutions, and reward normalization aware of deterministic mode.
Feasibility-jump & load balancing
cpp/src/mip_heuristics/feasibility_jump/*, load_balancing.cuh, utils.cuh
Added deterministic frontier-work estimator, device-side deterministic accounting, score-type changes to integer scores, deterministic tie-breaking, new kernels and sorting utilities, and timer->work-limit transitions.
Local search, line-segment & rounding
cpp/src/mip_heuristics/local_search/*, line_segment_search/*, rounding/*
Switched timers to work-limit variants, passed solver context into line-segment search, added deterministic caps/logging in bounds-repair and constraint-prop, and deterministic ordering/aggregation changes.
Presolve and probing
cpp/src/mip_heuristics/presolve/*, third_party_presolve.*
Moved presolve loops to work-limit timers, added WORK_LIMIT termination enum, deterministic iteration caps, set_deterministic flag for third-party presolve, and deterministic-safe probing/cache behavior.
Relaxed LP & PDLP interactions
cpp/src/mip_heuristics/relaxed_lp/*, pdlp/initial_scaling_strategy/*
Extended relaxed LP settings with iteration/work limits and work_context; added determinism-mode iteration heuristics and work recording; added stream-override to unscale routines.
Problem & solution data model
cpp/src/mip_heuristics/problem/*, solution/*
Deterministic-safe device buffer init, objective-coeff aggregation via sort+reduce (no atomics), optional RAFT handle overrides, removed lower/upper slack fields, added solution hashing and handle-overload constructor, and branch_and_bound_callback origin argument.
High-level solver orchestration
cpp/src/mip_heuristics/solve.cu, solver.cu/cuh, solver_context.cuh, benchmarks/linear_programming/cuopt/run_mip.cpp
Replaced timer with termination_checker across solver stack, integrated solution publication/injection, changed determinism checks to bitmask, set per-component work scales and GPU heuristics modes, added incumbent-history CSV output in benchmark.
Utilities & small fixes
cpp/src/utilities/* (seed_generator.*, copy_helpers.hpp, cuda_helpers.cuh, timer.hpp)
Refactored seed generator to base/epoch/local counters; added device_scalar make_span overloads; NVTX memory-init helpers; extended timer check API with caller context.
Tests, CI & datasets
cpp/tests/mip/*, cpp/tests/CMakeLists.txt, ci/compute-sanitizer-suppressions.xml, datasets/mip/download_miplib_test_dataset.sh
Large suite additions for determinism testing (diversity, FJ, local search, multi-probe, presolve), stress utilities, CUPTI CMake bits, compute-sanitizer suppressions, and added supportcase42 to dataset list.
Misc CMake / small changes
cpp/CMakeLists.txt, cpp/tests/CMakeLists.txt, various headers
Commented PAPI discovery block; CUPTI detection helper in tests CMake; many signature and header includes adjusted for new types.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gpudet

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
cpp/tests/mip/multi_probe_test.cu (1)

152-165: ⚠️ Potential issue | 🟠 Major

This regression helper never enables the deterministic code path.

test_multi_probe() still builds solver from default settings, so multi_probe_t::solve() is not running under the determinism mode this PR is adding. That makes multi_probe_deterministic a regression test for the old path, not the new one. Pass a determinism flag into this helper or construct the solver with deterministic settings in the deterministic test case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/multi_probe_test.cu` around lines 152 - 165, The regression
helper constructs detail::mip_solver_t solver using default
mip_solver_settings_t (so multi_probe_t::solve() runs non-deterministic code);
update the helper so deterministic tests actually enable determinism—either add
a parameter to test_multi_probe() to pass a determinism flag and use it when
creating mip_solver_settings_t default_settings (set the deterministic mode) or,
in the multi_probe_deterministic test case, explicitly modify default_settings
before constructing detail::mip_solver_t solver so the solver is created with
deterministic settings and the deterministic code path is exercised.
cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh (1)

541-567: ⚠️ Potential issue | 🔴 Critical

Type change from f_t to int32_t for score accumulators is a critical bug.

The accumulators base_feas and bonus_robust changed from f_t to int32_t (lines 541-542), but feas_score_constraint returns std::pair<f_t, f_t>. This causes implicit truncation of floating-point scores to integers at lines 562-563 before they are reduced and passed to warp_update_move_score, which expects f_t parameters (lines 271-272). The binary variable kernel (lines 328-359) correctly uses f_t accumulators throughout, making this inconsistency a regression that corrupts score calculations in the MTM kernel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh` around lines 541
- 567, The accumulators base_feas and bonus_robust in the MTM kernel must be
f_t, not int32_t, because feas_score_constraint<i_t,f_t> returns
std::pair<f_t,f_t> and warp_update_move_score expects f_t; change the
declarations of base_feas and bonus_robust back to f_t, ensure the values added
from feas_score_constraint are stored as f_t, and keep the raft::warpReduce
calls operating on f_t so the reduced values passed to warp_update_move_score
retain their floating precision.
cpp/src/mip_heuristics/local_search/local_search.cu (1)

209-245: ⚠️ Potential issue | 🟠 Major

Deterministic do_fj_solve() overcommits the local work budget.

The GPU climb gets the full time_limit through timer, and every ls_cpu_fj climber also gets work_budget = time_limit. Unlike start_cpufj_deterministic() above, these workers are never registered with producer_sync, so this helper can spend more than one budgeted unit of work per local-search call. Either split the budget across workers or keep CPUFJ disabled in deterministic mode here until it participates in the same producer/work accounting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/local_search.cu` around lines 209 - 245,
The deterministic path in do_fj_solve assigns each cpu_fj worker the full
time_limit (cpu_fj.fj_cpu->work_budget = time_limit) without registering them
with producer_sync, allowing total CPU work to exceed the intended budget;
change this by disabling per-worker CPUFJ in deterministic mode here (skip
creating/assigning work_budget and work_units_elapsed for ls_cpu_fj) or instead
divide the time_limit across ls_cpu_fj (e.g., time_limit / ls_cpu_fj.size()) and
ensure any created cpu_fj workers are registered with the same producer/work
accounting used by start_cpufj_deterministic so their budgets are enforced
consistently.
cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu (1)

1166-1180: ⚠️ Potential issue | 🟠 Major

Clamp the nested LP to the parent timer.

In deterministic mode this LP gets its own work_limit, but it is taken from lp_run_time_after_feasible instead of timer.remaining_time(). Near exhaustion, this path can spend more work than the caller budgeted.

Suggested fix
   relaxed_lp_settings_t lp_settings;
   lp_settings.time_limit = lp_run_time_after_feasible;
   if (timer.deterministic) {
-    lp_settings.work_limit   = lp_settings.time_limit;
+    lp_settings.work_limit =
+      std::min(lp_settings.time_limit, std::max((f_t)0.0, timer.remaining_time()));
+    lp_settings.time_limit   = lp_settings.work_limit;
     lp_settings.work_context = timer.work_context;
     cuopt_assert(lp_settings.work_context != nullptr, "Missing deterministic work context");
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu` around lines
1166 - 1180, The deterministic LP uses lp_settings.work_limit derived from
lp_run_time_after_feasible which can exceed the caller's remaining budget;
update the deterministic branch in constraint_prop.cu to clamp
lp_settings.work_limit to the parent timer by setting lp_settings.work_limit =
std::min(lp_run_time_after_feasible, timer.remaining_time()) (and keep
lp_settings.work_context = timer.work_context as before) before calling
run_lp_with_vars_fixed; ensure you reference lp_settings, timer,
lp_run_time_after_feasible and the run_lp_with_vars_fixed call when making the
change.
cpp/src/mip_heuristics/diversity/diversity_manager.cu (1)

126-144: ⚠️ Potential issue | 🔴 Critical

Move the deterministic-flag initialization above the env-config early returns.

Line 127 returns in the common CUOPT_CONFIG_ID == nullptr path, so Lines 142-143 never run. That leaves context.gpu_heur_loop.deterministic false and turns later work_limit_timer_t(context.gpu_heur_loop, ...) instances back into wall-clock timers even when GPU determinism was requested.

🛠️ Proposed fix
   const char* env_config_id_raw = std::getenv("CUOPT_CONFIG_ID");
+  context.gpu_heur_loop.deterministic =
+    (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
   if (env_config_id_raw == nullptr) { return; }
@@
-  context.gpu_heur_loop.deterministic =
-    (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 126 -
144, Initialize context.gpu_heur_loop.deterministic before any early returns so
GPU determinism is always set regardless of CUOPT_CONFIG_ID parsing/validation;
specifically, move the assignment context.gpu_heur_loop.deterministic =
(context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS) to occur
immediately after reading settings (and before checking env_config_id_raw,
std::stoi, or the max_config check), leaving the existing CUOPT_CONFIG_ID
parsing, env_config_id, and max_config validation logic intact so
work_limit_timer_t and other consumers see the correct deterministic flag even
when CUOPT_CONFIG_ID is missing or invalid.
🟠 Major comments (20)
cpp/src/dual_simplex/basis_updates.cpp-2205-2205 (1)

2205-2205: ⚠️ Potential issue | 🟠 Major

Sort work estimate is computed with the wrong size term.

std::sort here operates on nz elements, but the estimate uses m + nz. This overcounts work (including when nz == 0) and can skew work-based scheduling/determinism decisions.

Suggested fix
-  if ((m + nz) > 1) { work_estimate_ += (m + nz) * std::log2((f_t)(m + nz)); }
+  if (nz > 1) {
+    const f_t nz_f = static_cast<f_t>(nz);
+    work_estimate_ += nz_f * std::log2(nz_f);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/basis_updates.cpp` at line 2205, The work estimate uses
the wrong size: change the condition and log term that update work_estimate_ in
the block that precedes the std::sort over nz elements so it uses nz (not m +
nz); specifically, replace the (m + nz) check and std::log2((f_t)(m + nz)) with
an nz > 1 guard and std::log2((f_t)nz) so work_estimate_ reflects sorting nz
elements (reference symbols: work_estimate_, m, nz, std::sort).
cpp/include/cuopt/linear_programming/constants.h-80-90 (1)

80-90: ⚠️ Potential issue | 🟠 Major

Potential breaking semantic change in CUOPT_MODE_DETERMINISTIC.

CUOPT_MODE_DETERMINISTIC now aliases CUOPT_DETERMINISM_FULL (0x3), while the comment indicates prior behavior/value aligned with B&B-only determinism (0x1). Recompiled clients using the legacy macro will silently get stricter behavior/perf characteristics unless this is explicitly documented and migration-guided.

As per coding guidelines, "For breaking changes, recommend updating docs and migration guides."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/constants.h` around lines 80 - 90, The
macro CUOPT_MODE_DETERMINISTIC was changed to alias CUOPT_DETERMINISM_FULL (0x3)
which breaks prior semantics that mapped deterministic mode to B&B-only
(CUOPT_DETERMINISM_BB, 0x1); restore the original behavior by remapping
CUOPT_MODE_DETERMINISTIC back to CUOPT_DETERMINISM_BB (or introduce a new
CUOPT_MODE_DETERMINISTIC_FULL and keep CUOPT_MODE_DETERMINISTIC as before) and
add a note in the changelog/docs/migration guide explaining the new FULL mode
and the intentional choice if you keep the new mapping; locate and change the
definition for CUOPT_MODE_DETERMINISTIC in the constants.h file (symbols:
CUOPT_MODE_DETERMINISTIC, CUOPT_DETERMINISM_FULL, CUOPT_DETERMINISM_BB).
cpp/src/mip_heuristics/local_search/local_search.cuh-55-83 (1)

55-83: ⚠️ Potential issue | 🟠 Major

Pass work_limit_timer_t by reference for consistency and state propagation.

Methods in this header accept work_limit_timer_t by value, but in cpp/src/mip_heuristics/presolve/multi_probe.cuh (line 58) and cpp/src/mip_heuristics/presolve/bounds_presolve.cuh (line 64), the same type is passed by reference. Since termination_checker_t (the underlying type) tracks mutable state and provides non-const methods like record_work(), passing by value creates a copy and prevents state changes from propagating back to the caller. For consistency and correctness, consider changing all parameters to work_limit_timer_t&.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/local_search.cuh` around lines 55 - 83,
All declarations that currently take work_limit_timer_t by value should take it
by reference so mutation of the underlying termination_checker_t propagates to
the caller; update the signatures of generate_fast_solution, generate_solution,
run_fj_until_timer, run_local_search, run_fj_annealing, run_fj_line_segment,
run_fj_on_zero, check_fj_on_lp_optimal, run_staged_fp and run_fp to accept
work_limit_timer_t& (and adjust any callers/definitions accordingly) to match
how work_limit_timer_t is used elsewhere (e.g., in multi_probe.cuh and
bounds_presolve.cuh).
cpp/src/utilities/seed_generator.cuh-16-16 (1)

16-16: ⚠️ Potential issue | 🟠 Major

Data race on base_seed_ read in local_state().

base_seed_ is a non-atomic int64_t that is written in set_seed() (lines 43, 45) and read in local_state() (line 32). While epoch_ uses proper acquire/release semantics, the read of base_seed_ in local_state() races with the write in set_seed() if called concurrently from different threads.

Consider making base_seed_ atomic or protecting both fields under a common synchronization mechanism.

Proposed fix
 class seed_generator {
-  static int64_t base_seed_;
+  static std::atomic<int64_t> base_seed_;
   // Monotonically increasing epoch; incremented on every set_seed() call.
   // ...
   static std::atomic<int64_t> epoch_;

   // In local_state():
   if (state.last_epoch != current_epoch) {
-    state.counter    = base_seed_;
+    state.counter    = base_seed_.load(std::memory_order_acquire);
     state.last_epoch = current_epoch;
   }

   // In set_seed():
 `#ifdef` BENCHMARK
-    base_seed_ = std::random_device{}();
+    base_seed_.store(std::random_device{}(), std::memory_order_release);
 `#else`
-    base_seed_ = static_cast<int64_t>(seed);
+    base_seed_.store(static_cast<int64_t>(seed), std::memory_order_release);
 `#endif`

Also applies to: 27-36, 40-48

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/utilities/seed_generator.cuh` at line 16, The read of base_seed_ in
local_state() races with writes in set_seed(); change base_seed_ to
std::atomic<int64_t> (or otherwise protect both base_seed_ and epoch_ with a
common mutex) and update all accesses: replace raw reads in local_state() with
base_seed_.load(std::memory_order_acquire) and writes in set_seed() with
base_seed_.store(..., std::memory_order_release) so the ordering matches
epoch_’s acquire/release semantics (alternatively lock the same mutex used for
epoch_ around both set_seed() and local_state()).
cpp/src/mip_heuristics/solution/solution.cu-58-72 (1)

58-72: ⚠️ Potential issue | 🟠 Major

The handle-override constructor is unsafe as written.

This ctor dereferences handle_override immediately, and the type still sometimes syncs via problem_ptr->handle_ptr instead of solution_t::handle_ptr (for example, Line 263). A real override can therefore crash on null or enqueue work on one stream and synchronize another.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solution/solution.cu` around lines 58 - 72, The
constructor dereferences handle_override and mixes problem_ptr->handle_ptr vs
solution_t::handle_ptr leading to potential null deref and stream mismatches;
fix by selecting a single chosen_handle = handle_override ? handle_override :
problem_.handle_ptr before using it, initialize solution_t::handle_ptr with that
chosen_handle, and use chosen_handle (not problem_.handle_ptr) when calling
get_lower_bounds, constructing member tensors (their get_stream()), initializing
lp_state, and calling clamp_within_var_bounds so all operations use the same
non-null handle consistently.
cpp/src/mip_heuristics/solution/solution.cu-689-693 (1)

689-693: ⚠️ Potential issue | 🟠 Major

Hash only the active variable prefix.

assignment.size() is not guaranteed to match problem_ptr->n_variables here—get_host_assignment() already documents that. Hashing the full buffer pulls stale tail values into the new determinism fingerprint after resize/fix/unfix flows.

Suggested fix
 template <typename i_t, typename f_t>
 uint32_t solution_t<i_t, f_t>::get_hash() const
 {
-  auto h_assignment = host_copy(assignment, handle_ptr->get_stream());
+  auto h_assignment =
+    cuopt::host_copy(assignment.data(), problem_ptr->n_variables, handle_ptr->get_stream());
   return compute_hash(h_assignment);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solution/solution.cu` around lines 689 - 693, get_hash
currently copies and hashes the entire device buffer (assignment) which can
include stale tail values; change it to compute the hash only over the active
variable prefix. Specifically, obtain the host assignment and its active length
(use get_host_assignment() or problem_ptr->n_variables / the documented active
size from the host vector) and call compute_hash only on that prefix (not the
full buffer returned by host_copy). Update solution_t<i_t,f_t>::get_hash to copy
and hash only the active-length portion (keep references to assignment,
handle_ptr->get_stream(), compute_hash, and
problem_ptr->n_variables/get_host_assignment to locate the code).
cpp/tests/mip/determinism_test.cu-70-95 (1)

70-95: ⚠️ Potential issue | 🟠 Major

Don't use the capped prefix buffer to prove GPU callbacks happened.

Once Line 87 starts returning early, later callbacks are dropped from solutions. The downstream count_gpu_callbacks(callbacks_run*) checks then only inspect the first 32 callbacks, so this test can fail even when GPU-origin incumbents appear later in the run. Keep the prefix cap for stored assignments, but track callback-origin counts across all invocations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_test.cu` around lines 70 - 95, In get_solution, we
should not gate counting of callback origins on the capped solutions buffer;
move or add logic that records/updates callback-origin counts (using
callback_info->origin or a dedicated counters map/array like callbacks_run)
before the early return that checks solutions.size() >= max_solutions so every
invocation is accounted for, while still preserving the existing prefix-cap
behavior for storing assignment data in solutions; update get_solution to
increment the origin counter (or call the existing count/update helper)
immediately after ASSERT/EXPECT checks and before the if (solutions.size() >=
max_solutions) return.
cpp/tests/mip/local_search_test.cu-157-170 (1)

157-170: ⚠️ Potential issue | 🟠 Major

Stop the FP loop when the work budget is exhausted.

run_single_fp_descent() / restart_fp() consume local_search.fp.timer, but the outer while (true) never checks it. Once the timer expires, this can spin forever returning false and hang the test runner.

Suggested fix
   if (mode == local_search_mode_t::FP) {
     bool is_feasible = false;
     int iterations   = 0;
-    while (true) {
+    while (!local_search.fp.timer.check_time_limit()) {
       is_feasible = local_search.fp.run_single_fp_descent(solution);
       printf("fp_loop it %d, is_feasible %d\n", iterations, is_feasible);
       if (is_feasible) {
         break;
       } else {
         is_feasible = local_search.fp.restart_fp(solution);
         if (is_feasible) { break; }
       }
       iterations++;
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/local_search_test.cu` around lines 157 - 170, The FP loop can
spin after the work budget expires because run_single_fp_descent() and
restart_fp() consume local_search.fp.timer but the while(true) never checks it;
update the loop in local_search_test.cu (around
local_search.fp.run_single_fp_descent and local_search.fp.restart_fp) to check
the FP work budget/timer after each call and break when expired (e.g., inspect
local_search.fp.timer or call its expired/remaining API) so the loop exits when
the timer indicates work is exhausted rather than looping forever.
cpp/src/mip_heuristics/diversity/recombiners/recombiner.cuh-249-256 (1)

249-256: ⚠️ Potential issue | 🟠 Major

Non-deterministic recombiner order due to unordered_set iteration.

The enabled_recombiners vector is constructed from std::unordered_set iteration, which has implementation-defined order. This means order_str in the log and the actual recombiner execution order may vary between runs, potentially affecting determinism.

🐛 Fix: Sort the enabled_recombiners vector after construction
     recombiner_t::enabled_recombiners =
       std::vector<recombiner_enum_t>(enabled_recombiners.begin(), enabled_recombiners.end());
+    // Sort to ensure deterministic iteration order
+    std::sort(recombiner_t::enabled_recombiners.begin(), 
+              recombiner_t::enabled_recombiners.end());
     cuopt_assert(!recombiner_t::enabled_recombiners.empty(), "No recombiners enabled after init");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/recombiner.cuh` around lines 249
- 256, The enabled_recombiners vector is built from an unordered_set and
therefore can have non-deterministic iteration order; after constructing
recombiner_t::enabled_recombiners, sort it to enforce a stable order (e.g., call
std::sort on recombiner_t::enabled_recombiners using a comparator that orders by
the enum numeric value or by recombiner_name(recombiner) for a deterministic
lexical order) so order_str and actual execution order are deterministic.
cpp/src/mip_heuristics/diversity/recombiners/bound_prop_recombiner.cuh-193-195 (1)

193-195: ⚠️ Potential issue | 🟠 Major

Don't ship a placeholder work estimate into deterministic accounting.

n_vars_from_other / 1e8 ignores the actual branch taken, reduced-problem size, probing load, and the bounds-prop/LP effort. Two recombinations with the same n_vars_from_other can therefore consume very different work while being charged identically.

Do you want me to sketch an estimator based on reduced-problem size and the actual bounds-prop rounds?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/bound_prop_recombiner.cuh`
around lines 193 - 195, The current deterministic accounting uses a placeholder
estimate double work = static_cast<double>(n_vars_from_other) / 1e8; which
mischarges recombinations; replace this with a real estimator that accounts for
the actual branch taken, reduced-problem size, bounds-prop rounds, probing
effort and any LP solves: compute work from inputs like n_vars_from_other,
n_constraints_from_other (or reduced problem variable/constraint counts), the
number of bounds-prop iterations/rounds (e.g., bounds_prop_rounds), and an
LP/solver cost factor (e.g., lp_solves or lp_iterations), and combine them into
a weighted sum (or calibrated formula) instead of the fixed division; update the
usage of work where used in BoundPropRecombiner/recombine (or the function
containing work) so charges reflect these concrete components rather than the
1e8 placeholder.
cpp/src/mip_heuristics/solve.cu-176-179 (1)

176-179: ⚠️ Potential issue | 🟠 Major

Don't force-enable solution logging.

Line 178 turns CUOPT_MIP_HIDE_SOLUTION into a no-op, so callers who explicitly asked to suppress full solution output will still get it in logs.

🛠️ Suggested fix
   int hidesol =
     std::getenv("CUOPT_MIP_HIDE_SOLUTION") ? atoi(std::getenv("CUOPT_MIP_HIDE_SOLUTION")) : 0;
-  hidesol = 0;
   if (!hidesol) { detail::print_solution(scaled_problem.handle_ptr, sol.get_solution()); }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solve.cu` around lines 176 - 179, The code forcibly
enables solution logging by overwriting the hidesol flag (int hidesol ...) with
hidesol = 0; remove that overwrite so the hidesol value derived from the
environment variable CUOPT_MIP_HIDE_SOLUTION is respected; then ensure the
conditional if (!hidesol) { detail::print_solution(scaled_problem.handle_ptr,
sol.get_solution()); } remains unchanged so callers can suppress printing by
setting the env var, referencing the hidesol variable, detail::print_solution,
scaled_problem.handle_ptr, and sol.get_solution() to locate the change.
cpp/src/mip_heuristics/solve.cu-244-246 (1)

244-246: ⚠️ Potential issue | 🟠 Major

The root checker still ignores settings.work_limit.

deterministic_run is computed after the checker is created, and the checker is always initialized from time_limit. That means the new deterministic remaining_time() / work_limit_timer_t plumbing below never sees the configured top-level work budget.

🛠️ Suggested fix
-    auto timer =
-      cuopt::termination_checker_t(time_limit, cuopt::termination_checker_t::root_tag_t{});
-    const bool deterministic_run = (settings.determinism_mode != CUOPT_DETERMINISM_NONE);
+    const bool deterministic_run = (settings.determinism_mode != CUOPT_DETERMINISM_NONE);
+    const f_t root_work_limit =
+      (settings.work_limit == 0 || settings.work_limit == std::numeric_limits<f_t>::infinity() ||
+       settings.work_limit == std::numeric_limits<f_t>::max())
+        ? max_time_limit
+        : settings.work_limit;
+    auto timer = cuopt::termination_checker_t(
+      deterministic_run ? root_work_limit : time_limit,
+      cuopt::termination_checker_t::root_tag_t{});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solve.cu` around lines 244 - 246, Compute
deterministic_run before constructing the termination checker and initialize
cuopt::termination_checker_t with the correct top-level budget (use
settings.work_limit via the work_limit timer path when determinism is enabled)
instead of always passing time_limit; in practice move the line computing
deterministic_run above the creation of timer and, if deterministic_run (or
settings.work_limit is set), construct the checker using the work-limit-aware
mechanism (work_limit_timer_t / remaining_time plumbing) so the root
termination_checker_t (root_tag_t) sees settings.work_limit rather than always
using time_limit.
cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu-304-307 (1)

304-307: ⚠️ Potential issue | 🟠 Major

The pre-sort does not survive the later thrust::sort_by_key in compute_damages().

The compute_damages() function resorts candidates using thrust::sort_by_key on (cstr_delta, damage) with a custom comparator. Since sort_by_key is unstable, candidates with equal (cstr_delta, damage) pairs can be reordered, destroying the deterministic (variable_index, bound_shift) ordering established by the pre-sort at lines 304-307.

To fix: either replace sort_by_key with stable_sort_by_key, or add (variable_index, bound_shift) as tie-breakers in the custom comparator.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu` around lines
304 - 307, The current pre-sort of candidates by (variable_index, bound_shift)
is being undone inside compute_damages() because it uses thrust::sort_by_key
with a custom comparator (on cstr_delta and damage) which is unstable; either
switch that call to thrust::stable_sort_by_key so equal (cstr_delta, damage)
preserve the pre-sort, or update the custom comparator used in compute_damages()
to include candidates.variable_index and candidates.bound_shift as deterministic
tie-breakers (compare cstr_delta, then damage, then variable_index, then
bound_shift) to maintain deterministic ordering.
cpp/src/utilities/work_limit_context.hpp-60-62 (1)

60-62: ⚠️ Potential issue | 🟠 Major

CUOPT_GPU_HEUR_WORK_UNIT_SCALE does not change the actual budget yet.

The env value is stored in producer_progress_scale, but recorded work is multiplied by work_unit_scale, which stays 1.0. That makes producer notifications and real scheduling drift apart.

Suggested fix
-  double producer_progress_scale{
-    read_work_unit_scale_env_or_default("CUOPT_GPU_HEUR_WORK_UNIT_SCALE", 1.0)};
-  double work_unit_scale{1.0};
+  double work_unit_scale{
+    read_work_unit_scale_env_or_default("CUOPT_GPU_HEUR_WORK_UNIT_SCALE", 1.0)};
+  double producer_progress_scale{work_unit_scale};

Also applies to: 123-124

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/utilities/work_limit_context.hpp` around lines 60 - 62,
producer_progress_scale is being initialized from CUOPT_GPU_HEUR_WORK_UNIT_SCALE
but work_unit_scale remains 1.0 so recorded work never uses the env override;
update the initialization so work_unit_scale is set from
read_work_unit_scale_env_or_default (or assigned from producer_progress_scale)
so both producer notifications and recorded work use the same scale (apply the
same change where work_unit_scale is declared/initialized at the other
occurrence referenced by symbols producer_progress_scale and work_unit_scale).
cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu-29-58 (1)

29-58: ⚠️ Potential issue | 🟠 Major

Gate log_unset_var_ordering() before the host syncs.

This helper copies to host, synchronizes the stream, and reads device values before it reaches CUOPT_DETERMINISM_LOG. Because the call sites are unconditional, normal runs pay that stall too.

Suggested fix
 template <typename i_t, typename f_t>
 void log_unset_var_ordering(const char* label,
                             const rmm::device_uvector<i_t>& vars,
                             const rmm::device_uvector<f_t>& assignment,
-                            const raft::handle_t* handle_ptr)
+                            const raft::handle_t* handle_ptr,
+                            bool enabled)
 {
+  if (!enabled) return;
   const size_t n   = vars.size();
   const size_t cap = std::min(n, (size_t)16);
   std::vector<i_t> h_vars(cap);
   raft::copy(h_vars.data(), vars.data(), cap, handle_ptr->get_stream());
   handle_ptr->sync_stream();
@@
-    log_unset_var_ordering("post-sort_by_frac", unset_integer_vars, sol.assignment, sol.handle_ptr);
+    log_unset_var_ordering(
+      "post-sort_by_frac", unset_integer_vars, sol.assignment, sol.handle_ptr, timer.deterministic);

Apply the same guard to the other two call sites in find_integer().

Also applies to: 945-946, 963-964, 998-999

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu` around lines
29 - 58, The function log_unset_var_ordering performs host copies and stream
syncs unconditionally; wrap its expensive host-side work behind the determinism
logging guard so it only runs when CUOPT_DETERMINISM_LOG is enabled (e.g., check
the same macro/flag used by CUOPT_DETERMINISM_LOG and early-return if logging is
disabled), and also wrap the three unconditional calls from find_integer() with
the same guard so those call sites no longer trigger the host copy/sync when
logging is off; locate log_unset_var_ordering and the call sites inside
find_integer() and implement the conditional/early-return around the host copy,
raft::copy/handle_ptr->sync_stream, and assignment reads (or wrap the entire
function body) and add the same macro check at the three call sites.
cpp/src/mip_heuristics/solution_callbacks.cuh-87-100 (1)

87-100: ⚠️ Potential issue | 🟠 Major

Move callback dispatch out of solution_callback_mutex_.

invoke_get_solution_callbacks() runs user code. Holding the mutex across that call can deadlock on re-entrant publication paths and blocks solver progress behind slow callbacks.

Suggested fix
   bool publish_new_best_feasible(const solution_callback_payload_t<i_t, f_t>& payload,
                                  double elapsed_time = -1.0)
   {
-    std::lock_guard<std::mutex> lock(solution_callback_mutex_);
-    cuopt_assert(std::isfinite(payload.solver_objective),
-                 "Feasible incumbent objective must be finite");
-    if (!(payload.solver_objective < best_callback_feasible_objective_)) { return false; }
-
-    best_callback_feasible_objective_ = payload.solver_objective;
-    if (settings.benchmark_info_ptr != nullptr && elapsed_time >= 0.0) {
-      settings.benchmark_info_ptr->last_improvement_of_best_feasible = elapsed_time;
-    }
-    invoke_get_solution_callbacks(payload);
-    return true;
+    {
+      std::lock_guard<std::mutex> lock(solution_callback_mutex_);
+      cuopt_assert(std::isfinite(payload.solver_objective),
+                   "Feasible incumbent objective must be finite");
+      if (!(payload.solver_objective < best_callback_feasible_objective_)) { return false; }
+
+      best_callback_feasible_objective_ = payload.solver_objective;
+      if (settings.benchmark_info_ptr != nullptr && elapsed_time >= 0.0) {
+        settings.benchmark_info_ptr->last_improvement_of_best_feasible = elapsed_time;
+      }
+    }
+    invoke_get_solution_callbacks(payload);
+    return true;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solution_callbacks.cuh` around lines 87 - 100, The
function publish_new_best_feasible holds solution_callback_mutex_ across
invoke_get_solution_callbacks, which can run user code and deadlock; limit the
mutex scope to only the critical section that checks/updates
best_callback_feasible_objective_ and benchmark info (use a scoped block or a
std::unique_lock and unlock before calling invoke_get_solution_callbacks), then
call invoke_get_solution_callbacks(payload) after releasing
solution_callback_mutex_; keep checks using solution_callback_mutex_, update
best_callback_feasible_objective_ and settings.benchmark_info_ptr while locked,
then release the lock and finally invoke_get_solution_callbacks.
cpp/src/mip_heuristics/solution_callbacks.cuh-205-206 (1)

205-206: ⚠️ Potential issue | 🟠 Major

Use model-scaled tolerances for injected-objective validation.

A fixed 1e-6 absolute check is too tight for templated f_t and large objectives. Valid injected solutions can trip this assert after preprocessing/scaling due to normal roundoff.

Suggested fix
-      cuopt_assert(std::abs(outside_sol.get_user_objective() - outside_sol_objective) <= 1e-6,
-                   "External solution objective mismatch");
+      const f_t computed_obj = outside_sol.get_user_objective();
+      const f_t abs_tol      = current_incumbent.problem_ptr->tolerances.absolute_tolerance;
+      const f_t rel_tol      = f_t(1e-6);
+      cuopt_assert(std::abs(computed_obj - outside_sol_objective) <=
+                     abs_tol + rel_tol *
+                                 std::max(std::abs(computed_obj),
+                                          std::abs(outside_sol_objective)),
+                   "External solution objective mismatch");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solution_callbacks.cuh` around lines 205 - 206, The
absolute tolerance 1e-6 in the cuopt_assert comparing
outside_sol.get_user_objective() and outside_sol_objective is too strict for
templated f_t and scaled models; replace it with a model-scaled tolerance (e.g.,
compute tol = max(min_tol, rel_tol * max(1.0,
abs(outside_sol.get_user_objective()), abs(outside_sol_objective))) or use the
model/objective scaling factor if available) and use cuopt_assert(std::abs(...)
<= tol, ...) instead; update the assertion in solution_callbacks.cuh where
outside_sol.get_user_objective() and outside_sol_objective are compared to use
this computed tolerance.
cpp/src/utilities/termination_checker.hpp-10-13 (1)

10-13: ⚠️ Potential issue | 🟠 Major

Make finished atomic.

check() can run concurrently from multiple worker threads. The mutable bool finished is read and written without synchronization in the deterministic path, which is a data race.

🔒 Proposed fix
 `#include` <algorithm>
+#include <atomic>
 `#include` <cmath>
 `#include` <string>
@@
-      if (finished_now && !finished) {
-        finished                   = true;
-        double actual_elapsed_time = timer.elapsed_time();
-
-        if (work_limit > 0 && std::abs(actual_elapsed_time - work_limit) / work_limit > 0.10) {
-          CUOPT_LOG_TRACE(
-            "%s:%d: %s(): Work limit timer finished with a large discrepancy: %fs for %fwu "
-            "(global: %g, start: %g)",
-            file,
-            line,
-            caller,
-            actual_elapsed_time,
-            work_limit,
-            work_context->current_work(),
-            work_units_at_start);
+      if (finished_now) {
+        bool expected = false;
+        if (finished.compare_exchange_strong(expected, true, std::memory_order_relaxed)) {
+          double actual_elapsed_time = timer.elapsed_time();
+
+          if (work_limit > 0 && std::abs(actual_elapsed_time - work_limit) / work_limit > 0.10) {
+            CUOPT_LOG_TRACE(
+              "%s:%d: %s(): Work limit timer finished with a large discrepancy: %fs for %fwu "
+              "(global: %g, start: %g)",
+              file,
+              line,
+              caller,
+              actual_elapsed_time,
+              work_limit,
+              work_context->current_work(),
+              work_units_at_start);
+          }
         }
       }
-      return finished;
+      return finished.load(std::memory_order_relaxed);
@@
-  mutable bool finished{false};
+  mutable std::atomic<bool> finished{false};

Also applies to: 88-109, 223-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/utilities/termination_checker.hpp` around lines 10 - 13, The mutable
bool finished in the termination checker is a data race because check() is
called concurrently; change it to a thread-safe atomic by adding `#include`
<atomic> and replacing the declaration with mutable std::atomic<bool>
finished{false}; then update all reads/writes to use finished.load(...) and
finished.store(...), using memory_order_acquire for reads in check() and
memory_order_release for writes when setting finished (and/or
memory_order_relaxed for simple clear/set if appropriate), and apply the same
changes to any other methods that read/write finished (e.g., the reset/marking
code referenced around the other occurrences).
cpp/src/mip_heuristics/diversity/diversity_manager.cu-499-503 (1)

499-503: ⚠️ Potential issue | 🟠 Major

Derive deterministic phase budgets from remaining work.

timer.get_time_limit() ignores work already consumed before run_solver() (presolve, probing, etc.). In deterministic mode the initial LP budget can therefore exceed the remaining global budget.

🛠️ Proposed fix
   population.timer     = timer;
-  const f_t time_limit = timer.deterministic ? timer.get_time_limit() : timer.remaining_time();
+  const f_t time_limit = timer.remaining_time();
   const f_t lp_time_limit =
     std::min(diversity_config.max_time_on_lp, time_limit * diversity_config.time_ratio_on_init_lp);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 499 -
503, The deterministic LP-phase budget is being computed from
timer.get_time_limit(), which can ignore work already consumed before
run_solver(); instead derive phase budgets from remaining time. Change the
computation so time_limit is obtained from timer.remaining_time() for both
deterministic and non-deterministic modes (i.e., replace the conditional using
timer.get_time_limit() with a call to timer.remaining_time()), then compute
lp_time_limit as before using diversity_config.max_time_on_lp and
diversity_config.time_ratio_on_init_lp; update any comments referencing
deterministic behavior to reflect the unified remaining-time approach
(references: population.timer, timer.get_time_limit(), timer.remaining_time(),
time_limit, lp_time_limit, diversity_config.max_time_on_lp,
diversity_config.time_ratio_on_init_lp).
cpp/tests/mip/diversity_test.cu-285-295 (1)

285-295: ⚠️ Potential issue | 🟠 Major

Don't call exit(1) from a test helper.

This aborts the whole test binary and skips GTest failure reporting and teardown for the rest of the suite. Fail the current test instead.

🧪 Proposed fix
         if (hash_map.find(std::make_tuple(path, i, j, recombiner)) == hash_map.end()) {
           hash_map[std::make_tuple(path, i, j, recombiner)] = offspring_hash;
         } else {
           if (hash_map[std::make_tuple(path, i, j, recombiner)] != offspring_hash) {
-            printf("%s: hash mismatch for %d,%d: %d != %d\n",
-                   path.c_str(),
-                   i,
-                   j,
-                   hash_map[std::make_tuple(path, i, j, recombiner)],
-                   offspring_hash);
-            exit(1);
+            ADD_FAILURE() << path << ": hash mismatch for " << i << "," << j << ": expected "
+                          << hash_map[std::make_tuple(path, i, j, recombiner)]
+                          << " got " << offspring_hash;
+            return 0;
           }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/diversity_test.cu` around lines 285 - 295, The test helper
currently prints an error and calls exit(1) on a hash mismatch (the branch that
compares hash_map[std::make_tuple(path, i, j, recombiner)] to offspring_hash);
replace the printf + exit(1) with a GTest failure so the current test fails
cleanly (e.g., use FAIL() or ADD_FAILURE() with a formatted message including
path, i, j, the stored hash and offspring_hash) and remove the exit call so
teardown and other tests continue to run.
🟡 Minor comments (11)
ci/compute-sanitizer-suppressions.xml-80-106 (1)

80-106: ⚠️ Potential issue | 🟡 Minor

Record 5 (lines 80-108) has a brittle host stack—consolidate into explicit function anchors

This record has 4 module-only frames paired with only 1 function anchor (ThreadLoad), making it vulnerable to silent suppression failures across CUDA/runtime updates. Tighten the stack by using function-anchored callsites (cudaLaunchKernel*, cudaMemcpyAsync, specific rmm::* functions) instead of repeated module patterns.

Note: Records at lines 124-155, 177-209, and 223-249 already have multiple function anchors alongside their module-only frames (2–3 anchors each), so they are not at risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/compute-sanitizer-suppressions.xml` around lines 80 - 106, The suppression
record for Initcheck/ThreadLoad is brittle because it uses multiple module-only
frames; replace those module-only anchors with concrete function callsites to
stabilize matching — e.g., add/replace frames referencing cudaLaunchKernel*
and/or cudaMemcpyAsync and specific rmm::* functions (or other explicit host
functions that call into libcuda/libcudart/libcuopt) alongside the existing
ThreadLoad and Device(Reduce|Scan)Kernel anchors so the hostStack contains
multiple function-anchored frames instead of only module patterns.
cpp/src/utilities/determinism_log.hpp-19-35 (1)

19-35: ⚠️ Potential issue | 🟡 Minor

Remove duplicate macro blocks; only the first definition is active.

After Line 20 defines CUOPT_DETERMINISM_LOG, the #ifndef blocks at Line 25-35 never execute. Keep one no-op definition to avoid confusion.

Proposed cleanup
 `#ifndef` CUOPT_DETERMINISM_LOG
 `#define` CUOPT_DETERMINISM_LOG(logger, ...) \
   do {                                     \
   } while (0)
 `#endif`
-
-#ifndef CUOPT_DETERMINISM_LOG
-#define CUOPT_DETERMINISM_LOG(...) \
-  do {                             \
-  } while (0)
-#endif
-
-#ifndef CUOPT_DETERMINISM_LOG
-#define CUOPT_DETERMINISM_LOG(...) \
-  do {                             \
-  } while (0)
-#endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/utilities/determinism_log.hpp` around lines 19 - 35, The file
contains three redundant no-op macro definitions for CUOPT_DETERMINISM_LOG; keep
a single no-op macro definition and remove the duplicate blocks so the macro is
only defined once. Locate the repeated CUOPT_DETERMINISM_LOG macro definitions
(the `#ifndef` / `#define` / do { } while (0) blocks) and delete the extra two,
leaving one clear no-op definition to avoid confusion.
cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp-80-84 (1)

80-84: ⚠️ Potential issue | 🟡 Minor

Initialize last_recombiner_work and consider updating reset().

The new field last_recombiner_work is uninitialized, which could lead to undefined behavior if get_last_recombiner_work() is called before set_recombiner_work(). Similarly, the reset() method (lines 99-105) does not clear last_recombiner_work (or last_recombiner_time), so these values persist across resets.

Consider adding default initialization:

Proposed fix
   double last_recombiner_time;
   std::chrono::high_resolution_clock::time_point last_recombiner_start_time;
-  double last_recombiner_work;
+  double last_recombiner_work{0.0};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp` around
lines 80 - 84, Initialize last_recombiner_work (and last_recombiner_time) when
declared and clear them in the reset() method: add a default value for
last_recombiner_work (e.g., 0.0) in its declaration, and update reset() to
explicitly set last_recombiner_work and last_recombiner_time back to the same
default/sentinel so get_last_recombiner_work() cannot return uninitialized data;
modify the set_recombiner_work(), get_last_recombiner_work(), and reset() logic
accordingly to rely on these initialized fields.
cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh-121-124 (1)

121-124: ⚠️ Potential issue | 🟡 Minor

Pass the work-limit timer by reference for consistency.

Line 123 passes work_limit_timer_t timer_ by value, which is inconsistent with other timer-accepting functions in the codebase (bound_update_loop in bounds_presolve.cuh and multi_probe.cuh accept work_limit_timer_t&). While the current implementation functions correctly due to the shared work_context pointer, accepting by reference aligns with the established pattern for mutable state throughout the codebase.

Suggested header fix
-  bool repair_problem(problem_t<i_t, f_t>& problem,
-                      problem_t<i_t, f_t>& original_problem,
-                      work_limit_timer_t timer_,
-                      const raft::handle_t* handle_ptr_);
+  bool repair_problem(problem_t<i_t, f_t>& problem,
+                      problem_t<i_t, f_t>& original_problem,
+                      work_limit_timer_t& timer_,
+                      const raft::handle_t* handle_ptr_);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh` around lines
121 - 124, The declaration of repair_problem currently takes work_limit_timer_t
timer_ by value; change it to take work_limit_timer_t& timer_ for consistency
with other timer-accepting functions (e.g., bound_update_loop) and to avoid
copying mutable timer state. Update the function prototype in bounds_repair.cuh
(repair_problem<...> signature), the corresponding implementation/definition,
and all call sites so they pass the timer by reference (and adjust any constness
if needed); ensure the parameter name and the raft::handle_t* handle_ptr_ symbol
remain unchanged.
cpp/include/cuopt/linear_programming/cuopt_c.h-740-740 (1)

740-740: ⚠️ Potential issue | 🟡 Minor

Documentation references non-existent struct_size field.

The documentation states "struct_size is always set and can be used to detect future extensions safely," but cuOptMIPSolutionCallbackInfo does not contain a struct_size field. This appears to be a copy-paste error.

Suggested fix
- * `@param`[in] callback_info - Pointer to callback metadata. `struct_size` is always set and can be
- * used to detect future extensions safely.
+ * `@param`[in] callback_info - Pointer to callback metadata containing solution origin and work
+ * timestamp.

As per coding guidelines: For public header files, verify parameter descriptions match actual types/behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/cuopt_c.h` at line 740, The comment
incorrectly references a non-existent struct_size field in the documentation for
cuOptMIPSolutionCallbackInfo; update the parameter description for callback_info
in cuopt_c.h to remove the mention of struct_size (or replace it with the
correct field if one exists) so the doc matches the actual
cuOptMIPSolutionCallbackInfo definition and follows public-header guidelines.
cpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuh-79-80 (1)

79-80: ⚠️ Potential issue | 🟡 Minor

Add a null check before dereferencing context.termination.

The work_limit_timer_t constructor (which is an alias for termination_checker_t) dereferences the third parameter without null validation. The context.termination member is initialized to nullptr and documented as being "set by mip_solver_t after construction," but there is no runtime check to ensure it is non-null when recombine() is called. This pattern appears elsewhere in the codebase (e.g., in diversity_manager.cu) and represents a systematic unsafe pointer dereference. Add an assertion or null check to enforce the precondition, or refactor to avoid the dereference if termination may not be set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuh`
around lines 79 - 80, Before constructing work_limit_timer_t line_segment_timer
in recombine(), verify context.termination is non-null; add an assertion or
explicit null-check (e.g., assert(context.termination) or if
(!context.termination) handle error/return) prior to the line that creates
line_segment_timer so you never dereference a nullptr. Reference the symbols:
work_limit_timer_t, line_segment_timer, context.termination and the recombine()
method when applying the check.
cpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuh-72-75 (1)

72-75: ⚠️ Potential issue | 🟡 Minor

This fingerprint hashes vars_to_fix before it is populated.

vars_to_fix is a reusable scratch buffer, so the hash at this point reflects stale or uninitialized contents from a previous recombination. That makes the new determinism trace misleading.

Suggested fix
     CUOPT_DETERMINISM_LOG(
-      "FP rec: offspring hash 0x%x, vars to fix 0x%x",
-      offspring.get_hash(),
-      detail::compute_hash(make_span(vars_to_fix), offspring.handle_ptr->get_stream()));
+      "FP rec: offspring hash 0x%x", offspring.get_hash());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuh` around lines
72 - 75, The determinism log is hashing vars_to_fix before it is populated,
causing stale/uninitialized data to be logged; update the FP rec logging so that
detail::compute_hash(make_span(vars_to_fix), offspring.handle_ptr->get_stream())
is called only after vars_to_fix has been filled with the current recombination
data (or compute the hash of the correct populated span), i.e., move the
compute_hash call to after the code that populates vars_to_fix (or ensure
vars_to_fix is explicitly zeroed/populated beforehand) and then pass that
populated span into CUOPT_DETERMINISM_LOG alongside offspring.get_hash().
benchmarks/linear_programming/cuopt/run_mip.cpp-299-299 (1)

299-299: ⚠️ Potential issue | 🟡 Minor

Typo in commented line: "1solution" should be "solution".

This appears to be a debugging artifact where a character was accidentally prepended to comment out the line.

🔧 Fix the comment
-  // 1solution.write_to_sol_file(base_filename + ".sol", handle_.get_stream());
+  // solution.write_to_sol_file(base_filename + ".sol", handle_.get_stream());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/linear_programming/cuopt/run_mip.cpp` at line 299, The
commented-out line contains a typo: "1solution.write_to_sol_file(base_filename +
\".sol\", handle_.get_stream());" — remove the leading '1' so the comment reads
"solution.write_to_sol_file(base_filename + \".sol\", handle_.get_stream());"
(or uncomment it if intended to be active). Update the comment around the symbol
solution.write_to_sol_file and keep the references to base_filename and
handle_.get_stream() unchanged.
cpp/tests/mip/feasibility_jump_tests.cu-192-197 (1)

192-197: ⚠️ Potential issue | 🟡 Minor

Static map for determinism tracking has thread-safety concern.

The static std::unordered_map is not thread-safe. If Google Test runs multiple test cases in parallel (via --gtest_parallel), concurrent access could cause data races.

Consider using thread_local or ensuring tests run sequentially.

Also, EXPECT_NEAR(..., 1.0) is a relatively loose tolerance for a determinism check — if the solver is truly deterministic, the values should match exactly (within floating-point representation).

🔧 Suggested fix for thread safety
-  static std::unordered_map<std::string, double> first_val_map;
+  // Thread-local to avoid races if tests run in parallel
+  static thread_local std::unordered_map<std::string, double> first_val_map;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/feasibility_jump_tests.cu` around lines 192 - 197, The static
unordered_map first_val_map used to record determinism is not thread-safe;
replace it with a thread-safe alternative and tighten the comparison: either
make first_val_map thread_local or protect access with a mutex around
reads/writes to first_val_map (referencing first_val_map and test_instance), and
change the assertion from EXPECT_NEAR(solution.get_user_objective(),
first_val_map[test_instance], 1.0) to a stricter equality check such as
EXPECT_DOUBLE_EQ(solution.get_user_objective(), first_val_map[test_instance]) or
EXPECT_NEAR(..., epsilon) with a very small epsilon (e.g., 1e-12) to reliably
detect non-determinism while avoiding data races when reading
solution.get_user_objective().
cpp/tests/mip/presolve_test.cu-151-157 (1)

151-157: ⚠️ Potential issue | 🟡 Minor

Hash both cached-bound branches.

The composite hash only includes a.second[0].var_to_cached_bound_map. If a.second[1] diverges, this test still reports deterministic behavior.

Suggested fix
-    auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
-      a.second[0].var_to_cached_bound_map.begin(), a.second[0].var_to_cached_bound_map.end());
-    for (const auto& [var_id, cached_bound] : sorted_map) {
-      var_to_cached_bound_keys.push_back(var_id);
-      var_to_cached_bound_lb.push_back(cached_bound.lb);
-      var_to_cached_bound_ub.push_back(cached_bound.ub);
-    }
+    for (const auto& cache_entry : a.second) {
+      auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
+        cache_entry.var_to_cached_bound_map.begin(), cache_entry.var_to_cached_bound_map.end());
+      for (const auto& [var_id, cached_bound] : sorted_map) {
+        var_to_cached_bound_keys.push_back(var_id);
+        var_to_cached_bound_lb.push_back(cached_bound.lb);
+        var_to_cached_bound_ub.push_back(cached_bound.ub);
+      }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/presolve_test.cu` around lines 151 - 157, The test only hashes
a.second[0].var_to_cached_bound_map, so divergence in a.second[1] is ignored;
update the logic that builds sorted_map and the subsequent pushes
(var_to_cached_bound_keys, var_to_cached_bound_lb, var_to_cached_bound_ub) to
include both branches: iterate/merge a.second[0].var_to_cached_bound_map and
a.second[1].var_to_cached_bound_map (or otherwise combine their entries) in a
deterministic order (e.g., insert both into a std::map keyed by var_id) and then
push the keys and cached_bound_t (lb/ub) from that combined sorted_map so both
branches affect the composite hash.
cpp/src/branch_and_bound/pseudo_costs.cpp-223-260 (1)

223-260: ⚠️ Potential issue | 🟡 Minor

Forward work_ctx through this overload.

This wrapper accepts work_ctx but drops it when it calls trial_branching(), so callers using the atomic-counter overload silently bypass the new work-accounting path.

🩹 Proposed fix
   return trial_branching(original_lp,
                          settings,
                          var_types,
                          vstatus,
                          edge_norms,
                          basis_factors,
                          basic_list,
                          nonbasic_list,
                          branch_var,
                          branch_var_lower,
                          branch_var_upper,
                          upper_bound,
                          bnb_lp_iter_per_node,
                          start_time,
                          upper_max_lp_iter,
                          lower_max_lp_iter,
-                         total_lp_iter);
+                         total_lp_iter,
+                         work_ctx);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 223 - 260, The
wrapper trial_branching_generic currently accepts a cuopt::work_limit_context_t*
work_ctx parameter but calls trial_branching without forwarding it, which causes
callers using the work-accounting path to be bypassed; update the call to
trial_branching inside trial_branching_generic to pass the work_ctx argument
through (i.e., include the work_ctx when invoking trial_branching) so the
work-limit context is honored, keeping all other parameters unchanged; reference
trial_branching_generic and trial_branching to locate the change.

solution_t<i_t, f_t>& sol,
f_t lp_run_time_after_feasible,
timer_t& timer,
work_limit_timer_t& timer,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the relationship between work_limit_timer_t and timer_t
rg -n "work_limit_timer_t|class.*timer" cpp/src/utilities/timer.hpp cpp/src/utilities/termination_checker.hpp -A 5

# Check if work_limit_timer_t is a typedef or inherits from timer_t
ast-grep --pattern $'using work_limit_timer_t = $_'

Repository: aliceb-nv/cuopt-fork

Length of output: 2319


🏁 Script executed:

#!/bin/bash
# Extract function signatures from lb_constraint_prop.cu
echo "=== Checking apply_round signature around line 704 ==="
sed -n '700,710p' cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu

echo -e "\n=== Checking find_integer signature around line 759 ==="
sed -n '755,765p' cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu

echo -e "\n=== Checking run_repair_procedure signature around line 457 ==="
sed -n '453,463p' cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu

echo -e "\n=== Checking find_integer call in apply_round around line 729 ==="
sed -n '725,735p' cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu

Repository: aliceb-nv/cuopt-fork

Length of output: 2200


🏁 Script executed:

#!/bin/bash
# Check timer_t and termination_checker_t class definitions and any inheritance
echo "=== timer_t class definition ==="
sed -n '16,60p' cpp/src/utilities/timer.hpp

echo -e "\n=== termination_checker_t class definition (first 50 lines) ==="
sed -n '30,80p' cpp/src/utilities/termination_checker.hpp

echo -e "\n=== Check for inheritance and compatibility methods ==="
rg -A 2 "class termination_checker_t|struct termination_checker_t" cpp/src/utilities/termination_checker.hpp

echo -e "\n=== Check if termination_checker_t inherits from timer_t ==="
rg "termination_checker_t\s*:" cpp/src/utilities/termination_checker.hpp

Repository: aliceb-nv/cuopt-fork

Length of output: 3214


🏁 Script executed:

#!/bin/bash
# Check if termination_checker_t has methods compatible with timer_t interface
echo "=== Check termination_checker_t methods ==="
rg -A 3 "remaining_time|check_time_limit|elapsed_time" cpp/src/utilities/termination_checker.hpp | head -40

echo -e "\n=== Full termination_checker_t class (up to line 237) ==="
wc -l cpp/src/utilities/termination_checker.hpp

echo -e "\n=== Check for conversion operator or implicit conversion ==="
rg "operator timer_t|explicit|implicit" cpp/src/utilities/termination_checker.hpp

Repository: aliceb-nv/cuopt-fork

Length of output: 1890


🏁 Script executed:

#!/bin/bash
# Check if find_integer is a template function
echo "=== Search for find_integer definition ==="
rg -B 5 "find_integer\s*\(" cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu | head -30

echo -e "\n=== Check function signature and template status ==="
sed -n '745,765p' cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu

Repository: aliceb-nv/cuopt-fork

Length of output: 1290


Fix timer type incompatibility in apply_round and find_integer calls.

apply_round (line 704) accepts work_limit_timer_t& timer, but calls find_integer (line 729) which expects timer_t& timer (line 759). These are incompatible types—work_limit_timer_t is a typedef for termination_checker_t, which is unrelated to timer_t. Similarly, run_repair_procedure (line 457) also expects timer_t&. Update function signatures to use a consistent timer type across the call chain.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu` at line
704, The timer parameter types are inconsistent: apply_round currently takes
work_limit_timer_t& but calls find_integer and run_repair_procedure which expect
timer_t&; change the signatures of apply_round and any intermediate callers
(e.g., run_repair_procedure) to accept timer_t& (or change find_integer to
accept work_limit_timer_t& if you prefer that canonical type) so the same timer
type is used across the call chain; update all call sites to pass the matching
timer_t instance and adjust any typedef/using declarations if needed to make
timer_t the shared, compatible timer type.

@aliceb-nv aliceb-nv closed this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant