Skip to content

Incorporate determinism into GPU heuristics + parallel determinsitic B&B+GPU exploration#6

Closed
aliceb-nv wants to merge 277 commits into
release/26.04from
gpudet
Closed

Incorporate determinism into GPU heuristics + parallel determinsitic B&B+GPU exploration#6
aliceb-nv wants to merge 277 commits into
release/26.04from
gpudet

Conversation

@aliceb-nv
Copy link
Copy Markdown
Owner

Test PR to trigger coderabbit review

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

This PR refactors MIP incumbent solution handling, determinism control, and resource management throughout the codebase. Key changes include: replacing binary determinism modes with bitmask-based flags, introducing extended solution callbacks with origin and work-timestamp metadata, replacing timer-based termination with work-unit-aware termination_checker_t, adding comprehensive solution publication/injection infrastructure, and updating GPU/CPU heuristics and presolve components to support deterministic work accounting and reproducible execution.

Changes

Cohort / File(s) Summary
Constants and Public API
cpp/include/cuopt/linear_programming/constants.h, cpp/include/cuopt/linear_programming/cuopt_c.h
Replaced binary determinism mode constants with bitmask-based flags (CUOPT_DETERMINISM_NONE, CUOPT_DETERMINISM_BB, CUOPT_DETERMINISM_GPU_HEURISTICS, CUOPT_DETERMINISM_FULL). Added MIP solution origin enum constants (14 values, 0–13). Introduced extended callback support via cuOptMIPSolutionCallbackInfo struct and cuOptMIPGetSolutionCallbackExt callback type.
C API Implementation
cpp/src/pdlp/cuopt_c.cpp
Added c_get_solution_callback_ext_t callback wrapper class and cuOptSetMIPGetSolutionCallbackExt C API function to register extended solution callbacks with origin and work-timestamp metadata.
Callback Infrastructure
cpp/src/mip_heuristics/solution_callbacks.cuh, cpp/include/cuopt/linear_programming/utilities/internals.hpp
Introduced solution_publication_t and solution_injection_t template classes for managing MIP solution callback lifecycle, including mip_solution_origin_t enum, mip_solution_callback_info_t struct, and get_solution_callback_ext_t polymorphic callback interface.
Termination/Work Accounting
cpp/src/utilities/termination_checker.hpp, cpp/src/utilities/work_limit_timer.hpp, cpp/src/utilities/work_limit_context.hpp
Added termination_checker_t unifying time and work-unit termination, work_limit_context_t tracking deterministic work with producer/consumer synchronization, and work_limit_timer_t as backward-compatible alias. Updated seed_generator to support per-thread state with epoch-based resets.
Determinism Logging
cpp/src/utilities/determinism_log.hpp, cpp/src/utilities/cuda_helpers.cuh
Added CUOPT_DETERMINISM_LOG macro infrastructure and NVTX-based memory initialization marking via mark_memory_as_initialized and mark_span_as_initialized helpers.
Branch-and-Bound Core
cpp/src/branch_and_bound/branch_and_bound.cpp, cpp/src/branch_and_bound/branch_and_bound.hpp, cpp/src/branch_and_bound/mip_node.hpp
Extended solution ingestion APIs to track mip_solution_origin_t and work_timestamp. Added emit_solution_callback and emit_solution_callback_from_crushed to publish incumbents. Refactored deterministic event replay to use deterministic_replay_solution_t. Added custom destructor to mip_node_t for non-recursive tree teardown. Changed report_heuristic signature to accept work time parameter.
Deterministic Workers
cpp/src/branch_and_bound/deterministic_workers.hpp
Extended queued_integer_solution_t with mip_solution_origin_t origin field. Added nodes_explored tracking to deterministic_snapshot_t and deterministic_worker_base_t. Updated BFS child-queuing logic and diving worker origin assignment.
Pseudo-Costs/Strong Branching
cpp/src/branch_and_bound/pseudo_costs.cpp, cpp/src/branch_and_bound/pseudo_costs.hpp
Added work-unit context propagation through strong branching and trial branching. Introduced reliable_variable_selection_core unified template accepting work context and pseudo-cost update callbacks. Extended pseudo_cost_snapshot_t with strong_branching_lp_iter_ field and record_update method.
MIP Solver Settings
cpp/include/cuopt/linear_programming/mip/solver_settings.hpp, cpp/src/dual_simplex/simplex_solver_settings.hpp
Added bb_work_unit_scale, gpu_heur_work_unit_scale, cpufj_work_unit_scale (default 1.0) and gpu_heur_wait_for_exploration settings. Changed determinism_mode default from CUOPT_MODE_OPPORTUNISTIC to CUOPT_DETERMINISM_NONE. Replaced solution_callback with new_incumbent_callback accepting extended signature.
LP Solver Utilities
cpp/src/dual_simplex/basis_updates.cpp, cpp/src/dual_simplex/bound_flipping_ratio_test.cpp, cpp/src/dual_simplex/bound_flipping_ratio_test.hpp, cpp/src/dual_simplex/phase2.cpp
Added null checks and guards on std::log2 calls to handle degenerate cases (added size > 1 condition). Added work_unit_context null pointer checks before synchronization calls. Value-initialized work_estimate_ in bound_flipping_ratio_test_t.
Feasibility Jump
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu, cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh, cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu, cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh
Added deterministic work estimator initialization, work-unit accounting via charge_deterministic_iteration_work, score deterministic tie-breaking, constraint sorting in deterministic mode. Changed fj_staged_score_t from float to int32_t. Renamed init_lhs_and_violation to init_lhs_and_violated_constraints. Added work_limit field to fj_settings_t.
FJ Common/CPU
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_impl_common.cuh, cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu, cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh, cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh, cpp/src/mip_heuristics/feasibility_jump/utils.cuh
Updated constraint scoring to accept external weight parameters. Added CPU work-unit scaling environment variable support. Added work_budget field to fj_cpu_climber_t. Added sort method to contiguous_set_t and deterministic work charging to load-balancing path.
Diversity/Recombination
cpp/src/mip_heuristics/diversity/diversity_manager.cu, cpp/src/mip_heuristics/diversity/diversity_manager.cuh, cpp/src/mip_heuristics/diversity/recombiners/*
Changed timer type from timer_t to work_limit_timer_t throughout. Updated recombiner return types from pair<solution, bool> to tuple<solution, bool, double work>. Modified run_presolve signature to accept termination_checker_t&. Added solution origin tracking and callback wiring updates. Updated init_enabled_recombiners to accept context for determinism-based filtering.
Diversity Population
cpp/src/mip_heuristics/diversity/population.cu, cpp/src/mip_heuristics/diversity/population.cuh
Removed local solution_origin_t enum, switched to internals::mip_solution_origin_t. Added drained_external_solution_t struct pairing solution with origin. Updated callback invocation to branch on deterministic mode. Changed external solution handling and drain mechanics.
Local Search
cpp/src/mip_heuristics/local_search/local_search.cu, cpp/src/mip_heuristics/local_search/local_search.cuh, cpp/src/mip_heuristics/local_search/feasibility_pump/*, cpp/src/mip_heuristics/local_search/line_segment_search/*, cpp/src/mip_heuristics/local_search/rounding/*
Replaced timer_t with work_limit_timer_t in all LS methods. Added context parameter to line_segment_search_t constructor. Updated repair_problem signatures. Added deterministic iteration caps and work-unit accounting in bounds repair. Added extensive determinism logging throughout LS phases.
Presolve Components
cpp/src/mip_heuristics/presolve/bounds_presolve.cu, cpp/src/mip_heuristics/presolve/bounds_presolve.cuh, cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu, cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuh, cpp/src/mip_heuristics/presolve/multi_probe.cu, cpp/src/mip_heuristics/presolve/multi_probe.cuh, cpp/src/mip_heuristics/presolve/probing_cache.cu, cpp/src/mip_heuristics/presolve/probing_cache.cuh, cpp/src/mip_heuristics/presolve/lb_probing_cache.cu
Updated bound_update_loop and compute_probing_cache signatures to accept work_limit_timer_t& instead of timer_t. Added determinism-gated iteration limits and sorting. Introduced WORK_LIMIT termination criterion. Added set_deterministic method to third_party_presolve_t.
Presolve Utilities
cpp/src/mip_heuristics/presolve/bounds_update_data.cu, cpp/src/mip_heuristics/presolve/utils.cuh
Added explicit device buffer initialization with NaN/−1 fill. Added WORK_LIMIT to termination_criterion_t enum.
Problem/Solution Management
cpp/src/mip_heuristics/problem/problem.cu, cpp/src/mip_heuristics/problem/problem.cuh, cpp/src/mip_heuristics/problem/presolve_data.cu, cpp/src/mip_heuristics/problem/presolve_data.cuh, cpp/src/mip_heuristics/problem/problem_fixing.cuh, cpp/src/mip_heuristics/solution/solution.cu, cpp/src/mip_heuristics/solution/solution.cuh
Added handle override parameters to post-process/uncrush assignment methods. Updated branch_and_bound_callback signature to include mip_solution_origin_t. Added solution hash computation via get_hash(). Removed lower_slack/upper_slack from solution state. Added solution constructor accepting handle override.
Relaxed LP
cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu, cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cuh
Added iteration_limit and work_limit parameters to relaxed_lp_settings_t. Implemented determinism mode LP iteration estimation and work scaling. Changed logging levels and added iteration count tracking.
PDLP Scaling
cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu, cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cuh
Added optional cudaStream_t stream_override parameter to unscale_solutions overloads for stream flexibility.
MIP Solver Integration
cpp/src/mip_heuristics/solve.cu, cpp/src/mip_heuristics/solver.cu, cpp/src/mip_heuristics/solver.cuh, cpp/src/mip_heuristics/solver_context.cuh
Changed run_mip to accept termination_checker_t& instead of timer_t&. Updated mip_solver_t constructor to store termination reference. Replaced solution_callback with new_incumbent_callback. Introduced bb_callback_adapter_t for B&B incumbent publishing. Added solution_publication and solution_injection to solver context. Implemented bitmask-based determinism checks.
Utilities
cpp/src/utilities/copy_helpers.hpp, cpp/src/utilities/timer.hpp, cpp/src/utilities/work_unit_scheduler.cpp
Added make_span overloads for rmm::device_scalar. Extended timer_t::check_time_limit signature with source context parameters. Updated work unit scheduler to use set_current_work instead of direct assignment.
Constants and Macros
cpp/src/mip_heuristics/mip_constants.hpp
Added work scale constants: BB_BASE_WORK_SCALE=1.0, GPU_HEUR_BASE_WORK_SCALE=0.4, CPUFJ_BASE_WORK_SCALE=1.0.
Test Infrastructure
cpp/tests/CMakeLists.txt, cpp/tests/mip/CMakeLists.txt, cpp/tests/mip/determinism_test.cu, cpp/tests/mip/determinism_utils.cuh, cpp/tests/mip/diversity_test.cu, cpp/tests/mip/feasibility_jump_tests.cu, cpp/tests/mip/load_balancing_test.cu, cpp/tests/mip/local_search_test.cu, cpp/tests/mip/mip_utils.cuh, cpp/tests/mip/multi_probe_test.cu, cpp/tests/mip/presolve_test.cu
Added CUPTI CMake detection and helper function. Enabled/added FJ, local search, and diversity tests. Extended determinism tests with callback sequence validation, GPU heuristic reproducibility checks, and multi-phase determinism verification. Added determinism_utils.cuh with spin-kernel helpers for stream synchronization testing. Added reusable MIP solver setup utilities.
Build and Dataset
cpp/CMakeLists.txt, benchmarks/linear_programming/cuopt/run_mip.cpp, ci/compute-sanitizer-suppressions.xml, datasets/mip/download_miplib_test_dataset.sh
Added commented-out PAPI header/library detection. Updated benchmark to add incumbent tracing callback, determinism mode selection, work-unit scale settings, and CSV output. Added Compute Sanitizer suppression rules for uninitialized memory patterns. Added supportcase42 MPS instance to download script.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gpudet

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
cpp/src/mip_heuristics/solver.cu (1)

302-317: ⚠️ Potential issue | 🔴 Critical

Deterministic B&B currently drops GPU-heuristic incumbents.

When CUOPT_DETERMINISM_BB is enabled, branch_and_bound_callback is left unset, so heuristic incumbents never reach queue_external_solution_deterministic(...). The fallback at the end then discards dm.run_solver()’s incumbent unless B&B itself retired a feasible solution. In this mode the solver can return no incumbent even though the heuristic side found one, and B&B also loses incumbent-driven pruning while it runs. Please either wire the deterministic handoff or stop launching concurrent heuristics for this mode until that path exists.

Also applies to: 365-379

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solver.cu` around lines 302 - 317, When
CUOPT_DETERMINISM_BB is set the code leaves
context.problem_ptr->branch_and_bound_callback unset causing GPU-heuristic
incumbents to be dropped; fix by wiring a deterministic handoff: in the branch
where (context.settings.determinism_mode & CUOPT_DETERMINISM_BB) set
context.problem_ptr->branch_and_bound_callback to a lambda that calls
branch_and_bound->queue_external_solution_deterministic(solution, 0.0) (use
branch_and_bound.get() as in the non-deterministic path), and keep
branch_and_bound->set_concurrent_lp_root_solve(false); alternatively (if
deterministic GPU heuristics are not available) ensure concurrent heuristics are
not launched by keeping set_concurrent_lp_root_solve(false) and explicitly
disabling any code paths that start heuristic runs when CUOPT_DETERMINISM_BB is
enabled so dm.run_solver() can’t produce an orphaned incumbent.
cpp/src/mip_heuristics/diversity/diversity_manager.cu (2)

126-144: ⚠️ Potential issue | 🔴 Critical

Initialize gpu_heur_loop.deterministic before the env-var early returns.

If CUOPT_CONFIG_ID is unset, invalid, or out of range, the constructor returns before Line 142. That leaves context.gpu_heur_loop.deterministic at its default and silently disables the deterministic GPU-heuristic path in normal runs.

♻️ Suggested fix
   const char* env_config_id_raw = std::getenv("CUOPT_CONFIG_ID");
+  context.gpu_heur_loop.deterministic =
+    (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
+
   if (env_config_id_raw == nullptr) { return; }
@@
-
-  context.gpu_heur_loop.deterministic =
-    (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 126 -
144, Set context.gpu_heur_loop.deterministic immediately during initialization
so it is always configured even when CUOPT_CONFIG_ID handling returns early;
specifically, before reading/validating env_config_id (env var CUOPT_CONFIG_ID)
and before any early returns (including the nullptr check, stoi catch, and
out-of-range check against max_config), assign
context.gpu_heur_loop.deterministic = (context.settings.determinism_mode &
CUOPT_DETERMINISM_GPU_HEURISTICS) so the deterministic GPU-heuristics flag is
not left at its default when env parsing fails or is skipped.

230-249: ⚠️ Potential issue | 🟠 Major

Presolve substeps are not chained to the presolve budget.

presolve_timer and probing_timer are parented to *context.termination, probing is sized from the original time_limit, and trivial_presolve is gated on global_timer. After bounds propagation spends time/work, probing and trivial presolve can still consume a fresh full slice instead of the remaining presolve budget.

♻️ Suggested fix
-  work_limit_timer_t presolve_timer(context.gpu_heur_loop, time_limit, *context.termination);
+  work_limit_timer_t presolve_timer(context.gpu_heur_loop, time_limit, global_timer);
@@
-    f_t time_for_probing_cache = std::min(max_time_on_probing, time_limit);
-    work_limit_timer_t probing_timer(
-      context.gpu_heur_loop, time_for_probing_cache, *context.termination);
+    f_t time_for_probing_cache =
+      std::min(max_time_on_probing, presolve_timer.remaining_time());
+    work_limit_timer_t probing_timer(
+      context.gpu_heur_loop, time_for_probing_cache, presolve_timer);
@@
-  if (!global_timer.check_time_limit()) { trivial_presolve(*problem_ptr, remap_cache_ids); }
+  if (!presolve_timer.check_time_limit()) { trivial_presolve(*problem_ptr, remap_cache_ids); }

Also applies to: 256-260

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 230 -
249, Presolve substeps (presolve_timer, probing_timer, and trivial_presolve
logic) currently each use the original termination/time_limit so they can exceed
the overall presolve budget; change them to consume the remaining presolve
budget instead: after creating presolve_timer (work_limit_timer_t
presolve_timer(..., time_limit, *context.termination)), compute the remaining
time/termination from presolve_timer (or derive a child termination) and pass
that into the probing_timer and into the trivial_presolve gating (replace use of
time_limit and global_timer with presolve_timer.remaining_time()/presolve
termination), and cap time_for_probing_cache with that remaining budget
(involving diversity_config.max_time_on_probing as before) so all substeps are
chained to presolve_timer and cannot exceed the presolve budget.
cpp/src/mip_heuristics/local_search/local_search.cu (1)

349-365: ⚠️ Potential issue | 🟠 Major

Clamp the infeasible-search slice against the caller’s remaining budget.

In the infeasible branch, timer is recreated from *context.termination with a fixed 0.25/0.5 slice. If the caller only has less than that remaining, this path still hands local search a fresh larger budget instead of respecting the incoming checker.

♻️ Suggested fix
 bool local_search_t<i_t, f_t>::run_local_search(solution_t<i_t, f_t>& solution,
                                                 const weight_t<i_t, f_t>& weights,
                                                 work_limit_timer_t timer,
                                                 const ls_config_t<i_t, f_t>& ls_config)
 {
@@
+  work_limit_timer_t phase_timer = timer;
   if (!solution.get_feasible()) {
     if (ls_config.at_least_one_parent_feasible) {
       fj_settings.time_limit = 0.5;
     } else {
       fj_settings.time_limit = 0.25;
     }
-    timer = work_limit_timer_t(context.gpu_heur_loop, fj_settings.time_limit, *context.termination);
+    const auto phase_limit = std::min<f_t>(fj_settings.time_limit, timer.remaining_time());
+    phase_timer            = work_limit_timer_t(context.gpu_heur_loop, phase_limit, timer);
   } else {
-    fj_settings.time_limit = std::min(1., timer.remaining_time());
+    const auto phase_limit = std::min<f_t>(1., timer.remaining_time());
+    fj_settings.time_limit = phase_limit;
+    phase_timer            = work_limit_timer_t(context.gpu_heur_loop, phase_limit, timer);
   }
@@
-    is_feas = run_fj_line_segment(solution, timer, ls_config);
+    is_feas = run_fj_line_segment(solution, phase_timer, ls_config);
   } else {
@@
-    is_feas = run_fj_annealing(solution, timer, ls_config);
-    if (lp_optimal_exists) { is_feas = run_fj_line_segment(solution, timer, ls_config); }
+    is_feas = run_fj_annealing(solution, phase_timer, ls_config);
+    if (lp_optimal_exists) { is_feas = run_fj_line_segment(solution, phase_timer, ls_config); }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/local_search.cu` around lines 349 - 365,
The infeasible branch recreates timer with a fixed fj_settings.time_limit
(0.25/0.5) ignoring the caller's remaining budget; modify run_local_search so
before assigning fj_settings.time_limit you query the incoming timer's remaining
budget and set fj_settings.time_limit = min(desired_slice, remaining_budget)
(use the existing timer instance's remaining-time accessor) then construct the
new work_limit_timer_t(context.gpu_heur_loop, fj_settings.time_limit,
*context.termination); update references to timer, fj_settings.time_limit,
ls_config.at_least_one_parent_feasible, context.gpu_heur_loop and
context.termination accordingly.
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu (1)

401-403: ⚠️ Potential issue | 🟠 Major

Invalidate skipped binary vars before returning.

In FJ_MTM_SATISFIED, this early return leaves jump_move_scores, jump_move_delta, and jump_move_infeasibility untouched for binary objective variables. best_sat_cstr_mtm_move() later reduces over all fj.objective_vars with cached scores, so a binary var can still win with stale state from an earlier pass.

♻️ Suggested fix
   if constexpr (move_type == MTMMoveType::FJ_MTM_SATISFIED) {
-    if (fj.pb.is_binary_variable[var_idx]) return;
+    if (fj.pb.is_binary_variable[var_idx]) {
+      if (threadIdx.x == 0) {
+        fj.jump_move_delta[var_idx]         = 0;
+        fj.jump_move_scores[var_idx]        = fj_t<i_t, f_t>::move_score_t::invalid();
+        fj.jump_move_infeasibility[var_idx] = -std::numeric_limits<f_t>::infinity();
+      }
+      return;
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu` around
lines 401 - 403, In the early-return for MTMMoveType::FJ_MTM_SATISFIED when
fj.pb.is_binary_variable[var_idx] is true, invalidate the cached per-variable
outputs (jump_move_scores, jump_move_delta, jump_move_infeasibility) for this
var_idx before returning so stale values cannot win in best_sat_cstr_mtm_move();
set the score entry to a definitive "ignore" value (e.g., -INFINITY or other
sentinel used elsewhere) and reset delta/infeasibility to their neutral/zero
equivalents so subsequent reductions over fj.objective_vars skip this binary
variable.
🟠 Major comments (25)
ci/compute-sanitizer-suppressions.xml-109-122 (1)

109-122: ⚠️ Potential issue | 🟠 Major

Narrow this cuMemcpyDtoHAsync suppression to avoid masking real bugs.

This rule is too broad for InitcheckApiError: it can suppress unrelated 16-byte host-copy defects anywhere that matches cuMemcpyDtoHAsync.*. Add at least one additional discriminating frame (e.g., owning module/function) tied to the known harmless tuple-padding path.

Suggested tightening
   <record>
     <kind>InitcheckApiError</kind>
     <level>Error</level>
     <what>
       <text>Host API uninitialized memory access</text>
       <accessSize>16</accessSize>
     </what>
     <hostStack>
       <frame>
         <func>cuMemcpyDtoHAsync.*</func>
         <module>.*libcuda.so.*</module>
       </frame>
+      <frame>
+        <module>.*libcuopt.so.*</module>
+      </frame>
+      <!-- Optional: add a specific thrust/cub caller frame if stable in your stack traces -->
     </hostStack>
   </record>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/compute-sanitizer-suppressions.xml` around lines 109 - 122, The
suppression for InitcheckApiError is too broad because the <hostStack> only
matches cuMemcpyDtoHAsync.* and libcuda.so, so narrow it by adding an extra
discriminating frame tied to the known harmless tuple-padding path: find the
<record> with <kind>InitcheckApiError</kind> that contains the cuMemcpyDtoHAsync
frame and add one more <frame> entry (e.g., a <func> regex and/or <module>
regex) that matches the specific internal function or module responsible for the
benign tuple-padding copy (for example the project-specific function name or
module that owns the padding path), so only that call-site is suppressed rather
than all cuMemcpyDtoHAsync matches.
ci/compute-sanitizer-suppressions.xml-209-222 (1)

209-222: ⚠️ Potential issue | 🟠 Major

Avoid blanket suppression of all cuMemcpyDtoDAsync uninitialized-copy errors.

This rule suppresses virtually every DtoD uninitialized-copy API error and can hide true memory-initialization regressions in CI. Scope it to a known-safe call path (module + caller frame) instead of suppressing globally.

Suggested tightening
   <record>
     <kind>InitcheckApiError</kind>
     <level>Error</level>
     <what>
       <text>Host API uninitialized memory access</text>
     </what>
     <hostStack>
       <frame>
         <func>cuMemcpyDtoDAsync.*</func>
         <module>.*libcuda.so.*</module>
       </frame>
+      <frame>
+        <module>.*libcuopt.so.*</module>
+      </frame>
+      <!-- Add stable caller symbol(s) for the intended benign path -->
     </hostStack>
   </record>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/compute-sanitizer-suppressions.xml` around lines 209 - 222, The
suppression entry currently blanket-matches Host API uninitialized memory
accesses for any cuMemcpyDtoDAsync call and can hide real regressions; narrow
the rule by restricting the hostStack to the known-safe caller and module (e.g.,
add a specific <frame> with the exact calling function regex and the concrete
module basename or path instead of just .*libcuda.so.*) or add an additional
<frame> parent call pattern to limit scope to the trusted call path (reference
the existing cuMemcpyDtoDAsync hostStack record to update); ensure the revised
record only suppresses the specific safe code path rather than all
cuMemcpyDtoDAsync invocations.
cpp/src/branch_and_bound/mip_node.hpp-75-76 (1)

75-76: ⚠️ Potential issue | 🟠 Major

Default move operations don't update child parent pointers.

std::unique_ptr<mip_node_t> children[2] transfer ownership when moved, but each child's raw parent pointer remains unchanged. Moving any node with populated children will leave those children pointing to the moved-from location, breaking upward traversal invariants.

While current code avoids this by moving the root only before children are added, the design is unsafe. Either delete move support or implement custom move operations that update direct children's parent pointers to the new location.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/mip_node.hpp` around lines 75 - 76, The defaulted
move constructor and move assignment for mip_node_t are unsafe because moving
transfers the children unique_ptrs but leaves each child's raw parent pointer
pointing to the moved-from object; either delete move support or implement
custom move operations: write a custom mip_node_t(mip_node_t&&) and mip_node_t&
operator=(mip_node_t&&) that move all members (including children array) and
after moving set children[0]->parent and children[1]->parent (when non-null) to
this, and ensure the moved-from object's child pointers are nullified, or
alternatively mark the move ctor/assign as deleted to enforce no moves. Ensure
you update both direct children and preserve other member moves correctly in the
implementations.
cpp/src/mip_heuristics/diversity/weights.cuh-30-33 (1)

30-33: ⚠️ Potential issue | 🟠 Major

Don't default get_hash() to cuda_stream_default.

This object is written on the caller's handle stream, but a zero-arg get_hash() will read cstr_weights and objective_weight on the default stream. That can observe stale state or force hidden cross-stream synchronization, which makes the determinism hash itself unreliable.

Minimal fix
-  uint32_t get_hash(rmm::cuda_stream_view stream = rmm::cuda_stream_default) const
+  uint32_t get_hash(rmm::cuda_stream_view stream) const
#!/bin/bash
set -euo pipefail
rg -nP '\.get_hash\s*\(\s*\)'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/weights.cuh` around lines 30 - 33, The
get_hash method currently defaults its stream parameter to
rmm::cuda_stream_default, which may read cstr_weights and objective_weight on
the wrong stream; change get_hash to require an explicit rmm::cuda_stream_view
parameter (remove the default value) and ensure callers pass the caller's handle
stream; use that stream when calling compute_hash(cstr_weights, stream) and
compute_hash(objective_weight.value(stream)) and update all call sites that
invoke get_hash() (search for ".get_hash(") to forward the correct stream to
avoid cross-stream reads/synchronization.
cpp/tests/mip/determinism_utils.cuh-47-54 (1)

47-54: ⚠️ Potential issue | 🟠 Major

This helper can stall teardown for up to 1s per instance.

The polling thread sleeps for 1..1000 ms between checks, and spin_stream_raii_t::~spin_stream_raii_t() can only stop by setting the flag and joining. If destruction happens during that sleep, test shutdown inherits the full delay. A wakeable stop signal, or at least a much shorter fixed backoff, would keep determinism tests fast and reproducible.

Also applies to: 64-68

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_utils.cuh` around lines 47 - 54, The polling loop
in spin_stream_raii_t uses std::this_thread::sleep_for(rand()%1000+1) which can
stall teardown up to 1s; change the loop to use a wakeable wait (preferred) or a
much shorter fixed backoff: add a std::condition_variable and std::mutex to the
spin_stream_raii_t, replace sleep_for with cv.wait_for(lock, short_duration,
[&]{ return stop_flag; }) and in the destructor set the stop_flag, notify_one()
and join; alternatively, if you prefer simpler change, replace the randomized
1..1000ms sleep with a small fixed backoff (e.g., 1–10ms) so spin_kernel,
host_copy(flag,...), stream_view and the destructor join return quickly.
cpp/src/mip_heuristics/presolve/bounds_presolve.cu-174-176 (1)

174-176: ⚠️ Potential issue | 🟠 Major

Avoid permanently mutating settings.iteration_limit inside the solve loop.

At Line 175, the member is clamped in place. If this object is reused, later solves stay capped at 50 iterations even outside the intended scope. Use a local effective limit instead.

Proposed fix
-  if ((context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS)) {
-    settings.iteration_limit = std::min(settings.iteration_limit, 50);
-  }
+  i_t effective_iteration_limit = settings.iteration_limit;
+  if ((context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS)) {
+    effective_iteration_limit = std::min(effective_iteration_limit, i_t{50});
+  }

   i_t iter;
   upd.init_changed_constraints(pb.handle_ptr);
-  for (iter = 0; iter < settings.iteration_limit; ++iter) {
+  for (iter = 0; iter < effective_iteration_limit; ++iter) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/bounds_presolve.cu` around lines 174 - 176,
The code currently mutates settings.iteration_limit in place when
context.settings.determinism_mode has CUOPT_DETERMINISM_GPU_HEURISTICS, which
permanently clamps the setting for future solves; instead compute a local
effective limit (e.g. effective_iteration_limit =
std::min(settings.iteration_limit, 50)) inside the solve loop or the function
that uses it and use that local variable wherever the iteration cap is applied
(refer to settings.iteration_limit usage in this file and replace with
effective_iteration_limit), leaving the settings object unmodified.
cpp/src/mip_heuristics/presolve/probing_cache.cu-707-712 (1)

707-712: ⚠️ Potential issue | 🟠 Major

Resolve duplicate substitutions with a stable rule.

This keeps whichever proposal for a substituted_var is visited first, but the traversal is driven by std::unordered_map/std::unordered_set iteration order. In a determinism-sensitive path that makes the winning substitution hash-order dependent, so equivalent runs can still diverge. Pick the winner by a stable key (for example timestamp, then variable ids) before applying and storing it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/probing_cache.cu` around lines 707 - 712, The
current loop over all_substitutions uses unordered iteration and keeps the
first-seen substitution for each substitution.substituted_var, which makes
winner selection non-deterministic; instead, build a deterministic selection
pass: collect all proposals into a temporary map keyed by
substitution.substituted_var, compare candidates by a stable comparator (first
substitution.timestamp, then substituting_var id, then substituted_var id) to
choose the winner for each substituted_var, then iterate that map to apply and
log the chosen substitutions; update references in this block
(all_substitutions, substitution.substituted_var, substituting_var, and
substitution.timestamp) accordingly and remove reliance on seen_substituted
during the first discovery loop.
cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu-704-705 (1)

704-705: ⚠️ Potential issue | 🟠 Major

Thread the work-limit timer all the way through repair.

apply_round() now creates a work_limit_timer_t, but it still hands it off to find_integer(timer_t&), and the repair path below continues with plain timer_t instances. That strips the parent termination/work-limit context in the deepest loop, so repair can continue past the root deadline even though this entrypoint was updated.

Also applies to: 724-730

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu` around
lines 704 - 705, The work-limit context is lost because apply_round() creates a
work_limit_timer_t but calls find_integer(timer_t&) and subsequent repair paths
use plain timer_t; update the function signatures and calls so the
work_limit_timer_t is threaded through: change find_integer(timer_t&) to
find_integer(work_limit_timer_t&) (and similarly for any functions in the repair
path that accept timer_t, e.g., the repair function(s) invoked from
find_integer), update all call sites (including apply_round and the deeper
repair loop functions referenced in lb_constraint_prop.cu lines around 704 and
724–730) to pass the work_limit_timer_t instance instead of timer_t, and ensure
any places constructing timers use or wrap into work_limit_timer_t so the parent
deadline/termination context is preserved.
cpp/include/cuopt/linear_programming/cuopt_c.h-74-90 (1)

74-90: ⚠️ Potential issue | 🟠 Major

The exported callback info struct is missing its advertised size field.

The docs for cuOptMIPGetSolutionCallbackExt say callback_info->struct_size is always set, but cuOptMIPSolutionCallbackInfo only contains origin and work_timestamp. As shipped, the new C ABI cannot support the append-only growth contract described here, and clients following the docs will read past the struct. Either add struct_size as the first field or remove that contract from the API/docs.

Suggested fix
 typedef struct {
+  /** Size of this struct in bytes. */
+  uint64_t struct_size;
   /** Which solver component found this solution (CUOPT_MIP_SOLUTION_ORIGIN_*). */
   uint32_t origin;
   /** Deterministic work-unit timestamp at which the solution was found.
    *  Monotonically increasing across successive callbacks within a single solve.
    *  In non-deterministic mode this value is informational only. */
   double work_timestamp;
 } cuOptMIPSolutionCallbackInfo;

Also applies to: 734-751

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/cuopt_c.h` around lines 74 - 90, The
exported callback struct cuOptMIPSolutionCallbackInfo lacks the advertised
struct_size field; add a uint32_t struct_size as the first member of
cuOptMIPSolutionCallbackInfo (so size is always available and future fields can
be appended) and update any places that populate this struct to set struct_size
accordingly; also make the same change for the other callback-ext struct
mentioned (the one referenced at lines 734-751) so both follow the ABI
append-only contract and consumers can safely check struct_size before accessing
newer fields.
cpp/src/mip_heuristics/solver_context.cuh-72-74 (1)

72-74: ⚠️ Potential issue | 🟠 Major

Make termination non-null by construction.

This shared context now carries a raw pointer that defaults to nullptr, but new timer setup in this PR dereferences *context.termination unconditionally. That leaves a crash window until some external setup step patches the pointer. Please require a checker in the constructor or keep a root checker by value/reference instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solver_context.cuh` around lines 72 - 74, The shared
context currently stores a raw pointer termination that defaults to nullptr and
is later dereferenced by timer setup; update the solver context so termination
cannot be null at construction: either (A) change the member from
cuopt::termination_checker_t* termination to a by-value or reference member
(e.g., cuopt::termination_checker_t termination or cuopt::termination_checker_t&
termination) so the root checker always exists, or (B) add a constructor for the
context (the struct used by mip_solver_t) that requires a
cuopt::termination_checker_t* parameter, asserts or throws on null, and
initializes termination from that pointer; also update mip_solver_t creation
sites to pass the checker. Ensure all references to termination in timer setup
use the non-null member.
cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu-75-98 (1)

75-98: ⚠️ Potential issue | 🟠 Major

Preserve the hard wall-clock cap in deterministic LP mode.

This branch sets pdlp_settings.time_limit to infinity, but the solve still runs under a plain timer_t later in the function. That means a single LP call can blow past the caller/global deadline whenever work_limit is finite. The work-derived iteration budget should be an additional limit, not a replacement for the outer termination ceiling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu` around lines 75 - 98, The
code currently zeroes out the wall-clock cap by setting pdlp_settings.time_limit
= infinity in the determinism_mode branch, which lets the later plain timer_t
bypass the caller's work_limit; instead, preserve the outer deadline by
assigning pdlp_settings.time_limit to the caller's work_limit when that limit is
finite (use the same variable logged as settings.work_limit or work_limit,
remembering units — pdlp_settings.time_limit is in seconds), and only set it to
infinity when work_limit is infinite; keep the iteration budget logic
(estim_iters, pdlp_settings.iteration_limit) as an additional cap so both
iteration and wall-clock limits (used alongside timer_t) are enforced.
cpp/include/cuopt/linear_programming/utilities/internals.hpp-72-72 (1)

72-72: ⚠️ Potential issue | 🟠 Major

Keep existing callback type values stable.

Adding GET_SOLUTION_EXT in the middle changes SET_SOLUTION from 1 to 2. Any client callback class compiled against the previous header will still return the old value from the inline get_type(), so a newer library can misclassify it. Please give the enum explicit values and append the new member instead.

Suggested fix
-enum class base_solution_callback_type { GET_SOLUTION, GET_SOLUTION_EXT, SET_SOLUTION };
+enum class base_solution_callback_type {
+  GET_SOLUTION = 0,
+  SET_SOLUTION = 1,
+  GET_SOLUTION_EXT = 2,
+};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/utilities/internals.hpp` at line 72, The
enum base_solution_callback_type must keep existing numeric values stable:
explicitly assign values to the original members (e.g., GET_SOLUTION = 0,
SET_SOLUTION = 1) and move the new member to the end (e.g., GET_SOLUTION_EXT =
2) so existing client binaries still match their compiled get_type() results;
update the enum declaration in internals.hpp to use these explicit integer
assignments and ensure no other code assumes implicit ordering.
cpp/tests/mip/determinism_test.cu-296-299 (1)

296-299: ⚠️ Potential issue | 🟠 Major

Compare a GPU-origin subsequence, not just the first five raw callbacks.

The test only proves that GPU callbacks appear somewhere in the 32 captured entries. expect_callback_prefixes_bitwise_equal(..., 5, ...) still compares the unfiltered prefix, so a divergent GPU callback stream later in the capture can pass unnoticed. Filter to GPU origins before comparing, or extend the compared window until it includes the first N GPU callbacks.

Also applies to: 342-346

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_test.cu` around lines 296 - 299, The test currently
compares only the first raw callbacks (using callback_compare_count = 5) which
can miss GPU-origin callbacks occurring later; change the comparison to operate
on the filtered GPU-origin subsequence instead of the unfiltered prefix (or
alternatively increase the compared window to include the first N GPU
callbacks). Specifically, update the calls to
expect_callback_prefixes_bitwise_equal (and related checks around
callback_compare_count / callback_capture_limit) to first filter the captured
callbacks for origin == GPU and then compare the first N GPU callbacks (use
min_gpu_callback_count to choose N), ensuring you reference the variables
callback_compare_count, callback_capture_limit, min_gpu_callback_count and the
function expect_callback_prefixes_bitwise_equal when locating and modifying the
logic.
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu-1054-1054 (1)

1054-1054: ⚠️ Potential issue | 🟠 Major

Don't discard the explicit deterministic work budget.

This overwrites settings.work_limit with settings.time_limit, so a caller-provided work budget is never honored. host_loop() then builds its limiter from settings.time_limit, which means deterministic FJ is still effectively wall-clock-budgeted.

🛠️ Suggested fix
-  bool deterministic = (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
-  if (deterministic) {
-    settings.time_limit = std::max((f_t)0.0, settings.time_limit);
-    settings.work_limit = settings.time_limit;
-  }
+  bool deterministic = (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
+  settings.time_limit = std::max((f_t)0.0, settings.time_limit);
+  settings.work_limit = std::max((f_t)0.0, settings.work_limit);
-  work_limit_timer_t timer(context.gpu_heur_loop, settings.time_limit, *context.termination);
+  const f_t budget = deterministic_work_estimate ? settings.work_limit : settings.time_limit;
+  work_limit_timer_t timer(context.gpu_heur_loop, budget, *context.termination);

Also applies to: 1263-1267

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` at line 1054,
The code constructs a work_limit_timer_t using settings.time_limit, which
overwrites any caller-provided deterministic work budget (settings.work_limit)
and causes host_loop() to build its limiter from time_limit; change the
construction so the deterministic work budget is preserved—use
settings.work_limit when it is set/meaningful (or pass both values
appropriately) to initialize work_limit_timer_t (the place where timer is
created for context.gpu_heur_loop) and apply the same fix in the other
occurrence around the block at the later occurrence (lines near the second timer
creation), ensuring host_loop() sees the original settings.work_limit rather
than being forced to use settings.time_limit.
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu-1036-1040 (1)

1036-1040: ⚠️ Potential issue | 🟠 Major

Disable graph batching when work-limit termination is active.

use_graph stays true even under deterministic accounting, so the loop can only notice exhaustion after an entire iterations_per_graph batch completes. With the current default of 50, small work budgets can overshoot by a full graph launch.

🛠️ Suggested fix
-  const bool use_graph           = true;
+  const bool use_graph           = !deterministic_work_estimate;
   const i_t iterations_per_batch = use_graph ? iterations_per_graph : 1;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines
1036 - 1040, The loop currently keeps graph batching enabled even under
deterministic work accounting, so set use_graph to false when
deterministic_work_estimate is active to avoid overshooting work-limited
termination; change the initialization of use_graph to depend on
deterministic_work_estimate (e.g. use_graph = !deterministic_work_estimate) and
keep iterations_per_batch = use_graph ? iterations_per_graph : 1 so that when
context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS is set you
fall back to single-iteration launches.
cpp/src/mip_heuristics/problem/problem.cu-1426-1434 (1)

1426-1434: ⚠️ Potential issue | 🟠 Major

Guard the empty substitution case before indexing var_indices[0].

This validation still accepts an empty var_indices, and the next line unconditionally reads var_indices[0]. If a presolve path hands you an empty batch, this becomes UB/crash instead of a no-op.

🛠️ Suggested fix
   cuopt_assert((are_exclusive<i_t, f_t>(var_indices, var_to_substitute_indices)),
                "variables and var_to_substitute_indices are not exclusive");
+  if (var_indices.empty()) { return; }
   {
     std::vector<i_t> sorted_vi(var_indices);
     std::sort(sorted_vi.begin(), sorted_vi.end());
     cuopt_assert(std::adjacent_find(sorted_vi.begin(), sorted_vi.end()) == sorted_vi.end(),
                  "var_indices must not contain duplicates");
   }
   const i_t dummy_substituted_variable = var_indices[0];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/problem/problem.cu` around lines 1426 - 1434, The code
reads var_indices[0] without checking for empty input; before using
var_indices[0] (used to set dummy_substituted_variable) add an explicit guard
for the empty-substitution case: check var_indices.empty() immediately after
validating exclusivity (and before sorting/duplicate checks or use that check to
short-circuit) and either return/skip the substitution logic (no-op) or assert
with a clear message; update logic around are_exclusive<i_t,f_t>(var_indices,
var_to_substitute_indices) and the block that computes sorted_vi so they
handle/skip when var_indices is empty, ensuring dummy_substituted_variable is
only assigned when var_indices is non-empty.
cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu-1079-1082 (1)

1079-1082: ⚠️ Potential issue | 🟠 Major

Chain the repair timer to max_timer.

apply_round() just created max_timer to cap bounds propagation, but this child timer bypasses it and starts from the global termination checker. On a long outer solve, repair can consume more budget than max_timer still has before the next local limit check.

🛠️ Suggested fix
-      work_limit_timer_t repair_timer(
-        context.gpu_heur_loop, timer.remaining_time() / 5, *context.termination);
+      work_limit_timer_t repair_timer(
+        context.gpu_heur_loop, max_timer.remaining_time() / 5, max_timer);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu` around lines
1079 - 1082, The repair timer is currently constructed with the global
termination checker (*context.termination) and thus can bypass the cap imposed
by max_timer created in apply_round(); change the work_limit_timer_t
construction for repair_timer so it is chained to max_timer (i.e., use the
max_timer termination/parent instead of *context.termination) so repair respects
the max_timer budget — locate the work_limit_timer_t repair_timer initialization
in constraint_prop.cu and replace the termination argument with the
max_timer/its termination handle created earlier in apply_round().
cpp/src/mip_heuristics/problem/problem.cu-1493-1513 (1)

1493-1513: ⚠️ Potential issue | 🟠 Major

Use a stable sort to ensure deterministic floating-point reduction order.

The code explicitly aims for "deterministic reduction" but uses thrust::sort_by_key, which is unstable and does not preserve the relative order of equal keys. When two substitutions target the same variable, their deltas can be accumulated in different orders across runs. Since floating-point addition is not associative, different accumulation orders produce different results due to rounding. Replace with thrust::stable_sort_by_key to guarantee the reduction order remains canonical.

🛠️ Suggested fix
-  thrust::sort_by_key(handle_ptr->get_thrust_policy(),
-                      obj_coeff_keys.begin(),
-                      obj_coeff_keys.end(),
-                      obj_coeff_deltas.begin());
+  thrust::stable_sort_by_key(handle_ptr->get_thrust_policy(),
+                             obj_coeff_keys.begin(),
+                             obj_coeff_keys.end(),
+                             obj_coeff_deltas.begin());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/problem/problem.cu` around lines 1493 - 1513, The sort
used before reduction is unstable: replace
thrust::sort_by_key(handle_ptr->get_thrust_policy(), obj_coeff_keys.begin(),
obj_coeff_keys.end(), obj_coeff_deltas.begin()) with
thrust::stable_sort_by_key(...) so that equal keys in obj_coeff_keys keep a
canonical order before thrust::reduce_by_key; this change affects the sorting
step that feeds unique_keys/summed_deltas and ensures the subsequent
thrust::reduce_by_key and the for_each lambda updating objective_coefficients
produce deterministic results.
cpp/tests/mip/determinism_test.cu-140-159 (1)

140-159: ⚠️ Potential issue | 🟠 Major

expect_callback_prefixes_bitwise_equal() uses EXPECT_EQ on floating-point values but doesn't guarantee bitwise equality.

EXPECT_EQ on double will accept IEEE 754 equivalent values like +0.0 and -0.0 despite their bitwise difference, and may not catch subtle representation changes that indicate non-deterministic behavior. Since the function name explicitly promises bitwise equality and is used for determinism testing, compare the raw bits of objective, solution_bound, and each assignment entry instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_test.cu` around lines 140 - 159,
expect_callback_prefixes_bitwise_equal currently uses EXPECT_EQ on floating
fields but must enforce true bitwise equality; change comparisons for
callback_solution_t::objective, ::solution_bound and each double in ::assignment
to compare their raw IEEE-754 bit patterns (e.g., reinterpret_cast<const
uint64_t&> or memcpy into uint64_t and compare) instead of EXPECT_EQ on doubles,
keeping the existing failure messages and using EXPECT_EQ on the integer
bit-representations so +0.0 vs -0.0 and any bit-level differences are caught;
leave the size and origin checks as-is and apply this bitwise comparison for
each i and each j in the loops.
cpp/tests/mip/diversity_test.cu-103-107 (1)

103-107: ⚠️ Potential issue | 🟠 Major

Use the new bitmask determinism flags in these tests.

This file still passes CUOPT_MODE_DETERMINISTIC*, while the solver changes in this PR branch on CUOPT_DETERMINISM_*. If the legacy names are gone, this won’t compile; if they are aliased, the new tests still aren’t exercising the bitmask-based path you just added.

Also applies to: 163-166, 224-227, 249-253

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/diversity_test.cu` around lines 103 - 107, Replace legacy enum
usage with the new bitmask determinism flags: set settings.determinism_mode to
the CUOPT_DETERMINISM_* bitmask constant instead of CUOPT_MODE_DETERMINISTIC_*
(e.g., change settings.determinism_mode =
CUOPT_MODE_DETERMINISTIC_GPU_HEURISTICS; to settings.determinism_mode =
CUOPT_DETERMINISM_GPU_HEURISTICS;). Update the same replacement for every
occurrence in this test (also at the other spots you noted around the settings
variable), so the test exercises the new bitmask-based determinism path with the
existing settings.
cpp/tests/mip/diversity_test.cu-265-296 (1)

265-296: ⚠️ Potential issue | 🟠 Major

Don’t retain recombiner gold hashes across unrelated test runs.

hash_map is static and keyed only by (path, i, j, recombiner), so repeating the same test binary with a different seed in one process compares against stale state and can fail spuriously. The exit(1) path then aborts the whole GTest process instead of surfacing a normal assertion failure. Keep the gold state scoped to one test case (or include seed in the key) and fail through GTest instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/diversity_test.cu` around lines 265 - 296, The test currently
uses a static hash_map (symbol: hash_map) keyed by (path, i, j, recombiner)
which lets stale gold hashes persist across runs; make the gold state local to
the test by removing the static qualifier and declaring hash_map inside the test
scope (or alternatively add the current seed to the tuple key) so entries are
not reused across unrelated runs, and replace the process-aborting exit(1) with
a GTest failure (e.g., ASSERT_EQ / EXPECT_EQ or FAIL()) when the stored hash
(lookup of std::make_tuple(path, i, j, recombiner)) does not match
offspring.get_hash() produced by diversity_manager.recombine called on
pop_vector; this ensures failures surface via GTest instead of killing the whole
process.
cpp/src/mip_heuristics/diversity/population.cu-224-231 (1)

224-231: ⚠️ Potential issue | 🟠 Major

new_best_feasible_objective is updated in the wrong direction.

This is a minimization objective everywhere else. With >, a worse non-CPUFJ solution relaxes the threshold and a better non-CPUFJ solution never tightens it, so the CPUFJ filter can admit solutions that are worse than the real best in the same batch.

♻️ Suggested fix
-      } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP &&
-                 h_entry.objective > new_best_feasible_objective) {
+      } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP &&
+                 h_entry.objective < new_best_feasible_objective) {
         new_best_feasible_objective = h_entry.objective;
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/population.cu` around lines 224 - 231, The
code treats the objective as maximized; flip the comparisons so we tighten
toward smaller (better) objectives: when checking CPU_FEASIBILITY_JUMP
(h_entry.origin == internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP) skip
the entry if it is not better or equal (use h_entry.objective >=
new_best_feasible_objective), and when handling non-CPUFJ entries update
new_best_feasible_objective only if h_entry.objective is strictly smaller (use
h_entry.objective < new_best_feasible_objective); update the comparisons around
h_entry.objective and new_best_feasible_objective accordingly.
cpp/src/utilities/termination_checker.hpp-88-109 (1)

88-109: ⚠️ Potential issue | 🟠 Major

Synchronize the finished transition using atomic.

check() can run concurrently in the deterministic work-accounting solver (which uses OpenMP threading), but the finished flag is a plain bool (line 214). The read-check-write pattern at lines 92–93 (if (finished_now && !finished) { finished = true; ... }) is a data race: two threads can both read finished as false and both execute the log block. Even though finished is only meant to suppress duplicate tracing, the C++ memory model requires synchronization.

Suggested fix
+#include <atomic>
+
   bool check(const char* caller = __builtin_FUNCTION(),
              const char* file   = __builtin_FILE(),
              int line           = __builtin_LINE()) const noexcept
   {
@@
     if (deterministic) {
       if (!work_context) { return false; }
       double elapsed_since_start = work_context->current_work() - work_units_at_start;
       bool finished_now          = elapsed_since_start >= work_limit;
-      if (finished_now && !finished) {
-        finished                   = true;
+      bool expected = false;
+      if (finished_now && finished.compare_exchange_strong(expected, true)) {
         double actual_elapsed_time = timer.elapsed_time();
@@
-      return finished;
+      return finished.load(std::memory_order_relaxed);
     } else {
       return timer.check_time_limit();
     }
   }
@@
-  mutable bool finished{false};
+  mutable std::atomic<bool> finished{false};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/utilities/termination_checker.hpp` around lines 88 - 109, The
transition of the plain bool finished in termination_checker::check() is racy
under OpenMP; change finished to an std::atomic<bool> and use an atomic
compare-and-exchange (or exchange) to ensure only one thread transitions it to
true and runs the logging block: compute finished_now as before, then if
(finished_now) attempt to atomically set finished from false to true and only
call timer.elapsed_time() and CUOPT_LOG_TRACE when the atomic operation
succeeded; leave other reads of finished as atomic loads to avoid data races and
preserve the suppression of duplicate traces.
cpp/src/branch_and_bound/branch_and_bound.cpp-732-751 (1)

732-751: ⚠️ Potential issue | 🟠 Major

Publish incumbents after releasing mutex_upper_.

Both paths keep mutex_upper_ held while they log/report and then call into the incumbent callback path. That makes a user callback plus uncrush work part of the critical section, which can stall all incumbent updates and can deadlock if the callback re-enters solver code.

Suggested unlock-before-publish pattern
-  mutex_upper_.lock();
-  if (obj < upper_bound_) {
-    upper_bound_ = obj;
-    incumbent_.set_incumbent_solution(obj, x);
-    report_heuristic(obj, work_timestamp);
-    emit_solution_callback_from_crushed(x, obj, origin, work_timestamp);
-  }
-  mutex_upper_.unlock();
+  bool publish = false;
+  mutex_upper_.lock();
+  if (obj < upper_bound_) {
+    upper_bound_ = obj;
+    incumbent_.set_incumbent_solution(obj, x);
+    publish = true;
+  }
+  mutex_upper_.unlock();
+
+  if (publish) {
+    report_heuristic(obj, work_timestamp);
+    emit_solution_callback_from_crushed(x, obj, origin, work_timestamp);
+  }

Also applies to: 955-981

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 732 - 751, The
incumbents are being published while holding mutex_upper_, which makes
logging/reporting and the emit_solution_callback_from_crushed path part of the
critical section; change both update paths (the block updating upper_bound_,
incumbent_.set_incumbent_solution, CUOPT_DETERMINISM_LOG, report_heuristic, and
emit_solution_callback_from_crushed, and the analogous block around lines
955-981) to: acquire mutex_upper_, check/assign previous_upper and update
upper_bound_ and incumbent_.set_incumbent_solution, capture any values needed
for logging/reporting (e.g., previous_upper, repaired_obj, hash,
queued_solution.work_timestamp, queued_solution.origin) into local variables,
then unlock mutex_upper_ before calling CUOPT_DETERMINISM_LOG, report_heuristic,
or emit_solution_callback_from_crushed so the callback and logging occur outside
the critical section. Ensure no shared state is mutated after unlocking.
cpp/src/branch_and_bound/branch_and_bound.cpp-3680-3690 (1)

3680-3690: ⚠️ Potential issue | 🟠 Major

Retire work_timestamp == horizon solutions in the same sync.

producer_sync_.wait_for_producers(horizon_end) already guarantees producers are caught up through this horizon, but the strict < deterministic_current_horizon_ check leaves exact-boundary solutions queued until the next sync. That delays incumbent pruning/publication by a full horizon and can change the explored tree.

Boundary fix
-        if (sol.work_timestamp < deterministic_current_horizon_) {
+        if (sol.work_timestamp <= deterministic_current_horizon_) {
           due_solutions.push_back(std::move(sol));
         } else {
           future_solutions.push_back(std::move(sol));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 3680 - 3690, The
current loop keeps queued_external_solution_t entries whose work_timestamp
equals deterministic_current_horizon_ instead of retiring them, causing
one-horizon delay; change the condition in the heuristic_solution_queue_ scan
(where it compares sol.work_timestamp to deterministic_current_horizon_) from
using strict '<' to '<=' so that producer_sync_.wait_for_producers(horizon_end)
will allow retiring solutions with work_timestamp ==
deterministic_current_horizon_; update any related comments and ensure this
affects the due_solutions collection used for feasibility/repair and incumbent
pruning/publication.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb644a6f-47be-427b-83b0-439f8bcfe64a

📥 Commits

Reviewing files that changed from the base of the PR and between 4d5f5e5 and 3cfd9b1.

📒 Files selected for processing (111)
  • benchmarks/linear_programming/cuopt/run_mip.cpp
  • ci/compute-sanitizer-suppressions.xml
  • cpp/CMakeLists.txt
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/cuopt_c.h
  • cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/branch_and_bound.hpp
  • cpp/src/branch_and_bound/deterministic_workers.hpp
  • cpp/src/branch_and_bound/mip_node.hpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/branch_and_bound/pseudo_costs.hpp
  • cpp/src/dual_simplex/basis_updates.cpp
  • cpp/src/dual_simplex/bound_flipping_ratio_test.cpp
  • cpp/src/dual_simplex/bound_flipping_ratio_test.hpp
  • cpp/src/dual_simplex/phase2.cpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/mip_heuristics/diversity/diversity_config.hpp
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/diversity/diversity_manager.cuh
  • cpp/src/mip_heuristics/diversity/lns/rins.cu
  • cpp/src/mip_heuristics/diversity/multi_armed_bandit.cuh
  • cpp/src/mip_heuristics/diversity/population.cu
  • cpp/src/mip_heuristics/diversity/population.cuh
  • cpp/src/mip_heuristics/diversity/recombiners/bound_prop_recombiner.cuh
  • cpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuh
  • cpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuh
  • cpp/src/mip_heuristics/diversity/recombiners/recombiner.cuh
  • cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp
  • cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh
  • cpp/src/mip_heuristics/diversity/weights.cuh
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_impl_common.cuh
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuh
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
  • cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh
  • cpp/src/mip_heuristics/feasibility_jump/utils.cuh
  • cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu
  • cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cuh
  • cpp/src/mip_heuristics/local_search/line_segment_search/line_segment_search.cu
  • cpp/src/mip_heuristics/local_search/line_segment_search/line_segment_search.cuh
  • cpp/src/mip_heuristics/local_search/local_search.cu
  • cpp/src/mip_heuristics/local_search/local_search.cuh
  • cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu
  • cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh
  • cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu
  • cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cuh
  • cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cu
  • cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cuh
  • cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu
  • cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cuh
  • cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
  • cpp/src/mip_heuristics/local_search/rounding/simple_rounding_kernels.cuh
  • cpp/src/mip_heuristics/mip_constants.hpp
  • cpp/src/mip_heuristics/presolve/bounds_presolve.cu
  • cpp/src/mip_heuristics/presolve/bounds_presolve.cuh
  • cpp/src/mip_heuristics/presolve/bounds_update_data.cu
  • cpp/src/mip_heuristics/presolve/lb_probing_cache.cu
  • cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu
  • cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuh
  • cpp/src/mip_heuristics/presolve/multi_probe.cu
  • cpp/src/mip_heuristics/presolve/multi_probe.cuh
  • cpp/src/mip_heuristics/presolve/probing_cache.cu
  • cpp/src/mip_heuristics/presolve/probing_cache.cuh
  • cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
  • cpp/src/mip_heuristics/presolve/third_party_presolve.hpp
  • cpp/src/mip_heuristics/presolve/utils.cuh
  • cpp/src/mip_heuristics/problem/presolve_data.cu
  • cpp/src/mip_heuristics/problem/presolve_data.cuh
  • cpp/src/mip_heuristics/problem/problem.cu
  • cpp/src/mip_heuristics/problem/problem.cuh
  • cpp/src/mip_heuristics/problem/problem_fixing.cuh
  • cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu
  • cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cuh
  • cpp/src/mip_heuristics/solution/solution.cu
  • cpp/src/mip_heuristics/solution/solution.cuh
  • cpp/src/mip_heuristics/solution_callbacks.cuh
  • cpp/src/mip_heuristics/solve.cu
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/mip_heuristics/solver.cuh
  • cpp/src/mip_heuristics/solver_context.cuh
  • cpp/src/pdlp/cuopt_c.cpp
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cuh
  • cpp/src/utilities/copy_helpers.hpp
  • cpp/src/utilities/cuda_helpers.cuh
  • cpp/src/utilities/determinism_log.hpp
  • cpp/src/utilities/seed_generator.cu
  • cpp/src/utilities/seed_generator.cuh
  • cpp/src/utilities/termination_checker.hpp
  • cpp/src/utilities/timer.hpp
  • cpp/src/utilities/work_limit_context.hpp
  • cpp/src/utilities/work_limit_timer.hpp
  • cpp/src/utilities/work_unit_scheduler.cpp
  • cpp/tests/CMakeLists.txt
  • cpp/tests/mip/CMakeLists.txt
  • cpp/tests/mip/determinism_test.cu
  • cpp/tests/mip/determinism_utils.cuh
  • cpp/tests/mip/diversity_test.cu
  • cpp/tests/mip/feasibility_jump_tests.cu
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/tests/mip/local_search_test.cu
  • cpp/tests/mip/mip_utils.cuh
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/tests/mip/presolve_test.cu
  • datasets/mip/download_miplib_test_dataset.sh

Comment on lines +279 to +302
int64_t& total_lp_iter,
cuopt::work_limit_context_t* work_ctx = nullptr)
{
omp_atomic_t<int64_t> atomic_iter{0};
f_t result = trial_branching(original_lp,
settings,
var_types,
vstatus,
edge_norms,
basis_factors,
basic_list,
nonbasic_list,
branch_var,
branch_var_lower,
branch_var_upper,
upper_bound,
bnb_lp_iter_per_node,
start_time,
upper_max_lp_iter,
lower_max_lp_iter,
atomic_iter,
work_ctx);
total_lp_iter += atomic_iter.load();
return result;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the file
find . -name "pseudo_costs.cpp" -type f | head -5

Repository: aliceb-nv/cuopt-fork

Length of output: 107


🏁 Script executed:

# Get basic file info
wc -l cpp/src/branch_and_bound/pseudo_costs.cpp

Repository: aliceb-nv/cuopt-fork

Length of output: 110


🏁 Script executed:

# Read the critical section at lines 279-302
sed -n '270,310p' cpp/src/branch_and_bound/pseudo_costs.cpp | cat -n

Repository: aliceb-nv/cuopt-fork

Length of output: 2055


🏁 Script executed:

# Read the other affected sections mentioned (794-820, 834-853)
sed -n '790,860p' cpp/src/branch_and_bound/pseudo_costs.cpp | cat -n

Repository: aliceb-nv/cuopt-fork

Length of output: 3961


🏁 Script executed:

# Search for taskloop usage and how strong_branching_lp_iter is shared
rg -B5 -A10 "taskloop" cpp/src/branch_and_bound/pseudo_costs.cpp

Repository: aliceb-nv/cuopt-fork

Length of output: 602


🏁 Script executed:

# Look for reliable_variable_selection_core to understand the context
rg -B3 -A20 "reliable_variable_selection_core" cpp/src/branch_and_bound/pseudo_costs.cpp | head -80

Repository: aliceb-nv/cuopt-fork

Length of output: 4078


Add atomic synchronization to shared counter updates in deterministic strong-branching path.

The taskloop shares strong_branching_lp_iter across parallel tasks (line 795-796), but the overload at lines 279-302 performs unsynchronized total_lp_iter += atomic_iter.load(); when called in deterministic mode (plain int64_t&). This causes lost increments and corrupts iteration statistics and adaptive thresholds.

♻️ Suggested fix
   f_t result = trial_branching(original_lp,
                                settings,
                                var_types,
                                vstatus,
                                edge_norms,
                                basis_factors,
                                basic_list,
                                nonbasic_list,
                                branch_var,
                                branch_var_lower,
                                branch_var_upper,
                                upper_bound,
                                bnb_lp_iter_per_node,
                                start_time,
                                upper_max_lp_iter,
                                lower_max_lp_iter,
                                atomic_iter,
                                work_ctx);
-  total_lp_iter += atomic_iter.load();
+  const auto local_iter = atomic_iter.load();
+#pragma omp atomic update
+  total_lp_iter += local_iter;
   return result;

Also applies to: 794–820, 834–853

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 279 - 302, The
unsynchronized update total_lp_iter += atomic_iter.load() in the wrapper around
trial_branching can race when called from parallel taskloops; replace that plain
+= with an atomic-safe add — e.g. use std::atomic_ref<int64_t>
total_ref(total_lp_iter); total_ref.fetch_add(atomic_iter.load(),
std::memory_order_relaxed); or wrap the update in an omp critical/atomic region
so the increment from atomic_iter is applied atomically; apply the same fix to
the other deterministic strong-branching sites mentioned (the taskloop use sites
around the blocks corresponding to strong_branching_lp_iter at ~794–820 and
~834–853).

Comment on lines +541 to +542
int32_t base_feas = 0;
int32_t bonus_robust = 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for fj_staged_score_t definition to verify the base/bonus types
ast-grep --pattern $'struct fj_staged_score_t {
  $$$
}'

# Also search for move_score_t to check if it uses the same types
rg -n "struct.*move_score_t" --type cpp -A 10

Repository: aliceb-nv/cuopt-fork

Length of output: 46


🏁 Script executed:

cat -n cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh | head -20

Repository: aliceb-nv/cuopt-fork

Length of output: 763


🏁 Script executed:

# Search for fj_staged_score_t more broadly
rg "fj_staged_score_t" -t cpp -t h --type-list | head -20

Repository: aliceb-nv/cuopt-fork

Length of output: 453


🏁 Script executed:

# Search in the repository for the struct
rg "struct fj_staged_score_t" --type-list

Repository: aliceb-nv/cuopt-fork

Length of output: 453


🏁 Script executed:

# List files in the feasibility_jump directory
fd . cpp/src/mip_heuristics/feasibility_jump/ --type f

Repository: aliceb-nv/cuopt-fork

Length of output: 604


🏁 Script executed:

# Get file length
wc -l cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh

Repository: aliceb-nv/cuopt-fork

Length of output: 126


🏁 Script executed:

# Search for fj_staged_score_t definition
rg "fj_staged_score_t" --type cpp --type h -A 5

Repository: aliceb-nv/cuopt-fork

Length of output: 46


🏁 Script executed:

# Search for struct definitions in the feasibility_jump directory
rg "struct.*score_t" cpp/src/mip_heuristics/feasibility_jump/ -A 5

Repository: aliceb-nv/cuopt-fork

Length of output: 690


🏁 Script executed:

# Read context around lines 541-542 in load_balancing.cuh
sed -n '530,555p' cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh

Repository: aliceb-nv/cuopt-fork

Length of output: 1225


🏁 Script executed:

# Search for how base_feas and bonus_robust are used after initialization
sed -n '540,600p' cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh

Repository: aliceb-nv/cuopt-fork

Length of output: 2985


🏁 Script executed:

# Search for fj_staged_score_t usage in load_balancing.cuh
rg "fj_staged_score_t" cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh -B 2 -A 2

Repository: aliceb-nv/cuopt-fork

Length of output: 46


🏁 Script executed:

# Search for warp_update_move_score function definition
rg "warp_update_move_score" cpp/src/mip_heuristics/feasibility_jump/ -A 15

Repository: aliceb-nv/cuopt-fork

Length of output: 5219


🏁 Script executed:

# Search for isfinite usage with score fields
rg "isfinite.*score" cpp/src/mip_heuristics/feasibility_jump/ -B 2 -A 2

Repository: aliceb-nv/cuopt-fork

Length of output: 626


🏁 Script executed:

# Look for the function call context - what type are the arguments expected to be?
sed -n '560,580p' cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh

Repository: aliceb-nv/cuopt-fork

Length of output: 988


🏁 Script executed:

# Check move_score_t definition to understand the score field types
rg "move_score_t" cpp/src/mip_heuristics/feasibility_jump/ -B 2 -A 5

Repository: aliceb-nv/cuopt-fork

Length of output: 27516


🏁 Script executed:

# Look for the other location where warp_update_move_score is called to see pattern
rg "warp_update_move_score" cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh -B 5 | head -50

Repository: aliceb-nv/cuopt-fork

Length of output: 632


🏁 Script executed:

# Check if there's a second kernel/location where similar code exists with f_t
rg "base_feas.*bonus_robust" cpp/src/mip_heuristics/feasibility_jump/ -B 2 -A 2

Repository: aliceb-nv/cuopt-fork

Length of output: 3062


🏁 Script executed:

# Search for any git history or comments about the f_t to int32_t change
rg "base_feas\|bonus_robust" cpp/src/mip_heuristics/feasibility_jump/ -B 1 -A 1 | head -80

Repository: aliceb-nv/cuopt-fork

Length of output: 46


Type inconsistency between accumulators and function signature: warp_update_move_score expects f_t parameters.

While fj_staged_score_t::base and fj_staged_score_t::bonus are correctly defined as int32_t, the function warp_update_move_score<i_t, f_t>() at line 571 declares its parameters as f_t base_feas and f_t bonus_robust, not int32_t. The int32_t accumulators undergo implicit conversion to f_t at the call site. Additionally, isfinite() checks on int32_t fields are semantically incorrect. Either change the accumulator variables to f_t or update the function signature to accept int32_t.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh` around lines 541
- 542, The accumulators base_feas and bonus_robust are declared as int32_t but
are passed to warp_update_move_score<i_t,f_t>() which expects f_t and uses
isfinite on those values; change the accumulator types to f_t (matching
fj_staged_score_t::base/bonus usage in the function) so there is no implicit
conversion and isfinite is applied to a floating type, i.e., replace the int32_t
declarations of base_feas and bonus_robust with f_t, and ensure any uses
(including the call to warp_update_move_score and isfinite checks) compile
against the new f_t types.

const raft::handle_t* handle_ptr;
std::mt19937 gen;
timer_t timer{0.};
work_limit_timer_t timer;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for work_limit_timer_t definition to verify default constructor availability
ast-grep --pattern 'class work_limit_timer_t {
  $$$
}'

# Also check if it's a typedef/using alias
rg -n 'work_limit_timer_t' --type cpp -C 3 | head -60

Repository: aliceb-nv/cuopt-fork

Length of output: 2548


🏁 Script executed:

# Read the termination_checker.hpp file to see the full class definition and constructors
head -250 cpp/src/utilities/termination_checker.hpp | tail -150

Repository: aliceb-nv/cuopt-fork

Length of output: 4945


🏁 Script executed:

# Search for the constructors of termination_checker_t
rg -n "termination_checker_t\(" cpp/src/utilities/termination_checker.hpp | head -20

Repository: aliceb-nv/cuopt-fork

Length of output: 272


🏁 Script executed:

# Read the constructor definitions to verify
sed -n '30,75p' cpp/src/utilities/termination_checker.hpp

Repository: aliceb-nv/cuopt-fork

Length of output: 1495


🏁 Script executed:

# Find timer_t definition to verify default constructor
rg -n "class timer_t|struct timer_t" cpp/src/utilities/ -A 20 | head -80

Repository: aliceb-nv/cuopt-fork

Length of output: 1196


The member timer cannot be default-constructed.

work_limit_timer_t (alias for termination_checker_t) contains a timer_t timer; member with a deleted default constructor. This member declaration requires either explicit initialization with a time limit parameter or a different initialization strategy. The previous code correctly used timer_t timer{0.} to invoke the parameterized constructor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh` at line 147,
The member variable work_limit_timer_t timer is being default-constructed but
termination_checker_t contains a timer_t member that has no default constructor;
change the declaration to explicitly construct it with a time limit (e.g.,
initialize with zero) — replace the default declaration with an explicit
initialization such as work_limit_timer_t timer{timer_t{0.}} (or
work_limit_timer_t timer{0.} if termination_checker_t exposes a ctor taking a
double) so the embedded timer_t is properly constructed.

i_t var_idx)
{
timer_t timer(settings.time_limit);
work_limit_timer_t timer(context.gpu_heur_loop, settings.time_limit, *context.termination);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how context.termination is initialized and whether null checks exist
rg -n 'context\.termination' --type cpp -C 2 | head -80

# Check the termination member declaration
ast-grep --pattern 'struct mip_solver_context_t {
  $$$
}'

Repository: aliceb-nv/cuopt-fork

Length of output: 46


🏁 Script executed:

#!/bin/bash
# First, check if the file exists and explore the structure
find . -name "load_balanced_bounds_presolve.cu" -type f 2>/dev/null

# Also look for related files
fd -e cu -e hpp -e h "context" 2>/dev/null | head -20

Repository: aliceb-nv/cuopt-fork

Length of output: 171


🏁 Script executed:

#!/bin/bash
# Get more info about the directory structure
ls -la cpp/src/mip_heuristics/presolve/ 2>/dev/null || find . -type d -name "presolve" 2>/dev/null

Repository: aliceb-nv/cuopt-fork

Length of output: 2218


🏁 Script executed:

#!/bin/bash
# Read the file with line numbers to see the context around the problematic lines
wc -l cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu

Repository: aliceb-nv/cuopt-fork

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Get context around lines 629, 641, 670
sed -n '620,640p' cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu

Repository: aliceb-nv/cuopt-fork

Length of output: 926


🏁 Script executed:

#!/bin/bash
# Get context around line 670
sed -n '665,680p' cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu

Repository: aliceb-nv/cuopt-fork

Length of output: 600


🏁 Script executed:

#!/bin/bash
# Get line 641 and surrounding context
sed -n '635,650p' cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu

Repository: aliceb-nv/cuopt-fork

Length of output: 578


🏁 Script executed:

#!/bin/bash
# Find the class definition and constructor
head -100 cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu | tail -80

Repository: aliceb-nv/cuopt-fork

Length of output: 3507


🏁 Script executed:

#!/bin/bash
# Search for mip_solver_context_t definition in header file
cat cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuh

Repository: aliceb-nv/cuopt-fork

Length of output: 8875


🏁 Script executed:

#!/bin/bash
# Find solver_context.cuh
find . -name "solver_context.cuh" -type f 2>/dev/null

Repository: aliceb-nv/cuopt-fork

Length of output: 107


🏁 Script executed:

#!/bin/bash
# Search for mip_solver_context_t definition
rg -n "struct mip_solver_context_t|class mip_solver_context_t" --type cpp -A 30 2>/dev/null | head -100

Repository: aliceb-nv/cuopt-fork

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Read the solver_context.cuh file
cat cpp/src/mip_heuristics/solver_context.cuh

Repository: aliceb-nv/cuopt-fork

Length of output: 3365


Add null checks for context.termination before dereferencing.

The context.termination pointer is initialized to nullptr and documented as being set by mip_solver_t after construction. All three solve() overloads (lines 629, 641, 670) unconditionally dereference it with *context.termination without verifying it has been initialized. If solve() is called before termination is set, this will cause undefined behavior.

Either ensure termination is always initialized before any solve() call, or add null checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu` at line
629, The code dereferences context.termination when constructing
work_limit_timer_t in the solve() overloads (where work_limit_timer_t
timer(context.gpu_heur_loop, settings.time_limit, *context.termination) is
used); add a null-check for context.termination before dereferencing in each
solve() overload (the three sites around work_limit_timer_t creation) and handle
the null case deterministically (e.g., log an error via the existing logger and
return/throw, or use a safe default termination object) so you never do
*context.termination when it is nullptr.

@aliceb-nv aliceb-nv closed this Mar 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant