Incorporate determinism into GPU heuristics + parallel determinsitic B&B+GPU exploration#6
Incorporate determinism into GPU heuristics + parallel determinsitic B&B+GPU exploration#6aliceb-nv wants to merge 277 commits into
Conversation
📝 WalkthroughWalkthroughThis PR refactors MIP incumbent solution handling, determinism control, and resource management throughout the codebase. Key changes include: replacing binary determinism modes with bitmask-based flags, introducing extended solution callbacks with origin and work-timestamp metadata, replacing timer-based termination with work-unit-aware Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
cpp/src/mip_heuristics/solver.cu (1)
302-317:⚠️ Potential issue | 🔴 CriticalDeterministic B&B currently drops GPU-heuristic incumbents.
When
CUOPT_DETERMINISM_BBis enabled,branch_and_bound_callbackis left unset, so heuristic incumbents never reachqueue_external_solution_deterministic(...). The fallback at the end then discardsdm.run_solver()’s incumbent unless B&B itself retired a feasible solution. In this mode the solver can return no incumbent even though the heuristic side found one, and B&B also loses incumbent-driven pruning while it runs. Please either wire the deterministic handoff or stop launching concurrent heuristics for this mode until that path exists.Also applies to: 365-379
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/solver.cu` around lines 302 - 317, When CUOPT_DETERMINISM_BB is set the code leaves context.problem_ptr->branch_and_bound_callback unset causing GPU-heuristic incumbents to be dropped; fix by wiring a deterministic handoff: in the branch where (context.settings.determinism_mode & CUOPT_DETERMINISM_BB) set context.problem_ptr->branch_and_bound_callback to a lambda that calls branch_and_bound->queue_external_solution_deterministic(solution, 0.0) (use branch_and_bound.get() as in the non-deterministic path), and keep branch_and_bound->set_concurrent_lp_root_solve(false); alternatively (if deterministic GPU heuristics are not available) ensure concurrent heuristics are not launched by keeping set_concurrent_lp_root_solve(false) and explicitly disabling any code paths that start heuristic runs when CUOPT_DETERMINISM_BB is enabled so dm.run_solver() can’t produce an orphaned incumbent.cpp/src/mip_heuristics/diversity/diversity_manager.cu (2)
126-144:⚠️ Potential issue | 🔴 CriticalInitialize
gpu_heur_loop.deterministicbefore the env-var early returns.If
CUOPT_CONFIG_IDis unset, invalid, or out of range, the constructor returns before Line 142. That leavescontext.gpu_heur_loop.deterministicat its default and silently disables the deterministic GPU-heuristic path in normal runs.♻️ Suggested fix
const char* env_config_id_raw = std::getenv("CUOPT_CONFIG_ID"); + context.gpu_heur_loop.deterministic = + (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS); + if (env_config_id_raw == nullptr) { return; } @@ - - context.gpu_heur_loop.deterministic = - (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 126 - 144, Set context.gpu_heur_loop.deterministic immediately during initialization so it is always configured even when CUOPT_CONFIG_ID handling returns early; specifically, before reading/validating env_config_id (env var CUOPT_CONFIG_ID) and before any early returns (including the nullptr check, stoi catch, and out-of-range check against max_config), assign context.gpu_heur_loop.deterministic = (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS) so the deterministic GPU-heuristics flag is not left at its default when env parsing fails or is skipped.
230-249:⚠️ Potential issue | 🟠 MajorPresolve substeps are not chained to the presolve budget.
presolve_timerandprobing_timerare parented to*context.termination, probing is sized from the originaltime_limit, andtrivial_presolveis gated onglobal_timer. After bounds propagation spends time/work, probing and trivial presolve can still consume a fresh full slice instead of the remaining presolve budget.♻️ Suggested fix
- work_limit_timer_t presolve_timer(context.gpu_heur_loop, time_limit, *context.termination); + work_limit_timer_t presolve_timer(context.gpu_heur_loop, time_limit, global_timer); @@ - f_t time_for_probing_cache = std::min(max_time_on_probing, time_limit); - work_limit_timer_t probing_timer( - context.gpu_heur_loop, time_for_probing_cache, *context.termination); + f_t time_for_probing_cache = + std::min(max_time_on_probing, presolve_timer.remaining_time()); + work_limit_timer_t probing_timer( + context.gpu_heur_loop, time_for_probing_cache, presolve_timer); @@ - if (!global_timer.check_time_limit()) { trivial_presolve(*problem_ptr, remap_cache_ids); } + if (!presolve_timer.check_time_limit()) { trivial_presolve(*problem_ptr, remap_cache_ids); }Also applies to: 256-260
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 230 - 249, Presolve substeps (presolve_timer, probing_timer, and trivial_presolve logic) currently each use the original termination/time_limit so they can exceed the overall presolve budget; change them to consume the remaining presolve budget instead: after creating presolve_timer (work_limit_timer_t presolve_timer(..., time_limit, *context.termination)), compute the remaining time/termination from presolve_timer (or derive a child termination) and pass that into the probing_timer and into the trivial_presolve gating (replace use of time_limit and global_timer with presolve_timer.remaining_time()/presolve termination), and cap time_for_probing_cache with that remaining budget (involving diversity_config.max_time_on_probing as before) so all substeps are chained to presolve_timer and cannot exceed the presolve budget.cpp/src/mip_heuristics/local_search/local_search.cu (1)
349-365:⚠️ Potential issue | 🟠 MajorClamp the infeasible-search slice against the caller’s remaining budget.
In the infeasible branch,
timeris recreated from*context.terminationwith a fixed 0.25/0.5 slice. If the caller only has less than that remaining, this path still hands local search a fresh larger budget instead of respecting the incoming checker.♻️ Suggested fix
bool local_search_t<i_t, f_t>::run_local_search(solution_t<i_t, f_t>& solution, const weight_t<i_t, f_t>& weights, work_limit_timer_t timer, const ls_config_t<i_t, f_t>& ls_config) { @@ + work_limit_timer_t phase_timer = timer; if (!solution.get_feasible()) { if (ls_config.at_least_one_parent_feasible) { fj_settings.time_limit = 0.5; } else { fj_settings.time_limit = 0.25; } - timer = work_limit_timer_t(context.gpu_heur_loop, fj_settings.time_limit, *context.termination); + const auto phase_limit = std::min<f_t>(fj_settings.time_limit, timer.remaining_time()); + phase_timer = work_limit_timer_t(context.gpu_heur_loop, phase_limit, timer); } else { - fj_settings.time_limit = std::min(1., timer.remaining_time()); + const auto phase_limit = std::min<f_t>(1., timer.remaining_time()); + fj_settings.time_limit = phase_limit; + phase_timer = work_limit_timer_t(context.gpu_heur_loop, phase_limit, timer); } @@ - is_feas = run_fj_line_segment(solution, timer, ls_config); + is_feas = run_fj_line_segment(solution, phase_timer, ls_config); } else { @@ - is_feas = run_fj_annealing(solution, timer, ls_config); - if (lp_optimal_exists) { is_feas = run_fj_line_segment(solution, timer, ls_config); } + is_feas = run_fj_annealing(solution, phase_timer, ls_config); + if (lp_optimal_exists) { is_feas = run_fj_line_segment(solution, phase_timer, ls_config); } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/local_search.cu` around lines 349 - 365, The infeasible branch recreates timer with a fixed fj_settings.time_limit (0.25/0.5) ignoring the caller's remaining budget; modify run_local_search so before assigning fj_settings.time_limit you query the incoming timer's remaining budget and set fj_settings.time_limit = min(desired_slice, remaining_budget) (use the existing timer instance's remaining-time accessor) then construct the new work_limit_timer_t(context.gpu_heur_loop, fj_settings.time_limit, *context.termination); update references to timer, fj_settings.time_limit, ls_config.at_least_one_parent_feasible, context.gpu_heur_loop and context.termination accordingly.cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu (1)
401-403:⚠️ Potential issue | 🟠 MajorInvalidate skipped binary vars before returning.
In
FJ_MTM_SATISFIED, this early return leavesjump_move_scores,jump_move_delta, andjump_move_infeasibilityuntouched for binary objective variables.best_sat_cstr_mtm_move()later reduces over allfj.objective_varswith cached scores, so a binary var can still win with stale state from an earlier pass.♻️ Suggested fix
if constexpr (move_type == MTMMoveType::FJ_MTM_SATISFIED) { - if (fj.pb.is_binary_variable[var_idx]) return; + if (fj.pb.is_binary_variable[var_idx]) { + if (threadIdx.x == 0) { + fj.jump_move_delta[var_idx] = 0; + fj.jump_move_scores[var_idx] = fj_t<i_t, f_t>::move_score_t::invalid(); + fj.jump_move_infeasibility[var_idx] = -std::numeric_limits<f_t>::infinity(); + } + return; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu` around lines 401 - 403, In the early-return for MTMMoveType::FJ_MTM_SATISFIED when fj.pb.is_binary_variable[var_idx] is true, invalidate the cached per-variable outputs (jump_move_scores, jump_move_delta, jump_move_infeasibility) for this var_idx before returning so stale values cannot win in best_sat_cstr_mtm_move(); set the score entry to a definitive "ignore" value (e.g., -INFINITY or other sentinel used elsewhere) and reset delta/infeasibility to their neutral/zero equivalents so subsequent reductions over fj.objective_vars skip this binary variable.
🟠 Major comments (25)
ci/compute-sanitizer-suppressions.xml-109-122 (1)
109-122:⚠️ Potential issue | 🟠 MajorNarrow this
cuMemcpyDtoHAsyncsuppression to avoid masking real bugs.This rule is too broad for
InitcheckApiError: it can suppress unrelated 16-byte host-copy defects anywhere that matchescuMemcpyDtoHAsync.*. Add at least one additional discriminating frame (e.g., owning module/function) tied to the known harmless tuple-padding path.Suggested tightening
<record> <kind>InitcheckApiError</kind> <level>Error</level> <what> <text>Host API uninitialized memory access</text> <accessSize>16</accessSize> </what> <hostStack> <frame> <func>cuMemcpyDtoHAsync.*</func> <module>.*libcuda.so.*</module> </frame> + <frame> + <module>.*libcuopt.so.*</module> + </frame> + <!-- Optional: add a specific thrust/cub caller frame if stable in your stack traces --> </hostStack> </record>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ci/compute-sanitizer-suppressions.xml` around lines 109 - 122, The suppression for InitcheckApiError is too broad because the <hostStack> only matches cuMemcpyDtoHAsync.* and libcuda.so, so narrow it by adding an extra discriminating frame tied to the known harmless tuple-padding path: find the <record> with <kind>InitcheckApiError</kind> that contains the cuMemcpyDtoHAsync frame and add one more <frame> entry (e.g., a <func> regex and/or <module> regex) that matches the specific internal function or module responsible for the benign tuple-padding copy (for example the project-specific function name or module that owns the padding path), so only that call-site is suppressed rather than all cuMemcpyDtoHAsync matches.ci/compute-sanitizer-suppressions.xml-209-222 (1)
209-222:⚠️ Potential issue | 🟠 MajorAvoid blanket suppression of all
cuMemcpyDtoDAsyncuninitialized-copy errors.This rule suppresses virtually every DtoD uninitialized-copy API error and can hide true memory-initialization regressions in CI. Scope it to a known-safe call path (module + caller frame) instead of suppressing globally.
Suggested tightening
<record> <kind>InitcheckApiError</kind> <level>Error</level> <what> <text>Host API uninitialized memory access</text> </what> <hostStack> <frame> <func>cuMemcpyDtoDAsync.*</func> <module>.*libcuda.so.*</module> </frame> + <frame> + <module>.*libcuopt.so.*</module> + </frame> + <!-- Add stable caller symbol(s) for the intended benign path --> </hostStack> </record>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ci/compute-sanitizer-suppressions.xml` around lines 209 - 222, The suppression entry currently blanket-matches Host API uninitialized memory accesses for any cuMemcpyDtoDAsync call and can hide real regressions; narrow the rule by restricting the hostStack to the known-safe caller and module (e.g., add a specific <frame> with the exact calling function regex and the concrete module basename or path instead of just .*libcuda.so.*) or add an additional <frame> parent call pattern to limit scope to the trusted call path (reference the existing cuMemcpyDtoDAsync hostStack record to update); ensure the revised record only suppresses the specific safe code path rather than all cuMemcpyDtoDAsync invocations.cpp/src/branch_and_bound/mip_node.hpp-75-76 (1)
75-76:⚠️ Potential issue | 🟠 MajorDefault move operations don't update child
parentpointers.
std::unique_ptr<mip_node_t> children[2]transfer ownership when moved, but each child's rawparentpointer remains unchanged. Moving any node with populated children will leave those children pointing to the moved-from location, breaking upward traversal invariants.While current code avoids this by moving the root only before children are added, the design is unsafe. Either delete move support or implement custom move operations that update direct children's
parentpointers to the new location.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/mip_node.hpp` around lines 75 - 76, The defaulted move constructor and move assignment for mip_node_t are unsafe because moving transfers the children unique_ptrs but leaves each child's raw parent pointer pointing to the moved-from object; either delete move support or implement custom move operations: write a custom mip_node_t(mip_node_t&&) and mip_node_t& operator=(mip_node_t&&) that move all members (including children array) and after moving set children[0]->parent and children[1]->parent (when non-null) to this, and ensure the moved-from object's child pointers are nullified, or alternatively mark the move ctor/assign as deleted to enforce no moves. Ensure you update both direct children and preserve other member moves correctly in the implementations.cpp/src/mip_heuristics/diversity/weights.cuh-30-33 (1)
30-33:⚠️ Potential issue | 🟠 MajorDon't default
get_hash()tocuda_stream_default.This object is written on the caller's handle stream, but a zero-arg
get_hash()will readcstr_weightsandobjective_weighton the default stream. That can observe stale state or force hidden cross-stream synchronization, which makes the determinism hash itself unreliable.Minimal fix
- uint32_t get_hash(rmm::cuda_stream_view stream = rmm::cuda_stream_default) const + uint32_t get_hash(rmm::cuda_stream_view stream) const#!/bin/bash set -euo pipefail rg -nP '\.get_hash\s*\(\s*\)'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/weights.cuh` around lines 30 - 33, The get_hash method currently defaults its stream parameter to rmm::cuda_stream_default, which may read cstr_weights and objective_weight on the wrong stream; change get_hash to require an explicit rmm::cuda_stream_view parameter (remove the default value) and ensure callers pass the caller's handle stream; use that stream when calling compute_hash(cstr_weights, stream) and compute_hash(objective_weight.value(stream)) and update all call sites that invoke get_hash() (search for ".get_hash(") to forward the correct stream to avoid cross-stream reads/synchronization.cpp/tests/mip/determinism_utils.cuh-47-54 (1)
47-54:⚠️ Potential issue | 🟠 MajorThis helper can stall teardown for up to 1s per instance.
The polling thread sleeps for
1..1000 msbetween checks, andspin_stream_raii_t::~spin_stream_raii_t()can only stop by setting the flag and joining. If destruction happens during that sleep, test shutdown inherits the full delay. A wakeable stop signal, or at least a much shorter fixed backoff, would keep determinism tests fast and reproducible.Also applies to: 64-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/determinism_utils.cuh` around lines 47 - 54, The polling loop in spin_stream_raii_t uses std::this_thread::sleep_for(rand()%1000+1) which can stall teardown up to 1s; change the loop to use a wakeable wait (preferred) or a much shorter fixed backoff: add a std::condition_variable and std::mutex to the spin_stream_raii_t, replace sleep_for with cv.wait_for(lock, short_duration, [&]{ return stop_flag; }) and in the destructor set the stop_flag, notify_one() and join; alternatively, if you prefer simpler change, replace the randomized 1..1000ms sleep with a small fixed backoff (e.g., 1–10ms) so spin_kernel, host_copy(flag,...), stream_view and the destructor join return quickly.cpp/src/mip_heuristics/presolve/bounds_presolve.cu-174-176 (1)
174-176:⚠️ Potential issue | 🟠 MajorAvoid permanently mutating
settings.iteration_limitinside the solve loop.At Line 175, the member is clamped in place. If this object is reused, later solves stay capped at 50 iterations even outside the intended scope. Use a local effective limit instead.
Proposed fix
- if ((context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS)) { - settings.iteration_limit = std::min(settings.iteration_limit, 50); - } + i_t effective_iteration_limit = settings.iteration_limit; + if ((context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS)) { + effective_iteration_limit = std::min(effective_iteration_limit, i_t{50}); + } i_t iter; upd.init_changed_constraints(pb.handle_ptr); - for (iter = 0; iter < settings.iteration_limit; ++iter) { + for (iter = 0; iter < effective_iteration_limit; ++iter) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/bounds_presolve.cu` around lines 174 - 176, The code currently mutates settings.iteration_limit in place when context.settings.determinism_mode has CUOPT_DETERMINISM_GPU_HEURISTICS, which permanently clamps the setting for future solves; instead compute a local effective limit (e.g. effective_iteration_limit = std::min(settings.iteration_limit, 50)) inside the solve loop or the function that uses it and use that local variable wherever the iteration cap is applied (refer to settings.iteration_limit usage in this file and replace with effective_iteration_limit), leaving the settings object unmodified.cpp/src/mip_heuristics/presolve/probing_cache.cu-707-712 (1)
707-712:⚠️ Potential issue | 🟠 MajorResolve duplicate substitutions with a stable rule.
This keeps whichever proposal for a
substituted_varis visited first, but the traversal is driven bystd::unordered_map/std::unordered_setiteration order. In a determinism-sensitive path that makes the winning substitution hash-order dependent, so equivalent runs can still diverge. Pick the winner by a stable key (for exampletimestamp, then variable ids) before applying and storing it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/probing_cache.cu` around lines 707 - 712, The current loop over all_substitutions uses unordered iteration and keeps the first-seen substitution for each substitution.substituted_var, which makes winner selection non-deterministic; instead, build a deterministic selection pass: collect all proposals into a temporary map keyed by substitution.substituted_var, compare candidates by a stable comparator (first substitution.timestamp, then substituting_var id, then substituted_var id) to choose the winner for each substituted_var, then iterate that map to apply and log the chosen substitutions; update references in this block (all_substitutions, substitution.substituted_var, substituting_var, and substitution.timestamp) accordingly and remove reliance on seen_substituted during the first discovery loop.cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu-704-705 (1)
704-705:⚠️ Potential issue | 🟠 MajorThread the work-limit timer all the way through repair.
apply_round()now creates awork_limit_timer_t, but it still hands it off tofind_integer(timer_t&), and the repair path below continues with plaintimer_tinstances. That strips the parent termination/work-limit context in the deepest loop, so repair can continue past the root deadline even though this entrypoint was updated.Also applies to: 724-730
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu` around lines 704 - 705, The work-limit context is lost because apply_round() creates a work_limit_timer_t but calls find_integer(timer_t&) and subsequent repair paths use plain timer_t; update the function signatures and calls so the work_limit_timer_t is threaded through: change find_integer(timer_t&) to find_integer(work_limit_timer_t&) (and similarly for any functions in the repair path that accept timer_t, e.g., the repair function(s) invoked from find_integer), update all call sites (including apply_round and the deeper repair loop functions referenced in lb_constraint_prop.cu lines around 704 and 724–730) to pass the work_limit_timer_t instance instead of timer_t, and ensure any places constructing timers use or wrap into work_limit_timer_t so the parent deadline/termination context is preserved.cpp/include/cuopt/linear_programming/cuopt_c.h-74-90 (1)
74-90:⚠️ Potential issue | 🟠 MajorThe exported callback info struct is missing its advertised size field.
The docs for
cuOptMIPGetSolutionCallbackExtsaycallback_info->struct_sizeis always set, butcuOptMIPSolutionCallbackInfoonly containsoriginandwork_timestamp. As shipped, the new C ABI cannot support the append-only growth contract described here, and clients following the docs will read past the struct. Either addstruct_sizeas the first field or remove that contract from the API/docs.Suggested fix
typedef struct { + /** Size of this struct in bytes. */ + uint64_t struct_size; /** Which solver component found this solution (CUOPT_MIP_SOLUTION_ORIGIN_*). */ uint32_t origin; /** Deterministic work-unit timestamp at which the solution was found. * Monotonically increasing across successive callbacks within a single solve. * In non-deterministic mode this value is informational only. */ double work_timestamp; } cuOptMIPSolutionCallbackInfo;Also applies to: 734-751
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/cuopt/linear_programming/cuopt_c.h` around lines 74 - 90, The exported callback struct cuOptMIPSolutionCallbackInfo lacks the advertised struct_size field; add a uint32_t struct_size as the first member of cuOptMIPSolutionCallbackInfo (so size is always available and future fields can be appended) and update any places that populate this struct to set struct_size accordingly; also make the same change for the other callback-ext struct mentioned (the one referenced at lines 734-751) so both follow the ABI append-only contract and consumers can safely check struct_size before accessing newer fields.cpp/src/mip_heuristics/solver_context.cuh-72-74 (1)
72-74:⚠️ Potential issue | 🟠 MajorMake
terminationnon-null by construction.This shared context now carries a raw pointer that defaults to
nullptr, but new timer setup in this PR dereferences*context.terminationunconditionally. That leaves a crash window until some external setup step patches the pointer. Please require a checker in the constructor or keep a root checker by value/reference instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/solver_context.cuh` around lines 72 - 74, The shared context currently stores a raw pointer termination that defaults to nullptr and is later dereferenced by timer setup; update the solver context so termination cannot be null at construction: either (A) change the member from cuopt::termination_checker_t* termination to a by-value or reference member (e.g., cuopt::termination_checker_t termination or cuopt::termination_checker_t& termination) so the root checker always exists, or (B) add a constructor for the context (the struct used by mip_solver_t) that requires a cuopt::termination_checker_t* parameter, asserts or throws on null, and initializes termination from that pointer; also update mip_solver_t creation sites to pass the checker. Ensure all references to termination in timer setup use the non-null member.cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu-75-98 (1)
75-98:⚠️ Potential issue | 🟠 MajorPreserve the hard wall-clock cap in deterministic LP mode.
This branch sets
pdlp_settings.time_limitto infinity, but the solve still runs under a plaintimer_tlater in the function. That means a single LP call can blow past the caller/global deadline wheneverwork_limitis finite. The work-derived iteration budget should be an additional limit, not a replacement for the outer termination ceiling.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu` around lines 75 - 98, The code currently zeroes out the wall-clock cap by setting pdlp_settings.time_limit = infinity in the determinism_mode branch, which lets the later plain timer_t bypass the caller's work_limit; instead, preserve the outer deadline by assigning pdlp_settings.time_limit to the caller's work_limit when that limit is finite (use the same variable logged as settings.work_limit or work_limit, remembering units — pdlp_settings.time_limit is in seconds), and only set it to infinity when work_limit is infinite; keep the iteration budget logic (estim_iters, pdlp_settings.iteration_limit) as an additional cap so both iteration and wall-clock limits (used alongside timer_t) are enforced.cpp/include/cuopt/linear_programming/utilities/internals.hpp-72-72 (1)
72-72:⚠️ Potential issue | 🟠 MajorKeep existing callback type values stable.
Adding
GET_SOLUTION_EXTin the middle changesSET_SOLUTIONfrom1to2. Any client callback class compiled against the previous header will still return the old value from the inlineget_type(), so a newer library can misclassify it. Please give the enum explicit values and append the new member instead.Suggested fix
-enum class base_solution_callback_type { GET_SOLUTION, GET_SOLUTION_EXT, SET_SOLUTION }; +enum class base_solution_callback_type { + GET_SOLUTION = 0, + SET_SOLUTION = 1, + GET_SOLUTION_EXT = 2, +};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/cuopt/linear_programming/utilities/internals.hpp` at line 72, The enum base_solution_callback_type must keep existing numeric values stable: explicitly assign values to the original members (e.g., GET_SOLUTION = 0, SET_SOLUTION = 1) and move the new member to the end (e.g., GET_SOLUTION_EXT = 2) so existing client binaries still match their compiled get_type() results; update the enum declaration in internals.hpp to use these explicit integer assignments and ensure no other code assumes implicit ordering.cpp/tests/mip/determinism_test.cu-296-299 (1)
296-299:⚠️ Potential issue | 🟠 MajorCompare a GPU-origin subsequence, not just the first five raw callbacks.
The test only proves that GPU callbacks appear somewhere in the 32 captured entries.
expect_callback_prefixes_bitwise_equal(..., 5, ...)still compares the unfiltered prefix, so a divergent GPU callback stream later in the capture can pass unnoticed. Filter to GPU origins before comparing, or extend the compared window until it includes the first N GPU callbacks.Also applies to: 342-346
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/determinism_test.cu` around lines 296 - 299, The test currently compares only the first raw callbacks (using callback_compare_count = 5) which can miss GPU-origin callbacks occurring later; change the comparison to operate on the filtered GPU-origin subsequence instead of the unfiltered prefix (or alternatively increase the compared window to include the first N GPU callbacks). Specifically, update the calls to expect_callback_prefixes_bitwise_equal (and related checks around callback_compare_count / callback_capture_limit) to first filter the captured callbacks for origin == GPU and then compare the first N GPU callbacks (use min_gpu_callback_count to choose N), ensuring you reference the variables callback_compare_count, callback_capture_limit, min_gpu_callback_count and the function expect_callback_prefixes_bitwise_equal when locating and modifying the logic.cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu-1054-1054 (1)
1054-1054:⚠️ Potential issue | 🟠 MajorDon't discard the explicit deterministic work budget.
This overwrites
settings.work_limitwithsettings.time_limit, so a caller-provided work budget is never honored.host_loop()then builds its limiter fromsettings.time_limit, which means deterministic FJ is still effectively wall-clock-budgeted.🛠️ Suggested fix
- bool deterministic = (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS); - if (deterministic) { - settings.time_limit = std::max((f_t)0.0, settings.time_limit); - settings.work_limit = settings.time_limit; - } + bool deterministic = (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS); + settings.time_limit = std::max((f_t)0.0, settings.time_limit); + settings.work_limit = std::max((f_t)0.0, settings.work_limit);- work_limit_timer_t timer(context.gpu_heur_loop, settings.time_limit, *context.termination); + const f_t budget = deterministic_work_estimate ? settings.work_limit : settings.time_limit; + work_limit_timer_t timer(context.gpu_heur_loop, budget, *context.termination);Also applies to: 1263-1267
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` at line 1054, The code constructs a work_limit_timer_t using settings.time_limit, which overwrites any caller-provided deterministic work budget (settings.work_limit) and causes host_loop() to build its limiter from time_limit; change the construction so the deterministic work budget is preserved—use settings.work_limit when it is set/meaningful (or pass both values appropriately) to initialize work_limit_timer_t (the place where timer is created for context.gpu_heur_loop) and apply the same fix in the other occurrence around the block at the later occurrence (lines near the second timer creation), ensuring host_loop() sees the original settings.work_limit rather than being forced to use settings.time_limit.cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu-1036-1040 (1)
1036-1040:⚠️ Potential issue | 🟠 MajorDisable graph batching when work-limit termination is active.
use_graphstaystrueeven under deterministic accounting, so the loop can only notice exhaustion after an entireiterations_per_graphbatch completes. With the current default of 50, small work budgets can overshoot by a full graph launch.🛠️ Suggested fix
- const bool use_graph = true; + const bool use_graph = !deterministic_work_estimate; const i_t iterations_per_batch = use_graph ? iterations_per_graph : 1;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines 1036 - 1040, The loop currently keeps graph batching enabled even under deterministic work accounting, so set use_graph to false when deterministic_work_estimate is active to avoid overshooting work-limited termination; change the initialization of use_graph to depend on deterministic_work_estimate (e.g. use_graph = !deterministic_work_estimate) and keep iterations_per_batch = use_graph ? iterations_per_graph : 1 so that when context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS is set you fall back to single-iteration launches.cpp/src/mip_heuristics/problem/problem.cu-1426-1434 (1)
1426-1434:⚠️ Potential issue | 🟠 MajorGuard the empty substitution case before indexing
var_indices[0].This validation still accepts an empty
var_indices, and the next line unconditionally readsvar_indices[0]. If a presolve path hands you an empty batch, this becomes UB/crash instead of a no-op.🛠️ Suggested fix
cuopt_assert((are_exclusive<i_t, f_t>(var_indices, var_to_substitute_indices)), "variables and var_to_substitute_indices are not exclusive"); + if (var_indices.empty()) { return; } { std::vector<i_t> sorted_vi(var_indices); std::sort(sorted_vi.begin(), sorted_vi.end()); cuopt_assert(std::adjacent_find(sorted_vi.begin(), sorted_vi.end()) == sorted_vi.end(), "var_indices must not contain duplicates"); } const i_t dummy_substituted_variable = var_indices[0];🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/problem/problem.cu` around lines 1426 - 1434, The code reads var_indices[0] without checking for empty input; before using var_indices[0] (used to set dummy_substituted_variable) add an explicit guard for the empty-substitution case: check var_indices.empty() immediately after validating exclusivity (and before sorting/duplicate checks or use that check to short-circuit) and either return/skip the substitution logic (no-op) or assert with a clear message; update logic around are_exclusive<i_t,f_t>(var_indices, var_to_substitute_indices) and the block that computes sorted_vi so they handle/skip when var_indices is empty, ensuring dummy_substituted_variable is only assigned when var_indices is non-empty.cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu-1079-1082 (1)
1079-1082:⚠️ Potential issue | 🟠 MajorChain the repair timer to
max_timer.
apply_round()just createdmax_timerto cap bounds propagation, but this child timer bypasses it and starts from the global termination checker. On a long outer solve, repair can consume more budget thanmax_timerstill has before the next local limit check.🛠️ Suggested fix
- work_limit_timer_t repair_timer( - context.gpu_heur_loop, timer.remaining_time() / 5, *context.termination); + work_limit_timer_t repair_timer( + context.gpu_heur_loop, max_timer.remaining_time() / 5, max_timer);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu` around lines 1079 - 1082, The repair timer is currently constructed with the global termination checker (*context.termination) and thus can bypass the cap imposed by max_timer created in apply_round(); change the work_limit_timer_t construction for repair_timer so it is chained to max_timer (i.e., use the max_timer termination/parent instead of *context.termination) so repair respects the max_timer budget — locate the work_limit_timer_t repair_timer initialization in constraint_prop.cu and replace the termination argument with the max_timer/its termination handle created earlier in apply_round().cpp/src/mip_heuristics/problem/problem.cu-1493-1513 (1)
1493-1513:⚠️ Potential issue | 🟠 MajorUse a stable sort to ensure deterministic floating-point reduction order.
The code explicitly aims for "deterministic reduction" but uses
thrust::sort_by_key, which is unstable and does not preserve the relative order of equal keys. When two substitutions target the same variable, their deltas can be accumulated in different orders across runs. Since floating-point addition is not associative, different accumulation orders produce different results due to rounding. Replace withthrust::stable_sort_by_keyto guarantee the reduction order remains canonical.🛠️ Suggested fix
- thrust::sort_by_key(handle_ptr->get_thrust_policy(), - obj_coeff_keys.begin(), - obj_coeff_keys.end(), - obj_coeff_deltas.begin()); + thrust::stable_sort_by_key(handle_ptr->get_thrust_policy(), + obj_coeff_keys.begin(), + obj_coeff_keys.end(), + obj_coeff_deltas.begin());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/problem/problem.cu` around lines 1493 - 1513, The sort used before reduction is unstable: replace thrust::sort_by_key(handle_ptr->get_thrust_policy(), obj_coeff_keys.begin(), obj_coeff_keys.end(), obj_coeff_deltas.begin()) with thrust::stable_sort_by_key(...) so that equal keys in obj_coeff_keys keep a canonical order before thrust::reduce_by_key; this change affects the sorting step that feeds unique_keys/summed_deltas and ensures the subsequent thrust::reduce_by_key and the for_each lambda updating objective_coefficients produce deterministic results.cpp/tests/mip/determinism_test.cu-140-159 (1)
140-159:⚠️ Potential issue | 🟠 Major
expect_callback_prefixes_bitwise_equal()usesEXPECT_EQon floating-point values but doesn't guarantee bitwise equality.
EXPECT_EQondoublewill accept IEEE 754 equivalent values like+0.0and-0.0despite their bitwise difference, and may not catch subtle representation changes that indicate non-deterministic behavior. Since the function name explicitly promises bitwise equality and is used for determinism testing, compare the raw bits ofobjective,solution_bound, and eachassignmententry instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/determinism_test.cu` around lines 140 - 159, expect_callback_prefixes_bitwise_equal currently uses EXPECT_EQ on floating fields but must enforce true bitwise equality; change comparisons for callback_solution_t::objective, ::solution_bound and each double in ::assignment to compare their raw IEEE-754 bit patterns (e.g., reinterpret_cast<const uint64_t&> or memcpy into uint64_t and compare) instead of EXPECT_EQ on doubles, keeping the existing failure messages and using EXPECT_EQ on the integer bit-representations so +0.0 vs -0.0 and any bit-level differences are caught; leave the size and origin checks as-is and apply this bitwise comparison for each i and each j in the loops.cpp/tests/mip/diversity_test.cu-103-107 (1)
103-107:⚠️ Potential issue | 🟠 MajorUse the new bitmask determinism flags in these tests.
This file still passes
CUOPT_MODE_DETERMINISTIC*, while the solver changes in this PR branch onCUOPT_DETERMINISM_*. If the legacy names are gone, this won’t compile; if they are aliased, the new tests still aren’t exercising the bitmask-based path you just added.Also applies to: 163-166, 224-227, 249-253
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 103 - 107, Replace legacy enum usage with the new bitmask determinism flags: set settings.determinism_mode to the CUOPT_DETERMINISM_* bitmask constant instead of CUOPT_MODE_DETERMINISTIC_* (e.g., change settings.determinism_mode = CUOPT_MODE_DETERMINISTIC_GPU_HEURISTICS; to settings.determinism_mode = CUOPT_DETERMINISM_GPU_HEURISTICS;). Update the same replacement for every occurrence in this test (also at the other spots you noted around the settings variable), so the test exercises the new bitmask-based determinism path with the existing settings.cpp/tests/mip/diversity_test.cu-265-296 (1)
265-296:⚠️ Potential issue | 🟠 MajorDon’t retain recombiner gold hashes across unrelated test runs.
hash_mapisstaticand keyed only by(path, i, j, recombiner), so repeating the same test binary with a different seed in one process compares against stale state and can fail spuriously. Theexit(1)path then aborts the whole GTest process instead of surfacing a normal assertion failure. Keep the gold state scoped to one test case (or includeseedin the key) and fail through GTest instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 265 - 296, The test currently uses a static hash_map (symbol: hash_map) keyed by (path, i, j, recombiner) which lets stale gold hashes persist across runs; make the gold state local to the test by removing the static qualifier and declaring hash_map inside the test scope (or alternatively add the current seed to the tuple key) so entries are not reused across unrelated runs, and replace the process-aborting exit(1) with a GTest failure (e.g., ASSERT_EQ / EXPECT_EQ or FAIL()) when the stored hash (lookup of std::make_tuple(path, i, j, recombiner)) does not match offspring.get_hash() produced by diversity_manager.recombine called on pop_vector; this ensures failures surface via GTest instead of killing the whole process.cpp/src/mip_heuristics/diversity/population.cu-224-231 (1)
224-231:⚠️ Potential issue | 🟠 Major
new_best_feasible_objectiveis updated in the wrong direction.This is a minimization objective everywhere else. With
>, a worse non-CPUFJ solution relaxes the threshold and a better non-CPUFJ solution never tightens it, so the CPUFJ filter can admit solutions that are worse than the real best in the same batch.♻️ Suggested fix
- } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP && - h_entry.objective > new_best_feasible_objective) { + } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP && + h_entry.objective < new_best_feasible_objective) { new_best_feasible_objective = h_entry.objective; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/population.cu` around lines 224 - 231, The code treats the objective as maximized; flip the comparisons so we tighten toward smaller (better) objectives: when checking CPU_FEASIBILITY_JUMP (h_entry.origin == internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP) skip the entry if it is not better or equal (use h_entry.objective >= new_best_feasible_objective), and when handling non-CPUFJ entries update new_best_feasible_objective only if h_entry.objective is strictly smaller (use h_entry.objective < new_best_feasible_objective); update the comparisons around h_entry.objective and new_best_feasible_objective accordingly.cpp/src/utilities/termination_checker.hpp-88-109 (1)
88-109:⚠️ Potential issue | 🟠 MajorSynchronize the
finishedtransition using atomic.
check()can run concurrently in the deterministic work-accounting solver (which uses OpenMP threading), but thefinishedflag is a plainbool(line 214). The read-check-write pattern at lines 92–93 (if (finished_now && !finished) { finished = true; ... }) is a data race: two threads can both readfinishedas false and both execute the log block. Even thoughfinishedis only meant to suppress duplicate tracing, the C++ memory model requires synchronization.Suggested fix
+#include <atomic> + bool check(const char* caller = __builtin_FUNCTION(), const char* file = __builtin_FILE(), int line = __builtin_LINE()) const noexcept { @@ if (deterministic) { if (!work_context) { return false; } double elapsed_since_start = work_context->current_work() - work_units_at_start; bool finished_now = elapsed_since_start >= work_limit; - if (finished_now && !finished) { - finished = true; + bool expected = false; + if (finished_now && finished.compare_exchange_strong(expected, true)) { double actual_elapsed_time = timer.elapsed_time(); @@ - return finished; + return finished.load(std::memory_order_relaxed); } else { return timer.check_time_limit(); } } @@ - mutable bool finished{false}; + mutable std::atomic<bool> finished{false};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/utilities/termination_checker.hpp` around lines 88 - 109, The transition of the plain bool finished in termination_checker::check() is racy under OpenMP; change finished to an std::atomic<bool> and use an atomic compare-and-exchange (or exchange) to ensure only one thread transitions it to true and runs the logging block: compute finished_now as before, then if (finished_now) attempt to atomically set finished from false to true and only call timer.elapsed_time() and CUOPT_LOG_TRACE when the atomic operation succeeded; leave other reads of finished as atomic loads to avoid data races and preserve the suppression of duplicate traces.cpp/src/branch_and_bound/branch_and_bound.cpp-732-751 (1)
732-751:⚠️ Potential issue | 🟠 MajorPublish incumbents after releasing
mutex_upper_.Both paths keep
mutex_upper_held while they log/report and then call into the incumbent callback path. That makes a user callback plus uncrush work part of the critical section, which can stall all incumbent updates and can deadlock if the callback re-enters solver code.Suggested unlock-before-publish pattern
- mutex_upper_.lock(); - if (obj < upper_bound_) { - upper_bound_ = obj; - incumbent_.set_incumbent_solution(obj, x); - report_heuristic(obj, work_timestamp); - emit_solution_callback_from_crushed(x, obj, origin, work_timestamp); - } - mutex_upper_.unlock(); + bool publish = false; + mutex_upper_.lock(); + if (obj < upper_bound_) { + upper_bound_ = obj; + incumbent_.set_incumbent_solution(obj, x); + publish = true; + } + mutex_upper_.unlock(); + + if (publish) { + report_heuristic(obj, work_timestamp); + emit_solution_callback_from_crushed(x, obj, origin, work_timestamp); + }Also applies to: 955-981
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 732 - 751, The incumbents are being published while holding mutex_upper_, which makes logging/reporting and the emit_solution_callback_from_crushed path part of the critical section; change both update paths (the block updating upper_bound_, incumbent_.set_incumbent_solution, CUOPT_DETERMINISM_LOG, report_heuristic, and emit_solution_callback_from_crushed, and the analogous block around lines 955-981) to: acquire mutex_upper_, check/assign previous_upper and update upper_bound_ and incumbent_.set_incumbent_solution, capture any values needed for logging/reporting (e.g., previous_upper, repaired_obj, hash, queued_solution.work_timestamp, queued_solution.origin) into local variables, then unlock mutex_upper_ before calling CUOPT_DETERMINISM_LOG, report_heuristic, or emit_solution_callback_from_crushed so the callback and logging occur outside the critical section. Ensure no shared state is mutated after unlocking.cpp/src/branch_and_bound/branch_and_bound.cpp-3680-3690 (1)
3680-3690:⚠️ Potential issue | 🟠 MajorRetire
work_timestamp == horizonsolutions in the same sync.
producer_sync_.wait_for_producers(horizon_end)already guarantees producers are caught up through this horizon, but the strict< deterministic_current_horizon_check leaves exact-boundary solutions queued until the next sync. That delays incumbent pruning/publication by a full horizon and can change the explored tree.Boundary fix
- if (sol.work_timestamp < deterministic_current_horizon_) { + if (sol.work_timestamp <= deterministic_current_horizon_) { due_solutions.push_back(std::move(sol)); } else { future_solutions.push_back(std::move(sol)); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 3680 - 3690, The current loop keeps queued_external_solution_t entries whose work_timestamp equals deterministic_current_horizon_ instead of retiring them, causing one-horizon delay; change the condition in the heuristic_solution_queue_ scan (where it compares sol.work_timestamp to deterministic_current_horizon_) from using strict '<' to '<=' so that producer_sync_.wait_for_producers(horizon_end) will allow retiring solutions with work_timestamp == deterministic_current_horizon_; update any related comments and ensure this affects the due_solutions collection used for feasibility/repair and incumbent pruning/publication.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fb644a6f-47be-427b-83b0-439f8bcfe64a
📒 Files selected for processing (111)
benchmarks/linear_programming/cuopt/run_mip.cppci/compute-sanitizer-suppressions.xmlcpp/CMakeLists.txtcpp/include/cuopt/linear_programming/constants.hcpp/include/cuopt/linear_programming/cuopt_c.hcpp/include/cuopt/linear_programming/mip/solver_settings.hppcpp/include/cuopt/linear_programming/utilities/internals.hppcpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/branch_and_bound.hppcpp/src/branch_and_bound/deterministic_workers.hppcpp/src/branch_and_bound/mip_node.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/dual_simplex/basis_updates.cppcpp/src/dual_simplex/bound_flipping_ratio_test.cppcpp/src/dual_simplex/bound_flipping_ratio_test.hppcpp/src/dual_simplex/phase2.cppcpp/src/dual_simplex/simplex_solver_settings.hppcpp/src/math_optimization/solver_settings.cucpp/src/mip_heuristics/diversity/diversity_config.hppcpp/src/mip_heuristics/diversity/diversity_manager.cucpp/src/mip_heuristics/diversity/diversity_manager.cuhcpp/src/mip_heuristics/diversity/lns/rins.cucpp/src/mip_heuristics/diversity/multi_armed_bandit.cuhcpp/src/mip_heuristics/diversity/population.cucpp/src/mip_heuristics/diversity/population.cuhcpp/src/mip_heuristics/diversity/recombiners/bound_prop_recombiner.cuhcpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuhcpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuhcpp/src/mip_heuristics/diversity/recombiners/recombiner.cuhcpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hppcpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuhcpp/src/mip_heuristics/diversity/weights.cuhcpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cucpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuhcpp/src/mip_heuristics/feasibility_jump/feasibility_jump_impl_common.cuhcpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cucpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cuhcpp/src/mip_heuristics/feasibility_jump/fj_cpu.cucpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuhcpp/src/mip_heuristics/feasibility_jump/load_balancing.cuhcpp/src/mip_heuristics/feasibility_jump/utils.cuhcpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cucpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cuhcpp/src/mip_heuristics/local_search/line_segment_search/line_segment_search.cucpp/src/mip_heuristics/local_search/line_segment_search/line_segment_search.cuhcpp/src/mip_heuristics/local_search/local_search.cucpp/src/mip_heuristics/local_search/local_search.cuhcpp/src/mip_heuristics/local_search/rounding/bounds_repair.cucpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuhcpp/src/mip_heuristics/local_search/rounding/constraint_prop.cucpp/src/mip_heuristics/local_search/rounding/constraint_prop.cuhcpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cucpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cuhcpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cucpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cuhcpp/src/mip_heuristics/local_search/rounding/simple_rounding.cucpp/src/mip_heuristics/local_search/rounding/simple_rounding_kernels.cuhcpp/src/mip_heuristics/mip_constants.hppcpp/src/mip_heuristics/presolve/bounds_presolve.cucpp/src/mip_heuristics/presolve/bounds_presolve.cuhcpp/src/mip_heuristics/presolve/bounds_update_data.cucpp/src/mip_heuristics/presolve/lb_probing_cache.cucpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cucpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuhcpp/src/mip_heuristics/presolve/multi_probe.cucpp/src/mip_heuristics/presolve/multi_probe.cuhcpp/src/mip_heuristics/presolve/probing_cache.cucpp/src/mip_heuristics/presolve/probing_cache.cuhcpp/src/mip_heuristics/presolve/third_party_presolve.cppcpp/src/mip_heuristics/presolve/third_party_presolve.hppcpp/src/mip_heuristics/presolve/utils.cuhcpp/src/mip_heuristics/problem/presolve_data.cucpp/src/mip_heuristics/problem/presolve_data.cuhcpp/src/mip_heuristics/problem/problem.cucpp/src/mip_heuristics/problem/problem.cuhcpp/src/mip_heuristics/problem/problem_fixing.cuhcpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cucpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cuhcpp/src/mip_heuristics/solution/solution.cucpp/src/mip_heuristics/solution/solution.cuhcpp/src/mip_heuristics/solution_callbacks.cuhcpp/src/mip_heuristics/solve.cucpp/src/mip_heuristics/solver.cucpp/src/mip_heuristics/solver.cuhcpp/src/mip_heuristics/solver_context.cuhcpp/src/pdlp/cuopt_c.cppcpp/src/pdlp/initial_scaling_strategy/initial_scaling.cucpp/src/pdlp/initial_scaling_strategy/initial_scaling.cuhcpp/src/utilities/copy_helpers.hppcpp/src/utilities/cuda_helpers.cuhcpp/src/utilities/determinism_log.hppcpp/src/utilities/seed_generator.cucpp/src/utilities/seed_generator.cuhcpp/src/utilities/termination_checker.hppcpp/src/utilities/timer.hppcpp/src/utilities/work_limit_context.hppcpp/src/utilities/work_limit_timer.hppcpp/src/utilities/work_unit_scheduler.cppcpp/tests/CMakeLists.txtcpp/tests/mip/CMakeLists.txtcpp/tests/mip/determinism_test.cucpp/tests/mip/determinism_utils.cuhcpp/tests/mip/diversity_test.cucpp/tests/mip/feasibility_jump_tests.cucpp/tests/mip/load_balancing_test.cucpp/tests/mip/local_search_test.cucpp/tests/mip/mip_utils.cuhcpp/tests/mip/multi_probe_test.cucpp/tests/mip/presolve_test.cudatasets/mip/download_miplib_test_dataset.sh
| int64_t& total_lp_iter, | ||
| cuopt::work_limit_context_t* work_ctx = nullptr) | ||
| { | ||
| omp_atomic_t<int64_t> atomic_iter{0}; | ||
| f_t result = trial_branching(original_lp, | ||
| settings, | ||
| var_types, | ||
| vstatus, | ||
| edge_norms, | ||
| basis_factors, | ||
| basic_list, | ||
| nonbasic_list, | ||
| branch_var, | ||
| branch_var_lower, | ||
| branch_var_upper, | ||
| upper_bound, | ||
| bnb_lp_iter_per_node, | ||
| start_time, | ||
| upper_max_lp_iter, | ||
| lower_max_lp_iter, | ||
| atomic_iter, | ||
| work_ctx); | ||
| total_lp_iter += atomic_iter.load(); | ||
| return result; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the file
find . -name "pseudo_costs.cpp" -type f | head -5Repository: aliceb-nv/cuopt-fork
Length of output: 107
🏁 Script executed:
# Get basic file info
wc -l cpp/src/branch_and_bound/pseudo_costs.cppRepository: aliceb-nv/cuopt-fork
Length of output: 110
🏁 Script executed:
# Read the critical section at lines 279-302
sed -n '270,310p' cpp/src/branch_and_bound/pseudo_costs.cpp | cat -nRepository: aliceb-nv/cuopt-fork
Length of output: 2055
🏁 Script executed:
# Read the other affected sections mentioned (794-820, 834-853)
sed -n '790,860p' cpp/src/branch_and_bound/pseudo_costs.cpp | cat -nRepository: aliceb-nv/cuopt-fork
Length of output: 3961
🏁 Script executed:
# Search for taskloop usage and how strong_branching_lp_iter is shared
rg -B5 -A10 "taskloop" cpp/src/branch_and_bound/pseudo_costs.cppRepository: aliceb-nv/cuopt-fork
Length of output: 602
🏁 Script executed:
# Look for reliable_variable_selection_core to understand the context
rg -B3 -A20 "reliable_variable_selection_core" cpp/src/branch_and_bound/pseudo_costs.cpp | head -80Repository: aliceb-nv/cuopt-fork
Length of output: 4078
Add atomic synchronization to shared counter updates in deterministic strong-branching path.
The taskloop shares strong_branching_lp_iter across parallel tasks (line 795-796), but the overload at lines 279-302 performs unsynchronized total_lp_iter += atomic_iter.load(); when called in deterministic mode (plain int64_t&). This causes lost increments and corrupts iteration statistics and adaptive thresholds.
♻️ Suggested fix
f_t result = trial_branching(original_lp,
settings,
var_types,
vstatus,
edge_norms,
basis_factors,
basic_list,
nonbasic_list,
branch_var,
branch_var_lower,
branch_var_upper,
upper_bound,
bnb_lp_iter_per_node,
start_time,
upper_max_lp_iter,
lower_max_lp_iter,
atomic_iter,
work_ctx);
- total_lp_iter += atomic_iter.load();
+ const auto local_iter = atomic_iter.load();
+#pragma omp atomic update
+ total_lp_iter += local_iter;
return result;Also applies to: 794–820, 834–853
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 279 - 302, The
unsynchronized update total_lp_iter += atomic_iter.load() in the wrapper around
trial_branching can race when called from parallel taskloops; replace that plain
+= with an atomic-safe add — e.g. use std::atomic_ref<int64_t>
total_ref(total_lp_iter); total_ref.fetch_add(atomic_iter.load(),
std::memory_order_relaxed); or wrap the update in an omp critical/atomic region
so the increment from atomic_iter is applied atomically; apply the same fix to
the other deterministic strong-branching sites mentioned (the taskloop use sites
around the blocks corresponding to strong_branching_lp_iter at ~794–820 and
~834–853).
| int32_t base_feas = 0; | ||
| int32_t bonus_robust = 0; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for fj_staged_score_t definition to verify the base/bonus types
ast-grep --pattern $'struct fj_staged_score_t {
$$$
}'
# Also search for move_score_t to check if it uses the same types
rg -n "struct.*move_score_t" --type cpp -A 10Repository: aliceb-nv/cuopt-fork
Length of output: 46
🏁 Script executed:
cat -n cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh | head -20Repository: aliceb-nv/cuopt-fork
Length of output: 763
🏁 Script executed:
# Search for fj_staged_score_t more broadly
rg "fj_staged_score_t" -t cpp -t h --type-list | head -20Repository: aliceb-nv/cuopt-fork
Length of output: 453
🏁 Script executed:
# Search in the repository for the struct
rg "struct fj_staged_score_t" --type-listRepository: aliceb-nv/cuopt-fork
Length of output: 453
🏁 Script executed:
# List files in the feasibility_jump directory
fd . cpp/src/mip_heuristics/feasibility_jump/ --type fRepository: aliceb-nv/cuopt-fork
Length of output: 604
🏁 Script executed:
# Get file length
wc -l cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuhRepository: aliceb-nv/cuopt-fork
Length of output: 126
🏁 Script executed:
# Search for fj_staged_score_t definition
rg "fj_staged_score_t" --type cpp --type h -A 5Repository: aliceb-nv/cuopt-fork
Length of output: 46
🏁 Script executed:
# Search for struct definitions in the feasibility_jump directory
rg "struct.*score_t" cpp/src/mip_heuristics/feasibility_jump/ -A 5Repository: aliceb-nv/cuopt-fork
Length of output: 690
🏁 Script executed:
# Read context around lines 541-542 in load_balancing.cuh
sed -n '530,555p' cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuhRepository: aliceb-nv/cuopt-fork
Length of output: 1225
🏁 Script executed:
# Search for how base_feas and bonus_robust are used after initialization
sed -n '540,600p' cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuhRepository: aliceb-nv/cuopt-fork
Length of output: 2985
🏁 Script executed:
# Search for fj_staged_score_t usage in load_balancing.cuh
rg "fj_staged_score_t" cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh -B 2 -A 2Repository: aliceb-nv/cuopt-fork
Length of output: 46
🏁 Script executed:
# Search for warp_update_move_score function definition
rg "warp_update_move_score" cpp/src/mip_heuristics/feasibility_jump/ -A 15Repository: aliceb-nv/cuopt-fork
Length of output: 5219
🏁 Script executed:
# Search for isfinite usage with score fields
rg "isfinite.*score" cpp/src/mip_heuristics/feasibility_jump/ -B 2 -A 2Repository: aliceb-nv/cuopt-fork
Length of output: 626
🏁 Script executed:
# Look for the function call context - what type are the arguments expected to be?
sed -n '560,580p' cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuhRepository: aliceb-nv/cuopt-fork
Length of output: 988
🏁 Script executed:
# Check move_score_t definition to understand the score field types
rg "move_score_t" cpp/src/mip_heuristics/feasibility_jump/ -B 2 -A 5Repository: aliceb-nv/cuopt-fork
Length of output: 27516
🏁 Script executed:
# Look for the other location where warp_update_move_score is called to see pattern
rg "warp_update_move_score" cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh -B 5 | head -50Repository: aliceb-nv/cuopt-fork
Length of output: 632
🏁 Script executed:
# Check if there's a second kernel/location where similar code exists with f_t
rg "base_feas.*bonus_robust" cpp/src/mip_heuristics/feasibility_jump/ -B 2 -A 2Repository: aliceb-nv/cuopt-fork
Length of output: 3062
🏁 Script executed:
# Search for any git history or comments about the f_t to int32_t change
rg "base_feas\|bonus_robust" cpp/src/mip_heuristics/feasibility_jump/ -B 1 -A 1 | head -80Repository: aliceb-nv/cuopt-fork
Length of output: 46
Type inconsistency between accumulators and function signature: warp_update_move_score expects f_t parameters.
While fj_staged_score_t::base and fj_staged_score_t::bonus are correctly defined as int32_t, the function warp_update_move_score<i_t, f_t>() at line 571 declares its parameters as f_t base_feas and f_t bonus_robust, not int32_t. The int32_t accumulators undergo implicit conversion to f_t at the call site. Additionally, isfinite() checks on int32_t fields are semantically incorrect. Either change the accumulator variables to f_t or update the function signature to accept int32_t.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/feasibility_jump/load_balancing.cuh` around lines 541
- 542, The accumulators base_feas and bonus_robust are declared as int32_t but
are passed to warp_update_move_score<i_t,f_t>() which expects f_t and uses
isfinite on those values; change the accumulator types to f_t (matching
fj_staged_score_t::base/bonus usage in the function) so there is no implicit
conversion and isfinite is applied to a floating type, i.e., replace the int32_t
declarations of base_feas and bonus_robust with f_t, and ensure any uses
(including the call to warp_update_move_score and isfinite checks) compile
against the new f_t types.
| const raft::handle_t* handle_ptr; | ||
| std::mt19937 gen; | ||
| timer_t timer{0.}; | ||
| work_limit_timer_t timer; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for work_limit_timer_t definition to verify default constructor availability
ast-grep --pattern 'class work_limit_timer_t {
$$$
}'
# Also check if it's a typedef/using alias
rg -n 'work_limit_timer_t' --type cpp -C 3 | head -60Repository: aliceb-nv/cuopt-fork
Length of output: 2548
🏁 Script executed:
# Read the termination_checker.hpp file to see the full class definition and constructors
head -250 cpp/src/utilities/termination_checker.hpp | tail -150Repository: aliceb-nv/cuopt-fork
Length of output: 4945
🏁 Script executed:
# Search for the constructors of termination_checker_t
rg -n "termination_checker_t\(" cpp/src/utilities/termination_checker.hpp | head -20Repository: aliceb-nv/cuopt-fork
Length of output: 272
🏁 Script executed:
# Read the constructor definitions to verify
sed -n '30,75p' cpp/src/utilities/termination_checker.hppRepository: aliceb-nv/cuopt-fork
Length of output: 1495
🏁 Script executed:
# Find timer_t definition to verify default constructor
rg -n "class timer_t|struct timer_t" cpp/src/utilities/ -A 20 | head -80Repository: aliceb-nv/cuopt-fork
Length of output: 1196
The member timer cannot be default-constructed.
work_limit_timer_t (alias for termination_checker_t) contains a timer_t timer; member with a deleted default constructor. This member declaration requires either explicit initialization with a time limit parameter or a different initialization strategy. The previous code correctly used timer_t timer{0.} to invoke the parameterized constructor.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh` at line 147,
The member variable work_limit_timer_t timer is being default-constructed but
termination_checker_t contains a timer_t member that has no default constructor;
change the declaration to explicitly construct it with a time limit (e.g.,
initialize with zero) — replace the default declaration with an explicit
initialization such as work_limit_timer_t timer{timer_t{0.}} (or
work_limit_timer_t timer{0.} if termination_checker_t exposes a ctor taking a
double) so the embedded timer_t is properly constructed.
| i_t var_idx) | ||
| { | ||
| timer_t timer(settings.time_limit); | ||
| work_limit_timer_t timer(context.gpu_heur_loop, settings.time_limit, *context.termination); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how context.termination is initialized and whether null checks exist
rg -n 'context\.termination' --type cpp -C 2 | head -80
# Check the termination member declaration
ast-grep --pattern 'struct mip_solver_context_t {
$$$
}'Repository: aliceb-nv/cuopt-fork
Length of output: 46
🏁 Script executed:
#!/bin/bash
# First, check if the file exists and explore the structure
find . -name "load_balanced_bounds_presolve.cu" -type f 2>/dev/null
# Also look for related files
fd -e cu -e hpp -e h "context" 2>/dev/null | head -20Repository: aliceb-nv/cuopt-fork
Length of output: 171
🏁 Script executed:
#!/bin/bash
# Get more info about the directory structure
ls -la cpp/src/mip_heuristics/presolve/ 2>/dev/null || find . -type d -name "presolve" 2>/dev/nullRepository: aliceb-nv/cuopt-fork
Length of output: 2218
🏁 Script executed:
#!/bin/bash
# Read the file with line numbers to see the context around the problematic lines
wc -l cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuRepository: aliceb-nv/cuopt-fork
Length of output: 132
🏁 Script executed:
#!/bin/bash
# Get context around lines 629, 641, 670
sed -n '620,640p' cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuRepository: aliceb-nv/cuopt-fork
Length of output: 926
🏁 Script executed:
#!/bin/bash
# Get context around line 670
sed -n '665,680p' cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuRepository: aliceb-nv/cuopt-fork
Length of output: 600
🏁 Script executed:
#!/bin/bash
# Get line 641 and surrounding context
sed -n '635,650p' cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuRepository: aliceb-nv/cuopt-fork
Length of output: 578
🏁 Script executed:
#!/bin/bash
# Find the class definition and constructor
head -100 cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu | tail -80Repository: aliceb-nv/cuopt-fork
Length of output: 3507
🏁 Script executed:
#!/bin/bash
# Search for mip_solver_context_t definition in header file
cat cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuhRepository: aliceb-nv/cuopt-fork
Length of output: 8875
🏁 Script executed:
#!/bin/bash
# Find solver_context.cuh
find . -name "solver_context.cuh" -type f 2>/dev/nullRepository: aliceb-nv/cuopt-fork
Length of output: 107
🏁 Script executed:
#!/bin/bash
# Search for mip_solver_context_t definition
rg -n "struct mip_solver_context_t|class mip_solver_context_t" --type cpp -A 30 2>/dev/null | head -100Repository: aliceb-nv/cuopt-fork
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Read the solver_context.cuh file
cat cpp/src/mip_heuristics/solver_context.cuhRepository: aliceb-nv/cuopt-fork
Length of output: 3365
Add null checks for context.termination before dereferencing.
The context.termination pointer is initialized to nullptr and documented as being set by mip_solver_t after construction. All three solve() overloads (lines 629, 641, 670) unconditionally dereference it with *context.termination without verifying it has been initialized. If solve() is called before termination is set, this will cause undefined behavior.
Either ensure termination is always initialized before any solve() call, or add null checks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu` at line
629, The code dereferences context.termination when constructing
work_limit_timer_t in the solve() overloads (where work_limit_timer_t
timer(context.gpu_heur_loop, settings.time_limit, *context.termination) is
used); add a null-check for context.termination before dereferencing in each
solve() overload (the three sites around work_limit_timer_t creation) and handle
the null case deterministically (e.g., log an error via the existing logger and
return/throw, or use a safe default termination object) so you never do
*context.termination when it is nullptr.
Test PR to trigger coderabbit review
Description
Issue
Checklist