Primal and dual crushing for Papilo#11
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughThis pull request implements Papilo presolve solution crushing functionality to reduce original-space solutions to compressed reduced-space representations. It integrates early-heuristic incumbents as initial solutions, updates presolve configuration, adds comprehensive KKT-based testing, and extends the test dataset with new MIPLIB instances. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/nightly.yaml:
- Line 20: The workflow fails because the matrix includes the non-existent
branch value "release/26.06" which is later used by the gh API call via the
CUOPT_BRANCH variable; fix it by either removing "release/26.06" from the branch
matrix in .github/workflows/nightly.yaml or create the corresponding branch
upstream in the NVIDIA/cuopt repo so the gh api call (gh api -q '.commit.sha'
"repos/nvidia/cuopt/branches/${CUOPT_BRANCH}") can resolve; ensure the matrix
entry and the CUOPT_BRANCH variable remain consistent.
In `@cpp/CMakeLists.txt`:
- Around line 297-302: The CMake snippet redundantly checks
OpenSSL_FOUND/OPENSSL_FOUND instead of using the target; change the fallback
logic to rely on the presence of the target OpenSSL::SSL: if OpenSSL::SSL target
is not available, call find_package(OpenSSL CONFIG QUIET) and then, only if the
OpenSSL::SSL target still does not exist, call find_package(OpenSSL REQUIRED).
Update the conditional test to use NOT TARGET OpenSSL::SSL in both checks and
remove references to OpenSSL_FOUND and OPENSSL_FOUND to follow the target-based
pattern used for gRPC/protobuf.
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp`:
- Around line 944-948: The code currently allows crush_rc when z_original is
provided but y_original is empty, which later causes out-of-bounds reads of
y[row]; change the logic so crush_rc is only true when both z_original and
y_original are non-empty (e.g., set crush_rc = !z_original.empty() &&
!y_original.empty()), and add a clear assertion or error (using cuopt_assert)
that rejects callers which pass z_original without y_original; apply the same
guard/update to the other similar reduction-handling block that uses
crush_rc/crush_dual and accesses y (the symmetric block around the second
occurrence of these variables).
In `@cpp/src/utilities/cuda_helpers.cuh`:
- Around line 199-218: The shared-memory cache currently keys only on Function*
(shmem_sizes) so the fast-path shared_lock can skip setting
cudaFuncAttributeMaxDynamicSharedMemorySize on a different GPU; change the cache
key to include the current CUDA device (e.g., get device via cudaGetDevice() or
cudaGetDeviceId()) and use a composite key (Function* plus device id) for all
lookups/updates (both the shared_lock fast path and the unique_lock write path)
while keeping mtx for synchronization; ensure every reference to shmem_sizes
(find, count, operator[]) is updated to use the composite key before calling
cudaFuncSetAttribute or storing the dynamic_request_size.
In `@cpp/tests/mip/incumbent_callback_test.cu`:
- Around line 224-249: The test must ensure early feasibility-jump heuristics
cannot produce the incumbent: before calling solve_mip with settings2, disable
the early CPU/GPU FJ heuristics so only the Papilo + add_initial_solution path
can create the incumbent; update the mip_solver_settings_t instance (settings2)
to set the appropriate flags (e.g., disable_cpu_feasibility_jump /
disable_gpu_feasibility_jump or a single disable_feasibility_jump flag if
available) or alternatively set the CUOPT_DISABLE_GPU_HEURISTICS and
corresponding CPU heuristic env var prior to calling solve_mip(op_problem,
settings2).
- Around line 201-241: The test sets the process env var via
setenv("CUOPT_DISABLE_GPU_HEURISTICS", "1", 1) but restores it with a trailing
unsetenv(), which leaks the changed state if an ASSERT_* or exception occurs;
wrap the setenv/unset in an RAII guard (e.g., a small ScopedEnvRestore class or
lambda-based scope guard) created immediately after the setenv call in
incumbent_callback_test (around the block that constructs handle_, op_problem
and settings1/settings2) so the guard captures the previous value (via getenv)
and restores it in its destructor, ensuring the env is restored even if
solve_mip, ASSERTs, or exceptions abort the test. Ensure the guard is used
before any ASSERT_* or calls that may throw and remove the bare unsetenv() at
the end.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 875195e0-088c-4949-be63-9355816ccd78
📒 Files selected for processing (17)
.github/workflows/nightly.yaml.gitignore.pre-commit-config.yamlCONTRIBUTING.mdci/validate_wheel.shcpp/CMakeLists.txtcpp/libmps_parser/src/mps_parser.cppcpp/src/mip_heuristics/diversity/diversity_manager.cucpp/src/mip_heuristics/presolve/third_party_presolve.cppcpp/src/mip_heuristics/presolve/third_party_presolve.hppcpp/src/mip_heuristics/solve.cucpp/src/utilities/cuda_helpers.cuhcpp/tests/linear_programming/unit_tests/presolve_test.cucpp/tests/mip/incumbent_callback_test.cucpp/tests/routing/CMakeLists.txtcpp/tests/routing/unit_tests/set_shmem_of_kernel.cudatasets/mip/download_miplib_test_dataset.sh
| cuopt_branch: | ||
| - "main" | ||
| - "release/26.04" | ||
| - "release/26.06" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify branch refs used by this workflow exist in NVIDIA/cuopt.
gh api "repos/NVIDIA/cuopt/branches/main" >/dev/null
gh api "repos/NVIDIA/cuopt/branches/release/26.06" >/dev/null
echo "Both branches are resolvable."Repository: aliceb-nv/cuopt-fork
Length of output: 95
🏁 Script executed:
cat -n .github/workflows/nightly.yaml | head -30Repository: aliceb-nv/cuopt-fork
Length of output: 1039
The branch release/26.06 does not exist in NVIDIA/cuopt.
The workflow will fail when attempting to resolve the SHA on line 29 (gh api -q '.commit.sha' "repos/nvidia/cuopt/branches/${CUOPT_BRANCH}"). Either create the branch upstream or remove it from the matrix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/nightly.yaml at line 20, The workflow fails because the
matrix includes the non-existent branch value "release/26.06" which is later
used by the gh API call via the CUOPT_BRANCH variable; fix it by either removing
"release/26.06" from the branch matrix in .github/workflows/nightly.yaml or
create the corresponding branch upstream in the NVIDIA/cuopt repo so the gh api
call (gh api -q '.commit.sha' "repos/nvidia/cuopt/branches/${CUOPT_BRANCH}") can
resolve; ensure the matrix entry and the CUOPT_BRANCH variable remain
consistent.
| if(NOT TARGET OpenSSL::SSL) | ||
| find_package(OpenSSL CONFIG QUIET) | ||
| if(NOT OpenSSL_FOUND AND NOT OPENSSL_FOUND) | ||
| find_package(OpenSSL REQUIRED) | ||
| endif() | ||
| endif() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify how OpenSSL discovery is handled across CMake files and whether mixed *_FOUND
# checks are used instead of target existence checks.
rg -n --type=cmake -C2 'find_package\(OpenSSL|OpenSSL_FOUND|OPENSSL_FOUND|TARGET OpenSSL::SSL'Repository: aliceb-nv/cuopt-fork
Length of output: 509
🏁 Script executed:
rg -n 'OpenSSL::SSL' cpp/Repository: aliceb-nv/cuopt-fork
Length of output: 114
🏁 Script executed:
rg -n 'target_link_libraries.*OpenSSL|OpenSSL.*target_link_libraries' cpp/CMakeLists.txtRepository: aliceb-nv/cuopt-fork
Length of output: 46
🏁 Script executed:
sed -n '290,310p' cpp/CMakeLists.txtRepository: aliceb-nv/cuopt-fork
Length of output: 789
🏁 Script executed:
rg -n 'cmake_minimum_required|CMAKE_VERSION' cpp/CMakeLists.txt | head -5Repository: aliceb-nv/cuopt-fork
Length of output: 116
Use target-based fallback for OpenSSL discovery.
The condition checking both OpenSSL_FOUND and OPENSSL_FOUND is unnecessary; find_package(OpenSSL CONFIG QUIET) reliably creates the OpenSSL::SSL target on success in CMake 3.30.4+. Checking for the target directly is cleaner, more maintainable, and aligns with the gRPC/protobuf patterns used elsewhere in this file.
♻️ Proposed change
if(NOT TARGET OpenSSL::SSL)
find_package(OpenSSL CONFIG QUIET)
- if(NOT OpenSSL_FOUND AND NOT OPENSSL_FOUND)
+ if(NOT TARGET OpenSSL::SSL)
find_package(OpenSSL REQUIRED)
endif()
endif()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/CMakeLists.txt` around lines 297 - 302, The CMake snippet redundantly
checks OpenSSL_FOUND/OPENSSL_FOUND instead of using the target; change the
fallback logic to rely on the presence of the target OpenSSL::SSL: if
OpenSSL::SSL target is not available, call find_package(OpenSSL CONFIG QUIET)
and then, only if the OpenSSL::SSL target still does not exist, call
find_package(OpenSSL REQUIRED). Update the conditional test to use NOT TARGET
OpenSSL::SSL in both checks and remove references to OpenSSL_FOUND and
OPENSSL_FOUND to follow the target-based pattern used for gRPC/protobuf.
| const bool crush_dual = !y_original.empty(); | ||
| if (crush_dual) { cuopt_assert((int)y_original.size() == (int)storage.nRowsOriginal, ""); } | ||
|
|
||
| const bool crush_rc = !z_original.empty(); | ||
| if (crush_rc) { cuopt_assert((int)z_original.size() == (int)storage.nColsOriginal, ""); } |
There was a problem hiding this comment.
Reject reduced-cost crushing without a dual vector.
crush_rc is allowed when y_original is empty, but the reduced-cost path later reads y[row] in multiple reduction handlers. A caller that provides z_original without y_original will hit out-of-bounds access on the new public API.
♻️ Proposed fix
const bool crush_dual = !y_original.empty();
if (crush_dual) { cuopt_assert((int)y_original.size() == (int)storage.nRowsOriginal, ""); }
const bool crush_rc = !z_original.empty();
+ cuopt_expects(!crush_rc || crush_dual,
+ error_type_t::ValidationError,
+ "Crushing reduced costs requires the original dual vector");
if (crush_rc) { cuopt_assert((int)z_original.size() == (int)storage.nColsOriginal, ""); }Also applies to: 1028-1061
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp` around lines 944 -
948, The code currently allows crush_rc when z_original is provided but
y_original is empty, which later causes out-of-bounds reads of y[row]; change
the logic so crush_rc is only true when both z_original and y_original are
non-empty (e.g., set crush_rc = !z_original.empty() && !y_original.empty()), and
add a clear assertion or error (using cuopt_assert) that rejects callers which
pass z_original without y_original; apply the same guard/update to the other
similar reduction-handling block that uses crush_rc/crush_dual and accesses y
(the symmetric block around the second occurrence of these variables).
| static std::shared_mutex mtx; | ||
| static std::unordered_map<Function*, size_t> shmem_sizes; | ||
|
|
||
| if (dynamic_request_size != 0) { | ||
| dynamic_request_size = raft::alignTo(dynamic_request_size, size_t(1024)); | ||
| size_t current_size = shmem_sizes[function]; | ||
| if (dynamic_request_size > current_size) { | ||
| std::lock_guard<std::mutex> lock(mtx); | ||
| current_size = shmem_sizes[function]; | ||
|
|
||
| if (dynamic_request_size > current_size) { | ||
| cudaFuncSetAttribute( | ||
| function, cudaFuncAttributeMaxDynamicSharedMemorySize, dynamic_request_size); | ||
| { | ||
| std::shared_lock<std::shared_mutex> rlock(mtx); | ||
| auto it = shmem_sizes.find(function); | ||
| if (it != shmem_sizes.end() && dynamic_request_size <= it->second) { return true; } | ||
| } | ||
|
|
||
| std::unique_lock<std::shared_mutex> wlock(mtx); | ||
| size_t current_size = shmem_sizes.count(function) ? shmem_sizes[function] : 0; | ||
| if (dynamic_request_size > current_size) { | ||
| auto err = cudaFuncSetAttribute( | ||
| function, cudaFuncAttributeMaxDynamicSharedMemorySize, dynamic_request_size); | ||
| if (err == cudaSuccess) { | ||
| shmem_sizes[function] = dynamic_request_size; | ||
| return (cudaSuccess == cudaGetLastError()); | ||
| return true; |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does cudaFuncSetAttribute(cudaFuncAttributeMaxDynamicSharedMemorySize, ...) apply per current CUDA device/context, or is it process-wide for a kernel function pointer?
💡 Result:
cudaFuncSetAttribute with cudaFuncAttributeMaxDynamicSharedMemorySize applies per current CUDA device/context for a kernel function pointer, not process-wide across all devices.
Citations:
- 1: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaFuncAttributes.html
- 2: https://docs.nvidia.com/cuda/archive/13.0.1/cuda-runtime-api/group__CUDART__LIBRARY.html
- 3: https://stackoverflow.com/questions/37033191/cudafuncsetcacheconfig-applies-to-all-devices-or-current-device
- 4: https://docs.nvidia.com/cuda/archive/11.8.0/cuda-runtime-api/group__CUDART__EXECUTION.html
- 5: https://docs.nvidia.com/cuda/archive/13.1.0/cuda-runtime-api/group__CUDART__EXECUTION.html
- 6: https://docs.nvidia.com/cuda/archive/13.1.0/cuda-runtime-api/structcudaFuncAttributes.html
- 7: https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDART__HIGHLEVEL_g0b85e087210b47056cb6fc03a0e264e8.html
- 8: https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDA__EXEC_g5e92a1b0d8d1b82cb00dcfb2de15961b.html
Include current CUDA device in the shared-memory cache key.
The cache uses only Function* as the key, but cudaFuncSetAttribute(cudaFuncAttributeMaxDynamicSharedMemorySize, ...) applies per device. On multi-GPU systems, the shared-lock fast path (lines 206–208) can incorrectly skip setting the kernel attribute on a new device after it was cached on a different device, breaking correctness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/utilities/cuda_helpers.cuh` around lines 199 - 218, The shared-memory
cache currently keys only on Function* (shmem_sizes) so the fast-path
shared_lock can skip setting cudaFuncAttributeMaxDynamicSharedMemorySize on a
different GPU; change the cache key to include the current CUDA device (e.g.,
get device via cudaGetDevice() or cudaGetDeviceId()) and use a composite key
(Function* plus device id) for all lookups/updates (both the shared_lock fast
path and the unique_lock write path) while keeping mtx for synchronization;
ensure every reference to shmem_sizes (find, count, operator[]) is updated to
use the composite key before calling cudaFuncSetAttribute or storing the
dynamic_request_size.
|
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
Dummy PR to trigger coderabbit review.
Description
Issue
Checklist