Test WSL in CI#2339
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRepointed reusable GitHub Actions workflow refs from Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pr.yaml:
- Line 29: The workflow reference uses a non-existent branch tag "@wsl" in the
uses line (e.g., the entry "uses:
rapidsai/shared-workflows/.github/workflows/pr-builder.yaml@wsl"); update all
occurrences (all 15 references) to point to a valid branch such as "@main" or
the correct release branch (for example "@branch-25.04") so the action can
resolve, or alternatively ensure the rapidsai/shared-workflows repo actually
contains a "wsl" branch with the required workflow files before keeping "@wsl".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8c653c59-05d6-4b75-865b-5cfc6a099329
📒 Files selected for processing (3)
.github/workflows/build.yaml.github/workflows/pr.yaml.github/workflows/test.yaml
|
/ok to test |
|
/ok to test |
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
cpp/include/rmm/mr/pool_memory_resource.hpp (1)
242-261: Same exception safety concern:available_device_memory()can throw.The call to
rmm::available_device_memory()on line 243 can throwrmm::cuda_error(percpp/src/cuda_device.cpp:31-37), which would preventRMM_FAILfrom executing with the intendedrmm::out_of_memoryexception and diagnostic message.For consistency with the suggested fix in
arena_memory_resource.hpp, wrap the device memory query in a try-catch:Suggested fix for exception safety
auto report_error = [&](const char* reason) { auto const max_size = maximum_pool_size_.value_or(std::numeric_limits<std::size_t>::max()); - auto const [free_memory, total_memory] = rmm::available_device_memory(); + std::size_t free_memory{}; + std::size_t total_memory{}; + try { + std::tie(free_memory, total_memory) = rmm::available_device_memory(); + } catch (...) { + // Device memory query failed; continue with zeros + } RMM_LOG_ERROR( "[A][Stream %s][Upstream %zuB][FAILURE maximum pool size exceeded: %s]" "[Pool current=%s max=%s][Device free=%s total=%s]",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/rmm/mr/pool_memory_resource.hpp` around lines 242 - 261, The call to rmm::available_device_memory() can throw and would bypass the RMM_FAIL diagnostic; wrap the call to rmm::available_device_memory() in a try-catch around the block that computes free_memory/total_memory so exceptions are caught and converted into safe fallback values (e.g., zeros or std::numeric_limits<std::size_t>::max()) and an explanatory string or flag is recorded; then proceed to log with RMM_LOG_ERROR and call RMM_FAIL using those fallback values and include any caught exception message (from the caught rmm::cuda_error or std::exception) in the reason/diagnostic so pool_size(), maximum_pool_size_, min_size, reason, and the original RMM_FAIL path remain executed deterministically even if available_device_memory() throws.cpp/include/rmm/mr/arena_memory_resource.hpp (1)
158-165:available_device_memory()can throw, masking the original OOM error.Per
cpp/src/cuda_device.cpp:31-37,rmm::available_device_memory()usesRMM_CUDA_TRY(cudaMemGetInfo(...))which can throwrmm::cuda_error. If the CUDA call fails (device error, context destroyed, etc.), the user sees a generic CUDA error instead of the intendedrmm::out_of_memorywith diagnostic details.Consider wrapping in a try-catch to gracefully degrade when device memory info is unavailable:
Suggested fix for exception safety
if (pointer == nullptr) { if (dump_log_on_failure_) { dump_memory_log(bytes); } - auto const [free_memory, total_memory] = rmm::available_device_memory(); + std::size_t free_memory{}; + std::size_t total_memory{}; + try { + std::tie(free_memory, total_memory) = rmm::available_device_memory(); + } catch (...) { + // Device memory query failed; continue with zeros + } auto const msg = std::string("Maximum pool size exceeded (failed to allocate ") + rmm::detail::format_bytes(bytes) + ", arena size=" + rmm::detail::format_bytes(global_arena_.size()) + ", device free=" + rmm::detail::format_bytes(free_memory) + " total=" + rmm::detail::format_bytes(total_memory) + "): No room in arena."; RMM_FAIL(msg.c_str(), rmm::out_of_memory); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/rmm/mr/arena_memory_resource.hpp` around lines 158 - 165, The call to rmm::available_device_memory() inside arena_memory_resource's allocation-failure path can throw (rmm::cuda_error) and mask the intended rmm::out_of_memory; wrap the available_device_memory() call in a try-catch that catches rmm::cuda_error (and/or std::exception) and supplies a safe fallback (e.g. unknown/0 values or sentinel strings) so you still construct the diagnostic message and call RMM_FAIL(msg.c_str(), rmm::out_of_memory). Locate the code using available_device_memory(), rmm::detail::format_bytes, global_arena_.size(), and RMM_FAIL and implement the try-catch there to ensure CUDA query failures don’t replace the original OOM failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cpp/include/rmm/mr/arena_memory_resource.hpp`:
- Around line 158-165: The call to rmm::available_device_memory() inside
arena_memory_resource's allocation-failure path can throw (rmm::cuda_error) and
mask the intended rmm::out_of_memory; wrap the available_device_memory() call in
a try-catch that catches rmm::cuda_error (and/or std::exception) and supplies a
safe fallback (e.g. unknown/0 values or sentinel strings) so you still construct
the diagnostic message and call RMM_FAIL(msg.c_str(), rmm::out_of_memory).
Locate the code using available_device_memory(), rmm::detail::format_bytes,
global_arena_.size(), and RMM_FAIL and implement the try-catch there to ensure
CUDA query failures don’t replace the original OOM failure.
In `@cpp/include/rmm/mr/pool_memory_resource.hpp`:
- Around line 242-261: The call to rmm::available_device_memory() can throw and
would bypass the RMM_FAIL diagnostic; wrap the call to
rmm::available_device_memory() in a try-catch around the block that computes
free_memory/total_memory so exceptions are caught and converted into safe
fallback values (e.g., zeros or std::numeric_limits<std::size_t>::max()) and an
explanatory string or flag is recorded; then proceed to log with RMM_LOG_ERROR
and call RMM_FAIL using those fallback values and include any caught exception
message (from the caught rmm::cuda_error or std::exception) in the
reason/diagnostic so pool_size(), maximum_pool_size_, min_size, reason, and the
original RMM_FAIL path remain executed deterministically even if
available_device_memory() throws.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 029d331c-a0f7-4a37-845f-f295b63460d2
📒 Files selected for processing (3)
cpp/include/rmm/mr/arena_memory_resource.hppcpp/include/rmm/mr/detail/arena.hppcpp/include/rmm/mr/pool_memory_resource.hpp
|
/ok to test |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
cpp/tests/mr/test_utils.hpp (1)
18-31: Consider using a case-insensitive search for robustness.The current implementation checks for
"microsoft"and"Microsoft"separately. While these are the typical cases found in/proc/versionon WSL, a case-insensitive search would be more robust against potential variations.♻️ Suggested improvement for case-insensitive search
+#include <algorithm> +#include <cctype> + inline bool is_wsl() { std::ifstream proc_version("/proc/version"); if (proc_version.is_open()) { std::string line; std::getline(proc_version, line); - return line.find("microsoft") != std::string::npos || - line.find("Microsoft") != std::string::npos; + std::string line_lower; + line_lower.reserve(line.size()); + std::transform(line.begin(), line.end(), std::back_inserter(line_lower), + [](unsigned char c) { return std::tolower(c); }); + return line_lower.find("microsoft") != std::string::npos; } return false; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mr/test_utils.hpp` around lines 18 - 31, The is_wsl() function currently checks for "microsoft" and "Microsoft" separately; change it to perform a case-insensitive search of the /proc/version line (e.g., convert the line to lowercase or use a case-insensitive search routine) and then check for "microsoft" once. Update the logic inside is_wsl() (the std::getline and subsequent find checks) to normalize case before calling find so the function is robust to any capitalization variations.cpp/tests/mr/arena_mr_tests.cpp (1)
487-494: Consider adding a comment explaining the WSL-specific percentage.The test logic is correct, but the rationale for using 80% on WSL vs 90% on native Linux is not documented. A brief comment would help future maintainers understand why these specific values were chosen.
📝 Suggested documentation improvement
TEST_F(ArenaTest, AllocateMostOfFreeMemory) // NOLINT { + // WSL has higher memory overhead, so use a lower percentage to avoid allocation failures auto const percent = is_wsl() ? 80 : 90; EXPECT_NO_THROW([percent]() { // NOLINT(cppcoreguidelines-avoid-goto) auto const pool_size = rmm::percent_of_free_device_memory(percent); arena_mr mr(rmm::mr::get_current_device_resource_ref(), pool_size); }()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mr/arena_mr_tests.cpp` around lines 487 - 494, The test AllocateMostOfFreeMemory uses a different percent value on WSL (is_wsl()) but lacks an explanation; add a brief inline comment near the percent calculation explaining why WSL uses 80% vs 90% (e.g., WSL reports/free memory differently or to avoid OOM on WSL environments), so future readers understand the rationale; update the comment close to the percent variable and/or the call to rmm::percent_of_free_device_memory to reference is_wsl and the chosen safety margin for WSL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cpp/tests/mr/arena_mr_tests.cpp`:
- Around line 487-494: The test AllocateMostOfFreeMemory uses a different
percent value on WSL (is_wsl()) but lacks an explanation; add a brief inline
comment near the percent calculation explaining why WSL uses 80% vs 90% (e.g.,
WSL reports/free memory differently or to avoid OOM on WSL environments), so
future readers understand the rationale; update the comment close to the percent
variable and/or the call to rmm::percent_of_free_device_memory to reference
is_wsl and the chosen safety margin for WSL.
In `@cpp/tests/mr/test_utils.hpp`:
- Around line 18-31: The is_wsl() function currently checks for "microsoft" and
"Microsoft" separately; change it to perform a case-insensitive search of the
/proc/version line (e.g., convert the line to lowercase or use a
case-insensitive search routine) and then check for "microsoft" once. Update the
logic inside is_wsl() (the std::getline and subsequent find checks) to normalize
case before calling find so the function is robust to any capitalization
variations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1492e7b3-713a-4877-a7d9-1279be46705c
📒 Files selected for processing (3)
cpp/tests/mr/arena_mr_tests.cppcpp/tests/mr/pool_mr_tests.cppcpp/tests/mr/test_utils.hpp
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cpp/tests/mr/arena_mr_tests.cpp (1)
489-494: Consider caching theis_wsl()result to avoid redundant file I/O.
is_wsl()is called twice, and each call reads/proc/version. Storing the result in a local variable avoids the redundant I/O.♻️ Suggested refactor
TEST_F(ArenaTest, AllocateMostOfFreeMemory) // NOLINT { - if (is_wsl()) { log_max_allocatable_memory(); } - auto const percent = is_wsl() ? 70 : 90; + auto const wsl = is_wsl(); + if (wsl) { log_max_allocatable_memory(); } + auto const percent = wsl ? 70 : 90; EXPECT_NO_THROW([percent]() { // NOLINT(cppcoreguidelines-avoid-goto) auto const pool_size = rmm::percent_of_free_device_memory(percent); arena_mr mr(rmm::mr::get_current_device_resource_ref(), pool_size); }()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mr/arena_mr_tests.cpp` around lines 489 - 494, Cache the result of is_wsl() in a local boolean to avoid calling it twice (and re-reading /proc/version); use that cached value for the conditional call to log_max_allocatable_memory() and to decide percent (replace the two is_wsl() invocations around log_max_allocatable_memory(), percent, and the EXPECT_NO_THROW lambda). Ensure you reference the cached variable where percent is computed and before calling log_max_allocatable_memory() so the behavior is unchanged but redundant file I/O is eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cpp/tests/mr/arena_mr_tests.cpp`:
- Around line 489-494: Cache the result of is_wsl() in a local boolean to avoid
calling it twice (and re-reading /proc/version); use that cached value for the
conditional call to log_max_allocatable_memory() and to decide percent (replace
the two is_wsl() invocations around log_max_allocatable_memory(), percent, and
the EXPECT_NO_THROW lambda). Ensure you reference the cached variable where
percent is computed and before calling log_max_allocatable_memory() so the
behavior is unchanged but redundant file I/O is eliminated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bd6f10fa-bb8b-4086-8b08-d8897a322f43
📒 Files selected for processing (3)
cpp/tests/mr/arena_mr_tests.cppcpp/tests/mr/pool_mr_tests.cppcpp/tests/mr/test_utils.hpp
🚧 Files skipped from review as they are similar to previous changes (2)
- cpp/tests/mr/pool_mr_tests.cpp
- cpp/tests/mr/test_utils.hpp
d44e8b9 to
80a625c
Compare
Description
Test of the new WSL CI runners.
Checklist