Test WSL in CI #2339

Open
bdice wants to merge 6 commits into rapidsai:main from bdice:wsl

Collaborator

@bdice bdice commented Mar 30, 2026

Description

Test of the new WSL CI runners.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot Bot commented Mar 30, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Collaborator Author

bdice commented Mar 30, 2026

/ok to test

coderabbitai Bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Repointed reusable GitHub Actions workflow refs from @main to @wsl; updated CI shell-script diagnostics to print broader system info (uname -a) and relabel checks; bumped a script header year; added global_arena::size() accessor and enriched RMM memory-resource error/log messages with arena and device memory details; added WSL detection and max-allocation logging helpers and adjusted tests to use them.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Actions workflows: `.github/workflows/build.yaml`, `.github/workflows/pr.yaml`, `.github/workflows/test.yaml` | Replaced `uses:` refs for multiple jobs from `rapidsai/shared-workflows@main` to `rapidsai/shared-workflows@wsl`. No other job `inputs`, `needs`, `if`, or `secrets` changes. |
| CI shell scripts (diagnostics update): `ci/test_cpp.sh`, `ci/test_cpp_debug.sh`, `ci/test_python.sh`, `ci/test_python_integrations.sh`, `ci/test_wheel_integrations.sh` | Changed diagnostic label from "Check GPU usage" to "Check system info", inserted `uname -a` before `nvidia-smi`, and bumped copyright year in `ci/test_cpp.sh`. |
| RMM memory-resource headers: `cpp/include/rmm/mr/arena_memory_resource.hpp`, `cpp/include/rmm/mr/detail/arena.hpp`, `cpp/include/rmm/mr/pool_memory_resource.hpp` | Added `global_arena::size()` accessor; expanded out-of-memory and pool-expansion error messages/logs to include arena size and device memory (free/total) diagnostics; updated SPDX year in `arena.hpp`. |
| Tests (WSL detection and pool sizing): `cpp/tests/mr/test_utils.hpp`, `cpp/tests/mr/arena_mr_tests.cpp`, `cpp/tests/mr/pool_mr_tests.cpp` | Added `rmm::test::is_wsl()` and `log_max_allocatable_memory()` helpers; updated SPDX year; tests now use `percent = is_wsl() ? 70 : 90` and log max allocatable memory on WSL; test names adjusted accordingly. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

non-breaking, improvement

Suggested reviewers

  • gforsyth
  • vyasr
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Test WSL in CI' directly relates to the primary objective of the PR, which is testing the new WSL CI runners as confirmed by comments and CI output. |
| Description check | ✅ Passed | The description 'Test of the new WSL CI runners' is directly related to the changeset, which updates CI workflows and test utilities to support WSL-specific behaviors. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr.yaml:
- Line 29: The workflow reference uses a non-existent branch tag "@wsl" in the
uses line (e.g., the entry "uses:
rapidsai/shared-workflows/.github/workflows/pr-builder.yaml@wsl"); update all
occurrences (all 15 references) to point to a valid branch such as "@main" or
the correct release branch (for example "@branch-25.04") so the action can
resolve, or alternatively ensure the rapidsai/shared-workflows repo actually
contains a "wsl" branch with the required workflow files before keeping "@wsl".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8c653c59-05d6-4b75-865b-5cfc6a099329

📥 Commits

Reviewing files that changed from the base of the PR and between b325755 and d24c097.

📒 Files selected for processing (3)
  • .github/workflows/build.yaml
  • .github/workflows/pr.yaml
  • .github/workflows/test.yaml

Comment thread .github/workflows/pr.yaml
Collaborator Author

bdice commented Mar 30, 2026

/ok to test

Collaborator Author

bdice commented Mar 31, 2026

/ok to test

Collaborator Author

bdice commented Mar 31, 2026

┌─────────────────────────┐
|    Check system info    |
└─────────────────────────┘
Linux e067bc46e75d 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun  5 18:30:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Yup, it's running WSL!

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
cpp/include/rmm/mr/pool_memory_resource.hpp (1)

242-261: Same exception safety concern: available_device_memory() can throw.

The call to rmm::available_device_memory() on line 243 can throw rmm::cuda_error (per cpp/src/cuda_device.cpp:31-37), which would prevent RMM_FAIL from executing with the intended rmm::out_of_memory exception and diagnostic message.

For consistency with the suggested fix in arena_memory_resource.hpp, wrap the device memory query in a try-catch:

Suggested fix for exception safety
     auto report_error = [&](const char* reason) {
       auto const max_size = maximum_pool_size_.value_or(std::numeric_limits<std::size_t>::max());
-      auto const [free_memory, total_memory] = rmm::available_device_memory();
+      std::size_t free_memory{};
+      std::size_t total_memory{};
+      try {
+        std::tie(free_memory, total_memory) = rmm::available_device_memory();
+      } catch (...) {
+        // Device memory query failed; continue with zeros
+      }
       RMM_LOG_ERROR(
         "[A][Stream %s][Upstream %zuB][FAILURE maximum pool size exceeded: %s]"
         "[Pool current=%s max=%s][Device free=%s total=%s]",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/rmm/mr/pool_memory_resource.hpp` around lines 242 - 261, The call
to rmm::available_device_memory() can throw and would bypass the RMM_FAIL
diagnostic; wrap the call to rmm::available_device_memory() in a try-catch
around the block that computes free_memory/total_memory so exceptions are caught
and converted into safe fallback values (e.g., zeros or
std::numeric_limits<std::size_t>::max()) and an explanatory string or flag is
recorded; then proceed to log with RMM_LOG_ERROR and call RMM_FAIL using those
fallback values and include any caught exception message (from the caught
rmm::cuda_error or std::exception) in the reason/diagnostic so pool_size(),
maximum_pool_size_, min_size, reason, and the original RMM_FAIL path remain
executed deterministically even if available_device_memory() throws.
cpp/include/rmm/mr/arena_memory_resource.hpp (1)

158-165: available_device_memory() can throw, masking the original OOM error.

Per cpp/src/cuda_device.cpp:31-37, rmm::available_device_memory() uses RMM_CUDA_TRY(cudaMemGetInfo(...)) which can throw rmm::cuda_error. If the CUDA call fails (device error, context destroyed, etc.), the user sees a generic CUDA error instead of the intended rmm::out_of_memory with diagnostic details.

Consider wrapping in a try-catch to gracefully degrade when device memory info is unavailable:

Suggested fix for exception safety
       if (pointer == nullptr) {
         if (dump_log_on_failure_) { dump_memory_log(bytes); }
-        auto const [free_memory, total_memory] = rmm::available_device_memory();
+        std::size_t free_memory{};
+        std::size_t total_memory{};
+        try {
+          std::tie(free_memory, total_memory) = rmm::available_device_memory();
+        } catch (...) {
+          // Device memory query failed; continue with zeros
+        }
         auto const msg = std::string("Maximum pool size exceeded (failed to allocate ") +
                          rmm::detail::format_bytes(bytes) +
                          ", arena size=" + rmm::detail::format_bytes(global_arena_.size()) +
                          ", device free=" + rmm::detail::format_bytes(free_memory) +
                          " total=" + rmm::detail::format_bytes(total_memory) +
                          "): No room in arena.";
         RMM_FAIL(msg.c_str(), rmm::out_of_memory);
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/rmm/mr/arena_memory_resource.hpp` around lines 158 - 165, The
call to rmm::available_device_memory() inside arena_memory_resource's
allocation-failure path can throw (rmm::cuda_error) and mask the intended
rmm::out_of_memory; wrap the available_device_memory() call in a try-catch that
catches rmm::cuda_error (and/or std::exception) and supplies a safe fallback
(e.g. unknown/0 values or sentinel strings) so you still construct the
diagnostic message and call RMM_FAIL(msg.c_str(), rmm::out_of_memory). Locate
the code using available_device_memory(), rmm::detail::format_bytes,
global_arena_.size(), and RMM_FAIL and implement the try-catch there to ensure
CUDA query failures don’t replace the original OOM failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 029d331c-a0f7-4a37-845f-f295b63460d2

📥 Commits

Reviewing files that changed from the base of the PR and between 08deb4e and 06f7ff3.

📒 Files selected for processing (3)
  • cpp/include/rmm/mr/arena_memory_resource.hpp
  • cpp/include/rmm/mr/detail/arena.hpp
  • cpp/include/rmm/mr/pool_memory_resource.hpp

Collaborator Author

bdice commented Mar 31, 2026

/ok to test

@bdice bdice marked this pull request as ready for review March 31, 2026 20:01
@bdice bdice requested review from a team as code owners March 31, 2026 20:01
@bdice bdice requested review from AyodeAwe, miscco and shrshi and removed request for AyodeAwe, miscco and shrshi March 31, 2026 20:01

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
cpp/tests/mr/test_utils.hpp (1)

18-31: Consider using a case-insensitive search for robustness.

The current implementation checks for "microsoft" and "Microsoft" separately. While these are the typical cases found in /proc/version on WSL, a case-insensitive search would be more robust against potential variations.

♻️ Suggested improvement for case-insensitive search
+#include <algorithm>
+#include <cctype>
+
 inline bool is_wsl()
 {
   std::ifstream proc_version("/proc/version");
   if (proc_version.is_open()) {
     std::string line;
     std::getline(proc_version, line);
-    return line.find("microsoft") != std::string::npos ||
-           line.find("Microsoft") != std::string::npos;
+    std::string line_lower;
+    line_lower.reserve(line.size());
+    std::transform(line.begin(), line.end(), std::back_inserter(line_lower),
+                   [](unsigned char c) { return std::tolower(c); });
+    return line_lower.find("microsoft") != std::string::npos;
   }
   return false;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mr/test_utils.hpp` around lines 18 - 31, The is_wsl() function
currently checks for "microsoft" and "Microsoft" separately; change it to
perform a case-insensitive search of the /proc/version line (e.g., convert the
line to lowercase or use a case-insensitive search routine) and then check for
"microsoft" once. Update the logic inside is_wsl() (the std::getline and
subsequent find checks) to normalize case before calling find so the function is
robust to any capitalization variations.
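
As a rough illustration of the case-insensitive check suggested above, the sketch below classifies a kernel version string without allocating a lowercase copy. The helper names `contains_ignore_case` and `looks_like_wsl` are hypothetical and are not part of RMM or this PR. The sketch assumes ASCII input and takes the string as a parameter rather than reading `/proc/version`, so it can be exercised on any platform.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical helper: true if `text` contains `needle_lower`, ignoring ASCII case.
// `needle_lower` must be non-empty and already lowercase.
inline bool contains_ignore_case(std::string_view text, std::string_view needle_lower)
{
  assert(!needle_lower.empty());
  // Lowercase each character of `text` on the fly and compare it to the needle.
  auto const iter = std::search(
    text.begin(), text.end(), needle_lower.begin(), needle_lower.end(),
    [](unsigned char lhs, unsigned char rhs) { return std::tolower(lhs) == rhs; });
  return iter != text.end();
}

// Hypothetical helper: classifies a kernel version line, such as the first line
// of /proc/version or the output of `uname -a`.
inline bool looks_like_wsl(std::string_view version_line)
{
  return contains_ignore_case(version_line, "microsoft");
}
```

Passing the `uname -a` line from the CI log above to `looks_like_wsl` would match on the `microsoft-standard-WSL2` kernel suffix.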
cpp/tests/mr/arena_mr_tests.cpp (1)

487-494: Consider adding a comment explaining the WSL-specific percentage.

The test logic is correct, but the rationale for using 80% on WSL vs 90% on native Linux is not documented. A brief comment would help future maintainers understand why these specific values were chosen.

📝 Suggested documentation improvement
 TEST_F(ArenaTest, AllocateMostOfFreeMemory)  // NOLINT
 {
+  // WSL has higher memory overhead, so use a lower percentage to avoid allocation failures
   auto const percent = is_wsl() ? 80 : 90;
   EXPECT_NO_THROW([percent]() {  // NOLINT(cppcoreguidelines-avoid-goto)
     auto const pool_size = rmm::percent_of_free_device_memory(percent);
     arena_mr mr(rmm::mr::get_current_device_resource_ref(), pool_size);
   }());
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mr/arena_mr_tests.cpp` around lines 487 - 494, The test
AllocateMostOfFreeMemory uses a different percent value on WSL (is_wsl()) but
lacks an explanation; add a brief inline comment near the percent calculation
explaining why WSL uses 80% vs 90% (e.g., WSL reports/free memory differently or
to avoid OOM on WSL environments), so future readers understand the rationale;
update the comment close to the percent variable and/or the call to
rmm::percent_of_free_device_memory to reference is_wsl and the chosen safety
margin for WSL.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1492e7b3-713a-4877-a7d9-1279be46705c

📥 Commits

Reviewing files that changed from the base of the PR and between 06f7ff3 and a7a3221.

📒 Files selected for processing (3)
  • cpp/tests/mr/arena_mr_tests.cpp
  • cpp/tests/mr/pool_mr_tests.cpp
  • cpp/tests/mr/test_utils.hpp

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
cpp/tests/mr/arena_mr_tests.cpp (1)

489-494: Consider caching the is_wsl() result to avoid redundant file I/O.

is_wsl() is called twice, and each call reads /proc/version. Storing the result in a local variable avoids the redundant I/O.

♻️ Suggested refactor
 TEST_F(ArenaTest, AllocateMostOfFreeMemory)  // NOLINT
 {
-  if (is_wsl()) { log_max_allocatable_memory(); }
-  auto const percent = is_wsl() ? 70 : 90;
+  auto const wsl     = is_wsl();
+  if (wsl) { log_max_allocatable_memory(); }
+  auto const percent = wsl ? 70 : 90;
   EXPECT_NO_THROW([percent]() {  // NOLINT(cppcoreguidelines-avoid-goto)
     auto const pool_size = rmm::percent_of_free_device_memory(percent);
     arena_mr mr(rmm::mr::get_current_device_resource_ref(), pool_size);
   }());
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mr/arena_mr_tests.cpp` around lines 489 - 494, Cache the result of
is_wsl() in a local boolean to avoid calling it twice (and re-reading
/proc/version); use that cached value for the conditional call to
log_max_allocatable_memory() and to decide percent (replace the two is_wsl()
invocations around log_max_allocatable_memory(), percent, and the
EXPECT_NO_THROW lambda). Ensure you reference the cached variable where percent
is computed and before calling log_max_allocatable_memory() so the behavior is
unchanged but redundant file I/O is eliminated.
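
If the WSL check were needed in more than one place, a small lazily evaluated cache is one alternative to the local variable suggested above. This is only a sketch: `cached_flag` is a hypothetical helper, not something in RMM's test utilities, and the probe passed to it stands in for `is_wsl()`. It is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical helper: runs `probe` at most once and remembers its result.
// Not thread-safe; intended for single-threaded test setup code.
class cached_flag {
 public:
  explicit cached_flag(std::function<bool()> probe) : probe_{std::move(probe)}
  {
    assert(static_cast<bool>(probe_));  // an empty probe would throw when called
  }

  // Returns the cached result, invoking the probe only on the first call.
  bool get()
  {
    if (!value_.has_value()) { value_ = probe_(); }
    return *value_;
  }

 private:
  std::function<bool()> probe_;
  std::optional<bool> value_;
};
```

For a single test body, the plain local variable from the suggested refactor is simpler and has the same effect.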

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd6f10fa-bb8b-4086-8b08-d8897a322f43

📥 Commits

Reviewing files that changed from the base of the PR and between a7a3221 and fd4b128.

📒 Files selected for processing (3)
  • cpp/tests/mr/arena_mr_tests.cpp
  • cpp/tests/mr/pool_mr_tests.cpp
  • cpp/tests/mr/test_utils.hpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • cpp/tests/mr/pool_mr_tests.cpp
  • cpp/tests/mr/test_utils.hpp

@bdice bdice force-pushed the wsl branch 2 times, most recently from d44e8b9 to 80a625c Compare April 3, 2026 13:59