Test WSL in CI by bdice · Pull Request #2339 · rapidsai/rmm

bdice · 2026-03-30T21:39:08Z

Description

Test of the new WSL CI runners.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-03-30T21:39:12Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

bdice · 2026-03-30T21:39:15Z

/ok to test

coderabbitai · 2026-03-30T21:42:19Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough


Repointed reusable GitHub Actions workflow refs from @main to @wsl; updated CI shell-script diagnostics to print broader system info (uname -a) and relabel checks; bumped a script header year; added global_arena::size() accessor and enriched RMM memory-resource error/log messages with arena and device memory details; added WSL detection and max-allocation logging helpers and adjusted tests to use them.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| GitHub Actions workflows | `.github/workflows/build.yaml`, `.github/workflows/pr.yaml`, `.github/workflows/test.yaml` | Replaced `uses:` refs for multiple jobs from `rapidsai/shared-workflows@main` → `rapidsai/shared-workflows@wsl`. No other job inputs, `needs`, `if`, or secrets changes. |
| CI shell scripts (diagnostics update) | `ci/test_cpp.sh`, `ci/test_cpp_debug.sh`, `ci/test_python.sh`, `ci/test_python_integrations.sh`, `ci/test_wheel_integrations.sh` | Changed diagnostic label from “Check GPU usage” to “Check system info”, inserted `uname -a` before `nvidia-smi`, and bumped copyright year in `ci/test_cpp.sh`. |
| RMM memory-resource headers | `cpp/include/rmm/mr/arena_memory_resource.hpp`, `cpp/include/rmm/mr/detail/arena.hpp`, `cpp/include/rmm/mr/pool_memory_resource.hpp` | Added `global_arena::size()` accessor; expanded out-of-memory and pool-expansion error messages/logs to include arena size and device memory (free/total) diagnostics; updated SPDX year in `arena.hpp`. |
| Tests (WSL detection and pool sizing) | `cpp/tests/mr/test_utils.hpp`, `cpp/tests/mr/arena_mr_tests.cpp`, `cpp/tests/mr/pool_mr_tests.cpp` | Added `rmm::test::is_wsl()` and `log_max_allocatable_memory()` helpers; updated SPDX year; tests now use `percent = is_wsl() ? 70 : 90` and log max allocatable memory on WSL; test names adjusted accordingly. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

build wheels with CUDA 13.0.x, test wheels against mix of CTK versions #2270 — Changes .github/workflows/pr.yaml uses: references; directly related to workflow ref updates.

Suggested labels

non-breaking, improvement

Suggested reviewers

gforsyth
vyasr

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Test WSL in CI' directly relates to the primary objective of the PR, which is testing the new WSL CI runners as confirmed by comments and CI output. |
| Description check | ✅ Passed | The description 'Test of the new WSL CI runners' is directly related to the changeset, which updates CI workflows and test utilities to support WSL-specific behaviors. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr.yaml:
- Line 29: The workflow reference uses a non-existent branch tag "@wsl" in the
uses line (e.g., the entry "uses:
rapidsai/shared-workflows/.github/workflows/pr-builder.yaml@wsl"); update all
occurrences (all 15 references) to point to a valid branch such as "@main" or
the correct release branch (for example "@branch-25.04") so the action can
resolve, or alternatively ensure the rapidsai/shared-workflows repo actually
contains a "wsl" branch with the required workflow files before keeping "@wsl".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8c653c59-05d6-4b75-865b-5cfc6a099329

📥 Commits

Reviewing files that changed from the base of the PR and between b325755 and d24c097.

📒 Files selected for processing (3)

.github/workflows/build.yaml
.github/workflows/pr.yaml
.github/workflows/test.yaml

bdice · 2026-03-30T22:02:40Z

/ok to test

bdice · 2026-03-31T00:19:46Z

/ok to test

bdice · 2026-03-31T00:30:06Z

┌─────────────────────────┐
|    Check system info    |
└─────────────────────────┘
Linux e067bc46e75d 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun  5 18:30:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Yup, it's running WSL!

coderabbitai

🧹 Nitpick comments (2)

cpp/include/rmm/mr/pool_memory_resource.hpp (1)

242-261: Same exception safety concern: available_device_memory() can throw.

The call to rmm::available_device_memory() on line 243 can throw rmm::cuda_error (per cpp/src/cuda_device.cpp:31-37), which would prevent RMM_FAIL from executing with the intended rmm::out_of_memory exception and diagnostic message.

For consistency with the suggested fix in arena_memory_resource.hpp, wrap the device memory query in a try-catch:

Suggested fix for exception safety

     auto report_error = [&](const char* reason) {
       auto const max_size = maximum_pool_size_.value_or(std::numeric_limits<std::size_t>::max());
-      auto const [free_memory, total_memory] = rmm::available_device_memory();
+      std::size_t free_memory{};
+      std::size_t total_memory{};
+      try {
+        std::tie(free_memory, total_memory) = rmm::available_device_memory();
+      } catch (...) {
+        // Device memory query failed; continue with zeros
+      }
       RMM_LOG_ERROR(
         "[A][Stream %s][Upstream %zuB][FAILURE maximum pool size exceeded: %s]"
         "[Pool current=%s max=%s][Device free=%s total=%s]",

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/include/rmm/mr/pool_memory_resource.hpp` around lines 242 - 261, The call
to rmm::available_device_memory() can throw and would bypass the RMM_FAIL
diagnostic; wrap the call to rmm::available_device_memory() in a try-catch
around the block that computes free_memory/total_memory so exceptions are caught
and converted into safe fallback values (e.g., zeros or
std::numeric_limits<std::size_t>::max()) and an explanatory string or flag is
recorded; then proceed to log with RMM_LOG_ERROR and call RMM_FAIL using those
fallback values and include any caught exception message (from the caught
rmm::cuda_error or std::exception) in the reason/diagnostic so pool_size(),
maximum_pool_size_, min_size, reason, and the original RMM_FAIL path remain
executed deterministically even if available_device_memory() throws.
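
The guard described in this prompt can be sketched outside of RMM. The block below is a minimal illustration under stated assumptions, not the library's code: `memory_info` and `safe_query` are hypothetical names, and the `query` callable stands in for `rmm::available_device_memory()`. It shows the query's exception being caught and recorded as a note so the caller can still raise its own out-of-memory error.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical result type: fallback values plus a note describing any query failure.
struct memory_info {
  std::size_t free_bytes{0};
  std::size_t total_bytes{0};
  std::string note{};  // empty when the query succeeded
};

// Runs a (possibly throwing) memory query and never lets its exception escape,
// so the caller can still report its original out-of-memory failure afterwards.
inline memory_info safe_query(
  std::function<std::pair<std::size_t, std::size_t>()> const& query)
{
  memory_info info{};
  try {
    auto const result = query();
    info.free_bytes   = result.first;
    info.total_bytes  = result.second;
  } catch (std::exception const& e) {
    // Keep the zero fallbacks and remember why the query failed.
    info.note = std::string{"device memory query failed: "} + e.what();
  } catch (...) {
    info.note = "device memory query failed: unknown error";
  }
  return info;
}
```

A caller would append `info.note` to its diagnostic string when it is non-empty, then fail with the out-of-memory error as before.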

cpp/include/rmm/mr/arena_memory_resource.hpp (1)

158-165: available_device_memory() can throw, masking the original OOM error.

Per cpp/src/cuda_device.cpp:31-37, rmm::available_device_memory() uses RMM_CUDA_TRY(cudaMemGetInfo(...)) which can throw rmm::cuda_error. If the CUDA call fails (device error, context destroyed, etc.), the user sees a generic CUDA error instead of the intended rmm::out_of_memory with diagnostic details.

Consider wrapping in a try-catch to gracefully degrade when device memory info is unavailable:

Suggested fix for exception safety

       if (pointer == nullptr) {
         if (dump_log_on_failure_) { dump_memory_log(bytes); }
-        auto const [free_memory, total_memory] = rmm::available_device_memory();
+        std::size_t free_memory{};
+        std::size_t total_memory{};
+        try {
+          std::tie(free_memory, total_memory) = rmm::available_device_memory();
+        } catch (...) {
+          // Device memory query failed; continue with zeros
+        }
         auto const msg = std::string("Maximum pool size exceeded (failed to allocate ") +
                          rmm::detail::format_bytes(bytes) +
                          ", arena size=" + rmm::detail::format_bytes(global_arena_.size()) +
                          ", device free=" + rmm::detail::format_bytes(free_memory) +
                          " total=" + rmm::detail::format_bytes(total_memory) +
                          "): No room in arena.";
         RMM_FAIL(msg.c_str(), rmm::out_of_memory);
       }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/include/rmm/mr/arena_memory_resource.hpp` around lines 158 - 165, The
call to rmm::available_device_memory() inside arena_memory_resource's
allocation-failure path can throw (rmm::cuda_error) and mask the intended
rmm::out_of_memory; wrap the available_device_memory() call in a try-catch that
catches rmm::cuda_error (and/or std::exception) and supplies a safe fallback
(e.g. unknown/0 values or sentinel strings) so you still construct the
diagnostic message and call RMM_FAIL(msg.c_str(), rmm::out_of_memory). Locate
the code using available_device_memory(), rmm::detail::format_bytes,
global_arena_.size(), and RMM_FAIL and implement the try-catch there to ensure
CUDA query failures don’t replace the original OOM failure.
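
One way to supply the "sentinel strings" fallback mentioned in this prompt is to make the device memory figures optional when building the message. This is a sketch under assumptions: `build_oom_message` is a hypothetical helper, and it prints raw byte counts rather than calling `rmm::detail::format_bytes`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Builds an out-of-memory message. The device memory pair (free, total) is
// optional so that a failed query degrades to "unknown" instead of replacing
// the original error.
inline std::string build_oom_message(
  std::size_t requested_bytes,
  std::size_t arena_bytes,
  std::optional<std::pair<std::size_t, std::size_t>> device)
{
  auto const free_text =
    device ? std::to_string(device->first) + " B" : std::string{"unknown"};
  auto const total_text =
    device ? std::to_string(device->second) + " B" : std::string{"unknown"};
  return "Maximum pool size exceeded (failed to allocate " +
         std::to_string(requested_bytes) + " B, arena size=" +
         std::to_string(arena_bytes) + " B, device free=" + free_text +
         " total=" + total_text + "): No room in arena.";
}
```

Passing `std::nullopt` when the query throws keeps the message distinguishable from a device that genuinely reports zero bytes free.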

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 029d331c-a0f7-4a37-845f-f295b63460d2

📥 Commits

Reviewing files that changed from the base of the PR and between 08deb4e and 06f7ff3.

📒 Files selected for processing (3)

cpp/include/rmm/mr/arena_memory_resource.hpp
cpp/include/rmm/mr/detail/arena.hpp
cpp/include/rmm/mr/pool_memory_resource.hpp

bdice · 2026-03-31T16:26:42Z

/ok to test

coderabbitai

🧹 Nitpick comments (2)

cpp/tests/mr/test_utils.hpp (1)

18-31: Consider using a case-insensitive search for robustness.

The current implementation checks for "microsoft" and "Microsoft" separately. While these are the typical cases found in /proc/version on WSL, a case-insensitive search would be more robust against potential variations.

♻️ Suggested improvement for case-insensitive search

+#include <algorithm>
+#include <cctype>
+#include <iterator>
+
 inline bool is_wsl()
 {
   std::ifstream proc_version("/proc/version");
   if (proc_version.is_open()) {
     std::string line;
     std::getline(proc_version, line);
-    return line.find("microsoft") != std::string::npos ||
-           line.find("Microsoft") != std::string::npos;
+    std::string line_lower;
+    line_lower.reserve(line.size());
+    std::transform(line.begin(), line.end(), std::back_inserter(line_lower),
+                   [](unsigned char c) { return std::tolower(c); });
+    return line_lower.find("microsoft") != std::string::npos;
   }
   return false;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mr/test_utils.hpp` around lines 18 - 31, The is_wsl() function
currently checks for "microsoft" and "Microsoft" separately; change it to
perform a case-insensitive search of the /proc/version line (e.g., convert the
line to lowercase or use a case-insensitive search routine) and then check for
"microsoft" once. Update the logic inside is_wsl() (the std::getline and
subsequent find checks) to normalize case before calling find so the function is
robust to any capitalization variations.

cpp/tests/mr/arena_mr_tests.cpp (1)

487-494: Consider adding a comment explaining the WSL-specific percentage.

The test logic is correct, but the rationale for using 80% on WSL vs 90% on native Linux is not documented. A brief comment would help future maintainers understand why these specific values were chosen.

📝 Suggested documentation improvement

 TEST_F(ArenaTest, AllocateMostOfFreeMemory)  // NOLINT
 {
+  // WSL has higher memory overhead, so use a lower percentage to avoid allocation failures
   auto const percent = is_wsl() ? 80 : 90;
   EXPECT_NO_THROW([percent]() {  // NOLINT(cppcoreguidelines-avoid-goto)
     auto const pool_size = rmm::percent_of_free_device_memory(percent);
     arena_mr mr(rmm::mr::get_current_device_resource_ref(), pool_size);
   }());
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mr/arena_mr_tests.cpp` around lines 487 - 494, The test
AllocateMostOfFreeMemory uses a different percent value on WSL (is_wsl()) but
lacks an explanation; add a brief inline comment near the percent calculation
explaining why WSL uses 80% vs 90% (e.g., WSL reports free memory differently, or
the lower value avoids OOM in WSL environments), so future readers understand the rationale;
update the comment close to the percent variable and/or the call to
rmm::percent_of_free_device_memory to reference is_wsl and the chosen safety
margin for WSL.
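
To make the effect of the percentage concrete, here is a rough arithmetic sketch. It assumes the pool size is the free device memory scaled by the percentage and rounded down to an allocation alignment; `percent_of` and the 256-byte default are assumptions for illustration, not the signature or behavior of `rmm::percent_of_free_device_memory`.

```cpp
#include <cassert>
#include <cstddef>

// Computes `percent` of `free_bytes`, rounded down to a multiple of `alignment`.
// The 256-byte default alignment is an assumption made for this sketch.
constexpr std::size_t percent_of(std::size_t free_bytes,
                                 int percent,
                                 std::size_t alignment = 256)
{
  auto const scaled =
    static_cast<std::size_t>(static_cast<double>(free_bytes) * percent / 100.0);
  return scaled - (scaled % alignment);
}
```

Under these assumptions, with 10 GiB free a 90% request asks for 9 GiB in a single pool, while an 80% request asks for 8 GiB and leaves 2 GiB of headroom instead of 1 GiB.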

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1492e7b3-713a-4877-a7d9-1279be46705c

📥 Commits

Reviewing files that changed from the base of the PR and between 06f7ff3 and a7a3221.

📒 Files selected for processing (3)

cpp/tests/mr/arena_mr_tests.cpp
cpp/tests/mr/pool_mr_tests.cpp
cpp/tests/mr/test_utils.hpp

coderabbitai

🧹 Nitpick comments (1)

cpp/tests/mr/arena_mr_tests.cpp (1)

489-494: Consider caching the is_wsl() result to avoid redundant file I/O.

is_wsl() is called twice, and each call reads /proc/version. Storing the result in a local variable avoids the redundant I/O.

♻️ Suggested refactor

 TEST_F(ArenaTest, AllocateMostOfFreeMemory)  // NOLINT
 {
-  if (is_wsl()) { log_max_allocatable_memory(); }
-  auto const percent = is_wsl() ? 70 : 90;
+  auto const wsl     = is_wsl();
+  if (wsl) { log_max_allocatable_memory(); }
+  auto const percent = wsl ? 70 : 90;
   EXPECT_NO_THROW([percent]() {  // NOLINT(cppcoreguidelines-avoid-goto)
     auto const pool_size = rmm::percent_of_free_device_memory(percent);
     arena_mr mr(rmm::mr::get_current_device_resource_ref(), pool_size);
   }());
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mr/arena_mr_tests.cpp` around lines 489 - 494, Cache the result of
is_wsl() in a local boolean to avoid calling it twice (and re-reading
/proc/version); use that cached value for the conditional call to
log_max_allocatable_memory() and to decide percent (replace the two is_wsl()
invocations around log_max_allocatable_memory(), percent, and the
EXPECT_NO_THROW lambda). Ensure you reference the cached variable where percent
is computed and before calling log_max_allocatable_memory() so the behavior is
unchanged but redundant file I/O is eliminated.
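
Besides a local variable in each test, the result could be cached once per process with a function-local static. This is a sketch of that alternative, not code from the PR: `probe_wsl` and `is_wsl_cached` are hypothetical names, and the probe mirrors the two-spelling check quoted in the earlier review.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical uncached probe: reads the first line of /proc/version on every call.
inline bool probe_wsl()
{
  std::ifstream proc_version("/proc/version");
  std::string line;
  if (!proc_version.is_open() || !std::getline(proc_version, line)) { return false; }
  return line.find("microsoft") != std::string::npos ||
         line.find("Microsoft") != std::string::npos;
}

// Cached wrapper: the function-local static is initialized once, on first use,
// so later calls return the stored value without touching the file system.
inline bool is_wsl_cached()
{
  static bool const cached = probe_wsl();
  return cached;
}
```

Initialization of a function-local static is thread-safe since C++11, so the wrapper can be called from concurrently running tests.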

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd6f10fa-bb8b-4086-8b08-d8897a322f43

📥 Commits

Reviewing files that changed from the base of the PR and between a7a3221 and fd4b128.

📒 Files selected for processing (3)

cpp/tests/mr/arena_mr_tests.cpp
cpp/tests/mr/pool_mr_tests.cpp
cpp/tests/mr/test_utils.hpp

🚧 Files skipped from review as they are similar to previous changes (2)

cpp/tests/mr/pool_mr_tests.cpp
cpp/tests/mr/test_utils.hpp

Test WSL in CI

d24c097

github-project-automation Bot added this to RMM Project Board Mar 30, 2026

coderabbitai Bot reviewed Mar 30, 2026

Add uname -a to CI test scripts for system info logging

08deb4e

Rerun CI

7693edc

bdice added 2 commits March 31, 2026 10:47

Include pool/arena size and device memory info in OOM error messages

aa21179

Merge branch 'wsl' of https://github.com/bdice/rmm into wsl

06f7ff3

coderabbitai Bot reviewed Mar 31, 2026

bdice marked this pull request as ready for review March 31, 2026 20:01

bdice requested review from a team as code owners March 31, 2026 20:01

bdice requested review from AyodeAwe, miscco and shrshi and removed request for AyodeAwe, miscco and shrshi March 31, 2026 20:01

coderabbitai Bot reviewed Mar 31, 2026

bdice force-pushed the wsl branch from a7a3221 to fd4b128 Compare March 31, 2026 20:55

coderabbitai Bot reviewed Mar 31, 2026

bdice force-pushed the wsl branch 2 times, most recently from d44e8b9 to 80a625c Compare April 3, 2026 13:59

Reduce large allocation tests to 70% on WSL

d82e005

bdice force-pushed the wsl branch from 80a625c to d82e005 Compare April 3, 2026 14:02
