
Add litmus tests for memory ordering operations #6

Open
WilliBee wants to merge 1 commit into epilliat:main from WilliBee:litmus_tests


@WilliBee
Contributor

Add Comprehensive Memory Ordering Litmus Tests

This PR adds 6 adapted WebGPU litmus tests to empirically verify memory ordering guarantees across GPU backends, with particular focus on validating Metal's fence+relaxed decomposition implementation. Adapted from gpuharbor.ucsc.edu/webgpu-mem-testing/, these tests verify that memory ordering operations correctly prevent hardware reordering as specified. They are opt-in due to execution time and sensitivity to runtime conditions.

What's Included

Litmus Test Patterns

  • Message Passing: Verifies that thread A's writes to X then Y are seen in that order by thread B (fundamental synchronization primitive)
  • Store: Ensures multiple writes to the same memory location maintain program order
  • Read: Ensures multiple reads from the same memory location maintain program order
  • Load Buffer: Detects if reads can be buffered/reordered before writes to different locations
  • Store Buffer: Detects if writes can be buffered/reordered before reads to different locations
  • 2+2 Write: Verifies write coherence when multiple threads write to the same locations simultaneously
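As a rough illustration of how the Message Passing pattern is usually scored, the sketch below labels the four possible observations of a two-thread test. The function name `classify_mp` and the outcome symbols are hypothetical and do not come from the PR; it assumes thread A writes `x = 1` then `y = 1`, while thread B reads `y` into `r0` and then `x` into `r1`.

```julia
# Hypothetical classifier for a two-thread message-passing litmus test.
# Thread A: x = 1; y = 1 (y acts as the flag).
# Thread B: r0 = y; r1 = x.
function classify_mp(r0::Integer, r1::Integer)
    if r0 == 0 && r1 == 0
        return :seq_b_first   # B read both values before A wrote anything
    elseif r0 == 1 && r1 == 1
        return :seq_a_first   # A finished both writes before B read
    elseif r0 == 0 && r1 == 1
        return :interleaved   # B missed the flag but later saw the data
    elseif r0 == 1 && r1 == 0
        return :weak          # B saw the flag but stale data: the reordering
    else
        throw(ArgumentError("unexpected observation (r0=$r0, r1=$r1)"))
    end
end
```

Only the `(r0, r1) == (1, 0)` observation indicates that the two writes (or the two reads) were observed out of program order.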

Test Methodology

Each test pattern runs in two modes:

  1. Relaxed ordering: Establishes baseline hardware reordering behavior
  2. Acquire/Release ordering: Verifies that stronger memory semantics prevent incorrect reorderings

Tests validate that "weak outcomes" (forbidden reorderings) are eliminated or reduced when using stronger memory ordering.
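The "eliminated or reduced" criterion could be expressed roughly as follows. This is a minimal sketch; `weak_verdict` and its return symbols are hypothetical names, not part of the PR.

```julia
# Hypothetical helper comparing weak-outcome counts from the two modes.
function weak_verdict(weak_relaxed::Integer, weak_strong::Integer)
    if weak_relaxed == 0 && weak_strong == 0
        return :inconclusive   # baseline never showed the reordering
    elseif weak_strong == 0
        return :eliminated     # stronger ordering removed every weak outcome
    elseif weak_strong < weak_relaxed
        return :reduced        # fewer weak outcomes than the relaxed baseline
    else
        return :not_reduced    # stronger ordering did not help
    end
end
```

Treating a zero-weak baseline as inconclusive is a design choice in this sketch: if relaxed ordering never exhibits the weak outcome on a given device, the stronger mode has nothing to be compared against.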

Metal Backend Validation

These tests are particularly important for the Metal backend, where:

  • Only Relaxed memory ordering is natively available
  • Acquire/Release are implemented via fence+relaxed decomposition
  • These tests empirically verify the decomposition provides correct ordering guarantees
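For readers unfamiliar with the idea, a fence+relaxed decomposition can be sketched on the CPU with Julia's per-field atomics. This is only an analogy under stated assumptions: it is not the package's Metal implementation, the names `Flag`, `store_release!`, and `load_acquire` are hypothetical, and `Threads.atomic_fence()` is a full sequentially consistent fence, which is stronger than strictly required.

```julia
# CPU-side analogy of a fence+relaxed decomposition (not the Metal code path).
mutable struct Flag
    @atomic value::Int
end

# "Release" store: fence first, then a relaxed (:monotonic) store, so earlier
# writes cannot be reordered after the store to the flag.
function store_release!(f::Flag, v::Int)
    Threads.atomic_fence()
    @atomic :monotonic f.value = v
    return nothing
end

# "Acquire" load: relaxed load first, then a fence, so later reads cannot be
# reordered before the load of the flag.
function load_acquire(f::Flag)
    v = @atomic :monotonic f.value
    Threads.atomic_fence()
    return v
end
```

The placement of the fence is what matters: before the store on the releasing side, after the load on the acquiring side.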

Usage

# Run memory ordering tests
TEST_MEMORY_ORDERING=true julia --project -e 'using Pkg; Pkg.test(julia_args=["--check-bounds=auto"])'

# With verbose output to see detailed results
VERBOSE_MEMORY_ORDERING=true TEST_MEMORY_ORDERING=true julia --project -e 'using Pkg; Pkg.test(julia_args=["--check-bounds=auto"])'
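The thread does not show how the environment variables are consumed, so the following is only a sketch of one way such an opt-in gate might look. `env_flag` and `selected_suites` are hypothetical helpers, and the suite names are placeholders.

```julia
# Hypothetical opt-in gate; the PR's actual runtests.jl logic is not shown here.
# `env` defaults to the process environment but accepts any dictionary,
# which keeps the helper easy to exercise in isolation.
env_flag(name::AbstractString; env = ENV) =
    lowercase(get(env, name, "false")) == "true"

function selected_suites(; env = ENV)
    suites = ["vectorization"]   # placeholder for the always-on tests
    if env_flag("TEST_MEMORY_ORDERING"; env)
        push!(suites, "memory_ordering")
    end
    return suites
end
```

With this shape, leaving `TEST_MEMORY_ORDERING` unset runs only the always-on suites, and any value other than `true` (case-insensitive) is treated as off.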

Technical Details

  • Adapt package added as test dependency for GPU array handling
  • Bounds checking: Tests enforce --check-bounds=auto to ensure accurate detection of weak behaviors, as the default --check-bounds=yes introduces bounds checks that mask weak behaviors
  • Disabled by default (opt-in) due to execution time
  • Statistical validation: Tests run multiple iterations (100+) to detect rare hardware reorderings
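Because the results depend on the bounds-checking mode, a test file could report which mode the running process actually uses. The sketch below relies on `Base.JLOptions()`, which is an internal, non-public Julia API and may change between releases; the helper name `check_bounds_mode` is hypothetical, and the numeric encoding (0 = auto, 1 = yes, 2 = no) is an assumption based on current Julia releases.

```julia
# Report the active --check-bounds mode of the current Julia process.
# Base.JLOptions is internal; the encoding below is assumed, not guaranteed.
function check_bounds_mode()
    code = Base.JLOptions().check_bounds
    return code == 0 ? :auto :
           code == 1 ? :yes  : :no
end

# Possible use: warn rather than fail, since forced bounds checks may hide
# weak behaviors instead of producing wrong results.
if check_bounds_mode() == :yes
    @warn "Running with --check-bounds=yes; weak behaviors may be masked"
end
```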

Why This Matters

Memory ordering bugs are difficult to debug - they're intermittent, hardware-specific, and can cause silent data corruption. These litmus tests provide:

  1. Empirical verification: Proof that memory ordering primitives work as intended
  2. Documentation: Demonstrates what each memory ordering level actually guarantees
  3. Cross-backend validation
  4. Production confidence

References

Based on WebGPU litmus tests from UCSC GPU Harbor, adapted for Julia's KernelIntrinsics.jl framework.

@epilliat
Owner

epilliat commented May 6, 2026

Hi @WilliBee, thanks for this — the litmus framework is solid work and the Load Buffer diagram is great. Two things before we dig in:

Refactor on main (commit 0dd2bcf): I had Claude Code clean up the test layout because the per-backend files had drifted into ~95% copy-paste (the three vectorization_test.jl were 491 / 489 / 499 lines of near-identical code). New layout: test/harness.jl (helpers), test/backend_hooks.jl (per-backend traits + @capture_ir / @allowscalar / assert_ir), unified test/tests/vectorization_test.jl. access_fences.jl stays per-backend. runtests.jl no longer needs cd test/. Verified 194/194 on CUDA; would appreciate a re-run on Metal when you rebase. The opt-in block for memory_ordering will need to slot into the new general_routine.jl.

A few things I'd like to understand before merging — possibly I'm missing something:

  1. Adapt in the root Project.toml: runtests.jl activates test/envs/$TEST_BACKEND and never the root project, so I'd expect Adapt to need to live in those three per-backend Project.tomls. Did the opt-in path actually run on your machine, and if so, how? Possible I'm missing a code path.

  2. Documented invocation (julia --project -e 'using Pkg; Pkg.test(...)'): same question. There's no [targets].test runner wired to runtests.jl, so I don't see how Pkg.test reaches the litmus tests. Curious whether it worked for you locally and what I'm missing.

  3. Message Passing unpacking: run_test_message_passing returns (r0_1_y_2, r0_0_y_1, r0_1_y_1, r0_0_y_2, total) so the weak count looks like position 4 to me, but the test reads position 3 (_, _, weak_relaxed, _, _ = ...). The other five tests use the canonical (seq, seq, interleaved, weak, total) and read position 4. Is MP intentionally inverted, or is this a slip?

  4. @test weak_strong == 0 — isn't this too strict for an empirical hardware test? Even with correctly-ordered primitives, a single transient weak observation from scheduling jitter or queue drain effects could flake. Your Store Buffer assertion uses the relative weak_strong <= weak_relaxed, which seems robust against that. Was the absolute zero a deliberate confidence call, or just the simpler default?
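One way to make the position question in item 3 moot would be to return a NamedTuple and destructure by field name. The sketch below is illustrative only: `fake_litmus_result`, its field names, and the counts are made up for the example and are not the PR's code or measured data.

```julia
# Stand-in for a litmus runner. Field names and counts are invented for
# illustration; they are not taken from the PR or from any real run.
function fake_litmus_result()
    return (seq0 = 60, seq1 = 30, interleaved = 9, weak = 1, total = 100)
end

result = fake_litmus_result()

# Destructuring by name (Julia 1.7+) does not depend on field order.
(; weak, total) = result

# Positional unpacking still works on a NamedTuple, but reading slot 3
# silently yields the interleaved count rather than the weak count.
_, _, third, _, _ = result
```

Here `weak` is 1 while `third` is 9, which is the kind of mismatch that positional unpacking can hide.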

Smaller things to consider while you're rebasing: run_test_2plus2w defaults to VERBOSE=true (others to false), and perm1=419 / perm2=1031 could use a one-liner that they come from GPUHarbor for reproducibility. Optional.

Looking forward to the next push.
