Add Comprehensive Memory Ordering Litmus Tests
This PR adds 6 adapted WebGPU litmus tests to empirically verify memory ordering guarantees across GPU backends, with particular focus on validating Metal's fence+relaxed decomposition implementation. Adapted from gpuharbor.ucsc.edu/webgpu-mem-testing/, these tests verify that memory ordering operations correctly prevent hardware reordering as specified. They are opt-in due to execution time and sensitivity to runtime conditions.
What's Included
Litmus Test Patterns
Message Passing: Verifies that thread A's writes to X then Y are seen in that order by thread B (fundamental synchronization primitive)
Store: Ensures multiple writes to the same memory location maintain program order
Read: Ensures multiple reads from the same memory location maintain program order
Load Buffer: Detects whether a read can be reordered after a later write to a different location
Store Buffer: Detects whether a write can be buffered and reordered after a later read of a different location
2+2 Write: Verifies write coherence when multiple threads write to the same locations simultaneously
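Test Methodology
Each test pattern runs in two modes: once with relaxed ordering as the baseline, and once with acquire/release ordering, which verifies that stronger memory semantics prevent incorrect reorderings.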
Tests validate that "weak outcomes" (forbidden reorderings) are eliminated or reduced when using stronger memory ordering.
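As an illustration of what counts as a "weak outcome", here is a minimal CPU-only sketch of how Message Passing observations could be classified and tallied. The helper names `classify_mp` and `tally_mp` are hypothetical and are not the functions used in this PR; the sketch assumes thread A writes `x = 1` and then `y = 1`, while thread B reads `r0 = y` and then `r1 = x`.

```julia
# Message Passing shape:
#   thread A: x = 1; y = 1        thread B: r0 = y; r1 = x
# r0 is what B read from the flag y, r1 is what B read from the data x.

# Classify a single observation (hypothetical helper, not part of the PR).
function classify_mp(r0::Integer, r1::Integer)
    if r0 == 1 && r1 == 0
        return :weak          # flag seen but data not: the forbidden reordering
    elseif r0 == 0 && r1 == 0
        return :seq_b_first   # B read both locations before A's writes landed
    elseif r0 == 1 && r1 == 1
        return :seq_a_first   # A's writes were both visible to B
    else
        return :interleaved   # r0 == 0, r1 == 1: B's reads straddled A's writes
    end
end

# Tally outcomes over many (r0, r1) observations.
function tally_mp(observations)
    counts = Dict(:seq_a_first => 0, :seq_b_first => 0,
                  :interleaved => 0, :weak => 0)
    for (r0, r1) in observations
        counts[classify_mp(r0, r1)] += 1
    end
    return counts
end
```

With acquire/release ordering, the expectation is that the `:weak` count drops to zero or at least does not exceed the relaxed baseline.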
Metal Backend Validation
This is particularly important for the Metal backend, where:
Only Relaxed memory ordering is natively available
Acquire/Release are implemented via fence+relaxed decomposition
These tests empirically verify the decomposition provides correct ordering guarantees
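To make the idea of a fence+relaxed decomposition concrete, here is a CPU-side analogy in plain Julia (1.7 or later, for `@atomic` fields). It is not the Metal implementation from this PR: `Flag`, `store_release!`, `load_acquire`, and `message_pass` are hypothetical names, and `Threads.atomic_fence()` is a sequentially consistent fence, which is stronger than strictly required.

```julia
# A flag with an atomic field, accessed only with relaxed (:monotonic) ordering.
mutable struct Flag
    @atomic value::Int
end

# "Release" store decomposed as: fence, then relaxed store.
function store_release!(f::Flag, v::Int)
    Threads.atomic_fence()
    @atomic :monotonic f.value = v
    return nothing
end

# "Acquire" load decomposed as: relaxed load, then fence.
function load_acquire(f::Flag)
    v = @atomic :monotonic f.value
    Threads.atomic_fence()
    return v
end

# Message passing built on the two helpers: the consumer must see the payload
# once it has observed the flag.
function message_pass(payload::Int)
    data = Ref(0)
    flag = Flag(0)
    consumer = Threads.@spawn begin
        while load_acquire(flag) == 0
            yield()           # let the producer run when only one thread exists
        end
        data[]
    end
    data[] = payload          # plain write to the data location
    store_release!(flag, 1)   # publish
    return fetch(consumer)
end
```

A single passing run on the CPU proves little by itself, which is why the litmus tests repeat the pattern many times on the actual GPU backend.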
Usage
# Run memory ordering tests
TEST_MEMORY_ORDERING=true julia --project -e 'using Pkg; Pkg.test(julia_args=["--check-bounds=auto"])'
# With verbose output to see detailed results
VERBOSE_MEMORY_ORDERING=true TEST_MEMORY_ORDERING=true julia --project -e 'using Pkg; Pkg.test(julia_args=["--check-bounds=auto"])'
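For readers wondering how such an opt-in switch is typically wired up, the following is a minimal sketch of an environment-variable gate. The helper names and the included file name are assumptions for illustration only; the actual gate in this PR may look different.

```julia
# Hypothetical helper: a variable counts as enabled only when set to "true"
# (case-insensitive, surrounding whitespace ignored).
function env_flag(name::AbstractString; env = ENV)
    return lowercase(strip(get(env, name, "false"))) == "true"
end

# Sketch of the gate. The default `run` includes a placeholder file name,
# not a path taken from this PR.
function maybe_run_memory_ordering(; env = ENV,
                                   run = () -> include("memory_ordering_tests.jl"))
    env_flag("TEST_MEMORY_ORDERING"; env = env) || return :skipped
    run()
    return :ran
end
```

Passing `env` and `run` as keyword arguments keeps the gate testable without touching the real process environment or the GPU.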
Technical Details
Adapt package added as test dependency for GPU array handling
Bounds checking: Tests enforce --check-bounds=auto to ensure accurate detection of weak behaviors, as the default --check-bounds=yes introduces bounds checks that mask weak behaviors
Opt-in: disabled by default due to execution time
Statistical validation: Tests run multiple iterations (100+) to detect rare hardware reorderings
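Because the bounds-checking mode affects whether weak behaviors show up, a test run could check the active mode up front. The sketch below relies on `Base.JLOptions()`, which is internal and unexported; the numeric encoding (0 = auto, 1 = yes, 2 = no) is an assumption based on current Julia releases, and both helper names are hypothetical.

```julia
# Map the internal check_bounds flag to a readable symbol.
# Assumed encoding: 0 = auto (default), 1 = yes, 2 = no.
function check_bounds_mode(flag::Integer = Base.JLOptions().check_bounds)
    return flag == 0 ? :auto :
           flag == 1 ? :yes  :
           flag == 2 ? :no   : :unknown
end

# Warn when bounds checks are forced on, since they may mask weak behaviors.
function warn_if_bounds_forced()
    mode = check_bounds_mode()
    if mode == :yes
        @warn "Running with --check-bounds=yes; weak behaviors may be masked"
    end
    return mode
end
```

Such a guard would only report the mode; it cannot change it after Julia has started, so the `julia_args` setting in the invocation is still needed.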
Why This Matters
Memory ordering bugs are difficult to debug - they're intermittent, hardware-specific, and can cause silent data corruption. These litmus tests provide:
Empirical verification: Proof that memory ordering primitives work as intended
Documentation: Demonstrates what each memory ordering level actually guarantees
Cross-backend validation
Production confidence
References
Based on WebGPU litmus tests from UCSC GPU Harbor, adapted for Julia's KernelIntrinsics.jl framework.
Hi @WilliBee, thanks for this — the litmus framework is solid work and the Load Buffer diagram is great. Two things before we dig in:
Refactor on main (commit 0dd2bcf): I had Claude Code clean up the test layout because the per-backend files had drifted into ~95% copy-paste (the three vectorization_test.jl were 491 / 489 / 499 lines of near-identical code). New layout: test/harness.jl (helpers), test/backend_hooks.jl (per-backend traits + @capture_ir / @allowscalar / assert_ir), unified test/tests/vectorization_test.jl. access_fences.jl stays per-backend. runtests.jl no longer needs cd test/. Verified 194/194 on CUDA; would appreciate a re-run on Metal when you rebase. The opt-in block for memory_ordering will need to slot into the new general_routine.jl.
A few things I'd like to understand before merging — possibly I'm missing something:
Adapt in the root Project.toml — runtests.jl activates test/envs/$TEST_BACKEND and never the root project, so I'd expect Adapt to need to live in those three per-backend Project.tomls. Did the opt-in path actually run on your machine, and if so, how? Possible I'm missing a code path.
Documented invocation — julia --project -e 'using Pkg; Pkg.test(...)' — same question. There's no [targets].test runner wired to runtests.jl, so I don't see how Pkg.test reaches the litmus tests. Curious whether it worked for you locally and what I'm missing.
Message Passing unpacking — run_test_message_passing returns (r0_1_y_2, r0_0_y_1, r0_1_y_1, r0_0_y_2, total) so the weak count looks like position 4 to me, but the test reads position 3 (_, _, weak_relaxed, _, _ = ...). The other five tests use the canonical (seq, seq, interleaved, weak, total) and read position 4. Is MP intentionally inverted, or is this a slip?
@test weak_strong == 0 — isn't this too strict for an empirical hardware test? Even with correctly-ordered primitives, a single transient weak observation from scheduling jitter or queue drain effects could flake. Your Store Buffer assertion uses the relative weak_strong <= weak_relaxed, which seems robust against that. Was the absolute zero a deliberate confidence call, or just the simpler default?
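To make the last two points concrete, here is a rough sketch of one option: return the counts as a named tuple so the weak count is read by name rather than by position, and assert relative to the relaxed baseline. `summarize` and the sample data are hypothetical stand-ins, not the PR's `run_test_*` functions.

```julia
using Test

# Hypothetical stand-in for a litmus runner: returns counts by name.
function summarize(outcomes::AbstractVector{Symbol})
    count_of(s) = count(==(s), outcomes)
    return (seq0 = count_of(:seq0), seq1 = count_of(:seq1),
            interleaved = count_of(:interleaved), weak = count_of(:weak),
            total = length(outcomes))
end

# Made-up observations standing in for a relaxed run and an acquire/release run.
relaxed = summarize([:seq0, :weak, :interleaved, :weak, :seq1])
strong  = summarize([:seq0, :seq1, :interleaved, :seq0, :seq1])

# Relative assertion: the stronger ordering must not produce more weak outcomes.
@test strong.weak <= relaxed.weak
@test strong.total == relaxed.total
```

Reading `strong.weak` instead of unpacking position 3 or 4 would make a mismatch like the one in Message Passing impossible to write by accident.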
Smaller things to consider while you're rebasing: run_test_2plus2w defaults to VERBOSE=true (others to false), and perm1=419 / perm2=1031 could use a one-liner that they come from GPUHarbor for reproducibility. Optional.
Looking forward to the next push.