Speed-bump has two test layers: Python integration tests and C-level sanitiser tests.
pytest tests/# Test with GIL Python
./tests/run_ftp_tests.sh /path/to/python3.14/bin/python3
# Test with FTP Python
./tests/run_ftp_tests.sh /path/to/python3.14t/bin/python3The standalone runner:
- Builds the C extension for the specified Python
- Runs all tests with appropriate skips based on runtime type
- Reports pass/fail/skip counts
| Category | GIL Python | FTP Python |
|---|---|---|
| Runtime detection | ✓ | ✓ |
| Delay accuracy (main thread) | ✓ | ✓ |
| Delay accuracy (worker thread) | ✓ | ✓ |
| Per-thread accuracy (1,2,4,8 threads) | ✓ | ✓ |
| Cache thread safety | ✓ | ✓ |
| GIL serialisation tests | ✓ | skip |
| FTP parallelism tests | skip | ✓ |
GIL-specific and FTP-specific tests are skipped on the wrong runtime by design - they test mutually exclusive behaviours.
The C tests exercise the core spin delay code with sanitisers, independently of Python. This avoids false positives from uninstrumented Python runtime code.
# All sanitiser tests
./tests/run_c_tests.sh
# ThreadSanitizer only
./tests/run_c_tests.sh tsan
# AddressSanitizer only
./tests/run_c_tests.sh asan
# Clean build artifacts
./tests/run_c_tests.sh cleanThreadSanitizer (tsan):
spin_delay_nsis race-free under concurrent execution- 8 threads, 100 iterations each, no shared mutable state
AddressSanitizer (asan):
- No buffer overflows, use-after-free, or memory leaks
- Same test harness as tsan
- Isolation: Tests the actual functions that run in parallel, without Python overhead
- No false positives: Uninstrumented Python code would trigger spurious warnings
- Speed: Faster iteration than full Python integration tests
- Precision: If tsan finds a race, it's definitely in our code
Verified by tsan (documented in src/speed_bump/_core.c):
| Function | Thread-safe? | Notes |
|---|---|---|
spin_delay_ns |
Yes | Uses only local variables |
calibrate_clock |
No* | Called once at module init; Python import serialises this |
g_clock_overhead_ns |
Read-only after init | Safe to read from any thread |
g_calibrated |
Read-only after init | Safe to read from any thread |
*Would race if called concurrently, but this never happens due to Python's import machinery.
All tests are designed with falsification criteria:
- Test validity: Deliberately racy code confirms tsan detects races
- Measurement accuracy: Two-barrier synchronisation isolates spin delay from thread overhead
- Control tests: GIL tests fail on FTP, FTP tests fail on GIL (verified by design)
See docs/falsification-test-design.md for detailed test design rationale.