Commit f28dbd4

Add v0.1 simd_math blueprint with portable and AVX2 kernels (#103)
* Add simd_math u35 scaffolding with log2/exp2 and ULP contract tests
* Stabilize restored simd_math exp/log families and document public API
* Add criterion benchmarks for restored simd_math function families
* Implement SIMD-native f32 log2_u35/exp2_u35 kernels
* Add layered f32 kernel dispatch with AVX2 log2 override
* Fix scalar benchmark wrappers on non-x86 CI
1 parent 0a24ff5 commit f28dbd4
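
The commit message mentions "ULP contract tests", and the `_u35` suffix matches SLEEF naming, where it denotes a maximum error of 3.5 ULP. The sketch below shows one common way such a check can be written in plain Rust. It is not taken from this commit: `ordered_bits`, `ulp_distance`, and `within_ulps` are hypothetical names, and measuring against an already-rounded `f32` reference only approximates the true ULP error.

```rust
/// Map an f32 onto an integer line on which adjacent representable
/// floats differ by exactly 1 and +0.0 and -0.0 coincide.
/// (Hypothetical helper, not part of simdeez.)
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    if bits < 0 {
        // Negative floats grow in magnitude as their bit pattern grows,
        // so mirror them around zero.
        (i32::MIN as i64) - (bits as i64)
    } else {
        bits as i64
    }
}

/// Number of representable f32 steps between `a` and `b`,
/// or `None` if either input is NaN.
fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
}

/// Contract check: `approx` is accepted if it lies within `max_ulps`
/// steps of `reference`, or if both are NaN.
fn within_ulps(approx: f32, reference: f32, max_ulps: u64) -> bool {
    match ulp_distance(approx, reference) {
        Some(d) => d <= max_ulps,
        None => approx.is_nan() && reference.is_nan(),
    }
}

fn main() {
    let reference = 10.0f32.log2();
    // Simulate an approximation that is two representable steps too high.
    let approx = f32::from_bits(reference.to_bits() + 2);
    println!("distance = {:?}", ulp_distance(approx, reference));
    println!("within 4 ULP: {}", within_ulps(approx, reference, 4));
}
```

A test of this shape would typically sweep many inputs and compare each lane of the vectorized result against the scalar `f32` function; the integer bound passed as `max_ulps` is a choice for the test author, since 3.5 is not itself an integer step count.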

15 files changed

Lines changed: 1553 additions & 11 deletions

Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -42,3 +42,7 @@ features = ["plotters", "cargo_bench_support"]
 [[bench]]
 name = "numparse"
 harness = false
+
+[[bench]]
+name = "simd_math"
+harness = false

README.md

Lines changed: 46 additions & 7 deletions
@@ -24,15 +24,54 @@ Refer to the excellent [Intel Intrinsics Guide](https://software.intel.com/sites
 * Extract or set a single lane with the index operator: `let v1 = v[1];`
 * Falls all the way back to scalar code for platforms with no SIMD or unsupported SIMD
 
-# Trig Functions via Sleef-sys
+# SIMD math revival status
 
-~~A number of trigonometric and other common math functions are provided~~
-~~in vectorized form via the Sleef-sys crate. This is an optional feature `sleef` that you can enable.~~
-~~Doing so currently requires nightly, as well as having CMake and Clang installed.~~
+SIMDeez now includes a native, pure-Rust math surface for the first restored SLEEF-style family:
 
-⚠️ In simdeez V2.0, sleef is temporarily deprecated due to the maintenance complexity involved around it. We are open to contributions, and are undecided on whether we:
-- Resume sleef support via the existing sleef-sys crate
-- Re-implement sleef via simdeez primitives
+- `log2_u35`
+- `exp2_u35`
+- `ln_u35`
+- `exp_u35`
+
+These are exposed via extension traits in `simdeez::math` and re-exported in `simdeez::prelude`:
+
+```rust
+use simdeez::prelude::*;
+
+fn apply_math<S: Simd>(x: S::Vf32) -> S::Vf32 {
+    let y = x.log2_u35();
+    y.exp2_u35() + x.ln_u35() + x.exp_u35()
+}
+```
+
+The old `sleef-sys` feature remains historical/deprecated and is **not** the primary implementation path for this revived surface.
+
+### Kernel layering blueprint (v0.1)
+
+The restored `f32` path now demonstrates the intended extension architecture:
+
+1. **Portable SIMD kernels** (`src/math/f32/portable.rs`) implement reduction + polynomial logic with backend-agnostic simdeez primitives.
+2. **Backend override dispatch** (`src/math/f32/mod.rs`) selects architecture-tuned kernels without changing the public `SimdMathF32` API.
+3. **Hand-optimized backend implementation** (`src/math/f32/x86_avx2.rs`) provides a real AVX2/FMA override for `log2_u35`.
+4. **Scalar fallback patching** remains centralized in the portable layer for exceptional lanes, preserving special-value semantics.
+
+To add the next SLEEF-style function, follow the same pattern: start portable, wire dispatch, then add optional backend overrides only where profiling justifies complexity.
+
+### Benchmarking restored math
+
+An in-repo Criterion benchmark target is available for this revived surface:
+
+```bash
+cargo bench --bench simd_math
+```
+
+This benchmark reports per-function throughput for:
+
+- native scalar loop baseline (`f32::{log2, exp2, ln, exp}`)
+- simdeez runtime-selected path
+- forced backend variants (`scalar`, `sse2`, `sse41`, `avx2`, and `avx512` when available on host)
+
+Current expectation: `log2_u35` and `exp2_u35` should show clear speedups on SIMD-capable backends (notably AVX2 on x86 hosts), while `ln_u35`/`exp_u35` remain scalar-reference quality-first baselines. Use these benches to validate both performance and dispatch behavior as new kernels/overrides are added.
 
 # Compared to packed_simd
 
0 commit comments
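
The four layers listed in the README's "Kernel layering blueprint" can be illustrated with a small, dependency-free sketch. This is not the code from `src/math/f32/`: a fixed-size array stands in for a SIMD vector, the polynomial is a plain truncated atanh series rather than whatever coefficients the crate uses, and `log2_portable`, `log2_avx2_override`, and `log2_u35_sketch` are hypothetical names. No accuracy bound is claimed for it.

```rust
const LANES: usize = 8;
type Vf32 = [f32; LANES]; // stand-in for a SIMD vector type (assumption)

/// Layer 1: portable kernel. Range reduction plus a polynomial, followed by
/// Layer 4: scalar patching of exceptional lanes.
fn log2_portable(x: Vf32) -> Vf32 {
    let mut out = [0.0f32; LANES];
    let mut exceptional = [false; LANES];
    for i in 0..LANES {
        let bits = x[i].to_bits();
        let biased_exp = (bits >> 23) & 0xff;
        // Negative inputs, zero/subnormals (exponent 0) and inf/NaN
        // (exponent 255) are left to the scalar fallback.
        exceptional[i] = (bits >> 31) != 0 || biased_exp == 0 || biased_exp == 0xff;

        // Reduction: x = m * 2^e, with m moved into [sqrt(2)/2, sqrt(2)].
        let mut e = biased_exp as i32 - 127;
        let mut m = f32::from_bits((bits & 0x007f_ffff) | 0x3f80_0000);
        if m > std::f32::consts::SQRT_2 {
            m *= 0.5;
            e += 1;
        }

        // Polynomial: ln(m) = 2 * atanh(s), s = (m - 1) / (m + 1),
        // truncated after the s^9 term and evaluated in Horner form.
        let s = (m - 1.0) / (m + 1.0);
        let s2 = s * s;
        let p = 1.0 + s2 * (1.0 / 3.0 + s2 * (1.0 / 5.0 + s2 * (1.0 / 7.0 + s2 * (1.0 / 9.0))));
        let ln_m = 2.0 * s * p;
        out[i] = e as f32 + ln_m * std::f32::consts::LOG2_E;
    }
    // Scalar fallback patching keeps special-value semantics in one place.
    for i in 0..LANES {
        if exceptional[i] {
            out[i] = x[i].log2();
        }
    }
    out
}

/// Layer 3: placeholder for a hand-tuned backend. A real override would use
/// AVX2/FMA intrinsics; here it simply forwards to the portable kernel.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn log2_avx2_override(x: Vf32) -> Vf32 {
    log2_portable(x)
}

/// Layer 2: dispatch. Callers see one entry point regardless of backend.
fn log2_u35_sketch(x: Vf32) -> Vf32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return log2_avx2_override(x);
        }
    }
    log2_portable(x)
}

fn main() {
    let x = [1.0, 2.0, 8.0, 0.5, 10.0, 0.0, -1.0, f32::INFINITY];
    println!("{:?}", log2_u35_sketch(x));
}
```

Keeping the exceptional-lane patch inside the portable layer means a backend override only has to get the fast path right for ordinary positive inputs, which is one plausible reading of why the README centralizes it there.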
