Add v0.1 simd_math blueprint with portable and AVX2 kernels (#103)
* Add simd_math u35 scaffolding with log2/exp2 and ULP contract tests
* Stabilize restored simd_math exp/log families and document public API
* Add criterion benchmarks for restored simd_math function families
* Implement SIMD-native f32 log2_u35/exp2_u35 kernels
* Add layered f32 kernel dispatch with AVX2 log2 override
* Fix scalar benchmark wrappers on non-x86 CI
README.md: 46 additions & 7 deletions
@@ -24,15 +24,54 @@ Refer to the excellent [Intel Intrinsics Guide](https://software.intel.com/sites
 * Extract or set a single lane with the index operator: `let v1 = v[1];`
 * Falls all the way back to scalar code for platforms with no SIMD or unsupported SIMD
 
-# Trig Functions via Sleef-sys
+# SIMD math revival status
 
-~~A number of trigonometric and other common math functions are provided~~
-~~in vectorized form via the Sleef-sys crate. This is an optional feature `sleef` that you can enable.~~
-~~Doing so currently requires nightly, as well as having CMake and Clang installed.~~
+SIMDeez now includes a native, pure-Rust math surface for the first restored SLEEF-style family:
 
-⚠️ In simdeez V2.0, sleef is temporarily deprecated due to the maintenance complexity involved around it. We are open to contributions, and are undecided on whether we:
-- Resume sleef support via the existing sleef-sys crate
-- Re-implement sleef via simdeez primitives
+- `log2_u35`
+- `exp2_u35`
+- `ln_u35`
+- `exp_u35`
+
+These are exposed via extension traits in `simdeez::math` and re-exported in `simdeez::prelude`:
+
+```rust
+use simdeez::prelude::*;
+
+fn apply_math<S: Simd>(x: S::Vf32) -> S::Vf32 {
+    let y = x.log2_u35();
+    y.exp2_u35() + x.ln_u35() + x.exp_u35()
+}
+```
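The `_u35` suffix follows the SLEEF naming convention, in which it denotes a maximum error of 3.5 ULP. As a rough illustration of what a ULP contract check can look like, here is a minimal sketch using only the standard library. The helpers `ordered_bits`, `ulp_distance`, and `within_ulps` are hypothetical names introduced for this example and are not taken from the SIMDeez test suite. The comparison is made against an `f64` reference rounded to `f32`, so it approximates the true ULP error rather than measuring it exactly.

```rust
/// Map an `f32` to an integer so that adjacent representable floats
/// differ by exactly 1 and `+0.0` / `-0.0` both map to 0.
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    if bits < 0 {
        // Negative floats grow in magnitude as their raw bits grow,
        // so mirror them onto the negative side of the integer line.
        i64::from(i32::MIN) - i64::from(bits)
    } else {
        i64::from(bits)
    }
}

/// Distance between two floats, counted in representable `f32` steps.
/// Returns `None` if either input is NaN.
fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
}

/// Hypothetical contract check: `approx` must lie within `max_ulps` steps
/// of a reference computed in `f64` and rounded to `f32`.
fn within_ulps(approx: f32, reference: f64, max_ulps: u64) -> bool {
    matches!(ulp_distance(approx, reference as f32), Some(d) if d <= max_ulps)
}
```

A test could sweep a range of inputs and call `within_ulps(kernel_result, reference, bound)` for each lane, treating NaN and other special values separately.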
+
+The old `sleef-sys` feature remains historical/deprecated and is **not** the primary implementation path for this revived surface.
+
+### Kernel layering blueprint (v0.1)
+
+The restored `f32` path now demonstrates the intended extension architecture:
+2. **Backend override dispatch** (`src/math/f32/mod.rs`) selects architecture-tuned kernels without changing the public `SimdMathF32` API.
+3. **Hand-optimized backend implementation** (`src/math/f32/x86_avx2.rs`) provides a real AVX2/FMA override for `log2_u35`.
+4. **Scalar fallback patching** remains centralized in the portable layer for exceptional lanes, preserving special-value semantics.
+
+To add the next SLEEF-style function, follow the same pattern: start portable, wire dispatch, then add optional backend overrides only where profiling justifies complexity.
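The layering described above can be sketched in plain Rust. This is an illustration under stated assumptions, not the SIMDeez implementation: a fixed `[f32; 8]` array stands in for a SIMD vector, all function names are hypothetical, the "AVX2 override" is a placeholder that simply forwards to the portable kernel, and runtime feature detection is used for brevity even though the real dispatch in `src/math/f32/mod.rs` may be structured differently. The fast path makes no accuracy claim beyond what the series truncation gives.

```rust
// Hypothetical sketch: every name here is illustrative, not SIMDeez source.
use std::f32::consts::{LOG2_E, SQRT_2};

const LANES: usize = 8;
type V = [f32; LANES]; // stand-in for a SIMD vector of f32 lanes
type Kernel = fn(V) -> V;

/// Fast path for one lane; only meaningful for positive, normal, finite inputs.
fn log2_fast_lane(x: f32) -> f32 {
    let bits = x.to_bits();
    let mut e = (bits >> 23) as i32 - 127; // unbiased exponent
    let mut m = f32::from_bits((bits & 0x007f_ffff) | 0x3f80_0000); // mantissa in [1, 2)
    if m > SQRT_2 {
        // Re-center the mantissa around 1 so the series below converges quickly.
        m *= 0.5;
        e += 1;
    }
    // ln(m) = 2 * atanh(s) with s = (m - 1) / (m + 1), truncated after the s^9 term.
    let s = (m - 1.0) / (m + 1.0);
    let s2 = s * s;
    let ln_m = 2.0 * s * (1.0 + s2 * (1.0 / 3.0 + s2 * (1.0 / 5.0 + s2 * (1.0 / 7.0 + s2 / 9.0))));
    e as f32 + ln_m * LOG2_E
}

/// Lanes the fast path cannot handle: zero, subnormal, negative, infinite, NaN.
fn is_exceptional(x: f32) -> bool {
    !(x.is_normal() && x > 0.0)
}

/// Portable layer: run the fast path on every lane, then patch
/// exceptional lanes with the scalar reference to keep special values correct.
fn log2_portable(x: V) -> V {
    let mut out = x.map(log2_fast_lane);
    for (o, &xi) in out.iter_mut().zip(x.iter()) {
        if is_exceptional(xi) {
            *o = xi.log2();
        }
    }
    out
}

/// Placeholder for a hand-optimized backend; a real override would use
/// AVX2/FMA intrinsics instead of forwarding to the portable kernel.
#[cfg(target_arch = "x86_64")]
fn log2_avx2_override(x: V) -> V {
    log2_portable(x)
}

/// Dispatch layer: pick an override when the host supports it,
/// otherwise fall back to the portable kernel.
fn select_log2_kernel() -> Kernel {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2")
            && std::arch::is_x86_feature_detected!("fma")
        {
            return log2_avx2_override;
        }
    }
    log2_portable
}

/// Public entry point; its signature is unaffected by which kernel is selected.
pub fn log2_layered(x: V) -> V {
    select_log2_kernel()(x)
}
```

Keeping the exceptional-lane patch inside the portable layer means a backend override only has to get the common case right, which matches the "start portable, wire dispatch, then override" order suggested above.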
+
+### Benchmarking restored math
+
+An in-repo Criterion benchmark target is available for this revived surface:
+
+```bash
+cargo bench --bench simd_math
+```
+
+This benchmark reports per-function throughput for:
+- forced backend variants (`scalar`, `sse2`, `sse41`, `avx2`, and `avx512` when available on host)
+
+Current expectation: `log2_u35` and `exp2_u35` should show clear speedups on SIMD-capable backends (notably AVX2 on x86 hosts), while `ln_u35`/`exp_u35` remain scalar-reference quality-first baselines. Use these benches to validate both performance and dispatch behavior as new kernels/overrides are added.
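For readers unfamiliar with what "per-function throughput" means here, the following is a minimal std-only sketch of the quantity being measured. It is not the repository's benchmark and does not use Criterion, which additionally handles warm-up, sampling, and statistics. The function name `throughput_melems_per_sec` is hypothetical.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Apply `f` to every element of `inputs`, `iters` times over, and return
/// the observed rate in millions of elements per second.
fn throughput_melems_per_sec(f: impl Fn(f32) -> f32, inputs: &[f32], iters: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        for &x in inputs {
            // black_box keeps the optimizer from eliding the call or its result.
            black_box(f(black_box(x)));
        }
    }
    // Clamp to avoid dividing by zero on very coarse clocks.
    let secs = start.elapsed().as_secs_f64().max(1e-9);
    (inputs.len() as f64 * f64::from(iters)) / secs / 1e6
}
```

Comparing the value returned for a scalar reference such as `f32::log2` against the value for a SIMD kernel wrapper gives a crude speedup estimate; for anything beyond a sanity check, the Criterion target above is the better tool.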