Description
This issue proposes a small, optional numerics utility layer for FP8 (E4M3 / E5M2) operations in CubeCL.
The goal is to provide reusable building blocks for making FP8 matmul, attention, normalization, and training paths more robust when needed, without baking policy into core kernels.
This is exploratory and meant to start a discussion, not to assert existing correctness problems.
Context
CubeCL already supports FP8 data types and quantized matmul paths.
In practice, FP8 usage often benefits from a set of well-known numerical techniques (scaling policies, rounding control, accumulation strategies, etc.) that tend to be re-implemented per kernel or per project.
This proposal explores whether it makes sense to factor some of those techniques into a shared, opt-in utility layer rather than embedding them ad hoc across kernels.
Scope (tentative)
A possible `cubecl-fp8` or `cubecl-numerics` module could expose the following (hedged host-side sketches of several of these items follow the list):
- Scaling helpers
  - Dynamic or EMA-based scale tracking
  - Hysteresis / power-of-two scale constraints
  - Optional format selection helpers (E4M3 vs E5M2)
- Rounding utilities
  - Deterministic stochastic or dithered rounding
  - Counter-based RNG suitable for GPU kernels
- Accumulation strategies
  - Chunked accumulation for long reductions
  - Optional compensated summation (selectively enabled)
- Stable building blocks
  - Softmax with max-subtraction
  - LayerNorm / RMSNorm via Welford-style variance
  - Designed to plug into existing kernel paths
- Training-facing hooks (optional)
  - Quantization-aware optimizer helpers
  - Simple telemetry for saturation / underflow tracking
All components would be explicitly opt-in and usable independently.
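For the scaling helpers, a minimal sketch of EMA-based amax tracking combined with a power-of-two scale and a simple hysteresis counter could look like the following. This is plain host-side Rust for illustration only, not CubeCL kernel code; `AmaxTracker` and its fields are hypothetical names, and 448.0 is the E4M3 finite maximum:

```rust
/// Hypothetical helper: tracks the running absolute maximum of a tensor with an
/// exponential moving average and derives a power-of-two FP8 scale with hysteresis.
struct AmaxTracker {
    ema_amax: f32,
    decay: f32,
    /// Current scale exponent (scale = 2^scale_exp).
    scale_exp: i32,
    /// Require this many consecutive out-of-range steps before rescaling.
    hysteresis: u32,
    pending: u32,
}

impl AmaxTracker {
    fn new(decay: f32, hysteresis: u32) -> Self {
        Self { ema_amax: 0.0, decay, scale_exp: 0, hysteresis, pending: 0 }
    }

    /// Update with the amax observed this step and return the scale to apply.
    fn update(&mut self, observed_amax: f32, fp8_max: f32) -> f32 {
        self.ema_amax = self.decay * self.ema_amax + (1.0 - self.decay) * observed_amax;

        // Desired scale maps the tracked amax to the top of the FP8 range, rounded
        // down to a power of two so rescaling is exact in binary floating point.
        let desired_exp =
            (fp8_max / self.ema_amax.max(f32::MIN_POSITIVE)).log2().floor() as i32;

        if desired_exp != self.scale_exp {
            self.pending += 1;
            if self.pending >= self.hysteresis {
                self.scale_exp = desired_exp;
                self.pending = 0;
            }
        } else {
            self.pending = 0;
        }
        (self.scale_exp as f32).exp2()
    }
}

fn main() {
    // 448.0 is the largest finite E4M3 value.
    let mut tracker = AmaxTracker::new(0.9, 2);
    for amax in [3.0_f32, 3.5, 120.0, 130.0, 125.0] {
        let scale = tracker.update(amax, 448.0);
        println!("amax = {amax:>6.1}  ->  scale = {scale}");
    }
}
```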
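For the rounding utilities, deterministic stochastic rounding driven by a counter-based hash could look roughly like this sketch. It assumes a SplitMix64-style hash, quantizes to the E4M3 spacing for the value's binade while ignoring subnormals, and is not tied to any existing CubeCL RNG API:

```rust
/// SplitMix64-style mix: a stateless, counter-based generator, so every element
/// gets a reproducible random value from (seed, index) alone.
fn hash_u64(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E3779B97F4A7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

/// Uniform value in [0, 1) derived from a seed and an element counter.
fn uniform01(seed: u64, counter: u64) -> f32 {
    let bits = hash_u64(seed ^ counter.wrapping_mul(0xD1B54A32D192ED03));
    (bits >> 40) as f32 / (1u64 << 24) as f32
}

/// Stochastically round `x` to the E4M3 grid for its binade (3 mantissa bits),
/// ignoring subnormals for brevity, and clamp to the E4M3 finite range (±448).
fn stochastic_round_e4m3(x: f32, seed: u64, counter: u64) -> f32 {
    if x == 0.0 {
        return 0.0;
    }
    // Spacing between representable values in this binade: 2^(exponent - 3).
    let ulp = (x.abs().log2().floor() - 3.0).exp2();
    let lower = (x / ulp).floor() * ulp;
    let frac = (x - lower) / ulp;
    let rounded = if uniform01(seed, counter) < frac { lower + ulp } else { lower };
    rounded.clamp(-448.0, 448.0)
}

fn main() {
    // Rounding 1.3 repeatedly should average back toward 1.3 (unbiasedness).
    let mean: f32 = (0u64..10_000)
        .map(|i| stochastic_round_e4m3(1.3, 42, i))
        .sum::<f32>()
        / 10_000.0;
    println!("mean of stochastic roundings: {mean}");
}
```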
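For the accumulation strategies, a sketch combining chunked reduction with optional Kahan-style compensated summation, again in host-side Rust under the assumption that a kernel version would map chunks to tiles or workgroups:

```rust
/// Kahan (compensated) summation over one chunk: the compensation term `c`
/// recovers low-order bits lost when adding a small value to a large sum.
fn kahan_sum(chunk: &[f32]) -> f32 {
    let (mut sum, mut c) = (0.0_f32, 0.0_f32);
    for &x in chunk {
        let y = x - c;
        let t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    sum
}

/// Chunked reduction: sum each fixed-size chunk independently (as a tile or
/// workgroup would), then combine the per-chunk partial sums.
fn chunked_sum(values: &[f32], chunk_size: usize) -> f32 {
    let partials: Vec<f32> = values.chunks(chunk_size).map(kahan_sum).collect();
    kahan_sum(&partials)
}

fn main() {
    // A long, skewed reduction where naive f32 accumulation stalls entirely.
    let values: Vec<f32> = std::iter::once(1.0e8_f32)
        .chain(std::iter::repeat(1.0_f32).take(1_000_000))
        .collect();
    let naive: f32 = values.iter().sum();
    println!("naive:   {naive}");
    println!("chunked: {}", chunked_sum(&values, 4096));
    println!("exact:   {}", 1.0e8 + 1.0e6);
}
```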
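For the stable building blocks, the techniques in question are standard: max-subtracted softmax and Welford-style single-pass variance. A host-side sketch, with the caveat that kernel versions would operate on tiles or planes rather than slices:

```rust
/// Numerically stable softmax: subtracting the row max keeps every exponent
/// argument <= 0, so exp() cannot overflow even at low precision.
fn stable_softmax(row: &[f32]) -> Vec<f32> {
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
    let denom: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / denom).collect()
}

/// Welford's online algorithm: single-pass mean and variance without the
/// catastrophic cancellation of the naive E[x^2] - E[x]^2 formula.
fn welford_mean_var(values: &[f32]) -> (f32, f32) {
    let (mut mean, mut m2) = (0.0_f32, 0.0_f32);
    for (i, &x) in values.iter().enumerate() {
        let delta = x - mean;
        mean += delta / (i as f32 + 1.0);
        m2 += delta * (x - mean);
    }
    (mean, m2 / values.len() as f32)
}

fn main() {
    // Logits large enough that exp() would overflow f32 without max-subtraction.
    println!("{:?}", stable_softmax(&[100.0, 101.0, 102.0]));
    let (mean, var) = welford_mean_var(&[1e6, 1e6 + 1.0, 1e6 + 2.0]);
    println!("mean = {mean}, var = {var}");
}
```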
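For the training-facing telemetry, a sketch of a saturation/underflow counter; the `Fp8Telemetry` name and the reporting surface are purely illustrative, while 448 and 2^-6 are the E4M3 finite maximum and smallest normal:

```rust
/// Hypothetical helper: counts values that would clip to the E4M3 max or flush
/// below its smallest normal when quantized with a given scale, as a cheap
/// health signal for adjusting the scaling policy.
#[derive(Default, Debug)]
struct Fp8Telemetry {
    saturated: u64,
    underflowed: u64,
    total: u64,
}

impl Fp8Telemetry {
    fn observe(&mut self, values: &[f32], scale: f32) {
        const E4M3_MAX: f32 = 448.0;
        const E4M3_MIN_NORMAL: f32 = 0.015625; // 2^-6
        for &v in values {
            let q = (v * scale).abs();
            self.total += 1;
            if q > E4M3_MAX {
                self.saturated += 1;
            } else if q != 0.0 && q < E4M3_MIN_NORMAL {
                self.underflowed += 1;
            }
        }
    }
}

fn main() {
    let mut telemetry = Fp8Telemetry::default();
    telemetry.observe(&[0.0001, 0.5, 900.0, -3.0], 1.0);
    println!("{telemetry:?}");
}
```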
Open questions
Does this belong as a CubeCL subcrate, or remain external?
Which pieces are actually worth standardizing vs leaving to downstream users?
Are there existing patterns in CubeCL that this should align with?
Related work
#798 FP4/FP8 type support