Proposal: FP8 Stability Kit for CubeCL #952

@huy209vn

This issue proposes a small, optional numerics utility layer for FP8 (E4M3 / E5M2) operations in CubeCL.
The goal is to provide reusable building blocks for making FP8 matmul, attention, normalization, and training paths more robust when needed, without baking policy into core kernels.

This proposal is exploratory: it is meant to start a discussion, not to claim that existing kernels have correctness problems.

Context

CubeCL already supports FP8 datatypes and quantized matmul paths.
In practice, FP8 usage often benefits from a set of well-known numerical techniques (scaling policies, rounding control, accumulation strategies, etc.) that tend to get re-implemented per-kernel or per-project.

This proposal explores whether it makes sense to factor some of those techniques into a shared, opt-in utility layer, rather than embedding them ad hoc across kernels.

Scope (tentative)

A possible cubecl-fp8 or cubecl-numerics module could expose:

Scaling helpers

Dynamic or EMA-based scale tracking

Hysteresis / power-of-two scale constraints

Optional format selection helpers (E4M3 vs E5M2)
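
To make the scaling helpers concrete, here is a minimal host-side Rust sketch of an EMA-based abs-max tracker that emits power-of-two scales with hysteresis. `ScaleTracker` and all of its fields are hypothetical names, not existing CubeCL APIs. The format maxima are those of the OCP FP8 formats (448 for E4M3, 57344 for E5M2). One assumption in the policy: the scale shrinks immediately to avoid saturation and only grows once the headroom exceeds the hysteresis threshold.

```rust
/// Largest finite magnitudes of the OCP FP8 formats.
pub const E4M3_MAX: f32 = 448.0;
pub const E5M2_MAX: f32 = 57344.0;

/// Hypothetical tracker: EMA of the observed abs-max, power-of-two scale.
pub struct ScaleTracker {
    ema_amax: f32,
    decay: f32,           // EMA decay in [0, 1); 0 means "use the latest amax"
    fmt_max: f32,         // largest finite value of the target FP8 format
    exponent: i32,        // current scale is 2^exponent
    grow_hysteresis: i32, // minimum exponent gain before the scale may grow
}

impl ScaleTracker {
    pub fn new(fmt_max: f32, decay: f32, grow_hysteresis: i32) -> Self {
        Self { ema_amax: 0.0, decay, fmt_max, exponent: 0, grow_hysteresis }
    }

    /// Feed the abs-max of the latest tensor and return the scale to
    /// multiply by before casting to FP8.
    pub fn update(&mut self, amax: f32) -> f32 {
        self.ema_amax = if self.ema_amax == 0.0 {
            amax
        } else {
            self.decay * self.ema_amax + (1.0 - self.decay) * amax
        };
        if self.ema_amax > 0.0 && self.ema_amax.is_finite() {
            // Largest e such that ema_amax * 2^e <= fmt_max.
            let target = (self.fmt_max / self.ema_amax).log2().floor() as i32;
            let shrink = target < self.exponent;
            let grow = target - self.exponent >= self.grow_hysteresis;
            if shrink || grow {
                self.exponent = target;
            }
        }
        self.scale()
    }

    pub fn scale(&self) -> f32 {
        2f32.powi(self.exponent)
    }
}
```

With `ScaleTracker::new(E4M3_MAX, 0.0, 2)`, an abs-max of 1.0 yields a scale of 256, and a later abs-max of 0.6 leaves it at 256 because the possible gain (one exponent step) is below the hysteresis threshold. Since the EMA lags the true abs-max when `decay > 0`, a real implementation would still need to clamp on cast.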

Rounding utilities

Reproducible (seeded) stochastic or dithered rounding

Counter-based RNG suitable for GPU kernels
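
As a sketch of how the rounding utilities could fit together: a stateless hash of `(seed, index)` supplies the random bits, so every element gets the same bits on every run regardless of thread scheduling. The function names are hypothetical, and the murmur3 32-bit finalizer is only a stand-in for a proper counter-based generator such as Philox. The rounding step reduces an `f32` to a given number of mantissa bits; it assumes finite inputs and ignores FP8 exponent range, subnormals, and saturation.

```rust
/// Stateless hash of (seed, counter) using the murmur3 32-bit finalizer.
/// A stand-in for a real counter-based RNG; not cryptographically secure.
pub fn hash_u32(seed: u32, counter: u32) -> u32 {
    let mut h = seed ^ counter.wrapping_mul(0x9E37_79B9);
    h ^= h >> 16;
    h = h.wrapping_mul(0x85EB_CA6B);
    h ^= h >> 13;
    h = h.wrapping_mul(0xC2B2_AE35);
    h ^= h >> 16;
    h
}

/// Stochastically round a finite f32 to `mantissa_bits` explicit mantissa
/// bits (3 for E4M3, 2 for E5M2). `index` identifies the element, so the
/// result is reproducible for a given (seed, index) pair.
pub fn stochastic_round(x: f32, mantissa_bits: u32, seed: u32, index: u32) -> f32 {
    assert!(mantissa_bits <= 23, "f32 has only 23 explicit mantissa bits");
    let drop = 23 - mantissa_bits;
    let mask = (1u32 << drop) - 1;
    // Random integer in [0, 2^drop): adding it to the magnitude bits carries
    // into the kept bits with probability equal to the discarded fraction.
    let noise = hash_u32(seed, index) & mask;
    let bits = x.to_bits().wrapping_add(noise) & !mask;
    f32::from_bits(bits)
}
```

For example, `stochastic_round(1.1, 3, seed, i)` returns either 1.0 or 1.125, and values already on the 3-bit grid such as 1.5 pass through unchanged.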

Accumulation strategies

Chunked accumulation for long reductions

Optional compensated summation (selectively enabled)
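
The two accumulation strategies can be illustrated on the host with plain `f32` slices. This is a sketch with hypothetical function names; in an actual kernel the chunk partials would more likely live in a narrower accumulator that is periodically promoted to FP32, which is not modeled here.

```rust
/// Sum in fixed-size chunks: each chunk is reduced on its own and the
/// partial sums are then combined, which bounds how large the running
/// accumulator gets relative to the values being added.
pub fn chunked_sum(xs: &[f32], chunk: usize) -> f32 {
    xs.chunks(chunk.max(1))
        .map(|c| c.iter().sum::<f32>())
        .sum()
}

/// Kahan compensated summation: `c` carries the rounding error of the
/// previous addition and feeds it back into the next one.
pub fn kahan_sum(xs: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    let mut c = 0.0f32;
    for &x in xs {
        let y = x - c;
        let t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    sum
}
```

For an input of `1e8` followed by 1024 ones, a naive `f32` loop returns `1e8` because each `1.0` is below half an ulp of the accumulator. `chunked_sum` with a chunk size of 256 recovers most of the lost terms, and `kahan_sum` recovers all of them, at the cost of extra operations per element.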

Stable building blocks

Softmax with max-subtraction

LayerNorm / RMSNorm via Welford-style variance

Designed to plug into existing kernel paths
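
For reference, the stable building blocks look roughly like this in scalar Rust. The function names are hypothetical and the sketch runs on the host in `f32`; it only shows the numerical shape of each routine, not how it would be expressed as a CubeCL kernel.

```rust
/// Softmax with max-subtraction: shifting by the maximum keeps every
/// exponent argument <= 0, so `exp` cannot overflow.
pub fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Welford's single-pass algorithm; returns (mean, population variance).
pub fn welford(xs: &[f32]) -> (f32, f32) {
    let (mut mean, mut m2) = (0.0f32, 0.0f32);
    for (i, &x) in xs.iter().enumerate() {
        let delta = x - mean;
        mean += delta / (i as f32 + 1.0);
        m2 += delta * (x - mean); // uses the updated mean
    }
    let var = if xs.is_empty() { 0.0 } else { m2 / xs.len() as f32 };
    (mean, var)
}

/// LayerNorm without learned gain/bias, built on the Welford statistics.
pub fn layer_norm(xs: &[f32], eps: f32) -> Vec<f32> {
    let (mean, var) = welford(xs);
    let inv_std = 1.0 / (var + eps).sqrt();
    xs.iter().map(|&x| (x - mean) * inv_std).collect()
}
```

Without the max shift, `softmax(&[1000.0, 1000.0])` would compute `exp(1000.0)`, which overflows to infinity in `f32` and produces NaN; with it, the result is `[0.5, 0.5]`.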

Training-facing hooks (optional)

Quantization-aware optimizer helpers

Simple telemetry for saturation / underflow tracking
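
The telemetry piece could be as small as a pair of counters. The sketch below uses hypothetical names and classifies already-scaled values against a format's range: anything above the largest finite value counts as saturated, and any nonzero value below half the smallest subnormal counts as flushed to zero. That underflow rule is a simplification of round-to-nearest behavior. The E4M3 constants are 448 for the maximum and 2^-9 for the smallest subnormal.

```rust
/// Hypothetical per-tensor quantization counters.
#[derive(Debug, Default, PartialEq)]
pub struct QuantStats {
    pub total: u64,
    pub saturated: u64,   // |x| above the largest finite value
    pub underflowed: u64, // nonzero x that would round to zero
}

pub const E4M3_MAX: f32 = 448.0;
pub const E4M3_MIN_SUBNORMAL: f32 = 0.001953125; // 2^-9

/// Classify already-scaled values against the target format's range.
pub fn record(stats: &mut QuantStats, scaled: &[f32], fmt_max: f32, min_subnormal: f32) {
    for &x in scaled {
        let a = x.abs();
        stats.total += 1;
        if a > fmt_max {
            stats.saturated += 1;
        } else if a != 0.0 && a < 0.5 * min_subnormal {
            stats.underflowed += 1;
        }
    }
}

impl QuantStats {
    /// Fraction of values that saturated; 0.0 when nothing was recorded.
    pub fn saturation_rate(&self) -> f64 {
        if self.total == 0 { 0.0 } else { self.saturated as f64 / self.total as f64 }
    }
}
```

A scale tracker could consume `saturation_rate()` as a feedback signal, but that coupling is a design choice left open here.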

All components would be explicitly opt-in and usable independently.

Open questions

Should this live in a CubeCL subcrate, or remain an external crate?

Which pieces are actually worth standardizing vs leaving to downstream users?

Are there existing patterns in CubeCL that this should align with?

Related work

#798 FP4/FP8 type support

Complements the quantized matmul work (#870, #871)
