- add optional GPU acceleration layer for TurboQuant hot-path linear algebra
- new
ComputeBackendprotocol coversrotate,rotate_inverse,project,normalize_rows,restore_norms— the two O(V×D²) GEMMs that dominate encode/decode cost NumPyBackend: always available, delegates to canonicalquantizer.pyandrotation.pyto prevent driftTorchBackend: CUDA (NVIDIA) and MPS (Apple Silicon via PyTorch); install withpip install semafold[torch]MLXBackend: Metal (Apple Silicon via MLX); install withpip install semafold[mlx]- thread-safe auto-detection registry with priority chain: CUDA → MPS → MLX → NumPy
- on-device matrix cache keyed by
(ctypes.data, shape)— safe againstid()reuse after LRU eviction, bounded to ≤ 64 entries list_backends()probes availability without instantiating backend objects- v0.3.0 extension point stubbed in protocol: fused
encode_pipeline()for zero-copy KV cache workloads - plain
pip install semafoldunchanged — NumPy remains the default, no new required dependencies - 189 tests passing; Pyright 0 errors
- initial Phase 0 and Phase 1 Semafold scaffold
- stable vector contract, measured accounting, and explicit evidence records
- root-exported
PassthroughVectorCodecand non-root scalar reference codec - package-facing closeout docs aligned to the current Phase 1 surface and verified commands
- preview-only TurboQuant KV config/codec surface under deep import
semafold.turboquant.kv; stable root unchanged - synthetic KV attention-proxy, multi-shape smoke, and fp16/bf16-aware accounting validation for the KV codec surface