# v0.6.1 — `turbo_kv_5b` near-lossless + regression tests
## 🆕 `turbo_kv_5b` — near-lossless KV at +0.34% PPL
5-bit (32-level) Lloyd-Max Gaussian codebook applied to RHT-rotated keys, using the same single-stage Variant F architecture as `turbo_kv_4b`. This is the new quality-maximizing option for users who can spare 22% more KV memory than the 4-bit type (88 vs 72 bytes per block).
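The codebook construction can be sketched with plain 1-D Lloyd iteration on a large Gaussian sample — an empirical stand-in for the closed-form Lloyd-Max optimum on N(0, 1). This is an illustrative sketch, not the repo's implementation; `lloyd_max_gaussian` and `quantize` are hypothetical names:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Build a K-level Lloyd-Max codebook for N(0,1) by running 1-D Lloyd's
// algorithm on a sorted Gaussian sample (empirical approximation of the
// closed-form Gaussian-optimal quantizer).
std::vector<float> lloyd_max_gaussian(int K = 32, int n = 200000, int iters = 30) {
    std::mt19937 rng(0);
    std::normal_distribution<float> gauss(0.f, 1.f);
    std::vector<float> x(n);
    for (auto &v : x) v = gauss(rng);
    std::sort(x.begin(), x.end());

    // Initialize levels at evenly spaced sample quantiles.
    std::vector<float> c(K);
    for (int k = 0; k < K; ++k)
        c[k] = x[(size_t)((k + 0.5) * n / K)];

    for (int it = 0; it < iters; ++it) {
        // Cell boundaries are midpoints between adjacent levels;
        // each level moves to the mean of its cell.
        std::vector<double> sum(K, 0.0);
        std::vector<int> cnt(K, 0);
        int k = 0;
        for (float v : x) {  // x is sorted, so k only moves forward
            while (k + 1 < K && v > 0.5f * (c[k] + c[k + 1])) ++k;
            sum[k] += v;
            ++cnt[k];
        }
        for (int j = 0; j < K; ++j)
            if (cnt[j]) c[j] = (float)(sum[j] / cnt[j]);
    }
    return c;
}

// 5-bit encode: index of the nearest level in the sorted codebook.
uint8_t quantize(float v, const std::vector<float> &c) {
    auto it = std::lower_bound(c.begin(), c.end(), v);
    if (it == c.begin()) return 0;
    if (it == c.end()) return (uint8_t)(c.size() - 1);
    size_t hi = it - c.begin();
    return (uint8_t)((v - c[hi - 1] <= c[hi] - v) ? hi - 1 : hi);
}
```

Because the RHT rotation pushes key blocks toward a Gaussian shape, one fixed 32-level codebook can serve all blocks; each element then stores only its 5-bit nearest-level index.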
| Type | Bytes/block | Compression | Llama 3.2 3B PPL | Δ vs FP32 |
|---|---|---|---|---|
| FP32 baseline | 4/elem | 1× | 13.56 | — |
| `turbo_kv_3b` | 56 | 9.1× | 15.39 | +13.5% |
| `turbo_kv_4b` ⭐ default | 72 | 7.1× | 14.28 | +5.3% |
| `turbo_kv_5b` 🏆 | 88 | 5.8× | 13.60 | +0.34% |
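The block element count is not stated in these notes, but the ratios in the table imply 128-element blocks (128 elems × 4 B FP32 = 512 B raw; 512 / 56 ≈ 9.1×, 512 / 72 ≈ 7.1×, 512 / 88 ≈ 5.8×). A minimal sketch of that arithmetic, with the 128-element block size as an assumption:

```cpp
#include <cmath>

// Compression ratio of a quantized KV block vs FP32.
// elems_per_block = 128 is inferred from the table, not documented.
double compression_ratio(int bytes_per_block, int elems_per_block = 128) {
    const double fp32_bytes = elems_per_block * 4.0;  // 4 B per FP32 element
    return fp32_bytes / bytes_per_block;
}
```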
CLI: `./build/quant model.gguf -k turbo_kv_5b`
## Regression tests
Three new deterministic tests in `test_turbo_kv.cpp` pin the Variant F quality thresholds, so future Karpathy-loop iterations cannot regress below them without failing CI:
- `KV_4B_AttentionCosine` — `turbo_kv_4b` cosine ≥ 0.99 vs FP32 reference on synthetic data
- `KV_5B_AttentionCosine` — `turbo_kv_5b` cosine ≥ 0.999
- `KV_5B_BeatsKV_4B` — invariant: on the same input, `turbo_kv_5b` cosine must be ≥ `turbo_kv_4b` cosine (more bits may never reduce accuracy)
Tests use synthetic Gaussian-with-outliers vectors (~3% injected outliers at ±5× scale) and run in < 1 second. No model file needed.
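The test-data recipe above can be sketched as follows — a seeded Gaussian generator with ~3% of entries scaled to ±5×, plus the cosine metric the thresholds are checked against. Names (`make_test_vector`, `cosine`) are illustrative, not the repo's actual helpers:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Synthetic Gaussian-with-outliers vector: each entry ~ N(0,1), and with
// ~3% probability it is scaled by 5x to inject an outlier. A fixed seed
// keeps the test deterministic across runs and platforms.
std::vector<float> make_test_vector(size_t n, uint32_t seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> gauss(0.f, 1.f);
    std::uniform_real_distribution<float> u(0.f, 1.f);
    std::vector<float> v(n);
    for (auto &x : v) {
        x = gauss(rng);
        if (u(rng) < 0.03f) x *= 5.f;  // outlier at 5x scale
    }
    return v;
}

// Cosine similarity between reference and dequantized vectors,
// accumulated in double to avoid float round-off in long sums.
float cosine(const std::vector<float> &a, const std::vector<float> &b) {
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += (double)a[i] * b[i];
        na  += (double)a[i] * a[i];
        nb  += (double)b[i] * b[i];
    }
    return (float)(dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12));
}
```

A threshold test then reduces to `cosine(reference, dequantized) >= 0.99` (or `0.999` for the 5-bit type), with no model file involved.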
## Compatibility
- Block layout for `turbo_kv_3b`/`4b` is unchanged from v0.6.0 — only new `turbo_kv_5b` type added
- All 35 unit tests pass on macOS / Linux / Windows
## Closes one item from issue #15
The 5-bit codebook variant follow-up from #15 is now shipped. Remaining items: per-channel outlier handling, Llama 3.1 8B + LongBench-E reproduction.