# v0.6.1 — `turbo_kv_5b` near-lossless + regression tests
## 🆕 `turbo_kv_5b` — near-lossless KV at +0.34% PPL
5-bit (32-level) Lloyd-Max Gaussian codebook applied to RHT-rotated keys, using the same single-stage Variant F architecture as `turbo_kv_4b`. This is the new quality-maximizing option for users who can spare 22% more KV memory than the 4-bit type (88 vs 72 bytes per block).
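The codebook construction can be sketched with plain 1-D Lloyd iteration on a large Gaussian sample — an empirical stand-in for the closed-form Lloyd-Max optimum on N(0, 1). This is an illustrative sketch, not the repo's implementation; `lloyd_max_gaussian` and `quantize` are hypothetical names:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Build a K-level Lloyd-Max codebook for N(0,1) by running 1-D Lloyd's
// algorithm on a sorted Gaussian sample (empirical approximation of the
// closed-form Gaussian-optimal quantizer).
std::vector<float> lloyd_max_gaussian(int K = 32, int n = 200000, int iters = 30) {
    std::mt19937 rng(0);
    std::normal_distribution<float> gauss(0.f, 1.f);
    std::vector<float> x(n);
    for (auto &v : x) v = gauss(rng);
    std::sort(x.begin(), x.end());

    // Initialize levels at evenly spaced sample quantiles.
    std::vector<float> c(K);
    for (int k = 0; k < K; ++k)
        c[k] = x[(size_t)((k + 0.5) * n / K)];

    for (int it = 0; it < iters; ++it) {
        // Cell boundaries are midpoints between adjacent levels;
        // each level moves to the mean of its cell.
        std::vector<double> sum(K, 0.0);
        std::vector<int> cnt(K, 0);
        int k = 0;
        for (float v : x) {  // x is sorted, so k only moves forward
            while (k + 1 < K && v > 0.5f * (c[k] + c[k + 1])) ++k;
            sum[k] += v;
            ++cnt[k];
        }
        for (int j = 0; j < K; ++j)
            if (cnt[j]) c[j] = (float)(sum[j] / cnt[j]);
    }
    return c;
}

// 5-bit encode: index of the nearest level in the sorted codebook.
uint8_t quantize(float v, const std::vector<float> &c) {
    auto it = std::lower_bound(c.begin(), c.end(), v);
    if (it == c.begin()) return 0;
    if (it == c.end()) return (uint8_t)(c.size() - 1);
    size_t hi = it - c.begin();
    return (uint8_t)((v - c[hi - 1] <= c[hi] - v) ? hi - 1 : hi);
}
```

Because the RHT rotation pushes key blocks toward a Gaussian shape, one fixed 32-level codebook can serve all blocks; each element then stores only its 5-bit nearest-level index.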
| Type | Bytes/block | Compression | Llama 3.2 3B PPL | Δ vs FP32 |
|---|---|---|---|---|
| FP32 baseline | 4/elem | 1× | 13.56 | — |
| `turbo_kv_3b` | 56 | 9.1× | 15.39 | +13.5% |
| `turbo_kv_4b` ⭐ default | 72 | 7.1× | 14.28 | +5.3% |
| `turbo_kv_5b` 🏆 | 88 | 5.8× | 13.60 | +0.34% |
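The block element count is not stated in these notes, but the ratios in the table imply 128-element blocks (128 elems × 4 B FP32 = 512 B raw; 512 / 56 ≈ 9.1×, 512 / 72 ≈ 7.1×, 512 / 88 ≈ 5.8×). A minimal sketch of that arithmetic, with the 128-element block size as an assumption:

```cpp
#include <cmath>

// Compression ratio of a quantized KV block vs FP32.
// elems_per_block = 128 is inferred from the table, not documented.
double compression_ratio(int bytes_per_block, int elems_per_block = 128) {
    const double fp32_bytes = elems_per_block * 4.0;  // 4 B per FP32 element
    return fp32_bytes / bytes_per_block;
}
```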
CLI: `./build/quant model.gguf -k turbo_kv_5b`
## Regression tests
Three new deterministic tests in `test_turbo_kv.cpp` pin the Variant F quality thresholds, so future Karpathy-loop iterations cannot regress below them without failing CI:
- `KV_4B_AttentionCosine` — `turbo_kv_4b` cosine ≥ 0.99 vs FP32 reference on synthetic data
- `KV_5B_AttentionCosine` — `turbo_kv_5b` cosine ≥ 0.999
- `KV_5B_BeatsKV_4B` — invariant: on the same input, `turbo_kv_5b` cosine must be ≥ `turbo_kv_4b` cosine (more bits may never reduce accuracy)
Tests use synthetic Gaussian-with-outliers vectors (~3% injected outliers at ±5× scale) and run in < 1 second. No model file needed.
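The test-data recipe above can be sketched as follows — a seeded Gaussian generator with ~3% of entries scaled to ±5×, plus the cosine metric the thresholds are checked against. Names (`make_test_vector`, `cosine`) are illustrative, not the repo's actual helpers:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Synthetic Gaussian-with-outliers vector: each entry ~ N(0,1), and with
// ~3% probability it is scaled by 5x to inject an outlier. A fixed seed
// keeps the test deterministic across runs and platforms.
std::vector<float> make_test_vector(size_t n, uint32_t seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> gauss(0.f, 1.f);
    std::uniform_real_distribution<float> u(0.f, 1.f);
    std::vector<float> v(n);
    for (auto &x : v) {
        x = gauss(rng);
        if (u(rng) < 0.03f) x *= 5.f;  // outlier at 5x scale
    }
    return v;
}

// Cosine similarity between reference and dequantized vectors,
// accumulated in double to avoid float round-off in long sums.
float cosine(const std::vector<float> &a, const std::vector<float> &b) {
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += (double)a[i] * b[i];
        na  += (double)a[i] * a[i];
        nb  += (double)b[i] * b[i];
    }
    return (float)(dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12));
}
```

A threshold test then reduces to `cosine(reference, dequantized) >= 0.99` (or `0.999` for the 5-bit type), with no model file involved.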
## Compatibility
- Block layout for `turbo_kv_3b`/`4b` is unchanged from v0.6.0 — only new `turbo_kv_5b` type added
- All 35 unit tests pass on macOS / Linux / Windows
## Closes one item from issue #15
The 5-bit codebook variant follow-up from #15 is now shipped. Remaining items: per-channel outlier handling, Llama 3.1 8B + LongBench-E reproduction.