v0.6.1 — turbo_kv_5b near-lossless + regression tests


@unamedkr released this 07 Apr 22:33 · 57 commits to main since this release

🆕 turbo_kv_5b — near-lossless KV at +0.34% PPL

`turbo_kv_5b` applies a 5-bit (32-level) Lloyd-Max Gaussian codebook to RHT-rotated keys, following the same Variant F single-stage architecture as `turbo_kv_4b`. It is the new quality-maximizing option for users who can spare 22% more KV memory than the 4-bit type.

| Type | Bytes/block | Compression | Llama 3.2 3B PPL | Δ vs FP32 |
|---|---|---|---|---|
| FP32 baseline | 4/elem | 1× | 13.56 | baseline |
| `turbo_kv_3b` | 56 | 9.1× | 15.39 | +13.5% |
| `turbo_kv_4b` ⭐ default | 72 | 7.1× | 14.28 | +5.3% |
| `turbo_kv_5b` 🏆 | 88 | 5.8× | 13.60 | +0.34% |

CLI: `./build/quant model.gguf -k turbo_kv_5b`

Regression tests

Three new deterministic tests in `test_turbo_kv.cpp` pin the Variant F quality thresholds so future Karpathy-loop iterations cannot regress past them without failing CI:

  • `KV_4B_AttentionCosine` — `turbo_kv_4b` cosine ≥ 0.99 vs FP32 reference on synthetic data
  • `KV_5B_AttentionCosine` — `turbo_kv_5b` cosine ≥ 0.999
  • `KV_5B_BeatsKV_4B` — invariant: the 5-bit cosine must be ≥ the 4-bit cosine (more bits may never reduce accuracy)

Tests use synthetic Gaussian-with-outliers vectors (~3% injected outliers at ±5× scale) and run in < 1 second. No model file needed.

Compatibility

  • Block layout for `turbo_kv_3b`/`4b` is unchanged from v0.6.0 — only new `turbo_kv_5b` type added
  • All 35 unit tests pass on macOS / Linux / Windows

Closes one item from issue #15

The 5-bit codebook variant follow-up from #15 is now shipped. Remaining items: per-channel outlier handling, Llama 3.1 8B + LongBench-E reproduction.