v0.6.1: regression tests + turbo_kv_5b in CHANGELOG/ROADMAP
Three new deterministic regression tests in `test_turbo_kv.cpp` use
synthetic Gaussian-with-outliers key/query vectors:

- `TurboKVRegression.KV_4B_AttentionCosine` pins attention cosine ≥ 0.99
- `TurboKVRegression.KV_5B_AttentionCosine` pins attention cosine ≥ 0.999
- `TurboKVRegression.KV_5B_BeatsKV_4B` pins the invariant that more bits never yields lower accuracy
These tests are deterministic (no model file needed), run in under 1 s,
and catch any future Karpathy-loop iteration that would regress past
the Variant F quality thresholds. The synthetic data generator
(`synth_keys`) injects ~3% outliers at ±5× scale to mimic real
transformer KV statistics.
Also documents `turbo_kv_5b` in CHANGELOG.md and ROADMAP.md as a v0.6.1
patch release on top of v0.6.0.
35/35 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**CHANGELOG.md** (+23 lines)

```diff
@@ -1,5 +1,28 @@
 # Changelog
 
+## [0.6.1] — 2026-04-08
+
+### Highlights
+
+- **🆕 `turbo_kv_5b` — near-lossless KV** at +0.34% PPL on Llama 3.2 3B. Uses a 32-level Lloyd-Max-Gaussian codebook (Max 1960 Table I) on RHT-rotated values. 88-byte block (vs 72 for 4b). The new quality-maximizing option for users who can spare 22% more KV memory than 4b.
+- **Regression tests** — three deterministic synthetic-data tests pin the attention cosine quality of `turbo_kv_4b` (≥0.99) and `turbo_kv_5b` (≥0.999), and assert 5b ≥ 4b on the same data. Future Karpathy-loop iterations cannot regress past these thresholds without failing CI.
```
**ROADMAP.md** (+8 −5)

```diff
@@ -49,14 +49,16 @@ The world's simplest way to add LLM to a C/C++ project.
 A C reference engine for KV cache quantization research.
 
 ### Production-ready
 
-- [x] **`turbo_kv_4b` ⭐** — RHT + 4-bit Lloyd-Max codebook, beats `uniform_4b` and llama.cpp `q4_0` KV at the same bit budget (Llama 3.2 3B PPL 14.28, +5.3% vs FP32)
+- [x] **`turbo_kv_4b` ⭐ default** — RHT + 4-bit Lloyd-Max codebook, beats `uniform_4b` and llama.cpp `q4_0` KV at the same bit budget (Llama 3.2 3B PPL 14.28, +5.3% vs FP32)
```