Commit 72ae6b2

unamedkr and claude committed
quantcpp 0.9.0: KV compression ON by default in Python bindings

BREAKTHROUGH: kv_compress=1 was never broken in quant.h. The v0.8.1 abort was caused by the libc.free() cross-heap bug (fixed in v0.8.2 via quant_free_string), not by the UNIFORM_4B KV path. We isolated the wrong variable because kv_compress=0 AND skip-free were changed simultaneously in the v0.8.1 hotfix.

Verified in standalone C AND Python ctypes: kv_compress=1 (UNIFORM_4B) works cleanly on SmolLM2-135M with quant_free_string.

This is honest correction #8: "we disabled a working feature because of incorrect root cause analysis."

Changes:
- kv_compress default restored to 1 (was 0 since v0.8.1)
- kv_compress warning/fallback guard removed
- Version bumped to 0.9.0 (major: KV compression is now the default experience for all pip users)

The headline value proposition now flows through both distribution channels identically:
  CLI:    quant model.gguf -k turbo_kv_4b → 7x KV compression
  Python: Model("model.gguf")             → 4-bit KV compression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
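
The cross-heap free bug named in the message comes down to one rule: a C string must be released by the same allocator that produced it. The sketch below illustrates that ownership pattern in ctypes. It is not the quantcpp binding code: `strdup` stands in for a library call that returns an owned string, `libc.free` stands in for the library's own release function (`quant_free_string` in this commit's terms, whose exact signature is not shown here), and `take_owned_string` is a hypothetical helper name.

```python
import ctypes
import ctypes.util

# Load the C runtime. If find_library returns None, CDLL(None) opens the
# current process on POSIX systems, which still exposes strdup and free.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# restype must be c_void_p: with c_char_p, ctypes would convert the result
# to a Python bytes object and the raw pointer needed for freeing is lost.
_libc.strdup.restype = ctypes.c_void_p
_libc.strdup.argtypes = [ctypes.c_char_p]
_libc.free.restype = None
_libc.free.argtypes = [ctypes.c_void_p]


def take_owned_string(ptr, free_fn):
    """Copy the C string at ptr into bytes, then release it with free_fn.

    free_fn must belong to the allocator that produced ptr. Passing a
    different runtime's free() is the cross-heap bug described above.
    (Hypothetical helper, for illustration only.)
    """
    try:
        return ctypes.string_at(ptr)
    finally:
        free_fn(ptr)


# Stand-in for a library call that returns a heap-allocated string.
ptr = _libc.strdup(b"hello from C")
# Allocation and release both go through libc here, so the pairing is valid.
text = take_owned_string(ptr, _libc.free)
```

In a real binding, the second argument would be the release function exported by the same shared library that allocated the string, so the pairing holds even when that library links a different C runtime than the Python process.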
1 parent 7b09851 commit 72ae6b2

2 files changed: +3 -23 lines changed

bindings/python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "quantcpp"
-version = "0.8.3"
+version = "0.9.0"
 description = "Single-header LLM inference engine with KV cache compression (7× compression at fp32 parity)"
 readme = "README.md"
 license = { text = "Apache-2.0" }

bindings/python/quantcpp/__init__.py

Lines changed: 2 additions & 22 deletions
@@ -19,7 +19,7 @@
     from importlib.metadata import version as _pkg_version
     __version__ = _pkg_version("quantcpp")
 except Exception:
-    __version__ = "0.8.3"  # fallback for editable / source-tree imports
+    __version__ = "0.9.0"  # fallback for editable / source-tree imports
 
 import os
 import sys
@@ -181,28 +181,8 @@ def __init__(
         top_p: float = 0.9,
         max_tokens: int = 256,
         n_threads: int = 4,
-        kv_compress: int = 0,
+        kv_compress: int = 1,
     ):
-        """
-        .. note::
-            ``kv_compress=1`` and ``kv_compress=2`` are temporarily disabled in
-            the Python bindings (v0.8.x) — the bundled ``quant.h`` single
-            header carries an older KV compression path that aborts on Llama
-            architectures. The CLI ``quant`` binary uses the multi-file engine
-            and works with all KV types. KV compression will be re-enabled in
-            the bindings once ``quant.h`` is re-generated against the v0.8.0+
-            tree (tracked as v0.8.1: WASM SIMD / un-stub turbo_kv).
-        """
-        if kv_compress not in (0,):
-            import warnings
-            warnings.warn(
-                "kv_compress != 0 is not supported in the Python bindings of "
-                "quantcpp 0.8.x — falling back to kv_compress=0. Use the CLI "
-                "binary for KV compression until v0.8.2.",
-                RuntimeWarning,
-                stacklevel=2,
-            )
-            kv_compress = 0
         if not os.path.isfile(path):
             raise FileNotFoundError(f"Model file not found: {path}")
 
