Commit 1bc4611

unamedkr and claude committed
docs: v0.8.1 CHANGELOG retrospective + README pip install / PyPI badges
CHANGELOG: lead with 0.8.1 hotfix entry that names the two bugs (kv_compress=1 default abort + cross-heap libc.free abort), explains how they were caught (end-user simulation in clean venv), and ties them into the project's honest-correction track record (#5 and #6).

README: surface `pip install quantcpp` at the top of the page, with PyPI version + python-versions badges replacing the stale v0.5.0 release badge. Quick-start code example uses Model().ask() and the streaming generate(). A NOTE flags the temporary kv_compress=0 default in the bindings and points readers at the CHANGELOG for context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c7375a6 commit 1bc4611

2 files changed: +58 -4 lines changed

CHANGELOG.md

Lines changed: 30 additions & 0 deletions
@@ -1,5 +1,35 @@
 # Changelog
 
+## [0.8.1] — 2026-04-09 (Python bindings hotfix)
+
+### `pip install quantcpp` is now actually usable
+
+Two critical bugs were found in the v0.8.0 Python bindings within hours of publishing — by running an end-user simulation (`pip install` in a clean venv → `Model("file.gguf").ask("question")`). Both bugs were live for v0.8.0; v0.8.1 fixes them.
+
+#### Bug 1: `Model("file.gguf").ask(...)` aborted on macOS arm64
+
+Root cause: the Python wrapper defaulted to `kv_compress=1`, which routed through the bundled `quant.h`'s UNIFORM_4B KV path. The single-header is an Apr-6 snapshot that pre-dates the v0.8.0 multi-file source by several days, and that older KV path aborts on Llama-architecture models.
+
+Fix: default `kv_compress=0` (no KV compression) in v0.8.1. Non-zero values warn and fall back. The CLI `quant` binary, which uses the multi-file engine, continues to work with all KV types.
+
+A real fix waits on a fresh `quant.h` regen against the v0.8.0+ tree (tracked as v0.8.2).
+
+#### Bug 2: `quant_ask` return string crashed `libc.free(ptr)`
+
+Root cause: `quant_ask` allocates the response string inside `libquant.dylib`'s malloc heap. The Python wrapper called `ctypes.CDLL(None).free(ptr)` to release it — but on macOS arm64, that handle resolves to a different malloc zone than the dylib's. Cross-zone free → abort.
+
+Fix: skip the explicit free in v0.8.1. We accept a ~65 KB leak per `ask()` call as a temporary tradeoff; `quant_free_ctx` / `quant_free_model` release the bulk of the memory at end of session. Tracked: add `quant_free_string(void*)` wrapper to `quant.h` in v0.8.2.
+
+### Honest correction track record
+
+These are corrections #5 and #6 in the project history (after the four in v0.6.x → v0.7.x). Both were caught by the project's own end-user-simulation testing, before any external user reported them. The pattern stands: **publish, simulate the user, fix in hours.**
+
+### v0.8.0 status
+
+PyPI 0.8.0 should be yanked (we strongly recommend upgrading to 0.8.1). Yanking only hides it from new `pip install` — anyone with a pinned `==0.8.0` install can still use it.
+
+---
+
 ## [0.8.0] — 2026-04-09
 
 ### Cross-platform SIMD: AVX2 port of turbo_kv attention

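The Bug 2 entry turns on one rule: memory returned by a C library should be released by that same library. Below is a minimal Python `ctypes` sketch of that pattern. It is not the quantcpp wrapper's actual code. The helper `take_owned_string` is hypothetical. Because `libquant` is not assumed to be present, the runnable part uses libc's `strdup` as the allocator and libc's own `free` as the matching deallocator. The commented lines show where the planned `quant_free_string(void*)` export named in the CHANGELOG would plug in. Its ctypes declaration is an assumption until it ships, and the `quant_ask` argument list is deliberately left out.

```python
import ctypes
import ctypes.util


def take_owned_string(ptr, free_fn):
    """Copy a NUL-terminated UTF-8 string out of `ptr`, then release it.

    `free_fn` must be the deallocator exported by the SAME library that
    allocated `ptr`; calling a free() bound to a different heap/zone is the
    cross-heap failure described for v0.8.0.
    """
    if not ptr:  # NULL pointer: nothing to copy, nothing to free
        return None
    try:
        # Copy into Python-owned memory before the C buffer is released.
        return ctypes.string_at(ptr).decode("utf-8")
    finally:
        free_fn(ptr)


# Runnable stand-in: strdup() allocates on libc's heap, so libc's free() is
# the matching deallocator. (If find_library returns None, CDLL(None) loads
# symbols from the current process, which includes libc on Linux.)
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p  # keep the raw pointer, not a bytes copy
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

print(take_owned_string(libc.strdup(b"4"), libc.free))

# Hypothetical wiring once quant.h exports quant_free_string(void*) (v0.8.2):
#   lib = ctypes.CDLL("libquant.dylib")
#   lib.quant_ask.restype = ctypes.c_void_p        # arguments not shown here
#   lib.quant_free_string.argtypes = [ctypes.c_void_p]
#   lib.quant_free_string.restype = None
#   answer = take_owned_string(lib.quant_ask(...), lib.quant_free_string)
```

Setting `restype = ctypes.c_void_p` matters. With the default `c_char_p`, ctypes would hand back a `bytes` copy and discard the pointer, which leaves nothing valid to free.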
README.md

Lines changed: 28 additions & 4 deletions
@@ -12,18 +12,42 @@
 </p>
 
 <p align="center">
-<a href="https://github.com/quantumaikr/quant.cpp/releases/tag/v0.5.0"><img src="https://img.shields.io/badge/release-v0.5.0-blue" alt="Release"></a>
+<a href="https://pypi.org/project/quantcpp/"><img src="https://img.shields.io/pypi/v/quantcpp.svg?label=PyPI&color=blue" alt="PyPI"></a>
+<a href="https://pypi.org/project/quantcpp/"><img src="https://img.shields.io/pypi/pyversions/quantcpp.svg" alt="Python versions"></a>
+<a href="https://github.com/quantumaikr/quant.cpp/releases/latest"><img src="https://img.shields.io/github/v/release/quantumaikr/quant.cpp?label=release" alt="Release"></a>
 <a href="#"><img src="https://img.shields.io/badge/license-Apache%202.0-blue" alt="License"></a>
-<a href="#"><img src="https://img.shields.io/badge/tests-34%20pass-brightgreen" alt="Tests"></a>
-<a href="#"><img src="https://img.shields.io/badge/score-99.2%25-brightgreen" alt="Score"></a>
+<a href="#"><img src="https://img.shields.io/badge/tests-35%20pass-brightgreen" alt="Tests"></a>
 <br>
 <a href="#"><img src="https://img.shields.io/badge/models-7%20verified-blue" alt="Models"></a>
 <a href="https://quantumaikr.github.io/quant.cpp/"><img src="https://img.shields.io/badge/WASM_demo-192KB-purple" alt="WASM"></a>
-<a href="#"><img src="https://img.shields.io/badge/platforms-macOS%20%7C%20Linux%20%7C%20Windows%20%7C%20WASM-orange" alt="Platforms"></a>
+<a href="#"><img src="https://img.shields.io/badge/platforms-macOS%20%7C%20Linux%20%7C%20WASM-orange" alt="Platforms"></a>
 </p>
 
 ---
 
+## Install
+
+```bash
+pip install quantcpp
+```
+
+```python
+from quantcpp import Model
+
+m = Model("model.gguf")
+print(m.ask("What is 2+2?"))
+
+# Streaming
+for tok in m.generate("Once upon a time"):
+    print(tok, end="", flush=True)
+```
+
+Pre-built wheels are available for Linux x86_64, Linux aarch64, and macOS arm64 (Python 3.9–3.13). Other platforms fall back to the source distribution, which compiles `quant.h` automatically — no external dependencies, just a C compiler.
+
+> **Note (v0.8.x):** the Python bindings currently default to `kv_compress=0` (no KV compression). KV compression is fully working in the CLI `quant` binary; bringing it to the bindings is tracked for v0.8.2 (regenerated single-header). See [CHANGELOG](CHANGELOG.md#081--2026-04-09-python-bindings-hotfix) for details.
+
+---
+
 ## The Problem
 
 LLM memory is dominated by the **KV cache**, not model weights. At 32K context, an 8B model's KV cache consumes **4GB** — more than the model itself. Every existing engine stores KV in FP16. We compress it.

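The CHANGELOG's Bug 1 fix and the README note describe the same binding-side behavior: `kv_compress` defaults to 0, and any non-zero request warns and falls back. Below is a minimal sketch of such a guard. It assumes nothing about quantcpp's internals, and the function name `resolve_kv_compress`, the `RuntimeWarning` category, and the message text are all hypothetical.

```python
import warnings


def resolve_kv_compress(requested: int = 0) -> int:
    """Return the KV-compression mode the bindings will actually use.

    Hypothetical v0.8.1-style guard: the bundled single-header's KV
    compression path is not trusted yet, so only 0 (off) is honored and
    any other value emits a warning and falls back to 0.
    """
    if requested != 0:
        warnings.warn(
            f"kv_compress={requested} is not supported by the Python bindings "
            "in this release; falling back to kv_compress=0",
            RuntimeWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return 0


# Usage: record the warning instead of printing it to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mode = resolve_kv_compress(1)
print(mode, caught[0].category.__name__)  # 0 RuntimeWarning
```

Warning instead of raising keeps existing callers that pass `kv_compress=1` running, at the cost of running uncompressed. The CLI `quant` binary is unaffected either way.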