TurboQuant Memory

Near-optimal vector quantization for memory/RAG embedding compression.
Based on TurboQuant (Google, ICLR 2026).

5-8x compression · 98%+ recall · numpy only · no GPU

✨ Features

98% Recall@1 at 6.4x compression (5-bit, tested on Gemini embeddings)
Blockwise Hadamard rotation — fully invertible, O(d log d), <50KB memory
Lloyd-Max optimal quantization — precomputed codebooks, zero training
Any embedding model — Gemini, OpenAI, Cohere, sentence-transformers, etc.
Any dimension ≥ 128 — auto-splits into power-of-2 blocks
OpenClaw sqlite-vec native support (auto-detects vec0 tables)
Distribution validation tool to verify your data before quantizing
Single dependency: numpy

📊 Benchmark

Tested on real Gemini embedding-001 vectors (3072-dim, L2-normalized, 112 vectors):

Bits	MSE	Cosine	Recall@1	Bytes/vec	Compression
3	1.1e-5	0.982	88%	1,160	10.6x
4	3.2e-6	0.995	92%	1,544	8.0x
5	8.2e-7	0.999	98%	1,928	6.4x
6	2.2e-7	1.000	96%+	2,312	5.3x
7	8e-8	1.000	100%	2,696	4.6x
8	3e-8	1.000	98%+	3,080	4.0x

Recommended: 5-bit (best quality/compression tradeoff)

🚀 Quick Start

# Run tests
python3 scripts/turboquant.py

# Validate on your data
python3 scripts/validate.py --db /path/to/memory.sqlite --auto-detect --bits 5

# Benchmark + migrate
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --benchmark
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --migrate

from turboquant import TurboQuantMSE
import numpy as np

tq = TurboQuantMSE(dim=3072, bits=5)

# Compress
stored = tq.quantize(embedding)       # float32 → compact representation
reconstructed = tq.dequantize(stored)  # → float32 (cosine similarity 0.999)

# Search (asymmetric: query is exact float, database is quantized)
q_rot = tq.rotation.apply(query_vec)
for doc in quantized_db:
    score = doc['norm'] * doc['scale'] * np.dot(q_rot, tq.codebook[doc['indices']])

🔧 How It Works

Input embedding (float32, e.g. 3072-dim)
  → Blockwise Hadamard rotation (3 × 1024, fully invertible)
  → Per-vector scale normalization
  → Lloyd-Max scalar quantization (precomputed codebook)
  → Packed bit representation (5-bit = 1,928 bytes vs 12,288 original)

Key design choices:

Blockwise Hadamard over SRHT subsampling (ablation showed subsampling creates an irreversible MSE floor)
Lloyd-Max over uniform quantization (optimal for the post-rotation Gaussian-like distribution)
MSE mode over Prod/QJL mode (QJL adds complexity without benefit at <10k vectors)

📁 Files

File	Description
`scripts/turboquant.py`	Core algorithm (BlockwiseHadamard + Lloyd-Max + QJL)
`scripts/memory_quantize.py`	SQLite/sqlite-vec integration, benchmark, migration
`scripts/validate.py`	Distribution validation + quantization quality report
`references/algorithm.md`	Full algorithm explanation

📦 Install (OpenClaw Skill)

clawhub install turboquant-memory

References

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (ICLR 2026)
PolarQuant: Quantizing KV Caches with Polar Transformation (AISTATS 2026)
Google Research Blog: TurboQuant

License

MIT

TurboQuant Memory（中文说明）

基于 TurboQuant（Google，ICLR 2026）的近最优向量量化，用于 memory/RAG embedding 压缩。

5-8 倍压缩 · 98%+ 召回率 · 仅依赖 numpy · 无需 GPU

✨ 特性

98% Recall@1，6.4 倍压缩（5-bit，Gemini embedding 实测）
分块 Hadamard 旋转 — 完全可逆，O(d log d)，内存 <50KB
Lloyd-Max 最优量化 — 预计算码本，零训练开销
支持任意 embedding 模型 — Gemini、OpenAI、Cohere、sentence-transformers 等
支持任意维度 ≥ 128 — 自动分块为 2 的幂
OpenClaw sqlite-vec 原生支持（自动检测 vec0 表）
分布验证工具 — 量化前验证数据是否适合
唯一依赖：numpy

📊 性能测试

在真实 Gemini embedding-001 向量上测试（3072 维，L2 归一化，112 个向量）：

位宽	MSE	余弦相似度	Recall@1	每向量大小	压缩比
3	1.1e-5	0.982	88%	1,160 B	10.6x
4	3.2e-6	0.995	92%	1,544 B	8.0x
5	8.2e-7	0.999	98%	1,928 B	6.4x
6	2.2e-7	1.000	96%+	2,312 B	5.3x
7	8e-8	1.000	100%	2,696 B	4.6x

推荐：5-bit（性价比最优）

🔧 工作原理

输入 embedding（float32，如 3072 维）
  → 分块 Hadamard 旋转（3 × 1024，完全可逆）
  → 逐向量 scale 归一化
  → Lloyd-Max 标量量化（预计算码本）
  → 紧凑位打包（5-bit：1,928 字节 vs 原始 12,288 字节）

核心设计：

分块 Hadamard 替代 SRHT 子采样（消融实验证明子采样造成不可逆 MSE 地板）
Lloyd-Max 替代均匀量化（对旋转后的近高斯分布是最优的）
MSE 模式为默认（QJL 模式在 <10k 向量时无明显优势）

📦 安装（OpenClaw 技能）

clawhub install turboquant-memory

参考文献

许可证

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.clawhub		.clawhub
references		references
scripts		scripts
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md
_meta.json		_meta.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TurboQuant Memory

✨ Features

📊 Benchmark

🚀 Quick Start

🔧 How It Works

📁 Files

📦 Install (OpenClaw Skill)

References

License

TurboQuant Memory（中文说明）

✨ 特性

📊 性能测试

🔧 工作原理

📦 安装（OpenClaw 技能）

参考文献

许可证

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TurboQuant Memory

✨ Features

📊 Benchmark

🚀 Quick Start

🔧 How It Works

📁 Files

📦 Install (OpenClaw Skill)

References

License

TurboQuant Memory（中文说明）

✨ 特性

📊 性能测试

🔧 工作原理

📦 安装（OpenClaw 技能）

参考文献

许可证

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages