`pg_turboquant` is a PostgreSQL extension that adds a custom ANN index access method, `turboquant`, for compact nearest-neighbor search over `vector` and `halfvec`.
It is designed around PostgreSQL's storage and executor constraints rather than treating ANN indexing as an external service. The project combines structured transforms, a faithful TurboQuant v2 payload for normalized cosine and inner-product retrieval, SoA batch pages with 4-bit packed dimension-major nibbles for zero-copy SIMD scoring, NEON TBL and AVX2 VPSHUFB block-16 kernels with global-scale int16 accumulation, ordered ANN scans, bitmap support for filtered workloads, SQL-side exact reranking helpers, and a reproducible benchmark harness with machine-readable microbench regression gates.
- PostgreSQL users already storing embeddings with pgvector often want a denser index format.
- `pg_turboquant` keeps ANN inside PostgreSQL while optimizing for compact storage and cache-friendly scoring.
- The access method is explicit about PostgreSQL boundaries: MVCC still lives in the executor, exact reranking still lives in SQL, and v1 still uses generic WAL.
- Faithful fast path: normalized cosine and inner-product retrieval use the paper-faithful `Qprod` payload with structured rotation, `b - 1` stage-1 scalar codes, a residual 1-bit QJL sketch, and stored residual norm `gamma`. The fast path scores via a global-scale quantized LUT16 with NEON TBL or AVX2 VPSHUFB block-16 kernels, accumulating in int16 with periodic drain to int32 (Faiss FastScan-style).
- Page format: when LUT16 is supported (`bits = 4`, dimension divisible by 8), batch pages use an SoA layout with 4-bit packed dimension-major nibbles, enabling the SIMD kernel to read directly from the page buffer with no per-scan transpose. Pages that exceed the 8 KB budget for SoA fall back to the legacy AoS interleaved layout.
- Compatibility fallback: L2 and non-normalized scans still work, but they fall back to decoded-vector scoring rather than claiming faithful TurboQuant semantics.
- Rebuild boundary: the `v2` rewrite is a format bump. Older indexes must be rebuilt with `REINDEX` or recreated.
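
To make the page-format and scoring description above more concrete, here is a minimal Python sketch of dimension-major 4-bit packing and LUT16 scoring with a narrow accumulator that is periodically drained into a wide one. It is an illustration under stated assumptions, not the extension's on-disk format or kernel: the function names, the choice to pair two adjacent vectors per byte, and the drain interval are all hypothetical, and the real kernels do this with NEON TBL or AVX2 VPSHUFB rather than Python loops.

```python
# Illustrative sketch only. Layout details and names are assumptions,
# not pg_turboquant's actual page format or SIMD kernel.

def pack_dimension_major(codes):
    """Pack 4-bit codes into one byte row per dimension.

    codes[v][d] is the 4-bit code (0..15) of vector v at dimension d.
    Each output byte holds the codes of two adjacent vectors for the
    same dimension: even vector in the low nibble, odd in the high one.
    """
    n_vectors = len(codes)
    dims = len(codes[0])
    assert n_vectors % 2 == 0, "sketch assumes an even batch size"
    rows = []
    for d in range(dims):
        row = bytearray()
        for v in range(0, n_vectors, 2):
            lo, hi = codes[v][d], codes[v + 1][d]
            assert 0 <= lo < 16 and 0 <= hi < 16
            row.append(lo | (hi << 4))
        rows.append(bytes(row))
    return rows


def score_batch(rows, luts, n_vectors, drain_every=128):
    """Score a packed batch against per-dimension 16-entry lookup tables.

    luts[d][c] is the quantized (0..255) contribution of code c at
    dimension d. Partial sums go into a narrow accumulator that is
    drained into a wide one every `drain_every` dimensions, which
    mimics int16 lanes drained into int32 (128 * 255 = 32640 still
    fits in a signed 16-bit value).
    """
    wide = [0] * n_vectors
    narrow = [0] * n_vectors
    for d, row in enumerate(rows):
        lut = luts[d]
        for i, byte in enumerate(row):
            narrow[2 * i] += lut[byte & 0x0F]
            narrow[2 * i + 1] += lut[byte >> 4]
        if (d + 1) % drain_every == 0:
            for v in range(n_vectors):
                assert narrow[v] <= 0x7FFF  # would fit a signed int16 lane
                wide[v] += narrow[v]
                narrow[v] = 0
    # Fold in whatever is left in the narrow accumulator.
    return [w + n for w, n in zip(wide, narrow)]
```

Because every row already groups the codes of one dimension, the scoring loop walks the packed bytes in storage order, which is the property the README describes as needing no per-scan transpose.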
- Custom access method: `USING turboquant`
- Input types: `vector`, `halfvec`
- Metrics: cosine, inner product, L2
- Modes:
  - flat scan with `lists = 0`
  - IVF-routed scan with `lists > 0`
  - bitmap-filter support for predicate-heavy workloads
- Fast-path scope:
  - normalized cosine and inner product run on the faithful `v2` code-domain path
  - L2 and non-normalized scans use explicit compatibility fallback scoring
- SQL helpers:
  - `tq_rerank_candidates(...)`
  - `tq_approx_candidates(...)`
  - `tq_recommended_query_knobs(...)`
  - `tq_index_metadata(...)`
  - `tq_maintain_index(...)`
  - `tq_last_scan_stats()`
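
The split between approximate retrieval in the index and exact reranking outside it can be pictured with a small Python sketch. This is a conceptual illustration only: `cosine_similarity` and `rerank` are hypothetical names, and nothing here reflects how `tq_rerank_candidates(...)` is implemented in SQL. It assumes the approximate stage has already produced a candidate list and that the exact vectors for those candidates are available.

```python
import math

# Conceptual sketch of two-stage retrieval. Names are hypothetical and
# do not correspond to pg_turboquant's SQL helpers.

def cosine_similarity(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query, candidates, final_k):
    """Rescore candidates exactly and keep the best final_k.

    candidates is a list of (row_id, exact_vector) pairs, standing in
    for the rows returned by the approximate index scan.
    """
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:final_k]
```

The approximate stage is allowed to over-fetch (more candidates than the final result size), so that quantization error in the compact codes can be corrected by the exact pass.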
Build and install against PostgreSQL 16 or 17 with PGXS:

```sh
./scripts/bootstrap_dev.sh
make
make install
```

Enable the required extensions in your database:

```sql
CREATE EXTENSION vector;
CREATE EXTENSION pg_turboquant;
```

Create an index:

```sql
CREATE INDEX docs_embedding_tq_idx
ON docs
USING turboquant (embedding tq_cosine_ops)
WITH (
  bits = 4,
  lists = 0,
  transform = 'hadamard',
  normalized = true
);
```

Run approximate retrieval with SQL-side reranking:

```sql
SELECT *
FROM tq_rerank_candidates(
  'docs'::regclass,
  'id',
  'embedding',
  '[1,0,0,0]'::vector(4),
  'cosine',
  50,
  10
);
```

Comparative retrieval (arm64 Apple Silicon, PG 16, harrier-oss-v1-270m 640d, 200 queries, SoA page format)

Knowledge Base RAG (2.8K passages, flat scan, microsoft/harrier-oss-v1-270m):
| Method | P50 Latency (ms) | P95 Latency (ms) | Index Size |
|---|---|---|---|
| `pg_turboquant` | 3.54 | 5.17 | 2.8 MB |
| `pgvector_hnsw` | 3.99 | 4.76 | 10.7 MB |
| `pgvector_ivfflat` | 3.77 | 4.65 | 7.6 MB |

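At p50, turboquant has about 11% lower latency than HNSW (3.54 ms vs 3.99 ms) with a 3.8x smaller index (2.8 MB vs 10.7 MB); at p95 it is slightly slower (5.17 ms vs 4.76 ms).
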
| Dataset | turboquant P95 | HNSW P95 | tq/HNSW | tq Footprint | HNSW Footprint |
|---|---|---|---|---|---|
| KILT NQ (2.5K, flat) | 1.66 ms | 1.43 ms | 1.16x | 1.2 MB | 5.1 MB |
| KILT HotpotQA (10K, IVF) | 1.92 ms | 2.87 ms | 0.67x | 6.5 MB | 21.6 MB |
| PopQA (4.9K, flat) | 2.87 ms | 2.94 ms | 0.98x | 2.5 MB | 10.0 MB |

turboquant beats HNSW at p95 on the IVF dataset (1.5x faster on HotpotQA) and is close on the flat scans (0.98x on PopQA, 1.16x on KILT NQ), while keeping a 3.3x to 4.3x smaller footprint across all three datasets.
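
The ratios in the table can be rechecked from the raw figures. The short Python sketch below only recomputes them from the numbers printed above; the variable and function names are illustrative and not part of the benchmark harness.

```python
# Figures copied from the table above:
# (turboquant p95 ms, HNSW p95 ms, turboquant footprint MB, HNSW footprint MB)
results = {
    "KILT NQ": (1.66, 1.43, 1.2, 5.1),
    "KILT HotpotQA": (1.92, 2.87, 6.5, 21.6),
    "PopQA": (2.87, 2.94, 2.5, 10.0),
}


def ratios(tq_p95, hnsw_p95, tq_mb, hnsw_mb):
    """Return (latency ratio tq/HNSW, footprint ratio HNSW/tq), rounded to 2 places."""
    return round(tq_p95 / hnsw_p95, 2), round(hnsw_mb / tq_mb, 2)


summary = {name: ratios(*row) for name, row in results.items()}

for name, (latency_ratio, footprint_ratio) in summary.items():
    print(f"{name}: tq/HNSW p95 = {latency_ratio:.2f}x, "
          f"footprint {footprint_ratio:.2f}x smaller")
```

A latency ratio below 1.0 means turboquant is faster at p95; a footprint ratio above 1.0 means its index is smaller.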
Results are environment-specific. The benchmark harness keeps recall, latency, footprint, WAL, and concurrent-write measurements separate so tradeoffs remain visible instead of being collapsed into a single score.
The public docs follow Diataxis:
- Tutorial: docs/tutorials/getting-started.md
- How-to:
- Reference:
- Explanation:
The docs hub lives at docs/README.md.
- PostgreSQL: 16 and 17
- pgvector: required for `vector` and `halfvec`
- Tested pgvector contract: pinned development and CI reference `v0.8.1`
- Current support boundary:
  - one `vector` / `halfvec` ANN key plus up to eight fixed-width metadata / payload attributes
  - metadata keys currently support `bool`, `int2`, `int4`, `int8`, `date`, `timestamptz`, and `uuid`
  - ordered multicolumn scans support exact metadata filtering inside the ANN path, including equality and `ANY(int4[])` on the current `int4` fast-lane filter contract
  - `INCLUDE`-style fixed-width payload columns are returned through index tuples, and covered ordered vector-key queries can be observed as true `Index Only Scan`
  - built-in maintenance includes a physical delta tier plus `tq_maintain_index(...)` for lightweight merge / compaction work
  - the production fast lane still assumes normalized cosine/IP, `transform = 'hadamard'`, and `lanes = auto`
  - exact reranking stays outside the access method

Canonical commands:

```sh
make
make install
make unitcheck
make installcheck
make tapcheck
```

The benchmark harness lives in `scripts/benchmark_suite.py`. The RAG evaluation harness lives under `benchmarks/rag/` and now reports `pg_turboquant` and pgvector through one generic RAG benchmark contract with separate retrieval, end-to-end, and diagnostics artifacts.

In the RAG harness, retrieval-stage benchmarking now treats stage 1 as a covering retrieval path by default: retrieval queries return IDs, scores, and optional small payload columns, exact rerank is a separate mode, and passage text is fetched only after `LIMIT k` for generator-facing end-to-end runs. TurboQuant benchmark diagnostics also expose delta-tier and exact-key counters plus maintenance recommendations from `tq_index_metadata(...)`.

For scan observability, `tq_last_scan_stats()` exposes backend-local JSON for the most recent TurboQuant scan, including score mode, SIMD kernel, scan orchestration, and page pruning counters. `tq_index_metadata(...)` reports the algorithm version, quantizer family, residual sketch kind, fast-path eligibility, capability flags, and delta / maintenance recommendations. It now carries only cheap heap estimates; use `tq_index_heap_stats(...)` when you intentionally want an exact heap row count. The benchmark suite also records ordered-IOS evidence, raw `EXPLAIN ... FORMAT JSON` payloads, and visibility-map context in `ordered_ios_observation`.