pg_turboquant

pg_turboquant is a PostgreSQL extension that adds a custom ANN index access method, turboquant, for compact nearest-neighbor search over vector and halfvec.

It is designed around PostgreSQL's storage and executor constraints rather than treating ANN indexing as an external service. The project combines structured transforms, a faithful TurboQuant v2 payload for normalized cosine and inner-product retrieval, SoA batch pages with 4-bit packed dimension-major nibbles for zero-copy SIMD scoring, NEON TBL and AVX2 VPSHUFB block-16 kernels with global-scale int16 accumulation, ordered ANN scans, bitmap support for filtered workloads, SQL-side exact reranking helpers, and a reproducible benchmark harness with machine-readable microbench regression gates.

Why it exists

PostgreSQL users already storing embeddings with pgvector often want a denser index format.
pg_turboquant keeps ANN inside PostgreSQL while optimizing for compact storage and cache-friendly scoring.
The access method is explicit about PostgreSQL boundaries: MVCC still lives in the executor, exact reranking still lives in SQL, and v1 still uses generic WAL.

Algorithm contract

Faithful fast path: normalized cosine and inner-product retrieval use the paper-faithful Qprod payload with structured rotation, b - 1 stage-1 scalar codes, a residual 1-bit QJL sketch, and stored residual norm gamma. The fast path scores via a global-scale quantized LUT16 with NEON TBL or AVX2 VPSHUFB block-16 kernels, accumulating in int16 with periodic drain to int32 (Faiss FastScan-style).
Page format: when LUT16 is supported (bits=4, dimension divisible by 8), batch pages use an SoA layout with 4-bit packed dimension-major nibbles, enabling the SIMD kernel to read directly from the page buffer with no per-scan transpose. Pages that exceed the 8 KB budget for SoA fall back to the legacy AoS interleaved layout.
Compatibility fallback: L2 and non-normalized scans still work, but they fall back to decoded-vector scoring rather than claiming faithful TurboQuant semantics.
Rebuild boundary: the v2 rewrite is a format bump. Older indexes must be rebuilt with REINDEX or recreated.

Current capabilities

Custom access method: USING turboquant
Input types: vector, halfvec
Metrics: cosine, inner product, L2
Modes:
- flat scan with lists = 0
- IVF-routed scan with lists > 0
- bitmap-filter support for predicate-heavy workloads
Fast-path scope:
- normalized cosine and inner product run on the faithful v2 code-domain path
- L2 and non-normalized scans use explicit compatibility fallback scoring
SQL helpers:
- tq_rerank_candidates(...)
- tq_approx_candidates(...)
- tq_recommended_query_knobs(...)
- tq_index_metadata(...)
- tq_maintain_index(...)
- tq_last_scan_stats()

Quick start

Build and install against PostgreSQL 16 or 17 with PGXS:

./scripts/bootstrap_dev.sh
make
make install

Enable the required extensions in your database:

CREATE EXTENSION vector;
CREATE EXTENSION pg_turboquant;

Create an index:

CREATE INDEX docs_embedding_tq_idx
ON docs
USING turboquant (embedding tq_cosine_ops)
WITH (
  bits = 4,
  lists = 0,
  transform = 'hadamard',
  normalized = true
);

Run approximate retrieval with SQL-side reranking:

SELECT *
FROM tq_rerank_candidates(
  'docs'::regclass,
  'id',
  'embedding',
  '[1,0,0,0]'::vector(4),
  'cosine',
  50,
  10
);

Benchmark highlights

Comparative retrieval (arm64 Apple Silicon, PG 16, harrier-oss-v1-270m 640d, 200 queries, SoA page format)

Knowledge Base RAG (2.8K passages, flat scan, microsoft/harrier-oss-v1-270m):

Method	P50 Latency (ms)	P95 Latency (ms)	Index Size
`pg_turboquant`	3.54	5.17	2.8 MB
`pgvector_hnsw`	3.99	4.76	10.7 MB
`pgvector_ivfflat`	3.77	4.65	7.6 MB

turboquant is 12% faster at p50 while being 3.8x smaller than HNSW.

Comparative retrieval (arm64, PG 16, bge-small-en-v1.5 384d, 200 queries per dataset)

Dataset	turboquant P95	HNSW P95	tq/HNSW	tq Footprint	HNSW Footprint
KILT NQ (2.5K, flat)	1.66 ms	1.43 ms	1.16x	1.2 MB	5.1 MB
KILT HotpotQA (10K, IVF)	1.92 ms	2.87 ms	0.67x	6.5 MB	21.6 MB
PopQA (4.9K, flat)	2.87 ms	2.94 ms	0.98x	2.5 MB	10.0 MB

turboquant beats HNSW on IVF datasets (1.5x faster on HotpotQA) and reaches parity on flat scans, while maintaining a 3-4x smaller footprint across all datasets.

Results are environment-specific. The benchmark harness keeps recall, latency, footprint, WAL, and concurrent-write measurements separate so tradeoffs remain visible instead of being collapsed into a single score.

Documentation

The public docs follow Diataxis:

Tutorial: docs/tutorials/getting-started.md
How-to:
Reference:
Explanation:

The docs hub lives at docs/README.md.

Compatibility and scope

PostgreSQL: 16 and 17
pgvector: required for vector and halfvec
Tested pgvector contract: pinned development and CI reference v0.8.1
Current support boundary:
- one vector/halfvec ANN key plus up to eight fixed-width metadata / payload attributes
- metadata keys currently support bool, int2, int4, int8, date, timestamptz, and uuid
- ordered multicolumn scans support exact metadata filtering inside the ANN path, including equality and ANY(int4[]) on the current int4 fast-lane filter contract
- INCLUDE-style fixed-width payload columns are returned through index tuples, and covered ordered vector-key queries can be observed as true Index Only Scan
- built-in maintenance includes a physical delta tier plus tq_maintain_index(...) for lightweight merge / compaction work
- the production fast lane still assumes normalized cosine/IP, transform = 'hadamard', and lanes = auto
- exact reranking stays outside the access method

Development

Canonical commands:

make
make install
make unitcheck
make installcheck
make tapcheck

The benchmark harness lives in scripts/benchmark_suite.py. The RAG evaluation harness lives under benchmarks/rag/ and now reports pg_turboquant and pgvector through one generic RAG benchmark contract with separate retrieval, end-to-end, and diagnostics artifacts.

In the RAG harness, retrieval-stage benchmarking now treats stage 1 as a covering retrieval path by default: retrieval queries return IDs, scores, and optional small payload columns, exact rerank is a separate mode, and passage text is fetched only after LIMIT k for generator-facing end-to-end runs. TurboQuant benchmark diagnostics also expose delta-tier and exact-key counters plus maintenance recommendations from tq_index_metadata(...).

For scan observability, tq_last_scan_stats() exposes backend-local JSON for the most recent TurboQuant scan, including score mode, SIMD kernel, scan orchestration, and page pruning counters. tq_index_metadata(...) reports the algorithm version, quantizer family, residual sketch kind, fast-path eligibility, capability flags, and delta / maintenance recommendations. It now carries only cheap heap estimates; use tq_index_heap_stats(...) when you intentionally want an exact heap row count. The benchmark suite also records ordered-IOS evidence, raw EXPLAIN ... FORMAT JSON payloads, and visibility-map context in ordered_ios_observation.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
docker		docker
docs		docs
scripts		scripts
sql		sql
src		src
t		t
test		test
tests		tests
third_party		third_party
.editorconfig		.editorconfig
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
pg_turboquant.control		pg_turboquant.control
pg_turboquant_test_support.control		pg_turboquant_test_support.control
pyproject.toml		pyproject.toml
sonar-project.properties		sonar-project.properties
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pg_turboquant

Why it exists

Algorithm contract

Current capabilities

Quick start

Benchmark highlights

Comparative retrieval (arm64 Apple Silicon, PG 16, harrier-oss-v1-270m 640d, 200 queries, SoA page format)

Comparative retrieval (arm64, PG 16, bge-small-en-v1.5 384d, 200 queries per dataset)

Documentation

Compatibility and scope

Development

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pg_turboquant

Why it exists

Algorithm contract

Current capabilities

Quick start

Benchmark highlights

Comparative retrieval (arm64 Apple Silicon, PG 16, harrier-oss-v1-270m 640d, 200 queries, SoA page format)

Comparative retrieval (arm64, PG 16, bge-small-en-v1.5 384d, 200 queries per dataset)

Documentation

Compatibility and scope

Development

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages