TurboQuant Adapters

Drop-in vector compression for 24 storage systems. Based on Google's TurboQuant (PolarQuant + QJL): roughly 6x compression at 4 bits, with reconstruction cosine similarity between 0.91 and 0.99 depending on dimension (see Compression Performance).

Quick Start

import numpy as np
import redis

from turboquant.core import TurboQuantEncoder, TurboQuantConfig
from turboquant.adapters.redis import RedisTurboCache

# Placeholder inputs; replace with real embeddings.
# Assumption: the encoder accepts 1-D float32 NumPy arrays.
embedding_vector = np.random.rand(768).astype(np.float32)
query_vector = np.random.rand(768).astype(np.float32)

# 1. Create encoder (one-time, reuse across all adapters)
encoder = TurboQuantEncoder(dim=768, config=TurboQuantConfig(bits=4, block_size=32))

# 2. Wrap your existing client
cache = RedisTurboCache(encoder, redis.Redis(), prefix="emb:")

# 3. Use it: vectors are compressed transparently
cache.put("doc:1", embedding_vector)             # Stores ~500 bytes instead of ~3KB
vec = cache.get("doc:1")                          # Returns a float32 vector (approximate reconstruction)
results = cache.search(query_vector, k=10)        # Similarity search over compressed data
stats = cache.stats()                             # Compression stats

All Adapters

Every adapter has the same API: put(), get(), put_batch(), get_batch(), search(), delete(), stats().
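
One way to picture the shared surface is as a typing.Protocol. The sketch below is illustrative only: the method names come from the list above, but the parameter names and return types are assumptions inferred from the examples in this README, and TurboAdapter and warm are hypothetical names that are not part of the library.

```python
from typing import Dict, List, Optional, Protocol, Sequence, Tuple, runtime_checkable

Vector = Sequence[float]


@runtime_checkable
class TurboAdapter(Protocol):
    """Hypothetical description of the shared adapter methods (signatures assumed)."""

    def put(self, key: str, vector: Vector) -> None: ...
    def get(self, key: str) -> Optional[Vector]: ...
    def put_batch(self, items: Dict[str, Vector]) -> None: ...
    def get_batch(self, keys: List[str]) -> Dict[str, Optional[Vector]]: ...
    def search(self, query: Vector, k: int = 10) -> List[Tuple[str, float]]: ...
    def delete(self, key: str) -> bool: ...
    def stats(self) -> Dict[str, object]: ...


def warm(cache: TurboAdapter, items: Dict[str, Vector]) -> int:
    """Backend-agnostic helper: relies only on the shared method names."""
    cache.put_batch(items)
    return len(items)
```

Code written against the shared methods, like warm above, should not need to change when one backend is swapped for another.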

Caches

Adapter	Backend	Install	Key Feature
`redis`	Redis	`pip install redis`	Pipeline batch ops, SCAN search, TTL
`memcached`	Memcached	`pip install pymemcache`	get_multi/set_multi, CAS atomic updates
`ehcache`	Ehcache (Java)	`pip install py4j`	Py4J JVM bridge or REST API
`hazelcast`	Hazelcast	`pip install hazelcast-python-client`	Distributed cluster, put_all/get_all

Databases

Adapter	Backend	Install	Key Feature
`postgresql`	PostgreSQL	`pip install psycopg2-binary`	BYTEA + pgvector hybrid search
`mysql`	MySQL/MariaDB	`pip install mysql-connector-python`	MEDIUMBLOB, executemany bulk
`sqlite`	SQLite	(built-in)	Zero dependencies, WAL mode
`mongodb`	MongoDB	`pip install pymongo`	BSON Binary, Atlas Vector Search
`dynamodb`	AWS DynamoDB	`pip install boto3`	Binary (B) attribute, batch_write, TTL
`cassandra`	Cassandra/ScyllaDB	`pip install cassandra-driver`	Prepared stmts, BATCH, native TTL

Vector Databases

Adapter	Backend	Install	Key Feature
`pinecone`	Pinecone	`pip install pinecone-client`	ANN + TurboQuant rerank
`qdrant`	Qdrant	`pip install qdrant-client`	HNSW + rerank, payload filtering
`chromadb`	ChromaDB	`pip install chromadb`	Local/server, metadata filtering
`milvus`	Milvus	`pip install pymilvus`	IVF/HNSW index + rerank
`weaviate`	Weaviate	`pip install weaviate-client`	Schema-based, near_vector search
`faiss`	FAISS	`pip install faiss-cpu`	Local ANN index, save/load to disk

Search Engines

Adapter	Backend	Install	Key Feature
`elasticsearch`	Elasticsearch	`pip install elasticsearch`	binary field + kNN rerank
`opensearch`	OpenSearch	`pip install opensearch-py`	k-NN plugin (nmslib/faiss)

Object Storage

Adapter	Backend	Install	Key Feature
`s3`	AWS S3	`pip install boto3`	~500B objects, concurrent upload
`gcs`	Google Cloud Storage	`pip install google-cloud-storage`	Blob metadata, concurrent upload
`azure_blob`	Azure Blob Storage	`pip install azure-storage-blob`	Container-based, blob metadata

Embedded Stores

Adapter	Backend	Install	Key Feature
`lmdb`	LMDB	`pip install lmdb`	Memory-mapped, zero-copy reads
`rocksdb`	RocksDB	`pip install python-rocksdb`	WriteBatch, LSM-tree optimized

Streaming

Adapter	Backend	Install	Key Feature
`kafka`	Apache Kafka	`pip install confluent-kafka`	Producer + Consumer, 6x smaller messages

Compression Performance

Tested on random unit vectors (a synthetic stand-in for embeddings; results on real embedding models may differ):

Config	Dim=128	Dim=384	Dim=768	Dim=1536
4-bit, bs=32	5.5x / 0.990	6.1x / 0.973	6.2x / 0.949	6.3x / 0.907
3-bit, bs=32	6.6x / 0.957	7.5x / 0.889	7.7x / 0.796	7.9x / 0.670

Format: compression ratio / cosine similarity

Recommendation: Use 4-bit, block_size=32 for production. It gives 5.5x to 6.3x compression, with cosine similarity of about 0.95 or better up to dim=768 (0.949 at dim=768). At dim=1536 similarity drops to 0.907.
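
The ratios in the table can be sanity-checked with simple arithmetic. The sketch below assumes a storage layout of packed codes plus one float32 scale per block; that layout, and the function names estimated_bytes and estimated_ratio, are assumptions for illustration, not the documented on-disk format. Any header or QJL bits are ignored.

```python
def estimated_bytes(dim: int, bits: int = 4, block_size: int = 32, scale_bytes: int = 4) -> int:
    """Packed codes plus one scale per block (assumed layout, no header)."""
    code_bytes = (dim * bits + 7) // 8                  # bits packed into whole bytes
    n_blocks = (dim + block_size - 1) // block_size     # ceil(dim / block_size)
    return code_bytes + n_blocks * scale_bytes


def estimated_ratio(dim: int, bits: int = 4, block_size: int = 32) -> float:
    """Ratio of the float32 size (4 bytes per value) to the estimated compressed size."""
    return dim * 4 / estimated_bytes(dim, bits, block_size)


for dim in (128, 384, 768, 1536):
    print(dim, estimated_bytes(dim), round(estimated_ratio(dim), 2))
```

Under these assumptions the 4-bit estimate is 6.4x at every dimension and the 3-bit estimate is 8.0x. The measured ratios sit slightly below that and approach it as the dimension grows, which would be consistent with a small fixed per-vector overhead.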

Architecture

Your App
   |
   v
[TurboQuant Adapter]  <-- Same API for all 24 backends
   |
   ├── encode: vector -> compressed bytes (~6x smaller)
   ├── decode: compressed bytes -> vector (approximate reconstruction)
   └── similarity: compressed vs compressed (no decode needed)
   |
   v
[Your Storage Backend]  <-- Redis, Postgres, S3, Pinecone, etc.
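
The layering in the diagram can be sketched with a toy encoder and a plain dict as the backend. This is not TurboQuant: ToyEncoder uses plain 8-bit absmax quantization purely to keep the example short, and ToyEncoder and ToyAdapter are hypothetical names. It shows the three roles (encode, decode, similarity on compressed bytes) sitting between the application and the store.

```python
import math
import struct


class ToyEncoder:
    """Stand-in encoder: one float32 scale followed by signed 8-bit codes."""

    def encode(self, vec):
        scale = max(abs(x) for x in vec) / 127 or 1.0   # fall back to 1.0 for all-zero input
        codes = [round(x / scale) for x in vec]
        return struct.pack(f"<f{len(vec)}b", scale, *codes)

    def decode(self, blob):
        scale, *codes = struct.unpack(f"<f{len(blob) - 4}b", blob)
        return [c * scale for c in codes]

    def similarity(self, blob_a, blob_b):
        """Cosine similarity from the integer codes alone; the scales cancel out."""
        a = struct.unpack(f"<{len(blob_a) - 4}b", blob_a[4:])
        b = struct.unpack(f"<{len(blob_b) - 4}b", blob_b[4:])
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class ToyAdapter:
    """Encodes on the way in, decodes on the way out; the backend only sees bytes."""

    def __init__(self, encoder, backend):
        self.encoder = encoder
        self.backend = backend

    def put(self, key, vec):
        self.backend[key] = self.encoder.encode(vec)

    def get(self, key):
        blob = self.backend.get(key)
        return None if blob is None else self.encoder.decode(blob)

    def similarity(self, key_a, key_b):
        return self.encoder.similarity(self.backend[key_a], self.backend[key_b])


store = {}
cache = ToyAdapter(ToyEncoder(), store)
cache.put("doc:1", [0.5, -1.0, 0.25])
print(len(store["doc:1"]), cache.get("doc:1"))
```

The backend never needs to know about vectors, which is what lets one adapter design cover caches, databases, and object stores.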

How It Works

PolarQuant (Stage 1): A random orthogonal rotation spreads information evenly across all components. Block-wise quantization then compresses each block of block_size values (32 in the examples here) with its own scale factor.
QJL (Stage 2): A Quantized Johnson-Lindenstrauss transform applies a random projection to the residual error and keeps only the sign of each projected value (1 bit each). This gives an unbiased correction to inner-product estimates at a small storage cost.
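
The two stages can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the library's implementation: it uses symmetric absmax quantization per block, a dense Gaussian projection for the sign sketch, and the standard sign-sketch inner-product estimator. All function names are hypothetical, and the projection count m is set deliberately high so the estimate is easy to check.

```python
import numpy as np


def random_rotation(dim, seed=0):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal


def quantize_blocks(x, bits=4, block_size=32):
    """Symmetric absmax quantization, one scale per block (dim must divide evenly)."""
    blocks = x.reshape(-1, block_size)
    levels = 2 ** (bits - 1) - 1                      # 7 for 4 bits
    scale = np.abs(blocks).max(axis=1, keepdims=True) / levels
    scale[scale == 0] = 1.0                           # guard all-zero blocks
    codes = np.clip(np.round(blocks / scale), -levels, levels).astype(np.int8)
    return codes, scale.astype(np.float32)


def dequantize_blocks(codes, scale):
    return (codes.astype(np.float32) * scale).reshape(-1)


def qjl_encode(residual, proj):
    """Keep the sign of each projected value, plus the residual norm."""
    return np.sign(proj @ residual), float(np.linalg.norm(residual))


def qjl_inner_product(query, signs, norm, proj):
    """Estimate <query, residual> from the 1-bit sketch."""
    m = proj.shape[0]
    return float(np.sqrt(np.pi / 2) * norm / m * ((proj @ query) @ signs))


dim, m = 768, 2048
rng = np.random.default_rng(1)
v = rng.standard_normal(dim)
v /= np.linalg.norm(v)
q = rng.standard_normal(dim)
q /= np.linalg.norm(q)

# Stage 1: rotate, then quantize block by block.
rot = random_rotation(dim)
v_rot = rot @ v
codes, scale = quantize_blocks(v_rot)
v_hat_rot = dequantize_blocks(codes, scale)
cosine = float(v_rot @ v_hat_rot / np.linalg.norm(v_hat_rot))

# Stage 2: 1-bit sketch of what stage 1 lost, used to correct inner products.
proj = rng.standard_normal((m, dim))
residual = v_rot - v_hat_rot
signs, norm = qjl_encode(residual, proj)
q_rot = rot @ q
correction = qjl_inner_product(q_rot, signs, norm, proj)
corrected_ip = float(q_rot @ v_hat_rot) + correction

print(f"stage-1 cosine: {cosine:.4f}")
print(f"true <q, v>: {float(q @ v):+.4f}  corrected estimate: {corrected_ip:+.4f}")
```

The rotation matters because absmax scaling is sensitive to outliers: after rotation the components look roughly Gaussian, so no single large value dominates a block's scale. The cosine this toy quantizer reports will not match the table above, since the library's actual quantizer is not reproduced here.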

Vector DB Reranking

For vector databases (Pinecone, Qdrant, Milvus, etc.), the adapter stores both:

Full vector for native ANN search (fast candidate generation)
Compressed vector in metadata/payload (for TurboQuant reranking)

This pairs fast approximate candidate generation with a TurboQuant rerank pass. Because both copies are stored, the vector database itself holds more data per vector, not less.

# Search modes for vector DBs:
results = cache.search(query, k=10, mode="rerank")     # ANN candidates + TQ rerank (best quality)
results = cache.search(query, k=10, mode="native")     # ANN only (fastest)
results = cache.search(query, k=10, mode="compressed")  # TQ only (no ANN index needed)
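
The rerank step itself is small: take the candidate keys from the ANN index, rescore each one, and keep the top k. The sketch below shows that pattern in isolation. The rerank function and the toy scores are hypothetical; in the adapter the score would come from the TurboQuant similarity, and whether it over-fetches candidates beyond k is an assumption, not something this README states.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def rerank(
    candidate_keys: Iterable[str],
    score: Callable[[str], float],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Rescore ANN candidates and return the k best as (key, score), highest first."""
    scored = ((key, score(key)) for key in candidate_keys)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data standing in for ANN output and compressed-similarity scores.
ann_candidates = ["doc:7", "doc:2", "doc:9", "doc:4"]
toy_scores = {"doc:7": 0.81, "doc:2": 0.93, "doc:9": 0.42, "doc:4": 0.88}

top = rerank(ann_candidates, toy_scores.__getitem__, k=2)
print(top)
```

A common design choice is to request several times k candidates from the index so the rerank pass has room to reorder them.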

Batch Operations

All adapters support batch operations optimized for their backend:

# Batch put (uses Redis pipelines, DynamoDB batch_write, etc.)
cache.put_batch({
    "doc:1": vector1,
    "doc:2": vector2,
    "doc:3": vector3,
})

# Batch get
vectors = cache.get_batch(["doc:1", "doc:2", "doc:3"])
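
Backends often cap how many items one request may carry; DynamoDB's BatchWriteItem, for example, accepts at most 25 put or delete requests per call. A batch layer therefore usually splits large inputs into chunks. The helper below is a generic sketch of that splitting; chunked is a hypothetical name, and whether the adapters chunk this way internally is an assumption.

```python
from itertools import islice
from typing import Dict, Iterator


def chunked(items: Dict[str, bytes], size: int) -> Iterator[Dict[str, bytes]]:
    """Yield successive dicts of at most `size` entries, preserving insertion order."""
    it = iter(items.items())
    while True:
        chunk = dict(islice(it, size))
        if not chunk:
            return
        yield chunk


payload = {f"doc:{i}": b"\x00" * 8 for i in range(60)}
sizes = [len(chunk) for chunk in chunked(payload, 25)]
print(sizes)
```

Each yielded chunk can then be sent as one pipeline or one batch request.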

Stats & Monitoring

stats = cache.stats()
# {
#   "puts": 1000,
#   "gets": 5000,
#   "hits": 4800,
#   "hit_rate": "96.0%",
#   "bytes_original": 3072000,
#   "bytes_compressed": 494000,
#   "overall_ratio": "6.2x",
#   "avg_encode_ms": "1.05",
#   "avg_decode_ms": "0.58",
# }
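
The derived fields follow directly from the raw counters. The sketch below recomputes hit_rate and overall_ratio from the example values above; derive_stats is a hypothetical helper, not a library function, and the string formatting is copied from the sample output.

```python
def derive_stats(counters: dict) -> dict:
    """Add formatted hit_rate and overall_ratio to a dict of raw counters."""
    gets = counters["gets"]
    compressed = counters["bytes_compressed"]
    hit_rate = counters["hits"] / gets if gets else 0.0
    ratio = counters["bytes_original"] / compressed if compressed else 0.0
    return {**counters, "hit_rate": f"{hit_rate:.1%}", "overall_ratio": f"{ratio:.1f}x"}


example = derive_stats({
    "puts": 1000,
    "gets": 5000,
    "hits": 4800,
    "bytes_original": 3072000,    # 1000 vectors x 768 dims x 4 bytes
    "bytes_compressed": 494000,
})
print(example["hit_rate"], example["overall_ratio"])
```

Since the sample output reports these fields as strings, monitoring code that alerts on thresholds is usually simpler if it works from the raw counters.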

Extending

Create your own adapter by subclassing BaseTurboAdapter:

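from fnmatch import fnmatchcase
from typing import List, Optional

from turboquant.adapters._base import BaseTurboAdapter

class MyCustomCache(BaseTurboAdapter):
    # self.backend is the client object you wrap. The get/set/delete/keys calls
    # below stand in for whatever methods your backend actually provides.

    def _raw_get(self, key: str) -> Optional[bytes]:
        return self.backend.get(key)

    def _raw_set(self, key: str, value: bytes, ttl=None) -> None:
        # ttl is ignored here; forward it if your backend supports expiry
        self.backend.set(key, value)

    def _raw_delete(self, key: str) -> bool:
        return bool(self.backend.delete(key))

    def _raw_keys(self, pattern: str = "*") -> List[str]:
        # Filter client-side so the pattern is honored (assumes glob-style patterns)
        return [k for k in self.backend.keys() if fnmatchcase(k, pattern)]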

You get put(), get(), put_batch(), get_batch(), search(), delete(), and stats() for free.
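
To try a custom adapter without a real service, an in-memory backend is enough. The sketch below defines a hypothetical DictBackend that offers the get/set/delete/keys methods the example above calls. The final commented line assumes BaseTurboAdapter takes (encoder, backend) like RedisTurboCache in the Quick Start, which this README does not confirm.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


class DictBackend:
    """In-memory byte store with the four methods the example adapter relies on."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> bool:
        # True only if the key existed
        return self._data.pop(key, None) is not None

    def keys(self, pattern: str = "*") -> List[str]:
        # Glob-style, case-sensitive matching
        return [k for k in self._data if fnmatchcase(k, pattern)]


backend = DictBackend()
backend.set("doc:1", b"\x01\x02")
backend.set("img:1", b"\x03")
print(backend.keys("doc:*"))

# Assumed constructor shape, mirroring RedisTurboCache(encoder, client):
# cache = MyCustomCache(encoder, backend)
```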
