Releases: ShipItAndPray/turboquant

TurboQuant v1.0.0 — 6x Compression for Vectors, Embeddings, and LLMs

25 Mar 17:06

What's New

Vector Compression Engine (PolarQuant + QJL)

Based on Google's TurboQuant research — 6x compression with near-zero accuracy loss, no training required.
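To give a flavor of the sign-sketch idea behind QJL-style quantization (project with a random Gaussian matrix, keep only the sign bits, and recover angles from bit disagreements), here is a minimal illustrative sketch. This is not TurboQuant's actual implementation; the function name and dimensions are made up for the example.

```python
import numpy as np

def sign_sketch(x, proj):
    """Keep only the sign of each random projection: 1 bit per row of proj.
    Illustrative 1-bit Johnson-Lindenstrauss sketch, not the library's API."""
    return proj @ x >= 0  # boolean array, one bit per projection

rng = np.random.default_rng(0)
d, m = 768, 1024                      # embedding dim, number of sign bits
proj = rng.standard_normal((m, d))    # shared random projection matrix

a = rng.standard_normal(d)
b = a + 0.1 * rng.standard_normal(d)  # a near-duplicate of a

sa, sb = sign_sketch(a, proj), sign_sketch(b, proj)

# The fraction of disagreeing sign bits is an unbiased estimate of
# angle(a, b) / pi, so nearby vectors stay nearby after compression.
est_angle = np.pi * np.mean(sa != sb)
true_angle = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The point of the sketch: 1,024 bits stand in for 768 float32 values (3,072 bytes), yet the estimated angle tracks the true angle closely, which is why this family of methods needs no training.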

24 Drop-in Adapters

Plug-and-play compression for every major storage system:

  • Caches: Redis, Memcached, Ehcache, Hazelcast
  • Databases: PostgreSQL (pgvector), MySQL, SQLite, MongoDB (Atlas), DynamoDB, Cassandra
  • Vector DBs: Pinecone, Qdrant, ChromaDB, Milvus, Weaviate, FAISS
  • Search: Elasticsearch, OpenSearch
  • Storage: S3, GCS, Azure Blob
  • Embedded: LMDB, RocksDB
  • Streaming: Kafka
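To illustrate what "drop-in adapter" means in practice, here is a minimal sketch of the wrapper pattern: quantize on write, dequantize on read, with the backing store unchanged. The class name and int8 scheme are assumptions for the example (a dict stands in for Redis, pgvector, etc.); they are not the library's actual API. Plain int8 gives ~4x over float32 here, whereas the release's PolarQuant/QJL engine claims 6x.

```python
import numpy as np

class CompressedVectorCache:
    """Hypothetical drop-in wrapper: int8-quantize vectors on set(),
    dequantize on get(). A real adapter would wrap a Redis client,
    a pgvector table, and so on, behind the same two methods."""

    def __init__(self, store=None):
        # Backing store: any mapping-like object; a dict for this sketch.
        self.store = store if store is not None else {}

    def set(self, key, vec):
        vec = np.asarray(vec, dtype=np.float32)
        scale = float(np.abs(vec).max()) / 127.0
        if scale == 0.0:
            scale = 1.0  # avoid division by zero for the all-zero vector
        # Store one float scale plus one int8 per dimension (~4x smaller).
        self.store[key] = (scale, np.round(vec / scale).astype(np.int8))

    def get(self, key):
        scale, q = self.store[key]
        return q.astype(np.float32) * scale
```

The caller keeps using `set`/`get` as before; only the bytes at rest shrink, which is what makes the adapters plug-and-play.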

LLM Quantization CLI

Compress any HuggingFace model to GGUF/GPTQ/AWQ in one command.

GitHub Action

CI/CD pipeline for automated LLM quantization — use in your workflows.
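A workflow using the action might look like the sketch below. The action reference, version tag, and input names are assumptions for illustration only; the release notes do not specify them, so check the action's README for the real interface.

```yaml
# Hypothetical workflow; action path and inputs are assumed, not documented here.
name: quantize-model
on: workflow_dispatch
jobs:
  quantize:
    runs-on: ubuntu-latest
    steps:
      - uses: ShipItAndPray/turboquant@v1.0.0   # assumed action reference
        with:
          model: TinyLlama/TinyLlama-1.1B-Chat-v1.0   # example model
          target: ollama                               # mirrors the --target flag
```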

Features

  • --target ollama|vllm|llamacpp|lmstudio — auto-selects best format
  • --push-to-hub — publish quantized models to HuggingFace
  • --eval — perplexity evaluation after quantization
  • --recommend — hardware-aware format recommendations