Add Similarity Search optimization guides #20
# Similarity Search Optimization Guides

This section contains optimization guides for vector similarity search workloads on Intel hardware. These guides help users of popular vector search solutions achieve optimal performance on Intel Xeon processors.

## Overview

Vector similarity search is a core component of modern AI applications, including:

- Retrieval-Augmented Generation (RAG)
- Semantic search
- Recommendation systems
- Image and video similarity
- Anomaly detection

## Intel Scalable Vector Search (SVS)

[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and it is integrated into popular solutions to bring these optimizations to a wider audience.

SVS features:

- **Vamana Algorithm**: Graph-based approximate nearest neighbor search
- **Vector Compression**: LVQ and LeanVec for significant memory reduction
- **Hardware Optimization**: Best performance on servers with AVX-512 support

## Understanding LVQ and LeanVec Compression

Traditional vector compression methods face limitations in graph-based search. Product Quantization (PQ) requires keeping full-precision vectors for re-ranking, which defeats the purpose of compression. Standard scalar quantization with global bounds fails to use the available quantization levels efficiently.

### LVQ (Locally-adaptive Vector Quantization)

LVQ addresses these limitations by applying **per-vector normalization and scalar quantization**, adapting the quantization bounds individually for each vector. This local adaptation ensures efficient use of the available bit range, resulting in high-quality compressed representations.

Key benefits:

- Minimal decompression overhead enables fast, on-the-fly distance computations
- Significantly reduces memory bandwidth and storage requirements
- Maintains high search accuracy and throughput
- SIMD-optimized layout ([Turbo LVQ](https://arxiv.org/abs/2402.02044)) for efficient distance computations

LVQ achieves a **four-fold reduction** in vector size while maintaining search accuracy. A typical 768-dimensional float32 vector requiring 3072 bytes can be reduced to just a few hundred bytes.
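As a back-of-the-envelope check of these figures (a simplified model: it counts only the quantized payload and ignores the small per-vector scale/offset constants that LVQ also stores):

```python
def lvq_bytes(dims: int, bits: int) -> int:
    """Approximate per-vector payload size at `bits` bits per dimension."""
    return dims * bits // 8

float32_bytes = 768 * 4           # uncompressed float32: 4 bytes/dim
print(float32_bytes)              # 3072
print(lvq_bytes(768, 8))          # LVQ8: 768 bytes, a 4x reduction
print(lvq_bytes(768, 4))          # LVQ4: 384 bytes, an 8x reduction
```

The 4x figure above corresponds to single-level 8-bit quantization; lower bit widths shrink vectors further at some cost in recall.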

### LeanVec (LVQ with Dimensionality Reduction)

[LeanVec](https://openreview.net/forum?id=wczqrpOrIc) builds on LVQ by first applying **linear dimensionality reduction** and then compressing the reduced vectors with LVQ. This two-step approach significantly cuts memory and compute costs, enabling faster similarity search and index construction with minimal accuracy loss; it is especially effective for high-dimensional deep learning embeddings.

Best suited for:

- High-dimensional vectors (768+ dimensions)
- Text embeddings from large language models
- Cases where maximum memory savings are needed

### Two-Level Compression

Both LVQ and LeanVec support two-level compression schemes:

1. **Level 1**: Fast candidate retrieval using compressed vectors
2. **Level 2**: Re-ranking for accuracy (LVQ encodes residuals; LeanVec encodes the full-dimensionality data)

The naming convention reflects bits per dimension at each level:

- `LVQ4x8`: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension)
- `LVQ8`: Single-level, 8 bits per dimension
- `LeanVec4x8`: 4-bit Level 1 encoding of reduced-dimensionality data + 8-bit Level 2 encoding of full-dimensionality data

## Vector Compression Selection

| Compression | Best For | Observations |
|-------------|----------|--------------|
| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search |
| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
| LVQ4 | Maximum memory saving | Recall might be insufficient |
| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 |
| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall |
| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings |

**Rule of thumb:**

- Dimensions < 768 → Use LVQ (LVQ4x4, LVQ4x8, or LVQ8)
- Dimensions ≥ 768 → Use LeanVec (LeanVec4x8 or LeanVec8x8)
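The rule of thumb can be sketched as a small helper. The function name and the recall flag are illustrative conveniences, not part of any SVS or Redis API:

```python
def suggest_compression(dims: int, prioritize_recall: bool = False) -> str:
    """Pick a starting compression scheme from the rule of thumb above.

    Illustrative helper only; tune against your own recall targets.
    """
    if dims >= 768:
        return "LeanVec8x8" if prioritize_recall else "LeanVec4x8"
    return "LVQ4x8" if prioritize_recall else "LVQ4x4"

print(suggest_compression(1536))        # LeanVec4x8
print(suggest_compression(512))         # LVQ4x4
print(suggest_compression(512, True))   # LVQ4x8
```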

## Available Guides

| Software | Description | Guide |
|----------|-------------|-------|
| **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) |

## References

- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)
- [SVS GitHub Repository](https://github.com/intel/ScalableVectorSearch)
- [LVQ Paper (VLDB 2023)](https://www.vldb.org/pvldb/vol16/p2769-aguerrebere.pdf)
- [LeanVec Paper (TMLR 2024)](https://openreview.net/forum?id=Y5Mvyusf1u)
- [Turbo LVQ Paper](https://arxiv.org/abs/2402.02044)
---
# Redis Vector Search Optimization Guide

This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index algorithm from Intel's Scalable Vector Search (SVS) library.

## Table of Contents

- [Overview](#overview)
- [SVS-VAMANA Configuration](#svs-vamana-configuration)
- [Vector Compression](#vector-compression)
- [Performance Tuning](#performance-tuning)
- [Benchmarks](#benchmarks)
- [FAQ](#faq)
- [References](#references)

## Overview

Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA. SVS-VAMANA combines the Vamana graph-based search algorithm with Intel's compression technologies (LVQ and LeanVec), delivering optimal performance on servers with AVX-512 support.

**Key Benefits of SVS-VAMANA:**

- **Memory Efficiency**: 26–37% total memory savings compared to HNSW, with 51–74% reduction in index memory
- **Higher Throughput**: Up to 144% higher QPS compared to HNSW on high-dimensional datasets
- **Lower Latency**: Up to 60% reduction in p50/p95 latencies under load
- **Maintained Accuracy**: Matches HNSW precision levels while delivering performance improvements

## SVS-VAMANA Configuration

### Creating an SVS-VAMANA Index

```
FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 768
    DISTANCE_METRIC COSINE
    GRAPH_MAX_DEGREE 64
    CONSTRUCTION_WINDOW_SIZE 200
    COMPRESSION LVQ4x8
```
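The same index can also be created programmatically. The sketch below only assembles the `FT.CREATE` argument list (so it runs without a server); the commented lines show how it might be sent with redis-py's generic `execute_command`. The index name, key prefix, and default attribute values are placeholders:

```python
def svs_create_args(index, field, dim, **attrs):
    """Assemble FT.CREATE arguments for an SVS-VAMANA vector field.

    Illustrative helper: attribute names follow the parameter table
    in this guide; index and prefix names are placeholders.
    """
    pairs = [("TYPE", "FLOAT32"), ("DIM", dim), ("DISTANCE_METRIC", "COSINE")]
    pairs += list(attrs.items())
    args = [index, "ON", "HASH", "PREFIX", "1", "doc:",
            "SCHEMA", field, "VECTOR", "SVS-VAMANA", len(pairs) * 2]
    for key, value in pairs:
        args += [key, value]
    return [str(a) for a in args]

args = svs_create_args("my_index", "embedding", 768,
                       GRAPH_MAX_DEGREE=64,
                       CONSTRUCTION_WINDOW_SIZE=200,
                       COMPRESSION="LVQ4x8")
print(args[10])  # attribute count required by FT.CREATE: '12'
# To actually create the index (requires redis-py and a Redis 8.2+ server):
# import redis
# redis.Redis().execute_command("FT.CREATE", *args)
```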

### Index Parameters

| Parameter | Description | Default | Tuning Guidance |
|-----------|-------------|---------|-----------------|
| TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory |
| DIM | Vector dimensions | - | Must match your embeddings |
| DISTANCE_METRIC | L2, IP, or COSINE | - | L2 for normalized embeddings |
| GRAPH_MAX_DEGREE | Max edges per node | 32 | Higher = better recall, more memory |
| CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality, slower build |
| SEARCH_WINDOW_SIZE | Query search window | 10 | Higher = better recall, slower |
| COMPRESSION | LVQ/LeanVec type | none | See compression section |
| TRAINING_THRESHOLD | Vectors for learning compression | 10240 | Increase if recall is low |
| REDUCE | Target dimension for LeanVec | DIM/2 | Lower = faster search, may reduce recall |

## Vector Compression

Intel SVS provides advanced compression techniques that reduce memory usage while maintaining search quality.

### Compression Options

| Compression | Bits/Dim | Memory Reduction | Best For |
|-------------|----------|------------------|----------|
| None | 32 (FLOAT32) | 1x (baseline) | Maximum accuracy |
| LVQ8 | 8 | ~4x | Fast ingestion, good balance |
| LVQ4x4 | 4+4 | ~4x | Fast search, dimensions < 768 |
| LVQ4x8 | 4+8 | ~2.5x | High recall with compression |
| LeanVec4x8 | 4/f+8 | ~3x | High-dimensional vectors (768+) |
| LeanVec8x8 | 8/f+8 | ~2.5x | Best recall with LeanVec |

The LeanVec dimensionality reduction factor `f` is the full dimensionality divided by the reduced dimensionality.
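A worked example of the factor `f` (simplified: per-vector constants and the dimensionality-reduction matrix itself are ignored): reducing 1536 dimensions to 384 gives f = 4, so LeanVec4x8 stores 4 bits per *reduced* dimension at Level 1 plus 8 bits per *full* dimension at Level 2:

```python
def leanvec_bytes(dim, reduced_dim, l1_bits, l2_bits=8):
    """Approximate per-vector bytes: Level 1 on reduced dims + Level 2 on full dims."""
    return reduced_dim * l1_bits // 8 + dim * l2_bits // 8

baseline = 1536 * 4                       # float32: 6144 bytes
compressed = leanvec_bytes(1536, 384, 4)  # LeanVec4x8 with f = 4: 192 + 1536 bytes
print(baseline, compressed, round(baseline / compressed, 1))  # 6144 1728 3.6
```

This lines up with the ~3x reduction listed for LeanVec4x8 in the table above.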

### Choosing Compression by Use Case

| Embedding Category | Example Embeddings | Compression Strategy |
|--------------------|--------------------|----------------------|
| Text Embeddings | Cohere embed-v3 (1024), OpenAI ada-002 (1536) | LeanVec4x8 |
| Image Embeddings | ResNet-152 (2048), ViT (768+) | LeanVec4x8 |
| Multimodal | CLIP ViT-B/32 (512) | LVQ8 |
| Lower Dimensional | Custom embeddings (<768) | LVQ4x4 or LVQ4x8 |

### Example with LeanVec Compression

```
FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 1536
    DISTANCE_METRIC COSINE
    COMPRESSION LeanVec4x8
    REDUCE 384
    TRAINING_THRESHOLD 20000
```

## Performance Tuning

### Runtime Query Parameters

Adjust search parameters at query time for precision/performance trade-offs:

```bash
FT.SEARCH my_index
  "*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]"
  PARAMS 4 BLOB "\x12\xa9..." SW 50
  DIALECT 2
```

| Parameter | Effect | Trade-off |
|-----------|--------|-----------|
| SEARCH_WINDOW_SIZE | Larger = higher recall | Higher latency |
| SEARCH_BUFFER_CAPACITY | More candidates for re-ranking | Higher latency |

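The `$BLOB` parameter is the query vector's raw float32 bytes. A minimal way to produce it with the Python standard library (assuming a FLOAT32 index with 768 dimensions; the redis-py call in the comment is illustrative):

```python
import struct

def to_blob(vector):
    """Pack a query vector as raw little-endian float32 bytes (the BLOB param)."""
    return struct.pack(f"<{len(vector)}f", *vector)

blob = to_blob([0.1] * 768)
print(len(blob))  # 768 dims x 4 bytes = 3072
# With redis-py, the blob is passed via query_params, e.g.:
# r.ft("my_index").search(query, query_params={"BLOB": blob, "SW": 50})
```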
### Redis Configuration

```
# redis.conf optimizations for vector workloads

# Use multiple I/O threads for better throughput
io-threads 4
io-threads-do-reads yes
```

## Benchmarks

Based on [Redis benchmarking](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/), SVS-VAMANA delivers significant improvements over HNSW.

### Memory Savings

SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across datasets:

| Dataset | Dimensions | Total Memory Reduction | Index Memory Reduction |
|---------|------------|------------------------|------------------------|
| LAION | 512 | 26% | 51% |
| Cohere | 768 | 35% | 70% |
| DBpedia | 1536 | 37% | 74% |

### Throughput Improvements (FP32)

At 0.95 precision, compared to HNSW:

| Dataset | Dimensions | QPS Improvement |
|---------|------------|-----------------|
| Cohere | 768 | Up to 144% higher |
| DBpedia | 1536 | Up to 60% higher |
| LAION | 512 | 0–15% (marginal) |

SVS-VAMANA is most effective at improving throughput for medium-to-high dimensional embeddings (768–3072 dimensions).

### Latency Improvements (FP32, High Concurrency)

| Dataset | p50 Latency Reduction | p95 Latency Reduction |
|---------|-----------------------|-----------------------|
| Cohere (768d) | 60% | 57% |
| DBpedia (1536d) | 46% | 36% |

### Precision vs. Performance

At every precision point from ~0.92 to 0.99, SVS-VAMANA matches HNSW accuracy while delivering higher throughput. At high precision (0.99), SVS-VAMANA sustains up to 1.5x better throughput.

### Ingestion Trade-offs

SVS-VAMANA index construction is slower than HNSW due to compression overhead. On x86 platforms:

- LeanVec: Can be up to 25% faster or 33% slower than HNSW, depending on the dataset
- LVQ: Up to 2.6x slower than HNSW

This trade-off is acceptable for workloads where query performance and memory efficiency are the priorities.

## FAQ

### Q: When should I use SVS-VAMANA vs. HNSW?

**A:** Use SVS-VAMANA when:

- Running on Intel Xeon processors with AVX-512
- Memory efficiency is important (26–37% savings)
- You have medium-to-high dimensional vectors (768+)
- Query throughput and latency are priorities

Use HNSW when:

- Running on ARM platforms (HNSW performs well on ARM)
- You need faster index construction
- Working with lower-dimensional vectors (<512)

When neither of these sets of conditions clearly applies, SVS-VAMANA generally has the advantage.

### Q: Are LVQ and LeanVec available in Redis Open Source?

**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's LVQ and LeanVec optimizations require:

- Intel hardware with AVX-512
- Redis Software (commercial), or building with `BUILD_INTEL_SVS_OPT=yes`

On non-Intel platforms (AMD, ARM), SVS-VAMANA automatically falls back to SQ8 compression; no code changes are required.

### Q: What if recall is too low with compression?

**A:** Try these steps in order:

1. Increase `SEARCH_WINDOW_SIZE` at query time (the first thing to try, since it is a runtime parameter and requires no re-indexing)
2. Increase `TRAINING_THRESHOLD` (e.g., 50000)
3. For LeanVec, try a larger `REDUCE` value (closer to the original dimensionality)
4. Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8)
5. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128)

### Q: Does SVS-VAMANA work on non-Intel hardware?

**A:** Yes. The API is unified, and SVS-VAMANA runs on any x86 or ARM platform with no code changes. The library automatically selects the best available implementation:

- **Intel (AVX-512)**: Full LVQ/LeanVec optimizations for maximum performance
- **AMD/Other x86**: SQ8 fallback implementation, which benchmarks show is also fast, often with comparable performance
- **ARM**: SQ8 fallback works; however, HNSW may be preferable due to slower SVS ingestion on ARM

Your application code stays the same regardless of hardware. Ideal performance is achieved on Intel Xeon with AVX-512, but you can deploy and test on any platform without modification.

## References

- [Redis Vector Search Documentation](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/)
- [SVS-VAMANA Index Reference](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#svs-vamana-index)
- [Vector Compression Guide](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/)
- [Tech Dive: Comprehensive Compression](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/)
- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)