diff --git a/README.md b/README.md
index 384ca9a..9ee9f41 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,8 @@ We aim to provide a dynamic resource where users can find the latest optimizatio
 - [Cassandra](software/cassandra/README.md)
 - [Gluten](software/gluten/README.md)
 - [Java](software/java/README.md)
+- [Similarity Search](software/similarity-search/README.md)
+  - [Redis](software/similarity-search/redis/README.md)
 - [Spark](software/spark/README.md)
 - [MySQL & PostgreSQL](software/mysql-postgresql/README.md)
 - Workloads
diff --git a/software/similarity-search/README.md b/software/similarity-search/README.md
new file mode 100644
index 0000000..4d0b0f5
--- /dev/null
+++ b/software/similarity-search/README.md
@@ -0,0 +1,89 @@
+# Similarity Search Optimization Guides
+
+This section contains optimization guides for vector similarity search workloads on Intel hardware. These guides help users of popular vector search solutions achieve optimal performance on Intel Xeon processors.
+
+## Overview
+
+Vector similarity search is a core component of modern AI applications, including:
+
+- Retrieval-Augmented Generation (RAG)
+- Semantic search
+- Recommendation systems
+- Image and video similarity
+- Anomaly detection
+
+## Intel Scalable Vector Search (SVS)
+
+[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library and is integrated into popular solutions to bring these optimizations to a wider audience.
+
+SVS features:
+
+- **Vamana Algorithm**: Graph-based approximate nearest neighbor search
+- **Vector Compression**: LVQ and LeanVec for significant memory reduction
+- **Hardware Optimization**: Best performance on servers with AVX-512 support
+
+## Understanding LVQ and LeanVec Compression
+
+Traditional vector compression methods face limitations in graph-based search. Product Quantization (PQ) requires keeping full-precision vectors for re-ranking, which negates much of the compression benefit. Standard scalar quantization with global bounds fails to use the available quantization levels efficiently.
+
+### LVQ (Locally-adaptive Vector Quantization)
+
+LVQ addresses these limitations by applying **per-vector normalization and scalar quantization**, adapting the quantization bounds individually for each vector. This local adaptation ensures efficient use of the available bit range, resulting in high-quality compressed representations.
+
+Key benefits:
+
+- Minimal decompression overhead enables fast, on-the-fly distance computations
+- Significantly reduces memory bandwidth and storage requirements
+- Maintains high search accuracy and throughput
+- SIMD-optimized layout ([Turbo LVQ](https://arxiv.org/abs/2402.02044)) for efficient distance computations
+
+LVQ achieves a **four-fold reduction** in vector size while maintaining search accuracy: a typical 768-dimensional float32 vector requiring 3072 bytes shrinks to about 768 bytes at 8 bits per dimension.
+
+### LeanVec (LVQ with Dimensionality Reduction)
+
+[LeanVec](https://openreview.net/forum?id=wczqrpOrIc) builds on LVQ by first applying **linear dimensionality reduction**, then compressing the reduced vectors with LVQ. This two-step approach significantly cuts memory and compute costs, enabling faster similarity search and index construction with minimal accuracy loss. It is especially effective for high-dimensional deep learning embeddings.
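+
+As a back-of-the-envelope illustration of the memory math (the 1536/768 figures are illustrative, assuming reduction to half the original dimensionality, and small per-vector metadata is ignored; the two-level scheme used here is described below):
+
+```
+float32 baseline:    1536 dims × 32 bits = 6144 bytes per vector
+LeanVec4x8 Level 1:   768 dims ×  4 bits =  384 bytes  (graph search)
+LeanVec4x8 Level 2:  1536 dims ×  8 bits = 1536 bytes  (re-ranking)
+Total:                                     1920 bytes  (~3.2x smaller)
+```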
+
+Best suited for:
+
+- High-dimensional vectors (768+ dimensions)
+- Text embeddings from large language models
+- Cases where maximum memory savings are needed
+
+### Two-Level Compression
+
+Both LVQ and LeanVec support two-level compression schemes:
+
+1. **Level 1**: Fast candidate retrieval using compressed vectors
+2. **Level 2**: Re-ranking for accuracy (LVQ encodes residuals, LeanVec encodes the full-dimensionality data)
+
+The naming convention reflects bits per dimension at each level:
+
+- `LVQ4x8`: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension)
+- `LVQ8`: Single-level, 8 bits per dimension
+- `LeanVec4x8`: 4-bit Level 1 encoding of the reduced-dimensionality data + 8-bit Level 2 encoding of the full-dimensionality data
+
+## Vector Compression Selection
+
+| Compression | Best For | Observations |
+|-------------|----------|--------------|
+| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search |
+| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
+| LVQ4 | Maximum memory saving | Recall might be insufficient |
+| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 |
+| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall |
+| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings |
+
+**Rule of thumb:**
+
+- Dimensions < 768 → Use LVQ (LVQ4x4, LVQ4x8, or LVQ8)
+- Dimensions ≥ 768 → Use LeanVec (LeanVec4x8 or LeanVec8x8)
+
+## Available Guides
+
+| Software | Description | Guide |
+|----------|-------------|-------|
+| **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) |
+
+## References
+
+- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)
+- [SVS GitHub Repository](https://github.com/intel/ScalableVectorSearch)
+- [LVQ Paper (VLDB 2023)](https://www.vldb.org/pvldb/vol16/p2769-aguerrebere.pdf)
+- [LeanVec Paper (TMLR 2024)](https://openreview.net/forum?id=Y5Mvyusf1u)
+- [Turbo LVQ Paper](https://arxiv.org/abs/2402.02044)
diff --git a/software/similarity-search/redis/README.md b/software/similarity-search/redis/README.md
new file mode 100644
index 0000000..e627b85
--- /dev/null
+++ b/software/similarity-search/redis/README.md
@@ -0,0 +1,219 @@
+# Redis Vector Search Optimization Guide
+
+This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index algorithm from Intel's Scalable Vector Search (SVS) library.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [SVS-VAMANA Configuration](#svs-vamana-configuration)
+- [Vector Compression](#vector-compression)
+- [Performance Tuning](#performance-tuning)
+- [Benchmarks](#benchmarks)
+- [FAQ](#faq)
+- [References](#references)
+
+## Overview
+
+Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA. SVS-VAMANA combines the Vamana graph-based search algorithm with Intel's compression technologies (LVQ and LeanVec), delivering optimal performance on servers with AVX-512 support.
+
+**Key Benefits of SVS-VAMANA:**
+
+- **Memory Efficiency**: 26–37% total memory savings compared to HNSW, with 51–74% reduction in index memory
+- **Higher Throughput**: Up to 144% higher QPS compared to HNSW on high-dimensional datasets
+- **Lower Latency**: Up to 60% reduction in p50/p95 latencies under load
+- **Maintained Accuracy**: Matches HNSW precision levels while delivering performance improvements
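+
+Since LVQ and LeanVec deliver their biggest gains on AVX-512-capable CPUs, it can be worth confirming the hardware before benchmarking. A quick check on Linux (not Redis-specific; the command simply lists CPU feature flags):
+
+```bash
+# List the AVX-512 feature flags the CPU advertises.
+# Empty output means SVS falls back to its SQ8 implementation (see FAQ).
+grep -o 'avx512[a-z0-9_]*' /proc/cpuinfo | sort -u
+```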
+
+## SVS-VAMANA Configuration
+
+### Creating an SVS-VAMANA Index
+
+```
+FT.CREATE my_index
+    ON HASH
+    PREFIX 1 doc:
+    SCHEMA embedding VECTOR SVS-VAMANA 12
+        TYPE FLOAT32
+        DIM 768
+        DISTANCE_METRIC COSINE
+        GRAPH_MAX_DEGREE 64
+        CONSTRUCTION_WINDOW_SIZE 200
+        COMPRESSION LVQ4x8
+```
+
+### Index Parameters
+
+| Parameter | Description | Default | Tuning Guidance |
+|-----------|-------------|---------|-----------------|
+| TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory |
+| DIM | Vector dimensions | - | Must match your embeddings |
+| DISTANCE_METRIC | L2, IP, or COSINE | - | Match your embedding model; for unit-normalized embeddings all three rank results identically |
+| GRAPH_MAX_DEGREE | Max edges per node | 32 | Higher = better recall, more memory |
+| CONSTRUCTION_WINDOW_SIZE | Build-time search window | 200 | Higher = better graph quality, slower build |
+| SEARCH_WINDOW_SIZE | Query-time search window | 10 | Higher = better recall, slower queries |
+| COMPRESSION | LVQ/LeanVec type | none | See the compression section below |
+| TRAINING_THRESHOLD | Number of vectors used to learn compression parameters | 10240 | Increase if recall is low |
+| REDUCE | Target dimensionality for LeanVec | DIM/2 | Lower = faster search, may reduce recall |
+
+## Vector Compression
+
+Intel SVS provides advanced compression techniques that reduce memory usage while maintaining search quality.
+
+### Compression Options
+
+| Compression | Bits/Dim | Memory Reduction | Best For |
+|-------------|----------|------------------|----------|
+| None | 32 (FLOAT32) | 1x (baseline) | Maximum accuracy |
+| LVQ8 | 8 | ~4x | Fast ingestion, good balance |
+| LVQ4x4 | 4+4 | ~4x | Fast search, dimensions < 768 |
+| LVQ4x8 | 4+8 | ~2.5x | High recall with compression |
+| LeanVec4x8 | 4/f+8 | ~3x | High-dimensional vectors (768+) |
+| LeanVec8x8 | 8/f+8 | ~2.5x | Best recall with LeanVec |
+
+The LeanVec dimensionality reduction factor `f` is the full dimensionality divided by the reduced dimensionality.
+
+### Choosing Compression by Use Case
+
+| Embedding Category | Example Embeddings | Compression Strategy |
+|--------------------|--------------------|----------------------|
+| Text Embeddings | Cohere embed-v3 (1024), OpenAI ada-002 (1536) | LeanVec4x8 |
+| Image Embeddings | ResNet-152 (2048), ViT (768+) | LeanVec4x8 |
+| Multimodal | CLIP ViT-B/32 (512) | LVQ8 |
+| Lower Dimensional | Custom embeddings (<768) | LVQ4x4 or LVQ4x8 |
+
+### Example with LeanVec Compression
+
+```
+FT.CREATE my_index
+    ON HASH
+    PREFIX 1 doc:
+    SCHEMA embedding VECTOR SVS-VAMANA 12
+        TYPE FLOAT32
+        DIM 1536
+        DISTANCE_METRIC COSINE
+        COMPRESSION LeanVec4x8
+        REDUCE 384
+        TRAINING_THRESHOLD 20000
+```
+
+## Performance Tuning
+
+### Runtime Query Parameters
+
+Adjust search parameters at query time for precision/performance trade-offs:
+
+```
+FT.SEARCH my_index
+    "*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]"
+    PARAMS 4 BLOB "\x12\xa9..." SW 50
+    DIALECT 2
+```
+
+| Parameter | Effect | Trade-off |
+|-----------|--------|-----------|
+| SEARCH_WINDOW_SIZE | Larger = higher recall | Higher latency |
+| SEARCH_BUFFER_CAPACITY | More candidates for re-ranking | Higher latency |
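+
+The same query from an application, as a minimal redis-py sketch (assumptions: the `my_index` schema shown earlier with a 768-dimensional FLOAT32 `embedding` field; the random vector stands in for a real query embedding):
+
+```python
+import numpy as np
+import redis
+from redis.commands.search.query import Query
+
+r = redis.Redis(host="localhost", port=6379)
+
+# Must match the index definition: DIM 768, TYPE FLOAT32.
+query_vec = np.random.rand(768).astype(np.float32)
+
+# KNN query with a runtime SEARCH_WINDOW_SIZE override, mirroring the
+# FT.SEARCH example above. Dialect 2 is required for vector queries.
+q = (
+    Query("*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]")
+    .sort_by("__embedding_score")
+    .return_fields("__embedding_score")
+    .dialect(2)
+)
+res = r.ft("my_index").search(q, query_params={"BLOB": query_vec.tobytes(), "SW": 50})
+for doc in res.docs:
+    print(doc.id, doc.__embedding_score)
+```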
+
+### Redis Configuration
+
+```
+# redis.conf optimizations for vector workloads
+
+# Use multiple I/O threads for better throughput
+io-threads 4
+io-threads-do-reads yes
+```
+
+## Benchmarks
+
+Based on [Redis benchmarking](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/), SVS-VAMANA delivers significant improvements over HNSW:
+
+### Memory Savings
+
+SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across datasets:
+
+| Dataset | Dimensions | Total Memory Reduction | Index Memory Reduction |
+|---------|------------|------------------------|------------------------|
+| LAION | 512 | 26% | 51% |
+| Cohere | 768 | 35% | 70% |
+| DBpedia | 1536 | 37% | 74% |
+
+### Throughput Improvements (FP32)
+
+At 0.95 precision, compared to HNSW:
+
+| Dataset | Dimensions | QPS Improvement |
+|---------|------------|-----------------|
+| Cohere | 768 | Up to 144% higher |
+| DBpedia | 1536 | Up to 60% higher |
+| LAION | 512 | 0–15% (marginal) |
+
+SVS-VAMANA is most effective at improving throughput for medium-to-high dimensional embeddings (768–3072 dimensions).
+
+### Latency Improvements (FP32, High Concurrency)
+
+| Dataset | p50 Latency Reduction | p95 Latency Reduction |
+|---------|-----------------------|-----------------------|
+| Cohere (768d) | 60% | 57% |
+| DBpedia (1536d) | 46% | 36% |
+
+### Precision vs. Performance
+
+At every precision point from ~0.92 to 0.99, SVS-VAMANA matches HNSW accuracy while delivering higher throughput. At high precision (0.99), SVS-VAMANA sustains up to 1.5x better throughput.
+
+### Ingestion Trade-offs
+
+SVS-VAMANA index construction is slower than HNSW due to compression overhead. On x86 platforms:
+
+- LeanVec: can be up to 25% faster or 33% slower than HNSW, depending on the dataset
+- LVQ: up to 2.6x slower than HNSW
+
+This trade-off is acceptable for workloads where query performance and memory efficiency are priorities.
+
+## FAQ
+
+### Q: When should I use SVS-VAMANA vs HNSW?
+
+**A:** Use SVS-VAMANA when:
+
+- Running on Intel Xeon processors with AVX-512
+- Memory efficiency is important (26–37% savings)
+- You have medium-to-high dimensional vectors (768+)
+- Query throughput and latency are priorities
+
+Use HNSW when:
+
+- Running on ARM platforms (HNSW performs well on ARM)
+- You need faster index construction
+- Working with lower-dimensional vectors (<512)
+
+### Q: Are LVQ and LeanVec available in Redis Open Source?
+
+**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's LVQ and LeanVec optimizations require:
+
+- Intel hardware with AVX-512
+- Redis Software (commercial) or building with `BUILD_INTEL_SVS_OPT=yes`
+
+On non-Intel platforms (AMD, ARM), SVS-VAMANA automatically falls back to SQ8 compression; no code changes are required.
+
+### Q: What if recall is too low with compression?
+
+**A:** Try these steps in order:
+
+1. Increase `TRAINING_THRESHOLD` (e.g., 50000)
+2. Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8)
+3. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128)
+4. Increase `SEARCH_WINDOW_SIZE` at query time
+5. For LeanVec, try a larger `REDUCE` value (closer to the original dimensionality)
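+
+For example, a sketch combining the first three adjustments into a fresh index definition (the index name and the exact values are illustrative, not prescriptive):
+
+```
+FT.CREATE my_index_hr
+    ON HASH
+    PREFIX 1 doc:
+    SCHEMA embedding VECTOR SVS-VAMANA 12
+        TYPE FLOAT32
+        DIM 768
+        DISTANCE_METRIC COSINE
+        GRAPH_MAX_DEGREE 128
+        COMPRESSION LVQ8
+        TRAINING_THRESHOLD 50000
+```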
+
+### Q: Does SVS-VAMANA work on non-Intel hardware?
+
+**A:** Yes! The API is unified, and SVS-VAMANA runs on any x86 or ARM platform; no code changes are needed. The library automatically selects the best available implementation:
+
+- **Intel (AVX-512)**: Full LVQ/LeanVec optimizations for maximum performance
+- **AMD/Other x86**: SQ8 fallback implementation, which benchmarks show often delivers comparable performance
+- **ARM**: SQ8 fallback works; however, HNSW may be preferable due to slower SVS ingestion on ARM
+
+Your application code stays the same regardless of hardware. Peak performance requires Intel Xeon with AVX-512, but you can deploy and test on any platform without modification.
+
+## References
+
+- [Redis Vector Search Documentation](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/)
+- [SVS-VAMANA Index Reference](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#svs-vamana-index)
+- [Vector Compression Guide](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/)
+- [Tech Dive: Comprehensive Compression](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/)
+- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)