From 3af96012ce8aac4aecdc61da0635a84979c03c05 Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Mon, 9 Feb 2026 22:48:51 +0000 Subject: [PATCH 1/9] Add Similarity Search optimization guides with Redis SVS-VAMANA documentation --- README.md | 2 + software/similarity-search/README.md | 76 +++++ software/similarity-search/redis/README.md | 312 +++++++++++++++++++++ 3 files changed, 390 insertions(+) create mode 100644 software/similarity-search/README.md create mode 100644 software/similarity-search/redis/README.md diff --git a/README.md b/README.md index 384ca9a..9ee9f41 100644 --- a/README.md +++ b/README.md @@ -34,6 +34,8 @@ We aim to provide a dynamic resource where users can find the latest optimizatio - [Cassandra](software/cassandra/README.md) - [Gluten](software/gluten/README.md) - [Java](software/java/README.md) + - [Similarity Search](software/similarity-search/README.md) + - [Redis](software/similarity-search/redis/README.md) - [Spark](software/spark/README.md) - [MySQL & PostgreSQL](software/mysql-postgresql/README.md) - Workloads diff --git a/software/similarity-search/README.md b/software/similarity-search/README.md new file mode 100644 index 0000000..74df68e --- /dev/null +++ b/software/similarity-search/README.md @@ -0,0 +1,76 @@ +# Similarity Search Optimization Guides + +This section contains optimization guides for vector similarity search workloads on Intel hardware. These guides help users of popular vector search solutions achieve optimal performance on Intel Xeon processors. + +## Overview + +Vector similarity search is a core component of modern AI applications including: + +- Retrieval-Augmented Generation (RAG) +- Semantic search +- Recommendation systems +- Image and video similarity +- Anomaly detection + +Intel provides optimized solutions through the **Scalable Vector Search (SVS)** library, which delivers state-of-the-art performance on Intel hardware. 
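To make the problem concrete, a minimal exact (brute-force) cosine-similarity search fits in a few lines of NumPy — this is what a FLAT index does, and what graph-based methods such as Vamana approximate at much larger scale. The data below is synthetic and the sketch is illustrative only:

```python
import numpy as np

def knn_cosine(database: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Exact k-nearest-neighbor search under cosine similarity (brute force)."""
    # Normalizing both sides turns cosine similarity into a plain dot product.
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = db @ q                    # one dot product per stored vector
    return np.argsort(-scores)[:k]    # indices of the k most similar vectors

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 768)).astype(np.float32)  # fake embeddings
query = vectors[42] + 0.01 * rng.standard_normal(768).astype(np.float32)
print(knn_cosine(vectors, query, k=3)[0])  # → 42: the vector the query was derived from
```

Brute force scans every stored vector on every query, so its cost grows linearly with dataset size — which is exactly the motivation for approximate, graph-based indexes.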
+ +## Intel Scalable Vector Search (SVS) + +[Intel SVS](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, featuring: + +- **Vamana Algorithm**: Graph-based approximate nearest neighbor search +- **Vector Compression**: LVQ and LeanVec for up to 16x memory reduction +- **AVX-512 Optimization**: Native acceleration on Intel Xeon processors +- **Streaming Support**: DynamicVamana for real-time data updates + +### Key Performance Benefits + +| Metric | Intel SVS Advantage | +|--------|---------------------| +| Throughput | Up to 13.5x vs. alternatives at billion scale | +| Memory | Up to 16x reduction with compression | +| Latency | Optimized for both batch and single queries | + +## Available Guides + +| Software | Description | Guide | +|----------|-------------|-------| +| **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) | +| **FAISS** | Facebook AI Similarity Search with SVS indexes | Coming soon | + +## Common Optimization Topics + +### Hardware Recommendations + +For optimal vector search performance on Intel: + +- **CPU**: 4th Gen Intel Xeon Scalable (Sapphire Rapids) or newer +- **Memory**: DDR5 for higher bandwidth +- **Storage**: NVMe SSD for large datasets + +### BIOS Settings + +| Setting | Recommendation | Impact | +|---------|----------------|--------| +| Hyperthreading | Enabled | Up to 20% throughput | +| Sub-NUMA Clustering | SNC2/SNC4 | Up to 15% with pinning | +| Hardware Prefetcher | Enabled | 5-10% improvement | + +### Vector Compression Selection + +``` +Dimensions < 512 → LVQ4x4 or LVQ4x8 +Dimensions ≥ 512 → LeanVec4x8 or LeanVec8x8 +Maximum savings → LVQ4 (may reduce recall) +``` + +## References + +- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/) +- [SVS GitHub Repository](https://github.com/intel/ScalableVectorSearch) +- [LVQ Paper (VLDB 2023)](https://www.vldb.org/pvldb/vol16/p2769-aguerrebere.pdf) +- [LeanVec Paper (TMLR 
2024)](https://openreview.net/forum?id=Y5Mvyusf1u) + +## Contributing + +We welcome contributions! If you have optimization tips for additional vector search solutions, please open a pull request. diff --git a/software/similarity-search/redis/README.md b/software/similarity-search/redis/README.md new file mode 100644 index 0000000..3984e34 --- /dev/null +++ b/software/similarity-search/redis/README.md @@ -0,0 +1,312 @@ +# Redis Vector Search Optimization Guide + +This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index algorithm from Intel's Scalable Vector Search (SVS) library, optimized for Intel hardware. + +## Table of Contents + +- [Overview](#overview) +- [Hardware Recommendations](#hardware-recommendations) +- [BIOS Configuration](#bios-configuration) +- [Choosing the Right Index Type](#choosing-the-right-index-type) +- [SVS-VAMANA Configuration](#svs-vamana-configuration) +- [Vector Compression](#vector-compression) +- [Performance Tuning](#performance-tuning) +- [Benchmarks](#benchmarks) +- [FAQ](#faq) +- [References](#references) + +## Overview + +Redis Query Engine supports three vector index types: + +| Index Type | Use Case | Accuracy | Performance | +|------------|----------|----------|-------------| +| **FLAT** | Small datasets (<1M vectors) | Exact | Brute-force | +| **HNSW** | Large datasets, general use | Approximate | Good | +| **SVS-VAMANA** | Large datasets on Intel hardware | Approximate | Best on Intel | + +**Why SVS-VAMANA on Intel?** + +- Optimized for AVX-512 instruction set on Intel Xeon processors +- Advanced compression (LVQ, LeanVec) reduces memory by up to 16x +- Higher throughput with lower latency compared to HNSW + +## Hardware Recommendations + +### Recommended Intel Xeon Configurations + +| Workload Size | CPU | Memory | Storage | +|---------------|-----|--------|---------| +| Small (<1M vectors) | 4th Gen 
Xeon, 16 cores | 64 GB DDR5 | NVMe SSD | +| Medium (1-10M vectors) | 4th Gen Xeon, 32 cores | 128 GB DDR5 | NVMe SSD | +| Large (10-100M vectors) | 4th/5th Gen Xeon, 64 cores | 256 GB DDR5 | NVMe SSD | +| X-Large (>100M vectors) | 5th Gen Xeon, 128+ cores | 512+ GB DDR5 | NVMe RAID | + +> **PerfTip:** 4th Gen Intel Xeon Scalable (Sapphire Rapids) and newer provide optimal AVX-512 performance for vector operations. + +### Key Hardware Features + +- **AVX-512**: Required for optimal SVS performance +- **AMX**: Additional acceleration on 4th/5th Gen Xeon +- **DDR5 Memory**: Higher bandwidth improves vector search throughput +- **Large L3 Cache**: Helps with graph traversal operations + +## BIOS Configuration + +| Parameter | Recommended Setting | Description | PerfTip | +|-----------|---------------------|-------------|---------| +| Hyperthreading (SMT) | Enabled | Two threads per core | Up to 20% | +| Sub-NUMA Clustering (SNC) | SNC2 or SNC4 | Better memory locality | Up to 15% | +| Hardware Prefetcher | Enabled | Improves cache utilization | 5-10% | +| Intel Turbo Boost | Enabled | Higher clock speeds | 10-15% | +| Power Profile | Performance | Maximum CPU frequency | Varies | + +## Choosing the Right Index Type + +``` + ┌─────────────────────────────────────┐ + │ Do you need exact results? │ + └─────────────────────────────────────┘ + │ + ┌───────────────┴───────────────┐ + ▼ ▼ + Yes No + │ │ + ▼ ▼ + Use FLAT ┌────────────────────┐ + │ Running on Intel? 
│ + └────────────────────┘ + │ + ┌───────────────┴───────────────┐ + ▼ ▼ + Yes No + │ │ + ▼ ▼ + Use SVS-VAMANA Use HNSW +``` + +## SVS-VAMANA Configuration + +### Creating an SVS-VAMANA Index + +```bash +FT.CREATE my_index + ON HASH + PREFIX 1 doc: + SCHEMA embedding VECTOR SVS-VAMANA 12 + TYPE FLOAT32 + DIM 768 + DISTANCE_METRIC COSINE + GRAPH_MAX_DEGREE 64 + CONSTRUCTION_WINDOW_SIZE 200 + COMPRESSION LVQ4x8 +``` + +### Index Parameters + +| Parameter | Description | Default | Tuning Guidance | +|-----------|-------------|---------|-----------------| +| TYPE | Vector data type | - | FLOAT32 for accuracy, FLOAT16 for memory | +| DIM | Vector dimensions | - | Must match your embeddings | +| DISTANCE_METRIC | L2, IP, or COSINE | - | COSINE for normalized embeddings | +| GRAPH_MAX_DEGREE | Max edges per node | 32 | Higher = better recall, more memory | +| CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality | +| SEARCH_WINDOW_SIZE | Query search window | 10 | Higher = better recall, slower | +| COMPRESSION | LVQ/LeanVec type | none | See compression section | + +> **PerfTip:** Setting `GRAPH_MAX_DEGREE` to 64 instead of default 32 can improve recall by 2-5% with ~2x memory overhead for the graph structure. + +## Vector Compression + +Intel SVS provides advanced compression techniques that reduce memory usage while maintaining search quality. + +### Compression Options + +| Compression | Bits/Dim | Memory Reduction | Best For | +|-------------|----------|------------------|----------| +| None | 32 (FLOAT32) | 1x (baseline) | Maximum accuracy | +| LVQ8 | 8 | 4x | Fast ingestion | +| LVQ4x4 | 4+4 | 8x | Balanced | +| LVQ4x8 | 4+8 | ~6x | High recall with compression | +| LeanVec4x8 | Reduced dim + 4+8 | 8-16x | High-dimensional vectors (768+) | +| LeanVec8x8 | Reduced dim + 8+8 | 4-8x | Best recall with LeanVec | + +### Choosing Compression + +``` +Vector Dimensions < 512? + └─► Use LVQ4x4 or LVQ4x8 + +Vector Dimensions >= 512? 
+ └─► Use LeanVec4x8 or LeanVec8x8 + +Need maximum memory savings? + └─► Use LVQ4 (single-level) + +Need highest recall with compression? + └─► Use LVQ4x8 or LeanVec8x8 +``` + +> **PerfTip:** LeanVec4x8 with 768-dimensional vectors (common for text embeddings) can reduce memory by 10x while maintaining 95%+ recall. + +### Two-Level Compression + +LVQ and LeanVec support two-level compression: + +1. **Level 1**: Fast candidate retrieval using compressed vectors +2. **Level 2**: Re-ranking using residual encoding for accuracy + +Example: `LVQ4x8` uses 4 bits for Level 1 and 8 bits for Level 2. + +### Compression Training + +Compression parameters are learned from data. Use `TRAINING_THRESHOLD` to control the sample size: + +```bash +FT.CREATE my_index + ON HASH + PREFIX 1 doc: + SCHEMA embedding VECTOR SVS-VAMANA 14 + TYPE FLOAT32 + DIM 768 + DISTANCE_METRIC COSINE + COMPRESSION LeanVec4x8 + TRAINING_THRESHOLD 20000 + REDUCE 192 +``` + +> **Note:** If recall is low, increase `TRAINING_THRESHOLD`. The default is 10 * 1024 = 10,240 vectors. + +## Performance Tuning + +### Runtime Query Parameters + +```bash +FT.SEARCH my_index + "*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]" + PARAMS 4 BLOB "\x12\xa9..." SW 50 + DIALECT 2 +``` + +| Parameter | Effect | Trade-off | +|-----------|--------|-----------| +| SEARCH_WINDOW_SIZE | Larger = higher recall | Higher latency | +| EPSILON | Larger = wider range search | Higher latency | +| SEARCH_BUFFER_CAPACITY | More candidates for re-ranking | Higher latency | + +### OS-Level Tuning + +```bash +# Enable huge pages for better memory performance +echo 'vm.nr_hugepages = 1024' >> /etc/sysctl.conf +sysctl -p + +# Set CPU governor to performance +cpupower frequency-set -g performance + +# Disable transparent huge pages (if not using explicitly) +echo never > /sys/kernel/mm/transparent_hugepage/enabled +``` + +> **PerfTip:** Using 2MB huge pages can improve vector search throughput by 5-10%. 
+ +### Redis Configuration + +``` +# redis.conf optimizations for vector workloads + +# Increase memory limit for large vector datasets +maxmemory 200gb + +# Use multiple I/O threads for better throughput +io-threads 4 +io-threads-do-reads yes + +# Disable persistence for pure search workloads (if acceptable) +save "" +appendonly no +``` + +## Benchmarks + +### Redis Query Engine Performance + +Based on [Redis benchmarks](https://redis.io/blog/benchmarking-results-for-vector-databases/), Redis significantly outperforms competitors: + +| Comparison | Redis Advantage | +|------------|-----------------| +| vs. Qdrant | Up to 3.4x higher QPS | +| vs. Milvus | Up to 3.3x higher QPS | +| vs. Weaviate | Up to 1.7x higher QPS | +| vs. PostgreSQL (pgvector) | Up to 9.5x higher QPS | +| vs. MongoDB Atlas | Up to 11x higher QPS | +| vs. OpenSearch | Up to 53x higher QPS | + +### SVS Performance on Intel + +Intel SVS benchmarks show significant improvements over alternatives: + +| Dataset | SVS QPS | vs. HNSW | +|---------|---------|----------| +| deep-96-1B | 95,931 | 7.0x faster | +| rqa-768-10M | 23,296 | 8.1x faster | +| deep-96-100M | 140,505 | 4.5x faster | + +*Source: [Intel SVS Benchmarks](https://intel.github.io/ScalableVectorSearch/benchs/static/latest.html)* + +### Memory Savings with Compression + +| Configuration | Memory per 1M Vectors (768-dim) | Recall@10 | +|---------------|--------------------------------|-----------| +| FLOAT32 (no compression) | ~3 GB | 100% | +| LVQ4x8 | ~500 MB | ~98% | +| LeanVec4x8 (reduce=192) | ~300 MB | ~95% | + +## FAQ + +### Q: When should I use SVS-VAMANA vs HNSW? 
+ +**A:** Use SVS-VAMANA when: +- Running on Intel Xeon processors (4th Gen+) +- Memory efficiency is important +- You need maximum throughput on Intel hardware + +Use HNSW when: +- Running on non-Intel hardware +- You need a well-established, widely-tested algorithm +- Compatibility with Redis Open Source without Intel optimizations + +### Q: Are LVQ and LeanVec available in Redis Open Source? + +**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization is available in Redis Open Source. However, Intel's proprietary LVQ and LeanVec optimizations require: +- Intel hardware +- Redis Software (commercial) or RSALv2 license +- Building with `BUILD_INTEL_SVS_OPT=yes` + +### Q: How do I migrate from HNSW to SVS-VAMANA? + +**A:** Create a new index with SVS-VAMANA and reindex your data: + +```bash +# Create new SVS-VAMANA index +FT.CREATE new_index ON HASH PREFIX 1 doc: SCHEMA embedding VECTOR SVS-VAMANA 8 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE COMPRESSION LVQ4x8 + +# Reindex data (use your application or Redis CLI) +# The data format is identical, only the index type changes +``` + +### Q: What if recall is too low with compression? + +**A:** Try these steps in order: +1. Increase `TRAINING_THRESHOLD` (e.g., 50000) +2. Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8) +3. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128) +4. 
Increase `SEARCH_WINDOW_SIZE` at query time + +## References + +- [Redis Vector Search Documentation](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/) +- [SVS-VAMANA Index Reference](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#svs-vamana-index) +- [Vector Compression Guide](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/) +- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/) +- [Redis Benchmarking Results](https://redis.io/blog/benchmarking-results-for-vector-databases/) From 8a3022d5d22247f3ff2f5ae37c09252b80796b0d Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Mon, 9 Feb 2026 23:16:19 +0000 Subject: [PATCH 2/9] Update Similarity Search guides based on feedback - Remove BIOS and hardware recommendation sections - Add high-level LVQ and LeanVec description - Replace benchmarks with data from Redis tech dive blog - Mention SVS can be used directly, integrations in progress - Remove contributing section - Focus on compression selection and performance data --- software/similarity-search/README.md | 92 ++++---- software/similarity-search/redis/README.md | 259 +++++++-------------- 2 files changed, 134 insertions(+), 217 deletions(-) diff --git a/software/similarity-search/README.md b/software/similarity-search/README.md index 74df68e..f10f9e7 100644 --- a/software/similarity-search/README.md +++ b/software/similarity-search/README.md @@ -12,57 +12,74 @@ Vector similarity search is a core component of modern AI applications including - Image and video similarity - Anomaly detection -Intel provides optimized solutions through the **Scalable Vector Search (SVS)** library, which delivers state-of-the-art performance on Intel hardware. 
- ## Intel Scalable Vector Search (SVS) -[Intel SVS](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, featuring: +[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and we are working on integrating it into various popular solutions to bring these optimizations to a wider audience. + +SVS features: - **Vamana Algorithm**: Graph-based approximate nearest neighbor search -- **Vector Compression**: LVQ and LeanVec for up to 16x memory reduction -- **AVX-512 Optimization**: Native acceleration on Intel Xeon processors -- **Streaming Support**: DynamicVamana for real-time data updates +- **Vector Compression**: LVQ and LeanVec for significant memory reduction +- **Hardware Optimization**: Best performance on servers with AVX-512 support -### Key Performance Benefits +## Understanding LVQ and LeanVec Compression -| Metric | Intel SVS Advantage | -|--------|---------------------| -| Throughput | Up to 13.5x vs. alternatives at billion scale | -| Memory | Up to 16x reduction with compression | -| Latency | Optimized for both batch and single queries | +Traditional vector compression methods face limitations in graph-based search. Product Quantization (PQ) requires keeping full-precision vectors for re-ranking, defeating compression benefits. Standard scalar quantization with global bounds fails to efficiently utilize available quantization levels. 
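The gap between global-bounds scalar quantization and per-vector adaptation is easy to demonstrate numerically. The sketch below illustrates the core idea only — it is not the SVS implementation, which additionally removes a global mean and uses SIMD-friendly layouts such as Turbo LVQ:

```python
import numpy as np

def quantize_global(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Scalar quantization with one (min, max) range shared by all vectors."""
    lo, hi = x.min(), x.max()
    levels = 2**bits - 1
    codes = np.round((x - lo) / (hi - lo) * levels)
    return codes / levels * (hi - lo) + lo          # dequantized reconstruction

def quantize_per_vector(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """LVQ-style: each vector gets its own (min, max), so its full code range is used."""
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    codes = np.round((x - lo) / (hi - lo) * levels)
    return codes / levels * (hi - lo) + lo

rng = np.random.default_rng(1)
# Vectors with very different scales: global bounds waste levels on the small ones.
x = rng.standard_normal((1000, 128)) * rng.uniform(0.1, 10.0, size=(1000, 1))
err_global = np.mean((x - quantize_global(x)) ** 2)
err_local = np.mean((x - quantize_per_vector(x)) ** 2)
assert err_local < err_global   # per-vector bounds reconstruct more accurately
```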
-## Available Guides +### LVQ (Locally-adaptive Vector Quantization) -| Software | Description | Guide | -|----------|-------------|-------| -| **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) | -| **FAISS** | Facebook AI Similarity Search with SVS indexes | Coming soon | +LVQ addresses these limitations by applying **per-vector normalization and scalar quantization**, adapting the quantization bounds individually for each vector. This local adaptation ensures efficient use of the available bit range, resulting in high-quality compressed representations. + +Key benefits: +- Minimal decompression overhead enables fast, on-the-fly distance computations +- Significantly reduces memory bandwidth and storage requirements +- Maintains high search accuracy and throughput +- SIMD-optimized layout ([Turbo LVQ](https://arxiv.org/abs/2402.02044)) for efficient distance computations + +LVQ achieves a **four-fold reduction** of vector size while maintaining search accuracy. A typical 768-dimensional float32 vector requiring 3072 bytes can be reduced to just a few hundred bytes. + +### LeanVec (LVQ with Dimensionality Reduction) -## Common Optimization Topics +[LeanVec](https://openreview.net/forum?id=wczqrpOrIc) builds on LVQ by first applying **linear dimensionality reduction**, then compressing the reduced vectors with LVQ. This two-step approach significantly cuts memory and compute costs, enabling faster similarity search and index construction with minimal accuracy loss—especially effective for high-dimensional deep learning embeddings. 
-### Hardware Recommendations +Best suited for: +- High-dimensional vectors (768+ dimensions) +- Text embeddings from large language models +- Cases where maximum memory savings are needed -For optimal vector search performance on Intel: +### Two-Level Compression -- **CPU**: 4th Gen Intel Xeon Scalable (Sapphire Rapids) or newer -- **Memory**: DDR5 for higher bandwidth -- **Storage**: NVMe SSD for large datasets +Both LVQ and LeanVec support two-level compression schemes: -### BIOS Settings +1. **Level 1**: Fast candidate retrieval using compressed vectors +2. **Level 2**: Re-ranking using residual encoding for accuracy -| Setting | Recommendation | Impact | -|---------|----------------|--------| -| Hyperthreading | Enabled | Up to 20% throughput | -| Sub-NUMA Clustering | SNC2/SNC4 | Up to 15% with pinning | -| Hardware Prefetcher | Enabled | 5-10% improvement | +The naming convention reflects bits per dimension at each level: +- `LVQ4x8`: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension) +- `LVQ8`: Single-level, 8 bits per dimension +- `LeanVec4x8`: Dimensionality reduction + 4-bit Level 1 + 8-bit Level 2 -### Vector Compression Selection +## Vector Compression Selection -``` -Dimensions < 512 → LVQ4x4 or LVQ4x8 -Dimensions ≥ 512 → LeanVec4x8 or LeanVec8x8 -Maximum savings → LVQ4 (may reduce recall) -``` +| Compression | Best For | Observations | +|-------------|----------|--------------| +| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search | +| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall | +| LVQ4 | Maximum memory saving | Recall might be insufficient | +| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 | +| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall | +| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings | + +**Rule of thumb:** +- 
Dimensions < 768 → Use LVQ (LVQ4x4, LVQ4x8, or LVQ8) +- Dimensions ≥ 768 → Use LeanVec (LeanVec4x8 or LeanVec8x8) + +## Available Guides + +| Software | Description | Guide | +|----------|-------------|-------| +| **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) | +| **FAISS** | Facebook AI Similarity Search with SVS indexes | Coming soon | ## References @@ -70,7 +87,4 @@ Maximum savings → LVQ4 (may reduce recall) - [SVS GitHub Repository](https://github.com/intel/ScalableVectorSearch) - [LVQ Paper (VLDB 2023)](https://www.vldb.org/pvldb/vol16/p2769-aguerrebere.pdf) - [LeanVec Paper (TMLR 2024)](https://openreview.net/forum?id=Y5Mvyusf1u) - -## Contributing - -We welcome contributions! If you have optimization tips for additional vector search solutions, please open a pull request. +- [Turbo LVQ Paper](https://arxiv.org/abs/2402.02044) diff --git a/software/similarity-search/redis/README.md b/software/similarity-search/redis/README.md index 3984e34..7ef4906 100644 --- a/software/similarity-search/redis/README.md +++ b/software/similarity-search/redis/README.md @@ -1,13 +1,10 @@ # Redis Vector Search Optimization Guide -This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index algorithm from Intel's Scalable Vector Search (SVS) library, optimized for Intel hardware. +This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index algorithm from Intel's Scalable Vector Search (SVS) library. 
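To build intuition for how a graph index such as Vamana answers a query, here is a toy best-first traversal in Python. This is a simplified sketch, not the SVS implementation; the `search_window_size` parameter merely mirrors the index parameter of the same name, and the toy 4-nearest-neighbor graph is built by brute force for illustration:

```python
import heapq

import numpy as np

def greedy_search(graph, vectors, entry, query, k, search_window_size):
    """Best-first traversal of a proximity graph (simplified Vamana-style search)."""
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))
    visited = {entry}
    frontier = [(dist(entry), entry)]    # min-heap: closest unexpanded node first
    window = [(-dist(entry), entry)]     # max-heap: current best candidates
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(window) >= search_window_size and d > -window[0][0]:
            break                        # frontier is worse than everything in the window
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
                heapq.heappush(window, (-dist(nb), nb))
                if len(window) > search_window_size:
                    heapq.heappop(window)  # evict the farthest candidate
    return [i for _, i in sorted((-d, i) for d, i in window)][:k]

# Toy dataset plus a crude 4-nearest-neighbor graph built by brute force.
rng = np.random.default_rng(0)
pts = rng.standard_normal((50, 8)).astype(np.float32)
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
graph = {i: [int(j) for j in np.argsort(d2[i])[1:5]] for i in range(50)}
nearest = greedy_search(graph, pts, entry=0, query=pts[7], k=3, search_window_size=10)
```

A larger `search_window_size` lets the traversal keep more candidates alive before stopping, which is exactly why the runtime parameter trades recall against latency.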
## Table of Contents - [Overview](#overview) -- [Hardware Recommendations](#hardware-recommendations) -- [BIOS Configuration](#bios-configuration) -- [Choosing the Right Index Type](#choosing-the-right-index-type) - [SVS-VAMANA Configuration](#svs-vamana-configuration) - [Vector Compression](#vector-compression) - [Performance Tuning](#performance-tuning) @@ -17,73 +14,14 @@ This guide describes best practices for optimizing vector similarity search perf ## Overview -Redis Query Engine supports three vector index types: +Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA. SVS-VAMANA combines the Vamana graph-based search algorithm with Intel's compression technologies (LVQ and LeanVec), delivering optimal performance on servers with AVX-512 support. -| Index Type | Use Case | Accuracy | Performance | -|------------|----------|----------|-------------| -| **FLAT** | Small datasets (<1M vectors) | Exact | Brute-force | -| **HNSW** | Large datasets, general use | Approximate | Good | -| **SVS-VAMANA** | Large datasets on Intel hardware | Approximate | Best on Intel | +**Key Benefits of SVS-VAMANA:** -**Why SVS-VAMANA on Intel?** - -- Optimized for AVX-512 instruction set on Intel Xeon processors -- Advanced compression (LVQ, LeanVec) reduces memory by up to 16x -- Higher throughput with lower latency compared to HNSW - -## Hardware Recommendations - -### Recommended Intel Xeon Configurations - -| Workload Size | CPU | Memory | Storage | -|---------------|-----|--------|---------| -| Small (<1M vectors) | 4th Gen Xeon, 16 cores | 64 GB DDR5 | NVMe SSD | -| Medium (1-10M vectors) | 4th Gen Xeon, 32 cores | 128 GB DDR5 | NVMe SSD | -| Large (10-100M vectors) | 4th/5th Gen Xeon, 64 cores | 256 GB DDR5 | NVMe SSD | -| X-Large (>100M vectors) | 5th Gen Xeon, 128+ cores | 512+ GB DDR5 | NVMe RAID | - -> **PerfTip:** 4th Gen Intel Xeon Scalable (Sapphire Rapids) and newer provide optimal AVX-512 performance for vector operations. 
- -### Key Hardware Features - -- **AVX-512**: Required for optimal SVS performance -- **AMX**: Additional acceleration on 4th/5th Gen Xeon -- **DDR5 Memory**: Higher bandwidth improves vector search throughput -- **Large L3 Cache**: Helps with graph traversal operations - -## BIOS Configuration - -| Parameter | Recommended Setting | Description | PerfTip | -|-----------|---------------------|-------------|---------| -| Hyperthreading (SMT) | Enabled | Two threads per core | Up to 20% | -| Sub-NUMA Clustering (SNC) | SNC2 or SNC4 | Better memory locality | Up to 15% | -| Hardware Prefetcher | Enabled | Improves cache utilization | 5-10% | -| Intel Turbo Boost | Enabled | Higher clock speeds | 10-15% | -| Power Profile | Performance | Maximum CPU frequency | Varies | - -## Choosing the Right Index Type - -``` - ┌─────────────────────────────────────┐ - │ Do you need exact results? │ - └─────────────────────────────────────┘ - │ - ┌───────────────┴───────────────┐ - ▼ ▼ - Yes No - │ │ - ▼ ▼ - Use FLAT ┌────────────────────┐ - │ Running on Intel? 
│ - └────────────────────┘ - │ - ┌───────────────┴───────────────┐ - ▼ ▼ - Yes No - │ │ - ▼ ▼ - Use SVS-VAMANA Use HNSW -``` +- **Memory Efficiency**: 26–37% total memory savings compared to HNSW, with 51–74% reduction in index memory +- **Higher Throughput**: Up to 144% higher QPS compared to HNSW on high-dimensional datasets +- **Lower Latency**: Up to 60% reduction in p50/p95 latencies under load +- **Maintained Accuracy**: Matches HNSW precision levels while delivering performance improvements ## SVS-VAMANA Configuration @@ -106,15 +44,15 @@ FT.CREATE my_index | Parameter | Description | Default | Tuning Guidance | |-----------|-------------|---------|-----------------| -| TYPE | Vector data type | - | FLOAT32 for accuracy, FLOAT16 for memory | +| TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory | | DIM | Vector dimensions | - | Must match your embeddings | | DISTANCE_METRIC | L2, IP, or COSINE | - | COSINE for normalized embeddings | | GRAPH_MAX_DEGREE | Max edges per node | 32 | Higher = better recall, more memory | | CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality | | SEARCH_WINDOW_SIZE | Query search window | 10 | Higher = better recall, slower | | COMPRESSION | LVQ/LeanVec type | none | See compression section | - -> **PerfTip:** Setting `GRAPH_MAX_DEGREE` to 64 instead of default 32 can improve recall by 2-5% with ~2x memory overhead for the graph structure. 
+| TRAINING_THRESHOLD | Vectors for learning compression | 10240 | Increase if recall is low | +| REDUCE | Target dimension for LeanVec | DIM/2 | Lower = faster search, may reduce recall | ## Vector Compression @@ -125,42 +63,22 @@ Intel SVS provides advanced compression techniques that reduce memory usage whil | Compression | Bits/Dim | Memory Reduction | Best For | |-------------|----------|------------------|----------| | None | 32 (FLOAT32) | 1x (baseline) | Maximum accuracy | -| LVQ8 | 8 | 4x | Fast ingestion | -| LVQ4x4 | 4+4 | 8x | Balanced | -| LVQ4x8 | 4+8 | ~6x | High recall with compression | -| LeanVec4x8 | Reduced dim + 4+8 | 8-16x | High-dimensional vectors (768+) | -| LeanVec8x8 | Reduced dim + 8+8 | 4-8x | Best recall with LeanVec | - -### Choosing Compression - -``` -Vector Dimensions < 512? - └─► Use LVQ4x4 or LVQ4x8 - -Vector Dimensions >= 512? - └─► Use LeanVec4x8 or LeanVec8x8 - -Need maximum memory savings? - └─► Use LVQ4 (single-level) - -Need highest recall with compression? - └─► Use LVQ4x8 or LeanVec8x8 -``` - -> **PerfTip:** LeanVec4x8 with 768-dimensional vectors (common for text embeddings) can reduce memory by 10x while maintaining 95%+ recall. - -### Two-Level Compression - -LVQ and LeanVec support two-level compression: - -1. **Level 1**: Fast candidate retrieval using compressed vectors -2. **Level 2**: Re-ranking using residual encoding for accuracy +| LVQ8 | 8 | ~4x | Fast ingestion, good balance | +| LVQ4x4 | 4+4 | ~4x | Fast search, dimensions < 768 | +| LVQ4x8 | 4+8 | ~3x | High recall with compression | +| LeanVec4x8 | Reduced + 4+8 | ~3x | High-dimensional vectors (768+) | +| LeanVec8x8 | Reduced + 8+8 | ~2.5x | Best recall with LeanVec | -Example: `LVQ4x8` uses 4 bits for Level 1 and 8 bits for Level 2. 
+### Choosing Compression by Use Case -### Compression Training +| Embedding Category | Example Embeddings | Compression Strategy | +|--------------------|-------------------|---------------------| +| Text Embeddings | Cohere embed-v3 (1024), OpenAI ada-002 (1536) | LeanVec4x8 | +| Image Embeddings | ResNet-152 (2048), ViT (768+) | LeanVec4x8 | +| Multimodal | CLIP ViT-B/32 (512) | LVQ8 | +| Lower Dimensional | Custom embeddings (<768) | LVQ4x4 or LVQ4x8 | -Compression parameters are learned from data. Use `TRAINING_THRESHOLD` to control the sample size: +### Example with LeanVec Compression ```bash FT.CREATE my_index @@ -168,19 +86,19 @@ FT.CREATE my_index PREFIX 1 doc: SCHEMA embedding VECTOR SVS-VAMANA 14 TYPE FLOAT32 - DIM 768 + DIM 1536 DISTANCE_METRIC COSINE COMPRESSION LeanVec4x8 + REDUCE 384 TRAINING_THRESHOLD 20000 - REDUCE 192 ``` -> **Note:** If recall is low, increase `TRAINING_THRESHOLD`. The default is 10 * 1024 = 10,240 vectors. - ## Performance Tuning ### Runtime Query Parameters +Adjust search parameters at query time for precision/performance trade-offs: + ```bash FT.SEARCH my_index "*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]" @@ -194,106 +112,83 @@ FT.SEARCH my_index | EPSILON | Larger = wider range search | Higher latency | | SEARCH_BUFFER_CAPACITY | More candidates for re-ranking | Higher latency | -### OS-Level Tuning - -```bash -# Enable huge pages for better memory performance -echo 'vm.nr_hugepages = 1024' >> /etc/sysctl.conf -sysctl -p - -# Set CPU governor to performance -cpupower frequency-set -g performance - -# Disable transparent huge pages (if not using explicitly) -echo never > /sys/kernel/mm/transparent_hugepage/enabled -``` - -> **PerfTip:** Using 2MB huge pages can improve vector search throughput by 5-10%. 
- ### Redis Configuration ``` # redis.conf optimizations for vector workloads -# Increase memory limit for large vector datasets -maxmemory 200gb - # Use multiple I/O threads for better throughput io-threads 4 io-threads-do-reads yes - -# Disable persistence for pure search workloads (if acceptable) -save "" -appendonly no ``` ## Benchmarks -### Redis Query Engine Performance +Based on [Redis and Intel benchmarking](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/), SVS-VAMANA delivers significant improvements over HNSW: + +### Memory Savings -Based on [Redis benchmarks](https://redis.io/blog/benchmarking-results-for-vector-databases/), Redis significantly outperforms competitors: +SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across datasets: -| Comparison | Redis Advantage | -|------------|-----------------| -| vs. Qdrant | Up to 3.4x higher QPS | -| vs. Milvus | Up to 3.3x higher QPS | -| vs. Weaviate | Up to 1.7x higher QPS | -| vs. PostgreSQL (pgvector) | Up to 9.5x higher QPS | -| vs. MongoDB Atlas | Up to 11x higher QPS | -| vs. OpenSearch | Up to 53x higher QPS | +| Dataset | Dimensions | Total Memory Reduction | Index Memory Reduction | +|---------|------------|----------------------|----------------------| +| LAION | 512 | 26% | 51% | +| Cohere | 768 | 35% | 70% | +| DBpedia | 1536 | 37% | 74% | -### SVS Performance on Intel +### Throughput Improvements (FP32) -Intel SVS benchmarks show significant improvements over alternatives: +At 0.95+ precision, compared to HNSW: -| Dataset | SVS QPS | vs. 
HNSW | -|---------|---------|----------| -| deep-96-1B | 95,931 | 7.0x faster | -| rqa-768-10M | 23,296 | 8.1x faster | -| deep-96-100M | 140,505 | 4.5x faster | +| Dataset | Dimensions | QPS Improvement | +|---------|------------|-----------------| +| Cohere | 768 | Up to 144% higher | +| DBpedia | 1536 | Up to 60% higher | +| LAION | 512 | 0-15% (marginal) | -*Source: [Intel SVS Benchmarks](https://intel.github.io/ScalableVectorSearch/benchs/static/latest.html)* +SVS-VAMANA is most effective for medium-to-high dimensional embeddings (768–3072 dimensions). -### Memory Savings with Compression +### Latency Improvements (FP32, High Concurrency) -| Configuration | Memory per 1M Vectors (768-dim) | Recall@10 | -|---------------|--------------------------------|-----------| -| FLOAT32 (no compression) | ~3 GB | 100% | -| LVQ4x8 | ~500 MB | ~98% | -| LeanVec4x8 (reduce=192) | ~300 MB | ~95% | +| Dataset | p50 Latency Reduction | p95 Latency Reduction | +|---------|----------------------|----------------------| +| Cohere (768d) | 60% | 57% | +| DBpedia (1536d) | 46% | 36% | + +### Precision vs. Performance + +At every precision point from ~0.92 to 0.99, SVS-VAMANA matches HNSW accuracy while delivering higher throughput. At high precision (0.99), SVS-VAMANA sustains up to 1.5x better throughput. + +### Ingestion Trade-offs + +SVS-VAMANA index construction is slower than HNSW due to compression overhead. On x86 platforms: +- LeanVec: Can be up to 25% faster or 33% slower than HNSW depending on dataset +- LVQ: Up to 2.6x slower than HNSW + +This trade-off is acceptable for workloads where query performance and memory efficiency are priorities. ## FAQ ### Q: When should I use SVS-VAMANA vs HNSW? 
**A:** Use SVS-VAMANA when: -- Running on Intel Xeon processors (4th Gen+) -- Memory efficiency is important -- You need maximum throughput on Intel hardware +- Running on Intel Xeon processors with AVX-512 +- Memory efficiency is important (26-37% savings) +- You have medium-to-high dimensional vectors (768+) +- Query throughput and latency are priorities Use HNSW when: -- Running on non-Intel hardware -- You need a well-established, widely-tested algorithm -- Compatibility with Redis Open Source without Intel optimizations +- Running on ARM platforms (HNSW performs well on ARM) +- You need faster index construction +- Working with lower-dimensional vectors (<512) ### Q: Are LVQ and LeanVec available in Redis Open Source? -**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization is available in Redis Open Source. However, Intel's proprietary LVQ and LeanVec optimizations require: -- Intel hardware -- Redis Software (commercial) or RSALv2 license -- Building with `BUILD_INTEL_SVS_OPT=yes` - -### Q: How do I migrate from HNSW to SVS-VAMANA? +**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's proprietary LVQ and LeanVec optimizations require: +- Intel hardware with AVX-512 +- Redis Software (commercial) or building with `BUILD_INTEL_SVS_OPT=yes` -**A:** Create a new index with SVS-VAMANA and reindex your data: - -```bash -# Create new SVS-VAMANA index -FT.CREATE new_index ON HASH PREFIX 1 doc: SCHEMA embedding VECTOR SVS-VAMANA 8 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE COMPRESSION LVQ4x8 - -# Reindex data (use your application or Redis CLI) -# The data format is identical, only the index type changes -``` +On non-Intel platforms (AMD, ARM), SVS-VAMANA falls back to SQ8 compression. ### Q: What if recall is too low with compression? @@ -302,11 +197,19 @@ FT.CREATE new_index ON HASH PREFIX 1 doc: SCHEMA embedding VECTOR SVS-VAMANA 8 T 2. 
Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8) 3. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128) 4. Increase `SEARCH_WINDOW_SIZE` at query time +5. For LeanVec, try a larger `REDUCE` value (closer to original dimensions) + +### Q: How does performance compare across CPU vendors? + +**A:** Based on benchmarks: +- **Intel**: Best performance with LVQ and LeanVec optimizations +- **AMD**: Strong performance with SQ8 fallback, comparable to Intel in many cases +- **ARM**: HNSW is recommended; SVS-VAMANA SQ8 fallback has slower ingestion on ARM ## References - [Redis Vector Search Documentation](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/) - [SVS-VAMANA Index Reference](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#svs-vamana-index) - [Vector Compression Guide](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/) +- [Tech Dive: Comprehensive Compression](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/) - [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/) -- [Redis Benchmarking Results](https://redis.io/blog/benchmarking-results-for-vector-databases/) From cee7036d63fac1fece631709570a10fafdb37500 Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Tue, 10 Feb 2026 00:56:28 +0000 Subject: [PATCH 3/9] Add FAISS SVS integration documentation - Add FAISS optimization guide with SVS index types - Cover IndexSVSVamana, IndexSVSVamanaLVQ, IndexSVSVamanaLeanVec - Include factory string format and examples - Add compression selection guide - Include benchmark data from Intel SVS - Update overview and main README --- README.md | 1 + software/similarity-search/README.md | 2 +- software/similarity-search/faiss/README.md | 280 +++++++++++++++++++++ 3 files changed, 282 insertions(+), 1 deletion(-) create mode 100644 software/similarity-search/faiss/README.md diff --git a/README.md b/README.md 
index 9ee9f41..112f62e 100644 --- a/README.md +++ b/README.md @@ -36,6 +36,7 @@ We aim to provide a dynamic resource where users can find the latest optimizatio - [Java](software/java/README.md) - [Similarity Search](software/similarity-search/README.md) - [Redis](software/similarity-search/redis/README.md) + - [FAISS](software/similarity-search/faiss/README.md) - [Spark](software/spark/README.md) - [MySQL & PostgreSQL](software/mysql-postgresql/README.md) - Workloads diff --git a/software/similarity-search/README.md b/software/similarity-search/README.md index f10f9e7..88a1912 100644 --- a/software/similarity-search/README.md +++ b/software/similarity-search/README.md @@ -79,7 +79,7 @@ The naming convention reflects bits per dimension at each level: | Software | Description | Guide | |----------|-------------|-------| | **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) | -| **FAISS** | Facebook AI Similarity Search with SVS indexes | Coming soon | +| **FAISS** | Facebook AI Similarity Search with SVS indexes | [FAISS Guide](faiss/README.md) | ## References diff --git a/software/similarity-search/faiss/README.md b/software/similarity-search/faiss/README.md new file mode 100644 index 0000000..bf85921 --- /dev/null +++ b/software/similarity-search/faiss/README.md @@ -0,0 +1,280 @@ +# FAISS Vector Search Optimization Guide + +This guide describes best practices for optimizing vector similarity search performance in FAISS on Intel Xeon processors. FAISS includes native support for Intel SVS indexes (IndexSVSVamana, IndexSVSVamanaLVQ, IndexSVSVamanaLeanVec), providing optimized performance on Intel hardware. 
+ +## Table of Contents + +- [Overview](#overview) +- [SVS Index Types in FAISS](#svs-index-types-in-faiss) +- [Installation](#installation) +- [Creating SVS Indexes](#creating-svs-indexes) +- [Vector Compression](#vector-compression) +- [Performance Tuning](#performance-tuning) +- [Benchmarks](#benchmarks) +- [FAQ](#faq) +- [References](#references) + +## Overview + +FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. Starting with recent versions, FAISS includes native integration with Intel's Scalable Vector Search (SVS) library. + +**SVS Index Types in FAISS:** + +| Index Type | Description | Compression | +|------------|-------------|-------------| +| IndexSVSVamana | Base SVS graph-based index | None (full precision) | +| IndexSVSVamanaLVQ | SVS with LVQ compression | LVQ (4-8 bits per dimension) | +| IndexSVSVamanaLeanVec | SVS with LeanVec compression | Dimensionality reduction + LVQ | + +**Key Benefits:** + +- Up to 13.5x higher throughput compared to HNSW at billion scale +- 51-74% memory reduction with compression +- Optimized for servers with AVX-512 support + +## SVS Index Types in FAISS + +### IndexSVSVamana + +The base SVS index using the Vamana graph algorithm without compression. Best for maximum accuracy when memory is not a constraint. + +### IndexSVSVamanaLVQ + +Combines Vamana with Locally-adaptive Vector Quantization (LVQ). LVQ applies per-vector normalization and scalar quantization, achieving up to 4x memory reduction while maintaining high accuracy. + +### IndexSVSVamanaLeanVec + +Extends LVQ with dimensionality reduction. Best for high-dimensional vectors (768+ dimensions), achieving up to 8-16x memory reduction. Particularly effective for text embeddings from large language models. 
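As a back-of-the-envelope illustration of where those reduction factors come from, per-vector storage is roughly dimensions × bits-per-dimension / 8. The sketch below deliberately ignores graph edges, metadata, and LeanVec's secondary full-dimensionality codes used for re-ranking — which is why the headline 8-16x figure quoted above is smaller than the primary-representation ratio computed here:

```python
def vector_bytes(dims: int, bits_per_dim: float) -> float:
    """Approximate per-vector payload in bytes; ignores graph edges,
    metadata, and re-ranking residuals (back-of-the-envelope only)."""
    return dims * bits_per_dim / 8

fp32 = vector_bytes(768, 32)  # float32 baseline
lvq8 = vector_bytes(768, 8)   # LVQ: 8 bits per dimension
lean = vector_bytes(192, 4)   # LeanVec primary: 4 bits after 768 -> 192 reduction

print(fp32, fp32 / lvq8, fp32 / lean)  # 3072.0 4.0 32.0
```

Real deployments should measure actual index memory, since graph degree and re-ranking codes dominate once vectors are heavily compressed.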
+
+## Installation
+
+### From PyPI (with Intel optimizations)
+
+```bash
+pip install faiss-cpu
+```
+
+### From Conda (Intel channel)
+
+```bash
+conda install -c conda-forge faiss-cpu
+```
+
+### Building with SVS Support
+
+To enable Intel SVS optimizations when building from source:
+
+```bash
+git clone https://github.com/facebookresearch/faiss.git
+cd faiss
+cmake -B build \
+    -DFAISS_ENABLE_SVS=ON \
+    -DCMAKE_BUILD_TYPE=Release \
+    .
+cmake --build build -j
+```
+
+## Creating SVS Indexes
+
+### Using Factory String
+
+FAISS provides a factory string format for creating SVS indexes:
+
+```
+SVSVamana<graph_max_degree>[,<compression>[_<reduced_dim>]]
+```
+
+**Examples:**
+
+```python
+import faiss
+
+# Basic SVS index with graph degree 32
+index = faiss.index_factory(768, "SVSVamana32")
+
+# SVS with LVQ8 compression
+index = faiss.index_factory(768, "SVSVamana32,LVQ8")
+
+# SVS with LVQ4x8 two-level compression
+index = faiss.index_factory(768, "SVSVamana64,LVQ4x8")
+
+# SVS with LeanVec (dimensionality reduction to 128 dims)
+index = faiss.index_factory(768, "SVSVamana32,LeanVec4x8_128")
+```
+
+### Direct Index Creation
+
+```python
+import faiss
+import numpy as np
+
+# Sample data
+d = 768     # dimension
+n = 100000  # number of vectors
+xb = np.random.random((n, d)).astype('float32')
+
+# Create SVS index with LVQ compression
+index = faiss.IndexSVSVamanaLVQ(
+    d,   # dimension
+    32,  # graph_max_degree
+    8,   # primary bits (LVQ8)
+    0    # residual bits (0 = single level)
+)
+
+# Build the index
+index.train(xb)
+index.add(xb)
+
+# Search
+k = 10  # number of neighbors
+xq = np.random.random((5, d)).astype('float32')
+D, I = index.search(xq, k)
+```
+
+### Index Parameters
+
+| Parameter | Description | Default | Guidance |
+|-----------|-------------|---------|----------|
+| dimension | Vector dimensions | - | Must match your embeddings |
+| graph_max_degree | Max edges per node | 32 | Higher = better recall, more memory |
+| construction_window_size | Build search window | 200 | Higher = better graph quality |
+| search_window_size | Query search window | 10 | Higher = better recall |
+
+## Vector Compression
+
+### Compression Options
+
+| Compression | Factory String | Memory Reduction | Best For |
+|-------------|----------------|------------------|----------|
+| None | `SVSVamana32` | 1x | Maximum accuracy |
+| LVQ8 | `SVSVamana32,LVQ8` | ~4x | Good balance |
+| LVQ4x8 | `SVSVamana32,LVQ4x8` | ~3x | High recall with compression |
+| LeanVec4x8 | `SVSVamana32,LeanVec4x8_128` | 8-16x | High-dimensional vectors |
+| LeanVec8x8 | `SVSVamana32,LeanVec8x8_256` | 4-8x | Best recall with LeanVec |
+
+### Choosing Compression
+
+**Rule of thumb:**
+- Dimensions < 768: Use LVQ (LVQ8 or LVQ4x8)
+- Dimensions ≥ 768: Use LeanVec (LeanVec4x8 or LeanVec8x8)
+- Maximum memory savings: LeanVec with aggressive dimension reduction
+
+### LeanVec Dimension Selection
+
+The `_<dim>` suffix in LeanVec specifies the reduced dimension:
+
+```python
+# Original: 768 dims, reduced to 192 (768/4)
+index = faiss.index_factory(768, "SVSVamana32,LeanVec4x8_192")
+
+# Original: 1536 dims, reduced to 384 (1536/4)
+index = faiss.index_factory(1536, "SVSVamana32,LeanVec4x8_384")
+```
+
+Lower reduced dimensions = faster search and less memory, but may reduce recall.
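For readability, factory strings like the ones above can be composed with a tiny helper. The string format is inferred from the examples in this guide, so verify it against the FAISS version you build:

```python
def svs_factory_string(degree, compression=None, reduced_dim=None):
    """Build an SVS factory string of the form shown above (format assumed
    from this guide's examples; verify against your FAISS version)."""
    s = f"SVSVamana{degree}"
    if compression is not None:
        s += f",{compression}"
        if reduced_dim is not None:  # LeanVec-only reduced-dimension suffix
            s += f"_{reduced_dim}"
    return s

print(svs_factory_string(32))                     # SVSVamana32
print(svs_factory_string(64, "LVQ4x8"))           # SVSVamana64,LVQ4x8
print(svs_factory_string(32, "LeanVec4x8", 192))  # SVSVamana32,LeanVec4x8_192
```

Centralizing string construction this way keeps the graph degree and compression settings in one place when sweeping parameters.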
+ +## Performance Tuning + +### Search Parameters + +```python +# Set search window size (higher = better recall, slower) +index.search_window_size = 50 + +# Perform search +D, I = index.search(queries, k) +``` + +### Multi-threaded Search + +```python +import faiss + +# Set number of threads for search +faiss.omp_set_num_threads(16) + +# Search will use multiple threads +D, I = index.search(queries, k) +``` + +### Index Save/Load + +```python +# Save index +faiss.write_index(index, "my_index.faiss") + +# Load index +index = faiss.read_index("my_index.faiss") +``` + +## Benchmarks + +Based on Intel SVS benchmarks, SVS indexes in FAISS deliver significant performance improvements: + +### Throughput Comparison (Queries Per Second) + +| Dataset | Dimensions | Size | SVS QPS | vs. HNSW | +|---------|------------|------|---------|----------| +| deep-96 | 96 | 1B | 95,931 | 7.0x faster | +| deep-96 | 96 | 100M | 140,505 | 4.5x faster | +| rqa-768 | 768 | 10M | 23,296 | 8.1x faster | +| open-images | 512 | 13M | 79,507 | 3.3x faster | + +*Source: [Intel SVS Benchmarks](https://intel.github.io/ScalableVectorSearch/benchs/static/latest.html)* + +### Memory Savings with Compression + +| Configuration | Memory per 1M Vectors (768-dim) | +|---------------|--------------------------------| +| float32 (no compression) | ~3 GB | +| LVQ8 | ~750 MB | +| LVQ4x8 | ~500 MB | +| LeanVec4x8 (reduce to 192) | ~300 MB | + +### Hardware Recommendations + +Best performance is achieved on servers with AVX-512 support: +- 4th Gen Intel Xeon Scalable (Sapphire Rapids) or newer +- On non-Intel platforms, SVS falls back to basic 8-bit scalar quantization + +## FAQ + +### Q: How do I check if SVS is available in my FAISS installation? + +```python +import faiss +print(hasattr(faiss, 'IndexSVSVamana')) # True if SVS is available +``` + +### Q: Can I convert an existing FAISS index to SVS? + +**A:** No direct conversion is available. You need to rebuild the index using SVS index types. 
Extract your vectors and create a new SVS index. + +### Q: What happens on non-Intel hardware? + +**A:** On AMD or ARM platforms, SVS uses a fallback 8-bit scalar quantization (SQ8) instead of LVQ/LeanVec. Performance will be good but not as optimized as on Intel with AVX-512. + +### Q: How does IndexSVSVamana compare to IndexHNSW? + +**A:** Both are graph-based approximate nearest neighbor indexes. SVS typically offers: +- Higher throughput (up to 8x on some datasets) +- Better memory efficiency with compression +- Optimized performance on Intel hardware + +Use HNSW if you need broader hardware compatibility or are on ARM. + +### Q: What if recall is too low with compression? + +**A:** Try these adjustments: +1. Increase `search_window_size` (e.g., 50 or 100) +2. Use higher-bit compression (LVQ4x8 → LVQ8) +3. For LeanVec, increase the reduced dimension +4. Increase `graph_max_degree` when building + +## References + +- [FAISS GitHub Repository](https://github.com/facebookresearch/faiss) +- [FAISS SVS Integration Wiki](https://github.com/facebookresearch/faiss/wiki/CPU-Faiss---Intel-SVS-%E2%80%90-Overview) +- [FAISS SVS Usage Guide](https://github.com/facebookresearch/faiss/wiki/CPU-Faiss---Intel-SVS-%E2%80%90-Usage) +- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/) +- [Intel SVS Benchmarks](https://intel.github.io/ScalableVectorSearch/benchs/static/latest.html) From c9c3d4603c4dfe8eca9f5c00ca9b22abae02c9be Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Tue, 10 Feb 2026 01:07:49 +0000 Subject: [PATCH 4/9] Remove unverified benchmarks from FAISS guide - Remove benchmark section (no official FAISS SVS performance data) - Update overview to remove specific performance claims - Clarify Intel-only requirement in FAQ - Keep technical how-to guidance --- software/similarity-search/faiss/README.md | 39 +++------------------- 1 file changed, 4 insertions(+), 35 deletions(-) diff --git a/software/similarity-search/faiss/README.md 
b/software/similarity-search/faiss/README.md index bf85921..d12443d 100644 --- a/software/similarity-search/faiss/README.md +++ b/software/similarity-search/faiss/README.md @@ -10,7 +10,6 @@ This guide describes best practices for optimizing vector similarity search perf - [Creating SVS Indexes](#creating-svs-indexes) - [Vector Compression](#vector-compression) - [Performance Tuning](#performance-tuning) -- [Benchmarks](#benchmarks) - [FAQ](#faq) - [References](#references) @@ -28,9 +27,9 @@ FAISS (Facebook AI Similarity Search) is a library for efficient similarity sear **Key Benefits:** -- Up to 13.5x higher throughput compared to HNSW at billion scale -- 51-74% memory reduction with compression -- Optimized for servers with AVX-512 support +- High-performance graph-based similarity search optimized for Intel CPUs +- Significant memory reduction with LVQ and LeanVec compression +- Best performance on Intel Xeon with AVX-512 support ## SVS Index Types in FAISS @@ -207,36 +206,6 @@ faiss.write_index(index, "my_index.faiss") index = faiss.read_index("my_index.faiss") ``` -## Benchmarks - -Based on Intel SVS benchmarks, SVS indexes in FAISS deliver significant performance improvements: - -### Throughput Comparison (Queries Per Second) - -| Dataset | Dimensions | Size | SVS QPS | vs. 
HNSW | -|---------|------------|------|---------|----------| -| deep-96 | 96 | 1B | 95,931 | 7.0x faster | -| deep-96 | 96 | 100M | 140,505 | 4.5x faster | -| rqa-768 | 768 | 10M | 23,296 | 8.1x faster | -| open-images | 512 | 13M | 79,507 | 3.3x faster | - -*Source: [Intel SVS Benchmarks](https://intel.github.io/ScalableVectorSearch/benchs/static/latest.html)* - -### Memory Savings with Compression - -| Configuration | Memory per 1M Vectors (768-dim) | -|---------------|--------------------------------| -| float32 (no compression) | ~3 GB | -| LVQ8 | ~750 MB | -| LVQ4x8 | ~500 MB | -| LeanVec4x8 (reduce to 192) | ~300 MB | - -### Hardware Recommendations - -Best performance is achieved on servers with AVX-512 support: -- 4th Gen Intel Xeon Scalable (Sapphire Rapids) or newer -- On non-Intel platforms, SVS falls back to basic 8-bit scalar quantization - ## FAQ ### Q: How do I check if SVS is available in my FAISS installation? @@ -252,7 +221,7 @@ print(hasattr(faiss, 'IndexSVSVamana')) # True if SVS is available ### Q: What happens on non-Intel hardware? -**A:** On AMD or ARM platforms, SVS uses a fallback 8-bit scalar quantization (SQ8) instead of LVQ/LeanVec. Performance will be good but not as optimized as on Intel with AVX-512. +**A:** SVS indexes are designed for Intel CPUs. On Intel platforms without AVX-512, performance is still good but not optimal. On non-Intel platforms (AMD, ARM), consider using standard FAISS indexes like IndexHNSW. ### Q: How does IndexSVSVamana compare to IndexHNSW? 
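Both index families answer queries with a greedy best-first walk over a proximity graph, which is why `search_window_size` (SVS) and `efSearch` (HNSW) trade recall for latency in the same way. The toy sketch below is plain Python over squared Euclidean distance — not the SVS or FAISS implementation — but it shows the mechanism: a larger candidate window explores more of the graph before stopping.

```python
import heapq

def greedy_search(graph, vectors, query, entry, window_size=4, k=2):
    """Toy best-first search over a proximity graph (illustrative only;
    real SVS/Vamana search is far more sophisticated)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    visited = {entry}
    candidates = [(dist(vectors[entry], query), entry)]  # min-heap of frontier
    best = list(candidates)                              # current result window
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= window_size and d > best[-1][0]:
            break  # nothing closer than the window's worst entry remains
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                nd = dist(vectors[nb], query)
                heapq.heappush(candidates, (nd, nb))
                best = sorted(best + [(nd, nb)])[:window_size]
    return [n for _, n in best[:k]]

# Five 1-D points on a chain graph; the query sits between points 3 and 4
vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(graph, vectors, query=(3.2,), entry=0))  # [3, 4]
```

Shrinking `window_size` makes the walk terminate earlier, which is exactly the recall-for-latency trade-off the tuning sections above describe.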
From 035891e876b808fdec6292882b54c3c102458413 Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Tue, 10 Feb 2026 01:16:34 +0000 Subject: [PATCH 5/9] Address CodeRabbit review comments - Add licensing warning for LVQ/LeanVec (AGPLv3/SSPLv1 incompatible) - Add language spec to FAISS factory string code block --- software/similarity-search/faiss/README.md | 2 +- software/similarity-search/redis/README.md | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/software/similarity-search/faiss/README.md b/software/similarity-search/faiss/README.md index d12443d..085781c 100644 --- a/software/similarity-search/faiss/README.md +++ b/software/similarity-search/faiss/README.md @@ -79,7 +79,7 @@ cmake --build build -j FAISS provides a factory string format for creating SVS indexes: -``` +```text SVSVamana[,[_]] ``` diff --git a/software/similarity-search/redis/README.md b/software/similarity-search/redis/README.md index 7ef4906..8a9ad1b 100644 --- a/software/similarity-search/redis/README.md +++ b/software/similarity-search/redis/README.md @@ -188,6 +188,8 @@ Use HNSW when: - Intel hardware with AVX-512 - Redis Software (commercial) or building with `BUILD_INTEL_SVS_OPT=yes` +**⚠️ Licensing Note:** If you use Redis Open Source under AGPLv3 or SSPLv1, you cannot use Intel's proprietary LVQ/LeanVec binaries—the Intel SVS license is incompatible with those licenses. LVQ and LeanVec optimizations are only available when Redis Open Source is distributed under RSALv2. See [Redis SVS compression docs](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/) for details. + On non-Intel platforms (AMD, ARM), SVS-VAMANA falls back to SQ8 compression. ### Q: What if recall is too low with compression? 
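When applying the recall remedies listed in the FAQ above, it helps to measure recall directly rather than eyeball result quality. Recall@k is simply the overlap between the approximate top-k and the exact (brute-force) top-k; a minimal sketch:

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors recovered by the approximate search."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# e.g. the compressed index recovered 8 of the 10 true neighbors
approx = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]
exact  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(recall_at_k(approx, exact))  # 0.8
```

Compute the exact neighbors once with a brute-force (FLAT) search over a query sample, then re-run the sample after each parameter change to see whether `TRAINING_THRESHOLD`, compression level, or window-size adjustments actually moved recall.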
From 60e74d5ea3c5d7f8bb58cab8afa11d7f9b768744 Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Tue, 10 Feb 2026 01:33:28 +0000 Subject: [PATCH 6/9] Remove license note, improve cross-platform messaging --- software/similarity-search/redis/README.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/software/similarity-search/redis/README.md b/software/similarity-search/redis/README.md index 8a9ad1b..7ca45ce 100644 --- a/software/similarity-search/redis/README.md +++ b/software/similarity-search/redis/README.md @@ -184,13 +184,11 @@ Use HNSW when: ### Q: Are LVQ and LeanVec available in Redis Open Source? -**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's proprietary LVQ and LeanVec optimizations require: +**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's LVQ and LeanVec optimizations require: - Intel hardware with AVX-512 - Redis Software (commercial) or building with `BUILD_INTEL_SVS_OPT=yes` -**⚠️ Licensing Note:** If you use Redis Open Source under AGPLv3 or SSPLv1, you cannot use Intel's proprietary LVQ/LeanVec binaries—the Intel SVS license is incompatible with those licenses. LVQ and LeanVec optimizations are only available when Redis Open Source is distributed under RSALv2. See [Redis SVS compression docs](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/) for details. - -On non-Intel platforms (AMD, ARM), SVS-VAMANA falls back to SQ8 compression. +On non-Intel platforms (AMD, ARM), SVS-VAMANA automatically falls back to SQ8 compression—no code changes required. ### Q: What if recall is too low with compression? @@ -201,12 +199,15 @@ On non-Intel platforms (AMD, ARM), SVS-VAMANA falls back to SQ8 compression. 4. Increase `SEARCH_WINDOW_SIZE` at query time 5. 
For LeanVec, try a larger `REDUCE` value (closer to original dimensions) -### Q: How does performance compare across CPU vendors? +### Q: Does SVS-VAMANA work on non-Intel hardware? + +**A:** Yes! The API is unified and SVS-VAMANA runs on any x86 or ARM platform—no code changes needed. The library automatically selects the best available implementation: + +- **Intel (AVX-512)**: Full LVQ/LeanVec optimizations for maximum performance +- **AMD/Other x86**: SQ8 fallback implementation, which benchmarks show is also quite fast—often comparable performance +- **ARM**: SQ8 fallback works; however, HNSW may be preferable due to slower SVS ingestion on ARM -**A:** Based on benchmarks: -- **Intel**: Best performance with LVQ and LeanVec optimizations -- **AMD**: Strong performance with SQ8 fallback, comparable to Intel in many cases -- **ARM**: HNSW is recommended; SVS-VAMANA SQ8 fallback has slower ingestion on ARM +Your application code stays the same regardless of hardware. Ideal performance is achieved on Intel Xeon with AVX-512, but you can deploy and test on any platform without modification. 
## References From 83b9b97076c3b956eb7deed570cac3a248fd3a37 Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Wed, 11 Feb 2026 21:15:54 +0000 Subject: [PATCH 7/9] Remove FAISS guide from PR, will add in separate PR after SVS release --- software/similarity-search/README.md | 1 - software/similarity-search/faiss/README.md | 249 --------------------- 2 files changed, 250 deletions(-) delete mode 100644 software/similarity-search/faiss/README.md diff --git a/software/similarity-search/README.md b/software/similarity-search/README.md index 88a1912..0eff29d 100644 --- a/software/similarity-search/README.md +++ b/software/similarity-search/README.md @@ -79,7 +79,6 @@ The naming convention reflects bits per dimension at each level: | Software | Description | Guide | |----------|-------------|-------| | **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) | -| **FAISS** | Facebook AI Similarity Search with SVS indexes | [FAISS Guide](faiss/README.md) | ## References diff --git a/software/similarity-search/faiss/README.md b/software/similarity-search/faiss/README.md deleted file mode 100644 index 085781c..0000000 --- a/software/similarity-search/faiss/README.md +++ /dev/null @@ -1,249 +0,0 @@ -# FAISS Vector Search Optimization Guide - -This guide describes best practices for optimizing vector similarity search performance in FAISS on Intel Xeon processors. FAISS includes native support for Intel SVS indexes (IndexSVSVamana, IndexSVSVamanaLVQ, IndexSVSVamanaLeanVec), providing optimized performance on Intel hardware. 
- -## Table of Contents - -- [Overview](#overview) -- [SVS Index Types in FAISS](#svs-index-types-in-faiss) -- [Installation](#installation) -- [Creating SVS Indexes](#creating-svs-indexes) -- [Vector Compression](#vector-compression) -- [Performance Tuning](#performance-tuning) -- [FAQ](#faq) -- [References](#references) - -## Overview - -FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. Starting with recent versions, FAISS includes native integration with Intel's Scalable Vector Search (SVS) library. - -**SVS Index Types in FAISS:** - -| Index Type | Description | Compression | -|------------|-------------|-------------| -| IndexSVSVamana | Base SVS graph-based index | None (full precision) | -| IndexSVSVamanaLVQ | SVS with LVQ compression | LVQ (4-8 bits per dimension) | -| IndexSVSVamanaLeanVec | SVS with LeanVec compression | Dimensionality reduction + LVQ | - -**Key Benefits:** - -- High-performance graph-based similarity search optimized for Intel CPUs -- Significant memory reduction with LVQ and LeanVec compression -- Best performance on Intel Xeon with AVX-512 support - -## SVS Index Types in FAISS - -### IndexSVSVamana - -The base SVS index using the Vamana graph algorithm without compression. Best for maximum accuracy when memory is not a constraint. - -### IndexSVSVamanaLVQ - -Combines Vamana with Locally-adaptive Vector Quantization (LVQ). LVQ applies per-vector normalization and scalar quantization, achieving up to 4x memory reduction while maintaining high accuracy. - -### IndexSVSVamanaLeanVec - -Extends LVQ with dimensionality reduction. Best for high-dimensional vectors (768+ dimensions), achieving up to 8-16x memory reduction. Particularly effective for text embeddings from large language models. 
- -## Installation - -### From PyPI (with Intel optimizations) - -```bash -pip install faiss-cpu -``` - -### From Conda (Intel channel) - -```bash -conda install -c conda-forge faiss-cpu -``` - -### Building with SVS Support - -To enable Intel SVS optimizations when building from source: - -```bash -git clone https://github.com/facebookresearch/faiss.git -cd faiss -cmake -B build \ - -DFAISS_ENABLE_SVS=ON \ - -DCMAKE_BUILD_TYPE=Release \ - . -cmake --build build -j -``` - -## Creating SVS Indexes - -### Using Factory String - -FAISS provides a factory string format for creating SVS indexes: - -```text -SVSVamana[,[_]] -``` - -**Examples:** - -```python -import faiss - -# Basic SVS index with graph degree 32 -index = faiss.index_factory(768, "SVSVamana32") - -# SVS with LVQ8 compression -index = faiss.index_factory(768, "SVSVamana32,LVQ8") - -# SVS with LVQ4x8 two-level compression -index = faiss.index_factory(768, "SVSVamana64,LVQ4x8") - -# SVS with LeanVec (dimensionality reduction to 128 dims) -index = faiss.index_factory(768, "SVSVamana32,LeanVec4x8_128") -``` - -### Direct Index Creation - -```python -import faiss -import numpy as np - -# Sample data -d = 768 # dimension -n = 100000 # number of vectors -xb = np.random.random((n, d)).astype('float32') - -# Create SVS index with LVQ compression -index = faiss.IndexSVSVamanaLVQ( - d, # dimension - 32, # graph_max_degree - 8, # primary bits (LVQ8) - 0 # residual bits (0 = single level) -) - -# Build the index -index.train(xb) -index.add(xb) - -# Search -k = 10 # number of neighbors -xq = np.random.random((5, d)).astype('float32') -D, I = index.search(xq, k) -``` - -### Index Parameters - -| Parameter | Description | Default | Guidance | -|-----------|-------------|---------|----------| -| dimension | Vector dimensions | - | Must match your embeddings | -| graph_max_degree | Max edges per node | 32 | Higher = better recall, more memory | -| construction_window_size | Build search window | 200 | Higher = better graph 
quality | -| search_window_size | Query search window | 10 | Higher = better recall | - -## Vector Compression - -### Compression Options - -| Compression | Factory String | Memory Reduction | Best For | -|-------------|----------------|------------------|----------| -| None | `SVSVamana32` | 1x | Maximum accuracy | -| LVQ8 | `SVSVamana32,LVQ8` | ~4x | Good balance | -| LVQ4x8 | `SVSVamana32,LVQ4x8` | ~3x | High recall with compression | -| LeanVec4x8 | `SVSVamana32,LeanVec4x8_128` | 8-16x | High-dimensional vectors | -| LeanVec8x8 | `SVSVamana32,LeanVec8x8_256` | 4-8x | Best recall with LeanVec | - -### Choosing Compression - -**Rule of thumb:** -- Dimensions < 768: Use LVQ (LVQ8 or LVQ4x8) -- Dimensions ≥ 768: Use LeanVec (LeanVec4x8 or LeanVec8x8) -- Maximum memory savings: LeanVec with aggressive dimension reduction - -### LeanVec Dimension Selection - -The `_` suffix in LeanVec specifies the reduced dimension: - -```python -# Original: 768 dims, reduced to 192 (768/4) -index = faiss.index_factory(768, "SVSVamana32,LeanVec4x8_192") - -# Original: 1536 dims, reduced to 384 (1536/4) -index = faiss.index_factory(1536, "SVSVamana32,LeanVec4x8_384") -``` - -Lower reduced dimensions = faster search and less memory, but may reduce recall. - -## Performance Tuning - -### Search Parameters - -```python -# Set search window size (higher = better recall, slower) -index.search_window_size = 50 - -# Perform search -D, I = index.search(queries, k) -``` - -### Multi-threaded Search - -```python -import faiss - -# Set number of threads for search -faiss.omp_set_num_threads(16) - -# Search will use multiple threads -D, I = index.search(queries, k) -``` - -### Index Save/Load - -```python -# Save index -faiss.write_index(index, "my_index.faiss") - -# Load index -index = faiss.read_index("my_index.faiss") -``` - -## FAQ - -### Q: How do I check if SVS is available in my FAISS installation? 
- -```python -import faiss -print(hasattr(faiss, 'IndexSVSVamana')) # True if SVS is available -``` - -### Q: Can I convert an existing FAISS index to SVS? - -**A:** No direct conversion is available. You need to rebuild the index using SVS index types. Extract your vectors and create a new SVS index. - -### Q: What happens on non-Intel hardware? - -**A:** SVS indexes are designed for Intel CPUs. On Intel platforms without AVX-512, performance is still good but not optimal. On non-Intel platforms (AMD, ARM), consider using standard FAISS indexes like IndexHNSW. - -### Q: How does IndexSVSVamana compare to IndexHNSW? - -**A:** Both are graph-based approximate nearest neighbor indexes. SVS typically offers: -- Higher throughput (up to 8x on some datasets) -- Better memory efficiency with compression -- Optimized performance on Intel hardware - -Use HNSW if you need broader hardware compatibility or are on ARM. - -### Q: What if recall is too low with compression? - -**A:** Try these adjustments: -1. Increase `search_window_size` (e.g., 50 or 100) -2. Use higher-bit compression (LVQ4x8 → LVQ8) -3. For LeanVec, increase the reduced dimension -4. 
Increase `graph_max_degree` when building - -## References - -- [FAISS GitHub Repository](https://github.com/facebookresearch/faiss) -- [FAISS SVS Integration Wiki](https://github.com/facebookresearch/faiss/wiki/CPU-Faiss---Intel-SVS-%E2%80%90-Overview) -- [FAISS SVS Usage Guide](https://github.com/facebookresearch/faiss/wiki/CPU-Faiss---Intel-SVS-%E2%80%90-Usage) -- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/) -- [Intel SVS Benchmarks](https://intel.github.io/ScalableVectorSearch/benchs/static/latest.html) From 52a04dbd0af305f51bfd4b002992e917e048eda6 Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Thu, 12 Feb 2026 21:09:03 +0000 Subject: [PATCH 8/9] Address mihaic review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix SVS intro: 'is integrated' instead of 'working on integrating' - Clarify Level 2 compression: LVQ encodes residuals, LeanVec encodes full dimensionality - Fix LeanVec4x8 description with reduced/full dimensionality detail - Fix DISTANCE_METRIC guidance: L2 for normalized embeddings - Add 'slower build' note to CONSTRUCTION_WINDOW_SIZE - Fix compression ratios: LVQ4x8 ~2.5x, LeanVec notation with f factor - Add LeanVec dimensionality reduction factor explanation - Remove bash markers from Redis command blocks - Fix SVS-VAMANA parameter count (14 → 12) in LeanVec example - Remove EPSILON row (not SVS-specific, range search only) - Fix attribution: Redis benchmarking (authors are Redis) - Fix precision metric: remove '+' (calibrated value) - Clarify SVS-VAMANA effectiveness: 'improving throughput' Co-authored-by: mihaic --- software/similarity-search/README.md | 6 +++--- software/similarity-search/redis/README.md | 25 +++++++++++----------- 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/software/similarity-search/README.md b/software/similarity-search/README.md index 0eff29d..4d0b0f5 100644 --- a/software/similarity-search/README.md +++ 
b/software/similarity-search/README.md @@ -14,7 +14,7 @@ Vector similarity search is a core component of modern AI applications including ## Intel Scalable Vector Search (SVS) -[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and we are working on integrating it into various popular solutions to bring these optimizations to a wider audience. +[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and is integrated into popular solutions to bring these optimizations to a wider audience. SVS features: @@ -52,12 +52,12 @@ Best suited for: Both LVQ and LeanVec support two-level compression schemes: 1. **Level 1**: Fast candidate retrieval using compressed vectors -2. **Level 2**: Re-ranking using residual encoding for accuracy +2. 
**Level 2**: Re-ranking for accuracy (LVQ encodes residuals, LeanVec encodes the full dimensionality data) The naming convention reflects bits per dimension at each level: - `LVQ4x8`: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension) - `LVQ8`: Single-level, 8 bits per dimension -- `LeanVec4x8`: Dimensionality reduction + 4-bit Level 1 + 8-bit Level 2 +- `LeanVec4x8`: 4-bit Level 1 encoding of reduced dimensionality data + 8-bit Level 2 encoding of full dimensionality data ## Vector Compression Selection diff --git a/software/similarity-search/redis/README.md b/software/similarity-search/redis/README.md index 7ca45ce..e627b85 100644 --- a/software/similarity-search/redis/README.md +++ b/software/similarity-search/redis/README.md @@ -27,7 +27,7 @@ Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA ### Creating an SVS-VAMANA Index -```bash +``` FT.CREATE my_index ON HASH PREFIX 1 doc: @@ -46,9 +46,9 @@ FT.CREATE my_index |-----------|-------------|---------|-----------------| | TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory | | DIM | Vector dimensions | - | Must match your embeddings | -| DISTANCE_METRIC | L2, IP, or COSINE | - | COSINE for normalized embeddings | +| DISTANCE_METRIC | L2, IP, or COSINE | - | L2 for normalized embeddings | | GRAPH_MAX_DEGREE | Max edges per node | 32 | Higher = better recall, more memory | -| CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality | +| CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality, slower build | | SEARCH_WINDOW_SIZE | Query search window | 10 | Higher = better recall, slower | | COMPRESSION | LVQ/LeanVec type | none | See compression section | | TRAINING_THRESHOLD | Vectors for learning compression | 10240 | Increase if recall is low | @@ -65,9 +65,11 @@ Intel SVS provides advanced compression techniques that reduce memory usage whil | None | 32 (FLOAT32) | 1x 
(baseline) | Maximum accuracy | | LVQ8 | 8 | ~4x | Fast ingestion, good balance | | LVQ4x4 | 4+4 | ~4x | Fast search, dimensions < 768 | -| LVQ4x8 | 4+8 | ~3x | High recall with compression | -| LeanVec4x8 | Reduced + 4+8 | ~3x | High-dimensional vectors (768+) | -| LeanVec8x8 | Reduced + 8+8 | ~2.5x | Best recall with LeanVec | +| LVQ4x8 | 4+8 | ~2.5x | High recall with compression | +| LeanVec4x8 | 4/f+8 | ~3x | High-dimensional vectors (768+) | +| LeanVec8x8 | 8/f+8 | ~2.5x | Best recall with LeanVec | + +The LeanVec dimensionality reduction factor `f` is the full dimensionality divided by the reduced dimensionality. ### Choosing Compression by Use Case @@ -80,11 +82,11 @@ Intel SVS provides advanced compression techniques that reduce memory usage whil ### Example with LeanVec Compression -```bash +``` FT.CREATE my_index ON HASH PREFIX 1 doc: - SCHEMA embedding VECTOR SVS-VAMANA 14 + SCHEMA embedding VECTOR SVS-VAMANA 12 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE @@ -109,7 +111,6 @@ FT.SEARCH my_index | Parameter | Effect | Trade-off | |-----------|--------|-----------| | SEARCH_WINDOW_SIZE | Larger = higher recall | Higher latency | -| EPSILON | Larger = wider range search | Higher latency | | SEARCH_BUFFER_CAPACITY | More candidates for re-ranking | Higher latency | ### Redis Configuration @@ -124,7 +125,7 @@ io-threads-do-reads yes ## Benchmarks -Based on [Redis and Intel benchmarking](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/), SVS-VAMANA delivers significant improvements over HNSW: +Based on [Redis benchmarking](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/), SVS-VAMANA delivers significant improvements over HNSW: ### Memory Savings @@ -138,7 +139,7 @@ SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across da ### Throughput Improvements (FP32) -At 0.95+ precision, compared to HNSW: +At 0.95
precision, compared to HNSW: | Dataset | Dimensions | QPS Improvement | |---------|------------|-----------------| @@ -146,7 +147,7 @@ At 0.95+ precision, compared to HNSW: | DBpedia | 1536 | Up to 60% higher | | LAION | 512 | 0-15% (marginal) | -SVS-VAMANA is most effective for medium-to-high dimensional embeddings (768–3072 dimensions). +SVS-VAMANA is most effective at improving throughput for medium-to-high dimensional embeddings (768–3072 dimensions). ### Latency Improvements (FP32, High Concurrency) From deea507438248a96c728cbc15718cb69a44d3c60 Mon Sep 17 00:00:00 2001 From: Nikolay Petrov Date: Tue, 17 Feb 2026 12:28:40 -0800 Subject: [PATCH 9/9] Update README.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Mihai Capotă --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 112f62e..9ee9f41 100644 --- a/README.md +++ b/README.md @@ -36,7 +36,6 @@ We aim to provide a dynamic resource where users can find the latest optimizatio - [Java](software/java/README.md) - [Similarity Search](software/similarity-search/README.md) - [Redis](software/similarity-search/redis/README.md) - - [FAISS](software/similarity-search/faiss/README.md) - [Spark](software/spark/README.md) - [MySQL & PostgreSQL](software/mysql-postgresql/README.md) - Workloads
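Postscript on the `SVS-VAMANA 14 → 12` fix in PATCH 8/9: the integer after the algorithm name in `FT.CREATE … VECTOR SVS-VAMANA <count> …` is the number of attribute tokens that follow, and each attribute contributes two tokens (name and value), so the six-attribute LeanVec example needs 12. A minimal Python sketch of how a client might assemble those schema tokens; the `svs_vamana_field` helper is hypothetical (not part of any Redis client API), and since the diff truncates the example after `DISTANCE_METRIC COSINE`, the last two attributes and their values below are illustrative assumptions:

```python
def svs_vamana_field(name, **attrs):
    """Assemble FT.CREATE schema tokens for an SVS-VAMANA vector field.

    The count after "SVS-VAMANA" is the number of attribute tokens that
    follow it: each attribute adds a name and a value, so six attributes
    yield 12 (the figure PATCH 8/9 corrects from 14).
    """
    tokens = []
    for attr, value in attrs.items():
        tokens += [attr.upper(), str(value)]
    return [name, "VECTOR", "SVS-VAMANA", str(len(tokens))] + tokens

# Mirrors the LeanVec example from the Redis guide; the final two
# attributes (GRAPH_MAX_DEGREE, CONSTRUCTION_WINDOW_SIZE) and their
# values are assumed for illustration.
field = svs_vamana_field(
    "embedding",
    type="FLOAT32",
    dim=1536,
    distance_metric="COSINE",
    compression="LeanVec4x8",
    graph_max_degree=40,
    construction_window_size=250,
)
```

Deriving the count this way avoids the off-by-two miscount the patch fixes: Redis parses exactly that many tokens as vector-field attributes, so a wrong count makes `FT.CREATE` fail or swallow the wrong arguments.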