Add Similarity Search optimization guides #20
# Similarity Search Optimization Guides

This section contains optimization guides for vector similarity search workloads on Intel hardware. These guides help users of popular vector search solutions achieve optimal performance on Intel Xeon processors.

## Overview

Vector similarity search is a core component of modern AI applications, including:

- Retrieval-Augmented Generation (RAG)
- Semantic search
- Recommendation systems
- Image and video similarity
- Anomaly detection

## Intel Scalable Vector Search (SVS)

[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and it is integrated into popular solutions to bring these optimizations to a wider audience.

SVS features:

- **Vamana Algorithm**: Graph-based approximate nearest neighbor search
- **Vector Compression**: LVQ and LeanVec for significant memory reduction
- **Hardware Optimization**: Best performance on servers with AVX-512 support

## Understanding LVQ and LeanVec Compression

Traditional vector compression methods face limitations in graph-based search. Product Quantization (PQ) requires keeping full-precision vectors for re-ranking, which defeats the purpose of compression. Standard scalar quantization with global bounds fails to use the available quantization levels efficiently.

### LVQ (Locally-adaptive Vector Quantization)

LVQ addresses these limitations by applying **per-vector normalization and scalar quantization**, adapting the quantization bounds individually for each vector. This local adaptation ensures efficient use of the available bit range, resulting in high-quality compressed representations.

Key benefits:

- Minimal decompression overhead enables fast, on-the-fly distance computations
- Significantly reduces memory bandwidth and storage requirements
- Maintains high search accuracy and throughput
- SIMD-optimized layout ([Turbo LVQ](https://arxiv.org/abs/2402.02044)) for efficient distance computations

LVQ achieves a **four-fold reduction** in vector size while maintaining search accuracy. A typical 768-dimensional float32 vector requiring 3072 bytes can be reduced to just a few hundred bytes.
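As a back-of-the-envelope check of these figures (a simplified model: it counts only the quantized payload and ignores the small per-vector scale/offset constants that LVQ also stores):

```python
def lvq_bytes(dims: int, bits: int) -> int:
    """Approximate per-vector payload size at `bits` bits per dimension."""
    return dims * bits // 8

float32_bytes = 768 * 4           # uncompressed float32: 4 bytes/dim
print(float32_bytes)              # 3072
print(lvq_bytes(768, 8))          # LVQ8: 768 bytes, a 4x reduction
print(lvq_bytes(768, 4))          # LVQ4: 384 bytes, an 8x reduction
```

The 4x figure above corresponds to single-level 8-bit quantization; lower bit widths shrink vectors further at some cost in recall.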

### LeanVec (LVQ with Dimensionality Reduction)

[LeanVec](https://openreview.net/forum?id=wczqrpOrIc) builds on LVQ by first applying **linear dimensionality reduction** and then compressing the reduced vectors with LVQ. This two-step approach significantly cuts memory and compute costs, enabling faster similarity search and index construction with minimal accuracy loss; it is especially effective for high-dimensional deep learning embeddings.

Best suited for:

- High-dimensional vectors (768+ dimensions)
- Text embeddings from large language models
- Cases where maximum memory savings are needed

### Two-Level Compression

Both LVQ and LeanVec support two-level compression schemes:

1. **Level 1**: Fast candidate retrieval using compressed vectors
2. **Level 2**: Re-ranking for accuracy (LVQ encodes residuals; LeanVec encodes the full-dimensionality data)

The naming convention reflects bits per dimension at each level:

- `LVQ4x8`: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension)
- `LVQ8`: Single-level, 8 bits per dimension
- `LeanVec4x8`: 4-bit Level 1 encoding of reduced-dimensionality data + 8-bit Level 2 encoding of full-dimensionality data

## Vector Compression Selection

| Compression | Best For | Observations |
|-------------|----------|--------------|
| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search |
| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
| LVQ4 | Maximum memory saving | Recall might be insufficient |
| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 |
| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall |
| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings |

**Rule of thumb:**

- Dimensions < 768 → Use LVQ (LVQ4x4, LVQ4x8, or LVQ8)
- Dimensions ≥ 768 → Use LeanVec (LeanVec4x8 or LeanVec8x8)
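The rule of thumb can be sketched as a small helper. The function name and the recall flag are illustrative conveniences, not part of any SVS or Redis API:

```python
def suggest_compression(dims: int, prioritize_recall: bool = False) -> str:
    """Pick a starting compression scheme from the rule of thumb above.

    Illustrative helper only; tune against your own recall targets.
    """
    if dims >= 768:
        return "LeanVec8x8" if prioritize_recall else "LeanVec4x8"
    return "LVQ4x8" if prioritize_recall else "LVQ4x4"

print(suggest_compression(1536))        # LeanVec4x8
print(suggest_compression(512))         # LVQ4x4
print(suggest_compression(512, True))   # LVQ4x8
```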

## Available Guides

| Software | Description | Guide |
|----------|-------------|-------|
| **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) |

## References

- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)
- [SVS GitHub Repository](https://github.com/intel/ScalableVectorSearch)
- [LVQ Paper (VLDB 2023)](https://www.vldb.org/pvldb/vol16/p2769-aguerrebere.pdf)
- [LeanVec Paper (TMLR 2024)](https://openreview.net/forum?id=Y5Mvyusf1u)
- [Turbo LVQ Paper](https://arxiv.org/abs/2402.02044)
---
# Redis Vector Search Optimization Guide

This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index algorithm from Intel's Scalable Vector Search (SVS) library.

## Table of Contents

- [Overview](#overview)
- [SVS-VAMANA Configuration](#svs-vamana-configuration)
- [Vector Compression](#vector-compression)
- [Performance Tuning](#performance-tuning)
- [Benchmarks](#benchmarks)
- [FAQ](#faq)
- [References](#references)

## Overview

Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA. SVS-VAMANA combines the Vamana graph-based search algorithm with Intel's compression technologies (LVQ and LeanVec), delivering optimal performance on servers with AVX-512 support.

**Key Benefits of SVS-VAMANA:**

- **Memory Efficiency**: 26–37% total memory savings compared to HNSW, with 51–74% reduction in index memory
- **Higher Throughput**: Up to 144% higher QPS compared to HNSW on high-dimensional datasets
- **Lower Latency**: Up to 60% reduction in p50/p95 latencies under load
- **Maintained Accuracy**: Matches HNSW precision levels while delivering performance improvements

## SVS-VAMANA Configuration

### Creating an SVS-VAMANA Index

```
FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 768
    DISTANCE_METRIC COSINE
    GRAPH_MAX_DEGREE 64
    CONSTRUCTION_WINDOW_SIZE 200
    COMPRESSION LVQ4x8
```
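The same index can also be created programmatically. The sketch below only assembles the `FT.CREATE` argument list (so it runs without a server); the commented lines show how it might be sent with redis-py's generic `execute_command`. The index name, key prefix, and default attribute values are placeholders:

```python
def svs_create_args(index, field, dim, **attrs):
    """Assemble FT.CREATE arguments for an SVS-VAMANA vector field.

    Illustrative helper: attribute names follow the parameter table
    in this guide; index and prefix names are placeholders.
    """
    pairs = [("TYPE", "FLOAT32"), ("DIM", dim), ("DISTANCE_METRIC", "COSINE")]
    pairs += list(attrs.items())
    args = [index, "ON", "HASH", "PREFIX", "1", "doc:",
            "SCHEMA", field, "VECTOR", "SVS-VAMANA", len(pairs) * 2]
    for key, value in pairs:
        args += [key, value]
    return [str(a) for a in args]

args = svs_create_args("my_index", "embedding", 768,
                       GRAPH_MAX_DEGREE=64,
                       CONSTRUCTION_WINDOW_SIZE=200,
                       COMPRESSION="LVQ4x8")
print(args[10])  # attribute count required by FT.CREATE: '12'
# To actually create the index (requires redis-py and a Redis 8.2+ server):
# import redis
# redis.Redis().execute_command("FT.CREATE", *args)
```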

### Index Parameters

| Parameter | Description | Default | Tuning Guidance |
|-----------|-------------|---------|-----------------|
| TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory |
| DIM | Vector dimensions | - | Must match your embeddings |
| DISTANCE_METRIC | L2, IP, or COSINE | - | L2 for normalized embeddings |
| GRAPH_MAX_DEGREE | Max edges per node | 32 | Higher = better recall, more memory |
| CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality, slower build |
| SEARCH_WINDOW_SIZE | Query search window | 10 | Higher = better recall, slower |
| COMPRESSION | LVQ/LeanVec type | none | See compression section |
| TRAINING_THRESHOLD | Vectors for learning compression | 10240 | Increase if recall is low |
| REDUCE | Target dimension for LeanVec | DIM/2 | Lower = faster search, may reduce recall |

## Vector Compression

Intel SVS provides advanced compression techniques that reduce memory usage while maintaining search quality.

### Compression Options

| Compression | Bits/Dim | Memory Reduction | Best For |
|-------------|----------|------------------|----------|
| None | 32 (FLOAT32) | 1x (baseline) | Maximum accuracy |
| LVQ8 | 8 | ~4x | Fast ingestion, good balance |
| LVQ4x4 | 4+4 | ~4x | Fast search, dimensions < 768 |
| LVQ4x8 | 4+8 | ~2.5x | High recall with compression |
| LeanVec4x8 | 4/f+8 | ~3x | High-dimensional vectors (768+) |
| LeanVec8x8 | 8/f+8 | ~2.5x | Best recall with LeanVec |

The LeanVec dimensionality reduction factor `f` is the full dimensionality divided by the reduced dimensionality.
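A worked example of the factor `f` (simplified: per-vector constants and the dimensionality-reduction matrix itself are ignored): reducing 1536 dimensions to 384 gives f = 4, so LeanVec4x8 stores 4 bits per *reduced* dimension at Level 1 plus 8 bits per *full* dimension at Level 2:

```python
def leanvec_bytes(dim, reduced_dim, l1_bits, l2_bits=8):
    """Approximate per-vector bytes: Level 1 on reduced dims + Level 2 on full dims."""
    return reduced_dim * l1_bits // 8 + dim * l2_bits // 8

baseline = 1536 * 4                       # float32: 6144 bytes
compressed = leanvec_bytes(1536, 384, 4)  # LeanVec4x8 with f = 4: 192 + 1536 bytes
print(baseline, compressed, round(baseline / compressed, 1))  # 6144 1728 3.6
```

This lines up with the ~3x reduction listed for LeanVec4x8 in the table above.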

### Choosing Compression by Use Case

| Embedding Category | Example Embeddings | Compression Strategy |
|--------------------|--------------------|----------------------|
| Text Embeddings | Cohere embed-v3 (1024), OpenAI ada-002 (1536) | LeanVec4x8 |
| Image Embeddings | ResNet-152 (2048), ViT (768+) | LeanVec4x8 |
| Multimodal | CLIP ViT-B/32 (512) | LVQ8 |
| Lower Dimensional | Custom embeddings (<768) | LVQ4x4 or LVQ4x8 |

### Example with LeanVec Compression

```
FT.CREATE my_index
  ON HASH
  PREFIX 1 doc:
  SCHEMA embedding VECTOR SVS-VAMANA 12
    TYPE FLOAT32
    DIM 1536
    DISTANCE_METRIC COSINE
    COMPRESSION LeanVec4x8
    REDUCE 384
    TRAINING_THRESHOLD 20000
```

## Performance Tuning

### Runtime Query Parameters

Adjust search parameters at query time for precision/performance trade-offs:

```bash
FT.SEARCH my_index
  "*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]"
  PARAMS 4 BLOB "\x12\xa9..." SW 50
  DIALECT 2
```

| Parameter | Effect | Trade-off |
|-----------|--------|-----------|
| SEARCH_WINDOW_SIZE | Larger = higher recall | Higher latency |
| SEARCH_BUFFER_CAPACITY | More candidates for re-ranking | Higher latency |

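The `$BLOB` parameter is the query vector's raw float32 bytes. A minimal way to produce it with the Python standard library (assuming a FLOAT32 index with 768 dimensions; the redis-py call in the comment is illustrative):

```python
import struct

def to_blob(vector):
    """Pack a query vector as raw little-endian float32 bytes (the BLOB param)."""
    return struct.pack(f"<{len(vector)}f", *vector)

blob = to_blob([0.1] * 768)
print(len(blob))  # 768 dims x 4 bytes = 3072
# With redis-py, the blob is passed via query_params, e.g.:
# r.ft("my_index").search(query, query_params={"BLOB": blob, "SW": 50})
```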
### Redis Configuration

```
# redis.conf optimizations for vector workloads

# Use multiple I/O threads for better throughput
io-threads 4
io-threads-do-reads yes
```

## Benchmarks

Based on [Redis benchmarking](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/), SVS-VAMANA delivers significant improvements over HNSW.

### Memory Savings

SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across datasets:

| Dataset | Dimensions | Total Memory Reduction | Index Memory Reduction |
|---------|------------|------------------------|------------------------|
| LAION | 512 | 26% | 51% |
| Cohere | 768 | 35% | 70% |
| DBpedia | 1536 | 37% | 74% |

### Throughput Improvements (FP32)

At 0.95 precision, compared to HNSW:

| Dataset | Dimensions | QPS Improvement |
|---------|------------|-----------------|
| Cohere | 768 | Up to 144% higher |
| DBpedia | 1536 | Up to 60% higher |
| LAION | 512 | 0–15% (marginal) |

SVS-VAMANA is most effective at improving throughput for medium-to-high dimensional embeddings (768–3072 dimensions).

### Latency Improvements (FP32, High Concurrency)

| Dataset | p50 Latency Reduction | p95 Latency Reduction |
|---------|-----------------------|-----------------------|
| Cohere (768d) | 60% | 57% |
| DBpedia (1536d) | 46% | 36% |

### Precision vs. Performance

At every precision point from ~0.92 to 0.99, SVS-VAMANA matches HNSW accuracy while delivering higher throughput. At high precision (0.99), SVS-VAMANA sustains up to 1.5x better throughput.

### Ingestion Trade-offs

SVS-VAMANA index construction is slower than HNSW due to compression overhead. On x86 platforms:

- LeanVec: Can be up to 25% faster or 33% slower than HNSW, depending on the dataset
- LVQ: Up to 2.6x slower than HNSW

This trade-off is acceptable for workloads where query performance and memory efficiency are the priorities.

## FAQ

### Q: When should I use SVS-VAMANA vs. HNSW?

**A:** Use SVS-VAMANA when:

- Running on Intel Xeon processors with AVX-512
- Memory efficiency is important (26–37% savings)
- You have medium-to-high dimensional vectors (768+)
- Query throughput and latency are priorities

Use HNSW when:

- Running on ARM platforms (HNSW performs well on ARM)
- You need faster index construction
- Working with lower-dimensional vectors (<512)

When neither of these sets of conditions clearly applies, SVS-VAMANA generally has the advantage.

### Q: Are LVQ and LeanVec available in Redis Open Source?

**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's LVQ and LeanVec optimizations require:

- Intel hardware with AVX-512
- Redis Software (commercial), or building with `BUILD_INTEL_SVS_OPT=yes`

On non-Intel platforms (AMD, ARM), SVS-VAMANA automatically falls back to SQ8 compression; no code changes are required.

### Q: What if recall is too low with compression?

**A:** Try these steps in order:

1. Increase `SEARCH_WINDOW_SIZE` at query time (the first thing to try, since it is a runtime parameter and requires no re-indexing)
2. Increase `TRAINING_THRESHOLD` (e.g., 50000)
3. For LeanVec, try a larger `REDUCE` value (closer to the original dimensionality)
4. Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8)
5. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128)

### Q: Does SVS-VAMANA work on non-Intel hardware?

**A:** Yes. The API is unified, and SVS-VAMANA runs on any x86 or ARM platform with no code changes. The library automatically selects the best available implementation:

- **Intel (AVX-512)**: Full LVQ/LeanVec optimizations for maximum performance
- **AMD/Other x86**: SQ8 fallback implementation, which benchmarks show is also fast, often with comparable performance
- **ARM**: SQ8 fallback works; however, HNSW may be preferable due to slower SVS ingestion on ARM

Your application code stays the same regardless of hardware. Ideal performance is achieved on Intel Xeon with AVX-512, but you can deploy and test on any platform without modification.

## References

- [Redis Vector Search Documentation](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/)
- [SVS-VAMANA Index Reference](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#svs-vamana-index)
- [Vector Compression Guide](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/)
- [Tech Dive: Comprehensive Compression](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/)
- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)