2 changes: 2 additions & 0 deletions README.md
@@ -34,6 +34,8 @@
- [Cassandra](software/cassandra/README.md)
- [Gluten](software/gluten/README.md)
- [Java](software/java/README.md)
- [Similarity Search](software/similarity-search/README.md)
- [Redis](software/similarity-search/redis/README.md)
- [Spark](software/spark/README.md)
- [MySQL & PostgreSQL](software/mysql-postgresql/README.md)
- Workloads
89 changes: 89 additions & 0 deletions software/similarity-search/README.md
@@ -0,0 +1,89 @@
# Similarity Search Optimization Guides

This section contains optimization guides for vector similarity search workloads on Intel hardware. These guides help users of popular vector search solutions achieve optimal performance on Intel Xeon processors.

## Overview

Vector similarity search is a core component of modern AI applications including:

- Retrieval-Augmented Generation (RAG)
- Semantic search
- Recommendation systems
- Image and video similarity
- Anomaly detection

## Intel Scalable Vector Search (SVS)

[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and is integrated into popular solutions to bring these optimizations to a wider audience.

SVS features:

- **Vamana Algorithm**: Graph-based approximate nearest neighbor search
- **Vector Compression**: LVQ and LeanVec for significant memory reduction
- **Hardware Optimization**: Best performance on servers with AVX-512 support (a quick capability check follows below)
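
On Linux hosts, AVX-512 support can be confirmed before benchmarking with a quick check. This is a convenience snippet, not part of SVS, and assumes `/proc/cpuinfo` is available:

```python
# Kernel-reported CPU feature flags; "avx512f" is the AVX-512 foundation flag.
with open("/proc/cpuinfo") as f:
    tokens = f.read().split()
print("AVX-512F available:", "avx512f" in tokens)
```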

## Understanding LVQ and LeanVec Compression

Traditional vector compression methods face limitations in graph-based search. Product Quantization (PQ) requires keeping full-precision vectors for re-ranking, defeating compression benefits. Standard scalar quantization with global bounds fails to efficiently utilize available quantization levels.

### LVQ (Locally-adaptive Vector Quantization)

LVQ addresses these limitations by applying **per-vector normalization and scalar quantization**, adapting the quantization bounds individually for each vector. This local adaptation ensures efficient use of the available bit range, resulting in high-quality compressed representations.

Key benefits:
- Minimal decompression overhead enables fast, on-the-fly distance computations
- Significantly reduces memory bandwidth and storage requirements
- Maintains high search accuracy and throughput
- SIMD-optimized layout ([Turbo LVQ](https://arxiv.org/abs/2402.02044)) for efficient distance computations

LVQ achieves a **four-fold reduction** in vector size while maintaining search accuracy: a typical 768-dimensional float32 vector requiring 3072 bytes shrinks to 768 bytes with 8-bit LVQ (384 bytes with 4-bit).
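
To make the per-vector idea concrete, here is a minimal NumPy sketch of locally adaptive scalar quantization. It is illustrative only; real LVQ also subtracts the dataset mean and uses SIMD-friendly layouts such as Turbo LVQ:

```python
import numpy as np

def lvq_encode(x: np.ndarray, bits: int = 8):
    """Per-vector scalar quantization: bounds adapt to each vector individually."""
    lo, hi = float(x.min()), float(x.max())    # per-vector bounds, not global ones
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0          # guard against constant vectors
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale                    # codes plus a few bytes of metadata

def lvq_decode(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale + lo

vec = np.random.randn(768).astype(np.float32)  # 3072 bytes at float32
codes, lo, scale = lvq_encode(vec)             # 768 bytes of codes at 8 bits/dim
print(np.abs(lvq_decode(codes, lo, scale) - vec).max())  # small reconstruction error
```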

### LeanVec (LVQ with Dimensionality Reduction)

[LeanVec](https://openreview.net/forum?id=wczqrpOrIc) builds on LVQ by first applying **linear dimensionality reduction**, then compressing the reduced vectors with LVQ. This two-step approach significantly cuts memory and compute costs, enabling faster similarity search and index construction with minimal accuracy loss, and is especially effective for high-dimensional deep learning embeddings. A minimal sketch of the two-step idea follows the list below.

Best suited for:
- High-dimensional vectors (768+ dimensions)
- Text embeddings from large language models
- Cases where maximum memory savings are needed
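
As promised above, a sketch of the two-step scheme, using PCA as a stand-in for LeanVec's learned projection (the real method optimizes the projection for the search task rather than plain variance):

```python
import numpy as np

def train_projection(data: np.ndarray, reduced_dim: int):
    """PCA as a stand-in for LeanVec's learned linear projection (illustrative only)."""
    mean = data.mean(axis=0)
    # Rows of vt are principal directions; keep the top reduced_dim of them.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:reduced_dim]

rng = np.random.default_rng(0)
data = rng.standard_normal((5_000, 1536)).astype(np.float32)
mean, proj = train_projection(data, reduced_dim=384)   # reduction factor f = 4
reduced = (data - mean) @ proj.T                       # 384-dim vectors; LVQ-compress next
print(reduced.shape)                                   # (5000, 384)
```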

### Two-Level Compression

Both LVQ and LeanVec support two-level compression schemes:

1. **Level 1**: Fast candidate retrieval using compressed vectors
2. **Level 2**: Re-ranking for accuracy (LVQ encodes residuals, LeanVec encodes the full dimensionality data)

The naming convention reflects bits per dimension at each level:
- `LVQ4x8`: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension)
- `LVQ8`: Single-level, 8 bits per dimension
- `LeanVec4x8`: 4-bit Level 1 encoding of the reduced-dimensionality data + 8-bit Level 2 encoding of the full-dimensionality data

For example, a 768-dimensional vector stored as LVQ4x8 occupies 768 × 12 / 8 = 1152 bytes, versus 3072 bytes uncompressed, roughly a 2.7x reduction before graph overhead.

## Vector Compression Selection

| Compression | Best For | Observations |
|-------------|----------|--------------|
| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search |
| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
| LVQ4 | Maximum memory saving | Recall might be insufficient |
| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 |
| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall |
| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings |

**Rule of thumb:**
- Dimensions < 768 → Use LVQ (LVQ4x4, LVQ4x8, or LVQ8)
- Dimensions ≥ 768 → Use LeanVec (LeanVec4x8 or LeanVec8x8)
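
The rule of thumb above can be encoded in a tiny helper. This is an illustrative heuristic only, not part of SVS, and `pick_compression` is a name of our choosing; always validate the choice against your own data:

```python
def pick_compression(dim: int, prioritize_recall: bool = False) -> str:
    """Heuristic mapping from the selection table above."""
    if dim >= 768:
        # LeanVec pays off for high-dimensional embeddings.
        return "LeanVec8x8" if prioritize_recall else "LeanVec4x8"
    # Below 768 dimensions, plain LVQ is usually the better trade-off.
    return "LVQ4x8" if prioritize_recall else "LVQ4x4"

print(pick_compression(1536))        # LeanVec4x8
print(pick_compression(384, True))   # LVQ4x8
```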

## Available Guides

| Software | Description | Guide |
|----------|-------------|-------|
| **Redis** | Redis Query Engine with SVS-VAMANA | [Redis Guide](redis/README.md) |

## References

- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)
- [SVS GitHub Repository](https://github.com/intel/ScalableVectorSearch)
- [LVQ Paper (VLDB 2023)](https://www.vldb.org/pvldb/vol16/p2769-aguerrebere.pdf)
- [LeanVec Paper (TMLR 2024)](https://openreview.net/forum?id=Y5Mvyusf1u)
- [Turbo LVQ Paper](https://arxiv.org/abs/2402.02044)
219 changes: 219 additions & 0 deletions software/similarity-search/redis/README.md
@@ -0,0 +1,219 @@
# Redis Vector Search Optimization Guide

This guide describes best practices for optimizing vector similarity search performance in Redis on Intel Xeon processors. Redis 8.2+ includes SVS-VAMANA, a graph-based vector index algorithm from Intel's Scalable Vector Search (SVS) library.

## Table of Contents

- [Overview](#overview)
- [SVS-VAMANA Configuration](#svs-vamana-configuration)
- [Vector Compression](#vector-compression)
- [Performance Tuning](#performance-tuning)
- [Benchmarks](#benchmarks)
- [FAQ](#faq)
- [References](#references)

## Overview

Redis Query Engine supports three vector index types: FLAT, HNSW, and SVS-VAMANA. SVS-VAMANA combines the Vamana graph-based search algorithm with Intel's compression technologies (LVQ and LeanVec), delivering optimal performance on servers with AVX-512 support.

**Key Benefits of SVS-VAMANA:**

- **Memory Efficiency**: 26–37% total memory savings compared to HNSW, with 51–74% reduction in index memory
- **Higher Throughput**: Up to 144% higher QPS compared to HNSW on high-dimensional datasets
- **Lower Latency**: Up to 60% reduction in p50/p95 latencies under load
- **Maintained Accuracy**: Matches HNSW precision levels while delivering performance improvements

## SVS-VAMANA Configuration

### Creating an SVS-VAMANA Index

```
FT.CREATE my_index
ON HASH
PREFIX 1 doc:
SCHEMA embedding VECTOR SVS-VAMANA 12
TYPE FLOAT32
DIM 768
DISTANCE_METRIC COSINE
GRAPH_MAX_DEGREE 64
CONSTRUCTION_WINDOW_SIZE 200
COMPRESSION LVQ4x8
```
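
The same index can be created from Python with redis-py. This is a sketch under the assumption that your redis-py version accepts the `SVS-VAMANA` algorithm string (older clients validated only `FLAT`/`HNSW`); the attribute pairs are passed through to `FT.CREATE` verbatim:

```python
import numpy as np
import redis
from redis.commands.search.field import VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="localhost", port=6379)

# Mirrors the FT.CREATE call above: 6 attribute pairs = 12 arguments.
schema = (
    VectorField(
        "embedding",
        "SVS-VAMANA",
        {
            "TYPE": "FLOAT32",
            "DIM": 768,
            "DISTANCE_METRIC": "COSINE",
            "GRAPH_MAX_DEGREE": 64,
            "CONSTRUCTION_WINDOW_SIZE": 200,
            "COMPRESSION": "LVQ4x8",
        },
    ),
)
r.ft("my_index").create_index(
    schema,
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Vectors are stored in hash fields as raw little-endian float32 bytes.
vec = np.random.rand(768).astype(np.float32)
r.hset("doc:1", mapping={"embedding": vec.tobytes()})
```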

### Index Parameters

| Parameter | Description | Default | Tuning Guidance |
|-----------|-------------|---------|-----------------|
| TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory |
| DIM | Vector dimensions | - | Must match your embeddings |
| DISTANCE_METRIC | L2, IP, or COSINE | - | Match your embedding model; for unit-normalized vectors all three rank identically |
| GRAPH_MAX_DEGREE | Max edges per node | 32 | Higher = better recall, more memory |
| CONSTRUCTION_WINDOW_SIZE | Build search window | 200 | Higher = better graph quality, slower build |
| SEARCH_WINDOW_SIZE | Query search window | 10 | Higher = better recall, slower |
| COMPRESSION | LVQ/LeanVec type | none | See compression section |
| TRAINING_THRESHOLD | Vectors for learning compression | 10240 | Increase if recall is low |
| REDUCE | Target dimension for LeanVec | DIM/2 | Lower = faster search, may reduce recall |

## Vector Compression

Intel SVS provides advanced compression techniques that reduce memory usage while maintaining search quality.

### Compression Options

| Compression | Bits/Dim | Memory Reduction | Best For |
|-------------|----------|------------------|----------|
| None | 32 (FLOAT32) | 1x (baseline) | Maximum accuracy |
| LVQ8 | 8 | ~4x | Fast ingestion, good balance |
| LVQ4x4 | 4+4 | ~4x | Fast search, dimensions < 768 |
| LVQ4x8 | 4+8 | ~2.5x | High recall with compression |
| LeanVec4x8 | 4/f + 8 | ~3x | High-dimensional vectors (768+) |
| LeanVec8x8 | 8/f + 8 | ~2.5x | Best recall with LeanVec |

The LeanVec dimensionality reduction factor `f` is the full dimensionality divided by the reduced dimensionality. For example, with `DIM 1536` and `REDUCE 384`, f = 4, so LeanVec4x8 stores 4/4 + 8 = 9 bits per dimension.

### Choosing Compression by Use Case

| Embedding Category | Example Embeddings | Compression Strategy |
|--------------------|-------------------|---------------------|
| Text Embeddings | Cohere embed-v3 (1024), OpenAI ada-002 (1536) | LeanVec4x8 |
| Image Embeddings | ResNet-152 (2048), ViT (768+) | LeanVec4x8 |
| Multimodal | CLIP ViT-B/32 (512) | LVQ8 |
| Lower Dimensional | Custom embeddings (<768) | LVQ4x4 or LVQ4x8 |

### Example with LeanVec Compression

```
FT.CREATE my_index
ON HASH
PREFIX 1 doc:
SCHEMA embedding VECTOR SVS-VAMANA 12
TYPE FLOAT32
DIM 1536
DISTANCE_METRIC COSINE
COMPRESSION LeanVec4x8
REDUCE 384
TRAINING_THRESHOLD 20000
```

## Performance Tuning

### Runtime Query Parameters

Adjust search parameters at query time for precision/performance trade-offs:

```bash
FT.SEARCH my_index
"*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]"
PARAMS 4 BLOB "\x12\xa9..." SW 50
DIALECT 2
```
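
The same query from redis-py, assuming the index sketched earlier; the binary blob is simply the query vector's raw float32 bytes:

```python
import numpy as np
import redis
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)
query_vec = np.random.rand(768).astype(np.float32)

# Larger SW raises recall at the cost of latency; 50 is an illustrative value.
q = (
    Query("*=>[KNN 10 @embedding $BLOB SEARCH_WINDOW_SIZE $SW]")
    .return_fields("__embedding_score")
    .dialect(2)
)
res = r.ft("my_index").search(
    q, query_params={"BLOB": query_vec.tobytes(), "SW": 50}
)
# The distance is returned in the "__<field>_score" attribute of each document.
print(res.total, [(d.id, getattr(d, "__embedding_score")) for d in res.docs])
```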

| Parameter | Effect | Trade-off |
|-----------|--------|-----------|
| SEARCH_WINDOW_SIZE | Larger = higher recall | Higher latency |
| SEARCH_BUFFER_CAPACITY | More candidates for re-ranking | Higher latency |

### Redis Configuration

```
# redis.conf optimizations for vector workloads

# Use multiple I/O threads for better throughput
io-threads 4
io-threads-do-reads yes
```

## Benchmarks

Based on [Redis benchmarking](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/), SVS-VAMANA delivers significant improvements over HNSW:

### Memory Savings

SVS-VAMANA with LVQ8 compression achieves consistent memory reductions across datasets:

| Dataset | Dimensions | Total Memory Reduction | Index Memory Reduction |
|---------|------------|----------------------|----------------------|
| LAION | 512 | 26% | 51% |
| Cohere | 768 | 35% | 70% |
| DBpedia | 1536 | 37% | 74% |

### Throughput Improvements (FP32)

At 0.95 precision, compared to HNSW:

| Dataset | Dimensions | QPS Improvement |
|---------|------------|-----------------|
| Cohere | 768 | Up to 144% higher |
| DBpedia | 1536 | Up to 60% higher |
| LAION | 512 | 0-15% (marginal) |

SVS-VAMANA is most effective at improving throughput for medium-to-high dimensional embeddings (768–3072 dimensions).

### Latency Improvements (FP32, High Concurrency)

| Dataset | p50 Latency Reduction | p95 Latency Reduction |
|---------|----------------------|----------------------|
| Cohere (768d) | 60% | 57% |
| DBpedia (1536d) | 46% | 36% |

### Precision vs. Performance

At every precision point from ~0.92 to 0.99, SVS-VAMANA matches HNSW accuracy while delivering higher throughput. At high precision (0.99), SVS-VAMANA sustains up to 1.5x better throughput.

### Ingestion Trade-offs

SVS-VAMANA index construction is generally slower than HNSW due to compression overhead, though results vary by compression type and dataset. On x86 platforms:
- LeanVec: Anywhere from 25% faster to 33% slower than HNSW depending on the dataset
- LVQ: Up to 2.6x slower than HNSW

This trade-off is acceptable for workloads where query performance and memory efficiency are priorities.

## FAQ

### Q: When should I use SVS-VAMANA vs HNSW?

**A:** Use SVS-VAMANA when:
- Running on Intel Xeon processors with AVX-512
- Memory efficiency is important (26-37% savings)
- You have medium-to-high dimensional vectors (768+)
- Query throughput and latency are priorities

Use HNSW when:
- Running on ARM platforms (HNSW performs well on ARM)
- Working with lower-dimensional vectors (<512) and needing faster index construction

If neither case applies, SVS-VAMANA generally has the advantage.

### Q: Are LVQ and LeanVec available in Redis Open Source?

**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's LVQ and LeanVec optimizations require:
- Intel hardware with AVX-512
- Redis Software (commercial) or [building Redis Open Source](https://github.com/redis/redis?tab=readme-ov-file#running-redis-with-the-query-engine-and-optional-proprietary-intel-svs-vamana-optimisations) with `BUILD_INTEL_SVS_OPT=yes`

On non-Intel platforms (AMD, ARM), SVS-VAMANA automatically falls back to SQ8 compression; no code changes are required.

### Q: What if recall is too low with compression?

**A:** Try these steps in order:
1. Increase `SEARCH_WINDOW_SIZE` at query time (the first lever to try)
2. Increase `TRAINING_THRESHOLD` (e.g., 50000) if using LeanVec
3. For LeanVec, try a larger `REDUCE` value (closer to the original dimensions)
4. Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8)
5. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128)

### Q: Does SVS-VAMANA work on non-Intel hardware?

**A:** Yes! The API is unified and SVS-VAMANA runs on any x86 or ARM platform with no code changes. The library automatically selects the best available implementation:

- **Intel (AVX-512)**: Full LVQ/LeanVec optimizations for maximum performance
- **AMD/Other x86**: SQ8 fallback implementation; benchmarks show it is also fast, often with comparable performance
- **ARM**: SQ8 fallback works; however, HNSW may be preferable due to slower SVS ingestion on ARM

Your application code stays the same regardless of hardware. Ideal performance is achieved on Intel Xeon with AVX-512, but you can deploy and test on any platform without modification.

## References

- [Redis Vector Search Documentation](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/)
- [SVS-VAMANA Index Reference](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/#svs-vamana-index)
- [Vector Compression Guide](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/svs-compression/)
- [Tech Dive: Comprehensive Compression](https://redis.io/blog/tech-dive-comprehensive-compression-leveraging-quantization-and-dimensionality-reduction/)
- [Intel Scalable Vector Search](https://intel.github.io/ScalableVectorSearch/)