Add Similarity Search optimization guides #20

Open
napetrov wants to merge 9 commits into intel:main from napetrov:similarity-search

@napetrov

Summary

This PR adds documentation for optimizing vector similarity search workloads on Intel Xeon processors.

New Content

  • software/similarity-search/README.md — Overview of Intel's compression technologies (LVQ, LeanVec) for vector search
  • software/similarity-search/redis/README.md — Complete guide for Redis SVS-VAMANA configuration and tuning

Key Topics Covered

  • SVS-VAMANA index configuration in Redis 8.2+
  • Vector compression options (LVQ4x4, LVQ4x8, LVQ8, LeanVec4x8, LeanVec8x8)
  • Performance tuning parameters
  • Benchmarks from Redis tech dive (memory, throughput, latency improvements)
  • Cross-platform compatibility (unified API, works on Intel/AMD/ARM)
  • FAQ addressing common questions

Benchmarks Highlighted

Based on Redis and Intel benchmarking:

  • Memory: 26-37% total reduction vs HNSW
  • Throughput: Up to 144% higher QPS
  • Latency: Up to 60% lower p50/p95

Related: Intel Scalable Vector Search library — https://intel.github.io/ScalableVectorSearch/

- Remove BIOS and hardware recommendation sections
- Add high-level LVQ and LeanVec description
- Replace benchmarks with data from Redis tech dive blog
- Mention SVS can be used directly, integrations in progress
- Remove contributing section
- Focus on compression selection and performance data
- Add FAISS optimization guide with SVS index types
- Cover IndexSVSVamana, IndexSVSVamanaLVQ, IndexSVSVamanaLeanVec
- Include factory string format and examples
- Add compression selection guide
- Include benchmark data from Intel SVS
- Update overview and main README
- Remove benchmark section (no official FAISS SVS performance data)
- Update overview to remove specific performance claims
- Clarify Intel-only requirement in FAQ
- Keep technical how-to guidance
- Add licensing warning for LVQ/LeanVec (AGPLv3/SSPLv1 incompatible)
- Add language spec to FAISS factory string code block

- High-performance graph-based similarity search optimized for Intel CPUs
- Significant memory reduction with LVQ and LeanVec compression
- Best performance on Intel Xeon with AVX-512 support

Should we distinguish AVX-512 VNNI?


### IndexSVSVamanaLVQ

Combines Vamana with Locally-adaptive Vector Quantization (LVQ). LVQ applies per-vector normalization and scalar quantization, achieving up to 4x memory reduction while maintaining high accuracy.

### IndexSVSVamanaLeanVec

Extends LVQ with dimensionality reduction. Best for high-dimensional vectors (768+ dimensions), achieving up to 8-16x memory reduction. Particularly effective for text embeddings from large language models.

### From PyPI (with Intel optimizations)

```bash
pip install faiss-cpu
```

As of today, faiss-cpu@1.13.2 from pip does not include SVS indices.

Author

With all the other FAISS-related questions and pending PRs, would it be worth just dropping the FAISS doc for now and adding it later on?

Sure. I suggest moving the FAISS doc to its own (draft) PR with all comments implemented and waiting until SVS is released with FAISS. Then we can go ahead and merge the rest.

Author

Agreed. FAISS doc removed from this PR. Will create a separate draft PR with all your feedback addressed once SVS is released with FAISS.

### From Conda (Intel channel)

```bash
conda install -c conda-forge faiss-cpu
```

As of today, the combination of faiss packages does not include SVS indices (tested with python 3.12)

  faiss              conda-forge/linux-64::faiss-1.9.0-py312hf23773a_0_cpu
  faiss-cpu          conda-forge/linux-64::faiss-cpu-1.9.0-h718b53a_0

### Choosing Compression

**Rule of thumb:**
- Dimensions < 768: Use LVQ (LVQ8 or LVQ4x8)

Suggested change
- Dimensions < 768: Use LVQ (LVQ8 or LVQ4x8)
- Dimensions < 768: Use LVQ (LVQ8x0 or LVQ4x8)

```python
index = faiss.index_factory(1536, "SVSVamana32,LeanVec4x8_384")
```

Lower reduced dimensions = faster search and less memory, but may reduce recall.

Suggested change
Lower reduced dimensions = faster search and less memory, but may reduce recall.
Lower reduced dimensions = faster search and less memory, but may reduce recall.
If no suffix is specified, `d/2` will be used as default.

### Multi-threaded Search

Maybe add that we parallelize over queries? I.e., the queries passed to index.search() should contain at least n_threads elements.
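
To make the point about parallelizing over queries concrete, here is a minimal sketch in plain Python. It is a toy model and not FAISS or SVS code: `split_queries` and `busy_threads` are hypothetical helpers that deal query indices out to worker threads the way a per-query parallel loop might, which shows why a batch smaller than `n_threads` leaves some workers idle.

```python
def split_queries(n_queries: int, n_threads: int) -> list:
    """Deal query indices out to threads round-robin (toy per-query parallelism)."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    chunks = [[] for _ in range(n_threads)]
    for query_id in range(n_queries):
        chunks[query_id % n_threads].append(query_id)
    return chunks


def busy_threads(n_queries: int, n_threads: int) -> int:
    """Count the threads that receive at least one query."""
    return sum(1 for chunk in split_queries(n_queries, n_threads) if chunk)


# Hypothetical thread count, chosen only for illustration.
n_threads = 8
for n_queries in (1, 4, 8, 64):
    busy = busy_threads(n_queries, n_threads)
    print(f"{n_queries:>3} queries: {busy} of {n_threads} threads have work")
```

Under this model, a single-query call can keep at most one thread busy, so batching at least `n_threads` queries per `index.search()` call is what lets every thread contribute.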


```python
import faiss
print(hasattr(faiss, 'IndexSVSVamana')) # True if SVS is available
```

Suggested change
print(hasattr(faiss, 'IndexSVSVamana')) # True if SVS is available
"SVS" in faiss.get_compile_options() # True if SVS is available


**A:** Try these adjustments:
1. Increase `search_window_size` (e.g., 50 or 100)
2. Use higher-bit compression (LVQ4x8 → LVQ8)

Suggested change
2. Use higher-bit compression (LVQ4x8 → LVQ8)
2. Use higher-bit compression (LVQ4x8 → LVQ8x0)

@ahuber21 left a comment

Dropping my change request to not block anything while I'm OOO. Still would like to see the FAISS doc sorted out though.

@rsiyer-intel
Collaborator

@napetrov Do you plan to drop FAISS document in this PR?

@napetrov
Author

> @napetrov Do you plan to drop FAISS document in this PR?

Yes, removed.

I'd like to give a couple of other people the opportunity to look at this PR.

Member

@mihaic left a comment

Great work, @napetrov! I have some suggestions for your consideration.


### Creating an SVS-VAMANA Index

```bash
```

Member

Remove "bash" marker.

Author

Fixed, removed bash markers from Redis command blocks.
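
For readers following this thread, here is a hedged sketch of how an SVS-VAMANA field definition could be assembled programmatically so that the attribute count is derived instead of typed by hand (a later commit in this PR fixes exactly such a miscount). The index name `idx:docs`, the `doc:` prefix, and the field name `embedding` are hypothetical. The `COMPRESSION` attribute name does not appear in this thread and is assumed from the Redis documentation. The helper only builds a list of tokens and does not contact a server.

```python
def svs_vamana_field(name: str, **attrs) -> list:
    """Build the schema tokens for one SVS-VAMANA vector field.

    The number after SVS-VAMANA counts the tokens that follow it
    (every attribute contributes a name and a value), so it is
    computed here from the attributes actually supplied.
    """
    tokens = []
    for key, value in attrs.items():
        tokens += [key.upper(), str(value)]
    return [name, "VECTOR", "SVS-VAMANA", str(len(tokens)), *tokens]


# Hypothetical index name, key prefix, and field name.
command = [
    "FT.CREATE", "idx:docs", "ON", "HASH", "PREFIX", "1", "doc:", "SCHEMA",
    *svs_vamana_field(
        "embedding",
        type="FLOAT32",
        dim=1536,
        distance_metric="L2",
        compression="LeanVec4x8",  # attribute name assumed, see the note above
        reduce=384,
        training_threshold=50000,
    ),
]
print(" ".join(command))
```

With six attributes the derived count is 12, which matches the corrected LeanVec example mentioned in the commit notes.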


## Intel Scalable Vector Search (SVS)

[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and we are working on integrating it into various popular solutions to bring these optimizations to a wider audience.

Member

Suggested change
[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and we are working on integrating it into various popular solutions to bring these optimizations to a wider audience.
[Intel Scalable Vector Search (SVS)](https://intel.github.io/ScalableVectorSearch/) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and is integrated into popular solutions to bring these optimizations to a wider audience.

Author

Good catch — SVS is already integrated into Redis, not just 'working on it'. Fixed.

Both LVQ and LeanVec support two-level compression schemes:

1. **Level 1**: Fast candidate retrieval using compressed vectors
2. **Level 2**: Re-ranking using residual encoding for accuracy

Member

Suggested change
2. **Level 2**: Re-ranking using residual encoding for accuracy
2. **Level 2**: Re-ranking for accuracy (LVQ encodes residuals, LeanVec encodes the full dimensionality data)

Author

Applied — important distinction between LVQ residuals and LeanVec full dimensionality encoding.
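
The two-level idea discussed in this thread can be sketched in plain Python. This is not the SVS implementation: it substitutes simple uniform scalar quantization over a fixed range of [-1, 1] for LVQ's per-vector adaptive encoding, and every function name is hypothetical. It follows the LVQ variant described above, where Level 2 encodes the residual left over by Level 1.

```python
import random


def quantize(vec, bits, lo, hi):
    """Uniform scalar quantization: snap each component to one of 2**bits levels in [lo, hi]."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((min(max(x, lo), hi) - lo) / step) * step for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_two_level(vec, bits1=4, bits2=8):
    """Return (Level 1 reconstruction, Level 1 plus Level 2 reconstruction)."""
    level1 = quantize(vec, bits1, -1.0, 1.0)
    half_step = 1.0 / ((1 << bits1) - 1)  # half of the Level 1 step size
    residual = [x - q for x, q in zip(vec, level1)]
    level2 = quantize(residual, bits2, -half_step, half_step)
    return level1, [q + r for q, r in zip(level1, level2)]


def two_level_search(query, encoded, k, n_candidates):
    """Shortlist with Level 1 codes, then re-rank the shortlist with refined codes."""
    shortlist = sorted(
        range(len(encoded)), key=lambda i: sq_dist(query, encoded[i][0])
    )[:n_candidates]
    return sorted(shortlist, key=lambda i: sq_dist(query, encoded[i][1]))[:k]


random.seed(0)
data = [[random.uniform(-1, 1) for _ in range(32)] for _ in range(200)]
encoded = [encode_two_level(v) for v in data]
query = [random.uniform(-1, 1) for _ in range(32)]
print(two_level_search(query, encoded, k=5, n_candidates=20))
```

The first pass touches only the coarse codes for all 200 vectors, while the more precise reconstruction is computed for just the 20 shortlisted candidates.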

The naming convention reflects bits per dimension at each level:
- `LVQ4x8`: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension)
- `LVQ8`: Single-level, 8 bits per dimension
- `LeanVec4x8`: Dimensionality reduction + 4-bit Level 1 + 8-bit Level 2

Member

Suggested change
- `LeanVec4x8`: Dimensionality reduction + 4-bit Level 1 + 8-bit Level 2
- `LeanVec4x8`: 4-bit Level 1 encoding of reduced dimensionality data + 8-bit Level 2 encoding of full dimensionality data

Author

Applied, much clearer now.
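
As a worked example of the naming convention in this thread, the following sketch computes a nominal storage cost from a scheme name. `bits_per_vector` is a hypothetical helper, not an SVS or Redis API. It assumes the reading agreed above: for LVQ both levels cover every dimension, while for LeanVec Level 1 covers the reduced dimensions and Level 2 the full dimensions. It ignores per-vector constants and padding, so real ratios will be lower (the commit notes in this PR quote roughly 2.5x for LVQ4x8).

```python
import re
from typing import Optional

_SCHEME = re.compile(r"(LVQ|LeanVec)(\d+)(?:x(\d+))?")


def bits_per_vector(scheme: str, dim: int, reduced_dim: Optional[int] = None) -> int:
    """Nominal bits stored per vector for names such as LVQ8, LVQ4x8, LeanVec4x8."""
    match = _SCHEME.fullmatch(scheme)
    if match is None:
        raise ValueError(f"unrecognized scheme: {scheme!r}")
    family = match.group(1)
    level1_bits = int(match.group(2))
    level2_bits = int(match.group(3) or 0)  # single-level names have no Level 2
    if family == "LVQ":
        # Both levels cover every dimension.
        return (level1_bits + level2_bits) * dim
    if reduced_dim is None:
        raise ValueError("LeanVec needs reduced_dim")
    # Level 1 covers the reduced dimensions, Level 2 the full dimensions.
    return level1_bits * reduced_dim + level2_bits * dim


for scheme, dim, reduced in [
    ("LVQ8", 768, None),
    ("LVQ4x8", 768, None),
    ("LeanVec4x8", 1536, 384),
]:
    bits = bits_per_vector(scheme, dim, reduced)
    ratio = 32 * dim / bits
    print(f"{scheme}: {bits // 8} bytes per vector, {ratio:.2f}x smaller than FLOAT32 (nominal)")
```

The same parser accepts the `LVQ8x0` spelling suggested earlier in the review, which yields the same result as `LVQ8`.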

|-----------|-------------|---------|-----------------|
| TYPE | Vector data type (FLOAT16, FLOAT32) | - | FLOAT32 for accuracy, FLOAT16 for memory |
| DIM | Vector dimensions | - | Must match your embeddings |
| DISTANCE_METRIC | L2, IP, or COSINE | - | COSINE for normalized embeddings |

Member

Suggested change
| DISTANCE_METRIC | L2, IP, or COSINE | - | COSINE for normalized embeddings |
| DISTANCE_METRIC | L2, IP, or COSINE | - | L2 for normalized embeddings |

Author

Fixed — L2 for normalized embeddings.
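
The thread does not spell out the reasoning behind this change, but one relevant fact is that for unit-length vectors the squared L2 distance equals 2 minus twice the cosine similarity, so both metrics rank neighbors identically. The sketch below checks that identity numerically on random data using only the standard library. All names in it are illustrative.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


random.seed(0)
vectors = [normalize([random.gauss(0, 1) for _ in range(16)]) for _ in range(50)]
query = normalize([random.gauss(0, 1) for _ in range(16)])

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b).
identity_holds = all(
    math.isclose(squared_l2(query, v), 2 - 2 * cosine_similarity(query, v), abs_tol=1e-9)
    for v in vectors
)

# Nearest-first by L2 and most-similar-first by cosine give the same order.
by_l2 = sorted(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))
by_cosine = sorted(range(len(vectors)), key=lambda i: -cosine_similarity(query, vectors[i]))
print(identity_holds, by_l2 == by_cosine)
```

Since the rankings agree, choosing L2 for already-normalized embeddings does not change which neighbors are returned.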

Comment on lines +182 to +183
- You need faster index construction
- Working with lower-dimensional vectors (<512)

Member

Unless both are true, SVS has advantages.

Suggested change
- You need faster index construction
- Working with lower-dimensional vectors (<512)
- Working with lower-dimensional vectors (<512) and needing faster index construction


**A:** The basic SVS-VAMANA algorithm with 8-bit scalar quantization (SQ8) is available in Redis Open Source on all platforms. Intel's LVQ and LeanVec optimizations require:
- Intel hardware with AVX-512
- Redis Software (commercial) or building with `BUILD_INTEL_SVS_OPT=yes`

Member

Suggested change
- Redis Software (commercial) or building with `BUILD_INTEL_SVS_OPT=yes`
- Redis Software (commercial) or [building Redis Open Source](https://github.com/redis/redis?tab=readme-ov-file#running-redis-with-the-query-engine-and-optional-proprietary-intel-svs-vamana-optimisations) with `BUILD_INTEL_SVS_OPT=yes`

### Q: What if recall is too low with compression?

**A:** Try these steps in order:
1. Increase `TRAINING_THRESHOLD` (e.g., 50000)

Member

Suggested change
1. Increase `TRAINING_THRESHOLD` (e.g., 50000)
1. Increase `TRAINING_THRESHOLD` (e.g., 50000) if using LeanVec

1. Increase `TRAINING_THRESHOLD` (e.g., 50000)
2. Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8)
3. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128)
4. Increase `SEARCH_WINDOW_SIZE` at query time

Member

Increasing the search window size is definitely the first thing to try.

2. Switch to higher-bit compression (LVQ4x8 → LVQ8, or LeanVec4x8 → LeanVec8x8)
3. Increase `GRAPH_MAX_DEGREE` (e.g., 64 or 128)
4. Increase `SEARCH_WINDOW_SIZE` at query time
5. For LeanVec, try a larger `REDUCE` value (closer to original dimensions)

Member

REDUCE should be after TRAINING_THRESHOLD.
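
Taken together, the review comments on this FAQ answer imply a reordered list. The sketch below encodes one possible reading of that feedback: search window first, the two LeanVec-only steps next (with `REDUCE` directly after `TRAINING_THRESHOLD`), and the remaining steps in their original relative order. `RECALL_STEPS` and `recall_steps` are hypothetical names, and the final ordering in the guide is the author's call.

```python
# Each entry: (suggested step, compression family it is limited to, or None for all).
RECALL_STEPS = [
    ("Increase SEARCH_WINDOW_SIZE at query time", None),
    ("Increase TRAINING_THRESHOLD (e.g., 50000)", "LeanVec"),
    ("Try a larger REDUCE value (closer to original dimensions)", "LeanVec"),
    ("Switch to higher-bit compression", None),
    ("Increase GRAPH_MAX_DEGREE (e.g., 64 or 128)", None),
]


def recall_steps(compression: str) -> list:
    """Return the steps that apply to the given compression name, in order."""
    return [
        step
        for step, family in RECALL_STEPS
        if family is None or compression.startswith(family)
    ]


for name in ("LVQ4x8", "LeanVec4x8"):
    print(name)
    for number, step in enumerate(recall_steps(name), start=1):
        print(f"  {number}. {step}")
```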

- Fix SVS intro: 'is integrated' instead of 'working on integrating'
- Clarify Level 2 compression: LVQ encodes residuals, LeanVec encodes full dimensionality
- Fix LeanVec4x8 description with reduced/full dimensionality detail
- Fix DISTANCE_METRIC guidance: L2 for normalized embeddings
- Add 'slower build' note to CONSTRUCTION_WINDOW_SIZE
- Fix compression ratios: LVQ4x8 ~2.5x, LeanVec notation with f factor
- Add LeanVec dimensionality reduction factor explanation
- Remove bash markers from Redis command blocks
- Fix SVS-VAMANA parameter count (14 → 12) in LeanVec example
- Remove EPSILON row (not SVS-specific, range search only)
- Fix attribution: Redis benchmarking (authors are Redis)
- Fix precision metric: remove '+' (calibrated value)
- Clarify SVS-VAMANA effectiveness: 'improving throughput'

Co-authored-by: mihaic <mihaic@users.noreply.github.com>
@napetrov requested review from ahuber21 and mihaic on February 17, 2026, 17:52
Member

@mihaic left a comment

A few of my earlier suggestions are unaddressed and there is still a reference to the deleted Faiss document.

Co-authored-by: Mihai Capotă <mihai@mihaic.ro>