VectorWise

High-performance Approximate Nearest Neighbor search over 1 million vectors.

Built with Faiss HNSW + FastAPI. Sub-10ms queries. 95%+ recall. Production-ready.

Python 3.11+ FastAPI Faiss Docker License: MIT Tests: 25 passed


What is VectorWise?

VectorWise is a REST API that finds the most similar vectors in a dataset of one million entries -- in under 10 milliseconds.

Think of it like this: you have a library with one million books. Someone hands you a book and asks "find me the 10 most similar books." A brute-force search reads every single book -- slow. VectorWise uses an HNSW graph (a clever shortcut structure) to jump between "neighborhoods" of similar books, finding the answer in logarithmic time instead of linear time.

This project demonstrates end-to-end ML infrastructure engineering:

  • Data pipeline -- synthetic vector generation, L2 normalization, index construction
  • Algorithm selection -- HNSW parameter tuning for the recall/latency trade-off
  • API design -- typed request/response models, input validation, structured error handling
  • Containerization -- Docker image with volume-mounted index, health checks, compose orchestration
  • Testing -- 25 unit tests with FastAPI TestClient, integration test suite, benchmark framework
  • Observability -- /stats endpoint, structured logging, benchmark JSON export

Key Results

Real numbers from actual runs on this codebase:

Metric                | Value
Dataset               | 1,000,000 vectors, 128 dimensions
Index build time      | 551.46 seconds
Index size on disk    | 747.80 MB (index.faiss)
Vector data size      | 488.28 MB (vectors.npy)
Average query latency | ~4-6 ms
P95 latency           | ~8 ms
Recall@10             | 95-98%
Unit tests            | 25/25 passing (0.10s)

How It Works

Architecture

                          ┌──────────────────────────────────┐
                          │         Docker Container         │
                          │                                  │
  Client ──HTTP/JSON──►   │  FastAPI (api/main.py)           │
                          │    │                             │
                          │    ├─ GET  /        → Health     │
                          │    ├─ GET  /stats   → Index info │
                          │    └─ POST /search  → k-NN       │
                          │         │                        │
                          │         ▼                        │
                          │  ┌──────────────────────┐        │
                          │  │  Faiss HNSW Index    │        │
                          │  │  1M vectors in RAM   │        │
                          │  │  O(log N) search     │        │
                          │  └──────────────────────┘        │
                          │                                  │
                          └──────────────────────────────────┘

Request lifecycle

  1. Client sends POST /search with a 128-dim vector and k
  2. Pydantic validates the request body (dimension check, k bounds)
  3. Query vector is L2-normalized to match the indexed vectors
  4. Faiss HNSW searches the graph in O(log N) time
  5. Top-k indices and L2 distances are returned as JSON
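
The core of this flow can be sketched as follows. This is a simplified illustration, not the actual api/main.py -- names, startup handling, and error details in the repository may differ:

# Simplified sketch of the /search flow (illustrative only).
import faiss
import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

DIM, MAX_K = 128, 100

class SearchRequest(BaseModel):
    query_vector: list[float]
    k: int = Field(gt=0, le=MAX_K)      # out-of-range k -> 422 from Pydantic

class SearchResponse(BaseModel):
    indices: list[int]
    distances: list[float]

app = FastAPI()
index = faiss.read_index("index.faiss")  # the real app loads this once at startup

@app.post("/search", response_model=SearchResponse)
def search(req: SearchRequest) -> SearchResponse:
    if len(req.query_vector) != DIM:
        raise HTTPException(status_code=400, detail=f"query_vector must have {DIM} dimensions")
    q = np.asarray(req.query_vector, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(q)                        # match the L2-normalized index
    distances, indices = index.search(q, req.k)  # HNSW graph traversal
    return SearchResponse(indices=indices[0].tolist(), distances=distances[0].tolist())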

Why HNSW?

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each node connects to its nearest neighbors. Searching starts at the top layer (sparse, long-range links) and drills down to the bottom layer (dense, short-range links) -- like zooming into a map.

The trade-off knobs:

Parameter      | Value | What it controls
M              | 32    | Links per node. Higher = better recall, more memory
efConstruction | 200   | Build-time candidate list. Higher = better graph quality, slower build
efSearch       | 64    | Search-time candidate list. Higher = better recall, slower queries

These values achieve 95%+ Recall@10 at sub-10ms latency on 1M vectors -- a strong balance for general-purpose similarity search.
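
In Faiss, these knobs map onto IndexHNSWFlat roughly like this. The sketch below is illustrative; the actual build code lives in generate_data.py and the values come from config.py:

# Sketch: building the HNSW index with the parameters above.
import faiss
import numpy as np

vectors = np.load("vectors.npy")      # (1,000,000 x 128) float32, already L2-normalized

index = faiss.IndexHNSWFlat(128, 32)  # M = 32 links per node
index.hnsw.efConstruction = 200       # build-time candidate list
index.add(vectors)                    # ~9 minutes for 1M vectors on this codebase
index.hnsw.efSearch = 64              # search-time candidate list (tunable without rebuild)

faiss.write_index(index, "index.faiss")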


Tech Stack

Layer            | Technology                  | Role
Search engine    | Faiss (faiss-cpu)           | HNSW index, brute-force ground truth
API framework    | FastAPI                     | REST endpoints, Pydantic validation, OpenAPI docs
Server           | Uvicorn                     | ASGI server
Vectors          | NumPy                       | Generation, normalization, serialization
Containerization | Docker + Compose            | Deployment, volume mounts, health checks
Testing          | pytest + FastAPI TestClient | 25 unit tests, no running server needed
Language         | Python 3.11+                | Type hints throughout

Quick Start

Prerequisites

  • Python 3.11+
  • 2 GB+ RAM
  • ~1.5 GB free disk space
  • Docker & Docker Compose (optional, for containerized deployment)

1. Install dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Generate vectors and build the index

python generate_data.py

This creates two files:

  • vectors.npy -- 1M normalized 128-dim vectors (~488 MB)
  • index.faiss -- HNSW index with M=32, efConstruction=200 (~748 MB)

Example output:

============================================================
VectorWise - Data Generation & Index Building
============================================================

[1/4] Generating 1,000,000 vectors of dimension 128...
  Generated vectors with shape: (1000000, 128)
  Memory usage: 488.28 MB

[2/4] Saving vectors to 'vectors.npy'...
[3/4] Building Faiss HNSW index...
  Index built in 551.46 seconds
  Index contains 1,000,000 vectors

[4/4] Saving index to 'index.faiss'...

============================================================
INDEX STATISTICS
============================================================
Total vectors: 1,000,000
Dimension: 128
File size: 488.28 MB (vectors.npy)
File size: 747.80 MB (index.faiss)
============================================================

3. Start the service

Local:

uvicorn api.main:app --reload

Docker:

docker-compose up --build -d

The service loads the index into memory on startup and listens on port 8000.

4. Query the API

# Health check
curl http://localhost:8000/

# Search for 10 nearest neighbors
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query_vector": [0.1, 0.2, 0.3, '$(python -c "print(', '.join(['0.1'] * 125))")'],
    "k": 10
  }'
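
The same call from Python, as a minimal sketch -- it assumes the requests package is installed and the service is running on localhost:8000; the repository's examples.py ships a fuller VectorWiseClient:

# Minimal Python client sketch.
import requests

query = [0.1] * 128                    # any 128-dim vector
resp = requests.post(
    "http://localhost:8000/search",
    json={"query_vector": query, "k": 10},
    timeout=5,
)
resp.raise_for_status()
result = resp.json()
print(result["indices"])               # IDs of the 10 nearest vectors
print(result["distances"])             # corresponding L2 distances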

Or use the interactive docs at http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc).


API Reference

GET / -- Health check

{
  "service": "VectorWise",
  "status": "healthy",
  "vectors_indexed": 1000000
}

GET /stats -- Index statistics

{
  "total_vectors": 1000000,
  "dimension": 128,
  "index_type": "IndexHNSWFlat",
  "hnsw_m": 32,
  "hnsw_efSearch": 64,
  "hnsw_efConstruction": 200
}

POST /search -- k-NN search

Request:

{
  "query_vector": [0.1, 0.2, "... (128 floats)"],
  "k": 10
}

Response:

{
  "indices": [482953, 192847, 738291, "..."],
  "distances": [0.0000, 1.2234, 1.2456, "..."]
}

Error responses:

Status | Condition
400    | Wrong vector dimension (not 128)
422    | Missing fields, k <= 0, k > 100, non-numeric vector
500    | Internal search error (details logged server-side)
503    | Index not loaded

Performance

Latency

Measured over 1,000 queries against the full 1M-vector index:

Metric  | Value
Average | ~4-6 ms
Median  | ~4 ms
P95     | ~8 ms
P99     | ~12 ms

Recall

Metric              | Value
Recall@10 (average) | 95-98%
Recall@10 (minimum) | 90%
Recall@10 (maximum) | 100%

Recall is measured against brute-force (exact) search as ground truth.
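
Conceptually, the measurement works like the sketch below. It is simplified from what benchmark.py does, and the exact implementation in the repository may differ:

# Sketch: Recall@10 against exact (brute-force) ground truth.
import faiss
import numpy as np

vectors = np.load("vectors.npy")              # float32, L2-normalized
hnsw = faiss.read_index("index.faiss")

exact = faiss.IndexFlatL2(vectors.shape[1])   # brute-force index = ground truth
exact.add(vectors)

rng = np.random.default_rng(42)
queries = vectors[rng.choice(len(vectors), 1000, replace=False)]

k = 10
_, approx_ids = hnsw.search(queries, k)
_, exact_ids = exact.search(queries, k)

# Recall@10 = average fraction of the true top-10 that the HNSW index also returned
recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(approx_ids, exact_ids)])
print(f"Recall@10: {recall:.3f}")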

Tuning profiles

Different workloads need different trade-offs. Adjust efSearch at search time (no rebuild required):

Profile            | efSearch | Expected Recall@10 | Expected Latency | Use case
Low latency        | 32-40    | 90-93%             | <3 ms            | Real-time recommendations
Balanced (default) | 64       | 95-98%             | 4-6 ms           | General-purpose search
High accuracy      | 100-150  | 98-99%+            | 8-15 ms          | Critical search applications

To change efSearch, set the EF_SEARCH constant in config.py. For deeper optimization, tune the build-time parameters (HNSW_M, EF_CONSTRUCTION) -- those require a full index rebuild.
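
Because efSearch is a pure search-time parameter, it can also be flipped on an already-loaded index for a quick experiment, using the Faiss API directly rather than config.py:

# Sketch: switching tuning profiles on a built index (no rebuild needed).
import faiss

index = faiss.read_index("index.faiss")
index.hnsw.efSearch = 100    # "high accuracy" profile from the table above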

Complexity

Operation      | Complexity | Notes
Index build    | O(N log N) | N = 1M vectors, ~9 min
Single query   | O(log N)   | Average case via HNSW graph traversal
Index memory   | O(N * M)   | M = 32 connections per node
Vector storage | O(N * D)   | D = 128 dimensions, float32
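
As a quick sanity check on the last row: 1,000,000 × 128 × 4 bytes (float32) ≈ 488 MB, which matches the vectors.npy size reported earlier.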

Project Structure

VectorWise/
├── api/
│   ├── __init__.py              # Package init
│   └── main.py                  # FastAPI app: endpoints, lifespan, CORS
├── tests/
│   ├── __init__.py              # Package init
│   ├── conftest.py              # Shared fixtures (small index, TestClient)
│   └── test_api.py              # 25 unit tests across 5 test classes
├── config.py                    # Shared configuration (dims, HNSW params, paths)
├── generate_data.py             # Vector generation + HNSW index building
├── benchmark.py                 # Latency + Recall@10 measurement suite
├── examples.py                  # 6 usage examples with VectorWiseClient
├── test_api.py                  # Integration tests (requires running server)
├── requirements.txt             # Python dependencies (>= constraints)
├── Dockerfile                   # Python 3.11-slim, health check
├── docker-compose.yml           # Service orchestration, volume mount
├── .dockerignore                # Excludes vectors.npy, .git, __pycache__
├── .gitignore                   # Excludes generated data, venv, caches
├── pytest.ini                   # Test configuration
├── LICENSE                      # MIT
└── README.md                    # This file

Generated at runtime (git-ignored):

File                   | Size    | Created by
vectors.npy            | ~488 MB | generate_data.py
index.faiss            | ~748 MB | generate_data.py
benchmark_results.json | ~1 KB   | benchmark.py

Docker

Build and run

docker-compose up --build -d

The Compose file mounts index.faiss as a read-only volume into the container -- the image itself stays small since it doesn't bake in the 748 MB index.
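
The mount itself is a one-line volume entry in the service definition. A representative sketch is shown below; the container path is an assumption and the repository's actual docker-compose.yml may differ:

services:
  api:
    build: .
    container_name: vectorwise-api
    ports:
      - "8000:8000"
    volumes:
      # Index stays outside the image and is mounted read-only (path assumed).
      - ./index.faiss:/app/index.faiss:ro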

Common commands

docker-compose ps              # Check status
docker-compose logs -f         # Stream logs
docker-compose restart         # Restart service
docker-compose down            # Stop and remove containers

Health check

The container includes a health check that hits GET / every 30 seconds. Check status with:

docker inspect --format='{{.State.Health.Status}}' vectorwise-api

Testing

Unit tests (no server required)

The test suite uses FastAPI's TestClient with a small 100-vector index -- fast and isolated:

pytest
========================= test session starts ==========================
collected 25 items

tests/test_api.py::TestHealthCheck::test_health_returns_200 PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_service_name PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_healthy_status PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_vector_count PASSED
tests/test_api.py::TestStats::test_stats_returns_200 PASSED
tests/test_api.py::TestStats::test_stats_contains_total_vectors PASSED
tests/test_api.py::TestStats::test_stats_contains_dimension PASSED
tests/test_api.py::TestStats::test_stats_contains_index_type PASSED
tests/test_api.py::TestStats::test_stats_contains_hnsw_params PASSED
tests/test_api.py::TestSearch::test_search_returns_200 PASSED
tests/test_api.py::TestSearch::test_search_returns_correct_k PASSED
tests/test_api.py::TestSearch::test_search_returns_indices_and_distances PASSED
tests/test_api.py::TestSearch::test_search_distances_are_non_negative PASSED
tests/test_api.py::TestSearch::test_search_distances_are_sorted PASSED
tests/test_api.py::TestSearch::test_search_indices_are_within_range PASSED
tests/test_api.py::TestSearchValidation::test_wrong_dimension_returns_400 PASSED
tests/test_api.py::TestSearchValidation::test_k_zero_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_k_negative_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_k_exceeds_max_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_missing_query_vector_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_missing_k_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_empty_body_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_non_numeric_vector_returns_422 PASSED
tests/test_api.py::TestOpenAPIDocs::test_openapi_schema_available PASSED
tests/test_api.py::TestOpenAPIDocs::test_docs_endpoint_available PASSED
========================= 25 passed in 0.10s ===========================

Test coverage:

  • Health check endpoint (4 tests)
  • Index statistics endpoint (5 tests)
  • Search endpoint correctness (6 tests)
  • Input validation and error handling (8 tests)
  • OpenAPI documentation (2 tests)
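
The plumbing behind these tests looks roughly like the sketch below. It assumes the app picks up VECTORWISE_INDEX_PATH when it starts; the real fixtures live in tests/conftest.py and may be structured differently:

# Sketch of a conftest.py-style fixture: tiny index + TestClient, no server needed.
import os
import faiss
import numpy as np
import pytest
from fastapi.testclient import TestClient

@pytest.fixture(scope="session")
def client(tmp_path_factory):
    # Build a throwaway 100-vector index so tests never need the 1M-vector files.
    rng = np.random.default_rng(0)
    vecs = rng.random((100, 128), dtype=np.float32)
    faiss.normalize_L2(vecs)
    index = faiss.IndexHNSWFlat(128, 32)
    index.add(vecs)

    path = tmp_path_factory.mktemp("data") / "index.faiss"
    faiss.write_index(index, str(path))
    os.environ["VECTORWISE_INDEX_PATH"] = str(path)

    from api.main import app          # import after the env var is set
    with TestClient(app) as c:        # context manager triggers startup (index load)
        yield c

def test_search_returns_requested_k(client):
    resp = client.post("/search", json={"query_vector": [0.1] * 128, "k": 5})
    assert resp.status_code == 200
    assert len(resp.json()["indices"]) == 5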

Integration tests (requires running server)

# Start the server first
uvicorn api.main:app --reload &

# Run integration tests
python test_api.py

Benchmarks

# Requires running server + generated data files
python benchmark.py

Measures latency (avg, median, P95, P99) and Recall@10 against brute-force ground truth over 1,000 queries. Results saved to benchmark_results.json.
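
The latency side can be reproduced with a loop like this sketch (benchmark.py is the canonical version; this assumes the requests package and a running server):

# Sketch: measuring query latency percentiles against a running server.
import time
import numpy as np
import requests

latencies_ms = []
for _ in range(1000):
    q = np.random.rand(128).tolist()
    start = time.perf_counter()
    requests.post("http://localhost:8000/search",
                  json={"query_vector": q, "k": 10}, timeout=5)
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"avg={np.mean(latencies_ms):.2f} ms  "
      f"p95={np.percentile(latencies_ms, 95):.2f} ms  "
      f"p99={np.percentile(latencies_ms, 99):.2f} ms")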


Configuration

All tunable parameters live in config.py; several can also be overridden via environment variables:

Parameter       | Default     | Env variable            | Description
DIM             | 128         | --                      | Vector dimensionality
N_VECTORS       | 1,000,000   | --                      | Dataset size
HNSW_M          | 32          | --                      | Graph connectivity
EF_CONSTRUCTION | 200         | --                      | Build-time quality
EF_SEARCH       | 64          | --                      | Search-time quality/speed trade-off
INDEX_PATH      | index.faiss | VECTORWISE_INDEX_PATH   | Path to Faiss index
VECTORS_PATH    | vectors.npy | VECTORWISE_VECTORS_PATH | Path to vector data
API_HOST        | 0.0.0.0     | VECTORWISE_HOST         | API bind address
API_PORT        | 8000        | VECTORWISE_PORT         | API port
MAX_K           | 100         | --                      | Maximum neighbors per query
CORS_ORIGINS    | *           | VECTORWISE_CORS_ORIGINS | Comma-separated allowed origins
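
For example, to point the service at an index stored elsewhere (assuming config.py reads the variable at startup):

VECTORWISE_INDEX_PATH=/data/index.faiss uvicorn api.main:app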

Roadmap

v1.1 -- API enhancements

  • Batch search endpoint (multiple queries in one request)
  • Metadata filtering (attach labels/tags to vectors)
  • Configurable distance metric (L2, inner product, cosine)

v1.2 -- Observability

  • Prometheus metrics (query latency histogram, request count, error rate)
  • Structured JSON logging
  • Grafana dashboard template

v1.3 -- Performance

  • GPU-accelerated search (faiss-gpu)
  • Index memory-mapping for reduced RAM usage
  • Request queuing with Celery for batch workloads

v2.0 -- Scale

  • Distributed sharding across multiple nodes
  • Online index updates (add/delete without full rebuild)
  • Redis caching layer for hot queries
  • Authentication and rate limiting

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines on setup, code style, testing, and pull requests.


License

MIT -- Efe Can Kara, 2025
