High-performance Approximate Nearest Neighbor search over 1 million vectors.
Built with Faiss HNSW + FastAPI. Sub-10ms queries. 95%+ recall. Production-ready.
VectorWise is a REST API that finds the most similar vectors in a dataset of one million entries -- in under 10 milliseconds.
Think of it like this: you have a library with one million books. Someone hands you a book and asks "find me the 10 most similar books." A brute-force search reads every single book -- slow. VectorWise uses an HNSW graph (a clever shortcut structure) to jump between "neighborhoods" of similar books, finding the answer in logarithmic time instead of linear time.
This project demonstrates end-to-end ML infrastructure engineering:
- Data pipeline -- synthetic vector generation, L2 normalization, index construction
- Algorithm selection -- HNSW parameter tuning for the recall/latency trade-off
- API design -- typed request/response models, input validation, structured error handling
- Containerization -- Docker image with volume-mounted index, health checks, compose orchestration
- Testing -- 25 unit tests with FastAPI TestClient, integration test suite, benchmark framework
- Observability -- /stats endpoint, structured logging, benchmark JSON export
Real numbers from actual runs on this codebase:
| Metric | Value |
|---|---|
| Dataset | 1,000,000 vectors, 128 dimensions |
| Index build time | 551.46 seconds |
| Index size on disk | 747.80 MB (index.faiss) |
| Vector data size | 488.28 MB (vectors.npy) |
| Average query latency | ~4-6 ms |
| P95 latency | ~8 ms |
| Recall@10 | 95-98% |
| Unit tests | 25/25 passing (0.10s) |
                      ┌──────────────────────────────────┐
                      │  Docker Container                │
                      │                                  │
Client ──HTTP/JSON──► │  FastAPI (api/main.py)           │
                      │    │                             │
                      │    ├─ GET  /       → Health      │
                      │    ├─ GET  /stats  → Index info  │
                      │    └─ POST /search → k-NN        │
                      │         │                        │
                      │         ▼                        │
                      │  ┌─────────────────────┐         │
                      │  │  Faiss HNSW Index   │         │
                      │  │  1M vectors in RAM  │         │
                      │  │  O(log N) search    │         │
                      │  └─────────────────────┘         │
                      │                                  │
                      └──────────────────────────────────┘
- Client sends POST /search with a 128-dim vector and k
- Pydantic validates the request body (dimension check, k bounds)
- Query vector is L2-normalized to match the indexed vectors
- Faiss HNSW searches the graph in O(log N) time
- Top-k indices and L2 distances are returned as JSON
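The normalization step in this flow can be sketched in plain Python. This is an illustration only, not the project's code (which presumably uses NumPy), and the helper names are made up for the example. It also shows why normalizing matters: for unit vectors, squared L2 distance and cosine similarity carry the same ranking information, since ||a - b||^2 = 2 - 2·cos(a, b).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; the zero vector has no direction."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two toy 2-D vectors stand in for the 128-dim query and an indexed vector.
q = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])

# For unit vectors the two quantities are tied together:
# squared_l2(q, v) == 2 - 2 * cosine(q, v), up to floating-point error.
print(squared_l2(q, v), 2 - 2 * cosine(q, v))
```

Faiss L2 indexes report squared distances by default. Whether VectorWise returns them unchanged or takes a square root first is not stated here, so check the /search handler before interpreting the distances field.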
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each node connects to its nearest neighbors. Searching starts at the top layer (sparse, long-range links) and drills down to the bottom layer (dense, short-range links) -- like zooming into a map.
The trade-off knobs:
| Parameter | Value | What it controls |
|---|---|---|
| M | 32 | Links per node. Higher = better recall, more memory |
| efConstruction | 200 | Build-time candidate list. Higher = better graph quality, slower build |
| efSearch | 64 | Search-time candidate list. Higher = better recall, slower queries |
These values achieve 95%+ Recall@10 at sub-10ms latency on 1M vectors -- a strong balance for general-purpose similarity search.
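To make the efSearch knob concrete, here is a toy, single-layer version of the best-first search that HNSW runs on each layer. It is a sketch under loud assumptions: the graph is a brute-force k-NN graph over random 2-D points, which is not how Faiss builds or stores its index, and every name in it (`build_knn_graph`, `search_layer`) is hypothetical. The `ef` argument plays the role of efSearch: it caps how many candidates the search keeps, so a larger value explores more of the graph before stopping.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, m):
    """Link every point to its m nearest neighbors (brute force, symmetric links)."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        for j in heapq.nsmallest(m, others, key=lambda j: sq_dist(p, points[j])):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search_layer(points, graph, query, entry, ef):
    """Best-first search that keeps at most ef results; returns sorted (distance, id)."""
    d0 = sq_dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    best = [(-d0, entry)]        # max-heap via negation: the ef closest seen so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:      # nearest candidate is worse than the worst kept result
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, points[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-neg_d, node) for neg_d, node in best)


random.seed(0)
points = [(random.random(), random.random()) for _ in range(300)]
graph = build_knn_graph(points, m=6)
query = (0.5, 0.5)
exact = {i for _, i in sorted((sq_dist(query, p), i) for i, p in enumerate(points))[:10]}

for ef in (10, 40):
    found = {i for _, i in search_layer(points, graph, query, entry=0, ef=ef)[:10]}
    print(f"ef={ef}: {len(found & exact)}/10 true neighbors found")
```

The real index adds the upper layers, which serve mainly to pick a good entry point for this bottom-layer search.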
| Layer | Technology | Role |
|---|---|---|
| Search engine | Faiss (faiss-cpu) | HNSW index, brute-force ground truth |
| API framework | FastAPI | REST endpoints, Pydantic validation, OpenAPI docs |
| Server | Uvicorn | ASGI server |
| Vectors | NumPy | Generation, normalization, serialization |
| Containerization | Docker + Compose | Deployment, volume mounts, health checks |
| Testing | pytest + FastAPI TestClient | 25 unit tests, no running server needed |
| Language | Python 3.11+ | Type hints throughout |
- Python 3.11+
- 2 GB+ RAM
- ~1.5 GB free disk space
- Docker & Docker Compose (optional, for containerized deployment)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python generate_data.py
This creates two files:
- vectors.npy -- 1M normalized 128-dim vectors (~488 MB)
- index.faiss -- HNSW index with M=32, efConstruction=200 (~748 MB)
============================================================
VectorWise - Data Generation & Index Building
============================================================
[1/4] Generating 1,000,000 vectors of dimension 128...
Generated vectors with shape: (1000000, 128)
Memory usage: 488.28 MB
[2/4] Saving vectors to 'vectors.npy'...
[3/4] Building Faiss HNSW index...
Index built in 551.46 seconds
Index contains 1,000,000 vectors
[4/4] Saving index to 'index.faiss'...
============================================================
INDEX STATISTICS
============================================================
Total vectors: 1,000,000
Dimension: 128
File size: 488.28 MB (vectors.npy)
File size: 747.80 MB (index.faiss)
============================================================
Local:
uvicorn api.main:app --reload
Docker:
docker-compose up --build -d
The service loads the index into memory on startup and listens on port 8000.
# Health check
curl http://localhost:8000/
# Search for 10 nearest neighbors
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{
"query_vector": [0.1, 0.2, 0.3, '"$(python -c "print(', '.join(['0.1'] * 125))")"'],
"k": 10
}'
Or use the interactive docs at http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc).
GET / response:
{
"service": "VectorWise",
"status": "healthy",
"vectors_indexed": 1000000
}
GET /stats response:
{
"total_vectors": 1000000,
"dimension": 128,
"index_type": "IndexHNSWFlat",
"hnsw_m": 32,
"hnsw_efSearch": 64,
"hnsw_efConstruction": 200
}
Request:
{
"query_vector": [0.1, 0.2, "... (128 floats)"],
"k": 10
}
Response:
{
"indices": [482953, 192847, 738291, "..."],
"distances": [0.0000, 1.2234, 1.2456, "..."]
}
Error responses:
| Status | Condition |
|---|---|
| 400 | Wrong vector dimension (not 128) |
| 422 | Missing fields, k <= 0, k > 100, non-numeric vector |
| 500 | Internal search error (details logged server-side) |
| 503 | Index not loaded |
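A client can catch most of the 400 and 422 cases before sending anything. The sketch below mirrors the limits stated in this README (128 dimensions, k between 1 and 100) and builds a request with the standard library only. The function name and the default base URL are assumptions for illustration; the project's own examples.py may be structured differently.

```python
import json
import urllib.request

DIM = 128     # vector dimensionality stated in this README
MAX_K = 100   # maximum k stated in this README


def build_search_request(vector, k, base_url="http://localhost:8000"):
    """Validate inputs locally, then build (but do not send) a POST /search request."""
    if len(vector) != DIM:
        raise ValueError(f"expected {DIM} values, got {len(vector)}")  # server: 400
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vector):
        raise ValueError("query_vector must contain only numbers")      # server: 422
    if not isinstance(k, int) or isinstance(k, bool) or not 1 <= k <= MAX_K:
        raise ValueError(f"k must be an integer in 1..{MAX_K}")          # server: 422
    body = json.dumps({"query_vector": [float(x) for x in vector], "k": k})
    return urllib.request.Request(
        f"{base_url}/search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request([0.1] * DIM, k=10)
print(req.get_method(), req.full_url)
# To send it against a running server:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

Local validation does not replace the server-side checks; it just avoids a round trip for requests that are certain to fail.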
Measured over 1,000 queries against the full 1M-vector index:
| Metric | Value |
|---|---|
| Average | ~4-6 ms |
| Median | ~4 ms |
| P95 | ~8 ms |
| P99 | ~12 ms |
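For readers unfamiliar with the P95/P99 rows, the following sketch shows one common way to reduce a list of per-query timings to these four numbers. It uses the nearest-rank percentile definition and synthetic timings; benchmark.py may compute percentiles differently (for example with interpolation), so treat this as an illustration of the idea only.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Collapse raw per-query latencies (in ms) into the summary metrics."""
    return {
        "avg": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Synthetic timings of 1, 2, ..., 100 ms, used here only to make the output easy to check.
stats = summarize([float(i) for i in range(1, 101)])
print(stats)
```

The tail percentiles matter because the average hides slow outliers: a service can average 5 ms while one request in a hundred takes more than twice that.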
| Metric | Value |
|---|---|
| Recall@10 (average) | 95-98% |
| Recall@10 (minimum) | 90% |
| Recall@10 (maximum) | 100% |
Recall is measured against brute-force (exact) search as ground truth.
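Recall@k is the fraction of the true k nearest neighbors that the approximate search also returned. A minimal sketch of the measurement, with a brute-force scan standing in for the exact ground truth (function names are illustrative, not taken from benchmark.py):

```python
def sq_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(vectors, query, k):
    """Exact k nearest neighbors by scanning every vector: O(N) per query."""
    order = sorted(range(len(vectors)), key=lambda i: sq_l2(vectors[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k that also appears in the approximate top-k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
exact = brute_force_knn(vectors, (0.9,), k=3)   # ids of the three closest points
approx = [1, 0, 4]                              # pretend ANN result with one miss
print(exact, recall_at_k(approx, exact, k=3))
```

Averaging this value over many queries gives the Recall@10 figures in the table above; the minimum and maximum rows are the worst and best single queries.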
Different workloads need different trade-offs. Adjust efSearch at search time (no rebuild required):
| Profile | efSearch | Expected Recall@10 | Expected Latency | Use case |
|---|---|---|---|---|
| Low latency | 32-40 | 90-93% | <3 ms | Real-time recommendations |
| Balanced (default) | 64 | 95-98% | 4-6 ms | General-purpose search |
| High accuracy | 100-150 | 98-99%+ | 8-15 ms | Critical search applications |
To change efSearch, set the EF_SEARCH constant in config.py. For deeper optimization, tune the build-time parameters (HNSW_M, EF_CONSTRUCTION) -- those require a full index rebuild.
| Operation | Complexity | Notes |
|---|---|---|
| Index build | O(N log N) | N = 1M vectors, ~9 min |
| Single query | O(log N) | Average case via HNSW graph traversal |
| Index memory | O(N * M) | M = 32 connections per node |
| Vector storage | O(N * D) | D = 128 dimensions, float32 |
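The two storage rows can be checked with back-of-the-envelope arithmetic. The vector figure is exact: 1,000,000 × 128 float32 values is 512,000,000 bytes, which is 488.28 when divided by 2^20, so the "MB" figures in this README are binary megabytes. The graph figure is only an estimate and rests on an assumption about Faiss internals: that the base layer stores 2 × M neighbor slots per node as 4-byte ids.

```python
N = 1_000_000      # vectors
D = 128            # dimensions
M = 32             # HNSW links per node (upper layers)
BYTES_FLOAT32 = 4
BYTES_ID = 4       # assumption: neighbor ids stored as 32-bit integers
MIB = 1024 ** 2

# Raw vectors: N * D float32 values.
vector_bytes = N * D * BYTES_FLOAT32

# Base-layer links: assumed 2 * M slots per node.
base_link_bytes = N * (2 * M) * BYTES_ID

print(f"vectors:          {vector_bytes / MIB:.2f} MiB")
print(f"base-layer links: {base_link_bytes / MIB:.2f} MiB")
print(f"estimated total:  {(vector_bytes + base_link_bytes) / MIB:.2f} MiB")
```

The estimate lands at roughly 732 MiB, a little under the measured 747.80 MB index file. The remainder is presumably upper-layer links and per-node bookkeeping, which this sketch does not model.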
VectorWise/
├── api/
│ ├── __init__.py # Package init
│ └── main.py # FastAPI app: endpoints, lifespan, CORS
├── tests/
│ ├── __init__.py # Package init
│ ├── conftest.py # Shared fixtures (small index, TestClient)
│ └── test_api.py # 25 unit tests across 5 test classes
├── config.py # Shared configuration (dims, HNSW params, paths)
├── generate_data.py # Vector generation + HNSW index building
├── benchmark.py # Latency + Recall@10 measurement suite
├── examples.py # 6 usage examples with VectorWiseClient
├── test_api.py # Integration tests (requires running server)
├── requirements.txt # Python dependencies (>= constraints)
├── Dockerfile # Python 3.11-slim, health check
├── docker-compose.yml # Service orchestration, volume mount
├── .dockerignore # Excludes vectors.npy, .git, __pycache__
├── .gitignore # Excludes generated data, venv, caches
├── pytest.ini # Test configuration
├── LICENSE # MIT
└── README.md # This file
Generated at runtime (git-ignored):
| File | Size | Created by |
|---|---|---|
| vectors.npy | ~488 MB | generate_data.py |
| index.faiss | ~748 MB | generate_data.py |
| benchmark_results.json | ~1 KB | benchmark.py |
docker-compose up --build -d
The Compose file mounts index.faiss as a read-only volume into the container -- the image itself stays small since it doesn't bake in the 748 MB index.
docker-compose ps # Check status
docker-compose logs -f # Stream logs
docker-compose restart # Restart service
docker-compose down # Stop and remove containers
The container includes a health check that hits GET / every 30 seconds. Check status with:
docker inspect --format='{{.State.Health.Status}}' vectorwise-api
The test suite uses FastAPI's TestClient with a small 100-vector index -- fast and isolated:
pytest
========================= test session starts ==========================
collected 25 items
tests/test_api.py::TestHealthCheck::test_health_returns_200 PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_service_name PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_healthy_status PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_vector_count PASSED
tests/test_api.py::TestStats::test_stats_returns_200 PASSED
tests/test_api.py::TestStats::test_stats_contains_total_vectors PASSED
tests/test_api.py::TestStats::test_stats_contains_dimension PASSED
tests/test_api.py::TestStats::test_stats_contains_index_type PASSED
tests/test_api.py::TestStats::test_stats_contains_hnsw_params PASSED
tests/test_api.py::TestSearch::test_search_returns_200 PASSED
tests/test_api.py::TestSearch::test_search_returns_correct_k PASSED
tests/test_api.py::TestSearch::test_search_returns_indices_and_distances PASSED
tests/test_api.py::TestSearch::test_search_distances_are_non_negative PASSED
tests/test_api.py::TestSearch::test_search_distances_are_sorted PASSED
tests/test_api.py::TestSearch::test_search_indices_are_within_range PASSED
tests/test_api.py::TestSearchValidation::test_wrong_dimension_returns_400 PASSED
tests/test_api.py::TestSearchValidation::test_k_zero_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_k_negative_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_k_exceeds_max_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_missing_query_vector_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_missing_k_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_empty_body_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_non_numeric_vector_returns_422 PASSED
tests/test_api.py::TestOpenAPIDocs::test_openapi_schema_available PASSED
tests/test_api.py::TestOpenAPIDocs::test_docs_endpoint_available PASSED
========================= 25 passed in 0.10s ===========================
Test coverage:
- Health check endpoint (4 tests)
- Index statistics endpoint (5 tests)
- Search endpoint correctness (6 tests)
- Input validation and error handling (8 tests)
- OpenAPI documentation (2 tests)
# Start the server first
uvicorn api.main:app --reload &
# Run integration tests
python test_api.py
# Requires running server + generated data files
python benchmark.py
Measures latency (avg, median, P95, P99) and Recall@10 against brute-force ground truth over 1,000 queries. Results saved to benchmark_results.json.
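All tunable parameters live in config.py. Those with an environment variable listed below can be overridden without editing the file; the rest (marked --) are changed in config.py directly: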
| Parameter | Default | Env variable | Description |
|---|---|---|---|
| DIM | 128 | -- | Vector dimensionality |
| N_VECTORS | 1,000,000 | -- | Dataset size |
| HNSW_M | 32 | -- | Graph connectivity |
| EF_CONSTRUCTION | 200 | -- | Build-time quality |
| EF_SEARCH | 64 | -- | Search-time quality/speed trade-off |
| INDEX_PATH | index.faiss | VECTORWISE_INDEX_PATH | Path to Faiss index |
| VECTORS_PATH | vectors.npy | VECTORWISE_VECTORS_PATH | Path to vector data |
| API_HOST | 0.0.0.0 | VECTORWISE_HOST | API bind address |
| API_PORT | 8000 | VECTORWISE_PORT | API port |
| MAX_K | 100 | -- | Maximum neighbors per query |
| CORS_ORIGINS | * | VECTORWISE_CORS_ORIGINS | Comma-separated allowed origins |
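As a rough picture of how such overrides typically work, the sketch below reads the environment variables from the table and falls back to the listed defaults. It is a hypothetical reconstruction, not the contents of config.py: the `load_settings` function and the dictionary layout are assumptions, and only the variable names and defaults come from the table.

```python
import os


def load_settings(environ=os.environ):
    """Resolve settings from the environment, using the documented defaults as fallbacks."""
    origins = environ.get("VECTORWISE_CORS_ORIGINS", "*")
    return {
        # Fixed in code: no environment override is documented for these.
        "DIM": 128,
        "N_VECTORS": 1_000_000,
        "HNSW_M": 32,
        "EF_CONSTRUCTION": 200,
        "EF_SEARCH": 64,
        "MAX_K": 100,
        # Overridable via environment variables.
        "INDEX_PATH": environ.get("VECTORWISE_INDEX_PATH", "index.faiss"),
        "VECTORS_PATH": environ.get("VECTORWISE_VECTORS_PATH", "vectors.npy"),
        "API_HOST": environ.get("VECTORWISE_HOST", "0.0.0.0"),
        "API_PORT": int(environ.get("VECTORWISE_PORT", "8000")),
        # Comma-separated list; blank entries are dropped.
        "CORS_ORIGINS": [o.strip() for o in origins.split(",") if o.strip()],
    }


defaults = load_settings({})
custom = load_settings({
    "VECTORWISE_PORT": "9000",
    "VECTORWISE_CORS_ORIGINS": "https://a.example, https://b.example",
})
print(defaults["API_PORT"], custom["API_PORT"], custom["CORS_ORIGINS"])
```

Passing the environment as an argument keeps the function easy to test, since a plain dictionary can stand in for `os.environ`.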
- Batch search endpoint (multiple queries in one request)
- Metadata filtering (attach labels/tags to vectors)
- Configurable distance metric (L2, inner product, cosine)
- Prometheus metrics (query latency histogram, request count, error rate)
- Structured JSON logging
- Grafana dashboard template
- GPU-accelerated search (faiss-gpu)
- Index memory-mapping for reduced RAM usage
- Request queuing with Celery for batch workloads
- Distributed sharding across multiple nodes
- Online index updates (add/delete without full rebuild)
- Redis caching layer for hot queries
- Authentication and rate limiting
Contributions are welcome. See CONTRIBUTING.md for guidelines on setup, code style, testing, and pull requests.
- Faiss wiki -- Index types, parameters, GPU support
- HNSW paper -- Malkov & Yashunin, 2016
- FastAPI docs -- Endpoints, Pydantic, middleware
- ANN Benchmarks -- Cross-library performance comparisons
MIT -- Efe Can Kara, 2025