VectorWise

High-performance Approximate Nearest Neighbor search over 1 million vectors.

Built with Faiss HNSW + FastAPI. Sub-10ms queries. 95%+ recall. Production-ready.

Python 3.11+ FastAPI Faiss Docker License: MIT Tests: 25 passed


What is VectorWise?

VectorWise is a REST API that finds the most similar vectors in a dataset of one million entries -- in under 10 milliseconds.

Think of it like this: you have a library with one million books. Someone hands you a book and asks "find me the 10 most similar books." A brute-force search reads every single book -- slow. VectorWise uses an HNSW graph (a clever shortcut structure) to jump between "neighborhoods" of similar books, finding the answer in logarithmic time instead of linear time.

This project demonstrates end-to-end ML infrastructure engineering:

  • Data pipeline -- synthetic vector generation, L2 normalization, index construction
  • Algorithm selection -- HNSW parameter tuning for the recall/latency trade-off
  • API design -- typed request/response models, input validation, structured error handling
  • Containerization -- Docker image with volume-mounted index, health checks, compose orchestration
  • Testing -- 25 unit tests with FastAPI TestClient, integration test suite, benchmark framework
  • Observability -- /stats endpoint, structured logging, benchmark JSON export

Key Results

Real numbers from actual runs on this codebase:

Metric                | Value
Dataset               | 1,000,000 vectors, 128 dimensions
Index build time      | 551.46 seconds
Index size on disk    | 747.80 MB (index.faiss)
Vector data size      | 488.28 MB (vectors.npy)
Average query latency | ~4-6 ms
P95 latency           | ~8 ms
Recall@10             | 95-98%
Unit tests            | 25/25 passing (0.10s)

How It Works

Architecture

                          ┌──────────────────────────────────┐
                          │         Docker Container         │
                          │                                  │
  Client ──HTTP/JSON──►   │  FastAPI (api/main.py)           │
                          │    │                             │
                          │    ├─ GET  /        → Health     │
                          │    ├─ GET  /stats   → Index info │
                          │    └─ POST /search  → k-NN       │
                          │         │                        │
                          │         ▼                        │
                          │  ┌──────────────────────┐        │
                          │  │  Faiss HNSW Index    │        │
                          │  │  1M vectors in RAM   │        │
                          │  │  O(log N) search     │        │
                          │  └──────────────────────┘        │
                          │                                  │
                          └──────────────────────────────────┘

Request lifecycle

  1. Client sends POST /search with a 128-dim vector and k
  2. Pydantic validates the request body (dimension check, k bounds)
  3. Query vector is L2-normalized to match the indexed vectors
  4. Faiss HNSW searches the graph in O(log N) time
  5. Top-k indices and L2 distances are returned as JSON
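
The core of this flow can be sketched as follows. This is a simplified illustration, not the actual api/main.py -- names, startup handling, and error details in the repository may differ:

# Simplified sketch of the /search flow (illustrative only).
import faiss
import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

DIM, MAX_K = 128, 100

class SearchRequest(BaseModel):
    query_vector: list[float]
    k: int = Field(gt=0, le=MAX_K)      # out-of-range k -> 422 from Pydantic

class SearchResponse(BaseModel):
    indices: list[int]
    distances: list[float]

app = FastAPI()
index = faiss.read_index("index.faiss")  # the real app loads this once at startup

@app.post("/search", response_model=SearchResponse)
def search(req: SearchRequest) -> SearchResponse:
    if len(req.query_vector) != DIM:
        raise HTTPException(status_code=400, detail=f"query_vector must have {DIM} dimensions")
    q = np.asarray(req.query_vector, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(q)                        # match the L2-normalized index
    distances, indices = index.search(q, req.k)  # HNSW graph traversal
    return SearchResponse(indices=indices[0].tolist(), distances=distances[0].tolist())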

Why HNSW?

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each node connects to its nearest neighbors. Searching starts at the top layer (sparse, long-range links) and drills down to the bottom layer (dense, short-range links) -- like zooming into a map.

The trade-off knobs:

Parameter      | Value | What it controls
M              | 32    | Links per node. Higher = better recall, more memory
efConstruction | 200   | Build-time candidate list. Higher = better graph quality, slower build
efSearch       | 64    | Search-time candidate list. Higher = better recall, slower queries

These values achieve 95%+ Recall@10 at sub-10ms latency on 1M vectors -- a strong balance for general-purpose similarity search.
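
In Faiss, these knobs map onto IndexHNSWFlat roughly like this. The sketch below is illustrative; the actual build code lives in generate_data.py and the values come from config.py:

# Sketch: building the HNSW index with the parameters above.
import faiss
import numpy as np

vectors = np.load("vectors.npy")      # (1,000,000 x 128) float32, already L2-normalized

index = faiss.IndexHNSWFlat(128, 32)  # M = 32 links per node
index.hnsw.efConstruction = 200       # build-time candidate list
index.add(vectors)                    # ~9 minutes for 1M vectors on this codebase
index.hnsw.efSearch = 64              # search-time candidate list (tunable without rebuild)

faiss.write_index(index, "index.faiss")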


Tech Stack

Layer            | Technology                  | Role
Search engine    | Faiss (faiss-cpu)           | HNSW index, brute-force ground truth
API framework    | FastAPI                     | REST endpoints, Pydantic validation, OpenAPI docs
Server           | Uvicorn                     | ASGI server
Vectors          | NumPy                       | Generation, normalization, serialization
Containerization | Docker + Compose            | Deployment, volume mounts, health checks
Testing          | pytest + FastAPI TestClient | 25 unit tests, no running server needed
Language         | Python 3.11+                | Type hints throughout

Quick Start

Prerequisites

  • Python 3.11+
  • 2 GB+ RAM
  • ~1.5 GB free disk space
  • Docker & Docker Compose (optional, for containerized deployment)

1. Install dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Generate vectors and build the index

python generate_data.py

This creates two files:

  • vectors.npy -- 1M normalized 128-dim vectors (~488 MB)
  • index.faiss -- HNSW index with M=32, efConstruction=200 (~748 MB)

Example output:

============================================================
VectorWise - Data Generation & Index Building
============================================================

[1/4] Generating 1,000,000 vectors of dimension 128...
  Generated vectors with shape: (1000000, 128)
  Memory usage: 488.28 MB

[2/4] Saving vectors to 'vectors.npy'...
[3/4] Building Faiss HNSW index...
  Index built in 551.46 seconds
  Index contains 1,000,000 vectors

[4/4] Saving index to 'index.faiss'...

============================================================
INDEX STATISTICS
============================================================
Total vectors: 1,000,000
Dimension: 128
File size: 488.28 MB (vectors.npy)
File size: 747.80 MB (index.faiss)
============================================================

3. Start the service

Local:

uvicorn api.main:app --reload

Docker:

docker-compose up --build -d

The service loads the index into memory on startup and listens on port 8000.

4. Query the API

# Health check
curl http://localhost:8000/

# Search for 10 nearest neighbors
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query_vector": [0.1, 0.2, 0.3, '$(python -c "print(', '.join(['0.1'] * 125))")'],
    "k": 10
  }'
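
The same call from Python, as a minimal sketch -- it assumes the requests package is installed and the service is running on localhost:8000; the repository's examples.py ships a fuller VectorWiseClient:

# Minimal Python client sketch.
import requests

query = [0.1] * 128                    # any 128-dim vector
resp = requests.post(
    "http://localhost:8000/search",
    json={"query_vector": query, "k": 10},
    timeout=5,
)
resp.raise_for_status()
result = resp.json()
print(result["indices"])               # IDs of the 10 nearest vectors
print(result["distances"])             # corresponding L2 distances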

Or use the interactive docs at http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc).


API Reference

GET / -- Health check

{
  "service": "VectorWise",
  "status": "healthy",
  "vectors_indexed": 1000000
}

GET /stats -- Index statistics

{
  "total_vectors": 1000000,
  "dimension": 128,
  "index_type": "IndexHNSWFlat",
  "hnsw_m": 32,
  "hnsw_efSearch": 64,
  "hnsw_efConstruction": 200
}

POST /search -- k-NN search

Request:

{
  "query_vector": [0.1, 0.2, "... (128 floats)"],
  "k": 10
}

Response:

{
  "indices": [482953, 192847, 738291, "..."],
  "distances": [0.0000, 1.2234, 1.2456, "..."]
}

Error responses:

Status | Condition
400    | Wrong vector dimension (not 128)
422    | Missing fields, k <= 0, k > 100, non-numeric vector
500    | Internal search error (details logged server-side)
503    | Index not loaded

Performance

Latency

Measured over 1,000 queries against the full 1M-vector index:

Metric  | Value
Average | ~4-6 ms
Median  | ~4 ms
P95     | ~8 ms
P99     | ~12 ms

Recall

Metric              | Value
Recall@10 (average) | 95-98%
Recall@10 (minimum) | 90%
Recall@10 (maximum) | 100%

Recall is measured against brute-force (exact) search as ground truth.
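
Conceptually, the measurement works like the sketch below. It is simplified from what benchmark.py does, and the exact implementation in the repository may differ:

# Sketch: Recall@10 against exact (brute-force) ground truth.
import faiss
import numpy as np

vectors = np.load("vectors.npy")              # float32, L2-normalized
hnsw = faiss.read_index("index.faiss")

exact = faiss.IndexFlatL2(vectors.shape[1])   # brute-force index = ground truth
exact.add(vectors)

rng = np.random.default_rng(42)
queries = vectors[rng.choice(len(vectors), 1000, replace=False)]

k = 10
_, approx_ids = hnsw.search(queries, k)
_, exact_ids = exact.search(queries, k)

# Recall@10 = average fraction of the true top-10 that the HNSW index also returned
recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(approx_ids, exact_ids)])
print(f"Recall@10: {recall:.3f}")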

Tuning profiles

Different workloads need different trade-offs. Adjust efSearch at search time (no rebuild required):

Profile            | efSearch | Expected Recall@10 | Expected Latency | Use case
Low latency        | 32-40    | 90-93%             | <3 ms            | Real-time recommendations
Balanced (default) | 64       | 95-98%             | 4-6 ms           | General-purpose search
High accuracy      | 100-150  | 98-99%+            | 8-15 ms          | Critical search applications

To change efSearch, set the EF_SEARCH constant in config.py. For deeper optimization, tune the build-time parameters (HNSW_M, EF_CONSTRUCTION) -- those require a full index rebuild.
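
Because efSearch is a pure search-time parameter, it can also be flipped on an already-loaded index for a quick experiment, using the Faiss API directly rather than config.py:

# Sketch: switching tuning profiles on a built index (no rebuild needed).
import faiss

index = faiss.read_index("index.faiss")
index.hnsw.efSearch = 100    # "high accuracy" profile from the table above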

Complexity

Operation      | Complexity | Notes
Index build    | O(N log N) | N = 1M vectors, ~9 min
Single query   | O(log N)   | Average case via HNSW graph traversal
Index memory   | O(N * M)   | M = 32 connections per node
Vector storage | O(N * D)   | D = 128 dimensions, float32
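
As a quick sanity check on the last row: 1,000,000 × 128 × 4 bytes (float32) ≈ 488 MB, which matches the vectors.npy size reported earlier.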

Project Structure

VectorWise/
├── api/
│   ├── __init__.py              # Package init
│   └── main.py                  # FastAPI app: endpoints, lifespan, CORS
├── tests/
│   ├── __init__.py              # Package init
│   ├── conftest.py              # Shared fixtures (small index, TestClient)
│   └── test_api.py              # 25 unit tests across 5 test classes
├── config.py                    # Shared configuration (dims, HNSW params, paths)
├── generate_data.py             # Vector generation + HNSW index building
├── benchmark.py                 # Latency + Recall@10 measurement suite
├── examples.py                  # 6 usage examples with VectorWiseClient
├── test_api.py                  # Integration tests (requires running server)
├── requirements.txt             # Python dependencies (>= constraints)
├── Dockerfile                   # Python 3.11-slim, health check
├── docker-compose.yml           # Service orchestration, volume mount
├── .dockerignore                # Excludes vectors.npy, .git, __pycache__
├── .gitignore                   # Excludes generated data, venv, caches
├── pytest.ini                   # Test configuration
├── LICENSE                      # MIT
└── README.md                    # This file

Generated at runtime (git-ignored):

File                   | Size    | Created by
vectors.npy            | ~488 MB | generate_data.py
index.faiss            | ~748 MB | generate_data.py
benchmark_results.json | ~1 KB   | benchmark.py

Docker

Build and run

docker-compose up --build -d

The Compose file mounts index.faiss as a read-only volume into the container -- the image itself stays small since it doesn't bake in the 748 MB index.
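
The mount itself is a one-line volume entry in the service definition. A representative sketch is shown below; the container path is an assumption and the repository's actual docker-compose.yml may differ:

services:
  api:
    build: .
    container_name: vectorwise-api
    ports:
      - "8000:8000"
    volumes:
      # Index stays outside the image and is mounted read-only (path assumed).
      - ./index.faiss:/app/index.faiss:ro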

Common commands

docker-compose ps              # Check status
docker-compose logs -f         # Stream logs
docker-compose restart         # Restart service
docker-compose down            # Stop and remove containers

Health check

The container includes a health check that hits GET / every 30 seconds. Check status with:

docker inspect --format='{{.State.Health.Status}}' vectorwise-api

Testing

Unit tests (no server required)

The test suite uses FastAPI's TestClient with a small 100-vector index -- fast and isolated:

pytest
========================= test session starts ==========================
collected 25 items

tests/test_api.py::TestHealthCheck::test_health_returns_200 PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_service_name PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_healthy_status PASSED
tests/test_api.py::TestHealthCheck::test_health_returns_vector_count PASSED
tests/test_api.py::TestStats::test_stats_returns_200 PASSED
tests/test_api.py::TestStats::test_stats_contains_total_vectors PASSED
tests/test_api.py::TestStats::test_stats_contains_dimension PASSED
tests/test_api.py::TestStats::test_stats_contains_index_type PASSED
tests/test_api.py::TestStats::test_stats_contains_hnsw_params PASSED
tests/test_api.py::TestSearch::test_search_returns_200 PASSED
tests/test_api.py::TestSearch::test_search_returns_correct_k PASSED
tests/test_api.py::TestSearch::test_search_returns_indices_and_distances PASSED
tests/test_api.py::TestSearch::test_search_distances_are_non_negative PASSED
tests/test_api.py::TestSearch::test_search_distances_are_sorted PASSED
tests/test_api.py::TestSearch::test_search_indices_are_within_range PASSED
tests/test_api.py::TestSearchValidation::test_wrong_dimension_returns_400 PASSED
tests/test_api.py::TestSearchValidation::test_k_zero_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_k_negative_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_k_exceeds_max_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_missing_query_vector_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_missing_k_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_empty_body_returns_422 PASSED
tests/test_api.py::TestSearchValidation::test_non_numeric_vector_returns_422 PASSED
tests/test_api.py::TestOpenAPIDocs::test_openapi_schema_available PASSED
tests/test_api.py::TestOpenAPIDocs::test_docs_endpoint_available PASSED
========================= 25 passed in 0.10s ===========================

Test coverage:

  • Health check endpoint (4 tests)
  • Index statistics endpoint (5 tests)
  • Search endpoint correctness (6 tests)
  • Input validation and error handling (8 tests)
  • OpenAPI documentation (2 tests)
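
The plumbing behind these tests looks roughly like the sketch below. It assumes the app picks up VECTORWISE_INDEX_PATH when it starts; the real fixtures live in tests/conftest.py and may be structured differently:

# Sketch of a conftest.py-style fixture: tiny index + TestClient, no server needed.
import os
import faiss
import numpy as np
import pytest
from fastapi.testclient import TestClient

@pytest.fixture(scope="session")
def client(tmp_path_factory):
    # Build a throwaway 100-vector index so tests never need the 1M-vector files.
    rng = np.random.default_rng(0)
    vecs = rng.random((100, 128), dtype=np.float32)
    faiss.normalize_L2(vecs)
    index = faiss.IndexHNSWFlat(128, 32)
    index.add(vecs)

    path = tmp_path_factory.mktemp("data") / "index.faiss"
    faiss.write_index(index, str(path))
    os.environ["VECTORWISE_INDEX_PATH"] = str(path)

    from api.main import app          # import after the env var is set
    with TestClient(app) as c:        # context manager triggers startup (index load)
        yield c

def test_search_returns_requested_k(client):
    resp = client.post("/search", json={"query_vector": [0.1] * 128, "k": 5})
    assert resp.status_code == 200
    assert len(resp.json()["indices"]) == 5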

Integration tests (requires running server)

# Start the server first
uvicorn api.main:app --reload &

# Run integration tests
python test_api.py

Benchmarks

# Requires running server + generated data files
python benchmark.py

Measures latency (avg, median, P95, P99) and Recall@10 against brute-force ground truth over 1,000 queries. Results saved to benchmark_results.json.
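
The latency side can be reproduced with a loop like this sketch (benchmark.py is the canonical version; this assumes the requests package and a running server):

# Sketch: measuring query latency percentiles against a running server.
import time
import numpy as np
import requests

latencies_ms = []
for _ in range(1000):
    q = np.random.rand(128).tolist()
    start = time.perf_counter()
    requests.post("http://localhost:8000/search",
                  json={"query_vector": q, "k": 10}, timeout=5)
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"avg={np.mean(latencies_ms):.2f} ms  "
      f"p95={np.percentile(latencies_ms, 95):.2f} ms  "
      f"p99={np.percentile(latencies_ms, 99):.2f} ms")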


Configuration

All tunable parameters live in config.py; several can also be overridden via environment variables:

Parameter       | Default     | Env variable            | Description
DIM             | 128         | --                      | Vector dimensionality
N_VECTORS       | 1,000,000   | --                      | Dataset size
HNSW_M          | 32          | --                      | Graph connectivity
EF_CONSTRUCTION | 200         | --                      | Build-time quality
EF_SEARCH       | 64          | --                      | Search-time quality/speed trade-off
INDEX_PATH      | index.faiss | VECTORWISE_INDEX_PATH   | Path to Faiss index
VECTORS_PATH    | vectors.npy | VECTORWISE_VECTORS_PATH | Path to vector data
API_HOST        | 0.0.0.0     | VECTORWISE_HOST         | API bind address
API_PORT        | 8000        | VECTORWISE_PORT         | API port
MAX_K           | 100         | --                      | Maximum neighbors per query
CORS_ORIGINS    | *           | VECTORWISE_CORS_ORIGINS | Comma-separated allowed origins
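
For example, to point the service at an index stored elsewhere (assuming config.py reads the variable at startup):

VECTORWISE_INDEX_PATH=/data/index.faiss uvicorn api.main:app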

Roadmap

v1.1 -- API enhancements

  • Batch search endpoint (multiple queries in one request)
  • Metadata filtering (attach labels/tags to vectors)
  • Configurable distance metric (L2, inner product, cosine)

v1.2 -- Observability

  • Prometheus metrics (query latency histogram, request count, error rate)
  • Structured JSON logging
  • Grafana dashboard template

v1.3 -- Performance

  • GPU-accelerated search (faiss-gpu)
  • Index memory-mapping for reduced RAM usage
  • Request queuing with Celery for batch workloads

v2.0 -- Scale

  • Distributed sharding across multiple nodes
  • Online index updates (add/delete without full rebuild)
  • Redis caching layer for hot queries
  • Authentication and rate limiting

Contributing

Contributions are welcome. See CONTRIBUTING.md for guidelines on setup, code style, testing, and pull requests.


License

MIT -- Efe Can Kara, 2025
