HTTP-based vector benchmarking for deployed search services (Elasticsearch GPU first). Benchmark recall, latency, throughput over REST—no in-process algorithm runs.
uv pip install "cuvs-bencher[elastic]"From workspace: uv sync --extra elastic
Pulls OpenAI 5M embeddings from object storage and converts to big-ann-bench format (base.*.fbin, queries.fbin, groundtruth*.ibin):
# From repo: install download deps, then run
uv sync --extra download
uv run python scripts/download_openai_5m.py ./dataOutput: ./data/ with base.5M.fbin, queries.fbin, groundtruth.5M.neighbors.ibin (~28 GiB).
cuvs-bench --data-dir ./data --host localhost --port 9200 --num-docs 100000 --jsonReturns a BenchmarkResult with typed attributes, .summary(), and .to_dict().
import asyncio
from pathlib import Path
from cuvs_bencher_core import BenchmarkResult, run_vector_benchmark
result: BenchmarkResult = asyncio.run(run_vector_benchmark(
backend="elastic",
data_dir=Path("./data"),
num_docs=100_000,
num_search_queries=1000,
host="localhost",
port=9200,
))
print(result.summary()) # Indexed 100,000 (1536d) @ 8,123 docs/s; p50=2.10ms; ~950 qps; recall@10=0.9412
result.recall_at_k # 0.9412
result.search_latency_p50_ms # 2.1
result.to_dict() # raw dict for JSONcuvs-bencher/
├── packages/cuvs_bencher_core/ # telemetry, datasets, metrics, backend protocol
└── packages/cuvs_bencher_elastic/# Elasticsearch backend + cuvs-bench CLI