
feat: retrieval benchmark suite — recall@K vs Pinecone, Weaviate, pgvector#51

Open
kaising-openclaw1 wants to merge 1 commit into Dipraise1:main from kaising-openclaw1:feat/retrieval-benchmarks

Conversation

@kaising-openclaw1


Summary

Implements the benchmark described in Issue #24: a reproducible harness that runs the same queries against Engram and baseline vector databases (Pinecone, Weaviate, pgvector) on public BEIR datasets.

What's included

  • scripts/bench/run_benchmarks.py — Main benchmark orchestrator

    • Loads BEIR datasets (NQ, HotpotQA, FiQA) via the Hugging Face datasets library
    • Benchmarks Engram, Pinecone, Weaviate, and pgvector
    • Reports recall@1/5/10, p50/p95/p99 query latency, and storage overhead
    • Quick mode for CI (a subset of 1,000 docs and 50 queries)
    • Rich console output with progress bars
  • scripts/bench/requirements-bench.txt — Isolated dependency file

  • scripts/bench/__init__.py — Package init
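A minimal sketch of how the reported metrics could be computed. It assumes the usual BEIR-style definition of recall@K (the fraction of a query's relevant documents that appear in its top K results, averaged over queries) and linearly interpolated latency percentiles. The function names and the qrels/results dict shapes are hypothetical and not taken from run_benchmarks.py.

```python
import math
from typing import Dict, List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of this query's relevant doc IDs found in its top-k results."""
    if not relevant:
        return 0.0
    # set() guards against a backend returning the same doc ID twice
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall(results: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Average recall@k over every query that has at least one relevant doc."""
    scores = [recall_at_k(results.get(qid, []), rel, k) for qid, rel in qrels.items() if rel]
    return sum(scores) / len(scores) if scores else 0.0


def percentile(samples: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile for p in [0, 100] (numpy's default rule)."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    xs = sorted(samples)
    rank = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """p50/p95/p99 of per-query latencies, in the same unit as the input."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Under this definition recall@1 cannot reach 1.0 for queries with more than one relevant document (HotpotQA queries typically have two), so a harness that instead reports "hit rate" (any relevant document in the top K) would produce different numbers; the report should state which definition it uses.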

Usage

python scripts/bench/run_benchmarks.py --datasets nq,hotpotqa,fiqa

Architecture

Each system (EngramRunner, PineconeRunner, WeaviateRunner, PgvectorRunner) implements a common interface:

  • prepare(docs, embeddings) — ingest/index documents
  • query(queries_emb, queries_text, top_k) — run queries
  • storage_size() — report storage overhead
  • cleanup() — tear down resources
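The four-method interface can be sketched as a Python abstract base class. The method names come from the list above, but the argument and return types (docs as dicts with an "_id" key, query returning ranked doc IDs, storage_size in bytes) are assumptions. BruteForceRunner and timed_queries are hypothetical helpers: an exact cosine-similarity baseline that shows what a concrete runner must provide, and a wrapper that times each query individually so per-query latencies can feed the percentile metrics.

```python
import math
import time
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Runner(ABC):
    """Common interface each backend implements (types here are assumptions)."""

    @abstractmethod
    def prepare(self, docs: List[Dict[str, str]], embeddings: List[List[float]]) -> None:
        """Ingest/index documents and their embeddings."""

    @abstractmethod
    def query(self, queries_emb: List[List[float]], queries_text: List[str],
              top_k: int) -> List[List[str]]:
        """Return the top_k doc IDs for each query, best match first."""

    @abstractmethod
    def storage_size(self) -> int:
        """Storage used by the index, in bytes."""

    @abstractmethod
    def cleanup(self) -> None:
        """Tear down resources (collections, tables, indexes)."""


class BruteForceRunner(Runner):
    """Hypothetical in-memory exact-search runner, for illustration only."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vecs: List[List[float]] = []

    def prepare(self, docs, embeddings):
        self._ids = [d["_id"] for d in docs]
        self._vecs = [self._normalize(v) for v in embeddings]

    def query(self, queries_emb, queries_text, top_k):
        out = []
        for q in queries_emb:
            qn = self._normalize(q)
            # Cosine similarity = dot product of unit-length vectors
            scores = [sum(a * b for a, b in zip(qn, v)) for v in self._vecs]
            order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
            out.append([self._ids[i] for i in order[:top_k]])
        return out

    def storage_size(self):
        # Assumes float32 storage: 4 bytes per dimension per vector
        return sum(4 * len(v) for v in self._vecs)

    def cleanup(self):
        self._ids, self._vecs = [], []

    @staticmethod
    def _normalize(v: List[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]


def timed_queries(runner: Runner, queries_emb: List[List[float]], queries_text: List[str],
                  top_k: int) -> Tuple[List[List[str]], List[float]]:
    """Run queries one at a time and record each latency in milliseconds."""
    results: List[List[str]] = []
    latencies_ms: List[float] = []
    for emb, text in zip(queries_emb, queries_text):
        start = time.perf_counter()
        results.extend(runner.query([emb], [text], top_k))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return results, latencies_ms
```

Timing outside the runner keeps latency measurement identical across backends, at the cost of including client-side overhead (network round trips for Pinecone and Weaviate) in every sample.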

Results are saved as JSON, and a Markdown report is generated at bench_results/benchmarks.md, ready to be copied to docs/benchmarks.md.
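A minimal sketch of the save-and-report step. The output directory and the benchmarks.md filename come from the description; the results.json filename, the row schema, and the COLUMNS list are assumptions, since the PR does not document the JSON format.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical column set; one row per (dataset, system) pair
COLUMNS = ["dataset", "system", "recall@1", "recall@5", "recall@10",
           "p50_ms", "p95_ms", "p99_ms", "storage_mb"]


def to_markdown(rows: List[Dict[str, object]]) -> str:
    """Render rows as a Markdown table; floats are shown to 3 decimals."""
    lines = ["| " + " | ".join(COLUMNS) + " |",
             "|" + "|".join("---" for _ in COLUMNS) + "|"]
    for row in rows:
        cells = []
        for col in COLUMNS:
            val = row.get(col, "")  # missing metrics render as empty cells
            cells.append(f"{val:.3f}" if isinstance(val, float) else str(val))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"


def write_report(rows: List[Dict[str, object]], out_dir: str = "bench_results") -> Path:
    """Write raw rows to results.json and the table to benchmarks.md."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "results.json").write_text(json.dumps(rows, indent=2))
    md_path = out / "benchmarks.md"
    md_path.write_text("# Retrieval benchmarks\n\n" + to_markdown(rows))
    return md_path
```

Keeping the raw JSON alongside the rendered table means the Markdown can be regenerated (for example with different rounding) without rerunning the benchmarks.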

Closes #24

…pgvector)

Implements the benchmark described in Issue Dipraise1#24:
- Reproducible harness in scripts/bench/
- Loads BEIR datasets (NQ, HotpotQA, FiQA) via HuggingFace datasets
- Benchmarks Engram, Pinecone, Weaviate, and pgvector
- Reports recall@1/5/10, p50/p95/p99 latency, storage overhead
- Generates markdown report for docs/benchmarks.md
- Quick mode for CI (1000 docs, 50 queries)
- Rich console output with progress bars

Usage: python scripts/bench/run_benchmarks.py --datasets nq,hotpotqa,fiqa
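The usage line passes datasets as one comma-separated flag value. One way such a flag could be parsed and validated with argparse is sketched below; dataset_list, build_parser, and the default of running all three datasets are hypothetical and not taken from run_benchmarks.py, and only the three dataset names mentioned in the PR are accepted.

```python
import argparse
from typing import List

# The BEIR datasets named in the PR; the real script may accept others
KNOWN_DATASETS = ("nq", "hotpotqa", "fiqa")


def dataset_list(value: str) -> List[str]:
    """argparse type: split on commas, normalize case, reject unknown names."""
    names = [v.strip().lower() for v in value.split(",") if v.strip()]
    if not names:
        raise argparse.ArgumentTypeError("at least one dataset is required")
    unknown = [n for n in names if n not in KNOWN_DATASETS]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown dataset(s): {', '.join(unknown)}")
    return names


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run retrieval benchmarks.")
    parser.add_argument("--datasets", type=dataset_list, default=list(KNOWN_DATASETS),
                        help="comma-separated BEIR datasets, e.g. nq,hotpotqa,fiqa")
    return parser
```

Raising ArgumentTypeError lets argparse print a normal usage error and exit with status 2, so a typo in a dataset name fails before any data is downloaded.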

@vercel

vercel Bot commented Jun 21, 2026


Someone is attempting to deploy a commit to the praise's projects Team on Vercel.

A member of the Team first needs to authorize it.


Development

Successfully merging this pull request may close these issues.

[benchmarks] Retrieval benchmark suite — recall@K vs Pinecone, Weaviate, pgvector
