feat(indexer): cross-encoder reranking with ms-marco-MiniLM-L-6-v2#44

Merged
Mathews-Tom merged 3 commits into main from feat/cross-encoder-reranking on Mar 25, 2026

@Mathews-Tom

Owner

Summary

Add two-stage cross-encoder reranking to the search pipeline, inspired by memora-lab/memory-service-public's retrieval architecture. After RRF fusion combines ChromaDB vector and BM25 keyword results, a cross-encoder model scores (query, document) pairs jointly — capturing token-level interactions that bi-encoder cosine similarity misses. This replaces the raw embedding distance as the semantic input to composite scoring, giving the 0.40 semantic weight a fundamentally higher-quality signal.
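For readers unfamiliar with the fusion step, here is a minimal sketch of Reciprocal Rank Fusion over two ranked id lists. The function name `rrf_fuse`, the constant `k = 60` (the value commonly used in the RRF literature), and the sample ids are assumptions for illustration; the PR does not show how VaultMind implements fusion.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); earlier ranks contribute more.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, best first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["note-a", "note-b", "note-c"]   # hypothetical ChromaDB order
keyword_hits = ["note-b", "note-d", "note-a"]  # hypothetical BM25 order
fused = rrf_fuse([vector_hits, keyword_hits])
```

Documents that appear in both lists accumulate two contributions, which is why fusion tends to favor hits that both retrievers agree on.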

Search Pipeline (Before vs After)

Before:

ChromaDB (top-N) + BM25 (top-N) → RRF fusion → composite scoring (semantic=distance) → results

After (when reranker_enabled=true):

ChromaDB (top-4N) + BM25 (top-4N) → RRF fusion → cross-encoder rerank (top-2N) → composite scoring (semantic=CE score) → results
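The candidate sizing in the "After" pipeline can be sketched as follows. This is an illustration under stated assumptions, not the code in store.py: `two_stage_search`, the `retrieve` callable (standing in for retrieval plus RRF fusion), the `score_pairs` callable (standing in for the cross-encoder), and the hit dictionary shape are all hypothetical.

```python
import math
from typing import Any, Callable, Optional

Hit = dict[str, Any]  # hypothetical hit shape: {"id": ..., "content": ...}


def two_stage_search(
    query: str,
    n: int,
    retrieve: Callable[[str, int], list[Hit]],
    score_pairs: Optional[Callable[[list[tuple[str, str]]], list[float]]] = None,
) -> list[Hit]:
    """Return candidates for composite scoring, reranked when a scorer is given."""
    if score_pairs is None:
        # Reranker disabled: plain top-N retrieval, behaviour unchanged.
        return retrieve(query, n)

    # Stage 1: over-fetch 4N fused candidates so the reranker has room to reorder.
    candidates = retrieve(query, 4 * n)
    # Stage 2: score each (query, document) pair jointly.
    raw = score_pairs([(query, hit["content"]) for hit in candidates])
    ranked = sorted(zip(candidates, raw), key=lambda pair: pair[1], reverse=True)
    # Keep the best 2N and attach a sigmoid-normalized score as the semantic input.
    kept = []
    for hit, score in ranked[: 2 * n]:
        kept.append({**hit, "reranker_score": 1.0 / (1.0 + math.exp(-score))})
    return kept
```

Over-fetching matters because a document ranked low by fusion can only be promoted by the cross-encoder if it is in the candidate set to begin with.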

Model Selection: cross-encoder/ms-marco-MiniLM-L-6-v2

| Aspect | L-6 (selected) | L-12 (rejected) | all-MiniLM-L6-v2 (rejected) |
| --- | --- | --- | --- |
| Architecture | Cross-encoder | Cross-encoder | Bi-encoder |
| Parameters | 22M | 33M | 22M |
| NDCG@10 | 74.30 | 74.31 | N/A (not a reranker) |
| CPU latency (35 docs) | 140-350ms | 230-700ms | N/A |
| Memory (fp16) | 44 MB | 65 MB | 43 MB |
  • L-12 rejected: 0.01 NDCG gain at 1.9x latency cost — unjustified for personal vault
  • all-MiniLM-L6-v2 rejected: bi-encoder architecture cannot function as a cross-encoder; provides zero benefit over existing ChromaDB embeddings

Score Normalization

Cross-encoder scores are unbounded floats (typically -10 to +10). They are normalized into the 0 to 1 range via a sigmoid before feeding into composite scoring:

normalized = 1.0 / (1.0 + math.exp(-ce_score))
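The one-liner above is fine for scores in the typical range, but `math.exp` raises `OverflowError` for very large arguments (roughly above 709), which a strongly negative score would trigger. A numerically stable variant might look like this; the function name `sigmoid` is an assumption, and the PR does not say whether the actual code guards against this case.

```python
import math


def sigmoid(score: float) -> float:
    """Map an unbounded score into [0, 1] without overflowing math.exp."""
    if score >= 0:
        # exp(-score) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-score))
    # For negative scores, use the algebraically equivalent form exp(s) / (1 + exp(s)).
    z = math.exp(score)
    return z / (1.0 + z)
```

Both branches compute the same function; the split only ensures the argument passed to `math.exp` is never positive.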

Changes

New Files

  • src/vaultmind/indexer/reranker.py (71 lines) — CrossEncoderReranker class with lazy sentence_transformers import (avoids loading torch at startup). rerank(query, documents, content_key, top_k) builds (query, doc) pairs, calls model.predict(), returns sorted (document, score) tuples
  • tests/test_reranker.py (159 lines) — 14 tests with fully mocked CrossEncoder (no torch dependency in CI)
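A rough sketch of what a lazily loaded reranker with this interface could look like is shown below. It is not the 71-line module from the PR: the `_get_model` helper, the default values for `content_key` and `top_k`, and the choice to skip documents without text are assumptions. `CrossEncoder(model_name)` and `CrossEncoder.predict(pairs)` are the sentence-transformers calls the PR names.

```python
from typing import Any


class CrossEncoderReranker:
    """Sketch of a lazily loaded cross-encoder reranker (not the PR's actual code)."""

    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2") -> None:
        self.model_name = model_name
        self._model: Any = None  # loaded on first use

    def _get_model(self) -> Any:
        if self._model is None:
            # Deferred import keeps torch out of process startup.
            from sentence_transformers import CrossEncoder

            self._model = CrossEncoder(self.model_name)
        return self._model

    def rerank(
        self,
        query: str,
        documents: list[dict[str, Any]],
        content_key: str = "content",
        top_k: int = 20,
    ) -> list[tuple[dict[str, Any], float]]:
        """Return up to top_k (document, raw score) tuples, best first."""
        # Assumption: documents without usable text are skipped rather than scored.
        scorable = [doc for doc in documents if doc.get(content_key)]
        if not scorable:
            return []
        pairs = [(query, doc[content_key]) for doc in scorable]
        scores = [float(s) for s in self._get_model().predict(pairs)]
        ranked = sorted(zip(scorable, scores), key=lambda pair: pair[1], reverse=True)
        return ranked[:top_k]
```

Because the import lives inside `_get_model`, constructing the class costs nothing; the model and torch are only loaded on the first `rerank` call with non-empty input.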

Modified Files

  • pyproject.toml — Added reranker = ["sentence-transformers>=3.0,<4"] as optional dependency. Not in main deps since it pulls torch (~2GB)
  • uv.lock — Updated with sentence-transformers resolution
  • src/vaultmind/config.py — Added reranker_enabled: bool = False, reranker_model: str, reranker_top_k: int = 20 to RankingConfig (outside weight sum validator)
  • config/default.toml — Added reranker settings to [ranking] section
  • src/vaultmind/indexer/ranking.py — Added reranker_score: float = 0.0 to RankedResult, populated from hit metadata in rank_results()
  • src/vaultmind/indexer/store.py: ranked_search() accepts optional reranker parameter. When provided: fetches 4x candidates, applies cross-encoder reranking (top 2x), sigmoid-normalizes scores, injects reranker_score into results
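The new fields can be pictured with plain dataclasses, as in the sketch below. The real `RankingConfig` and `RankedResult` live in config.py and ranking.py and carry more fields (including the weights covered by the sum validator), which are omitted here. The `doc_id` and `score` fields and the default value of `reranker_model` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RankingConfig:
    # Only the reranker fields named in this PR; existing weight fields are omitted.
    reranker_enabled: bool = False
    reranker_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed default
    reranker_top_k: int = 20


@dataclass
class RankedResult:
    doc_id: str                  # hypothetical field
    score: float                 # hypothetical field
    reranker_score: float = 0.0  # default used when no reranker ran


config = RankingConfig()
legacy_result = RankedResult("note-a", 0.73)  # call site written before this PR
```

Giving the new field a default and placing it last is what lets existing positional construction keep working.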

Backward Compatibility

  • reranker_enabled defaults to False — existing users see zero behavior change
  • ranked_search(reranker=None) (default) is a direct passthrough — no model loading
  • RankedResult.reranker_score defaults to 0.0 — existing result unpacking unaffected
  • Optional dependency: uv pip install vaultmind[reranker] to enable; base install unchanged
  • All existing ranking tests pass unchanged (composite scoring, connection density, etc.)

Installation

# Base install (no reranker, no torch)
uv pip install vaultmind

# With reranker support (~2GB additional for torch + sentence-transformers)
uv pip install "vaultmind[reranker]"

Then enable in config:

[ranking]
reranker_enabled = true

Test plan

  • 14 new tests in test_reranker.py across 4 classes:
    • CrossEncoderReranker (7): score sorting, top_k, empty input, missing content, metadata preservation — all with mocked model
    • RankedResult field (2): default value, explicit population
    • RankingConfig (3): disabled default, model name, top_k
    • Backward compat (2): rank_results without reranker, reranker_score in results
  • All existing ranking tests pass unchanged (48 in test_composite_ranking.py)
  • Full suite: 957/957 tests pass, 0 regressions
  • ruff check — clean
  • mypy --ignore-missing-imports — clean
  • Manual: install vaultmind[reranker], enable, verify reranked results differ from default ordering
  • Manual: benchmark CPU latency on real vault (target: <500ms for 35 docs)
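One way to keep torch out of CI while still exercising a lazy `sentence_transformers` import is to place a fake module in `sys.modules` for the duration of the test. The sketch below shows that technique; the `score_pairs` helper and the test name are hypothetical, and the PR only states that `CrossEncoder` is fully mocked, not how.

```python
import sys
import types
from unittest import mock


def score_pairs(model_name: str, pairs: list[tuple[str, str]]) -> list[float]:
    """Hypothetical helper that imports sentence_transformers lazily."""
    from sentence_transformers import CrossEncoder  # resolved at call time

    return [float(s) for s in CrossEncoder(model_name).predict(pairs)]


def test_scores_come_from_mocked_model() -> None:
    fake_module = types.ModuleType("sentence_transformers")
    fake_model = mock.MagicMock()
    fake_model.predict.return_value = [2.5, -1.0]
    fake_module.CrossEncoder = mock.MagicMock(return_value=fake_model)

    # Swap in the fake module so the lazy import never touches torch.
    with mock.patch.dict(sys.modules, {"sentence_transformers": fake_module}):
        scores = score_pairs("any-model", [("q", "doc a"), ("q", "doc b")])

    assert scores == [2.5, -1.0]
    fake_module.CrossEncoder.assert_called_once_with("any-model")
```

`mock.patch.dict` restores `sys.modules` on exit, so the fake module does not leak into other tests.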

New module indexer/reranker.py with CrossEncoderReranker that scores
(query, document) pairs jointly via cross-encoder for higher-quality
relevance ranking than bi-encoder cosine similarity alone. Uses
cross-encoder/ms-marco-MiniLM-L-6-v2 (22M params, ~44MB, 140-350ms
CPU for 35 docs).

Add sentence-transformers as optional dependency (reranker extra) to
avoid pulling torch (~2GB) for users who don't need reranking.

Add reranker_enabled, reranker_model, reranker_top_k to RankingConfig
(disabled by default — opt-in).
Add reranker_score field to RankedResult. Update ranked_search() to
accept optional reranker parameter — when provided, fetches 4x
candidates, applies cross-encoder scoring, normalizes via sigmoid,
and feeds refined scores into composite ranking. The cross-encoder
score replaces raw embedding distance as the semantic input, giving
the 0.40 semantic weight a higher-quality signal.
14 tests across 4 classes: reranker unit tests (7) with mocked
CrossEncoder (no torch dependency) covering score sorting, top_k,
empty input, missing content, metadata preservation; RankedResult
field (2); RankingConfig defaults (3); backward compatibility (2).
@Mathews-Tom Mathews-Tom merged commit 99fd6dc into main Mar 25, 2026
3 checks passed
@Mathews-Tom Mathews-Tom deleted the feat/cross-encoder-reranking branch March 25, 2026 17:32