feat(indexer): cross-encoder reranking with ms-marco-MiniLM-L-6-v2 by Mathews-Tom · Pull Request #44 · Mathews-Tom/VaultMind

Mathews-Tom · 2026-03-25T17:29:03Z

Summary

Add two-stage cross-encoder reranking to the search pipeline, inspired by the retrieval architecture of memora-lab/memory-service-public. After RRF fusion combines ChromaDB vector and BM25 keyword results, a cross-encoder model scores (query, document) pairs jointly, capturing token-level interactions that bi-encoder cosine similarity misses. The cross-encoder score replaces the raw embedding distance as the semantic input to composite scoring, so the 0.40 semantic weight is applied to a more discriminative relevance signal.

Search Pipeline (Before vs After)

Before:

ChromaDB (top-N) + BM25 (top-N) → RRF fusion → composite scoring (semantic=distance) → results

After (when reranker_enabled=true):

ChromaDB (top-4N) + BM25 (top-4N) → RRF fusion → cross-encoder rerank (top-2N) → composite scoring (semantic=CE score) → results
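
The RRF fusion step that both pipelines share can be sketched as follows. This is a minimal illustration, not the VaultMind implementation: the function name `rrf_fuse`, the smoothing constant `k=60` (a commonly used default), and the plain document-ID lists are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); the contributions are summed across lists.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hit lists standing in for the ChromaDB and BM25 results.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Documents that appear in both lists ("a" and "b" here) accumulate two contributions and rise above documents found by only one retriever. In the "After" pipeline, a fused list like this is what the cross-encoder would rescore.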

Model Selection: `cross-encoder/ms-marco-MiniLM-L-6-v2`

| Aspect | L-6 (selected) | L-12 (rejected) | all-MiniLM-L6-v2 (rejected) |
| --- | --- | --- | --- |
| Architecture | Cross-encoder | Cross-encoder | Bi-encoder |
| Parameters | 22M | 33M | 22M |
| NDCG@10 | 74.30 | 74.31 | N/A (not a reranker) |
| CPU latency (35 docs) | 140-350 ms | 230-700 ms | N/A |
| Memory (fp16) | 44 MB | 65 MB | 43 MB |

L-12 rejected: a 0.01 NDCG@10 gain at roughly 1.9x the latency is not justified for a personal vault.
all-MiniLM-L6-v2 rejected: it is a bi-encoder, so it cannot score (query, document) pairs jointly and adds nothing over the existing ChromaDB embeddings.

Score Normalization

Cross-encoder scores are unbounded floats (typically -10 to +10). They are normalized to [0, 1] via sigmoid before being fed into composite scoring:

normalized = 1.0 / (1.0 + math.exp(-ce_score))
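
For scores in the typical range the one-line formula above is fine. For a very negative score (around -710 or lower), `math.exp(-ce_score)` raises `OverflowError`. A numerically stable variant is sketched below as a possible hardening; the function name `sigmoid` is illustrative and this is not claimed to be what the PR ships.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function that never calls math.exp() on a positive argument."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equivalent form exp(x) / (1 + exp(x)).
    z = math.exp(x)
    return z / (1.0 + z)
```

Both branches compute the same function, so results for ordinary scores are unchanged; only the extreme tails behave differently, saturating at 0.0 or 1.0 instead of raising.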

Changes

New Files

src/vaultmind/indexer/reranker.py (71 lines) — CrossEncoderReranker class with lazy sentence_transformers import (avoids loading torch at startup). rerank(query, documents, content_key, top_k) builds (query, doc) pairs, calls model.predict(), returns sorted (document, score) tuples
tests/test_reranker.py (159 lines) — 14 tests with fully mocked CrossEncoder (no torch dependency in CI)
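
A rough sketch of the reranker described above, to illustrate the lazy-import pattern. The class and method names and the `rerank` parameters follow the description; the default `content_key="content"`, the handling of a missing content field (scored as an empty string), and the `_get_model` helper are assumptions for this example, not the contents of reranker.py. `CrossEncoder(...).predict(pairs)` is the sentence-transformers call for scoring text pairs.

```python
from typing import Any, Dict, List, Optional, Tuple


class CrossEncoderReranker:
    """Scores (query, document) pairs with a cross-encoder, loading it lazily."""

    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2") -> None:
        self.model_name = model_name
        self._model: Optional[Any] = None  # populated on first rerank() call

    def _get_model(self) -> Any:
        if self._model is None:
            # Deferred import: torch is only loaded when reranking is actually used.
            from sentence_transformers import CrossEncoder

            self._model = CrossEncoder(self.model_name)
        return self._model

    def rerank(
        self,
        query: str,
        documents: List[Dict[str, Any]],
        content_key: str = "content",  # assumed default, not confirmed by the PR
        top_k: Optional[int] = None,
    ) -> List[Tuple[Dict[str, Any], float]]:
        if not documents:
            return []  # avoid loading the model for an empty candidate list
        # Assumption: a document without the content field is scored as empty text.
        pairs = [(query, str(doc.get(content_key, ""))) for doc in documents]
        scores = self._get_model().predict(pairs)
        ranked = sorted(
            zip(documents, (float(s) for s in scores)),
            key=lambda pair: pair[1],
            reverse=True,
        )
        return ranked if top_k is None else ranked[:top_k]
```

Because the model is only touched through `_get_model()`, a test can assign a stub object with a `predict` method to `_model` and exercise sorting and truncation without installing torch, which is in the spirit of the mocked tests listed above.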

Modified Files

pyproject.toml — Added reranker = ["sentence-transformers>=3.0,<4"] as optional dependency. Not in main deps since it pulls torch (~2GB)
uv.lock — Updated with sentence-transformers resolution
src/vaultmind/config.py — Added reranker_enabled: bool = False, reranker_model: str, reranker_top_k: int = 20 to RankingConfig (these fields are excluded from the weight-sum validator)
config/default.toml — Added reranker settings to [ranking] section
src/vaultmind/indexer/ranking.py — Added reranker_score: float = 0.0 to RankedResult, populated from hit metadata in rank_results()
src/vaultmind/indexer/store.py — ranked_search() accepts optional reranker parameter. When provided: fetches 4x candidates, applies cross-encoder reranking (top 2x), sigmoid-normalizes scores, injects reranker_score into results
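
The control flow described for ranked_search() might look roughly like the sketch below. Everything here is a stand-in: `fetch` and `rerank` are hypothetical callables representing the fused retrieval step and the cross-encoder, hits are plain dicts, and the later composite scoring and final trimming are left out.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Tuple

Hit = Dict[str, Any]
Fetch = Callable[[str, int], List[Hit]]  # (query, limit) -> fused hits
Rerank = Callable[[str, List[Hit], int], List[Tuple[Hit, float]]]  # -> (hit, raw score)


def _sigmoid(x: float) -> float:
    """Overflow-safe logistic function mapping raw scores into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def search_with_optional_rerank(
    query: str,
    n_results: int,
    fetch: Fetch,
    rerank: Optional[Rerank] = None,
) -> List[Hit]:
    if rerank is None:
        # Passthrough: same candidate count as before, no model involved.
        return fetch(query, n_results)

    # Over-fetch so the cross-encoder has a wider pool to choose from.
    candidates = fetch(query, 4 * n_results)
    kept = rerank(query, candidates, 2 * n_results)

    results: List[Hit] = []
    for hit, raw_score in kept:
        enriched = dict(hit)  # copy so the caller's hit objects are not mutated
        enriched["reranker_score"] = _sigmoid(raw_score)
        results.append(enriched)
    return results
```

Keeping the `rerank is None` branch as a plain passthrough is what makes the feature opt-in: callers that never pass a reranker get the old candidate count and no extra keys on their hits.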

Backward Compatibility

reranker_enabled defaults to False — existing users see zero behavior change
ranked_search(reranker=None) (default) is a direct passthrough — no model loading
RankedResult.reranker_score defaults to 0.0 — existing result unpacking unaffected
Optional dependency: uv pip install "vaultmind[reranker]" to enable (quoted so the shell does not interpret the brackets); base install unchanged
All existing ranking tests pass unchanged (composite scoring, connection density, etc.)

Installation

# Base install (no reranker, no torch)
uv pip install vaultmind

# With reranker support (~2GB additional for torch + sentence-transformers)
uv pip install "vaultmind[reranker]"

Then enable in config:

[ranking]
reranker_enabled = true

Test plan

14 new tests in test_reranker.py across 4 classes:
- CrossEncoderReranker (7): score sorting, top_k, empty input, missing content, metadata preservation — all with mocked model
- RankedResult field (2): default value, explicit population
- RankingConfig (3): disabled default, model name, top_k
- Backward compat (2): rank_results without reranker, reranker_score in results
All existing ranking tests pass unchanged (48 in test_composite_ranking.py)
Full suite: 957/957 tests pass, 0 regressions
ruff check — clean
mypy --ignore-missing-imports — clean
Manual: install vaultmind[reranker], enable, verify reranked results differ from default ordering
Manual: benchmark CPU latency on real vault (target: <500ms for 35 docs)

New module indexer/reranker.py with CrossEncoderReranker that scores (query, document) pairs jointly via cross-encoder for higher-quality relevance ranking than bi-encoder cosine similarity alone. Uses cross-encoder/ms-marco-MiniLM-L-6-v2 (22M params, ~44MB, 140-350ms CPU for 35 docs). Add sentence-transformers as optional dependency (reranker extra) to avoid pulling torch (~2GB) for users who don't need reranking. Add reranker_enabled, reranker_model, reranker_top_k to RankingConfig (disabled by default — opt-in).

Add reranker_score field to RankedResult. Update ranked_search() to accept optional reranker parameter — when provided, fetches 4x candidates, applies cross-encoder scoring, normalizes via sigmoid, and feeds refined scores into composite ranking. The cross-encoder score replaces raw embedding distance as the semantic input, giving the 0.40 semantic weight a higher-quality signal.

14 tests across 4 classes: reranker unit tests (7) with mocked CrossEncoder (no torch dependency) covering score sorting, top_k, empty input, missing content, metadata preservation; RankedResult field (2); RankingConfig defaults (3); backward compatibility (2).

Mathews-Tom added 3 commits March 25, 2026 22:55

Mathews-Tom merged commit 99fd6dc into main Mar 25, 2026
3 checks passed

Mathews-Tom deleted the feat/cross-encoder-reranking branch March 25, 2026 17:32
