feat(indexer): cross-encoder reranking with ms-marco-MiniLM-L-6-v2 #44
Merged
New module indexer/reranker.py with CrossEncoderReranker that scores (query, document) pairs jointly via cross-encoder for higher-quality relevance ranking than bi-encoder cosine similarity alone. Uses cross-encoder/ms-marco-MiniLM-L-6-v2 (22M params, ~44MB, 140-350ms CPU for 35 docs). Add sentence-transformers as optional dependency (reranker extra) to avoid pulling torch (~2GB) for users who don't need reranking. Add reranker_enabled, reranker_model, reranker_top_k to RankingConfig (disabled by default — opt-in).
Add reranker_score field to RankedResult. Update ranked_search() to accept optional reranker parameter — when provided, fetches 4x candidates, applies cross-encoder scoring, normalizes via sigmoid, and feeds refined scores into composite ranking. The cross-encoder score replaces raw embedding distance as the semantic input, giving the 0.40 semantic weight a higher-quality signal.
14 tests across 4 classes: reranker unit tests (7) with mocked CrossEncoder (no torch dependency) covering score sorting, top_k, empty input, missing content, metadata preservation; RankedResult field (2); RankingConfig defaults (3); backward compatibility (2).
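A minimal sketch of how such a reranker and a torch-free test might look. The `content_key` default, the empty-string fallback for missing content, and the private `_model` attribute are assumptions made for illustration; this is not the code in indexer/reranker.py.

```python
from typing import Any, Optional
from unittest.mock import MagicMock


class CrossEncoderReranker:
    """Scores (query, document) pairs jointly and sorts documents by score."""

    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2") -> None:
        self.model_name = model_name
        self._model: Any = None  # assumption: model is created on first use

    def _get_model(self) -> Any:
        if self._model is None:
            # Lazy import so torch is only loaded when reranking is used.
            from sentence_transformers import CrossEncoder

            self._model = CrossEncoder(self.model_name)
        return self._model

    def rerank(
        self,
        query: str,
        documents: list[dict[str, Any]],
        content_key: str = "content",  # assumed default
        top_k: Optional[int] = None,
    ) -> list[tuple[dict[str, Any], float]]:
        if not documents:
            return []
        # Documents without content are scored against an empty string (assumption).
        pairs = [(query, str(doc.get(content_key, ""))) for doc in documents]
        scores = self._get_model().predict(pairs)
        ranked = sorted(
            zip(documents, (float(s) for s in scores)),
            key=lambda pair: pair[1],
            reverse=True,
        )
        return ranked if top_k is None else ranked[:top_k]


# Torch-free usage: swap in a mock so no model is downloaded or loaded.
reranker = CrossEncoderReranker()
reranker._model = MagicMock()
reranker._model.predict.return_value = [0.1, 2.5, -1.0]
docs = [{"id": 1, "content": "a"}, {"id": 2, "content": "b"}, {"id": 3}]
top = reranker.rerank("query", docs, top_k=2)
print([doc["id"] for doc, _ in top])  # [2, 1]
```

Injecting the mock after construction keeps the test independent of sentence-transformers, which is one way to get the "no torch dependency" property the tests rely on.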
Summary
Add two-stage cross-encoder reranking to the search pipeline, inspired by memora-lab/memory-service-public's retrieval architecture. After RRF fusion combines ChromaDB vector and BM25 keyword results, a cross-encoder model scores (query, document) pairs jointly, capturing token-level interactions that bi-encoder cosine similarity misses. This replaces the raw embedding distance as the semantic input to composite scoring, giving the 0.40 semantic weight a fundamentally higher-quality signal.
Search Pipeline (Before vs After)
Before:
After (when reranker_enabled=true):
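The reranked path can be sketched roughly as follows, based on the description in this PR: fetch 4x candidates, rerank down to the top 2x, and attach the normalized score for composite ranking. The function name `ranked_search_sketch` and the two injected callables are hypothetical stand-ins, not the actual `ranked_search()` signature.

```python
from typing import Any, Callable, Optional

Hit = dict[str, Any]
# Hypothetical: returns RRF-fused hits (vector + BM25) for a query.
FetchFn = Callable[[str, int], list[Hit]]
# Hypothetical: returns (hit, normalized score) pairs, best first, at most top_k.
RerankFn = Callable[[str, list[Hit], int], list[tuple[Hit, float]]]


def ranked_search_sketch(
    query: str,
    limit: int,
    fetch_candidates: FetchFn,
    reranker: Optional[RerankFn] = None,
) -> list[Hit]:
    if reranker is None:
        # Default path: no reranker, no extra candidates, no model loading.
        return fetch_candidates(query, limit)

    # Stage 1: over-fetch so the cross-encoder has a wider pool to choose from.
    candidates = fetch_candidates(query, 4 * limit)
    # Stage 2: joint (query, document) scoring, keeping the top 2x.
    reranked = reranker(query, candidates, 2 * limit)
    # Attach the score so composite ranking can use it as the semantic input.
    return [{**hit, "reranker_score": score} for hit, score in reranked]
```

The annotated hits would then go through composite ranking, which produces the final ordering and cut-off.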
Model Selection: cross-encoder/ms-marco-MiniLM-L-6-v2
Score Normalization
Cross-encoder scores are unbounded floats (typically -10 to +10). They are normalized to [0, 1] via the sigmoid, 1 / (1 + exp(-score)), before feeding into composite scoring.
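A small sketch of that normalization, written in a numerically stable form so that very negative scores do not overflow. The function name `normalize_score` is an assumption for illustration.

```python
import math


def normalize_score(raw: float) -> float:
    """Map an unbounded cross-encoder score into [0, 1] with a sigmoid."""
    if raw >= 0:
        return 1.0 / (1.0 + math.exp(-raw))
    # For negative inputs, exp(raw) stays small, avoiding overflow in exp(-raw).
    z = math.exp(raw)
    return z / (1.0 + z)


print(normalize_score(0.0))  # 0.5
```

A score of 0 maps to 0.5, and scores near the typical extremes of -10 and +10 land very close to 0 and 1 respectively.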
Changes
New Files
- src/vaultmind/indexer/reranker.py (71 lines): CrossEncoderReranker class with lazy sentence_transformers import (avoids loading torch at startup). rerank(query, documents, content_key, top_k) builds (query, doc) pairs, calls model.predict(), returns sorted (document, score) tuples
- tests/test_reranker.py (159 lines): 14 tests with fully mocked CrossEncoder (no torch dependency in CI)
Modified Files
- pyproject.toml: Added reranker = ["sentence-transformers>=3.0,<4"] as optional dependency. Not in main deps since it pulls torch (~2GB)
- uv.lock: Updated with sentence-transformers resolution
- src/vaultmind/config.py: Added reranker_enabled: bool = False, reranker_model: str, reranker_top_k: int = 20 to RankingConfig (outside weight sum validator)
- config/default.toml: Added reranker settings to [ranking] section
- src/vaultmind/indexer/ranking.py: Added reranker_score: float = 0.0 to RankedResult, populated from hit metadata in rank_results()
- src/vaultmind/indexer/store.py: ranked_search() accepts optional reranker parameter. When provided: fetches 4x candidates, applies cross-encoder reranking (top 2x), sigmoid-normalizes scores, injects reranker_score into results
Backward Compatibility
- reranker_enabled defaults to False, so existing users see zero behavior change
- ranked_search(reranker=None) (default) is a direct passthrough with no model loading
- RankedResult.reranker_score defaults to 0.0, so existing result unpacking is unaffected
- uv pip install vaultmind[reranker] to enable; base install unchanged
Installation
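Install the optional extra with uv pip install vaultmind[reranker], then enable the reranker in the [ranking] section of the config by setting reranker_enabled = true.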
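As an illustration, a `[ranking]` section with the reranker switched on might look like the fragment below. The key names and the `reranker_top_k` value come from the `RankingConfig` fields listed in this PR; using the ms-marco model name as the `reranker_model` value is an assumption based on the model this PR selects.

```toml
[ranking]
reranker_enabled = true
# Assumed value: the model named in this PR.
reranker_model = "cross-encoder/ms-marco-MiniLM-L-6-v2"
reranker_top_k = 20
```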
Test plan
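- 14 tests in test_reranker.py across 4 classes: reranker unit tests (7), RankedResult field (2), RankingConfig defaults (3), backward compatibility (2)
- ruff check: clean
- mypy --ignore-missing-imports: clean
- Install vaultmind[reranker], enable, verify reranked results differ from default ordering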