feat(eval): add ranked retrieval metrics (YEH-009) #10

Merged
rahulrajaram merged 1 commit into master from feat/yeh-009-ranked-retrieval-metrics on Mar 27, 2026

@rahulrajaram
Owner

Summary

  • Add precision@k, recall@k, MRR, and nDCG@k to yore eval for measuring BM25 retrieval quality
  • Metrics are computed over the initial search results, deduplicated to unique doc paths by first occurrence
  • New --k CLI flag (comma-separated, default 5,10) controls which k values are reported
  • Questions with relevant_docs in JSONL get per-question and aggregate ranking metrics; questions without it produce the existing output only (backward compatible)
  • Fix found/missing bug where substring matching ran against the query text instead of the assembled digest
  • README updated to document new flag, JSONL format, and ranking metrics
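
For reviewers who want the definitions in one place, here is a minimal sketch of the four metrics with binary relevance, computed over a deduplicated ranking. It is not the code in this PR: the function names and signatures are hypothetical, and dividing precision@k by k even when fewer than k results come back is an assumed convention.

```rust
use std::collections::HashSet;

/// Keep the first occurrence of each doc path, preserving rank order.
fn dedup_by_first_occurrence(paths: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    paths
        .iter()
        .filter(|p| seen.insert(**p)) // insert returns false for repeats
        .map(|p| p.to_string())
        .collect()
}

/// Number of relevant docs among the top-k ranked results.
fn hits_at_k(ranked: &[String], relevant: &HashSet<String>, k: usize) -> usize {
    ranked.iter().take(k).filter(|d| relevant.contains(*d)).count()
}

/// precision@k = hits in top k / k (assumed: divide by k, not by results returned).
fn precision_at_k(ranked: &[String], relevant: &HashSet<String>, k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    hits_at_k(ranked, relevant, k) as f64 / k as f64
}

/// recall@k = hits in top k / number of relevant docs.
fn recall_at_k(ranked: &[String], relevant: &HashSet<String>, k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0;
    }
    hits_at_k(ranked, relevant, k) as f64 / relevant.len() as f64
}

/// Reciprocal rank of the first relevant doc (0.0 if none is found).
/// MRR is the mean of this value across questions.
fn reciprocal_rank(ranked: &[String], relevant: &HashSet<String>) -> f64 {
    ranked
        .iter()
        .position(|d| relevant.contains(d))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// nDCG@k with binary gains: DCG uses 1 / log2(rank + 1) for each relevant hit,
/// and the ideal DCG places min(k, |relevant|) relevant docs at the top.
fn ndcg_at_k(ranked: &[String], relevant: &HashSet<String>, k: usize) -> f64 {
    let discount = |i: usize| 1.0 / ((i as f64) + 2.0).log2(); // i is zero-based
    let dcg: f64 = ranked
        .iter()
        .take(k)
        .enumerate()
        .filter(|(_, d)| relevant.contains(*d))
        .map(|(i, _)| discount(i))
        .sum();
    let idcg: f64 = (0..k.min(relevant.len())).map(discount).sum();
    if idcg == 0.0 { 0.0 } else { dcg / idcg }
}
```

As a worked case: with raw results `a.md, b.md, a.md, c.md` and relevant docs `{b.md, d.md}`, the deduplicated ranking is `a.md, b.md, c.md`, so precision@2, recall@2, and the reciprocal rank are each 0.5, and nDCG@2 is roughly 0.387.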

Test plan

  • 7 new unit tests for all metric functions (precision, recall, MRR, nDCG, compute, aggregate, dedup)
  • Updated existing test_eval_json_result_serialization for new fields
  • 2 new integration tests: test_eval_with_relevant_docs_computes_metrics and test_eval_without_relevant_docs_omits_metrics
  • cargo clippy -- -D warnings clean
  • All 134 tests pass

@github-actions

github-actions Bot commented Mar 27, 2026

Commit squash-scope review

Range: origin/master..fa19467b122d0afd845a61cec933588f6284e58d
Commits reviewed: 2

Heuristic candidates

No clear squash candidates found from overlap or message-prefix heuristics.

SQUASH_SCOPE_NEEDS_REWRITE=0

Add precision@k, recall@k, MRR, and nDCG@k to `yore eval`
for measuring BM25 retrieval quality. Metrics computed over
initial search results deduplicated to unique doc paths.

Questions with `relevant_docs` in JSONL get per-question and
aggregate ranking output; questions without it produce
existing output only (backward compatible).

Also fixes found/missing bug where substring matching ran
against the query text instead of the assembled digest.

New CLI flag: --k (comma-separated, default 5,10).
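
One way a comma-separated value such as `5,10` could be turned into a list of k values is sketched below. `parse_k_list` is a hypothetical helper, not a function from this PR, and rejecting zero and dropping duplicate entries are assumptions about the desired behavior.

```rust
/// Parse a comma-separated list such as "5,10" into k values.
/// Assumed behavior: whitespace around entries is ignored, duplicates are
/// dropped, and empty, non-numeric, or zero entries are errors.
fn parse_k_list(raw: &str) -> Result<Vec<usize>, String> {
    let mut ks = Vec::new();
    for part in raw.split(',') {
        let part = part.trim();
        let k: usize = part
            .parse()
            .map_err(|_| format!("invalid k value: {part:?}"))?;
        if k == 0 {
            return Err("k must be at least 1".to_string());
        }
        if !ks.contains(&k) {
            ks.push(k); // keep the order the user gave
        }
    }
    Ok(ks)
}
```

With this sketch, `parse_k_list("5,10")` yields `Ok(vec![5, 10])`, while `"5,,10"` and `"0"` are rejected.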

Stabilizes flaky test_dupes_group_mode (LSH is
probabilistic; assert on output format, not specific pairs).
@rahulrajaram force-pushed the feat/yeh-009-ranked-retrieval-metrics branch from 31d6047 to 75554b0 on March 27, 2026 02:54
@rahulrajaram merged commit 3eb31fe into master on Mar 27, 2026
6 checks passed
@rahulrajaram deleted the feat/yeh-009-ranked-retrieval-metrics branch on March 27, 2026 02:55