feat: Hybrid Search with BM25 + BGE-M3 Semantic Retrieval, RRF Fusion and BGE Reranker #10

Open
Parth-2701 wants to merge 1 commit into devrev:main from Parth-2701:my-feature-branch

@Parth-2701

Overview

This PR implements a hybrid retrieval pipeline for the DevRev Search benchmark
that combines lexical and semantic search, fuses results using Reciprocal Rank
Fusion (RRF), and reranks final candidates using a cross-encoder.
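
As a rough illustration of the lexical half of the pipeline, here is a minimal BM25 scoring sketch in plain Python. The PR does not say which BM25 variant or library it uses, so the function name `bm25_scores`, the whitespace-free pre-tokenized input, the parameters `k1=1.5` and `b=0.75`, and the non-negative IDF form are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi-style BM25.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values taken from the PR.
    """
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs_tokens) / n_docs or 1.0

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # IDF with +1 inside the log so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    ["reset", "password"],
    ["billing", "invoice"],
    ["password", "policy", "reset", "reset"],
]
print(bm25_scores(["reset", "password"], docs))
```

Documents that share no term with the query score exactly zero, which is why a lexical retriever like this is usually paired with a semantic one.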

What Changed

  • Replaced single-stage dense retrieval with dual retrieval: lexical search
    (BM25) alongside semantic search (BGE-M3 embeddings)
  • Added Reciprocal Rank Fusion (RRF) to merge and deduplicate the results
    from both retrievers
  • Added a cross-encoder reranking stage (BGE reranker) as the final step to
    score query-document relevance
  • Refactored the search module into modular, independently testable
    components
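
The fusion step can be sketched as follows. This is a minimal Reciprocal Rank Fusion example, not the code from this PR: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions, since the description does not state them.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default, not from the PR.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        seen = set()
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in seen:
                # Count a document at most once per input list.
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["d1", "d2", "d3"]    # hypothetical BM25 ranking
semantic = ["d3", "d1", "d4"]   # hypothetical dense ranking
print(rrf_fuse([lexical, semantic]))  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, it needs no score normalization between the lexical and semantic retrievers, and a document found by both lists is merged into a single entry automatically.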

Models Used

  • Embedding: BAAI/bge-m3 (1024-dim, multilingual)
  • Reranker: BAAI/bge-reranker-v2-m3
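
To show where the reranker sits, here is a minimal sketch of a reranking stage with a pluggable scoring function. The names `rerank` and `toy_overlap_score` are hypothetical, and the toy scorer is only a stand-in: in the actual pipeline the scoring function would presumably wrap the BAAI/bge-reranker-v2-m3 cross-encoder, whose loading code is not shown in the description.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
    top_k: int = 10,
) -> List[str]:
    """Score every (doc_id, text) candidate against the query and keep top_k.

    score_fn is an assumed interface: it takes (query, text) and returns a
    relevance score where higher means more relevant.
    """
    scored = [(score_fn(query, text), doc_id) for doc_id, text in candidates]
    # Stable sort: candidates with equal scores keep their fused order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


def toy_overlap_score(query: str, text: str) -> float:
    """Stand-in scorer: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


candidates = [
    ("a", "billing invoice help"),
    ("b", "how to reset password"),
    ("c", "password policy"),
]
print(rerank("reset password", candidates, toy_overlap_score, top_k=2))  # ['b', 'c']
```

Keeping the scorer behind a plain callable is one way to make the stage testable without downloading a model.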

@Parth-2701
Author

@nimit2801 and @prakhar7651, please have a look at the PR description validation.

@prakhar7651
Contributor

Hey!
These are your scores.
Recall@10: 0.3158
Precision@10: 0.2973
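
For context, a common way to compute these two metrics for a single query is sketched below. The benchmark's exact definition is not given in this thread, so the function name `precision_recall_at_k`, the choice to divide precision by `k`, and the example IDs are assumptions.

```python
def precision_recall_at_k(retrieved, relevant, k=10):
    """Return (precision@k, recall@k) for one query.

    retrieved: ranked list of document IDs; relevant: set of relevant IDs.
    Precision divides by k here; some evaluations divide by the number of
    results actually returned instead.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Deduplicate the top-k while preserving order, then count hits.
    top_k = list(dict.fromkeys(retrieved[:k]))
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, r = precision_recall_at_k(["d1", "d2", "d3", "d4"], {"d1", "d4", "d9"}, k=4)
print(p, r)
```

Benchmark-level scores like the ones above are typically the average of these per-query values over all queries.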

2 participants