A hybrid RAG (Retrieval-Augmented Generation) pipeline implementation for LiveRAG 2025 Challenge that combines sparse and dense retrieval methods with reranking for improved question answering.
- Hybrid search combining BM25 (sparse) and dense vector search
- Reciprocal Rank Fusion (RRF) for merging search results
- Neural reranking using BGE-reranker
- AWS Bedrock integration for LLM inference
- Support for both Pinecone and OpenSearch indices
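Reciprocal Rank Fusion itself is simple enough to sketch in a few lines. The following standalone example (not code from this repository) shows how two ranked lists of document IDs, e.g. one from BM25 and one from dense retrieval, are merged:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the constant suggested in the original RRF paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]   # sparse ranking
dense_hits = ["d3", "d1", "d4"]  # dense ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# → ['d1', 'd3', 'd2', 'd4']
```

Documents appearing near the top of both lists (here `d1`) float to the top of the fused ranking.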
```
LiveRAG-2025/
├── README.md
├── requirements.txt
├── setup.py
├── LICENSE
├── .gitignore
├── liverag/
│   ├── __init__.py
│   ├── indices/
│   │   ├── __init__.py
│   │   ├── pinecone_client.py
│   │   └── opensearch_client.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── reranker.py
│   │   └── embeddings.py
│   └── pipeline/
│       ├── __init__.py
│       └── rag_pipeline.py
├── examples/
│   ├── README.md
│   ├── example_usage.py
│   ├── input/
│   └── output/
└── test-day/
    ├── data/
    ├── __init__.py
    └── test_pipeline.py
```
- Clone the repository:

```bash
git clone https://github.com/mallpriyanshu/LiveRAG-2025.git
cd LiveRAG-2025
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install the package:

```bash
pip install -e .
```
- Get your AWS access key and secret key from the AWS console:
  - Log in to the AWS Management Console
  - Click your name at the top-right corner, then "Security Credentials"
  - Click "Access keys" and create a new access key for CLI use
  - Download and save your access key and secret key
- Install the AWS CLI tool:
  - Follow the instructions at: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
- Configure the AWS CLI:

```bash
aws configure --profile sigir-participant
# Use the following settings:
# AWS Access Key ID: [your access key]
# AWS Secret Access Key: [your secret key]
# Default region name: us-east-1
```

- Test your setup:

```bash
# Should display your AWS account ID
aws sts get-caller-identity --profile sigir-participant
# Verify access to the configuration service
aws ssm get-parameter --name /pinecone/ro_token --profile sigir-participant
```

Basic usage example:
```python
from liverag.pipeline.rag_pipeline import rag_pipeline

# Run the pipeline
result = rag_pipeline(
    query="What exactly gives particles their mass according to physics?",
    query_id="test_001"
)

# Access results
print(f"Query: {result['query']}")
print(f"Answer: {result['answer']}")
print(f"Execution time: {result['execution_time']:.2f} seconds")
```

For more detailed examples, check the examples/ directory, which contains:

- example_usage.py: example of using the RAG pipeline
- input/: directory for input files
- output/: directory for generated outputs
- README.md: documentation of the examples
This section provides detailed instructions for reproducing the results of our RAG pipeline on the test queries for the LiveRAG 2025 challenge.
- Python 3.12 or higher
- AWS account with Bedrock access
- Access to the provided Pinecone and OpenSearch indices
- Test queries file (test-set.jsonl)
- AWS Bedrock endpoint configuration
- Endpoint Configuration
  - The pipeline requires a specific AWS Bedrock endpoint for the Falcon3-10B-Instruct model
  - Update the endpoint ARN in liverag/pipeline/rag_pipeline.py:

    ```python
    # AWS Bedrock configuration
    ENDPOINT_ARN = 'your-endpoint-arn'  # Replace with your Falcon3-10B-Instruct endpoint ARN
    ```

  - The endpoint should be in the us-east-1 region
  - Ensure you have the necessary permissions to access the endpoint
  - The endpoint must be running the Falcon3-10B-Instruct model
- Model Requirements
  - The pipeline is specifically designed to work with the Falcon3-10B-Instruct model
  - The model should be deployed as a Bedrock endpoint
  - Minimum requirements for the endpoint:
    - Model: Falcon3-10B-Instruct
    - Region: us-east-1
    - Instance type: ml.g5.2xlarge or higher recommended
- AWS configuration to access the Pinecone and OpenSearch indexes:

```bash
# Configure AWS credentials
aws configure --profile sigir-participant
# Enter the following when prompted:
# AWS Access Key ID: [your access key]
# AWS Secret Access Key: [your secret key]
# Default region name: us-east-1
# Default output format: json
```

- Verify access:

```bash
# Test AWS configuration
aws sts get-caller-identity --profile sigir-participant
aws ssm get-parameter --name /pinecone/ro_token --profile sigir-participant
```

- Run the Pipeline
The test pipeline is located in the test-day directory and processes all test-set queries:

```bash
cd test-day
python test_pipeline.py
```

The script will:

- Load queries from data/test-set.jsonl
- Process each query through the RAG pipeline
- Save results to data/test-set_output.jsonl
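The batch loop in test_pipeline.py can be sketched roughly as follows. The file paths come from this README, while the function signature and the input JSONL field names (`id`, `question`) are assumptions about the script's internals:

```python
import json

def run_test_set(pipeline, in_path="data/test-set.jsonl",
                 out_path="data/test-set_output.jsonl"):
    """Run `pipeline` over every query in a JSONL file, writing JSONL results."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            query = json.loads(line)
            result = pipeline(query=query["question"], query_id=query["id"])
            fout.write(json.dumps(result) + "\n")
```

In the actual script, `pipeline` would be `liverag.pipeline.rag_pipeline.rag_pipeline`.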
The output file (test-set_output.jsonl) will contain one JSON object per line with the following structure:

```json
{
  "id": "query_id",
  "question": "Original question",
  "passages": [
    {
      "passage": "Retrieved text",
      "doc_IDs": ["document_id"]
    }
  ],
  "final_prompt": "Prompt used for answer generation",
  "answer": "Generated answer"
}
```

- Average query processing time: ~15-20 seconds per query
- Total processing time for 100 queries: ~25-30 minutes
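A quick way to inspect the output file, assuming the structure above (this helper is illustrative, not part of the repository):

```python
import json

def load_results(path="data/test-set_output.jsonl"):
    """Parse the pipeline's output JSONL into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def summarize(results):
    # One tuple per query: id, number of retrieved passages, answer preview.
    return [(r["id"], len(r["passages"]), r["answer"][:60]) for r in results]
```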
- AWS Authentication Issues
  - Verify AWS credentials are correctly configured
  - Ensure access to the required AWS services (Bedrock, SSM)
  - Verify the Bedrock endpoint ARN is correctly set in rag_pipeline.py
  - Check that the endpoint is in the correct region (us-east-1); if not, adjust the script accordingly
- Bedrock Endpoint Issues
  - Ensure the Falcon3-10B-Instruct endpoint is active and running
  - Verify you have the necessary IAM permissions
  - Verify the endpoint is running the correct model version
  - Check endpoint metrics for any performance issues
For any issues with reproduction or questions about the implementation, please contact:
- Email: mall.priyanshu7@gmail.com
- GitHub Issues: https://github.com/mallpriyanshu/LiveRAG-2025/issues
The pipeline uses several configuration parameters that can be customized:
- Pinecone index name and namespace
- OpenSearch index name
- AWS region and profile
- Model parameters for reranking and embeddings
- Inference parameters for the LLM
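These parameters could be gathered in one place along the following lines; every name and value below is a placeholder for illustration, not the repository's actual configuration:

```python
# Illustrative configuration sketch; all values are placeholders.
CONFIG = {
    "pinecone_index": "your-pinecone-index",
    "pinecone_namespace": "your-namespace",
    "opensearch_index": "your-opensearch-index",
    "aws_region": "us-east-1",
    "aws_profile": "sigir-participant",
    "reranker_model": "your-bge-reranker-checkpoint",
    "embedding_model": "your-embedding-model",
    # LLM inference parameters (names/values are assumptions)
    "llm": {"max_new_tokens": 512, "temperature": 0.1},
}
```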
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.