A lightweight, flexible library for training neural retrievers with HuggingFace transformers, featuring multiple ranking architectures, integrated PyTerrier support, and production-ready evaluation pipelines.
Rankers provides a unified interface for training, evaluating, and deploying neural ranking models. Built on top of HuggingFace transformers, it supports multiple architectures (bi-encoders, cross-encoders, sparse retrievers, and more) while maintaining compatibility with the transformers.Trainer API.
- Multiple Architectures: Bi-encoders (Dot), Cross-encoders (CAT), Sequence-to-Sequence, Sparse models, BGE, and more
- HuggingFace Compatible: Drop-in `RankerTrainer` that extends `transformers.Trainer`
- PyTerrier Integration: Convert trained models to PyTerrier pipelines instantly
- Production Ready: Built-in evaluation, checkpointing, and loss functions optimized for ranking
Clone the repository and install dependencies:
```bash
git clone https://github.com/Parry-Parry/rankers.git
cd rankers
pip install -e .
```

Or install the latest release from PyPI:

```bash
pip install rankers
```

A minimal end-to-end training run:

```python
from rankers import RankerTrainer, RankerTrainingArguments
from rankers.modelling import Dot
# EvaluationDataset is assumed to be exported alongside TrainingDataset
from rankers.datasets import TrainingDataset, EvaluationDataset, Corpus

# Load pre-trained model
model = Dot.from_pretrained("bert-base-uncased")

# Prepare datasets
corpus = Corpus.from_ir_datasets("msmarco-passage")
train_dataset = TrainingDataset("train.jsonl", corpus=corpus, group_size=4)
eval_dataset = EvaluationDataset.from_qrels("qrels.txt", corpus=corpus)

# Configure training
args = RankerTrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    eval_strategy="epoch",
    save_strategy="best",
)

# Train
trainer = RankerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_fn="margin_mse",
)
trainer.train()

# Convert to PyTerrier
ranker = model.to_pyterrier()
# test_queries is assumed to be a PyTerrier-style DataFrame prepared elsewhere
results = ranker.transform(test_queries)
```

Rankers includes implementations of popular ranking architectures:
| Architecture | Class | Type | Use Case |
|---|---|---|---|
| Dot | `Dot` | Bi-encoder | Fast dense retrieval with separable encoders |
| CAT | `CAT` | Cross-encoder | High-precision ranking with joint encoding |
| Seq2Seq | `Seq2Seq` | Generative | Ranking via generation |
| Sparse | `Sparse` | Sparse Retrieval | Term-based retrieval with neural weights |
| BGE | `BGE` | Bi-encoder | HuggingFace BGE models |
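
To make the "separable encoders" idea behind the bi-encoder rows concrete, here is a minimal, framework-free sketch. It assumes document vectors have already been produced by some encoder; the toy embeddings and the helper names `dot` and `rank_by_dot` are hypothetical and are not part of the Rankers API.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot(query_vec: Vector, doc_vecs: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score precomputed document vectors against one query vector, best first."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for encoder outputs (hypothetical values).
doc_index = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.2, 0.8, 0.1],
    "d3": [0.0, 0.3, 0.9],
}
query = [1.0, 0.5, 0.0]
ranking = rank_by_dot(query, doc_index)
```

Because documents are encoded independently of the query, `doc_index` can be built once and reused for every query. A cross-encoder, by contrast, has to run the model on each query-document pair, which is why it is typically used to rerank a short candidate list.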
Each model supports:
- Training with `RankerTrainer`
- Evaluation with IR metrics (nDCG, MRR, MAP, etc.)
- PyTerrier conversion for deployment
- Checkpointing with `save_pretrained()`/`from_pretrained()`
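Training data is JSONL, one example per line:

```json
{"query_id": "q1", "doc_id_a": "d1", "doc_id_b": "d2"}
{"query_id": "q2", "doc_id_a": "d3", "doc_id_b": "d4"}
```

Relevance judgments (qrels) use the TREC-style layout of query ID, iteration, document ID, and relevance label:

```text
q1 0 d1 2
q1 0 d2 1
q2 0 d3 2
```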
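
The two sample formats above (JSONL training records and whitespace-separated qrels lines) can be read with the standard library alone. The sketch below is only an illustration of the file layouts; `parse_training_lines` and `parse_qrels_lines` are hypothetical helpers, not functions provided by Rankers, and the column meaning of the qrels lines is assumed to follow the usual TREC convention.

```python
import json
from collections import defaultdict
from typing import Dict, List


def parse_training_lines(lines: List[str]) -> List[dict]:
    """Parse JSONL training records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def parse_qrels_lines(lines: List[str]) -> Dict[str, Dict[str, int]]:
    """Parse qrels lines of the form: query_id iteration doc_id relevance."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue
        query_id, _iteration, doc_id, relevance = line.split()
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


train_lines = [
    '{"query_id": "q1", "doc_id_a": "d1", "doc_id_b": "d2"}',
    '{"query_id": "q2", "doc_id_a": "d3", "doc_id_b": "d4"}',
]
qrels_lines = ["q1 0 d1 2", "q1 0 d2 1", "q2 0 d3 2"]

records = parse_training_lines(train_lines)
qrels = parse_qrels_lines(qrels_lines)
```

Here `qrels` maps each query ID to a dictionary of document IDs and graded relevance labels, which is the shape most IR evaluation tools expect.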
Customize your training with RankerTrainingArguments:
```python
from rankers import RankerTrainingArguments

args = RankerTrainingArguments(
    output_dir="./checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    warmup_steps=500,
    # Evaluation & Checkpointing
    eval_strategy="steps",
    eval_steps=500,
    save_strategy="best",
    metric_for_best_model="eval_nDCG@10",
    load_best_model_at_end=True,
    # Loss & Regularization
    loss_fn="margin_mse",  # or "contrastive", "listwise", etc.
    regularizer="l2",
    regularization_weight=0.001,
    # Tracking
    report_to=["wandb"],
)
```

Rankers includes multiple loss functions optimized for ranking:
- Pairwise: `margin_mse`, `contrastive`
- Listwise: `listnet`, `ndcg_loss`
- Ranking-specific: `triplet`, `in_batch_negatives`
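
For readers unfamiliar with two of the names in this list, the following framework-free sketch shows how margin MSE and in-batch negatives are commonly defined in the literature. It is an assumption that Rankers' own `margin_mse` and `in_batch_negatives` follow these exact formulas; the functions below are illustrative and do not call the library.

```python
import math
from typing import Sequence


def margin_mse(
    student_pos: Sequence[float],
    student_neg: Sequence[float],
    teacher_pos: Sequence[float],
    teacher_neg: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins."""
    n = len(student_pos)
    if n == 0 or not (n == len(student_neg) == len(teacher_pos) == len(teacher_neg)):
        raise ValueError("all score lists must be non-empty and equally long")
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        # Compare how far apart each model places the positive and the negative.
        total += ((sp - sn) - (tp - tn)) ** 2
    return total / n


def in_batch_negatives_loss(scores: Sequence[Sequence[float]]) -> float:
    """Cross-entropy over a square score matrix whose diagonal holds the positives.

    scores[i][j] is the score of query i against document j, so every other
    document in the batch acts as a negative for query i.
    """
    n = len(scores)
    if n == 0 or any(len(row) != n for row in scores):
        raise ValueError("scores must be a non-empty square matrix")
    total = 0.0
    for i, row in enumerate(scores):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += log_sum_exp - row[i]
    return total / n


distill_loss = margin_mse([2.0, 1.0], [1.0, 1.5], [3.0, 2.0], [1.0, 1.0])
batch_loss = in_batch_negatives_loss([[0.0, 0.0], [0.0, 0.0]])
```

Margin MSE needs teacher scores for each positive and negative, so it is typically paired with distillation data; in-batch negatives only needs one positive per query and grows its negative pool with the batch size.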
Built-in evaluation with IR metrics:
```python
# During training (automatic)
trainer = RankerTrainer(
    ...,
    eval_dataset=eval_dataset,
)
results = trainer.evaluate()
# Returns: {"eval_nDCG@10": 0.45, "eval_MRR": 0.52, ...}

# After training
predictions = trainer.predict(test_dataset)
# Returns: PredictionOutput with scores and metrics
```

Supported metrics (via ir_measures):
- nDCG@k, MRR, MAP, Recall@k, Precision@k, and more
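
As a reminder of what two of these metrics measure, here is a small hand-rolled sketch of nDCG@k and reciprocal rank. It uses the linear-gain form of DCG; it is an assumption that this matches the variant computed by whichever ir_measures backend is in use, so treat it as an illustration rather than a replacement for the library. The function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[int], k: int) -> float:
    """Discounted cumulative gain over the top-k gains (linear gain)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranking: List[str], relevance: Dict[str, int], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranking]
    ideal_dcg = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def reciprocal_rank(ranking: List[str], relevance: Dict[str, int]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if relevance.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


# One query with two judged documents; d3 is unjudged and treated as non-relevant.
relevance = {"d1": 2, "d2": 1}
system_ranking = ["d3", "d2", "d1"]

ndcg = ndcg_at_k(system_ranking, relevance, k=10)
rr = reciprocal_rank(system_ranking, relevance)
```

MRR is the mean of `reciprocal_rank` over all queries, and the reported nDCG@10 is likewise the per-query value averaged across the evaluation set.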
Once trained, convert to PyTerrier for deployment:
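```python
from rankers.modelling import Dot

# Load trained model
model = Dot.from_pretrained("./checkpoints/best_model")

# Convert to PyTerrier
ranker = model.to_pyterrier(batch_size=128)

# Use in IR pipelines
# (`retriever` and `reranker` are other PyTerrier transformers defined elsewhere)
pipeline = retriever >> ranker >> reranker

# Run end-to-end ranking (`queries_df` is a PyTerrier queries DataFrame)
results = pipeline.transform(queries_df)
```

Check the examples/ directory for complete scripts: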
- `train.bert.cat.py`: Train a BERT-based cross-encoder
- `train.biencoder.py`: Train a bi-encoder with in-batch negatives
- `train.sparse.py`: Train a sparse neural retriever
- `eval_and_rank.py`: Evaluate and generate rankings
Run an example:
```bash
python examples/train.bert.cat.py \
    --model_name_or_path bert-base-uncased \
    --training_data path/to/train.jsonl \
    --output_dir ./my_ranker
```

Full API documentation is available and can be built using Sphinx:
```bash
# Install documentation dependencies
pip install -e ".[docs]"

# Build documentation
cd docs && make html

# View in browser (macOS; use xdg-open on Linux)
open _build/html/index.html
```

The documentation includes:
- Complete API reference for all modules
- Architecture deep-dives and hyperparameter guides
- Training tutorials and best practices
- PyTerrier integration examples
Comprehensive test suite with integration tests:
```bash
# Run all tests
pytest

# Run only integration tests
pytest tests/integration/

# Run with coverage
pytest --cov=rankers
```

All tests pass, including:
- 89 integration and unit tests covering the full training pipeline
- Gradient flow verification across evaluation passes
- Model checkpointing and loading functionality
- Loss computation and backward propagation
We welcome contributions! Please read our contributing guidelines for more details.
This project is licensed under the Apache 2.0 License; see the LICENSE file for details.
- PyTerrier: Information Retrieval research platform
- HuggingFace Transformers: State-of-the-art NLP models
- ir_measures: Standard IR evaluation metrics