Calvera is a Python library offering a collection of neural multi-armed bandit algorithms, designed to integrate seamlessly with PyTorch and PyTorch Lightning. Whether you're exploring contextual bandits or developing new strategies, Calvera provides a flexible, easy-to-use interface. You can bring your own neural networks and datasets while Calvera focuses on the implementation of the bandit algorithms.
- Multi-Armed Bandit Algorithms:
    - (Approximate + Standard) Linear Thompson Sampling
    - (Approximate + Standard) Linear UCB
    - Neural Linear
    - Neural Thompson Sampling
    - Neural UCB
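To illustrate the idea behind the first algorithm in this list, here is a minimal, self-contained sketch of linear Thompson Sampling in plain PyTorch. It is not Calvera's implementation and uses no Calvera API; all names are illustrative:

```python
import torch

# Illustrative linear Thompson Sampling (not Calvera's implementation).
# Model: reward ≈ x @ theta + noise; maintain a Gaussian posterior over theta.
n_features = 4
precision = torch.eye(n_features)  # posterior precision, grows by x x^T per step
b = torch.zeros(n_features)        # running sum of reward-weighted contexts

def choose_arm(contexts: torch.Tensor) -> int:
    """Sample theta from the posterior and pick the best-scoring arm."""
    cov = torch.linalg.inv(precision)
    mean = cov @ b
    theta = torch.distributions.MultivariateNormal(
        mean, covariance_matrix=cov
    ).sample()
    scores = contexts @ theta      # one score per candidate arm
    return int(scores.argmax())

def update(x: torch.Tensor, reward: float) -> None:
    """Rank-one posterior update after observing a reward."""
    global precision, b
    precision = precision + torch.outer(x, x)
    b = b + reward * x

contexts = torch.randn(3, n_features)  # 3 candidate arms
arm = choose_arm(contexts)
update(contexts[arm], reward=1.0)
```

The neural variants follow the same explore-by-sampling principle but score arms with a neural network instead of a linear model.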
- Customizable Selectors:
    - `ArgMaxSelector`: Chooses the arm with the highest score.
    - `EpsilonGreedySelector`: Chooses the best arm with probability `1 - epsilon` or a random arm with probability `epsilon`.
    - `TopKSelector`: Selects the top `k` arms with the highest scores.
    - `EpsilonGreedyTopKSelector`: Selects the top `k` arms with probability `1 - epsilon` or `k` random arms with probability `epsilon`.
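A selector only decides how a vector of arm scores maps to chosen arms. The following sketch shows the two underlying strategies in plain PyTorch; it does not use the Calvera selector classes:

```python
import torch

def epsilon_greedy(scores: torch.Tensor, epsilon: float) -> int:
    """Pick the argmax with probability 1 - epsilon, a uniform random arm otherwise."""
    if torch.rand(()) < epsilon:
        return int(torch.randint(len(scores), ()))
    return int(scores.argmax())

def top_k(scores: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k highest-scoring arms."""
    return scores.topk(k).indices

scores = torch.tensor([0.1, 0.9, 0.4, 0.7])
arm = epsilon_greedy(scores, epsilon=0.1)  # usually arm 1
best3 = top_k(scores, k=3)                 # indices of the three best arms
```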
- Integration:
    - Built on top of PyTorch Lightning for training and inference.
    - Minimal adjustments needed to plug into your existing workflow.
Calvera is available on PyPI. Install it via pip:

```bash
pip install calvera
```

This installs the necessary dependencies for the base library. If you want to use parts of the benchmark subpackage, we recommend installing the optional dependencies as well:

```bash
pip install calvera[benchmark]
```

For development you can install the development dependencies via:

```bash
pip install calvera[dev]
```

Below is a simple example using a Linear Thompson Sampling bandit:
```python
import torch
import lightning as pl

from calvera.bandits import LinearTSBandit

# 1. Create a bandit for a linear model with 128 features.
N_FEATURES = 128
bandit = LinearTSBandit(n_features=N_FEATURES)

# 2. Generate sample data (batch_size, n_actions, n_features) and perform inference.
data = torch.randn(100, 1, N_FEATURES)
chosen_arms_one_hot, probabilities = bandit(data)
chosen_arms = chosen_arms_one_hot.argmax(dim=1)

# 3. Retrieve rewards for the chosen arms.
rewards = torch.randn(100, 1)

# 4. Add the data to the bandit: gather the chosen action along the arm dimension.
chosen_contextualized_actions = torch.gather(
    data, 1, chosen_arms.view(-1, 1, 1).expand(-1, 1, N_FEATURES)
)
bandit.record_feedback(chosen_contextualized_actions, rewards)

# 5. Train the bandit.
trainer = pl.Trainer(
    max_epochs=1,
    enable_progress_bar=False,
    enable_model_summary=False,
    accelerator="auto",
)
trainer.fit(bandit)

# 6. Repeat the process as needed.
```

For more detailed examples, see the examples page in the documentation.
- Bandits: Each bandit is implemented as a PyTorch Lightning module with `forward()` for inference and `training_step()` for training.
- Buffers: Data is managed via buffers that subclass `AbstractBanditDataBuffer`.
- Selectors: Easily customize your arm selection strategy by using or extending the provided selectors.
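To make the buffer concept concrete, here is a toy in-memory stand-in, not `AbstractBanditDataBuffer` itself; the class and method names are hypothetical:

```python
import torch

class SimpleBuffer:
    """Toy buffer: accumulates (contextualized action, reward) batches for training."""

    def __init__(self) -> None:
        self.contexts: list[torch.Tensor] = []
        self.rewards: list[torch.Tensor] = []

    def add(self, contexts: torch.Tensor, rewards: torch.Tensor) -> None:
        # Store one batch of observed data.
        self.contexts.append(contexts)
        self.rewards.append(rewards)

    def all_data(self) -> tuple[torch.Tensor, torch.Tensor]:
        # Concatenate everything seen so far along the batch dimension.
        return torch.cat(self.contexts), torch.cat(self.rewards)

buffer = SimpleBuffer()
buffer.add(torch.randn(8, 128), torch.randn(8, 1))
buffer.add(torch.randn(8, 128), torch.randn(8, 1))
x, y = buffer.all_data()  # shapes: (16, 128) and (16, 1)
```

Calvera's real buffers add concerns this toy omits, such as bounded capacity and batched retrieval for training.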
The bandit algorithms are evaluated on different benchmark datasets. Here is an overview of their performance on the Statlog (Shuttle) dataset:

Detailed benchmarks, datasets, and experimental results are available in the documentation. The configurations and more specific results can be found in `./experiments` under the corresponding sub-directories.
Contributions are welcome! For guidelines on how to contribute, please refer to our CONTRIBUTING.md.
Calvera is licensed under the MIT License. See the LICENSE file for details.
For questions or feedback, please reach out to one of the authors:
- Philipp Kolbe
- Robert Weeke
- Parisa Shahabinejad