A framework for Reinforcement Learning research.
- 💡 Perfect to understand and prototype algorithms:
  - One algorithm = One directory -> No backtracking through parent classes
  - Algorithms can be easily copied out of RL-X
- ⚒️ Known DL libraries: Implementations mainly in JAX (Flax), with several also in PyTorch
- ⚡ Maximum speed: Just-In-Time (JIT) compilation and parallel environments
- 🧪 Mix and match and extend: Generic interfaces between algorithms and environments
- ⛰️ Custom environments: Examples for MuJoCo, Isaac Lab, ManiSkill or custom socket communication
- 🚀 GPU environments: MJX, Warp, Isaac Lab and ManiSkill can run thousands of parallel environments
- 🤖 Robot learning: Training and deployment for the Unitree Go2 (quadruped) and G1 (humanoid) robots
- ⚽ RoboCup: Training for the RoboCup soccer competition in MuJoCo and MJX
- 🕰️ Memory architectures: PPO with GRU, LSTM, Transformer, History Window, Mamba-2, Memory Actions
- 📈 Experiments: Checkpoints, Evaluation, Console log, Tensorboard, Weights & Biases, SLURM, Docker
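The "generic interfaces" point above can be pictured with a small sketch. This is not RL-X's actual API: the `Environment` and `Algorithm` protocols, `CountdownEnv`, `AlwaysOne` and `run_episode` are hypothetical names made up for illustration. It only shows the general idea that a training loop written against an interface works with any algorithm/environment pair that satisfies it.

```python
from typing import Protocol, Sequence


class Environment(Protocol):
    """Hypothetical minimal environment interface (an assumption, not RL-X's API)."""

    def reset(self) -> Sequence[float]: ...

    def step(self, action: int) -> tuple[Sequence[float], float, bool]: ...


class Algorithm(Protocol):
    """Hypothetical minimal algorithm interface (an assumption, not RL-X's API)."""

    def act(self, observation: Sequence[float]) -> int: ...


class CountdownEnv:
    """Toy environment: the episode ends after `horizon` steps; action 1 earns reward 1."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.steps_left = horizon

    def reset(self) -> Sequence[float]:
        self.steps_left = self.horizon
        return [float(self.steps_left)]

    def step(self, action: int) -> tuple[Sequence[float], float, bool]:
        self.steps_left -= 1
        reward = 1.0 if action == 1 else 0.0
        return [float(self.steps_left)], reward, self.steps_left == 0


class AlwaysOne:
    """Toy policy that ignores the observation and always picks action 1."""

    def act(self, observation: Sequence[float]) -> int:
        return 1


def run_episode(algorithm: Algorithm, environment: Environment) -> float:
    """Run one episode; depends only on the two interfaces, not on concrete classes."""
    observation = environment.reset()
    episode_return, done = 0.0, False
    while not done:
        observation, reward, done = environment.step(algorithm.act(observation))
        episode_return += reward
    return episode_return


if __name__ == "__main__":
    print(run_episode(AlwaysOne(), CountdownEnv(horizon=5)))
```

Swapping in a different environment or policy class requires no change to `run_episode`, which is the property the bullet describes.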
- Proximal Policy Optimization (PPO) in PyTorch, Flax
- Proximal Policy Optimization + Differentiable Trust Region Layers (PPO+DTRL) in Flax
- Proximal Policy Optimization + Gated Recurrent Unit (PPO+GRU) in Flax
- Proximal Policy Optimization + Long Short-Term Memory (PPO+LSTM) in Flax
- Proximal Policy Optimization + Transformer (PPO+Transformer) in Flax
- Proximal Policy Optimization + History Window (PPO+HistoryWindow) in Flax
- Proximal Policy Optimization + Mamba-2 (PPO+Mamba-2) in Flax
- Proximal Policy Optimization + Memory Actions (PPO+MemoryActions) in Flax
- Early Stopping Policy Optimization (ESPO) in PyTorch, Flax
- Deep Deterministic Policy Gradient (DDPG) in Flax
- Twin Delayed Deep Deterministic Policy Gradient (TD3) in Flax
- Fast Twin Delayed Deep Deterministic Policy Gradient (FastTD3) in PyTorch, Flax
- Soft Actor-Critic (SAC) in PyTorch, Flax
- Fast Soft Actor-Critic (FastSAC) in PyTorch, Flax
- Randomized Ensembled Double Q-Learning (REDQ) in Flax
- Dropout Q-Functions (DroQ) in Flax
- CrossQ in Flax
- Truncated Quantile Critics (TQC) in Flax
- Aggressive Q-Learning with Ensembles (AQE) in Flax
- Maximum a Posteriori Policy Optimization (MPO) in PyTorch, Flax
- Fast Maximum a Posteriori Policy Optimization (FastMPO) in Flax
- Deep Q-Network (DQN) in Flax
- Deep Q-Network with Histogram Loss using Gaussians (DQN HL-Gauss) in Flax
- Double Deep Q-Network (DDQN) in Flax
- Categorical Deep Q-Network (C51) in Flax
- Parallelized Q-Network (PQN) in Flax
- Gymnasium
- MuJoCo
- Atari
- Classic control
- DeepMind Control Suite
- EnvPool
- MuJoCo
- Atari
- Classic control
- DeepMind Control Suite
- MuJoCo Playground
- Locomotion
- Custom MuJoCo
- Example of a custom MuJoCo environment
- Example of a custom MuJoCo XLA (MJX) environment
- Example of a custom MuJoCo XLA (MJX) environment with the Warp backend
- Example of a custom MuJoCo Warp environment with PyTorch
- Custom Robot Learning
- Example of custom MuJoCo, MJX and MJX + Warp environments for quadruped and humanoid locomotion learning and real robot deployment
- Custom RoboCup Soccer
- Example of custom MuJoCo and MJX environments for the RoboCup soccer simulation 3D league and other humanoid soccer leagues
- Custom Isaac Lab
- Example of a custom Isaac Lab environment
- Custom ManiSkill
- Example of a custom ManiSkill environment
- Custom Interface
- Prototype of a custom environment interface with socket communication
All listed environments are directly embedded in RL-X and can be used out-of-the-box.
For more information on the environments and algorithms, and on how to add your own, see the README files in the respective directories.
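To give a feel for what a socket-based environment interface involves, here is a minimal sketch. It does not reproduce the RL-X prototype: the length-prefixed JSON framing, the message fields (`action`, `observation`, `reward`) and all function names are assumptions chosen for illustration. A thread stands in for an external simulator, and `socket.socketpair()` replaces a real network connection so the example is self-contained.

```python
import json
import socket
import threading


def send_message(connection: socket.socket, payload: dict) -> None:
    """Send a JSON payload prefixed with its length as a 4-byte big-endian integer."""
    data = json.dumps(payload).encode("utf-8")
    connection.sendall(len(data).to_bytes(4, "big") + data)


def receive_exactly(connection: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes; recv() may return fewer bytes than requested."""
    buffer = b""
    while len(buffer) < size:
        chunk = connection.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buffer += chunk
    return buffer


def receive_message(connection: socket.socket) -> dict:
    """Read one length-prefixed JSON message."""
    size = int.from_bytes(receive_exactly(connection, 4), "big")
    return json.loads(receive_exactly(connection, size).decode("utf-8"))


def simulator_loop(connection: socket.socket, num_steps: int) -> None:
    """Stand-in for an external simulator: answers each action with an observation."""
    position = 0.0
    for _ in range(num_steps):
        action = receive_message(connection)["action"]
        position += action
        send_message(connection, {"observation": [position], "reward": -abs(position)})
    connection.close()


def run_demo(actions: list) -> list:
    """Agent side: send each action and collect the simulator's replies."""
    agent_side, simulator_side = socket.socketpair()
    simulator = threading.Thread(target=simulator_loop, args=(simulator_side, len(actions)))
    simulator.start()
    replies = []
    for action in actions:
        send_message(agent_side, {"action": action})
        replies.append(receive_message(agent_side))
    simulator.join()
    agent_side.close()
    return replies


if __name__ == "__main__":
    for reply in run_demo([1.0, -0.5]):
        print(reply)
```

The explicit length prefix matters because TCP is a byte stream: without framing, the receiver cannot tell where one message ends and the next begins.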
Default installation for a Linux system with an NVIDIA GPU:
conda create -n rlx python=3.11.4
conda activate rlx
git clone git@github.com:nico-bohlinger/RL-X.git
cd RL-X
pip install -e ".[all]" --config-settings editable_mode=compat
pip uninstall $(pip freeze | grep -E '\-cu12|\-cu13' | cut -d '=' -f 1) -y
pip install "torch>=2.7.0" --index-url https://download.pytorch.org/whl/cu118 --upgrade
pip install "jax[cuda12]"
For other configurations, see the detailed installation guide in the documentation. Isaac Lab must be installed separately; instructions for it are also found there. ManiSkill may likewise need additional steps, such as downgrading NumPy.
cd experiments
python experiment.py
Detailed instructions for running experiments can be found in the README file in the experiments directory or in the documentation.
If you use RL-X in your research, please cite the following paper:
@incollection{bohlinger2023rlx,
title={RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup},
author={Nico Bohlinger and Klaus Dorer},
booktitle={Robot World Cup},
pages={228--239},
year={2023},
publisher={Springer}
}