RL-X

A framework for Reinforcement Learning research.

│ Overview │ Getting Started │ Documentation │ Citation │

Overview

Highlights

💡 Ideal for understanding and prototyping algorithms:
- One algorithm = One directory -> No backtracking through parent classes
- Algorithms can be easily copied out of RL-X
⚒️ Known DL libraries: Implementations mainly in JAX, with some in PyTorch
⚡ Maximum speed: Just-In-Time (JIT) compilation and parallel environments
🧪 Mix and match and extend: Generic interfaces between algorithms and environments
⛰️ Custom environments: Examples for MuJoCo, Isaac Lab, ManiSkill or custom socket communication
🚀 GPU environments: MJX, Warp, Isaac Lab and ManiSkill can run thousands of parallel environments
🤖 Robot learning: Training and deployment for the Unitree Go2 (quadruped) and G1 (humanoid) robots
⚽ RoboCup: Training for the RoboCup soccer competition in MuJoCo and MJX
🕰️ Memory architectures: PPO with GRU, LSTM, Transformer, History Window, Mamba-2, Memory Actions
📈 Experiments: Checkpoints, Evaluation, Console log, Tensorboard, Weights & Biases, SLURM, Docker
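
To illustrate what a generic interface between algorithms and environments can look like, here is a minimal sketch in plain Python. It is not RL-X's actual API: `Env`, `CountdownEnv` and `run_episode` are hypothetical names invented for this example. The idea is only that any environment exposing the same `reset`/`step` methods can be paired with any routine written against that interface.

```python
from typing import Callable, Protocol


class Env(Protocol):
    """Hypothetical minimal environment interface (not RL-X's real one)."""

    def reset(self) -> float: ...

    def step(self, action: int) -> tuple[float, float, bool]: ...


class CountdownEnv:
    """Toy environment: the episode ends after a fixed number of steps."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return float(self.horizon)

    def step(self, action: int) -> tuple[float, float, bool]:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # action 1 is rewarded
        done = self.t >= self.horizon
        return float(self.horizon - self.t), reward, done


def run_episode(env: Env, policy: Callable[[float], int]) -> float:
    """Roll out one episode and return the summed reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


print(run_episode(CountdownEnv(horizon=5), lambda obs: 1))  # prints 5.0
```

Because `run_episode` depends only on the `Env` protocol, swapping in a different environment class requires no change to the rollout code.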

Implemented Algorithms

Proximal Policy Optimization (PPO) in PyTorch, Flax
Proximal Policy Optimization + Differentiable Trust Region Layers (PPO+DTRL) in Flax
Proximal Policy Optimization + Gated Recurrent Unit (PPO+GRU) in Flax
Proximal Policy Optimization + Long Short-Term Memory (PPO+LSTM) in Flax
Proximal Policy Optimization + Transformer (PPO+Transformer) in Flax
Proximal Policy Optimization + History Window (PPO+HistoryWindow) in Flax
Proximal Policy Optimization + Mamba-2 (PPO+Mamba-2) in Flax
Proximal Policy Optimization + Memory Actions (PPO+MemoryActions) in Flax
Early Stopping Policy Optimization (ESPO) in PyTorch, Flax
Deep Deterministic Policy Gradient (DDPG) in Flax
Twin Delayed Deep Deterministic Policy Gradient (TD3) in Flax
Fast Twin Delayed Deep Deterministic Policy Gradient (FastTD3) in PyTorch, Flax
Soft Actor-Critic (SAC) in PyTorch, Flax
Fast Soft Actor-Critic (FastSAC) in PyTorch, Flax
Randomized Ensembled Double Q-Learning (REDQ) in Flax
Dropout Q-Functions (DroQ) in Flax
CrossQ in Flax
Truncated Quantile Critics (TQC) in Flax
Aggressive Q-Learning with Ensembles (AQE) in Flax
Maximum a Posteriori Policy Optimization (MPO) in PyTorch, Flax
Fast Maximum a Posteriori Policy Optimization (FastMPO) in Flax
Deep Q-Network (DQN) in Flax
Deep Q-Network with Histogram Loss using Gaussians (DQN HL-Gauss) in Flax
Double Deep Q-Network (DDQN) in Flax
Categorical Deep Q-Network (C51) in Flax
Parallelized Q-Network (PQN) in Flax

Usable Environments

Gymnasium
- MuJoCo
- Atari
- Classic control
- DeepMind Control Suite
EnvPool
- MuJoCo
- Atari
- Classic control
- DeepMind Control Suite
MuJoCo Playground
- Locomotion
Custom MuJoCo
- Example of a custom MuJoCo environment
- Example of a custom MuJoCo XLA (MJX) environment
- Example of a custom MuJoCo XLA (MJX) environment with the Warp backend
- Example of a custom MuJoCo Warp environment with PyTorch
Custom Robot Learning
- Example of custom MuJoCo, MJX and MJX + Warp environments for quadruped and humanoid locomotion learning and real robot deployment
Custom RoboCup Soccer
- Example of custom MuJoCo and MJX environments for the RoboCup soccer simulation 3D league and other humanoid soccer leagues
Custom Isaac Lab
- Example of a custom Isaac Lab environment
Custom ManiSkill
- Example of a custom ManiSkill environment
Custom Interface
- Prototype of a custom environment interface with socket communication

All listed environments are embedded directly in RL-X and can be used out of the box.

For further information on the environments and algorithms, and on how to add your own, read the respective README files.

Getting Started

Install

Default installation for a Linux system with an NVIDIA GPU:

conda create -n rlx python=3.11.4
conda activate rlx
git clone git@github.com:nico-bohlinger/RL-X.git
cd RL-X
pip install -e .[all] --config-settings editable_mode=compat
pip uninstall $(pip freeze | grep -E '\-cu12|\-cu13' | cut -d '=' -f 1) -y
pip install "torch>=2.7.0" --index-url https://download.pytorch.org/whl/cu118 --upgrade
pip install "jax[cuda12]"
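
The `pip uninstall` line above selects every installed package whose `pip freeze` entry contains `-cu12` or `-cu13` and keeps only the text before the first `=`. The following Python sketch mirrors that filter, to make clear which names end up being removed. The function name `cuda_wheel_names` and the sample package versions are illustrative assumptions, not output from a real environment.

```python
import re

# Same pattern as the grep -E expression in the install command.
CUDA_SUFFIX = re.compile(r"-cu12|-cu13")


def cuda_wheel_names(freeze_lines):
    """Return package names from `pip freeze` lines that match -cu12 or -cu13."""
    names = []
    for line in freeze_lines:
        if CUDA_SUFFIX.search(line):
            # Equivalent of `cut -d '=' -f 1`: keep the text before the first '='.
            names.append(line.split("=", 1)[0])
    return names


# Illustrative sample only; real `pip freeze` output will differ.
sample = [
    "numpy==1.26.4",
    "nvidia-cublas-cu12==12.1.3.1",
    "nvidia-cudnn-cu12==9.1.0.70",
    "torch==2.7.0",
]
print(cuda_wheel_names(sample))  # ['nvidia-cublas-cu12', 'nvidia-cudnn-cu12']
```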

For other configurations, see the detailed installation guide in the documentation. Isaac Lab must be installed separately; instructions for it are also provided there. ManiSkill may likewise need additional steps, such as downgrading NumPy.
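
After installing, a quick sanity check can confirm that the expected packages are on the import path. This is a sketch using only the standard library. It assumes the editable install exposes a top-level module named `rl_x` (matching the repository's source directory), and it checks only that the modules can be found, not that a GPU is visible to them.

```python
from importlib.util import find_spec


def installed_backends(names=("torch", "jax", "rl_x")):
    """Map each top-level module name to whether it can be found on the import path."""
    return {name: find_spec(name) is not None for name in names}


for name, found in installed_backends().items():
    print(f"{name}: {'found' if found else 'missing'}")
```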

Example

cd experiments
python experiment.py

Detailed instructions for running experiments can be found in the README file in the experiments directory or in the documentation.

A Google Colab example is also available.

Citation

If you use RL-X in your research, please cite the following paper:

@incollection{bohlinger2023rlx,
      title={RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup}, 
      author={Nico Bohlinger and Klaus Dorer},
      booktitle={Robot World Cup},
      pages={228--239},
      year={2023},
      publisher={Springer}
}
