A clean, modular implementation of continuous-action reinforcement learning (RL) algorithms and environments, designed for experimentation, benchmarking, and applied research.
This repository was developed in the context of the IFT6162 – Reinforcement Learning course at Mila and the Université de Montréal, and closely follows the methodology and notation of Pierre-Luc Bacon’s RL book.
- Continuous-control RL algorithms implemented from scratch
- Custom Gymnasium-compatible environments for training and evaluation
- Realistic industrial environments inspired by a flash clay calciner used in cement production
- Clear code structure, suitable for learning, research, and extension
Implementations of the following continuous-action reinforcement learning algorithms:
- REINFORCE – Monte Carlo policy gradient method
- PPO (Proximal Policy Optimization) – on-policy actor–critic method with a clipped surrogate objective
- TD3 (Twin Delayed DDPG) – off-policy actor–critic method with twin critics, delayed policy updates, and target policy smoothing
All implementations are aligned with the formulations presented in the reference book.
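As a rough orientation, the core quantity behind each of the three algorithms can be sketched in a few lines of plain Python. This is a minimal sketch of the standard textbook forms, not the code in this repository: the function names (`discounted_returns`, `ppo_clipped_objective`, `td3_target`) and the default values for `gamma` and `clip_eps` are assumptions made for illustration only.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE: reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO: mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3: clipped double-Q target r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are assumed to be the two target critics already
    evaluated at the smoothed (noise-perturbed) target action.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)
```

In practice these quantities are computed on batched tensors and differentiated through by an autodiff framework; scalar lists are used here only to keep the sketch free of dependencies.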
A collection of custom continuous Gymnasium environments used to train and evaluate the algorithms.
- Fully compatible with the Gymnasium API
- Designed for reproducibility and controlled experimentation
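To illustrate what Gymnasium compatibility and reproducibility mean in practice, here is a minimal sketch of the interaction loop such an environment supports. `PointMassEnv` is a hypothetical toy stand-in, not one of the environments in this repository. To stay dependency-free it only mirrors the `reset(seed=..., options=...)` and `step(action)` signatures; a real environment would subclass `gymnasium.Env` and declare its `observation_space` and `action_space` (typically `gymnasium.spaces.Box` for continuous control).

```python
import random


class PointMassEnv:
    """Hypothetical 1-D point mass that mirrors the Gymnasium reset/step signatures."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.action_low, self.action_high = -1.0, 1.0  # continuous action bounds
        self._rng = random.Random()
        self._pos = 0.0
        self._steps = 0

    def reset(self, seed=None, options=None):
        if seed is not None:
            self._rng = random.Random(seed)  # reseed so episodes are reproducible
        self._pos = self._rng.uniform(-1.0, 1.0)
        self._steps = 0
        return [self._pos], {}  # (observation, info)

    def step(self, action):
        # Clip the action to the declared bounds before applying it.
        a = max(self.action_low, min(float(action), self.action_high))
        self._pos += 0.1 * a
        self._steps += 1
        reward = -abs(self._pos)                   # closer to the origin is better
        terminated = abs(self._pos) < 0.01         # goal reached
        truncated = self._steps >= self.max_steps  # time limit hit
        return [self._pos], reward, terminated, truncated, {}


env = PointMassEnv(max_steps=50)
obs, info = env.reset(seed=0)
done = False
total_reward = 0.0
while not done:
    action = -obs[0]  # proportional controller standing in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
```

Passing the same `seed` to `reset` yields the same initial state, which is what makes runs comparable across algorithms.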
A set of environments modeling a real-world flash clay calciner, an industrial system used in cement manufacturing.
- Focused on realistic dynamics and control constraints
- Heavily inspired by the calciner environments from the IFT6162 homework repository
- Suitable for testing RL methods in applied, high-dimensional control settings
Developed in collaboration with:
- Martin Medina — https://github.com/medinammartin3
This work was carried out as part of the IFT6162 – Reinforcement Learning course at Mila and the Université de Montréal.