This repository collects implementations and notes for several reinforcement learning algorithms, organized by method and environment.
Most of these implementations originate from coursework and projects for the Deep Reinforcement Learning course at Sharif University of Technology.
The DQN folder contains an implementation of the Deep Q‑Network algorithm applied to the LunarLander environment.
The PPO folder implements Proximal Policy Optimization for the continuous‑control HalfCheetah environment.
The PPO_SAC_DDPG_theory folder provides concise theoretical notes and derivations for PPO, SAC, and DDPG. It focuses on objective functions, policy/value updates, and the main intuitions behind on‑policy and off‑policy actor–critic methods.
The SAC_DDPG folder includes implementations of:
- SAC on CartPole and HalfCheetah, highlighting entropy‑regularized learning and stochastic policies.
- DDPG on HalfCheetah, demonstrating deterministic policy gradients for continuous action spaces.
The TD&MC folder implements Temporal Difference and Monte Carlo methods on the CliffWalking environment. It compares on‑policy TD learning with episodic Monte Carlo returns, making the trade‑offs between bias, variance, and exploration more concrete.