A comprehensive collection of reinforcement learning implementations and experiments covering various algorithms and environments. This project demonstrates practical applications of RL techniques from classical Q-learning to advanced deep reinforcement learning methods.
This repository contains implementations of multiple reinforcement learning algorithms applied to different environments:
- Classical RL: Q-Learning on FrozenLake
- Deep RL: DQN, Double DQN, and PPO on custom environments
- Custom Environments: Tank Dodge game with Pygame
- Standard Environments: CartPole balancing
```
RL/
├── tank_kills/                # Custom Tank Dodge environment
│   ├── tanke_dodge.py         # Main environment implementation
│   ├── ppo_tank.py            # PPO algorithm implementation
│   ├── dqn_train_tank.py      # DQN training script
│   ├── visual_test_*.py       # Visualization scripts
│   ├── *.ipynb                # Jupyter notebooks for experiments
│   ├── saved_weights/         # Trained model checkpoints
│   └── assets/                # Game assets (images)
├── cart_pole/                 # CartPole experiments
│   ├── *.ipynb                # DQN implementations and experiments
│   └── *.pth                  # Saved model weights
├── frozen_lake_first/         # FrozenLake Q-learning
│   └── frozen_lake_q_table.ipynb
├── main.py                    # Entry point
├── pyproject.toml             # Project dependencies
└── README.md                  # This file
```
- Python 3.14+
- CUDA-compatible GPU (optional, for accelerated training)
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd RL
   ```

2. Install dependencies

   ```bash
   # Using uv (recommended)
   uv sync

   # Or using pip
   pip install -e .
   ```

3. Verify the installation

   ```bash
   python main.py
   ```
A custom Pygame-based environment where the agent controls a tank that must dodge enemy tanks.
Features:
- Configurable number of enemies (default: 3)
- Homing enemy behavior
- Reward system: +1 for survival, -10 for collision
- Headless mode for training
- Normalized state representation
State Space:
- Player position (x, y) - normalized
- Enemy positions (x, y) - normalized for each enemy
- Relative positions between player and enemies - normalized
Action Space:
- 0: Move up
- 1: Move right
- 2: Move down
- 3: Move left
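The observation layout above can be made concrete with a small self-contained stub. This is illustrative only — the function name, the exact ordering of the entries, and the normalization constant are assumptions, not the actual code in `tanke_dodge.py`:

```python
# Illustrative stub of the Tank Dodge observation layout (hypothetical names;
# see tank_kills/tanke_dodge.py for the real implementation).
SCREEN = 600.0  # screen is 600x600 pixels

def build_state(player, enemies):
    """Flatten player (x, y), each enemy's (x, y), and the player-relative
    offsets into one normalized vector."""
    state = [player[0] / SCREEN, player[1] / SCREEN]
    for ex, ey in enemies:
        state += [ex / SCREEN, ey / SCREEN]
    for ex, ey in enemies:
        state += [(ex - player[0]) / SCREEN, (ey - player[1]) / SCREEN]
    return state

# With the default 3 enemies the state has 2 + 2*3 + 2*3 = 14 entries.
state = build_state((300, 300), [(0, 0), (600, 300), (300, 600)])
print(len(state))  # 14
```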
Classic control environment from OpenAI Gym for balancing a pole on a cart.
Grid-world environment from OpenAI Gym for navigation tasks with holes.
- Location: `frozen_lake_first/frozen_lake_q_table.ipynb`
- Environment: FrozenLake-v1
- Features: Tabular method with epsilon-greedy exploration
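At the heart of the notebook is the standard tabular Q-learning update. A minimal self-contained sketch (illustrative names and hyperparameters, not the notebook's exact code):

```python
# Tabular Q-learning update rule (sketch):
# Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.99):
    best_next = max(Q[s_next])  # greedy bootstrap from the next state
    Q[s][a] += lr * (r + gamma * best_next - Q[s][a])

# Tiny 2-state, 2-action example.
Q = [[0.0, 0.0], [0.0, 1.0]]
q_update(Q, s=0, a=0, r=1.0, s_next=1)
print(Q[0][0])  # 0.1 * (1.0 + 0.99 * 1.0 - 0.0) ≈ 0.199
```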
- Location: `tank_kills/dqn_train_tank.py`, `cart_pole/dqn_cartpole.ipynb`
- Features:
  - Experience replay buffer
  - Target network stabilization
  - Epsilon-greedy exploration with decay
  - MLflow integration for experiment tracking
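An experience replay buffer fits in a few lines. The sketch below is illustrative (not the exact class in `dqn_train_tank.py`): a bounded deque evicts the oldest transitions, and uniform sampling breaks the temporal correlation of consecutive steps.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=5000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates the minibatch from the trajectory order.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
print(len(buf))  # 3 — the capacity caps the buffer
batch = buf.sample(2)
```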
- Location: `tank_kills/double_dqn.ipynb`
- Features: Decoupled action selection and evaluation to reduce overestimation bias
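The key change versus vanilla DQN fits in one function: the online network *selects* the next action, the target network *evaluates* it. A minimal sketch (plain lists stand in for the two networks' Q-value outputs; not the notebook's exact code):

```python
# Double DQN target (sketch): select with the online net, evaluate with the
# target net, which reduces the maximization (overestimation) bias.
def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    a_star = q_online_next.index(max(q_online_next))    # argmax under online net
    bootstrap = 0.0 if done else q_target_next[a_star]  # value under target net
    return reward + gamma * bootstrap

# The online net prefers action 1, but the target net's estimate of action 1
# is what gets bootstrapped into the TD target.
y = double_dqn_target([0.2, 0.9], [0.5, 0.4], reward=1.0, done=False)
print(round(y, 3))  # 1.0 + 0.99 * 0.4 = 1.396
```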
- Location: `tank_kills/ppo_tank.py`
- Features:
  - Actor-critic architecture
  - Clipped surrogate objective
  - Generalized Advantage Estimation (GAE)
  - Mini-batch updates
  - Entropy regularization for exploration
```bash
cd tank_kills

# Train with DQN
python dqn_train_tank.py

# Train with PPO
python ppo_tank.py

# Visualize trained agents
python visual_test_dqn.py  # For DQN models
python visual_test_ppo.py  # For PPO models
```

```bash
# Start Jupyter
jupyter notebook

# Navigate to specific notebooks:
# - frozen_lake_first/frozen_lake_q_table.ipynb
# - cart_pole/dqn_cartpole.ipynb
# - tank_kills/*.ipynb
```

The project integrates with MLflow for comprehensive experiment tracking:
- Metrics: Rewards, losses, epsilon values
- Artifacts: Training plots, model checkpoints
- Parameters: Hyperparameters logged automatically
```bash
# Start MLflow UI
mlflow ui
# Navigate to http://localhost:5000
```

- Experience Replay: Breaks temporal correlations for more stable training
- Target Networks: Stabilizes Q-learning updates
- Gradient Clipping: Prevents exploding gradients
- Checkpointing: Resume training from saved states
- Batch Processing: Efficient GPU utilization
- Real-time training curves
- Moving average plots
- Cumulative reward tracking
- Loss visualization
- TensorBoard integration (legacy)
- CUDA acceleration support
- Vectorized operations
- Efficient memory management
- Headless training modes
- DQN: Achieves consistent survival scores after ~2000 episodes
- PPO: More stable learning curve with better final performance
- Training Time: ~15-30 minutes on modern GPU for full convergence
- DQN: Reaches maximum score (200) within 500-1000 episodes
- Convergence: Stable and reproducible results
DQN:

```
learning_rate      = 0.0001
discount_factor    = 0.99
epsilon_start      = 1.0
epsilon_end        = 0.005
epsilon_decay      = 0.0001
replay_buffer_size = 5000
batch_size         = 128
```

PPO:

```
learning_rate = 0.0005
gamma         = 0.99
lambda        = 0.95
clip_epsilon  = 0.2
ppo_epochs    = 8
batch_size    = 64
entropy_coef  = 0.1
value_coef    = 0.5
```

- Screen Size: 600x600 pixels
- Tank Speed: 2 pixels/frame
- Enemy Speed: 1.2 pixels/frame
- Number of Enemies: Configurable (default: 3)
- Checkpoint Frequency: Every 500 episodes
- Plot Update: Every 20 episodes
- Progress Print: Every 50-100 episodes
1. CUDA Out of Memory
   - Reduce the batch size
   - Fall back to CPU training: `device = torch.device("cpu")`

2. Pygame Display Issues
   - Ensure proper display drivers are installed
   - Use headless mode for training: `headless=True`

3. MLflow Logging Issues
   - Check the MLflow installation: `pip install mlflow`
   - Ensure write permissions for the logging directory

4. GPU Acceleration
   - Ensure CUDA is properly installed
   - Check with `torch.cuda.is_available()`

5. Memory Optimization
   - Adjust the replay buffer size based on available RAM
   - Use gradient checkpointing for large models
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Commit your changes: `git commit -m 'Add feature'`
4. Push to the branch: `git push origin feature-name`
5. Open a Pull Request
- Deep Q-Networks (Mnih et al., 2015)
- Proximal Policy Optimization (Schulman et al., 2017)
- Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013)
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Gym for environment implementations
- Pygame for custom environment rendering
- MLflow for experiment tracking
- The broader reinforcement learning community
Happy Training! 🚀
For questions or issues, please open an issue on GitHub.