A comprehensive collection of reinforcement learning implementations and experiments covering various algorithms and environments. This project demonstrates practical applications of RL techniques from classical Q-learning to advanced deep reinforcement learning methods.
This repository contains implementations of multiple reinforcement learning algorithms applied to different environments:
- Classical RL: Q-Learning on FrozenLake
- Deep RL: DQN, Double DQN, and PPO on custom environments
- Custom Environments: Tank Dodge game with Pygame
- Standard Environments: CartPole balancing
```
RL/
├── tank_kills/                # Custom Tank Dodge environment
│   ├── tanke_dodge.py         # Main environment implementation
│   ├── ppo_tank.py            # PPO algorithm implementation
│   ├── dqn_train_tank.py      # DQN training script
│   ├── visual_test_*.py       # Visualization scripts
│   ├── *.ipynb                # Jupyter notebooks for experiments
│   ├── saved_weights/         # Trained model checkpoints
│   └── assets/                # Game assets (images)
├── cart_pole/                 # CartPole experiments
│   ├── *.ipynb                # DQN implementations and experiments
│   └── *.pth                  # Saved model weights
├── frozen_lake_first/         # FrozenLake Q-learning
│   └── frozen_lake_q_table.ipynb
├── main.py                    # Entry point
├── pyproject.toml             # Project dependencies
└── README.md                  # This file
```
- Python 3.14+
- CUDA-compatible GPU (optional, for accelerated training)
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd RL
   ```

2. Install dependencies

   ```bash
   # Using uv (recommended)
   uv sync

   # Or using pip
   pip install -e .
   ```

3. Verify the installation

   ```bash
   python main.py
   ```
A custom Pygame-based environment where the agent controls a tank that must dodge enemy tanks.
Features:
- Configurable number of enemies (default: 3)
- Homing enemy behavior
- Reward system: +1 for survival, -10 for collision
- Headless mode for training
- Normalized state representation
State Space:
- Player position (x, y) - normalized
- Enemy positions (x, y) - normalized for each enemy
- Relative positions between player and enemies - normalized
Action Space:
- 0: Move up
- 1: Move right
- 2: Move down
- 3: Move left
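The observation layout above can be made concrete with a small self-contained stub. This is illustrative only — the function name, the exact ordering of the entries, and the normalization constant are assumptions, not the actual code in `tanke_dodge.py`:

```python
# Illustrative stub of the Tank Dodge observation layout (hypothetical names;
# see tank_kills/tanke_dodge.py for the real implementation).
SCREEN = 600.0  # screen is 600x600 pixels

def build_state(player, enemies):
    """Flatten player (x, y), each enemy's (x, y), and the player-relative
    offsets into one normalized vector."""
    state = [player[0] / SCREEN, player[1] / SCREEN]
    for ex, ey in enemies:
        state += [ex / SCREEN, ey / SCREEN]
    for ex, ey in enemies:
        state += [(ex - player[0]) / SCREEN, (ey - player[1]) / SCREEN]
    return state

# With the default 3 enemies the state has 2 + 2*3 + 2*3 = 14 entries.
state = build_state((300, 300), [(0, 0), (600, 300), (300, 600)])
print(len(state))  # 14
```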
Classic control environment from OpenAI Gym for balancing a pole on a cart.
Grid-world environment from OpenAI Gym for navigation tasks with holes.
- Location: `frozen_lake_first/frozen_lake_q_table.ipynb`
- Environment: FrozenLake-v1
- Features: Tabular method with epsilon-greedy exploration
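At the heart of the notebook is the standard tabular Q-learning update. A minimal self-contained sketch (illustrative names and hyperparameters, not the notebook's exact code):

```python
# Tabular Q-learning update rule (sketch):
# Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.99):
    best_next = max(Q[s_next])  # greedy bootstrap from the next state
    Q[s][a] += lr * (r + gamma * best_next - Q[s][a])

# Tiny 2-state, 2-action example.
Q = [[0.0, 0.0], [0.0, 1.0]]
q_update(Q, s=0, a=0, r=1.0, s_next=1)
print(Q[0][0])  # 0.1 * (1.0 + 0.99 * 1.0 - 0.0) ≈ 0.199
```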
- Location: `tank_kills/dqn_train_tank.py`, `cart_pole/dqn_cartpole.ipynb`
- Features:
  - Experience replay buffer
  - Target network stabilization
  - Epsilon-greedy exploration with decay
  - MLflow integration for experiment tracking
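An experience replay buffer fits in a few lines. The sketch below is illustrative (not the exact class in `dqn_train_tank.py`): a bounded deque evicts the oldest transitions, and uniform sampling breaks the temporal correlation of consecutive steps.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer (illustrative sketch)."""

    def __init__(self, capacity=5000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates the minibatch from the trajectory order.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
print(len(buf))  # 3 — the capacity caps the buffer
batch = buf.sample(2)
```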
- Location: `tank_kills/double_dqn.ipynb`
- Features: Decoupled action selection and evaluation to reduce overestimation bias
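The key change versus vanilla DQN fits in one function: the online network *selects* the next action, the target network *evaluates* it. A minimal sketch (plain lists stand in for the two networks' Q-value outputs; not the notebook's exact code):

```python
# Double DQN target (sketch): select with the online net, evaluate with the
# target net, which reduces the maximization (overestimation) bias.
def double_dqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    a_star = q_online_next.index(max(q_online_next))    # argmax under online net
    bootstrap = 0.0 if done else q_target_next[a_star]  # value under target net
    return reward + gamma * bootstrap

# The online net prefers action 1, but the target net's estimate of action 1
# is what gets bootstrapped into the TD target.
y = double_dqn_target([0.2, 0.9], [0.5, 0.4], reward=1.0, done=False)
print(round(y, 3))  # 1.0 + 0.99 * 0.4 = 1.396
```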
- Location: `tank_kills/ppo_tank.py`
- Features:
  - Actor-critic architecture
  - Clipped surrogate objective
  - Generalized Advantage Estimation (GAE)
  - Mini-batch updates
  - Entropy regularization for exploration
```bash
cd tank_kills

# Train with DQN
python dqn_train_tank.py

# Train with PPO
python ppo_tank.py

# Visualize trained agents
python visual_test_dqn.py  # For DQN models
python visual_test_ppo.py  # For PPO models
```

```bash
# Start Jupyter
jupyter notebook

# Navigate to specific notebooks:
# - frozen_lake_first/frozen_lake_q_table.ipynb
# - cart_pole/dqn_cartpole.ipynb
# - tank_kills/*.ipynb
```

The project integrates with MLflow for comprehensive experiment tracking:
- Metrics: Rewards, losses, epsilon values
- Artifacts: Training plots, model checkpoints
- Parameters: Hyperparameters logged automatically
```bash
# Start MLflow UI
mlflow ui
# Navigate to http://localhost:5000
```

- Experience Replay: Breaks temporal correlations for more stable training
- Target Networks: Stabilizes Q-learning updates
- Gradient Clipping: Prevents exploding gradients
- Checkpointing: Resume training from saved states
- Batch Processing: Efficient GPU utilization
- Real-time training curves
- Moving average plots
- Cumulative reward tracking
- Loss visualization
- TensorBoard integration (legacy)
- CUDA acceleration support
- Vectorized operations
- Efficient memory management
- Headless training modes
- DQN: Achieves consistent survival scores after ~2000 episodes
- PPO: More stable learning curve with better final performance
- Training Time: ~15-30 minutes on modern GPU for full convergence
- DQN: Reaches maximum score (200) within 500-1000 episodes
- Convergence: Stable and reproducible results
DQN:

```
learning_rate      = 0.0001
discount_factor    = 0.99
epsilon_start      = 1.0
epsilon_end        = 0.005
epsilon_decay      = 0.0001
replay_buffer_size = 5000
batch_size         = 128
```

PPO:

```
learning_rate = 0.0005
gamma         = 0.99
lambda        = 0.95
clip_epsilon  = 0.2
ppo_epochs    = 8
batch_size    = 64
entropy_coef  = 0.1
value_coef    = 0.5
```

- Screen Size: 600x600 pixels
- Tank Speed: 2 pixels/frame
- Enemy Speed: 1.2 pixels/frame
- Number of Enemies: Configurable (default: 3)
- Checkpoint Frequency: Every 500 episodes
- Plot Update: Every 20 episodes
- Progress Print: Every 50-100 episodes
1. CUDA Out of Memory
   - Reduce the batch size
   - Fall back to CPU training: `device = torch.device("cpu")`

2. Pygame Display Issues
   - Ensure proper display drivers are installed
   - Use headless mode for training: `headless=True`

3. MLflow Logging Issues
   - Check the MLflow installation: `pip install mlflow`
   - Ensure write permissions for the logging directory

4. GPU Acceleration
   - Ensure CUDA is properly installed
   - Check with `torch.cuda.is_available()`

5. Memory Optimization
   - Adjust the replay buffer size based on available RAM
   - Use gradient checkpointing for large models
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Commit your changes: `git commit -m 'Add feature'`
4. Push to the branch: `git push origin feature-name`
5. Open a Pull Request
- Deep Q-Networks (Mnih et al., 2015)
- Proximal Policy Optimization (Schulman et al., 2017)
- Playing Atari with Deep Reinforcement Learning (Mnih et al., 2013)
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Gym for environment implementations
- Pygame for custom environment rendering
- MLflow for experiment tracking
- The broader reinforcement learning community
Happy Training! 🚀
For questions or issues, please open an issue on GitHub.