
Reinforcement Learning Experiments

A collection of reinforcement learning implementations and experiments covering various algorithms and environments. This project demonstrates practical applications of RL techniques, ranging from classical Q-learning to deep reinforcement learning methods.

🚀 Overview

This repository contains implementations of multiple reinforcement learning algorithms applied to different environments:

  • Classical RL: Q-Learning on FrozenLake
  • Deep RL: DQN, Double DQN, and PPO on custom environments
  • Custom Environments: Tank Dodge game with Pygame
  • Standard Environments: CartPole balancing

📁 Project Structure

RL/
├── tank_kills/                   # Custom Tank Dodge environment
│   ├── tanke_dodge.py           # Main environment implementation
│   ├── ppo_tank.py              # PPO algorithm implementation
│   ├── dqn_train_tank.py        # DQN training script
│   ├── visual_test_*.py         # Visualization scripts
│   ├── *.ipynb                  # Jupyter notebooks for experiments
│   ├── saved_weights/           # Trained model checkpoints
│   └── assets/                  # Game assets (images)
├── cart_pole/                   # CartPole experiments
│   ├── *.ipynb                  # DQN implementations and experiments
│   └── *.pth                    # Saved model weights
├── frozen_lake_first/           # FrozenLake Q-learning
│   └── frozen_lake_q_table.ipynb
├── main.py                      # Entry point
├── pyproject.toml               # Project dependencies
└── README.md                    # This file

🛠️ Installation

Prerequisites

  • Python 3.14+
  • CUDA-compatible GPU (optional, for accelerated training)

Setup

  1. Clone the repository

    git clone <repository-url>
    cd RL
  2. Install dependencies

    # Using uv (recommended)
    uv sync
    
    # Or using pip
    pip install -e .
  3. Verify installation

    python main.py

🎮 Environments

Tank Dodge Environment

A custom Pygame-based environment where the agent controls a tank that must dodge enemy tanks.

Features:

  • Configurable number of enemies (default: 3)
  • Homing enemy behavior
  • Reward system: +1 for survival, -10 for collision
  • Headless mode for training
  • Normalized state representation

State Space:

  • Player position (x, y) - normalized
  • Enemy positions (x, y) - normalized for each enemy
  • Relative positions between player and enemies - normalized

Action Space:

  • 0: Move up
  • 1: Move right
  • 2: Move down
  • 3: Move left
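
The normalized state described above can be sketched as follows. This is an illustrative reconstruction, assuming the 600x600 screen from the Configuration section; the function and variable names are placeholders, not taken from the repository's environment code.

```python
# Hypothetical sketch of the Tank Dodge state vector: player position,
# enemy positions, and player-to-enemy offsets, all normalized by the
# 600x600 screen size. Names here are illustrative only.
SCREEN_W, SCREEN_H = 600, 600

def build_state(player, enemies):
    """Return a flat, normalized state list."""
    px, py = player
    state = [px / SCREEN_W, py / SCREEN_H]
    for ex, ey in enemies:
        state += [ex / SCREEN_W, ey / SCREEN_H]            # absolute enemy position
    for ex, ey in enemies:
        state += [(ex - px) / SCREEN_W, (ey - py) / SCREEN_H]  # relative offset
    return state

state = build_state(player=(300, 300), enemies=[(0, 0), (600, 600), (150, 450)])
print(len(state))  # 2 + 2*3 + 2*3 = 14 for the default 3 enemies
```

With the default 3 enemies this yields a 14-dimensional state, every component in [-1, 1].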

CartPole Environment

Classic control environment from OpenAI Gym for balancing a pole on a cart.

FrozenLake Environment

Grid-world environment from OpenAI Gym for navigation tasks with holes.

🧠 Algorithms

Q-Learning

  • Location: frozen_lake_first/frozen_lake_q_table.ipynb
  • Environment: FrozenLake-v1
  • Features: Tabular method with epsilon-greedy exploration
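
The tabular method boils down to a single Bellman update plus epsilon-greedy action selection. A minimal sketch, with illustrative alpha/gamma values not taken from the notebook:

```python
import random

# Minimal tabular Q-learning, as used on FrozenLake-style tasks.
# ALPHA (learning rate) and GAMMA (discount) are illustrative values.
ALPHA, GAMMA = 0.1, 0.99

def q_update(q_table, s, a, reward, s_next, done):
    """One Bellman update: Q(s,a) += alpha * (target - Q(s,a))."""
    target = reward if done else reward + GAMMA * max(q_table[s_next])
    q_table[s][a] += ALPHA * (target - q_table[s][a])

def epsilon_greedy(q_table, s, epsilon):
    """Explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_table[s]))
    return max(range(len(q_table[s])), key=lambda a: q_table[s][a])

# Toy table: 2 states x 2 actions
q = [[0.0, 0.0], [0.0, 0.0]]
q_update(q, s=0, a=1, reward=1.0, s_next=1, done=True)
print(q[0][1])  # 0.1 after a single terminal update
```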

Deep Q-Network (DQN)

  • Location: tank_kills/dqn_train_tank.py, cart_pole/dqn_cartpole.ipynb
  • Features:
    • Experience replay buffer
    • Target network stabilization
    • Epsilon-greedy exploration with decay
    • MLflow integration for experiment tracking
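
The replay buffer and decaying exploration can be sketched in a few lines. The buffer size and epsilon values below match the Hyperparameters section of this README; the linear decay schedule and all function names are assumptions, not taken from the training scripts.

```python
import random
from collections import deque

# Illustrative DQN plumbing: a bounded replay buffer and a linear
# epsilon decay schedule (the repo's exact schedule may differ).
REPLAY_BUFFER_SIZE = 5000
EPSILON_START, EPSILON_END, EPSILON_DECAY = 1.0, 0.005, 0.0001

buffer = deque(maxlen=REPLAY_BUFFER_SIZE)  # oldest transitions fall off automatically

def store(s, a, r, s_next, done):
    buffer.append((s, a, r, s_next, done))

def sample(batch_size):
    """Uniformly sample a minibatch, breaking temporal correlations."""
    return random.sample(buffer, batch_size)

def epsilon_at(step):
    """Linear decay from EPSILON_START toward EPSILON_END."""
    return max(EPSILON_END, EPSILON_START - EPSILON_DECAY * step)

for t in range(6000):
    store(t, 0, 0.0, t + 1, False)
print(len(buffer), epsilon_at(0), epsilon_at(20000))  # 5000 1.0 0.005
```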

Double DQN

  • Location: tank_kills/double_dqn.ipynb
  • Features: Decoupled action selection and evaluation to reduce overestimation bias
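
The decoupling is easiest to see in the target computation. In this sketch, plain lists stand in for the next-state Q-values produced by the two networks; the function name is illustrative:

```python
# Double DQN target: the *online* network selects the action, the
# *target* network evaluates it, reducing overestimation bias.
GAMMA = 0.99

def double_dqn_target(reward, done, q_online_next, q_target_next):
    if done:
        return reward
    best_a = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + GAMMA * q_target_next[best_a]

# The online net prefers action 1, so the target net's value for
# action 1 is used even though the target net's own maximum is action 0.
y = double_dqn_target(1.0, False, q_online_next=[0.2, 0.9], q_target_next=[5.0, 1.0])
print(y)  # 1.0 + 0.99 * 1.0 = 1.99
```

Vanilla DQN would instead take `max(q_target_next)` here (5.0), illustrating the overestimation that Double DQN avoids.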

Proximal Policy Optimization (PPO)

  • Location: tank_kills/ppo_tank.py
  • Features:
    • Actor-critic architecture
    • Clipped surrogate objective
    • Generalized Advantage Estimation (GAE)
    • Mini-batch updates
    • Entropy regularization for exploration
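
The clipped surrogate objective can be sketched for a single sample. The clip value matches `clip_epsilon` in the Hyperparameters section; the function name and log-probability inputs are illustrative:

```python
import math

# PPO clipped surrogate objective for one (state, action) sample:
# L = min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = pi_new / pi_old.
CLIP_EPSILON = 0.2

def clipped_surrogate(logp_new, logp_old, advantage):
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - CLIP_EPSILON, min(1.0 + CLIP_EPSILON, ratio))
    return min(ratio * advantage, clipped * advantage)

# A large policy change (ratio = e ~ 2.72) with positive advantage is
# clipped to 1.2 * A, limiting how far a single update moves the policy.
print(clipped_surrogate(logp_new=1.0, logp_old=0.0, advantage=2.0))  # 2.4
```

In training, the negative of this quantity is averaged over a mini-batch and combined with the value loss and an entropy bonus.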

🏃‍♂️ Usage Examples

Training a Tank Dodge Agent with DQN

cd tank_kills
python dqn_train_tank.py

Training with PPO

cd tank_kills
python ppo_tank.py

Testing a Trained Model

cd tank_kills
    python visual_test_dqn.py  # For DQN models
    python visual_test_ppo.py  # For PPO models

Running Jupyter Notebooks

# Start Jupyter
jupyter notebook

# Navigate to specific notebooks:
# - frozen_lake_first/frozen_lake_q_table.ipynb
# - cart_pole/dqn_cartpole.ipynb
# - tank_kills/*.ipynb

📊 Experiment Tracking

The project integrates with MLflow for comprehensive experiment tracking:

  • Metrics: Rewards, losses, epsilon values
  • Artifacts: Training plots, model checkpoints
  • Parameters: Hyperparameters logged automatically

Viewing Results

# Start MLflow UI
mlflow ui

# Navigate to http://localhost:5000

🎯 Key Features

Advanced Training Techniques

  • Experience Replay: Breaks temporal correlations for more stable training
  • Target Networks: Stabilizes Q-learning updates
  • Gradient Clipping: Prevents exploding gradients
  • Checkpointing: Resume training from saved states
  • Batch Processing: Efficient GPU utilization

Visualization & Monitoring

  • Real-time training curves
  • Moving average plots
  • Cumulative reward tracking
  • Loss visualization
  • TensorBoard integration (legacy)

Performance Optimizations

  • CUDA acceleration support
  • Vectorized operations
  • Efficient memory management
  • Headless training modes

📈 Results Summary

Tank Dodge Environment Performance

  • DQN: Achieves consistent survival scores after ~2000 episodes
  • PPO: More stable learning curve with better final performance
  • Training Time: ~15-30 minutes on a modern GPU for full convergence

CartPole Performance

  • DQN: Reaches maximum score (200) within 500-1000 episodes
  • Convergence: Stable and reproducible results

🧩 Hyperparameters

DQN (Tank Dodge)

learning_rate = 0.0001
discount_factor = 0.99
epsilon_start = 1.0
epsilon_end = 0.005
epsilon_decay = 0.0001
replay_buffer_size = 5000
batch_size = 128

PPO (Tank Dodge)

learning_rate = 0.0005
gamma = 0.99
gae_lambda = 0.95
clip_epsilon = 0.2
ppo_epochs = 8
batch_size = 64
entropy_coef = 0.1
value_coef = 0.5
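
Generalized Advantage Estimation ties the `gamma` and lambda values above together. A simplified sketch for a single finished rollout (episode-boundary handling omitted; the rewards and values are made-up toy numbers, and the function name is a placeholder):

```python
# GAE sketch: advantage_t = delta_t + gamma * lambda * advantage_{t+1},
# where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
GAMMA, LAM = 0.99, 0.95

def gae(rewards, values, last_value):
    """Compute advantages backwards over one rollout segment."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + GAMMA * next_value - values[t]
        running = delta + GAMMA * LAM * running
        advantages[t] = running
        next_value = values[t]
    return advantages

adv = gae(rewards=[1.0, 1.0], values=[0.5, 0.5], last_value=0.0)
print(adv)  # [1.46525, 0.5]
```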

🔧 Configuration

Environment Parameters

  • Screen Size: 600x600 pixels
  • Tank Speed: 2 pixels/frame
  • Enemy Speed: 1.2 pixels/frame
  • Number of Enemies: Configurable (default: 3)

Training Configuration

  • Checkpoint Frequency: Every 500 episodes
  • Plot Update: Every 20 episodes
  • Progress Print: Every 50-100 episodes

🐛 Troubleshooting

Common Issues

  1. CUDA Out of Memory

    • Reduce batch size
    • Use CPU training: device = torch.device("cpu")
  2. Pygame Display Issues

    • Ensure proper display drivers
    • Use headless mode for training: headless=True
  3. MLflow Logging Issues

    • Check MLflow installation: pip install mlflow
    • Ensure write permissions for logging directory

Performance Tips

  1. GPU Acceleration

    • Ensure CUDA is properly installed
    • Check with: torch.cuda.is_available()
  2. Memory Optimization

    • Adjust replay buffer size based on available RAM
    • Use gradient checkpointing for large models

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit changes: git commit -m 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI Gym for environment implementations
  • Pygame for custom environment rendering
  • MLflow for experiment tracking
  • The broader reinforcement learning community

Happy Training! 🚀

For questions or issues, please open an issue on GitHub.
