This project implements and compares two Reinforcement Learning techniques on the classic FrozenLake-v1 environment from Gymnasium:
- Q-Learning (Tabular Method)
  - Uses a Q-Table to store state-action values.
  - Includes training, validation, and visualization.
- Deep Q-Learning (DQN, Neural Network)
  - Uses a neural network to approximate Q-values.
  - Includes experience replay and a target network.
  - Optimized for performance and stability.
The goal is to train an agent to navigate the frozen lake safely, avoiding holes and reaching the goal.
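For context, here is a minimal sketch of how the environment can be created and stepped with Gymnasium. The 8x8 map and slippery setting are assumptions based on the saved model name; the project scripts may configure the environment differently.

```python
# Minimal sketch: creating and stepping the FrozenLake-v1 environment.
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
state, _ = env.reset()

# One random episode to illustrate the interaction loop.
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # 0=left, 1=down, 2=right, 3=up
    state, reward, terminated, truncated, _ = env.step(action)
env.close()
```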
FrozenLakeAlec-main/
├── frozen_lake_enhanced.py # Q-Learning baseline (sourced from johnnycode8/gym_solutions)
├── frozen_lake_q.py # Q-Learning (inspired but significantly modified)
├── frozen_lake_dql.py # Deep Q-Learning (inspired but significantly modified)
│
├── Model/
│ ├── frozen_lake8x8.pkl # Saved Q-Table (Pickle)
│ └── frozen_lake_dql_optimized.pt # Trained DQN (PyTorch)
│
├── docs/
│ ├── GraphiqueQTable/ # Graphs for Q-Learning
│ │ ├── precision_evolution.png
│ │ ├── exploration_vs_exploitation.png
│ │ ├── cumulative_rewards.png
│ │ └── q_table_final.png
│ ├── GraphiqueDQL/ # Graphs for DQN
│ │ └── frozen_lake_optimized.png
│ └── img/ # Images (board, sprites, environment)
│ ├── environment.jpeg
│ ├── elf_up.png / elf_down.png / elf_left.png / elf_right.png
│ ├── hole.png / cracked_hole.png
│ ├── ice.png / stool.png
│ └── goal.png
│
├── requirements.txt
├── setup_and_run.sh
├── .gitignore
├── License
└── README.md
- Q-Table initialized with small values.
- Rewards are shaped to accelerate learning:
- +100 for reaching the goal.
- -100 for falling into a hole.
- -1 penalty per step, +10 for good intermediate states.
- Tracks metrics:
- Accuracy per episode
- Exploration vs exploitation
- Cumulative rewards
- Model saved as Model/frozen_lake8x8.pkl
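The tabular update and the reward shaping described above can be sketched as follows. The learning rate, epsilon schedule, episode count, and the shaping helper are assumptions for illustration; the actual logic lives in frozen_lake_q.py.

```python
# Sketch of tabular Q-Learning with the shaped rewards from the README.
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
n_states, n_actions = env.observation_space.n, env.action_space.n

q_table = np.full((n_states, n_actions), 0.01)   # small initial values
alpha, gamma, epsilon = 0.1, 0.95, 1.0           # assumed Q-Learning settings

def shaped_reward(reward, terminated, new_state):
    """Reward shaping from the README: +100 goal, -100 hole, -1 per step.
    The +10 bonus for 'good intermediate states' would use new_state and is
    left as a placeholder here."""
    if terminated and reward == 1.0:   # reached the goal
        return 100.0
    if terminated:                     # fell into a hole
        return -100.0
    return -1.0

for episode in range(15000):           # episode count is an assumption
    state, _ = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        if np.random.rand() < epsilon:                 # explore
            action = env.action_space.sample()
        else:                                          # exploit
            action = int(np.argmax(q_table[state]))
        new_state, reward, terminated, truncated, _ = env.step(action)
        r = shaped_reward(reward, terminated, new_state)
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q_table[state, action] += alpha * (
            r + gamma * np.max(q_table[new_state]) - q_table[state, action]
        )
        state = new_state
    epsilon = max(epsilon - 1 / 15000, 0.05)           # linear decay (assumed)
env.close()
```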
- Neural network architecture:
- Input = one-hot encoded state
- 2 hidden layers (128 nodes each, ReLU)
- Output = Q-values per action
- Features:
- Experience Replay (ReplayMemory)
- Target Network updated every N steps
- Epsilon-greedy policy with decay
- Hyperparameters:
  - Learning rate: 0.001
  - Discount factor γ: 0.95
  - Replay memory: 10000
  - Mini-batch size: 32
- Model saved as Model/frozen_lake_dql_optimized.pt
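A sketch of the DQN components listed above: one-hot state encoding, two 128-unit hidden layers with ReLU, experience replay, and a target network. Class and function names, and the choice of Adam as optimizer, are assumptions; the project's implementation is in frozen_lake_dql.py.

```python
# Sketch of the DQN building blocks with the README's hyperparameters.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class DQN(nn.Module):
    def __init__(self, n_states, n_actions, hidden=128):
        super().__init__()
        self.fc1 = nn.Linear(n_states, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)              # one Q-value per action

def one_hot(state, n_states):
    """Encode a discrete FrozenLake state as a one-hot input vector."""
    x = torch.zeros(n_states)
    x[state] = 1.0
    return x

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size=32):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

# 64 states (8x8 map), 4 actions; target network is synced every N steps.
policy_net = DQN(n_states=64, n_actions=4)
target_net = DQN(n_states=64, n_actions=4)
target_net.load_state_dict(policy_net.state_dict())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=0.001)
gamma = 0.95
```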
- precision_evolution.png → Accuracy evolution
- exploration_vs_exploitation.png → Exploration vs exploitation
- cumulative_rewards.png → Reward progression
- q_table_final.png → Final Q-Table (as a matrix)
- frozen_lake_optimized.png → Average reward and epsilon decay
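As an illustration, a graph like cumulative_rewards.png could be produced with matplotlib along these lines; the exact plotting code in the project scripts may differ.

```python
# Hypothetical helper for saving a reward-progression plot.
import matplotlib.pyplot as plt

def save_reward_curve(rewards_per_episode,
                      path="docs/GraphiqueQTable/cumulative_rewards.png"):
    plt.figure()
    plt.plot(rewards_per_episode)
    plt.xlabel("Episode")
    plt.ylabel("Cumulative reward")
    plt.title("Reward progression")
    plt.savefig(path)
    plt.close()
```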
bash setup_and_run.sh

This script will:
- Create a virtual environment (.venv)
- Install all dependencies (requirements.txt)
- Install PyTorch properly (Mac/Linux CPU support included)
- Run a validation script
- Q-Learning (Enhanced baseline): python frozen_lake_enhanced.py
- Q-Learning (Modified): python frozen_lake_q.py
- Deep Q-Learning: python frozen_lake_dql.py
Validation is built into both frozen_lake_q.py and frozen_lake_dql.py (training + testing phases).
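A greedy validation pass over the trained Q-Table could look like the sketch below. The episode count, the success criterion, and the assumption that the pickle holds the raw Q-Table array are illustrative; both project scripts embed their own testing phase.

```python
# Sketch: evaluate the saved Q-Table with a purely greedy policy.
import pickle
import numpy as np
import gymnasium as gym

with open("Model/frozen_lake8x8.pkl", "rb") as f:
    q_table = pickle.load(f)

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
episodes, successes = 100, 0
for _ in range(episodes):
    state, _ = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = int(np.argmax(q_table[state]))   # exploit only, no exploration
        state, reward, terminated, truncated, _ = env.step(action)
    successes += int(reward == 1.0)               # reached the goal
env.close()
print(f"Success rate: {successes / episodes:.0%}")
```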
- Direct comparison of tabular Q-Learning vs neural DQN.
- Clear visualization of the exploration-exploitation trade-off.
- Importance of reward shaping for guiding the agent.
- Demonstrates transition from classical RL to Deep RL.
Alec Waumans
Industrial Computer Science Student
This project is licensed under the MIT License.
- frozen_lake_enhanced.py was sourced from johnnycode8/gym_solutions.
- frozen_lake_q.py and frozen_lake_dql.py were inspired by the same project but heavily modified to extend functionality, add reward shaping, improve validation, and generate detailed graphs.
- All additional project structure, documentation, and improvements by Alec Waumans.
