This repository contains a 2D driving simulation built with Pygame and Gymnasium, in which an autonomous agent uses radar sensor readings and reinforcement learning (RL) to learn to race around a curved racetrack. Two RL approaches are explored:
- Deep Q-Network (DQN) — neural network agent using PyTorch
- Q-Table Learning — tabular RL baseline
The simulation environment is implemented as a custom Gym environment (gym_race), and gameplay is rendered using Pygame. The environment supports real-time rendering for visualization and a non-render mode for faster training.

    ├── gym_race/                          # Gym environment
    │   └── envs/
    │       ├── pyrace_2d.py
    │       ├── race_env.py
    │       └── utils.py
    ├── models_DQN_v01/                    # Saved DQN models
    │   ├── best_dqn_model.pth
    │   └── dqn_model_0.pth
    ├── models_QT_v02/                     # Saved Q-table memory and tables
    │   ├── memory_3500.npy
    │   └── q_table_3500.npy
    ├── Pyrace_RL_QTable.py                # Main RL training/testing script
    ├── Pyrace_performance_analysis.ipynb  # Training analysis notebook
    └── *.png                              # Racing environment visual assets

The Pyrace-v1 environment simulates a top-down 2D vehicle navigating a track using:
- Ray-based sensor inputs
- Discrete actions (accelerate, turn left/right, ...)
- Reward shaping for progress and collision penalties
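
To illustrate what a ray-based sensor does, here is a minimal sketch that marches along a ray over a boolean occupancy grid until it leaves the drivable area. The grid representation, the function names `radar_distance` and `radar_readings`, the sensor angles, and `max_range` are all assumptions made for illustration; they are not taken from `pyrace_2d.py`.

```python
import math


def radar_distance(grid, x, y, angle_deg, max_range=200, step=1.0):
    """Distance along a ray from (x, y) until it hits a wall or max_range.

    grid[row][col] is True for drivable track and False for wall
    (a hypothetical layout, not the repository's track image).
    """
    angle = math.radians(angle_deg)
    dist = 0.0
    while dist < max_range:
        px = x + dist * math.cos(angle)
        py = y - dist * math.sin(angle)  # screen coordinates: y grows downward
        col, row = math.floor(px), math.floor(py)
        in_bounds = 0 <= row < len(grid) and 0 <= col < len(grid[0])
        if not in_bounds or not grid[row][col]:
            return dist
        dist += step
    return float(max_range)


def radar_readings(grid, x, y, heading_deg,
                   offsets=(-90, -45, 0, 45, 90), max_range=200):
    """Five readings fanned across the front of the car (assumed angles)."""
    return [radar_distance(grid, x, y, heading_deg + off, max_range)
            for off in offsets]
```

A smaller `step` gives finer distance resolution at the cost of more grid lookups per sensor.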
| Algorithm | File | Description |
|---|---|---|
| Deep Q-Network (DQN) | Pyrace_RL_QTable.py | Neural network approximates Q-values using PyTorch |
| Q-Table RL | Pyrace_RL_QTable.py (legacy section and saved tables) | Baseline Q-learning for comparison |
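
For readers new to the tabular baseline, the standard Q-learning update can be sketched as below. The dictionary-backed table, the function name `q_learning_update`, and the `alpha` and `gamma` defaults are illustrative assumptions, not the values or structures used in `Pyrace_RL_QTable.py`.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning step and return the updated Q(state, action).

    q_table maps a hashable state (e.g. a tuple of radar readings)
    to a list with one Q-value per action.
    """
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Example table with 3 actions per state (the action count is an assumption).
n_actions = 3
q_table = defaultdict(lambda: [0.0] * n_actions)
```

A DQN is trained toward the same temporal-difference target, but a neural network is fitted to it instead of writing the value into a table entry.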
The state consists of 5 radar sensor distances, normalized to the range [0, 10]. The sensors are angled across the front of the car, and higher values mean a greater distance to the nearest obstacle.

    [dist_1, dist_2, dist_3, dist_4, dist_5]

| Action | Effect |
|---|---|
| 0 | Accelerate |
| 1 | Turn left |
| 2 | Turn right |
| 3 | Brake (available in core env) |
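
The state and action encodings above can be sketched as follows. The integer bucketing, the `max_range` value, and the helper names `normalize_radar` and `epsilon_greedy` are assumptions for illustration; the environment's actual normalization may differ.

```python
import random
from enum import IntEnum


class Action(IntEnum):
    ACCELERATE = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    BRAKE = 3  # listed as available in the core environment


def normalize_radar(raw_distances, max_range=200.0, scale=10):
    """Map raw sensor distances to integer readings in [0, scale]."""
    state = []
    for d in raw_distances:
        clipped = min(max(d, 0.0), max_range)
        state.append(int(clipped / max_range * scale))
    return tuple(state)  # hashable, so usable as a Q-table key


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Returning the state as a tuple keeps it usable both as a dictionary key for a tabular agent and as a plain feature vector for a network.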
The agent is encouraged to move forward, pass the checkpoint (a full lap around the track), and avoid walls.
| Event | Reward |
|---|---|
| Checkpoint progress | + distance-based reward |
| Crash | -10000 + distance traveled |
| Lap complete | +10000 bonus |
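
As a rough sketch, the reward table could translate into a function like the one below. How the three events combine within a single step, the function name `compute_reward`, and its parameters are assumptions; only the constants come from the table.

```python
CRASH_PENALTY = -10_000
LAP_BONUS = 10_000


def compute_reward(crashed, lap_complete, distance_traveled,
                   progress_reward=0.0):
    """Return the step reward for the events described in the table.

    progress_reward stands in for the distance-based checkpoint reward,
    whose exact formula is not given here.
    """
    if crashed:
        # A crash is penalized, offset by how far the car got.
        return CRASH_PENALTY + distance_traveled
    reward = progress_reward
    if lap_complete:
        reward += LAP_BONUS
    return reward
```

Offsetting the crash penalty by distance traveled means that a crash late in the lap is still scored better than an early one.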
    pip install -r requirements.txt

Within Pyrace_RL_QTable.py, change these lines:

    #simulate()
    load_and_play("best", learning=True)

to:

    simulate()
    # load_and_play("best", learning=True)

Run the load_and_play function with learning turned off to run the previously trained best agent:

    #simulate()
    load_and_play("best", learning=False)

Performance of the agents was evaluated using two approaches:
- DQN Learning Curves
  - Tracked episodic rewards and the number of steps per episode during training.
  - Visualized averaged rewards and reward-per-step trends to assess improvement over time.
  - These curves show how efficiently the agent learns to navigate the track, avoid collisions, and complete laps.
- Q-Table Policy Interpretation
  - For the tabular Q-learning agent, aggregated Q-values across radar sensor states to identify preferred actions.
  - Normalized and scaled the aggregated values to produce a visual “policy fingerprint” showing which actions the agent favors based on obstacle direction and distance.
Together, the two analyses gave insight into both overall learning progress and action-selection behavior, which helped compare the DQN and Q-table approaches and guide further improvements.
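
The two analyses can be approximated with a few lines of plain Python. This is only a sketch: the notebook's actual aggregation is not described in detail, so grouping Q-values by a single sensor's reading, min-max scaling each row, and the names `moving_average` and `policy_fingerprint` are all assumptions.

```python
def moving_average(values, window):
    """Trailing mean of episodic rewards; early entries use a shorter window."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def policy_fingerprint(q_table, sensor_index, n_actions):
    """Aggregate Q-values by one sensor's reading and scale each row to [0, 1].

    q_table maps state tuples to per-action Q-value lists.
    """
    totals = {}
    for state, q_values in q_table.items():
        row = totals.setdefault(state[sensor_index], [0.0] * n_actions)
        for a in range(n_actions):
            row[a] += q_values[a]

    fingerprint = {}
    for reading, row in totals.items():
        lo, hi = min(row), max(row)
        span = hi - lo
        # A flat row carries no preference, so it maps to all zeros.
        fingerprint[reading] = [(v - lo) / span if span else 0.0 for v in row]
    return fingerprint
```

Each row of the resulting fingerprint can be drawn as one row of a heatmap, with the highest-scoring action for that sensor reading standing out.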