This project trains a Deep Q-Network (DQN) to play a Unity-based Flappy Bird clone. Unity runs the game simulation, Python runs the reinforcement learning loop, and the two processes communicate over a local TCP socket on 127.0.0.1:9999.
The main idea is:
- Unity exposes the game as an environment.
- Python chooses an action each step:
0= do nothing,1= flap. - Unity returns the next state, a reward, and whether the episode ended.
- Python stores transitions in replay memory and trains a DQN with PyTorch.
The training loop lives in dqn/train.py.
For each episode, the trainer:
- waits for the Unity scene to connect,
- sends an action to Unity,
- receives a state vector, reward, and done flag,
- stores the transition in replay memory,
- performs a DQN training step,
- logs progress and saves checkpoints.
Unity sends a 4-value normalized state from flappy_bird/Assets/Scripts/FlappyEnvironment.cs:
- bird Y position
- bird vertical velocity
- horizontal distance to the next pipe
- Y position of the next pipe gap
0: no flap1: flap
The current reward function is:
+10.0for passing a pipe-5.0for dying+0.1for surviving a step-0.1for flapping-0.1when the bird is far from the next gap+0.2when the bird stays close to the next gap
The current DQN implementation in dqn/dqn_agent.py uses:
- a 2-layer MLP with hidden size
128 - replay buffer capacity
50000 - batch size
64 - discount factor
0.99 - Adam optimizer with learning rate
3e-4 - target network sync every
100training steps - epsilon-greedy exploration with decay
Checkpoints are written by dqn/train.py to:
dqn/checkpoints/dqn_latest.pthdqn/checkpoints/dqn_best.pthdqn/checkpoints/dqn_ep500.pth,dqn_ep1000.pth, etc.
Training metrics are also written to dqn/training_log.csv.
flappy_bird_rl_proj/
|-- dqn/
| |-- train.py
| |-- environment.py
| |-- dqn_agent.py
| `-- replay_buffer.py
|-- flappy_bird/
| |-- Assets/
| |-- Packages/
| `-- ProjectSettings/
`-- README.md
- Unity editor version
6000.3.2f1
This is the version recorded in flappy_bird/ProjectSettings/ProjectVersion.txt.
This repo does not currently include a pinned requirements.txt, so set up a Python environment with at least:
numpytorch
Example:
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install numpy torchOpen the flappy_bird folder in Unity Hub using Unity 6000.3.2f1.
After the project loads, open:
Assets/Scenes/GameScene.unity
From the repo root:
cd dqn
python train.pyAt this point the trainer starts a TCP server and waits for Unity to connect. You should see:
Waiting for Unity to connect...
With GameScene open, press Play in the Unity editor.
When Play mode starts:
- Unity connects to the Python process on
127.0.0.1:9999 - the menu UI is hidden
- episodes start automatically
- the episode counter updates inside the scene
Once Unity connects, Python begins training immediately.
While training runs:
- Unity shows the bird playing repeated episodes
- the Python console prints per-episode rewards
training_log.csvis overwritten and updated live- model checkpoints are saved into
dqn/checkpoints/
If dqn/checkpoints/dqn_best.pth already exists, train.py loads it before training. If it does not exist, training starts from scratch.
Checkpoint loading is currently controlled directly in dqn/train.py:
CHECKPOINTsets which model file to loadagent.epsilonsets the starting exploration ratebest_avg_rewardsets the baseline used to decide when to overwritedqn_best.pth
If you want to resume from a different checkpoint, edit those values before running the script.
- The Python trainer should usually be started before pressing Play in Unity, because Unity retries the socket connection until the Python server is available.
- The bird can still be flapped manually with mouse input because that behavior is present in
flappy_bird/Assets/Scripts/BirdController.cs, but the intended workflow here is automated RL training. - This repo is focused on training inside the Unity editor. There is not currently a separate evaluation or inference script.