Based on FlapPyBird and flappybird-qlearning-bot.
To run the game, you need to install the following packages:
- python2.7
- python-tk
In Python, you need to install the following libraries:
- numpy
- pygame
- sklearn
- matplotlib
- tensorflow (see the installation guide)
To install the libraries you can use, for example, the python-pip package and run the following command:
pip install numpy pygame sklearn matplotlib tensorflow
The codes have been tested on Ubuntu 16.04 and 18.04.
To run the game with the graphical interface (unless SHOW_SCREEN=False is set in constants.py), run:
python -B flappy.py
To run the game without the graphical interface, run:
python -B onlylearning.py
The following subsections describe the contents of the files and subdirectories.
The file bot.py imports all the bots available in the sources (the bots directory). The variable BOT holds the imported class you want to use, as sketched below.
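A minimal sketch (the module and class names are illustrative, the real ones are those of the bots in the bots directory):

from bots.bot_q_learning import QLearningBot  # illustrative module/class names

# BOT holds the class of the bot the game will use; swap the assignment
# to switch bots.
BOT = QLearningBot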
The file constants.py contains constants to run the game Flappy Bird. Meaning of the constants is as follows:
- SPEED represents the frame rate of the game. 30 is the default value, and the higher the value, the faster the game. With 0, the game runs as fast as possible.
- PLAY_SOUNDS determines whether the game sounds are played.
- SHOW_SCREEN determines whether the game screen is shown.
- FREEZ_AT_THE_END determines whether the game waits for the player to press a key after the death of the bird before starting a new game.
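A minimal sketch of such a configuration (apart from the default SPEED = 30, the values are illustrative):

SPEED = 30               # frame rate; 0 means as fast as possible
PLAY_SOUNDS = True       # play the game sounds
SHOW_SCREEN = True       # show the game screen
FREEZ_AT_THE_END = True  # wait for a key press after the bird dies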
The file flappy.py contains the sources of the game with the graphical interface. You can change the variable USE_OWN_PIPES to True if you want to use your own pipes, and define the folder where the pipes are saved in the PIPES_FOLDER variable. In the variable NUMBER_OF_GAMES you can specify how many games should be played; after that number is reached, the game terminates. You can also set the variable to the special value INFINITE_GAMES, which means that the game is only terminated by the user (by closing the window or pressing CTRL+C).
The file onlylearning.py contains the sources of the game without the graphical interface. This file is usually used when a bot is being trained (to train faster). The meanings of the variables USE_OWN_PIPES, PIPES_FOLDER and NUMBER_OF_GAMES are the same as in the previous subsection.
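For example, to replay pre-generated pipes for exactly three games, the variables might be set as follows (the folder name is illustrative):

USE_OWN_PIPES = True       # replay saved pipes instead of random ones
PIPES_FOLDER = "my_pipes"  # illustrative folder containing the saved pipes
NUMBER_OF_GAMES = 3        # terminate after 3 games; INFINITE_GAMES never terminates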
The file plotfigure.py contains three functions which are prepared to be run by the runp program (install it with, e.g., pip install runp).
- plot_figure visualizes the input score data to an output image with the trend accuracy set in the variable TREND_ACCURACY.
- plot_scores_with_average visualizes the input scores to an output image with the highlighted average score.
- plot_scores_with_average visualizes individual games with the highlighted average score of the input score files from the variable FILES.
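The essence of such a plot can be reproduced in a few lines; a minimal sketch (the score-file format of one score per line is an assumption):

import matplotlib.pyplot as plt

with open("scores.txt") as f:  # assumed format: one score per line
    scores = [int(line) for line in f]
average = sum(scores) / len(scores)

plt.plot(scores, label="score per game")
plt.axhline(average, color="red", label="average score")
plt.legend()
plt.savefig("scores.png")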
The file run.sh is a bash script which runs multiple instances of the onlylearning.py script with tags defined in the file in the part seq x y, where x is the beginning tag and y is the ending tag (for example, for seq 1 3 the script runs three instances: python -B onlylearning.py 1 &, python -B onlylearning.py 2 & and python -B onlylearning.py 3 &).
The directory assets contains files for the game itself (audio, sprites, other static data, etc.).
The directory bots contains an abstract class Bot in the file bot.py. The class declares three methods: act, dead and stop. All of these methods must be implemented to define a new bot; a minimal sketch follows.
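The sketch below subclasses Bot (the method signatures and the contents of state are assumptions, check bots/bot.py for the exact interface):

from bots.bot import Bot

class MyBot(Bot):
    def act(self, state):
        # Called every frame; return True to flap, False to do nothing.
        return True

    def dead(self, state):
        # Called when the bird dies, e.g. to penalize the last actions.
        pass

    def stop(self):
        # Called when the game terminates, e.g. to persist learned values.
        pass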
The already implemented bots:
- A greedy bot in the file bot_greedy.py.
- Another greedy bot in the file bot_greedy_2.py.
- A bot implementing the Q-learning algorithm with the following parameters (the core update rule is sketched after this list):
  - LEARNING defines whether the bot should be trained or not,
  - VERTICAL_ACCURACY_OF_STATES represents the rounding number for the vertical positions,
  - HORIZONTAL_ACCURACY_OF_STATES represents the rounding number for the horizontal positions,
  - ALIVE_REWARD and DEAD_PENALIZATION represent the positive and negative rewards in the game,
  - Q_FILE_BASE represents the base for the name of the file where the Q-values are stored,
  - DATA_FILE_BASE represents the base for the name of the file where the scores of the played games are stored,
  - DISCOUNT represents the discount factor of the Q-learning algorithm,
  - LEARNING_RATE represents the learning rate of the Q-learning algorithm,
  - EPSILON_POLICY defines whether the ϵ-greedy policy is used or not,
  - DUMPING_RATE represents the number of iterations after which the Q-values in the file are updated,
  - LAST_STATES_PENALIZED represents the number of the last states to be penalized,
  - STEPS represents the number of future steps taken into account when picking a new action.
- A bot implementing the Deep Q-learning algorithm with parameters:
  - LEARNING defines whether the bot should be trained or not,
  - REWARD and PENALIZATION represent the positive and negative rewards in the game,
  - DATA_FILE represents the name of the file where the scores of the played games are stored,
  - EPSILON_POLICY defines whether the ϵ-greedy policy is used or not,
  - DUMPING_RATE represents the number of iterations after which the Q-network in the file is updated,
  - LAST_STATES_IGNORED represents the number of the last states which are ignored when training the Q-network,
  - STEPS represents the number of future steps taken into account when picking a new action,
  - REPLAY_MEMORY_CAPACITY represents the capacity of the replay memory,
  - MINIBATCH_SIZE represents the size of the mini-batch used for training the Q-network,
  - REPLAY_STEPS represents the number of repetitions for training the Q-network with the initial data,
  - TRAIN_STEP represents the number of repetitions for training the Q-network with all the other data collected during training,
  - PROBABILITY_TO_FLAP represents the probability of flapping when taking a random action,
  - GREEDY_PROBABILITY represents the probability of taking a greedy action during initialization,
  - NETWORKS_COUNT represents the number of trained networks when using more Q-networks.
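For reference, the tabular bot's core is the standard Q-learning update, which uses LEARNING_RATE and DISCOUNT from the list above; a minimal sketch (the Q-table representation is an assumption, and the deep variant replaces the table with a network trained on mini-batches drawn from the replay memory):

LEARNING_RATE = 0.7  # illustrative values; the real ones are set in the bot
DISCOUNT = 0.95

def q_update(q, state, action, reward, next_state):
    # One step of Q-learning:
    # Q(s, a) += lr * (reward + discount * max_a' Q(s', a') - Q(s, a))
    best_next = max(q[next_state])  # q[s] holds one value per action
    q[state][action] += LEARNING_RATE * (
        reward + DISCOUNT * best_next - q[state][action])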
The directory models contains two folders:
- DeepQLearning contains a pre-trained model for the Deep Q-learning algorithm,
- QLearning contains a pre-trained model for the Q-learning algorithm.
The directory scripts contains four files:
The file heatmap_q_values.py contains a script to create a heatmap of the decisions of a bot. To run the script, first load the bot you want to use in the bot variable. Then set the velocity you want to visualize in the variable VEL, the name of the output file in the variable OUT_IMAGE, and the other parameters respectively. Run the script by:
python heatmap_q_values.py
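Conceptually, the script queries the bot's decision over a grid of bird positions at the fixed velocity and renders the result; a rough sketch (the state encoding and the decision call are assumptions, in the real script the loaded bot is queried instead of the placeholder):

import numpy as np
import matplotlib.pyplot as plt

VEL = -5                   # fixed vertical velocity to visualize
OUT_IMAGE = "heatmap.png"

def bot_decision(x, y, vel):
    # Placeholder for the loaded bot's decision in state (x, y, vel).
    return y > x // 10

xs = np.arange(0, 150)     # horizontal distances to the next pipe
ys = np.arange(-100, 100)  # vertical distances to the gap
decisions = np.array([[bot_decision(x, y, VEL) for x in xs] for y in ys])

plt.imshow(decisions, origin="lower", extent=[xs[0], xs[-1], ys[0], ys[-1]])
plt.xlabel("horizontal distance to the next pipe")
plt.ylabel("vertical distance to the gap")
plt.savefig(OUT_IMAGE)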
To crop the images use one of the following commands:
convert name.png -crop 854x1954+1106+285 name.png
or
mogrify -crop 854x1954+1106+285 *.png
where the geometry WidthxHeight+X+Y corresponds to Width=854, Height=1954, X=1106 and Y=285.
An example output from the file scripts/ql10.png:
The file init_train_functions.py contains functions to generate and compare initial Q-networks by using Q-values generated by the Q-learning algorithm. The meanings of the significant constants are as follows:
- NETWORKS_COUNT represents the number of Q-networks to generate.
- OPTIMIZERS represents the optimizers used for the Q-networks (the i-th network uses the (i mod len(OPTIMIZERS))-th optimizer).
The meaning of the public functions is as follows:
- get_q_values returns the Q-values stored in the file with the name file_name,
- positive_and_negative_values prints the number of positive and negative values of the given Q-values,
- max_min_value prints the maximum and minimum Q-values,
- get_data_for_training returns the data in a format suitable for learning,
- save_decisions enriches the Q-values with the actions which are taken by Q-learning for each state,
- train_networks trains the input networks with data_in and data_out train_count times,
- compare_computation compares the decisions of the Q-values and of the Q-networks.
An example of using the defined functions could be as follows:
from init_train_functions import *
q = get_q_values("q")
inn, outt = get_data_for_training(q)
count = len(inn)
networks = [FNN(OPTIMIZERS[i % len(OPTIMIZERS)], i) for i in range(NETWORKS_COUNT)]
networks = train_networks(networks, inn[:count], outt[:count], 100)
compare_computation(networks, q, 100)
The file pipes_generator.py generates pipes for the Flappy Bird game (for example, to use the same situations when comparing the performance of algorithms). The parameter PIPES_IN_ONE_GAME defines how many pipes are in one generated game, NUMBER_OF_GAMES defines the number of games to generate, and FOLDER represents the folder where the generated games are stored.
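A minimal sketch of such a generator (the one-value-per-line file format and the gap range are assumptions):

import os
import random

PIPES_IN_ONE_GAME = 100
NUMBER_OF_GAMES = 10
FOLDER = "pipes"

if not os.path.exists(FOLDER):
    os.makedirs(FOLDER)
for game in range(NUMBER_OF_GAMES):
    # Assumed format: one gap position per line, one file per game.
    with open(os.path.join(FOLDER, "game_%d.txt" % game), "w") as f:
        for _ in range(PIPES_IN_ONE_GAME):
            f.write("%d\n" % random.randint(100, 300))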
The file plot_multiple_data.py plots multiple scores stored in files for individual instances in folders defined in the files variable. TESTS_COUNT defines how many scores in each folder should be taken into account. FOLDER_BASE defines the root folder where the subfolders with the data are stored. The indices variable defines the strings for the legend: the i-th string is used for the i-th file in files. The output image is stored in the FOLDER_BASE + TYPE + ".png" file, and the highest score and the highest average score, together with the highest score and the highest average score of the worst instance, are stored in the file FOLDER_BASE + TYPE + ".txt".
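Conceptually, the script reduces to the following (the folder layout and the score-file format are assumptions):

import matplotlib.pyplot as plt

FOLDER_BASE = "data/"
TYPE = "lr"
files = ["run_a", "run_b"]                            # illustrative subfolders
indices = ["learning rate 0.1", "learning rate 0.5"]  # legend labels
TESTS_COUNT = 3

for subfolder, label in zip(files, indices):
    scores = []
    for i in range(1, TESTS_COUNT + 1):
        # Assumed layout: one score file per instance in each subfolder.
        with open(FOLDER_BASE + subfolder + "/scores_%d.txt" % i) as f:
            scores.extend(int(line) for line in f)
    plt.plot(scores, label=label)

plt.legend()
plt.savefig(FOLDER_BASE + TYPE + ".png")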
An example output from the file scripts/lr.png:

