A library for easily building Monte Carlo Tree Search models.
Install the library directly from this git repository with pip:
pip install git+https://github.com/mattdeak/mcts.git
The example below shows how to configure a Monte Carlo Tree Search model with this library:
from mcts.environments import TicTacToe
from mcts.mcts import MCTS
config = {
    'action': 'most-visited',
    'selection': 'ucb1',
    'selection_kwargs': {'C': 1.14},
    'expansion': 'vanilla',
    'simulation': 'random-to-end',
    'update': 'vanilla',
}
# Initialize Environment
tictactoe = TicTacToe()
# Initialize MCTS
ai = MCTS(tictactoe, calculation_time=5)
# Build the MCTS with the config dictionary
ai.build(config)
Each model is composed of five different policies.
- The action policy determines how the model chooses an action after the Monte Carlo tree search has completed.
- The selection policy determines how the model chooses actions in the selection phase of the tree search.
- The expansion policy determines how the model chooses actions in the expansion phase.
- The simulation policy determines how the simulation is run. SimulateToEnd, for example, will simulate games until termination. Note that the simulation policy also requires a rollout policy, which determines how the model selects actions in the simulation phase.
- The update policy handles updating the nodes at the end of a single Monte Carlo tree search.
Some policies have optional keyword arguments, which can be specified using the _kwargs specification. This is shown above, where selection_kwargs sets C to 1.14 for the ucb1 selection policy.
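To make the role of C concrete, here is a minimal sketch of one common form of the UCB1 rule. It is not taken from this library's source: the function names ucb1_score and select_child and the (value_sum, visits) tuple layout are hypothetical, and the library's own ucb1 policy may differ in details.

```python
import math


def ucb1_score(value_sum, visits, parent_visits, C=1.14):
    """One common form of UCB1: mean value plus an exploration bonus scaled by C."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_value = value_sum / visits
    exploration = C * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + exploration


def select_child(children, parent_visits, C=1.14):
    """Return the index of the child with the highest UCB1 score.

    children is a list of (value_sum, visits) tuples (a hypothetical layout).
    """
    scores = [ucb1_score(v, n, parent_visits, C) for v, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

A larger C favours rarely visited children (exploration), while C = 0 reduces the rule to picking the child with the best mean value (exploitation).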
To act in an environment, simply call the act() method. This will run an MCTS for the number of seconds provided in calculation_time, and then choose the best action according to the action policy. For example:
ai.act()
To view available choices for each policy, simply inspect the output of the following code:
mcts.SUPPORTED_POLICY_TYPES
The MCTS is designed to be flexible so that it can be easily plugged into suitable game environments. The required API is heavily inspired by the OpenAI gym API, but with a few modifications. An environment must provide:
- A state attribute, which describes the current state. It's recommended to use a numpy array.
- An actions attribute. This must return a list of valid actions in the current game state.
- A player attribute, which returns the current player. This can be any unique identifier.
- An n_players attribute (int), which returns the number of players in the game.
- A clone method, which clones the environment.
- A winner attribute, which provides the identity of the winner of the game.
- A terminal attribute (bool), which flags whether or not the game is in a terminal state.
- A step method, which takes an action. This will perform the action in the environment and return:
  - observation - the next state
  - reward - the reward for taking that action
  - done - a boolean which is true if the action led to a terminal state and false otherwise
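The requirements above can be illustrated with a small, dependency-free sketch. The Nim class and the random_playout helper are hypothetical and not part of this library. Whether the library expects plain attributes or properties, and which reward convention it assumes, are assumptions here. A plain list stands in for the recommended numpy array.

```python
import copy
import random


class Nim:
    """Single-pile Nim: players alternate taking 1-3 stones; taking the last stone wins."""

    n_players = 2

    def __init__(self, stones=10):
        self.state = [stones]  # current state (a list instead of a numpy array)
        self.player = 0        # identifier of the player to move
        self.winner = None     # set once the game has ended

    @property
    def actions(self):
        """Valid actions in the current state: how many stones may be taken."""
        return list(range(1, min(3, self.state[0]) + 1))

    @property
    def terminal(self):
        return self.state[0] == 0

    def clone(self):
        return copy.deepcopy(self)

    def step(self, action):
        if action not in self.actions:
            raise ValueError(f"invalid action: {action}")
        self.state[0] -= action
        reward = 0
        if self.terminal:
            self.winner = self.player
            reward = 1  # assumption: reward 1 for the move that wins the game
        else:
            self.player = (self.player + 1) % self.n_players
        return list(self.state), reward, self.terminal


def random_playout(env, rng=random):
    """Play uniformly random moves on a clone until the game ends; return the winner."""
    sim = env.clone()
    while not sim.terminal:
        sim.step(rng.choice(sim.actions))
    return sim.winner
```

The random_playout helper only touches clone, terminal, actions, step and winner, which is roughly what a simulate-to-end policy needs from an environment. It leaves the original environment unchanged because it works on a clone.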
Future versions will eliminate some of these requirements.
For neural-network integrated MCTS, please refer to this Jupyter notebook tutorial.