In the last two weeks, I divided my time mostly between working on the Gaussian Process Regression model and creating a Monte Carlo Tree Search algorithm.
I realized early on that both my accuracy function and the way I plotted results could be improved. Previously, I defined my accuracy as `acc = 1 - ((testY - y_pred) / testY).mean()`, where testY is an array of correct labels and y_pred is an array of predictions. I switched to MAE and RMSE, which gave me more meaningful error measurements. Furthermore, I learned that plotting labels against predictions is a better way to showcase results, as seen below: the closer the data points are to the diagonal, the more accurate the model.
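As a rough sketch, swapping the old formula for MAE and RMSE looks like this (the array values here are made up purely for illustration):

```python
import numpy as np

# Hypothetical stand-ins for the post's testY and y_pred arrays.
testY = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.2, 7.5])

# Mean absolute error and root mean squared error.
mae = np.abs(testY - y_pred).mean()
rmse = np.sqrt(((testY - y_pred) ** 2).mean())
print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")

# Parity plot: labels vs. predictions, with the y = x diagonal as reference
# (requires matplotlib, so it is left commented out here).
# import matplotlib.pyplot as plt
# plt.scatter(testY, y_pred)
# lims = [min(testY.min(), y_pred.min()), max(testY.max(), y_pred.max())]
# plt.plot(lims, lims, "k--")
# plt.xlabel("true label"); plt.ylabel("prediction")
# plt.show()
```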
I also started on the second part of the project: designing a Monte Carlo Tree Search (MCTS) algorithm. To familiarize myself with the algorithm, I started by creating a tic-tac-toe agent that uses MCTS to determine the next move to make.
Monte Carlo Tree Search is an algorithm that simulates future moves of a game in order to take the best action. There are four steps in the Monte Carlo Tree Search algorithm: *selection*, *expansion*, *simulation*, and *backpropagation*.
At the very base of the Monte Carlo Tree Search algorithm is a search tree, where every node represents a specific move in the game. For tic tac toe, each node might store the player that made the move, the location of the move, the state of the gameboard, as well as the number of simulations and simulated wins stemming from this node (see *simulation*).
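A node like that can be sketched as a small data class. The field names below are illustrative, not the actual class from my project:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    player: int                  # player who made the move leading here
    move: tuple | None           # (row, col) of that move; None for the root
    board: list                  # gameboard state at this node
    wins: float = 0.0            # accumulated win record from simulations
    visits: int = 0              # number of simulations through this node
    parent: Node | None = None
    children: list = field(default_factory=list)

# The root node: an empty 3x3 board before any moves are played.
root = Node(player=0, move=None, board=[[" "] * 3 for _ in range(3)])
```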
Let us consider a half-played tic tac toe game like the one in the diagram above, and pretend that we are the agent using MCTS to determine what move to make. At the very beginning, we only have node 1: the gameboard before playing any moves.
In selection, the node on the tree with the highest probability of winning is chosen. There’s only one node, so we choose node 1.
In expansion, we create nodes for all possible moves we can make. Thus, we create nodes 2, 3, and 4.
In simulation, we run a random simulation of a game, recording whether or not we win. Starting from node 1, let’s simulate playing nodes 3, 8, and 13, ending in a tie.
In backpropagation, we record whether or not there was a win. Since it was a tie, we record 0.
Then, we start the iteration all over again. Now we have two levels in our tree: nodes 1 to 4.
In selection, we again choose the node with the highest probability of winning. Since nodes 2, 3, and 4 all have a win record of 0, we choose one randomly. Let’s pick node 3.
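This selection rule can be sketched as follows (win rate with random tie-breaking; note that full MCTS implementations usually use UCT, which adds an exploration term on top of the win rate):

```python
import random

def select(children):
    """Pick the child with the best win rate; break ties randomly."""
    def win_rate(node):
        # A node with no simulations yet has no record, so treat it as 0.
        return node["wins"] / node["visits"] if node["visits"] else 0.0
    best = max(win_rate(c) for c in children)
    return random.choice([c for c in children if win_rate(c) == best])

# Nodes 2-4 all have a win record of 0, so one is chosen at random.
nodes = [{"id": i, "wins": 0, "visits": 1} for i in (2, 3, 4)]
print(select(nodes)["id"])  # prints 2, 3, or 4
```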
In expansion, we create nodes for all possible moves we can make. Note that these are the possible moves that the opponent can make. This corresponds to nodes 7 and 8.
In simulation, we run a random simulation of a game. Starting from node 3, let’s simulate playing nodes 7 and 12, ending in a win for green.
In backpropagation, we record the win. Since we are the green player, a win is recorded as +1. Thus, node 3 as well as all of its parent nodes (node 1) get 1 added to their win records.
This process is repeated over and over again. The more repetitions, the more nodes we create and the more simulations we run, so the more reliable our win records become. In the diagram above, all nodes have been drawn out, and one simulation has been run on every possible outcome of the game. Ultimately, node 2 has a win record of 0 (it loses once and wins once), node 3 has a win record of 1, and node 4 has a win record of 0. Thus, the move we take is node 3: playing an X in the bottom right corner.
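Putting the four steps together, here is a simplified, self-contained sketch of an MCTS move picker for tic tac toe. This is not my project's actual code: like the walkthrough above, it scores every node from our own point of view (a full implementation would alternate perspectives and use UCT for selection):

```python
import random

class Node:
    # Minimal node: board state, whose turn it is, and statistics.
    def __init__(self, board, player, parent=None):
        self.board, self.player, self.parent = board, player, parent
        self.children, self.wins, self.visits = [], 0, 0

def winner(b):
    # Rows, columns, and diagonals of a flat 9-cell board.
    lines = [b[0:3], b[3:6], b[6:9], b[0::3], b[1::3], b[2::3], b[0::4], b[2:7:2]]
    for line in lines:
        if line[0] != " " and line.count(line[0]) == 3:
            return line[0]
    return None

def moves(b):
    return [i for i, c in enumerate(b) if c == " "]

def mcts(board, me, n_iter=2000):
    root = Node(board, me)
    for _ in range(n_iter):
        # 1. Selection: descend to a leaf, favouring the best win rate
        #    (unvisited children get priority via the infinite score).
        node = root
        while node.children:
            node = max(node.children,
                       key=lambda c: c.wins / c.visits if c.visits else float("inf"))
        # 2. Expansion: one child per legal move, unless the game is over.
        if winner(node.board) is None and moves(node.board):
            for m in moves(node.board):
                b = node.board[:]
                b[m] = node.player
                nxt = "O" if node.player == "X" else "X"
                node.children.append(Node(b, nxt, node))
            node = random.choice(node.children)
        # 3. Simulation: play random moves until the game ends.
        b, p = node.board[:], node.player
        while winner(b) is None and moves(b):
            b[random.choice(moves(b))] = p
            p = "O" if p == "X" else "X"
        w = winner(b)
        result = 1 if w == me else (0 if w is None else -1)
        # 4. Backpropagation: credit the result to the node and its ancestors.
        while node is not None:
            node.visits += 1
            node.wins += result
            node = node.parent
    # Play the most-visited child of the root.
    best = max(root.children, key=lambda c: c.visits)
    return next(i for i in range(9) if root.board[i] != best.board[i])

# X has two in a row; with enough iterations MCTS finds the winning cell.
print(mcts(list("XX OO    "), "X"))  # plays index 2, completing the top row
```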
This tic tac toe agent is what I created! I made a game environment class as well as an MCTS algorithm class, then ran MCTS for both players in the tic tac toe game. After running multiple simulated games, I realized most of them end in a draw – which makes sense, since there’s no way to force a win in tic tac toe when both players play optimally.
I also played around with larger tic tac toe boards (4x4, for example), but realized that a larger board corresponds to a much larger game tree, meaning far more computation is needed to make an accurate decision about which move to make.
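A crude upper bound shows how quickly the search space grows: each cell is X, O, or empty, so a k×k board has at most 3^(k·k) states (most are unreachable in real play, but the trend holds):

```python
# Crude upper bound on board states: each of the k*k cells is X, O, or empty.
for k in (3, 4, 5):
    print(f"{k}x{k}: up to {3 ** (k * k):,} states")
```

Going from 3x3 to 4x4 already multiplies this bound by more than a thousand, which matches the slowdown I observed.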