<aside> <img src="/icons/question-mark_red.svg" alt="/icons/question-mark_red.svg" width="40px" /> How to play Quixo?
Quixo is a two-player abstract strategy game played with 25 cubes arranged in a 5x5 grid. Each cube has one face marked with an X, one with an O, and four blank faces; all cubes start blank side up. On your turn you take a cube from the outer ring of the board that is blank or already shows your symbol, turn it so your symbol faces up, and push it back in from a different side of its row or column, sliding the other cubes along. The first player to complete a line of five of their symbols, horizontally, vertically, or diagonally, wins; if a move completes a line of the opponent's symbol, the opponent wins, even if it also completes a line for the mover.
</aside>
<aside> <img src="/icons/notification_red.svg" alt="/icons/notification_red.svg" width="40px" /> The following solution was developed together with Lorenzo Tozzi.
</aside>
In this project we had to develop an agent able to play Quixo, aiming to achieve good results against a random player. We built an agent based on Reinforcement Learning techniques, which plays through a policy learned with Q-learning.
Files description
- `main.ipynb`: contains the key functionalities: running games against a random opponent and the option to play against our player.
- `game.py`: the implementation of the Quixo `Game` class, the `Player` class, and the `Move` class. We took this file from the provided commit and modified the print function in order to print a prettier board. We also enabled the `play` method to print the board after every move.
- `players.py`: contains implementations of players employing various strategies: `RandomPlayer`, `HumanPlayer`, and `RLayer` (our player, trained using reinforcement learning techniques).
- `train.py`: implements the methods that allowed us to train our player. We had to subclass the `Game` class in order to facilitate our training.
- `DeepQ.ipynb`: in this notebook we tried to implement a Deep Q-learning approach (following this guide). The network was highly unstable and very slow, so we could not optimize it properly. In the future we would like to focus on this approach, because we believe it can deliver high performance with less computational power than plain Q-learning.
- `Quixo/Policies`: the folder where the trained policies are stored.
The policy is a dictionary that, after training, is serialized to a JSON file. We structured the policy dictionary in the following format:
```
{
    Key: str(board) + str(player_id)
    Value: dict{
        Key: str(from_pos) + ";" + str(Move)
        Value: float(value)
    }
}
```
Our policy is a dictionary of dictionaries. The keys of the outer dictionary are strings representing the board state, with the player id appended. We add the player id because the agent can reach the same board state both when starting first and when starting second, so encoding it lets us distinguish which moves are best in each case.
Each value is an inner dictionary whose keys are strings encoding the position of the cube to take plus the slide move to apply; the corresponding value is a float measuring the goodness of that move.
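As an illustrative sketch (the variable names, the empty-cell encoding, and the `Move.TOP` string are assumptions for this example, not the project's actual code), this is how such a nested policy dictionary can be built and queried in Python:

```python
import json

# Hypothetical board state: a 5x5 grid stringified, plus the player id.
board = [[-1] * 5 for _ in range(5)]    # -1 = empty cell (assumed encoding)
player_id = 0
state_key = str(board) + str(player_id)

# Inner keys combine the cube position and the slide direction.
from_pos = (0, 0)
move = "Move.TOP"                       # stand-in for the Move enum's string form
action_key = str(from_pos) + ";" + move

# Outer dict keyed by state, inner dict keyed by action, value = goodness.
policy = {state_key: {action_key: 0.5}}

# At play time, pick the action with the highest value for the current state.
best_action = max(policy[state_key], key=policy[state_key].get)

# After training, the whole dictionary is serialized to JSON.
serialized = json.dumps(policy)
```

Because both levels of keys are plain strings, the dictionary round-trips through JSON without any custom encoder.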
The policy values are updated following a Q-Learning strategy:
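As a minimal sketch of the tabular update (the learning rate, discount factor, state names, and reward below are assumptions for illustration, not the project's actual hyperparameters):

```python
# Tabular Q-learning: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha = 0.1    # learning rate (assumed)
gamma = 0.9    # discount factor (assumed)

Q = {
    "stateA": {"action1": 0.0, "action2": 0.2},
    "stateB": {"action1": 0.5, "action2": 0.1},
}

def q_update(Q, state, action, reward, next_state):
    """Move Q[state][action] toward the bootstrapped target."""
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    target = reward + gamma * best_next
    Q[state][action] += alpha * (target - Q[state][action])

q_update(Q, "stateA", "action1", reward=0.0, next_state="stateB")
# Q["stateA"]["action1"] is now approximately 0.1 * (0.9 * 0.5) = 0.045
```

Intermediate moves can use a zero reward, with a positive or negative reward propagated back only from won or lost games.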