<aside> <img src="/icons/question-mark_red.svg" alt="/icons/question-mark_red.svg" width="40px" /> How to play Quixo ?

Quixo is a two-player abstract strategy game played on a 5x5 grid of 25 cubes, all of which start blank. On each turn, a player takes a cube from the outer edge of the board that is either blank or already shows their symbol, turns it so that it shows their symbol (X or O), and pushes it back in from a different edge of the same row or column, sliding the other cubes along. There are no empty spaces: every move shifts an entire row or column. The first player to line up five of their symbols horizontally, vertically, or diagonally wins; if a move simultaneously completes a line for the opponent, the opponent wins.

</aside>

<aside> <img src="/icons/notification_red.svg" alt="/icons/notification_red.svg" width="40px" /> The following solution was developed together with Lorenzo Tozzi.

</aside>

Overview

In this project we had to develop an agent able to play Quixo, with the aim of achieving good results against a random player. We created an agent based on Reinforcement Learning techniques, which plays through a policy learned with Q-learning.

File descriptions

Policy

The policy is a dictionary that, after training, is turned into a JSON file. We structured the policy dictionary in the following format:

{
    key: str(board) + str(player_id)
    value: {
        key: str(from_pos) + ";" + str(Move)
        value: float(value)
    }
}

Our policy is a dictionary of dictionaries, where each key of the outer dictionary is a string representing the board state, with the player id appended. We add the player id because the agent can reach the same board state whether it plays first or second, so encoding it lets the agent distinguish which moves are best in each role.

The value is another dictionary whose keys are strings encoding the position of the cube to take plus the slide move to apply, and whose values are floats representing the estimated goodness of that move.
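As a concrete illustration, here is a minimal Python sketch of how such a table could be built, queried, and dumped to JSON. The helper names (`state_key`, `action_key`), the NumPy board encoding, and the use of a plain string for the slide move are assumptions for the example, not the project's exact code:

```python
import json

import numpy as np

# Minimal sketch of the policy table, assuming a NumPy board where -1
# marks a blank cube. Helper names here are hypothetical.

def state_key(board: np.ndarray, player_id: int) -> str:
    # Outer key: the serialized board with the player id appended.
    return str(board) + str(player_id)

def action_key(from_pos: tuple[int, int], move: str) -> str:
    # Inner key: cube position plus the slide move, separated by ";".
    return str(from_pos) + ";" + str(move)

policy: dict[str, dict[str, float]] = {}

# Record a value for taking the cube at (0, 3) and sliding it LEFT,
# playing as player 0 ("LEFT" stands in for the project's Move type).
board = np.full((5, 5), -1)
key = state_key(board, player_id=0)
policy.setdefault(key, {})[action_key((0, 3), "LEFT")] = 0.25

# Look up the best known move in a state:
best_action = max(policy[key], key=policy[key].get)

# After training, the table is serialized to JSON.
with open("policy.json", "w") as f:
    json.dump(policy, f)
```

Keeping both key levels as plain strings is what makes the final `json.dump` straightforward, since JSON objects only accept string keys.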

The policy values are updated following a Q-learning strategy: