<aside>
The following project was developed together with Lorenzo Tozzi.

</aside>

Goal

The goal of this project was to create and train a reinforcement learning agent able to play TicTacToe.

Training the agents

The training phase consists of $400\,000$ games against an opponent. We decided to train three different reinforcement learning agents:

Main classes

Two main classes are implemented:

RLayer

The RLayer holds an internal dictionary called policy, which maps each discovered game state to a value. When the agent is in train mode, it makes a random move with probability $0.3$ (this allows exploration); otherwise it returns the best available move according to the values stored in the policy. All moves chosen during a training game are stored in a list.
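This epsilon-greedy selection can be sketched as follows. The helper `next_state` and the board representation (a tuple of 9 cells, `" "` for empty) are assumptions for illustration; the policy defaults unseen states to $0$.

```python
import random

EXPLORATION_RATE = 0.3  # probability of a random move in train mode (from the text)

def next_state(board, move, symbol):
    """Return the (hashable) state reached by playing `move` with `symbol`.
    The board is assumed to be a tuple of 9 cells, " " meaning empty."""
    cells = list(board)
    cells[move] = symbol
    return tuple(cells)

def choose_move(policy, board, symbol, train=True):
    """Epsilon-greedy move selection over the policy dictionary."""
    available = [i for i, cell in enumerate(board) if cell == " "]
    if train and random.random() < EXPLORATION_RATE:
        return random.choice(available)  # explore
    # Exploit: pick the move whose resulting state has the highest policy value;
    # states never seen before default to a value of 0.
    return max(available, key=lambda m: policy.get(next_state(board, m, symbol), 0.0))
```

With `train=False` the agent always exploits, which is how the trained policy would be used when actually playing.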

When a training game is over, a backpropagation step updates the values of the policy states visited during that game according to the outcome: the reward is $1$ for a win, $-1$ for a loss, and $0.5$ for a tie.

The policy values are updated following a value iteration strategy:

$$ V(S) \leftarrow V(S) + \alpha \cdot (\gamma \cdot reward - V(S)) $$

where $\alpha$ is the learning rate, set to $0.2$, and $\gamma$ (gamma_decay) is the decay factor, set to $0.9$.
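A minimal sketch of the backpropagation step, applying the value-iteration update above to each visited state from the last move backwards. Propagating the freshly updated value as the reward for the preceding state is an assumption about the implementation, not stated explicitly in the text.

```python
def backpropagate(policy, visited_states, reward, alpha=0.2, gamma_decay=0.9):
    """Update the policy values for the states visited in one game.
    `visited_states` is the list of state keys stored during play;
    `reward` is 1 (win), -1 (loss), or 0.5 (tie)."""
    for state in reversed(visited_states):
        value = policy.get(state, 0.0)
        # Value-iteration update: V(S) <- V(S) + alpha * (gamma * reward - V(S))
        policy[state] = value + alpha * (gamma_decay * reward - value)
        # Assumption: the updated value becomes the reward for the earlier state,
        # so the outcome decays as it propagates back through the game.
        reward = policy[state]
    return policy
```

Earlier moves thus receive a smaller share of the final reward, which favours moves that lead quickly to a win.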