
Teeny Go

A 9x9 Go agent inspired by AlphaGo

About · Network Architecture · Training · Results · Sources

About

This project aims to create a 9x9 Go agent using the methods introduced by the Google DeepMind team in AlphaGo and AlphaGo Zero. It begins by training a neural network with supervised learning for board evaluation and move prediction, then moves into a self-improvement stage using reinforcement learning. The next step is to train a policy and value network tabula rasa using pure reinforcement learning and self-play.

Network Architecture

The model is a convolutional neural network with residual blocks, topped by either a policy head or a value head that makes predictions about a given board state. The policy head consists of a 1x1 convolution, batch normalization, and a fully connected layer, and the value head...
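As a rough sketch of the policy head described above (channel counts, layer names, and the 82-way output of 81 board points plus pass are assumptions, not taken from this repository), in PyTorch:

```python
import torch
import torch.nn as nn

class PolicyHead(nn.Module):
    """Hypothetical policy head: 1x1 convolution, batch norm, fully connected layer."""

    def __init__(self, in_channels=128):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 2, kernel_size=1)  # 1x1 convolution
        self.bn = nn.BatchNorm2d(2)                           # batch norm
        self.fc = nn.Linear(2 * 9 * 9, 82)                    # fully connected layer -> move logits

    def forward(self, x):
        # x: (batch, in_channels, 9, 9) feature maps from the residual tower
        x = torch.relu(self.bn(self.conv(x)))
        x = x.reshape(x.size(0), -1)
        return self.fc(x)
```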

Training

Value Training

These value network models were trained on a 2,000-game subset of the 40,000 9x9 Go games collected from the OGS website. Each game has at least one dan-level player, ensuring some degree of strong play. The games were processed from the standard game format into an (n, 11, 9, 9) tensor, where n is the number of states in the game and the 11 input planes consist of two sets of 5 planes for black's and white's stone positions over the past 5 moves, plus a final plane for the player to move at that state. Each model was trained for 50 iterations with seeded random shuffling of the data to predict the winner of the game. The target uses a tanh activation, where 1 represents a win for the current player and -1 a win for the opponent. The decision boundary for a successful prediction is |p| > 1/3; anything below 1/3 is taken to fall in an uncertainty interval. All models tested converge to an accuracy of about 55%-60%, which is relatively good considering the ambiguity of early game states. Although the accuracy remained relatively consistent for the majority of training, the model versions where the validation and training loss were at their lowest made the most reasonable predictions in user testing.
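A minimal sketch of the state encoding and the |p| > 1/3 decision rule described above (the plane ordering, function names, and the convention of counting uncertain predictions as incorrect are assumptions for illustration, not taken from this repository):

```python
import numpy as np

def encode_state(stone_history, to_play):
    """Build one (11, 9, 9) input tensor.

    stone_history: list of 5 (black_plane, white_plane) pairs of 9x9 binary
    arrays covering the past 5 moves, most recent first.
    to_play: +1 if black moves next, -1 if white moves next.
    """
    planes = [black for black, _ in stone_history]            # 5 black-stone planes
    planes += [white for _, white in stone_history]           # 5 white-stone planes
    planes.append(np.full((9, 9), 1.0 if to_play == 1 else 0.0))  # turn plane
    return np.stack(planes)                                   # shape (11, 9, 9)

def value_accuracy(predictions, winners):
    """Accuracy under the |p| > 1/3 decision boundary.

    predictions: tanh outputs in [-1, 1]; winners: +1 / -1 labels.
    Predictions with |p| <= 1/3 fall in the uncertainty interval and are
    counted as unsuccessful here (one possible convention).
    """
    confident = np.abs(predictions) > 1 / 3
    correct = np.sign(predictions) == winners
    return np.mean(confident & correct)
```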

Policy Training

Results

Sources

Go Datasets

Go Engines

Papers

Additional Sources

Meta

Gregory Eales – @GregoryHamE – gregory.hamilton.e@gmail.com

Distributed under the MIT license. See LICENSE for more information.

https://github.com/Gregory-Eales
