CheckersBot

Reinforcement Learning Checkers AI with Web Demo

Play against it

American Checkers/English Draughts AI


A checkers-playing agent that heuristically evaluates a given game state with a deep neural network.

The implemented neural network model is a multi-layer perceptron (MLP). The demo AI was initially trained with weakly supervised learning and then further trained with reinforcement learning by playing against itself.
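For reference, a minimal sketch of what such an evaluation MLP could look like in Keras; the layer sizes and the board encoding below are assumptions for illustration, not necessarily those used in this repository.

```python
# Hypothetical evaluation MLP. Layer sizes and the board encoding are assumptions,
# not necessarily those used in this repository.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_evaluator():
    # Input: one value per playable (dark) square of the 8x8 board, e.g.
    # -2 = opponent king, -1 = opponent man, 0 = empty, 1 = own man, 2 = own king.
    model = keras.Sequential([
        layers.Input(shape=(32,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="tanh"),  # value of the position in [-1, 1]
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_evaluator()
value = model.predict(np.zeros((1, 32)), verbose=0)  # evaluate an (empty) example position
```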

Figure: demo being played. AI (red) sacrifices a piece to break the player's formation.

Weakly Supervised Learning

The MLP model is initially trained with weakly supervised learning:

  1. Data samples (game states here) are procedurally generated.
  2. Each sample is heuristically given a weak label.
  3. Another model, a shallow MLP, is fit to this weakly labeled dataset, which serves to evaluate the labeling heuristics.
  4. That evaluation is used to assign each weak label a confidence score: the higher the shallow model's error on a sample, the lower the confidence in that sample's label.
  5. Finally, the actual checkers agent is trained on the weakly labeled dataset, with each sample's loss weighted by the confidence score of its label (see the sketch after this list).
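Below is a rough, self-contained sketch of steps 3-5, assuming Keras-style sample weights; the toy heuristic, the confidence formula, and the stand-in models are illustrative assumptions, not the repository's exact code.

```python
# Illustrative sketch of steps 3-5. The toy heuristic, the confidence formula, and the
# stand-in models below are assumptions, not the repository's exact implementation.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(0)

# Steps 1-2: procedurally generated positions with weak heuristic labels
# (here: a toy material-count heuristic over random 32-square boards).
states = rng.integers(-2, 3, size=(10_000, 32)).astype("float32")
weak_labels = np.tanh(states.sum(axis=1) / 12.0)

# Step 3: a shallow MLP is fit to the weakly labeled dataset.
shallow = keras.Sequential([
    layers.Input(shape=(32,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(1, activation="tanh"),
])
shallow.compile(optimizer="adam", loss="mse")
shallow.fit(states, weak_labels, epochs=3, verbose=0)

# Step 4: confidence shrinks as the shallow model's error on a sample grows.
errors = np.abs(shallow.predict(states, verbose=0).ravel() - weak_labels)
confidence = 1.0 / (1.0 + errors)

# Step 5: the actual agent is trained with each sample's loss weighted by its confidence.
agent = keras.Sequential([
    layers.Input(shape=(32,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="tanh"),
])
agent.compile(optimizer="adam", loss="mse")
agent.fit(states, weak_labels, sample_weight=confidence, epochs=5, verbose=0)
```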

This implementation was put together quickly in the hours before a class deadline, and is a simplified version of the algorithm that inspired it.

Because there are so many possible game states, the training data is generated in batches, learned from, and then discarded.

Game states are generated from random initial board states to diversify training. If all game states were generated from the standard initial board state, the model would likely overfit to early-game heuristics and would not train well on middle-game and endgame positions.
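As an illustration, batches of randomized starting positions could be generated along these lines; the piece counts and encoding are assumptions, and no legality checks beyond "one piece per square" are made.

```python
# Illustrative generation of randomized starting positions (piece counts and encoding are
# assumptions; no legality checks beyond "one piece per square" are made).
import numpy as np

rng = np.random.default_rng()

def random_board():
    """Place a random number of men/kings for each side on the 32 dark squares."""
    board = np.zeros(32, dtype=np.float32)
    squares = rng.permutation(32)
    n_own, n_opp = rng.integers(1, 13, size=2)  # 1 to 12 pieces per side
    board[squares[:n_own]] = rng.choice([1.0, 2.0], size=n_own)                 # own men/kings
    board[squares[n_own:n_own + n_opp]] = rng.choice([-1.0, -2.0], size=n_opp)  # opponent's
    return board

def training_batch(batch_size=1024):
    # Each batch is generated, trained on, and then simply discarded (never stored on disk).
    return np.stack([random_board() for _ in range(batch_size)])
```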

Reinforcement Learning

Finally, while the weakly trained model seemed able to play checkers, it still couldn't beat humans. This was likely because my heuristic labels were bad, so I had a few options:

  • Manually readjust the heuristic labeling function(s).
  • Programmatically determine an optimal heuristic label, given the individual labeling function components (e.g. genetic algorithms, regression, etc.).
  • Reinforcement learning

Manual readjustment of heuristics would be easier if I were a checkers expert. I'm not.

A programmatic search over the handful of heuristic labeling components might not have enough dimensions to contain a decent gameplay policy.

Thus, I chose reinforcement learning. The implemented algorithm uses epsilon-greedy exploration and rewards actions that lead to a win. Very simple to conceive and implement.
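A minimal sketch of that idea is below; the game-engine callbacks (legal_moves, apply_move, is_terminal, winner) are hypothetical placeholders, and crediting every visited state with the final outcome is a simplification of however the repository actually assigns rewards.

```python
# Sketch of epsilon-greedy self-play with win-based rewards. The callbacks legal_moves,
# apply_move, is_terminal and winner are hypothetical placeholders for a checkers engine,
# and labeling every visited state with the final outcome is a simplification.
import random
import numpy as np

EPSILON = 0.1  # probability of playing a random exploratory move

def choose_move(model, state, legal_moves, apply_move):
    moves = legal_moves(state)
    if random.random() < EPSILON:
        return random.choice(moves)  # explore
    # Exploit: pick the move whose resulting position the network values highest.
    values = [model.predict(apply_move(state, m)[None], verbose=0)[0, 0] for m in moves]
    return moves[int(np.argmax(values))]

def self_play_episode(model, initial_state, legal_moves, apply_move, is_terminal, winner):
    """Play one game against itself; label every visited state with the final outcome."""
    state, history = initial_state, []
    while not is_terminal(state):
        state = apply_move(state, choose_move(model, state, legal_moves, apply_move))
        history.append(state)
    outcome = float(winner(state))  # e.g. +1 / -1 from the first player's point of view
    return np.stack(history), np.full(len(history), outcome)
```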

Result

Interestingly, while my heuristic labels were crafted with a cautious checkers agent in mind, the reinforcement learning program created a more aggressive agent, suggesting that it's bad to be overcautious.

Even so, the reinforcement learning agent could not play very well against humans, so the demo uses a hard-coded look-ahead decision algorithm. This is just a duct-tape fix and can be improved...

Room for Improvement

Look-Ahead

The checkers agent, after learning from 10,000 games against itself, still performs poorly on its own. This may be because it only learned to evaluate the immediate value of a game state and not the long-term value that also accounts for the opponent's actions.

This issue can be mitigated by a look-ahead decision algorithm (such as the hardcoded MinMax algorithm in the web demo), but look-ahead is time-consuming to compute. A better solution would be to train the checkers agent against a MinMax agent, so that it learns to estimate the long-term MinMax value of game states. This would remove the need for the costly look-ahead search at play time, improving both run-time performance and gameplay strength.

Also, there are more efficient look-ahead algorithms (read: heuristic tree-search algorithms).
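For concreteness, a depth-limited minimax search with alpha-beta pruning around the network's evaluation might look like the sketch below; the game-engine helpers are hypothetical placeholders, and the demo's hardcoded JavaScript look-ahead may be structured differently.

```python
# Depth-limited minimax with alpha-beta pruning, using the trained network as the leaf
# evaluation. The helpers legal_moves, apply_move and is_terminal are hypothetical
# placeholders; the demo's hardcoded JavaScript look-ahead may be structured differently.
import math

def minimax(state, depth, maximizing, evaluate, legal_moves, apply_move, is_terminal,
            alpha=-math.inf, beta=math.inf):
    if depth == 0 or is_terminal(state):
        return evaluate(state)  # e.g. the MLP's value of this position
    if maximizing:
        best = -math.inf
        for move in legal_moves(state):
            child = apply_move(state, move)
            best = max(best, minimax(child, depth - 1, False, evaluate,
                                     legal_moves, apply_move, is_terminal, alpha, beta))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # prune: the opponent will never allow this branch
        return best
    best = math.inf
    for move in legal_moves(state):
        child = apply_move(state, move)
        best = min(best, minimax(child, depth - 1, True, evaluate,
                                 legal_moves, apply_move, is_terminal, alpha, beta))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best
```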

Convolutions

Another possible improvement is to use a convolutional neural network (CNN) instead of the multi-layer perceptron. However, a CNN separates its weights into filters, unlike the fully connected MLP, so it would be harder to port the trained model into my JavaScript code.

Disclaimer: there are tools that can easily port a trained model to TensorFlow.js, but I'm using copy-paste.

While I think the MLP model actually works quite well (up to a point) and could, in principle, encode good responses to all game states, it would take an impossibly large network an impossibly long time to reach that level. A convolutional neural network would generalize over the board state and reach a high gameplay level sooner than an MLP agent.
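For illustration, a small convolutional evaluator over the full 8x8 board could look something like this in Keras; the filter counts, kernel sizes, and board encoding are assumptions, not a tested architecture.

```python
# Hypothetical convolutional evaluator over the full 8x8 board. Filter counts, kernel sizes
# and the channel encoding are illustrative assumptions, not a tested architecture.
from tensorflow import keras
from tensorflow.keras import layers

def build_conv_evaluator():
    # Input: 8x8 board with one channel per piece type (own man, own king, opp man, opp king).
    model = keras.Sequential([
        layers.Input(shape=(8, 8, 4)),
        layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
        layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="tanh"),  # value of the position in [-1, 1]
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```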

Tools Used

The model was implemented in Keras, a deep learning library for Python.

The web demo uses TensorFlow.js to run the Keras model.