
Alpha NL Holdem


This is an implementation of a self-play, no-limit Texas Hold'em AI, built with TensorFlow and Ray. While heavily inspired by UCAS's work on AlphaHoldem, it is not an official implementation of AlphaHoldem.

This is a proof-of-concept project that uses rlcard's no-limit hold'em environment. It is a 50bb heads-up (1v1) environment, not the standard 100bb ACPC setup, and the bet sizes differ slightly from ACPC's.
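To get a feel for the underlying environment, here is a minimal sketch that spins up rlcard's heads-up no-limit hold'em env with two random agents. It only illustrates the environment this project builds on, not this repo's training code, and the exact rlcard version and env config may differ from what the repo uses (older rlcard versions expose env.action_num instead of env.num_actions).

import rlcard
from rlcard.agents import RandomAgent

# Heads-up no-limit hold'em environment from rlcard.
env = rlcard.make('no-limit-holdem', config={'seed': 0})

# Two uniformly random agents stand in for the trained networks.
env.set_agents([
    RandomAgent(num_actions=env.num_actions),
    RandomAgent(num_actions=env.num_actions),
])

# Play one hand and print each player's payoff.
trajectories, payoffs = env.run(is_training=False)
print('payoffs:', payoffs)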

I asked a few professional Hold'em players to play a few dozen games against this AI. They report that the AI's moves are all logical, and they did not observe any significant mistakes.

Goal

  1. Provide a clean codebase for applying self-play, model-free RL methods to Hold'em-like games.
  2. Try to reproduce the results of AlphaHoldem.
  3. Provide all data, including checkpoints, training methods, evaluation metrics, and more.

Getting Started

This project assumes you have the following:

  1. A Conda environment (Anaconda / Miniconda)
  2. Python 3.7+

Install dependencies:

pip3 install -r requirements.txt
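After installing, a quick sanity check is to make sure the main dependencies mentioned in this README (TensorFlow, Ray, rlcard) import cleanly. The package names below are assumptions based on that description, so adjust them if your requirements.txt differs:

# Sanity check: confirm the core dependencies import and print their versions.
import tensorflow as tf
import ray
import rlcard

print('tensorflow', tf.__version__)
print('ray', ray.__version__)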

Usage

Play against Neural Net

First, go to the gui directory and run the Python script:

cd gui
python3 play_against_ai_in_ui.py

Then go to http://localhost:8000/ to play against the neural net.

Yes, it's a small tool I wrote to play against the AI, and yes, it looks rough. But it works.

By default you are playing against an NN opponent that has been trained for about a week.

[Screenshot: playing against the AI in the browser UI]

Training

Training 101

To start training, you first have to edit the config in confs/nl_holdem.py.

By default, this program requires 1 GPU and 89 CPU cores to run.

Modify the line in the config file:

    'num_workers': 89,

Change it to the number of CPU cores your machine has. However, you still need at least one GPU to run the training.
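If you'd rather not count cores by hand, a hypothetical helper like the one below (not part of this repo) can suggest a num_workers value from the machine's CPU count:

import multiprocessing

# Hypothetical helper, not part of this repo: leave a couple of cores free
# for the trainer/driver process and use the rest as rollout workers.
def suggest_num_workers(reserved_cores: int = 2) -> int:
    return max(1, multiprocessing.cpu_count() - reserved_cores)

# Paste this number into 'num_workers' in confs/nl_holdem.py.
print(suggest_num_workers())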

Then use the command:

python3 train_league.py --conf confs/nl_holdem.py --sp 0.0 --upwin 1.0 --gap=500 --league_tracker_n 1000

The win rate against historical agents will be displayed in the stdout log.

[Screenshot: training log output]

Restore training

If the training process is somehow killed, or you want to start from the weights you downloaded, first put the downloaded league folder in this project's root. Then use the command:

python3 train_league.py --conf confs/nl_holdem.py --sp 0.0 --upwin 1.0 --gap=500 --league_tracker_n 1000 --restore league/history_agents

It will automatically load all training weights and continue training.

Released data

  1. Weights of all checkpoints from about a week of training (~1 billion self-play games): Google Drive: https://drive.google.com/file/d/1G_GwTaVe4syCwW43DauwSQi6FqjRS3nj/view?usp=sharing Baidu Drive: https://pan.baidu.com/s/1PYNLKN2CExRntVvkvYyKkA?pwd=7jmn
  2. Evaluation metrics and partial results: see nl-evaluation.ipynb

Known Issues

  1. The rlcard environment is rough: 50bb stacks, wrong pot sizes, wrong action order after the flop; I don't know where to start. But it's the only environment I know of that is suitable for this task.
  2. Even after ~1 billion self-play games and over 1,000 checkpoints, the model still does not seem to have converged; it keeps improving, and I really don't know when it will converge. It could be a bug, I'm not sure.

License

GNU AGPL v3

Warning

It is illegal to use this code in any way for commercial purposes, including research projects inside a commercial entity.

This applies especially to the Chinese company JJ World (竞技世界). You'd better look elsewhere.