Skip to content

ArmanMielke/simple-poker-cfr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

License Project status: completed

Counterfactual Regret Minimization for Simplified Poker

This is an implementation of Counterfactual Regret Minimization (CFR) for a simplified version of Texas Hold'em poker. CFR is an algorithm that is commonly used to solve imperfect information games. It is guaranteed to converge to a Nash equilibrium in finite two-player zero-sum games, such as many two-player poker variants. A Nash equilibrium is a set of strategies in which neither player can improve their expected payoff by changing their own strategy, as long as the other player doesn't change theirs. An interesting consequence of this is that the poker agent can learn to bluff and react to the opponent's bluffs (although it is of course impossible to predict with certainty whether an opponent is bluffing).

Compile the code with

cmake . && make

Then run the program to start the demo:

./cfr

This will train the agent for 1,000,000 iterations and let the user play against the newly trained agent. Training should only take a few seconds to complete.

To learn more about CFR, check out this excellent tutorial, which inspired much of the code.

Rules

  • There are two players, each of which must bet one chip at the start of the game
  • The deck has 13 cards of one suit
  • Each player gets one card, there is one community card
  • Which player has the stronger hand is determined like this:
    • Two adjacent cards count as a straight
      • A straight always beats a single card
      • If both players have a straight, the higher one wins
    • If no player has a straight, the player with the higher card wins
    • If both player's pocket cards are lower than the community card, the game ends in a draw
  • There is one betting round
    • Players can either pass (=check/fold) or bet one chip (=call/raise)
    • Betting ends after
      • Both players pass
      • Both players bet
      • One player passes after the other player bets
    • The result of the game is summarized in the table below
  • During training it is assumed that each player has an infinite supply of chips

Summary of the betting round:

Player 1 Player 2 Player 1 Payoff
pass pass +1 to player with better hand
pass bet pass +1 to player 2
pass bet bet +2 to player with better hand
bet pass +1 to player 1
bet bet +2 to player with better hand

Evaluation

  • Rand: Bot that chooses its actions uniformly at random
  • CFR1: CFR bot trained for 100 iterations (= games)
  • CFR2: CFR bot trained for 10,000 iterations
  • CFR3: CFR bot trained for 1,000,000 iterations

The table displays how many chips player 1 (row) won against player 2 (column) on average over 1,000,000 games.

Rand CFR1 CFR2 CFR3
Rand 0.000902 -- -- --
CFR1 -0.235421 -0.007219 -- --
CFR2 0.019710 0.125138 0.000522 --
CFR3 0.008765 0.070699 0.007931 0.000375

The results are NOT averaged over multiple training runs. In the runs on the diagonal, where the same number of iterations was used for both bots, the bots were trained separately. This means that they may have converged to different Nash equilibria.

Observations:

  • As expected, the entries on the diagonal are almost 0
  • Generally, CFR bots trained with more iterations are stronger than CFR bots trained with fewer iterations.
  • CFR1 loses a lot of chips against the random bot (I have repeated this experiment multiple times).
  • CFR3 does not win against the random bot by a significant amount (I have repeated this experiment, too, and sometimes CFR3 even loses).

License

Copyright © 2020 Arman Mielke

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

About

Counterfactual Regret Minimization for a simplified version of Texas Hold'em poker

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published