Backgammon OpenAI Gym

This repository contains a Backgammon game implementation in OpenAI Gym.
Given the current state of the board, a roll of the dice, and the current player, it iteratively computes all the legal actions/moves that the current player can execute. The legal actions are generated in such a way that they use the highest possible number of dice for that state and player.

Installation

git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .

Environment

The encoding used to represent the state is inspired by the one used by Gerald Tesauro[1].

Observation

Type: Box(198)

Num	Observation	Min	Max
0	WHITE - 1st point, 1st component	0.0	1.0
1	WHITE - 1st point, 2nd component	0.0	1.0
2	WHITE - 1st point, 3rd component	0.0	1.0
3	WHITE - 1st point, 4th component	0.0	6.0
4	WHITE - 2nd point, 1st component	0.0	1.0
5	WHITE - 2nd point, 2nd component	0.0	1.0
6	WHITE - 2nd point, 3rd component	0.0	1.0
7	WHITE - 2nd point, 4th component	0.0	6.0
...
92	WHITE - 24th point, 1st component	0.0	1.0
93	WHITE - 24th point, 2nd component	0.0	1.0
94	WHITE - 24th point, 3rd component	0.0	1.0
95	WHITE - 24th point, 4th component	0.0	6.0
96	WHITE - BAR checkers	0.0	7.5
97	WHITE - OFF bar checkers	0.0	1.0
98	BLACK - 1st point, 1st component	0.0	1.0
99	BLACK - 1st point, 2nd component	0.0	1.0
100	BLACK - 1st point, 3rd component	0.0	1.0
101	BLACK - 1st point, 4th component	0.0	6.0
...
190	BLACK - 24th point, 1st component	0.0	1.0
191	BLACK - 24th point, 2nd component	0.0	1.0
192	BLACK - 24th point, 3rd component	0.0	1.0
193	BLACK - 24th point, 4th component	0.0	6.0
194	BLACK - BAR checkers	0.0	7.5
195	BLACK - OFF bar checkers	0.0	1.0
196 - 197	Current player	0.0	1.0
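
The layout in the table above can be summarised by a small index helper. This is a minimal sketch and not code from the repository: `feature_index`, `bar_index` and `off_index` are hypothetical names, and points and components are counted from 0 here (so the table's "1st point, 1st component" is point 0, component 0).

```python
# Hypothetical helpers (not part of gym-backgammon): they map a player, a point
# and a component to a position in the 198-element observation vector.
FEATURES_PER_POINT = 4
NUM_POINTS = 24
# Each player owns 24 * 4 point features plus one BAR and one OFF feature.
PLAYER_BLOCK = NUM_POINTS * FEATURES_PER_POINT + 2
PLAYER_OFFSET = {"WHITE": 0, "BLACK": PLAYER_BLOCK}


def feature_index(player, point, component):
    """Return the observation index for a 0-based point and 0-based component."""
    if not 0 <= point < NUM_POINTS:
        raise ValueError("point must be in 0..23")
    if not 0 <= component < FEATURES_PER_POINT:
        raise ValueError("component must be in 0..3")
    return PLAYER_OFFSET[player] + point * FEATURES_PER_POINT + component


def bar_index(player):
    """Index of the BAR feature, which follows the player's 96 point features."""
    return PLAYER_OFFSET[player] + NUM_POINTS * FEATURES_PER_POINT


def off_index(player):
    """Index of the OFF feature, which follows the BAR feature."""
    return bar_index(player) + 1


print(feature_index("WHITE", 23, 3))           # WHITE - 24th point, 4th component
print(bar_index("BLACK"), off_index("BLACK"))  # BLACK - BAR and OFF features
```

The last two indices of the vector (196 and 197) hold the current player and are not covered by these helpers.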

Encoding of a single point (it indicates the number of checkers in that point):

Checkers	Encoding
0	[0.0, 0.0, 0.0, 0.0]
1	[1.0, 0.0, 0.0, 0.0]
2	[1.0, 1.0, 0.0, 0.0]
>= 3	[1.0, 1.0, 1.0, (checkers - 3.0) / 2.0]

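The point encoding in the table above could be written as follows. This is an illustrative sketch: `encode_point` is a hypothetical name and may differ from the function used in the repository.

```python
def encode_point(checkers):
    """Encode the number of checkers on a single point as four floats.

    The first three components are flags for "at least 1", "at least 2" and
    "at least 3" checkers; the fourth grows with every checker beyond three.
    """
    if checkers < 0:
        raise ValueError("checkers must be non-negative")
    return [
        1.0 if checkers >= 1 else 0.0,
        1.0 if checkers >= 2 else 0.0,
        1.0 if checkers >= 3 else 0.0,
        (checkers - 3.0) / 2.0 if checkers >= 3 else 0.0,
    ]


for n in (0, 1, 2, 3, 5, 15):
    print(n, encode_point(n))
```

With all 15 checkers on one point the fourth component is (15 - 3) / 2 = 6.0, which matches the Max column of the observation table.
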
Encoding of BAR checkers:

Checkers	Encoding
0 - 14	[bar_checkers / 2.0]

Encoding of OFF bar checkers:

Checkers	Encoding
0 - 14	[off_checkers / 15.0]

Encoding of the current player:

Player	Encoding
WHITE	[1.0, 0.0]
BLACK	[0.0, 1.0]
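
Putting the four encodings together gives the full 198-element observation. The sketch below is an assumption-laden illustration, not the repository's code: the function names and the `(points, bar, off)` input format are hypothetical, and it assumes that both players' point features follow the board numbering 0..23. The checker counts are read from the starting position diagram in the Starting State section (X = WHITE, O = BLACK).

```python
WHITE, BLACK = 0, 1  # player ids as returned by reset()


def encode_point(checkers):
    """Four-float encoding of the checkers on one point."""
    return [
        1.0 if checkers >= 1 else 0.0,
        1.0 if checkers >= 2 else 0.0,
        1.0 if checkers >= 3 else 0.0,
        (checkers - 3.0) / 2.0 if checkers >= 3 else 0.0,
    ]


def encode_player(points, bar, off):
    """96 point features, then the BAR feature, then the OFF feature."""
    features = []
    for checkers in points:
        features.extend(encode_point(checkers))
    features.append(bar / 2.0)
    features.append(off / 15.0)
    return features


def encode_state(white, black, current_player):
    """white and black are (points, bar, off) tuples; returns 198 floats."""
    features = encode_player(*white) + encode_player(*black)
    features += [1.0, 0.0] if current_player == WHITE else [0.0, 1.0]
    return features


def points_from(counts):
    """Build a 24-element list from a {point: checkers} mapping."""
    points = [0] * 24
    for point, checkers in counts.items():
        points[point] = checkers
    return points


# Starting position, read from the board diagram.
white_points = points_from({5: 5, 7: 3, 12: 5, 23: 2})
black_points = points_from({0: 2, 11: 5, 16: 3, 18: 5})

observation = encode_state((white_points, 0, 0), (black_points, 0, 0), BLACK)
print(len(observation))
print(observation[20:24])    # WHITE, point 5 (five checkers)
print(observation[196:198])  # current player
```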

Actions

The valid actions that an agent can execute depend on the current state and the roll of the dice. So, there is no fixed shape for the action space.

Reward

+1 if player WHITE wins, and 0 if player BLACK wins

Starting State

All the episodes/games start in the same starting position:

| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------|     |-------P=O Home Board--------|     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|-----------------------------|     |-----------------------------|     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|--------Outer Board----------|     |-------P=X Home Board--------|     |
| 11 | 10 |  9 |  8 |  7 |  6 | BAR |  5 |  4 |  3 |  2 |  1 |  0 | OFF |

Episode Termination

- One of the two players wins the game
- The episode length is greater than 10000
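
The reward and termination rules can be combined into one small function. This is only a sketch of the rules as described here: `episode_outcome`, `MAX_EPISODE_LENGTH` and the step counter are hypothetical names, and the way the environment counts steps internally is an assumption.

```python
WHITE, BLACK = 0, 1
MAX_EPISODE_LENGTH = 10000  # limit taken from the termination rule above


def episode_outcome(winner, steps):
    """Return (reward, done) for the given winner and number of steps played.

    winner is WHITE, BLACK, or None while the game is still undecided.
    """
    done = winner is not None or steps > MAX_EPISODE_LENGTH
    reward = 1 if winner == WHITE else 0
    return reward, done


print(episode_outcome(WHITE, 120))
print(episode_outcome(BLACK, 120))
print(episode_outcome(None, 120))
```

Note that a game cut off by the length limit yields the same reward (0) as a win for BLACK, so an agent that needs to tell the two apart has to inspect the winner as well.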

Reset

The method reset() returns:

- the player that will move first (0 for the WHITE player, 1 for the BLACK player)
- the first roll of the dice, a tuple with the dice rolled, e.g. (1, 3) for the BLACK player or (-1, -3) for the WHITE player
- the observation features from the starting position
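
The sign convention of the roll can be decoded with a couple of helpers. These are hypothetical functions written for illustration; they only rely on the convention stated above (positive dice for BLACK, negative dice for WHITE).

```python
WHITE, BLACK = 0, 1


def roll_owner(roll):
    """Infer the player from the sign of the dice in a roll tuple."""
    if all(die < 0 for die in roll):
        return WHITE
    if all(die > 0 for die in roll):
        return BLACK
    raise ValueError("dice in a roll must share the same sign and be non-zero")


def dice_values(roll):
    """Return the face values of the dice, ignoring the player sign."""
    return tuple(abs(die) for die in roll)


for roll in [(1, 3), (-1, -3)]:
    player = "WHITE" if roll_owner(roll) == WHITE else "BLACK"
    print(roll, player, dice_values(roll))
```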

Rendering

If render(mode = 'rgb_array') or render(mode = 'state_pixels') is selected, this is the output obtained (over multiple steps):

[Image: rendered board over multiple steps]

Example

Play Random Agents

To run a simple example (both agents, WHITE and BLACK, select an action randomly):

cd examples/
python3 play_random_agent.py

Valid actions

An internal variable, current player, is used to keep track of the player whose turn it is (it represents the color of the player).
To get all the valid actions for a given roll:

actions = env.get_valid_actions(roll)

The legal actions are represented as a set of tuples.
Each action is a tuple of tuples of the form ((source, target), (source, target)).
Each inner tuple represents a single move of the form (source, target).
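
Because the actions come as a set, they have to be converted to a sequence before one can be picked at random. The snippet below is a sketch that does not call the environment: the `actions` set is a hand-copied sample of three of the legal actions listed further down, and `describe` is a hypothetical helper.

```python
import random

# Sample standing in for the result of env.get_valid_actions(roll).
actions = {
    ((11, 14), (14, 15)),
    ((0, 1), (0, 3)),
    ((18, 19), (18, 21)),
}


def describe(action):
    """Render an action as a readable list of source -> target moves."""
    return ", ".join(f"{source} -> {target}" for source, target in action)


# random.choice needs an indexable sequence, so copy the set into a list first.
chosen = random.choice(list(actions))
print(describe(chosen))
```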

NOTE:

The actions of offering a double and accepting/rejecting a double are not available.

Given the following configuration (starting position, BLACK player in turn, roll = (1, 3)):

| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------|     |-------P=O Home Board--------|     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|-----------------------------|     |-----------------------------|     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|--------Outer Board----------|     |-------P=X Home Board--------|     |
| 11 | 10 |  9 |  8 |  7 |  6 | BAR |  5 |  4 |  3 |  2 |  1 |  0 | OFF |

Current player=1 (O - Black) | Roll=(1, 3)

The legal actions are:

Legal Actions:
((11, 14), (14, 15))
((0, 1), (11, 14))
((18, 19), (18, 21))
((11, 14), (18, 19))
((0, 1), (0, 3))
((0, 1), (16, 19))
((16, 17), (16, 19))
((18, 19), (19, 22))
((0, 1), (18, 21))
((16, 17), (18, 21))
((0, 3), (18, 19))
((16, 19), (18, 19))
((16, 19), (19, 20))
((0, 1), (1, 4))
((16, 17), (17, 20))
((0, 3), (16, 17))
((18, 21), (21, 22))
((0, 3), (3, 4))
((11, 14), (16, 17))
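
The listing can be used to check that every action uses both dice. The sketch below copies the 19 actions above into a set and measures each move as target minus source, which works here because BLACK moves towards higher point numbers in this example; `dice_used` is a hypothetical helper.

```python
roll = (1, 3)

legal_actions = {
    ((11, 14), (14, 15)), ((0, 1), (11, 14)), ((18, 19), (18, 21)),
    ((11, 14), (18, 19)), ((0, 1), (0, 3)), ((0, 1), (16, 19)),
    ((16, 17), (16, 19)), ((18, 19), (19, 22)), ((0, 1), (18, 21)),
    ((16, 17), (18, 21)), ((0, 3), (18, 19)), ((16, 19), (18, 19)),
    ((16, 19), (19, 20)), ((0, 1), (1, 4)), ((16, 17), (17, 20)),
    ((0, 3), (16, 17)), ((18, 21), (21, 22)), ((0, 3), (3, 4)),
    ((11, 14), (16, 17)),
}


def dice_used(action):
    """Return the sorted distances travelled by the moves of an action."""
    return sorted(target - source for source, target in action)


uses_both_dice = all(dice_used(action) == sorted(roll) for action in legal_actions)
print(len(legal_actions), uses_both_dice)  # 19 True
```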

Backgammon Versions

`backgammon-v0`

The above description refers to backgammon-v0.

`backgammon-pixel-v0`

The state is represented by a (96, 96, 3) array of pixels instead of the 198-element feature vector.
This is the only difference with respect to backgammon-v0.

An example of the board representation:

[Image: example of the pixel board representation]

Useful links and related works

[1] Implementation Details TD-Gammon
[2] Practical Issues in Temporal Difference Learning
Rules of Backgammon:
- www.bkgm.com/rules.html
- https://en.wikipedia.org/wiki/Backgammon
- Starting Position: http://www.bkgm.com/gloss/lookup.cgi?starting+position
- https://bkgm.com/faq/
Other implementations of TD-Gammon and the game of Backgammon:
Other implementations of the Backgammon OpenAI Gym environment:
- https://github.com/edusta/gym-backgammon

License

MIT
