Backgammon OpenAI Gym
Backgammon is a two-player game that combines moving checkers with rolling dice. The goal of each player is to move all of their checkers off the board.
This repository contains an implementation of Backgammon as an OpenAI Gym environment.
Given the current state of the board, a roll of the dice, and the current player, it computes (iteratively) all the legal actions/moves that the current player can execute. The legal actions are generated so that they use as many dice as possible for that state and player.
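The rule that legal actions use as many dice as possible can be sketched as a filter over candidate move sequences. This is a minimal illustration, not the repository's code: `keep_max_dice` is a hypothetical helper, and it assumes each move consumes exactly one die, so the number of dice an action uses equals its number of moves. It ignores the further backgammon rule that, when only one die can be played, the larger one must be used if possible.

```python
from typing import Iterable, Set, Tuple

# A move is (source, target); an action is a tuple of moves.
Move = Tuple[int, int]
Action = Tuple[Move, ...]


def keep_max_dice(candidates: Iterable[Action]) -> Set[Action]:
    """Keep only the candidate actions that use the most dice.

    Hypothetical helper: each move is assumed to consume one die,
    so an action's dice count is its number of moves.
    """
    pool = set(candidates)
    if not pool:
        return set()
    most = max(len(action) for action in pool)
    return {action for action in pool if len(action) == most}


# If a two-move sequence exists, the single-move actions are dropped.
example = [((0, 1),), ((0, 1), (1, 4)), ((18, 21),)]
print(keep_max_dice(example))  # {((0, 1), (1, 4))}
```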
git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .
The encoding used to represent the state is inspired by the one used by Gerald Tesauro [1].
Type: Box(198)
Num | Observation | Min | Max |
---|---|---|---|
0 | WHITE - 1st point, 1st component | 0.0 | 1.0 |
1 | WHITE - 1st point, 2nd component | 0.0 | 1.0 |
2 | WHITE - 1st point, 3rd component | 0.0 | 1.0 |
3 | WHITE - 1st point, 4th component | 0.0 | 6.0 |
4 | WHITE - 2nd point, 1st component | 0.0 | 1.0 |
5 | WHITE - 2nd point, 2nd component | 0.0 | 1.0 |
6 | WHITE - 2nd point, 3rd component | 0.0 | 1.0 |
7 | WHITE - 2nd point, 4th component | 0.0 | 6.0 |
… | … | … | … |
92 | WHITE - 24th point, 1st component | 0.0 | 1.0 |
93 | WHITE - 24th point, 2nd component | 0.0 | 1.0 |
94 | WHITE - 24th point, 3rd component | 0.0 | 1.0 |
95 | WHITE - 24th point, 4th component | 0.0 | 6.0 |
96 | WHITE - BAR checkers | 0.0 | 7.5 |
97 | WHITE - OFF checkers (borne off) | 0.0 | 1.0 |
98 | BLACK - 1st point, 1st component | 0.0 | 1.0 |
99 | BLACK - 1st point, 2nd component | 0.0 | 1.0 |
100 | BLACK - 1st point, 3rd component | 0.0 | 1.0 |
101 | BLACK - 1st point, 4th component | 0.0 | 6.0 |
… | … | … | … |
190 | BLACK - 24th point, 1st component | 0.0 | 1.0 |
191 | BLACK - 24th point, 2nd component | 0.0 | 1.0 |
192 | BLACK - 24th point, 3rd component | 0.0 | 1.0 |
193 | BLACK - 24th point, 4th component | 0.0 | 6.0 |
194 | BLACK - BAR checkers | 0.0 | 7.5 |
195 | BLACK - OFF checkers (borne off) | 0.0 | 1.0 |
196 - 197 | Current player | 0.0 | 1.0 |
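The index layout in the table can be captured with a few helpers. This is a sketch, assuming the table's k-th point corresponds to board point k - 1 (0-23) and that BLACK's block starts right after WHITE's, at index 98; `point_index`, `bar_index`, `off_index` and `upper_bounds` are hypothetical names. The bounds list reproduces the Max column and could, for instance, serve as the upper bound of a 198-dimensional Box.

```python
from typing import List

WHITE, BLACK = 0, 1
POINTS = 24
PER_POINT = 4
BLOCK = POINTS * PER_POINT + 2  # 96 point values + BAR + OFF = 98 per color
SIZE = 2 * BLOCK + 2            # two color blocks + 2 player values = 198


def point_index(color: int, point: int, component: int) -> int:
    """Index of `component` (0-3) of board `point` (0-23) for `color`."""
    return color * BLOCK + point * PER_POINT + component


def bar_index(color: int) -> int:
    """Index of the BAR value for `color`."""
    return color * BLOCK + POINTS * PER_POINT


def off_index(color: int) -> int:
    """Index of the OFF value for `color` (right after BAR)."""
    return bar_index(color) + 1


def upper_bounds() -> List[float]:
    """Per-index maximum values, following the Max column of the table."""
    high = [1.0] * SIZE
    for color in (WHITE, BLACK):
        for point in range(POINTS):
            high[point_index(color, point, 3)] = 6.0
        high[bar_index(color)] = 7.5
    return high


print(point_index(BLACK, 0, 0), bar_index(BLACK), off_index(BLACK))  # 98 194 195
```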
Encoding of a single point (based on the number of checkers on that point):
Checkers | Encoding |
---|---|
0 | [0.0, 0.0, 0.0, 0.0] |
1 | [1.0, 0.0, 0.0, 0.0] |
2 | [1.0, 1.0, 0.0, 0.0] |
>= 3 | [1.0, 1.0, 1.0, (checkers - 3.0) / 2.0] |
Encoding of BAR checkers:
Checkers | Encoding |
---|---|
0 - 15 | [bar_checkers / 2.0] |
Encoding of OFF checkers (borne off):
Checkers | Encoding |
---|---|
0 - 15 | [off_checkers / 15.0] |
Encoding of the current player:
Player | Encoding |
---|---|
WHITE | [1.0, 0.0] |
BLACK | [0.0, 1.0] |
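Putting the encoding tables together, the full 198-value observation could be built as follows. This is a sketch in plain Python under the same assumption as the table layout (the k-th point is board index k - 1); `encode_state` and its parameters are hypothetical names, not the environment's API. The usage at the bottom encodes the standard starting position (WHITE on points 5, 7, 12, 23; BLACK on 0, 11, 16, 18) with BLACK to move.

```python
from typing import List, Sequence

WHITE, BLACK = 0, 1


def encode_point(checkers: int) -> List[float]:
    """Four-value encoding of the number of checkers on one point."""
    if checkers == 0:
        return [0.0, 0.0, 0.0, 0.0]
    if checkers == 1:
        return [1.0, 0.0, 0.0, 0.0]
    if checkers == 2:
        return [1.0, 1.0, 0.0, 0.0]
    return [1.0, 1.0, 1.0, (checkers - 3.0) / 2.0]


def encode_side(points: Sequence[int], bar: int, off: int) -> List[float]:
    """98 values for one color: 24 points x 4, then BAR, then OFF."""
    features: List[float] = []
    for checkers in points:
        features.extend(encode_point(checkers))
    features.append(bar / 2.0)
    features.append(off / 15.0)
    return features


def encode_state(white_points: Sequence[int], black_points: Sequence[int],
                 white_bar: int, black_bar: int,
                 white_off: int, black_off: int,
                 current_player: int) -> List[float]:
    """198 values: WHITE block, BLACK block, then the current player."""
    features = encode_side(white_points, white_bar, white_off)
    features += encode_side(black_points, black_bar, black_off)
    features += [1.0, 0.0] if current_player == WHITE else [0.0, 1.0]
    return features


# Starting position, BLACK to move.
white = [0] * 24
black = [0] * 24
for point, n in {5: 5, 7: 3, 12: 5, 23: 2}.items():
    white[point] = n
for point, n in {0: 2, 11: 5, 16: 3, 18: 5}.items():
    black[point] = n
obs = encode_state(white, black, 0, 0, 0, 0, BLACK)
print(len(obs))  # 198
```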
The valid actions that an agent can execute depend on the current state and the roll of the dice. So, there is no fixed shape for the action space.
Reward: +1 if player WHITE wins, and 0 if player BLACK wins.
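The reward rule can be expressed as a small function of the borne-off counts. This sketch assumes the standard backgammon win condition (a player wins once all 15 of their checkers are off) and assumes a reward of 0 on non-terminal steps, which the description above does not state; `winner` and `reward` are hypothetical names.

```python
from typing import Optional, Tuple

WHITE, BLACK = 0, 1
CHECKERS_PER_PLAYER = 15


def winner(white_off: int, black_off: int) -> Optional[int]:
    """Return the color that has borne off all its checkers, if any."""
    if white_off == CHECKERS_PER_PLAYER:
        return WHITE
    if black_off == CHECKERS_PER_PLAYER:
        return BLACK
    return None


def reward(white_off: int, black_off: int) -> Tuple[float, bool]:
    """Return (reward, done): +1 if WHITE wins, 0 if BLACK wins.

    Assumption: steps before the game ends also yield 0.
    """
    w = winner(white_off, black_off)
    if w is None:
        return 0.0, False
    return (1.0 if w == WHITE else 0.0), True


print(reward(15, 3))  # (1.0, True)
```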
All the episodes/games start in the same starting position:
| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------| |-------P=O Home Board--------| |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
|-----------------------------| |-----------------------------| |
| O | | | | | | | X | | | | | | |
| O | | | | | | | X | | | | | | |
| O | | | | X | | | X | | | | | | |
| O | | | | X | | | X | | | | | O | |
| O | | | | X | | | X | | | | | O | |
|--------Outer Board----------| |-------P=X Home Board--------| |
| 11 | 10 | 9 | 8 | 7 | 6 | BAR | 5 | 4 | 3 | 2 | 1 | 0 | OFF |
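The starting position in the diagram can also be written down as per-point checker counts. This is a sketch, assuming X is WHITE and O is BLACK (the example further down labels O as Black) and that the column labels 0-23 are the board indices used by the actions; `WHITE_START`, `BLACK_START` and `to_points` are hypothetical names.

```python
from typing import Dict, List

# Checker counts per board point (index 0-23), read off the diagram:
# X is WHITE, O is BLACK.
WHITE_START = {5: 5, 7: 3, 12: 5, 23: 2}
BLACK_START = {0: 2, 11: 5, 16: 3, 18: 5}


def to_points(layout: Dict[int, int]) -> List[int]:
    """Expand a {point: checkers} layout into a 24-element list."""
    points = [0] * 24
    for point, checkers in layout.items():
        points[point] = checkers
    return points


white_points = to_points(WHITE_START)
black_points = to_points(BLACK_START)

# Sanity checks: 15 checkers each, and no point holds both colors.
assert sum(white_points) == sum(black_points) == 15
assert not any(w and b for w, b in zip(white_points, black_points))
```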
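The method reset() returns, among other values, the player that starts the game (0 for the WHITE player, 1 for the BLACK player) and the first roll of the dice, whose sign depends on the player: for example, (1, 3) for the BLACK player or (-1, -3) for the WHITE player.
If render(mode = 'rgb_array') or render(mode = 'state_pixels') is selected, this is the output obtained (over multiple steps):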
To run a simple example in which both agents (WHITE and BLACK) select their actions randomly:
cd examples/
python3 play_random_agent.py
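A random agent only needs to pick one element of the current set of legal actions. The sketch below is not the code of play_random_agent.py: `random_action` is a hypothetical helper, and returning None when the set is empty assumes that "no legal action" means the player passes.

```python
import random
from typing import Optional, Set, Tuple

Move = Tuple[int, int]
Action = Tuple[Move, ...]


def random_action(actions: Set[Action], rng: random.Random) -> Optional[Action]:
    """Pick one legal action uniformly at random, or None if there is none."""
    if not actions:
        # Assumption: no legal action means the player passes.
        return None
    # Sorting makes a seeded rng reproducible regardless of set order.
    return rng.choice(sorted(actions))


legal = {((0, 1), (0, 3)), ((11, 14), (14, 15)), ((18, 19), (19, 22))}
print(random_action(legal, random.Random(0)) in legal)  # True
```

In an environment loop, the set passed in would presumably be the one returned by env.get_valid_actions(roll).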
An internal variable, current player, keeps track of the player in turn (it represents the player's color).
To get all the valid actions:
actions = env.get_valid_actions(roll)
The legal actions are represented as a set of tuples.
Each action is a tuple of moves, in the form ((source, target), (source, target)).
Each move is a tuple of the form (source, target).
The actions of offering a double and accepting/rejecting a double are not available.
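Because a move is a (source, target) pair, the die it consumes can be read back as target - source. The sketch below (hypothetical helpers `move_distances` and `matches_roll`) checks that an action uses exactly the dice of a non-double roll. It assumes, consistently with the roll signs shown for reset(), that WHITE's dice are negative because WHITE moves toward point 0 while BLACK moves toward point 23. Bearing off and entering from the bar are not handled.

```python
from typing import List, Sequence, Tuple

Move = Tuple[int, int]
Action = Tuple[Move, ...]


def move_distances(action: Action) -> List[int]:
    """Signed distance of each (source, target) move: target - source."""
    return [target - source for source, target in action]


def matches_roll(action: Action, roll: Sequence[int]) -> bool:
    """True if the action uses exactly the dice of a non-double roll.

    Simplification: bearing off (where a die may exceed the distance)
    and entering from the bar are not handled.
    """
    return sorted(move_distances(action)) == sorted(roll)


# BLACK moves up the board with a positive roll.
print(matches_roll(((0, 1), (1, 4)), (1, 3)))        # True
# WHITE moves down the board with a negative roll.
print(matches_roll(((23, 22), (22, 19)), (-1, -3)))  # True
```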
Given the following configuration (starting position, BLACK player in turn, roll = (1, 3)):
| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------| |-------P=O Home Board--------| |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
|-----------------------------| |-----------------------------| |
| O | | | | | | | X | | | | | | |
| O | | | | | | | X | | | | | | |
| O | | | | X | | | X | | | | | | |
| O | | | | X | | | X | | | | | O | |
| O | | | | X | | | X | | | | | O | |
|--------Outer Board----------| |-------P=X Home Board--------| |
| 11 | 10 | 9 | 8 | 7 | 6 | BAR | 5 | 4 | 3 | 2 | 1 | 0 | OFF |
Current player=1 (O - Black) | Roll=(1, 3)
The legal actions are:
Legal Actions:
((11, 14), (14, 15))
((0, 1), (11, 14))
((18, 19), (18, 21))
((11, 14), (18, 19))
((0, 1), (0, 3))
((0, 1), (16, 19))
((16, 17), (16, 19))
((18, 19), (19, 22))
((0, 1), (18, 21))
((16, 17), (18, 21))
((0, 3), (18, 19))
((16, 19), (18, 19))
((16, 19), (19, 20))
((0, 1), (1, 4))
((16, 17), (17, 20))
((0, 3), (16, 17))
((18, 21), (21, 22))
((0, 3), (3, 4))
((11, 14), (16, 17))
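The listing above can be reproduced with a deliberately simplified generator. This is an illustrative sketch, not the repository's algorithm: it handles only non-double rolls, ignores bar entry and bearing off, treats a point as blocked when the opponent holds two or more checkers on it, and stores each action with its moves sorted so that it can be compared with the listing. All names are hypothetical.

```python
from itertools import permutations
from typing import List, Set, Tuple

Move = Tuple[int, int]
Action = Tuple[Move, ...]


def starting_board() -> Tuple[List[int], List[int]]:
    """(own, opponent) checker counts for BLACK in the starting position."""
    black = [0] * 24
    white = [0] * 24
    for point, n in {0: 2, 11: 5, 16: 3, 18: 5}.items():
        black[point] = n
    for point, n in {5: 5, 7: 3, 12: 5, 23: 2}.items():
        white[point] = n
    return black, white


def single_moves(own: List[int], opp: List[int], die: int) -> List[Move]:
    """Moves of one checker by `die` points that land on an open point."""
    moves = []
    for source in range(24):
        target = source + die
        # On-board moves only; a point with 2+ opponent checkers is blocked.
        if own[source] > 0 and 0 <= target < 24 and opp[target] < 2:
            moves.append((source, target))
    return moves


def apply(own: List[int], opp: List[int], move: Move) -> Tuple[List[int], List[int]]:
    """Return new (own, opp) counts after playing `move`."""
    own, opp = own[:], opp[:]
    source, target = move
    own[source] -= 1
    own[target] += 1
    if opp[target] == 1:
        opp[target] = 0  # hit: the lone checker leaves the point (bar not tracked)
    return own, opp


def legal_actions(own: List[int], opp: List[int], roll: Tuple[int, int]) -> Set[Action]:
    """Two-move actions for a non-double roll, with a one-move fallback."""
    found: Set[Action] = set()
    for first_die, second_die in permutations(roll):
        for m1 in single_moves(own, opp, first_die):
            own2, opp2 = apply(own, opp, m1)
            for m2 in single_moves(own2, opp2, second_die):
                # Canonical (sorted) order, used only for comparison.
                found.add(tuple(sorted((m1, m2))))
    if found:
        return found
    # Only one die can be played (the larger-die rule is not applied here).
    return {(m,) for die in roll for m in single_moves(own, opp, die)}


own, opp = starting_board()
print(len(legal_actions(own, opp, (1, 3))))  # 19
```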
backgammon-v0
The above description refers to backgammon-v0.
backgammon-pixel-v0
The state is represented by a (96, 96, 3) pixel array; this is the only difference with respect to backgammon-v0.
An example of the board representation: