Project author: pietermarsman

Project description:
AlphaGo Zero approach to playing a 3D Connect Four game
Language: Python
Repository: git://github.com/pietermarsman/alpha-connect-four.git
Created: 2019-09-05T09:18:41Z
Project page: https://github.com/pietermarsman/alpha-connect-four

License: MIT License



3D Connect Four AI

This project implements several computer players that can play 3D Connect Four.


Reference players:

  • a random player
  • a greedy player that maximizes its own maximum line length
  • a minimax player based on line length
  • an MCTS player that uses random playouts
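
The greedy and minimax reference players both score positions by line length. The following is a minimal sketch of how such a score could be computed on a 4x4x4 board. It assumes stones are stored as sets of (x, y, z) coordinates and that "line length" means the number of own stones on a winning line that holds no opponent stone; the names `all_lines` and `max_line_length` are hypothetical and are not taken from the project.

```python
from itertools import product

SIZE = 4  # the board is SIZE x SIZE x SIZE


def all_lines(size=SIZE):
    """Enumerate every straight line of `size` cells on the cubic board."""
    # Keep one direction out of each (d, -d) pair so no line is listed twice.
    directions = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    lines = []
    for start in product(range(size), repeat=3):
        for dx, dy, dz in directions:
            cells = tuple(
                (start[0] + dx * i, start[1] + dy * i, start[2] + dz * i)
                for i in range(size)
            )
            # A line is valid only if its last cell is still on the board.
            if all(0 <= c < size for c in cells[-1]):
                lines.append(cells)
    return lines


def max_line_length(own, opponent, lines=None):
    """Most own stones on any line that the opponent has not blocked."""
    lines = all_lines() if lines is None else lines
    best = 0
    for line in lines:
        if any(cell in opponent for cell in line):
            continue  # blocked lines can no longer be completed
        best = max(best, sum(cell in own for cell in line))
    return best
```

A greedy player under these assumptions would try each legal move and keep the one with the highest `max_line_length`; a minimax player could use the same number as its leaf evaluation.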

Model-based players:

The best player is the AlphaGo Zero player, and the command-line utility focuses on interacting with and improving that player.

Play an interactive game against AlphaConnectPlayer

You can use the command line to play an interactive game against the current best AlphaGo Zero player. This shows the
current board state as four horizontal slices and uses hexadecimal digits to represent actions. Enter a digit between
0 and f to drop a stone.

After you have played the first stone, AlphaConnectPlayer will take 15 seconds to make its move.

    $ export PYTHONPATH=connect-four
    $ python -m connect-four play models/000170.h5
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    . . . .
    _ _ _ _
    _ _ _ _
    _ _ _ _
    _ _ _ _
    Possible actions:
    0 4 8 c
    1 5 9 d
    2 6 a e
    3 7 b f
    Choose your action:
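
The hexadecimal action encoding shown in the prompt can be sketched as follows. This is a minimal illustration, assuming the 16 columns are numbered 0 to 15 and printed column-major exactly as in the "Possible actions" grid; the function names `parse_action`, `action_to_grid` and `render_actions` are hypothetical and not taken from the project.

```python
HEX_DIGITS = "0123456789abcdef"


def parse_action(text):
    """Convert one hexadecimal digit ('0'..'f') into an action index 0..15."""
    text = text.strip().lower()
    if len(text) != 1 or text not in HEX_DIGITS:
        raise ValueError("expected a single hexadecimal digit between 0 and f")
    return int(text, 16)


def action_to_grid(action):
    """Return (row, column) of an action in the printed 4x4 action grid."""
    return action % 4, action // 4


def render_actions():
    """Rebuild the 'Possible actions' grid for an empty board."""
    rows = []
    for row in range(4):
        # Each printed row holds the actions row, row + 4, row + 8, row + 12.
        rows.append(" ".join(HEX_DIGITS[col * 4 + row] for col in range(4)))
    return "\n".join(rows)
```

For example, `parse_action("c")` gives action 12, which `action_to_grid` places in the top-right corner of the printed grid.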

Continuously generate games with self-play

To continuously generate new self-play games, in parallel, using the newest neural network model, run:

    $ python -m connect-four simulate-continuously models/ data/
    Written game to: data/000172/20190923_140243_836785.json
    Written game to: data/000172/20190923_140252_434749.json
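
Picking "the newest neural network model" could look roughly like the sketch below. It assumes that model files are named with a zero-padded number followed by `.h5` (as in `models/000170.h5`) and that a higher number means a newer model; the helper `newest_model` is hypothetical, and the project may select its model differently.

```python
import re

# Matches names such as "000170.h5"; anything else is ignored.
MODEL_PATTERN = re.compile(r"^(\d+)\.h5$")


def newest_model(filenames):
    """Return the model filename with the highest number, or None if absent."""
    numbered = []
    for name in filenames:
        match = MODEL_PATTERN.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return max(numbered)[1] if numbered else None
```

In practice the list of names could come from `os.listdir("models/")`, and each self-play worker would re-check it before starting a new game.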

Continuously optimize neural network

The self-play games are used to train a neural network that predicts the best actions and the outcome for arbitrary
states reached during those games. The results of the MCTS searches performed during a game serve as the target for
which actions should be played, and the final outcome of the game (i.e. win, loss or draw) serves as the target for the
value of each state.
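
The two training targets can be sketched as below. This is an illustration only: it assumes, as in the AlphaGo Zero paper, that the policy target is the normalized MCTS visit count per action and that the value target is +1, 0 or -1 from the point of view of the player to move. The game structure (a dict with `"winner"` and `"moves"`) and all function names are hypothetical and do not reflect the project's JSON format.

```python
def policy_target(visit_counts):
    """Normalize MCTS visit counts into a probability distribution."""
    total = sum(visit_counts)
    if total == 0:
        raise ValueError("at least one action must have been visited")
    return [count / total for count in visit_counts]


def value_target(winner, player_to_move):
    """+1 for a win, -1 for a loss, 0 for a draw (winner is None)."""
    if winner is None:
        return 0.0
    return 1.0 if winner == player_to_move else -1.0


def training_samples(game):
    """Turn one finished game into (state, policy, value) training samples."""
    return [
        (state, policy_target(counts), value_target(game["winner"], player))
        for state, player, counts in game["moves"]
    ]
```

Every state of a game receives the same final outcome, with the sign flipped depending on whose turn it was, so even early positions get a value label.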

Note: optimizing the neural network can be very fast, especially if there are only a few self-play games. To give the
simulation process more time to generate new games with the latest model, this process waits for 30 minutes between
consecutive optimizations.

To continuously optimize a policy and value neural network, run:

    $ python -m connect-four optimize-continuously data/ models/
    Model: "model"
    __________________________________________________________________________________________________
    Layer (type)                    Output Shape         Param #     Connected to
    ==================================================================================================
    input_1 (InputLayer)            [(None, 4, 4, 4, 13) 0
    __________________________________________________________________________________________________
    ...
    __________________________________________________________________________________________________
    Using game files from data/000000/20190825_224003_643756.json to data/000172/20190903_074350_110901.json
    100%|██████████| 100/100 [00:01<00:00, 99.90it/s]
    Train on 560 samples, validate on 240 samples
    2019-09-23 14:16:44.954929: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    Epoch 1/100
    560/560 [==============================] - 4s 7ms/sample - loss: 3.8670 - softmax_loss: 2.7794 - dense_3_loss: 1.0636 - softmax_categorical_accuracy: 0.0607 - dense_3_mean_absolute_error: 0.9964 - val_loss: 3.7915 - val_softmax_loss: 2.7718 - val_dense_3_loss: 0.9931 - val_softmax_categorical_accuracy: 0.0708 - val_dense_3_mean_absolute_error: 0.9936
    Epoch 2/100
    560/560 [==============================] - 0s 879us/sample - loss: 3.7463 - softmax_loss: 2.7681 - dense_3_loss: 0.9620 - softmax_categorical_accuracy: 0.0714 - dense_3_mean_absolute_error: 0.9486 - val_loss: 3.7870 - val_softmax_loss: 2.7707 - val_dense_3_loss: 0.9926 - val_softmax_categorical_accuracy: 0.0750 - val_dense_3_mean_absolute_error: 0.9907
    Epoch 3/100
    ...

Continuously generate tournament games between all players

A tournament is useful for determining the best player. In a tournament, players of all types are randomly paired to
play a game. Unlike in self-play games, the randomization of AlphaConnectPlayer is disabled so that it always plays
its best move.

    $ python -m connect-four tournament-continously tournament-data/ models/
    Written game to: tournament-data/20190923_140243_836785.json
    Written game to: tournament-data/20190923_140252_434749.json
    Written game to: tournament-data/20190923_140313_240510.json
    ...
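
One common way to switch between randomized self-play and deterministic tournament play in AlphaGo Zero style players is a temperature applied to the MCTS visit counts. The sketch below illustrates that idea only; it is an assumption that the project disables randomization in this way, and `select_action` is a hypothetical name.

```python
import random


def select_action(visit_counts, temperature, rng=random):
    """Pick an action index from MCTS visit counts.

    temperature == 0 always returns the most visited action (tournament
    style); temperature > 0 samples in proportion to count ** (1 / T).
    """
    actions = range(len(visit_counts))
    if temperature == 0:
        return max(actions, key=lambda action: visit_counts[action])
    weights = [count ** (1.0 / temperature) for count in visit_counts]
    return rng.choices(actions, weights=weights, k=1)[0]
```

With a temperature of 1 the player still explores less visited moves during self-play, which produces more varied training games; with a temperature of 0 the same search result always yields the same move.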

Compute Bayesian Elo rating

The tournament games can be used to compute the Elo rating of each player.
This orders the players based on their mutual winning odds.

This uses Stan (https://mc-stan.org) to compute a
Bayesian version of the Elo rating (https://www.remi-coulom.fr/Bayesian-Elo/).

    $ python -m connect-four tournament-elo tournament-data/
    Gradient evaluation took 0.000892 seconds
    1000 transitions using 10 leapfrog steps per transition would take 8.92 seconds.
    Adjust your expectations accordingly!
    Iteration: 1 / 2000 [ 0%] (Warmup)
    ...
    Iteration: 2000 / 2000 [100%] (Sampling)
    Elapsed Time: 3.94725 seconds (Warm-up)
                  2.22654 seconds (Sampling)
                  6.17379 seconds (Total)
    Advantage of starting player: 0.33 (>0.21, <0.46)
    games  wins losses |  lower median  upper | best | player
      387   360     27 |   2.46   3.01   3.57 | 100% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000170.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      132   100     32 |   1.54   2.13   2.73 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000150.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      218   168     50 |   1.36   1.89   2.42 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000110.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      194   142     52 |   1.32   1.85   2.39 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000120.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      161   120     41 |   1.27   1.81   2.38 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000160.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      170   125     45 |   1.22   1.75   2.31 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000130.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      160   116     44 |   1.05   1.60   2.17 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000140.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      218   151     67 |   0.98   1.47   1.96 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000100.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      131    82     49 |   0.76   1.35   1.93 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000090.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      150    92     58 |   0.50   1.07   1.63 |   0% | MiniMaxPlayer(depth=2)
      141    84     57 |   0.37   0.94   1.48 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000060.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      151    85     66 |   0.31   0.87   1.42 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000070.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      146    83     63 |   0.29   0.82   1.40 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000080.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      158    82     76 |   0.00   0.56   1.11 |   0% | MiniMaxPlayer(depth=3)
      137    66     71 |  -0.11   0.45   0.99 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000050.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      141    64     77 |  -0.42   0.18   0.69 |   0% | MiniMaxPlayer(depth=1)
      147    59     88 |  -0.87  -0.26   0.31 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000040.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      141    54     87 |  -1.16  -0.57   0.02 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000030.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      146    46    100 |  -1.44  -0.85  -0.29 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000010.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      136    36    100 |  -1.81  -1.21  -0.62 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000020.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      162    41    121 |  -1.90  -1.34  -0.73 |   0% | MonteCarloPlayer(exploration=1.000, budget=6400)
      145    35    110 |  -1.98  -1.38  -0.78 |   0% | GreedyPlayer()
      136    30    106 |  -2.07  -1.46  -0.85 |   0% | MonteCarloPlayer(exploration=1.000, budget=3200)
      136    19    117 |  -2.76  -2.06  -1.42 |   0% | AlphaConnectPlayer(model_path='/Users/pieter/Documents/Projects/connect-four/models/000000.h5', exploration=1.0, start_temperature=1.0, time_budget=None, search_budget=1600, self_play=False, batch_size=16)
      140    14    126 |  -3.30  -2.57  -1.89 |   0% | MonteCarloPlayer(exploration=1.000, budget=1600)
      156    14    142 |  -3.51  -2.82  -2.12 |   0% | MonteCarloPlayer(exploration=1.000, budget=800)
      143     8    135 |  -3.98  -3.15  -2.37 |   0% | MonteCarloPlayer(exploration=1.000, budget=400)
      132     0    132 |  -5.18  -4.16  -3.22 |   0% | RandomPlayer()

Note that these Elo ratings use a different intercept (0 instead of 1000) and scale (1 instead of 400) compared to
normal Elo ratings.
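
Read literally, the note on intercept and scale describes a linear rescaling, which can be sketched as follows. The conversion assumes that a rating of 0 corresponds to 1000 conventional points and that one rating unit corresponds to 400 points; `expected_score` is the standard base-10 Elo formula, and whether the project's Stan model uses exactly this link function is an assumption. Both function names are hypothetical.

```python
def to_conventional_elo(rating, intercept=1000.0, scale=400.0):
    """Map a rating with intercept 0 and scale 1 onto the usual Elo range."""
    return intercept + scale * rating


def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))
```

Under these assumptions, the median rating of 3.01 in the table would correspond to roughly 2204 conventional Elo points, and a gap of one rating unit would correspond to an expected score of about 0.91 for the stronger player.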

Also note that AlphaConnectPlayer with the model 000170.h5 (included in this git project) is the best player and
that (luckily) the RandomPlayer is the worst.