# agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero).

## About

The algorithm is composed of:

- a Monte-Carlo Tree Search (MCTS), implemented in the [`mcts`](https://pkg.go.dev/github.com/gorgonia/agogo/mcts) package;
- a Dual Neural Network (DNN), implemented in the [`dualnet`](https://pkg.go.dev/github.com/gorgonia/agogo/dualnet) package.

The algorithm is wrapped in a top-level structure ([`AZ`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ), for AlphaZero). It applies to any game able to fulfill a specified contract.

The contract specifies the description of a game state.

In this package, the contract is a Go interface declared in the `game` package: [`State`](https://pkg.go.dev/github.com/gorgonia/agogo/game#State).

### Description of some concepts/ubiquitous language

- In the `agogo` package, each player of the game is an [`Agent`](https://pkg.go.dev/github.com/gorgonia/agogo#Agent), and in a `game`, two `Agent`s play in an [`Arena`](https://pkg.go.dev/github.com/gorgonia/agogo@v0.1.0#Arena).

- The `game` package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (not what a game is). The behavior is expressed as a set of functions that operate on a [`State`](https://pkg.go.dev/github.com/gorgonia/agogo/game#State) of the game. A `State` is an interface that represents the current game state *as well as* the allowed interactions. An interaction is made by a [`Player`](https://pkg.go.dev/github.com/gorgonia/agogo/game#Player) performing a [`PlayerMove`](https://pkg.go.dev/github.com/gorgonia/agogo/game#PlayerMove). It is the implementer's responsibility to encode the game's rules by creating an object that fulfills the `State` contract and implements the allowed moves.

### Training process

### Applying the algorithm to a game

This package is designed to be extensible.
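To give a feel for what fulfilling such a contract involves, here is a rough, self-contained sketch. The real interface is `game.State` (see the package documentation for its exact method set); the `toyState` interface below is a simplified, hypothetical stand-in whose method names are only inspired by the calls used in the examples in this README (`Board`, `ToMove`, `Apply`), not the actual `game.State` signatures.

```go
package main

import "fmt"

// toyState is a hypothetical, simplified stand-in for the shape of the
// contract. The real interface is game.State in the game package.
type toyState interface {
	Board() []int8                         // flattened board
	ToMove() int8                          // player whose turn it is
	Apply(player int8, square int) toyState // play a move, return the new state
}

// ticTacToe is a minimal type fulfilling the toy contract.
type ticTacToe struct {
	board []int8
	next  int8
}

func newTicTacToe() *ticTacToe {
	return &ticTacToe{board: make([]int8, 9), next: 1}
}

func (t *ticTacToe) Board() []int8 { return t.board }
func (t *ticTacToe) ToMove() int8  { return t.next }
func (t *ticTacToe) Apply(player int8, square int) toyState {
	t.board[square] = player
	t.next = -player // alternate turns
	return t
}

// compile-time check that *ticTacToe fulfills the toy contract
var _ toyState = (*ticTacToe)(nil)

func main() {
	g := newTicTacToe()
	g.Apply(1, 4) // X takes the centre
	fmt.Println(g.Board(), g.ToMove())
}
```

The implementer's job for a real game is the same idea at full scale: encode the rules (legal moves, turn order, end conditions) behind the methods the interface demands.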
Because of this extensibility, you can train AlphaZero on any board game that respects the contract of the `game` package.
The model can then be saved and used as a player.

The steps to train the algorithm are:

- Creating a structure that fulfills the [`State`](https://pkg.go.dev/github.com/gorgonia/agogo/game#State) interface (i.e. a _game_).
- Creating a _configuration_ for the internal MCTS and NN of your AZ.
- Creating an `AZ` structure based on the _game_ and the _configuration_.
- Executing the learning process (by calling the [`Learn`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ.Learn) method).
- Saving the trained model (by calling the [`Save`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ.Save) method).

The steps to play against the algorithm are:

- Creating an `AZ` object.
- Loading the trained model (by calling the [`Read`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ.Read) method).
- Switching the agent to inference mode via the [`SwitchToInference`](https://pkg.go.dev/github.com/gorgonia/agogo#Agent.SwitchToInference) method.
- Getting the AI's move by calling the [`Search`](https://pkg.go.dev/github.com/gorgonia/agogo#Agent.Search) method, then applying the move to the game manually.

## Examples

Four board games are implemented so far. Each of them is a subpackage of `game`:

- [`mnk`](https://pkg.go.dev/github.com/gorgonia/agogo/game/mnk) for the [m,n,k](https://en.wikipedia.org/wiki/M,n,k-game) game.
- [`wq`](https://pkg.go.dev/github.com/gorgonia/agogo/game/wq) for the game of [Go](https://en.wikipedia.org/wiki/Go_(game)) (围碁).
- `c4`
- `komi`

### tic-tac-toe

Tic-tac-toe is an m,n,k game where m=n=k=3.

#### Training

Here is a sample program that trains AlphaZero to play the game.
The result is saved in the file `example.model`.

```go
package main

import (
	"time"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}

func main() {
	// Create the configuration of the neural network
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (which allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard

	// Create a new game
	g := mnk.TicTacToe()
	// Create the AlphaZero structure
	a := agogo.New(g, conf)
	// Launch the learning process
	a.Learn(5, 30, 200, 30) // 5 epochs, 30 episodes, 200 NN iterations, 30 games
	// Save the model
	a.Save("example.model")
}
```

#### Inference

```go
package main

import (
	"fmt"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is the same GameEncoder used for training.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}

func main() {
	conf := agogo.Config{
		Name:     "Tic Tac Toe",
		NNConf:   dual.DefaultConf(3, 3, 10),
		MCTSConf: mcts.DefaultConfig(3),
	}
	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	a.Load("example.model")
	a.A.Player = mnk.Cross
	a.B.Player = mnk.Nought
	a.B.SwitchToInference(g)
	a.A.SwitchToInference(g)
	// Put X in the center
	stateAfterFirstPlay := g.Apply(game.PlayerMove{
		Player: mnk.Cross,
		Single: 4,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · · · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥

	// What to do next
	move := a.B.Search(stateAfterFirstPlay)
	fmt.Println(move)
	// 1
	g.Apply(game.PlayerMove{
		Player: mnk.Nought,
		Single: move,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · O · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
}
```
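The transformation performed by `encodeBoard` can be illustrated on its own. The sketch below replicates its logic without the `agogo` dependency, using plain `[]float32` values instead of a `game.State` (the `encode` helper and its inputs are illustrative, not part of the library): empty squares become 0.001, and a constant "player layer" (+1 when Black/X is to move, -1 when White/O is) is appended, doubling the feature length from 9 to 18 for a 3×3 board.

```go
package main

import "fmt"

// encode mimics encodeBoard: it copies the board, replacing empty
// squares (0) with 0.001, then appends a constant player layer whose
// value encodes whose turn it is.
func encode(board []float32, blackToMove bool) []float32 {
	out := make([]float32, 0, 2*len(board))
	for _, v := range board {
		if v == 0 {
			v = 0.001 // avoid exact zeros in the NN input
		}
		out = append(out, v)
	}
	layer := float32(1) // Black/X to move
	if !blackToMove {
		layer = -1 // White/O to move
	}
	for range board {
		out = append(out, layer)
	}
	return out
}

func main() {
	// X in the centre, O to move next (as after the first play above).
	board := []float32{0, 0, 0, 0, 1, 0, 0, 0, 0}
	fmt.Println(encode(board, false))
}
```

The doubled length matches `conf.NNConf.Features = 2` in the training configuration: the network sees two planes of 9 values each.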