# agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

## About

The algorithm is composed of:

- a Monte-Carlo Tree Search (MCTS) implemented in the [`mcts`](https://pkg.go.dev/github.com/gorgonia/agogo/mcts) package;
- a Dual Neural Network (DNN) implemented in the [`dualnet`](https://pkg.go.dev/github.com/gorgonia/agogo/dualnet) package.

The algorithm is wrapped in a top-level structure ([`AZ`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ), for AlphaZero). It applies to any game able to fulfill a specified contract.

The contract specifies how a game state is described.

In this package, the contract is a Go interface declared in the `game` package: [`State`](https://pkg.go.dev/github.com/gorgonia/agogo/game#State).

### Description of some concepts/ubiquitous language

- In the `agogo` package, each player of the game is an [`Agent`](https://pkg.go.dev/github.com/gorgonia/agogo#Agent), and in a `game`, two `Agents` play against each other in an [`Arena`](https://pkg.go.dev/github.com/gorgonia/agogo#Arena).

- The `game` package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (not what a game is). The behavior is expressed as a set of functions operating on a [`State`](https://pkg.go.dev/github.com/gorgonia/agogo/game#State) of the game. A `State` is an interface that represents the current game state *as well as* the allowed interactions. An interaction is made by a [`Player`](https://pkg.go.dev/github.com/gorgonia/agogo/game#Player) performing a [`PlayerMove`](https://pkg.go.dev/github.com/gorgonia/agogo/game#PlayerMove). The implementer's responsibility is to code the game's rules by creating an object that fulfills the `State` contract and implements the allowed moves. A minimal sketch of this interaction follows the list.

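
To make these concepts concrete, here is a minimal sketch using the `mnk` tic-tac-toe implementation shipped in this repository (the same calls appear in the full examples further down): a `Player` acts on a `State` by applying a `PlayerMove`.

```go
package main

import (
    "fmt"

    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
)

func main() {
    // mnk.TicTacToe returns a ready-made game fulfilling the State contract.
    g := mnk.TicTacToe()

    // A PlayerMove pairs a Player with the move it makes; here the Cross player
    // plays the single index 4, the center of the 3x3 board.
    state := g.Apply(game.PlayerMove{
        Player: mnk.Cross,
        Single: 4,
    })
    fmt.Println(state) // the board, with an X in the center
}
```
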
### Applying the algorithm to a game

This package is designed to be extensible. Therefore, you can train AlphaZero on any board game that respects the contract of the `game` package.
The model can then be saved and used as a player.

The steps to train the algorithm are (see the condensed sketch after the list):

- Creating a structure that fulfills the [`State`](https://pkg.go.dev/github.com/gorgonia/agogo/game#State) interface (i.e. a _game_)
- Creating a _configuration_ for the internal MCTS and NN of your AZ
- Creating an `AZ` structure based on the _game_ and the _configuration_
- Executing the learning process (by calling the [`Learn`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ.Learn) method)
- Saving the trained model (by calling the [`Save`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ.Save) method)

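
Condensed into code, those five steps map onto the API as in the sketch below. This is only a sketch: the board encoder is a trivial stand-in and the training budgets are placeholders; the tic-tac-toe example further down shows a realistic configuration and encoder.

```go
package main

import (
    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

func main() {
    // 1. a game: a structure fulfilling the State interface
    g := mnk.TicTacToe()

    // 2. a configuration for the internal MCTS and NN
    conf := agogo.Config{
        Name:     "Tic Tac Toe",
        NNConf:   dual.DefaultConf(3, 3, 10),
        MCTSConf: mcts.DefaultConfig(3),
    }
    // the Encoder turns a State into the network's input; a trivial stand-in here
    conf.Encoder = func(s game.State) []float32 {
        return agogo.EncodeTwoPlayerBoard(s.Board(), nil)
    }

    // 3. the AZ structure built from the game and the configuration
    a := agogo.New(g, conf)

    // 4. run the learning process (placeholder budgets), then 5. save the model
    a.Learn(1, 10, 10, 10)
    a.Save("example.model")
}
```
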
The steps to play against the algorithm are (see the sketch after the list):

- Creating an `AZ` object
- Loading the trained model (by calling the [`Read`](https://pkg.go.dev/github.com/gorgonia/agogo#AZ.Read) method)
- Switching the agent to inference mode via the [`SwitchToInference`](https://pkg.go.dev/github.com/gorgonia/agogo#Agent.SwitchToInference) method
- Getting the AI's move by calling the [`Search`](https://pkg.go.dev/github.com/gorgonia/agogo#Agent.Search) method, and applying that move to the game manually

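
Condensed, the playing steps look like the sketch below. It mirrors the full inference example further down (which also prints the intermediate boards) and, like that example, loads the model from a file with `Load`.

```go
package main

import (
    "fmt"

    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

func main() {
    g := mnk.TicTacToe()

    // 1. an AZ object, configured like the one that was trained
    conf := agogo.Config{
        Name:     "Tic Tac Toe",
        NNConf:   dual.DefaultConf(3, 3, 10),
        MCTSConf: mcts.DefaultConfig(3),
    }
    // must be the same encoder the model was trained with
    conf.Encoder = func(s game.State) []float32 {
        return agogo.EncodeTwoPlayerBoard(s.Board(), nil)
    }
    a := agogo.New(g, conf)

    // 2. load the trained model from disk
    a.Load("example.model")

    // 3. switch both agents to inference mode
    a.A.Player = mnk.Cross
    a.B.Player = mnk.Nought
    a.A.SwitchToInference(g)
    a.B.SwitchToInference(g)

    // the human plays X in the center...
    state := g.Apply(game.PlayerMove{Player: mnk.Cross, Single: 4})

    // 4. ...then agent B is asked for its move, which is applied to the game manually
    move := a.B.Search(state)
    g.Apply(game.PlayerMove{Player: mnk.Nought, Single: move})
    fmt.Println(state)
}
```
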
## Examples

Four board games are implemented so far. Each of them is defined as a subpackage of `game`:

- [`mnk`](https://pkg.go.dev/github.com/gorgonia/agogo/game/mnk) for the [m,n,k](https://en.wikipedia.org/wiki/M,n,k-game) game
- [`wq`](https://pkg.go.dev/github.com/gorgonia/agogo/game/wq) for the game of [Go](https://en.wikipedia.org/wiki/Go_(game)) (围碁)
- [`c4`](https://pkg.go.dev/github.com/gorgonia/agogo/game/c4) for [Connect Four](https://en.wikipedia.org/wiki/Connect_Four)
- [`komi`](https://pkg.go.dev/github.com/gorgonia/agogo/game/komi)

### tic-tac-toe

Tic-tac-toe is an m,n,k game where m=n=k=3.

#### Training

Here is sample code that trains AlphaZero to play the game. The resulting model is saved to the file `example.model`:

```go
package main

import (
    "time"

    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// It flattens the board into float32 values and appends a layer that encodes whose turn it is.
func encodeBoard(a game.State) []float32 {
    board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
    for i := range board {
        if board[i] == 0 {
            board[i] = 0.001
        }
    }
    playerLayer := make([]float32, len(a.Board()))
    next := a.ToMove()
    if next == game.Player(game.Black) {
        for i := range playerLayer {
            playerLayer[i] = 1
        }
    } else if next == game.Player(game.White) {
        // vecf32.Scale(board, -1)
        for i := range playerLayer {
            playerLayer[i] = -1
        }
    }
    retVal := append(board, playerLayer...)
    return retVal
}

func main() {
    // Create the configuration of the neural network
    conf := agogo.Config{
        Name:            "Tic Tac Toe",
        NNConf:          dual.DefaultConf(3, 3, 10),
        MCTSConf:        mcts.DefaultConfig(3),
        UpdateThreshold: 0.52,
    }
    conf.NNConf.BatchSize = 100
    conf.NNConf.Features = 2 // a richer board encoding would allow more features (and a larger K)
    conf.NNConf.K = 3
    conf.NNConf.SharedLayers = 3
    conf.MCTSConf = mcts.Config{
        PUCT:           1.0,
        M:              3,
        N:              3,
        Timeout:        100 * time.Millisecond,
        PassPreference: mcts.DontPreferPass,
        Budget:         1000,
        DumbPass:       true,
        RandomCount:    0,
    }

    conf.Encoder = encodeBoard

    // Create a new game
    g := mnk.TicTacToe()
    // Create the AlphaZero structure
    a := agogo.New(g, conf)
    // Launch the learning process
    a.Learn(5, 30, 200, 30) // 5 epochs, 30 episodes, 200 NN iterations, 30 games
    // Save the model
    a.Save("example.model")
}
```

#### Inference

Here is sample code that loads the trained model and plays a move against it:

```go
package main

import (
    "fmt"

    "github.com/gorgonia/agogo"
    dual "github.com/gorgonia/agogo/dualnet"
    "github.com/gorgonia/agogo/game"
    "github.com/gorgonia/agogo/game/mnk"
    "github.com/gorgonia/agogo/mcts"
)

// encodeBoard is the same GameEncoder used during training; the encoding fed to the
// network at inference time must match the one the model was trained with.
func encodeBoard(a game.State) []float32 {
    board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
    for i := range board {
        if board[i] == 0 {
            board[i] = 0.001
        }
    }
    playerLayer := make([]float32, len(a.Board()))
    next := a.ToMove()
    if next == game.Player(game.Black) {
        for i := range playerLayer {
            playerLayer[i] = 1
        }
    } else if next == game.Player(game.White) {
        // vecf32.Scale(board, -1)
        for i := range playerLayer {
            playerLayer[i] = -1
        }
    }
    retVal := append(board, playerLayer...)
    return retVal
}

func main() {
    conf := agogo.Config{
        Name:     "Tic Tac Toe",
        NNConf:   dual.DefaultConf(3, 3, 10),
        MCTSConf: mcts.DefaultConfig(3),
    }
    conf.Encoder = encodeBoard

    g := mnk.TicTacToe()
    a := agogo.New(g, conf)
    // Load the model trained by the previous example
    a.Load("example.model")
    a.A.Player = mnk.Cross
    a.B.Player = mnk.Nought
    a.B.SwitchToInference(g)
    a.A.SwitchToInference(g)
    // Put X in the center
    stateAfterFirstPlay := g.Apply(game.PlayerMove{
        Player: mnk.Cross,
        Single: 4,
    })
    fmt.Println(stateAfterFirstPlay)
    // ⎢ · · · ⎥
    // ⎢ · X · ⎥
    // ⎢ · · · ⎥

    // Ask agent B (playing O) for its move
    move := a.B.Search(stateAfterFirstPlay)
    fmt.Println(move)
    // 1
    // Apply the move to the game manually
    g.Apply(game.PlayerMove{
        Player: mnk.Nought,
        Single: move,
    })
    fmt.Println(stateAfterFirstPlay)
    // ⎢ · O · ⎥
    // ⎢ · X · ⎥
    // ⎢ · · · ⎥
}
```