[![Build Status](https://travis-ci.org/yaricom/goNEAT.svg?branch=master)](https://travis-ci.org/yaricom/goNEAT) [![GoDoc](https://godoc.org/github.com/yaricom/goNEAT/neat?status.svg)](https://godoc.org/github.com/yaricom/goNEAT/neat)

## Overview
This repository provides an implementation of the [NeuroEvolution of Augmenting Topologies (NEAT)][1] method written in the Go language.

Neuroevolution (NE) is the artificial evolution of Neural Networks (NN) using genetic algorithms to find optimal NN parameters and topology. Neuroevolution of an NN may involve a search for the optimal weights of the connections between NN nodes as well as a search for the optimal topology of the resulting NN. The NEAT method implemented in this work searches for both: the optimal connection weights and the optimal topology for a given task (the number of NN nodes per layer and their interconnections).

#### System Requirements
The source code is written and compiled against Go 1.9.x.

## Installation
Make sure that you have at least a Go 1.8.x environment installed on your system and execute the following command:
```bash
go get github.com/yaricom/goNEAT
```

## Performance Evaluations
The basic performance of the system is evaluated with two kinds of experiments:
1. The XOR experiment, which tests whether topology augmentation actually happens during NEAT algorithm evaluation. To build an XOR-solving network, the NEAT algorithm must grow a new hidden unit in the provided start genome.
2. The pole-balancing experiments, which are classic Reinforcement Learning experiments that allow us to estimate the performance of the NEAT algorithm against results proven by many other algorithms, i.e. we can benchmark NEAT against other algorithms and find out whether it performs better or worse.


### 1. The XOR Experiments
Because XOR is not linearly separable, a neural network requires hidden units to solve it.
The two inputs must be combined at some hidden unit, as opposed to only at the output node, because there is no function over a linear combination of the inputs that can separate the inputs into the proper classes. These structural requirements make XOR suitable for testing NEAT’s ability to evolve structure.

#### 1.1. The XOR Experiment with connected inputs in start genome
In this experiment we use a start (seed) genome with the inputs connected to the output. Thus it mostly checks the ability of NEAT to grow the new hidden unit necessary for solving the XOR problem.

To run this experiment execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/xor -context ./data/xor.neat -genome ./data/xorstartgenes -experiment XOR
```
Where ./data/xor.neat is the configuration of the NEAT execution context and ./data/xorstartgenes is the start genome configuration.

This will execute 100 trials of the XOR experiment, each within 100 generations. As a result of the execution, several 'gen_x' files with snapshots of the population will be stored in the ./out directory every 'print_every' generation, or when a winner solution is found. The same directory will also hold 'xor_winner' with the winner genome and 'xor_optimal' with the optimal XOR solution, if one was found (it has exactly 5 units).

By examining the resulting 'xor_winner' from a series of experiments you will find that at least one hidden unit was grown by NEAT to solve the XOR problem, which is proof that it works as expected.
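To illustrate why a hidden unit is required, here is a minimal Go sketch of a 2-input network with a single hidden unit that solves XOR. The weights are hand-picked for illustration, not evolved by goNEAT, and the `(4 - error)^2` fitness follows the classic NEAT XOR setup; the repository's actual fitness function may differ in detail:

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid activation with the steep slope commonly used in NEAT networks
func sigmoid(x float64) float64 { return 1.0 / (1.0 + math.Exp(-4.9*x)) }

// xorNet is a hand-wired 2-input, 1-hidden-unit, 1-output network solving XOR.
// The hidden unit approximates AND(a, b); the output computes OR(a, b) minus AND(a, b).
func xorNet(a, b float64) float64 {
	h := sigmoid(a + b - 1.5)           // hidden unit: active only when both inputs are on
	return sigmoid(a + b - 2.0*h - 0.5) // output: on for exactly one active input
}

func main() {
	inputs := [][2]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	expected := []float64{0, 1, 1, 0}
	errSum := 0.0
	for i, in := range inputs {
		errSum += math.Abs(expected[i] - xorNet(in[0], in[1]))
	}
	// classic NEAT XOR fitness: (4 - summed error)^2
	fmt.Printf("fitness: %.3f\n", (4.0-errSum)*(4.0-errSum))
}
```

Removing the hidden unit reduces the output to a monotone function of a + b, which cannot separate the XOR classes; this is exactly the structural pressure the experiment exploits.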
The XOR experiment with inputs connected in the start genes almost never fails (verified over at least 100 simulations).

The experiment results will be similar to the following:

```
Average
	Winner Nodes:	5.0
	Winner Genes:	6.0
	Winner Evals:	7753.0
Mean
	Complexity:	10.6
	Diversity:	19.8
	Age:		34.6
```

Where:
- **Winner nodes/genes** is the number of units and of links between them in the produced Neural Network which was able to solve the XOR problem.
- **Winner evals** is the number of evaluations of intermediate organisms/genomes before the winner was found.
- **Mean Complexity** is the average complexity (number of nodes + number of links) of the best organisms per epoch over all epochs.
- **Mean Diversity** is the average diversity (number of species) per epoch over all epochs.
- **Mean Age** is the average age of surviving species per epoch over all epochs.

#### 1.2. The XOR experiment with disconnected inputs in start genome
This experiment uses a start genome with disconnected inputs in order to check the ability of the algorithm not only to grow the needed hidden nodes, but also to build the missing connections between the input nodes and the rest of the network.

To run this experiment execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/xor_disconnected -context ./data/xor.neat -genome ./data/xordisconnectedstartgenes -experiment XOR
```

This will execute 100 trials of the XOR (disconnected) experiment, each within 100 generations. The results of the experiment execution will be saved into the ./out directory as in the previous experiment.

The experiment will sometimes fail to produce an XOR solution within 100 generations, but most of the time a solution will be found. This confirms that the algorithm is able not only to grow the needed hidden units, but also to restore the input connections as needed.
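The Mean statistics above are plain per-epoch averages. A hedged Go sketch of how such a summary could be computed (the `epochStats` record and its sample values are hypothetical, not goNEAT's internal types):

```go
package main

import "fmt"

// epochStats is a hypothetical per-epoch record mirroring the report above.
type epochStats struct {
	bestNodes, bestLinks int // complexity components of the epoch's best organism
	species              int // diversity: number of species in the epoch
	age                  int // age of surviving species
}

// meanStats averages the per-epoch values over all epochs, as the Mean block describes.
func meanStats(epochs []epochStats) (complexity, diversity, age float64) {
	for _, e := range epochs {
		complexity += float64(e.bestNodes + e.bestLinks)
		diversity += float64(e.species)
		age += float64(e.age)
	}
	n := float64(len(epochs))
	return complexity / n, diversity / n, age / n
}

func main() {
	// illustrative numbers only, not taken from a real run
	epochs := []epochStats{{5, 6, 20, 30}, {6, 7, 19, 35}, {5, 6, 21, 39}}
	c, d, a := meanStats(epochs)
	fmt.Printf("Complexity: %.1f Diversity: %.1f Age: %.1f\n", c, d, a)
}
```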
The example output of the command is as follows:
```
Average
	Winner Nodes:	5.7
	Winner Genes:	9.2
	Winner Evals:	9347.7
Mean
	Complexity:	7.8
	Diversity:	20.0
	Age:		46.7
```

### 2. The single pole-balancing experiment
The pole-balancing or inverted pendulum problem has long been established as a standard benchmark for artificial learning systems. It is one of the best early examples of a reinforcement learning task under conditions of incomplete knowledge.

![alt text][single_pole-balancing_scheme]

Figure 1.

##### System Constraints
1. The pole must remain upright within ±r, the pole failure angle.
2. The cart must remain within ±h of the origin.
3. The controller must always exert a non-zero force F.

Where r is the pole failure angle (±12° from 0) and h is the track limit (±2.4 meters from the track centre).

The simulation of the cart ends when either the pole exceeds the failure angle or the cart exceeds the limit of the track. The objective is to devise a controller that can keep the pole balanced for a defined length of simulation time. The controller must always output a force at full magnitude in either direction (bang-bang control).

In this experiment a Genome is considered a winner if it is able to balance the single pole for at least 500’000 time steps (10’000 simulated seconds).

To run this experiment with a population size of 150, execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_150.neat -genome ./data/pole1startgenes -experiment cart_pole
```

This will execute 100 trials of the single pole-balancing experiment, each within 100 generations and over a population of 150 organisms. The results of the experiment execution will be saved in the ./out directory under the specified folder.
To run this experiment with a population size of 1’000, execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_1000.neat -genome ./data/pole1startgenes -experiment cart_pole
```

The example output of the command for a population of 1000 organisms is as follows:
```
Average
	Winner Nodes:	7.0
	Winner Genes:	10.2
	Winner Evals:	1880.0
Mean
	Complexity:	17.1
	Diversity:	25.3
	Age:		2.2
```

This will execute 100 trials of the single pole-balancing experiment, each within 100 generations and over a population of 1’000 organisms.

The results demonstrate that a winning Genome can be found on average within 2 generations in a population of 1’000 organisms (which belong to 17 species on average), and within 30 generations for a population of 150 organisms.

It is interesting to note that for a population of 1000 organisms the winning solution is often found in the null generation, i.e. within the initial random population. Thus the next experiment, with the double pole-balancing setup, seems more interesting for performance testing.

In both single pole-balancing configurations described above the optimal winner organism has 7 nodes and 10 genes correspondingly.

The seven network nodes have the following meaning:
* node #1 is a bias
* nodes #2-5 are sensors receiving the system state: X position, X velocity, pole angle, and pole angular velocity
* nodes #6, 7 are output nodes signaling what action should be applied to the system to balance the pole at each simulation step, i.e. the force direction to be applied. The applied force direction depends on the relative strength of the activations of the two output neurons. If the activation of the first output neuron (node #6) is greater than the activation of the second neuron (node #7), the positive force direction is applied.
Otherwise, the negative force direction is applied.

The TEN genes correspond exactly to the number of links required to connect the FIVE input sensor nodes with the TWO output neuron nodes (5x2).

### 3. The double pole-balancing experiment

This is an advanced version of pole-balancing in which the cart has two poles with different mass and length to be balanced.

![alt text][double_pole-balancing_scheme]

Figure 2.

We will consider two variants of this problem for benchmarking:
* the Markovian, with the full system state known (including velocities);
* the Non-Markovian, without velocity information.

The former is fairly simple while the latter is quite challenging.

##### System Constraints
1. Both poles must remain upright within ±r, the pole failure angle.
2. The cart must remain within ±h of the origin.
3. The controller must always exert a non-zero force F.

Where r is the pole failure angle (±36° from 0) and h is the track limit (±2.4 meters from the track centre).

#### 3.1. The double pole-balancing Markovian experiment (with known velocity)

In this experiment the agent receives at each time step the full system state, including the velocities of the cart and of both poles. The winner solution is determined as the one which is able to perform double pole-balancing for at least 100’000 time steps, or 1’000 simulated seconds.

To run the experiment execute the following command:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole2_markov -context ./data/pole2_markov.neat -genome ./data/pole2_markov_startgenes -experiment cart_2pole_markov
```

This will execute 10 trials of the double pole-balancing experiment, each within 100 generations and over a population of 1’000 organisms.
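The double-pole system constraints above (r = 36°, h = 2.4 m) amount to a simple per-step check; a Go sketch of that check, not the repository's actual code:

```go
package main

import (
	"fmt"
	"math"
)

const (
	failAngle = 36.0 * math.Pi / 180.0 // pole failure angle r, in radians
	trackLim  = 2.4                    // track limit h, in meters
)

// withinLimits reports whether the cart position and both pole angles
// satisfy the double pole-balancing system constraints.
func withinLimits(x, theta1, theta2 float64) bool {
	return math.Abs(x) < trackLim &&
		math.Abs(theta1) < failAngle &&
		math.Abs(theta2) < failAngle
}

func main() {
	fmt.Println(withinLimits(0.0, 0.1, -0.1)) // both poles near upright: true
	fmt.Println(withinLimits(2.5, 0.0, 0.0))  // cart off the track: false
}
```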
The example output of the command:

```
Average
	Winner Nodes:	16.6
	Winner Genes:	35.2
	Winner Evals:	35593.2
Mean
	Complexity:	29.4
	Diversity:	686.9
	Age:		14.7
```

The winner solution can be found within approximately 13 generations, with nearly double the complexity of the resulting genome compared to the seed genome. The seed genome has eight nodes, where nodes #1-6 are sensors for x, x', θ1, θ1', θ2, and θ2' correspondingly, node #7 is a bias, and node #8 is an output signaling what action should be applied at each time step.


#### 3.2. The double pole-balancing Non-Markovian experiment (without velocity information)

In this experiment the agent receives at each time step a partial system state, excluding the velocity information about the cart and both poles. Only the horizontal cart position X and the angles of both poles, θ1 and θ2, are provided to the agent.

The best individual (i.e. the one with the highest fitness value) of every generation is tested for its ability to balance the system for a longer time period. If a potential solution passes this test by keeping the system balanced for 100’000 time steps, the so-called generalization score (GS) of this particular individual is calculated. This score measures the potential of a controller to balance the system starting from different initial conditions. It is calculated with a series of experiments, running over 1000 time steps, starting from 625 different initial conditions.

The initial conditions are chosen by assigning each value of the set Ω = \[0.05, 0.25, 0.5, 0.75, 0.95\] to each of the states x, ∆x/∆t, θ1 and ∆θ1/∆t, scaled to the range of the corresponding variables. The short pole angle θ2 and its angular velocity ∆θ2/∆t are set to zero.
The GS is then defined as the number of successful runs from the 625 initial conditions, and an individual is defined as a solution if it reaches a generalization score of 200 or more.

To run the experiment execute the following command:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole2_non-markov -context ./data/pole2_non-markov.neat -genome ./data/pole2_non-markov_startgenes -experiment cart_2pole_non-markov
```

This will execute 10 trials of the double pole-balancing Non-Markovian experiment, each within 100 generations and over a population of 1’000 organisms.

The example output of the command:

```
Average
	Winner Nodes:	5.5
	Winner Genes:	11.5
	Winner Evals:	44049.3
Mean
	Complexity:	13.9
	Diversity:	611.1
	Age:		18.9
```

The maximal generalization score achieved in this test run is about 347 for a very simple genome comprising five nodes and five genes (links). It has the same number of nodes as the seed genome and grew only one extra recurrent gene, connecting the output node to itself. This happened to be the most useful configuration among the alternatives.
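The generalization-score procedure described above (5 values of Ω over 4 state variables, hence 5⁴ = 625 start states, with θ2 and its velocity at zero) can be sketched as follows. The `balances` function is a hypothetical stand-in for a full 1000-step simulation and its criterion is a placeholder for illustration only:

```go
package main

import "fmt"

// Ω values assigned to each of x, ∆x/∆t, θ1, ∆θ1/∆t (scaled to each
// variable's range); θ2 and ∆θ2/∆t start at zero.
var omega = []float64{0.05, 0.25, 0.5, 0.75, 0.95}

// balances is a hypothetical stand-in for running the controller for
// 1000 time steps from the given start state; the criterion below is a
// placeholder, not a real simulation outcome.
func balances(x, xDot, theta1, theta1Dot float64) bool {
	return theta1 < 0.9 && theta1Dot < 0.9
}

// generalizationScore counts successful runs over the 625 start states.
func generalizationScore() int {
	score := 0
	for _, x := range omega {
		for _, xDot := range omega {
			for _, t1 := range omega {
				for _, t1Dot := range omega {
					if balances(x, xDot, t1, t1Dot) {
						score++
					}
				}
			}
		}
	}
	return score // out of 625; a solution needs a score of 200 or more
}

func main() {
	fmt.Println("GS:", generalizationScore())
}
```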
The most fit organism's genome, based on a test non-Markov run with a maximal generalization score of 347:

```
genomestart 46
trait 1 0.2138946217467174 0.06002354203083418 0.0028906105813590443 0.2713154623271218 0.35856547551861573 0.15170654613596346 0.14290235997205494 0.28328098202348406
node 1 1 1 1
node 2 1 1 1
node 3 1 1 1
node 4 1 1 3
node 5 1 0 2
gene 1 1 5 -0.8193796421450023 false 1 -0.8193796421450023 true
gene 1 2 5 -6.591284446514077 false 2 -6.591284446514077 true
gene 1 3 5 5.783520492164443 false 3 5.783520492164443 true
gene 1 4 5 0.6165420336465628 false 4 0.6165420336465628 true
gene 1 5 5 -1.240555929293543 true 50 -1.240555929293543 true
genomeend 46
```

## Conclusion

The experiments described in this work confirm that the implemented NEAT method is able to evolve new structures in ANNs (the XOR experiment) and to solve reinforcement learning tasks under conditions of incomplete knowledge (pole-balancing).

## Credits

* The original C++ NEAT implementation was created by Kenneth Stanley, see: [NEAT][1]
* Other NEAT implementations may be found in the [NEAT Software Catalog][2]

This source code is maintained and managed by [Iaroslav Omelianenko][3]


[1]:http://www.cs.ucf.edu/~kstanley/neat.html
[2]:http://eplex.cs.ucf.edu/neat_software/
[3]:https://io42.space

[single_pole-balancing_scheme]: https://github.com/yaricom/goNEAT/blob/master/contents/single_pole-balancing.jpg "The single pole-balancing experimental setup"
[double_pole-balancing_scheme]: https://github.com/yaricom/goNEAT/blob/master/contents/double_pole-balancing.png "The double pole-balancing experimental setup"