[![Build Status](https://travis-ci.org/yaricom/goNEAT.svg?branch=master)](https://travis-ci.org/yaricom/goNEAT) [![GoDoc](https://godoc.org/github.com/yaricom/goNEAT/neat?status.svg)](https://godoc.org/github.com/yaricom/goNEAT/neat)

## Overview
This repository provides an implementation of the [NeuroEvolution of Augmenting Topologies (NEAT)][1] method written in the Go language.

Neuroevolution (NE) is the artificial evolution of Neural Networks (NN) using genetic algorithms to find optimal NN parameters and topology. Neuroevolution of an NN may involve a search for the optimal weights of the connections between NN nodes as well as a search for the optimal topology of the resulting NN. The NEAT method implemented in this work searches for both: the optimal connection weights and the optimal topology for a given task (the number of NN nodes per layer and their interconnections).

#### System Requirements
The source code is written and compiled against Go 1.9.x.

## Installation
Make sure that you have at least a Go 1.8.x environment installed on your system and execute the following command:
```bash
go get github.com/yaricom/goNEAT
```

## Performance Evaluations
The basic performance of the system is evaluated with two kinds of experiments:
1. The XOR experiment, which tests whether topology augmentation actually happens during NEAT algorithm evaluation. To build an XOR-solving network, the NEAT algorithm must grow a new hidden unit in the provided start genome.
2. The pole-balancing experiments, which are classic Reinforcement Learning experiments that allow us to estimate the performance of the NEAT algorithm against results proven by many other algorithms, i.e. we can benchmark NEAT against other algorithms and find out whether it performs better or worse.


### 1. The XOR Experiments
Because XOR is not linearly separable, a neural network requires hidden units to solve it.
The two inputs must be combined at some hidden unit, as opposed to only at the output node, because there is no function over a linear combination of the inputs that can separate the inputs into the proper classes. These structural requirements make XOR suitable for testing NEAT’s ability to evolve structure.

#### 1.1. The XOR Experiment with connected inputs in start genome
In this experiment we use a start (seed) genome with the inputs connected to the output. Thus it mostly checks the ability of NEAT to grow the new hidden unit necessary for solving the XOR problem.

To run this experiment execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/xor -context ./data/xor.neat -genome ./data/xorstartgenes -experiment XOR
```
Where ./data/xor.neat is the configuration of the NEAT execution context and ./data/xorstartgenes is the start genome configuration.

This will execute 100 trials of the XOR experiment, each within 100 generations. As a result of the execution, several 'gen_x' files with snapshots of the population will be stored in the ./out directory every 'print_every' generation, or when a winner solution is found. The same directory will also hold 'xor_winner' with the winner genome and 'xor_optimal' with the optimal XOR solution, if one was found (it has exactly 5 units).

By examining the resulting 'xor_winner' from a series of experiments you will find that at least one hidden unit was grown by NEAT to solve the XOR problem, which is proof that it works as expected.
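To illustrate why a hidden unit is required, here is a minimal Go sketch of a 2-input network with a single hidden unit that solves XOR. The weights are hand-picked for illustration, not evolved by goNEAT, and the `(4 - error)^2` fitness follows the classic NEAT XOR setup; the repository's actual fitness function may differ in detail:

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid activation with the steep slope commonly used in NEAT networks
func sigmoid(x float64) float64 { return 1.0 / (1.0 + math.Exp(-4.9*x)) }

// xorNet is a hand-wired 2-input, 1-hidden-unit, 1-output network solving XOR.
// The hidden unit approximates AND(a, b); the output computes OR(a, b) minus AND(a, b).
func xorNet(a, b float64) float64 {
	h := sigmoid(a + b - 1.5)           // hidden unit: active only when both inputs are on
	return sigmoid(a + b - 2.0*h - 0.5) // output: on for exactly one active input
}

func main() {
	inputs := [][2]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	expected := []float64{0, 1, 1, 0}
	errSum := 0.0
	for i, in := range inputs {
		errSum += math.Abs(expected[i] - xorNet(in[0], in[1]))
	}
	// classic NEAT XOR fitness: (4 - summed error)^2
	fmt.Printf("fitness: %.3f\n", (4.0-errSum)*(4.0-errSum))
}
```

Removing the hidden unit reduces the output to a monotone function of a + b, which cannot separate the XOR classes; this is exactly the structural pressure the experiment exploits.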
The XOR experiment with inputs connected in the start genes almost never fails (verified over at least 100 simulations).

The experiment results will be similar to the following:

```
Average
	Winner Nodes:	5.0
	Winner Genes:	6.0
	Winner Evals:	7753.0
Mean
	Complexity:	10.6
	Diversity:	19.8
	Age:		34.6
```

Where:
- **Winner nodes/genes** is the number of units and of links between them in the produced Neural Network which was able to solve the XOR problem.
- **Winner evals** is the number of evaluations of intermediate organisms/genomes before the winner was found.
- **Mean Complexity** is the average complexity (number of nodes + number of links) of the best organisms per epoch over all epochs.
- **Mean Diversity** is the average diversity (number of species) per epoch over all epochs.
- **Mean Age** is the average age of surviving species per epoch over all epochs.

#### 1.2. The XOR experiment with disconnected inputs in start genome
This experiment uses a start genome with disconnected inputs in order to check the ability of the algorithm not only to grow the needed hidden nodes, but also to build the missing connections between the input nodes and the rest of the network.

To run this experiment execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/xor_disconnected -context ./data/xor.neat -genome ./data/xordisconnectedstartgenes -experiment XOR
```

This will execute 100 trials of the XOR (disconnected) experiment, each within 100 generations. The results of the experiment execution will be saved into the ./out directory as in the previous experiment.

The experiment will sometimes fail to produce an XOR solution within 100 generations, but most of the time a solution will be found. This confirms that the algorithm is able not only to grow the needed hidden units, but also to restore the input connections as needed.
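The Mean statistics above are plain per-epoch averages. A hedged Go sketch of how such a summary could be computed (the `epochStats` record and its sample values are hypothetical, not goNEAT's internal types):

```go
package main

import "fmt"

// epochStats is a hypothetical per-epoch record mirroring the report above.
type epochStats struct {
	bestNodes, bestLinks int // complexity components of the epoch's best organism
	species              int // diversity: number of species in the epoch
	age                  int // age of surviving species
}

// meanStats averages the per-epoch values over all epochs, as the Mean block describes.
func meanStats(epochs []epochStats) (complexity, diversity, age float64) {
	for _, e := range epochs {
		complexity += float64(e.bestNodes + e.bestLinks)
		diversity += float64(e.species)
		age += float64(e.age)
	}
	n := float64(len(epochs))
	return complexity / n, diversity / n, age / n
}

func main() {
	// illustrative numbers only, not taken from a real run
	epochs := []epochStats{{5, 6, 20, 30}, {6, 7, 19, 35}, {5, 6, 21, 39}}
	c, d, a := meanStats(epochs)
	fmt.Printf("Complexity: %.1f Diversity: %.1f Age: %.1f\n", c, d, a)
}
```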
The example output of the command is as follows:
```
Average
	Winner Nodes:	5.7
	Winner Genes:	9.2
	Winner Evals:	9347.7
Mean
	Complexity:	7.8
	Diversity:	20.0
	Age:		46.7
```

### 2. The single pole-balancing experiment
The pole-balancing or inverted pendulum problem has long been established as a standard benchmark for artificial learning systems. It is one of the best early examples of a reinforcement learning task under conditions of incomplete knowledge.

![alt text][single_pole-balancing_scheme]

Figure 1.

##### System Constraints
1. The pole must remain upright within ±r, the pole failure angle.
2. The cart must remain within ±h of the origin.
3. The controller must always exert a non-zero force F.

Where r is the pole failure angle (±12° from 0) and h is the track limit (±2.4 meters from the track centre).

The simulation of the cart ends when either the pole exceeds the failure angle or the cart exceeds the limit of the track. The objective is to devise a controller that can keep the pole balanced for a defined length of simulation time. The controller must always output a force at full magnitude in either direction (bang-bang control).

In this experiment a Genome is considered a winner if it is able to balance the single pole for at least 500’000 time steps (10’000 simulated seconds).

To run this experiment with a population size of 150, execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_150.neat -genome ./data/pole1startgenes -experiment cart_pole
```

This will execute 100 trials of the single pole-balancing experiment, each within 100 generations and over a population of 150 organisms. The results of the experiment execution will be saved in the ./out directory under the specified folder.
To run this experiment with a population size of 1’000, execute the following commands:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole1 -context ./data/pole1_1000.neat -genome ./data/pole1startgenes -experiment cart_pole
```

The example output of the command for a population of 1000 organisms is as follows:
```
Average
	Winner Nodes:	7.0
	Winner Genes:	10.2
	Winner Evals:	1880.0
Mean
	Complexity:	17.1
	Diversity:	25.3
	Age:		2.2
```

This will execute 100 trials of the single pole-balancing experiment, each within 100 generations and over a population of 1’000 organisms.

The results demonstrate that a winning Genome can be found on average within 2 generations in a population of 1’000 organisms (which belong to 17 species on average), and within 30 generations for a population of 150 organisms.

It is interesting to note that for a population of 1000 organisms the winning solution is often found in the null generation, i.e. within the initial random population. Thus the next experiment, with the double pole-balancing setup, seems more interesting for performance testing.

In both single pole-balancing configurations described above the optimal winner organism has 7 nodes and 10 genes correspondingly.

The seven network nodes have the following meaning:
* node #1 is a bias
* nodes #2-5 are sensors receiving the system state: X position, X velocity, pole angle, and pole angular velocity
* nodes #6, 7 are output nodes signaling what action should be applied to the system to balance the pole at each simulation step, i.e. the force direction to be applied. The applied force direction depends on the relative strength of the activations of the two output neurons. If the activation of the first output neuron (node #6) is greater than the activation of the second neuron (node #7), the positive force direction is applied.
Otherwise, the negative force direction is applied.

The TEN genes correspond exactly to the number of links required to connect the FIVE input sensor nodes with the TWO output neuron nodes (5x2).

### 3. The double pole-balancing experiment

This is an advanced version of pole-balancing in which the cart has two poles with different mass and length to be balanced.

![alt text][double_pole-balancing_scheme]

Figure 2.

We will consider two variants of this problem for benchmarking:
* the Markovian, with the full system state known (including velocities);
* the Non-Markovian, without velocity information.

The former is fairly simple while the latter is quite challenging.

##### System Constraints
1. Both poles must remain upright within ±r, the pole failure angle.
2. The cart must remain within ±h of the origin.
3. The controller must always exert a non-zero force F.

Where r is the pole failure angle (±36° from 0) and h is the track limit (±2.4 meters from the track centre).

#### 3.1. The double pole-balancing Markovian experiment (with known velocity)

In this experiment the agent receives at each time step the full system state, including the velocities of the cart and of both poles. The winner solution is determined as the one which is able to perform double pole-balancing for at least 100’000 time steps, or 1’000 simulated seconds.

To run the experiment execute the following command:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole2_markov -context ./data/pole2_markov.neat -genome ./data/pole2_markov_startgenes -experiment cart_2pole_markov
```

This will execute 10 trials of the double pole-balancing experiment, each within 100 generations and over a population of 1’000 organisms.
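The double-pole system constraints above (r = 36°, h = 2.4 m) amount to a simple per-step check; a Go sketch of that check, not the repository's actual code:

```go
package main

import (
	"fmt"
	"math"
)

const (
	failAngle = 36.0 * math.Pi / 180.0 // pole failure angle r, in radians
	trackLim  = 2.4                    // track limit h, in meters
)

// withinLimits reports whether the cart position and both pole angles
// satisfy the double pole-balancing system constraints.
func withinLimits(x, theta1, theta2 float64) bool {
	return math.Abs(x) < trackLim &&
		math.Abs(theta1) < failAngle &&
		math.Abs(theta2) < failAngle
}

func main() {
	fmt.Println(withinLimits(0.0, 0.1, -0.1)) // both poles near upright: true
	fmt.Println(withinLimits(2.5, 0.0, 0.0))  // cart off the track: false
}
```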
The example output of the command:

```
Average
	Winner Nodes:	16.6
	Winner Genes:	35.2
	Winner Evals:	35593.2
Mean
	Complexity:	29.4
	Diversity:	686.9
	Age:		14.7
```

The winner solution can be found within approximately 13 generations, with nearly double the complexity of the resulting genome compared to the seed genome. The seed genome has eight nodes, where nodes #1-6 are sensors for x, x', θ1, θ1', θ2, and θ2' correspondingly, node #7 is a bias, and node #8 is an output signaling what action should be applied at each time step.


#### 3.2. The double pole-balancing Non-Markovian experiment (without velocity information)

In this experiment the agent receives at each time step a partial system state, excluding the velocity information about the cart and both poles. Only the horizontal cart position X and the angles of both poles, θ1 and θ2, are provided to the agent.

The best individual (i.e. the one with the highest fitness value) of every generation is tested for its ability to balance the system for a longer time period. If a potential solution passes this test by keeping the system balanced for 100’000 time steps, the so-called generalization score (GS) of this particular individual is calculated. This score measures the potential of a controller to balance the system starting from different initial conditions. It is calculated with a series of experiments, running over 1000 time steps, starting from 625 different initial conditions.

The initial conditions are chosen by assigning each value of the set Ω = \[0.05, 0.25, 0.5, 0.75, 0.95\] to each of the states x, ∆x/∆t, θ1 and ∆θ1/∆t, scaled to the range of the corresponding variables. The short pole angle θ2 and its angular velocity ∆θ2/∆t are set to zero.
The GS is then defined as the number of successful runs from the 625 initial conditions, and an individual is defined as a solution if it reaches a generalization score of 200 or more.

To run the experiment execute the following command:
```bash
cd $GOPATH/src/github.com/yaricom/goNEAT
go run executor.go -out ./out/pole2_non-markov -context ./data/pole2_non-markov.neat -genome ./data/pole2_non-markov_startgenes -experiment cart_2pole_non-markov
```

This will execute 10 trials of the double pole-balancing Non-Markovian experiment, each within 100 generations and over a population of 1’000 organisms.

The example output of the command:

```
Average
	Winner Nodes:	5.5
	Winner Genes:	11.5
	Winner Evals:	44049.3
Mean
	Complexity:	13.9
	Diversity:	611.1
	Age:		18.9
```

The maximal generalization score achieved in this test run is about 347 for a very simple genome comprising five nodes and five genes (links). It has the same number of nodes as the seed genome and grew only one extra recurrent gene, connecting the output node to itself. This happened to be the most useful configuration among the alternatives.
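The generalization-score procedure described above (5 values of Ω over 4 state variables, hence 5⁴ = 625 start states, with θ2 and its velocity at zero) can be sketched as follows. The `balances` function is a hypothetical stand-in for a full 1000-step simulation and its criterion is a placeholder for illustration only:

```go
package main

import "fmt"

// Ω values assigned to each of x, ∆x/∆t, θ1, ∆θ1/∆t (scaled to each
// variable's range); θ2 and ∆θ2/∆t start at zero.
var omega = []float64{0.05, 0.25, 0.5, 0.75, 0.95}

// balances is a hypothetical stand-in for running the controller for
// 1000 time steps from the given start state; the criterion below is a
// placeholder, not a real simulation outcome.
func balances(x, xDot, theta1, theta1Dot float64) bool {
	return theta1 < 0.9 && theta1Dot < 0.9
}

// generalizationScore counts successful runs over the 625 start states.
func generalizationScore() int {
	score := 0
	for _, x := range omega {
		for _, xDot := range omega {
			for _, t1 := range omega {
				for _, t1Dot := range omega {
					if balances(x, xDot, t1, t1Dot) {
						score++
					}
				}
			}
		}
	}
	return score // out of 625; a solution needs a score of 200 or more
}

func main() {
	fmt.Println("GS:", generalizationScore())
}
```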
The most fit organism's genome, based on a test non-Markov run with a maximal generalization score of 347:

```
genomestart 46
trait 1 0.2138946217467174 0.06002354203083418 0.0028906105813590443 0.2713154623271218 0.35856547551861573 0.15170654613596346 0.14290235997205494 0.28328098202348406
node 1 1 1 1
node 2 1 1 1
node 3 1 1 1
node 4 1 1 3
node 5 1 0 2
gene 1 1 5 -0.8193796421450023 false 1 -0.8193796421450023 true
gene 1 2 5 -6.591284446514077 false 2 -6.591284446514077 true
gene 1 3 5 5.783520492164443 false 3 5.783520492164443 true
gene 1 4 5 0.6165420336465628 false 4 0.6165420336465628 true
gene 1 5 5 -1.240555929293543 true 50 -1.240555929293543 true
genomeend 46
```

## Conclusion

The experiments described in this work confirm that the implemented NEAT method is able to evolve new structures in ANNs (the XOR experiment) and to solve reinforcement learning tasks under conditions of incomplete knowledge (pole-balancing).

## Credits

* The original C++ NEAT implementation was created by Kenneth Stanley, see: [NEAT][1]
* Other NEAT implementations may be found in the [NEAT Software Catalog][2]

This source code is maintained and managed by [Iaroslav Omelianenko][3]


[1]:http://www.cs.ucf.edu/~kstanley/neat.html
[2]:http://eplex.cs.ucf.edu/neat_software/
[3]:https://io42.space

[single_pole-balancing_scheme]: https://github.com/yaricom/goNEAT/blob/master/contents/single_pole-balancing.jpg "The single pole-balancing experimental setup"
[double_pole-balancing_scheme]: https://github.com/yaricom/goNEAT/blob/master/contents/double_pole-balancing.png "The double pole-balancing experimental setup"