# Simulation

The onet library allows for multiple levels of simulations:

- [Localhost](./platform/LOCALHOST.md):
  - up to 100 nodes
- [Mininet](./platform/MININET.md):
  - up to 300 nodes on a 48-core machine, multiplied by the number of machines
    available
  - define max. bandwidth and delay for your network
- [Deterlab](./platform/DETERLAB.md):
  - up to 1000 nodes on a strong machine, multiplied by the number of machines
    available

Refer to the simulation examples in one of the following places:
- [./manage/simulation](./manage/simulation)
- [./test_simul](./test_simul)
- https://github.com/dedis/cothority_template

## Runfile for simulations

Each simulation can have one or more .toml-files that describe a number of
experiments to be run on localhost, Mininet, or Deterlab.

The .toml-files are split into two parts, separated by an empty line. The first
part consists of one or more 'global' variables that apply to all experiments.

The second part starts with a line naming the variables that have to be defined
for each experiment, followed by one line per experiment.

### Necessary variables

- `Simulation` - what simulation to run
- `Hosts` - how many hosts to instantiate - this corresponds to the nodes
  that will be running and available in the main `Roster`
- `Servers` - the maximum number of servers to use - if fewer than this number
  of servers are available, a warning will be printed, but the simulation will
  still be run

The `Servers` variable mostly influences how the simulation will be run.
Depending on the platform, it is handled differently:
- `localhost` - `Servers` is ignored here
- `Deterlab` - the system will distribute the `Hosts` nodes over the
  available servers, but not over more than `Servers`.
  This allows for running simulations that are smaller than your whole DETERLab
  experiment without having to modify and restart the experiment.
- `Mininet` - as in `Deterlab`, the `Hosts` nodes will be distributed over
  a maximum of `Servers`.

### onet.SimulationBFTree

The standard simulation (and the only one implemented) is the
`SimulationBFTree`, which will prepare the `Roster` and the `Tree` for the
simulation.
Even if you use the `SimulationBFTree`, you're not restricted to using only the
prepared `Tree`.
However, there will not be more nodes available than the ones in the prepared
`Roster`.
Some restrictions apply when you're using the `Deterlab` simulation:
- all nodes on one server (`Hosts` / min(available servers, `Servers`)) are
  run in one binary, which means
  - bandwidth measurements cover all the nodes
  - time measurements need to make sure no other calculations are taking place
- the bandwidth- and delay-restrictions only apply between two physical servers, so
  - the simulation makes sure that all connected nodes in the `Tree` are always
    on different servers. If you use a communication pattern other than the one
    in the `Tree`, the system cannot guarantee that the communication is
    restricted
  - the bandwidth restrictions apply to the sum of all communications between
    two servers, and therefore to a number of hosts.
    If you want to have a bandwidth restriction that is between all nodes, and
    `Hosts > Servers`, you have to use the `Mininet` platform, which doesn't
    have this restriction.

The following variables define how the original `Tree` is calculated - only
one of the two should be given:

- `BF` - branching factor: how many children each node has
- `Depth` - the depth of the tree in levels below the root-node

If there are 13 `Hosts` with a `BF` of 3, the system will create a complete
tree with the root-node having 3 children, and each of the children having 3
more children.
The same setup can be achieved with 13 `Hosts` and a `Depth` of 3.

If the tree to be created is not complete, it will be filled breadth-first and
the children of the last row will be distributed as evenly as possible.

In addition, `Rounds` defines how many rounds the simulation will run.

### Statistics for subset of hosts

Buckets of statistics can be defined using the following variable:

- `Buckets` - index ranges of the buckets

The parameter is a string in which buckets are separated by spaces and the
ranges within a bucket by a dash (e.g. `Buckets = "0:5 5:10-15:20"` creates one
bucket with hosts 0 to 4 and another one with hosts 5 to 9 and 15 to 19). Range
indices behave like Go slices: the lower index is inclusive and the higher is
exclusive.

One file will be written per bucket. The global file containing the statistics
of all the conodes is always written, independently of this parameter. Each
bucket file has the bucket number as suffix.

### Simulations with long setup-times and multiple measurements

By default, all rounds of an individual simulation-run will be averaged and
written to the csv-file. If you set `IndividualStats` to a non-`""`-value,
every round will create a new line. This is useful if you have a simulation
with a long setup-time and you want to do multiple measurements for the same
setup.
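
The `Buckets` syntax described in "Statistics for subset of hosts" can be illustrated with a small parser. This is a minimal sketch, not onet's own parsing code: the names `hostRange`, `parseBuckets` and `bucketsOf` are hypothetical, and numbering the buckets from 0 in order of appearance is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hostRange is a half-open interval [Low, High) of host indices,
// like a Go slice expression.
type hostRange struct{ Low, High int }

// parseBuckets is a hypothetical parser for a `Buckets` string: buckets are
// separated by spaces, ranges inside a bucket by dashes, bounds by a colon.
func parseBuckets(s string) ([][]hostRange, error) {
	var buckets [][]hostRange
	for _, b := range strings.Fields(s) {
		var ranges []hostRange
		for _, r := range strings.Split(b, "-") {
			parts := strings.Split(r, ":")
			if len(parts) != 2 {
				return nil, fmt.Errorf("invalid range %q", r)
			}
			low, err := strconv.Atoi(parts[0])
			if err != nil {
				return nil, fmt.Errorf("invalid lower index in %q: %v", r, err)
			}
			high, err := strconv.Atoi(parts[1])
			if err != nil {
				return nil, fmt.Errorf("invalid upper index in %q: %v", r, err)
			}
			if low < 0 || high < low {
				return nil, fmt.Errorf("invalid bounds in %q", r)
			}
			ranges = append(ranges, hostRange{Low: low, High: high})
		}
		buckets = append(buckets, ranges)
	}
	return buckets, nil
}

// bucketsOf returns the indices of all buckets that contain the given host.
func bucketsOf(buckets [][]hostRange, host int) []int {
	var found []int
	for i, ranges := range buckets {
		for _, r := range ranges {
			if host >= r.Low && host < r.High {
				found = append(found, i)
				break
			}
		}
	}
	return found
}

func main() {
	buckets, err := parseBuckets("0:5 5:10-15:20")
	if err != nil {
		panic(err)
	}
	// Host 12 belongs to no bucket; it only appears in the global statistics.
	for _, host := range []int{4, 5, 12, 19} {
		fmt.Println(host, bucketsOf(buckets, host))
	}
}
```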

### Timeouts

Timeouts are parsed according to Go's `time.Duration`: a duration string
is a possibly signed sequence of decimal numbers, each with optional
fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid
time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

Two timeout variables are available:

- `RunWait` - how long to wait for a run (one line of the .toml-file) to finish
  (default: 180s)
- `ExperimentWait` - how long to wait for the whole experiment to finish
  (default: RunWait \* #Runs)

### PreScript

If you need to run a script before the simulation is started (like installing
a missing library), you can define

- `PreScript` - a shell-script that is run _before_ the simulation is started
  on each machine.
  It receives a single argument: the platform this simulation runs on, one of
  [localhost,mininet,deterlab]

### MiniNet specific

Mininet has support for setting up delays and bandwidth for each simulation.
You can use the following two variables:

- `Delay`[ms] - the delay between two hosts - the round-trip delay will be
  twice this value
- `Bandwidth`[Mbps] - the bandwidth in both sending and receiving direction
  for each host, measured in megabits per second

You can put these variables either globally at the top of the .toml file or
set them up for each line in the experiment (see the examples below).

### Experimental

- `SingleHost` - reduces the tree to use only one host per server, thus
  speeding up connections
- `Tags` - build tags that are used when building the binaries for the
  simulation

### Example

    Simulation = "ExampleHandlers"
    Servers = 16
    BF = 2
    Rounds = 10
    #SingleHost = true

    Hosts
    3
    7
    15
    31

This will run the `ExampleHandlers`-simulation on 16 servers with a branching
factor of 2 and 10 rounds. The `SingleHost`-argument is commented out, so it
will use as many hosts as described.

In the second part, 4 experiments are defined, each only changing the number
of `Hosts`. First 3, then 7, 15, and finally 31 hosts are run on the 16
servers. For each experiment 10 rounds are started.

Assuming the simulation runs on MiniNet, the network delay can be set globally
as follows:

    Simulation = "ExampleHandlers"
    Delay = 100
    Servers = 16
    BF = 2
    Rounds = 10
    #SingleHost = true

    Hosts
    3
    7
    15
    31

Alternatively, it can be set for each individual experiment:

    Simulation = "ExampleHandlers"
    Servers = 16
    BF = 2
    Rounds = 10
    #SingleHost = true

    Hosts,Delay
    3,50
    7,100
    15,200
    31,400

## test_data format

The results of every simulation are written to the `test_data` directory, in a
file named after the simulation file with `.csv` appended.
The configuration of the simulation file is written to the table in the
following columns, which are copied as-is from the simulation file:

- hosts, bf, delay, depth, other, prescript, ratio, rounds, servers, suite

For all the other measurements, the following statistics are available:

- `_avg` - the average
- `_std` - standard-deviation
- `_min` - minimum
- `_max` - maximum
- `_sum` - sum of all calls

### measure.NewTimeMeasure

The following measurements will be taken for `measure.NewTimeMeasure`:
- `_user` - user-space time, crypto and other calculations
- `_system` - system-space time - disk i/o, network i/o
- `_wall` - wall-clock, as described below

The measurements are given in seconds.
There is an important difference between the `_wall` and the `_user`/`_system`
measurements: the `_wall` measurements indicate how much time an external
observer would have measured.
So if the system waits for a reply from the network, this waiting time is
included in the measurement.
In contrast, the `_user`/`_system` measurements indicate how much work has been
done by the CPU during the measurement.
When measuring parallel execution of code, it is possible that the
`_user`/`_system` measurements are bigger than the `_wall` measurements,
because more than one CPU participated in the calculation.
The difference between `_user` and `_system` is explained for example here:
https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1
The `_wall` measurement corresponds to `real` in that discussion.

There are some standard time measurements done by the simulation:
- `ChildrenWait` - how long the system had to wait for all children to be
  available - might show problems in setting up the servers
- `SimulSyncWait` - how long the system had to wait at the end of the
  simulation - might indicate problems in the wrap-up of the simulation

### measure.NewCounterIOMeasure

If you want to measure bandwidth, you can use `measure.NewCounterIOMeasure`.
But you have to be careful to make sure that the system will not include
traffic that is outside of your scope by putting the `.Record()` as close as
possible to the `NewCounterIOMeasure`.
Every `CounterIOMeasure` has the following statistics:

- `_tx` - bytes transmitted
- `_rx` - bytes received
- `_msg_tx` - packets transmitted
- `_msg_rx` - packets received

Plus the standard modifiers (`_avg`, `_std`, ...).

There are two standard measurements done by every simulation:
- `bandwidth` (empty) - all node bandwidth
- `bandwidth_root` - bandwidth of the first node of the roster