# Simulation

The onet library allows for multiple levels of simulations:

-   [Localhost](./platform/LOCALHOST.md):
    -   up to 100 nodes
-   [Mininet](./platform/MININET.md):
    -   up to 300 nodes on a 48-core machine, multiplied by the number of machines
        available
    -   define max. bandwidth and delay for your network
-   [Deterlab](./platform/DETERLAB.md):
    -   up to 1000 nodes on a strong machine, multiplied by the number of machines
        available

Refer to the simulation-examples in one of the following places:
- [./manage/simulation](./manage/simulation)
- [./test_simul](./test_simul)
- https://github.com/dedis/cothority_template

## Runfile for simulations

Each simulation can have one or more .toml-files that describe a number of experiments
to be run on localhost, Mininet, or Deterlab.

The .toml-files are split into two parts, separated by an empty line. The first
part consists of one or more 'global' variables that apply to all experiments.

The second part starts with a header line naming the variables that have to be
defined for each experiment, followed by one line per experiment.

### Necessary variables

-   `Simulation` - what simulation to run
-   `Hosts` - how many hosts to instantiate - this corresponds to the nodes
    that will be running and available in the main `Roster`
-   `Servers` - the maximum number of servers to use - if fewer than this number
    of servers are available, a warning will be printed, but the simulation will
    still be run

`Servers` mostly influences how the simulation is run.
Depending on the platform, it is handled differently:

- `localhost` - `Servers` is ignored here
- `Deterlab` - the system will distribute the `Hosts` nodes over the
  available servers, but not over more than `Servers`.
  This allows for running simulations that are smaller than your whole DETERLab
  experiment without having to modify and restart the experiment.
- `Mininet` - as in `Deterlab`, the `Hosts` nodes will be distributed over
  a maximum of `Servers`.

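The cap on the number of servers can be illustrated with a small Go sketch. The function `hostsPerServer` is hypothetical and not part of onet; the round-robin split and the rule that no server is left empty are assumptions made for this example, and the real platforms may place nodes differently.

```go
package main

import "fmt"

// hostsPerServer spreads hosts over at most min(available, maxServers)
// servers and returns the number of hosts on each server.
// Assumption: hosts are assigned round-robin.
func hostsPerServer(hosts, available, maxServers int) []int {
	servers := available
	if maxServers < servers {
		servers = maxServers // never use more than `Servers`
	}
	if hosts < servers {
		servers = hosts // assumption: empty servers are not used
	}
	if servers <= 0 {
		return nil
	}
	counts := make([]int, servers)
	for h := 0; h < hosts; h++ {
		counts[h%servers]++
	}
	return counts
}

func main() {
	// 31 hosts, 20 machines in the testbed, Servers = 16.
	fmt.Println(hostsPerServer(31, 20, 16))
	// Only 8 machines available: a warning would be printed, the run goes on.
	fmt.Println(hostsPerServer(31, 8, 16))
}
```

With 31 hosts and `Servers = 16`, fifteen servers carry two hosts each and one server carries a single host in this sketch.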
### onet.SimulationBFTree

The standard simulation (and the only one implemented) is the
`SimulationBFTree`, which will prepare the `Roster` and the `Tree` for the
simulation.
Even if you use the `SimulationBFTree`, you're not restricted to using only the
prepared `Tree`.
However, there will not be more nodes available than the ones in the prepared
`Roster`.
Some restrictions apply when you're using the `Deterlab` simulation:

- all nodes on one server (`Hosts` / min(available servers, `Servers`)) are
  run in one binary, which means
  - bandwidth measurements cover all the nodes
  - time measurements need to make sure no other calculations are taking place
- the bandwidth- and delay-restrictions only apply between two physical servers, so
  - the simulation makes sure that all connected nodes in the `Tree` are always
    on different servers. If your nodes communicate along paths other than the
    ones in the `Tree`, the system cannot guarantee that this communication
    is restricted
  - the bandwidth restrictions apply to the sum of all communications between
    two servers, and thus to a number of hosts

If you want a bandwidth restriction that applies between all nodes, and
`Hosts > Servers`, you have to use the `Mininet` platform, which doesn't
have this restriction.

The following variables define how the original `Tree` is calculated - only
one of the two should be given:

-   `BF` - branching factor: how many children each node has
-   `Depth` - the depth of the tree in levels below the root-node

If there are 13 `Hosts` with a `BF` of 3, the system will create a complete
tree with the root-node having 3 children, and each of the children having 3
more children.
The same setup can be achieved with 13 `Hosts` and a `Depth` of 2, as the 9
leaves sit two levels below the root-node.

If the tree to be created is not complete, it will be filled breadth-first and
the children of the last row will be distributed as evenly as possible.

In addition, `Rounds` defines how many rounds the simulation will run.

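The relation between `Hosts`, `BF` and the number of levels can be checked with a few lines of Go. This is only an arithmetic sketch: `completeTreeSize` and `levelSizes` are hypothetical helpers, not onet functions, and depth is counted in levels below the root-node.

```go
package main

import "fmt"

// completeTreeSize returns the number of nodes in a complete tree with the
// given branching factor and depth (levels below the root):
// 1 + bf + bf^2 + ... + bf^depth.
func completeTreeSize(bf, depth int) int {
	total, level := 0, 1
	for d := 0; d <= depth; d++ {
		total += level
		level *= bf
	}
	return total
}

// levelSizes fills a tree breadth-first with the given number of hosts and
// returns how many nodes end up on each level, starting with the root.
func levelSizes(hosts, bf int) []int {
	if bf < 1 {
		return nil
	}
	var sizes []int
	for capacity := 1; hosts > 0; capacity *= bf {
		n := capacity
		if hosts < n {
			n = hosts // last, incomplete row
		}
		sizes = append(sizes, n)
		hosts -= n
	}
	return sizes
}

func main() {
	fmt.Println(completeTreeSize(3, 2)) // 1 + 3 + 9
	fmt.Println(levelSizes(13, 3))      // complete tree
	fmt.Println(levelSizes(10, 3))      // last row only partly filled
}
```

The sketch only counts nodes per level; how the nodes of an incomplete last row are spread over their parents is left to the library.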
### Statistics for subset of hosts

Buckets of statistics can be defined using the following variable:

-   `Buckets` - index ranges of the hosts in each bucket

The parameter is a string where the buckets are separated by spaces and the ranges
within a bucket by dashes (e.g. `Buckets = "0:5 5:10-15:20"` creates one bucket with
hosts 0 to 4 and another one with hosts 5 to 9 and 15 to 19). Range indices work like
Go slices: the lower index is inclusive and the higher one is exclusive.

A file is written per bucket, and the global one containing the statistics of all
the conodes is always written, independently of this parameter. Each file has
the bucket number as suffix.

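To make the `Buckets` syntax concrete, here is a minimal Go sketch that parses such a string. `parseBuckets`, `hostRange` and `inBucket` are hypothetical names for this example and are not taken from onet's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hostRange is a half-open range of host indices: lo inclusive, hi exclusive.
type hostRange struct{ lo, hi int }

// parseBuckets splits the string on spaces into buckets and each bucket on
// dashes into "lo:hi" ranges.
func parseBuckets(s string) ([][]hostRange, error) {
	var buckets [][]hostRange
	for _, bucket := range strings.Fields(s) {
		var ranges []hostRange
		for _, r := range strings.Split(bucket, "-") {
			loStr, hiStr, ok := strings.Cut(r, ":")
			if !ok {
				return nil, fmt.Errorf("range %q: missing ':'", r)
			}
			lo, err := strconv.Atoi(loStr)
			if err != nil {
				return nil, fmt.Errorf("range %q: %v", r, err)
			}
			hi, err := strconv.Atoi(hiStr)
			if err != nil {
				return nil, fmt.Errorf("range %q: %v", r, err)
			}
			if lo < 0 || hi < lo {
				return nil, fmt.Errorf("range %q: invalid bounds", r)
			}
			ranges = append(ranges, hostRange{lo, hi})
		}
		buckets = append(buckets, ranges)
	}
	return buckets, nil
}

// inBucket reports whether the host index falls into one of the ranges.
func inBucket(ranges []hostRange, host int) bool {
	for _, r := range ranges {
		if host >= r.lo && host < r.hi {
			return true
		}
	}
	return false
}

func main() {
	buckets, err := parseBuckets("0:5 5:10-15:20")
	if err != nil {
		panic(err)
	}
	for i, b := range buckets {
		fmt.Println("bucket", i, "ranges", b, "contains host 15:", inBucket(b, 15))
	}
}
```

Host 15 falls into the second bucket only, while hosts 10 to 14 belong to no bucket at all and therefore only show up in the global statistics.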
### Simulations with long setup-times and multiple measurements

By default, all rounds of an individual simulation-run are averaged and
written to the csv-file. If you set `IndividualStats` to a non-empty value
(anything other than `""`), every round creates a new line. This is useful if
you have a simulation with a long setup-time and you want to do multiple
measurements for the same setup.

### Timeouts

Timeouts are parsed according to Go's `time.Duration`: a duration string
is a possibly signed sequence of decimal numbers, each with optional
fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid
time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

Two timeout variables are available:

-   `RunWait` - how long to wait for a run (one line of the .toml-file) to finish
    (default: 180s)
-   `ExperimentWait` - how long to wait for the whole experiment to finish
    (default: RunWait \* #Runs)

### PreScript

If you need to run a script before the simulation is started (like installing
a missing library), you can define

-   `PreScript` - a shell-script that is run _before_ the simulation is started
    on each machine.
    It receives a single argument: the platform the simulation runs on, one of
    `localhost`, `mininet`, or `deterlab`

### Mininet specific

Mininet has support for setting up delays and bandwidth for each simulation.
You can use the following two variables:

-   `Delay`[ms] - the delay between two hosts - the round-trip delay will be
    twice this value
-   `Bandwidth`[Mbps] - the bandwidth in both sending and receiving direction
    for each host, measured in megabits per second

You can put these variables either globally at the top of the .toml file or
set them up for each line in the experiment (see the examples below).

### Experimental

-   `SingleHost` - reduces the tree to use only one host per server, which
    speeds up connections again
-   `Tags` - build-tags that will be used when building the binaries for the
    simulation

### Example

    Simulation = "ExampleHandlers"
    Servers = 16
    BF = 2
    Rounds = 10
    #SingleHost = true

    Hosts
    3
    7
    15
    31

This will run the `ExampleHandlers`-simulation on 16 servers with a branching
factor of 2 and 10 rounds. The `SingleHost`-argument is commented out, so it
will use as many hosts as described.

In the second part, 4 experiments are defined, each only changing the number
of `Hosts`. First 3, then 7, 15, and finally 31 hosts are run on the 16
servers. For each experiment 10 rounds are started.

Assuming the simulation runs on Mininet, the network delay can be set globally
as follows:

    Simulation = "ExampleHandlers"
    Delay = 100
    Servers = 16
    BF = 2
    Rounds = 10
    #SingleHost = true

    Hosts
    3
    7
    15
    31

Alternatively, it can be set for each individual experiment:

    Simulation = "ExampleHandlers"
    Servers = 16
    BF = 2
    Rounds = 10
    #SingleHost = true

    Hosts,Delay
    3,50
    7,100
    15,200
    31,400

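How the two parts of a runfile combine into individual experiments can be sketched in Go. `parseRunfile` is a hypothetical helper written for this README and is not onet's own parser; it assumes that values on an experiment line are separated by commas and that lines starting with `#` are ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseRunfile returns one map per experiment, holding the global variables
// overridden by the values found on that experiment's line.
func parseRunfile(src string) ([]map[string]string, error) {
	globals := map[string]string{}
	var header []string
	var runs []map[string]string
	inGlobals := true

	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "#"):
			// Commented-out variable, e.g. "#SingleHost = true".
		case line == "":
			// The first empty line after the globals starts the second part.
			if len(globals) > 0 {
				inGlobals = false
			}
		case inGlobals:
			key, val, ok := strings.Cut(line, "=")
			if !ok {
				return nil, fmt.Errorf("expected key = value, got %q", line)
			}
			globals[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(val), `"`)
		case header == nil:
			header = strings.Split(line, ",")
		default:
			vals := strings.Split(line, ",")
			if len(vals) != len(header) {
				return nil, fmt.Errorf("line %q does not match header %v", line, header)
			}
			run := map[string]string{}
			for k, v := range globals {
				run[k] = v
			}
			for i, name := range header {
				run[strings.TrimSpace(name)] = strings.TrimSpace(vals[i])
			}
			runs = append(runs, run)
		}
	}
	return runs, sc.Err()
}

const example = `Simulation = "ExampleHandlers"
Servers = 16
BF = 2
Rounds = 10
#SingleHost = true

Hosts,Delay
3,50
7,100
15,200
31,400
`

func main() {
	runs, err := parseRunfile(example)
	if err != nil {
		panic(err)
	}
	for _, r := range runs {
		fmt.Println(r["Simulation"], "hosts:", r["Hosts"], "delay:", r["Delay"])
	}
}
```

Each of the four resulting experiments carries the global `Simulation`, `Servers`, `BF` and `Rounds` values plus its own `Hosts` and `Delay`.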
## test_data format

The results of every simulation are written to the `test_data` directory, using
the name of the simulation file as base, with `.csv` appended.
The configuration of the simulation file is written to the table in the
following columns, which are copied as-is from the simulation file:

- hosts, bf, delay, depth, other, prescript, ratio, rounds, servers, suite

For all the other measurements, the following statistics are available:

- `_avg` - the average
- `_std` - standard-deviation
- `_min` - minimum
- `_max` - maximum
- `_sum` - sum of all calls

### measure.NewTimeMeasure

The following measurements will be taken for `measure.NewTimeMeasure`:

- `_user` - user-space time, crypto and other calculations
- `_system` - system-space time, disk I/O and network I/O
- `_wall` - wall-clock time, as described below

The measurements are given in seconds.
There is an important difference between the `_wall` and the `_user`/`_system`
measurements: the `_wall` measurements indicate how much time an external
observer would have measured.
So if the system waits for a reply from the network, this waiting time is
included in the measurement.
In contrast, the `_user`/`_system` measurements indicate how much work has been
done by the CPU during the measurement.
When measuring parallel execution of code, it is possible that the
`_user`/`_system` measurements are bigger than the `_wall` measurements,
because more than one CPU participated in the calculation.
The difference between `_user` and `_system` is explained for example here:
https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1
The `_wall` corresponds to the `real` in that answer.

There are some standard time measurements done by the simulation:

- `ChildrenWait` - how long the system had to wait for all children to be
  available - might show problems in setting up the servers
- `SimulSyncWait` - how long the system had to wait at the end of the
  simulation - might indicate problems in the wrap-up of the simulation

### measure.NewCounterIOMeasure

If you want to measure bandwidth, you can use `measure.NewCounterIOMeasure`.
Be careful that the measurement does not include traffic that is outside of
your scope: put the `.Record()` as close as possible to the
`NewCounterIOMeasure`.
Every `CounterIOMeasure` has the following statistics:

- `_tx` - bytes transmitted
- `_rx` - bytes received
- `_msg_tx` - packets transmitted
- `_msg_rx` - packets received

Plus the standard modifiers (`_avg`, `_std`, ...).

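Why the distance between creating the measure and calling `.Record()` matters can be shown with a simplified model. The types below are hypothetical and do not reproduce onet's implementation; the sketch assumes that a measure reports the difference between the counter values at creation time and at recording time.

```go
package main

import "fmt"

// counters is a hypothetical stand-in for whatever tracks a node's traffic.
type counters struct{ tx, rx, msgTx, msgRx uint64 }

// ioMeasure remembers the counter values at the moment it is created.
type ioMeasure struct {
	src   *counters
	start counters
}

func newIOMeasure(src *counters) *ioMeasure {
	return &ioMeasure{src: src, start: *src}
}

// record returns the traffic seen since the measure was created.
func (m *ioMeasure) record() counters {
	now := *m.src
	return counters{
		tx:    now.tx - m.start.tx,
		rx:    now.rx - m.start.rx,
		msgTx: now.msgTx - m.start.msgTx,
		msgRx: now.msgRx - m.start.msgRx,
	}
}

func main() {
	node := &counters{}

	// Setup traffic before the measure exists is not counted.
	node.tx += 500
	node.msgTx++

	m := newIOMeasure(node)

	// Traffic of the protocol under test.
	node.tx += 120
	node.msgTx++
	node.rx += 80
	node.msgRx++

	fmt.Printf("%+v\n", m.record())
}
```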
There are two standard measurements done by every simulation:

- `bandwidth` (empty) - bandwidth of all nodes
- `bandwidth_root` - bandwidth of the first node of the roster