github.com/alloyci/alloy-runner@v1.0.1-0.20180222164613-925503ccafd6/docs/configuration/autoscale.md

# Runners autoscale configuration

Autoscale provides the ability to utilize resources in a more elastic and
dynamic way.

Because Runners can autoscale, your infrastructure contains only as many build
instances as necessary at any time. If you configure the Runner to only use
autoscale, the system on which the Runner is installed acts as a bastion for
all the machines it creates.

## Overview

When this feature is enabled and configured properly, builds are executed on
machines created _on demand_. After a build finishes, its machine can wait to
run the next builds, or it can be removed after the configured `IdleTime`.
With many cloud providers, this helps you make the most of instances you are
already paying for.

## System requirements

To use the autoscale feature, the system that hosts the Runner must have:

- The AlloyCI Runner executable. The installation guide can be found in the
  [AlloyCI Runner Documentation][runner-installation].
- The Docker Machine executable. The installation guide can be found in the
  [Docker Machine documentation][docker-machine-installation].

If you need to use a virtualization/cloud provider that isn't handled by
Docker Machine's built-in drivers, the appropriate driver plugin must be
installed. Installing and configuring Docker Machine driver plugins is outside
the scope of this documentation. For more details, read the
[Docker Machine documentation][docker-machine-docs].

## Runner configuration

This section describes only the parameters that are significant for the
autoscale feature. For more configuration details, read
[AlloyCI Runner - Installation][runner-installation]
and [AlloyCI Runner - Advanced Configuration][runner-configuration].

### Runner global options

| Parameter    | Value   | Description |
|--------------|---------|-------------|
| `concurrent` | integer | Limits how many jobs can run concurrently across _all_ defined runners, local and autoscale. This is the overall upper limit. Together with `limit` (from the [`[[runners]]` section](#runners-options)) and `IdleCount` (from the [`[runners.machine]` section](advanced-configuration.md#the-runnersmachine-section)), it affects the upper limit of created machines. |

### `[[runners]]` options

| Parameter  | Value   | Description |
|------------|---------|-------------|
| `executor` | string  | To use the autoscale feature, `executor` must be set to `docker+machine` or `docker-ssh+machine`. |
| `limit`    | integer | Limits how many jobs can be handled concurrently by this specific token. `0` means no limit. For autoscale, it is the upper limit of machines created by this provider (in conjunction with `concurrent` and `IdleCount`). |

### `[runners.machine]` options

Details of the configuration parameters can be found
in [AlloyCI Runner - Advanced Configuration - The runners.machine section](advanced-configuration.md#the-runnersmachine-section).

### `[runners.cache]` options

Details of the configuration parameters can be found
in [AlloyCI Runner - Advanced Configuration - The runners.cache section](advanced-configuration.md#the-runnerscache-section).

### Additional configuration information

There is also a special mode, enabled by setting `IdleCount = 0`. In this mode,
machines are **always** created **on-demand** before each build (if there is no
available machine in _Idle_ state). After the build is finished, the autoscaling
algorithm works
[the same as it is described below](#autoscaling-algorithm-and-parameters).
The machine waits for the next builds, and if none is executed within the
`IdleTime` period, the machine is removed. If there are no builds, there are no
machines in _Idle_ state.

## Autoscaling algorithm and parameters

The autoscaling algorithm is based on three main parameters: `IdleCount`,
`IdleTime` and `limit`.

Every machine that is not running a build is said to be in _Idle_ state. When
AlloyCI Runner is in autoscale mode, it monitors all machines and ensures that
there are always `IdleCount` machines in _Idle_ state.

At the same time, AlloyCI Runner checks how long each machine has been in
_Idle_ state. If that time exceeds the `IdleTime` value, the machine is
automatically removed.
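
The two rules above can be sketched in a few lines of Python. This is only an illustration of the description in this section, not the Runner's actual scheduler code; the function names `machines_to_create` and `idle_time_exceeded` are hypothetical, and treating `limit = 0` as "no limit" follows the `[[runners]]` options table.

```python
def machines_to_create(idle: int, total: int, idle_count: int, limit: int) -> int:
    """How many machines to start so that `idle_count` machines are Idle.

    `idle` and `total` are the current numbers of Idle and of all created
    machines. `limit = 0` is treated as "no limit" (an assumption taken
    from the `[[runners]]` options table).
    """
    missing = max(idle_count - idle, 0)
    if limit > 0:
        # Never plan more machines than `limit` still allows.
        missing = min(missing, max(limit - total, 0))
    return missing


def idle_time_exceeded(idle_seconds: float, idle_time: int) -> bool:
    """True when a machine has been Idle for longer than `IdleTime` seconds."""
    return idle_seconds > idle_time


print(machines_to_create(idle=0, total=2, idle_count=2, limit=10))  # 2
print(idle_time_exceeded(idle_seconds=1900, idle_time=1800))        # True
```

With `idle_count=0` the first function always returns 0, which matches the on-demand mode: no machine is created ahead of time.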

---

**Example:**
Let's suppose that we have configured AlloyCI Runner with the following
autoscale parameters:

```toml
[[runners]]
  limit = 10
  (...)
  executor = "docker+machine"
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    (...)
```

At the beginning, when no builds are queued, AlloyCI Runner starts two machines
(`IdleCount = 2`) and sets them in _Idle_ state. Notice that we have also set
`IdleTime` to 30 minutes (`IdleTime = 1800`).

Now, let's assume that 5 builds are queued in AlloyCI. The first 2 builds are
sent to the two _Idle_ machines. AlloyCI Runner now notices that the number of
_Idle_ machines is less than `IdleCount` (`0 < 2`), so it starts 2 new
machines. Then, the next 2 builds from the queue are sent to those newly created
machines. Again, the number of _Idle_ machines is less than `IdleCount`, so
AlloyCI Runner starts 2 new machines and the last queued build is sent to one of
the _Idle_ machines.

We now have 1 _Idle_ machine, so AlloyCI Runner starts 1 more machine to
satisfy `IdleCount`. Because there are no new builds in the queue, those two
machines stay in _Idle_ state and AlloyCI Runner is satisfied.

---

**This is what happened:**
We had 2 machines waiting in _Idle_ state for new builds. After the 5 builds
were queued, new machines were created, so in total we had 7 machines. Five of
them were running builds, and 2 were in _Idle_ state, waiting for the next
builds.

The algorithm keeps working in the same way: AlloyCI Runner creates a new
_Idle_ machine for each machine used for build execution until `IdleCount`
is satisfied. Machines are created up to the number defined by the `limit`
parameter. When AlloyCI Runner notices that the total number of created
machines has reached `limit`, it stops autoscaling, and new builds have to
wait in the build queue until machines start returning to _Idle_ state.
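
The walkthrough can be replayed with a small simulation. This is a simplified sketch under stated assumptions: machines start instantly, no build finishes while the queue is being drained, and `IdleCount` is greater than 0. The function name `drain_queue` is hypothetical and is not part of AlloyCI Runner.

```python
from typing import Tuple


def drain_queue(queued: int, idle_count: int, limit: int) -> Tuple[int, int]:
    """Return (busy, idle) machine counts once the queue is drained or the
    machine `limit` is reached (`limit = 0` means unlimited)."""
    busy = 0
    idle = min(idle_count, limit) if limit > 0 else idle_count
    while queued > 0 and idle > 0:
        # Hand queued builds to the machines that are currently Idle.
        started = min(queued, idle)
        queued -= started
        busy += started
        idle -= started
        # Top up the Idle pool without exceeding `limit` machines in total.
        wanted = idle_count - idle
        if limit > 0:
            wanted = min(wanted, limit - busy - idle)
        idle += max(wanted, 0)
    return busy, idle


print(drain_queue(queued=5, idle_count=2, limit=10))   # (5, 2): 7 machines in total
print(drain_queue(queued=20, idle_count=2, limit=10))  # (10, 0): limit reached
```

In the second call the remaining 10 builds stay queued until a machine returns to _Idle_ state, as described above.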

In the above example we will always have two idle machines. The `IdleTime`
applies only when we are over the `IdleCount`; then we try to reduce the number
of machines to `IdleCount`.

---

**Scaling down:**
After the build is finished, the machine is set to _Idle_ state and waits
for the next builds to be executed. Let's suppose that we have no new builds in
the queue. After the time designated by `IdleTime` passes, the _Idle_ machines
will be removed. In our example, after 30 minutes, all machines will be removed
(each machine 30 minutes after its last build execution ended) and AlloyCI
Runner will start to keep an `IdleCount` of _Idle_ machines running, just like
at the beginning of the example.
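
One way to read the scaling-down rule, combined with the earlier note that `IdleTime` applies only when the number of _Idle_ machines is over `IdleCount`, is sketched below. It is an interpretation of the text, not the Runner's implementation, and the function name `machines_to_remove` is hypothetical.

```python
from typing import List


def machines_to_remove(idle_seconds: List[float], idle_count: int, idle_time: int) -> int:
    """Count the Idle machines that can be removed.

    `idle_seconds` holds, for every Idle machine, how long it has been Idle.
    A machine is a removal candidate once it has exceeded `idle_time`, but
    the pool is never reduced below `idle_count` Idle machines.
    """
    expired = sum(1 for seconds in idle_seconds if seconds > idle_time)
    surplus = max(len(idle_seconds) - idle_count, 0)
    return min(expired, surplus)


# Seven Idle machines, three of them past the 30 minute IdleTime.
print(machines_to_remove([1900, 1900, 1900, 100, 50, 10, 5], idle_count=2, idle_time=1800))  # 3
```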

---

So, to sum up:

1. We start the Runner.
2. The Runner creates 2 idle machines.
3. The Runner picks one build.
4. The Runner creates one more machine to fulfill the strong requirement of
   always having two idle machines.
5. The build finishes, and we have 3 idle machines.
6. When one of the three idle machines has been idle for longer than `IdleTime`
   since it last picked a build, it is removed.
7. The Runner always has at least 2 idle machines waiting, so that builds can
   be picked up quickly.

Below you can see a comparison chart of build statuses and machine statuses
over time:

![Autoscale state chart](img/autoscale-state-chart.png)

## How `concurrent`, `limit` and `IdleCount` generate the upper limit of running machines

There is no magic equation that tells you what to set `limit` or
`concurrent` to. Act according to your needs. Keeping `IdleCount` machines in
_Idle_ state is a speedup feature: you don't need to wait 10s/20s/30s for an
instance to be created. But as a user, you'd want all your machines (for which
you need to pay) to be running builds, not staying in _Idle_ state. So you
should set `concurrent` and `limit` to values that will run the maximum number
of machines you are willing to pay for. As for `IdleCount`, it should be set to
a value that keeps the number of _unused_ machines to a minimum when the build
queue is empty.

Let's assume the following example:

```toml
concurrent = 20

[[runners]]
  limit = 40
  [runners.machine]
    IdleCount = 10
```

In the above scenario, the total number of machines we could have is 30. The
`limit` allows up to 40 machines in total (building and idle). We can have 10
idle machines, but `concurrent` allows only 20 builds at a time. So in total we
can have 20 machines running builds concurrently and 10 idle, summing up to 30.

But what happens if the `limit` is less than the total number of machines that
could be created? The example below explains that case:

```toml
concurrent = 20

[[runners]]
  limit = 25
  [runners.machine]
    IdleCount = 10
```

In this example we will have at most 20 concurrent builds and at most 25
machines created. In the worst case scenario regarding idle machines, we will
not be able to have 10 idle machines, but only 5, because the `limit` is 25.
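
The arithmetic of both examples can be written down as a short sketch. It assumes a single `[[runners]]` entry and treats `limit = 0` as "no limit"; the function names are hypothetical helpers, not Runner settings.

```python
def max_building(concurrent: int, limit: int) -> int:
    """Most machines that can run builds at the same time."""
    return min(concurrent, limit) if limit > 0 else concurrent


def max_machines(concurrent: int, limit: int, idle_count: int) -> int:
    """Upper bound on created machines (building plus Idle)."""
    bound = max_building(concurrent, limit) + idle_count
    return min(bound, limit) if limit > 0 else bound


def idle_at_full_load(concurrent: int, limit: int, idle_count: int) -> int:
    """Idle machines that can still exist while every build slot is in use."""
    return max_machines(concurrent, limit, idle_count) - max_building(concurrent, limit)


print(max_machines(concurrent=20, limit=40, idle_count=10))       # 30
print(max_machines(concurrent=20, limit=25, idle_count=10))       # 25
print(idle_at_full_load(concurrent=20, limit=25, idle_count=10))  # 5
```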

## Off Peak time mode configuration

> Introduced in AlloyCI Runner v1.7

Autoscale can be configured with support for _Off Peak_ time mode periods.

**What is an _Off Peak_ time mode period?**

Some organizations have regular time periods when no work is done.
For example, most commercial companies work from Monday to
Friday during fixed hours, e.g. from 10am to 6pm. During the rest of the week -
from Monday to Friday at 12am-9am and 6pm-11pm, and all of Saturday and Sunday -
no one is working. We call these time periods _Off Peak_.

Organizations where _Off Peak_ time periods occur probably don't want
to pay for _Idle_ machines when it's certain that no builds will be
executed during that time, especially when `IdleCount` is set to a big number.

In version `v1.7` of the Runner we added support for _Off Peak_
configuration. With the parameters described in the configuration file you can
now change the `IdleCount` and `IdleTime` values for the _Off Peak_ time mode
periods.

**How does it work?**

_Off Peak_ is configured with four parameters: `OffPeakPeriods`,
`OffPeakTimezone`, `OffPeakIdleCount` and `OffPeakIdleTime`. The
`OffPeakPeriods` setting contains an array of cron-style patterns defining
when the _Off Peak_ time mode should be turned on. For example:

```toml
[runners.machine]
  OffPeakPeriods = [
    "* * 0-9,18-23 * * mon-fri *",
    "* * * * * sat,sun *"
  ]
```

will enable the _Off Peak_ periods described above: working days
from 12am to 9am and from 6pm to 11pm (hour values `0-9` and `18-23`), and the
whole of Saturday and Sunday. The machines scheduler checks all patterns from
the array, and if at least one of them describes the current time, the
_Off Peak_ time mode is enabled.

You can specify the `OffPeakTimezone`, e.g. `"Australia/Sydney"`. If you don't,
the system setting of the host machine of every runner will be used. This
default can be stated explicitly as `OffPeakTimezone = "Local"` if you wish.

When the _Off Peak_ time mode is enabled, the machines scheduler uses the
`OffPeakIdleCount` setting instead of `IdleCount` and the `OffPeakIdleTime`
setting instead of `IdleTime`. The autoscaling algorithm is not changed,
only the parameters. When the machines scheduler discovers that none of
the `OffPeakPeriods` patterns is fulfilled, it switches back to the
`IdleCount` and `IdleTime` settings.
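
The switch between the two parameter sets can be illustrated with a simplified matcher. This sketch is not the Runner's cron parser: it assumes the seven fields are second, minute, hour, day of month, month, day of week and year (consistent with the patterns above), it supports only `*`, single values, ranges and comma lists, and it expects `now` to be expressed already in the `OffPeakTimezone`. All function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Assumed numbering for day names in the day-of-week field (Sunday = 0).
DAY_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def field_matches(field: str, value: int, names: Optional[Dict[str, int]] = None) -> bool:
    """Match one field built from `*`, single values, ranges and comma lists."""
    names = names or {}

    def number(token: str) -> int:
        token = token.lower()
        return names[token] if token in names else int(token)

    for part in field.split(","):
        if part == "*":
            return True
        low, _, high = part.partition("-")
        if number(low) <= value <= number(high or low):
            return True
    return False


def in_off_peak(patterns: List[str], now: datetime) -> bool:
    """True if at least one pattern describes `now`."""
    weekday = (now.weekday() + 1) % 7  # datetime uses Monday=0, DAY_NAMES uses Sunday=0
    values = [now.second, now.minute, now.hour, now.day, now.month, weekday, now.year]
    for pattern in patterns:
        fields = pattern.split()
        if len(fields) != len(values):
            raise ValueError(f"expected 7 fields, got {len(fields)}: {pattern!r}")
        if all(
            field_matches(field, value, DAY_NAMES if index == 5 else None)
            for index, (field, value) in enumerate(zip(fields, values))
        ):
            return True
    return False


def idle_settings(machine: dict, now: datetime) -> Tuple[int, int]:
    """Return the (idle count, idle time) pair that applies at `now`."""
    if in_off_peak(machine.get("OffPeakPeriods", []), now):
        return machine["OffPeakIdleCount"], machine["OffPeakIdleTime"]
    return machine["IdleCount"], machine["IdleTime"]


machine = {
    "OffPeakPeriods": ["* * 0-9,18-23 * * mon-fri *", "* * * * * sat,sun *"],
    "OffPeakIdleCount": 1, "OffPeakIdleTime": 1200,
    "IdleCount": 5, "IdleTime": 600,
}
print(idle_settings(machine, datetime(2018, 2, 22, 12, 0)))  # Thursday noon: (5, 600)
print(idle_settings(machine, datetime(2018, 2, 24, 12, 0)))  # Saturday noon: (1, 1200)
```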

More information about the syntax of `OffPeakPeriods` patterns can be found
in [AlloyCI Runner - Advanced Configuration - The runners.machine section](advanced-configuration.md#the-runnersmachine-section).

## Distributed runners caching

To speed up your builds, AlloyCI Runner provides a [cache mechanism][cache]
where selected directories and/or files are saved and shared between subsequent
builds.

This works fine when builds are run on the same host, but when you start
using the Runners autoscale feature, most of your builds will be running on a
new (or almost new) host, which will execute each build in a new Docker
container. In that case, you will not be able to take advantage of the cache
feature.

To overcome this issue, the distributed Runners cache feature was introduced
together with the autoscale feature.

It uses any S3-compatible server to share the cache between the Docker hosts in
use. When restoring and archiving the cache, AlloyCI Runner queries the S3
server and downloads or uploads the archive.

To enable distributed caching, you have to define it in `config.toml` using the
[`[runners.cache]` directive][runners-cache]:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    ServerAddress = "s3.example.com"
    AccessKey = "access-key"
    SecretKey = "secret-key"
    BucketName = "runner"
    Insecure = false
    Path = "path/to/prefix"
    Shared = false
```

The S3 URLs follow the structure `http(s)://<ServerAddress>/<BucketName>/<Path>/runner/<runner-id>/project/<id>/<cache-key>`.

To share the cache between two or more runners, set the `Shared` flag to `true`. That removes the runner token (`runner/<runner-id>`) from the S3 URL, and all configured runners share the same cache. Remember that you can also set `Path` to separate caches between runners when cache sharing is enabled.
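
To make the URL structure concrete, the sketch below assembles a cache URL from the settings shown above. It only mirrors the documented pattern and is not the Runner's code. The function name `cache_url`, the sample runner id, project id and cache key are hypothetical, and mapping `Insecure = true` to plain `http` is an assumption.

```python
def cache_url(server_address: str, bucket_name: str, path: str,
              runner_id: str, project_id: int, cache_key: str,
              insecure: bool = False, shared: bool = False) -> str:
    """Build a cache URL following the documented structure."""
    scheme = "http" if insecure else "https"  # assumption: Insecure selects http
    segments = [bucket_name]
    if path:
        segments.append(path.strip("/"))
    if not shared:
        # With Shared = true the runner-specific part is left out.
        segments += ["runner", runner_id]
    segments += ["project", str(project_id), cache_key]
    return f"{scheme}://{server_address}/" + "/".join(segments)


print(cache_url("s3.example.com", "runner", "path/to/prefix", "abc123", 42, "default"))
# https://s3.example.com/runner/path/to/prefix/runner/abc123/project/42/default
print(cache_url("s3.example.com", "runner", "path/to/prefix", "abc123", 42, "default", shared=True))
# https://s3.example.com/runner/path/to/prefix/project/42/default
```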

Read how to [install your own caching server][caching].

## Distributed Docker registry mirroring

To speed up builds executed inside of Docker containers, you can use the [Docker
registry mirroring service][registry]. This provides a proxy between your
Docker machines and all used registries. Images are downloaded once by the
registry mirror. On each new host, or on an existing host where the image is
not available, the image is downloaded from the configured registry mirror.

Provided that the mirror exists in your Docker machines' LAN, the image
downloading step should be much faster on each host.

To configure the Docker registry mirroring, you have to add `MachineOptions` to
the configuration in `config.toml`:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.machine]
    (...)
    MachineOptions = [
      (...)
      "engine-registry-mirror=http://10.11.12.13:12345"
    ]
```

Where `10.11.12.13:12345` is the IP address and port where your registry mirror
is listening for connections from the Docker service. It must be accessible to
each host created by Docker Machine.

Read how to [install your own Docker registry server][registry-server].

## A complete example of `config.toml`

The `config.toml` below uses the `digitalocean` Docker Machine driver:

```toml
concurrent = 50   # All registered Runners can run up to 50 concurrent builds

[[runners]]
  url = "https://alloy-ci.com"
  token = "RUNNER_TOKEN"             # Note this is different from the registration token used by `alloy-runner register`
  name = "autoscale-runner"
  executor = "docker+machine"        # This Runner is using the 'docker+machine' executor
  limit = 10                         # This Runner can execute up to 10 builds (created machines)
  [runners.docker]
    image = "ruby:2.1"               # The default image used for builds is 'ruby:2.1'
  [runners.machine]
    OffPeakPeriods = [               # Set the Off Peak time mode on for:
      "* * 0-9,18-23 * * mon-fri *", # - Monday to Friday from 12am to 9am and from 6pm to 11pm
      "* * * * * sat,sun *"          # - the whole of Saturday and Sunday
    ]
    OffPeakIdleCount = 1             # There must be 1 machine in Idle state - when Off Peak time mode is on
    OffPeakIdleTime = 1200           # Each machine can be in Idle state up to 1200 seconds (after this it will be removed) - when Off Peak time mode is on
    IdleCount = 5                    # There must be 5 machines in Idle state - when Off Peak time mode is off
    IdleTime = 600                   # Each machine can be in Idle state up to 600 seconds (after this it will be removed) - when Off Peak time mode is off
    MaxBuilds = 100                  # Each machine can handle up to 100 builds in a row (after this it will be removed)
    MachineName = "auto-scale-%s"    # Each machine will have a unique name ('%s' is required)
    MachineDriver = "digitalocean"   # Docker Machine is using the 'digitalocean' driver
    MachineOptions = [
        "digitalocean-image=coreos-stable",
        "digitalocean-ssh-user=core",
        "digitalocean-access-token=DO_ACCESS_TOKEN",
        "digitalocean-region=nyc2",
        "digitalocean-size=4gb",
        "digitalocean-private-networking",
        "engine-registry-mirror=http://10.11.12.13:12345"   # Docker Machine is using registry mirroring
    ]
  [runners.cache]
    Type = "s3"   # The Runner is using a distributed cache with Amazon S3 service
    ServerAddress = "s3-eu-west-1.amazonaws.com"
    AccessKey = "AMAZON_S3_ACCESS_KEY"
    SecretKey = "AMAZON_S3_SECRET_KEY"
    BucketName = "runners"
    Insecure = false
```

Note that the `MachineOptions` parameter contains options for the `digitalocean`
driver, which is used by Docker Machine to spawn machines hosted on Digital Ocean,
and one option for Docker Machine itself (`engine-registry-mirror`).

## What are the supported cloud providers

The autoscale mechanism is currently based on Docker Machine. Advanced
configuration options, including virtualization/cloud provider parameters, are
available in the [Docker Machine documentation][docker-machine-driver].

[cache]: https://github.com/AlloyCI/alloy_ci/tree/master/doc/json/README.md#cache
[runner-installation]: ../install/autoscaling.md
[runner-configuration]: README.md
[docker-machine-docs]: https://docs.docker.com/machine/
[docker-machine-driver]: https://docs.docker.com/machine/drivers/
[docker-machine-installation]: https://docs.docker.com/machine/install-machine/
[runners-cache]: advanced-configuration.md#the-runnerscache-section
[registry]: https://docs.docker.com/docker-trusted-registry/overview/
[caching]: ../install/autoscaling.md#install-the-cache-server
[registry-server]: ../install/autoscaling.md#install-docker-registry