github.com/secure-build/gitlab-runner@v12.5.0+incompatible/docs/configuration/autoscale.md (about)

     1  # Runners autoscale configuration
     2  
     3  > The autoscale feature was introduced in GitLab Runner 1.1.0.
     4  
     5  Autoscale provides the ability to utilize resources in a more elastic and
     6  dynamic way.
     7  
     8  Thanks to Runners being able to autoscale, your infrastructure contains only as
     9  many build instances as necessary at any time. If you configure the Runner to
    10  only use autoscale, the system on which the Runner is installed acts as a
    11  bastion for all the machines it creates.
    12  
    13  ## Overview
    14  
    15  When this feature is enabled and configured properly, jobs are executed on
    16  machines created _on demand_. Those machines, after the job is finished, can
    17  wait to run the next jobs or can be removed after the configured `IdleTime`.
    18  With many cloud providers, this helps you get the most out of instances you are
    19  already paying for.
    20  
    21  Below, you can see a real life example of the runners autoscale feature, tested
    22  on GitLab.com for the [GitLab Community Edition][ce] project:
    23  
    24  ![Real life example of autoscaling](img/autoscale-example.png)
    25  
    26  Each machine on the chart is an independent cloud instance, running jobs
    27  inside of Docker containers.
    28  
    29  [ce]: https://gitlab.com/gitlab-org/gitlab-ce
    30  
    31  ## System requirements
    32  
    33  Before configuring autoscale, you must:
    34  
    35  - [Prepare your own environment](../executors/docker_machine.md#preparing-the-environment).
    36  - Optionally use a [forked version](../executors/docker_machine.md#forked-version-of-docker-machine) of Docker Machine supplied by GitLab, which has some additional fixes.
    37  
    38  ## Supported cloud providers
    39  
    40  The autoscale mechanism is based on [Docker Machine](https://docs.docker.com/machine/overview/).
    41  All supported virtualization and cloud provider parameters are available in the
    42  [Docker Machine drivers documentation](https://docs.docker.com/machine/drivers/).
    43  
    44  ## Runner configuration
    45  
    46  This section describes only the parameters that are significant from the
    47  autoscale feature's point of view. For more configuration details, read the
    48  [advanced configuration](advanced-configuration.md).
    49  
    50  ### Runner global options
    51  
    52  | Parameter    | Value   | Description |
    53  |--------------|---------|-------------|
    54  | `concurrent` | integer | Limits how many jobs can run concurrently, globally. This is the upper limit on the number of jobs across _all_ defined runners, local and autoscale. Together with `limit` (from the [`[[runners]]` section](#runners-options)) and `IdleCount` (from the [`[runners.machine]` section][runners-machine]), it affects the upper limit of created machines. |
    55  
    56  ### `[[runners]]` options
    57  
    58  | Parameter  | Value            | Description |
    59  |------------|------------------|-------------|
    60  | `executor` | string           | To use the autoscale feature, `executor` must be set to `docker+machine` or `docker-ssh+machine`. |
    61  | `limit`    | integer          | Limits how many jobs can be handled concurrently by this specific token. `0` means no limit. For autoscale, it's the upper limit of machines created by this provider (in conjunction with `concurrent` and `IdleCount`). |
    62  
    63  ### `[runners.machine]` options
    64  
    65  Details of the configuration parameters can be found
    66  in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].
    67  
    68  ### `[runners.cache]` options
    69  
    70  Details of the configuration parameters can be found
    71  in [GitLab Runner - Advanced Configuration - The `[runners.cache]` section][runners-cache].
    72  
    73  ### Additional configuration information
    74  
    75  There is also a special mode, when you set `IdleCount = 0`. In this mode,
    76  machines are **always** created **on-demand** before each job (if there is no
    77  available machine in _Idle_ state). After the job is finished, the autoscaling
    78  algorithm works
    79  [the same as it is described below](#autoscaling-algorithm-and-parameters).
    80  The machine waits for the next job, and if none is executed within
    81  the `IdleTime` period, the machine is removed. If there are no jobs, there
    82  are no machines in _Idle_ state.
    83  
    84  ## Autoscaling algorithm and parameters
    85  
    86  The autoscaling algorithm is based on three main parameters: `IdleCount`,
    87  `IdleTime` and `limit`.
    88  
    89  We say that each machine that does not run a job is in _Idle_ state. When
    90  GitLab Runner is in autoscale mode, it monitors all machines and ensures that
    91  there is always an `IdleCount` of machines in _Idle_ state.
    92  
    93  At the same time, GitLab Runner is checking the duration of the _Idle_ state of
    94  each machine. If the time exceeds the `IdleTime` value, the machine is
    95  automatically removed.
    96  
    97  ---
    98  
    99  **Example:**
   100  Let's suppose that we have configured GitLab Runner with the following
   101  autoscale parameters:
   102  
   103  ```toml
   104  [[runners]]
   105    limit = 10
   106    (...)
   107    executor = "docker+machine"
   108    [runners.machine]
   109      IdleCount = 2
   110      IdleTime = 1800
   111      (...)
   112  ```
   113  
   114  At the beginning, when no jobs are queued, GitLab Runner starts two machines
   115  (`IdleCount = 2`), and sets them in _Idle_ state. Notice that we have also set
   116  `IdleTime` to 30 minutes (`IdleTime = 1800`).
   117  
   118  Now, let's assume that 5 jobs are queued in GitLab CI. The first 2 jobs are
   119  sent to the 2 _Idle_ machines. GitLab Runner now notices that the number of
   120  _Idle_ machines is less than `IdleCount` (`0 < 2`), so it starts 2 new
   121  machines. Then, the next 2 jobs from the queue are sent to those newly created
   122  machines. Again, the number of _Idle_ machines is less than `IdleCount`, so
   123  GitLab Runner starts 2 new machines and the last queued job is sent to one of
   124  the _Idle_ machines.
   125  
   126  We now have 1 _Idle_ machine, so GitLab Runner starts 1 more machine to
   127  satisfy `IdleCount`. Because there are no new jobs in the queue, those two
   128  machines stay in _Idle_ state and GitLab Runner is satisfied.
   129  
   130  ---
   131  
   132  **This is what happened:**
   133  We had 2 machines, waiting in _Idle_ state for new jobs. After the 5 jobs
   134  were queued, new machines were created, so in total we had 7 machines. Five of
   135  them were running jobs, and 2 were in _Idle_ state, waiting for the next
   136  jobs.
   137  
   138  The algorithm will still work in the same way; GitLab Runner will create a new
   139  _Idle_ machine for each machine used for the job execution until `IdleCount`
   140  is satisfied. Those machines will be created up to the number defined by the
   141  `limit` parameter. If GitLab Runner notices that the total number of created
   142  machines has reached `limit`, it will stop autoscaling, and new jobs will need to
   143  wait in the job queue until machines start returning to _Idle_ state.
   144  
   145  In the above example we will always have two idle machines. The `IdleTime`
   146  applies only when the number of idle machines exceeds `IdleCount`; GitLab Runner
   147  then reduces the number of idle machines back to `IdleCount`.
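
The walk-through above can be condensed into a small model. The following Python sketch is not GitLab Runner's implementation: `steady_state` is a hypothetical helper that assumes a single `[[runners]]` entry, ignores `concurrent`, and ignores the time it takes to create a machine. It only shows where the machine counts settle for a given number of queued jobs.

```python
def steady_state(queued_jobs: int, idle_count: int, limit: int) -> dict:
    """Simplified model of where machines settle for a number of queued jobs."""
    # `limit = 0` means "don't limit"; model it as no cap on machines.
    cap = float("inf") if limit == 0 else limit
    # Every queued job gets its own machine until the cap is reached.
    running = int(min(queued_jobs, cap))
    # Idle machines are topped up to IdleCount, but never beyond the cap.
    idle = int(min(idle_count, cap - running))
    return {"running": running, "idle": idle, "waiting": queued_jobs - running}


# The example from the text: 5 queued jobs, IdleCount = 2, limit = 10.
print(steady_state(5, 2, 10))   # {'running': 5, 'idle': 2, 'waiting': 0}
# More jobs than `limit`: the extra jobs wait in the queue.
print(steady_state(12, 2, 10))  # {'running': 10, 'idle': 0, 'waiting': 2}
```

With `idle_count=0` the same model describes the on-demand mode: machines exist only while jobs are running.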
   148  
   149  ---
   150  
   151  **Scaling down:**
   152  After the job is finished, the machine is set to _Idle_ state and is waiting
   153  for the next jobs to be executed. Let's suppose that we have no new jobs in
   154  the queue. After the time designated by `IdleTime` passes, the _Idle_ machines
   155  will be removed. In our example, after 30 minutes, all machines will be removed
   156  (each machine 30 minutes after its last job execution ended) and GitLab
   157  Runner will again keep an `IdleCount` of _Idle_ machines running, just like
   158  at the beginning of the example.
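
As a rough illustration of scaling down, the Python sketch below applies the rule that `IdleTime` only matters while there are more idle machines than `IdleCount`. The function `machines_to_remove` and its choice to look at the longest-idle machines first are assumptions made for this example, not the Runner's actual scheduler code.

```python
from typing import List


def machines_to_remove(idle_seconds: List[int], idle_count: int, idle_time: int) -> List[int]:
    """Return indexes of idle machines that could be removed.

    idle_seconds[i] is how long machine i has been in Idle state.
    """
    # Look at the longest-idle machines first (an assumption of this sketch).
    by_idleness = sorted(range(len(idle_seconds)),
                         key=lambda i: idle_seconds[i], reverse=True)
    removable = []
    remaining = len(idle_seconds)
    for i in by_idleness:
        # IdleTime applies only while there are more idle machines than IdleCount.
        if remaining > idle_count and idle_seconds[i] > idle_time:
            removable.append(i)
            remaining -= 1
    return removable


# Three idle machines, IdleCount = 2, IdleTime = 1800: only the stale one goes.
print(machines_to_remove([1900, 10, 5], idle_count=2, idle_time=1800))  # [0]
```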
   159  
   160  ---
   161  
   162  So, to sum up:
   163  
   164  1. We start the Runner
   165  1. Runner creates 2 idle machines
   166  1. Runner picks one job
   167  1. Runner creates one more machine to fulfill the strong requirement of always
   168     having the two idle machines
   169  1. Job finishes, we have 3 idle machines
   170  1. When one of the three idle machines has been idle for longer than `IdleTime`
   171     since it last picked up a job, it will be removed
   172  1. The Runner will always have at least 2 idle machines waiting, so that
   173     jobs can be picked up quickly
   174  
   175  Below you can see a comparison chart of job statuses and machine statuses
   176  over time:
   177  
   178  ![Autoscale state chart](img/autoscale-state-chart.png)
   179  
   180  ## How `concurrent`, `limit` and `IdleCount` generate the upper limit of running machines
   181  
   182  There is no magic equation that will tell you what to set `limit` or
   183  `concurrent` to. Act according to your needs. Having `IdleCount` of _Idle_
   184  machines is a speedup feature. You don't need to wait 10s/20s/30s for the
   185  instance to be created. But as a user, you'd want all your machines (for which
   186  you need to pay) to be running jobs, not stay in _Idle_ state. So you should
   187  have `concurrent` and `limit` set to values that will run the maximum count of
   188  machines you are willing to pay for. As for `IdleCount`, it should be set to a
   189  value that will generate a minimum amount of _not used_ machines when the job
   190  queue is empty.
   191  
   192  Let's assume the following example:
   193  
   194  ```toml
   195  concurrent = 20
   196  
   197  [[runners]]
   198    limit = 40
   199    [runners.machine]
   200      IdleCount = 10
   201  ```
   202  
   203  In the above scenario the total number of machines we could have is 30. The
   204  `limit` of total machines (building and idle) can be 40. We can have 10 idle
   205  machines but the `concurrent` jobs are 20. So in total we can have 20
   206  concurrent machines running jobs and 10 idle, summing up to 30.
   207  
   208  But what happens if the `limit` is less than the total number of machines that
   209  could be created? The example below explains that case:
   210  
   211  ```toml
   212  concurrent = 20
   213  
   214  [[runners]]
   215    limit = 25
   216    [runners.machine]
   217      IdleCount = 10
   218  ```
   219  
   220  In this example we will have at most 20 concurrent jobs, and at most 25
   221  machines created. In the worst case scenario regarding idle machines, we will
   222  not be able to have 10 idle machines, but only 5, because the `limit` is 25.
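
Both examples follow the same arithmetic, which can be written down as a short Python sketch. `max_machines` is a hypothetical helper, and it assumes a single `[[runners]]` entry so that `concurrent` is not shared with other runners.

```python
def max_machines(concurrent: int, limit: int, idle_count: int) -> int:
    """Upper bound on created machines for one [[runners]] entry."""
    # Machines running jobs are bounded by `concurrent`,
    # idle machines by `IdleCount`.
    wanted = concurrent + idle_count
    # `limit = 0` means "don't limit"; otherwise `limit` caps the total.
    return wanted if limit == 0 else min(wanted, limit)


print(max_machines(concurrent=20, limit=40, idle_count=10))  # 30
print(max_machines(concurrent=20, limit=25, idle_count=10))  # 25
```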
   223  
   224  ## Off Peak time mode configuration
   225  
   226  > Introduced in GitLab Runner v1.7
   227  
   228  Autoscale can be configured with the support for _Off Peak_ time mode periods.
   229  
   230  **What is an _Off Peak_ time mode period?**
   231  
   232  Some organizations have regular time periods when no work is done.
   233  For example, most commercial companies work from Monday to
   234  Friday during fixed hours, e.g. from 10am to 6pm. During the rest of the week
   235  (Monday to Friday at 12am-9am and 6pm-11pm, and all of Saturday and Sunday)
   236  no one is working. We call these time periods _Off Peak_.
   237  
   238  Organizations where _Off Peak_ time periods occur probably don't want
   239  to pay for _Idle_ machines when it's certain that no jobs will be
   240  executed during that time, especially when `IdleCount` is set to a big number.
   241  
   242  Version `v1.7` of the Runner added support for _Off Peak_
   243  configuration. With parameters set in the configuration file, you can
   244  change the `IdleCount` and `IdleTime` values for the _Off Peak_ time mode
   245  periods.
   246  
   247  **How does it work?**
   248  
   249  Configuration of _Off Peak_ is done by four parameters: `OffPeakPeriods`,
   250  `OffPeakTimezone`, `OffPeakIdleCount` and `OffPeakIdleTime`. The
   251  `OffPeakPeriods` setting contains an array of cron-style patterns defining
   252  when the _Off Peak_ time mode should be set on. For example:
   253  
   254  ```toml
   255  [runners.machine]
   256    OffPeakPeriods = [
   257      "* * 0-9,18-23 * * mon-fri *",
   258      "* * * * * sat,sun *"
   259    ]
   260  ```
   261  
   262  will enable the _Off Peak_ periods described above: the _working_ days
   263  from 12am to 9am and from 6pm to 11pm, and the whole weekend. The machines
   264  scheduler checks all patterns in the array, and if at least one of
   265  them matches the current time, the _Off Peak_ time mode is enabled.
   266  
   267  NOTE: **Note:**
   268  The 59th second of the last
   269  minute in any period that you specify will *not* be considered part of the
   270  period. For more information, see [issue #2170](https://gitlab.com/gitlab-org/gitlab-runner/issues/2170).
   271  
   272  You can specify the `OffPeakTimezone` e.g. `"Australia/Sydney"`. If you don't,
   273  the system setting of the host machine of every runner will be used. This
   274  default can be stated as `OffPeakTimezone = "Local"` explicitly if you wish.
   275  
   276  When the _Off Peak_ time mode is enabled, the machines scheduler uses
   277  `OffPeakIdleCount` instead of the `IdleCount` setting and `OffPeakIdleTime`
   278  instead of the `IdleTime` setting. The autoscaling algorithm is not changed,
   279  only the parameters. When the machines scheduler discovers that none of
   280  the `OffPeakPeriods` patterns matches, it switches back to the
   281  `IdleCount` and `IdleTime` settings.
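
The parameter switch can be pictured with a small Python sketch. It does not parse cron patterns: `is_off_peak` hard-codes the two example periods (hours `0-9` and `18-23` on weekdays, plus the whole weekend), ignores `OffPeakTimezone` and the 59th-second caveat, and both function names are hypothetical.

```python
from datetime import datetime


def is_off_peak(now: datetime) -> bool:
    """Hard-coded equivalent of the two example OffPeakPeriods patterns."""
    if now.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    # Cron hour fields 0-9 and 18-23 on Monday to Friday.
    return now.hour <= 9 or now.hour >= 18


def effective_idle_settings(now, idle_count, idle_time,
                            off_peak_idle_count, off_peak_idle_time):
    """Pick the (IdleCount, IdleTime) pair the scheduler would use at `now`."""
    if is_off_peak(now):
        return off_peak_idle_count, off_peak_idle_time
    return idle_count, idle_time


# Monday 2019-12-02 at 11:00 is inside working hours.
print(effective_idle_settings(datetime(2019, 12, 2, 11, 0), 5, 600, 1, 1200))  # (5, 600)
# Saturday 2019-12-07 at 11:00 is Off Peak.
print(effective_idle_settings(datetime(2019, 12, 7, 11, 0), 5, 600, 1, 1200))  # (1, 1200)
```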
   282  
   283  More information about the syntax of `OffPeakPeriods` patterns can be found
   284  in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].
   285  
   286  ## Distributed runners caching
   287  
   288  NOTE: **Note:**
   289  Read how to [install your own cache server](../install/registry_and_cache_servers.md#install-your-own-cache-server).
   290  
   291  To speed up your jobs, GitLab Runner provides a [cache mechanism][cache]
   292  where selected directories and/or files are saved and shared between subsequent
   293  jobs.
   294  
   295  This works fine when jobs are run on the same host, but when you start
   296  using the Runners autoscale feature, most of your jobs will be running on a
   297  new (or almost new) host, which will execute each job in a new Docker
   298  container. In that case, you will not be able to take advantage of the cache
   299  feature.
   300  
   301  To overcome this issue, together with the autoscale feature, the distributed
   302  Runners cache feature was introduced.
   303  
   304  It uses a configured object storage server to share the cache between the Docker hosts in use.
   305  When restoring and archiving the cache, GitLab Runner will query the server
   306  and will download or upload the archive respectively.
   307  
   308  To enable distributed caching, you have to define it in `config.toml` using the
   309  [`[runners.cache]` directive][runners-cache]:
   310  
   311  ```toml
   312  [[runners]]
   313    limit = 10
   314    executor = "docker+machine"
   315    [runners.cache]
   316      Type = "s3"
   317      Path = "path/to/prefix"
   318      Shared = false
   319      [runners.cache.s3]
   320        ServerAddress = "s3.example.com"
   321        AccessKey = "access-key"
   322        SecretKey = "secret-key"
   323        BucketName = "runner"
   324        Insecure = false
   325  ```
   326  
   327  In the example above, the S3 URLs follow the structure
   328  `http(s)://<ServerAddress>/<BucketName>/<Path>/runner/<runner-id>/project/<id>/<cache-key>`.
   329  
   330  To share the cache between two or more Runners, set the `Shared` flag to `true`.
   331  That will remove the runner token from the URL (`runner/<runner-id>`) and
   332  all configured Runners will share the same cache. Remember that you can also
   333  set `Path` to separate caches between Runners when cache sharing is enabled.
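
To make the URL structure and the effect of `Shared` concrete, here is a minimal Python sketch. `cache_url` is a hypothetical helper that only mirrors the structure described above; it assumes that `Insecure = true` selects `http`, and the runner ID, project ID and cache key in the usage lines are made-up values.

```python
def cache_url(server_address, bucket_name, path, runner_id, project_id,
              cache_key, shared=False, insecure=False):
    """Build a cache URL following the structure described above."""
    # Assumption: Insecure = true means plain HTTP, otherwise HTTPS.
    scheme = "http" if insecure else "https"
    parts = [bucket_name, path.strip("/")]
    if not shared:
        # With Shared = false each Runner gets its own `runner/<runner-id>` prefix.
        parts += ["runner", runner_id]
    parts += ["project", str(project_id), cache_key]
    # Skip empty segments so an empty Path does not produce "//".
    return f"{scheme}://{server_address}/" + "/".join(p for p in parts if p)


print(cache_url("s3.example.com", "runner", "path/to/prefix", "abc123", 42, "default"))
# https://s3.example.com/runner/path/to/prefix/runner/abc123/project/42/default
print(cache_url("s3.example.com", "runner", "path/to/prefix", "abc123", 42, "default", shared=True))
# https://s3.example.com/runner/path/to/prefix/project/42/default
```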
   334  
   335  ## Distributed container registry mirroring
   336  
   337  NOTE: **Note:**
   338  Read how to [install a container registry](../install/registry_and_cache_servers.md#install-a-proxy-container-registry).
   339  
   340  To speed up jobs executed inside of Docker containers, you can use the [Docker
   341  registry mirroring service][registry]. This will provide a proxy between your
   342  Docker machines and all used registries. Images will be downloaded once by the
   343  registry mirror. On each new host, or on an existing host where the image is
   344  not available, it will be downloaded from the configured registry mirror.
   345  
   346  Provided that the mirror exists in your Docker machines' LAN, the image
   347  downloading step should be much faster on each host.
   348  
   349  To configure the Docker registry mirroring, you have to add `MachineOptions` to
   350  the configuration in `config.toml`:
   351  
   352  ```toml
   353  [[runners]]
   354    limit = 10
   355    executor = "docker+machine"
   356    [runners.machine]
   357      (...)
   358      MachineOptions = [
   359        (...)
   360        "engine-registry-mirror=http://10.11.12.13:12345"
   361      ]
   362  ```
   363  
   364  Where `10.11.12.13:12345` is the IP address and port where your registry mirror
   365  is listening for connections from the Docker service. It must be accessible from
   366  each host created by Docker Machine.
   367  
   368  ## A complete example of `config.toml`
   369  
   370  The `config.toml` below uses the [`digitalocean` Docker Machine driver](https://docs.docker.com/machine/drivers/digital-ocean/):
   371  
   372  ```toml
   373  concurrent = 50   # All registered Runners can run up to 50 concurrent jobs
   374  
   375  [[runners]]
   376    url = "https://gitlab.com"
   377    token = "RUNNER_TOKEN"             # Note this is different from the registration token used by `gitlab-runner register`
   378    name = "autoscale-runner"
   379    executor = "docker+machine"        # This Runner is using the 'docker+machine' executor
   380    limit = 10                         # This Runner can execute up to 10 jobs (created machines)
   381    [runners.docker]
   382      image = "ruby:2.1"               # The default image used for jobs is 'ruby:2.1'
   383    [runners.machine]
   384      OffPeakPeriods = [               # Set the Off Peak time mode on for:
   385        "* * 0-9,18-23 * * mon-fri *", # - Monday to Friday for 12am to 9am and 6pm to 11pm
   386        "* * * * * sat,sun *"          # - whole Saturday and Sunday
   387      ]
   388      OffPeakIdleCount = 1             # There must be 1 machine in Idle state - when Off Peak time mode is on
   389      OffPeakIdleTime = 1200           # Each machine can be in Idle state up to 1200 seconds (after this it will be removed) - when Off Peak time mode is on
   390      IdleCount = 5                    # There must be 5 machines in Idle state - when Off Peak time mode is off
   391      IdleTime = 600                   # Each machine can be in Idle state up to 600 seconds (after this it will be removed) - when Off Peak time mode is off
   392      MaxBuilds = 100                  # Each machine can handle up to 100 jobs in a row (after this it will be removed)
   393      MachineName = "auto-scale-%s"    # Each machine will have a unique name ('%s' is required)
   394      MachineDriver = "digitalocean"   # Docker Machine is using the 'digitalocean' driver
   395      MachineOptions = [
   396          "digitalocean-image=coreos-stable",
   397          "digitalocean-ssh-user=core",
   398          "digitalocean-access-token=DO_ACCESS_TOKEN",
   399          "digitalocean-region=nyc2",
   400          "digitalocean-size=4gb",
   401          "digitalocean-private-networking",
   402          "engine-registry-mirror=http://10.11.12.13:12345"   # Docker Machine is using registry mirroring
   403      ]
   404    [runners.cache]
   405      Type = "s3"
   406      [runners.cache.s3]
   407        ServerAddress = "s3-eu-west-1.amazonaws.com"
   408        AccessKey = "AMAZON_S3_ACCESS_KEY"
   409        SecretKey = "AMAZON_S3_SECRET_KEY"
   410        BucketName = "runner"
   411        Insecure = false
   412  ```
   413  
   414  Note that the `MachineOptions` parameter contains options for the `digitalocean`
   415  driver which is used by Docker Machine to spawn machines hosted on DigitalOcean,
   416  and one option for Docker Machine itself (`engine-registry-mirror`).
   417  
   418  [cache]: https://docs.gitlab.com/ee/ci/yaml/README.html#cache
   419  [docker-machine-docs]: https://docs.docker.com/machine/
   420  [docker-machine-driver]: https://docs.docker.com/machine/drivers/
   421  [docker-machine-installation]: https://docs.docker.com/machine/install-machine/
   422  [runners-cache]: advanced-configuration.md#the-runnerscache-section
   423  [runners-machine]: advanced-configuration.md#the-runnersmachine-section
   424  [registry]: https://docs.docker.com/registry/