# Runners autoscale configuration

> The autoscale feature was introduced in GitLab Runner 1.1.0.

Autoscale provides the ability to utilize resources in a more elastic and
dynamic way.

Thanks to Runners being able to autoscale, your infrastructure contains only as
many build instances as necessary at any time. If you configure the Runner to
only use autoscale, the system on which the Runner is installed acts as a
bastion for all the machines it creates.

## Overview

When this feature is enabled and configured properly, jobs are executed on
machines created _on demand_. After a job is finished, those machines can
wait to run the next jobs or can be removed after the configured `IdleTime`.
With many cloud providers, this helps you get more out of the cost of instances
that are already running.

Below, you can see a real-life example of the Runners autoscale feature, tested
on GitLab.com for the [GitLab Community Edition][ce] project:

![Real life example of autoscaling](img/autoscale-example.png)

Each machine on the chart is an independent cloud instance, running jobs
inside of Docker containers.

[ce]: https://gitlab.com/gitlab-org/gitlab-ce

## System requirements

At this point you should have
[installed all the requirements](../executors/docker_machine.md#preparing-the-environment).
If not, make sure to do it before going over the configuration.

## Supported cloud providers

The autoscale mechanism is based on [Docker Machine](https://docs.docker.com/machine/overview/).
All supported virtualization/cloud provider parameters are available at the
[Docker Machine drivers documentation](https://docs.docker.com/machine/drivers/).

## Runner configuration

In this section we describe only the parameters that are significant from the
autoscale feature's point of view. For more configuration details, read the
[advanced configuration](advanced-configuration.md).

### Runner global options

| Parameter    | Value   | Description |
|--------------|---------|-------------|
| `concurrent` | integer | Limits how many jobs globally can be run concurrently. This is the uppermost limit on the number of jobs using _all_ defined runners, local and autoscale. Together with `limit` (from the [`[[runners]]` section](#runners-options)) and `IdleCount` (from the [`[runners.machine]` section][runners-machine]) it affects the upper limit of created machines. |

### `[[runners]]` options

| Parameter  | Value            | Description |
|------------|------------------|-------------|
| `executor` | string           | To use the autoscale feature, `executor` must be set to `docker+machine` or `docker-ssh+machine`. |
| `limit`    | integer          | Limits how many jobs can be handled concurrently by this specific token. `0` means no limit. For autoscale, it's the upper limit of machines created by this provider (in conjunction with `concurrent` and `IdleCount`). |
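
To show where the options from the two tables above live in `config.toml`, here
is a minimal sketch. The numeric values are illustrative assumptions, not
recommendations, and a real runner entry needs further settings (such as `url`
and `token`) that are omitted here.

```toml
# Illustrative values only; adjust them to your own capacity.
concurrent = 10                  # global cap on jobs across all defined runners

[[runners]]
  executor = "docker+machine"    # required for the autoscale feature
  limit = 10                     # cap for this runner; 0 means no limit
```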

### `[runners.machine]` options

Configuration parameter details can be found
in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].

### `[runners.cache]` options

Configuration parameter details can be found
in [GitLab Runner - Advanced Configuration - The `[runners.cache]` section][runners-cache].

### Additional configuration information

There is also a special mode, when you set `IdleCount = 0`. In this mode,
machines are **always** created **on-demand** before each job (if there is no
available machine in _Idle_ state). After the job is finished, the autoscaling
algorithm works
[the same as it is described below](#autoscaling-algorithm-and-parameters).
The machine waits for the next jobs, and if none is executed within the
`IdleTime` period, the machine is removed. If there are no jobs, there
are no machines in _Idle_ state.
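
A minimal sketch of this on-demand mode might look like the fragment below. The
`IdleTime` value of 1800 seconds is an assumption chosen for illustration; any
other value works the same way.

```toml
[[runners]]
  executor = "docker+machine"
  [runners.machine]
    IdleCount = 0      # no machines are kept waiting; each is created on demand
    IdleTime = 1800    # a machine that finished its job is removed after 30 idle minutes
```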

## Autoscaling algorithm and parameters

The autoscaling algorithm is based on three main parameters: `IdleCount`,
`IdleTime` and `limit`.

We say that each machine that does not run a job is in _Idle_ state. When
GitLab Runner is in autoscale mode, it monitors all machines and ensures that
there is always an `IdleCount` of machines in _Idle_ state.

At the same time, GitLab Runner is checking the duration of the _Idle_ state of
each machine. If the time exceeds the `IdleTime` value, the machine is
automatically removed.

---

**Example:**
Let's suppose that we have configured GitLab Runner with the following
autoscale parameters:

```toml
[[runners]]
  limit = 10
  # (...)
  executor = "docker+machine"
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    # (...)
```

At the beginning, when no jobs are queued, GitLab Runner starts two machines
(`IdleCount = 2`), and sets them in _Idle_ state. Notice that we have also set
`IdleTime` to 30 minutes (`IdleTime = 1800`).

Now, let's assume that 5 jobs are queued in GitLab CI. The first 2 jobs are
sent to the two _Idle_ machines. GitLab Runner now notices that the number of
_Idle_ machines is less than `IdleCount` (`0 < 2`), so it starts 2 new
machines. Then, the next 2 jobs from the queue are sent to those newly created
machines. Again, the number of _Idle_ machines is less than `IdleCount`, so
GitLab Runner starts 2 new machines and the last queued job is sent to one of
the _Idle_ machines.

We now have 1 _Idle_ machine, so GitLab Runner starts one more machine to
satisfy `IdleCount`. Because there are no new jobs in the queue, those two
machines stay in _Idle_ state and GitLab Runner is satisfied.

---

**This is what happened:**
We had 2 machines, waiting in _Idle_ state for new jobs. After the 5 jobs
were queued, new machines were created, so in total we had 7 machines. Five of
them were running jobs, and 2 were in _Idle_ state, waiting for the next
jobs.

The algorithm continues to work in the same way; GitLab Runner creates a new
_Idle_ machine for each machine used for job execution until `IdleCount`
is satisfied. Those machines are created up to the number defined by the
`limit` parameter. If GitLab Runner notices that the total number of created
machines has reached `limit`, it stops autoscaling, and new jobs need to
wait in the job queue until machines start returning to _Idle_ state.

In the above example we will always have two idle machines. `IdleTime`
applies only when the number of idle machines is over `IdleCount`; GitLab
Runner then tries to reduce the number of idle machines back to `IdleCount`.

---

**Scaling down:**
After a job is finished, the machine is set to _Idle_ state and waits
for the next jobs to be executed. Let's suppose that we have no new jobs in
the queue. After the time designated by `IdleTime` passes, the _Idle_ machines
are removed. In our example, all machines are removed after 30 minutes
(each machine 30 minutes after its last job execution ended), and GitLab
Runner goes back to keeping an `IdleCount` of _Idle_ machines running, just like
at the beginning of the example.

---

So, to sum up:

1. We start the Runner
2. Runner creates 2 idle machines
3. Runner picks one job
4. Runner creates one more machine to fulfill the strong requirement of always
   having two idle machines
5. Job finishes, we have 3 idle machines
6. When one of the three idle machines has been idle for longer than `IdleTime`
   since it last picked up a job, it is removed
7. The Runner will always have at least 2 idle machines waiting to pick up
   jobs quickly

Below you can see a comparison chart of job statuses and machine statuses
over time:

![Autoscale state chart](img/autoscale-state-chart.png)

## How `concurrent`, `limit` and `IdleCount` generate the upper limit of running machines

There is no magic equation that will tell you what to set `limit` or
`concurrent` to. Act according to your needs. Having `IdleCount` of _Idle_
machines is a speedup feature. You don't need to wait 10s/20s/30s for the
instance to be created. But as a user, you'd want all your machines (for which
you need to pay) to be running jobs, not staying in _Idle_ state. So you should
have `concurrent` and `limit` set to values that will run the maximum count of
machines you are willing to pay for. As for `IdleCount`, it should be set to a
value that will generate a minimum number of _unused_ machines when the job
queue is empty.

Let's assume the following example:

```toml
concurrent = 20

[[runners]]
  limit = 40
  [runners.machine]
    IdleCount = 10
```

In the above scenario the total number of machines we could have is 30. The
`limit` on total machines (building and idle) is 40. We can have 10 idle
machines, but `concurrent` allows only 20 jobs. So in total we can have 20
machines running jobs and 10 idle, summing up to 30.

But what happens if the `limit` is less than the total number of machines that
could be created? The example below explains that case:

```toml
concurrent = 20

[[runners]]
  limit = 25
  [runners.machine]
    IdleCount = 10
```

In this example we will have at most 20 concurrent jobs, and at most 25
machines created. In the worst case scenario regarding idle machines, we will
not be able to have 10 idle machines, but only 5, because the `limit` is 25.
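
Following the same reasoning, a `limit` of `0` (described earlier as "don't
limit") would leave `concurrent` and `IdleCount` as the only bounds. The
fragment below sketches that case with the same illustrative values. The
resulting ceiling of 30 machines is an inference from the two examples above,
not a documented guarantee.

```toml
concurrent = 20        # at most 20 jobs run at once

[[runners]]
  limit = 0            # 0 means no per-runner limit on created machines
  [runners.machine]
    IdleCount = 10     # up to 10 idle machines on top of the 20 busy ones
```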

## Off Peak time mode configuration

> Introduced in GitLab Runner v1.7

Autoscale can be configured with support for _Off Peak_ time mode periods.

**What is an _Off Peak_ time mode period?**

Some organizations have regular time periods when no work is done.
For example, most commercial companies work from Monday to
Friday during fixed hours, e.g. from 10am to 6pm. During the rest of the week
(Monday to Friday before 10am and after 6pm, and all of Saturday and Sunday)
no one is working. We call these time periods _Off Peak_.

Organizations where _Off Peak_ time periods occur probably don't want
to pay for _Idle_ machines when it's certain that no jobs will be
executed during that time, especially when `IdleCount` is set to a big number.

In version `v1.7` of the Runner we added support for _Off Peak_
configuration. With the parameters described in the configuration file you can
now change the `IdleCount` and `IdleTime` values for the _Off Peak_ time mode
periods.

**How does it work?**

Configuration of _Off Peak_ is done by four parameters: `OffPeakPeriods`,
`OffPeakTimezone`, `OffPeakIdleCount` and `OffPeakIdleTime`. The
`OffPeakPeriods` setting contains an array of cron-style patterns defining
when the _Off Peak_ time mode should be set on. For example:

```toml
[runners.machine]
  OffPeakPeriods = [
    "* * 0-9,18-23 * * mon-fri *",
    "* * * * * sat,sun *"
  ]
```

These patterns enable the _Off Peak_ periods described above: on working days,
hours 0-9 and 18-23 (from 12am until 10am and from 6pm until midnight), and the
whole of Saturday and Sunday. The machine scheduler checks all patterns in the
array, and if at least one of them matches the current time, the _Off Peak_
time mode is enabled.

NOTE: **Note:**
The 59th second of the last
minute in any period that you specify will *not* be considered part of the
period. For more information, see [issue #2170](https://gitlab.com/gitlab-org/gitlab-runner/issues/2170).

You can specify the `OffPeakTimezone`, e.g. `"Australia/Sydney"`. If you don't,
the system setting of the host machine of every runner will be used. This
default can be stated as `OffPeakTimezone = "Local"` explicitly if you wish.

When the _Off Peak_ time mode is enabled, the machine scheduler uses the
`OffPeakIdleCount` setting instead of `IdleCount` and the `OffPeakIdleTime`
setting instead of `IdleTime`. The autoscaling algorithm is not changed,
only the parameters. When the machine scheduler discovers that none of
the `OffPeakPeriods` patterns matches, it switches back to the
`IdleCount` and `IdleTime` settings.
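
As a sketch of how the four _Off Peak_ parameters sit next to the regular ones,
the fragment below combines them in one `[runners.machine]` section. The
numeric values and the timezone are illustrative assumptions, not
recommendations.

```toml
[runners.machine]
  IdleCount = 5                          # used outside Off Peak periods
  IdleTime = 600                         # seconds, used outside Off Peak periods
  OffPeakPeriods = [
    "* * 0-9,18-23 * * mon-fri *",       # weekdays, hours 0-9 and 18-23
    "* * * * * sat,sun *"                # whole Saturday and Sunday
  ]
  OffPeakTimezone = "Australia/Sydney"   # omit or set to "Local" to use the host setting
  OffPeakIdleCount = 1                   # replaces IdleCount while a pattern matches
  OffPeakIdleTime = 1200                 # replaces IdleTime while a pattern matches
```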

More information about the syntax of `OffPeakPeriods` patterns can be found
in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].

## Distributed runners caching

NOTE: **Note:**
Read how to [install your own cache server](../install/registry_and_cache_servers.md#install-your-own-cache-server).

To speed up your jobs, GitLab Runner provides a [cache mechanism][cache]
where selected directories and/or files are saved and shared between subsequent
jobs.

This works well when jobs are run on the same host, but when you start
using the Runners autoscale feature, most of your jobs will be running on a
new (or almost new) host, which will execute each job in a new Docker
container. In that case, you will not be able to take advantage of the cache
feature.

To overcome this issue, together with the autoscale feature, the distributed
Runners cache feature was introduced.

It uses a configured object storage server to share the cache between the Docker
hosts in use. When restoring and archiving the cache, GitLab Runner queries the
server and downloads or uploads the archive respectively.

To enable distributed caching, you have to define it in `config.toml` using the
[`[runners.cache]` directive][runners-cache]:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    Path = "path/to/prefix"
    Shared = false
    [runners.cache.s3]
      ServerAddress = "s3.example.com"
      AccessKey = "access-key"
      SecretKey = "secret-key"
      BucketName = "runner"
      Insecure = false
```

In the example above, the S3 URLs follow the structure
`http(s)://<ServerAddress>/<BucketName>/<Path>/runner/<runner-id>/project/<id>/<cache-key>`.

To share the cache between two or more Runners, set the `Shared` flag to `true`.
That will remove the runner token from the URL (`runner/<runner-id>`) and
all configured Runners will share the same cache. Remember that you can also
set `Path` to separate caches between Runners when cache sharing is enabled.
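
A sketch of two Runners sharing one cache could look like the fragment below.
The runner names and the `Path` value are hypothetical, and the S3 settings
simply repeat the placeholder values from the example above.

```toml
[[runners]]
  name = "runner-a"                # hypothetical name
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    Path = "shared/prefix"         # same Path on both Runners, so they use one cache
    Shared = true                  # drops runner/<runner-id> from the cache URL
    [runners.cache.s3]
      ServerAddress = "s3.example.com"
      AccessKey = "access-key"
      SecretKey = "secret-key"
      BucketName = "runner"
      Insecure = false

[[runners]]
  name = "runner-b"                # hypothetical name
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    Path = "shared/prefix"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.example.com"
      AccessKey = "access-key"
      SecretKey = "secret-key"
      BucketName = "runner"
      Insecure = false
```

Giving a Runner a different `Path` would keep its cache separate from the
others even though `Shared` is enabled.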

## Distributed container registry mirroring

NOTE: **Note:**
Read how to [install a container registry](../install/registry_and_cache_servers.md#install-a-proxy-container-registry).

To speed up jobs executed inside of Docker containers, you can use the [Docker
registry mirroring service][registry]. This will provide a proxy between your
Docker machines and all used registries. Images will be downloaded once by the
registry mirror. On each new host, or on an existing host where the image is
not available, it will be downloaded from the configured registry mirror.

Provided that the mirror is in the same LAN as your Docker machines, the image
downloading step should be much faster on each host.

To configure the Docker registry mirroring, you have to add `MachineOptions` to
the configuration in `config.toml`:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.machine]
    # (...)
    MachineOptions = [
      # (...)
      "engine-registry-mirror=http://10.11.12.13:12345"
    ]
```

Where `10.11.12.13:12345` is the IP address and port where your registry mirror
is listening for connections from the Docker service. It must be accessible from
each host created by Docker Machine.

## A complete example of `config.toml`

The `config.toml` below uses the [`digitalocean` Docker Machine driver](https://docs.docker.com/machine/drivers/digital-ocean/):

```toml
concurrent = 50   # All registered Runners can run up to 50 concurrent jobs

[[runners]]
  url = "https://gitlab.com"
  token = "RUNNER_TOKEN"             # Note this is different from the registration token used by `gitlab-runner register`
  name = "autoscale-runner"
  executor = "docker+machine"        # This Runner is using the 'docker+machine' executor
  limit = 10                         # This Runner can execute up to 10 jobs (created machines)
  [runners.docker]
    image = "ruby:2.1"               # The default image used for jobs is 'ruby:2.1'
  [runners.machine]
    OffPeakPeriods = [               # Set the Off Peak time mode on for:
      "* * 0-9,18-23 * * mon-fri *", # - Monday to Friday, hours 0-9 and 18-23
      "* * * * * sat,sun *"          # - whole Saturday and Sunday
    ]
    OffPeakIdleCount = 1             # There must be 1 machine in Idle state - when Off Peak time mode is on
    OffPeakIdleTime = 1200           # Each machine can be in Idle state up to 1200 seconds (after this it will be removed) - when Off Peak time mode is on
    IdleCount = 5                    # There must be 5 machines in Idle state - when Off Peak time mode is off
    IdleTime = 600                   # Each machine can be in Idle state up to 600 seconds (after this it will be removed) - when Off Peak time mode is off
    MaxBuilds = 100                  # Each machine can handle up to 100 jobs in a row (after this it will be removed)
    MachineName = "auto-scale-%s"    # Each machine will have a unique name ('%s' is required)
    MachineDriver = "digitalocean"   # Docker Machine is using the 'digitalocean' driver
    MachineOptions = [
        "digitalocean-image=coreos-stable",
        "digitalocean-ssh-user=core",
        "digitalocean-access-token=DO_ACCESS_TOKEN",
        "digitalocean-region=nyc2",
        "digitalocean-size=4gb",
        "digitalocean-private-networking",
        "engine-registry-mirror=http://10.11.12.13:12345"   # Docker Machine is using registry mirroring
    ]
  [runners.cache]
    Type = "s3"
    [runners.cache.s3]
      ServerAddress = "s3-eu-west-1.amazonaws.com"
      AccessKey = "AMAZON_S3_ACCESS_KEY"
      SecretKey = "AMAZON_S3_SECRET_KEY"
      BucketName = "runner"
      Insecure = false
```

Note that the `MachineOptions` parameter contains options for the `digitalocean`
driver which is used by Docker Machine to spawn machines hosted on Digital Ocean,
and one option for Docker Machine itself (`engine-registry-mirror`).

[cache]: https://docs.gitlab.com/ee/ci/yaml/README.html#cache
[docker-machine-docs]: https://docs.docker.com/machine/
[docker-machine-driver]: https://docs.docker.com/machine/drivers/
[docker-machine-installation]: https://docs.docker.com/machine/install-machine/
[runners-cache]: advanced-configuration.md#the-runnerscache-section
[runners-machine]: advanced-configuration.md#the-runnersmachine-section
[registry]: https://docs.docker.com/registry/