# Runners autoscale configuration

> The autoscale feature was introduced in GitLab Runner 1.1.0.

Autoscale provides the ability to utilize resources in a more elastic and
dynamic way.

Thanks to Runners being able to autoscale, your infrastructure contains only as
many build instances as necessary at any time. If you configure the Runner to
only use autoscale, the system on which the Runner is installed acts as a
bastion for all the machines it creates.

## Overview

When this feature is enabled and configured properly, jobs are executed on
machines created _on demand_. Those machines, after the job is finished, can
wait to run the next jobs or can be removed after the configured `IdleTime`.
With many cloud providers, this helps to make better use of the cost of
instances that are already running.

Below, you can see a real life example of the runners autoscale feature, tested
on GitLab.com for the [GitLab Community Edition][ce] project:

Each machine on the chart is an independent cloud instance, running jobs
inside of Docker containers.

[ce]: https://gitlab.com/gitlab-org/gitlab-ce

## System requirements

Before configuring autoscale, you must:

- [Prepare your own environment](../executors/docker_machine.md#preparing-the-environment).
- Optionally use a [forked version](../executors/docker_machine.md#forked-version-of-docker-machine) of Docker Machine supplied by GitLab, which has some additional fixes.

## Supported cloud providers

The autoscale mechanism is based on [Docker Machine](https://docs.docker.com/machine/overview/).
All supported virtualization/cloud provider parameters are available at the
[Docker Machine drivers documentation](https://docs.docker.com/machine/drivers/).

## Runner configuration

In this section we will describe only the parameters that are significant from
the autoscale feature point of view. For more configuration details read the
[advanced configuration](advanced-configuration.md).

### Runner global options

| Parameter    | Value   | Description |
|--------------|---------|-------------|
| `concurrent` | integer | Limits how many jobs globally can be run concurrently. This is the uppermost limit on the number of jobs using _all_ defined runners, local and autoscale. Together with `limit` (from [`[[runners]]` section](#runners-options)) and `IdleCount` (from [`[runners.machine]` section][runners-machine]) it affects the upper limit of created machines. |

### `[[runners]]` options

| Parameter  | Value            | Description |
|------------|------------------|-------------|
| `executor` | string           | To use the autoscale feature, `executor` must be set to `docker+machine` or `docker-ssh+machine`. |
| `limit`    | integer          | Limits how many jobs can be handled concurrently by this specific token. 0 simply means don't limit. For autoscale it's the upper limit of machines created by this provider (in conjunction with `concurrent` and `IdleCount`). |

### `[runners.machine]` options

Configuration parameter details can be found
in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].

### `[runners.cache]` options

Configuration parameter details can be found
in [GitLab Runner - Advanced Configuration - The `[runners.cache]` section][runners-cache].

### Additional configuration information

There is also a special mode, when you set `IdleCount = 0`. In this mode,
machines are **always** created **on-demand** before each job (if there is no
available machine in _Idle_ state). After the job is finished, the autoscaling
algorithm works
[the same as it is described below](#autoscaling-algorithm-and-parameters).
The machine waits for the next jobs, and if no job is executed within
the `IdleTime` period, the machine is removed. If there are no jobs, there
are no machines in _Idle_ state.

## Autoscaling algorithm and parameters

The autoscaling algorithm is based on three main parameters: `IdleCount`,
`IdleTime` and `limit`.

We say that each machine that does not run a job is in _Idle_ state. When
GitLab Runner is in autoscale mode, it monitors all machines and ensures that
there is always an `IdleCount` of machines in _Idle_ state.

At the same time, GitLab Runner is checking the duration of the _Idle_ state of
each machine. If the time exceeds the `IdleTime` value, the machine is
automatically removed.

---

**Example:**
Let's suppose that we have configured GitLab Runner with the following
autoscale parameters:

```bash
[[runners]]
  limit = 10
  (...)
  executor = "docker+machine"
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    (...)
```

At the beginning, when no jobs are queued, GitLab Runner starts two machines
(`IdleCount = 2`), and sets them in _Idle_ state. Notice that we have also set
`IdleTime` to 30 minutes (`IdleTime = 1800`).

Now, let's assume that 5 jobs are queued in GitLab CI. The first 2 jobs are
sent to the _Idle_ machines, of which we have two. GitLab Runner now notices that
the number of _Idle_ machines is less than `IdleCount` (`0 < 2`), so it starts 2 new
machines. Then, the next 2 jobs from the queue are sent to those newly created
machines. Again, the number of _Idle_ machines is less than `IdleCount`, so
GitLab Runner starts 2 new machines and the last queued job is sent to one of
the _Idle_ machines.

We now have 1 _Idle_ machine, so GitLab Runner starts 1 more machine to
satisfy `IdleCount`.
Because there are no new jobs in the queue, those two
machines stay in _Idle_ state and GitLab Runner is satisfied.

---

**This is what happened:**
We had 2 machines, waiting in _Idle_ state for new jobs. After the 5 jobs
were queued, new machines were created, so in total we had 7 machines. Five of
them were running jobs, and 2 were in _Idle_ state, waiting for the next
jobs.

The algorithm will still work in the same way; GitLab Runner will create a new
_Idle_ machine for each machine used for the job execution until `IdleCount`
is satisfied. Those machines will be created up to the number defined by the
`limit` parameter. If GitLab Runner notices that there is a `limit` number of
total created machines, it will stop autoscaling, and new jobs will need to
wait in the job queue until machines start returning to _Idle_ state.

In the above example we will always have two idle machines. The `IdleTime`
applies only when we are over the `IdleCount`; then we try to reduce the number
of machines to `IdleCount`.

---

**Scaling down:**
After the job is finished, the machine is set to _Idle_ state and is waiting
for the next jobs to be executed. Let's suppose that we have no new jobs in
the queue. After the time designated by `IdleTime` passes, the _Idle_ machines
will be removed. In our example, after 30 minutes, all machines will be removed
(each machine 30 minutes after its last job execution ended) and GitLab
Runner will start to keep an `IdleCount` of _Idle_ machines running, just like
at the beginning of the example.

---

So, to sum up:

1. We start the Runner
1. Runner creates 2 idle machines
1. Runner picks one job
1. Runner creates one more machine to fulfill the strong requirement of always
   having the two idle machines
1. Job finishes, we have 3 idle machines
1. When one of the three idle machines exceeds `IdleTime`, counted from the
   last time it picked up a job, it will be removed
1. The Runner will always have at least 2 idle machines waiting for fast
   picking of the jobs

Below you can see a comparison chart of jobs statuses and machines statuses
in time:

## How `concurrent`, `limit` and `IdleCount` generate the upper limit of running machines

There is no magic equation that will tell you what to set `limit` or
`concurrent` to. Act according to your needs. Having `IdleCount` of _Idle_
machines is a speedup feature. You don't need to wait 10s/20s/30s for the
instance to be created. But as a user, you'd want all your machines (for which
you need to pay) to be running jobs, not stay in _Idle_ state. So you should
have `concurrent` and `limit` set to values that will run the maximum count of
machines you are willing to pay for. As for `IdleCount`, it should be set to a
value that will generate a minimum amount of _not used_ machines when the job
queue is empty.

Let's assume the following example:

```bash
concurrent=20

[[runners]]
  limit = 40
  [runners.machine]
    IdleCount = 10
```

In the above scenario the total amount of machines we could have is 30. The
`limit` of total machines (building and idle) can be 40. We can have 10 idle
machines but the `concurrent` jobs are 20. So in total we can have 20
concurrent machines running jobs and 10 idle, summing up to 30.

But what happens if the `limit` is less than the total amount of machines that
could be created? The example below explains that case:

```bash
concurrent=20

[[runners]]
  limit = 25
  [runners.machine]
    IdleCount = 10
```

In this example we will have at most 20 concurrent jobs, and at most 25
machines created.
In the worst case scenario regarding idle machines, we will
not be able to have 10 idle machines, but only 5, because the `limit` is 25.

## Off Peak time mode configuration

> Introduced in GitLab Runner v1.7

Autoscale can be configured with the support for _Off Peak_ time mode periods.

**What is _Off Peak_ time mode period?**

Some organizations can select regular time periods when no work is done.
For example, most commercial companies work from Monday to
Friday in fixed hours, e.g. from 10am to 6pm. In the rest of the week -
from Monday to Friday at 12am-9am and 6pm-11pm and whole Saturday and Sunday -
no one is working. These time periods we're naming here as _Off Peak_.

Organizations where _Off Peak_ time periods occur probably don't want
to pay for the _Idle_ machines when it's certain that no jobs will be
executed in this time, especially when `IdleCount` is set to a big number.

In the `v1.7` version of the Runner we've added the support for _Off Peak_
configuration. With parameters described in the configuration file you can now
change the `IdleCount` and `IdleTime` values for the _Off Peak_ time mode
periods.

**How does it work?**

Configuration of _Off Peak_ is done by four parameters: `OffPeakPeriods`,
`OffPeakTimezone`, `OffPeakIdleCount` and `OffPeakIdleTime`. The
`OffPeakPeriods` setting contains an array of cron-style patterns defining
when the _Off Peak_ time mode should be set on. For example:

```toml
[runners.machine]
  OffPeakPeriods = [
    "* * 0-9,18-23 * * mon-fri *",
    "* * * * * sat,sun *"
  ]
```

will enable the _Off Peak_ periods described above, so the _working_ days
from 12am to 9am and from 6pm to 11pm and whole weekend days.
The machines scheduler checks all patterns from the array, and if at least one of
them matches the current time, the _Off Peak_ time mode is enabled.

NOTE: **Note:**
The 59th second of the last
minute in any period that you specify will *not* be considered part of the
period. For more information, see [issue #2170](https://gitlab.com/gitlab-org/gitlab-runner/issues/2170).

You can specify the `OffPeakTimezone`, e.g. `"Australia/Sydney"`. If you don't,
the system setting of the host machine of every runner will be used. This
default can be stated as `OffPeakTimezone = "Local"` explicitly if you wish.

When the _Off Peak_ time mode is enabled, the machines scheduler uses
`OffPeakIdleCount` instead of the `IdleCount` setting and `OffPeakIdleTime`
instead of the `IdleTime` setting. The autoscaling algorithm is not changed,
only the parameters. When the machines scheduler discovers that none of
the `OffPeakPeriods` patterns is fulfilled, it switches back to the
`IdleCount` and `IdleTime` settings.

More information about the syntax of `OffPeakPeriods` patterns can be found
in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].

## Distributed runners caching

NOTE: **Note:**
Read how to [install your own cache server](../install/registry_and_cache_servers.md#install-your-own-cache-server).

To speed up your jobs, GitLab Runner provides a [cache mechanism][cache]
where selected directories and/or files are saved and shared between subsequent
jobs.

This works fine when jobs are run on the same host, but when you start
using the Runners autoscale feature, most of your jobs will be running on a
new (or almost new) host, which will execute each job in a new Docker
container. In that case, you will not be able to take advantage of the cache
feature.

To overcome this issue, together with the autoscale feature, the distributed
Runners cache feature was introduced.

It uses a configured object storage server to share the cache between the Docker hosts in use.
When restoring and archiving the cache, GitLab Runner will query the server
and will download or upload the archive respectively.

To enable distributed caching, you have to define it in `config.toml` using the
[`[runners.cache]` directive][runners-cache]:

```bash
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    Path = "path/to/prefix"
    Shared = false
    [runners.cache.s3]
      ServerAddress = "s3.example.com"
      AccessKey = "access-key"
      SecretKey = "secret-key"
      BucketName = "runner"
      Insecure = false
```

In the example above, the S3 URLs follow the structure
`http(s)://<ServerAddress>/<BucketName>/<Path>/runner/<runner-id>/project/<id>/<cache-key>`.

To share the cache between two or more Runners, set the `Shared` flag to true.
That will remove the runner token from the URL (`runner/<runner-id>`) and
all configured Runners will share the same cache. Remember that you can also
set `Path` to separate caches between Runners when cache sharing is enabled.

## Distributed container registry mirroring

NOTE: **Note:**
Read how to [install a container registry](../install/registry_and_cache_servers.md#install-a-proxy-container-registry).

To speed up jobs executed inside of Docker containers, you can use the [Docker
registry mirroring service][registry]. This will provide a proxy between your
Docker machines and all used registries. Images will be downloaded once by the
registry mirror. On each new host, or on an existing host where the image is
not available, it will be downloaded from the configured registry mirror.

Provided that the mirror exists in your Docker machines' LAN, the image
downloading step should be much faster on each host.

To configure the Docker registry mirroring, you have to add `MachineOptions` to
the configuration in `config.toml`:

```bash
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.machine]
    (...)
    MachineOptions = [
      (...)
      "engine-registry-mirror=http://10.11.12.13:12345"
    ]
```

Where `10.11.12.13:12345` is the IP address and port where your registry mirror
is listening for connections from the Docker service. It must be accessible for
each host created by Docker Machine.

## A complete example of `config.toml`

The `config.toml` below uses the [`digitalocean` Docker Machine driver](https://docs.docker.com/machine/drivers/digital-ocean/):

```bash
concurrent = 50   # All registered Runners can run up to 50 concurrent jobs

[[runners]]
  url = "https://gitlab.com"
  token = "RUNNER_TOKEN"             # Note this is different from the registration token used by `gitlab-runner register`
  name = "autoscale-runner"
  executor = "docker+machine"        # This Runner is using the 'docker+machine' executor
  limit = 10                         # This Runner can execute up to 10 jobs (created machines)
  [runners.docker]
    image = "ruby:2.1"               # The default image used for jobs is 'ruby:2.1'
  [runners.machine]
    OffPeakPeriods = [               # Set the Off Peak time mode on for:
      "* * 0-9,18-23 * * mon-fri *", # - Monday to Friday for 12am to 9am and 6pm to 11pm
      "* * * * * sat,sun *"          # - whole Saturday and Sunday
    ]
    OffPeakIdleCount = 1             # There must be 1 machine in Idle state - when Off Peak time mode is on
    OffPeakIdleTime = 1200           # Each machine can be in Idle state up to 1200 seconds (after this it will be removed) - when Off Peak time mode is on
    IdleCount = 5                    # There must be 5 machines in Idle state - when Off Peak time mode is off
    IdleTime = 600                   # Each machine can be in Idle state up to 600 seconds (after this it will be removed) - when Off Peak time mode is off
    MaxBuilds = 100                  # Each machine can handle up to 100 jobs in a row (after this it will be removed)
    MachineName = "auto-scale-%s"    # Each machine will have a unique name ('%s' is required)
    MachineDriver = "digitalocean"   # Docker Machine is using the 'digitalocean' driver
    MachineOptions = [
        "digitalocean-image=coreos-stable",
        "digitalocean-ssh-user=core",
        "digitalocean-access-token=DO_ACCESS_TOKEN",
        "digitalocean-region=nyc2",
        "digitalocean-size=4gb",
        "digitalocean-private-networking",
        "engine-registry-mirror=http://10.11.12.13:12345"   # Docker Machine is using registry mirroring
    ]
  [runners.cache]
    Type = "s3"
    [runners.cache.s3]
      ServerAddress = "s3-eu-west-1.amazonaws.com"
      AccessKey = "AMAZON_S3_ACCESS_KEY"
      SecretKey = "AMAZON_S3_SECRET_KEY"
      BucketName = "runner"
      Insecure = false
```

Note that the `MachineOptions` parameter contains options for the `digitalocean`
driver which is used by Docker Machine to spawn machines hosted on Digital Ocean,
and one option for Docker Machine itself (`engine-registry-mirror`).

[cache]: https://docs.gitlab.com/ee/ci/yaml/README.html#cache
[docker-machine-docs]: https://docs.docker.com/machine/
[docker-machine-driver]: https://docs.docker.com/machine/drivers/
[docker-machine-installation]: https://docs.docker.com/machine/install-machine/
[runners-cache]: advanced-configuration.md#the-runnerscache-section
[runners-machine]: advanced-configuration.md#the-runnersmachine-section
[registry]: https://docs.docker.com/registry/
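
As a rough illustration of how `concurrent`, `limit` and `IdleCount` bound the number of machines, the arithmetic of the two examples from that section can be sketched in Python. The helper names and the formula `min(limit, concurrent + IdleCount)` are assumptions inferred from those two examples, not GitLab Runner's actual implementation. The sketch considers a single `[[runners]]` entry and treats `limit = 0` as "don't limit".

```python
def max_machines(concurrent: int, limit: int, idle_count: int) -> int:
    """Upper bound on machines (running + idle) for one [[runners]] entry.

    Hypothetical helper: mirrors the arithmetic of the examples, not Runner code.
    """
    # At most `concurrent` machines run jobs, plus up to `idle_count` kept idle.
    wanted = concurrent + idle_count
    # `limit = 0` means "don't limit"; otherwise it caps the total.
    return wanted if limit == 0 else min(limit, wanted)


def max_idle_at_full_load(concurrent: int, limit: int, idle_count: int) -> int:
    """Idle machines that can still exist while the maximum number of jobs runs."""
    # Jobs running at once are bounded by `concurrent` and, if set, by `limit`.
    running = concurrent if limit == 0 else min(concurrent, limit)
    return max_machines(concurrent, limit, idle_count) - running


print(max_machines(20, 40, 10))           # 30, as in the first example
print(max_machines(20, 25, 10))           # 25, as in the second example
print(max_idle_at_full_load(20, 25, 10))  # 5 idle machines at most
```

Under these assumptions, raising `IdleCount` only adds machines while `concurrent + IdleCount` stays at or below `limit`.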