# Runners autoscale configuration

> The autoscale feature was introduced in GitLab Runner 1.1.0.

Autoscale provides the ability to utilize resources in a more elastic and
dynamic way.

Thanks to Runners being able to autoscale, your infrastructure contains only as
many build instances as necessary at any time. If you configure the Runner to
only use autoscale, the system on which the Runner is installed acts as a
bastion for all the machines it creates.

## Overview

When this feature is enabled and configured properly, jobs are executed on
machines created _on demand_. After a job is finished, those machines can
wait to run the next jobs or can be removed after the configured `IdleTime`.
With many cloud providers, this helps to make better use of the cost of
instances that are already running.

Below, you can see a real life example of the runners autoscale feature, tested
on GitLab.com for the [GitLab Community Edition][ce] project:

Each machine on the chart is an independent cloud instance, running jobs
inside of Docker containers.

[ce]: https://gitlab.com/gitlab-org/gitlab-ce

## System requirements

At this point you should have
[installed all the requirements](../executors/docker_machine.md#preparing-the-environment).
If not, make sure to do it before going over the configuration.

## Supported cloud providers

The autoscale mechanism is based on [Docker Machine](https://docs.docker.com/machine/overview/).
All supported virtualization/cloud provider parameters are available in the
[Docker Machine drivers documentation](https://docs.docker.com/machine/drivers/).

## Runner configuration

In this section we describe only the parameters that are significant from the
autoscale feature point of view.
For more configuration details, read the
[advanced configuration](advanced-configuration.md).

### Runner global options

| Parameter    | Value   | Description |
|--------------|---------|-------------|
| `concurrent` | integer | Limits how many jobs can run concurrently across the whole Runner process. This is the uppermost limit on the number of jobs using _all_ defined runners, local and autoscale. Together with `limit` (from the [`[[runners]]` section](#runners-options)) and `IdleCount` (from the [`[runners.machine]` section][runners-machine]), it affects the upper limit of created machines. |

### `[[runners]]` options

| Parameter  | Value            | Description |
|------------|------------------|-------------|
| `executor` | string           | To use the autoscale feature, `executor` must be set to `docker+machine` or `docker-ssh+machine`. |
| `limit`    | integer          | Limits how many jobs can be handled concurrently by this specific token. `0` means no limit. For autoscale, it is the upper limit of machines created by this provider (in conjunction with `concurrent` and `IdleCount`). |

### `[runners.machine]` options

Details of the configuration parameters can be found
in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].

### `[runners.cache]` options

Details of the configuration parameters can be found
in [GitLab Runner - Advanced Configuration - The `[runners.cache]` section][runners-cache].

### Additional configuration information

There is also a special mode, enabled by setting `IdleCount = 0`. In this mode,
machines are **always** created **on-demand** before each job (if there is no
available machine in _Idle_ state). After the job is finished, the autoscaling
algorithm works
[the same as it is described below](#autoscaling-algorithm-and-parameters).
The machine waits for the next jobs, and if none is executed, the machine is
removed after the `IdleTime` period.
If there are no jobs, there
are no machines in _Idle_ state.

## Autoscaling algorithm and parameters

The autoscaling algorithm is based on three main parameters: `IdleCount`,
`IdleTime` and `limit`.

We say that each machine that does not run a job is in _Idle_ state. When
GitLab Runner is in autoscale mode, it monitors all machines and ensures that
there is always an `IdleCount` of machines in _Idle_ state.

At the same time, GitLab Runner checks the duration of the _Idle_ state of
each machine. If the time exceeds the `IdleTime` value, the machine is
automatically removed.

---

**Example:**
Let's suppose that we have configured GitLab Runner with the following
autoscale parameters:

```toml
[[runners]]
  limit = 10
  (...)
  executor = "docker+machine"
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    (...)
```

At the beginning, when no jobs are queued, GitLab Runner starts two machines
(`IdleCount = 2`), and sets them in _Idle_ state. Notice that we have also set
`IdleTime` to 30 minutes (`IdleTime = 1800`).

Now, let's assume that 5 jobs are queued in GitLab CI. The first 2 jobs are
sent to the two _Idle_ machines. GitLab Runner now notices that
the number of _Idle_ machines is less than `IdleCount` (`0 < 2`), so it starts
2 new machines. Then, the next 2 jobs from the queue are sent to those newly
created machines. Again, the number of _Idle_ machines is less than
`IdleCount`, so GitLab Runner starts 2 new machines and the last queued job is
sent to one of the _Idle_ machines.

We now have 1 _Idle_ machine, so GitLab Runner starts 1 more machine to
satisfy `IdleCount`. Because there are no new jobs in the queue, those two
machines stay in _Idle_ state and GitLab Runner is satisfied.

---

**This is what happened:**
We had 2 machines, waiting in _Idle_ state for new jobs.
After the 5 jobs
were queued, new machines were created, so in total we had 7 machines. Five of
them were running jobs, and 2 were in _Idle_ state, waiting for the next
jobs.

The algorithm keeps working in the same way: GitLab Runner creates a new
_Idle_ machine for each machine used for job execution until `IdleCount`
is satisfied. Those machines are created up to the number defined by the
`limit` parameter. If GitLab Runner notices that the total number of created
machines has reached `limit`, it stops autoscaling, and new jobs need to
wait in the job queue until machines start returning to _Idle_ state.

In the above example we will always have two idle machines. The `IdleTime`
applies only when we are over the `IdleCount`. In that case, we try to reduce
the number of machines to `IdleCount`.

---

**Scaling down:**
After the job is finished, the machine is set to _Idle_ state and waits
for the next jobs to be executed. Let's suppose that we have no new jobs in
the queue. After the time designated by `IdleTime` passes, the _Idle_ machines
are removed. In our example, all machines are removed after 30 minutes
(each machine 30 minutes after its last job execution ended), and GitLab
Runner starts to keep an `IdleCount` of _Idle_ machines running, just like
at the beginning of the example.

---

So, to sum up:

1. We start the Runner.
1. The Runner creates 2 idle machines.
1. The Runner picks one job.
1. The Runner creates one more machine to fulfill the strong requirement of
   always having two idle machines.
1. The job finishes, and we have 3 idle machines.
1. When one of the three idle machines has been idle for longer than
   `IdleTime` since it last picked up a job, it is removed.
1. The Runner always has at least 2 idle machines waiting to pick up jobs
   quickly.

Below you can see a comparison chart of job statuses and machine statuses
over time:

## How `concurrent`, `limit` and `IdleCount` generate the upper limit of running machines

There is no magic equation that will tell you what to set `limit` or
`concurrent` to. Act according to your needs. Having `IdleCount` of _Idle_
machines is a speedup feature: you don't need to wait 10s/20s/30s for the
instance to be created. But as a user, you'd want all your machines (for which
you need to pay) to be running jobs, not staying in _Idle_ state. So you should
set `concurrent` and `limit` to values that will run the maximum count of
machines you are willing to pay for. As for `IdleCount`, it should be set to a
value that will generate a minimum number of _unused_ machines when the job
queue is empty.

Let's assume the following example:

```toml
concurrent = 20

[[runners]]
  limit = 40
  [runners.machine]
    IdleCount = 10
```

In the above scenario, the total number of machines we could have is 30. The
`limit` allows up to 40 machines in total (building and idle), but only 20
jobs can run concurrently and only 10 machines are kept idle. So in total we
can have 20 machines running jobs and 10 idle, summing up to 30.

But what happens if the `limit` is less than the total number of machines that
could be created? The example below explains that case:

```toml
concurrent = 20

[[runners]]
  limit = 25
  [runners.machine]
    IdleCount = 10
```

In this example we will have at most 20 concurrent jobs, and at most 25
machines created. In the worst case scenario regarding idle machines, we will
not be able to have 10 idle machines, but only 5, because the `limit` is 25.
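
The arithmetic of the two examples above can be written down as a rule of thumb. The following Python sketch is only an illustration derived from those examples, not code taken from GitLab Runner. The helper names `max_machines` and `max_idle_at_full_load` are hypothetical, and treating `limit = 0` as "no limit" follows the description of `limit` in the `[[runners]]` options.

```python
def max_machines(concurrent: int, limit: int, idle_count: int) -> int:
    """Estimate the upper bound on machines (busy + idle) for one runner.

    Hypothetical helper: it mirrors the arithmetic of the examples in this
    section, not the actual scheduler implementation.
    """
    # At most `concurrent` machines run jobs, plus `idle_count` kept waiting.
    wanted = concurrent + idle_count
    # `limit = 0` means "don't limit"; otherwise it caps the total.
    return wanted if limit == 0 else min(wanted, limit)


def max_idle_at_full_load(concurrent: int, limit: int, idle_count: int) -> int:
    """Idle machines that can still exist while `concurrent` jobs are running."""
    return max(0, max_machines(concurrent, limit, idle_count) - concurrent)


print(max_machines(20, 40, 10))           # 30
print(max_machines(20, 25, 10))           # 25
print(max_idle_at_full_load(20, 25, 10))  # 5
```

Running the sketch reproduces the numbers from both scenarios: 30 machines when `limit = 40`, and 25 machines with at most 5 idle when `limit = 25`.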

## Off Peak time mode configuration

> Introduced in GitLab Runner v1.7

Autoscale can be configured with support for _Off Peak_ time mode periods.

**What is an _Off Peak_ time mode period?**

Some organizations have regular time periods when no work is done.
For example, most commercial companies work from Monday to
Friday during fixed hours, e.g. from 10am to 6pm. During the rest of the week
(Monday to Friday from 12am to 10am and from 6pm to 12am, and all of Saturday
and Sunday) no one is working. We call these time periods _Off Peak_.

Organizations that have _Off Peak_ time periods probably don't want
to pay for _Idle_ machines when it's certain that no jobs will be
executed during that time, especially when `IdleCount` is set to a big number.

Version `v1.7` of the Runner added support for _Off Peak_
configuration. With the parameters described in the configuration file, you can
change the `IdleCount` and `IdleTime` values for the _Off Peak_ time mode
periods.

**How does it work?**

_Off Peak_ is configured with four parameters: `OffPeakPeriods`,
`OffPeakTimezone`, `OffPeakIdleCount` and `OffPeakIdleTime`. The
`OffPeakPeriods` setting contains an array of cron-style patterns defining
when the _Off Peak_ time mode should be on. For example:

```toml
[runners.machine]
  OffPeakPeriods = [
    "* * 0-9,18-23 * * mon-fri *",
    "* * * * * sat,sun *"
  ]
```

enables the _Off Peak_ periods described above: the _working_ days
from 12am to 10am and from 6pm to 12am (hours 0-9 and 18-23), plus the whole
weekend. The machine scheduler checks all patterns in the array, and if at
least one of them describes the current time, the _Off Peak_ time mode is
enabled.
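
To make the "at least one pattern matches" rule concrete, here is a minimal Python sketch. It is not GitLab Runner code and does not parse cron-style patterns. Instead, it hard-codes the two example periods as hypothetical predicate functions, models only the hour and day-of-week fields, and ignores time zones.

```python
from datetime import datetime


def weekday_outside_hours(now: datetime) -> bool:
    # Models "* * 0-9,18-23 * * mon-fri *": Monday is 0, Friday is 4.
    return now.weekday() <= 4 and (now.hour <= 9 or now.hour >= 18)


def weekend(now: datetime) -> bool:
    # Models "* * * * * sat,sun *": Saturday is 5, Sunday is 6.
    return now.weekday() >= 5


# Hypothetical stand-in for the OffPeakPeriods array.
OFF_PEAK_PERIODS = [weekday_outside_hours, weekend]


def is_off_peak(now: datetime) -> bool:
    # Off Peak mode is on if at least one period describes the given time.
    return any(period(now) for period in OFF_PEAK_PERIODS)


print(is_off_peak(datetime(2019, 6, 3, 9, 30)))  # Monday 09:30 -> True
print(is_off_peak(datetime(2019, 6, 3, 14, 0)))  # Monday 14:00 -> False
print(is_off_peak(datetime(2019, 6, 8, 14, 0)))  # Saturday 14:00 -> True
```

Note that Monday 09:30 counts as _Off Peak_ because the hour range `0-9` covers every minute of hour 9.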

NOTE: **Note:**
The 59th second of the last
minute in any period that you specify will *not* be considered part of the
period. For more information, see [issue #2170](https://gitlab.com/gitlab-org/gitlab-runner/issues/2170).

You can specify the `OffPeakTimezone`, e.g. `"Australia/Sydney"`. If you don't,
the system setting of the host machine of every runner will be used. This
default can be stated explicitly as `OffPeakTimezone = "Local"` if you wish.

When the _Off Peak_ time mode is enabled, the machine scheduler uses the
`OffPeakIdleCount` setting instead of `IdleCount` and the `OffPeakIdleTime`
setting instead of `IdleTime`. The autoscaling algorithm is not changed,
only the parameters. When the machine scheduler discovers that none of
the `OffPeakPeriods` patterns is fulfilled, it switches back to the
`IdleCount` and `IdleTime` settings.

More information about the syntax of `OffPeakPeriods` patterns can be found
in [GitLab Runner - Advanced Configuration - The `[runners.machine]` section][runners-machine].

## Distributed runners caching

NOTE: **Note:**
Read how to [install your own cache server](../install/registry_and_cache_servers.md#install-your-own-cache-server).

To speed up your jobs, GitLab Runner provides a [cache mechanism][cache]
where selected directories and/or files are saved and shared between subsequent
jobs.

This works fine when jobs are run on the same host, but when you start
using the Runners autoscale feature, most of your jobs will be running on a
new (or almost new) host, which will execute each job in a new Docker
container. In that case, you will not be able to take advantage of the cache
feature.

To overcome this issue, the distributed Runners cache feature was introduced
together with the autoscale feature.

It uses a configured object storage server to share the cache between the Docker hosts in use.
When restoring and archiving the cache, GitLab Runner will query the server
and will download or upload the archive respectively.

To enable distributed caching, you have to define it in `config.toml` using the
[`[runners.cache]` directive][runners-cache]:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    Path = "path/to/prefix"
    Shared = false
    [runners.cache.s3]
      ServerAddress = "s3.example.com"
      AccessKey = "access-key"
      SecretKey = "secret-key"
      BucketName = "runner"
      Insecure = false
```

In the example above, the S3 URLs follow the structure
`http(s)://<ServerAddress>/<BucketName>/<Path>/runner/<runner-id>/project/<id>/<cache-key>`.

To share the cache between two or more Runners, set the `Shared` flag to true.
That will remove the runner token from the URL (`runner/<runner-id>`) and
all configured Runners will share the same cache. Remember that you can also
set `Path` to separate caches between Runners when cache sharing is enabled.

## Distributed container registry mirroring

NOTE: **Note:**
Read how to [install a container registry](../install/registry_and_cache_servers.md#install-a-proxy-container-registry).

To speed up jobs executed inside of Docker containers, you can use the [Docker
registry mirroring service][registry]. This will provide a proxy between your
Docker machines and all used registries. Images will be downloaded once by the
registry mirror. On each new host, or on an existing host where the image is
not available, it will be downloaded from the configured registry mirror.

Provided that the mirror exists in your Docker machines' LAN, the image
downloading step should be much faster on each host.

To configure the Docker registry mirroring, you have to add `MachineOptions` to
the configuration in `config.toml`:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.machine]
    (...)
    MachineOptions = [
      (...)
      "engine-registry-mirror=http://10.11.12.13:12345"
    ]
```

Where `10.11.12.13:12345` is the IP address and port where your registry mirror
is listening for connections from the Docker service. It must be accessible to
each host created by Docker Machine.

## A complete example of `config.toml`

The `config.toml` below uses the [`digitalocean` Docker Machine driver](https://docs.docker.com/machine/drivers/digital-ocean/):

```toml
concurrent = 50   # All registered Runners can run up to 50 concurrent jobs

[[runners]]
  url = "https://gitlab.com"
  token = "RUNNER_TOKEN"             # Note this is different from the registration token used by `gitlab-runner register`
  name = "autoscale-runner"
  executor = "docker+machine"        # This Runner is using the 'docker+machine' executor
  limit = 10                         # This Runner can execute up to 10 jobs (created machines)
  [runners.docker]
    image = "ruby:2.1"               # The default image used for jobs is 'ruby:2.1'
  [runners.machine]
    OffPeakPeriods = [               # Set the Off Peak time mode on for:
      "* * 0-9,18-23 * * mon-fri *", # - Monday to Friday, hours 0-9 and 18-23
      "* * * * * sat,sun *"          # - whole Saturday and Sunday
    ]
    OffPeakIdleCount = 1             # There must be 1 machine in Idle state - when Off Peak time mode is on
    OffPeakIdleTime = 1200           # Each machine can be in Idle state up to 1200 seconds (after this it will be removed) - when Off Peak time mode is on
    IdleCount = 5                    # There must be 5 machines in Idle state - when Off Peak time mode is off
    IdleTime = 600                   # Each machine can be in Idle state up to 600 seconds (after this it will be removed) - when Off Peak time mode is off
    MaxBuilds = 100                  # Each machine can handle up to 100 jobs in a row (after this it will be removed)
    MachineName = "auto-scale-%s"    # Each machine will have a unique name ('%s' is required)
    MachineDriver = "digitalocean"   # Docker Machine is using the 'digitalocean' driver
    MachineOptions = [
        "digitalocean-image=coreos-stable",
        "digitalocean-ssh-user=core",
        "digitalocean-access-token=DO_ACCESS_TOKEN",
        "digitalocean-region=nyc2",
        "digitalocean-size=4gb",
        "digitalocean-private-networking",
        "engine-registry-mirror=http://10.11.12.13:12345"   # Docker Machine is using registry mirroring
    ]
  [runners.cache]
    Type = "s3"
    [runners.cache.s3]
      ServerAddress = "s3-eu-west-1.amazonaws.com"
      AccessKey = "AMAZON_S3_ACCESS_KEY"
      SecretKey = "AMAZON_S3_SECRET_KEY"
      BucketName = "runner"
      Insecure = false
```

Note that the `MachineOptions` parameter contains options for the `digitalocean`
driver, which is used by Docker Machine to spawn machines hosted on Digital Ocean,
and one option for Docker Machine itself (`engine-registry-mirror`).

[cache]: https://docs.gitlab.com/ee/ci/yaml/README.html#cache
[docker-machine-docs]: https://docs.docker.com/machine/
[docker-machine-driver]: https://docs.docker.com/machine/drivers/
[docker-machine-installation]: https://docs.docker.com/machine/install-machine/
[runners-cache]: advanced-configuration.md#the-runnerscache-section
[runners-machine]: advanced-configuration.md#the-runnersmachine-section
[registry]: https://docs.docker.com/registry/