# Runners autoscale configuration

Autoscale provides the ability to utilize resources in a more elastic and
dynamic way.

Thanks to Runners being able to autoscale, your infrastructure contains only as
many build instances as necessary at any time. If you configure the Runner to
only use autoscale, the system on which the Runner is installed acts as a
bastion for all the machines it creates.

## Overview

When this feature is enabled and configured properly, builds are executed on
machines created _on demand_. Those machines, after the build is finished, can
wait to run the next builds or can be removed after the configured `IdleTime`.
With many cloud providers, this helps to make better use of the cost of
instances that are already running.

## System requirements

To use the autoscale feature, the system which will host the Runner must have:

- AlloyCI Runner executable - installation guide can be found in
  [AlloyCI Runner Documentation][runner-installation]
- Docker Machine executable - installation guide can be found in
  [Docker Machine documentation][docker-machine-installation]

If you need to use any virtualization/cloud providers that aren't handled by
Docker Machine's internal drivers, the appropriate driver plugin must be
installed. The Docker Machine driver plugin installation and configuration is
out of the scope of this documentation. For more details please read the
[Docker Machine documentation][docker-machine-docs].

## Runner configuration

In this section we describe only the parameters that are significant from the
autoscale feature's point of view. For more configuration details please read
[AlloyCI Runner - Installation][runner-installation]
and [AlloyCI Runner - Advanced Configuration][runner-configuration].

### Runner global options

| Parameter    | Value   | Description |
|--------------|---------|-------------|
| `concurrent` | integer | Limits how many jobs globally can be run concurrently. This is the overall upper limit on the number of jobs using _all_ defined runners, local and autoscale. Together with `limit` (from [`[[runners]]` section](#runners-options)) and `IdleCount` (from [`[runners.machine]` section](advanced-configuration.md#the-runnersmachine-section)) it affects the upper limit of created machines. |

### `[[runners]]` options

| Parameter  | Value   | Description |
|------------|---------|-------------|
| `executor` | string  | To use the autoscale feature, `executor` must be set to `docker+machine` or `docker-ssh+machine`. |
| `limit`    | integer | Limits how many jobs can be handled concurrently by this specific token. 0 simply means don't limit. For autoscale it's the upper limit of machines created by this provider (in conjunction with `concurrent` and `IdleCount`). |

### `[runners.machine]` options

Configuration parameter details can be found
in [AlloyCI Runner - Advanced Configuration - The runners.machine section](advanced-configuration.md#the-runnersmachine-section).

### `[runners.cache]` options

Configuration parameter details can be found
in [AlloyCI Runner - Advanced Configuration - The runners.cache section](advanced-configuration.md#the-runnerscache-section).

### Additional configuration information

There is also a special mode, when you set `IdleCount = 0`. In this mode,
machines are **always** created **on-demand** before each build (if there is no
available machine in _Idle_ state). After the build is finished, the autoscaling
algorithm works
[the same as it is described below](#autoscaling-algorithm-and-parameters).
The machine waits for the next builds, and if none is executed within
the `IdleTime` period, the machine is removed.
If there are no builds, there
are no machines in _Idle_ state.

## Autoscaling algorithm and parameters

The autoscaling algorithm is based on three main parameters: `IdleCount`,
`IdleTime` and `limit`.

We say that each machine that does not run a build is in _Idle_ state. When
AlloyCI Runner is in autoscale mode, it monitors all machines and ensures that
there is always an `IdleCount` of machines in _Idle_ state.

At the same time, AlloyCI Runner checks the duration of the _Idle_ state of
each machine. If the time exceeds the `IdleTime` value, the machine is
automatically removed.

---

**Example:**
Let's suppose that we have configured AlloyCI Runner with the following
autoscale parameters:

```toml
[[runners]]
  limit = 10
  (...)
  executor = "docker+machine"
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    (...)
```

At the beginning, when no builds are queued, AlloyCI Runner starts two machines
(`IdleCount = 2`), and sets them in _Idle_ state. Notice that we have also set
`IdleTime` to 30 minutes (`IdleTime = 1800`).

Now, let's assume that 5 builds are queued in AlloyCI. The first 2 builds are
sent to the two _Idle_ machines. AlloyCI Runner now notices that
the number of _Idle_ machines is less than `IdleCount` (`0 < 2`), so it starts 2 new
machines. Then, the next 2 builds from the queue are sent to those newly created
machines. Again, the number of _Idle_ machines is less than `IdleCount`, so
AlloyCI Runner starts 2 new machines and the last queued build is sent to one of
the _Idle_ machines.

We now have 1 _Idle_ machine, so AlloyCI Runner starts 1 more machine to
satisfy `IdleCount`. Because there are no new builds in the queue, those two
machines stay in _Idle_ state and AlloyCI Runner is satisfied.
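
The walk-through above can be replayed with a small simulation. This is only a sketch of the behaviour as described in this document, not the Runner's actual scheduler: the function name `simulate` and its return format are hypothetical, it assumes `IdleCount` is greater than 0, and it ignores timing, `IdleTime`, and builds finishing. Called with the example values (`IdleCount = 2`, `limit = 10`, 5 queued builds), it ends with 5 busy and 2 idle machines, 7 in total.

```python
def simulate(idle_count, limit, queued_builds):
    """Replay the example: hand builds to idle machines, then top up idle machines."""
    idle = idle_count  # the Runner starts by creating IdleCount idle machines
    busy = 0
    while queued_builds > 0 and idle > 0:
        # Send queued builds to the machines that are currently idle.
        started = min(idle, queued_builds)
        idle -= started
        busy += started
        queued_builds -= started
        # Create new machines until IdleCount is satisfied again,
        # without exceeding `limit` (0 means "don't limit").
        wanted = idle_count - idle
        room = wanted if limit == 0 else max(0, limit - (idle + busy))
        idle += min(wanted, room)
    return {"busy": busy, "idle": idle, "total": busy + idle, "waiting": queued_builds}


# The example from this section: 5 builds, IdleCount = 2, limit = 10.
print(simulate(idle_count=2, limit=10, queued_builds=5))
# {'busy': 5, 'idle': 2, 'total': 7, 'waiting': 0}
```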

---

**This is what happened:**
We had 2 machines, waiting in _Idle_ state for new builds. After the 5 builds
were queued, new machines were created, so in total we had 7 machines. Five of
them were running builds, and 2 were in _Idle_ state, waiting for the next
builds.

The algorithm will still work in the same way; AlloyCI Runner will create a new
_Idle_ machine for each machine used for the build execution until `IdleCount`
is satisfied. Those machines will be created up to the number defined by the
`limit` parameter. If AlloyCI Runner notices that the total number of created
machines has reached `limit`, it will stop autoscaling, and new builds will need to
wait in the build queue until machines start returning to _Idle_ state.

In the above example we will always have two idle machines. The `IdleTime`
applies only when we are over the `IdleCount`; then we try to reduce the number
of machines to `IdleCount`.

---

**Scaling down:**
After the build is finished, the machine is set to _Idle_ state and waits
for the next builds to be executed. Let's suppose that we have no new builds in
the queue. After the time designated by `IdleTime` passes, the _Idle_ machines
will be removed. In our example, after 30 minutes, all machines will be removed
(each machine 30 minutes after its last build execution ended) and AlloyCI
Runner will start to keep an `IdleCount` of _Idle_ machines running, just like
at the beginning of the example.

---

So, to sum up:

1. We start the Runner
2. Runner creates 2 idle machines
3. Runner picks one build
4. Runner creates one more machine to fulfill the strong requirement of always
   having the two idle machines
5. Build finishes, we have 3 idle machines
6. When one of the three idle machines exceeds `IdleTime`, counted from the
   last time it picked up a build, it will be removed
7. The Runner will always have at least 2 idle machines waiting to pick up
   builds quickly

Below you can see a comparison chart of build statuses and machine statuses
over time:

## How `concurrent`, `limit` and `IdleCount` generate the upper limit of running machines

There is no magic equation that will tell you what to set `limit` or
`concurrent` to. Act according to your needs. Having `IdleCount` of _Idle_
machines is a speedup feature. You don't need to wait 10s/20s/30s for the
instance to be created. But as a user, you'd want all your machines (for which
you need to pay) to be running builds, not staying in _Idle_ state. So you should
have `concurrent` and `limit` set to values that will run the maximum count of
machines you are willing to pay for. As for `IdleCount`, it should be set to a
value that will generate a minimum number of _not used_ machines when the build
queue is empty.

Let's assume the following example:

```toml
concurrent=20

[[runners]]
  limit = 40
  [runners.machine]
    IdleCount = 10
```

In the above scenario the total number of machines we could have is 30. The
`limit` of total machines (building and idle) can be 40. We can have 10 idle
machines but the `concurrent` builds are 20. So in total we can have 20
concurrent machines running builds and 10 idle, summing up to 30.

But what happens if the `limit` is less than the total number of machines that
could be created? The example below explains that case:

```toml
concurrent=20

[[runners]]
  limit = 25
  [runners.machine]
    IdleCount = 10
```

In this example we will have at most 20 concurrent builds, and at most 25
machines created.
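
The arithmetic behind both examples can be written down as a short sketch. It assumes a single `[[runners]]` section, treats `limit = 0` as "don't limit" as stated in the options table, and uses a hypothetical helper name, `machine_upper_bound`. It returns the maximum number of created machines and how many of them can be idle while `concurrent` builds are running.

```python
def machine_upper_bound(concurrent, limit, idle_count):
    """Upper bound on created machines for one [[runners]] section (illustrative only)."""
    wanted = concurrent + idle_count  # machines running builds plus idle spares
    total = wanted if limit == 0 else min(limit, wanted)  # `limit` caps the total
    idle_at_full_load = total - min(concurrent, total)  # spares left under full load
    return total, idle_at_full_load


print(machine_upper_bound(concurrent=20, limit=40, idle_count=10))  # (30, 10)
print(machine_upper_bound(concurrent=20, limit=25, idle_count=10))  # (25, 5)
```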
In the worst case scenario regarding idle machines, we will
not be able to have 10 idle machines, but only 5, because the `limit` is 25.

## Off Peak time mode configuration

> Introduced in AlloyCI Runner v1.7

Autoscale can be configured with support for _Off Peak_ time mode periods.

**What is an _Off Peak_ time mode period?**

Some organizations have regular time periods when no work is done.
For example, most commercial companies work from Monday to
Friday during fixed hours, e.g. from 10am to 6pm. For the rest of the week -
from Monday to Friday at 12am-9am and 6pm-11pm and whole Saturday and Sunday -
no one is working. We call these time periods _Off Peak_.

Organizations where _Off Peak_ time periods occur probably don't want
to pay for the _Idle_ machines when it's certain that no builds will be
executed during this time, especially when `IdleCount` is set to a big number.

In the `v1.7` version of the Runner we've added support for _Off Peak_
configuration. With the parameters described in the configuration file you can now
change the `IdleCount` and `IdleTime` values for the _Off Peak_ time mode
periods.

**How does it work?**

Configuration of _Off Peak_ is done by four parameters: `OffPeakPeriods`,
`OffPeakTimezone`, `OffPeakIdleCount` and `OffPeakIdleTime`. The
`OffPeakPeriods` setting contains an array of cron-style patterns defining
when the _Off Peak_ time mode should be set on. For example:

```toml
[runners.machine]
  OffPeakPeriods = [
    "* * 0-9,18-23 * * mon-fri *",
    "* * * * * sat,sun *"
  ]
```

will enable the _Off Peak_ periods described above, so the _working_ days
from 12am to 9am and from 6pm to 11pm and whole weekend days.
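
As a rough illustration of what the two example patterns match, the sketch below hard-codes their meaning instead of parsing cron syntax. It assumes that the third field of each pattern is the hour, that the `0-9` range is inclusive (so 09:00 to 09:59 still counts as _Off Peak_), and that `now` is already expressed in the configured time zone. The helper names `is_off_peak` and `effective_idle_count` are hypothetical and are not part of the Runner.

```python
from datetime import datetime

# Hours matched by "0-9,18-23" in the first example pattern.
OFF_PEAK_HOURS = set(range(0, 10)) | set(range(18, 24))


def is_off_peak(now):
    """Hard-coded equivalent of the two example patterns (no cron parsing)."""
    if now.weekday() >= 5:  # Saturday (5) or Sunday (6): "sat,sun"
        return True
    return now.hour in OFF_PEAK_HOURS  # Monday to Friday: "mon-fri"


def effective_idle_count(now, idle_count, off_peak_idle_count):
    """Pick OffPeakIdleCount during Off Peak periods, IdleCount otherwise."""
    return off_peak_idle_count if is_off_peak(now) else idle_count


print(is_off_peak(datetime(2018, 2, 21, 14, 0)))  # Wednesday, 2pm -> False
print(is_off_peak(datetime(2018, 2, 24, 14, 0)))  # Saturday, 2pm -> True
print(effective_idle_count(datetime(2018, 2, 21, 20, 0), 5, 1))  # Wednesday, 8pm -> 1
```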
The machines
scheduler checks all patterns from the array, and if at least one of
them describes the current time, the _Off Peak_ time mode is enabled.

You can specify the `OffPeakTimezone`, e.g. `"Australia/Sydney"`. If you don't,
the system setting of the host machine of every runner will be used. This
default can be stated explicitly as `OffPeakTimezone = "Local"` if you wish.

When the _Off Peak_ time mode is enabled, the machines scheduler uses
`OffPeakIdleCount` instead of the `IdleCount` setting and `OffPeakIdleTime`
instead of the `IdleTime` setting. The autoscaling algorithm is not changed,
only the parameters. When the machines scheduler discovers that none of
the `OffPeakPeriods` patterns is fulfilled, it switches back to the
`IdleCount` and `IdleTime` settings.

More information about the syntax of `OffPeakPeriods` patterns can be found
in [AlloyCI Runner - Advanced Configuration - The runners.machine section](advanced-configuration.md#the-runnersmachine-section).

## Distributed runners caching

To speed up your builds, AlloyCI Runner provides a [cache mechanism][cache]
where selected directories and/or files are saved and shared between subsequent
builds.

This works fine when builds are run on the same host, but when you start
using the Runners autoscale feature, most of your builds will be running on a
new (or almost new) host, which will execute each build in a new Docker
container. In that case, you will not be able to take advantage of the cache
feature.

To overcome this issue, together with the autoscale feature, the distributed
Runners cache feature was introduced.

It uses any S3-compatible server to share the cache between the Docker hosts in use.
When restoring and archiving the cache, AlloyCI Runner will query the S3 server
and will download or upload the archive.

To enable distributed caching, you have to define it in `config.toml` using the
[`[runners.cache]` directive][runners-cache]:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.cache]
    Type = "s3"
    ServerAddress = "s3.example.com"
    AccessKey = "access-key"
    SecretKey = "secret-key"
    BucketName = "runner"
    Insecure = false
    Path = "path/to/prefix"
    Shared = false
```

The S3 URLs follow the structure `http(s)://<ServerAddress>/<BucketName>/<Path>/runner/<runner-id>/project/<id>/<cache-key>`.

To share the cache between two or more runners, set the `Shared` flag to true. That will remove the runner token from the S3 URL (`runner/<runner-id>`) and all configured runners will share the same cache. Remember that you can also set `Path` to separate caches between runners when cache sharing is enabled.

Read how to [install your own caching server][caching].

## Distributed Docker registry mirroring

To speed up builds executed inside of Docker containers, you can use the [Docker
registry mirroring service][registry]. This will provide a proxy between your
Docker machines and all used registries. Images will be downloaded once by the
registry mirror. On each new host, or on an existing host where the image is
not available, it will be downloaded from the configured registry mirror.

Provided that the mirror exists in your Docker machines' LAN, the image
downloading step should be much faster on each host.

To configure the Docker registry mirroring, you have to add `MachineOptions` to
the configuration in `config.toml`:

```toml
[[runners]]
  limit = 10
  executor = "docker+machine"
  [runners.machine]
    (...)
    MachineOptions = [
      (...)
      "engine-registry-mirror=http://10.11.12.13:12345"
    ]
```

Where `10.11.12.13:12345` is the IP address and port where your registry mirror
is listening for connections from the Docker service. It must be accessible for
each host created by Docker Machine.

Read how to [install your own Docker registry server][registry-server].

## A complete example of `config.toml`

The `config.toml` below uses the `digitalocean` Docker Machine driver:

```toml
concurrent = 50   # All registered Runners can run up to 50 concurrent builds

[[runners]]
  url = "https://alloy-ci.com"
  token = "RUNNER_TOKEN"             # Note this is different from the registration token used by `alloy-runner register`
  name = "autoscale-runner"
  executor = "docker+machine"        # This Runner is using the 'docker+machine' executor
  limit = 10                         # This Runner can execute up to 10 builds (created machines)
  [runners.docker]
    image = "ruby:2.1"               # The default image used for builds is 'ruby:2.1'
  [runners.machine]
    OffPeakPeriods = [               # Set the Off Peak time mode on for:
      "* * 0-9,18-23 * * mon-fri *", # - Monday to Friday for 12am to 9am and 6pm to 11pm
      "* * * * * sat,sun *"          # - whole Saturday and Sunday
    ]
    OffPeakIdleCount = 1             # There must be 1 machine in Idle state - when Off Peak time mode is on
    OffPeakIdleTime = 1200           # Each machine can be in Idle state up to 1200 seconds (after this it will be removed) - when Off Peak time mode is on
    IdleCount = 5                    # There must be 5 machines in Idle state - when Off Peak time mode is off
    IdleTime = 600                   # Each machine can be in Idle state up to 600 seconds (after this it will be removed) - when Off Peak time mode is off
    MaxBuilds = 100                  # Each machine can handle up to 100 builds in a row (after this it will be removed)
    MachineName = "auto-scale-%s"    # Each machine will have a unique name ('%s' is required)
    MachineDriver = "digitalocean"   # Docker Machine is using the 'digitalocean' driver
    MachineOptions = [
        "digitalocean-image=coreos-stable",
        "digitalocean-ssh-user=core",
        "digitalocean-access-token=DO_ACCESS_TOKEN",
        "digitalocean-region=nyc2",
        "digitalocean-size=4gb",
        "digitalocean-private-networking",
        "engine-registry-mirror=http://10.11.12.13:12345"   # Docker Machine is using registry mirroring
    ]
  [runners.cache]
    Type = "s3"   # The Runner is using a distributed cache with Amazon S3 service
    ServerAddress = "s3-eu-west-1.amazonaws.com"
    AccessKey = "AMAZON_S3_ACCESS_KEY"
    SecretKey = "AMAZON_S3_SECRET_KEY"
    BucketName = "runners"
    Insecure = false
```

Note that the `MachineOptions` parameter contains options for the `digitalocean`
driver which is used by Docker Machine to spawn machines hosted on Digital Ocean,
and one option for Docker Machine itself (`engine-registry-mirror`).

## What are the supported cloud providers

The autoscale mechanism is currently based on Docker Machine. Advanced
configuration options, including virtualization/cloud provider parameters, are
available in the [Docker Machine documentation][docker-machine-driver].

[cache]: https://github.com/AlloyCI/alloy_ci/tree/master/doc/json/README.md#cache
[runner-installation]: ../install/autoscaling.md
[runner-configuration]: README.md
[docker-machine-docs]: https://docs.docker.com/machine/
[docker-machine-driver]: https://docs.docker.com/machine/drivers/
[docker-machine-installation]: https://docs.docker.com/machine/install-machine/
[runners-cache]: advanced-configuration.md#the-runnerscache-section
[registry]: https://docs.docker.com/docker-trusted-registry/overview/
[caching]: ../install/autoscaling.md#install-the-cache-server
[registry-server]: ../install/autoscaling.md#install-docker-registry