# Autoscaling AlloyCI Runner on AWS

One of the biggest advantages of AlloyCI Runner is its ability to automatically
spin up and down VMs to make sure your builds get processed immediately. It's a
great feature, and if used correctly, it can be extremely useful in situations
where you don't use your Runners 24/7 and want to have a cost-effective and
scalable solution.

## Introduction

In this tutorial, we'll explore how to properly configure an AlloyCI Runner in
AWS that will serve as the bastion that spawns new Docker machines on demand.

In addition, we'll make use of [Amazon's EC2 Spot instances][spot] which will
greatly reduce the costs of the Runner instances while still using quite
powerful autoscaling machines.

## Prerequisites

NOTE: **Note:**
A familiarity with Amazon Web Services (AWS) is required as this is where most
of the configuration will take place.

Your AlloyCI instance is going to need to talk to the Runners over the network,
and that is something you need to think about when configuring any AWS security
groups or when setting up your DNS configuration.

For example, you can keep the EC2 resources segmented away from public traffic
in a different VPC to strengthen your network security. Your environment
is likely different, so consider what works best for your situation.

### AWS security groups

Docker Machine will attempt to use a
[default security group](https://docs.docker.com/machine/drivers/aws/#security-group)
with rules for port `2376`, which is required for communication with the Docker
daemon. Instead of relying on Docker, you can create a security group with the
rules you need and provide that in the Runner options as we will
[see below](#the-runners-machine-section). This way, you can customize it to your
liking ahead of time based on your networking environment.

### AWS credentials

You'll need an [AWS Access Key](https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html)
tied to a user with permission to scale (EC2) and update the cache (via S3).
Create a new user with [policies](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-policies-for-amazon-ec2.html)
for EC2 (AmazonEC2FullAccess) and S3 (AmazonS3FullAccess). To be more secure,
you can disable console login for that user. Keep the tab open or copy and paste
the security credentials into an editor, as we'll use them later during the
[Runner configuration](#the-runners-machine-section).

## Prepare the bastion instance

The first step is to install AlloyCI Runner in an EC2 instance that will serve
as the bastion that spawns new machines. This doesn't have to be a powerful
machine since it will not run any jobs itself; a `t2.micro` instance will do.
This machine will be a dedicated host since we need it always up and running,
so it will be the only fixed cost.

NOTE: **Note:**
For the bastion instance, choose a distribution that both Docker and AlloyCI
Runner support, for example Ubuntu, Debian, CentOS, or RHEL.

Install the prerequisites:

1. Log in to your server
1. [Install AlloyCI Runner from the official AlloyCI repository](../../install/linux-repository.md)
1. [Install Docker](https://docs.docker.com/engine/installation/#server)
1. [Install Docker Machine](https://docs.docker.com/machine/install-machine/)

Now that the Runner is installed, it's time to register it.

## Registering the AlloyCI Runner

Before configuring the AlloyCI Runner, you need to first register it, so that
it connects with your AlloyCI instance:

1. [Obtain a Runner token](../../README.md)
1. [Register the Runner](../../register/index.md#gnu-linux)
1. When asked for the executor type, enter `docker+machine`

You can now move on to the most important part, configuring the AlloyCI Runner.

TIP: **Tip:**
If you want every user in your instance to be able to use the autoscaled Runners,
register the Runner as a shared one.

## Configuring the AlloyCI Runner

Now that the Runner is registered, you need to edit its configuration file and
add the required options for the AWS machine driver.

Let's first break it down into pieces.

### The global section

In the global section, you can define the limit of the jobs that can be run
concurrently across all Runners (`concurrent`). This heavily depends on your
needs, like how many users your Runners will accommodate, how much time your
builds take, etc. You can start with something low like `10`, and increase or
decrease its value going forward.

The `check_interval` option defines how often the Runner should check AlloyCI
for new jobs, in seconds.

Example:

```toml
concurrent = 10
check_interval = 0
```

[Read more](../advanced-configuration.md#the-global-section)
about all the options you can use.

### The `runners` section

From the `[[runners]]` section, the most important part is the `executor` which
must be set to `docker+machine`. Most of those settings are taken care of when
you register the Runner for the first time.

`limit` sets the maximum number of machines (running and idle) that this Runner
will spawn. For more info check the [relationship between `limit`, `concurrent`
and `IdleCount`](../autoscale.md#how-concurrent-limit-and-idlecount-generate-the-upper-limit-of-running-machines).

Example:

```toml
[[runners]]
  name = "alloy-aws-autoscaler"
  url = "<URL of your AlloyCI instance>"
  token = "<Runner's token>"
  executor = "docker+machine"
  limit = 20
```

[Read more](../advanced-configuration.md#the-runners-section)
about all the options you can use under `[[runners]]`.

### The `runners.docker` section

In the `[runners.docker]` section you can define the default Docker image to
be used by the child Runners if it's not defined in [`.alloy-ci.json`](https://github.com/AlloyCI/alloy_ci/tree/master/doc/json/README.md).
By using `privileged = true`, all Runners will be able to run
[Docker in Docker](https://github.com/AlloyCI/alloy_ci/tree/master/doc/docker/README.md#use-docker-in-docker-executor)
which is useful if you plan to build your own Docker images via AlloyCI.

Next, we use `disable_cache = true` to disable the Docker executor's inner
cache mechanism since we will use the distributed cache mode as described
in the following section.

Example:

```toml
  [runners.docker]
    image = "alpine"
    privileged = true
    disable_cache = true
```

[Read more](../advanced-configuration.md#the-runners-docker-section)
about all the options you can use under `[runners.docker]`.

### The `runners.cache` section

To speed up your jobs, AlloyCI Runner provides a cache mechanism where selected
directories and/or files are saved and shared between subsequent jobs.
While not required for this setup, it is recommended to use the distributed cache
mechanism that AlloyCI Runner provides. Since new instances will be created on
demand, it is essential to have a common place where the cache is stored.

In the following example, we use Amazon S3:

```toml
  [runners.cache]
    Type = "s3"
    ServerAddress = "s3.amazonaws.com"
    AccessKey = "<your AWS Access Key ID>"
    SecretKey = "<your AWS Secret Access Key>"
    BucketName = "<the bucket where your cache should be kept>"
    BucketLocation = "us-east-1"
    Shared = true
```

Here's some more info to further explore the cache mechanism:

- [Reference for `runners.cache`](../advanced-configuration.md#the-runners-cache-section)
- [Deploying and using a cache server for AlloyCI Runner](../autoscale.md#distributed-runners-caching)
- [How cache works](https://github.com/AlloyCI/alloy_ci/tree/master/doc/json/README.md#cache)

### The `runners.machine` section

This is the most important part of the configuration and it's the one that
tells AlloyCI Runner how and when to spawn new or remove old Docker Machine
instances.

We will focus on the AWS machine options. For the rest of the settings, read
about the:

- [Autoscaling algorithm and the parameters it's based on](../autoscale.md#autoscaling-algorithm-and-parameters) - depends on the needs of your organization
- [Off peak time configuration](../autoscale.md#off-peak-time-mode-configuration) - useful when there are regular time periods in your organization when no work is done, for example weekends

Here's an example of the `runners.machine` section:

```toml
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MaxBuilds = 10
    OffPeakPeriods = [
      "* * 0-9,18-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]
    OffPeakIdleCount = 0
    OffPeakIdleTime = 1200
    MachineDriver = "amazonec2"
    MachineName = "alloy-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=XXXX",
      "amazonec2-secret-key=XXXX",
      "amazonec2-region=eu-central-1",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-use-private-address=true",
      "amazonec2-tags=runner-manager-name,alloy-aws-autoscaler,alloy,true,alloy-runner-autoscale,true",
      "amazonec2-security-group=docker-machine-scaler",
      "amazonec2-instance-type=m4.2xlarge",
    ]
```

The Docker Machine driver is set to `amazonec2` and the machine name has a
standard prefix followed by `%s` (required) that is replaced by the ID of the
child Runner: `alloy-docker-machine-%s`.

Now, depending on your AWS infrastructure, there are many options you can set up
under `MachineOptions`. Below you can see the most common ones.

| Machine option | Description |
| -------------- | ----------- |
| `amazonec2-access-key=XXXX` | The AWS access key of the user that has permissions to create EC2 instances, see [AWS credentials](#aws-credentials). |
| `amazonec2-secret-key=XXXX` | The AWS secret key of the user that has permissions to create EC2 instances, see [AWS credentials](#aws-credentials). |
| `amazonec2-region=eu-central-1` | The region to use when launching the instance. You can omit this entirely and the default `us-east-1` will be used. |
| `amazonec2-vpc-id=vpc-xxxxx` | Your [VPC ID](https://docs.docker.com/machine/drivers/aws/#vpc-id) to launch the instance in. |
| `amazonec2-subnet-id=subnet-xxxxx` | The AWS VPC subnet ID. |
| `amazonec2-use-private-address=true` | Use the private IP address of Docker Machines, but still create a public IP address. Useful to keep the traffic internal and avoid extra costs. |
| `amazonec2-tags=runner-manager-name,alloy-aws-autoscaler,alloy,true,alloy-runner-autoscale,true` | AWS extra tag key-value pairs, useful to identify the instances on the AWS console. The "Name" tag is set to the machine name by default. We set the "runner-manager-name" to match the Runner name set in `[[runners]]`, so that we can filter all the EC2 instances created by a specific manager setup. Read more about [using tags in AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html). |
| `amazonec2-security-group=docker-machine-scaler` | AWS VPC security group name, see [AWS security groups](#aws-security-groups). |
| `amazonec2-instance-type=m4.2xlarge` | The instance type that the child Runners will run on. |

TIP: **Tip:**
Under `MachineOptions` you can add anything that the [AWS Docker Machine driver
supports](https://docs.docker.com/machine/drivers/aws/#options). You are highly
encouraged to read Docker's docs as your infrastructure setup may warrant
different options to be applied.

NOTE: **Note:**
By default, the child instances use Ubuntu 16.04 unless you choose a
different AMI ID by setting `amazonec2-ami`.

NOTE: **Note:**
If you specify `amazonec2-private-address-only=true` as one of the machine
options, your EC2 instance won't get assigned a public IP. This is OK if your
VPC is configured correctly with an Internet Gateway (IGW) and routing is fine,
but it's something to consider if you've got a more complex configuration. Read
more in [Docker docs about VPC connectivity](https://docs.docker.com/machine/drivers/aws/#vpc-connectivity).

[Read more](../advanced-configuration.md#the-runners-machine-section)
about all the options you can use under `[runners.machine]`.
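
As a rough sketch of how the two notes above could translate into
configuration, the fragment below pins a custom AMI and disables the public IP
for the child instances. The AMI ID is a made-up placeholder, the other values
are the sample ones used in this guide, and whether a private-only setup works
depends on your VPC routing, so treat it as an illustration rather than a
recommended setup:

```toml
  [runners.machine]
    MachineDriver = "amazonec2"
    MachineName = "alloy-docker-machine-%s"
    MachineOptions = [
      # Placeholder AMI ID (hypothetical); use an image that exists in your region.
      "amazonec2-ami=ami-xxxxxxxx",
      # No public IP is assigned; the bastion must reach the machines privately.
      "amazonec2-private-address-only=true",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-security-group=docker-machine-scaler",
    ]
```

With `amazonec2-private-address-only=true`, the bastion and the child instances
need to share a network path (for example the same VPC), since there is no
public address to fall back on.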
### Getting it all together

Here's the full example of `/etc/alloy-runner/config.toml`:

```toml
concurrent = 10
check_interval = 0

[[runners]]
  name = "alloy-aws-autoscaler"
  url = "<URL of your AlloyCI instance>"
  token = "<Runner's token>"
  executor = "docker+machine"
  limit = 20
  [runners.docker]
    image = "alpine"
    privileged = true
    disable_cache = true
  [runners.cache]
    Type = "s3"
    ServerAddress = "s3.amazonaws.com"
    AccessKey = "<your AWS Access Key ID>"
    SecretKey = "<your AWS Secret Access Key>"
    BucketName = "<the bucket where your cache should be kept>"
    BucketLocation = "us-east-1"
    Shared = true
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MaxBuilds = 10
    OffPeakPeriods = [
      "* * 0-9,18-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]
    OffPeakIdleCount = 0
    OffPeakIdleTime = 1200
    MachineDriver = "amazonec2"
    MachineName = "alloy-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=XXXX",
      "amazonec2-secret-key=XXXX",
      "amazonec2-region=eu-central-1",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-use-private-address=true",
      "amazonec2-tags=runner-manager-name,alloy-aws-autoscaler,alloy,true,alloy-runner-autoscale,true",
      "amazonec2-security-group=docker-machine-scaler",
      "amazonec2-instance-type=m4.2xlarge",
    ]
```

## Cutting down costs with Amazon EC2 Spot instances

As [described by][spot] Amazon:

> Amazon EC2 Spot instances allow you to bid on spare Amazon EC2 computing capacity.
> Since Spot instances are often available at a discount compared to On-Demand
> pricing, you can significantly reduce the cost of running your applications,
> grow your application's compute capacity and throughput for the same budget,
> and enable new types of cloud computing applications.

In addition to the [`runners.machine`](#the-runners-machine-section) options
you picked above, in `/etc/alloy-runner/config.toml` under the `MachineOptions`
section, add the following:

```toml
    MachineOptions = [
      # Keep the options you already set above, then add:
      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=0.03",
      "amazonec2-block-duration-minutes=60"
    ]
```

With this configuration, Docker Machines are created on Spot instances with a
maximum bid price of $0.03 per hour and the duration of the Spot instance is
capped at 60 minutes. The `0.03` number mentioned above is just an example, so
be sure to check the current pricing for the region you picked.

To learn more about Amazon EC2 Spot instances, visit the following links:

- https://aws.amazon.com/ec2/spot/
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html
- https://aws.amazon.com/blogs/aws/focusing-on-spot-instances-lets-talk-about-best-practices/

### Caveats of Spot instances

While Spot instances are a great way to use unused resources and minimize the
costs of your infrastructure, you must be aware of the implications.

Running CI jobs on Spot instances may increase the failure rates because of the
Spot instances pricing model. If the price exceeds your bid, the existing Spot
instances will be immediately terminated and all your jobs on that host will fail.

As a consequence, the autoscaling Runner fails to create new machines while it
keeps requesting new instances. It eventually makes 60 requests, after which
AWS won't accept any more. Then, even once the Spot price is acceptable again,
you are locked out for a while because the call limit has been exceeded.

If you encounter that case, you can use the following command in the bastion
machine to see the Docker Machines state:

```sh
docker-machine ls -q --filter state=Error --format "{{.NAME}}"
```

NOTE: **Note:**
There are some issues regarding making AlloyCI Runner gracefully handle Spot
price changes, and there are reports of `docker-machine` attempting to
continually remove a Docker Machine. AlloyCI has provided patches for both cases
in the upstream project. For more information, see issues
[#2771](https://gitlab.com/gitlab-org/gitlab-runner/issues/2771) and
[#2772](https://gitlab.com/gitlab-org/gitlab-runner/issues/2772).

## Conclusion

In this guide we learned how to install and configure an AlloyCI Runner in
autoscale mode on AWS.

Using the autoscale feature of AlloyCI Runner can save you both time and money.
Using the Spot instances that AWS provides can save you even more, but you must
be aware of the implications. As long as your bid is high enough, there shouldn't
be an issue.

You can read the following use cases, which (heavily) influenced this tutorial:

- [HumanGeo - Scaling GitLab CI](http://blog.thehumangeo.com/gitlab-autoscale-runners.html)
- [Substrakt Health - Autoscale GitLab CI Runners and save 90% on EC2 costs](https://substrakthealth.com/news/gitlab-ci-cost-savings/)

[spot]: https://aws.amazon.com/ec2/spot/