# Autoscaling AlloyCI Runner on AWS

One of the biggest advantages of AlloyCI Runner is its ability to automatically
spin up and down VMs to make sure your builds get processed immediately. It's a
great feature, and if used correctly, it can be extremely useful in situations
where you don't use your Runners 24/7 and want to have a cost-effective and
scalable solution.

## Introduction

In this tutorial, we'll explore how to properly configure an AlloyCI Runner in
AWS that will serve as the bastion that spawns new Docker machines on
demand.

In addition, we'll make use of [Amazon's EC2 Spot instances][spot], which will
greatly reduce the costs of the Runner instances while still using quite
powerful autoscaling machines.

## Prerequisites

NOTE: **Note:**
A familiarity with Amazon Web Services (AWS) is required as this is where most
of the configuration will take place.

Your AlloyCI instance is going to need to talk to the Runners over the network,
and that is something you need to think about when configuring any AWS security
groups or when setting up your DNS configuration.

For example, you can keep the EC2 resources segmented away from public traffic
in a different VPC to better strengthen your network security. Your environment
is likely different, so consider what works best for your situation.

### AWS security groups

Docker Machine will attempt to use a
[default security group](https://docs.docker.com/machine/drivers/aws/#security-group)
with rules for port `2376`, which is required for communication with the Docker
daemon. Instead of relying on Docker, you can create a security group with the
rules you need and provide that in the Runner options as we will
[see below](#the-runners-machine-section). This way, you can customize it to your
liking ahead of time based on your networking environment.
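
As a rough sketch of that approach, the function below creates such a group
with the AWS CLI. It assumes the CLI is installed and configured; the function
name, its arguments, and the CIDR range are hypothetical placeholders, and
`docker-machine-scaler` simply matches the group name used in the examples on
this page. Port `22` is opened in addition to `2376` on the assumption that
Docker Machine provisions new instances over SSH.

```sh
# Sketch only: create a security group for the autoscaled machines.
# Usage: create_runner_security_group <vpc-id> <cidr> [group-name]
create_runner_security_group() (
  set -e
  vpc_id="$1"                                # for example vpc-xxxxx
  cidr="$2"                                  # range the bastion connects from
  group_name="${3:-docker-machine-scaler}"   # name later passed to the Runner

  # Create the group and capture only its ID.
  group_id=$(aws ec2 create-security-group \
    --group-name "$group_name" \
    --description "Docker Machine instances spawned by the Runner" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text)

  # 2376: Docker daemon. 22: SSH (assumed to be needed for provisioning).
  for port in 2376 22; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$group_id" --protocol tcp --port "$port" --cidr "$cidr" \
      >/dev/null
  done

  echo "$group_id"
)
```

For example, `create_runner_security_group vpc-xxxxx 10.0.0.0/16` prints the ID
of the new group. Restricting the CIDR range to the network the bastion lives
in keeps both ports closed to public traffic.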

### AWS credentials

You'll need an [AWS Access Key](https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html)
tied to a user with permission to scale (EC2) and update the cache (via S3).
Create a new user with [policies](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-policies-for-amazon-ec2.html)
for EC2 (AmazonEC2FullAccess) and S3 (AmazonS3FullAccess). To be more secure,
you can disable console login for that user. Keep the tab open or copy and paste
the security credentials into an editor, as we'll use them later during the
[Runner configuration](#the-runners-machine-section).
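
The user described above could also be created from the command line. The
following is a minimal sketch, assuming an AWS CLI that is already configured
with credentials allowed to manage IAM; the function name and the user name are
hypothetical. A user created this way has no console password unless you
explicitly add one, so console login stays disabled.

```sh
# Sketch only: create the Runner's IAM user, attach the two managed
# policies named above, and print a fresh access key pair.
# Usage: create_runner_iam_user <user-name>
create_runner_iam_user() (
  set -e
  user_name="$1"

  aws iam create-user --user-name "$user_name" >/dev/null

  for policy in AmazonEC2FullAccess AmazonS3FullAccess; do
    aws iam attach-user-policy \
      --user-name "$user_name" \
      --policy-arn "arn:aws:iam::aws:policy/$policy"
  done

  # Prints "<access key ID> <secret access key>". The secret is shown
  # only once, so store it somewhere safe.
  aws iam create-access-key --user-name "$user_name" \
    --query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text
)
```

Calling `create_runner_iam_user alloy-runner-autoscaler` prints the two values
that later go into the cache and machine options.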

## Prepare the bastion instance

The first step is to install AlloyCI Runner in an EC2 instance that will serve
as the bastion that spawns new machines. This doesn't have to be a powerful
machine since it will not run any jobs itself; a `t2.micro` instance will do.
This machine will be a dedicated host since we need it always up and running,
so it will be the only fixed cost.

NOTE: **Note:**
For the bastion instance, choose a distribution that both Docker and AlloyCI
Runner support; for example, Ubuntu, Debian, CentOS, or RHEL will work fine.

Install the prerequisites:

1. Log in to your server
1. [Install AlloyCI Runner from the official AlloyCI repository](../../install/linux-repository.md)
1. [Install Docker](https://docs.docker.com/engine/installation/#server)
1. [Install Docker Machine](https://docs.docker.com/machine/install-machine/)

Now that the Runner is installed, it's time to register it.

## Registering the AlloyCI Runner

Before configuring the AlloyCI Runner, you need to first register it, so that
it connects with your AlloyCI instance:

1. [Obtain a Runner token](../../README.md)
1. [Register the Runner](../../register/index.md#gnu-linux)
1. When asked for the executor type, enter `docker+machine`

You can now move on to the most important part, configuring the AlloyCI Runner.

TIP: **Tip:**
If you want every user in your instance to be able to use the autoscaled Runners,
register the Runner as a shared one.

## Configuring the AlloyCI Runner

Now that the Runner is registered, you need to edit its configuration file and
add the required options for the AWS machine driver.

Let's first break it down into pieces.

### The global section

In the global section, you can define the maximum number of jobs that can run
concurrently across all Runners (`concurrent`). This heavily depends on your
needs, like how many users your Runners will accommodate, how much time your
builds take, etc. You can start with something low like `10`, and increase or
decrease its value going forward.

The `check_interval` option defines how often the Runner should check AlloyCI
for new jobs, in seconds.

Example:

```toml
concurrent = 10
check_interval = 0
```

[Read more](../advanced-configuration.md#the-global-section)
about all the options you can use.

### The `runners` section

From the `[[runners]]` section, the most important part is the `executor` which
must be set to `docker+machine`. Most of those settings are taken care of when
you register the Runner for the first time.

`limit` sets the maximum number of machines (running and idle) that this Runner
will spawn. For more info check the [relationship between `limit`, `concurrent`
and `IdleCount`](../autoscale.md#how-concurrent-limit-and-idlecount-generate-the-upper-limit-of-running-machines).

Example:

```toml
[[runners]]
  name = "alloy-aws-autoscaler"
  url = "<URL of your AlloyCI instance>"
  token = "<Runner's token>"
  executor = "docker+machine"
  limit = 20
```

[Read more](../advanced-configuration.md#the-runners-section)
about all the options you can use under `[[runners]]`.

### The `runners.docker` section

In the `[runners.docker]` section you can define the default Docker image to
be used by the child Runners if it's not defined in [`.alloy-ci.json`](https://github.com/AlloyCI/alloy_ci/tree/master/doc/json/README.md).
By using `privileged = true`, all Runners will be able to run
[Docker in Docker](https://github.com/AlloyCI/alloy_ci/tree/master/doc/docker/README.md#use-docker-in-docker-executor)
which is useful if you plan to build your own Docker images via AlloyCI.

Next, we use `disable_cache = true` to disable the Docker executor's inner
cache mechanism since we will use the distributed cache mode as described
in the following section.

Example:

```toml
  [runners.docker]
    image = "alpine"
    privileged = true
    disable_cache = true
```

[Read more](../advanced-configuration.md#the-runners-docker-section)
about all the options you can use under `[runners.docker]`.

### The `runners.cache` section

To speed up your jobs, AlloyCI Runner provides a cache mechanism where selected
directories and/or files are saved and shared between subsequent jobs.
While not required for this setup, it is recommended to use the distributed cache
mechanism that AlloyCI Runner provides. Since new instances will be created on
demand, it is essential to have a common place where the cache is stored.

In the following example, we use Amazon S3:

```toml
  [runners.cache]
    Type = "s3"
    ServerAddress = "s3.amazonaws.com"
    AccessKey = "<your AWS Access Key ID>"
    SecretKey = "<your AWS Secret Access Key>"
    BucketName = "<the bucket where your cache should be kept>"
    BucketLocation = "us-east-1"
    Shared = true
```
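
If the bucket named in `BucketName` does not exist yet, a sketch like the
following could create it with the AWS CLI. The function name and the bucket
name in the example are hypothetical, a configured AWS CLI is assumed, and the
region should match `BucketLocation`.

```sh
# Sketch only: create the cache bucket unless it is already reachable.
# Usage: ensure_cache_bucket <bucket-name> [region]
ensure_cache_bucket() {
  bucket="$1"
  region="${2:-us-east-1}"

  # head-bucket succeeds only if the bucket exists and is accessible.
  if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "bucket $bucket already exists"
  else
    aws s3 mb "s3://$bucket" --region "$region"
  fi
}

# Example: ensure_cache_bucket my-alloy-runner-cache us-east-1
```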

Here's some more info to further explore the cache mechanism:

- [Reference for `runners.cache`](../advanced-configuration.md#the-runners-cache-section)
- [Deploying and using a cache server for AlloyCI Runner](../autoscale.md#distributed-runners-caching)
- [How cache works](https://github.com/AlloyCI/alloy_ci/tree/master/doc/json/README.md#cache)

### The `runners.machine` section

This is the most important part of the configuration and it's the one that
tells AlloyCI Runner how and when to spawn new or remove old Docker Machine
instances.

We will focus on the AWS machine options. For the rest of the settings, read
about the:

- [Autoscaling algorithm and the parameters it's based on](../autoscale.md#autoscaling-algorithm-and-parameters) - depends on the needs of your organization
- [Off peak time configuration](../autoscale.md#off-peak-time-mode-configuration) - useful when there are regular time periods in your organization when no work is done, for example weekends

Here's an example of the `runners.machine` section:

```toml
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MaxBuilds = 10
    OffPeakPeriods = [
      "* * 0-9,18-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]
    OffPeakIdleCount = 0
    OffPeakIdleTime = 1200
    MachineDriver = "amazonec2"
    MachineName = "alloy-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=XXXX",
      "amazonec2-secret-key=XXXX",
      "amazonec2-region=eu-central-1",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-use-private-address=true",
      "amazonec2-tags=runner-manager-name,alloy-aws-autoscaler,alloy,true,alloy-runner-autoscale,true",
      "amazonec2-security-group=docker-machine-scaler",
      "amazonec2-instance-type=m4.2xlarge",
    ]
```

The Docker Machine driver is set to `amazonec2` and the machine name has a
standard prefix followed by `%s` (required) that is replaced by the ID of the
child Runner: `alloy-docker-machine-%s`.

Now, depending on your AWS infrastructure, there are many options you can set up
under `MachineOptions`. Below you can see the most common ones.

| Machine option | Description |
| -------------- | ----------- |
| `amazonec2-access-key=XXXX` | The AWS access key of the user that has permissions to create EC2 instances, see [AWS credentials](#aws-credentials). |
| `amazonec2-secret-key=XXXX` | The AWS secret key of the user that has permissions to create EC2 instances, see [AWS credentials](#aws-credentials). |
| `amazonec2-region=eu-central-1` | The region to use when launching the instance. You can omit this entirely and the default `us-east-1` will be used. |
| `amazonec2-vpc-id=vpc-xxxxx` | Your [VPC ID](https://docs.docker.com/machine/drivers/aws/#vpc-id) to launch the instance in. |
| `amazonec2-subnet-id=subnet-xxxxx` | The AWS VPC subnet ID. |
| `amazonec2-use-private-address=true` | Use the private IP address of Docker Machines, but still create a public IP address. Useful to keep the traffic internal and avoid extra costs. |
| `amazonec2-tags=runner-manager-name,alloy-aws-autoscaler,alloy,true,alloy-runner-autoscale,true` | AWS extra tag key-value pairs, useful to identify the instances on the AWS console. The "Name" tag is set to the machine name by default. We set the "runner-manager-name" to match the Runner name set in `[[runners]]`, so that we can filter all the EC2 instances created by a specific manager setup. Read more about [using tags in AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html). |
| `amazonec2-security-group=docker-machine-scaler` | AWS VPC security group name, see [AWS security groups](#aws-security-groups). |
| `amazonec2-instance-type=m4.2xlarge` | The instance type that the child Runners will run on. |

TIP: **Tip:**
Under `MachineOptions` you can add anything that the [AWS Docker Machine driver
supports](https://docs.docker.com/machine/drivers/aws/#options). You are highly
encouraged to read Docker's docs as your infrastructure setup may warrant
different options to be applied.

NOTE: **Note:**
The child instances will use Ubuntu 16.04 by default unless you choose a
different AMI ID by setting `amazonec2-ami`.

NOTE: **Note:**
If you specify `amazonec2-private-address-only=true` as one of the machine
options, your EC2 instance won't get assigned a public IP. This is OK if your
VPC is configured correctly with an Internet Gateway (IGW) and routing is fine,
but it's something to consider if you've got a more complex configuration. Read
more in [Docker docs about VPC connectivity](https://docs.docker.com/machine/drivers/aws/#vpc-connectivity).

[Read more](../advanced-configuration.md#the-runners-machine-section)
about all the options you can use under `[runners.machine]`.

### Getting it all together

Here's the full example of `/etc/alloy-runner/config.toml`:

```toml
concurrent = 10
check_interval = 0

[[runners]]
  name = "alloy-aws-autoscaler"
  url = "<URL of your AlloyCI instance>"
  token = "<Runner's token>"
  executor = "docker+machine"
  limit = 20
  [runners.docker]
    image = "alpine"
    privileged = true
    disable_cache = true
  [runners.cache]
    Type = "s3"
    ServerAddress = "s3.amazonaws.com"
    AccessKey = "<your AWS Access Key ID>"
    SecretKey = "<your AWS Secret Access Key>"
    BucketName = "<the bucket where your cache should be kept>"
    BucketLocation = "us-east-1"
    Shared = true
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MaxBuilds = 10
    OffPeakPeriods = [
      "* * 0-9,18-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]
    OffPeakIdleCount = 0
    OffPeakIdleTime = 1200
    MachineDriver = "amazonec2"
    MachineName = "alloy-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=XXXX",
      "amazonec2-secret-key=XXXX",
      "amazonec2-region=eu-central-1",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-use-private-address=true",
      "amazonec2-tags=runner-manager-name,alloy-aws-autoscaler,alloy,true,alloy-runner-autoscale,true",
      "amazonec2-security-group=docker-machine-scaler",
      "amazonec2-instance-type=m4.2xlarge",
    ]
```

## Cutting down costs with Amazon EC2 Spot instances

As [described by][spot] Amazon:

> Amazon EC2 Spot instances allow you to bid on spare Amazon EC2 computing capacity.
> Since Spot instances are often available at a discount compared to On-Demand
> pricing, you can significantly reduce the cost of running your applications,
> grow your application’s compute capacity and throughput for the same budget,
> and enable new types of cloud computing applications.

In addition to the [`runners.machine`](#the-runners-machine-section) options
you picked above, in `/etc/alloy-runner/config.toml` add the following entries
to the existing `MachineOptions` array:

```toml
    MachineOptions = [
      # ...the options you already set above...
      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=0.03",
      "amazonec2-block-duration-minutes=60"
    ]
```

With this configuration, Docker Machines are created on Spot instances with a
maximum bid price of $0.03 per hour and the duration of the Spot instance is
capped at 60 minutes. The `0.03` number mentioned above is just an example, so
be sure to check on the current pricing based on the region you picked.
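
One way to check current prices is to query the Spot price history with the
AWS CLI. The sketch below assumes a configured AWS CLI; the function name is
hypothetical, and the instance type and region passed to it should be the ones
you actually use in `MachineOptions`.

```sh
# Sketch only: print the Spot price per availability zone.
# Usage: current_spot_prices <instance-type> <region>
current_spot_prices() {
  instance_type="$1"
  region="$2"

  # Using "now" as the start time is meant to return only the price
  # that is in effect at the moment, rather than the full history.
  aws ec2 describe-spot-price-history \
    --region "$region" \
    --instance-types "$instance_type" \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
    --output text
}

# Example: current_spot_prices m4.2xlarge us-east-1
```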

To learn more about Amazon EC2 Spot instances, visit the following links:

- https://aws.amazon.com/ec2/spot/
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html
- https://aws.amazon.com/blogs/aws/focusing-on-spot-instances-lets-talk-about-best-practices/

### Caveats of Spot instances

While Spot instances are a great way to use unused resources and minimize the
costs of your infrastructure, you must be aware of the implications.

Running CI jobs on Spot instances may increase the failure rates because of the
Spot instances pricing model. If the price exceeds your bid, the existing Spot
instances will be immediately terminated and all your jobs on that host will fail.

As a consequence, the autoscaling Runner fails to create new machines while it
keeps requesting new instances. It eventually makes 60 requests, after which
AWS won't accept any more. Then, once the Spot price is acceptable again, you
remain locked out for a while because the API call limit has been exceeded.

If you encounter that case, you can use the following command on the bastion
machine to list the Docker Machines that are in an error state:

```sh
docker-machine ls -q --filter state=Error --format "{{.Name}}"
```
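
Once the Spot price has recovered, machines stuck in the `Error` state could be
cleaned up with a small loop like the one below. This is only a sketch: the
function name is hypothetical, and it assumes you run it as the same user that
the Runner uses to manage its Docker Machines.

```sh
# Sketch only: remove every Docker Machine that is in the Error state.
remove_errored_machines() {
  docker-machine ls -q --filter state=Error | while IFS= read -r name; do
    [ -n "$name" ] || continue
    echo "Removing $name"
    # -y skips the confirmation prompt; stdin is detached so the list
    # being read by the loop is not consumed by docker-machine.
    docker-machine rm -y "$name" </dev/null
  done
}

# Example: remove_errored_machines
```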

NOTE: **Note:**
There are some issues regarding making AlloyCI Runner gracefully handle Spot
price changes, and there are reports of `docker-machine` attempting to
continually remove a Docker Machine. AlloyCI has provided patches for both cases
in the upstream project. For more information, see issues
[#2771](https://gitlab.com/gitlab-org/gitlab-runner/issues/2771) and
[#2772](https://gitlab.com/gitlab-org/gitlab-runner/issues/2772).

## Conclusion

In this guide we learned how to install and configure an AlloyCI Runner in
autoscale mode on AWS.

Using the autoscale feature of AlloyCI Runner can save you both time and money.
Using the Spot instances that AWS provides can save you even more, but you must
be aware of the implications. As long as your bid is high enough, there shouldn't
be an issue.

You can read the following use cases from which this tutorial was (heavily)
influenced:

- [HumanGeo - Scaling GitLab CI](http://blog.thehumangeo.com/gitlab-autoscale-runners.html)
- [Substrakt Health - Autoscale GitLab CI Runners and save 90% on EC2 costs](https://substrakthealth.com/news/gitlab-ci-cost-savings/)

[spot]: https://aws.amazon.com/ec2/spot/