---
last_updated: 2019-08-21
---

> **[Article Type](https://docs.gitlab.com/ee/development/writing_documentation.html#types-of-technical-articles):** Admin guide ||
> **Level:** intermediate ||
> **Author:** [Achilleas Pipinellis](https://gitlab.com/axil) ||
> **Publication date:** 2017-11-24

# Autoscaling GitLab Runner on AWS

One of the biggest advantages of GitLab Runner is its ability to automatically
spin up and down VMs to make sure your builds get processed immediately. It's a
great feature, and if used correctly, it can be extremely useful in situations
where you don't use your Runners 24/7 and want to have a cost-effective and
scalable solution.

## Introduction

In this tutorial, we'll explore how to properly configure a GitLab Runner in
AWS that serves as the Runner Manager and spawns new Docker machines on
demand.

In addition, we'll make use of [Amazon's EC2 Spot instances][spot], which
greatly reduce the cost of the Runner instances while still letting you use
quite powerful autoscaled machines.

## Prerequisites

NOTE: **Note:**
Familiarity with Amazon Web Services (AWS) is required, as this is where most
of the configuration will take place.

TIP: **Tip:**
We suggest a quick read through the Docker Machine [`amazonec2` driver
documentation](https://docs.docker.com/machine/drivers/aws/) to familiarize
yourself with the parameters we will set later in this article.

Your GitLab instance is going to need to talk to the Runners over the network,
and that is something you need to think about when configuring any AWS security
groups or when setting up your DNS configuration.

For example, you can keep the EC2 resources segmented away from public traffic
in a different VPC to strengthen your network security. Your environment
is likely different, so consider what works best for your situation.

### AWS security groups

Docker Machine will attempt to use a
[default security group](https://docs.docker.com/machine/drivers/aws/#security-group)
with rules for port `2376`, which is required for communication with the Docker
daemon. Instead of relying on Docker, you can create a security group with the
rules you need and provide that in the Runner options as we will
[see below](#the-runnersmachine-section). This way, you can customize it to your
liking ahead of time based on your networking environment.

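As a rough illustration of creating such a security group ahead of time, the sketch below assumes the AWS CLI is installed and configured. The function name `create_runner_security_group`, the group name, and the CIDR range are placeholders rather than part of this guide. Port `22` is included on the assumption that Docker Machine provisions instances over SSH; adjust the ports to your environment.

```sh
# Sketch only: create a security group for the autoscaled machines.
# Arguments (all placeholders): group name, VPC ID, and the CIDR range that
# is allowed to connect (for example the Runner Manager's subnet).
create_runner_security_group() {
  sg_name="$1"
  vpc_id="$2"
  source_cidr="$3"

  # Ask the CLI to print only the ID of the new group.
  sg_id="$(aws ec2 create-security-group \
    --group-name "$sg_name" \
    --description "Docker Machine instances for GitLab Runner" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text)" || return 1

  # 2376: Docker daemon; 22: SSH (assumed to be needed for provisioning).
  for port in 2376 22; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" \
      --protocol tcp --port "$port" \
      --cidr "$source_cidr" >/dev/null || return 1
  done

  echo "$sg_id"
}

# Example call (hypothetical values):
# create_runner_security_group docker-machine-scaler vpc-xxxxx 10.0.0.0/16
```

The name passed as the first argument is the value you would later give to `amazonec2-security-group`.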
### AWS credentials

You'll need an [AWS Access Key](https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html)
tied to a user with permission to scale (EC2) and update the cache (via S3).
Create a new user with [policies](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-policies-for-amazon-ec2.html)
for EC2 (AmazonEC2FullAccess) and S3 (AmazonS3FullAccess). To be more secure,
you can disable console login for that user. Keep the tab open or copy and paste
the security credentials into an editor, as we'll use them later during the
[Runner configuration](#the-runnersmachine-section).

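If you prefer the command line over the AWS console, the user can be created roughly as follows. This is a sketch that assumes a configured AWS CLI with IAM permissions; the function name `create_runner_iam_user` and the user name are placeholders, while the two managed policies are the ones named above.

```sh
# Sketch only: create a dedicated IAM user for the Runner Manager.
# Argument: the user name (placeholder).
create_runner_iam_user() {
  user_name="$1"

  # A user created this way has no console password unless you add one.
  aws iam create-user --user-name "$user_name" >/dev/null || return 1

  for policy in AmazonEC2FullAccess AmazonS3FullAccess; do
    aws iam attach-user-policy \
      --user-name "$user_name" \
      --policy-arn "arn:aws:iam::aws:policy/$policy" || return 1
  done

  # Prints the Access Key ID and Secret Access Key; the secret is shown
  # only once, so store it somewhere safe.
  aws iam create-access-key \
    --user-name "$user_name" \
    --query 'AccessKey.[AccessKeyId,SecretAccessKey]' \
    --output text
}

# Example call (hypothetical user name):
# create_runner_iam_user gitlab-runner-autoscaler
```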
## Prepare the Runner Manager instance

The first step is to install GitLab Runner in an EC2 instance that will serve
as the Runner Manager that spawns new machines. This doesn't have to be a powerful
machine since it will not run any jobs itself; a `t2.micro` instance will do.
This machine will be a dedicated host since we need it always up and running,
so it will be the only fixed cost.

NOTE: **Note:**
For the Runner Manager instance, choose a distribution that both Docker and GitLab
Runner support, for example Ubuntu, Debian, CentOS, or RHEL.

Install the prerequisites:

1. Log in to your server
1. [Install GitLab Runner from the official GitLab repository](../../install/linux-repository.md)
1. [Install Docker](https://docs.docker.com/engine/installation/#server)
1. [Install Docker Machine](https://docs.docker.com/machine/install-machine/)

Now that the Runner is installed, it's time to register it.

## Registering the GitLab Runner

Before configuring the GitLab Runner, you need to first register it, so that
it connects with your GitLab instance:

1. [Obtain a Runner token](https://docs.gitlab.com/ee/ci/runners/)
1. [Register the Runner](../../register/index.md#gnulinux)
1. When asked for the executor type, enter `docker+machine`

You can now move on to the most important part, configuring the GitLab Runner.

TIP: **Tip:**
If you want every user in your instance to be able to use the autoscaled Runners,
register the Runner as a shared one.

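The registration steps above can also be run without prompts. The following is a minimal sketch, assuming `gitlab-runner` is installed and `sudo` is available. The function name `register_autoscale_runner`, the URL, and the token are placeholders; the description and default image simply mirror the values used in the configuration examples in this guide.

```sh
# Sketch only: non-interactive registration with the docker+machine executor.
# Arguments (placeholders): GitLab URL and registration token.
register_autoscale_runner() {
  gitlab_url="$1"
  registration_token="$2"

  sudo gitlab-runner register \
    --non-interactive \
    --url "$gitlab_url" \
    --registration-token "$registration_token" \
    --executor "docker+machine" \
    --description "gitlab-aws-autoscaler" \
    --docker-image "alpine"
}

# Example call (hypothetical values):
# register_autoscale_runner https://gitlab.example.com <registration token>
```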
## Configuring the GitLab Runner

Now that the Runner is registered, you need to edit its configuration file and
add the required options for the AWS machine driver.

Let's first break it down into pieces.

### The global section

In the global section, you can define the limit of the jobs that can be run
concurrently across all Runners (`concurrent`). This heavily depends on your
needs, like how many users your Runners will accommodate, how much time your
builds take, etc. You can start with something low like `10`, and increase or
decrease its value going forward.

The `check_interval` option defines how often the Runner should check GitLab
for new jobs, in seconds. A value of `0`, as in the example below, makes the
Runner fall back to its default interval of 3 seconds.

Example:

```toml
concurrent = 10
check_interval = 0
```

[Read more](../advanced-configuration.md#the-global-section)
about all the options you can use.

### The `runners` section

In the `[[runners]]` section, the most important part is the `executor`, which
must be set to `docker+machine`. Most of those settings are taken care of when
you register the Runner for the first time.

`limit` sets the maximum number of machines (running and idle) that this Runner
will spawn. For more info check the [relationship between `limit`, `concurrent`
and `IdleCount`](../autoscale.md#how-concurrent-limit-and-idlecount-generate-the-upper-limit-of-running-machines).

Example:

```toml
[[runners]]
  name = "gitlab-aws-autoscaler"
  url = "<URL of your GitLab instance>"
  token = "<Runner's token>"
  executor = "docker+machine"
  limit = 20
```

[Read more](../advanced-configuration.md#the-runners-section)
about all the options you can use under `[[runners]]`.

### The `runners.docker` section

In the `[runners.docker]` section you can define the default Docker image to
be used by the child Runners if it's not defined in [`.gitlab-ci.yml`](https://docs.gitlab.com/ee/ci/yaml/).
By using `privileged = true`, all Runners will be able to run
[Docker in Docker](https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-docker-in-docker-executor)
which is useful if you plan to build your own Docker images via GitLab CI/CD.

Next, we use `disable_cache = true` to disable the Docker executor's inner
cache mechanism since we will use the distributed cache mode as described
in the following section.

Example:

```toml
  [runners.docker]
    image = "alpine"
    privileged = true
    disable_cache = true
```

[Read more](../advanced-configuration.md#the-runnersdocker-section)
about all the options you can use under `[runners.docker]`.

### The `runners.cache` section

To speed up your jobs, GitLab Runner provides a cache mechanism where selected
directories and/or files are saved and shared between subsequent jobs.
While not required for this setup, it is recommended to use the distributed cache
mechanism that GitLab Runner provides. Since new instances will be created on
demand, it is essential to have a common place where the cache is stored.

In the following example, we use Amazon S3:

```toml
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      AccessKey = "<your AWS Access Key ID>"
      SecretKey = "<your AWS Secret Access Key>"
      BucketName = "<the bucket where your cache should be kept>"
      BucketLocation = "us-east-1"
```

Here's some more info to further explore the cache mechanism:

- [Reference for `runners.cache`](../advanced-configuration.md#the-runnerscache-section)
- [Reference for `runners.cache.s3`](../advanced-configuration.md#the-runnerscaches3-section)
- [Deploying and using a cache server for GitLab Runner](../autoscale.md#distributed-runners-caching)
- [How cache works](https://docs.gitlab.com/ee/ci/yaml/#cache)

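The bucket referenced by `BucketName` has to exist before the Runner can use it. One way to create it is sketched below, assuming a configured AWS CLI; the function name `create_cache_bucket` and the bucket name are placeholders. The branch reflects that `us-east-1` is created without a location constraint, while other regions need one.

```sh
# Sketch only: create the S3 bucket used for the distributed cache.
# Arguments (placeholders): bucket name and, optionally, the region, which
# should match BucketLocation in config.toml.
create_cache_bucket() {
  bucket_name="$1"
  bucket_region="${2:-us-east-1}"

  if [ "$bucket_region" = "us-east-1" ]; then
    # us-east-1 is the default location and takes no location constraint.
    aws s3api create-bucket --bucket "$bucket_name" --region "$bucket_region"
  else
    aws s3api create-bucket --bucket "$bucket_name" --region "$bucket_region" \
      --create-bucket-configuration "LocationConstraint=$bucket_region"
  fi
}

# Example call (hypothetical bucket name):
# create_cache_bucket my-runner-cache us-east-1
```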
### The `runners.machine` section

This is the most important part of the configuration and it's the one that
tells GitLab Runner how and when to spawn new or remove old Docker Machine
instances.

We will focus on the AWS machine options. For the rest of the settings, read
about the:

- [Autoscaling algorithm and the parameters it's based on](../autoscale.md#autoscaling-algorithm-and-parameters) - depends on the needs of your organization
- [Off peak time configuration](../autoscale.md#off-peak-time-mode-configuration) - useful when there are regular time periods in your organization when no work is done, for example weekends

Here's an example of the `runners.machine` section:

```toml
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MaxBuilds = 10
    OffPeakPeriods = [
      "* * 0-9,18-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]
    OffPeakIdleCount = 0
    OffPeakIdleTime = 1200
    MachineDriver = "amazonec2"
    MachineName = "gitlab-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=XXXX",
      "amazonec2-secret-key=XXXX",
      "amazonec2-region=eu-central-1",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-zone=x",
      "amazonec2-use-private-address=true",
      "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
      "amazonec2-security-group=xxxxx",
      "amazonec2-instance-type=m4.2xlarge",
    ]
```

The Docker Machine driver is set to `amazonec2` and the machine name has a
standard prefix followed by `%s` (required) that is replaced by the ID of the
child Runner: `gitlab-docker-machine-%s`.

Now, depending on your AWS infrastructure, there are many options you can set up
under `MachineOptions`. Below you can see the most common ones.

| Machine option | Description |
| -------------- | ----------- |
| `amazonec2-access-key=XXXX` | The AWS access key of the user that has permissions to create EC2 instances, see [AWS credentials](#aws-credentials). |
| `amazonec2-secret-key=XXXX` | The AWS secret key of the user that has permissions to create EC2 instances, see [AWS credentials](#aws-credentials). |
| `amazonec2-region=eu-central-1` | The region to use when launching the instance. You can omit this entirely and the default `us-east-1` will be used. |
| `amazonec2-vpc-id=vpc-xxxxx` | Your [VPC ID](https://docs.docker.com/machine/drivers/aws/#vpc-id) to launch the instance in. |
| `amazonec2-subnet-id=subnet-xxxxx` | The AWS VPC subnet ID. |
| `amazonec2-zone=x` | If not specified, the [availability zone is `a`](https://docs.docker.com/machine/drivers/aws/#environment-variables-and-default-values). It needs to be set to the same availability zone as the specified subnet. For example, when the zone is `eu-west-1b`, it has to be `amazonec2-zone=b`. |
| `amazonec2-use-private-address=true` | Use the private IP address of Docker Machines, but still create a public IP address. Useful to keep the traffic internal and avoid extra costs. |
| `amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true` | AWS extra tag key-value pairs, useful to identify the instances on the AWS console. The "Name" tag is set to the machine name by default. We set the "runner-manager-name" to match the Runner name set in `[[runners]]`, so that we can filter all the EC2 instances created by a specific manager setup. Read more about [using tags in AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html). |
| `amazonec2-security-group=xxxxx` | AWS VPC security group name, see [AWS security groups](#aws-security-groups). |
| `amazonec2-instance-type=m4.2xlarge` | The instance type that the child Runners will run on. |

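Two of the values in the table are easy to get wrong by hand: the zone is only the trailing letter of the availability zone name, and the tags are a flat, comma-separated list of alternating keys and values. The helpers below are a small sketch of how these strings could be derived; `zone_letter` and `join_tags` are hypothetical names, not part of GitLab Runner or Docker Machine.

```sh
# Sketch only: derive MachineOptions values from what the AWS console shows.

# Print the zone letter expected by amazonec2-zone, e.g. eu-west-1b -> b.
zone_letter() {
  # Strip everything up to and including the last digit.
  printf '%s\n' "${1##*[0-9]}"
}

# Join alternating key/value arguments into the comma-separated form used by
# amazonec2-tags; fail if a key is left without a value.
join_tags() {
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "join_tags: expected key/value pairs" >&2
    return 1
  fi
  tags=""
  for item in "$@"; do
    tags="${tags:+$tags,}$item"
  done
  printf '%s\n' "$tags"
}

# Example calls:
# echo "amazonec2-zone=$(zone_letter eu-west-1b)"
# echo "amazonec2-tags=$(join_tags runner-manager-name gitlab-aws-autoscaler gitlab true)"
```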
TIP: **Tip:**
Under `MachineOptions` you can add anything that the [AWS Docker Machine driver
supports](https://docs.docker.com/machine/drivers/aws/#options). You are highly
encouraged to read Docker's docs as your infrastructure setup may warrant
different options to be applied.

NOTE: **Note:**
By default, the child instances use Ubuntu 16.04 unless you choose a
different AMI ID by setting `amazonec2-ami`. Set only [supported
base operating systems for Docker Machine](https://docs.docker.com/machine/drivers/os-base/).

NOTE: **Note:**
If you specify `amazonec2-private-address-only=true` as one of the machine
options, your EC2 instance won't get assigned a public IP. This is OK if your
VPC is configured correctly with an Internet Gateway (IGW) and routing is fine,
but it’s something to consider if you've got a more complex configuration. Read
more in [Docker docs about VPC connectivity](https://docs.docker.com/machine/drivers/aws/#vpc-connectivity).

[Read more](../advanced-configuration.md#the-runnersmachine-section)
about all the options you can use under `[runners.machine]`.

### Getting it all together

Here's the full example of `/etc/gitlab-runner/config.toml`:

```toml
concurrent = 10
check_interval = 0

[[runners]]
  name = "gitlab-aws-autoscaler"
  url = "<URL of your GitLab instance>"
  token = "<Runner's token>"
  executor = "docker+machine"
  limit = 20
  [runners.docker]
    image = "alpine"
    privileged = true
    disable_cache = true
  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      AccessKey = "<your AWS Access Key ID>"
      SecretKey = "<your AWS Secret Access Key>"
      BucketName = "<the bucket where your cache should be kept>"
      BucketLocation = "us-east-1"
  [runners.machine]
    IdleCount = 1
    IdleTime = 1800
    MaxBuilds = 10
    OffPeakPeriods = [
      "* * 0-9,18-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]
    OffPeakIdleCount = 0
    OffPeakIdleTime = 1200
    MachineDriver = "amazonec2"
    MachineName = "gitlab-docker-machine-%s"
    MachineOptions = [
      "amazonec2-access-key=XXXX",
      "amazonec2-secret-key=XXXX",
      "amazonec2-region=eu-central-1",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-use-private-address=true",
      "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
      "amazonec2-security-group=docker-machine-scaler",
      "amazonec2-instance-type=m4.2xlarge",
    ]
```

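Because the example is full of placeholder values, it can be useful to check that none of them survived in your own file. The following sketch uses plain `grep`; the function name `find_placeholders` is hypothetical, and the pattern only covers the placeholder styles used in this guide (`XXXX`, `xxxxx`, and quoted `<...>` values).

```sh
# Sketch only: print the lines of a config file that still contain
# placeholder values copied from the example (XXXX, vpc-xxxxx, "<...>").
# Argument: path to the file, defaulting to the Runner's config.toml.
# Exits non-zero when no placeholder is found, like grep itself.
find_placeholders() {
  config_file="${1:-/etc/gitlab-runner/config.toml}"
  grep -n -E 'XXXX|xxxx|"<[^"]*>"' "$config_file"
}

# Example call:
# find_placeholders /etc/gitlab-runner/config.toml
```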
## Cutting down costs with Amazon EC2 Spot instances

As [described by][spot] Amazon:

> Amazon EC2 Spot instances allow you to bid on spare Amazon EC2 computing capacity.
> Since Spot instances are often available at a discount compared to On-Demand
> pricing, you can significantly reduce the cost of running your applications,
> grow your application’s compute capacity and throughput for the same budget,
> and enable new types of cloud computing applications.

In addition to the [`runners.machine`](#the-runnersmachine-section) options
you picked above, in `/etc/gitlab-runner/config.toml` under the `MachineOptions`
section, add the following:

```toml
    MachineOptions = [
      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=",
    ]
```

In this configuration with an empty `amazonec2-spot-price`, AWS sets your
bidding price for a Spot instance to the default On-Demand price of that
instance class. If you omit the `amazonec2-spot-price` completely, Docker
Machine will set the bidding price to a [default value of $0.50 per
hour](https://docs.docker.com/machine/drivers/aws/#environment-variables-and-default-values).

You may further customize your Spot instance request:

```toml
    MachineOptions = [
      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=0.03",
      "amazonec2-block-duration-minutes=60"
    ]
```

With this configuration, Docker Machines are created on Spot instances with a
maximum bid price of $0.03 per hour and the duration of the Spot instance is
capped at 60 minutes. The `0.03` number mentioned above is just an example, so
be sure to check the current pricing for the region you picked.

To learn more about Amazon EC2 Spot instances, visit the following links:

- <https://aws.amazon.com/ec2/spot/>
- <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html>
- <https://aws.amazon.com/blogs/aws/focusing-on-spot-instances-lets-talk-about-best-practices/>

### Caveats of Spot instances

While Spot instances are a great way to use unused resources and minimize the
costs of your infrastructure, you must be aware of the implications.

Running CI jobs on Spot instances may increase the failure rates because of the
Spot instances pricing model. If the price exceeds your bid, the existing Spot
instances will be terminated within two minutes and all your jobs on that host
will fail.

As a consequence, the autoscale Runner fails to create new machines while
it keeps requesting new instances. Eventually it makes 60 requests, after
which AWS won't accept any more. Then, once the Spot price is acceptable again,
you are locked out for a bit because the call amount limit is exceeded.

If you encounter that case, you can use the following command on the Runner Manager
machine to list the Docker Machines that are in the `Error` state:

```sh
docker-machine ls -q --filter state=Error --format "{{.Name}}"
```

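If the listing shows machines that are stuck, one possible follow-up is to remove them so that they no longer count against the Runner's limits. The sketch below assumes `docker-machine` is on the `PATH` of the Runner Manager; the function name `remove_errored_machines` is hypothetical, and whether manual removal is appropriate depends on your setup, so review the list before running anything like it.

```sh
# Sketch only: remove every Docker Machine that is in the Error state.
remove_errored_machines() {
  docker-machine ls -q --filter state=Error | while read -r machine; do
    # Skip empty lines, if any.
    [ -n "$machine" ] || continue
    echo "Removing $machine"
    # -y answers the confirmation prompt; stdin is detached so the removal
    # cannot consume the list being read by the loop.
    docker-machine rm -y "$machine" </dev/null
  done
}

# Example call:
# remove_errored_machines
```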
NOTE: **Note:**
There are some issues regarding making GitLab Runner gracefully handle Spot
price changes, and there are reports of `docker-machine` attempting to
continually remove a Docker Machine. GitLab has provided patches for both cases
in the upstream project. For more information, see issues
[#2771](https://gitlab.com/gitlab-org/gitlab-runner/issues/2771) and
[#2772](https://gitlab.com/gitlab-org/gitlab-runner/issues/2772).

## Conclusion

In this guide we learned how to install and configure a GitLab Runner in
autoscale mode on AWS.

Using the autoscale feature of GitLab Runner can save you both time and money.
Using the Spot instances that AWS provides can save you even more, but you must
be aware of the implications. As long as your bid is high enough, there shouldn't
be an issue.

You can read the following use cases from which this tutorial was (heavily)
influenced:

- [HumanGeo - Scaling GitLab CI](http://blog.thehumangeo.com/gitlab-autoscale-runners.html)
- [Substrakt Health - Autoscale GitLab CI Runners and save 90% on EC2 costs](https://about.gitlab.com/blog/2017/11/23/autoscale-ci-runners/)

[spot]: https://aws.amazon.com/ec2/spot/