
# Terraform infrastructure

This folder contains Terraform resources for provisioning a Nomad cluster on
EC2 instances on AWS to use as the target of end-to-end tests.

Terraform provisions the AWS infrastructure assuming that EC2 AMIs have
already been built via Packer. It deploys a specific build of Nomad to the
cluster along with configuration files for Nomad, Consul, and Vault.

## Setup

You'll need Terraform 0.13+, as well as AWS credentials to create the Nomad
cluster. This Terraform stack assumes that an appropriate instance role has
been configured elsewhere and that you have the ability to `AssumeRole` into
the AWS account.

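As a quick sanity check before applying, something like the following confirms
your credentials and Terraform version (the profile name here is illustrative,
not something this repo provides):

```sh
export AWS_PROFILE=nomad-e2e    # hypothetical CLI profile that assumes the role
aws sts get-caller-identity     # confirm which account/role you're acting as
terraform version               # should report Terraform 0.13 or newer
```
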
Optionally, edit the `terraform.tfvars` file to change the number of Linux
clients or Windows clients. The Terraform variables file
`terraform.full.tfvars` is for the nightly E2E test run and deploys a larger,
more diverse set of test targets.

```hcl
region                           = "us-east-1"
instance_type                    = "t2.medium"
server_count                     = "3"
client_count_ubuntu_bionic_amd64 = "4"
client_count_windows_2016_amd64  = "1"
profile                          = "dev-cluster"
```

Run `terraform apply` to deploy the infrastructure:

```sh
cd e2e/terraform/
terraform apply
```

> Note: You will likely see "Connection refused" or "Permission denied" errors
> in the logs as the provisioning script run by Terraform hits an instance
> where the ssh service isn't yet ready. That's ok and expected; they'll get
> retried. In particular, Windows instances can take a few minutes before ssh
> is ready.

## Nomad Version

You'll need to pass one of the following variables, either in your
`terraform.tfvars` file or as a command line argument
(ex. `terraform apply -var 'nomad_version=0.10.2+ent'`).

* `nomad_local_binary`: provision this specific local binary of Nomad. This is
  a path to a Nomad binary on your own host. Ex. `nomad_local_binary =
  "/home/me/nomad"`. This setting overrides `nomad_sha` and `nomad_version`.
* `nomad_sha`: provision this specific SHA from S3. This is a Nomad binary
  identified by its full commit SHA that's stored in a shared S3 bucket that
  Nomad team developers can access. That commit SHA can be from any branch
  that's pushed to remote. Ex. `nomad_sha =
  "0b6b475e7da77fed25727ea9f01f155a58481b6c"`. This setting overrides
  `nomad_version`.
* `nomad_version`: provision this version from
  [releases.hashicorp.com](https://releases.hashicorp.com/nomad). Ex. `nomad_version
  = "0.10.2+ent"`.

If you want to deploy the Enterprise build of a specific SHA, include
`-var 'nomad_enterprise=true'`.

If you want to bootstrap Nomad ACLs, include `-var 'nomad_acls=true'`.

> Note: If you bootstrap ACLs you will see "No cluster leader" in the output
> several times while the ACL bootstrap script polls the cluster to start and
> elect a leader.

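For example, a single `terraform apply` can combine these variables. Here's a
sketch that deploys the Enterprise build of a specific SHA with ACLs
bootstrapped, using the example SHA from above:

```sh
terraform apply \
  -var 'nomad_sha=0b6b475e7da77fed25727ea9f01f155a58481b6c' \
  -var 'nomad_enterprise=true' \
  -var 'nomad_acls=true'
```
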
## Profiles

The `profile` field selects from a set of configuration files for Nomad,
Consul, and Vault by uploading the files found in `./config/<profile>`. The
standard profiles are as follows:

* `full-cluster`: This profile is used for nightly E2E testing. It assumes at
  least 3 servers and includes a unique config for each Nomad client.
* `dev-cluster`: This profile is used for developer testing of a more limited
  set of clients. It assumes at least 3 servers but uses one config for all
  the Linux Nomad clients and one config for all the Windows Nomad clients.

You can also build your own custom profile for testing more complex
interactions between features by writing config files to the
`./config/<custom name>` directory.

For each profile, application (Nomad, Consul, Vault), and agent type
(`server`, `client_linux`, or `client_windows`), the agent gets the following
configuration files, ignoring any that are missing:

* `./config/<profile>/<application>/*`: base configurations shared between all
  servers and clients.
* `./config/<profile>/<application>/<type>/*`: base configurations shared
  between all agents of this type.
* `./config/<profile>/<application>/<type>/indexed/*<index>.<ext>`: a
  configuration for that particular agent, where the index value is the index
  of that agent within the total count.

For example, with the `full-cluster` profile, the 2nd Nomad server (index 1)
would get the following configuration files:

* `./config/full-cluster/nomad/base.hcl`
* `./config/full-cluster/nomad/server/indexed/server-1.hcl`

The directory `./config/full-cluster/nomad/server` has no configuration files,
so that's safely skipped.

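To make the layering concrete, a hypothetical custom profile for a
three-server cluster might contain files like these (the profile and file
names are illustrative, not files that exist in this repo):

```sh
$ find config/my-profile -type f
config/my-profile/nomad/base.hcl
config/my-profile/nomad/client_linux/clients.hcl
config/my-profile/nomad/server/indexed/server-0.hcl
config/my-profile/nomad/server/indexed/server-1.hcl
config/my-profile/nomad/server/indexed/server-2.hcl
```

With this layout, every Nomad agent gets `base.hcl`, Linux clients
additionally get `clients.hcl`, and each server gets its own indexed file.
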
## Outputs

After deploying the infrastructure, you can get connection information
about the cluster:

- `$(terraform output environment)` will set your current shell's
  `NOMAD_ADDR` and `CONSUL_HTTP_ADDR` to point to one of the cluster's
  server nodes, and set the `NOMAD_E2E` variable (see the example below).
- `terraform output servers` will output the list of server node IPs.
- `terraform output linux_clients` will output the list of Linux
  client node IPs.
- `terraform output windows_clients` will output the list of Windows
  client node IPs.

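For instance, after `terraform apply` you might verify connectivity with the
standard Nomad and Consul CLIs (a sketch, assuming both binaries are on your
PATH):

```sh
$(terraform output environment)   # sets NOMAD_ADDR, CONSUL_HTTP_ADDR, NOMAD_E2E
nomad server members              # should list the cluster's servers
consul members                    # should list server and client nodes
```
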
## SSH

You can use the Terraform outputs above to access nodes via ssh:

```sh
ssh -i keys/nomad-e2e-*.pem ubuntu@${EC2_IP_ADDR}
```

The Windows client runs OpenSSH for convenience, but has a different
user and will drop you into a PowerShell session instead of bash:

```sh
ssh -i keys/nomad-e2e-*.pem Administrator@${EC2_IP_ADDR}
```

## Teardown

The Terraform state file stores all the information needed to tear down the
infrastructure:

```sh
cd e2e/terraform/
terraform destroy
```

## FAQ

#### E2E Provisioning Goals

1. The provisioning process should be able to run a nightly build against a
   variety of OS targets.
2. The provisioning process should be able to support update-in-place
   tests. (See [#7063](https://github.com/hashicorp/nomad/issues/7063))
3. A developer should be able to quickly stand up a small E2E cluster and
   provision it with a version of Nomad they've built on their laptop. The
   developer should be able to send updated builds to that cluster with a short
   iteration time, rather than having to rebuild the cluster (see the sketch
   after this list).

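In practice, that iteration loop for goal (3) can be as simple as re-applying
with a freshly built local binary (a sketch; the build command and binary path
are illustrative):

```sh
# rebuild Nomad locally, however you normally do that
make dev

# re-apply so provisioning pushes the new local binary to the existing cluster
terraform apply -var 'nomad_local_binary=/home/me/nomad'
```
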
#### Why not just drop all the provisioning into the AMI?

While that's the "correct" production approach for cloud infrastructure, it
creates a few pain points for testing:

* Creating a Linux AMI takes >10min, and creating a Windows AMI can take
  15-20min. This interferes with goal (3) above.
* We won't be able to do in-place upgrade testing without having an in-place
  provisioning process anyway. This interferes with goal (2) above.

#### Why not just drop all the provisioning into the user data?

* User data is executed on boot, which prevents using it for in-place upgrade
  testing.
* User data scripts are not very observable, and it's painful to determine
  whether they've failed or simply haven't finished yet before trying to run
  tests.