
# Terraform infrastructure

This folder contains Terraform resources for provisioning a Nomad
cluster on EC2 instances on AWS to use as the target of end-to-end
tests.

Terraform provisions the AWS infrastructure assuming that EC2 AMIs
have already been built via Packer and that HCP Consul and HCP Vault
clusters are already running. It deploys a build of Nomad from your
local machine along with configuration files.

## Setup

You'll need a recent version of Terraform (1.1+ recommended), as well
as AWS credentials to create the Nomad cluster and credentials for
HCP. This Terraform stack assumes that an appropriate instance role
has been configured elsewhere and that you have the ability to
`AssumeRole` into the AWS account.
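
Before running Terraform it can be worth confirming which AWS account and
role your credentials resolve to. A minimal check, assuming the AWS CLI is
installed:

```sh
# print the account and role ARN that Terraform will operate as
aws sts get-caller-identity
```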

Configure the following environment variables. For HashiCorp Nomad
developers, this configuration can be found in 1Pass in the Nomad
team's vault under `nomad-e2e`.

```sh
export HCP_CLIENT_ID=
export HCP_CLIENT_SECRET=
export CONSUL_HTTP_TOKEN=
export CONSUL_HTTP_ADDR=
```
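
Optionally, sanity-check the Consul credentials before going further. This
assumes `curl` is available and that `CONSUL_HTTP_ADDR` includes the scheme
(e.g. `https://`):

```sh
# prints the address of the HCP Consul cluster's current leader if the
# token and address are valid
curl -s -H "X-Consul-Token: $CONSUL_HTTP_TOKEN" "$CONSUL_HTTP_ADDR/v1/status/leader"
```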

The Vault admin token will expire after 6 hours. If you haven't
created one already, use the separate Terraform configuration found in
the `hcp-vault-auth` directory. The following will set the correct
values for `VAULT_TOKEN`, `VAULT_ADDR`, and `VAULT_NAMESPACE`:

```sh
cd ./hcp-vault-auth
terraform init
terraform apply --auto-approve
$(terraform output --raw environment)
```
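
To confirm the token works (and see how much of its 6-hour TTL remains), you
can look it up. This sketch assumes the `vault` CLI and `jq` are installed:

```sh
# the lookup uses VAULT_ADDR, VAULT_NAMESPACE, and VAULT_TOKEN from the environment
vault token lookup -format=json | jq .data.ttl
```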

Optionally, edit the `terraform.tfvars` file to change the number of
Linux clients or Windows clients.

```hcl
region                           = "us-east-1"
instance_type                    = "t2.medium"
server_count                     = "3"
client_count_ubuntu_jammy_amd64  = "4"
client_count_windows_2016_amd64  = "1"
```

Optionally, edit the `nomad_local_binary` variable in the
`terraform.tfvars` file to change the path to the local binary of
Nomad you'd like to upload.
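
For example, you might cross-compile a Linux build from the repository root
and point Terraform at it through an environment variable rather than editing
`terraform.tfvars`. This is only a sketch; the output path is arbitrary and
your usual Nomad build workflow works just as well:

```sh
# from the root of the Nomad repository; CGO_ENABLED=0 keeps the
# cross-compile simple but may disable some optional functionality
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o pkg/linux_amd64/nomad .

# TF_VAR_* environment variables override Terraform variables without
# touching terraform.tfvars
export TF_VAR_nomad_local_binary="$(pwd)/pkg/linux_amd64/nomad"
```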

Run `terraform apply` to deploy the infrastructure:

```sh
cd e2e/terraform/
terraform init
terraform apply
```

> Note: You will likely see "Connection refused" or "Permission denied" errors
> in the logs as the provisioning script run by Terraform hits an instance
> where the ssh service isn't yet ready. That's ok and expected; they'll get
> retried. In particular, Windows instances can take a few minutes before ssh
> is ready.
>
> Also note: When ACLs are being bootstrapped, you may see "No cluster
> leader" in the output several times while the ACL bootstrap script
> polls the cluster as it starts up and elects a leader.

## Configuration

The files in `etc` are template configuration files for Nomad and the
Consul agent. Terraform will render these files to the `uploads`
folder and upload them to the cluster during provisioning.

* `etc/nomad.d` are the Nomad configuration files.
  * `base.hcl`, `tls.hcl`, `consul.hcl`, and `vault.hcl` are shared.
  * `server-linux.hcl`, `client-linux.hcl`, and `client-windows.hcl` are role and platform specific.
  * `client-linux-0.hcl`, etc. are specific to individual instances.
* `etc/consul.d` are the Consul agent configuration files.
* `etc/acls` are ACL policy files for Consul and Vault.
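
To see exactly what will be (or was) deployed, you can inspect the rendered
files locally, or spot-check a Linux client over ssh (key and user as
described in the SSH section below; the remote path is assumed to match the
`/etc/nomad.d` layout referenced elsewhere in this README):

```sh
# rendered copies of the templates, as uploaded to the cluster
ls uploads/

# confirm the configuration landed on a Linux client
ssh -i keys/nomad-e2e-*.pem ubuntu@${EC2_IP_ADDR} 'ls /etc/nomad.d'
```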

## Web UI

To access the web UI, deploy a reverse proxy to the cluster. All
clients have a TLS proxy certificate at `/etc/nomad.d/tls_proxy.crt`
and a self-signed cert at `/etc/nomad.d/self_signed.crt`. See
`../ui/input/proxy.nomad` for an example of using this. Deploy as follows:

```sh
nomad namespace apply proxy
nomad job run ../ui/input/proxy.nomad
```

You can get the public IP for the proxy allocation from the following
nested query:

```sh
nomad node status -json -verbose \
    $(nomad operator api '/v1/allocations?namespace=proxy' | jq -r '.[] | select(.JobID == "nomad-proxy") | .NodeID') \
    | jq '.Attributes."unique.platform.aws.public-ipv4"'
```

## Outputs

After deploying the infrastructure, you can get connection information
about the cluster:

- `$(terraform output --raw environment)` will set your current shell's
  `NOMAD_ADDR` and `CONSUL_HTTP_ADDR` to point to one of the cluster's server
  nodes, and set the `NOMAD_E2E` variable.
- `terraform output servers` will output the list of server node IPs.
- `terraform output linux_clients` will output the list of Linux
  client node IPs.
- `terraform output windows_clients` will output the list of Windows
  client node IPs.
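
For example, a quick way to confirm the outputs point at a healthy cluster:

```sh
# populate NOMAD_ADDR and CONSUL_HTTP_ADDR in the current shell
$(terraform output --raw environment)

nomad server members   # servers should be alive, with one acting as leader
nomad node status      # Linux and Windows clients should be ready
```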

## SSH

You can use the Terraform outputs above to access nodes via ssh:

```sh
ssh -i keys/nomad-e2e-*.pem ubuntu@${EC2_IP_ADDR}
```

The Windows client runs OpenSSH for convenience, but has a different
user and will drop you into a PowerShell session instead of bash:

```sh
ssh -i keys/nomad-e2e-*.pem Administrator@${EC2_IP_ADDR}
```

## Teardown

The Terraform state file tracks all the resources created above, so tearing
down the cluster is a single destroy:

```sh
cd e2e/terraform/
terraform destroy
```

## FAQ

#### E2E Provisioning Goals

1. The provisioning process should be able to run a nightly build against a
  variety of OS targets.
2. The provisioning process should be able to support update-in-place
  tests. (See [#7063](https://github.com/hernad/nomad/issues/7063))
3. A developer should be able to quickly stand up a small E2E cluster and
  provision it with a version of Nomad they've built on their laptop. The
  developer should be able to send updated builds to that cluster with a short
  iteration time, rather than having to rebuild the cluster.

#### Why not just drop all the provisioning into the AMI?

While that's the "correct" production approach for cloud infrastructure, it
creates a few pain points for testing:

* Creating a Linux AMI takes >10min, and creating a Windows AMI can take
  15-20min. This interferes with goal (3) above.
* We won't be able to do in-place upgrade testing without having an in-place
  provisioning process anyway. This interferes with goal (2) above.

#### Why not just drop all the provisioning into the user data?

* User data scripts are executed on boot, which prevents using them for
  in-place upgrade testing.
* User data scripts are not very observable, and it's painful to determine
  whether they've failed or simply haven't finished yet before trying to run
  tests.