# Terraform infrastructure

This folder contains Terraform resources for provisioning a Nomad
cluster on EC2 instances on AWS to use as the target of end-to-end
tests.

Terraform provisions the AWS infrastructure assuming that EC2 AMIs
have already been built via Packer and that HCP Consul and HCP Vault
clusters are already running. It deploys a build of Nomad from your
local machine along with configuration files.

## Setup

You'll need a recent version of Terraform (1.1+ recommended), as well
as AWS credentials to create the Nomad cluster and credentials for
HCP. This Terraform stack assumes that an appropriate instance role
has been configured elsewhere and that you have the ability to
`AssumeRole` into the AWS account.

Configure the following environment variables. For HashiCorp Nomad
developers, this configuration can be found in 1Password in the Nomad
team's vault under `nomad-e2e`.

```sh
export HCP_CLIENT_ID=
export HCP_CLIENT_SECRET=
export CONSUL_HTTP_TOKEN=
export CONSUL_HTTP_ADDR=
```

The Vault admin token will expire after 6 hours. If you haven't
created one already, use the separate Terraform configuration found in
the `hcp-vault-auth` directory. The following will set the correct
values for `VAULT_TOKEN`, `VAULT_ADDR`, and `VAULT_NAMESPACE`:

```sh
cd ./hcp-vault-auth
terraform init
terraform apply --auto-approve
$(terraform output --raw environment)
```

Optionally, edit the `terraform.tfvars` file to change the number of
Linux clients or Windows clients.

```hcl
region                          = "us-east-1"
instance_type                   = "t2.medium"
server_count                    = "3"
client_count_ubuntu_jammy_amd64 = "4"
client_count_windows_2016_amd64 = "1"
```

Optionally, edit the `nomad_local_binary` variable in the
`terraform.tfvars` file to change the path to the local binary of
Nomad you'd like to upload.
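For example, a `terraform.tfvars` entry might look like the sketch
below. The path shown is only an assumption about where your local
build lands; substitute the path to the binary you actually built.
Keep in mind the binary is uploaded to and run on the cluster's Linux
nodes, so it should be built for Linux/amd64 even if your workstation
differs.

```hcl
# terraform.tfvars
# Example path only: point this at the Nomad binary you built locally.
nomad_local_binary = "../../pkg/linux_amd64/nomad"
```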
Run `terraform apply` to deploy the infrastructure:

```sh
cd e2e/terraform/
terraform init
terraform apply
```

> Note: You will likely see "Connection refused" or "Permission denied" errors
> in the logs as the provisioning script run by Terraform hits an instance
> where the ssh service isn't yet ready. That's expected, and the provisioning
> steps will be retried. In particular, Windows instances can take a few
> minutes before ssh is ready.
>
> Also note: When ACLs are being bootstrapped, you may see "No cluster
> leader" in the output several times while the ACL bootstrap script
> waits for the cluster to start and elect a leader.

## Configuration

The files in `etc` are template configuration files for Nomad and the
Consul agent. Terraform will render these files to the `uploads`
folder and upload them to the cluster during provisioning.

* `etc/nomad.d` are the Nomad configuration files.
  * `base.hcl`, `tls.hcl`, `consul.hcl`, and `vault.hcl` are shared.
  * `server-linux.hcl`, `client-linux.hcl`, and `client-windows.hcl` are role- and platform-specific.
  * `client-linux-0.hcl`, etc. are specific to individual instances.
* `etc/consul.d` are the Consul agent configuration files.
* `etc/acls` are ACL policy files for Consul and Vault.

## Web UI

To access the web UI, deploy a reverse proxy to the cluster. All
clients have a TLS proxy certificate at `/etc/nomad.d/tls_proxy.crt`
and a self-signed cert at `/etc/nomad.d/self_signed.crt`. See
`../ui/input/proxy.nomad` for an example of using this. Deploy as follows:

```sh
nomad namespace apply proxy
nomad job run ../ui/input/proxy.nomad
```

You can get the public IP for the proxy allocation from the following
nested query:

```sh
nomad node status -json -verbose \
  $(nomad operator api '/v1/allocations?namespace=proxy' | jq -r '.[] | select(.JobID == "nomad-proxy") | .NodeID') \
  | jq '.Attributes."unique.platform.aws.public-ipv4"'
```

## Outputs

After deploying the infrastructure, you can get connection information
about the cluster:

- `$(terraform output --raw environment)` will set your current shell's
  `NOMAD_ADDR` and `CONSUL_HTTP_ADDR` to point to one of the cluster's server
  nodes, and set the `NOMAD_E2E` variable.
- `terraform output servers` will output the list of server node IPs.
- `terraform output linux_clients` will output the list of Linux
  client node IPs.
- `terraform output windows_clients` will output the list of Windows
  client node IPs.

## SSH

You can use the Terraform outputs above to access nodes via ssh:

```sh
ssh -i keys/nomad-e2e-*.pem ubuntu@${EC2_IP_ADDR}
```

The Windows client runs OpenSSH for convenience, but has a different
user and will drop you into a PowerShell session instead of bash:

```sh
ssh -i keys/nomad-e2e-*.pem Administrator@${EC2_IP_ADDR}
```

## Teardown

The Terraform state file stores all the information needed to tear the
cluster down:

```sh
cd e2e/terraform/
terraform destroy
```

## FAQ

#### E2E Provisioning Goals

1. The provisioning process should be able to run a nightly build against a
   variety of OS targets.
2. The provisioning process should be able to support update-in-place
   tests. (See [#7063](https://github.com/hernad/nomad/issues/7063))
3. A developer should be able to quickly stand up a small E2E cluster and
   provision it with a version of Nomad they've built on their laptop. The
   developer should be able to send updated builds to that cluster with a short
   iteration time, rather than having to rebuild the cluster.

#### Why not just drop all the provisioning into the AMI?

While that's the "correct" production approach for cloud infrastructure, it
creates a few pain points for testing:

* Creating a Linux AMI takes more than 10 minutes, and creating a Windows AMI
  can take 15-20 minutes. This interferes with goal (3) above.
* We won't be able to do in-place upgrade testing without having an in-place
  provisioning process anyway. This interferes with goal (2) above.

#### Why not just drop all the provisioning into the user data?

* User data scripts are executed on boot, which prevents using them for
  in-place upgrade testing.
* User data scripts are not very observable, and it's painful to determine
  whether they've failed or simply haven't finished yet before trying to run
  tests.