---
layout: "guides"
page_title: "Securing Nomad with TLS"
sidebar_current: "guides-securing-nomad"
description: |-
  Securing Nomad's cluster communication with TLS is important for both
  security and easing operations. Nomad can use mutual TLS (mTLS) for
  authenticating all HTTP and RPC communication.
---

# Securing Nomad with TLS

Securing Nomad's cluster communication is not only important for security but
can even ease operations by preventing mistakes and misconfigurations. Nomad
optionally uses mutual [TLS][tls] (mTLS) for all HTTP and RPC communication.
Nomad's use of mTLS provides the following properties:

* Prevent unauthorized Nomad access
* Prevent observing or tampering with Nomad communication
* Prevent client/server role or region misconfigurations
* Prevent other services from masquerading as Nomad agents

Preventing region misconfigurations is a property of Nomad's mTLS not commonly
found in the TLS implementations on the public Internet. While most uses of
TLS verify the identity of the server you are connecting to based on a domain
name such as `example.com`, Nomad verifies that the node you are connecting to
is in the expected region and configured for the expected role (e.g.
`client.us-west.nomad`). This also prevents other services that may have access
to certificates signed by the same private CA from masquerading as Nomad
agents. If certificates were identified based on hostname/IP, then any other
service on a host could masquerade as a Nomad agent.

Correctly configuring TLS can be a complex process, especially given the wide
range of deployment methodologies.
If you use the sample
[Vagrantfile][vagrantfile] from the [Getting Started Guide][guide-install], or
have [cfssl][cfssl] and Nomad installed, this guide will provide you with a
production-ready TLS configuration.

~> Note that while Nomad's TLS configuration will be production-ready, key
management and rotation are complex subjects not covered by this guide.
[Vault][vault] is the suggested solution for key generation and management.

## Creating Certificates

The first step to configuring TLS for Nomad is generating certificates. To
prevent unauthorized cluster access, Nomad requires all certificates to be
signed by the same Certificate Authority (CA). This should be a _private_ CA
and not a public one like [Let's Encrypt][letsencrypt], as any certificate
signed by this CA will be allowed to communicate with the cluster.

~> Nomad certificates may be signed by intermediate CAs as long as the root CA
is the same. Append all intermediate CAs to the `cert_file`.

### Certificate Authority

There are a variety of tools for managing your own CA, [like the PKI secret
backend in Vault][vault-pki], but for the sake of simplicity this guide will
use [cfssl][cfssl]. You can generate a private CA certificate and key with
[cfssl][cfssl]:

```shell
$ # Generate the CA's private key and certificate
$ cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca
```

The CA key (`nomad-ca-key.pem`) will be used to sign certificates for Nomad
nodes and must be kept private. The CA certificate (`nomad-ca.pem`) contains
the public key necessary to validate Nomad certificates and therefore must be
distributed to every node that requires access.

### Node Certificates

Once you have a CA certificate and key you can generate and sign the
certificates Nomad will use directly.
TLS certificates commonly use the
fully-qualified domain name of the system being identified as the certificate's
Common Name (CN). However, hosts (and therefore hostnames and IPs) are often
ephemeral in Nomad clusters. Not only would signing a new certificate per
Nomad node be difficult, but using a hostname provides no security or
functional benefits to Nomad. To provide the security properties listed above,
Nomad certificates are signed with their role and region, such as:

* `client.global.nomad` for a client node in the `global` region
* `server.us-west.nomad` for a server node in the `us-west` region

To create certificates for the client and server in the cluster from the
[Getting Started guide][guide-cluster] with [cfssl][cfssl], create ([or
download][cfssl.json]) the following configuration file as `cfssl.json` to
increase the default certificate expiration time to 87600 hours (10 years):

```json
{
  "signing": {
    "default": {
      "expiry": "87600h",
      "usages": [
        "signing",
        "key encipherment",
        "server auth",
        "client auth"
      ]
    }
  }
}
```

```shell
$ # Generate a certificate for the Nomad server
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
    -hostname="server.global.nomad,localhost,127.0.0.1" - | cfssljson -bare server

$ # Generate a certificate for the Nomad client
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
    -hostname="client.global.nomad,localhost,127.0.0.1" - | cfssljson -bare client

$ # Generate a certificate for the CLI
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -profile=client \
    - | cfssljson -bare cli
```

Using `localhost` and `127.0.0.1` as subject alternative names (SANs) allows
tools like `curl` to communicate with Nomad's HTTP API when run on the same
host.
Other SANs may be added, including a DNS-resolvable hostname, to allow remote
HTTP requests from third-party tools.

You should now have the following files:

* `cfssl.json` - cfssl configuration.
* `nomad-ca.csr` - CA signing request.
* `nomad-ca-key.pem` - CA private key. Keep safe!
* `nomad-ca.pem` - CA public certificate.
* `cli.csr` - Nomad CLI certificate signing request.
* `cli-key.pem` - Nomad CLI private key.
* `cli.pem` - Nomad CLI certificate.
* `client.csr` - Nomad client node certificate signing request for the `global` region.
* `client-key.pem` - Nomad client node private key for the `global` region.
* `client.pem` - Nomad client node public certificate for the `global` region.
* `server.csr` - Nomad server node certificate signing request for the `global` region.
* `server-key.pem` - Nomad server node private key for the `global` region.
* `server.pem` - Nomad server node public certificate for the `global` region.

Each Nomad node should have the appropriate key (`-key.pem`) and certificate
(`.pem`) file for its region and role. In addition, each node needs the CA's
public certificate (`nomad-ca.pem`).

## Configuring Nomad

Next, Nomad must be configured to use the newly created key and certificates
for mTLS.
Starting with the [server configuration from the Getting Started
guide][guide-server], add the following TLS configuration options and save the
result as `server1.hcl`:

```hcl
# Increase log verbosity
log_level = "DEBUG"

# Set up the data dir
data_dir = "/tmp/server1"

# Enable the server
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1
}

# Require TLS
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "server.pem"
  key_file  = "server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

The new [`tls`][tls_block] section is worth breaking down in more detail:

```hcl
tls {
  http = true
  rpc  = true
  # ...
}
```

This enables TLS for the HTTP and RPC protocols. Unlike web servers, Nomad
doesn't use separate ports for TLS and non-TLS traffic, so your cluster should
either use TLS everywhere or not at all.

```hcl
tls {
  # ...

  ca_file   = "nomad-ca.pem"
  cert_file = "server.pem"
  key_file  = "server-key.pem"

  # ...
}
```

The file paths should point to wherever you placed the certificate and key
files on the node. This guide assumes they are in the agent's current working
directory.

```hcl
tls {
  # ...

  verify_server_hostname = true
  verify_https_client    = true
}
```

These two settings are important for ensuring all of Nomad's mTLS security
properties are met. If [`verify_server_hostname`][verify_server_hostname] is
set to `false`, the node's certificate will be checked to ensure it is signed
by the same CA, but its role and region will not be verified. This means any
service with a certificate signed by the same CA as Nomad can act as a client
or server of any region.
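
To see concretely what `verify_server_hostname` adds, the following sketch uses `openssl` in place of cfssl (an assumption: a reasonably recent OpenSSL is installed) to create a throwaway CA plus a "server" and a "client" certificate in a temporary directory. A CA-only check, roughly comparable to `verify_server_hostname = false`, accepts both certificates, while also requiring the name `server.global.nomad` rejects the client certificate. It only illustrates the idea and is not how Nomad itself implements the check; the `issue` function and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: openssl stands in for cfssl, every file lives in a throwaway
# directory, and all certificates are valid for one day.
set -euo pipefail
cd "$(mktemp -d)"

# Self-signed demo CA
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout ca-key.pem -out ca.pem -subj "/CN=demo-ca" 2>/dev/null

# issue NAME DNSNAME: create NAME-key.pem and NAME.pem, signed by the demo CA,
# with DNSNAME as the certificate's only subject alternative name
issue() {
  openssl req -newkey rsa:2048 -nodes -keyout "$1-key.pem" -out "$1.csr" \
    -subj "/CN=$2" 2>/dev/null
  printf 'subjectAltName=DNS:%s\n' "$2" > "$1.ext"
  openssl x509 -req -in "$1.csr" -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial -days 1 -extfile "$1.ext" -out "$1.pem" 2>/dev/null
}
issue server server.global.nomad
issue client client.global.nomad

# CA-only check (similar to verify_server_hostname = false): both pass
openssl verify -CAfile ca.pem server.pem client.pem

# CA plus name check (similar to verify_server_hostname = true)
for cert in server.pem client.pem; do
  if openssl verify -CAfile ca.pem -verify_hostname server.global.nomad \
      "$cert" >/dev/null 2>&1; then
    echo "$cert: accepted as server.global.nomad"
  else
    echo "$cert: rejected as server.global.nomad"
  fi
done
```

Both files should pass the CA-only check, but once the name is required only `server.pem` is accepted, which is the role and region protection described above.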

[`verify_https_client`][verify_https_client] requires HTTP API clients to
present a certificate signed by the same CA as Nomad's certificate, so only
such clients are allowed to access Nomad's HTTP API. It may be disabled to
allow HTTP API clients (e.g. Nomad CLI, Consul, or curl) to communicate with
the HTTPS API without presenting a client-side certificate.

~> Enabling `verify_https_client` effectively protects Nomad from unauthorized
network access at the cost of losing Consul HTTPS health checks for agents.

### Client Configuration

The Nomad client configuration, saved as `client1.hcl`, is similar to the
server configuration. The biggest difference is the certificate and key it
uses: `client.pem` and `client-key.pem` instead of the server's.

```hcl
# Increase log verbosity
log_level = "DEBUG"

# Set up the data dir
data_dir = "/tmp/client1"

# Enable the client
client {
  enabled = true

  # For demo assume we are talking to server1. For production,
  # this should be like "nomad.service.consul:4647" and a system
  # like Consul used for service discovery.
  servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
  http = 5656
}

# Require TLS
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "client.pem"
  key_file  = "client-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

### Running with TLS

Now that we have certificates generated and configuration for a client and
server, we can test our TLS-enabled cluster!

In separate terminals, start a server and a client agent:

```shell
$ # In one terminal...
$ nomad agent -config server1.hcl

$ # ...and in another
$ nomad agent -config client1.hcl
```

If you run `nomad node status` now, you'll get an error like:

```text
Error querying node status: Get http://127.0.0.1:4646/v1/nodes: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
```

This is because the Nomad CLI defaults to communicating via HTTP instead of
HTTPS. We can configure the Nomad CLI to connect using TLS and present our
custom CA, certificate, and key using command line flags:

```shell
$ nomad node status -ca-cert=nomad-ca.pem -client-cert=cli.pem -client-key=cli-key.pem -address=https://127.0.0.1:4646
```

This can be cumbersome to type each time, so the Nomad CLI also reads default
values from environment variables. Set the following environment variables in
your shell:

```shell
$ export NOMAD_ADDR=https://localhost:4646
$ export NOMAD_CACERT=nomad-ca.pem
$ export NOMAD_CLIENT_CERT=cli.pem
$ export NOMAD_CLIENT_KEY=cli-key.pem
```

* `NOMAD_ADDR` is the URL of the Nomad agent and sets the default for `-address`.
* `NOMAD_CACERT` is the location of your CA certificate and sets the default
  for `-ca-cert`.
* `NOMAD_CLIENT_CERT` is the location of your CLI certificate and sets the
  default for `-client-cert`.
* `NOMAD_CLIENT_KEY` is the location of your CLI key and sets the default for
  `-client-key`.
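
Tools such as `curl` do not read these `NOMAD_*` variables, so they need the same CA, certificate, and key passed explicitly; with `verify_https_client = true`, a request without a client certificate is rejected. As a minimal sketch, a hypothetical shell function `nomad_curl` (not part of Nomad) can reuse the variables exported above for those flags:

```shell
# Hypothetical helper (not part of Nomad): call the HTTP API with curl using
# the CA, certificate, and key from the NOMAD_* variables exported above.
# Each ${VAR:?} expansion aborts with a message if the variable is unset or empty.
# Usage: nomad_curl /v1/nodes
nomad_curl() {
  curl --silent --show-error --fail \
    --cacert "${NOMAD_CACERT:?NOMAD_CACERT is not set}" \
    --cert "${NOMAD_CLIENT_CERT:?NOMAD_CLIENT_CERT is not set}" \
    --key "${NOMAD_CLIENT_KEY:?NOMAD_CLIENT_KEY is not set}" \
    "${NOMAD_ADDR:?NOMAD_ADDR is not set}$1"
}
```

For example, `nomad_curl /v1/nodes` should return the same nodes as `nomad node status`, as JSON. Because `NOMAD_ADDR` uses `localhost`, this relies on the `localhost` SAN added when the certificates were generated.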

After these environment variables are correctly configured, the CLI will
respond as expected:

```text
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
237cd4c5  dc1  nomad  <none>  false  eligible     ready

$ nomad job init
Example job file written to example.nomad
$ nomad job run example.nomad
==> Monitoring evaluation "e9970e1d"
    Evaluation triggered by job "example"
    Allocation "a1f6c3e7" created: node "237cd4c5", group "cache"
    Evaluation within deployment: "080460ce"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e9970e1d" finished with status "complete"
```

## Server Gossip

At this point all of Nomad's RPC and HTTP communication is secured with mTLS.
However, Nomad servers also communicate with a gossip protocol, Serf, that does
not use TLS:

* HTTP - Used to communicate between CLI and Nomad agents. Secured by mTLS.
* RPC - Used to communicate between Nomad agents. Secured by mTLS.
* Serf - Used to communicate between Nomad servers. Secured by a shared key.

Nomad servers' gossip protocol uses a shared key instead of TLS for encryption.
This encryption key must be added to every server's configuration using the
[`encrypt`](/docs/agent/configuration/server.html#encrypt) parameter or with
the [`-encrypt` command line option](/docs/commands/agent.html).
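
Every server must receive exactly the same key, and the keys used in this guide are 16 random bytes, base64-encoded. Before distributing a key you can sanity-check it; `check_gossip_key` below is a hypothetical helper (not a Nomad command), sketched under the assumption that `base64 -d` is available, as in GNU coreutils.

```shell
# Hypothetical helper (not part of Nomad): succeed only if $1 decodes as
# base64 to exactly 16 bytes, the key size used in this guide.
check_gossip_key() {
  # Reject input that base64 cannot decode
  printf '%s' "$1" | base64 -d >/dev/null 2>&1 || return 1
  # Count the decoded bytes (tr strips the padding some wc builds print)
  local len
  len="$(printf '%s' "$1" | base64 -d | wc -c | tr -d '[:space:]')"
  [ "$len" -eq 16 ]
}
```

Running `check_gossip_key "cg8StVXbQJ0gPvMd9o7yrg=="` succeeds, while a truncated or mistyped key makes it fail.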

The Nomad CLI includes an `operator keygen` command for generating a new secure
gossip encryption key:

```text
$ nomad operator keygen
cg8StVXbQJ0gPvMd9o7yrg==
```

Alternatively, you can use any method that base64-encodes 16 random bytes:

```text
$ openssl rand -base64 16
raZjciP8vikXng2S5X0m9w==
$ dd if=/dev/urandom bs=16 count=1 status=none | base64
LsuYyj93KVfT3pAJPMMCgA==
```

Put the same generated key into every server's configuration file or command
line arguments:

```hcl
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1

  # Encrypt gossip communication
  encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}
```

## Switching an existing cluster to TLS

Since Nomad does _not_ use different ports for TLS and non-TLS communication,
the use of TLS must be consistent across the cluster. Switching an existing
cluster to use TLS everywhere is operationally similar to upgrading between
versions of Nomad, but requires additional steps to prevent needlessly
rescheduling allocations.

1. Add the appropriate key and certificates to all nodes.
    * Ensure the private key file is only readable by the Nomad user.
1. Add the environment variables to all nodes where the CLI is used.
1. Add the appropriate [`tls`][tls_block] block to the configuration file on
   all nodes.
1. Generate a gossip key and add it to the Nomad server configuration.

~> Once a quorum of servers is TLS-enabled, clients will no longer be able to
communicate with the servers until their client configuration is updated and
reloaded.

At this point a rolling restart of the cluster will enable TLS everywhere.
However, once servers are restarted, clients will be unable to heartbeat.
This means any client unable to restart with TLS enabled before its heartbeat
TTL expires will have its allocations marked as `lost` and rescheduled.

While the default heartbeat settings may be sufficient for concurrently
restarting a small number of nodes without any allocations being marked as
`lost`, most operators should raise the [`heartbeat_grace`][heartbeat_grace]
configuration setting before restarting their servers:

1. Set `heartbeat_grace = "1h"` or an appropriate duration on servers.
1. Restart servers, one at a time.
1. Restart clients, one or more at a time.
1. Set [`heartbeat_grace`][heartbeat_grace] back to its previous value (or
   remove it to accept the default).
1. Restart servers, one at a time.

~> In a future release Nomad will allow upgrading a cluster to use TLS by
allowing servers to accept TLS and non-TLS connections from clients during
the migration.

Jobs running in the cluster will _not_ be affected and will continue running
throughout the switch as long as all clients can restart within their heartbeat
TTL.

## Changing Nomad certificates on the fly

As of 0.7.1, Nomad supports dynamic certificate reloading via SIGHUP.
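
Because a reload picks up whichever files the updated `tls` stanza names, it can be worth confirming, before sending the signal, that a new certificate and key belong together and that the certificate chains to the CA. A minimal sketch, assuming `openssl` is installed; `check_new_cert` is a hypothetical helper, not a Nomad command:

```shell
# Hypothetical helper (not part of Nomad): succeed only if KEY is the private
# key for CERT (their public keys match) and CERT verifies against CA.
# Usage: check_new_cert CA CERT KEY
check_new_cert() {
  local ca="$1" cert="$2" key="$3"
  local cert_pub key_pub
  # Public key embedded in the certificate, as PEM
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey)" || return 1
  # Public key derived from the private key, as PEM
  key_pub="$(openssl pkey -in "$key" -pubout)" || return 1
  [ "$cert_pub" = "$key_pub" ] || return 1
  openssl verify -CAfile "$ca" "$cert" >/dev/null
}
```

With the file names from the example that follows, a guarded reload could look like `check_new_cert nomad-ca.pem new_server.pem new_server_key.pem && pkill -HUP -x nomad`. Note that `pkill -x nomad` signals every process named exactly `nomad`, which on this guide's single host means both the server and the client agent.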

Given a prior TLS configuration as follows:

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "server.pem"
  key_file  = "server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

To switch Nomad to a new certificate and key, update `cert_file` and
`key_file` in the `tls` stanza and then send the agent a SIGHUP:

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "new_server.pem"
  key_file  = "new_server_key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

## Migrating a cluster to TLS

### Reloading TLS configuration via SIGHUP

Nomad supports dynamically reloading both client and server TLS configuration.
To reload an agent's TLS configuration, first update the TLS block in the
agent's configuration file and then send the Nomad agent a SIGHUP signal.
Note that SIGHUP reloads only a subset of the configuration file; the TLS
configuration is part of that subset.

When a SIGHUP reload changes the agent's TLS configuration, the agent reloads
all its network connections. Any new connections will use the updated
configuration, and any outstanding old connections will be closed. This works
when upgrading to TLS, downgrading from it, and rotating certificates. We
recommend upgrading to TLS.

### RPC Upgrade Mode for Nomad Servers

When migrating to TLS, the [`rpc_upgrade_mode`][rpc_upgrade_mode] option
(defaults to `false`) in the TLS configuration for a Nomad server can be set
to `true`. When set to `true`, servers will accept both TLS and non-TLS
connections. By accepting non-TLS connections, operators can upgrade clients
to TLS without clients being marked as lost because a server rejected their
non-TLS connections.
However, `rpc_upgrade_mode` should only be used as a temporary measure during
the migration, and it should be reset to `false` (meaning that the server will
strictly accept only TLS connections) once the entire cluster has been
migrated.

[cfssl]: https://cfssl.org/
[cfssl.json]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/cfssl.json
[guide-install]: https://www.nomadproject.io/intro/getting-started/install.html
[guide-cluster]: https://www.nomadproject.io/intro/getting-started/cluster.html
[guide-server]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/server.hcl
[heartbeat_grace]: /docs/agent/configuration/server.html#heartbeat_grace
[letsencrypt]: https://letsencrypt.org/
[rpc_upgrade_mode]: https://www.nomadproject.io/docs/agent/configuration/tls.html#rpc_upgrade_mode/
[tls]: https://en.wikipedia.org/wiki/Transport_Layer_Security
[tls_block]: /docs/agent/configuration/tls.html
[vagrantfile]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/Vagrantfile
[vault]: https://www.vaultproject.io/
[vault-pki]: https://www.vaultproject.io/docs/secrets/pki/index.html
[verify_https_client]: /docs/agent/configuration/tls.html#verify_https_client
[verify_server_hostname]: /docs/agent/configuration/tls.html#verify_server_hostname