     1  ---
     2  layout: "guides"
     3  page_title: "Securing Nomad with TLS"
     4  sidebar_current: "guides-securing-nomad"
     5  description: |-
  Securing Nomad's cluster communication with TLS is important for both
  security and easing operations. Nomad can use mutual TLS (mTLS) to
  authenticate all HTTP and RPC communication.
     9  ---
    10  
    11  # Securing Nomad with TLS
    12  
    13  Securing Nomad's cluster communication is not only important for security but
    14  can even ease operations by preventing mistakes and misconfigurations. Nomad
    15  optionally uses mutual [TLS][tls] (mTLS) for all HTTP and RPC communication.
    16  Nomad's use of mTLS provides the following properties:
    17  
    18  * Prevent unauthorized Nomad access
    19  * Prevent observing or tampering with Nomad communication
    20  * Prevent client/server role or region misconfigurations
    21  * Prevent other services from masquerading as Nomad agents
    22  
Preventing region misconfigurations is a property of Nomad's mTLS not commonly
found in TLS implementations on the public Internet. While most uses of TLS
verify the identity of the server you are connecting to based on a domain name
such as `example.com`, Nomad verifies that the node you are connecting to is in
the expected region and configured for the expected role (e.g.
`client.us-west.nomad`). This also prevents other services that may have access
to certificates signed by the same private CA from masquerading as Nomad
agents. If certificates were identified based on hostname/IP then any other
service on a host could masquerade as a Nomad agent.
    32  
Correctly configuring TLS can be a complex process, especially given the wide
range of deployment methodologies. If you use the sample
[Vagrantfile][vagrantfile] from the [Getting Started Guide][guide-install], or
have [cfssl][cfssl] and Nomad installed, this guide will provide you with a
production-ready TLS configuration.

~> Note that while Nomad's TLS configuration will be production-ready, key
   management and rotation are complex subjects not covered by this guide.
   [Vault][vault] is the suggested solution for key generation and management.
    42  
    43  ## Creating Certificates
    44  
    45  The first step to configuring TLS for Nomad is generating certificates. In
    46  order to prevent unauthorized cluster access, Nomad requires all certificates
    47  be signed by the same Certificate Authority (CA). This should be a _private_ CA
    48  and not a public one like [Let's Encrypt][letsencrypt] as any certificate
    49  signed by this CA will be allowed to communicate with the cluster.
    50  
    51  ~> Nomad certificates may be signed by intermediate CAs as long as the root CA
    52     is the same. Append all intermediate CAs to the `cert_file`.
    53  
    54  ### Certificate Authority
    55  
    56  There are a variety of tools for managing your own CA, [like the PKI secret
    57  backend in Vault][vault-pki], but for the sake of simplicity this guide will
    58  use [cfssl][cfssl]. You can generate a private CA certificate and key with
    59  [cfssl][cfssl]:
    60  
    61  ```shell
    62  $ # Generate the CA's private key and certificate
    63  $ cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca
    64  ```
    65  
    66  The CA key (`nomad-ca-key.pem`) will be used to sign certificates for Nomad
    67  nodes and must be kept private. The CA certificate (`nomad-ca.pem`) contains
    68  the public key necessary to validate Nomad certificates and therefore must be
    69  distributed to every node that requires access.
    70  
    71  ### Node Certificates
    72  
    73  Once you have a CA certificate and key you can generate and sign the
    74  certificates Nomad will use directly. TLS certificates commonly use the
    75  fully-qualified domain name of the system being identified as the certificate's
    76  Common Name (CN). However, hosts (and therefore hostnames and IPs) are often
    77  ephemeral in Nomad clusters.  Not only would signing a new certificate per
    78  Nomad node be difficult, but using a hostname provides no security or
    79  functional benefits to Nomad. To fulfill the desired security properties
    80  (above) Nomad certificates are signed with their region and role such as:
    81  
    82  * `client.global.nomad` for a client node in the `global` region
    83  * `server.us-west.nomad` for a server node in the `us-west` region
    84  
To create certificates for the client and server in the cluster from the
[Getting Started guide][guide-cluster] with [cfssl][cfssl], first create ([or
download][cfssl.json]) the following configuration file as `cfssl.json` to
increase the default certificate expiration time:
    89  
    90  ```json
    91  {
    92    "signing": {
    93      "default": {
    94        "expiry": "87600h",
    95        "usages": [
    96          "signing",
    97          "key encipherment",
    98          "server auth",
    99          "client auth"
   100        ]
   101      }
   102    }
   103  }
   104  ```
   105  
Then generate and sign a certificate for the server, the client, and the CLI:

```shell
$ # Generate a certificate for the Nomad server
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
    -hostname="server.global.nomad,localhost,127.0.0.1" - | cfssljson -bare server

$ # Generate a certificate for the Nomad client
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
    -hostname="client.global.nomad,localhost,127.0.0.1" - | cfssljson -bare client

$ # Generate a certificate for the CLI
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -profile=client \
    - | cfssljson -bare cli
```
   119  
Using `localhost` and `127.0.0.1` as subject alternative names (SANs) allows
tools like `curl` to communicate with Nomad's HTTP API when run on the same
host. Other SANs may be added, including a DNS-resolvable hostname, to allow
remote HTTP requests from third-party tools.
   124  
   125  You should now have the following files:
   126  
   127  * `cfssl.json` - cfssl configuration.
   128  * `nomad-ca.csr` - CA signing request.
   129  * `nomad-ca-key.pem` - CA private key. Keep safe!
   130  * `nomad-ca.pem` - CA public certificate.
   131  * `cli.csr` - Nomad CLI certificate signing request.
   132  * `cli-key.pem` - Nomad CLI private key.
   133  * `cli.pem` - Nomad CLI certificate.
   134  * `client.csr` - Nomad client node certificate signing request for the `global` region.
   135  * `client-key.pem` - Nomad client node private key for the `global` region.
   136  * `client.pem` - Nomad client node public certificate for the `global` region.
   137  * `server.csr` - Nomad server node certificate signing request for the `global` region.
   138  * `server-key.pem` - Nomad server node private key for the `global` region.
   139  * `server.pem` - Nomad server node public certificate for the `global` region.
   140  
   141  Each Nomad node should have the appropriate key (`-key.pem`) and certificate
   142  (`.pem`) file for its region and role. In addition each node needs the CA's
   143  public certificate (`nomad-ca.pem`).
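
As an optional sanity check before distributing the files, the certificates can
be inspected with `openssl` (assumed to be installed; nothing else in this
guide requires it). The sketch below defines a hypothetical helper,
`check_cert`, that confirms a certificate chains to the CA and prints its
subject alternative names, which is where the `<role>.<region>.nomad` name
should appear.

```shell
# check_cert is a hypothetical helper, not part of Nomad or cfssl.
# Usage: check_cert <ca.pem> <cert.pem>
check_cert() {
  ca_file="$1"
  cert_file="$2"

  # Fail if the certificate does not chain to the given CA.
  openssl verify -CAfile "$ca_file" "$cert_file" || return 1

  # Print the SAN extension. For node certificates it should list the
  # role and region (for example server.global.nomad) plus localhost
  # and 127.0.0.1. A certificate without SANs gets a short notice.
  openssl x509 -in "$cert_file" -noout -text \
    | grep -A1 'Subject Alternative Name' \
    || echo "$cert_file: no subject alternative names"
}

# Check each certificate generated above, skipping any that are missing.
for pem in server.pem client.pem cli.pem; do
  if [ -f nomad-ca.pem ] && [ -f "$pem" ]; then
    check_cert nomad-ca.pem "$pem"
  fi
done
```

The CLI certificate was generated without `-hostname`, so it is expected to
report no SANs.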
   144  
   145  ## Configuring Nomad
   146  
   147  Next Nomad must be configured to use the newly-created key and certificates for
   148  mTLS. Starting with the [server configuration from the Getting Started
   149  guide][guide-server] add the following TLS configuration options:
   150  
   151  ```hcl
   152  # Increase log verbosity
   153  log_level = "DEBUG"
   154  
   155  # Setup data dir
   156  data_dir = "/tmp/server1"
   157  
   158  # Enable the server
   159  server {
   160    enabled = true
   161  
   162    # Self-elect, should be 3 or 5 for production
   163    bootstrap_expect = 1
   164  }
   165  
   166  # Require TLS
   167  tls {
   168    http = true
   169    rpc  = true
   170  
   171    ca_file   = "nomad-ca.pem"
   172    cert_file = "server.pem"
   173    key_file  = "server-key.pem"
   174  
   175    verify_server_hostname = true
   176    verify_https_client    = true
   177  }
   178  ```
   179  
   180  The new [`tls`][tls_block] section is worth breaking down in more detail:
   181  
   182  ```hcl
   183  tls {
   184    http = true
   185    rpc  = true
   186    # ...
   187  }
   188  ```
   189  
   190  This enables TLS for the HTTP and RPC protocols. Unlike web servers, Nomad
   191  doesn't use separate ports for TLS and non-TLS traffic: your cluster should
   192  either use TLS or not.
   193  
   194  ```hcl
   195  tls {
   196    # ...
   197  
   198    ca_file   = "nomad-ca.pem"
   199    cert_file = "server.pem"
   200    key_file  = "server-key.pem"
   201  
   202    # ...
   203  }
   204  ```
   205  
The `ca_file`, `cert_file`, and `key_file` settings should point to wherever
you placed the certificate files on the node. This guide assumes they are in
Nomad's current directory.
   208  
   209  ```hcl
   210  tls {
   211    # ...
   212  
   213    verify_server_hostname = true
   214    verify_https_client    = true
   215  }
   216  ```
   217  
These two settings are important for ensuring all of Nomad's mTLS security
properties are met. If [`verify_server_hostname`][verify_server_hostname] is
set to `false` the node's certificate will be checked to ensure it is signed by
the same CA, but its role and region will not be verified. This means any
service with a certificate signed by the same CA as Nomad can act as a client
or server of any region.
   224  
[`verify_https_client`][verify_https_client] requires HTTP API clients to
present a certificate signed by the same CA as Nomad's certificate; clients
that do not are denied access. It may be disabled to allow HTTP API clients
(e.g. Nomad CLI, Consul, or curl) to communicate with the HTTPS API without
presenting a client-side certificate.
   231  
   232  ~> Enabling `verify_https_client` effectively protects Nomad from unauthorized
   233     network access at the cost of losing Consul HTTPS health checks for agents.
   234  
   235  ### Client Configuration
   236  
   237  The Nomad client configuration is similar to the server configuration. The
   238  biggest difference is in the certificate and key used for configuration.
   239  
   240  ```hcl
   241  # Increase log verbosity
   242  log_level = "DEBUG"
   243  
   244  # Setup data dir
   245  data_dir = "/tmp/client1"
   246  
   247  # Enable the client
   248  client {
   249    enabled = true
   250  
   251    # For demo assume we are talking to server1. For production,
   252    # this should be like "nomad.service.consul:4647" and a system
   253    # like Consul used for service discovery.
   254    servers = ["127.0.0.1:4647"]
   255  }
   256  
   257  # Modify our port to avoid a collision with server1
   258  ports {
   259    http = 5656
   260  }
   261  
   262  # Require TLS
   263  tls {
   264    http = true
   265    rpc  = true
   266  
   267    ca_file   = "nomad-ca.pem"
   268    cert_file = "client.pem"
   269    key_file  = "client-key.pem"
   270  
   271    verify_server_hostname = true
   272    verify_https_client    = true
   273  }
   274  ```
   275  
   276  ### Running with TLS
   277  
   278  Now that we have certificates generated and configuration for a client and
   279  server we can test our TLS-enabled cluster!
   280  
   281  In separate terminals start a server and client agent:
   282  
   283  ```shell
   284  $ # In one terminal...
   285  $ nomad agent -config server1.hcl
   286  
   287  $ # ...and in another
   288  $ nomad agent -config client1.hcl
   289  ```
   290  
   291  If you run `nomad node status` now, you'll get an error, like:
   292  
   293  ```text
   294  Error querying node status: Get http://127.0.0.1:4646/v1/nodes: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
   295  ```
   296  
This is because the Nomad CLI defaults to communicating via HTTP instead of
HTTPS. We can configure the CLI to connect using TLS and specify our custom
keys and certificates using command line flags:
   300  
   301  ```shell
   302  $ nomad node status -ca-cert=nomad-ca.pem -client-cert=cli.pem -client-key=cli-key.pem -address=https://127.0.0.1:4646
   303  ```
   304  
   305  This process can be cumbersome to type each time, so the Nomad CLI also
   306  searches environment variables for default values. Set the following
   307  environment variables in your shell:
   308  
   309  ```shell
   310  $ export NOMAD_ADDR=https://localhost:4646
   311  $ export NOMAD_CACERT=nomad-ca.pem
   312  $ export NOMAD_CLIENT_CERT=cli.pem
   313  $ export NOMAD_CLIENT_KEY=cli-key.pem
   314  ```
   315  
* `NOMAD_ADDR` is the URL of the Nomad agent and sets the default for
  `-address`.
* `NOMAD_CACERT` is the location of your CA certificate and sets the default
  for `-ca-cert`.
* `NOMAD_CLIENT_CERT` is the location of your CLI certificate and sets the
  default for `-client-cert`.
* `NOMAD_CLIENT_KEY` is the location of your CLI key and sets the default for
  `-client-key`.
   323  
   324  After these environment variables are correctly configured, the CLI will
   325  respond as expected:
   326  
```text
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
237cd4c5  dc1  nomad  <none>  false  eligible     ready

$ nomad job init
Example job file written to example.nomad

$ nomad job run example.nomad
==> Monitoring evaluation "e9970e1d"
    Evaluation triggered by job "example"
    Allocation "a1f6c3e7" created: node "237cd4c5", group "cache"
    Evaluation within deployment: "080460ce"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e9970e1d" finished with status "complete"
```
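
Other HTTP clients need the same three files as the CLI. As a sketch, assuming
`curl` is installed and the agents started above are still running, the node
list can be requested from the `/v1/nodes` endpoint (the one the CLI was
querying in the earlier error message). The variable name `nodes_url` is only
for illustration.

```shell
# Default to the server's HTTPS address used throughout this guide.
NOMAD_ADDR="${NOMAD_ADDR:-https://127.0.0.1:4646}"
nodes_url="${NOMAD_ADDR}/v1/nodes"

# --cacert verifies the agent's certificate against the private CA.
# --cert and --key present the CLI certificate, which the agent
# requires because verify_https_client is enabled.
curl --silent --show-error --max-time 5 \
  --cacert nomad-ca.pem \
  --cert cli.pem \
  --key cli-key.pem \
  "$nodes_url" || echo "request to $nodes_url failed" >&2
```

Connecting by `127.0.0.1` or `localhost` works here only because both were
included as SANs when the server certificate was generated.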
   342  
   343  ## Server Gossip
   344  
   345  At this point all of Nomad's RPC and HTTP communication is secured with mTLS.
   346  However, Nomad servers also communicate with a gossip protocol, Serf, that does
   347  not use TLS:
   348  
   349  * HTTP - Used to communicate between CLI and Nomad agents. Secured by mTLS.
   350  * RPC - Used to communicate between Nomad agents. Secured by mTLS.
   351  * Serf - Used to communicate between Nomad servers. Secured by a shared key.
   352  
The Nomad servers' gossip protocol uses a shared key instead of TLS for
encryption. This encryption key must be added to every server's configuration
using the [`encrypt`](/docs/agent/configuration/server.html#encrypt) parameter
or with the [`-encrypt` command line option](/docs/commands/agent.html).
   357  
The Nomad CLI includes an `operator keygen` command for generating a new secure
gossip encryption key:
   360  
   361  ```text
   362  $ nomad operator keygen
   363  cg8StVXbQJ0gPvMd9o7yrg==
   364  ```
   365  
   366  Alternatively, you can use any method that base64 encodes 16 random bytes:
   367  
   368  ```text
   369  $ openssl rand -base64 16
   370  raZjciP8vikXng2S5X0m9w==
   371  $ dd if=/dev/urandom bs=16 count=1 status=none | base64
   372  LsuYyj93KVfT3pAJPMMCgA==
   373  ```
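
Whichever method is used, it can be worth confirming that the key really
decodes to 16 bytes before copying it to every server. A minimal sketch,
assuming a `base64` that accepts `-d` for decoding (as GNU coreutils does);
`gossip_key_bytes` is a hypothetical helper name:

```shell
# gossip_key_bytes prints how many raw bytes a base64-encoded key
# decodes to. It is an illustrative helper, not a Nomad command.
gossip_key_bytes() {
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}

# The example key generated by `nomad operator keygen` above.
key="cg8StVXbQJ0gPvMd9o7yrg=="

if [ "$(gossip_key_bytes "$key")" = "16" ]; then
  echo "key decodes to 16 bytes"
else
  echo "unexpected key length" >&2
fi
```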
   374  
   375  Put the same generated key into every server's configuration file or command
   376  line arguments:
   377  
   378  ```hcl
   379  server {
   380    enabled = true
   381  
   382    # Self-elect, should be 3 or 5 for production
   383    bootstrap_expect = 1
   384  
   385    # Encrypt gossip communication
   386    encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
   387  }
   388  ```
   389  
   390  ## Switching an existing cluster to TLS
   391  
Since Nomad does _not_ use different ports for TLS and non-TLS communication,
the use of TLS must be consistent across the cluster. Switching an existing
cluster to use TLS everywhere is operationally similar to upgrading between
versions of Nomad, but requires additional steps to prevent needlessly
rescheduling allocations.
   397  
1. Add the appropriate key and certificates to all nodes.
   * Ensure the private key file is only readable by the Nomad user.
1. Add the environment variables to all nodes where the CLI is used.
1. Add the appropriate [`tls`][tls_block] block to the configuration file on
   all nodes.
1. Generate a gossip key and add it to the Nomad server configuration.
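
For the private key permissions mentioned in the first step, one possible
approach is sketched below. It assumes the key files are already owned by the
account Nomad runs as; `restrict_key` is a hypothetical helper name.

```shell
# restrict_key makes a private key readable and writable only by its
# owner, then prints the resulting mode string for confirmation.
restrict_key() {
  chmod 0600 "$1" && ls -l "$1" | cut -c1-10
}

# Apply it to whichever key files are present on this node.
for key_file in server-key.pem client-key.pem cli-key.pem; do
  if [ -f "$key_file" ]; then
    restrict_key "$key_file"
  fi
done
```

If the files were copied as a different user, a `chown` to the Nomad account
would also be needed; that account name depends on your installation.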
   404  
~> Once a quorum of servers is TLS-enabled, clients will no longer be able to
   communicate with the servers until their client configuration is updated and
   reloaded.
   408  
   409  At this point a rolling restart of the cluster will enable TLS everywhere.
   410  However, once servers are restarted clients will be unable to heartbeat. This
   411  means any client unable to restart with TLS enabled before their heartbeat TTL
   412  expires will have their allocations marked as `lost` and rescheduled.
   413  
   414  While the default heartbeat settings may be sufficient for concurrently
   415  restarting a small number of nodes without any allocations being marked as
   416  `lost`, most operators should raise the [`heartbeat_grace`][heartbeat_grace]
   417  configuration setting before restarting their servers:
   418  
   419  1. Set `heartbeat_grace = "1h"` or an appropriate duration on servers.
   420  1. Restart servers, one at a time.
   421  1. Restart clients, one or more at a time.
   422  1. Set [`heartbeat_grace`][heartbeat_grace] back to its previous value (or
   423     remove to accept the default).
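
One way to apply and later revert the temporary setting is to keep it in its
own file. The sketch below assumes the servers load every file in a
configuration directory (for example via `nomad agent -config <dir>`); the
directory and file names are placeholders, not paths used elsewhere in this
guide.

```shell
# Hypothetical configuration directory for the server agent.
config_dir="${config_dir:-./nomad.d}"
mkdir -p "$config_dir"

# Step 1: temporary override used only while the cluster is migrated.
cat > "$config_dir/heartbeat-grace.hcl" <<'EOF'
server {
  heartbeat_grace = "1h"
}
EOF

# Steps 4 and 5: once every client is back, remove the override and
# restart the servers again, for example:
#   rm "$config_dir/heartbeat-grace.hcl"
```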
   424  1. Restart servers, one at a time.
   425  
~> Servers can also be configured to accept both TLS and non-TLS connections
   from clients during the migration. See the
   [`rpc_upgrade_mode`][rpc_upgrade_mode] option described under "RPC Upgrade
   Mode for Nomad Servers" below.
   429  
   430  Jobs running in the cluster will _not_ be affected and will continue running
   431  throughout the switch as long as all clients can restart within their heartbeat
   432  TTL.
   433  
   434  ## Changing Nomad certificates on the fly
   435  
   436  As of 0.7.1, Nomad supports dynamic certificate reloading via SIGHUP.
   437  
   438  Given a prior TLS configuration as follows:
   439  
   440  ```hcl
   441  tls {
   442    http = true
   443    rpc  = true
   444  
   445    ca_file   = "nomad-ca.pem"
   446    cert_file = "server.pem"
   447    key_file  = "server-key.pem"
   448  
   449    verify_server_hostname = true
   450    verify_https_client    = true
   451  }
   452  ```
   453  
Nomad's `cert_file` and `key_file` can be replaced by updating the TLS stanza
as follows and then sending the agent a SIGHUP:
   456  
   457  ```hcl
   458  tls {
   459    http = true
   460    rpc  = true
   461  
   462    ca_file   = "nomad-ca.pem"
   463    cert_file = "new_server.pem"
   464    key_file  = "new_server_key.pem"
   465  
   466    verify_server_hostname = true
   467    verify_https_client    = true
   468  }
```

## Migrating a cluster to TLS
   471  
   472  ### Reloading TLS configuration via SIGHUP
   473  
Nomad supports dynamically reloading both client and server TLS configuration.
To reload an agent's TLS configuration, first update the TLS block in the
agent's configuration file and then send the Nomad agent a SIGHUP signal.
Note that a SIGHUP reloads only a subset of the configuration file, which
includes the TLS configuration.
   479  
   480  The agent reloads all its network connections when there are changes to its TLS
   481  configuration during a config reload via SIGHUP. Any new connections
   482  established will use the updated configuration, and any outstanding old
   483  connections will be closed. This process works when upgrading to TLS,
   484  downgrading from it, as well as rolling certificates. We recommend upgrading
   485  to TLS.
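
Sending the signal might look like the following sketch. Looking the agent up
with `pgrep` is an assumption made for illustration; a PID file or your service
manager's reload command would work equally well. `reload_nomad_agent` is a
hypothetical helper name.

```shell
# reload_nomad_agent asks a running agent to re-read its configuration
# by sending SIGHUP to the given process ID.
reload_nomad_agent() {
  agent="$1"

  # kill -0 sends no signal; it only checks that the process exists
  # and that we are allowed to signal it.
  if ! kill -0 "$agent" 2>/dev/null; then
    echo "no signalable process with PID $agent" >&2
    return 1
  fi

  kill -HUP "$agent"
}

# Signal the oldest process named exactly "nomad", if there is one.
agent_pid="$(pgrep -o -x nomad || true)"
if [ -n "$agent_pid" ]; then
  reload_nomad_agent "$agent_pid"
fi
```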
   486  
   487  ### RPC Upgrade Mode for Nomad Servers
   488  
When migrating to TLS, the [`rpc_upgrade_mode`][rpc_upgrade_mode] option
(defaults to `false`) in the TLS configuration for a Nomad server can be set
to `true`. When set to `true`, servers will accept both TLS and non-TLS
connections. This lets operators upgrade clients to TLS gradually: without it,
servers reject non-TLS connections, and clients that have not yet been upgraded
are marked as lost. Note that `rpc_upgrade_mode` should only be used
temporarily during the migration, and should be set back to `false` (meaning
that the server will strictly accept only TLS connections) once the entire
cluster has been migrated.
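
As an illustration, a server's TLS block during the migration could look like
the one written out below. This is a sketch: the directory and file name are
placeholders, and how the file is combined with the rest of the agent
configuration is left to your deployment.

```shell
# Hypothetical configuration directory for the server agent.
tls_config_dir="${tls_config_dir:-./nomad.d}"
mkdir -p "$tls_config_dir"

cat > "$tls_config_dir/server-tls.hcl" <<'EOF'
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "server.pem"
  key_file  = "server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true

  # Temporary: accept both TLS and non-TLS RPC connections while
  # clients are being migrated. Set back to false afterwards.
  rpc_upgrade_mode = true
}
EOF
```

Once every client connects over TLS, setting `rpc_upgrade_mode` back to `false`
(or removing the line) and reloading the servers restores strict TLS.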
   499  
   500  [cfssl]: https://cfssl.org/
   501  [cfssl.json]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/cfssl.json
   502  [guide-install]: https://www.nomadproject.io/intro/getting-started/install.html
   503  [guide-cluster]: https://www.nomadproject.io/intro/getting-started/cluster.html
   504  [guide-server]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/server.hcl
   505  [heartbeat_grace]: /docs/agent/configuration/server.html#heartbeat_grace
   506  [letsencrypt]: https://letsencrypt.org/
[rpc_upgrade_mode]: https://www.nomadproject.io/docs/agent/configuration/tls.html#rpc_upgrade_mode
   508  [tls]: https://en.wikipedia.org/wiki/Transport_Layer_Security
   509  [tls_block]: /docs/agent/configuration/tls.html
   510  [vagrantfile]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/Vagrantfile
   511  [vault]: https://www.vaultproject.io/
   512  [vault-pki]: https://www.vaultproject.io/docs/secrets/pki/index.html
   513  [verify_https_client]: /docs/agent/configuration/tls.html#verify_https_client
   514  [verify_server_hostname]: /docs/agent/configuration/tls.html#verify_server_hostname