---
layout: docs
page_title: Security Model
description: >-
  Nomad relies on both a lightweight gossip mechanism and an RPC system to
  provide various features. Both of the systems have different security
  mechanisms that stem from their designs. However, the security mechanisms of
  Nomad have a common goal: to provide confidentiality, integrity, and
  authentication.
---

## Overview

Nomad is a flexible workload orchestrator for deploying and managing any
containerized or legacy application using a single, unified workflow. It can run
diverse workloads including Docker, non-containerized, microservice, and batch
applications.

Nomad utilizes a lightweight gossip and RPC system, [similar to
Consul](https://developer.hashicorp.com/consul/docs/concepts/security), which provides
various essential features. Both of these systems provide security mechanisms
which should be utilized to help provide [confidentiality, integrity and
authentication](https://en.wikipedia.org/wiki/Information_security).

Using defense in depth is crucial for cluster security, and deployment
requirements may differ drastically depending on your use case. Further security
features for multi-tenant deployments are offered exclusively in the enterprise
version. This documentation may need to be adapted to your deployment situation,
but the general mechanisms for a secure Nomad deployment revolve around:

- **[mTLS](https://learn.hashicorp.com/tutorials/nomad/security-enable-tls)** -
  Mutual authentication of both the TLS server and client X.509 certificates
  prevents internal abuse by blocking unauthenticated access to network
  components within the cluster.

- **[ACLs](https://learn.hashicorp.com/collections/nomad/access-control)** - Enables
  authorization for authenticated connections by granting capabilities to ACL
  tokens.

- **[Namespaces](https://learn.hashicorp.com/tutorials/nomad/namespaces)** -
  Access to read and write to a namespace can be
  controlled to allow for granular access to job information managed within a
  multi-tenant cluster.

- **[Sentinel Policies](https://learn.hashicorp.com/tutorials/nomad/sentinel)**
  (**Enterprise Only**) - Sentinel policies allow for granular control over
  components such as task drivers within a cluster.

### Personas

When thinking about Nomad, it helps to consider the following base personas
when managing the security requirements for the cluster deployment. The
granularity may change depending on your team's use case. Rigorous roles
can be accurately defined and managed using the [Nomad backend secret engine for
Vault](https://www.vaultproject.io/docs/secrets/nomad). The
[access control tutorials](https://learn.hashicorp.com/collections/nomad/access-control)
describe this further, with getting-started steps that use a development server.

It's important to note that there's no traditional concept of a user
within Nomad itself.

- **System Administrator** - This is someone who has access to the underlying
  infrastructure of a Nomad cluster. Often they can SSH or RDP
  directly into a server within a cluster through a bastion host. Ultimately
  they have read, write and execute permissions for the actual Nomad binary.
  This binary is the same for server and client nodes using different
  configuration files. These users potentially have something like sudo,
  administrative, or some other super-user access to the underlying compute
  resource. Users like these are essentially totally trusted by Nomad as they
  have administrative rights to the system and can start or stop the agent.

- **Nomad Administrator** - This is someone (probably the same **System
  Administrator**) who has access to define the Nomad agent configurations
  for servers and clients, and/or has a Nomad management ACL token. They also
  have total rights to all of the parts in the Nomad system including the
  ability to start and stop all jobs within a cluster.

- **Nomad Operator** - This is someone who likely has selective access with
  restricted capabilities to manage jobs applicable to their namespace within
  a cluster.

- **User** - This is someone who is a user of an application being run on the
  system. In some cases applications may be public facing and exposed to the
  internet such as a web server. This is someone who shouldn't have any
  network access to the Nomad server API.

### Secure Configuration

Nomad's security model is applicable only if all parts of the system are running
with a secure configuration; **Nomad is not secure-by-default.** Without the following
mechanisms enabled in Nomad's configuration, it may be possible to abuse access
to a cluster. As with all security considerations, determine which concerns
apply to your environment and adapt these security recommendations
accordingly.

#### Requirements

- **[mTLS enabled](https://learn.hashicorp.com/tutorials/nomad/security-enable-tls)** -
  Mutual TLS (mTLS) enables [mutual
  authentication](https://en.wikipedia.org/wiki/Mutual_authentication) with
  security properties to prevent the following problems:

  * Unauthorized access, because both servers and clients must provide valid TLS
    [X.509](https://en.wikipedia.org/wiki/X.509) certificates signed by the same
    valid [CA](https://en.wikipedia.org/wiki/Certificate_authority) in order to
    communicate within the cluster.

  * Observing or tampering with communication between nodes is thwarted because
    traffic is encrypted using the well-known network security protocol
    [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) version 1.2,
    with a [configurable minimum
    version](/docs/configuration/tls#tls_min_version).
    Both server and client agents must be configured to validate each other's
    certificates to ensure mTLS is actually enabled. This requires appropriate
    certificates to be distributed to servers, clients, machines, or operators
    for things like CLI usage. It is recommended to use
    [Vault](https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad)
    to securely manage the certificate creation and rotation for nodes.

  * Agent role misconfiguration is prevented using the X.509
    [SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) extension.
    This is essentially a domain name that is used to identify a node and
    verify that its region and role name are configured as expected (e.g.
    `client.us-east.nomad`).

  * Using the previously mentioned role name prevents maliciously masquerading
    as a server or client node, and allows other services to be signed easily by
    the same CA. This also avoids any potential pitfalls with certificates using
    the IP or hostname of nodes within a cluster.

- **[ACLs enabled](https://learn.hashicorp.com/collections/nomad/access-control)** - The
  access control list (ACL) system provides a capability-based control
  mechanism for Nomad administrators, allowing custom roles (typically
  within Vault) to be tied to an individual human or machine operator
  identity. This allows access to capabilities within the cluster to be
  restricted to specific users.

- **[Namespaces](https://learn.hashicorp.com/tutorials/nomad/namespaces)** -
  This feature allows a cluster to be shared by
  multiple teams within a company.
  Using this logical separation is important
  for multi-tenant clusters so that teams cannot interfere with jobs in
  namespaces they do not have access to. This requires ACLs to be enabled in
  order to be enforced.

- **[Sentinel Policies](https://learn.hashicorp.com/tutorials/nomad/sentinel)**
  (**Enterprise Only**) - [Sentinel](https://www.hashicorp.com/sentinel/) is
  a feature which enables
  [policy-as-code](https://docs.hashicorp.com/sentinel/concepts/policy-as-code)
  to enforce further restrictions on operators. This is used to augment the
  built-in ACL system for fine-grained control over jobs.

- **[Resource Quotas](https://learn.hashicorp.com/tutorials/nomad/quotas)**
  (**Enterprise Only**) - Can limit a namespace's access to the underlying
  compute resources in the cluster by setting upper limits for operators.
  Access to these resource quotas can be managed via ACLs to ensure read-only
  access for operators so they can't change their own quotas.

#### Recommendations

The following are security recommendations that can help significantly improve
the security of your cluster depending on your use case. We recommend always
practicing defense in depth when architecting the security mechanisms for your
environment.

- **Rotate credentials** - Using short-lived credentials or rotating them
  frequently is highly recommended to reduce the damage caused by accidentally
  leaked credentials.

  - Use [Vault](/docs/integrations/vault-integration) to create and manage
    dynamic, rotated credentials to prevent secrets from being easily exposed
    within the [job specification](/docs/job-specification) itself,
    which may be leaked into version control or otherwise be accidentally stored
    on disk on an operator's local machine.

  - Rotate credentials used by the Nomad agent; e.g.
    [integrate with Vault's
    PKI secret engine](https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad) to
    automatically generate and renew dynamic, unique X.509 certificates for each
    Nomad node with a short [TTL](https://en.wikipedia.org/wiki/Time_to_live).

- **[Running without Root](https://groups.google.com/forum/#!topic/nomad-tool/pSyMwC_FSFA)** -
  Nomad servers can be run as unprivileged users that only require access to
  the data directory.

- **Containers with Sandbox Runtimes** - In some situations, such as running
  untrusted code as a service, it may be worth considering using different
  container runtimes such as [gVisor](https://gvisor.dev/) or [Kata
  Containers](https://katacontainers.io/). These types of runtimes provide
  sandboxing features which help prevent raw access to the underlying shared
  kernel for other containers and the Nomad client agent itself. The Docker
  driver allows [customizing runtimes](/docs/drivers/docker#runtime).

- **[Disable Unused Drivers](/docs/configuration/client#driver-denylist)** -
  Each driver provides different degrees of isolation, and bugs may allow
  unintended privilege escalation. If a task driver is not needed, you can
  disable it to reduce risk.

- **Linux Security Modules** - Use security modules that integrate directly
  with the operating system, such as AppArmor, SELinux, and Seccomp, on
  both the Nomad hosts and the containers for an extra layer of
  security. Seccomp profiles can be passed directly to containers
  using the
  **[`security_opt`](/docs/drivers/docker#security_opt)**
  parameter available in the default [Docker
  driver](/docs/drivers/docker).

- **[Service Mesh](https://www.hashicorp.com/resources/service-mesh-microservices-networking)** -
  Integrating service mesh technologies such as
  **[Consul](https://www.consul.io/)** can be extremely useful for limiting
  and efficiently load balancing network connectivity within a cluster.

- **[TLS Settings](/docs/configuration/tls)** -
  TLS settings, such as the available [cipher suites](/docs/configuration/tls#tls_cipher_suites), should be tuned to fit the needs of your environment.

- **[HTTP Headers](/docs/configuration#http_api_response_headers)** -
  Additional security [headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers), such as [`X-XSS-Protection`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-XSS-Protection), can be [configured](/docs/configuration#http_api_response_headers) for HTTP API responses.

### Threat Model

The following are parts of the Nomad threat model:

- **Nomad agent-to-agent communication** - Transport encryption for
  agent-to-agent communication is required to prevent eavesdropping. TCP and
  UDP based protocols within Nomad provide different mechanisms for enabling
  encryption including symmetric (shared gossip encryption keys) and
  asymmetric keys (TLS).

- **Tampering of data in transit** - Any tampering should be detectable via mTLS
  and cause Nomad to avoid processing the request.

- **Access to data without authentication or authorization** - Requests to the
  server should be authenticated and authorized using mTLS and ACLs
  respectively.

- **State modification or corruption due to malicious messages** - Improperly
  formatted messages are discarded while properly formatted messages require
  authentication and authorization.

- **Non-server members accessing raw data** - All servers that join the cluster
  require proper authentication and authorization in order to begin
  participating in Raft. All data in Raft should be encrypted with TLS.

- **Denial of Service against a node** - DoS attacks against a single node
  should not compromise the security posture of Nomad.

The following are not part of the threat model for server agents:

- **Access (read or write) to the Nomad data directory** - Information about the
  jobs managed by Nomad is persisted to a server's data directory.

- **Access (read or write) to the Nomad configuration directory** - Access to
  Nomad's configuration file(s) directory can enable and disable features for
  a cluster.

- **Memory access to a running Nomad server agent** - Direct access to the
  memory of the Nomad server agent process (usually requiring a shell on the
  system through various means) results in almost all aspects of the agent
  being compromised including access to certificates and other secrets.

- **Existence of [Variables] metadata** - Access to Variables List APIs is
  controlled by ACL policies, but the existence of specific paths or metadata is
  not considered sensitive.

The following are not part of the threat model for client agents:

- **Access (read or write) to the Nomad data directory** - Information about the
  allocations scheduled to a Nomad client is persisted to its data directory.
  This would include any secrets in any of the allocation's file systems.

- **Access (read or write) to the Nomad configuration directory** - Access to a
  client's configuration file can enable and disable features for a client
  including insecure drivers such as
  [`raw_exec`](/docs/drivers/raw_exec).

- **Memory access to a running Nomad client agent** - Direct access to the
  memory of the Nomad client agent process allows an attacker to extract secrets
  from clients such as Vault tokens.

- **Lax Client Driver Sandbox** - Drivers may allow some privileged operations,
  e.g. filesystem access to configuration directories, or raw access to host
  devices. Such privileges can be used to facilitate the compromise of other
  workloads, or to cause denial-of-service attacks.

#### Internal Threats

- **Job Operator** - Someone with a valid mTLS certificate and ACL token may still be a
  threat to your cluster in certain situations, especially in multi-team
  cluster deployments. They may accidentally or intentionally use a malicious
  job to harm a cluster. Resource quotas, namespaces, and Sentinel policies
  can help protect against this.

- **Workload** - Workloads may have host network access within a cluster.
  Application security issues outside of the scope of Nomad, such as SSRF,
  can then lead to internal access within the cluster. Using mTLS, ACLs
  and Sentinel policies together can add layers of protection against
  malicious workloads.

- **RPC / API Access** - RPC and HTTP API endpoints without mTLS can expose
  clusters to abuse within the cluster from malicious workloads.

- **Client driver** - Drivers implement various workload types for a cluster,
  and the backend configuration of these drivers should be considered to
  implement defense in depth. For example, a custom Docker driver that limits
  the ability to mount the host file system may be subverted by network access
  to an exposed Docker daemon API through other means such as the [`raw_exec`](/docs/drivers/raw_exec)
  driver.
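
Several of the internal threats above are reduced by enabling mTLS and ACLs in the agent configuration. The following is a minimal sketch of the relevant `tls` and `acl` blocks, not a complete or recommended configuration. The certificate paths are placeholders chosen for illustration, and a real deployment would adapt the settings to its own environment.

```hcl
# Sketch of an agent configuration fragment. File paths are hypothetical
# placeholders, not defaults shipped with Nomad.
tls {
  # Require TLS on both the HTTP API and the RPC layer.
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  # Check that the peer certificate carries the expected role and region
  # name (for example server.<region>.nomad) rather than any valid cert.
  verify_server_hostname = true

  # Require HTTP API clients to present a certificate signed by the CA.
  verify_https_client = true
}

acl {
  # Turn on the ACL system so that tokens and policies are enforced.
  enabled = true
}
```

With `verify_https_client` enabled, operators using the CLI or API also need a client certificate and key in addition to an ACL token.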

#### External Threats

There are two main components to consider for external threats in a Nomad cluster:

- **Server agent** - Internal cluster leader elections and replication are
  managed via Raft between server agents, with traffic encrypted in transit.
  However, information about the server is stored unencrypted at rest in the
  agent's data directory. This may contain sensitive information such as ACL
  tokens and TLS certificates.

- **Client agent** - Client-to-server communication within a cluster is
  encrypted and authenticated using mTLS. Information about the allocations on
  a client node is stored unencrypted in the agent's data and configuration
  directories.

### Network Ports

| **Port / Protocol**  | Agents  | Description                                                                                                                                                                          |
| -------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **4646** / TCP       | All     | [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) to provide [UI](https://learn.hashicorp.com/tutorials/nomad/web-ui-access) and [API](/api-docs) access to agents. |
| **4647** / TCP       | All     | [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call) protocol used by agents.                                                                                                  |
| **4648** / TCP + UDP | Servers | [gossip](/docs/concepts/gossip) protocol to manage server membership using [Serf](https://www.serf.io/).                                                                             |

[Variables]: /docs/concepts/variables
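
The ports listed in the Network Ports table correspond to the `ports` block of the agent configuration. The fragment below is a sketch that simply restates those values. It is shown only to indicate where the ports would be changed if firewall rules or co-located services require different ones.

```hcl
# Sketch: restates the ports from the Network Ports table.
ports {
  http = 4646 # UI and HTTP API (TCP)
  rpc  = 4647 # agent RPC (TCP)
  serf = 4648 # server gossip via Serf (TCP + UDP)
}
```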