---
layout: docs
page_title: Security Model
description: >-
  Nomad relies on both a lightweight gossip mechanism and an RPC system to
  provide various features. Both of the systems have different security
  mechanisms that stem from their designs. However, the security mechanisms of
  Nomad have a common goal: to provide confidentiality, integrity, and
  authentication.
---

## Overview

Nomad is a flexible workload orchestrator to deploy and manage any containerized
or legacy application using a single, unified workflow. It can run diverse
workloads including Docker, non-containerized, microservice, and batch
applications.

Nomad uses a lightweight gossip and RPC system, [similar to
Consul](https://developer.hashicorp.com/consul/docs/concepts/security), which provides
various essential features. Both of these systems provide security mechanisms
that should be used to help provide [confidentiality, integrity, and
authentication](https://en.wikipedia.org/wiki/Information_security).

Using defense in depth is crucial for cluster security, and deployment
requirements may differ drastically depending on your use case. Further security
features for multi-tenant deployments are offered exclusively in the enterprise
version. This documentation may need to be adapted to your deployment situation,
but the general mechanisms for a secure Nomad deployment revolve around:

- **[mTLS](https://learn.hashicorp.com/tutorials/nomad/security-enable-tls)** -
  Mutual authentication of both the TLS server and client X.509 certificates
  guards against internal abuse by blocking unauthenticated access to network
  components within the cluster.

- **[ACLs](https://learn.hashicorp.com/collections/nomad/access-control)** - Enable
  authorization for authenticated connections by granting capabilities to ACL
  tokens.

- **[Namespaces](https://learn.hashicorp.com/tutorials/nomad/namespaces)** -
  Access to read and write to a namespace can be controlled to allow for
  granular access to job information managed within a multi-tenant cluster.

- **[Sentinel Policies](https://learn.hashicorp.com/tutorials/nomad/sentinel)**
  (**Enterprise Only**) - Sentinel policies allow for granular control over
  components such as task drivers within a cluster.

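As a rough illustration of how ACLs and namespaces combine, the following is a minimal sketch of an ACL policy that grants a token limited capabilities in one namespace. The namespace name `web-team` is a hypothetical placeholder, and the capability list is only one plausible selection; adapt both to your own cluster.

```hcl
# Hypothetical policy for operators of the "web-team" namespace.
# The namespace name is an assumption made for illustration.
namespace "web-team" {
  # Grant individual capabilities rather than a coarse "write" policy.
  capabilities = ["list-jobs", "read-job", "submit-job", "read-logs"]
}

# Explicitly deny access to the default namespace.
namespace "default" {
  policy = "deny"
}
```

A policy like this only takes effect once ACLs are enabled on the agents and the policy is attached to a token.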
### Personas

When thinking about Nomad, it helps to consider the following base personas
when managing the security requirements for the cluster deployment. The
granularity may change depending on your team's use case; where needed,
rigorous roles can be defined and managed using the [Nomad secrets engine for
Vault](https://www.vaultproject.io/docs/secrets/nomad). The [access control
tutorials](https://learn.hashicorp.com/collections/nomad/access-control)
describe this further, with getting-started steps that use a development
server.

It's important to note that there's no traditional concept of a user
within Nomad itself.

- **System Administrator** - This is someone who has access to the underlying
  infrastructure of a Nomad cluster. Often they can SSH or RDP directly into
  a server within a cluster through a bastion host. Ultimately they have
  read, write, and execute permissions for the actual Nomad binary. This
  binary is the same for server and client nodes, which differ only in their
  configuration files. These users potentially have something like sudo,
  administrative, or some other super-user access to the underlying compute
  resource. Users like these are essentially totally trusted by Nomad as they
  have administrative rights to the system and can start or stop the agent.

- **Nomad Administrator** - This is someone (probably the same **System
  Administrator**) who has access to define the Nomad agent configurations
  for servers and clients, and/or has a Nomad management ACL token. They also
  have total rights to all parts of the Nomad system, including the ability
  to start and stop all jobs within a cluster.

- **Nomad Operator** - This is someone who likely has selective access with
  restricted capabilities to manage jobs applicable to their namespace within
  a cluster.

- **User** - This is someone who is a user of an application being run on the
  system. In some cases applications may be public facing and exposed to the
  internet, such as a web server. This is someone who shouldn't have any
  network access to the Nomad server API.

### Secure Configuration

Nomad's security model is applicable only if all parts of the system are running
with a secure configuration; **Nomad is not secure-by-default.** Without the following
mechanisms enabled in Nomad's configuration, it may be possible to abuse access
to a cluster. As with all security considerations, you must determine which
concerns apply to your environment and adapt these security recommendations
accordingly.

#### Requirements

- **[mTLS enabled](https://learn.hashicorp.com/tutorials/nomad/security-enable-tls)** -
  Mutual TLS (mTLS) enables [mutual
  authentication](https://en.wikipedia.org/wiki/Mutual_authentication) with
  security properties to prevent the following problems:

  - Unauthorized access is prevented because both servers and clients must
    provide valid TLS [X.509](https://en.wikipedia.org/wiki/X.509) certificates
    signed by the same valid
    [CA](https://en.wikipedia.org/wiki/Certificate_authority) in order to
    communicate within the cluster.

  - Observing or tampering with communication between nodes is thwarted because
    the traffic is encrypted using the well-known network security protocol
    [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) version 1.2,
    with a [configurable minimum
    version](/docs/configuration/tls#tls_min_version).
    Both server and client agents must be configured to validate each other's
    certificates to ensure mTLS is actually enabled. This requires appropriate
    certificates to be distributed to servers, clients, machines, or operators
    for things like CLI usage. It is recommended to use
    [Vault](https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad)
    to securely manage the certificate creation and rotation for nodes.

  - Agent role misconfiguration is prevented using the X.509
    [SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) extension.
    This is essentially a domain name used to verify that a node's region and
    role name are configured as expected (e.g. `client.us-east.nomad`).

  - Using the previously mentioned role name prevents maliciously masquerading
    as a server or client node, and allows other services to be signed easily by
    the same CA. This also avoids any potential pitfalls with certificates using
    the IP or hostname of nodes within a cluster.

- **[ACLs enabled](https://learn.hashicorp.com/collections/nomad/access-control)** - The
  access control list (ACL) system provides a capability-based control
  mechanism for Nomad administrators, allowing custom roles (typically
  within Vault) to be tied to an individual human or machine operator
  identity. This allows access to capabilities within the cluster to be
  restricted to specific users.

- **[Namespaces](https://learn.hashicorp.com/tutorials/nomad/namespaces)** -
  This feature allows a cluster to be shared by multiple teams within a
  company. Using this logical separation is important for multi-tenant
  clusters to prevent users from interfering with namespaces they do not have
  access to. Namespace access is enforced only when ACLs are enabled.

- **[Sentinel Policies](https://learn.hashicorp.com/tutorials/nomad/sentinel)**
  (**Enterprise Only**) - [Sentinel](https://www.hashicorp.com/sentinel/) is
  a feature which enables
  [policy-as-code](https://docs.hashicorp.com/sentinel/concepts/policy-as-code)
  to enforce further restrictions on operators. This is used to augment the
  built-in ACL system for fine-grained control over jobs.

- **[Resource Quotas](https://learn.hashicorp.com/tutorials/nomad/quotas)**
  (**Enterprise Only**) - Resource quotas can limit a namespace's access to
  the underlying compute resources in the cluster by setting upper limits for
  operators. Access to these resource quotas can be managed via ACLs to ensure
  read-only access for operators so that they cannot change their own quotas.

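The mTLS and ACL requirements above could be expressed in an agent configuration roughly as follows. This is a minimal sketch, not a complete or authoritative configuration: all file paths are hypothetical placeholders, and certificates must already have been issued by your CA.

```hcl
# Agent configuration sketch. All file paths below are hypothetical.
tls {
  # Encrypt and authenticate both the HTTP API and the RPC layer.
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  # Verify that servers present a certificate matching the expected role
  # and region name (for example server.<region>.nomad).
  verify_server_hostname = true

  # Require HTTP API clients to present a certificate signed by the CA.
  verify_https_client = true
}

acl {
  # ACLs stay disabled unless explicitly enabled.
  enabled = true
}
```

Client agents would use the same structure with their own client certificate and key.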
#### Recommendations

The following are security recommendations that can help significantly improve
the security of your cluster depending on your use case. We recommend always
practicing defense in depth when architecting the security mechanisms for your
environment.

- **Rotate credentials** - Using short-lived credentials or rotating them
  frequently is highly recommended to limit the damage from accidentally
  leaked credentials.

  - Use [Vault](/docs/integrations/vault-integration) to create and manage
    dynamic, rotated credentials. This prevents secrets from being easily
    exposed within the [job specification](/docs/job-specification) itself,
    which may be leaked into version control or otherwise be accidentally
    stored on disk on an operator's local machine.

  - Rotate credentials used by the Nomad agent; e.g. [integrate with Vault's
    PKI secret engine](https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad) to
    automatically generate and renew dynamic, unique X.509 certificates for each
    Nomad node with a short [TTL](https://en.wikipedia.org/wiki/Time_to_live).

- **[Running without Root](https://groups.google.com/forum/#!topic/nomad-tool/pSyMwC_FSFA)** -
  Nomad servers can be run as unprivileged users that only require access to
  the data directory.

- **Containers with Sandbox Runtimes** - In some situations, such as running
  untrusted code as a service, it may be worth considering using different
  container runtimes such as [gVisor](https://gvisor.dev/) or [Kata
  Containers](https://katacontainers.io/). These types of runtimes provide
  sandboxing features which help prevent raw access to the underlying shared
  kernel for other containers and the Nomad client agent itself. The Docker
  driver allows [customizing runtimes](/docs/drivers/docker#runtime).

- **[Disable Unused Drivers](/docs/configuration/client#driver-denylist)** -
  Each driver provides different degrees of isolation, and bugs may allow
  unintended privilege escalation. If a task driver is not needed, you can
  disable it to reduce risk.

- **Linux Security Modules** - Use security modules that integrate directly
  with the operating system, such as AppArmor, SELinux, and seccomp, both on
  the Nomad hosts and on containers for an extra layer of security. Seccomp
  profiles can be passed directly to containers using the
  **[`security_opt`](/docs/drivers/docker#security_opt)**
  parameter available in the default [Docker
  driver](/docs/drivers/docker).

- **[Service Mesh](https://www.hashicorp.com/resources/service-mesh-microservices-networking)** -
  Integrating service mesh technologies such as
  **[Consul](https://www.consul.io/)** can be extremely useful for limiting
  and efficiently load balancing network connectivity within a cluster.

- **[TLS Settings](/docs/configuration/tls)** -
  TLS settings, such as the available [cipher suites](/docs/configuration/tls#tls_cipher_suites), should be tuned to fit the needs of your environment.

- **[HTTP Headers](/docs/configuration#http_api_response_headers)** -
  Additional security [headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers), such as [`X-XSS-Protection`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-XSS-Protection), can be [configured](/docs/configuration#http_api_response_headers) for HTTP API responses.

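Several of the recommendations above map onto agent configuration options. The fragment below is a hedged sketch of how they might look together. The specific values (the denied drivers, the header value, the minimum TLS version) are illustrative assumptions rather than recommended settings, and the exact option syntax should be checked against the configuration reference for your Nomad version.

```hcl
# Hardening sketch for a client agent. Values are illustrative assumptions.
tls {
  # Added to the existing tls block: refuse protocol versions below TLS 1.2.
  tls_min_version = "tls12"
}

client {
  enabled = true

  options = {
    # Disable task drivers this cluster does not use (example choice).
    "driver.denylist" = "qemu,java"
  }
}

# Extra headers attached to HTTP API responses.
http_api_response_headers {
  "X-XSS-Protection" = "1; mode=block"
}
```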
### Threat Model

The following are parts of the Nomad threat model:

- **Nomad agent-to-agent communication** - Transport encryption for
  agent-to-agent communication is required to prevent eavesdropping. TCP and
  UDP based protocols within Nomad provide different mechanisms for enabling
  encryption, including symmetric keys (shared gossip encryption keys) and
  asymmetric keys (TLS).

- **Tampering of data in transit** - Any tampering should be detectable via mTLS
  and cause Nomad to avoid processing the request.

- **Access to data without authentication or authorization** - Requests to the
  server should be authenticated and authorized using mTLS and ACLs
  respectively.

- **State modification or corruption due to malicious messages** - Improperly
  formatted messages are discarded while properly formatted messages require
  authentication and authorization.

- **Non-server members accessing raw data** - All servers that join the cluster
  require proper authentication and authorization in order to begin
  participating in Raft. All Raft data in transit should be encrypted with TLS.

- **Denial of Service against a node** - DoS attacks against a single node
  should not compromise the security posture of Nomad.

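TLS covers the RPC and HTTP traffic, while the gossip protocol between servers relies on a shared symmetric key. A minimal sketch of enabling gossip encryption on a server agent is shown below. The key value is a placeholder, not a usable key, and every server in the cluster would need the same key.

```hcl
server {
  enabled = true

  # Placeholder only: substitute a base64-encoded key generated for your
  # cluster, and keep it out of version control.
  encrypt = "<base64-encoded gossip key>"
}
```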
The following are not part of the threat model for server agents:

- **Access (read or write) to the Nomad data directory** - Information about the
  jobs managed by Nomad is persisted to a server's data directory.

- **Access (read or write) to the Nomad configuration directory** - Access to
  Nomad's configuration file(s) directory can enable and disable features for
  a cluster.

- **Memory access to a running Nomad server agent** - Direct access to the
  memory of the Nomad server agent process (usually requiring a shell on the
  system through various means) results in almost all aspects of the agent
  being compromised including access to certificates and other secrets.

- **Existence of [Variables] metadata** - Access to Variables List APIs is
  controlled by ACL policies, but the existence of specific paths or metadata is
  not considered sensitive.

The following are not part of the threat model for client agents:

- **Access (read or write) to the Nomad data directory** - Information about the
  allocations scheduled to a Nomad client is persisted to its data directory.
  This would include any secrets in any of the allocation's file systems.

- **Access (read or write) to the Nomad configuration directory** - Access to a
  client's configuration file can enable and disable features for a client
  including insecure drivers such as
  [`raw_exec`](/docs/drivers/raw_exec).

- **Memory access to a running Nomad client agent** - Direct access to the
  memory of the Nomad client agent process allows an attacker to extract
  secrets from clients such as Vault tokens.

- **Lax Client Driver Sandbox** - Drivers may allow some privileged operations,
  e.g. filesystem access to configuration directories, or raw access to host
  devices. Such privileges can be used to compromise other workloads or to
  cause denial-of-service attacks.

#### Internal Threats

- **Job Operator** - Someone with a valid mTLS certificate and ACL token may still be a
  threat to your cluster in certain situations, especially in multi-team
  cluster deployments. They may accidentally or intentionally use a malicious
  job to harm a cluster. Quotas, namespaces, and Sentinel policies can help
  protect against this.

- **Workload** - Workloads may have host network access within a cluster.
  Application security issues outside the scope of Nomad can then lead to
  SSRF, which may give an attacker internal access within the cluster. Using
  mTLS, ACLs, and Sentinel policies together can add layers of protection
  against malicious workloads.

- **RPC / API Access** - RPC and HTTP API endpoints without mTLS can expose
  clusters to abuse within the cluster from malicious workloads.

- **Client driver** - Drivers implement various workload types for a cluster,
  and the backend configuration of these drivers should be considered to
  implement defense in depth. For example, a custom Docker driver that limits
  the ability to mount the host file system may be subverted by network access
  to an exposed Docker daemon API through other means such as the [`raw_exec`](/docs/drivers/raw_exec)
  driver.

#### External Threats

There are two main components to consider for external threats in a Nomad cluster:

- **Server agent** - Internal cluster leader elections and replication are
  managed via Raft between server agents and are encrypted in transit.
  However, information about the server is stored unencrypted at rest in the
  agent's data directory. This may contain sensitive information such as ACL
  tokens and TLS certificates.

- **Client agent** - Client-to-server communication within a cluster is
  encrypted and authenticated using mTLS. Information about the allocations on
  a client node is unencrypted in the agent's data and configuration
  directory.

### Network Ports

| **Port / Protocol**  | Agents  | Description |
| -------------------- | ------- | ----------- |
| **4646** / TCP       | All     | [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) to provide [UI](https://learn.hashicorp.com/tutorials/nomad/web-ui-access) and [API](/api-docs) access to agents. |
| **4647** / TCP       | All     | [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call) protocol used by agents. |
| **4648** / TCP + UDP | Servers | [gossip](/docs/concepts/gossip) protocol to manage server membership using [Serf](https://www.serf.io/). |

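The ports in the table correspond to the agent's `ports` configuration block. The sketch below simply restates the values from the table and is useful mainly as a reference point when writing firewall rules or moving a service to a non-default port.

```hcl
# Values shown match the ports listed in the table above.
ports {
  http = 4646 # UI and HTTP API
  rpc  = 4647 # agent RPC
  serf = 4648 # server gossip (TCP and UDP)
}
```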

[Variables]: /docs/concepts/variables