
---
layout: docs
page_title: Requirements
description: |-
  Learn about Nomad client and server requirements such as memory and CPU
  recommendations, network topologies, and more.
---

# Requirements

## Resources (RAM, CPU, etc.)

**Nomad servers** may need to be run on large machine instances. We suggest
having between 4-8+ cores, 16-32 GB+ of memory, 40-80 GB+ of **fast** disk, and
significant network bandwidth. The core count and network recommendations
ensure high throughput, as Nomad relies heavily on network communication, with
the servers managing all the nodes in the region and performing scheduling.
The memory and disk requirements stem from the fact that Nomad stores all
state in memory and stores two snapshots of this data on disk, which causes
high I/O in busy clusters with many writes. Disk should therefore be at least
two times the memory available to the server when deploying a high-load
cluster. When running on AWS, prefer NVMe or Provisioned IOPS SSD storage for
the data directory.

These recommendations are guidelines, and operators should always monitor
Nomad's resource usage to determine whether the machines are under- or
over-sized.

**Nomad clients** support reserving resources on the node that should not be
used by Nomad. This should be used to target a specific resource utilization
per node and to reserve resources for applications running outside of Nomad's
supervision, such as Consul and the operating system itself.

Please see the [reservation configuration](/docs/configuration/client#reserved) for
more detail.

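For illustration, a client configuration that reserves resources for the
operating system and other unmanaged processes might look like the following
sketch (the values are placeholders, not recommendations):

```hcl
client {
  enabled = true

  # Resources set aside for processes outside Nomad's supervision,
  # such as Consul and the operating system itself.
  reserved {
    cpu            = 500  # MHz
    memory         = 512  # MB
    disk           = 1024 # MB
    reserved_ports = "22" # ports Nomad may not allocate to tasks
  }
}
```
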
## Network Topology

**Nomad servers** are expected to have sub-10 millisecond network latencies
between each other to ensure liveness and high-throughput scheduling. Nomad
servers can be spread across multiple datacenters if they have low-latency
connections between them to achieve high availability.

For example, on AWS every region comprises multiple zones which have very low
latency links between them, so every zone can be modeled as a Nomad datacenter
and every zone can have a single Nomad server which could be connected to form
a quorum and a region.

Nomad servers use Raft for state replication, and since Raft is highly
consistent it needs a quorum of servers to function. We therefore recommend
running an odd number of Nomad servers in a region, usually 3-5. A cluster of
three servers can withstand the failure of one server, and a cluster of five
servers can withstand two failures. Adding more servers to the quorum adds
more time to replicate state and hence decreases throughput, so we don't
recommend having more than seven servers in a region.

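As a rule of thumb, a cluster of N servers needs a quorum of (N / 2) + 1
servers (rounding the division down) to commit writes, which gives the
following failure tolerances for common deployment sizes:

| Servers | Quorum size | Failure tolerance |
| ------- | ----------- | ----------------- |
| 3       | 2           | 1                 |
| 5       | 3           | 2                 |
| 7       | 4           | 3                 |
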
**Nomad clients** do not have the same latency requirements as servers since
they are not participating in Raft. Thus clients can have 100+ millisecond
latency to their servers. This allows having a set of Nomad servers that
service clients spread geographically over a continent, or even the world in
the case of a single "global" region with many datacenters.

## Ports Used

Nomad requires 3 different ports to work properly on servers and 2 on clients,
some on TCP, UDP, or both protocols. Below we document the requirements for
each port. If you use a firewall of any type, you must ensure that it is
configured to allow the following traffic (a sample firewall sketch follows
the list).

- HTTP API (Default 4646). This is used by clients and servers to serve the
  HTTP API. TCP only.

- RPC (Default 4647). This is used for internal RPC communication between
  client agents and servers, and for inter-server traffic. TCP only.

- Serf WAN (Default 4648). This is used by servers to gossip both over the LAN
  and WAN to other servers. It isn't required that Nomad clients can reach
  this address. TCP and UDP.

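For illustration only, the following `firewalld` commands open these ports on
a server node; adapt them to your firewall and topology (clients do not need
port 4648):

```shell-session
$ # Open Nomad's default ports on a server node (firewalld example).
$ sudo firewall-cmd --permanent --add-port=4646/tcp   # HTTP API
$ sudo firewall-cmd --permanent --add-port=4647/tcp   # RPC
$ sudo firewall-cmd --permanent --add-port=4648/tcp   # Serf gossip
$ sudo firewall-cmd --permanent --add-port=4648/udp   # Serf gossip
$ sudo firewall-cmd --reload
```
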
When tasks ask for dynamic ports, they are allocated out of the port range
between 20,000 and 32,000. This is well under the ephemeral port range
suggested by the [IANA](https://en.wikipedia.org/wiki/Ephemeral_port). If your
operating system's default ephemeral port range overlaps with Nomad's dynamic
port range, you should tune the OS to avoid this overlap.

On Linux this can be checked and set as follows:

```shell-session
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999
$ echo "49152 65535" > /proc/sys/net/ipv4/ip_local_port_range
```

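Writes to `/proc` do not survive a reboot. To persist the setting, a sysctl
drop-in can be used; this is a sketch, and the file name is only an example:

```shell-session
$ # Persist the ephemeral port range across reboots.
$ echo "net.ipv4.ip_local_port_range = 49152 65535" | sudo tee /etc/sysctl.d/90-ephemeral-ports.conf
$ sudo sysctl --system
```
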
## Bridge Networking and `iptables`

Nomad's task group networks and Consul Connect integration use bridge
networking and iptables to send traffic between containers. The Linux kernel
bridge module has three "tunables" that control whether traffic crossing the
bridge is processed by iptables. Some operating systems (Red Hat, CentOS, and
Fedora in particular) configure these tunables to optimize for VM workloads
where iptables rules might not be correctly configured for guest traffic.

These tunables can be set to allow iptables processing for the bridge network
as follows:

```shell-session
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-arptables
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
```

To preserve these settings on startup of a client node, add a file including
the following to `/etc/sysctl.d/`, or remove the file your Linux distribution
puts in that directory.

```text
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
```

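These tunables only exist while the kernel's `br_netfilter` module is loaded.
On distributions that do not load it at boot, you may need to load it before
the settings above can apply; this is a sketch, and the file name is an
example:

```shell-session
$ # Load the br_netfilter module now and on every boot.
$ sudo modprobe br_netfilter
$ echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
```
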
## Hardening Nomad

As noted in the [Security Model][] guide, Nomad is not **secure-by-default**.

### User Permissions

Nomad servers and Nomad clients have different requirements for permissions.

Nomad servers should be run with the lowest possible permissions. They need
access to their own data directory and the ability to bind to their ports. You
should create a `nomad` user with the minimal set of required privileges. If
you are installing Nomad from the official Linux packages, the systemd unit
file runs Nomad as `root`. For your server nodes you should change this to a
minimally privileged `nomad` user. See the [production deployment guide][] for
details.

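A minimal sketch of that change, assuming the official package layout of
`/etc/nomad.d` for configuration and `/opt/nomad` for data; `systemctl edit`
opens a drop-in file where you can set `User=nomad` and `Group=nomad` under
`[Service]`:

```shell-session
$ # Create a minimally privileged system user for the server agent.
$ sudo useradd --system --home /etc/nomad.d --shell /bin/false nomad
$ sudo chown -R nomad:nomad /opt/nomad
$ # Add User=nomad and Group=nomad under [Service] in the drop-in editor.
$ sudo systemctl edit nomad
```
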
Nomad clients must be run as `root` due to the OS isolation mechanisms that
require root privileges (see also [Linux Capabilities][] below). The Nomad
client's data directory should be owned by `root` with filesystem permissions
set to `0700`.

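For example, assuming a client data directory of `/opt/nomad` (adjust to your
`data_dir` setting):

```shell-session
$ # Restrict the client data directory to root only.
$ sudo chown root:root /opt/nomad
$ sudo chmod 0700 /opt/nomad
```
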
### Linux Capabilities

On Linux, Nomad clients require privileged capabilities for isolating tasks.
Nomad clients require `CAP_SYS_ADMIN` for creating the tmpfs used for secrets,
bind-mounting task directories, mounting volumes, and running some task driver
plugins. Nomad clients require `CAP_NET_ADMIN` for a variety of tasks to set
up networking. You should run Nomad clients as `root`, but running as `root`
does not grant these required capabilities if Nomad is running in a user
namespace. Running Nomad clients inside a user namespace is unsupported. See
the [`capabilities(7)`][] man page for details on Linux capabilities.

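To inspect the capability sets of a running client agent, you can read its
`/proc` status (a quick sanity check, assuming the agent process is named
`nomad`):

```shell-session
$ # Show the capability sets (CapInh, CapPrm, CapEff, CapBnd, CapAmb)
$ # of the running Nomad agent.
$ grep Cap /proc/$(pgrep -x nomad)/status
```
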
In order to run a task, Nomad clients perform privileged operations normally
reserved to the `root` user:

* Mounting tmpfs file systems for the task `/secrets` directory.
* Creating the network bridge for `bridge` networking.
* Allowing inbound and outbound network traffic to the workload (typically via
  `iptables`).
* Starting tasks as a specific `user`.
* Setting the owner of `template` outputs.

On Linux this set of requirements expands to:

* Configuring resource isolation via cgroups.
* Configuring namespace isolation: `mount`, `user`, `pid`, `ipc`, and `network`
  namespaces.

Nomad task drivers that support bind-mounting volumes also need to run as
`root` to do so. This includes the built-in `exec` and `java` task drivers.
The built-in task drivers run in the same process as the Nomad client, so this
requires that the Nomad client agent is also running as `root`.

### Rootless Nomad Clients

Although it's possible to run a Nomad client agent as a non-root user or as
`root` in a user namespace, to perform the privileged operations described
above you also need to grant the client agent the `CAP_SYS_ADMIN` and
`CAP_NET_ADMIN` capabilities. Note that these capabilities are nearly
functionally equivalent to running as `root`, and that a process running with
`CAP_SYS_ADMIN` can almost always escalate itself to "true" (unnamespaced)
`root`.

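One way to grant these capabilities is with file capabilities on the Nomad
binary. This is only an illustration (the install path is an assumption) and
carries the same risks described above:

```shell-session
$ # Grant CAP_SYS_ADMIN and CAP_NET_ADMIN (effective + permitted) to the binary.
$ sudo setcap cap_sys_admin,cap_net_admin+ep /usr/local/bin/nomad
$ # Verify the file capabilities.
$ getcap /usr/local/bin/nomad
```
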
Some task drivers delegate many of their privileged operations to an external
process such as `dockerd` or `podman`. If you don't need `bridge` networking
and are using these task drivers or custom task drivers, you may be able to
run Nomad client agents as a non-root user with the following additional
configuration (two quick host checks are sketched after the list):

* Delegated cgroups: safely setting cgroups as an unprivileged user requires
  cgroups v2.
* User namespaces: on some distros this may require setting sysctls like
  `kernel.unprivileged_userns_clone=1`.
* The task driver engine (ex. `dockerd`, `podman`, `containerd`, etc.) must be
  configured for rootless operation. This requires cgroups v2, user
  namespaces, and typically either a patched kernel or kernel module (ex.
  `overlay.ko`) allowing an unprivileged [overlay filesystem][] or a FUSE
  overlay filesystem.

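The sketch below checks two of these prerequisites; note that the
`kernel.unprivileged_userns_clone` sysctl exists only on some distributions:

```shell-session
$ # Prints "cgroup2fs" when the host is running cgroups v2.
$ stat -fc %T /sys/fs/cgroup/
$ # Present on some Debian-derived kernels only; absent elsewhere.
$ sysctl kernel.unprivileged_userns_clone
```
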
This is not a supported or well-tested configuration. See [GH-13669][] for
further discussion and to provide feedback on your experiences trying to run
rootless Nomad clients.

[Security Model]: /docs/concepts/security
[production deployment guide]: https://developer.hashicorp.com/nomad/tutorials/enterprise/production-deployment-guide-vm-with-consul#configure-systemd
[linux capabilities]: #linux-capabilities
[`capabilities(7)`]: https://man7.org/linux/man-pages/man7/capabilities.7.html
[overlay filesystem]: https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html
[GH-13669]: https://github.com/hashicorp/nomad/issues/13669