github.com/Ilhicas/nomad@v1.0.4-0.20210304152020-e86851182bc3/website/content/docs/configuration/server.mdx

---
layout: docs
page_title: server Stanza - Agent Configuration
sidebar_title: server
description: |-
  The "server" stanza configures the Nomad agent to operate in server mode to
  participate in scheduling decisions, register with service discovery, handle
  join failures, and more.
---

# `server` Stanza

<Placement groups={['server']} />

The `server` stanza configures the Nomad agent to operate in server mode to
participate in scheduling decisions, register with service discovery, handle
join failures, and more.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3
  server_join {
    retry_join     = [ "1.1.1.1", "2.2.2.2" ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```

## `server` Parameters

- `authoritative_region` `(string: "")` - Specifies the authoritative region, which
  provides a single source of truth for global configurations such as ACL Policies and
  global ACL tokens. Non-authoritative regions will replicate from the authoritative
  to act as a mirror. By default, the local region is assumed to be authoritative.

- `bootstrap_expect` `(int: required)` - Specifies the number of server nodes to
  wait for before bootstrapping. It is most common to use the odd-numbered
  integers `3` or `5` for this value, depending on the cluster size. A value of
  `1` does not provide any fault tolerance and is not recommended for production
  use cases.

- `data_dir` `(string: "[data_dir]/server")` - Specifies the directory to use
  for server-specific data, including the replicated log. By default, this is
  the top-level [data_dir](/docs/configuration#data_dir)
  suffixed with "server", like `"/opt/nomad/server"`. This must be an absolute
  path.

- `enabled` `(bool: false)` - Specifies if this agent should run in server mode.
  All other server options depend on this value being set.

- `enabled_schedulers` `(array<string>: [all])` - Specifies which sub-schedulers
  this server will handle. This can be used to restrict the evaluations that
  worker threads will dequeue for processing.

- `enable_event_broker` `(bool: true)` - Specifies if this server will generate
  events for its event stream.

- `encrypt` `(string: "")` - Specifies the secret key to use for encryption of
  Nomad server's gossip network traffic. This key must be 32 bytes that are
  [RFC4648] "URL and filename safe" base64-encoded. You can generate an
  appropriately-formatted key with the [`nomad operator keygen`] command. The
  provided key is automatically persisted to the data directory and loaded
  automatically whenever the agent is restarted. This means that to encrypt
  Nomad server's gossip protocol, this option only needs to be provided once
  on each agent's initial startup sequence. If it is provided after Nomad has
  been initialized with an encryption key, then the provided key is ignored
  and a warning will be displayed. See the [encryption
  documentation][encryption] for more details on this option and its impact on
  the cluster.

- `event_buffer_size` `(int: 100)` - Specifies the number of events generated
  by the server to be held in memory. Increasing this value enables new
  subscribers to have a larger look back window when initially subscribing.
  Decreasing will lower the amount of memory used for the event buffer.

- `node_gc_threshold` `(string: "24h")` - Specifies how long a node must be in a
  terminal state before it is garbage collected and purged from the system. This
  is specified using a label suffix like "30s" or "1h".

- `job_gc_interval` `(string: "5m")` - Specifies the interval between job
  garbage collections. Only jobs that have been terminal for at least
  `job_gc_threshold` will be collected. Lowering the interval will perform more
  frequent but smaller collections. Raising the interval will perform collections
  less frequently but collect more jobs at a time. Reducing this interval is
  useful if there is a large throughput of tasks, leading to a large set of
  dead jobs. This is specified using a label suffix like "30s" or "3m". `job_gc_interval`
  was introduced in Nomad 0.10.0.

- `job_gc_threshold` `(string: "4h")` - Specifies the minimum time a job must be
  in the terminal state before it is eligible for garbage collection. This is
  specified using a label suffix like "30s" or "1h".

- `eval_gc_threshold` `(string: "1h")` - Specifies the minimum time an
  evaluation must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h".

- `deployment_gc_threshold` `(string: "1h")` - Specifies the minimum time a
  deployment must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h".

- `csi_volume_claim_gc_threshold` `(string: "1h")` - Specifies the minimum age of
  a CSI volume before it is eligible to have its claims garbage collected.
  This is specified using a label suffix like "30s" or "1h".

- `csi_plugin_gc_threshold` `(string: "1h")` - Specifies the minimum age of a
  CSI plugin before it is eligible for garbage collection if not in use.
  This is specified using a label suffix like "30s" or "1h".

- `default_scheduler_config` <code>([scheduler_configuration][update-scheduler-config]:
  nil)</code> - Specifies the initial default scheduler config when
  bootstrapping the cluster. The parameter is ignored once the cluster is bootstrapped or
  the value is updated through the [API endpoint][update-scheduler-config]. See [the
  example section](#configuring-scheduler-config) for more details.
  `default_scheduler_config` was introduced in Nomad 0.10.4.

- `heartbeat_grace` `(string: "10s")` - Specifies the additional time given as a
  grace period beyond the heartbeat TTL of nodes to account for network and
  processing delays as well as clock skew. This is specified using a label
  suffix like "30s" or "1h".

- `min_heartbeat_ttl` `(string: "10s")` - Specifies the minimum time between
  node heartbeats. This is used as a floor to prevent excessive updates. This is
  specified using a label suffix like "30s" or "1h". Lowering the minimum TTL is
  a tradeoff: it shortens the failure detection time of nodes at the cost of
  more false positives and increased load on the leader.

- `max_heartbeats_per_second` `(float: 50.0)` - Specifies the maximum target
  rate of heartbeats being processed per second. This allows the TTL to be
  increased to meet the target rate. Increasing the maximum heartbeats per
  second is a tradeoff: it shortens the failure detection time of nodes at the
  cost of more false positives and increased load on the leader.

- `non_voting_server` `(bool: false)` - (Enterprise-only) Specifies whether
  this server will act as a non-voting member of the cluster to help provide
  read scalability.

- `num_schedulers` `(int: [num-cores])` - Specifies the number of parallel
  scheduler threads to run. This can be as many as one per core, or `0` to
  disallow this server from making any scheduling decisions. This defaults to
  the number of CPU cores.

- `protocol_version` `(int: 1)` - Specifies the Nomad protocol version to use
  when communicating with other Nomad servers. This value is typically not
  required as the agent internally knows the latest version, but may be useful
  in some upgrade scenarios.

- `raft_protocol` `(int: 2)` - Specifies the Raft protocol version to use when
  communicating with other Nomad servers. This affects available Autopilot
  features and is typically not required as the agent internally knows the
  latest version, but may be useful in some upgrade scenarios.

- `raft_multiplier` `(int: 1)` - An integer multiplier used by Nomad servers to
  scale key Raft timing parameters. Omitting this value or setting it to 0 uses
  the default timing described below. Lower values tighten timing and
  increase sensitivity, while higher values relax timing and reduce sensitivity.
  Tuning this affects the time it takes Nomad to detect leader failures and to
  perform leader elections, at the expense of requiring more network and CPU
  resources for better performance. The maximum allowed value is 10.

  By default, Nomad will use the highest-performance timing, currently equivalent
  to setting this to a value of 1. Increasing the timings makes leader election
  less likely during periods of networking issues or resource starvation. Since
  leader elections pause Nomad's normal work, it may be beneficial for slow or
  unreliable networks to wait longer before electing a new leader. The tradeoff
  when raising this value is that during network partitions or other events
  where a leader is lost (such as a server crash), Nomad will not elect a new
  leader for a longer period of time than the default. The [`nomad.nomad.leader.barrier` and
  `nomad.raft.leader.lastContact` metrics](/docs/telemetry/metrics) are a good
  indicator of how often leader elections occur and of Raft latency.

- `redundancy_zone` `(string: "")` - (Enterprise-only) Specifies the redundancy
  zone that this server will be a part of for Autopilot management. For more
  information, see the [Autopilot Guide](https://learn.hashicorp.com/tutorials/nomad/autopilot).

- `rejoin_after_leave` `(bool: false)` - Specifies if Nomad will ignore a
  previous leave and attempt to rejoin the cluster when starting. By default,
  Nomad treats leave as a permanent intent and does not attempt to join the
  cluster again when starting. This flag allows the previous state to be used to
  rejoin the cluster.

- `server_join` <code>([server_join][server-join]: nil)</code> - Specifies
  how the Nomad server will connect to other Nomad servers. The `retry_join`
  fields may directly specify the server address or use go-discover syntax for
  auto-discovery. See the [server_join documentation][server-join] for more detail.

- `upgrade_version` `(string: "")` - A custom version of the format X.Y.Z to use
  in place of the Nomad version when custom upgrades are enabled in Autopilot.
  For more information, see the [Autopilot Guide](https://learn.hashicorp.com/tutorials/nomad/autopilot).

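As an illustration of how the garbage collection parameters above combine, the
following sketch shortens the collection interval and thresholds for a cluster
that churns through many short-lived jobs. The specific durations are
assumptions chosen for the example, not recommendations.

```hcl
server {
  enabled = true

  # Illustrative values only; tune against your own workload.
  job_gc_interval   = "2m"  # run job GC more often, in smaller batches
  job_gc_threshold  = "1h"  # jobs must be terminal this long before collection
  eval_gc_threshold = "30m" # same rule, applied to evaluations
  node_gc_threshold = "12h" # same rule, applied to nodes
}
```
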
### Deprecated Parameters

- `retry_join` `(array<string>: [])` - Specifies a list of server addresses to
  retry joining if the first attempt fails. This is similar to
  [`start_join`](#start_join), but is only invoked if the initial join attempt
  fails. The list of addresses will be tried in the order specified, until one
  succeeds. After one succeeds, no further addresses will be contacted. This is
  useful for cases where we know the address will become available eventually.
  Use `retry_join` with an array as a replacement for `start_join`; **do not use
  both options**. See the [server_join][server-join]
  section for more information on the format of the string. This field is
  deprecated in favor of the [server_join stanza][server-join].

- `retry_interval` `(string: "30s")` - Specifies the time to wait between retry
  join attempts. This field is deprecated in favor of the [server_join
  stanza][server-join].

- `retry_max` `(int: 0)` - Specifies the maximum number of join attempts to be
  made before exiting with a return code of 1. By default, this is set to 0
  which is interpreted as infinite retries. This field is deprecated in favor of
  the [server_join stanza][server-join].

- `start_join` `(array<string>: [])` - Specifies a list of server addresses to
  join on startup. If Nomad is unable to join with any of the specified
  addresses, agent startup will fail. See the [server address
  format](/docs/configuration/server_join#server-address-format)
  section for more information on the format of the string. This field is
  deprecated in favor of the [server_join stanza][server-join].

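The deprecated retry fields above map onto the `server_join` stanza. The sketch
below shows one way the migration might look, with the deprecated form kept as
comments for comparison. The addresses are the placeholder values used
elsewhere on this page, and the retry settings are example values only.

```hcl
server {
  enabled = true

  # Deprecated top-level form, shown for comparison only:
  #   retry_join     = ["1.1.1.1", "2.2.2.2"]
  #   retry_max      = 3
  #   retry_interval = "15s"

  # Equivalent settings moved into the server_join stanza.
  server_join {
    retry_join     = ["1.1.1.1", "2.2.2.2"]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```
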
## `server` Examples

### Common Setup

This example shows a common Nomad agent `server` configuration stanza. The two
IP addresses could also be DNS names, and should point to the other Nomad
servers in the cluster.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join     = [ "1.1.1.1", "2.2.2.2" ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```

### Configuring Data Directory

This example shows configuring a custom data directory for the server data.

```hcl
server {
  data_dir = "/opt/nomad/server"
}
```

### Automatic Bootstrapping

The Nomad servers can automatically bootstrap if Consul is configured. For a
more detailed explanation, please see the
[automatic Nomad bootstrapping documentation](https://learn.hashicorp.com/tutorials/nomad/clustering).

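A minimal sketch of such a setup follows, assuming a Consul agent is reachable
on the local host at its default HTTP port. The `consul` stanza is a separate
top-level agent stanza with its own documentation; the address shown here is
an assumption for illustration.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3
}

# Assumed: a local Consul agent listening on the default HTTP port.
consul {
  address = "127.0.0.1:8500"
}
```

In this sketch no `server_join` stanza is given, so the servers would depend on
Consul to discover one another.
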
### Restricting Schedulers

This example shows restricting the schedulers that are enabled as well as the
maximum number of cores to utilize when participating in scheduling decisions:

```hcl
server {
  enabled            = true
  enabled_schedulers = ["batch", "service"]
  num_schedulers     = 7
}
```

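Conversely, per the `num_schedulers` parameter description, a value of `0`
keeps a server from making any scheduling decisions. A minimal sketch:

```hcl
server {
  enabled        = true
  num_schedulers = 0 # this server runs no scheduler threads
}
```
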
### Bootstrapping with a Custom Scheduler Config ((#configuring-scheduler-config))

While [bootstrapping a cluster], you can use the `default_scheduler_config` stanza
to prime the cluster with a [`SchedulerConfig`][update-scheduler-config]. The
scheduler configuration determines which scheduling algorithm is configured
(spread scheduling or binpacking) and which job types are eligible for preemption.

~> **Warning:** Once the cluster is bootstrapped, you must configure this using
the [update scheduler configuration][update-scheduler-config] API. This
option is only consulted during bootstrap.

The structure matches the [Update Scheduler Config][update-scheduler-config] API
endpoint, which you should consult for canonical documentation. However, the
attribute names must be adapted to HCL syntax by using snake case
representations rather than camel case.

This example shows configuring spread scheduling and enabling preemption for all
job-type schedulers.

```hcl
server {
  default_scheduler_config {
    scheduler_algorithm = "spread"

    preemption_config {
      batch_scheduler_enabled   = true
      system_scheduler_enabled  = true
      service_scheduler_enabled = true
    }
  }
}
```

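### Relaxing Timing for Slow or Unreliable Networks

This example is a sketch of relaxing Raft timing and the heartbeat grace period
for servers on a slow or unreliable network, following the `raft_multiplier`
and `heartbeat_grace` parameter descriptions. The values are illustrative
assumptions, not recommendations; `raft_multiplier` may not exceed 10.

```hcl
server {
  enabled = true

  # Illustrative values only.
  raft_multiplier = 5     # relax Raft timing relative to the default of 1
  heartbeat_grace = "30s" # extra slack beyond the node heartbeat TTL
}
```

Raising either value trades slower detection of failed leaders or nodes for
fewer spurious elections and false positives.
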
[encryption]: https://learn.hashicorp.com/tutorials/nomad/security-gossip-encryption 'Nomad Encryption Overview'
[server-join]: /docs/configuration/server_join 'Server Join'
[update-scheduler-config]: /api-docs/operator#update-scheduler-configuration 'Scheduler Config'
[bootstrapping a cluster]: /docs/faq#bootstrapping
[RFC4648]: https://tools.ietf.org/html/rfc4648#section-5
[`nomad operator keygen`]: /docs/commands/operator/keygen