---
layout: docs
page_title: server Stanza - Agent Configuration
description: |-
  The "server" stanza configures the Nomad agent to operate in server mode to
  participate in scheduling decisions, register with service discovery, handle
  join failures, and more.
---

# `server` Stanza

<Placement groups={['server']} />

The `server` stanza configures the Nomad agent to operate in server mode to
participate in scheduling decisions, register with service discovery, handle
join failures, and more.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join     = [ "1.1.1.1", "2.2.2.2" ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```

## `server` Parameters

- `authoritative_region` `(string: "")` - Specifies the authoritative region,
  which provides a single source of truth for global configurations such as ACL
  Policies and global ACL tokens. Non-authoritative regions replicate from the
  authoritative region to act as a mirror. By default, the local region is
  assumed to be authoritative.

- `bootstrap_expect` `(int: required)` - Specifies the number of server nodes to
  wait for before bootstrapping. It is most common to use the odd-numbered
  integers `3` or `5` for this value, depending on the cluster size. A value of
  `1` does not provide any fault tolerance and is not recommended for production
  use cases.

- `data_dir` `(string: "[data_dir]/server")` - Specifies the directory to use
  for server-specific data, including the replicated log. By default, this is
  the top-level [data_dir](/docs/configuration#data_dir) suffixed with "server",
  like `"/opt/nomad/server"`. The top-level option must be set, even when
  setting this value. This must be an absolute path.

- `enabled` `(bool: false)` - Specifies if this agent should run in server mode.
  All other server options depend on this value being set.

- `enabled_schedulers` `(array<string>: [all])` - Specifies which sub-schedulers
  this server will handle. This can be used to restrict the evaluations that
  worker threads will dequeue for processing.

- `enable_event_broker` `(bool: true)` - Specifies if this server will generate
  events for its event stream.

- `encrypt` `(string: "")` - Specifies the secret key to use for encryption of
  Nomad server's gossip network traffic. This key must be 32 bytes that are
  [RFC4648] "URL and filename safe" base64-encoded. You can generate an
  appropriately-formatted key with the [`nomad operator keygen`] command. The
  provided key is automatically persisted to the data directory and loaded
  automatically whenever the agent is restarted. This means that to encrypt
  Nomad server's gossip protocol, this option only needs to be provided once
  on each agent's initial startup sequence. If it is provided after Nomad has
  been initialized with an encryption key, then the provided key is ignored
  and a warning will be displayed. See the [encryption
  documentation][encryption] for more details on this option and its impact on
  the cluster.

- `event_buffer_size` `(int: 100)` - Specifies the number of events generated
  by the server to be held in memory. Increasing this value enables new
  subscribers to have a larger look-back window when initially subscribing.
  Decreasing it will lower the amount of memory used for the event buffer.

- `node_gc_threshold` `(string: "24h")` - Specifies how long a node must be in a
  terminal state before it is garbage collected and purged from the system. This
  is specified using a label suffix like "30s" or "1h".

- `job_gc_interval` `(string: "5m")` - Specifies the interval between job
  garbage collections.
  Only jobs that have been terminal for at least
  `job_gc_threshold` will be collected. Lowering the interval will perform more
  frequent but smaller collections. Raising the interval will perform
  collections less frequently but collect more jobs at a time. Reducing this
  interval is useful if there is a large throughput of tasks, leading to a
  large set of dead jobs. This is specified using a label suffix like "30s" or
  "3m". `job_gc_interval` was introduced in Nomad 0.10.0.

- `job_gc_threshold` `(string: "4h")` - Specifies the minimum time a job must be
  in the terminal state before it is eligible for garbage collection. This is
  specified using a label suffix like "30s" or "1h".

- `eval_gc_threshold` `(string: "1h")` - Specifies the minimum time an
  evaluation must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h".

- `deployment_gc_threshold` `(string: "1h")` - Specifies the minimum time a
  deployment must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h".

- `csi_volume_claim_gc_threshold` `(string: "1h")` - Specifies the minimum age
  of a CSI volume before it is eligible to have its claims garbage collected.
  This is specified using a label suffix like "30s" or "1h".

- `csi_plugin_gc_threshold` `(string: "1h")` - Specifies the minimum age of a
  CSI plugin before it is eligible for garbage collection if not in use.
  This is specified using a label suffix like "30s" or "1h".

- `acl_token_gc_threshold` `(string: "1h")` - Specifies the minimum age of an
  expired ACL token before it is eligible for garbage collection. This is
  specified using a label suffix like "30s" or "1h".

- `default_scheduler_config` <code>([scheduler_configuration][update-scheduler-config]:
  nil)</code> - Specifies the initial default scheduler config when
  bootstrapping the cluster. The parameter is ignored once the cluster is
  bootstrapped or the value is updated through the [API
  endpoint][update-scheduler-config]. See [the example
  section](#configuring-scheduler-config) for more details.
  `default_scheduler_config` was introduced in Nomad 0.10.4.

- `heartbeat_grace` `(string: "10s")` - Specifies the additional time given
  beyond the heartbeat TTL of Clients to account for network and processing
  delays and clock skew. This is specified using a label suffix like "30s" or
  "1h". See [Client Heartbeats](#client-heartbeats) below for details.

- `min_heartbeat_ttl` `(string: "10s")` - Specifies the minimum time between
  Client heartbeats. This is used as a floor to prevent excessive updates. This
  is specified using a label suffix like "30s" or "1h". See [Client
  Heartbeats](#client-heartbeats) below for details.

- `failover_heartbeat_ttl` `(string: "5m")` - The time by which all Clients
  must heartbeat after a Server leader election. This is specified using a
  label suffix like "30s" or "1h". See [Client
  Heartbeats](#client-heartbeats) below for details.

- `max_heartbeats_per_second` `(float: 50.0)` - Specifies the maximum target
  rate of heartbeats being processed per second. This allows the TTL to be
  increased to meet the target rate. See [Client
  Heartbeats](#client-heartbeats) below for details.

- `non_voting_server` `(bool: false)` - (Enterprise-only) Specifies whether
  this server will act as a non-voting member of the cluster to help provide
  read scalability.

- `num_schedulers` `(int: [num-cores])` - Specifies the number of parallel
  scheduler threads to run.
  This can be as many as one per core, or `0` to
  disallow this server from making any scheduling decisions. This defaults to
  the number of CPU cores.

- `license_path` `(string: "")` - Specifies the path to load a Nomad Enterprise
  license from. This must be an absolute path (`/opt/nomad/license.hclic`). The
  license can also be set by setting `NOMAD_LICENSE_PATH` or by setting
  `NOMAD_LICENSE` to the entire license value. `license_path` has the highest
  precedence, followed by `NOMAD_LICENSE` and then `NOMAD_LICENSE_PATH`.

- `plan_rejection_tracker` <code>([PlanRejectionTracker](#plan_rejection_tracker-parameters))</code> -
  Configuration for the plan rejection tracker that the Nomad leader uses to
  track the history of plan rejections.

- `raft_boltdb` - This is a nested object that allows configuring options for
  Raft's BoltDB-based log store.
  - `no_freelist_sync` - Setting this to `true` will disable syncing the BoltDB
    freelist to disk within the `raft.db` file. Not syncing the freelist to
    disk will reduce the disk IO required for write operations at the expense
    of longer server startup times.

- `raft_protocol` `(int: 3)` - Specifies the Raft protocol version to use when
  communicating with other Nomad servers. This affects available Autopilot
  features and is typically not required as the agent internally knows the
  latest version, but may be useful in some upgrade scenarios. Must be `3` in
  Nomad v1.4 or later.

- `raft_multiplier` `(int: 1)` - An integer multiplier used by Nomad servers to
  scale key Raft timing parameters. Omitting this value or setting it to 0 uses
  the default timing described below. Lower values are used to tighten timing
  and increase sensitivity while higher values relax timings and reduce
  sensitivity.
  Tuning this affects the time it takes Nomad to detect leader failures and to
  perform leader elections, at the expense of requiring more network and CPU
  resources for better performance. The maximum allowed value is 10.

  By default, Nomad will use the highest-performance timing, currently
  equivalent to setting this to a value of 1. Increasing the timings makes
  leader elections less likely during periods of networking issues or resource
  starvation. Since leader elections pause Nomad's normal work, it may be
  beneficial for slow or unreliable networks to wait longer before electing a
  new leader. The tradeoff when raising this value is that during network
  partitions or other events (such as a server crash) where a leader is lost,
  Nomad will not elect a new leader for a longer period of time than the
  default. The [`nomad.nomad.leader.barrier` and
  `nomad.raft.leader.lastContact`
  metrics](/docs/operations/metrics-reference) are a good indicator of how
  often leader elections occur and of Raft latency.

- `redundancy_zone` `(string: "")` - (Enterprise-only) Specifies the redundancy
  zone that this server will be a part of for Autopilot management. For more
  information, see the [Autopilot
  Guide](https://learn.hashicorp.com/tutorials/nomad/autopilot).

- `rejoin_after_leave` `(bool: false)` - Specifies if Nomad will ignore a
  previous leave and attempt to rejoin the cluster when starting. By default,
  Nomad treats leave as a permanent intent and does not attempt to join the
  cluster again when starting. This flag allows the previous state to be used
  to rejoin the cluster.

- `root_key_gc_interval` `(string: "10m")` - Specifies the interval between
  [encryption key][] metadata garbage collections.

- `root_key_gc_threshold` `(string: "1h")` - Specifies the minimum time that an
  [encryption key][] must exist before it can be eligible for garbage
  collection.
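
  As an illustrative sketch (the values shown are hypothetical, not
  recommendations), the key-metadata collection cadence can be relaxed by
  raising both of these parameters together:

  ```hcl
  server {
    enabled = true

    # Hypothetical tuning: collect key metadata less often, and only
    # for keys that have existed for at least two hours.
    root_key_gc_interval  = "30m"
    root_key_gc_threshold = "2h"
  }
  ```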

- `root_key_rotation_threshold` `(string: "720h")` - Specifies the minimum time
  that an [encryption key][] must exist before it is automatically rotated on
  the next garbage collection interval.

- `server_join` <code>([server_join][server-join]: nil)</code> - Specifies
  how the Nomad server will connect to other Nomad servers. The `retry_join`
  fields may directly specify the server address or use go-discover syntax for
  auto-discovery. See the [server_join documentation][server-join] for more
  detail.

- `upgrade_version` `(string: "")` - A custom version of the format X.Y.Z to
  use in place of the Nomad version when custom upgrades are enabled in
  Autopilot. For more information, see the [Autopilot
  Guide](https://learn.hashicorp.com/tutorials/nomad/autopilot).

- `search` <code>([search][search]: nil)</code> - Specifies configuration
  parameters for the Nomad search API.

### Deprecated Parameters

- `retry_join` `(array<string>: [])` - Specifies a list of server addresses to
  retry joining if the first attempt fails. This is similar to
  [`start_join`](#start_join), but is only invoked if the initial join attempt
  fails. The list of addresses will be tried in the order specified, until one
  succeeds. After one succeeds, no further addresses will be contacted. This is
  useful for cases where we know the address will become available eventually.
  Use `retry_join` with an array as a replacement for `start_join`; **do not
  use both options**. See the [server_join][server-join] section for more
  information on the format of the string. This field is deprecated in favor
  of the [server_join stanza][server-join].

- `retry_interval` `(string: "30s")` - Specifies the time to wait between retry
  join attempts. This field is deprecated in favor of the [server_join
  stanza][server-join].

- `retry_max` `(int: 0)` - Specifies the maximum number of join attempts to be
  made before exiting with a return code of 1. By default, this is set to 0,
  which is interpreted as infinite retries. This field is deprecated in favor
  of the [server_join stanza][server-join].

- `start_join` `(array<string>: [])` - Specifies a list of server addresses to
  join on startup. If Nomad is unable to join with any of the specified
  addresses, agent startup will fail. See the [server address
  format](/docs/configuration/server_join#server-address-format)
  section for more information on the format of the string. This field is
  deprecated in favor of the [server_join stanza][server-join].

### `plan_rejection_tracker` Parameters

The leader's plan rejection tracker can be adjusted to prevent evaluations from
becoming stuck because they are repeatedly scheduled to a client that has an
unexpected issue. Refer to [Monitoring Nomad][monitoring_nomad_progress] for
more details.

- `enabled` `(bool: false)` - Specifies if plan rejections should be tracked.

- `node_threshold` `(int: 100)` - The number of plan rejections for a node
  within the `node_window` required to mark the client as ineligible.

- `node_window` `(string: "5m")` - The time window in which plan rejections for
  a node are considered.

If you observe too many false positives (clients marked as ineligible even
though they do not have any problem), you may want to increase
`node_threshold`.

Conversely, if jobs are not being scheduled due to plan rejections for the same
`node_id` but the client is never set as ineligible, you can try increasing
`node_window` so that more historical rejections are taken into account.

## `server` Examples

### Common Setup

This example shows a common Nomad agent `server` configuration stanza.
The two
IP addresses could also be DNS names, and should point to the other Nomad
servers in the cluster.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join     = [ "1.1.1.1", "2.2.2.2" ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```

### Configuring Data Directory

This example shows configuring a custom data directory for the server data.

```hcl
server {
  data_dir = "/opt/nomad/server"
}
```

### Automatic Bootstrapping

The Nomad servers can automatically bootstrap if Consul is configured. For a
more detailed explanation, please see the
[automatic Nomad bootstrapping documentation](https://learn.hashicorp.com/tutorials/nomad/clustering).

### Restricting Schedulers

This example shows restricting the schedulers that are enabled as well as the
maximum number of cores to utilize when participating in scheduling decisions:

```hcl
server {
  enabled            = true
  enabled_schedulers = ["batch", "service"]
  num_schedulers     = 7
}
```

### Bootstrapping with a Custom Scheduler Config ((#configuring-scheduler-config))

While [bootstrapping a cluster], you can use the `default_scheduler_config`
stanza to prime the cluster with a
[`SchedulerConfig`][update-scheduler-config]. The scheduler configuration
determines which scheduling algorithm is used (spread scheduling or
binpacking) and which job types are eligible for preemption.

~> **Warning:** Once the cluster is bootstrapped, you must configure this using
the [update scheduler configuration][update-scheduler-config] API. This
option is only consulted during bootstrap.

The structure matches the [Update Scheduler Config][update-scheduler-config]
API endpoint, which you should consult for canonical documentation.
However, the
attribute names must be adapted to HCL syntax by using snake case
representations rather than camel case.

This example shows configuring spread scheduling and enabling preemption for
all job-type schedulers.

```hcl
server {
  default_scheduler_config {
    scheduler_algorithm             = "spread"
    memory_oversubscription_enabled = true
    reject_job_registration         = false
    pause_eval_broker               = false # New in Nomad 1.3.2

    preemption_config {
      batch_scheduler_enabled    = true
      system_scheduler_enabled   = true
      service_scheduler_enabled  = true
      sysbatch_scheduler_enabled = true # New in Nomad 1.2
    }
  }
}
```

## Client Heartbeats ((#client-heartbeats))

~> This is an advanced topic. It is most beneficial to clusters of over 1,000
nodes or with unreliable networks or nodes (e.g., some edge deployments).

Nomad Clients periodically heartbeat to Nomad Servers to confirm they are
operating as expected. Nomad Clients which do not heartbeat in the specified
amount of time are considered `down` and their allocations are marked as `lost`
or `disconnected` (if [`max_client_disconnect`][max_client_disconnect] is set)
and rescheduled.

The various heartbeat-related parameters allow you to tune the following
tradeoffs:

- The longer the heartbeat period, the longer a `down` Client's workload will
  take to be rescheduled.
- The shorter the heartbeat period, the more likely transient network issues,
  leader elections, and other temporary issues could cause a perfectly
  functional Client and its workloads to be marked as `down` and the work
  rescheduled.

While Nomad Clients can connect to any Server, all heartbeats are forwarded to
the leader for processing. Since this heartbeat processing consumes resources,
Nomad adjusts the rate at which Clients heartbeat based on cluster size.
The
goal is to keep the resource cost of processing heartbeats roughly constant
regardless of cluster size.

The base formula for determining how often a Client must heartbeat is:

```
<number of Clients> / <max_heartbeats_per_second>
```

Other factors modify this base TTL:

- A random factor of up to `2x` is added to the base TTL to prevent the
  [thundering herd][herd] problem, where a large number of clients attempt to
  heartbeat at exactly the same time.
- [`min_heartbeat_ttl`](#min_heartbeat_ttl) is used as the lower bound to
  prevent small clusters from sending excessive heartbeats.
- [`heartbeat_grace`](#heartbeat_grace) is the amount of _extra_ time the
  leader will wait for a heartbeat beyond the base heartbeat.
- After a leader election, all Clients are given up to `failover_heartbeat_ttl`
  to successfully heartbeat. This gives Clients time to discover a functioning
  Server in case they were directly connected to a leader that crashed.

For example, given the default values for heartbeat parameters, different
sized clusters will use the following TTLs for heartbeats. Note that the
`Server TTL` simply adds the `heartbeat_grace` parameter to the TTL Clients
are given.

| Clients | Client TTL  | Server TTL  | Safe after elections |
| ------- | ----------- | ----------- | -------------------- |
| 10      | 10s - 20s   | 20s - 30s   | yes                  |
| 100     | 10s - 20s   | 20s - 30s   | yes                  |
| 1000    | 20s - 40s   | 30s - 50s   | yes                  |
| 5000    | 100s - 200s | 110s - 210s | yes                  |
| 10000   | 200s - 400s | 210s - 410s | NO (see below)       |

Regardless of size, all Clients will have a Server TTL of
`failover_heartbeat_ttl` after a leader election. It should always be larger
than the maximum Client TTL for your cluster size in order to prevent marking
live Clients as `down`.
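
As an illustrative sketch (the value shown is hypothetical, not a
recommendation), a cluster approaching 10,000 Clients might raise
`failover_heartbeat_ttl` well above its `5m` default, since the table above
shows Client TTLs of up to 400s at that size:

```hcl
server {
  enabled = true

  # Hypothetical tuning for a ~10,000 Client cluster.
  # Base TTL: 10000 / 50 = 200s, stretched by jitter to as much as 400s,
  # so the default failover_heartbeat_ttl of 5m (300s) could mark live
  # Clients as down after an election.
  failover_heartbeat_ttl = "10m"
}
```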

For clusters of over 5000 Clients you should increase `failover_heartbeat_ttl`
using the following formula:

```
(2 * (<number of Clients> / <max_heartbeats_per_second>)) + (10 * <min_heartbeat_ttl>)

# For example with 6000 Clients:
(2 * (6000 / 50)) + (10 * 10) = 340s (5m40s)
```

This ensures Clients have some additional time to fail over even if they were
told to heartbeat after the maximum interval.

The actual value used should take into consideration how much tolerance your
system has for a delay in noticing crashed Clients. For example, a
`failover_heartbeat_ttl` of 30 minutes may give even the slowest Clients in the
largest clusters ample time to heartbeat after an election. However, if the
election was due to a datacenter-wide failure affecting Clients, it will be 30
minutes before Nomad recognizes that they are `down` and reschedules their
work.

[encryption]: https://learn.hashicorp.com/tutorials/nomad/security-gossip-encryption 'Nomad Encryption Overview'
[server-join]: /docs/configuration/server_join 'Server Join'
[update-scheduler-config]: /api-docs/operator/scheduler#update-scheduler-configuration 'Scheduler Config'
[bootstrapping a cluster]: /docs/faq#bootstrapping
[rfc4648]: https://tools.ietf.org/html/rfc4648#section-5
[monitoring_nomad_progress]: /docs/operations/monitoring-nomad#progress
[`nomad operator keygen`]: /docs/commands/operator/keygen
[search]: /docs/configuration/search
[encryption key]: /docs/operations/key-management
[max_client_disconnect]: /docs/job-specification/group#max-client-disconnect
[herd]: https://en.wikipedia.org/wiki/Thundering_herd_problem