---
layout: docs
page_title: group Stanza - Job Specification
description: |-
  The "group" stanza defines a series of tasks that should be co-located on the
  same Nomad client. Any task within a group will be placed on the same client.
---

# `group` Stanza

<Placement groups={['job', 'group']} />

The `group` stanza defines a series of tasks that should be co-located on the
same Nomad client. Any [task][] within a group will be placed on the same
client.

```hcl
job "docs" {
  group "example" {
    # ...
  }
}
```

## `group` Parameters

- `constraint` <code>([Constraint][]: nil)</code> -
  This can be provided multiple times to define additional constraints.

- `affinity` <code>([Affinity][]: nil)</code> - This can be provided
  multiple times to define preferred placement criteria.

- `spread` <code>([Spread][spread]: nil)</code> - This can be provided
  multiple times to define criteria for spreading allocations across a
  node attribute or metadata. See the
  [Nomad spread reference](/docs/job-specification/spread) for more details.

- `count` `(int)` - Specifies the number of instances that should be running
  for this group. This value must be non-negative. This defaults to the
  `min` value specified in the [`scaling`](/docs/job-specification/scaling)
  block, if present; otherwise, this defaults to `1`.

- `consul` <code>([Consul][consul]: nil)</code> - Specifies Consul configuration
  options specific to the group.

- `ephemeral_disk` <code>([EphemeralDisk][]: nil)</code> - Specifies the
  ephemeral disk requirements of the group. Ephemeral disks can be marked as
  sticky and support live data migrations.

- `meta` <code>([Meta][]: nil)</code> - Specifies a key-value map that annotates
  the group with user-defined metadata.

- `migrate` <code>([Migrate][]: nil)</code> - Specifies the group strategy for
  migrating off of draining nodes. Only service jobs with a count greater than
  1 support migrate stanzas.

- `network` <code>([Network][]: &lt;optional&gt;)</code> - Specifies the network
  requirements and configuration, including static and dynamic port allocations,
  for the group.

- `reschedule` <code>([Reschedule][]: nil)</code> - Specifies a
  rescheduling strategy. Nomad will then attempt to schedule the task on another
  node if any of the group allocation statuses become "failed".

- `restart` <code>([Restart][]: nil)</code> - Specifies the restart policy for
  all tasks in this group. If omitted, a default policy exists for each job
  type, which can be found in the [restart stanza documentation][restart].

- `service` <code>([Service][]: nil)</code> - Specifies integrations with
  [Consul](/docs/configuration/consul) for service discovery.
  Nomad automatically registers each service when an allocation
  is started and de-registers it when the allocation is destroyed.

- `shutdown_delay` `(string: "0s")` - Specifies the duration to wait when
  stopping a group's tasks. The delay occurs between Consul deregistration
  and sending each task a shutdown signal. Ideally, services would fail
  health checks once they receive a shutdown signal. Alternatively,
  `shutdown_delay` may be set to give in-flight requests time to complete
  before shutting down. A group-level `shutdown_delay` will run regardless
  of whether any group services are defined. In addition, tasks may have their
  own [`shutdown_delay`](/docs/job-specification/task#shutdown_delay)
  which waits between deregistering task services and stopping the task.

- `stop_after_client_disconnect` `(string: "")` - Specifies a duration after
  which a Nomad client will stop allocations, if it cannot communicate with the
  servers.
  By default, a client will not stop an allocation until explicitly
  told to by a server. If a client fails to heartbeat to a server within the
  [`heartbeat_grace`] window, the client and any allocations running on it will
  be marked "lost", and Nomad will schedule replacement allocations. The replaced
  allocations will normally continue to run on the non-responsive client. You
  may want them to stop instead, for example when allocations require
  exclusive access to an external resource. When specified, the Nomad client
  will stop them after this duration.
  The Nomad client process must be running for this to occur. This setting
  cannot be used with [`max_client_disconnect`].

- `max_client_disconnect` `(string: "")` - Specifies a duration during which a
  Nomad client will attempt to reconnect allocations after it fails to heartbeat
  in the [`heartbeat_grace`] window. See [the example code
  below][max-client-disconnect] for more details. This setting cannot be used
  with [`stop_after_client_disconnect`].

- `task` <code>([Task][]: &lt;required&gt;)</code> - Specifies one or more tasks to run
  within this group. This can be specified multiple times to add multiple tasks
  to the group.

- `update` <code>([Update][update]: nil)</code> - Specifies the group's update
  strategy. When omitted, a default update strategy is applied.

- `vault` <code>([Vault][]: nil)</code> - Specifies the set of Vault policies
  required by all tasks in this group. Overrides a `vault` block set at the
  `job` level.

- `volume` <code>([Volume][]: nil)</code> - Specifies the volumes that are
  required by tasks within the group.

### `consul` Parameters

- `namespace` `(string: "")` <EnterpriseAlert inline/> - The Consul namespace in which
  group and task-level services within the group will be registered. Use of
  `template` to access Consul KV will read from the specified Consul namespace.
  Specifying `namespace` takes precedence over the [`-consul-namespace`][consul_namespace]
  command line argument in `job run`.

## `group` Examples

The following examples only show the `group` stanzas. Remember that the
`group` stanza is only valid in the placements listed above.

### Specifying Count

This example specifies that 5 instances of the tasks within this group should be
running:

```hcl
group "example" {
  count = 5
}
```

### Tasks with Constraint

This example shows two abbreviated tasks with a constraint on the group. This
restricts the tasks to clients with a 64-bit x86 (`amd64`) CPU architecture.

```hcl
group "example" {
  constraint {
    attribute = "${attr.cpu.arch}"
    value     = "amd64"
  }

  task "cache" {
    # ...
  }

  task "server" {
    # ...
  }
}
```

### Metadata

This example shows arbitrary user-defined metadata on the group:

```hcl
group "example" {
  meta {
    my-key = "my-value"
  }
}
```

### Network

This example shows a [network][] stanza that uses the `bridge` networking mode,
dynamically allocates two ports, and statically allocates one port:

```hcl
group "example" {
  network {
    mode = "bridge"
    port "http" {}
    port "https" {}
    port "lb" {
      static = "8889"
    }
  }
}
```

### Service Discovery

This example creates a service in Consul. To read more about service discovery
in Nomad, please see the [Nomad service discovery documentation][service_discovery].

```hcl
group "example" {
  network {
    port "api" {}
  }

  service {
    name = "example"
    port = "api"
    tags = ["default"]

    check {
      type     = "tcp"
      interval = "10s"
      timeout  = "2s"
    }
  }

  task "api" { ... }
}
```

### Stop After Client Disconnect

This example shows how `stop_after_client_disconnect` interacts with
other stanzas. For the `first` group, after the default 10-second
[`heartbeat_grace`] window expires and 90 more seconds pass, the
server will reschedule the allocation. The client will wait 90 seconds
before sending a stop signal (`SIGTERM`) to the `first-task`
task. After 15 more seconds, because of the task's `kill_timeout`, the
client will send `SIGKILL`. The `second` group does not have
`stop_after_client_disconnect`, so the server will reschedule the
allocation after the 10-second [`heartbeat_grace`] expires. It will
not be stopped on the client, regardless of how long the client is out
of touch.

Note that if the servers' clocks are not closely synchronized with
each other, the server may reschedule the group before the client has
stopped the allocation. Operators should ensure that clock drift
between servers is as small as possible.

Note also that a group using this feature will be stopped on the
client if the Nomad server cluster fails, since the client will be
unable to contact any server in that case. Groups opting in to this
feature are therefore exposed to an additional runtime dependency and
potential point of failure.

```hcl
group "first" {
  stop_after_client_disconnect = "90s"

  task "first-task" {
    kill_timeout = "15s"
  }
}

group "second" {

  task "second-task" {
    kill_timeout = "5s"
  }
}
```

### Max Client Disconnect

`max_client_disconnect` specifies a duration during which a Nomad client will
attempt to reconnect allocations after it fails to heartbeat in the
[`heartbeat_grace`] window.

By default, allocations running on a client that fails to heartbeat will be
marked "lost".
When a client reconnects, its allocations, which may still be
healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.

Instead, an operator may desire that these allocations reconnect without a
restart. When `max_client_disconnect` is specified, the Nomad server will mark
clients that fail to heartbeat as "disconnected" rather than "down", and will
mark allocations on a disconnected client as "unknown" rather than "lost". These
allocations may continue to run on the disconnected client. Replacement
allocations will be scheduled according to the allocations' reschedule policy
until the disconnected client reconnects. Once a disconnected client reconnects,
Nomad will compare the "unknown" allocations with their replacements and keep
the one with the best node score. If the `max_client_disconnect` duration
expires before the client reconnects, the allocations will be marked "lost".
Clients that contain "unknown" allocations will transition to "disconnected"
rather than "down" until the last `max_client_disconnect` duration has expired.

In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, both groups' allocations
would be marked as "unknown" at two minutes because of the server's
`heartbeat_grace` value of "2m". If the network outage continued for
eight hours, and the client continued to fail to heartbeat, the client would
remain in a "disconnected" state, as the first group's `max_client_disconnect`
is twelve hours. Once all groups' `max_client_disconnect` durations are
exceeded, in this case in twelve hours, the client node will be marked as "down"
and the allocation will be marked as "lost".
If the client had reconnected
before twelve hours had passed, the allocations would gracefully reconnect
without a restart.

Max Client Disconnect is useful for edge deployments, or scenarios when
operators want zero on-client downtime due to node connectivity issues. This
setting cannot be used with [`stop_after_client_disconnect`].

```hcl
# server_config.hcl

server {
  enabled         = true
  heartbeat_grace = "2m"
}
```

```hcl
# jobspec.nomad

group "first" {
  max_client_disconnect = "12h"

  task "first-task" {
    ...
  }
}

group "second" {
  max_client_disconnect = "6h"

  task "second-task" {
    ...
  }
}
```

~> **Note:** The `max_client_disconnect` feature is only supported on Nomad
version 1.3.0 and above. If you run a job with `max_client_disconnect` on a
cluster where some servers are not upgraded to 1.3.0, the `max_client_disconnect`
flag will be _ignored_. Deploying a job with `max_client_disconnect` to a
`datacenter` of Nomad clients where not all clients are 1.3.0 or above is unsupported.
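
As described above, replacement allocations for a disconnected client are
scheduled according to the group's reschedule policy. The following is a minimal
sketch of how the two settings might appear together in one group. The group
name `edge-cache`, the task name `cache`, and all durations are hypothetical
values chosen for illustration; they are not recommendations.

```hcl
# Illustrative sketch only: names and durations below are assumptions.
group "edge-cache" {
  count = 1

  # Allocations on a client that misses its heartbeat are marked "unknown"
  # rather than "lost" for up to one hour.
  max_client_disconnect = "1h"

  # Governs how replacement allocations are scheduled while the original
  # client remains disconnected.
  reschedule {
    delay          = "30s"
    delay_function = "constant"
    unlimited      = true
  }

  task "cache" {
    # ...
  }
}
```

If the client reconnects within the hour, Nomad compares the "unknown"
allocation with its replacement and keeps the one with the best node score.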

[task]: /docs/job-specification/task 'Nomad task Job Specification'
[job]: /docs/job-specification/job 'Nomad job Job Specification'
[constraint]: /docs/job-specification/constraint 'Nomad constraint Job Specification'
[consul]: /docs/job-specification/group#consul-parameters
[consul_namespace]: /docs/commands/job/run#consul-namespace
[spread]: /docs/job-specification/spread 'Nomad spread Job Specification'
[affinity]: /docs/job-specification/affinity 'Nomad affinity Job Specification'
[ephemeraldisk]: /docs/job-specification/ephemeral_disk 'Nomad ephemeral_disk Job Specification'
[`heartbeat_grace`]: /docs/configuration/server#heartbeat_grace
[`max_client_disconnect`]: /docs/job-specification/group#max_client_disconnect
[max-client-disconnect]: /docs/job-specification/group#max-client-disconnect 'the example code below'
[`stop_after_client_disconnect`]: /docs/job-specification/group#stop_after_client_disconnect
[meta]: /docs/job-specification/meta 'Nomad meta Job Specification'
[migrate]: /docs/job-specification/migrate 'Nomad migrate Job Specification'
[network]: /docs/job-specification/network 'Nomad network Job Specification'
[reschedule]: /docs/job-specification/reschedule 'Nomad reschedule Job Specification'
[restart]: /docs/job-specification/restart 'Nomad restart Job Specification'
[service]: /docs/job-specification/service 'Nomad service Job Specification'
[service_discovery]: /docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
[update]: /docs/job-specification/update 'Nomad update Job Specification'
[vault]: /docs/job-specification/vault 'Nomad vault Job Specification'
[volume]: /docs/job-specification/volume 'Nomad volume Job Specification'