---
layout: docs
page_title: group Stanza - Job Specification
description: |-
  The "group" stanza defines a series of tasks that should be co-located on the
  same Nomad client. Any task within a group will be placed on the same client.
---

# `group` Stanza

<Placement groups={['job', 'group']} />

The `group` stanza defines a series of tasks that should be co-located on the
same Nomad client. Any [task][] within a group will be placed on the same
client.

```hcl
job "docs" {
  group "example" {
    # ...
  }
}
```

## `group` Parameters

- `constraint` <code>([Constraint][]: nil)</code> -
  This can be provided multiple times to define additional constraints.

- `affinity` <code>([Affinity][]: nil)</code> - This can be provided
  multiple times to define preferred placement criteria.

- `spread` <code>([Spread][spread]: nil)</code> - This can be provided
  multiple times to define criteria for spreading allocations across a
  node attribute or metadata. See the
  [Nomad spread reference](/docs/job-specification/spread) for more details.

- `count` `(int)` - Specifies the number of instances that should be running
  for this group. This value must be non-negative. It defaults to the
  `min` value specified in the [`scaling`](/docs/job-specification/scaling)
  block, if present; otherwise, it defaults to `1`.

- `consul` <code>([Consul][consul]: nil)</code> - Specifies Consul configuration
  options specific to the group.

- `ephemeral_disk` <code>([EphemeralDisk][]: nil)</code> - Specifies the
  ephemeral disk requirements of the group. Ephemeral disks can be marked as
  sticky and support live data migrations.

- `meta` <code>([Meta][]: nil)</code> - Specifies a key-value map that annotates
  the group with user-defined metadata.

- `migrate` <code>([Migrate][]: nil)</code> - Specifies the group strategy for
  migrating off of draining nodes. Only service jobs with a count greater than
  1 support migrate stanzas.

- `network` <code>([Network][]: &lt;optional&gt;)</code> - Specifies the network
  requirements and configuration, including static and dynamic port allocations,
  for the group.

- `reschedule` <code>([Reschedule][]: nil)</code> - Specifies a rescheduling
  strategy. Nomad will attempt to schedule the task on another node if any of
  the group's allocation statuses become "failed".

- `restart` <code>([Restart][]: nil)</code> - Specifies the restart policy for
  all tasks in this group. If omitted, a default policy exists for each job
  type, which can be found in the [restart stanza documentation][restart].

- `service` <code>([Service][]: nil)</code> - Specifies integrations with
  [Consul](/docs/configuration/consul) for service discovery.
  Nomad automatically registers each service when an allocation
  is started and de-registers it when the allocation is destroyed.

- `shutdown_delay` `(string: "0s")` - Specifies the duration to wait when
  stopping a group's tasks. The delay occurs between Consul deregistration
  and sending each task a shutdown signal. Ideally, services would fail
  health checks once they receive a shutdown signal. Alternatively,
  `shutdown_delay` may be set to give in-flight requests time to complete
  before shutting down. A group-level `shutdown_delay` runs regardless of
  whether any group services are defined. In addition, tasks may have their
  own [`shutdown_delay`](/docs/job-specification/task#shutdown_delay),
  which waits between deregistering task services and stopping the task.

- `stop_after_client_disconnect` `(string: "")` - Specifies a duration after
  which a Nomad client will stop allocations if it cannot communicate with the
  servers. By default, a client will not stop an allocation until explicitly
  told to do so by a server. A client that fails to heartbeat to a server
  within the [`heartbeat_grace`] window will be marked "lost", along with any
  allocations running on it, and Nomad will schedule replacement allocations.
  The replaced allocations will normally continue to run on the non-responsive
  client. You may want them to stop instead, for example when allocations
  require exclusive access to an external resource. When this parameter is
  specified, the Nomad client will stop the allocations after this duration.
  The Nomad client process must be running for this to occur. This setting
  cannot be used with [`max_client_disconnect`].

- `max_client_disconnect` `(string: "")` - Specifies a duration during which a
  Nomad client will attempt to reconnect allocations after it fails to heartbeat
  in the [`heartbeat_grace`] window. See [the example code
  below][max-client-disconnect] for more details. This setting cannot be used
  with [`stop_after_client_disconnect`].

- `task` <code>([Task][]: &lt;required&gt;)</code> - Specifies one or more tasks to run
  within this group. This can be specified multiple times to add a task as part
  of the group.

- `update` <code>([Update][update]: nil)</code> - Specifies the task's update
  strategy. When omitted, a default update strategy is applied.

- `vault` <code>([Vault][]: nil)</code> - Specifies the set of Vault policies
  required by all tasks in this group. Overrides a `vault` block set at the
  `job` level.

- `volume` <code>([Volume][]: nil)</code> - Specifies the volumes that are
  required by tasks within the group.
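
To illustrate how several of these parameters fit together, here is a sketch of
a hypothetical group that combines `count`, `shutdown_delay`, and a `restart`
policy (all values are illustrative, not recommendations):

```hcl
group "example" {
  count          = 3
  shutdown_delay = "10s" # wait between Consul deregistration and task shutdown

  restart {
    attempts = 2
    interval = "30m"
    delay    = "15s"
    mode     = "fail"
  }

  task "server" {
    # ...
  }
}
```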

### `consul` Parameters

- `namespace` `(string: "")` <EnterpriseAlert inline/> - The Consul namespace in which
  group- and task-level services within the group will be registered. Use of
  `template` to access Consul KV will read from the specified Consul namespace.
  Specifying `namespace` takes precedence over the [`-consul-namespace`][consul_namespace]
  command line argument in `job run`.
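
As a sketch, a group could be pinned to a Consul namespace like this (the
namespace name is hypothetical):

```hcl
group "example" {
  consul {
    # Group and task services register in this Consul namespace,
    # overriding any `-consul-namespace` flag passed to `job run`.
    namespace = "engineering"
  }

  task "server" {
    # ...
  }
}
```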
## `group` Examples

The following examples only show the `group` stanzas. Remember that the
`group` stanza is only valid in the placements listed above.

### Specifying Count

This example specifies that 5 instances of the tasks within this group should be
running:

```hcl
group "example" {
  count = 5
}
```
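
When `count` is omitted and a [`scaling`](/docs/job-specification/scaling)
block is present, the group defaults to the `min` value instead. A sketch with
illustrative values:

```hcl
group "example" {
  # No count: this group initially runs `min` (3) instances,
  # and autoscaling may adjust it between 3 and 10.
  scaling {
    min = 3
    max = 10
  }
}
```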
### Tasks with Constraint

This example shows two abbreviated tasks with a constraint on the group. This
restricts both tasks to nodes with the 64-bit `amd64` CPU architecture.

```hcl
group "example" {
  constraint {
    attribute = "${attr.cpu.arch}"
    value     = "amd64"
  }

  task "cache" {
    # ...
  }

  task "server" {
    # ...
  }
}
```

### Metadata

This example shows arbitrary user-defined metadata on the group:

```hcl
group "example" {
  meta {
    my-key = "my-value"
  }
}
```
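
Metadata can also be declared at the `job` level; group-level keys are merged
with, and take precedence over, job-level keys. A hypothetical sketch (keys and
values are illustrative):

```hcl
job "docs" {
  meta {
    tier = "standard"
  }

  group "example" {
    # Group meta is merged with job meta; on conflict, the
    # group-level value ("premium") wins for this group's tasks.
    meta {
      tier = "premium"
    }
  }
}
```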
### Network

This example shows network requirements as specified in the [network][] stanza.
It uses the `bridge` networking mode, dynamically allocates two ports, and
statically allocates one port:

```hcl
group "example" {
  network {
    mode = "bridge"
    port "http" {}
    port "https" {}
    port "lb" {
      static = 8889
    }
  }
}
```
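
Tasks can then refer to these port labels. For example, a Docker task might
publish a dynamic port by label; the driver and image below are hypothetical:

```hcl
group "example" {
  network {
    mode = "bridge"
    port "http" {}
  }

  task "web" {
    driver = "docker"

    config {
      image = "nginx:alpine" # illustrative image
      ports = ["http"]       # map the container to the allocated "http" port
    }
  }
}
```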
### Service Discovery

This example creates a service in Consul. To read more about service discovery
in Nomad, please see the [Nomad service discovery documentation][service_discovery].

```hcl
group "example" {
  network {
    port "api" {}
  }

  service {
    name = "example"
    port = "api"
    tags = ["default"]

    check {
      type     = "tcp"
      interval = "10s"
      timeout  = "2s"
    }
  }

  task "api" { ... }
}
```
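
Inside the task, the dynamically allocated port labeled `api` is also exposed
as the `NOMAD_PORT_api` environment variable, so the task can bind to it. A
sketch; the driver, image, and env variable name are illustrative:

```hcl
task "api" {
  driver = "docker"

  config {
    image = "example/api:1.0" # hypothetical image
    ports = ["api"]
  }

  env {
    # Pass the allocated port along under whatever name
    # the application expects.
    BIND_PORT = "${NOMAD_PORT_api}"
  }
}
```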

### Stop After Client Disconnect

This example shows how `stop_after_client_disconnect` interacts with
other stanzas. For the `first` group, after the default 10 second
[`heartbeat_grace`] window expires and 90 more seconds pass, the
server will reschedule the allocation. The client will wait 90 seconds
before sending a stop signal (`SIGTERM`) to the `first-task`
task. After 15 more seconds, because of the task's `kill_timeout`, the
client will send `SIGKILL`. The `second` group does not have
`stop_after_client_disconnect`, so the server will reschedule the
allocation after the 10 second [`heartbeat_grace`] window expires. It will
not be stopped on the client, regardless of how long the client is out
of touch.

Note that if the servers' clocks are not closely synchronized with
each other, the server may reschedule the group before the client has
stopped the allocation. Operators should ensure that clock drift
between servers is as small as possible.

Note also that a group using this feature will be stopped on the
client if the Nomad server cluster fails, since the client will be
unable to contact any server in that case. Groups opting in to this
feature are therefore exposed to an additional runtime dependency and
potential point of failure.

```hcl
group "first" {
  stop_after_client_disconnect = "90s"

  task "first-task" {
    kill_timeout = "15s"
  }
}

group "second" {

  task "second-task" {
    kill_timeout = "5s"
  }
}
```

### Max Client Disconnect

`max_client_disconnect` specifies a duration during which a Nomad client will
attempt to reconnect allocations after it fails to heartbeat in the
[`heartbeat_grace`] window.

By default, allocations running on a client that fails to heartbeat will be
marked "lost". When a client reconnects, its allocations, which may still be
healthy, will restart because they have been marked "lost". This can cause
issues with stateful tasks or tasks with long restart times.

Instead, an operator may desire that these allocations reconnect without a
restart. When `max_client_disconnect` is specified, the Nomad server will mark
clients that fail to heartbeat as "disconnected" rather than "down", and will
mark allocations on a disconnected client as "unknown" rather than "lost". These
allocations may continue to run on the disconnected client. Replacement
allocations will be scheduled according to the allocations' reschedule policy
until the disconnected client reconnects. Once a disconnected client reconnects,
Nomad will compare the "unknown" allocations with their replacements and keep
the one with the best node score. If the `max_client_disconnect` duration
expires before the client reconnects, the allocations will be marked "lost".
Clients that contain "unknown" allocations will transition to "disconnected"
rather than "down" until the last `max_client_disconnect` duration has expired.

In the example code below, if both of these task groups were placed on the same
client and that client experienced a network outage, both of the groups'
allocations would be marked "disconnected" after two minutes because of the
client's `heartbeat_grace` value of "2m". If the network outage continued for
eight hours, and the client continued to fail to heartbeat, the client would
remain in a "disconnected" state, as the first group's `max_client_disconnect`
is twelve hours. Once all groups' `max_client_disconnect` durations are
exceeded, in this case after twelve hours, the client node will be marked "down"
and the allocations will be marked "lost". If the client had reconnected
before twelve hours had passed, the allocations would gracefully reconnect
without a restart.

`max_client_disconnect` is useful for edge deployments, or scenarios when
operators want zero on-client downtime due to node connectivity issues. This
setting cannot be used with [`stop_after_client_disconnect`].

```hcl
# server_config.hcl

server {
  enabled         = true
  heartbeat_grace = "2m"
}
```

```hcl
# jobspec.nomad

group "first" {
  max_client_disconnect = "12h"

  task "first-task" {
    ...
  }
}

group "second" {
  max_client_disconnect = "6h"

  task "second-task" {
    ...
  }
}
```

~> **Note:** The `max_client_disconnect` feature is only supported on Nomad
version 1.3.0 and above. If you run a job with `max_client_disconnect` on a
cluster where some servers have not been upgraded to 1.3.0, the
`max_client_disconnect` flag will be _ignored_. Deploying a job with
`max_client_disconnect` to a `datacenter` of Nomad clients where not all
clients are 1.3.0 or above is unsupported.

[task]: /docs/job-specification/task 'Nomad task Job Specification'
[job]: /docs/job-specification/job 'Nomad job Job Specification'
[constraint]: /docs/job-specification/constraint 'Nomad constraint Job Specification'
[consul]: /docs/job-specification/group#consul-parameters
[consul_namespace]: /docs/commands/job/run#consul-namespace
[spread]: /docs/job-specification/spread 'Nomad spread Job Specification'
[affinity]: /docs/job-specification/affinity 'Nomad affinity Job Specification'
[ephemeraldisk]: /docs/job-specification/ephemeral_disk 'Nomad ephemeral_disk Job Specification'
[`heartbeat_grace`]: /docs/configuration/server#heartbeat_grace
[`max_client_disconnect`]: /docs/job-specification/group#max_client_disconnect
[max-client-disconnect]: /docs/job-specification/group#max-client-disconnect 'the example code below'
[`stop_after_client_disconnect`]: /docs/job-specification/group#stop_after_client_disconnect
[meta]: /docs/job-specification/meta 'Nomad meta Job Specification'
[migrate]: /docs/job-specification/migrate 'Nomad migrate Job Specification'
[network]: /docs/job-specification/network 'Nomad network Job Specification'
[reschedule]: /docs/job-specification/reschedule 'Nomad reschedule Job Specification'
[restart]: /docs/job-specification/restart 'Nomad restart Job Specification'
[service]: /docs/job-specification/service 'Nomad service Job Specification'
[service_discovery]: /docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
[update]: /docs/job-specification/update 'Nomad update Job Specification'
[vault]: /docs/job-specification/vault 'Nomad vault Job Specification'
[volume]: /docs/job-specification/volume 'Nomad volume Job Specification'