---
layout: docs
page_title: check Block - Job Specification
description: |-
  The "check" block declares a service check definition for a service registered into the Nomad or Consul service provider.
---

# `check` Stanza

<Placement
  groups={[
    ['job', 'group', 'service', 'check'],
    ['job', 'group', 'task', 'service', 'check'],
  ]}
/>

The `check` block instructs Nomad to register a check associated with a [service][service]
into the Nomad or Consul service provider.

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db" { to = 6379 }
    }

    service {
      provider = "nomad"
      name     = "redis"
      port     = "db"
      check {
        name     = "redis_probe"
        type     = "tcp"
        interval = "10s"
        timeout  = "1s"
      }
    }

    task "redis" {
      driver = "docker"
      config {
        image = "redis:7"
        ports = ["db"]
      }
    }
  }
}
```

### `check` Parameters

- `address_mode` `(string: "host")` - Same as `address_mode` on `service`.
  Unlike services, checks do not have an `auto` address mode as there's no way
  for Nomad to know which is the best address to use for checks. Consul needs
  access to the address for any HTTP or TCP checks. See
  [Using Driver Address Mode](/docs/job-specification/service#using-driver-address-mode)
  for details. Unlike `port`, this setting is _not_ inherited from the `service`.
  If the service `address` is set and the check `address_mode` is not set, the
  service `address` value will be used for the check address.

- `args` `(array<string>: [])` - Specifies additional arguments to the
  `command`. This only applies to script-based health checks.

- `check_restart` - See [`check_restart` stanza][check_restart_stanza].

- `command` `(string: <varies>)` - Specifies the command to run for performing
  the health check. The script must exit with 0 for passing, 1 for warning, or
  any other value for a failing health check. This is required for script-based
  health checks. Only supported in the Consul service provider.

  ~> **Caveat:** The command must be the path to the command on disk, and no
  shell exists by default. That means operators like `||` or `&&` are not
  available. Additionally, all arguments must be supplied via the `args`
  parameter. To achieve the behavior of shell operators, specify the command
  as a shell, like `/bin/bash`, and then use `args` to run the check.

- `grpc_service` `(string: <optional>)` - What service, if any, to specify in
  the gRPC health check. gRPC health checks require Consul 1.0.5 or later.

- `grpc_use_tls` `(bool: false)` - Use TLS to perform a gRPC health check. May
  be used with `tls_skip_verify` to use TLS but skip certificate verification.

- `initial_status` `(string: <enum>)` - Specifies the starting status of the
  service. Valid options are `passing`, `warning`, and `critical`. Omitting
  this field (or submitting an empty string) will result in the Consul default
  behavior, which is `critical`. Only supported in the Consul service provider.
  In the Nomad service provider, the initial status of a check is `pending`
  until Nomad produces an initial check status result.

- `success_before_passing` `(int: 0)` - The number of consecutive successful checks
  required before Consul will transition the service status to [`passing`][consul_passfail].
  Only supported in the Consul service provider.

- `failures_before_critical` `(int: 0)` - The number of consecutive failing checks
  required before Consul will transition the service status to [`critical`][consul_passfail].
  Only supported in the Consul service provider.

- `interval` `(string: <required>)` - Specifies the frequency of the health checks
  that the Consul or Nomad service provider will perform. This is specified using a label
  suffix like "30s" or "1h". This must be greater than or equal to "1s".

- `method` `(string: "GET")` - Specifies the HTTP method to use for HTTP
  checks. Must be a valid HTTP method.

- `body` `(string: "")` - Specifies the HTTP body to use for HTTP checks.

- `name` `(string: "service: <name> check")` - Specifies the name of the health
  check. If the name is not specified, Nomad generates one based on the service name.

- `path` `(string: <varies>)` - Specifies the path of the HTTP endpoint which
  will be queried to observe the health of a service. Nomad will automatically
  add the IP of the service and the port, so this is just the relative URL to
  the health check endpoint. This is required for HTTP-based health checks.

- `expose` `(bool: false)` - Specifies whether an [Expose Path](/docs/job-specification/expose#path-parameters)
  should be automatically generated for this check. Only compatible with
  Connect-enabled task-group services using the default Connect proxy. If set, check
  `type` must be `http` or `grpc`, and check `name` must be set.
  Only supported in the Consul service provider.

- `port` `(string: <varies>)` - Specifies the label of the port on which the
  check will be performed. Note this is the _label_ of the port and not the port
  number unless `address_mode = "driver"`. The port label must match one defined
  in the [`network`][network] stanza. If a port value was declared on the
  `service`, this will inherit from that value if not supplied. If supplied,
  this value takes precedence over the `service.port` value. This is useful for
  services which operate on multiple ports. `grpc`, `http`, and `tcp` checks
  require a port while `script` checks do not. Checks will use the host IP and
  ports by default. In Nomad 0.7.1 or later, numeric ports may be used if
  `address_mode="driver"` is set on the check.

- `protocol` `(string: "http")` - Specifies the protocol for HTTP-based
  health checks. Valid options are `http` and `https`.

- `task` `(string: "")` - Specifies the task associated with this
  check. Scripts are executed within the task's environment, and
  `check_restart` stanzas will apply to the specified task. Inherits
  the [`service.task`][service_task] value if not set. Must be unset
  or equivalent to `service.task` in task-level services.

- `timeout` `(string: <required>)` - Specifies how long to wait for a health check
  query to succeed. This is specified using a label suffix like "30s" or "1h". This
  must be greater than or equal to "1s".

  ~> **Caveat:** Script checks use the task driver to execute in the task's
  environment. For task drivers with namespace isolation such as `docker` or
  `exec`, setting up the context for the script check may take an unexpectedly
  long amount of time (a full second or two), especially on busy hosts. The
  timeout configuration must allow for both this setup and the execution of
  the script. Operators should use long timeouts (5 or more seconds) for script
  checks, and monitor telemetry for
  `client.allocrunner.taskrunner.tasklet_timeout`.

- `type` `(string: <required>)` - Specifies the type of health check to perform.
  For Consul service checks, valid options are `grpc`, `http`, `script`,
  and `tcp`. For Nomad service checks, valid options are `http` and `tcp`.

- `tls_skip_verify` `(bool: false)` - Skip verifying TLS certificates for HTTPS
  checks. Only supported in the Consul service provider.

- `on_update` `(string: "require_healthy")` - Specifies how checks should be
  evaluated when determining deployment health (including a job's initial
  deployment). This allows job submitters to define certain checks as readiness
  checks, progressing a deployment even if the Service's checks are not yet
  healthy. Checks inherit the Service's value by default. The check status is
  not altered in the service provider and is only used to determine the check's
  health during an update.

  - `require_healthy` - In order for Nomad to consider the check healthy during
    an update, it must report as healthy.

  - `ignore_warnings` - If a Service Check reports as warning, Nomad will treat
    the check as healthy. The Check will still be in a warning state in Consul.

  - `ignore` - Any status will be treated as healthy.

  ~> **Caveat:** `on_update` is only compatible with certain
  [`check_restart`][check_restart_stanza] configurations. `on_update = "ignore_warnings"`
  requires that `check_restart.ignore_warnings = true`. `check_restart` can,
  however, specify `ignore_warnings = true` with `on_update = "require_healthy"`.
  If `on_update` is set to `ignore`, `check_restart` must be omitted entirely.
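
To illustrate the compatibility rules in the caveat above, here is a minimal sketch of a check that lets warnings pass during deployments and therefore also sets `ignore_warnings = true` in its `check_restart` block. The check name, the `http` port label, the `/ready` path, and the `check_restart` values are placeholders chosen for illustration, not recommendations.

```hcl
service {
  # ...
  check {
    name     = "app_ready" # hypothetical check name
    type     = "http"
    port     = "http"      # hypothetical port label from the network stanza
    path     = "/ready"    # hypothetical readiness endpoint
    interval = "10s"
    timeout  = "2s"

    # Warnings from this check do not block a deployment...
    on_update = "ignore_warnings"

    # ...so check_restart must ignore warnings as well.
    check_restart {
      limit           = 3
      grace           = "30s"
      ignore_warnings = true
    }
  }
}
```

If `on_update` were set to `ignore` instead, the `check_restart` block would have to be removed entirely.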

#### `header` Stanza

HTTP checks may include a `header` stanza to set HTTP headers. The `header`
stanza parameters have lists of strings as values. Multiple values will cause
the header to be set multiple times, once for each value.

```hcl
service {
  # ...
  check {
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    header {
      Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
    }
  }
}
```
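
As a sketch of the multiple-value behavior described above, the following check lists two values for a single header, so the `Accept` header should be sent twice, once per value. The header name and values are illustrative only.

```hcl
check {
  type     = "http"
  port     = "lb"
  path     = "/_healthz"
  interval = "5s"
  timeout  = "2s"
  header {
    # One Accept header is sent for each value in the list.
    Accept = ["application/json", "text/plain"]
  }
}
```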

### HTTP Health Check

This example shows a service with an HTTP health check. This will query the
service on the IP and port registered with Nomad at `/_healthz` every 5 seconds,
giving the service a maximum of 2 seconds to return a response, and will include
an Authorization header with each request. Any non-2xx code is considered a failure.

```hcl
service {
  check {
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    header {
      Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
    }
  }
}
```
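
When the service is registered with the Consul service provider, the same HTTP check can also use `initial_status`, `success_before_passing`, and `failures_before_critical`, none of which the Nomad service provider supports. In this sketch (the threshold values are arbitrary), the check starts in `warning`, needs three consecutive successes before Consul marks it `passing`, and needs two consecutive failures before Consul marks it `critical`.

```hcl
service {
  provider = "consul"

  check {
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"

    # Consul-only settings; not supported by the Nomad service provider.
    initial_status           = "warning"
    success_before_passing   = 3
    failures_before_critical = 2
  }
}
```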

### Multiple Health Checks

This example shows a service with multiple health checks defined. All health
checks must be passing in order for the service to register as healthy.

```hcl
service {
  check {
    name     = "HTTP Check"
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
  }

  check {
    name     = "HTTPS Check"
    type     = "http"
    protocol = "https"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    method   = "POST"
  }

  check {
    name      = "Postgres Check"
    type      = "script"
    command   = "/usr/local/bin/pg-tools"
    args      = ["verify", "database", "prod", "up"]
    interval  = "5s"
    timeout   = "2s"
    on_update = "ignore_warnings"
  }
}
```
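
The `HTTPS Check` above sends a `POST` with the default empty `body`. If the endpoint expects a payload, a sketch like the following could supply one through `body`, together with a matching `Content-Type` header. The JSON payload is a placeholder, not something the endpoint is known to accept.

```hcl
check {
  name     = "HTTPS Check"
  type     = "http"
  protocol = "https"
  port     = "lb"
  path     = "/_healthz"
  interval = "5s"
  timeout  = "2s"
  method   = "POST"
  # Placeholder payload; inner quotes are escaped inside the HCL string.
  body     = "{\"probe\": true}"
  header {
    Content-Type = ["application/json"]
  }
}
```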

### gRPC Health Check

gRPC health checks use the same host and port behavior as `http` and `tcp`
checks, but gRPC checks also have an optional gRPC service to health check. Not
all gRPC applications require a service to health check.

```hcl
service {
  check {
    type            = "grpc"
    port            = "rpc"
    interval        = "5s"
    timeout         = "2s"
    grpc_service    = "example.Service"
    grpc_use_tls    = true
    tls_skip_verify = true
  }
}
```

In this example, Consul would health check the `example.Service` service on the
`rpc` port defined in the task's [network resources][network] stanza. See
[Using Driver Address Mode](/docs/job-specification/service#using-driver-address-mode)
for details on address selection.
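
As a sketch of driver address selection for a check, the task-level service below advertises the container's own address, and the check targets that address by setting `address_mode = "driver"` on the check itself, because this setting is not inherited from the service. A numeric `port` is used, which the `port` parameter only permits in driver mode. This assumes the Docker container's IP is reachable from the Consul agent; the names and port number are illustrative.

```hcl
task "redis" {
  driver = "docker"

  config {
    image = "redis:7"
  }

  service {
    name         = "redis"
    port         = 6379
    address_mode = "driver"

    check {
      type         = "tcp"
      port         = 6379     # numeric port, allowed because address_mode is "driver"
      address_mode = "driver" # must be set on the check; not inherited from the service
      interval     = "10s"
      timeout      = "2s"
    }
  }
}
```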

### Script Checks with Shells

Note that script checks run inside the task. If your task is a Docker container,
the script will run inside the Docker container. If your task is running in a
chroot, it will run in the chroot. Please keep this in mind when authoring check
scripts.

This example shows a service with a script check that is evaluated and interpolated in a shell; it
tests whether a file exists at the path stored in the `HEALTH_CHECK_FILE` environment variable:

```hcl
service {
  check {
    type     = "script"
    command  = "/bin/bash"
    args     = ["-c", "test -f ${HEALTH_CHECK_FILE}"]
    interval = "10s"
    timeout  = "5s"
  }
}
```

Using `/bin/bash` (or another shell) is required here to interpolate the `${HEALTH_CHECK_FILE}` value.

The following examples of `command` fields **will not work**:

```hcl
# invalid because command is not a path
check {
  type    = "script"
  command = "test -f /tmp/file.txt"
}

# invalid because path will not be interpolated
check {
  type    = "script"
  command = "/bin/test"
  args    = ["-f", "${HEALTH_CHECK_FILE}"]
}
```
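
Because script checks run inside a task, a script check on a group-level service needs a task to run in, which can be set with `task` on the check (or inherited from `service.task`). The sketch below reuses the shell-based check from above in a group-level service; the `/data/ready` marker file is hypothetical, and the timeout follows the earlier advice of allowing 5 or more seconds for script checks.

```hcl
group "cache" {
  network {
    port "db" {
      to = 6379
    }
  }

  service {
    name = "redis"
    port = "db"

    check {
      type     = "script"
      task     = "redis" # the script runs inside this task
      command  = "/bin/bash"
      args     = ["-c", "test -f ${HEALTH_CHECK_FILE}"]
      interval = "10s"
      timeout  = "5s"
    }
  }

  task "redis" {
    driver = "docker"

    config {
      image = "redis:7"
      ports = ["db"]
    }

    env {
      # Hypothetical marker file tested by the script check.
      HEALTH_CHECK_FILE = "/data/ready"
    }
  }
}
```

With this configuration, the check keeps failing until something inside the task creates the marker file.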

### Healthiness vs. Readiness Checks

Multiple checks for a service can be composed to create healthiness and readiness
checks by configuring [`on_update`][on_update] for the check.

```hcl
service {
  # This is a healthiness check that will be used to verify the service
  # is responsive to TCP connections and behaving as expected.
  check {
    name     = "connection_tcp"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }

  # This is a readiness check that is used to verify that, for example, the
  # application has elected a leader by making a request to its /leader endpoint.
  # Warnings from this check are ignored during deployments.
  check {
    name      = "leader_elected"
    type      = "http"
    path      = "/leader"
    interval  = "10s"
    timeout   = "2s"
    on_update = "ignore_warnings"
  }
}
```

For checks registered into the Nomad service provider, the status information will
indicate `Mode = readiness` for readiness checks and `Mode = healthiness` for health
checks.
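
A check can also be excluded from deployment health entirely with `on_update = "ignore"`, in which case any status is treated as healthy during an update and `check_restart` must be omitted. A minimal sketch, with a hypothetical check name and endpoint:

```hcl
service {
  # ...
  check {
    name     = "cache_warm"    # hypothetical informational check
    type     = "http"
    path     = "/cache/status" # hypothetical endpoint
    interval = "30s"
    timeout  = "2s"

    # Any status counts as healthy during deployments.
    # No check_restart block may accompany on_update = "ignore".
    on_update = "ignore"
  }
}
```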

### Check status on CLI

For checks registered into the Nomad service provider, the status information of
checks can be viewed per-allocation. The `alloc status` command includes
summary information for Nomad service checks.

```
➜ nomad alloc status <allocation-id>
```

```
Nomad Service Checks:
Service   Task     Name          Mode         Status
database  task     db_tcp_probe  readiness    success
web       (group)  healthz       healthiness  failure
web       (group)  index-page    healthiness  success
```

The `alloc checks` command can be used for viewing complete check status information
for all checks in an allocation.

```
➜ nomad alloc checks <allocation-id>
```

```
Status of 3 Nomad Service Checks

ID         =  d8651d93a50b9e28375a7beb9418c418
Name       =  db_tcp_probe
Group      =  example.group[0]
Task       =  task
Service    =  database
Status     =  success
Mode       =  readiness
Timestamp  =  2022-08-22T10:41:23-05:00
Output     =  nomad: tcp ok

ID          =  0413b61bda7014f02671675d7e146373
Name        =  index-page
Group       =  example.group[0]
Task        =  (group)
Service     =  web
Status      =  success
StatusCode  =  200
Mode        =  healthiness
Timestamp   =  2022-08-22T10:41:23-05:00
Output      =  nomad: http ok

ID         =  c3cce3f0c97975f84bbf39bdd50deaea
Name       =  healthz
Group      =  example.group[0]
Task       =  (group)
Service    =  web
Status     =  failure
Mode       =  healthiness
Timestamp  =  2022-08-22T10:41:23-05:00
Output     =  nomad: Get "http://:9999/": dial tcp :9999: connect: connection refused
```

---

<sup>
  <small>1</small>
</sup>
<small>
  {' '}
  Script checks are not supported for the QEMU driver since the Nomad client
  does not have access to the file system of a task for that driver.
</small>

[check_restart_stanza]: /docs/job-specification/check_restart
[consul_passfail]: https://developer.hashicorp.com/consul/docs/discovery/checks#success-failures-before-passing-critical
[network]: /docs/job-specification/network 'Nomad network Job Specification'
[service]: /docs/job-specification/service
[service_task]: /docs/job-specification/service#task-1
[on_update]: /docs/job-specification/service#on_update