---
layout: docs
page_title: check Block - Job Specification
description: |-
  The "check" block declares a service check definition for a service registered into the Nomad or Consul service provider.
---

# `check` Stanza

<Placement
  groups={[
    ['job', 'group', 'service', 'check'],
    ['job', 'group', 'task', 'service', 'check'],
  ]}
/>

The `check` block instructs Nomad to register a check associated with a [service][service]
into the Nomad or Consul service provider.

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db" { to = 6379 }
    }

    service {
      provider = "nomad"
      name     = "redis"
      port     = "db"
      check {
        name     = "redis_probe"
        type     = "tcp"
        interval = "10s"
        timeout  = "1s"
      }
    }

    task "redis" {
      driver = "docker"
      config {
        image = "redis:7"
        ports = ["db"]
      }
    }
  }
}
```

### `check` Parameters

- `address_mode` `(string: "host")` - Same as `address_mode` on `service`.
  Unlike services, checks do not have an `auto` address mode as there's no way
  for Nomad to know which is the best address to use for checks. Consul needs
  access to the address for any HTTP or TCP checks. See
  [below for details.](#using-driver-address-mode) Unlike `port`, this setting
  is _not_ inherited from the `service`.
  If the service `address` is set and the check `address_mode` is not set, the
  service `address` value will be used for the check address.

- `args` `(array<string>: [])` - Specifies additional arguments to the
  `command`. This only applies to script-based health checks.

- `check_restart` - See [`check_restart` stanza][check_restart_stanza].

- `command` `(string: <varies>)` - Specifies the command to run for performing
  the health check.
  The script must exit: 0 for passing, 1 for warning, or any
  other value for a failing health check. This is required for script-based
  health checks. Only supported in the Consul service provider.

  ~> **Caveat:** The command must be the path to the command on disk, and no
  shell exists by default. That means operators like `||` or `&&` are not
  available. Additionally, all arguments must be supplied via the `args`
  parameter. To achieve the behavior of shell operators, specify the command
  as a shell, like `/bin/bash`, and then use `args` to run the check.

- `grpc_service` `(string: <optional>)` - What service, if any, to specify in
  the gRPC health check. gRPC health checks require Consul 1.0.5 or later.

- `grpc_use_tls` `(bool: false)` - Use TLS to perform a gRPC health check. May
  be used with `tls_skip_verify` to use TLS but skip certificate verification.

- `initial_status` `(string: <enum>)` - Specifies the starting status of the
  service. Valid options are `passing`, `warning`, and `critical`. Omitting
  this field (or submitting an empty string) will result in the Consul default
  behavior, which is `critical`. Only supported in the Consul service provider.
  In the Nomad service provider, the initial status of a check is `pending`
  until Nomad produces an initial check status result.

- `success_before_passing` `(int: 0)` - The number of consecutive successful checks
  required before Consul will transition the service status to [`passing`][consul_passfail].
  Only supported in the Consul service provider.

- `failures_before_critical` `(int: 0)` - The number of consecutive failing checks
  required before Consul will transition the service status to [`critical`][consul_passfail].
  Only supported in the Consul service provider.

- `interval` `(string: <required>)` - Specifies the frequency of the health checks
  that the Consul or Nomad service provider will perform.
  This is specified using a label
  suffix like "30s" or "1h". This must be greater than or equal to "1s".

- `method` `(string: "GET")` - Specifies the HTTP method to use for HTTP
  checks. Must be a valid HTTP method.

- `body` `(string: "")` - Specifies the HTTP body to use for HTTP checks.

- `name` `(string: "service: <name> check")` - Specifies the name of the health
  check. If the name is not specified Nomad generates one based on the service name.

- `path` `(string: <varies>)` - Specifies the path of the HTTP endpoint which
  will be queried to observe the health of a service. Nomad will automatically
  add the IP of the service and the port, so this is just the relative URL to
  the health check endpoint. This is required for http-based health checks.

- `expose` `(bool: false)` - Specifies whether an [Expose Path](/docs/job-specification/expose#path-parameters)
  should be automatically generated for this check. Only compatible with
  Connect-enabled task-group services using the default Connect proxy. If set, check
  [`type`][type] must be `http` or `grpc`, and check `name` must be set.
  Only supported in the Consul service provider.

- `port` `(string: <varies>)` - Specifies the label of the port on which the
  check will be performed. Note this is the _label_ of the port and not the port
  number unless `address_mode = driver`. The port label must match one defined
  in the [`network`][network] stanza. If a port value was declared on the
  `service`, this will inherit from that value if not supplied. If supplied,
  this value takes precedence over the `service.port` value. This is useful for
  services which operate on multiple ports. `grpc`, `http`, and `tcp` checks
  require a port while `script` checks do not. Checks will use the host IP and
  ports by default. In Nomad 0.7.1 or later numeric ports may be used if
  `address_mode="driver"` is set on the check.

- `protocol` `(string: "http")` - Specifies the protocol for the http-based
  health checks. Valid options are `http` and `https`.

- `task` `(string: "")` - Specifies the task associated with this
  check. Scripts are executed within the task's environment, and
  `check_restart` stanzas will apply to the specified task. Inherits
  the [`service.task`][service_task] value if not set. Must be unset
  or equivalent to `service.task` in task-level services.

- `timeout` `(string: <required>)` - Specifies how long to wait for a health check
  query to succeed. This is specified using a label suffix like "30s" or "1h". This
  must be greater than or equal to "1s".

  ~> **Caveat:** Script checks use the task driver to execute in the task's
  environment. For task drivers with namespace isolation such as `docker` or
  `exec`, setting up the context for the script check may take an unexpectedly
  long amount of time (a full second or two), especially on busy hosts. The
  timeout configuration must allow for both this setup and the execution of
  the script. Operators should use long timeouts (5 or more seconds) for script
  checks, and monitor telemetry for
  `client.allocrunner.taskrunner.tasklet_timeout`.

- `type` `(string: <required>)` - This indicates the check types supported by
  Nomad. For Consul service checks, valid options are `grpc`, `http`, `script`,
  and `tcp`. For Nomad service checks, valid options are `http` and `tcp`.

- `tls_skip_verify` `(bool: false)` - Skip verifying TLS certificates for HTTPS
  checks. Only supported in the Consul service provider.

- `on_update` `(string: "require_healthy")` - Specifies how checks should be
  evaluated when determining deployment health (including a job's initial
  deployment). This allows job submitters to define certain checks as readiness
  checks, progressing a deployment even if the Service's checks are not yet
  healthy.
  Checks inherit the Service's value by default. The check status is
  not altered in the service provider and is only used to determine the check's
  health during an update.

  - `require_healthy` - In order for Nomad to consider the check healthy during
    an update it must report as healthy.

  - `ignore_warnings` - If a Service Check reports as warning, Nomad will treat
    the check as healthy. The Check will still be in a warning state in Consul.

  - `ignore` - Any status will be treated as healthy.

  ~> **Caveat:** `on_update` is only compatible with certain
  [`check_restart`][check_restart_stanza] configurations. `on_update = "ignore_warnings"` requires that `check_restart.ignore_warnings = true`.
  `check_restart` can however specify `ignore_warnings = true` with `on_update = "require_healthy"`. If `on_update` is set to `ignore`, `check_restart` must
  be omitted entirely.

#### `header` Stanza

HTTP checks may include a `header` stanza to set HTTP headers. The `header`
stanza parameters have lists of strings as values. Multiple values will cause
the header to be set multiple times, once for each value.

```hcl
service {
  # ...
  check {
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    header {
      Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
    }
  }
}
```

### HTTP Health Check

This example shows a service with an HTTP health check. This will query the
service on the IP and port registered with Nomad at `/_healthz` every 5 seconds,
giving the service a maximum of 2 seconds to return a response, and include an
Authorization header. Any non-2xx code is considered a failure.

```hcl
service {
  check {
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    header {
      Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
    }
  }
}
```

### Multiple Health Checks

This example shows a service with multiple health checks defined. All health
checks must be passing in order for the service to register as healthy.

```hcl
service {
  check {
    name     = "HTTP Check"
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
  }

  check {
    name     = "HTTPS Check"
    type     = "http"
    protocol = "https"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    method   = "POST"
  }

  check {
    name      = "Postgres Check"
    type      = "script"
    command   = "/usr/local/bin/pg-tools"
    args      = ["verify", "database", "prod", "up"]
    interval  = "5s"
    timeout   = "2s"
    on_update = "ignore_warnings"
  }
}
```

### gRPC Health Check

gRPC health checks use the same host and port behavior as `http` and `tcp`
checks, but gRPC checks also have an optional gRPC service to health check. Not
all gRPC applications require a service to health check.

```hcl
service {
  check {
    type            = "grpc"
    port            = "rpc"
    interval        = "5s"
    timeout         = "2s"
    grpc_service    = "example.Service"
    grpc_use_tls    = true
    tls_skip_verify = true
  }
}
```

In this example Consul would health check the `example.Service` service on the
`rpc` port defined in the task's [network resources][network] stanza. See
[Using Driver Address Mode](#using-driver-address-mode) for details on address
selection.

### Script Checks with Shells

Note that script checks run inside the task. If your task is a Docker container,
the script will run inside the Docker container.
If your task is running in a
chroot, it will run in the chroot. Please keep this in mind when authoring check
scripts.

This example shows a service with a script check that is evaluated and interpolated in a shell; it
tests whether a file is present at the path given by the `${HEALTH_CHECK_FILE}` environment variable:

```hcl
service {
  check {
    type     = "script"
    command  = "/bin/bash"
    args     = ["-c", "test -f ${HEALTH_CHECK_FILE}"]
    interval = "5s" # interval and timeout are required on every check
    timeout  = "2s"
  }
}
```

Using `/bin/bash` (or another shell) is required here to interpolate the `${HEALTH_CHECK_FILE}` value.

The following examples of `command` fields **will not work**:

```hcl
# invalid because command is not a path
check {
  type    = "script"
  command = "test -f /tmp/file.txt"
}

# invalid because path will not be interpolated
check {
  type    = "script"
  command = "/bin/test"
  args    = ["-f", "${HEALTH_CHECK_FILE}"]
}
```

### Healthiness vs. Readiness Checks

Multiple checks for a service can be composed to create healthiness and readiness
checks by configuring [`on_update`][on_update] for the check.

```hcl
service {
  # This is a healthiness check that will be used to verify the service
  # is responsive to tcp connections and behaving as expected.
  check {
    name     = "connection_tcp"
    type     = "tcp"
    port     = 6379
    interval = "10s"
    timeout  = "2s"
  }

  # This is a readiness check that is used to verify that, for example, the
  # application has elected a leader by making a request to its /leader endpoint.
  # Failures of this check are ignored during deployments.
  check {
    name      = "leader_elected"
    type      = "http"
    path      = "/leader"
    interval  = "10s"
    timeout   = "2s"
    on_update = "ignore"
  }
}
```

For checks registered into the Nomad service provider, the status information will
indicate `Mode = readiness` for readiness checks and `Mode = healthiness` for health
checks.

### Check status on CLI

For checks registered into the Nomad service provider, the status information of
checks can be viewed per-allocation. The `alloc status` command includes
summary information for Nomad service checks.

```
➜ nomad alloc status <allocation-id>
```

```
Nomad Service Checks:
Service   Task     Name          Mode         Status
database  task     db_tcp_probe  readiness    success
web       (group)  healthz       healthiness  failure
web       (group)  index-page    healthiness  success
```

The `alloc checks` command can be used for viewing complete check status information
for all checks in an allocation.

```
➜ nomad alloc checks <allocation-id>
```

```
Status of 3 Nomad Service Checks

ID         = d8651d93a50b9e28375a7beb9418c418
Name       = db_tcp_probe
Group      = example.group[0]
Task       = task
Service    = database
Status     = success
Mode       = readiness
Timestamp  = 2022-08-22T10:41:23-05:00
Output     = nomad: tcp ok

ID         = 0413b61bda7014f02671675d7e146373
Name       = index-page
Group      = example.group[0]
Task       = (group)
Service    = web
Status     = success
StatusCode = 200
Mode       = healthiness
Timestamp  = 2022-08-22T10:41:23-05:00
Output     = nomad: http ok

ID         = c3cce3f0c97975f84bbf39bdd50deaea
Name       = healthz
Group      = example.group[0]
Task       = (group)
Service    = web
Status     = failure
Mode       = healthiness
Timestamp  = 2022-08-22T10:41:23-05:00
Output     = nomad: Get "http://:9999/": dial tcp :9999: connect: connection refused
```

---

<sup>
  <small>1</small>
</sup>
<small>
  {' '}
  Script checks are not supported for the QEMU driver since the Nomad client
  does not have access to the file system of a task for that driver.
</small>

[check_restart_stanza]: /docs/job-specification/check_restart
[consul_passfail]: https://developer.hashicorp.com/consul/docs/discovery/checks#success-failures-before-passing-critical
[network]: /docs/job-specification/network 'Nomad network Job Specification'
[service]: /docs/job-specification/service
[service_task]: /docs/job-specification/service#task-1
[on_update]: /docs/job-specification/service#on_update