---
layout: "guides"
page_title: "Check Restart Stanza - Operating a Job"
sidebar_current: "guides-operating-a-job-failure-handling-strategies-check-restart"
description: |-
  Nomad can restart tasks if they have a failing health check based on
  configuration specified in the `check_restart` stanza. Restarts are done locally on the node
  running the task based on their `restart` policy.
---

# Check Restart Stanza

The [`check_restart` stanza][check restart] instructs Nomad when to restart tasks with unhealthy service checks.
When a health check in Consul has been unhealthy for the limit specified in a `check_restart` stanza,
the task is restarted according to the task group's restart policy.

The `limit` field specifies the number of times a failing health check is seen before local restarts are attempted.
Operators can also specify a `grace` duration to wait after a task restarts before checking its health.

We recommend configuring `check_restart` on services if it is likely that a restart would resolve the failure. This
is applicable in cases like temporary memory issues on the service.

# Example

The following `check_restart` stanza waits for two consecutive health check failures after a
10-second grace period and considers both `critical` and `warning` statuses as failures.

```text
check_restart {
  limit           = 2
  grace           = "10s"
  ignore_warnings = false
}
```

The following CLI example output shows health check failures triggering restarts until the task's
restart limit is reached.

```
$ nomad alloc status e1b43128-2a0a-6aa3-c375-c7e8a7c48690
ID                  = e1b43128
Eval ID             = 249cbfe9
Name                = demo.demo[0]
Node ID             = 221e998e
Job ID              = demo
Job Version         = 0
Client Status       = failed
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 2m59s ago
Modified            = 39s ago

Task "test" is "dead"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
100 MHz  300 MiB  300 MiB  0     p1: 127.0.0.1:28422

Task Events:
Started At     = 2018-04-12T22:50:32Z
Finished At    = 2018-04-12T22:50:54Z
Total Restarts = 3
Last Restart   = 2018-04-12T17:50:15-05:00

Recent Events:
Time                       Type              Description
2018-04-12T17:50:54-05:00  Not Restarting    Exceeded allowed attempts 3 in interval 30m0s and mode is "fail"
2018-04-12T17:50:54-05:00  Killed            Task successfully killed
2018-04-12T17:50:54-05:00  Killing           Sent interrupt. Waiting 5s before force killing
2018-04-12T17:50:54-05:00  Restart Signaled  healthcheck: check "service: \"demo-service-test\" check" unhealthy
2018-04-12T17:50:32-05:00  Started           Task started by client
2018-04-12T17:50:15-05:00  Restarting        Task restarting in 16.887291122s
2018-04-12T17:50:15-05:00  Killed            Task successfully killed
2018-04-12T17:50:15-05:00  Killing           Sent interrupt. Waiting 5s before force killing
2018-04-12T17:50:15-05:00  Restart Signaled  healthcheck: check "service: \"demo-service-test\" check" unhealthy
2018-04-12T17:49:53-05:00  Started           Task started by client
```

[check restart]: /docs/job-specification/check_restart.html "Nomad check restart Stanza"
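
To show where the `check_restart` stanza sits in relation to the group's `restart` policy, here is a minimal sketch of a job that would be consistent with the output above. Only the job, group, and task names, the `p1` port label, the `demo-service-test` service name, the CPU and memory figures, and the restart `attempts`, `interval`, and `mode` are taken from that output. The datacenter, driver, command, check type, check timing, and restart `delay` are assumptions made for illustration and are not the configuration that actually produced it.

```text
# Illustrative sketch only; values marked "assumed" or "hypothetical"
# do not come from the example output.
job "demo" {
  datacenters = ["dc1"] # assumed

  group "demo" {
    # Local restart policy applied when check_restart signals a restart.
    # After 3 restarts within 30m, mode "fail" stops further restarts.
    restart {
      attempts = 3
      interval = "30m"
      delay    = "15s" # assumed
      mode     = "fail"
    }

    task "test" {
      driver = "exec" # assumed

      config {
        command = "/usr/local/bin/demo-service" # hypothetical binary
        args    = ["-port", "${NOMAD_PORT_p1}"]  # hypothetical flag
      }

      service {
        name = "demo-service-test"
        port = "p1"

        check {
          type     = "tcp" # assumed
          interval = "10s" # assumed
          timeout  = "2s"  # assumed

          # Restart the task after 2 consecutive failing checks,
          # waiting 10s after each (re)start before counting failures.
          check_restart {
            limit           = 2
            grace           = "10s"
            ignore_warnings = false
          }
        }
      }

      resources {
        cpu    = 100
        memory = 300

        network {
          port "p1" {}
        }
      }
    }
  }
}
```

In a layout like this, `check_restart` only decides when a restart is signaled; whether the task actually comes back, and after what delay, is governed by the group's `restart` stanza.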