github.com/smintz/nomad@v0.8.3/website/source/guides/operating-a-job/failure-handling-strategies/check-restart.html.md (about)

---
layout: "guides"
page_title: "Check Restart Stanza - Operating a Job"
sidebar_current: "guides-operating-a-job-failure-handling-strategies-check-restart"
description: |-
  Nomad can restart tasks if they have a failing health check based on
  configuration specified in the `check_restart` stanza. Restarts are done locally on the node
  running the task based on their `restart` policy.
---

# Check Restart Stanza

The [`check_restart` stanza][check restart] instructs Nomad when to restart tasks with unhealthy service checks.
When a health check in Consul has been unhealthy for the `limit` specified in a `check_restart` stanza,
the task is restarted according to the task group's restart policy.

The `limit` field specifies the number of times a failing health check is seen before local restarts are attempted.
Operators can also specify a `grace` duration to wait after a task restarts before checking its health.

We recommend configuring `check_restart` on services if it is likely that a restart would resolve the failure. This
applies to cases such as temporary memory issues in the service.

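To show where the stanza sits in a job file, here is a minimal sketch of a `check_restart` block nested inside a service check. The service name, port label, check path, and all timing values are illustrative assumptions, not values taken from this guide.

```hcl
service {
  name = "demo-service" # hypothetical service name
  port = "http"         # hypothetical port label from the task's network resources

  check {
    type     = "http"
    path     = "/health" # hypothetical health endpoint
    interval = "10s"
    timeout  = "2s"

    check_restart {
      limit = 3     # number of failing checks seen before a restart is attempted
      grace = "30s" # wait this long after the task starts before checking health
    }
  }
}
```

`check_restart` can also be placed directly on the `service`, in which case it applies to every check defined for that service.
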
# Example

The following `check_restart` stanza waits for two consecutive health check failures with a
grace period of 10 seconds and considers both `critical` and `warning` statuses as failures:

```text
check_restart {
  limit           = 2
  grace           = "10s"
  ignore_warnings = false
}
```

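For comparison, a variant for a service that reports `warning` while it is degraded but still usable might set `ignore_warnings` to `true`, so that only `critical` results count toward `limit`. This is a sketch; the other values are simply carried over from the example above.

```hcl
check_restart {
  limit           = 2
  grace           = "10s"
  ignore_warnings = true # only "critical" check results count as failures
}
```
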
The following CLI example output shows health check failures triggering restarts until the task's
restart limit is reached.

```
$ nomad alloc status e1b43128-2a0a-6aa3-c375-c7e8a7c48690
ID                   = e1b43128
Eval ID              = 249cbfe9
Name                 = demo.demo[0]
Node ID              = 221e998e
Job ID               = demo
Job Version          = 0
Client Status        = failed
Client Description   = <none>
Desired Status       = run
Desired Description  = <none>
Created              = 2m59s ago
Modified             = 39s ago

Task "test" is "dead"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
100 MHz  300 MiB  300 MiB  0     p1: 127.0.0.1:28422

Task Events:
Started At     = 2018-04-12T22:50:32Z
Finished At    = 2018-04-12T22:50:54Z
Total Restarts = 3
Last Restart   = 2018-04-12T17:50:15-05:00

Recent Events:
Time                       Type              Description
2018-04-12T17:50:54-05:00  Not Restarting    Exceeded allowed attempts 3 in interval 30m0s and mode is "fail"
2018-04-12T17:50:54-05:00  Killed            Task successfully killed
2018-04-12T17:50:54-05:00  Killing           Sent interrupt. Waiting 5s before force killing
2018-04-12T17:50:54-05:00  Restart Signaled  healthcheck: check "service: \"demo-service-test\" check" unhealthy
2018-04-12T17:50:32-05:00  Started           Task started by client
2018-04-12T17:50:15-05:00  Restarting        Task restarting in 16.887291122s
2018-04-12T17:50:15-05:00  Killed            Task successfully killed
2018-04-12T17:50:15-05:00  Killing           Sent interrupt. Waiting 5s before force killing
2018-04-12T17:50:15-05:00  Restart Signaled  healthcheck: check "service: \"demo-service-test\" check" unhealthy
2018-04-12T17:49:53-05:00  Started           Task started by client
```

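The `Not Restarting` event in the output comes from the task group's `restart` policy, not from `check_restart` itself. A `restart` stanza along these lines would be consistent with that event. The `attempts`, `interval`, and `mode` values are read from the event text. The `delay` value is an assumption, since the output only shows that one restart waited about 16.9 seconds.

```hcl
restart {
  attempts = 3      # from "Exceeded allowed attempts 3"
  interval = "30m"  # from "in interval 30m0s"
  delay    = "15s"  # assumed; not shown directly in the output
  mode     = "fail" # from: mode is "fail"
}
```
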
[check restart]: /docs/job-specification/check_restart.html "Nomad check restart Stanza"