github.com/smintz/nomad@v0.8.3/website/source/docs/job-specification/check_restart.html.md

---
layout: "docs"
page_title: "check_restart Stanza - Job Specification"
sidebar_current: "docs-job-specification-check_restart"
description: |-
  The "check_restart" stanza instructs Nomad when to restart tasks with
  unhealthy service checks.
---

# `check_restart` Stanza

<table class="table table-bordered table-striped">
  <tr>
    <th width="120">Placement</th>
    <td>
      <code>job -> group -> task -> service -> **check_restart**</code>
      <br>
      <code>job -> group -> task -> service -> check -> **check_restart**</code>
    </td>
  </tr>
</table>

As of Nomad 0.7, the `check_restart` stanza instructs Nomad when to restart
tasks with unhealthy service checks. When a health check in Consul has been
unhealthy for the `limit` specified in a `check_restart` stanza, the task is
restarted according to the task group's [`restart` policy][restart_stanza]. The
`check_restart` settings apply to [`check`s][check_stanza], but may also be
placed on [`service`s][service_stanza] to apply to all checks on a service.
If `check_restart` is set on both the check and the service, the stanzas are
merged, with the check values taking precedence.

```hcl
job "mysql" {
  group "mysqld" {

    restart {
      attempts = 3
      delay    = "10s"
      interval = "10m"
      mode     = "fail"
    }

    task "server" {
      service {
        tags = ["leader", "mysql"]

        port = "db"

        check {
          type     = "tcp"
          port     = "db"
          interval = "10s"
          timeout  = "2s"
        }

        check {
          type     = "script"
          name     = "check_table"
          command  = "/usr/local/bin/check_mysql_table_status"
          args     = ["--verbose"]
          interval = "60s"
          timeout  = "5s"

          check_restart {
            limit           = 3
            grace           = "90s"
            ignore_warnings = false
          }
        }
      }
    }
  }
}
```
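
The merge behavior for a `check_restart` set on both the service and a check
can be sketched as follows. This fragment is illustrative only: the `web`
service name, the `http` port label, and the `/health` path are hypothetical,
and it assumes that any field left unset at the check level falls back to the
service-level value.

```hcl
service {
  name = "web"  # hypothetical service name
  port = "http" # hypothetical port label

  # Service-level stanza: applies to every check on this service.
  check_restart {
    limit = 3
    grace = "30s"
  }

  check {
    type     = "tcp"
    port     = "http"
    interval = "10s"
    timeout  = "2s"

    # No check_restart here, so the service-level values are used.
  }

  check {
    type     = "http"
    path     = "/health" # hypothetical endpoint
    interval = "10s"
    timeout  = "2s"

    # Check-level value takes precedence for limit. Under the assumption
    # stated above, grace is still taken from the service-level stanza.
    check_restart {
      limit = 5
    }
  }
}
```

Placing the shared values on the service and overriding only what differs on
an individual check keeps the per-check stanzas short.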

- `limit` `(int: 0)` - Restart the task when a health check has failed `limit`
  times. For example, `1` causes a restart on the first failure. The default,
  `0`, disables health check based restarts. Failures must be consecutive. A
  single passing check will reset the count, so flapping services may not be
  restarted.

- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
  before checking its health.

- `ignore_warnings` `(bool: false)` - By default, checks with either a
  `critical` or a `warning` status are considered unhealthy. Setting
  `ignore_warnings = true` treats a `warning` status like `passing`, so it
  will not trigger a restart.
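
As a sketch of how these parameters combine, the fragment below reuses the
`check_table` script check from the example job but changes the
`check_restart` values. The values are illustrative rather than recommended.
It relies on Consul's script check convention, where exit code `0` is passing,
`1` is warning, and any other code is critical.

```hcl
check {
  type     = "script"
  name     = "check_table"
  command  = "/usr/local/bin/check_mysql_table_status"
  interval = "60s"
  timeout  = "5s"

  check_restart {
    limit           = 1     # restart on the first critical result
    grace           = "90s" # no health checking for 90s after (re)start
    ignore_warnings = true  # a warning result is treated like passing
  }
}
```

With these values a script exiting with code `1` never triggers a restart,
while a single critical result after the grace period does.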

## Example Behavior

The `mysql` example above behaves as follows:

```hcl
check_restart {
  # ...
  grace = "90s"
  # ...
}
```

When the `server` task first starts and is registered in Consul, its health
will not be checked for 90 seconds. This gives the server time to start up.

```hcl
check_restart {
  limit = 3
  # ...
}
```

After the grace period, if the script check fails, it has 180 seconds (`60s
interval * 3 limit`) to pass before a restart is triggered. Once a restart is
triggered, the task group's [`restart` policy][restart_stanza] takes control:

```hcl
restart {
  # ...
  delay    = "10s"
  # ...
}
```

The [`restart` stanza][restart_stanza] controls the restart behavior of the
task. In this case it will stop the task and then wait 10 seconds before
starting it again.

Once the task restarts, Nomad waits the `grace` period again before starting to
check the task's health.

```hcl
restart {
  attempts = 3
  # ...
  interval = "10m"
  mode     = "fail"
}
```

If the check continues to fail, the task will be restarted up to `attempts`
times within an `interval`. If the `restart` attempts are reached within the
`interval`, then the `mode` controls the behavior. In this case the task would
fail and not be restarted again. See the [`restart` stanza][restart_stanza] for
details.
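
For contrast, here is a sketch of the same policy with `mode = "delay"`. This
variant is not part of the example job. It assumes the `delay` mode described
in the [`restart` stanza][restart_stanza] documentation, where Nomad waits for
the next `interval` before restarting the task again instead of failing it.

```hcl
restart {
  attempts = 3
  delay    = "10s"
  interval = "10m"

  # Assumption: with "delay", once 3 restarts have been used within 10
  # minutes, the next restart waits for the following interval rather than
  # marking the task as failed.
  mode = "delay"
}
```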

[check_stanza]: /docs/job-specification/service.html#check-parameters "check stanza"
[restart_stanza]: /docs/job-specification/restart.html "restart stanza"
[service_stanza]: /docs/job-specification/service.html "service stanza"