---
layout: "docs"
page_title: "check_restart Stanza - Job Specification"
sidebar_current: "docs-job-specification-check_restart"
description: |-
  The "check_restart" stanza instructs Nomad when to restart tasks with
  unhealthy service checks.
---

# `check_restart` Stanza

<table class="table table-bordered table-striped">
  <tr>
    <th width="120">Placement</th>
    <td>
      <code>job -> group -> task -> service -> **check_restart**</code>
    </td>
  </tr>
  <tr>
    <th width="120">Placement</th>
    <td>
      <code>job -> group -> task -> service -> check -> **check_restart**</code>
    </td>
  </tr>
</table>

As of Nomad 0.7, the `check_restart` stanza instructs Nomad when to restart
tasks with unhealthy service checks. When a health check in Consul has been
unhealthy for the `limit` specified in a `check_restart` stanza, the task is
restarted according to the task group's [`restart` policy][restart_stanza]. The
`check_restart` settings apply to [`check`s][check_stanza], but may also be
placed on [`service`s][service_stanza] to apply to all checks on a service.
If `check_restart` is set on both the check and the service, the stanzas are
merged, with the check values taking precedence.

```hcl
job "mysql" {
  group "mysqld" {

    restart {
      attempts = 3
      delay    = "10s"
      interval = "10m"
      mode     = "fail"
    }

    task "server" {
      service {
        tags = ["leader", "mysql"]

        port = "db"

        check {
          type     = "tcp"
          port     = "db"
          interval = "10s"
          timeout  = "2s"
        }

        check {
          type     = "script"
          name     = "check_table"
          command  = "/usr/local/bin/check_mysql_table_status"
          args     = ["--verbose"]
          interval = "60s"
          timeout  = "5s"

          check_restart {
            limit           = 3
            grace           = "90s"
            ignore_warnings = false
          }
        }
      }
    }
  }
}
```
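
The merge rule described above can be illustrated with a minimal sketch. It assumes a hypothetical `web` service with two checks (the service name, the `http` port label, and the `/health` path are invented for illustration). The service-level `check_restart` supplies defaults for every check, and the second check overrides only `limit`.

```hcl
service {
  name = "web"   # hypothetical service name
  port = "http"  # hypothetical port label

  # Service-level settings: applied to every check below.
  check_restart {
    limit = 3
    grace = "30s"
  }

  check {
    type     = "tcp"
    port     = "http"
    interval = "10s"
    timeout  = "2s"
    # No check_restart here, so the service-level values apply.
  }

  check {
    type     = "http"
    path     = "/health"  # hypothetical endpoint
    port     = "http"
    interval = "10s"
    timeout  = "2s"

    # Check-level values take precedence over the service-level ones.
    check_restart {
      limit = 5
    }
  }
}
```

Under the merge rule as documented, the second check would be expected to trigger a restart after 5 consecutive failures while still inheriting the 30 second grace period from the service.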

- `limit` `(int: 0)` - Restart the task when a health check has failed `limit`
  times. For example, `1` causes a restart on the first failure. The default,
  `0`, disables health check based restarts. Failures must be consecutive: a
  single passing check resets the count, so flapping services may not be
  restarted.

- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
  before checking its health.

- `ignore_warnings` `(bool: false)` - By default, checks with either a
  `critical` or a `warning` status are considered unhealthy. Setting
  `ignore_warnings = true` treats a `warning` status like `passing`, so it
  will not trigger a restart.
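
As a small illustration of the parameters above, the sketch below restarts the task on the first unhealthy result but tolerates `warning` statuses. The values are illustrative assumptions, not recommendations.

```hcl
check_restart {
  limit           = 1      # restart on the first failing result
  grace           = "15s"  # wait 15s after each start or restart before checking health
  ignore_warnings = true   # treat "warning" like "passing"
}
```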

## Example Behavior

The `mysql` example above behaves as follows:

```hcl
check_restart {
  # ...
  grace = "90s"
  # ...
}
```

When the `server` task first starts and is registered in Consul, its health
will not be checked for 90 seconds. This gives the server time to start up.

```hcl
check_restart {
  limit = 3
  # ...
}
```

After the grace period, if the script check starts failing, it has 180 seconds
(`60s interval * 3 limit`) to pass before a restart is triggered. Once a
restart is triggered, the task group's [`restart` policy][restart_stanza] takes
control:

```hcl
restart {
  # ...
  delay    = "10s"
  # ...
}
```

The [`restart` stanza][restart_stanza] controls the restart behavior of the
task. In this case, Nomad stops the task and then waits 10 seconds before
starting it again.

Once the task restarts, Nomad waits for the `grace` period again before it
resumes checking the task's health.

```hcl
restart {
  attempts = 3
  # ...
  interval = "10m"
  mode     = "fail"
}
```

If the check continues to fail, the task will be restarted up to `attempts`
times within an `interval`. If the `restart` attempts are exhausted within the
`interval`, then the `mode` controls the behavior. In this case the task would
fail and not be restarted again. See the [`restart` stanza][restart_stanza] for
details.

[check_stanza]: /docs/job-specification/service.html#check-parameters "check stanza"
[restart_stanza]: /docs/job-specification/restart.html "restart stanza"
[service_stanza]: /docs/job-specification/service.html "service stanza"
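
For comparison with the `mode = "fail"` policy in the example, here is a hedged variant. The `restart` stanza also accepts `mode = "delay"`, in which case Nomad is expected to wait until the next `interval` begins before restarting the task again, rather than failing it. The values below mirror the example and are assumptions for illustration; the [`restart` stanza][restart_stanza] documentation remains the authoritative description of the semantics.

```hcl
restart {
  attempts = 3
  delay    = "10s"
  interval = "10m"
  mode     = "delay"  # wait for the next interval instead of failing the task
}
```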