github.com/iqoqo/nomad@v0.11.3-0.20200911112621-d7021c74d101/website/pages/docs/job-specification/check_restart.mdx

---
layout: docs
page_title: check_restart Stanza - Job Specification
sidebar_title: check_restart
description: |-
  The "check_restart" stanza instructs Nomad when to restart tasks with
  unhealthy service checks.
---

# `check_restart` Stanza

<Placement
  groups={[
    ['job', 'group', 'task', 'service', 'check_restart'],
    ['job', 'group', 'task', 'service', 'check', 'check_restart']
  ]}
/>

As of Nomad 0.7, the `check_restart` stanza instructs Nomad when to restart
tasks with unhealthy service checks. When a health check in Consul has been
unhealthy for the `limit` specified in a `check_restart` stanza, the task is
restarted according to the task group's [`restart` policy][restart_stanza]. The
`check_restart` settings apply to [`check`s][check_stanza], but may also be
placed on [`service`s][service_stanza] to apply to all checks on a service.
If `check_restart` is set on both the check and service, the stanzas are
merged with the check values taking precedence.

```hcl
job "mysql" {
  group "mysqld" {

    restart {
      attempts = 3
      delay    = "10s"
      interval = "10m"
      mode     = "fail"
    }

    task "server" {
      service {
        tags = ["leader", "mysql"]

        port = "db"

        check {
          type     = "tcp"
          port     = "db"
          interval = "10s"
          timeout  = "2s"
        }

        check {
          type     = "script"
          name     = "check_table"
          command  = "/usr/local/bin/check_mysql_table_status"
          args     = ["--verbose"]
          interval = "60s"
          timeout  = "5s"

          check_restart {
            limit           = 3
            grace           = "90s"
            ignore_warnings = false
          }
        }
      }
    }
  }
}
```
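
As a sketch of the merge behavior described above, `check_restart` can be set
once on the `service` and overridden on a single `check`. The service name and
the `limit` values here are illustrative assumptions, and the comments assume
the merge is applied per parameter, which this page does not spell out:

```hcl
service {
  name = "mysql" # hypothetical service name
  port = "db"

  # Service-level settings apply to every check on this service.
  check_restart {
    limit = 3
    grace = "90s"
  }

  # No check_restart of its own, so it uses the service-level
  # values (limit = 3, grace = "90s").
  check {
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }

  # Check-level values take precedence. Assuming a per-parameter merge,
  # this check uses limit = 5 and still inherits grace = "90s".
  check {
    type     = "script"
    name     = "check_table"
    command  = "/usr/local/bin/check_mysql_table_status"
    args     = ["--verbose"]
    interval = "60s"
    timeout  = "5s"

    check_restart {
      limit = 5
    }
  }
}
```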

- `limit` `(int: 0)` - Restart the task when a health check has failed `limit`
  times. For example, `1` causes a restart on the first failure. The default,
  `0`, disables restarts based on health checks. Failures must be consecutive.
  A single passing check resets the count, so flapping services may not be
  restarted.

- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
  before checking its health.

- `ignore_warnings` `(bool: false)` - By default, checks with both `critical`
  and `warning` statuses are considered unhealthy. Setting
  `ignore_warnings = true` treats a `warning` status like `passing`, so it
  will not trigger a restart.
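
For instance, a service that reports a `warning` status while it is degraded
but still usable might opt out of warning-triggered restarts. The following is
a minimal sketch; the HTTP check, its name, its `/health` path, and the chosen
values are assumptions for illustration and are not part of the `mysql`
example:

```hcl
check {
  type     = "http"
  name     = "app_health" # hypothetical check name
  path     = "/health"    # hypothetical health endpoint
  interval = "10s"
  timeout  = "2s"

  check_restart {
    limit           = 2     # restart after 2 consecutive failures
    grace           = "30s" # skip health checks for 30s after a (re)start
    ignore_warnings = true  # only `critical` counts as a failure
  }
}
```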

## Example Behavior

The `mysql` example above would have the following behavior:

```hcl
check_restart {
  # ...
  grace = "90s"
  # ...
}
```

When the `server` task first starts and is registered in Consul, its health
will not be checked for 90 seconds. This gives the server time to start up.

```hcl
check_restart {
  limit = 3
  # ...
}
```

After the grace period, if the script check fails, it has 180 seconds
(`60s interval * 3 limit`) to pass before a restart is triggered. Once a
restart is triggered, the task group's [`restart` policy][restart_stanza]
takes control:

```hcl
restart {
  # ...
  delay    = "10s"
  # ...
}
```

The [`restart` stanza][restart_stanza] controls the restart behavior of the
task. In this case, it will stop the task and then wait 10 seconds before
starting it again.

Once the task restarts, Nomad waits the `grace` period again before starting to
check the task's health.

```hcl
restart {
  attempts = 3
  # ...
  interval = "10m"
  mode     = "fail"
}
```

If the check continues to fail, the task will be restarted up to `attempts`
times within an `interval`. If the `restart` attempts are exhausted within the
`interval`, then the `mode` controls the behavior. In this case, the task would
fail and not be restarted again. See the [`restart` stanza][restart_stanza] for
details.
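
For comparison, here is a sketch of the alternative `mode`. It assumes the
`delay` mode as described on the [`restart` stanza][restart_stanza] page; the
other values are carried over from the `mysql` example:

```hcl
restart {
  attempts = 3
  delay    = "10s"
  interval = "10m"

  # Assumption: with "delay", once `attempts` is exhausted Nomad waits
  # for the next `interval` before restarting the task again, instead
  # of marking it failed.
  mode = "delay"
}
```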

[check_stanza]: /docs/job-specification/service#check-parameters 'check stanza'
[restart_stanza]: /docs/job-specification/restart 'restart stanza'
[service_stanza]: /docs/job-specification/service 'service stanza'