github.com/anth0d/nomad@v0.0.0-20221214183521-ae3a0a2cad06/website/content/docs/job-specification/check_restart.mdx

---
layout: docs
page_title: check_restart Stanza - Job Specification
description: |-
  The "check_restart" stanza instructs Nomad when to restart tasks with
  unhealthy service checks.
---

# `check_restart` Stanza

<Placement
  groups={[
    ['job', 'group', 'task', 'service', 'check_restart'],
    ['job', 'group', 'task', 'service', 'check', 'check_restart'],
  ]}
/>

The `check_restart` stanza instructs Nomad when to restart tasks with unhealthy
service checks. When a health check in Nomad or Consul has been unhealthy for
the `limit` specified in a `check_restart` stanza, the task is restarted
according to the task group's [`restart` policy][restart_stanza]. The
`check_restart` settings apply to [`check`s][check_stanza], but may also be
placed on [`service`s][service_stanza] to apply to all checks on a service. If
`check_restart` is set on both the check and the service, the stanzas are merged
with the check values taking precedence.

```hcl
job "mysql" {
  group "mysqld" {

    restart {
      attempts = 3
      delay    = "10s"
      interval = "10m"
      mode     = "fail"
    }

    task "server" {
      service {
        tags = ["leader", "mysql"]

        port = "db"

        check {
          type     = "tcp"
          port     = "db"
          interval = "10s"
          timeout  = "2s"
        }

        check {
          type     = "script"
          name     = "check_table"
          command  = "/usr/local/bin/check_mysql_table_status"
          args     = ["--verbose"]
          interval = "60s"
          timeout  = "5s"

          check_restart {
            limit           = 3
            grace           = "90s"
            ignore_warnings = false
          }
        }
      }
    }
  }
}
```
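
As a sketch of the merge behavior described above, the fragment below sets
`check_restart` on the service and overrides one value on a single check. The
service name, port label, and health path are hypothetical, and the comments
assume that a check inherits any value it does not set itself.

```hcl
service {
  name = "web"  # hypothetical service name
  port = "http" # hypothetical port label

  # Applies to every check on this service.
  check_restart {
    limit = 5
    grace = "60s"
  }

  check {
    type     = "tcp"
    port     = "http"
    interval = "10s"
    timeout  = "2s"
    # No check_restart here, so the service-level values apply as-is.
  }

  check {
    type     = "http"
    path     = "/health" # hypothetical endpoint
    interval = "10s"
    timeout  = "2s"

    # Merged with the service-level stanza; the check value takes precedence,
    # so this check should trigger a restart after 2 consecutive failures
    # while keeping the 60s grace period (assumed to be inherited).
    check_restart {
      limit = 2
    }
  }
}
```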

- `limit` `(int: 0)` - Restart the task when a health check has failed `limit`
  times. For example, `1` causes a restart on the first failure. The default,
  `0`, disables health check based restarts. Failures must be consecutive. A
  single passing check resets the count, so flapping services may not be
  restarted.

- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
  before checking its health.

- `ignore_warnings` `(bool: false)` - By default, checks with either a `critical`
  or a `warning` status are considered unhealthy. Setting `ignore_warnings = true`
  treats a `warning` status like `passing`, so it will not trigger a restart. Only
  available with the Consul service provider.
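
To illustrate `ignore_warnings`, here is a minimal sketch of a check whose
`warning` results should not count toward `limit`. The check name and command
path are hypothetical, and the fragment assumes the service is registered with
Consul, since the option only applies there.

```hcl
check {
  type     = "script"
  name     = "check_replication_lag"                # hypothetical
  command  = "/usr/local/bin/check_replication_lag" # hypothetical
  interval = "30s"
  timeout  = "5s"

  check_restart {
    limit = 3
    grace = "90s"

    # With this set, only `critical` results count toward `limit`;
    # a `warning` result is treated like `passing`.
    ignore_warnings = true
  }
}
```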

## Example Behavior

The `mysql` example above behaves as follows:

```hcl
check_restart {
  # ...
  grace = "90s"
  # ...
}
```

When the `server` task first starts and is registered in Consul, its health
will not be checked for 90 seconds. This gives the server time to start up.

```hcl
check_restart {
  limit = 3
  # ...
}
```

After the grace period, if the script check fails, it has 180 seconds
(`60s interval * 3 limit`) to pass before a restart is triggered. Once a restart
is triggered, the task group's [`restart` policy][restart_stanza] takes control:

```hcl
restart {
  # ...
  delay    = "10s"
  # ...
}
```

The [`restart` stanza][restart_stanza] controls the restart behavior of the
task. In this case it will stop the task and then wait 10 seconds before
starting it again.

Once the task restarts, Nomad waits the `grace` period again before starting to
check the task's health.

```hcl
restart {
  attempts = 3
  # ...
  interval = "10m"
  mode     = "fail"
}
```

If the check continues to fail, the task will be restarted up to `attempts`
times within an `interval`. If the `restart` attempts are exhausted within the
`interval`, then the `mode` controls the behavior. In this case the task would
fail and not be restarted again. See the [`restart` stanza][restart_stanza] for
details.
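
For comparison, here is a sketch of the same `restart` stanza with
`mode = "delay"` instead of `"fail"`. It assumes the behavior described in the
`restart` stanza documentation, where Nomad waits for the next `interval`
instead of failing the task. The other values are taken from the example above.

```hcl
restart {
  attempts = 3
  delay    = "10s"
  interval = "10m"

  # Instead of failing the task once `attempts` is exhausted, wait until the
  # next `interval` begins and then allow restarts again.
  mode = "delay"
}
```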

[check_stanza]: /docs/job-specification/service#check-parameters 'check stanza'
[gh-9176]: https://github.com/hashicorp/nomad/issues/9176
[restart_stanza]: /docs/job-specification/restart 'restart stanza'
[service_stanza]: /docs/job-specification/service 'service stanza'