---
layout: "guides"
page_title: "Restart Stanza - Operating a Job"
sidebar_current: "guides-operating-a-job-failure-handling-strategies-local-restarts"
description: |-
  Nomad can restart a task on the node it is running on to recover from
  failures. Restarts can be limited to a number of attempts within
  a specific interval.
---

# Restart Stanza

To restart a failed task on the node it is running on, add a
[`restart` stanza][restart] to the task group. Nomad will restart the failed task
up to `attempts` times within the configured `interval`. The `mode` parameter
controls what happens once that limit is reached: Nomad can either keep attempting
restarts on the same node, or fail the task so that it can be rescheduled on another node.

We recommend setting `mode` to `fail` in the restart stanza so that the task can be rescheduled on another node.
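
As a minimal sketch, a task group using these parameters might look like the following. The values for `attempts`, `interval`, and `mode` are inferred from the CLI output shown in the example on this page, and the roughly ten-second `delay` is an assumption based on the restart timings there. The task itself (driver and command) is a hypothetical placeholder, not the job that produced that output.

```hcl
job "demo" {
  datacenters = ["dc1"]
  type        = "service"

  group "demo" {
    count = 3

    restart {
      attempts = 3      # restart the task at most 3 times...
      interval = "5m"   # ...within a 5 minute window
      delay    = "10s"  # assumed base wait between restarts; Nomad adds jitter
      mode     = "fail" # once the limit is hit, fail the task so it can be rescheduled
    }

    # Hypothetical task for illustration only.
    task "demo" {
      driver = "exec"

      config {
        command = "/path/to/your/command" # placeholder path
      }

      resources {
        cpu    = 100 # MHz
        memory = 300 # MiB
      }
    }
  }
}
```

Setting `mode = "delay"` instead would keep the task on the same node and wait for the next `interval` before trying again, rather than failing it.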

## Example

The following CLI example shows the job status and allocation status for a failed task that Nomad is restarting.
Allocations stay in the `pending` state while restarts are attempted. The `Recent Events` section of the
allocation status output shows the ongoing restart attempts.

```text
$ nomad job status demo
ID            = demo
Name          = demo
Submit Date   = 2018-04-12T14:37:18-05:00
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
demo        0       3         0        0       0         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
ce5bf1d1  8a184f31  demo        0        run      pending  27s ago  5s ago
d5dee7c8  8a184f31  demo        0        run      pending  27s ago  5s ago
ed815997  8a184f31  demo        0        run      pending  27s ago  5s ago
```

In the following example, Nomad restarts the allocation `ce5bf1d1` approximately
every ten seconds, with a small random jitter. The allocation eventually reaches its limit of three
attempts and transitions into a `failed` state, after which it becomes eligible for [rescheduling][rescheduling].

```text
$ nomad alloc-status ce5bf1d1
ID                     = ce5bf1d1
Eval ID                = 64e45d11
Name                   = demo.demo[1]
Node ID                = a0ccdd8b
Job ID                 = demo
Job Version            = 0
Client Status          = failed
Client Description     = <none>
Desired Status         = run
Desired Description    = <none>
Created                = 56s ago
Modified               = 22s ago

Task "demo" is "dead"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
100 MHz  300 MiB  300 MiB  0

Task Events:
Started At     = 2018-04-12T22:29:08Z
Finished At    = 2018-04-12T22:29:08Z
Total Restarts = 3
Last Restart   = 2018-04-12T17:28:57-05:00

Recent Events:
Time                       Type            Description
2018-04-12T17:29:08-05:00  Not Restarting  Exceeded allowed attempts 3 in interval 5m0s and mode is "fail"
2018-04-12T17:29:08-05:00  Terminated      Exit Code: 127
2018-04-12T17:29:08-05:00  Started         Task started by client
2018-04-12T17:28:57-05:00  Restarting      Task restarting in 10.364602876s
2018-04-12T17:28:57-05:00  Terminated      Exit Code: 127
2018-04-12T17:28:57-05:00  Started         Task started by client
2018-04-12T17:28:47-05:00  Restarting      Task restarting in 10.666963769s
2018-04-12T17:28:47-05:00  Terminated      Exit Code: 127
2018-04-12T17:28:47-05:00  Started         Task started by client
2018-04-12T17:28:35-05:00  Restarting      Task restarting in 11.777324721s
```

[restart]: /docs/job-specification/restart.html "Nomad restart Stanza"
[rescheduling]: /docs/job-specification/reschedule.html "Nomad reschedule Stanza"