---
layout: "guides"
page_title: "Restart Stanza - Operating a Job"
sidebar_current: "guides-operating-a-job-failure-handling-strategies-local-restarts"
description: |-
  Nomad can restart a task on the node it is running on to recover from
  failures. Task restarts can be configured to be limited by number of attempts within
  a specific interval.
---

# Restart Stanza

To enable restarting a failed task on the node it is running on, the task group can be annotated
with configurable options using the [`restart` stanza][restart]. Nomad will restart the failed task
up to `attempts` times within a provided `interval`. Operators can also choose whether to
keep attempting restarts on the same node, or to fail the task so that it can be rescheduled
on another node, via the `mode` parameter.

We recommend setting `mode` to `fail` in the restart stanza to allow rescheduling the task on another node.


## Example
The following CLI example shows job status and allocation status for a failed task that is being restarted by Nomad.
Allocations are in the `pending` state while restarts are attempted. The `Recent Events` section in the CLI
shows ongoing restart attempts.

```text
$ nomad job status demo
ID            = demo
Name          = demo
Submit Date   = 2018-04-12T14:37:18-05:00
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
demo        0       3         0        0       0         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
ce5bf1d1  8a184f31  demo        0        run      pending  27s ago  5s ago
d5dee7c8  8a184f31  demo        0        run      pending  27s ago  5s ago
ed815997  8a184f31  demo        0        run      pending  27s ago  5s ago
```

In the following example, the allocation `ce5bf1d1` is restarted by Nomad approximately
every ten seconds, with a small random jitter. It eventually reaches its limit of three attempts and
transitions into a `failed` state, after which it becomes eligible for [rescheduling][rescheduling].

```text
$ nomad alloc-status ce5bf1d1
ID                  = ce5bf1d1
Eval ID             = 64e45d11
Name                = demo.demo[1]
Node ID             = a0ccdd8b
Job ID              = demo
Job Version         = 0
Client Status       = failed
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 56s ago
Modified            = 22s ago

Task "demo" is "dead"
Task Resources
CPU      Memory   Disk     IOPS  Addresses
100 MHz  300 MiB  300 MiB  0

Task Events:
Started At     = 2018-04-12T22:29:08Z
Finished At    = 2018-04-12T22:29:08Z
Total Restarts = 3
Last Restart   = 2018-04-12T17:28:57-05:00

Recent Events:
Time                       Type            Description
2018-04-12T17:29:08-05:00  Not Restarting  Exceeded allowed attempts 3 in interval 5m0s and mode is "fail"
2018-04-12T17:29:08-05:00  Terminated      Exit Code: 127
2018-04-12T17:29:08-05:00  Started         Task started by client
2018-04-12T17:28:57-05:00  Restarting      Task restarting in 10.364602876s
2018-04-12T17:28:57-05:00  Terminated      Exit Code: 127
2018-04-12T17:28:57-05:00  Started         Task started by client
2018-04-12T17:28:47-05:00  Restarting      Task restarting in 10.666963769s
2018-04-12T17:28:47-05:00  Terminated      Exit Code: 127
2018-04-12T17:28:47-05:00  Started         Task started by client
2018-04-12T17:28:35-05:00  Restarting      Task restarting in 11.777324721s
```


[restart]: /docs/job-specification/restart.html "Nomad restart Stanza"
[rescheduling]: /docs/job-specification/reschedule.html "Nomad reschedule Stanza"
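
For reference, a `restart` stanza along the following lines would be consistent with the events in the example above: three attempts within a five-minute interval, restarts spaced roughly ten seconds apart, and `fail` mode. This is a minimal sketch inferred from that output, not the job file that produced it. The `delay` value in particular is an assumption, the `group` name is taken from the task group shown in the status output, and the enclosing `job` block and task definition are omitted.

```hcl
group "demo" {
  restart {
    # Allow at most 3 restarts within any 5-minute window.
    attempts = 3
    interval = "5m"

    # Assumed base wait between restarts; Nomad adds a random jitter on top,
    # which would explain waits such as 10.36s and 11.78s in the events above.
    delay = "10s"

    # Once attempts are exhausted, fail the allocation so that it becomes
    # eligible for rescheduling on another node.
    mode = "fail"
  }
}
```

Setting `mode = "delay"` instead would have Nomad wait for the next interval and keep retrying on the same node rather than failing the allocation.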