---
layout: "guides"
page_title: "Rolling Upgrades - Operating a Job"
sidebar_current: "guides-operating-a-job-updating-rolling-upgrades"
description: |-
  In order to update a service while reducing downtime, Nomad provides a
  built-in mechanism for rolling upgrades. Rolling upgrades incrementally
  transition jobs between versions, using health check information to
  reduce downtime.
---

# Rolling Upgrades

Nomad supports rolling updates as a first-class feature. To enable rolling
updates, a job or task group is annotated with a high-level description of the
update strategy using the [`update` stanza][update]. Under the hood, Nomad
handles limiting parallelism, interfacing with Consul to determine service
health, and even automatically reverting to an older, healthy job when a
deployment fails.

## Enabling Rolling Updates

Rolling updates are enabled by adding the [`update` stanza][update] to the job
specification. The `update` stanza may be placed at the job level or in an
individual task group. When placed at the job level, the update strategy is
inherited by all task groups in the job. When placed at both the job and group
level, the `update` stanzas are merged, with group stanzas taking precedence
over job-level stanzas. See the [`update` stanza
documentation](/docs/job-specification/update.html#upgrade-stanza-inheritance)
for an example.
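To illustrate that merging, here is a minimal sketch (the job and group names
are hypothetical): the job-level stanza supplies defaults, and the second
group overrides only the key it specifies:

```hcl
job "example" {
  # Job-level defaults, inherited by every task group below.
  update {
    max_parallel     = 2
    healthy_deadline = "5m"
  }

  group "first" {
    # No update stanza here, so this group inherits
    # max_parallel = 2 and healthy_deadline = "5m" unchanged.
    # ...
  }

  group "second" {
    # Merged with the job-level stanza, group values winning: the
    # effective strategy is max_parallel = 1, healthy_deadline = "5m".
    update {
      max_parallel = 1
    }
    # ...
  }
}
```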
Here is our example service with rolling updates enabled on its task group:

```hcl
job "geo-api-server" {
  # ...

  group "api-server" {
    count = 6

    # Add an update stanza to enable rolling updates of the service
    update {
      max_parallel     = 2
      min_healthy_time = "30s"
      healthy_deadline = "10m"
    }

    task "server" {
      driver = "docker"

      config {
        image = "geo-api-server:0.1"
      }

      # ...
    }
  }
}
```

In this example, by adding the simple `update` stanza to the "api-server" task
group, we inform Nomad that updates to the group should be handled with a
rolling update strategy.

Thus when a change is made to the job file that requires new allocations to be
made, Nomad will deploy 2 allocations at a time and require that the
allocations be running in a healthy state for 30 seconds before deploying more
instances of the new group.

By default, Nomad determines allocation health by ensuring that all tasks in
the group are running and that any [service
checks](/docs/job-specification/service.html#check-parameters) the tasks
register are passing.
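For example, the "server" task could register a service with an HTTP check
that Nomad consults when judging allocation health during a deployment. This
is a minimal sketch, not part of the job above: the "http" port label, the
port mapping, and the `/health` endpoint are all assumptions about the image:

```hcl
task "server" {
  driver = "docker"

  config {
    image = "geo-api-server:0.1"

    # Assumes the image serves HTTP on container port 8080.
    port_map {
      http = 8080
    }
  }

  resources {
    network {
      mbits = 10
      port "http" {}
    }
  }

  service {
    name = "geo-api-server"
    port = "http"

    # During a deployment, the allocation only counts as healthy once
    # checks like this one pass and stay passing for min_healthy_time.
    check {
      type     = "http"
      path     = "/health" # hypothetical health endpoint
      interval = "10s"
      timeout  = "2s"
    }
  }
}
```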
## Planning Changes

Suppose we make a change to the job file to upgrade the version of the Docker
container, keeping the same rolling update strategy from above.

```diff
@@ -2,6 +2,8 @@ job "geo-api-server" {
   group "api-server" {
     task "server" {
       driver = "docker"

       config {
-        image = "geo-api-server:0.1"
+        image = "geo-api-server:0.2"
```

The [`nomad job plan` command](/docs/commands/job/plan.html) allows us to
visualize the series of steps the scheduler would perform. We can analyze this
output to confirm it is correct:

```text
$ nomad job plan geo-api-server.nomad
+/- Job: "geo-api-server"
+/- Task Group: "api-server" (2 create/destroy update, 4 ignore)
  +/- Task: "server" (forces create/destroy update)
    +/- Config {
      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad job run -check-index 7 geo-api-server.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will begin the rolling update by creating and
destroying 2 allocations first while, for the time being, ignoring 4 of the
old allocations, matching our configured `max_parallel`.

## Inspecting a Deployment

After running the plan, we can submit the updated job with `nomad job run`.
Once run, Nomad will begin the rolling upgrade of our service by placing 2
allocations of the new job at a time and taking 2 of the old allocations down.
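To guard against racing another operator's changes, the job can be submitted
with the modify index reported by the plan above (the value will differ in
your environment):

```text
$ nomad job run -check-index 7 geo-api-server.nomad
```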
We can inspect the current state of a rolling deployment using `nomad status`:

```text
$ nomad status geo-api-server
ID            = geo-api-server
Name          = geo-api-server
Submit Date   = 07/26/17 18:08:56 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api-server  0       0         6       0       4         0

Latest Deployment
ID          = c5b34665
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
api-server  6        4       2        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
9fc96fcc  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
2521c47a  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
```

Here we can see that Nomad has created a deployment to conduct the rolling
upgrade from job version 0 to 1, placing 4 instances of the new job and
stopping 4 of the old instances. Looking at the deployed allocations, we can
also see that Nomad has placed 4 instances of job version 1 but considers only
2 of them healthy. This is because the 2 most recently placed allocations
haven't been healthy for the required 30 seconds yet.

If we wait for the deployment to complete and re-issue the command, we get the
following:

```text
$ nomad status geo-api-server
ID            = geo-api-server
Name          = geo-api-server
Submit Date   = 07/26/17 18:08:56 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api-server  0       0         6       0       6         0

Latest Deployment
ID          = c5b34665
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
api-server  6        6       6        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
d42a1656  f7b1ee08  api-server  1        run      running   07/26/17 18:10:10 UTC
401daaf9  f7b1ee08  api-server  1        run      running   07/26/17 18:10:00 UTC
14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
9fc96fcc  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
2521c47a  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
```

Nomad has successfully transitioned the group to running the updated job, and
did so with no downtime to our service, by ensuring only two allocations were
changed at a time and that the newly placed allocations ran successfully. Had
any of the newly placed allocations failed their health checks, Nomad would
have aborted the deployment and stopped placing new allocations. If
configured, Nomad can automatically revert to the old job definition when the
deployment fails.

## Auto Reverting on Failed Deployments

In the case where we do a deployment in which the new allocations are
unhealthy, Nomad will fail the deployment and stop placing new instances of
the job. It optionally supports automatically reverting to the last stable job
version on deployment failure. Nomad keeps a history of submitted jobs and
whether each job version was stable. A job version is considered stable if all
its allocations are healthy.

To enable this, we simply add the `auto_revert` parameter to the `update`
stanza:

```hcl
update {
  max_parallel     = 2
  min_healthy_time = "30s"
  healthy_deadline = "10m"

  # Enable automatically reverting to the last stable job on a failed
  # deployment.
  auto_revert = true
}
```

Now imagine we want to update our image to "geo-api-server:0.3" but we instead
mistype it as the following and run the job:

```diff
@@ -2,6 +2,8 @@ job "geo-api-server" {
   group "api-server" {
     task "server" {
       driver = "docker"

       config {
-        image = "geo-api-server:0.2"
+        image = "geo-api-server:0.33"
```

If we run `nomad job deployments`, we can see that the deployment fails and
Nomad auto-reverts to the last stable job:

```text
$ nomad job deployments geo-api-server
ID        Job ID          Job Version  Status      Description
0c6f87a5  geo-api-server  3            successful  Deployment completed successfully
b1712b7f  geo-api-server  2            failed      Failed due to unhealthy allocations - rolling back to job version 1
3eee83ce  geo-api-server  1            successful  Deployment completed successfully
72813fcf  geo-api-server  0            successful  Deployment completed successfully
```

Nomad job versions increment monotonically, so even though Nomad reverted to
the job specification at version 1, it did so by creating a new job version.
We can see the differences between a job's versions and how Nomad
auto-reverted the job using the `job history` command:

```text
$ nomad job history -p geo-api-server
Version     = 3
Stable      = true
Submit Date = 07/26/17 18:44:18 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.33" => "geo-api-server:0.2"
        }

Version     = 2
Stable      = false
Submit Date = 07/26/17 18:45:21 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.2" => "geo-api-server:0.33"
        }

Version     = 1
Stable      = true
Submit Date = 07/26/17 18:44:18 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
        }

Version     = 0
Stable      = true
Submit Date = 07/26/17 18:43:43 UTC
```

We can see that Nomad considered the job versions running "geo-api-server:0.1"
and "geo-api-server:0.2" stable, but job version 2, which submitted the
incorrect image, is marked as unstable. This is because the placed allocations
failed to start. Nomad detected that the deployment failed and, as such,
created job version 3, which reverted back to the last healthy job.

[update]: /docs/job-specification/update.html "Nomad update Stanza"
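Auto-revert is driven by the same version history shown above. When
`auto_revert` is not set, one way to recover manually is the `nomad job
revert` command, giving it a stable version number taken from the history. As
a sketch, using the output above:

```text
$ nomad job revert geo-api-server 1
```

As with auto-revert, this submits the old specification as a new, higher job
version rather than rewriting history.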