---
layout: "docs"
page_title: "Rolling Upgrades - Operating a Job"
sidebar_current: "docs-operating-a-job-updating-rolling-upgrades"
description: |-
  In order to update a service while reducing downtime, Nomad provides a
  built-in mechanism for rolling upgrades. Rolling upgrades incrementally
  transition jobs between versions, using health check information to
  reduce downtime.
---

# Rolling Upgrades

Nomad supports rolling updates as a first-class feature. To enable rolling
updates a job or task group is annotated with a high-level description of the
update strategy using the [`update` stanza][update]. Under the hood, Nomad
handles limiting parallelism, interfacing with Consul to determine service
health and even automatically reverting to an older, healthy job when a
deployment fails.

## Enabling Rolling Updates

Rolling updates are enabled by adding the [`update` stanza][update] to the job
specification. The `update` stanza may be placed at the job level or in an
individual task group. When placed at the job level, the update strategy is
inherited by all task groups in the job. When placed at both the job and group
level, the `update` stanzas are merged, with group stanzas taking precedence
over job level stanzas. See the [`update` stanza
documentation](/docs/job-specification/update.html#upgrade-stanza-inheritance)
for an example.

```hcl
job "geo-api-server" {
  # ...

  group "api-server" {
    count = 6

    # Add an update stanza to enable rolling updates of the service
    update {
      max_parallel     = 2
      min_healthy_time = "30s"
      healthy_deadline = "10m"
    }

    task "server" {
      driver = "docker"

      config {
        image = "geo-api-server:0.1"
      }

      # ...
    }
  }
}
```

In this example, by adding the simple `update` stanza to the "api-server" task
group, we inform Nomad that updates to the group should be handled with a
rolling update strategy.

Thus when a change is made to the job file that requires new allocations to be
made, Nomad will deploy 2 allocations at a time and require that the
allocations be running in a healthy state for 30 seconds before deploying more
allocations of the new group.

By default, Nomad determines allocation health by ensuring that all tasks in
the group are running and that any [service
checks](/docs/job-specification/service.html#check-parameters) the tasks
register are passing.
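For example, a service registration with an HTTP check similar to the
following sketch would be consulted when deciding whether a new allocation is
healthy. The service name, port label, and check path here are illustrative
and assume the group exposes an `http` port:

```hcl
task "server" {
  driver = "docker"

  config {
    image = "geo-api-server:0.1"
  }

  # Illustrative service registration. Nomad waits for this check to pass
  # (and stay passing for min_healthy_time) before counting the allocation
  # as healthy during a deployment.
  service {
    name = "geo-api-server"
    port = "http"

    check {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }
}
```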
## Planning Changes

Suppose we make a change to a file to upgrade the version of a Docker container
that is configured with the same rolling update strategy from above.

```diff
@@ -2,6 +2,8 @@ job "geo-api-server" {
   group "api-server" {
     task "server" {
       driver = "docker"

       config {
-        image = "geo-api-server:0.1"
+        image = "geo-api-server:0.2"
```

The [`nomad plan` command](/docs/commands/plan.html) allows us to visualize the
series of steps the scheduler would perform. We can analyze this output to
confirm it is correct:

```text
$ nomad plan geo-api-server.nomad
```

Here is some sample output:

```text
+/- Job: "geo-api-server"
+/- Task Group: "api-server" (2 create/destroy update, 4 ignore)
  +/- Task: "server" (forces create/destroy update)
    +/- Config {
      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 geo-api-server.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will begin a rolling update by creating and
destroying 2 allocations first and, for the time being, ignoring 4 of the old
allocations, matching our configured `max_parallel`.

## Inspecting a Deployment

After running the plan we can submit the updated job by simply running `nomad
run`. Once run, Nomad will begin the rolling upgrade of our service by placing
2 allocations at a time of the new job and taking two of the old allocations
down.

We can inspect the current state of a rolling deployment using `nomad status`:

```text
$ nomad status geo-api-server
ID            = geo-api-server
Name          = geo-api-server
Submit Date   = 07/26/17 18:08:56 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api-server  0       0         6       0       4         0

Latest Deployment
ID          = c5b34665
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
api-server  6        4       2        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
9fc96fcc  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
2521c47a  f7b1ee08  api-server  0        run      running   07/26/17 18:04:30 UTC
6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
```

Here we can see that Nomad has created a deployment to conduct the rolling
upgrade from job version 0 to 1, has placed 4 instances of the new job, and has
stopped 4 of the old instances. If we look at the deployed allocations, we also
can see that Nomad has placed 4 instances of job version 1 but only considers 2
of them healthy. This is because the 2 newest placed allocations haven't been
healthy for the required 30 seconds yet.
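The deployment itself can also be inspected directly while it is in progress.
Passing the deployment ID reported above to the `deployment status` command
shows just that deployment's placement and health information:

```text
$ nomad deployment status c5b34665
```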
If we wait for the deployment to complete and re-issue the command, we get the
following:

```text
$ nomad status geo-api-server
ID            = geo-api-server
Name          = geo-api-server
Submit Date   = 07/26/17 18:08:56 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api-server  0       0         6       0       6         0

Latest Deployment
ID          = c5b34665
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy
api-server  6        6       6        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
d42a1656  f7b1ee08  api-server  1        run      running   07/26/17 18:10:10 UTC
401daaf9  f7b1ee08  api-server  1        run      running   07/26/17 18:10:00 UTC
14d288e8  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a134f73c  f7b1ee08  api-server  1        run      running   07/26/17 18:09:17 UTC
a2574bb6  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
496e7aa2  f7b1ee08  api-server  1        run      running   07/26/17 18:08:56 UTC
9fc96fcc  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
2521c47a  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
6b794fcb  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
9bc11bd7  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
691eea24  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
af115865  f7b1ee08  api-server  0        stop     complete  07/26/17 18:04:30 UTC
```

Nomad has successfully transitioned the group to running the updated job and
did so with no downtime to our service by ensuring only two allocations were
changed at a time and that the newly placed allocations ran successfully. Had
any of the newly placed allocations failed their health check, Nomad would have
aborted the deployment and stopped placing new allocations. If configured,
Nomad can automatically revert back to the old job definition when the
deployment fails.

## Auto Reverting on Failed Deployments

If we do a deployment in which the new allocations are unhealthy, Nomad will
fail the deployment and stop placing new instances of the job. It optionally
supports automatically reverting back to the last stable job version on
deployment failure. Nomad keeps a history of submitted jobs and whether the job
version was stable. A job is considered stable if all its allocations are
healthy.
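The recorded versions and their stability can be listed at any time with the
`job history` command (the same command is used with the `-p` flag below to
show the diffs between versions):

```text
$ nomad job history geo-api-server
```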
To enable automatic reverts, we simply add the `auto_revert` parameter to the
`update` stanza:

```hcl
update {
  max_parallel     = 2
  min_healthy_time = "30s"
  healthy_deadline = "10m"

  # Enable automatically reverting to the last stable job on a failed
  # deployment.
  auto_revert = true
}
```

Now imagine we want to update our image to "geo-api-server:0.3" but we instead
update it to the following and run the job:

```diff
@@ -2,6 +2,8 @@ job "geo-api-server" {
   group "api-server" {
     task "server" {
       driver = "docker"

       config {
-        image = "geo-api-server:0.2"
+        image = "geo-api-server:0.33"
```

If we run `nomad job deployments` we can see that the deployment fails and
Nomad auto-reverts to the last stable job:

```text
$ nomad job deployments geo-api-server
ID        Job ID          Job Version  Status      Description
0c6f87a5  geo-api-server  3            successful  Deployment completed successfully
b1712b7f  geo-api-server  2            failed      Failed due to unhealthy allocations - rolling back to job version 1
3eee83ce  geo-api-server  1            successful  Deployment completed successfully
72813fcf  geo-api-server  0            successful  Deployment completed successfully
```

Nomad job versions increment monotonically, so even though Nomad reverted to
the job specification at version 1, it creates a new job version. We can see
the differences between a job's versions and how Nomad auto-reverted the job
using the `job history` command:

```text
$ nomad job history -p geo-api-server
Version     = 3
Stable      = true
Submit Date = 07/26/17 18:44:18 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.33" => "geo-api-server:0.2"
        }

Version     = 2
Stable      = false
Submit Date = 07/26/17 18:45:21 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.2" => "geo-api-server:0.33"
        }

Version     = 1
Stable      = true
Submit Date = 07/26/17 18:44:18 UTC
Diff        =
+/- Job: "geo-api-server"
+/- Task Group: "api-server"
  +/- Task: "server"
    +/- Config {
      +/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
        }

Version     = 0
Stable      = true
Submit Date = 07/26/17 18:43:43 UTC
```

We can see that Nomad considered the job running "geo-api-server:0.1" and
"geo-api-server:0.2" as stable, but job version 2, which submitted the
incorrect image, is marked as unstable. This is because the placed allocations
failed to start. Nomad detected that the deployment failed and, as such,
created job version 3 that reverted back to the last healthy job.

[update]: /docs/job-specification/update.html "Nomad update Stanza"
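Automatic reverts are optional; a rollback can also be performed by hand with
the `job revert` command, which takes the job and the version to return to. As
a sketch, reverting our example back to version 1 would look like:

```text
$ nomad job revert geo-api-server 1
```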