---
layout: "guides"
page_title: "Blue/Green & Canary Deployments - Operating a Job"
sidebar_current: "guides-operating-a-job-updating-blue-green-deployments"
description: |-
  Nomad has built-in support for doing blue/green and canary deployments to more
  safely update existing applications and services.
---

# Blue/Green & Canary Deployments

Sometimes [rolling
upgrades](/guides/operating-a-job/update-strategies/rolling-upgrades.html) do not
offer the required flexibility for updating an application in production. Often
organizations prefer to put a "canary" build into production or utilize a
technique known as a "blue/green" deployment to ensure a safe application
rollout to production while minimizing downtime.

## Blue/Green Deployments

Blue/Green deployments have several other names, including Red/Black and A/B,
but the concept is generally the same. In a blue/green deployment, there are
two application versions. Only one application version is active at a time,
except during the transition phase from one version to the next. The term
"active" tends to mean "receiving traffic" or "in service".

Imagine a hypothetical API server with five instances deployed to production
at version 1.3 that we want to safely upgrade to version 1.4. We want to create
five new instances running version 1.4 and, once they are operating correctly,
promote them and take down the five instances running 1.3. In the event of
failure, we can quickly roll back to 1.3.

To start, we examine our job which is running in production:

```hcl
job "docs" {
  # ...

  group "api" {
    count = 5

    update {
      max_parallel     = 1
      canary           = 5
      min_healthy_time = "30s"
      healthy_deadline = "10m"
      auto_revert      = true
    }

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }
}
```

Note that the job's `update` stanza sets `canary` equal to the desired count.
This is what allows us to easily model blue/green deployments. When we change
the job to run the "api-server:1.4" image, Nomad will create 5 new allocations
without touching the original "api-server:1.3" allocations. Below we can see
how this works by changing the image to run the new version:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api" {
     task "api-server" {
       config {
-        image = "api-server:1.3"
+        image = "api-server:1.4"
```

Next we plan and run these changes:

```text
$ nomad job plan docs.nomad
+/- Job: "docs"
+/- Task Group: "api" (5 canary, 5 ignore)
  +/- Task: "api-server" (forces create/destroy update)
    +/- Config {
      +/- image: "api-server:1.3" => "api-server:1.4"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad job run -check-index 7 docs.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

$ nomad job run docs.nomad
# ...
```
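While the change is in flight, the deployment can also be monitored directly
rather than through the job. A minimal sketch using the deployment commands
that appear later in this guide, assuming the new deployment's ID is
`32a080c1` as shown in the status output below:

```text
# List this job's deployments to find the ID of the latest one
$ nomad job deployments docs

# Inspect the deployment itself; -verbose prints full UUIDs
$ nomad deployment status -verbose 32a080c1
```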
We can see from the plan output that Nomad is going to create 5 canaries
running the "api-server:1.4" image and ignore all the allocations running the
older image. Now if we examine the status of the job, we can see that both the
blue ("api-server:1.3") and green ("api-server:1.4") sets are running.

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         10       0       0         0

Latest Deployment
ID          = 32a080c1
Status      = running
Description = Deployment is running but requires promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         false     5        5         5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
6d8eec42  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
7051480e  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
36c6610f  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
410ba474  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
```

Now that we have the new set in production, we can route traffic to it and
validate that the new job version is working properly. Depending on whether the
new version is functioning properly, we will then either promote or fail the
deployment.

### Promoting the Deployment

After deploying the new image alongside the old version, we have determined it
is functioning properly and we want to transition fully to the new version.
Doing so is as simple as promoting the deployment:

```text
$ nomad deployment promote 32a080c1
==> Monitoring evaluation "61ac2be5"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "32a080c1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "61ac2be5" finished with status "complete"
```
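Promotion does not have to be all-or-nothing. For jobs with more than one task
group, the canaries of a single group can be promoted on their own; a brief
sketch, assuming a hypothetical multi-group job and the `-group` flag of
`nomad deployment promote`:

```text
# Promote only the "api" group's canaries, leaving any other
# groups' canaries awaiting their own promotion
$ nomad deployment promote -group=api 32a080c1
```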
If we look at the job's status, we see that after promotion, Nomad stopped the
older allocations and is running only the new version. This completes our
blue/green deployment.

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       5         0

Latest Deployment
ID          = 32a080c1
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         true      5        5         5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
6d8eec42  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
7051480e  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
36c6610f  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
410ba474  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
85662a7a  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
```

### Failing the Deployment

After deploying the new image alongside the old version, we have determined it
is not functioning properly and we want to roll back to the old version. Doing
so is as simple as failing the deployment:

```text
$ nomad deployment fail 32a080c1
Deployment "32a080c1-de5a-a4e7-0218-521d8344c328" failed. Auto-reverted to job version 0.

==> Monitoring evaluation "6840f512"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "32a080c1"
    Allocation "0ccb732f" modified: node "36e7a123", group "api"
    Allocation "64d4f282" modified: node "36e7a123", group "api"
    Allocation "664e33c7" modified: node "36e7a123", group "api"
    Allocation "a4cb6a4b" modified: node "36e7a123", group "api"
    Allocation "fdd73bdd" modified: node "36e7a123", group "api"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6840f512" finished with status "complete"
```
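Because this job sets `auto_revert = true` in its `update` stanza, failing the
deployment automatically rolls the job back to the last stable version. If
auto-revert were disabled, the same rollback could be performed by hand; a
minimal sketch:

```text
# Manually roll the job back to a known-good prior version
$ nomad job revert docs 0
```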
If we now look at the job's status, we can see that after failing the
deployment, Nomad stopped the new allocations and is running only the old
ones. It has also reverted the working copy of the job back to the original
specification running "api-server:1.3".

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       5         0

Latest Deployment
ID          = 6f3f84b3
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Auto Revert  Desired  Placed  Healthy  Unhealthy
api         true         5        5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
27dc2a42  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
5b7d34bb  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
983b487d  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
d1cbf45a  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
d6b46def  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
0ccb732f  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
64d4f282  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
664e33c7  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
a4cb6a4b  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
fdd73bdd  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC

$ nomad job deployments docs
ID        Job ID  Job Version  Status      Description
6f3f84b3  docs    2            successful  Deployment completed successfully
32a080c1  docs    1            failed      Deployment marked as failed - rolling back to job version 0
c4c16494  docs    0            successful  Deployment completed successfully
```
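Note in the deployment listing that the auto-revert produced a new job version
(version 2) whose specification matches version 0. The full version history,
including the diff between versions, can be inspected as well; a brief sketch:

```text
# Show the job's versions; -p includes a diff between each version
$ nomad job history -p docs
```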
## Canary Deployments

Canary updates are a useful way to test a new version of a job before beginning
a rolling upgrade. The `update` stanza supports setting the number of canaries
the job operator would like Nomad to create when the job changes via the
`canary` parameter. When the job specification is updated, Nomad creates the
canaries without stopping any allocations from the previous job.

This pattern allows operators to achieve higher confidence in the new job
version because they can route traffic, examine logs, etc., to determine
whether the new application is performing properly.

```hcl
job "docs" {
  # ...

  group "api" {
    count = 5

    update {
      max_parallel     = 1
      canary           = 1
      min_healthy_time = "30s"
      healthy_deadline = "10m"
      auto_revert      = true
    }

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }
}
```

In the example above, the `update` stanza tells Nomad to create a single canary
when the job specification is changed. Below we can see how this works by
changing the image to run the new version:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api" {
     task "api-server" {
       config {
-        image = "api-server:1.3"
+        image = "api-server:1.4"
```

Next we plan and run these changes:

```text
$ nomad job plan docs.nomad
+/- Job: "docs"
+/- Task Group: "api" (1 canary, 5 ignore)
  +/- Task: "api-server" (forces create/destroy update)
    +/- Config {
      +/- image: "api-server:1.3" => "api-server:1.4"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad job run -check-index 7 docs.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

$ nomad job run docs.nomad
# ...
```

We can see from the plan output that Nomad is going to create 1 canary that
will run the "api-server:1.4" image and ignore all the allocations running the
older image. If we inspect the status, we see that the canary is running
alongside the older version of the job:

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         6        0       0         0

Latest Deployment
ID          = ed28f6c2
Status      = running
Description = Deployment is running but requires promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         false     5        1         1       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
```
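Before deciding whether to promote, we can examine the canary itself, which is
the entire point of the pattern. A minimal sketch, assuming `85662a7a` is the
canary allocation's ID from the status output above and `api-server` is the
task to inspect:

```text
# Check the canary allocation's state and recent events
$ nomad alloc-status 85662a7a

# Follow the canary's logs to verify it behaves correctly
$ nomad logs -f 85662a7a api-server
```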
Now if we promote the canary, this will trigger a rolling update to replace the
remaining allocations running the older image. The rolling update will happen
at a rate of `max_parallel`, so in this case one allocation at a time:

```text
$ nomad deployment promote ed28f6c2
==> Monitoring evaluation "37033151"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "ed28f6c2"
    Allocation "f5057465" created: node "f6646949", group "api"
    Allocation "f5057465" status changed: "pending" -> "running"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "37033151" finished with status "complete"

$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 20:28:59 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       2         0

Latest Deployment
ID          = ed28f6c2
Status      = running
Description = Deployment is running

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         true      5        1         2       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
f5057465  f6646949  api         1        run      running   07/26/17 20:29:23 UTC
b1c88d20  f6646949  api         1        run      running   07/26/17 20:28:59 UTC
1140bacf  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
1958a34a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
4bda385a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
62d96f06  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
f58abbb2  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
```

Alternatively, if the canary were not performing properly, we could abandon the
change using the `nomad deployment fail` command, similar to the blue/green
example.
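A brief sketch of that alternative, reusing the deployment ID from the status
output above; since the job sets `auto_revert = true`, the canary is stopped
and the job remains on the old version:

```text
# Abandon the canary and mark the deployment as failed
$ nomad deployment fail ed28f6c2
```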