---
layout: "guides"
page_title: "Blue/Green & Canary Deployments - Operating a Job"
sidebar_current: "guides-operating-a-job-updating-blue-green-deployments"
description: |-
  Nomad has built-in support for doing blue/green and canary deployments to more
  safely update existing applications and services.
---

# Blue/Green & Canary Deployments

Sometimes [rolling
upgrades](/guides/operating-a-job/update-strategies/rolling-upgrades.html) do not
offer the required flexibility for updating an application in production. Often
organizations prefer to put a "canary" build into production or utilize a
technique known as a "blue/green" deployment to ensure a safe application
rollout to production while minimizing downtime.

## Blue/Green Deployments

Blue/green deployments have several other names, including red/black and A/B,
but the concept is generally the same. In a blue/green deployment, there are
two application versions. Only one application version is active at a time,
except during the transition phase from one version to the next. The term
"active" tends to mean "receiving traffic" or "in service".

Imagine a hypothetical API server which has five instances deployed to
production at version 1.3, and we want to safely upgrade to version 1.4. We
want to create five new instances at version 1.4 and, once they are operating
correctly, promote them and take down the five instances running 1.3. In the
event of failure, we can quickly roll back to 1.3.

To start, we examine our job which is running in production:

```hcl
job "docs" {
  # ...

  group "api" {
    count = 5

    update {
      max_parallel     = 1
      canary           = 5
      min_healthy_time = "30s"
      healthy_deadline = "10m"
      auto_revert      = true
      auto_promote     = false
    }

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }
}
```

We see that it has an `update` stanza in which `canary` equals the desired
count. This is what allows us to easily model blue/green deployments. When we
change the job to run the "api-server:1.4" image, Nomad will create 5 new
allocations without touching the original "api-server:1.3" allocations. Below
we can see how this works by changing the image to run the new version:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api" {
     task "api-server" {
       config {
-        image = "api-server:1.3"
+        image = "api-server:1.4"
```

Next we plan and run these changes:

```text
$ nomad job plan docs.nomad
+/- Job: "docs"
+/- Task Group: "api" (5 canary, 5 ignore)
  +/- Task: "api-server" (forces create/destroy update)
    +/- Config {
      +/- image: "api-server:1.3" => "api-server:1.4"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad job run -check-index 7 docs.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

$ nomad job run docs.nomad
# ...
```
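While the submitted job converges, the resulting deployment can also be watched
directly. As a brief sketch (using the deployment ID that `nomad status`
reports below), `nomad job deployments` lists the job's deployments and
`nomad deployment status` shows the placement and health of a specific one:

```text
$ nomad job deployments docs
$ nomad deployment status 32a080c1
```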
We can see from the plan output that Nomad is going to create 5 canaries
running the "api-server:1.4" image and ignore all the allocations running the
older image. Now if we examine the status of the job, we can see that both the
blue ("api-server:1.3") and green ("api-server:1.4") sets are running:

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         10       0       0         0

Latest Deployment
ID          = 32a080c1
Status      = running
Description = Deployment is running but requires manual promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         false     5        5         5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
6d8eec42  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
7051480e  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
36c6610f  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
410ba474  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
```

Now that we have the new set in production, we can route traffic to it and
validate that the new job version is working properly. Depending on whether the
new version is functioning properly, we will then either promote or fail the
deployment.
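How traffic actually shifts between the blue and green sets is up to the load
balancer or service mesh in front of the job, but Nomad can label the two sets
differently at service-registration time. Below is a minimal sketch of a
`service` stanza for the task, assuming Consul registration and a network port
labeled "http" (the service name and tag values here are hypothetical);
`canary_tags` is used in place of `tags` in an allocation's registration while
it is an unpromoted canary:

```hcl
task "api-server" {
  # ...

  service {
    name        = "api-server" # hypothetical service name
    port        = "http"       # assumes a port labeled "http" in the task's resources
    tags        = ["live"]     # tags registered for promoted (non-canary) allocations
    canary_tags = ["canary"]   # tags registered while the allocation is a canary
  }
}
```

A router watching Consul can then direct test traffic at instances tagged
"canary" before any promotion takes place.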
### Promoting the Deployment

After deploying the new image alongside the old version, we have determined it
is functioning properly and we want to transition fully to the new version.
Doing so is as simple as promoting the deployment:

```text
$ nomad deployment promote 32a080c1
==> Monitoring evaluation "61ac2be5"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "32a080c1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "61ac2be5" finished with status "complete"
```

If we look at the job's status, we see that after promotion, Nomad stopped the
older allocations and is running only the new ones. This completes our
blue/green deployment.

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       5         0

Latest Deployment
ID          = 32a080c1
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         true      5        5         5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
6d8eec42  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
7051480e  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
36c6610f  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
410ba474  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
85662a7a  087852e2  api         1        run      running   07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        stop     complete  07/26/17 19:53:56 UTC
```
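Manual promotion is a deliberate safety gate, but it can be skipped when the
canaries' health checks are trusted. A minimal variant of the `update` stanza
above enables `auto_promote`, causing Nomad to promote the deployment on its
own once every canary is healthy:

```hcl
update {
  max_parallel     = 1
  canary           = 5
  min_healthy_time = "30s"
  healthy_deadline = "10m"
  auto_revert      = true
  auto_promote     = true # promote automatically once all canaries are healthy
}
```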
### Failing the Deployment

After deploying the new image alongside the old version, we have determined it
is not functioning properly and we want to roll back to the old version. Doing
so is as simple as failing the deployment:

```text
$ nomad deployment fail 32a080c1
Deployment "32a080c1-de5a-a4e7-0218-521d8344c328" failed. Auto-reverted to job version 0.

==> Monitoring evaluation "6840f512"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "32a080c1"
    Allocation "0ccb732f" modified: node "36e7a123", group "api"
    Allocation "64d4f282" modified: node "36e7a123", group "api"
    Allocation "664e33c7" modified: node "36e7a123", group "api"
    Allocation "a4cb6a4b" modified: node "36e7a123", group "api"
    Allocation "fdd73bdd" modified: node "36e7a123", group "api"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6840f512" finished with status "complete"
```

If we now look at the job's status, we can see that after failing the
deployment, Nomad stopped the new allocations and is running only the old ones,
and it reverted the working copy of the job back to the original specification
running "api-server:1.3".

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       5         0

Latest Deployment
ID          = 6f3f84b3
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Auto Revert  Desired  Placed  Healthy  Unhealthy
api         true         5        5       5        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
27dc2a42  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
5b7d34bb  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
983b487d  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
d1cbf45a  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
d6b46def  36e7a123  api         1        stop     complete  07/26/17 20:07:31 UTC
0ccb732f  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
64d4f282  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
664e33c7  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
a4cb6a4b  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC
fdd73bdd  36e7a123  api         2        run      running   07/26/17 20:06:29 UTC

$ nomad job deployments docs
ID        Job ID  Job Version  Status      Description
6f3f84b3  docs    2            successful  Deployment completed successfully
32a080c1  docs    1            failed      Deployment marked as failed - rolling back to job version 0
c4c16494  docs    0            successful  Deployment completed successfully
```
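Here the rollback was automatic because the job sets `auto_revert = true`. If
auto-revert were disabled, the same recovery could be performed by hand using
the job's version history; a sketch, where version 0 is the original
"api-server:1.3" specification from this example:

```text
$ nomad job history docs
$ nomad job revert docs 0
```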
## Canary Deployments

Canary updates are a useful way to test a new version of a job before beginning
a rolling upgrade. The `update` stanza supports setting the number of canaries
the job operator would like Nomad to create when the job changes via the
`canary` parameter. When the job specification is updated, Nomad creates the
canaries without stopping any allocations from the previous job.

This pattern allows operators to achieve higher confidence in the new job
version because they can route traffic, examine logs, etc., to determine that
the new application is performing properly.

```hcl
job "docs" {
  # ...

  group "api" {
    count = 5

    update {
      max_parallel     = 1
      canary           = 1
      min_healthy_time = "30s"
      healthy_deadline = "10m"
      auto_revert      = true
      auto_promote     = false
    }

    task "api-server" {
      driver = "docker"

      config {
        image = "api-server:1.3"
      }
    }
  }
}
```

In the example above, the `update` stanza tells Nomad to create a single canary
when the job specification is changed. Below we can see how this works by
changing the image to run the new version:

```diff
@@ -2,6 +2,8 @@ job "docs" {
   group "api" {
     task "api-server" {
       config {
-        image = "api-server:1.3"
+        image = "api-server:1.4"
```

Next we plan and run these changes:

```text
$ nomad job plan docs.nomad
+/- Job: "docs"
+/- Task Group: "api" (1 canary, 5 ignore)
  +/- Task: "api-server" (forces create/destroy update)
    +/- Config {
      +/- image: "api-server:1.3" => "api-server:1.4"
        }

Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 7
To submit the job with version verification run:

nomad job run -check-index 7 docs.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

$ nomad job run docs.nomad
# ...
```

We can see from the plan output that Nomad is going to create 1 canary that
will run the "api-server:1.4" image and ignore all the allocations running the
older image. If we inspect the status, we see that the canary is running
alongside the older version of the job:

```text
$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 19:57:47 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         6        0       0         0

Latest Deployment
ID          = 32a080c1
Status      = running
Description = Deployment is running but requires manual promotion

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         false     5        1         1       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created At
85662a7a  087852e2  api         1        run      running  07/26/17 19:57:47 UTC
3ac3fe05  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
4bd51979  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
2998387b  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
35b813ee  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
b53b4289  087852e2  api         0        run      running  07/26/17 19:53:56 UTC
```
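Before deciding the canary's fate, we can inspect it directly. A quick sketch
using the canary's allocation ID from the status output above ("api-server" is
the task name from the job file):

```text
$ nomad alloc status 85662a7a
$ nomad alloc logs 85662a7a api-server
```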
Now if we promote the canary, this will trigger a rolling update to replace the
remaining allocations running the older image. The rolling update will happen
at a rate of `max_parallel`, so in this case one allocation at a time:

```text
$ nomad deployment promote ed28f6c2
==> Monitoring evaluation "37033151"
    Evaluation triggered by job "docs"
    Evaluation within deployment: "ed28f6c2"
    Allocation "f5057465" created: node "f6646949", group "api"
    Allocation "f5057465" status changed: "pending" -> "running"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "37033151" finished with status "complete"

$ nomad status docs
ID            = docs
Name          = docs
Submit Date   = 07/26/17 20:28:59 UTC
Type          = service
Priority      = 50
Datacenters   = dc1
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         5        0       2         0

Latest Deployment
ID          = ed28f6c2
Status      = running
Description = Deployment is running

Deployed
Task Group  Auto Revert  Promoted  Desired  Canaries  Placed  Healthy  Unhealthy
api         true         true      5        1         2       1        0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created At
f5057465  f6646949  api         1        run      running   07/26/17 20:29:23 UTC
b1c88d20  f6646949  api         1        run      running   07/26/17 20:28:59 UTC
1140bacf  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
1958a34a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
4bda385a  f6646949  api         0        run      running   07/26/17 20:28:37 UTC
62d96f06  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
f58abbb2  f6646949  api         0        stop     complete  07/26/17 20:28:37 UTC
```

Alternatively, if the canary was not performing properly, we could abandon the
change using the `nomad deployment fail` command, similar to the blue/green
example.