---
layout: "docs"
page_title: "Operating a Job: Updating Jobs"
sidebar_current: "docs-jobops-updating"
description: |-
  Learn how to safely update Nomad jobs.
---

# Updating a Job

When operating a service, updating the version of the job is a common task.
The same best practices for reliably deploying new versions apply under a
cluster scheduler: rolling updates, blue-green deploys, and canaries, which are
a special case of blue-green deploys. This section explores how to do each of
these safely with Nomad.

## Rolling Updates

To update a service without introducing downtime, Nomad has built-in support
for rolling updates. When a job specifies a rolling update with the syntax
below, Nomad will update at most `max_parallel` task groups at a time and will
wait the `stagger` duration before updating the next set.

```
job "rolling" {
    ...
    update {
        stagger = "30s"
        max_parallel = 1
    }
    ...
}
```

We can use the [`nomad plan` command](/docs/commands/plan.html) while updating
jobs to ensure the scheduler will do what we expect. In this example, we have 3
web server instances whose version we want to update. After modifying the job
file, we can run `plan`:

```
$ nomad plan my-web.nomad
+/- Job: "my-web"
+/- Task Group: "web" (3 create/destroy update)
  +/- Task: "web" (forces create/destroy update)
    +/- Config {
      +/- image:             "nginx:1.10" => "nginx:1.11"
          port_map[0][http]: "80"
    }

Scheduler dry-run:
- All tasks successfully allocated.
- Rolling update, next evaluation will be in 10s.

Job Modify Index: 7
To submit the job with version verification run:

nomad run -check-index 7 my-web.nomad

When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
```

Here we can see that Nomad will destroy the 3 existing tasks and create 3
replacements, but it will do so as a rolling update with a stagger of `10s`.
For more details on the update block, see
the [Jobspec documentation](/docs/jobspec/index.html#update).
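
To make the rolling update concrete, the `my-web.nomad` file used above might
look roughly like the following sketch. The counts, names, image, and update
settings come from the plan output, while the exact `port_map` layout and any
omitted stanzas (resources, networking, and so on) are illustrative:

```
job "my-web" {
    ...
    update {
        stagger = "10s"
        max_parallel = 1
    }

    group "web" {
        count = 3

        task "web" {
            driver = "docker"

            config {
                image = "nginx:1.11"
                port_map {
                    http = 80
                }
            }
        }
    }
}
```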

## Blue-green and Canaries

Blue-green deploys go by several names, Red/Black, A/B, Blue/Green, but the
concept is the same. The idea is to have two sets of applications, with only
one of them live at a given time, except while transitioning from one set to
the other. "Live" simply means the set of applications that is receiving
traffic.

Imagine we have an API server with 10 instances deployed to production at
version 1, and we want to upgrade to version 2. Hopefully the new version has
been tested in a QA environment and is ready to start accepting production
traffic.

In this case we would consider version 1 to be the live set, and we want to
transition to version 2. We can model this workflow with the job below:

```
job "my-api" {
    ...

    group "api-green" {
        count = 10

        task "api-server" {
            driver = "docker"

            config {
                image = "api-server:v1"
            }
        }
    }

    group "api-blue" {
        count = 0

        task "api-server" {
            driver = "docker"

            config {
                image = "api-server:v2"
            }
        }
    }
}
```

Here we can see the live group is "api-green" since it has a non-zero count.
To transition to v2, we increase the count of "api-blue" and decrease the count
of "api-green". We can also see how a canary is a special case of blue-green:
if we set "api-blue" to `count = 1` and "api-green" to `count = 9`, there will
still be 10 instances in total, but only one of them will be running the new
version, essentially canarying it.
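
As a sketch, the canary state described above changes only the two count
values; everything else in the job stays the same:

```
job "my-api" {
    ...

    group "api-green" {
        # Keep most of the capacity on the known-good v1 instances.
        count = 9
        ...
    }

    group "api-blue" {
        # Canary a single instance of the new v2 image.
        count = 1
        ...
    }
}
```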

If at any time we notice that the new version is behaving incorrectly and we
want to roll back, all we have to do is drop the count of the new group to 0
and restore the original group back to 10. This fine-grained control lets job
operators be confident that deployments will not cause downtime. If the deploy
is successful and we fully transition from v1 to v2, the job file will look
like this:

```
job "my-api" {
    ...

    group "api-green" {
        count = 0

        task "api-server" {
            driver = "docker"

            config {
                image = "api-server:v1"
            }
        }
    }

    group "api-blue" {
        count = 10

        task "api-server" {
            driver = "docker"

            config {
                image = "api-server:v2"
            }
        }
    }
}
```

   160  Now "api-blue" is the live group and when we are ready to update the api to v3,
   161  we would modify "api-green" and repeat this process. The rate at which the count
   162  of groups are incremented and decremented is totally up to the user. It is
   163  usually good practice to start by transition one at a time until a certain
   164  confidence threshold is met based on application specific logs and metrics.
   165  
## Handling Drain Signals

On operating systems that support signals, Nomad will signal the application
before killing it. This gives the application time to gracefully drain
connections and perform any other necessary cleanup. Certain applications take
longer to drain than others, so Nomad lets the job file specify how long to
wait between signaling the application to exit and forcefully killing it. This
is configurable via `kill_timeout`. More details
can be found in the [Jobspec documentation](/docs/jobspec/index.html#kill_timeout).
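
For example, a task that needs extra time to drain might set `kill_timeout` on
the task as in the sketch below; the 45 second value is only an illustration,
pick whatever your application actually needs:

```
task "api-server" {
    driver = "docker"

    # Wait up to 45s between the drain signal and a forceful kill.
    kill_timeout = "45s"

    config {
        image = "api-server:v2"
    }
}
```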