---
layout: "guides"
page_title: "Decommissioning Nodes"
sidebar_current: "guides-decommissioning-nodes"
description: |-
  Decommissioning nodes is a normal part of cluster operations for a variety of
  reasons: server maintenance, operating system upgrades, etc. Nomad offers a
  number of parameters for controlling how running jobs are migrated off of
  draining nodes.
---

# Decommissioning Nomad Client Nodes

Decommissioning nodes is a normal part of cluster operations for a variety of
reasons: server maintenance, operating system upgrades, etc. Nomad offers a
number of parameters for controlling how running jobs are migrated off of
draining nodes.

## Configuring How Jobs are Migrated

In Nomad 0.8 a [`migrate`][migrate] stanza was added to jobs to allow control
over how allocations for a job are migrated off of a draining node. Below is an
example job that runs a web service and has a Consul health check:

```hcl
job "webapp" {
  datacenters = ["dc1"]

  migrate {
    max_parallel = 2
    health_check = "checks"
    min_healthy_time = "15s"
    healthy_deadline = "5m"
  }

  group "webapp" {
    count = 9

    task "webapp" {
      driver = "docker"
      config {
        image = "hashicorp/http-echo:0.2.3"
        args  = ["-text", "ok"]
        port_map {
          http = 5678
        }
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }
      }

      service {
        name = "webapp"
        port = "http"
        check {
          name = "http-ok"
          type = "http"
          path = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

The above `migrate` stanza ensures only 2 allocations are stopped at a time to
migrate during node drains. Even if multiple nodes running allocations for this
job were draining at the same time, only 2 allocations would be migrated at a
time.

When the job is run it may be placed on multiple nodes. In the following
example the 9 `webapp` allocations are spread across 2 nodes:

```text
$ nomad run webapp.nomad
==> Monitoring evaluation "5129bc74"
    Evaluation triggered by job "webapp"
    Allocation "5b4d6db5" created: node "46f1c6c4", group "webapp"
    Allocation "670a715f" created: node "f7476465", group "webapp"
    Allocation "78b6b393" created: node "46f1c6c4", group "webapp"
    Allocation "85743ff5" created: node "f7476465", group "webapp"
    Allocation "edf71a5d" created: node "f7476465", group "webapp"
    Allocation "56f770c0" created: node "46f1c6c4", group "webapp"
    Allocation "9a51a484" created: node "46f1c6c4", group "webapp"
    Allocation "f6f6e64c" created: node "f7476465", group "webapp"
    Allocation "fefe81d0" created: node "f7476465", group "webapp"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "5129bc74" finished with status "complete"
```

If one of those nodes needed to be decommissioned, perhaps because of a
hardware issue, then an operator would issue a node drain to migrate the
allocations off:

```text
$ nomad node drain -enable -yes 46f1
2018-04-11T23:41:56Z: Ctrl-C to stop monitoring: will not cancel the node drain
2018-04-11T23:41:56Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" draining
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" draining
2018-04-11T23:42:03Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" status running -> complete
2018-04-11T23:42:03Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" status running -> complete
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" draining
2018-04-11T23:42:27Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" status running -> complete
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" marked for migration
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" draining
2018-04-11T23:42:29Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" has marked all allocations for migration
2018-04-11T23:42:34Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" status running -> complete
2018-04-11T23:42:34Z: All allocations on node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" have stopped.
```

There are a couple of important events to notice in the output. First, only 2
allocations are migrated initially:

```
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
```

This is because `max_parallel = 2` in the job specification. The next
allocation on the draining node waits to be migrated:

```
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
```

Note that this occurs 25 seconds after the initial migrations. The 25 second
delay is because a replacement allocation took 10 seconds to become healthy and
then the `min_healthy_time = "15s"` meant node draining waited an additional 15
seconds. If the replacement allocation had failed within that time the node
drain would not have continued until a replacement could be successfully made.
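
These parameters can be tuned per job. For example, a job whose replacements
become healthy quickly could use a shorter `min_healthy_time` to speed up
drains. The following stanza is purely illustrative, not part of the example
job above:

```hcl
migrate {
  # Migrate one allocation at a time for extra caution
  max_parallel     = 1

  # Consider a replacement healthy once its tasks are running,
  # without waiting on Consul checks
  health_check     = "task_states"

  # Wait a shorter time before migrating the next allocation
  min_healthy_time = "5s"
  healthy_deadline = "2m"
}
```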

### Scheduling Eligibility

Now that the example drain has finished we can inspect the state of the drained
node:

```text
$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
f7476465  dc1  nomad-1  <none>  false  eligible     ready
96b52ad8  dc1  nomad-2  <none>  false  eligible     ready
46f1c6c4  dc1  nomad-3  <none>  false  ineligible   ready
```

While node `46f1c6c4` has `Drain = false`, notice that its `Eligibility =
ineligible`. Node scheduling eligibility is a new field in Nomad 0.8. When a
node is ineligible for scheduling the scheduler will not consider it for new
placements.

While draining, a node will always be ineligible for scheduling. Once draining
completes it will remain ineligible to prevent refilling a newly drained node.

However, by default canceling a drain with the `-disable` option will reset a
node to be eligible for scheduling. To cancel a drain and preserve the node's
ineligible status, use the `-keep-ineligible` option.
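
For example, to stop an in-progress drain on the node from earlier while
keeping it out of the scheduling pool (a hypothetical follow-up to the drain
above):

```text
$ nomad node drain -disable -keep-ineligible -yes 46f1
```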

Scheduling eligibility can be toggled independently of node drains by using the
[`nomad node eligibility`][eligibility] command:

```text
$ nomad node eligibility -disable 46f1
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
```

### Node Drain Deadline

Sometimes a drain is unable to proceed and complete normally. This could be
caused by a lack of spare capacity in the cluster to place the drained
allocations, or by replacement allocations failing to start successfully in a
timely fashion.

Operators may specify a deadline when enabling a node drain to ensure drains
eventually finish. Once the deadline is reached, all remaining allocations on
the node are stopped regardless of `migrate` stanza parameters.

The default deadline is 1 hour and may be changed with the
[`-deadline`][deadline] command line option. The [`-force`][force] option is an
instant deadline: all allocations are immediately stopped. The
[`-no-deadline`][no-deadline] option disables the deadline so a drain may
continue indefinitely.

Like all other drain parameters, a drain's deadline can be updated by making
subsequent `nomad node drain ...` calls with updated values.
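
For example, the deadline options described above could be applied to the node
from earlier (command sketches, not captured output):

```text
$ # Allow up to 30 minutes before force-stopping remaining allocations
$ nomad node drain -enable -deadline 30m -yes 46f1

$ # Stop all allocations on the node immediately
$ nomad node drain -enable -force -yes 46f1

$ # Let the drain run indefinitely
$ nomad node drain -enable -no-deadline -yes 46f1
```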

## Node Drains and Non-Service Jobs

So far we have only seen how draining works with service jobs. Batch and
system jobs have different behaviors during node drains.

### Draining Batch Jobs

Node drains only migrate batch jobs once the drain's deadline has been reached.
For node drains without a deadline the drain will not complete until all batch
jobs on the node have completed (or failed).

The goal of this behavior is to avoid losing progress a batch job has made by
forcing it to exit early.

### Keeping System Jobs Running

Node drains only stop system jobs once all other allocations have exited. This
way if a node is running a log shipping daemon or metrics collector as a system
job, it will continue to run as long as there are other allocations running.

The [`-ignore-system`][ignore-system] option leaves system jobs running even
after all other allocations have exited. This is useful when system jobs are
used to monitor Nomad or the node itself.
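
For example, to drain a node while leaving any system jobs (such as a log
shipper) in place:

```text
$ nomad node drain -enable -ignore-system -yes 46f1
```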

## Draining Multiple Nodes

A common operation is to decommission an entire class of nodes at once. Prior
to Nomad 0.7 this was a problematic operation as the first node to begin
draining might migrate all of its allocations to the next node about to be
drained. In pathological cases this could repeat on each node to be drained and
cause allocations to be rescheduled repeatedly.

As of Nomad 0.8 an operator can avoid this churn by marking nodes ineligible
for scheduling before draining them using the [`nomad node
eligibility`][eligibility] command:

```text
$ nomad node eligibility -disable 46f1
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling

$ nomad node eligibility -disable 96b5
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling

$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
f7476465  dc1  nomad-1  <none>  false  eligible     ready
46f1c6c4  dc1  nomad-2  <none>  false  ineligible   ready
96b52ad8  dc1  nomad-3  <none>  false  ineligible   ready
```

Now that both `nomad-2` and `nomad-3` are ineligible for scheduling, they can
be drained without risking placing allocations on an _about-to-be-drained_
node.

Toggling scheduling eligibility can be done entirely independently of draining.
For example, an operator may want to inspect the allocations currently running
on a node without risking new allocations being scheduled and changing the
node's state:

```text
$ nomad node eligibility -self -disable
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling

$ # ...inspect node state...

$ nomad node eligibility -self -enable
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: eligible for scheduling
```

### Example: Migrating Datacenters

A more complete example of draining multiple nodes would be when migrating from
an old datacenter (`dc1`) to a new datacenter (`dc2`):

```text
$ nomad node status -allocs
ID        DC   Name     Class   Drain  Eligibility  Status  Running Allocs
f7476465  dc1  nomad-1  <none>  false  eligible     ready   4
46f1c6c4  dc1  nomad-2  <none>  false  eligible     ready   1
96b52ad8  dc1  nomad-3  <none>  false  eligible     ready   4
168bdd03  dc2  nomad-4  <none>  false  eligible     ready   0
9ccb3306  dc2  nomad-5  <none>  false  eligible     ready   0
7a7f9a37  dc2  nomad-6  <none>  false  eligible     ready   0
```

Before migrating, ensure that all jobs in `dc1` have `datacenters = ["dc1",
"dc2"]`. Then, before draining, mark all nodes in `dc1` as ineligible for
scheduling. Shell scripting can help automate manipulating multiple nodes at
once:

```text
$ nomad node status | awk '{ print $2 " " $1 }' | grep ^dc1 | awk '{ system("nomad node eligibility -disable "$2) }'
Node "f7476465-4d6e-c0de-26d0-e383c49be941" scheduling eligibility set: ineligible for scheduling
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling

$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
f7476465  dc1  nomad-1  <none>  false  ineligible   ready
46f1c6c4  dc1  nomad-2  <none>  false  ineligible   ready
96b52ad8  dc1  nomad-3  <none>  false  ineligible   ready
168bdd03  dc2  nomad-4  <none>  false  eligible     ready
9ccb3306  dc2  nomad-5  <none>  false  eligible     ready
7a7f9a37  dc2  nomad-6  <none>  false  eligible     ready
```

Then drain each node in `dc1`. For this example we will only monitor the final
node that is draining. Watching `nomad node status -allocs` is also a good way
to monitor the status of drains.

```text
$ nomad node drain -enable -yes -detach f7476465
Node "f7476465-4d6e-c0de-26d0-e383c49be941" drain strategy set

$ nomad node drain -enable -yes -detach 46f1c6c4
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set

$ nomad node drain -enable -yes 96b52ad8
2018-04-12T22:08:00Z: Ctrl-C to stop monitoring: will not cancel the node drain
2018-04-12T22:08:00Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" drain strategy set
2018-04-12T22:08:15Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" marked for migration
2018-04-12T22:08:16Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" draining
2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" marked for migration
2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" draining
2018-04-12T22:08:21Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" status running -> complete
2018-04-12T22:08:22Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" status running -> complete
2018-04-12T22:09:08Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" marked for migration
2018-04-12T22:09:09Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" draining
2018-04-12T22:09:14Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" status running -> complete
2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" marked for migration
2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" draining
2018-04-12T22:09:33Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" has marked all allocations for migration
2018-04-12T22:09:39Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" status running -> complete
2018-04-12T22:09:39Z: All allocations on node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" have stopped.
```

Note that there was a 15 second delay between node `96b52ad8` starting to drain
and having its first allocation migrated. The delay was due to 2 other
allocations for the same job already being migrated from the other nodes. Once
at least 8 out of the 9 allocations are running for the job, another allocation
could begin draining.

The final node drain command did not exit until 6 seconds after the node had
marked all allocations for migration because the command line tool blocks until
all allocations on the node have stopped. This allows operators to script
shutting down a node once a drain command exits and know all services have
already exited.
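
For example, a decommissioning script might chain the drain with a host
shutdown (the shutdown command is host-specific and shown only as an
illustration):

```text
$ nomad node drain -enable -yes f7476465 && sudo shutdown -h now
```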

[deadline]: /docs/commands/node/drain.html#deadline
[eligibility]: /docs/commands/node/eligibility.html
[force]: /docs/commands/node/drain.html#force
[ignore-system]: /docs/commands/node/drain.html#ignore-system
[migrate]: /docs/job-specification/migrate.html
[no-deadline]: /docs/commands/node/drain.html#no-deadline