---
layout: "guides"
page_title: "Decommissioning Nodes"
sidebar_current: "guides-decommissioning-nodes"
description: |-
  Decommissioning nodes is a normal part of cluster operations for a variety of
  reasons: server maintenance, operating system upgrades, etc. Nomad offers a
  number of parameters for controlling how running jobs are migrated off of
  draining nodes.
---

# Decommissioning Nomad Client Nodes

Decommissioning nodes is a normal part of cluster operations for a variety of
reasons: server maintenance, operating system upgrades, etc. Nomad offers a
number of parameters for controlling how running jobs are migrated off of
draining nodes.

## Configuring How Jobs are Migrated

In Nomad 0.8 a [`migrate`][migrate] stanza was added to jobs to allow control
over how allocations for a job are migrated off of a draining node. Below is an
example job that runs a web service and has a Consul health check:

```hcl
job "webapp" {
  datacenters = ["dc1"]

  migrate {
    max_parallel     = 2
    health_check     = "checks"
    min_healthy_time = "15s"
    healthy_deadline = "5m"
  }

  group "webapp" {
    count = 9

    task "webapp" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo:0.2.3"
        args  = ["-text", "ok"]

        port_map {
          http = 5678
        }
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }
      }

      service {
        name = "webapp"
        port = "http"

        check {
          name     = "http-ok"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

The above `migrate` stanza ensures only 2 allocations are stopped at a time to
migrate during node drains. Even if multiple nodes running allocations for this
job were draining at the same time, only 2 allocations would be migrated at a
time.
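The `health_check` parameter also accepts `"task_states"`, which treats a
replacement allocation as healthy once its tasks are running rather than
waiting for Consul checks to pass. A minimal sketch of a `migrate` stanza tuned
for a group without Consul checks (the specific values are illustrative, not
recommendations):

```hcl
migrate {
  # Move one allocation at a time for a more conservative drain.
  max_parallel = 1

  # "task_states" considers a replacement healthy once its tasks are
  # running, which suits groups that have no Consul health checks.
  health_check = "task_states"

  min_healthy_time = "10s"
  healthy_deadline = "5m"
}
```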
When the job is run it may be placed on multiple nodes. In the following
example the 9 `webapp` allocations are spread across 2 nodes:

```text
$ nomad run webapp.nomad
==> Monitoring evaluation "5129bc74"
    Evaluation triggered by job "webapp"
    Allocation "5b4d6db5" created: node "46f1c6c4", group "webapp"
    Allocation "670a715f" created: node "f7476465", group "webapp"
    Allocation "78b6b393" created: node "46f1c6c4", group "webapp"
    Allocation "85743ff5" created: node "f7476465", group "webapp"
    Allocation "edf71a5d" created: node "f7476465", group "webapp"
    Allocation "56f770c0" created: node "46f1c6c4", group "webapp"
    Allocation "9a51a484" created: node "46f1c6c4", group "webapp"
    Allocation "f6f6e64c" created: node "f7476465", group "webapp"
    Allocation "fefe81d0" created: node "f7476465", group "webapp"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "5129bc74" finished with status "complete"
```

If one of those nodes needed to be decommissioned, perhaps because of a
hardware issue, an operator would issue a node drain to migrate the
allocations off:

```text
$ nomad node drain -enable -yes 46f1
2018-04-11T23:41:56Z: Ctrl-C to stop monitoring: will not cancel the node drain
2018-04-11T23:41:56Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" draining
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" draining
2018-04-11T23:42:03Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" status running -> complete
2018-04-11T23:42:03Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" status running -> complete
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" draining
2018-04-11T23:42:27Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" status running -> complete
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" marked for migration
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" draining
2018-04-11T23:42:29Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" has marked all allocations for migration
2018-04-11T23:42:34Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" status running -> complete
2018-04-11T23:42:34Z: All allocations on node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" have stopped.
```

There are a couple of important events to notice in the output. First, only 2
allocations are migrated initially:

```
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
```

This is because `max_parallel = 2` in the job specification. The next
allocation on the draining node waits to be migrated:

```
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
```

Note that this occurs 25 seconds after the initial migrations. The 25 second
delay is because a replacement allocation took 10 seconds to become healthy,
and then `min_healthy_time = "15s"` meant node draining waited an additional 15
seconds. If the replacement allocation had failed within that time, the node
drain would not have continued until a replacement could be successfully made.
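The pacing described above can be checked with simple arithmetic (the 10 second
time-to-healthy is taken from this example run; real values will vary with the
workload):

```shell
# Back-of-envelope check of the 25 second gap between migration waves.
time_to_healthy=10   # seconds the replacement alloc took to pass its checks
min_healthy_time=15  # min_healthy_time from the job's migrate stanza

echo $(( time_to_healthy + min_healthy_time ))   # -> 25
```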
### Scheduling Eligibility

Now that the example drain has finished we can inspect the state of the drained
node:

```text
$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
f7476465  dc1  nomad-1  <none>  false  eligible     ready
96b52ad8  dc1  nomad-2  <none>  false  eligible     ready
46f1c6c4  dc1  nomad-3  <none>  false  ineligible   ready
```

While node `46f1c6c4` has `Drain = false`, notice that its `Eligibility =
ineligible`. Node scheduling eligibility is a new field in Nomad 0.8. When a
node is ineligible for scheduling the scheduler will not consider it for new
placements.

While draining, a node will always be ineligible for scheduling. Once draining
completes it will remain ineligible to prevent refilling a newly drained node.

However, by default canceling a drain with the `-disable` option will reset a
node to be eligible for scheduling. To cancel a drain while preserving the
node's ineligible status, use the `-keep-ineligible` option.

Scheduling eligibility can be toggled independently of node drains by using the
[`nomad node eligibility`][eligibility] command:

```text
$ nomad node eligibility -disable 46f1
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
```

### Node Drain Deadline

Sometimes a drain is unable to proceed and complete normally. This could be
caused by not enough capacity existing in the cluster to replace the drained
allocations, or by replacement allocations failing to start successfully in a
timely fashion.

Operators may specify a deadline when enabling a node drain to prevent drains
from not finishing. Once the deadline is reached, all remaining allocations on
the node are stopped regardless of `migrate` stanza parameters.
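For example, a drain with an explicit 30 minute deadline could be started with
the `-deadline` option (the node ID prefix and the 30 minute value are
illustrative):

```text
$ nomad node drain -enable -deadline 30m -yes 46f1
```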
The default deadline is 1 hour and may be changed with the
[`-deadline`][deadline] command line option. The [`-force`][force] option is an
instant deadline: all allocations are immediately stopped. The
[`-no-deadline`][no-deadline] option disables the deadline so a drain may
continue indefinitely.

Like all other drain parameters, a drain's deadline can be updated by making
subsequent `nomad node drain ...` calls with updated values.

## Node Drains and Non-Service Jobs

So far we have only seen how draining works with service jobs. Batch and
system jobs each have different behaviors during node drains.

### Draining Batch Jobs

Node drains only migrate batch jobs once the drain's deadline has been reached.
For node drains without a deadline the drain will not complete until all batch
jobs on the node have completed (or failed).

The goal of this behavior is to avoid losing the progress a batch job has made
by forcing it to exit early.

### Keeping System Jobs Running

Node drains only stop system jobs once all other allocations have exited. This
way if a node is running a log shipping daemon or metrics collector as a system
job, it will continue to run as long as there are other allocations running.

The [`-ignore-system`][ignore-system] option leaves system jobs running even
after all other allocations have exited. This is useful when system jobs are
used to monitor Nomad or the node itself.

## Draining Multiple Nodes

A common operation is to decommission an entire class of nodes at once. Prior
to Nomad 0.8 this was a problematic operation as the first node to begin
draining could migrate all of its allocations to the next node about to be
drained. In pathological cases this could repeat on each node to be drained and
cause allocations to be rescheduled repeatedly.
As of Nomad 0.8 an operator can avoid this churn by marking nodes ineligible
for scheduling before draining them using the [`nomad node
eligibility`][eligibility] command:

```text
$ nomad node eligibility -disable 46f1
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling

$ nomad node eligibility -disable 96b5
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling

$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
f7476465  dc1  nomad-1  <none>  false  eligible     ready
46f1c6c4  dc1  nomad-2  <none>  false  ineligible   ready
96b52ad8  dc1  nomad-3  <none>  false  ineligible   ready
```

Now that both `nomad-2` and `nomad-3` are ineligible for scheduling, they can
be drained without risking placing allocations on an _about-to-be-drained_
node.

Toggling scheduling eligibility can be done completely independently of
draining. For example, an operator may want to inspect the allocations
currently running on a node without risking new allocations being scheduled
and changing the node's state:

```text
$ nomad node eligibility -self -disable
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling

$ # ...inspect node state...
$ nomad node eligibility -self -enable
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: eligible for scheduling
```

### Example: Migrating Datacenters

A more complete example of draining multiple nodes is migrating from an old
datacenter (`dc1`) to a new datacenter (`dc2`):

```text
$ nomad node status -allocs
ID        DC   Name     Class   Drain  Eligibility  Status  Running Allocs
f7476465  dc1  nomad-1  <none>  false  eligible     ready   4
46f1c6c4  dc1  nomad-2  <none>  false  eligible     ready   1
96b52ad8  dc1  nomad-3  <none>  false  eligible     ready   4
168bdd03  dc2  nomad-4  <none>  false  eligible     ready   0
9ccb3306  dc2  nomad-5  <none>  false  eligible     ready   0
7a7f9a37  dc2  nomad-6  <none>  false  eligible     ready   0
```

Before migrating, ensure that all jobs in `dc1` have `datacenters = ["dc1",
"dc2"]`. Then, before draining, mark all nodes in `dc1` as ineligible for
scheduling. Shell scripting can help automate manipulating multiple nodes at
once:

```text
$ nomad node status | awk '{ print $2 " " $1 }' | grep ^dc1 | awk '{ system("nomad node eligibility -disable "$2) }'
Node "f7476465-4d6e-c0de-26d0-e383c49be941" scheduling eligibility set: ineligible for scheduling
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling

$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
f7476465  dc1  nomad-1  <none>  false  ineligible   ready
46f1c6c4  dc1  nomad-2  <none>  false  ineligible   ready
96b52ad8  dc1  nomad-3  <none>  false  ineligible   ready
168bdd03  dc2  nomad-4  <none>  false  eligible     ready
9ccb3306  dc2  nomad-5  <none>  false  eligible     ready
7a7f9a37  dc2  nomad-6  <none>  false  eligible     ready
```

Then drain each node in `dc1`.
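The filtering in the pipeline above can be tested offline before pointing it at
a live cluster. This sketch runs the same awk/grep steps against canned `nomad
node status` output (the sample rows are hypothetical) and prints the node IDs
that would be handed to `nomad node eligibility -disable`:

```shell
# Canned `nomad node status` output (hypothetical sample rows).
sample='ID        DC   Name     Class   Drain  Eligibility  Status
f7476465  dc1  nomad-1  <none>  false  eligible     ready
168bdd03  dc2  nomad-4  <none>  false  eligible     ready'

# Reorder columns to "DC ID", keep only dc1 rows, then print the IDs the
# real pipeline would pass to `nomad node eligibility -disable`.
ids=$(echo "$sample" | awk '{ print $2 " " $1 }' | grep '^dc1' | awk '{ print $2 }')
echo "$ids"   # -> f7476465
```

Swapping the final `print $2` back to `system("nomad node eligibility -disable "$2)` restores the live behavior.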
For this example we will only monitor the final node that is draining. Watching
`nomad node status -allocs` is also a good way to monitor the status of drains.

```text
$ nomad node drain -enable -yes -detach f7476465
Node "f7476465-4d6e-c0de-26d0-e383c49be941" drain strategy set

$ nomad node drain -enable -yes -detach 46f1c6c4
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set

$ nomad node drain -enable -yes 96b52ad8
2018-04-12T22:08:00Z: Ctrl-C to stop monitoring: will not cancel the node drain
2018-04-12T22:08:00Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" drain strategy set
2018-04-12T22:08:15Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" marked for migration
2018-04-12T22:08:16Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" draining
2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" marked for migration
2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" draining
2018-04-12T22:08:21Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" status running -> complete
2018-04-12T22:08:22Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" status running -> complete
2018-04-12T22:09:08Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" marked for migration
2018-04-12T22:09:09Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" draining
2018-04-12T22:09:14Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" status running -> complete
2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" marked for migration
2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" draining
2018-04-12T22:09:33Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" has marked all allocations for migration
2018-04-12T22:09:39Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" status running -> complete
2018-04-12T22:09:39Z: All allocations on node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" have stopped.
```

Note that there was a 15 second delay between node `96b52ad8` starting to drain
and having its first allocation migrated. The delay was due to 2 other
allocations for the same job already being migrated from the other nodes. Once
at least 8 of the 9 allocations for the job were running, another allocation
could begin draining.

The final node drain command did not exit until 6 seconds after the node had
marked all of its allocations for migration, because the command line tool
blocks until all allocations on the node have stopped. This allows operators to
script shutting down a node once a drain command exits and know all services
have already exited.

[deadline]: /docs/commands/node/drain.html#deadline
[eligibility]: /docs/commands/node/eligibility.html
[force]: /docs/commands/node/drain.html#force
[ignore-system]: /docs/commands/node/drain.html#ignore-system
[migrate]: /docs/job-specification/migrate.html
[no-deadline]: /docs/commands/node/drain.html#no-deadline