---
layout: docs
page_title: Upgrade Guides
sidebar_title: Specific Version Details
description: |-
  Specific versions of Nomad may have additional information about the upgrade
  process beyond the standard flow.
---

# Upgrade Guides

The [upgrading page](/docs/upgrade) covers the details of doing
a standard upgrade. However, specific versions of Nomad may have more
details provided for their upgrades as a result of new features or changed
behavior. This page is used to document those details separately from the
standard upgrade flow.

## Nomad 0.11.2

### Scheduler Scoring Changes

Prior to Nomad 0.11.2 the scheduler algorithm used a [node's reserved
resources][reserved]
incorrectly during scoring. The result of this bug was that scoring was biased
in favor of nodes with reserved resources over nodes without reserved resources.

Placements will be more correct but slightly different in v0.11.2 vs earlier
versions of Nomad. Operators do *not* need to take any action, as the bug fix
only minimally affects scoring.

Feasibility (whether a node is capable of running a job at all) is *not*
affected.

### Periodic Jobs and Daylight Saving Time

Nomad 0.11.2 fixed a long outstanding bug affecting periodic jobs that are
scheduled to run during Daylight Saving Time transitions.

Nomad 0.11.2 provides a more defined behavior: Nomad evaluates the cron
expression with respect to the specified time zone during the transition. A
2:30am nightly job with the `America/New_York` time zone will not run on the day
daylight saving time starts; similarly, a 1:30am nightly job will run twice on
the day daylight saving time ends. See the [Daylight Saving Time][dst]
documentation for details.
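
As a rough illustration of the 2:30am case described above, a `periodic` stanza
might look like the following sketch. The job name `nightly-report` is
hypothetical, and the rest of the job (datacenters, groups, tasks) is omitted.

```hcl
job "nightly-report" { # hypothetical job name
  type = "batch"

  periodic {
    # 2:30am every day, evaluated in the time zone below.
    cron             = "30 2 * * *"
    time_zone        = "America/New_York"
    prohibit_overlap = true
  }

  # datacenters, group, and task definitions omitted
}
```

With this expression the job is skipped on the day daylight saving time starts,
because 2:30am does not occur in that time zone on that day. Changing the
expression to `30 1 * * *` would instead produce two runs on the day daylight
saving time ends.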

## Nomad 0.11.0

### client.template: `vault_grace` deprecation

Nomad 0.11.0 updates
[consul-template](https://github.com/hashicorp/consul-template) to v0.24.1.
This library deprecates the [`vault_grace`][vault_grace] option for templating
included in Nomad. The feature has been ignored since Vault 0.5 and as long as
you are running a more recent version of Vault, you can safely remove
`vault_grace` from your Nomad jobs.

### Rkt Task Driver Removed

The `rkt` task driver has been deprecated and removed from Nomad. While the
code is available in an external repository,
[https://github.com/hashicorp/nomad-driver-rkt](https://github.com/hashicorp/nomad-driver-rkt),
it will not be maintained as `rkt` is [no longer being developed
upstream](https://github.com/rkt/rkt). We encourage all `rkt` users to find a
new task driver as soon as possible.

## Nomad 0.10.4

### Same-Node Scheduling Penalty Removed

Nomad 0.10.4 includes a fix to the scheduler that removes the
same-node penalty for allocations that have not previously failed. In
earlier versions of Nomad, the node where an allocation was running
was penalized from receiving updated versions of that allocation,
resulting in a higher chance of the allocation being placed on a new
node. This was changed so that the penalty only applies to nodes where
the previous allocation has failed or been rescheduled, to reduce the
risk of correlated failures on a host. Scheduling weighs a number of
factors, but this change should reduce movement of allocations that
are being updated from a healthy state. You can view the placement
metrics for an allocation with `nomad alloc status -verbose`.

### Additional Environment Variable Filtering

Nomad will by default prevent certain environment variables set in the client
process from being passed along into launched tasks.
The `CONSUL_HTTP_TOKEN`
environment variable has been added to the default list. More information can
be found in the `env.blacklist` [configuration](/docs/configuration/client#env-blacklist).

## Nomad 0.10.3

### mTLS Certificate Validation

Nomad 0.10.3 includes a fix for a privilege escalation vulnerability in
validating TLS certificates for RPC with mTLS. Nomad RPC endpoints validated
that TLS client certificates had not expired and were signed by the same CA as
the Nomad node, but did not correctly check the certificate's name for the role
and region as described in the [Securing Nomad with TLS][tls-guide] guide. This
allowed trusted operators with a client certificate signed by the CA to send RPC
calls as a Nomad client or server node, bypassing access control and accessing
any secrets available to a client.

Nomad clusters configured for mTLS following the [Securing Nomad with TLS][tls-guide]
guide or the [Vault PKI Secrets Engine Integration][tls-vault-guide] guide
should already have certificates that will pass validation. Before upgrading to
Nomad 0.10.3, operators using mTLS with `verify_server_hostname = true` should
confirm that the common name or SAN of all Nomad client node certs is
`client.<region>.nomad`, and that the common name or SAN of all Nomad server
node certs is `server.<region>.nomad`.

### Connection Limits Added

Nomad 0.10.3 introduces the [limits][limits] agent configuration parameters for
mitigating denial of service attacks from users who are not authenticated via
mTLS.
The default limits stanza is:

```hcl
limits {
  https_handshake_timeout   = "5s"
  http_max_conns_per_client = 100
  rpc_handshake_timeout     = "5s"
  rpc_max_conns_per_client  = 100
}
```

If your Nomad agent's endpoints are protected from unauthenticated users via
other mechanisms, these limits may be safely disabled by setting them to `0`.

However, the defaults were chosen to be safe for a wide variety of Nomad
deployments and may protect against accidental abuses of the Nomad API that
could cause unintended resource usage.

## Nomad 0.10.2

### Preemption Panic Fixed

Nomad 0.9.7 and 0.10.2 fix a [server crashing bug][gh-6787] present in
scheduler preemption since 0.9.0. Users unable to immediately upgrade Nomad can
[disable preemption][preemption-api] to avoid the panic.

### Dangling Docker Container Cleanup

Nomad 0.10.2 addresses an issue occurring in heavily loaded clients, where
containers are started without being properly managed by Nomad. Nomad 0.10.2
introduced a reaper that detects and kills such containers.

Operators may opt to run the reaper in a dry-run mode or disable it through the
client configuration.

For more information, see [Docker Dangling containers][dangling-containers].

## Nomad 0.10.0

### Deployments

Nomad 0.10 enables rolling deployments for service jobs by default
and adds a default update stanza when a service job is created or updated.
This does not affect jobs that already have an update stanza.

In pre-0.10 releases, when updating a service job without an update stanza,
all existing allocations are stopped while new allocations start up,
and this may cause a service degradation or an outage.
You can regain this behavior and disable deployments by setting `max_parallel` to 0.

For more information, see the [`update` stanza][update].
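
A minimal sketch of opting out of deployments as described above, assuming a
hypothetical service job named `web` (the rest of the job is omitted):

```hcl
job "web" { # hypothetical job name
  type = "service"

  # Setting max_parallel to 0 disables deployments and restores the
  # pre-0.10 update behavior for this job.
  update {
    max_parallel = 0
  }

  # datacenters, group, and task definitions omitted
}
```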

## Nomad 0.9.5

### Template Rendering

Nomad 0.9.5 includes security fixes for privilege escalation vulnerabilities in handling of job `template` stanzas:

- The client host's environment variables are now cleaned before rendering the template. If a template includes the `env` function, the job should include an [`env`](/docs/job-specification/env) stanza to allow access to the variable in the template.
- The `plugin` function is no longer permitted by default and will raise an error if used in a template. An operator can opt in to permitting this function with the new [`template.function_blacklist`](/docs/configuration/client#template-parameters) field in the client configuration.
- The `file` function has been changed to restrict paths to fall inside the task directory by default. Paths that used the `NOMAD_TASK_DIR` environment variable to prefix file paths should work unchanged. Relative paths or symlinks that point outside the task directory will raise an error. An operator can opt out of this protection with the new [`template.disable_file_sandbox`](/docs/configuration/client#template-parameters) field in the client configuration.

## Nomad 0.9.0

### Preemption

Nomad 0.9 adds preemption support for system jobs. If a system job is submitted
that has a higher priority than other running jobs on the node, and the node
does not have capacity remaining, Nomad may preempt those lower priority
allocations to place the system job. See [preemption][preemption] for more
details.

### Task Driver Plugins

All task drivers have become [plugins][plugins] in Nomad 0.9.0. There are two
user visible differences between 0.8 and 0.9 drivers:

- [LXC][lxc] is now community supported and distributed independently.
- Task driver [`config`][task-config] stanzas are no longer validated by
  the [`nomad job validate`][validate] command.
  This is a regression that will
  be fixed in a future release.

There is a new method for client driver configuration options, but existing
`client.options` settings are supported in 0.9. See [plugin
configuration][plugin-stanza] for details.

#### LXC

LXC is now an external plugin and must be installed separately. See [the LXC
driver's documentation][lxc] for details.

### Structured Logging

Nomad 0.9.0 switches to structured logging. Any log processing on the pre-0.9
log output will need to be updated to match the structured output.

Structured log lines have the format:

```
# <Timestamp> [<Level>] <Component>: <Message>: <KeyN>=<ValueN> ...

2019-01-29T05:52:09.221Z [INFO ] client.plugin: starting plugin manager: plugin-type=device
```

Values containing whitespace will be quoted:

```
... starting plugin: task=redis args="[/opt/gopath/bin/nomad logmon]"
```

### HCL2 Transition

Nomad 0.9.0 begins a transition to [HCL2][hcl2], the next version of the
HashiCorp configuration language. While Nomad has begun integrating HCL2,
users will need to continue to use HCL1 in Nomad 0.9.0 as the transition is
incomplete.

If you interpolate variables in your [`task.config`][task-config] containing
consecutive dots in their name, you will need to change your job specification
to use the `env` map.
See the following example:

```hcl
env {
  # Note the multiple consecutive dots
  image...version = "3.2"

  # Valid in both v0.8 and v0.9
  image.version = "3.2"
}

# v0.8 task config stanza:
task {
  driver = "docker"
  config {
    image = "redis:${image...version}"
  }
}

# v0.9 task config stanza:
task {
  driver = "docker"
  config {
    image = "redis:${env["image...version"]}"
  }
}
```

This only affects users who interpolate unusual variables with multiple
consecutive dots in their task `config` stanza. All other interpolation is
unchanged.

Since HCL2 uses dotted object notation for interpolation users should
transition away from variable names with multiple consecutive dots.

### Downgrading clients

Due to the large refactor of the Nomad client in 0.9, downgrading to a
previous version of the client after upgrading it to Nomad 0.9 is not supported.
To downgrade safely, users should erase the Nomad client's data directory.

## Nomad 0.8.0

### Raft Protocol Version Compatibility

When upgrading to Nomad 0.8.0 from a version lower than 0.7.0, users will need
to set the
[`raft_protocol`](/docs/configuration/server#raft_protocol) option
in their `server` stanza to 1 in order to maintain backwards compatibility with
the old servers during the upgrade. After the servers have been migrated to
version 0.8.0, `raft_protocol` can be moved up to 2 and the servers restarted
to match the default.

The Raft protocol must be stepped up in this way; only adjacent version numbers are
compatible (for example, version 1 cannot talk to version 3).
Here is a table of the
Raft Protocol versions supported by each Nomad version:

<table>
  <thead>
    <tr>
      <th>Version</th>
      <th>Supported Raft Protocols</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0.6 and earlier</td>
      <td>0</td>
    </tr>
    <tr>
      <td>0.7</td>
      <td>1</td>
    </tr>
    <tr>
      <td>0.8 and later</td>
      <td>1, 2, 3</td>
    </tr>
  </tbody>
</table>

In order to enable all [Autopilot](https://learn.hashicorp.com/nomad/operating-nomad/autopilot) features, all servers
in a Nomad cluster must be running with Raft protocol version 3 or later.

#### Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft Protocol 3 in Nomad 0.8 and higher. Raft protocol version 3 requires all servers to be running Nomad 0.8.0 or newer. See [Raft Protocol Version Compatibility](/docs/upgrade/upgrade-specific#raft-protocol-version-compatibility) for more details. Also, the format of `peers.json` used for outage recovery is different when running with the latest Raft protocol. See [Manual Recovery Using peers.json](https://learn.hashicorp.com/nomad/operating-nomad/outage#manual-recovery-using-peersjson) for a description of the required format.

Please note that the Raft protocol is different from Nomad's internal protocol as shown in commands like `nomad server members`. To see the version of the Raft protocol in use on each server, use the `nomad operator raft list-peers` command.

The easiest way to upgrade servers is to have each server leave the cluster, upgrade its `raft_protocol` version in the `server` stanza, and then add it back. Make sure the new server joins successfully and that the cluster is stable before rolling the upgrade forward to the next server. It's also possible to stand up a new set of servers, and then slowly stand down each of the older servers in a similar fashion.
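
As a sketch of the configuration change made on each server in turn, the
`server` stanza might look like this. The `bootstrap_expect` value is an assumed
cluster size used only for illustration.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3 # assumed cluster size, for illustration only

  # Step up one version at a time (1 -> 2 -> 3); only adjacent
  # Raft protocol versions are compatible with each other.
  raft_protocol = 3
}
```

After the server rejoins, `nomad operator raft list-peers` can be used to check
which Raft protocol version it reports before moving on to the next server.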

When using Raft protocol version 3, servers are identified by their `node-id` instead of their IP address when Nomad makes changes to its internal Raft quorum configuration. This means that once a cluster has been upgraded with servers all running Raft protocol version 3, it will no longer allow servers running any older Raft protocol versions to be added. If running a single Nomad server, restarting it in-place will result in that server not being able to elect itself as a leader. To avoid this, either set the Raft protocol back to 2, or use [Manual Recovery Using peers.json](https://learn.hashicorp.com/nomad/operating-nomad/outage#manual-recovery-using-peersjson) to map the server to its node ID in the Raft quorum configuration.

### Node Draining Improvements

Node draining via the [`node drain`][drain-cli] command or the [drain
API][drain-api] has been substantially changed in Nomad 0.8. In Nomad 0.7.1 and
earlier draining a node would immediately stop all allocations on the node
being drained. Nomad 0.8 now supports a [`migrate`][migrate] stanza in job
specifications to control how many allocations may be migrated at once and the
default will be used for existing jobs.

The `drain` command now blocks until the drain completes. To get the Nomad
0.7.1 and earlier drain behavior use the command: `nomad node drain -enable -force -detach <node-id>`

See the [`migrate` stanza documentation][migrate] and [Decommissioning Nodes
guide](https://learn.hashicorp.com/nomad/operating-nomad/node-draining) for details.
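
A minimal sketch of a `migrate` stanza, assuming a hypothetical job `web` with a
group `frontend`. The values shown are intended to mirror the documented
defaults; check them against the `migrate` stanza documentation for your Nomad
version before relying on them.

```hcl
job "web" { # hypothetical job name
  group "frontend" { # hypothetical group name
    migrate {
      # How many allocations of this group may be migrated at once.
      max_parallel = 1

      # How health is determined and how long to wait for it.
      health_check     = "checks"
      min_healthy_time = "10s"
      healthy_deadline = "5m"
    }

    # task definitions omitted
  }
}
```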

### Periods in Environment Variable Names No Longer Escaped

_Applications which expect periods in environment variable names to be replaced
with underscores must be updated._

In Nomad 0.7 periods (`.`) in environment variable names were replaced with an
underscore in both the [`env`](/docs/job-specification/env) and
[`template`](/docs/job-specification/template) stanzas.

In Nomad 0.8 periods are _not_ replaced and will be included in environment
variables verbatim.

For example the following stanza:

```text
env {
  registry.consul.addr = "${NOMAD_IP_http}:8500"
}
```

In Nomad 0.7 this would be exposed to the task as
`registry_consul_addr=127.0.0.1:8500`. In Nomad 0.8 it will now appear exactly
as specified: `registry.consul.addr=127.0.0.1:8500`.

### Client APIs Unavailable on Older Nodes

Because Nomad 0.8 uses a new RPC mechanism to route node-specific APIs like
[`nomad alloc fs`](/docs/commands/alloc/fs) through servers to the node,
0.8 CLIs cannot use these commands against clients older than 0.8.

To access these commands on older clients either continue to use a pre-0.8
version of the CLI, or upgrade all clients to 0.8.

### CLI Command Changes

Nomad 0.8 has changed the organization of CLI commands to be based on
subcommands. An example of this change is the change from `nomad alloc-status`
to `nomad alloc status`. All commands have been made to be backwards compatible,
but operators should update any usage of the old style commands to the new style
as the old style will be deprecated in future versions of Nomad.

### RPC Advertise Address

The behavior of the [advertised RPC
address](/docs/configuration#rpc-1) has changed to be only used
to advertise the RPC address of servers to client nodes. Server to server
communication is done using the advertised Serf address.
Existing clusters
should not be affected, but the advertised RPC address may need to be updated to
allow clients to connect over a NAT.

## Nomad 0.6.0

### Default `advertise` address changes

When no `advertise` address was specified and Nomad's `bind_addr` was loopback
or `0.0.0.0`, Nomad attempted to resolve the local hostname to use as an
advertise address.

Many hosts cannot properly resolve their hostname, so Nomad 0.6 defaults
`advertise` to the first private IP on the host (e.g. `10.1.2.3`).

If you manually configure `advertise` addresses no changes are necessary.

### Nomad Clients

The change to the default advertised IP also affects clients that do not specify
which `network_interface` to use. If you have several routable IPs, it is advised
to configure the client's [network
interface](/docs/configuration/client#network_interface)
such that tasks bind to the correct address.

## Nomad 0.5.5

### Docker `load` changes

Nomad 0.5.5 has a backward incompatible change in the `docker` driver's
configuration. Prior to 0.5.5 the `load` configuration option accepted a list
of images to load; in 0.5.5 it has been changed to a single string. No
functionality was changed. Even if more than one item was specified prior to
0.5.5 only the first item was used.

To do a zero-downtime deploy with jobs that use the `load` option:

- Upgrade servers to version 0.5.5 or later.

- Deploy new client nodes on the same version as the servers.

- Resubmit jobs with the `load` option fixed and a constraint to only run on
  version 0.5.5 or later:

  ```hcl
  constraint {
    attribute = "${attr.nomad.version}"
    operator  = "version"
    value     = ">= 0.5.5"
  }
  ```

- Drain and shut down old client nodes.
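
For the `load` change described in this section, a fixed task definition might
look like the following sketch. The task name, image, and archive file name are
hypothetical; only the change from a list to a single string comes from the text
above.

```hcl
task "cache" { # hypothetical task name
  driver = "docker"

  config {
    image = "redis:3.2" # hypothetical image

    # Before 0.5.5 (list form, only the first item was used):
    #   load = ["redis.tar"]
    # From 0.5.5 onward (single string):
    load = "redis.tar" # hypothetical archive name
  }
}
```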

### Validation changes

Due to internal job serialization and validation changes you may run into
issues using 0.5.5 command line tools such as `nomad run` and `nomad validate`
with 0.5.4 or earlier agents.

It is recommended you upgrade agents before or alongside your command line
tools.

## Nomad 0.4.0

Nomad 0.4.0 has backward incompatible changes in the logic for Consul
deregistration. When a Task which was started by Nomad v0.3.x is uncleanly shut
down, the Nomad 0.4 Client will no longer clean up any stale services. If an
in-place upgrade of the Nomad client to 0.4 prevents the Task from gracefully
shutting down and deregistering its Consul-registered services, the Nomad Client
will not clean up the remaining Consul services registered with the 0.3
Executor.

We recommend draining a node before upgrading to 0.4.0 and then re-enabling the
node once the upgrade is complete.

## Nomad 0.3.1

Nomad 0.3.1 removes artifact downloading from driver configurations and makes
artifacts a first-class element of the task. As such, jobs will have to be
rewritten in the proper format and resubmitted to Nomad. Nomad clients will
properly re-attach to existing tasks but job definitions must be updated before
they can be dispatched to clients running 0.3.1.

## Nomad 0.3.0

Nomad 0.3.0 has made several substantial changes to job files, including a new
`log` block and variable interpretation syntax (`${var}`), a modified `restart`
policy syntax, and minimum resources for tasks as well as validation. These
changes require a slight change to the default upgrade flow.

After upgrading the version of the servers, all previously submitted jobs must
be resubmitted with the updated job syntax using a Nomad 0.3.0 binary.

- All instances of `$var` must be converted to the new syntax of `${var}`.

- All tasks must provide their required resources for CPU, memory and disk as
  well as required network usage if ports are required by the task.

- Restart policies must be updated to indicate whether the task should restart
  on failure or fail, using `mode = "delay"` or `mode = "fail"` respectively.

- Service names that include periods will fail validation. To fix, remove any
  periods from the service name before running the job.

After updating the Servers and job files, Nomad Clients can be upgraded by first
draining the node so no tasks are running on it. This can be verified by running
`nomad node status <node-id>` and verifying that there are no tasks in the
`running` state. Once that is done the client can be killed, the `data_dir`
should be deleted and then Nomad 0.3.0 can be launched.

[dangling-containers]: /docs/drivers/docker#dangling-containers
[drain-api]: /api-docs/nodes#drain-node
[drain-cli]: /docs/commands/node/drain
[dst]: /docs/job-specification/periodic#daylight-saving-time
[gh-6787]: https://github.com/hashicorp/nomad/issues/6787
[hcl2]: https://github.com/hashicorp/hcl2
[limits]: /docs/configuration#limits
[lxc]: /docs/drivers/external/lxc
[migrate]: /docs/job-specification/migrate
[plugin-stanza]: /docs/configuration/plugin
[plugins]: /docs/drivers/external
[preemption-api]: /api-docs/operator#update-scheduler-configuration
[preemption]: /docs/internals/scheduling/preemption
[reserved]: /docs/configuration/client#reserved-parameters
[task-config]: /docs/job-specification/task#config
[tls-guide]: https://learn.hashicorp.com/nomad/transport-security/enable-tls
[tls-vault-guide]: https://learn.hashicorp.com/nomad/vault-integration/vault-pki-nomad
[update]: /docs/job-specification/update
[validate]: /docs/commands/job/validate
[vault_grace]: /docs/job-specification/template