---
layout: docs
page_title: 'Upgrading'
sidebar_current: 'guides-upgrade'
description: |-
  Learn how to upgrade Nomad.
---

# Upgrading

Nomad is designed to be flexible and resilient when upgrading from one Nomad
version to the next. Upgrades should cause neither a Nomad nor a service
outage. However, there are some restrictions to be aware of before upgrading:

- Nomad strives to be backward compatible for at least 1 point release, so
  Nomad v0.10 hosts work with v0.9 hosts. Upgrading 2 point releases (e.g.,
  v0.8 to v0.10) may work but is untested and unsupported.

- Nomad does _not_ support downgrading at this time. Downgrading clients
  requires draining allocations and removing the [data directory][data_dir].
  Downgrading servers safely requires re-provisioning the cluster.

- New features are unlikely to work correctly until all nodes have been
  upgraded.

- Check the [version upgrade details page][upgrade-specific] for important
  changes and backward incompatibilities.

- When upgrading a Nomad client, if it takes longer than the
  [`heartbeat_grace`][heartbeat_grace] period (10s by default) to restart,
  all allocations on that node may be rescheduled.

Nomad supports upgrading in place or by rolling in new servers:

- In place: The Nomad binary can be updated on existing hosts. Running
  allocations will continue running uninterrupted.

- Rolling: New hosts containing the new Nomad version may be added, followed
  by the removal of old hosts. The old nodes must be drained to migrate
  running allocations to the new nodes.

This guide describes both approaches.
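
The one-point-release rule above can be checked with a short script before you
begin. This is a minimal sketch that only compares minor version numbers; the
`check_skew` helper and the version strings are hypothetical examples, not a
Nomad-provided tool:

```shell
#!/usr/bin/env bash
# Sketch: verify an upgrade spans at most one point release, per the
# compatibility guarantee above. Version strings here are examples only.
check_skew() {
  local old_minor new_minor
  old_minor=$(echo "$1" | cut -d. -f2)
  new_minor=$(echo "$2" | cut -d. -f2)
  if [ $(( new_minor - old_minor )) -le 1 ]; then
    echo "supported"
  else
    echo "unsupported"
  fi
}

check_skew "1.3.8" "1.4.3"   # one point release apart: prints "supported"
check_skew "0.8.7" "0.10.5"  # two releases apart: prints "unsupported"
```

If a jump is unsupported, upgrade through the intermediate release instead.
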
## Upgrade Process

Once you have checked the [upgrade details for the new
version][upgrade-specific], the upgrade process is as simple as updating the
binary on each host and restarting the Nomad service.

At a high level, we complete the following steps to upgrade Nomad:

- **Add the new version**
- **Check cluster health**
- **Remove the old version**
- **Check cluster health**
- **Upgrade clients**

### 1. Add the new version to the existing cluster

While it is possible to upgrade Nomad client nodes before servers, this guide
recommends upgrading servers first, as many new client features will not work
until the servers are upgraded.

In a [federated cluster](https://learn.hashicorp.com/tutorials/nomad/federation),
new features are not guaranteed to work until all agents in a region and the
server nodes in the authoritative region are upgraded.

Whether you are replacing Nomad in place on existing systems or bringing up
new servers, you should make changes incrementally, verifying cluster health
at each step of the upgrade.

On a single server, install the new version of Nomad. You can do this by
joining a new server to the cluster or by replacing or upgrading the binary
locally and restarting the Nomad service.

### 2. Check cluster health

[Monitor the Nomad logs][monitor] on the remaining servers to check that the
new server has joined the cluster correctly.

Run `nomad agent-info` on the new server and check that the `last_log_index`
is of a similar value to the other servers. This step ensures that changes
have been replicated to the new server.
```shell-session
ubuntu@nomad-server-10-1-1-4:~$ nomad agent-info
nomad
  bootstrap = false
  known_regions = 1
  leader = false
  server = true
raft
  applied_index = 53460
  commit_index = 53460
  fsm_pending = 0
  last_contact = 54.512216ms
  last_log_index = 53460
  last_log_term = 1
  last_snapshot_index = 49511
  last_snapshot_term = 1
  num_peers = 2
...
```

Continue with the upgrades across the servers, making sure to upgrade a
single Nomad server at a time. You can check the state of the servers with
[`nomad server members`][server-members], and the state of the client nodes
with [`nomad node status`][node-status].

### 3. Remove the old versions from servers

If you are doing an in-place upgrade on existing servers, this step is not
necessary, as the version was changed in place.

If you are doing an upgrade by adding new servers and removing old servers
from the fleet, you need to ensure that each old server has left the fleet
safely.

1. Stop the service on the existing host.
2. On another server, run `nomad server members` and check the status. If the
   server is now in a `left` state, you are safe to continue.
3. If the server is not in a `left` state, run `nomad server force-leave
   <server id>` to remove the server from the cluster.

Monitor the logs of the other hosts in the Nomad cluster over this period.

### 4. Check cluster health

Repeat the checks from step 2 above to confirm cluster health.

### 5. Upgrade clients

Following the successful upgrade of the servers, you can now update your
clients using a process similar to the one used for the servers. You may
either upgrade clients in place or start new nodes on the new version.
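
As a rough sketch, a rolling client upgrade drains each old node before its
binary is replaced. The dry-run loop below only prints the commands it would
run; the `plan_client_upgrade` helper, the node IDs, and the `-deadline`
value are hypothetical and should be adapted to your workloads:

```shell
#!/usr/bin/env bash
# Dry run of a rolling client upgrade: prints, rather than executes, the
# drain commands for each node. Node IDs and the deadline are examples.
plan_client_upgrade() {
  for node_id in "$@"; do
    echo "nomad node drain -enable -deadline 10m -yes $node_id"
    echo "# upgrade the Nomad binary on $node_id, then:"
    echo "nomad node drain -disable -yes $node_id"
  done
}

plan_client_upgrade 1f3a9c2b 7e4d0a66
```
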
See the [Workload Migration
Guide](https://learn.hashicorp.com/tutorials/nomad/node-drain) for
instructions on how to migrate running allocations from the old nodes to the
new nodes with the [`nomad node drain`](/docs/commands/node/drain) command.

## Done

You are now running the latest Nomad version. You can verify that all clients
have joined by running `nomad node status` and checking that all the clients
are in a `ready` state.

## Upgrading to Nomad Enterprise

The process of upgrading to a Nomad Enterprise version is identical to
upgrading between versions of open source Nomad. Follow the same guidance
above, and as always, check the [specific version
details](/docs/upgrade/upgrade-specific) page before starting the upgrade, as
some version differences may require specific steps.

[data_dir]: /docs/configuration#data_dir
[heartbeat_grace]: /docs/configuration/server#heartbeat_grace
[monitor]: /docs/commands/monitor
[node-status]: /docs/commands/node/status
[server-members]: /docs/commands/server/members
[upgrade-specific]: /docs/upgrade/upgrade-specific

## Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft Protocol 3. Raft protocol
version 3 requires all servers to be running Nomad 0.8.0 or newer. Raft
protocol version 2 will be removed in Nomad 1.4.0.

To see the version of the Raft protocol in use on each server, use the
`nomad operator raft list-peers` command.

Note that the format of `peers.json` used for outage recovery is different
when running with the latest Raft protocol. See [Manual Recovery Using
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
for a description of the required format.
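
For example, servers still on an older protocol can be spotted by filtering
the `RaftProtocol` column of `nomad operator raft list-peers` output. The
sample rows below are illustrative only (server names, IDs, and addresses are
hypothetical):

```shell
#!/usr/bin/env bash
# Filter a sample of `nomad operator raft list-peers` output for servers
# not yet on Raft protocol 3. The sample rows are hypothetical.
sample='Node                   ID        Address        State     Voter  RaftProtocol
nomad-server-1.global  4648c4a1  10.1.1.4:4647  leader    true   3
nomad-server-2.global  9c5f1b22  10.1.1.5:4647  follower  true   2'

# Skip the header row, then print any server whose protocol is not 3.
echo "$sample" | awk 'NR > 1 && $6 != 3 { print $1 " is still on Raft protocol " $6 }'
# -> nomad-server-2.global is still on Raft protocol 2
```
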
When using Raft protocol version 3, servers are identified by their `node-id`
instead of their IP address when Nomad makes changes to its internal Raft
quorum configuration. This means that once a cluster has been upgraded with
servers all running Raft protocol version 3, it will no longer allow servers
running any older Raft protocol versions to be added.

### Upgrading a Production Cluster to Raft Version 3

For production Raft clusters with 3 or more members, the easiest way to
upgrade servers is to have each server leave the cluster, upgrade its
[`raft_protocol`] version in the `server` stanza (if upgrading to a version
lower than v1.3.0), and then add it back. Make sure the new server joins
successfully and that the cluster is stable before rolling the upgrade
forward to the next server. It's also possible to stand up a new set of
servers, and then slowly stand down each of the older servers in a similar
fashion.

For in-place Raft protocol upgrades, perform the following for each server,
leaving the leader until last to reduce the chance of leader elections that
will slow down the process:

* Stop the server.
* Run `nomad server force-leave $server_name`.
* If the upgrade is for a Nomad version lower than v1.3.0, update the
  [`raft_protocol`] in the server's configuration file to `3`.
* Restart the server.
* Run `nomad operator raft list-peers` to verify that the `RaftProtocol` for
  the server is now `3`.
* On the server, run `nomad agent-info` and check that the `last_log_index`
  is of a similar value to the other servers. This step ensures that Raft is
  healthy and changes are replicating to the new server.

### Upgrading a Single Server Cluster to Raft Version 3

If you are running a single Nomad server, restarting it in place will result
in that server not being able to elect itself as a leader.
To avoid this, create a new [`raft.peers`][peers-json] file before restarting
the server with the new configuration. If you have `jq` installed, you can
run the following script on the server's host to write the correct
`raft.peers` file:

```shell
#!/usr/bin/env bash

NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")

cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
[
  {
    "id": "$NODE_ID",
    "address": "$NOMAD_ADDR",
    "non_voter": false
  }
]
EOF
```

After running this script, if the upgrade is for a Nomad version lower than
v1.3.0, update the [`raft_protocol`] in the server's configuration to `3` and
restart the server.

[peers-json]: https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson
[`raft_protocol`]: /docs/configuration/server#raft_protocol