---
layout: docs
page_title: 'Upgrading'
sidebar_current: 'guides-upgrade'
description: |-
  Learn how to upgrade Nomad.
---

# Upgrading

Nomad is designed to be flexible and resilient when upgrading from one Nomad
version to the next. Upgrades should cause neither a Nomad nor a service
outage. However, there are some restrictions to be aware of before upgrading:

- Nomad strives to be backward compatible for at least 1 point release, so
  Nomad v0.10 hosts work with v0.9 hosts. Upgrading 2 point releases (e.g., v0.8
  to v0.10) may work but is untested and unsupported.

  - Nomad does _not_ support downgrading at this time. Downgrading clients
    requires draining allocations and removing the [data directory][data_dir].
    Downgrading servers safely requires re-provisioning the cluster.

  - New features are unlikely to work correctly until all nodes have been
    upgraded.

  - Check the [version upgrade details page][upgrade-specific] for important
    changes and backward incompatibilities.

- When upgrading a Nomad Client, if it takes longer than the
  [`heartbeat_grace`][heartbeat_grace] (10s by default) period to restart, all
  allocations on that node may be rescheduled.

Nomad supports upgrading in place or by rolling in new servers:

- In Place: The Nomad binary can be updated on existing hosts. Running
  allocations will continue running uninterrupted.

- Rolling: New hosts containing the new Nomad version may be added followed by
  the removal of old hosts. The old nodes must be drained to migrate running
  allocations to the new nodes.

This guide describes both approaches.

## Upgrade Process

Once you have checked the [upgrade details for the new
version][upgrade-specific], the upgrade process is as simple as updating the
binary on each host and restarting the Nomad service.

At a high level, we complete the following steps to upgrade Nomad:

- **Add the new version**
- **Check cluster health**
- **Remove the old version**
- **Check cluster health**
- **Upgrade clients**

### 1. Add the new version to the existing cluster

While it is possible to upgrade Nomad client nodes before servers, this guide
recommends upgrading servers first as many new client features will not work
until servers are upgraded.

In a [federated cluster](https://learn.hashicorp.com/tutorials/nomad/federation),
new features are not guaranteed to work until all agents in a region and the
server nodes in the authoritative region are upgraded.

Whether you are replacing Nomad in place on existing systems or bringing up new
servers, you should make changes incrementally, verifying cluster health at each
step of the upgrade.

On a single server, install the new version of Nomad. You can do this by
joining a new server to the cluster or by replacing or upgrading the binary
locally and restarting the Nomad service.
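
As a minimal sketch, an in-place upgrade on a systemd-managed Linux host might
look like the following; the binary path, service name, and location of the
downloaded binary are assumptions that will vary with your installation:

```shell-session
$ sudo systemctl stop nomad
# Replace the existing binary with the new version you downloaded and unpacked.
$ sudo install -m 0755 ./nomad /usr/local/bin/nomad
$ sudo systemctl start nomad
$ nomad version
```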

### 2. Check cluster health

[Monitor the Nomad logs][monitor] on the remaining servers to check that the
new server has joined the cluster correctly.

Run `nomad agent-info` on the new servers and check that the `last_log_index`
is of a similar value to the other servers. This step ensures that changes have
been replicated to the new server.

```shell-session
ubuntu@nomad-server-10-1-1-4:~$ nomad agent-info
nomad
  bootstrap = false
  known_regions = 1
  leader = false
  server = true
raft
  applied_index = 53460
  commit_index = 53460
  fsm_pending = 0
  last_contact = 54.512216ms
  last_log_index = 53460
  last_log_term = 1
  last_snapshot_index = 49511
  last_snapshot_term = 1
  num_peers = 2
...
```

Continue the upgrade across the servers, making sure to upgrade a single Nomad
server at a time. You can check the state of the servers with
[`nomad server members`][server-members] and the state of the client nodes with
[`nomad node status`][node-status].
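
For example, partway through a rolling server upgrade, the member list might
look roughly like the following; the names, addresses, and versions are
illustrative, and the exact columns vary by Nomad version:

```shell-session
$ nomad server members
Name                   Address   Port  Status  Leader  Protocol  Build   Datacenter  Region
nomad-server-1.global  10.1.1.4  4648  alive   true    2         0.10.0  dc1         global
nomad-server-2.global  10.1.1.5  4648  alive   false   2         0.10.0  dc1         global
nomad-server-3.global  10.1.1.6  4648  alive   false   2         0.9.7   dc1         global
```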

### 3. Remove the old versions from servers

If you are doing an in-place upgrade on existing servers, this step is not
necessary because the version was changed in place.

If you are doing an upgrade by adding new servers and removing old servers
from the fleet, you need to ensure that each old server has left the fleet
safely.

1. Stop the service on the existing host.
2. On another server, run `nomad server members` and check the status. If
   the server is now in a `left` state, you are safe to continue.
3. If the server is not in a `left` state, run `nomad server force-leave <server id>`
   to remove the server from the cluster (see the example below).
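
For example, checking for and force-removing a server that did not leave
cleanly might look like this; the server name and address are illustrative, and
the exact columns of the output vary by Nomad version:

```shell-session
$ nomad server members
Name                     Address   Port  Status  Leader  ...
nomad-server-old.global  10.1.1.7  4648  failed  false   ...

$ nomad server force-leave nomad-server-old.global
```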

Monitor the logs of the other hosts in the Nomad cluster over this period.

### 4. Check cluster health

Repeat the actions from step 2 above to confirm cluster health.

### 5. Upgrade clients

Following the successful upgrade of the servers, you can now update your
clients using a similar process to the servers. You may either upgrade clients
in-place or start new nodes on the new version. See the [Workload Migration
Guide](https://learn.hashicorp.com/tutorials/nomad/node-drain) for instructions on how to migrate running
allocations from the old nodes to the new nodes with the [`nomad node drain`](/docs/commands/node/drain) command.
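
As a sketch, draining one of the old client nodes before removing it might look
like the following; the node ID is illustrative:

```shell-session
$ nomad node drain -enable -yes f7476465
```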

## Done

You are now running the latest Nomad version. You can verify that all
clients have joined by running `nomad node status` and checking that all
clients are in the `ready` state.
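
The output should look something like the following; the node IDs and names are
illustrative, and the columns may differ slightly between Nomad versions:

```shell-session
$ nomad node status
ID        DC   Name            Class   Drain  Eligibility  Status
f7476465  dc1  nomad-client-1  <none>  false  eligible     ready
9fc96fcc  dc1  nomad-client-2  <none>  false  eligible     ready
```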

## Upgrading to Nomad Enterprise

The process of upgrading to a Nomad Enterprise version is identical to upgrading
between versions of open source Nomad. The same guidance above should be
followed and, as always, check the [specific version
details](/docs/upgrade/upgrade-specific) page prior to starting the upgrade, as
some version differences may require specific steps.

[data_dir]: /docs/configuration#data_dir
[heartbeat_grace]: /docs/configuration/server#heartbeat_grace
[monitor]: /docs/commands/monitor
[node-status]: /docs/commands/node/status
[server-members]: /docs/commands/server/members
[upgrade-specific]: /docs/upgrade/upgrade-specific

## Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft Protocol 3. Raft
protocol version 3 requires Nomad running 0.8.0 or newer on all
servers in order to work. Raft protocol version 2 will be removed in
Nomad 1.4.0.

To see the version of the Raft protocol in use on each server, use the
`nomad operator raft list-peers` command.
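
For example, on a cluster that is still partly on Raft protocol 2, the output
might look roughly like this; the names, IDs, and addresses are illustrative:

```shell-session
$ nomad operator raft list-peers
Node                   ID                                    Address        State     Voter  RaftProtocol
nomad-server-1.global  0bd3b564-6a68-41b7-9b27-9f29a7f27a4e  10.1.1.4:4647  leader    true   3
nomad-server-2.global  10.1.1.5:4647                         10.1.1.5:4647  follower  true   2
nomad-server-3.global  10.1.1.6:4647                         10.1.1.6:4647  follower  true   2
```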

Note that the format of `peers.json` used for outage recovery is
different when running with the latest Raft protocol. See [Manual
Recovery Using
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
for a description of the required format.

When using Raft protocol version 3, servers are identified by their
`node-id` instead of their IP address when Nomad makes changes to its
internal Raft quorum configuration. This means that once a cluster has
been upgraded with servers all running Raft protocol version 3, it
will no longer allow servers running any older Raft protocol versions
to be added.

### Upgrading a Production Cluster to Raft Version 3

For production raft clusters with 3 or more members, the easiest way
to upgrade servers is to have each server leave the cluster, upgrade
its [`raft_protocol`] version in the `server` stanza (if upgrading to
a version lower than v1.3.0), and then add it back. Make sure the new
server joins successfully and that the cluster is stable before
rolling the upgrade forward to the next server. It's also possible to
stand up a new set of servers, and then slowly stand down each of the
older servers in a similar fashion.

For in-place raft protocol upgrades, perform the following for each
server, leaving the leader until last to reduce the chance of leader
elections that will slow down the process (a sketch of the full
sequence follows this list):

* Stop the server.
* Run `nomad server force-leave $server_name`.
* If the upgrade is for a Nomad version lower than v1.3.0, update the
  [`raft_protocol`] in the server's configuration file to `3`.
* Restart the server.
* Run `nomad operator raft list-peers` to verify that the
  `RaftProtocol` for the server is now `3`.
* On the server, run `nomad agent-info` and check that the
  `last_log_index` is of a similar value to the other servers. This
  step ensures that raft is healthy and changes are replicating to the
  new server.
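
A sketch of this sequence on one server might look like the following, assuming
a systemd-managed service and that the remaining servers are available to
answer CLI requests:

```shell-session
# On the server being upgraded:
$ sudo systemctl stop nomad

# From a host that can reach one of the remaining servers:
$ nomad server force-leave nomad-server-2.global

# For Nomad versions below v1.3.0, set raft_protocol = 3 in the server
# stanza of the configuration file before restarting.
$ sudo systemctl start nomad

# Verify the protocol version and raft health:
$ nomad operator raft list-peers
$ nomad agent-info | grep last_log_index
```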

### Upgrading a Single Server Cluster to Raft Version 3

If you are running a single Nomad server, restarting it in place will
result in that server not being able to elect itself as a leader. To
avoid this, create a new [`raft.peers`][peers-json] file before
restarting the server with the new configuration. If you have `jq`
installed, you can run the following script on the server's host to
write the correct `raft.peers` file:

```bash
#!/usr/bin/env bash

# Discover the data directory and RPC address from the still-running agent,
# and read this server's raft node ID from the data directory.
NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")

# Write a single-entry peers.json naming this server as the only voter.
cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
[
  {
    "id": "$NODE_ID",
    "address": "$NOMAD_ADDR",
    "non_voter": false
  }
]
EOF
```

After running this script, if the upgrade is for a Nomad version lower
than v1.3.0, update the [`raft_protocol`] in the server's
configuration to `3` and restart the server.

[peers-json]: https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson
[`raft_protocol`]: /docs/configuration/server#raft_protocol