---
layout: docs
page_title: 'Upgrading'
sidebar_current: 'guides-upgrade'
description: |-
  Learn how to upgrade Nomad.
---

# Upgrading

Nomad is designed to be flexible and resilient when upgrading from one Nomad
version to the next. Upgrades should cause neither a Nomad nor a service
outage. However, there are some restrictions to be aware of before upgrading:

- Nomad strives to be backward compatible for at least 1 point release, so
  Nomad v0.10 hosts work with v0.9 hosts. Upgrading 2 point releases (e.g., v0.8
  to v0.10) may work but is untested and unsupported.

  - Nomad does _not_ support downgrading at this time. Downgrading clients
    requires draining allocations and removing the [data directory][data_dir].
    Downgrading servers safely requires re-provisioning the cluster.

  - New features are unlikely to work correctly until all nodes have been
    upgraded.

  - Check the [version upgrade details page][upgrade-specific] for important
    changes and backward incompatibilities.

- When upgrading a Nomad Client, if it takes longer than the
  [`heartbeat_grace`][heartbeat_grace] (10s by default) period to restart, all
  allocations on that node may be rescheduled.

Nomad supports upgrading in place or by rolling in new servers:

- In Place: The Nomad binary can be updated on existing hosts. Running
  allocations will continue running uninterrupted.

- Rolling: New hosts containing the new Nomad version may be added followed by
  the removal of old hosts. The old nodes must be drained to migrate running
  allocations to the new nodes.

This guide describes both approaches.

## Upgrade Process

Once you have checked the [upgrade details for the new
version][upgrade-specific], the upgrade process is as simple as updating the
binary on each host and restarting the Nomad service.

At a high level, we complete the following steps to upgrade Nomad:

- **Add the new version**
- **Check cluster health**
- **Remove the old version**
- **Check cluster health**
- **Upgrade clients**

### 1. Add the new version to the existing cluster

While it is possible to upgrade Nomad client nodes before servers, this guide
recommends upgrading servers first as many new client features will not work
until servers are upgraded.

In a [federated cluster](https://learn.hashicorp.com/tutorials/nomad/federation),
new features are not guaranteed to work until all agents in a region and the
server nodes in the authoritative region are upgraded.

Whether you are replacing Nomad in place on existing systems or bringing up new
servers, you should make changes incrementally, verifying cluster health at each
step of the upgrade.

On a single server, install the new version of Nomad. You can do this by
joining a new server to the cluster or by replacing or upgrading the binary
locally and restarting the Nomad service.
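
As a minimal sketch, an in-place upgrade on a systemd-managed Linux host might
look like the following; the binary path, service name, and location of the
downloaded binary are assumptions that will vary with your installation:

```shell-session
$ sudo systemctl stop nomad
# Replace the existing binary with the new version you downloaded and unpacked.
$ sudo install -m 0755 ./nomad /usr/local/bin/nomad
$ sudo systemctl start nomad
$ nomad version
```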

### 2. Check cluster health

[Monitor the Nomad logs][monitor] on the remaining servers to check that the
new server has joined the cluster correctly.

Run `nomad agent-info` on the new servers and check that the `last_log_index`
is of a similar value to the other servers. This step ensures that changes have
been replicated to the new server.

```shell-session
ubuntu@nomad-server-10-1-1-4:~$ nomad agent-info
nomad
  bootstrap = false
  known_regions = 1
  leader = false
  server = true
raft
  applied_index = 53460
  commit_index = 53460
  fsm_pending = 0
  last_contact = 54.512216ms
  last_log_index = 53460
  last_log_term = 1
  last_snapshot_index = 49511
  last_snapshot_term = 1
  num_peers = 2
...
```

Continue the upgrade across the servers, making sure to upgrade a single Nomad
server at a time. You can check the state of the servers with
[`nomad server members`][server-members] and the state of the client nodes with
[`nomad node status`][node-status].
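
For example, partway through a rolling server upgrade, the member list might
look roughly like the following; the names, addresses, and versions are
illustrative, and the exact columns vary by Nomad version:

```shell-session
$ nomad server members
Name                   Address   Port  Status  Leader  Protocol  Build   Datacenter  Region
nomad-server-1.global  10.1.1.4  4648  alive   true    2         0.10.0  dc1         global
nomad-server-2.global  10.1.1.5  4648  alive   false   2         0.10.0  dc1         global
nomad-server-3.global  10.1.1.6  4648  alive   false   2         0.9.7   dc1         global
```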

### 3. Remove the old versions from servers

If you are doing an in-place upgrade on existing servers, this step is not
necessary because the version was changed in place.

If you are doing an upgrade by adding new servers and removing old servers
from the fleet, you need to ensure that each old server has left the fleet
safely.

1. Stop the service on the existing host.
2. On another server, run `nomad server members` and check the status. If
   the server is now in a `left` state, you are safe to continue.
3. If the server is not in a `left` state, run `nomad server force-leave <server id>`
   to remove the server from the cluster (see the example below).
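
For example, checking for and force-removing a server that did not leave
cleanly might look like this; the server name and address are illustrative, and
the exact columns of the output vary by Nomad version:

```shell-session
$ nomad server members
Name                     Address   Port  Status  Leader  ...
nomad-server-old.global  10.1.1.7  4648  failed  false   ...

$ nomad server force-leave nomad-server-old.global
```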

Monitor the logs of the other hosts in the Nomad cluster over this period.

### 4. Check cluster health

Repeat the actions from step 2 above to confirm cluster health.

### 5. Upgrade clients

Following the successful upgrade of the servers, you can now update your
clients using a similar process to the servers. You may either upgrade clients
in-place or start new nodes on the new version. See the [Workload Migration
Guide](https://learn.hashicorp.com/tutorials/nomad/node-drain) for instructions on how to migrate running
allocations from the old nodes to the new nodes with the [`nomad node drain`](/docs/commands/node/drain) command.
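
As a sketch, draining one of the old client nodes before removing it might look
like the following; the node ID is illustrative:

```shell-session
$ nomad node drain -enable -yes f7476465
```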

## Done

You are now running the latest Nomad version. You can verify that all
clients have joined by running `nomad node status` and checking that all
clients are in the `ready` state.
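
The output should look something like the following; the node IDs and names are
illustrative, and the columns may differ slightly between Nomad versions:

```shell-session
$ nomad node status
ID        DC   Name            Class   Drain  Eligibility  Status
f7476465  dc1  nomad-client-1  <none>  false  eligible     ready
9fc96fcc  dc1  nomad-client-2  <none>  false  eligible     ready
```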

## Upgrading to Nomad Enterprise

The process of upgrading to a Nomad Enterprise version is identical to upgrading
between versions of open source Nomad. The same guidance above should be
followed and, as always, check the [specific version
details](/docs/upgrade/upgrade-specific) page prior to starting the upgrade, as
some version differences may require specific steps.

[data_dir]: /docs/configuration#data_dir
[heartbeat_grace]: /docs/configuration/server#heartbeat_grace
[monitor]: /docs/commands/monitor
[node-status]: /docs/commands/node/status
[server-members]: /docs/commands/server/members
[upgrade-specific]: /docs/upgrade/upgrade-specific

## Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft Protocol 3. Raft
protocol version 3 requires Nomad running 0.8.0 or newer on all
servers in order to work. Raft protocol version 2 will be removed in
Nomad 1.4.0.

To see the version of the Raft protocol in use on each server, use the
`nomad operator raft list-peers` command.
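
For example, on a cluster that is still partly on Raft protocol 2, the output
might look roughly like this; the names, IDs, and addresses are illustrative:

```shell-session
$ nomad operator raft list-peers
Node                   ID                                    Address        State     Voter  RaftProtocol
nomad-server-1.global  0bd3b564-6a68-41b7-9b27-9f29a7f27a4e  10.1.1.4:4647  leader    true   3
nomad-server-2.global  10.1.1.5:4647                         10.1.1.5:4647  follower  true   2
nomad-server-3.global  10.1.1.6:4647                         10.1.1.6:4647  follower  true   2
```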

Note that the format of `peers.json` used for outage recovery is
different when running with the latest Raft protocol. See [Manual
Recovery Using
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
for a description of the required format.

When using Raft protocol version 3, servers are identified by their
`node-id` instead of their IP address when Nomad makes changes to its
internal Raft quorum configuration. This means that once a cluster has
been upgraded with servers all running Raft protocol version 3, it
will no longer allow servers running any older Raft protocol versions
to be added.

### Upgrading a Production Cluster to Raft Version 3

For production raft clusters with 3 or more members, the easiest way
to upgrade servers is to have each server leave the cluster, upgrade
its [`raft_protocol`] version in the `server` stanza (if upgrading to
a version lower than v1.3.0), and then add it back. Make sure the new
server joins successfully and that the cluster is stable before
rolling the upgrade forward to the next server. It's also possible to
stand up a new set of servers, and then slowly stand down each of the
older servers in a similar fashion.

For in-place raft protocol upgrades, perform the following for each
server, leaving the leader until last to reduce the chance of leader
elections that will slow down the process (a sketch of the full
sequence follows this list):

* Stop the server.
* Run `nomad server force-leave $server_name`.
* If the upgrade is for a Nomad version lower than v1.3.0, update the
  [`raft_protocol`] in the server's configuration file to `3`.
* Restart the server.
* Run `nomad operator raft list-peers` to verify that the
  `RaftProtocol` for the server is now `3`.
* On the server, run `nomad agent-info` and check that the
  `last_log_index` is of a similar value to the other servers. This
  step ensures that raft is healthy and changes are replicating to the
  new server.
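
A sketch of this sequence on one server might look like the following, assuming
a systemd-managed service and that the remaining servers are available to
answer CLI requests:

```shell-session
# On the server being upgraded:
$ sudo systemctl stop nomad

# From a host that can reach one of the remaining servers:
$ nomad server force-leave nomad-server-2.global

# For Nomad versions below v1.3.0, set raft_protocol = 3 in the server
# stanza of the configuration file before restarting.
$ sudo systemctl start nomad

# Verify the protocol version and raft health:
$ nomad operator raft list-peers
$ nomad agent-info | grep last_log_index
```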

### Upgrading a Single Server Cluster to Raft Version 3

If you are running a single Nomad server, restarting it in place will
result in that server not being able to elect itself as a leader. To
avoid this, create a new [`raft.peers`][peers-json] file before
restarting the server with the new configuration. If you have `jq`
installed, you can run the following script on the server's host to
write the correct `raft.peers` file:

```bash
#!/usr/bin/env bash

# Discover the data directory and RPC address from the still-running agent,
# and read this server's raft node ID from the data directory.
NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")

# Write a single-entry peers.json naming this server as the only voter.
cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
[
  {
    "id": "$NODE_ID",
    "address": "$NOMAD_ADDR",
    "non_voter": false
  }
]
EOF
```

After running this script, if the upgrade is for a Nomad version lower
than v1.3.0, update the [`raft_protocol`] in the server's
configuration to `3` and restart the server.

[peers-json]: https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson
[`raft_protocol`]: /docs/configuration/server#raft_protocol