# Upgrading Your Cluster

To keep your Kubernetes cluster up to date, and to take advantage of new features in Kismatic, you may upgrade an existing cluster that was previously stood up using Kismatic. The upgrade functionality is available through the `kismatic upgrade` command.

The upgrade process is applied to each node, one node at a time. If a private Docker registry is being used, the new container images will be pushed by Kismatic before the node upgrades begin.

Nodes in the cluster are upgraded in the following order:

1. Etcd nodes
2. Master nodes
3. Worker nodes (regardless of specialization)

Keep in mind that if a node has multiple roles, all of its components will be upgraded. For example, if a node is both an etcd node and a master node, both the etcd and Kubernetes master components will be upgraded in a single pass.

Cluster-level services are upgraded once all nodes have been successfully upgraded. Cluster services include the pod network provider (e.g. Calico), the Dashboard, cluster DNS, etc.

**Upgrade prerequisites**:
- The plan file used to install the existing cluster
- The generated assets directory ("generated")
- SSH access to the cluster nodes
- A cluster in a healthy state

## Supported Upgrade Paths
KET supports upgrades from the following source versions:
- Same minor version, any patch version. For example, KET supports an upgrade from v1.3.0 to v1.3.4.
- Previous minor version, last patch version. For example, KET supports an upgrade from v1.3.3 to v1.4.0, but it does not support an upgrade from v1.3.0 to v1.4.0.

## Quick Start
Here are some example commands to get you started with upgrading your Kubernetes cluster. We encourage you to read this document and understand the upgrade process before performing an upgrade.
```
# Run an offline upgrade
./kismatic upgrade offline

# Run the checks performed during an online upgrade, but don't actually upgrade the cluster
./kismatic upgrade online --dry-run

# Run an online upgrade
./kismatic upgrade online

# Run an online upgrade, and skip the checks that are known to be safe to ignore
./kismatic upgrade online --ignore-safety-checks
```

## Readiness
Before performing an upgrade, Kismatic ensures that the nodes are ready to be upgraded. The following checks are performed on each node to determine readiness:

1. Disk space: ensure that there is enough disk space on the root drive of the node.
2. Packages: when package installation is disabled, ensure that the new packages are already installed.

## Etcd upgrade
The etcd clusters should be backed up before performing an upgrade. Even though Kismatic will back up the clusters during an upgrade, it is recommended that you perform and maintain your own backups. If you don't have an automated backup solution in place, perform a manual backup of both the Kubernetes and networking etcd clusters before upgrading your cluster, and store the backup on persistent storage off cluster.

Kismatic will back up the etcd data before performing an upgrade. If necessary, you may find the backups in the following locations:

* Kubernetes etcd cluster: `/etc/etcd_k8s/backup/$timestamp`
* Networking etcd cluster: `/etc/etcd_networking/backup/$timestamp`

For safety reasons, Kismatic does not remove the backups after the cluster has been successfully upgraded.

## Online Upgrade
With the goal of preventing loss of workload data or availability, you might opt for an online upgrade. In this mode, Kismatic will run safety and availability checks (see table below) against the existing cluster before performing the upgrade.
If any unsafe condition is detected, a report will be printed, and the upgrade will not proceed.

Once all nodes are deemed ready for upgrade, Kismatic proceeds one node at a time. If the node under upgrade is a Kubernetes node, it is cordoned and drained of workloads before any changes are applied. To prevent workloads from being forcefully killed, it is important that they handle termination signals to perform any required cleanup. Once the node has been upgraded successfully, it is uncordoned and reinstated into the pool of available nodes.

To perform an online upgrade, use the `kismatic upgrade online` command.

### Safety
Safety is the first concern of upgrading Kubernetes. An unsafe upgrade is one that results in the loss of data or critical functionality, or the potential for such loss. For example, upgrading a node that hosts a pod which writes to an EmptyDir volume is considered unsafe.

### Availability
Availability is the second concern of upgrading Kubernetes. An upgrade interrupts **cluster availability** if it results in the loss of a global cluster function (such as removing the last master or ingress node, or breaking etcd quorum). An upgrade interrupts **workload availability** if it results in the reduction of a service to 0 active pods.

### Safety and Availability checks
The following table lists the conditions that are checked during an online upgrade, and the reason the upgrade is blocked if the condition is detected.

| Condition                                  | Reasoning                                                                 |
|--------------------------------------------|---------------------------------------------------------------------------|
| Pod not managed by RC, RS, Job, DS, or SS  | Potentially unsafe: unmanaged pod will not be rescheduled                 |
| Pods without peers (i.e. replicas = 1)     | Potentially unavailable: singleton pod will be unavailable during upgrade |
| DaemonSet scheduled on a single node       | Potentially unavailable: singleton pod will be unavailable during upgrade |
| Pod using an EmptyDir volume               | Potentially unsafe: pod will lose the data in this volume                 |
| Pod using a HostPath volume                | Potentially unsafe: pod will lose the data in this volume                 |
| Pod using a HostPath persistent volume     | Potentially unsafe: pod will lose the data in this volume                 |
| Etcd node in a cluster with < 3 etcd nodes | Unavailable: upgrading the etcd node will bring the cluster down          |
| Master node in a cluster with < 2 masters  | Unavailable: upgrading the master node will bring the control plane down  |
| Worker node in a cluster with < 2 workers  | Unavailable: upgrading the worker node will bring all workloads down      |
| Ingress node                               | Unavailable: we can't ensure that ingress nodes are load balanced         |
| Storage node                               | Potentially unavailable: brick on the node will become unavailable        |

### Ignoring Safety Checks
Flagged safety checks should usually be resolved before performing an online upgrade. There may be circumstances, however, in which failed checks cannot be resolved and can safely be ignored. For example, a workload using an EmptyDir volume as scratch space can be drained from a node, as it won't have any useful data in the EmptyDir.

Once all the resolvable safety checks are taken care of, you may want to ignore the remaining ones. To do so, pass the `--ignore-safety-checks` flag to the `kismatic upgrade online` command. The checks will still run, but they won't prevent the upgrade from running.

## Offline Upgrade
The offline upgrade is available for clusters in which safety and availability are not a concern. In this mode, the safety and availability checks will not be performed.
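Since no safety checks will protect your data in this mode, the manual etcd backup recommended earlier is especially worthwhile before an offline upgrade. The following is a minimal sketch using `etcdctl` with the v3 API; the endpoint and certificate paths are illustrative assumptions, not Kismatic defaults, so substitute the values from your own cluster:

```
# Minimal sketch of a manual etcd snapshot, run on an etcd node.
# Assumes etcdctl (v3 API) is installed; endpoint and cert paths
# below are examples only -- adjust them to match your cluster.
export ETCDCTL_API=3
etcdctl snapshot save /tmp/etcd-backup-$(date +%Y%m%d%H%M%S).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/path/to/ca.pem \
  --cert=/path/to/client.pem \
  --key=/path/to/client-key.pem

# Then copy the snapshot to persistent storage off the cluster.
```

Repeat the snapshot on the networking etcd cluster as well, and verify the snapshot files exist off-cluster before starting the upgrade.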
Performing an offline upgrade could result in the loss of critical data and reduced service availability. For this reason, this method should not be used for clusters that are housing production workloads.

To perform an offline upgrade, use the `kismatic upgrade offline` command.

## Partial Upgrade
Kismatic is able to perform a partial upgrade, in which the subset of nodes that reported readiness, safety, or availability problems is not upgraded. A partial upgrade can only be performed when all etcd and master nodes are ready for upgrading. In other words, a partial upgrade is not supported if any etcd or master node reports issues.

A partial upgrade can be useful when addressing these problems is not feasible. For example, one could upgrade most of the nodes with an online upgrade, and then schedule a downtime window for upgrading the rest of the nodes with an offline upgrade.

This mode can be enabled in both the online and offline upgrades by using the `--partial-ok` flag.

## Version-specific notes
The following list contains links to upgrade notes that are specific to a given Kismatic version.

- [Kismatic v1.3.0](./upgrade/v1.3.0)
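For reference, the partial upgrade mode described above is invoked with the same commands shown in the Quick Start, with the `--partial-ok` flag added:

```
# Run an online upgrade, skipping nodes that reported readiness,
# safety, or availability problems
./kismatic upgrade online --partial-ok

# Run an offline upgrade, skipping nodes that reported readiness problems
./kismatic upgrade offline --partial-ok
```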