# Upgrading Your Cluster

In order to keep your Kubernetes cluster up to date, and to take advantage of new
features in Kismatic, you may upgrade an existing cluster that was previously
stood up using Kismatic. The upgrade functionality is available through the
`kismatic upgrade` command.

The upgrade is applied to one node at a time. If a private Docker registry
is being used, Kismatic will push the new container images to it before any
nodes are upgraded.
Nodes in the cluster are upgraded in the following order:

1. Etcd nodes
2. Master nodes
3. Worker nodes (regardless of specialization)
Keep in mind that if a node has multiple roles, all of its components are upgraded
in a single pass. For example, when etcd nodes are being upgraded and a node is both
an etcd node and a master node, both the etcd and the Kubernetes master components
on that node are upgraded at the same time.

Cluster-level services are upgraded once all nodes have been successfully upgraded.
Cluster services include the pod network provider (e.g. Calico), the Dashboard, cluster DNS, etc.
**Upgrade prerequisites**:
- Plan file used to install the existing cluster
- Generated assets directory ("generated")
- SSH access to the cluster nodes
- Cluster in a healthy state (see the example check below)

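A quick way to verify that the cluster is healthy before starting is to check node and pod status with `kubectl`. The kubeconfig path used below is an assumption; point it at whatever kubeconfig you use to reach the cluster.
```
# Illustrative pre-upgrade health check; adjust the kubeconfig path for your environment
kubectl --kubeconfig generated/kubeconfig get nodes
kubectl --kubeconfig generated/kubeconfig get pods --all-namespaces
```
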
## Supported Upgrade Paths
KET supports upgrades from the following source versions:
- Same minor version, any patch version. For example, KET supports an upgrade from v1.3.0 to v1.3.4.
- Previous minor version, last patch version. For example, KET supports an upgrade from v1.3.3 to v1.4.0, but it does not support an upgrade from v1.3.0 to v1.4.0.

## Quick Start
Here are some example commands to get you started with upgrading your Kubernetes cluster. We encourage you to read this doc and understand the upgrade process before performing an upgrade.
```
# Run an offline upgrade
./kismatic upgrade offline

# Run the checks performed during an online upgrade, but don't actually upgrade my cluster
./kismatic upgrade online --dry-run

# Run an online upgrade
./kismatic upgrade online

# Run an online upgrade, and skip the checks that I know are safe to ignore
./kismatic upgrade online --ignore-safety-checks
```

## Readiness
Before performing an upgrade, Kismatic ensures that the nodes are ready to be upgraded.
The following checks are performed on each node to determine readiness:

1. Disk space: Ensure that there is enough disk space on the root drive of the node (see the example below).
2. Packages: When package installation is disabled, ensure that the new packages are already installed on the node.

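If you want to check the disk-space condition yourself ahead of time, a plain `df` against the root drive of each node shows the available space; the exact threshold Kismatic applies is not reproduced here.
```
# Manually check free space on a node's root drive before the upgrade (illustrative)
df -h /
```
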
## Etcd upgrade
The etcd clusters should be backed up before performing an upgrade. Even though Kismatic will
back up the clusters during an upgrade, it is recommended that you perform and maintain your own backups.
If you don't have an automated backup solution in place, it is recommended that you perform a manual backup of
both the Kubernetes and networking etcd clusters before upgrading your cluster, and store
the backups on persistent storage off the cluster.

Kismatic will back up the etcd data before performing an upgrade. If necessary, you may find the
backups in the following locations:

* Kubernetes etcd cluster: `/etc/etcd_k8s/backup/$timestamp`
* Networking etcd cluster: `/etc/etcd_networking/backup/$timestamp`

For safety reasons, Kismatic does not remove the backups after the cluster has been
successfully upgraded.

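As a sketch of what a manual backup might look like, the example below takes a snapshot of the Kubernetes etcd cluster using the etcd v3 API. The endpoint, certificate paths, and destination file are assumptions; substitute the values that match your cluster before running anything.
```
# Illustrative manual snapshot of the Kubernetes etcd cluster (etcd v3 API).
# The endpoint, certificate paths, and output file below are assumptions.
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /path/to/etcd-ca.pem \
  --cert /path/to/etcd-client.pem \
  --key /path/to/etcd-client-key.pem \
  snapshot save /tmp/etcd-k8s-snapshot.db
```
Copy the resulting snapshot file to persistent storage that lives off the cluster.
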
## Online Upgrade
With the goal of preventing loss of workload data or availability, you might opt for
an online upgrade. In this mode, Kismatic will run safety and availability checks (see the table below) against the
existing cluster before performing the upgrade. If any unsafe condition is detected, a report will
be printed, and the upgrade will not proceed.

Once all nodes are deemed ready for upgrade, the upgrade proceeds one node at a time.
If the node under upgrade is a Kubernetes node, it is cordoned and drained of workloads
before any changes are applied. To prevent workloads from being forcefully killed,
it is important that they handle termination signals and perform any required clean-up.
Once the node has been upgraded successfully, it is uncordoned and reinstated to the pool
of available nodes.

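The per-node behavior is roughly what you would get by cordoning and draining the node by hand with `kubectl`; the node name below is hypothetical, and Kismatic performs these steps for you during the upgrade.
```
# Roughly equivalent manual steps for a single Kubernetes node (node name is hypothetical)
kubectl cordon worker-1                     # stop scheduling new pods onto the node
kubectl drain worker-1 --ignore-daemonsets  # evict the pods currently running on it
# ... node components are upgraded ...
kubectl uncordon worker-1                   # return the node to the schedulable pool
```
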
To perform an online upgrade, use the `kismatic upgrade online` command.

### Safety
Safety is the first concern when upgrading Kubernetes. An unsafe upgrade is one that results in
loss of data or critical functionality, or the potential for such a loss.
For example, upgrading a node that hosts a pod which writes to an EmptyDir volume is considered unsafe.

### Availability
Availability is the second concern when upgrading Kubernetes. An upgrade interrupts
**cluster availability** if it results in the loss of a global cluster function
(such as taking down the last master or ingress node, or breaking etcd quorum). An upgrade
interrupts **workload availability** if it results in a service being reduced
to 0 active pods.

### Safety and Availability checks
The following table lists the conditions that are checked during an online upgrade, and the reason
why the upgrade is blocked if the condition is detected.

| Condition                                  | Reasoning                                                                  |
|--------------------------------------------|----------------------------------------------------------------------------|
| Pod not managed by RC, RS, Job, DS, or SS  | Potentially unsafe: unmanaged pod will not be rescheduled                  |
| Pods without peers (i.e. replicas = 1)     | Potentially unavailable: singleton pod will be unavailable during upgrade  |
| DaemonSet scheduled on a single node       | Potentially unavailable: singleton pod will be unavailable during upgrade  |
| Pod using an EmptyDir volume               | Potentially unsafe: pod will lose the data in this volume                  |
| Pod using a HostPath volume                | Potentially unsafe: pod will lose the data in this volume                  |
| Pod using a HostPath persistent volume     | Potentially unsafe: pod will lose the data in this volume                  |
| Etcd node in a cluster with < 3 etcds      | Unavailable: upgrading the etcd node will bring the cluster down           |
| Master node in a cluster with < 2 masters  | Unavailable: upgrading the master node will bring the control plane down   |
| Worker node in a cluster with < 2 workers  | Unavailable: upgrading the worker node will bring all workloads down       |
| Ingress node                               | Unavailable: we can't ensure that ingress nodes are load balanced          |
| Storage node                               | Potentially unavailable: brick on the node will become unavailable         |

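Many of these conditions can be spotted ahead of time with `kubectl`. For example, a quick look at pod owners and replica counts will surface unmanaged pods and singleton workloads before you start the upgrade; the commands below are illustrative and read-only.
```
# Illustrative read-only checks before an online upgrade
kubectl get pods --all-namespaces -o wide   # look for pods not owned by a controller
kubectl get deployments --all-namespaces    # look for deployments with a single replica
kubectl get daemonsets --all-namespaces     # look for DaemonSets scheduled on a single node
```
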
### Ignoring Safety Checks
Flagged safety checks should usually be resolved before performing an online upgrade.
There might be circumstances, however, in which failed checks cannot be resolved and can
safely be ignored. For example, a workload that uses an EmptyDir volume only as scratch space
can be drained from a node, as it won't have any useful data in the EmptyDir.

Once all the resolvable safety checks are taken care of, you may want to
ignore the remaining ones. To do so, pass the `--ignore-safety-checks`
flag to the `kismatic upgrade online` command. The checks will still run, but they
won't prevent the upgrade from running.

## Offline Upgrade
The offline upgrade is available for clusters in which safety and availability are not a concern.
In this mode, the safety and availability checks will not be performed.

Performing an offline upgrade could result in loss of critical data and reduced service
availability. For this reason, this method should not be used for clusters that are running
production workloads.

To perform an offline upgrade, use the `kismatic upgrade offline` command.

## Partial Upgrade
Kismatic is able to perform a partial upgrade, in which the subset of nodes that
reported readiness, safety, or availability problems is not upgraded. A partial upgrade
can only be performed when all etcd and master nodes are ready to be upgraded. In other words,
performing a partial upgrade is not supported if any etcd or master node reports issues.

A partial upgrade can be useful when addressing these problems right away is not feasible.
For example, one could upgrade most of the nodes with an online upgrade, and then schedule
a downtime window for upgrading the rest of the nodes with an offline upgrade.

This mode can be enabled in both the online and offline upgrades by using the `--partial-ok` flag.

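Using only the commands and flags already described in this document, that two-phase approach might look like this:
```
# Upgrade every node that passes the checks, skipping the problematic ones
./kismatic upgrade online --partial-ok

# Later, during a scheduled downtime window, upgrade the remaining nodes
./kismatic upgrade offline
```
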
## Version-specific notes
The following list contains links to upgrade notes that are specific to a given
Kismatic version.

- [Kismatic v1.3.0](./upgrade/v1.3.0)