# Upgrading Your Cluster

In order to keep your Kubernetes cluster up to date, and to take advantage of new
features in Kismatic, you may upgrade an existing cluster that was previously
stood up using Kismatic. The upgrade functionality is available through the
`kismatic upgrade` command.

The upgrade is applied to one node at a time. If a private Docker registry
is being used, Kismatic will push the new container images to it before any
nodes are upgraded.
Nodes in the cluster are upgraded in the following order:

1. Etcd nodes
2. Master nodes
3. Worker nodes (regardless of specialization)
Keep in mind that if a node has multiple roles, all of its components are upgraded
in a single pass. For example, when etcd nodes are being upgraded and a node is both
an etcd node and a master node, both the etcd and the Kubernetes master components
on that node are upgraded at the same time.

Cluster-level services are upgraded once all nodes have been successfully upgraded.
Cluster services include the pod network provider (e.g. Calico), the Dashboard, cluster DNS, etc.
**Upgrade prerequisites**:
- Plan file used to install the existing cluster
- Generated assets directory ("generated")
- SSH access to the cluster nodes
- Cluster in a healthy state (see the example check below)

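A quick way to verify that the cluster is healthy before starting is to check node and pod status with `kubectl`. The kubeconfig path used below is an assumption; point it at whatever kubeconfig you use to reach the cluster.
```
# Illustrative pre-upgrade health check; adjust the kubeconfig path for your environment
kubectl --kubeconfig generated/kubeconfig get nodes
kubectl --kubeconfig generated/kubeconfig get pods --all-namespaces
```
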
## Supported Upgrade Paths
KET supports upgrades from the following source versions:
- Same minor version, any patch version. For example, KET supports an upgrade from v1.3.0 to v1.3.4.
- Previous minor version, last patch version. For example, KET supports an upgrade from v1.3.3 to v1.4.0, but it does not support an upgrade from v1.3.0 to v1.4.0.

## Quick Start
Here are some example commands to get you started with upgrading your Kubernetes cluster. We encourage you to read this doc and understand the upgrade process before performing an upgrade.
```
# Run an offline upgrade
./kismatic upgrade offline

# Run the checks performed during an online upgrade, but don't actually upgrade my cluster
./kismatic upgrade online --dry-run

# Run an online upgrade
./kismatic upgrade online

# Run an online upgrade, and skip the checks that I know are safe to ignore
./kismatic upgrade online --ignore-safety-checks
```

## Readiness
Before performing an upgrade, Kismatic ensures that the nodes are ready to be upgraded.
The following checks are performed on each node to determine readiness:

1. Disk space: Ensure that there is enough disk space on the root drive of the node (see the example below).
2. Packages: When package installation is disabled, ensure that the new packages are already installed on the node.

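If you want to check the disk-space condition yourself ahead of time, a plain `df` against the root drive of each node shows the available space; the exact threshold Kismatic applies is not reproduced here.
```
# Manually check free space on a node's root drive before the upgrade (illustrative)
df -h /
```
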
## Etcd upgrade
The etcd clusters should be backed up before performing an upgrade. Even though Kismatic will
back up the clusters during an upgrade, it is recommended that you perform and maintain your own backups.
If you don't have an automated backup solution in place, it is recommended that you perform a manual backup of
both the Kubernetes and networking etcd clusters before upgrading your cluster, and store
the backups on persistent storage off the cluster.

Kismatic will back up the etcd data before performing an upgrade. If necessary, you may find the
backups in the following locations:

* Kubernetes etcd cluster: `/etc/etcd_k8s/backup/$timestamp`
* Networking etcd cluster: `/etc/etcd_networking/backup/$timestamp`

For safety reasons, Kismatic does not remove the backups after the cluster has been
successfully upgraded.

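As a sketch of what a manual backup might look like, the example below takes a snapshot of the Kubernetes etcd cluster using the etcd v3 API. The endpoint, certificate paths, and destination file are assumptions; substitute the values that match your cluster before running anything.
```
# Illustrative manual snapshot of the Kubernetes etcd cluster (etcd v3 API).
# The endpoint, certificate paths, and output file below are assumptions.
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /path/to/etcd-ca.pem \
  --cert /path/to/etcd-client.pem \
  --key /path/to/etcd-client-key.pem \
  snapshot save /tmp/etcd-k8s-snapshot.db
```
Copy the resulting snapshot file to persistent storage that lives off the cluster.
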
## Online Upgrade
With the goal of preventing loss of workload data or availability, you might opt for
an online upgrade. In this mode, Kismatic will run safety and availability checks (see the table below) against the
existing cluster before performing the upgrade. If any unsafe condition is detected, a report will
be printed, and the upgrade will not proceed.

Once all nodes are deemed ready for upgrade, the upgrade proceeds one node at a time.
If the node under upgrade is a Kubernetes node, it is cordoned and drained of workloads
before any changes are applied. To prevent workloads from being forcefully killed,
it is important that they handle termination signals and perform any required clean-up.
Once the node has been upgraded successfully, it is uncordoned and reinstated to the pool
of available nodes.

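The per-node behavior is roughly what you would get by cordoning and draining the node by hand with `kubectl`; the node name below is hypothetical, and Kismatic performs these steps for you during the upgrade.
```
# Roughly equivalent manual steps for a single Kubernetes node (node name is hypothetical)
kubectl cordon worker-1                     # stop scheduling new pods onto the node
kubectl drain worker-1 --ignore-daemonsets  # evict the pods currently running on it
# ... node components are upgraded ...
kubectl uncordon worker-1                   # return the node to the schedulable pool
```
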
To perform an online upgrade, use the `kismatic upgrade online` command.

### Safety
Safety is the first concern when upgrading Kubernetes. An unsafe upgrade is one that results in
loss of data or critical functionality, or the potential for such a loss.
For example, upgrading a node that hosts a pod which writes to an EmptyDir volume is considered unsafe.

### Availability
Availability is the second concern when upgrading Kubernetes. An upgrade interrupts
**cluster availability** if it results in the loss of a global cluster function
(such as taking down the last master or ingress node, or breaking etcd quorum). An upgrade
interrupts **workload availability** if it results in a service being reduced
to 0 active pods.

### Safety and Availability checks
The following table lists the conditions that are checked during an online upgrade, and the reason
why the upgrade is blocked if the condition is detected.

| Condition                                  | Reasoning                                                                  |
|--------------------------------------------|----------------------------------------------------------------------------|
| Pod not managed by RC, RS, Job, DS, or SS  | Potentially unsafe: unmanaged pod will not be rescheduled                  |
| Pods without peers (i.e. replicas = 1)     | Potentially unavailable: singleton pod will be unavailable during upgrade  |
| DaemonSet scheduled on a single node       | Potentially unavailable: singleton pod will be unavailable during upgrade  |
| Pod using an EmptyDir volume               | Potentially unsafe: pod will lose the data in this volume                  |
| Pod using a HostPath volume                | Potentially unsafe: pod will lose the data in this volume                  |
| Pod using a HostPath persistent volume     | Potentially unsafe: pod will lose the data in this volume                  |
| Etcd node in a cluster with < 3 etcds      | Unavailable: upgrading the etcd node will bring the cluster down           |
| Master node in a cluster with < 2 masters  | Unavailable: upgrading the master node will bring the control plane down   |
| Worker node in a cluster with < 2 workers  | Unavailable: upgrading the worker node will bring all workloads down       |
| Ingress node                               | Unavailable: we can't ensure that ingress nodes are load balanced          |
| Storage node                               | Potentially unavailable: brick on the node will become unavailable         |

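Many of these conditions can be spotted ahead of time with `kubectl`. For example, a quick look at pod owners and replica counts will surface unmanaged pods and singleton workloads before you start the upgrade; the commands below are illustrative and read-only.
```
# Illustrative read-only checks before an online upgrade
kubectl get pods --all-namespaces -o wide   # look for pods not owned by a controller
kubectl get deployments --all-namespaces    # look for deployments with a single replica
kubectl get daemonsets --all-namespaces     # look for DaemonSets scheduled on a single node
```
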
### Ignoring Safety Checks
Flagged safety checks should usually be resolved before performing an online upgrade.
There might be circumstances, however, in which failed checks cannot be resolved and can
safely be ignored. For example, a workload that uses an EmptyDir volume only as scratch space
can be drained from a node, as it won't have any useful data in the EmptyDir.

Once all the resolvable safety checks are taken care of, you may want to
ignore the remaining ones. To do so, pass the `--ignore-safety-checks`
flag to the `kismatic upgrade online` command. The checks will still run, but they
won't prevent the upgrade from running.

## Offline Upgrade
The offline upgrade is available for clusters in which safety and availability are not a concern.
In this mode, the safety and availability checks will not be performed.

Performing an offline upgrade could result in loss of critical data and reduced service
availability. For this reason, this method should not be used for clusters that are running
production workloads.

To perform an offline upgrade, use the `kismatic upgrade offline` command.

## Partial Upgrade
Kismatic is able to perform a partial upgrade, in which the subset of nodes that
reported readiness, safety, or availability problems is not upgraded. A partial upgrade
can only be performed when all etcd and master nodes are ready to be upgraded. In other words,
performing a partial upgrade is not supported if any etcd or master node reports issues.

A partial upgrade can be useful when addressing these problems right away is not feasible.
For example, one could upgrade most of the nodes with an online upgrade, and then schedule
a downtime window for upgrading the rest of the nodes with an offline upgrade.

This mode can be enabled in both the online and offline upgrades by using the `--partial-ok` flag.

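Using only the commands and flags already described in this document, that two-phase approach might look like this:
```
# Upgrade every node that passes the checks, skipping the problematic ones
./kismatic upgrade online --partial-ok

# Later, during a scheduled downtime window, upgrade the remaining nodes
./kismatic upgrade offline
```
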
## Version-specific notes
The following list contains links to upgrade notes that are specific to a given
Kismatic version.

- [Kismatic v1.3.0](./upgrade/v1.3.0)