# Moving etcd to an ephemeral local disk

You can move etcd from a root volume (Cinder) to a dedicated ephemeral local disk to prevent or resolve performance issues.

## Prerequisites

* This migration is currently tested and documented as a day 2 operation.
* An OpenStack cloud where Nova is configured to use local storage for ephemeral disks. The `libvirt.images_type` option in `nova.conf` must not be `rbd`.
* An OpenStack cloud where Cinder is functional, with enough available storage to accommodate 3 root volumes for the OpenShift control plane.
* OpenShift will be deployed with IPI for now; UPI is not yet documented but technically possible.
* The control-plane machines' auxiliary storage device, such as `/dev/vdb`, must match the `vdb` device name used throughout this document. If your device name differs, change the reference in all places in the file.

## Procedure

* Create a Nova flavor for the control plane that allows 10 GiB of ephemeral disk:

```bash
openstack flavor create --ephemeral 10 [...]
```

* We will deploy a cluster with root volumes for the control plane. Here is an example of `install-config.yaml`:

```yaml
[...]
controlPlane:
  name: master
  platform:
    openstack:
      type: ${CONTROL_PLANE_FLAVOR}
      rootVolume:
        size: 100
        types:
        - ${CINDER_TYPE}
  replicas: 3
[...]
```

* Run `openshift-install` with the following parameters to create the cluster:

```bash
openshift-install create cluster --dir=install_dir
```

* Once the cluster has been deployed and is healthy, edit the ControlPlaneMachineSet (CPMS) to add the additional ephemeral block device that will be used by etcd:

```bash
oc patch ControlPlaneMachineSet/cluster -n openshift-machine-api --type json -p '[{"op": "add", "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/additionalBlockDevices", "value": [{"name": "etcd", "sizeGiB": 10, "storage": {"type": "Local"}}]}]'
```

> [!NOTE]
> Putting etcd on a block device of type `Volume` is not supported, simply because we don't test it for performance.
> While it's functionally the same as using the root volume, we decided to support local devices only for now.

* Wait for the control plane to roll out with new Machines. A few commands can be used to check that everything is healthy:

```bash
oc wait --timeout=90m --for=condition=Progressing=false controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.spec.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.updatedReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.readyReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
```

* Check that we have 3 control plane machines, and that each machine has the additional block device:

```bash
cp_machines=$(oc get machines -n \
  openshift-machine-api --selector='machine.openshift.io/cluster-api-machine-role=master' --no-headers -o custom-columns=NAME:.metadata.name)
if [[ $(echo "${cp_machines}" | wc -l) -ne 3 ]]; then
  exit 1
fi
for machine in ${cp_machines}; do
  if ! oc get machine -n openshift-machine-api "${machine}" -o jsonpath='{.spec.providerSpec.value.additionalBlockDevices}' | grep -q 'etcd'; then
    exit 1
  fi
done
```

* We will use a MachineConfig to handle etcd on the local disk. Create a file named `98-var-lib-etcd.yaml` with this content:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Make File System on /dev/vdb
          DefaultDependencies=no
          BindsTo=dev-vdb.device
          After=dev-vdb.device var.mount
          Before=systemd-fsck@dev-vdb.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/usr/sbin/mkfs.xfs -f /dev/vdb
          TimeoutSec=0

          [Install]
          WantedBy=var-lib-containers.mount
        enabled: true
        name: systemd-mkfs@dev-vdb.service
      - contents: |
          [Unit]
          Description=Mount /dev/vdb to /var/lib/etcd
          Before=local-fs.target
          Requires=systemd-mkfs@dev-vdb.service
          After=systemd-mkfs@dev-vdb.service var.mount

          [Mount]
          What=/dev/vdb
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults,prjquota

          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-etcd.mount
      - contents: |
          [Unit]
          Description=Sync etcd data if new mount is empty
          DefaultDependencies=no
          After=var-lib-etcd.mount var.mount
          Before=crio.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecCondition=/usr/bin/test ! -d /var/lib/etcd/member
          ExecStart=/usr/sbin/setenforce 0
          ExecStart=/bin/rsync -ar /sysroot/ostree/deploy/rhcos/var/lib/etcd/ /var/lib/etcd/
          ExecStart=/usr/sbin/setenforce 1
          TimeoutSec=0

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: sync-var-lib-etcd-to-etcd.service
      - contents: |
          [Unit]
          Description=Restore recursive SELinux security contexts
          DefaultDependencies=no
          After=var-lib-etcd.mount
          Before=crio.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/sbin/restorecon -R /var/lib/etcd/
          TimeoutSec=0

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: restorecon-var-lib-etcd.service
```

* Apply the file, which will create the device and sync the data, by entering the following command:

```bash
oc create -f 98-var-lib-etcd.yaml
```

* This will take some time to complete, as the etcd data will be synced from the root volume to the local disk on the control-plane machines.
Run these commands to check whether the cluster is healthy:

```bash
oc wait --timeout=45m --for=condition=Updating=false machineconfigpool/master
oc wait node --selector='node-role.kubernetes.io/master' --for condition=Ready --timeout=30s
oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
```

* Once the cluster is healthy, create a file named `etcd-replace.yaml` with this content:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Mount /dev/vdb to /var/lib/etcd
          Before=local-fs.target
          Requires=systemd-mkfs@dev-vdb.service
          After=systemd-mkfs@dev-vdb.service var.mount

          [Mount]
          What=/dev/vdb
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults,prjquota

          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-etcd.mount
```

* Apply the file, which removes the logic for creating and syncing the device, by entering the following command:

```bash
oc replace -f etcd-replace.yaml
```

* Again, we need to wait for the cluster to be healthy. The same commands as above can be used to check that everything is healthy.

* Now etcd is stored on the ephemeral local disk. This can be verified by running the following command against each master node:

```bash
oc debug node/<master-node-name> -- df -T /host/var/lib/etcd
```
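
The final check can also be scripted. The helper below is a minimal sketch: it parses the data line of `df -T` output and succeeds only when `/var/lib/etcd` is backed by the local `/dev/vdb` device formatted as XFS, matching the MachineConfig above. The node-iteration loop in the comment is an assumption about your environment, not part of the original procedure.

```bash
#!/usr/bin/env bash
# check_etcd_disk: given the data line of `df -T /host/var/lib/etcd` output,
# succeed only when etcd is backed by the local disk (/dev/vdb, assumed
# throughout this document) with an xfs filesystem.
check_etcd_disk() {
  local dev fstype
  # df -T data lines look like: "<device> <fstype> <blocks> <used> <avail> <use%> <mount>"
  read -r dev fstype _ <<< "$1"
  [[ "${dev}" == "/dev/vdb" && "${fstype}" == "xfs" ]]
}

# Sketch of a cluster-wide check (node iteration is an assumption):
# for node in $(oc get nodes --selector='node-role.kubernetes.io/master' -o name); do
#   line=$(oc debug "${node}" -- df -T /host/var/lib/etcd | tail -n 1)
#   check_etcd_disk "${line}" || echo "etcd is NOT on the local disk on ${node}"
# done
```

If the mount did not happen, `df -T` will instead report the root filesystem device (for example `/dev/vda4`), and the helper fails.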