# Moving etcd to an ephemeral local disk

You can move etcd from a root volume (Cinder) to a dedicated ephemeral local disk to prevent or resolve performance issues.

## Prerequisites

* This migration is currently tested and documented as a day 2 operation.
* An OpenStack cloud where Nova is configured to use local storage for ephemeral disks. The `libvirt.images_type` option in `nova.conf` must not be `rbd`.
* An OpenStack cloud where Cinder is functional, with enough available storage to accommodate 3 root volumes for the OpenShift control plane.
* OpenShift is deployed with IPI for now; UPI is not yet documented but is technically possible.
* The examples in this document assume that the control-plane machines' auxiliary storage device is `/dev/vdb`. If your device has a different name, change this reference in all places in the file.
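
If the auxiliary device does not appear as `/dev/vdb`, every device path and systemd unit name in the MachineConfig file created later in this procedure must be renamed consistently. A small helper can sketch that substitution; the function name and the `sdb` target below are illustrative, not part of the procedure:

```bash
# rename_etcd_device NEW [FILE]: rewrite every vdb reference in FILE
# (device paths like /dev/vdb and unit names like dev-vdb.device)
# to the given device name, e.g. "sdb".
rename_etcd_device() {
  local new="$1" file="${2:-98-var-lib-etcd.yaml}"
  sed -i "s|/dev/vdb|/dev/${new}|g; s|dev-vdb|dev-${new}|g" "${file}"
}
```

For example, `rename_etcd_device sdb 98-var-lib-etcd.yaml` would retarget the file at `/dev/sdb`.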
    12  
## Procedure

* Create a Nova flavor for the control plane that provides 10 GiB of ephemeral disk:

```bash
openstack flavor create --ephemeral 10 [...]
```
    20  
* We will deploy a cluster with root volumes for the control plane. Here is an example of `install-config.yaml`:

```yaml
[...]
controlPlane:
  name: master
  platform:
    openstack:
      type: ${CONTROL_PLANE_FLAVOR}
      rootVolume:
        size: 100
        types:
        - ${CINDER_TYPE}
  replicas: 3
[...]
```
    37  
* Run `openshift-install` with the following parameters to create the cluster:

```bash
openshift-install create cluster --dir=install_dir
```
    43  
* Once the cluster has been deployed and is healthy, edit the ControlPlaneMachineSet (CPMS) to add the additional ephemeral block device that will be used by etcd:

```bash
oc patch ControlPlaneMachineSet/cluster -n openshift-machine-api --type json -p '[{"op": "add", "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/additionalBlockDevices", "value": [{"name": "etcd", "sizeGiB": 10, "storage": {"type": "Local"}}]}]'
```
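
The inline JSON patch above is easy to mistype. If you prefer, the same patch can be written to a file and validated before applying it; the device name `etcd`, 10 GiB size, and `Local` storage type match the inline version:

```bash
# Write the CPMS patch to a file and confirm it parses as JSON
# before handing it to oc.
cat > cpms-etcd-patch.json <<'EOF'
[
  {
    "op": "add",
    "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/additionalBlockDevices",
    "value": [{"name": "etcd", "sizeGiB": 10, "storage": {"type": "Local"}}]
  }
]
EOF
python3 -m json.tool cpms-etcd-patch.json > /dev/null && echo "patch is valid JSON"

# Then apply it with:
# oc patch ControlPlaneMachineSet/cluster -n openshift-machine-api --type json --patch-file cpms-etcd-patch.json
```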
    49  
> [!NOTE]
> Putting etcd on a block device of type `Volume` is not supported, simply because we don't test that configuration and cannot vouch for its performance.
> While it's functionally the same as using the root volume, we decided to support local devices only for now.
    53  
* Wait for the control plane to roll out with new Machines. A few commands can be used to check that everything is healthy:

```bash
oc wait --timeout=90m --for=condition=Progressing=false controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.spec.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.updatedReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait --timeout=90m --for=jsonpath='{.status.readyReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
```
    64  
* Check that we have 3 control-plane machines, and that each machine has the additional block device:

```bash
cp_machines=$(oc get machines -n openshift-machine-api --selector='machine.openshift.io/cluster-api-machine-role=master' --no-headers -o custom-columns=NAME:.metadata.name)
# Expect exactly 3 control-plane machines
if [[ $(echo "${cp_machines}" | wc -l) -ne 3 ]]; then
  exit 1
fi
# Every machine must declare the additional "etcd" block device
for machine in ${cp_machines}; do
  if ! oc get machine -n openshift-machine-api "${machine}" -o jsonpath='{.spec.providerSpec.value.additionalBlockDevices}' | grep -q 'etcd'; then
    exit 1
  fi
done
```
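
The per-machine `grep` above only checks that the string `etcd` appears somewhere in the provider spec. A stricter sketch filters the `additionalBlockDevices` JSON itself (assuming `python3` is available on the workstation; the machine name in the example is hypothetical):

```bash
# has_local_etcd_device: reads an additionalBlockDevices JSON array on
# stdin and succeeds only if it contains a device named "etcd" with
# storage type "Local".
has_local_etcd_device() {
  python3 -c '
import json, sys
devs = json.load(sys.stdin) or []
ok = any(d.get("name") == "etcd"
         and d.get("storage", {}).get("type") == "Local"
         for d in devs)
sys.exit(0 if ok else 1)
'
}

# Example usage:
# oc get machine -n openshift-machine-api "${machine}" \
#   -o jsonpath='{.spec.providerSpec.value.additionalBlockDevices}' | has_local_etcd_device
```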
    78  
* We will use a MachineConfig to handle etcd on local disk. Create a file named `98-var-lib-etcd.yaml` with this content:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Make File System on /dev/vdb
          DefaultDependencies=no
          BindsTo=dev-vdb.device
          After=dev-vdb.device var.mount
          Before=systemd-fsck@dev-vdb.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/usr/sbin/mkfs.xfs -f /dev/vdb
          TimeoutSec=0

          [Install]
          WantedBy=var-lib-containers.mount
        enabled: true
        name: systemd-mkfs@dev-vdb.service
      - contents: |
          [Unit]
          Description=Mount /dev/vdb to /var/lib/etcd
          Before=local-fs.target
          Requires=systemd-mkfs@dev-vdb.service
          After=systemd-mkfs@dev-vdb.service var.mount

          [Mount]
          What=/dev/vdb
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults,prjquota

          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-etcd.mount
      - contents: |
          [Unit]
          Description=Sync etcd data if new mount is empty
          DefaultDependencies=no
          After=var-lib-etcd.mount var.mount
          Before=crio.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecCondition=/usr/bin/test ! -d /var/lib/etcd/member
          ExecStart=/usr/sbin/setenforce 0
          ExecStart=/bin/rsync -ar /sysroot/ostree/deploy/rhcos/var/lib/etcd/ /var/lib/etcd/
          ExecStart=/usr/sbin/setenforce 1
          TimeoutSec=0

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: sync-var-lib-etcd-to-etcd.service
      - contents: |
          [Unit]
          Description=Restore recursive SELinux security contexts
          DefaultDependencies=no
          After=var-lib-etcd.mount
          Before=crio.service

          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/sbin/restorecon -R /var/lib/etcd/
          TimeoutSec=0

          [Install]
          WantedBy=multi-user.target graphical.target
        enabled: true
        name: restorecon-var-lib-etcd.service
```
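
The sync unit above copies the old etcd data to the new mount only when `/var/lib/etcd` does not already contain a `member` directory (the `ExecCondition` line), so it runs at most once. That guard-and-copy logic can be exercised locally with a small sketch (using `cp -a` in place of `rsync`; the function name is illustrative):

```bash
# sync_if_empty SRC DST: copy SRC's contents into DST unless DST
# already holds an etcd member directory, mirroring the unit's
# ExecCondition behaviour.
sync_if_empty() {
  local src="$1" dst="$2"
  if [ -d "${dst}/member" ]; then
    return 0   # data already present; skip, as ExecCondition would
  fi
  cp -a "${src}/." "${dst}/"
}
```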
   167  
* Apply this file, which will create the device and sync the data, by entering the following command:

```bash
oc create -f 98-var-lib-etcd.yaml
```
   173  
* This will take some time to complete, as the etcd data is synced from the root volume to the local disk on the control-plane machines. Run these commands to check whether the cluster is healthy:

```bash
oc wait --timeout=45m --for=condition=Updating=false machineconfigpool/master
oc wait node --selector='node-role.kubernetes.io/master' --for condition=Ready --timeout=30s
oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
```

* Once the cluster is healthy, create a file named `etcd-replace.yaml` with this content:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Mount /dev/vdb to /var/lib/etcd
          Before=local-fs.target
          Requires=systemd-mkfs@dev-vdb.service
          After=systemd-mkfs@dev-vdb.service var.mount

          [Mount]
          What=/dev/vdb
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults,prjquota

          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-etcd.mount
```
   217  
* Apply this file, which will remove the logic for creating and syncing the device, by entering the following command:

```bash
oc replace -f etcd-replace.yaml
```
   223  
* Again we need to wait for the cluster to be healthy. The same commands as above can be used to check that everything is healthy.

* etcd is now stored on the ephemeral local disk. This can be verified by running the following command against each master node:

```bash
oc debug node/<master-node-name> -- df -T /host/var/lib/etcd
```
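
On a successful migration, the output reports an `xfs` filesystem mounted from `/dev/vdb`. A scripted check of that output could look like the following sketch (it assumes the device is `/dev/vdb`; adjust if you renamed it):

```bash
# etcd_on_local_disk: reads `df -T` output on stdin and succeeds if
# the filesystem holding etcd is xfs on /dev/vdb.
etcd_on_local_disk() {
  awk 'NR > 1 && $1 == "/dev/vdb" && $2 == "xfs" { found = 1 } END { exit !found }'
}

# Example usage:
# oc debug node/<master-node-name> -- df -T /host/var/lib/etcd \
#   | etcd_on_local_disk && echo "etcd is on the local disk"
```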