# Making use of Local PVs and Block Volume mode

Virtlet currently uses a custom flexvolume driver to handle raw block
devices and Ceph volumes. This makes VM pods less consistent with
"plain" Kubernetes pods. Another problem is that we may want to support
a persistent rootfs in the future. As there's now Local Persistent Volume
support (beta as of 1.10) and Block Volume support (alpha as of 1.10)
in Kubernetes, we may use these features in Virtlet to avoid the
flexvolume hacks and gain persistent rootfs support.

This document contains the results of the research and will be turned
into a more detailed proposal later if we decide to make use of
the block PVs.

The research is based on
[this Kubernetes blog post](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/#enabling-smarter-scheduling-and-volume-binding)
and
[the raw block volume description](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#raw-block-volume-support)
from the Kubernetes documentation.

First, I'll describe how the block PVs can be used in Virtlet, and
then I'll give a detailed description of how the experiments were
conducted.

## Using block PVs in Virtlet

As it turns out, non-local block PVs are no different from local
block PVs from the CRI point of view. They're configured using the
`volumeDevices` section of the container spec in the pod and the `volumes`
section of the pod spec, and are passed as the `devices` section of the
container config to the `CreateContainer()` CRI call:

```yaml
devices:
- container_path: /dev/testpvc
  host_path: /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~local-volume/local-block-pv
  permissions: mrw
```

Virtlet can use `host_path` to attach the device to the VM using a
`DomainDisk`, and `container_path` to mount it inside the VM using
cloud-init. The handling of local and non-local PVs doesn't differ
on the CRI level.
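As an illustration of the cloud-init part, here's a minimal sketch of a
user-data fragment Virtlet could generate so that the attached disk shows up
at the requested `container_path` inside the VM. This is hypothetical rather
than Virtlet's actual output, and the guest device name `/dev/vdb` is an
assumption that depends on how the disk gets attached:

```yaml
#cloud-config
# Hypothetical user-data fragment: the block volume is assumed to appear
# in the guest as /dev/vdb; the symlink exposes it at the path requested
# via container_path in the CRI container config.
runcmd:
- ln -s /dev/vdb /dev/testpvc
```

Whether a symlink, a udev rule or something else is used is an
implementation detail; the point is that `host_path` and `container_path`
from the CRI call carry all the information needed.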
Supporting non-local PVs will automatically give Virtlet support for
all the Kubernetes volume types that support the block mode, which
include Ceph, FibreChannel, and the persistent disks on AWS, GCP and
Azure, with the list probably growing larger in the future. It will
also give automatic support for CSI plugins that support the block
mode. The caveat is that the block mode is alpha as of Kubernetes
1.10 and it wasn't checked for earlier Kubernetes versions.

The use of block PVs will eliminate the need for custom flexvolumes at
some point (after block volumes become GA and we stop supporting
earlier Kubernetes versions). There's one caveat: with block PVs, the
Ceph RBDs will be mapped on the nodes by `kubelet` instead of being
consumed by qemu by means of `librbd`. It's not clear, though, whether
this will be good or bad from the performance standpoint. If we still
need custom volume types, flexvolumes may be replaced with
[CSI](https://kubernetes.io/blog/2018/04/10/container-storage-interface-beta/).

More advantages of using block PVs instead of custom flexvolumes
include having VM pods differ even less from "plain" pods, and the
possibility of making use of automatic PV provisioning in the future.

There's also a possibility of using the block PVs (local or non-local)
for the persistent rootfs. It's possible to copy the image onto the PV
upon the first use, and then have another pod reuse the PV after the
original one is destroyed. For local PVs, the scheduler will always
place the pod on the node where the local PV resides (this constitutes
the so-called "gravity"). There's a problem with this approach:
there's no reliable way for a CRI implementation to find the PV that
corresponds to a block device, so Virtlet will have to examine the
contents of the PV to see if it's being used for the first time. This also
means that Virtlet will have a hard time establishing the correspondence
between PVs and the images that are applied to them (e.g. imagine a PV
being used by a pod with a different image later). It's possible to
overcome these problems by either storing the metadata on the block
device itself somehow, or using CRDs and PV metadata to keep track of
"pet" VMs and their root filesystems. The use of local PVs will take
much of the burden off the corresponding controller, though.

## Experimenting with the Local Persistent Volumes

First, we need to define a storage class that specifies
`volumeBindingMode: WaitForFirstConsumer`, which is
[needed](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/#enabling-smarter-scheduling-and-volume-binding)
for proper pod scheduling:
```yaml
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```

Below is a definition of a Local Persistent Volume:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-block-pv
spec:
  capacity:
    storage: 100Mi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  volumeMode: Block
  local:
    path: /dev/loop3
  claimRef:
    name: local-block-pvc
    namespace: default
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube-node-1
```

The important parts here are the following: `volumeMode: Block`
setting the block volume mode, the local volume source specification
that makes the PV use `/dev/loop3`
```yaml
  local:
    path: /dev/loop3
```
and a `nodeAffinity` spec that pins the local PV to `kube-node-1`:
```yaml
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube-node-1
```

The following PVC makes use of that PV (it's referenced explicitly via
`claimRef` above, but we could let Kubernetes associate the PV with the
PVC instead), also including `volumeMode: Block`:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: local-block-pvc
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block
  storageClassName: local-storage
  resources:
    requests:
      storage: 100Mi
```

And, finally, a pod that makes use of the PVC:
```yaml
---
kind: Pod
apiVersion: v1
metadata:
  name: test-block-pod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:16.04
    command:
    - /bin/sh
    - -c
    - sleep 30000
    volumeDevices:
    - devicePath: /dev/testpvc
      name: testpvc
  volumes:
  - name: testpvc
    persistentVolumeClaim:
      claimName: local-block-pvc
```

In the pod definition, we're using `volumeDevices` with `devicePath`
instead of `volumeMounts` with `mountPath`. This will make the node's
`/dev/loop3` appear as `/dev/testpvc` inside the pod's container:

```
$ kubectl exec test-block-pod -- ls -l /dev/testpvc
brw-rw---- 1 root disk 7, 3 Jun 12 20:44 /dev/testpvc
$ kubectl exec test-block-pod -- mkfs.ext4 /dev/testpvc
Discarding device blocks: done
Creating filesystem with 102400 1k blocks and 25688 inodes
Filesystem UUID: a02f7560-23a6-45c1-b10a-6e0a1b1eee72
Superblock backups stored on blocks:
	8193, 24577, 40961, 57345, 73729

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): mke2fs 1.42.13 (17-May-2015)
done
Writing superblocks and filesystem accounting information: done
```

The important part is that the pod gets automatically scheduled on the
node where the local PV used by the PVC resides:
```
$ kubectl get pods test-block-pod -o wide
NAME             READY     STATUS    RESTARTS   AGE       IP           NODE
test-block-pod   1/1       Running   0          21m       10.244.2.9   kube-node-1
```

From the CRI point of view, the following container config is passed to
the `CreateContainer()` call, as seen in the CRI Proxy logs (the pod sandbox
config is omitted for brevity as it doesn't contain any mount- or
device-related information):
```yaml
I0612 20:44:29.869566    1038 proxy.go:126] ENTER: /runtime.v1alpha2.RuntimeService/CreateContainer():
config:
  annotations:
    io.kubernetes.container.hash: ff82c6d3
    io.kubernetes.container.restartCount: "0"
    io.kubernetes.container.terminationMessagePath: /dev/termination-log
    io.kubernetes.container.terminationMessagePolicy: File
    io.kubernetes.pod.terminationGracePeriod: "30"
  command:
  - /bin/sh
  - -c
  - sleep 30000
  devices:
  - container_path: /dev/testpvc
    host_path: /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~local-volume/local-block-pv
    permissions: mrw
  envs:
  - key: KUBERNETES_SERVICE_PORT_HTTPS
    value: "443"
  - key: KUBERNETES_PORT
    value: tcp://10.96.0.1:443
  - key: KUBERNETES_PORT_443_TCP
    value: tcp://10.96.0.1:443
  - key: KUBERNETES_PORT_443_TCP_PROTO
    value: tcp
  - key: KUBERNETES_PORT_443_TCP_PORT
    value: "443"
  - key: KUBERNETES_PORT_443_TCP_ADDR
    value: 10.96.0.1
  - key: KUBERNETES_SERVICE_HOST
    value: 10.96.0.1
  - key: KUBERNETES_SERVICE_PORT
    value: "443"
  image:
    image: sha256:5e8b97a2a0820b10338bd91674249a94679e4568fd1183ea46acff63b9883e9c
  labels:
    io.kubernetes.container.name: ubuntu
    io.kubernetes.pod.name: test-block-pod
    io.kubernetes.pod.namespace: default
    io.kubernetes.pod.uid: 65b0c985-6e81-11e8-be27-769e6e14e66a
  linux:
    resources:
      cpu_shares: 2
      oom_score_adj: 1000
    security_context:
      namespace_options:
        pid: 1
      run_as_user: {}
  log_path: ubuntu/0.log
  metadata:
    name: ubuntu
  mounts:
  - container_path: /var/run/secrets/kubernetes.io/serviceaccount
    host_path: /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/volumes/kubernetes.io~secret/default-token-7zwlh
    readonly: true
  - container_path: /etc/hosts
    host_path: /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/etc-hosts
  - container_path: /dev/termination-log
    host_path: /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/containers/ubuntu/2be42601
```
The important part is this:
```yaml
devices:
- container_path: /dev/testpvc
  host_path: /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~local-volume/local-block-pv
  permissions: mrw
```

If we look at the node, we'll see that `host_path` is a symlink to
`/dev/loop3`, the device specified in the local block PV:
```
root@kube-node-1:/# ls -l /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~local-volume/local-block-pv
lrwxrwxrwx 1 root root 10 Jun 13 08:31 /var/lib/kubelet/pods/65b0c985-6e81-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~local-volume/local-block-pv -> /dev/loop3
```

`container_path` denotes the path to the device inside the container.

The `permissions` field is described in the CRI spec as follows:
```
// Cgroups permissions of the device, candidates are one or more of
// * r - allows container to read from the specified device.
// * w - allows container to write to the specified device.
// * m - allows container to create device files that do not yet exist.
```

Also note that the device is not listed in `mounts`.

There's a
[tool](https://github.com/kubernetes-incubator/external-storage/tree/master/local-volume)
for automatic provisioning of Local Persistent Volumes that's part of the
[external-storage](https://github.com/kubernetes-incubator/external-storage)
project. Right now it may not be very useful for Virtlet, but it may
gain some important features later, like support for automatic
partitioning and fs formatting.

## Experimenting with non-local ("plain") Persistent Volumes

Let's check "plain" PVs now. We'll be using Ceph block volumes.

Below are some tricks that make kubeadm-dind-cluster compatible with
Ceph. Some of them may be useful for running
[Rook](https://github.com/rook/rook) on k-d-c, too.

For Ceph RBDs to work with Kubernetes Ceph PVs (not just Virtlet's
flexvolume-based ones), I had to make `rbd` work on the DIND nodes, so
the following change had to be made to kubeadm-dind-cluster's main
script (observed in [Rook's](https://github.com/rook/rook) DIND
setup):
```diff
diff --git a/dind-cluster.sh b/dind-cluster.sh
index e9118e2..24a0a78 100755
--- a/dind-cluster.sh
+++ b/dind-cluster.sh
@@ -645,6 +645,9 @@ function dind::run {
            --hostname "${container_name}" \
            -l mirantis.kubeadm_dind_cluster \
            -v ${volume_name}:/dind \
+           -v /dev:/dev \
+           -v /sys/bus:/sys/bus \
+           -v /var/run/docker.sock:/opt/outer-docker.sock \
            ${opts[@]+"${opts[@]}"} \
            "${DIND_IMAGE}" \
            ${args[@]+"${args[@]}"}
```

The following file had to be added as a fake `rbd` command to each DIND node
(borrowed from the [Rook scripts](https://github.com/rook/rook/blob/cd2b69915958e7453b3fc5031f59179058163dcd/tests/scripts/dind-cluster-rbd)):
```bash
#!/bin/bash
DOCKER_HOST=unix:///opt/outer-docker.sock /usr/bin/docker run --rm -v /sys:/sys --net=host --privileged=true ceph/base rbd "$@"
```
It basically executes the `rbd` command from the `ceph/base` image using the
host docker daemon in the host network namespace.
So let's bring up the cluster:
```bash
./dind-cluster.sh up
```

Disable rate limiting so journald doesn't choke on the CRI Proxy logs on node 1:
```bash
docker exec kube-node-1 /bin/bash -c 'echo "RateLimitInterval=0" >>/etc/systemd/journald.conf && systemctl restart systemd-journald'
```

Enable the `BlockVolume` feature gate for kubelet on node 1
(`MountPropagation` is enabled by default in 1.10, so let's just
replace it):
```bash
docker exec kube-node-1 /bin/bash -c 'sed -i "s/MountPropagation/BlockVolume/" /lib/systemd/system/kubelet.service && systemctl daemon-reload && systemctl restart kubelet'
```

Install CRI Proxy so we can grab the logs:
```bash
CRIPROXY_DEB_URL="${CRIPROXY_DEB_URL:-https://github.com/Mirantis/criproxy/releases/download/v0.14.0/criproxy-nodeps_0.14.0_amd64.deb}"
docker exec kube-node-1 /bin/bash -c "curl -sSL '${CRIPROXY_DEB_URL}' >/criproxy.deb && dpkg -i /criproxy.deb && rm /criproxy.deb"
```

Taint node 2 so we get everything scheduled on node 1:
```bash
kubectl taint nodes kube-node-2 dedicated=foobar:NoSchedule
```

Now we need to add the `rbd` command to the 'hypokube' image that's used
by the control plane (we need it for `kube-controller-manager`). The
proper way would be to use the node's `rbd` wrapper by mounting the host
docker socket into the container, but as the controller manager
doesn't need the `rbd map` command, which requires host access, we can
just install the `rbd` package here. We only need to make sure it's new
enough to support commands like `rbd status` that are invoked by the
controller manager:

```bash
docker exec kube-master /bin/bash -c 'docker rm -f tmp; docker run --name tmp mirantis/hypokube:final /bin/bash -c "echo deb http://ftp.debian.org/debian jessie-backports main >>/etc/apt/sources.list && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y ceph-common=10.2.5-6~bpo8+1 libradosstriper1 ntp librados2=10.2.5-6~bpo8+1 librbd1=10.2.5-6~bpo8+1 python-cephfs=10.2.5-6~bpo8+1 libcephfs1=10.2.5-6~bpo8+1" && docker commit tmp mirantis/hypokube:final && docker rm -f tmp'
```

At this point, we need to edit the following files on the `kube-master` node,
adding `--feature-gates=BlockVolume=true` to the end of the `command:` list in
each pod's only container, as illustrated below:

* `/etc/kubernetes/manifests/kube-apiserver.yaml`
* `/etc/kubernetes/manifests/kube-scheduler.yaml`
* `/etc/kubernetes/manifests/kube-controller-manager.yaml`

Likely, updating just the controller manager may suffice, but I didn't
check. This will cause the pods to restart and use the updated
`mirantis/hypokube:final` image.
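For reference, the edit looks roughly like this (a sketch based on the
`kube-controller-manager` manifest; the other flags are abridged and will
differ in a real kubeadm-generated file):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt, flags abridged)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --feature-gates=BlockVolume=true   # the added flag
```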
Now let's start the Ceph demo container:
```bash
MON_IP=$(docker exec kube-master route | grep default | awk '{print $2}')
CEPH_PUBLIC_NETWORK=${MON_IP}/16
docker run -d --net=host -e MON_IP=${MON_IP} \
       -e CEPH_PUBLIC_NETWORK=${CEPH_PUBLIC_NETWORK} \
       -e CEPH_DEMO_UID=foo \
       -e CEPH_DEMO_ACCESS_KEY=foo \
       -e CEPH_DEMO_SECRET_KEY=foo \
       -e CEPH_DEMO_BUCKET=foo \
       -e DEMO_DAEMONS="osd mds" \
       --name ceph_cluster docker.io/ceph/daemon demo
```

Create a pool there:
```bash
docker exec ceph_cluster ceph osd pool create kube 8 8
```

Create an image for testing (it's important to use `rbd create` with the
`layering` feature here so as not to get a feature mismatch error
later when creating a pod):
```bash
docker exec ceph_cluster rbd create tstimg \
       --size 11M --pool kube --image-feature layering
```

Set up a Kubernetes secret for use with Ceph:
```bash
admin_secret="$(docker exec ceph_cluster ceph auth get-key client.admin)"
kubectl create secret generic ceph-admin \
        --type="kubernetes.io/rbd" \
        --from-literal=key="${admin_secret}" \
        --namespace=kube-system
```

Copy the `rbd` replacement script presented earlier to each node:
```bash
for n in kube-{master,node-{1,2}}; do
  docker cp dind-cluster-rbd ${n}:/usr/bin/rbd
done
```

Now we can create a test PV, PVC and a pod.

Let's define a storage class:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ceph-testnew
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.192.0.1:6789
  adminId: admin
  adminSecretName: ceph-admin
  adminSecretNamespace: kube-system
  pool: kube
  userId: admin
  userSecretName: ceph-admin
  userSecretNamespace: kube-system
  fsType: ext4
  imageFormat: "1"
  # the following was disabled while testing non-block PVs
  imageFeatures: "layering"
```
Actually, automatic provisioning didn't work for me because it
was setting `volumeMode: Filesystem` in the PVs, but this was probably
a mistake on my part; judging from the Kubernetes source, it should be
fixable otherwise.
Let's define a block PV:
```yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-block-pv
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Mi
  claimRef:
    name: ceph-block-pvc
    namespace: default
  persistentVolumeReclaimPolicy: Delete
  rbd:
    image: tstimg
    keyring: /etc/ceph/keyring
    monitors:
    - 10.192.0.1:6789
    pool: kube
    secretRef:
      name: ceph-admin
      namespace: kube-system
    user: admin
  storageClassName: ceph-testnew
  volumeMode: Block
```

The difference from the "usual" RBD PV is `volumeMode: Block` here,
and the same goes for the PVC:
```yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-block-pvc
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block
  storageClassName: ceph-testnew
  resources:
    requests:
      storage: 10Mi
```

Now, the pod itself, with `volumeDevices` instead of `volumeMounts`:
```yaml
kind: Pod
apiVersion: v1
metadata:
  name: ceph-block-pod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:16.04
    command:
    - /bin/sh
    - -c
    - sleep 30000
    volumeDevices:
    - name: data
      devicePath: /dev/cephdev
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: ceph-block-pvc
```

Let's do `kubectl apply -f ceph-test.yaml` (`ceph-test.yaml`
containing all of the yaml documents above), and try it out:

```
$ kubectl exec ceph-block-pod -- ls -l /dev/cephdev
brw-rw---- 1 root disk 252, 0 Jun 12 20:19 /dev/cephdev
$ kubectl exec ceph-block-pod -- mkfs.ext4 /dev/cephdev
mke2fs 1.42.13 (17-May-2015)
Discarding device blocks: done
Creating filesystem with 11264 1k blocks and 2816 inodes
Filesystem UUID: 81ce32e8-bf37-4bc8-88bf-674bf6f79d14
Superblock backups stored on blocks:
	8193

Allocating group tables: done
Writing inode tables: done
Creating journal (1024 blocks): done
Writing superblocks and filesystem accounting information: done
```

Let's capture CRI Proxy logs:
```
docker exec kube-node-1 journalctl -xe -n 20000 -u criproxy|egrep --line-buffered -v '/run/virtlet.sock|\]: \{\}|/var/run/dockershim.sock|ImageFsInfo' >/tmp/log.txt
```

The following is the important part of the log which is slightly
cleaned up:
```
I0612 20:19:38.681852    1038 proxy.go:126] ENTER: /runtime.v1alpha2.RuntimeService/CreateContainer():
config:
  annotations:
    io.kubernetes.container.hash: d0c4a380
    io.kubernetes.container.restartCount: "0"
    io.kubernetes.container.terminationMessagePath: /dev/termination-log
    io.kubernetes.container.terminationMessagePolicy: File
    io.kubernetes.pod.terminationGracePeriod: "30"
  command:
  - /bin/sh
  - -c
  - sleep 30000
  devices:
  - container_path: /dev/cephdev
    host_path: /var/lib/kubelet/pods/ebb11dcb-6e7d-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~rbd/test-block-pv
    permissions: mrw
  envs:
  - key: KUBERNETES_PORT
    value: tcp://10.96.0.1:443
  - key: KUBERNETES_PORT_443_TCP
    value: tcp://10.96.0.1:443
  - key: KUBERNETES_PORT_443_TCP_PROTO
    value: tcp
  - key: KUBERNETES_PORT_443_TCP_PORT
    value: "443"
  - key: KUBERNETES_PORT_443_TCP_ADDR
    value: 10.96.0.1
  - key: KUBERNETES_SERVICE_HOST
    value: 10.96.0.1
  - key: KUBERNETES_SERVICE_PORT
    value: "443"
  - key: KUBERNETES_SERVICE_PORT_HTTPS
    value: "443"
  image:
    image: sha256:5e8b97a2a0820b10338bd91674249a94679e4568fd1183ea46acff63b9883e9c
  labels:
    io.kubernetes.container.name: ubuntu
    io.kubernetes.pod.name: ceph-block-pod
    io.kubernetes.pod.namespace: default
    io.kubernetes.pod.uid: ebb11dcb-6e7d-11e8-be27-769e6e14e66a
  linux:
    resources:
      cpu_shares: 2
      oom_score_adj: 1000
    security_context:
      namespace_options:
        pid: 1
      run_as_user: {}
  log_path: ubuntu/0.log
  metadata:
    name: ubuntu
  mounts:
  - container_path: /var/run/secrets/kubernetes.io/serviceaccount
    host_path: /var/lib/kubelet/pods/ebb11dcb-6e7d-11e8-be27-769e6e14e66a/volumes/kubernetes.io~secret/default-token-7zwlh
    readonly: true
  - container_path: /etc/hosts
    host_path: /var/lib/kubelet/pods/ebb11dcb-6e7d-11e8-be27-769e6e14e66a/etc-hosts
  - container_path: /dev/termination-log
    host_path: /var/lib/kubelet/pods/ebb11dcb-6e7d-11e8-be27-769e6e14e66a/containers/ubuntu/577593a5
```

Again, we have this here:
```yaml
devices:
- container_path: /dev/cephdev
  host_path: /var/lib/kubelet/pods/ebb11dcb-6e7d-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~rbd/test-block-pv
  permissions: mrw
```

The `host_path` points to a mapped RBD:
```
root@kube-node-1:/# ls -l /var/lib/kubelet/pods/ebb11dcb-6e7d-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~rbd/test-block-pv
lrwxrwxrwx 1 root root 9 Jun 12 20:19 /var/lib/kubelet/pods/ebb11dcb-6e7d-11e8-be27-769e6e14e66a/volumeDevices/kubernetes.io~rbd/test-block-pv -> /dev/rbd0
```

An unpleasant part about RBDs+DIND is that the machine may hang on
some commands or refuse to reboot if the RBDs aren't unmapped. If the
k-d-c cluster is already torn down (but the `ceph_cluster` container is
still alive), the following commands can be used to list and unmap RBDs
on the Linux host:

```
# rbd showmapped
id pool image  snap device
0  kube tstimg -    /dev/rbd0
# rbd unmap -o force kube/tstimg
```