# Exposing VM metrics as CRI metrics

Kubernetes has historically used `cAdvisor` as the main tool for gathering container performance metrics.
With the introduction of CRI, new container runtimes can be plugged in, including those not supported
by `cAdvisor`. This created a need to extend CRI so that runtimes can report their
own core performance metrics. The interface was added in Kubernetes 1.7, and it is expected to be
consumed and integrated into the Kubernetes monitoring pipeline by the 1.8 release.

Since `virtlet` is implemented as a container runtime and virtlet VMs pretend to be normal pods,
it makes sense to implement the CRI metrics interface so that virtlet VMs can be monitored with
the same tools that are used for Docker containers. Since the interface has already been released,
nothing prevents us from implementing it right away.

## CRI metrics interface

The main CRI interface was extended with two additional methods: `ContainerStats`, which returns
metrics for a particular container, and `ListContainerStats`, which returns metrics for all
containers hosted by the target runtime that match the given filter. The filter
may include a sandbox ID and a label map.

In both cases, the result is either a single `ContainerStats` structure or a collection of them.
Each structure comprises CPU, memory and disk usage metrics, each with its own timestamp.
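
The shape of the result can be illustrated with a minimal Go sketch. The types below are simplified, hypothetical stand-ins loosely modeled on the CRI messages (`cpu`, `memory` and `writable_layer` in `ContainerStats`), not the generated CRI types, and treating the label map as "every listed label must be present with the same value" is an assumption of this sketch.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the CRI stats messages; the real
// types are generated from the CRI protobuf definitions. Timestamps are
// Unix nanoseconds, and every metric carries its own timestamp.
type CPUUsage struct {
	Timestamp            int64
	UsageCoreNanoSeconds uint64 // cumulative CPU time
}

type MemoryUsage struct {
	Timestamp       int64
	WorkingSetBytes uint64
}

type FilesystemUsage struct {
	Timestamp int64
	UsedBytes uint64
}

type ContainerStats struct {
	ContainerID   string
	PodSandboxID  string
	Labels        map[string]string
	CPU           CPUUsage
	Memory        MemoryUsage
	WritableLayer FilesystemUsage // disk usage
}

// StatsFilter is a hypothetical filter; empty fields match everything.
type StatsFilter struct {
	PodSandboxID  string
	LabelSelector map[string]string
}

// matches reports whether s belongs to the requested sandbox and carries
// every label in the selector with the same value.
func (f StatsFilter) matches(s ContainerStats) bool {
	if f.PodSandboxID != "" && f.PodSandboxID != s.PodSandboxID {
		return false
	}
	for k, v := range f.LabelSelector {
		if got, ok := s.Labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// listContainerStats returns the stats of all containers matching f.
func listContainerStats(all []ContainerStats, f StatsFilter) []ContainerStats {
	var out []ContainerStats
	for _, s := range all {
		if f.matches(s) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	all := []ContainerStats{
		{ContainerID: "vm-1", PodSandboxID: "pod-a", Labels: map[string]string{"app": "web"}},
		{ContainerID: "vm-2", PodSandboxID: "pod-b", Labels: map[string]string{"app": "db"}},
	}
	res := listContainerStats(all, StatsFilter{LabelSelector: map[string]string{"app": "db"}})
	fmt.Println(len(res), res[0].ContainerID) // 1 vm-2
}
```

`ContainerStats` then corresponds to looking up a single entry by container ID, and `ListContainerStats` to applying the filter over all VMs.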

## Metrics gathering rate

Virtlet should gather the required metrics asynchronously rather than upon request from `kubelet`.
The frequency at which `kubelet` is going to fetch metrics is not yet known, so it makes
sense to make the frequency at which virtlet collects metrics configurable. If this frequency
is set higher than that of `kubelet` requests, several samples will be collected between
requests, and the intermediate samples should be aggregated.
For CPU usage, it makes sense to return the average over the collected samples (along with the mean timestamp),
whereas for disk and memory the last sample is more representative. After each `kubelet` request,
all collected samples are purged.
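
The aggregation rule above can be sketched in Go as follows. Everything here is hypothetical: the `sample`, `report` and `aggregator` types, the `fetch` callback that stands in for the libvirt queries, and the choice of one aggregator per VM. The collector runs at its own configurable interval, CPU values and timestamps are averaged, memory and disk take the last sample, and the buffer is purged on each report.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sample is one hypothetical per-VM reading; timestamps are Unix nanoseconds.
type sample struct {
	Timestamp int64
	CPUNanos  uint64 // cumulative CPU time
	MemBytes  uint64
	DiskBytes uint64
}

// report is what would be returned to kubelet on its next request.
type report struct {
	CPUTimestamp int64  // mean timestamp of the buffered samples
	CPUNanos     uint64 // mean of the buffered CPU values
	Timestamp    int64  // timestamp of the last sample (memory, disk)
	MemBytes     uint64
	DiskBytes    uint64
}

// aggregator buffers the samples collected between two kubelet requests.
// The mutex is needed because collection runs independently of requests.
type aggregator struct {
	mu      sync.Mutex
	samples []sample
}

func (a *aggregator) Add(s sample) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.samples = append(a.samples, s)
}

// run collects a sample every interval (the configurable rate) until stop
// is closed. fetch is a hypothetical callback that would query libvirt.
func (a *aggregator) run(interval time.Duration, fetch func() sample, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			a.Add(fetch())
		case <-stop:
			return
		}
	}
}

// Report aggregates and purges the buffer; ok is false if it was empty.
func (a *aggregator) Report() (r report, ok bool) {
	a.mu.Lock()
	defer a.mu.Unlock()
	n := len(a.samples)
	if n == 0 {
		return report{}, false
	}
	// Sum offsets from the first sample instead of raw values so that
	// large nanosecond timestamps cannot overflow int64.
	first := a.samples[0]
	var dTS, dCPU int64
	for _, s := range a.samples {
		dTS += s.Timestamp - first.Timestamp
		dCPU += int64(s.CPUNanos - first.CPUNanos) // two's complement keeps the sign
	}
	last := a.samples[n-1]
	a.samples = a.samples[:0]
	return report{
		CPUTimestamp: first.Timestamp + dTS/int64(n),
		CPUNanos:     first.CPUNanos + uint64(dCPU/int64(n)),
		Timestamp:    last.Timestamp,
		MemBytes:     last.MemBytes,
		DiskBytes:    last.DiskBytes,
	}, true
}

func main() {
	var a aggregator
	a.Add(sample{Timestamp: 1000, CPUNanos: 100, MemBytes: 10, DiskBytes: 5})
	a.Add(sample{Timestamp: 3000, CPUNanos: 500, MemBytes: 30, DiskBytes: 7})
	r, ok := a.Report()
	fmt.Println(r.CPUTimestamp, r.CPUNanos, r.MemBytes, ok) // 2000 300 30 true
	_, ok = a.Report()
	fmt.Println(ok) // false: the buffer was purged
}
```

Averaging offsets from the first sample, rather than summing raw values, matters here: Unix timestamps in nanoseconds are around 1.7e18, so summing six of them already exceeds the `int64` range.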

However, since CRI metrics are optional and `kubelet` does not consume them in 1.7, virtlet must be prepared
for there to be no `kubelet` queries at all. Thus we should keep in memory only the last `N` collected
samples. To make the algorithm both simpler and more flexible, we can make `N` configurable
for each metric type separately and always return the average value over all retained samples, with
default `N` values of 1 for disk and memory and about 3 for CPU. With `N` = 1, the average is simply
the last sample, which matches the rule above for disk and memory.
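
The bounded variant can be sketched as a fixed-size ring buffer per metric type. The `point` and `window` types and their methods are hypothetical names for illustration; the defaults in `main` follow the values proposed above.

```go
package main

import "fmt"

// point is one timestamped value of a single metric (Unix nanoseconds).
type point struct {
	Timestamp int64
	Value     uint64
}

// window keeps only the last N points of one metric type in a ring buffer.
// A real collector would also need a mutex, as collection is asynchronous.
type window struct {
	buf  []point
	next int  // slot that the next Add overwrites
	full bool // true once N points have been added
}

func newWindow(n int) *window {
	if n < 1 {
		n = 1
	}
	return &window{buf: make([]point, n)}
}

// Add stores p, dropping the oldest point once N points are held.
func (w *window) Add(p point) {
	w.buf[w.next] = p
	w.next = (w.next + 1) % len(w.buf)
	if w.next == 0 {
		w.full = true
	}
}

// Average returns the mean value and mean timestamp of the retained points.
// With N = 1 this is simply the last sample.
func (w *window) Average() (point, bool) {
	count := w.next
	if w.full {
		count = len(w.buf)
	}
	if count == 0 {
		return point{}, false
	}
	// Sum offsets from one retained point to avoid overflowing on large values.
	base := w.buf[0]
	var dTS, dVal int64
	for _, p := range w.buf[:count] {
		dTS += p.Timestamp - base.Timestamp
		dVal += int64(p.Value - base.Value)
	}
	return point{
		Timestamp: base.Timestamp + dTS/int64(count),
		Value:     base.Value + uint64(dVal/int64(count)),
	}, true
}

func main() {
	cpu := newWindow(3) // proposed default N for CPU
	mem := newWindow(1) // proposed default N for memory (and disk)
	for i := int64(1); i <= 4; i++ {
		cpu.Add(point{Timestamp: i * 1000, Value: uint64(i * 100)})
		mem.Add(point{Timestamp: i * 1000, Value: uint64(i * 10)})
	}
	c, _ := cpu.Average()
	m, _ := mem.Average()
	fmt.Println(c.Timestamp, c.Value, m.Timestamp, m.Value) // 3000 300 4000 40
}
```

Because memory is bounded by `N` regardless of how often `kubelet` asks, no purge step is needed in this variant.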

## Obtaining metrics values

Virtlet can use the `libvirt` API to get the required core metrics from VMs.
For each domain, cumulative CPU usage can be taken from the `Cpu.Time` field. Alternatively, CPU
information can also be retrieved with the `GetCPUStats` libvirt function.
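
Since the counter is cumulative (libvirt reports the domain CPU time in nanoseconds, the same unit as the CRI `usage_core_nano_seconds` field), it can likely be passed through unchanged, with consumers deriving utilization from two samples. The sketch below shows only that arithmetic; `cpuSample` and `coresUsed` are hypothetical and no libvirt calls are made.

```go
package main

import (
	"errors"
	"fmt"
)

// cpuSample is a hypothetical reading of a domain's cumulative CPU time
// (e.g. the value of Cpu.Time) taken at Timestamp; both in nanoseconds.
type cpuSample struct {
	Timestamp int64
	CPUTimeNs uint64
}

// coresUsed returns the average number of cores the VM kept busy
// between two cumulative samples.
func coresUsed(prev, cur cpuSample) (float64, error) {
	elapsed := cur.Timestamp - prev.Timestamp
	if elapsed <= 0 {
		return 0, errors.New("samples are not in chronological order")
	}
	if cur.CPUTimeNs < prev.CPUTimeNs {
		// The counter went backwards, e.g. because the domain was restarted.
		return 0, errors.New("cumulative CPU time decreased")
	}
	return float64(cur.CPUTimeNs-prev.CPUTimeNs) / float64(elapsed), nil
}

func main() {
	prev := cpuSample{Timestamp: 0, CPUTimeNs: 0}
	// 3 s of CPU time consumed over 2 s of wall-clock time.
	cur := cpuSample{Timestamp: 2_000_000_000, CPUTimeNs: 3_000_000_000}
	cores, err := coresUsed(prev, cur)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.1f cores\n", cores) // 1.5 cores
}
```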

Memory statistics are available through the `GetMemoryStats` function.
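
libvirt returns memory statistics as a list of tagged values, with the size-type tags reported in KiB, while the CRI `MemoryUsage` message carries `working_set_bytes`. This proposal does not say which tag should map to the working set; the sketch below assumes the resident set size (`rss`) of the process running the domain, and `memStat` and `workingSetBytes` are hypothetical names rather than libvirt binding types.

```go
package main

import "fmt"

// memStat is a hypothetical stand-in for one tagged libvirt memory statistic.
type memStat struct {
	Tag   string // e.g. "rss", "available", "unused"
	Value uint64 // KiB, as libvirt reports the size-type tags
}

// workingSetBytes picks the RSS entry (an assumption of this sketch)
// and converts it from KiB to bytes.
func workingSetBytes(stats []memStat) (uint64, bool) {
	for _, s := range stats {
		if s.Tag == "rss" {
			return s.Value * 1024, true
		}
	}
	return 0, false
}

func main() {
	stats := []memStat{{Tag: "available", Value: 2097152}, {Tag: "rss", Value: 524288}}
	if b, ok := workingSetBytes(stats); ok {
		fmt.Println(b) // 536870912 (512 MiB)
	}
}
```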

As for disks, the current interface uses the `FilesystemUsage` structure, which is also used by the
`ImageService` interface to report the storage occupied by images, in both bytes and inodes, along with the
ID of the underlying storage. It is not clear how Kubernetes is going to use this structure for disk
metrics. Meanwhile, we could set only the most important field, `used_bytes`, and ignore the rest.
We can use `virt-df` from `libguestfs` to retrieve free space information from the VM.
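
Filling `used_bytes` from `virt-df` could look like the Go sketch below, which parses the output of `virt-df --csv` with `encoding/csv`. It assumes a header row naming a `Used` column whose values are in 1K blocks (the default without `-h`) and sums that column over all filesystems of the VM; this layout should be checked against the installed libguestfs version. Running `virt-df` itself (for example via `os/exec`) is left out, and `usedBytesFromVirtDF` is a hypothetical name.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// usedBytesFromVirtDF sums the "Used" column of `virt-df --csv` output.
// Assumption: the header names a "Used" column whose values are in
// 1K blocks (the default when -h/--human-readable is not given).
func usedBytesFromVirtDF(out string) (uint64, error) {
	// ReadAll also enforces that every row has as many fields as the header.
	rows, err := csv.NewReader(strings.NewReader(out)).ReadAll()
	if err != nil {
		return 0, err
	}
	if len(rows) == 0 {
		return 0, fmt.Errorf("empty virt-df output")
	}
	col := -1
	for i, name := range rows[0] {
		if name == "Used" {
			col = i
		}
	}
	if col < 0 {
		return 0, fmt.Errorf("no Used column in header %v", rows[0])
	}
	var total uint64
	for _, row := range rows[1:] {
		kb, err := strconv.ParseUint(row[col], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("bad Used value %q: %w", row[col], err)
		}
		total += kb * 1024
	}
	return total, nil
}

func main() {
	// Illustrative input in the assumed format, not captured virt-df output.
	out := "VirtualMachine,Filesystem,1K-blocks,Used,Available,Use%\n" +
		"vm1,/dev/sda1,1000,250,750,25.0%\n"
	used, err := usedBytesFromVirtDF(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(used) // 256000
}
```

Looking the column up by name rather than by position keeps the parser working if the column order differs between versions.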