## Metrics

Heapster exports the following metrics to its backends.

| Metric Name | Description |
|------------|-------------|
| cpu/limit | CPU hard limit in millicores. |
| cpu/node_capacity | CPU capacity of a node. |
| cpu/node_allocatable | CPU allocatable of a node. |
| cpu/node_reservation | Share of CPU that is reserved on the node allocatable. |
| cpu/node_utilization | CPU utilization as a share of node allocatable. |
| cpu/request | CPU request (the guaranteed amount of resources) in millicores. |
| cpu/usage | Cumulative amount of consumed CPU time on all cores in nanoseconds. |
| cpu/usage_rate | CPU usage on all cores in millicores. |
| cpu/load | CPU load in milliloads, i.e., runnable threads * 1000. |
| ephemeral_storage/limit | Local ephemeral storage hard limit in bytes. |
| ephemeral_storage/request | Local ephemeral storage request (the guaranteed amount of resources) in bytes. |
| ephemeral_storage/usage | Total local ephemeral storage usage. |
| ephemeral_storage/node_capacity | Local ephemeral storage capacity of a node. |
| ephemeral_storage/node_allocatable | Local ephemeral storage allocatable of a node. |
| ephemeral_storage/node_reservation | Share of local ephemeral storage that is reserved on the node allocatable. |
| ephemeral_storage/node_utilization | Local ephemeral storage utilization as a share of ephemeral storage allocatable. |
| filesystem/usage | Total number of bytes consumed on a filesystem. |
| filesystem/limit | The total size of the filesystem in bytes. |
| filesystem/available | The number of available bytes remaining in the filesystem. |
| filesystem/inodes | The number of available inodes in the filesystem. |
| filesystem/inodes_free | The number of free inodes remaining in the filesystem. |
| disk/io_read_bytes | Number of bytes read from a disk partition. |
| disk/io_write_bytes | Number of bytes written to a disk partition. |
| disk/io_read_bytes_rate | Number of bytes read from a disk partition per second. |
| disk/io_write_bytes_rate | Number of bytes written to a disk partition per second. |
| memory/limit | Memory hard limit in bytes. |
| memory/major_page_faults | Number of major page faults. |
| memory/major_page_faults_rate | Number of major page faults per second. |
| memory/node_capacity | Memory capacity of a node. |
| memory/node_allocatable | Memory allocatable of a node. |
| memory/node_reservation | Share of memory that is reserved on the node allocatable. |
| memory/node_utilization | Memory utilization as a share of memory allocatable. |
| memory/page_faults | Number of page faults. |
| memory/page_faults_rate | Number of page faults per second. |
| memory/request | Memory request (the guaranteed amount of resources) in bytes. |
| memory/usage | Total memory usage. |
| memory/cache | Cache memory usage. |
| memory/rss | RSS memory usage. |
| memory/working_set | Total working set usage. The working set is memory that is in use and not easily dropped by the kernel. |
| accelerator/memory_total | Memory capacity of an accelerator. |
| accelerator/memory_used | Memory usage of an accelerator. |
| accelerator/duty_cycle | Duty cycle of an accelerator. |
| accelerator/request | Number of accelerator devices requested by the container. |
| network/rx | Cumulative number of bytes received over the network. |
| network/rx_errors | Cumulative number of errors while receiving over the network. |
| network/rx_errors_rate | Number of errors while receiving over the network per second. |
| network/rx_rate | Number of bytes received over the network per second. |
| network/tx | Cumulative number of bytes sent over the network. |
| network/tx_errors | Cumulative number of errors while sending over the network. |
| network/tx_errors_rate | Number of errors while sending over the network per second. |
| network/tx_rate | Number of bytes sent over the network per second. |
| uptime | Number of milliseconds since the container was started. |

All custom (aka application) metrics are prefixed with `custom/`.
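
For example, an application metric named `qps` (a hypothetical name) would be exported as:
```
custom/qps
```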

## Labels

Heapster tags each metric with the following labels.

| Label Name     | Description                                                                    |
|----------------|--------------------------------------------------------------------------------|
| pod_id         | Unique ID of a Pod                                                             |
| pod_name       | User-provided name of a Pod                                                    |
| container_base_image | Base image of the container                                              |
| container_name | User-provided name of the container, or the full cgroup name for system containers |
| host_id        | Cloud-provider-specified or user-specified identifier of a node                |
| hostname       | Hostname where the container ran                                               |
| nodename       | Nodename where the container ran                                               |
| labels         | Comma-separated (by default) list of user-provided labels. Format is 'key:value' |
| namespace_id   | UID of the namespace of a Pod                                                  |
| namespace_name | User-provided name of a Namespace                                              |
| resource_id    | A unique identifier used to differentiate multiple metrics of the same type, e.g. filesystem partitions under filesystem/usage, or the disk device name under disk/io_read_bytes |
| make           | Make of the accelerator (nvidia, amd, google, etc.)                            |
| model          | Model of the accelerator (tesla-p100, tesla-k80, etc.)                         |
| accelerator_id | ID of the accelerator                                                          |

**Note**
  * The label separator can be configured with the Heapster flag `--label-separator`. Comma-separated label pairs work fine unless you use [Bosun](http://bosun.org) as an alerting system and `group by labels` to search for labels:
    [Bosun (0.5.0) uses a comma to split queried tag keys and tag values](https://github.com/bosun-monitor/bosun/blob/0.5.0/opentsdb/tsdb.go#L566-L575). For example, if the expression used to query InfluxDB from Bosun is:
```
$limit = avg(influx("k8s", '''SELECT mean(value) as value FROM "memory/limit" WHERE type = 'node' GROUP BY nodename, labels''', "${INTERVAL}s", "", ""))
```
with comma-separated labels such as:
```
nodename=127.0.0.1,labels=beta.kubernetes.io/arch:amd64,beta.kubernetes.io/os:linux,kubernetes.io/hostname:127.0.0.1
```
then Bosun splits the string on commas and produces the wrong tags:
```
nodename=127.0.0.1
labels=labels:beta.kubernetes.io/arch:amd64
beta.kubernetes.io/os:linux
kubernetes.io/hostname:127.0.0.1
```
The last two tag key-value pairs are wrong: they should not exist as separate tags and should instead remain part of the `labels` value:
```
nodename=127.0.0.1
labels=labels:beta.kubernetes.io/arch:amd64,beta.kubernetes.io/os:linux,kubernetes.io/hostname:127.0.0.1
```
This confuses Bosun, which panics with an error like "panic: opentsdb: bad tag: beta.kubernetes.io/os:linux". Configuring a different separator avoids the problem.
  * User-provided labels can additionally be stored as separate labels with the Heapster flag `--store-label`. Similarly, labels can be omitted from the concatenated `labels` value using `--ignore-label`. See the example invocation below.
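
As an illustration, these flags might be combined in a Heapster invocation like the following (the source/sink URLs and the label names are hypothetical placeholders; only the three label flags come from the notes above):
```
heapster \
  --source=kubernetes:https://kubernetes.default \
  --sink=influxdb:http://monitoring-influxdb:8086 \
  --label-separator=";" \
  --store-label=kubernetes.io/hostname \
  --ignore-label=beta.kubernetes.io/arch
```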

## Aggregates

The metrics are initially collected for nodes and containers and later aggregated for pods, namespaces and clusters.
Disk and network metrics are not available at the container level (only at the pod and node levels).

## Storage Schema

### InfluxDB

##### Default

Each metric translates to a separate 'series' in InfluxDB. Labels are stored as tags.
The metric name is not modified.
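
For example, a query against the default schema reads from the series named after the metric and can group by label tags, as in this illustrative InfluxQL statement (the tag values are assumptions):
```
SELECT mean(value) FROM "cpu/usage_rate" WHERE type = 'node' GROUP BY nodename
```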

##### Using fields

If you want to use InfluxDB fields, you have to add `withfields=true` as a parameter to the InfluxDB sink URL.
(More information here: https://docs.influxdata.com/influxdb/v0.9/concepts/key_concepts/)
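
For example (the InfluxDB host and port are illustrative):
```
--sink=influxdb:http://monitoring-influxdb:8086?withfields=true
```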

In that case, each metric translates to a separate field in a 'series' in InfluxDB, which means that related metrics are grouped in the same 'measurement'.
For example, the measurement 'cpu' has the fields 'node_reservation', 'node_utilization', 'request', 'usage' and 'usage_rate'.
All labels are still stored as tags.
The measurement list is: cpu, filesystem, memory, network, uptime.

Note that the standard Grafana dashboards do not work with this schema; you have to use the [new dashboards](/grafana/dashboards/influxdb_withfields) instead.
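
Under the field-based schema, the equivalent query selects a field from the shared measurement instead of a per-metric series, e.g. (illustrative):
```
SELECT mean(usage_rate) FROM "cpu" WHERE type = 'node' GROUP BY nodename
```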

### Google Cloud Monitoring

Metrics mentioned above are stored along with corresponding labels as [custom metrics](https://cloud.google.com/monitoring/custom-metrics/) in Google Cloud Monitoring.

* Metrics are collected every 2 minutes by default and pushed with a 1 minute precision.
* Each metric has a custom metric prefix - `custom.cloudmonitoring.googleapis.com`.
* Each metric is pushed with an additional namespace prefix - `kubernetes.io`.
* GCM does not support visualizing cumulative metrics yet. To work around that, Heapster exports an equivalent gauge metric for all cumulative metrics mentioned above.

  The gauge metrics use their parent cumulative metric name as the prefix, followed by a "_rate" suffix.
  E.g. "cpu/usage", which is cumulative, has a corresponding gauge metric "cpu/usage_rate".
  NOTE: The gauge metrics will be deprecated as soon as GCM supports visualizing cumulative metrics.
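
Combining the two prefixes, a metric such as "cpu/usage_rate" would then be addressed by a fully qualified name along these lines (the exact separator between the components is an assumption):
```
custom.cloudmonitoring.googleapis.com/kubernetes.io/cpu/usage_rate
```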

TODO: Add a snapshot of all the metrics stored in GCM.

### Hawkular

Each metric is stored as a separate time series (metric) in Hawkular-Metrics, with tags inherited from the common ancestor type. The metric name is created in the following format: `containerName/podId/metricName` (`/` is the separator); see the example after the list below. Each definition stores the labels as tags, with the following additions:

* All the label descriptions are stored as `label_description`
* The ancestor metric name (such as cpu/usage) is stored under the tag `descriptor_name`
* To ease searching, a `group_id` tag stores the key `containerName/metricName`, so each podId can be linked under a single time series if necessary
* Units are stored under the `units` tag
* If the `labelToTenant` parameter is given, any metric with that label will use the label's value as the target tenant. If the metric doesn't have the label defined, the default tenant is used.
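
For example, the cpu/usage metric of a container named nginx (the container name and pod ID here are hypothetical) would be stored as:
```
metric name:     nginx/f9e3e8b1-35d0-4bf5-9ee6-64d2bb2433f4/cpu/usage
descriptor_name: cpu/usage
group_id:        nginx/cpu/usage
```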

At startup, all the definitions are fetched from the Hawkular-Metrics tenant and filtered to cache only the Heapster metrics. If you have lots of metrics from other systems, it is recommended (but not required) to use a separate tenant for the Heapster data.

The Hawkular-Metrics instance can be a standalone installation of Hawkular-Metrics or the full installation of Hawkular.