github.com/google/cadvisor@v0.49.1/docs/storage/prometheus.md

# Monitoring cAdvisor with Prometheus

cAdvisor exposes container and hardware statistics as [Prometheus](https://prometheus.io) metrics out of the box. By default, these metrics are served under the `/metrics` HTTP endpoint. The endpoint path can be changed with the `-prometheus_endpoint` command-line flag, and the set of exposed metrics can be adjusted with the `-disable_metrics` or `-enable_metrics` flags.

Collecting some metrics requires building cAdvisor with additional flags; see the [build instructions](../development/build.md) for details. The required flags are listed in the additional build flag column of the tables below.

To monitor cAdvisor with Prometheus, configure one or more jobs in Prometheus that scrape the relevant cAdvisor processes at that metrics endpoint. For details, see Prometheus's [Configuration](https://prometheus.io/docs/operating/configuration/) documentation, as well as the [Getting started](https://prometheus.io/docs/introduction/getting_started/) guide.
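
As a rough illustration of how the flags and the scrape job fit together, the sketch below builds a cAdvisor argument list and a matching minimal Prometheus scrape job that share the same endpoint path. The helper names, the `cadvisor` binary name, the `localhost:8080` target, the `15s` interval, and the comma-separated form of the flag values are assumptions made for this example; check them against your own deployment.

```python
def cadvisor_args(endpoint="/metrics", disable=(), enable=()):
    """Build a cAdvisor argument list from the flags described above.

    `disable` and `enable` hold option parameter names taken from the
    metric tables (for example "sched" or "diskIO").
    """
    if disable and enable:
        # Choice made for this sketch: accept only one of the two lists.
        raise ValueError("pass either disable or enable, not both")
    # Assumption: the binary is on PATH under the name "cadvisor".
    args = ["cadvisor", f"-prometheus_endpoint={endpoint}"]
    if disable:
        args.append("-disable_metrics=" + ",".join(disable))
    if enable:
        args.append("-enable_metrics=" + ",".join(enable))
    return args


def scrape_config(targets, endpoint="/metrics", job_name="cadvisor", interval="15s"):
    """Render a minimal Prometheus scrape job as YAML text."""
    lines = [
        "scrape_configs:",
        f"  - job_name: {job_name}",
        f"    scrape_interval: {interval}",
        f"    metrics_path: {endpoint}",
        "    static_configs:",
        "      - targets:",
    ]
    lines += [f"          - '{target}'" for target in targets]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    endpoint = "/metrics"
    # Hypothetical setup: one local cAdvisor with three metric groups disabled.
    print(" ".join(cadvisor_args(endpoint, disable=["sched", "tcp", "udp"])))
    print(scrape_config(["localhost:8080"], endpoint))
```

Passing the same `endpoint` value to both helpers keeps the Prometheus `metrics_path` in step with whatever `-prometheus_endpoint` is set to.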

# Examples

* [CenturyLink Labs](https://labs.ctl.io/) did an excellent write-up on [Monitoring Docker services with Prometheus +cAdvisor](https://www.ctl.io/developers/blog/post/monitoring-docker-services-with-prometheus/). It gives a good overview of cAdvisor's integration with Prometheus, but the PromDash GUI part is outdated, as PromDash has been deprecated in favor of Grafana.

* [vegasbrianc](https://github.com/vegasbrianc) provides a [starter project](https://github.com/vegasbrianc/prometheus) for cAdvisor and Prometheus monitoring, alongside a ready-to-use [Grafana dashboard](https://github.com/vegasbrianc/grafana_dashboard).

## Prometheus container metrics

The table below lists the Prometheus container metrics exposed by cAdvisor (in alphabetical order by metric name), along with the corresponding `-disable_metrics` / `-enable_metrics` option parameter:

Metric name | Type | Description | Unit (where applicable) | Option parameter | Additional build flag |
:-----------|:-----|:------------|:------------------------|:---------------------------|:----------------------
`container_blkio_device_usage_total` | Counter | Blkio device bytes usage | bytes | diskIO |
`container_cpu_cfs_periods_total` | Counter | Number of elapsed enforcement period intervals | | cpu |
`container_cpu_cfs_throttled_periods_total` | Counter | Number of throttled period intervals | | cpu |
`container_cpu_cfs_throttled_seconds_total` | Counter | Total time duration the container has been throttled | seconds | cpu |
`container_cpu_load_average_10s` | Gauge | Value of container cpu load average over the last 10 seconds | | cpuLoad |
`container_cpu_schedstat_run_periods_total` | Counter | Number of times processes of the cgroup have run on the cpu | | sched |
`container_cpu_schedstat_runqueue_seconds_total` | Counter | Time duration processes of the container have been waiting on a runqueue | seconds | sched |
`container_cpu_schedstat_run_seconds_total` | Counter | Time duration the processes of the container have run on the CPU | seconds | sched |
`container_cpu_system_seconds_total` | Counter | Cumulative system cpu time consumed | seconds | cpu |
`container_cpu_usage_seconds_total` | Counter | Cumulative cpu time consumed | seconds | cpu |
`container_cpu_user_seconds_total` | Counter | Cumulative user cpu time consumed | seconds | cpu |
`container_file_descriptors` | Gauge | Number of open file descriptors for the container | | process |
`container_fs_inodes_free` | Gauge | Number of available Inodes | | disk |
`container_fs_inodes_total` | Gauge | Total number of Inodes | | disk |
`container_fs_io_current` | Gauge | Number of I/Os currently in progress | | diskIO |
`container_fs_io_time_seconds_total` | Counter | Cumulative count of seconds spent doing I/Os | seconds | diskIO |
`container_fs_io_time_weighted_seconds_total` | Counter | Cumulative weighted I/O time | seconds | diskIO |
`container_fs_limit_bytes` | Gauge | Number of bytes that can be consumed by the container on this filesystem | bytes | disk |
`container_fs_reads_bytes_total` | Counter | Cumulative count of bytes read | bytes | diskIO |
`container_fs_read_seconds_total` | Counter | Cumulative count of seconds spent reading | seconds | diskIO |
`container_fs_reads_merged_total` | Counter | Cumulative count of reads merged | | diskIO |
`container_fs_reads_total` | Counter | Cumulative count of reads completed | | diskIO |
`container_fs_sector_reads_total` | Counter | Cumulative count of sector reads completed | | diskIO |
`container_fs_sector_writes_total` | Counter | Cumulative count of sector writes completed | | diskIO |
`container_fs_usage_bytes` | Gauge | Number of bytes that are consumed by the container on this filesystem | bytes | disk |
`container_fs_writes_bytes_total` | Counter | Cumulative count of bytes written | bytes | diskIO |
`container_fs_write_seconds_total` | Counter | Cumulative count of seconds spent writing | seconds | diskIO |
`container_fs_writes_merged_total` | Counter | Cumulative count of writes merged | | diskIO |
`container_fs_writes_total` | Counter | Cumulative count of writes completed | | diskIO |
`container_hugetlb_failcnt` | Counter | Number of times hugepage usage hit the limit | | hugetlb |
`container_hugetlb_max_usage_bytes` | Gauge | Maximum hugepage usage recorded | bytes | hugetlb |
`container_hugetlb_usage_bytes` | Gauge | Current hugepage usage | bytes | hugetlb |
`container_last_seen` | Gauge | Last time a container was seen by the exporter | timestamp | - |
`container_llc_occupancy_bytes` | Gauge | Last level cache usage statistics for container counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_bandwidth_bytes` | Gauge | Total memory bandwidth usage statistics for container counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_bandwidth_local_bytes` | Gauge | Local memory bandwidth usage statistics for container counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_cache` | Gauge | Total page cache memory | bytes | memory |
`container_memory_failcnt` | Counter | Number of times memory usage hit the limit | | memory |
`container_memory_failures_total` | Counter | Cumulative count of memory allocation failures | | memory |
`container_memory_mapped_file` | Gauge | Size of memory mapped files | bytes | memory |
`container_memory_max_usage_bytes` | Gauge | Maximum memory usage recorded | bytes | memory |
`container_memory_migrate` | Gauge | Memory migrate status | | cpuset |
`container_memory_numa_pages` | Gauge | Number of used pages per NUMA node | | memory_numa |
`container_memory_rss` | Gauge | Size of RSS | bytes | memory |
`container_memory_swap` | Gauge | Container swap usage | bytes | memory |
`container_memory_usage_bytes` | Gauge | Current memory usage, including all memory regardless of when it was accessed | bytes | memory |
`container_memory_working_set_bytes` | Gauge | Current working set | bytes | memory |
`container_network_advance_tcp_stats_total` | Gauge | Advanced TCP connection statistics for the container | | advtcp |
`container_network_receive_bytes_total` | Counter | Cumulative count of bytes received | bytes | network |
`container_network_receive_errors_total` | Counter | Cumulative count of errors encountered while receiving | | network |
`container_network_receive_packets_dropped_total` | Counter | Cumulative count of packets dropped while receiving | | network |
`container_network_receive_packets_total` | Counter | Cumulative count of packets received | | network |
`container_network_tcp6_usage_total` | Gauge | TCP6 connection usage statistics for the container | | tcp |
`container_network_tcp_usage_total` | Gauge | TCP connection usage statistics for the container | | tcp |
`container_network_transmit_bytes_total` | Counter | Cumulative count of bytes transmitted | bytes | network |
`container_network_transmit_errors_total` | Counter | Cumulative count of errors encountered while transmitting | | network |
`container_network_transmit_packets_dropped_total` | Counter | Cumulative count of packets dropped while transmitting | | network |
`container_network_transmit_packets_total` | Counter | Cumulative count of packets transmitted | | network |
`container_network_udp6_usage_total` | Gauge | UDP6 connection usage statistics for the container | | udp |
`container_network_udp_usage_total` | Gauge | UDP connection usage statistics for the container | | udp |
`container_oom_events_total` | Counter | Count of out of memory events observed for the container | | oom_event |
`container_perf_events_scaling_ratio` | Gauge | Scaling ratio for perf event counter (the event is identified by the `event` label; the `cpu` label indicates the core on which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | perf_event | libpfm
`container_perf_events_total` | Counter | Scaled counter of perf core event (the event is identified by the `event` label; the `cpu` label indicates the core on which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | perf_event | libpfm
`container_perf_uncore_events_scaling_ratio` | Gauge | Scaling ratio for perf uncore event counter (the event is identified by the `event` label; the `pmu` and `socket` labels indicate the PMU and the CPU socket for which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). The metric exists only for the main cgroup (id="/"). | | perf_event | libpfm
`container_perf_uncore_events_total` | Counter | Scaled counter of perf uncore event (the event is identified by the `event` label; the `pmu` and `socket` labels indicate the PMU and the CPU socket for which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). The metric exists only for the main cgroup (id="/"). | | perf_event | libpfm
`container_processes` | Gauge | Number of processes running inside the container | | process |
`container_referenced_bytes` | Gauge | Bytes referenced by the container during the last measurement cycle, based on the Referenced field in the /proc/smaps file, with /proc/PIDs/clear_refs set to 1 after a defined number of cycles configured through the `referenced_reset_interval` cAdvisor parameter.<br>Warning: this collection is intrusive because it can influence the kernel page reclaim policy and add latency. Refer to https://github.com/brendangregg/wss#wsspl-referenced-page-flag for more details. | bytes | referenced_memory |
`container_sockets` | Gauge | Number of open sockets for the container | | process |
`container_spec_cpu_period` | Gauge | CPU period of the container | | - |
`container_spec_cpu_quota` | Gauge | CPU quota of the container | | - |
`container_spec_cpu_shares` | Gauge | CPU share of the container | | - |
`container_spec_memory_limit_bytes` | Gauge | Memory limit for the container | bytes | - |
`container_spec_memory_reservation_limit_bytes` | Gauge | Memory reservation limit for the container | bytes | |
`container_spec_memory_swap_limit_bytes` | Gauge | Memory swap limit for the container | bytes | |
`container_start_time_seconds` | Gauge | Start time of the container since Unix epoch | seconds | |
`container_tasks_state` | Gauge | Number of tasks in given state (`sleeping`, `running`, `stopped`, `uninterruptible`, or `iowaiting`) | | cpuLoad |
`container_threads` | Gauge | Number of threads running inside the container | | process |
`container_threads_max` | Gauge | Maximum number of threads allowed inside the container | | process |
`container_ulimits_soft` | Gauge | Soft ulimit values for the container root process. Unlimited if -1, except priority and nice | | process |
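
Counter metrics in the table above are cumulative, so they are usually read as a rate between two scrapes rather than as raw values. The sketch below shows one way to do that by hand: it parses two snapshots of the text served at the metrics endpoint and computes a per-second increase for each series. The parser is deliberately simplified (it keeps the label set as a raw string and does not handle every corner of the exposition format), and the `id` value and sample numbers are made up for illustration.

```python
import re

# Metric name, optional {labels}, value, optional trailing timestamp.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$")


def parse_samples(text):
    """Map (metric name, raw label string) to the sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[(name, labels or "")] = float(value)
    return samples


def counter_rates(before, after, seconds):
    """Per-second increase of every series present in both scrapes."""
    rates = {}
    for key, value in after.items():
        if key in before:
            delta = value - before[key]
            if delta < 0:  # the counter was reset, e.g. the container restarted
                delta = value
            rates[key] = delta / seconds
    return rates


if __name__ == "__main__":
    # Hypothetical scrapes taken 30 seconds apart.
    first = parse_samples('container_cpu_usage_seconds_total{id="/docker/abc"} 10.0\n')
    second = parse_samples('container_cpu_usage_seconds_total{id="/docker/abc"} 25.0\n')
    for (name, labels), rate in counter_rates(first, second, seconds=30).items():
        print(name, labels, rate)
```

For `container_cpu_usage_seconds_total`, the resulting rate is the average number of CPU cores the container used over the interval. Only rows of type Counter should be fed to `counter_rates`; Gauge rows are meaningful as they are. In practice Prometheus itself does this calculation when queried, so the sketch is mainly useful for understanding what the numbers mean.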

## Prometheus hardware metrics

The table below lists the Prometheus hardware metrics exposed by cAdvisor (in alphabetical order by metric name), along with the corresponding `-disable_metrics` / `-enable_metrics` option parameter:

Metric name | Type | Description | Unit (where applicable) | Option parameter | Additional build flag |
:-----------|:-----|:------------|:------------------------|:---------------------------|:--------------------
`machine_cpu_cache_capacity_bytes` | Gauge | Cache size in bytes assigned to NUMA node and CPU core | bytes | cpu_topology |
`machine_cpu_cores` | Gauge | Number of logical CPU cores | | |
`machine_cpu_physical_cores` | Gauge | Number of physical CPU cores | | |
`machine_cpu_sockets` | Gauge | Number of CPU sockets | | |
`machine_dimm_capacity_bytes` | Gauge | Total RAM DIMM capacity (all types of memory modules), labeled by DIMM type.<br>Information is retrieved from the sysfs EDAC per-DIMM API (/sys/devices/system/edac/mc/), introduced in kernel 3.6 | bytes | |
`machine_dimm_count` | Gauge | Number of RAM DIMMs (all types of memory modules), labeled by DIMM type.<br>Information is retrieved from the sysfs EDAC per-DIMM API (/sys/devices/system/edac/mc/), introduced in kernel 3.6 | | |
`machine_memory_bytes` | Gauge | Amount of memory installed on the machine | bytes | |
`machine_swap_bytes` | Gauge | Amount of swap memory available on the machine | bytes | |
`machine_node_distance` | Gauge | Distance between NUMA node and target NUMA node | | cpu_topology |
`machine_node_hugepages_count` | Gauge | Number of hugepages assigned to NUMA node | | cpu_topology |
`machine_node_memory_capacity_bytes` | Gauge | Amount of memory assigned to NUMA node | bytes | cpu_topology |
`machine_nvm_avg_power_budget_watts` | Gauge | NVM power budget | watts | | libipmctl
`machine_nvm_capacity` | Gauge | NVM capacity value labeled by NVM mode (memory mode or app direct mode) | bytes | | libipmctl
`machine_thread_siblings_count` | Gauge | Number of CPU thread siblings | | cpu_topology |
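
Hardware metrics are mostly useful as denominators for container metrics. As a small sketch of that idea, the function below expresses a container's memory and CPU usage as fractions of machine capacity, using `machine_memory_bytes` and `machine_cpu_cores` from the table above together with `container_memory_working_set_bytes` and a per-second rate of `container_cpu_usage_seconds_total`. The function name, the dictionary layout, and the sample numbers are illustrative assumptions, not part of cAdvisor.

```python
def utilization(working_set_bytes, cpu_seconds_per_second, machine):
    """Express container usage as fractions of machine capacity.

    working_set_bytes:       value of container_memory_working_set_bytes
    cpu_seconds_per_second:  rate of container_cpu_usage_seconds_total
    machine:                 dict holding the machine_memory_bytes and
                             machine_cpu_cores values
    """
    memory = machine["machine_memory_bytes"]
    cores = machine["machine_cpu_cores"]
    if memory <= 0 or cores <= 0:
        raise ValueError("machine capacity must be positive")
    return {
        "memory": working_set_bytes / memory,
        "cpu": cpu_seconds_per_second / cores,
    }


if __name__ == "__main__":
    # Hypothetical machine: 8 GiB of memory and 4 logical cores.
    machine = {"machine_memory_bytes": 8 * 1024**3, "machine_cpu_cores": 4}
    print(utilization(2 * 1024**3, 1.5, machine))
```

Dividing by `machine_cpu_cores` uses logical cores as the CPU capacity; `machine_cpu_physical_cores` would be the alternative if hyper-threads should not count as full cores.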