# Monitoring cAdvisor with Prometheus

cAdvisor exposes container and hardware statistics as [Prometheus](https://prometheus.io) metrics out of the box. By default, these metrics are served under the `/metrics` HTTP endpoint. The endpoint path can be changed with the `-prometheus_endpoint` command-line flag, and the set of exposed metrics can be adjusted with the `-disable_metrics` or `-enable_metrics` flags.

Some metrics can only be collected if cAdvisor is built with additional build flags (for details, see the [build instructions](../development/build.md)). The required flags are listed in the "additional build flag" column of the tables below.

To monitor cAdvisor with Prometheus, configure one or more jobs in Prometheus that scrape the relevant cAdvisor processes at that metrics endpoint. For details, see Prometheus's [Configuration](https://prometheus.io/docs/operating/configuration/) documentation, as well as the [Getting started](https://prometheus.io/docs/introduction/getting_started/) guide.

# Examples

* [CenturyLink Labs](https://labs.ctl.io/) did an excellent write-up on [Monitoring Docker services with Prometheus + cAdvisor](https://www.ctl.io/developers/blog/post/monitoring-docker-services-with-prometheus/). While it gives a good overview of cAdvisor integration with Prometheus, its PromDash GUI section is outdated, as PromDash has been deprecated in favor of Grafana.

* [vegasbrianc](https://github.com/vegasbrianc) provides a [starter project](https://github.com/vegasbrianc/prometheus) for cAdvisor and Prometheus monitoring, alongside a ready-to-use [Grafana dashboard](https://github.com/vegasbrianc/grafana_dashboard).
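Before wiring up a Prometheus job, it can help to fetch the endpoint yourself and look at what a scrape returns. The following is a minimal Python sketch, not part of cAdvisor or Prometheus: it assumes cAdvisor is reachable at `http://localhost:8080` (cAdvisor's default port) with the default `/metrics` path, and the names `CADVISOR_METRICS_URL`, `parse_exposition`, and `fetch_samples` are hypothetical. It understands only plain sample lines of the Prometheus text format, skipping `# HELP` / `# TYPE` comments and ignoring timestamps.

```python
import re
import urllib.request

# Assumption: cAdvisor listens on its default port (8080) and serves metrics at
# the default /metrics path (the path can be changed with -prometheus_endpoint).
CADVISOR_METRICS_URL = "http://localhost:8080/metrics"

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?\s*$'
)
# One label pair; label values may contain backslash escapes.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')
_ESCAPES = {"n": "\n", "\\": "\\", '"': '"'}


def _unescape(value):
    """Undo the \\n, \\\\ and \\" escapes used in label values."""
    return re.sub(r'\\(.)', lambda m: _ESCAPES.get(m.group(1), m.group(0)), value)


def parse_exposition(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples.

    Simplified sketch: comment lines are skipped, timestamps are dropped,
    and lines that do not match the sample pattern are ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value, _timestamp = match.groups()
        labels = {k: _unescape(v) for k, v in _LABEL_RE.findall(raw_labels or "")}
        # float() also accepts the exposition values NaN, +Inf and -Inf.
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_samples(url=CADVISOR_METRICS_URL, timeout=5.0):
    """Scrape the endpoint once, roughly as a Prometheus job would."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_exposition(response.read().decode("utf-8"))
```

For example, `[s for s in fetch_samples() if s[0] == "container_memory_working_set_bytes"]` would list the working-set gauge for every container cAdvisor reports. In a real deployment the Prometheus server does this parsing; the sketch is only a way to inspect the raw output.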
## Prometheus container metrics

The table below lists the Prometheus container metrics exposed by cAdvisor (in alphabetical order by metric name) and the corresponding `-disable_metrics` / `-enable_metrics` option parameter:

Metric name | Type | Description | Unit (where applicable) | option parameter | additional build flag |
:-----------|:-----|:------------|:------------------------|:---------------------------|:----------------------
`container_blkio_device_usage_total` | Counter | Blkio device bytes usage | bytes | diskIO |
`container_cpu_cfs_periods_total` | Counter | Number of elapsed enforcement period intervals | | cpu |
`container_cpu_cfs_throttled_periods_total` | Counter | Number of throttled period intervals | | cpu |
`container_cpu_cfs_throttled_seconds_total` | Counter | Total time duration the container has been throttled | seconds | cpu |
`container_cpu_load_average_10s` | Gauge | Value of container CPU load average over the last 10 seconds | | cpuLoad |
`container_cpu_schedstat_run_periods_total` | Counter | Number of times processes of the cgroup have run on the CPU | | sched |
`container_cpu_schedstat_runqueue_seconds_total` | Counter | Time duration processes of the container have been waiting on a runqueue | seconds | sched |
`container_cpu_schedstat_run_seconds_total` | Counter | Time duration the processes of the container have run on the CPU | seconds | sched |
`container_cpu_system_seconds_total` | Counter | Cumulative system CPU time consumed | seconds | cpu |
`container_cpu_usage_seconds_total` | Counter | Cumulative CPU time consumed | seconds | cpu |
`container_cpu_user_seconds_total` | Counter | Cumulative user CPU time consumed | seconds | cpu |
`container_file_descriptors` | Gauge | Number of open file descriptors for the container | | process |
`container_fs_inodes_free` | Gauge | Number of available inodes | | disk |
`container_fs_inodes_total` | Gauge | Total number of inodes | | disk |
`container_fs_io_current` | Gauge | Number of I/Os currently in progress | | diskIO |
`container_fs_io_time_seconds_total` | Counter | Cumulative count of seconds spent doing I/Os | seconds | diskIO |
`container_fs_io_time_weighted_seconds_total` | Counter | Cumulative weighted I/O time | seconds | diskIO |
`container_fs_limit_bytes` | Gauge | Number of bytes that can be consumed by the container on this filesystem | bytes | disk |
`container_fs_reads_bytes_total` | Counter | Cumulative count of bytes read | bytes | diskIO |
`container_fs_read_seconds_total` | Counter | Cumulative count of seconds spent reading | seconds | diskIO |
`container_fs_reads_merged_total` | Counter | Cumulative count of reads merged | | diskIO |
`container_fs_reads_total` | Counter | Cumulative count of reads completed | | diskIO |
`container_fs_sector_reads_total` | Counter | Cumulative count of sector reads completed | | diskIO |
`container_fs_sector_writes_total` | Counter | Cumulative count of sector writes completed | | diskIO |
`container_fs_usage_bytes` | Gauge | Number of bytes that are consumed by the container on this filesystem | bytes | disk |
`container_fs_writes_bytes_total` | Counter | Cumulative count of bytes written | bytes | diskIO |
`container_fs_write_seconds_total` | Counter | Cumulative count of seconds spent writing | seconds | diskIO |
`container_fs_writes_merged_total` | Counter | Cumulative count of writes merged | | diskIO |
`container_fs_writes_total` | Counter | Cumulative count of writes completed | | diskIO |
`container_hugetlb_failcnt` | Counter | Number of times hugepage usage hit the limit | | hugetlb |
`container_hugetlb_max_usage_bytes` | Gauge | Maximum hugepage usage recorded | bytes | hugetlb |
`container_hugetlb_usage_bytes` | Gauge | Current hugepage usage | bytes | hugetlb |
`container_last_seen` | Gauge | Last time a container was seen by the exporter | timestamp | - |
`container_llc_occupancy_bytes` | Gauge | Last level cache usage statistics for the container, counted with RDT Cache Monitoring Technology (CMT). | bytes | resctrl |
`container_memory_bandwidth_bytes` | Gauge | Total memory bandwidth usage statistics for the container, counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_bandwidth_local_bytes` | Gauge | Local memory bandwidth usage statistics for the container, counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_cache` | Gauge | Total page cache memory | bytes | memory |
`container_memory_failcnt` | Counter | Number of times memory usage hit the limit | | memory |
`container_memory_failures_total` | Counter | Cumulative count of memory allocation failures | | memory |
`container_memory_mapped_file` | Gauge | Size of memory mapped files | bytes | memory |
`container_memory_max_usage_bytes` | Gauge | Maximum memory usage recorded | bytes | memory |
`container_memory_migrate` | Gauge | Memory migrate status | | cpuset |
`container_memory_numa_pages` | Gauge | Number of used pages per NUMA node | | memory_numa |
`container_memory_rss` | Gauge | Size of RSS | bytes | memory |
`container_memory_swap` | Gauge | Container swap usage | bytes | memory |
`container_memory_usage_bytes` | Gauge | Current memory usage, including all memory regardless of when it was accessed | bytes | memory |
`container_memory_working_set_bytes` | Gauge | Current working set | bytes | memory |
`container_network_advance_tcp_stats_total` | Gauge | Advanced TCP connection statistics for the container | | advtcp |
`container_network_receive_bytes_total` | Counter | Cumulative count of bytes received | bytes | network |
`container_network_receive_errors_total` | Counter | Cumulative count of errors encountered while receiving | | network |
`container_network_receive_packets_dropped_total` | Counter | Cumulative count of packets dropped while receiving | | network |
`container_network_receive_packets_total` | Counter | Cumulative count of packets received | | network |
`container_network_tcp6_usage_total` | Gauge | tcp6 connection usage statistics for the container | | tcp |
`container_network_tcp_usage_total` | Gauge | tcp connection usage statistics for the container | | tcp |
`container_network_transmit_bytes_total` | Counter | Cumulative count of bytes transmitted | bytes | network |
`container_network_transmit_errors_total` | Counter | Cumulative count of errors encountered while transmitting | | network |
`container_network_transmit_packets_dropped_total` | Counter | Cumulative count of packets dropped while transmitting | | network |
`container_network_transmit_packets_total` | Counter | Cumulative count of packets transmitted | | network |
`container_network_udp6_usage_total` | Gauge | udp6 connection usage statistics for the container | | udp |
`container_network_udp_usage_total` | Gauge | udp connection usage statistics for the container | | udp |
`container_oom_events_total` | Counter | Count of out-of-memory events observed for the container | | oom_event |
`container_perf_events_scaling_ratio` | Gauge | Scaling ratio for a perf event counter (the event is identified by the `event` label, and `cpu` indicates the core for which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | perf_event | libpfm
`container_perf_events_total` | Counter | Scaled counter of a perf core event (the event is identified by the `event` label, and `cpu` indicates the core for which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | perf_event | libpfm
`container_perf_uncore_events_scaling_ratio` | Gauge | Scaling ratio for a perf uncore event counter (the event is identified by the `event` label; the `pmu` and `socket` labels indicate the PMU and the CPU socket for which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). The metric exists only for the root cgroup (`id="/"`). | | perf_event | libpfm
`container_perf_uncore_events_total` | Counter | Scaled counter of a perf uncore event (the event is identified by the `event` label; the `pmu` and `socket` labels indicate the PMU and the CPU socket for which the event was measured). See [perf event configuration](../runtime_options.md#perf-events). The metric exists only for the root cgroup (`id="/"`). | | perf_event | libpfm
`container_processes` | Gauge | Number of processes running inside the container | | process |
`container_referenced_bytes` | Gauge | Container referenced bytes during the last measurement cycle, based on the Referenced field in the /proc/smaps file, with /proc/PIDs/clear_refs set to 1 after a defined number of cycles configured through the `referenced_reset_interval` cAdvisor parameter.<br>Warning: this collection is intrusive because it can influence the kernel page reclaim policy and add latency. Refer to https://github.com/brendangregg/wss#wsspl-referenced-page-flag for more details. | bytes | referenced_memory |
`container_sockets` | Gauge | Number of open sockets for the container | | process |
`container_spec_cpu_period` | Gauge | CPU period of the container | | - |
`container_spec_cpu_quota` | Gauge | CPU quota of the container | | - |
`container_spec_cpu_shares` | Gauge | CPU shares of the container | | - |
`container_spec_memory_limit_bytes` | Gauge | Memory limit for the container | bytes | - |
`container_spec_memory_reservation_limit_bytes` | Gauge | Memory reservation limit for the container | bytes | |
`container_spec_memory_swap_limit_bytes` | Gauge | Memory swap limit for the container | bytes | |
`container_start_time_seconds` | Gauge | Start time of the container since the Unix epoch | seconds | |
`container_tasks_state` | Gauge | Number of tasks in the given state (`sleeping`, `running`, `stopped`, `uninterruptible`, or `ioawaiting`) | | cpuLoad |
`container_threads` | Gauge | Number of threads running inside the container | | process |
`container_threads_max` | Gauge | Maximum number of threads allowed inside the container | | process |
`container_ulimits_soft` | Gauge | Soft ulimit values for the container root process. Unlimited if -1, except priority and nice | | process |

## Prometheus hardware metrics

The table below lists the Prometheus hardware metrics exposed by cAdvisor (in alphabetical order by metric name) and the corresponding `-disable_metrics` / `-enable_metrics` option parameter:

Metric name | Type | Description | Unit (where applicable) | option parameter | additional build flag |
:-----------|:-----|:------------|:------------------------|:---------------------------|:--------------------
`machine_cpu_cache_capacity_bytes` | Gauge | Cache size in bytes assigned to NUMA node and CPU core | bytes | cpu_topology |
`machine_cpu_cores` | Gauge | Number of logical CPU cores | | |
`machine_cpu_physical_cores` | Gauge | Number of physical CPU cores | | |
`machine_cpu_sockets` | Gauge | Number of CPU sockets | | |
`machine_dimm_capacity_bytes` | Gauge | Total RAM DIMM capacity (all types of memory modules), labeled by DIMM type.<br>The information is retrieved from the sysfs EDAC per-DIMM API (/sys/devices/system/edac/mc/) introduced in kernel 3.6 | bytes | |
`machine_dimm_count` | Gauge | Number of RAM DIMMs (all types of memory modules), labeled by DIMM type.<br>The information is retrieved from the sysfs EDAC per-DIMM API (/sys/devices/system/edac/mc/) introduced in kernel 3.6 | | |
`machine_memory_bytes` | Gauge | Amount of memory installed on the machine | bytes | |
`machine_node_distance` | Gauge | Distance between NUMA node and target NUMA node | | cpu_topology |
`machine_node_hugepages_count` | Gauge | Number of hugepages assigned to NUMA node | | cpu_topology |
`machine_node_memory_capacity_bytes` | Gauge | Amount of memory assigned to NUMA node | bytes | cpu_topology |
`machine_nvm_avg_power_budget_watts` | Gauge | NVM power budget | watts | | libipmctl
`machine_nvm_capacity` | Gauge | NVM capacity, labeled by NVM mode (memory mode or app direct mode) | bytes | | libipmctl
`machine_swap_bytes` | Gauge | Amount of swap memory available on the machine | bytes | |
`machine_thread_siblings_count` | Gauge | Number of CPU thread siblings | | cpu_topology |
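Many rows in the tables above are cumulative counters (`_total`, `_seconds_total`) or raw gauges that become useful only when combined. In Prometheus this is normally done in PromQL, for example with `rate()` over `container_cpu_usage_seconds_total`; the Python sketch below only illustrates the underlying arithmetic on two scrapes. The helper names (`counter_increase`, `counter_rate`, `throttled_fraction`, `memory_fraction`) are hypothetical. It assumes that a counter which goes down was reset to zero (for example after a container restart), and unlike PromQL's `rate()` it does no extrapolation to window boundaries.

```python
def counter_increase(prev_value, curr_value):
    """Increase of a cumulative counter between two scrapes.

    Assumption: a drop in value means the counter was reset to zero,
    so the whole current value counts as the increase.
    """
    if curr_value >= prev_value:
        return curr_value - prev_value
    return curr_value


def counter_rate(prev_value, curr_value, prev_ts, curr_ts):
    """Average per-second increase between scrapes taken at prev_ts and curr_ts (seconds).

    Applied to container_cpu_usage_seconds_total, this is the average number
    of CPU cores the container used. No extrapolation, unlike PromQL rate().
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("curr_ts must be later than prev_ts")
    return counter_increase(prev_value, curr_value) / elapsed


def throttled_fraction(prev_periods, curr_periods, prev_throttled, curr_throttled):
    """Share of elapsed CFS periods in which the container was throttled.

    Inputs are two scrapes each of container_cpu_cfs_periods_total and
    container_cpu_cfs_throttled_periods_total.
    """
    periods = counter_increase(prev_periods, curr_periods)
    if periods == 0:
        return 0.0  # no enforcement periods elapsed between the scrapes
    return counter_increase(prev_throttled, curr_throttled) / periods


def memory_fraction(used_bytes, total_bytes):
    """Ratio of two byte gauges, e.g. container_memory_working_set_bytes over
    container_spec_memory_limit_bytes or over machine_memory_bytes.

    Returns None for a non-positive denominator so that a zero or missing
    value is not mistaken for full usage.
    """
    if total_bytes <= 0:
        return None
    return used_bytes / total_bytes
```

For example, if `container_cpu_usage_seconds_total` grows from 100.0 to 130.0 between two scrapes 15 seconds apart, `counter_rate(100.0, 130.0, 0.0, 15.0)` returns `2.0`, meaning the container used about two CPU cores on average over that interval.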