.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    http://docs.cilium.io

.. _metrics:

********************
Monitoring & Metrics
********************

``cilium-agent`` and ``cilium-operator`` can be configured to serve `Prometheus
<https://prometheus.io>`_ metrics. Prometheus is a pluggable metrics collection
and storage system and can act as a data source for `Grafana
<https://grafana.com/>`_, a metrics visualization frontend. Unlike push-based
collectors such as statsd, Prometheus pulls (scrapes) metrics from each source.

To run Cilium with Prometheus metrics enabled, deploy it with the
``global.prometheus.enabled=true`` Helm value set.

All metrics are exported under the ``cilium`` Prometheus namespace. When
running and collecting in Kubernetes, they will be tagged with a pod name and
namespace.

Installation
============

When deployed with the Helm value ``global.prometheus.enabled=true``, all Cilium
components carry the following annotations, which signal Prometheus to scrape
their metrics:

.. code-block:: yaml

        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"

Example Prometheus & Grafana Deployment
---------------------------------------

If you don't have an existing Prometheus and Grafana stack running, you can
deploy a stack with:

.. parsed-literal::

    kubectl apply -f \ |SCM_WEB|\/examples/kubernetes/addons/prometheus/monitoring-example.yaml

It will run Prometheus and Grafana in the ``cilium-monitoring`` namespace. You
can then expose Grafana to access it via your browser.

.. code:: bash

    kubectl -n cilium-monitoring port-forward service/grafana 3000:3000

Open your browser and access ``https://localhost:3000/``

cilium-agent
============

To expose any metrics, invoke ``cilium-agent`` with the
``--prometheus-serve-addr`` option. This option takes an ``IP:Port`` pair, but
passing an empty IP (e.g. ``:9090``) will bind the server to all available
interfaces (there is usually only one in a container).

An example can be found in
:git-tree:`examples/kubernetes/addons/prometheus/monitoring-example.yaml`

Exported Metrics
----------------

Endpoint
~~~~~~~~

============================================ =========== ========================================================
Name                                         Labels      Description
============================================ =========== ========================================================
``endpoint_count``                                       Number of endpoints managed by this agent
``endpoint_regenerations``                   ``outcome`` Count of all endpoint regenerations that have completed
``endpoint_regeneration_time_stats_seconds`` ``scope``   Endpoint regeneration time stats
``endpoint_state``                           ``state``   Count of all endpoints
============================================ =========== ========================================================

Services
~~~~~~~~

========================= ====== ========================================================
Name                      Labels Description
========================= ====== ========================================================
``services_events_total``        Number of services events labeled by action type
========================= ====== ========================================================

Datapath
~~~~~~~~

============================================= ============================== ========================================================
Name                                          Labels                         Description
============================================= ============================== ========================================================
``datapath_errors_total``                     ``area``, ``name``, ``family`` Total number of errors that occurred in datapath management
``datapath_conntrack_gc_runs_total``          ``status``                     Number of times that the conntrack garbage collector process was run
``datapath_conntrack_gc_key_fallbacks_total``                                Number of times a key fallback was needed when iterating over the BPF map, labeled by datapath family
``datapath_conntrack_gc_entries``             ``family``                     The number of alive and deleted conntrack entries at the end of a garbage collector run
``datapath_conntrack_gc_duration_seconds``    ``status``                     Duration in seconds of the garbage collector process
============================================= ============================== ========================================================

BPF
~~~

================================ ======================================= ========================================================
Name                             Labels                                  Description
================================ ======================================= ========================================================
``bpf_syscall_duration_seconds`` ``operation``, ``outcome``              Duration of BPF system calls performed
``bpf_map_ops_total``            ``mapName``, ``operation``, ``outcome`` Number of BPF map operations performed
================================ ======================================= ========================================================

Drops/Forwards (L3/L4)
~~~~~~~~~~~~~~~~~~~~~~

======================= ========================= ========================================================
Name                    Labels                    Description
======================= ========================= ========================================================
``drop_count_total``    ``reason``, ``direction`` Total dropped packets
``drop_bytes_total``    ``reason``, ``direction`` Total dropped bytes
``forward_count_total`` ``direction``             Total forwarded packets
``forward_bytes_total`` ``direction``             Total forwarded bytes
======================= ========================= ========================================================

Policy
~~~~~~

========================================== ========= ========================================================
Name                                       Labels    Description
========================================== ========= ========================================================
``policy_count``                                     Number of policies currently loaded
``policy_regeneration_total``                        Total number of policies regenerated successfully
``policy_regeneration_time_stats_seconds`` ``scope`` Policy regeneration time stats labeled by the scope
``policy_max_revision``                              Highest policy revision number in the agent
``policy_import_errors``                             Number of times a policy import has failed
``policy_endpoint_enforcement_status``               Number of endpoints labeled by policy enforcement status
========================================== ========= ========================================================

Policy L7 (HTTP/Kafka)
~~~~~~~~~~~~~~~~~~~~~~

================================ ============ ========================================================
Name                             Labels       Description
================================ ============ ========================================================
``proxy_redirects``              ``protocol`` Number of redirects installed for endpoints
``proxy_upstream_reply_seconds``              Seconds waited for upstream server to reply to a request
``policy_l7_total``              ``type``     Number of total L7 requests/responses
================================ ============ ========================================================

Identity
~~~~~~~~

================== ====== ========================================================
Name               Labels Description
================== ====== ========================================================
``identity_count``        Number of identities currently allocated
================== ====== ========================================================

Events external to Cilium
~~~~~~~~~~~~~~~~~~~~~~~~~

============ ========== ========================================================
Name         Labels     Description
============ ========== ========================================================
``event_ts`` ``source`` Last timestamp when we received an event
============ ========== ========================================================

Controllers
~~~~~~~~~~~

===================================== ========== ========================================================
Name                                  Labels     Description
===================================== ========== ========================================================
``controllers_runs_total``            ``status`` Number of times that a controller process was run
``controllers_runs_duration_seconds`` ``status`` Duration in seconds of the controller process
===================================== ========== ========================================================

SubProcess
~~~~~~~~~~

========================== ============= ========================================================
Name                       Labels        Description
========================== ============= ========================================================
``subprocess_start_total`` ``subsystem`` Number of times that Cilium has started a subprocess
========================== ============= ========================================================

Kubernetes
~~~~~~~~~~

===================================== ================================================== ========================================================
Name                                  Labels                                             Description
===================================== ================================================== ========================================================
``kubernetes_events_received_total``  ``scope``, ``action``, ``validity``, ``equiality`` Number of Kubernetes events received
``kubernetes_events_total``           ``scope``, ``action``, ``outcome``                 Number of Kubernetes events processed
``k8s_cnp_status_completion_seconds`` ``attempts``, ``outcome``                          Duration in seconds of how long it took to complete a CNP status update
===================================== ================================================== ========================================================

IPAM
~~~~

===================== ====== ========================================================
Name                  Labels Description
===================== ====== ========================================================
``ipam_events_total``        Number of IPAM events received labeled by action and datapath family type
===================== ====== ========================================================

KVstore
~~~~~~~

======================================= ============================================ ========================================================
Name                                    Labels                                       Description
======================================= ============================================ ========================================================
``kvstore_operations_duration_seconds`` ``action``, ``kind``, ``outcome``, ``scope`` Duration of kvstore operation
``kvstore_events_queue_seconds``        ``action``, ``scope``                        Duration in seconds that a received event was blocked before it could be queued
======================================= ============================================ ========================================================

Agent
~~~~~

============================ ====================== ========================================================
Name                         Labels                 Description
============================ ====================== ========================================================
``agent_bootstrap_seconds``  ``scope``, ``outcome`` Duration of various bootstrap phases
``api_process_time_seconds``                        Processing time of all the API calls made to the cilium-agent, labeled by API method, API path and returned HTTP code.
============================ ====================== ========================================================

FQDN
~~~~

=========================== ====== ========================================================
Name                        Labels Description
=========================== ====== ========================================================
``fqdn_gc_deletions_total``        Number of FQDNs cleaned up by the FQDN garbage collector job
=========================== ====== ========================================================

cilium-operator
===============

``cilium-operator`` can be configured to serve metrics by running with the
option ``--enable-metrics``. By default, the operator will expose metrics on
port 6942; the port can be changed with the option ``--metrics-address``.

Exported Metrics
----------------

All metrics are exported under the ``cilium_operator_`` Prometheus namespace.

ENI
~~~

================================ =============================== ========================================================
Name                             Labels                          Description
================================ =============================== ========================================================
``eni_ips``                      ``type``                        Number of IPs allocated
``eni_allocation_ops``           ``subnetId``                    Number of IP allocation operations
``eni_interface_creation_ops``   ``subnetId``, ``status``        Number of ENIs allocated
``eni_available``                                                Number of ENIs with addresses available
``eni_nodes_at_capacity``                                        Number of nodes unable to allocate more addresses
``eni_aws_api_duration_seconds`` ``operation``, ``responseCode`` Duration of interactions with AWS API
``eni_resync_total``                                             Number of synchronization operations to synchronize AWS EC2 metadata
``eni_ec2_rate_limit``           ``operation``                   Number of times the EC2 client rate limiter kicked in
================================ =============================== ========================================================
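
As a rough illustration of the ``cilium_operator_`` namespace described above,
the following Python sketch picks operator metrics out of a Prometheus
text-format scrape and sums the samples per metric name. It is a minimal
sketch and not part of Cilium: the helper names ``parse_samples`` and
``operator_metrics``, the sample payload and its label values are made up for
illustration, and it assumes that full metric names are formed by prepending
``cilium_operator_`` to the names listed in the table.

.. code-block:: python

    from typing import Dict, List, Tuple

    # Assumption: every operator metric name starts with this prefix, as
    # stated for the ``cilium_operator_`` Prometheus namespace.
    OPERATOR_PREFIX = "cilium_operator_"


    def parse_samples(text: str) -> List[Tuple[str, str, float]]:
        """Parse Prometheus text-format lines into (name, labels, value)."""
        samples = []
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and HELP/TYPE comments
            if "{" in line:
                # name{label="value",...} value [timestamp]
                name, rest = line.split("{", 1)
                labels, _, tail = rest.rpartition("}")
            else:
                # name value [timestamp]
                name, _, tail = line.partition(" ")
                labels = ""
            value = float(tail.split()[0])
            samples.append((name, labels, value))
        return samples


    def operator_metrics(text: str) -> Dict[str, float]:
        """Sum sample values per metric name, keeping operator metrics only."""
        totals: Dict[str, float] = {}
        for name, _labels, value in parse_samples(text):
            if name.startswith(OPERATOR_PREFIX):
                totals[name] = totals.get(name, 0.0) + value
        return totals


    # Hypothetical scrape payload; the label values are illustrative only.
    SAMPLE = """\
    # HELP cilium_operator_eni_ips Number of IPs allocated
    # TYPE cilium_operator_eni_ips gauge
    cilium_operator_eni_ips{type="available"} 12
    cilium_operator_eni_ips{type="used"} 30
    cilium_operator_eni_nodes_at_capacity 1
    go_goroutines 42
    """

    if __name__ == "__main__":
        for name, total in sorted(operator_metrics(SAMPLE).items()):
            print(name, total)

In practice the text would come from the operator's metrics endpoint (port
6942 by default, as noted above). The ``/metrics`` path is the usual
Prometheus convention and is an assumption here, not something this page
states.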