github.com/inspektor-gadget/inspektor-gadget@v0.28.1/docs/guides/prometheus.md

---
title: 'Using Prometheus'
weight: 30
description: >
  Expose metrics using Prometheus
---

The Prometheus gadget collects and exposes metrics in Prometheus format. It's available both for
Kubernetes (`ig-k8s`) and for Linux hosts (`ig`):

```bash
$ kubectl gadget prometheus --config @<path>
$ ig prometheus --config @<path> --metrics-listen-address $IP:$PORT --metrics-path /metrics
```

## Configuration File

The configuration file defines the metrics to be exposed and their settings. It has the following
structure:

```yaml
metrics_name: metrics_name
metrics:
  - name: metric_name
    type: counter # counter, gauge or histogram
    category: trace # category of the gadget used to collect the metric: trace, snapshot, etc.
    gadget: exec # gadget used to collect the metric: exec, open, etc.
    selector:
      # defines which events to take into consideration when updating the metrics.
      # See more information below.
    labels:
      # defines the granularity of the labels to capture. See below.
```

### Filtering (aka Selectors)

Selectors make Inspektor Gadget update a metric only for the events whose columns match specific
values. This is useful to keep the cardinality of the labels low.

```yaml
selector:
  - "columnName:value" # matches if the content of the column is equal to value
  - "columnName:!value" # matches if the content of the column is not equal to value
  - "columnName:>=value" # matches if the content of the column is greater than or equal to value
  - "columnName:>value" # matches if the content of the column is greater than value
  - "columnName:<=value" # matches if the content of the column is less than or equal to value
  - "columnName:<value" # matches if the content of the column is less than value
  - "columnName:~value" # matches if the content of the column matches the regular expression 'value'.
                        # See https://github.com/google/re2/wiki/Syntax for more information on the syntax.
```
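
To make the operators above concrete, the Bash sketch below evaluates a single rule (the part after `columnName:`) against the value of one column. It is illustrative only and is not how Inspektor Gadget implements selectors: the function name `selector_match` is hypothetical, the ordering operators assume integer columns, and Bash's `=~` uses POSIX extended regular expressions rather than RE2.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Inspektor Gadget.
# Usage: selector_match VALUE RULE
#   VALUE is the content of one column of an event, e.g. "1000" for uid.
#   RULE  is the text after "columnName:", e.g. ">=1", "!0", "~^ca", "cat".
# Returns 0 when VALUE matches RULE and 1 otherwise.
selector_match() {
  local value=$1 rule=$2
  case $rule in
    '>='*) (( value >= ${rule:2} )) ;;  # assumes integer columns
    '<='*) (( value <= ${rule:2} )) ;;
    '>'*)  (( value > ${rule:1} )) ;;
    '<'*)  (( value < ${rule:1} )) ;;
    '!'*)  [[ $value != "${rule:1}" ]] ;;
    '~'*)  [[ $value =~ ${rule:1} ]] ;;  # POSIX ERE here, RE2 in the gadget
    *)     [[ $value == "$rule" ]] ;;
  esac
}

# A uid of 1000 satisfies "uid:>=1", so this prints "non-root".
selector_match 1000 '>=1' && echo non-root
```

Note that the two-character operators are checked before `>` and `<`, otherwise `>=1` would be read as "greater than `=1`".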

Some examples are:

Only metrics for the `default` namespace:

```yaml
selector:
  - "k8s.namespace:default"
```

Only events with `retval != 0`:

```yaml
selector:
  - "retval:!0"
```

Only events from PID 1 by non-root users:

```yaml
selector:
  - "pid:1"
  - "uid:>=1"
```

### Counters

This is the most intuitive metric: "A _counter_ is a cumulative metric that represents a
single [monotonically increasing counter](https://en.wikipedia.org/wiki/Monotonic_function) whose
value can only increase or be reset to zero on restart. For example, you can use a counter to
represent the number of requests served, tasks completed, or errors." from
[https://prometheus.io/docs/concepts/metric_types/#counter](https://prometheus.io/docs/concepts/metric_types/#counter).
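
Since a counter only goes down when it is reset, the amount it grew between two scrapes is the difference of the two samples, unless the newer sample is smaller. That signals a reset (for example, a restarted gadget), and the newer sample is then the growth since the reset. Prometheus's `rate()` and `increase()` functions handle resets in this spirit, although they also extrapolate over the query range. The Bash function below is a simplified, hypothetical sketch of the reset handling only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: growth of a counter between two consecutive samples.
# Usage: counter_increase OLDER NEWER (integers)
counter_increase() {
  local older=$1 newer=$2
  if (( newer >= older )); then
    echo $(( newer - older ))
  else
    # The counter went down, so it must have been reset to zero in between;
    # everything counted since then is the newer sample itself.
    echo "$newer"
  fi
}

counter_increase 120 150   # prints 30
counter_increase 150 20    # prints 20 (reset detected)
```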

The following are examples of counters we can support with the existing gadgets. The first one
counts the number of executed processes by namespace, pod and container.

```yaml
metrics_name: my_metrics
metrics:
  - name: executed_processes
    type: counter
    category: trace
    gadget: exec
    labels:
      - k8s.namespace
      - k8s.pod
      - k8s.container
```

By default, a counter is increased by one for each event. However, it's also possible to increase
it by the value of a field of the event.

Executed processes by pod and container in the `default` namespace:

```yaml
metrics_name: metrics_name
metrics:
  - name: executed_processes
    type: counter
    category: trace
    gadget: exec
    labels:
      - k8s.pod
      - k8s.container
    selector:
      - "k8s.namespace:default"
```

Or only count events for a given command:

`cat` executions by namespace, pod and container:

```yaml
metrics_name: metrics_name
metrics:
  - name: executed_cats # ohno!
    type: counter
    category: trace
    gadget: exec
    labels:
      - k8s.namespace
      - k8s.pod
      - k8s.container
    selector:
      - "comm:cat"
```

DNS requests aggregated by namespace and pod:

```yaml
metrics_name: metrics_name
metrics:
  - name: dns_requests
    type: counter
    category: trace
    gadget: dns
    labels:
      - k8s.namespace
      - k8s.pod
    selector:
      - "qr:Q" # Only count query events
```

### Gauges

"A _gauge_ is a metric that represents a single numerical value that can arbitrarily go up and down"
from [https://prometheus.io/docs/concepts/metric_types/#gauge](https://prometheus.io/docs/concepts/metric_types/#gauge).

Right now, gauges are only supported for snapshot gadgets.

Examples of gauges are:

Number of processes by namespace, pod and container:

```yaml
metrics_name: metrics_name
metrics:
  - name: number_of_processes
    type: gauge
    category: snapshot
    gadget: process
    labels:
      - k8s.namespace
      - k8s.pod
      - k8s.container
```

Number of sockets in `CLOSE_WAIT` state:

```yaml
metrics_name: metrics_name
metrics:
  - name: number_of_sockets_close_wait
    type: gauge
    category: snapshot
    gadget: socket
    labels:
      - k8s.namespace
      - k8s.pod
      - k8s.container
    selector:
      - "status:CLOSE_WAIT"
```

### Histograms

"A _histogram_ samples observations (usually things like request durations or response sizes) and counts them in
configurable buckets. It also provides a sum of all observed values."
from [https://prometheus.io/docs/concepts/metric_types/#histogram](https://prometheus.io/docs/concepts/metric_types/#histogram).
We support the same bucket configuration as described in
[https://github.com/cloudflare/ebpf_exporter#histograms](https://github.com/cloudflare/ebpf_exporter#histograms).

Right now, histograms are only supported for trace gadgets.

An example of a histogram:

Latency of DNS requests for all pods:

```yaml
metrics_name: metrics_name
metrics:
  - name: dns_requests_latency
    type: histogram
    category: trace
    gadget: dns
    field: latency
    bucket:
      min: 0
      max: 10
      multiplier: 100000 # 0.1ms
      type: exp2
      unit: ns
    selector:
      - "qr:R" # Latency is only calculated for response events
```
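
To see which bucket boundaries this example produces, the Bash sketch below assumes the `exp2` behavior described in the ebpf_exporter documentation, where bucket *i* has an upper bound of 2^*i* for *i* from `min` to `max`, and that each bound is scaled by `multiplier`. Under those assumptions the buckets range from 100000 ns (0.1 ms) to 102400000 ns (102.4 ms). The function name `exp2_bounds` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the upper bound of each exp2 bucket, one per line.
# Assumption: bound(i) = multiplier * 2^i for i = min..max.
exp2_bounds() {
  local min=$1 max=$2 multiplier=$3 i
  for (( i = min; i <= max; i++ )); do
    echo $(( multiplier * (1 << i) ))
  done
}

# Values from the dns_requests_latency example above (unit: ns).
exp2_bounds 0 10 100000
```

Under the same assumption, DNS responses slower than about 102 ms would only land in the `+Inf` bucket that every Prometheus histogram has, so `max` is worth raising if such latencies matter.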

### Guide

Let's see how we can use this gadget in different environments.

#### On Kubernetes

In this guide we'll use Prometheus service discovery, which automatically detects the endpoints to
scrape metrics from.

If you already have a Prometheus instance running in your cluster, make sure it has the following
scrape configuration:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'

    scrape_interval: 1s
    scrape_timeout: 1s

    kubernetes_sd_configs:
    - role: pod

    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
```

Otherwise, you can just apply the config provided with this guide:

```bash
$ kubectl apply -f docs/examples/prometheus.yaml
namespace/monitoring created
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/discoverer created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-discoverer created
configmap/prometheus-server-conf created
deployment.apps/prometheus created
```
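
The last relabel rule in the scrape configuration shown above is the least obvious one. Prometheus joins the values of `source_labels` with `;` (the default separator) and matches `regex` against the whole string, so a pod discovered at `10.0.0.5:8080` with a port annotation of `9100` (made-up values) yields `10.0.0.5:8080;9100`, which `$1:$2` rewrites to `10.0.0.5:9100`. The sketch below reproduces that rewrite with `sed -E`. Since POSIX ERE has no `(?:...)` non-capturing group, the optional port is an ordinary group and the replacement uses `\1` and `\3`. The function name `rewrite_address` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the __address__ relabel rule.
# Input: "<__address__>;<prometheus.io/port annotation>", i.e. the source
# labels joined with Prometheus's default ";" separator.
# ^ and $ mimic Prometheus, which anchors relabel regexes at both ends.
rewrite_address() {
  printf '%s\n' "$1" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

rewrite_address '10.0.0.5:8080;9100'   # prints 10.0.0.5:9100
rewrite_address '10.0.0.5;9100'        # prints 10.0.0.5:9100
```

When the input doesn't match (for example, a pod without the port annotation gives `10.0.0.5:8080;`), `sed` leaves it unchanged, much like Prometheus leaves the target label untouched when a `replace` rule's regex doesn't match.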

Create a port-forward session to Prometheus:

```bash
$ kubectl port-forward --namespace monitoring deployment/prometheus 9090:9090 &
```

Let's create a metric that reports executed processes:

```yaml
# myconfig.yaml
metrics_name: guide
metrics:
  - name: executed_processes
    type: counter
    category: trace
    gadget: exec
    labels:
      - k8s.namespace
      - k8s.pod
      - k8s.container
```

Start the gadget:

```bash
$ kubectl gadget prometheus --config @myconfig.yaml
INFO[0000] Running. Press Ctrl + C to finish
INFO[0000] minikube             | Publishing metrics...
```

<!-- markdown-link-check-disable-next-line -->
Now, the `executed_processes_total` counter is available in Prometheus at http://localhost:9090/graph?g0.expr=executed_processes_total&g0.tab=0&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1m:

![Inspektor Gadget Counter Metric](../images/prometheus_counter_1.png)

You can see that the counters are already going up for some containers.

Let's create a pod that executes more processes:

```bash
$ kubectl run mypod1 -it --image busybox --restart Never -- sh -c 'for i in $(seq 0 1 1000); do cat /dev/null ; ping -c 1 localhost > /dev/null; done'
```

If we check the counter again, we can see that our pod has executed a lot of processes:

![Inspektor Gadget Counter Metric](../images/prometheus_counter_2.png)

Now, update the configuration file to only take into account executions of the `cat` binary:

```yaml
# myconfig.yaml
metrics_name: guide
metrics:
  - name: executed_processes
    type: counter
    category: trace
    gadget: exec
    labels:
      - k8s.namespace
      - k8s.pod
      - k8s.container
    selector:
      - "comm:cat"
```

Restart the gadget:

```bash
$ kubectl gadget prometheus --config @myconfig.yaml
INFO[0000] Running. Press Ctrl + C to finish
INFO[0000] minikube             | Publishing metrics...
```

Create a new pod that executes processes:

```bash
$ kubectl run mypod2 -it --image busybox --restart Never -- sh -c 'for i in $(seq 0 1 1000); do cat /dev/null ; ping -c 1 localhost > /dev/null; done'
```

The counter now only takes the `cat` executions into account:

![Inspektor Gadget Counter Metric](../images/prometheus_counter_3.png)

#### With `ig`

It's also possible to use the Prometheus gadget without Kubernetes. In this case, we have to
configure Prometheus to scrape the endpoint exposed by `ig`, which is `localhost:2223` by default:

```yaml
# prometheus.yaml
scrape_configs:
- job_name: ig
  scrape_interval: 1s
  static_configs:
  - targets:
    - localhost:2223
```

Start Prometheus with the above configuration (refer to the
[Docker installation](https://prometheus.io/docs/prometheus/latest/installation/#using-docker)
instructions if you want to run Prometheus in a container):

```bash
$ prometheus --config.file prometheus.yaml
```

Then, start the Prometheus gadget with the same configuration used in the Kubernetes section above:

```bash
$ sudo ig prometheus --config @myconfig.yaml
INFO[0000] Running. Press Ctrl + C to finish
INFO[0000] Publishing metrics...
```

<!-- markdown-link-check-disable-next-line -->
Open http://localhost:9090/targets and check that the `ig` endpoint is reporting metrics:

![ig service up](../images/prometheus_ig_service_up.png)

Let's execute some commands inside a container:

```bash
$ docker run --rm -ti --name=mycontainer busybox sh -c 'for i in $(seq 0 1 1000); do cat /dev/null ; ping -c 1 localhost > /dev/null; done'
```

<!-- markdown-link-check-disable-next-line -->
We can see the counter for `mycontainer` increasing at http://localhost:9090/graph?g0.expr=executed_processes_total&g0.tab=0&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1m.

![ig counter](../images/prometheus_ig_counter1.png)
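
Prometheus isn't strictly needed to peek at the values: the endpoint serves plain text in the Prometheus exposition format, one sample per line. The Bash sketch below sums every series of one metric with `awk`. It's a rough helper under stated assumptions: the name `metric_sum` is hypothetical, it assumes label values contain no spaces and samples carry no trailing timestamp, and the usage line assumes `ig` serves metrics on `localhost:2223` under `/metrics`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text exposition format on stdin and
# print the sum of all samples of metric NAME across its label sets.
# Usage: metric_sum NAME
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                         # skip HELP and TYPE comment lines
    {
      series = $1
      sub(/[{].*/, "", series)            # drop the {label="..."} part, if any
      if (series == name) total += $NF    # value is the last field (no timestamp)
    }
    END { print total + 0 }
  '
}

# Usage, assuming the default ig endpoint:
#   curl -s http://localhost:2223/metrics | metric_sum executed_processes_total
```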

#### Grafana

It's possible to visualize the metrics in Grafana. As an example, we will plot a histogram of DNS request latency. We
can use the [docker compose file](../../tools/monitoring/docker-compose.yml) to prepare the environment:

```bash
$ pushd tools/monitoring
$ docker compose up -d
$ popd
```

<!-- markdown-link-check-disable-next-line -->
At this point, Grafana is available at http://localhost:3000 and Prometheus at http://localhost:9090. We can start `ig`
with the following configuration:

```bash
$ sudo ig prometheus --config @tools/monitoring/config/histogram.yaml
INFO[0000] Running. Press Ctrl + C to finish
```

Now, generate some DNS requests:

```bash
$ docker run --rm -ti busybox sh -c 'for i in $(seq 0 1 1000); do cat /dev/null ; nslookup -querytype=a microsoft.com. > /dev/null; done'
```

<!-- markdown-link-check-disable -->
We should now be able to see the histogram
at http://localhost:3000/d/e1981f70-308c-4784-b986-9b5f1a895444/inspektor-gadget?orgId=1&viewPanel=1
<!-- markdown-link-check-enable -->

![ig histogram](../images/prometheus_ig_histogram.png)
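
The dashboard plots the histogram buckets. For a quick number from a terminal, every Prometheus histogram also exposes `<name>_sum` and `<name>_count` series, and their ratio is the mean of the observed values. The Bash sketch below computes that ratio from the exposition format. The function name `histogram_mean` is hypothetical, it assumes label values contain no spaces and samples have no trailing timestamp, and the exact metric name and unit that `ig` exposes for `dns_requests_latency` should be checked on the `/metrics` endpoint rather than assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mean of a Prometheus histogram, computed as the sum of
# all NAME_sum samples divided by the sum of all NAME_count samples.
# Reads text exposition format on stdin. Usage: histogram_mean NAME
histogram_mean() {
  awk -v name="$1" '
    /^#/ { next }                               # skip HELP and TYPE lines
    {
      series = $1
      sub(/[{].*/, "", series)                  # drop the label set, if any
      if (series == (name "_sum"))   sum   += $NF
      if (series == (name "_count")) count += $NF
    }
    END { if (count > 0) printf "%.6g\n", sum / count; else print "NaN" }
  '
}

# Usage (replace NAME with the histogram name listed on the endpoint):
#   curl -s http://localhost:2223/metrics | histogram_mean NAME
```

Unlike the Grafana panel, this gives a single average over everything observed so far; it doesn't show how latencies are distributed across buckets.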

### Limitations

- The `kubectl gadget` instance has to keep running in order to update the metrics.
- It's not possible to configure the metrics endpoint in `ig-k8s`.