# Prometheus Metrics

![alpha](assets/alpha.svg)

> v1.3 and after

## User Metrics

Each generated EventSource, Sensor, and EventBus Pod exposes an HTTP endpoint
for its metrics, which include how many events were generated, how many
actions were triggered, and so on. To let your Prometheus server discover
these user metrics, add the following to your configuration.

```txt
    - job_name: 'argo-events'
      kubernetes_sd_configs:
      - role: pod
        selectors:
        - role: pod
          label: 'controller in (eventsource-controller,sensor-controller,eventbus-controller)'
      relabel_configs:
      # For EventBus pods, copy the pod's EventBus name label into 'eventbus_name'.
      - source_labels: [__meta_kubernetes_pod_label_eventbus_name, __meta_kubernetes_pod_label_controller]
        action: replace
        regex: (.+);eventbus-controller
        replacement: $1
        target_label: 'eventbus_name'
      # For EventBus pods, set 'namespace' to the pod's namespace.
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_controller]
        action: replace
        regex: (.+);eventbus-controller
        replacement: $1
        target_label: 'namespace'
      # Drop EventBus pod targets whose port is one digit followed by 222
      # (for example the NATS default ports 4222, 6222 and 8222).
      - source_labels: [__address__, __meta_kubernetes_pod_label_controller]
        action: drop
        regex: (.+):(\d222);eventbus-controller
```

Also make sure your Prometheus Service Account has permission to discover
Pods. Add or merge a `ClusterRole` like the sample below, and grant it to that
Service Account.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-discovery
rules:
  - apiGroups: [""]
    resources:
      - pods
    verbs: ["get", "list", "watch"]
```
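
Granting the role is usually done with a `ClusterRoleBinding`. The following is a minimal sketch, not a definitive manifest: it assumes the Prometheus Service Account is named `prometheus` and lives in a `monitoring` namespace. Both names are placeholders, so substitute the ones used by your installation.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-discovery        # the ClusterRole defined above
subjects:
  - kind: ServiceAccount
    name: prometheus         # assumption: your Prometheus Service Account name
    namespace: monitoring    # assumption: the namespace Prometheus runs in
```

If Prometheus only needs to discover Pods in a few namespaces, a namespaced `RoleBinding` that references the same `ClusterRole` may be a narrower alternative.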

### EventSource

#### argo_events_event_service_running_total

How many configured events in the EventSource object are actively running.

#### argo_events_events_sent_total

How many events have been sent successfully.

#### argo_events_events_sent_failed_total

How many events failed to be sent to the EventBus.

#### argo_events_events_processing_failed_total

How many events failed to process for any reason. This count includes
`argo_events_events_sent_failed_total`.

#### argo_events_event_processing_duration_milliseconds

Event processing duration, from receiving the event to sending it to the
EventBus, in milliseconds.

### Sensor

#### argo_events_action_triggered_total

How many actions have been triggered successfully.

#### argo_events_action_failed_total

How many actions failed.

#### argo_events_action_duration_milliseconds

Action triggering duration in milliseconds.

### EventBus

For the `native` NATS EventBus, see the
[prometheus-nats-exporter](https://github.com/nats-io/prometheus-nats-exporter)
documentation for an explanation of the metrics.

## Controller Metrics

If you are interested in Argo Events controller metrics, add the following to
your Prometheus configuration.

```txt
    - job_name: 'argo-events-controllers'
      kubernetes_sd_configs:
      - role: pod
        selectors:
        - role: pod
          label: 'app in (eventsource-controller,sensor-controller,eventbus-controller)'
      relabel_configs:
      # Point the scrape address at port 7777 on each controller pod.
      - source_labels: [__address__, __meta_kubernetes_pod_label_app]
        action: replace
        regex: (.+);(eventsource-controller|sensor-controller|eventbus-controller)
        replacement: $1:7777
        target_label: '__address__'
      # Set 'namespace' to the pod's namespace.
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
        action: replace
        regex: (.+);(eventsource-controller|sensor-controller|eventbus-controller)
        replacement: $1
        target_label: 'namespace'
```

## Golden Signals

The following metrics can be treated as the
[Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals)
for monitoring applications that run on Argo Events.

- Latency

  - `argo_events_event_processing_duration_milliseconds`
  - `argo_events_action_duration_milliseconds`

- Traffic

  - `argo_events_events_sent_total`
  - `argo_events_action_triggered_total`

- Errors

  - `argo_events_events_processing_failed_total`
  - `argo_events_events_sent_failed_total`
  - `argo_events_action_failed_total`

- Saturation

  - `argo_events_event_service_running_total`
  - Other Kubernetes metrics such as CPU or memory.
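
As an illustration of how the traffic and error signals above might be used, here is a sketch of a Prometheus rules file. It assumes the `*_total` metrics it references are monotonically increasing counters, so that `rate()` applies. The group name, rule names, time windows, thresholds, and severity labels are all hypothetical and should be tuned to your environment.

```yaml
groups:
  - name: argo-events-golden-signals   # hypothetical group name
    rules:
      # Traffic: events sent per second, averaged over 5 minutes.
      - record: argo_events:events_sent:rate5m
        expr: sum(rate(argo_events_events_sent_total[5m]))

      # Errors: any EventSource processing failures sustained for 10 minutes.
      - alert: ArgoEventsProcessingFailures
        expr: sum(rate(argo_events_events_processing_failed_total[5m])) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Argo Events EventSources are failing to process events

      # Errors: any Sensor action failures sustained for 10 minutes.
      - alert: ArgoEventsActionFailures
        expr: sum(rate(argo_events_action_failed_total[5m])) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Argo Events Sensors are failing to trigger actions
```

The rules aggregate with `sum()` across all series on purpose, because this document does not list the labels attached to each metric. Check the labels your Pods actually expose before grouping by EventSource or Sensor.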