# Prometheus Metrics

> v1.3 and after

## User Metrics

Each generated EventSource, Sensor, and EventBus pod exposes an HTTP endpoint
for its metrics. These metrics include things like how many events were
generated and how many actions were triggered. To let your Prometheus server
discover these user metrics, add the following to your configuration.

```txt
- job_name: 'argo-events'
  kubernetes_sd_configs:
    - role: pod
      selectors:
        # Only discover pods created by the Argo Events controllers.
        - role: pod
          label: 'controller in (eventsource-controller,sensor-controller,eventbus-controller)'
  relabel_configs:
    # EventBus pods: copy the EventBus name pod label into an "eventbus_name" label.
    - source_labels: [__meta_kubernetes_pod_label_eventbus_name, __meta_kubernetes_pod_label_controller]
      action: replace
      regex: (.+);eventbus-controller
      replacement: $1
      target_label: 'eventbus_name'
    # EventBus pods: set the "namespace" label from the pod's namespace.
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_controller]
      action: replace
      regex: (.+);eventbus-controller
      replacement: $1
      target_label: 'namespace'
    # EventBus pods: drop targets whose port matches \d222 (for example the
    # NATS ports 4222, 6222 and 8222).
    - source_labels: [__address__, __meta_kubernetes_pod_label_controller]
      action: drop
      regex: (.+):(\d222);eventbus-controller
```

Also make sure your Prometheus service account has permission to perform pod
discovery. Add a `ClusterRole` like the sample below, or merge its rules into an
existing role. Then bind it to your service account with a `ClusterRoleBinding`.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-discovery
rules:
  - apiGroups: [""]
    resources:
      - pods
    verbs: ["get", "list", "watch"]
```

### EventSource

#### argo_events_event_service_running_total

How many configured events in the EventSource object are actively running.

#### argo_events_events_sent_total

How many events have been sent successfully.

#### argo_events_events_sent_failed_total

How many events failed to be sent to the EventBus.

#### argo_events_events_processing_failed_total

How many events failed to be processed for any reason. This count includes
`argo_events_events_sent_failed_total`.

#### argo_events_event_processing_duration_milliseconds

Event processing duration in milliseconds, measured from receiving the event to
sending it to the EventBus.

### Sensor

#### argo_events_action_triggered_total

How many actions have been triggered successfully.

#### argo_events_action_failed_total

How many actions failed.

#### argo_events_action_duration_milliseconds

Action triggering duration in milliseconds.

### EventBus

For the `native` NATS EventBus, see the
[NATS Prometheus exporter](https://github.com/nats-io/prometheus-nats-exporter)
for an explanation of the metrics.

## Controller Metrics

If you are interested in Argo Events controller metrics, add the following to
your Prometheus configuration.

```txt
- job_name: 'argo-events-controllers'
  kubernetes_sd_configs:
    - role: pod
      selectors:
        - role: pod
          label: 'app in (eventsource-controller,sensor-controller,eventbus-controller)'
  relabel_configs:
    # Scrape each controller pod on port 7777.
    - source_labels: [__address__, __meta_kubernetes_pod_label_app]
      action: replace
      regex: (.+);(eventsource-controller|sensor-controller|eventbus-controller)
      replacement: $1:7777
      target_label: '__address__'
    # Set the "namespace" label from the controller pod's namespace.
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
      action: replace
      regex: (.+);(eventsource-controller|sensor-controller|eventbus-controller)
      replacement: $1
      target_label: 'namespace'
```

## Golden Signals

The following metrics are considered the
[Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals)
for monitoring your applications running with Argo Events.

- Latency

  - `argo_events_event_processing_duration_milliseconds`
  - `argo_events_action_duration_milliseconds`

- Traffic

  - `argo_events_events_sent_total`
  - `argo_events_action_triggered_total`

- Errors

  - `argo_events_events_processing_failed_total`
  - `argo_events_events_sent_failed_total`
  - `argo_events_action_failed_total`

- Saturation

  - `argo_events_event_service_running_total`
  - Other Kubernetes metrics, such as CPU or memory usage.
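
The Prometheus rules file below is a minimal sketch of how these signals might be turned into queries. Everything in it is a hypothetical choice, not something Argo Events provides:

- the group name and the `argo_events:*` rule names;
- the 5-minute and 10-minute windows;
- the example alert and its `severity` label.

The latency rules also rest on an assumption: that the two duration metrics expose the standard `_sum` and `_count` series, which Prometheus summaries and histograms both provide.

```yaml
# Hypothetical Prometheus rules file: group name, rule names, windows and the
# alert are illustrative assumptions, not part of Argo Events.
groups:
  - name: argo-events-golden-signals
    rules:
      # Latency: average event processing time in ms over the last 5 minutes.
      - record: argo_events:event_processing_duration_ms:avg5m
        expr: |
          sum(rate(argo_events_event_processing_duration_milliseconds_sum[5m]))
            /
          sum(rate(argo_events_event_processing_duration_milliseconds_count[5m]))
      # Latency: average action triggering time in ms over the last 5 minutes.
      - record: argo_events:action_duration_ms:avg5m
        expr: |
          sum(rate(argo_events_action_duration_milliseconds_sum[5m]))
            /
          sum(rate(argo_events_action_duration_milliseconds_count[5m]))
      # Traffic: events successfully sent to the EventBus, per second.
      - record: argo_events:events_sent:rate5m
        expr: sum(rate(argo_events_events_sent_total[5m]))
      # Traffic: actions successfully triggered, per second.
      - record: argo_events:actions_triggered:rate5m
        expr: sum(rate(argo_events_action_triggered_total[5m]))
      # Errors: events that failed processing, per second
      # (this already includes failed sends to the EventBus).
      - record: argo_events:events_processing_failed:rate5m
        expr: sum(rate(argo_events_events_processing_failed_total[5m]))
      # Errors: failed actions, per second.
      - record: argo_events:actions_failed:rate5m
        expr: sum(rate(argo_events_action_failed_total[5m]))
      # Saturation: configured events currently running across all EventSources.
      - record: argo_events:event_services_running:sum
        expr: sum(argo_events_event_service_running_total)
      # Example alert: fire when any action has failed in the last 10 minutes
      # and the condition has held for 5 minutes.
      - alert: ArgoEventsActionsFailing
        expr: sum(increase(argo_events_action_failed_total[10m])) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Argo Events actions are failing
```

A file like this can be validated with `promtool check rules` and loaded through the `rule_files` setting of the Prometheus configuration. The rules aggregate across all targets with a bare `sum(...)` so that they make no assumptions about which labels the metrics carry. Add a `by (...)` clause if you want per-EventSource or per-Sensor breakdowns.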