<!--startmeta
custom_edit_url: "https://github.com/netdata/go.d.plugin/edit/master/modules/consul/README.md"
meta_yaml: "https://github.com/netdata/go.d.plugin/edit/master/modules/consul/metadata.yaml"
sidebar_label: "Consul"
learn_status: "Published"
learn_rel_path: "Data Collection/Service Discovery / Registry"
most_popular: True
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# Consul


<img src="https://netdata.cloud/img/consul.svg" width="150"/>


Plugin: go.d.plugin
Module: consul

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

This collector monitors [key metrics](https://developer.hashicorp.com/consul/docs/agent/telemetry#key-metrics) of Consul agents: transaction timings, leadership changes, memory usage, and more.

It periodically sends HTTP requests to the [Consul REST API](https://developer.hashicorp.com/consul/api-docs).

Used endpoints:

- [/operator/autopilot/health](https://developer.hashicorp.com/consul/api-docs/operator/autopilot#read-health)
- [/agent/checks](https://developer.hashicorp.com/consul/api-docs/agent/check#list-checks)
- [/agent/self](https://developer.hashicorp.com/consul/api-docs/agent#read-configuration)
- [/agent/metrics](https://developer.hashicorp.com/consul/api-docs/agent#view-metrics)
- [/coordinate/nodes](https://developer.hashicorp.com/consul/api-docs/coordinate#read-lan-coordinates-for-all-nodes)

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

### Default Behavior

#### Auto-Detection

This collector discovers instances running on the local host that provide metrics on port 8500.
On startup, it tries to collect metrics from:

- http://localhost:8500
- http://127.0.0.1:8500

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

## Metrics

Metrics are grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

The set of metrics depends on the [Consul Agent mode](https://developer.hashicorp.com/consul/docs/install/glossary#agent).

### Per Consul instance

These metrics refer to the entire monitored application.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit | Leader | Follower | Client |
|:------|:----------|:----|:---:|:---:|:---:|
| consul.client_rpc_requests_rate | rpc | requests/s | • | • | • |
| consul.client_rpc_requests_exceeded_rate | exceeded | requests/s | • | • | • |
| consul.client_rpc_requests_failed_rate | failed | requests/s | • | • | • |
| consul.memory_allocated | allocated | bytes | • | • | • |
| consul.memory_sys | sys | bytes | • | • | • |
| consul.gc_pause_time | gc_pause | seconds | • | • | • |
| consul.kvs_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
| consul.kvs_apply_operations_rate | kvs_apply | ops/s | • | • | |
| consul.txn_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
| consul.txn_apply_operations_rate | txn_apply | ops/s | • | • | |
| consul.autopilot_health_status | healthy, unhealthy | status | • | • | |
| consul.autopilot_failure_tolerance | failure_tolerance | servers | • | • | |
| consul.autopilot_server_health_status | healthy, unhealthy | status | • | • | |
| consul.autopilot_server_stable_time | stable | seconds | • | • | |
| consul.autopilot_server_serf_status | active, failed, left, none | status | • | • | |
| consul.autopilot_server_voter_status | voter, not_voter | status | • | • | |
| consul.network_lan_rtt | min, max, avg | ms | • | • | |
| consul.raft_commit_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | | |
| consul.raft_commits_rate | commits | commits/s | • | | |
| consul.raft_leader_last_contact_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | | |
| consul.raft_leader_oldest_log_age | oldest_log_age | seconds | • | | |
| consul.raft_follower_last_contact_leader_time | leader_last_contact | ms | | • | |
| consul.raft_rpc_install_snapshot_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | | • | |
| consul.raft_leader_elections_rate | leader | elections/s | • | • | |
| consul.raft_leadership_transitions_rate | leadership | transitions/s | • | • | |
| consul.server_leadership_status | leader, not_leader | status | • | • | |
| consul.raft_thread_main_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage | • | • | |
| consul.raft_thread_fsm_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage | • | • | |
| consul.raft_fsm_last_restore_duration | last_restore_duration | ms | • | • | |
| consul.raft_boltdb_freelist_bytes | freelist | bytes | • | • | |
| consul.raft_boltdb_logs_per_batch_rate | written | logs/s | • | • | |
| consul.raft_boltdb_store_logs_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
| consul.license_expiration_time | license_expiration | seconds | • | • | • |

### Per node check

Metrics about checks at the node level.
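Both the node check and the service check scopes chart a check's state as a `status` metric with one dimension per state: `passing`, `maintenance`, `warning`, and `critical`. The Go sketch below shows one plausible way to turn a single status string into those dimension values. It is an assumption for illustration, not the collector's code. `statusDimensions` is a hypothetical name, and the sketch does not guarantee that the Consul API reports every one of these names verbatim.

```go
package main

// checkStatuses are the dimensions of the node and service health check
// status charts, in the order this document lists them.
var checkStatuses = []string{"passing", "maintenance", "warning", "critical"}

// statusDimensions is a hypothetical helper that maps one status string to a
// 0/1 value per chart dimension. A known status sets exactly one dimension to
// 1; an unknown status leaves all dimensions at 0.
func statusDimensions(status string) map[string]int64 {
	dims := make(map[string]int64, len(checkStatuses))
	for _, s := range checkStatuses {
		if s == status {
			dims[s] = 1
		} else {
			dims[s] = 0
		}
	}
	return dims
}
```

With this scheme, `statusDimensions("warning")` sets `warning` to 1 and the other three dimensions to 0, so each check shows exactly one active state at a time.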
Labels:

| Label | Description |
|:-----------|:----------------|
| datacenter | Datacenter identifier |
| node_name | The node's name |
| check_name | The check's name |

Metrics:

| Metric | Dimensions | Unit | Leader | Follower | Client |
|:------|:----------|:----|:---:|:---:|:---:|
| consul.node_health_check_status | passing, maintenance, warning, critical | status | • | • | • |

### Per service check

Metrics about checks at the service level.

Labels:

| Label | Description |
|:-----------|:----------------|
| datacenter | Datacenter identifier |
| node_name | The node's name |
| check_name | The check's name |
| service_name | The service's name |

Metrics:

| Metric | Dimensions | Unit | Leader | Follower | Client |
|:------|:----------|:----|:---:|:---:|:---:|
| consul.service_health_check_status | passing, maintenance, warning, critical | status | • | • | • |

## Alerts

The following alerts are available:

| Alert name | On metric | Description |
|:------------|:----------|:------------|
| [consul_node_health_check_status](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.node_health_check_status | node health check ${label:check_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
| [consul_service_health_check_status](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.service_health_check_status | service health check ${label:check_name} for service ${label:service_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
| [consul_client_rpc_requests_exceeded](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.client_rpc_requests_exceeded_rate | number of rate-limited RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
| [consul_client_rpc_requests_failed](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.client_rpc_requests_failed_rate | number of failed RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
| [consul_gc_pause_time](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.gc_pause_time | time spent in stop-the-world garbage collection pauses on server ${label:node_name} datacenter ${label:datacenter} |
| [consul_autopilot_health_status](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.autopilot_health_status | datacenter ${label:datacenter} cluster is unhealthy as reported by server ${label:node_name} |
| [consul_autopilot_server_health_status](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.autopilot_server_health_status | server ${label:node_name} from datacenter ${label:datacenter} is unhealthy |
| [consul_raft_leader_last_contact_time](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.raft_leader_last_contact_time | median time elapsed since leader server ${label:node_name} datacenter ${label:datacenter} was last able to contact the follower nodes |
| [consul_raft_leadership_transitions](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.raft_leadership_transitions_rate | there has been a leadership change and server ${label:node_name} datacenter ${label:datacenter} has become the leader |
| [consul_raft_thread_main_saturation](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.raft_thread_main_saturation_perc | average saturation of the main Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
| [consul_raft_thread_fsm_saturation](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.raft_thread_fsm_saturation_perc | average saturation of the FSM Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
| [consul_license_expiration_time](https://github.com/netdata/netdata/blob/master/health/health.d/consul.conf) | consul.license_expiration_time | Consul Enterprise license expiration time on node ${label:node_name} datacenter ${label:datacenter} |

## Setup

### Prerequisites

#### Enable Prometheus telemetry

[Enable](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-prometheus_retention_time) Prometheus telemetry on your Consul agent by setting `prometheus_retention_time` to a value greater than `0`.

#### Add required ACLs to the token

Required **only if authentication is enabled**.

| ACL | Endpoint |
|:---------------:|:----------|
| `operator:read` | [autopilot health status](https://developer.hashicorp.com/consul/api-docs/operator/autopilot#read-health) |
| `node:read` | [checks](https://developer.hashicorp.com/consul/api-docs/agent/check#list-checks) |
| `agent:read` | [configuration](https://developer.hashicorp.com/consul/api-docs/agent#read-configuration), [metrics](https://developer.hashicorp.com/consul/api-docs/agent#view-metrics), and [LAN coordinates](https://developer.hashicorp.com/consul/api-docs/coordinate#read-lan-coordinates-for-all-nodes) |

### Configuration

#### File

The configuration file name for this integration is `go.d/consul.conf`.
You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/consul.conf
```

#### Options

The following options can be defined globally: `update_every`, `autodetection_retry`.

<details><summary>All options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| update_every | Data collection frequency, in seconds. | 1 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| url | Server URL. | http://localhost:8500 | yes |
| acl_token | ACL token used in every request. | | no |
| max_checks | Maximum number of checks to process and chart. | | no |
| max_filter | Filter for the checks to process and chart. Uses [simple patterns](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md). | | no |
| username | Username for basic HTTP authentication. | | no |
| password | Password for basic HTTP authentication. | | no |
| proxy_url | Proxy URL. | | no |
| proxy_username | Username for proxy basic HTTP authentication. | | no |
| proxy_password | Password for proxy basic HTTP authentication. | | no |
| timeout | HTTP request timeout, in seconds. | 1 | no |
| method | HTTP request method. | GET | no |
| body | HTTP request body. | | no |
| headers | HTTP request headers. | | no |
| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
| tls_ca | Certificate authority that the client uses when verifying the server's certificates. | | no |
| tls_cert | Client TLS certificate. | | no |
| tls_key | Client TLS key. | | no |

</details>

#### Examples

##### Basic

An example configuration.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
```

##### Basic HTTP auth

Local server with basic HTTP authentication.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
    username: foo
    password: bar
```

</details>

##### Multi-instance

> **Note**: When you define multiple jobs, their names must be unique.

Collect metrics from local and remote instances.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"

  - name: remote
    url: http://203.0.113.10:8500
    acl_token: "ada7f751-f654-8872-7f93-498e799158b6"
```

</details>

## Troubleshooting

### Debug Mode

To troubleshoot issues with the `consul` collector, run the `go.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `go.d.plugin` to debug the collector:

  ```bash
  ./go.d.plugin -d -m consul
  ```