github.com/netdata/go.d.plugin@v0.58.1/modules/envoy/integrations/envoy.md

<!--startmeta
custom_edit_url: "https://github.com/netdata/go.d.plugin/edit/master/modules/envoy/README.md"
meta_yaml: "https://github.com/netdata/go.d.plugin/edit/master/modules/envoy/metadata.yaml"
sidebar_label: "Envoy"
learn_status: "Published"
learn_rel_path: "Data Collection/Web Servers and Web Proxies"
most_popular: True
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# Envoy

<img src="https://netdata.cloud/img/envoy.svg" width="150"/>

Plugin: go.d.plugin
Module: envoy

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

This collector monitors Envoy proxies. It collects server, cluster, and listener metrics.

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

### Default Behavior

#### Auto-Detection

By default, it detects Envoy instances running on localhost.

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

## Metrics

Metrics grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

### Per Envoy instance

Envoy exposes metrics in Prometheus format. All metric labels are added to charts.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit |
|:------|:----------|:----|
| envoy.server_state | live, draining, pre_initializing, initializing | state |
| envoy.server_connections_count | connections | connections |
| envoy.server_parent_connections_count | connections | connections |
| envoy.server_memory_allocated_size | allocated | bytes |
| envoy.server_memory_heap_size | heap | bytes |
| envoy.server_memory_physical_size | physical | bytes |
| envoy.server_uptime | uptime | seconds |
| envoy.cluster_manager_cluster_count | active, not_active | clusters |
| envoy.cluster_manager_cluster_changes_rate | added, modified, removed | clusters/s |
| envoy.cluster_manager_cluster_updates_rate | cluster | updates/s |
| envoy.cluster_manager_cluster_updated_via_merge_rate | via_merge | updates/s |
| envoy.cluster_manager_update_merge_cancelled_rate | merge_cancelled | updates/s |
| envoy.cluster_manager_update_out_of_merge_window_rate | out_of_merge_window | updates/s |
| envoy.cluster_membership_endpoints_count | healthy, degraded, excluded | endpoints |
| envoy.cluster_membership_changes_rate | membership | changes/s |
| envoy.cluster_membership_updates_rate | success, failure, empty, no_rebuild | updates/s |
| envoy.cluster_upstream_cx_active_count | active | connections |
| envoy.cluster_upstream_cx_rate | created | connections/s |
| envoy.cluster_upstream_cx_http_rate | http1, http2, http3 | connections/s |
| envoy.cluster_upstream_cx_destroy_rate | local, remote | connections/s |
| envoy.cluster_upstream_cx_connect_fail_rate | failed | connections/s |
| envoy.cluster_upstream_cx_connect_timeout_rate | timeout | connections/s |
| envoy.cluster_upstream_cx_bytes_rate | received, sent | bytes/s |
| envoy.cluster_upstream_cx_bytes_buffered_size | received, send | bytes |
| envoy.cluster_upstream_rq_active_count | active | requests |
| envoy.cluster_upstream_rq_rate | requests | requests/s |
| envoy.cluster_upstream_rq_failed_rate | cancelled, maintenance_mode, timeout, max_duration_reached, per_try_timeout, reset_local, reset_remote | requests/s |
| envoy.cluster_upstream_rq_pending_active_count | active_pending | requests |
| envoy.cluster_upstream_rq_pending_rate | pending | requests/s |
| envoy.cluster_upstream_rq_pending_failed_rate | overflow, failure_eject | requests/s |
| envoy.cluster_upstream_rq_retry_rate | request | retries/s |
| envoy.cluster_upstream_rq_retry_success_rate | success | retries/s |
| envoy.cluster_upstream_rq_retry_backoff_rate | exponential, ratelimited | retries/s |
| envoy.listener_manager_listeners_count | active, warming, draining | listeners |
| envoy.listener_manager_listener_changes_rate | added, modified, removed, stopped | listeners/s |
| envoy.listener_manager_listener_object_events_rate | create_success, create_failure, in_place_updated | objects/s |
| envoy.listener_admin_downstream_cx_active_count | active | connections |
| envoy.listener_admin_downstream_cx_rate | created | connections/s |
| envoy.listener_admin_downstream_cx_destroy_rate | destroyed | connections/s |
| envoy.listener_admin_downstream_cx_transport_socket_connect_timeout_rate | timeout | connections/s |
| envoy.listener_admin_downstream_cx_rejected_rate | overflow, overload, global_overflow | connections/s |
| envoy.listener_admin_downstream_listener_filter_remote_close_rate | closed | connections/s |
| envoy.listener_admin_downstream_listener_filter_error_rate | read | errors/s |
| envoy.listener_admin_downstream_pre_cx_active_count | active | sockets |
| envoy.listener_admin_downstream_pre_cx_timeout_rate | timeout | sockets/s |
| envoy.listener_downstream_cx_active_count | active | connections |
| envoy.listener_downstream_cx_rate | created | connections/s |
| envoy.listener_downstream_cx_destroy_rate | destroyed | connections/s |
| envoy.listener_downstream_cx_transport_socket_connect_timeout_rate | timeout | connections/s |
| envoy.listener_downstream_cx_rejected_rate | overflow, overload, global_overflow | connections/s |
| envoy.listener_downstream_listener_filter_remote_close_rate | closed | connections/s |
| envoy.listener_downstream_listener_filter_error_rate | read | errors/s |
| envoy.listener_downstream_pre_cx_active_count | active | sockets |
| envoy.listener_downstream_pre_cx_timeout_rate | timeout | sockets/s |

## Alerts

There are no alerts configured by default for this integration.

## Setup

### Prerequisites

No action required.

### Configuration

#### File

The configuration file name for this integration is `go.d/envoy.conf`.

You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/envoy.conf
```

#### Options

The following options can be defined globally: update_every, autodetection_retry.

<details><summary>Config options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| update_every | Data collection frequency. | 1 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| url | Server URL. | http://127.0.0.1:9091/stats/prometheus | yes |
| timeout | HTTP request timeout. | 1 | no |
| username | Username for basic HTTP authentication. |  | no |
| password | Password for basic HTTP authentication. |  | no |
| proxy_url | Proxy URL. |  | no |
| proxy_username | Username for proxy basic HTTP authentication. |  | no |
| proxy_password | Password for proxy basic HTTP authentication. |  | no |
| method | HTTP request method. | GET | no |
| body | HTTP request body. |  | no |
| headers | HTTP request headers. |  | no |
| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
| tls_ca | Certification authority that the client uses when verifying the server's certificates. |  | no |
| tls_cert | Client TLS certificate. |  | no |
| tls_key | Client TLS key. |  | no |

</details>
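
The split between global and per-job options can be sketched as follows. This is a minimal illustration, assuming the usual go.d layout in which global options sit at the top level of `go.d/envoy.conf` and job options sit under `jobs`. The values `5` and `60` are arbitrary placeholders, not recommended settings.

```yaml
# Global options: apply to every job defined below.
update_every: 5          # collect every 5 seconds (placeholder value)
autodetection_retry: 60  # recheck a failed job every 60 seconds (placeholder value)

jobs:
  - name: local
    url: http://127.0.0.1:9901/stats/prometheus
```

Because `update_every` and `autodetection_retry` also appear in the options table, a job should be able to set its own value when one instance needs a different schedule.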

#### Examples

##### Basic

A basic example configuration.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9901/stats/prometheus
```

##### HTTP authentication

Basic HTTP authentication.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9901/stats/prometheus
    username: username
    password: password
```

</details>
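
If the Envoy endpoint is reachable only through an HTTP proxy, the proxy options from the options table fit into the same job layout. This is a hedged sketch: the job name, the proxy address `http://192.0.2.10:3128`, and both sets of credentials are placeholders, not values from this integration.

```yaml
jobs:
  - name: remote_via_proxy                # hypothetical job name
    url: http://192.0.2.1:9901/stats/prometheus
    username: username                    # credentials for the Envoy endpoint (placeholder)
    password: password
    proxy_url: http://192.0.2.10:3128     # hypothetical proxy address
    proxy_username: proxy_username        # credentials for the proxy itself (placeholder)
    proxy_password: proxy_password
```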

##### HTTPS with self-signed certificate

Do not validate server certificate chain and hostname.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:9901/stats/prometheus
    tls_skip_verify: yes
```

</details>
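
Skipping verification removes protection against an impersonated server. If the certificate authority file is available, the `tls_ca` option is an alternative that keeps validation on. This is a minimal sketch, assuming hypothetical file paths under `/etc/ssl/envoy/`. `tls_cert` and `tls_key` are needed only if the endpoint requires a client certificate.

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:9901/stats/prometheus
    tls_ca: /etc/ssl/envoy/ca.pem            # hypothetical path: CA that signed the server certificate
    tls_cert: /etc/ssl/envoy/client.pem      # hypothetical path: client certificate (optional)
    tls_key: /etc/ssl/envoy/client-key.pem   # hypothetical path: client private key (optional)
```

With validation on, the server certificate must also match the host used in `url`.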

##### Multi-instance

> **Note**: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

<details><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9901/stats/prometheus

  - name: remote
    url: http://192.0.2.1:9901/stats/prometheus
```

</details>
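
Each job carries its own options, so a remote instance can be tuned separately from the local one. The sketch below assumes a slower link to the remote host. The numeric values are placeholders, and `X-Example-Header` is a hypothetical header included only to show the key-value form of `headers`.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9901/stats/prometheus

  - name: remote
    url: http://192.0.2.1:9901/stats/prometheus
    update_every: 5                      # poll the remote instance less often (placeholder value)
    timeout: 3                           # allow more time for the remote response (placeholder value)
    headers:
      X-Example-Header: example-value    # hypothetical header, for illustration only
```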

## Troubleshooting

### Debug Mode

To troubleshoot issues with the `envoy` collector, run the `go.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `go.d.plugin` to debug the collector:

  ```bash
  ./go.d.plugin -d -m envoy
  ```