github.com/anycable/anycable-go@v1.5.1/docs/instrumentation.md

# AnyCable-Go Instrumentation

AnyCable-Go provides useful statistical information about the service (such as the number of connected clients, received messages, etc.).

<p style="text-align:center;">
  <img width="70%" alt="AnyCable Grafana" src="/assets/images/grafana.png">
</p>

> Read the ["Real-time stress: AnyCable, k6, WebSockets, and Yabeda"](https://evilmartians.com/chronicles/real-time-stress-anycable-k6-websockets-and-yabeda) post to learn more about AnyCable observability and see example Grafana dashboards.

## Metrics and what can we learn from them

Instrumentation exists to help us prevent and identify performance issues. Below is a list of the most crucial metrics and how to interpret their values.

**NOTE:** Some values are updated in real time, while others (_interval metrics_) are updated periodically (every 5 seconds by default; configurable via the `stats_refresh_interval` configuration parameter). Interval metrics are marked with the ⏱ icon below.

**NOTE:** The `*_total` metrics are _counters_. When metrics are printed to logs, the delta between two subsequent collections is displayed; Prometheus works with absolute (i.e., cumulative) values and computes rates from them on its own.
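For illustration, here is a minimal Ruby sketch (plain Ruby, not part of AnyCable) of how the per-interval deltas shown in logs relate to cumulative counter readings. The readings are made up, and treating a drop in value as a counter reset (e.g., after a restart) is an assumption of this sketch, similar in spirit to how Prometheus handles resets.

```ruby
# Convert cumulative counter readings (as exposed to Prometheus) into
# per-interval deltas (as printed to logs). Illustrative sketch only.
def counter_deltas(readings)
  readings.each_cons(2).map do |prev, curr|
    # A drop means the counter was reset (e.g., the process restarted);
    # in that case the new reading itself is the delta since the reset.
    curr >= prev ? curr - prev : curr
  end
end

# Hypothetical rpc_call_total readings taken at consecutive collections.
readings = [100, 160, 230, 230, 15]
p counter_deltas(readings) # => [60, 70, 0, 15]
```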

### ⏱ `clients_num` / `clients_uniq_num`

The `clients_num` shows the current number of _active_ sessions (WebSocket connections). A session is considered active from the moment it has been **authenticated** until the connection is closed.

The `clients_uniq_num` shows the current number of unique _connection identifiers_ across the active sessions. A connection identifier is a combination of the `identified_by` values for the corresponding Connection object.

One useful metric derived from these two is the `clients_uniq_num` / `clients_num` ratio. If it's much less than 1 and decreasing, that could indicate improper connection management on the client side (e.g., creating a _client_ per component mount or Turbo navigation instead of reusing a singleton).
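A quick way to track this ratio is sketched below in plain Ruby. The helper names, the 0.5 threshold, and the "below threshold and falling" rule are hypothetical choices for illustration, not AnyCable settings.

```ruby
# Compute the clients_uniq_num / clients_num ratio from two gauge readings.
# Returns nil when there are no active sessions (the ratio is undefined).
def uniqueness_ratio(clients_uniq_num, clients_num)
  return nil if clients_num.zero?

  clients_uniq_num.fdiv(clients_num)
end

# Flag a ratio that is well below 1 and dropping between two observations.
# The 0.5 threshold is an arbitrary example value, not an AnyCable default.
def suspicious_connection_reuse?(previous_ratio, current_ratio, threshold: 0.5)
  return false if previous_ratio.nil? || current_ratio.nil?

  current_ratio < threshold && current_ratio < previous_ratio
end

before = uniqueness_ratio(900, 1_000)
after = uniqueness_ratio(400, 1_000)
puts suspicious_connection_reuse?(before, after) # prints "true"
```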

### `rpc_call_total`, `rpc_error_total`, `rpc_retries_total`, `rpc_pending_num`

These are the vital metrics of the RPC communication channel.

The `rpc_error_total` describes the number of failed RPC calls. This is the actual number of _commands_ that failed. The most common reason for failures is a lack of network connectivity with the RPC service. Another potential reason is RPC schema incompatibility (in that case, most RPC requests fail, i.e., the `rpc_error_total / rpc_call_total` ratio approaches 1).

The `rpc_retries_total` describes the number of retried RPC calls. Retries could happen if the RPC server is exhausted or unavailable (no network connectivity). The former indicates that **concurrency settings for RPC and anycable-go went out of sync** (see [here](./configuration.md)).
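Since these counters are cumulative, ratios are best computed over deltas for the same interval (for example, the values printed to logs). A hedged Ruby sketch follows; `rpc_health` is a hypothetical helper, and relating retries to calls is just one rough way to gauge pressure on the RPC server.

```ruby
# Per-interval RPC health ratios computed from counter deltas
# (e.g., the values printed to logs for one collection interval).
RpcHealth = Struct.new(:error_ratio, :retry_ratio)

def rpc_health(call_delta:, error_delta:, retries_delta:)
  # No calls in this interval: nothing to judge.
  return RpcHealth.new(0.0, 0.0) if call_delta.zero?

  RpcHealth.new(error_delta.fdiv(call_delta), retries_delta.fdiv(call_delta))
end

health = rpc_health(call_delta: 200, error_delta: 190, retries_delta: 10)
puts format("errors: %.2f, retries: %.2f", health.error_ratio, health.retry_ratio)
# prints "errors: 0.95, retries: 0.05"
```

An error ratio close to 1 points toward connectivity or schema problems, while a steadily non-zero retry ratio is worth checking against the concurrency settings.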

The `rpc_pending_num` is the **key latency metric** of AnyCable-Go. We limit the number of concurrent RPC requests (to prevent RPC server exhaustion and retries). If the number of pending requests grows (which means we cannot keep up with the rate of incoming messages), you should consider either tuning concurrency settings or scaling up your cluster.
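"Grows" can be made concrete in several ways. The sketch below assumes a simple rule (strictly increasing over the last few samples); the helper name and the window size are hypothetical.

```ruby
# Detect a sustained increase in rpc_pending_num over the most recent samples.
# `window` and the strict-growth rule are illustrative choices, not AnyCable settings.
def pending_growing?(samples, window: 4)
  recent = samples.last(window)
  return false if recent.size < window

  recent.each_cons(2).all? { |prev, curr| curr > prev }
end

puts pending_growing?([0, 2, 1, 5, 12, 30, 64]) # prints "true"
puts pending_growing?([0, 3, 0, 2, 0, 1])       # prints "false"
```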

### `failed_auths_total`

The `failed_auths_total` indicates the total number of unauthenticated connection attempts and has a special purpose: it helps you identify misconfigured client credentials and malicious behaviour. Ideally, this number should change slowly compared to `clients_num`.

### ⏱ `disconnect_queue_size`

The `disconnect_queue_size` shows the current number of pending Disconnect calls. AnyCable-Go performs Disconnect calls in the background with some throttling (by default, 100 calls per second).

During normal operation, the value should be close to zero most of the time. Larger values or growth could indicate inefficient client-side connection management (a high reconnection rate). Spikes could indicate mass disconnect events.
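Given the throttling rate, you can roughly estimate how long a spike will take to drain. This simplistic model (plain Ruby, hypothetical helper) assumes the default rate of 100 calls per second and a constant arrival rate of new disconnects.

```ruby
# Rough time (in seconds) to drain the disconnect queue, given the throttling
# rate (100 calls/s by default) and the rate at which new disconnects arrive.
# Returns Float::INFINITY if the queue grows at least as fast as it drains.
def disconnect_drain_seconds(queue_size, arrival_rate: 0.0, drain_rate: 100.0)
  net_rate = drain_rate - arrival_rate
  return Float::INFINITY if net_rate <= 0

  queue_size / net_rate
end

# A mass disconnect of 6,000 clients with no new disconnects arriving:
puts disconnect_drain_seconds(6_000)                   # prints "60.0"
# The same spike while another 40 disconnects per second keep arriving:
puts disconnect_drain_seconds(6_000, arrival_rate: 40) # prints "100.0"
```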

### ⏱ `goroutines_num`

The `goroutines_num` metric is meant for debugging goroutine leaks. The number should be O(N), where N is the `clients_num` value, for the OSS version, and O(1) for the PRO version (unless IO polling is disabled).
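One hedged way to watch for leaks in the OSS version is to track goroutines per active client. If the total is O(N), the ratio should stay roughly flat, so a ratio that keeps climbing while `clients_num` is stable deserves a closer look. The helper and sample numbers below are made up for illustration.

```ruby
# Goroutines per active client. With O(N) goroutines (OSS version), this should
# stay roughly flat; a steady climb while clients_num is stable may hint at a leak.
def goroutines_per_client(goroutines_num, clients_num)
  return nil if clients_num.zero?

  goroutines_num.fdiv(clients_num)
end

# Made-up samples taken at a constant number of clients.
samples = [
  { clients_num: 1_000, goroutines_num: 2_040 },
  { clients_num: 1_000, goroutines_num: 2_600 },
  { clients_num: 1_000, goroutines_num: 3_900 }
]
ratios = samples.map { |s| goroutines_per_client(s[:goroutines_num], s[:clients_num]) }
p ratios # => [2.04, 2.6, 3.9]
```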

### `mem_sys_bytes`

The total bytes of memory obtained from the OS (according to [`runtime.MemStats.Sys`](https://golang.org/pkg/runtime/#MemStats)).

## Prometheus

To enable an HTTP endpoint serving [Prometheus](https://prometheus.io)-compatible metrics (disabled by default), specify the `--metrics_http` option (e.g., `--metrics_http="/metrics"`).

You can also change the listening port and host via the `--metrics_port` and `--metrics_host` options, respectively (by default, they are the same as the main (WebSocket) server's port and host, i.e., metrics are served by the same server).

The exported metrics format is the following (NOTE: the list below is just an example and could be incomplete):

```sh
# HELP anycable_go_clients_num The number of active clients
# TYPE anycable_go_clients_num gauge
anycable_go_clients_num 0

# HELP anycable_go_clients_uniq_num The number of unique clients (with respect to connection identifiers)
# TYPE anycable_go_clients_uniq_num gauge
anycable_go_clients_uniq_num 0

# HELP anycable_go_client_msg_total The total number of received messages from clients
# TYPE anycable_go_client_msg_total counter
anycable_go_client_msg_total 5906

# HELP anycable_go_failed_client_msg_total The total number of unrecognized messages received from clients
# TYPE anycable_go_failed_client_msg_total counter
anycable_go_failed_client_msg_total 0

# HELP anycable_go_broadcast_msg_total The total number of messages received through PubSub (for broadcast)
# TYPE anycable_go_broadcast_msg_total counter
anycable_go_broadcast_msg_total 956

# HELP anycable_go_failed_broadcast_msg_total The total number of unrecognized messages received through PubSub
# TYPE anycable_go_failed_broadcast_msg_total counter
anycable_go_failed_broadcast_msg_total 0

# HELP anycable_go_broadcast_streams_num The number of active broadcasting streams
# TYPE anycable_go_broadcast_streams_num gauge
anycable_go_broadcast_streams_num 0

# HELP anycable_go_rpc_call_total The total number of RPC calls
# TYPE anycable_go_rpc_call_total counter
anycable_go_rpc_call_total 15808

# HELP anycable_go_rpc_error_total The total number of failed RPC calls
# TYPE anycable_go_rpc_error_total counter
anycable_go_rpc_error_total 0

# HELP anycable_go_rpc_retries_total The total number of RPC call retries
# TYPE anycable_go_rpc_retries_total counter
anycable_go_rpc_retries_total 0

# HELP anycable_go_rpc_pending_num The number of pending RPC calls
# TYPE anycable_go_rpc_pending_num gauge
anycable_go_rpc_pending_num 0

# HELP anycable_go_failed_auths_total The total number of failed authentication attempts
# TYPE anycable_go_failed_auths_total counter
anycable_go_failed_auths_total 0

# HELP anycable_go_goroutines_num The number of Go routines
# TYPE anycable_go_goroutines_num gauge
anycable_go_goroutines_num 5222

# HELP anycable_go_disconnect_queue_size The size of delayed disconnect
# TYPE anycable_go_disconnect_queue_size gauge
anycable_go_disconnect_queue_size 0

# HELP anycable_go_server_msg_total The total number of messages sent to clients
# TYPE anycable_go_server_msg_total counter
anycable_go_server_msg_total 453

# HELP anycable_go_failed_server_msg_total The total number of messages failed to send to clients
# TYPE anycable_go_failed_server_msg_total counter
anycable_go_failed_server_msg_total 0

# HELP anycable_go_data_sent_total The total amount of bytes sent to clients
# TYPE anycable_go_data_sent_total counter
anycable_go_data_sent_total 1232434334

# HELP anycable_go_data_rcvd_total The total amount of bytes received from clients
# TYPE anycable_go_data_rcvd_total counter
anycable_go_data_rcvd_total 434334
```
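For ad-hoc checks or scripts (Prometheus itself is the intended consumer), the output above can be parsed with a few lines of Ruby. This sketch handles only the simple `name{labels} value` shape shown here (labels appear when tags are configured) and assumes label values contain no `}`. The `METRICS_URL` environment variable is a hypothetical switch, and the `http://localhost:8080/metrics` address in the comment assumes `--metrics_http="/metrics"` on a main server listening on port 8080.

```ruby
require "net/http"
require "uri"

# Parse Prometheus text output into { "name{labels}" => Float }.
# Skips blank lines and # HELP / # TYPE comments. Simplified: assumes label
# values contain no "}" and ignores optional trailing timestamps.
def parse_prometheus(text)
  text.each_line.with_object({}) do |line, metrics|
    stripped = line.strip
    next if stripped.empty? || stripped.start_with?("#")

    match = stripped.match(/\A([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)/)
    value = match && Float(match[3], exception: false)
    next if value.nil?

    metrics["#{match[1]}#{match[2]}"] = value
  end
end

# Only hits the network when METRICS_URL is set, e.g.
#   METRICS_URL=http://localhost:8080/metrics ruby check_metrics.rb
if (url = ENV["METRICS_URL"])
  metrics = parse_prometheus(Net::HTTP.get(URI(url)))
  puts "clients_num=#{metrics['anycable_go_clients_num']}"
  puts "rpc_pending_num=#{metrics['anycable_go_rpc_pending_num']}"
end
```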

## StatsD

AnyCable also supports emitting real-time metrics to [StatsD](https://github.com/statsd/statsd).

To enable it, specify the StatsD server's UDP host:

```sh
anycable-go --statsd_host=localhost:8125
```

Metrics are pushed with the `anycable_go.` prefix by default. You can override it by specifying the `statsd_prefix` parameter. Find more info about StatsD metric types [here](https://github.com/statsd/statsd/blob/master/docs/metric_types.md).

Example payload:

```sh
anycable_go.mem_sys_bytes:15516936|g
anycable_go.clients_num:0|g
anycable_go.clients_uniq_num:0|g
anycable_go.broadcast_streams_num:0|g
anycable_go.disconnect_queue_size:0|g
anycable_go.rpc_pending_num:0|g
anycable_go.failed_server_msg_total:1|c
anycable_go.rpc_call_total:1|c
anycable_go.rpc_retries_total:1|c
anycable_go.rpc_error_total:1|c
```
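To see exactly what AnyCable sends, you can temporarily point `--statsd_host` at a tiny UDP listener instead of a real StatsD server. The following is a plain Ruby sketch. The `STATSD_LISTEN` switch is hypothetical, and the parser only understands the plain `name:value|type` shape shown above: anything after the type (such as a Datadog-style `|#tags` suffix) is ignored, and with the influxdb or graphite tag formats the tags stay part of the name.

```ruby
require "socket"

StatsdSample = Struct.new(:name, :value, :type)

# Parse StatsD lines like "anycable_go.clients_num:0|g".
# Anything after the type (e.g., a Datadog-style "|#tag:value" suffix) is ignored.
def parse_statsd(payload)
  payload.each_line.filter_map do |line|
    match = line.strip.match(/\A([^:|]+):([^|]+)\|([a-z]+)/)
    next unless match

    value = Float(match[2], exception: false)
    StatsdSample.new(match[1], value, match[3]) unless value.nil?
  end
end

# Debug listener, enabled only when STATSD_LISTEN is set. Run it instead of a
# real StatsD server and start anycable-go with --statsd_host=localhost:8125.
if ENV["STATSD_LISTEN"]
  socket = UDPSocket.new
  socket.bind("127.0.0.1", 8125)
  loop do
    payload, _sender = socket.recvfrom(65_535)
    parse_statsd(payload).each { |s| puts "#{s.type} #{s.name}=#{s.value}" }
  end
end
```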

## Default metrics tags

You can define global tags (added to every reported metric by default) for Prometheus (reported as labels)
and StatsD. For example, we can add environment and node information:

```sh
anycable-go --metrics_tags=environment:production,node_id:xyz
# or via environment variables
ANYCABLE_METRICS_TAGS=environment:production,node_id:xyz anycable-go
```

For StatsD, you can specify the tags format: "datadog" (default), "influxdb", or "graphite".
Use the `statsd_tag_format` configuration parameter for that.
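The three formats differ in where tags go on the wire. The sketch below renders the tags from the example above using the conventions these formats commonly use (a Datadog-style `|#key:value` suffix, InfluxDB-style `,key=value` and Graphite-style `;key=value` after the name). We assume AnyCable follows them but haven't verified its output byte for byte; `parse_metrics_tags` and `statsd_line` are hypothetical helpers.

```ruby
# Parse a --metrics_tags value such as "environment:production,node_id:xyz".
def parse_metrics_tags(spec)
  spec.split(",").to_h { |pair| pair.split(":", 2) }
end

# Render one StatsD line with the tag syntax conventionally used by each format.
def statsd_line(name, value, type, tags, tag_format)
  return "#{name}:#{value}|#{type}" if tags.empty?

  case tag_format
  when "datadog"
    "#{name}:#{value}|#{type}|#" + tags.map { |k, v| "#{k}:#{v}" }.join(",")
  when "influxdb"
    "#{name}," + tags.map { |k, v| "#{k}=#{v}" }.join(",") + ":#{value}|#{type}"
  when "graphite"
    "#{name};" + tags.map { |k, v| "#{k}=#{v}" }.join(";") + ":#{value}|#{type}"
  else
    raise ArgumentError, "unknown tag format: #{tag_format}"
  end
end

tags = parse_metrics_tags("environment:production,node_id:xyz")
%w[datadog influxdb graphite].each do |fmt|
  puts statsd_line("anycable_go.clients_num", 0, "g", tags, fmt)
end
# anycable_go.clients_num:0|g|#environment:production,node_id:xyz
# anycable_go.clients_num,environment=production,node_id=xyz:0|g
# anycable_go.clients_num;environment=production;node_id=xyz:0|g
```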

## Logging

Another option is to periodically write stats to the log (at the `info` level).

To enable metrics logging, pass the `--metrics_log` flag.

Your logs should contain something like this:

```sh
INFO 2018-03-06T14:16:27.872Z broadcast_msg_total=0 broadcast_streams_num=0 client_msg_total=0 clients_num=0 clients_uniq_num=0 context=metrics disconnect_queue_size=0 failed_auths_total=0 failed_broadcast_msg_total=0 failed_client_msg_total=0 goroutines_num=35 rpc_call_total=0 rpc_error_total=0
```
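If you ship these logs elsewhere, the numbers are easy to extract, assuming the `key=value` layout shown above. The following plain Ruby sketch uses a hypothetical helper. Note that `*_total` values in logs are per-interval deltas, not cumulative counters.

```ruby
# Extract numeric metric values from a metrics log line such as:
#   INFO 2018-03-06T14:16:27.872Z clients_num=0 context=metrics goroutines_num=35
# Tokens without "=" and non-numeric fields (e.g., context=metrics) are skipped.
def parse_metrics_log_line(line)
  line.split.each_with_object({}) do |token, metrics|
    key, value = token.split("=", 2)
    next if value.nil?

    number = Integer(value, 10, exception: false)
    metrics[key] = number unless number.nil?
  end
end

line = "INFO 2018-03-06T14:16:27.872Z clients_num=0 context=metrics goroutines_num=35 rpc_call_total=0"
metrics = parse_metrics_log_line(line)
puts metrics["goroutines_num"] # prints "35"
```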

By default, metrics are logged every 15 seconds (you can change this behavior through the `--metrics_rotate_interval` option).

By default, all available metrics are logged. You can specify a subset of metrics to print to logs via the `--metrics_log_filter` option. For example:

```sh
$ anycable-go --metrics_log_filter=clients_num,rpc_call_total,rpc_error_total

...
INFO 2023-02-21T15:49:25.744Z context=metrics Log metrics every 15s (only selected fields: clients_num, rpc_call_total, rpc_error_total)
...
```

### Custom loggers with mruby

<!-- TODO: add new API, remove "experimental" -->

> 👨‍🔬 This is an experimental API and could change in the future 👩‍🔬

AnyCable-Go allows you to write custom log formatters using an embedded [mruby](http://mruby.org) engine.

mruby is a lightweight implementation of the Ruby language, so you can use Ruby to write metrics exporters.

First, download a version of `anycable-go` built with mruby (it's not included by default): these binaries have the `-mrb` suffix right after the version (e.g., `anycable-go-1.0.0-mrb-linux-amd64`).

**NOTE**: only macOS and Linux are supported.

**NOTE**: when a server with mruby support is starting, you should see the following message:

```sh
$ anycable-go

INFO 2019-08-07T16:37:46.387Z context=main Starting AnyCable v0.6.2-13-gd421927 (with mruby 1.2.0 (2015-11-17)) (pid: 1362)
```

Second, write a Ruby script implementing a simple interface:

```ruby
# Module MUST be named MetricsFormatter
module MetricsFormatter
  # The only required method is .call.
  #
  # It accepts the metrics Hash and MUST return a string
  def self.call(data)
    data.to_json
  end
end
```
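Whether `#to_json` is available depends on which gems the embedded mruby build includes (we haven't verified this). A formatter can avoid it by building the string by hand, using only `Hash#each` and `Array#join`, like the Librato example in this section. The sketch below prints InfluxDB line protocol; the `anycable_go` measurement name is an arbitrary choice.

```ruby
# Prints metrics as one InfluxDB line-protocol record, e.g.:
#   anycable_go clients_num=0,goroutines_num=35
# The "anycable_go" measurement name is an arbitrary choice for this sketch.
module MetricsFormatter
  MEASUREMENT = "anycable_go"

  def self.call(data)
    fields = []

    data.each do |key, value|
      fields << "#{key}=#{value}"
    end

    "#{MEASUREMENT} #{fields.join(',')}"
  end
end
```

Under plain Ruby, `MetricsFormatter.call("clients_num" => 0, "goroutines_num" => 35)` returns `"anycable_go clients_num=0,goroutines_num=35"`. InfluxDB stores numbers without an `i` suffix as floats.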

Finally, specify `--metrics_log_formatter` when running a server:

```sh
anycable-go --metrics_log_formatter path/to/custom_printer.rb
```

#### Example

This is a [Librato](https://www.librato.com)-compatible printer:

```ruby
module MetricsFormatter
  def self.call(data)
    parts = []

    data.each do |key, value|
      parts << "sample##{key}=#{value}"
    end

    parts.join(" ")
  end
end
```

```sh
INFO 2018-04-27T14:11:59.701Z sample#clients_num=0 sample#clients_uniq_num=0 sample#goroutines_num=0
```