# AnyCable-Go Instrumentation

AnyCable-Go provides useful statistical information about the service (such as the number of connected clients, received messages, etc.).

<p style="text-align:center;">
  <img width="70%" alt="AnyCable Grafana" src="/assets/images/grafana.png">
</p>

> Read the ["Real-time stress: AnyCable, k6, WebSockets, and Yabeda"](https://evilmartians.com/chronicles/real-time-stress-anycable-k6-websockets-and-yabeda) post to learn more about AnyCable observability and see example Grafana dashboards.

## Metrics and what we can learn from them

Instrumentation exists to help us prevent and identify performance issues. Below is a list of the most crucial metrics and how to interpret their values.

**NOTE:** Some values are updated in real time, while others (_interval metrics_) are updated periodically (every 5 seconds by default; configurable via the `stats_refresh_interval` configuration parameter). Interval metrics are marked with the ⏱ icon below.

**NOTE:** The `*_total` metrics are _counters_. When metrics are printed to logs, the delta between two subsequent collections is displayed; Prometheus works with absolute (i.e., cumulative) values and interpolates them on its own.

### ⏱ `clients_num` / `clients_uniq_num`

The `clients_num` shows the current number of _active_ sessions (WebSocket connections). A session is considered active from the moment it has been **authenticated** until the connection is closed.

The `clients_uniq_num` shows the current number of unique _connection identifiers_ across the active sessions. A connection identifier is a combination of `identified_by` values for the corresponding Connection object.

One useful derivative of these two metrics is the `clients_uniq_num` / `clients_num` ratio.
If it's much less than 1 and keeps decreasing, that could indicate improper connection management on the client side (e.g., creating a new _client_ per component mount or Turbo navigation instead of reusing a singleton).

### `rpc_call_total`, `rpc_error_total`, `rpc_retries_total`, `rpc_pending_num`

These are the vital metrics of the RPC communication channel.

The `rpc_error_total` describes the number of failed RPC calls, i.e., the actual number of _commands_ that failed. The most common reason for this is a lack of network connectivity with the RPC service. Another potential reason is RPC schema incompatibility (in that case, most RPC requests would fail, i.e., `rpc_error_total / rpc_call_total` tends to 1).

The `rpc_retries_total` describes the number of retried RPC calls. Retries could happen if the RPC server is exhausted or unavailable (no network connectivity). The former indicates that **concurrency settings for RPC and anycable-go went out of sync** (see the [configuration docs](./configuration.md)).

The `rpc_pending_num` is the **key latency metric** of AnyCable-Go. We limit the number of concurrent RPC requests (to prevent RPC server exhaustion and retries). If the number of pending requests grows (which means we cannot keep up with the rate of incoming messages), you should consider either tuning concurrency settings or scaling up your cluster.

### `failed_auths_total`

The `failed_auths_total` indicates the total number of unauthenticated connection attempts and has a special purpose: it helps you identify misconfigured client credentials and malicious behaviour. Ideally, the change rate of this number should be low compared to the `clients_num`.

### ⏱ `disconnect_queue_size`

The `disconnect_queue_size` shows the current number of pending Disconnect calls. AnyCable-Go performs Disconnect calls in the background with some throttling (by default, 100 calls per second).
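As a back-of-the-envelope check, dividing the queue size by the throttling rate tells you roughly how long the current backlog takes to process. The sketch below assumes the default rate of 100 calls per second and that no new disconnects arrive while the queue drains; `disconnect_drain_seconds` is a hypothetical helper, not part of AnyCable-Go.

```ruby
# Hypothetical helper (not part of AnyCable-Go): estimates how many seconds
# the disconnect queue needs to drain, assuming Disconnect calls are throttled
# at a fixed rate and no new disconnects are enqueued in the meantime.
DEFAULT_DISCONNECT_RATE = 100 # calls per second (the documented default)

def disconnect_drain_seconds(queue_size, rate: DEFAULT_DISCONNECT_RATE)
  raise ArgumentError, "rate must be positive" unless rate.positive?

  queue_size.fdiv(rate)
end

puts disconnect_drain_seconds(2_500) # prints 25.0
puts disconnect_drain_seconds(150)   # prints 1.5
```

For example, a mass disconnect of 2,500 clients keeps the queue non-empty for about 25 seconds at the default rate, while a value that grows steadily means disconnects arrive faster than the throttle can process them.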

During normal operation, the value should be close to zero most of the time. Larger values or steady growth could indicate inefficient client-side connection management (a high reconnection rate). Spikes could indicate mass disconnect events.

### ⏱ `goroutines_num`

The `goroutines_num` metric is meant for debugging goroutine leaks. For the OSS version, the number should be O(N), where N is the `clients_num` value; for the PRO version, it should be O(1) (unless IO polling is disabled).

### `mem_sys_bytes`

The total number of bytes of memory obtained from the OS (according to [`runtime.MemStats.Sys`](https://golang.org/pkg/runtime/#MemStats)).

## Prometheus

To enable an HTTP endpoint serving [Prometheus](https://prometheus.io)-compatible metrics (disabled by default), you must specify the `--metrics_http` option (e.g., `--metrics_http="/metrics"`).

You can also change the listening port and host through the `--metrics_port` and `--metrics_host` options, respectively (by default, they are the same as the main (WebSocket) server port and host, i.e., the same server is used).
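In production, Prometheus itself scrapes this endpoint. For quick checks from a script, the plain-text output can be parsed with a few lines of Ruby. This is a minimal sketch: `parse_prometheus_text` is a hypothetical helper, it only understands simple `name value` samples (optionally with a `{...}` label set, e.g., when metrics tags are configured), and the address in the usage comment assumes a server on `localhost:8080` started with `--metrics_http="/metrics"`.

```ruby
# Hypothetical helper (not part of AnyCable-Go): turns Prometheus text
# exposition output into a Hash of { "series" => Float }. It only handles
# simple samples such as `anycable_go_clients_num 0` or
# `anycable_go_clients_num{environment="production"} 0`; comments (# HELP,
# # TYPE) and blank lines are skipped, and label values containing "}" are
# not supported.
SAMPLE_RE = /\A([a-zA-Z_:][a-zA-Z0-9_:]*(?:\{[^}]*\})?)\s+(\S+)/

def parse_prometheus_text(body)
  body.each_line.with_object({}) do |line, acc|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    match = SAMPLE_RE.match(line)
    next unless match

    # Skip values that are not plain numbers instead of raising.
    value = Float(match[2], exception: false)
    acc[match[1]] = value if value
  end
end

# Usage (assumed address and path):
#   require "net/http"
#   body = Net::HTTP.get(URI("http://localhost:8080/metrics"))
#   parse_prometheus_text(body)["anycable_go_clients_num"]
```

A real Prometheus client library handles the full format (escaping, timestamps, histograms); this sketch is only meant for ad hoc inspection.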

The exported metrics format is the following (NOTE: the list below is just an example and could be incomplete):

```sh
# HELP anycable_go_clients_num The number of active clients
# TYPE anycable_go_clients_num gauge
anycable_go_clients_num 0

# HELP anycable_go_clients_uniq_num The number of unique clients (with respect to connection identifiers)
# TYPE anycable_go_clients_uniq_num gauge
anycable_go_clients_uniq_num 0

# HELP anycable_go_client_msg_total The total number of received messages from clients
# TYPE anycable_go_client_msg_total counter
anycable_go_client_msg_total 5906

# HELP anycable_go_failed_client_msg_total The total number of unrecognized messages received from clients
# TYPE anycable_go_failed_client_msg_total counter
anycable_go_failed_client_msg_total 0

# HELP anycable_go_broadcast_msg_total The total number of messages received through PubSub (for broadcast)
# TYPE anycable_go_broadcast_msg_total counter
anycable_go_broadcast_msg_total 956

# HELP anycable_go_failed_broadcast_msg_total The total number of unrecognized messages received through PubSub
# TYPE anycable_go_failed_broadcast_msg_total counter
anycable_go_failed_broadcast_msg_total 0

# HELP anycable_go_broadcast_streams_num The number of active broadcasting streams
# TYPE anycable_go_broadcast_streams_num gauge
anycable_go_broadcast_streams_num 0

# HELP anycable_go_rpc_call_total The total number of RPC calls
# TYPE anycable_go_rpc_call_total counter
anycable_go_rpc_call_total 15808

# HELP anycable_go_rpc_error_total The total number of failed RPC calls
# TYPE anycable_go_rpc_error_total counter
anycable_go_rpc_error_total 0

# HELP anycable_go_rpc_retries_total The total number of RPC call retries
# TYPE anycable_go_rpc_retries_total counter
anycable_go_rpc_retries_total 0

# HELP anycable_go_rpc_pending_num The number of pending RPC calls
# TYPE anycable_go_rpc_pending_num gauge
anycable_go_rpc_pending_num 0

# HELP anycable_go_failed_auths_total The total number of failed authentication attempts
# TYPE anycable_go_failed_auths_total counter
anycable_go_failed_auths_total 0

# HELP anycable_go_goroutines_num The number of Go routines
# TYPE anycable_go_goroutines_num gauge
anycable_go_goroutines_num 5222

# HELP anycable_go_disconnect_queue_size The size of delayed disconnect
# TYPE anycable_go_disconnect_queue_size gauge
anycable_go_disconnect_queue_size 0

# HELP anycable_go_server_msg_total The total number of messages sent to clients
# TYPE anycable_go_server_msg_total counter
anycable_go_server_msg_total 453

# HELP anycable_go_failed_server_msg_total The total number of messages failed to send to clients
# TYPE anycable_go_failed_server_msg_total counter
anycable_go_failed_server_msg_total 0

# HELP anycable_go_data_sent_total The total amount of bytes sent to clients
# TYPE anycable_go_data_sent_total counter
anycable_go_data_sent_total 1232434334

# HELP anycable_go_data_rcvd_total The total amount of bytes received from clients
# TYPE anycable_go_data_rcvd_total counter
anycable_go_data_rcvd_total 434334
```

## StatsD

AnyCable also supports emitting real-time metrics to [StatsD](https://github.com/statsd/statsd).

For that, you must specify the StatsD server UDP address (host and port):

```sh
anycable-go -statsd_host=localhost:8125
```

Metrics are pushed with the `anycable_go.` prefix by default. You can override it by specifying the `statsd_prefix` parameter. See the StatsD [metric types documentation](https://github.com/statsd/statsd/blob/master/docs/metric_types.md) for details on the payload format.
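To see exactly what AnyCable-Go emits without running a full StatsD server, you can point `statsd_host` at a small UDP listener. The sketch below uses Ruby's standard `socket` library; `parse_statsd_line` and `read_statsd_datagram` are hypothetical helpers, port 8125 simply matches the example above, and tag sections appended after the type (as in the Datadog format) are ignored rather than parsed.

```ruby
require "socket"

# Hypothetical helper: parses a StatsD line such as
# "anycable_go.clients_num:0|g" into ["anycable_go.clients_num", 0.0, "g"].
# Extra sections after the type (e.g. Datadog-style "|#env:prod" tags) are
# ignored; tags that other formats embed in the name part are left as-is.
def parse_statsd_line(line)
  metric, type, = line.strip.split("|")
  name, value = metric.split(":", 2)
  [name, Float(value), type]
end

# Hypothetical helper: blocks until one UDP datagram arrives and returns the
# parsed metrics (a datagram may contain several newline-separated lines).
def read_statsd_datagram(socket)
  payload, = socket.recvfrom(65_535)
  payload.each_line.map(&:strip).reject(&:empty?).map { |line| parse_statsd_line(line) }
end

# Usage: run this, then start `anycable-go -statsd_host=localhost:8125`.
#   socket = UDPSocket.new
#   socket.bind("127.0.0.1", 8125)
#   loop do
#     read_statsd_datagram(socket).each { |name, value, type| puts "#{name} (#{type}) = #{value}" }
#   end
```

The type suffix tells you how to read the value: `g` lines are gauges (the current value), while `c` lines are counter increments that a StatsD server would aggregate.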

Example payload:

```sh
anycable_go.mem_sys_bytes:15516936|g
anycable_go.clients_num:0|g
anycable_go.clients_uniq_num:0|g
anycable_go.broadcast_streams_num:0|g
anycable_go.disconnect_queue_size:0|g
anycable_go.rpc_pending_num:0|g
anycable_go.failed_server_msg_total:1|c
anycable_go.rpc_call_total:1|c
anycable_go.rpc_retries_total:1|c
anycable_go.rpc_error_total:1|c
```

## Default metrics tags

You can define global tags (added to every reported metric by default) for Prometheus (reported as labels)
and StatsD. For example, we can add environment and node information:

```sh
anycable-go --metrics_tags=environment:production,node_id:xyz
# or via environment variables
ANYCABLE_METRICS_TAGS=environment:production,node_id:xyz anycable-go
```

For StatsD, you can specify the tag format: "datadog" (default), "influxdb", or "graphite".
Use the `statsd_tag_format` configuration parameter for that.

## Logging

Another option is to periodically write stats to the log (at the `info` level).

To enable metrics logging, pass the `--metrics_log` flag.

Your logs should contain something like this:

```sh
INFO 2018-03-06T14:16:27.872Z broadcast_msg_total=0 broadcast_streams_num=0 client_msg_total=0 clients_num=0 clients_uniq_num=0 context=metrics disconnect_queue_size=0 failed_auths_total=0 failed_broadcast_msg_total=0 failed_client_msg_total=0 goroutines_num=35 rpc_call_total=0 rpc_error_total=0
```

By default, metrics are logged every 15 seconds (you can change this behavior through the `--metrics_rotate_interval` option).

By default, all available metrics are logged. You can specify a subset of metrics to print to logs via the `--metrics_log_filter` option. For example:

```sh
$ anycable-go --metrics_log_filter=clients_num,rpc_call_total,rpc_error_total

...
INFO 2023-02-21T15:49:25.744Z context=metrics Log metrics every 15s (only selected fields: clients_num, rpc_call_total, rpc_error_total)
...
```

### Custom loggers with mruby

<!-- TODO: add new API, remove "experimental" -->

> 👨‍🔬 This is an experimental API and could change in the future 👩‍🔬

AnyCable-Go allows you to write custom log formatters using an embedded [mruby](http://mruby.org) engine.

mruby is a lightweight implementation of the Ruby language, so you can write metrics exporters in Ruby.

First, download a build of `anycable-go` with mruby support (it's not included by default): these binaries have the `-mrb` suffix right after the version (e.g., `anycable-go-1.0.0-mrb-linux-amd64`).

**NOTE**: only macOS and Linux are supported.

**NOTE**: when a server with mruby support is starting, you should see the following message:

```sh
$ anycable-go

INFO 2019-08-07T16:37:46.387Z context=main Starting AnyCable v0.6.2-13-gd421927 (with mruby 1.2.0 (2015-11-17)) (pid: 1362)
```

Second, write a Ruby script implementing a simple interface:

```ruby
# Module MUST be named MetricsFormatter
module MetricsFormatter
  # The only required method is .call.
  #
  # It accepts the metrics Hash and MUST return a string
  def self.call(data)
    data.to_json
  end
end
```

Finally, specify `--metrics_log_formatter` when running a server:

```sh
anycable-go --metrics_log_formatter path/to/custom_printer.rb
```

#### Example

This is a [Librato](https://www.librato.com)-compatible printer:

```ruby
module MetricsFormatter
  def self.call(data)
    parts = []

    data.each do |key, value|
      parts << "sample##{key}=#{value}"
    end

    parts.join(" ")
  end
end
```

```sh
INFO 2018-04-27T14:11:59.701Z sample#clients_num=0 sample#clients_uniq_num=0 sample#goroutines_num=0
```
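Following the same `MetricsFormatter.call` interface, here is a sketch of a formatter that renders every metrics snapshot as one [InfluxDB line protocol](https://docs.influxdata.com/) record. The measurement name `anycable_go` and the choice to omit timestamps are our assumptions, and the script is written against plain Ruby; since mruby implements a subset of the language, check it with your mruby-enabled binary before relying on it.

```ruby
# Hypothetical formatter (not shipped with AnyCable-Go) that renders metrics
# as a single InfluxDB line protocol record, e.g.:
#   anycable_go clients_num=0i,goroutines_num=35i
module MetricsFormatter
  MEASUREMENT = "anycable_go"

  def self.call(data)
    # Sort by key for a stable field order across log lines.
    pairs = data.sort_by { |key, _| key.to_s }
    fields = pairs.map { |key, value| "#{key}=#{format_value(value)}" }

    # No timestamp is appended, so InfluxDB would use the ingestion time.
    "#{MEASUREMENT} #{fields.join(",")}"
  end

  # Integers need the "i" suffix in line protocol; anything else is
  # written as a float.
  def self.format_value(value)
    value.is_a?(Integer) ? "#{value}i" : value.to_f.to_s
  end
end
```

Saved to a file, it is passed via `--metrics_log_formatter` like the Librato printer. Sorting the keys costs a little per log line but keeps records diff-friendly.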