---
layout: post
title: PROMETHEUS
permalink: /docs/prometheus
redirect_from:
 - /prometheus.md/
 - /docs/prometheus.md/
---

## Monitoring AIStore with Prometheus

AIStore tracks a growing list of performance counters, utilization percentages, latency and throughput metrics, transmitted and received stats (total bytes and numbers of objects), error counters, and more.

These metrics can be viewed via:
* AIS node logs, and
* the [CLI](/docs/cli.md), specifically the [`ais show cluster stats`](/docs/cli/cluster.md) command.

On the monitoring backend side, AIS supports both:
* [StatsD](https://github.com/etsy/statsd) with any compliant backend (e.g., Graphite/Grafana), and
* [Prometheus](https://prometheus.io/).

This document focuses on the Prometheus option. For general background, naming conventions, and examples, see the [AIS metrics](metrics.md) readme, which also has a separate section on `aisloader` metrics, i.e., the metrics generated by `aisloader` when running its benchmarks.

> For `aisloader`, please refer to [Load Generator](/docs/aisloader.md) and [How To Benchmark AIStore](howto_benchmark.md).

## Prometheus Exporter

AIStore is a fully compliant [Prometheus exporter](https://prometheus.io/docs/instrumenting/writing_exporters/) that natively supports [Prometheus](https://prometheus.io/) stats collection. There's no special configuration: the only thing required to enable the integration is telling AIStore whether to publish its stats via StatsD **or** Prometheus.

This binary choice between StatsD and Prometheus is a **deployment-time** switch controlled by a single environment variable: **AIS_PROMETHEUS**. When an AIS node (gateway or storage target) finds `AIS_PROMETHEUS` in its environment at startup, it registers all its metric descriptions (names, labels, and help strings) with Prometheus and exposes the HTTP endpoint `/metrics` for subsequent collection (aka "scraping") by Prometheus.

> With no `AIS_PROMETHEUS` in the environment, AIS nodes default to StatsD.
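
On the Prometheus side, every AIS node started this way becomes a scrape target. The sketch below writes a minimal `prometheus.yml` that lists the nodes' `/metrics` endpoints. The helper name `write_ais_scrape_config`, the 15-second scrape interval, and any host names and ports you pass in are illustrative assumptions, not AIS defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a minimal Prometheus config that scrapes AIS nodes
# directly (native integration, i.e., nodes started with AIS_PROMETHEUS set).
# Usage: write_ais_scrape_config <output-file> <host:port> [<host:port> ...]
write_ais_scrape_config() {
  local out="$1"
  shift
  {
    echo 'global:'
    echo '  scrape_interval: 15s    # assumption: pick your own interval'
    echo 'scrape_configs:'
    echo '  - job_name: ais'
    echo '    metrics_path: /metrics    # endpoint exposed by each AIS node'
    echo '    static_configs:'
    echo '      - targets:'
    local target
    for target in "$@"; do
      # One list entry per AIS gateway or storage target.
      echo "          - '${target}'"
    done
  } > "$out"
}
```

For example, `write_ais_scrape_config prometheus.yml proxy-host:8080 target-host:8081` (hypothetical hosts) followed by `promtool check config prometheus.yml` validates the generated file before Prometheus loads it.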

Here's a simplified example:

```console
$ AIS_PROMETHEUS=true aisnode -config=/etc/ais/ais.json -local_config=/etc/ais/ais_local.json -role=target

# Assuming the target with hostname "hostname" listens on port 8081:
$ curl http://hostname:8081/metrics | grep ais

# A sample output follows (note the metric names, which are meant to be self-explanatory):

  # TYPE ais_target_DFIltrTgz_disk_sda_avg_rsize gauge
  ais_target_DFIltrTgz_disk_sda_avg_rsize 23560
  # HELP ais_target_DFIltrTgz_disk_sda_avg_wsize average write size (bytes)
  # TYPE ais_target_DFIltrTgz_disk_sda_avg_wsize gauge
  ais_target_DFIltrTgz_disk_sda_avg_wsize 63120
  # HELP ais_target_DFIltrTgz_disk_sda_util gauge
  # TYPE ais_target_DFIltrTgz_disk_sda_util gauge
  ais_target_DFIltrTgz_disk_sda_util 42
  # HELP ais_target_DFIltrTgz_get_mbps throughput (MB/s)
  # TYPE ais_target_DFIltrTgz_get_mbps gauge
  ais_target_DFIltrTgz_get_mbps 72.65
  # HELP ais_target_DFIltrTgz_get_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_get_ms gauge
  ais_target_DFIltrTgz_get_ms 2
  # HELP ais_target_DFIltrTgz_get_n total number of operations
  # TYPE ais_target_DFIltrTgz_get_n counter
  ais_target_DFIltrTgz_get_n 155431
  # HELP ais_target_DFIltrTgz_get_redir_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_get_redir_ms gauge
  ais_target_DFIltrTgz_get_redir_ms 0
  # HELP ais_target_DFIltrTgz_kalive_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_kalive_ms gauge
  ais_target_DFIltrTgz_kalive_ms 1
  # HELP ais_target_DFIltrTgz_lst_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_lst_ms gauge
  ais_target_DFIltrTgz_lst_ms 2
  # HELP ais_target_DFIltrTgz_lst_n total number of operations
  # TYPE ais_target_DFIltrTgz_lst_n counter
  ais_target_DFIltrTgz_lst_n 120
  # HELP ais_target_DFIltrTgz_put_ms latency (milliseconds)
  # TYPE ais_target_DFIltrTgz_put_ms gauge
  ais_target_DFIltrTgz_put_ms 5
  # HELP ais_target_DFIltrTgz_put_n total number of operations
  ...
```
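
When scripting against this endpoint, `grep ais` also returns the `# HELP` and `# TYPE` comment lines. The hypothetical helper `ais_metric_value` sketched below prints only the value of a single metric. It assumes label-free sample lines of the form `<name> <value>`, as in the output above, and would not match a series that carries labels (`name{...} value`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one metric from Prometheus text output.
# Reads the exposition text on stdin; "# HELP" and "# TYPE" lines never match
# because their first field is "#". Assumes label-free "<name> <value>" lines.
# Usage: curl -s http://hostname:8081/metrics | ais_metric_value <metric-name>
ais_metric_value() {
  local name="$1"
  awk -v m="$name" '$1 == m { print $2 }'
}
```

For example, piping the target's `/metrics` output through `ais_metric_value ais_target_DFIltrTgz_get_n` prints just the GET counter.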

References:

* https://prometheus.io/docs/instrumenting/writing_exporters/
* https://prometheus.io/docs/concepts/data_model/
* https://prometheus.io/docs/concepts/metric_types/

## StatsD Exporter for Prometheus

If, for whatever reason, you decide to use the StatsD option, you can still send AIS stats to Prometheus via Prometheus's own generic [statsd_exporter](https://github.com/prometheus/statsd_exporter), which translates StatsD-formatted metrics into Prometheus metrics on the fly.

> **Note**: while native Prometheus integration (the previous section) is the preferred and recommended option, [statsd_exporter](https://github.com/prometheus/statsd_exporter) can be considered a backup plan for deployments with very special requirements.

First, the picture:

![AIStore monitoring with Prometheus](images/statsd-exporter.png)

The diagram depicts an AIS cluster with an arbitrary number of nodes, each periodically sending its StatsD metrics to the configured UDP address of a compliant StatsD server. [statsd_exporter](https://github.com/prometheus/statsd_exporter) is one such compliant StatsD server, and it is available out of the box.

To deploy [statsd_exporter](https://github.com/prometheus/statsd_exporter):

* either use the [prebuilt container image](https://quay.io/repository/prometheus/statsd-exporter);
* or `git clone` or `go install` the exporter from its repository at https://github.com/prometheus/statsd_exporter, and then run it as shown below. Take note of the StatsD port used throughout this document: **8125**, the conventional StatsD port. The exporter itself listens on **9125** by default, which is why the command below sets `--statsd.listen-udp` explicitly.
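
For the container option, a minimal sketch of starting the exporter with Docker follows. It assumes Docker is installed and that the image's entrypoint is the `statsd_exporter` binary, so arguments after the image name are passed as exporter flags. The function name `start_statsd_exporter` and the container name are hypothetical. The sketch publishes UDP port 8125 for incoming StatsD traffic and TCP port 9102 for Prometheus scraping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the prebuilt statsd_exporter image in the background.
# ASSUMPTION: Docker is installed and the image entrypoint is the statsd_exporter binary.
start_statsd_exporter() {
  local image="quay.io/prometheus/statsd-exporter"
  # 8125/udp: StatsD lines from AIS nodes; 9102/tcp: Prometheus scrape endpoint.
  docker run -d --name statsd-exporter \
    -p 8125:8125/udp \
    -p 9102:9102 \
    "$image" \
    --statsd.listen-udp=":8125" \
    --web.listen-address=":9102"
}
```

After `start_statsd_exporter`, `docker logs statsd-exporter` shows the exporter's startup messages.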

To test a combination of AIStore and [statsd_exporter](https://github.com/prometheus/statsd_exporter) without Prometheus, run the exporter with debug logging enabled:

```console
$ statsd_exporter --statsd.listen-udp localhost:8125 --log.level debug
```

The resulting (debug) output will look something like:

```console
level=info ts=2021-05-13T15:30:22.251Z caller=main.go:321 msg="Starting StatsD -> Prometheus Exporter" version="(version=, branch=, revision=)"
level=info ts=2021-05-13T15:30:22.251Z caller=main.go:322 msg="Build context" context="(go=go1.16.3, user=, date=)"
level=info ts=2021-05-13T15:30:22.251Z caller=main.go:361 msg="Accepting StatsD Traffic" udp=localhost:8125 tcp=:9125 unixgram=
level=info ts=2021-05-13T15:30:22.251Z caller=main.go:362 msg="Accepting Prometheus Requests" addr=:9102
level=debug ts=2021-05-13T15:30:27.811Z caller=listener.go:73 msg="Incoming line" proto=udp line=aistarget.pakftUgh.kalive.latency:1|ms
level=debug ts=2021-05-13T15:30:29.891Z caller=listener.go:73 msg="Incoming line" proto=udp line=aisproxy.qYyhpllR.pst.count:77|c
level=debug ts=2021-05-13T15:30:37.811Z caller=listener.go:73 msg="Incoming line" proto=udp line=aistarget.pakftUgh.kalive.latency:1|ms
level=debug ts=2021-05-13T15:30:39.892Z caller=listener.go:73 msg="Incoming line" proto=udp line=aisproxy.qYyhpllR.pst.count:78|c
level=debug ts=2021-05-13T15:30:47.811Z caller=listener.go:73 msg="Incoming line" proto=udp line=aistarget.pakftUgh.kalive.latency:1|ms
level=debug ts=2021-05-13T15:30:49.892Z caller=listener.go:73 msg="Incoming line" proto=udp line=aisproxy.qYyhpllR.pst.count:79|c
...
```
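
Each incoming line above is a plain StatsD datagram of the form `<name>:<value>|<type>`, where `ms` marks a timer and `c` a counter. To check that the exporter receives traffic before any AIS node is pointed at it, you can hand-craft such a datagram. The sketch below relies on bash's `/dev/udp/<host>/<port>` redirection (a bash feature, not available in every shell) and assumes the exporter listens on `127.0.0.1:8125`, which is what `localhost:8125` in the debug command usually resolves to. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for hand-feeding StatsD datagrams to statsd_exporter.

# Format one datagram: <name>:<value>|<type> (e.g., type "ms" or "c").
statsd_line() {
  local name="$1" value="$2" type="$3"
  printf '%s:%s|%s' "$name" "$value" "$type"
}

# Send one datagram over UDP via bash's /dev/udp redirection.
# Host and port default to 127.0.0.1:8125 (assumption: matches the debug example).
statsd_send() {
  local host="${4:-127.0.0.1}" port="${5:-8125}"
  statsd_line "$1" "$2" "$3" > "/dev/udp/${host}/${port}"
}
```

For example, `statsd_send aistarget.test.kalive.latency 1 ms` (a made-up metric name) should produce a corresponding "Incoming line" entry in the exporter's debug output.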

Finally, point any available Prometheus instance at the exporter's Prometheus listening port, **9102** by default.
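
In this setup Prometheus scrapes the exporter rather than the AIS nodes, so a single scrape job is enough. The sketch below writes such a configuration to a file; the helper name `write_statsd_exporter_scrape_config`, the job name, and the default target `localhost:9102` are assumptions for illustration. No `metrics_path` is set because both Prometheus and the exporter default to `/metrics`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a minimal Prometheus config that scrapes statsd_exporter.
# Usage: write_statsd_exporter_scrape_config <output-file> [<exporter-host:port>]
write_statsd_exporter_scrape_config() {
  local out="$1"
  local exporter="${2:-localhost:9102}"   # assumption: exporter's default web port
  cat > "$out" <<EOF
scrape_configs:
  - job_name: ais-statsd-exporter
    static_configs:
      - targets: ['${exporter}']
EOF
}
```

As with any Prometheus configuration, `promtool check config <file>` can validate the result.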

Note that both listening ports mentioned in this section, the StatsD port (**8125** in the example above) and the Prometheus port (**9102**), are configurable via the exporter's command line. To see all supported options, run:

```console
$ statsd_exporter --help
```