# Quick Tutorial

Check out the free, in-browser interactive tutorial [Killercoda Thanos course](https://killercoda.com/thanos). We will be progressively updating our Killercoda course with more scenarios.

On top of this, find our quick tutorial below.

## Prometheus

Thanos is based on Prometheus. With Thanos, Prometheus always remains an integral foundation for collecting metrics and alerting using local data.

Thanos builds on vanilla [Prometheus](https://prometheus.io/). We aim to support *all* Prometheus versions beyond v2.2.1.

NOTE: It is highly recommended to use Prometheus v2.13.0+ due to its remote read improvements.

Always make sure to run Prometheus as recommended by the Prometheus team:

* Put Prometheus in the same failure domain as the monitored services, i.e. in the same network and the same geographic location.
* Use a persistent disk to persist data across Prometheus restarts.
* Use local compaction for longer retentions.
* Do not change the minimum TSDB block durations.
* Do not scale out Prometheus unless necessary. A single Prometheus instance is already efficient.

We recommend using Thanos when you need to scale out your Prometheus instance.

## Components

Following the [KISS](https://en.wikipedia.org/wiki/KISS_principle) and Unix philosophies, Thanos comprises a set of components, each of which fulfills a specific role.

* Sidecar: connects to Prometheus, reads its data for queries, and/or uploads it to cloud storage.
* Store Gateway: serves metrics from a cloud storage bucket.
* Compactor: compacts, downsamples, and applies retention to the data stored in the cloud storage bucket.
* Receiver: receives data from Prometheus's remote-write WAL (write-ahead log), exposes it, and/or uploads it to cloud storage.
* Ruler/Rule: evaluates recording and alerting rules against data in Thanos for exposition and/or upload.
* Querier/Query: implements Prometheus's v1 API to aggregate data from the underlying components.
* Query Frontend: implements Prometheus's v1 API to proxy requests to Querier while caching the responses and optionally splitting queries by day.

Deployment with Thanos Sidecar for Kubernetes:

<!---
Source file to copy and edit: https://docs.google.com/drawings/d/1AiMc1qAjASMbtqL6PNs0r9-ynGoZ9LIAtf0b9PjILxw/edit?usp=sharing
-->

![Sidecar](https://docs.google.com/drawings/d/e/2PACX-1vSJd32gPh8-MC5Ko0-P-v1KQ0Xnxa0qmsVXowtkwVGlczGfVW-Vd415Y6F129zvh3y0vHLBZcJeZEoz/pub?w=960&h=720)

Deployment via Receive in order to scale out or integrate with other remote write-compatible sources:

<!---
Source file to copy and edit: https://docs.google.com/drawings/d/1iimTbcicKXqz0FYtSfz04JmmVFLVO9BjAjEzBm5538w/edit?usp=sharing
-->

![Receive](https://docs.google.com/drawings/d/e/2PACX-1vRdYP__uDuygGR5ym1dxBzU6LEx5v7Rs1cAUKPsl5BZrRGVl5YIj5lsD_FOljeIVOGWatdAI9pazbCP/pub?w=960&h=720)
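
For orientation, a minimal Receive invocation might look like the sketch below. The addresses, paths, and the `receive_replica` label are illustrative placeholders, not defaults:

```bash
thanos receive \
    --tsdb.path             /var/thanos/receive \  # Local TSDB directory for received samples
    --remote-write.address  0.0.0.0:19291 \        # Prometheus remote_write target: http://<host>:19291/api/v1/receive
    --grpc-address          0.0.0.0:19090 \        # gRPC endpoint for the Store API
    --objstore.config-file  bucket_config.yaml \   # Optional upload to object storage
    --label                 'receive_replica="0"'  # External label identifying this receiver (placeholder)
```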

### Sidecar

Thanos integrates with existing Prometheus servers as a [sidecar process](https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar#solution), which runs on the same machine or in the same pod as the Prometheus server.

The purpose of Thanos Sidecar is to back up Prometheus's data into an object storage bucket, and to give other Thanos components access to the Prometheus metrics via a gRPC API.

Sidecar makes use of Prometheus's `reload` endpoint. Make sure it's enabled with the flag `--web.enable-lifecycle`.
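
For instance, Prometheus might be started as in the sketch below: the lifecycle endpoint is enabled, and local compaction is effectively disabled by pinning the min/max block durations to the same value, which is the usual setup when Sidecar uploads blocks. Paths are placeholders:

```bash
prometheus \
    --config.file                     /etc/prometheus/prometheus.yaml \
    --storage.tsdb.path               /var/prometheus \  # Same path the Sidecar reads via --tsdb.path
    --storage.tsdb.min-block-duration 2h \               # Pin min/max to disable local compaction
    --storage.tsdb.max-block-duration 2h \               # (assumes Sidecar handles uploads)
    --web.enable-lifecycle                               # Enables the reload endpoint used by Sidecar
```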

[Sidecar component documentation](components/sidecar.md)

### External Storage

The following configures Sidecar to write Prometheus's data into a configured object storage bucket:

```bash
thanos sidecar \
    --tsdb.path            /var/prometheus \          # TSDB data directory of Prometheus
    --prometheus.url       "http://localhost:9090" \  # Be sure that Sidecar can use this URL!
    --objstore.config-file bucket_config.yaml         # Storage configuration for uploading data
```

The exact format of the YAML file depends on the provider you choose. Configuration examples and an up-to-date list of the storage types that Thanos supports are available [here](storage.md).
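
For illustration only, a minimal `bucket_config.yaml` for S3-compatible storage might look like the sketch below. All values are placeholders; see [storage.md](storage.md) for the authoritative schema of each provider:

```yaml
type: S3
config:
  bucket: "my-thanos-bucket"              # Placeholder bucket name
  endpoint: "s3.us-east-1.amazonaws.com"  # Placeholder region endpoint
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
```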

Rolling this out has little to no impact on the running Prometheus instance. This allows you to ensure you are backing up your data while figuring out the other pieces of Thanos.

If you are not interested in backing up any data, the `--objstore.config-file` flag can simply be omitted.

* *[Example Kubernetes manifests using Prometheus operator](https://github.com/coreos/prometheus-operator/tree/master/example/thanos)*
* *[Example of deploying Sidecar using the official Prometheus Helm Chart](../tutorials/kubernetes-helm/README.md)*
* *[Details & config for other object stores](storage.md)*

### Store API

The Sidecar component implements and exposes a gRPC *[Store API](https://github.com/thanos-io/thanos/blob/main/pkg/store/storepb/rpc.proto#L27)*. This implementation allows you to query the metric data stored in Prometheus.

Let's extend the Sidecar from the previous section to connect to a Prometheus server and expose the Store API:

```bash
thanos sidecar \
    --tsdb.path                 /var/prometheus \
    --objstore.config-file      bucket_config.yaml \       # Bucket config file to send data to
    --prometheus.url            http://localhost:9090 \    # Location of the Prometheus HTTP server
    --http-address              0.0.0.0:19191 \            # HTTP endpoint for collecting metrics on Sidecar
    --grpc-address              0.0.0.0:19090              # gRPC endpoint for the Store API
```

* *[Example Kubernetes manifests using Prometheus operator](https://github.com/coreos/prometheus-operator/tree/master/example/thanos)*

### Uploading Old Metrics

When Sidecar is run with the `--shipper.upload-compacted` flag, it will sync all older existing blocks from Prometheus's local storage on startup.

NOTE: This assumes you have never run Sidecar with block uploading against this bucket. Otherwise, you must manually remove overlapping blocks from the bucket; such mitigations will be suggested by the Sidecar verification process.
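
For example, a one-off backfill run might look like this sketch, reusing the placeholder paths from above:

```bash
thanos sidecar \
    --tsdb.path            /var/prometheus \
    --prometheus.url       http://localhost:9090 \
    --objstore.config-file bucket_config.yaml \
    --shipper.upload-compacted                     # Also upload blocks Prometheus has already compacted
```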

### External Labels

Prometheus allows the configuration of "external labels" for a given Prometheus instance. These are meant to globally identify the role of that instance. As Thanos aims to aggregate data across all instances, providing a consistent set of external labels becomes crucial!

Every Prometheus instance must have a globally unique set of identifying labels. For example, in Prometheus's configuration file:

```yaml
global:
  external_labels:
    region: eu-west
    monitor: infrastructure
    replica: A
```

## Querier/Query

Now that we have set up Sidecar for one or more Prometheus instances, we want to use Thanos's global [Query Layer](components/query.md) to evaluate PromQL queries against all instances at once.

The Querier component is stateless and horizontally scalable, and can be deployed with any number of replicas. Once connected to Thanos Sidecar, it automatically detects which Prometheus servers need to be contacted for a given PromQL query.

Thanos Querier also implements Prometheus's official HTTP API and can thus be used with external tools such as Grafana. It also serves a derivative of Prometheus's UI for ad-hoc querying and for checking the status of the Thanos stores.

Below, we will set up a Thanos Querier to connect to our Sidecars and expose its HTTP UI:

```bash
thanos query \
    --http-address 0.0.0.0:19192 \                                # HTTP endpoint for the Thanos Querier UI
    --endpoint     1.2.3.4:19090 \                                # Static gRPC Store API address for the query node to query
    --endpoint     1.2.3.5:19090 \                                # Also repeatable
    --endpoint     dnssrv+_grpc._tcp.thanos-store.monitoring.svc  # Supports DNS A & SRV records
```

Go to the configured HTTP address, which should now show a UI similar to that of Prometheus. You can now query across all Prometheus instances within the cluster. You can also check out the Stores page, which shows all of your stores.
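
Because Querier speaks Prometheus's v1 HTTP API, you can also test it from the command line; for instance, assuming the Querier above is listening on localhost:

```bash
# Query the current value of `up` across every connected Prometheus instance.
curl 'http://localhost:19192/api/v1/query?query=up'
```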

[Query documentation](components/query.md)

### Deduplicating Data from Prometheus HA Pairs

The Querier component is also capable of deduplicating data collected from Prometheus HA pairs. This requires configuring Prometheus's `global.external_labels` configuration block to identify the role of a given Prometheus instance.

A typical configuration uses the label name "replica" with whatever value you choose. For example, you might set up the following in Prometheus's configuration file:

```yaml
global:
  external_labels:
    region: eu-west
    monitor: infrastructure
    replica: A
# ...
```

In a Kubernetes stateful deployment, the replica label can also be the pod name.

Ensure your Prometheus instances have been reloaded with the configuration you defined above. Then, in Thanos Querier, we will define `replica` as the label we want to enable deduplication on:

```bash
thanos query \
    --http-address        0.0.0.0:19192 \
    --endpoint            1.2.3.4:19090 \
    --endpoint            1.2.3.5:19090 \
    --query.replica-label replica \        # Replica label for deduplication
    --query.replica-label replicaX         # Supports multiple replica labels for deduplication
```

Go to the configured HTTP address, and you should now be able to query across all Prometheus instances and receive deduplicated data.
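
The HTTP API also accepts an explicit `dedup` parameter, which can be handy for comparing merged and raw per-replica results; a sketch, assuming the Querier above:

```bash
# Deduplicated result (the default behavior once replica labels are configured).
curl 'http://localhost:19192/api/v1/query?query=up&dedup=true'

# Raw per-replica series, for comparison.
curl 'http://localhost:19192/api/v1/query?query=up&dedup=false'
```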

* *[Example Kubernetes manifest](https://github.com/thanos-io/kube-thanos/blob/master/manifests/thanos-query-deployment.yaml)*

### Communication Between Components

The only required communication between nodes is for a Thanos Querier to be able to reach the gRPC Store APIs that you provide. Thanos Querier periodically calls the Info endpoint to collect up-to-date metadata and to check the health of a given Store API. That metadata includes the time windows and external labels for each node.

There are various ways to tell Thanos Querier about the Store APIs it should query data from. The simplest is a static list of well-known addresses to query. These are repeatable, so you can add as many endpoints as you need. You can also use a DNS name prefixed by `dns+` or `dnssrv+` to have Thanos Querier do an `A` or `SRV` lookup to get all the required IPs it should communicate with.

```bash
thanos query \
    --http-address 0.0.0.0:19192 \              # Endpoint for the Thanos Querier UI
    --grpc-address 0.0.0.0:19092 \              # gRPC endpoint for the Store API
    --endpoint     1.2.3.4:19090 \              # Static gRPC Store API address for the query node to query
    --endpoint     1.2.3.5:19090 \              # Also repeatable
    --endpoint     dns+rest.thanos.peers:19092  # Use a DNS lookup to get all registered IPs as separate Store APIs
```

Read more details [here](service-discovery.md).

* *[Example Kubernetes manifests using Prometheus operator](https://github.com/coreos/prometheus-operator/tree/master/example/thanos)*

## Store Gateway

As Thanos Sidecar backs up data into the object storage bucket of your choice, you can decrease Prometheus's retention in order to store less data locally. However, we need a way to query all that historical data again. Store Gateway does just that by implementing the same gRPC data API as Sidecar, but backing it with data it can find in your object storage bucket. Just like sidecars and query nodes, Store Gateway exposes a Store API and needs to be discovered by Thanos Querier.

```bash
thanos store \
    --data-dir             /var/thanos/store \   # Disk space for local caches
    --objstore.config-file bucket_config.yaml \  # Bucket to fetch data from
    --http-address         0.0.0.0:19191 \       # HTTP endpoint for collecting metrics on the Store Gateway
    --grpc-address         0.0.0.0:19090         # gRPC endpoint for the Store API
```

Store Gateway uses a small amount of disk space for caching basic information about the data in the object storage bucket. This rarely exceeds a few gigabytes and is used to improve restart times. It is useful, but not required, to preserve the cache across restarts.
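
If memory use becomes a concern, the in-memory index cache and chunk pool can also be sized explicitly; a sketch, with illustrative values rather than recommendations:

```bash
thanos store \
    --data-dir             /var/thanos/store \
    --objstore.config-file bucket_config.yaml \
    --index-cache-size     250MB \               # Cap for the in-memory index cache
    --chunk-pool-size      2GB                   # Cap for the chunk memory pool
```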

* *[Example Kubernetes manifest](https://github.com/thanos-io/kube-thanos/blob/master/manifests/thanos-store-statefulSet.yaml)*

[Store Gateway documentation](components/store.md)

## Compactor

A local Prometheus installation periodically compacts older data to improve query efficiency. Since Sidecar backs up data into an object storage bucket as soon as possible, we need a way to apply the same process to data in the bucket.

Thanos Compactor simply scans the object storage bucket and performs compaction where required. At the same time, it is responsible for creating downsampled copies of the data in order to speed up queries.

```bash
thanos compact \
    --data-dir             /var/thanos/compact \  # Temporary workspace for data processing
    --objstore.config-file bucket_config.yaml \   # Bucket where compaction will be performed
    --http-address         0.0.0.0:19191          # HTTP endpoint for collecting metrics on the Compactor
```

Compactor is not in the critical path of querying or data backup. It can either be run as a periodic batch job or be left running to compact data as soon as it appears. It is recommended to provide 100-300GB of local disk space for data processing.
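
To keep Compactor running continuously rather than as a batch job, the `--wait` flag can be used; a sketch:

```bash
thanos compact \
    --data-dir             /var/thanos/compact \
    --objstore.config-file bucket_config.yaml \
    --http-address         0.0.0.0:19191 \
    --wait                                        # Keep running and compact new blocks as they appear
```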

*NOTE: Compactor must be run as a **singleton** and must not run while you are manually modifying data in the bucket.*

* *[Example Kubernetes manifest](https://github.com/thanos-io/kube-thanos/blob/master/examples/all/manifests/thanos-compact-statefulSet.yaml)*

[Compactor documentation](components/compact.md)

## Ruler/Rule

In case Prometheus running with Thanos Sidecar does not have enough retention, or if you want alerts or recording rules that require a global view, Thanos has just the component for that: the [Ruler](components/rule.md), which does rule and alert evaluation on top of a given Thanos Querier.
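
A minimal Ruler invocation might look like the sketch below; the rule file, Querier address, and Alertmanager URL are placeholders:

```bash
thanos rule \
    --data-dir             /var/thanos/rule \        # Local workspace for rule evaluation
    --rule-file            /etc/thanos/rules.yaml \  # Prometheus-format recording & alerting rules
    --query                http://localhost:19192 \  # Thanos Querier to evaluate rules against
    --alertmanagers.url    http://localhost:9093 \   # Alertmanager to send firing alerts to
    --objstore.config-file bucket_config.yaml        # Optional upload of the resulting blocks
```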

[Rule documentation](components/rule.md)