github.com/muhammadn/cortex@v1.9.1-0.20220510110439-46bb7000d03d/docs/blocks-storage/_index.md

     1  ---
     2  title: "Blocks Storage"
     3  linkTitle: "Blocks Storage"
     4  weight: 3
     5  menu:
     6  ---
     7  
     8  The blocks storage is a Cortex storage engine based on [Prometheus TSDB](https://prometheus.io/docs/prometheus/latest/storage/): it stores each tenant's time series in its own TSDB, which writes the series out to on-disk blocks (each covering a 2h range by default). Each block is composed of chunk files, which contain the timestamp-value pairs for multiple series, and an index, which maps metric names and labels to the time series in the chunk files.
     9  
    10  The supported backends for the blocks storage are:
    11  
    12  * [Amazon S3](https://aws.amazon.com/s3)
    13  * [Google Cloud Storage](https://cloud.google.com/storage/)
    14  * [Microsoft Azure Storage](https://azure.microsoft.com/en-us/services/storage/)
    15  * [OpenStack Swift](https://wiki.openstack.org/wiki/Swift) (experimental)
    16  * [Local Filesystem](https://thanos.io/storage.md/#filesystem) (single node only)
    17  
    18  _Internally, some components are based on [Thanos](https://thanos.io), but no Thanos knowledge is required in order to run it._
    19  
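As an illustration, one of the backends listed above might be selected as follows. This is a minimal sketch, assuming the `blocks_storage` block of the Cortex config file; the endpoint and bucket name are placeholders and credentials are omitted, so check the config file reference for the options available in your version.

```yaml
blocks_storage:
  # One of the supported backends listed above.
  # CLI flag: -blocks-storage.backend
  backend: s3
  s3:
    # Placeholder values: replace with your own endpoint and bucket.
    endpoint: s3.us-east-1.amazonaws.com
    bucket_name: example-cortex-blocks
```
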
    20  ## Architecture
    21  
    22  When running the Cortex blocks storage, the Cortex architecture doesn't significantly change and thus the [general architecture documentation](../architecture.md) applies to the blocks storage as well. However, there are two additional Cortex services when running the blocks storage:
    23  
    24  - [Store-gateway](./store-gateway.md)
    25  - [Compactor](./compactor.md)
    26  
    27  ![Architecture](/images/blocks-storage/architecture.png)
    28  <!-- Diagram source at https://docs.google.com/presentation/d/1bHp8_zcoWCYoNU2AhO2lSagQyuIrghkCncViSqn14cU/edit -->
    29  
    30  The **[store-gateway](./store-gateway.md)** is responsible for querying blocks and is used by the [querier](./querier.md) at query time. The store-gateway is required when running the blocks storage.
    31  
    32  The **[compactor](./compactor.md)** is responsible for merging and deduplicating smaller blocks into larger ones, in order to reduce the number of blocks stored in the long-term storage for a given tenant and to query them more efficiently. It also keeps the [bucket index](./bucket-index.md) updated and, for this reason, it's a required component.
    33  
    34  The `alertmanager` and `ruler` components can also use object storage to store the configurations and rules uploaded by users. In that case, a separate bucket should be created to store alertmanager configurations and rules: sharing the same bucket between ruler/alertmanager and blocks will cause issues with the **[compactor](./compactor.md)**.
    35  
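The bucket separation described above could look like the following. This is a sketch only, assuming the `ruler_storage` and `alertmanager_storage` config blocks and an S3 backend; all bucket names are hypothetical. The point it illustrates is that neither component points at the bucket used for blocks.

```yaml
blocks_storage:
  backend: s3
  s3:
    bucket_name: example-cortex-blocks        # blocks only

ruler_storage:
  backend: s3
  s3:
    bucket_name: example-cortex-ruler         # rules, kept apart from blocks

alertmanager_storage:
  backend: s3
  s3:
    bucket_name: example-cortex-alertmanager  # alertmanager configs, kept apart from blocks
```
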
    36  Finally, the [**table-manager**](../chunks-storage/table-manager.md) and the [**schema config**](../chunks-storage/schema-config.md) are **not used** by the blocks storage.
    37  
    38  ### The write path
    39  
    40  **Ingesters** receive incoming samples from the distributors. Each push request belongs to a tenant, and the ingester appends the received samples to the specific per-tenant TSDB stored on the local disk. The received samples are both kept in memory and written to a write-ahead log (WAL), which is used to recover the in-memory series if the ingester terminates abruptly. The per-tenant TSDB is lazily created in each ingester as soon as the first samples are received for that tenant.
    41  
    42  The in-memory samples are periodically flushed to disk - and the WAL truncated - when a new TSDB block is created, which by default occurs every 2 hours. Each newly created block is then uploaded to the long-term storage and kept in the ingester until the configured `-blocks-storage.tsdb.retention-period` expires, in order to give [queriers](./querier.md) and [store-gateways](./store-gateway.md) enough time to discover the new block on the storage and download its index-header.
    43  
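The retention mentioned above maps to a setting in the config file. The fragment below is a sketch, assuming the YAML key that corresponds to the `-blocks-storage.tsdb.retention-period` flag; the value is illustrative and not a recommendation.

```yaml
blocks_storage:
  tsdb:
    # How long uploaded blocks are kept on the ingester's local disk, so that
    # queriers and store-gateways have time to discover them in the storage.
    # CLI flag: -blocks-storage.tsdb.retention-period
    retention_period: 6h   # illustrative value
```
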
    44  To use the **WAL** effectively and be able to recover the in-memory series when an ingester terminates abruptly, the WAL needs to be stored on a persistent disk that survives an ingester failure (e.g. an AWS EBS volume or a GCP persistent disk when running in the cloud). For example, if you're running the Cortex cluster in Kubernetes, you may use a StatefulSet with a persistent volume claim for the ingesters. The WAL is stored in the same filesystem location as the local TSDB blocks (compacted from head), and the two cannot be decoupled. See also the [timeline of block uploads](production-tips/#how-to-estimate--querierquery-store-after) and [disk space estimate](production-tips/#ingester-disk-space).
    45  
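The Kubernetes setup mentioned above might be sketched as follows. Every name, the image tag, the replica count and the volume size are assumptions made for illustration; the config file mount and other flags are omitted. What matters is that the TSDB directory, which holds both the WAL and the local blocks, sits on a volume backed by a persistent volume claim.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ingester                      # hypothetical name
spec:
  serviceName: ingester
  replicas: 3
  selector:
    matchLabels:
      name: ingester
  template:
    metadata:
      labels:
        name: ingester
    spec:
      containers:
        - name: ingester
          image: quay.io/cortexproject/cortex:v1.9.0   # illustrative tag
          args:
            - -target=ingester
            # WAL and local TSDB blocks share this directory.
            - -blocks-storage.tsdb.dir=/data/tsdb
          volumeMounts:
            - name: ingester-data
              mountPath: /data
  # One persistent volume claim per ingester replica, so the WAL survives
  # a pod restart or rescheduling.
  volumeClaimTemplates:
    - metadata:
        name: ingester-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi             # placeholder size
```
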
    46  #### Distributor series sharding and replication
    47  
    48  The series sharding and replication done by the distributor doesn't change based on the storage engine.
    49  
    50  It's important to note that, unlike with the [chunks storage](../chunks-storage/_index.md), the replication factor N (typically 3) affects storage utilization: each time series is stored by N ingesters and, since each ingester writes its own block to the long-term storage, this leads to a storage utilization N times higher than the chunks storage. The [compactor](./compactor.md) solves this problem by merging blocks from multiple ingesters into a single block and removing duplicated samples. After block compaction, the storage utilization is significantly smaller than with the chunks storage for the exact same series and samples.
    51  
    52  For more information, please refer to the following dedicated sections:
    53  
    54  - [Compactor](./compactor.md)
    55  - [Production tips](./production-tips.md)
    56  
    57  ### The read path
    58  
    59  [Queriers](./querier.md) and [store-gateways](./store-gateway.md) periodically iterate over the storage bucket to discover blocks recently uploaded by ingesters.
    60  
    61  For each discovered block, queriers only download the block's `meta.json` file (containing some metadata, including the min and max timestamp of samples within the block), while store-gateways download the `meta.json` as well as the index-header, which is a small subset of the block's index used by the store-gateway to look up series at query time.
    62  
    63  Queriers use the blocks metadata to compute the list of blocks that need to be queried at query time and fetch matching series from the store-gateway instances holding the required blocks.
    64  
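The periodic bucket scan and the local copy of the index-headers described above are governed by the `bucket_store` settings. The fragment below is a sketch, assuming the `sync_dir` and `sync_interval` options; the path and interval are illustrative and should be checked against the config file reference for your version.

```yaml
blocks_storage:
  bucket_store:
    # Local directory where downloaded index-headers are kept.
    # CLI flag: -blocks-storage.bucket-store.sync-dir
    sync_dir: /data/tsdb-sync
    # How often the bucket is scanned for new or deleted blocks.
    # CLI flag: -blocks-storage.bucket-store.sync-interval
    sync_interval: 15m   # illustrative value
```
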
    65  For more information, please refer to the following dedicated sections:
    66  
    67  - [Querier](./querier.md)
    68  - [Store-gateway](./store-gateway.md)
    69  - [Production tips](./production-tips.md)
    70  
    71  ## Configuration
    72  
    73  The general [configuration documentation](../configuration/config-file-reference.md) also applies to a Cortex cluster running the blocks storage. The blocks storage can be enabled by switching the storage `engine` to `blocks`:
    74  
    75  ```yaml
    76  storage:
    77    # The storage engine to use. Use "blocks" for the blocks storage.
    78    # CLI flag: -store.engine
    79    engine: blocks
    80  ```
    81  
    82  ## Known issues
    83  
    84  GitHub issues tagged with the [`storage/blocks`](https://github.com/cortexproject/cortex/issues?q=is%3Aopen+is%3Aissue+label%3Astorage%2Fblocks) label are the best source of currently known issues affecting the blocks storage.