github.com/muhammadn/cortex@v1.9.1-0.20220510110439-46bb7000d03d/docs/blocks-storage/compactor.template

github.com/muhammadn/cortex@v1.9.1-0.20220510110439-46bb7000d03d/docs/blocks-storage/compactor.template (about)

1 ---
2 title: "Compactor"
3 linkTitle: "Compactor"
4 weight: 4
5 slug: compactor
6 ---
7
8 {{ .GeneratedFileWarning }}
9
10 The **compactor** is an service which is responsible to:
11
12 - Compact multiple blocks of a given tenant into a single optimized larger block. This helps to reduce storage costs (deduplication, index size reduction), and increase query speed (querying fewer blocks is faster).
13 - Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md), [store-gateways](./store-gateway.md) and rulers to discover new blocks in the storage.
14
15 The compactor is **stateless**.
16
17 ## How compaction works
18
19 The blocks compaction has two main benefits:
20
21 1. Vertically compact blocks uploaded by all ingesters for the same time range
22 2. Horizontally compact blocks with small time ranges into a single larger block
23
24 The **vertical compaction** merges all the blocks of a tenant uploaded by ingesters for the same time range (2 hours ranges by default) into a single block, while **de-duplicating samples** that are originally written to N blocks as a result of replication. This step reduces number of blocks for single 2 hours time range from #(number of ingesters) to 1 per tenant.
25
26 The **horizontal compaction** triggers after the vertical compaction and compacts several blocks with adjacent 2-hour range periods into a single larger block. Even though the total size of block chunks doesn't change after this compaction, it may still significantly reduce the size of the index and the index-header kept in memory by store-gateways.
27
28 ![Compactor - horizontal and vertical compaction](/images/blocks-storage/compactor-horizontal-and-vertical-compaction.png)
29
30 
31
32 ## Compactor sharding
33
34 The compactor optionally supports sharding.
35
36 When sharding is enabled, multiple compactor instances can coordinate to split the workload and shard blocks by tenant. All the blocks of a tenant are processed by a single compactor instance at any given time, but compaction for different tenants may simultaneously run on different compactor instances.
37
38 Whenever the pool of compactors increase or decrease (ie. following up a scale up/down), tenants are resharded across the available compactor instances without any manual intervention.
39
40 The compactor sharding is based on the Cortex [hash ring](../architecture.md#the-hash-ring). At startup, a compactor generates random tokens and registers itself to the ring. While running, it periodically scans the storage bucket (every `-compactor.compaction-interval`) to discover the list of tenants in the storage and compacts blocks for each tenant whose hash matches the token ranges assigned to the instance itself within the ring.
41
42 This feature can be enabled via `-compactor.sharding-enabled=true` and requires the backend [hash ring](../architecture.md#the-hash-ring) to be configured via `-compactor.ring.*` flags (or their respective YAML config options).
43
44 ### Waiting for stable ring at startup
45
46 In the event of a cluster cold start or scale up of 2+ compactor instances at the same time we may end up in a situation where each new compactor instance starts at a slightly different time and thus each one runs the first compaction based on a different state of the ring. This is not a critical condition, but may be inefficient, because multiple compactor replicas may start compacting the same tenant nearly at the same time.
47
48 To reduce the likelihood this could happen, the compactor waits for a stable ring at startup. A ring is considered stable if no instance is added/removed to the ring for at least `-compactor.ring.wait-stability-min-duration`. If the ring keep getting changed after `-compactor.ring.wait-stability-max-duration`, the compactor will stop waiting for a stable ring and will proceed starting up normally.
49
50 To disable this waiting logic, you can start the compactor with `-compactor.ring.wait-stability-min-duration=0`.
51
52 ## Soft and hard blocks deletion
53
54 When the compactor successfully compacts some source blocks into a larger block, source blocks are deleted from the storage. Blocks deletion is not immediate, but follows a two steps process:
55
56 1. First, a block is **marked for deletion** (soft delete)
57 2. Then, once a block is marked for deletion for longer then `-compactor.deletion-delay`, the block is **deleted** from the storage (hard delete)
58
59 The compactor is both responsible to mark blocks for deletion and then hard delete them once the deletion delay expires.
60 The soft deletion is based on a tiny `deletion-mark.json` file stored within the block location in the bucket which gets looked up both by queriers and store-gateways.
61
62 This soft deletion mechanism is used to give enough time to queriers and store-gateways to discover the new compacted blocks before the old source blocks are deleted. If source blocks would be immediately hard deleted by the compactor, some queries involving the compacted blocks may fail until the queriers and store-gateways haven't rescanned the bucket and found both deleted source blocks and the new compacted ones.
63
64 ## Compactor disk utilization
65
66 The compactor needs to download source blocks from the bucket to the local disk, and store the compacted block to the local disk before uploading it to the bucket. Depending on the largest tenants in your cluster and the configured `-compactor.block-ranges`, the compactor may need a lot of disk space.
67
68 Assuming `max_compaction_range_blocks_size` is the total size of blocks for the largest tenant (you can measure it inspecting the bucket) and the longest `-compactor.block-ranges` period, the formula to estimate the minimum disk space required is:
69
70 ```
71 min_disk_space_required = compactor.compaction-concurrency * max_compaction_range_blocks_size * 2
72 ```
73
74 Alternatively, assuming the largest `-compactor.block-ranges` is `24h` (default), you could consider 150GB of disk space every 10M active series owned by the largest tenant. For example, if your largest tenant has 30M active series and `-compactor.compaction-concurrency=1` we would recommend having a disk with at least 450GB available.
75
76 ## Compactor HTTP endpoints
77
78 - `GET /compactor/ring`<br />
79 Displays the status of the compactors ring, including the tokens owned by each compactor and an option to remove (forget) instances from the ring.
80
81 ## Compactor configuration
82
83 This section described the compactor configuration. For the general Cortex configuration and references to common config blocks, please refer to the [configuration documentation](../configuration/config-file-reference.md).
84
85 {{ .CompactorConfigBlock }}