github.com/grafana/pyroscope@v1.18.0/docs/sources/reference-pyroscope-architecture/components/compactor/index.md

---
description:
  The compactor increases query performance and reduces long-term storage
  usage.
menuTitle: Compactor
title: Grafana Pyroscope compactor
weight: 10
---

# Grafana Pyroscope compactor

The compactor increases query performance and reduces long-term storage usage by combining blocks.

The compactor is the component responsible for:

- Compacting multiple blocks of a given tenant into a single, optimized larger block. This deduplicates chunks and reduces the size of the index, resulting in reduced storage costs. Querying fewer blocks is faster, so it also increases query speed.
- Keeping the per-tenant bucket index updated. The [bucket index](../../bucket-index/) is used by [queriers](../querier/) and [store-gateways](../store-gateway/) to discover both new blocks and deleted blocks in the storage.

The compactor is stateless.

## How compaction works

Compaction occurs on a per-tenant basis.

The compactor runs at regular, configurable intervals.

**Vertical compaction** merges all the blocks of a tenant that ingesters uploaded for the same time range (a 1-hour range by default) into a single block. It also deduplicates samples that were originally written to multiple blocks as a result of replication. Vertical compaction reduces the number of blocks for a single time range from the number of ingesters down to one block per tenant.

**Horizontal compaction** triggers after a vertical compaction. It compacts several blocks with adjacent time ranges into a single larger block. The total size of the associated block chunks does not change after horizontal compaction, but horizontal compaction may significantly reduce the size of the index and the index-header that store-gateways keep in memory.

![Compactor - horizontal and vertical compaction](compactor-horizontal-and-vertical-compaction.png)

## Scaling

Compaction can be tuned for clusters with large tenants. You can configure both vertical and horizontal scaling of the compactor, which compacts on a per-tenant basis.

- **Vertical scaling**<br />
  The setting `-compactor.compaction-concurrency` configures the maximum number of concurrent compactions running in a single compactor instance. Each compaction uses one CPU core.
- **Horizontal scaling**<br />
  By default, any Grafana Pyroscope compactor can compact a tenant's blocks. When you enable compactor [shuffle sharding](../../../configure-server/configure-shuffle-sharding/) by setting `-compactor.compactor-tenant-shard-size` (or its respective YAML configuration option) to a value higher than `0` and lower than the number of available compactors, only the specified number of compactors are eligible to compact blocks for a given tenant.
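The following Go sketch shows the idea behind shuffle sharding: each tenant maps deterministically to a fixed-size subset of compactors. It is a simplification under stated assumptions. Pyroscope selects the subset through its hash ring, not by shuffling a sorted instance list. The function name `eligibleCompactors` and the FNV-seeded shuffle are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// eligibleCompactors is a hypothetical illustration of shuffle sharding: it
// deterministically picks shardSize compactors for a tenant. A shard size of 0,
// or one not lower than the number of instances, makes every compactor eligible.
func eligibleCompactors(tenant string, instances []string, shardSize int) []string {
	all := append([]string(nil), instances...)
	sort.Strings(all) // same input order on every compactor
	if shardSize <= 0 || shardSize >= len(all) {
		return all
	}
	// Seed a PRNG from the tenant ID so every compactor computes the same subset.
	h := fnv.New64a()
	h.Write([]byte(tenant))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))
	rng.Shuffle(len(all), func(i, j int) { all[i], all[j] = all[j], all[i] })
	subset := all[:shardSize]
	sort.Strings(subset)
	return subset
}

func main() {
	compactors := []string{"compactor-0", "compactor-1", "compactor-2", "compactor-3", "compactor-4"}
	fmt.Println("tenant-a (shard size 2):", eligibleCompactors("tenant-a", compactors, 2))
	fmt.Println("tenant-b (shard size 2):", eligibleCompactors("tenant-b", compactors, 2))
	fmt.Println("tenant-a (shard size 0):", eligibleCompactors("tenant-a", compactors, 0))
}
```

With a shard size of `0`, every compactor is eligible, which matches the default behavior. With a shard size of `2`, each tenant is always handled by the same two compactors.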

## Compaction algorithm

Pyroscope uses a compaction algorithm called split-and-merge.

By design, the split-and-merge algorithm overcomes time series database (TSDB) index limitations, and it avoids situations in which compacted blocks grow indefinitely for a very large tenant at any compaction stage.

This compaction strategy is a two-stage process: split and merge.
The default configuration disables the split stage.

In the split stage, which runs at the first level of compaction (for example, `1h`), the compactor divides all source blocks into _N_ groups (`-compactor.split-groups`). For each group, the compactor compacts the blocks, but instead of producing a single result block, it outputs _M_ blocks (`-compactor.split-and-merge-shards`), known as _split blocks_. Each split block contains only the subset of series belonging to one of the _M_ shards. At the end of the split stage, the compactor has produced _N \* M_ blocks, each with a reference to its shard in the block's `meta.json` file.

In the merge stage, the compactor merges the split blocks for each shard: it compacts all _N_ split blocks of a given shard into one. The merge reduces the number of blocks from _N \* M_ to _M_. For a given compaction time range, there is one compacted block for each of the _M_ shards.

![Compactor - split-and-merge compaction strategy](compactor-split-and-merge.png)

The merge then runs on the other configured compaction time ranges, for example `4h` and `8h`, compacting blocks that belong to the same shard.

This strategy is suitable for clusters with large tenants. The number of shards _M_ is configurable on a per-tenant basis using `-compactor.split-and-merge-shards`, and you can adjust it based on the number of series of each tenant. The more series a tenant has, the more you can increase the configured number of shards. Doing so improves compaction parallelization and keeps the size of each per-shard compacted block under control.

The number of split groups, _N_, can also be adjusted per tenant using the `-compactor.split-groups` option. Increasing this value produces more compaction jobs, each with fewer blocks, during the split stage. This allows multiple compactors to work on these jobs and finish the split stage faster. However, increasing this value also generates more intermediate blocks during the split stage, which are only reduced later in the merge stage.

If the configuration of `-compactor.split-and-merge-shards` changes during compaction, the change affects only the compaction of blocks that have not yet been split. Already split blocks use the original configuration when merged, because the original configuration is stored in the `meta.json` of each split block.

Splitting and merging can be horizontally scaled. Non-conflicting and non-overlapping jobs are executed in parallel.
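The two stages can be sketched in Go as shown below. This illustration rests on several assumptions and is not Pyroscope's code:

- A block is reduced to a list of series label strings.
- A series is assigned to a shard by an FNV-1a hash modulo _M_. The real shard function may differ.
- `SplitBlock`, `shardKey`, `shardOf`, `split`, and `merge` are hypothetical names.

Each split block records its shard index and shard count. This mirrors how the shard reference in `meta.json` keeps a later change to `-compactor.split-and-merge-shards` from affecting blocks that were already split.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// SplitBlock is a hypothetical split-stage output. ShardIndex and ShardCount
// play the role of the shard reference stored in the block's meta.json.
type SplitBlock struct {
	Group      int
	ShardIndex int
	ShardCount int
	Series     []string
}

// shardKey identifies a shard; including the count means split blocks made
// under an older shard configuration are never merged with newer ones.
type shardKey struct{ Index, Count int }

// shardOf assigns a series to one of m shards (assumed FNV-1a hash).
func shardOf(series string, m int) int {
	h := fnv.New64a()
	h.Write([]byte(series))
	return int(h.Sum64() % uint64(m))
}

// split turns each of the N groups into M split blocks: N*M blocks in total.
func split(groups [][]string, m int) []SplitBlock {
	var out []SplitBlock
	for g, series := range groups {
		blocks := make([]SplitBlock, m)
		for s := range blocks {
			blocks[s] = SplitBlock{Group: g, ShardIndex: s, ShardCount: m}
		}
		for _, ser := range series {
			i := shardOf(ser, m)
			blocks[i].Series = append(blocks[i].Series, ser)
		}
		out = append(out, blocks...)
	}
	return out
}

// merge compacts the N split blocks of each shard into one block per shard,
// deduplicating series that appeared in several groups.
func merge(blocks []SplitBlock) map[shardKey]map[string]bool {
	merged := make(map[shardKey]map[string]bool)
	for _, b := range blocks {
		k := shardKey{b.ShardIndex, b.ShardCount}
		if merged[k] == nil {
			merged[k] = make(map[string]bool)
		}
		for _, s := range b.Series {
			merged[k][s] = true
		}
	}
	return merged
}

func main() {
	// N = 3 groups of source blocks; some series appear in several groups.
	groups := [][]string{
		{`{service="api"}`, `{service="db"}`},
		{`{service="api"}`, `{service="web"}`},
		{`{service="db"}`, `{service="web"}`, `{service="worker"}`},
	}
	const m = 2 // M = 2 shards
	splitBlocks := split(groups, m)
	merged := merge(splitBlocks)
	fmt.Println("split blocks:", len(splitBlocks), "merged blocks:", len(merged))
}
```

With _N_ = 3 and _M_ = 2, the sketch prints `split blocks: 6 merged blocks: 2`, matching _N \* M_ and _M_.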

## Compactor sharding

The compactor shards compaction jobs, either from a single tenant or multiple tenants. The compaction of a single tenant can be split and processed by multiple compactor instances.

Whenever the pool of compactors grows or shrinks, tenants and jobs are resharded across the available compactor instances without any manual intervention.

Compactor sharding uses a [hash ring](../../hash-ring/). At startup, a compactor generates random tokens and registers itself to the compactor hash ring. While running, it scans the storage bucket at every interval defined by `-compactor.compaction-interval` to discover the list of tenants in storage. It then compacts blocks for each tenant whose hash matches the token ranges assigned to the instance itself within the hash ring.
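The ownership check can be sketched in Go as a lookup on a sorted token ring. This is a simplified model with two assumptions. First, a tenant is owned by the instance holding the first token greater than or equal to the tenant's hash, wrapping around the ring. Second, tenant IDs are hashed with FNV-1a. The real ring also handles replication, zones, and instance states. `ringToken`, `tenantHash`, and `owner` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ringToken is one token a compactor registered in the hash ring.
type ringToken struct {
	Token    uint32
	Instance string
}

// tenantHash maps a tenant ID onto the ring (FNV-1a is an assumption).
func tenantHash(tenant string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(tenant))
	return h.Sum32()
}

// owner returns the instance holding the first token >= hash, wrapping
// around to the lowest token. tokens must be sorted by Token.
func owner(tokens []ringToken, hash uint32) string {
	i := sort.Search(len(tokens), func(i int) bool { return tokens[i].Token >= hash })
	if i == len(tokens) {
		i = 0
	}
	return tokens[i].Instance
}

func main() {
	// In a real ring each compactor registers many random tokens at startup;
	// three fixed tokens keep this example readable.
	tokens := []ringToken{
		{Token: 3 << 30, Instance: "compactor-2"},
		{Token: 1 << 30, Instance: "compactor-0"},
		{Token: 2 << 30, Instance: "compactor-1"},
	}
	sort.Slice(tokens, func(i, j int) bool { return tokens[i].Token < tokens[j].Token })

	self := "compactor-1"
	for _, tenant := range []string{"tenant-a", "tenant-b", "tenant-c"} {
		o := owner(tokens, tenantHash(tenant))
		fmt.Printf("%s -> %s (compacted by %s: %v)\n", tenant, o, self, o == self)
	}
}
```

When an instance joins or leaves, only the tenants whose hashes fall in the affected token ranges change owner. This is why resharding needs no manual intervention.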

To configure the compactors' hash ring, refer to [configuring memberlist](../../../configure-server/configuring-memberlist/).

### Waiting for a stable hash ring at startup

During a cluster cold start, or when two or more compactor instances are added at the same time, each new compactor instance may start at a slightly different time. Each compactor then runs its first compaction based on a different state of the hash ring. This is not an error condition, but it may be inefficient, because multiple compactor instances may start compacting the same tenant at nearly the same time.

To mitigate the issue, compactors can be configured to wait for a stable hash ring at startup. A ring is considered stable if no instance is added to or removed from the hash ring for at least `-compactor.ring.wait-stability-min-duration`. The maximum time the compactor waits is controlled by the flag `-compactor.ring.wait-stability-max-duration` (or the respective YAML configuration option). Once the compactor has finished waiting, either because the ring stabilized or because the maximum wait time was reached, it starts up normally.

The default value of zero for `-compactor.ring.wait-stability-min-duration` disables waiting for ring stability.

## Compaction jobs order

The compactor lets you configure the order of compaction jobs with the `-compactor.compaction-jobs-order` flag (or its respective YAML configuration option). The configured ordering defines which compaction jobs are executed first. The following values of `-compactor.compaction-jobs-order` are supported:

- `smallest-range-oldest-blocks-first` (default)

  This ordering gives priority to the smallest range first and, within a range, to the oldest blocks.

  For example, with compaction ranges `1h, 4h, 8h`, the compactor compacts the 1h ranges first, and among them gives priority to the oldest blocks. Once all blocks in the 1h range have been compacted, it moves to the 4h range, and finally to the 8h range.

  All split jobs are moved to the front of the work queue, because finishing all split jobs in a given time range unblocks the merge jobs.

- `newest-blocks-first`

  This ordering gives priority to the most recent time ranges first, regardless of their compaction level.

  For example, with compaction ranges `1h, 4h, 8h`, the compactor compacts the most recent blocks first (up to the 8h range), and then moves to older blocks. This policy favors the most recent blocks, assuming they are queried the most frequently.
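Both orderings can be sketched in Go as sort comparators. This is an approximation under assumptions, not Pyroscope's code:

- Job times are plain integers (hours).
- `newest-blocks-first` is modeled as sorting by the end of each job's time range, with ties broken by the smaller range.
- `Job`, `smallestRangeOldestBlocksFirst`, and `newestBlocksFirst` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical compaction job covering [MinTime, MaxTime), in hours.
type Job struct {
	Name    string
	Split   bool
	MinTime int64
	MaxTime int64
}

// Range is the length of the time range the job compacts.
func (j Job) Range() int64 { return j.MaxTime - j.MinTime }

// smallestRangeOldestBlocksFirst: split jobs first, then smallest range, then oldest.
func smallestRangeOldestBlocksFirst(jobs []Job) {
	sort.SliceStable(jobs, func(i, j int) bool {
		a, b := jobs[i], jobs[j]
		if a.Split != b.Split {
			return a.Split // split jobs unblock merges, so they go first
		}
		if a.Range() != b.Range() {
			return a.Range() < b.Range()
		}
		return a.MinTime < b.MinTime
	})
}

// newestBlocksFirst: most recent time ranges first, regardless of level.
func newestBlocksFirst(jobs []Job) {
	sort.SliceStable(jobs, func(i, j int) bool {
		a, b := jobs[i], jobs[j]
		if a.MaxTime != b.MaxTime {
			return a.MaxTime > b.MaxTime
		}
		return a.Range() < b.Range()
	})
}

func main() {
	jobs := []Job{
		{"merge 8h [0,8)", false, 0, 8},
		{"merge 1h [9,10)", false, 9, 10},
		{"merge 4h [4,8)", false, 4, 8},
		{"merge 1h [2,3)", false, 2, 3},
		{"split 1h [10,11)", true, 10, 11},
	}
	a := append([]Job(nil), jobs...)
	smallestRangeOldestBlocksFirst(a)
	b := append([]Job(nil), jobs...)
	newestBlocksFirst(b)
	fmt.Println("smallest-range-oldest-blocks-first:")
	for _, j := range a {
		fmt.Println("  ", j.Name)
	}
	fmt.Println("newest-blocks-first:")
	for _, j := range b {
		fmt.Println("  ", j.Name)
	}
}
```

With these inputs, the first ordering runs the split job, then both 1h jobs (oldest first), then the 4h and 8h jobs. The second ordering runs the split job, then the 1h job at `[9,10)`, the 4h and 8h jobs ending at hour 8, and finally the old 1h job at `[2,3)`.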

## Blocks deletion

Following a successful compaction, the original blocks are deleted from the storage. Block deletion is not immediate; it follows a two-step process:

1. An original block is marked for deletion; this is a soft delete
1. Once a block has been marked for deletion for longer than the configurable `-compactor.deletion-delay`, the block is deleted from storage; this is a hard delete

The compactor is responsible for both marking blocks and for hard deletion.
Soft deletion is based on a small `deletion-mark.json` file stored within the block location in the bucket.

The soft delete mechanism gives queriers and store-gateways time to discover the new compacted blocks before the original blocks are deleted. If those original blocks were immediately hard deleted, some queries involving the compacted blocks could temporarily fail or return partial results.

## Compactor disk utilization

The compactor needs to download blocks from the bucket to the local disk, and it needs to store compacted blocks on the local disk before uploading them to the bucket. The largest tenants may need a lot of disk space.

Assuming `max_compaction_range_blocks_size` is the total block size for the largest tenant during the longest `-compactor.block-ranges` period, the expression that estimates the minimum disk space required is:

```
compactor.compaction-concurrency * max_compaction_range_blocks_size * 2
```
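As a worked example, a small Go helper can compute the estimate. The inputs are hypothetical: 4 concurrent compactions, and 50 GiB of blocks for the largest tenant over the longest block range. `minCompactorDisk` is a hypothetical name. The factor of 2 is read here as room for the downloaded source blocks plus the compacted output written before upload.

```go
package main

import "fmt"

// minCompactorDisk applies the estimate:
// compaction-concurrency * max_compaction_range_blocks_size * 2.
func minCompactorDisk(concurrency int, maxRangeBlocksBytes int64) int64 {
	return int64(concurrency) * maxRangeBlocksBytes * 2
}

func main() {
	const gib = int64(1) << 30
	// Hypothetical inputs: 4 concurrent compactions, 50 GiB per largest range.
	need := minCompactorDisk(4, 50*gib)
	fmt.Printf("minimum disk: %d GiB\n", need/gib) // prints "minimum disk: 400 GiB"
}
```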

## Compactor configuration

Refer to the [compactor](../../../configure-server/reference-configuration-parameters/#compactor)
block section and the [limits](../../../configure-server/reference-configuration-parameters/#limits) block section for details of compaction-related configuration.