github.com/grafana/pyroscope@v1.18.0/docs/sources/configure-server/configure-shuffle-sharding/index.md

---
description: Learn how to configure shuffle sharding.
menuTitle: Shuffle sharding
title: Configure Grafana Pyroscope shuffle sharding
weight: 800
---

# Configure Grafana Pyroscope shuffle sharding

Grafana Pyroscope leverages sharding to horizontally scale both single- and multi-tenant clusters beyond the capacity of a single node.

## Background

Grafana Pyroscope uses a sharding strategy that distributes the workload across a subset of the instances that run a given component.
For example, on the write path, each tenant's series are sharded across a subset of the ingesters.
The size of this subset, which is the number of instances, is configured by the shard size parameter, which defaults to `0`.
This default value means that each tenant uses all available instances, so that resources such as CPU and memory are balanced fairly and used as fully as possible across the cluster.

In a multi-tenant cluster, this default value of `0` has the following downsides:

- An outage affects all tenants.
- A misbehaving tenant, for example, a tenant that causes an out-of-memory error, can negatively affect all other tenants.

Configuring a shard size value higher than `0` enables shuffle sharding. The goal of shuffle sharding is to reduce the blast radius of an outage and better isolate tenants.

## About shuffle sharding

Shuffle sharding is a technique that isolates different tenants' workloads and gives each tenant a single-tenant experience, even if they're running in a shared cluster.
For more information about how AWS describes shuffle sharding, refer to [What is shuffle sharding?](https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/).

Shuffle sharding assigns each tenant a shard that is composed of a subset of the Grafana Pyroscope instances.
This technique minimizes the number of overlapping instances between two tenants.
Shuffle sharding provides the following benefits:

- An outage on some Grafana Pyroscope cluster instances or nodes only affects a subset of tenants.
- A misbehaving tenant only affects the instances in its shard.
  Assuming that each tenant's shard is relatively small compared to the total number of instances in the cluster, it's likely that any other tenant runs on different instances, or shares only a few instances with the affected shard.

Shuffle sharding doesn't require more resources, but it can result in unbalanced load across instances.

### Low overlapping instances probability

For example, in a Grafana Pyroscope cluster that runs 50 ingesters and assigns each tenant 4 of them, shuffling instances per tenant yields 230,300 possible shards.

Randomly picking two tenants yields the following probabilities:

- 71% chance that they do not share any instance
- 26% chance that they share only 1 instance
- 2.7% chance that they share 2 instances
- 0.08% chance that they share 3 instances
- 0.0004% chance that their instances fully overlap

![Shuffle sharding probability](shuffle-sharding-probability.png)

[//]: # "Diagram source of shuffle-sharding probability at https://docs.google.com/spreadsheets/d/1FXbiWTXi6bdERtamH-IfmpgFq1fNL4GP_KX_yJvbRi4/edit"
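
These percentages follow from a hypergeometric distribution: fix one tenant's shard, and treat the other tenant's shard as a uniformly random 4-instance subset of the 50 ingesters. The following Python sketch reproduces the numbers under that assumption; `overlap_probabilities` is a hypothetical helper for illustration, not part of Pyroscope.

```python
from math import comb


def overlap_probabilities(instances: int, shard_size: int) -> dict[int, float]:
    """Probability that two random tenant shards share exactly k instances.

    Assumes each shard is a uniformly random subset of `shard_size` instances.
    Fixing the first shard, the overlap with the second one follows a
    hypergeometric distribution.
    """
    total = comb(instances, shard_size)  # number of possible shards
    return {
        k: comb(shard_size, k) * comb(instances - shard_size, shard_size - k) / total
        for k in range(shard_size + 1)
    }


print(comb(50, 4))  # 230300 possible shards
for shared, p in overlap_probabilities(50, 4).items():
    print(f"{shared} shared instances: {p:.4%}")
```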

## Grafana Pyroscope shuffle sharding

Grafana Pyroscope supports shuffle sharding in the following components:

- [Ingesters](#ingesters-shuffle-sharding)
- [Query-frontend / Query-scheduler](#query-frontend-and-query-scheduler-shuffle-sharding)
- [Store-gateway](#store-gateway-shuffle-sharding)
- [Compactor](#compactor-shuffle-sharding)

When you run Grafana Pyroscope with the default configuration, shuffle sharding is disabled, and you must explicitly enable it by increasing the shard size either globally or for a given tenant.

> **Note:** If the shard size value is equal to or higher than the number of available instances, for example where `-distributor.ingestion-tenant-shard-size` is higher than the number of ingesters, then shuffle sharding is disabled and all instances are used again.

### Guaranteed properties

The Grafana Pyroscope shuffle sharding implementation guarantees the following properties:

- **Stability:** Given a consistent state of the hash ring, the shuffle sharding algorithm always selects the same instances for a given tenant, even across different machines.
- **Consistency:** Adding or removing an instance from the hash ring changes at most one instance in each tenant's shard.
- **Shuffling:** Probabilistically, and for a large enough cluster, shuffle sharding ensures that every tenant receives a different set of instances and that any two tenants share only a few instances, which improves failure isolation.
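
The following Python sketch illustrates how a ring-based selection can provide these properties. It is a simplified model, not Pyroscope's implementation, and `build_ring`, `tenant_seed`, and `shuffle_shard` are hypothetical names. It assumes that each instance owns random tokens on a 32-bit ring, that the tenant ID seeds a pseudo-random generator (which gives stability), and that each shard slot is filled by walking clockwise from a random token to the first instance not already in the shard.

```python
import bisect
import hashlib
import random


def build_ring(instances, tokens_per_instance=128):
    """Return a hash ring as a sorted list of (token, instance) pairs.

    Tokens are derived from the instance name so the sketch is reproducible.
    """
    ring = []
    for name in instances:
        rng = random.Random(name)
        ring.extend((rng.getrandbits(32), name) for _ in range(tokens_per_instance))
    ring.sort()
    return ring


def tenant_seed(tenant_id):
    """Stable 64-bit seed derived from the tenant ID."""
    return int.from_bytes(hashlib.md5(tenant_id.encode()).digest()[:8], "big")


def shuffle_shard(ring, tenant_id, shard_size):
    """Select `shard_size` distinct instances for `tenant_id` from the ring."""
    instances = sorted({name for _, name in ring})
    if shard_size <= 0 or shard_size >= len(instances):
        return instances  # shuffle sharding disabled: use all instances
    tokens = [token for token, _ in ring]
    rng = random.Random(tenant_seed(tenant_id))  # same tenant, same sequence
    shard = []
    while len(shard) < shard_size:
        start = bisect.bisect_left(tokens, rng.getrandbits(32))
        # Walk clockwise (wrapping around) to the first instance not yet picked.
        for offset in range(len(ring)):
            _, name = ring[(start + offset) % len(ring)]
            if name not in shard:
                shard.append(name)
                break
    return sorted(shard)


ring = build_ring([f"ingester-{i}" for i in range(50)])
print(shuffle_shard(ring, "tenant-a", 4))
```

In this sketch, removing an instance that isn't in a tenant's shard leaves the shard unchanged, and removing one that is replaces only that instance, which mirrors the consistency property.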

### Ingesters shuffle sharding

By default, the Grafana Pyroscope distributor divides the received series among all running ingesters.

When you enable ingester shuffle sharding, the distributor on the write path divides each tenant's series among `-distributor.ingestion-tenant-shard-size` ingesters, while on the read path, the querier queries only the subset of ingesters that hold the series for that tenant.

The shard size can be overridden on a per-tenant basis by setting `ingestion_tenant_shard_size` in the overrides section of the runtime configuration.

#### Ingesters write path

To enable shuffle sharding for ingesters on the write path, configure the following flag (or its respective YAML configuration option) on the distributor, ingester, and ruler:

- `-distributor.ingestion-tenant-shard-size=<size>`: Set `<size>` to the number of ingesters that each tenant's series should be sharded to. If `<size>` is `0` or is greater than the number of available ingesters in the Grafana Pyroscope cluster, the tenant's series are sharded across all ingesters.

#### Ingesters read path

Assuming that you have enabled shuffle sharding for the write path, to enable shuffle sharding for ingesters on the read path, configure the following flag (or its respective YAML configuration option) on the querier:

- `-distributor.ingestion-tenant-shard-size=<size>`

The following flag already defaults to a value that enables shuffle sharding for ingesters on the read path. Change it only if you need to override that default:

- `-querier.shuffle-sharding-ingesters-enabled=true`: Explicitly enables or disables shuffle sharding for ingesters on the read path.
  If shuffle sharding is enabled, queriers fetch in-memory series from the minimum set of required ingesters, selecting only the ingesters that might have received series since the current time minus `-blocks-storage.tsdb.retention-period`. Otherwise, the request is sent to all ingesters.

If you enable ingesters shuffle sharding only for the write path, queriers on the read path always query all ingesters instead of querying only the subset of ingesters that belong to the tenant's shard.
Keeping ingesters shuffle sharding enabled only on the write path does not lead to incorrect query results, but might increase query latency.

#### Rollout strategy

If you're running a Grafana Pyroscope cluster with shuffle sharding disabled and you want to enable it for the ingesters, use the following rollout strategy so that queries don't miss any series currently held in the ingesters:

1. Explicitly disable ingesters shuffle sharding on the read path via `-querier.shuffle-sharding-ingesters-enabled=false`, because it's enabled by default.
1. Enable ingesters shuffle sharding on the write path.
1. Enable ingesters shuffle sharding on the read path via `-querier.shuffle-sharding-ingesters-enabled=true`.

#### Limitation: Decreasing the tenant shard size

The current shuffle sharding implementation in Grafana Pyroscope has a limitation that prevents you from safely decreasing the tenant shard size when you enable ingesters shuffle sharding on the read path.

If a tenant's shard decreases in size, there is currently no way for the queriers to know how large the tenant shard was previously, and as a result, they might miss ingesters that hold data for that tenant.
The lookback based on `-blocks-storage.tsdb.retention-period`, which selects the ingesters that might have received series since the current time minus that period, doesn't correctly find the tenant's shard after the tenant shard size is decreased.
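
To see why, consider a simplified model in which a tenant's shard is the first N entries of a fixed, tenant-seeded ordering of the ingesters. This is a hypothetical stand-in for the ring-based selection, and `tenant_order` and `shard` are illustrative names. In this model, shrinking the shard size keeps only a prefix of the old shard, so ingesters that received recent writes under the larger size fall outside the new shard even though the ring itself hasn't changed:

```python
import hashlib
import random


def tenant_order(tenant_id, instances):
    """Deterministic, tenant-specific ordering of instances (simplified model)."""
    seed = int.from_bytes(hashlib.md5(tenant_id.encode()).digest()[:8], "big")
    order = sorted(instances)
    random.Random(seed).shuffle(order)
    return order


def shard(tenant_id, instances, size):
    """The tenant's shard: the first `size` instances of its ordering."""
    return set(tenant_order(tenant_id, instances)[:size])


ingesters = [f"ingester-{i}" for i in range(20)]
written_to = shard("tenant-a", ingesters, 6)  # shard size when series were written
queried = shard("tenant-a", ingesters, 3)  # shard size after the decrease
missed = written_to - queried  # hold recent series but are no longer queried
print(sorted(missed))
```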

Although decreasing the tenant shard size is not supported, consider the following workaround:

1. Disable shuffle sharding on the read path via `-querier.shuffle-sharding-ingesters-enabled=false`.
1. Decrease the configured tenant shard size.
1. Wait for at least the amount of time specified via `-blocks-storage.tsdb.retention-period`.
1. Re-enable shuffle sharding on the read path via `-querier.shuffle-sharding-ingesters-enabled=true`.

### Query-frontend and query-scheduler shuffle sharding

By default, all Grafana Pyroscope queriers can execute queries for any tenant.

When you enable shuffle sharding by setting `-query-frontend.max-queriers-per-tenant` (or its respective YAML configuration option) to a value higher than `0` and lower than the number of available queriers, only the specified number of queriers are eligible to execute queries for a given tenant.

This distribution happens in the query-frontend, or in the query-scheduler if you use one.
When you use the query-scheduler, you must set the `-query-frontend.max-queriers-per-tenant` option on the query-scheduler component.
When you don't use the query-frontend (with or without the query-scheduler), this option is not available.

You can override the maximum number of queriers on a per-tenant basis by setting `max_queriers_per_tenant` in the overrides section of the runtime configuration.
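
The following is a minimal Python sketch of per-tenant querier selection. It assumes that the eligible queriers are chosen by shuffling the sorted querier IDs with a tenant-seeded generator and keeping the first `max_queriers_per_tenant` entries; the function names and the shape of the `overrides` dictionary are hypothetical, and only the `max_queriers_per_tenant` key comes from the runtime configuration described above.

```python
import hashlib
import random


def tenant_seed(tenant_id: str) -> int:
    """Stable 64-bit seed derived from the tenant ID."""
    return int.from_bytes(hashlib.md5(tenant_id.encode()).digest()[:8], "big")


def queriers_for_tenant(tenant_id, all_queriers, default_max, overrides):
    """Return the queriers eligible to run queries for tenant_id.

    `overrides` mimics the runtime configuration overrides, for example
    {"tenant-b": {"max_queriers_per_tenant": 5}}.
    """
    max_queriers = overrides.get(tenant_id, {}).get("max_queriers_per_tenant", default_max)
    queriers = sorted(all_queriers)  # same input order on every query-frontend
    if max_queriers <= 0 or max_queriers >= len(queriers):
        return queriers  # shuffle sharding disabled: any querier can serve the tenant
    random.Random(tenant_seed(tenant_id)).shuffle(queriers)
    return sorted(queriers[:max_queriers])


queriers = [f"querier-{i}" for i in range(10)]
print(queriers_for_tenant("tenant-a", queriers, 3, {}))
print(queriers_for_tenant("tenant-b", queriers, 3, {"tenant-b": {"max_queriers_per_tenant": 5}}))
```

Sorting before shuffling makes the result independent of the order in which queriers connected, so every query-frontend or query-scheduler replica computes the same set for a tenant.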

#### The impact of a "query of death"

If a tenant sends a "query of death" that causes a querier to crash, the crashed querier becomes disconnected from the query-frontend or query-scheduler, and another running querier is immediately assigned to the tenant's shard.

If the tenant repeatedly sends this query, the new querier assigned to the tenant's shard crashes as well, and yet another querier is assigned to the shard.
This cascading failure can cause all running queriers to crash, one by one, which invalidates the assumption that shuffle sharding contains the blast radius of queries of death.

To mitigate this negative impact, there are experimental configuration options that enable you to configure a time delay between when a querier disconnects due to a crash and when the crashed querier is replaced by a healthy querier.
When you configure a time delay, a tenant that repeatedly sends a "query of death" runs with reduced querier capacity after a querier has crashed.
The tenant could end up having no available queriers, but this configuration reduces the likelihood that the crash impacts other tenants.

A delay of 1 minute might be a reasonable trade-off:

- Query-frontend: `-query-frontend.querier-forget-delay=1m`
- Query-scheduler: `-query-scheduler.querier-forget-delay=1m`
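
The effect of the forget delay can be modeled with the following Python sketch. It is a hypothetical model, not the query-frontend or query-scheduler code, and `QuerierRegistry` and its methods are illustrative names. A disconnected querier keeps its place in the tenant's shard until the delay expires, so the tenant runs with fewer serving queriers instead of immediately pulling in a healthy querier that the next "query of death" could crash.

```python
import hashlib
import random


def tenant_seed(tenant_id):
    """Stable 64-bit seed derived from the tenant ID."""
    return int.from_bytes(hashlib.md5(tenant_id.encode()).digest()[:8], "big")


class QuerierRegistry:
    """Hypothetical model of querier tracking with a forget delay."""

    def __init__(self, forget_delay_s):
        self.forget_delay_s = forget_delay_s
        self.connected = set()
        self.disconnected_at = {}  # querier ID -> disconnect time in seconds

    def connect(self, querier_id):
        self.connected.add(querier_id)
        self.disconnected_at.pop(querier_id, None)

    def disconnect(self, querier_id, now):
        self.connected.discard(querier_id)
        self.disconnected_at[querier_id] = now

    def _known_queriers(self, now):
        # Forget disconnected queriers only once the delay has elapsed.
        for querier_id, since in list(self.disconnected_at.items()):
            if now - since >= self.forget_delay_s:
                del self.disconnected_at[querier_id]
        return sorted(self.connected | set(self.disconnected_at))

    def serving_queriers(self, tenant_id, max_queriers, now):
        """Queriers in the tenant's shard that can currently run its queries."""
        known = self._known_queriers(now)
        if 0 < max_queriers < len(known):
            random.Random(tenant_seed(tenant_id)).shuffle(known)
            known = known[:max_queriers]
        return sorted(q for q in known if q in self.connected)


registry = QuerierRegistry(forget_delay_s=60)  # models a 1m forget delay
for i in range(10):
    registry.connect(f"querier-{i}")
before = registry.serving_queriers("tenant-a", 3, now=0)
registry.disconnect(before[0], now=0)  # a query of death crashes one querier
during = registry.serving_queriers("tenant-a", 3, now=30)  # no replacement yet
after = registry.serving_queriers("tenant-a", 3, now=61)  # delay expired
print(len(during), len(after))  # prints: 2 3
```

With `forget_delay_s=0`, the crashed querier is forgotten immediately and a replacement joins the tenant's shard at once, which is the cascading behavior described above.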

### Store-gateway shuffle sharding

By default, a tenant's blocks are divided among all Grafana Pyroscope store-gateways.

When you enable store-gateway shuffle sharding by setting `-store-gateway.tenant-shard-size` (or its respective YAML configuration option) to a value higher than `0` and lower than the number of available store-gateways, only the specified number of store-gateways are eligible to load and query blocks for a given tenant.
You must set this flag on both the store-gateway and the querier.

You can override the store-gateway shard size on a per-tenant basis by setting `store_gateway_tenant_shard_size` in the overrides section of the runtime configuration.

For more information about the store-gateway, refer to [store-gateway](../../reference-pyroscope-architecture/components/store-gateway/).
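
As a hedged illustration, the following Python sketch models two steps: pick a tenant's store-gateway shard, then assign each block to one member of that shard by hashing the block ID. The real store-gateway relies on the hash ring and replication, so treat the modulo-based assignment and the helper names (`stable_hash`, `tenant_store_gateways`, `block_owner`) as simplifying assumptions.

```python
import hashlib
import random


def stable_hash(value: str) -> int:
    """Stable 64-bit hash, independent of Python's per-process hash seed."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def tenant_store_gateways(tenant_id, gateways, shard_size):
    """Store-gateways eligible to load blocks for tenant_id."""
    gateways = sorted(gateways)
    if shard_size <= 0 or shard_size >= len(gateways):
        return gateways  # shuffle sharding disabled: every store-gateway is eligible
    random.Random(stable_hash(tenant_id)).shuffle(gateways)
    return sorted(gateways[:shard_size])


def block_owner(tenant_id, block_id, gateways, shard_size):
    """Pick the store-gateway that loads block_id (simplified: modulo hashing)."""
    shard = tenant_store_gateways(tenant_id, gateways, shard_size)
    return shard[stable_hash(block_id) % len(shard)]


gateways = [f"store-gateway-{i}" for i in range(12)]
owners = {block_owner("tenant-a", f"block-{n}", gateways, 3) for n in range(100)}
print(sorted(owners))  # only members of tenant-a's 3-instance shard appear
```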

### Compactor shuffle sharding

By default, tenant blocks can be compacted by any Grafana Pyroscope compactor.

When you enable compactor shuffle sharding by setting `-compactor.compactor-tenant-shard-size` (or its respective YAML configuration option) to a value higher than `0` and lower than the number of available compactors, only the specified number of compactors are eligible to compact blocks for a given tenant.

You can override the compactor shard size on a per-tenant basis by setting `compactor_tenant_shard_size` in the overrides section of the runtime configuration.

### Shuffle sharding impact to the KV store

Shuffle sharding does not add overhead to the KV store.
Shards are computed client-side and are not stored in the ring.
KV store sizing depends primarily on the number of replicas of any component that uses the ring, for example, ingesters, and the number of tokens per replica.

However, some components cache each tenant's shard in memory on the client side, which might slightly increase their memory footprint. This increase mostly affects the distributor.
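
The following Python sketch shows one way a client could memoize shards, assuming a ring version number that changes whenever the ring changes. `ShardCache`, the ring version, and the stand-in `compute_shard` function are hypothetical; the sketch only illustrates why caching adds a small per-tenant memory cost on the client rather than any load on the KV store.

```python
class ShardCache:
    """Hypothetical client-side cache of tenant shards.

    Entries are keyed by (tenant ID, shard size) and dropped whenever the
    ring changes, so memory grows with the number of active tenants.
    """

    def __init__(self, compute_shard):
        self._compute_shard = compute_shard  # (instances, tenant_id, size) -> shard
        self._ring_version = None
        self._entries = {}

    def get(self, ring_version, instances, tenant_id, size):
        if ring_version != self._ring_version:
            self._entries.clear()  # ring changed: cached shards may be stale
            self._ring_version = ring_version
        key = (tenant_id, size)
        if key not in self._entries:
            self._entries[key] = self._compute_shard(instances, tenant_id, size)
        return self._entries[key]


computed = []


def compute_shard(instances, tenant_id, size):
    computed.append(tenant_id)  # record cache misses
    return sorted(instances)[:size]  # stand-in for a real shuffle-shard function


cache = ShardCache(compute_shard)
ring = [f"ingester-{i}" for i in range(10)]
cache.get(1, ring, "tenant-a", 3)
cache.get(1, ring, "tenant-a", 3)  # served from memory
cache.get(2, ring, "tenant-a", 3)  # ring version changed: recomputed
print(len(computed))  # prints 2
```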