---
title: Shuffle sharding
menuTitle: Shuffle sharding
description: Shuffle sharding can isolate a tenant workload from other tenant workloads, providing a better sharing of resources.
weight: 100
---

# Shuffle sharding

Shuffle sharding is a resource-management technique that isolates tenant workloads from one another, giving each tenant more of a single-tenant experience when running in a shared cluster.
AWS explains the technique in the article [Workload isolation using shuffle-sharding](https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/).
A reference implementation is available in the [Route53 Infima library](https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java).

## The issues that shuffle sharding mitigates

Shuffle sharding can be configured for the query path.

The query path is sharded by default, and the default does not use shuffle sharding.
Each tenant’s query is sharded across all queriers, so the workload uses all querier instances.

In a multi-tenant cluster, sharding across all instances of a component may exhibit these issues:

- Any outage of a component instance affects all tenants
- A misbehaving tenant affects all other tenants

An individual query may create issues for all tenants.
A single tenant or a group of tenants may issue an expensive query:
one that causes a querier component to hit an out-of-memory error,
or one that causes a querier component to crash.
Once the error occurs,
the tenant or tenants issuing the error-causing query are reassigned
to other running queriers,
up to the limit imposed by the `max_queriers_per_tenant` configuration.
This, in turn, may affect the queriers that take over the work, along with the other tenants they serve.

## How shuffle sharding works

The idea of shuffle sharding is to assign each tenant to a shard composed of a subset of the Loki queriers, aiming to minimize the overlapping instances between distinct tenants.

A misbehaving tenant will affect only its shard's queriers. Due to the low overlap of queriers among tenants, only a small subset of tenants will be affected by the misbehaving tenant.
Shuffle sharding requires no more resources than the default sharding strategy.

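The shard assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not Loki's implementation: the function name `select_shard`, the SHA-256 seeding, and the `querier-N` names are assumptions made for the example. It shows the two properties that matter: each tenant gets a stable subset, and different tenants get mostly different subsets.

```python
import hashlib
import random


def select_shard(tenant_id, queriers, shard_size):
    """Pick a stable pseudo-random subset of queriers for one tenant (illustrative only)."""
    # A size of 0, or a size that covers every querier, means no shuffle sharding:
    # the tenant simply uses all queriers.
    if shard_size <= 0 or shard_size >= len(queriers):
        return sorted(queriers)
    # Seed a private RNG from the tenant ID so the same tenant always gets the same shard.
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(queriers), shard_size))


queriers = [f"querier-{i}" for i in range(50)]
shard_a = select_shard("tenant-a", queriers, 4)
shard_b = select_shard("tenant-b", queriers, 4)
# Print both shards and the number of queriers they have in common.
print(shard_a, shard_b, len(set(shard_a) & set(shard_b)))
```
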
Shuffle sharding does not fix all issues.
If a tenant repeatedly sends a problematic query, the crashed querier
will be disconnected from the query-frontend, and a new querier
will be immediately assigned to the tenant’s shard.
This invalidates the positive effects of shuffle sharding.
In this case, it helps to configure a delay between the moment a querier disconnects because of a crash
and the moment the crashed querier is actually removed from the tenant’s shard
and replaced with another healthy querier.
A delay of 1 minute may be a reasonable value.
Set it in the query-frontend with the configuration parameter
`-query-frontend.querier-forget-delay=1m`, and in the query-scheduler with the configuration parameter
`-query-scheduler.querier-forget-delay=1m`.

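The effect of a forget delay can be illustrated with a small simulation. This is a sketch under stated assumptions, not how the query-frontend or query-scheduler is implemented: the class `TenantShard`, its methods, and the explicit `now` timestamps (in seconds) are all hypothetical. It shows only that a disconnected querier keeps its slot until the delay has elapsed, so a crash loop does not immediately expose further queriers to the problematic tenant.

```python
class TenantShard:
    """Tracks the queriers serving one tenant, with a forget delay for disconnected ones."""

    def __init__(self, members, spares, forget_delay_s):
        self.members = list(members)   # queriers currently in the tenant's shard
        self.spares = list(spares)     # healthy queriers outside the shard
        self.forget_delay_s = forget_delay_s
        self.disconnected_at = {}      # querier -> time at which it disconnected

    def on_disconnect(self, querier, now):
        if querier in self.members:
            self.disconnected_at[querier] = now

    def on_reconnect(self, querier):
        # A querier that comes back within the delay keeps its place in the shard.
        self.disconnected_at.pop(querier, None)

    def tick(self, now):
        # Replace only queriers that have been gone for at least the forget delay.
        for querier, since in list(self.disconnected_at.items()):
            if now - since >= self.forget_delay_s and self.spares:
                self.members[self.members.index(querier)] = self.spares.pop(0)
                del self.disconnected_at[querier]


shard = TenantShard(["q1", "q2"], ["q3", "q4"], forget_delay_s=60)
shard.on_disconnect("q1", now=0)
shard.tick(now=30)   # within the delay: q1 keeps its slot
shard.tick(now=60)   # delay elapsed: q1 is replaced by a spare
print(shard.members)
```

With a delay of 0 the replacement would happen on the first `tick`, which corresponds to the immediate reassignment described above.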
### Low probability of overlapping instances

If an example Loki cluster runs 50 queriers and assigns each tenant 4 out of 50 queriers, shuffling instances between each tenant, there are about 230K possible combinations.

Statistically, when randomly picking two distinct tenants, there is:

- a 71% chance that they will not share any instance
- a 26% chance that they will share only 1 instance
- a 2.7% chance that they will share 2 instances
- a 0.08% chance that they will share 3 instances
- only a 0.0004% chance that their instances will fully overlap

![overlapping instances probability](../shuffle-sharding-probability.png)

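The percentages above can be reproduced with a short calculation. The sketch assumes each tenant's shard is an independent, uniformly random choice of 4 out of 50 queriers, which makes the overlap count follow a hypergeometric distribution; the function name `overlap_probability` is made up for this example.

```python
from math import comb


def overlap_probability(total, shard_size, shared):
    """Probability that two independently chosen shards share exactly `shared` queriers."""
    # Choose `shared` queriers from the first tenant's shard and the remaining
    # ones from the queriers outside it, then divide by the number of possible shards.
    return (
        comb(shard_size, shared)
        * comb(total - shard_size, shard_size - shared)
        / comb(total, shard_size)
    )


print(comb(50, 4))  # 230300 possible shards of 4 queriers out of 50
for k in range(5):
    print(k, f"{overlap_probability(50, 4, k):.4%}")
```
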
## Configuration

Enable shuffle sharding by setting `-frontend.max-queriers-per-tenant` to a value higher than 0 and lower than the number of available queriers.
The value of the per-tenant configuration
`max_queriers_per_tenant` sets the quantity of allocated queriers.
This option is only available when using the query-frontend, with or without a scheduler.

The per-tenant configuration parameter
`max_query_parallelism` describes how many subqueries, after query splitting and query sharding, can be scheduled to run at the same time for each request of any tenant.

The configuration parameter
`querier.concurrency` controls the quantity of worker threads (goroutines) per single querier.

The maximum number of queriers can be overridden on a per-tenant basis in the limits overrides configuration by `max_queriers_per_tenant`.

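One way to reason about how these three settings interact is a back-of-envelope calculation. This is an assumption made for illustration, not a sizing rule stated by Loki: it treats the worker slots reachable by a tenant as the number of queriers in its shard multiplied by the per-querier concurrency, and compares that with `max_query_parallelism`. The function name `tenant_worker_slots` and the example values are hypothetical, and because queriers are shared with other tenants the result is an upper bound.

```python
def tenant_worker_slots(max_queriers_per_tenant, querier_concurrency, total_queriers):
    """Upper bound on the worker slots one tenant can reach (illustrative estimate)."""
    # A value of 0, or one not lower than the querier count, leaves shuffle
    # sharding off, so the tenant can reach every querier.
    if max_queriers_per_tenant <= 0 or max_queriers_per_tenant >= total_queriers:
        reachable = total_queriers
    else:
        reachable = max_queriers_per_tenant
    return reachable * querier_concurrency


slots = tenant_worker_slots(max_queriers_per_tenant=4, querier_concurrency=8, total_queriers=50)
max_query_parallelism = 32  # hypothetical per-tenant setting
# If max_query_parallelism exceeds the slots, the extra subqueries would have to wait in the queue.
print(slots, max_query_parallelism <= slots)
```
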
## Shuffle sharding metrics

These metrics reveal information relevant to shuffle sharding:

-  the overall query-scheduler queue duration, `cortex_query_scheduler_queue_duration_seconds_*`

-  the query-scheduler queue length per tenant, `cortex_query_scheduler_queue_length`

-  the query-scheduler queue duration per tenant can be found with this query:
    ```
    max_over_time({cluster="$cluster",container="query-frontend", namespace="$namespace"} |= "metrics.go" | logfmt | unwrap duration(queue_time) | __error__="" [5m]) by (org_id)
    ```

Too many spikes in any of these metrics may imply:

-  A particular tenant is trying to use more query resources than it was allocated.
-  That tenant may need an increase in the value of `max_queriers_per_tenant`.
-  Loki instances may be underprovisioned.

A useful query checks how many queriers are being used by each tenant:

```
count by (org_id) (sum by (org_id, pod) (count_over_time({job="$namespace/querier", cluster="$cluster"} |= "metrics.go" | logfmt [$__interval])))
```