---
title: "Cross-Tenant Query Federation"
linkTitle: "Cross-Tenant Query Federation"
weight: 1
slug: "cross-tenant-query-federation"
---

- Author: [Christian Simon](https://github.com/simonswine)
- Date: October 2020
- Status: Accepted

## Overview

This document describes how to allow a single query to cover data from more than one Cortex tenant.

## Reasoning

Adopting a tenancy model within an organization, with each tenant representing a department, has the disadvantage that queries cannot span multiple departments. This proposal aims to remove that limitation.

## Alternatives considered

### Aggregation in PromQL API clients

In theory, PromQL API clients could aggregate or correlate query results from multiple tenants themselves. For example, Grafana could be used with multiple data sources, and a cross-tenant query could be achieved using [Transformations][grafana_transformation].

This approach was not considered further because of the following disadvantages:

- Every PromQL API client needs to support aggregation from multiple sources.

- Queries written in PromQL cannot be used across tenants without extra work.

[grafana_transformation]: https://grafana.com/docs/grafana/latest/panels/transformations/

### Multi tenant aggregation in the query frontends

Another approach to multi-tenant query federation is to aggregate partial query results within the query frontend. For this, a query needs to be split into one sub-query per tenant, and the partial results then need to be reduced into the final result.

The [astmapper] package takes a similar approach, but it cannot parallelize all query types. Ideally, multi-tenant query federation should support the full PromQL language, and the algorithms required would differ for each query function and operator used. This approach was deemed a fairly complex way to achieve multi-tenant query federation.

[astmapper]: https://github.com/cortexproject/cortex/blob/f0c81bb59bf202db820403812e8dabcb64347bfd/pkg/querier/astmapper/parallel.go#L27

## Challenges

### Aggregate data without overlaps

#### Challenge

Series in different tenants might have exactly the same labels and hence could collide with each other.

#### Proposal

To always identify the tenant correctly, queries that use multiple tenants should inject a label named `__tenant_id__`, whose value is the tenant ID, into the results. If a series already has a label with that name, its value should be preserved in a label with the prefix `original_`.
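
The injection rule can be illustrated with a minimal Go sketch. It models a label set as a plain `map[string]string`, and `injectTenantLabel` is a hypothetical helper rather than the Cortex implementation. Only the label name `__tenant_id__` and the `original_` prefix are taken from this proposal.

```go
package main

const (
	tenantLabel    = "__tenant_id__" // label name fixed by this proposal
	originalPrefix = "original_"     // prefix used to retain an existing value
)

// injectTenantLabel returns a copy of series with the tenant label set to
// tenantID. If the series already carries a tenant label, its value is
// kept under the prefixed name. The input map is not modified.
// (Hypothetical helper; labels are modelled as a plain map for brevity.)
func injectTenantLabel(series map[string]string, tenantID string) map[string]string {
	out := make(map[string]string, len(series)+2)
	for name, value := range series {
		out[name] = value
	}
	if existing, ok := series[tenantLabel]; ok {
		// Not recursive: a value already stored under the prefixed
		// name is overwritten here.
		out[originalPrefix+tenantLabel] = existing
	}
	out[tenantLabel] = tenantID
	return out
}
```

Under these assumptions, a series `{__name__="up", __tenant_id__="x"}` read from tenant `team-a` would come back with `__tenant_id__="team-a"` and the previous value `x` stored under the prefixed label name.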

Label selectors on the tenant label should behave like selectors on any other label. This can be achieved by using them to select which of the tenants in a multi-tenant query are actually queried.
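
One way to picture this is to filter the list of tenants with the matchers on `__tenant_id__` before any per-tenant query is issued. The sketch below assumes simplified, hypothetical `Matcher` and `MatchType` types that loosely mirror the four PromQL matcher operators (`=`, `!=`, `=~`, `!~`). `selectTenants` is an illustrative name, not an existing Cortex function.

```go
package main

import "regexp"

const tenantLabel = "__tenant_id__"

// MatchType and Matcher are hypothetical, simplified stand-ins for
// PromQL label matchers.
type MatchType int

const (
	MatchEqual MatchType = iota
	MatchNotEqual
	MatchRegexp
	MatchNotRegexp
)

type Matcher struct {
	Type  MatchType
	Name  string
	Value string
}

// matches reports whether value satisfies the matcher. Regular
// expressions are fully anchored, as they are in PromQL.
func (m Matcher) matches(value string) (bool, error) {
	switch m.Type {
	case MatchEqual:
		return value == m.Value, nil
	case MatchNotEqual:
		return value != m.Value, nil
	default:
		re, err := regexp.Compile("^(?:" + m.Value + ")$")
		if err != nil {
			return false, err
		}
		return re.MatchString(value) == (m.Type == MatchRegexp), nil
	}
}

// selectTenants applies every matcher on the tenant label to tenantIDs
// and returns the tenants that remain, together with the matchers that
// still have to be passed on to the per-tenant queries.
func selectTenants(tenantIDs []string, matchers []Matcher) ([]string, []Matcher, error) {
	selected := append([]string(nil), tenantIDs...)
	var rest []Matcher
	for _, m := range matchers {
		if m.Name != tenantLabel {
			rest = append(rest, m)
			continue
		}
		kept := make([]string, 0, len(selected))
		for _, id := range selected {
			ok, err := m.matches(id)
			if err != nil {
				return nil, nil, err
			}
			if ok {
				kept = append(kept, id)
			}
		}
		selected = kept
	}
	return selected, rest, nil
}
```

With this sketch, a selector such as `up{__tenant_id__=~"team-a|team-b", job="api"}` issued against tenants `team-a`, `team-b` and `team-c` would query only the first two, and only the `job` matcher would be forwarded to them.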

### Exposure of feature to the user

#### Challenge

The tenant ID is currently read from the `X-Scope-OrgID` HTTP header. The values allowed for a tenant ID are subject to [documented limitations][cortex_tenant_id].

[cortex_tenant_id]: https://cortexmetrics.io/docs/guides/limitations/#tenant-id-naming

#### Proposal

On the query path, a user should be able to specify an `X-Scope-OrgID` header containing multiple tenant IDs. The multiple tenant IDs should then be propagated throughout the system until they reach the querier. The `Queryable` instance returned by the querier component is wrapped by a `mergeQueryable`, which aggregates the results from one `Queryable` per tenant, so that downstream components treat the query as a single-tenant query.
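
The wrapping can be sketched as follows. The `Queryable` interface here is a deliberately small, hypothetical one (the real Prometheus `storage.Queryable` is considerably richer), and `Series` and `staticQueryable` are illustrative types. The point is only that `mergeQueryable` satisfies the same interface as the single-tenant `Queryable` it wraps. The sketch queries tenants one after another and omits the `original_` label handling for brevity.

```go
package main

import "strings"

// Series is a hypothetical, simplified series: only its label set.
type Series map[string]string

// Queryable is a hypothetical query interface keyed by the raw value of
// the X-Scope-OrgID header.
type Queryable interface {
	Select(orgID string) ([]Series, error)
}

// staticQueryable is an in-memory, single-tenant Queryable used purely
// for illustration.
type staticQueryable map[string][]Series

func (q staticQueryable) Select(orgID string) ([]Series, error) {
	return q[orgID], nil
}

// mergeQueryable wraps a single-tenant Queryable and is itself a
// Queryable, so callers do not need to know about federation.
type mergeQueryable struct {
	upstream Queryable
}

// Select splits orgID into tenant IDs, queries the upstream once per
// tenant and merges the results, labelling each series with its tenant.
func (m mergeQueryable) Select(orgID string) ([]Series, error) {
	var merged []Series
	for _, tenantID := range strings.Split(orgID, "|") {
		result, err := m.upstream.Select(tenantID)
		if err != nil {
			return nil, err
		}
		for _, s := range result {
			labelled := make(Series, len(s)+1)
			for name, value := range s {
				labelled[name] = value
			}
			labelled["__tenant_id__"] = tenantID
			merged = append(merged, labelled)
		}
	}
	return merged, nil
}
```

Because both types satisfy `Queryable`, a caller can hold a `mergeQueryable{upstream: ...}` in a variable of type `Queryable` and call `Select("team-a|team-b")` exactly as it would for a single tenant.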

To allow such queries to be processed, we suggest adding an experimental configuration flag `-querier.tenant-federation.enabled`, which is switched off by default. Once enabled, the value of the `X-Scope-OrgID` header should be interpreted as a `|`-separated list of tenant IDs. Components that do not expect multiple tenant IDs (e.g. the ingress path) must signal an error if more than one is used.
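
A rough sketch of how the header value might be interpreted is shown below. The function and error names are hypothetical. Treating the whole header value as one opaque tenant ID while the flag is off is an assumption made for this example, not something the proposal specifies.

```go
package main

import (
	"errors"
	"strings"
)

const tenantIDSeparator = "|"

var (
	errNoTenantID     = errors.New("no tenant ID present")
	errTooManyTenants = errors.New("multiple tenant IDs present")
)

// tenantIDsFromHeader interprets the value of the X-Scope-OrgID header.
// With federation disabled the value is kept as one opaque tenant ID
// (an assumption of this sketch). With federation enabled it is split
// on the separator.
func tenantIDsFromHeader(value string, federationEnabled bool) ([]string, error) {
	if value == "" {
		return nil, errNoTenantID
	}
	if !federationEnabled {
		return []string{value}, nil
	}
	ids := strings.Split(value, tenantIDSeparator)
	for _, id := range ids {
		if id == "" {
			return nil, errNoTenantID
		}
	}
	return ids, nil
}

// singleTenantID is meant for components that cannot handle more than
// one tenant, such as the write path. It fails instead of silently
// picking one of several tenants.
func singleTenantID(ids []string) (string, error) {
	switch len(ids) {
	case 0:
		return "", errNoTenantID
	case 1:
		return ids[0], nil
	default:
		return "", errTooManyTenants
	}
}
```

In this sketch, the query path would call `tenantIDsFromHeader` and work with the full list, while a component on the write path would pass that list through `singleTenantID` and reject the request on error.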

### Implementing Limits, Fairness and Observability for Cross-Tenant queries

#### Challenge

In Cortex, the tenant ID is used as the primary identifier for the following:

- The limits that apply to a given query.

- The per-tenant query queue that the query-frontend maintains to implement fairness.

- The `user` label under which relevant metrics about the query are exposed.

For a query spanning multiple tenants, the existing methods are no longer correct.

#### Proposal

For queries involving more than a single tenant, the identifier for the aforementioned features should be derived from an ordered, distinct list of tenant IDs joined by `|`. This produces a reproducible identifier for the same set of tenants, no matter in which order they were specified.
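
The derivation amounts to sorting, de-duplicating and joining. A minimal sketch, with `joinedTenantID` as a hypothetical helper name:

```go
package main

import (
	"sort"
	"strings"
)

// joinedTenantID derives one identifier from a set of tenant IDs:
// the IDs are sorted, duplicates are dropped, and the remainder is
// joined with "|". The input slice is left untouched.
// (Hypothetical helper name, used here for illustration only.)
func joinedTenantID(tenantIDs []string) string {
	ids := append([]string(nil), tenantIDs...)
	sort.Strings(ids)

	unique := make([]string, 0, len(ids))
	for _, id := range ids {
		if len(unique) == 0 || unique[len(unique)-1] != id {
			unique = append(unique, id)
		}
	}
	return strings.Join(unique, "|")
}
```

Both `team-b|team-a|team-b` and `team-a|team-b` therefore map to the identifier `team-a|team-b`, so limits, queueing and metrics see the same key for the same set of tenants.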

While this feature is considered experimental, this approach provides some insight into multi-tenant queries and the ability to limit them, with the following shortcomings:

- Cardinality costs that grow with the number of possible tenant ID combinations.

- Query limits that apply to the individual tenants of a multi-tenant query are ignored.

## Conclusion

| Challenge                                                                | Status                       |
|--------------------------------------------------------------------------|------------------------------|
| Aggregate data without overlap                                           | Implementation in PR [#3250] |
| Exposure of feature to the user                                          | Implementation in PR [#3250] |
| Implementing Limits, Fairness and Observability for Cross-Tenant queries | Implementation in PR [#3250] |

[#3250]: https://github.com/cortexproject/cortex/pull/3250

### Future work

The following features are currently out of scope for this proposal, but we foresee some interest in implementing them afterwards.

#### Cross Tenant support for the ruler

The ability to use multi-tenant queries in the ruler.

#### Allow the identifier for limits, fairness and observability to be switched out

It would be useful if the source of the identifier were pluggable. This could allow, for example, all of those features to be based on another dimension (e.g. users rather than tenants).

#### Allow customisation of the label used to expose tenant ids

As per this proposal, the label name `__tenant_id__` is fixed, but users might want to change it through a configuration option.

#### Retain overlapping tenant id label values recursively

As per this proposal, the tenant label injection retains an existing label value, but this is not implemented recursively. If a result already contains both a `__tenant_id__` label and an `original___tenant_id__` label (the `original_` prefix followed by `__tenant_id__`), the value of the latter is lost.