---
title: "Cross-Tenant Query Federation"
linkTitle: "Cross-Tenant Query Federation"
weight: 1
slug: "cross-tenant-query-federation"
---

- Author: [Christian Simon](https://github.com/simonswine)
- Date: October 2020
- Status: Accepted

## Overview

This document describes how to allow queries to cover data from more than a single Cortex tenant.

## Reasoning

Adopting a tenancy model in which each tenant represents a department of an organization has the disadvantage that queries cannot span multiple departments. This proposal tries to overcome that limitation.

## Alternatives considered

### Aggregation in PromQL API clients

In theory, PromQL API clients could aggregate or correlate query results from multiple tenants. For example, Grafana could be configured with multiple data sources, and a cross-tenant query could be achieved using [Transformations][grafana_transformation].

This approach was not considered further because of the following disadvantages:

- Every PromQL API client needs to support aggregation from multiple sources.

- Queries written in PromQL cannot be used across tenants without extra work.

[grafana_transformation]: https://grafana.com/docs/grafana/latest/panels/transformations/

### Multi-tenant aggregation in the query frontends

Another approach to multi-tenant query federation is to aggregate partial query results within the query frontend. For this, a query needs to be split into sub-queries per tenant, and afterwards the partial results need to be reduced into the final result.

The [astmapper] package takes a similar approach, but it cannot parallelize all query types. Ideally, multi-tenant query federation should support the full PromQL language, and the necessary algorithms would differ per query function and operator used. This approach was deemed a fairly complex way to achieve multi-tenant query federation.

[astmapper]: https://github.com/cortexproject/cortex/blob/f0c81bb59bf202db820403812e8dabcb64347bfd/pkg/querier/astmapper/parallel.go#L27

## Challenges

### Aggregate data without overlaps

#### Challenge

Series in different tenants might have exactly the same labels and hence potentially collide with each other.

#### Proposal

To always identify the tenant correctly, queries using multiple tenants should inject a label named `__tenant_id__`, whose value is the tenant ID, into the results. If a series already has a label with that name, its value should be stored in a label with the prefix `original_`.

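The label injection described above can be sketched as follows. This is a minimal illustration, not the code from the implementation: the function name `injectTenantLabel` is hypothetical, and a label set is modelled as a plain `map[string]string` for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	tenantLabel    = "__tenant_id__" // label name fixed by the proposal
	originalPrefix = "original_"     // prefix for a pre-existing tenant label
)

// injectTenantLabel returns a copy of lbls with the tenant label set to tenantID.
// A pre-existing tenant label value is kept under "original_" + tenantLabel.
// This is not recursive: an older "original_..." value is overwritten.
// (Hypothetical helper; labels are simplified to a map for this sketch.)
func injectTenantLabel(lbls map[string]string, tenantID string) map[string]string {
	out := make(map[string]string, len(lbls)+2)
	for name, value := range lbls {
		out[name] = value
	}
	if existing, ok := lbls[tenantLabel]; ok {
		out[originalPrefix+tenantLabel] = existing
	}
	out[tenantLabel] = tenantID
	return out
}

func main() {
	series := map[string]string{"__name__": "up", tenantLabel: "set-by-scrape"}
	labeled := injectTenantLabel(series, "team-a")

	names := make([]string, 0, len(labeled))
	for name := range labeled {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	for _, name := range names {
		fmt.Printf("%s=%q\n", name, labeled[name])
	}
}
```
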
Label selectors containing the tenant label should behave like selectors on any other label. This can be achieved by using them to select which tenants are queried in a multi-tenant query.

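One way to picture tenant selection by label selector is the sketch below. It assumes a hypothetical, simplified `matcher` type (not the Prometheus one) and a hypothetical `filterTenants` helper: matchers on `__tenant_id__` narrow the set of tenants to query, and all other matchers are passed on unchanged. Regular expressions are fully anchored here, as PromQL regex matchers are.

```go
package main

import (
	"fmt"
	"regexp"
)

const tenantLabel = "__tenant_id__"

type matchType int

const (
	matchEqual matchType = iota
	matchNotEqual
	matchRegexp
	matchNotRegexp
)

// matcher is a hypothetical, simplified label matcher for this sketch.
type matcher struct {
	Type  matchType
	Name  string
	Value string
}

// matches reports whether the label value v satisfies the matcher.
func (m matcher) matches(v string) (bool, error) {
	switch m.Type {
	case matchEqual:
		return v == m.Value, nil
	case matchNotEqual:
		return v != m.Value, nil
	case matchRegexp, matchNotRegexp:
		re, err := regexp.Compile("^(?:" + m.Value + ")$") // fully anchored
		if err != nil {
			return false, err
		}
		return re.MatchString(v) == (m.Type == matchRegexp), nil
	}
	return false, fmt.Errorf("unknown match type %d", m.Type)
}

// filterTenants uses matchers on the tenant label to select tenants and
// returns the remaining matchers for the per-tenant queries.
func filterTenants(tenantIDs []string, ms []matcher) (selected []string, rest []matcher, err error) {
	selected = append(selected, tenantIDs...) // copy, the input is left untouched
	for _, m := range ms {
		if m.Name != tenantLabel {
			rest = append(rest, m)
			continue
		}
		kept := selected[:0]
		for _, id := range selected {
			ok, matchErr := m.matches(id)
			if matchErr != nil {
				return nil, nil, matchErr
			}
			if ok {
				kept = append(kept, id)
			}
		}
		selected = kept
	}
	return selected, rest, nil
}

func main() {
	selected, rest, err := filterTenants(
		[]string{"team-a", "team-b", "team-c"},
		[]matcher{
			{Type: matchEqual, Name: "__name__", Value: "up"},
			{Type: matchRegexp, Name: tenantLabel, Value: "team-(a|b)"},
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("tenants to query:", selected)
	fmt.Println("matchers passed on:", len(rest))
}
```
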
### Exposure of feature to the user

#### Challenge

The tenant ID is currently read from the `X-Scope-OrgID` HTTP header. The values allowed for a tenant ID are subject to [documented limitations][cortex_tenant_id].

[cortex_tenant_id]: https://cortexmetrics.io/docs/guides/limitations/#tenant-id-naming

#### Proposal

On the query path, a user should be able to specify an `X-Scope-OrgID` header with multiple tenant IDs. The multiple tenant IDs should then be propagated throughout the system until they reach the querier. The `Queryable` instance returned by the querier component is wrapped by a `mergeQueryable`, which aggregates the results from one `Queryable` per tenant. Downstream components can therefore treat the query as a single-tenant query.

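The wrapping idea can be illustrated with a deliberately reduced sketch. The `queryable` interface, the `series` type and the `forTenant` callback below are assumptions made for this example and are much smaller than the real `Queryable`; retaining a pre-existing tenant label is also left out.

```go
package main

import "fmt"

// series is reduced to its label set; samples are omitted in this sketch.
type series map[string]string

// queryable is a hypothetical, much smaller stand-in for a per-tenant Queryable.
type queryable interface {
	Select(selector string) ([]series, error)
}

// mergeQueryable fans a query out to one queryable per tenant and merges the
// results, so that callers see it as a single queryable.
type mergeQueryable struct {
	tenantIDs []string
	forTenant func(tenantID string) queryable // assumed lookup of the per-tenant queryable
}

func (m mergeQueryable) Select(selector string) ([]series, error) {
	var merged []series
	for _, id := range m.tenantIDs {
		result, err := m.forTenant(id).Select(selector)
		if err != nil {
			return nil, fmt.Errorf("querying tenant %q: %w", id, err)
		}
		for _, s := range result {
			labeled := make(series, len(s)+1)
			for name, value := range s {
				labeled[name] = value
			}
			labeled["__tenant_id__"] = id // keeps identical series of different tenants apart
			merged = append(merged, labeled)
		}
	}
	return merged, nil
}

// staticQueryable returns fixed data and only exists to make the example runnable.
type staticQueryable []series

func (q staticQueryable) Select(string) ([]series, error) { return q, nil }

func main() {
	data := map[string]queryable{
		"team-a": staticQueryable{{"__name__": "up", "job": "api"}},
		"team-b": staticQueryable{{"__name__": "up", "job": "api"}},
	}
	q := mergeQueryable{
		tenantIDs: []string{"team-a", "team-b"},
		forTenant: func(id string) queryable { return data[id] },
	}
	merged, err := q.Select(`up{job="api"}`)
	if err != nil {
		panic(err)
	}
	for _, s := range merged {
		fmt.Println(s["__tenant_id__"], s["job"])
	}
}
```
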
To allow such queries to be processed, we suggest adding an experimental configuration flag `-querier.tenant-federation.enabled`, which is switched off by default. Once enabled, the value of the `X-Scope-OrgID` header should be interpreted as a `|`-separated list of tenant IDs. Components that do not expect multiple tenant IDs (e.g. the ingress path) must signal an error if more than one is used.

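A rough sketch of the header handling follows. The helper names `parseTenantIDs` and `singleTenantID` and the error values are hypothetical and chosen only for this example. Validation of tenant ID characters is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

const (
	orgIDHeader     = "X-Scope-OrgID"
	tenantSeparator = "|"
)

// errMultipleTenants is a hypothetical error for paths that accept one tenant only.
var errMultipleTenants = errors.New("multiple tenant IDs are not supported here")

// parseTenantIDs interprets the header value as a "|"-separated list of tenant IDs.
func parseTenantIDs(headerValue string) ([]string, error) {
	if headerValue == "" {
		return nil, errors.New("no tenant ID present")
	}
	ids := strings.Split(headerValue, tenantSeparator)
	for _, id := range ids {
		if id == "" {
			return nil, fmt.Errorf("empty tenant ID in %q", headerValue)
		}
	}
	return ids, nil
}

// singleTenantID is what a component that does not expect multiple tenants
// (e.g. the ingress path) could call: it fails if more than one ID is given.
func singleTenantID(headerValue string) (string, error) {
	ids, err := parseTenantIDs(headerValue)
	if err != nil {
		return "", err
	}
	if len(ids) > 1 {
		return "", errMultipleTenants
	}
	return ids[0], nil
}

func main() {
	h := http.Header{}
	h.Set(orgIDHeader, "team-a|team-b")

	ids, err := parseTenantIDs(h.Get(orgIDHeader))
	fmt.Println(ids, err)

	_, err = singleTenantID(h.Get(orgIDHeader))
	fmt.Println(err)
}
```
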
### Implementing Limits, Fairness and Observability for Cross-Tenant queries

#### Challenge

In Cortex, the tenant ID is used as the primary identifier for the following:

- The limits that apply to a certain query.

- The per-tenant query queue that the query-frontend maintains to implement fairness.

- The relevant metrics about the query, which are exposed under a `user` label.

When a query spans multiple tenants, the existing methods are no longer correct.

#### Proposal

For queries involving more than a single tenant, the identifier for the aforementioned features should be derived from an ordered, distinct list of tenant IDs joined by `|`. This produces a reproducible identifier for the same set of tenants, no matter in which order they were specified.

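As a small sketch of that derivation (the function name `queryIdentifier` is hypothetical), sorting and de-duplicating before joining gives the same identifier for any ordering of the same tenants:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// queryIdentifier builds the identifier used for limits, fairness and metrics
// from an ordered, distinct list of tenant IDs joined by "|".
func queryIdentifier(tenantIDs []string) string {
	ids := append([]string(nil), tenantIDs...) // copy so the caller's slice is not reordered
	sort.Strings(ids)

	distinct := make([]string, 0, len(ids))
	for _, id := range ids {
		if len(distinct) == 0 || distinct[len(distinct)-1] != id {
			distinct = append(distinct, id)
		}
	}
	return strings.Join(distinct, "|")
}

func main() {
	fmt.Println(queryIdentifier([]string{"team-b", "team-a", "team-b"})) // prints "team-a|team-b"
	fmt.Println(queryIdentifier([]string{"team-a", "team-b"}))           // prints "team-a|team-b"
}
```
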
While this feature is considered experimental, this provides some insight into multi-tenant queries and the ability to limit them, with these shortcomings:

- Cardinality costs grow with the number of possible tenant ID combinations.

- Query limits that apply to the individual tenants of a multi-tenant query are ignored.

## Conclusion

| Challenge                                                                | Status                       |
|--------------------------------------------------------------------------|------------------------------|
| Aggregate data without overlap                                           | Implementation in PR [#3250] |
| Exposure of feature to the user                                          | Implementation in PR [#3250] |
| Implementing Limits, Fairness and Observability for Cross-Tenant queries | Implementation in PR [#3250] |

[#3250]: https://github.com/cortexproject/cortex/pull/3250

### Future work

The following features are out of scope for this proposal, but we can foresee some interest in implementing them afterwards.

#### Cross-Tenant support for the ruler

The ability to use multi-tenant queries in the ruler.

#### Allow the identifier for limits, fairness and observability to be switched out

It would be great if the source identifier could be made more pluggable. This would allow, for example, basing all of those features on another dimension (e.g. users rather than tenants).

#### Allow customisation of the label used to expose tenant ids

As per this proposal, the label name `__tenant_id__` is fixed, but users might want to be able to modify it through a configuration option.

#### Retain overlapping tenant id label values recursively

As per this proposal, the tenant label injection retains an existing label value, but this is not implemented recursively. So if the result already contains `__tenant_id__` and `original___tenant_id__` labels, the value of the latter would be lost.