---
title: "Cross-Tenant Query Federation"
linkTitle: "Cross-Tenant Query Federation"
weight: 1
slug: "cross-tenant-query-federation"
---

- Author: [Christian Simon](https://github.com/simonswine)
- Date: October 2020
- Status: Accepted

## Overview

This document describes how to allow queries to cover data from more than a single Cortex tenant.

## Reasoning

Adopting a tenancy model within an organization, with each tenant representing a department, has the disadvantage that queries cannot span multiple departments. This proposal tries to overcome that limitation.

## Alternatives considered

### Aggregation in PromQL API clients

In theory, PromQL API clients could aggregate or correlate query results from multiple tenants. For example, Grafana could be configured with multiple data sources, and a cross-tenant query could be achieved using [Transformations][grafana_transformation].

This approach was not considered further because of the following disadvantages:

- Every PromQL API client needs to support aggregation from various sources.

- Queries written in PromQL can't be used across tenants without extra work.

[grafana_transformation]: https://grafana.com/docs/grafana/latest/panels/transformations/

### Multi tenant aggregation in the query frontends

Multi-tenant query federation could also be achieved by aggregating partial query results within the query frontend. For this, a query needs to be split into sub-queries per tenant, and afterwards the partial results need to be reduced into the final result.

The [astmapper] package takes a similar approach, but it cannot parallelize all query types.
Ideally, multi-tenant query federation should support the full PromQL language, and the algorithms required would differ depending on the query functions and operators used. This approach was deemed a fairly complex way to achieve multi-tenant query federation.

[astmapper]: https://github.com/cortexproject/cortex/blob/f0c81bb59bf202db820403812e8dabcb64347bfd/pkg/querier/astmapper/parallel.go#L27

## Challenges

### Aggregate data without overlaps

#### Challenge

Series in different tenants might have exactly the same labels and hence could collide with each other.

#### Proposal

To always identify the tenant correctly, queries using multiple tenants should inject a tenant label named `__tenant_id__`, whose value is the tenant ID, into the results. An existing label with the same name should be stored in a label with the prefix `original_`.

Label selectors containing the tenant label should behave like selectors on any other label. This can be achieved by using them to select the tenants queried in a multi-tenant query.

### Exposure of feature to the user

#### Challenge

The tenant ID is currently read from the `X-Scope-OrgID` HTTP header. The values allowed for a tenant ID are subject to [documented limitations][cortex_tenant_id].

[cortex_tenant_id]: https://cortexmetrics.io/docs/guides/limitations/#tenant-id-naming

#### Proposal

On the query path, a user should be able to specify an `X-Scope-OrgID` header with multiple tenant IDs. The multiple tenant IDs should then be propagated throughout the system until they reach the querier. The `Queryable` instance returned by the querier component is wrapped by a `mergeQueryable`, which aggregates the results from one `Queryable` per tenant, so that downstream components treat the query as a single-tenant query.
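
The label injection and the per-tenant merge described in the proposals above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Cortex implementation: `Series`, `injectTenantLabel`, and `mergeTenants` are hypothetical names, and a series is modeled as a plain label map instead of the Prometheus storage types that a real `mergeQueryable` would wrap.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical constants for illustration; the label name and prefix are
// taken from the proposal text.
const (
	tenantLabel    = "__tenant_id__"
	originalPrefix = "original_"
)

// Series is a simplified stand-in for a time series: only its label set.
type Series map[string]string

// injectTenantLabel returns a copy of s with the tenant label set to tenantID.
// An existing tenant label value is preserved under the "original_" prefix.
// This is not recursive: a pre-existing prefixed label is overwritten.
func injectTenantLabel(s Series, tenantID string) Series {
	out := make(Series, len(s)+2)
	for name, value := range s {
		out[name] = value
	}
	if old, ok := s[tenantLabel]; ok {
		out[originalPrefix+tenantLabel] = old
	}
	out[tenantLabel] = tenantID
	return out
}

// mergeTenants combines per-tenant results into one result set. Every series
// is labelled with the tenant it came from, so identical label sets from
// different tenants no longer collide.
func mergeTenants(perTenant map[string][]Series) []Series {
	tenantIDs := make([]string, 0, len(perTenant))
	for id := range perTenant {
		tenantIDs = append(tenantIDs, id)
	}
	sort.Strings(tenantIDs) // deterministic output order

	var merged []Series
	for _, id := range tenantIDs {
		for _, s := range perTenant[id] {
			merged = append(merged, injectTenantLabel(s, id))
		}
	}
	return merged
}

func main() {
	// Both tenants expose a series with exactly the same labels.
	merged := mergeTenants(map[string][]Series{
		"team-a": {{"__name__": "up", "job": "api"}},
		"team-b": {{"__name__": "up", "job": "api"}},
	})
	for _, s := range merged {
		fmt.Println(s[tenantLabel], s["job"])
	}
}
```

After the merge, the two otherwise identical series differ in their `__tenant_id__` label, which is what lets downstream components treat the result like that of a single-tenant query.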

To allow such queries to be processed, we suggest adding an experimental configuration flag `-querier.tenant-federation.enabled`, which is switched off by default. Once enabled, the value of the `X-Scope-OrgID` header should be interpreted as a `|`-separated list of tenant IDs. Components that do not expect multiple tenant IDs (e.g. the ingress path) must signal an error if more than one is used.

### Implementing Limits, Fairness and Observability for Cross-Tenant queries

#### Challenge

In Cortex, the tenant ID is used as the primary identifier for the following:

- The limits that apply to a certain query.

- The query-frontend maintains a per-tenant query queue to implement fairness.

- Relevant metrics about the query are exposed under a `user` label.

For a query spanning multiple tenants, the existing methods are no longer correct.

#### Proposal

For queries involving more than a single tenant, the identifier for the aforementioned features should be derived from an ordered, distinct list of tenant IDs joined by `|`. This produces a reproducible identifier for the same set of tenants, no matter in which order they have been specified.

While this feature is considered experimental, this provides some insight into multi-tenant queries and the ability to limit them, with these shortcomings:

- Cardinality costs from the possible number of tenant ID combinations.

- Query limits applied to the individual tenants that are part of a multi-tenant query are ignored.
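
The header handling and the identifier derivation described above can be sketched as follows. The function names `parseTenantIDs`, `federationID`, and `singleTenantID` are hypothetical and not part of Cortex. The sketch also assumes that "ordered" means lexicographically sorted and that empty tenant IDs are rejected; the proposal does not spell out either detail.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// parseTenantIDs splits an X-Scope-OrgID header value on "|" and returns the
// distinct tenant IDs in sorted order.
// Assumption: an empty tenant ID is treated as an error.
func parseTenantIDs(header string) ([]string, error) {
	seen := make(map[string]struct{})
	var ids []string
	for _, id := range strings.Split(header, "|") {
		if id == "" {
			return nil, errors.New("empty tenant ID")
		}
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids, nil
}

// federationID joins sorted, distinct tenant IDs into the identifier used for
// limits, queue fairness, and the "user" metric label.
func federationID(ids []string) string {
	return strings.Join(ids, "|")
}

// singleTenantID is what a component without multi-tenant support (e.g. the
// ingress path) could call: it fails if more than one tenant is given.
func singleTenantID(header string) (string, error) {
	ids, err := parseTenantIDs(header)
	if err != nil {
		return "", err
	}
	if len(ids) != 1 {
		return "", fmt.Errorf("expected exactly one tenant ID, got %d", len(ids))
	}
	return ids[0], nil
}

func main() {
	ids, err := parseTenantIDs("team-b|team-a|team-b")
	if err != nil {
		panic(err)
	}
	fmt.Println(federationID(ids)) // prints "team-a|team-b"
}
```

Because the IDs are de-duplicated and sorted before joining, `team-b|team-a` and `team-a|team-b|team-a` map to the same identifier, so both requests share one queue and one set of metrics.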


## Conclusion

| Challenge                                                                | Status                       |
|--------------------------------------------------------------------------|------------------------------|
| Aggregate data without overlap                                           | Implementation in PR [#3250] |
| Exposure of feature to the user                                          | Implementation in PR [#3250] |
| Implementing Limits, Fairness and Observability for Cross-Tenant queries | Implementation in PR [#3250] |

[#3250]: https://github.com/cortexproject/cortex/pull/3250

### Future work

The following features are currently out of scope for this proposal, but we foresee some interest in implementing them afterwards.

#### Cross Tenant support for the ruler

Ability to use multi-tenant queries in the ruler.

#### Allow the identifier for limits, fairness and observability to be switched out

It would be great if the source identifier could be made more pluggable. This could allow, for example, all of those features to be based on another dimension (e.g. users rather than tenants).

#### Allow customisation of the label used to expose tenant ids

As per this proposal, the label name `__tenant_id__` is fixed, but users might want to be able to modify it through a configuration option.

#### Retain overlapping tenant id label values recursively

As per this proposal, the tenant label injection retains an existing label value, but this is not implemented recursively. So if the result already contains `__tenant_id__` and `original__tenant_id__` labels, the value of the latter would be lost.