---
type: proposal
title: Global / Federated Rules API
status: accepted
owner: s-urbaniak
menu: proposals-done
---

### Related Tickets

https://github.com/thanos-io/thanos/issues/1375

## Summary

Thanos provides a global query view over Prometheus series. This means discovering and connecting to various (often remote) "leaf" components and aggregating series data from them.

Since we work with metrics, evaluating recording rules and alerts is a very important part of our system. Some of our "StoreAPIs", like Prometheus and Thanos Ruler, are designed for that. However, we currently don’t have a way to present those resources (e.g. alerts and recording rules) in a federated way. This document explores a potential way of solving this in the Prometheus ecosystem.

## Motivation

Deployment scenarios with various levels of federated setups are not unusual. Some deployments leverage Thanos Ruler to query multiple data sources (i.e. via Thanos Querier) for rule and alert evaluation. This topology sacrifices availability. At the same time, in the same deployment topology, a dedicated Prometheus instance is deployed for certain critical alerting rules that require high availability. Here, we are missing a federation middleware exposing a consolidated view of evaluated rules and alerts in the underlying Prometheus and Thanos Ruler instances.

Pitfalls of current solutions:
* It’s tedious to manually visit each leaf and we are lazy.
* Visiting leaf instances (be it Thanos Ruler or dedicated Prometheus endpoints) has to be implemented manually.

## Goals

* Present other Prometheus/Thanos resources in a global view.
  * This also means an up-to-date view (e.g. statuses).
* Simple to run if you already run Thanos.
  * Use consistent protocols (if we suddenly switch to pure HTTP from gRPC it will break user’s proxies for auth, routing, monitoring, rate limiting, etc.)
  * Potentially use the same connections and discovered targets as the Querier.

## Verification

* Unit tests which would fire up a dummy Rules API and check different scenarios.
* Ad-hoc testing.

## Proposal

### Rules API

We propose to add the following proto service and implement it in the existing components that have rules:
* Sidecar
* Querier (for federation)
* Ruler

The newly introduced `Rules` service is designed to reflect [Prometheus' Rules API](https://github.com/prometheus/prometheus/blob/bc703b64568ebfaecf27b9b70be737ad318e217a/web/api/v1/api.go), allowing clients to retrieve recording and alerting rules.

```proto
service Rules {
  /// Rules has info for all rules.
  rpc Rules(RulesRequest) returns (stream RulesResponse);
}
```

Sidecar proxies the Rules request to its local Prometheus instance and synthesizes the response. Similarly, Thanos Ruler constructs the response based on its local state.

Thanos Querier fans out to all known Rules endpoints configured via the `--rule` command line flag, then merges and deduplicates the result. This new setting is meant to configure rules endpoints as a strict subset of store endpoints as specified with `--store` and `--store.sd-files`. If a user specifies a `--rule` endpoint not matching the `--store`/`--store.sd-files` endpoints, the initial implementation would log that fact. In the future (see below) it is planned to make this an independent setting.

Generally the deduplication logic is less complex than with time series, specifically:

* Deduplication happens first at the rule group level. The identifier is the group name and the group file.
* Then, within each group, deduplication happens at the rule level, using the following as the deduplication identifier:

  1. the rule type (recording rule vs. alerting rule)
  2. the rule name
  3. the rule label names
  4. the rule expression `expr` field
  5. the alerting rule `for` field

Disjoint entries are simply merged by adding them to the result set.

Thanos Querier presents the result of the fan-out on a `/api/v1/rules` endpoint which is compatible with the Prometheus rules API endpoint. Additionally, Thanos Querier gains a new `--rule.replica-label` command line argument which falls back to `--query.replica-label` if unset. The replica labels refer to the same labels as specified in the `external_labels` section of Prometheus and the `--label` command line argument of Thanos Ruler.

#### Examples

A stream of alerting rules is merged from different Thanos Ruler API instances. These may be remote Thanos Ruler, Prometheus Sidecar, or even Thanos Querier instances. The merging Thanos Querier has `--rule.replica-label=replica`.

Scenario 1:

As specified, the rule type and then the rule name are used for deduplication.

Given the following stream of incoming rule groups containing recording/alerting rules:

```text
group: a
  recording:<name:"r1" last_evaluation:<seconds:1 > >
  alert: <name:"a1" last_evaluation:<seconds:1 > >
group: b
  recording:<name:"r1" last_evaluation:<seconds:1 > >
group: a
  recording:<name:"r2" last_evaluation:<seconds:1 > >
```

The output becomes:

```text
group: a
  alert: <name:"a1" last_evaluation:<seconds:1 > >
  recording:<name:"r1" last_evaluation:<seconds:1 > >
  recording:<name:"r2" last_evaluation:<seconds:1 > >
group: b
  recording:<name:"r1" last_evaluation:<seconds:1 > >
```

Note in the example above how the recording rule `r1` is not deduplicated as it is contained in two different groups.
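
The merge in Scenario 1 can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Rule` and `Group` types and the `mergeGroups` function are hypothetical simplifications, not the types generated from the Thanos proto definitions, and only the rule type and name are compared here.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a simplified, hypothetical stand-in for a recording or alerting rule.
type Rule struct {
	Type string // "recording" or "alert"
	Name string
}

// Group is a simplified, hypothetical stand-in for a rule group.
type Group struct {
	Name  string
	File  string
	Rules []Rule
}

// mergeGroups merges groups sharing the same (name, file) identifier and
// drops rules that repeat the same (type, name) pair within a merged group.
func mergeGroups(in []Group) []Group {
	type groupKey struct{ name, file string }
	merged := map[groupKey]*Group{}
	seen := map[groupKey]map[Rule]bool{}

	for _, g := range in {
		k := groupKey{g.Name, g.File}
		if merged[k] == nil {
			merged[k] = &Group{Name: g.Name, File: g.File}
			seen[k] = map[Rule]bool{}
		}
		for _, r := range g.Rules {
			if seen[k][r] {
				continue // same type and name already present in this group
			}
			seen[k][r] = true
			merged[k].Rules = append(merged[k].Rules, r)
		}
	}

	// Sort rules and groups so the result is deterministic.
	out := make([]Group, 0, len(merged))
	for _, g := range merged {
		sort.Slice(g.Rules, func(i, j int) bool {
			if g.Rules[i].Type != g.Rules[j].Type {
				return g.Rules[i].Type < g.Rules[j].Type
			}
			return g.Rules[i].Name < g.Rules[j].Name
		})
		out = append(out, *g)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].File < out[j].File
	})
	return out
}

func main() {
	in := []Group{
		{Name: "a", Rules: []Rule{{"recording", "r1"}, {"alert", "a1"}}},
		{Name: "b", Rules: []Rule{{"recording", "r1"}}},
		{Name: "a", Rules: []Rule{{"recording", "r2"}}},
	}
	for _, g := range mergeGroups(in) {
		fmt.Println("group:", g.Name, g.Rules)
	}
}
```

With the Scenario 1 input, group `a` ends up with `a1`, `r1`, and `r2`, while `r1` in group `b` stays separate because the group identifier differs.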

Scenario 2:

The next level of deduplication is governed by the label/value set of the underlying recording/alerting rule while respecting the replica label. For a given conflict, the youngest (most recently evaluated) rule is preferred. For alerting rules, the youngest firing rule is preferred.

Given the following stream of incoming recording rules:

```text
group: a
  recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
group: a
  recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
```

The output becomes:

```text
group: a
  recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
```

Given the following stream of incoming alerting rules:

```text
group: a
  alert:<state:FIRING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
group: a
  alert:<state:PENDING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
```

The output becomes:

```text
group: a
  alert:<state:FIRING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
```

Note how in the above output the firing alerting rule was preferred despite being older.

Scenario 3:

If, under the above conditions, a rule is a candidate for deduplication, the rule `expr` and `for` fields are finally considered for deduplication.

The following stream of incoming alerting rules also results in two independent alerting rules, as both the `expr` and `for` fields differ:

```text
- alert: KubeAPIErrorBudgetBurn
  annotations:
    message: The API server is burning too much error budget
  expr: |
    sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
    and
    sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
  for: 2m
  labels:
    severity: critical
- alert: KubeAPIErrorBudgetBurn
  annotations:
    message: The API server is burning too much error budget
  expr: |
    sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)
    and
    sum(apiserver_request:burnrate30m) > (6.00 * 0.01000)
  for: 15m
  labels:
    severity: critical
```

Scenario 4:

As specified, the group name and file fields are used for deduplication.

Given the following stream of incoming rule groups:

```text
group: a/file1
group: b/file1
group: a/file2
```

The output becomes:

```text
group: a/file1
group: a/file2
group: b/file1
```

Deduplication of included alerting/recording rules inside groups is described in the previous scenarios.

## Alternatives

* Cortex contains a sharded Ruler. Assigning rules to shards is done via Consul, though a gossip implementation is under development. Shards do not communicate with other shards. Rules come from a store (e.g. a Postgres database).

## Work Plan

* Implement a new flag `--rule` in Thanos Querier which registers RulesAPI endpoints.
* Implement a new flag `--rule.replica-label` in Thanos Querier.
* Implement RulesAPI backends in sidecar, query, rule.
* Feature branch: https://github.com/thanos-io/thanos/pull/2200

## Future

These changes are suggestions that will need to be discussed in the future and are not part of the proposal implementation.

### Type and Info

Currently, `Info` is shared between the `RulesAPI` proposed here and the existing `StoreAPI` services. To accommodate additional APIs in the future, the following changes to the protobuf `Info` and `Type` structures are suggested.

The current `StoreType` enum is renamed to `Type`. This retains binary compatibility with older clients, since protobuf encodes enum values by number rather than by type name:

```diff
-enum StoreType {
+enum Type {
   UNKNOWN = 0;
   QUERY = 1;
   RULE = 2;
```

The current fields in `InfoResponse` connected to Store APIs are deprecated, and dedicated new API sub types are proposed:

```diff
 message InfoResponse {
   // Deprecated. Use label_sets instead.
   repeated Label labels = 1 [(gogoproto.nullable) = false];
+  // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
   int64 min_time = 2;
+  // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
   int64 max_time = 3;
-  StoreType storeType = 4;
+  Type type = 4;
   // label_sets is an unsorted list of `LabelSet`s.
+  // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
   repeated LabelSet label_sets = 5 [(gogoproto.nullable) = false];
+
+  StoreInfoResponse store = 6;
+  RulesInfoResponse rules = 7;
 }
```

### Independent `--rule` and `--store` endpoints configuration

To ease the current implementation, rules endpoints are a strict subset of store endpoints. In the future these settings should be separate, i.e. the user could specify different endpoints for rules and different endpoints for store.
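
While the subset constraint is in place, the check that leads to the log message could be as simple as the Go sketch below. The function name `ruleEndpointsOutsideStores` and the addresses are hypothetical; the proposal does not prescribe how the comparison is implemented, and endpoints discovered via `--store.sd-files` are assumed to be already resolved into the store list.

```go
package main

import (
	"fmt"
	"sort"
)

// ruleEndpointsOutsideStores returns the rule endpoints that are not also
// configured as store endpoints, sorted for stable log output.
func ruleEndpointsOutsideStores(ruleAddrs, storeAddrs []string) []string {
	stores := make(map[string]struct{}, len(storeAddrs))
	for _, a := range storeAddrs {
		stores[a] = struct{}{}
	}
	var missing []string
	for _, a := range ruleAddrs {
		if _, ok := stores[a]; !ok {
			missing = append(missing, a)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Example addresses are made up for illustration.
	stores := []string{"sidecar-0:10901", "ruler-0:10901"}
	rules := []string{"ruler-0:10901", "ruler-1:10901"}
	for _, a := range ruleEndpointsOutsideStores(rules, stores) {
		fmt.Printf("rule endpoint %q does not match any store endpoint\n", a)
	}
}
```

Once the two settings become independent, a check like this would no longer be needed.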