github.com/thanos-io/thanos@v0.32.5/docs/proposals-done/202003-thanos-rules-federation.md

     1  ---
     2  type: proposal
     3  title: Global / Federated Rules API
     4  status: accepted
     5  owner: s-urbaniak
     6  menu: proposals-done
     7  ---
     8  
     9  ### Related Tickets
    10  
    11  https://github.com/thanos-io/thanos/issues/1375
    12  
    13  ## Summary
    14  
    15  Thanos provides a global query view over Prometheus series. This means discovering and connecting to various (often remote) "leaf" components and aggregating series data from them.
    16  
    17  Since we work with metrics, evaluating recording rules and alerts is a very important part of our system. Some of our "StoreAPIs", like Prometheus and Thanos Ruler, are designed for that. However, we currently don’t have a way to present resources such as alerts and recording rules in a federated way. This document explores a potential way of solving this in the Prometheus ecosystem.
    18  
    19  ## Motivation
    20  
    21  Deployment scenarios with various levels of federated setups are not unusual. Some deployments leverage Thanos Ruler to query multiple data sources (i.e. via Thanos Querier) for rule and alert evaluation. This topology sacrifices availability. At the same time, in the same deployment topology, a dedicated Prometheus instance is deployed for certain critical alerting rules that require high availability. Here, we are missing a federation middleware exposing a consolidated view of evaluated rules and alerts in the underlying Prometheus and Thanos Ruler instances.
    22  
    23  Pitfalls of current solutions:
    24  * It’s tedious to manually visit each leaf and we are lazy.
    25  * Visiting leaf instances (be it Thanos Ruler or dedicated Prometheus endpoints) has to be implemented manually.
    26  
    27  ## Goals
    28  
    29  * Present other Prometheus/Thanos Resources in a global view.
    30    * This also means an up-to-date view (e.g. statuses).
    31  * Simple to run if you already run Thanos.
    32    * Use consistent protocols (if we suddenly switch from gRPC to pure HTTP, it will break users’ proxies for auth, routing, monitoring, rate limiting, etc.).
    33    * Potentially reuse the connections and targets already discovered by Querier.
    34  
    35  ## Verification
    36  
    37  * Unit tests which would fire up a dummy Rules API and check different scenarios.
    38  * Ad-hoc testing.
    39  
    40  ## Proposal
    41  
    42  ### Rules API
    43  
    44  We propose adding the following proto service and exposing it through the existing components that have rules, namely:
    45  * Sidecar
    46  * Querier (for federation)
    47  * Ruler
    48  
    49  The newly introduced `Rules` service is designed to reflect [Prometheus' Rules API](https://github.com/prometheus/prometheus/blob/bc703b64568ebfaecf27b9b70be737ad318e217a/web/api/v1/api.go), allowing retrieval of recording and alerting rules.
    50  
    51  ```proto
    52  service Rules {
    53    /// Rules has info for all rules.
    54    rpc Rules(RulesRequest) returns (stream RulesResponse);
    55  }
    56  ```
    57  
    58  Sidecar proxies the Rules request to its local Prometheus instance and synthesizes the response. Similarly, Thanos Ruler constructs the response based on its local state.
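
To make the sidecar's job more concrete, here is a minimal sketch that decodes a Prometheus `/api/v1/rules` JSON body into Go structs. The names `Rule`, `RuleGroup`, and `decodeRules` are hypothetical, and only a subset of the response fields is modelled. Under this proposal, a sidecar would then synthesize `RulesResponse` messages from such data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Rule models a subset of one rule entry in the Prometheus rules API response.
// (Hypothetical type for illustration, not the Thanos protobuf type.)
type Rule struct {
	Type     string            `json:"type"` // "alerting" or "recording"
	Name     string            `json:"name"`
	Query    string            `json:"query"`
	Duration float64           `json:"duration"` // alerting rules only, in seconds
	Labels   map[string]string `json:"labels"`
}

// RuleGroup models a subset of one rule group in the response.
type RuleGroup struct {
	Name  string `json:"name"`
	File  string `json:"file"`
	Rules []Rule `json:"rules"`
}

// rulesEnvelope is the outer {"status": ..., "data": {"groups": [...]}} wrapper.
type rulesEnvelope struct {
	Status string `json:"status"`
	Data   struct {
		Groups []RuleGroup `json:"groups"`
	} `json:"data"`
}

// decodeRules parses a rules API response body and returns its groups.
func decodeRules(body []byte) ([]RuleGroup, error) {
	var env rulesEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, fmt.Errorf("decode rules response: %w", err)
	}
	if env.Status != "success" {
		return nil, fmt.Errorf("unexpected status %q", env.Status)
	}
	return env.Data.Groups, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"groups":[
		{"name":"a","file":"file1","rules":[{"type":"recording","name":"r1","query":"up"}]}]}}`)
	groups, err := decodeRules(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(groups), groups[0].Rules[0].Name)
}
```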
    59  
    60  Thanos Querier fans out to all known Rules endpoints configured via the `--rule` command line flag, then merges and deduplicates the results. This new setting is meant to configure rules endpoints as a strict subset of the store endpoints specified with `--store` and `--store.sd-files`. If a user specifies a `--rule` endpoint that does not match any `--store`/`--store.sd-files` endpoint, the initial implementation logs that fact. In the future (see below), this is planned to become an independent setting.
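
The subset check described above could look roughly like the following sketch. The function name `subsetWarnings` and the example addresses are hypothetical. The sketch only compares plain address strings, whereas a real implementation would also have to account for endpoints discovered via `--store.sd-files`.

```go
package main

import (
	"fmt"
	"sort"
)

// subsetWarnings returns one log message per rule endpoint that is not
// also configured as a store endpoint. (Hypothetical helper.)
func subsetWarnings(ruleAddrs, storeAddrs []string) []string {
	stores := make(map[string]struct{}, len(storeAddrs))
	for _, a := range storeAddrs {
		stores[a] = struct{}{}
	}
	var missing []string
	for _, a := range ruleAddrs {
		if _, ok := stores[a]; !ok {
			missing = append(missing, a)
		}
	}
	sort.Strings(missing) // deterministic output order
	msgs := make([]string, 0, len(missing))
	for _, a := range missing {
		msgs = append(msgs, fmt.Sprintf("rule endpoint %s is not a configured store endpoint", a))
	}
	return msgs
}

func main() {
	stores := []string{"sidecar-0:10901", "ruler-0:10901"}
	rules := []string{"ruler-0:10901", "ruler-1:10901"}
	for _, msg := range subsetWarnings(rules, stores) {
		fmt.Println(msg)
	}
}
```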
    61  
    62  Generally the deduplication logic is less complex than with time series, specifically:
    63  
    64  * Deduplication happens first at the rule group level. The identifier is the group name and the group file.
    65  * Then, within each group, deduplication happens at the rule level, where:
    66  
    67  1. the rule type (recording rule vs. alerting rule)
    68  2. the rule name
    69  3. the rule label names
    70  4. the rule expression `expr` field
    71  5. the alerting rule `for` field
    72  
    73  are used as the deduplication identifier. Disjoint entries are simply merged by adding them to the result set.
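
As a minimal sketch of the rule-level identifier, the code below builds a key from the five fields listed above and keeps one rule per key. The `Rule` struct, `ruleKey`, and `mergeRules` are hypothetical names. Keeping the first rule seen is a simplification: it ignores how conflicts between duplicates are resolved.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a simplified, hypothetical stand-in for a recording or alerting rule.
type Rule struct {
	Type   string // "recording" or "alerting"
	Name   string
	Labels map[string]string
	Expr   string
	For    string // only meaningful for alerting rules
}

// ruleKey builds the deduplication identifier from the rule type, name,
// sorted label names, expression, and `for` duration.
func ruleKey(r Rule) string {
	names := make([]string, 0, len(r.Labels))
	for n := range r.Labels {
		names = append(names, n)
	}
	sort.Strings(names)
	// %q quotes each field so that separators inside values cannot collide.
	return fmt.Sprintf("%q|%q|%q|%q|%q", r.Type, r.Name, names, r.Expr, r.For)
}

// mergeRules keeps the first rule seen for each key. Rules with disjoint
// keys are simply appended to the result.
func mergeRules(rules []Rule) []Rule {
	seen := make(map[string]struct{}, len(rules))
	var out []Rule
	for _, r := range rules {
		k := ruleKey(r)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	in := []Rule{
		{Type: "recording", Name: "r1", Labels: map[string]string{"replica": "1"}, Expr: "up"},
		{Type: "recording", Name: "r1", Labels: map[string]string{"replica": "2"}, Expr: "up"},
		{Type: "alerting", Name: "r1", Expr: "up == 0", For: "2m"},
	}
	for _, r := range mergeRules(in) {
		fmt.Println(r.Type, r.Name)
	}
}
```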
    74  
    75  Thanos Querier presents the result of the fan-out on a `/api/v1/rules` endpoint which is compatible with Prometheus' rules API endpoint. Additionally, Thanos Querier gains a new `--rule.replica-label` command line argument which falls back to `--query.replica-label` if unset. The replica labels refer to the same labels as specified in the `external_labels` section of Prometheus and the `--label` command line argument of Thanos Ruler.
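
The fallback between the two flags can be sketched as below. All function names are hypothetical. Removing the replica labels before comparing rules is an assumption about what "respecting the replica label" means in practice, not something this proposal spells out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// effectiveReplicaLabels returns the values of --rule.replica-label, or the
// values of --query.replica-label when the former is unset.
func effectiveReplicaLabels(ruleReplicaLabels, queryReplicaLabels []string) []string {
	if len(ruleReplicaLabels) > 0 {
		return ruleReplicaLabels
	}
	return queryReplicaLabels
}

// stripReplicaLabels returns a copy of labels without the replica labels.
// (Assumption: replicas are compared on their remaining labels.)
func stripReplicaLabels(labels map[string]string, replicaLabels []string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[k] = v
	}
	for _, r := range replicaLabels {
		delete(out, r)
	}
	return out
}

// labelString renders a label set deterministically, sorted by label name.
func labelString(labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, v))
	}
	sort.Strings(pairs)
	return "{" + strings.Join(pairs, ", ") + "}"
}

func main() {
	// --rule.replica-label is unset here, so --query.replica-label applies.
	replica := effectiveReplicaLabels(nil, []string{"replica"})
	labels := map[string]string{"replica": "thanos-ruler-1", "severity": "critical"}
	fmt.Println(labelString(stripReplicaLabels(labels, replica))) // prints {severity="critical"}
}
```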
    76  
    77  #### Examples
    78  
    79  A stream of alerting rules is merged from different Rules API instances. These may be remote Thanos Ruler, Prometheus Sidecar, or even Thanos Querier instances. The merging Thanos Querier has `--rule.replica-label=replica`.
    80  
    81  Scenario 1:
    82  
    83  As specified, the rule type and then the rule name are used for deduplication.
    84  
    85  Given the following stream of incoming rule groups containing recording/alerting rules:
    86  
    87  ```text
    88  group: a
    89     recording:<name:"r1" last_evaluation:<seconds:1 > >
    90     alert:    <name:"a1" last_evaluation:<seconds:1 > >
    91  group: b
    92     recording:<name:"r1" last_evaluation:<seconds:1 > >
    93  group: a
    94     recording:<name:"r2" last_evaluation:<seconds:1 > >
    95  ```
    96  
    97  The output becomes:
    98  
    99  ```text
   100  group: a
   101     alert:    <name:"a1" last_evaluation:<seconds:1 > >
   102     recording:<name:"r1" last_evaluation:<seconds:1 > >
   103     recording:<name:"r2" last_evaluation:<seconds:1 > >
   104  group: b
   105     recording:<name:"r1" last_evaluation:<seconds:1 > >
   106  ```
   107  
   108  Note in the example above how the recording rule `r1` is not deduplicated as it is contained in two different groups.
   109  
   110  Scenario 2:
   111  
   112  The next level of deduplication is governed by the label/value set of the underlying recording/alerting rule while respecting the replica label. For a given conflict, the youngest rule (the one with the latest `last_evaluation`) is preferred. For alerting rules, the youngest firing rule is preferred.
   113  
   114  Given the following stream of incoming recording rules:
   115  
   116  ```text
   117  group: a
   118     recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
   119  group: a
   120     recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
   121  ```
   122  
   123  The output becomes:
   124  
   125  ```text
   126  group: a
   127     recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
   128  ```
   129  
   130  Given the following stream of incoming alerting rules:
   131  
   132  ```text
   133  group: a
   134     alert:<state:FIRING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
   135  group: a
   136     alert:<state:PENDING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
   137  ```
   138  
   139  The output becomes:
   140  
   141  ```text
   142  group: a
   143     alert:<state:FIRING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
   144  ```
   145  
   146  Note how in the above output the firing alerting rule was preferred despite being older.
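
The preference order from this scenario can be sketched as a small comparison function. The `Rule` struct, the `AlertState` constants, and the `prefer` and `mustParse` helpers are hypothetical. The sketch assumes that the two rules passed in were already identified as duplicates of each other.

```go
package main

import (
	"fmt"
	"time"
)

// AlertState is a hypothetical stand-in for the alert state enum.
type AlertState int

const (
	Inactive AlertState = iota
	Pending
	Firing
)

func (s AlertState) String() string {
	switch s {
	case Inactive:
		return "INACTIVE"
	case Pending:
		return "PENDING"
	case Firing:
		return "FIRING"
	default:
		return fmt.Sprintf("AlertState(%d)", int(s))
	}
}

// Rule carries only the fields needed to resolve a conflict.
type Rule struct {
	Name           string
	Replica        string
	IsAlert        bool
	State          AlertState
	LastEvaluation time.Time
}

// prefer returns the rule to keep when a and b are duplicates.
// A firing alert wins over a non-firing one; otherwise the youngest wins.
func prefer(a, b Rule) Rule {
	if a.IsAlert && b.IsAlert {
		aFiring, bFiring := a.State == Firing, b.State == Firing
		if aFiring != bFiring {
			if aFiring {
				return a
			}
			return b
		}
	}
	if b.LastEvaluation.After(a.LastEvaluation) {
		return b
	}
	return a
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	older := Rule{Name: "a1", Replica: "thanos-ruler-1", IsAlert: true, State: Firing,
		LastEvaluation: mustParse("2006-01-02T10:00:00Z")}
	younger := Rule{Name: "a1", Replica: "thanos-ruler-2", IsAlert: true, State: Pending,
		LastEvaluation: mustParse("2006-01-02T10:01:00Z")}
	kept := prefer(older, younger)
	fmt.Println(kept.Replica, kept.State) // prints thanos-ruler-1 FIRING
}
```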
   147  
   148  Scenario 3:
   149  
   150  If, under the above conditions, a rule is a candidate for deduplication, the rule's `expr` and `for` fields are considered last.
   151  
   152  The following stream of incoming alerting rules results in two independent alerting rules, as both the `expr` and `for` fields differ:
   153  
   154  ```text
   155    - alert: KubeAPIErrorBudgetBurn
   156      annotations:
   157        message: The API server is burning too much error budget
   158      expr: |
   159        sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
   160        and
   161        sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
   162      for: 2m
   163      labels:
   164        severity: critical
   165    - alert: KubeAPIErrorBudgetBurn
   166      annotations:
   167        message: The API server is burning too much error budget
   168      expr: |
   169        sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)
   170        and
   171        sum(apiserver_request:burnrate30m) > (6.00 * 0.01000)
   172      for: 15m
   173      labels:
   174        severity: critical
   175  ```
   176  
   177  Scenario 4:
   178  
   179  As specified, the group name and file fields are used for deduplication.
   180  
   181  Given the following stream of incoming rule groups:
   182  
   183  ```text
   184  group: a/file1
   185  group: b/file1
   186  group: a/file2
   187  ```
   188  
   189  The output becomes:
   190  
   191  ```text
   192  group: a/file1
   193  group: a/file2
   194  group: b/file1
   195  ```
   196  
   197  Deduplication of included alerting/recording rules inside groups is described in the previous scenarios.
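
The group-level step of this scenario can be sketched as follows. The `Group` type and `dedupGroups` are hypothetical names. Sorting by name and then by file is inferred from the example output above rather than stated by the proposal. The rules inside duplicate groups would still need to be merged, which this sketch omits.

```go
package main

import (
	"fmt"
	"sort"
)

// Group identifies a rule group by its name and file. (Hypothetical type.)
type Group struct {
	Name string
	File string
}

func (g Group) String() string { return fmt.Sprintf("%s/%s", g.Name, g.File) }

// dedupGroups drops repeated (name, file) pairs and returns the remaining
// groups sorted by name, then by file.
func dedupGroups(groups []Group) []Group {
	seen := make(map[Group]struct{}, len(groups))
	out := make([]Group, 0, len(groups))
	for _, g := range groups {
		if _, dup := seen[g]; dup {
			continue
		}
		seen[g] = struct{}{}
		out = append(out, g)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].File < out[j].File
	})
	return out
}

func main() {
	in := []Group{{"a", "file1"}, {"b", "file1"}, {"a", "file2"}}
	for _, g := range dedupGroups(in) {
		fmt.Println("group:", g)
	}
}
```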
   198  
   199  ## Alternatives
   200  
   201  * Cortex contains a sharded Ruler. Assigning rules to shards is done via Consul, though a gossip implementation is under development. Shards do not communicate with other shards. Rules come from a store (e.g. a Postgres database).
   202  
   203  ## Work Plan
   204  
   205  * Implement a new flag `--rule` in Thanos Querier which registers RulesAPI endpoints.
   206  * Implement a new flag `--rule.replica-label` in Thanos Querier.
   207  * Implement RulesAPI backends in sidecar, query, rule.
   208  * Feature branch: https://github.com/thanos-io/thanos/pull/2200
   209  
   210  ## Future
   211  
   212  These changes are suggestions that will need to be discussed in the future and are not part of the proposal implementation.
   213  
   214  ### Type and Info
   215  
   216  Currently, `Info` is shared between the `RulesAPI` proposed here and the existing `StoreAPI` services. To accommodate additional future APIs, the following changes to the protobuf `Info` and `Type` structures are suggested.
   217  
   218  The current `StoreType` enum is renamed to `Type`. This retains binary compatibility with older clients:
   219  
   220  ```diff
   221  -enum StoreType {
   222  +enum Type {
   223    UNKNOWN = 0;
   224    QUERY = 1;
   225    RULE = 2;
   226  ```
   227  
   228  The current fields in `InfoResponse` connected to Store APIs are deprecated, and dedicated new API subtypes are proposed:
   229  
   230  ```diff
   231  message InfoResponse {
   232    // Deprecated. Use label_sets instead.
   233    repeated Label labels = 1 [(gogoproto.nullable) = false];
   234  +  // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
   235    int64 min_time = 2;
   236  +  // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
   237    int64 max_time = 3;
   238  -  StoreType storeType  = 4;
   239  +  Type type  = 4;
   240    // label_sets is an unsorted list of `LabelSet`s.
   241  +  // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
   242    repeated LabelSet label_sets = 5 [(gogoproto.nullable) = false];
   243  +
   244  +  StoreInfoResponse store = 6;
   245  +  RulesInfoResponse rules = 7;
   246  }
   247  ```
   248  
   249  ### Independent `--rule` and `--store` endpoints configuration
   250  
   251  To ease the current implementation, rules endpoints are a strict subset of store endpoints. In the future these settings should be separate, i.e. the user could specify different endpoints for rules and different endpoints for store.