github.com/muhammadn/cortex@v1.9.1-0.20220510110439-46bb7000d03d/docs/operations/query-auditor.md

---
title: "Query Auditor (tool)"
linkTitle: "Query Auditor (tool)"
weight: 2
slug: query-auditor
---

The query auditor is a tool bundled in the Cortex repository but **not** included in the Docker images; it must be built from source. It is primarily useful for those _developing_ Cortex, but it can also help operators in certain scenarios, such as backend migrations.

## How it works

The `query-audit` tool runs a set of queries against two backends that expose the Prometheus read API, generally the `query-frontend` components of two Cortex deployments. It then compares the responses to determine the average difference for each query. It does this by:

 - Ensuring the resulting label sets match.
 - For each label set, ensuring it contains the same number of samples as its pair from the other backend.
 - For each sample, calculating its difference against its pair from the other backend/label set.
 - Calculating the average diff per query from the above diffs.
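
The four steps above can be sketched in Go as follows. This is a minimal illustration, not the tool's actual implementation: the `Sample` and `Matrix` types, the `AvgDiffPercent` function, and in particular the relative-difference formula in `relDiff` are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Sample is one (timestamp, value) point in a series.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Matrix maps a label-set string to its samples.
// Hypothetical representation, chosen to keep the sketch small.
type Matrix map[string][]Sample

// AvgDiffPercent walks the four steps described above and returns the mean
// relative difference, in percent, across all sample pairs.
func AvgDiffPercent(control, test Matrix) (float64, error) {
	// Step 1: both backends must return the same label sets.
	if len(control) != len(test) {
		return 0, errors.New("label set count mismatch")
	}
	var sum float64
	var n int
	for labels, cs := range control {
		ts, ok := test[labels]
		if !ok {
			return 0, fmt.Errorf("label set %s missing from test backend", labels)
		}
		// Step 2: paired series must have the same number of samples.
		if len(cs) != len(ts) {
			return 0, fmt.Errorf("sample count mismatch for %s: %d vs %d", labels, len(cs), len(ts))
		}
		// Step 3: diff each sample against its pair.
		for i := range cs {
			sum += relDiff(cs[i].Value, ts[i].Value)
			n++
		}
	}
	if n == 0 {
		return 0, nil
	}
	// Step 4: average the per-sample diffs for the query.
	return 100 * sum / float64(n), nil
}

// relDiff is an assumed definition: |a-b| scaled by the larger magnitude.
func relDiff(a, b float64) float64 {
	if a == b || (math.IsNaN(a) && math.IsNaN(b)) {
		return 0
	}
	return math.Abs(a-b) / math.Max(math.Abs(a), math.Abs(b))
}

func main() {
	control := Matrix{`{job="a"}`: {{0, 10}, {900000, 20}}}
	test := Matrix{`{job="a"}`: {{0, 10}, {900000, 19}}}
	avg, err := AvgDiffPercent(control, test)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.6f%% avg diff\n", avg)
}
```

Returning an error on the first structural mismatch keeps the sketch short; a real comparison might instead record the mismatch and continue with the remaining series.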

### Limitations

It currently only supports queries with `Matrix` response types.
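
In the Prometheus HTTP API, the response body carries a `resultType` field, and range queries return `matrix`. The sketch below shows one way a client could reject anything else. The `apiResponse` type and `RequireMatrix` function are hypothetical names for this example and are not taken from the tool's source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResponse covers only the fields of a Prometheus HTTP API query
// response that this check needs.
type apiResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string          `json:"resultType"`
		Result     json.RawMessage `json:"result"`
	} `json:"data"`
}

// RequireMatrix returns the raw result when body is a successful response
// with resultType "matrix", and an error otherwise.
func RequireMatrix(body []byte) (json.RawMessage, error) {
	var resp apiResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding response: %w", err)
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("unexpected status %q", resp.Status)
	}
	if resp.Data.ResultType != "matrix" {
		return nil, fmt.Errorf("unsupported result type %q, only matrix is handled", resp.Data.ResultType)
	}
	return resp.Data.Result, nil
}

func main() {
	rangeBody := []byte(`{"status":"success","data":{"resultType":"matrix","result":[]}}`)
	instantBody := []byte(`{"status":"success","data":{"resultType":"vector","result":[]}}`)

	_, err := RequireMatrix(rangeBody)
	fmt.Println("range query:", err)
	_, err = RequireMatrix(instantBody)
	fmt.Println("instant query:", err)
}
```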

### Use cases

- Correctness testing when working on the read path.
- Comparing results from different backends.

### Example Configuration

```yaml
control:
  host: http://localhost:8080/prometheus
  headers:
    "X-Scope-OrgID": 1234

test:
  host: http://localhost:8081/prometheus
  headers:
    "X-Scope-OrgID": 1234

queries:
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m]))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-28T00:00:00Z
    step_size: 15m
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-28T00:00:00Z
    step_size: 15m
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m])) without (container_name)'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-26T00:00:00Z
    step_size: 15m
  - query: 'histogram_quantile(0.9, sum(rate(cortex_cache_value_size_bytes_bucket[5m])) by (le, job))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
    # two shardable legs
  - query: 'sum without (instance, job) (rate(cortex_query_frontend_queue_length[5m])) or sum by (job) (rate(cortex_query_frontend_queue_length[5m]))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
    # one shardable leg
  - query: 'sum without (instance, job) (rate(cortex_cache_request_duration_seconds_count[5m])) or rate(cortex_cache_request_duration_seconds_count[5m])'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
```

### Example Output

Under ideal circumstances, you'll see output like the following (the last two queries shown are not part of the example configuration above):

```
$ go run ./tools/query-audit/ -f config.yaml

0.000000% avg diff for:
        query: sum(rate(container_cpu_usage_seconds_total[5m]))
        series: 1
        samples: 289
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-28 00:00:00 +0000 UTC
        step: 15m0s

0.000000% avg diff for:
        query: sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)
        series: 95
        samples: 25877
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-28 00:00:00 +0000 UTC
        step: 15m0s

0.000000% avg diff for:
        query: sum(rate(container_cpu_usage_seconds_total[5m])) without (container_name)
        series: 4308
        samples: 374989
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-26 00:00:00 +0000 UTC
        step: 15m0s

0.000000% avg diff for:
        query: histogram_quantile(0.9, sum(rate(cortex_cache_value_size_bytes_bucket[5m])) by (le, job))
        series: 13
        samples: 325
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-25 06:00:00 +0000 UTC
        step: 15m0s

0.000000% avg diff for:
        query: sum without (instance, job) (rate(cortex_query_frontend_queue_length[5m])) or sum by (job) (rate(cortex_query_frontend_queue_length[5m]))
        series: 21
        samples: 525
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-25 06:00:00 +0000 UTC
        step: 15m0s

0.000000% avg diff for:
        query: sum without (instance, job) (rate(cortex_cache_request_duration_seconds_count[5m])) or rate(cortex_cache_request_duration_seconds_count[5m])
        series: 942
        samples: 23550
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-25 06:00:00 +0000 UTC
        step: 15m0s

0.000000% avg diff for:
        query: sum by (namespace) (predict_linear(container_cpu_usage_seconds_total[5m], 10))
        series: 16
        samples: 400
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-25 06:00:00 +0000 UTC
        step: 15m0s

0.000000% avg diff for:
        query: sum by (namespace) (avg_over_time((rate(container_cpu_usage_seconds_total[5m]))[10m:]) > 1)
        series: 4
        samples: 52
        start: 2019-11-25 00:00:00 +0000 UTC
        end: 2019-11-25 06:00:00 +0000 UTC
        step: 5m0s
```