---
title: "Query Auditor (tool)"
linkTitle: "Query Auditor (tool)"
weight: 2
slug: query-auditor
---

The query auditor is a tool bundled in the Cortex repository but **not** included in the Docker images; it must be built from source. It is primarily useful to those _developing_ Cortex, but it can also help operators in certain scenarios, such as backend migrations.

## How it works

The `query-audit` tool runs a set of queries against two backends that expose the Prometheus read API, generally the `query-frontend` components of two Cortex deployments. It then compares the responses to determine the average difference for each query. It does this by:

- Ensuring the resulting label sets match.
- For each label set, ensuring it contains the same number of samples as its pair from the other backend.
- For each sample, calculating its difference against its pair from the other backend/label set.
- Calculating the average diff per query from the above diffs.

### Limitations

It currently only supports queries with `Matrix` response types.

### Use cases

- Correctness testing when working on the read path.
- Comparing results from different backends.


### Example Configuration

```yaml
control:
  host: http://localhost:8080/prometheus
  headers:
    "X-Scope-OrgID": 1234

test:
  host: http://localhost:8081/prometheus
  headers:
    "X-Scope-OrgID": 1234

queries:
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m]))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-28T00:00:00Z
    step_size: 15m
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-28T00:00:00Z
    step_size: 15m
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m])) without (container_name)'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-26T00:00:00Z
    step_size: 15m
  - query: 'histogram_quantile(0.9, sum(rate(cortex_cache_value_size_bytes_bucket[5m])) by (le, job))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
  # two shardable legs
  - query: 'sum without (instance, job) (rate(cortex_query_frontend_queue_length[5m])) or sum by (job) (rate(cortex_query_frontend_queue_length[5m]))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
  # one shardable leg
  - query: 'sum without (instance, job) (rate(cortex_cache_request_duration_seconds_count[5m])) or rate(cortex_cache_request_duration_seconds_count[5m])'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
```

### Example Output

Under ideal circumstances, you'll see output like the following:

```
$ go run ./tools/query-audit/ -f config.yaml

0.000000% avg diff for:
    query: sum(rate(container_cpu_usage_seconds_total[5m]))
    series: 1
    samples: 289
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-28 00:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)
    series: 95
    samples: 25877
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-28 00:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum(rate(container_cpu_usage_seconds_total[5m])) without (container_name)
    series: 4308
    samples: 374989
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-26 00:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: histogram_quantile(0.9, sum(rate(cortex_cache_value_size_bytes_bucket[5m])) by (le, job))
    series: 13
    samples: 325
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum without (instance, job) (rate(cortex_query_frontend_queue_length[5m])) or sum by (job) (rate(cortex_query_frontend_queue_length[5m]))
    series: 21
    samples: 525
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum without (instance, job) (rate(cortex_cache_request_duration_seconds_count[5m])) or rate(cortex_cache_request_duration_seconds_count[5m])
    series: 942
    samples: 23550
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum by (namespace) (predict_linear(container_cpu_usage_seconds_total[5m], 10))
    series: 16
    samples: 400
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum by (namespace) (avg_over_time((rate(container_cpu_usage_seconds_total[5m]))[10m:]) > 1)
    series: 4
    samples: 52
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 01:00:00 +0000 UTC
    step: 5m0s
```