github.com/abayer/test-infra@v0.0.5/metrics/README.md

# BigQuery metrics

The `metrics-bigquery` job generates metrics that summarize data in our BigQuery
test result database. Each metric is defined by a config file that is consumed
by the `metrics-bigquery` periodic Prow job. Each metric config is a YAML file
like the following:

```yaml
# Metric name
metric: failures
# BigQuery query
query: |
  #standardSQL
  select /* find the most recent time each job passed (may not be this week) */
    job,
    max(started) latest_pass
  from `k8s-gubernator.build.all`
  where
    result = 'SUCCESS'
  group by job

# JQ filter to make daily results from raw query results
jqfilter: |
  [(.[] | select((.latest_pass|length) > 0)
  | {(.job): {
      latest_pass: (.latest_pass)
  }})] | add

# JQ filter to make influxdb timeseries data points for Velodrome. (Optional)
jqmeasurements: |
  [(.[] | select((.latest_pass|length) > 0) | {
    measurement: "latest_pass_time",
    tags: {
      job: (.job)
    },
    fields: {
      job: (.job),
      latest_pass: (.latest_pass)
  }})]
```
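
For readers less familiar with jq, the two filters in the example can be approximated in Python. This is only a sketch of what the filters compute, not how the job runs them. The function names and the sample rows are invented for illustration.

```python
def daily_results(rows):
    """Approximate the example `jqfilter`: map each job to its latest pass."""
    merged = {}
    for row in rows:
        # jq's `(.latest_pass|length) > 0` drops null and empty values.
        if row.get("latest_pass"):
            merged[row["job"]] = {"latest_pass": row["latest_pass"]}
    # Note: jq's `add` yields null for an empty list; this sketch returns {}.
    return merged


def measurement_points(rows):
    """Approximate the example `jqmeasurements`: one point per job."""
    return [
        {
            "measurement": "latest_pass_time",
            "tags": {"job": row["job"]},
            "fields": {"job": row["job"], "latest_pass": row["latest_pass"]},
        }
        for row in rows
        if row.get("latest_pass")
    ]


# Hypothetical raw query results, shaped like the columns the query selects.
sample_rows = [
    {"job": "job-a", "latest_pass": "2018-01-02 03:04:05"},
    {"job": "job-b", "latest_pass": None},
]

print(daily_results(sample_rows))
print(len(measurement_points(sample_rows)))
```

Both functions skip `job-b` because it has no recorded pass, which mirrors the `select(...)` guard in the jq filters.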

## Metrics

* build-stats - number of daily builds and pass rate
    - [Config](configs/build-stats.yaml)
    - [build-stats-latest.json](http://storage.googleapis.com/k8s-metrics/build-stats-latest.json)
* presubmit-health - presubmit failure rate and timing across PRs
    - [Config](configs/presubmit-health.yaml)
    - [presubmit-health-latest.json](http://storage.googleapis.com/k8s-metrics/presubmit-health-latest.json)
* failures - find jobs that have been failing the longest
    - [Config](configs/failures-config.yaml)
    - [failures-latest.json](http://storage.googleapis.com/k8s-metrics/failures-latest.json)
* flakes - find the flakiest jobs this week (and the flakiest tests in each job).
    - [Config](configs/flakes-config.yaml)
    - [flakes-latest.json](http://storage.googleapis.com/k8s-metrics/flakes-latest.json)
* flakes-daily - find flakes from the previous day. Similar to `flakes`, but creates more granular results for display in Velodrome.
    - [Config](configs/flakes-daily-config.yaml)
    - [flakes-daily-latest.json](http://storage.googleapis.com/k8s-metrics/flakes-daily-latest.json)
* job-flakes - compute consistency of all jobs
    - [Config](configs/job-flakes-config.yaml)
    - [job-flakes-latest.json](http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json)
* pr-consistency - calculate PR flakiness for the previous day.
    - [Config](configs/pr-consistency-config.yaml)
    - [pr-consistency-latest.json](http://storage.googleapis.com/k8s-metrics/pr-consistency-latest.json)
* weekly-consistency - compute overall weekly consistency for PRs
    - [Config](configs/weekly-consistency-config.yaml)
    - [weekly-consistency-latest.json](http://storage.googleapis.com/k8s-metrics/weekly-consistency-latest.json)
* istio-job-flakes - compute overall weekly consistency for postsubmits
    - [Config](configs/istio-flakes.yaml)
    - [istio-job-flakes-latest.json](http://storage.googleapis.com/k8s-metrics/istio-job-flakes-latest.json)

## Adding a new metric

To add a new metric, create a PR that adds a new YAML config file
specifying the metric name (`metric`), the BigQuery query to execute (`query`), and a
jq filter that produces the daily and latest files (`jqfilter`).
*Optionally*, include a second jq filter that extracts InfluxDB timeseries measurements
from the raw query results (`jqmeasurements`).
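
Before opening the PR, it can help to confirm that the config carries the fields listed above. The following is a minimal sketch that checks an already-parsed config (a plain `dict`). The `check_config` helper is hypothetical and is not part of `bigquery.py`, which may validate configs differently.

```python
REQUIRED_FIELDS = ("metric", "query", "jqfilter")
OPTIONAL_FIELDS = ("jqmeasurements",)


def check_config(config):
    """Return a list of problems found in a parsed metric config (hypothetical helper)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = config.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty required field: {field}")
    for field in OPTIONAL_FIELDS:
        # Optional fields may be absent, but must be non-empty strings if present.
        if field in config:
            value = config[field]
            if not isinstance(value, str) or not value.strip():
                problems.append(f"optional field is empty: {field}")
    return problems


# Illustrative config with placeholder values, not a real metric.
example = {"metric": "my-new-metric", "query": "select 1", "jqfilter": "."}
print(check_config(example))
```

An empty list means all three required fields are present. A check like this only catches structural mistakes, so verifying the real output of the query and filters is still necessary.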

Run `./bigquery.py --config configs/my-new-config.yaml` and verify that the
output is what you expect.

Add the new metric to the list above.

After the PR merges, the new metric should appear on GCS within 24 hours.

## Details

Each query is run every 24 hours to produce a JSON
file containing the complete raw query results, named with the format
`raw-yyyy-mm-dd.json`. The raw file is then filtered with the associated
jq filter and the results are stored in `daily-yyyy-mm-dd.json`. These
files are stored in the k8s-metrics GCS bucket in a directory named after
the metric and persist for a year after their creation. Additionally,
the latest filtered results for a metric are stored in the root of the
k8s-metrics bucket and named with the format `METRICNAME-latest.json`.
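
The naming scheme described above can be illustrated with a short sketch. The bucket name, directory layout, and file name formats come from this section. The `metric_paths` helper and the `gs://` URL form are assumptions made for illustration, not taken from the job's code.

```python
from datetime import date

BUCKET = "k8s-metrics"  # bucket name as stated in this README


def metric_paths(metric, day):
    """Build the three object paths for a metric on a given day (hypothetical helper)."""
    stamp = day.strftime("%Y-%m-%d")
    return {
        # Raw and daily files live in a directory named after the metric.
        "raw": f"gs://{BUCKET}/{metric}/raw-{stamp}.json",
        "daily": f"gs://{BUCKET}/{metric}/daily-{stamp}.json",
        # The latest filtered results live in the bucket root.
        "latest": f"gs://{BUCKET}/{metric}-latest.json",
    }


paths = metric_paths("failures", date(2018, 1, 2))
for kind, path in paths.items():
    print(kind, path)
```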

If a config specifies the optional jq filter used to create InfluxDB timeseries
data points, the job uses that filter to generate timeseries points from
the raw query results. The points are uploaded to [Velodrome](http://velodrome.k8s.io)'s
InfluxDB instance, where they can be used to create graphs and tables.

## Consistency

Consistency means that a test, job, or PR always produced the same result. For
example, suppose we run a job 5 times at the same commit:

* 5 passing runs, 0 failing runs: consistent
* 0 passing runs, 5 failing runs: consistent
* 1-4 passing runs, 1-4 failing runs: inconsistent, also known as flaked
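
The three cases above reduce to one rule: a set of runs at the same commit is consistent when every run had the same outcome. A minimal sketch of that rule follows. The `is_consistent` function is illustrative only and is not how the metric queries compute it.

```python
def is_consistent(passes, failures):
    """Return True when all runs at one commit agree (all passed or all failed)."""
    if passes < 0 or failures < 0:
        raise ValueError("run counts must be non-negative")
    if passes + failures == 0:
        raise ValueError("at least one run is required")
    # Mixed outcomes at the same commit mean the job flaked.
    return passes == 0 or failures == 0


for passes, failures in [(5, 0), (0, 5), (3, 2)]:
    label = "consistent" if is_consistent(passes, failures) else "flaked"
    print(passes, failures, label)
```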