# Testgrid

### Table of Contents
* [Configuration](#configuration)
* [Advanced Configuration](#advanced-configuration)
* [Using the Client](#using-the-client)
* [Unit Testing](#unit-testing)
* [Merging Changes](#merging-changes)


The testgrid site is accessible at https://testgrid.k8s.io. The site is
configured by [`config.yaml`].
Updates to the config are automatically tested and pushed to production.

Testgrid is composed of:
* A list of test groups that contain results for a job over time.
* A list of dashboards that are composed of tabs that display a test group.
* A list of dashboard groups of related dashboards.

## Tips and Tricks

We have a short [video] from the testgrid session at the 2018 contributor summit.

The video demos power features of testgrid, including:
* Sorting
* Filtering
* Graphing
* Grouping
* Dashboard groups
* Summaries

Please have a look!

## Configuration
Open [`config.yaml`] in your favorite editor and:
1. Configure the test groups.
2. Add those test groups to one or more tabs in one or more dashboards.
3. Consider using dashboard groups if multiple dashboards are needed.

### Test groups
Test groups contain a set of test results across time for the same job. Each group backs one or more dashboard tabs.

Add a new test group under `test_groups:`, specifying the group's name and where the logs are located.

Ex:

```
test_groups:
- name: {test_group_name}
  gcs_prefix: kubernetes-jenkins/logs/{test_group_name}
```

See the `TestGroup` message in [`config.proto`] for additional fields to
configure like `days_of_results`, `tests_name_policy`, `notifications`, etc.

### Dashboards
#### Tabs
A dashboard tab is a particular view of a test group. Multiple dashboard tabs can view the same test group in different ways, via different configuration options. All dashboard tabs belong under a dashboard (see below).

#### Dashboards

A dashboard is a set of related dashboard tabs. The dashboard name shows up as the top-level link when viewing TestGrid.

Add a new dashboard under `dashboards` and a new dashboard tab under that.

Ex:

```
dashboards:
- name: {dashboard-name}
  dashboard_tab:
  - name: {dashboard-tab-name}
    test_group_name: {test-group-name}
```

See the `Dashboard` and `DashboardTab` messages in [`config.proto`] for
additional configuration options, such as `notifications`, `file_bug_template`,
`description`, `code_search_url_template`, etc.

#### Dashboard groups
A dashboard group is a set of related dashboards. When viewing a dashboard's tabs, you'll see the other dashboards in the dashboard group at the top of the client.

Add a new dashboard group, specifying names for all the dashboards that fall under this group.

Ex:

```
dashboard_groups:
- name: {dashboard-group-name}
  dashboard_names:
  - {dashboard-1}
  - {dashboard-2}
  - {dashboard-3}
```

## Advanced configuration
See [`config.proto`] for an extensive list of configuration options. Here are some commonly-used ones.

### More/Fewer Results
Specify `days_of_results` in a test group to increase or decrease the number of days of results shown.

```
test_groups:
- name: kubernetes-build
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-build
  days_of_results: 7
```

### Tab descriptions
Add a short description to a dashboard tab describing its purpose.

```
  dashboard_tab:
  - name: gce
    test_group_name: ci-kubernetes-e2e-gce
    base_options: 'include-filter-by-regex=Kubectl%7Ckubectl'
    description: 'kubectl gce e2e tests for master branch'
```

### Column headers
TestGrid shows date, build number, and k8s and test-infra commit shas above
each run's results by default. To add your own custom column headers, add a
key-value pair in your tests' metadata (see [metadata for
finished.json](https://github.com/kubernetes/test-infra/tree/master/gubernator#job-artifact-gcs-layout)),
and add the key for that pair as a `configuration_value` under `column_header`
for your test group. Example:

```
test_groups:
- name: ci-kubernetes-e2e-gce-ubuntudev-k8sdev-default
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-ubuntudev-k8sdev-default
  column_header:
  - configuration_value: node_os_image
  - configuration_value: master_os_image
  - configuration_value: Commit
  - configuration_value: infra-commit
```

### Email alerts
In TestGroup, set `num_failures_to_alert` (alerts for consistent failures)
and/or `alert_stale_results_hours` (alerts when tests haven't run recently).
You can also set `num_passes_to_disable_alert`.

In DashboardTab, set `alert_mail_to_addresses` (comma-separated list of email
addresses to send mail to).

These alerts are sent whenever new failures are detected (or whenever the
dashboard tab goes stale), and stop when `num_passes_to_disable_alert`
consecutive passes are found (or no failure is found in `num_columns_recent`
runs).

```
# Send alerts to foo@bar.com whenever a test fails 3 times in a row, or tests
# haven't run in the last day.
test_groups:
- name: ci-kubernetes-e2e-gce
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-e2e-gce
  alert_stale_results_hours: 24
  num_failures_to_alert: 3

dashboards:
- name: google-gce
  dashboard_tab:
  - name: gce
    test_group_name: ci-kubernetes-e2e-gce
    alert_options:
      alert_mail_to_addresses: 'foo@bar.com'
```
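
The failure and pass thresholds can be read as a small state machine. The Python sketch below is one interpretation of the rules described in this section, not TestGrid's actual implementation. The function name `alert_state`, the list-of-booleans input, and the default of one pass to clear an alert are all assumptions made for illustration, and the staleness and `num_columns_recent` conditions are left out.

```python
def alert_state(results, num_failures_to_alert, num_passes_to_disable_alert=1):
    """Return True if an alert would be active after the given results.

    results: booleans ordered oldest to newest, True meaning the run passed.
    Hypothetical model of the documented rules, not TestGrid code.
    """
    alerting = False
    failures = 0  # length of the current run of consecutive failures
    passes = 0    # length of the current run of consecutive passes
    for passed in results:
        if passed:
            passes += 1
            failures = 0
            if alerting and passes >= num_passes_to_disable_alert:
                alerting = False
        else:
            failures += 1
            passes = 0
            if failures >= num_failures_to_alert:
                alerting = True
    return alerting


# Three failures in a row raise the alert; a later pass clears it again.
print(alert_state([True, False, False, False], num_failures_to_alert=3))
print(alert_state([False, False, False, True], num_failures_to_alert=3))
```

With `num_failures_to_alert: 3` as in the config above, two failures followed by a pass never raise an alert in this model.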


### Base options
Set `base_options` to apply a default set of client modifiers when viewing a dashboard tab.

```
# Show test cases from ci-kubernetes-e2e-gce, but only if the test has 'Kubectl' or 'kubectl' in the name.
  dashboard_tab:
  - name: gce
    test_group_name: ci-kubernetes-e2e-gce
    base_options: 'include-filter-by-regex=Kubectl%7Ckubectl'
    description: 'kubectl gce e2e tests for master branch'
```
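
The `%7C` in the example is the percent-encoded form of `|`. As a minimal sketch, the option string can be built with Python's standard library. The helper name `include_filter_option` is hypothetical, and encoding every reserved character (rather than only `|`) is an assumption about what TestGrid accepts.

```python
from urllib.parse import quote, unquote


def include_filter_option(regex):
    """Build a base_options value that pre-applies an include filter."""
    # safe="" percent-encodes every reserved character, so "|" becomes "%7C".
    return "include-filter-by-regex=" + quote(regex, safe="")


option = include_filter_option("Kubectl|kubectl")
print(option)                             # include-filter-by-regex=Kubectl%7Ckubectl
print(unquote(option.split("=", 1)[1]))   # Kubectl|kubectl
```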

### More informative test names
If you run multiple versions of a test against different parameters, show which parameters they ran with after the test name.

```
# Show a test case as "{test_case_name} [{Context}]"
- name: ci-kubernetes-node-kubelet-benchmark
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-benchmark
  test_name_config:
    name_elements:
    - target_config: Tests name
    - target_config: Context
    name_format: '%s [%s]'
```
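
To illustrate how `name_format` and `name_elements` fit together, here is a small sketch that fills the format string with one value per element, in order. It assumes the format behaves like printf-style `%s` substitution. The function `render_test_name` and the property values are hypothetical.

```python
def render_test_name(name_format, name_elements, properties):
    """Fill a printf-style name_format with one value per name element."""
    # Missing properties are rendered as empty strings in this sketch.
    values = tuple(properties.get(key, "") for key in name_elements)
    return name_format % values


# Hypothetical property values for a single test result.
properties = {"Tests name": "kubelet density", "Context": "n1-standard-1"}
print(render_test_name("%s [%s]", ["Tests name", "Context"], properties))
# kubelet density [n1-standard-1]
```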

### Customize regression search
Narrow down where to search when searching for a regression between two builds/commits.

```
  dashboard_tab:
  - name: bazel
    description: Runs bazel test //... on the test-infra repo.
    test_group_name: ci-test-infra-bazel
    code_search_url_template:
      url: https://github.com/kubernetes/test-infra/compare/<start-custom-0>...<end-custom-0>
```
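
The template contains two placeholders, `<start-custom-0>` and `<end-custom-0>`. The sketch below shows the substitution under the assumption that each placeholder is replaced with a commit identifier taken from the two ends of the selected range. The function name and the commit values are hypothetical, and which end TestGrid treats as "start" is not specified here.

```python
def fill_code_search_url(template, start_value, end_value):
    """Substitute both placeholders in a code_search_url_template URL."""
    return (template
            .replace("<start-custom-0>", start_value)
            .replace("<end-custom-0>", end_value))


template = ("https://github.com/kubernetes/test-infra/compare/"
            "<start-custom-0>...<end-custom-0>")
# Hypothetical short commit hashes.
print(fill_code_search_url(template, "abc1234", "def5678"))
# https://github.com/kubernetes/test-infra/compare/abc1234...def5678
```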

### Notifications
Testgrid supports notifications, which appear as a yellow
butter bar / toast message at the top of the screen.

This is an effective way to broadcast system-wide information (all
FOO suites are failing due to blah, upgrade frobber to vX before the
weekend, etc.)

Configure the list of `notifications:` under a dashboard or test group.
Each notification includes a `summary:` that defines the text displayed.
Notifications benefit from including a `context_link:` url that can be clicked
to provide more information.

Ex:

```
dashboards:
- name: k8s
  dashboard_tab:
  - name: build
    test_group_name: kubernetes-build
  notifications:  # Attach to a specific dashboard
  - summary: Hello world (first notification).
  - summary: Tests are failing to start (second notification).
    context_link: https://github.com/kubernetes/kubernetes/issues/123
```

or

```
test_groups:  # Attach to a specific test_group
- name: kubernetes-build
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-build
  notifications:
  - summary: Hello world (first notification)
  - summary: Tests are failing to start (second notification).
    context_link: https://github.com/kubernetes/kubernetes/issues/123
```

### What Counts as 'Recent'
Configure `num_columns_recent` to change how many columns TestGrid should consider 'recent' for results.
TestGrid uses this to calculate things like 'is this test stale?' (and hides the test).

```
test_groups:
- name: kubernetes-build
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-build
  num_columns_recent: 3
```

### Ignore Pending Results
`ignore_pending` is false by default, which means that in-progress results are
shown if we have data for them. To hide them, add:

```
test_groups:
- name: kubernetes-build
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-build
  ignore_pending: true
```

### Showing a metric in the cells
Specify `short_text_metric` to display a custom numeric metric in the TestGrid cells. Example:

```
test_groups:
- name: ci-kubernetes-coverage-conformance
  gcs_prefix: kubernetes-jenkins/logs/ci-kubernetes-coverage-conformance
  short_text_metric: coverage
```

## Using the client

Here are some quick tips and clarifications for using the TestGrid site!

### Tab Statuses

TestGrid assigns dashboard tabs a status based on recent test runs.

 *  **PASSING**: No failures found in recent (`num_columns_recent`) test runs.
 *  **FAILING**: One or more consistent failures in recent test runs.
 *  **FLAKY**: The tab is neither PASSING nor FAILING. There is at least one
    recent failed result that is not a consistent failure.
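
The three statuses above can be sketched as a simple classification. This README does not define "consistent failure" precisely, so the sketch assumes it means a test that failed in every one of its recent runs. The function `tab_status` and its input shape are hypothetical and do not reflect how TestGrid computes statuses internally.

```python
def tab_status(rows, num_columns_recent):
    """Classify a tab from per-test results, newest result first.

    rows maps a test name to a list of booleans (True = passed).
    Assumption: a "consistent failure" is a test that failed in every recent run.
    """
    recent = [results[:num_columns_recent] for results in rows.values()]
    if all(all(results) for results in recent):
        return "PASSING"
    if any(len(results) > 0 and not any(results) for results in recent):
        return "FAILING"
    return "FLAKY"


print(tab_status({"a": [True, True, True], "b": [True, True, False]}, 2))   # PASSING
print(tab_status({"a": [False, False, False], "b": [True, True, True]}, 3)) # FAILING
print(tab_status({"a": [False, True, True]}, 3))                            # FLAKY
```

Note that the first call is PASSING because the only failure falls outside the two most recent columns.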

### Summary Widget

You can get a small widget showing the status of your dashboard tab, based on
the tab statuses above! For example:

`sig-testing-misc#bazel`: [![sig-testing-misc/bazel](https://testgrid.k8s.io/q/summary/sig-testing-misc/bazel/tests_status?style=svg)](https://testgrid.k8s.io/sig-testing-misc#bazel)

Inline it with:

```
<!-- Inline with a link to your tab -->
[![<dashboard_name>/<tab_name>](https://testgrid.k8s.io/q/summary/<dashboard_name>/<tab_name>/tests_status?style=svg)](https://testgrid.k8s.io/<dashboard_name>#<tab_name>)
```
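
If you maintain several widgets, a small helper can fill in the template. This is a convenience sketch only: the function name `summary_widget` is hypothetical, and it assumes dashboard and tab names need no URL escaping.

```python
def summary_widget(dashboard_name, tab_name, host="https://testgrid.k8s.io"):
    """Return the Markdown for a status badge that links to the tab."""
    image = f"{host}/q/summary/{dashboard_name}/{tab_name}/tests_status?style=svg"
    link = f"{host}/{dashboard_name}#{tab_name}"
    return f"[![{dashboard_name}/{tab_name}]({image})]({link})"


print(summary_widget("sig-testing-misc", "bazel"))
```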

### Customizing Test Result Sizes

Change the size of the test result rectangles.

The three sizes are Standard, Compact, and Super Compact. You can also specify
`width=X` in the URL (X > 3) to customize the width. For small widths, this may
mean the date and/or changelist, or other custom headers, are no longer
visible.

### Filtering Tests

You can repeatedly add filters to include/exclude test rows. Under **Options**:

*   **Include/Exclude Filter by RegEx**: Specify a regular expression that
    matches test names for rows you'd like to include/exclude.
*   **Exclude non-failed Tests**: Omit rows with no failing results.
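
As a rough illustration of the include/exclude behavior, the sketch below filters a list of test names with regular expressions. It assumes partial-match semantics and Python's regex syntax, either of which may differ from what the TestGrid client does. The function `filter_rows` and the test names are hypothetical.

```python
import re


def filter_rows(test_names, include=None, exclude=None):
    """Keep names matching `include` (if given) and not matching `exclude`."""
    kept = []
    for name in test_names:
        if include and not re.search(include, name):
            continue
        if exclude and re.search(exclude, name):
            continue
        kept.append(name)
    return kept


names = ["Kubectl client apply", "kubectl logs", "Networking DNS"]
print(filter_rows(names, include="Kubectl|kubectl"))
# ['Kubectl client apply', 'kubectl logs']
print(filter_rows(names, exclude="DNS"))
# ['Kubectl client apply', 'kubectl logs']
```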

### Grouping Tests

Grouped tests are summarized in a single row that is collapsible/expandable by
clicking on the test name (shown as a triangle on the left). Under **Options**:

*   **Group by RegEx Mask**: Specify a regular expression to mask a portion of
    the test name. Any test names that match after applying this mask will be
    grouped together.
*   **Group by Target**: Any tests that contain the same target will be
    grouped together.
*   **Group by Hierarchy Pattern**: Specify a regular expression that matches
    one or more parts of the tests' names and the tests will be grouped
    hierarchically. For example, if you have these tests in your dashboard:

    ```text
    /test/dir1/target1
    /test/dir1/target2
    /test/dir2/target3
    ```

    By specifying regular expression "\w+", the tests will be organized into:

    ```text
    ▼test
      ▼dir1
        target1
        target2
      ▼dir2
        target3
    ```
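
The hierarchy pattern can be pictured as splitting each test name into the parts that match the regular expression and nesting them. The sketch below does this with Python's `re.findall`. The helper names are hypothetical, and the TestGrid client may use a different regex dialect or grouping rule.

```python
import re


def group_by_hierarchy(test_names, pattern):
    """Nest test names by the successive parts that match `pattern`."""
    tree = {}
    for name in test_names:
        node = tree
        for part in re.findall(pattern, name):
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return indented lines, marking nodes that have children with a triangle."""
    lines = []
    for part, children in tree.items():
        marker = "▼" if children else ""
        lines.append("  " * depth + marker + part)
        lines.extend(render(children, depth + 1))
    return lines


tests = ["/test/dir1/target1", "/test/dir1/target2", "/test/dir2/target3"]
print("\n".join(render(group_by_hierarchy(tests, r"\w+"))))
```

Each path is split into `test`, a directory, and a target, so tests that share a leading part end up under the same collapsible node.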

### Sorting Tests

Under **Options**:

*   **Sort by Failures**: Tests with more recent failures will appear before
    other tests.
*   **Sort by Flakiness**: Tests with a higher flakiness score will appear
    before tests with a lower flakiness score. The flakiness score, which is not
    reported, is based on the number of transitions from passing to failing (and
    vice versa) with more weight given to more recent transitions.
*   **Sort by Name**: Sort alphabetically.
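
The flakiness score itself is not published, so the following is only a sketch of the idea described above: count pass/fail transitions and weight recent ones more heavily. The geometric `decay` weighting and the function name `flakiness_score` are assumptions for illustration, not TestGrid's formula.

```python
def flakiness_score(results, decay=0.5):
    """Score pass/fail transitions, weighting recent ones more.

    results: booleans ordered newest first (True = passed).
    Assumption: a transition at position i contributes decay**i.
    """
    score = 0.0
    for i in range(len(results) - 1):
        if results[i] != results[i + 1]:
            score += decay ** i
    return score


rows = {
    "steady": [True, True, True, True],
    "old-flake": [True, True, False, True],
    "new-flake": [False, True, True, True],
}
ordered = sorted(rows, key=lambda name: flakiness_score(rows[name]), reverse=True)
print(ordered)  # ['new-flake', 'old-flake', 'steady']
```

In this model a single very recent transition (score 1.0) outranks two older ones (0.5 + 0.25), which matches the intent of giving recent transitions more weight.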

## Unit testing

Run `bazel test //testgrid/...` to ensure the config is valid.

This finds common problems such as malformed yaml, a tab referring to a
non-existent test group, a test group never appearing on any tab, etc.
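
To make two of these checks concrete, here is a minimal sketch that looks for tabs pointing at unknown test groups and for test groups that no tab uses. It is not the repository's test code. It assumes the config has already been parsed into plain Python dicts and lists, and the function name `validate` is hypothetical.

```python
def validate(config):
    """Return a list of problems found in a parsed testgrid config."""
    problems = []
    groups = {group["name"] for group in config.get("test_groups", [])}
    used = set()
    for dashboard in config.get("dashboards", []):
        for tab in dashboard.get("dashboard_tab", []):
            group = tab.get("test_group_name")
            used.add(group)
            if group not in groups:
                problems.append(
                    f"tab {dashboard['name']}#{tab['name']} "
                    f"refers to unknown test group {group}")
    for name in sorted(groups - used):
        problems.append(f"test group {name} does not appear on any tab")
    return problems


# Hypothetical config with one dangling tab and one unused test group.
config = {
    "test_groups": [{"name": "kubernetes-build"}, {"name": "orphan"}],
    "dashboards": [{
        "name": "k8s",
        "dashboard_tab": [
            {"name": "build", "test_group_name": "kubernetes-build"},
            {"name": "gce", "test_group_name": "missing"},
        ],
    }],
}
for problem in validate(config):
    print(problem)
```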

Run `bazel test //...` for slightly more advanced testing, such as ensuring that
every job in our CI system appears somewhere in testgrid, etc.

All PRs updating the configuration must pass these tests prior to merging.


## Merging changes

Updates to the testgrid configuration are pushed automatically as soon as a
change is merged.

Manually convert the yaml file to the config proto with:
```
bazel run //testgrid/cmd/config -- \
  --yaml=testgrid/config.yaml \
  --print-text \
  --oneshot \
  --output=/tmp/config.pb
  # Or push to gcs
  # --output=gs://my-bucket/config
  # --gcp-service-account=/path/to/foo.json
```

[`config.proto`]: ./config.proto
[`config.yaml`]: ./config.yaml
[video]: https://www.youtube.com/watch?v=jm2l2SLq_yE