.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    https://docs.cilium.io

.. _ci_gha:

CI / GitHub Actions
--------------------

The main CI infrastructure is maintained on GitHub Actions (GHA).

This infrastructure is broadly composed of smoke tests and platform tests.
Smoke tests are typically initiated by ``pull_request`` or
``pull_request_target`` triggers automatically when opening or updating a pull
request. Platform tests often require an organization member to manually
trigger the test when the pull request is ready to be tested.

Triggering Smoke Tests
~~~~~~~~~~~~~~~~~~~~~~

Several short-running tests are automatically triggered for all contributor
submissions, subject to GitHub's limitations around first-time contributors.
If no GitHub workflows are triggering on your PR, a committer for the project
should trigger these within a few days. Reach out in the ``#testing``
channel on `Cilium Slack`_ for assistance in running these tests.

.. _trigger_phrases:

Triggering Platform Tests
~~~~~~~~~~~~~~~~~~~~~~~~~

To ensure that build resources are used judiciously, some tests on GHA are
manually triggered via comments. These builds typically make use of cloud
infrastructure, such as allocating clusters or VMs in AKS, EKS or GKE. In
order to trigger these jobs, a member of the GitHub organization must post a
comment on the Pull Request with a "trigger phrase".

If you'd like to trigger these jobs, ask in `Cilium Slack`_ in the ``#testing``
channel. If you're regularly contributing to Cilium, you can also `become a
member <https://github.com/cilium/community/blob/main/CONTRIBUTOR-LADDER.md#organization-member>`__
of the Cilium organization.

Depending on the PR target branch, a specific set of jobs is marked as required,
as per the `Cilium CI matrix`_. They will be automatically featured in PR checks
directly on the PR page. The following trigger phrases may be used to trigger
them all at once:

+------------------+--------------------------+
| PR target branch | Trigger required PR jobs |
+==================+==========================+
| main             | /test                    |
+------------------+--------------------------+
| v1.15            | /test-backport-1.15      |
+------------------+--------------------------+
| v1.14            | /test-backport-1.14      |
+------------------+--------------------------+
| v1.13            | /test-backport-1.13      |
+------------------+--------------------------+
| v1.12            | /test-backport-1.12      |
+------------------+--------------------------+

Pull requests submitted against older stable branches such as v1.13 may also be
subject to Jenkins CI jobs. For more information, see
`v1.13 CI <https://docs.cilium.io/en/v1.13/contributing/testing/ci/#ci-jenkins>`__.

For a full list of GHA workflows, see the `GitHub Actions Page <https://github.com/cilium/cilium/actions>`_.

Using GitHub Actions for testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On GHA, running a specific set of Ginkgo tests (``conformance-ginkgo.yaml``)
can also be accomplished by modifying the files under
``.github/actions/ginkgo/`` by adding or removing entries.

``main-focus.yaml``:

  This file contains a list of tests to include and exclude. The ``cliFocus``
  defined for each element in the "include" section is expanded to the
  specific defined ``focus``. This mapping allows us to determine which regex
  should be used with ``ginkgo --focus`` for each element in the "focus" list.
  See :ref:`ginkgo-documentation` for more information about the ``--focus`` flag.

  Additionally, there is a list of excluded tests along with justifications
  in the form of comments, explaining why each test is excluded based on
  constraints defined in the ginkgo tests.

  For more information, refer to
  `GitHub's documentation on expanding matrix configurations <https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#expanding-or-adding-matrix-configurations>`__.

``main-k8s-versions.yaml``:

  This file defines which kernel versions should be run with specific Kubernetes
  (k8s) versions. It contains an "include" section where each entry consists of
  a k8s version, IP family, Kubernetes image, and kernel version. These details
  determine the combinations of k8s versions and kernel versions to be tested.

``main-prs.yaml``:

  This file specifies the k8s versions to be executed for each pull request (PR).
  The list of k8s versions under the "k8s-version" section determines the matrix
  of jobs that should be executed for CI when triggered by PRs.

``main-scheduled.yaml``:

  This file specifies the k8s versions to be executed on a regular basis. The
  list of k8s versions under the "k8s-version" section determines the matrix of
  jobs that should be executed for CI as part of scheduled jobs.

Workflow interactions:

- The ``main-focus.yaml`` file helps define the test focus for CI jobs based on
  specific criteria, expanding the ``cliFocus`` to determine the relevant
  ``focus`` regex for ``ginkgo --focus``.

- The ``main-k8s-versions.yaml`` file defines the mapping between k8s versions
  and the associated kernel versions to be tested.

- Both ``main-prs.yaml`` and ``main-scheduled.yaml`` files utilize the
  "k8s-version" section to specify the k8s versions that should be included
  in the job matrix for PRs and scheduled jobs respectively.
- These files collectively contribute to the generation of the job matrix
  for GitHub Actions workflows, ensuring appropriate testing and validation
  of the defined k8s versions.

For example, to only run the test under ``f09-datapath-misc-2`` with Kubernetes
version 1.26, the following files can be modified to have the following content:

``main-focus.yaml``:

.. code-block:: yaml

   ---
   focus:
   - "f09-datapath-misc-2"
   include:
     - focus: "f09-datapath-misc-2"
       cliFocus: "K8sDatapathConfig Check|K8sDatapathConfig IPv4Only|K8sDatapathConfig High-scale|K8sDatapathConfig Iptables|K8sDatapathConfig IPv4Only|K8sDatapathConfig IPv6|K8sDatapathConfig Transparent"

``main-prs.yaml``:

.. code-block:: yaml

   ---
   k8s-version:
     - "1.26"

The ``main-k8s-versions.yaml`` and ``main-scheduled.yaml`` files can be left
unmodified. This results in the execution of the tests under
``f09-datapath-misc-2`` for the ``k8s-version`` "``1.26``".


Bisect process
^^^^^^^^^^^^^^

Bisecting Ginkgo tests (``conformance-ginkgo.yaml``) can be performed by
modifying the workflow file, as well as modifying the files under
``.github/actions/ginkgo/`` as explained in the previous section. The parts of
``conformance-ginkgo.yaml`` that need to be modified are marked by comments
inside that file: under the ``on`` section, enable the ``pull_request`` event
type. Additionally, the following section also needs to be modified:

.. code-block:: yaml

   jobs:
     check_changes:
       name: Deduce required tests from code changes
       [...]
       outputs:
         tested: ${{ steps.tested-tree.outputs.src }}
         matrix_sha: ${{ steps.sha.outputs.sha }}
         base_branch: ${{ steps.sha.outputs.base_branch }}
         sha: ${{ steps.sha.outputs.sha }}
         #
         # For bisect uncomment the base_branch and 'sha' lines below and comment
         # the two lines above this comment
         #
         #base_branch: <replace with the base branch name, should be 'main', not your branch name>
         #sha: <replace with the SHA of an existing docker image tag that you want to bisect>

As per the instructions, the ``base_branch`` line needs to be uncommented and
should point to the base branch name that we are testing. The ``sha`` must
point to the commit SHA that we want to bisect. **The SHA must point to an
existing image tag** under the ``quay.io/cilium/cilium-ci`` Docker image
repository.

It is possible to find out whether or not a SHA exists by running either
``docker manifest inspect`` or ``docker buildx imagetools inspect``.
This is an example output for the non-existing SHA ``22fa4bbd9a03db162f08c74c6ef260c015ecf25e``
and existing SHA ``7b368923823e63c9824ea2b5ee4dc026bc4d5cd8``:


.. code-block:: shell

   $ docker manifest inspect quay.io/cilium/cilium-ci:22fa4bbd9a03db162f08c74c6ef260c015ecf25e
   ERROR: quay.io/cilium/cilium-ci:22fa4bbd9a03db162f08c74c6ef260c015ecf25e: not found

   $ docker buildx imagetools inspect quay.io/cilium/cilium-ci:7b368923823e63c9824ea2b5ee4dc026bc4d5cd8
   Name:      quay.io/cilium/cilium-ci:7b368923823e63c9824ea2b5ee4dc026bc4d5cd8
   MediaType: application/vnd.docker.distribution.manifest.list.v2+json
   Digest:    sha256:0b7d1078570e6979c3a3b98896e4a3811bff483834771abc5969660df38463b5

   Manifests:
     Name:      quay.io/cilium/cilium-ci:7b368923823e63c9824ea2b5ee4dc026bc4d5cd8@sha256:63dbffea393df2c4cc96ff340280e92d2191b6961912f70ff3b44a0dd2b73c74
     MediaType: application/vnd.docker.distribution.manifest.v2+json
     Platform:  linux/amd64

     Name:      quay.io/cilium/cilium-ci:7b368923823e63c9824ea2b5ee4dc026bc4d5cd8@sha256:0c310ab0b7a14437abb5df46d62188f4b8b809f0a2091899b8151e5c0c578d09
     MediaType: application/vnd.docker.distribution.manifest.v2+json
     Platform:  linux/arm64

Once the changes are committed and pushed into a draft Pull Request, it is
possible to visualize the test results on the Pull Request's page.

GitHub Test Results
^^^^^^^^^^^^^^^^^^^

Once the test finishes, its result is sent to the respective Pull Request's
page.

In case of a failure, it is possible to check which test failed by going over the
summary of the test on the GitHub Workflow Run's page:


.. image:: /images/gha-summary.png
    :align: center


In this example, the test ``K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing with bpf_host``
failed. With the ``cilium-sysdumps`` artifact available for download we can
retrieve it and perform further inspection to identify the cause for the
failure. To investigate CI failures, see :ref:`ci_failure_triage`.

.. _test_matrix:

Testing matrix
^^^^^^^^^^^^^^

Up-to-date CI testing information regarding k8s - kernel version pairs can
always be found in the `Cilium CI matrix`_.

.. _Cilium CI matrix: https://docs.google.com/spreadsheets/d/1TThkqvVZxaqLR-Ela4ZrcJ0lrTJByCqrbdCjnI32_X0

.. _ci_failure_triage:

CI Failure Triage
~~~~~~~~~~~~~~~~~

This section describes the process to triage CI failures. We define 3 categories:

+------------+------------------------------------------------------------------------+
| Keyword    | Description                                                            |
+============+========================================================================+
| Flake      | Failure due to a temporary situation, such as loss of connectivity to  |
|            | external services or a bug in a system component, e.g. quay.io is      |
|            | down, VM race conditions, kube-dns bug, ...                            |
+------------+------------------------------------------------------------------------+
| CI-Bug     | Bug in the test itself that renders the test unreliable, e.g. a        |
|            | timing issue when importing a policy and failing to block until the    |
|            | policy is enforced before connectivity is verified.                    |
+------------+------------------------------------------------------------------------+
| Regression | Failure is due to a regression. All failures in the CI that are not    |
|            | caused by bugs in the test are considered regressions.                 |
+------------+------------------------------------------------------------------------+

Triage process
^^^^^^^^^^^^^^

#. Investigate the failure you are interested in and determine if it is a
   CI-Bug, Flake, or a Regression as defined in the table above.

#. Search `GitHub issues <https://github.com/cilium/cilium/issues?utf8=%E2%9C%93&q=is%3Aissue+>`_
   to see if the bug is already filed. Make sure to also include closed issues in
   your search, as a CI issue can be considered solved and then reappear.
   Good search terms are:

   - The test name, e.g.
     ::

         k8s-1.7.K8sValidatedKafkaPolicyTest Kafka Policy Tests KafkaPolicies (from (k8s-1.7.xml))

   - The line on which the test failed, e.g.
     ::

         github.com/cilium/cilium/test/k8s/kafka_policies.go:202

   - The error message, e.g.
     ::

         Failed to produce from empire-hq on topic deathstar-plan

#. If a corresponding GitHub issue exists, update it with:

   #. A link to the failing GHA build (note that the build information is
      eventually deleted).

#. If no existing GitHub issue was found, file a `new GitHub issue <https://github.com/cilium/cilium/issues/new>`_:

   #. Attach the failure case and logs from the failing test.
   #. If the failure is a new regression or a real bug:

      #. Title: ``<Short bug description>``
      #. Labels ``kind/bug`` and ``needs/triage``.

   #. If the failure is a new CI-Bug, Flake or if you are unsure:

      #. Title ``CI: <testname>: <cause>``, e.g. ``CI: K8sValidatedPolicyTest Namespaces: cannot curl service``
      #. Labels ``kind/bug/CI`` and ``needs/triage``
      #. Include the test name and whole Stacktrace section to help others find this issue.

   .. note::

      Be extra careful when you see a new flake on a PR, and want to open an
      issue. It's much more difficult to debug these without context around the
      PR and the changes it introduced. When creating an issue for a PR flake,
      include a description of the code change, the PR, or the diff. If it
      isn't related to the PR, then it should already happen in the ``main``
      branch, and a new issue isn't needed.

**Examples:**

* ``Flake, quay.io is down``
* ``Flake, DNS not ready, #3333``
* ``CI-Bug, K8sValidatedPolicyTest: Namespaces, pod not ready, #9939``
* ``Regression, k8s host policy, #1111``

Disabling GitHub Actions Workflows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. warning::
   Do not use the `GitHub web UI <https://docs.github.com/en/actions/using-workflows/disabling-and-enabling-a-workflow?tool=webui>`_
   to disable GitHub Actions workflows. It makes it difficult to find out who
   disabled the workflows and why.

Alternatives to Disabling GitHub Actions Workflows
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before proceeding, consider the following alternatives to disabling an entire
GitHub Actions workflow.

- Skip individual tests. If specific tests are causing the workflow to fail,
  disable those tests instead of disabling the workflow. When you disable a
  workflow, all the tests in the workflow stop running. This makes it easier
  to introduce new regressions that would have been caught by these tests
  otherwise.
- Remove the workflow from the list of required status checks. This way the
  workflow still runs on pull requests, but you can still merge them without
  the workflow succeeding. To remove the workflow from the required status check
  list, post a message in the `#testing Slack channel <https://cilium.slack.com/archives/C7PE7V806>`_
  and @mention people in the `cilium-maintainers team <https://github.com/orgs/cilium/teams/cilium-maintainers>`__.

Step 1: Open a GitHub Issue
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Open a GitHub issue to track activities related to fixing the workflow. If there
are existing test flake GitHub issues, list them in the tracking issue.
Find an 360 assignee for the tracking issue to avoid the situation where the workflow remains 361 disabled indefinitely because nobody is assigned to actually fix the workflow. 362 363 Step 2: Update the required status check list 364 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 365 366 If the workflow is in the required status check list, it needs to be removed 367 from the list. Notify the `cilium-maintainers team <https://github.com/orgs/cilium/teams/cilium-maintainers>`__ 368 by mentioning ``@cilium/cilium-maintainers`` in the tracking issue and ask them 369 to remove the workflow from the required status check list. 370 371 Step 3: Update the workflow configuration 372 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 373 374 Update the workflow configuration as described in the following sub-steps 375 depending on whether the workflow is triggered by the ``/test`` comment 376 or by the ``pull_request`` or ``pull_request_target`` trigger. Open a pull 377 request with your changes, have it reviewed, then merged. 378 379 .. tabs:: 380 .. group-tab:: ``/test`` comment trigger 381 382 For those workflows that get triggered by the ``/test`` comment, update 383 ariane-config.yaml and remove the workflow from ``triggers:/test:workflows`` 384 section (`an example <https://github.com/cilium/cilium/pull/29488>`_). Do not 385 remove the targeted trigger (``triggers:/ci-e2e`` for example) so that you can 386 still use the targeted trigger to run the workflow when needed. 387 388 .. group-tab:: ``pull_request`` or ``pull_request_target`` trigger 389 390 For those workflows that get triggered by the ``pull_request`` or 391 ``pull_request_target`` trigger, remove the trigger from the workflow file. 392 Do not remove the ``schedule`` trigger if the workflow has it. It is useful 393 to be able to see if the workflow has stabilized enough over time when making 394 the decision to re-enable the workflow.