.. only:: not (epub or latex or html)

    WARNING: You are looking at unreleased Cilium documentation.
    Please use the official rendered version released here:
    http://docs.cilium.io

.. _ci_jenkins:

CI / Jenkins
------------

The main CI infrastructure is maintained at https://jenkins.cilium.io/

Jobs Overview
~~~~~~~~~~~~~

Cilium-PR-Ginkgo-Tests-Validated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Runs validated Ginkgo tests which are confirmed to be stable and have been
verified. These tests must always pass.

The configuration for this job is contained within ``ginkgo.Jenkinsfile``.

It first runs unit tests with docker-compose, using the YAML file located at
``test/docker-compose.yaml``.

The next steps happen in parallel:

    - Runs the single-node e2e tests using the Docker runtime.
    - Runs the multi-node Kubernetes e2e tests against the latest default
      version of Kubernetes specified above.

This job can be used to run tests on custom branches. To do so, log into Jenkins and go to https://jenkins.cilium.io/job/cilium-ginkgo/configure .
Then add your branch name to the ``GitHub Organization -> cilium -> Filter by name (with wildcards) -> Include`` field and save the changes.
Once you no longer need to run tests on your branch, please remove the branch from this field.


Cilium-PR-Ginkgo-Tests-k8s
^^^^^^^^^^^^^^^^^^^^^^^^^^

Runs the Kubernetes e2e tests against all Kubernetes versions that are not
currently tested as part of each pull-request, but which Cilium still
supports, as well as the most-recently-released versions of Kubernetes that
might not be declared stable by Kubernetes upstream:

First stage:

    - 1.10
    - 1.11

Second stage (other versions):

    - 1.12
    - 1.13

Third stage:

    - 1.14
    - beta versions (1.16-beta once it's out)

Ginkgo-CI-Tests-Pipeline
^^^^^^^^^^^^^^^^^^^^^^^^

https://jenkins.cilium.io/job/Ginkgo-CI-Tests-Pipeline/

Cilium-Nightly-Tests-PR
^^^^^^^^^^^^^^^^^^^^^^^

Runs long-running tests. Some of these tests have an expected failure rate.

Nightly tests run once per day in the ``Cilium-Nightly-Tests Job``.  The
configuration for this job is stored in ``Jenkinsfile.nightly``.

To see the results of these tests, you can view the JUnit Report for an individual job:

1. Click on the build number you wish to get test results from on the left hand
   side of the ``Cilium-Nightly-Tests Job``.
2. Click on 'Test Results' on the left side of the page to view the results from the build.
   This will give you a report of which tests passed and failed. You can click on each test
   to view its corresponding output created from Ginkgo.

The job first runs the Nightly tests with the following setup:

    - 4 Kubernetes 1.8 nodes
    - 4 GB of RAM per node.
    - 4 vCPUs per node.

Then, it runs Kubernetes tests against versions of Kubernetes that are currently not tested
as part of each pull-request, but that Cilium still supports.

It also runs a variety of tests against Envoy to ensure that proxy functionality is working correctly.

.. _trigger_phrases:

Triggering Pull-Request Builds With Jenkins
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To ensure that build resources are used judiciously, builds on Jenkins
are manually triggered via comments on each pull-request that contain
"trigger-phrases". Only members of the Cilium GitHub organization are
allowed to trigger these jobs. Refer to the table below for information
regarding which phrase triggers which build, which build is required for
a pull-request to be merged, etc. Each linked job contains a description
illustrating which subset of tests the job runs.


+---------------------------------------------------------------------------------------------------------+-------------------+--------------------+
| Jenkins Job                                                                                             | Trigger Phrase    | Required To Merge? |
+=========================================================================================================+===================+====================+
| `Cilium-PR-Ginkgo-Tests-Validated <https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/>`_   | test-me-please    | Yes                |
+---------------------------------------------------------------------------------------------------------+-------------------+--------------------+
| `Cilium-PR-Ginkgo-Tests-k8s <https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-k8s/>`_               | test-missed-k8s   | No                 |
+---------------------------------------------------------------------------------------------------------+-------------------+--------------------+
| `Cilium-Nightly-Tests-PR <https://jenkins.cilium.io/job/Cilium-PR-Nightly-Tests-All/>`_                 | test-nightly      | No                 |
+---------------------------------------------------------------------------------------------------------+-------------------+--------------------+
| `Cilium-PR-Doc-Tests <https://jenkins.cilium.io/view/all/job/Cilium-PR-Doc-Tests/>`_                    | test-docs-please  | No                 |
+---------------------------------------------------------------------------------------------------------+-------------------+--------------------+
| `Cilium-PR-Kubernetes-Upstream <https://jenkins.cilium.io/view/PR/job/Cilium-PR-Kubernetes-Upstream/>`_ | test-upstream-k8s | No                 |
+---------------------------------------------------------------------------------------------------------+-------------------+--------------------+
| `Cilium-PR-Flannel <https://jenkins.cilium.io/job/Cilium-PR-Flannel/>`_                                 | test-flannel      | No                 |
+---------------------------------------------------------------------------------------------------------+-------------------+--------------------+

Some feature flags are controlled by pull-request labels:

- ``area/containerd``: Enable the containerd runtime on all Kubernetes tests.
- ``ci/net-next``: Run tests on the net-next kernel. This causes the
  ``test-me-please`` target to only run on the net-next kernel. It is purely
  for testing on a different kernel; to be merged, a PR must pass the CI
  without this flag.


Using Jenkins for testing
~~~~~~~~~~~~~~~~~~~~~~~~~

Typically, triggering Jenkins tests via one of the above trigger phrases runs
all of the tests in that particular category. However, there may be
cases where you just want to run a single test quickly on Jenkins and observe
the test result. To do so, you need to update the relevant test to have a
custom name, and to update the Jenkinsfile to focus that test. Below is an
example patch that shows how this can be achieved.

.. code-block:: diff

    diff --git a/ginkgo.Jenkinsfile b/ginkgo.Jenkinsfile
    index ee17808748a6..637f99269a41 100644
    --- a/ginkgo.Jenkinsfile
    +++ b/ginkgo.Jenkinsfile
    @@ -62,10 +62,10 @@ pipeline {
                 steps {
                     parallel(
                         "Runtime":{
    -                        sh 'cd ${TESTDIR}; ginkgo --focus="RuntimeValidated*" -v -noColor'
    +                        sh 'cd ${TESTDIR}; ginkgo --focus="XFoooo*" -v -noColor'
                         },
                         "K8s-1.9":{
    -                        sh 'cd ${TESTDIR}; K8S_VERSION=1.9 ginkgo --focus=" K8sValidated*" -v -noColor ${FAILFAST}'
    +                        sh 'cd ${TESTDIR}; K8S_VERSION=1.9 ginkgo --focus=" K8sFooooo*" -v -noColor ${FAILFAST}'
                         },
                         failFast: true
                     )
    diff --git a/test/k8sT/Nightly.go b/test/k8sT/Nightly.go
    index 62b324619797..3f955c73a818 100644
    --- a/test/k8sT/Nightly.go
    +++ b/test/k8sT/Nightly.go
    @@ -466,7 +466,7 @@ var _ = Describe("NightlyExamples", func() {

                    })

    -               It("K8sValidated Updating Cilium stable to master", func() {
    +               FIt("K8sFooooo K8sValidated Updating Cilium stable to master", func() {
                            podFilter := "k8s:zgroup=testapp"

                            //This test should run in each PR for now.

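The effect of the custom test name and the changed ``--focus`` argument can be
approximated locally. The following is a minimal sketch, assuming that Ginkgo
treats ``--focus`` as a regular expression matched against the full spec text.
Here ``grep -E`` stands in for Ginkgo's matcher (the two agree for a pattern
this simple), ``focus_filter`` is a hypothetical helper name, and all spec
names except the renamed one from the patch are made up for illustration.

.. code-block:: bash

    #!/usr/bin/env bash

    # focus_filter REGEX: print the spec names read from stdin (one per line)
    # that REGEX matches. This is a stand-in for Ginkgo's --focus matching.
    focus_filter() {
        grep -E -- "$1"
    }

    # Illustrative spec names; only the last one is taken from the patch.
    specs='RuntimeValidatedExample hypothetical runtime test
    NightlyExamples K8sValidated hypothetical other test
    NightlyExamples K8sFooooo K8sValidated Updating Cilium stable to master'

    # Same pattern as in the patched Jenkinsfile, including the leading space.
    printf '%s\n' "$specs" | focus_filter ' K8sFooooo*'

Only the renamed spec is printed, which is why giving the test a unique name
prefix is enough to single it out.
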
CI Failure Triage
~~~~~~~~~~~~~~~~~

This section describes the process to triage CI failures. We define 3 categories:

+----------------------+-----------------------------------------------------------------------------------+
| Keyword              | Description                                                                       |
+======================+===================================================================================+
| Flake                | Failure due to a temporary situation such as loss of connectivity to external     |
|                      | services or bug in system component, e.g. quay.io is down, VM race conditions,    |
|                      | kube-dns bug, ...                                                                 |
+----------------------+-----------------------------------------------------------------------------------+
| CI-Bug               | Bug in the test itself that renders the test unreliable, e.g. a timing issue when |
|                      | importing a policy and failing to block until it is enforced before connectivity  |
|                      | is verified.                                                                      |
+----------------------+-----------------------------------------------------------------------------------+
| Regression           | Failure is due to a regression. All failures in the CI that are not caused by     |
|                      | bugs in the test are considered regressions.                                      |
+----------------------+-----------------------------------------------------------------------------------+

Pipelines subject to triage
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Build/test failures for the following Jenkins pipelines must be reported as
GitHub issues using the process below:

+---------------------------------------+------------------------------------------------------------------+
| Pipeline                              | Description                                                      |
+=======================================+==================================================================+
| `Ginkgo-Tests-Validated-master`_      | Runs whenever a PR is merged into master                         |
+---------------------------------------+------------------------------------------------------------------+
| `Ginkgo-CI-Tests-Pipeline`_           | Runs every two hours on the master branch                        |
+---------------------------------------+------------------------------------------------------------------+
| `Master-Nightly`_                     | Runs durability tests every night                                |
+---------------------------------------+------------------------------------------------------------------+
| `Vagrant-Master-Boxes-Packer-Build`_  | Runs on merge into `github.com/cilium/packer-ci-build`_.         |
+---------------------------------------+------------------------------------------------------------------+
| :jenkins-branch:`Release-branch <>`   | Runs various Ginkgo tests on merge into branch "\ |SCM_BRANCH|"  |
+---------------------------------------+------------------------------------------------------------------+

.. _Ginkgo-Tests-Validated-master: https://jenkins.cilium.io/job/cilium-ginkgo/job/cilium/job/master/
.. _Ginkgo-CI-Tests-Pipeline: https://jenkins.cilium.io/job/Ginkgo-CI-Tests-Pipeline/
.. _Master-Nightly: https://jenkins.cilium.io/job/Cilium-Master-Nightly/
.. _Vagrant-Master-Boxes-Packer-Build: https://jenkins.cilium.io/job/Vagrant-Master-Boxes-Packer-Build/
.. _github.com/cilium/packer-ci-build: https://github.com/cilium/packer-ci-build/

Triage process
^^^^^^^^^^^^^^

#. Discover untriaged Jenkins failures via the ``jenkins-failures.sh`` script. It
   defaults to checking the previous 24 hours but this can be modified by
   setting the ``SINCE`` environment variable (it is a Unix timestamp). The script
   checks the various test pipelines that need triage.

   .. code-block:: bash

       $ contrib/scripts/jenkins-failures.sh

   .. note::

     You can quickly assign ``SINCE`` with statements like
     ``SINCE=$(date -d -3days +%s)`` (GNU ``date``), which yields the Unix
     timestamp from three days ago.

#. Investigate the failure you are interested in and determine if it is a
   CI-Bug, Flake, or a Regression as defined in the table above.

   #. Search `GitHub issues <https://github.com/cilium/cilium/issues?utf8=%E2%9C%93&q=is%3Aissue+>`_
      to see whether the bug has already been filed. Make sure to also include
      closed issues in your search, because a CI issue can be considered solved
      and then reappear. Good search terms are:

      - The test name, e.g.
        ::

            k8s-1.7.K8sValidatedKafkaPolicyTest Kafka Policy Tests KafkaPolicies (from (k8s-1.7.xml))

      - The line on which the test failed, e.g.
        ::

            github.com/cilium/cilium/test/k8sT/KafkaPolicies.go:202

      - The error message, e.g.
        ::

            Failed to produce from empire-hq on topic deathstar-plan

#. If a corresponding GitHub issue exists, update it as follows:

   #. Add a link to the failing Jenkins build (note that the build information is
      eventually deleted).
   #. Attach the zipfile downloaded from Jenkins with logs from the failing
      tests. A zipfile for all tests is also available.
   #. Check how much time has passed since the last reported occurrence of this
      failure and move this issue to the correct column in the `CI flakes
      project <https://github.com/cilium/cilium/projects/8>`_ board.

#. If no existing GitHub issue was found, file a `new GitHub issue <https://github.com/cilium/cilium/issues/new>`_:

   #. Attach the zipfile downloaded from Jenkins with logs from the failing test.
   #. If the failure is a new regression or a real bug:

      #. Title: ``<Short bug description>``
      #. Labels: ``kind/bug`` and ``needs/triage``.

   #. If the failure is a new CI-Bug or Flake, or if you are unsure:

      #. Title: ``CI: <testname>: <cause>``, e.g. ``CI: K8sValidatedPolicyTest Namespaces: cannot curl service``
      #. Labels: ``kind/bug/CI`` and ``needs/triage``
      #. Include a link to the failing Jenkins build (note that the build information is
         eventually deleted).
      #. Attach the zipfile downloaded from Jenkins with logs from the failing test.
      #. Include the test name and the whole Stacktrace section to help others find this issue.
      #. Add the issue to the `CI flakes project <https://github.com/cilium/cilium/projects/8>`_.

   .. note::

      Be extra careful when you see a new flake on a PR, and want to open an
      issue. It's much more difficult to debug these without context around the
      PR and the changes it introduced. When creating an issue for a PR flake,
      include a description of the code change, the PR, or the diff. If it
      isn't related to the PR, then it should already happen in master, and a
      new issue isn't needed.

#. Edit the description of the Jenkins build to mark it as triaged. This will
   exclude it from future ``jenkins-failures.sh`` output.

   #. Login -> Click on build -> Edit Build Information
   #. Add the failure type and GH issue number. Use the table describing the
      failure categories, at the beginning of this section, to help
      categorize them.

   .. note::

      This step can only be performed with an account on Jenkins. If you are
      interested in CI failure reviews and do not have an account yet, ping us
      on Slack.

**Examples:**

* ``Flake, quay.io is down``
* ``Flake, DNS not ready, #3333``
* ``CI-Bug, K8sValidatedPolicyTest: Namespaces, pod not ready, #9939``
* ``Regression, k8s host policy, #1111``

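Two mechanical parts of the triage process, computing a ``SINCE`` timestamp and
composing a build description in the format of the examples above, can be
sketched in shell. This is only an illustration under stated assumptions:
``since_days_ago`` and ``triage_description`` are hypothetical helper names
that are not part of the Cilium repository, and the timestamp is computed with
plain arithmetic so that it does not depend on GNU ``date -d``.

.. code-block:: bash

    #!/usr/bin/env bash

    # since_days_ago N: print the Unix timestamp for N days before now.
    since_days_ago() {
        local days="$1"
        echo $(( $(date +%s) - days * 86400 ))
    }

    # triage_description TYPE SUMMARY [ISSUE]: print a build description such
    # as "Flake, DNS not ready, #3333". TYPE must be one of the three failure
    # categories; the issue number is optional.
    triage_description() {
        local type="$1" summary="$2" issue="${3:-}"
        case "$type" in
            Flake|CI-Bug|Regression) ;;
            *) echo "unknown failure type: $type" >&2; return 1 ;;
        esac
        if [ -n "$issue" ]; then
            echo "$type, $summary, #$issue"
        else
            echo "$type, $summary"
        fi
    }

    # Example usage, run from the root of a Cilium checkout:
    #   SINCE=$(since_days_ago 3) contrib/scripts/jenkins-failures.sh
    #   triage_description Flake "DNS not ready" 3333

The resulting description still has to be entered by hand under "Edit Build
Information", as described in the last step of the process.
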
Infrastructure details
~~~~~~~~~~~~~~~~~~~~~~

Logging into VM running tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. If you have access to credentials for Jenkins, log into the Jenkins slave running the test workload.
2. Identify the vagrant box running the specific test:

   .. code:: bash

       $ vagrant global-status
       id       name                          provider   state   directory
       -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       6e68c6c  k8s1-build-PR-1588-6          virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q/tests/k8s
       ec5962a  cilium-master-build-PR-1588-6 virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q
       bfaffaa  k8s2-build-PR-1588-6          virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q/tests/k8s
       3fa346c  k8s1-build-PR-1588-7          virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q@2/tests/k8s
       b7ded3c  cilium-master-build-PR-1588-7 virtualbox running /root/jenkins/workspace/cilium_cilium_PR-1588-CWL743UTZEF6CPEZCNXQVSZVEW32FR3CMGKGY6667CU7X43AAZ4Q@2

3. Log into the specific VM:

   .. code:: bash

       $ JOB_BASE_NAME=PR-1588 BUILD_NUMBER=6 vagrant ssh 6e68c6c


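Steps 2 and 3 can be combined with a small filter over the ``vagrant
global-status`` output. This is a sketch only: ``vagrant_box_id`` is a
hypothetical helper name, and the box name pattern
``<vm>-build-<JOB_BASE_NAME>-<BUILD_NUMBER>`` is inferred from the sample
output above rather than from any documented guarantee.

.. code-block:: bash

    #!/usr/bin/env bash

    # vagrant_box_id NAME: read `vagrant global-status` output on stdin and
    # print the id of the first box whose name column equals NAME exactly.
    vagrant_box_id() {
        awk -v name="$1" '$2 == name { print $1; exit }'
    }

    # Example usage on the Jenkins node:
    #   id=$(vagrant global-status | vagrant_box_id k8s1-build-PR-1588-6)
    #   JOB_BASE_NAME=PR-1588 BUILD_NUMBER=6 vagrant ssh "$id"

Matching on the exact name avoids picking up a box from another build of the
same PR, such as ``k8s1-build-PR-1588-7`` in the listing above.
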
Jenkinsfiles Extensions
^^^^^^^^^^^^^^^^^^^^^^^

Cilium uses a custom `Jenkins helper library
<https://github.com/cilium/Jenkins-library>`_ to gather metadata from PRs and
simplify our Jenkinsfiles. The exported methods are:

- **ispr()**: returns true if the current build is a PR.
- **setIfPr(string, string)**: returns the first argument if the current build
  is a PR, and the second argument otherwise.
- **BuildIfLabel(String label, String Job)**: triggers a new Job if the PR has
  that specific Label.
- **Status(String status, String context)**: sets the pull request check status
  on the given context, for example ``Status("SUCCESS", "$JOB_BASE_NAME")``.
   360  - **Status(String status, String context)**: set pull request check status on
   361    the given context, example ``Status("SUCCESS", "$JOB_BASE_NAME")``