# gVisor Runtime Tests

These tests execute language runtime test suites inside gVisor. They serve as
high-level integration tests for the various runtimes.

## Runtime Test Components

The runtime tests have the following components:

-   [`images`](../../images/runtimes) - Docker images for each language runtime
    we test. Each image contains that runtime's tests, plus whatever other
    libraries or utilities are required to run them.
-   [`proctor`](proctor) - A binary that acts as an agent inside the container
    and provides a uniform command-line API to list and run the various
    language tests.
-   [`runner`](runner) - The test entrypoint invoked by `bazel run`. This binary
    spawns Docker (using the `runsc` runtime) and runs the language image with
    the `proctor` binary mounted.
-   [`exclude`](exclude) - Holds a CSV file for each language runtime, listing
    the full path of each test that should be excluded from running along with
    the reason for its exclusion.

## Testing Locally

The following `make` targets will run an entire runtime test suite locally.

Note: The Java runtime tests take more than an hour with 16 cores.

Language | Version | Running the test suite
-------- | ------- | ----------------------------------
Go       | 1.22    | `make go1.22-runtime-tests`
Java     | 21      | `make java21-runtime-tests`
Node.js  | 16.13.2 | `make nodejs16.13.2-runtime-tests`
PHP      | 8.1.1   | `make php8.1.1-runtime-tests`
Python   | 3.10.2  | `make python3.10.2-runtime-tests`

You can modify the runtime test behavior by passing in the following `make`
variables:

*   `RUNTIME_TESTS_FILTER`: Comma-separated list of tests to run, even if
    otherwise excluded. Useful for debugging a single failing test case.
*   `RUNTIME_TESTS_PER_TEST_TIMEOUT`: Per-test timeout. Useful when debugging a
    test that tends to get stuck, in order to make it fail faster.
*   `RUNTIME_TESTS_RUNS_PER_TEST`: Number of times to run each test. Useful for
    finding flaky tests.
*   `RUNTIME_TESTS_FLAKY_IS_ERROR`: Boolean indicating whether a test found to
    be flaky (i.e. it sometimes succeeded and sometimes failed across multiple
    runs) should count as a test suite failure (`true`) or success (`false`).
*   `RUNTIME_TESTS_FLAKY_SHORT_CIRCUIT`: If `true`, exit immediately once a
    test has been found flaky (i.e. it has succeeded at least once and failed
    at least once), rather than running all `RUNTIME_TESTS_RUNS_PER_TEST`
    attempts.

Example invocation:

```shell
$ make php8.1.1-runtime-tests \
    RUNTIME_TESTS_FILTER=ext/standard/tests/file/bug60120.phpt \
    RUNTIME_TESTS_PER_TEST_TIMEOUT=10s \
    RUNTIME_TESTS_RUNS_PER_TEST=100
```
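
The flaky-test variables can be combined in the same way. The following is a
minimal sketch, not something that ships with the repository: `flaky_hunt_cmd`
is a hypothetical helper that only prints the `make` command line so that it
can be reviewed before it is run. The default of 100 runs is an arbitrary
choice.

```shell
# Hypothetical helper (not part of the gVisor repo): prints a make invocation
# that reruns one test many times and treats any flakiness as a failure.
flaky_hunt_cmd() {
  target="$1"       # make target, e.g. php8.1.1-runtime-tests
  test_path="$2"    # test path, as passed to RUNTIME_TESTS_FILTER
  runs="${3:-100}"  # number of attempts; defaults to 100
  printf 'make %s RUNTIME_TESTS_FILTER=%s RUNTIME_TESTS_RUNS_PER_TEST=%s RUNTIME_TESTS_FLAKY_IS_ERROR=true RUNTIME_TESTS_FLAKY_SHORT_CIRCUIT=true\n' \
    "$target" "$test_path" "$runs"
}

# Print the command for the PHP test used in this README.
flaky_hunt_cmd php8.1.1-runtime-tests ext/standard/tests/file/bug60120.phpt
```

Copy the printed line into a shell at the repository root to actually run it.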

### Clean Up

Sometimes, when runtime tests fail or the testing container itself crashes
unexpectedly, the containers are not removed, or do not even exit. This can
cause some Docker commands, such as `docker system prune`, to hang forever.

Here are some helpful commands (execute them in order):

```bash
docker ps -a  # Lists all containers, including exited ones; useful when investigating hanging containers.
docker kill $(docker ps -q)  # Kills all running containers.
docker rm $(docker ps -a -q)  # Removes all exited containers.
docker system prune  # Removes unused data.
```
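
When no containers are left, `docker kill` and `docker rm` are invoked with an
empty argument list and report an error. The sketch below wraps the same
sequence in a hypothetical `cleanup_containers` function that skips those steps
when there is nothing to act on. It assumes the `docker` CLI is on `PATH`, and
it passes `-f` to `docker system prune` to skip the confirmation prompt, which
is a choice made here rather than something this README prescribes.

```shell
# Hypothetical helper: same steps as the commands above, but each step is
# skipped when its container list is empty.
cleanup_containers() {
  # `docker ps -q` lists only running containers.
  running=$(docker ps -q)
  if [ -n "$running" ]; then
    # Left unquoted on purpose so each container ID becomes its own argument.
    docker kill $running
  fi

  # `docker ps -a -q` lists every container, including exited ones.
  all=$(docker ps -a -q)
  if [ -n "$all" ]; then
    docker rm $all
  fi

  # Remove unused data without asking for confirmation.
  docker system prune -f
}
```

Note that this acts on every container on the host, not only those started by
the runtime tests, so it is best suited to a dedicated test machine.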

## Updating Runtime Tests

To bump the version of an existing runtime test:

1.  Update the [Docker image](../../images/runtimes) with the new runtime
    version. Rename the directory containing the `Dockerfile` and update any
    packages or download URLs to point to the new version. Test building the
    image with `docker build images/runtimes/<new_runtime>`.

2.  Update the [`runtime_test`](BUILD) target. The `name` field must be the
    name of the `Dockerfile` directory from Step 1.

3.  Update the [Buildkite pipeline](../../.buildkite/pipeline.yaml).

4.  Run the tests and triage any failures. Some language tests are flaky (or
    never pass at all); other failures may indicate a gVisor bug or a
    divergence from Linux behavior.

5.  Update the [exclude](exclude) file by renaming it to match the new version
    and adding any failing tests to it, each with a reason.
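
The renames in Steps 1 and 5 can be scripted. The following is a rough sketch
under stated assumptions: `rename_runtime` is a hypothetical helper, it must be
run from the repository root, the image directory is
`images/runtimes/<runtime>` (as in the `docker build` path in Step 1), and the
exclude file is assumed to be named `test/runtimes/exclude/<runtime>.csv`.
Check your checkout before relying on those paths.

```shell
# Hypothetical helper: renames the versioned image directory and exclude file.
# Assumes the current directory is the repository root and that the exclude
# file is named after the runtime (an assumption, not confirmed by this README).
rename_runtime() {
  old="$1"  # current runtime name, e.g. php8.1.1
  new="$2"  # new runtime name, i.e. the language plus its new version
  mv "images/runtimes/$old" "images/runtimes/$new" || return 1
  mv "test/runtimes/exclude/$old.csv" "test/runtimes/exclude/$new.csv" || return 1
  echo "renamed $old -> $new"
}
```

After the rename, the `Dockerfile` contents, the `runtime_test` target, and the
Buildkite pipeline still need to be edited by hand, and
`docker build images/runtimes/<new_runtime>` confirms that the image builds.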

### Cleaning up exclude files

Usually when the runtime is updated, a lot has changed. Tests may have been
deleted, modified (fixed or broken) or added. Once you have an exclude list
from Step 5 above with which all runtime tests pass, it is useful to clean up
the exclude files with the following steps:

1.  Check for the existence of tests in the runtime image. See how each runtime
    lists all its tests (see the `ListTests()` implementations in the
    `proctor/lib` directory). Then compare the exclude file against that list
    and remove any excluded tests that no longer exist.
2.  Run all excluded tests with runc (native) for each runtime. If a test fails
    there too, consider it broken and mark it with `Broken test` in the reason
    column. Such tests don't provide a compatibility gap signal for gVisor, so
    we can safely ignore them. Some tests that were previously broken may now
    be unbroken; for those, the reason field should be cleared.
3.  Run all the unbroken and non-flaky tests on runsc (gVisor). If a test now
    passes, remove it from the exclude list. This effectively increases our
    testing surface: the test used to fail and now passes, so something was
    fixed in between, and enabling the test is equivalent to adding a
    regression test for that fix.
4.  Some tests are excluded and marked flaky. Run these tests 100 times on runsc
    (gVisor). If a test does not flake, you can remove it from the exclude
    list.
5.  Finally, close all corresponding bugs for tests that are now passing. These
    bugs are stale.
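
The comparison in step 1 can be sketched as follows. This is an illustration
under assumptions, not the project's tooling: `stale_excludes` is a
hypothetical helper, the exclude CSV is assumed to have one header row and the
test path in its first column, test paths are assumed to contain no commas or
quotes, and the second argument is a file you have produced yourself with one
existing test path per line.

```shell
# Hypothetical helper: prints excluded tests that no longer exist.
#   $1: exclude CSV (assumed: header row first, test path in column 1)
#   $2: text file listing every existing test, one path per line
stale_excludes() {
  exclude_csv="$1"
  all_tests="$2"
  # Drop the header, keep the first column, then print only the names that do
  # not match any line of the test list exactly (-F fixed strings, -x whole
  # line, -v invert the match).
  tail -n +2 "$exclude_csv" | cut -d, -f1 | grep -Fxv -f "$all_tests"
}
```

Each printed path is a candidate for removal from the exclude file. The
function prints nothing when every excluded test still exists.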

Creating new runtime tests for an entirely new language is similar to updating
an existing runtime, except that Step 1 is a bit harder. You have to figure out
how to download and run the language tests in a Docker container. Once you have
that, you must also implement the [`proctor/TestRunner`](proctor/lib/lib.go)
interface for that language, so that proctor can list and run the tests in the
image you created.