# How to Write an E2E Test for Kubeflow Training Operator

TODO (andreyvelich): This doc is outdated. Currently, E2Es are located here:
[`sdk/python/test/e2e`](../../sdk/python/test/e2e)

The E2E tests for the Kubeflow Training Operator are implemented as Argo workflows. For more background and details
about Argo (not required for understanding the rest of this document), please take a look at
[this link](https://github.com/kubeflow/testing/blob/master/README.md).

Test results can be monitored at the [Prow dashboard](http://prow.kubeflow-testing.com/?repo=kubeflow%2Ftraining-operator).

At a high level, the E2E test suites are structured as Python test classes. Each test class contains
one or more tests. A test typically does the following:

- Creates a ksonnet component using a TFJob spec;
- Creates the specified TFJob;
- Verifies some expected results (e.g. number of pods started, job status);
- Deletes the TFJob.

## Adding a Test Method

An example can be found [here](https://github.com/kubeflow/training-operator/blob/master/py/kubeflow/tf_operator/simple_tfjob_tests.py).

A test class can have several test methods. Each method executes a series of user actions (e.g.
starting or deleting a TFJob), and verifies the expected results (e.g. the TFJob exits with the
correct status, pods are deleted, etc.).

Test classes should follow this pattern:

```python
# Import paths are assumed from the linked example; adjust them to your checkout.
from kubeflow.testing import test_util
from kubeflow.tf_operator import test_runner


class MyTest(test_util.TestCase):
  def __init__(self, args):
    # Initialize environment
    pass

  def test_case_1(self):
    # Test code
    pass

  def test_case_2(self):
    # Test code
    pass

if __name__ == "__main__":
  test_runner.main(module=__name__)
```

The code here should ideally contain only API calls.
Any common functionality used by the test code should
be added to one of the helper modules:

- `k8s_util` - for K8s operations like querying/deleting a pod
- `ks_util` - for ksonnet operations
- `tf_job_client` - for TFJob-specific operations, such as waiting for the job to be in a certain phase

## Adding a TFJob Spec

This is needed if you want to use your own TFJob spec instead of an existing one. An example can be found
[here](https://github.com/kubeflow/training-operator/tree/master/test/workflows/components/simple_tfjob_v1.jsonnet).
All TFJob specs should be placed in the same directory.

These are similar to actual TFJob specs. Note that many of them use the
[training-operator-test-server](https://github.com/kubeflow/training-operator/tree/master/test/test-server) as the test image.
This gives us more control over when each replica exits, and allows us to send specific requests, such as fetching the
runtime TensorFlow config.

## Adding a New Test Class

This is needed if you are creating a new test class. Creating a new test class is recommended if you are implementing
a new feature and want to group all relevant E2E tests together.

New test classes should be added as Argo workflow steps to the
[workflows.libsonnet](https://github.com/kubeflow/training-operator/blob/master/test/workflows/components/workflows.libsonnet) file.

Under the templates section, add the following to the dag:

```
{
  name: "my-test",
  template: "my-test",
  dependencies: ["setup-kubeflow"],
},
```

This will configure Argo to run `my-test` after setting up the Kubeflow cluster.

Next, add the following lines toward the end of the file:

```
$.parts(namespace, name, overrides).e2e(prow_env, bucket).buildTestTemplate(
  "my-test"),
```

This assumes that there is a corresponding Python file named `my_test.py` (note the difference between dashes and
underscores).
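
The dash-to-underscore naming can be illustrated with a small helper. This is only a sketch: `template_to_module` is a hypothetical name that is not part of the test framework, and it assumes the mapping is a plain character substitution plus the `.py` suffix.

```python
def template_to_module(template_name: str) -> str:
  """Return the Python file name assumed to back an Argo template name."""
  # Hypothetical helper: assumes dashes vs. underscores is the only difference.
  return template_name.replace("-", "_") + ".py"


print(template_to_module("my-test"))  # my_test.py
```

A check like this can be handy when a workflow step fails because the template name and the file name have drifted apart.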