github.com/crossplane/upjet@v1.3.0/docs/testing-with-uptest.md

# Testing resources by using Uptest

`Uptest` provides a framework to test resources in an end-to-end pipeline during
the resource configuration process. Together with the example manifest
generation tool, it allows us to avoid manual interventions and shortens the
testing process.

These integration tests are costly because they create real resources in cloud
providers, so they are not executed by default. Instead, tests are triggered by
posting a comment on the PR.

Tests can be run by adding an expression like one of the following anywhere in
a PR comment:

- `/test-examples="provider-azure/examples/kubernetes/cluster.yaml"`
- `/test-examples="provider-aws/examples/s3/bucket.yaml, provider-aws/examples/eks/cluster.yaml"`

A test job runs for a single provider. The provider is determined by the first
element of the comma-separated list; if the comment also contains resources from
other providers, those resources are skipped. To run tests for more than one
provider, post a separate comment for each provider.
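
The provider-selection rule above can be illustrated with a small POSIX shell function. This is a hypothetical sketch, not the code the workflow actually runs: the name `parse_test_examples` is made up, and it assumes that the provider is the first path segment of the first list element and that spaces around commas are ignored.

```bash
# Hypothetical sketch, not the workflow's real implementation.
# parse_test_examples COMMENT
#   Prints the selected provider, then every example path that belongs to it.
#   Paths from other providers are reported on stderr and dropped.
parse_test_examples() {
  # Take the text between /test-examples=" and the next double quote.
  list=$(printf '%s\n' "$1" | sed -n 's/.*\/test-examples="\([^"]*\)".*/\1/p')
  [ -n "$list" ] || return 1

  # One element per line, with surrounding whitespace trimmed.
  items=$(printf '%s\n' "$list" | tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')

  # Assumption: the provider is the first path segment of the first element.
  provider=$(printf '%s\n' "$items" | head -n 1 | cut -d/ -f1)
  echo "$provider"

  printf '%s\n' "$items" | while IFS= read -r item; do
    case "$item" in
      "") ;;                                   # ignore empty elements
      "$provider"/*) echo "$item" ;;           # same provider: keep
      *) echo "skipping $item (different provider)" >&2 ;;
    esac
  done
}
```

For example, `parse_test_examples 'please /test-examples="provider-aws/examples/s3/bucket.yaml, provider-azure/examples/kubernetes/cluster.yaml"'` prints `provider-aws` and the S3 path, and reports the Azure path as skipped on stderr.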

## Debugging Failed Test

After a test fails, it is important to understand what went wrong. To help with
debugging, we push the collected logs to GitHub Actions artifacts. These
artifacts contain the following data:

- Dump of the Kind cluster
- Kuttl input files (applied manifests, assertion files)
- Managed resource YAML outputs

To download the artifacts, first go to the `Summary` page of the relevant job:

![images/summary.png](images/summary.png)

Then click the `1` under the `Artifacts` button in the upper right. If the
automated tests ran for more than one provider, this number will be higher.

Clicking it shows the job's `Artifacts` list. Download the artifact you are
interested in by clicking it.

![images/artifacts.png](images/artifacts.png)

When a test fails, the first place to look is the provider container's logs. In
the test environment, we run the provider with the `-d` flag to see debug logs.
The provider logs show errors caused by the content of the resource manifest or
by the configuration, as well as errors returned by the cloud provider.

The YAML output of the managed resources (`managed.yaml` at the root level of
the artifact archive) is also very useful for catching errors.

If you have any doubts about the generated kuttl files, check the
`kuttl-inputs.yaml` file at the root of the archive.
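
To locate failing conditions in `managed.yaml` quickly, a grep wrapper is often enough. This is a minimal sketch with a hypothetical name, `failing_conditions`; it assumes the file is ordinary `kubectl get -o yaml` output, in which condition keys are printed in sorted order so that `message` and `reason` appear just above `status` and `type` just below it.

```bash
# Hypothetical helper: show every condition whose status is "False" in an
# artifact's managed.yaml, with the neighbouring keys for context.
# Assumes standard `kubectl get -o yaml` output, where condition keys are
# sorted, so `message`/`reason` precede `status` and `type` follows it.
failing_conditions() {
  file="${1:-managed.yaml}"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # -n: print line numbers; -B2/-A1: two lines before and one after each match.
  grep -n -B2 -A1 'status: "False"' "$file"
}
```

Run it from the extracted artifact as `failing_conditions managed.yaml`; the printed line numbers point you to the affected resource in the file.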

## Running Uptest locally

For a faster feedback loop, you might want to run `uptest` locally in your
development setup.

To do so, run the special `uptest-local` target, which accepts the
`PROVIDER_NAME` and `EXAMPLE_LIST` arguments as in the example below:

```bash
make uptest-local PROVIDER_NAME=provider-azure EXAMPLE_LIST="provider-azure/examples/resource/resourcegroup.yaml"
```

You may also provide all the files in a folder, as below:

```bash
make uptest-local PROVIDER_NAME=provider-aws EXAMPLE_LIST=$(find provider-aws/examples/secretsmanager/*.yaml | tr '\n' ',')
```

The local invocation is intentionally lightweight: it skips the local cluster,
credentials and ProviderConfig setup, assuming you already have all of these
configured in your environment.

For a more heavyweight setup, see the `run_automated_tests` target, which is
used in the centralized GitHub Actions invocation.

## Testing Instructions and Known Error Cases

While configuring resources, testing takes the longest, because the
characteristics of cloud providers and services can change. Testing can be done
in two main ways: testing the resources manually, or using `Uptest`, an
automated test tool for Official Providers.

### Testing Methods

#### Manual Test

Configured resources can be tested manually. This method generally consists of
preparing the environment and then creating the example manifests in the
Kubernetes cluster. Follow these steps to prepare the environment:

1. Obtaining a Kubernetes Cluster: For manual/local testing, a Kind cluster is
   generally sufficient. For detailed information about Kind, see [this repo].
   An alternative way to obtain a cluster is [k3d].

2. Registering the CRDs (Custom Resource Definitions) with the Cluster: Apply
   the CRD manifests to the cluster. The relevant manifests are located in the
   `package/crds` folder of each provider subdirectory, such as
   `provider-aws/package/crds`. To register them, run the following command:
   `kubectl apply -f package/crds`

3. Create a ProviderConfig: The ProviderConfig custom resource contains
   configuration and credentials for the provider. For example, the credentials
   field of the ProviderConfig is used to connect to the cloud provider. To
   create a ProviderConfig with the correct credentials, see [the documentation].

4. Start the Provider: Every Custom Resource has a controller, and these
   controllers are part of the provider. So, to start reconciling Custom
   Resources, we need to run the provider (a collection of controllers). There
   are two ways to run the provider:
   - `make run`: This make target starts the controllers.
   - Running the provider in an IDE: Especially for debugging, you may want to
     use an IDE. To run the provider in an IDE, you need to pass some program
     arguments. The following example is for `provider-aws`. Values for the
     `--terraform-version`, `--terraform-provider-source` and
     `--terraform-provider-version` options can be taken from the provider's
     Makefile, `provider-aws/Makefile`:
     - `-d`: Enables debug level logs. `make run` also runs the provider in
       debug mode.
     - `--terraform-version 1.2.1`: Terraform version.
     - `--terraform-provider-source hashicorp/aws`: Provider source name.
     - `--terraform-provider-version 4.15.1`: Provider version.
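
The three Terraform values can also be read from the Makefile instead of being copied by hand. The POSIX shell sketch below is hypothetical: `makefile_var` and `ide_args` are made-up names, and it assumes the Makefile declares the values in the form `export TERRAFORM_VERSION := 1.2.1`, using the variable names `TERRAFORM_VERSION`, `TERRAFORM_PROVIDER_SOURCE` and `TERRAFORM_PROVIDER_VERSION`. Check your provider's Makefile before relying on it.

```bash
# Hypothetical helpers; the variable names and the `export NAME := value`
# form are assumptions about the provider Makefile.

# makefile_var FILE NAME -> prints the value of the first `NAME := value` line.
makefile_var() {
  sed -n \
    -e 's/^[[:space:]]*export[[:space:]]\{1,\}//' \
    -e "s/^$2[[:space:]]*[?:]\{0,1\}=[[:space:]]*//p" \
    "$1" | head -n 1
}

# ide_args [FILE] -> prints the program arguments for running the provider in an IDE.
ide_args() {
  mk="${1:-Makefile}"
  tf_version=$(makefile_var "$mk" TERRAFORM_VERSION)
  tf_source=$(makefile_var "$mk" TERRAFORM_PROVIDER_SOURCE)
  tf_provider_version=$(makefile_var "$mk" TERRAFORM_PROVIDER_VERSION)
  if [ -z "$tf_version" ] || [ -z "$tf_source" ] || [ -z "$tf_provider_version" ]; then
    echo "could not find all Terraform variables in $mk" >&2
    return 1
  fi
  # -d enables debug logs, as described above.
  printf '%s\n' "-d --terraform-version $tf_version --terraform-provider-source $tf_source --terraform-provider-version $tf_provider_version"
}
```

`ide_args provider-aws/Makefile` prints a single line that can be pasted into the IDE's program-arguments field.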

Now that the environment is prepared, you can start testing:

- Create Examples and Start Testing: After completing the steps above, your
  environment is ready for testing. For testing, we need to apply some example
  manifests to the cluster. The manifests in the `examples-generated` folder can
  be used as a starting point. Before changing these manifests, move them from
  the `examples-generated` folder to the `examples` folder, for two reasons.
  First, the manifests in `examples-generated` are regenerated by every
  `make generate` run to pick up the latest changes in the resources, so moving
  them is necessary to preserve your changes. Second, we use the `examples`
  folder as the source for keeping these manifests and using them in our
  automated tests.
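
The move can be scripted so that each example keeps its relative path. A minimal sketch, assuming it runs from the provider root (for example `provider-aws/`) and that `examples-generated/` and `examples/` share the same directory layout; `promote_example` is a hypothetical name.

```bash
# Hypothetical helper: move one generated example into examples/ so that
# `make generate` no longer overwrites your edits. Run from the provider root.
promote_example() {
  src="$1"                                    # e.g. examples-generated/s3/bucket.yaml
  case "$src" in
    examples-generated/*) ;;
    *) echo "expected a path under examples-generated/: $src" >&2; return 1 ;;
  esac
  dst="examples/${src#examples-generated/}"   # same relative path under examples/
  if [ -e "$dst" ]; then
    echo "refusing to overwrite existing $dst" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")" && mv "$src" "$dst" && echo "$dst"
}
```

For example, `promote_example examples-generated/s3/bucket.yaml` moves the file to `examples/s3/bucket.yaml` and refuses to overwrite an example that already exists there.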

In some cases these manifests need manual intervention: to apply them to a
cluster successfully (that is, to pass Kubernetes schema validation), you may
need to edit them. Possible problems you might face:

- The generated manifest is missing at least one required field. Before
  creating the resource, you must set the required field in the manifest.
- The type of a value in the generated manifest does not match the field's
  type. For example, field X expects a string but the manifest provides an
  integer. In these cases, provide a value of the correct type in your example
  YAML manifest.
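
Both kinds of problems can be caught before anything is created by asking the API server to validate the manifests against the registered CRD schemas with `kubectl apply --dry-run=server`. The wrapper below is a sketch with a hypothetical name, `validate_examples`, and assumes your current kubectl context points at the test cluster with the CRDs already applied.

```bash
# Validate example manifests against the CRD schemas registered in the
# cluster without creating anything, using kubectl's server-side dry run.
validate_examples() {
  failed=0
  for f in "$@"; do
    if kubectl apply --dry-run=server -f "$f" >/dev/null; then
      echo "ok: $f"
    else
      echo "invalid: $f" >&2   # kubectl has already printed the schema error
      failed=1
    fi
  done
  return "$failed"
}
```

Running `validate_examples examples/s3/*.yaml` prints `ok:` for each valid file, lets kubectl report the schema error for each invalid one, and returns a non-zero status if any file fails.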

Successfully applying these example manifests to the cluster is only the first
step. After the Managed Resources are created, we need to check whether they are
ready: both the `Synced` and `Ready` conditions should be `True`. To quickly
check the status of all created example resources, run the
`kubectl get managed` command and wait until all values in the list are `True`:

![images/managed-all.png](images/managed-all.png)
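
Instead of reading the table by eye, you can filter it. The awk-based sketch below uses a hypothetical name, `not_ready`, and assumes the first columns of `kubectl get managed` are `NAME`, `READY` and `SYNCED`, as in the screenshot above.

```bash
# Hypothetical helper: read `kubectl get managed` output on stdin and print the
# resources whose READY or SYNCED column is not "True".
# Assumes the first columns are NAME, READY, SYNCED; an empty status column
# shifts the remaining fields, so such rows are reported too.
not_ready() {
  awk '
    $1 == "NAME" || NF == 0 { next }          # skip table headers and blank lines
    $2 != "True" || $3 != "True" {
      print $1 " ready=" $2 " synced=" $3
      bad = 1
    }
    END { exit bad }                           # non-zero while anything is not ready
  '
}
```

Pipe the command into it, for example `kubectl get managed | not_ready`; it lists the resources that still need attention and exits non-zero while any remain, so it can also be polled in a loop.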

When all of the `Synced` and `Ready` fields are `True`, the test has completed
successfully. If some resources report `False`, you need to debug them; the
main debugging approaches are described in the following sections.

> [!NOTE]
> To follow the test process more accurately, we have the `UpToDate` status
> condition. This condition is only visible when you set the annotation
> `upjet.upbound.io/test=true`. Uptest adds this annotation to the tested
> resources, but if you want to see this condition during manual tests in your
> local environment, you need to add the annotation manually. For the goal and
> details of this status condition, see this PR:
> https://github.com/crossplane/upjet/pull/23
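
Adding the annotation by hand is a single `kubectl annotate` call. The wrapper below only adds convenience and has a hypothetical name, `enable_uptodate`; pass the resource you are testing, for example `bucket.s3.aws.upbound.io/example`.

```bash
# Add the test annotation so the UpToDate condition becomes visible on a
# resource tested by hand. --overwrite makes repeated calls harmless.
enable_uptodate() {
  # $1 = resource reference, e.g. bucket.s3.aws.upbound.io/example
  kubectl annotate "$1" upjet.upbound.io/test=true --overwrite
}
```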

> [!NOTE]
> The resources you are trying to create may have dependencies. For example,
> you might need resources Y and Z to test resource X. Many of the generated
> examples include these dependencies; however, some may be missing. In these
> cases, add the missing dependencies to your example manifest. This matters
> both for passing the tests and for providing correct example manifests.

#### Automated Tests - Uptest

Configured resources can also be tested by using `Uptest`. There are two main
ways to use it:

##### Using Uptest in GitHub Actions

We have a GitHub workflow named `Automated Tests`. This is an integration test
for Official Providers. The workflow prepares the environment (provisioning a
Kind cluster, creating a ProviderConfig, installing the provider, etc.) and
runs Uptest with the list of input manifests provided by the person who
triggers the test.

The `Automated Tests` job can be triggered from the PR that contains the
configuration work for the related resources or groups. To trigger the test,
leave a comment on the PR in the following format:

`/test-examples="provider-aws/examples/s3/bucket.yaml, provider-aws/examples/eks/cluster.yaml"`

We use an API group approach for `Automated Tests`: all resources of an API
group are expected to pass the test in a single run. This means that, when
triggering tests, a comment like the following is expected:

`/test-examples="provider-aws/examples/s3"`

This comment tests all the examples of the `s3` group.

**Ignoring Some Resources in Automated Tests**

Some resources require manual intervention, such as providing valid public keys
or using values generated on the fly. These cases can be handled in manual
tests, but where we cannot provide generic values for automated tests, we can
skip such resources in the tests of the relevant group via an annotation:

```yaml
upjet.upbound.io/manual-intervention: "The Certificate needs to be provisioned successfully which requires a real domain."
```

The key is what triggers the skip: we check for the
`upjet.upbound.io/manual-intervention` annotation key and, if it is present,
skip the related resource. The value is also important, because it records why
the resource is skipped.
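
The skip rule can be approximated with grep when you want to know in advance which files an automated run will exercise. This is a simplified, hypothetical sketch (`filter_automatable` is a made-up name), not Uptest's implementation: it matches the annotation key as text, so a multi-document file containing one annotated resource is skipped as a whole, whereas the rule above skips only the annotated resource.

```bash
# Simplified, hypothetical approximation of the skip rule: print the example
# files that do NOT contain the manual-intervention annotation key.
filter_automatable() {
  for f in "$@"; do
    if grep -q 'upjet\.upbound\.io/manual-intervention:' "$f"; then
      echo "skip: $f" >&2       # needs a manual test instead
    else
      echo "$f"
    fi
  done
}
```

For example, `filter_automatable provider-aws/examples/s3/*.yaml` prints the files without the annotation and lists the skipped ones on stderr.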

> [!NOTE]
> Resources that are ignored during Automated Tests must be tested manually,
> because we need to make sure that all resources published in the `v1beta1`
> version work.

At the end of the tests, Uptest provides a report. In addition, every GitHub
Actions run produces an artifact that contains logs for debugging. For details,
see [here].

##### Using Uptest in Local Dev Environment

The main difference between running `Uptest` locally and running it in GitHub
Actions is that GitHub Actions also prepares the environment. In local runs,
`Uptest` is only responsible for creating the instance manifests and their
assertions. Therefore, all the preparation steps described in the Manual Test
section are also required for tests run with `Uptest` locally.

After preparing the testing environment, run one of the following commands to
trigger tests locally with `Uptest`:

Example for a single-file test:

```bash
make uptest-local PROVIDER_NAME=provider-aws EXAMPLE_LIST=provider-aws/examples/secretsmanager/secret.yaml
```

Example for a whole API group test:

```bash
make uptest-local PROVIDER_NAME=provider-aws EXAMPLE_LIST=$(find provider-aws/examples/secretsmanager/*.yaml | tr '\n' ',')
```

### Debugging Tests

Whether the tests fail in `Uptest` or during manual testing, the debugging
steps are the same: what ultimately failed is a Managed Resource of an Official
Provider. The first thing to do is to check the manifest of the failing
resource (the one whose `Synced` or `Ready` condition is `False`) in the
cluster.

If the test ran in your local environment, you can check the current state of
the resource with a command like the following:
`kubectl get network.compute.gcp.upbound.io/example-network-1 -o yaml`
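
To see only the conditions rather than the whole manifest, kubectl's JSONPath output can print one line per condition. A minimal sketch with a hypothetical helper name, `show_conditions`:

```bash
# Print type, status, reason and message of every condition on a resource,
# tab-separated, one condition per line (kubectl JSONPath range/end syntax).
show_conditions() {
  # $1 = resource reference, e.g. network.compute.gcp.upbound.io/example-network-1
  kubectl get "$1" -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

For example, `show_conditions network.compute.gcp.upbound.io/example-network-1` prints one line per condition; the `message` of a `False` condition usually carries the underlying error.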

If the test ran in GitHub Actions, check the action artifact described in the
Debugging Failed Test section above.

The second important source for understanding the problem is the provider
logs. If the test ran in your local environment, check the output of `make run`
or your IDE. If it ran in GitHub Actions, check the action artifact, which
contains a cluster dump that includes the provider logs.
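
For local runs, saving the provider output to a file makes it easier to search. The helper below is a hypothetical sketch (`resource_errors` is a made-up name) that does plain text matching, so it does not depend on any particular log format.

```bash
# Hypothetical helper: keep the log lines that mention a resource name and
# contain "error" (case-insensitive). Plain text matching only.
resource_errors() {
  # $1 = saved log file, $2 = resource name, e.g. example-network-1
  grep -F -- "$2" "$1" | grep -i 'error'
}
```

For instance, capture the output with `make run 2>&1 | tee provider.log`, then run `resource_errors provider.log example-network-1`.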

### Known Error Cases

1. `prevent_destroy` Case: In some cases, when unexpected changes or situations
   occur in a resource, Terraform tries to delete the resource and create it
   again. Resources can be configured to prevent this through the
   `prevent_destroy` field; see [Terraform Resource Lifecycle] for details. For
   resources in Official Providers, this value defaults to `true`, so deletion
   of the resource is blocked.

   Encountering this situation (that is, Terraform trying to delete and
   recreate the resource) is not normal and may indicate a specific error. Some
   possible causes are:

   - Overriding the constructed ID after Terraform calls can prevent Terraform
     from matching the IDs, so it tries to recreate the resource. See
     [this issue] for details. In such cases, review your external name
     configuration.
   - Crossplane's [Late Initialization] may cause side effects. For example,
     when late initialization fills a field that was not set in the original
     manifest, the resource may be destroyed and recreated. In such a case,
     determine which field's value causes this, using the Terraform registry
     documentation for the resource. Then add that field to the ignore list
     using the [late initialization configuration] and repeat the test from
     the beginning.
   - Some resources fall into a `tainted` state when certain steps of the
     creation process fail. See [tainted issue] for details.

2. External Name Configuration Related Errors: The most common known issue is
   an error in the external name configuration. A clear error message may not
   be shown for this situation; many error messages can stem from an incorrect
   external name configuration, for example a field that cannot be read
   properly from the parameter map, or unexpected fields in the generated
   `main.tf.json` file.

   Therefore, when debugging a non-ready resource, if you do not see errors
   returned by the cloud API about the constraints or characteristics of the
   service (for example, hitting the creation limit for this resource in this
   region, or a field whose use depends on certain conditions), the first thing
   to check is the external name configuration.

3. Late Initialization Errors: Late Initialization is one of the key concepts
   of Crossplane. It fills values that are not initially set in the resource's
   manifest with the values returned by the cloud provider.

   As a side effect, some fields may conflict with each other. In this case, a
   detailed error message usually shows which fields conflict, and the
   relevant field should be skipped by following [these steps].

4. Provider Service Specific Errors: Every cloud provider and every service has
   its own features and behavior, so you may occasionally see service-specific
   error messages in the status of resources. These may say that a value is
   outside the allowed values for a field, or that you need to enable the
   relevant service, etc. In such cases, review your example manifest and try
   to find an appropriate example.
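
For case 1, the Terraform refusal usually has to be found in the provider logs. The sketch below is hypothetical (`hit_prevent_destroy` is a made-up name) and assumes the logs were saved to a file and that Terraform's error text, which mentions `lifecycle.prevent_destroy`, appears in them verbatim.

```bash
# Hypothetical triage helper for the prevent_destroy case: print matching log
# lines with line numbers and succeed only if a match was found.
hit_prevent_destroy() {
  # $1 = saved provider log file
  if grep -n 'lifecycle\.prevent_destroy' "$1"; then
    echo "Terraform planned a destroy/recreate: check external name and late initialization" >&2
    return 0
  fi
  return 1
}
```

A match means Terraform planned to destroy and recreate the resource, so start with the external name and late initialization checks listed under case 1.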

> [!IMPORTANT]
> `make reviewable` and `kubectl apply -f package/crds` must be run after any
> change that affects the schema or controller of the configured/tested
> resource. In addition, the provider needs to be restarted after changes to
> the controllers, because those changes modify the running code.
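
The steps in the note above can be wrapped in one function so that a failed regeneration stops before the CRDs are re-applied. This is a hypothetical sketch (`refresh_after_change` is a made-up name) that assumes it runs from the provider root; restarting the provider is still up to you.

```bash
# Hypothetical helper: regenerate, then re-apply the CRDs, stopping at the
# first failure. Restart the provider afterwards (re-run `make run` or the IDE).
refresh_after_change() {
  make reviewable || return 1
  kubectl apply -f package/crds || return 1
  echo "CRDs updated; now restart the provider" >&2
}
```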

[this repo]: https://github.com/kubernetes-sigs/kind
[the documentation]: https://crossplane.io/docs/v1.9/getting-started/install-configure.html#install-configuration-package
[here]: https://github.com/upbound/official-providers/blob/main/docs/testing-resources-by-using-uptest.md#debugging-failed-test
[these steps]: https://github.com/upbound/crossplane/blob/main/docs/configuring-a-resource.md#late-initialization-configuration
[late initialization configuration]: https://github.com/upbound/crossplane/blob/main/docs/configuring-a-resource.md#late-initialization-configuration
[Terraform Resource Lifecycle]: https://learn.hashicorp.com/tutorials/terraform/resource-lifecycle
[this issue]: https://github.com/upbound/crossplane/issues/32
[Late Initialization]: https://crossplane.io/docs/v1.9/concepts/managed-resources.html#late-initialization
[tainted issue]: https://github.com/upbound/crossplane/issues/80
[k3d]: https://k3d.io/