<!--
SPDX-FileCopyrightText: 2023 The Crossplane Authors <https://crossplane.io>

SPDX-License-Identifier: CC-BY-4.0
-->
# Testing resources by using Uptest

`Uptest` provides a framework to test resources in an end-to-end pipeline during
the resource configuration process. Together with the example manifest
generation tool, it allows us to avoid manual interventions and shortens the
testing process.

These integration tests are costly because they create real resources in cloud
providers, so they are not executed by default. Instead, tests are triggered by
posting a comment on the PR.

Tests can be run by adding an expression like one of the following anywhere in
the comment:

- `/test-examples="provider-azure/examples/kubernetes/cluster.yaml"`
- `/test-examples="provider-aws/examples/s3/bucket.yaml, provider-aws/examples/eks/cluster.yaml"`

A comment triggers a test job for only one provider. The provider is determined
by the first element of the comma-separated list. If the comment contains
resources from other providers, those resources are skipped. So, if you want to
run tests for more than one provider, you must post a separate comment for each
provider.
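
The selection rule above can be sketched as a small shell function. This is a hypothetical illustration of the described behavior, not the workflow's actual implementation, and the name `select_examples` is made up:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the provider-selection rule for /test-examples.
# It is NOT the workflow's actual implementation.
# Input: the quoted value, e.g. "provider-aws/examples/s3/bucket.yaml, ...".
# Output: the selected provider on the first line, then the kept paths.
select_examples() {
  local list="$1" provider="" path kept=()
  local IFS=','
  for path in $list; do
    # Trim leading and trailing whitespace (e.g. after ", ").
    path="${path#"${path%%[![:space:]]*}"}"
    path="${path%"${path##*[![:space:]]}"}"
    [ -n "$path" ] || continue
    # The first path segment names the provider, e.g. "provider-aws".
    if [ -z "$provider" ]; then
      provider="${path%%/*}"
    fi
    if [ "${path%%/*}" = "$provider" ]; then
      kept+=("$path")
    else
      echo "skipping $path: not part of $provider" >&2
    fi
  done
  echo "$provider"
  printf '%s\n' "${kept[@]}"
}
```

For the second example comment above, `select_examples "provider-aws/examples/s3/bucket.yaml, provider-aws/examples/eks/cluster.yaml"` prints `provider-aws` followed by both paths. An Azure path in the same list would be reported as skipped.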

## Debugging Failed Test

After a test fails, it is important to understand what went wrong. To help with
debugging, we push the collected logs to GitHub Actions artifacts. These
artifacts contain the following data:

- Dump of the Kind cluster
- Kuttl input files (applied manifests, assertion files)
- Managed resource YAML outputs

To download the artifacts, first go to the `Summary` page of the relevant job:

![images/summary.png](images/summary.png)

Then click the `1` under the `Artifacts` button in the upper right. If the
automated tests ran for more than one provider, this number will be higher.

Clicking it shows the job's `Artifacts` list. You can download the artifact you
are interested in by clicking it.

![images/artifacts.png](images/artifacts.png)

When a test fails, the first place to look is the provider container's logs. In
the test environment, we run the provider with the `-d` flag to get debug logs.
The provider logs show errors caused by the content of the resource manifest or
by the configuration, as well as errors returned by the cloud provider.

The YAML output of the managed resources, located in `managed.yaml` at the root
level of the artifact archive, is also very useful for catching errors.

If you have any doubts about the generated kuttl files, check the
`kuttl-inputs.yaml` file at the root of the archive.

## Running Uptest locally

For a faster feedback loop, you might want to run `uptest` locally in your
development setup.

To do so, run the `uptest-local` target, which accepts the `PROVIDER_NAME` and
`EXAMPLE_LIST` arguments, as in the example below:

```bash
make uptest-local PROVIDER_NAME=provider-azure EXAMPLE_LIST="provider-azure/examples/resource/resourcegroup.yaml"
```

You can also pass all the files in a folder, as below:

```bash
make uptest-local PROVIDER_NAME=provider-aws EXAMPLE_LIST=$(find provider-aws/examples/secretsmanager/*.yaml | tr '\n' ',')
```
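
The `find ... | tr '\n' ','` form above leaves a trailing comma at the end of the list. If you prefer a clean comma-separated value, a small helper can join the glob-expanded file names instead. The name `example_list` is hypothetical, and this is a convenience sketch rather than a requirement of `uptest-local`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join example files into a comma-separated value for
# EXAMPLE_LIST without a trailing comma. The shell expands the glob; if it
# matches nothing, bash passes the pattern itself unless nullglob is set.
example_list() {
  [ "$#" -gt 0 ] || { echo "no example files given" >&2; return 1; }
  local IFS=','
  # "$*" joins all arguments with the first character of IFS.
  echo "$*"
}

# Usage (not executed here):
#   make uptest-local PROVIDER_NAME=provider-aws \
#     EXAMPLE_LIST="$(example_list provider-aws/examples/secretsmanager/*.yaml)"
```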

The local invocation is intentionally lightweight: it skips setting up the local
cluster, credentials and ProviderConfig, assuming you already have all of these
configured in your environment.

For a more heavyweight setup, see the `run_automated_tests` target, which is
used in the centralized GitHub Actions invocation.

## Testing Instructions and Known Error Cases

While configuring resources, testing is the most time-consuming part, because
the characteristics of cloud providers and services can change. Testing can be
done in two main ways: testing the resources manually, or using `Uptest`, an
automated test tool for Official Providers.

### Testing Methods

#### Manual Test

Configured resources can be tested manually. This method generally consists of
preparing the environment and creating the example manifests in the Kubernetes
cluster. Follow these steps to prepare the environment:

1. Obtaining a Kubernetes Cluster: For manual/local testing, a Kind cluster is
generally sufficient. For detailed information about Kind, see [this repo].
[k3d] is an alternative way to obtain a cluster.

2. Registering the CRDs (Custom Resource Definitions) to the Cluster: We need to
apply the CRD manifests to the cluster. The relevant manifests are located in
the `package/crds` folder of the provider subdirectories, such as
`provider-aws/package/crds`. To register them, run the following command from
the provider directory: `kubectl apply -f package/crds`

3. Create ProviderConfig: The ProviderConfig Custom Resource contains some
configuration and credentials for the provider. For example, to connect to the
cloud provider, we use the credentials field of the ProviderConfig. To create
the ProviderConfig with the correct credentials, see [the documentation].

4. Start Provider: For every Custom Resource there is a controller, and these
controllers are part of the provider. So, to start the reconciliations for
Custom Resources, we need to run the provider (the collection of controllers).
There are two ways to run the provider:
    - `make run`: This make target starts the controllers.
    - Running the provider in an IDE: Especially for debugging, you may want to
      use an IDE. To run the provider in an IDE, some program arguments need to
      be passed. The following example is for `provider-aws`. Values of the
      `--terraform-version`, `--terraform-provider-source` and
      `--terraform-provider-version` options can be collected from the
      provider's Makefile: `provider-aws/Makefile`
        - `-d`: To see debug level logs. `make run` also runs the provider in
          debug mode.
        - `--terraform-version 1.2.1`: Terraform version.
        - `--terraform-provider-source hashicorp/aws`: Provider source name.
        - `--terraform-provider-version 4.15.1`: Provider version.
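
Steps 2 to 4 can be sketched as shell functions to use while preparing a local environment. This is a minimal sketch under stated assumptions: the ProviderConfig schema (`aws.upbound.io/v1beta1`), the Secret name `aws-secret`, its key `creds` and the `crossplane-system` namespace are example values for `provider-aws`; the Makefile variable names `TERRAFORM_VERSION`, `TERRAFORM_PROVIDER_SOURCE` and `TERRAFORM_PROVIDER_VERSION` are assumed from the upjet provider template; all function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of preparation steps 2-4. Run from a provider directory such as
# provider-aws/ (assumption). Nothing runs until you call the functions.

# Step 2: register the CRDs, then wait until the API server reports them as
# Established so that examples applied right away do not race the CRDs.
register_crds() {
  local dir="${1:-package/crds}"
  kubectl apply -f "$dir" || return 1
  kubectl wait --for=condition=Established --timeout=120s -f "$dir"
}

# Step 3: create a credentials Secret and a ProviderConfig. Group, Secret
# name/key and namespace are example values for provider-aws (assumption).
create_aws_provider_config() {
  local creds_file="$1"   # e.g. an AWS shared-credentials (INI) file
  # Create the namespace idempotently.
  kubectl create namespace crossplane-system --dry-run=client -o yaml |
    kubectl apply -f - || return 1
  kubectl create secret generic aws-secret -n crossplane-system \
    --from-file=creds="$creds_file" || return 1
  kubectl apply -f - <<'EOF'
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-secret
      key: creds
EOF
}

# Step 4: read a variable from the provider Makefile, matching lines such as
# "export TERRAFORM_VERSION ?= 1.2.1" (variable names are assumptions).
makefile_var() {
  sed -n -E "s/^[[:space:]]*(export[[:space:]]+)?$1[[:space:]]*[?:]?=[[:space:]]*//p" "$2" | head -n 1
}

# Print the IDE program arguments listed above.
provider_args() {
  local mk="${1:-Makefile}"
  printf -- '-d --terraform-version %s --terraform-provider-source %s --terraform-provider-version %s\n' \
    "$(makefile_var TERRAFORM_VERSION "$mk")" \
    "$(makefile_var TERRAFORM_PROVIDER_SOURCE "$mk")" \
    "$(makefile_var TERRAFORM_PROVIDER_VERSION "$mk")"
}
```

For example, run `register_crds && create_aws_provider_config ./aws-credentials.txt`, then paste the output of `provider_args` into your IDE's program arguments. `create secret` fails if the Secret already exists, which is deliberate so that existing credentials are not replaced silently.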

Now the preparation steps are complete, and it is time for testing:

- Create Examples and Start Testing: After completing the steps above, your
  environment is ready for testing. For testing, we need to apply some example
  manifests to the cluster. The manifests in the `examples-generated` folder can
  be used as a first step. Before you start changing these manifests, move them
  from the `examples-generated` folder to the `examples` folder. There are two
  main reasons for this. First, the manifests in `examples-generated` are
  regenerated by every `make generate` command to catch the latest changes in
  the resources, so moving them is necessary to preserve your changes. Second,
  we use the `examples` folder as the source for keeping these manifests and for
  using them in our automated tests.
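
Moving a generated example can be wrapped in a small helper that mirrors its relative path into `examples/` and refuses to overwrite an example you have already edited. The name `promote_example` is hypothetical, and the exact layout under `examples-generated` (for example, a group folder such as `s3/`) depends on the provider:

```bash
#!/usr/bin/env bash
# Hypothetical helper: move one generated example into examples/, keeping its
# relative path, so that the next `make generate` cannot overwrite your edits.
# Run it from the provider directory (assumption).
promote_example() {
  local rel="$1"   # path relative to examples-generated/, e.g. s3/bucket.yaml
  local src="examples-generated/$rel" dst="examples/$rel"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  # Never silently replace an example you have already edited.
  if [ -e "$dst" ]; then
    echo "refusing to overwrite $dst" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")" && mv "$src" "$dst"
}
```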

In some cases, these manifests need manual intervention before they can be
applied to a cluster successfully (that is, before they pass the Kubernetes
schema validation). Possible problems you might face:

- The generated manifest may not provide a required field. In that case, set
  the required field in the manifest before creating the resource.
- The type of a value in the generated manifest may not match the field's
  type. For example, field X expects a string but the manifest provides an
  integer. In these cases, provide a value of the correct type in your example
  YAML manifest.
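
Such schema problems can be caught before anything is created by validating the manifests with a server-side dry run: `kubectl apply --dry-run=server` sends each manifest to the API server, and therefore through the registered CRD schemas, without persisting it. The function name is hypothetical. A dry run does not catch errors that the cloud provider returns later.

```bash
#!/usr/bin/env bash
# Sketch: validate example manifests against the CRD schemas in the cluster
# with a server-side dry run. Nothing is persisted, so no controller acts on
# the objects. Prints each failing file; returns non-zero if any file fails.
validate_examples() {
  local f rc=0
  for f in "$@"; do
    if ! kubectl apply --dry-run=server -f "$f" >/dev/null; then
      echo "schema validation failed: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `validate_examples examples/s3/*.yaml` checks a whole group at once.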

Successfully applying these example manifests to the cluster is only the first
step. After the Managed Resources are created, we need to check whether they
are ready: both the `Synced` and `Ready` conditions are expected to be `True`.
To quickly check the statuses of all created example resources, run the
`kubectl get managed` command, and wait for all values in this list to be
`True`:

![img.png](images/managed-all.png)

When all of the `Synced` and `Ready` fields are `True`, the test has completed
successfully. However, if some of these values are `False`, you need to debug
the situation. The main debugging approaches are described in the following
sections.
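
When many examples are applied, it can help to print only the rows that are not yet `True`. The sketch below (hypothetical function name) queries the same `managed` category with custom columns that select the `Ready` and `Synced` conditions by type, then filters with `awk`. A condition that has not been set yet appears as `<none>` and is reported too.

```bash
#!/usr/bin/env bash
# Sketch: list only the managed resources whose Ready or Synced condition is
# not "True". Columns: KIND NAME READY SYNCED; awk keeps non-True rows.
not_ready_managed() {
  kubectl get managed --no-headers -o custom-columns='KIND:.kind,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,SYNCED:.status.conditions[?(@.type=="Synced")].status' |
    awk '$3 != "True" || $4 != "True"'
}
```

An empty output means every listed resource is both `Ready` and `Synced`.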

> [!NOTE]
> To follow the test process more accurately, we have the `UpToDate` status
> condition. This status condition is visible only when you set the
> annotation `upjet.upbound.io/test=true`; without this annotation, you cannot
> see the condition. Uptest adds this annotation to the tested resources, but
> if you want to see the value of this condition during manual tests in your
> local environment, you need to add the annotation manually. For the goal and
> details of this status condition, see this PR:
> https://github.com/crossplane/upjet/pull/23
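
During a manual test, the annotation can be added with `kubectl annotate`. A minimal sketch, where the function name and the example resource reference are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: add the upjet.upbound.io/test=true annotation to one resource so
# that its UpToDate condition becomes visible, then print the annotation back.
# Dots in the annotation key are escaped for kubectl's JSONPath.
enable_test_annotation() {
  local ref="$1"   # e.g. bucket.s3.aws.upbound.io/example (hypothetical)
  kubectl annotate --overwrite "$ref" upjet.upbound.io/test=true || return 1
  kubectl get "$ref" -o jsonpath='{.metadata.annotations.upjet\.upbound\.io/test}{"\n"}'
}
```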

> [!NOTE]
> The resources you are trying to create may have dependencies. For example,
> you might need resources Y and Z to test resource X. Many of the generated
> examples include these dependencies. However, in some cases, dependencies
> may be missing. In these cases, add the relevant dependencies to your example
> manifest. This is important both for passing the tests and for providing
> correct manifests.

#### Automated Tests - Uptest

Configured resources can also be tested by using `Uptest`. There are two main
ways to use it:

##### Using Uptest in GitHub Actions

We have a GitHub workflow called `Automated Tests`, an integration test for
Official Providers. This workflow prepares the environment (provisioning a Kind
cluster, creating the ProviderConfig, installing the provider, etc.) and runs
Uptest with the list of input manifests given by the person who triggers the
test.

The `Automated Tests` job can be triggered from the PR that contains the
configuration work for the related resources or groups. To trigger the test,
leave a comment on the PR in the following format:

`/test-examples="provider-aws/examples/s3/bucket.yaml, provider-aws/examples/eks/cluster.yaml"`

We use the API group approach for `Automated Tests`: all the resources of an
API group are expected to pass the test in a single test run. This means that
when triggering tests, a comment of the following type is expected:

`/test-examples="provider-aws/examples/s3"`

This comment will test all the examples of the `s3` group.

**Ignoring Some Resources in Automated Tests**

Some resources require manual intervention, such as providing valid public keys
or using on-the-fly values. These cases can be handled in manual tests, but
where we cannot provide generic values for automated tests, we can skip those
resources in the tests of the relevant group via an annotation in the example
manifest's metadata:

```yaml
metadata:
  annotations:
    upjet.upbound.io/manual-intervention: "The Certificate needs to be provisioned successfully which requires a real domain."
```

The key is what matters for skipping: we check for the
`upjet.upbound.io/manual-intervention` annotation key, and if it is present, we
skip the related resource. The value is also important because it documents why
the resource is skipped.
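
Because skipped resources still need manual testing, it can be useful to list which examples of a group carry the annotation, together with the stated reason. A minimal sketch with a hypothetical function name, relying only on `grep` and `sed`:

```bash
#!/usr/bin/env bash
# Sketch: list the example manifests that carry the manual-intervention
# annotation (and are therefore skipped by the automated tests), together
# with the stated reason, so you know which ones to test manually.
manual_intervention_examples() {
  # grep -H prefixes each match with its file name: "file:    key: value".
  grep -H 'upjet\.upbound\.io/manual-intervention:' "$@" |
    sed -E 's#^([^:]+):[[:space:]]*upjet\.upbound\.io/manual-intervention:[[:space:]]*#\1: #'
}
```

For example, `manual_intervention_examples provider-aws/examples/acm/*.yaml` prints one `file: "reason"` line per annotated manifest.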

> [!NOTE]
> For resources that are ignored during Automated Tests, manual testing is a
> must, because we need to make sure that all resources published in the
> `v1beta1` version are working.

At the end of the tests, Uptest provides a report. In addition, every GitHub
Actions run produces an artifact that contains logs for debugging. For details,
see [here].

##### Using Uptest in Local Dev Environment

The main difference between running `Uptest` from your local environment and
running it in GitHub Actions is that GitHub Actions also prepares the
environment. When you test locally, `Uptest` is only responsible for creating
the instance manifests and their assertions. Therefore, all the preparation
steps mentioned in the Manual Test section are also necessary for tests
performed locally with `Uptest`.

After preparing the testing environment, run one of the following commands to
trigger tests locally with `Uptest`:

Example of a single-file test:

```bash
make uptest-local PROVIDER_NAME=provider-aws EXAMPLE_LIST=provider-aws/examples/secretsmanager/secret.yaml
```

Example of a whole API group test:

```bash
make uptest-local PROVIDER_NAME=provider-aws EXAMPLE_LIST=$(find provider-aws/examples/secretsmanager/*.yaml | tr '\n' ',')
```

### Debugging Tests

Whether a test fails with `Uptest` or during manual testing, the steps to
follow are the same: in both cases, what failed is a Managed Resource of an
Official Provider. The first thing to do is to check the manifest of the
failing resource (the one whose `Synced` or `Ready` condition is `False`) in
the cluster.

If the test ran in your local environment, you can check the current state of
the resource with a command like the following:
`kubectl get network.compute.gcp.upbound.io/example-network-1 -o yaml`
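
Instead of reading the full YAML, you can print just each status condition with its reason and message, followed by the events recorded for the object. The function name is hypothetical, and looking events up by `involvedObject.name` across all namespaces is an assumption that suits cluster-scoped managed resources:

```bash
#!/usr/bin/env bash
# Sketch: print every status condition of a resource with its reason and
# message, then the events recorded for it.
explain_resource() {
  local ref="$1" name="${1##*/}"   # e.g. network.compute.gcp.upbound.io/example-network-1
  kubectl get "$ref" -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}' || return 1
  kubectl get events --all-namespaces --field-selector "involvedObject.name=$name"
}
```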

If the test ran in GitHub Actions, check the action artifact described in the
Debugging Failed Test section above.

The second important source for understanding the problem is the provider logs.
If the test ran in your local environment, check the `make run` or IDE logs. If
the test ran in GitHub Actions, check the action artifact: it contains the
cluster dump, which includes the provider logs.
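
For a local run, keeping a copy of the `make run` output makes the logs searchable after the fact. A minimal sketch with hypothetical function names. It assumes only that error lines contain the word "error" in some letter case, not any particular log format:

```bash
#!/usr/bin/env bash
# Sketch: keep a copy of the local provider output for later searching, and
# grep it for error lines that mention a given resource name.
run_provider_logged() {
  # Debug logs go to both the terminal and the file.
  make run 2>&1 | tee "${1:-provider.log}"
}

provider_errors() {
  local log="$1" name="$2"
  grep -i 'error' "$log" | grep -F -- "$name"
}
```

For example, start the provider with `run_provider_logged`, and in another terminal run `provider_errors provider.log example-network-1`.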

## Known Error Cases

1. `prevent_destroy` Case: In some cases, when unexpected changes or situations
occur in the resources, Terraform tries to delete the related resource and
create it again. Resources can be configured to prevent this through the
`prevent_destroy` lifecycle setting. Please see [Terraform Resource Lifecycle]
for details. For resources in Official Providers, this value defaults to
`true`, so the deletion of the resource is blocked.

Encountering this situation (i.e. Terraform trying to delete and recreate the
resource) is not normal and may indicate a specific error. Some possible
problems are:

- The constructed ID was overridden after the Terraform calls, so Terraform
  could not match the IDs and tries to recreate the resource. Please see
  [this issue] for details. In this type of case, you need to review your
  external name configuration.
- Crossplane's concept of [Late Initialization] may cause side effects. One of
  them is that, during late initialization, filling a field that was not
  initially set in the manifest may cause the resource to be destroyed and
  recreated. In such a case, evaluate which field, once set, causes this
  error; the Terraform Registry documentation of the resource is needed for
  this evaluation. Finally, put the field that you think causes the problem
  into the ignore list by using the [late initialization configuration], and
  repeat the test from the beginning.
- Some resources fall into the `tainted` state when certain steps of the
  creation process fail. Please see [tainted issue] for details.
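
To tell whether a non-ready resource is hitting this case, you can search its condition messages, and optionally a saved provider log, for the text `prevent_destroy`, which Terraform's error for this situation typically mentions. The function name is hypothetical. Whether the Terraform error surfaces in a condition message is an assumption to check against your own output.

```bash
#!/usr/bin/env bash
# Sketch: report whether a resource seems to hit the prevent_destroy case by
# searching its condition messages and, optionally, a saved provider log.
# Assumes the Terraform error text mentioning prevent_destroy surfaces there.
hits_prevent_destroy() {
  local ref="$1" log="${2:-}"
  if kubectl get "$ref" -o jsonpath='{.status.conditions[*].message}' | grep -q 'prevent_destroy'; then
    return 0
  fi
  [ -n "$log" ] && grep -q 'prevent_destroy' "$log"
}
```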

2. External Name Configuration Related Errors: The most common known issue is
errors in the external name configuration. Such errors may not produce a clear
error message; instead, many different error messages can be caused by an
incorrect external name configuration, for example, a field that cannot be read
properly from the parameter map, or unexpected fields in the generated
`main.tf.json` file.

Therefore, when debugging a non-ready resource, if you do not see errors
returned by the Cloud API related to the constraints or characteristics of the
service (for example, you have hit the creation limit of this resource in this
region, or the use of a field for this resource depends on certain
conditions), the first point to check is the external name configuration.

3. Late Initialization Errors: Late Initialization is one of the key concepts of
Crossplane. It allows values that are not initially set in the resource's
manifest to be filled in with the values returned by the cloud provider.

As a side effect, some fields may conflict with each other. In this case, a
detailed error message usually states which fields conflict, and the relevant
field should be skipped by following [these steps].

4. Provider Service Specific Errors: Every cloud provider and every service has
its own features and behavior. Therefore, you may occasionally see
service-specific error messages in the status of the resources. For example,
they may say that a field's value is outside the allowed values, or that you
need to enable the relevant service. In such cases, review your example
manifest and try to find an appropriate example.

> [!IMPORTANT]
> The `make reviewable` and `kubectl apply -f package/crds` commands must be
> run after any change that affects the schema or controller of the
> configured/tested resource. In addition, the provider needs to be restarted
> after changes in the controllers, because controller changes are changes to
> the running code.
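
The sequence in the note above can be wrapped in a small helper (hypothetical name) that is run from the provider directory. Restarting the provider is left to you, since it depends on whether you use `make run` or an IDE:

```bash
#!/usr/bin/env bash
# Sketch of the sequence from the note above, run from the provider
# directory: regenerate and check the code, re-register the CRDs, then remind
# you to restart the provider yourself (make run or your IDE session).
refresh_after_change() {
  make reviewable || return 1
  kubectl apply -f package/crds || return 1
  echo "CRDs updated; now restart the provider (stop and re-run 'make run', or restart it in your IDE)."
}
```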

[this repo]: https://github.com/kubernetes-sigs/kind
[the documentation]: https://crossplane.io/docs/v1.9/getting-started/install-configure.html#install-configuration-package
[here]: https://github.com/upbound/official-providers/blob/main/docs/testing-resources-by-using-uptest.md#debugging-failed-test
[these steps]: https://github.com/upbound/crossplane/blob/main/docs/configuring-a-resource.md#late-initialization-configuration
[late initialization configuration]: https://github.com/upbound/crossplane/blob/main/docs/configuring-a-resource.md#late-initialization-configuration
[Terraform Resource Lifecycle]: https://learn.hashicorp.com/tutorials/terraform/resource-lifecycle
[this issue]: https://github.com/upbound/crossplane/issues/32
[Late Initialization]: https://crossplane.io/docs/v1.9/concepts/managed-resources.html#late-initialization
[tainted issue]: https://github.com/upbound/crossplane/issues/80
[k3d]: https://k3d.io/