# PSR Developer Guide

This document describes how to develop PSR workers and scenarios that can be used to test specific Verrazzano areas.
Following is a summary of the steps needed:

1. Get familiar with the PSR tool; run the example and some scenarios.
2. Decide what component you want to test.
3. Decide what you want to test for your first scenario.
4. Decide what workers you need to implement your scenario use cases.
5. Implement a single worker and test it using Helm.
6. Create or update a scenario that includes the worker.
7. Test the scenario using the PSR CLI (psrctl).
8. Repeat steps 5-7 until the scenario is complete.
9. Update the README with your worker information.

## Prerequisites
- Read the [Verrazzano PSR README](./README.md) to get familiar with the PSR concepts and the structure of the source code.
- A Kubernetes cluster with Verrazzano installed (a full installation, or just the components you are testing).

## PSR Areas
Workers are organized into areas, where each area typically maps to one or more Verrazzano backend components, but that isn't always
the case, as shown by the HTTP workers. You can see the workers in the [workers](./backend/workers) package.
PSR scenarios are also grouped into areas.

The following area names are used in the source code and YAML configuration.
They are not exposed in metrics names; rather, each `worker.go` file specifies the metrics prefix, which is the long name.
For example, the OpenSearch worker uses the metric prefix `opensearch`.

1. argo - Argo
2. oam - OAM applications, Verrazzano application operator
3. cm - cert-manager
4. cluster - Verrazzano cluster operator, multicluster
5. coh - Coherence
6. dns - ExternalDNS
7. jaeger - Jaeger
8. kc - Keycloak
9. http - HTTP tests
10. istio - Istio, Kiali
11. mysql - MySQL
12. nginx - NGINX Ingress Controller, AuthProxy
13. ops - OpenSearch, OpenSearch Dashboards, Fluentd, VMO
14. prom - Prometheus stack, Grafana
15. rancher - Rancher
16. velero - Velero
17. wls - WebLogic

## Developing a worker
As mentioned in the README, a worker is the code that implements a single use case. For example, a worker might continuously
scale OpenSearch in and out. The `DoWork` function is the code that actually does the work for one loop iteration and
is called repeatedly by the `runner`. `DoWork` does whatever it needs to do to perform work, including blocking calls or
condition checks.

### Worker Tips
Here is some important information to know about workers; much of it is repeated in the README. A sketch of the worker contract follows the list.

1. Worker code runs in a backend pod.
2. The same backend pod has the code for all the workers, but only one worker is executing.
3. Workers can have multiple threads doing work (scale up).
4. Workers can have multiple replicas (scale out).
5. Workers are configured using environment variables.
6. Workers should only do one thing (e.g., query OpenSearch).
7. All workers should emit metrics.
8. Workers must wait for their dependencies before doing work (e.g., the Verrazzano CR being ready).
9. The worker `DoWork` function is called repeatedly in a loop by the `runner`.
10. Some workers must be run in an Istio-enabled namespace (it depends on what the worker does).
11. A worker might need additional Kubernetes resources to be created (e.g., AuthorizationPolicies).
12. Workers can be run as Kubernetes Deployments or OAM apps (the default); this is specified at Helm install time.
13. All workers run as cluster-admin.
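To make the contract concrete, the following Go sketch shows the general shape of a worker. It is illustrative only: the authoritative definition is the [worker interface](./backend/spi/worker.go), and the helper types and signatures below are assumptions, not the actual PSR code.

```
// A minimal sketch of the worker contract; signatures and helper types
// are assumptions, not the actual definitions in backend/spi/worker.go.
package spi

// WorkerDesc describes a worker (hypothetical shape).
type WorkerDesc struct {
	WorkerType    string // matches PSR_WORKER_TYPE, e.g. "ops-getlogs"
	Description   string
	MetricsPrefix string // long-form metrics prefix, e.g. "opensearch"
}

// EnvVarDesc describes one configuration environment variable (hypothetical shape).
type EnvVarDesc struct {
	Key        string
	DefaultVal string
	Required   bool
}

// Worker is the contract implemented by every worker.
type Worker interface {
	GetWorkerDesc() WorkerDesc
	GetEnvDescList() []EnvVarDesc
	// PreconditionsMet reports whether the worker's dependencies (e.g., the
	// Verrazzano CR being ready) are satisfied, so the runner can wait.
	PreconditionsMet() bool
	// DoWork performs one loop iteration; the runner calls it repeatedly,
	// and it may block or check conditions as needed.
	DoWork() error
}
```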
### Worker Chart and Overrides
Workers are deployed using Helm: there is a single Helm chart for all workers, along with area-specific Helm subcharts.
Each worker specifies its value overrides in a YAML file, such as the environment variables needed to configure the
worker. If an area-specific subchart is needed, then it must be enabled in the override file.

The worker override YAML file is in `manifests/usecases/<area>/<worker>.yaml`. The only required environment variable is
`PSR_WORKER_TYPE`. For example, [usecases/opensearch/getlogs.yaml](./manifests/usecases/opensearch/getlogs.yaml):

```
global:
  envVars:
    PSR_WORKER_TYPE: ops-getlogs

# activate subchart
opensearch:
  enabled: true
```

### Sample MySQL worker
To make this section easier to follow, we will describe creating a new MySQL worker that queries the MySQL database.
In general, when creating a worker, it is easiest to copy an existing worker that does the same type of action (e.g., scale)
and modify it as needed for your component. When it makes sense, common code should be factored out and reused by multiple workers.

### Creating a worker skeleton
Following are the first steps to implement a worker (a sketch of the resulting stub follows the testing section below):

1. Add a worker type named `WorkerTypeMysqlQuery = mysql-query` to [config.go](./backend/config/config.go).
2. Create a package named `mysql` in package [workers](./backend/workers).
3. Create a file `query.go` in the `mysql` package and do the following:
    1. Stub out the [worker interface](./backend/spi/worker.go) implementation in `query.go`. You can copy the ops getlogs worker as a starting point.
    2. Change the metrics prefix constant to `metricsPrefix = "mysql_query"`.
    3. Rename the `NewGetLogsWorker` function to `NewQueryWorker`.
    4. Change the `GetWorkerDesc` function to return information about the worker.
    5. Change the `DoWork` function to `fmt.Println("hello mysql query worker")`.
4. Add your worker case to the `getWorker` function in [manager.go](./backend/workmanager/manager.go).
5. Add a directory named `mysql` to [usecases](./manifests/usecases).
6. Copy [usecases/opensearch/getlogs.yaml](./manifests/usecases/opensearch/getlogs.yaml) to a file named `usecases/mysql/query.yaml`.
7. Edit query.yaml:
    1. Change `PSR_WORKER_TYPE: ops-getlogs` to `PSR_WORKER_TYPE: mysql-query`.
    2. Remove the opensearch-authpol section.

### Testing the worker skeleton
This section shows how to test the new worker in a Kind cluster.

1. Test the example worker first by building the image, loading it into the cluster, and running the example worker:
    1. `make run-example-k8s`.
    2. Take note of the image name:tag used with the `--set` override. For example, the output might show this:
        1. `helm upgrade --install psr manifests/charts/worker --set appType=k8s --set imageName=ghcr.io/verrazzano/psr-backend:local-4210a50`
2. Run `kubectl get pods` to see the example worker, and look at the pod logs to make sure it is logging.
3. Delete the example worker:
    1. `helm delete psr`.
4. Run the mysql worker with the newly built image; an example image tag is shown below:
    1. `helm install psr manifests/charts/worker -f manifests/usecases/mysql/query.yaml --set appType=k8s --set imageName=ghcr.io/verrazzano/psr-backend:local-4210a50`
5. Look at the PSR mysql worker pod and make sure that it is logging `hello mysql query worker`.
6. Delete the mysql worker:
    1. `helm delete psr`.
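For reference, here is a hedged Go sketch of what the stubbed-out `query.go` might look like after the skeleton steps. The method set and signatures should be copied from the ops getlogs worker, so treat the shapes below as assumptions.

```
// Sketch of the skeleton query.go; the remaining interface methods
// (GetWorkerDesc, GetEnvDescList, and so on) are stubbed the same way.
package mysql

import "fmt"

const metricsPrefix = "mysql_query"

type queryWorker struct{}

// NewQueryWorker returns the stubbed MySQL query worker.
func NewQueryWorker() (queryWorker, error) {
	return queryWorker{}, nil
}

// DoWork just proves the worker is wired up end to end; the runner
// calls it in a loop once the worker is registered in getWorker.
func (w queryWorker) DoWork() error {
	fmt.Println("hello mysql query worker")
	return nil
}
```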
### Add worker-specific charts
To function properly, certain workers need additional Kubernetes resources to be created. Rather than having the worker create the
resources at runtime, you can use a subchart to create them. The subchart is shared by all workers in an area.
Since the MySQL query worker needs to access MySQL directly within the cluster, it will need an Istio AuthorizationPolicy,
just like the OpenSearch workers do. This section shows how to add the chart and use it in the use case YAML file.

1. Create a new subchart called `mysql`:
    1. Copy the opensearch chart from [manifests/charts/worker/charts/opensearch](./manifests/charts/worker/charts/opensearch) to
       [manifests/charts/worker/charts/mysql](./manifests/charts/worker/charts/mysql).
    2. Create the authorizationpolicy.yaml file with the correct policy to access MySQL.
    3. Delete the existing opensearch policy YAML files.
2. Edit the [worker Chart.yaml](./manifests/charts/worker/Chart.yaml) file and add a dependency for the mysql chart:
```
dependencies:
  - name: mysql
    repository: file://../mysql
    version: 0.1.0
    condition: mysql.enabled
```
3. Edit the worker chart's values.yaml file and add the following section:
```
# activate subchart
mysql:
  enabled: false
```
4. Edit [usecases/mysql/query.yaml](./manifests/usecases/mysql/query.yaml) and add the following section:
```
# activate subchart
mysql:
  enabled: true
```
5. You will need to install the chart in an Istio-enabled namespace.
6. Test the chart in a Verrazzano installation using the same Helm command as before, but also specify the namespace:
    1. `helm install psr manifests/charts/worker -n myns -f manifests/usecases/mysql/query.yaml --set appType=k8s --set imageName=ghcr.io/verrazzano/psr-backend:local-4210a50`.

### Add metrics to worker
Worker metrics are very important because they let us track the progress and health of a worker. Before implementing
the `DoWork` and `PreconditionsMet` functions, you should get metrics working. The reason is that you can
easily test your metrics by running your worker in an IDE, then opening your browser to http://localhost:9090/metrics.
Once you implement the real worker code (`DoWork`), you might need to run in an Istio-enabled namespace and will need
to use Prometheus or Grafana to see the metrics.

The [runner](./backend/workmanager/runner.go) also emits metrics, such as the loop count, so you don't need to emit those yourself.

1. Modify the `workerMetrics` struct to add the metrics that the worker will emit.
2. Modify the `NewQueryWorker` function to specify the metrics descriptors:
    1. Use a CounterValue metric if the value can never go down; otherwise, use GaugeValue or some other metric type.
    2. Don't specify the worker type prefix in the name field; it is automatically added to the metric name.
3. Modify the `GetMetricList` function to return the list of metrics.
4. Modify `DoWork` to update the metrics as work is done (a sketch of thread-safe updates follows this list):
    1. You might have some metrics that you cannot implement until the full `DoWork` code is done.
    2. Metric access must be thread-safe; use the atomic package like the other workers do.
5. Test the worker using the Helm chart.
6. Access the Prometheus console and query the metrics.
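Because `DoWork` can run on multiple threads, every metric update must be thread-safe. The following Go sketch shows the atomic-update pattern this guide describes; the field names and the `recordQuery` helper are hypothetical, not the actual PSR code.

```
// Sketch of thread-safe metric updates using the sync/atomic package.
package mysql

import "sync/atomic"

// workerMetrics holds the raw values behind the worker's metric descriptors.
// The fields are hypothetical examples for the MySQL query worker.
type workerMetrics struct {
	queryCount      int64 // total queries issued; never goes down, so CounterValue
	lastQueryMillis int64 // duration of the most recent query; GaugeValue
}

// recordQuery is called from DoWork, which may run on many threads at once,
// so all reads and writes go through sync/atomic.
func (m *workerMetrics) recordQuery(elapsedMillis int64) {
	atomic.AddInt64(&m.queryCount, 1)
	atomic.StoreInt64(&m.lastQueryMillis, elapsedMillis)
}
```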
### Implement the remainder of the worker code
Implement the remaining worker code in `query.go`, specifically `PreconditionsMet` and `DoWork`. Note that the query worker
doesn't need a Kubernetes client since it knows the MySQL service name. If your worker needs
to call the Kubernetes API server, then use the [k8sclient](./backend/pkg/k8sclient) package. See how the
OpenSearch [scale](./backend/workers/opensearch/scale/scale.go) worker uses `k8sclient.NewPsrClient`.

1. Implement `NewQueryWorker` to create the worker instance.
2. Change the `GetEnvDescList` function to return the configuration environment variables that the worker needs:
    1. See the OpenSearch [scale](./backend/workers/opensearch/scale/scale.go) worker for an example.
3. Implement `DoWork` (a hedged sketch follows this list). This method should not log, but if it really needs to log, then use the throttled Verrazzano logging,
   such as `Progress` or `ErrorfThrottled`.
4. Test the worker using the Helm chart.

**NOTE** The same worker instance is shared across all worker threads; there is currently no per-thread state. Workers
that keep state, such as the scaling worker, normally run in only a single thread.
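As a rough illustration, a finished `DoWork` for the query worker might look like the Go sketch below. Everything here is an assumption for illustration: the worker is presumed to hold a `database/sql` connection pool (opened in `NewQueryWorker` against the in-cluster MySQL service) plus the hypothetical `workerMetrics` from the metrics section, and the driver and query are placeholders.

```
// Sketch of a finished DoWork; configuration and logging are simplified
// (a real worker would use the throttled Verrazzano logging, if any).
package mysql

import (
	"database/sql"
	"time"

	_ "github.com/go-sql-driver/mysql" // assumed driver choice
)

type queryWorker struct {
	db      *sql.DB       // opened once in NewQueryWorker (assumption)
	metrics workerMetrics // hypothetical struct from the metrics section
}

func (w *queryWorker) DoWork() error {
	start := time.Now()
	// A trivial query that exercises the connection; a real worker would
	// run something representative of the use case.
	var one int
	if err := w.db.QueryRow("SELECT 1").Scan(&one); err != nil {
		return err
	}
	w.metrics.recordQuery(time.Since(start).Milliseconds())
	return nil
}
```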
## Creating a scenario
A scenario is a collection of use cases with a curated configuration that are run concurrently. Typically,
you should restrict the scenario use cases to a single area, but that is not a strict requirement. You can run multiple
scenarios concurrently, so creating a mixed-area scenario might not be necessary. If you do decide to create a mixed-area scenario,
then create it under an area directory called `mixed`.

### Scenario files
Scenarios are specified by a scenario.yaml file along with use case override files, one for each use case.
By convention, the files must be in the [scenarios](./manifests/scenarios) directory, structured as follows:
```
<area>/<scenario-name>/scenario.yaml
<area>/<scenario-name>/usecase-overrides/*
```
Use the long name for the area, e.g., opensearch instead of ops, or cert-manager instead of cm.
For example, the scenario to restart all OpenSearch tiers is in [restart-all-tiers](./manifests/scenarios/opensearch/restart-all-tiers).

### Scenario YAML file
The scenario YAML file describes the scenario, the use cases that comprise it, and the use case overrides for the scenario.
Following is the OpenSearch [restart-all-tiers/scenario.yaml](./manifests/scenarios/opensearch/restart-all-tiers/scenario.yaml):
```
name: opensearch-restart-all-tiers
ID: ops-rat
description: |
  This is a scenario that restarts pods on all 3 OpenSearch tiers simultaneously
usecases:
  - usecasePath: opensearch/restart.yaml
    overrideFile: restart-master.yaml
    description: restarts master nodes
  - usecasePath: opensearch/restart.yaml
    overrideFile: restart-data.yaml
    description: restarts data nodes
  - usecasePath: opensearch/restart.yaml
    overrideFile: restart-ingest.yaml
    description: restarts ingest nodes
```
The front section has the scenario name, ID, and description. The usecases section lists all the use cases that will be run when
the scenario is started. The `usecasePath` points to the built-in use case override file. The `overrideFile` specifies the file
in the `usecase-overrides` directory which contains the scenario overrides for that specific use case.

Following is the [scenario override file](./manifests/scenarios/opensearch/restart-all-tiers/usecase-overrides/restart-data.yaml) for restarting the data tier:
```
global:
  envVars:
    PSR_WORKER_TYPE: ops-restart
    OPENSEARCH_TIER: data
    PSR_LOOP_SLEEP: 5s
```
This file specifies that the data tier should be restarted every 5 seconds. The `PSR_WORKER_TYPE` is not really needed here, since it
is already set in the use case file at [usecases/opensearch/restart.yaml](./manifests/usecases/opensearch/restart.yaml); however, it
doesn't hurt to have it for documentation purposes.

## Running a scenario
Scenarios are run by the PSR command line interface, `psrctl`. The source code [manifests](./manifests) directory contains all
of the Helm charts, use case overrides, and scenario files. These manifest files are built into the psrctl binary and accessed
internally at runtime, so the psrctl binary is self-contained and there is no need for the user to provide external files. However,
you can override the scenario directory at runtime with the `-d` flag. This allows you to modify and test scenarios without having
to rebuild `psrctl`. See the `psrctl` help for details.

Scenarios always use OAM to deploy the workers; Kubernetes Deployments are not an option at this time.

### Specifying the backend image
If you build `psrctl` using make, the image tag is derived from the last commit ID. If that image has not been uploaded to
ghcr.io, you will need to run `make docker-push`. Since that image is private, you will need to provide a secret with the ghcr.io
credentials using `psrctl -p`. If you want to override the image name, use `psrctl -w`.

If you want to use a local image and load it into a Kind cluster, then run `make kind-load-image` and specify that image
using `psrctl -w`. This is the easiest way to develop and test a worker on Kind.

### Updating a running scenario
During development, you may want to update scenario override values while testing the scenario. Currently, you
can only update with `psrctl -u`, and only the `usecase-overrides` files can be changed. If you need to change a chart
or scenario.yaml, then just restart the scenario.

### Sample psrctl commands
This section shows examples of how to run some built-in scenarios.

**Show the built-in scenarios**
```
psrctl explain
```

**Start the OpenSearch scenario ops-s2 in an Istio-enabled namespace using a custom image**
```
kubectl create ns psrtest
kubectl label ns psrtest verrazzano-managed=true istio-injection=enabled
psrctl start -s ops-s2 -n psrtest -w ghcr.io/verrazzano/psr-backend:local-c2e911e
```

**Show the running scenarios in namespace psrtest, then across all namespaces**
```
psrctl list -n psrtest
psrctl list -A
```

**Update the running OpenSearch scenario ops-s2 with external scenario override files**
```
psrctl update -s ops-s2 -n psrtest -d ~/tmp/my-ops-s2
```

**Stop the running OpenSearch scenario ops-s2**
```
psrctl stop -s ops-s2 -n psrtest
```

## Source Code
The source code is organized into backend code, psrctl code, and manifest files.

### Backend code
The [backend](./backend) directory has the backend code, which consists of the following packages:
* config - configuration code
* metrics - metrics server for metrics generated by the workers
* osenv - package that allows workers to specify default and required env vars
* pkg - various packages needed by the workers
* spi - the worker interface
* workers - the various workers
* workmanager - the worker manager and runner

### Manifests
The [manifests/charts](./manifests/charts) directory has the Helm charts. There is a single worker chart that can use
either OAM or plain Kubernetes resources to deploy the backend. The default is OAM; to deploy without OAM,
use `--set appType=k8s`. There is also a subchart for each area that requires one.

The [manifests/usecases](./manifests/usecases) directory has the Helm override files for every use
case. These files must contain the configuration, as key:value pairs, required by the worker.
The use cases are organized by area.

The [manifests/scenarios](./manifests/scenarios) directory has a directory for each scenario. Each scenario directory
contains the scenario.yaml file for the scenario, along with the use case override files.
The scenarios are organized by area.

### Psrctl
The [psrctl](./psrctl) directory contains the command line interface along with supporting packages. One
thing to note is that the [embed.go](./embed.go) file is needed by the psrctl code to access the built-in
manifests. This file needs to be in the parent directory of psrctl.
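An embed.go at that level can be as small as a single `go:embed` directive; the following Go sketch is illustrative only, and the package name, variable name, and embed pattern are assumptions rather than the actual file contents.

```
// Sketch of an embed.go in the parent directory of psrctl; it compiles the
// manifests tree into the binary so psrctl needs no external files.
package psr

import "embed"

// manifestsFS holds the charts, use case overrides, and scenario files.
//
//go:embed manifests
var manifestsFS embed.FS
```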
## Summary
This document has all the information needed to create new workers and scenarios for any Verrazzano component.
When creating workers, it is easiest to use Helm to deploy and test the worker. Always start with a single worker
thread, then test with multiple threads and replicas.

When using Helm directly, you can deploy your worker as a Kubernetes Deployment or as an OAM application.
If you use a Deployment, you can start testing your stubbed-out worker without Verrazzano installed.
When you get further and need Verrazzano dependencies, or want the worker metrics scraped, you can switch to OAM.
Your worker needs to run in the target cluster, or the worker metrics won't get scraped.

If your worker needs to access resources in the mesh, like OpenSearch, you will need to create AuthorizationPolicies
(via a subchart) and will need to deploy your worker to an Istio-enabled namespace. The existing OpenSearch
workers have this requirement, so you can use one of them as a starting point. Make sure you use Prometheus to
test your worker metrics. Once the worker is running, you can create any custom scenario using YAML files, as described earlier.
Finally, use the `psrctl` CLI to run and test your scenarios.