# User Guide

This guide walks through an example of building a simple memcached-operator using the operator-sdk
CLI tool and controller-runtime library API. To learn how to use Ansible or Helm to create an
operator, see the [Ansible Operator User Guide][ansible_user_guide] or the [Helm Operator User
Guide][helm_user_guide]. The rest of this document will show how to program an operator in Go.

## Prerequisites

- [dep][dep_tool] version v0.5.0+.
- [git][git_tool]
- [go][go_tool] version v1.10+.
- [docker][docker_tool] version 17.03+.
- [kubectl][kubectl_tool] version v1.11.3+.
- Access to a Kubernetes v1.11.3+ cluster.

**Note**: This guide uses [minikube][minikube_tool] version v0.25.0+ as the local Kubernetes cluster and quay.io for the public registry.

## Install the Operator SDK CLI

The Operator SDK has a CLI tool that helps developers create, build, and deploy a new operator project.

Check out the desired release tag and install the SDK CLI tool. The commands below use `master`; substitute a release tag to pin a specific version:

```sh
$ mkdir -p $GOPATH/src/github.com/operator-framework
$ cd $GOPATH/src/github.com/operator-framework
$ git clone https://github.com/operator-framework/operator-sdk
$ cd operator-sdk
$ git checkout master
$ make dep
$ make install
```

This installs the CLI binary `operator-sdk` at `$GOPATH/bin`.

## Create a new project

Use the CLI to create a new memcached-operator project:

```sh
$ mkdir -p $GOPATH/src/github.com/example-inc/
$ cd $GOPATH/src/github.com/example-inc/
$ operator-sdk new memcached-operator
$ cd memcached-operator
```

To learn about the project directory structure, see the [project layout][layout_doc] doc.

#### Operator scope

A namespace-scoped operator (the default) watches and manages resources in a single namespace, whereas a cluster-scoped operator watches and manages resources cluster-wide. Namespace-scoped operators are preferred because of their flexibility. They enable decoupled upgrades, namespace isolation for failures and monitoring, and differing API definitions. However, there are use cases where a cluster-scoped operator may make sense. For example, the [cert-manager](https://github.com/jetstack/cert-manager) operator is often deployed with cluster-scoped permissions and watches so that it can manage issuing certificates for an entire cluster.

If you'd like to create your memcached-operator project to be cluster-scoped, use the following `operator-sdk new` command instead:

```sh
$ operator-sdk new memcached-operator --cluster-scoped
```

Using `--cluster-scoped` will scaffold the new operator with the following modifications:
* `deploy/operator.yaml` - Set `WATCH_NAMESPACE=""` instead of setting it to the pod's namespace
* `deploy/role.yaml` - Use `ClusterRole` instead of `Role`
* `deploy/role_binding.yaml`:
  * Use `ClusterRoleBinding` instead of `RoleBinding`
  * Use `ClusterRole` instead of `Role` for roleRef
  * Set the subject namespace to `REPLACE_NAMESPACE`. This must be changed to the namespace in which the operator is deployed.

### Manager
The main program for the operator, `cmd/manager/main.go`, initializes and runs the [Manager][manager_go_doc].

The Manager will automatically register the scheme for all custom resources defined under `pkg/apis/...` and run all controllers under `pkg/controller/...`.

The Manager can restrict the namespace that all controllers will watch for resources:
```Go
mgr, err := manager.New(cfg, manager.Options{Namespace: namespace})
```
By default this will be the namespace that the operator is running in.
To watch all namespaces, leave the namespace option empty:
```Go
mgr, err := manager.New(cfg, manager.Options{Namespace: ""})
```

## Add a new Custom Resource Definition

Add a new Custom Resource Definition (CRD) API called Memcached, with APIVersion `cache.example.com/v1alpha1` and Kind `Memcached`.

```sh
$ operator-sdk add api --api-version=cache.example.com/v1alpha1 --kind=Memcached
```

This will scaffold the Memcached resource API under `pkg/apis/cache/v1alpha1/...`.

### Define the spec and status

Modify the spec and status of the `Memcached` Custom Resource (CR) at `pkg/apis/cache/v1alpha1/memcached_types.go`:

```Go
type MemcachedSpec struct {
	// Size is the size of the memcached deployment
	Size int32 `json:"size"`
}
type MemcachedStatus struct {
	// Nodes are the names of the memcached pods
	Nodes []string `json:"nodes"`
}
```

After modifying the `*_types.go` file, always run the following command to update the generated code for that resource type:

```sh
$ operator-sdk generate k8s
```

## Add a new Controller

Add a new [Controller][controller-go-doc] to the project that will watch and reconcile the Memcached resource:

```sh
$ operator-sdk add controller --api-version=cache.example.com/v1alpha1 --kind=Memcached
```

This will scaffold a new Controller implementation under `pkg/controller/memcached/...`.

For this example, replace the generated Controller file `pkg/controller/memcached/memcached_controller.go` with the example [`memcached_controller.go`][memcached_controller] implementation.
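
Stripped of the Kubernetes client calls, the decision the example Controller makes on each pass can be sketched as a pure function. This is a hypothetical simplification for illustration only: `nextStep`, `Step`, and the sorted comparison of pod names are not taken from the example implementation, and the spec/status structs below drop the Kubernetes metadata and JSON tags.

```go
package main

import (
	"reflect"
	"sort"
)

// MemcachedSpec and MemcachedStatus mirror the fields added to memcached_types.go
// (Kubernetes metadata and JSON tags are omitted in this sketch).
type MemcachedSpec struct{ Size int32 }
type MemcachedStatus struct{ Nodes []string }

// Step is a hypothetical label for the action one reconcile pass takes.
type Step string

const (
	CreateDeployment Step = "create-deployment" // Deployment missing: create it
	ScaleDeployment  Step = "scale-deployment"  // replicas differ from spec.size: update them
	UpdateStatus     Step = "update-status"     // pod names differ from status.nodes: write them
	NoOp             Step = "no-op"             // observed state already matches the CR
)

// nextStep decides what to do for one Memcached CR. replicas is nil when the
// Deployment does not exist yet; podNames lists the pods currently backing it.
func nextStep(spec MemcachedSpec, status MemcachedStatus, replicas *int32, podNames []string) Step {
	if replicas == nil {
		return CreateDeployment
	}
	if *replicas != spec.Size {
		return ScaleDeployment
	}
	// Compare sorted copies so that pod ordering alone never triggers a status write.
	if !reflect.DeepEqual(sortedCopy(podNames), sortedCopy(status.Nodes)) {
		return UpdateStatus
	}
	return NoOp
}

// sortedCopy returns a sorted copy of in without modifying it.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}
```

Each non-`NoOp` step corresponds to one of the reconciliation bullets that follow. In a real Reconciler the create and scale steps would typically end by returning `reconcile.Result{Requeue: true}`, so that the next pass observes the new state before moving on.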

The example Controller executes the following reconciliation logic for each `Memcached` CR:
- Create a memcached Deployment if it doesn't exist
- Ensure that the Deployment size is the same as specified by the `Memcached` CR spec
- Update the `Memcached` CR status using the status writer with the names of the memcached pods

The next two subsections explain how the Controller watches resources and how the reconcile loop is triggered. Skip to the [Build](#build-and-run-the-operator) section to see how to build and run the operator.

### Resources watched by the Controller

Inspect the Controller implementation at `pkg/controller/memcached/memcached_controller.go` to see how the Controller watches resources.

The first watch is for the Memcached type as the primary resource. For each Add/Update/Delete event the reconcile loop will be sent a reconcile `Request` (a namespace/name key) for that Memcached object:

```Go
err := c.Watch(
	&source.Kind{Type: &cachev1alpha1.Memcached{}}, &handler.EnqueueRequestForObject{})
```

The next watch is for Deployments, but the event handler will map each event to a reconcile `Request` for the owner of the Deployment, which in this case is the Memcached object for which the Deployment was created. This allows the controller to watch Deployments as a secondary resource.

```Go
err := c.Watch(&source.Kind{Type: &appsv1.Deployment{}}, &handler.EnqueueRequestForOwner{
	IsController: true,
	OwnerType:    &cachev1alpha1.Memcached{},
})
```

**// TODO:** Doc on eventhandler, arbitrary mapping between watched and reconciled resource.

**// TODO:** Doc on configuring a Controller: number of workers, predicates, watching channels.

### Reconcile loop

Every Controller has a Reconciler object with a `Reconcile()` method that implements the reconcile loop.
The reconcile loop is passed the [`Request`][request-go-doc] argument, which is a Namespace/Name key used to look up the primary resource object, Memcached, from the cache:

```Go
func (r *ReconcileMemcached) Reconcile(request reconcile.Request) (reconcile.Result, error) {
	// Look up the Memcached instance for this reconcile request
	memcached := &cachev1alpha1.Memcached{}
	err := r.client.Get(context.TODO(), request.NamespacedName, memcached)
	...
}
```

Based on the return values, [`Result`][result_go_doc] and error, the `Request` may be requeued and the reconcile loop may be triggered again:

```Go
// Reconcile successful - don't requeue
return reconcile.Result{}, nil
// Reconcile failed due to error - requeue
return reconcile.Result{}, err
// Requeue for any reason other than error
return reconcile.Result{Requeue: true}, nil
```

You can also set `Result.RequeueAfter` to requeue the `Request` after a grace period:
```Go
import "time"

// Requeue for any reason other than error after 5 seconds
return reconcile.Result{RequeueAfter: 5 * time.Second}, nil
```

**Note:** Returning `Result` with `RequeueAfter` set is how you can periodically reconcile a CR.

For a guide on Reconcilers, Clients, and interacting with resource Events, see the [Client API doc][doc_client_api].

## Build and run the operator

Before running the operator, the CRD must be registered with the Kubernetes apiserver:

```sh
$ kubectl create -f deploy/crds/cache_v1alpha1_memcached_crd.yaml
```

Once this is done, there are two ways to run the operator:

- As a Deployment inside a Kubernetes cluster
- As a Go program outside a cluster

### 1. Run as a Deployment inside the cluster

Build the memcached-operator image and push it to a registry:
```sh
$ operator-sdk build quay.io/example/memcached-operator:v0.0.1
$ sed -i 's|REPLACE_IMAGE|quay.io/example/memcached-operator:v0.0.1|g' deploy/operator.yaml
$ docker push quay.io/example/memcached-operator:v0.0.1
```

If you created your operator using `--cluster-scoped`, update the service account namespace in the generated `ClusterRoleBinding` to match the namespace where you are deploying your operator:
```sh
$ export OPERATOR_NAMESPACE=$(kubectl config view --minify -o jsonpath='{.contexts[0].context.namespace}')
$ sed -i "s|REPLACE_NAMESPACE|$OPERATOR_NAMESPACE|g" deploy/role_binding.yaml
```

**Note**: If you are performing these steps on OSX, use the following commands instead:
```sh
$ sed -i "" 's|REPLACE_IMAGE|quay.io/example/memcached-operator:v0.0.1|g' deploy/operator.yaml
$ sed -i "" "s|REPLACE_NAMESPACE|$OPERATOR_NAMESPACE|g" deploy/role_binding.yaml
```

The Deployment manifest is generated at `deploy/operator.yaml`. Be sure to update the deployment image as shown above, since the default is just a placeholder.

Set up RBAC and deploy the memcached-operator:

```sh
$ kubectl create -f deploy/service_account.yaml
$ kubectl create -f deploy/role.yaml
$ kubectl create -f deploy/role_binding.yaml
$ kubectl create -f deploy/operator.yaml
```

Verify that the memcached-operator is up and running:

```sh
$ kubectl get deployment
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
memcached-operator   1         1         1            1           1m
```

### 2. Run locally outside the cluster

This method is preferred during the development cycle to deploy and test faster.

Set the name of the operator in an environment variable:

```sh
$ export OPERATOR_NAME=memcached-operator
```

Run the operator locally with the default Kubernetes config file present at `$HOME/.kube/config`:

```sh
$ operator-sdk up local --namespace=default
2018/09/30 23:10:11 Go Version: go1.10.2
2018/09/30 23:10:11 Go OS/Arch: darwin/amd64
2018/09/30 23:10:11 operator-sdk Version: 0.0.6+git
2018/09/30 23:10:12 Registering Components.
2018/09/30 23:10:12 Starting the Cmd.
```

You can use a specific kubeconfig via the flag `--kubeconfig=<path/to/kubeconfig>`.

## Create a Memcached CR

Create the example `Memcached` CR that was generated at `deploy/crds/cache_v1alpha1_memcached_cr.yaml`:

```sh
$ cat deploy/crds/cache_v1alpha1_memcached_cr.yaml
apiVersion: "cache.example.com/v1alpha1"
kind: "Memcached"
metadata:
  name: "example-memcached"
spec:
  size: 3

$ kubectl apply -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
```

Ensure that the memcached-operator creates the deployment for the CR:

```sh
$ kubectl get deployment
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
memcached-operator   1         1         1            1           2m
example-memcached    3         3         3            3           1m
```

Check the pods and CR status to confirm the status is updated with the memcached pod names:

```sh
$ kubectl get pods
NAME                                  READY     STATUS    RESTARTS   AGE
example-memcached-6fd7c98d8-7dqdr     1/1       Running   0          1m
example-memcached-6fd7c98d8-g5k7v     1/1       Running   0          1m
example-memcached-6fd7c98d8-m7vn7     1/1       Running   0          1m
memcached-operator-7cc7cfdf86-vvjqk   1/1       Running   0          2m
```

```sh
$ kubectl get memcached/example-memcached -o yaml
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  clusterName: ""
  creationTimestamp: 2018-03-31T22:51:08Z
  generation: 0
  name: example-memcached
  namespace: default
  resourceVersion: "245453"
  selfLink: /apis/cache.example.com/v1alpha1/namespaces/default/memcacheds/example-memcached
  uid: 0026cc97-3536-11e8-bd83-0800274106a1
spec:
  size: 3
status:
  nodes:
  - example-memcached-6fd7c98d8-7dqdr
  - example-memcached-6fd7c98d8-g5k7v
  - example-memcached-6fd7c98d8-m7vn7
```

### Update the size

Change the `spec.size` field in the memcached CR from 3 to 4 and apply the change:

```sh
$ cat deploy/crds/cache_v1alpha1_memcached_cr.yaml
apiVersion: "cache.example.com/v1alpha1"
kind: "Memcached"
metadata:
  name: "example-memcached"
spec:
  size: 4

$ kubectl apply -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
```

Confirm that the operator changes the deployment size:

```sh
$ kubectl get deployment
NAME                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
example-memcached   4         4         4            4           5m
```

### Cleanup

Clean up the resources:

```sh
$ kubectl delete -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
$ kubectl delete -f deploy/operator.yaml
$ kubectl delete -f deploy/role_binding.yaml
$ kubectl delete -f deploy/role.yaml
$ kubectl delete -f deploy/service_account.yaml
```

## Advanced Topics

### Adding 3rd Party Resources To Your Operator

The operator's Manager supports the core Kubernetes resource types as found in the client-go [scheme][scheme_package] package and will also register the schemes of all custom resource types defined in your project under `pkg/apis`.
```Go
import (
	"github.com/example-inc/memcached-operator/pkg/apis"
	...
)
// Setup Scheme for all resources
if err := apis.AddToScheme(mgr.GetScheme()); err != nil {
	log.Error(err, "")
	os.Exit(1)
}
```

To add a 3rd party resource to an operator, you must add it to the Manager's scheme.
By creating an `AddToScheme` method or reusing one, you can easily add a resource to your scheme. An [example][deployments_register] shows that you define a function and then use the [runtime][runtime_package] package to create a `SchemeBuilder`.

#### Register with the Manager's scheme

Call the `AddToScheme()` function for your 3rd party resource and pass it the Manager's scheme via `mgr.GetScheme()`.

Example:
```go
import (
	...
	routev1 "github.com/openshift/api/route/v1"
)

func main() {
	...
	if err := routev1.AddToScheme(mgr.GetScheme()); err != nil {
		log.Error(err, "")
		os.Exit(1)
	}
	...
}
```

After adding new import paths to your operator project, run `dep ensure` in the root of your project directory to fulfill these dependencies.

### Handle Cleanup on Deletion

To implement complex deletion logic, you can add a finalizer to your Custom Resource. This will prevent your Custom Resource from being
deleted until you remove the finalizer (i.e., after your cleanup logic has successfully run). For more information, see the
[official Kubernetes documentation on finalizers](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#finalizers).

### Metrics

To learn about how metrics work in the Operator SDK, read the [metrics section][metrics_doc] of the user documentation.

## Leader election

During the lifecycle of an operator, it's possible that more than one instance may be running at any given time, e.g. when rolling out an upgrade for the operator.
In such a scenario, it is necessary to avoid contention between multiple operator instances via leader election, so that only one leader instance handles the reconciliation while the other instances are inactive but ready to take over when the leader steps down.

There are two different leader election implementations to choose from, each with its own tradeoff.

- [Leader-for-life][leader_for_life]: The leader pod only gives up leadership (via garbage collection) when it is deleted. This implementation precludes the possibility of 2 instances mistakenly running as leaders (split brain). However, this method can be subject to a delay in electing a new leader. For instance, when the leader pod is on an unresponsive or partitioned node, the [`pod-eviction-timeout`][pod_eviction_timeout] dictates how long it takes for the leader pod to be deleted from the node and step down (default 5m).
- [Leader-with-lease][leader_with_lease]: The leader pod periodically renews the leader lease and gives up leadership when it can't renew the lease. This implementation allows for a faster transition to a new leader when the existing leader is isolated, but there is a possibility of split brain in [certain situations][lease_split_brain].

By default, the SDK enables the leader-for-life implementation. However, you should consult the docs above for both approaches to consider the tradeoffs that make sense for your use case.

The following examples illustrate how to use the two options:

### Leader for life

A call to `leader.Become()` will block the operator as it retries until it can become the leader by creating the configmap named `memcached-operator-lock`.

```Go
import (
	...
	"github.com/operator-framework/operator-sdk/pkg/leader"
)

func main() {
	...
	err = leader.Become(context.TODO(), "memcached-operator-lock")
	if err != nil {
		log.Error(err, "Failed to retry for leader lock")
		os.Exit(1)
	}
	...
}
```
If the operator is not running inside a cluster, `leader.Become()` will simply return without error to skip the leader election, since it can't detect the operator's namespace.

### Leader with lease

The leader-with-lease approach can be enabled via the [Manager Options][manager_options] for leader election.

```Go
import (
	...
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	...
	opts := manager.Options{
		...
		LeaderElection:   true,
		LeaderElectionID: "memcached-operator-lock",
	}
	mgr, err := manager.New(cfg, opts)
	...
}
```

When the operator is not running in a cluster, the Manager will return an error on starting, since it can't detect the operator's namespace in order to create the configmap for leader election. You can override this namespace by setting the Manager's `LeaderElectionNamespace` option.

[pod_eviction_timeout]: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options
[manager_options]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/manager#Options
[lease_split_brain]: https://github.com/kubernetes/client-go/blob/30b06a83d67458700a5378239df6b96948cb9160/tools/leaderelection/leaderelection.go#L21-L24
[leader_for_life]: https://godoc.org/github.com/operator-framework/operator-sdk/pkg/leader
[leader_with_lease]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/leaderelection
[memcached_handler]: ../example/memcached-operator/handler.go.tmpl
[memcached_controller]: ../example/memcached-operator/memcached_controller.go.tmpl
[layout_doc]: ./project_layout.md
[ansible_user_guide]: ./ansible/user-guide.md
[helm_user_guide]: ./helm/user-guide.md
[dep_tool]: https://golang.github.io/dep/docs/installation.html
[git_tool]: https://git-scm.com/downloads
[go_tool]: https://golang.org/dl/
[docker_tool]: https://docs.docker.com/install/
[kubectl_tool]: https://kubernetes.io/docs/tasks/tools/install-kubectl/
[minikube_tool]: https://github.com/kubernetes/minikube#installation
[scheme_package]: https://github.com/kubernetes/client-go/blob/master/kubernetes/scheme/register.go
[deployments_register]: https://github.com/kubernetes/api/blob/master/apps/v1/register.go#L41
[doc_client_api]: ./user/client.md
[runtime_package]: https://godoc.org/k8s.io/apimachinery/pkg/runtime
[manager_go_doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/manager#Manager
[controller-go-doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg#hdr-Controller
[request-go-doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/reconcile#Request
[result_go_doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/reconcile#Result
[metrics_doc]: ./user/metrics/README.md