github.com/theishshah/operator-sdk@v0.6.0/doc/user-guide.md (about)

     1  # User Guide
     2  
     3  This guide walks through an example of building a simple memcached-operator using the operator-sdk
     4  CLI tool and controller-runtime library API. To learn how to use Ansible or Helm to create an
     5  operator, see the [Ansible Operator User Guide][ansible_user_guide] or the [Helm Operator User
     6  Guide][helm_user_guide]. The rest of this document will show how to program an operator in Go.
     7  
     8  ## Prerequisites
     9  
    10  - [dep][dep_tool] version v0.5.0+.
    11  - [git][git_tool]
    12  - [go][go_tool] version v1.10+.
    13  - [docker][docker_tool] version 17.03+.
    14  - [kubectl][kubectl_tool] version v1.11.3+.
    15  - Access to a Kubernetes v1.11.3+ cluster.
    16  
    17  **Note**: This guide uses [minikube][minikube_tool] version v0.25.0+ as the local Kubernetes cluster and quay.io for the public registry.
    18  
    19  ## Install the Operator SDK CLI
    20  
    21  The Operator SDK has a CLI tool that helps the developer to create, build, and deploy a new operator project.
    22  
Check out the desired release tag and install the SDK CLI tool. The example below checks out `master`; substitute the release tag you want:
    24  
    25  ```sh
    26  $ mkdir -p $GOPATH/src/github.com/operator-framework
    27  $ cd $GOPATH/src/github.com/operator-framework
    28  $ git clone https://github.com/operator-framework/operator-sdk
    29  $ cd operator-sdk
    30  $ git checkout master
    31  $ make dep
    32  $ make install
    33  ```
    34  
    35  This installs the CLI binary `operator-sdk` at `$GOPATH/bin`.
    36  
    37  ## Create a new project
    38  
    39  Use the CLI to create a new memcached-operator project:
    40  
    41  ```sh
    42  $ mkdir -p $GOPATH/src/github.com/example-inc/
    43  $ cd $GOPATH/src/github.com/example-inc/
    44  $ operator-sdk new memcached-operator
    45  $ cd memcached-operator
    46  ```
    47  
To learn about the project directory structure, see the [project layout][layout_doc] doc.
    49  
### Operator scope
    51  
    52  A namespace-scoped operator (the default) watches and manages resources in a single namespace, whereas a cluster-scoped operator watches and manages resources cluster-wide. Namespace-scoped operators are preferred because of their flexibility. They enable decoupled upgrades, namespace isolation for failures and monitoring, and differing API definitions. However, there are use cases where a cluster-scoped operator may make sense. For example, the [cert-manager](https://github.com/jetstack/cert-manager) operator is often deployed with cluster-scoped permissions and watches so that it can manage issuing certificates for an entire cluster.
    53  
If you'd like your memcached-operator project to be cluster-scoped, use the following `operator-sdk new` command instead:
    55  ```
    56  $ operator-sdk new memcached-operator --cluster-scoped
    57  ```
    58  
    59  Using `--cluster-scoped` will scaffold the new operator with the following modifications:
    60  * `deploy/operator.yaml` - Set `WATCH_NAMESPACE=""` instead of setting it to the pod's namespace
    61  * `deploy/role.yaml` - Use `ClusterRole` instead of `Role`
    62  * `deploy/role_binding.yaml`:
    63    * Use `ClusterRoleBinding` instead of `RoleBinding`
    64    * Use `ClusterRole` instead of `Role` for roleRef
    65    * Set the subject namespace to `REPLACE_NAMESPACE`. This must be changed to the namespace in which the operator is deployed.
    66  
    67  ### Manager
    68  The main program for the operator `cmd/manager/main.go` initializes and runs the [Manager][manager_go_doc].
    69  
    70  The Manager will automatically register the scheme for all custom resources defined under `pkg/apis/...` and run all controllers under `pkg/controller/...`.
    71  
    72  The Manager can restrict the namespace that all controllers will watch for resources:
    73  ```Go
    74  mgr, err := manager.New(cfg, manager.Options{Namespace: namespace})
    75  ```
    76  By default this will be the namespace that the operator is running in. To watch all namespaces leave the namespace option empty:
    77  ```Go
    78  mgr, err := manager.New(cfg, manager.Options{Namespace: ""})
    79  ```
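
As a rough illustration of where the `namespace` value passed to `manager.Options` might come from, the sketch below reads the `WATCH_NAMESPACE` environment variable that `deploy/operator.yaml` sets. It uses only the standard library. The helpers `watchNamespace` and `describeScope` are hypothetical names for this example and are not part of the SDK.

```go
package main

import (
	"fmt"
	"os"
)

// watchNamespace is a hypothetical helper. It returns the value of the
// WATCH_NAMESPACE environment variable and whether the variable was set.
func watchNamespace() (string, bool) {
	return os.LookupEnv("WATCH_NAMESPACE")
}

// describeScope explains how a Manager created with this namespace would
// behave: an empty namespace means all namespaces are watched.
func describeScope(namespace string) string {
	if namespace == "" {
		return "cluster-scoped: watching all namespaces"
	}
	return "namespace-scoped: watching " + namespace
}

func main() {
	ns, found := watchNamespace()
	if !found {
		fmt.Println("WATCH_NAMESPACE is not set")
		return
	}
	fmt.Println(describeScope(ns))
}
```

An unset variable is treated differently from an empty one here, because an empty value is the explicit way to request a cluster-wide watch.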
    80  
    81  ## Add a new Custom Resource Definition
    82  
Add a new Custom Resource Definition (CRD) API called Memcached, with APIVersion `cache.example.com/v1alpha1` and Kind `Memcached`.
    84  
    85  ```sh
    86  $ operator-sdk add api --api-version=cache.example.com/v1alpha1 --kind=Memcached
    87  ```
    88  
    89  This will scaffold the Memcached resource API under `pkg/apis/cache/v1alpha1/...`.
    90  
    91  ### Define the spec and status
    92  
Modify the spec and status of the `Memcached` Custom Resource (CR) at `pkg/apis/cache/v1alpha1/memcached_types.go`:
    94  
    95  ```Go
    96  type MemcachedSpec struct {
    97  	// Size is the size of the memcached deployment
    98  	Size int32 `json:"size"`
    99  }
   100  type MemcachedStatus struct {
   101  	// Nodes are the names of the memcached pods
   102  	Nodes []string `json:"nodes"`
   103  }
   104  ```
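
To see how the `json` struct tags relate to the `spec.size` and `status.nodes` fields of a CR, the following standard-library sketch serializes the two types. The wrapper type `memcached` is a trimmed-down stand-in invented for this example; the scaffolded `Memcached` type carries additional metadata fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MemcachedSpec and MemcachedStatus mirror the types defined above.
type MemcachedSpec struct {
	Size int32 `json:"size"`
}

type MemcachedStatus struct {
	Nodes []string `json:"nodes"`
}

// memcached is a hypothetical, simplified stand-in for the scaffolded type.
type memcached struct {
	Spec   MemcachedSpec   `json:"spec"`
	Status MemcachedStatus `json:"status"`
}

// encode serializes the object to JSON using the struct tags.
func encode(m memcached) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

func main() {
	out, err := encode(memcached{
		Spec:   MemcachedSpec{Size: 3},
		Status: MemcachedStatus{Nodes: []string{"pod-a", "pod-b"}},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```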
   105  
After modifying the `*_types.go` file, always run the following command to update the generated code for that resource type:
   107  
   108  ```sh
   109  $ operator-sdk generate k8s
   110  ```
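
Assuming the generated code includes deep-copy methods, as is common for Kubernetes API types, the sketch below is a hand-written approximation of what such methods do for `MemcachedStatus`. It is not the generator's output, which may differ in detail. It shows why the code must be regenerated when fields change: each slice field needs its own copy.

```go
package main

import "fmt"

type MemcachedStatus struct {
	Nodes []string `json:"nodes"`
}

// DeepCopyInto copies the receiver into out without sharing the backing
// array of the Nodes slice.
func (in *MemcachedStatus) DeepCopyInto(out *MemcachedStatus) {
	*out = *in
	if in.Nodes != nil {
		out.Nodes = make([]string, len(in.Nodes))
		copy(out.Nodes, in.Nodes)
	}
}

// DeepCopy returns an independent copy, or nil for a nil receiver.
func (in *MemcachedStatus) DeepCopy() *MemcachedStatus {
	if in == nil {
		return nil
	}
	out := new(MemcachedStatus)
	in.DeepCopyInto(out)
	return out
}

func main() {
	original := &MemcachedStatus{Nodes: []string{"pod-a"}}
	clone := original.DeepCopy()
	clone.Nodes[0] = "pod-b" // mutating the copy leaves the original untouched
	fmt.Println(original.Nodes[0], clone.Nodes[0])
}
```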
   111  
   112  ## Add a new Controller
   113  
   114  Add a new [Controller][controller-go-doc] to the project that will watch and reconcile the Memcached resource:
   115  
   116  ```sh
   117  $ operator-sdk add controller --api-version=cache.example.com/v1alpha1 --kind=Memcached
   118  ```
   119  
   120  This will scaffold a new Controller implementation under `pkg/controller/memcached/...`.
   121  
   122  For this example replace the generated Controller file `pkg/controller/memcached/memcached_controller.go` with the example [`memcached_controller.go`][memcached_controller] implementation.
   123  
   124  The example Controller executes the following reconciliation logic for each `Memcached` CR:
   125  - Create a memcached Deployment if it doesn't exist
   126  - Ensure that the Deployment size is the same as specified by the `Memcached` CR spec
   127  - Update the `Memcached` CR status using the status writer with the names of the memcached pods
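
The three steps above can be read as a decision made on every reconcile pass. The sketch below models only that decision with plain Go values. The `action` type and the `nextAction` function are hypothetical names for this example, and the real controller works with Deployment objects through the client instead.

```go
package main

import "fmt"

// action names the step the reconcile sketch would take next.
type action string

const (
	createDeployment action = "create deployment"
	resizeDeployment action = "resize deployment"
	updateStatus     action = "update status"
)

// nextAction decides what to do for one reconcile pass.
// exists reports whether the Deployment was found, current is its replica
// count (ignored when exists is false), and desired is the CR's spec.size.
func nextAction(exists bool, current, desired int32) action {
	switch {
	case !exists:
		return createDeployment
	case current != desired:
		return resizeDeployment
	default:
		return updateStatus
	}
}

func main() {
	fmt.Println(nextAction(false, 0, 3))
	fmt.Println(nextAction(true, 3, 4))
	fmt.Println(nextAction(true, 4, 4))
}
```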
   128  
   129  The next two subsections explain how the Controller watches resources and how the reconcile loop is triggered. Skip to the [Build](#build-and-run-the-operator) section to see how to build and run the operator.
   130  
   131  ### Resources watched by the Controller
   132  
   133  Inspect the Controller implementation at `pkg/controller/memcached/memcached_controller.go` to see how the Controller watches resources.
   134  
   135  The first watch is for the Memcached type as the primary resource. For each Add/Update/Delete event the reconcile loop will be sent a reconcile `Request` (a namespace/name key) for that Memcached object:
   136  
   137  ```Go
   138  err := c.Watch(
   139    &source.Kind{Type: &cachev1alpha1.Memcached{}}, &handler.EnqueueRequestForObject{})
   140  ```
   141  
The next watch is for Deployments, but the event handler maps each event to a reconcile `Request` for the owner of the Deployment, which in this case is the Memcached object for which the Deployment was created. This allows the controller to watch Deployments as a secondary resource.
   143  
```Go
err := c.Watch(&source.Kind{Type: &appsv1.Deployment{}}, &handler.EnqueueRequestForOwner{
  IsController: true,
  OwnerType:    &cachev1alpha1.Memcached{},
})
```
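
The mapping from a secondary resource to its owner can be pictured with the simplified model below. The types `ownerRef`, `object`, and `request` are stand-ins invented for this sketch, and `requestsForOwner` only approximates the idea behind `EnqueueRequestForOwner`. It is not the library's implementation.

```go
package main

import "fmt"

// ownerRef, object, and request are simplified stand-ins for a Kubernetes
// owner reference, a watched object, and a reconcile request.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

type object struct {
	Namespace string
	Name      string
	Owners    []ownerRef
}

type request struct {
	Namespace string
	Name      string
}

// requestsForOwner returns one request per owner of the given kind. When
// isController is true, only owners flagged as the controller are considered.
func requestsForOwner(obj object, ownerKind string, isController bool) []request {
	var reqs []request
	for _, ref := range obj.Owners {
		if ref.Kind != ownerKind {
			continue
		}
		if isController && !ref.Controller {
			continue
		}
		// The owner is assumed to live in the same namespace as the object.
		reqs = append(reqs, request{Namespace: obj.Namespace, Name: ref.Name})
	}
	return reqs
}

func main() {
	deployment := object{
		Namespace: "default",
		Name:      "example-memcached",
		Owners:    []ownerRef{{Kind: "Memcached", Name: "example-memcached", Controller: true}},
	}
	fmt.Println(requestsForOwner(deployment, "Memcached", true))
}
```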
   150  
   151  **// TODO:** Doc on eventhandler, arbitrary mapping between watched and reconciled resource.
   152  
   153  **// TODO:** Doc on configuring a Controller: number of workers, predicates, watching channels,
   154  
   155  ### Reconcile loop
   156  
Every Controller has a Reconciler object with a `Reconcile()` method that implements the reconcile loop. The reconcile loop is passed the [`Request`][request-go-doc] argument, which is a Namespace/Name key used to look up the primary resource object, Memcached, from the cache:
   158  
   159  ```Go
   160  func (r *ReconcileMemcached) Reconcile(request reconcile.Request) (reconcile.Result, error) {
   161    // Lookup the Memcached instance for this reconcile request
   162    memcached := &cachev1alpha1.Memcached{}
   163    err := r.client.Get(context.TODO(), request.NamespacedName, memcached)
   164    ...
   165  }  
   166  ```
   167  
   168  Based on the return values, [`Result`][result_go_doc] and error, the `Request` may be requeued and the reconcile loop may be triggered again:
   169  
   170  ```Go
   171  // Reconcile successful - don't requeue
   172  return reconcile.Result{}, nil
   173  // Reconcile failed due to error - requeue
   174  return reconcile.Result{}, err
   175  // Requeue for any reason other than error
   176  return reconcile.Result{Requeue: true}, nil
   177  ```
   178  
You can also set `Result.RequeueAfter` to requeue the `Request` after a grace period:
```Go
import "time"

// Requeue for any reason other than an error after 5 seconds
return reconcile.Result{RequeueAfter: time.Second*5}, nil
```
   186  
   187  **Note:** Returning `Result` with `RequeueAfter` set is how you can periodically reconcile a CR.
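
To summarize how the return values combine, here is a simplified model of the requeue decision. The `result` type is a stand-in for `reconcile.Result`, and `requeueDecision` is a hypothetical function. The order of the checks and the use of rate limiting are assumptions made for illustration, not a copy of controller-runtime's internals.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// result is a simplified stand-in for reconcile.Result.
type result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// errExample is a placeholder error used by the demonstration below.
var errExample = errors.New("example failure")

// requeueDecision describes what would happen to the request next.
// An error takes precedence over the fields of the result.
func requeueDecision(res result, err error) string {
	switch {
	case err != nil:
		return "requeue with rate limiting (error)"
	case res.RequeueAfter > 0:
		return "requeue after " + res.RequeueAfter.String()
	case res.Requeue:
		return "requeue with rate limiting"
	default:
		return "done"
	}
}

func main() {
	fmt.Println(requeueDecision(result{}, nil))
	fmt.Println(requeueDecision(result{}, errExample))
	fmt.Println(requeueDecision(result{Requeue: true}, nil))
	fmt.Println(requeueDecision(result{RequeueAfter: 5 * time.Second}, nil))
}
```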
   188  
   189  For a guide on Reconcilers, Clients, and interacting with resource Events, see the [Client API doc][doc_client_api].
   190  
   191  ## Build and run the operator
   192  
   193  Before running the operator, the CRD must be registered with the Kubernetes apiserver:
   194  
   195  ```sh
   196  $ kubectl create -f deploy/crds/cache_v1alpha1_memcached_crd.yaml
   197  ```
   198  
   199  Once this is done, there are two ways to run the operator:
   200  
   201  - As a Deployment inside a Kubernetes cluster
- As a Go program outside a cluster
   203  
   204  ### 1. Run as a Deployment inside the cluster
   205  
   206  Build the memcached-operator image and push it to a registry:
   207  ```
   208  $ operator-sdk build quay.io/example/memcached-operator:v0.0.1
   209  $ sed -i 's|REPLACE_IMAGE|quay.io/example/memcached-operator:v0.0.1|g' deploy/operator.yaml
   210  $ docker push quay.io/example/memcached-operator:v0.0.1
   211  ```
   212  
   213  If you created your operator using `--cluster-scoped=true`, update the service account namespace in the generated `ClusterRoleBinding` to match where you are deploying your operator.
   214  ```
   215  $ export OPERATOR_NAMESPACE=$(kubectl config view --minify -o jsonpath='{.contexts[0].context.namespace}')
   216  $ sed -i "s|REPLACE_NAMESPACE|$OPERATOR_NAMESPACE|g" deploy/role_binding.yaml
   217  ```
   218  
**Note**: If you are performing these steps on macOS, use the following commands instead:
   221  ```
   222  $ sed -i "" 's|REPLACE_IMAGE|quay.io/example/memcached-operator:v0.0.1|g' deploy/operator.yaml
   223  $ sed -i "" "s|REPLACE_NAMESPACE|$OPERATOR_NAMESPACE|g" deploy/role_binding.yaml
   224  ```
   225  
   226  The Deployment manifest is generated at `deploy/operator.yaml`. Be sure to update the deployment image as shown above since the default is just a placeholder.
   227  
Set up RBAC and deploy the memcached-operator:
   229  
   230  ```sh
   231  $ kubectl create -f deploy/service_account.yaml
   232  $ kubectl create -f deploy/role.yaml
   233  $ kubectl create -f deploy/role_binding.yaml
   234  $ kubectl create -f deploy/operator.yaml
   235  ```
   236  
   237  Verify that the memcached-operator is up and running:
   238  
   239  ```sh
   240  $ kubectl get deployment
   241  NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
   242  memcached-operator       1         1         1            1           1m
   243  ```
   244  
   245  ### 2. Run locally outside the cluster
   246  
This method is preferred during the development cycle because it makes deploying and testing faster.
   248  
   249  Set the name of the operator in an environment variable:
   250  
   251  ```sh
   252  export OPERATOR_NAME=memcached-operator
   253  ```
   254  
   255  Run the operator locally with the default Kubernetes config file present at `$HOME/.kube/config`:
   256  
   257  ```sh
   258  $ operator-sdk up local --namespace=default
   259  2018/09/30 23:10:11 Go Version: go1.10.2
   260  2018/09/30 23:10:11 Go OS/Arch: darwin/amd64
   261  2018/09/30 23:10:11 operator-sdk Version: 0.0.6+git
   262  2018/09/30 23:10:12 Registering Components.
   263  2018/09/30 23:10:12 Starting the Cmd.
   264  ```
   265  
   266  You can use a specific kubeconfig via the flag `--kubeconfig=<path/to/kubeconfig>`.
   267  
   268  ## Create a Memcached CR
   269  
   270  Create the example `Memcached` CR that was generated at `deploy/crds/cache_v1alpha1_memcached_cr.yaml`:
   271  
   272  ```sh
   273  $ cat deploy/crds/cache_v1alpha1_memcached_cr.yaml
   274  apiVersion: "cache.example.com/v1alpha1"
   275  kind: "Memcached"
   276  metadata:
   277    name: "example-memcached"
   278  spec:
   279    size: 3
   280  
   281  $ kubectl apply -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
   282  ```
   283  
   284  Ensure that the memcached-operator creates the deployment for the CR:
   285  
   286  ```sh
   287  $ kubectl get deployment
   288  NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
   289  memcached-operator       1         1         1            1           2m
   290  example-memcached        3         3         3            3           1m
   291  ```
   292  
   293  Check the pods and CR status to confirm the status is updated with the memcached pod names:
   294  
   295  ```sh
   296  $ kubectl get pods
   297  NAME                                  READY     STATUS    RESTARTS   AGE
   298  example-memcached-6fd7c98d8-7dqdr     1/1       Running   0          1m
   299  example-memcached-6fd7c98d8-g5k7v     1/1       Running   0          1m
   300  example-memcached-6fd7c98d8-m7vn7     1/1       Running   0          1m
   301  memcached-operator-7cc7cfdf86-vvjqk   1/1       Running   0          2m
   302  ```
   303  
   304  ```sh
   305  $ kubectl get memcached/example-memcached -o yaml
   306  apiVersion: cache.example.com/v1alpha1
   307  kind: Memcached
   308  metadata:
   309    clusterName: ""
   310    creationTimestamp: 2018-03-31T22:51:08Z
   311    generation: 0
   312    name: example-memcached
   313    namespace: default
   314    resourceVersion: "245453"
   315    selfLink: /apis/cache.example.com/v1alpha1/namespaces/default/memcacheds/example-memcached
   316    uid: 0026cc97-3536-11e8-bd83-0800274106a1
   317  spec:
   318    size: 3
   319  status:
   320    nodes:
   321    - example-memcached-6fd7c98d8-7dqdr
   322    - example-memcached-6fd7c98d8-g5k7v
   323    - example-memcached-6fd7c98d8-m7vn7
   324  ```
   325  
   326  ### Update the size
   327  
   328  Change the `spec.size` field in the memcached CR from 3 to 4 and apply the change:
   329  
   330  ```sh
   331  $ cat deploy/crds/cache_v1alpha1_memcached_cr.yaml
   332  apiVersion: "cache.example.com/v1alpha1"
   333  kind: "Memcached"
   334  metadata:
   335    name: "example-memcached"
   336  spec:
   337    size: 4
   338  
   339  $ kubectl apply -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
   340  ```
   341  
   342  Confirm that the operator changes the deployment size:
   343  
   344  ```sh
   345  $ kubectl get deployment
   346  NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
   347  example-memcached    4         4         4            4           5m
   348  ```
   349  
   350  ### Cleanup
   351  
   352  Clean up the resources:
   353  
   354  ```sh
   355  $ kubectl delete -f deploy/crds/cache_v1alpha1_memcached_cr.yaml
   356  $ kubectl delete -f deploy/operator.yaml
   357  $ kubectl delete -f deploy/role_binding.yaml
   358  $ kubectl delete -f deploy/role.yaml
   359  $ kubectl delete -f deploy/service_account.yaml
   360  ```
   361  
   362  ## Advanced Topics
   363  
   364  ### Adding 3rd Party Resources To Your Operator
   365  
   366  The operator's Manager supports the Core Kubernetes resource types as found in the client-go [scheme][scheme_package] package and will also register the schemes of all custom resource types defined in your project under `pkg/apis`.
   367  ```Go
   368  import (
   369    "github.com/example-inc/memcached-operator/pkg/apis"
   370    ...
   371  )
   372  // Setup Scheme for all resources
   373  if err := apis.AddToScheme(mgr.GetScheme()); err != nil {
   374    log.Error(err, "")
   375    os.Exit(1)
   376  }
   377  ```
   378  
   379  To add a 3rd party resource to an operator, you must add it to the Manager's scheme. By creating an `AddToScheme` method or reusing one you can easily add a resource to your scheme. An [example][deployments_register] shows that you define a function and then use the [runtime][runtime_package] package to create a `SchemeBuilder`.
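
The pattern can be sketched with toy types as follows. Here `scheme` and `schemeBuilder` are simplified stand-ins for the `runtime` package's `Scheme` and `SchemeBuilder`, and `addKnownTypes` registers the kinds of a hypothetical third-party API group. The real types track Go types and group versions, not just kind names.

```go
package main

import "fmt"

// scheme is a toy stand-in for runtime.Scheme: it only records kind names.
type scheme struct {
	kinds map[string]bool
}

// schemeBuilder collects functions that each register types with a scheme.
type schemeBuilder []func(*scheme) error

// AddToScheme applies every registered function, stopping at the first error.
func (sb schemeBuilder) AddToScheme(s *scheme) error {
	for _, add := range sb {
		if err := add(s); err != nil {
			return err
		}
	}
	return nil
}

// addKnownTypes registers the kinds of a hypothetical third-party API group.
func addKnownTypes(s *scheme) error {
	s.kinds["Route"] = true
	s.kinds["RouteList"] = true
	return nil
}

func main() {
	builder := schemeBuilder{addKnownTypes}
	s := &scheme{kinds: map[string]bool{}}
	if err := builder.AddToScheme(s); err != nil {
		fmt.Println("registration failed:", err)
		return
	}
	fmt.Println(len(s.kinds), "kinds registered")
}
```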
   380  
   381  #### Register with the Manager's scheme
   382  
   383  Call the `AddToScheme()` function for your 3rd party resource and pass it the Manager's scheme via `mgr.GetScheme()`.
   384  
   385  Example:
   386  ```go
   387  import (
   388      ....
   389      routev1 "github.com/openshift/api/route/v1"
   390  )
   391  
   392  func main() {
   393      ....
   394      if err := routev1.AddToScheme(mgr.GetScheme()); err != nil {
   395        log.Error(err, "")
   396        os.Exit(1)
   397      }
   398      ....
   399  }
   400  ```
   401  
   402  After adding new import paths to your operator project, run `dep ensure` in the root of your project directory to fulfill these dependencies.
   403  
   404  
   405  ### Handle Cleanup on Deletion
   406  
To implement complex deletion logic, you can add a finalizer to your Custom Resource. This prevents the Custom Resource from being
deleted until you remove the finalizer (i.e., after your cleanup logic has successfully run). For more information, see the
[official Kubernetes documentation on finalizers](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#finalizers).
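
A minimal sketch of the finalizer bookkeeping, using plain string slices in place of the CR's `metadata.finalizers` field, might look like this. The finalizer name `finalizer.cache.example.com` and the helper `handleFinalizer` are assumptions for this example. A real controller would also persist the updated list through the client.

```go
package main

import (
	"errors"
	"fmt"
)

// memcachedFinalizer is a hypothetical finalizer name for this example.
const memcachedFinalizer = "finalizer.cache.example.com"

// errCleanupFailed is a placeholder error for a failing cleanup step.
var errCleanupFailed = errors.New("cleanup failed")

func containsString(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// removeString returns a new slice without s and leaves list unchanged.
func removeString(list []string, s string) []string {
	var out []string
	for _, item := range list {
		if item != s {
			out = append(out, item)
		}
	}
	return out
}

// handleFinalizer returns the finalizer list the CR should have after this
// reconcile pass. deleting mirrors a set deletion timestamp on the CR.
func handleFinalizer(finalizers []string, deleting bool, cleanup func() error) ([]string, error) {
	if !deleting {
		if !containsString(finalizers, memcachedFinalizer) {
			// Copy before appending so the caller's slice is not modified.
			return append(append([]string{}, finalizers...), memcachedFinalizer), nil
		}
		return finalizers, nil
	}
	if containsString(finalizers, memcachedFinalizer) {
		if err := cleanup(); err != nil {
			// Keep the finalizer so the cleanup is retried on the next pass.
			return finalizers, err
		}
		return removeString(finalizers, memcachedFinalizer), nil
	}
	return finalizers, nil
}

func main() {
	noop := func() error { return nil }
	fs, _ := handleFinalizer(nil, false, noop)
	fmt.Println(fs)
	_, err := handleFinalizer(fs, true, func() error { return errCleanupFailed })
	fmt.Println(err)
	fs, _ = handleFinalizer(fs, true, noop)
	fmt.Println(fs)
}
```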
   410  
   411  ### Metrics
   412  
To learn about how metrics work in the Operator SDK, read the [metrics section][metrics_doc] of the user documentation.
   414  
   415  ## Leader election
   416  
During the lifecycle of an operator, it's possible that more than one instance is running at any given time, e.g., when rolling out an upgrade for the operator.
In such a scenario, it is necessary to avoid contention between multiple operator instances via leader election, so that only one leader instance handles the reconciliation while the other instances are inactive but ready to take over when the leader steps down.
   419  
   420  There are two different leader election implementations to choose from, each with its own tradeoff.
   421  
- [Leader-for-life][leader_for_life]: The leader pod only gives up leadership (via garbage collection) when it is deleted. This implementation precludes the possibility of 2 instances mistakenly running as leaders (split brain). However, this method can be subject to a delay in electing a new leader. For instance, when the leader pod is on an unresponsive or partitioned node, the [`pod-eviction-timeout`][pod_eviction_timeout] dictates how long it takes for the leader pod to be deleted from the node and step down (default 5m).
   423  - [Leader-with-lease][leader_with_lease]: The leader pod periodically renews the leader lease and gives up leadership when it can't renew the lease. This implementation allows for a faster transition to a new leader when the existing leader is isolated, but there is a possibility of split brain in [certain situations][lease_split_brain].
   424  
By default, the SDK enables the leader-for-life implementation. However, you should consult the docs above for both approaches to consider the tradeoffs that make sense for your use case.
   426  
   427  The following examples illustrate how to use the two options:
   428  
   429  ### Leader for life
   430  
   431  A call to `leader.Become()` will block the operator as it retries until it can become the leader by creating the configmap named `memcached-operator-lock`.
   432  
   433  ```Go
   434  import (
   435    ...
   436    "github.com/operator-framework/operator-sdk/pkg/leader"
   437  )
   438  
   439  func main() {
   440    ...
   441    err = leader.Become(context.TODO(), "memcached-operator-lock")
   442    if err != nil {
   443      log.Error(err, "Failed to retry for leader lock")
   444      os.Exit(1)
   445    }
   446    ...
   447  }
   448  ```
If the operator is not running inside a cluster, `leader.Become()` will simply return without error and skip the leader election, since it can't detect the operator's namespace.
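
The blocking retry behaviour can be pictured with the in-memory model below. The `lockStore` type stands in for the cluster's ConfigMap storage, and `become` is a hypothetical function written for this sketch. It is not the SDK's `leader.Become()`, which talks to the API server and waits between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("lock already exists")

// lockStore is an in-memory stand-in for the cluster's ConfigMap storage.
type lockStore struct {
	owners map[string]string // lock name -> owning pod
}

// create succeeds only when no lock with that name exists.
func (s *lockStore) create(name, owner string) error {
	if _, taken := s.owners[name]; taken {
		return errAlreadyExists
	}
	s.owners[name] = owner
	return nil
}

// become retries create until it succeeds, calling wait between attempts,
// and returns the number of attempts it made.
func become(s *lockStore, name, owner string, wait func()) int {
	attempts := 0
	for {
		attempts++
		if err := s.create(name, owner); err == nil {
			return attempts
		}
		wait()
	}
}

func main() {
	store := &lockStore{owners: map[string]string{"memcached-operator-lock": "old-pod"}}
	// Simulate garbage collection of the old leader's lock during the wait.
	wait := func() { delete(store.owners, "memcached-operator-lock") }
	attempts := become(store, "memcached-operator-lock", "new-pod", wait)
	fmt.Println(attempts, store.owners["memcached-operator-lock"])
}
```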
   450  
   451  ### Leader with lease
   452  
   453  The leader-with-lease approach can be enabled via the [Manager Options][manager_options] for leader election.
   454  
```Go
import (
  ...
  "sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
  ...
  opts := manager.Options{
    ...
    LeaderElection:   true,
    LeaderElectionID: "memcached-operator-lock",
  }
  mgr, err := manager.New(cfg, opts)
  ...
}
```
   472  
   473  When the operator is not running in a cluster, the Manager will return an error on starting since it can't detect the operator's namespace in order to create the configmap for leader election. You can override this namespace by setting the Manager's `LeaderElectionNamespace` option.
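
For intuition about the lease approach, the following sketch models a lease with plain integers for time. The `lease` type and `tryAcquireOrRenew` function are hypothetical and much simpler than the actual leader election code. They only show why a leader that stops renewing is replaced once its lease expires.

```go
package main

import "fmt"

// lease is a simplified stand-in for the record kept in the lock object.
type lease struct {
	Holder          string
	RenewedAt       int64 // Unix seconds of the last successful renewal
	DurationSeconds int64 // how long a renewal stays valid
}

// tryAcquireOrRenew updates the lease if it is free, already held by the
// candidate, or expired, and reports whether the candidate is now the leader.
func tryAcquireOrRenew(l *lease, candidate string, now int64) bool {
	expired := now-l.RenewedAt > l.DurationSeconds
	if l.Holder != "" && l.Holder != candidate && !expired {
		return false
	}
	l.Holder = candidate
	l.RenewedAt = now
	return true
}

func main() {
	l := &lease{DurationSeconds: 15}
	fmt.Println(tryAcquireOrRenew(l, "pod-a", 0))  // pod-a takes the free lease
	fmt.Println(tryAcquireOrRenew(l, "pod-b", 10)) // lease still valid for pod-a
	fmt.Println(tryAcquireOrRenew(l, "pod-b", 30)) // pod-a stopped renewing
}
```

In this model nothing stops the old holder from continuing to act after its lease has expired, which is the kind of situation the split-brain caveat above refers to.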
   474  
   475  
   476  
   477  [pod_eviction_timeout]: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options
   478  [manager_options]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/manager#Options
   479  [lease_split_brain]: https://github.com/kubernetes/client-go/blob/30b06a83d67458700a5378239df6b96948cb9160/tools/leaderelection/leaderelection.go#L21-L24
   480  [leader_for_life]: https://godoc.org/github.com/operator-framework/operator-sdk/pkg/leader
   481  [leader_with_lease]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/leaderelection
   482  [memcached_handler]: ../example/memcached-operator/handler.go.tmpl
   483  [memcached_controller]: ../example/memcached-operator/memcached_controller.go.tmpl
   484  [layout_doc]:./project_layout.md
   485  [ansible_user_guide]:./ansible/user-guide.md
   486  [helm_user_guide]:./helm/user-guide.md
   487  [dep_tool]:https://golang.github.io/dep/docs/installation.html
   488  [git_tool]:https://git-scm.com/downloads
   489  [go_tool]:https://golang.org/dl/
   490  [docker_tool]:https://docs.docker.com/install/
   491  [kubectl_tool]:https://kubernetes.io/docs/tasks/tools/install-kubectl/
   492  [minikube_tool]:https://github.com/kubernetes/minikube#installation
   493  [scheme_package]:https://github.com/kubernetes/client-go/blob/master/kubernetes/scheme/register.go
   494  [deployments_register]: https://github.com/kubernetes/api/blob/master/apps/v1/register.go#L41
   495  [doc_client_api]:./user/client.md
   496  [runtime_package]: https://godoc.org/k8s.io/apimachinery/pkg/runtime
   497  [manager_go_doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/manager#Manager
   498  [controller-go-doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg#hdr-Controller
   499  [request-go-doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/reconcile#Request
   500  [result_go_doc]: https://godoc.org/github.com/kubernetes-sigs/controller-runtime/pkg/reconcile#Result
   501  [metrics_doc]: ./user/metrics/README.md