# Debugging a ClusterServiceVersion

We have a ClusterServiceVersion that is failing to report as available.

```sh
$ kubectl -n ci-olm-pr-188-gc-csvs get clusterserviceversions etcdoperator.v0.8.1 -o yaml
...
  lastTransitionTime: 2018-01-22T15:48:13Z
  lastUpdateTime: 2018-01-22T15:51:09Z
  message: |
    installing: Waiting: waiting for deployment etcd-operator to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
  phase: Installing
  reason: InstallWaiting
...
```

The message tells us the install can't complete because the etcd-operator deployment isn't available yet. Next we check on that deployment:

```sh
$ kubectl -n ci-olm-pr-188-gc-csvs get deployments etcd-operator -o yaml
...
spec:
  template:
    metadata:
      labels:
        name: etcd-operator-olm-owned
...
status:
  unavailableReplicas: 1
...
```

We see that 1 of the replicas is unavailable, and the pod template in the spec gives us the label query to use to find the failing pods:

```sh
$ kubectl -n ci-olm-pr-188-gc-csvs get pods -l name=etcd-operator-olm-owned
NAME                             READY     STATUS             RESTARTS   AGE
etcd-operator-6c7c8ccb56-9scrz   2/3       CrashLoopBackOff   820        2d

$ kubectl -n ci-olm-pr-188-gc-csvs get pods etcd-operator-6c7c8ccb56-9scrz -o yaml
...
  containerStatuses:
  - containerID: docker://aa7ee0902228247c32b9198be13fc826dfaf4901a70ee84f31582c284721a110
    image: quay.io/coreos/etcd-operator@sha256:b85754eaeed0a684642b0886034742234d288132dc6439b8132e9abd7a199de0
    imageID: docker-pullable://quay.io/coreos/etcd-operator@sha256:b85754eaeed0a684642b0886034742234d288132dc6439b8132e9abd7a199de0
    lastState:
      terminated:
        containerID: docker://aa7ee0902228247c32b9198be13fc826dfaf4901a70ee84f31582c284721a110
        exitCode: 1
        finishedAt: 2018-01-22T15:55:16Z
        reason: Error
        startedAt: 2018-01-22T15:55:16Z
    name: etcd-backup-operator
    ready: false
    restartCount: 820
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=etcd-backup-operator pod=etcd-operator-6c7c8ccb56-9scrz_ci-olm-pr-188-gc-csvs(3084f195-fd38-11e7-b3ea-0aae23d78648)
        reason: CrashLoopBackOff
...
```

One of the containers in the pod, `etcd-backup-operator`, is crash looping for some reason. Now we check the logs of that container:

```sh
$ kubectl -n ci-olm-pr-188-gc-csvs logs etcd-operator-6c7c8ccb56-9scrz etcd-backup-operator
time="2018-01-22T15:55:16Z" level=info msg="Go Version: go1.9.2"
time="2018-01-22T15:55:16Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-01-22T15:55:16Z" level=info msg="etcd-backup-operator Version: 0.8.1"
time="2018-01-22T15:55:16Z" level=info msg="Git SHA: b97d9305"
time="2018-01-22T15:55:16Z" level=info msg="Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"ci-olm-pr-188-gc-csvs", Name:"etcd-backup-operator", UID:"328b063e-fd38-11e7-b021-122952f9fac4", APIVersion:"v1", ResourceVersion:"11570590", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' etcd-operator-6c7c8ccb56-9scrz became leader"
time="2018-01-22T15:55:16Z" level=info msg="starting backup controller" pkg=controller
time="2018-01-22T15:55:16Z" level=fatal msg="unknown StorageType: "
```

Now we can see the reason for the error and take action to craft a new CSV that doesn't cause it.
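If the crash-looping container has already been restarted, its current log stream may be empty or truncated. As a minimal sketch (assuming the same namespace, pod, and container names as above), the logs of the previously terminated instance and the pod's recent events can also be pulled with standard kubectl commands:

```sh
# Logs from the last terminated instance of the crash-looping container
$ kubectl -n ci-olm-pr-188-gc-csvs logs etcd-operator-6c7c8ccb56-9scrz -c etcd-backup-operator --previous

# Recent events for the pod (back-off, failed probes, image pull errors, etc.)
$ kubectl -n ci-olm-pr-188-gc-csvs describe pod etcd-operator-6c7c8ccb56-9scrz
```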
# Debugging an InstallPlan

The primary way an InstallPlan can fail is by not resolving the resources needed to install a CSV.

```yaml
apiVersion: app.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  namespace: ci-olm-pr-188-gc-csvs
  name: olm-testing
spec:
  clusterServiceVersionNames:
  - etcdoperator123
  approval: Automatic
```

This InstallPlan will fail because `etcdoperator123` is not in the catalog. We can see this in its status:

```sh
$ kubectl get -n ci-olm-pr-188-gc-csvs installplans olm-testing -o yaml
apiVersion: app.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  ...
spec:
  approval: Automatic
  clusterServiceVersionNames:
  - etcdoperator123
status:
  catalogSources:
  - rh-operators
  conditions:
  - lastTransitionTime: 2018-01-22T16:05:09Z
    lastUpdateTime: 2018-01-22T16:06:59Z
    message: 'not found: ClusterServiceVersion etcdoperator123'
    reason: DependenciesConflict
    status: "False"
    type: Resolved
  phase: Planning
```

Error messages like this are displayed for any other inconsistency in the catalog as well. They can be resolved either by updating the catalog or by choosing ClusterServiceVersions that resolve correctly.

# Debugging ALM operators

Both the ALM and Catalog operators have a `-debug` flag available that displays much more useful information when diagnosing a problem. If necessary, add this flag to their deployments and perform the action that is showing the undesired behavior.
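The namespace and deployment names for these operators depend on how they were installed, so the sketch below is only illustrative: it assumes deployments named `olm-operator` and `catalog-operator` running in an `olm` namespace, and that the container already has an `args` list to append to. Adjust the names to match your cluster.

```sh
# Append -debug to the first container's arguments (namespace, deployment name,
# and the presence of an existing args list are assumptions)
$ kubectl -n olm patch deployment olm-operator --type=json \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "-debug"}]'

# Alternatively, edit the deployment by hand and add -debug to the container args
$ kubectl -n olm edit deployment catalog-operator
```

Once the deployment rolls out new pods with the flag set, reproduce the failing action and collect the operator logs with `kubectl logs` as before.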