This guide is based on using CoreOS's Prometheus Operator, which allows
a Prometheus instance to be managed using native Kubernetes concepts.

References used:
* https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md
* https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

# Preflight

Create and initialize a CockroachDB cluster, if you haven't already done
so:
* `kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset.yaml`
* `kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cluster-init.yaml`

If you're running on Google Kubernetes Engine, you must ensure that
your Kubernetes user is bound to the `cluster-admin` role. Edit the
following command before running it, replacing the email address with
the account you use to access GKE. This step is required whether or not
you are running a secure CockroachDB cluster.
* `kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=YOU@YOURDOMAIN.COM`

# Monitoring

Edit the `cockroachdb` service to add the label `prometheus: cockroachdb`.
We do this to avoid duplicating monitoring data between the two
services that we create. Without a way to distinguish the `cockroachdb`
and `cockroachdb-public` services from one another, Prometheus would
end up with two different jobs scraping the same backends.
* `kubectl label svc cockroachdb prometheus=cockroachdb`

Install Prometheus Operator:
* `kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.20/bundle.yaml`

Ensure that the prometheus-operator instance has started before
continuing. The `kubectl get` command and its desired output are shown below:
```
$ kubectl get deploy prometheus-operator
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
prometheus-operator   1         1         1            1           23h
```

Create the various objects necessary to run a Prometheus instance:
* `kubectl apply -f prometheus.yaml`

To view the Prometheus UI locally:
* `kubectl port-forward prometheus-cockroachdb-0 9090`
* Open http://localhost:9090 in your browser.
* Select the `Status -> Targets` menu entry to verify that the
  CockroachDB instances have been located.
* Graphing the `sys_uptime` metric will verify that data is being
  collected.

# Alerting

Edit the template `alertmanager-config.yaml` with your relevant
configuration. As shipped, the file contains a dummy webhook receiver,
following the prometheus-operator alerting guide linked at the top of
this document.

Upload `alertmanager-config.yaml` as a secret, storing it under the key
`alertmanager.yaml`, and label the secret to make it easier to find:
* `kubectl create secret generic alertmanager-cockroachdb --from-file=alertmanager.yaml=alertmanager-config.yaml`
* `kubectl label secret alertmanager-cockroachdb app=cockroachdb`

It's critical that the secret name (`alertmanager-cockroachdb`) and the
key (`alertmanager.yaml`) are given exactly as shown.

Create an AlertManager object to run a replicated AlertManager instance,
and create a ClusterIP service so that Prometheus can forward alerts:
* `kubectl apply -f alertmanager.yaml`

Verify that AlertManager is running:
* `kubectl port-forward alertmanager-cockroachdb-0 9093`
* Open http://localhost:9093 in your browser.
  You should see something similar to the following:
* Ensure that the AlertManagers are visible to Prometheus by checking
  http://localhost:9090/status. It may take a minute for the configuration
  changes to propagate. If this is successful, you should see something
  similar to the following:

Upload the alert rules, which are copied from
https://github.com/cockroachdb/cockroach/blob/master/monitoring/rules/alerts.rules.yml:
* `kubectl apply -f alert-rules.yaml`
* Check that the rules are visible to Prometheus by opening
  http://localhost:9090/rules. It may take a minute for the configuration
  changes to propagate.
* Verify that the example alert is firing by opening
  http://localhost:9090/alerts.
* Remove the example alert by running
  `kubectl edit prometheusrules prometheus-cockroachdb-rules` and
  deleting the `dummy.rules` block.

# Cleaning Up

You can remove the monitoring configurations using the following command:

`kubectl delete Alertmanager,Prometheus,PrometheusRule,ServiceMonitor -l app=cockroachdb`

# Maintenance

The contents of `alert-rules.yaml` are generated from our reference
Prometheus configs, located in the top-level `cockroach/monitoring`
directory. The `wraprules` tool makes this easier to maintain:

```
go get github.com/cockroachdb/cockroach/pkg/cmd/wraprules
wraprules -o path/to/alert-rules.yaml path/to/cockroach/monitoring/rules/*.rules.yml
```
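To illustrate what "wrapping" the reference rules into `alert-rules.yaml` plausibly involves, here is a minimal POSIX shell sketch. The `wrap_rules` function is hypothetical and is not the `wraprules` implementation. It assumes that each input file is a plain Prometheus rules file whose only top-level key is `groups:` (at column 0, with no `---` separators), and that the resulting `PrometheusRule` object only needs the `app: cockroachdb` label used by the cleanup command. The real `alert-rules.yaml` may carry additional labels that the Prometheus object's rule selector depends on.

```shell
#!/bin/sh
# wrap_rules NAME FILE...
# HYPOTHETICAL helper, not the wraprules tool. It embeds the rule groups
# from one or more Prometheus rules files under spec.groups of a
# prometheus-operator PrometheusRule object and prints the result.
# Assumes each FILE has `groups:` as its only top-level key.
wrap_rules() {
  name="$1"
  shift
  # Header of the PrometheusRule object; the label set is an assumption.
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ${name}
  labels:
    app: cockroachdb
spec:
  groups:
EOF
  # Drop each file's own `groups:` line and indent the remaining lines by
  # two spaces so that the group entries nest under spec.groups.
  for f in "$@"; do
    sed -e '/^groups:[[:space:]]*$/d' -e 's/^/  /' "$f"
  done
}
```

For example, `wrap_rules prometheus-cockroachdb-rules path/to/cockroach/monitoring/rules/*.rules.yml > alert-rules.yaml` would produce a single manifest containing every group. Stripping the per-file `groups:` key, rather than concatenating the files verbatim, avoids emitting a duplicate mapping key when several rule files are combined.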