This guide is based on using CoreOS's Prometheus Operator, which allows
a Prometheus instance to be managed using native Kubernetes concepts.

References used:
* https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md
* https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

# Preflight

Create and initialize a CockroachDB cluster, if you haven't already done
so:
* `kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset.yaml`
* `kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cluster-init.yaml`

If you're running on Google Kubernetes Engine, it's necessary to ensure
that your Kubernetes user is bound to the `cluster-admin` role.  Edit the
following command before running it; the email address should be
whatever account you use to access GKE.  This is required regardless
of whether you are using a secure CockroachDB cluster.
* `kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=YOU@YOURDOMAIN.COM`

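If the `gcloud` CLI is installed and logged in, the email address can be read
from its active configuration instead of being typed by hand.  The following
is a minimal sketch under that assumption; `bind_cluster_admin` is a
hypothetical helper name, not something shipped in this repository.  Define
it, then run `bind_cluster_admin` once.

```shell
# Sketch: bind the active gcloud account to the cluster-admin ClusterRole.
# bind_cluster_admin is a hypothetical helper, not part of this repository.
bind_cluster_admin() {
  # `gcloud config get-value account` prints the account gcloud is using.
  account="$(gcloud config get-value account 2>/dev/null)"
  if [ -z "$account" ]; then
    echo "no active gcloud account found" >&2
    return 1
  fi
  kubectl create clusterrolebinding "$USER-cluster-admin-binding" \
    --clusterrole=cluster-admin --user="$account"
}
```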
# Monitoring

Edit the `cockroachdb` service to add the label `prometheus: cockroachdb`.
We use this because we don't want to duplicate the monitoring data
between the two services that we create.  If we don't have a way to
distinguish the `cockroachdb` and `cockroachdb-public` services from
one another, we'd have two different Prometheus jobs with duplicated
backends.
* `kubectl label svc cockroachdb prometheus=cockroachdb`

Install Prometheus Operator:
* `kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.20/bundle.yaml`

Ensure that the instance of prometheus-operator has started before
continuing.  The `kubectl get` command and its desired output are below:
```
$ kubectl get deploy prometheus-operator
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
prometheus-operator   1         1         1            1           23h
```
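Instead of re-running `kubectl get` by hand, `kubectl rollout status` can
block until the deployment is available.  This is a minimal sketch, assuming
the operator was installed into the current namespace under the name shown
above; `wait_for_operator` and its 120s default timeout are illustrative
choices.  Run `wait_for_operator`, or pass a different timeout such as
`wait_for_operator 5m`.

```shell
# Sketch: block until the operator Deployment has finished rolling out.
# wait_for_operator is a hypothetical helper; 120s is an assumed default.
wait_for_operator() {
  timeout="${1:-120s}"
  kubectl rollout status deployment/prometheus-operator --timeout="$timeout"
}
```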

Create the various objects necessary to run a Prometheus instance:
* `kubectl apply -f prometheus.yaml`

To view the Prometheus UI locally:
* `kubectl port-forward prometheus-cockroachdb-0 9090`
* Open http://localhost:9090 in your browser.
* Select the `Status -> Targets` menu entry to verify that the
  CockroachDB instances have been located.
  ![Targets screenshot](img/targets.png)
* Graphing the `sys_uptime` variable will verify that data is being
  collected. ![Uptime graph screenshot](img/graph.png)

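The `sys_uptime` check can also be made from the command line through the
Prometheus HTTP query API while the port-forward above is still running.
This is a minimal sketch; `query_uptime` is a hypothetical helper name, and
the default URL assumes the local port 9090 used above.  A successful reply
is a JSON document containing `"status":"success"`, and its `data.result`
array should hold a series for each CockroachDB instance being scraped.

```shell
# Sketch: fetch the current sys_uptime samples from the Prometheus HTTP API.
# query_uptime is a hypothetical helper; it assumes the port-forward is active.
query_uptime() {
  base_url="${1:-http://localhost:9090}"
  curl -s "$base_url/api/v1/query?query=sys_uptime"
}
```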
# Alerting

Edit the template `alertmanager-config.yaml` with your relevant configuration.
What's in the file has a dummy web hook, per the prometheus-operator
alerting guide linked from the top of the document.

Upload `alertmanager-config.yaml`, renaming it to `alertmanager.yaml`
in the process, and labelling it to make it easier to find.
* `kubectl create secret generic alertmanager-cockroachdb --from-file=alertmanager.yaml=alertmanager-config.yaml`
* `kubectl label secret alertmanager-cockroachdb app=cockroachdb`

It's critical that the name of the secret and the `alertmanager.yaml`
key are given exactly as shown.

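One way to confirm the key name is to read the secret back.  The sketch
below assumes a `base64` binary is on your `PATH`;
`show_alertmanager_config` is a hypothetical helper name.  Running it should
print the contents of your `alertmanager-config.yaml`; empty output suggests
that the key is not named `alertmanager.yaml`.

```shell
# Sketch: print the Alertmanager configuration stored in the secret.
# show_alertmanager_config is a hypothetical helper name.
show_alertmanager_config() {
  # The backslash escapes the dot in the key name for kubectl's JSONPath.
  kubectl get secret alertmanager-cockroachdb \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 --decode
}
```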
Create an AlertManager object to run a replicated AlertManager instance
and create a ClusterIP service so that Prometheus can forward alerts:
* `kubectl apply -f alertmanager.yaml`

Verify that AlertManager is running:
* `kubectl port-forward alertmanager-cockroachdb-0 9093`
* Open http://localhost:9093 in your browser.  You should see something
  similar to the following:
  ![AlertManager screenshot](img/alertmanager.png)
* Ensure that the AlertManagers are visible to Prometheus by checking
  http://localhost:9090/status.  It may take a minute for the configuration
  changes to propagate.  If this is successful, you should see something
  similar to the following:
  ![AlertManager screenshot](img/status-alertmanagers.png)

Upload alert rules:
* These are copied from https://github.com/cockroachdb/cockroach/blob/master/monitoring/rules/alerts.rules.yml.
* `kubectl apply -f alert-rules.yaml`
* Check that the rules are visible to Prometheus by opening
  http://localhost:9090/rules.  It may take a minute for the configuration
  changes to propagate. ![Rule screenshot](img/rules.png)
* Verify that the example alert is firing by opening
  http://localhost:9090/alerts. ![Alerts screenshot](img/alerts.png)
* Remove the example alert by running
  `kubectl edit prometheusrules prometheus-cockroachdb-rules` and
  deleting the `dummy.rules` block.

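To confirm from the command line that the `dummy.rules` block is gone, the
rule group names can be listed straight from the object.  This is a sketch
that assumes the usual `spec.groups[].name` layout of a PrometheusRule;
`list_rule_groups` is a hypothetical helper name.  After the edit above,
`dummy.rules` should no longer appear in its output.

```shell
# Sketch: list the rule group names stored in the PrometheusRule object.
# list_rule_groups is a hypothetical helper name.
list_rule_groups() {
  kubectl get prometheusrules prometheus-cockroachdb-rules \
    -o jsonpath='{.spec.groups[*].name}'
}
```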
# Cleaning Up

You can remove the monitoring configurations using the following command:

`kubectl delete Alertmanager,Prometheus,PrometheusRule,ServiceMonitor -l app=cockroachdb`

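The command above only selects the four listed kinds, so the secret, the
service label, and the operator installed earlier in this guide are left in
place.  The sketch below removes those as well; `cleanup_extras` is a
hypothetical helper name, and it assumes nothing else in the cluster depends
on the Prometheus Operator.  Other objects created by `prometheus.yaml` and
`alertmanager.yaml` may also remain, so check those files before relying on
this.

```shell
# Sketch: remove leftovers that the label-based delete does not cover.
# cleanup_extras is a hypothetical helper name.
cleanup_extras() {
  # Secret created in the Alerting section.
  kubectl delete secret alertmanager-cockroachdb
  # A trailing dash after the key removes the label from the service.
  kubectl label svc cockroachdb prometheus-
  # Uninstall the operator itself; skip this if other workloads rely on it.
  kubectl delete -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.20/bundle.yaml
}
```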
# Maintenance

The contents of `alert-rules.yaml` are generated from our reference
Prometheus configs, located in the top-level `cockroach/monitoring`
directory.  A `wraprules` tool exists to make maintaining this easier.

```
go get github.com/cockroachdb/cockroach/pkg/cmd/wraprules
wraprules -o path/to/alert-rules.yaml path/to/cockroach/monitoring/rules/*.rules.yml
```