## Heapster Debugging

This is a collection of common issues faced by users and ways to debug them.

Depending on the deployment setup, the issue could lie with Heapster, cAdvisor, Kubernetes, or the monitoring backend.

### Heapster Core

#### Common Problems

* Some Linux distributions (including Debian) ship with memory accounting disabled by default. To enable memory and swap accounting on the nodes, follow [these instructions](https://docs.docker.com/installation/ubuntulinux/#memory-and-swap-accounting).
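
A quick way to check whether a node's kernel has the memory controller enabled is to read `/proc/cgroups`. The Go sketch below is not part of Heapster. It assumes the cgroup v1-style layout of that file (`subsys_name hierarchy num_cgroups enabled`) and checks memory accounting only, not swap accounting. `memoryCgroupEnabled` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strings"
)

// memoryCgroupEnabled reports whether the "memory" controller is marked as
// enabled in the contents of /proc/cgroups. Non-comment lines are assumed to
// have the columns: subsys_name hierarchy num_cgroups enabled.
func memoryCgroupEnabled(procCgroups string) (bool, error) {
	scanner := bufio.NewScanner(strings.NewReader(procCgroups))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 4 || strings.HasPrefix(fields[0], "#") {
			continue // skip the header line and malformed lines
		}
		if fields[0] == "memory" {
			return fields[3] == "1", nil
		}
	}
	if err := scanner.Err(); err != nil {
		return false, err
	}
	return false, errors.New("memory controller not listed in /proc/cgroups")
}

func main() {
	data, err := os.ReadFile("/proc/cgroups")
	if err != nil {
		fmt.Println("cannot read /proc/cgroups:", err)
		return
	}
	enabled, err := memoryCgroupEnabled(string(data))
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("memory cgroup enabled:", enabled)
}
```

Run it directly on the node; `memory cgroup enabled: false` indicates that memory accounting is off on that node.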

#### Debugging

There are two endpoints that can give you insight into what is going on in Heapster:

* `/metrics` exposes many metrics in Prometheus format that can point to the root cause of Heapster problems. Example:
```
master:~$ curl 10.244.1.3:8082/metrics
# HELP heapster_exporter_duration_microseconds Time spent exporting data to sink in microseconds.
# TYPE heapster_exporter_duration_microseconds summary
heapster_exporter_duration_microseconds{exporter="InfluxDB Sink",quantile="0.5"} 3.497
heapster_exporter_duration_microseconds{exporter="InfluxDB Sink",quantile="0.9"} 5.296
heapster_exporter_duration_microseconds{exporter="InfluxDB Sink",quantile="0.99"} 5.296
heapster_exporter_duration_microseconds_sum{exporter="InfluxDB Sink"} 16698.508000000013
heapster_exporter_duration_microseconds_count{exporter="InfluxDB Sink"} 3089
heapster_exporter_duration_microseconds{exporter="Metric Sink",quantile="0.5"} 4.546
heapster_exporter_duration_microseconds{exporter="Metric Sink",quantile="0.9"} 7.632
heapster_exporter_duration_microseconds{exporter="Metric Sink",quantile="0.99"} 7.632
heapster_exporter_duration_microseconds_sum{exporter="Metric Sink"} 25597.190999999973
heapster_exporter_duration_microseconds_count{exporter="Metric Sink"} 3089
[...]
```
This endpoint is available in both Heapster (metrics) and Eventer (events).
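
Because each exporter's summary also reports `_sum` and `_count`, their ratio gives the mean time per export since Heapster started, which is a quick way to spot a slow sink. The Go sketch below computes that ratio per `exporter` label from the `/metrics` text. It does a minimal line-based parse of the Prometheus text format rather than using a full parser, and `meanExportDurations` and `labelValue` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"strconv"
	"strings"
)

// metricName is the summary shown in the /metrics output above.
const metricName = "heapster_exporter_duration_microseconds"

// labelValue extracts the value of label name from a sample line such as
// name_sum{exporter="InfluxDB Sink"} 16698.5 (escaped quotes are not handled).
func labelValue(line, name string) (string, bool) {
	_, rest, found := strings.Cut(line, name+`="`)
	if !found {
		return "", false
	}
	value, _, found := strings.Cut(rest, `"`)
	return value, found
}

// meanExportDurations returns _sum / _count of the summary for each exporter.
func meanExportDurations(text string) (map[string]float64, error) {
	sums, counts := map[string]float64{}, map[string]float64{}
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		var target map[string]float64
		switch {
		case strings.HasPrefix(line, metricName+"_sum{"):
			target = sums
		case strings.HasPrefix(line, metricName+"_count{"):
			target = counts
		default:
			continue // comments, quantile lines and other metrics
		}
		exporter, ok := labelValue(line, "exporter")
		// The sample value is the first field after the closing brace.
		fields := strings.Fields(line[strings.LastIndex(line, "}")+1:])
		if !ok || len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			target[exporter] = v
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	means := map[string]float64{}
	for exporter, sum := range sums {
		if n := counts[exporter]; n > 0 {
			means[exporter] = sum / n
		}
	}
	return means, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run . http://<heapster-host>:8082/metrics")
		return
	}
	resp, err := http.Get(os.Args[1])
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	means, err := meanExportDurations(string(body))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	names := make([]string, 0, len(means))
	for name := range means {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: mean %.3f µs per export\n", name, means[name])
	}
}
```

With the sample values above, the InfluxDB sink averages 16698.508 / 3089 ≈ 5.41 µs per export. The mean can sit above the `0.99` quantile, as it does for the Metric Sink (≈ 8.29 µs against 7.632), because summary quantiles usually cover only a recent time window while `_sum` and `_count` accumulate from process start.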

* `/api/v1/model/debug/allkeys` lists all metric sets processed inside Heapster, which is useful for checking what is passed to your configured sinks. Example:

```
master:~$ curl 10.244.1.3:8082/api/v1/model/debug/allkeys
[
  "namespace:kube-system/pod:kube-dns-v10-qey9d",
  "namespace:default/pod:resource-consumer-qcnzr",
  "namespace:default",
  "cluster",
  "node:kubernetes-minion-fpdd/container:kubelet",
  "namespace:kube-system/pod:kube-proxy-kubernetes-minion-fpdd/container:kube-proxy",
  "node:kubernetes-minion-j82g/container:system",
  "namespace:kube-system/pod:kube-proxy-kubernetes-minion-j82g/container:kube-proxy",
  "node:kubernetes-minion-j82g/container:docker-daemon",
  "namespace:kube-system/pod:monitoring-influxdb-grafana-v3-q3ozn/container:grafana",
  "namespace:kube-system/pod:kubernetes-dashboard-v1.0.0beta1-085ag",
  "node:kubernetes-minion-j82g/container:kubelet",
  "namespace:kube-system/pod:kube-dns-v10-qey9d/container:healthz",
  "node:kubernetes-minion-fhue",
  [...]
```
This endpoint is available for metrics (Heapster) only.
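
When a sink receives less data than expected, counting the keys by kind shows whether Heapster holds cluster, namespace, node, pod, and container metric sets at all. The Go sketch below assumes the key format visible in the output above (`kind:name` segments joined by `/`, or the bare `cluster` key) and classifies each key by the prefix of its last segment. `keyKind` and `countKinds` are hypothetical helpers, not Heapster APIs, and the sketch must be pointed at a live endpoint because the sample above is truncated.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"strings"
)

// keyKind returns the kind of the most specific segment of a metric set key:
// "namespace:default/pod:x/container:y" -> "container", "cluster" -> "cluster".
func keyKind(key string) string {
	segments := strings.Split(key, "/")
	last := segments[len(segments)-1]
	return strings.SplitN(last, ":", 2)[0]
}

// countKinds decodes the JSON array returned by allkeys and counts keys per kind.
func countKinds(body []byte) (map[string]int, error) {
	var keys []string
	if err := json.Unmarshal(body, &keys); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, k := range keys {
		counts[keyKind(k)]++
	}
	return counts, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run . http://<heapster-host>:8082/api/v1/model/debug/allkeys")
		return
	}
	resp, err := http.Get(os.Args[1])
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	counts, err := countKinds(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	kinds := make([]string, 0, len(counts))
	for kind := range counts {
		kinds = append(kinds, kind)
	}
	sort.Strings(kinds)
	for _, kind := range kinds {
		fmt.Printf("%-10s %d\n", kind, counts[kind])
	}
}
```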

#### Extra Logging

Additional logging can be enabled with the extra flag `--vmodule=*=4`.
You can also enable a sink that writes all metrics or events to stdout by adding `--sink=log` to the command-line parameters.
Both changes require restarting Heapster.
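
If your Heapster command line is assembled by Go tooling rather than edited by hand in a manifest (an assumption about your setup), a small helper can add both debugging flags without duplicating them. `withDebugFlags` is hypothetical, and the `--source`/`--sink` values in `main` are only illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// debugFlags are the two flags described above: verbose logging and the log
// sink that prints all metrics or events to stdout.
var debugFlags = []string{"--vmodule=*=4", "--sink=log"}

// withDebugFlags returns a copy of args with each debug flag appended unless
// an identical argument is already present. The input slice is not modified.
func withDebugFlags(args []string) []string {
	out := append([]string(nil), args...)
	for _, flag := range debugFlags {
		present := false
		for _, a := range out {
			if a == flag {
				present = true
				break
			}
		}
		if !present {
			out = append(out, flag)
		}
	}
	return out
}

func main() {
	// Illustrative arguments; replace them with your own Heapster command line.
	args := []string{
		"--source=kubernetes:https://kubernetes.default",
		"--sink=influxdb:http://monitoring-influxdb:8086",
	}
	fmt.Println(strings.Join(withDebugFlags(args), " "))
}
```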

### InfluxDB & Grafana

Ensure InfluxDB is up and reachable. By default, Heapster attempts to create a database, and this eventually fails after a fixed number of retries if InfluxDB cannot be reached.
If Grafana queries are stuck or slow, InfluxDB is most likely unresponsive. Consider giving InfluxDB more compute resources (CPU and memory).
The default InfluxDB database used by Heapster is `k8s`.
A `list series` query (`SHOW SERIES` in InfluxDB 0.9 and later) on the `k8s` database should list all the series being pushed by Heapster. If you do not see any series, check the Heapster logs.
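
Both checks can be scripted. The Go sketch below first calls InfluxDB's `/ping` endpoint and then runs `SHOW SERIES` against the `k8s` database through the `/query` HTTP endpoint. It assumes an InfluxDB 1.x server (it does not apply to InfluxDB 0.8) reachable at the base URL you pass in, for example through `kubectl port-forward`; `showSeriesURL`, `seriesKeys`, and `run` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// showSeriesURL builds an InfluxDB 1.x HTTP API URL that runs SHOW SERIES on db.
func showSeriesURL(base, db string) string {
	q := url.Values{"db": {db}, "q": {"SHOW SERIES"}}
	return strings.TrimSuffix(base, "/") + "/query?" + q.Encode()
}

// seriesKeys returns the first column of every row in a /query JSON response,
// which for SHOW SERIES is the series key.
func seriesKeys(body []byte) ([]string, error) {
	var resp struct {
		Results []struct {
			Error  string `json:"error"`
			Series []struct {
				Values [][]interface{} `json:"values"`
			} `json:"series"`
		} `json:"results"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var keys []string
	for _, r := range resp.Results {
		if r.Error != "" {
			return nil, fmt.Errorf("influxdb: %s", r.Error)
		}
		for _, s := range r.Series {
			for _, row := range s.Values {
				if len(row) > 0 {
					if k, ok := row[0].(string); ok {
						keys = append(keys, k)
					}
				}
			}
		}
	}
	return keys, nil
}

// run pings InfluxDB and counts the series in the k8s database.
func run(base string) error {
	client := &http.Client{Timeout: 5 * time.Second}
	// InfluxDB 1.x answers GET /ping with 204 No Content when it is up.
	ping, err := client.Get(strings.TrimSuffix(base, "/") + "/ping")
	if err != nil {
		return fmt.Errorf("InfluxDB not reachable: %w", err)
	}
	ping.Body.Close()
	if ping.StatusCode != http.StatusNoContent {
		return fmt.Errorf("unexpected /ping status: %s", ping.Status)
	}
	resp, err := client.Get(showSeriesURL(base, "k8s"))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	keys, err := seriesKeys(body)
	if err != nil {
		return err
	}
	fmt.Printf("%d series in database k8s\n", len(keys))
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run . http://<influxdb-host>:8086")
		return
	}
	if err := run(os.Args[1]); err != nil {
		fmt.Println(err)
	}
}
```

A count of `0` with no error is the case where the Heapster logs are the next place to look.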