# Logging

The Cluster API project is committed to improving the SRE/developer experience when troubleshooting issues, and logging plays an important part in this goal.

In Cluster API we strive to follow three principles while implementing logging:

- **Logs are for SREs & developers, not for end users!**
  Whenever an end user is required to read logs to understand what is happening in the system, most probably there is an opportunity to improve other observability surfaces of our API, e.g. conditions and events.
- **Navigating logs should be easy:**
  We should make sure that SREs/developers can easily drill down into logs while investigating issues, e.g. by allowing them to search all the log entries for a specific Machine object, eventually across the logs of different controllers/reconcilers.
- **Cluster API developers MUST use logs!**
  As Cluster API contributors you are not only the ones who implement logs, but also their first users. Use them! Provide feedback!

## Upstream Alignment

Kubernetes defines a set of [logging conventions](https://git.k8s.io/community/contributors/devel/sig-instrumentation/logging.md), as well as tools and libraries for logging.

## Continuous improvement

The foundational items of Cluster API logging are:

- Support for structured logging in all the Cluster API controllers (see [log format](#log-format)).
- Using contextual logging (see [contextual logging](#contextual-logging)).
- Adding a minimal set of key/value pairs to the logger at the beginning of each reconcile loop, so all the subsequent log entries will inherit them (see [key/value pairs](#keyvalue-pairs)).
Starting from the above foundations, the long tail of small improvements will consist of the following activities:

- Improving the consistency of additional key/value pairs added by single log entries (see [key/value pairs](#keyvalue-pairs)).
- Improving log messages (see [log messages](#log-messages)).
- Improving the consistency of log levels (see [log levels](#log-levels)).

## Log Format

Controllers MUST provide support for [structured logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging) and for the [JSON output format](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#json-output-format); quoting the Kubernetes documentation, these are the key elements of this approach:

- Separate a log message from its arguments.
- Treat log arguments as key-value pairs.
- Be easily parsable and queryable.

Cluster API uses all the tooling provided by the Kubernetes community to implement structured logging: [Klog](https://github.com/kubernetes/klog), a [logr](https://github.com/go-logr/logr) wrapper that works with controller-runtime, and other utilities for exposing flags in the controllers' main.go.

Ideally, in a future release of Cluster API we will make the JSON output format the default for all the Cluster API controllers (currently the default is still the text format).

## Contextual logging

[Contextual logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging) is the practice of using a logger stored in the context across the entire chain of calls of a reconcile action. One of the main advantages of this approach is that key/value pairs added to the logger at the beginning of the chain are inherited by all the subsequent log entries created down the chain.
Contextual logging is also embedded in controller-runtime; in Cluster API we use contextual logging via controller-runtime's `LoggerFrom(ctx)` and `LoggerInto(ctx, log)` primitives, and this ensures that:

- The logger passed to each reconcile call has a unique `reconcileID`, so all the logs written during a single reconcile call can be easily identified (note: controller-runtime also adds other useful key/value pairs by default).
- The logger has a key/value pair identifying the object being reconciled, e.g. a MachineDeployment, so all the logs impacting this object can be easily identified.

Cluster API developers MUST ensure that:

- The logger has a set of key/value pairs identifying the hierarchy of objects the reconciled object belongs to, e.g. the Cluster a MachineDeployment belongs to, so it will be possible to drill down into the logs for related Cluster API objects while investigating issues.

## Key/Value Pairs

One of the key elements of structured logging is key/value pairs.

Having consistent key/value pairs is a requirement for ensuring readability and for providing support for searching and correlating lines across logs.

A set of good practices for defining key/value pairs is defined in the [Kubernetes Guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments), and one of those practices is especially important for Cluster API developers:

- Developers MUST use the `klog.KObj` or `klog.KRef` functions when logging key/value pairs for Kubernetes objects, thus ensuring that a key/value pair representing a Kubernetes object is formatted consistently in all the logs.
Please note that, in order to ensure logs can be easily searched, it is important to ensure consistency for the following key/value pairs (in order of importance):

- Key/value pairs identifying the object being reconciled, e.g. a MachineDeployment.
- Key/value pairs identifying the hierarchy of objects being reconciled, e.g. the Cluster a MachineDeployment belongs to.
- Key/value pairs identifying side effects on other objects, e.g. while reconciling a MachineDeployment, the controller creates a MachineSet.
- Other key/value pairs.

## Log Messages

- A message MUST always start with a capital letter.
- The period at the end of a message MUST be omitted.
- Always prefer logging before the action, so in case of errors there will be an immediate, visual correlation between the action log and the corresponding error log; while logging before the action, log verbs should use the -ing form.
- Ideally, log messages should surface a different level of detail according to the target log level (see [log levels](#log-levels) for more details).

## Log Levels

Kubernetes provides a set of [recommendations](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#what-method-to-use) for log levels; as a small addition to the above guidelines we would like to add:

- Logs at the lower levels of verbosity (<=3) are meant to document “what happened” by describing how an object's status is being changed by controllers/reconcilers across subsequent reconciliations; as a rule of thumb, it is reasonable to assume that a person reading those logs has a deep knowledge of how the system works, but knowledge of the codebase should not be required.
- Logs at higher levels of verbosity (>=4) are meant to document “how it happened”, providing insight into thorny parts of the code; a person reading those logs usually has a deep knowledge of the codebase.
- Don’t use verbosity higher than 5.

We are using log level 2 as the default verbosity for all core Cluster API controllers, as recommended by the Kubernetes guidelines.

## Trade-offs

When developing logs there are operational trade-offs to take into account, e.g. verbosity vs. space allocation, human readability vs. machine readability, and maintainability of the logs across the code base.

A reasonable approach for logging is to keep things simple and implement higher log verbosity selectively, and only on thorny parts of the code. Over time, based on feedback from SREs/developers, more logs can be added to shed light where necessary.

## Developing and testing logs

Our [Tilt](tilt.md) setup offers a batteries-included logging suite based on [Promtail](https://grafana.com/docs/loki/latest/clients/promtail/), [Loki](https://grafana.com/docs/loki/latest/fundamentals/overview/) and [Grafana](https://grafana.com/docs/grafana/latest/explore/logs-integration/).

We are working on continuously improving this experience, allowing Cluster API developers to use logs and improve them as part of their development process.

For the best experience exploring the logs using Tilt:

1. Set `--logging-format=json`.
2. Set a high log verbosity, e.g. `v=5`.
3. Enable Promtail, Loki, and Grafana under `deploy_observability`.
A minimal example of a tilt-settings.yaml file that deploys a ready-to-use logging suite looks like:

```yaml
deploy_observability:
- promtail
- loki
- grafana
enable_providers:
- docker
- kubeadm-bootstrap
- kubeadm-control-plane
extra_args:
  core:
  - "--logging-format=json"
  - "--v=5"
  docker:
  - "--v=5"
  - "--logging-format=json"
  kubeadm-bootstrap:
  - "--v=5"
  - "--logging-format=json"
  kubeadm-control-plane:
  - "--v=5"
  - "--logging-format=json"
```

The above options can be combined with other settings from our [Tilt](tilt.md) setup. Once Tilt is up and running with these settings, users will be able to browse logs using the Grafana Explore UI.

This will normally be available on `localhost:3001`. To explore logs from Loki, open the Explore interface for the 'Loki' data source. [This link](http://localhost:3001/explore?datasource%22:%22Loki%22) should work as a shortcut with the default Tilt settings.

### Example queries

In the log browser the following queries can be used to browse logs by controller and by specific Cluster API objects. For example:

```
{app="capi-controller-manager"} | json
```

Will return logs from the `capi-controller-manager` parsed as JSON. Passing the query through the JSON parser allows filtering by key/value pairs that are part of nested JSON objects. For example, `.cluster.name` becomes `cluster_name`.

```
{app="capi-controller-manager"} | json | Cluster_name="my-cluster"
```

Will return logs from the `capi-controller-manager` that are associated with the Cluster `my-cluster`.

```
{app="capi-controller-manager"} | json | Cluster_name="my-cluster" | v <= 2
```

Will return logs from the `capi-controller-manager` that are associated with the Cluster `my-cluster` and have a log level <= 2.
```
{app="capi-controller-manager"} | json | Cluster_name="my-cluster" reconcileID="6f6ad971-bdb6-4fa3-b803-xxxxxxxxxxxx"
```

Will return logs from the `capi-controller-manager` that are associated with the Cluster `my-cluster` and the Reconcile ID `6f6ad971-bdb6-4fa3-b803-xxxxxxxxxxxx`. Each reconcile loop has a unique Reconcile ID.

```
{app="capi-controller-manager"} | json | Cluster_name="my-cluster" reconcileID="6f6ad971-bdb6-4fa3-b803-ef81c5c8f9d0" controller="cluster" | line_format "{{ .msg }}"
```

Will return logs from the `capi-controller-manager` that are associated with the Cluster `my-cluster` and the Reconcile ID `6f6ad971-bdb6-4fa3-b803-ef81c5c8f9d0`; it further selects only those logs which come from the Cluster controller. It will then format the logs so only the message is displayed.

```
{app=~"capi-controller-manager|capd-controller-manager|capi-kubeadm-bootstrap-controller-manager|capi-kubeadm-control-plane-controller-manager"} | json | Cluster_name="my-cluster" Machine_name="my-cluster-linux-worker-1" | line_format "{{.controller}} {{.msg}}"
```

Will return the logs from four CAPI providers: the core provider, the Kubeadm Control Plane provider, the Kubeadm Bootstrap provider, and the Docker infrastructure provider. It filters by the cluster name and the machine name, and then formats the log lines to show just the source controller and the message. This allows us to correlate logs and see actions taken by each of these four providers related to the machine `my-cluster-linux-worker-1`.

For more information on formatting and filtering logs using Grafana and Loki see:
- [JSON parsing](https://grafana.com/docs/loki/latest/clients/promtail/stages/json/)
- [log queries](https://grafana.com/docs/loki/latest/logql/log_queries/)

## What about providers

Cluster API providers are developed by independent teams, and each team is free to define their own processes and conventions.
However, given that SREs/developers looking at logs are often required to look both at logs from core Cluster API and from providers, we encourage providers to adopt and contribute to the guidelines defined in this document.

It is also worth noting that the foundational elements of the approach described in this document are easy to achieve by leveraging the default Kubernetes tooling for logging.