sigs.k8s.io/cluster-api@v1.7.1/docs/book/src/developer/logging.md (about)

     1  # Logging
     2  The Cluster API project is committed to improving the SRE/developer experience when troubleshooting issues, and logging
     3  plays an important part in this goal.
     4  
     5  In Cluster API we strive to follow three principles while implementing logging:
     6  
     7  - **Logs are for SRE & developers, not for end users!**
     8    Whenever an end user is required to read logs to understand what is happening in the system, most probably there is an
     9    opportunity for improvement of other observability in our API, like e.g. conditions and events.
    10  - **Navigating logs should be easy**:
    11    We should make sure that SREs/Developers can easily drill down logs while investigating issues, e.g. by allowing to
    12    search all the log entries for a specific Machine object, eventually across different controllers/reconciler logs.
    13  - **Cluster API developers MUST use logs!**
    14    As Cluster API contributors you are not only the ones that implement logs, but also the first users of them. Use it!
    15    Provide feedback!
    16  
    17  ## Upstream Alignment
    18  
    19  Kubernetes defines a set of [logging conventions](https://git.k8s.io/community/contributors/devel/sig-instrumentation/logging.md), 
    20  as well as tools and libraries for logging.
    21  
    22  ## Continuous improvement
    23  
    24  The foundational items of Cluster API logging are:
    25  
    26  - Support for structured logging in all the Cluster API controllers (see [log format](#log-format)).
    27  - Using contextual logging (see [contextual logging](#contextual-logging)).
    28  - Adding a minimal set of key/value pairs in the logger at the beginning of each reconcile loop, so all the subsequent
    29    log entries will inherit them (see [key value pairs](#keyvalue-pairs)).
    30  
    31  Starting from the above foundations, then the long tail of small improvements will consist of following activities: 
    32   
    33  - Improve consistency of additional key/value pairs added by single log entries (see [key value pairs](#keyvalue-pairs)).
    34  - Improve log messages (see [log messages](#log-messages)).
    35  - Improve consistency of log levels (see [log levels](#log-levels)).
    36  
    37  ## Log Format
    38  
    39  Controllers MUST provide support for [structured logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging)
    40  and for the [JSON output format](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#json-output-format); 
    41  quoting the Kubernetes documentation, these are the key elements of this approach:
    42  
    43  - Separate a log message from its arguments.
    44  - Treat log arguments as key-value pairs.
    45  - Be easily parsable and queryable.
    46  
    47  Cluster API uses all the tooling provided by the Kubernetes community to implement structured logging: [Klog](https://github.com/kubernetes/klog), a
    48  [logr](https://github.com/go-logr/logr) wrapper that works with controller runtime, and other utils for exposing flags
    49  in the controller’s main.go.
    50  
    51  Ideally, in a future release of Cluster API we will make JSON output format the default format for all the Cluster API
    52  controllers (currently the default is still text format).
    53  
    54  ## Contextual logging
    55  
    56  [Contextual logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging)
    57  is the practice of using a log stored in the context across the entire chain of calls of a reconcile
    58  action. One of the main advantages of this approach is that key value pairs which are added to the logger at the
    59  beginning of the chain are then inherited by all the subsequent log entries created down the chain.
    60  
    61  Contextual logging is also embedded in controller runtime; In Cluster API we use contextual logging via controller runtime's
    62  `LoggerFrom(ctx)` and `LoggerInto(ctx, log)` primitives and this ensures that:
    63  
    64  - The logger passed to each reconcile call has a unique `reconcileID`, so all the logs being written during a single 
    65    reconcile call can be easily identified (note: controller runtime also adds other useful key value pairs by default).
    66  - The logger has a key value pair identifying the objects being reconciled,e.g. a Machine Deployment, so all the logs
    67    impacting this object can be easily identified.
    68  
    69  Cluster API developer MUST ensure that:
    70  
    71  - The logger has a set of key value pairs identifying the hierarchy of objects the object being reconciled belongs to,
    72    e.g. the Cluster a Machine Deployment belongs to, so it will be possible to drill down logs for related Cluster API
    73    objects while investigating issues.
    74  
    75  ## Key/Value Pairs
    76  
    77  One of the key elements of structured logging is key-value pairs.
    78  
    79  Having consistent key value pairs is a requirement for ensuring readability and for providing support for searching and
    80  correlating lines across logs.
    81  
    82  A set of good practices for defining key value pairs is defined in the [Kubernetes Guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments), and
    83  one of the above practices is really important for Cluster API developers
    84  
    85  - Developers MUST use `klog.KObj` or `klog.KRef` functions when logging key value pairs for Kubernetes objects, thus
    86    ensuring a key value pair representing a Kubernetes object is formatted consistently in all the logs.
    87  
    88  Please note that, in order to ensure logs can be easily searched it is important to ensure consistency for the following
    89  key value pairs (in order of importance):
    90  
    91  - Key value pairs identifying the object being reconciled, e.g. a Machine Deployment.
    92  - Key value pairs identifying the hierarchy of objects being reconciled, e.g. the Cluster a Machine Deployment belongs
    93    to.
    94  - Key value pairs identifying side effects on other objects, e.g. while reconciling a MachineDeployment, the controller 
    95    creates a MachinesSet.
    96  - Other Key value pairs.
    97  
    98  ## Log Messages
    99  
   100  - A Message MUST always start with a capital letter.
   101  - Period at the end of a message MUST be omitted.
   102  - Always prefer logging before the action, so in case of errors there will be an immediate, visual correlation between
   103    the action log and the corresponding error log; While logging before the action, log verbs should use the -ing form.
   104  - Ideally log messages should surface a different level of detail according to the target log level (see [log levels](#log-levels)
   105    for more details).
   106  
   107  ## Log Levels
   108  
   109  Kubernetes provides a set of [recommendations](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#what-method-to-use)
   110  for log levels; as a small integration on the above guidelines we would like to add:
   111  
   112  - Logs at the lower levels of verbosity (<=3) are meant to document “what happened” by describing how an object status
   113    is being changed by controller/reconcilers across subsequent reconciliations; as a rule of thumb, it is reasonable
   114    to assume that a person reading those logs has a deep knowledge of how the system works, but it should not be required
   115    for those persons to have knowledge of the codebase. 
   116  - Logs at higher levels of verbosity (>=4) are meant to document “how it happened”, providing insight on thorny parts of
   117    the code; a person reading those logs usually has deep knowledge of the codebase. 
   118  - Don’t use verbosity higher than 5.
   119  
   120  We are using log level 2 as a default verbosity for all core Cluster API
   121  controllers as recommended by the Kubernetes guidelines.
   122  
   123  ## Trade-offs
   124  
   125  When developing logs there are operational trade-offs to take into account, e.g. verbosity vs space allocation, user
   126  readability vs machine readability, maintainability of the logs across the code base.
   127  
   128  A reasonable approach for logging is to keep things simple and implement more log verbosity selectively and only on
   129  thorny parts of code. Over time, based on feedback from SRE/developers, more logs can be added to shed light where necessary.
   130  
   131  ## Developing and testing logs
   132  
   133  Our [Tilt](tilt.md) setup offers a batteries-included log suite based on [Promtail](https://grafana.com/docs/loki/latest/clients/promtail/), [Loki](https://grafana.com/docs/loki/latest/fundamentals/overview/) and [Grafana](https://grafana.com/docs/grafana/latest/explore/logs-integration/).
   134  
   135  We are working to continuously improving this experience, allowing Cluster API developers to use logs and improve them as part of their development process.
   136  
   137  For the best experience exploring the logs using Tilt:
   138  1. Set `--logging-format=json`. 
   139  2. Set a high log verbosity, e.g. `v=5`.
   140  3. Enable Promtail, Loki, and Grafana under `deploy_observability`.
   141  
   142  A minimal example of a tilt-settings.yaml file that deploys a ready-to-use logging suite looks like:
   143  ```yaml
   144  deploy_observability:
   145    - promtail
   146    - loki
   147    - grafana
   148  enable_providers:
   149    - docker
   150    - kubeadm-bootstrap
   151    - kubeadm-control-plane
   152  extra_args:
   153    core:
   154      - "--logging-format=json"
   155      - "--v=5"
   156    docker:
   157      - "--v=5"
   158      - "--logging-format=json"
   159    kubeadm-bootstrap:
   160      - "--v=5"
   161      - "--logging-format=json"
   162    kubeadm-control-plane:
   163      - "--v=5"
   164      - "--logging-format=json"
   165  ```
   166  The above options can be combined with other settings from our [Tilt](tilt.md) setup. Once Tilt is up and running with these settings users will be able to browse logs using the Grafana Explore UI. 
   167  
   168  This will normally be available on `localhost:3001`. To explore logs from Loki, open the Explore interface for the DataSource 'Loki'. [This link](http://localhost:3001/explore?datasource%22:%22Loki%22) should work as a shortcut with the default Tilt settings.
   169  
   170  ### Example queries
   171  
   172  In the Log browser the following queries can be used to browse logs by controller, and by specific Cluster API objects. For example:
   173  ```
   174  {app="capi-controller-manager"} | json 
   175  ```
   176  Will return logs from the `capi-controller-manager` which are parsed in json. Passing the query through the json parser allows filtering by key-value pairs that are part of nested json objects. For example `.cluster.name` becomes `cluster_name`.
   177  
   178  ```
   179  {app="capi-controller-manager"} | json | Cluster_name="my-cluster"
   180  ```
   181  Will return logs from the `capi-controller-manager` that are associated with the Cluster `my-cluster`.
   182  
   183  ```
   184  {app="capi-controller-manager"} | json | Cluster_name="my-cluster" | v <= 2
   185  ```
   186  Will return logs from the `capi-controller-manager` that are associated with the Cluster `my-cluster` with log level <= 2.
   187  
   188  ```
   189  {app="capi-controller-manager"} | json | Cluster_name="my-cluster" reconcileID="6f6ad971-bdb6-4fa3-b803-xxxxxxxxxxxx"
   190  ```
   191  
   192  Will return logs from the `capi-controller-manager`, associated with the Cluster `my-cluster` and the Reconcile ID `6f6ad971-bdb6-4fa3-b803-xxxxxxxxxxxx`. Each reconcile loop will have a unique Reconcile ID.
   193  
   194  ```
   195  {app="capi-controller-manager"} | json | Cluster_name="my-cluster" reconcileID="6f6ad971-bdb6-4fa3-b803-ef81c5c8f9d0" controller="cluster" | line_format "{{ .msg }}"
   196  ```
   197  Will return logs from the `capi-controller-manager`, associated with the Cluster `my-cluster` and the Reconcile ID `6f6ad971-bdb6-4fa3-b803-xxxxxxxxxxxx` it further selects only those logs which come from the Cluster controller. It will then format the logs so only the message is displayed.
   198  
   199  ```
   200  {app=~"capd-controller-manager|capi-kubeadm-bootstrap-controller-manager|capi-kubeadm-control-plane-controller-manager"} | json | Cluster_name="my-cluster" Machine_name="my-cluster-linux-worker-1" | line_format "{{.controller}} {{.msg}}"
   201  ```
   202  
   203  Will return the logs from four CAPI providers - the Core provider, Kubeadm Control Plane provider, Kubeadm Bootstrap provider and the Docker infrastructure provider. It filters by the cluster name and the machine name and then formats the log lines to show just the source controller and the message. This allows us to correlate logs and see actions taken by each of these four providers related to the machine `my-cluster-linux-worker-1`.
   204  
   205  For more information on formatting and filtering logs using Grafana and Loki see:
   206  - [json parsing](https://grafana.com/docs/loki/latest/clients/promtail/stages/json/)
   207  - [log queries](https://grafana.com/docs/loki/latest/logql/log_queries/)
   208  
   209  ## What about providers
   210  Cluster API providers are developed by independent teams, and each team is free to define their own processes and
   211  conventions.
   212  
   213  However, given that SRE/developers looking at logs are often required to look both at logs from core CAPI and providers,
   214  we encourage providers to adopt and contribute to the guidelines defined in this document.
   215  
   216  It is also worth noting that the foundational elements of the approach described in this document are easy to achieve
   217  by leveraging default Kubernetes tooling for logging.
   218