---
title: Pipelines
---
# Pipelines

A detailed look at how to set up Promtail to process your log lines, including
extracting metrics and labels.

## Pipeline

A pipeline is used to transform a single log line, its labels, and its
timestamp. A pipeline is composed of a set of **stages**. There are 4 types of
stages:

1. **Parsing stages** parse the current log line and extract data out of it. The
   extracted data is then available for use by other stages.
1. **Transform stages** transform extracted data from previous stages.
1. **Action stages** take extracted data from previous stages and do something
   with it. Actions can:
    1. Add or modify labels on the log line
    1. Change the timestamp of the log line
    1. Change the content of the log line
    1. Create a metric based on the extracted data
1. **Filtering stages** optionally apply a subset of stages or drop entries based on some
   condition.
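
As an illustrative sketch of how the four stage types compose (the field values here are assumptions, not taken from the example below), a pipeline might use one stage of each type in the typical order:

```yaml
pipeline_stages:
  # Parsing: pull "level" out of a JSON log line into the extracted map.
  - json:
      expressions:
        level: level
  # Transform: normalize the extracted level to lowercase.
  - template:
      source: level
      template: '{{ ToLower .Value }}'
  # Action: promote the extracted "level" value to a label.
  - labels:
      level:
  # Filtering: drop entries whose extracted level equals "debug".
  - drop:
      source: level
      value: "debug"
```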
    26  
Typical pipelines will start with a parsing stage (such as a
[regex](../stages/regex/) or [json](../stages/json/) stage) to extract data
from the log line. Then, a series of action stages will be present to do
something with that extracted data. The most common action stage will be a
[labels](../stages/labels/) stage to turn extracted data into a label.

A common stage will also be the [match](../stages/match/) stage to selectively
apply stages or drop entries based on a [LogQL stream selector and filter expressions](../../../logql/).

Note that pipelines cannot currently be used to deduplicate logs; Grafana Loki will
receive the same log line multiple times if, for example:

1. Two scrape configs read from the same file
1. Duplicate log lines in a file are sent through a pipeline. Deduplication is
   not done.

However, Loki will perform some deduplication at query time for logs that have
the exact same nanosecond timestamp, labels, and log contents.
    45  
The following example gives a good glimpse of what you can achieve with a
pipeline:

```yaml
scrape_configs:
- job_name: kubernetes-pods-name
  kubernetes_sd_configs: ....
  pipeline_stages:

  # This stage is only going to run if the scraped target has a label
  # of "name" with value "promtail".
  - match:
      selector: '{name="promtail"}'
      stages:
      # The regex stage parses out a level, timestamp, and component. At the end
      # of the stage, the values for level, timestamp, and component are only
      # set internally for the pipeline. Future stages can use these values and
      # decide what to do with them.
      - regex:
          expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'

      # The labels stage takes the level and component entries from the previous
      # regex stage and promotes them to a label. For example, level=error may
      # be a label added by this stage.
      - labels:
          level:
          component:

      # Finally, the timestamp stage takes the timestamp extracted from the
      # regex stage and promotes it to be the new timestamp of the log entry,
      # parsing it as an RFC3339Nano-formatted value.
      - timestamp:
          format: RFC3339Nano
          source: timestamp

  # This stage is only going to run if the scraped target has a label of
  # "name" with a value of "nginx" and if the log line contains the word "GET"
  - match:
      selector: '{name="nginx"} |= "GET"'
      stages:
      # This regex stage extracts a new output by matching an IP-like prefix
      # and capturing the rest of the line.
      - regex:
          expression: \w{1,3}\.\w{1,3}\.\w{1,3}\.\w{1,3}(?P<output>.*)

      # The output stage changes the content of the captured log line by
      # setting it to the value of output from the regex stage.
      - output:
          source: output

  # This stage is only going to run if the scraped target has a label of
  # "name" with a value of "jaeger-agent".
  - match:
      selector: '{name="jaeger-agent"}'
      stages:
      # The JSON stage reads the log line as a JSON string and extracts
      # the "level" field from the object for use in further stages.
      - json:
          expressions:
            level: level

      # The labels stage pulls the value from "level" that was extracted
      # from the previous stage and promotes it to a label.
      - labels:
          level:
- job_name: kubernetes-pods-app
  kubernetes_sd_configs: ....
  pipeline_stages:
  # This stage will only run if the scraped target has a label of "app"
  # with a value of *either* grafana or prometheus.
  - match:
      selector: '{app=~"grafana|prometheus"}'
      stages:
      # The regex stage will extract a level and component for use in further
      # stages, allowing the level to be defined as either lvl=<level> or
      # level=<level> and the component to be defined as either
      # logger=<component> or component=<component>
      - regex:
          expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"

      # The labels stage then promotes the level and component extracted from
      # the regex stage to labels.
      - labels:
          level:
          component:

  # This stage will only run if the scraped target has a label "app"
  # with a value of "some-app" and the log line doesn't contain the word "info"
  - match:
      selector: '{app="some-app"} != "info"'
      stages:
      # The regex stage tries to extract a Go panic by looking for "panic:"
      # in the log message.
      - regex:
          expression: ".*(?P<panic>panic: .*)"

      # The metrics stage is going to increment a panic_total metric counter
      # which Promtail exposes. The counter is only incremented when panic
      # was extracted from the regex stage.
      - metrics:
          panic_total:
            type: Counter
            description: "total count of panic"
            source: panic
            config:
              action: inc
```
   153  
### Data Accessible to Stages

The following sections further describe the types that are accessible to each
stage (although not all may be used):

#### Label Set

The current set of labels for the log line. Initialized to be the set of labels
that were scraped along with the log line. The label set is only modified by an
action stage, but filtering stages read from it.

The final label set will be indexed by Loki and can be used for queries.
   166  
#### Extracted Map

A collection of key-value pairs extracted during a parsing stage. Subsequent
stages operate on the extracted map, either transforming it or taking action
with it. At the end of a pipeline, the extracted map is discarded; for a
parsing stage to be useful, it must always be paired with at least one action
stage.

The extracted map is initialized with the same set of initial labels that were
scraped along with the log line. This initial data allows for taking action on
the values of labels inside pipeline stages that only manipulate the extracted
map. For example, log entries tailed from files have the label `filename` whose
value is the file path that was tailed. When a pipeline executes for that log
entry, the initial extracted map would contain `filename` using the same value
as the label.
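
As a minimal sketch of acting on that initial data (the `path` label name is an assumption for illustration), a pipeline can use the `filename` key with no parsing stage at all:

```yaml
pipeline_stages:
  # No parsing stage is needed: "filename" is already present in the
  # extracted map because it was scraped as a label. This illustrative
  # pipeline copies its value to a second label named "path".
  - labels:
      path: filename
```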
   182  
#### Log Timestamp

The current timestamp for the log line. Action stages can modify this value.
If left unset, it defaults to the time when the log was scraped.

The final value for the timestamp is sent to Loki.

#### Log Line

The current log line, represented as text. Initialized to be the text that
Promtail scraped. Action stages can modify this value.

The final value for the log line is sent to Loki as the text content for the
given log entry.
   197  
## Stages

Parsing stages:

  - [docker](../stages/docker/): Extract data by parsing the log line using the standard Docker format.
  - [cri](../stages/cri/): Extract data by parsing the log line using the standard CRI format.
  - [json](../stages/json/): Extract data by parsing the log line as JSON.
  - [regex](../stages/regex/): Extract data using a regular expression.
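
For container logs the parsing stage often needs no configuration at all; as a minimal sketch, the docker and cri stages are typically used empty:

```yaml
pipeline_stages:
  # For containers logging via the Docker JSON log driver; use `- cri: {}`
  # instead on CRI-based runtimes such as containerd.
  - docker: {}
```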
   206  
Transform stages:

  - [multiline](../stages/multiline/): Merges multiple lines, e.g. stack traces, into multiline blocks.
  - [template](../stages/template/): Use Go templates to modify extracted data.
   211  
Action stages:

  - [timestamp](../stages/timestamp/): Set the timestamp value for the log entry.
  - [output](../stages/output/): Set the log line text.
  - [labels](../stages/labels/): Update the label set for the log entry.
  - [metrics](../stages/metrics/): Calculate metrics based on extracted data.
  - [tenant](../stages/tenant/): Set the tenant ID value to use for the log entry.
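
The tenant stage is the one action stage not shown in the example above; a minimal sketch (the `customer_id` key is a hypothetical field extracted by the preceding parsing stage):

```yaml
pipeline_stages:
  - json:
      expressions:
        customer_id: customer_id
  # Route the entry to the tenant whose ID was extracted above.
  - tenant:
      source: customer_id
```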
   219  
Filtering stages:

  - [match](../stages/match/): Conditionally run stages based on the label set.
  - [drop](../stages/drop/): Conditionally drop log lines based on several options.
  - [limit](../stages/limit/): Conditionally rate limit log lines based on several options.