---
title: Pipelines
---
# Pipelines

A detailed look at how to set up Promtail to process your log lines, including
extracting metrics and labels.

## Pipeline

A pipeline is used to transform a single log line, its labels, and its
timestamp. A pipeline is composed of a set of **stages**. There are 4 types of
stages:

1. **Parsing stages** parse the current log line and extract data out of it. The
   extracted data is then available for use by other stages.
1. **Transform stages** transform extracted data from previous stages.
1. **Action stages** take extracted data from previous stages and do something
   with it. Actions can:
   1. Add labels to the log line or modify its existing labels
   1. Change the timestamp of the log line
   1. Change the content of the log line
   1. Create a metric based on the extracted data
1. **Filtering stages** optionally apply a subset of stages or drop entries based on some
   condition.

Typical pipelines will start with a parsing stage (such as a
[regex](../stages/regex/) or [json](../stages/json/) stage) to extract data
from the log line. Then, a series of action stages will be present to do
something with that extracted data. The most common action stage will be a
[labels](../stages/labels/) stage to turn extracted data into a label.

A common stage will also be the [match](../stages/match/) stage to selectively
apply stages or drop entries based on a [LogQL stream selector and filter expressions](../../../logql/).

Note that pipelines cannot currently be used to deduplicate logs; Grafana Loki will
receive the same log line multiple times if, for example:

1. Two scrape configs read from the same file
1. Duplicate log lines in a file are sent through a pipeline; deduplication is
   not performed.

However, Loki will perform some deduplication at query time for logs that have
the exact same nanosecond timestamp, labels, and log contents.

The following example gives a good glimpse of what you can achieve with a
pipeline:

```yaml
scrape_configs:
- job_name: kubernetes-pods-name
  kubernetes_sd_configs: ....
  pipeline_stages:

  # This stage is only going to run if the scraped target has a label
  # of "name" with a value of "promtail".
  - match:
      selector: '{name="promtail"}'
      stages:
      # The regex stage parses out a level, timestamp, and component. At the end
      # of the stage, the values for level, timestamp, and component are only
      # set internally for the pipeline. Future stages can use these values and
      # decide what to do with them.
      - regex:
          expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'

      # The labels stage takes the level and component entries from the previous
      # regex stage and promotes them to labels. For example, level=error may
      # be a label added by this stage.
      - labels:
          level:
          component:

      # Finally, the timestamp stage takes the timestamp extracted from the
      # regex stage and promotes it to be the new timestamp of the log entry,
      # parsing it as an RFC3339Nano-formatted value.
      - timestamp:
          format: RFC3339Nano
          source: timestamp

  # This stage is only going to run if the scraped target has a label of
  # "name" with a value of "nginx" and if the log line contains the word "GET".
  - match:
      selector: '{name="nginx"} |= "GET"'
      stages:
      # This regex stage extracts a new output by matching against some
      # values and capturing the rest.
      - regex:
          expression: '\w{1,3}\.\w{1,3}\.\w{1,3}\.\w{1,3}(?P<output>.*)'

      # The output stage changes the content of the captured log line by
      # setting it to the value of output from the regex stage.
      - output:
          source: output

  # This stage is only going to run if the scraped target has a label of
  # "name" with a value of "jaeger-agent".
  - match:
      selector: '{name="jaeger-agent"}'
      stages:
      # The JSON stage reads the log line as a JSON string and extracts
      # the "level" field from the object for use in further stages.
      - json:
          expressions:
            level: level

      # The labels stage pulls the value from "level" that was extracted
      # from the previous stage and promotes it to a label.
      - labels:
          level:
- job_name: kubernetes-pods-app
  kubernetes_sd_configs: ....
  pipeline_stages:
  # This stage will only run if the scraped target has a label of "app"
  # with a value of either "grafana" or "prometheus".
  - match:
      selector: '{app=~"grafana|prometheus"}'
      stages:
      # The regex stage will extract a level and component for use in further
      # stages, allowing the level to be defined as either lvl=<level> or
      # level=<level> and the component to be defined as either
      # logger=<component> or component=<component>.
      - regex:
          expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"

      # The labels stage then promotes the level and component extracted from
      # the regex stage to labels.
      - labels:
          level:
          component:

  # This stage will only run if the scraped target has a label "app"
  # with a value of "some-app" and the log line doesn't contain the word "info".
  - match:
      selector: '{app="some-app"} != "info"'
      stages:
      # The regex stage tries to extract a Go panic by looking for "panic:"
      # in the log message.
      - regex:
          expression: ".*(?P<panic>panic: .*)"

      # The metrics stage is going to increment a panic_total metric counter
      # which Promtail exposes. The counter is only incremented when panic
      # was extracted from the regex stage.
      - metrics:
          panic_total:
            type: Counter
            description: "total count of panic"
            source: panic
            config:
              action: inc
```

### Data Accessible to Stages

The following sections further describe the types that are accessible to each
stage (although not all stages may use them):

#### Label Set

The current set of labels for the log line. Initialized to be the set of labels
that were scraped along with the log line. The label set is only modified by an
action stage, but filtering stages read from it.

The final label set will be indexed by Loki and can be used for queries.

#### Extracted Map

A collection of key-value pairs extracted during a parsing stage. Subsequent
stages operate on the extracted map, either transforming its values or taking
action on them. At the end of a pipeline, the extracted map is discarded; for a
parsing stage to be useful, it must always be paired with at least one action
stage.

The extracted map is initialized with the same set of initial labels that were
scraped along with the log line. This initial data allows for taking action on
the values of labels inside pipeline stages that only manipulate the extracted
map. For example, log entries tailed from files have the label `filename` whose
value is the file path that was tailed. When a pipeline executes for that log
entry, the initial extracted map would contain `filename` using the same value
as the label.
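To illustrate how data flows through the extracted map, here is a minimal
sketch of a parse, transform, and act sequence. The stage names are real, but
the `level` field is only an assumed example of a JSON log field:

```yaml
pipeline_stages:
# Parsing stage: fills the extracted map with a "level" key taken from
# the log line's JSON body (assuming the line is a JSON object with a
# "level" field).
- json:
    expressions:
      level: level

# Transform stage: rewrites the extracted "level" value in place
# (e.g. "WARN" becomes "warn") so label values stay consistent.
- template:
    source: level
    template: '{{ ToLower .Value }}'

# Action stage: promotes the extracted "level" value to a label.
# Without an action stage like this one, the extracted map would be
# discarded at the end of the pipeline and the json stage would have
# had no visible effect.
- labels:
    level:
```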
#### Log Timestamp

The current timestamp for the log line. Action stages can modify this value.
If left unset, it defaults to the time when the log was scraped.

The final value for the timestamp is sent to Loki.

#### Log Line

The current log line, represented as text. Initialized to be the text that
Promtail scraped. Action stages can modify this value.

The final value for the log line is sent to Loki as the text content for the
given log entry.
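To make the last two concrete, here is a minimal sketch that rewrites both the
timestamp and the line text, assuming (for illustration only) that each log
line is a JSON object with `time` and `message` fields:

```yaml
pipeline_stages:
# Extract the assumed "time" and "message" fields into the extracted map.
- json:
    expressions:
      time: time
      message: message

# Action stage: replace the log timestamp with the extracted "time"
# value, parsed as RFC3339. If this stage were omitted, the entry would
# keep the time at which it was scraped.
- timestamp:
    source: time
    format: RFC3339

# Action stage: replace the log line text with the extracted "message"
# value; only this text is sent to Loki as the entry's content.
- output:
    source: message
```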
## Stages

Parsing stages:

- [docker](../stages/docker/): Extract data by parsing the log line using the standard Docker format.
- [cri](../stages/cri/): Extract data by parsing the log line using the standard CRI format.
- [regex](../stages/regex/): Extract data using a regular expression.
- [json](../stages/json/): Extract data by parsing the log line as JSON.

Transform stages:

- [multiline](../stages/multiline/): Merge multiple lines, such as stack traces, into multiline blocks.
- [template](../stages/template/): Use Go templates to modify extracted data.

Action stages:

- [timestamp](../stages/timestamp/): Set the timestamp value for the log entry.
- [output](../stages/output/): Set the log line text.
- [labels](../stages/labels/): Update the label set for the log entry.
- [metrics](../stages/metrics/): Calculate metrics based on extracted data.
- [tenant](../stages/tenant/): Set the tenant ID value to use for the log entry.

Filtering stages:

- [match](../stages/match/): Conditionally run stages based on the label set.
- [drop](../stages/drop/): Conditionally drop log lines based on several options (see the sketch after this list).
- [limit](../stages/limit/): Conditionally rate limit log lines based on several options.
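As a final illustration of a filtering stage, here is a minimal sketch of a
drop stage; the regular expression and the reason value are example choices,
not prescribed ones:

```yaml
pipeline_stages:
# Drop any log line whose content matches the expression. Dropped
# entries are counted in a Promtail-side metric, labeled with the
# given reason, so you can track how much is being filtered out.
- drop:
    expression: ".*debug.*"
    drop_counter_reason: drop_debug_lines
```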