---
title: Scraping
---
# Promtail Scraping (Service Discovery)

## File Target Discovery

Promtail discovers locations of log files and extracts labels from them through
the `scrape_configs` section in the config YAML. The syntax is identical to what
[Prometheus uses](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

`scrape_configs` contains one or more entries which are executed for each
discovered target (i.e., each container in each new pod running in the
instance):

```
scrape_configs:
  - job_name: local
    static_configs:
      - ...

  - job_name: kubernetes
    kubernetes_sd_configs:
      - ...
```

If more than one scrape config section matches your logs, you will get duplicate
entries as the logs are sent in different streams, likely with slightly
different labels.

There are different types of labels present in Promtail:

- Labels starting with `__` (two underscores) are internal labels. They usually
  come from dynamic sources like service discovery. Once relabeling is done,
  they are removed from the label set. To persist internal labels so they're
  sent to Grafana Loki, rename them so they don't start with `__`. See
  [Relabeling](#relabeling) for more information.

- Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which
  are generated based on your Kubernetes pod's labels.

  For example, if your Kubernetes pod has a label `name` set to `foobar`, then
  the `scrape_configs` section will receive an internal label
  `__meta_kubernetes_pod_label_name` with a value set to `foobar`.

- Other labels starting with `__meta_kubernetes_*` exist based on other
  Kubernetes metadata, such as the namespace of the pod
  (`__meta_kubernetes_namespace`) or the name of the container inside the pod
  (`__meta_kubernetes_pod_container_name`). Refer to
  [the Prometheus docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config)
  for the full list of Kubernetes meta labels.

- The `__path__` label is a special label which Promtail uses after discovery to
  figure out where the file to read is located. Wildcards are allowed, for example `/var/log/*.log` to get all files with a `log` extension in the specified directory, and `/var/log/**/*.log` for matching files and directories recursively. For a full list of options check out the docs for the [library](https://github.com/bmatcuk/doublestar) Promtail uses.

- The `__path_exclude__` label is another special label Promtail uses after
  discovery, to exclude a subset of the files discovered using `__path__` from
  being read in the current `scrape_config` block. It uses the same
  [library](https://github.com/bmatcuk/doublestar) to enable usage of
  wildcards and glob patterns.

- The label `filename` is added for every file found in `__path__` to ensure the
  uniqueness of the streams. It is set to the absolute path of the file the line
  was read from.

### Kubernetes Discovery

Note that while Promtail can utilize the Kubernetes API to discover pods as
targets, it can only read log files from pods that are running on the same node
as the one Promtail is running on. Promtail looks for a `__host__` label on
each target and validates that it is set to the same hostname as Promtail's
(using either `$HOSTNAME` or the hostname reported by the kernel if the
environment variable is not set).

This means that any time Kubernetes service discovery is used, there must be a
`relabel_config` that creates the intermediate label `__host__` from
`__meta_kubernetes_pod_node_name`:

```yaml
relabel_configs:
  - source_labels: ['__meta_kubernetes_pod_node_name']
    target_label: '__host__'
```

See [Relabeling](#relabeling) for more information. For more information on how to configure the service discovery, see the [Kubernetes Service Discovery configuration](../configuration/#kubernetes_sd_config).

## Journal Scraping (Linux Only)

On systems with `systemd`, Promtail also supports reading from the journal. Unlike
file scraping, which is defined in the `static_configs` stanza, journal scraping is
defined in a `journal` stanza:

```yaml
scrape_configs:
  - job_name: journal
    journal:
      json: false
      max_age: 12h
      path: /var/log/journal
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
```

All fields defined in the `journal` section are optional, and are just provided
here for reference. The `max_age` field ensures that no entry older than the
time specified will be sent to Loki; this circumvents "entry too old" errors.
The `path` field tells Promtail where to read journal entries from. The `labels`
map defines a constant list of labels to add to every journal entry that Promtail
reads.

When the `json` field is set to `true`, messages from the journal will be
passed through the pipeline as JSON, keeping all of the original fields from the
journal entry. This is useful when you don't want to index some fields but you
still want to know what values they contained.

By default, Promtail reads from the journal by looking in the `/var/log/journal`
and `/run/log/journal` paths.
If running Promtail inside of a Docker container,
the path appropriate to your distribution should be bind mounted inside of
Promtail along with binding `/etc/machine-id`. Bind mounting `/etc/machine-id`
to the path of the same name is required for the journal reader to know which
specific journal to read from. For example:

```bash
docker run \
  -v /var/log/journal/:/var/log/journal/ \
  -v /run/log/journal/:/run/log/journal/ \
  -v /etc/machine-id:/etc/machine-id \
  grafana/promtail:latest \
  -config.file=/path/to/config/file.yaml
```

When Promtail reads from the journal, it brings in all fields prefixed with
`__journal_` as internal labels. As in the example above, the `_SYSTEMD_UNIT`
field from the journal was transformed into a label called `unit` through
`relabel_configs`. See [Relabeling](#relabeling) for more information. Also see [the systemd man pages](https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html) for a list of fields exposed by the journal.

Here's an example where the `_SYSTEMD_UNIT`, `_HOSTNAME`, and `SYSLOG_IDENTIFIER` fields are relabeled for use in Loki.

Keep in mind that labels prefixed with `__` will be dropped, so relabeling is required to keep these labels.

```yaml
- job_name: systemd-journal
  journal:
    labels:
      cluster: ops-tools1
      job: default/systemd-journal
    path: /var/log/journal
  relabel_configs:
    - source_labels:
        - __journal__systemd_unit
      target_label: systemd_unit
    - source_labels:
        - __journal__hostname
      target_label: nodename
    - source_labels:
        - __journal_syslog_identifier
      target_label: syslog_identifier
```

## Windows Event Log

On Windows, Promtail supports reading from the event log.
Windows event targets can be configured using the `windows_events` stanza:

```yaml
scrape_configs:
  - job_name: windows
    windows_events:
      use_incoming_timestamp: false
      bookmark_path: "./bookmark.xml"
      eventlog_name: "Application"
      xpath_query: '*'
      labels:
        job: windows
    relabel_configs:
      - source_labels: ['computer']
        target_label: 'host'
```

When Promtail receives an event, it will attach the `channel` and `computer` labels
and serialize the event in JSON.
You can relabel default labels via [Relabeling](#relabeling) if required.

Providing a path to a bookmark is mandatory. It is used to persist the last event processed and allows
resuming the target without skipping logs.

See the [configuration](https://grafana.com/docs/loki/latest/clients/promtail/configuration/#windows_events) section for more information.

## GCP Log scraping

Promtail supports scraping cloud resource logs such as GCS bucket logs, load balancer logs, and Kubernetes cluster logs from GCP.
Configuration is specified in the `gcplog` section, within `scrape_config`.

There are two kinds of scraping strategies: `pull` and `push`.

### Pull

```yaml
- job_name: gcplog
  gcplog:
    subscription_type: "pull" # If the `subscription_type` field is empty, defaults to `pull`
    project_id: "my-gcp-project"
    subscription: "my-pubsub-subscription"
    use_incoming_timestamp: false # With `false`, Promtail rewrites timestamps.
    labels:
      job: "gcplog"
  relabel_configs:
    - source_labels: ['__gcp_resource_type']
      target_label: 'resource_type'
    - source_labels: ['__gcp_resource_labels_project_id']
      target_label: 'project'
```

Here `project_id` and `subscription` are the only required fields.

- `project_id` is the GCP project ID.
- `subscription` is the GCP Pub/Sub subscription where Promtail can consume log entries from.

Before using the `gcplog` target, GCP should be [configured](../gcplog-cloud) with a Pub/Sub subscription to receive logs from.

It also supports `relabeling` and `pipeline` stages just like other targets.

When Promtail receives GCP logs, various internal labels are made available for [relabeling](#relabeling):

- `__gcp_logname`
- `__gcp_resource_type`
- `__gcp_resource_labels_<NAME>`

In the example above, the `project_id` label from a GCP resource was transformed into a label called `project` through `relabel_configs`.

### Push

```yaml
- job_name: gcplog
  gcplog:
    subscription_type: "push"
    use_incoming_timestamp: false
    labels:
      job: "gcplog-push"
    server:
      http_listen_address: 0.0.0.0
      http_listen_port: 8080
  relabel_configs:
    - source_labels: ['__gcp_message_id']
      target_label: 'message_id'
    - source_labels: ['__gcp_attributes_logging_googleapis_com_timestamp']
      target_label: 'incoming_ts'
```

When configuring the GCP Log push target, Promtail will start an HTTP server listening on port `8080`, as configured in the `server`
section. This server exposes the single endpoint `POST /gcp/api/v1/push`, responsible for receiving logs from GCP.

It also supports `relabeling` and `pipeline` stages.

When Promtail receives GCP logs, various internal labels are made available for [relabeling](#relabeling):

- `__gcp_message_id`
- `__gcp_attributes_<NAME>`

In the example above, the `__gcp_message_id` and the `__gcp_attributes_logging_googleapis_com_timestamp` labels are
transformed to `message_id` and `incoming_ts` through `relabel_configs`. All other internal labels, for example some other attribute,
will be dropped by the target if not transformed.

## Syslog Receiver

Promtail supports receiving [IETF Syslog (RFC5424)](https://tools.ietf.org/html/rfc5424)
messages from a TCP or UDP stream.
Receiving syslog messages is defined in a `syslog`
stanza:

```yaml
scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 0.0.0.0:1514
      listen_protocol: tcp
      idle_timeout: 60s
      label_structured_data: yes
      labels:
        job: "syslog"
    relabel_configs:
      - source_labels: ['__syslog_message_hostname']
        target_label: 'host'
```

The only required field in the syslog section is the `listen_address` field,
where a valid network address must be provided. The default protocol for
receiving messages is TCP. To change the protocol, the `listen_protocol` field
can be changed to `udp`. Note that UDP does not support TLS.
The `idle_timeout` can help with cleaning up stale syslog connections.
If `label_structured_data` is set,
[structured data](https://tools.ietf.org/html/rfc5424#section-6.3) in the
syslog header will be translated to internal labels in the form of
`__syslog_message_sd_<ID>_<KEY>`.
The `labels` map defines a constant list of labels to add to every syslog message
that Promtail receives.

Note that it is recommended to deploy a dedicated syslog forwarder
like **syslog-ng** or **rsyslog** in front of Promtail.
The forwarder can take care of the various specifications
and transports that exist (UDP, BSD syslog, ...). See recommended output
configurations for [syslog-ng](#syslog-ng-output-configuration) and
[rsyslog](#rsyslog-output-configuration).

When Promtail receives syslog messages, it brings in all header fields,
parsed from the received message, prefixed with `__syslog_` as internal labels.
As in the example above, the `__syslog_message_hostname`
label from the syslog message was transformed into a label called `host` through
`relabel_configs`. See [Relabeling](#relabeling) for more information.
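
As a sketch of how further header fields could be kept as labels, the fragment below extends the `relabel_configs` of the earlier syslog example. It assumes that Promtail exposes `__syslog_message_severity`, `__syslog_message_facility`, and `__syslog_message_app_name` alongside `__syslog_message_hostname`. Those three names are not stated on this page, so verify them against the configuration reference for your Promtail version. The target label names are arbitrary choices.

```yaml
scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 0.0.0.0:1514
      labels:
        job: "syslog"
    relabel_configs:
      # Taken from the example above.
      - source_labels: ['__syslog_message_hostname']
        target_label: 'host'
      # Assumed internal label names; confirm them in the configuration reference.
      - source_labels: ['__syslog_message_severity']
        target_label: 'severity'
      - source_labels: ['__syslog_message_facility']
        target_label: 'facility'
      - source_labels: ['__syslog_message_app_name']
        target_label: 'app'
```

Any `__syslog_*` label that is not renamed this way is dropped once relabeling is done, like every other internal label.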

### Syslog-NG Output Configuration

```
destination d_loki {
  syslog("localhost" transport("tcp") port(<promtail_port>));
};
```

### Rsyslog Output Configuration

For sending messages via TCP:

```
*.* action(type="omfwd" protocol="tcp" target="<promtail_host>" port="<promtail_port>" Template="RSYSLOG_SyslogProtocol23Format" TCP_Framing="octet-counted" KeepAlive="on")
```

For sending messages via UDP:

```
*.* action(type="omfwd" protocol="udp" target="<promtail_host>" port="<promtail_port>" Template="RSYSLOG_SyslogProtocol23Format")
```

## Kafka

Promtail supports reading messages from Kafka using a consumer group.
The Kafka targets can be configured using the `kafka` stanza:

```yaml
scrape_configs:
  - job_name: kafka
    kafka:
      brokers:
        - my-kafka-0.org:50705
        - my-kafka-1.org:50705
      topics:
        - ^promtail.*
        - some_fixed_topic
      labels:
        job: kafka
    relabel_configs:
      - action: replace
        source_labels:
          - __meta_kafka_topic
        target_label: topic
      - action: replace
        source_labels:
          - __meta_kafka_partition
        target_label: partition
      - action: replace
        source_labels:
          - __meta_kafka_group_id
        target_label: group
      - action: replace
        source_labels:
          - __meta_kafka_message_key
        target_label: message_key
```

Only `brokers` and `topics` are required.
See the [configuration](../../configuration/#kafka) section for more information.

## GELF

<span style="background-color:#f3f973;">GELF support in Promtail is an experimental feature.</span>

Promtail supports listening for messages using the [GELF](https://docs.graylog.org/docs/gelf) UDP protocol.
The GELF targets can be configured using the `gelf` stanza:

```yaml
scrape_configs:
  - job_name: gelf
    gelf:
      listen_address: "0.0.0.0:12201"
      use_incoming_timestamp: true
      labels:
        job: gelf
    relabel_configs:
      - action: replace
        source_labels:
          - __gelf_message_host
        target_label: host
      - action: replace
        source_labels:
          - __gelf_message_level
        target_label: level
      - action: replace
        source_labels:
          - __gelf_message_facility
        target_label: facility
```

## Cloudflare

Promtail supports pulling HTTP log messages from Cloudflare using the [Logpull API](https://developers.cloudflare.com/logs/logpull).
The Cloudflare targets can be configured with a `cloudflare` block:

```yaml
scrape_configs:
  - job_name: cloudflare
    cloudflare:
      api_token: REDACTED
      zone_id: REDACTED
      fields_type: all
      labels:
        job: cloudflare-foo.com
```

Only `api_token` and `zone_id` are required.
Refer to the [Cloudflare](../configuration/#cloudflare) configuration section for details.

## Heroku Drain

Promtail supports receiving logs from a Heroku application by using a [Heroku HTTPS Drain](https://devcenter.heroku.com/articles/log-drains#https-drains).
Configuration is specified in a `heroku_drain` block within the Promtail `scrape_config` configuration.

```yaml
- job_name: heroku_drain
  heroku_drain:
    server:
      http_listen_address: 0.0.0.0
      http_listen_port: 8080
    labels:
      job: heroku_drain_docs
    use_incoming_timestamp: true
  relabel_configs:
    - source_labels: ['__heroku_drain_host']
      target_label: 'host'
    - source_labels: ['__heroku_drain_app']
      target_label: 'app'
    - source_labels: ['__heroku_drain_proc']
      target_label: 'proc'
    - source_labels: ['__heroku_drain_log_id']
      target_label: 'log_id'
```

Within the `scrape_configs` configuration for a Heroku Drain target, the `job_name` must be a Prometheus-compatible [metric name](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels).

The [server](../configuration.md#server) section configures the HTTP server created for receiving logs.
`labels` defines a static set of label values added to each received log entry. `use_incoming_timestamp` can be used to pass
the timestamp received from Heroku.

Before using a `heroku_drain` target, Heroku should be configured with the URL where the Promtail instance will be listening.
Follow the steps in [Heroku HTTPS Drain docs](https://devcenter.heroku.com/articles/log-drains#https-drains) for using the Heroku CLI
with a command like the following:

```
heroku drains:add [http|https]://HOSTNAME:8080/heroku/api/v1/drain -a HEROKU_APP_NAME
```

It also supports `relabeling` and `pipeline` stages just like other targets.

When Promtail receives Heroku Drain logs, various internal labels are made available for [relabeling](#relabeling):

- `__heroku_drain_host`
- `__heroku_drain_app`
- `__heroku_drain_proc`
- `__heroku_drain_log_id`

In the example above, these four internal labels are transformed into the labels `host`, `app`, `proc`, and `log_id` through `relabel_configs`.

## Relabeling

Each `scrape_configs` entry can contain a `relabel_configs` stanza.
`relabel_configs` is a list of operations to transform the labels from discovery
into another form.

A single entry in `relabel_configs` can also reject targets by doing an `action:
drop` if a label value matches a specified regex. When a target is dropped, the
owning `scrape_config` will not process logs from that particular source.
Other `scrape_configs` without the drop action reading from the same target
may still use and forward logs from it to Loki.

A common use case of `relabel_configs` is to transform an internal label such
as `__meta_kubernetes_*` into an intermediate internal label such as
`__service__`. The intermediate internal label may then be dropped based on
value or transformed to a final external label, such as `__job__`.

### Examples

- Drop the target if a label (`__service__` in the example) is empty:

    ```yaml
    - action: drop
      regex: ''
      source_labels:
        - __service__
    ```

- Drop the target if any of the `source_labels` contain a value:

    ```yaml
    - action: drop
      regex: .+
      separator: ''
      source_labels:
        - __meta_kubernetes_pod_label_name
        - __meta_kubernetes_pod_label_app
    ```

- Persist an internal label by renaming it so it will be sent to Loki:

    ```yaml
    - action: replace
      source_labels:
        - __meta_kubernetes_namespace
      target_label: namespace
    ```

- Persist all Kubernetes pod labels by mapping them, for example by mapping
  `__meta_kubernetes_pod_label_foo` to `foo`:

    ```yaml
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    ```

Additional reading:

- [Julien Pivotto's slides from PromConf Munich, 2017](https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749)

## HTTP client options

Promtail uses the Prometheus HTTP client implementation for all calls to Loki.
Therefore, it can be configured using the `clients` stanza, where one or more
connections to Loki can be established:

```yaml
clients:
  - [ <client_option> ]
```

Refer to [`client_config`](./configuration#client_config) from the Promtail
Configuration reference for all available options.
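
A minimal sketch of a `clients` stanza with a single connection might look like the following. The URL, tenant ID, and label value are placeholders. The option names (`url`, `tenant_id`, `batchwait`, `batchsize`, `external_labels`) are assumed from the `client_config` reference rather than taken from this page, so confirm them for your Promtail version.

```yaml
clients:
  # Placeholder push endpoint of a Loki instance.
  - url: http://localhost:3100/loki/api/v1/push
    # Hypothetical tenant, only relevant for multi-tenant Loki setups.
    tenant_id: example-tenant
    # Maximum time to wait, and maximum size in bytes to accumulate, before a batch is sent.
    batchwait: 1s
    batchsize: 1048576
    # Static labels added to every entry sent through this client.
    external_labels:
      cluster: example-cluster
```

Adding a second list item under `clients` makes Promtail send the same logs to another Loki endpoint.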