---
title: Scraping
---
# Promtail Scraping (Service Discovery)

## File Target Discovery

Promtail discovers the locations of log files and extracts labels from them through
the `scrape_configs` section in the config YAML. The syntax is identical to what
[Prometheus uses](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

`scrape_configs` contains one or more entries which are executed for each
discovered target (i.e., each container in each new pod running in the
instance):

```yaml
scrape_configs:
  # Targets listed statically in the config file
  - job_name: local
    static_configs:
      - ...

  # Targets discovered through the Kubernetes API
  - job_name: kubernetes
    kubernetes_sd_configs:
      - ...
```

If more than one scrape config section matches your logs, you will get duplicate
entries as the logs are sent in different streams likely with slightly
different labels.

There are different types of labels present in Promtail:

- Labels starting with `__` (two underscores) are internal labels. They usually
  come from dynamic sources like service discovery. Once relabeling is done,
  they are removed from the label set. To persist internal labels so they're
  sent to Grafana Loki, rename them so they don't start with `__`. See
  [Relabeling](#relabeling) for more information.

- Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which
  are generated based on your Kubernetes pod's labels.

  For example, if your Kubernetes pod has a label `name` set to `foobar`, then
  the `scrape_configs` section will receive an internal label
  `__meta_kubernetes_pod_label_name` with a value set to `foobar`.

- Other labels starting with `__meta_kubernetes_*` exist based on other
  Kubernetes metadata, such as the namespace of the pod
  (`__meta_kubernetes_namespace`) or the name of the container inside the pod
  (`__meta_kubernetes_pod_container_name`). Refer to
  [the Prometheus docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config)
  for the full list of Kubernetes meta labels.

- The `__path__` label is a special label which Promtail uses after discovery to
  figure out where the file to read is located. Wildcards are allowed, for
  example `/var/log/*.log` to get all files with a `log` extension in the
  specified directory, and `/var/log/**/*.log` for matching files and
  directories recursively. For a full list of options check out the docs for
  the [library](https://github.com/bmatcuk/doublestar) Promtail uses.

- The `__path_exclude__` label is another special label Promtail uses after
  discovery, to exclude a subset of the files discovered using `__path__` from
  being read in the current scrape_config block. It uses the same
  [library](https://github.com/bmatcuk/doublestar) to enable usage of
  wildcards and glob patterns.

- The label `filename` is added for every file found in `__path__` to ensure the
  uniqueness of the streams. It is set to the absolute path of the file the line
  was read from.
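
As a minimal sketch of how the special labels above fit together, the following
static target reads every `.log` file under `/var/log` recursively, except for
the files in one subdirectory. The job name, the `job` label value, and both
glob patterns are illustrative assumptions, not values Promtail requires:

```yaml
scrape_configs:
  - job_name: system            # hypothetical job name
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs          # static label attached to every line
          # Internal label: which files to tail (recursive glob)
          __path__: /var/log/**/*.log
          # Internal label: files matched by __path__ that should be skipped
          __path_exclude__: /var/log/nginx/*.log
```

Each file that is read ends up in its own stream, because Promtail adds the
`filename` label described above.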

### Kubernetes Discovery

Note that while Promtail can utilize the Kubernetes API to discover pods as
targets, it can only read log files from pods that are running on the same node
as the one Promtail is running on. Promtail looks for a `__host__` label on
each target and validates that it is set to the same hostname as Promtail's
(using either `$HOSTNAME` or the hostname reported by the kernel if the
environment variable is not set).

This means that any time Kubernetes service discovery is used, there must be a
`relabel_config` that creates the intermediate label `__host__` from
`__meta_kubernetes_pod_node_name`:

```yaml
relabel_configs:
  - source_labels: ['__meta_kubernetes_pod_node_name']
    target_label: '__host__'
```

See [Relabeling](#relabeling) for more information. For details on how to
configure the service discovery, see the
[Kubernetes Service Discovery configuration](../configuration/#kubernetes_sd_config).
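
To show the `__host__` rule in context, here is a sketch of a complete pod
scrape config. It assumes that the node stores container logs under
`/var/log/pods/<something><pod_uid>/<container_name>/`, which is a common
layout but depends on your cluster; the job name and the `namespace` target
label are likewise only examples:

```yaml
scrape_configs:
  - job_name: kubernetes-pods   # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Required: lets Promtail keep only the pods scheduled on its own node
      - source_labels: ['__meta_kubernetes_pod_node_name']
        target_label: '__host__'
      # Persist the namespace as a label that is sent to Loki
      - source_labels: ['__meta_kubernetes_namespace']
        target_label: 'namespace'
      # Build __path__ from the pod UID and container name
      # (assumed log directory layout, adjust to your nodes)
      - source_labels:
          - '__meta_kubernetes_pod_uid'
          - '__meta_kubernetes_pod_container_name'
        separator: '/'
        target_label: '__path__'
        replacement: '/var/log/pods/*$1/*.log'
```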

## Journal Scraping (Linux Only)

On systems with `systemd`, Promtail also supports reading from the journal. Unlike
file scraping, which is defined in the `static_configs` stanza, journal scraping is
defined in a `journal` stanza:

```yaml
scrape_configs:
  - job_name: journal
    journal:
      json: false
      max_age: 12h
      path: /var/log/journal
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
```

All fields defined in the `journal` section are optional, and are just provided
here for reference. The `max_age` field ensures that no older entry than the
time specified will be sent to Loki; this circumvents "entry too old" errors.
The `path` field tells Promtail where to read journal entries from. The labels
map defines a constant list of labels to add to every journal entry that Promtail
reads.

When the `json` field is set to `true`, messages from the journal will be
passed through the pipeline as JSON, keeping all of the original fields from the
journal entry. This is useful when you don't want to index some fields but you
still want to know what values they contained.

By default, Promtail reads from the journal by looking in the `/var/log/journal`
and `/run/log/journal` paths. If running Promtail inside of a Docker container,
the path appropriate to your distribution should be bind mounted inside of
Promtail along with binding `/etc/machine-id`. Bind mounting `/etc/machine-id`
to the path of the same name is required for the journal reader to know which
specific journal to read from. For example:

```bash
docker run \
  -v /var/log/journal/:/var/log/journal/ \
  -v /run/log/journal/:/run/log/journal/ \
  -v /etc/machine-id:/etc/machine-id \
  grafana/promtail:latest \
  -config.file=/path/to/config/file.yaml
```

When Promtail reads from the journal, it brings in all journal fields as
internal labels prefixed with `__journal_`. In the example above, the
`_SYSTEMD_UNIT` field from the journal (exposed as `__journal__systemd_unit`)
was transformed into a label called `unit` through `relabel_configs`. See
[Relabeling](#relabeling) for more information. For a list of fields exposed by
the journal, see
[the systemd man pages](https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html).

Here's an example where the `_SYSTEMD_UNIT`, `_HOSTNAME`, and `SYSLOG_IDENTIFIER` fields are relabeled for use in Loki.

Keep in mind that labels prefixed with `__` will be dropped, so relabeling is required to keep these labels.

```yaml
- job_name: systemd-journal
  journal:
    labels:
      cluster: ops-tools1
      job: default/systemd-journal
    path: /var/log/journal
  relabel_configs:
  - source_labels:
    - __journal__systemd_unit
    target_label: systemd_unit
  - source_labels:
    - __journal__hostname
    target_label: nodename
  - source_labels:
    - __journal_syslog_identifier
    target_label: syslog_identifier
```

## Windows Event Log

On Windows, Promtail supports reading from the event log.
Windows event targets can be configured using the `windows_events` stanza:

```yaml
scrape_configs:
- job_name: windows
  windows_events:
    use_incoming_timestamp: false
    bookmark_path: "./bookmark.xml"
    eventlog_name: "Application"
    xpath_query: '*'
    labels:
      job: windows
  relabel_configs:
    - source_labels: ['computer']
      target_label: 'host'
```

When Promtail receives an event, it attaches the `channel` and `computer` labels
and serializes the event as JSON.
You can relabel default labels via [Relabeling](#relabeling) if required.

Providing a path to a bookmark is mandatory. The bookmark is used to persist the last event processed, which allows
resuming the target without skipping logs.

See the [configuration](https://grafana.com/docs/loki/latest/clients/promtail/configuration/#windows_events) section for more information.
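
The `xpath_query` field can narrow down which events are read. The sketch below
assumes the standard Windows event query syntax, in which `Level=1` and
`Level=2` select Critical and Error events; the job name and the bookmark file
name are made up for this example:

```yaml
scrape_configs:
- job_name: windows-system-errors        # hypothetical job name
  windows_events:
    use_incoming_timestamp: true
    # A separate bookmark file for this target (assumed file name)
    bookmark_path: "./bookmark-system.xml"
    eventlog_name: "System"
    # Assumed filter: only Critical (1) and Error (2) events
    xpath_query: '*[System[(Level=1 or Level=2)]]'
    labels:
      job: windows
  relabel_configs:
    - source_labels: ['computer']
      target_label: 'host'
```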

## GCP Log scraping

Promtail supports scraping cloud resource logs such as GCS bucket logs, load balancer logs, and Kubernetes cluster logs from GCP.
Configuration is specified in the `gcplog` section, within `scrape_config`.

There are two kinds of scraping strategies: `pull` and `push`.

### Pull

```yaml
  - job_name: gcplog
    gcplog:
      subscription_type: "pull" # If the `subscription_type` field is empty, defaults to `pull`
      project_id: "my-gcp-project"
      subscription: "my-pubsub-subscription"
      use_incoming_timestamp: false # defaults to false, in which case Promtail rewrites timestamps
      labels:
        job: "gcplog"
    relabel_configs:
      - source_labels: ['__gcp_resource_type']
        target_label: 'resource_type'
      - source_labels: ['__gcp_resource_labels_project_id']
        target_label: 'project'
```

Here `project_id` and `subscription` are the only required fields.

- `project_id` is the GCP project id.
- `subscription` is the GCP pubsub subscription from which Promtail consumes log entries.

Before using the `gcplog` target, GCP should be [configured](../gcplog-cloud) with a pubsub subscription to receive logs from.

It also supports `relabeling` and `pipeline` stages just like other targets.

When Promtail receives GCP logs, various internal labels are made available for [relabeling](#relabeling):

- `__gcp_logname`
- `__gcp_resource_type`
- `__gcp_resource_labels_<NAME>`

In the example above, the `project_id` label from a GCP resource was transformed into a label called `project` through `relabel_configs`.

### Push

```yaml
  - job_name: gcplog
    gcplog:
      subscription_type: "push"
      use_incoming_timestamp: false
      labels:
        job: "gcplog-push"
      server:
        http_listen_address: 0.0.0.0
        http_listen_port: 8080
    relabel_configs:
      - source_labels: ['__gcp_message_id']
        target_label: 'message_id'
      - source_labels: ['__gcp_attributes_logging_googleapis_com_timestamp']
        target_label: 'incoming_ts'
```

When configuring the GCP Log push target, Promtail will start an HTTP server listening on port `8080`, as configured in the `server`
section. This server exposes the single endpoint `POST /gcp/api/v1/push`, responsible for receiving logs from GCP.

It also supports `relabeling` and `pipeline` stages.

When Promtail receives GCP logs, various internal labels are made available for [relabeling](#relabeling):

- `__gcp_message_id`
- `__gcp_attributes_<NAME>`

In the example above, the `__gcp_message_id` and the `__gcp_attributes_logging_googleapis_com_timestamp` labels are
transformed to `message_id` and `incoming_ts` through `relabel_configs`. All other internal labels, for example some other attribute,
will be dropped by the target if not transformed.

## Syslog Receiver

Promtail supports receiving [IETF Syslog (RFC5424)](https://tools.ietf.org/html/rfc5424)
messages from a TCP or UDP stream. Receiving syslog messages is defined in a `syslog`
stanza:

```yaml
scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 0.0.0.0:1514
      listen_protocol: tcp
      idle_timeout: 60s
      label_structured_data: yes
      labels:
        job: "syslog"
    relabel_configs:
      - source_labels: ['__syslog_message_hostname']
        target_label: 'host'
```

The only required field in the syslog section is the `listen_address` field,
where a valid network address must be provided. The default protocol for
receiving messages is TCP. To change the protocol, the `listen_protocol` field
can be changed to `udp`. Note that UDP does not support TLS.
The `idle_timeout` can help with cleaning up stale syslog connections.
If `label_structured_data` is set,
[structured data](https://tools.ietf.org/html/rfc5424#section-6.3) in the
syslog header will be translated to internal labels in the form of
`__syslog_message_sd_<ID>_<KEY>`.
The labels map defines a constant list of labels to add to every syslog message
that Promtail receives.

Note that it is recommended to deploy a dedicated syslog forwarder
like **syslog-ng** or **rsyslog** in front of Promtail.
The forwarder can take care of the various specifications
and transports that exist (UDP, BSD syslog, ...). See recommended output
configurations for [syslog-ng](#syslog-ng-output-configuration) and
[rsyslog](#rsyslog-output-configuration).

When Promtail receives syslog messages, it brings in all header fields,
parsed from the received message, prefixed with `__syslog_` as internal labels.
In the example above, the `__syslog_message_hostname` internal label
from the syslog message was transformed into a label called `host` through
`relabel_configs`. See [Relabeling](#relabeling) for more information.
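
As a further sketch, the following variant listens on UDP and keeps a few more
header fields. The `__syslog_message_app_name` and `__syslog_message_severity`
labels are listed in the Promtail configuration reference; the structured-data
label assumes that incoming messages carry an `origin` element with a
`software` parameter, which depends entirely on your senders. The job name and
target label names are arbitrary:

```yaml
scrape_configs:
  - job_name: syslog-udp                  # hypothetical job name
    syslog:
      listen_address: 0.0.0.0:1514
      listen_protocol: udp                # no TLS available on UDP
      label_structured_data: yes
      labels:
        job: "syslog"
    relabel_configs:
      - source_labels: ['__syslog_message_hostname']
        target_label: 'host'
      - source_labels: ['__syslog_message_app_name']
        target_label: 'app'
      - source_labels: ['__syslog_message_severity']
        target_label: 'severity'
      # Assumed structured data: [origin software="..."]
      - source_labels: ['__syslog_message_sd_origin_software']
        target_label: 'software'
```

Any `__syslog_*` label that is not relabeled this way is dropped before the
entry is sent to Loki, as with all internal labels.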

### Syslog-NG Output Configuration

```
destination d_loki {
  syslog("localhost" transport("tcp") port(<promtail_port>));
};
```

### Rsyslog Output Configuration

For sending messages via TCP:

```
*.* action(type="omfwd" protocol="tcp" target="<promtail_host>" port="<promtail_port>" Template="RSYSLOG_SyslogProtocol23Format" TCP_Framing="octet-counted" KeepAlive="on")
```

For sending messages via UDP:

```
*.* action(type="omfwd" protocol="udp" target="<promtail_host>" port="<promtail_port>" Template="RSYSLOG_SyslogProtocol23Format")
```

## Kafka

Promtail supports reading messages from Kafka using a consumer group.
The Kafka targets can be configured using the `kafka` stanza:

```yaml
scrape_configs:
- job_name: kafka
  kafka:
    brokers:
    - my-kafka-0.org:50705
    - my-kafka-1.org:50705
    topics:
    - ^promtail.*
    - some_fixed_topic
    labels:
      job: kafka
  relabel_configs:
    - action: replace
      source_labels:
        - __meta_kafka_topic
      target_label: topic
    - action: replace
      source_labels:
        - __meta_kafka_partition
      target_label: partition
    - action: replace
      source_labels:
        - __meta_kafka_group_id
      target_label: group
    - action: replace
      source_labels:
        - __meta_kafka_message_key
      target_label: message_key
```

Only the `brokers` and `topics` fields are required.
See the [configuration](../../configuration/#kafka) section for more information.

## GELF

<span style="background-color:#f3f973;">GELF support in Promtail is an experimental feature.</span>

Promtail supports listening for messages using the [GELF](https://docs.graylog.org/docs/gelf) UDP protocol.
The GELF targets can be configured using the `gelf` stanza:

```yaml
scrape_configs:
- job_name: gelf
  gelf:
    listen_address: "0.0.0.0:12201"
    use_incoming_timestamp: true
    labels:
      job: gelf
  relabel_configs:
    - action: replace
      source_labels:
        - __gelf_message_host
      target_label: host
    - action: replace
      source_labels:
        - __gelf_message_level
      target_label: level
    - action: replace
      source_labels:
        - __gelf_message_facility
      target_label: facility
```

## Cloudflare

Promtail supports pulling HTTP log messages from Cloudflare using the [Logpull API](https://developers.cloudflare.com/logs/logpull).
The Cloudflare targets can be configured with a `cloudflare` block:

```yaml
scrape_configs:
- job_name: cloudflare
  cloudflare:
    api_token: REDACTED
    zone_id: REDACTED
    fields_type: all
    labels:
      job: cloudflare-foo.com
```

Only `api_token` and `zone_id` are required.
Refer to the [Cloudflare](../configuration/#cloudflare) configuration section for details.

## Heroku Drain

Promtail supports receiving logs from a Heroku application by using a [Heroku HTTPS Drain](https://devcenter.heroku.com/articles/log-drains#https-drains).
Configuration is specified in a `heroku_drain` block within the Promtail `scrape_config` configuration.

```yaml
- job_name: heroku_drain
  heroku_drain:
    server:
      http_listen_address: 0.0.0.0
      http_listen_port: 8080
    labels:
      job: heroku_drain_docs
    use_incoming_timestamp: true
  relabel_configs:
    - source_labels: ['__heroku_drain_host']
      target_label: 'host'
    - source_labels: ['__heroku_drain_app']
      target_label: 'app'
    - source_labels: ['__heroku_drain_proc']
      target_label: 'proc'
    - source_labels: ['__heroku_drain_log_id']
      target_label: 'log_id'
```

Within the `scrape_configs` configuration for a Heroku Drain target, the `job_name` must be a Prometheus-compatible [metric name](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels).

The [server](../configuration.md#server) section configures the HTTP server created for receiving logs.
`labels` defines a static set of label values added to each received log entry. `use_incoming_timestamp` can be used to pass
the timestamp received from Heroku.

Before using a `heroku_drain` target, Heroku should be configured with the URL where the Promtail instance will be listening.
Follow the steps in [Heroku HTTPS Drain docs](https://devcenter.heroku.com/articles/log-drains#https-drains) for using the Heroku CLI
with a command like the following:

```
heroku drains:add [http|https]://HOSTNAME:8080/heroku/api/v1/drain -a HEROKU_APP_NAME
```

It also supports `relabeling` and `pipeline` stages just like other targets.

When Promtail receives Heroku Drain logs, various internal labels are made available for [relabeling](#relabeling):

- `__heroku_drain_host`
- `__heroku_drain_app`
- `__heroku_drain_proc`
- `__heroku_drain_log_id`

In the example above, these four internal labels are transformed into the labels `host`, `app`, `proc`, and `log_id` through `relabel_configs`.

## Relabeling

Each `scrape_configs` entry can contain a `relabel_configs` stanza.
`relabel_configs` is a list of operations to transform the labels from discovery
into another form.

A single entry in `relabel_configs` can also reject targets by doing an
`action: drop` if a label value matches a specified regex. When a target is
dropped, the owning `scrape_config` will not process logs from that particular
source. Other `scrape_configs` without the drop action reading from the same
target may still use and forward logs from it to Loki.

A common use case of `relabel_configs` is to transform an internal label such
as `__meta_kubernetes_*` into an intermediate internal label such as
`__service__`. The intermediate internal label may then be dropped based on
value or transformed to a final external label, such as `job`.
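
The chain described above might look like the following sketch. The choice of
`__meta_kubernetes_pod_label_name` as the source and the `namespace/service`
format of the final `job` label are assumptions made for illustration:

```yaml
relabel_configs:
  # 1. Copy a discovery label into the intermediate internal label
  - action: replace
    source_labels:
      - __meta_kubernetes_pod_label_name
    target_label: __service__
  # 2. Drop targets for which the intermediate label is empty
  - action: drop
    regex: ''
    source_labels:
      - __service__
  # 3. Build the external label from the namespace and the intermediate label
  - action: replace
    source_labels:
      - __meta_kubernetes_namespace
      - __service__
    separator: /
    target_label: job
    replacement: $1
```

Because `__service__` still starts with `__`, it is removed once relabeling is
done; only `job` reaches Loki.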

### Examples

- Drop the target if a label (`__service__` in the example) is empty:
```yaml
  - action: drop
    regex: ''
    source_labels:
    - __service__
```
- Drop the target if any of the `source_labels` contains a value:
```yaml
  - action: drop
    regex: .+
    separator: ''
    source_labels:
    - __meta_kubernetes_pod_label_name
    - __meta_kubernetes_pod_label_app
```
- Persist an internal label by renaming it so it will be sent to Loki:
```yaml
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
```
- Persist all Kubernetes pod labels by mapping them, for example by mapping
    `__meta_kubernetes_pod_label_foo` to `foo`:
```yaml
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
```

Additional reading:

 - [Julien Pivotto's slides from PromConf Munich, 2017](https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749)

## HTTP client options

Promtail uses the Prometheus HTTP client implementation for all calls to Loki.
Therefore it can be configured using the `clients` stanza, where one or more
connections to Loki can be established:

```yaml
clients:
  - [ <client_option> ]
```

Refer to [`client_config`](./configuration#client_config) from the Promtail
Configuration reference for all available options.
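
For orientation, a `clients` stanza with two connections could look like the
sketch below. The URLs, tenant ID, label value, user name, and password file
path are placeholders invented for this example; check the option names
against the configuration reference for your Promtail version:

```yaml
clients:
  # First Loki instance, multi-tenant, with an extra static label
  - url: http://loki.example.com:3100/loki/api/v1/push
    tenant_id: team-a
    external_labels:
      cluster: ops-tools1
  # Second Loki instance behind HTTP basic authentication
  - url: https://loki-backup.example.com/loki/api/v1/push
    basic_auth:
      username: promtail
      password_file: /etc/promtail/loki-password
```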