github.com/thanos-io/thanos@v0.32.5/docs/tracing.md

# Tracing

Thanos supports several tracing backends, each of which implements the `opentracing.Tracer` interface.

All clients are configured either with `--tracing.config-file`, which references a configuration file, or with `--tracing.config`, which takes the YAML config inline.

## How to use `config` flags?

You can either pass the path to a YAML file (in the format defined below) with `--tracing.config-file`, or pass the YAML content directly with `--tracing.config`. We recommend the latter, as it gives an explicit, static view of the configuration for each component. It also saves you the fuss of creating and managing an additional file.

Don't be afraid of multiline flags!

In Kubernetes it is as easy as this (using the Thanos sidecar as an example):

```yaml
      - args:
        - sidecar
        - |
          --objstore.config=type: GCS
          config:
            bucket: <bucket>
        - --prometheus.url=http://localhost:9090
        - |
          --tracing.config=type: STACKDRIVER
          config:
            service_name: ""
            project_id: <project>
            sample_factor: 16
        - --tsdb.path=/prometheus-data
```

## How to add a new client?

1. Create a new directory under `pkg/tracing/<provider>`.
2. Implement the `opentracing.Tracer` interface.
3. Add the client implementation to the [factory](../pkg/tracing/client/factory.go) code, using as few flags as possible in every command.
4. Add the client's config struct to [cfggen](../scripts/cfggen/main.go) to allow config auto-generation.

At that point, anyone can use your provider by specifying it in the tracing configuration.
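
The steps above can be illustrated with a minimal sketch of a factory that selects a tracer by its configured `type`. This is not the actual Thanos factory code: the `Tracer` interface here is a local stand-in for `opentracing.Tracer` (so the sketch needs no third-party imports), and `exampleTracer`, `newTracer`, and the `EXAMPLE` type name are hypothetical. Upper-casing the type before matching is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Tracer is a local stand-in for opentracing.Tracer so that this sketch
// compiles without third-party imports. A real client implements the
// opentracing interface instead.
type Tracer interface {
	Name() string
}

// exampleTracer is a hypothetical backend that would live under
// pkg/tracing/<provider>.
type exampleTracer struct {
	serviceName string
}

// Name identifies the tracer; it exists only to make the sketch observable.
func (t exampleTracer) Name() string { return "EXAMPLE/" + t.serviceName }

// newTracer plays the role of the factory: it picks an implementation based
// on the `type` field of the tracing configuration. Upper-casing the type
// before matching is an assumption of this sketch.
func newTracer(providerType, serviceName string) (Tracer, error) {
	switch strings.ToUpper(providerType) {
	case "EXAMPLE":
		return exampleTracer{serviceName: serviceName}, nil
	default:
		return nil, fmt.Errorf("tracing with type %q is not supported", providerType)
	}
}

func main() {
	tracer, err := newTracer("example", "thanos-sidecar")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created tracer:", tracer.Name())

	// An unregistered type is rejected by the factory.
	if _, err := newTracer("unknown", "thanos-sidecar"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Registering a backend in a single `switch` keeps the set of supported `type` values in one place, which is why adding a provider involves touching the factory in addition to writing the tracer itself.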

See [this issue](https://github.com/thanos-io/thanos/issues/1972) to check our progress on moving to the OpenTelemetry Go client library.

## Usage

Once tracing is enabled and sampling is configured for the backend, Thanos generates traces for all gRPC and HTTP APIs thanks to generic "middlewares". APIs that are more interesting to observe, such as `query` or `query_range`, have additional low-level spans with focused metadata showing the latency of important operations. For example, the Jaeger view of an HTTP `query_range` API call might look as follows:

![view](img/tracing2.png)

As you can see, it contains both the HTTP request and spans around the gRPC requests, since the [Querier](components/query.md) calls gRPC services to fetch series data.

Each Thanos component generates spans related to its work and sends them to a central place, e.g. Jaeger or an OpenTelemetry collector. That place is then responsible for tying all spans into a single trace that shows the request's execution path.
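
To make the idea of tying spans into a trace concrete, here is a minimal sketch of what a collector conceptually does: group spans reported by different components by their trace ID. The `span` type, the `groupByTrace` helper, and all sample values are hypothetical; real backends such as Jaeger store far richer span data.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical, minimal view of what a component reports.
type span struct {
	TraceID   string
	Component string
	Operation string
}

// groupByTrace ties spans together by trace ID, preserving arrival order
// within each trace.
func groupByTrace(spans []span) map[string][]span {
	traces := make(map[string][]span)
	for _, s := range spans {
		traces[s.TraceID] = append(traces[s.TraceID], s)
	}
	return traces
}

func main() {
	// Sample spans; the IDs and operation names are made up for illustration.
	reported := []span{
		{TraceID: "131da78f02aa3525", Component: "query", Operation: "HTTP request"},
		{TraceID: "131da78f02aa3525", Component: "sidecar", Operation: "gRPC call"},
		{TraceID: "9f2c4e1ab0d37766", Component: "query", Operation: "HTTP request"},
	}

	traces := groupByTrace(reported)

	// Sort the trace IDs so the output order is deterministic.
	ids := make([]string, 0, len(traces))
	for id := range traces {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	for _, id := range ids {
		fmt.Printf("trace %s:\n", id)
		for _, s := range traces[id] {
			fmt.Printf("  %s: %s\n", s.Component, s.Operation)
		}
	}
}
```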

### Obtaining Trace ID

A single trace is tied to a single, unique request to the system and is composed of many spans from different components. A trace is identified by its `Trace ID`, which is a unique hash, e.g. `131da78f02aa3525`. Other systems may refer to this as a `request id` or `operation id`. To use trace data, you need to find the trace IDs of the requests you are interested in, e.g. a request with an interesting error, one with high latency, or a debug call you just made.

When using tracing with Thanos, you can obtain a trace ID in multiple ways:

* Search by labels/attributes/tags/time/component/latency, e.g. using Jaeger indexing.
* Use [exemplars](https://www.bwplotka.dev/2021/correlations-exemplars/).
* If the request was sampled, the response will have an `X-Thanos-Trace-Id` header with the trace ID of this request as its value.

![view](img/tracing.png)

### Forcing Sampling

Every request against any Thanos component's API that carries the `X-Thanos-Force-Tracing` header will be sampled, provided a tracing backend is configured.

## Configuration

Currently supported tracing backends:

### OpenTelemetry (OTLP)

Thanos supports exporting traces in the OpenTelemetry Protocol (OTLP). Both gRPC and HTTP clients are supported. Options can also be provided via environment variables. For more details, see the [exporter specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#configuration-options).

```yaml mdox-exec="go run scripts/cfggen/main.go --name=otlp.Config"
type: OTLP
config:
  client_type: ""
  service_name: ""
  reconnection_period: 0s
  compression: ""
  insecure: false
  endpoint: ""
  url_path: ""
  timeout: 0s
  retry_config:
    retry_enabled: false
    retry_initial_interval: 0s
    retry_max_interval: 0s
    retry_max_elapsed_time: 0s
  headers: {}
  tls_config:
    ca_file: ""
    cert_file: ""
    key_file: ""
    server_name: ""
    insecure_skip_verify: false
  sampler_type: ""
  sampler_param: ""
```

### Jaeger

Client for https://github.com/jaegertracing/jaeger tracing. Options can also be provided via environment variables. For more details, see the Jaeger [exporter specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#jaeger-exporter).

*WARNING: The `RPC Metrics`, `Gen128Bit` and `Disabled` options are now deprecated and have no effect when set.*

```yaml mdox-exec="go run scripts/cfggen/main.go --name=jaeger.Config"
type: JAEGER
config:
  service_name: ""
  disabled: false
  rpc_metrics: false
  tags: ""
  sampler_type: ""
  sampler_param: 0
  sampler_manager_host_port: ""
  sampler_max_operations: 0
  sampler_refresh_interval: 0s
  sampler_parent_config:
    local_parent_sampled: false
    remote_parent_sampled: false
  sampling_server_url: ""
  operation_name_late_binding: false
  initial_sampler_rate: 0
  reporter_max_queue_size: 0
  reporter_flush_interval: 0s
  reporter_log_spans: false
  reporter_disable_attempt_reconnecting: false
  reporter_attempt_reconnect_interval: 0s
  endpoint: ""
  user: ""
  password: ""
  agent_host: ""
  agent_port: 0
  traceid_128bit: false
```

### Google Cloud (formerly Stackdriver)

Client for https://cloud.google.com/trace/ tracing.

You also need to ensure that authentication with the API is working; follow [this guide](https://cloud.google.com/trace/docs/setup/go-ot#configure_your_platform) to set it up.

*Note:* The `type` in the configuration below can be either `GOOGLE_CLOUD` or `STACKDRIVER`; both are accepted to ensure backwards compatibility.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=google_cloud.Config"
type: GOOGLE_CLOUD
config:
  service_name: ""
  project_id: ""
  sample_factor: 0
```

### Elastic APM

Client for https://www.elastic.co/products/apm tracing.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=elasticapm.Config"
type: ELASTIC_APM
config:
  service_name: ""
  service_version: ""
  service_environment: ""
  sample_rate: 0
```

### Lightstep

Client for [Lightstep](https://lightstep.com).

To configure Thanos to interact with Lightstep, you need to provide at least an [access token](https://docs.lightstep.com/docs/create-and-use-access-tokens) in the configuration file. The `collector` key is optional and is used when you have on-premises satellites.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=lightstep.Config"
type: LIGHTSTEP
config:
  access_token: ""
  collector:
    scheme: ""
    host: ""
    port: 0
    plaintext: false
    custom_ca_cert_file: ""
  tags: ""
```