# Tracing

Thanos supports different tracing backends that implement the `opentracing.Tracer` interface.

All clients are configured using `--tracing.config-file` to reference the configuration file or `--tracing.config` to pass the YAML config directly.

## How to use `config` flags?

You can either pass the YAML file defined below via `--tracing.config-file` or pass the YAML content directly using `--tracing.config`. We recommend the latter as it gives an explicit static view of the configuration for each component. It also saves you the fuss of creating and managing an additional file.

Don't be afraid of multiline flags!

In Kubernetes it is as easy as (using the Thanos sidecar as an example):

```yaml
- args:
  - sidecar
  - |
    --objstore.config=type: GCS
    config:
      bucket: <bucket>
  - --prometheus.url=http://localhost:9090
  - |
    --tracing.config=type: STACKDRIVER
    config:
      service_name: ""
      project_id: <project>
      sample_factor: 16
  - --tsdb.path=/prometheus-data
```

## How to add a new client?

1. Create a new directory under `pkg/tracing/<provider>`.
2. Implement the `opentracing.Tracer` interface.
3. Add the client implementation to the factory in the [factory](../pkg/tracing/client/factory.go) code. (Use as few flags as possible in every command.)
4. Add the client config struct to [cfggen](../scripts/cfggen/main.go) to allow config auto-generation.

At that point, anyone can use your provider by spec.

See [this issue](https://github.com/thanos-io/thanos/issues/1972) to check our progress on moving to the OpenTelemetry Go client library.

## Usage

Once tracing is enabled and sampling per backend is configured, Thanos will generate traces for all gRPC and HTTP APIs thanks to generic "middlewares".
Some of the more interesting APIs to observe, like `query` or `query_range`, have additional low-level spans with focused metadata showing latency for important functionality. For example, a Jaeger view of an HTTP `query_range` API call might look as follows:

![view](img/tracing2.png)

As you can see, it contains both the HTTP request and spans around the gRPC requests, since [Querier](components/query.md) calls gRPC services to fetch series data.

Each Thanos component generates spans related to its work and sends them to a central place, e.g. Jaeger or an OpenTelemetry collector. That place is then responsible for tying all spans into a single trace, showing the request execution path.

### Obtaining Trace ID

A single trace is tied to a single, unique request to the system and is composed of many spans from different components. A trace is identified by its `Trace ID`, which is a unique hash, e.g. `131da78f02aa3525`. This information is also referred to as `request id` or `operation id` in other systems. In order to use trace data, you want to find the trace IDs that explain the requests you are interested in, e.g. a request with an interesting error, a longer latency, or just a debug call you just made.

When using tracing with Thanos, you can obtain a trace ID in multiple ways:

* Search by labels/attributes/tags/time/component/latency, e.g. using Jaeger indexing.
* [Exemplars](https://www.bwplotka.dev/2021/correlations-exemplars/)
* If the request was sampled, the response will have an `X-Thanos-Trace-Id` response header with the trace ID of this request as its value.

![view](img/tracing.png)

### Forcing Sampling

Every request against any Thanos component's API with the header `X-Thanos-Force-Tracing` will be sampled if a tracing backend is configured.

## Configuration

Currently supported tracing backends:

### OpenTelemetry (OTLP)

Thanos supports exporting traces in the OpenTelemetry Protocol (OTLP). Both gRPC and HTTP clients are supported.
Options can also be provided via environment variables. For more details see the [exporter specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#configuration-options).

```yaml mdox-exec="go run scripts/cfggen/main.go --name=otlp.Config"
type: OTLP
config:
  client_type: ""
  service_name: ""
  reconnection_period: 0s
  compression: ""
  insecure: false
  endpoint: ""
  url_path: ""
  timeout: 0s
  retry_config:
    retry_enabled: false
    retry_initial_interval: 0s
    retry_max_interval: 0s
    retry_max_elapsed_time: 0s
  headers: {}
  tls_config:
    ca_file: ""
    cert_file: ""
    key_file: ""
    server_name: ""
    insecure_skip_verify: false
  sampler_type: ""
  sampler_param: ""
```

### Jaeger

Client for https://github.com/jaegertracing/jaeger tracing. Options can also be provided via environment variables. For more details see the Jaeger [exporter specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#jaeger-exporter).

*WARNING: Options `RPC Metrics`, `Gen128Bit` and `Disabled` are now deprecated and won't have any effect when set.*

```yaml mdox-exec="go run scripts/cfggen/main.go --name=jaeger.Config"
type: JAEGER
config:
  service_name: ""
  disabled: false
  rpc_metrics: false
  tags: ""
  sampler_type: ""
  sampler_param: 0
  sampler_manager_host_port: ""
  sampler_max_operations: 0
  sampler_refresh_interval: 0s
  sampler_parent_config:
    local_parent_sampled: false
    remote_parent_sampled: false
  sampling_server_url: ""
  operation_name_late_binding: false
  initial_sampler_rate: 0
  reporter_max_queue_size: 0
  reporter_flush_interval: 0s
  reporter_log_spans: false
  reporter_disable_attempt_reconnecting: false
  reporter_attempt_reconnect_interval: 0s
  endpoint: ""
  user: ""
  password: ""
  agent_host: ""
  agent_port: 0
  traceid_128bit: false
```

### Google Cloud (formerly Stackdriver)

Client for https://cloud.google.com/trace/ tracing.

You will also need to ensure that authentication with the API is working. Follow [this guide](https://cloud.google.com/trace/docs/setup/go-ot#configure_your_platform) to set it up.

*Note:* The `type` in the configuration below can have either value `GOOGLE_CLOUD` or `STACKDRIVER`. This is to ensure backwards compatibility.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=google_cloud.Config"
type: GOOGLE_CLOUD
config:
  service_name: ""
  project_id: ""
  sample_factor: 0
```

### Elastic APM

Client for https://www.elastic.co/products/apm tracing.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=elasticapm.Config"
type: ELASTIC_APM
config:
  service_name: ""
  service_version: ""
  service_environment: ""
  sample_rate: 0
```

### Lightstep

Client for [Lightstep](https://lightstep.com).

In order to configure Thanos to interact with Lightstep, you need to provide at least an [access token](https://docs.lightstep.com/docs/create-and-use-access-tokens) in the configuration file. The `collector` key is optional and is used when you have on-premise satellites.

```yaml mdox-exec="go run scripts/cfggen/main.go --name=lightstep.Config"
type: LIGHTSTEP
config:
  access_token: ""
  collector:
    scheme: ""
    host: ""
    port: 0
    plaintext: false
    custom_ca_cert_file: ""
  tags: ""
```
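
As an illustration of the optional `collector` key, a filled-in Lightstep configuration for an on-premise satellite might look like the sketch below. The token placeholder, satellite host name, and port are hypothetical values chosen for this example, not defaults. The sketch also assumes that keys left out of the file keep the zero values shown in the generated template above.

```yaml
# Illustrative sketch only: every value below is a placeholder.
type: LIGHTSTEP
config:
  # Required: substitute your own Lightstep access token.
  access_token: "<access-token>"
  # Optional: only relevant when spans go to on-premise satellites.
  collector:
    scheme: "https"                     # hypothetical
    host: "satellite.example.internal"  # hypothetical host name
    port: 443                           # hypothetical port
```

When you do not run on-premise satellites, the whole `collector` block can presumably be dropped, leaving `access_token` as the only configured key.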