# Distributed Tracing Using OpenTelemetry
## What is OpenTelemetry?
From [opentelemetry.io](https://opentelemetry.io):

> OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument,
> generate, collect, and export telemetry data (metrics, logs, and traces) to
> help you analyze your software’s performance and behavior.

Use OpenTelemetry to generate traces that visualize behavior in your
application. A trace is composed of nested spans rendered as a waterfall
graph. Each span records start/end timings and, optionally, other
developer-specified metadata and logging output.

Jaeger Tracing is a common tool used to receive OpenTelemetry trace data. Use
its web UI to query for traces and view the waterfall graph.

OpenTelemetry is distributed, which allows services to pass trace ids to
disparate remote services. The remote service may generate child spans that
will be visible on the same waterfall graph. This requires that all services
send traces to the same Jaeger Server.

## Why OpenTelemetry?
It is the latest standard for distributed tracing clients.

OpenTelemetry supersedes its now deprecated predecessor,
[OpenTracing](https://opentracing.io).

It no longer requires implementation-specific client modules, such as the
Jaeger client. The provided OpenTelemetry SDK includes a client for Jaeger.

## Why Jaeger Tracing Server?
Easy to set up. Powerful and easy-to-use web UI. Open source. Scalable using
Elasticsearch.

## Getting Started
[opentelemetry.io](https://opentelemetry.io)

OpenTelemetry dev reference:
[https://pkg.go.dev/go.opentelemetry.io/otel](https://pkg.go.dev/go.opentelemetry.io/otel)

See unit tests for usage examples.

### Configuration
In ideal conditions, where you wish to send traces to localhost on 6831/udp,
no configuration is necessary.

Configuration reference via environment variables:
[https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md).

#### Exporter
Traces export to Jaeger by default. Other exporters are available by setting
environment variable `OTEL_TRACES_EXPORTER` to one or more of:

* `otlp`: [OTLP: OpenTelemetry Protocol](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md)
* `jaeger`: [Jaeger Tracing](https://jaegertracing.io)
* `none`: Disable export.

Usually only one exporter is needed. If not, more than one may be selected by
delimiting with commas.

#### OTLP Exporter
By default, the OTLP exporter exports to an [OpenTelemetry
Collector](https://opentelemetry.io/docs/collector/) on localhost, gRPC port
4317. The host and port can be changed by setting environment variable
`OTEL_EXPORTER_OTLP_ENDPOINT` to a URL like `https://collector:4317`.

See more: [OTLP configuration](#OTLP)

#### Jaeger Exporter via UDP
By default, the Jaeger exporter exports to a Jaeger Agent on localhost port
6831/udp. The host and port can be changed by setting environment variables
`OTEL_EXPORTER_JAEGER_AGENT_HOST` and `OTEL_EXPORTER_JAEGER_AGENT_PORT`.
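
For example, to send traces to a Jaeger Agent at a non-default address (the
hostname below is illustrative; note the MTU caveat that follows before
targeting anything other than loopback):

```
OTEL_EXPORTER_JAEGER_AGENT_HOST=jaeger-agent.internal
OTEL_EXPORTER_JAEGER_AGENT_PORT=6831
```
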
It's important to ensure UDP traces are sent on the loopback interface (aka
localhost). UDP datagrams are limited in size to the MTU of the interface and
the payload cannot be split into multiple datagrams. The loopback interface
MTU is large, typically 65000 or higher. Network interface MTU is typically
much lower at 1500. OpenTelemetry's Jaeger client is sometimes unable to limit
its payload to fit in a 1500-byte datagram and will drop those packets. This
causes traces that are mangled or missing detail.

#### Jaeger Exporter via HTTP
If it's not possible to install a Jaeger Agent on localhost, the client can
instead export directly to the Jaeger Collector of the Jaeger Server on HTTP
port 14268.

Enable the HTTP exporter with configuration:
```
OTEL_EXPORTER_JAEGER_PROTOCOL=http/thrift.binary
OTEL_EXPORTER_JAEGER_ENDPOINT=http://<jaeger-server>:14268/api/traces
```

#### Probabilistic Sampling
By default, all traces are sampled. If the tracing volume is burdening the
application, network, or Jaeger Server, then sampling can be used to
selectively drop some of the traces.

In production, it may be ideal to set sampling based on a percentage
probability. The probability can be set in the Jaeger Server configuration or
locally.

To enable locally, set environment variables:

```
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=<value-between-0-and-1>
```

A value of 1 samples every trace and 0 samples nothing. For example, a value
of 0.25 samples roughly 25% of traces.

### Initialization
The OpenTelemetry client must be initialized to read configuration and prepare
a `Tracer` object. When the application is exiting, call `CloseTracing()`.

The library name passed in the second argument appears in spans as metadata
`otel.library.name`. This is used to identify the library or module that
generated that span. This is usually the fully qualified module name of your
repo. Pass an empty string to autodetect the module name of the executable.

```go
import "github.com/mailgun/holster/v4/tracing"

err := tracing.InitTracing(ctx, "github.com/myrepo/myservice")
// or...
err := tracing.InitTracing(ctx, "")

// ...

err = tracing.CloseTracing(ctx)
```

### Log Level
A log level may be applied to traces to filter out spans below a minimum log
severity. Spans that do not meet the minimum severity are simply dropped and
not exported.

The log level is passed with option `tracing.WithLevel()` as a numeric
log level (0-6): Panic, Fatal, Error, Warning, Info, Debug, Trace.

As a convenience, use constants, such as `tracing.DebugLevel`:

```go
import "github.com/mailgun/holster/v4/tracing"

level := tracing.DebugLevel
err := tracing.InitTracing(ctx, "my library name", tracing.WithLevel(level))
```

If `WithLevel()` is omitted, the level will be the global level set in Logrus.

See [Scope Log Level](#scope-log-level) for details on creating spans
with an assigned log level.

#### Log Level Filtering
Just like with common log frameworks, scope will filter out spans that are a
lower severity than the threshold provided using `WithLevel()`.

If scopes are nested and one in the middle is dropped, the hierarchy will be
preserved.

For example, if `WithLevel()` is passed a log level of "Info", we expect
"Debug" scopes to be dropped:
```
# Input:
Info Level 1 -> Debug Level 2 -> Info Level 3

# Exports spans in form:
Info Level 1 -> Info Level 3
```
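
The following sketch shows the same nesting in code. It assumes the Info/Debug
scope variants and level constants follow the pattern described under
[Scope Log Level](#scope-log-level); the `Level1`..`Level3` function names are
illustrative:

```go
import (
    "context"

    "github.com/mailgun/holster/v4/tracing"
)

// With tracing.WithLevel(tracing.InfoLevel) set at init, Level2's Debug span
// is dropped, but Level3's Info span is still exported and is parented to
// Level1's span.
func Level1(ctx context.Context) (reterr error) {
    ctx = tracing.StartScopeInfo(ctx)
    defer func() { tracing.EndScope(ctx, reterr) }()
    return Level2(ctx)
}

func Level2(ctx context.Context) (reterr error) {
    ctx = tracing.StartScopeDebug(ctx) // Dropped: below the Info threshold.
    defer func() { tracing.EndScope(ctx, reterr) }()
    return Level3(ctx)
}

func Level3(ctx context.Context) (reterr error) {
    ctx = tracing.StartScopeInfo(ctx) // Exported.
    defer func() { tracing.EndScope(ctx, reterr) }()
    return nil
}
```
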
Log level filtering is critical for high volume applications where debug
tracing would generate significantly more data than is sustainable or helpful
for normal operations. Developers still have the option to selectively enable
debug tracing for troubleshooting.

### Tracer Lifecycle
The common use case is to call `InitTracing()` to build a single default tracer
that the application uses throughout its lifetime, then call `CloseTracing()` on
shutdown.

The default tracer is stored globally in the tracing package for use by tracing
functions.

The tracer object identifies itself by a library name, which can be seen in Jaeger
traces as attribute `otel.library.name`. This value is typically the module
name of the application.

If it's necessary to create traces with a different library name, additional
tracer objects may be created with `NewTracer()`, which returns a context with the
tracer object embedded in it. This context object must be passed to tracing
functions to use that tracer in particular, otherwise the default tracer will be
selected.

### Setting Resources
OpenTelemetry is configured by environment variables and supplemental resource
settings. Some of these resources also map to environment variables.

#### Service Name
The service name appears in the Jaeger "Service" dropdown. If unset, the
default is `unknown_service:<executable-filename>`.

The service name may be set in configuration by environment variable
`OTEL_SERVICE_NAME`.

As an alternative to the environment variable, it may be provided as a resource.
The resource setting takes precedence over the environment variable.

```go
import "github.com/mailgun/holster/v4/tracing"

res, err := tracing.NewResource("My service", "v1.0.0")
ctx, tracer, err := tracing.InitTracing(ctx, "github.com/myrepo/myservice", tracing.WithResource(res))
```

### Manual Tracing
Basic instrumentation. Traces function duration as a span and captures Logrus logs.

```go
import (
    "context"

    "github.com/mailgun/holster/v4/tracing"
)

func MyFunc(ctx context.Context) error {
    tracer := tracing.Tracer()
    ctx, span := tracer.Start(ctx, "Span name")
    defer span.End()

    // ...

    return nil
}
```
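
Attributes can also be attached at the moment a span is started using the
standard OpenTelemetry span options. A small sketch extending the example
above (the attribute names are illustrative):

```go
import (
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

// Same pattern as above, but attributes are recorded when the span starts.
ctx, span := tracer.Start(ctx, "Span name", trace.WithAttributes(
    attribute.String("foobar", "value"),
))
defer span.End()
```
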
### Common OpenTelemetry Tasks
#### Span Attributes
The active `Span` object is embedded in the `Context` object. It can be
extracted to do things like add attribute metadata to the span:

```go
import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

func MyFunc(ctx context.Context) error {
    span := trace.SpanFromContext(ctx)
    span.SetAttributes(
        attribute.String("foobar", "value"),
        attribute.Int("x", 12345),
    )

    return nil
}
```

#### Add Span Event
A span event is a log message added to the active span. It can optionally
include attribute metadata.

```go
span.AddEvent("My message")
span.AddEvent("My metadata", trace.WithAttributes(
    attribute.String("foobar", "value"),
    attribute.Int("x", 12345),
))
```

#### Log an Error
An `error` object can be logged to the active span. This appears as a log
event on the span.

```go
err := errors.New("My error message")
span.RecordError(err)

// Can also add attribute metadata.
span.RecordError(err, trace.WithAttributes(
    attribute.String("foobar", "value"),
))
```

### Scope Tracing
The scope functions automate span start/end and error reporting to the active
trace.

| Function | Description |
| -------------- | ----------- |
| `StartScope()`/`BranchScope()` | Start a scope by creating a span named after the fully qualified calling function. |
| `StartNamedScope()`/`BranchNamedScope()` | Start a scope by creating a span with a user-provided name. |
| `EndScope()` | End the scope, recording the returned error value. |
| `CallScope()`/`CallScopeBranch()` | Call a code block as a scope using `StartScope()`/`EndScope()` functionality. |
| `CallNamedScope()`/`CallNamedScopeBranch()` | Same as `CallScope()` with a user-provided span name. |

The secondary `Branch` functions perform the same task as their counterparts,
except that they will "branch" from an existing trace only. If the context
contains no trace id, no trace will be created. The `Branch` functions are
best used with lower-level or shared code where there is no value in creating
a trace starting at that point. See the sketch after the `CallScope()` example
below.

If the `CallScope()` action function returns an error, the error message is
automatically logged to the trace and the trace is marked as error.

#### Using `StartScope()`/`EndScope()`
```go
import (
    "context"

    "github.com/mailgun/holster/v4/tracing"
    "github.com/sirupsen/logrus"
)

func MyFunc(ctx context.Context) (reterr error) {
    ctx = tracing.StartScope(ctx)
    defer func() {
        tracing.EndScope(ctx, reterr)
    }()

    logrus.WithContext(ctx).Info("This message also logged to trace")

    // ...

    return nil
}
```

#### Using `CallScope()`
```go
import (
    "context"

    "github.com/mailgun/holster/v4/tracing"
    "github.com/sirupsen/logrus"
)

func MyFunc(ctx context.Context) error {
    return tracing.CallScope(ctx, func(ctx context.Context) error {
        logrus.WithContext(ctx).Info("This message also logged to trace")

        // ...

        return nil
    })
}
```
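
The `Branch` variants follow the same pattern as above. A minimal sketch,
assuming `BranchScope()` shares the `StartScope()` signature as the table
above describes:

```go
import (
    "context"

    "github.com/mailgun/holster/v4/tracing"
)

func SharedHelper(ctx context.Context) (reterr error) {
    // Creates a child span only if ctx already carries a trace id;
    // otherwise no new trace is started.
    ctx = tracing.BranchScope(ctx)
    defer func() {
        tracing.EndScope(ctx, reterr)
    }()

    // ...

    return nil
}
```
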
#### Scope Log Level
A log level can be applied to individual spans using variants of
`CallScope()`/`StartScope()` to set debug, info, warn, or error levels:

```go
ctx2 := tracing.StartScopeDebug(ctx)
defer tracing.EndScope(ctx2, nil)
```

```go
err := tracing.CallScopeDebug(ctx, func(ctx context.Context) error {
    // ...

    return nil
})
```

#### Scope Log Level Filtering
Just like with common log frameworks, scope will filter out spans that are a
lower severity than the threshold provided using `WithLevel()`. See
[Log Level Filtering](#log-level-filtering) for an example.

## Instrumentation
### Logrus
Logrus is configured by `InitTracing()` to mirror log messages to the active
trace, if one exists.

For this to work, you must use the `WithContext()` method to propagate the
active trace stored in the context.

```go
logrus.WithContext(ctx).Info("This message also logged to trace")
```

If the log is error level or higher, the span is also marked as error and the
attributes `otel.status_code` and `otel.status_description` are set with the
error details.

### Other Instrumentation Options
See: [https://opentelemetry.io/registry/?language=go&component=instrumentation](https://opentelemetry.io/registry/?language=go&component=instrumentation)

#### gRPC Client
The client's trace ids are propagated to the server. A span will be created
for the client call and another for the server side.

```go
import (
    "google.golang.org/grpc"
    "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
)

conn, err := grpc.Dial(server,
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)
```

#### gRPC Server
```go
import (
    "google.golang.org/grpc"
    "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
)

grpcSrv := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)
```

### Config Options
Possible exporter config options, set via environment variables, when using
`tracing.InitTracing()`.

#### OTLP
* `OTEL_EXPORTER_OTLP_PROTOCOL`
   * May be one of: `grpc`, `http/protobuf`.
* `OTEL_EXPORTER_OTLP_ENDPOINT`
   * Set to a URL like `http://collector:<port>` or `https://collector:<port>`.
   * Port for the `grpc` protocol is 4317, for `http/protobuf` it is 4318.
   * If the protocol is `grpc`, URL scheme `http` indicates an insecure
     connection and `https` indicates a secure connection, even though the
     connection is over gRPC, not HTTP(S).
* `OTEL_EXPORTER_OTLP_CERTIFICATE`, `OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE`,
  `OTEL_EXPORTER_OTLP_CLIENT_KEY`
   * If the protocol is `grpc` or the endpoint uses HTTPS, set TLS certificate
     files.
* `OTEL_EXPORTER_OTLP_HEADERS`
   * Optional headers passed to the collector in the format:
     `key=value,key2=value2,...`.

See also the [OTLP configuration
reference](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md).

#### Jaeger
* `OTEL_EXPORTER_JAEGER_PROTOCOL`
* `OTEL_EXPORTER_JAEGER_ENDPOINT`
* `OTEL_EXPORTER_JAEGER_AGENT_HOST`
* `OTEL_EXPORTER_JAEGER_AGENT_PORT`

##### `OTEL_EXPORTER_JAEGER_PROTOCOL`
Possible values:
* `udp/thrift.compact` (default): Export traces via UDP datagrams. Best used when the Jaeger Agent is accessible via the loopback interface. May also provide `OTEL_EXPORTER_JAEGER_AGENT_HOST`/`OTEL_EXPORTER_JAEGER_AGENT_PORT`, which default to `localhost`/`6831`.
* `udp/thrift.binary`: Alternative protocol to the more commonly used `udp/thrift.compact`. May also provide `OTEL_EXPORTER_JAEGER_AGENT_HOST`/`OTEL_EXPORTER_JAEGER_AGENT_PORT`, which default to `localhost`/`6832`.
* `http/thrift.compact`: Export traces via HTTP. Best used when a Jaeger Agent cannot be deployed or is inaccessible via the loopback interface. This setting sends traces directly to Jaeger's collector port. May also provide `OTEL_EXPORTER_JAEGER_ENDPOINT`, which defaults to `http://localhost:14268/api/traces`.

#### Honeycomb
[Honeycomb](https://honeycomb.io) consumes OTLP traces and requires an API key header:
```
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=<api-key>
```

## Prometheus Metrics
Prometheus metric objects are defined in the array `tracing.Metrics`. Enable
them by registering these metrics with Prometheus.

| Metric | Description |
| ------ | ----------- |
| `holster_tracing_counter` | Count of traces generated by the holster `tracing` package. Label `error` contains `true` for traces in error status. |
| `holster_tracing_spans` | Count of trace spans generated by the holster `tracing` package. Label `error` contains `true` for spans in error status. |
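
A minimal registration sketch, assuming `tracing.Metrics` holds
`prometheus.Collector` values and the default registry is used (adapt to
however your application registers collectors):

```go
import (
    "github.com/prometheus/client_golang/prometheus"

    "github.com/mailgun/holster/v4/tracing"
)

// Register the holster tracing metrics with the default Prometheus registry.
for _, m := range tracing.Metrics {
    prometheus.MustRegister(m)
}
```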