
     1  # Distributed Tracing Using OpenTelemetry
     2  ## What is OpenTelemetry?
     3  From [opentelemetry.io](https://opentelemetry.io):
     4  
     5  > OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument,
     6  > generate, collect, and export telemetry data (metrics, logs, and traces) to
     7  > help you analyze your software’s performance and behavior.
     8  
     9  Use OpenTelemetry to generate traces that visualize behavior in your application.
    10  A trace consists of nested spans that are rendered as a waterfall graph.  Each
    11  span records start/end timings and, optionally, other developer-specified
    12  metadata and logging output.
    13  
    14  Jaeger Tracing is a common tool used to receive OpenTelemetry trace data.  Use
    15  its web UI to query for traces and view the waterfall graph.
    16  
    17  OpenTelemetry is distributed, which allows services to pass the trace ids to
    18  disparate remote services.  The remote service may generate child spans that
    19  will be visible on the same waterfall graph.  This requires that all services
    20  send traces to the same Jaeger Server.
    21  
    22  ## Why OpenTelemetry?
    23  It is the latest standard for distributed tracing clients.
    24  
    25  OpenTelemetry supersedes its now deprecated predecessor,
    26  [OpenTracing](https://opentracing.io).
    27  
    28  It no longer requires implementation-specific client modules, such as the
    29  Jaeger client.  The provided OpenTelemetry SDK includes a client for Jaeger.
    30  
    31  ## Why Jaeger Tracing Server?
    32  Easy to set up.  Powerful and easy-to-use web UI.  Open source.  Scalable using
    33  Elasticsearch.
    34  
    35  ## Getting Started
    36  [opentelemetry.io](https://opentelemetry.io)
    37  
    38  OpenTelemetry dev reference:
    39  [https://pkg.go.dev/go.opentelemetry.io/otel](https://pkg.go.dev/go.opentelemetry.io/otel)
    40  
    41  See unit tests for usage examples.
    42  
    43  ### Configuration
    44  In ideal conditions where you wish to send traces to localhost on 6831/udp, no
    45  configuration is necessary.
    46  
    47  Configuration reference via environment variables:
    48  [https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md).
    49  
    50  #### Exporter
    51  Traces export to Jaeger by default.  Other exporters are available by setting
    52  the environment variable `OTEL_TRACES_EXPORTER` to one or more of:
    53  
    54  * `otlp`: [OTLP: OpenTelemetry Protocol](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md)
    55  * `jaeger`: [Jaeger Tracing](https://jaegertracing.io)
    56  * `none`: Disable export.
    57  
    58  Usually, only one exporter is needed.  If more than one is required, select
    59  them by separating the values with commas, as shown below.
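
For example, to export to both Jaeger and an OTLP collector:

```
OTEL_TRACES_EXPORTER=jaeger,otlp
```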
    60  
    61  #### OTLP Exporter
    62  By default, the OTLP exporter exports to an [OpenTelemetry
    63  Collector](https://opentelemetry.io/docs/collector/) on localhost on gRPC port
    64  4317.  The host and port can be changed by setting environment variable
    65  `OTEL_EXPORTER_OTLP_ENDPOINT` like `https://collector:4317`.
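
For example, to point the OTLP exporter at a remote collector (the hostname here
is illustrative):

```
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector:4317
```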
    66  
    67  See more: [OTLP configuration](#otlp)
    68  
    69  #### Jaeger Exporter via UDP
    70  By default, the Jaeger exporter sends traces to a Jaeger Agent on localhost port
    71  6831/udp.  The host and port can be changed by setting environment variables
    72  `OTEL_EXPORTER_JAEGER_AGENT_HOST` and `OTEL_EXPORTER_JAEGER_AGENT_PORT`.
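
For example, the default agent address made explicit (see the note below before
pointing this at a non-loopback host):

```
OTEL_EXPORTER_JAEGER_AGENT_HOST=localhost
OTEL_EXPORTER_JAEGER_AGENT_PORT=6831
```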
    73  
    74  It's important to ensure UDP traces are sent on the loopback interface (aka
    75  localhost).  UDP datagrams are limited in size to the MTU of the interface and
    76  the payload cannot be split into multiple datagrams.  The loopback interface
    77  MTU is large, typically 65000 or higher.  Network interface MTU is typically
    78  much lower at 1500.  OpenTelemetry's Jaeger client is sometimes unable to limit
    79  its payload to fit in a 1500 byte datagram and will drop those packets.  This
    80  causes traces that are mangled or missing detail.
    81  
    82  #### Jaeger Exporter via HTTP
    83  If it's not possible to install a Jaeger Agent on localhost, the client can
    84  instead export directly to the Jaeger Collector of the Jaeger Server on HTTP
    85  port 14268.
    86  
    87  Enable HTTP exporter with configuration:
    88  ```
    89  OTEL_EXPORTER_JAEGER_PROTOCOL=http/thrift.binary
    90  OTEL_EXPORTER_JAEGER_ENDPOINT=http://<jaeger-server>:14268/api/traces
    91  ```
    92  
    93  #### Probabilistic Sampling
    94  By default, all traces are sampled.  If the tracing volume is burdening the
    95  application, network, or Jaeger Server, then sampling can be used to
    96  selectively drop some of the traces.
    97  
    98  In production, it may be ideal to set sampling based on a percentage
    99  probability.  The probability can be set in Jaeger Server configuration or
   100  locally.
   101  
   102  To enable locally, set environment variables:
   103  
   104  ```
   105  OTEL_TRACES_SAMPLER=traceidratio
   106  OTEL_TRACES_SAMPLER_ARG=<value-between-0-and-1>
   107  ```
   108  
    109  A value of 1 samples every trace; 0 samples nothing.
   110  
   111  ### Initialization
   112  The OpenTelemetry client must be initialized to read configuration and prepare
    113  a `Tracer` object.  When the application is exiting, call `CloseTracing()`.
   114  
   115  The library name passed in the second argument appears in spans as metadata
   116  `otel.library.name`.  This is used to identify the library or module that
    117  generated that span.  This is usually the fully qualified module name of your
    118  repo.  Pass an empty string to autodetect the module name of the executable.
   119  
   120  ```go
   121  import "github.com/mailgun/holster/v4/tracing"
   122  
   123  err := tracing.InitTracing(ctx, "github.com/myrepo/myservice")
   124  // or...
   125  err := tracing.InitTracing(ctx, "")
   126  
   127  // ...
   128  
   129  err = tracing.CloseTracing(ctx)
   130  ```
   131  
   132  ### Log Level
    133  A log level may be applied to traces to filter spans by a minimum log
    134  severity.  Spans that do not meet the minimum severity are simply dropped and
    135  not exported.
   136  
   137  Log level is passed with option `tracing.WithLevel()` as a numeric
   138  log level (0-6): Panic, Fatal, Error, Warning, Info, Debug, Trace.
   139  
   140  As a convenience, use constants, such as `tracing.DebugLevel`:
   141  
   142  ```go
   143  import "github.com/mailgun/holster/v4/tracing"
   144  
   145  level := tracing.DebugLevel
   146  err := tracing.InitTracing(ctx, "my library name", tracing.WithLevel(level))
   147  ```
   148  
   149  If `WithLevel()` is omitted, the level will be the global level set in Logrus.
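
For example, a minimal sketch that relies on the global Logrus level (following
the call style of the example above):

```go
import (
	"github.com/mailgun/holster/v4/tracing"
	"github.com/sirupsen/logrus"
)

// No WithLevel() option is passed, so spans are filtered using the global
// Logrus level set here.
logrus.SetLevel(logrus.DebugLevel)
err := tracing.InitTracing(ctx, "my library name")
```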
   150  
   151  See [Scope Log Level](#scope-log-level) for details on creating spans
   152  with an assigned log level.
   153  
   154  #### Log Level Filtering
    155  Just like with common log frameworks, scope functions will filter out spans that
    156  are of lower severity than the threshold provided to `WithLevel()`.
   157  
   158  If scopes are nested and one in the middle is dropped, the hierarchy will be
   159  preserved.
   160  
   161  e.g. If `WithLevel()` is passed a log level of "Info", we expect
   162  "Debug" scopes to be dropped:
   163  ```
   164  # Input:
   165  Info Level 1 -> Debug Level 2 -> Info Level 3
   166  
   167  # Exports spans in form:
   168  Info Level 1 -> Info Level 3
   169  ```
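
The same scenario as a Go sketch, assuming the level-specific variants described
under [Scope Log Level](#scope-log-level) (the `CallScopeInfo()` name is an
assumption that mirrors the documented `CallScopeDebug()`):

```go
// "Info Level 1": exported.
err := tracing.CallScopeInfo(ctx, func(ctx context.Context) error {
	// "Debug Level 2": dropped when the threshold is Info.
	return tracing.CallScopeDebug(ctx, func(ctx context.Context) error {
		// "Info Level 3": still exported, re-parented under "Info Level 1".
		return tracing.CallScopeInfo(ctx, func(ctx context.Context) error {
			return nil
		})
	})
})
```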
   170  
    171  Log level filtering is critical for high-volume applications where debug
    172  tracing would generate significantly more data than is sustainable or
    173  helpful for normal operations.  Developers still have the option to
    174  selectively enable debug tracing for troubleshooting.
   175  
   176  ### Tracer Lifecycle
   177  The common use case is to call `InitTracing()` to build a single default tracer
    178  that the application uses throughout its lifetime, then call `CloseTracing()` on
   179  shutdown.
   180  
   181  The default tracer is stored globally in the tracer package for use by tracing
   182  functions.
   183  
   184  The tracer object identifies itself by a library name, which can be seen in Jaeger
   185  traces as attribute `otel.library.name`.  This value is typically the module
   186  name of the application.
   187  
    188  If it's necessary to create traces with a different library name, additional
    189  tracer objects may be created by `NewTracer()`, which returns a context with the
    190  tracer object embedded in it.  This context object must be passed to tracing
    191  functions to use that particular tracer, otherwise the default tracer will be
    192  selected.
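
A rough sketch (the exact `NewTracer()` argument list and return values are
assumptions here; check the package reference for the real signature):

```go
// Hypothetical usage: build a second tracer identified by a different library
// name, then pass the returned context to the tracing functions that should
// use it.
libCtx, err := tracing.NewTracer(ctx, "github.com/myrepo/mylibrary")
if err != nil {
	return err
}

libCtx = tracing.StartScope(libCtx) // uses the tracer embedded in libCtx
defer tracing.EndScope(libCtx, nil)
```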
   193  
   194  ### Setting Resources
   195  OpenTelemetry is configured by environment variables and supplemental resource
   196  settings.  Some of these resources also map to environment variables.
   197  
   198  #### Service Name
    199  The service name appears in the Jaeger "Service" dropdown.  If unset, the default
   200  is `unknown_service:<executable-filename>`.
   201  
   202  Service name may be set in configuration by environment variable
   203  `OTEL_SERVICE_NAME`.
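
For example (the service name is illustrative):

```
OTEL_SERVICE_NAME=myservice
```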
   204  
    205  As an alternative to the environment variable, it may be provided as a resource.
    206  The resource setting takes precedence over the environment variable.
   207  
   208  ```go
   209  import "github.com/mailgun/holster/v4/tracing"
   210  
   211  res, err := tracing.NewResource("My service", "v1.0.0")
   212  ctx, tracer, err := tracing.InitTracing(ctx, "github.com/myrepo/myservice", tracing.WithResource(res))
   213  ```
   214  
   215  ### Manual Tracing
   216  Basic instrumentation.  Traces function duration as a span and captures logrus logs.
   217  
   218  ```go
   219  import (
   220  	"context"
   221  
   222  	"github.com/mailgun/holster/v4/tracing"
   223  )
   224  
   225  func MyFunc(ctx context.Context) error {
   226  	tracer := tracing.Tracer()
   227  	ctx, span := tracer.Start(ctx, "Span name")
   228  	defer span.End()
   229  
   230  	// ...
   231  
   232  	return nil
   233  }
   234  ```
   235  
   236  ### Common OpenTelemetry Tasks
   237  #### Span Attributes
   238  The active `Span` object is embedded in the `Context` object.  This can be
   239  extracted to do things like add attribute metadata to the span:
   240  
   241  ```go
   242  import (
   243  	"context"
   244  
   245  	"go.opentelemetry.io/otel/attribute"
   246  	"go.opentelemetry.io/otel/trace"
   247  )
   248  
    249  func MyFunc(ctx context.Context) {
   250  	span := trace.SpanFromContext(ctx)
   251  	span.SetAttributes(
   252  		attribute.String("foobar", "value"),
   253  		attribute.Int("x", 12345),
   254  	)
   255  }
   256  ```
   257  
   258  #### Add Span Event
   259  A span event is a log message added to the active span.  It can optionally
   260  include attribute metadata.
   261  
   262  ```go
   263  span.AddEvent("My message")
   264  span.AddEvent("My metadata", trace.WithAttributes(
   265  	attribute.String("foobar", "value"),
    266  	attribute.Int("x", 12345),
   267  ))
   268  ```
   269  
   270  #### Log an Error
   271  An `Error` object can be logged to the active span.  This appears as a log
   272  event on the span.
   273  
   274  ```go
   275  err := errors.New("My error message")
   276  span.RecordError(err)
   277  
   278  // Can also add attribute metadata.
   279  span.RecordError(err, trace.WithAttributes(
   280  	attribute.String("foobar", "value"),
   281  ))
   282  ```
   283  
   284  ### Scope Tracing
   285  The scope functions automate span start/end and error reporting to the active
   286  trace.
   287  
   288  | Function       | Description |
   289  | -------------- | ----------- |
   290  | `StartScope()`/`BranchScope()` | Start a scope by creating a span named after the fully qualified calling function. |
   291  | `StartNamedScope()`/`BranchNamedScope()` | Start a scope by creating a span with user-provided name. |
   292  | `EndScope()`   | End the scope, record returned error value. |
   293  | `CallScope()`/`CallScopeBranch()` | Call a code block as a scope using `StartScope()`/`EndScope()` functionality. |
   294  | `CallNamedScope()`/`CallNamedScopeBranch()` | Same as `CallScope()` with a user-provided span name. |
   295  
   296  The secondary `Branch` functions perform the same task as their counterparts,
    297  except that they will only "branch" from an existing trace.  If the context
    298  contains no trace id, no trace will be created.  The `Branch` functions are
    299  best used with lower-level or shared code where there is no value in starting a
    300  new trace at that point.
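
For example, a shared helper might branch rather than start a new trace (a
sketch; the helper name is illustrative and `BranchScope()` is assumed to mirror
the `StartScope()` call style shown below):

```go
func sharedHelper(ctx context.Context) (reterr error) {
	// Creates a child span only when ctx already carries a trace.
	ctx = tracing.BranchScope(ctx)
	defer func() {
		tracing.EndScope(ctx, reterr)
	}()

	// ...

	return nil
}
```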
   301  
   302  If the `CallScope()` action function returns an error, the error message is
    303  automatically logged to the trace and the trace is marked as an error.
   304  
   305  #### Using `StartScope()`/`EndScope()`
   306  ```go
   307  import (
   308  	"context"
   309  
    310  	"github.com/mailgun/holster/v4/tracing"
   311  	"github.com/sirupsen/logrus"
   312  )
   313  
   314  func MyFunc(ctx context.Context) (reterr error) {
   315  	ctx = tracing.StartScope(ctx)
   316  	defer func() {
   317  		tracing.EndScope(ctx, reterr)
   318  	}()
   319  
   320  	logrus.WithContext(ctx).Info("This message also logged to trace")
   321  
   322  	// ...
   323  
   324  	return nil
   325  }
   326  ```
   327  
   328  #### Using `CallScope()`
   329  ```go
   330  import (
   331  	"context"
   332  
   333  	"github.com/mailgun/holster/v4/tracing"
   334  	"github.com/sirupsen/logrus"
   335  )
   336  
   337  func MyFunc(ctx context.Context) error {
   338  	return tracing.CallScope(ctx, func(ctx context.Context) error {
   339  		logrus.WithContext(ctx).Info("This message also logged to trace")
   340  
   341  		// ...
   342  
   343  		return nil
   344  	})
   345  }
   346  ```
   347  
   348  #### Scope Log Level
   349  Log level can be applied to individual spans using variants of
   350  `CallScope()`/`StartScope()` to set debug, info, warn, or error levels:
   351  
   352  ```go
   353  ctx2 := tracing.StartScopeDebug(ctx)
   354  defer tracing.EndScope(ctx2, nil)
   355  ```
   356  
   357  ```go
   358  err := tracing.CallScopeDebug(ctx, func(ctx context.Context) error {
   359      // ...
   360  
   361      return nil
   362  })
   363  ```
   364  
   365  #### Scope Log Level Filtering
    366  Just like with common log frameworks, scope functions will filter out spans that
    367  are of lower severity than the threshold provided to `WithLevel()`.  See
    368  [Log Level Filtering](#log-level-filtering) above for how dropped scopes affect nesting.
   368  
   369  
   370  ## Instrumentation
   371  ### Logrus
    372  Logrus is configured by `InitTracing()` to mirror log messages to the active trace, if one exists.
   373  
   374  For this to work, you must use the `WithContext()` method to propagate the active
   375  trace stored in the context.
   376  
   377  ```go
   378  logrus.WithContext(ctx).Info("This message also logged to trace")
   379  ```
   380  
    381  If the log is error level or higher, the span is also marked as an error and the
    382  attributes `otel.status_code` and `otel.status_description` are set with the
    383  error details.
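
For example, this logs an error and marks the active span as an error:

```go
logrus.WithContext(ctx).WithError(err).Error("Operation failed")
```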
   384  
   385  ### Other Instrumentation Options
   386  See: [https://opentelemetry.io/registry/?language=go&component=instrumentation](https://opentelemetry.io/registry/?language=go&component=instrumentation)
   387  
   388  #### gRPC Client
    389  The client's trace ids are propagated to the server.  A span will be created for
   390  the client call and another one for the server side.
   391  
   392  ```go
   393  import (
   394  	"google.golang.org/grpc"
   395  	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
   396  )
   397  
   398  conn, err := grpc.Dial(server,
   399  	grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
   400  )
   401  ```
   402  
   403  #### gRPC Server
   404  ```go
   405  import (
   406  	"google.golang.org/grpc"
   407  	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
   408  )
   409  
   410  grpcSrv := grpc.NewServer(
   411  	grpc.StatsHandler(otelgrpc.NewServerHandler()),
   412  )
   413  ```
   414  
   415  ### Config Options
    416  Possible exporter configuration options, set via environment variables, when
    417  using `tracing.InitTracing()`.  A combined example follows each option list.
   418  
   419  #### OTLP
   420  * `OTEL_EXPORTER_OTLP_PROTOCOL`
   421     * May be one of: `grpc`, `http/protobuf`.
   422  * `OTEL_EXPORTER_OTLP_ENDPOINT`
   423     * Set to URL like `http://collector:<port>` or `https://collector:<port>`.
   424     * Port for `grpc` protocol is 4317, `http/protobuf` is 4318.
    425     * If protocol is `grpc`, URL scheme `http` indicates an insecure (non-TLS)
    426       connection and `https` indicates a secure TLS connection, even though the
    427       connection is over the gRPC protocol, not HTTP(S).
   428  * `OTEL_EXPORTER_OTLP_CERTIFICATE`, `OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE`,
   429    `OTEL_EXPORTER_OTLP_CLIENT_KEY`
   430     * If protocol is `grpc` or using HTTPS endpoint, set TLS certificate files.
   431  * `OTEL_EXPORTER_OTLP_HEADERS`
   432     * Optional headers passed to collector in format:
   433       `key=value,key2=value2,...`.
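
For example, a combined OTLP configuration over gRPC (hostname and header values
are illustrative):

```
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_ENDPOINT=https://collector:4317
OTEL_EXPORTER_OTLP_HEADERS=key=value,key2=value2
```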
   434  
   435  See also [OTLP configuration
   436  reference](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md).
   437  
   438  #### Jaeger
   439  * `OTEL_EXPORTER_JAEGER_PROTOCOL`
   440  * `OTEL_EXPORTER_JAEGER_ENDPOINT`
   441  * `OTEL_EXPORTER_JAEGER_AGENT_HOST`
   442  * `OTEL_EXPORTER_JAEGER_AGENT_PORT`
   443  
   444  ##### `OTEL_EXPORTER_JAEGER_PROTOCOL`
    445  Possible values (see the example after this list):
   446  * `udp/thrift.compact` (default): Export traces via UDP datagrams.  Best used when Jaeger Agent is accessible via loopback interface.  May also provide `OTEL_EXPORTER_JAEGER_AGENT_HOST`/`OTEL_EXPORTER_JAEGER_AGENT_PORT`, which default to `localhost`/`6831`.
   447  * `udp/thrift.binary`: Alternative protocol to the more commonly used `udp/thrift.compact`.  May also provide `OTEL_EXPORTER_JAEGER_AGENT_HOST`/`OTEL_EXPORTER_JAEGER_AGENT_PORT`, which default to `localhost`/`6832`.
   448  * `http/thrift.compact`: Export traces via HTTP packets.  Best used when Jaeger Agent cannot be deployed or is inaccessible via loopback interface.  This setting sends traces directly to Jaeger's collector port.  May also provide `OTEL_EXPORTER_JAEGER_ENDPOINT`, which defaults to `http://localhost:14268/api/traces`.
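
For example, the default UDP configuration made explicit:

```
OTEL_TRACES_EXPORTER=jaeger
OTEL_EXPORTER_JAEGER_PROTOCOL=udp/thrift.compact
OTEL_EXPORTER_JAEGER_AGENT_HOST=localhost
OTEL_EXPORTER_JAEGER_AGENT_PORT=6831
```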
   449  
   450  #### Honeycomb
   451  [Honeycomb](https://honeycomb.io) consumes OTLP traces and requires an API key header:
   452  ```
    453  OTEL_TRACES_EXPORTER=otlp
   454  OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
   455  OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=<api-key>
   456  ```
   457  
   458  
   459  ## Prometheus Metrics
    460  Prometheus metric objects are defined in the array `tracing.Metrics`.  Enable them by registering these metrics with Prometheus, as shown in the example after the table.
   461  
   462  | Metric | Description |
   463  | ------ | ----------- |
   464  | `holster_tracing_counter` | Count of traces generated by holster `tracing` package.  Label `error` contains `true` for traces in error status. |
   465  | `holster_tracing_spans`   | Count of trace spans generated by holster `tracing` package.  Label `error` contains `true` for spans in error status. |
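
A minimal registration sketch (assuming `tracing.Metrics` is a slice of
`prometheus.Collector`):

```go
import (
	"github.com/mailgun/holster/v4/tracing"
	"github.com/prometheus/client_golang/prometheus"
)

// Register the holster tracing metrics with the default Prometheus registry.
prometheus.MustRegister(tracing.Metrics...)
```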