trpc.group/trpc-go/trpc-go@v1.0.3/rpcz/README.md (about)

     1  English | [中文](README.zh_CN.md)
     2  
     3  # RPCZ
     4  
     5  RPCZ is a tool for monitoring RPC, logging various events that occur in a single rpc, such as serialization/deserialization, compression/decompression and execution of interceptors.
     6  It allows users to configure the events that need to be logged, and users can view the logged events through the admin tool, which can help them locate problems quickly and accurately.
     7  In addition, since RPCZ records the duration of various events in RPC and the size of packets sent and received, it can help users analyze timeout events and optimize the performance of the service.
     8  
     9  ## Explanation of terms
    10  
    11  ### Event
    12  
    13  Event (Event) [1, 2, 3] is used to describe that something (`Event.Name`) happened at a particular moment (`Event.Time`).
    14  
    15  ```go
    16  type Event struct {
    17      Name string
    18      Time time.Time
    19  }
    20  ```
    21  In a normal RPC call, a series of events will occur, for example, the Client side of the request is sent in chronological order, and generally the following series of events will occur.
    22  
    23  1. start running the pre-interceptor
    24  2. finish running the pre-interceptor
    25  3. start serialization
    26  4. end serialization
    27  5. start compression
    28  6. end compression
    29  7. start encoding protocol header fields
    30  8. end encoding protocol header fields
    31  9. start sending binaries to the network
    32  10. end sending binary to network
    33  11. start receiving binary files from the network
    34  12. ends receiving binary files from the network
    35  13. start decoding protocol header fields
    36  14. end decoding protocol header fields
    37  15. start decompression
    38  16. end decompression
    39  17. start deserialization
    40  18. end deserialization
    41  19. start running post-interceptor
    42  20. finish running the post-interceptor
    43  
    44  On the server side, where the request is processed, the following sequence of events typically occurs in chronological order.
    45  
    46  1. start decoding the protocol header fields
    47  2. finish decoding the protocol header fields
    48  3. start decompression
    49  4. end decompression
    50  5. start deserialization
    51  6. end deserialization
    52  7. start running the pre-interceptor
    53  8. finish running the pre-interceptor
    54  9. start running user-defined handler
    55  10. end running user-defined handler
    56  11. start running post-interceptor
    57  12. end running post-interceptor
    58  13. start serialization
    59  14. end serialization
    60  15. start compression
    61  16. end compression
    62  17. start encoding protocol header fields
    63  18. end decoding protocol header fields
    64  19. start sending binary files to the network
    65  20. end sending binary to network
    66  
    67  ### Span
    68  
    69  Span[4, 5] is used to describe a single operation for a certain time interval (with a start time and an end time), such as a client sending a remote request and a server processing the request or a function call.
    70  Depending on the size of the divided time interval, a large Span can contain multiple smaller Spans, just as multiple other functions may be called within a single function, creating a tree structured hierarchy.
    71  Thus, a Span may contain many sub-Spans in addition to the name, the internal identifier span-id [6], the start time, the end time, and the set of events (Events) that occurred during this time.
    72  
    73  There are two types of Span in rpcz:
    74  
    75  - client-Span: describes the actions of the client during the interval from the start of the request to the receipt of the reply (covering the series of events on the client side described in the previous section Event).
    76  - server-Span: describes the operation of the server from the time it starts receiving requests to the time it finishes sending replies (covers the series of events on the server side described in the previous section Event).
    77    When server-Span runs a user-defined processing function, it may create a client to call a downstream service, so server-Span will contain several sub-client-Span.
    78  
    79  ```
    80  server-Span
    81      client-Span-1
    82      client-Span-2
    83      ......
    84      client-Span-n
    85  ```
    86  
    87  Span is stored in context, rpcz will automatically call ContextWithSpan to store Span in context, you need to ensure that the Span in context will not be lost during the function call.
    88  
    89  ## Life cycle of Span
    90  
    91  When examining the lifecycle of Span objects, most of the operations on Span in rpcz need to consider concurrency safety.
    92  Pool and pre-allocated circular array are used to reduce the performance impact of memory allocation for Span.
    93  
    94  ### Span construction
    95  
    96  rpcz initializes a global GlobalRPCZ at startup, which is used to generate and store Span.
    97  There are only two possible locations where a Span can be constructed within the framework.
    98  The first location is when the handle function of the transport layer on the server side first starts processing incoming requests.
    99  The second location is when the Invoke function is called in the client-side stub code to start the rpc request.
   100  Although the two locations create different types of Span, the code logic is similar, both will call `rpcz.NewSpanContext`, which actually performs following three operations successively:
   101  
   102  1. Call the SpanFromContext function to get the span from the context.
   103  2. Call span.NewChild method to create a new child span.
   104  3. Call the ContextWithSpan function to set the newly created child span into context.
   105  
   106  ### Span passing in context
   107  
   108  The created span is stored in context until it is committed, and is passed along the link of the rpc call.
   109  Use `rpcz.AddEvent` to add a new event to the span in the current context on the call link.
   110  
   111  ### Span commits
   112  
   113  After the request is processed by the handle function at the transport layer on the server side, `ender.End()` is called to commit the Span to GlobalRPCZ.
   114  After that, the Span is still stored in the context, but semantically, the Span is not allowed to be manipulated again after the End function has been called, and its behavior is undefined.
   115  
   116  ### Accessing Span in admin
   117  
   118  The admin module calls `rpcz.Query` and `rpcz.BatchQuery` to read the Span from GlobalRPCZ.
   119  One thing to note is that the Span obtained by admin is a read-only Span (ReadOnlySpan), which is exported from a writable Span for the sake of concurrent access security.
   120  
   121  ### Delete redundant Span
   122  
   123  When too many Spans are stored in the hash table, it is necessary to remove the redundant Spans according to some elimination rules.
   124  The current implementation removes the oldest Span when the number of Spans in the GlobalRPCZ exceeds the maximum capacity limit.
   125  
   126  ## Origin of RPCZ name
   127  
   128  Regarding the origin of the name "RPCZ", the suffix -z has two general meanings in English [7]: it is used in nouns to change the singular to plural, e.g. Boy**z** are always trouble; and it is used in verbs to change the verb form He love**z** me.
   129  In summary, adding -z to a word has the same effect as adding -s.
   130  So "RPCZ" refers to various types of RPCs, and this does hold true from a distributed global call-link perspective, where there is a tree-like parent-child relationship of various RPC calls that combine to form the "RPCZ".
   131  
   132  The term "RPCZ" first came from Google's internal RPC framework Stubby, based on which Google implemented a similar function in the open source grpc channelz [8], which not only includes information about various channels, but also covers trace information.
   133  After that, Baidu's open source brpc implemented a non-distributed trace tool based on the distributed trace system Dapper paper [9] published by google, imitating channelz named brpc-rpcz [10].
   134  The next step is that users need a tool similar to brpc-rpcz for debugging and optimization in tRPC, so tRPC-Cpp first supports similar functionality, still keeping the name RPCZ.
   135  
   136  The last thing is to support similar functionality to "RPCZ" in tRPC-Go. During the implementation process, it was found that with the development of distributed tracing systems, open source systems of opentracing [11] and opentelemetry [12] emerged in the community.
   137  tRPC-Go-RPCZ partially borrows the go language implementation of opentelemetry-trace for span and event design, and can be considered as a trace system inside the tRPC-Go framework.
   138  Strictly speaking, tRPC-Go-RPCZ is non-distributed, because there is no communication between the different services at the protocol level.
   139  Now it seems that brpc, tRPC-Cpp and the tRPC-Go implementation of rpcz, named spanz, might be more in line with the original meaning of the suffix "-z".
   140  
   141  
   142  ## How to configure rpcz
   143  
   144  The configuration of rpcz includes basic configuration, advanced configuration and code configuration, see `config_test.go` for more configuration examples.
   145  
   146  ### Supporting for different tRPC-GO versions
   147  
   148  - v0.15.0: supporting tRPC streaming and tnet.
   149  - v0.14.0: supporting http protocol.
   150  - v0.11.0: supporting tRPC unary call.
   151  - Versions prior to v0.11.0: Not supported.
   152  
   153  ### Basic configuration
   154  
   155  Configure admin on the server side, and configure rpcz inside admin:
   156  
   157  ```yaml
   158  server:
   159    admin:
   160      ip: 127.0.0.1
   161      port: 9028
   162      rpcz:
   163        fraction: 1.0
   164        capacity: 10000
   165  ```
   166  
   167  - `fraction`: the sampling rate, the range is `[0.0, 1.0]`, the default value is 0.0 which means no sampling, you need to configure it manually.
   168  - `capacity`: the storage capacity of rpcz, the default value is 10000, which means the maximum number of spans can be stored.
   169  
   170  ### Advanced configuration
   171  
   172  Advanced configuration allows you to filter the span of interest. Before using advanced configuration, you need to understand the sampling mechanism of rpcz.
   173  
   174  #### Sampling mechanism
   175  
   176  rpcz uses the sampling mechanism to control performance overhead and filter spans that are not of interest to you.
   177  Sampling may occur at different stages of a Span's lifecycle, with the earliest sampling occurring before a Span is created and the latest sampling occurring before a Span is committed.
   178  
   179  ##### Sampling results table
   180  
   181  Only Spans that are sampled before both creation and commit will eventually be collected in GlobalRPCZ for you to query through the admin interface.
   182  
   183  | Sampled before Span creation? | Sampled before Span commit? | Will the Span eventually be collected? |
   184  |:------------------------------|:---------------------------:|:--------------------------------------:|
   185  | true                          |            true             |                  true                  | 
   186  | true                          |            false            |                 false                  | 
   187  | false                         |            true             |                 false                  | 
   188  | false                         |            false            |                 false                  | 
   189  
   190  ##### Sampling before Span creation
   191  
   192  Span is created only when it is sampled, otherwise it is not created, which avoids a series of subsequent operations on Span and thus reduces the performance overhead to a large extent.
   193  The sampling policy with fixed sampling rate [13] has only one configurable floating-point parameter `rpcz.fraction`, for example, `rpcz.fraction` is 0.0001, which means one request is sampled for every 10000 (1/0.0001) requests.
   194  When `rpcz.fraction` is less than 0, it is fetched up by 0. When `rpcz.fraction` is greater than 1, it is fetched down by 1.
   195  
   196  ##### Sampling before Span commit
   197  
   198  Spans that have been created will record all kinds of information in the rpc, but you may only care about spans that contain certain information, such as spans with rpc errors, spans that are highly time consuming, and spans that contain certain property information.
   199  In this case, it is necessary to sample only the Span that you needs before the Span is finally committed.
   200  rpcz provides a flexible external interface that allows you to set the `rpcz.record_when` field in the configuration file to customize the sampling logic before the service is started.
   201  "record_when" provides three common boolean operations: "AND", "OR", and "NOT",
   202  as well as seven basic operations that return boolean values: "__min_request_size", "__min_response_size", "__error_code", "__error_message", "__rpc_name", "__min_duration", and "__has_attribute". It should be noted that "record_when" itself is an "AND" operation.
   203  By combining these operations in any way, you can flexibly filter out the Spans of interest.
   204  
   205  ```yaml
   206  server:
   207    admin:
   208      rpcz:
   209        record_when:
   210          error_codes: [0,]      
   211          min_duration: 1000ms # ms or s
   212          sampling_fraction: 1 # [0.0, 1.0]
   213  ```
   214  
   215  - `error_codes`: Only sample spans containing any of these error codes, e.g. 0(RetOk), 21(RetServerTimeout).
   216  - `min_duration`: Only sample spans that last longer than `min_duration`, which can be used for time-consuming analysis.
   217  - `sampling_fraction`: The sampling rate, in the range of `[0, 1]`.
   218  
   219  #### Example of configuration
   220  
   221  ##### Submitting the span that contains error code 1 (RetServerDecodeFail), the error message contains the string "unknown", and the duration is greater than 1 second.
   222  
   223  ```yaml
   224  server:
   225    admin:
   226      ip: 127.0.0.1
   227      port: 9028
   228      rpcz:
   229        fraction: 1.0
   230        capacity: 10000
   231        record_when:
   232          - __error_code: 1       
   233          - __min_duration: 1000ms
   234          - __sampling_fraction: 1         
   235  ```
   236  
   237  
   238  Note: "record_when" itself is an "AND" node, and it can also be written in the following ways:
   239  
   240  style1:
   241  
   242  ```yaml
   243  server:
   244    admin:
   245      ip: 127.0.0.1
   246      port: 9028
   247      rpcz:
   248        fraction: 1.0
   249        capacity: 10000
   250        record_when:
   251          - AND:
   252              - __error_code: 1       
   253              - __min_duration: 1000ms
   254              - __sampling_fraction: 1         
   255  ```
   256  
   257  style2:
   258  
   259  ```yaml
   260  server:
   261    admin:
   262      ip: 127.0.0.1
   263      port: 9028
   264      rpcz:
   265        fraction: 1.0
   266        capacity: 10000
   267        record_when:
   268          - AND:
   269              - __error_code: 1
   270          - AND:
   271              - __min_duration: 1000ms
   272              - __sampling_fraction: 1         
   273  ```
   274  
   275  style3:
   276  
   277  ```yaml
   278  server:
   279    admin:
   280      ip: 127.0.0.1
   281      port: 9028
   282      rpcz:
   283        fraction: 1.0
   284        capacity: 10000
   285        record_when:
   286          - AND:
   287              - __error_code: 1
   288          - AND:
   289              - __min_duration: 1000ms
   290          - AND:
   291              - __sampling_fraction: 1         
   292  ```
   293  
   294  ##### Submitting the span that contains error code 1 (RetServerDecodeFail) or 21 (RetServerTimeout), or the duration is greater than 2 seconds with a probability of 1/2.
   295  
   296  ```yaml
   297  server:
   298    admin:
   299      ip: 127.0.0.1
   300      port: 9028
   301      rpcz:
   302        fraction: 1.0
   303        capacity: 10000
   304        record_when:
   305          - OR:
   306              - error_code: 1
   307              - error_code: 21
   308              - min_duration: 2s
   309          - __sampling_fraction: 0.5      
   310  ```
   311  
   312  ##### Submitting the span that has a duration greater than 10 seconds, contains the string "TDXA/Transfer" in the rpc name, and the error message does not contain the string "pseudo".
   313  
   314  ```yaml
   315  server:
   316    admin:
   317      ip: 127.0.0.1
   318      port: 9028
   319      rpcz:
   320        fraction: 1.0
   321        capacity: 10000
   322        record_when:
   323          - min_duration: 2s
   324          - __rpc_name: "TDXA/Transfer"
   325          - NOT:
   326              __error_message: "pseudo" 
   327          - __sampling_fraction: 1     
   328  ```
   329  
   330  ### Code configuration
   331  
   332  After reading the configuration file and before the service starts, rpcz can be configured with `rpcz.GlobalRPCZ`, where the commit sampling logic needs to implement the `ShouldRecord` function.
   333  
   334  ```go
   335  // ShouldRecord determines if the Span should be recorded.
   336  type ShouldRecord = func(Span) bool
   337  ```
   338  
   339  ##### commits only for Span containing the "SpecialAttribute" attribute
   340  
   341  ```go
   342  const attributeName = "SpecialAttribute"
   343  rpcz.GlobalRPCZ = rpcz.NewRPCZ(&rpcz.Config{
   344  Fraction: 1.0,
   345  Capacity: 1000,
   346  ShouldRecord: func(s rpcz.Span) bool {
   347  _, ok = s.Attribute(attributeName)
   348  return ok
   349  },
   350  })
   351  ```
   352  
   353  ### Query the summary information of the most recently submitted multiple span
   354  
   355  To query the summary information of the last num span, you can access the following url:
   356  
   357  ```html
   358  http://ip:port/cmds/rpcz/spans?num=xxx
   359  ```
   360  
   361  For example, executing `curl http://ip:port/cmds/rpcz/spans?num=2` will return the summary information for 2 spans as follows.
   362  
   363  ```html
   364  1:
   365  span: (client, 65744150616107367)
   366  time: (Dec 1 20:57:43.946627, Dec 1 20:57:43.946947)
   367  duration: (0, 319.792µs, 0)
   368  attributes: (RPCName, /trpc.testing.end2end.TestTRPC/EmptyCall), (Error, <nil>)
   369    2:
   370    span: (server, 1844470940819923952)
   371    time: (Dec 1 20:57:43.946677, Dec 1 20:57:43.946912)
   372    duration: (0, 235.5µs, 0)
   373    attributes: (RequestSize, 125),(ResponseSize, 18),(RPCName, /trpc.testing.end2end.TestTRPC/EmptyCall),(Error, success)
   374  ```
   375  
   376  The summary information for each span matches the following template.
   377  
   378  ```html
   379  span: (name, id)
   380  time: (startTime, endTime)
   381  duration: (preDur, middleDur, postDur)
   382  attributes: (name1, value1) (name2, value2)
   383  ```
   384  
   385  The meaning of each of these fields is explained as follows.
   386  
   387  - name: the name of the span
   388  - id: the unique identifier of the span, which can be used to query the details of a specific span
   389  - startTime: the creation time of the span
   390  - endTime: the commit time of the span, when the span is not successfully committed, the value of this field is "unknown"
   391  - duration: contains a time period to describe the duration of currentSpan and parentSpan
   392    - preDur: currentSpan.startTime - parentSpan.startTime
   393    - middleDur: currentSpan.endTime - currentSpan.startTime, when currentSpan.endTime is "unknown", the value of middleDur is also "unknown".
   394    - postDur: parentSpan.endTime - currentSpan.endTime, when parentSpan.endTime or currentSpan.endTime is "unknown", the value of postDur is also "unknown"
   395  - attributes: attributes of the span, each attribute consists of (attribute name, attribute value), usually the following three attributes are displayed
   396    - RequestSize: request packet size (byte)
   397    - ResponseSize: response packet size (byte)
   398    - RPCName: the service name of the counterpart + interface name (/trpc.app.server.service/method)
   399    - Error: error message, according to the framework return code to determine whether the request is successful, success or nil means success
   400  
   401  If you do not specify the number of queries, the following query will default to return a summary of the [^1] 10 most recently submitted successful spans.
   402  
   403  ```html
   404  http://ip:port/cmds/rpcz/spans
   405  ```
   406  
   407  [^1]: **The most recently committed span is not sorted strictly by time, there may be multiple goroutines submitting spans at the same time, and they are sorted by the most recently committed span.**
   408  
   409  ### Query the details of a span
   410  
   411  To query the details of a span containing an id, you can access the following url.
   412  
   413  ```html
   414  http://ip:port/cmds/rpcz/spans/{id}
   415  ```
   416  
   417  For example, execute `curl http://ip:port/cmds/rpcz/spans/6673650005084645130` to query the details of a span with the span id 6673650005084645130.
   418  
   419  ```
   420  span: (server, 6673650005084645130)
   421    time: (Dec  2 10:43:55.295935, Dec  2 10:43:55.399262)
   422    duration: (0, 103.326ms, 0)
   423    attributes: (RequestSize, 125),(ResponseSize, 18),(RPCName, /trpc.testing.end2end.TestTRPC/EmptyCall),(Error, success)
   424    span: (DecodeProtocolHead, 6673650005084645130)
   425      time: (Dec  2 10:43:55.295940, Dec  2 10:43:55.295952)
   426      duration: (4.375µs, 12.375µs, 103.30925ms)
   427    span: (Decompress, 6673650005084645130)
   428      time: (Dec  2 10:43:55.295981, Dec  2 10:43:55.295982)
   429      duration: (45.875µs, 791ns, 103.279334ms)
   430    span: (Unmarshal, 6673650005084645130)
   431      time: (Dec  2 10:43:55.295982, Dec  2 10:43:55.295983)
   432      duration: (47.041µs, 334ns, 103.278625ms)
   433    span: (filter1, 6673650005084645130)
   434      time: (Dec  2 10:43:55.296161, Dec  2 10:43:55.399249)
   435      duration: (225.708µs, 103.088ms, 12.292µs)
   436      event: (your annotation at pre-filter, Dec  2 10:43:55.296163)
   437      span: (filter2, 6673650005084645130)
   438        time: (Dec  2 10:43:55.296164, Dec  2 10:43:55.399249)
   439        duration: (2.75µs, 103.085ms, 250ns)
   440        event: (your annotation at pre-filter, Dec  2 10:43:55.296165)
   441        span: (server.WithFilter, 6673650005084645130)
   442          time: (Dec  2 10:43:55.296165, Dec  2 10:43:55.399249)
   443          duration: (1.208µs, 103.083625ms, 167ns)
   444          event: (your annotation at pre-filter, Dec  2 10:43:55.296165)
   445          span: (, 6673650005084645130)
   446            time: (Dec  2 10:43:55.296166, Dec  2 10:43:55.399249)
   447            duration: (792ns, 103.082583ms, 250ns)
   448            span: (HandleFunc, 6673650005084645130)
   449              time: (Dec  2 10:43:55.296177, Dec  2 10:43:55.399249)
   450              duration: (11.583µs, 103.070917ms, 83ns)
   451              event: (handling EmptyCallF, Dec  2 10:43:55.296179)
   452              span: (client, 6673650005084645130)
   453                time: (Dec  2 10:43:55.296187, Dec  2 10:43:55.297871)
   454                duration: (9.125µs, 1.684625ms, 101.377167ms)
   455                attributes: (RPCName, /trpc.testing.end2end.TestTRPC/UnaryCall),(Error, <nil>)
   456                span: (filter1, 6673650005084645130)
   457                  time: (Dec  2 10:43:55.296192, Dec  2 10:43:55.297870)
   458                  duration: (5.292µs, 1.678542ms, 791ns)
   459                  span: (client.WithFilter, 6673650005084645130)
   460                    time: (Dec  2 10:43:55.296192, Dec  2 10:43:55.297870)
   461                    duration: (542ns, 1.677875ms, 125ns)
   462                    span: (selector, 6673650005084645130)
   463                      time: (Dec  2 10:43:55.296193, Dec  2 10:43:55.297870)
   464                      duration: (541ns, 1.677209ms, 125ns)
   465                      span: (CallFunc, 6673650005084645130)
   466                        time: (Dec  2 10:43:55.296200, Dec  2 10:43:55.297869)
   467                        duration: (7.459µs, 1.668541ms, 1.209µs)
   468                        attributes: (RequestSize, 405),(ResponseSize, 338)
   469                        span: (Marshal, 6673650005084645130)
   470                          time: (Dec  2 10:43:55.296202, Dec  2 10:43:55.296341)
   471                          duration: (1.375µs, 138.875µs, 1.528291ms)
   472                        span: (Compress, 6673650005084645130)
   473                          time: (Dec  2 10:43:55.296341, Dec  2 10:43:55.296341)
   474                          duration: (140.708µs, 333ns, 1.5275ms)
   475                        span: (EncodeProtocolHead, 6673650005084645130)
   476                          time: (Dec  2 10:43:55.296342, Dec  2 10:43:55.296345)
   477                          duration: (141.458µs, 3.333µs, 1.52375ms)
   478                        span: (SendMessage, 6673650005084645130)
   479                          time: (Dec  2 10:43:55.297540, Dec  2 10:43:55.297555)
   480                          duration: (1.339375ms, 15.708µs, 313.458µs)
   481                        span: (ReceiveMessage, 6673650005084645130)
   482                          time: (Dec  2 10:43:55.297556, Dec  2 10:43:55.297860)
   483                          duration: (1.355666ms, 303.75µs, 9.125µs)
   484                        span: (DecodeProtocolHead, 6673650005084645130)
   485                          time: (Dec  2 10:43:55.297862, Dec  2 10:43:55.297865)
   486                          duration: (1.661916ms, 2.5µs, 4.125µs)
   487                        span: (Decompress, 6673650005084645130)
   488                          time: (Dec  2 10:43:55.297866, Dec  2 10:43:55.297866)
   489                          duration: (1.665583ms, 167ns, 2.791µs)
   490                        span: (Unmarshal, 6673650005084645130)
   491                          time: (Dec  2 10:43:55.297866, Dec  2 10:43:55.297868)
   492                          duration: (1.666041ms, 1.709µs, 791ns)
   493              span: (sleep, 6673650005084645130)
   494                time: (Dec  2 10:43:55.297876, unknown)
   495                duration: (1.698709ms, unknown, unknown)
   496                event: (awake, Dec  2 10:43:55.398954)
   497              span: (client, 6673650005084645130)
   498                time: (Dec  2 10:43:55.398979, Dec  2 10:43:55.399244)
   499                duration: (102.80125ms, 265.417µs, 4.25µs)
   500                attributes: (RPCName, /trpc.testing.end2end.TestTRPC/UnaryCall),(Error, <nil>)
   501                span: (filter2, 6673650005084645130)
   502                  time: (Dec  2 10:43:55.398986, Dec  2 10:43:55.399244)
   503                  duration: (6.834µs, 258.25µs, 333ns)
   504                  span: (client.WithFilter, 6673650005084645130)
   505                    time: (Dec  2 10:43:55.398987, Dec  2 10:43:55.399244)
   506                    duration: (1.708µs, 256.458µs, 84ns)
   507                    span: (selector, 6673650005084645130)
   508                      time: (Dec  2 10:43:55.398988, Dec  2 10:43:55.399244)
   509                      duration: (417ns, 255.916µs, 125ns)
   510                      span: (CallFunc, 6673650005084645130)
   511                        time: (Dec  2 10:43:55.399005, Dec  2 10:43:55.399243)
   512                        duration: (16.833µs, 238.375µs, 708ns)
   513                        attributes: (RequestSize, 405),(ResponseSize, 338)
   514                        span: (Marshal, 6673650005084645130)
   515                          time: (Dec  2 10:43:55.399006, Dec  2 10:43:55.399017)
   516                          duration: (1.792µs, 10.458µs, 226.125µs)
   517                        span: (Compress, 6673650005084645130)
   518                          time: (Dec  2 10:43:55.399017, Dec  2 10:43:55.399017)
   519                          duration: (12.583µs, 167ns, 225.625µs)
   520                        span: (EncodeProtocolHead, 6673650005084645130)
   521                          time: (Dec  2 10:43:55.399018, Dec  2 10:43:55.399023)
   522                          duration: (12.958µs, 4.917µs, 220.5µs)
   523                        span: (SendMessage, 6673650005084645130)
   524                          time: (Dec  2 10:43:55.399041, Dec  2 10:43:55.399070)
   525                          duration: (36.375µs, 29.083µs, 172.917µs)
   526                        span: (ReceiveMessage, 6673650005084645130)
   527                          time: (Dec  2 10:43:55.399070, Dec  2 10:43:55.399239)
   528                          duration: (65.75µs, 168.25µs, 4.375µs)
   529                        span: (DecodeProtocolHead, 6673650005084645130)
   530                          time: (Dec  2 10:43:55.399240, Dec  2 10:43:55.399241)
   531                          duration: (235.417µs, 1.375µs, 1.583µs)
   532                        span: (Decompress, 6673650005084645130)
   533                          time: (Dec  2 10:43:55.399242, Dec  2 10:43:55.399242)
   534                          duration: (237µs, 125ns, 1.25µs)
   535                        span: (Unmarshal, 6673650005084645130)
   536                          time: (Dec  2 10:43:55.399242, Dec  2 10:43:55.399243)
   537                          duration: (237.292µs, 750ns, 333ns)
   538          event: (your annotation at post-filter, Dec  2 10:43:55.399249)
   539        event: (your annotation at post-filter, Dec  2 10:43:55.399249)
   540      event: (your annotation at post-filter, Dec  2 10:43:55.399249)
   541    span: (Marshal, 6673650005084645130)
   542      time: (Dec  2 10:43:55.399250, Dec  2 10:43:55.399251)
   543      duration: (103.314625ms, 1.208µs, 10.167µs)
   544    span: (Compress, 6673650005084645130)
   545      time: (Dec  2 10:43:55.399252, Dec  2 10:43:55.399252)
   546      duration: (103.315958ms, 125ns, 9.917µs)
   547    span: (EncodeProtocolHead, 6673650005084645130)
   548      time: (Dec  2 10:43:55.399252, Dec  2 10:43:55.399253)
   549      duration: (103.316208ms, 750ns, 9.042µs)
   550    span: (SendMessage, 6673650005084645130)
   551      time: (Dec  2 10:43:55.399253, Dec  2 10:43:55.399261)
   552      duration: (103.317333ms, 8.333µs, 334ns)
   553  ```
   554  
   555  A new `event` field has been added to the span details, along with an embedded subspan.
   556  
   557  - event: describes what happened at a given moment, similar to a log.
   558    Events that can be inserted by you, such as `Nov 4 14:39:23.594147: your annotation at pre-filter` in the example above.
   559  - span: While the server is processing your custom function, a new client may be created to call the downstream service, and a sub-span will be created
   560    As you can see, all subspans occur within `HandleFunc`.
   561  
   562  Note that the values of middleDur and postDur in endTime, duration may be ``unknown'', for example, the above span contains the following subspan.
   563  
   564  ```
   565  span: (sleep, 6673650005084645130)
   566  time: (Dec 2 10:43:55.297876, unknown)
   567  duration: (1.698709ms, unknown, unknown)
   568  event: (awake, Dec 2 10:43:55.398954)
   569  ```
   570  
   571  ## Span Interface
   572  
   573  You can call `rpcz.SpanFromContext`[^2] to get the current `Span` in the `context` and then use the following interface to manipulate Span.
   574  
   575  ```go
   576  type Span interface {
   577  // AddEvent adds an event.
   578  AddEvent(name string)
   579  
   580  // SetAttribute sets Attribute with (name, value).
   581  SetAttribute(name string, value interface{})
   582  
   583  // ID returns SpanID.
   584  ID() SpanID
   585  
   586  // NewChild creates a child span from current span.
   587  // Ender ends this span if related operation is completed. 
   588  NewChild(name string) (Span, Ender)
   589  }
   590  ```
   591  
   592  [^2]: Return a `noopSpan` when the `context` does not contain any `span`, any operation on the `noopSpan` is null and will not take effect.
   593  
   594  ### Using AddEvent to add events
   595  
   596  ```go
   597  // If no Span is currently set in ctx an implementation of a Span that performs no operations is returned.
   598  span := SpanFromContext(ctx context.Context)
   599  
   600  span.AddEvent("Acquiring lock")
   601  mutex.Lock()
   602  
   603  span.AddEvent("Got lock, doing work...")
   604  // do some stuff ...
   605  
   606  span.AddEvent("Unlocking")
   607  mutex.Unlock()
   608  ```
   609  
   610  ### Use SetAttribute to set attributes
   611  
   612  ```go
   613  ctx, msg := codec.EnsureMessage(ctx)
   614  span := SpanFromContext(ctx context.Context)
   615  span.SetAttribute("RPCName", msg.ClientRPCName())
   616  span.SetAttribute("Error", msg.ClientRspErr())
   617  ```
   618  
   619  ### Create a new child span
   620  
   621  **End() function should be called only once by the caller to end the life cycle of the child span; uncalled End and multiple calls to End are undefined**
   622  
   623  ```go
   624  span := SpanFromContext(ctx context.Context)
   625  cs, end := span.NewChild("Decompress")
   626  reqBodyBuf, err := codec.Decompress(compressType, reqBodyBuf)
   627  end.End()
   628  ```
   629  
   630  ## Reference
   631  
   632  - [1] https://en.wikipedia.org/wiki/Event_(UML)
   633  - [2] https://en.wikipedia.org/wiki/Event_(computing)
   634  - [3] https://opentelemetry.io/docs/instrumentation/go/manual/#events
   635  - [4] https://opentelemetry.io/docs/instrumentation/go/api/tracing/#starting-and-ending-a-span
   636  - [5] https://opentelemetry.io/docs/concepts/observability-primer/#spans
   637  - [6] span-id represented as an 8-byte array, satisfying the w3c trace-context specification. https://www.w3.org/TR/trace-context/#parent-id
   638  - [7] https://en.wiktionary.org/wiki/-z#English
   639  - [8] https://github.com/grpc/proposal/blob/master/A14-channelz.md
   640  - [9] Dapper, a Large-Scale Distributed Systems Tracing Infrastructure: http://static.googleusercontent.com/media/research.google.com/en// pubs/archive/36356.pdf
   641  - [10] brpc-rpcz: https://github.com/apache/incubator-brpc/blob/master/docs/cn/rpcz.md
   642  - [11] opentracing: https://opentracing.io/
   643  - [12] opentelemetry: https://opentelemetry.io/
   644  - [13] open-telemetry-sdk-go-traceIDRatioSampler: https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/sampling.go