trpc.group/trpc-go/trpc-go@v1.0.3/rpcz/README.md (about) 1 English | [中文](README.zh_CN.md) 2 3 # RPCZ 4 5 RPCZ is a tool for monitoring RPC, logging various events that occur in a single rpc, such as serialization/deserialization, compression/decompression and execution of interceptors. 6 It allows users to configure the events that need to be logged, and users can view the logged events through the admin tool, which can help them locate problems quickly and accurately. 7 In addition, since RPCZ records the duration of various events in RPC and the size of packets sent and received, it can help users analyze timeout events and optimize the performance of the service. 8 9 ## Explanation of terms 10 11 ### Event 12 13 Event (Event) [1, 2, 3] is used to describe that something (`Event.Name`) happened at a particular moment (`Event.Time`). 14 15 ```go 16 type Event struct { 17 Name string 18 Time time.Time 19 } 20 ``` 21 In a normal RPC call, a series of events will occur, for example, the Client side of the request is sent in chronological order, and generally the following series of events will occur. 22 23 1. start running the pre-interceptor 24 2. finish running the pre-interceptor 25 3. start serialization 26 4. end serialization 27 5. start compression 28 6. end compression 29 7. start encoding protocol header fields 30 8. end encoding protocol header fields 31 9. start sending binaries to the network 32 10. end sending binary to network 33 11. start receiving binary files from the network 34 12. ends receiving binary files from the network 35 13. start decoding protocol header fields 36 14. end decoding protocol header fields 37 15. start decompression 38 16. end decompression 39 17. start deserialization 40 18. end deserialization 41 19. start running post-interceptor 42 20. finish running the post-interceptor 43 44 On the server side, where the request is processed, the following sequence of events typically occurs in chronological order. 45 46 1. start decoding the protocol header fields 47 2. finish decoding the protocol header fields 48 3. start decompression 49 4. end decompression 50 5. start deserialization 51 6. end deserialization 52 7. start running the pre-interceptor 53 8. finish running the pre-interceptor 54 9. start running user-defined handler 55 10. end running user-defined handler 56 11. start running post-interceptor 57 12. end running post-interceptor 58 13. start serialization 59 14. end serialization 60 15. start compression 61 16. end compression 62 17. start encoding protocol header fields 63 18. end decoding protocol header fields 64 19. start sending binary files to the network 65 20. end sending binary to network 66 67 ### Span 68 69 Span[4, 5] is used to describe a single operation for a certain time interval (with a start time and an end time), such as a client sending a remote request and a server processing the request or a function call. 70 Depending on the size of the divided time interval, a large Span can contain multiple smaller Spans, just as multiple other functions may be called within a single function, creating a tree structured hierarchy. 71 Thus, a Span may contain many sub-Spans in addition to the name, the internal identifier span-id [6], the start time, the end time, and the set of events (Events) that occurred during this time. 72 73 There are two types of Span in rpcz: 74 75 - client-Span: describes the actions of the client during the interval from the start of the request to the receipt of the reply (covering the series of events on the client side described in the previous section Event). 76 - server-Span: describes the operation of the server from the time it starts receiving requests to the time it finishes sending replies (covers the series of events on the server side described in the previous section Event). 77 When server-Span runs a user-defined processing function, it may create a client to call a downstream service, so server-Span will contain several sub-client-Span. 78 79 ``` 80 server-Span 81 client-Span-1 82 client-Span-2 83 ...... 84 client-Span-n 85 ``` 86 87 Span is stored in context, rpcz will automatically call ContextWithSpan to store Span in context, you need to ensure that the Span in context will not be lost during the function call. 88 89 ## Life cycle of Span 90 91 When examining the lifecycle of Span objects, most of the operations on Span in rpcz need to consider concurrency safety. 92 Pool and pre-allocated circular array are used to reduce the performance impact of memory allocation for Span. 93 94 ### Span construction 95 96 rpcz initializes a global GlobalRPCZ at startup, which is used to generate and store Span. 97 There are only two possible locations where a Span can be constructed within the framework. 98 The first location is when the handle function of the transport layer on the server side first starts processing incoming requests. 99 The second location is when the Invoke function is called in the client-side stub code to start the rpc request. 100 Although the two locations create different types of Span, the code logic is similar, both will call `rpcz.NewSpanContext`, which actually performs following three operations successively: 101 102 1. Call the SpanFromContext function to get the span from the context. 103 2. Call span.NewChild method to create a new child span. 104 3. Call the ContextWithSpan function to set the newly created child span into context. 105 106 ### Span passing in context 107 108 The created span is stored in context until it is committed, and is passed along the link of the rpc call. 109 Use `rpcz.AddEvent` to add a new event to the span in the current context on the call link. 110 111 ### Span commits 112 113 After the request is processed by the handle function at the transport layer on the server side, `ender.End()` is called to commit the Span to GlobalRPCZ. 114 After that, the Span is still stored in the context, but semantically, the Span is not allowed to be manipulated again after the End function has been called, and its behavior is undefined. 115 116 ### Accessing Span in admin 117 118 The admin module calls `rpcz.Query` and `rpcz.BatchQuery` to read the Span from GlobalRPCZ. 119 One thing to note is that the Span obtained by admin is a read-only Span (ReadOnlySpan), which is exported from a writable Span for the sake of concurrent access security. 120 121 ### Delete redundant Span 122 123 When too many Spans are stored in the hash table, it is necessary to remove the redundant Spans according to some elimination rules. 124 The current implementation removes the oldest Span when the number of Spans in the GlobalRPCZ exceeds the maximum capacity limit. 125 126 ## Origin of RPCZ name 127 128 Regarding the origin of the name "RPCZ", the suffix -z has two general meanings in English [7]: it is used in nouns to change the singular to plural, e.g. Boy**z** are always trouble; and it is used in verbs to change the verb form He love**z** me. 129 In summary, adding -z to a word has the same effect as adding -s. 130 So "RPCZ" refers to various types of RPCs, and this does hold true from a distributed global call-link perspective, where there is a tree-like parent-child relationship of various RPC calls that combine to form the "RPCZ". 131 132 The term "RPCZ" first came from Google's internal RPC framework Stubby, based on which Google implemented a similar function in the open source grpc channelz [8], which not only includes information about various channels, but also covers trace information. 133 After that, Baidu's open source brpc implemented a non-distributed trace tool based on the distributed trace system Dapper paper [9] published by google, imitating channelz named brpc-rpcz [10]. 134 The next step is that users need a tool similar to brpc-rpcz for debugging and optimization in tRPC, so tRPC-Cpp first supports similar functionality, still keeping the name RPCZ. 135 136 The last thing is to support similar functionality to "RPCZ" in tRPC-Go. During the implementation process, it was found that with the development of distributed tracing systems, open source systems of opentracing [11] and opentelemetry [12] emerged in the community. 137 tRPC-Go-RPCZ partially borrows the go language implementation of opentelemetry-trace for span and event design, and can be considered as a trace system inside the tRPC-Go framework. 138 Strictly speaking, tRPC-Go-RPCZ is non-distributed, because there is no communication between the different services at the protocol level. 139 Now it seems that brpc, tRPC-Cpp and the tRPC-Go implementation of rpcz, named spanz, might be more in line with the original meaning of the suffix "-z". 140 141 142 ## How to configure rpcz 143 144 The configuration of rpcz includes basic configuration, advanced configuration and code configuration, see `config_test.go` for more configuration examples. 145 146 ### Supporting for different tRPC-GO versions 147 148 - v0.15.0: supporting tRPC streaming and tnet. 149 - v0.14.0: supporting http protocol. 150 - v0.11.0: supporting tRPC unary call. 151 - Versions prior to v0.11.0: Not supported. 152 153 ### Basic configuration 154 155 Configure admin on the server side, and configure rpcz inside admin: 156 157 ```yaml 158 server: 159 admin: 160 ip: 127.0.0.1 161 port: 9028 162 rpcz: 163 fraction: 1.0 164 capacity: 10000 165 ``` 166 167 - `fraction`: the sampling rate, the range is `[0.0, 1.0]`, the default value is 0.0 which means no sampling, you need to configure it manually. 168 - `capacity`: the storage capacity of rpcz, the default value is 10000, which means the maximum number of spans can be stored. 169 170 ### Advanced configuration 171 172 Advanced configuration allows you to filter the span of interest. Before using advanced configuration, you need to understand the sampling mechanism of rpcz. 173 174 #### Sampling mechanism 175 176 rpcz uses the sampling mechanism to control performance overhead and filter spans that are not of interest to you. 177 Sampling may occur at different stages of a Span's lifecycle, with the earliest sampling occurring before a Span is created and the latest sampling occurring before a Span is committed. 178 179 ##### Sampling results table 180 181 Only Spans that are sampled before both creation and commit will eventually be collected in GlobalRPCZ for you to query through the admin interface. 182 183 | Sampled before Span creation? | Sampled before Span commit? | Will the Span eventually be collected? | 184 |:------------------------------|:---------------------------:|:--------------------------------------:| 185 | true | true | true | 186 | true | false | false | 187 | false | true | false | 188 | false | false | false | 189 190 ##### Sampling before Span creation 191 192 Span is created only when it is sampled, otherwise it is not created, which avoids a series of subsequent operations on Span and thus reduces the performance overhead to a large extent. 193 The sampling policy with fixed sampling rate [13] has only one configurable floating-point parameter `rpcz.fraction`, for example, `rpcz.fraction` is 0.0001, which means one request is sampled for every 10000 (1/0.0001) requests. 194 When `rpcz.fraction` is less than 0, it is fetched up by 0. When `rpcz.fraction` is greater than 1, it is fetched down by 1. 195 196 ##### Sampling before Span commit 197 198 Spans that have been created will record all kinds of information in the rpc, but you may only care about spans that contain certain information, such as spans with rpc errors, spans that are highly time consuming, and spans that contain certain property information. 199 In this case, it is necessary to sample only the Span that you needs before the Span is finally committed. 200 rpcz provides a flexible external interface that allows you to set the `rpcz.record_when` field in the configuration file to customize the sampling logic before the service is started. 201 "record_when" provides three common boolean operations: "AND", "OR", and "NOT", 202 as well as seven basic operations that return boolean values: "__min_request_size", "__min_response_size", "__error_code", "__error_message", "__rpc_name", "__min_duration", and "__has_attribute". It should be noted that "record_when" itself is an "AND" operation. 203 By combining these operations in any way, you can flexibly filter out the Spans of interest. 204 205 ```yaml 206 server: 207 admin: 208 rpcz: 209 record_when: 210 error_codes: [0,] 211 min_duration: 1000ms # ms or s 212 sampling_fraction: 1 # [0.0, 1.0] 213 ``` 214 215 - `error_codes`: Only sample spans containing any of these error codes, e.g. 0(RetOk), 21(RetServerTimeout). 216 - `min_duration`: Only sample spans that last longer than `min_duration`, which can be used for time-consuming analysis. 217 - `sampling_fraction`: The sampling rate, in the range of `[0, 1]`. 218 219 #### Example of configuration 220 221 ##### Submitting the span that contains error code 1 (RetServerDecodeFail), the error message contains the string "unknown", and the duration is greater than 1 second. 222 223 ```yaml 224 server: 225 admin: 226 ip: 127.0.0.1 227 port: 9028 228 rpcz: 229 fraction: 1.0 230 capacity: 10000 231 record_when: 232 - __error_code: 1 233 - __min_duration: 1000ms 234 - __sampling_fraction: 1 235 ``` 236 237 238 Note: "record_when" itself is an "AND" node, and it can also be written in the following ways: 239 240 style1: 241 242 ```yaml 243 server: 244 admin: 245 ip: 127.0.0.1 246 port: 9028 247 rpcz: 248 fraction: 1.0 249 capacity: 10000 250 record_when: 251 - AND: 252 - __error_code: 1 253 - __min_duration: 1000ms 254 - __sampling_fraction: 1 255 ``` 256 257 style2: 258 259 ```yaml 260 server: 261 admin: 262 ip: 127.0.0.1 263 port: 9028 264 rpcz: 265 fraction: 1.0 266 capacity: 10000 267 record_when: 268 - AND: 269 - __error_code: 1 270 - AND: 271 - __min_duration: 1000ms 272 - __sampling_fraction: 1 273 ``` 274 275 style3: 276 277 ```yaml 278 server: 279 admin: 280 ip: 127.0.0.1 281 port: 9028 282 rpcz: 283 fraction: 1.0 284 capacity: 10000 285 record_when: 286 - AND: 287 - __error_code: 1 288 - AND: 289 - __min_duration: 1000ms 290 - AND: 291 - __sampling_fraction: 1 292 ``` 293 294 ##### Submitting the span that contains error code 1 (RetServerDecodeFail) or 21 (RetServerTimeout), or the duration is greater than 2 seconds with a probability of 1/2. 295 296 ```yaml 297 server: 298 admin: 299 ip: 127.0.0.1 300 port: 9028 301 rpcz: 302 fraction: 1.0 303 capacity: 10000 304 record_when: 305 - OR: 306 - error_code: 1 307 - error_code: 21 308 - min_duration: 2s 309 - __sampling_fraction: 0.5 310 ``` 311 312 ##### Submitting the span that has a duration greater than 10 seconds, contains the string "TDXA/Transfer" in the rpc name, and the error message does not contain the string "pseudo". 313 314 ```yaml 315 server: 316 admin: 317 ip: 127.0.0.1 318 port: 9028 319 rpcz: 320 fraction: 1.0 321 capacity: 10000 322 record_when: 323 - min_duration: 2s 324 - __rpc_name: "TDXA/Transfer" 325 - NOT: 326 __error_message: "pseudo" 327 - __sampling_fraction: 1 328 ``` 329 330 ### Code configuration 331 332 After reading the configuration file and before the service starts, rpcz can be configured with `rpcz.GlobalRPCZ`, where the commit sampling logic needs to implement the `ShouldRecord` function. 333 334 ```go 335 // ShouldRecord determines if the Span should be recorded. 336 type ShouldRecord = func(Span) bool 337 ``` 338 339 ##### commits only for Span containing the "SpecialAttribute" attribute 340 341 ```go 342 const attributeName = "SpecialAttribute" 343 rpcz.GlobalRPCZ = rpcz.NewRPCZ(&rpcz.Config{ 344 Fraction: 1.0, 345 Capacity: 1000, 346 ShouldRecord: func(s rpcz.Span) bool { 347 _, ok = s.Attribute(attributeName) 348 return ok 349 }, 350 }) 351 ``` 352 353 ### Query the summary information of the most recently submitted multiple span 354 355 To query the summary information of the last num span, you can access the following url: 356 357 ```html 358 http://ip:port/cmds/rpcz/spans?num=xxx 359 ``` 360 361 For example, executing `curl http://ip:port/cmds/rpcz/spans?num=2` will return the summary information for 2 spans as follows. 362 363 ```html 364 1: 365 span: (client, 65744150616107367) 366 time: (Dec 1 20:57:43.946627, Dec 1 20:57:43.946947) 367 duration: (0, 319.792µs, 0) 368 attributes: (RPCName, /trpc.testing.end2end.TestTRPC/EmptyCall), (Error, <nil>) 369 2: 370 span: (server, 1844470940819923952) 371 time: (Dec 1 20:57:43.946677, Dec 1 20:57:43.946912) 372 duration: (0, 235.5µs, 0) 373 attributes: (RequestSize, 125),(ResponseSize, 18),(RPCName, /trpc.testing.end2end.TestTRPC/EmptyCall),(Error, success) 374 ``` 375 376 The summary information for each span matches the following template. 377 378 ```html 379 span: (name, id) 380 time: (startTime, endTime) 381 duration: (preDur, middleDur, postDur) 382 attributes: (name1, value1) (name2, value2) 383 ``` 384 385 The meaning of each of these fields is explained as follows. 386 387 - name: the name of the span 388 - id: the unique identifier of the span, which can be used to query the details of a specific span 389 - startTime: the creation time of the span 390 - endTime: the commit time of the span, when the span is not successfully committed, the value of this field is "unknown" 391 - duration: contains a time period to describe the duration of currentSpan and parentSpan 392 - preDur: currentSpan.startTime - parentSpan.startTime 393 - middleDur: currentSpan.endTime - currentSpan.startTime, when currentSpan.endTime is "unknown", the value of middleDur is also "unknown". 394 - postDur: parentSpan.endTime - currentSpan.endTime, when parentSpan.endTime or currentSpan.endTime is "unknown", the value of postDur is also "unknown" 395 - attributes: attributes of the span, each attribute consists of (attribute name, attribute value), usually the following three attributes are displayed 396 - RequestSize: request packet size (byte) 397 - ResponseSize: response packet size (byte) 398 - RPCName: the service name of the counterpart + interface name (/trpc.app.server.service/method) 399 - Error: error message, according to the framework return code to determine whether the request is successful, success or nil means success 400 401 If you do not specify the number of queries, the following query will default to return a summary of the [^1] 10 most recently submitted successful spans. 402 403 ```html 404 http://ip:port/cmds/rpcz/spans 405 ``` 406 407 [^1]: **The most recently committed span is not sorted strictly by time, there may be multiple goroutines submitting spans at the same time, and they are sorted by the most recently committed span.** 408 409 ### Query the details of a span 410 411 To query the details of a span containing an id, you can access the following url. 412 413 ```html 414 http://ip:port/cmds/rpcz/spans/{id} 415 ``` 416 417 For example, execute `curl http://ip:port/cmds/rpcz/spans/6673650005084645130` to query the details of a span with the span id 6673650005084645130. 418 419 ``` 420 span: (server, 6673650005084645130) 421 time: (Dec 2 10:43:55.295935, Dec 2 10:43:55.399262) 422 duration: (0, 103.326ms, 0) 423 attributes: (RequestSize, 125),(ResponseSize, 18),(RPCName, /trpc.testing.end2end.TestTRPC/EmptyCall),(Error, success) 424 span: (DecodeProtocolHead, 6673650005084645130) 425 time: (Dec 2 10:43:55.295940, Dec 2 10:43:55.295952) 426 duration: (4.375µs, 12.375µs, 103.30925ms) 427 span: (Decompress, 6673650005084645130) 428 time: (Dec 2 10:43:55.295981, Dec 2 10:43:55.295982) 429 duration: (45.875µs, 791ns, 103.279334ms) 430 span: (Unmarshal, 6673650005084645130) 431 time: (Dec 2 10:43:55.295982, Dec 2 10:43:55.295983) 432 duration: (47.041µs, 334ns, 103.278625ms) 433 span: (filter1, 6673650005084645130) 434 time: (Dec 2 10:43:55.296161, Dec 2 10:43:55.399249) 435 duration: (225.708µs, 103.088ms, 12.292µs) 436 event: (your annotation at pre-filter, Dec 2 10:43:55.296163) 437 span: (filter2, 6673650005084645130) 438 time: (Dec 2 10:43:55.296164, Dec 2 10:43:55.399249) 439 duration: (2.75µs, 103.085ms, 250ns) 440 event: (your annotation at pre-filter, Dec 2 10:43:55.296165) 441 span: (server.WithFilter, 6673650005084645130) 442 time: (Dec 2 10:43:55.296165, Dec 2 10:43:55.399249) 443 duration: (1.208µs, 103.083625ms, 167ns) 444 event: (your annotation at pre-filter, Dec 2 10:43:55.296165) 445 span: (, 6673650005084645130) 446 time: (Dec 2 10:43:55.296166, Dec 2 10:43:55.399249) 447 duration: (792ns, 103.082583ms, 250ns) 448 span: (HandleFunc, 6673650005084645130) 449 time: (Dec 2 10:43:55.296177, Dec 2 10:43:55.399249) 450 duration: (11.583µs, 103.070917ms, 83ns) 451 event: (handling EmptyCallF, Dec 2 10:43:55.296179) 452 span: (client, 6673650005084645130) 453 time: (Dec 2 10:43:55.296187, Dec 2 10:43:55.297871) 454 duration: (9.125µs, 1.684625ms, 101.377167ms) 455 attributes: (RPCName, /trpc.testing.end2end.TestTRPC/UnaryCall),(Error, <nil>) 456 span: (filter1, 6673650005084645130) 457 time: (Dec 2 10:43:55.296192, Dec 2 10:43:55.297870) 458 duration: (5.292µs, 1.678542ms, 791ns) 459 span: (client.WithFilter, 6673650005084645130) 460 time: (Dec 2 10:43:55.296192, Dec 2 10:43:55.297870) 461 duration: (542ns, 1.677875ms, 125ns) 462 span: (selector, 6673650005084645130) 463 time: (Dec 2 10:43:55.296193, Dec 2 10:43:55.297870) 464 duration: (541ns, 1.677209ms, 125ns) 465 span: (CallFunc, 6673650005084645130) 466 time: (Dec 2 10:43:55.296200, Dec 2 10:43:55.297869) 467 duration: (7.459µs, 1.668541ms, 1.209µs) 468 attributes: (RequestSize, 405),(ResponseSize, 338) 469 span: (Marshal, 6673650005084645130) 470 time: (Dec 2 10:43:55.296202, Dec 2 10:43:55.296341) 471 duration: (1.375µs, 138.875µs, 1.528291ms) 472 span: (Compress, 6673650005084645130) 473 time: (Dec 2 10:43:55.296341, Dec 2 10:43:55.296341) 474 duration: (140.708µs, 333ns, 1.5275ms) 475 span: (EncodeProtocolHead, 6673650005084645130) 476 time: (Dec 2 10:43:55.296342, Dec 2 10:43:55.296345) 477 duration: (141.458µs, 3.333µs, 1.52375ms) 478 span: (SendMessage, 6673650005084645130) 479 time: (Dec 2 10:43:55.297540, Dec 2 10:43:55.297555) 480 duration: (1.339375ms, 15.708µs, 313.458µs) 481 span: (ReceiveMessage, 6673650005084645130) 482 time: (Dec 2 10:43:55.297556, Dec 2 10:43:55.297860) 483 duration: (1.355666ms, 303.75µs, 9.125µs) 484 span: (DecodeProtocolHead, 6673650005084645130) 485 time: (Dec 2 10:43:55.297862, Dec 2 10:43:55.297865) 486 duration: (1.661916ms, 2.5µs, 4.125µs) 487 span: (Decompress, 6673650005084645130) 488 time: (Dec 2 10:43:55.297866, Dec 2 10:43:55.297866) 489 duration: (1.665583ms, 167ns, 2.791µs) 490 span: (Unmarshal, 6673650005084645130) 491 time: (Dec 2 10:43:55.297866, Dec 2 10:43:55.297868) 492 duration: (1.666041ms, 1.709µs, 791ns) 493 span: (sleep, 6673650005084645130) 494 time: (Dec 2 10:43:55.297876, unknown) 495 duration: (1.698709ms, unknown, unknown) 496 event: (awake, Dec 2 10:43:55.398954) 497 span: (client, 6673650005084645130) 498 time: (Dec 2 10:43:55.398979, Dec 2 10:43:55.399244) 499 duration: (102.80125ms, 265.417µs, 4.25µs) 500 attributes: (RPCName, /trpc.testing.end2end.TestTRPC/UnaryCall),(Error, <nil>) 501 span: (filter2, 6673650005084645130) 502 time: (Dec 2 10:43:55.398986, Dec 2 10:43:55.399244) 503 duration: (6.834µs, 258.25µs, 333ns) 504 span: (client.WithFilter, 6673650005084645130) 505 time: (Dec 2 10:43:55.398987, Dec 2 10:43:55.399244) 506 duration: (1.708µs, 256.458µs, 84ns) 507 span: (selector, 6673650005084645130) 508 time: (Dec 2 10:43:55.398988, Dec 2 10:43:55.399244) 509 duration: (417ns, 255.916µs, 125ns) 510 span: (CallFunc, 6673650005084645130) 511 time: (Dec 2 10:43:55.399005, Dec 2 10:43:55.399243) 512 duration: (16.833µs, 238.375µs, 708ns) 513 attributes: (RequestSize, 405),(ResponseSize, 338) 514 span: (Marshal, 6673650005084645130) 515 time: (Dec 2 10:43:55.399006, Dec 2 10:43:55.399017) 516 duration: (1.792µs, 10.458µs, 226.125µs) 517 span: (Compress, 6673650005084645130) 518 time: (Dec 2 10:43:55.399017, Dec 2 10:43:55.399017) 519 duration: (12.583µs, 167ns, 225.625µs) 520 span: (EncodeProtocolHead, 6673650005084645130) 521 time: (Dec 2 10:43:55.399018, Dec 2 10:43:55.399023) 522 duration: (12.958µs, 4.917µs, 220.5µs) 523 span: (SendMessage, 6673650005084645130) 524 time: (Dec 2 10:43:55.399041, Dec 2 10:43:55.399070) 525 duration: (36.375µs, 29.083µs, 172.917µs) 526 span: (ReceiveMessage, 6673650005084645130) 527 time: (Dec 2 10:43:55.399070, Dec 2 10:43:55.399239) 528 duration: (65.75µs, 168.25µs, 4.375µs) 529 span: (DecodeProtocolHead, 6673650005084645130) 530 time: (Dec 2 10:43:55.399240, Dec 2 10:43:55.399241) 531 duration: (235.417µs, 1.375µs, 1.583µs) 532 span: (Decompress, 6673650005084645130) 533 time: (Dec 2 10:43:55.399242, Dec 2 10:43:55.399242) 534 duration: (237µs, 125ns, 1.25µs) 535 span: (Unmarshal, 6673650005084645130) 536 time: (Dec 2 10:43:55.399242, Dec 2 10:43:55.399243) 537 duration: (237.292µs, 750ns, 333ns) 538 event: (your annotation at post-filter, Dec 2 10:43:55.399249) 539 event: (your annotation at post-filter, Dec 2 10:43:55.399249) 540 event: (your annotation at post-filter, Dec 2 10:43:55.399249) 541 span: (Marshal, 6673650005084645130) 542 time: (Dec 2 10:43:55.399250, Dec 2 10:43:55.399251) 543 duration: (103.314625ms, 1.208µs, 10.167µs) 544 span: (Compress, 6673650005084645130) 545 time: (Dec 2 10:43:55.399252, Dec 2 10:43:55.399252) 546 duration: (103.315958ms, 125ns, 9.917µs) 547 span: (EncodeProtocolHead, 6673650005084645130) 548 time: (Dec 2 10:43:55.399252, Dec 2 10:43:55.399253) 549 duration: (103.316208ms, 750ns, 9.042µs) 550 span: (SendMessage, 6673650005084645130) 551 time: (Dec 2 10:43:55.399253, Dec 2 10:43:55.399261) 552 duration: (103.317333ms, 8.333µs, 334ns) 553 ``` 554 555 A new `event` field has been added to the span details, along with an embedded subspan. 556 557 - event: describes what happened at a given moment, similar to a log. 558 Events that can be inserted by you, such as `Nov 4 14:39:23.594147: your annotation at pre-filter` in the example above. 559 - span: While the server is processing your custom function, a new client may be created to call the downstream service, and a sub-span will be created 560 As you can see, all subspans occur within `HandleFunc`. 561 562 Note that the values of middleDur and postDur in endTime, duration may be ``unknown'', for example, the above span contains the following subspan. 563 564 ``` 565 span: (sleep, 6673650005084645130) 566 time: (Dec 2 10:43:55.297876, unknown) 567 duration: (1.698709ms, unknown, unknown) 568 event: (awake, Dec 2 10:43:55.398954) 569 ``` 570 571 ## Span Interface 572 573 You can call `rpcz.SpanFromContext`[^2] to get the current `Span` in the `context` and then use the following interface to manipulate Span. 574 575 ```go 576 type Span interface { 577 // AddEvent adds an event. 578 AddEvent(name string) 579 580 // SetAttribute sets Attribute with (name, value). 581 SetAttribute(name string, value interface{}) 582 583 // ID returns SpanID. 584 ID() SpanID 585 586 // NewChild creates a child span from current span. 587 // Ender ends this span if related operation is completed. 588 NewChild(name string) (Span, Ender) 589 } 590 ``` 591 592 [^2]: Return a `noopSpan` when the `context` does not contain any `span`, any operation on the `noopSpan` is null and will not take effect. 593 594 ### Using AddEvent to add events 595 596 ```go 597 // If no Span is currently set in ctx an implementation of a Span that performs no operations is returned. 598 span := SpanFromContext(ctx context.Context) 599 600 span.AddEvent("Acquiring lock") 601 mutex.Lock() 602 603 span.AddEvent("Got lock, doing work...") 604 // do some stuff ... 605 606 span.AddEvent("Unlocking") 607 mutex.Unlock() 608 ``` 609 610 ### Use SetAttribute to set attributes 611 612 ```go 613 ctx, msg := codec.EnsureMessage(ctx) 614 span := SpanFromContext(ctx context.Context) 615 span.SetAttribute("RPCName", msg.ClientRPCName()) 616 span.SetAttribute("Error", msg.ClientRspErr()) 617 ``` 618 619 ### Create a new child span 620 621 **End() function should be called only once by the caller to end the life cycle of the child span; uncalled End and multiple calls to End are undefined** 622 623 ```go 624 span := SpanFromContext(ctx context.Context) 625 cs, end := span.NewChild("Decompress") 626 reqBodyBuf, err := codec.Decompress(compressType, reqBodyBuf) 627 end.End() 628 ``` 629 630 ## Reference 631 632 - [1] https://en.wikipedia.org/wiki/Event_(UML) 633 - [2] https://en.wikipedia.org/wiki/Event_(computing) 634 - [3] https://opentelemetry.io/docs/instrumentation/go/manual/#events 635 - [4] https://opentelemetry.io/docs/instrumentation/go/api/tracing/#starting-and-ending-a-span 636 - [5] https://opentelemetry.io/docs/concepts/observability-primer/#spans 637 - [6] span-id represented as an 8-byte array, satisfying the w3c trace-context specification. https://www.w3.org/TR/trace-context/#parent-id 638 - [7] https://en.wiktionary.org/wiki/-z#English 639 - [8] https://github.com/grpc/proposal/blob/master/A14-channelz.md 640 - [9] Dapper, a Large-Scale Distributed Systems Tracing Infrastructure: http://static.googleusercontent.com/media/research.google.com/en// pubs/archive/36356.pdf 641 - [10] brpc-rpcz: https://github.com/apache/incubator-brpc/blob/master/docs/cn/rpcz.md 642 - [11] opentracing: https://opentracing.io/ 643 - [12] opentelemetry: https://opentelemetry.io/ 644 - [13] open-telemetry-sdk-go-traceIDRatioSampler: https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/sampling.go