# godeltaprof

godeltaprof is an efficient delta profiler for memory, mutex, and block.

# Why

In Golang, allocation, mutex and block profiles are cumulative. They only grow over time and show events (allocations, contention) that happened since the running program started.
Not only do the values grow, but the size of the profile itself grows as well. It can grow up to megabytes in size for long-running processes. These megabyte-sized profiles are called huge profiles in this document.

In many cases, it's more useful to see the differences between two points in time.
The standard `net/http/pprof` handlers, built on the runtime/pprof package, can already compute such a difference, called a delta profile.
Requesting a delta profile requires passing the `seconds` argument in the pprof endpoint query:

```
go tool pprof http://localhost:6060/debug/pprof/heap?seconds=30
```

What this does:
1. Dump profile `p0`
2. Sleep
3. Dump profile `p1`
4. Decompress and parse protobuf `p0`
5. Decompress and parse protobuf `p1`
6. Subtract `p0` from `p1`
7. Serialize protobuf and compress the result

The resulting profile is *usually* much smaller (`p0` may be megabytes, while the result is usually tens of kilobytes).

There are a number of issues with this approach:

1. The heap profile contains both allocation values and in-use values. In-use values are not cumulative, so the subtraction corrupts them.
   **Note:** This could be fixed if the delta code used `p0.ScaleN([]float64{-1,-1,0,0})` instead of `p0.Scale(-1)`. That would subtract the allocation values and zero out the in-use values in `p0`.
2. It requires dumping two profiles.
3. It produces a lot of allocations, putting pressure on the GC.
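The following toy sketch shows why `Scale(-1)` corrupts in-use values while `ScaleN([]float64{-1,-1,0,0})` does not. It looks at the four heap values of a single stack. Plain integers stand in for a real `profile.Profile` and its float64 scale factors. `values` and `scaledDelta` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// values holds one stack's heap sample values in the order runtime/pprof
// writes them: alloc_objects, alloc_space, inuse_objects, inuse_space.
type values [4]int64

// scaledDelta returns p1 + factors*p0 for a single stack, which is what
// merging p1 with a scaled copy of p0 yields (hypothetical helper).
func scaledDelta(p0, p1 values, factors [4]int64) values {
	var out values
	for i := range out {
		out[i] = p1[i] + factors[i]*p0[i]
	}
	return out
}

func main() {
	p0 := values{100, 6400, 40, 2560} // at t0: 100 objects allocated so far, 40 still live
	p1 := values{150, 9600, 30, 1920} // at t1: 150 objects allocated so far, 30 still live

	// Like p0.Scale(-1): in-use values are subtracted too and go negative.
	fmt.Println(scaledDelta(p0, p1, [4]int64{-1, -1, -1, -1})) // [50 3200 -10 -640]

	// Like p0.ScaleN([]float64{-1, -1, 0, 0}): only allocation values are subtracted.
	fmt.Println(scaledDelta(p0, p1, [4]int64{-1, -1, 0, 0})) // [50 3200 30 1920]
}
```

With the `ScaleN` factors, the in-use columns of the result are simply `p1`'s current values, which is what a heap snapshot should report.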

## DataDog's fastdelta

DataDog's [fastdelta profiler](https://github.com/DataDog/dd-trace-go/blob/30e1406c2cb62af749df03d559853e1d1de0e3bf/profiler/internal/fastdelta/fd.go#L75) uses another approach.

It improves on the runtime/pprof approach by keeping a copy of the previous profile and subtracting it from the current profile.
The fastdelta profiler uses a custom protobuf pprof parser that doesn't allocate as much memory.
This approach is more efficient, faster, and produces less garbage. It also requires dumping only one profile per collection.
However, the fastdelta profiler still parses huge profiles of up to megabytes, only to discard most of the data.

## godeltaprof

godeltaprof does a similar job, but slightly differently.

Delta computation happens on the raw `runtime.MemProfileRecord` and `runtime.BlockProfileRecord` values, before anything is serialized to pprof.
This way, huge profiles don't need to be parsed. The delta is computed on raw records, all-zero samples are rejected, and the results are serialized and compressed.

The source code for godeltaprof is forked from the original [runtime/pprof package](https://github.com/golang/go/tree/master/src/runtime/pprof).
godeltaprof is modified to include delta computation before serialization and to expose the new endpoints.
There are other small improvements and benefits:
- Using `github.com/klauspost/compress/gzip` instead of `compress/gzip`
- Optional lazy reading of mappings (they don't change over time for most applications)
- A separate package from the runtime, so it can be updated independently

# benchmarks

These benchmarks used memory profiles from the [pyroscope](https://github.com/grafana/pyroscope) server.
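Here is a minimal sketch of the godeltaprof idea described above. It computes a delta directly on `runtime.MemProfileRecord` values, with no pprof parsing, and rejects all-zero samples. It is not godeltaprof's code. `heapDelta`, `deltaSample`, `compute` and `rec` are hypothetical names. The rejection rule is an assumption: a stack is dropped only when its allocation deltas and in-use values are all zero. In-use values are reported as-is because they are not cumulative.

```go
package main

import (
	"fmt"
	"runtime"
)

// deltaSample is what would be serialized for one call stack (hypothetical type).
type deltaSample struct {
	Stack        [32]uintptr
	AllocObjects int64 // allocated since the previous call
	AllocBytes   int64
	InUseObjects int64 // current value: in-use is not cumulative, so no subtraction
	InUseBytes   int64
}

// heapDelta remembers the last cumulative record per stack (hypothetical type).
type heapDelta struct {
	prev map[[32]uintptr]runtime.MemProfileRecord
}

func newHeapDelta() *heapDelta {
	return &heapDelta{prev: make(map[[32]uintptr]runtime.MemProfileRecord)}
}

// compute subtracts each stack's previous record from its current one and
// drops samples whose values are all zero (assumed rejection rule).
func (h *heapDelta) compute(records ...runtime.MemProfileRecord) []deltaSample {
	var out []deltaSample
	for _, r := range records {
		p := h.prev[r.Stack0] // zero record if this stack is new
		h.prev[r.Stack0] = r
		s := deltaSample{
			Stack:        r.Stack0,
			AllocObjects: r.AllocObjects - p.AllocObjects,
			AllocBytes:   r.AllocBytes - p.AllocBytes,
			InUseObjects: r.InUseObjects(),
			InUseBytes:   r.InUseBytes(),
		}
		if s.AllocObjects == 0 && s.AllocBytes == 0 && s.InUseObjects == 0 && s.InUseBytes == 0 {
			continue // nothing new and nothing live: reject
		}
		out = append(out, s)
	}
	return out
}

// rec builds a fake record with a single made-up PC, for illustration only.
func rec(pc uintptr, allocObjs, allocBytes, freeObjs, freeBytes int64) runtime.MemProfileRecord {
	r := runtime.MemProfileRecord{
		AllocBytes:   allocBytes,
		FreeBytes:    freeBytes,
		AllocObjects: allocObjs,
		FreeObjects:  freeObjs,
	}
	r.Stack0[0] = pc
	return r
}

func main() {
	h := newHeapDelta()
	// Stack 0x1000 keeps everything live; stack 0x2000 freed all it allocated.
	first := h.compute(rec(0x1000, 10, 640, 0, 0), rec(0x2000, 5, 80, 5, 80))
	fmt.Println(len(first)) // 2: on the first call every stack is new
	// Only stack 0x1000 allocated since then (2 more objects, 128 more bytes).
	second := h.compute(rec(0x1000, 12, 768, 0, 0), rec(0x2000, 5, 80, 5, 80))
	fmt.Println(len(second), second[0].AllocObjects, second[0].AllocBytes) // 1 2 128
}
```

With real data, the records would come from `runtime.MemProfile` instead of `rec`, and the surviving samples would then be written out as a compressed pprof profile. Keying the map by `Stack0` works because Go arrays are comparable.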

- BenchmarkOG: dumps the memory profile with the runtime/pprof package
- BenchmarkFastDelta: dumps the memory profile with the runtime/pprof package and computes the delta using fastdelta
- BenchmarkGodeltaprof: does not dump a profile with runtime/pprof; computes the delta and outputs the result

Each benchmark also outputs the sizes of the profiles it produces.
```
BenchmarkOG
      63	 181862189 ns/op
profile sizes: [209117 209107 209077 209089 209095 209076 209088 209082 209090 209092]

BenchmarkFastDelta
      43	 273936764 ns/op
profile sizes: [169300 10815 8969 9511 9752 9376 9545 8959 10357 9536]

BenchmarkGodeltaprof
     366	  31148264 ns/op
profile sizes: [208898 11485 9347 9967 10291 9848 10085 9285 11033 9986]
```

Notice that BenchmarkOG profile sizes are ~200k, while BenchmarkGodeltaprof and BenchmarkFastDelta profiles are ~10k. This is because a lot of samples with zero values are discarded after delta computation. Only the first profile of each delta run is close to full size, since there is no previous profile to subtract yet.

The source code of the benchmarks can be found [here](https://github.com/grafana/pyroscope/compare/godeltaprofbench?expand=1).

CPU profiles: [BenchmarkOG](https://flamegraph.com/share/a8f68312-98c7-11ee-a502-466f68d203a5), [BenchmarkFastDelta](https://flamegraph.com/share/c23821f3-98c7-11ee-a502-466f68d203a5), [BenchmarkGodeltaprof](https://flamegraph.com/share/ea66df36-98c7-11ee-9a0d-f2c25703e557)

# upstreaming

TODO(korniltsev): create a golang issue and ask whether godeltaprof could be considered for merging into the upstream golang repo in some way (maybe not as is, maybe with different APIs).