github.com/onflow/flow-go@v0.35.7-crescendo-preview.23-atree-inlining/utils/binstat/README.md (about) 1 # `binstat` - run-time & space efficient stats in bins 2 3 ## Why use `binstat`? 4 5 * `pprof` reports the seconds of CPU used and not the seconds of wall-clock used. 6 * `trace` can be too heavyweight for larger Golang programs. 7 * prometheus can be heavyweight, and it's difficult to programmatically manage stats once exported. 8 * logging can be heavyweight; 1M function calls == 1M log lines. 9 10 ## `binstat` goals 11 12 * `binstat` is disabled at compile time by default resulting in zero production code overhead. 13 * `binstat` may be enabled at compile time on a stat by stat basis without enabling all stats. 14 * `binstat` aims to be efficient at collecting small stats very fast, e.g. measuring lock time. 15 * `binstat` aims to allow stats to be compared programmatically; is this worse/better than last run? 16 17 ## How `binstat` works? 18 19 * Sprinkle `binstat.Enter*(<what>)` and `binstat.Leave*()` into Golang source code, e.g.: 20 21 ``` 22 import ( 23 ... 24 "github.com/onflow/flow-go/utils/binstat" 25 ... 26 ) 27 ... 28 for loop := 0; loop < 2; loop++ { 29 bs := binstat.EnterTime(fmt.Sprintf("loop-%d", loop)) 30 myFunc() // code to generate stats for 31 binstat.Leave(bs) 32 } 33 ``` 34 35 * Run the program, and max. once per second a stats file will be written opportunistically, e.g.: 36 37 ``` 38 $ cat myproc.pid-085882.txt| egrep loop 39 /GOMAXPROCS=1,GC=100/what[loop-0]/time[1.000000-1.999999]=1 1.912292 // e.g. mypkg.myFunc:161 40 /GOMAXPROCS=8,GC=100/what[loop-1]/time[0.600000-0.699999]=1 0.696432 // e.g. mypkg.myFunc:161 41 $ # user defined string ^^^^^^ 42 $ # bin time range in seconds ^^^^^^^^^^^^^^^^^ 43 $ # times bin incremented ^ 44 $ # total seconds for all increments ^^^^^^^^ 45 $ # name & line of instrumented file ^^^^^^^^^^^^^^^^ 46 ``` 47 48 * If `myFunc()` is identical, why different results? E.g. launches 3 go-routines executing slower if `GOMAXPROCS=1`. 49 50 ### How 'collapsable' `<what>` works? 51 52 * If `<what>` has the format `~<default len><what>`, then `~4RootL1L2` only uses `~4Root` in the bin name, unless env var `BINSTAT_LEN_WHAT="~Root=6"` in which case the bin name uses `~4RootL1`. 53 * In this way, the bin quantity for a particular `<what>` defaults to fewer bins, but may be enlarged using the env var. 54 55 ## Disable AKA 'hide' `binstat` instrumentation before pushing code 56 57 ``` 58 $ utils/binstat/binstat 59 utils/binstat/binstat: Please specifiy a command line option: (list|hide|unhide)-instrumentation [regex='binstat.Bin(Net|MakeID)'] 60 $ utils/binstat/binstat hide-instrumentation 61 - hide instrumentation found 17 files to examine <-- source files containing binstat instrumentation 62 - hide instrumentation wrote 0 files <-- source files rewritten with commented binstat 63 ``` 64 65 * Auto commented `binstat` code looks like this: 66 67 ``` 68 import ( 69 ... 70 _ "github.com/onflow/flow-go/utils/binstat" 71 ... 72 ) 73 ... 74 for loop := 0; loop < 2; loop++ { 75 //bs := binstat.EnterTime(fmt.Sprintf("loop-%d", loop)) 76 myFunc() // code to generate stats for 77 //binstat.Leave(bs) 78 } 79 ``` 80 81 ## What makes `binstat` efficient? 82 83 * The number of bins is relatively small, regardless of the number of function calls. 84 * For timing, `binstat` uses `runtime.nano()` which is used by & about twice as fast as `time.Now()`. 85 * A lock per stat collection is eliminated using a `sync.RWMutex{}` reader/writer mutual exclusion lock: 86 * Usual case: If the bin already exists, any 'reader' can *concurrently* update it via atomic operations. 87 * Rare event: Else a single 'writer' blocks all 'readers' to add the new bin. 88 * `binstat` instrumentation can be disabled in two different ways: 89 * At compile time via source code commenting of instrumentation. 90 * At run-time if `BINSTAT_ENABLE` does not exist. 91 92 ## Example comparison with `pprof` 93 94 * This example -- on a GCP Linux box -- launches 6 identical go-routines, 3 times, with `gomaxprocs=1` & then `=8`. 95 96 ``` 97 $ pushd binstat ; go test -v -vv ./... 2>&1 | perl -lane 's~\\n~\n~g; s~"time".*?,~~g; print;' ; popd 98 ... 99 === RUN TestWithPprof 100 ... 101 {"level":"debug","message":"test: loop=0 try=1; running 6 identical functions with gomaxprocs=1"} 102 ... 103 {"level":"debug","message":"test: loop=0 try=2; running 6 identical functions with gomaxprocs=1"} 104 ... 105 {"level":"debug","message":"test: loop=0 try=3; running 6 identical functions with gomaxprocs=1"} 106 ... 107 {"level":"debug","message":"test: loop=1 try=1; running 6 identical functions with gomaxprocs=8"} 108 ... 109 {"level":"debug","message":"test: loop=1 try=2; running 6 identical functions with gomaxprocs=8"} 110 ... 111 {"level":"debug","message":"test: loop=1 try=3; running 6 identical functions with gomaxprocs=8"} 112 ... 113 ``` 114 115 * The test collects running times from `binstat` & `pprof` for a side by side comparison. 116 * Note: With `gomaxprocs=1` there is a large delta between CPU & wall-clock, e.g. 0.07 vs 0.40 seconds. 117 * Note: With `gomaxprocs=8` `pprof` CPU time incorrectly varies between 0.02 to 0.12 seconds. 118 119 ``` 120 {"level":"debug","message":"test: binstat------- pprof---------"} 121 {"level":"debug","message":"test: try1 try2 try3 try1 try2 try3"} 122 {"level":"debug","message":"test: 0.38 0.40 0.38 0.06 0.06 0.05 // f1() seconds; loop=0 gomaxprocs=1"} 123 {"level":"debug","message":"test: 0.37 0.38 0.36 0.06 0.07 0.06 // f2() seconds; loop=0 gomaxprocs=1"} 124 {"level":"debug","message":"test: 0.29 0.37 0.34 0.06 0.06 0.06 // f3() seconds; loop=0 gomaxprocs=1"} 125 {"level":"debug","message":"test: 0.33 0.35 0.32 0.06 0.07 0.06 // f4() seconds; loop=0 gomaxprocs=1"} 126 {"level":"debug","message":"test: 0.28 0.34 0.29 0.05 0.06 0.06 // f5() seconds; loop=0 gomaxprocs=1"} 127 {"level":"debug","message":"test: 0.40 0.40 0.36 0.06 0.05 0.06 // f6() seconds; loop=0 gomaxprocs=1"} 128 {"level":"debug","message":"test: binstat------- pprof---------"} 129 {"level":"debug","message":"test: try1 try2 try3 try1 try2 try3"} 130 {"level":"debug","message":"test: 0.07 0.08 0.07 0.04 0.05 0.07 // f1() seconds; loop=1 gomaxprocs=8"} 131 {"level":"debug","message":"test: 0.07 0.08 0.07 0.05 0.07 0.05 // f2() seconds; loop=1 gomaxprocs=8"} 132 {"level":"debug","message":"test: 0.07 0.08 0.07 0.06 0.11 0.02 // f3() seconds; loop=1 gomaxprocs=8"} 133 {"level":"debug","message":"test: 0.07 0.08 0.07 0.06 0.06 0.06 // f4() seconds; loop=1 gomaxprocs=8"} 134 {"level":"debug","message":"test: 0.07 0.08 0.07 0.09 0.05 0.04 // f5() seconds; loop=1 gomaxprocs=8"} 135 {"level":"debug","message":"test: 0.07 0.08 0.07 0.03 0.04 0.12 // f6() seconds; loop=1 gomaxprocs=8"} 136 ``` 137 138 * Finally, the test shows the `binstat` sorted file containing the stats. 139 * This part is when `GOMAXPROCS=1`. 140 * Note: The `binstat.tck` bins show how many 1/10ths of a second had how many go-routines running. 141 * Note: The `binstat.dmp` bins show how long opportunistically saving the `binstat` file took. 142 143 ``` 144 {"level":"debug","message":"test: output of command: ls -al ./binstat.test.pid-*.binstat.txt ; cat ./binstat.test.pid-*.binstat.txt | sort --version-sort 145 -rw-r--r-- 1 simonhardy-francis staff 3610 23 Jun 17:10 ./binstat.test.pid-047616.binstat.txt 146 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f1]/time[0.200000-0.299999]=1 0.292582 // e.g. utils/binstat_test.run.func1:83 147 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f1]/time[0.300000-0.399999]=2 0.758238 // e.g. utils/binstat_test.run.func1:83 148 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f2]/time[0.200000-0.299999]=2 0.587235 // e.g. utils/binstat_test.run.func1:83 149 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f2]/time[0.300000-0.399999]=1 0.356942 // e.g. utils/binstat_test.run.func1:83 150 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f3]/time[0.300000-0.399999]=3 1.017089 // e.g. utils/binstat_test.run.func1:83 151 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f4]/time[0.200000-0.299999]=2 0.562171 // e.g. utils/binstat_test.run.func1:83 152 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f4]/time[0.300000-0.399999]=1 0.315660 // e.g. utils/binstat_test.run.func1:83 153 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f5]/time[0.200000-0.299999]=2 0.567996 // e.g. utils/binstat_test.run.func1:83 154 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f5]/time[0.300000-0.399999]=1 0.301534 // e.g. utils/binstat_test.run.func1:83 155 /GOMAXPROCS=1,CPUS=8/what[~1f-via-f6]/time[0.300000-0.399999]=3 1.157075 // e.g. utils/binstat_test.run.func1:83 156 /GOMAXPROCS=1,CPUS=8/what[internal-NumG]/size[4]=6 0.000185 // e.g. utils/binstat.tick:424 157 /GOMAXPROCS=1,CPUS=8/what[internal-NumG]/size[5]=2 0.000042 // e.g. utils/binstat.tick:424 158 /GOMAXPROCS=1,CPUS=8/what[internal-NumG]/size[9]=2 0.000012 // e.g. utils/binstat.tick:424 159 /GOMAXPROCS=1,CPUS=8/what[internal-NumG]/size[10-19]=9 0.000137 // e.g. utils/binstat.tick:424 160 /GOMAXPROCS=1,CPUS=8/what[internal-dump]/time[0.000600-0.000699]=1 0.000650 // e.g. utils/binstat.dump:436 161 /GOMAXPROCS=1,CPUS=8/what[loop-0]/time[1.000000-1.999999]=1 1.812913 // e.g. utils/binstat_test.TestWithPprof:176 162 ``` 163 164 * This part is when `GOMAXPROCS=8`. 165 * Note: The `f1` - `f6` functions executed in parallel without pre-emption & therefore quicker. 166 167 ``` 168 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f1]/time[0.070000-0.079999]=2 0.145029 // e.g. utils/binstat_test.run.func1:83 169 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f1]/time[0.080000-0.089999]=1 0.082286 // e.g. utils/binstat_test.run.func1:83 170 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f2]/time[0.070000-0.079999]=2 0.144383 // e.g. utils/binstat_test.run.func1:83 171 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f2]/time[0.100000-0.199999]=1 0.100980 // e.g. utils/binstat_test.run.func1:83 172 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f3]/time[0.070000-0.079999]=2 0.146867 // e.g. utils/binstat_test.run.func1:83 173 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f3]/time[0.100000-0.199999]=1 0.106264 // e.g. utils/binstat_test.run.func1:83 174 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f4]/time[0.070000-0.079999]=2 0.144903 // e.g. utils/binstat_test.run.func1:83 175 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f4]/time[0.090000-0.099999]=1 0.098164 // e.g. utils/binstat_test.run.func1:83 176 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f5]/time[0.070000-0.079999]=2 0.144949 // e.g. utils/binstat_test.run.func1:83 177 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f5]/time[0.090000-0.099999]=1 0.098013 // e.g. utils/binstat_test.run.func1:83 178 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f6]/time[0.070000-0.079999]=2 0.141520 // e.g. utils/binstat_test.run.func1:83 179 /GOMAXPROCS=8,CPUS=8/what[~1f-via-f6]/time[0.100000-0.199999]=1 0.103813 // e.g. utils/binstat_test.run.func1:83 180 /GOMAXPROCS=8,CPUS=8/what[~2egEnterTimeVal]/size[100-199]/time[0.000010-0.000019]=1 0.000017 // e.g. utils/binstat.TestBinstatInternal:20 181 /GOMAXPROCS=8,CPUS=8/what[~2egEnterTimeVal]/size[myPoint]/time[0.000005-0.000005]=1 0.000006 // e.g. utils/binstat.TestBinstatInternal:20 182 /GOMAXPROCS=8,CPUS=8/what[~2egEnter]/size[100-199]=1 0.000070 // e.g. utils/binstat.TestBinstatInternal:17 183 /GOMAXPROCS=8,CPUS=8/what[~2egenterTimeValInternal]/size[100-199]/time[0.000004-0.000004]=1 0.000005 // e.g. utils/binstat.TestBinstatInternal:24 184 /GOMAXPROCS=8,CPUS=8/what[~2egenterTimeValInternal]/size[myPoint]/time[0.000002-0.000002]=1 0.000002 // e.g. utils/binstat.TestBinstatInternal:24 185 /GOMAXPROCS=8,CPUS=8/what[internal-NumG]/size[4]=5 0.000160 // e.g. utils/binstat.tick:424 186 /GOMAXPROCS=8,CPUS=8/what[internal-NumG]/size[5]=1 0.000029 // e.g. utils/binstat.tick:424 187 /GOMAXPROCS=8,CPUS=8/what[internal-NumG]/size[10-19]=2 0.000046 // e.g. utils/binstat.tick:424 188 /GOMAXPROCS=8,CPUS=8/what[internal-dump]/time[0.000900-0.000999]=1 0.000943 // e.g. utils/binstat.dump:436 189 /GOMAXPROCS=8,CPUS=8/what[loop-1]/time[0.800000-0.899999]=1 0.819436 // e.g. utils/binstat_test.TestWithPprof:176 190 ``` 191 192 * This part shows internal timings, as well as the Unix time, elapsed seconds since process start, & the number of GCs and the total seconds they took.. 193 194 ``` 195 /internal/GCStats=9 0.001566 196 /internal/binstat.debug=36 0.001022 197 /internal/binstat.enter=70 0.001932 198 /internal/binstat.point=71 0.000882 199 /internal/binstat.x_2_y=111 0.000383 200 /internal/second=1624493407 2.665259 201 "} 202 ``` 203 204 ## Environment variables 205 206 * If `BINSTAT_ENABLE` exists, `binstat` is enabled. Default: Disabled. 207 * If `BINSTAT_VERBOSE` exists, `binstat` outputs debug info & e.g. function location. Default: Disabled. 208 * If `BINSTAT_DMP_NAME` exists, use it. Default: `<process name>.pid-<process pid>.binstat.txt`. 209 * If `BINSTAT_DMP_PATH` exists, output dump to `<BINSTAT_DMP_PATH>/<BINSTAT_DMP_NAME>`. Default: `.`. 210 * If `BINSTAT_CUT_PATH` exists, cut function name path with this. Default: `=github.com/onflow/flow-go/`. 211 * If `BINSTAT_LEN_WHAT` exists, truncate `<what>` to `<len>` from `~<what>=<len>`, e.g. `~f=99;~2eg=0`. 212 213 ## Todo 214 215 * Monitor this [tsc github issue](https://github.com/templexxx/tsc/issues/8) in case we can accelerate timing further. 216 * How to best compress & access the per second `binstat` file from outside the container?