github.com/aclements/go-misc@v0.0.0-20240129233631-2f6ede80790c/split/README.md

This package is a prototype implementation of split (or "sharded")
values for Go. This is a possible solution to
https://github.com/golang/go/issues/18802.

[![](https://godoc.org/github.com/aclements/go-misc/split?status.svg)](https://godoc.org/github.com/aclements/go-misc/split)

This prototype is very dependent on Go runtime internals. As is, it
does not depend on any *modifications* to the Go runtime; however,
there is an optional runtime modification that shaves about 4 ns off
the cost of `Value.Get`. See that method for details.

Benchmarks
----------

With the runtime modification, the single-core overhead of the split
value compared to a single atomic counter is about 2 ns, and compared
to a non-atomic counter is about 6 ns:

```
BenchmarkCounterSplit          	200000000	         8.15 ns/op
BenchmarkCounterShared         	300000000	         5.96 ns/op
BenchmarkCounterSequential     	1000000000	         2.14 ns/op
BenchmarkLazyAggregationSplit  	100000000	        23.9 ns/op
BenchmarkLazyAggregationShared 	100000000	        23.1 ns/op
```
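
To make the idea behind the "split" counter concrete, here is a minimal,
self-contained sketch of sharding. It is not this package's API:
`ShardedCounter`, `NewShardedCounter`, the caller-supplied shard `hint`,
and the 64-byte cache-line padding are all assumptions made for
illustration. The prototype depends on runtime internals, whereas this
sketch uses only the standard library and leaves shard selection to the
caller. Each writer updates its own padded shard and a reader sums the
shards on demand; with 8 workers each adding 1000, `main` prints 8000.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shard holds one slice of the counter. The padding keeps neighboring
// shards on separate cache lines (64 bytes is an assumed line size).
type shard struct {
	n atomic.Int64
	_ [56]byte
}

// ShardedCounter is a hypothetical illustration of a split value,
// not the type provided by this package.
type ShardedCounter struct {
	shards []shard
}

// NewShardedCounter returns a counter with n shards; n must be positive.
func NewShardedCounter(n int) *ShardedCounter {
	return &ShardedCounter{shards: make([]shard, n)}
}

// Add applies delta to the shard selected by hint. Writers that use
// different hints (modulo the shard count) never touch the same shard.
func (c *ShardedCounter) Add(hint int, delta int64) {
	i := uint(hint) % uint(len(c.shards))
	c.shards[i].n.Add(delta)
}

// Sum aggregates all shards into a single total.
func (c *ShardedCounter) Sum() int64 {
	var total int64
	for i := range c.shards {
		total += c.shards[i].n.Load()
	}
	return total
}

func main() {
	const workers, perWorker = 8, 1000
	c := NewShardedCounter(workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Add(id, 1) // each worker writes only to its own shard
			}
		}(w)
	}
	wg.Wait()

	fmt.Println(c.Sum())
}
```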

The scaling of the split values to 24 cores (real cores, no
hyperthreads) is nearly perfect, while the shared values collapse as
you'd expect:

```
BenchmarkCounterSplit-24               	2000000000	         0.35 ns/op      8.40 cpu-ns/op
BenchmarkCounterShared-24            	50000000	        24.7 ns/op     593    cpu-ns/op
BenchmarkLazyAggregationSplit-24       	2000000000	         1.03 ns/op     24.7  cpu-ns/op
BenchmarkLazyAggregationShared-24    	10000000	       174 ns/op      4176    cpu-ns/op
```
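
The cpu-ns/op column appears to be the wall-clock ns/op multiplied by the
number of cores (24 here). That reading is inferred from the figures
above rather than taken from the benchmark code, so treat it as an
assumption. Under it, the sketch below (the helper name `cpuNsPerOp` is
hypothetical) reproduces the column up to rounding. It also makes the
scaling claim easy to check: the split counter costs about 8.4 CPU-ns
per operation on 24 cores versus 8.15 ns on one core, while the shared
counter's CPU cost grows by roughly a factor of 100.

```go
package main

import "fmt"

// cpuNsPerOp converts the wall-clock ns/op observed while `cores` workers
// run in parallel into the total CPU time spent per operation.
// Assumption: cpu-ns/op = ns/op * cores.
func cpuNsPerOp(wallNsPerOp float64, cores int) float64 {
	return wallNsPerOp * float64(cores)
}

func main() {
	const cores = 24

	// Wall-clock figures copied from the 24-core benchmark output.
	rows := []struct {
		name string
		wall float64
	}{
		{"BenchmarkCounterSplit-24", 0.35},
		{"BenchmarkCounterShared-24", 24.7},
		{"BenchmarkLazyAggregationSplit-24", 1.03},
		{"BenchmarkLazyAggregationShared-24", 174},
	}

	for _, r := range rows {
		fmt.Printf("%-34s %7.2f ns/op %8.1f cpu-ns/op\n",
			r.name, r.wall, cpuNsPerOp(r.wall, cores))
	}
}
```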

Without the runtime modification, there's a little more overhead in
the sequential case, but the scaling isn't affected:

```
BenchmarkCounterSplit          	100000000	        12.3 ns/op
BenchmarkCounterShared         	300000000	         5.97 ns/op
BenchmarkCounterSequential     	1000000000	         2.28 ns/op
BenchmarkLazyAggregationSplit  	50000000	        25.2 ns/op
BenchmarkLazyAggregationShared 	100000000	        23.5 ns/op
```