# xsync benchmarks

If you're interested in how `MapOf` compares with some of the popular concurrent hash maps written in Go, check out [this](https://github.com/cornelk/hashmap/pull/70) PR and [this](https://github.com/alphadose/haxmap/pull/22) PR.

The results below were obtained for xsync v2.3.1 on a c6g.metal EC2 instance (64 CPU, 128GB RAM) running Linux and Go 1.19.3. I'd like to thank [@felixge](https://github.com/felixge), who kindly ran the benchmarks.

The following commands were used to run the benchmarks:
```bash
$ go test -run='^$' -cpu=1,2,4,8,16,32,64 -bench . -count=30 -timeout=0 | tee bench.txt
$ benchstat bench.txt | tee benchstat.txt
```

The sections below contain some of the results. Refer to [this gist](https://gist.github.com/puzpuzpuz/e62e38e06feadecfdc823c0f941ece0b) for the complete output.

### Counter vs. atomic int64

```
name            time/op
Counter         27.3ns ± 1%
Counter-2       27.2ns ±11%
Counter-4       15.3ns ± 8%
Counter-8       7.43ns ± 7%
Counter-16      3.70ns ±10%
Counter-32      1.77ns ± 3%
Counter-64      0.96ns ±10%
AtomicInt64     7.60ns ± 0%
AtomicInt64-2   12.6ns ±13%
AtomicInt64-4   13.5ns ±14%
AtomicInt64-8   12.7ns ± 9%
AtomicInt64-16  12.8ns ± 8%
AtomicInt64-32  13.0ns ± 6%
AtomicInt64-64  12.9ns ± 7%
```

Here `time/op` stands for the average time spent on a single operation. If you divide `10^9` by the result in nanoseconds per operation, you get the throughput in operations per second. Thus, ideal scalability of a concurrent data structure implies that the reported `time/op` decreases proportionally as the number of CPU cores grows. Conversely, if the measured time per operation increases when run on more cores, that indicates performance degradation.
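To make the comparison concrete, here is a minimal pair of benchmarks in the spirit of the ones above. The benchmark names mirror the report, but this is an illustrative sketch, not the exact code behind these numbers:

```go
package benchmarks

import (
	"sync/atomic"
	"testing"

	"github.com/puzpuzpuz/xsync/v3"
)

// Illustrative sketch: xsync.Counter spreads its state across
// internal stripes, so parallel increments mostly touch distinct
// cache lines and time/op keeps dropping as cores are added.
func BenchmarkCounter(b *testing.B) {
	c := xsync.NewCounter()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.Inc()
		}
	})
}

// Illustrative sketch: with a plain atomic int64 every goroutine
// contends on the same cache line, which is why time/op stops
// improving beyond a single core in the table above.
func BenchmarkAtomicInt64(b *testing.B) {
	var c int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			atomic.AddInt64(&c, 1)
		}
	})
}
```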
### MapOf vs. sync.Map

1,000 `[int, int]` entries with a warm-up, 100% Loads:
```
IntegerMapOf_WarmUp/reads=100%           24.0ns ± 0%
IntegerMapOf_WarmUp/reads=100%-2         12.0ns ± 0%
IntegerMapOf_WarmUp/reads=100%-4         6.02ns ± 0%
IntegerMapOf_WarmUp/reads=100%-8         3.01ns ± 0%
IntegerMapOf_WarmUp/reads=100%-16        1.50ns ± 0%
IntegerMapOf_WarmUp/reads=100%-32        0.75ns ± 0%
IntegerMapOf_WarmUp/reads=100%-64        0.38ns ± 0%
IntegerMapStandard_WarmUp/reads=100%     55.3ns ± 0%
IntegerMapStandard_WarmUp/reads=100%-2   27.6ns ± 0%
IntegerMapStandard_WarmUp/reads=100%-4   16.1ns ± 3%
IntegerMapStandard_WarmUp/reads=100%-8   8.35ns ± 7%
IntegerMapStandard_WarmUp/reads=100%-16  4.24ns ± 7%
IntegerMapStandard_WarmUp/reads=100%-32  2.18ns ± 6%
IntegerMapStandard_WarmUp/reads=100%-64  1.11ns ± 3%
```

1,000 `[int, int]` entries with a warm-up, 99% Loads, 0.5% Stores, 0.5% Deletes:
```
IntegerMapOf_WarmUp/reads=99%            31.0ns ± 0%
IntegerMapOf_WarmUp/reads=99%-2          16.4ns ± 1%
IntegerMapOf_WarmUp/reads=99%-4          8.42ns ± 0%
IntegerMapOf_WarmUp/reads=99%-8          4.41ns ± 0%
IntegerMapOf_WarmUp/reads=99%-16         2.38ns ± 2%
IntegerMapOf_WarmUp/reads=99%-32         1.37ns ± 4%
IntegerMapOf_WarmUp/reads=99%-64         0.85ns ± 2%
IntegerMapStandard_WarmUp/reads=99%      121ns ± 1%
IntegerMapStandard_WarmUp/reads=99%-2    109ns ± 3%
IntegerMapStandard_WarmUp/reads=99%-4    115ns ± 4%
IntegerMapStandard_WarmUp/reads=99%-8    114ns ± 2%
IntegerMapStandard_WarmUp/reads=99%-16   105ns ± 2%
IntegerMapStandard_WarmUp/reads=99%-32   97.0ns ± 3%
IntegerMapStandard_WarmUp/reads=99%-64   98.0ns ± 2%
```

1,000 `[int, int]` entries with a warm-up, 75% Loads, 12.5% Stores, 12.5% Deletes:
```
IntegerMapOf_WarmUp/reads=75%-reads          46.2ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-2        36.7ns ± 2%
IntegerMapOf_WarmUp/reads=75%-reads-4        22.0ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-8        12.8ns ± 2%
IntegerMapOf_WarmUp/reads=75%-reads-16       7.69ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-32       5.16ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-64       4.91ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads    156ns ± 0%
IntegerMapStandard_WarmUp/reads=75%-reads-2  177ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-4  197ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-8  221ns ± 2%
IntegerMapStandard_WarmUp/reads=75%-reads-16 242ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-32 258ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-64 264ns ± 1%
```
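For reference, `MapOf` exposes a typed, `sync.Map`-like API. A minimal sketch of the warmed-up scenario used above could look as follows (illustrative setup only, not the actual benchmark harness):

```go
package main

import (
	"fmt"

	"github.com/puzpuzpuz/xsync/v3"
)

func main() {
	// A typed concurrent map; no interface{} boxing as with sync.Map.
	m := xsync.NewMapOf[int, int]()

	// Warm-up: pre-populate 1,000 [int, int] entries, as in the
	// benchmarks above.
	for i := 0; i < 1000; i++ {
		m.Store(i, i)
	}

	// The 100% Loads workload then issues concurrent Loads.
	if v, ok := m.Load(42); ok {
		fmt.Println(v) // 42
	}

	// The mixed workloads sprinkle in occasional Stores and Deletes.
	m.Store(42, -1)
	m.Delete(7)
}
```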
### MPMCQueue vs. Go channels

Concurrent producers and consumers (1:1), queue/channel size 1,000, with some work done by both producers and consumers:
```
QueueProdConsWork100     252ns ± 0%
QueueProdConsWork100-2   206ns ± 5%
QueueProdConsWork100-4   136ns ±12%
QueueProdConsWork100-8   110ns ± 6%
QueueProdConsWork100-16  108ns ± 2%
QueueProdConsWork100-32  102ns ± 2%
QueueProdConsWork100-64  101ns ± 0%
ChanProdConsWork100      283ns ± 0%
ChanProdConsWork100-2    406ns ±21%
ChanProdConsWork100-4    549ns ± 7%
ChanProdConsWork100-8    754ns ± 7%
ChanProdConsWork100-16   828ns ± 7%
ChanProdConsWork100-32   810ns ± 8%
ChanProdConsWork100-64   832ns ± 4%
```

### RBMutex vs. sync.RWMutex

The writer takes the lock on every 100,000th iteration, with some work in the critical section for both readers and the writer:
```
RBMutexWorkWrite100000     146ns ± 0%
RBMutexWorkWrite100000-2   73.3ns ± 0%
RBMutexWorkWrite100000-4   36.7ns ± 0%
RBMutexWorkWrite100000-8   18.6ns ± 0%
RBMutexWorkWrite100000-16  9.83ns ± 3%
RBMutexWorkWrite100000-32  5.53ns ± 0%
RBMutexWorkWrite100000-64  4.04ns ± 3%
RWMutexWorkWrite100000     121ns ± 0%
RWMutexWorkWrite100000-2   128ns ± 1%
RWMutexWorkWrite100000-4   124ns ± 2%
RWMutexWorkWrite100000-8   101ns ± 1%
RWMutexWorkWrite100000-16  92.9ns ± 1%
RWMutexWorkWrite100000-32  89.9ns ± 1%
RWMutexWorkWrite100000-64  88.4ns ± 1%
```
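For context, `RBMutex` is a reader-biased reader/writer lock. Unlike `sync.RWMutex`, its reader lock returns a token that must be handed back on unlock. A minimal usage sketch (assuming the v3 API):

```go
package main

import (
	"github.com/puzpuzpuz/xsync/v3"
)

var counter int

func main() {
	mu := xsync.NewRBMutex()

	// Read path: RLock returns a token that must be passed back
	// to RUnlock; the token is what lets readers avoid contending
	// on a single shared reader count.
	tk := mu.RLock()
	_ = counter
	mu.RUnlock(tk)

	// Write path: same shape as sync.RWMutex, but more expensive,
	// which is why RBMutex pays off for read-dominated workloads
	// like the one benchmarked above.
	mu.Lock()
	counter++
	mu.Unlock()
}
```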