# xsync benchmarks

If you're interested in how `MapOf` compares with some of the popular concurrent hash maps written in Go, see [this PR](https://github.com/cornelk/hashmap/pull/70) and [this PR](https://github.com/alphadose/haxmap/pull/22).

The results below were obtained for xsync v2.3.1 on a c6g.metal EC2 instance (64 CPU, 128GB RAM) running Linux and Go 1.19.3. I'd like to thank [@felixge](https://github.com/felixge), who kindly ran the benchmarks.

The following commands were used to run the benchmarks:
```bash
$ go test -run='^$' -cpu=1,2,4,8,16,32,64 -bench . -count=30 -timeout=0 | tee bench.txt
$ benchstat bench.txt | tee benchstat.txt
```

The sections below contain some of the results. Refer to [this gist](https://gist.github.com/puzpuzpuz/e62e38e06feadecfdc823c0f941ece0b) for the complete output.

### Counter vs. atomic int64

```
name                                            time/op
Counter                                         27.3ns ± 1%
Counter-2                                       27.2ns ±11%
Counter-4                                       15.3ns ± 8%
Counter-8                                       7.43ns ± 7%
Counter-16                                      3.70ns ±10%
Counter-32                                      1.77ns ± 3%
Counter-64                                      0.96ns ±10%
AtomicInt64                                     7.60ns ± 0%
AtomicInt64-2                                   12.6ns ±13%
AtomicInt64-4                                   13.5ns ±14%
AtomicInt64-8                                   12.7ns ± 9%
AtomicInt64-16                                  12.8ns ± 8%
AtomicInt64-32                                  13.0ns ± 6%
AtomicInt64-64                                  12.9ns ± 7%
```

Here `time/op` stands for the average time spent per operation. Dividing `10^9` by the value in nanoseconds per operation gives the throughput in operations per second; for instance, the 0.96ns result for `Counter-64` corresponds to roughly a billion operations per second. Ideal scalability of a concurrent data structure means that the reported `time/op` decreases proportionally as the number of CPU cores grows. Conversely, if the measured time per operation increases when the benchmark runs on more cores, performance is degrading rather than scaling.
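
For reference, the comparison above boils down to incrementing a striped `xsync.Counter` versus a single shared integer updated via `sync/atomic`. Below is a minimal, illustrative sketch of such a benchmark pair; it is not the repository's actual benchmark code, and the v3 import path and benchmark names are assumptions (the numbers above were collected with v2.3.1):

```go
package bench

import (
	"sync/atomic"
	"testing"

	"github.com/puzpuzpuz/xsync/v3"
)

// BenchmarkCounter increments a striped xsync.Counter from all parallel
// goroutines; contention is spread across the counter's internal stripes.
func BenchmarkCounter(b *testing.B) {
	c := xsync.NewCounter()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.Inc()
		}
	})
	_ = c.Value() // Value() sums the internal stripes into a single total.
}

// BenchmarkAtomicInt64 increments a single shared int64; all goroutines
// contend on the same memory location (and cache line).
func BenchmarkAtomicInt64(b *testing.B) {
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			atomic.AddInt64(&n, 1)
		}
	})
}
```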

### MapOf vs. sync.Map

1,000 `[int, int]` entries with a warm-up, 100% Loads:
```
IntegerMapOf_WarmUp/reads=100%                  24.0ns ± 0%
IntegerMapOf_WarmUp/reads=100%-2                12.0ns ± 0%
IntegerMapOf_WarmUp/reads=100%-4                6.02ns ± 0%
IntegerMapOf_WarmUp/reads=100%-8                3.01ns ± 0%
IntegerMapOf_WarmUp/reads=100%-16               1.50ns ± 0%
IntegerMapOf_WarmUp/reads=100%-32               0.75ns ± 0%
IntegerMapOf_WarmUp/reads=100%-64               0.38ns ± 0%
IntegerMapStandard_WarmUp/reads=100%            55.3ns ± 0%
IntegerMapStandard_WarmUp/reads=100%-2          27.6ns ± 0%
IntegerMapStandard_WarmUp/reads=100%-4          16.1ns ± 3%
IntegerMapStandard_WarmUp/reads=100%-8          8.35ns ± 7%
IntegerMapStandard_WarmUp/reads=100%-16         4.24ns ± 7%
IntegerMapStandard_WarmUp/reads=100%-32         2.18ns ± 6%
IntegerMapStandard_WarmUp/reads=100%-64         1.11ns ± 3%
```

1,000 `[int, int]` entries with a warm-up, 99% Loads, 0.5% Stores, 0.5% Deletes:
```
IntegerMapOf_WarmUp/reads=99%                   31.0ns ± 0%
IntegerMapOf_WarmUp/reads=99%-2                 16.4ns ± 1%
IntegerMapOf_WarmUp/reads=99%-4                 8.42ns ± 0%
IntegerMapOf_WarmUp/reads=99%-8                 4.41ns ± 0%
IntegerMapOf_WarmUp/reads=99%-16                2.38ns ± 2%
IntegerMapOf_WarmUp/reads=99%-32                1.37ns ± 4%
IntegerMapOf_WarmUp/reads=99%-64                0.85ns ± 2%
IntegerMapStandard_WarmUp/reads=99%              121ns ± 1%
IntegerMapStandard_WarmUp/reads=99%-2            109ns ± 3%
IntegerMapStandard_WarmUp/reads=99%-4            115ns ± 4%
IntegerMapStandard_WarmUp/reads=99%-8            114ns ± 2%
IntegerMapStandard_WarmUp/reads=99%-16           105ns ± 2%
IntegerMapStandard_WarmUp/reads=99%-32          97.0ns ± 3%
IntegerMapStandard_WarmUp/reads=99%-64          98.0ns ± 2%
```

1,000 `[int, int]` entries with a warm-up, 75% Loads, 12.5% Stores, 12.5% Deletes:
```
IntegerMapOf_WarmUp/reads=75%-reads             46.2ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-2           36.7ns ± 2%
IntegerMapOf_WarmUp/reads=75%-reads-4           22.0ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-8           12.8ns ± 2%
IntegerMapOf_WarmUp/reads=75%-reads-16          7.69ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-32          5.16ns ± 1%
IntegerMapOf_WarmUp/reads=75%-reads-64          4.91ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads        156ns ± 0%
IntegerMapStandard_WarmUp/reads=75%-reads-2      177ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-4      197ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-8      221ns ± 2%
IntegerMapStandard_WarmUp/reads=75%-reads-16     242ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-32     258ns ± 1%
IntegerMapStandard_WarmUp/reads=75%-reads-64     264ns ± 1%
```
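
To give a feel for how such a mixed workload can be driven, here is a minimal sketch of a benchmark over a pre-populated map where a configurable share of operations are Loads and the remainder is split between Stores and Deletes. It assumes the v3 generic `xsync.NewMapOf[K, V]` constructor (the v2.3.1 API that produced the numbers above differs slightly); the helper name, benchmark name, key range, and operation mix are illustrative, not the repository's actual benchmark code. A `sync.Map` counterpart would look the same with `Load`/`Store`/`Delete` called on a `sync.Map`.

```go
package bench

import (
	"math/rand"
	"testing"

	"github.com/puzpuzpuz/xsync/v3"
)

const numEntries = 1_000

// benchmarkMapOf drives a read-mostly workload: readPercent of the
// operations are Loads, and the rest is split evenly between Stores and
// Deletes, all over a small, pre-populated key range.
func benchmarkMapOf(b *testing.B, readPercent int) {
	m := xsync.NewMapOf[int, int]()
	// Warm-up: pre-populate the map so that most Loads hit an entry.
	for i := 0; i < numEntries; i++ {
		m.Store(i, i)
	}
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		r := rand.New(rand.NewSource(rand.Int63()))
		for pb.Next() {
			k := r.Intn(numEntries)
			if op := r.Intn(100); op < readPercent {
				m.Load(k)
			} else if r.Intn(2) == 0 {
				m.Store(k, k)
			} else {
				m.Delete(k)
			}
		}
	})
}

func BenchmarkMapOfWarmUpReads99(b *testing.B) { benchmarkMapOf(b, 99) }
```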

### MPMCQueue vs. Go channels

Concurrent producers and consumers (1:1), queue/channel size 1,000, some work done by both producers and consumers:
```
QueueProdConsWork100                             252ns ± 0%
QueueProdConsWork100-2                           206ns ± 5%
QueueProdConsWork100-4                           136ns ±12%
QueueProdConsWork100-8                           110ns ± 6%
QueueProdConsWork100-16                          108ns ± 2%
QueueProdConsWork100-32                          102ns ± 2%
QueueProdConsWork100-64                          101ns ± 0%
ChanProdConsWork100                              283ns ± 0%
ChanProdConsWork100-2                            406ns ±21%
ChanProdConsWork100-4                            549ns ± 7%
ChanProdConsWork100-8                            754ns ± 7%
ChanProdConsWork100-16                           828ns ± 7%
ChanProdConsWork100-32                           810ns ± 8%
ChanProdConsWork100-64                           832ns ± 4%
```
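
For context, here is a simplified sketch of the shape of this comparison, assuming the xsync MPMCQueue API (`NewMPMCQueue`, `Enqueue`, `Dequeue`). Each benchmark goroutine below enqueues an item, dequeues one, and burns a fixed amount of CPU around the queue operations; the real benchmark pairs dedicated producer and consumer goroutines, and the `doWork` helper is an illustrative stand-in for the "some work" part.

```go
package bench

import (
	"testing"

	"github.com/puzpuzpuz/xsync/v3"
)

// doWork simulates a fixed amount of CPU work done outside the queue.
func doWork(iterations int) int {
	sum := 0
	for i := 0; i < iterations; i++ {
		sum += i
	}
	return sum
}

func BenchmarkQueueProdConsWork100(b *testing.B) {
	q := xsync.NewMPMCQueue(1_000)
	b.RunParallel(func(pb *testing.PB) {
		sink := 0
		for pb.Next() {
			sink += doWork(100) // producer-side work
			q.Enqueue(1)
			_ = q.Dequeue()
			sink += doWork(100) // consumer-side work
		}
		_ = sink
	})
}

func BenchmarkChanProdConsWork100(b *testing.B) {
	ch := make(chan int, 1_000)
	b.RunParallel(func(pb *testing.PB) {
		sink := 0
		for pb.Next() {
			sink += doWork(100) // producer-side work
			ch <- 1
			sink += <-ch
			sink += doWork(100) // consumer-side work
		}
		_ = sink
	})
}
```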

### RBMutex vs. sync.RWMutex

The writer locks on every 100,000th iteration, with some work in the critical section for both readers and the writer:
```
RBMutexWorkWrite100000                           146ns ± 0%
RBMutexWorkWrite100000-2                        73.3ns ± 0%
RBMutexWorkWrite100000-4                        36.7ns ± 0%
RBMutexWorkWrite100000-8                        18.6ns ± 0%
RBMutexWorkWrite100000-16                       9.83ns ± 3%
RBMutexWorkWrite100000-32                       5.53ns ± 0%
RBMutexWorkWrite100000-64                       4.04ns ± 3%
RWMutexWorkWrite100000                           121ns ± 0%
RWMutexWorkWrite100000-2                         128ns ± 1%
RWMutexWorkWrite100000-4                         124ns ± 2%
RWMutexWorkWrite100000-8                         101ns ± 1%
RWMutexWorkWrite100000-16                       92.9ns ± 1%
RWMutexWorkWrite100000-32                       89.9ns ± 1%
RWMutexWorkWrite100000-64                       88.4ns ± 1%
```
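
To illustrate the access pattern being measured, here is a minimal sketch of read-mostly locking where a write lock is taken once every 100,000 iterations. It assumes the `xsync.NewRBMutex` API with its reader tokens (`RLock` returns a token that must be passed back to `RUnlock`); the `spin` helper, per-goroutine counter, and benchmark names are illustrative, not the repository's benchmark code.

```go
package bench

import (
	"sync"
	"testing"

	"github.com/puzpuzpuz/xsync/v3"
)

// spin simulates the work done inside the critical section.
func spin(iterations int) int {
	sum := 0
	for i := 0; i < iterations; i++ {
		sum += i
	}
	return sum
}

func BenchmarkRBMutexWorkWrite100000(b *testing.B) {
	mu := xsync.NewRBMutex()
	b.RunParallel(func(pb *testing.PB) {
		for i := 0; pb.Next(); i++ {
			if i%100_000 == 0 {
				mu.Lock()
				spin(100) // writer's critical section
				mu.Unlock()
			} else {
				tk := mu.RLock() // reader lock hands out a token
				spin(100)        // reader's critical section
				mu.RUnlock(tk)   // the token is returned on unlock
			}
		}
	})
}

func BenchmarkRWMutexWorkWrite100000(b *testing.B) {
	var mu sync.RWMutex
	b.RunParallel(func(pb *testing.PB) {
		for i := 0; pb.Next(); i++ {
			if i%100_000 == 0 {
				mu.Lock()
				spin(100)
				mu.Unlock()
			} else {
				mu.RLock()
				spin(100)
				mu.RUnlock()
			}
		}
	})
}
```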