# arenaskl

A fast, lock-free, arena-based skiplist implementation in Go that supports iteration
in both directions.

## Advantages

Arenaskl offers several advantages over other skiplist implementations:

* High performance that scales linearly with the number of cores. This is
  achieved by allocating from a fixed-size arena and by avoiding locks.
* Iterators that can be allocated on the stack and easily cloned by value.
* A simple, low-overhead model for detecting and handling race conditions
  with other threads.
* Support for iterating in reverse (i.e. previous links).

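To make the first advantage concrete, here is a minimal sketch of lock-free
allocation from a fixed-size arena. It is not this package's code: the `arena`
type and the `newArena`, `alloc`, and `bytes` names are hypothetical, and the
sketch assumes that a single atomic add on an offset is enough to reserve space.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// errArenaFull is returned once the fixed-size buffer is exhausted.
var errArenaFull = errors.New("arena: out of space")

// arena is a hypothetical fixed-size bump allocator (an illustration only,
// not the arenaskl type). Space is reserved with one atomic add, so
// concurrent callers never take a lock.
type arena struct {
	n   uint64 // bytes requested so far; can exceed len(buf) once full
	buf []byte
}

func newArena(size uint32) *arena {
	return &arena{buf: make([]byte, size)}
}

// alloc reserves size bytes and returns the offset of the reserved region.
// Space is never handed back, which is why the arena size is a hard upper
// bound on everything ever stored in it.
func (a *arena) alloc(size uint32) (uint32, error) {
	end := atomic.AddUint64(&a.n, uint64(size))
	if end > uint64(len(a.buf)) {
		return 0, errArenaFull
	}
	return uint32(end) - size, nil
}

// bytes returns the region [off, off+size) of the arena.
func (a *arena) bytes(off, size uint32) []byte {
	return a.buf[off : off+size : off+size]
}
```

Because each caller receives a disjoint region, goroutines can fill in their
own allocations without further coordination. Linking a new node into a list
would still need atomic operations, which this sketch leaves out.
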
## Limitations

The advantages come at a cost that prevents arenaskl from being a general-purpose
skiplist implementation:

* The size of the arena sets a hard upper bound on the combined size of skiplist
  nodes, keys, and values. This limit includes even the size of deleted nodes,
  keys, and values.
* Deletion is not supported. Instead, higher-level code is expected to
  add deletion tombstones and to process those tombstones
  appropriately.

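Since entries can only be added, a delete has to be recorded as one more entry.
The sketch below shows one way higher-level code might do that, assuming each
entry carries a kind marker. The `kv` and `entry` types, the `kindSet` and
`kindDelete` markers, and the append-only slice standing in for the skiplist
are all hypothetical.

```go
package main

// kind marks an entry as a live value or as a deletion tombstone.
type kind uint8

const (
	kindSet kind = iota
	kindDelete
)

type entry struct {
	key   string
	value string
	kind  kind
}

// kv is a hypothetical layer above an insert-only structure. The slice
// stands in for the skiplist: entries are appended and never removed.
type kv struct {
	entries []entry
}

// Set appends a live entry for key.
func (s *kv) Set(key, value string) {
	s.entries = append(s.entries, entry{key: key, value: value, kind: kindSet})
}

// Delete removes nothing; it appends a tombstone that shadows older entries.
func (s *kv) Delete(key string) {
	s.entries = append(s.entries, entry{key: key, kind: kindDelete})
}

// Get returns the newest entry for key and treats a tombstone as "not found".
func (s *kv) Get(key string) (string, bool) {
	for i := len(s.entries) - 1; i >= 0; i-- {
		e := s.entries[i]
		if e.key != key {
			continue
		}
		if e.kind == kindDelete {
			return "", false
		}
		return e.value, true
	}
	return "", false
}
```

Note that the tombstone itself occupies space, which matches the first
limitation: deleted data still counts against the arena size.
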
## Pedigree

This code is based on Andy Kimball's arenaskl code:

https://github.com/andy-kimball/arenaskl

The arenaskl code is based on the skiplist found in Badger, a Go-based
KV store:

https://github.com/dgraph-io/badger/tree/master/skl

The skiplist in Badger is itself based on a C++ skiplist built for
Facebook's RocksDB:

https://github.com/facebook/rocksdb/tree/master/memtable

## Benchmarks

The benchmarks consist of a mix of reads and writes executed in parallel. The
fraction of reads is indicated in the run name: "frac_X" indicates a run where
X percent of the operations are reads.

The results are much better than those of `skiplist` and `slist`.

```
name                  time/op
ReadWrite/frac_0-8     470ns ±11%
ReadWrite/frac_10-8    462ns ± 3%
ReadWrite/frac_20-8    436ns ± 2%
ReadWrite/frac_30-8    410ns ± 2%
ReadWrite/frac_40-8    385ns ± 2%
ReadWrite/frac_50-8    360ns ± 4%
ReadWrite/frac_60-8    386ns ± 1%
ReadWrite/frac_70-8    352ns ± 2%
ReadWrite/frac_80-8    306ns ± 3%
ReadWrite/frac_90-8    253ns ± 4%
ReadWrite/frac_100-8  28.1ns ± 2%
```

Note that the numbers above are for concurrent operations using 8x
parallelism (the `-8` suffix in the run names). The same benchmarks without
concurrency are shown below; use these numbers when comparing against batchskl:

```
name                time/op
ReadWrite/frac_0    1.53µs ± 1%
ReadWrite/frac_10   1.46µs ± 2%
ReadWrite/frac_20   1.39µs ± 3%
ReadWrite/frac_30   1.28µs ± 3%
ReadWrite/frac_40   1.21µs ± 2%
ReadWrite/frac_50   1.11µs ± 3%
ReadWrite/frac_60   1.23µs ±17%
ReadWrite/frac_70   1.16µs ± 4%
ReadWrite/frac_80    959ns ± 3%
ReadWrite/frac_90    738ns ± 5%
ReadWrite/frac_100  81.9ns ± 2%
```

Forward and backward iteration are also fast:

```
name                time/op
IterNext            3.97ns ± 5%
IterPrev            3.88ns ± 3%
```