# arenaskl

Fast, lock-free, arena-based Skiplist implementation in Go that supports iteration
in both directions.

## Advantages

Arenaskl offers several advantages over other skiplist implementations:

* High performance that linearly scales with the number of cores. This is
  achieved by allocating from a fixed-size arena and by avoiding locks.
* Iterators that can be allocated on the stack and easily cloned by value.
* Simple-to-use and low overhead model for detecting and handling race conditions
  with other threads.
* Support for iterating in reverse (i.e. previous links).

## Limitations

The advantages come at a cost that prevents arenaskl from being a general-purpose
skiplist implementation:

* The size of the arena sets a hard upper bound on the combined size of skiplist
  nodes, keys, and values. This limit includes even the size of deleted nodes,
  keys, and values.
* Deletion is not supported. Instead, higher-level code is expected to
  add deletion tombstones and needs to process those tombstones
  appropriately.

## Pedigree

This code is based on Andy Kimball's arenaskl code:

https://github.com/andy-kimball/arenaskl

The arenaskl code is based on the skiplist found in Badger, a Go-based
KV store:

https://github.com/dgraph-io/badger/tree/master/skl

The skiplist in Badger is itself based on a C++ skiplist built for
Facebook's RocksDB:

https://github.com/facebook/rocksdb/tree/master/memtable

## Benchmarks

The benchmarks consist of a mix of reads and writes executed in parallel. The
fraction of reads is indicated in the run name: "frac_X" indicates a run where
X percent of the operations are reads.

The results are much better than `skiplist` and `slist`.

```
name                  time/op
ReadWrite/frac_0-8     470ns ±11%
ReadWrite/frac_10-8    462ns ± 3%
ReadWrite/frac_20-8    436ns ± 2%
ReadWrite/frac_30-8    410ns ± 2%
ReadWrite/frac_40-8    385ns ± 2%
ReadWrite/frac_50-8    360ns ± 4%
ReadWrite/frac_60-8    386ns ± 1%
ReadWrite/frac_70-8    352ns ± 2%
ReadWrite/frac_80-8    306ns ± 3%
ReadWrite/frac_90-8    253ns ± 4%
ReadWrite/frac_100-8  28.1ns ± 2%
```

Note that the above numbers are for concurrent operations using 8x
parallelism. The same benchmarks without concurrency (use these
numbers when comparing vs batchskl):

```
name                time/op
ReadWrite/frac_0    1.53µs ± 1%
ReadWrite/frac_10   1.46µs ± 2%
ReadWrite/frac_20   1.39µs ± 3%
ReadWrite/frac_30   1.28µs ± 3%
ReadWrite/frac_40   1.21µs ± 2%
ReadWrite/frac_50   1.11µs ± 3%
ReadWrite/frac_60   1.23µs ±17%
ReadWrite/frac_70   1.16µs ± 4%
ReadWrite/frac_80    959ns ± 3%
ReadWrite/frac_90    738ns ± 5%
ReadWrite/frac_100  81.9ns ± 2%
```

Forward and backward iteration are also fast:

```
name      time/op
IterNext  3.97ns ± 5%
IterPrev  3.88ns ± 3%
```
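
## Example: arena allocation sketch

The Advantages and Limitations sections attribute both the scalability and the hard size limit to allocating from a fixed-size arena without locks. The following is a minimal sketch of that idea, not the code in this package: `Arena`, `NewArena`, `Alloc`, `Bytes`, and `ErrArenaFull` are hypothetical names chosen for illustration. It assumes a bump allocator whose only shared state is a single atomic counter.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// ErrArenaFull is a hypothetical error returned once the fixed-size buffer
// cannot satisfy an allocation.
var ErrArenaFull = errors.New("arena: allocation failed because arena is full")

// Arena is a hypothetical fixed-size bump allocator. The counter n is the
// only state shared between goroutines, and it is only touched atomically.
type Arena struct {
	n   uint64 // end offset of the most recent allocation; kept first for 64-bit alignment
	buf []byte // backing storage; never grows and is never reclaimed
}

// NewArena allocates the backing buffer once, up front.
func NewArena(size uint32) *Arena {
	// Start at 1 so that offset 0 can stand in for a nil reference.
	return &Arena{n: 1, buf: make([]byte, size)}
}

// Alloc reserves size bytes and returns the offset of the reserved range.
// The atomic add hands every concurrent caller a disjoint range, so no lock
// is required.
func (a *Arena) Alloc(size uint32) (uint32, error) {
	end := atomic.AddUint64(&a.n, uint64(size))
	if end > uint64(len(a.buf)) {
		// There is no free list: space is never returned to the arena,
		// which is why deleted entries still count toward the limit.
		return 0, ErrArenaFull
	}
	return uint32(end) - size, nil
}

// Bytes returns the slice of the arena covering [offset, offset+size).
// The capacity is capped so that appends cannot spill into a neighbor.
func (a *Arena) Bytes(offset, size uint32) []byte {
	return a.buf[offset : offset+size : offset+size]
}
```

In this sketch, a caller would request space with `Alloc`, copy a key or value into the range returned by `Bytes`, and treat `ErrArenaFull` as the signal that the structure has reached its size limit. Referring to data by offset rather than by pointer is a design choice of the sketch; it keeps everything inside one buffer, at the cost of the upper bound described under Limitations.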