github.com/matrixorigin/matrixone@v1.2.0/pkg/common/arenaskl/README.md

# arenaskl

Fast, lock-free, arena-based Skiplist implementation in Go that supports iteration
in both directions.

Since this is an internal library of Pebble, MatrixOne copied and modified it to
support additional features and to make it easier to use in MatrixOne.

## Advantages

Arenaskl offers several advantages over other skiplist implementations:

* High performance that scales linearly with the number of cores. This is
  achieved by allocating from a fixed-size arena and by avoiding locks.
* Iterators that can be allocated on the stack and easily cloned by value.
* Simple-to-use, low-overhead model for detecting and handling race conditions
  with other threads.
* Support for iterating in reverse (i.e. previous links).

## Limitations

The advantages come at a cost that prevents arenaskl from being a general-purpose
skiplist implementation:

* The size of the arena sets a hard upper bound on the combined size of skiplist
  nodes, keys, and values. This limit includes even the size of deleted nodes,
  keys, and values.
* Deletion is not supported. Instead, higher-level code is expected to
  add deletion tombstones and needs to process those tombstones
  appropriately.

## Pedigree

This code is based on Andy Kimball's arenaskl code:

https://github.com/andy-kimball/arenaskl

The arenaskl code is based on the skiplist found in Badger, a Go-based
KV store:

https://github.com/dgraph-io/badger/tree/master/skl

The skiplist in Badger is itself based on a C++ skiplist built for
Facebook's RocksDB:

https://github.com/facebook/rocksdb/tree/master/memtable

## Benchmarks

The benchmarks consist of a mix of reads and writes executed in parallel. The
fraction of reads is indicated in the run name: "frac_X" indicates a run where
X percent of the operations are reads.
The results are much better than `skiplist` and `slist`.

```
name                    time/op
ReadWrite/frac_0-8       470ns ±11%
ReadWrite/frac_10-8      462ns ± 3%
ReadWrite/frac_20-8      436ns ± 2%
ReadWrite/frac_30-8      410ns ± 2%
ReadWrite/frac_40-8      385ns ± 2%
ReadWrite/frac_50-8      360ns ± 4%
ReadWrite/frac_60-8      386ns ± 1%
ReadWrite/frac_70-8      352ns ± 2%
ReadWrite/frac_80-8      306ns ± 3%
ReadWrite/frac_90-8      253ns ± 4%
ReadWrite/frac_100-8    28.1ns ± 2%
```

Note that the above numbers are for concurrent operations using 8x
parallelism. The same benchmarks run without concurrency produce the
following results (use these numbers when comparing against batchskl):

```
name                  time/op
ReadWrite/frac_0      1.53µs ± 1%
ReadWrite/frac_10     1.46µs ± 2%
ReadWrite/frac_20     1.39µs ± 3%
ReadWrite/frac_30     1.28µs ± 3%
ReadWrite/frac_40     1.21µs ± 2%
ReadWrite/frac_50     1.11µs ± 3%
ReadWrite/frac_60     1.23µs ±17%
ReadWrite/frac_70     1.16µs ± 4%
ReadWrite/frac_80      959ns ± 3%
ReadWrite/frac_90      738ns ± 5%
ReadWrite/frac_100    81.9ns ± 2%
```

Forward and backward iteration are also fast:

```
name        time/op
IterNext    3.97ns ± 5%
IterPrev    3.88ns ± 3%
```