[![Go Reference](https://pkg.go.dev/badge/github.com/elastic/go-freelru.svg)](https://pkg.go.dev/github.com/elastic/go-freelru)
[![Go Report Card](https://goreportcard.com/badge/github.com/elastic/go-freelru)](https://goreportcard.com/report/github.com/elastic/go-freelru)
[![Coverage Status](https://coveralls.io/repos/github/elastic/go-freelru/badge.svg?branch=main)](https://coveralls.io/github/elastic/go-freelru?branch=main)
[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)

# FreeLRU - A GC-less, fast and generic LRU hashmap library for Go

FreeLRU allows you to cache objects without introducing GC overhead.
It uses Go generics for simplicity, type-safety and performance over interface types.
It performs better than other LRU implementations in the Go benchmarks provided.
The API is simple in order to ease migrations from other LRU implementations.
The function to calculate hashes from the keys needs to be provided by the caller.

## `LRU`: Single-threaded LRU hashmap

`LRU` is a single-threaded LRU hashmap implementation.
It uses a fast exact LRU algorithm and has no locking overhead.
It has been developed for low GC overhead and type-safety.
For thread-safety, pick one of `SyncedLRU` or `ShardedLRU`, or do the locking yourself.

### Comparison with other single-threaded LRU implementations

Get (key and value are both of type `int`)
```
BenchmarkFreeLRUGet         73456962    15.17 ns/op    0 B/op    0 allocs/op
BenchmarkSimpleLRUGet       91878808    12.09 ns/op    0 B/op    0 allocs/op
BenchmarkMapGet            173823274    6.884 ns/op    0 B/op    0 allocs/op
```
Add
```
BenchmarkFreeLRUAdd_int_int            39446706    30.04 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_int_int128         39622722    29.71 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_uint32_uint64      43750496    26.97 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_string_uint64      25839464    39.31 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_int_string         37269870    30.55 ns/op    0 B/op     0 allocs/op

BenchmarkSimpleLRUAdd_int_int          12471030    86.33 ns/op    48 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_int_int128       11981545    85.70 ns/op    48 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_uint32_uint64    11506755    87.52 ns/op    48 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_string_uint64     8674652    142.8 ns/op    49 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_int_string       12267968    87.77 ns/op    48 B/op    1 allocs/op

BenchmarkMapAdd_int_int                34951609    48.08 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_int_int128             31082216    47.05 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_uint32_uint64          36277005    48.08 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_string_uint64          29380040    49.37 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_int_string             30325861    47.35 ns/op    0 B/op     0 allocs/op
```

The comparison with Map is just for reference - Go maps don't implement LRU functionality and thus should
be significantly faster than LRU implementations.

## `SyncedLRU`: Concurrent LRU hashmap for low concurrency

`SyncedLRU` is a concurrency-safe LRU hashmap implementation wrapped around `LRU`.
It is best used in low-concurrency environments where lock contention isn't a thing to worry about.
It uses an exact LRU algorithm.

## `ShardedLRU`: Concurrent LRU hashmap for high concurrency

`ShardedLRU` is a sharded, concurrency-safe LRU hashmap implementation.
It is best used in high-concurrency environments where lock contention is a thing.
Due to the sharded nature, it uses an approximate LRU algorithm.

The plain `LRU` is for single-threaded use only.
For thread-safety, either use `SyncedLRU` or `ShardedLRU`, or control the locking of operations yourself.

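Creating the thread-safe variants looks much like creating a plain `LRU` (see the full example further below). The following is a minimal sketch that assumes the `NewSynced` and `NewSharded` constructors mirror the `New` signature; please check the package documentation for the exact API.

```go
package main

import (
	"fmt"

	"github.com/cespare/xxhash/v2"

	"github.com/elastic/go-freelru"
)

func hashStringXXHASH(s string) uint32 {
	return uint32(xxhash.Sum64String(s))
}

func main() {
	// SyncedLRU: a single lock around an exact LRU - fine for low concurrency.
	synced, err := freelru.NewSynced[string, uint64](8192, hashStringXXHASH)
	if err != nil {
		panic(err)
	}
	synced.Add("foo", 1)

	// ShardedLRU: multiple independently locked shards - scales under high concurrency.
	sharded, err := freelru.NewSharded[string, uint64](8192, hashStringXXHASH)
	if err != nil {
		panic(err)
	}
	sharded.Add("bar", 2)

	if v, ok := sharded.Get("bar"); ok {
		fmt.Println("bar =", v)
	}
}
```
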
### Comparison with other multithreaded LRU implementations

Add with `GOMAXPROCS=1`
```
BenchmarkParallelSyncedFreeLRUAdd_int_int128     42022706    28.27 ns/op    0 B/op      0 allocs/op
BenchmarkParallelShardedFreeLRUAdd_int_int128    35353412    33.33 ns/op    0 B/op      0 allocs/op
BenchmarkParallelFreeCacheAdd_int_int128         14825518    79.58 ns/op    0 B/op      0 allocs/op
BenchmarkParallelRistrettoAdd_int_int128          5565997    206.1 ns/op    121 B/op    3 allocs/op
BenchmarkParallelPhusluAdd_int_int128            28041186    41.26 ns/op    0 B/op      0 allocs/op
BenchmarkParallelCloudflareAdd_int_int128         6300747    185.0 ns/op    48 B/op     2 allocs/op
```
Add with `GOMAXPROCS=1000`
```
BenchmarkParallelSyncedFreeLRUAdd_int_int128-1000      12251070    138.9 ns/op    0 B/op      0 allocs/op
BenchmarkParallelShardedFreeLRUAdd_int_int128-1000    112706306    10.59 ns/op    0 B/op      0 allocs/op
BenchmarkParallelFreeCacheAdd_int_int128-1000          47873679    24.14 ns/op    0 B/op      0 allocs/op
BenchmarkParallelRistrettoAdd_int_int128-1000          69838436    16.93 ns/op    104 B/op    3 allocs/op
BenchmarkParallelOracamanMapAdd_int_int128-1000        25694386    40.48 ns/op    37 B/op     0 allocs/op
BenchmarkParallelPhusluAdd_int_int128-1000             89379122    14.19 ns/op    0 B/op      0 allocs/op
```
`Ristretto` offloads the LRU functionality of `Add()` to a separate goroutine, which is why it is relatively fast.
But the separate goroutine doesn't show up in the benchmarks, so the numbers are not directly comparable.

`Oracaman` is not an LRU implementation, just a thread-safety wrapper around `map`.

Get with `GOMAXPROCS=1`
```
BenchmarkParallelSyncedGet        43031780    27.35 ns/op    0 B/op     0 allocs/op
BenchmarkParallelShardedGet       51807500    22.86 ns/op    0 B/op     0 allocs/op
BenchmarkParallelFreeCacheGet     21948183    53.52 ns/op    16 B/op    1 allocs/op
BenchmarkParallelRistrettoGet     30343872    33.82 ns/op    7 B/op     0 allocs/op
BenchmarkParallelBigCacheGet      21073627    51.08 ns/op    16 B/op    2 allocs/op
BenchmarkParallelPhusluGet        59487482    20.02 ns/op    0 B/op     0 allocs/op
BenchmarkParallelCloudflareGet    17011405    67.11 ns/op    8 B/op     1 allocs/op
```
Get with `GOMAXPROCS=1000`
```
BenchmarkParallelSyncedGet-1000         10867552    151.0 ns/op    0 B/op     0 allocs/op
BenchmarkParallelShardedGet-1000       287238988    4.061 ns/op    0 B/op     0 allocs/op
BenchmarkParallelFreeCacheGet-1000      78045916    15.33 ns/op    16 B/op    1 allocs/op
BenchmarkParallelRistrettoGet-1000     214839645    6.060 ns/op    7 B/op     0 allocs/op
BenchmarkParallelBigCacheGet-1000      163672804    7.282 ns/op    16 B/op    2 allocs/op
BenchmarkParallelPhusluGet-1000        200133655    6.039 ns/op    0 B/op     0 allocs/op
BenchmarkParallelCloudflareGet-1000    100000000    11.26 ns/op    8 B/op     1 allocs/op
```
`Cloudflare` and `BigCache` only accept `string` as the key type.
So the ser/deser of `int` to `string` is part of the benchmarks for a fair comparison.

Here you can see that `SyncedLRU` badly suffers from lock contention.
`ShardedLRU` is ~37x faster than `SyncedLRU` in the high-concurrency case, and the next-fastest LRU implementations (`Ristretto` and `Phuslu`) are ~50% slower than `ShardedLRU`.

### Merging hashmap and ringbuffer

Most LRU implementations combine Go's `map` for the key/value lookup with their own implementation of
a circular doubly-linked list for keeping track of the recent-ness of objects.
This requires one additional heap allocation for the list element.
A second downside is that the list elements are not contiguous in memory, which causes more (expensive) CPU cache misses on access.

FreeLRU addresses both issues by merging hashmap and ringbuffer into a contiguous array of elements.
Each element contains key, value and two indices to keep the cached objects ordered by recent-ness.

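To illustrate the idea, a simplified sketch of such a merged layout could look as follows. This is not FreeLRU's actual internal type, just an illustration of how key, value and the two list indices can live together in one contiguous array.

```go
// Package sketch illustrates a merged hashmap/ringbuffer layout.
// Simplified illustration only - not FreeLRU's real implementation.
package sketch

// element is one slot of the contiguous backing array.
type element[K comparable, V any] struct {
	key   K
	value V

	// next and prev are indices into the same array, chaining the slots
	// into a circular doubly-linked list ordered from most to least
	// recently used - no per-entry heap allocation, good cache locality.
	next uint32
	prev uint32
}

// lru owns a single backing allocation that serves as both the hash
// table and the LRU ring.
type lru[K comparable, V any] struct {
	elements []element[K, V]
	head     uint32 // index of the most recently used element
}
```

A single slice allocation holds all entries, which is also what keeps the GC work low, as described next.
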
### Avoiding GC overhead

The contiguous array of elements is allocated at cache creation time.
So there is only a single memory object instead of possibly millions that the GC needs to iterate during
a garbage collection phase.
The GC overhead can be quite large in comparison with the overall CPU usage of an application.
Especially long-running and low-CPU applications with lots of cached objects suffer from the GC overhead.

### Type safety by using generics

Using generics allows type-checking at compile time, so type conversions are not needed at runtime.
Interface types such as `any` require type assertions at runtime, which may fail.

### Reducing memory allocations by using generics

Interface types (such as `any`) are pointer types and thus require a heap allocation when being stored.
This is true even if you just need an integer-to-integer lookup or translation.

With generics, the two allocations for key and value can be avoided: as long as the key and value types do not contain
pointer types, no allocations will take place when adding such objects to the cache.

### Overcommitting of hashtable memory

Each hashtable implementation tries to avoid hash collisions because collisions are expensive.
FreeLRU allows allocating more elements than the maximum number of elements stored.
This value is configurable and can be increased to reduce the likelihood of collisions.
The LRU operations generally become faster by doing so.
Setting the size of the LRU to a power of two (2^N) is recognized and allows the slow division operations to be replaced by fast bitwise AND operations.

## Benchmarks

Below we compare FreeLRU with SimpleLRU, FreeCache and Go map.
The comparison with FreeCache is just for reference - it is thread-safe and comes with a mutex/locking overhead.
The comparison with Go map is also just for reference - Go maps don't implement LRU functionality and thus should
be significantly faster than FreeLRU. It turns out the opposite is the case.

The numbers are from my laptop (Intel(R) Core(TM) i7-12800H @ 2800 MHz).

The key and value types are part of the benchmark name, e.g. `int_int` means key and value are of type `int`.
`int128` is a struct type made of two `uint64` fields.

To run the benchmarks:
```
make benchmarks
```

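The results below are reported with 0% (default) and 100% memory overcommitment. As a rough sketch of how the overcommitment described above might be configured - assuming a `NewWithSize(capacity, size, hash)` constructor where the backing `size` may exceed the `capacity` (please check the package documentation for the exact API):

```go
package main

import (
	"github.com/cespare/xxhash/v2"

	"github.com/elastic/go-freelru"
)

func hashStringXXHASH(s string) uint32 {
	return uint32(xxhash.Sum64String(s))
}

func main() {
	// Hypothetical sketch: cache at most 8192 entries, but back them with
	// 16384 slots (100% overcommitment) to reduce hash collisions.
	// 16384 is a power of two, so slot selection can use a bitwise AND
	// instead of a division.
	lru, err := freelru.NewWithSize[string, uint64](8192, 16384, hashStringXXHASH)
	if err != nil {
		panic(err)
	}
	lru.Add("key", 42)
}
```
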
### Adding objects

FreeLRU is ~3.5x faster than SimpleLRU, no surprise.
But it is also significantly faster than Go maps, which is a bit of a surprise.

This is with 0% memory overcommitment (default) and a capacity of 8192.
```
BenchmarkFreeLRUAdd_int_int-20            43097347    27.41 ns/op    0 B/op      0 allocs/op
BenchmarkFreeLRUAdd_int_int128-20         42129165    28.38 ns/op    0 B/op      0 allocs/op
BenchmarkFreeLRUAdd_uint32_uint64-20      98322132    11.74 ns/op    0 B/op      0 allocs/op (*)
BenchmarkFreeLRUAdd_string_uint64-20      39122446    31.12 ns/op    0 B/op      0 allocs/op
BenchmarkFreeLRUAdd_int_string-20         81920673    14.00 ns/op    0 B/op      0 allocs/op (*)
BenchmarkSimpleLRUAdd_int_int-20          12253708    93.85 ns/op    48 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_int_int128-20       12095150    94.26 ns/op    48 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_uint32_uint64-20    12367568    92.60 ns/op    48 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_string_uint64-20    10395525    119.0 ns/op    49 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_int_string-20       12373900    94.40 ns/op    48 B/op     1 allocs/op
BenchmarkFreeCacheAdd_int_int-20           9691870    122.9 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_int_int128-20        9240273    125.6 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_uint32_uint64-20     8140896    132.1 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_string_uint64-20     8248917    137.9 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_int_string-20        8079253    145.0 ns/op    64 B/op     1 allocs/op
BenchmarkRistrettoAdd_int_int-20          11102623    100.1 ns/op    109 B/op    2 allocs/op
BenchmarkRistrettoAdd_int128_int-20       10317686    113.5 ns/op    129 B/op    4 allocs/op
BenchmarkRistrettoAdd_uint32_uint64-20    12892147    94.28 ns/op    104 B/op    2 allocs/op
BenchmarkRistrettoAdd_string_uint64-20    11266416    105.8 ns/op    122 B/op    3 allocs/op
BenchmarkRistrettoAdd_int_string-20       10360814    107.4 ns/op    129 B/op    4 allocs/op
BenchmarkMapAdd_int_int-20                35306983    46.29 ns/op    0 B/op      0 allocs/op
BenchmarkMapAdd_int_int128-20             30986126    45.16 ns/op    0 B/op      0 allocs/op
BenchmarkMapAdd_string_uint64-20          28406497    49.35 ns/op    0 B/op      0 allocs/op
```
(*)
There is an interesting effect when using increasing numbers (0..N) as keys in combination with FNV1a():
the number of collisions is strongly reduced, thus the high performance.
Exchanging the sequential numbers for random numbers results in roughly the same performance as the other results.

Just to give you an idea of the effect of 100% memory overcommitment:
performance increased by ~20%.
```
BenchmarkFreeLRUAdd_int_int-20           53473030    21.52 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_int_int128-20        52852280    22.10 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_uint32_uint64-20    100000000    10.15 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_string_uint64-20     49477594    24.55 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_int_string-20        85288306    12.10 ns/op    0 B/op    0 allocs/op
```

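For reference, the FNV1a hashing of integer keys mentioned in the (*) note above can be sketched as below. The benchmark suite's actual hash functions live in `bench/hash.go` and may differ in detail.

```go
package main

import "fmt"

// fnv1aInt is a sketch of a 32-bit FNV1a hash over the bytes of an int key.
func fnv1aInt(key int) uint32 {
	const (
		offset32 = 2166136261 // FNV1a 32-bit offset basis
		prime32  = 16777619   // FNV1a 32-bit prime
	)
	h := uint32(offset32)
	for n := 0; n < 8; n++ { // process the 8 bytes of a 64-bit int
		h ^= uint32(key & 0xff)
		h *= prime32
		key >>= 8
	}
	return h
}

func main() {
	fmt.Println(fnv1aInt(42))
}
```
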
### Getting objects

This is with 0% memory overcommitment (default) and a capacity of 8192.
```
BenchmarkFreeLRUGet-20       83158561    13.80 ns/op    0 B/op     0 allocs/op
BenchmarkSimpleLRUGet-20    146248706    8.199 ns/op    0 B/op     0 allocs/op
BenchmarkFreeCacheGet-20     58229779    19.56 ns/op    0 B/op     0 allocs/op
BenchmarkRistrettoGet-20     31157457    35.37 ns/op    10 B/op    1 allocs/op
BenchmarkPhusluGet-20        55071919    20.63 ns/op    0 B/op     0 allocs/op
BenchmarkMapGet-20          195464706    6.031 ns/op    0 B/op     0 allocs/op
```

## Example usage

```go
package main

import (
	"fmt"

	"github.com/cespare/xxhash/v2"

	"github.com/elastic/go-freelru"
)

// More hash functions in https://github.com/elastic/go-freelru/blob/main/bench/hash.go
func hashStringXXHASH(s string) uint32 {
	return uint32(xxhash.Sum64String(s))
}

func main() {
	lru, err := freelru.New[string, uint64](8192, hashStringXXHASH)
	if err != nil {
		panic(err)
	}

	key := "go-freelru"
	val := uint64(999)
	lru.Add(key, val)

	if v, ok := lru.Get(key); ok {
		fmt.Printf("found %v=%v\n", key, v)
	}

	// Output:
	// found go-freelru=999
}
```

The hash function passed to the constructor (here `hashStringXXHASH`) is called to calculate a hash value of the key.
Please take a look into the `bench/` directory to find examples of hash functions.
There you will also find an amd64 version of the Go internal hash function, which uses the AESENC features
of the CPU.

In case you already have a hash that you want to use as the key, you have to provide an "identity" function.

## Comparison of hash functions

Hashing `int`
```
BenchmarkHashInt_MapHash-20         181521530    6.806 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_MapHasher-20       727805824    1.595 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_FNV1A-20           621439513    1.919 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_FNV1AUnsafe-20     706583145    1.699 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_AESENC-20         1000000000    0.9659 ns/op    0 B/op    0 allocs/op
BenchmarkHashInt_XXHASH-20          516779404    2.341 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_XXH3HASH-20        562645186    2.127 ns/op     0 B/op    0 allocs/op
```
Hashing `string`
```
BenchmarkHashString_MapHash-20       72106830    15.80 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_MapHasher-20    385338830    2.868 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_FNV1A-20         60162328    19.33 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_AESENC-20       475896514    2.472 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_XXHASH-20       185842404    6.476 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_XXH3HASH-20     375255375    3.182 ns/op    0 B/op    0 allocs/op
```
As you can see, the speed depends on the type of object being hashed. I think it mostly boils down to the size of the object.
`MapHasher` is dangerous to use because it is not guaranteed to be stable across Go versions.
`AESENC` uses the AES CPU extensions on x86-64. In theory, it should work on ARM64 as well (not tested by me).

For a small number of bytes, `FNV1A` is the fastest.
Otherwise, `XXH3` looks like a good choice.

## License

The code is licensed under the Apache 2.0 license. See the `LICENSE` file for details.