[![Go Reference](https://pkg.go.dev/badge/github.com/elastic/go-freelru.svg)](https://pkg.go.dev/github.com/elastic/go-freelru)
[![Go Report Card](https://goreportcard.com/badge/github.com/elastic/go-freelru)](https://goreportcard.com/report/github.com/elastic/go-freelru)
[![Coverage Status](https://coveralls.io/repos/github/elastic/go-freelru/badge.svg?branch=main)](https://coveralls.io/github/elastic/go-freelru?branch=main)
[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)

# FreeLRU - A GC-less, fast and generic LRU hashmap library for Go

FreeLRU allows you to cache objects without introducing GC overhead.
It uses Go generics for simplicity, type-safety and performance over interface types.
It performs better than other LRU implementations in the Go benchmarks provided.
The API is simple in order to ease migrations from other LRU implementations.
The function to calculate hashes from the keys needs to be provided by the caller.

## `LRU`: Single-threaded LRU hashmap

`LRU` is a single-threaded LRU hashmap implementation.
It uses a fast exact LRU algorithm and has no locking overhead.
It has been developed for low GC overhead and type-safety.
For thread-safety, pick one of `SyncedLRU` or `ShardedLRU`, or do the locking yourself.

### Comparison with other single-threaded LRU implementations

Get (key and value are both of type `int`)
```
BenchmarkFreeLRUGet         73456962    15.17 ns/op    0 B/op    0 allocs/op
BenchmarkSimpleLRUGet       91878808    12.09 ns/op    0 B/op    0 allocs/op
BenchmarkMapGet            173823274    6.884 ns/op    0 B/op    0 allocs/op
```
Add
```
BenchmarkFreeLRUAdd_int_int            39446706    30.04 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_int_int128         39622722    29.71 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_uint32_uint64      43750496    26.97 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_string_uint64      25839464    39.31 ns/op    0 B/op     0 allocs/op
BenchmarkFreeLRUAdd_int_string         37269870    30.55 ns/op    0 B/op     0 allocs/op

BenchmarkSimpleLRUAdd_int_int          12471030    86.33 ns/op    48 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_int_int128       11981545    85.70 ns/op    48 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_uint32_uint64    11506755    87.52 ns/op    48 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_string_uint64     8674652    142.8 ns/op    49 B/op    1 allocs/op
BenchmarkSimpleLRUAdd_int_string       12267968    87.77 ns/op    48 B/op    1 allocs/op

BenchmarkMapAdd_int_int                34951609    48.08 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_int_int128             31082216    47.05 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_uint32_uint64          36277005    48.08 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_string_uint64          29380040    49.37 ns/op    0 B/op     0 allocs/op
BenchmarkMapAdd_int_string             30325861    47.35 ns/op    0 B/op     0 allocs/op
```

The comparison with Map is just for reference - Go maps don't implement LRU functionality and thus should
be significantly faster than LRU implementations.

## `SyncedLRU`: Concurrent LRU hashmap for low concurrency

`SyncedLRU` is a concurrency-safe LRU hashmap implementation wrapped around `LRU`.
It is best used in low-concurrency environments where lock contention isn't a thing to worry about.
It uses an exact LRU algorithm.

## `ShardedLRU`: Concurrent LRU hashmap for high concurrency

`ShardedLRU` is a sharded, concurrency-safe LRU hashmap implementation.
It is best used in high-concurrency environments where lock contention is a thing.
Due to the sharded nature, it uses an approximate LRU algorithm.

The plain `LRU` is for single-threaded use only.
For thread-safety, either use `SyncedLRU` or `ShardedLRU`, or control the locking of operations yourself.

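Creating the thread-safe variants looks much like creating a plain `LRU` (see the full example further below). The following is a minimal sketch that assumes the `NewSynced` and `NewSharded` constructors mirror the `New` signature; please check the package documentation for the exact API.

```go
package main

import (
	"fmt"

	"github.com/cespare/xxhash/v2"

	"github.com/elastic/go-freelru"
)

func hashStringXXHASH(s string) uint32 {
	return uint32(xxhash.Sum64String(s))
}

func main() {
	// SyncedLRU: a single lock around an exact LRU - fine for low concurrency.
	synced, err := freelru.NewSynced[string, uint64](8192, hashStringXXHASH)
	if err != nil {
		panic(err)
	}
	synced.Add("foo", 1)

	// ShardedLRU: multiple independently locked shards - scales under high concurrency.
	sharded, err := freelru.NewSharded[string, uint64](8192, hashStringXXHASH)
	if err != nil {
		panic(err)
	}
	sharded.Add("bar", 2)

	if v, ok := sharded.Get("bar"); ok {
		fmt.Println("bar =", v)
	}
}
```
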
### Comparison with other multithreaded LRU implementations

Add with `GOMAXPROCS=1`
```
BenchmarkParallelSyncedFreeLRUAdd_int_int128     42022706    28.27 ns/op    0 B/op      0 allocs/op
BenchmarkParallelShardedFreeLRUAdd_int_int128    35353412    33.33 ns/op    0 B/op      0 allocs/op
BenchmarkParallelFreeCacheAdd_int_int128         14825518    79.58 ns/op    0 B/op      0 allocs/op
BenchmarkParallelRistrettoAdd_int_int128          5565997    206.1 ns/op    121 B/op    3 allocs/op
BenchmarkParallelPhusluAdd_int_int128            28041186    41.26 ns/op    0 B/op      0 allocs/op
BenchmarkParallelCloudflareAdd_int_int128         6300747    185.0 ns/op    48 B/op     2 allocs/op
```
Add with `GOMAXPROCS=1000`
```
BenchmarkParallelSyncedFreeLRUAdd_int_int128-1000      12251070    138.9 ns/op    0 B/op      0 allocs/op
BenchmarkParallelShardedFreeLRUAdd_int_int128-1000    112706306    10.59 ns/op    0 B/op      0 allocs/op
BenchmarkParallelFreeCacheAdd_int_int128-1000          47873679    24.14 ns/op    0 B/op      0 allocs/op
BenchmarkParallelRistrettoAdd_int_int128-1000          69838436    16.93 ns/op    104 B/op    3 allocs/op
BenchmarkParallelOracamanMapAdd_int_int128-1000        25694386    40.48 ns/op    37 B/op     0 allocs/op
BenchmarkParallelPhusluAdd_int_int128-1000             89379122    14.19 ns/op    0 B/op      0 allocs/op
```
`Ristretto` offloads the LRU functionality of `Add()` to a separate goroutine, which is why it is relatively fast.
But the separate goroutine doesn't show up in the benchmarks, so the numbers are not directly comparable.

`Oracaman` is not an LRU implementation, just a thread-safety wrapper around `map`.

Get with `GOMAXPROCS=1`
```
BenchmarkParallelSyncedGet        43031780    27.35 ns/op    0 B/op     0 allocs/op
BenchmarkParallelShardedGet       51807500    22.86 ns/op    0 B/op     0 allocs/op
BenchmarkParallelFreeCacheGet     21948183    53.52 ns/op    16 B/op    1 allocs/op
BenchmarkParallelRistrettoGet     30343872    33.82 ns/op    7 B/op     0 allocs/op
BenchmarkParallelBigCacheGet      21073627    51.08 ns/op    16 B/op    2 allocs/op
BenchmarkParallelPhusluGet        59487482    20.02 ns/op    0 B/op     0 allocs/op
BenchmarkParallelCloudflareGet    17011405    67.11 ns/op    8 B/op     1 allocs/op
```
Get with `GOMAXPROCS=1000`
```
BenchmarkParallelSyncedGet-1000         10867552    151.0 ns/op    0 B/op     0 allocs/op
BenchmarkParallelShardedGet-1000       287238988    4.061 ns/op    0 B/op     0 allocs/op
BenchmarkParallelFreeCacheGet-1000      78045916    15.33 ns/op    16 B/op    1 allocs/op
BenchmarkParallelRistrettoGet-1000     214839645    6.060 ns/op    7 B/op     0 allocs/op
BenchmarkParallelBigCacheGet-1000      163672804    7.282 ns/op    16 B/op    2 allocs/op
BenchmarkParallelPhusluGet-1000        200133655    6.039 ns/op    0 B/op     0 allocs/op
BenchmarkParallelCloudflareGet-1000    100000000    11.26 ns/op    8 B/op     1 allocs/op
```
`Cloudflare` and `BigCache` only accept `string` as the key type.
So the ser/deser of `int` to `string` is part of the benchmarks for a fair comparison.

Here you can see that `SyncedLRU` badly suffers from lock contention.
`ShardedLRU` is ~37x faster than `SyncedLRU` in the high-concurrency case, and the next-fastest LRU implementations (`Ristretto` and `Phuslu`) are ~50% slower than `ShardedLRU`.

### Merging hashmap and ringbuffer

Most LRU implementations combine Go's `map` for the key/value lookup with their own implementation of
a circular doubly-linked list for keeping track of the recent-ness of objects.
This requires one additional heap allocation for the list element.
A second downside is that the list elements are not contiguous in memory, which causes more (expensive) CPU cache misses on access.

FreeLRU addresses both issues by merging hashmap and ringbuffer into a contiguous array of elements.
Each element contains key, value and two indices to keep the cached objects ordered by recent-ness.

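To illustrate the idea, a simplified sketch of such a merged layout could look as follows. This is not FreeLRU's actual internal type, just an illustration of how key, value and the two list indices can live together in one contiguous array.

```go
// Package sketch illustrates a merged hashmap/ringbuffer layout.
// Simplified illustration only - not FreeLRU's real implementation.
package sketch

// element is one slot of the contiguous backing array.
type element[K comparable, V any] struct {
	key   K
	value V

	// next and prev are indices into the same array, chaining the slots
	// into a circular doubly-linked list ordered from most to least
	// recently used - no per-entry heap allocation, good cache locality.
	next uint32
	prev uint32
}

// lru owns a single backing allocation that serves as both the hash
// table and the LRU ring.
type lru[K comparable, V any] struct {
	elements []element[K, V]
	head     uint32 // index of the most recently used element
}
```

A single slice allocation holds all entries, which is also what keeps the GC work low, as described next.
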
### Avoiding GC overhead

The contiguous array of elements is allocated at cache creation time.
So there is only a single memory object instead of possibly millions that the GC needs to iterate during
a garbage collection phase.
The GC overhead can be quite large in comparison with the overall CPU usage of an application.
Especially long-running and low-CPU applications with lots of cached objects suffer from the GC overhead.

### Type safety by using generics

Using generics allows type-checking at compile time, so type conversions are not needed at runtime.
Interface types such as `any` require type assertions at runtime, which may fail.

### Reducing memory allocations by using generics

Interface types (such as `any`) are pointer types and thus require a heap allocation when being stored.
This is true even if you just need an integer-to-integer lookup or translation.

With generics, the two allocations for key and value can be avoided: as long as the key and value types do not contain
pointer types, no allocations will take place when adding such objects to the cache.

### Overcommitting of hashtable memory

Each hashtable implementation tries to avoid hash collisions because collisions are expensive.
FreeLRU allows allocating more elements than the maximum number of elements stored.
This value is configurable and can be increased to reduce the likelihood of collisions.
The LRU operations generally become faster by doing so.
Setting the size of the LRU to a power of two (2^N) is recognized and allows the slow division operations to be replaced by fast bitwise AND operations.

## Benchmarks

Below we compare FreeLRU with SimpleLRU, FreeCache and Go map.
The comparison with FreeCache is just for reference - it is thread-safe and comes with a mutex/locking overhead.
The comparison with Go map is also just for reference - Go maps don't implement LRU functionality and thus should
be significantly faster than FreeLRU. It turns out the opposite is the case.

The numbers are from my laptop (Intel(R) Core(TM) i7-12800H @ 2800 MHz).

The key and value types are part of the benchmark name, e.g. `int_int` means key and value are of type `int`.
`int128` is a struct type made of two `uint64` fields.

To run the benchmarks:
```
make benchmarks
```

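The results below are reported with 0% (default) and 100% memory overcommitment. As a rough sketch of how the overcommitment described above might be configured - assuming a `NewWithSize(capacity, size, hash)` constructor where the backing `size` may exceed the `capacity` (please check the package documentation for the exact API):

```go
package main

import (
	"github.com/cespare/xxhash/v2"

	"github.com/elastic/go-freelru"
)

func hashStringXXHASH(s string) uint32 {
	return uint32(xxhash.Sum64String(s))
}

func main() {
	// Hypothetical sketch: cache at most 8192 entries, but back them with
	// 16384 slots (100% overcommitment) to reduce hash collisions.
	// 16384 is a power of two, so slot selection can use a bitwise AND
	// instead of a division.
	lru, err := freelru.NewWithSize[string, uint64](8192, 16384, hashStringXXHASH)
	if err != nil {
		panic(err)
	}
	lru.Add("key", 42)
}
```
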
### Adding objects

FreeLRU is ~3.5x faster than SimpleLRU, no surprise.
But it is also significantly faster than Go maps, which is a bit of a surprise.

This is with 0% memory overcommitment (default) and a capacity of 8192.
```
BenchmarkFreeLRUAdd_int_int-20            43097347    27.41 ns/op    0 B/op      0 allocs/op
BenchmarkFreeLRUAdd_int_int128-20         42129165    28.38 ns/op    0 B/op      0 allocs/op
BenchmarkFreeLRUAdd_uint32_uint64-20      98322132    11.74 ns/op    0 B/op      0 allocs/op (*)
BenchmarkFreeLRUAdd_string_uint64-20      39122446    31.12 ns/op    0 B/op      0 allocs/op
BenchmarkFreeLRUAdd_int_string-20         81920673    14.00 ns/op    0 B/op      0 allocs/op (*)
BenchmarkSimpleLRUAdd_int_int-20          12253708    93.85 ns/op    48 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_int_int128-20       12095150    94.26 ns/op    48 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_uint32_uint64-20    12367568    92.60 ns/op    48 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_string_uint64-20    10395525    119.0 ns/op    49 B/op     1 allocs/op
BenchmarkSimpleLRUAdd_int_string-20       12373900    94.40 ns/op    48 B/op     1 allocs/op
BenchmarkFreeCacheAdd_int_int-20           9691870    122.9 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_int_int128-20        9240273    125.6 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_uint32_uint64-20     8140896    132.1 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_string_uint64-20     8248917    137.9 ns/op    1 B/op      0 allocs/op
BenchmarkFreeCacheAdd_int_string-20        8079253    145.0 ns/op    64 B/op     1 allocs/op
BenchmarkRistrettoAdd_int_int-20          11102623    100.1 ns/op    109 B/op    2 allocs/op
BenchmarkRistrettoAdd_int128_int-20       10317686    113.5 ns/op    129 B/op    4 allocs/op
BenchmarkRistrettoAdd_uint32_uint64-20    12892147    94.28 ns/op    104 B/op    2 allocs/op
BenchmarkRistrettoAdd_string_uint64-20    11266416    105.8 ns/op    122 B/op    3 allocs/op
BenchmarkRistrettoAdd_int_string-20       10360814    107.4 ns/op    129 B/op    4 allocs/op
BenchmarkMapAdd_int_int-20                35306983    46.29 ns/op    0 B/op      0 allocs/op
BenchmarkMapAdd_int_int128-20             30986126    45.16 ns/op    0 B/op      0 allocs/op
BenchmarkMapAdd_string_uint64-20          28406497    49.35 ns/op    0 B/op      0 allocs/op
```
(*)
There is an interesting effect when using increasing numbers (0..N) as keys in combination with FNV1a():
the number of collisions is strongly reduced, thus the high performance.
Exchanging the sequential numbers for random numbers results in roughly the same performance as the other results.

Just to give you an idea of the effect of 100% memory overcommitment:
performance increased by ~20%.
```
BenchmarkFreeLRUAdd_int_int-20           53473030    21.52 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_int_int128-20        52852280    22.10 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_uint32_uint64-20    100000000    10.15 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_string_uint64-20     49477594    24.55 ns/op    0 B/op    0 allocs/op
BenchmarkFreeLRUAdd_int_string-20        85288306    12.10 ns/op    0 B/op    0 allocs/op
```

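For reference, the FNV1a hashing of integer keys mentioned in the (*) note above can be sketched as below. The benchmark suite's actual hash functions live in `bench/hash.go` and may differ in detail.

```go
package main

import "fmt"

// fnv1aInt is a sketch of a 32-bit FNV1a hash over the bytes of an int key.
func fnv1aInt(key int) uint32 {
	const (
		offset32 = 2166136261 // FNV1a 32-bit offset basis
		prime32  = 16777619   // FNV1a 32-bit prime
	)
	h := uint32(offset32)
	for n := 0; n < 8; n++ { // process the 8 bytes of a 64-bit int
		h ^= uint32(key & 0xff)
		h *= prime32
		key >>= 8
	}
	return h
}

func main() {
	fmt.Println(fnv1aInt(42))
}
```
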
### Getting objects

This is with 0% memory overcommitment (default) and a capacity of 8192.
```
BenchmarkFreeLRUGet-20       83158561    13.80 ns/op    0 B/op     0 allocs/op
BenchmarkSimpleLRUGet-20    146248706    8.199 ns/op    0 B/op     0 allocs/op
BenchmarkFreeCacheGet-20     58229779    19.56 ns/op    0 B/op     0 allocs/op
BenchmarkRistrettoGet-20     31157457    35.37 ns/op    10 B/op    1 allocs/op
BenchmarkPhusluGet-20        55071919    20.63 ns/op    0 B/op     0 allocs/op
BenchmarkMapGet-20          195464706    6.031 ns/op    0 B/op     0 allocs/op
```

## Example usage

```go
package main

import (
	"fmt"

	"github.com/cespare/xxhash/v2"

	"github.com/elastic/go-freelru"
)

// More hash functions in https://github.com/elastic/go-freelru/blob/main/bench/hash.go
func hashStringXXHASH(s string) uint32 {
	return uint32(xxhash.Sum64String(s))
}

func main() {
	lru, err := freelru.New[string, uint64](8192, hashStringXXHASH)
	if err != nil {
		panic(err)
	}

	key := "go-freelru"
	val := uint64(999)
	lru.Add(key, val)

	if v, ok := lru.Get(key); ok {
		fmt.Printf("found %v=%v\n", key, v)
	}

	// Output:
	// found go-freelru=999
}
```

The hash function passed to the constructor (here `hashStringXXHASH`) is called to calculate a hash value of the key.
Please take a look into the `bench/` directory to find examples of hash functions.
There you will also find an amd64 version of the Go internal hash function, which uses the AESENC features
of the CPU.

In case you already have a hash that you want to use as the key, you have to provide an "identity" function.

## Comparison of hash functions

Hashing `int`
```
BenchmarkHashInt_MapHash-20         181521530    6.806 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_MapHasher-20       727805824    1.595 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_FNV1A-20           621439513    1.919 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_FNV1AUnsafe-20     706583145    1.699 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_AESENC-20         1000000000    0.9659 ns/op    0 B/op    0 allocs/op
BenchmarkHashInt_XXHASH-20          516779404    2.341 ns/op     0 B/op    0 allocs/op
BenchmarkHashInt_XXH3HASH-20        562645186    2.127 ns/op     0 B/op    0 allocs/op
```
Hashing `string`
```
BenchmarkHashString_MapHash-20       72106830    15.80 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_MapHasher-20    385338830    2.868 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_FNV1A-20         60162328    19.33 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_AESENC-20       475896514    2.472 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_XXHASH-20       185842404    6.476 ns/op    0 B/op    0 allocs/op
BenchmarkHashString_XXH3HASH-20     375255375    3.182 ns/op    0 B/op    0 allocs/op
```
As you can see, the speed depends on the type of object being hashed. I think it mostly boils down to the size of the object.
`MapHasher` is dangerous to use because it is not guaranteed to be stable across Go versions.
`AESENC` uses the AES CPU extensions on x86-64. In theory, it should work on ARM64 as well (not tested by me).

For a small number of bytes, `FNV1A` is the fastest.
Otherwise, `XXH3` looks like a good choice.

## License

The code is licensed under the Apache 2.0 license. See the `LICENSE` file for details.