# Ristretto

[GoDoc](http://godoc.org/github.com/dgraph-io/ristretto)
[CI Tests](https://github.com/dgraph-io/ristretto/actions/workflows/ci-ristretto-tests.yml)
[CI Lint](https://github.com/dgraph-io/ristretto/actions/workflows/ci-ristretto-lint.yml)
[Coverage](https://coveralls.io/github/dgraph-io/ristretto?branch=main)
[Go Report Card](https://goreportcard.com/report/github.com/dgraph-io/ristretto)

Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.

The motivation to build Ristretto comes from the need for a contention-free cache in [Dgraph][].

[Dgraph]: https://github.com/dgraph-io/dgraph

## Features

* **High Hit Ratios** - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
  * **Eviction: SampledLFU** - on par with exact LRU and better performance on Search and Database traces.
  * **Admission: TinyLFU** - extra performance with little memory overhead (12 bits per counter).
* **Fast Throughput** - we use a variety of techniques for managing contention and the result is excellent throughput.
* **Cost-Based Eviction** - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
* **Fully Concurrent** - you can use as many goroutines as you want with little throughput degradation.
* **Metrics** - optional performance metrics for throughput, hit ratios, and other stats.
* **Simple API** - just figure out your ideal `Config` values and you're off and running.

## Status

Ristretto is production-ready. See [Projects using Ristretto](#projects-using-ristretto).

## Table of Contents

- [Ristretto](#ristretto)
  - [Features](#features)
  - [Status](#status)
  - [Table of Contents](#table-of-contents)
  - [Usage](#usage)
    - [Example](#example)
    - [Config](#config)
  - [Benchmarks](#benchmarks)
    - [Hit Ratios](#hit-ratios)
      - [Search](#search)
      - [Database](#database)
      - [Looping](#looping)
      - [CODASYL](#codasyl)
    - [Throughput](#throughput)
      - [Mixed](#mixed)
      - [Read](#read)
      - [Write](#write)
  - [Projects Using Ristretto](#projects-using-ristretto)
  - [FAQ](#faq)
    - [How are you achieving this performance? What shortcuts are you taking?](#how-are-you-achieving-this-performance-what-shortcuts-are-you-taking)
    - [Is Ristretto distributed?](#is-ristretto-distributed)

## Usage

### Example

```go
package main

import (
	"fmt"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e7,     // number of keys to track frequency of (10M).
		MaxCost:     1 << 30, // maximum cost of cache (1GB).
		BufferItems: 64,      // number of keys per Get buffer.
	})
	if err != nil {
		panic(err)
	}

	// set a value with a cost of 1
	cache.Set("key", "value", 1)

	// wait for value to pass through buffers
	cache.Wait()

	// get value from cache
	value, found := cache.Get("key")
	if !found {
		panic("missing value")
	}
	fmt.Println(value)

	// del value from cache
	cache.Del("key")
}
```

### Config

The `Config` struct is passed to `NewCache` when creating Ristretto instances (see the example above).

**NumCounters** `int64`

NumCounters is the number of 4-bit access counters to keep for admission and eviction. We've seen good performance in setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and MaxCost is 100, set NumCounters to 1,000.
Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set NumCounters to 100,000. The important thing is the *number of unique items* in the full cache, not necessarily the MaxCost value.

**MaxCost** `int64`

MaxCost is the total cost the cache may hold, and eviction decisions are made against it. For example, if MaxCost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted.

MaxCost can also be used to denote the max size in bytes. For example, if MaxCost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new 5KB item (that's accepted) would cause 5 1KB items to be evicted.

MaxCost could be anything as long as it matches how you're using the cost values when calling Set.

**BufferItems** `int64`

BufferItems is the size of the Get buffers. The best value we've found for this is 64.

If for some reason you see Get performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 64. This is a fine-tuning mechanism and you probably won't have to touch this.

**Metrics** `bool`

Set Metrics to true when you want real-time logging of a variety of stats. This is a Config flag because collecting metrics carries a throughput overhead of about 10%.

**OnEvict** `func(hashes [2]uint64, value interface{}, cost int64)`

OnEvict is called for every eviction.

**KeyToHash** `func(key interface{}) [2]uint64`

KeyToHash is the hashing algorithm used for every key. If this is nil, Ristretto has a variety of [defaults depending on the underlying interface type](https://github.com/dgraph-io/ristretto/blob/master/z/z.go#L19-L41).

Note that if you want 128-bit hashes you should use the full `[2]uint64`,
otherwise just fill the `uint64` at the `0` position and it will behave like
any 64-bit hash.
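
As a rough illustration of the 64-bit case described above, the sketch below hashes string keys with FNV-1a from the standard library and fills only position `0`. The name `stringKeyToHash` and the choice of FNV-1a are assumptions made for this example, not Ristretto's defaults, and the function follows the `KeyToHash` signature as documented here; check the `Config` definition in the version you import before wiring it in.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// stringKeyToHash is a hypothetical helper shaped like the KeyToHash field
// documented above. It hashes string keys with 64-bit FNV-1a and leaves
// position 1 at zero, so the result behaves like a plain 64-bit hash.
func stringKeyToHash(key interface{}) [2]uint64 {
	s, ok := key.(string)
	if !ok {
		// Assumption for this sketch: fall back to the key's printed form.
		// Note that this makes the int 42 and the string "42" collide.
		s = fmt.Sprint(key)
	}
	h := fnv.New64a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error.
	return [2]uint64{h.Sum64(), 0}
}

func main() {
	a := stringKeyToHash("key")
	b := stringKeyToHash("key")
	fmt.Println(a == b)    // equal keys produce equal hashes
	fmt.Println(a[1] == 0) // only position 0 is filled
}
```

A 128-bit variant would compute a second, independent hash and store it at position `1` instead of zero.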

**Cost** `func(value interface{}) int64`

Cost is an optional function you can pass to the Config in order to evaluate
item cost at runtime, and only for the Set calls that aren't dropped (this is
useful if calculating item cost is particularly expensive and you don't want to
waste time on items that will be dropped anyway).

To signal to Ristretto that you'd like to use this Cost function:

1. Set the Cost field to a non-nil function.
2. When calling Set for new items or item updates, use a `cost` of 0.

## Benchmarks

The benchmarks can be found in https://github.com/dgraph-io/benchmarks/tree/master/cachebench/ristretto.

### Hit Ratios

#### Search

This trace is described as "disk read accesses initiated by a large commercial
search engine in response to various web search requests."

<p align="center">
	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Search%20(ARC-S3).svg">
</p>

#### Database

This trace is described as "a database server running at a commercial site
running an ERP application on top of a commercial database."

<p align="center">
	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Database%20(ARC-DS1).svg">
</p>

#### Looping

This trace demonstrates a looping access pattern.

<p align="center">
	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Glimpse%20(LIRS-GLI).svg">
</p>

#### CODASYL

This trace is described as "references to a CODASYL database for a one hour
period."

<p align="center">
	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20CODASYL%20(ARC-OLTP).svg">
</p>

### Throughput

All throughput benchmarks were run on an Intel Core i7-8700K (3.7GHz) with 16GB
of RAM.

#### Mixed

<p align="center">
	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Mixed.svg">
</p>

#### Read

<p align="center">
	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Read%20(Zipfian).svg">
</p>

#### Write

<p align="center">
	<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Write%20(Zipfian).svg">
</p>

## Projects Using Ristretto

Below is a list of known projects that use Ristretto:

- [Badger](https://github.com/dgraph-io/badger) - Embeddable key-value DB in Go
- [Dgraph](https://github.com/dgraph-io/dgraph) - Horizontally scalable and distributed GraphQL database with a graph backend
- [Vitess](https://github.com/vitessio/vitess) - Database clustering system for horizontal scaling of MySQL
- [SpiceDB](https://github.com/authzed/spicedb) - Horizontally scalable permissions database

## FAQ

### How are you achieving this performance? What shortcuts are you taking?

We go into detail in the [Ristretto blog post](https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/), but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent [admission policy](https://arxiv.org/abs/1512.00727) and SampledLFU eviction policy.

As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls.
That means a Set call for a new item (updates are guaranteed) isn't guaranteed to make it into the cache. The new item could be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this doesn't affect hit ratios much at all as we expect the most popular items to be Set multiple times and eventually make it in the cache.

### Is Ristretto distributed?

No, it's just like any other Go library that you can import into your project and use in a single process.