# Memory Management

## Background

Pebble has two significant sources of memory usage: MemTables and the
Block Cache. MemTables buffer data that has been written to the WAL
but not yet flushed to an SSTable. The Block Cache provides a cache of
uncompressed SSTable data blocks.

Originally, Pebble used regular Go memory allocation for the memory
backing both MemTables and the Block Cache. This was problematic because it
put significant pressure on the Go GC: the higher the bandwidth of
memory allocations, the more work the GC has to do to reclaim the
memory. To lessen the pressure on the Go GC, an "allocation
cache" was introduced to the Block Cache. It allowed the
memory backing cached blocks to be reused in most circumstances. This produced a
dramatic reduction in GC pressure and a measurable performance
improvement in CockroachDB workloads.

Unfortunately, the use of Go-allocated memory still caused a
problem. CockroachDB running on top of Pebble often had an RSS
(resident set size) 2x what it was when using RocksDB. This effect is
caused by the Go runtime's heuristic for triggering GC:

> A collection is triggered when the ratio of freshly allocated data
> to live data remaining after the previous collection reaches this
> percentage.

This percentage can be configured by the
[`GOGC`](https://golang.org/pkg/runtime/) environment variable or by
calling
[`debug.SetGCPercent`](https://golang.org/pkg/runtime/debug/#SetGCPercent). The
default value is `100`, which means that GC is triggered when the
freshly allocated data equals the amount of live data at the end
of the last collection. This generally works well in practice,
but the Pebble Block Cache is often configured to be tens of gigabytes
in size.
Because the cache itself counts as live data, waiting for tens of gigabytes
of data to be allocated before triggering a GC results in very large Go heap
sizes. With the default `GOGC=100`, a 10 GB cache lets the heap grow to
roughly 20 GB before the next collection.

## Manual Memory Management

Attempting to adjust `GOGC` to account for the significant amount of
memory used by the Block Cache is fraught. What value should be used?
`10%`? `20%`? Should the setting be tuned dynamically? Rather than
introducing a heuristic that may have cascading effects on the
application using Pebble, we decided to move the Block Cache and
MemTable memory out of the Go heap. This is done by using the C memory
allocator, though it could also be done by providing a simple memory
allocator in Go that uses `mmap` to allocate memory.

To support manual memory management for the Block Cache and
MemTables, Pebble needs to precisely track their lifetime. This was
already being done for the MemTable in order to account for its memory
usage in metrics, and it was mostly being done for the Block Cache. Values
stored in the Block Cache are reference counted and are returned to
the "alloc cache" when their reference count falls
to 0. Unfortunately, this tracking wasn't precise, and there were
numerous cases where cache values were being leaked. This was
acceptable in a world where the Go GC would clean up after us. With
manually managed memory, such a leak becomes permanent, which is unacceptable.

## Leak Detection

To find all of the cache value leaks, Pebble has a leak
detection facility built on top of
[`runtime.SetFinalizer`](https://golang.org/pkg/runtime/#SetFinalizer). A
finalizer is a function associated with an object that is run when
the object is no longer reachable. On the surface, this sounds perfect
as a facility for performing all memory reclamation.
Unfortunately,
finalizers are generally frowned upon by the Go implementors and come
with very loose guarantees:

> The finalizer is scheduled to run at some arbitrary time after the
> program can no longer reach the object to which obj points. There is
> no guarantee that finalizers will run before a program exits, so
> typically they are useful only for releasing non-memory resources
> associated with an object during a long-running program.

This language is somewhat frightening, but in practice finalizers are run at the
end of every GC period. Pebble primarily relies on finalizers for its leak
detection facility. In the block cache, a finalizer is associated with the
Go-allocated `cache.Value` object. When the finalizer runs, it checks that the
buffer backing the `cache.Value` has been freed. This leak detection facility is
enabled by the `"invariants"` build tag, which the Pebble unit tests enable.

There is also a very specific memory reclamation use case in the block cache
(see `cache/entry_normal.go`). Some structs pooled in a `sync.Pool` have
transitively reachable fields backed by manually allocated memory. This use case
ensures that memory is freed correctly when the parent struct is released from
the pool and consequently garbage collected by the Go runtime. In this case, the
loose guarantees provided by the runtime are reasonable to rely on to prevent a
memory leak.
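The reference counting and finalizer-based leak check described above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not Pebble's code. The names `value`, `newValue`, `acquire`, `release`, the `leakDetection` constant (standing in for the `"invariants"` build tag) and the `leaks` counter are all hypothetical. The buffer is an ordinary Go slice standing in for memory that Pebble would obtain from the C allocator.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// leakDetection is a hypothetical stand-in for the "invariants" build tag:
// when true, each value gets a finalizer that checks its buffer was freed.
const leakDetection = true

// leaks counts values that became unreachable while still holding a buffer.
var leaks atomic.Int64

// value is a hypothetical stand-in for cache.Value: a reference-counted
// handle to a buffer that Pebble would allocate outside the Go heap.
type value struct {
	buf  []byte
	refs atomic.Int32
}

// newValue returns a value with a reference count of 1.
func newValue(n int) *value {
	// Stand-in for a manual allocation (e.g. via the C allocator).
	v := &value{buf: make([]byte, n)}
	v.refs.Store(1)
	if leakDetection {
		runtime.SetFinalizer(v, func(v *value) {
			// v is unreachable by now; a non-nil buffer means the
			// reference count never reached zero, i.e. the buffer leaked.
			if v.buf != nil {
				leaks.Add(1)
			}
		})
	}
	return v
}

// acquire adds a reference, e.g. for an iterator reading the block.
func (v *value) acquire() { v.refs.Add(1) }

// release drops a reference and frees the buffer when the count hits 0.
func (v *value) release() {
	switch n := v.refs.Add(-1); {
	case n == 0:
		v.buf = nil // stand-in for handing the memory back to the allocator
	case n < 0:
		panic("value released too many times")
	}
}

// useCorrectly takes and drops two references, so the buffer is freed.
func useCorrectly() {
	v := newValue(64)
	v.acquire()
	v.release()
	v.release()
}

func main() {
	useCorrectly()
	newValue(64) // leaked: the single reference is never released

	// Finalizers run at some point after a GC finds the object unreachable,
	// so force collections and poll briefly for the leak report.
	for i := 0; i < 10 && leaks.Load() == 0; i++ {
		runtime.GC()
		time.Sleep(10 * time.Millisecond)
	}
	fmt.Println("leaked values:", leaks.Load())
}
```

The finalizer only reports a leak and does not free anything, so it acts purely as a detector, in line with the Go documentation's caution about relying on finalizers for reclamation. Because finalizer timing is not guaranteed, the sketch forces collections and polls rather than assuming the check has already run.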