# Memory Management

## Background

Pebble has two significant sources of memory usage: MemTables and the
Block Cache. MemTables buffer data that has been written to the WAL
but not yet flushed to an SSTable. The Block Cache provides a cache of
uncompressed SSTable data blocks.

Originally, Pebble used regular Go memory allocation for the memory
backing both MemTables and the Block Cache. This was problematic as it
put significant pressure on the Go GC: the higher the rate of memory
allocation, the more work the GC has to do to reclaim memory. To
lessen the pressure on the Go GC, an "allocation cache" was introduced
to the Block Cache, which allowed the memory backing cached blocks to
be reused in most circumstances. This produced a dramatic reduction in
GC pressure and a measurable performance improvement in CockroachDB
workloads.
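
To make the idea of an allocation cache concrete, here is a minimal
Go sketch, assuming a single-threaded, bounded free list of byte
slices. The `allocCache` type and its `alloc` and `free` methods are
hypothetical names used for illustration; this is not Pebble's actual
implementation.

```go
package main

import "fmt"

// allocCache is a hypothetical, simplified allocation cache: a bounded
// free list of byte slices. Buffers handed back via free are reused by
// alloc instead of allocating new Go memory, which lowers the allocation
// rate the GC has to keep up with. It is not safe for concurrent use.
type allocCache struct {
	bufs [][]byte
	max  int // maximum number of buffers retained
}

func newAllocCache(max int) *allocCache {
	return &allocCache{max: max}
}

// alloc returns an n-byte buffer, reusing a cached buffer with enough
// capacity when one is available. Reused buffers are NOT zeroed.
func (c *allocCache) alloc(n int) []byte {
	for i := len(c.bufs) - 1; i >= 0; i-- {
		if b := c.bufs[i]; cap(b) >= n {
			// Remove the buffer from the free list by moving the last entry into its slot.
			last := len(c.bufs) - 1
			c.bufs[i] = c.bufs[last]
			c.bufs[last] = nil
			c.bufs = c.bufs[:last]
			return b[:n]
		}
	}
	return make([]byte, n)
}

// free returns b to the cache, dropping it (leaving it to the GC) if the cache is full.
func (c *allocCache) free(b []byte) {
	if len(c.bufs) < c.max {
		c.bufs = append(c.bufs, b)
	}
}

func main() {
	c := newAllocCache(4)
	b1 := c.alloc(4096)
	c.free(b1)
	b2 := c.alloc(4096)
	// b2 reuses the backing array of b1, so this prints true.
	fmt.Println(&b1[0] == &b2[0])
}
```

Because reused buffers come back without being zeroed, the caller is
expected to overwrite the whole buffer, which is the normal case when
a block is read into it.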

Unfortunately, using Go-allocated memory still caused a problem.
CockroachDB running on top of Pebble often had an RSS (resident set
size) 2x what it was when using RocksDB. The cause is the Go runtime's
heuristic for triggering GC:

> A collection is triggered when the ratio of freshly allocated data
> to live data remaining after the previous collection reaches this
> percentage.

This percentage can be configured by the
[`GOGC`](https://golang.org/pkg/runtime/) environment variable or by
calling
[`debug.SetGCPercent`](https://golang.org/pkg/runtime/debug/#SetGCPercent). The
default value is `100`, which means that GC is triggered when the
amount of freshly allocated data equals the amount of live data at the
end of the last collection. This generally works well in practice, but
the Pebble Block Cache is often configured to be tens of gigabytes in
size. Waiting for tens of gigabytes of data to be allocated before
triggering a GC results in very large Go heap sizes.
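
The arithmetic behind these large heaps can be sketched with a
simplified model, assuming the next GC starts once the heap reaches
`live * (1 + GOGC/100)`. The real Go pacer (since Go 1.18) also counts
goroutine stacks and globals toward this goal, and `GOMEMLIMIT` can
trigger collections earlier. `nextGCTarget` is a hypothetical helper;
`debug.SetGCPercent` is the standard-library call referenced above.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// nextGCTarget is a hypothetical helper approximating the heap size at
// which the next collection starts: live + live*gogc/100. gogc must be
// non-negative; GOGC=off is not modeled.
func nextGCTarget(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

func main() {
	// debug.SetGCPercent is the programmatic equivalent of GOGC. It
	// returns the previous setting, which is restored on exit.
	prev := debug.SetGCPercent(20)
	defer debug.SetGCPercent(prev)

	const gib = 1 << 30
	live := uint64(20 * gib) // e.g. a 20 GiB Block Cache living on the Go heap
	for _, gogc := range []int{100, 20, 10} {
		fmt.Printf("GOGC=%d: next GC near %d GiB\n", gogc, nextGCTarget(live, gogc)/gib)
	}
	// Prints 40, 24 and 22 GiB respectively.
}
```

Under this model, 20 GiB of Block Cache on the Go heap with the
default `GOGC=100` lets the heap grow to roughly 40 GiB before the next
collection, consistent with the roughly 2x RSS observed with
CockroachDB.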

## Manual Memory Management

Attempting to adjust `GOGC` to account for the significant amount of
memory used by the Block Cache is fraught. What value should be used?
`10%`? `20%`? Should the setting be tuned dynamically? Rather than
introducing a heuristic which may have cascading effects on the
application using Pebble, we decided to move the Block Cache and
MemTable memory out of the Go heap. This is done by using the C memory
allocator, though it could also be done by providing a simple memory
allocator in Go which uses `mmap` to allocate memory.
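
The `mmap` alternative can be sketched with the standard `syscall`
package. This is a minimal, Unix-only sketch; `mmapAlloc` and
`mmapFree` are hypothetical helpers, not Pebble APIs (Pebble itself
uses the C allocator). Memory obtained this way is invisible to the Go
GC, so it must never hold Go pointers and must be freed explicitly.

```go
package main

import (
	"fmt"
	"syscall"
)

// mmapAlloc is a hypothetical helper that obtains n bytes of zeroed
// memory directly from the OS via an anonymous, private mapping. The Go
// GC neither scans nor frees this memory. Unix-only.
func mmapAlloc(n int) ([]byte, error) {
	return syscall.Mmap(-1, 0, n,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// mmapFree returns the mapping to the OS. b must be the exact slice
// returned by mmapAlloc, not a resliced view of it.
func mmapFree(b []byte) error {
	return syscall.Munmap(b)
}

func main() {
	buf, err := mmapAlloc(1 << 20) // 1 MiB outside the Go heap
	if err != nil {
		panic(err)
	}
	n := copy(buf, "hello, manual memory")
	fmt.Println(string(buf[:n]))
	if err := mmapFree(buf); err != nil {
		panic(err)
	}
	// buf must not be touched after this point: the pages are unmapped.
}
```

A practical allocator would carve many small allocations out of larger
mappings rather than issuing one system call per allocation, since
`mmap` works at page granularity.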

In order to support manual memory management for the Block Cache and
MemTables, Pebble needs to precisely track their lifetime. This was
already being done for the MemTable in order to account for its memory
usage in metrics, and it was mostly being done for the Block Cache:
values stored in the Block Cache are reference counted and are
returned to the "alloc cache" when their reference count falls to 0.
Unfortunately, this tracking wasn't precise and there were numerous
cases where cache values were being leaked. This was acceptable in a
world where the Go GC would clean up after us. It is unacceptable once
memory is managed manually, because a leak then becomes permanent.
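
A minimal sketch of reference-counted release, assuming a hypothetical
`value` type whose `free` callback hands the buffer back to an
allocator. Every `acquire` must be matched by exactly one `release`: a
missing `release` is exactly the kind of leak described above, while
an extra one would free memory that is still in use. The sketch uses
`atomic.Int32` from `sync/atomic` (Go 1.19+).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// value is a hypothetical, simplified reference-counted cache value whose
// buffer is manually managed. free is called exactly once, when the last
// reference is released.
type value struct {
	buf  []byte
	refs atomic.Int32
	free func([]byte)
}

func newValue(buf []byte, free func([]byte)) *value {
	v := &value{buf: buf, free: free}
	v.refs.Store(1) // the creator holds the initial reference
	return v
}

func (v *value) acquire() { v.refs.Add(1) }

// release drops one reference and frees the buffer when the count hits 0.
// Releasing more times than acquired is a bug, so it panics.
func (v *value) release() {
	switch n := v.refs.Add(-1); {
	case n == 0:
		v.free(v.buf)
		v.buf = nil
	case n < 0:
		panic("cache value released too many times")
	}
}

func main() {
	freed := 0
	v := newValue(make([]byte, 4096), func([]byte) { freed++ })
	v.acquire() // a second holder, e.g. a concurrent reader
	v.release()
	fmt.Println("freed after first release:", freed) // 0
	v.release()
	fmt.Println("freed after second release:", freed) // 1
}
```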

## Leak Detection

To find all of the cache value leaks, Pebble has a leak detection
facility built on top of
[`runtime.SetFinalizer`](https://golang.org/pkg/runtime/#SetFinalizer). A
finalizer is a function associated with an object which is run when
the object is no longer reachable. On the surface, this sounds perfect
as a facility for performing all memory reclamation. Unfortunately,
finalizers are generally frowned upon by the Go implementors, and come
with very loose guarantees:

> The finalizer is scheduled to run at some arbitrary time after the
> program can no longer reach the object to which obj points. There is
> no guarantee that finalizers will run before a program exits, so
> typically they are useful only for releasing non-memory resources
> associated with an object during a long-running program.

This language is somewhat frightening, but in practice finalizers are
run at the end of every GC cycle. Pebble primarily relies on
finalizers for its leak detection facility. In the block cache, a
finalizer is associated with the Go-allocated `cache.Value` object.
When the finalizer runs, it checks that the buffer backing the
`cache.Value` has been freed. This leak detection facility is enabled
by the `"invariants"` build tag, which the Pebble unit tests enable.

There is also one narrow case in the block cache where finalizers are
used for memory reclamation rather than leak detection (see
`cache/entry_normal.go`). Some structs pooled in a `sync.Pool` have
transitively reachable fields backed by manually allocated memory.
When such a struct is dropped from the pool and garbage collected by
the Go runtime, a finalizer ensures that its manually allocated memory
is freed correctly. The runtime's loose guarantees are reasonable to
rely on in this case to prevent a memory leak.