github.com/cockroachdb/pebble@v0.0.0-20231214172447-ab4952c5f87b/internal/base/lazy_value.go

// Copyright 2022 The LevelDB-Go and Pebble Authors. All rights reserved. Use
// of this source code is governed by a BSD-style license that can be found in
// the LICENSE file.

package base

import "github.com/cockroachdb/pebble/internal/invariants"

// A value can have user-defined attributes that are a function of the value
// byte slice. For now, we only support "short attributes", which can be
// encoded in 3 bits. We will likely extend this to "long attributes" later
// for values that are even more expensive to access than those in value
// blocks in the same sstable.
//
// When an sstable writer chooses not to store a value together with the key,
// it can call the ShortAttributeExtractor to extract the attribute and store
// it together with the key. This allows for cheap retrieval of
// AttributeAndLen on the read-path, without doing a more expensive retrieval
// of the value. In general, the extraction code may also want to look at the
// key to decide how to treat the value, hence the key and keyPrefixLen
// parameters.
//
// Write path performance: The ShortAttributeExtractor func cannot be inlined,
// so we will pay the cost of this function call. However, we will only pay
// this when (a) the value is not being stored together with the key, and (b)
// the key-value pair is being initially written to the DB, or a compaction is
// transitioning the key-value pair from being stored together to being stored
// separately.

// ShortAttribute encodes a user-specified attribute of the value.
type ShortAttribute uint8

// MaxShortAttribute is the maximum value of the short attribute (3 bits).
const MaxShortAttribute = 7

// ShortAttributeExtractor is a function that, given the key, the length of
// the key prefix, and the value, returns the ShortAttribute of the value.
type ShortAttributeExtractor func(
	key []byte, keyPrefixLen int, value []byte) (ShortAttribute, error)

// AttributeAndLen pairs the value length with the short attribute.
type AttributeAndLen struct {
	ValueLen       int32
	ShortAttribute ShortAttribute
}

// LazyValue represents a value that may not already have been extracted.
// Currently, it can represent either an in-place value (stored with the key)
// or a value stored in the value section. However, the interface is general
// enough to support values that are stored in separate files.
//
// LazyValue is used in the InternalIterator interface, such that all
// positioning calls return (*InternalKey, LazyValue). It is also exposed via
// the public Iterator for callers that need to remember a recent but not
// necessarily latest LazyValue, in case they need the actual value in the
// future. An example is a caller that is iterating in reverse and looking for
// the latest MVCC version for a key -- it cannot identify the latest MVCC
// version without stepping to the previous key-value pair, e.g.,
// storage.pebbleMVCCScanner in CockroachDB.
//
// Performance note: It is important for this struct to not exceed a size of
// 32 bytes, for optimizing the common case of the in-place value. Prior to
// introducing LazyValue, we were passing around a []byte, which is 24 bytes.
// Passing a 40 byte or larger struct causes performance to drop by 75% on
// some benchmarks that do tight iteration loops.
//
// Memory management:
// This is subtle, but important for performance.
//
// A LazyValue returned by an InternalIterator or Iterator is unstable in that
// repositioning the iterator will invalidate the memory inside it. A caller
// wishing to maintain that LazyValue needs to call LazyValue.Clone(). Note
// that this does not fetch the value if it is not in-place. Clone() should
// ideally not be called if LazyValue.Value() has been called, since the
// cloned LazyValue will forget the extracted/fetched value, and calling
// Value() on this clone will cause the value to be extracted again. That is,
// Clone() does not make any promise about the memory stability of the
// underlying value.
//
// A user of an iterator that calls LazyValue.Value() wants, as much as
// possible, for the returned value []byte to point to iterator-owned memory.
//
//  1. [P1] The underlying iterator that owns that memory also needs a promise
//     from that user that at any time there is at most one value []byte slice
//     that the caller is expecting it to maintain. Otherwise, the underlying
//     iterator has to maintain multiple such []byte slices, which results in
//     more complicated and inefficient code.
//
//  2. [P2] The underlying iterator, in order to make the promise that it is
//     maintaining the one value []byte slice, also needs a way to know when
//     it is relieved of that promise. One way it is relieved of that promise
//     is by being told that it is being repositioned. Typically, the owner of
//     the value []byte slice is an sstable iterator, and it will know that it
//     is relieved of the promise when it is repositioned. However, consider
//     the case where the caller has used LazyValue.Clone() and repositioned
//     the iterator (which is actually a tree of iterators). In this case the
//     underlying sstable iterator may not even be open. LazyValue.Value()
//     will still work (at a higher cost), but since the sstable iterator is
//     not open, it does not have a mechanism to know when the retrieved value
//     is no longer in use. We refer to this situation as "not satisfying P2".
//     To handle this situation, the LazyValue.Value() method accepts a
//     caller-owned buffer that the callee will use if needed. The callee
//     explicitly tells the caller whether the []byte slice for the value is
//     now owned by the caller. This will be true if the callee attempted to
//     use buf and either successfully used it or allocated a new []byte slice.
//
// To ground the above in reality, we consider three examples of callers of
// LazyValue.Value():
//
//   - Iterator: it calls LazyValue.Value for its own use when merging values.
//     When merging during reverse iteration, it may have cloned the LazyValue.
//     In this case it calls LazyValue.Value() on the cloned value, merges it,
//     and then calls LazyValue.Value() on the current iterator position and
//     merges it. So it is honoring P1.
//
//   - Iterator on behalf of Iterator clients: The Iterator.Value() method
//     needs to call LazyValue.Value(). The client of Iterator is satisfying P1
//     because of the inherent Iterator interface constraint, i.e., it is calling
//     Iterator.Value() on the current Iterator position. It is possible that
//     the Iterator has cloned this LazyValue (for the reverse iteration case),
//     which the client is unaware of, so the underlying sstable iterator may
//     not be able to satisfy P2. This is ok because Iterator will call
//     LazyValue.Value with its (reusable) owned buffer.
//
//   - CockroachDB's pebbleMVCCScanner: This uses LazyValues from Iterator
//     because, during reverse iteration, finding the highest version that
//     satisfies a read requires it to clone the LazyValue, step the iterator
//     back, and then decide whether it needs the value from the previously
//     cloned LazyValue. The pebbleMVCCScanner will satisfy P1. The P2 story
//     is similar to the previous case in that it will call LazyValue.Value
//     with its (reusable) owned buffer.
//
// Corollary: callers that directly use InternalIterator and know that they
// have done nothing to interfere with promise P2 can pass in a nil buf and be
// sure that it will not trigger an allocation.
//
// Repeated calling of LazyValue.Value:
// This is ok as long as the caller continues to satisfy P1. The previously
// fetched value will be remembered inside LazyValue to avoid fetching again.
// So if the caller's buffer was used the first time the value was fetched, it
// is still in use.
//
// LazyValue fields are visible outside the package for use in
// InternalIterator implementations and in Iterator, but are not meant for
// direct use by users of Pebble.
type LazyValue struct {
	// ValueOrHandle represents a value, or a handle to be passed to ValueFetcher.
	// - Fetcher == nil: ValueOrHandle is a value.
	// - Fetcher != nil: ValueOrHandle is a handle and Fetcher.Attribute is
	//   initialized.
	// The ValueOrHandle exposed by InternalIterator or Iterator may not be stable
	// if the iterator is stepped. To make it stable, make a copy using Clone.
	ValueOrHandle []byte
	// Fetcher provides support for fetching an actually lazy value.
	Fetcher *LazyFetcher
}

// LazyFetcher supports fetching a lazy value.
//
// Fetcher and Attribute are to be initialized at creation time. The fields
// are arranged to reduce the size of this struct.
type LazyFetcher struct {
	// Fetcher, given a handle, returns the value.
	Fetcher ValueFetcher
	err     error
	value   []byte
	// Attribute includes the short attribute and value length.
	Attribute   AttributeAndLen
	fetched     bool
	callerOwned bool
}

// ValueFetcher is an interface for fetching a value.
type ValueFetcher interface {
	// Fetch returns the value, given the handle. It is acceptable to call
	// ValueFetcher.Fetch as long as the DB is open. However, one should assume
	// there is a fast-path when the iterator tree has not moved off the sstable
	// iterator that initially provided this LazyValue. Hence, to utilize this
	// fast-path the caller should try to decide whether it needs the value or
	// not as soon as possible, with as little stepping of the iterator as
	// possible.
	//
	// buf will be used if the fetcher cannot satisfy P2 (see earlier comment).
	// If the fetcher attempted to use buf *and* len(buf) was insufficient, it
	// will allocate a new slice for the value. In either case it will set
	// callerOwned to true.
	Fetch(
		handle []byte, valLen int32, buf []byte) (val []byte, callerOwned bool, err error)
}

// Value returns the underlying value. For an in-place value, buf is unused
// and callerOwned is false. Otherwise, see ValueFetcher.Fetch for how buf and
// callerOwned are used.
func (lv *LazyValue) Value(buf []byte) (val []byte, callerOwned bool, err error) {
	if lv.Fetcher == nil {
		return lv.ValueOrHandle, false, nil
	}
	// Do the rest of the work in a separate method to attempt mid-stack
	// inlining of Value(). Unfortunately, this still does not inline since the
	// cost of 85 exceeds the budget of 80.
	//
	// TODO(sumeer): Packing the return values into a struct{[]byte error bool}
	// causes it to be below the budget. Consider this if we need to recover
	// more performance. I suspect that inlining this only matters in
	// micro-benchmarks, and in actual use cases in CockroachDB it will not
	// matter because there is substantial work done with a fetched value.
	return lv.fetchValue(buf)
}

// INVARIANT: lv.Fetcher != nil
func (lv *LazyValue) fetchValue(buf []byte) (val []byte, callerOwned bool, err error) {
	f := lv.Fetcher
	if !f.fetched {
		// Fetch at most once; later calls return the remembered result.
		f.fetched = true
		f.value, f.callerOwned, f.err = f.Fetcher.Fetch(
			lv.ValueOrHandle, lv.Fetcher.Attribute.ValueLen, buf)
	}
	return f.value, f.callerOwned, f.err
}

// InPlaceValue returns the value under the assumption that it is in-place.
// When invariants are enabled, it panics if the value is not in-place. This
// is for Pebble-internal code.
func (lv *LazyValue) InPlaceValue() []byte {
	if invariants.Enabled && lv.Fetcher != nil {
		panic("value must be in-place")
	}
	return lv.ValueOrHandle
}

// Len returns the length of the value.
func (lv *LazyValue) Len() int {
	if lv.Fetcher == nil {
		return len(lv.ValueOrHandle)
	}
	return int(lv.Fetcher.Attribute.ValueLen)
}

// TryGetShortAttribute returns the ShortAttribute and a bool indicating
// whether the ShortAttribute was populated. It is populated only when the
// value is not in-place.
func (lv *LazyValue) TryGetShortAttribute() (ShortAttribute, bool) {
	if lv.Fetcher == nil {
		return 0, false
	}
	return lv.Fetcher.Attribute.ShortAttribute, true
}

// Clone creates a stable copy of the LazyValue, by appending bytes to buf.
// The fetcher parameter must be non-nil and may be overwritten and used
// inside the returned LazyValue -- this is needed to avoid an allocation.
// Most callers have at most K cloned LazyValues, where K is hard-coded, so
// they can have a pool of exactly K LazyFetcher structs they can reuse in
// these calls. The alternative of allocating LazyFetchers from a sync.Pool is
// not viable since we have no code trigger for returning to the pool
// (LazyValues are simply GC'd).
//
// NB: It is highly preferable that LazyValue.Value() has not been called,
// since the Clone will forget any previously extracted value, and a future
// call to Value will cause it to be fetched again. We do this since we don't
// want to reason about whether or not to clone an already extracted value
// inside the Fetcher (we don't). Property P1 applies here too: if lv1.Value()
// has been called, and then lv2 is created as a clone of lv1, then calling
// lv2.Value() can invalidate any backing memory maintained inside the fetcher
// for lv1 (even though these are the same values). We initially prohibited
// calling LazyValue.Clone() if LazyValue.Value() has been called, but there
// is at least one complex caller (pebbleMVCCScanner inside CockroachDB) where
// it is not easy to prove this invariant.
func (lv *LazyValue) Clone(buf []byte, fetcher *LazyFetcher) (LazyValue, []byte) {
	var lvCopy LazyValue
	if lv.Fetcher != nil {
		*fetcher = LazyFetcher{
			Fetcher:   lv.Fetcher.Fetcher,
			Attribute: lv.Fetcher.Attribute,
			// Not copying anything that has been extracted.
		}
		lvCopy.Fetcher = fetcher
	}
	vLen := len(lv.ValueOrHandle)
	if vLen == 0 {
		return lvCopy, buf
	}
	bufLen := len(buf)
	buf = append(buf, lv.ValueOrHandle...)
	lvCopy.ValueOrHandle = buf[bufLen : bufLen+vLen]
	return lvCopy, buf
}

// MakeInPlaceValue constructs an in-place value.
func MakeInPlaceValue(val []byte) LazyValue {
	return LazyValue{ValueOrHandle: val}
}