github.com/weaviate/weaviate@v1.24.6/adapters/repos/db/lsmkv/doc.go

//                           _       _
// __      _____  __ ___   ___  __ _| |_ ___
// \ \ /\ / / _ \/ _` \ \ / / |/ _` | __/ _ \
//  \ V  V /  __/ (_| |\ V /| | (_| | ||  __/
//   \_/\_/ \___|\__,_| \_/ |_|\__,_|\__\___|
//
//  Copyright © 2016 - 2024 Weaviate B.V. All rights reserved.
//
//  CONTACT: hello@weaviate.io
//

/*
# LSMKV (= Log-structured Merge-Tree Key-Value Store)

This package contains Weaviate's custom LSM store. Although it was modeled after
the use cases Weaviate requires to be fast, reliable, and scalable, it is
technically completely independent of Weaviate. You could build your own
database on top of this key-value store.

Covering the architecture of [LSM Stores] in general is beyond the scope of
this documentation. Instead, the parts that are specific to this
implementation are highlighted.

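The strategies below refer to segments and compactions without defining them. As a deliberately simplified aid, and not a description of this package's internals, the following Go sketch models a generic LSM store with replace semantics. Every name in it (store, segment, Put, Flush, Get, Compact) is hypothetical; segments are plain in-memory maps rather than sorted files on disk, and a nil value stands in for a tombstone.

```go
package main

import "fmt"

// segment is an immutable snapshot of key/value pairs. In this sketch a nil
// value is a tombstone that marks the key as deleted.
type segment map[string]*string

// store is a hypothetical toy LSM store with replace semantics: at most one
// live value per key.
type store struct {
	memtable segment   // mutable, receives all writes
	segments []segment // immutable, oldest first
}

func newStore() *store {
	return &store{memtable: segment{}}
}

// Put overwrites any earlier value for key in the memtable.
func (s *store) Put(key, value string) {
	s.memtable[key] = &value
}

// Delete records a tombstone that hides older values of key.
func (s *store) Delete(key string) {
	s.memtable[key] = nil
}

// Flush freezes the memtable into a new segment and starts an empty one.
func (s *store) Flush() {
	s.segments = append(s.segments, s.memtable)
	s.memtable = segment{}
}

// Get checks the memtable, then segments from newest to oldest. The first
// entry found wins, so replaced values are shadowed by newer ones.
func (s *store) Get(key string) (string, bool) {
	if v, ok := s.memtable[key]; ok {
		return live(v)
	}
	for i := len(s.segments) - 1; i >= 0; i-- {
		if v, ok := s.segments[i][key]; ok {
			return live(v)
		}
	}
	return "", false
}

// live converts a stored entry into a result; tombstones read as "not found".
func live(v *string) (string, bool) {
	if v == nil {
		return "", false
	}
	return *v, true
}

// Compact merges all segments into one, keeping only the newest entry per key.
// Tombstones can be dropped because no older segment remains to resurrect them.
func (s *store) Compact() {
	merged := segment{}
	for _, seg := range s.segments { // oldest to newest, so newer entries win
		for k, v := range seg {
			merged[k] = v
		}
	}
	for k, v := range merged {
		if v == nil {
			delete(merged, k)
		}
	}
	s.segments = []segment{merged}
}

func main() {
	s := newStore()
	s.Put("foo", "bar")
	s.Flush()
	s.Put("foo", "baz") // replaces "bar"
	s.Flush()
	fmt.Println(len(s.segments)) // 2: "bar" is still stored, just shadowed
	s.Compact()
	v, ok := s.Get("foo")
	fmt.Println(v, ok, len(s.segments)) // baz true 1
}
```

Because reads consult the newest data first, a replaced value becomes invisible as soon as it is overwritten, but it only disappears from storage once a compaction merges the segments that hold it.
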
# Strategies

To understand the different types of buckets in this store, you need to
familiarize yourself with the following strategies. Each strategy defines a
different use case for a [Bucket].

  - "Replace"

    Replace resembles the classical key-value store. Each key has exactly one
    value. A subsequent PUT on an existing key replaces the value (hence the
    name "replace"). Once replaced, a former value can no longer be retrieved,
    and it will eventually be removed in compactions.

  - "Set" (aka "SetCollection")

    A set behaves like an unordered collection of independent values. In other
    words, a single key has multiple values. For example, for key "foo", you
    could have values "bar1", "bar2", "bazzinga". A bucket of this type is
    optimized for cheap writes that add new elements to a set. For example,
    adding another element has a fixed cost, independent of the size of the
    existing set. This makes it very well suited for building an inverted
    index.

    Retrieving a Set has a slight cost if the set is spread across multiple
    segments, because each segment's part has to be read and merged. This cost
    shrinks as more and more compactions happen. In the ideal case (a fully
    compacted DB), retrieving a Set requires just a single disk read.

  - "Map" (aka "MapCollection")

    Maps are similar to Sets in the sense that a single key has multiple
    values. However, each value is itself a key-value pair, which makes this
    type very similar to a dict or hashmap. For example, for key "foo", you
    could have the value pairs "bar":17 and "baz":19.

    This makes a map well suited for an inverted index that needs to store
    additional information beyond the docid pointer, such as a BM25 index,
    where the term frequency needs to be stored as well.

    The same performance considerations as for sets apply.

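The cost model described for Sets and Maps can be illustrated with a small in-memory sketch. It assumes segments are plain Go maps and models only additions and overwrites, no deletions; the collections type and its methods are hypothetical and do not reflect this package's API. Adding to a set touches only the newest segment, reading visits every segment that holds part of the set, and after a full compaction a single segment answers the read. For maps, the sketch assumes that a newer pair replaces an older pair with the same map key.

```go
package main

import "fmt"

// Segments are kept oldest first. Per row key, a set segment holds the
// elements added while it was active and a map segment holds the pairs.
type (
	setSegment map[string][]string
	mapSegment map[string]map[string]int
)

// collections is a hypothetical toy store; only additions and overwrites
// are modeled, no deletions.
type collections struct {
	sets []setSegment
	maps []mapSegment
}

func newCollections() *collections {
	return &collections{sets: []setSegment{{}}, maps: []mapSegment{{}}}
}

// NewSegment simulates a flush: later writes go to a fresh, empty segment.
func (c *collections) NewSegment() {
	c.sets = append(c.sets, setSegment{})
	c.maps = append(c.maps, mapSegment{})
}

// AddToSet appends to the newest segment only, so its cost does not depend on
// how many elements the set already holds.
func (c *collections) AddToSet(key string, values ...string) {
	seg := c.sets[len(c.sets)-1]
	seg[key] = append(seg[key], values...)
}

// GetSet unions and deduplicates the parts of a set across all segments. It
// also returns how many segments held a part, as a stand-in for disk reads.
func (c *collections) GetSet(key string) (out []string, reads int) {
	seen := map[string]bool{}
	for _, seg := range c.sets {
		if _, ok := seg[key]; !ok {
			continue
		}
		reads++
		for _, v := range seg[key] {
			if !seen[v] {
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	return out, reads
}

// PutMapPair writes one pair, for example a docID and its term frequency.
func (c *collections) PutMapPair(key, mapKey string, value int) {
	seg := c.maps[len(c.maps)-1]
	if seg[key] == nil {
		seg[key] = map[string]int{}
	}
	seg[key][mapKey] = value
}

// GetMap merges pairs from oldest to newest, so a newer value wins per map key.
func (c *collections) GetMap(key string) map[string]int {
	out := map[string]int{}
	for _, seg := range c.maps {
		for mk, v := range seg[key] {
			out[mk] = v
		}
	}
	return out
}

// Compact collapses all segments into one, so later reads touch one segment.
func (c *collections) Compact() {
	sets, maps := setSegment{}, mapSegment{}
	for _, seg := range c.sets {
		for k := range seg {
			sets[k], _ = c.GetSet(k)
		}
	}
	for _, seg := range c.maps {
		for k := range seg {
			maps[k] = c.GetMap(k)
		}
	}
	c.sets, c.maps = []setSegment{sets}, []mapSegment{maps}
}

func main() {
	c := newCollections()
	c.AddToSet("foo", "bar1", "bar2")
	c.PutMapPair("foo", "bar", 17)
	c.NewSegment() // simulate a flush
	c.AddToSet("foo", "bazzinga")
	c.PutMapPair("foo", "baz", 19)
	vals, reads := c.GetSet("foo")
	fmt.Println(vals, reads) // [bar1 bar2 bazzinga] 2
	c.Compact()
	vals, reads = c.GetSet("foo")
	fmt.Println(vals, reads)     // [bar1 bar2 bazzinga] 1
	fmt.Println(c.GetMap("foo")) // map[bar:17 baz:19]
}
```

The reads counter is only a proxy for disk reads: it drops from 2 to 1 once the two segments are compacted, mirroring the "single disk read" ideal described above.
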
# Navigate around these docs

Good entry points to learn more about how this package works include [Store]
with [New] and [Store.CreateOrLoadBucket], as well as [Bucket] with
[Bucket.Get], [Bucket.GetBySecondary], [Bucket.Put], etc.

Each strategy also has a matching cursor type: [CursorReplace] is created with
[Bucket.Cursor], [CursorSet] with [Bucket.SetCursor], and [CursorMap] with
[Bucket.MapCursor].

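As an illustration of what a cursor does, the sketch below walks the live keys of several replace-style segments in ascending order, with newer segments winning and tombstones skipped. The First, Next, and Seek methods and all other names are hypothetical and are not meant to describe the method sets of [CursorReplace], [CursorSet], or [CursorMap]. For brevity the sketch materializes the merged view up front; a cursor over large on-disk segments would more plausibly stream a k-way merge instead.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one live key/value pair as seen through the cursor.
type entry struct{ key, value string }

// cursor is a hypothetical replace-style cursor over several segments. For
// brevity it materializes the merged, sorted view when it is created.
type cursor struct {
	entries []entry
	pos     int
}

// newCursor merges segments given oldest first. Newer entries win, and a nil
// value is a tombstone that hides the key entirely.
func newCursor(segments []map[string]*string) *cursor {
	latest := map[string]*string{}
	for _, seg := range segments {
		for k, v := range seg {
			latest[k] = v
		}
	}
	c := &cursor{}
	for k, v := range latest {
		if v != nil {
			c.entries = append(c.entries, entry{k, *v})
		}
	}
	sort.Slice(c.entries, func(i, j int) bool {
		return c.entries[i].key < c.entries[j].key
	})
	return c
}

// current returns the entry under the cursor; ok is false once exhausted.
func (c *cursor) current() (key, value string, ok bool) {
	if c.pos >= len(c.entries) {
		return "", "", false
	}
	e := c.entries[c.pos]
	return e.key, e.value, true
}

// First moves to the smallest key.
func (c *cursor) First() (string, string, bool) {
	c.pos = 0
	return c.current()
}

// Next advances to the following key.
func (c *cursor) Next() (string, string, bool) {
	if c.pos < len(c.entries) {
		c.pos++
	}
	return c.current()
}

// Seek moves to the first key that is >= target.
func (c *cursor) Seek(target string) (string, string, bool) {
	c.pos = sort.Search(len(c.entries), func(i int) bool {
		return c.entries[i].key >= target
	})
	return c.current()
}

func ptr(s string) *string { return &s }

func main() {
	older := map[string]*string{"a": ptr("1"), "b": ptr("2"), "c": ptr("3")}
	newer := map[string]*string{"b": ptr("20"), "c": nil} // c is deleted
	c := newCursor([]map[string]*string{older, newer})
	for k, v, ok := c.First(); ok; k, v, ok = c.Next() {
		fmt.Println(k, v)
	}
	// Prints:
	// a 1
	// b 20
}
```
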
[LSM Stores]: https://en.wikipedia.org/wiki/Log-structured_merge-tree
*/
package lsmkv