github.com/weaviate/weaviate@v1.24.6/adapters/repos/db/roaringset/doc.go

//                           _       _
// __      _____  __ ___   ___  __ _| |_ ___
// \ \ /\ / / _ \/ _` \ \ / / |/ _` | __/ _ \
//  \ V  V /  __/ (_| |\ V /| | (_| | ||  __/
//   \_/\_/ \___|\__,_| \_/ |_|\__,_|\__\___|
//
//  Copyright © 2016 - 2024 Weaviate B.V. All rights reserved.
//
//  CONTACT: hello@weaviate.io
//

// Package roaringset contains all the LSM business logic that is unique
// to the "RoaringSet" strategy.
//
// This package alone does not contain an entire LSM store. It is intended to
// be used as part of the [github.com/weaviate/weaviate/adapters/repos/db/lsmkv]
// package.
//
// # Motivation
//
// What makes the RoaringSet strategy unique is that it's essentially a fully
// persistent Roaring Bitmap that can be built up and updated incrementally
// (without write amplification) while being extremely fast to query.
//
// Without this specific strategy, it would not be efficient to use roaring
// bitmaps in an LSM store. For example:
//
//   - Lucene uses posting lists in the inverted index on disk and supports
//     converting them to a Roaring Bitmap at query time. The resulting bitmap
//     can then be cached. However, the cost of initially converting a posting
//     list to a roaring bitmap is considerable: in our own tests, inserting
//     90M out of 100M possible ids into a [github.com/weaviate/sroar.Bitmap]
//     took about 3.5s.
//
//   - You could store a regular roaring bitmap, such as
//     [github.com/weaviate/sroar.Bitmap], in a conventional LSM store, such
//     as RocksDB. This would fix the retrieval issue: you should be able to
//     retrieve and initialize a bitmap containing 90M objects in a few
//     milliseconds. However, the cost of incrementally updating this bitmap
//     would be extreme. You would have to use a read-modify-write pattern,
//     which leads to massive write amplification on large setups. A 90M
//     roaring bitmap is about 10.5MB, so to add a single entry (which takes
//     up anywhere from 1 bit to 2 bytes), you would have to read 10.5MB and
//     write 10.5MB again. That is not feasible except for bulk loading, and
//     in Weaviate we cannot always assume bulk loading, as user behavior and
//     insert orders are generally unpredictable.
//
// We solve this issue by making the LSM store roaring-bitmap-native. This
// way, we combine the benefits of an LSM store (very fast writes) with the
// benefits of a serialized roaring bitmap (very fast reads/initializations).
//
// Essentially, this means the RoaringSet strategy behaves like a fully
// persistent (and durable) Roaring Bitmap. See the next section to learn how
// it works under the hood.
//
// # Internals
//
// The public-facing methods make use of [github.com/weaviate/sroar.Bitmap].
// This serialized bitmap already fulfills many of the criteria needed in
// Weaviate. It can be initialized at almost no cost (sharing memory) or very
// little cost (copying some memory). Furthermore, its set, remove, and
// intersection methods work well for the inverted index use cases in Weaviate.
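//
// The following sketch illustrates those three operations on a toy,
// fixed-capacity bitset built from plain uint64 words. This is not the
// sroar API (sroar adds the roaring container layout and cheap
// (de)serialization on top); it only shows the bit-level behavior the
// strategy builds on:

```go
package main

import "fmt"

// bitset is a toy fixed-capacity bitmap; ids beyond the capacity
// passed to newBitset must not be used.
type bitset []uint64

func newBitset(max uint64) bitset { return make(bitset, max/64+1) }

func (b bitset) set(id uint64)    { b[id/64] |= 1 << (id % 64) }
func (b bitset) remove(id uint64) { b[id/64] &^= 1 << (id % 64) }

func (b bitset) contains(id uint64) bool { return b[id/64]&(1<<(id%64)) != 0 }

// and intersects two equal-capacity bitsets word by word, the core of
// inverted-index filtering: one machine AND covers 64 ids at once.
func and(a, b bitset) bitset {
	out := make(bitset, len(a))
	for i := range out {
		out[i] = a[i] & b[i]
	}
	return out
}

func main() {
	a, b := newBitset(1000), newBitset(1000)
	a.set(3)
	a.set(5)
	a.set(900)
	b.set(5)
	b.set(900)
	b.remove(900)

	got := and(a, b)
	fmt.Println(got.contains(5), got.contains(3), got.contains(900)) // true false false
}
```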
//
// So, the novel part of the lsmkv.RoaringSet strategy does not sit in the
// roaring bitmap itself, but rather in the way it is persisted. It uses the
// standard principles of an LSM store: each new write is first cached in a
// memtable (and, of course, written into a Write-Ahead Log to make it
// durable). The memtable is flushed into a disk segment when specific
// criteria are met (memtable size, WAL size, idle time, time since last
// flush, etc.).
//
// This means that each layer (represented by [BitmapLayer]) only contains
// the deltas that were written in a specific time interval. When reading,
// all layers must be combined into a single bitmap (see
// [BitmapLayers.Flatten]).
//
// Over time, segments can be combined into fewer, larger segments using an
// LSM compaction process. The logic for that can be found in
// [BitmapLayers.Merge].
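//
// Both operations can be sketched with plain Go sets standing in for
// roaring bitmaps. The layer type and helpers below are illustrative only,
// not this package's actual API; the point is the invariant that compaction
// must preserve: flattening the merged layer yields the same result as
// flattening the original layers in order.

```go
package main

import (
	"fmt"
	"sort"
)

// layer stands in for a BitmapLayer: the additions and deletions
// recorded during one time interval.
type layer struct {
	additions map[uint64]bool
	deletions map[uint64]bool
}

// flatten applies layers oldest-to-newest, the way per-segment deltas
// are combined into one bitmap: additions are ORed in, then deletions
// are subtracted, so newer layers win over older ones.
func flatten(layers []layer) map[uint64]bool {
	out := map[uint64]bool{}
	for _, l := range layers {
		for id := range l.additions {
			out[id] = true
		}
		for id := range l.deletions {
			delete(out, id)
		}
	}
	return out
}

// merge compacts two adjacent layers into one while preserving flatten
// semantics: newer deletions cancel older additions, and newer
// additions cancel older deletions.
func merge(older, newer layer) layer {
	m := layer{additions: map[uint64]bool{}, deletions: map[uint64]bool{}}
	for id := range older.additions {
		if !newer.deletions[id] {
			m.additions[id] = true
		}
	}
	for id := range newer.additions {
		m.additions[id] = true
	}
	for id := range older.deletions {
		m.deletions[id] = true
	}
	for id := range newer.deletions {
		m.deletions[id] = true
	}
	for id := range newer.additions {
		delete(m.deletions, id)
	}
	return m
}

func main() {
	older := layer{additions: set(1, 2, 3), deletions: set()}
	newer := layer{additions: set(4), deletions: set(2)}

	fmt.Println(ids(flatten([]layer{older, newer})))        // [1 3 4]
	fmt.Println(ids(flatten([]layer{merge(older, newer)}))) // [1 3 4]
}

func set(vals ...uint64) map[uint64]bool {
	s := map[uint64]bool{}
	for _, v := range vals {
		s[v] = true
	}
	return s
}

func ids(s map[uint64]bool) []uint64 {
	out := make([]uint64, 0, len(s))
	for id := range s {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}
```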
//
// To make sure access is efficient, the entire RoaringSet strategy is built
// to avoid encoding/decoding steps. Instead, we store data internally as
// simple byte slices. For example, see [SegmentNode]. You can access bitmaps
// without any meaningful allocations using [SegmentNode.Additions] and
// [SegmentNode.Deletions]. If you plan to hold on to a bitmap for longer
// than you hold the lock that prevents a compaction, you need to copy the
// data (e.g. using [SegmentNode.AdditionsWithCopy]). Even with such a copy,
// reading a bitmap containing 90M ids takes only single-digit milliseconds.
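//
// The aliasing rule above comes down to Go slice semantics: a view shares
// the segment buffer's backing memory, while a copy does not. The snippet
// below is a generic illustration and does not use this package's types:

```go
package main

import "fmt"

func main() {
	// buf simulates a segment buffer that a compaction may later
	// rewrite, recycle, or unmap.
	buf := []byte{1, 2, 3, 4}

	view := buf[1:3] // zero-copy: shares buf's backing array
	copied := make([]byte, 2)
	copy(copied, buf[1:3]) // independent: safe beyond buf's lifetime

	buf[1] = 99 // simulate the buffer being rewritten

	fmt.Println(view[0])   // 99: the view observed the change
	fmt.Println(copied[0]) // 2: the copy did not
}
```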
package roaringset