go.chromium.org/luci@v0.0.0-20240309015107-7cdc2e660f33/logdog/common/storage/bigtable/doc.go (about)

// Copyright 2015 The LUCI Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Package bigtable provides an implementation of the Storage interface backed
// by Google Cloud Bigtable.
//
// # Intermediate Log Table
//
// The Intermediate Log Table stores LogEntry protobufs that have been ingested
// but not yet archived. It is a tall table whose rows are keyed by the log's
// (Path, Stream Index) pair, in that order.
//
// Each row in the table has the following schema:
//   - Column Family "log"
//   - Column "data": the raw LogEntry protobuf data. Soft size limit of ~1MB.
//
// The log path is the composite of the log's (Prefix, Name) properties. Logs
// belonging to the same stream share the same path, so they are clustered
// together and can be iterated efficiently. The path is immediately followed
// by a '~' separator and the log's stream index:
//
//	[            20 bytes          ]     ~    [       1-5 bytes      ]
//	 B64(SHA256(Path(Prefix, Name)))  + '~' + HEX(cmpbin(StreamIndex))
//
// Because there is no (technical) size constraint on either the Prefix or the
// Name value, the path is hashed using SHA256 to produce a fixed-size key
// representing that specific log stream.
//
// The '~' separator allows a key representing "immediately after the last row
// of the stream" to be generated by appending two '~' characters to the base
// hash. Since '~' sorts after every hexadecimal digit, the second '~' is always
// greater than any HEX(cmpbin(*)) value, so this key effectively upper-bounds
// the stream's rows.
//
// The stream index is formatted using "cmpbin"
// (go.chromium.org/luci/common/cmpbin), a variable-width number encoding
// scheme which guarantees that byte-sorted encoded numbers maintain the same
// order as the numbers themselves.
package bigtable
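
// The row key layout described in the package comment can be sketched roughly
// as the standalone program below. It is an illustration under stated
// assumptions, not this package's implementation: the helper names (pathHash,
// encodeIndex, rowKey, upperBound) are hypothetical, the "/+/" path separator
// and the URL-safe base64 alphabet are assumptions, and encodeIndex is a
// simple order-preserving stand-in for cmpbin (it is NOT the cmpbin algorithm)
// so that the example needs only the standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// pathHash hashes the (Prefix, Name) path with SHA256 and base64-encodes it.
// ASSUMPTION: the path is joined with "/+/" and encoded with the URL-safe
// base64 alphabet; the package comment does not specify either detail.
func pathHash(prefix, name string) string {
	sum := sha256.Sum256([]byte(prefix + "/+/" + name))
	return base64.URLEncoding.EncodeToString(sum[:])
}

// encodeIndex is a hypothetical stand-in for cmpbin. It writes one length
// byte followed by the significant bytes of i in big-endian order, so that
// comparing encodings byte by byte matches comparing the numbers.
func encodeIndex(i uint64) []byte {
	n := 0
	for v := i; v > 0; v >>= 8 {
		n++
	}
	out := []byte{byte(n)}
	for k := n - 1; k >= 0; k-- {
		out = append(out, byte(i>>(8*uint(k))))
	}
	return out
}

// rowKey builds hash + '~' + HEX(encodedIndex). Lowercase hex keeps the
// byte order of the encoded index intact.
func rowKey(prefix, name string, index uint64) string {
	return pathHash(prefix, name) + "~" + hex.EncodeToString(encodeIndex(index))
}

// upperBound builds hash + "~~". Because '~' (0x7E) sorts after every hex
// digit, this key sorts after every row key of the same stream.
func upperBound(prefix, name string) string {
	return pathHash(prefix, name) + "~~"
}

func main() {
	for _, idx := range []uint64{0, 1, 255, 256, 70000} {
		fmt.Println(rowKey("my/prefix", "stdout", idx))
	}
	fmt.Println(upperBound("my/prefix", "stdout"))
}
```

// A range scan over one stream could then, under the same assumptions, start
// at rowKey(prefix, name, firstIndex) and stop before upperBound(prefix, name).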