go.chromium.org/luci@v0.0.0-20240309015107-7cdc2e660f33/common/proto/msgpackpb/doc.go

// Copyright 2022 The LUCI Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Package msgpackpb implements generic protobuf message serialization to
// msgpack.
//
// This library exists primarily to allow safe interchange between Lua scripts
// running inside of a Redis instance, and external programs.
//
// It is intended to be fast and compact for Lua to decode via cmsgpack
// (specifically, the version of cmsgpack which ships with Redis 5.1+).
//
// To avoid implementing a brand new versioning or encoding scheme while not
// totally sacrificing performance and storage capabilities (e.g. by using
// JSONPB), we lean on the versioning and compatibility features of protobufs,
// and so choose a scheme which can be derived entirely from the proto schema
// definition.
//
// The scheme works by encoding a message as a map of `field tag` to `value`.
//
// The value can be a message, a scalar (bool, int, uint, float, string), a list
// of messages or scalars, or a map of (bool, int, uint, string) to messages or
// scalars.
//
//	message Foo {
//	  string field = 2;
//	  Foo recurse = 7;
//	}
//
// Would encode the instance `field: "hello" recurse {field: "hi"}` as msgpack:
//
//	{
//	  2: "hello",
//	  7: {
//	    2: "hi"
//	  }
//	}
//
//	i.e. `82 02 a5 68 65 6c 6c 6f 07 81 02 a2 68 69`
//
// This would be 14 bytes vs 13 bytes for binary proto or 46 bytes for JSONPB.
//
// This encoding is simple enough that we can make a simple table-based
// encoder/decoder in Lua, but robust enough that as long as everyone follows
// the backwards compatibility rules for proto, we should be OK when having Go
// and Lua interact with the same encoded messages.
//
// # Unknown fields
//
// Unknown fields are saved in decoded protobufs as an unknown field tagged with
// INT32_MAX. The value of this field is essentially a filtered version of the
// Message; a map of `field tag` and `value`, but just for the unknown field
// tags.
//
// # Deterministic serialization
//
// This library optionally provides deterministic serialization of messages,
// which will:
//   - Sort all maps by key.
//   - Order the fields of every message by tag.
//   - Emit all numeric types using the most compact representation in
//     msgpack.
//   - Walk all unknown fields, ordering messages there by tag, and interleaving
//     their field numbers with the known fields in sorted order.
//   - If any of the above would yield a map with integer keys from 1..N,
//     instead emit this as a list of just the values.
//
// This encoding is intended to be stable across binary versions, and should
// be suitable for hashing (see MarshalStream; you can use io.MultiWriter to
// marshal to e.g. a strings.Builder at the same time that you write to a
// hash).
//
// # Notes
//
// NOTE: It would probably be a better idea to implement native protobuf
// encoding for Lua (and, in fact, there are Redis extensions for this), but we
// cannot currently use them with Cloud Memorystore, which is what manages our
// Redis instance. Writing a 'pure Lua' protobuf codec seemed like it would be
// a time sink, but it could still be an option if this msgpack encoding proves
// difficult to work with.
//
// NOTE: An alternative to this package would be to create a bespoke encoding
// for your data objects, and include, e.g. a version identifier, to allow for
// schema updates. This is actually the initial approach that we took before
// writing this library, but we deemed this to be unwieldy and were concerned
// about having to worry both about proto change semantics and mapping them to
// the bespoke versioning semantics.
//
// NOTE: It would be possible to build a more efficient code generation
// marshalling implementation, but it presents a problem: doing so would require
// generating serialization code for ALL proto messages which could be encoded,
// which includes references to externally declared proto messages (for example,
// google.protobuf.Duration). As such, a code generation approach would need to
// diverge pretty significantly from the 'usual' Go codegen model (i.e. where
// the generated code lives next to the protos that generate the code).
// Specifically, the generated code for external protos would need to be
// generated into a common library or something like that which lives separately
// from the protos.
//
// The other alternative would be to partially generate the marshalling code for
// protos which we own, but then fall back to a reflection-based approach for
// messages which don't have the codegen encoding scheme. However, this means
// that we would need a fully working codegen scheme AND ALSO a fully working
// reflection-based scheme.
//
// For 'simplicity', we took the reflection-based approach as the initial and
// sole approach for now.
//
// NOTE: Unlike binary protobuf, msgpackpb messages cannot be concatenated to
// produce a single valid encoded message. However, concatenated messages can
// accurately be parsed serially and applied to the same Message using
// UnmarshalStream.
//
// NOTE: When using this to interact with Lua, keep in mind that Lua (until 5.3)
// stores all numbers as double precision floating point (note that Redis looks
// like it's effectively stuck on Lua 5.1 indefinitely, as of late 2022).
// The upshot of this is that integer types will be integers until they hit 2^53
// or so (assuming that your Redis is using Lua with 64bit numbers! It is
// possible to configure Lua to use 32bit numbers...). If your Lua program
// serializes a number past this threshold, Go will refuse to decode it into
// a field with an integer type, so at least this won't cause silent corruption.
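//
// The precision limit can be demonstrated in Go, whose float64 is the same
// IEEE 754 double that Lua 5.1 uses by default. This is only an illustrative
// sketch: exactInt and maxExactInt are hypothetical names, and the check shown
// is not necessarily the one this package performs when decoding.
//
```go
package main

import (
	"fmt"
	"math"
)

// maxExactInt is 2^53. Every integer with a magnitude up to this value has an
// exact float64 representation; 2^53+1 is the first one that does not.
const maxExactInt = 1 << 53

// exactInt reports whether f is a whole number within the range where float64
// holds integers without rounding, and returns it as an int64 if so.
func exactInt(f float64) (int64, bool) {
	if f != math.Trunc(f) || math.Abs(f) > maxExactInt {
		return 0, false
	}
	return int64(f), true
}

func main() {
	var limit int64 = maxExactInt
	// 2^53+1 rounds to 2^53 when stored as a double, so the two compare equal.
	fmt.Println(float64(limit+1) == float64(limit)) // true

	_, ok := exactInt(float64(limit) * 2)
	fmt.Println(ok) // false: past the exact-integer range

	n, ok := exactInt(12345)
	fmt.Println(n, ok) // 12345 true
}
```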
//
// NOTE: Lua only has a single type to represent maps and lists: the 'table'.
// Additionally, the Lua cmsgpack library will encode a table as a list if it
// 'looks like' a list. A table 'looks like' a list if it contains N entries and
// all the entries are keyed with the numbers 1 through N. This affects the
// encoding of both messages (which are maps of field tag to value) as well as
// proto fields which are maps.
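//
// The 'looks like a list' rule described above can be sketched as follows.
// looksLikeList and asList are hypothetical helpers for this illustration
// only; they model the rule as stated here, not cmsgpack's actual code, and
// in this sketch an empty map trivially satisfies the rule.
//
```go
package main

import "fmt"

// looksLikeList reports whether m has N entries keyed exactly 1 through N.
// Since m has len(m) entries, finding every key in 1..len(m) is sufficient.
func looksLikeList(m map[int]string) bool {
	for i := 1; i <= len(m); i++ {
		if _, ok := m[i]; !ok {
			return false
		}
	}
	return true
}

// asList returns the values of m in key order when m looks like a list.
func asList(m map[int]string) ([]string, bool) {
	if !looksLikeList(m) {
		return nil, false
	}
	out := make([]string, len(m))
	for k, v := range m {
		out[k-1] = v
	}
	return out, true
}

func main() {
	// Keys 1 and 2 cover 1..N, so this would be emitted as a list.
	fmt.Println(asList(map[int]string{1: "a", 2: "b"})) // [a b] true

	// Field tags 2 and 7 do not cover 1..N, so this stays a map.
	fmt.Println(asList(map[int]string{2: "hello", 7: "x"})) // [] false
}
```
//
// A decoder therefore has to accept a list wherever a message or map is
// expected, treating element i of the list as the value for key i.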
//
// NOTE: Should you need to switch to another encoding, note that because this
// encoding ALWAYS encodes a message, the msgpack 'type' of the first item in
// a stream will always be either a map or a list (due to Lua table
// shenanigans). This means that you could insert a number into the stream as
// the first item, instead, and use this to disambiguate future versions of this
// encoding.
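//
// Such a check only needs the first byte of the stream. The sketch below
// classifies that byte using the type ranges from the msgpack format;
// firstItemKind is a hypothetical helper, and treating a leading integer as a
// version marker is the idea floated above, not something this package does
// today.
//
```go
package main

import "fmt"

// firstItemKind classifies the first byte of a msgpack stream as a map, a
// list, an integer, or something else.
func firstItemKind(b byte) string {
	switch {
	case b <= 0x7f, b >= 0xe0: // positive fixint, negative fixint
		return "integer"
	case b >= 0xcc && b <= 0xd3: // uint8..uint64, int8..int64
		return "integer"
	case b >= 0x80 && b <= 0x8f, b == 0xde, b == 0xdf: // fixmap, map16, map32
		return "map"
	case b >= 0x90 && b <= 0x9f, b == 0xdc, b == 0xdd: // fixarray, array16, array32
		return "list"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(firstItemKind(0x82)) // map: a message in the current encoding
	fmt.Println(firstItemKind(0x92)) // list: a message whose tags are 1..N
	fmt.Println(firstItemKind(0x01)) // integer: could mark a future version
}
```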
package msgpackpb