// Copyright 2022 The LUCI Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

// Package msgpackpb implements generic protobuf message serialization to
// msgpack.
//
// This library exists primarily to allow safe interchange between Lua scripts
// running inside of a Redis instance and external programs.
//
// It is intended to be fast and compact for Lua to decode via cmsgpack
// (specifically, the version of cmsgpack which ships with Redis 5.1+).
//
// To avoid implementing a brand new versioning or encoding scheme while not
// totally sacrificing performance and storage capabilities (e.g. by using
// JSONPB), we lean on the versioning and compatibility features of protobufs,
// and so choose a scheme which can be derived entirely from the proto schema
// definition.
//
// The scheme works by encoding a message as a map of `field tag` to `value`.
//
// The value can be a message, a scalar (bool, int, uint, float, string), a list
// of messages or scalars, or a map of (bool, int, uint, string) to messages or
// scalars.
//
// For example, the message:
//
//	message Foo {
//	  string field = 2;
//	  Foo recurse = 7;
//	}
//
// Would encode the instance `field: "hello" recurse {field: "hi"}` as msgpack:
//
//	{
//	  2: "hello",
//	  7: {
//	    2: "hi"
//	  }
//	}
//
// i.e.
// `82 02 a5 68 65 6c 6c 6f 07 81 02 a2 68 69`
//
// This would be 14 bytes vs 13 bytes for binary proto or 42 bytes for compact
// JSONPB.
//
// This encoding is simple enough that we can make a simple table-based
// encoder/decoder in Lua, but robust enough that as long as everyone follows
// the backwards compatibility rules for proto, we should be OK when having Go
// and Lua interact with the same encoded messages.
//
// # Unknown fields
//
// Unknown fields are saved in decoded protobufs as an unknown field tagged with
// INT32_MAX. The value of this field is essentially a filtered version of the
// Message: a map of `field tag` to `value`, but just for the unknown field
// tags.
//
// # Deterministic serialization
//
// This library optionally provides deterministic serialization of messages,
// which will:
//   - Sort all maps
//   - Order all messages by tag
//   - Emit all numeric types using the most compact representation in
//     msgpack.
//   - Walk all unknown fields, ordering messages there by tag, and interleaving
//     their field numbers with the known fields in sorted order.
//   - If any of the above would yield a map with integer keys from 1..N,
//     instead emit this as a list of just the values.
//
// It is intended that this encoding be stable across binary versions, and
// should be suitable for hashing (see MarshalStream; you can use io.MultiWriter
// to marshal to e.g. a strings.Builder at the same time that you write to
// a hash).
//
// # Notes
//
// NOTE: It would probably be a better idea to implement native protobuf
// encoding for Lua (and, in fact, there are Redis extensions for this), but we
// cannot currently use them with Cloud Memorystore, which is what manages our
// Redis instance.
// Writing a 'pure Lua' protobuf codec seemed like it would be a time sink, but
// it could still be an option if this msgpack encoding proves difficult to
// work with.
//
// NOTE: An alternative to this package would be to create a bespoke encoding
// for your data objects and include, e.g., a version identifier to allow for
// schema updates. This is actually the initial approach that we took before
// writing this library, but we deemed it unwieldy and were concerned about
// having to worry both about proto change semantics and about mapping them to
// the bespoke versioning semantics.
//
// NOTE: It would be possible to build a more efficient code generation
// marshalling implementation, but it presents a problem: doing so would require
// generating serialization code for ALL proto messages which could be encoded,
// which includes references to externally declared proto messages (for example,
// google.protobuf.Duration). As such, a code generation approach would need to
// diverge pretty significantly from the 'usual' Go codegen model (i.e. where
// the generated code lives next to the protos that generate the code).
// Specifically, the generated code for external protos would need to be
// generated into a common library or something like that which lives separately
// from the protos.
//
// The other alternative would be to partially generate the marshalling code for
// protos which we own, but then fall back to a reflection-based approach for
// messages which don't have the codegen encoding scheme. However, this means
// that we would need a fully working codegen scheme AND ALSO a fully working
// reflection-based scheme.
//
// For 'simplicity', we took the reflection-based approach as the initial and
// sole approach for now.
//
// NOTE: Unlike binary protobuf, msgpackpb messages cannot be concatenated to
// produce a single encodable message. However, concatenated messages can
// accurately be parsed serially and applied to the same Message using
// UnmarshalStream.
//
// NOTE: When using this to interact with Lua, keep in mind that Lua (until 5.3)
// stores all numbers as double-precision floating point (note that Redis looks
// like it's effectively stuck on Lua 5.1 indefinitely, as of late 2022).
// The upshot of this is that integer types will be integers until they hit 2^53
// or so (assuming that your Redis is using Lua with 64-bit numbers! It is
// possible to configure Lua to use 32-bit numbers...). If your Lua program
// serializes a number past this threshold, Go will refuse to decode it into
// a field with an integer type, so at least this won't cause silent corruption.
//
// NOTE: Lua only has a single type to represent maps and lists: the 'table'.
// Additionally, the Lua cmsgpack library will encode a table as a list if it
// 'looks like' a list. A table 'looks like' a list if it contains N entries and
// all the entries are keyed with the numbers 1 through N. This affects the
// encoding of both messages (which are maps of field tag to value) and proto
// fields which are maps.
//
// NOTE: Should you need to switch to another encoding, note that because this
// encoding ALWAYS encodes a message, the msgpack 'type' of the first item in
// a stream will always be either a map or a list (due to Lua table
// shenanigans). This means that you could insert a number into the stream as
// the first item instead, and use this to disambiguate future versions of this
// encoding.
package msgpackpb
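
// The Foo example from the package comment can be worked through by hand. The
// following is a minimal standalone sketch, not this package's implementation:
// it uses only the standard library, and the names foo, appendFixStr,
// appendFoo, firstItemKind and hexString are hypothetical helpers invented for
// illustration. It assumes proto3-style presence (an empty string or nil
// sub-message is treated as unset) and only handles the small msgpack "fix"
// types, which is enough for this example.

```go
package main

import "fmt"

// foo is a hand-written stand-in (not generated protobuf code) for:
//
//	message Foo { string field = 2; Foo recurse = 7; }
type foo struct {
	field   string
	recurse *foo
}

// appendFixStr appends s as a msgpack fixstr, which only covers strings
// shorter than 32 bytes.
func appendFixStr(out []byte, s string) []byte {
	if len(s) > 31 {
		panic("sketch only handles strings shorter than 32 bytes")
	}
	out = append(out, 0xa0|byte(len(s)))
	return append(out, s...)
}

// appendFoo appends m as a msgpack fixmap of field tag to value, writing the
// set fields in ascending tag order.
func appendFoo(out []byte, m *foo) []byte {
	n := 0
	if m.field != "" {
		n++
	}
	if m.recurse != nil {
		n++
	}
	out = append(out, 0x80|byte(n)) // fixmap header with n entries
	if m.field != "" {
		out = append(out, 0x02) // positive fixint: field tag 2
		out = appendFixStr(out, m.field)
	}
	if m.recurse != nil {
		out = append(out, 0x07) // positive fixint: field tag 7
		out = appendFoo(out, m.recurse)
	}
	return out
}

// firstItemKind classifies the leading byte of a msgpack stream. A reader
// could use it to tell a message (map or array) apart from a leading number
// inserted as a version marker.
func firstItemKind(b byte) string {
	switch {
	case b <= 0x7f:
		return "positive fixint"
	case b >= 0x80 && b <= 0x8f, b == 0xde, b == 0xdf:
		return "map"
	case b >= 0x90 && b <= 0x9f, b == 0xdc, b == 0xdd:
		return "array"
	default:
		return "other"
	}
}

// hexString renders b as space-separated lowercase hex bytes.
func hexString(b []byte) string {
	return fmt.Sprintf("% x", b)
}

func main() {
	msg := &foo{field: "hello", recurse: &foo{field: "hi"}}
	enc := appendFoo(nil, msg)
	fmt.Println(hexString(enc))                  // 82 02 a5 68 65 6c 6c 6f 07 81 02 a2 68 69
	fmt.Println(len(enc), firstItemKind(enc[0])) // 14 map
}
```

// The sketch emits fields in tag order only to keep the output reproducible.
// A real encoder would also need the wider msgpack integer, string and map
// headers, which are omitted here.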