# Top Level Scalar Unmarshaller

## Overview

It's recommended that readers familiarize themselves with the [proto3 encoding documentation](https://developers.google.com/protocol-buffers/docs/encoding) before reading the remainder of this document.

The Encoder in this package is responsible for encoding an unbuffered stream of marshalled Protobuf messages into a new compressed stream one message at a time.

In order to accomplish this, it needs to unmarshal the Protobuf messages so that it can re-encode their values.

Since the schemas for the Protobuf messages are provided dynamically (and thus efficient unmarshalling code cannot be generated ahead of time), the easiest way to accomplish the unmarshalling is to rely on a dynamic Protobuf package like `jhump/protoreflect` to perform the heavy lifting.

For M3DB, a solution like this is prohibitively inefficient: unmarshalling into a `*dynamic.Message` is expensive, as it involves `interface{}` magic and allocates a large number of short-lived objects that are difficult to reuse.
It's especially inefficient for Protobuf schemas that are optimized for this package (specifically those that make heavy use of top-level scalar fields, where allocating objects on the heap just to wrap primitive types is particularly wasteful).

As a result, this package implements `customFieldUnmarshaller`, which accepts a marshalled Protobuf message (`[]byte`) and exposes methods for unmarshalling the top-level scalar fields (i.e. the fields that the encoder can perform custom compression on) in an efficient and reusable manner, such that in the general case there are no allocations.
It also exposes a `nonCustomFieldValues` method, which returns the **marshalled** bytes for every field that is not a top-level scalar and that the unmarshaller therefore could not unmarshal.

## Implementation

### Overview

The implementation is broken into two parts:

1. The `buffer`, which is similar to [protoreflect's dynamic codec](https://github.com/jhump/protoreflect/blob/master/dynamic/codec.go) and provides an interface for iterating over a marshalled Protobuf message, one `<fieldNumber, wireType, value>` tuple at a time.

2. The `customFieldUnmarshaller`, which wraps the `buffer` and exposes an interface for efficiently unmarshalling top-level scalar fields with no allocations, as well as returning the marshalled bytes for any fields that the unmarshaller doesn't have an efficient unmarshalling codepath for (such as `map` fields, `repeated` fields, and nested messages).

### Buffer

The code in the `buffer` is mostly self-explanatory for anyone familiar with the [proto3 encoding format](https://developers.google.com/protocol-buffers/docs/encoding).

### CustomFieldUnmarshaller

The `customFieldUnmarshaller` has three primary responsibilities:

1. Provide an interface for efficiently unmarshalling top-level scalar fields in a marshalled Protobuf message without allocating.
2. Ensure that the values unmarshalled in #1 are sorted by field number.
3. Return a slice of `marshalledField` that contains *only* the fields that could not be unmarshalled efficiently in #1. This does not allocate or expend any resources at all in the case of optimized schemas that only contain fields that can be handled by #1.

The `customFieldUnmarshaller` works by iterating through all the `<fieldNumber, wireType, value>` tuples in the marshalled Protobuf message and checking whether they are supported by the efficient code path.
If they are, it unmarshals the value into an `unmarshalValue`, which is a space-optimized type that can be reused without any allocations.
If the tuple cannot be unmarshalled efficiently, the unmarshaller keeps track of the bytes that represent that tuple and returns them as part of the (sorted) `[]marshalledField` when unmarshalling is complete. This approach supports any combination of custom and non-custom fields, in any order.

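The partitioning step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: `entry`, `scalarValue`, `rawField`, and `splitter` are hypothetical types, the tuples are assumed to be already decoded, and the set of field numbers with an efficient code path is assumed to be known from the schema. The sketch sorts with `sort.SliceStable` for brevity, which keeps repeated occurrences of the same field number in their original order, and it truncates its output slices with `[:0]` so that their backing arrays are reused across messages.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical, already-decoded <fieldNumber, wireType, value> tuple.
type entry struct {
	fieldNum int32
	value    uint64 // decoded scalar value, if the field is a top-level scalar
	raw      []byte // marshalled bytes of the whole tuple
}

// scalarValue and rawField are hypothetical stand-ins for the two output kinds.
type scalarValue struct {
	fieldNum int32
	value    uint64
}

type rawField struct {
	fieldNum   int32
	marshalled []byte
}

type splitter struct {
	custom    map[int32]bool // field numbers with an efficient code path (assumed known)
	values    []scalarValue  // reused across calls to reset
	nonCustom []rawField     // reused across calls to reset
}

// reset partitions entries into scalar values and raw fields, both sorted by field number.
func (s *splitter) reset(entries []entry) {
	// Truncate instead of reallocating so the backing arrays are reused.
	s.values = s.values[:0]
	s.nonCustom = s.nonCustom[:0]

	for _, e := range entries {
		if s.custom[e.fieldNum] {
			s.values = append(s.values, scalarValue{fieldNum: e.fieldNum, value: e.value})
		} else {
			s.nonCustom = append(s.nonCustom, rawField{fieldNum: e.fieldNum, marshalled: e.raw})
		}
	}

	sort.SliceStable(s.values, func(i, j int) bool {
		return s.values[i].fieldNum < s.values[j].fieldNum
	})
	sort.SliceStable(s.nonCustom, func(i, j int) bool {
		return s.nonCustom[i].fieldNum < s.nonCustom[j].fieldNum
	})
}

func main() {
	s := &splitter{custom: map[int32]bool{1: true, 3: true}}
	// Fields arrive out of order; field 2 has no efficient code path.
	s.reset([]entry{
		{fieldNum: 3, value: 30},
		{fieldNum: 2, raw: []byte{0x12, 0x00}},
		{fieldNum: 1, value: 10},
	})
	fmt.Println(s.values, s.nonCustom)
}
```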
The output of unmarshalling is a slice of `unmarshalValue`s (sorted by field number) containing all custom-encoded values and a `[]marshalledField` containing any complex fields that cannot be unmarshalled or compressed efficiently. This value slice is reused to mitigate allocation costs for subsequent unmarshalling.

Note that the `customFieldUnmarshaller` only returns an `unmarshalValue` for fields that were actually encoded into the stream. According to the [proto3 encoding format](https://developers.google.com/protocol-buffers/docs/encoding), fields set to their default values are omitted from the marshalled stream.
Thus, if an `unmarshalValue` is not present for a field (and the given field would nominally unmarshal to an `unmarshalValue`), then that value is the type's default value.

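To make the default-value rule concrete, here is a minimal sketch of how a consumer might look up a field in the sorted output and fall back to the zero value when the field was omitted. The `scalarValue` type and the `lookup` function are hypothetical and exist only for this example; the sketch assumes an integer field and relies on the slice being sorted by field number.

```go
package main

import (
	"fmt"
	"sort"
)

// scalarValue is a hypothetical stand-in for an unmarshalled top-level scalar.
type scalarValue struct {
	fieldNum int32
	value    int64
}

// lookup returns the value for fieldNum and whether it was present in the stream.
// values must be sorted by field number. A missing field yields the zero value,
// mirroring proto3's omission of default values from the marshalled bytes.
func lookup(values []scalarValue, fieldNum int32) (int64, bool) {
	i := sort.Search(len(values), func(i int) bool {
		return values[i].fieldNum >= fieldNum
	})
	if i < len(values) && values[i].fieldNum == fieldNum {
		return values[i].value, true
	}
	return 0, false
}

func main() {
	// Field 2 was set to its default (0), so it never appeared in the stream.
	values := []scalarValue{{fieldNum: 1, value: 10}, {fieldNum: 4, value: 40}}
	for _, num := range []int32{1, 2, 4} {
		v, present := lookup(values, num)
		fmt.Printf("field=%d value=%d present=%t\n", num, v, present)
	}
}
```

A caller cannot distinguish "explicitly set to the default" from "never set" in this scheme, which matches proto3 semantics for scalar fields without explicit presence tracking.
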