# ADR 019: Protocol Buffer State Encoding

## Changelog

* 2020 Feb 15: Initial Draft
* 2020 Feb 24: Updates to handle messages with interface fields
* 2020 Apr 27: Convert usages of `oneof` for interfaces to `Any`
* 2020 May 15: Describe `cosmos_proto` extensions and amino compatibility
* 2020 Dec 4: Move and rename `MarshalAny` and `UnmarshalAny` into the `codec.Codec` interface.
* 2021 Feb 24: Remove mentions of `HybridCodec`, which has been abandoned in [#6843](https://github.com/cosmos/cosmos-sdk/pull/6843).

## Status

Accepted

## Context

Currently, the Cosmos SDK utilizes [go-amino](https://github.com/tendermint/go-amino/) for binary
and JSON object encoding over the wire, bringing parity between logical objects and persistence objects.

From the Amino docs:

> Amino is an object encoding specification. It is a subset of Proto3 with an extension for interface
> support. See the [Proto3 spec](https://developers.google.com/protocol-buffers/docs/proto3) for more
> information on Proto3, which Amino is largely compatible with (but not with Proto2).
>
> The goal of the Amino encoding protocol is to bring parity into logic objects and persistence objects.

Amino also aims to have the following goals (not a complete list):

* Binary bytes must be decode-able with a schema.
* Schema must be upgradeable.
* The encoder and decoder logic must be reasonably simple.

However, we believe that Amino does not fulfill these goals completely and does not fully meet the
needs of a truly flexible cross-language and multi-client compatible encoding protocol in the Cosmos SDK.
Namely, Amino has proven to be a big pain point with regard to supporting object serialization across
clients written in various languages, while providing very little in the way of true backward
compatibility and upgradeability. Furthermore, through profiling and various benchmarks, Amino has
been shown to be an extremely large performance bottleneck in the Cosmos SDK <sup>1</sup>. This is
largely reflected in the performance of simulations and application transaction throughput.

Thus, we need to adopt an encoding protocol that meets the following criteria for state serialization:

* Language agnostic
* Platform agnostic
* Rich client support and thriving ecosystem
* High performance
* Minimal encoded message size
* Codegen-based over reflection-based
* Supports backward and forward compatibility

Note that migrating away from Amino should be viewed as a two-pronged approach: state encoding and client encoding.
This ADR focuses on state serialization in the Cosmos SDK state machine. A corresponding ADR will be
made to address client-side encoding.

## Decision

We will adopt [Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing
persisted structured data in the Cosmos SDK while providing a clean mechanism and developer UX for
applications wishing to continue to use Amino. We will provide this mechanism by updating modules to
accept a codec interface, `Marshaler`, instead of a concrete Amino codec. Furthermore, the Cosmos SDK
will provide two concrete implementations of the `Marshaler` interface: `AminoCodec` and `ProtoCodec`.

* `AminoCodec`: Uses Amino for both binary and JSON encoding.
* `ProtoCodec`: Uses Protobuf for both binary and JSON encoding.

Modules will use whichever codec is instantiated in the app. By default, the Cosmos SDK's `simapp`
instantiates a `ProtoCodec` as the concrete implementation of `Marshaler`, inside the `MakeTestEncodingConfig`
function. App developers can easily override this if they so desire.

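The pattern of having modules depend on a codec interface rather than a concrete codec can be sketched with the Go standard library alone. In this minimal sketch, the reduced `Marshaler` interface, the `jsonCodec` and `gobCodec` types (stand-ins for `AminoCodec` and `ProtoCodec`), and the `Keeper` and `Balance` types are all assumptions made for illustration; none of them are the SDK's actual definitions.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// Marshaler is a reduced, hypothetical version of the codec interface.
type Marshaler interface {
	Marshal(v interface{}) ([]byte, error)
	Unmarshal(bz []byte, ptr interface{}) error
}

// jsonCodec and gobCodec are stand-ins for AminoCodec and ProtoCodec.
type jsonCodec struct{}

func (jsonCodec) Marshal(v interface{}) ([]byte, error)      { return json.Marshal(v) }
func (jsonCodec) Unmarshal(bz []byte, ptr interface{}) error { return json.Unmarshal(bz, ptr) }

type gobCodec struct{}

func (gobCodec) Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(v)
	return buf.Bytes(), err
}

func (gobCodec) Unmarshal(bz []byte, ptr interface{}) error {
	return gob.NewDecoder(bytes.NewReader(bz)).Decode(ptr)
}

// Balance is a hypothetical piece of module state.
type Balance struct{ Amount int64 }

// Keeper depends only on the interface, so the app decides which encoding is used.
type Keeper struct {
	cdc   Marshaler
	store map[string][]byte
}

func NewKeeper(cdc Marshaler) *Keeper {
	return &Keeper{cdc: cdc, store: map[string][]byte{}}
}

func (k *Keeper) SetBalance(addr string, b Balance) error {
	bz, err := k.cdc.Marshal(b)
	if err != nil {
		return err
	}
	k.store[addr] = bz
	return nil
}

func (k *Keeper) GetBalance(addr string) (Balance, error) {
	var b Balance
	err := k.cdc.Unmarshal(k.store[addr], &b)
	return b, err
}

func main() {
	// The same keeper code round-trips state with either codec.
	for _, cdc := range []Marshaler{jsonCodec{}, gobCodec{}} {
		k := NewKeeper(cdc)
		if err := k.SetBalance("alice", Balance{Amount: 7}); err != nil {
			panic(err)
		}
		b, err := k.GetBalance("alice")
		fmt.Println(b.Amount, err)
	}
}
```

Because the keeper never names a concrete codec, swapping the encoding is a one-line change at construction time, which is the property the decision above relies on.
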
The ultimate goal will be to replace Amino JSON encoding with Protobuf encoding and thus have
modules accept and/or extend `ProtoCodec`. Until then, Amino JSON is still provided for legacy use-cases.
A handful of places in the Cosmos SDK still have Amino JSON hardcoded, such as the Legacy API REST endpoints
and the `x/params` store. They are planned to be converted to Protobuf in a gradual manner.

### Module Codecs

For modules that do not need to work with and serialize interfaces, the path to Protobuf
migration is pretty straightforward. These modules simply migrate any existing types that
are encoded and persisted via their concrete Amino codec to Protobuf and have their keeper accept a
`Marshaler` that will be a `ProtoCodec`. This migration is simple, as things will just work as-is.

Note that any business logic that needs to encode primitive types like `bool` or `int64` should use
[gogoprotobuf](https://github.com/cosmos/gogoproto) Value types.

Example:

```go
// Convert the value to its gogoproto wrapper type before encoding.
ts, err := gogotypes.TimestampProto(completionTime)
if err != nil {
	// ...
}

bz := cdc.MustMarshal(ts)
```

However, modules can vary greatly in purpose and design, so we must support the ability for modules
to encode and work with interfaces (e.g. `Account` or `Content`). These modules
must define their own codec interface that extends `Marshaler`. These specific interfaces are unique
to the module and will contain method contracts that know how to serialize the needed interfaces.

Example:

```go
// x/auth/types/codec.go

type Codec interface {
	codec.Codec

	MarshalAccount(acc exported.Account) ([]byte, error)
	UnmarshalAccount(bz []byte) (exported.Account, error)

	MarshalAccountJSON(acc exported.Account) ([]byte, error)
	UnmarshalAccountJSON(bz []byte) (exported.Account, error)
}
```

### Usage of `Any` to encode interfaces

In general, module-level .proto files should define messages which encode interfaces
using [`google.protobuf.Any`](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto).
After [extensive discussion](https://github.com/cosmos/cosmos-sdk/issues/6030),
this was chosen as the preferred alternative to application-level `oneof`s
as in our original protobuf design. The arguments in favor of `Any` can be
summarized as follows:

* `Any` provides a simpler, more consistent client UX for dealing with
interfaces than app-level `oneof`s that will need to be coordinated more
carefully across applications. Creating a generic transaction
signing library using `oneof`s may be cumbersome and critical logic may need
to be reimplemented for each chain.
* `Any` provides more resistance against human error than `oneof`.
* `Any` is generally simpler to implement for both modules and apps.

The main counter-argument to using `Any` centers around its additional space
and possibly performance overhead. The space overhead could be dealt with using
compression at the persistence layer in the future and the performance impact
is likely to be small. Thus, not using `Any` is seen as a premature optimization,
with user experience as the higher order concern.

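To make the space overhead concrete, the sketch below computes how many extra bytes an `Any` wrapper adds on the wire. It relies only on the standard protobuf wire format for `google.protobuf.Any` (`type_url` is field 1, `value` is field 2, both length-delimited). The helper names and the choice of type URL are illustrative assumptions, and the calculation assumes both fields are non-empty.

```go
package main

import "fmt"

// varintLen returns the number of bytes needed to encode n as a protobuf varint.
func varintLen(n uint64) int {
	l := 1
	for n >= 0x80 {
		n >>= 7
		l++
	}
	return l
}

// anyOverhead returns the extra bytes needed to wrap an encoded value of
// valueLen bytes in google.protobuf.Any, compared to the raw value bytes.
// It assumes a non-empty type URL and a non-empty value.
func anyOverhead(typeURL string, valueLen int) int {
	// type_url field: 1-byte tag + length varint + the URL bytes.
	urlField := 1 + varintLen(uint64(len(typeURL))) + len(typeURL)
	// value field header: 1-byte tag + length varint (payload not counted).
	valueHeader := 1 + varintLen(uint64(valueLen))
	return urlField + valueHeader
}

func main() {
	// Example type URL, used here only for illustration.
	url := "/cosmos.bank.v1beta1.MsgSend"
	fmt.Println(anyOverhead(url, 100))
}
```

The overhead is dominated by the length of the type URL, which is why shorter naming conventions and a compression layer are both plausible mitigations.
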
Note that, given the Cosmos SDK's decision to adopt the `Codec` interfaces described
above, apps can still choose to use `oneof` to encode state and transactions,
but it is not the recommended approach. If apps do choose to use `oneof`s
instead of `Any`, they will likely lose compatibility with client apps that
support multiple chains. Thus developers should think carefully about whether
they care more about what is possibly a premature optimization or about end-user
and client developer UX.

### Safe usage of `Any`

By default, the [gogo protobuf implementation of `Any`](https://pkg.go.dev/github.com/cosmos/gogoproto/types)
uses [global type registration](https://github.com/cosmos/gogoproto/blob/master/proto/properties.go#L540)
to decode values packed in `Any` into concrete
Go types. This introduces a vulnerability where any malicious module
in the dependency tree could register a type with the global protobuf registry
and cause it to be loaded and unmarshaled by a transaction that referenced
it in the `type_url` field.

To prevent this, we introduce a type registration mechanism for decoding `Any`
values into concrete types through the `InterfaceRegistry` interface, which
bears some similarity to type registration with Amino:

```go
type InterfaceRegistry interface {
	// RegisterInterface associates protoName as the public name for the
	// interface passed in as iface
	// Ex:
	//   registry.RegisterInterface("cosmos_sdk.Msg", (*sdk.Msg)(nil))
	RegisterInterface(protoName string, iface interface{})

	// RegisterImplementations registers impls as concrete implementations of
	// the interface iface
	// Ex:
	//   registry.RegisterImplementations((*sdk.Msg)(nil), &MsgSend{}, &MsgMultiSend{})
	RegisterImplementations(iface interface{}, impls ...proto.Message)
}
```

In addition to serving as a whitelist, `InterfaceRegistry` can also serve
to communicate the list of concrete types that satisfy an interface to clients.

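The whitelist idea can be sketched with the standard library's `reflect` package. This is a simplified model under stated assumptions: the `registry` type, its methods, the `Msg` and `MsgSend` types, and the `/example.*` type URLs are hypothetical stand-ins, and the SDK's real `InterfaceRegistry` works with `proto.Message` values rather than plain Go values.

```go
package main

import (
	"fmt"
	"reflect"
)

// Msg and MsgSend stand in for an SDK interface and one implementation.
type Msg interface{ Route() string }

type MsgSend struct{}

func (MsgSend) Route() string { return "bank" }

// registry maps an interface type to the type URLs allowed to satisfy it.
type registry struct {
	impls map[reflect.Type]map[string]reflect.Type
}

func newRegistry() *registry {
	return &registry{impls: map[reflect.Type]map[string]reflect.Type{}}
}

// registerImplementation whitelists impl under typeURL for the interface that
// iface points to, e.g. (*Msg)(nil).
func (r *registry) registerImplementation(iface interface{}, typeURL string, impl interface{}) error {
	ifaceType := reflect.TypeOf(iface).Elem()
	implType := reflect.TypeOf(impl)
	if !implType.Implements(ifaceType) {
		return fmt.Errorf("%v does not implement %v", implType, ifaceType)
	}
	if r.impls[ifaceType] == nil {
		r.impls[ifaceType] = map[string]reflect.Type{}
	}
	r.impls[ifaceType][typeURL] = implType
	return nil
}

// resolve returns a zero value of the concrete type registered for typeURL,
// or an error if the URL was never whitelisted for that interface.
func (r *registry) resolve(iface interface{}, typeURL string) (interface{}, error) {
	implType, ok := r.impls[reflect.TypeOf(iface).Elem()][typeURL]
	if !ok {
		return nil, fmt.Errorf("type URL %q is not registered", typeURL)
	}
	return reflect.New(implType).Elem().Interface(), nil
}

func main() {
	reg := newRegistry()
	if err := reg.registerImplementation((*Msg)(nil), "/example.MsgSend", MsgSend{}); err != nil {
		panic(err)
	}
	v, err := reg.resolve((*Msg)(nil), "/example.MsgSend")
	fmt.Println(v.(Msg).Route(), err)

	// A type URL that was never registered is rejected instead of decoded.
	_, err = reg.resolve((*Msg)(nil), "/example.Unknown")
	fmt.Println(err)
}
```

Because lookups are scoped to an explicitly populated registry rather than a process-wide one, a type that some dependency registered globally is never instantiated unless the app whitelisted it.
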
In .proto files:

* fields which accept interfaces should be annotated with `cosmos_proto.accepts_interface`
using the same fully-qualified name passed as `protoName` to `InterfaceRegistry.RegisterInterface`
* interface implementations should be annotated with `cosmos_proto.implements_interface`
using the same fully-qualified name passed as `protoName` to `InterfaceRegistry.RegisterInterface`

In the future, `protoName`, `cosmos_proto.accepts_interface`, and `cosmos_proto.implements_interface`
may be used via code generation, reflection, and/or static linting.

The same struct that implements `InterfaceRegistry` will also implement an
interface `InterfaceUnpacker` to be used for unpacking `Any`s:

```go
type InterfaceUnpacker interface {
	// UnpackAny unpacks the value in any to the interface pointer passed in as
	// iface. Note that the type in any must have been registered with
	// RegisterImplementations as a concrete type for that interface
	// Ex:
	//    var msg sdk.Msg
	//    err := ctx.UnpackAny(any, &msg)
	//    ...
	UnpackAny(any *Any, iface interface{}) error
}
```

Note that `InterfaceRegistry` usage does not deviate from standard protobuf
usage of `Any`; it just introduces a security and introspection layer for
Go usage.

`InterfaceRegistry` will be a member of `ProtoCodec`
described above. In order for modules to register interface types, app modules
can optionally implement the following interface:

```go
type InterfaceModule interface {
	RegisterInterfaceTypes(InterfaceRegistry)
}
```

The module manager will include a method to call `RegisterInterfaceTypes` on
every module that implements it in order to populate the `InterfaceRegistry`.

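The optional-interface pattern described above can be sketched as a loop with a type assertion. In this minimal sketch the one-method `InterfaceRegistry`, the `recordingRegistry`, the two module types, and the `registerAll` helper are hypothetical and heavily reduced; the SDK's module manager is more involved.

```go
package main

import "fmt"

// InterfaceRegistry is reduced to one hypothetical method for this sketch.
type InterfaceRegistry interface {
	RegisterInterface(protoName string)
}

// InterfaceModule is the optional interface a module may implement.
type InterfaceModule interface {
	RegisterInterfaceTypes(InterfaceRegistry)
}

// recordingRegistry records every registered name so the effect is visible.
type recordingRegistry struct{ names []string }

func (r *recordingRegistry) RegisterInterface(protoName string) {
	r.names = append(r.names, protoName)
}

// evidenceModule opts in by implementing InterfaceModule.
type evidenceModule struct{}

func (evidenceModule) RegisterInterfaceTypes(reg InterfaceRegistry) {
	reg.RegisterInterface("example.Evidence")
}

// plainModule has no interface types and does not implement InterfaceModule.
type plainModule struct{}

// registerAll calls RegisterInterfaceTypes on every module that implements
// InterfaceModule and silently skips the rest.
func registerAll(modules []interface{}, reg InterfaceRegistry) {
	for _, m := range modules {
		if im, ok := m.(InterfaceModule); ok {
			im.RegisterInterfaceTypes(reg)
		}
	}
}

func main() {
	reg := &recordingRegistry{}
	registerAll([]interface{}{evidenceModule{}, plainModule{}}, reg)
	fmt.Println(reg.names)
}
```
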
### Using `Any` to encode state

The Cosmos SDK will provide support methods `MarshalInterface` and `UnmarshalInterface` to hide the complexity of wrapping interface types into `Any` and allow easy serialization.

```go
import "github.com/cosmos/cosmos-sdk/codec"

// note: eviexported.Evidence is an interface type
func MarshalEvidence(cdc codec.BinaryCodec, e eviexported.Evidence) ([]byte, error) {
	return cdc.MarshalInterface(e)
}

func UnmarshalEvidence(cdc codec.BinaryCodec, bz []byte) (eviexported.Evidence, error) {
	var evi eviexported.Evidence
	// Pass a pointer to the interface variable so the decoded value can be stored in it.
	err := cdc.UnmarshalInterface(bz, &evi)
	return evi, err
}
```

### Using `Any` in `sdk.Msg`s

A similar concept is to be applied for messages that contain interface fields.
For example, we can define `MsgSubmitEvidence` as follows, where `Evidence` is
an interface:

```protobuf
// x/evidence/types/types.proto

message MsgSubmitEvidence {
  bytes submitter = 1
    [
      (gogoproto.casttype) = "github.com/cosmos/cosmos-sdk/types.AccAddress"
    ];
  google.protobuf.Any evidence = 2;
}
```

Note that in order to unpack the evidence from `Any` we do need a reference to
`InterfaceRegistry`. In order to reference evidence in methods like
`ValidateBasic`, which shouldn't have to know about the `InterfaceRegistry`, we
introduce an `UnpackInterfaces` phase to deserialization which unpacks
interfaces before they're needed.

### Unpacking Interfaces

To implement the `UnpackInterfaces` phase of deserialization which unpacks
interfaces wrapped in `Any` before they're needed, we create an interface
that `sdk.Msg`s and other types can implement:

```go
type UnpackInterfacesMessage interface {
	UnpackInterfaces(InterfaceUnpacker) error
}
```

We also introduce a private `cachedValue interface{}` field onto the `Any`
struct itself with a public getter `GetCachedValue() interface{}`.

The `UnpackInterfaces` method is to be invoked during message deserialization right
after `Unmarshal`, and any interface values packed in `Any`s will be decoded
and stored in `cachedValue` for reference later.

Unpacked interface values can then safely be used in any code afterwards
without knowledge of the `InterfaceRegistry`,
and messages can introduce a simple getter to cast the cached value to the
correct interface type.

This has the added benefit that unmarshaling of `Any` values only happens once
during initial deserialization rather than every time the value is read. Also,
when `Any` values are first packed (for instance in a call to
`NewMsgSubmitEvidence`), the original interface value is cached so that
unmarshaling isn't needed to read it again.

`MsgSubmitEvidence` could implement `UnpackInterfaces`, plus a convenience getter
`GetEvidence`, as follows:

```go
// The parameter type matches the UnpackInterfacesMessage interface above.
func (msg MsgSubmitEvidence) UnpackInterfaces(unpacker InterfaceUnpacker) error {
	var evi eviexported.Evidence
	// UnpackAny takes a pointer to the interface variable.
	return unpacker.UnpackAny(msg.Evidence, &evi)
}

func (msg MsgSubmitEvidence) GetEvidence() eviexported.Evidence {
	return msg.Evidence.GetCachedValue().(eviexported.Evidence)
}
```

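The decode-once behavior can be illustrated with a self-contained toy model. Everything here is an assumption made for illustration: the `Any` struct is a simplified stand-in for the SDK type, the `unpacker` and its one-byte "encoding" are invented, and `equivocation` and the `/example.Equivocation` type URL are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Evidence stands in for an SDK interface type.
type Evidence interface{ Height() int64 }

type equivocation struct{ height int64 }

func (e equivocation) Height() int64 { return e.height }

// Any is a simplified model: raw bytes plus a private cache.
type Any struct {
	TypeURL     string
	Value       []byte
	cachedValue interface{}
}

func (a *Any) GetCachedValue() interface{} { return a.cachedValue }

// unpacker decodes Any values; decodeCount shows how often decoding happens.
type unpacker struct{ decodeCount int }

// UnpackAny decodes a.Value at most once and stores the result in the cache.
func (u *unpacker) UnpackAny(a *Any) error {
	if a.cachedValue != nil {
		return nil // already unpacked, nothing to decode
	}
	if a.TypeURL != "/example.Equivocation" || len(a.Value) != 1 {
		return errors.New("unknown or malformed Any")
	}
	u.decodeCount++
	a.cachedValue = equivocation{height: int64(a.Value[0])} // toy one-byte encoding
	return nil
}

type MsgSubmitEvidence struct{ Evidence *Any }

func (m MsgSubmitEvidence) UnpackInterfaces(u *unpacker) error {
	return u.UnpackAny(m.Evidence)
}

// GetEvidence needs no unpacker; it returns nil if nothing was unpacked yet.
func (m MsgSubmitEvidence) GetEvidence() Evidence {
	evi, _ := m.Evidence.GetCachedValue().(Evidence)
	return evi
}

func main() {
	msg := MsgSubmitEvidence{Evidence: &Any{TypeURL: "/example.Equivocation", Value: []byte{42}}}
	u := &unpacker{}
	if err := msg.UnpackInterfaces(u); err != nil {
		panic(err)
	}
	fmt.Println(msg.GetEvidence().Height(), u.decodeCount)
}
```

Unlike the getter shown earlier, this sketch uses the two-value form of the type assertion, so calling `GetEvidence` before unpacking yields `nil` instead of panicking. Whether that trade-off is desirable is a design choice left to the module.
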
### Amino Compatibility

Our custom implementation of `Any` can be used transparently with Amino if used
with the proper codec instance. What this means is that interfaces packed within
`Any`s will be amino marshaled like regular Amino interfaces (assuming they
have been registered properly with Amino).

In order for this functionality to work:

* **all legacy code must use `*codec.LegacyAmino`, a wrapper which properly handles
  `Any`, instead of `*amino.Codec`**
* **all new code should use `Marshaler`, which is compatible with both amino and
  protobuf**
* Also, the wrapper type was renamed to `codec.LegacyAmino` before v0.39.

### Why Wasn't X Chosen Instead

For a more complete comparison to alternative protocols, see [here](https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f).

### Cap'n Proto

While [Cap’n Proto](https://capnproto.org/) does seem like an advantageous alternative to Protobuf
due to its native support for interfaces/generics and built-in canonicalization, it lacks the
rich client ecosystem of Protobuf and is a bit less mature.

### FlatBuffers

[FlatBuffers](https://google.github.io/flatbuffers/) is also a potentially viable alternative, with the
primary difference being that FlatBuffers does not need a parsing/unpacking step to a secondary
representation before you can access data, a step that is often coupled with per-object memory allocation.

However, it would require significant research to fully understand the scope of the migration
and the path forward, which isn't immediately clear. In addition, FlatBuffers aren't designed for
untrusted inputs.

## Future Improvements & Roadmap

In the future we may consider a compression layer right above the persistence
layer which doesn't change tx or merkle tree hashes, but reduces the storage
overhead of `Any`. In addition, we may adopt protobuf naming conventions which
make type URLs a bit more concise while remaining descriptive.

Additional code generation support around the usage of `Any` is something that
could also be explored in the future to make the UX for Go developers more
seamless.

## Consequences

### Positive

* Significant performance gains.
* Supports backward and forward type compatibility.
* Better support for cross-language clients.

### Negative

* Learning curve required to understand and implement Protobuf messages.
* Slightly larger message size due to use of `Any`, although this could be offset
  by a compression layer in the future.

### Neutral

## References

1. https://github.com/cosmos/cosmos-sdk/issues/4977
2. https://github.com/cosmos/cosmos-sdk/issues/5444