# ADR 019: Protocol Buffer State Encoding

## Changelog

* 2020 Feb 15: Initial Draft
* 2020 Feb 24: Updates to handle messages with interface fields
* 2020 Apr 27: Convert usages of `oneof` for interfaces to `Any`
* 2020 May 15: Describe `cosmos_proto` extensions and amino compatibility
* 2020 Dec 4: Move and rename `MarshalAny` and `UnmarshalAny` into the `codec.Codec` interface.
* 2021 Feb 24: Remove mentions of `HybridCodec`, which has been abandoned in [#6843](https://github.com/cosmos/cosmos-sdk/pull/6843).

## Status

Accepted

## Context

Currently, the Cosmos SDK utilizes [go-amino](https://github.com/tendermint/go-amino/) for binary
and JSON object encoding over the wire, bringing parity between logical objects and persistence objects.

From the Amino docs:

> Amino is an object encoding specification. It is a subset of Proto3 with an extension for interface
> support. See the [Proto3 spec](https://developers.google.com/protocol-buffers/docs/proto3) for more
> information on Proto3, which Amino is largely compatible with (but not with Proto2).
>
> The goal of the Amino encoding protocol is to bring parity into logic objects and persistence objects.

Amino also aims to have the following goals (not a complete list):

* Binary bytes must be decode-able with a schema.
* Schema must be upgradeable.
* The encoder and decoder logic must be reasonably simple.

However, we believe that Amino does not fulfill these goals completely and does not fully meet the
needs of a truly flexible cross-language and multi-client compatible encoding protocol in the Cosmos SDK.
Namely, Amino has proven to be a big pain point in regard to supporting object serialization across
clients written in various languages, while providing little in the way of true backwards
compatibility and upgradeability. Furthermore, through profiling and various benchmarks, Amino has
been shown to be an extremely large performance bottleneck in the Cosmos SDK <sup>1</sup>. This is
largely reflected in the performance of simulations and application transaction throughput.

Thus, we need to adopt an encoding protocol that meets the following criteria for state serialization:

* Language agnostic
* Platform agnostic
* Rich client support and thriving ecosystem
* High performance
* Minimal encoded message size
* Codegen-based over reflection-based
* Supports backward and forward compatibility

Note that migrating away from Amino should be viewed as a two-pronged approach: state and client encoding.
This ADR focuses on state serialization in the Cosmos SDK state machine. A corresponding ADR will be
made to address client-side encoding.

## Decision

We will adopt [Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing
persisted structured data in the Cosmos SDK while providing a clean mechanism and developer UX for
applications wishing to continue to use Amino. We will provide this mechanism by updating modules to
accept a codec interface, `Marshaler`, instead of a concrete Amino codec. Furthermore, the Cosmos SDK
will provide two concrete implementations of the `Marshaler` interface: `AminoCodec` and `ProtoCodec`.

* `AminoCodec`: Uses Amino for both binary and JSON encoding.
* `ProtoCodec`: Uses Protobuf for both binary and JSON encoding.

Modules will use whichever codec is instantiated in the app.
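
To make the injection pattern concrete, here is a minimal, self-contained Go sketch. It is not SDK code: `Marshaler`, `Keeper`, `Params`, `jsonCodec`, and `gobCodec` are hypothetical stand-ins, and the two toy codecs are built on the standard library's `encoding/json` and `encoding/gob` rather than on Amino or Protobuf. The only point it illustrates is that the keeper depends on the interface, so the app decides which concrete codec the keeper receives.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// Marshaler is a hypothetical, heavily simplified stand-in for the codec
// interface that keepers accept. It is not the SDK's actual interface.
type Marshaler interface {
	Marshal(v any) ([]byte, error)
	Unmarshal(bz []byte, v any) error
}

// jsonCodec and gobCodec are toy implementations built on the standard
// library; they stand in for AminoCodec/ProtoCodec only to show the pattern.
type jsonCodec struct{}

func (jsonCodec) Marshal(v any) ([]byte, error)    { return json.Marshal(v) }
func (jsonCodec) Unmarshal(bz []byte, v any) error { return json.Unmarshal(bz, v) }

type gobCodec struct{}

func (gobCodec) Marshal(v any) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (gobCodec) Unmarshal(bz []byte, v any) error {
	return gob.NewDecoder(bytes.NewReader(bz)).Decode(v)
}

// Params is a hypothetical piece of module state.
type Params struct {
	MaxValidators uint32
}

// Keeper depends only on the Marshaler interface; the app injects the codec.
type Keeper struct {
	cdc   Marshaler
	store map[string][]byte // toy in-memory store
}

func NewKeeper(cdc Marshaler) Keeper {
	return Keeper{cdc: cdc, store: map[string][]byte{}}
}

func (k Keeper) SetParams(p Params) error {
	bz, err := k.cdc.Marshal(p)
	if err != nil {
		return err
	}
	k.store["params"] = bz
	return nil
}

func (k Keeper) GetParams() (Params, error) {
	var p Params
	err := k.cdc.Unmarshal(k.store["params"], &p)
	return p, err
}

func main() {
	// The same keeper code works with whichever codec the app chooses.
	for _, cdc := range []Marshaler{jsonCodec{}, gobCodec{}} {
		k := NewKeeper(cdc)
		if err := k.SetParams(Params{MaxValidators: 100}); err != nil {
			panic(err)
		}
		p, err := k.GetParams()
		if err != nil {
			panic(err)
		}
		fmt.Printf("%T -> %d\n", cdc, p.MaxValidators)
	}
	// prints "main.jsonCodec -> 100", then "main.gobCodec -> 100"
}
```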

By default, the Cosmos SDK's `simapp` instantiates a `ProtoCodec` as the concrete implementation of
`Marshaler` inside the `MakeTestEncodingConfig` function. App developers can easily override this if they so desire.

The ultimate goal will be to replace Amino JSON encoding with Protobuf encoding and thus have
modules accept and/or extend `ProtoCodec`. Until then, Amino JSON is still provided for legacy use cases.
A handful of places in the Cosmos SDK still have Amino JSON hardcoded, such as the Legacy API REST endpoints
and the `x/params` store. They are planned to be converted to Protobuf gradually.

### Module Codecs

For modules that do not require the ability to work with and serialize interfaces, the path to Protobuf
migration is straightforward. These modules simply migrate any existing types that
are encoded and persisted via their concrete Amino codec to Protobuf and have their keeper accept a
`Marshaler` that will be a `ProtoCodec`. This migration is simple, as things will just work as-is.

Note that any business logic that needs to encode primitive types like `bool` or `int64` should use
[gogoprotobuf](https://github.com/cosmos/gogoproto) Value types.

Example:

```go
import gogotypes "github.com/cosmos/gogoproto/types"

// Wrap a time.Time in a protobuf Timestamp before encoding it.
ts, err := gogotypes.TimestampProto(completionTime)
if err != nil {
	// ...
}

bz := cdc.MustMarshal(ts)

// Primitive values are wrapped the same way, e.g. a uint64 counter.
bz = cdc.MustMarshal(&gogotypes.UInt64Value{Value: 42})
```

However, modules can vary greatly in purpose and design, so we must support the ability for modules
to encode and work with interfaces (e.g. `Account` or `Content`). These modules
must define their own codec interface that extends `Marshaler`. These specific interfaces are unique
to the module and will contain method contracts that know how to serialize the needed interfaces.

Example:

```go
// x/auth/types/codec.go

type Codec interface {
	codec.Codec

	MarshalAccount(acc exported.Account) ([]byte, error)
	UnmarshalAccount(bz []byte) (exported.Account, error)

	MarshalAccountJSON(acc exported.Account) ([]byte, error)
	UnmarshalAccountJSON(bz []byte) (exported.Account, error)
}
```

### Usage of `Any` to encode interfaces

In general, module-level .proto files should define messages which encode interfaces
using [`google.protobuf.Any`](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto).
After [extensive discussion](https://github.com/cosmos/cosmos-sdk/issues/6030),
this was chosen as the preferred alternative to application-level `oneof`s
as in our original protobuf design. The arguments in favor of `Any` can be
summarized as follows:

* `Any` provides a simpler, more consistent client UX for dealing with
  interfaces than app-level `oneof`s, which would need to be coordinated more
  carefully across applications. Creating a generic transaction
  signing library using `oneof`s may be cumbersome, and critical logic may need
  to be reimplemented for each chain.
* `Any` provides more resistance against human error than `oneof`.
* `Any` is generally simpler to implement for both modules and apps.

The main counter-argument to using `Any` centers around its additional space
and possibly performance overhead. The space overhead could be dealt with using
compression at the persistence layer in the future, and the performance impact
is likely to be small. Thus, not using `Any` is seen as a premature optimization,
with user experience as the higher-order concern.

Note that, given the Cosmos SDK's decision to adopt the `Codec` interfaces described
above, apps can still choose to use `oneof` to encode state and transactions,
but it is not the recommended approach.
If apps do choose to use `oneof`s
instead of `Any`, they will likely lose compatibility with client apps that
support multiple chains. Thus, developers should think carefully about whether
they care more about what is possibly a premature optimization or about end-user
and client developer UX.

### Safe usage of `Any`

By default, the [gogo protobuf implementation of `Any`](https://pkg.go.dev/github.com/cosmos/gogoproto/types)
uses [global type registration](https://github.com/cosmos/gogoproto/blob/master/proto/properties.go#L540)
to decode values packed in `Any` into concrete
Go types. This introduces a vulnerability where any malicious module
in the dependency tree could register a type with the global protobuf registry
and cause it to be loaded and unmarshaled by a transaction that referenced
it in the `type_url` field.

To prevent this, we introduce a type registration mechanism for decoding `Any`
values into concrete types through the `InterfaceRegistry` interface, which
bears some similarity to type registration with Amino:

```go
type InterfaceRegistry interface {
	// RegisterInterface associates protoName as the public name for the
	// interface passed in as iface
	// Ex:
	//   registry.RegisterInterface("cosmos_sdk.Msg", (*sdk.Msg)(nil))
	RegisterInterface(protoName string, iface interface{})

	// RegisterImplementations registers impls as concrete implementations of
	// the interface iface
	// Ex:
	//   registry.RegisterImplementations((*sdk.Msg)(nil), &MsgSend{}, &MsgMultiSend{})
	RegisterImplementations(iface interface{}, impls ...proto.Message)
}
```

In addition to serving as a whitelist, `InterfaceRegistry` can also serve
to communicate the list of concrete types that satisfy an interface to clients.

In .proto files:

* fields which accept interfaces should be annotated with `cosmos_proto.accepts_interface`
  using the same fully qualified name passed as `protoName` to `InterfaceRegistry.RegisterInterface`
* interface implementations should be annotated with `cosmos_proto.implements_interface`
  using the same fully qualified name passed as `protoName` to `InterfaceRegistry.RegisterInterface`

In the future, `protoName`, `cosmos_proto.accepts_interface`, and `cosmos_proto.implements_interface`
may be used via code generation, reflection, and/or static linting.

The same struct that implements `InterfaceRegistry` will also implement an
interface `InterfaceUnpacker` to be used for unpacking `Any`s:

```go
type InterfaceUnpacker interface {
	// UnpackAny unpacks the value in any to the interface pointer passed in as
	// iface. Note that the type in any must have been registered with
	// RegisterImplementations as a concrete type for that interface
	// Ex:
	//   var msg sdk.Msg
	//   err := ctx.UnpackAny(any, &msg)
	//   ...
	UnpackAny(any *Any, iface interface{}) error
}
```

Note that `InterfaceRegistry` usage does not deviate from standard protobuf
usage of `Any`; it just introduces a security and introspection layer for
Go usage.

`InterfaceRegistry` will be a member of `ProtoCodec`
described above. In order for modules to register interface types, app modules
can optionally implement the following interface:

```go
type InterfaceModule interface {
	RegisterInterfaceTypes(InterfaceRegistry)
}
```

The module manager will include a method to call `RegisterInterfaceTypes` on
every module that implements it in order to populate the `InterfaceRegistry`.
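
A minimal sketch of how such a manager method might work is shown below. It assumes hypothetical `toyRegistry`, `AppModule`, `bankModule`, `paramsModule`, and `Manager` types, and the toy registry records implementations by name instead of by Go type, so this is an illustration of the opt-in pattern, not the SDK's module manager. The manager type-asserts each module against the optional interface and calls `RegisterInterfaceTypes` only on modules that implement it.

```go
package main

import "fmt"

// toyRegistry is a hypothetical registry that records implementation names per
// interface name; the real InterfaceRegistry works with Go types instead.
type toyRegistry struct {
	impls map[string][]string
}

func (r *toyRegistry) RegisterImplementations(iface string, impls ...string) {
	r.impls[iface] = append(r.impls[iface], impls...)
}

// AppModule is a hypothetical minimal module interface.
type AppModule interface{ Name() string }

// InterfaceModule mirrors the optional interface above, using the toy registry.
type InterfaceModule interface {
	RegisterInterfaceTypes(*toyRegistry)
}

type bankModule struct{}

func (bankModule) Name() string { return "bank" }
func (bankModule) RegisterInterfaceTypes(r *toyRegistry) {
	r.RegisterImplementations("cosmos_sdk.Msg", "MsgSend", "MsgMultiSend")
}

// paramsModule has no interfaces, so it does not implement InterfaceModule.
type paramsModule struct{}

func (paramsModule) Name() string { return "params" }

// Manager is a hypothetical module manager.
type Manager struct{ modules []AppModule }

// RegisterInterfaces calls RegisterInterfaceTypes on every module that opts in.
func (m Manager) RegisterInterfaces(r *toyRegistry) {
	for _, mod := range m.modules {
		if im, ok := mod.(InterfaceModule); ok {
			im.RegisterInterfaceTypes(r)
		}
	}
}

func main() {
	r := &toyRegistry{impls: map[string][]string{}}
	Manager{modules: []AppModule{bankModule{}, paramsModule{}}}.RegisterInterfaces(r)
	fmt.Println(r.impls["cosmos_sdk.Msg"]) // [MsgSend MsgMultiSend]
}
```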

### Using `Any` to encode state

The Cosmos SDK will provide support methods `MarshalInterface` and `UnmarshalInterface` to hide the complexity of wrapping interface types into `Any` and allow easy serialization.

```go
import "github.com/cosmos/cosmos-sdk/codec"

// note: eviexported.Evidence is an interface type
func MarshalEvidence(cdc codec.BinaryCodec, e eviexported.Evidence) ([]byte, error) {
	// MarshalInterface wraps e in an Any before encoding it.
	return cdc.MarshalInterface(e)
}

func UnmarshalEvidence(cdc codec.BinaryCodec, bz []byte) (eviexported.Evidence, error) {
	var evi eviexported.Evidence
	// UnmarshalInterface takes the bytes first and a pointer to the interface second.
	err := cdc.UnmarshalInterface(bz, &evi)
	return evi, err
}
```

### Using `Any` in `sdk.Msg`s

A similar concept is to be applied for messages that contain interface fields.
For example, we can define `MsgSubmitEvidence` as follows, where `Evidence` is
an interface:

```protobuf
// x/evidence/types/types.proto

message MsgSubmitEvidence {
  bytes submitter = 1
    [
      (gogoproto.casttype) = "github.com/cosmos/cosmos-sdk/types.AccAddress"
    ];
  google.protobuf.Any evidence = 2;
}
```

Note that in order to unpack the evidence from `Any`, we do need a reference to
`InterfaceRegistry`. In order to reference evidence in methods like
`ValidateBasic`, which shouldn't have to know about the `InterfaceRegistry`, we
introduce an `UnpackInterfaces` phase to deserialization, which unpacks
interfaces before they're needed.

### Unpacking Interfaces

To implement the `UnpackInterfaces` phase of deserialization, which unpacks
interfaces wrapped in `Any` before they're needed, we create an interface
that `sdk.Msg`s and other types can implement:

```go
type UnpackInterfacesMessage interface {
	UnpackInterfaces(InterfaceUnpacker) error
}
```

We also introduce a private `cachedValue interface{}` field onto the `Any`
struct itself with a public getter `GetCachedValue() interface{}`.

The `UnpackInterfaces` method is to be invoked during message deserialization right
after `Unmarshal`, and any interface values packed in `Any`s will be decoded
and stored in `cachedValue` for reference later.

Then unpacked interface values can safely be used in any code afterwards
without knowledge of the `InterfaceRegistry`,
and messages can introduce a simple getter to cast the cached value to the
correct interface type.

This has the added benefit that unmarshaling of `Any` values only happens once
during initial deserialization rather than every time the value is read. Also,
when `Any` values are first packed (for instance in a call to
`NewMsgSubmitEvidence`), the original interface value is cached so that
unmarshaling isn't needed to read it again.

`MsgSubmitEvidence` could implement `UnpackInterfaces`, plus a convenience getter
`GetEvidence`, as follows:

```go
func (msg MsgSubmitEvidence) UnpackInterfaces(unpacker InterfaceUnpacker) error {
	var evi eviexported.Evidence
	// UnpackAny needs a pointer to the interface variable; it decodes the
	// packed value and caches it on msg.Evidence for later reads.
	return unpacker.UnpackAny(msg.Evidence, &evi)
}

func (msg MsgSubmitEvidence) GetEvidence() eviexported.Evidence {
	// The comma-ok form returns nil rather than panicking if
	// UnpackInterfaces has not run yet.
	evi, ok := msg.Evidence.GetCachedValue().(eviexported.Evidence)
	if !ok {
		return nil
	}
	return evi
}
```

### Amino Compatibility

Our custom implementation of `Any` can be used transparently with Amino if used
with the proper codec instance.
What this means is that interfaces packed within
`Any`s will be Amino-marshaled like regular Amino interfaces (assuming they
have been registered properly with Amino).

In order for this functionality to work:

* **all legacy code must use `*codec.LegacyAmino`, which is a wrapper that properly
  handles `Any`, instead of `*amino.Codec`**
* **all new code should use `Marshaler`, which is compatible with both Amino and
  Protobuf**
* Also, before v0.39, `codec.Codec` will be renamed to `codec.LegacyAmino`.

### Why Wasn't X Chosen Instead

For a more complete comparison to alternative protocols, see [here](https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f).

#### Cap'n Proto

While [Cap’n Proto](https://capnproto.org/) does seem like an advantageous alternative to Protobuf
due to its native support for interfaces/generics and built-in canonicalization, it does lack the
rich client ecosystem compared to Protobuf and is a bit less mature.

#### FlatBuffers

[FlatBuffers](https://google.github.io/flatbuffers/) is also a potentially viable alternative, with the
primary difference being that FlatBuffers does not need a parsing/unpacking step to a secondary
representation (a step often coupled with per-object memory allocation) before you can access data.

However, it would require a great deal of effort to research and fully understand the scope of the
migration and the path forward, which isn't immediately clear. In addition, FlatBuffers aren't designed
for untrusted inputs.

## Future Improvements & Roadmap

In the future we may consider a compression layer right above the persistence
layer which doesn't change tx or Merkle tree hashes, but reduces the storage
overhead of `Any`. In addition, we may adopt protobuf naming conventions which
make type URLs a bit more concise while remaining descriptive.

Additional code generation support around the usage of `Any` is something that
could also be explored in the future to make the UX for Go developers more
seamless.

## Consequences

### Positive

* Significant performance gains.
* Supports backward and forward type compatibility.
* Better support for cross-language clients.

### Negative

* Learning curve required to understand and implement Protobuf messages.
* Slightly larger message size due to use of `Any`, although this could be offset
  by a compression layer in the future.

### Neutral

## References

1. https://github.com/cosmos/cosmos-sdk/issues/4977
2. https://github.com/cosmos/cosmos-sdk/issues/5444