github.com/cosmos/cosmos-sdk@v0.50.10/docs/architecture/adr-050-sign-mode-textual.md

     1  # ADR 050: SIGN_MODE_TEXTUAL
     2  
     3  ## Changelog
     4  
     5  * Dec 06, 2021: Initial Draft.
     6  * Feb 07, 2022: Draft read and concept-ACKed by the Ledger team.
     7  * May 16, 2022: Change status to Accepted.
     8  * Aug 11, 2022: Require signing over tx raw bytes.
     9  * Sep 07, 2022: Add custom `Msg`-renderers.
* Sep 18, 2022: Structured format instead of lines of text.
* Nov 23, 2022: Specify CBOR encoding.
* Dec 01, 2022: Link to examples in separate JSON file.
* Dec 06, 2022: Re-ordering of envelope screens.
* Dec 14, 2022: Mention exceptions for invertibility.
    15  * Jan 23, 2023: Switch Screen.Text to Title+Content.
    16  * Mar 07, 2023: Change SignDoc from array to struct containing array.
    17  * Mar 20, 2023: Introduce a spec version initialized to 0.
    18  
    19  ## Status
    20  
Accepted. Implementation started. Some details of the value renderers still need to be polished.
    22  
    23  Spec version: 0.
    24  
    25  ## Abstract
    26  
This ADR specifies SIGN_MODE_TEXTUAL, a new string-based sign mode that is targeted at signing with hardware devices.
    28  
    29  ## Context
    30  
    31  Protobuf-based SIGN_MODE_DIRECT was introduced in [ADR-020](./adr-020-protobuf-transaction-encoding.md) and is intended to replace SIGN_MODE_LEGACY_AMINO_JSON in most situations, such as mobile wallets and CLI keyrings. However, the [Ledger](https://www.ledger.com/) hardware wallet is still using SIGN_MODE_LEGACY_AMINO_JSON for displaying the sign bytes to the user. Hardware wallets cannot transition to SIGN_MODE_DIRECT as:
    32  
* SIGN_MODE_DIRECT is binary-based and thus not suitable for display to end users. Technically, hardware wallets could simply display the sign bytes to the user, but this would amount to blind signing and is a security concern.
* Hardware devices cannot decode the Protobuf sign bytes due to memory constraints, as the Protobuf definitions would need to be embedded on the device.
    35  
    36  In an effort to remove Amino from the SDK, a new sign mode needs to be created for hardware devices. [Initial discussions](https://github.com/cosmos/cosmos-sdk/issues/6513) propose a text-based sign mode, which this ADR formally specifies.
    37  
    38  ## Decision
    39  
    40  In SIGN_MODE_TEXTUAL, a transaction is rendered into a textual representation,
    41  which is then sent to a secure device or subsystem for the user to review and sign.
    42  Unlike `SIGN_MODE_DIRECT`, the transmitted data can be simply decoded into legible text
    43  even on devices with limited processing and display.
    44  
    45  The textual representation is a sequence of _screens_.
    46  Each screen is meant to be displayed in its entirety (if possible) even on a small device like a Ledger.
    47  A screen is roughly equivalent to a short line of text.
    48  Large screens can be displayed in several pieces,
    49  much as long lines of text are wrapped,
    50  so no hard guidance is given, though 40 characters is a good target.
    51  A screen is used to display a single key/value pair for scalar values
    52  (or composite values with a compact notation, such as `Coins`)
    53  or to introduce or conclude a larger grouping.
    54  
    55  The text can contain the full range of Unicode code points, including control characters and nul.
    56  The device is responsible for deciding how to display characters it cannot render natively.
    57  See [annex 2](./adr-050-sign-mode-textual-annex2.md) for guidance.
    58  
    59  Screens have a non-negative indentation level to signal composite or nested structures.
    60  Indentation level zero is the top level.
    61  Indentation is displayed via some device-specific mechanism.
    62  Message quotation notation is an appropriate model, such as
    63  leading `>` characters or vertical bars on more capable displays.
    64  
    65  Some screens are marked as _expert_ screens,
    66  meant to be displayed only if the viewer chooses to opt in for the extra detail.
    67  Expert screens are meant for information that is rarely useful,
    68  or needs to be present only for signature integrity (see below).
    69  
    70  ### Invertible Rendering
    71  
We require that the rendering of the transaction be invertible:
there must be a parsing function such that for every transaction,
when rendered to the textual representation,
parsing that representation yields a proto message equivalent
to the original under proto equality.
    77  
Note that this inverse function does not need to perform correct
parsing or error signaling for the whole domain of textual data.
It is only required that the range of valid transactions be invertible under
the composition of rendering and parsing.
    82  
    83  Note that the existence of an inverse function ensures that the
    84  rendered text contains the full information of the original transaction,
    85  not a hash or subset.
    86  
    87  We make an exception for invertibility for data which are too large to
    88  meaningfully display, such as byte strings longer than 32 bytes. We may then
    89  selectively render them with a cryptographically-strong hash. In these cases,
    90  it is still computationally infeasible to find a different transaction which
    91  has the same rendering. However, we must ensure that the hash computation is
    92  simple enough to be reliably executed independently, so at least the hash is
    93  itself reasonably verifiable when the raw byte string is not.
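
As a rough illustration of this exception, the sketch below renders short byte strings as hex and substitutes a SHA-256 digest for longer ones. The 32-byte cutoff is taken from the example in the paragraph above; the helper name `renderBytes`, the `SHA-256=` prefix, and the uppercase hex output are assumptions made for illustration. The normative rules for bytes rendering belong to the value renderers in Annex 1.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// maxInlineBytes is an assumed cutoff, based on the "longer than 32 bytes"
// example in the text. It is not a normative value.
const maxInlineBytes = 32

// renderBytes is a hypothetical helper. Short inputs are rendered as
// uppercase hex, which is invertible. Longer inputs are replaced by a
// SHA-256 digest, which is not invertible but is infeasible to collide.
func renderBytes(b []byte) string {
	if len(b) <= maxInlineBytes {
		return strings.ToUpper(hex.EncodeToString(b))
	}
	sum := sha256.Sum256(b)
	return "SHA-256=" + strings.ToUpper(hex.EncodeToString(sum[:]))
}

func main() {
	fmt.Println(renderBytes([]byte{0xde, 0xad, 0xbe, 0xef})) // short: plain hex
	fmt.Println(renderBytes(make([]byte, 100)))              // long: digest only
}
```

Only the first branch can be parsed back into the original bytes. The second branch trades invertibility for a rendering that a user can still verify independently with any SHA-256 tool.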
    94  
    95  ### Chain State
    96  
    97  The rendering function (and parsing function) may depend on the current chain state.
    98  This is useful for reading parameters, such as coin display metadata,
    99  or for reading user-specific preferences such as language or address aliases.
   100  Note that if the observed state changes between signature generation
   101  and the transaction's inclusion in a block, the delivery-time rendering
   102  might differ. If so, the signature will be invalid and the transaction
   103  will be rejected.
   104  
   105  ### Signature and Security
   106  
   107  For security, transaction signatures should have three properties:
   108  
1. Given the transaction, signatures, and chain state, it must be possible to validate that the signatures match the transaction, in order to verify that the signers must have known their respective secret keys.
   111  
   112  2. It must be computationally infeasible to find a substantially different transaction for which the given signatures are valid, given the same chain state.
   113  
   114  3. The user should be able to give informed consent to the signed data via a simple, secure device with limited display capabilities.
   115  
The correctness and security of `SIGN_MODE_TEXTUAL` are guaranteed by demonstrating an inverse function from the rendering to transaction protos.
   117  This means that it is impossible for a different protocol buffer message to render to the same text.
   118  
   119  ### Transaction Hash Malleability
   120  
   121  When client software forms a transaction, the "raw" transaction (`TxRaw`) is serialized as a proto
   122  and a hash of the resulting byte sequence is computed.
   123  This is the `TxHash`, and is used by various services to track the submitted transaction through its lifecycle.
   124  Various misbehavior is possible if one can generate a modified transaction with a different TxHash
   125  but for which the signature still checks out.
   126  
   127  SIGN_MODE_TEXTUAL prevents this transaction malleability by including the TxHash as an expert screen
   128  in the rendering.
   129  
   130  ### SignDoc
   131  
   132  The SignDoc for `SIGN_MODE_TEXTUAL` is formed from a data structure like:
   133  
```go
type Screen struct {
	Title   string // may be size-limited; advised maximum of 64 characters
	Content string // may be size-limited; advised maximum of 255 characters
	Indent  uint8  // limited to a small maximum, such as 16 or 32
	Expert  bool   // true if the screen is only shown in expert mode
}

type SignDocTextual struct {
	Screens []Screen
}
```
   146  
We do not plan to use protobuf serialization to form the sequence of bytes
that will be transmitted and signed, in order to keep the decoder simple.
We will use [CBOR](https://cbor.io) ([RFC 8949](https://www.rfc-editor.org/rfc/rfc8949.html)) instead.
The encoding is defined by the following CDDL ([RFC 8610](https://www.rfc-editor.org/rfc/rfc8610)):
   151  
```
;;; CDDL (RFC 8610) Specification of SignDoc for SIGN_MODE_TEXTUAL.
;;; Must be encoded using CBOR deterministic encoding (RFC 8949, section 4.2.1).

;; A Textual document is a struct containing one field: an array of screens.
sign_doc = {
  screens_key: [* screen],
}

;; The key is an integer to keep the encoding small.
screens_key = 1

;; A screen consists of a title, a content string, an indentation, and the expert flag,
;; represented as an integer-keyed map. All entries are optional
;; and MUST be omitted from the encoding if empty, zero, or false.
;; Title and content default to the empty string, indent defaults to zero,
;; and expert defaults to false.
screen = {
  ? title_key: tstr,
  ? content_key: tstr,
  ? indent_key: uint,
  ? expert_key: bool,
}

;; Keys are small integers to keep the encoding small.
title_key = 1
content_key = 2
indent_key = 3
expert_key = 4
```
   182  
Defining `sign_doc` directly as an array of screens was also considered. However, given the possibility of future iterations of this specification, a single-keyed struct was chosen instead, as structs allow for easier backwards compatibility.
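
To make the CDDL above more concrete, here is a minimal hand-rolled encoder sketch that avoids any third-party CBOR library. The function names (`appendHead`, `appendText`, `appendScreen`, `encodeSignDoc`) are hypothetical and not part of the SDK. The sketch assumes all strings are valid UTF-8 and only covers the CBOR features this SignDoc needs: unsigned integers, text strings, definite-length arrays and maps, and the `true` simple value.

```go
package main

import "fmt"

// Screen mirrors the struct defined in this ADR.
type Screen struct {
	Title   string
	Content string
	Indent  uint8
	Expert  bool
}

// appendHead appends a CBOR head (major type plus argument) using the
// shortest possible form, as required by deterministic encoding.
func appendHead(buf []byte, major byte, n uint64) []byte {
	m := major << 5
	var info byte
	var size int
	switch {
	case n < 24:
		return append(buf, m|byte(n))
	case n <= 0xff:
		info, size = 24, 1
	case n <= 0xffff:
		info, size = 25, 2
	case n <= 0xffffffff:
		info, size = 26, 4
	default:
		info, size = 27, 8
	}
	buf = append(buf, m|info)
	for i := size - 1; i >= 0; i-- {
		buf = append(buf, byte(n>>(8*uint(i)))) // big-endian argument bytes
	}
	return buf
}

// appendText appends a text string (major type 3). Input must be valid UTF-8.
func appendText(buf []byte, s string) []byte {
	return append(appendHead(buf, 3, uint64(len(s))), s...)
}

// appendScreen appends a screen as an integer-keyed map (major type 5),
// omitting empty, zero, and false entries. Keys are written in ascending order.
func appendScreen(buf []byte, s Screen) []byte {
	var n uint64
	if s.Title != "" {
		n++
	}
	if s.Content != "" {
		n++
	}
	if s.Indent != 0 {
		n++
	}
	if s.Expert {
		n++
	}
	buf = appendHead(buf, 5, n)
	if s.Title != "" {
		buf = appendText(appendHead(buf, 0, 1), s.Title) // title_key = 1
	}
	if s.Content != "" {
		buf = appendText(appendHead(buf, 0, 2), s.Content) // content_key = 2
	}
	if s.Indent != 0 {
		buf = appendHead(appendHead(buf, 0, 3), 0, uint64(s.Indent)) // indent_key = 3
	}
	if s.Expert {
		buf = append(appendHead(buf, 0, 4), 0xf5) // expert_key = 4, 0xf5 is true
	}
	return buf
}

// encodeSignDoc encodes the sign_doc map {1: [screen, ...]}.
func encodeSignDoc(screens []Screen) []byte {
	buf := appendHead(nil, 5, 1)                   // map with a single entry
	buf = appendHead(buf, 0, 1)                    // screens_key = 1
	buf = appendHead(buf, 4, uint64(len(screens))) // array of screens
	for _, s := range screens {
		buf = appendScreen(buf, s)
	}
	return buf
}

func main() {
	doc := encodeSignDoc([]Screen{{Title: "a", Content: "b", Indent: 1, Expert: true}})
	fmt.Printf("%x\n", doc) // prints a10181a4016161026162030104f5
}
```

A production implementation would more likely rely on a maintained CBOR library configured for deterministic encoding. The point of the sketch is only to show how small the decoder-facing format is.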
   184  
   185  ## Details
   186  
In the examples that follow, screens will be shown as lines of text,
indentation is indicated with a leading `>`,
and expert screens are marked with a leading `*`.
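
The notation above can be sketched as a small formatting helper. The name `formatScreen` is hypothetical, and joining title and content with `": "` is an assumption inferred from the examples in this document rather than a rule stated by the specification. Actual devices are free to display indentation and expert status differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Screen mirrors the struct defined in this ADR.
type Screen struct {
	Title   string
	Content string
	Indent  uint8
	Expert  bool
}

// formatScreen renders a screen in the documentation notation:
// a leading '*' for expert screens, one '>' per indentation level,
// then the title and content.
func formatScreen(s Screen) string {
	var sb strings.Builder
	if s.Expert {
		sb.WriteByte('*')
	}
	if s.Indent > 0 {
		sb.WriteString(strings.Repeat(">", int(s.Indent)))
		sb.WriteByte(' ')
	}
	sb.WriteString(s.Title)
	if s.Title != "" && s.Content != "" {
		sb.WriteString(": ") // assumed separator between title and content
	}
	sb.WriteString(s.Content)
	return sb.String()
}

func main() {
	screens := []Screen{
		{Title: "Chain ID", Content: "my-chain"},
		{Title: "Message (1/1)", Content: "/cosmos.bank.v1beta1.MsgSend", Indent: 1},
		{Content: "End of Message"},
		{Title: "Gas Limit", Content: "100'000", Expert: true},
	}
	for _, s := range screens {
		fmt.Println(formatScreen(s))
	}
}
```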
   190  
   191  ### Encoding of the Transaction Envelope
   192  
We define the "transaction envelope" as all data in a transaction that is not in the `TxBody.Messages` field. The transaction envelope includes the fee, signer infos and memo, but does not include `Msg`s. `//` denotes comments, which are not shown on the Ledger device.
   194  
```
Chain ID: <string>
Account number: <uint64>
Sequence: <uint64>
Address: <string>
*Public Key: <Any>
This transaction has <int> Message(s)                       // Pluralize "Message" only when int>1
> Message (<int>/<int>): <Any>                              // See value renderers for Any rendering.
End of Message
Memo: <string>                                              // Skipped if no memo set.
Fee: <coins>                                                // See value renderers for coins rendering.
*Fee payer: <string>                                        // Skipped if no fee_payer set.
*Fee granter: <string>                                      // Skipped if no fee_granter set.
Tip: <coins>                                                // Skipped if no tip.
Tipper: <string>
*Gas Limit: <uint64>
*Timeout Height: <uint64>                                   // Skipped if no timeout_height set.
*Other signer: <int> SignerInfo                             // Skipped if the transaction only has 1 signer.
*> Other signer (<int>/<int>): <SignerInfo>
*End of other signers
*Extension options: <int> Any:                              // Skipped if no body extension options
*> Extension options (<int>/<int>): <Any>
*End of extension options
*Non critical extension options: <int> Any:                 // Skipped if no body non critical extension options
*> Non critical extension options (<int>/<int>): <Any>
*End of Non critical extension options
*Hash of raw bytes: <hex_string>                            // Hex encoding of bytes defined, to prevent tx hash malleability.
```
   223  
   224  ### Encoding of the Transaction Body
   225  
The transaction body is the `Tx.TxBody.Messages` field, which is an array of `Any`s, where each `Any` packs an `sdk.Msg`. Since `sdk.Msg`s are widely used, they have a slightly different encoding than the usual array of `Any`s (Protobuf: `repeated google.protobuf.Any`) described in Annex 1.
   227  
```
This transaction has <int> message:   // Append 's' to "message" if there is more than one sdk.Msg.
// For each Msg, print the following 2 lines:
Msg (<int>/<int>): <string>           // E.g. Msg (1/2): bank v1beta1 send coins
<value rendering of Msg struct>
End of transaction messages
```
   235  
   236  #### Example
   237  
   238  Given the following Protobuf message:
   239  
   240  ```protobuf
   241  message Grant {
   242    google.protobuf.Any       authorization = 1 [(cosmos_proto.accepts_interface) = "cosmos.authz.v1beta1.Authorization"];
   243    google.protobuf.Timestamp expiration    = 2 [(gogoproto.stdtime) = true, (gogoproto.nullable) = false];
   244  }
   245  
   246  message MsgGrant {
   247    option (cosmos.msg.v1.signer) = "granter";
   248  
   249    string granter = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"];
   250    string grantee = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"];
   251  }
   252  ```
   253  
   254  and a transaction containing 1 such `sdk.Msg`, we get the following encoding:
   255  
   256  ```
   257  This transaction has 1 message:
   258  Msg (1/1): authz v1beta1 grant
   259  Granter: cosmos1abc...def
   260  Grantee: cosmos1ghi...jkl
   261  End of transaction messages
   262  ```
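
The framing lines around the messages can be sketched as follows. The type `renderedMsg` and the function `renderBody` are hypothetical names; the sketch assumes each `Msg` has already been turned into a display name and a list of value-rendered lines, which is the job of the value renderers and is out of scope here.

```go
package main

import "fmt"

// renderedMsg is a hypothetical holder for an already-rendered sdk.Msg.
type renderedMsg struct {
	Name  string   // e.g. "authz v1beta1 grant"
	Lines []string // value rendering of the Msg struct
}

// renderBody wraps the rendered messages with the header, the per-message
// "Msg (i/n)" line, and the closing line described in this section.
func renderBody(msgs []renderedMsg) []string {
	header := fmt.Sprintf("This transaction has %d message", len(msgs))
	if len(msgs) > 1 {
		header += "s" // pluralize only when there is more than one Msg
	}
	out := []string{header + ":"}
	for i, m := range msgs {
		out = append(out, fmt.Sprintf("Msg (%d/%d): %s", i+1, len(msgs), m.Name))
		out = append(out, m.Lines...)
	}
	return append(out, "End of transaction messages")
}

func main() {
	grant := renderedMsg{
		Name:  "authz v1beta1 grant",
		Lines: []string{"Granter: cosmos1abc...def", "Grantee: cosmos1ghi...jkl"},
	}
	for _, line := range renderBody([]renderedMsg{grant}) {
		fmt.Println(line)
	}
}
```

Run as is, the program prints the same five lines as the example encoding above.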
   263  
   264  ### Custom `Msg` Renderers
   265  
Application developers may choose not to follow the default value renderer output for their own `Msg`s. In this case, they can implement their own custom `Msg` renderer. This is similar to [EIP4430](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-4430.md), where the smart contract developer chooses the description string to be shown to the end user.

This is done by setting the `cosmos.msg.textual.v1.expert_custom_renderer` Protobuf option to a non-empty string. This option CAN ONLY be set on a Protobuf message representing a transaction message object (implementing the `sdk.Msg` interface).
   269  
   270  ```protobuf
   271  message MsgFooBar {
   272    // Optional comments to describe in human-readable language the formatting
   273    // rules of the custom renderer.
   274    option (cosmos.msg.textual.v1.expert_custom_renderer) = "<unique algorithm identifier>";
   275  
   276    // proto fields
   277  }
   278  ```
   279  
   280  When this option is set on a `Msg`, a registered function will transform the `Msg` into an array of one or more strings, which MAY use the key/value format (described in point #3) with the expert field prefix (described in point #5) and arbitrary indentation (point #6). These strings MAY be rendered from a `Msg` field using a default value renderer, or they may be generated from several fields using custom logic.
   281  
The `<unique algorithm identifier>` is a string convention chosen by the application developer and is used to identify the custom `Msg` renderer. For example, the documentation or specification of this custom algorithm can reference this identifier. This identifier MAY have a versioned suffix (e.g. `_v1`) to accommodate future changes (which would be consensus-breaking). We also recommend adding Protobuf comments to describe in human language the custom logic used.

Moreover, the renderer must provide 2 functions: one for formatting from Protobuf to string, and one for parsing string to Protobuf. These 2 functions are provided by the application developer. To satisfy point #1, the parse function MUST be the inverse of the formatting function. This property will not be checked by the SDK at runtime. However, we strongly recommend that application developers include a comprehensive test suite in their app repo to test invertibility, so as not to introduce security bugs.
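
A minimal sketch of such an invertibility check is shown below. Everything in it is hypothetical: `MsgFooBar` is modeled as a plain Go struct with two made-up fields instead of a generated Protobuf type, and `formatFooBar`, `parseFooBar`, and `roundTrips` are illustrative names, not SDK APIs. A real test suite would compare messages with proto equality and would cover many more inputs, for instance through fuzzing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// MsgFooBar is a hypothetical message with made-up fields.
type MsgFooBar struct {
	Owner  string
	Amount uint64
}

// formatFooBar is the custom formatting function: message to strings.
func formatFooBar(m MsgFooBar) []string {
	return []string{
		"Owner: " + m.Owner,
		"Amount: " + strconv.FormatUint(m.Amount, 10),
	}
}

// parseFooBar is the inverse of formatFooBar: strings back to message.
func parseFooBar(lines []string) (MsgFooBar, error) {
	if len(lines) != 2 {
		return MsgFooBar{}, fmt.Errorf("expected 2 lines, got %d", len(lines))
	}
	if !strings.HasPrefix(lines[0], "Owner: ") || !strings.HasPrefix(lines[1], "Amount: ") {
		return MsgFooBar{}, fmt.Errorf("unexpected line prefixes")
	}
	amount, err := strconv.ParseUint(strings.TrimPrefix(lines[1], "Amount: "), 10, 64)
	if err != nil {
		return MsgFooBar{}, err
	}
	return MsgFooBar{Owner: strings.TrimPrefix(lines[0], "Owner: "), Amount: amount}, nil
}

// roundTrips reports whether parsing the formatted message yields the original.
func roundTrips(m MsgFooBar) bool {
	got, err := parseFooBar(formatFooBar(m))
	return err == nil && got == m
}

func main() {
	samples := []MsgFooBar{
		{Owner: "cosmos1abc...def", Amount: 42},
		{Owner: "", Amount: 0},
	}
	for _, m := range samples {
		fmt.Printf("%+v round-trips: %v\n", m, roundTrips(m))
	}
}
```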
   285  
   286  ### Require signing over the `TxBody` and `AuthInfo` raw bytes
   287  
Recall that the transaction bytes merkleized on chain are the Protobuf binary serialization of [TxRaw](https://buf.build/cosmos/cosmos-sdk/docs/main:cosmos.tx.v1beta1#cosmos.tx.v1beta1.TxRaw), which contains the `body_bytes` and `auth_info_bytes`. Moreover, the transaction hash is defined as the SHA256 hash of the `TxRaw` bytes. We require that the user signs over these bytes in SIGN_MODE_TEXTUAL, more specifically over the following string:
   289  
   290  ```
   291  *Hash of raw bytes: <HEX(sha256(len(body_bytes) ++ body_bytes ++ len(auth_info_bytes) ++ auth_info_bytes))>
   292  ```
   293  
   294  where:
   295  
   296  * `++` denotes concatenation,
   297  * `HEX` is the hexadecimal representation of the bytes, all in capital letters, no `0x` prefix,
   298  * and `len()` is encoded as a Big-Endian uint64.
   299  
This is to prevent transaction hash malleability. Point #1 about invertibility ensures that transaction `body` and `auth_info` values are not malleable, but the transaction hash might still be malleable with point #1 only, because the SIGN_MODE_TEXTUAL strings don't follow the byte ordering defined in `body_bytes` and `auth_info_bytes`. Without this hash, a malicious validator or exchange could intercept a transaction, modify its transaction hash _after_ the user signed it using SIGN_MODE_TEXTUAL (by tweaking the byte ordering inside `body_bytes` or `auth_info_bytes`), and then submit it to Tendermint.
   301  
   302  By including this hash in the SIGN_MODE_TEXTUAL signing payload, we keep the same level of guarantees as [SIGN_MODE_DIRECT](./adr-020-protobuf-transaction-encoding.md).
   303  
   304  These bytes are only shown in expert mode, hence the leading `*`.
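
The hash defined in this section can be computed along these lines. The function name `hashOfRawBytes` is hypothetical; the length prefix, concatenation order, and uppercase hex output follow the definition above.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashOfRawBytes computes
// HEX(sha256(len(body_bytes) ++ body_bytes ++ len(auth_info_bytes) ++ auth_info_bytes)),
// where each length is a big-endian uint64 and HEX is uppercase without prefix.
func hashOfRawBytes(bodyBytes, authInfoBytes []byte) string {
	h := sha256.New()
	var length [8]byte
	for _, part := range [][]byte{bodyBytes, authInfoBytes} {
		binary.BigEndian.PutUint64(length[:], uint64(len(part)))
		h.Write(length[:]) // length prefix
		h.Write(part)      // raw bytes
	}
	return strings.ToUpper(hex.EncodeToString(h.Sum(nil)))
}

func main() {
	// Placeholder inputs; real values are the TxRaw body_bytes and auth_info_bytes.
	body := []byte("example body bytes")
	authInfo := []byte("example auth info bytes")
	fmt.Println("*Hash of raw bytes: " + hashOfRawBytes(body, authInfo))
}
```

The length prefixes make the boundary between the two byte strings unambiguous, so moving a byte from the end of `body_bytes` to the start of `auth_info_bytes` changes the hash.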
   305  
   306  ## Updates to the current specification
   307  
   308  The current specification is not set in stone, and future iterations are to be expected. We distinguish two categories of updates to this specification:
   309  
   310  1. Updates that require changes of the hardware device embedded application.
   311  2. Updates that only modify the envelope and the value renderers.
   312  
Updates in the 1st category include changes to the `Screen` struct or its corresponding CBOR encoding. This type of update requires a modification of the hardware signer application, so that it can decode and parse the new types. Backwards compatibility must also be guaranteed, so that the new hardware application works with existing versions of the SDK. These updates require the coordination of multiple parties: SDK developers, hardware application developers (currently: Zondax), and client-side developers (e.g. CosmJS). Furthermore, a new submission of the hardware device application may be necessary, which, depending on the vendor, can take some time. As such, we recommend avoiding this type of update as much as possible.

Updates in the 2nd category include changes to any of the value renderers or to the transaction envelope. For example, the ordering of fields in the envelope can be swapped, or the timestamp formatting can be modified. Since SIGN_MODE_TEXTUAL sends `Screen`s to the hardware device, this type of change does not need a hardware wallet application update. Such changes are, however, state-machine-breaking, and must be documented as such. They require the coordination of SDK developers with client-side developers (e.g. CosmJS), so that the updates are released on both sides close to each other in time.
   316  
   317  We define a spec version, which is an integer that must be incremented on each update of either category. This spec version will be exposed by the SDK's implementation, and can be communicated to clients. For example, SDK v0.50 might use the spec version 1, and SDK v0.51 might use 2; thanks to this versioning, clients can know how to craft SIGN_MODE_TEXTUAL transactions based on the target SDK version.
   318  
The current spec version is defined in the "Status" section, at the top of this document. It is initialized to `0` to allow flexibility in choosing how to define future versions, as it would allow adding a field either in the SignDoc Go struct or in Protobuf in a backwards-compatible way.
   320  
   321  ## Additional Formatting by the Hardware Device
   322  
   323  See [annex 2](./adr-050-sign-mode-textual-annex2.md).
   324  
   325  ## Examples
   326  
   327  1. A minimal MsgSend: [see transaction](https://github.com/cosmos/cosmos-sdk/blob/094abcd393379acbbd043996024d66cd65246fb1/tx/textual/internal/testdata/e2e.json#L2-L70).
   328  2. A transaction with a bit of everything: [see transaction](https://github.com/cosmos/cosmos-sdk/blob/094abcd393379acbbd043996024d66cd65246fb1/tx/textual/internal/testdata/e2e.json#L71-L270).
   329  
The examples above are stored in a JSON file with the following fields:
   331  
   332  * `proto`: the representation of the transaction in ProtoJSON,
   333  * `screens`: the transaction rendered into SIGN_MODE_TEXTUAL screens,
   334  * `cbor`: the sign bytes of the transaction, which is the CBOR encoding of the screens.
   335  
   336  ## Consequences
   337  
   338  ### Backwards Compatibility
   339  
   340  SIGN_MODE_TEXTUAL is purely additive, and doesn't break any backwards compatibility with other sign modes.
   341  
   342  ### Positive
   343  
   344  * Human-friendly way of signing in hardware devices.
* Once SIGN_MODE_TEXTUAL is shipped, SIGN_MODE_LEGACY_AMINO_JSON can be deprecated and removed. In the longer term, once the ecosystem has totally migrated, Amino can be totally removed.
   346  
   347  ### Negative
   348  
   349  * Some fields are still encoded in non-human-readable ways, such as public keys in hexadecimal.
* A new Ledger app needs to be released; the details are still unclear.
   351  
   352  ### Neutral
   353  
   354  * If the transaction is complex, the string array can be arbitrarily long, and some users might just skip some screens and blind sign.
   355  
   356  ## Further Discussions
   357  
   358  * Some details on value renderers need to be polished, see [Annex 1](./adr-050-sign-mode-textual-annex1.md).
* Are Ledger apps able to support both SIGN_MODE_LEGACY_AMINO_JSON and SIGN_MODE_TEXTUAL at the same time?
* Open question: should we add a Protobuf field option to allow app developers to overwrite the textual representation of certain Protobuf fields and messages? This would be similar to Ethereum's [EIP4430](https://github.com/ethereum/EIPs/pull/4430), where the contract developer decides on the textual representation.
   361  * Internationalization.
   362  
   363  ## References
   364  
   365  * [Annex 1](./adr-050-sign-mode-textual-annex1.md)
   366  
   367  * Initial discussion: https://github.com/cosmos/cosmos-sdk/issues/6513
   368  * Living document used in the working group: https://hackmd.io/fsZAO-TfT0CKmLDtfMcKeA?both
   369  * Working group meeting notes: https://hackmd.io/7RkGfv_rQAaZzEigUYhcXw
   370  * Ethereum's "Described Transactions" https://github.com/ethereum/EIPs/pull/4430