# ADR 028: Public Key Addresses

## Changelog

* 2020/08/18: Initial version
* 2021/01/15: Analysis and algorithm update

## Status

Proposed

## Abstract

This ADR defines an address format for all addressable Cosmos SDK accounts. That includes: new public key algorithms, multisig public keys, and module accounts.

## Context

Issue [\#3685](https://github.com/cosmos/cosmos-sdk/issues/3685) identified that public key
address spaces are currently overlapping. We confirmed that it significantly decreases the security of the Cosmos SDK.

### Problem

An attacker can control an input for an address generation function. This leads to a birthday attack, which significantly decreases the security space.
To overcome this, we need to separate the inputs for different kinds of account types:
a security break of one account type shouldn't impact the security of other account types.

### Initial proposals

One initial proposal was extending the address length and
adding prefixes for different types of addresses.

@ethanfrey explained an alternate approach originally used in https://github.com/iov-one/weave:

> I spent quite a bit of time thinking about this issue while building weave... The other cosmos Sdk.
> Basically I define a condition to be a type and format as human readable string with some binary data appended. This condition is hashed into an Address (again at 20 bytes). The use of this prefix makes it impossible to find a preimage for a given address with a different condition (eg ed25519 vs secp256k1).
> This is explained in depth here https://weave.readthedocs.io/en/latest/design/permissions.html
> And the code is here, look mainly at the top where we process conditions.
> https://github.com/iov-one/weave/blob/master/conditions.go

And explained how this approach should be sufficiently collision resistant:

> Yeah, AFAIK, 20 bytes should be collision resistance when the preimages are unique and not malleable. A space of 2^160 would expect some collision to be likely around 2^80 elements (birthday paradox). And if you want to find a collision for some existing element in the database, it is still 2^160. 2^80 only is if all these elements are written to state.
> The good example you brought up was eg. a public key bytes being a valid public key on two algorithms supported by the codec. Meaning if either was broken, you would break accounts even if they were secured with the safer variant. This is only as the issue when no differentiating type info is present in the preimage (before hashing into an address).
> I would like to hear an argument if the 20 bytes space is an actual issue for security, as I would be happy to increase my address sizes in weave. I just figured cosmos and ethereum and bitcoin all use 20 bytes, it should be good enough. And the arguments above which made me feel it was secure. But I have not done a deeper analysis.

This led to the first proposal (which we proved to be not good enough):
we concatenate a key type with a public key, hash it and take the first 20 bytes of that hash, summarized as `sha256(keyTypePrefix || keybytes)[:20]`.

### Review and Discussions

In [\#5694](https://github.com/cosmos/cosmos-sdk/issues/5694) we discussed various solutions.
We agreed that 20 bytes is not future proof, and extending the address length is the only way to allow addresses of different types, various signature types, etc.
This disqualifies the initial proposal.

In the issue we discussed various modifications:

* Choice of the hash function.
* Move the prefix out of the hash function: `keyTypePrefix + sha256(keybytes)[:20]` [post-hash-prefix-proposal].
* Use double hashing: `sha256(keyTypePrefix + sha256(keybytes)[:20])`.
* Increase the keybytes hash slice from 20 bytes to 32 or 40 bytes. We concluded that 32 bytes, produced by a good hash function, is future secure.

### Requirements

* Support currently used tools - we don't want to break an ecosystem, or add a long adaptation period. Ref: https://github.com/cosmos/cosmos-sdk/issues/8041
* Try to keep the address length small - addresses are widely used in state, both as part of a key and object value.

### Scope

This ADR only defines a process for the generation of address bytes. For end-user interactions with addresses (through the API, or CLI, etc.), we still use bech32 to format these addresses as strings. This ADR doesn't change that.
Using Bech32 for string encoding gives us support for checksum error codes and handling of user typos.

## Decision

We define the following account types, for which we define the address function:

1. simple accounts: represented by a regular public key (ie: secp256k1, sr25519)
2. naive multisig: accounts composed of other addressable objects (ie: naive multisig)
3. composed accounts with a native address key (ie: bls, group module accounts)
4. module accounts: basically any accounts which cannot sign transactions and which are managed internally by modules

### Legacy Public Key Addresses Don't Change

Currently (Jan 2021), the only officially supported Cosmos SDK user accounts are `secp256k1` basic accounts and legacy amino multisig.
They are used in existing Cosmos SDK zones. They use the following address formats:

* secp256k1: `ripemd160(sha256(pk_bytes))[:20]`
* legacy amino multisig: `sha256(aminoCdc.Marshal(pk))[:20]`

We don't want to change existing addresses. So the addresses for these two key types will remain the same.

The current multisig public keys use amino serialization to generate the address.
We will retain
those public keys and their address formatting, and call them "legacy amino" multisig public keys
in protobuf. We will also create multisig public keys without amino addresses to be described below.

### Hash Function Choice

As in other parts of the Cosmos SDK, we will use `sha256`.

### Basic Address

We start with defining a base algorithm for generating addresses which we will call `Hash`. Notably, it's used for accounts represented by a single key pair. For each public key schema we have to have an associated `typ` string, explained in the next section. `hash` is the cryptographic hash function defined in the previous section.

```go
const A_LEN = 32

func Hash(typ string, key []byte) []byte {
    return hash(hash(typ) + key)[:A_LEN]
}
```

The `+` is bytes concatenation, which doesn't use any separator.

This algorithm is the outcome of a consultation session with a professional cryptographer.
Motivation: this algorithm keeps the address relatively small (length of the `typ` doesn't impact the length of the final address)
and it's more secure than [post-hash-prefix-proposal] (which uses the first 20 bytes of a pubkey hash, significantly reducing the address space).
Moreover the cryptographer motivated the choice of adding `typ` in the hash to protect against a switch table attack.

`address.Hash` is a low level function to generate _base_ addresses for new key types. Example:

* BLS: `address.Hash("bls", pubkey)`

### Composed Addresses

For simple composed accounts (like a new naive multisig) we generalize the `address.Hash`. The address is constructed by recursively creating addresses for the sub accounts, sorting the addresses and composing them into a single address. It ensures that the ordering of keys doesn't impact the resulting address.

```go
// We don't need a PubKey interface - we need anything which is addressable.
type Addressable interface {
    Address() []byte
}

func Composed(typ string, subaccounts []Addressable) []byte {
    addresses = map(subaccounts, \a -> LengthPrefix(a.Address()))
    addresses = sort(addresses)
    return address.Hash(typ, addresses[0] + ... + addresses[n])
}
```

The `typ` parameter should be a schema descriptor, containing all significant attributes with deterministic serialization (eg: utf8 string).
`LengthPrefix` is a function which prepends 1 byte to the address. The value of that byte is the length of the address in bytes before prepending. The address must be at most 255 bytes long.
We are using `LengthPrefix` to eliminate conflicts - it assures that for 2 lists of addresses: `as = {a1, a2, ..., an}` and `bs = {b1, b2, ..., bm}` such that every `bi` and `ai` is at most 255 bytes long, `concatenate(map(as, (a) => LengthPrefix(a))) = concatenate(map(bs, (b) => LengthPrefix(b)))` if and only if `as = bs`.

Implementation Tip: account implementations should cache addresses.

#### Multisig Addresses

For new multisig public keys, we define the `typ` parameter not based on any encoding scheme (amino or protobuf). This avoids issues with non-determinism in the encoding scheme.


Example:

```protobuf
package cosmos.crypto.multisig;

message PubKey {
  uint32 threshold = 1;
  repeated google.protobuf.Any pubkeys = 2;
}
```

```go
func (multisig PubKey) Address() []byte {
    // first gather all nested pub keys
    var keys []address.Addressable // cryptotypes.PubKey implements Addressable
    for _, key := range multisig.Pubkeys {
        keys = append(keys, key.GetCachedValue().(cryptotypes.PubKey))
    }

    // form the type from the message name (cosmos.crypto.multisig.PubKey) and the threshold joined together
    prefix := fmt.Sprintf("%s/%d", proto.MessageName(multisig), multisig.Threshold)

    // use the Composed function defined above
    return address.Composed(prefix, keys)
}
```


### Derived Addresses

We must be able to cryptographically derive one address from another one. The derivation process must guarantee hash properties, hence we use the already defined `Hash` function:

```go
func Derive(address, derivationKey []byte) []byte {
    return Hash(address, derivationKey)
}
```

### Module Account Addresses

A module account will have the `"module"` type. Module accounts can have sub accounts. A submodule account is created based on the module name and a sequence of derivation keys. Typically, the first derivation key should be a class of the derived accounts. The derivation process has a defined order: module name, submodule key, subsubmodule key... An example module account is created using:

```go
address.Module(moduleName, key)
```

An example sub-module account is created using:

```go
groupPolicyAddresses := []byte{1}
address.Module(moduleName, groupPolicyAddresses, policyID)
```

The `address.Module` function uses `address.Hash` with `"module"` as the type argument and, as the key, the byte representation of the module name concatenated with the submodule key.
The last two components must be uniquely separated to avoid potential clashes (example: without a separator, modulename="ab" & submodulekey="bc" will have the same derivation key as modulename="a" & submodulekey="bbc").
We use a null byte (`'\x00'`) to separate the module name from the submodule key. This works because a null byte is not a part of a valid module name. Finally, the sub-submodule accounts are created by applying the `Derive` function recursively.
We could also use the `Derive` function in the first step (rather than concatenating the module name with a zero byte and the submodule key). We decided to do concatenation to avoid one level of derivation and speed up computation.

For backward compatibility with the existing `authtypes.NewModuleAddress`, we add a special case in the `Module` function: when no derivation key is provided, we fall back to the "legacy" implementation.

```go
func Module(moduleName string, derivationKeys ...[]byte) []byte {
    if len(derivationKeys) == 0 {
        return authtypes.NewModuleAddress(moduleName) // legacy case
    }
    // key is the submodule key; subsubKeys are the remaining derivation keys
    key, subsubKeys := derivationKeys[0], derivationKeys[1:]
    submoduleAddress := Hash("module", []byte(moduleName) + 0 + key)
    return fold((a, k) => Derive(a, k), subsubKeys, submoduleAddress)
}
```

**Example 1** A lending BTC pool address would be:

```go
btcPool := address.Module("lending", btc.Address())
```

If we want to create an address for a module account depending on more than one key, we can concatenate them:

```go
btcAtomAMM := address.Module("amm", btc.Address() + atom.Address())
```

**Example 2** a smart-contract address could be constructed by:

```go
smartContractAddr = Module("mySmartContractVM", smartContractsNamespace, smartContractKey)

// which is equal to:
smartContractAddr = Derive(
    Module("mySmartContractVM", smartContractsNamespace),
    smartContractKey)
```

### Schema Types

A `typ` parameter used in the `Hash` function SHOULD be unique for
each account type.
Since all Cosmos SDK account types are serialized in the state, we propose to use the protobuf message name string.

Example: all public key types have a unique protobuf message type similar to:

```protobuf
package cosmos.crypto.sr25519;

message PubKey {
  bytes key = 1;
}
```

All protobuf messages have unique fully qualified names, in this example `cosmos.crypto.sr25519.PubKey`.
These names are derived directly from .proto files in a standardized way and used
in other places such as the type URL in `Any`s. We can easily obtain the name using
`proto.MessageName(msg)`.

## Consequences

### Backwards Compatibility

This ADR is compatible with what was committed and directly supported in the Cosmos SDK repository.

### Positive

* a simple algorithm for generating addresses for new public keys, complex accounts and modules
* the algorithm generalizes _native composed keys_
* increased security and collision resistance of addresses
* the approach is extensible for future use-cases - one can use other address types, as long as they don't conflict with the address length specified here (20 or 32 bytes).
* support new account types.

### Negative

* addresses do not communicate key type, a prefixed approach would have done this
* addresses are 60% longer and will consume more storage space
* requires a refactor of KVStore store keys to handle variable length addresses

### Neutral

* protobuf message names are used as key type prefixes

## Further Discussions

Some accounts can have a fixed name or may be constructed in other way (eg: modules). We were discussing an idea of an account with a predefined name (eg: `me.regen`), which could be used by institutions.
Without going into details, these kinds of addresses are compatible with the hash based addresses described here as long as they don't have the same length.
More specifically, any special account address must not have a length equal to 20 or 32 bytes.

## Appendix: Consulting session

At the end of Dec 2020 we had a session with [Alan Szepieniec](https://scholar.google.be/citations?user=4LyZn8oAAAAJ&hl=en) to consult on the approach presented above.

Alan's general observations:

* we don’t need 2-preimage resistance
* we need 32bytes address space for collision resistance
* when an attacker can control an input for object with an address then we have a problem with birthday attack
* there is an issue with smart-contracts for hashing
* sha2 mining can be used to break an address pre-image

Hashing algorithm

* any attack breaking blake3 will break blake2
* Alan is pretty confident about the current security analysis of the blake hash algorithm. It was a finalist, and the author is well known in security analysis.

Algorithm:

* Alan recommends to hash the prefix: `address(pub_key) = hash(hash(key_type) + pub_key)[:32]`, main benefits:
    * we are free to use arbitrarily long prefix names
    * we still don’t risk collisions
    * switch tables
* discussion about penalization -> about adding prefix post hash
* Aaron asked about post hash prefixes (`address(pub_key) = key_type + hash(pub_key)`) and differences. Alan noted that this approach has longer address space and it’s stronger.

Algorithm for complex / composed keys:

* merging tree like addresses with same algorithm are fine

Module addresses: Should module addresses have different size to differentiate it?

* we will need to set a pre-image prefix for module addresses to keep them in 32-byte space: `hash(hash('module') + module_key)`
* Aaron's observation: we already need to deal with variable length (to not break secp256k1 keys).

Discussion about arithmetic hash function for ZKP

* Poseidon / Rescue
* Problem: much bigger risk because we don’t know many techniques and much history of crypto-analysis of arithmetic constructions. It’s still new ground and an area of active research.

Post quantum signature size

* Alan's suggestion: Falcon: speed / size ratio - very good.
* Aaron - should we think about it?
  Alan: based on early extrapolation this thing will be able to break EC cryptography in 2050. But that’s a lot of uncertainty. But there is magic happening with recursion / linking / simulation and that can speed up the progress.

Other ideas

* Let’s say we use the same key and two different address algorithms for 2 different use cases. Is it still safe to use it? Alan: if we want to hide the public key (which is not our use case), then it’s less secure but there are fixes.

### References

* [Notes](https://hackmd.io/_NGWI4xZSbKzj1BkCqyZMw)