# Address Book

The address book tracks information about peers, i.e., about other nodes in the network.

The primary information stored in the address book consists of peer addresses.
A peer address is composed of a node ID and a network address; a network
address is composed of an IP address or a DNS name plus a port number.
The same node ID can be associated with multiple network addresses.

There are two sources for the addresses stored in the address book.
The [Peer Exchange protocol](./pex-protocol.md) stores in the address book
the peer addresses it discovers, i.e., it learns from connected peers.
The [Switch](./switch.md) registers the addresses of peers with which it
has interacted: to which it has dialed or from which it has accepted a
connection.

The address book also records additional information about peers with which the
node has interacted, from which it is possible to rank peers.
The Switch reports [connection attempts](#dial-attempts) to a peer address; too
many failed attempts indicate that a peer address is invalid.
Reactors, in turn, report a peer as [good](#good-peers) when it behaves as
expected, or as a [bad peer](#bad-peers) when it misbehaves.

There are two entities that retrieve peer addresses from the address book.
The [Peer Manager](./peer_manager.md) retrieves peer addresses to dial, so as to
establish outbound connections.
This selection is random, but has a configurable bias towards peers that have
been marked as good peers.
The [Peer Exchange protocol](./pex-protocol.md) retrieves random samples of
addresses to offer (send) to peers.
This selection is also random but it includes, in particular for nodes that
operate in seed mode, some bias toward peers marked as good ones.

## Buckets

Peer addresses are stored in buckets.
There are buckets for new addresses and buckets for old addresses.
The buckets for new addresses store addresses of peers about which the node
does not have much information; the first address registered for a peer ID is
always stored in a bucket for new addresses.
The buckets for old addresses store addresses of peers with which the node has
interacted and that were reported as [good peers](#good-peers) by a reactor.
An old address therefore can be seen as an alias for a good address.

> Note that a new address is not necessarily a bad address.
> The addresses of peers marked as [bad peers](#bad-peers) are removed from the
> buckets where they are stored, and temporarily kept in a table of banned peers.

The number of buckets is fixed and there are more buckets for new addresses
(`256`) than buckets for old addresses (`64`), a ratio of 4:1.
Each bucket can store up to `64` addresses.
When a bucket becomes full, the peer address with the lowest ranking is removed
from the bucket.
The first choice is to remove bad addresses, i.e., addresses with multiple
failed attempts associated.
In the absence of those, the *oldest* address in the bucket is removed, i.e.,
the address with the oldest last attempt to dial.

When a bucket for old addresses becomes full, the lowest-ranked peer address in
the bucket is moved to a bucket of new addresses.
When a bucket for new addresses becomes full, the lowest-ranked peer address in
the bucket is removed from the address book.
In other words, excess old or good addresses are downgraded to new
addresses, while excess new addresses are dropped.
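
The eviction order described above can be sketched as follows. This is a minimal illustration, not the address book's code: the `entry` type, its `bad` flag, and the `pickVictim` function are hypothetical names introduced here, and the sketch assumes that whether an address is bad has already been decided elsewhere.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a hypothetical, simplified view of an address stored in a bucket.
type entry struct {
	addr        string
	bad         bool      // assumed flag: the address has multiple failed attempts
	lastAttempt time.Time // time of the last attempt to dial this address
}

// pickVictim returns the index of the entry to remove from a full bucket:
// the first bad entry, if any; otherwise the entry with the oldest
// lastAttempt. It returns -1 for an empty bucket.
func pickVictim(bucket []entry) int {
	oldest := -1
	for i, e := range bucket {
		if e.bad {
			return i
		}
		if oldest == -1 || e.lastAttempt.Before(bucket[oldest].lastAttempt) {
			oldest = i
		}
	}
	return oldest
}

func main() {
	now := time.Now()
	bucket := []entry{
		{addr: "a", lastAttempt: now.Add(-1 * time.Hour)},
		{addr: "b", lastAttempt: now.Add(-3 * time.Hour)},
		{addr: "c", lastAttempt: now.Add(-2 * time.Hour)},
	}
	// No bad entry: the one with the oldest last attempt is chosen.
	fmt.Println(bucket[pickVictim(bucket)].addr)

	// A bad entry takes precedence over age.
	bucket[2].bad = true
	fmt.Println(bucket[pickVictim(bucket)].addr)
}
```

What happens to the selected entry depends on the bucket type: for a bucket of old addresses it would be moved to a bucket of new addresses, while for a bucket of new addresses it would be dropped.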

The bucket that stores an `address` is defined by the following two methods,
for new and old addresses:

- `calcNewBucket(address, source) = hash(key + groupKey(source) + hash(key + groupKey(address) + groupKey(source)) % newBucketsPerGroup) % newBucketCount`
- `calcOldBucket(address) = hash(key + groupKey(address) + hash(key + address) % oldBucketsPerGroup) % oldBucketCount`

The `key` is a fixed random 96-bit (12-byte) string.
The `groupKey` for an address is a string representing its network group.
The `source` of an address is the address of the peer from which we learn the
address.
The first (internal) hash is reduced to an integer up to
`newBucketsPerGroup = 32`, for new addresses, and `oldBucketsPerGroup = 4`, for old addresses.
The second (external) hash is reduced to bucket indexes, in the interval from 0
to the number of new (`newBucketCount = 256`) or old (`oldBucketCount = 64`) buckets.

Notice that new addresses with sources from the same network group are more
likely to end up in the same bucket, and therefore to compete for it.
For old addresses, instead, two addresses are more likely to end up in the same
bucket when they belong to the same network group.

## Adding addresses

The `AddAddress` method adds the address of a peer to the address book.

The added address is associated with a *source* address, which identifies the
node from which the peer address was learned.

Addresses are added to the address book in the following situations:

1. When a peer address is learned via PEX protocol, having the sender
   of the PEX message as its source
2. When an inbound peer is added, in this case the peer itself is set as the
   source of its own address
3.
   When the switch is instructed to dial addresses via the `DialPeersAsync`
   method, in this case the node itself is set as the source

If the added address contains a node ID that is not registered in the address
book, the address is added to a [bucket](#buckets) of new addresses.
Otherwise, the additional address for an existing node ID is **not added** to
the address book when:

- The last address added with the same node ID is stored in an old bucket, so
  it is considered a "good" address
- There are addresses associated with the same node ID stored in
  `maxNewBucketsPerAddress = 4` distinct buckets
- Randomly, with a probability that increases exponentially with the number of
  buckets in which there is an address with the same node ID.
  So, a new address for a node ID which is already present in one bucket is
  added with 50% probability; if the node ID is present in two buckets, the
  probability decreases to 25%; and if it is present in three buckets, the
  probability is 12.5%.

The new address is also added to the `addrLookup` table, which stores
`knownAddress` entries indexed by their node IDs.
If the new address is from an unknown peer, a new entry is added to the
`addrLookup` table; otherwise, the existing entry is updated with the new
address.
Entries of this table contain, among other fields, the list of buckets where
addresses of a peer are stored.
The `addrLookup` table is used by most of the address book methods (e.g.,
`HasAddress`, `IsGood`, `MarkGood`, `MarkAttempt`), as it provides fast access
to addresses.
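
The random rule for additional addresses of a known node ID can be sketched as below. The sketch follows the probabilities stated in this section (50%, 25%, 12.5%) and is not taken from the implementation; `acceptProbability` and `shouldAdd` are hypothetical helpers, and the check for an address already stored in an old bucket is left out.

```go
package main

import (
	"fmt"
	"math/rand"
)

const maxNewBucketsPerAddress = 4

// acceptProbability returns the probability of adding one more address for a
// node ID that is already present in n buckets of new addresses, following
// the text: 1 for n = 0 (unknown node ID), 1/2^n while n is below
// maxNewBucketsPerAddress, and 0 once that limit is reached.
func acceptProbability(n int) float64 {
	if n < 0 || n >= maxNewBucketsPerAddress {
		return 0
	}
	return 1.0 / float64(int64(1)<<uint(n))
}

// shouldAdd draws a random decision with the probability above.
func shouldAdd(n int, rng *rand.Rand) bool {
	return rng.Float64() < acceptProbability(n)
}

func main() {
	for n := 0; n <= maxNewBucketsPerAddress; n++ {
		fmt.Printf("present in %d bucket(s): add with probability %.3f\n",
			n, acceptProbability(n))
	}

	// Empirical check for n = 2; roughly a quarter of the draws should succeed.
	rng := rand.New(rand.NewSource(1))
	added := 0
	for i := 0; i < 10000; i++ {
		if shouldAdd(2, rng) {
			added++
		}
	}
	fmt.Printf("n=2: added %d of 10000 draws\n", added)
}
```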

### Errors

- if the added address or the associated source address is nil
- if the added address is invalid
- if the added address is the local node's address
- if the added address ID is of a [banned](#bad-peers) peer
- if either the added address or the associated source address IDs are configured as private IDs
- if `routabilityStrict` is set and the address is not routable
- in case of failures computing the bucket for the new address (`calcNewBucket` method)
- if the added address instance, which is a new address, is configured as an
  old address (sanity check of `addToNewBucket` method)

## Need for Addresses

The `NeedMoreAddrs` method verifies whether the address book needs more addresses.

It is invoked by the PEX reactor to decide whether to request peer addresses
from a new outbound peer or from a randomly selected connected peer.

The address book needs more addresses when it has fewer than `1000` addresses
registered, counting all buckets for new and old addresses.

## Pick address

The `PickAddress` method returns an address stored in the address book, chosen
at random with a configurable bias towards new addresses.

It is invoked by the Peer Manager to obtain a peer address to dial, as part of
its `ensurePeers` routine.
The bias starts from 10%, when the node has no outbound peers, increasing by
10% for each outbound peer the node has, up to 90%, when the node has at least
8 outbound peers.

The configured bias is a parameter that influences the probability of choosing
an address from a bucket of new addresses or from a bucket of old addresses.
A second parameter influencing this choice is the number of new and old
addresses stored in the address book.
In the absence of bias (i.e., if the configured bias is 50%), the probability
of picking a new address is given by the square root of the number of new
addresses divided by the sum of the square roots of the numbers of new and old
addresses.
By adding a bias toward new addresses (i.e., configured bias larger than 50%),
the portion of the sample occupied by the square root of the number of new
addresses increases, while the corresponding portion for old addresses decreases.
As a result, it becomes more likely to pick a new address at random from this sample.

> The use of the square roots softens the impact of disproportional numbers of
> new and old addresses in the address book. This is actually the expected
> scenario, as there are 4 times more buckets for new addresses than buckets
> for old addresses.

Once the type of address, new or old, is defined, a non-empty bucket of this
type is selected at random.
From the selected bucket, an address is chosen at random and returned.
If all buckets of the selected type are empty, no address is returned.

## Random selection

The `GetSelection` method returns a selection of addresses stored in the
address book, with no bias toward new or old addresses.

It is invoked by the PEX protocol to obtain a list of peer addresses with two
purposes:

- To send to a peer in a PEX response, in the case of outbound peers or of
  nodes not operating in seed mode
- To crawl, in the case of nodes operating in seed mode, as part of every
  iteration of the `crawlPeersRoutine`

The selection is a random subset of the peer addresses stored in the
`addrLookup` table, which stores the last address added for each peer ID.
The target size of the selection is `23%` (`getSelectionPercent`) of the
number of addresses stored in the address book, but it should not be lower than
`32` (`minGetSelection`), in which case all addresses in the book are returned,
nor greater than `250` (`maxGetSelection`).

> The random selection is produced by:
>
> - Retrieving all entries of the `addrLookup` map, which by definition are
>   returned in random order.
> - Randomly shuffling the retrieved list, using the Fisher-Yates algorithm

## Random selection with bias

The `GetSelectionWithBias` method returns a selection of addresses stored in
the address book, with bias toward new addresses.

It is invoked by the PEX protocol to obtain a list of peer addresses to be sent
to a peer in a PEX response.
This method is only invoked by seed nodes, when replying to a PEX request
received from an inbound peer (i.e., a peer that dialed the seed node).
The bias used in this scenario is hard-coded to 30%, meaning that 70% of
the returned addresses are expected to be old addresses.

The number of addresses that compose the selection is computed in the same way
as for the non-biased random selection.
The bias toward new addresses is implemented by requiring that the configured
bias, interpreted as a percentage, of the selected addresses come from buckets of
new addresses, while the remaining come from buckets of old addresses.
Since the number of old addresses is typically lower than the number of new
addresses, it is possible that the address book does not have enough old
addresses to include in the selection.
In this case, additional new addresses are included in the selection.
Thus, the configured bias, in practice, is towards old addresses, not towards
new addresses.
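
The size of a selection and its split between new and old addresses can be sketched as below. This is one reading of the rules above, under stated assumptions: `selectionSize` and `splitByBias` are hypothetical helpers, a book with fewer than `minGetSelection` addresses is assumed to return all of them, percentages are truncated to integers, and the book is assumed to hold enough new addresses to fill its share.

```go
package main

import "fmt"

const (
	getSelectionPercent = 23
	minGetSelection     = 32
	maxGetSelection     = 250
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// selectionSize returns the number of addresses in a selection: 23% of the
// book, but at least minGetSelection (or the whole book, if it is smaller)
// and at most maxGetSelection.
func selectionSize(bookSize int) int {
	n := maxInt(minInt(minGetSelection, bookSize), bookSize*getSelectionPercent/100)
	return minInt(maxGetSelection, n)
}

// splitByBias returns how many new and old addresses compose a selection of
// size n, given the bias toward new addresses (a percentage) and the number
// of old addresses available. Missing old addresses are replaced by new ones.
func splitByBias(n, biasTowardsNew, availableOld int) (numNew, numOld int) {
	numNew = maxInt(n*biasTowardsNew/100, n-availableOld)
	return numNew, n - numNew
}

func main() {
	for _, size := range []int{10, 100, 1000, 2000} {
		fmt.Printf("book of %d addresses: selection of %d\n", size, selectionSize(size))
	}

	numNew, numOld := splitByBias(100, 30, 1000)
	fmt.Printf("plenty of old addresses: %d new, %d old\n", numNew, numOld)

	numNew, numOld = splitByBias(100, 30, 20)
	fmt.Printf("only 20 old addresses:   %d new, %d old\n", numNew, numOld)
}
```

The second call to `splitByBias` shows the effect noted above: when old addresses are scarce, the share of new addresses grows beyond the configured 30%.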

To randomly select addresses of a type, the address book considers all
addresses present in every bucket of that type.
This list of all addresses of a type is randomly shuffled, and the requested
number of addresses is retrieved from the tail of this list.
The returned selection contains, at its beginning, a random selection of new
addresses in random order, followed by a random selection of old addresses, in
random order.

## Dial Attempts

The `MarkAttempt` method records a failed attempt to connect to an address.

It is invoked by the Peer Manager when it fails dialing a peer, but the failure
is not in the authentication step (`ErrSwitchAuthenticationFailure` error).
In case of authentication errors, the peer is instead marked as a [bad peer](#bad-peers).

The failed connection attempt is recorded in the address registered for the
peer's ID in the `addrLookup` table, which is the last address added with that ID.
The known address's counter of failed `Attempts` is increased and the failure
time is registered in `LastAttempt`.

The possible effect of recording multiple failed connection attempts to a peer is
to turn its address into a *bad* address (not to be confused with banned addresses).
A known address becomes bad if it is stored in buckets of new addresses, and
when connection attempts:

- Have not been made for over a week, i.e., `LastAttempt` is older than a week
- Have failed 3 times and never succeeded, i.e., `LastSuccess` field is unset
- Have failed 10 times in the last week, i.e., `LastSuccess` is older than a week

Addresses marked as *bad* are the first candidates to be removed from a bucket of
new addresses when the bucket becomes full.

> Note that failed connection attempts are reported for a peer address, but in
> fact the address book records them for a peer.
>
> More precisely, failed connection attempts are recorded in the entry of the
> `addrLookup` table with the reported peer ID, which contains the last address
> added for that node ID, which is not necessarily the reported peer address.

## Good peers

The `MarkGood` method marks a peer ID as good.

It is invoked by the consensus reactor, via switch, when the number of useful
messages received from a peer is a multiple of `10000`.
Vote and block part messages are considered for this number; they must be valid
and not be duplicated messages to be considered useful.

> The `SwitchReporter` type of `behaviour` package also invokes the `MarkGood`
> method when a "reason" associated with consensus votes and block parts is
> reported.
> No reactor, however, currently provides these "reasons" to the `SwitchReporter`.

The effect of this action is that the address registered for the peer's ID in the
`addrLookup` table, which is the last address added with that ID, is marked as
good and moved to a bucket of old addresses.
An address marked as good has its failed-connection counter and timestamp reset.
If the destination bucket of old addresses is full, the oldest address in the
bucket is moved (downgraded) to a bucket of new addresses.

Moving the peer address to a bucket of old addresses has the effect of
upgrading, or increasing the ranking of a peer in the address book.

## Bad peers

The `MarkBad` method marks a peer as bad and bans it for a period of time.

This method is only invoked within the PEX reactor, with a banning time of 24
hours, for the following reasons:

- A peer misbehaves in the [PEX protocol](pex-protocol.md#misbehavior)
- When the `maxAttemptsToDial` limit (`16`) is reached for a peer
- If an `ErrSwitchAuthenticationFailure` error is returned when dialing a peer

The effect of this action is that the address registered for the peer's ID in the
`addrLookup` table, which is the last address added with that ID, is banned for
a period of time.
The banned peer is removed from the `addrLookup` table and from all buckets
where its addresses are stored.

The information about banned peers, however, is not discarded.
It is maintained in the `badPeers` map, indexed by peer ID.
This allows, in particular, addresses of banned peers to be
[reinstated](#reinstating-addresses), i.e., to be added
back to the address book, when their ban period expires.

## Reinstating addresses

The `ReinstateBadPeers` method attempts to re-add banned addresses to the address book.

It is invoked by the PEX reactor when dialing new peers.
This action is taken before requesting additional addresses from peers,
in the case that the node needs more peer addresses.

The set of banned peer addresses is retrieved from the `badPeers` map.
Addresses that are no longer banned, i.e., whose ban period has expired,
are added back to the address book as new addresses, while the corresponding
node IDs are removed from the `badPeers` map.

## Removing addresses

The `RemoveAddress` method removes an address from the address book.

It is invoked by the switch when it dials a peer or accepts a connection from a
peer that ends up being the node itself (`IsSelf` error).
In both cases, the address dialed or accepted is also added to the address book
as a local address, via the `AddOurAddress` method.

The same logic is also internally used by the address book for removing
addresses of a peer that is [marked as a bad peer](#bad-peers).

The entry registered with the peer ID of the address in the `addrLookup` table,
which is the last address added with that ID, is removed from all buckets where
it is stored and from the `addrLookup` table.

> FIXME: is it possible that addresses with the same ID as the removed address,
> but with distinct network addresses, are kept in buckets of the address book?
> While they will not be accessible anymore, as there is no reference to them
> in the `addrLookup`, they will still be there.

## Persistence

The `loadFromFile` method, called when the address book is started, reads
address book entries from a file, passed to the address book constructor.
The file, at this point, does not need to exist.

The `saveRoutine` is started when the address book is started.
It saves the address book to the configured file every `dumpAddressInterval`,
hard-coded to 2 minutes.
It is also possible to save the content of the address book using the `Save`
method.
Saving the address book content to a file acquires the address book lock, also
employed by all other public methods.
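
A periodic save loop of the kind described above can be sketched with a ticker. This is an illustration only: `runSaveRoutine` and its parameters are hypothetical, the `save` callback stands in for writing the address book to its file, and the final save on shutdown is an assumption of this sketch rather than something stated in this section.

```go
package main

import (
	"fmt"
	"time"
)

// runSaveRoutine calls save once per interval until quit is closed.
// Before returning it calls save one last time (an assumption of this sketch).
func runSaveRoutine(interval time.Duration, save func(), quit <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			save()
		case <-quit:
			save()
			return
		}
	}
}

func main() {
	saves := 0
	quit := make(chan struct{})
	done := make(chan struct{})

	go func() {
		// 2 minutes mirrors the dumpAddressInterval mentioned in the text.
		runSaveRoutine(2*time.Minute, func() { saves++ }, quit)
		close(done)
	}()

	// Stop right away: no tick has fired, so only the final save runs.
	close(quit)
	<-done
	fmt.Println("saves:", saves)
}
```

In a real address book the `save` callback would hold the address book lock while writing, so that it does not interleave with other public methods.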