# Peer Manager

The peer manager is responsible for establishing connections with peers.
It defines when a node should dial peers and which peers it should dial.
The peer manager is not an implementation abstraction of the p2p layer,
but a role that is played by the [PEX reactor](./pex.md).

## Outbound peers

The `ensurePeersRoutine` is a persistent routine intended to ensure that a node
is connected to `MaxNumOutboundPeers` outbound peers.
This routine is continuously executed by regular nodes, i.e. nodes not
operating in seed mode, as part of the PEX reactor implementation.

The logic for deciding when the node should dial peers, for selecting peers to
dial, and for actually dialing them is implemented in the `ensurePeers` method.
This method is periodically invoked -- every `ensurePeersPeriod`, with a default
value of 30 seconds -- by the `ensurePeersRoutine`.

A node is expected to dial peers whenever the number of outbound peers is lower
than the configured `MaxNumOutboundPeers` parameter.
The current number of outbound peers is retrieved from the switch, using the
`NumPeers` method, which also reports the number of nodes to which the switch
is currently dialing.
If the number of outbound peers plus the number of dialing routines equals
`MaxNumOutboundPeers`, nothing is done.
Otherwise, the `ensurePeers` method will attempt to dial node addresses in
order to reach the target number of outbound peers.

Once it is determined that the node needs additional outbound peers, the node
queries the address book for candidate addresses.
This is done using the [`PickAddress`](./addressbook.md#pick-address) method,
which returns an address selected at random from the address book, with some bias
towards new or old addresses.
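
The check on outbound peers and dialing routines described above can be illustrated with a minimal Go sketch. The function `peersToDial` and its parameter names are hypothetical, introduced only for illustration; this is not the PEX reactor's actual code. Clamping the result at zero when the sum exceeds the target is an assumption, since the text only covers the case where the two are equal.

```go
package main

import "fmt"

// peersToDial is a hypothetical helper (not part of CometBFT) that returns
// how many additional addresses a node would try to dial, given the number
// of outbound peers, the number of in-flight dialing routines, and the
// configured MaxNumOutboundPeers target.
func peersToDial(outbound, dialing, maxOutbound int) int {
	missing := maxOutbound - (outbound + dialing)
	if missing < 0 {
		// Assumption: if the target is already exceeded, nothing is dialed.
		return 0
	}
	return missing
}

func main() {
	// 4 outbound peers and 2 dials in flight, with a target of 10.
	fmt.Println(peersToDial(4, 2, 10))
	// Outbound peers plus dialing routines already equal the target.
	fmt.Println(peersToDial(8, 2, 10))
}
```

Counting in-flight dials alongside established outbound peers keeps successive invocations from starting more dialing routines than are needed to reach the target.
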
When the node has up to 3 outbound peers, the adopted bias is towards old
addresses, i.e., addresses of peers that are believed to be "good".
When the node has 5 or more outbound peers, the adopted bias is towards new
addresses, i.e., addresses of peers about which the node has not yet collected
much information.
So, the more outbound peers a node has, the less conservative it will be when
selecting new peers.

The selected peer addresses are then dialed in parallel, by starting a dialing
routine per peer address.
Dialing a peer address can fail for multiple reasons.
The node might have attempted to dial the peer too many times.
In this case, the peer address is marked as bad and removed from the address book.
The node might have attempted and failed to dial the peer recently
and the exponential `backoffDuration` has not yet passed.
Or the current connection attempt might fail, which is registered in the address book.
None of these errors are explicitly handled by the `ensurePeers` method, which
also does not wait until the connections are established.

The third step of the `ensurePeers` method is to ensure that the address book
has enough addresses.
This is done, first, by [reinstating banned peers](./addressbook.md#Reinstating-addresses)
whose ban period has expired.
Then, the node randomly selects a connected peer, which can be either an
inbound or outbound peer, from which to [request addresses](./pex-protocol.md#Requesting-Addresses)
using the PEX protocol.
Last, and this action is only performed if the node could not retrieve any new
address to dial from the address book, the node dials the configured seed nodes
in order to establish a connection to at least one of them.

### Fast dialing

As described above, seed nodes are actually the last source of peer addresses
for regular nodes.
They are contacted by a node when, after an invocation of the `ensurePeers`
method, no suitable peer address to dial is retrieved from the address book
(e.g., because it is empty).

Once a connection with a seed node is established, the node immediately
[sends a PEX request](./pex-protocol.md#Requesting-Addresses) to it, as it is
added as an outbound peer.
When the corresponding PEX response is received, the addresses provided by the
seed node are added to the address book.
As a result, in the next invocation of the `ensurePeers` method, the node
should be able to dial some of the peer addresses provided by the seed node.

However, as observed in this [issue](https://github.com/tendermint/tendermint/issues/2093),
it can take some time, up to `ensurePeersPeriod` or 30 seconds, between when the
node receives new peer addresses and when it dials the received addresses.
To avoid this delay, which can be particularly relevant when the node has no
peers, a node immediately attempts to dial peer addresses when they are
received from a peer that is locally configured as a seed node.

> FIXME: The current logic was introduced in [#3762](https://github.com/tendermint/tendermint/pull/3762).
> Although it fixes the issue, i.e., the delay between receiving an address and
> dialing the peer, it does not impose any limit on how many addresses are dialed
> in this scenario.
> So, all addresses received from a seed node are dialed, regardless of the
> current number of outbound peers, the number of dialing routines, or the
> `MaxNumOutboundPeers` parameter.
>
> Issue [#9548](https://github.com/tendermint/tendermint/issues/9548) was
> created to handle this situation.

### First round

When the PEX reactor is started, the `ensurePeersRoutine` is created and it
runs throughout the operation of a node, periodically invoking the `ensurePeers`
method.
However, if the node already has some peers, either inbound or outbound, or is
dialing some addresses when the persistent routine is started, the
first invocation of `ensurePeers` is delayed by a random amount of time from 0
to `ensurePeersPeriod`.

### Persistent peers

The node configuration can contain a list of *persistent peers*.
Those peers have preferential treatment compared to regular peers and the node
is always trying to connect to them.
Moreover, these peers are not removed from the address book in the case of
multiple failed dial attempts.

On startup, the node immediately tries to dial the configured persistent peers
by calling the switch's [`DialPeersAsync`](./switch.md#manual-operation) method.
This is not done in the p2p package, but it is part of the procedure to set up a node.

> TODO: the handling of persistent peers should be described in more detail.

### Life cycle

The picture below is a first attempt at illustrating the life cycle of an outbound peer:

<img src="../images/p2p_state.png" width="50%" title="Outgoing peers lifecycle">

A peer can be in the following states:

- Candidate peers: peer addresses stored in the address book, that can be
  retrieved via the [`PickAddress`](./addressbook.md#pick-address) method
- [Dialing](./switch.md#dialing-peers): peer addresses that are currently being
  dialed. This state exists to ensure that a single dialing routine exists per peer.
- [Reconnecting](./switch.md#reconnect-to-peer): persistent peers to which a node
  is currently reconnecting, as a previous connection attempt has failed.
- Connected peers: peers that a node has successfully dialed, added as outbound peers.
- [Bad peers](./addressbook.md#bad-peers): peers marked as bad in the address
  book due to exhibited [misbehavior](./pex-protocol.md#misbehavior).
  Peers can be reinstated after being marked as bad.

## Pending of documentation

The `dialSeeds` method of the PEX reactor.

The `dialPeer` method of the PEX reactor.
This includes the `dialAttemptsInfo` and `maxBackoffDurationForPeer` methods.