# Peer Manager

The peer manager is responsible for establishing connections with peers.
It defines when a node should dial peers and which peers it should dial.
The peer manager is not an implementation abstraction of the p2p layer,
but a role that is played by the [PEX reactor](./pex.md).

## Outbound peers

The `ensurePeersRoutine` is a persistent routine intended to ensure that a node
is connected to `MaxNumOutboundPeers` outbound peers.
This routine is continuously executed by regular nodes, i.e., nodes not
operating in seed mode, as part of the PEX reactor implementation.

The logic defining when the node should dial peers, which peers to select for
dialing, and how to actually dial them is implemented in the `ensurePeers` method.
This method is periodically invoked by the `ensurePeersRoutine`, every
`ensurePeersPeriod`, which has a default value of 30 seconds.

A node is expected to dial peers whenever the number of outbound peers is lower
than the configured `MaxNumOutboundPeers` parameter.
The current number of outbound peers is retrieved from the switch using the
`NumPeers` method, which also reports the number of nodes the switch is
currently dialing.
If the number of outbound peers plus the number of dialing routines reaches
`MaxNumOutboundPeers`, nothing is done.
Otherwise, the `ensurePeers` method attempts to dial node addresses in
order to reach the target number of outbound peers.

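The decision rule above boils down to a subtraction. The Go sketch below is illustrative only: `numToDial` is a hypothetical helper whose arguments stand in for the outbound and dialing counts reported by the switch's `NumPeers` method and for `MaxNumOutboundPeers`; clamping a negative result to zero is our assumption.

```go
package main

import "fmt"

// numToDial is a hypothetical helper illustrating the rule described above:
// outbound peers and in-progress dials both count towards the target, and
// nothing is dialed once their sum reaches maxOutbound.
func numToDial(outbound, dialing, maxOutbound int) int {
	n := maxOutbound - (outbound + dialing)
	if n < 0 {
		// Assumption: a sum above the limit is treated like one at the limit.
		return 0
	}
	return n
}

func main() {
	// Example with a hypothetical limit of 10 outbound peers.
	fmt.Println(numToDial(4, 2, 10)) // 4 more addresses to dial
	fmt.Println(numToDial(8, 2, 10)) // limit reached: nothing to do
}
```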
Once the node has determined that it needs additional outbound peers, it queries
the address book for candidate addresses.
This is done using the [`PickAddress`](./addressbook.md#pick-address) method,
which returns an address selected at random from the address book, with some bias
towards new or old addresses.
When the node has at most 3 outbound peers, the bias is towards old
addresses, i.e., addresses of peers that are believed to be "good".
When the node has 5 or more outbound peers, the bias is towards new
addresses, i.e., addresses of peers about which the node has not yet collected
much information.
So, the more outbound peers a node has, the less conservative it is when
selecting new peers.

The selected peer addresses are then dialed in parallel, by starting one dialing
routine per peer address.
Dialing a peer address can fail for multiple reasons.
The node might have already attempted to dial the peer too many times;
in this case, the peer address is marked as bad and removed from the address book.
The node might have attempted and failed to dial the peer recently,
and the exponential `backoffDuration` has not yet elapsed.
Or the current connection attempt might fail, in which case the failed attempt
is registered in the address book.
None of these errors are explicitly handled by the `ensurePeers` method, which
also does not wait until the connections are established.

The third step of the `ensurePeers` method is to ensure that the address book
has enough addresses.
This is done, first, by [reinstating banned peers](./addressbook.md#Reinstating-addresses)
whose ban period has expired.
Then, the node randomly selects a connected peer, which can be either an
inbound or an outbound peer, to [request addresses](./pex-protocol.md#Requesting-Addresses)
from it using the PEX protocol.
Last, and only if the node could not retrieve any new address to dial from the
address book, the node dials the configured seed nodes in order to establish a
connection to at least one of them.

### Fast dialing

As described above, seed nodes are actually the last source of peer addresses
for regular nodes.
They are contacted by a node when, after an invocation of the `ensurePeers`
method, no suitable peer address to dial is retrieved from the address book
(e.g., because it is empty).

Once a connection with a seed node is established, the node immediately
[sends a PEX request](./pex-protocol.md#Requesting-Addresses) to it, as it is
added as an outbound peer.
When the corresponding PEX response is received, the addresses provided by the
seed node are added to the address book.
As a result, in the next invocation of the `ensurePeers` method, the node
should be able to dial some of the peer addresses provided by the seed node.

However, as observed in this [issue](https://github.com/tendermint/tendermint/issues/2093),
it can take some time, up to `ensurePeersPeriod` (30 seconds by default), between
when the node receives new peer addresses and when it dials them.
To avoid this delay, which can be particularly relevant when the node has no
peers, a node immediately attempts to dial peer addresses when they are
received from a peer that is locally configured as a seed node.

> FIXME: The current logic was introduced in [#3762](https://github.com/tendermint/tendermint/pull/3762).
> Although it fixes the issue (the delay between receiving an address and dialing
> the peer), it does not impose any limit on how many addresses are dialed in this
> scenario.
> So, all addresses received from a seed node are dialed, regardless of the
> current number of outbound peers, the number of dialing routines, or the
> `MaxNumOutboundPeers` parameter.
>
> Issue [#9548](https://github.com/tendermint/tendermint/issues/9548) was
> created to handle this situation.

### First round

When the PEX reactor is started, the `ensurePeersRoutine` is created and it
runs throughout the operation of the node, periodically invoking the `ensurePeers`
method.
However, if the node already has some peers (either inbound or outbound) or is
dialing some addresses when the persistent routine is started, the first
invocation of `ensurePeers` is delayed by a random amount of time from 0
to `ensurePeersPeriod`.
Otherwise, `ensurePeers` is invoked immediately.

### Persistent peers

The node configuration can contain a list of *persistent peers*.
These peers receive preferential treatment compared to regular peers: the node
is always trying to connect to them.
Moreover, these peers are not removed from the address book in the case of
multiple failed dial attempts.

On startup, the node immediately tries to dial the configured persistent peers
by calling the switch's [`DialPeersAsync`](./switch.md#manual-operation) method.
This is not done in the p2p package, but is part of the procedure to set up a node.

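The retention rule for persistent peers can be sketched with a toy, map-based address book. `recordFailedDial`, `knownAddr`, and the limit of 16 attempts are hypothetical; deleting the entry stands in for marking the address as bad, and the sketch does not model reconnection.

```go
package main

import "fmt"

// Hypothetical limit on failed dial attempts for regular peers.
const maxAttempts = 16

// knownAddr is a hypothetical address-book entry.
type knownAddr struct {
	attempts   int
	persistent bool
}

// recordFailedDial sketches the retention rule: a regular address is dropped
// after too many failures, while a persistent peer's address is always kept.
func recordFailedDial(book map[string]*knownAddr, addr string) {
	ka, ok := book[addr]
	if !ok {
		return
	}
	ka.attempts++
	if ka.attempts > maxAttempts && !ka.persistent {
		delete(book, addr)
	}
}

func main() {
	book := map[string]*knownAddr{
		"regular":    {},
		"persistent": {persistent: true},
	}
	for i := 0; i < 20; i++ {
		recordFailedDial(book, "regular")
		recordFailedDial(book, "persistent")
	}
	_, persistentKept := book["persistent"]
	_, regularKept := book["regular"]
	fmt.Println(persistentKept, regularKept) // true false
}
```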
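Persistent peers are listed in the node configuration; in CometBFT's `config.toml`, as far as we know, this is the `persistent_peers` option of the `[p2p]` section, a comma-separated list of `ID@host:port` entries. The hypothetical `parsePersistentPeers` below only illustrates turning such a list into addresses that could be handed to a dialer; it does not reproduce `DialPeersAsync`, does not validate the node ID format, and the IDs in the example are placeholders rather than real node IDs.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// peerAddr is a hypothetical, simplified representation of a peer address.
type peerAddr struct {
	ID   string
	Host string
	Port string
}

// parsePersistentPeers splits a comma-separated list of ID@host:port entries,
// skipping empty entries.
func parsePersistentPeers(s string) ([]peerAddr, error) {
	var out []peerAddr
	for _, entry := range strings.Split(s, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		id, hostPort, ok := strings.Cut(entry, "@")
		if !ok || id == "" {
			return nil, fmt.Errorf("missing node ID in %q", entry)
		}
		host, port, err := net.SplitHostPort(hostPort)
		if err != nil {
			return nil, fmt.Errorf("invalid address in %q: %w", entry, err)
		}
		out = append(out, peerAddr{ID: id, Host: host, Port: port})
	}
	return out, nil
}

func main() {
	peers, err := parsePersistentPeers("aaaa@10.0.0.1:26656, bbbb@node2.example.com:26656")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range peers {
		fmt.Printf("dial %s at %s:%s\n", p.ID, p.Host, p.Port)
	}
}
```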
> TODO: the handling of persistent peers should be described in more detail.

### Life cycle

The picture below is a first attempt at illustrating the life cycle of an outbound peer:

<img src="../images/p2p_state.png" width="50%" title="Outgoing peers lifecycle">

A peer can be in the following states:

- Candidate peers: peer addresses stored in the address book that can be
  retrieved via the [`PickAddress`](./addressbook.md#pick-address) method.
- [Dialing](./switch.md#dialing-peers): peer addresses that are currently being
  dialed. This state exists to ensure that a single dialing routine exists per peer.
- [Reconnecting](./switch.md#reconnect-to-peer): persistent peers to which a node
  is currently reconnecting, as a previous connection attempt has failed.
- Connected peers: peers that a node has successfully dialed and added as outbound peers.
- [Bad peers](./addressbook.md#bad-peers): peers marked as bad in the address
  book due to exhibited [misbehavior](./pex-protocol.md#misbehavior).
  Peers can be reinstated after being marked as bad.

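Since the figure is not reproduced here, the states above can also be read as a small state machine. The transitions in this sketch are our own reading of the state descriptions in this section (a candidate is dialed or, after too many attempts, marked bad; a failed dial returns to the candidate set or, for persistent peers, goes to reconnecting; a misbehaving peer becomes bad; a bad peer can be reinstated). They are not taken from the figure and may be incomplete; `peerState`, `transitions`, and `canMove` are hypothetical names.

```go
package main

import "fmt"

type peerState int

const (
	candidate peerState = iota
	dialing
	reconnecting
	connected
	bad
)

func (s peerState) String() string {
	return [...]string{"candidate", "dialing", "reconnecting", "connected", "bad"}[s]
}

// transitions is a hypothetical reading of the state descriptions above.
var transitions = map[peerState][]peerState{
	candidate:    {dialing, bad},
	dialing:      {connected, candidate, reconnecting},
	reconnecting: {connected},
	connected:    {bad},
	bad:          {candidate},
}

// canMove reports whether the sketch allows moving from one state to another.
func canMove(from, to peerState) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	path := []peerState{candidate, dialing, reconnecting, connected, bad, candidate}
	for i := 1; i < len(path); i++ {
		fmt.Printf("%v -> %v: %v\n", path[i-1], path[i], canMove(path[i-1], path[i]))
	}
}
```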
## Pending documentation

The `dialSeeds` method of the PEX reactor.

The `dialPeer` method of the PEX reactor,
including the `dialAttemptsInfo` and `maxBackoffDurationForPeer` methods.