github.com/onflow/flow-go@v0.35.7-crescendo-preview.23-atree-inlining/network/p2p/unicast/README.MD (about) 1 # Unicast Manager 2 3 ## Overview 4 In Flow blockchain, nodes communicate with each other in 3 different ways; `unicast`, `multicast`, and `publish`. 5 The `multicast` and `publish` are handled by the pubsub (GossipSub) protocol. 6 The `unicast` is a protocol that is used to send messages over direct (one-to-one) connections to remote nodes. 7 Each `unicast` message is sent through a single-used, one-time stream. One can see a stream as a virtual protocol 8 that expands the base direct connection into a full-duplex communication channel. 9 Figure below illustrates the notion of direct connection and streams between nodes A and B. The direct 10 connection is established between the nodes and then the nodes can open multiple streams over the connection. 11 The streams are shown with dashed green lines, while the direct connection is illustrated by blue lines that 12 encapsulates the streams. 13 ![streams.png](streams.png) 14 15 The `unicast` `Manager` is responsible for _establishing_ streams between nodes when they need to communicate 16 over `unicast` protocol. When the manager receives a `CreateStream` invocation, it will try to establish a stream to the 17 remote `peer` whose identifier is provided in the invocation (`peer.ID`). The manager is expanding the libp2p 18 functionalities, hence, it operates on the notion of the `peer` (rather than Flow node), and `peer.ID` rather 19 than `flow.Identifier`. It is the responsibility of the caller to provide the correct `peer.ID` of the remote 20 node. 21 22 The `UnicastManager` relies on the underlying libp2p node to establish the connection to the remote peer. Once the underlying 23 libp2p node receives a stream creation request from the `UnicastManager`, it will try to establish a connection to the remote peer if 24 there is no existing connection to the peer. Otherwise, it will pick and re-use the best existing connection to the remote peer. 25 Hence, the `UnicastManager` does not (and should not) care about the connection establishment, and rather relies on the underlying 26 libp2p node to establish the connection. The `UnicastManager` only cares about the stream creation, and will return an error 27 if the underlying libp2p node fails to establish a connection to the remote peer. 28 29 30 A stream is a one-time communication channel, i.e., it is assumed to be closed 31 by the caller once the message is sent. The caller (i.e., the Flow node) does not necessarily re-use a stream, and the 32 `Manager` creates one stream per request (i.e., `CreateStream` invocation), which is typically a single message. 33 34 Note: the limit of number of streams and connections between nodes is set throught eh libp2p resource manager limits (see `config/default-config.yml`): 35 36 Note: `pubsub` protocol also establishes connections between nodes to exchange gossip messages with each other. 37 The connection type is the same between `pubsub` and `unicast` protocols, as they both consult the underlying LibP2P node to 38 establish the connection. However, the level of reliability, life-cycle, and other aspects of the connections are different 39 between the two protocols. For example, `pubsub` requires some _number_ of connections to some _number_ of peers, which in most cases 40 is regardless of their identity. However, `unicast` requires a connection to a specific peer, and the connection is assumed 41 to be persistent. Hence, both these protocols have their own notion of connection management; the `unicast` `Manager` is responsible 42 for establishing connections when `unicast` protocol needs to send a message to a remote peer, while the `PeerManager` is responsible 43 for establishing connections when `pubsub`. These two work in isolation and independent of each other to satisfy different requirements. 44 45 The `PeerManager` regularly checks the health of the connections and closes the connections to the peers that are not part of the Flow 46 protocol state. One the other hand, the `unicast` `Manager` only establishes a connection if there is no existing connection to the remote 47 peer. Currently, Flow nodes operate on a full mesh topology, meaning that every node is connected to every other node through `PeerManager`. 48 The `PeerManager` starts connecting to every remote node of the Flow protocol upon startup, and then maintains the connections unless the node 49 is disallow-listed or ejected by the protocol state. Accordingly, it is a rare event that a node does not have a connection to another node. 50 Also, that is the reason behind the `unicast` `Manager` not closing the connection after the stream is closed. The `unicast` `Manager` assumes 51 that the connection is persistent and will be kept open by the `PeerManager`. 52 53 ## Backoff and Retry Attempts 54 The flowchart below explains the abstract logic of the `UnicastManager` when it receives a `CreateStream` invocation. 55 On the happy path, the `UnicastManager` successfully opens a stream to the peer. 56 However, there can be cases that the remote peer is not reliable for stream creation, or the remote peer acts 57 maliciously and does not respond stream creation requests. In order to distinguish between the cases that the remote peer 58 is not reliable and the cases that the remote peer is malicious, the `UnicastManager` uses a backoff and retry mechanism. 59 60 ![retry.png](retry.png) 61 62 ### Addressing Unreliable Remote Peer 63 To address the unreliability of remote peer, upon an unsuccessful attempt to establish a stream, the `UnicastManager` will wait for a certain 64 amount of time before it tries to establish (i.e., the backoff mechanism), and will retry a certain number of times before it gives up (i.e., the retry mechanism). 65 The backoff and retry parameters are configurable through runtime flags. 66 If all backoff and retry attempts fail, the `UnicastManager` will return an error to the caller. The caller can then decide to retry the request or not. 67 By default, `UnicastManager` retries each stream creation attempt 3 times. Also, the backoff intervals for dialing and stream creation are initialized to 1 second and progress 68 exponentially with a factor of 2, i.e., the `i-th` retry attempt is made after `t * 2^(i-1)`, where `t` is the backoff interval. 69 For example, if the backoff interval is 1s, the first attempt is made right-away, the first (retry) attempt is made after 1s * 2^(1 - 1) = 1s, the third (retry) attempt is made 70 after `1s * 2^(2 - 1) = 2s`, and so on. 71 72 These parameters are configured using the `config/default-config.yml` file: 73 ```yaml 74 # Unicast create stream retry delay is initial delay used in the exponential backoff for create stream retries 75 unicast-create-stream-retry-delay: 1s 76 ``` 77 78 ### Addressing Malicious Remote Peer 79 The backoff and retry mechanism is used to address the cases that the remote peer is not reliable. 80 However, there can be cases that the remote peer is malicious and does not respond to stream creation requests. 81 Such cases may cause the `UnicastManager` to wait for a long time before it gives up, resulting in a resource exhaustion and slow-down of the stream creation. 82 To mitigate such cases, the `UnicastManager` uses a retry budget for the stream creation. The retry budgets are initialized 83 using the `config/default-config.yml` file: 84 ```yaml 85 # The maximum number of retry attempts for creating a unicast stream to a remote peer before giving up. If it is set to 3 for example, it means that if a peer fails to create 86 # retry a unicast stream to a remote peer 3 times, the peer will give up and will not retry creating a unicast stream to that remote peer. 87 # When it is set to zero it means that the peer will not retry creating a unicast stream to a remote peer if it fails. 88 unicast-max-stream-creation-retry-attempt-times: 3 89 ``` 90 91 As shown in the above snippet, the stream creation is set to 3 by default for every remote peer. 92 Each time the `UnicastManager` is invoked on `CreateStream` to `pid` (`peer.ID`), it loads the retry budgets for `pid` from the unicast config cache. 93 If no unicast config record exists for `pid`, one is created with the default retry budgets. The `UnicastManager` then uses the retry budgets to decide 94 whether to retry the stream creation attempt or not. If the retry budget for stream creation is exhausted, the `UnicastManager` 95 will not retry the stream creation attempt, and returns an error to the caller. The caller can then decide to retry the request or not. 96 Note that even when the retry budget is exhausted, the `UnicastManager` will try the stream creation attempt once, though it will not retry the attempt if it fails. 97 98 #### Penalizing Malicious Remote Peer 99 Each time the `UnicastManager` fails to create a stream to a remote peer and exhausts the retry budget, it penalizes the remote peer as follows: 100 - If the `UnicastManager` exhausts the retry budget for stream creation, it will decrement the stream creation retry budget for the remote peer. 101 - If the retry budget reaches zero, the `UnicastManager` will only attempt once to create a stream to the remote peer, and will not retry the attempt, and rather return an error to the caller. 102 - When the budget reaches zero, the `UnicastManager` will not decrement the budget anymore. 103 104 **Note:** `UnicastManager` is part of the networking layer of the Flow node, which is a lower-order component than 105 the Flow protocol engines who call the `UnicastManager` to send messages to remote peers. Hence, the `UnicastManager` _must not_ outsmart 106 the Flow protocol engines on deciding whether to _create stream_ in the first place. This means that `UnicastManager` will attempt 107 to create stream even to peers with zero retry budgets. However, `UnicastManager` does not retry attempts for the peers with zero budgets, and rather 108 returns an error immediately upon a failure. This is the responsibility of the Flow protocol engines to decide whether 109 to send a message to a remote peer or not after a certain number of failures. 110 111 #### Restoring Retry Budgets 112 113 The `UnicastManager` may reset the stream creation budget for a remote peers _from zero to the default values_ in the following cases: 114 115 - **Restoring Stream Creation Retry Budget**: To restore the stream creation budget from zero to the default value, the `UnicastManager` keeps track of the _consecutive_ 116 successful streams created to the remote peer. Everytime a stream is created successfully, the `UnicastManager` increments a counter for the remote peer. The counter is 117 reset to zero upon the _first failure_ to create a stream to the remote peer. If the counter reaches a certain threshold, the `UnicastManager` will reset the stream creation 118 budget for the remote peer to the default value. The threshold is configurable through the `config/default-config.yml` file: 119 ```yaml 120 # The minimum number of consecutive successful streams to reset the unicast stream creation retry budget from zero to the maximum default. If it is set to 100 for example, it 121 # means that if a peer has 100 consecutive successful streams to the remote peer, and the remote peer has a zero stream creation budget, 122 # the unicast stream creation retry budget for that remote peer will be reset to the maximum default. 123 unicast-stream-zero-retry-reset-threshold: 100 124 ``` 125 Reaching the threshold means that the remote peer is reliable enough to regain the default retry budget for stream creation.