# ADR 062: P2P Architecture and Abstractions

## Changelog

- 2020-11-09: Initial version (@erikgrinaker)

- 2020-11-13: Remove stream IDs, move peer errors onto channel, note on moving PEX into core (@erikgrinaker)

- 2020-11-16: Notes on recommended reactor implementation patterns, approve ADR (@erikgrinaker)

- 2021-02-04: Update with new P2P core and Transport API changes (@erikgrinaker).

## Context

In [ADR 061](adr-061-p2p-refactor-scope.md) we decided to refactor the peer-to-peer (P2P) networking stack. The first phase is to redesign and refactor the internal P2P architecture, while retaining protocol compatibility as far as possible.

## Alternative Approaches

Several variations of the proposed design were considered, including e.g. calling interface methods instead of passing messages (like the current architecture), merging channels with streams, exposing the internal peer data structure to reactors, being message format-agnostic via arbitrary codecs, and so on. This design was chosen because it has very loose coupling, is simpler to reason about and more convenient to use, avoids race conditions and lock contention for internal data structures, gives reactors better control of message ordering and processing semantics, and allows for QoS scheduling and backpressure in a very natural way.

[multiaddr](https://github.com/multiformats/multiaddr) was considered as a transport-agnostic peer address format over regular URLs, but it does not appear to have very widespread adoption, and advanced features like protocol encapsulation and tunneling do not appear to be immediately useful to us.

There were also proposals to use LibP2P instead of maintaining our own P2P stack, which were rejected (for now) in [ADR 061](adr-061-p2p-refactor-scope.md).

The initial version of this ADR had a byte-oriented multi-stream transport API, but this had to be abandoned/postponed to maintain backwards-compatibility with the existing MConnection protocol which is message-oriented. See the rejected RFC in [tendermint/spec#227](https://github.com/tendermint/spec/pull/227) for details.

## Decision

The P2P stack will be redesigned as a message-oriented architecture, primarily relying on Go channels for communication and scheduling. It will use a message-oriented transport to exchange binary messages with individual peers, bidirectional peer-addressable channels to send and receive Protobuf messages, a router to route messages between reactors and peers, and a peer manager to manage peer lifecycle information. Message passing is asynchronous with at-most-once delivery.

## Detailed Design

This ADR is primarily concerned with the architecture and interfaces of the P2P stack, not implementation details. The interfaces described here should therefore be considered a rough architecture outline, not a complete and final design.

Primary design objectives have been:

* Loose coupling between components, for a simpler, more robust, and test-friendly architecture.
* Pluggable transports (not necessarily networked).
* Better scheduling of messages, with improved prioritization, backpressure, and performance.
* Centralized peer lifecycle and connection management.
* Better peer address detection, advertisement, and exchange.
* Wire-level backwards compatibility with current P2P network protocols, except where it proves too obstructive.

The main abstractions in the new stack are:

* `Transport`: An arbitrary mechanism to exchange binary messages with a peer across a `Connection`.
* `Channel`: A bidirectional channel to asynchronously exchange Protobuf messages with peers using node ID addressing.
* `Router`: Maintains transport connections to relevant peers and routes channel messages.
* `PeerManager`: Manages peer lifecycle information, e.g. deciding which peers to dial and when, using a `peerStore` for storage.
* Reactor: A design pattern loosely defined as "something which listens on a channel and reacts to messages".

These abstractions are illustrated in the following diagram (representing the internals of node A) and described in detail below.

### Transports

Transports are arbitrary mechanisms for exchanging binary messages with a peer. For example, a gRPC transport would connect to a peer over TCP/IP and send data using the gRPC protocol, while an in-memory transport might communicate with a peer running in another goroutine using internal Go channels. Note that transports don't have a notion of a "peer" or "node" as such - instead, they establish connections between arbitrary endpoint addresses (e.g. IP address and port number), to decouple them from the rest of the P2P stack.

Transports must satisfy the following requirements:

* Be connection-oriented, and support both listening for inbound connections and making outbound connections using endpoint addresses.

* Support sending binary messages with distinct channel IDs (although channels and channel IDs are a higher-level application protocol concept explained in the Router section, they are threaded through the transport layer as well for backwards compatibility with the existing MConnection protocol).

* Exchange the MConnection `NodeInfo` and public key via a node handshake, and possibly encrypt or sign the traffic as appropriate.

The initial transport is a port of the MConnection protocol currently used by Tendermint, and should be backwards-compatible at the wire level. An in-memory transport for testing has also been implemented. There are plans to explore a QUIC transport that may replace the MConnection protocol.

The `Transport` interface is as follows:

```go
// Transport is a connection-oriented mechanism for exchanging data with a peer.
type Transport interface {
    // Protocols returns the protocols supported by the transport. The Router
    // uses this to pick a transport for an Endpoint.
    Protocols() []Protocol

    // Endpoints returns the local endpoints the transport is listening on, if any.
    // How to listen is transport-dependent, e.g. MConnTransport uses Listen() while
    // MemoryTransport starts listening via MemoryNetwork.CreateTransport().
    Endpoints() []Endpoint

    // Accept waits for the next inbound connection on a listening endpoint, blocking
    // until either a connection is available or the transport is closed. On closure,
    // io.EOF is returned and further Accept calls are futile.
    Accept() (Connection, error)

    // Dial creates an outbound connection to an endpoint.
    Dial(context.Context, Endpoint) (Connection, error)

    // Close stops accepting new connections, but does not close active connections.
    Close() error
}
```

How the transport configures listening is transport-dependent, and not covered by the interface. This typically happens during transport construction, where a single instance of the transport is created and set to listen on an appropriate network interface before being passed to the router.

#### Endpoints

`Endpoint` represents a transport endpoint (e.g. an IP address and port). A connection always has two endpoints: one at the local node and one at the remote peer. Outbound connections to remote endpoints are made via `Dial()`, and inbound connections to listening endpoints are returned via `Accept()`.

The `Endpoint` struct is:

```go
// Endpoint represents a transport connection endpoint, either local or remote.
//
// Endpoints are not necessarily networked (see e.g.
// MemoryTransport) but all
// networked endpoints must use IP as the underlying transport protocol to allow
// e.g. IP address filtering. Either IP or Path (or both) must be set.
type Endpoint struct {
    // Protocol specifies the transport protocol.
    Protocol Protocol

    // IP is an IP address (v4 or v6) to connect to. If set, this defines the
    // endpoint as a networked endpoint.
    IP net.IP

    // Port is a network port (either TCP or UDP). If 0, a default port may be
    // used depending on the protocol.
    Port uint16

    // Path is an optional transport-specific path or identifier.
    Path string
}

// Protocol identifies a transport protocol.
type Protocol string
```

Endpoints are arbitrary transport-specific addresses, but if they are networked they must use IP addresses and thus rely on IP as a fundamental packet routing protocol. This enables policies for address discovery, advertisement, and exchange - for example, a private `192.168.0.0/24` IP address should only be advertised to peers on that IP network, while the public address `8.8.8.8` may be advertised to all peers. Similarly, any port numbers if given must represent TCP and/or UDP port numbers, in order to use [UPnP](https://en.wikipedia.org/wiki/Universal_Plug_and_Play) to autoconfigure e.g. NAT gateways.

Non-networked endpoints (without an IP address) are considered local, and will only be advertised to other peers connecting via the same protocol. For example, the in-memory transport used for testing uses `Endpoint{Protocol: "memory", Path: "foo"}` as an address for the node "foo", and this should only be advertised to other nodes using `Protocol: "memory"`.

#### Connections

A connection represents an established transport connection between two endpoints (i.e. two nodes), which can be used to exchange binary messages with logical channel IDs (corresponding to the higher-level channel IDs used in the router).
Connections are set up either via `Transport.Dial()` (outbound) or `Transport.Accept()` (inbound).

Once a connection is established, `Connection.Handshake()` must be called to perform a node handshake, exchanging node info and public keys to verify node identities. Node handshakes should not really be part of the transport layer (they are an application protocol concern); this exists for backwards-compatibility with the existing MConnection protocol, which conflates the two. `NodeInfo` is part of the existing MConnection protocol, but does not appear to be documented in the specification -- refer to the Go codebase for details.

The `Connection` interface is shown below. It omits certain additions that are currently implemented for backwards compatibility with the legacy P2P stack and are planned to be removed before the final release.

```go
// Connection represents an established connection between two endpoints.
type Connection interface {
    // Handshake executes a node handshake with the remote peer. It must be
    // called once the connection is established, and returns the remote peer's
    // node info and public key. The caller is responsible for validation.
    Handshake(context.Context, NodeInfo, crypto.PrivKey) (NodeInfo, crypto.PubKey, error)

    // ReceiveMessage returns the next message received on the connection,
    // blocking until one is available. Returns io.EOF if closed.
    ReceiveMessage() (ChannelID, []byte, error)

    // SendMessage sends a message on the connection. Returns io.EOF if closed.
    SendMessage(ChannelID, []byte) error

    // LocalEndpoint returns the local endpoint for the connection.
    LocalEndpoint() Endpoint

    // RemoteEndpoint returns the remote endpoint for the connection.
    RemoteEndpoint() Endpoint

    // Close closes the connection.
    Close() error
}
```

This ADR initially proposed a byte-oriented multi-stream connection API that follows more typical networking API conventions (using e.g. `io.Reader` and `io.Writer` interfaces which easily compose with other libraries). This would also allow moving the responsibility for message framing, node handshakes, and traffic scheduling to the common router instead of reimplementing this across transports, and would allow making better use of multi-stream protocols such as QUIC. However, this would require minor breaking changes to the MConnection protocol, which were rejected; see [tendermint/spec#227](https://github.com/tendermint/spec/pull/227) for details. This should be revisited when starting work on a QUIC transport.

### Peer Management

Peers are other Tendermint nodes. Each peer is identified by a unique `NodeID` (tied to the node's private key).

#### Peer Addresses

Nodes have one or more `NodeAddress` addresses expressed as URLs that they can be reached at. Examples of node addresses might be e.g.:

* `mconn://nodeid@host.domain.com:25567/path`
* `memory:nodeid`

Addresses are resolved into one or more transport endpoints, e.g. by resolving DNS hostnames into IP addresses. Peers should always be expressed as address URLs rather than endpoints (which are a lower-level transport construct).

```go
// NodeID is a hex-encoded crypto.Address. It must be lowercased
// (for uniqueness) and of length 40.
type NodeID string

// NodeAddress is a node address URL. It differs from a transport Endpoint in
// that it contains the node's ID, and that the address hostname may be resolved
// into multiple IP addresses (and thus multiple endpoints).
//
// If the URL is opaque, i.e. of the form "scheme:opaque", then the opaque part
// is expected to contain a node ID.
type NodeAddress struct {
    NodeID   NodeID
    Protocol Protocol
    Hostname string
    Port     uint16
    Path     string
}

// ParseNodeAddress parses a node address URL into a NodeAddress, normalizing
// and validating it.
func ParseNodeAddress(urlString string) (NodeAddress, error)

// Resolve resolves a NodeAddress into a set of Endpoints, e.g. by expanding
// out a DNS hostname to IP addresses.
func (a NodeAddress) Resolve(ctx context.Context) ([]Endpoint, error)
```

#### Peer Manager

The P2P stack needs to track a lot of internal state about peers, such as their addresses, connection state, priorities, availability, failures, retries, and so on. This responsibility has been separated out to a `PeerManager`, which tracks this state for the `Router` (but does not maintain the actual transport connections themselves, which is the router's responsibility).

The `PeerManager` is a synchronous state machine, where all state transitions are serialized (implemented as synchronous method calls holding an exclusive mutex lock). Most peer state is intentionally kept internal, stored in a `peerStore` database that persists it as appropriate, and the external interfaces pass the minimum amount of information necessary in order to avoid shared state between router goroutines. This design significantly simplifies the model, making it much easier to reason about and test than if it was baked into the asynchronous ball of concurrency that the P2P networking core must necessarily be. As peer lifecycle events are expected to be relatively infrequent, this should not significantly impact performance either.

The `Router` uses the `PeerManager` to request which peers to dial and evict, and reports in with peer lifecycle events such as connections, disconnections, and failures as they occur. The manager can reject these events (e.g. reject an inbound connection) by returning errors.
This happens as follows:

* Outbound connections, via `Transport.Dial`:
    * `DialNext()`: returns a peer address to dial, or blocks until one is available.
    * `DialFailed()`: reports a peer dial failure.
    * `Dialed()`: reports a peer dial success.
    * `Ready()`: reports the peer as routed and ready.
    * `Disconnected()`: reports a peer disconnection.

* Inbound connections, via `Transport.Accept`:
    * `Accepted()`: reports an inbound peer connection.
    * `Ready()`: reports the peer as routed and ready.
    * `Disconnected()`: reports a peer disconnection.

* Evictions, via `Connection.Close`:
    * `EvictNext()`: returns a peer to disconnect, or blocks until one is available.
    * `Disconnected()`: reports a peer disconnection.

These calls have the following interface:

```go
// DialNext returns a peer address to dial, blocking until one is available.
func (m *PeerManager) DialNext(ctx context.Context) (NodeAddress, error)

// DialFailed reports a dial failure for the given address.
func (m *PeerManager) DialFailed(address NodeAddress) error

// Dialed reports a successful outbound connection to the given address.
func (m *PeerManager) Dialed(address NodeAddress) error

// Accepted reports a successful inbound connection from the given node.
func (m *PeerManager) Accepted(peerID NodeID) error

// Ready reports the peer as fully routed and ready for use.
func (m *PeerManager) Ready(peerID NodeID) error

// EvictNext returns a peer ID to disconnect, blocking until one is available.
func (m *PeerManager) EvictNext(ctx context.Context) (NodeID, error)

// Disconnected reports a peer disconnection.
func (m *PeerManager) Disconnected(peerID NodeID) error
```

Internally, the `PeerManager` uses a numeric peer score to prioritize peers, e.g. when deciding which peers to dial next.
The scoring policy has not yet been implemented, but should take into account e.g. node configuration such as `persistent_peers`, uptime and connection failures, performance, and so on. The manager will also attempt to automatically upgrade to better-scored peers by evicting lower-scored peers when a better one becomes available (e.g. when a persistent peer comes back online after an outage).

The `PeerManager` should also have an API for reporting peer behavior from reactors that affects its score (e.g. signing a block increases the score, double-voting decreases it or even bans the peer), but this has not yet been designed and implemented.

Additionally, the `PeerManager` provides `PeerUpdates` subscriptions that will receive `PeerUpdate` events whenever significant peer state changes happen. Reactors can use these e.g. to know when peers are connected or disconnected, and take appropriate action. This is currently fairly minimal:

```go
// Subscribe subscribes to peer updates. The caller must consume the peer updates
// in a timely fashion and close the subscription when done, to avoid stalling the
// PeerManager as delivery is semi-synchronous, guaranteed, and ordered.
func (m *PeerManager) Subscribe() *PeerUpdates

// PeerUpdate is a peer update event sent via PeerUpdates.
type PeerUpdate struct {
    NodeID NodeID
    Status PeerStatus
}

// PeerStatus is a peer status.
type PeerStatus string

const (
    PeerStatusUp   PeerStatus = "up"   // Connected and ready.
    PeerStatusDown PeerStatus = "down" // Disconnected.
)

// PeerUpdates is a real-time peer update subscription.
type PeerUpdates struct { ... }

// Updates returns a channel for consuming peer updates.
func (pu *PeerUpdates) Updates() <-chan PeerUpdate

// Close closes the peer updates subscription.
func (pu *PeerUpdates) Close()
```

The `PeerManager` will also be responsible for providing peer information to the PEX reactor that can be gossiped to other nodes. This requires an improved system for peer address detection and advertisement, that e.g. reliably detects peer and self addresses and only gossips private network addresses to other peers on the same network, but this system has not yet been fully designed and implemented.

### Channels

While low-level data exchange happens via the `Transport`, the high-level API is based on a bidirectional `Channel` that can send and receive Protobuf messages addressed by `NodeID`. A channel is identified by an arbitrary `ChannelID` identifier, and can exchange Protobuf messages of one specific type (since the type to unmarshal into must be predefined). Message delivery is asynchronous and at-most-once.

The channel can also be used to report peer errors, e.g. when receiving an invalid or malicious message. This may cause the peer to be disconnected or banned depending on `PeerManager` policy, but should probably be replaced by a broader peer behavior API that can also report good behavior.

A `Channel` has this interface:

```go
// ChannelID is an arbitrary channel ID.
type ChannelID uint16

// Channel is a bidirectional channel to exchange Protobuf messages with peers.
type Channel struct {
    ID          ChannelID        // Channel ID.
    In          <-chan Envelope  // Inbound messages (peers to reactors).
    Out         chan<- Envelope  // Outbound messages (reactors to peers).
    Error       chan<- PeerError // Peer error reporting.
    messageType proto.Message    // Channel's message type, for e.g. unmarshaling.
}

// Close closes the channel, also closing Out and Error.
func (c *Channel) Close() error

// Envelope specifies the message receiver and sender.
type Envelope struct {
    From      NodeID        // Sender (empty if outbound).
    To        NodeID        // Receiver (empty if inbound).
    Broadcast bool          // Send to all connected peers, ignoring To.
    Message   proto.Message // Message payload.
}

// PeerError is a peer error reported via the Error channel.
type PeerError struct {
    NodeID NodeID
    Err    error
}
```

A channel can reach any connected peer, and will automatically (un)marshal the Protobuf messages. Message scheduling and queueing is a `Router` implementation concern, and can use any number of algorithms such as FIFO, round-robin, priority queues, etc. Since message delivery is not guaranteed, both inbound and outbound messages may be dropped, buffered, reordered, or blocked as appropriate.

Since a channel can only exchange messages of a single type, it is often useful to use a wrapper message type with e.g. a Protobuf `oneof` field that specifies a set of inner message types that it can contain. The channel can automatically perform this (un)wrapping if the outer message type implements the `Wrapper` interface (see [Reactor Example](#reactor-example) for an example):

```go
// Wrapper is a Protobuf message that can contain a variety of inner messages.
// If a Channel's message type implements Wrapper, the channel will
// automatically (un)wrap passed messages using the container type, such that
// the channel can transparently support multiple message types.
type Wrapper interface {
    proto.Message

    // Wrap will take a message and wrap it in this one.
    Wrap(proto.Message) error

    // Unwrap will unwrap the inner message contained in this message.
    Unwrap() (proto.Message, error)
}
```

### Routers

The router executes P2P networking for a node, taking instructions from and reporting events to the `PeerManager`, maintaining transport connections to peers, and routing messages between channels and peers.
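
As a rough illustration of the "routing messages between channels and peers" responsibility, the sketch below dispatches an outbound envelope to per-peer queues, honoring `To` and `Broadcast`. It is a simplified assumption rather than the router's actual code: `routeOutbound` is a hypothetical helper, the payload is a plain string instead of a `proto.Message`, and plain buffered Go channels stand in for the per-peer queues. The non-blocking send that drops on a full queue is also only a stand-in for a queueing policy; in this ADR the router's sends are blocking and load shedding is left to the queue.

```go
package main

import "fmt"

// NodeID mirrors the node identifier type used in this ADR.
type NodeID string

// Envelope is a simplified envelope with a string payload (an assumption made
// to keep the sketch free of Protobuf dependencies).
type Envelope struct {
	From      NodeID
	To        NodeID
	Broadcast bool
	Message   string
}

// routeOutbound submits an outbound envelope to the queue of the addressed
// peer, or to the queues of all connected peers if Broadcast is set. It
// returns the number of peers the message was queued for.
func routeOutbound(peerQueues map[NodeID]chan Envelope, e Envelope) int {
	var targets []NodeID
	if e.Broadcast {
		for id := range peerQueues {
			targets = append(targets, id)
		}
	} else {
		targets = append(targets, e.To)
	}

	queued := 0
	for _, id := range targets {
		q, ok := peerQueues[id]
		if !ok {
			continue // Peer is not connected: drop (at-most-once delivery).
		}
		select {
		case q <- Envelope{From: e.From, To: id, Message: e.Message}:
			queued++
		default:
			// Queue is full: drop instead of stalling other peers.
		}
	}
	return queued
}

func main() {
	queues := map[NodeID]chan Envelope{
		"peer-a": make(chan Envelope, 4),
		"peer-b": make(chan Envelope, 4),
	}
	n := routeOutbound(queues, Envelope{Broadcast: true, Message: "ping"})
	fmt.Println("queued for", n, "peers")
}
```

Expanding a broadcast into one envelope per peer keeps each per-peer queue independent, so a slow peer only affects its own queue.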

Practically all concurrency in the P2P stack has been moved into the router and reactors, while as many other responsibilities as possible have been moved into separate components such as the `Transport` and `PeerManager` that can remain largely synchronous. Limiting concurrency to a single core component makes it much easier to reason about since there is only a single concurrency structure, while the remaining components can be serial, simple, and easily testable.

The `Router` has a very minimal API, since it is mostly driven by `PeerManager` and `Transport` events:

```go
// Router maintains peer transport connections and routes messages between
// peers and channels.
type Router struct {
    // Some details have been omitted below.

    logger      log.Logger
    options     RouterOptions
    nodeInfo    NodeInfo
    privKey     crypto.PrivKey
    peerManager *PeerManager
    transports  []Transport

    peerMtx    sync.RWMutex
    peerQueues map[NodeID]queue

    channelMtx    sync.RWMutex
    channelQueues map[ChannelID]queue
}

// OpenChannel opens a new channel for the given message type. The caller must
// close the channel when done, before stopping the Router. messageType is the
// type of message passed through the channel.
func (r *Router) OpenChannel(id ChannelID, messageType proto.Message) (*Channel, error)

// Start starts the router, connecting to peers and routing messages.
func (r *Router) Start() error

// Stop stops the router, disconnecting from all peers and stopping message routing.
func (r *Router) Stop() error
```

All Go channel sends in the `Router` and reactors are blocking (the router also selects on signal channels for closure and shutdown). The responsibility for message scheduling, prioritization, backpressure, and load shedding is centralized in a core `queue` interface that is used at contention points (i.e.
from all peers to a single channel, and from all channels to a single peer):

```go
// queue does QoS scheduling for Envelopes, enqueueing and dequeueing according
// to some policy. Queues are used at contention points, i.e.:
// - Receiving inbound messages to a single channel from all peers.
// - Sending outbound messages to a single peer from all channels.
type queue interface {
    // enqueue returns a channel for submitting envelopes.
    enqueue() chan<- Envelope

    // dequeue returns a channel ordered according to some queueing policy.
    dequeue() <-chan Envelope

    // close closes the queue. After this call enqueue() will block, so the
    // caller must select on closed() as well to avoid blocking forever. The
    // enqueue() and dequeue() channels will not be closed.
    close()

    // closed returns a channel that's closed when the scheduler is closed.
    closed() <-chan struct{}
}
```

The current implementation is `fifoQueue`, which is a simple unbuffered lossless queue that passes messages in the order they were received and blocks until the message is delivered (i.e. it is a Go channel). The router will need a more sophisticated queueing policy, but this has not yet been implemented.

The internal `Router` goroutine structure and design is described in the `Router` GoDoc, which is included below for reference:

```go
// On startup, three main goroutines are spawned to maintain peer connections:
//
//   dialPeers(): in a loop, calls PeerManager.DialNext() to get the next peer
//   address to dial and spawns a goroutine that dials the peer, handshakes
//   with it, and begins to route messages if successful.
//
//   acceptPeers(): in a loop, waits for an inbound connection via
//   Transport.Accept() and spawns a goroutine that handshakes with it and
//   begins to route messages if successful.
//
//   evictPeers(): in a loop, calls PeerManager.EvictNext() to get the next
//   peer to evict, and disconnects it by closing its message queue.
//
// When a peer is connected, an outbound peer message queue is registered in
// peerQueues, and routePeer() is called to spawn off two additional goroutines:
//
//   sendPeer(): waits for an outbound message from the peerQueues queue,
//   marshals it, and passes it to the peer transport which delivers it.
//
//   receivePeer(): waits for an inbound message from the peer transport,
//   unmarshals it, and passes it to the appropriate inbound channel queue
//   in channelQueues.
//
// When a reactor opens a channel via OpenChannel, an inbound channel message
// queue is registered in channelQueues, and a channel goroutine is spawned:
//
//   routeChannel(): waits for an outbound message from the channel, looks
//   up the recipient peer's outbound message queue in peerQueues, and submits
//   the message to it.
//
// All channel sends in the router are blocking. It is the responsibility of the
// queue interface in peerQueues and channelQueues to prioritize and drop
// messages as appropriate during contention to prevent stalls and ensure good
// quality of service.
```

### Reactor Example

While reactors are a first-class concept in the current P2P stack (i.e. there is an explicit `p2p.Reactor` interface), they will simply be a design pattern in the new stack, loosely defined as "something which listens on a channel and reacts to messages".

Since reactors have very few formal constraints, they can be implemented in a variety of ways. There is currently no recommended pattern for implementing reactors, to avoid overspecification and scope creep in this ADR.
However, prototyping and developing a reactor pattern should be done early during implementation, to make sure reactors built using the `Channel` interface can satisfy the needs for convenience, deterministic tests, and reliability.

Below is a trivial example of a simple echo reactor implemented as a function. The reactor will exchange the following Protobuf messages:

```protobuf
message EchoMessage {
    oneof inner {
        PingMessage ping = 1;
        PongMessage pong = 2;
    }
}

message PingMessage {
    string content = 1;
}

message PongMessage {
    string content = 1;
}
```

Implementing the `Wrapper` interface for `EchoMessage` allows transparently passing `PingMessage` and `PongMessage` through the channel, where it will automatically be (un)wrapped in an `EchoMessage`:

```go
// Wrap stores a ping or pong message in the oneof field. The generated oneof
// wrapper types are named after the field (EchoMessage_Ping, EchoMessage_Pong).
func (m *EchoMessage) Wrap(inner proto.Message) error {
    switch inner := inner.(type) {
    case *PingMessage:
        m.Inner = &EchoMessage_Ping{Ping: inner}
    case *PongMessage:
        m.Inner = &EchoMessage_Pong{Pong: inner}
    default:
        return fmt.Errorf("unknown message %T", inner)
    }
    return nil
}

// Unwrap returns the ping or pong message stored in the oneof field.
func (m *EchoMessage) Unwrap() (proto.Message, error) {
    switch inner := m.Inner.(type) {
    case *EchoMessage_Ping:
        return inner.Ping, nil
    case *EchoMessage_Pong:
        return inner.Pong, nil
    default:
        return nil, fmt.Errorf("unknown message %T", inner)
    }
}
```

The reactor itself would be implemented e.g. like this:

```go
// RunEchoReactor wires up an echo reactor to a router and runs it.
func RunEchoReactor(router *p2p.Router, peerManager *p2p.PeerManager) error {
    channel, err := router.OpenChannel(1, &EchoMessage{})
    if err != nil {
        return err
    }
    defer channel.Close()
    peerUpdates := peerManager.Subscribe()
    defer peerUpdates.Close()

    return EchoReactor(context.Background(), channel, peerUpdates)
}

// EchoReactor provides an echo service, pinging all known peers until the given
// context is canceled.
func EchoReactor(ctx context.Context, channel *p2p.Channel, peerUpdates *p2p.PeerUpdates) error {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    for {
        select {
        // Send ping message to all known peers every 5 seconds.
        case <-ticker.C:
            channel.Out <- p2p.Envelope{
                Broadcast: true,
                Message:   &PingMessage{Content: "👋"},
            }

        // When we receive a message from a peer, either respond to ping, output
        // pong, or report peer error on unknown message type.
        case envelope := <-channel.In:
            switch msg := envelope.Message.(type) {
            case *PingMessage:
                channel.Out <- p2p.Envelope{
                    To:      envelope.From,
                    Message: &PongMessage{Content: msg.Content},
                }

            case *PongMessage:
                fmt.Printf("%q replied with %q\n", envelope.From, msg.Content)

            default:
                channel.Error <- p2p.PeerError{
                    NodeID: envelope.From,
                    Err:    fmt.Errorf("unexpected message %T", msg),
                }
            }

        // Output info about any peer status changes.
        case peerUpdate := <-peerUpdates.Updates():
            fmt.Printf("Peer %q changed status to %q\n", peerUpdate.NodeID, peerUpdate.Status)

        // Exit when context is canceled.
        case <-ctx.Done():
            return nil
        }
    }
}
```

## Status

Partially implemented ([#5670](https://github.com/number571/tendermint/issues/5670))

## Consequences

### Positive

* Reduced coupling and simplified interfaces should lead to better understandability, increased reliability, and more testing.

* Using message passing via Go channels gives better control of backpressure and quality-of-service scheduling.

* Peer lifecycle and connection management is centralized in a single entity, making it easier to reason about.

* Detection, advertisement, and exchange of node addresses will be improved.

* Additional transports (e.g. QUIC) can be implemented and used in parallel with the existing MConn protocol.

* The P2P protocol will not be broken in the initial version, if possible.

### Negative

* Fully implementing the new design as intended is likely to require breaking changes to the P2P protocol at some point, although the initial implementation shouldn't.

* Gradually migrating the existing stack and maintaining backwards-compatibility will be more labor-intensive than simply replacing the entire stack.

* A complete overhaul of P2P internals is likely to cause temporary performance regressions and bugs as the implementation matures.

* Hiding peer management information inside the `PeerManager` may prevent certain functionality or require additional deliberate interfaces for information exchange, as a tradeoff to simplify the design, reduce coupling, and avoid race conditions and lock contention.

### Neutral

* Implementation details around e.g. peer management, message scheduling, and peer and endpoint advertisement are not yet determined.

## References

* [ADR 061: P2P Refactor Scope](adr-061-p2p-refactor-scope.md)
* [#5670 p2p: internal refactor and architecture redesign](https://github.com/number571/tendermint/issues/5670)