=====================
RFC 005: Event System
=====================

Changelog
---------

- 2021-09-17: Initial Draft (@tychoish)

Abstract
--------

The event system within Tendermint, which supports a lot of core
functionality, also represents a major infrastructural liability. As part of
our upcoming review of the RPC interfaces and our ongoing thoughts about
stability and performance, as well as the preparation for Tendermint 1.0, we
should revisit the design and implementation of the event system. This
document discusses both the current state of the system and potential
directions for future improvement.

Background
----------

Current State of Events
~~~~~~~~~~~~~~~~~~~~~~~

The event system makes it possible for clients, both internal and external,
to receive notifications of state replication events, such as new blocks,
new transactions, and validator set changes, as well as intermediate events
during consensus. Because the event system is cross-cutting, the behavior and
performance of the event publication and subscription system have a large
impact on all of Tendermint.

The subscription service is exposed over the RPC interface, but also powers
the indexing (e.g. to an external database), and is the mechanism by which
``BroadcastTxCommit`` is able to wait for transactions to land in a block.

The current pubsub mechanism relies on a couple of buffered channels,
primarily between all event creators and subscribers, but also for each
subscription. The result of this design is that, in some situations, with the
right collection of slow subscription consumers, the event system can put
backpressure on the consensus state machine and message gossiping in the
network, thereby causing nodes to lag.

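
As a rough illustration of this failure mode (not the actual Tendermint
pubsub code), the following Go sketch publishes to a bounded subscriber
channel that nobody drains. The names ``publish`` and ``stalledSends`` and the
10ms timeout are assumptions made for the example. Once the buffer is full,
every further send has to wait on the consumer. The timeout exists only so
that the example terminates; a publisher without it would stay blocked.

```go
package main

import (
	"fmt"
	"time"
)

// publish tries to send n events into out, giving up on an individual send
// after timeout. It returns the number of sends that stalled for the full
// timeout because the subscriber's buffer was full.
// (Hypothetical helper, not part of Tendermint.)
func publish(out chan<- int, n int, timeout time.Duration) int {
	stalled := 0
	for i := 0; i < n; i++ {
		select {
		case out <- i:
			// Buffer had room: the publisher continues immediately.
		case <-time.After(timeout):
			// Buffer full and nobody reading: the publisher was held up.
			stalled++
		}
	}
	return stalled
}

// stalledSends publishes n events to a subscriber channel with the given
// buffer capacity that is never read from, and reports how many sends stalled.
func stalledSends(capacity, n int) int {
	sub := make(chan int, capacity) // a subscriber that never consumes
	return publish(sub, n, 10*time.Millisecond)
}

func main() {
	// Two events fit in the buffer; the remaining three stall the publisher.
	fmt.Println("stalled sends:", stalledSends(2, 5)) // stalled sends: 3
}
```

In this sketch the stall is confined to one loop. When the publisher is on the
path of consensus or gossip, the same wait is what lets a slow consumer delay
the rest of the node.
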

Improvements
~~~~~~~~~~~~

The current system relies on implicit, bounded queues built by the buffered
channels, and though threadsafe, can force all activity within Tendermint to
serialize, which does not need to happen. Additionally, timeouts for
subscription consumers related to the implementation of the RPC layer may
complicate the use of the system.

References
~~~~~~~~~~

- Legacy Implementation

  - `publication of events <https://github.com/tendermint/tendermint/blob/v0.37.x/libs/pubsub/pubsub.go#L333-L345>`_
  - `send operation <https://github.com/tendermint/tendermint/blob/v0.37.x/libs/pubsub/pubsub.go#L489-L527>`_
  - `send loop <https://github.com/tendermint/tendermint/blob/v0.37.x/libs/pubsub/pubsub.go#L381-L402>`_

- Related RFCs

  - `RFC 002: IPC Ecosystem <./rfc-002-ipc-ecosystem.md>`_
  - `RFC 003: Performance Questions <./rfc-003-performance-questions.md>`_

Discussion
----------

Changes to Published Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of this process, the Tendermint team should study the existing
event types and ensure that there are viable production use cases for
subscriptions to all event types. Instinctively it seems plausible that some
of the events may not be usable outside of Tendermint (e.g. ``TimeoutWait``
or ``NewRoundStep``), and it might make sense to remove them. Certainly, it
would be good to make sure that we don't maintain infrastructure for unused or
un-useful messages indefinitely.

Blocking Subscription
~~~~~~~~~~~~~~~~~~~~~

The blocking subscription mechanism makes it possible to have *send*
operations into the subscription channel be un-buffered (the event processing
channel is still buffered). In the blocking case, events from one subscription
can block processing that event for other non-blocking subscriptions.
The main case for blocking subscriptions, it seems, is ensuring that a
transaction has been committed to a block for ``BroadcastTxCommit``. Removing
blocking subscriptions entirely, and potentially finding another way to
implement ``BroadcastTxCommit``, could lead to important simplifications and
improvements to throughput without requiring large changes.

Subscription Identification
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before `#6386 <https://github.com/tendermint/tendermint/pull/6386>`_, all
subscriptions were identified by the combination of a client ID and a query.
With that change, it became possible to identify every subscription given
only an ID, but compatibility with the legacy identification means that there
is a good deal of legacy code, as well as client-side efficiency, that could
be improved.

Pubsub Changes
~~~~~~~~~~~~~~

The pubsub core should be implemented in a way that removes the possibility of
backpressure from the event system impacting the core system *or* of one
subscription impacting the behavior of another area of the system.
Additionally, because the current system is implemented entirely in terms of a
collection of buffered channels, the event system (and large numbers of
subscriptions) can be a source of memory pressure.

These changes could include:

- explicit cancellation and timeouts propagated from callers (e.g. RPC
  endpoints), which should be done using contexts.

- a subscription system that is able to spill to disk to avoid putting memory
  pressure on the core behavior of the node (consensus, gossip).

- subscriptions implemented as cursors rather than channels, with either
  condition variables to simulate the existing "push" API or a client-side
  iterator API with some kind of long-polling-type interface.