=====================
RFC 005: Event System
=====================

Changelog
---------

- 2021-09-17: Initial Draft (@tychoish)

Abstract
--------

The event system within Tendermint supports a great deal of core
functionality, but it also represents a major infrastructural liability. As
part of our upcoming review of the RPC interfaces, our ongoing work on
stability and performance, and the preparation for Tendermint 1.0, we should
revisit the design and implementation of the event system. This document
discusses both the current state of the system and potential directions for
future improvement.

Background
----------

Current State of Events
~~~~~~~~~~~~~~~~~~~~~~~

The event system makes it possible for clients, both internal and external,
to receive notifications of state replication events, such as new blocks,
new transactions, and validator set changes, as well as intermediate events
during consensus. Because the event system is so cross-cutting, the behavior
and performance of event publication and subscription have a large impact on
all of Tendermint.

The subscription service is exposed over the RPC interface, but it also powers
indexing (e.g. to an external database) and is the mechanism by which
``BroadcastTxCommit`` waits for transactions to land in a block.

The current pubsub mechanism relies on a couple of buffered channels:
primarily one between all event creators and the subscribers, but also one
for each subscription. As a result, given the right collection of slow
subscription consumers, the event system can put backpressure on the
consensus state machine and on message gossiping in the network, thereby
causing nodes to lag.
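
The effect can be reproduced in a few lines. The sketch below is not the
Tendermint implementation; it is a minimal illustration, under simplifying
assumptions, of how a blocking send into a bounded channel lets one idle
consumer hold up a publisher. ``Event``, ``publishBlocking``, and ``stalled``
are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical stand-in for a published event.
type Event struct{ Height int }

// publishBlocking delivers ev to every subscriber with a plain channel send.
// It returns only after each subscriber has accepted the event, so one full
// buffer is enough to hold up the caller.
func publishBlocking(subs []chan Event, ev Event) {
	for _, ch := range subs {
		ch <- ev
	}
}

// stalled reports whether publishing n events to subs fails to finish within
// limit. When it returns true the publishing goroutine is left blocked,
// which is acceptable for a short demonstration.
func stalled(subs []chan Event, n int, limit time.Duration) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for h := 1; h <= n; h++ {
			publishBlocking(subs, Event{Height: h})
		}
	}()
	select {
	case <-done:
		return false
	case <-time.After(limit):
		return true
	}
}

func main() {
	fast := make(chan Event, 4)
	slow := make(chan Event, 1) // hypothetical consumer that never reads
	go func() {
		for range fast {
		}
	}()
	// The second event cannot enter slow's full buffer, so the publisher
	// stops making progress even though fast is keeping up.
	fmt.Println("stalled:", stalled([]chan Event{fast, slow}, 3, 100*time.Millisecond))
}
```

In this sketch the publisher plays the role of the consensus state machine:
it is the caller of the send, so it inherits the latency of the slowest
subscriber.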

Improvements
~~~~~~~~~~~~

The current system relies on implicit, bounded queues built from buffered
channels. Although it is thread-safe, it can force all activity within
Tendermint to serialize, which does not need to happen. Additionally,
timeouts for subscription consumers, which stem from the implementation of
the RPC layer, may complicate the use of the system.

References
~~~~~~~~~~

- Legacy Implementation

  - `publication of events <https://github.com/tendermint/tendermint/blob/v0.37.x/libs/pubsub/pubsub.go#L333-L345>`_
  - `send operation <https://github.com/tendermint/tendermint/blob/v0.37.x/libs/pubsub/pubsub.go#L489-L527>`_
  - `send loop <https://github.com/tendermint/tendermint/blob/v0.37.x/libs/pubsub/pubsub.go#L381-L402>`_

- Related RFCs

  - `RFC 002: IPC Ecosystem <./rfc-002-ipc-ecosystem.md>`_
  - `RFC 003: Performance Questions <./rfc-003-performance-questions.md>`_

Discussion
----------

Changes to Published Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of this process, the Tendermint team should study the existing event
types and ensure that there are viable production use cases for subscriptions
to all of them. Instinctively, it seems plausible that some of the events
(e.g. ``TimeoutWait`` or ``NewRoundStep``) are not usable outside of
Tendermint, and it might make sense to remove them. At a minimum, we should
make sure that we don't maintain infrastructure for unused or unhelpful
messages indefinitely.

Blocking Subscription
~~~~~~~~~~~~~~~~~~~~~

The blocking subscription mechanism makes it possible for *send* operations
into the subscription channel to be unbuffered (the event processing channel
is still buffered). In the blocking case, delivering an event to one
subscription can block processing of that event for other, non-blocking
subscriptions. The main use case for blocking subscriptions appears to be
ensuring that a transaction has been committed to a block for
``BroadcastTxCommit``. Removing blocking subscriptions entirely, and
potentially finding another way to implement ``BroadcastTxCommit``, could
lead to important simplifications and improvements to throughput without
requiring large changes.
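
For contrast, here is a minimal sketch of a publisher that never waits on a
subscriber. It illustrates the non-blocking style only and is not a proposal
for the final design. The ``subscription`` type and ``publishNonBlocking``
are hypothetical, and canceling a subscription whose buffer is full is one
possible policy, assumed here for the example.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a published event.
type Event struct{ Height int }

// subscription is a hypothetical subscriber record.
type subscription struct {
	out      chan Event
	canceled bool
}

// publishNonBlocking offers ev to every live subscription without waiting.
// A subscription whose buffer is full is canceled and its channel closed,
// so a slow consumer loses its own feed instead of delaying the others.
// It returns the number of subscriptions that accepted the event.
func publishNonBlocking(subs []*subscription, ev Event) int {
	delivered := 0
	for _, s := range subs {
		if s.canceled {
			continue
		}
		select {
		case s.out <- ev:
			delivered++
		default:
			s.canceled = true
			close(s.out)
		}
	}
	return delivered
}

func main() {
	keepingUp := &subscription{out: make(chan Event, 2)}
	lagging := &subscription{out: make(chan Event, 1)}
	subs := []*subscription{keepingUp, lagging}

	fmt.Println(publishNonBlocking(subs, Event{Height: 1}))
	fmt.Println(publishNonBlocking(subs, Event{Height: 2}))
	fmt.Println(lagging.canceled)
}
```

A publisher of this shape cannot guarantee delivery, which is why a caller
with the needs of ``BroadcastTxCommit`` would require some other way to learn
that its transaction was committed.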

Subscription Identification
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before `#6386 <https://github.com/tendermint/tendermint/pull/6386>`_, all
subscriptions were identified by the combination of a client ID and a query.
With that change, it became possible to identify any subscription given only
an ID, but compatibility with the legacy identification scheme means that
there is a good deal of legacy code, and client-side efficiency, that could
be improved.

Pubsub Changes
~~~~~~~~~~~~~~

The pubsub core should be implemented in a way that removes the possibility
of backpressure from the event system affecting the core system *or* of one
subscription affecting the behavior of another area of the system.
Additionally, because the current system is implemented entirely in terms of
a collection of buffered channels, the event system (and large numbers of
subscriptions) can be a source of memory pressure.

These changes could include:

- explicit cancellation and timeouts propagated from callers (e.g. RPC
  endpoints), implemented using contexts.

- a subscription system that can spill to disk, to avoid putting memory
  pressure on the core behavior of the node (consensus, gossip).

- subscriptions implemented as cursors rather than channels, with either
  condition variables to simulate the existing "push" API or a client-side
  iterator API with some kind of long-polling interface.
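
To make the first and third items concrete, the sketch below shows one way a
cursor-based subscription might look. It is an illustration under stated
assumptions rather than a design: ``Log``, ``Publish``, ``Next``, and
``ErrLagged`` are hypothetical names, the log is kept in memory with a fixed
retention limit, and old events are evicted instead of spilled to disk. The
client holds its cursor as a plain sequence number, waiting is done with a
condition variable, and cancellation comes from the caller's context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Event is a hypothetical stand-in for a published event.
type Event struct{ Height int }

// ErrLagged is a hypothetical error for a cursor whose event was evicted.
var ErrLagged = errors.New("cursor fell behind the retained events")

// Log is a hypothetical bounded, in-memory event log shared by all readers.
type Log struct {
	mu    sync.Mutex
	cond  *sync.Cond
	buf   []Event // retained events, oldest first
	first int     // sequence number of buf[0]
	limit int     // maximum number of retained events
}

// NewLog returns an empty log that retains at most limit events.
func NewLog(limit int) *Log {
	l := &Log{limit: limit}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Publish appends ev, evicting the oldest event when the log is full.
// It never waits for readers.
func (l *Log) Publish(ev Event) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if len(l.buf) == l.limit {
		l.buf = l.buf[1:]
		l.first++
	}
	l.buf = append(l.buf, ev)
	l.cond.Broadcast()
}

// Next blocks until the event with sequence number seq is published or ctx
// is done, and returns it with the sequence number to ask for next. If seq
// was already evicted, it returns ErrLagged and the oldest retained number.
func (l *Log) Next(ctx context.Context, seq int) (Event, int, error) {
	done := make(chan struct{})
	defer close(done)
	go func() { // wake the waiter below when ctx ends
		select {
		case <-ctx.Done():
			l.mu.Lock()
			l.cond.Broadcast()
			l.mu.Unlock()
		case <-done:
		}
	}()

	l.mu.Lock()
	defer l.mu.Unlock()
	for seq >= l.first+len(l.buf) {
		if err := ctx.Err(); err != nil {
			return Event{}, seq, err
		}
		l.cond.Wait()
	}
	if seq < l.first {
		return Event{}, l.first, ErrLagged
	}
	return l.buf[seq-l.first], seq + 1, nil
}

// demo publishes three events into a two-slot log, then reads from sequence 0.
func demo() (error, int, error) {
	log := NewLog(2)
	for h := 1; h <= 3; h++ {
		log.Publish(Event{Height: h}) // no reader exists, and nothing blocks
	}
	bg := context.Background()
	_, seq, lagErr := log.Next(bg, 0) // sequence 0 was evicted
	ev, seq, _ := log.Next(bg, seq)   // oldest retained event
	_, seq, _ = log.Next(bg, seq)     // newest event
	ctx, cancel := context.WithTimeout(bg, 50*time.Millisecond)
	defer cancel()
	_, _, waitErr := log.Next(ctx, seq) // nothing newer: waits, then times out
	return lagErr, ev.Height, waitErr
}

func main() {
	fmt.Println(demo())
}
```

Because ``Publish`` only appends and broadcasts, a slow reader in this sketch
costs the publisher nothing; the price is that the reader may be told it has
lagged and must decide how to recover.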