# RFC 006: Event Subscription

## Changelog

- 30-Oct-2021: Initial draft (@creachadair)

## Abstract

The Tendermint consensus node allows clients to subscribe to its event stream
via methods on its RPC service.  The ability to view the event stream is
valuable for clients, but the current implementation has some deficiencies that
make it difficult for some clients to use effectively. This RFC documents these
issues and discusses possible approaches to solving them.


## Background

A running Tendermint consensus node exports a [JSON-RPC service][rpc-service]
that provides a [large set of methods][rpc-methods] for inspecting and
interacting with the node.  One important cluster of these methods comprises the
`subscribe`, `unsubscribe`, and `unsubscribe_all` methods, which permit clients
to subscribe to a filtered stream of the [events generated by the node][events]
as it runs.

Unlike the other methods of the service, the methods in the "event
subscription" cluster are not accessible via [ordinary HTTP GET or POST
requests][rpc-transport], but require upgrading the HTTP connection to a
[websocket][ws].  This is necessary because the `subscribe` request needs a
persistent channel to deliver results back to the client, and an ordinary HTTP
connection does not reliably persist across multiple requests.  Since these
methods do not work properly without a persistent channel, they are _only_
exported via a websocket connection, and are not routed for plain HTTP.


## Discussion

There are some operational problems with the current implementation of event
subscription in the RPC service:

- **Event delivery is not valid JSON-RPC.** When a client issues a `subscribe`
  request, the server replies (correctly) with an initial empty acknowledgement
  (`{}`). After that, each matching event is delivered "unsolicited" (without
  another request from the client), as a separate [response object][json-response]
  with the same ID as the initial request.

  This matters because it means a standard JSON-RPC client library can't
  interact correctly with the event subscription mechanism.

  Even for clients that can handle unsolicited values pushed by the server,
  these responses are invalid: They have an ID, so they cannot be treated as
  [notifications][json-notify]; but the ID corresponds to a request that was
  already completed.  In practice, this means that general-purpose JSON-RPC
  libraries cannot use this method correctly -- it requires a custom client.

  The Go RPC client from the Tendermint core can support this case, but clients
  in other languages have no easy solution.

  This is the cause of issue [#2949][issue2949].

- **Subscriptions are terminated by disconnection.** When the connection to the
  client is interrupted, the subscription is silently dropped.

  This is a reasonable behavior, but it matters because a client whose
  subscription is dropped gets no useful error feedback, just a closed
  connection.  Should they try again?  Is the node overloaded?  Was the client
  too slow?  Did the caller forget to respond to pings? Debugging these kinds
  of failures is unnecessarily painful.

  Websockets compound this, because websocket connections time out if no
  traffic is seen for a while, and keeping them alive requires active
  cooperation between the client and server.  With a plain TCP socket, liveness
  is handled transparently by the keepalive mechanism.  On a websocket,
  however, one side has to occasionally send a PING (if the connection is
  otherwise idle).  The other side must return a matching PONG in time, or the
  connection is dropped.  Apart from being tedious, this exchange is sensitive
  to CPU load: a busy peer may not answer a PING before the deadline.

  The Tendermint Go implementation automatically sends and responds to pings.
  Clients in other languages (or not wanting to use the Tendermint libraries)
  need to handle it explicitly.  This burdens the client for no practical
  benefit: A subscriber has no information about when matching events may be
  available, so it shouldn't have to participate in keeping the connection
  alive.

- **Mismatched load profiles.** Most of the RPC service is mainly important for
  low-volume local use, either by the application the node serves (e.g., the
  ABCI methods) or by the node operator (e.g., the info methods).  Event
  subscription is important for remote clients, and may represent a much higher
  volume of traffic.

  This matters because both are using the same JSON-RPC mechanism. For
  low-volume local use, the ergonomics of JSON-RPC are a good fit: It's easy to
  issue queries from the command line (e.g., using `curl`) or to write scripts
  that call the RPC methods to monitor the running node.

  For high-volume remote use, JSON-RPC is not such a good fit: Even leaving
  aside the non-standard delivery protocol mentioned above, the time and memory
  cost of encoding event data matters for the stability of the node when there
  can be potentially hundreds of subscribers. Moreover, a subscription is
  long-lived compared to most RPC methods, in that it may persist as long as
  the node is active.

- **Mismatched security profiles.** The RPC service exports several methods
  that should not be open to arbitrary remote callers, both for correctness
  reasons (e.g., `remove_tx` and `broadcast_tx_*`) and for operational
  stability reasons (e.g., `tx_search`). A node may still need to expose
  events, however, to support UI tools.

  This matters because all the methods share the same network endpoint. While
  it is possible to block the top-level GET and POST handlers with a proxy,
  exposing the `/websocket` handler exposes not _only_ the event subscription
  methods, but the rest of the service as well.
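
To make the first problem above concrete, the following sketch classifies
incoming messages the way a general-purpose JSON-RPC 2.0 client might. The
`tracker` type, its method names, and the event payloads are hypothetical; only
the message shapes follow the JSON-RPC 2.0 specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message holds the fields that distinguish JSON-RPC 2.0 message kinds.
type message struct {
	ID     json.RawMessage `json:"id"`
	Method string          `json:"method"`
	Result json.RawMessage `json:"result"`
	Error  json.RawMessage `json:"error"`
}

// tracker is a hypothetical client-side record of requests awaiting a reply.
type tracker struct{ pending map[string]bool }

// sent records that a request with the given ID (as raw JSON text) is in flight.
func (t *tracker) sent(id string) { t.pending[id] = true }

// classify reports how a spec-following client would treat an incoming message.
func (t *tracker) classify(data []byte) string {
	var m message
	if err := json.Unmarshal(data, &m); err != nil {
		return "malformed"
	}
	id := string(m.ID)
	isReply := len(m.Result) > 0 || len(m.Error) > 0
	switch {
	case m.Method != "" && id == "":
		return "notification" // a method call without an ID expects no reply
	case isReply && t.pending[id]:
		delete(t.pending, id) // each request is answered exactly once
		return "response"
	case isReply:
		return "unexpected response" // no pending request has this ID
	default:
		return "invalid" // not a shape this sketch handles
	}
}

func main() {
	t := &tracker{pending: map[string]bool{}}
	t.sent("1") // the client issued a subscribe request with ID 1

	msgs := []string{
		`{"jsonrpc":"2.0","id":1,"result":{}}`,               // initial acknowledgement
		`{"jsonrpc":"2.0","id":1,"result":{"data":"event"}}`, // event reusing ID 1
		`{"jsonrpc":"2.0","method":"event","params":{}}`,     // event sent as a notification
	}
	for _, m := range msgs {
		fmt.Println(t.classify([]byte(m)))
	}
}
```

The second message models the current delivery scheme: it carries an ID, so it
is not a notification, but no request with that ID is still pending, so the
sketch reports it as an unexpected response.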

### Possible Improvements

There are several things we could do to improve the experience of developers
who need to subscribe to events from the consensus node. These are not all
mutually exclusive.

1. **Split event subscription into a separate service**. Instead of exposing
   event subscription on the same endpoint as the rest of the RPC service,
   dedicate a separate endpoint on the node for _only_ event subscription.  The
   rest of the RPC services (_sans_ events) would remain as-is.

   This would make it easy to disable or firewall outside access to sensitive
   RPC methods, without blocking access to event subscription (and vice versa).
   This is probably worth doing, even if we don't take any of the other steps
   described here.

2. **Use a different protocol for event subscription.** There are various ways
   we could approach this, depending on how much we're willing to shake up the
   current API. Here are sketches of a few options:

   - Keep the websocket, but rework the API to be more JSON-RPC compliant,
     perhaps by converting event delivery into notifications.  This is less
     up-front change for existing clients, but retains all of the existing
     implementation complexity, and doesn't contribute much toward more serious
     performance and UX improvements later.

   - Switch from websocket to plain HTTP, and rework the subscription API to
     use a more conventional request/response pattern instead of streaming.
     This is a little more up-front work for existing clients, but leverages
     better library support for clients not written in Go.

     The protocol would become more chatty, but we could mitigate that with
     batching, and in return we would get more control over what to do about
     slow clients: Instead of simply silently dropping them, as we do now, we
     could drop messages and signal the client that they missed some data ("M
     dropped messages since your last poll").

     This option is probably the best balance between work, API change, and
     benefit, and has a nice incidental effect that it would be easier to debug
     subscriptions from the command-line, like the other RPC methods.

   - Switch to gRPC. This preserves a persistent connection and gives us a more
     efficient binary wire format (protobuf), at the cost of much more work for
     clients and harder debugging. This may be the best option if performance
     and server load are our top concerns.

     Given that we are currently using JSON-RPC, however, I'm not convinced the
     costs of encoding and sending messages on the event subscription channel
     are the limiting factor on subscription efficiency.

3. **Delegate event subscriptions to a proxy.** Give responsibility for
   managing event subscription to a proxy that runs separately from the node,
   and switch the node to push events to the proxy (like a webhook) instead of
   serving subscribers directly.  This is more work for the operator (another
   process to configure and run) but may scale better for big networks.

   I mention this option for completeness, but making this change would be a
   fairly substantial project.  If we want to consider shifting responsibility
   for event subscription outside the node anyway, we should probably be more
   systematic about it. For a more principled approach, see point (4) below.

4. **Move event subscription downstream of indexing.** We are already planning
   to give applications more control over event indexing. By extension, we
   might allow the application to also control how events are filtered,
   queried, and subscribed. Having the application control these concerns,
   rather than the node, might make life easier for developers building UI and
   tools for that application.

   This is a much larger change, so I don't think it is likely to be practical
   in the near term, but it's worth considering as a broader option. Some of
   the existing code for filtering and selection could be made more reusable,
   so applications would not need to reinvent everything.


## References

- [Tendermint RPC service][rpc-service]
- [Tendermint RPC routes][rpc-methods]
- [Discussion of the event system][events]
- [Discussion about RPC transport options][rpc-transport] (from RFC 002)
- [RFC 6455: The websocket protocol][ws]
- [JSON-RPC 2.0 Specification](https://www.jsonrpc.org/specification)

[rpc-service]: https://docs.tendermint.com/master/rpc/
[rpc-methods]: https://github.com/tendermint/tendermint/blob/master/internal/rpc/core/routes.go#L12
[events]: ./rfc-005-event-system.rst
[rpc-transport]: ./rfc-002-ipc-ecosystem.md#rpc-transport
[ws]: https://datatracker.ietf.org/doc/html/rfc6455
[json-response]: https://www.jsonrpc.org/specification#response_object
[json-notify]: https://www.jsonrpc.org/specification#notification
[issue2949]: https://github.com/tendermint/tendermint/issues/2949