
- Feature Name: Streaming results through pgwire
- Status: In-progress
- Start Date: 2017-06-15
- Authors: Tristan Ohlson
- RFC PR: [#16626](https://github.com/cockroachdb/cockroach/pull/16626)
- Cockroach Issue: [#7775](https://github.com/cockroachdb/cockroach/issues/7775)

# Summary

At the time of writing, the results of a SQL query are buffered into memory
prior to being sent back to the client. This RFC outlines a minimum viable
plan for streaming results through pgwire.

# Motivation and background

It is currently impossible to run queries with large result sets, since we
will exceed our memory budget. A nice side effect of changing to a streaming
interface is a lower time to first byte in some cases.

Executor currently provides an interface which takes in a query and returns a
list of results. The interaction between executor and pgwire (`v3.go`) will
have to be changed.

A concern for streaming results is handling error cases – especially retryable
errors. It's important to note that we cannot automatically retry a query whose
results contradict those that we've already sent to the client.

# Detailed design

I propose creating a result writer interface which `v3.go` can pass into
executor to write results to. The result writer interface will look roughly
like this:

```go
type ResultWriter interface {
    BeginResult(pgTag string, statementType parser.StatementType)
    SetColumns(columns sqlbase.ResultColumns)
    Type() parser.StatementType
    SetRowsAffected(rowsAffected int)
    AddRow(ctx context.Context, row parser.Datums) error
    EndResult()
    Error(err error)
}
```

## Automatic retries

Adding a buffer which holds some number of rows before sending results to the
client alleviates automatic retry concerns for queries whose result sets are
smaller than the buffer.

Executor can remain in charge of its automatic retries as long as it can
determine whether the result writer has sent results back to the client.
Conveniently, we now have the ability to change the transaction state from
`FirstBatch` to `Open` to indicate to executor that the query cannot be
automatically retried.

While technically "we cannot automatically retry a query whose results
contradict those that we've already sent to the client", in practice we will
not automatically retry after we've sent _any_ results to the client.

## Configuration

Buffer size will be configurable per session, with the default set via a
cluster setting.

I propose a default of 20, based on data collected on June 5th, 2017, which
indicates that automatic retries are predominantly used for queries that
return between 0 and 20 rows.

In practice, a user would be able to disable streaming by setting the buffer
size to `MaxInt64`.

## Postgres wire protocol

The Postgres documentation states that "a frontend must be prepared to accept
`ErrorResponse` and `NoticeResponse` messages whenever it is expecting any other
type of message"<sup>[1]</sup>. It is therefore safe to send an error midway
through streaming results to the client.

# Alternatives

## Change executor's interface to provide Start/Next/Close methods

Modifying executor's interface to provide Start/Next/Close methods is another
possible approach. In our code, `v3.go` already has the concept of a list of
results (we loop through the statement results when we send them to the
client), so having it call Next repeatedly would not be a big change.

### Automatic retries

The largest change that would have to happen to `v3.go` comes from automatic
retry handling. Since we want to keep automatic retries, this logic would need
to be pushed into `v3.go`.

Adding a buffer which holds a configurable number of rows before sending
results to the client would alleviate automatic retry concerns for queries
whose result sets are smaller than the configured size.

### Downsides

- Pushes retry logic into `v3.go`, which is not ideal
- Could require a lot of changes to executor, which is not the easiest code to
  modify

# Possible future steps

- Buffer directly into the pgwire output format
- Remove the pagination code for the `cockroach dump` command and rely on
  streaming results instead
- Start streaming results after the buffer has been non-empty for some amount
  of time
- Decouple the client communication and query processing by adding
  asynchronicity

# Potential additional benefits

- The Spanner SQL paper mentions that streaming results let clients get
  paginated results without sorting.

# Footnotes

\[1\]: https://www.postgresql.org/docs/9.6/static/protocol-flow.html

[1]: https://www.postgresql.org/docs/9.6/static/protocol-flow.html