- Feature Name: Streaming results through pgwire
- Status: In-progress
- Start Date: 2017-06-15
- Authors: Tristan Ohlson
- RFC PR: [#16626](https://github.com/cockroachdb/cockroach/pull/16626)
- Cockroach Issue: [#7775](https://github.com/cockroachdb/cockroach/issues/7775)

# Summary

At the time of writing, the results of a SQL query are buffered into memory
prior to being sent back to the client. This RFC outlines a minimum viable
plan for streaming results through pgwire.

# Motivation and background

It is currently impossible to run queries with large result sets, since doing
so would exceed our memory budget. A nice side effect of changing to a
streaming interface is lowering the time to first byte in some cases.

Executor currently provides an interface which takes in a query and returns a
list of results. The interaction between executor and pgwire (`v3.go`) will
have to change.

A concern for streaming results is handling error cases, especially retryable
errors. It's important to note that we cannot automatically retry a query whose
results contradict those that we've already sent to the client.

# Detailed design

I propose creating a result writer interface which `v3.go` can pass into
executor to write results to.
The result writer interface will look roughly like this:

``` {.go}
type ResultWriter interface {
    BeginResult(pgTag string, statementType parser.StatementType)
    SetColumns(columns sqlbase.ResultColumns)
    Type() parser.StatementType
    SetRowsAffected(rowsAffected int)
    AddRow(ctx context.Context, row parser.Datums) error
    EndResult()
    Error(err error)
}
```

## Automatic retries

Adding a buffer which holds some number of rows before sending results to the
client alleviates automatic retry concerns for queries which return a result
set smaller than the buffer.

Executor can remain in charge of its automatic retries as long as it can
determine whether the result writer has sent results back to the client.
Conveniently, we now have the ability to change the transaction state from
`FirstBatch` to `Open` to indicate to executor that the query cannot be
automatically retried.

While technically "we cannot automatically retry a query whose results
contradict those that we've already sent to the client", in practice we will
not automatically retry after we've sent _any_ results to the client.

## Configuration

Buffer size will be configurable per-session, with the default set by a
cluster setting.

I propose a default of 20, based on data collected on June 5th, 2017, which
indicates that automatic retries are predominantly used for queries which
return between 0 and 20 rows.

A user would be able to disable streaming in practice by setting the buffer
size to `MaxInt64`.

## Postgres wire protocol

Postgres states that "a frontend must be prepared to accept
`ErrorResponse` and `NoticeResponse` messages whenever it is expecting any other
type of message"<sup>[1]</sup>. Due to this, it is safe to send an error midway
through streaming results to the client.
# Alternatives

## Change executor's interface to provide Start/Next/Close methods

Modifying executor's interface to provide Start/Next/Close methods is another
possible approach. In our code, `v3.go` already has the concept of a list of
results (we loop through the statement results when we send them to the
client), so having it call Next repeatedly would not be a big change.

### Automatic retries

The largest change that would have to happen to `v3.go` comes from automatic
retry handling. Since we want to keep automatic retries, this logic would need
to be pushed into `v3.go`.

Adding a buffer which holds a configurable number of rows before sending
results to the client would alleviate automatic retry concerns for queries
which return a result set smaller than the configured amount.

### Downsides

- Pushes retry logic into `v3.go`, which is not ideal
- Could require a lot of changes to executor, which is not the easiest code to
  modify

# Possible future steps

- Buffer directly into the pgwire output format
- Remove the pagination code for the `cockroach dump` command and rely on
  streaming results instead
- Start streaming results after the buffer has been non-empty for some amount
  of time
- Decouple the client communication and query processing by adding
  asynchronicity

# Potential additional benefits

- The Spanner SQL paper mentions that clients are able to get paginated
  results without sorting by using streaming results.

# Footnotes

\[1\]: https://www.postgresql.org/docs/9.6/static/protocol-flow.html

[1]: https://www.postgresql.org/docs/9.6/static/protocol-flow.html