- Feature Name: distsql buffering hash router
- Status: completed
- Start Date: 2017-07-19
- Authors: Radu
- RFC PR: [#17105](https://github.com/cockroachdb/cockroach/pull/17105)
- Cockroach Issue: [#17097](https://github.com/cockroachdb/cockroach/issues/17097)

# Summary

This RFC discusses the implementation of a "by-hash" output router in distsql
which doesn't stop sending results once a consumer is blocked.

# Motivation

Issue [#17097](https://github.com/cockroachdb/cockroach/issues/17097) describes
scenarios in which a distsql computation can deadlock. The crux of the issue is
that the streams between the processors have limited buffers; when sending on
one of these streams blocks, it can block a producer with multiple consumers. In
some cases, sending rows to one of the non-blocked consumers is required for
progress, so the current implementation deadlocks. See the issue for examples.

We could fix this by adding buffering on the input side, wherever a consumer
reads from multiple producers (synchronizers, joiners). However, it is difficult
to determine when we need to buffer (and we don't want to buffer unnecessarily),
and multiple distsql components would be affected.

The alternative is to buffer on the output side; currently, the only component
with multiple consumers is the hash router (we don't yet use the mirror router).
Moreover, there is an easy heuristic for when to buffer: only when some of the
consumers are blocked and others aren't.

The current implementation of the hash router is simple: it is a routine that is
called directly from a processor; it hashes the relevant columns and calls
`Push` on the correct consumer.
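The current, non-buffering behavior can be sketched as follows. This is a simplified illustration, not the actual distsql code: `row`, `rowReceiver`, and `hashRouter` are stand-ins for `EncDatumRow`, the `RowReceiver` interface, and the real router.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// row is a stand-in for distsql's EncDatumRow.
type row []string

// rowReceiver mimics the Push side of distsql's RowReceiver interface.
type rowReceiver interface {
	Push(r row)
}

// hashRouter routes each row to one of its outputs by hashing the
// configured columns: the "routine called directly from a processor"
// described above.
type hashRouter struct {
	hashCols []int
	outputs  []rowReceiver
}

// route returns the output index for a row by hashing the relevant columns.
func (hr *hashRouter) route(r row) int {
	h := fnv.New32a()
	for _, c := range hr.hashCols {
		h.Write([]byte(r[c]))
	}
	return int(h.Sum32() % uint32(len(hr.outputs)))
}

// push sends the row to the chosen consumer. Crucially, this blocks if
// that consumer's channel is full, even if other consumers could make
// progress: the source of the deadlocks described above.
func (hr *hashRouter) push(r row) {
	hr.outputs[hr.route(r)].Push(r)
}

type printReceiver struct{ name string }

func (p *printReceiver) Push(r row) { fmt.Println(p.name, r) }

func main() {
	hr := &hashRouter{
		hashCols: []int{0},
		outputs:  []rowReceiver{&printReceiver{"out0"}, &printReceiver{"out1"}},
	}
	hr.push(row{"a", "1"})
	hr.push(row{"b", "2"})
}
```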
The requirements for the new implementation are:

- If at least one of the consumers is blocked, the router needs to continue
  absorbing rows, buffering rows for blocked consumers and sending rows to
  non-blocked consumers. This is required to prevent deadlocks.
- If all consumers are blocked, the router must stop buffering rows. This is
  necessary to apply backpressure and avoid buffering a lot of rows when the
  producer is faster than the consumers.

Note that the consumers we are concerned about here are `RowChannel`s, which are
implemented using Go channels. Routers never send rows directly to gRPC (they go
through a `RowChannel` to an `outbox` goroutine which makes the gRPC calls).

# Proposed design

For a k-way hash router, create k goroutines and k `memRowContainer`s (later
`diskRowContainer`s). Each goroutine is responsible for sending rows to one
consumer.

The main router routine adds rows to the containers and uses a channel or
condition variable to wake up the goroutine. The goroutine `Push`es the first
row (which blocks until it gets sent).

To ensure the second requirement above, all k goroutines as well as the main
routine use a semaphore of capacity `k`. Whenever a goroutine has buffered rows,
it acquires the semaphore; whenever it has no more buffered rows, it releases
the semaphore. The main router routine tries to acquire the semaphore whenever
it buffers a new row. The result is that if all consumers have buffered rows,
the router routine also blocks on the semaphore.

### Pros

- Efficient when fanout is high and many consumers are blocked.
- Efficient when no buffering is necessary (the goroutines never acquire the
  semaphore in that case).

### Cons

- Extra goroutines = extra overhead.
## Implementation notes

- Proof-of-concept benchmarks showed very little difference between using a
  condition variable vs a wake-up channel.

- Adding and removing rows to a `memRowContainer` has overhead (e.g. memory
  accounting). The implementation should use a small lookaside buffer to avoid
  going through the container if we only buffer a few rows.

- The goroutine that sends the rows along should grab multiple buffered rows
  instead of reacquiring the mutex for each row.

- The main routine can reduce overhead by only acquiring the semaphore
  occasionally (e.g. every 8 rows) - it's ok if we buffer a few extra rows
  before we block.

# Considered alternatives

## Channels to k goroutines

Similar to the proposed solution, except that the router routine sends rows to
the goroutines via channels, and the goroutines are responsible for buffering as
necessary.

The goroutines have a loop which tries to either receive a row or send a row to
a consumer (via a `select`). This requires exposing the underlying channel (we
can no longer hide it behind the `RowReceiver` interface).

The solution still uses the semaphore, but an optimization is possible: we can
use a semaphore of capacity `k-1` and only acquire it from the goroutines, the
idea being that if all consumers are blocked, the last goroutine blocks,
eventually causing the router routine to also block. This optimization has some
subtleties (especially for k=2), and there are cases where it doesn't block as
early as the proposed solution, leading to more buffering: even when all
consumers are blocked, the router routine will continue to send rows until it
has to send to the one blocked goroutine (the last to acquire the semaphore).
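The receive-or-send loop in this alternative might look like the sketch below. This is an illustration under assumed names (`senderLoop`, `in`, `out`); the real code would also handle the `k-1` semaphore and bounded buffering.

```go
package main

import "fmt"

type row []int

// senderLoop sketches the per-output goroutine in this alternative: it
// receives rows from the router over in, buffers them as needed, and
// sends buffered rows to the consumer over out (the exposed channel that
// can no longer be hidden behind the RowReceiver interface).
func senderLoop(in <-chan row, out chan<- row) {
	var buf []row
	for {
		if len(buf) == 0 {
			// Nothing buffered: just wait for a row from the router.
			r, ok := <-in
			if !ok {
				close(out)
				return
			}
			buf = append(buf, r)
			continue
		}
		// Rows are buffered: either receive another row or send the
		// oldest buffered row, whichever becomes possible first.
		select {
		case r, ok := <-in:
			if !ok {
				// Router is done; drain the buffer, then close.
				for _, r := range buf {
					out <- r
				}
				close(out)
				return
			}
			buf = append(buf, r)
		case out <- buf[0]:
			buf = buf[1:]
		}
	}
}

func main() {
	in := make(chan row)
	out := make(chan row, 1) // a consumer with a small buffer
	go senderLoop(in, out)
	for i := 0; i < 5; i++ {
		in <- row{i} // never blocks for long: the loop is always receive-ready
	}
	close(in)
	for r := range out {
		fmt.Println(r)
	}
}
```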
This solution seems more complicated to implement correctly, and
proof-of-concept
[benchmarks](https://github.com/RaduBerinde/playground/tree/master/buffering_router)
suggest this solution (`Option1` in the benchmarks) is slower anyway.

## reflect.Select

An alternative solution uses the channels to the consumers directly and avoids
the `k` extra goroutines.

The hash router routine receives a row destined for a certain consumer. If we
don't have rows buffered for this consumer, we do a non-blocking send to the
consumer. If that doesn't succeed, we buffer the row. In either case, if there
are other consumers with buffered rows, we `TryPush` a row to each one
(repeating as long as we are successful).

If all the consumers have buffered rows, we need to block so that we stop
consuming more rows. We need to block until one consumer is able to receive a
row; because the list of consumers is not fixed at compile time, we can't use a
regular `select` statement; we would need `reflect.Select`.

This solution was rejected because `reflect.Select` is likely [too
slow](https://stackoverflow.com/a/32342741/4019276), and the solution is overall
less idiomatic Go than the proposed one.
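For reference, the two primitives this alternative relies on could look like the sketch below (`tryPush` and `blockingSend` are hypothetical names). `reflect.Select` takes a slice of `reflect.SelectCase` built at runtime, which is what makes it usable when the number of consumers isn't known at compile time.

```go
package main

import (
	"fmt"
	"reflect"
)

type row []int

// tryPush attempts a non-blocking send of r on ch and reports whether it
// succeeded; this is the "TryPush" used to opportunistically drain
// buffered rows.
func tryPush(ch chan<- row, r row) bool {
	select {
	case ch <- r:
		return true
	default:
		return false
	}
}

// blockingSend blocks until one of the consumer channels accepts its
// pending row, returning the index of the consumer that did. Because the
// consumer list is only known at runtime, it builds the select cases
// dynamically and uses reflect.Select instead of a select statement.
func blockingSend(consumers []chan row, pending []row) int {
	cases := make([]reflect.SelectCase, len(consumers))
	for i, ch := range consumers {
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectSend,
			Chan: reflect.ValueOf(ch),
			Send: reflect.ValueOf(pending[i]),
		}
	}
	chosen, _, _ := reflect.Select(cases)
	return chosen
}

func main() {
	ch := make(chan row, 1)
	fmt.Println(tryPush(ch, row{1})) // true: the buffer has space
	fmt.Println(tryPush(ch, row{2})) // false: the buffer is full

	consumers := []chan row{make(chan row), make(chan row, 1)}
	pending := []row{{10}, {20}}
	// Only consumer 1 has buffer space, so reflect.Select chooses it.
	fmt.Println("sent to consumer", blockingSend(consumers, pending))
}
```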