github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/tech-notes/sql-principles.md

github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/tech-notes/sql-principles.md (about)

1 # Guiding principles for the SQL middle-end and back-end in CockroachDB
2
3 This tech nodes outlines some general guiding principles for future
4 work, during the period 2017-2019.
5
6 Reminder from "[The SQL layer in CockroachDB](sql.md)" - the SQL layer
7 in CockroachDB contains 5 main component groups, including:
8
9 - the SQL **front-end**, responsible for parsing,
10 desugaring, free simplifications and semantic analysis; this
11 comprises the two blocks "Parser" and "Expression analysis" in the
12 overview diagram.
13
14 - the SQL **middle-end**, responsible for logical planning and
15 optimization; this comprises the two blocks "(plan constructors)"
16 and "Logical optimizer" in the overview diagram.
17
18 - the SQL **back-end**, of which there are actually two:
19 - the distSQL back-end, itself comprised of its own physical
20 planner, optimizer and runner, able to process any statement/query
21 *for which a distSQL plan can be constructed*;
22 - the **local runner**, responsible for everything else.
23
24 This tech note focuses on the latter two, although the general
25 principles introduced here may be applicable through the entire
26 CockroachDB project.
27
28 Original author: Raphael 'kena' Poss
29
30 ## Overview
31
32 CockroachDB must:
33
34 1) catch up with and innovate beyond, over the course of five years,
35 what hundreds of database experts have built in tens of projects over
36 a period of thirty years.
37
38 2) do so incrementally: function, orthogonality (new!), then performance.
39
40 3) do so openly, transparently and dependably. Openness is about
41 readiness to interact with a community beyond Cockroach Labs;
42 transparency is about developing and sustaining trust of the community
43 in the quality of the code; dependability is about delivering what we
44 promised.
45
46 ## Catching up and beyond
47
48 The main challenge to the first goal is that published articles, books
49 and documentation about the state of the art *describe* the work that
50 has been performed by other SQL implementors but is insufficient to
51 *reproduce* it.
52
53 There are two obstacles really.
54
55 The first obstacle is that the descriptions are largely
56 incomplete. This is an historical artifact of the community where the
57 work was performed: SQL engines have been mostly proprietary affairs,
58 where good technology is still hidden from study by secrecy and
59 software patents.
60
61 Reproducing a SQL engine using database literature is like trying to
62 [picture an elephant without ready access to
63 one](https://www.google.com/search?tbm=isch&q=medieval+drawings+of+elephants). The
64 experience is very different from e.g. what programming language
65 experts are used to, where published literature often contains
66 complete instructions sufficient to reproduce the results, or links to
67 source code to achieve the same. These luxuries are simply absent from
68 most significant published works on databases.
69
70 The main issue that a reader of database literature has to contend
71 with is *the lack of concrete description of the data structures used
72 to represent the concepts described*. As Fred Brookes puts it in the
73 Mythical Man-Month: *"show me your flowcharts and conceal your tables,
74 and I shall continue to be mystified. Show me your tables, and I won’t
75 usually need your flowcharts; they’ll be obvious"* later translated
76 by Linus Torvalds: *"Bad programmers worry about the code. Good
77 programmers worry about data structures and their relationships."*
78
79 To bring CockroachDB to the state of the art and beyond, it will thus
80 be insufficient to look at published artifacts and hope to be able to
81 follow the example of good performers. A large amount of
82 *reverse-engineering*, both of the literature and competing technology,
83 will be needed as well.
84
85 The second obstacle to catching up with the state of the art is that
86 the architecture of CockroachDB fundamentally deviates from the
87 assumptions that have driven SQL research in the past 30 years. To
88 understand why this matters, recall that the main question that drives
89 the design and implementation of a SQL optimization engine is the
90 following:
91
92 *what are the parameters that influence the execution latency, row
93 throughput and CPU/memory usage of a query, and how to decide which
94 SQL query plan combines those parameters in a way that minimizes
95 latency, maximizes throughput and minimizes CPU/memory usage?*
96
97 There are 3 moving pieces in this question:
98
99 - which parameters influence the performance of a query?
100 - for each candidate query plan, how to estimate performance for that plan?
101 - when considering multiple candidates, how to decide the best plan(s)?
102
103 Now, there has been a lot of research into the latter two of these
104 three items. But there's one thing that has barely changed since the
105 late 1960's: understanding of the low-level parameters that influence
106 performance.
107
108 Of these, we can make two groups: those parameters that are different
109 in CockroachDB for no good reason, and for which we can work to
110 converge with the state of the art; and those parameters that are
111 really fundamentally different for a good reason.
112
113 Some examples in the first group:
114
115 - Traditional relational engines know how to perform every SQL
116 operation in a fixed, small RAM budget. CockroachDB does not know
117 how to work with restricted memory budgets. This causes RAM usage to
118 remain a significant parameter to query planning in CockroachDB,
119 whereas it is largely irrelevant in the state of the art. We will
120 work on removing this parameter.
121 - Traditional relational engines always assume up-to-date row counts
122 in tables and cardinality estimates for indexed columns. These are
123 the bread and butter of most costing functions to decide the best
124 query plan candidate. CockroachDB currently maintains neither, but
125 will soon.
126
127 Some examples in the second group:
128
129 | What happened in research before 1980, and remains since then. | What happens in CockroachDB |
130 |---------------------------|-----------------------------|
131 | Data is stored in equally sized pages; pages contain approximately equal numbers of rows. | Prefix compression for keys, Snappy compression for values. |
132 | Parallelism, if at all used in query optimization, only appears in data scans. | CockroachDB already contains a moderately complex distributed execution engine. |
133 | Parallelism, if at all used in query optimization, assumes a fixed number of processors. | CockroachDB node counts evolve dynamically, even possibly during query execution. |
134 | The overall latency of a query is solely decided by the number of rows processed in memory and the number of page accesses on disk. | Network effects and implicit retries dominate. |
135 | The costing estimates use a fixed ratio between disk and CPU performance. | Each node in a cluster can have a different CPU/disk performance ratio. Network performance evolves over time. |
136 | Each area of contiguous storage contains data from at most one table or one index. Storage locality maps 1-to-1 to query locality. | [Interleaved tables.](../RFCS/20160624_sql_interleaved_tables.md) |
137 | A logical range of values, in a SQL query, corresponds to a contiguous area of storage when the columns are indexed. | [Partitioned tables.](../RFCS/20170921_sql_partitioning.md) |
138
139 For this second group, original research will be needed to construct,
140 from the ground up, CockroachDB-specific replacements to the best
141 practices available in the literature and most competing products.
142
143 ## Function, orthogonality, then performance
144
145 Regarding the relationship between function and performance: this is
146 an axiomatic *modus operandi* of this particular engineering
147 organization.
148
149 - "Make it work correctly quickly, only then make it work correctly and increasingly fast."
150 a.k.a. "Make a MVP early, then use adoption to drive further development."
151 - Functional behavior and correctness (which outputs are produced for
152 which inputs and why) comes before non-functional behavior (latency,
153 throughput, jitter, resource costs).
154
155 Reminder: the motivation to prioritize function over performance is
156 that correctness would be much more difficult to obtain and
157 demonstrate otherwise.
158
159 I am adding
160 [*orthogonality*](https://en.wikipedia.org/wiki/Orthogonality_(programming))
161 in the mix, between function and performance.
162
163 Orthogonality is a general design principle for a software project
164 where the designers strive to evolve a small set of basic building
165 blocks (and actively eliminating redundancy over time whenever it is
166 discovered) while maximizing the number of their allowable
167 combinations.
168
169 From a management perspective, orthogonality is something that is
170 essential, but invisible in the short term. Introducing a focus on
171 orthogonality in a team tends to naturally push for modularity, DRY,
172 reusability, good abstractions and creative discovery of new useful
173 features for end-users, without having to train these individual
174 traits separately. It is the means by which we ensure we can keep
175 growing CockroachDB without making each future change incrementally
176 expensive.
177
178 ## On openness, transparency and dependability
179
180 The challenges and goals set forth above create valid concerns for
181 external observers:
182
183 - will CockroachDB deliver on both performance and quality? Can we
184 "catch up on 30 years of database research" without creating a messy
185 product hacked together too fast?
186
187 - at every point in time when the team decides to transition from a
188 functional, correct feature to a faster implementation of the same,
189 how is it guaranteed that the faster implementation is as correct as
190 the base, functional one?
191
192 These concerns are already highly relevant:
193
194 - the code base is already hard to understand! I posit that it is so
195 because it was hacked together too fast, without general guiding
196 principles and directions (such as those proposed in this
197 document). Openness, transparency and dependability will offer our
198 community a way to understand CockroachDB above and beyond what is
199 currently implemented at each point in time.
200
201 - we have already made mistakes by introducing new bugs while
202 "optimizing" the code to make it faster. Openness, transparency and
203 dependability on the plan to implementation, and releasing
204 intermediate functional features before they are optimized, will
205 help building trust by establishing baselines for testing and validation.
206
207 The nuance between "openness", "transparency" and "dependability"
208 relates to the wants of the community:
209
210 - transparency is our ability to communicate clearly about "why" and "when".
211 - openness is our ability to answer "how" and "why not".
212 - dependability is our ability to match what we announce we will do, and what we end up doing.