github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/tech-notes/sql-principles.md (about) 1 # Guiding principles for the SQL middle-end and back-end in CockroachDB 2 3 This tech nodes outlines some general guiding principles for future 4 work, during the period 2017-2019. 5 6 Reminder from "[The SQL layer in CockroachDB](sql.md)" - the SQL layer 7 in CockroachDB contains 5 main component groups, including: 8 9 - the SQL **front-end**, responsible for parsing, 10 desugaring, free simplifications and semantic analysis; this 11 comprises the two blocks "Parser" and "Expression analysis" in the 12 overview diagram. 13 14 - the SQL **middle-end**, responsible for logical planning and 15 optimization; this comprises the two blocks "(plan constructors)" 16 and "Logical optimizer" in the overview diagram. 17 18 - the SQL **back-end**, of which there are actually two: 19 - the distSQL back-end, itself comprised of its own physical 20 planner, optimizer and runner, able to process any statement/query 21 *for which a distSQL plan can be constructed*; 22 - the **local runner**, responsible for everything else. 23 24 This tech note focuses on the latter two, although the general 25 principles introduced here may be applicable through the entire 26 CockroachDB project. 27 28 Original author: Raphael 'kena' Poss 29 30 ## Overview 31 32 CockroachDB must: 33 34 1) catch up with and innovate beyond, over the course of five years, 35 what hundreds of database experts have built in tens of projects over 36 a period of thirty years. 37 38 2) do so incrementally: function, orthogonality (new!), then performance. 39 40 3) do so openly, transparently and dependably. Openness is about 41 readiness to interact with a community beyond Cockroach Labs; 42 transparency is about developing and sustaining trust of the community 43 in the quality of the code; dependability is about delivering what we 44 promised. 45 46 ## Catching up and beyond 47 48 The main challenge to the first goal is that published articles, books 49 and documentation about the state of the art *describe* the work that 50 has been performed by other SQL implementors but is insufficient to 51 *reproduce* it. 52 53 There are two obstacles really. 54 55 The first obstacle is that the descriptions are largely 56 incomplete. This is an historical artifact of the community where the 57 work was performed: SQL engines have been mostly proprietary affairs, 58 where good technology is still hidden from study by secrecy and 59 software patents. 60 61 Reproducing a SQL engine using database literature is like trying to 62 [picture an elephant without ready access to 63 one](https://www.google.com/search?tbm=isch&q=medieval+drawings+of+elephants). The 64 experience is very different from e.g. what programming language 65 experts are used to, where published literature often contains 66 complete instructions sufficient to reproduce the results, or links to 67 source code to achieve the same. These luxuries are simply absent from 68 most significant published works on databases. 69 70 The main issue that a reader of database literature has to contend 71 with is *the lack of concrete description of the data structures used 72 to represent the concepts described*. As Fred Brookes puts it in the 73 Mythical Man-Month: *"show me your flowcharts and conceal your tables, 74 and I shall continue to be mystified. Show me your tables, and I won’t 75 usually need your flowcharts; they’ll be obvious"* later translated 76 by Linus Torvalds: *"Bad programmers worry about the code. Good 77 programmers worry about data structures and their relationships."* 78 79 To bring CockroachDB to the state of the art and beyond, it will thus 80 be insufficient to look at published artifacts and hope to be able to 81 follow the example of good performers. A large amount of 82 *reverse-engineering*, both of the literature and competing technology, 83 will be needed as well. 84 85 The second obstacle to catching up with the state of the art is that 86 the architecture of CockroachDB fundamentally deviates from the 87 assumptions that have driven SQL research in the past 30 years. To 88 understand why this matters, recall that the main question that drives 89 the design and implementation of a SQL optimization engine is the 90 following: 91 92 *what are the parameters that influence the execution latency, row 93 throughput and CPU/memory usage of a query, and how to decide which 94 SQL query plan combines those parameters in a way that minimizes 95 latency, maximizes throughput and minimizes CPU/memory usage?* 96 97 There are 3 moving pieces in this question: 98 99 - which parameters influence the performance of a query? 100 - for each candidate query plan, how to estimate performance for that plan? 101 - when considering multiple candidates, how to decide the best plan(s)? 102 103 Now, there has been a lot of research into the latter two of these 104 three items. But there's one thing that has barely changed since the 105 late 1960's: understanding of the low-level parameters that influence 106 performance. 107 108 Of these, we can make two groups: those parameters that are different 109 in CockroachDB for no good reason, and for which we can work to 110 converge with the state of the art; and those parameters that are 111 really fundamentally different for a good reason. 112 113 Some examples in the first group: 114 115 - Traditional relational engines know how to perform every SQL 116 operation in a fixed, small RAM budget. CockroachDB does not know 117 how to work with restricted memory budgets. This causes RAM usage to 118 remain a significant parameter to query planning in CockroachDB, 119 whereas it is largely irrelevant in the state of the art. We will 120 work on removing this parameter. 121 - Traditional relational engines always assume up-to-date row counts 122 in tables and cardinality estimates for indexed columns. These are 123 the bread and butter of most costing functions to decide the best 124 query plan candidate. CockroachDB currently maintains neither, but 125 will soon. 126 127 Some examples in the second group: 128 129 | What happened in research before 1980, and remains since then. | What happens in CockroachDB | 130 |---------------------------|-----------------------------| 131 | Data is stored in equally sized pages; pages contain approximately equal numbers of rows. | Prefix compression for keys, Snappy compression for values. | 132 | Parallelism, if at all used in query optimization, only appears in data scans. | CockroachDB already contains a moderately complex distributed execution engine. | 133 | Parallelism, if at all used in query optimization, assumes a fixed number of processors. | CockroachDB node counts evolve dynamically, even possibly during query execution. | 134 | The overall latency of a query is solely decided by the number of rows processed in memory and the number of page accesses on disk. | Network effects and implicit retries dominate. | 135 | The costing estimates use a fixed ratio between disk and CPU performance. | Each node in a cluster can have a different CPU/disk performance ratio. Network performance evolves over time. | 136 | Each area of contiguous storage contains data from at most one table or one index. Storage locality maps 1-to-1 to query locality. | [Interleaved tables.](../RFCS/20160624_sql_interleaved_tables.md) | 137 | A logical range of values, in a SQL query, corresponds to a contiguous area of storage when the columns are indexed. | [Partitioned tables.](../RFCS/20170921_sql_partitioning.md) | 138 139 For this second group, original research will be needed to construct, 140 from the ground up, CockroachDB-specific replacements to the best 141 practices available in the literature and most competing products. 142 143 ## Function, orthogonality, then performance 144 145 Regarding the relationship between function and performance: this is 146 an axiomatic *modus operandi* of this particular engineering 147 organization. 148 149 - "Make it work correctly quickly, only then make it work correctly and increasingly fast." 150 a.k.a. "Make a MVP early, then use adoption to drive further development." 151 - Functional behavior and correctness (which outputs are produced for 152 which inputs and why) comes before non-functional behavior (latency, 153 throughput, jitter, resource costs). 154 155 Reminder: the motivation to prioritize function over performance is 156 that correctness would be much more difficult to obtain and 157 demonstrate otherwise. 158 159 I am adding 160 [*orthogonality*](https://en.wikipedia.org/wiki/Orthogonality_(programming)) 161 in the mix, between function and performance. 162 163 Orthogonality is a general design principle for a software project 164 where the designers strive to evolve a small set of basic building 165 blocks (and actively eliminating redundancy over time whenever it is 166 discovered) while maximizing the number of their allowable 167 combinations. 168 169 From a management perspective, orthogonality is something that is 170 essential, but invisible in the short term. Introducing a focus on 171 orthogonality in a team tends to naturally push for modularity, DRY, 172 reusability, good abstractions and creative discovery of new useful 173 features for end-users, without having to train these individual 174 traits separately. It is the means by which we ensure we can keep 175 growing CockroachDB without making each future change incrementally 176 expensive. 177 178 ## On openness, transparency and dependability 179 180 The challenges and goals set forth above create valid concerns for 181 external observers: 182 183 - will CockroachDB deliver on both performance and quality? Can we 184 "catch up on 30 years of database research" without creating a messy 185 product hacked together too fast? 186 187 - at every point in time when the team decides to transition from a 188 functional, correct feature to a faster implementation of the same, 189 how is it guaranteed that the faster implementation is as correct as 190 the base, functional one? 191 192 These concerns are already highly relevant: 193 194 - the code base is already hard to understand! I posit that it is so 195 because it was hacked together too fast, without general guiding 196 principles and directions (such as those proposed in this 197 document). Openness, transparency and dependability will offer our 198 community a way to understand CockroachDB above and beyond what is 199 currently implemented at each point in time. 200 201 - we have already made mistakes by introducing new bugs while 202 "optimizing" the code to make it faster. Openness, transparency and 203 dependability on the plan to implementation, and releasing 204 intermediate functional features before they are optimized, will 205 help building trust by establishing baselines for testing and validation. 206 207 The nuance between "openness", "transparency" and "dependability" 208 relates to the wants of the community: 209 210 - transparency is our ability to communicate clearly about "why" and "when". 211 - openness is our ability to answer "how" and "why not". 212 - dependability is our ability to match what we announce we will do, and what we end up doing.