# Vitess Query Features

## Overview

Client applications connect to Vitess through VTGate. VTGate is a proxy that
routes queries to the right vttablets. It is essentially a fat client, but
provided as a service. Over time, we moved away from a fat client library that
had to be incorporated into the application. This way, the routing logic is
decoupled from the application and easier to maintain separately.

The previous versions of the VTGate API existed as transition points while the
logic was gradually moved out of the client into this new server:

* V0: This version had no VTGate. The application was expected to fetch the
  sharding info and tablet locations from a 'toposerver' and use them to talk
  directly to the tablet servers.
* V1: This was the first version of VTGate. The app only needed to know the
  number of shards, and how to map the sharding key to the correct shard. The
  rest was done by VTGate. However, the app was still exposed to resharding
  events.
* V2: In this version, the keyspace id was required instead of the shard. This
  allowed the app to be agnostic of the number of shards.

With V3, the app does not need to specify any routing info. It just sends the
query to VTGate as if it were a single database. Apart from simplifying the
API, there are some additional benefits:

* Database-compliant drivers can be built for each client language. This will
  also allow integration with third-party tools that work with such drivers.
* V3 can aspire to satisfy the full SQL syntax. This means that it will be
  able to perform cross-shard joins, aggregations and sorting.
* Easier migration: an application that was written to use a single database
  can be trivially changed to use Vitess, and then the database can be scaled
  from underneath without changing much of the app.

The
[V3 design](https://github.com/vitessio/vitess/blob/main/doc/V3VindexDesign.md)
is quite elaborate. If necessary, it will allow you to plug in custom indexes
and sharding schemes. However, it comes equipped with some pre-cooked recipes
that satisfy the immediate needs of the real world:

## Sharding schemes

At its core, Vitess uses range-based sharding, where the sharding column is
typically a number or a varbinary. However, allowing data to be accessed only
by the sharding key limits the flexibility of an application. V3 comes with a
set of new indexing schemes that are built on top of range-based sharding.

### Basic sharding key

If the application already has a well-distributed sharding key, you just have
to tell VTGate what those keys are for each table. VTGate will correctly route
your queries based on input values or the WHERE clause.

### Hashed sharding key

If the application's sharding key is a monotonically increasing number, you
may not get well-balanced shards. In such cases, you can ask V3 to route
queries based on the hash of the main sharding key.

With V3, it is no longer necessary to store the hashed value as an extra
column in the table. Even during resharding, filtered replication can now
compute the hashed values.

### Auto-increment columns

When a table gets sharded, you are no longer able to use MySQL's
auto-increment functionality. V3 allows you to designate a table in an
unsharded database as the source of auto-increment ids. Once you've specified
this, V3 will transparently use generated values from this table to keep the
auto-increment going. The auto-increment column can in turn be a basic or
hashed sharding key. If it's a hashed sharding key, the newly generated value
is hashed before the query is routed.
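To make this concrete, here is a minimal sketch of a VSchema entry that
combines a hashed sharding key with an auto-increment column. It uses the
current VSchema JSON syntax; the table and column names (`user`, `user_id`,
`user_seq`) are illustrative assumptions, not part of this design:

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "user": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash" }
      ],
      "auto_increment": {
        "column": "user_id",
        "sequence": "user_seq"
      }
    }
  }
}
```

The `hash` vindex computes the keyspace id from `user_id`, so newly generated
sequence values are hashed before routing, exactly as described above.
`user_seq` is a sequence table that lives in an unsharded keyspace.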
### Cross-shard indexes

As your application evolves, you'll invariably find yourself wanting to fetch
rows based on columns other than the main sharding key. For example, if you've
sharded your database by user id, you may still want to be able to find users
by their username. If you only had the sharding key, such queries could only
be answered by sending them to all shards. This becomes very expensive as the
number of shards grows.

The typical strategy to address this problem is to build a separate lookup
table and keep it up-to-date. In the above case, you may build a separate
username->user_id relationship table. Once you've informed V3 of this table,
it will know what to do with a query like 'select * from user where
username=:value'. You can also configure V3 to keep this table up-to-date as
you insert or delete data. In other words, the application can be completely
agnostic of this table's existence.

#### Non-unique indexes

Cross-shard indexes don't have to be unique. It is possible that rows exist in
multiple shards for a given where clause. V3 allows you to specify indexes as
unique or non-unique, and it enforces such constraints accordingly during
changes. This flexibility is currently available for lookup indexes. For
example, you may want to index users by their last name. If so, there would be
a last_name->user_id index, which could result in multiple user ids being
returned for a given last_name.

#### Shared indexes

There are situations where multiple tables share the same foreign key. A
typical use case involves a customer table, an order table and an order_detail
table. The order table would have a customer_id column. In order to
efficiently access all orders of a customer, it would be beneficial to shard
this table by customer_id. This co-locates order rows with their corresponding
customer row.

The order table would also need an order_id column. As mentioned above, you
can create an order_id->customer_id cross-shard index for this table. This
will allow you to efficiently access orders by their order_id.

The order_detail table may only need an order_id foreign key. Since this
foreign key means the same thing as the order_id in order, creating a separate
cross-shard index for it would duplicate the order_id->customer_id index. In
such situations, V3 allows you to reuse the existing index for the
order_detail table as well. This saves disk space and also reduces the overall
write load.
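In VSchema terms, these cross-shard indexes are lookup vindexes. The sketch
below, again a hedged example in current VSchema syntax with invented table,
column, and vindex names, declares a unique username index owned by `user`,
and a single `order_idx` that is owned by `order` but shared, unowned, by
`order_detail`. A non-unique index like the last_name example would use a
non-unique lookup type (e.g. `lookup_hash` instead of `lookup_hash_unique`):

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" },
    "username_idx": {
      "type": "lookup_hash_unique",
      "params": {
        "table": "username_user_map",
        "from": "username",
        "to": "user_id"
      },
      "owner": "user"
    },
    "order_idx": {
      "type": "lookup_hash_unique",
      "params": {
        "table": "order_customer_map",
        "from": "order_id",
        "to": "customer_id"
      },
      "owner": "order"
    }
  },
  "tables": {
    "user": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash" },
        { "column": "username", "name": "username_idx" }
      ]
    },
    "order": {
      "column_vindexes": [
        { "column": "customer_id", "name": "hash" },
        { "column": "order_id", "name": "order_idx" }
      ]
    },
    "order_detail": {
      "column_vindexes": [
        { "column": "order_id", "name": "order_idx" }
      ]
    }
  }
}
```

Because `order` owns `order_idx`, inserts and deletes on `order` maintain the
order_customer_map lookup table; `order_detail` only reads it for routing.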
## Knowing where tables are

As your database grows, you will not only be sharding it, you will also be
splitting it vertically by migrating tables from one database to another. V3
keeps track of this. The app only has to refer to a table by name, and VTGate
will figure out how to route the query to the correct database.

The Vitess workflow also ensures that such migrations are done transparently
with virtually no downtime.
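Routing by table name works because each keyspace's VSchema declares the
tables it contains. As a hedged sketch, the VSchema for a hypothetical
unsharded keyspace needs no vindexes, just the table list; it is also the
natural home for the `user_seq` sequence table referenced earlier (the
`audit_log` table here is an invented example):

```json
{
  "tables": {
    "user_seq": { "type": "sequence" },
    "audit_log": {}
  }
}
```

A query such as 'select * from audit_log' needs no keyspace qualifier: VTGate
routes it to whichever keyspace declares the table, so the table can later be
migrated without touching the application.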
## Consistency

Once you add multiple indexes to tables, it's possible that the application
could make inconsistent requests. V3 makes sure that none of the specified
constraints are broken. For example, if a table had both a basic sharding key
and a hashed sharding key, V3 will enforce the rule that the hash of the basic
sharding key matches the value of the hashed sharding key.

Some changes require updates to be performed across multiple databases. For
example, inserting a row into a table that has a cross-shard index requires an
additional row to be inserted into the lookup table. This results in
distributed transactions. Currently, this is a best-effort update: it is
possible that partial commits happen if databases fail in the middle of a
distributed commit.

*The 2PC transactions feature is coming very soon to overcome this limitation.*

## Query Diversity

V3 does not support the full SQL feature set. The current implementation
supports the following queries:

* Single table DML statements: This is a vitess-wide restriction where you can
  affect only one table and one sharding key per statement. *This restriction
  may be removed in the future.*
* Single table SELECT statements:
  * All constructs are allowed if the statement targets only a single sharding
    key.
  * Aggregation and sorting are not allowed if the statement targets more than
    one sharding key. Selects are allowed to target multiple sharding keys as
    long as the results from the individual shards can simply be combined
    together to form the final result.
* Joins that can be served by sending the query to a single shard.
* Joins that can be served by sending the query to multiple shards, with the
  results trivially combined to form the final result.

Work is underway to support the following additional constructs:

* Cross-shard joins that can be served through a simple nested lookup.
* Sorting that can be trivially merged from the results of multiple shards
  (merge-sort).
* Aggregations (and grouping) that can be trivially combined from multiple
  shards.
* A combination of the above constructs, as long as the results remain
  trivially combinable.

SQL is a very powerful language. You can build queries that result in large
amounts of work and memory consumption, involving big intermediate results.
Such constructs, where the scope of work is open-ended, will not be
immediately supported. In such cases, it's recommended that you use map-reduce
techniques, for which there is a separate API.

*On-the-fly map-reducers can be built to address more complex needs in the
future.*

## Special Queries

MySQL supports a number of tables that do not store data, but rather expose
metadata about the database: for instance, the information_schema database.

The V3 API is meant to resemble a single database instance (using the Execute
API calls). But it can be backed by multiple keyspaces, and each can have
multiple shards. In the long term, we want VTGate to behave as if all
keyspaces and tables were in one single instance: each keyspace would be shown
as a database, and each table in a sharded keyspace would be shown only once
(not once per shard).

*We do not support these queries just yet.*

For administrators who want to target individual shards, it is possible to use
the V1 API (with the ExecuteShards API call). A deeper knowledge of the
existing shards is then required, but administrators have that knowledge.

## VSchema

The features above require metadata, such as the configuration of sharding
keys and cross-shard indexes, to be stored somewhere. This metadata is known
as the vschema.

*A way to edit the VSchema using SQL-like commands will be built to integrate
VSchema changes with regular SQL schema changes. For now, the VSchema needs to
be specified as a JSON file.*

Under the covers, the vschema is a JSON file. There are also low-level vtctl
commands to upload it. This allows you to build workflows for tracking and
managing changes.
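For instance, assuming the sharded keyspace sketched earlier is named `user`
and its VSchema is saved as `user_vschema.json`, an upload with the
ApplyVSchema vtctl command might look like the following (a sketch only: the
vtctld address is hypothetical, and the exact flag syntax varies across Vitess
versions):

```sh
# Upload the VSchema for the "user" keyspace; host, port and file name are
# illustrative.
vtctlclient --server vtctld:15999 ApplyVSchema -- --vschema_file=user_vschema.json user
```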