github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20160721_information_schema.md (about) 1 - Feature Name: Information Schema 2 - Status: completed 3 - Start Date: 2016-07-19 4 - Authors: Nathan VanBenschoten 5 - RFC PR: [#7965](https://github.com/cockroachdb/cockroach/pull/7965) 6 - Cockroach Issue: [#5194](https://github.com/cockroachdb/cockroach/issues/5194), 7 [#5583](https://github.com/cockroachdb/cockroach/issues/5583) 8 9 10 # Summary 11 12 `information_schema` allows users to perform database introspection, providing 13 information about all tables, views, columns, procedures, and access privileges 14 in a database. This proposal details initial plans to support the `information_schema` 15 specification in CockroachDB. 16 17 # Motivation 18 19 Up until this point, CockroachDB has only supported limited schema reflection through 20 `SHOW` statements. This is restricting in that it is inconsistent with other SQL 21 statements, it forces users to learn new syntax, and it is inextensible. Unlike `SHOW` 22 statements, `information_schema` exposes a table-like interface which feels natural in 23 relational database and addresses all of these concerns. For other advantages of 24 querying `information_schema` over using `SHOW` statements, see MySQL's 25 [documentation on the topic.](http://dev.mysql.com/doc/refman/5.7/en/information-schema.html#idm140648692050944) 26 27 More importantly, in order to perform schema reflection, a large majority of ORMs 28 and other database tools query metadata tables within the `information_schema`. Without 29 proper support for these meta table, many tools simply cannot be used with CockroachDB. 30 This is an issue, as support for these tools is critical for user accessibility 31 of CockroachDB. Because of this widespread usage of `information_schema`, adding support 32 for the standard will be a major step towards supporting a group of ORMs and other tools, 33 and by extension, expanding the ecosystem around CockroachDB. 34 35 # Detailed design 36 37 ### Relation to VIEWs 38 39 `information_schema` tables are commonly referred to as VIEWs. This is because they are 40 not base tables, so there are no files associated with them. On top of this, they are read-only 41 tables, so one can only read their contents, and cannot perform `INSERT`, `UPDATE`, or `DELETE` 42 operations on them. 43 44 However, while `information_schema` tables are conceptually similar to read-only views, they 45 are handled differently than standard user-level VIEWs in most databases. For instance, in MySQL 46 `information_schema` tables are **temporary tables that are created and filled on demand**. 47 The reason for this is that a `VIEW` is a mapping from an ordinary base table to a virtual table. 48 However, the data stored in the `information_schema` is not stored in any underlying table, 49 so the "tables" need to supply data themselves. All this goes to say that even though future 50 implementation efforts towards adding support for VIEWs to CockroachDB may touch 51 `information_schema` code, the features can and should be developed independently from each other. 52 53 ### General Implementation Concerns 54 55 There are two general implementation concerns that need to be considered for the 56 introduction of `information_schema` into CockroachDB. The first concern is that the 57 `information_schema` database and the tables it contains should **behave** exactly 58 like all other databases and tables, respectively. Regardless of how they are implemented, 59 they should be indistinguishable from persisted relations. The second concern is that the 60 tables need to **produce** the correct information when queried by performing a form 61 of database introspection. 62 63 ### Virtual Database and Table Descriptors 64 65 To address the first implementation concern, this RFC proposes that we hardcode a set of 66 static database and table descriptors for the new database and all new tables needed 67 for the `information_schema`. This will be similar to the approach taken for `SystemDatabase`, 68 except there is no need to actually store the descriptors in an underlying Cockroach KV store 69 during cluster bootstrap, because they will not need to be modified. These descriptors 70 will all be given **read-only privileges**, just like the majority of system tables. 71 72 Interactions with these "virtual" database and table descriptors will be handled by 73 `sql.planner`'s implementation of 74 [`DatabaseAccessor`](https://github.com/cockroachdb/cockroach/blob/4d45696ea776a5b912262e5eef9889eacb6abe41/sql/database.go#L90) 75 and 76 [`SchemaAccessor`](https://github.com/cockroachdb/cockroach/blob/4d45696ea776a5b912262e5eef9889eacb6abe41/sql/table.go#L89), 77 respectively. By handling these descriptors at this level, we can avoid having to add 78 special-cases for `information_schema` constructs at abstraction levels above this. 79 Desired behavior which this would naturally provide is: 80 81 - Preventing the creation of databases called "information_schema" 82 - Preventing the creation of tables in the "information_schema" (read-only database privileges) 83 - Preventing the mutation of tables in the "information_schema" (read-only table privileges) 84 85 ### Virtual Data Sources 86 87 To address the second implementation concern, the RFC proposes that we catch queries to 88 these virtual descriptors in 89 [`planner.getDataSource`](https://github.com/cockroachdb/cockroach/blob/4d45696ea776a5b912262e5eef9889eacb6abe41/sql/data_source.go#L191), 90 and return a `valuesNode` instead of a `scanNode` with desired information populated. Using 91 a `valueNode` in this way draws direct parallels to our current implementation of `SHOW` 92 statements. It allows us to mock out the table scan, and populate the provided values using 93 arbitrary code. 94 95 ### pg_catalog 96 97 `pg_catalog` is a PostgreSQL extension database that predates the introduction of 98 `information_schema` to the SQL standard. PostgreSQL maps all `information_schema` "tables" to 99 `pg_catalog` through the use of VIEWS. While it would be ideal to only support 100 `information_schema`, as it is part of the SQL standard and present in most databases (as opposed 101 to just PostgreSQL), there does seem to be a demand for an implementation of `pg_catalog` as well. 102 This demand comes from: 103 104 - PostgreSQL-specific tools and ORMs, which do not make an effort to be cross platform compatible 105 - PostgreSQL-specific drivers, which do not make an effort to be cross platform compatible 106 - Legacy cross-platform tools and ORMs, which special-case the use of `pg_catalog` for 107 any database exposing the PostgreSQL wire protocol 108 109 The RFC proposes to map all `pg_catalog` tables to queries on the `information_schema`, in 110 a similar way that PostgreSQL does (but in the opposite direction). Ideally this would be 111 completed with VIEWs, but there are a few downsides to this. These include that the eventual 112 implementation of VIEWs will require `ViewDescriptors` to be distributed, which is unnecessary 113 for a static VIEW. Instead, it would be simpler to map `pg_catalog` queries directly to 114 `information_schema` queries in code. However, this RFC also proposes that we push off a more 115 detailed discussion on `pg_catalog` until the implementation of `information_schema` is complete. 116 117 ### Standards Considerations 118 119 It is advised that we take a similar approach to MySQL in its representation of 120 `information_schema`. The `information_schema` table structure will follow the ANSI/ISO 121 SQL:2003 standard Part 11 Schemata. The intent is approximate compliance with SQL:2003 core 122 feature F021 Basic information schema. 123 124 Additionally, MySQL provides a number of extensions to their implementation of the 125 `information_schema`. An example of this is the inclusion of an `ENGINE` column in their 126 `INFORMATION_SCHEMA.TABLES` table. The database also excludes columns that do not 127 make sense to include given their SQL implementation. Similarly, we will omit columns that 128 are not relevant for our implementation, and add CockroachDB-specific extension columns 129 if any specific need for them arises. 130 131 CockroachDB-specific information which makes sense to add to our `information_schema` 132 implementation includes: 133 - column families 134 - interleaved tables 135 - index stored columns 136 137 ### Privileges 138 139 All users will have access to the `information_schema` tables, but they will only be able 140 to see rows which they have access to. The same privileges will apply to selecting information 141 from `information_schema` and viewing the same information through `SHOW` statements. This will 142 be a change from our current implementation of `SHOW` statements, where we generally return all 143 instances of a given object, even those that a user does not have privileges for. 144 145 This approach to access control for the `information_schema` is identical to MySQL's. 146 147 ### Other Concerns 148 149 #### SHOW Statements 150 151 Our inclusion of `SHOW` statements was inspired by MySQL. In MySQL, `SHOW` statements 152 were added as the primary means of schema reflection before `information_schema` was 153 introduced to the SQL standard. Shortly after it was introduced, MySQL added support for the 154 meta tables, and began mapping `SHOW` statements directly onto `SELECT` statements from 155 these virtual tables. 156 157 This RFC proposes that sometime after support for `information_schema` is added, we 158 follow MySQL's lead and turn `SHOW` statements into a wrapper around `information_schema` 159 queries. This will help eliminate duplicate implementations and simplify SHOW statement 160 handling significantly. 161 162 #### Schema Search Path Extension (only needed for `pg_catalog` support) 163 164 PostgreSQL has a notion of a "Schema Search Path". In much the same way that we support the 165 `SET database = xyz`, which will search for unqualified table names in the `xyz` database, the 166 schema search path allows unqualified table names to be associated with a specific 167 database/table pair. However, unlike our session database setting, PostgreSQL allows multiple 168 databases to be in the search path. This is important because a large percent of the the time 169 that `pg_catalog` tables are referenced in ORMs and other external tools, they are 170 referenced using unqualified table names. For instance, a number of the ORMs use 171 `pg_catalog` like: 172 173 ```sql 174 SELECT * from pg_class 175 ``` 176 177 instead of 178 179 ```sql 180 SELECT * from pg_catalog.pg_class 181 ``` 182 183 The reason this is allowed is because PostgreSQL includes `pg_catalog` in its 184 database search path by default. In order to support this use case, we should extend our 185 notion of a single database search path to a set of search paths. Like support for 186 `pg_catalog`, this does not need to be completed during the initial implementation of 187 this RFC. 188 189 # Drawbacks 190 191 - Complicates some SQL descriptor logic 192 - Complicates database introspection on itself (ie. `SHOW TABLES FROM information_schema`) 193 - Adds a few special cases to the query engine, but this is much less intrusive than 194 if we tried to mock out the database and table existence at a higher level than 195 descriptors 196 197 # Alternatives 198 199 - Create and maintain real tables just like we do with SystemDB tables 200 - Mock out database and table existence at a higher level than [virtual 201 descriptors](#virtual-database-and-table-descriptors) 202 - Begin building out VIEW infrastructure, and treat `information_schema` like 203 a read-only view. As noted in [Relation to VIEWs](#relation-to-views), this 204 wouldn't actually change very much, and would still require a lot of 205 this work. For instance, we would still need to somehow mock out the existence 206 of the `information_schema` database itself, and would still need to have the 207 VIEWs source their own introspective data. 208 209 # Unresolved questions 210 211 ### Information_Schema Across Different Versions of CockroachDB 212 213 Because this proposal suggests the use of static virtual descriptors instead of real 214 descriptors persisted to an underlying CockroachDB store, the descriptors used for a 215 given `information_schema` query will depend on the version of the CockroachDB instance 216 fielding a SQL request. In a mixed-version cluster, this could result in two nodes returning 217 `information_schema` tables with different schemas if the static schema was altered between 218 the nodes' versions. Still, both nodes will issue replicated transactions internally to 219 populate these tables, so both will return "correct" information. 220 221 Inconsistencies where different nodes return different results for a given query can occur 222 today if two nodes are running different versions of our SQL query engine. The only difference 223 is that this change will introduce the possibility of schema inconsistencies because 224 `information_schema` descriptors may not be consistent between nodes. This shouldn't be a 225 serious issue, but it is something to note.