- Feature Name: Data/network freeze
- Status: completed
- Start Date: 2016-02-18
- Authors: Ben Darnell
- RFC PR: [#4499](https://github.com/cockroachdb/cockroach/pull/4499)
- Cockroach Issue:

# Summary

This RFC outlines the plan for freezing our data formats and network
protocols.

# Motivation

We currently make backwards-incompatible changes to data formats
without providing any means to upgrade without data loss. This will
need to stop before beta for obvious reasons.

# Detailed design

## Freeze plan

The freeze will proceed in several steps.

### Stage 0 (pre-beta)

Anything goes; changes to on-disk formats do not require any kind of
migration path.

### Stage 1: Guaranteed upgrade path (Mar 30, 2016)

In stage 1, we require backwards-compatibility with data written by
any previous stage 1 build. It should always be possible to upgrade by
stopping all of the old nodes and then bringing up the new version.
It's OK at this stage if old and new versions cannot be run
concurrently, or if the migration process takes some time (e.g.
rewriting all data before the node can start up).

It is acceptable at this stage if the process is somewhat manual (e.g.
running some sort of yet-to-be-written backup/restore process and
stopping/restarting all nodes at once). However, it is preferable if
any migrations are done automatically when a node starts up with an
old data directory.

### Stage 2: Online upgrades (Date TBD)

Beginning in stage 2, we require that any upgrade be able to be
performed without taking the cluster offline: old and new nodes must
be able to coexist. The exact date for this stage is yet to be
determined, but will be during the beta period and before 1.0.

## Affected code

Any code could potentially be affected by the freeze, but areas that
will deserve special scrutiny include:

* All `.proto` definitions
* The packages `keys` and `util/encoding`
* All system tables (defined in `sql/system.go`)

## Migration strategies

It is difficult to come up with a universal migration strategy, since
different changes will require different approaches (for example,
`.proto` changes could perhaps be made by rewriting data on disk at
startup, while changes to key construction may require the change to
be coordinated in a distributed fashion). Therefore we leave the
specifics of a migration process until the need arises.

To facilitate future changes, we will introduce version numbers at
several levels. Initially the behavior around these version numbers
will be conservative and cross-version communication will be limited.
That makes these version numbers a blunt instrument to be reserved for
major changes.

* The on-disk format (via a file that lives outside RocksDB). Servers
  will refuse to load a database with a higher version number than
  they understand.
* The network protocol (via GRPC header). Servers and clients will
  treat a higher version number than they understand as an error.
* Gossip (perhaps via a new node attribute, or field in the
  `NodeDescriptor` proto). The rebalance/allocation system will not
  choose to place a replica on a node with a different version number.
* The SQL `TableDescriptor`

# Drawbacks

After the freeze, some changes will be much harder to make.

# Alternatives

None.

# Unresolved questions

* When exactly do we begin the stage 2 freeze?
* Is there anything else worth doing at this point to facilitate
  future migrations?
* What about downgrades? It's scary to upgrade when there is no going
  back.
  However, supporting downgrades adds even more complexity and I
  don't think it's worth making this commitment during beta (maybe for
  1.0, though).
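
The on-disk version check described under "Migration strategies" could look roughly like the following. This is a minimal sketch, not the actual implementation: the file name `COCKROACHDB_VERSION`, the constant `supportedVersion`, the helpers `readVersion` and `checkVersion`, and the choice to treat a missing file as version 0 are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// supportedVersion is the highest on-disk format version this binary
// understands. The value is hypothetical.
const supportedVersion = 1

// versionFilename is a hypothetical name for the version file that lives
// in the data directory, outside RocksDB.
const versionFilename = "COCKROACHDB_VERSION"

// checkVersion returns an error if the stored version is higher than the
// version this binary supports. Equal or lower versions are accepted.
func checkVersion(stored, supported int) error {
	if stored > supported {
		return fmt.Errorf(
			"data directory has format version %d, but this binary supports at most %d",
			stored, supported)
	}
	return nil
}

// readVersion reads the format version stored in dir. A missing version
// file is treated as version 0 (an assumption of this sketch).
func readVersion(dir string) (int, error) {
	b, err := os.ReadFile(filepath.Join(dir, versionFilename))
	if errors.Is(err, os.ErrNotExist) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

func main() {
	// Simulate a data directory written by a newer binary.
	dir, err := os.MkdirTemp("", "freeze-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, versionFilename)
	if err := os.WriteFile(path, []byte("2\n"), 0o644); err != nil {
		panic(err)
	}

	stored, err := readVersion(dir)
	if err != nil {
		panic(err)
	}
	if err := checkVersion(stored, supportedVersion); err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("version check passed")
}
```

The check is deliberately one-directional: a server accepts data at or below its own version and rejects anything newer, which matches the conservative behavior the RFC proposes. The same comparison could in principle be applied to a version carried in a network header, though the RFC leaves those details open.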