- Feature Name: Data/network freeze
- Status: completed
- Start Date: 2016-02-18
- Authors: Ben Darnell
- RFC PR: [#4499](https://github.com/cockroachdb/cockroach/pull/4499)
- Cockroach Issue:

# Summary

This RFC outlines the plan for freezing our data formats and network
protocols.

# Motivation

We currently make backwards-incompatible changes to data formats
without providing any way to upgrade that preserves existing data.
This must stop before beta, since beta users need to be able to keep
their data across upgrades.

# Detailed design

## Freeze plan

The freeze will proceed in several steps.

### Stage 0 (pre-beta)

Anything goes; changes to on-disk formats do not require any kind of
migration path.

### Stage 1: Guaranteed upgrade path (Mar 30, 2016)

In stage 1, we require backwards-compatibility with data written by
any previous stage 1 build. It should always be possible to upgrade by
stopping all of the old nodes and then bringing up the new version.
It's OK at this stage if old and new versions cannot be run
concurrently, or if the migration process takes some time (e.g.
rewriting all data before the node can start up).

It is acceptable at this stage if the process is somewhat manual (e.g.
running some sort of yet-to-be-written backup/restore process and
stopping/restarting all nodes at once). However, it is preferable if
any migrations are done automatically when a node starts up with an
old data directory.
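
As a rough illustration of the automatic path, the sketch below applies
pending migrations in order when a node starts with an old data
directory. It is only one possible shape, not a committed design: the
names `formatVersion`, `migration`, `migrateOnStartup`, and `errTooNew`
are hypothetical, and the assumption that formats are tagged with
consecutive integers is ours.

```go
package main

import (
	"errors"
	"fmt"
)

// formatVersion is a hypothetical integer tag for the on-disk format.
type formatVersion int

// migration upgrades a data directory by exactly one version.
type migration func(dataDir string) error

// errTooNew reports data written by a build newer than this one.
var errTooNew = errors.New("data directory written by a newer version")

// migrateOnStartup applies each pending migration in order and returns
// the version reached. steps[v] upgrades data from version v to v+1.
func migrateOnStartup(dataDir string, stored, current formatVersion,
	steps map[formatVersion]migration) (formatVersion, error) {
	if stored > current {
		return stored, errTooNew
	}
	for v := stored; v < current; v++ {
		step, ok := steps[v]
		if !ok {
			return v, fmt.Errorf("no migration from version %d to %d", v, v+1)
		}
		if err := step(dataDir); err != nil {
			return v, fmt.Errorf("migrating from version %d: %w", v, err)
		}
	}
	return current, nil
}

func main() {
	var applied []string
	steps := map[formatVersion]migration{
		1: func(string) error { applied = append(applied, "1->2"); return nil },
		2: func(string) error { applied = append(applied, "2->3"); return nil },
	}
	// The path is a placeholder; the example migrations never touch disk.
	v, err := migrateOnStartup("/tmp/example-data", 1, 3, steps)
	fmt.Println(v, err, applied)
}
```

Stopping at the first missing or failing step leaves the node at a
known version, so an operator can fall back to the manual process
described above.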

### Stage 2: Online upgrades (Date TBD)

Beginning in stage 2, we require that every upgrade can be performed
without taking the cluster offline: old and new nodes must be able to
coexist. The exact date for this stage is yet to be determined, but
will be during the beta period and before 1.0.

## Affected code

Any code could potentially be affected by the freeze, but areas that
will deserve special scrutiny include:

* All `.proto` definitions
* The packages `keys` and `util/encoding`
* All system tables (defined in `sql/system.go`)

## Migration strategies

It is difficult to come up with a universal migration strategy, since
different changes will require different approaches (for example,
`.proto` changes could perhaps be made by rewriting data on disk at
startup, while changes to key construction may require the change to
be coordinated in a distributed fashion). Therefore we leave the
specifics of a migration process until the need arises.

To facilitate future changes, we will introduce version numbers at
several levels. Initially the behavior around these version numbers
will be conservative and cross-version communication will be limited.
That makes these version numbers a blunt instrument to be reserved for
major changes.

* The on-disk format (via a file that lives outside RocksDB). Servers
  will refuse to load a database with a higher version number than
  they understand.
* The network protocol (via gRPC header). Servers and clients will
  treat a higher version number than they understand as an error.
* Gossip (perhaps via a new node attribute, or field in the
  `NodeDescriptor` proto). The rebalance/allocation system will not
  choose to place a replica on a node with a different version number.
* The SQL `TableDescriptor`
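
The conservative rules above could look roughly like the following
sketch. Everything named here is an assumption made for illustration:
the version file is taken to hold a single decimal integer, the `node`
struct stands in for whatever gossip would carry, and the helpers
`parseVersionFile`, `checkVersion`, and `replicaCandidates` are
hypothetical. "A different version number" is read as "different from
the node making the placement decision", which is one possible
interpretation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersionFile reads the version from the contents of a file kept
// outside the storage engine. The plain-integer format is an assumption.
func parseVersionFile(contents []byte) (int, error) {
	v, err := strconv.Atoi(strings.TrimSpace(string(contents)))
	if err != nil {
		return 0, fmt.Errorf("malformed version file: %w", err)
	}
	return v, nil
}

// checkVersion applies the rule shared by the on-disk format and the
// network protocol: a version higher than we understand is an error.
func checkVersion(kind string, seen, supported int) error {
	if seen > supported {
		return fmt.Errorf("%s version %d is newer than supported version %d",
			kind, seen, supported)
	}
	return nil
}

// node is a stand-in for the version information gossip might carry.
type node struct {
	ID      int
	Version int
}

// replicaCandidates keeps only nodes whose version matches the local one,
// so the allocator never places a replica across versions.
func replicaCandidates(localVersion int, nodes []node) []node {
	var out []node
	for _, n := range nodes {
		if n.Version == localVersion {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	const supported = 2
	stored, err := parseVersionFile([]byte("3\n"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(checkVersion("on-disk format", stored, supported))
	fmt.Println(checkVersion("network protocol", 1, supported))
	fmt.Println(replicaCandidates(supported,
		[]node{{ID: 1, Version: 2}, {ID: 2, Version: 1}}))
}
```

Note the asymmetry: the disk and network checks reject only higher
versions, while the placement filter requires an exact match.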

# Drawbacks

After the freeze, some changes will be much harder to make.

# Alternatives

None.

# Unresolved questions

* When exactly do we begin the stage 2 freeze?
* Is there anything else worth doing at this point to facilitate
  future migrations?
* What about downgrades? It's scary to upgrade when there is no going
  back. However, supporting downgrades adds even more complexity and I
  don't think it's worth making this commitment during beta (maybe for
  1.0, though).