     1  - Feature Name: Cluster Upgrade Tool
     2  - Status: completed
     3  - Start Date: 2016-09-15
     4  - Authors: Daniel Harrison, Alex Robinson
     5  - RFC PR: [#9404](https://github.com/cockroachdb/cockroach/pull/9404)
     6  - Cockroach Issue: [#4267](https://github.com/cockroachdb/cockroach/issues/4267)
     7  
     8  # Summary
     9  
    10  A series of hooks to perform any necessary bookkeeping for a CockroachDB
    11  version upgrade.
    12  
    13  # Motivation
    14  
    15  Backward compatibility is crucial to a database, but can introduce considerable
    16  complexity to a codebase. All else being equal, it's better to keep backward
    17  compatibility logic siloed from the complexity that is already inherent in a
    18  system like CockroachDB.
    19  
    20  ## Examples of migrations
    21  
    22  * Add a `system.jobs` table (#7073)
    23  * Add root user and authentication to the `system.users` table (#9877)
    24  * Remove the reference to `experimental_unique_bytes` from the
    25    `system.eventlog` table (#5887)
    26  * (maybe) Migrate `TableDescriptor`s to new `FormatVersion`s and ensure
    27    compatibility (#7136)
    28  * (maybe) Switch from nanoseconds to microseconds for storing timestamps and
    29    intervals (#9758, #9759)
    30  * (maybe) Change the encoding of the `RaftAppliedIndexKey` and
    31    `LeaseAppliedIndexKey` keys (#9306) - this can be done on a store-by-store
    32    basis without requiring cluster-level coordination
    33  * Switch over to proposer-evaluated kv (#6290, #6166) - this is likely to be a
    34    special case, where we force a stop-the-world event to make the switch
    35    sometime before 1.0
    36  
    37  # Detailed design
    38  
    39  Some versions of CockroachDB will require that a "system migration hook" be run.
    40  This is expected to be infrequent; not every release will require migrations.
    41  
    42  ## Jargon
    43  
    44  A "system migration hook" is a self-contained function that can be run from one
    45  of the CockroachDB nodes in a cluster to modify the state of the cluster in
    46  some way.
    47  
    48  A simple migration can be discussed in terms of which versions of the Cockroach
    49  binary are compatible with it: "pre-migration" versions are incompatible with
    50  the migration, "pending-migration" versions are compatible with and without it,
    51  and "post-migration" versions require it. More complex migrations can be
    52  modeled by repeated simple migrations.
    53  
    54  Example: Adding a `system.jobs` table. No versions are pre-migration, because
    55  any unknown system tables are ignored. Versions that use the table if it is
    56  there but can function without it are pending-migration. The first commit that
    57  assumes that the table is present begins the post-migration range.
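
The three ranges can be sketched as a small classifier. This is only an illustration: versions are modeled as tuples such as `(1, 2)`, and the names `classify`, `first_compatible`, and `first_requiring`, as well as the version numbers, are hypothetical and do not come from the CockroachDB codebase.

```python
from enum import Enum


class Compat(Enum):
    PRE = "pre-migration"          # incompatible with the migrated state
    PENDING = "pending-migration"  # works with and without the migration
    POST = "post-migration"        # requires the migration to have run


def classify(version, first_compatible, first_requiring):
    """Classify a binary version relative to one simple migration.

    first_compatible: earliest version that tolerates the migrated state.
    first_requiring:  earliest version that assumes the migration has run.
    Versions are compared as tuples, e.g. (1, 2) < (1, 10).
    """
    if version < first_compatible:
        return Compat.PRE
    if version < first_requiring:
        return Compat.PENDING
    return Compat.POST


# system.jobs example: unknown system tables are ignored, so every version
# tolerates the table and the pre-migration range is empty.
# The version numbers below are made up for illustration.
JOBS_FIRST_COMPATIBLE = (0, 0)
JOBS_FIRST_REQUIRING = (1, 2)
```

With these made-up numbers, `classify((1, 0), JOBS_FIRST_COMPATIBLE, JOBS_FIRST_REQUIRING)` falls in the pending-migration range, and no version can fall in the pre-migration range.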
    58  
    59  For simplicity, we assume that at most two versions of CockroachDB are ever
    60  running in a cluster at once. This restriction could potentially be relaxed,
    61  but it's out of scope for the first version of this RFC.
    62  
    63  Some migrations most naturally match a model where the CockroachDB version with
    64  the migration hook is a post-migration version. One example is a hook to add a
    65  new system table and code that uses the system table. Significant complexity is
    66  avoided if it can be assumed that the table exists. These migrations should be
    67  run before any node starts using the post-migration version.
    68  
    69  Other migrations work better when the hook version is pending-migration. When
    70  changing the schema of an existing system table, it's easiest to include the
    71  code that handles both schemas in the same version as the hook that performs the
    72  actual migration. These migrations should be run after all nodes are rolled onto
    73  the hook version.
    74  
    75  Our most pressing initial motivations fall into the first model, so it will be
    76  the primary focus of this RFC.
    77  
    78  ## Short-term design
    79  
    80  Because handling migrations in the general case is a very broad problem
    81  encompassing many potential types of migrations, we choose to first tackle the
    82  simplest case, migrations that are backward-compatible, while leaving the door
    83  open to more involved schemes that can support backward-incompatible
    84  migrations.
    85  
    86  In the short term, migration hooks will consist of a name and a work function.
    87  In the steady state, there is an ordered list of migration names in the code,
    88  ordered by when they were added.
    89  
    90  When a node starts up, it checks that all known migrations have been run (the
    91  amount of work involved in this can be kept bounded using the `/SystemVersion`
    92  keys described below). If not, it runs them via a migration coordinator.
    93  
    94  The ordered list of migrations can be used to order migrations when a cluster
    95  needs to catch up with more than one of them.
    96  
    97  The migration coordinator starts by grabbing and maintaining a migration lease
    98  to ensure that there is only ever one running at a time. Other nodes that start
    99  up and require migrations while a different node is doing migrations will have
   100  to block until the lease is released (or expires in the case of node failure).
   101  Then, for each missing migration, the migration coordinator runs its work
   102  function and writes a record of the completion to the kv entry
   103  `/SystemVersion/<MigrationName>`.
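
A minimal sketch of the coordinator loop described above, under several assumptions: a plain dict stands in for the kv store, the lease key name, the lease duration, and the two work functions are hypothetical, and the lease check is not atomic (a real implementation would need a conditional write).

```python
# A plain dict stands in for the cluster's KV store (assumption for this sketch).
LEASE_KEY = "/MigrationLease"  # hypothetical key name


def create_jobs_table(kv):
    # Mirrors CREATE TABLE IF NOT EXISTS: a no-op when the table already exists.
    kv.setdefault("/Table/system.jobs", {})


def add_root_user(kv):
    kv.setdefault("/Table/system.users", {}).setdefault("root", {})


# Ordered by when each migration was added to the code.
MIGRATIONS = [
    ("create system.jobs", create_jobs_table),
    ("add root user", add_root_user),
]


def acquire_lease(kv, node_id, now, duration=30.0):
    """Take the migration lease unless another node holds an unexpired one."""
    lease = kv.get(LEASE_KEY)
    if lease is not None and lease["node"] != node_id and lease["expires"] > now:
        return False
    kv[LEASE_KEY] = {"node": node_id, "expires": now + duration}
    return True


def run_missing_migrations(kv, node_id, now):
    """Run every migration that has no completion record, in list order.

    Returns the names that were run, or None if another node holds the lease
    (the caller would block and retry).
    """
    if not acquire_lease(kv, node_id, now):
        return None
    ran = []
    try:
        for name, work in MIGRATIONS:
            record = "/SystemVersion/" + name
            if record in kv:
                continue  # already completed by an earlier coordinator
            work(kv)
            kv[record] = True  # completion record
            ran.append(name)
    finally:
        kv.pop(LEASE_KEY, None)  # release the lease
    return ran
```

The current time is passed in explicitly so the lease-expiry path is easy to exercise: a second node calling `run_missing_migrations` while an unexpired lease exists gets `None`, and the same call after the expiry time proceeds.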
   104  
   105  Each work function must be idempotent (in case a migration coordinator crashes)
   106  and compatible with previous versions of CockroachDB that could be running on
   107  other nodes in the cluster. The latter restriction will be relaxed in the
   108  [long-term design](#long-term-design).
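
To illustrate why idempotence matters, the sketch below simulates a coordinator that crashes after the work function runs but before the completion record is written. The `CoordinatorCrash` exception, the `run_once` helper, and the set standing in for `system.users` are all hypothetical.

```python
class CoordinatorCrash(Exception):
    """Simulates the coordinator dying partway through a migration."""


def add_root_user(users):
    # Idempotent: adding "root" to a set twice leaves exactly one entry.
    users.add("root")


def run_once(kv, users, name, work, crash_before_record=False):
    """Run one migration unless its completion record already exists.

    Returns True if the work function ran, False if it was skipped.
    """
    record = "/SystemVersion/" + name
    if record in kv:
        return False
    work(users)
    if crash_before_record:
        # The work is done but no record is written, so the next
        # coordinator will run the same work function again.
        raise CoordinatorCrash(name)
    kv[record] = True
    return True
```

After a simulated crash, the next call to `run_once` re-runs `add_root_user`; because the function is idempotent, the resulting state is the same as if it had run exactly once.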
   109  
   110  ### Examples
   111  
   112  Simple migrations, like `CREATE TABLE IF NOT EXISTS system.jobs (...)`, can be
   113  accomplished with one hook and immediately used. To add such a migration, you'd
   114  create a new migration called something like "CREATE system.jobs" with a
   115  function that creates the new system table and add it to the end of the list of
   116  migrations. It would then automatically be run whenever a version of CockroachDB
   117  that includes it joins an existing cluster for the first time.
   118  
   119  The example of adding the root user to `system.users` can also be accomplished
   120  with a single post-migration version hook, meaning that it could similarly just
   121  be added to the list of migration hooks and run on startup without concern for
   122  what CockroachDB versions are being used by other active nodes in the cluster.
   123  
   124  ## Long-term design
   125  
   126  While our most immediate needs don't actually require backward-incompatible
   127  changes, there are a number of examples of changes that we'd like to make that
   128  do. For such hooks that have pre-migration versions, we'll have to put in place
   129  additional infrastructure to ensure safe migrations.
   130  
   131  There are two primary approaches for this, which we don't actually have to
   132  choose between now, but should at least understand to ensure that we don't restrict
   133  our options too much with what we do in the shorter term.
   134  
   135  ### Option 1: Require operator intervention
   136  
   137  The first option to support non-backward-compatible migrations is to introduce
   138  a new CLI command `./cockroach cluster-upgrade` that gives the DB administrator
   139  control over when migrations happen. This command will support:
   140  
   141  * Listing all migrations that have been run on the cluster
   142  * Listing the available migrations, which migrations they depend on, and the
   143    CockroachDB versions at which it's safe to run each of them
   144  * Running a single migration specified by the admin
   145  * Running all available migrations whose migration dependencies are satisfied
   146    (note that this may be dangerous if the minimum version for a migration isn't
   147    satisfied by all nodes in the cluster; we may want to validate this ourselves
   148    or at least require a scary command line flag to avoid mistakes)
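
The "run all available migrations" mode, including the validation suggested in the note above, could be sketched as follows. The `Migration` descriptor, its field names, and the tuple-based versions are assumptions made for this example. A dependency counts as satisfied here only if it has already completed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Migration:
    name: str
    depends_on: Tuple[str, ...] = ()
    # None means the migration is backward-compatible and has no minimum.
    min_version: Optional[Tuple[int, ...]] = None


def runnable(migrations, completed, node_versions):
    """Return names of migrations that are safe to run now, in list order.

    completed:     set of migration names with a completion record.
    node_versions: versions of all nodes in the cluster, as tuples.
    """
    lowest = min(node_versions) if node_versions else None
    result = []
    for m in migrations:
        if m.name in completed:
            continue
        deps_ok = all(dep in completed for dep in m.depends_on)
        version_ok = m.min_version is None or (
            lowest is not None and lowest >= m.min_version
        )
        if deps_ok and version_ok:
            result.append(m.name)
    return result
```

Checking the lowest node version is the "validate this ourselves" variant; a tool that skipped the check would rely on the operator to confirm that every node meets the minimum.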
   149  
   150  The idea is that before upgrading to a new version of CockroachDB, the admin
   151  will look in the release notes for the version that they want to run and
   152  ensure that they've run all the required migrations. If they haven't, they'll
   153  need to do so before upgrading. If not all the upgrades required by the
   154  desired version are available at the cluster's current version, they may need
   155  to first upgrade to an intermediate version that supports the migrations so
   156  that they can run them.
   157  
   158  If a CockroachDB node starts up and the cluster that it joins has not run all
   159  the migrations required by the node's version, that node will exit with an
   160  appropriate error message.
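
A sketch of that startup check, assuming a dict stands in for the kv store and that each binary carries a list of the migration names it requires. The exception type and the message wording are hypothetical.

```python
class MissingMigrationsError(Exception):
    """Raised when the cluster has not run migrations this binary requires."""


def check_required_migrations(kv, required):
    """Refuse to start if any required migration lacks a completion record."""
    missing = [name for name in required if "/SystemVersion/" + name not in kv]
    if missing:
        raise MissingMigrationsError(
            "cluster has not run required migrations: " + ", ".join(missing)
        )
```

A node would call this before serving traffic and exit with the error message if the exception is raised.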
   161  
   162  This approach gives administrators total control of potentially destructive
   163  migrations at the cost of adding extra manual work. On the plus side, it
   164  enables rollbacks better than a design where non-backward-compatible changes
   165  are made automatically by the system when starting up nodes with the new
   166  version.
   167  
   168  The CLI tool will be the recommended way of upgrading CockroachDB versions for
   169  production scenarios. For ease of use in small clusters and local development,
   170  one-node clusters will be able to do migrations during their normal startup
   171  process if so desired. Also, migrations with no minimum version specified can
   172  still be run at startup (as in the short-term design) because the lack of a
   173  minimum version can be assumed to mean that they're backward-compatible.
   174  
   175  #### Overview of changes needed from short-term design
   176  
   177  This approach will require a few more details to be stored in the hard-coded
   178  migration descriptors than for the short-term solution. For each migration,
   179  we'll have to include a list of all migrations that it depends on and the
   180  minimum version at which it can safely be run. The data stored in the
   181  `/SystemVersion/<MigrationName>` keys will not have to change.
   182  
   183  This approach will also require command-line tooling to be built for the
   184  `./cockroach cluster-upgrade` command.
   185  
   186  #### Example
   187  
   188  Let's consider the example of switching the storage format of timestamps from
   189  nanoseconds to microseconds (#9759). We can't simply change the code and
   190  add an automatically run migration to the next release of CockroachDB because
   191  it's not safe to change the storage format of all the timestamps in a cluster
   192  while old nodes may not know how to read the new format.
   193  
   194  Instead, the migration might look something like this from the perspective of
   195  CockroachDB developers:
   196  
   197  1. Release new version of CockroachDB with:
   198    1. Code that can handle both encodings
   199    1. A non-required migration that changes the encodings of all timestamps
   200       in the cluster. This migration probably won't depend on any other
   201       migrations.
   202  1. Some number of releases later (could be the next version, or could be
   203     multiple versions down the line), remove the compatibility code and switch
   204     the migration to be required.
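
The first release in this sequence needs code that reads both encodings plus a migration that rewrites old values. The sketch below assumes each stored value carries a unit tag so the reader can tell the encodings apart; the RFC does not say how the two encodings would actually be distinguished, so the tag, the function names, and the dict standing in for stored data are all hypothetical.

```python
NANOS = "ns"   # old encoding (hypothetical tag)
MICROS = "us"  # new encoding (hypothetical tag)


def decode_to_micros(stored):
    """Read a stored timestamp in either encoding and return microseconds."""
    unit, value = stored
    if unit == NANOS:
        return value // 1000  # sub-microsecond precision is dropped
    if unit == MICROS:
        return value
    raise ValueError("unknown timestamp encoding: %r" % (unit,))


def migrate_timestamps(table):
    """Rewrite every stored timestamp to the microsecond encoding.

    Idempotent: values already in the new encoding are left unchanged.
    """
    for key, stored in table.items():
        table[key] = (MICROS, decode_to_micros(stored))
```

Because `decode_to_micros` accepts both encodings, nodes running this release keep working before, during, and after `migrate_timestamps`; only once the migration is required can the `NANOS` branch be removed.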
   205  
   206  From the perspective of a DB admin, it'd look like:
   207  
   208  1. Decide to upgrade CockroachDB cluster to new version
   209  1. Verify that all migrations for the current version have been completed by
   210     running `./cockroach cluster-upgrade` with the current binary
   211  1. Check release notes for the desired version to determine whether it's safe
   212     to upgrade directly from their current version
   213    1. If it is, just carry out the upgrade
   214    1. If it isn't, pick an intermediate version recommended by the docs and
   215       restart the process for it
   216  1. Once the upgrade has completed and rollbacks are deemed no longer necessary,
   217     perform the new version's migrations by running `./cockroach cluster-upgrade`
   218  
   219  ### Option 2: Do all migrations automatically
   220  
   221  The main alternative to manual intervention is to attempt to do the operator's
   222  work for them automatically. This would provide the best user experience for
   223  less sophisticated users who care more about their time and energy than about
   224  having total control.
   225  
   226  Unlike in the backward-compatible case, we can't necessarily run
   227  all known migrations when the first node at a new version is started up,
   228  because some migrations may modify the cluster state in ways that are
   229  incompatible with the other nodes in the cluster. We can never automatically run
   230  migrations unless we know that all the nodes in the cluster are at a recent
   231  enough version.
   232  
   233  We could ensure this in at least two different ways:
   234  
   235  1. Start tracking node version information for the entire cluster and add
   236     some code that understands our semantic versioning scheme sufficiently well
   237     to determine which node versions support which migrations.
   238  1. Start tracking the migrations each node knows about that have yet to be run.
   239     A migration could then be run once all the nodes know about it.
   240  
   241  That may be some work, but once it's in place, we can go back to doing
   242  migrations on start-up by adding in some logic that checks whether all the
   243  nodes in the cluster support the required migrations. If not, the new node can
   244  exit out with an error message. If they do, then the new node can kick off the
   245  migrations.
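
The second tracking scheme (each node advertises the migrations it knows about) reduces to a set intersection. This sketch assumes the per-node sets are already available to the new node; how they would be collected is not specified here, and the function name is hypothetical.

```python
def migrations_ready(known_by_node, completed):
    """Return migrations every node knows about that have not yet run.

    known_by_node: {node_id: set of migration names known to that node's binary}
    completed:     set of migration names that already have a completion record
    """
    if not known_by_node:
        return []
    common = set.intersection(*known_by_node.values())
    return sorted(common - completed)
```

A new node whose required migrations are not all in `completed` or in the returned list would exit with an error; otherwise it can kick off the ready migrations.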
   246  
   247  #### Overview of changes needed from short-term design
   248  
   249  This approach will require the same few additional details to be stored in the
   250  hard-coded migration descriptors as for option 1. For each migration,
   251  we'll have to include a list of all migrations that it depends on and the
   252  minimum version at which it can safely be run. The data stored in the
   253  `/SystemVersion/<MigrationName>` keys will not have to change.
   254  
   255  The main difference for this approach is that we'll have to start tracking
   256  the CockroachDB version (or alternatively the known migrations) for all nodes
   257  in the cluster. We could potentially use gossip for this, but we'd have to be
   258  sure that we know the state of all nodes, not just most of them, to be confident that
   259  it's safe to run a migration. It has also been suggested that we [use a form of
   260  version leases for this](https://github.com/cockroachdb/cockroach/issues/10212).
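
The requirement to know the state of all nodes, not just most of them, can be made concrete with a small check. The membership list, the map of reported versions, and the function name are assumptions for this sketch; it says nothing about how gossip or version leases would populate them.

```python
def safe_to_run(min_version, members, reported_versions):
    """Return True only if every cluster member is known to be new enough.

    members:           all node IDs that belong to the cluster
    reported_versions: {node_id: version tuple} for nodes we have heard from
    """
    for node in members:
        version = reported_versions.get(node)
        if version is None:
            return False  # unknown state: the node might be running an old binary
        if version < min_version:
            return False
    return True
```

A single node with no reported version blocks the migration, even if every other node is at or above the minimum.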
   261  
   262  #### Example
   263  
   264  Again, let's consider the example of switching the storage format of timestamps
   265  from nanoseconds to microseconds (#9759).
   266  
   267  The migration will look fairly similar from the perspective of CockroachDB
   268  developers:
   269  
   270  1. Release new version of CockroachDB with code that can handle both encodings.
   271  1. Include a non-required migration that changes the encodings of all timestamps
   272     in the cluster. This migration probably won't depend on any other migrations.
   273  1. Some number of releases later (could be the next version, or could be
   274     multiple versions down the line):
   275    1. Add an automatically run migration that switches the encodings
   276    1. Remove the compatibility code for handling the old encoding
   277  
   278  From the perspective of a DB admin, it'd look like:
   279  
   280  1. Decide to upgrade CockroachDB cluster to new version
   281  1. Check release notes for the desired version to see whether it's safe to
   282     upgrade to it from the current version
   283    1. If it is, just carry out the upgrade
   284    1. If it isn't, then repeat this process for an earlier version than the one
   285       initially chosen in the first step before continuing with the upgrade
   286  
   287  ## User experience
   288  
   289  When a migration is in progress, that fact should be exposed in the UI. It would
   290  be nice if we could also display the progress in the same place. Once we've
   291  added a `system.jobs` table (using a simple migration), it can be used for this.
   292  
   293  # Drawbacks
   294  
   295  The short-term design doesn't appear to have any obvious drawbacks, as it
   296  solves a couple of immediate needs and can be added without limiting future
   297  extensibility.
   298  
   299  The long-term design adds administrative complexity to the upgrade process, no
   300  matter which option we go with. It'll also add a little extra work to the
   301  release process for developers due to the need to include minimum versions for
   302  migrations -- sometimes the minimum version will be the version being released.
   303  
   304  # Alternatives
   305  
   306  Currently, various upgrades are performed on an "as needed" basis: see the
   307  `FormatVersion` example mentioned above. This has the advantage of supporting
   308  cluster upgrades with no operational overhead, but it introduces code
   309  complexity that will be hard to ever remove. Adding new system tables is
   310  particularly complex; see the
   311  [attempted `system.jobs` PR](https://github.com/cockroachdb/cockroach/pull/7073)
   312  for an example.
   313  
   314  We've also considered requiring a command-line tool to be run using the new
   315  binary on a currently running cluster (on an older version) before upgrading it.
   316  This has a similar admin experience to long-term design option 1, while adding
   317  the complexity of needing to deal with multiple versions of the binary at once.
   318  
   319  There is also always the option of requiring clusters to be brought down for
   320  migrations, but given the importance of uptime to most users, we consider that
   321  something to be avoided whenever reasonably possible. It will likely be needed
   322  to switch to proposer-evaluated kv (#6290, #6166) in the pre-1.0 time frame,
   323  but we hope that it won't be needed after that.