github.com/cockroachdb/cockroach@v20.2.0-alpha.1+incompatible/docs/RFCS/20170318_init_command.md

     1  - Feature Name: init command
     2  - Status: completed
     3  - Start Date: 2017-03-13
     4  - Authors: @bdarnell
     5  - RFC PR: [#14251](https://github.com/cockroachdb/cockroach/pull/14251)
     6  - Cockroach Issue: [#5974](https://github.com/cockroachdb/cockroach/issues/5974)
     7  
     8  # Summary
     9  
    10  This RFC proposes a change to the cluster initialization workflow,
    11  introducing a `cockroach init` command which can take the place of the
    12  current logic involving the absence of a `--join` flag. This is
    13  intended to be more compatible with various deployment tools by making
    14  the node configuration more homogeneous.
    15  
    16  The new procedure will be:
    17  
    18  1. Start all nodes with the same `--join` flag.
    19  2. Run `cockroach init --host=...`, where the `host` parameter is the
    20     address of one of the nodes in the cluster.
    21  
    22  The old procedure of omitting the `--join` flag on one node will still
    23  be permitted, but discouraged for production use.
    24  
    25  # Motivation
    26  
    27  All CockroachDB clusters require a one-time-only init/bootstrap step.
    28  This is currently performed when a node is started without a `--join`
    29  flag, relying on the admin to start exactly one node in this way. This
    30  is fine for manual test clusters, but it is awkward to automate. One
    31  node must be treated as "special" on its first startup, but it must
    32  revert to normal mode (with a `--join` flag) for later restarts (or
    33  else it could re-initialize a new cluster if it is ever restarted
    34  without its data directory). We have solved
    35  this
    36  [for Kubernetes](https://github.com/cockroachdb/cockroach/blob/43f24c9042657448a0ad635b95099b75e478de41/cloud/kubernetes/cockroachdb-statefulset.yaml#L97) with
    37  a special "init container", but this is relatively subtle logic that
    38  must be redone for each new deployment platform.
    39  
    40  Instead, this RFC proposes that the deployment be simplified by using
    41  the "real" `--join` flags everywhere from the beginning, and using an
    42  explicit action by the administrator (or another script) to bootstrap
    43  the cluster.
    44  
    45  # Detailed design
    46  
    47  We introduce a new command `cockroach init` and a new RPC
    48  `InitCluster`.
    49  
    50  ## `InitCluster` RPC
    51  
    52  The `InitCluster` RPC is a node-level RPC that calls
    53  `server.bootstrapCluster` (unless the cluster is already
    54  bootstrapped). It requires `root` permissions.
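
The guard described above can be sketched as a toy model. This is not the actual server code: the `Node` class and its method names are hypothetical, and returning `False` on an already-bootstrapped cluster is an assumption, since the RFC does not say whether the real RPC would return an error instead.

```python
class Node:
    """Toy stand-in for a CockroachDB node; not the real server API."""

    def __init__(self):
        self.bootstrapped = False

    def _bootstrap_cluster(self):
        # Stands in for server.bootstrapCluster, the one-time setup step.
        self.bootstrapped = True

    def init_cluster(self, user):
        """Model of the InitCluster RPC: root-only, bootstraps at most once."""
        if user != "root":
            raise PermissionError("InitCluster requires root")
        if self.bootstrapped:
            # Assumption: treated as a no-op here; the real RPC might
            # report an error for an already-bootstrapped cluster.
            return False
        self._bootstrap_cluster()
        return True
```

In this model a second call by `root` leaves the cluster untouched, which is the property that makes a repeated `init` harmless once bootstrapping has happened.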
    55  
    56  ## `cockroach init`
    57  
    58  The `cockroach init` command is responsible for calling `InitCluster`.
    59  It makes a single attempt and does not retry unless it can be certain
    60  that the previous attempt did not succeed (for example, it could retry
    61  on "connection refused" errors, but not on timeouts). In the event of
    62  an ambiguous error, the admin should examine the cluster to determine
    63  whether the `init` command needs to be retried.
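
A minimal sketch of this retry policy, under stated assumptions: `call_init_cluster` is a hypothetical callable that issues the RPC, and Python's built-in `ConnectionRefusedError` and `TimeoutError` stand in for the "certainly failed" and "ambiguous" cases. The attempt limit and delay are illustrative, not part of the proposal.

```python
import time


class AmbiguousInitError(Exception):
    """The attempt may or may not have bootstrapped the cluster."""


def run_init(call_init_cluster, max_attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call InitCluster, retrying only when the attempt certainly failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call_init_cluster()
        except ConnectionRefusedError:
            # The request never reached a server, so it cannot have succeeded.
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)
        except TimeoutError as err:
            # The request may have been processed; do not retry automatically.
            raise AmbiguousInitError("inspect the cluster before retrying") from err
```

The timeout branch deliberately gives up after one attempt, leaving the decision to retry with the admin, as the text above prescribes.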
    64  
    65  ## Complete example
    66  
    67  The recommended process for starting a three-node cluster will look
    68  like this (although it would normally be wrapped up in some sort of
    69  orchestration tooling):
    70  
    71  ```shell
    72  user@node1$ cockroach start --join=node1:26257,node2:26257,node3:26257 --store=/mnt/data
    73  
    74  user@node2$ cockroach start --join=node1:26257,node2:26257,node3:26257 --store=/mnt/data
    75  
    76  user@node3$ cockroach start --join=node1:26257,node2:26257,node3:26257 --store=/mnt/data
    77  
    78  user@anywhere$ cockroach init --host=node1:26257
    79  ```
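
As a rough illustration of how orchestration tooling might generate these commands, the sketch below builds one `start` command shared by every node plus a single `init` command. The helper names (`start_command`, `init_command`, `plan`) are hypothetical; only the `--join`, `--store`, and `--host` flags and port 26257 come from the example above.

```python
def start_command(nodes, store, port=26257):
    """Build the start command; it is identical for every node."""
    join = ",".join(f"{node}:{port}" for node in nodes)
    return ["cockroach", "start", f"--join={join}", f"--store={store}"]


def init_command(node, port=26257):
    """Build the one-time init command, aimed at any single node."""
    return ["cockroach", "init", f"--host={node}:{port}"]


def plan(nodes, store):
    """Return (where, command) pairs: start everywhere, then init once."""
    start = start_command(nodes, store)
    steps = [(node, start) for node in nodes]
    steps.append(("anywhere", init_command(nodes[0])))
    return steps
```

Calling `plan(["node1", "node2", "node3"], "/mnt/data")` yields three identical `start` commands followed by one `init`, which is the homogeneity this RFC aims for: no node needs a special first-boot configuration.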
    80  
    81  # Drawbacks
    82  
    83  ## Extra step
    84  
    85  This proposal adds an extra step to cluster initialization. However,
    86  this step could be performed at the same time as other common
    87  post-deployment actions (such as creating databases, granting
    88  permissions, etc.), which should minimize the overall impact on
    89  operational complexity.
    90  
    91  ## Node ID divergence
    92  
    93  With this proposal, the assignment of node IDs and store IDs becomes
    94  less predictable, so node IDs will be less likely to correspond to
    95  externally-assigned host names, task IDs, etc.
    96  
    97  # Alternatives
    98  
    99  ## Init before start
   100  
   101  Originally, CockroachDB required an explicit bootstrapping step using
   102  a `cockroach init` command to be run *before* starting any nodes
   103  (this mirrors PostgreSQL's `initdb` command or MySQL's
   104  `mysql_install_db`). This was removed because it required that the
   105  same directory that `cockroach init` wrote to was used when starting
   106  the real server, which is difficult to guarantee with many deployment
   107  platforms.
   108  
   109  ## Wait for connected nodes
   110  
   111  An earlier draft of this RFC proposed that the `cockroach init`
   112  command take the number of nodes expected in the cluster and not
   113  attempt to bootstrap the cluster until that many nodes are
   114  present. This information would be used to make the retry logic
   115  slightly more robust, as well as giving an opportunity to present
   116  diagnostic information to the admin when the cluster is not connecting
   117  via gossip. This was considered too much complexity for little
   118  benefit.
   119  
   120  ## Remove old behavior
   121  
   122  The existing logic of automatic bootstrapping when no `--join` flag is
   123  present could be removed, forcing all clusters to use the explicit
   124  `init` command. This would be a conceptual simplification by removing
   125  a redundant (and discouraged) option, but would add friction to
   126  simple single-node cases.