- Feature Name: init command
- Status: completed
- Start Date: 2017-03-13
- Authors: @bdarnell
- RFC PR: [#14251](https://github.com/cockroachdb/cockroach/pull/14251)
- Cockroach Issue: [#5974](https://github.com/cockroachdb/cockroach/issues/5974)

# Summary

This RFC proposes a change to the cluster initialization workflow,
introducing a `cockroach init` command which can take the place of the
current logic involving the absence of a `--join` flag. This is
intended to be more compatible with various deployment tools by making
the node configuration more homogeneous.

The new procedure will be:

1. Start all nodes with the same `--join` flag.
2. Run `cockroach init --host=...`, where the `host` parameter is the
   address of one of the nodes in the cluster.

The old procedure of omitting the `--join` flag on one node will still
be permitted, but discouraged for production use.

# Motivation

All CockroachDB clusters require a one-time-only init/bootstrap step.
This is currently performed when a node is started without a `--join`
flag, relying on the admin to start exactly one node in this way. This
is fine for manual test clusters, but it is awkward to automate. One
node must be treated as "special" on its first startup, but it must
revert to normal mode (with a `--join` flag) for later restarts (or
else it could re-initialize a new cluster if it is ever restarted
without its data directory). We have solved this
[for Kubernetes](https://github.com/cockroachdb/cockroach/blob/43f24c9042657448a0ad635b95099b75e478de41/cloud/kubernetes/cockroachdb-statefulset.yaml#L97) with
a special "init container", but this is relatively subtle logic that
must be redone for each new deployment platform.

Instead, this RFC proposes that the deployment be simplified by using
the "real" `--join` flags everywhere from the beginning, and using an
explicit action by the administrator (or another script) to bootstrap
the cluster.

# Detailed design

We introduce a new command `cockroach init` and a new RPC
`InitCluster`.

## `InitCluster` RPC

The `InitCluster` RPC is a node-level RPC that calls
`server.bootstrapCluster` (unless the cluster is already
bootstrapped). It requires `root` permissions.

## `cockroach init`

The `cockroach init` command is responsible for calling `InitCluster`.
It makes a single attempt and does not retry unless it can be certain
that the previous attempt did not succeed (for example, it could retry
on "connection refused" errors, but not on timeouts). In the event of
an ambiguous error, the admin should examine the cluster to determine
whether the `init` command needs to be retried.

## Complete example

The recommended process for starting a three-node cluster will look
like this (although it would normally be wrapped up in some sort of
orchestration tooling):

```shell
user@node1$ cockroach start --join=node1:26257,node2:26257,node3:26257 --store=/mnt/data

user@node2$ cockroach start --join=node1:26257,node2:26257,node3:26257 --store=/mnt/data

user@node3$ cockroach start --join=node1:26257,node2:26257,node3:26257 --store=/mnt/data

user@anywhere$ cockroach init --host=node1:26257
```

# Drawbacks

## Extra step

This proposal adds an extra step to cluster initialization. However,
this step could be performed at the same time as other common
post-deployment actions (such as creating databases, granting
permissions, etc), which should minimize the overall impact on
operational complexity.
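
As a rough illustration of folding `init` into a post-deployment
script, the sketch below bootstraps the cluster once and then creates
a database and a user. The `provision_cluster` function, the
`COCKROACH` override variable, and the `app`/`app_user` names are
hypothetical and not part of this proposal; `--insecure` is used only
to keep the sketch short and assumes a test cluster. In keeping with
the single-attempt behavior of `cockroach init`, the script stops if
`init` fails rather than retrying.

```shell
# Hypothetical post-deployment helper: bootstrap the cluster, then run
# other one-time setup. COCKROACH may be overridden to point at a
# specific binary; host, database, and user names are placeholders.
COCKROACH="${COCKROACH:-cockroach}"

provision_cluster() {
  host="$1"

  # One-time bootstrap. Do not retry blindly: after an ambiguous error
  # the admin should examine the cluster before running init again.
  "$COCKROACH" init --insecure --host="$host" || return 1

  # Other common post-deployment actions, sent to the same node.
  "$COCKROACH" sql --insecure --host="$host" \
    -e "CREATE DATABASE IF NOT EXISTS app; CREATE USER IF NOT EXISTS app_user; GRANT ALL ON DATABASE app TO app_user;"
}
```

A deployment tool could call `provision_cluster node1:26257` once,
after every node has been started with its `--join` flag.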

## Node ID divergence

With this proposal, the assignment of node IDs and store IDs becomes
less predictable, so node IDs will be less likely to correspond to
externally-assigned host names, task IDs, etc.

# Alternatives

## Init before start

Originally, CockroachDB required an explicit bootstrapping step using
a `cockroach init` command to be run *before* starting any nodes
(this mirrors PostgreSQL's `initdb` command or MySQL's
`mysql_install_db`). This was removed because it required that the
same directory that `cockroach init` wrote to was used when starting
the real server, which is difficult to guarantee with many deployment
platforms.

## Wait for connected nodes

An earlier draft of this RFC proposed that the `cockroach init`
command take the number of nodes expected in the cluster and not
attempt to bootstrap the cluster until that number of nodes is
present. This information would be used to make the retry logic
slightly more robust, as well as giving an opportunity to present
diagnostic information to the admin when the cluster is not connecting
via gossip. This was considered too much complexity for little
benefit.

## Remove old behavior

The existing logic of automatic bootstrapping when no `--join` flag is
present could be removed, forcing all clusters to use the explicit
`init` command. This would be a conceptual simplification by removing
a redundant (and discouraged) option, but would add friction to
simple single-node cases.