# Migrating from Kafka to Raft

**Note: this document presumes a high degree of expertise with channel
configuration update transactions. As the process for migration involves
several channel configuration update transactions, do not attempt to migrate
from Kafka to Raft without first familiarizing yourself with the [Add an
Organization to a Channel](channel_update_tutorial.html) tutorial, which
describes the channel update process in detail.**

For users who want to transition channels from using Kafka-based ordering
services to [Raft-based](./orderer/ordering_service.html#Raft) ordering services,
nodes at v1.4.2 or higher allow this to be accomplished through a series of
configuration update transactions on each channel in the network.

This tutorial describes this process at a high level, calling out specific
details where necessary, rather than showing each command in detail.

## Assumptions and considerations

Before attempting migration, take the following into account:

1. This process is solely for migration from Kafka to Raft. Migrating between
any other orderer consensus types is not currently supported.

2. Migration is one way. Once the ordering service is migrated to Raft and
starts committing transactions, it is not possible to go back to Kafka.

3. Because the ordering nodes must go down and be brought back up, downtime must
be allowed during the migration.

4. Recovering from a botched migration is possible only if a backup is taken at
the point in migration prescribed later in this document. If you do not take a
backup, and migration fails, you will not be able to recover your previous state.

5. All channels must be migrated during the same maintenance window. It is not
possible to migrate only some channels before resuming operations.

6. At the end of the migration process, every channel will have the same
consenter set of Raft nodes. This is the same consenter set that will exist in
the ordering system channel. This makes it possible to diagnose a successful
migration.

7. Migration is done in place, utilizing the existing ledgers for the deployed
ordering nodes. Addition or removal of orderers should be performed after the
migration.

## High level migration flow

Migration is carried out in five phases.

1. The system is placed into a maintenance mode where application transactions
are rejected and only ordering service admins can make changes to the channel
configuration.
2. The system is stopped, and a backup is taken in case an error occurs during
migration.
3. The system is started, and each channel has its consensus type and metadata
modified.
4. The system is restarted and is now operating on Raft consensus; each channel
is checked to confirm that it has successfully achieved a quorum.
5. The system is moved out of maintenance mode and normal function resumes.

## Preparing to migrate

There are several steps you should take before attempting to migrate.

* Design the Raft deployment, deciding which ordering service nodes are going to
remain as Raft consenters. You should deploy at least three ordering nodes in
your cluster, but note that deploying a consenter set of at least five nodes
will maintain high availability should a node go down, whereas a three node
configuration will lose high availability once a single node goes down for any
reason (for example, during a maintenance cycle).
* Prepare the material for building the Raft `Metadata` configuration. **Note:
all the channels should receive the same Raft `Metadata` configuration**. Refer
to the [Raft configuration guide](raft_configuration.html) for more information
on these fields. Note: you may find it easiest to bootstrap a new ordering
network with the Raft consensus protocol, then copy and modify the consensus
metadata section from its config. In any case, you will need (for each ordering
node):
   - `hostname`
   - `port`
   - `server certificate`
   - `client certificate`
* Compile a list of all channels (system and application) in the system. Make
sure you have the correct credentials to sign the configuration updates, for
example, the relevant ordering service admin identities.
* Ensure all ordering service nodes are running the same version of Fabric, and
that this version is v1.4.2 or greater.
* Ensure all peers are running at least v1.4.2 of Fabric. Make sure all channels
are configured with the channel capabilities that enable migration:
   - Orderer capability `V1_4_2` (or above).
   - Channel capability `V1_4_2` (or above).
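As a small illustrative sketch of gathering the per-node material (the host
names, port, and certificate paths below are hypothetical, not part of this
tutorial), you can record each consenter's endpoint together with its TLS
certificates, base64-encoded as the Raft `Metadata` expects:

```
# Hypothetical hosts and paths; many deployments use the same certificate for
# both the client and server TLS certificate of an ordering node.
# (`base64 -w 0` disables line wrapping; use your platform's equivalent.)
for node in orderer1.example.com orderer2.example.com orderer3.example.com; do
  echo "host: ${node}  port: 7050"
  echo "server_tls_cert: $(base64 -w 0 crypto/${node}/tls/server.crt)"
  echo "client_tls_cert: $(base64 -w 0 crypto/${node}/tls/server.crt)"
done
```

Keeping this material in one place makes it straightforward to assemble the
`Metadata` object used later in the migration.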
### Entry to maintenance mode

Prior to setting the ordering service into maintenance mode, it is recommended
that the peers and clients of the network be stopped. Leaving peers or clients
up and running is safe; however, because the ordering service will reject all of
their requests, their logs will fill with benign but misleading failures.

Follow the process in the [Add an Organization to a Channel](channel_update_tutorial.html)
tutorial to pull, translate, and scope the configuration of **each channel,
starting with the system channel**. The only field you should change during
this step is in the channel configuration at `/Channel/Orderer/ConsensusType`.
In a JSON representation of the channel configuration, this would be
`.channel_group.groups.Orderer.values.ConsensusType`.

The `ConsensusType` is represented by three values: `Type`, `Metadata`, and
`State`, where:

* `Type` is either `kafka` or `etcdraft` (Raft). This value can only be
changed while in maintenance mode.
* `Metadata` will be empty if the `Type` is `kafka`, but must carry valid Raft
metadata if the `ConsensusType` is `etcdraft`. More on this below.
* `State` is either `STATE_NORMAL`, when the channel is processing transactions, or
`STATE_MAINTENANCE`, during the migration process.

In the first step of the channel configuration update, only change the `State`
from `STATE_NORMAL` to `STATE_MAINTENANCE`. Do not change the `Type` or the
`Metadata` field yet. Note that the `Type` should currently be `kafka`.
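As a minimal sketch of this first update (the orderer endpoint, TLS CA file,
channel name, and file names below are placeholders for your own environment),
the flow mirrors the channel update tutorial: fetch and decode the config, then
change only the `State`:

```
export CHANNEL_NAME=testchannel1   # repeat for every channel, system channel first

# Pull and decode the latest config block (see the channel update tutorial).
peer channel fetch config config_block.pb -o orderer.example.com:7050 \
  -c $CHANNEL_NAME --tls --cafile $ORDERER_CA
configtxlator proto_decode --input config_block.pb --type common.Block \
  | jq .data.data[0].payload.data.config > config.json

# Only the State changes in this step; Type stays "kafka" and Metadata stays empty.
jq '.channel_group.groups.Orderer.values.ConsensusType.value.state = "STATE_MAINTENANCE"' \
  config.json > modified_config.json
```

From here, compute the difference between `config.json` and
`modified_config.json`, wrap it in an envelope, sign it as an ordering service
admin, and submit it with `peer channel update`, exactly as described in the
channel update tutorial.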
While in maintenance mode, normal transactions, config updates unrelated to
migration, and `Deliver` requests from the peers used to retrieve new blocks are
rejected. This is done to avoid having to back up, and if necessary restore, the
peers during migration, as they only receive updates once migration has
successfully completed. In other words, we want to keep the ordering service
backup point, which is taken in the next step, ahead of the peers' ledgers, so
that a rollback can be performed if needed. However, ordering node admins can
issue `Deliver` requests (which they need to be able to do in order to continue
the migration process).

**Verify** that each ordering service node has entered maintenance mode on each
of the channels. This can be done by fetching the last config block and making
sure that the `Type`, `Metadata`, and `State` on each channel are `kafka`, empty
(recall that there is no metadata for Kafka), and `STATE_MAINTENANCE`,
respectively.

If the channels have been updated successfully, the ordering service is now
ready for backup.

#### Backup files and shut down servers

Shut down all ordering nodes, Kafka servers, and Zookeeper servers. It is
important to **shut down the ordering service nodes first**. Then, after allowing
the Kafka service to flush its logs to disk (this typically takes about 30
seconds, but might take longer depending on your system), the Kafka servers
should be shut down. Shutting down the Kafka brokers at the same time as the
orderers can result in the filesystem state of the orderers being more recent
than that of the Kafka brokers, which could prevent your network from starting.

Create a backup of the file system of these servers. Then restart the Kafka
service and then the ordering service nodes.

### Switch to Raft in maintenance mode

The next step in the migration process is another channel configuration update
for each channel. In this configuration update, switch the `Type` to `etcdraft`
(for Raft) while keeping the `State` in `STATE_MAINTENANCE`, and fill in the
`Metadata` configuration. It is highly recommended that the `Metadata`
configuration be identical on all channels. If you want to establish different
consenter sets with different nodes, you will be able to reconfigure the
`Metadata` configuration after the system is restarted into `etcdraft` mode.
Supplying an identical metadata object, and hence an identical consenter set,
means that when the nodes are restarted, if the system channel forms a quorum
and can exit maintenance mode, other channels will likely be able to do the
same. Supplying different consenter sets to each channel can cause one channel
to succeed in forming a cluster while another channel fails.
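As an illustrative sketch of this second update (the consenter values, option
values, and file names are placeholders, and the exact field set should be
checked against the [Raft configuration guide](raft_configuration.html)), the
`Type` and `Metadata` can be set in a single `jq` edit while the `State` remains
`STATE_MAINTENANCE`:

```
# Re-fetch and decode the latest config block first; it now carries
# STATE_MAINTENANCE from the previous update.

# Assemble the Raft Metadata from the material gathered during preparation,
# with one consenter entry per ordering node that remains in the consenter set.
cat > raft_metadata.json <<'EOF'
{
  "consenters": [
    {
      "host": "orderer1.example.com",
      "port": 7050,
      "client_tls_cert": "<base64-encoded client TLS cert>",
      "server_tls_cert": "<base64-encoded server TLS cert>"
    }
  ],
  "options": {
    "tick_interval": "500ms",
    "election_tick": 10,
    "heartbeat_tick": 1,
    "max_inflight_blocks": 5,
    "snapshot_interval_size": 16777216
  }
}
EOF

# Keep the State in maintenance; change the Type and fill in the Raft Metadata.
jq --slurpfile md raft_metadata.json '
    .channel_group.groups.Orderer.values.ConsensusType.value.type = "etcdraft"
  | .channel_group.groups.Orderer.values.ConsensusType.value.metadata = $md[0]' \
  config.json > modified_config.json
```

Compute, sign, and submit the resulting update as before, on every channel.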
Then, validate that each ordering service node has committed the `ConsensusType`
change configuration update by pulling and inspecting the configuration of each
channel.

Note: For each channel, the transaction that changes the `ConsensusType` must be
the last configuration transaction before restarting the nodes (in the next
step). If some other configuration transaction happens after this step, the
nodes will most likely crash on restart, or exhibit undefined behavior.

#### Restart and validate leader

Note: exiting maintenance mode **must** be done **after** the restart.

After the `ConsensusType` update has been completed on each channel, stop all
ordering service nodes, stop all Kafka brokers and Zookeepers, and then restart
only the ordering service nodes. They should restart as Raft nodes, form a
cluster per channel, and elect a leader on each channel.

**Note**: Since a Raft-based ordering service requires mutual TLS between orderer
nodes, **additional configuration** is required before you start them again; see
[Section: Local Configuration](raft_configuration.html#local-configuration) for
more details.

After the restart process has finished, make sure to **validate** that a leader
has been elected on each channel by inspecting the node logs (you can see what
to look for below). This will confirm that the process has been completed
successfully.

When a leader is elected, the log will show, for each channel:

```
Raft leader changed: 0 -> node-number channel=channel-name node=node-number
```

For example:

```
2019-05-26 10:07:44.075 UTC [orderer.consensus.etcdraft] serveRequest ->
INFO 047 Raft leader changed: 0 -> 1 channel=testchannel1 node=2
```

In this example `node 2` reports that a leader was elected (the leader is
`node 1`) by the cluster of channel `testchannel1`.

### Switch out of maintenance mode

Perform another channel configuration update on each channel (sending the config
update to the same ordering node you have been sending configuration updates to
until now), switching the `State` from `STATE_MAINTENANCE` to `STATE_NORMAL`.
Start with the system channel, as usual. If it succeeds on the ordering system
channel, migration is likely to succeed on all channels. To verify, fetch the
last config block of the system channel from the ordering node and verify that
the `State` is now `STATE_NORMAL`. For completeness, verify this on each
ordering node.

When this process is completed, the ordering service is now ready to accept all
transactions on all channels. If you stopped your peers and applications as
recommended, you may now restart them.

## Abort and rollback

If a problem emerges during the migration process **before exiting maintenance
mode**, simply perform the rollback procedure below.

1. Shut down the ordering nodes and the Kafka service (servers and Zookeeper
ensemble).
2. Roll back the file system of these servers to the backup taken in maintenance
mode, before the `ConsensusType` was changed.
3. Restart these servers; the ordering nodes will bootstrap to Kafka in
maintenance mode.
4. Send a configuration update exiting maintenance mode to continue using Kafka
as your consensus mechanism, or resume the instructions after the point of
backup, fix the error that prevented a Raft quorum from forming, and retry
migration with a corrected Raft configuration `Metadata`.

There are a few states which might indicate migration has failed:

1. Some nodes crash or shut down.
2. There is no record of a successful leader election per channel in the logs.
3. The attempt to flip to `STATE_NORMAL` mode on the system channel fails.

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/) -->