
# Migrating from Kafka to Raft

**Note: this document presumes a high degree of expertise with channel
configuration update transactions. As the process for migration involves
several channel configuration update transactions, do not attempt to migrate
from Kafka to Raft without first familiarizing yourself with the [Add an
Organization to a Channel](channel_update_tutorial.html) tutorial, which
describes the channel update process in detail.**

For users who want to transition channels from using Kafka-based ordering
services to [Raft-based](./orderer/ordering_service.html#Raft) ordering services,
nodes at v1.4.2 or higher allow this to be accomplished through a series of configuration update
transactions on each channel in the network.

This tutorial will describe this process at a high level, calling out specific
details where necessary, rather than show each command in detail.

## Assumptions and considerations

Before attempting migration, take the following into account:

1. This process is solely for migration from Kafka to Raft. Migrating between
any other orderer consensus types is not currently supported.

2. Migration is one way. Once the ordering service is migrated to Raft, and
starts committing transactions, it is not possible to go back to Kafka.

3. Because the ordering nodes must go down and be brought back up, downtime must
be allowed during the migration.

4. Recovering from a botched migration is possible only if a backup is taken at
the point in migration prescribed later in this document. If you do not take a
backup, and migration fails, you will not be able to recover your previous state.

5. All channels must be migrated during the same maintenance window. It is not
possible to migrate only some channels before resuming operations.

6. At the end of the migration process, every channel will have the same
consenter set of Raft nodes. This is the same consenter set that will exist in
the ordering system channel. This makes it possible to diagnose a successful
migration.

7. Migration is done in place, utilizing the existing ledgers for the deployed
ordering nodes. Addition or removal of orderers should be performed after the
migration.

## High level migration flow

Migration is carried out in five phases.

1. The system is placed into a maintenance mode where application transactions
   are rejected and only ordering service admins can make changes to the channel
   configuration.
2. The system is stopped, and a backup is taken in case an error occurs during
   migration.
3. The system is started, and each channel has its consensus type and metadata
   modified.
4. The system is restarted and is now operating on Raft consensus; each channel
   is checked to confirm that it has successfully achieved a quorum.
5. The system is moved out of maintenance mode and normal function resumes.

## Preparing to migrate

There are several steps you should take before attempting to migrate.

* Design the Raft deployment, deciding which ordering service nodes are going to
  remain as Raft consenters. You should deploy at least three ordering nodes in
  your cluster, but note that deploying a consenter set of at least five nodes
  will maintain high availability should a node go down, whereas a three node
  configuration will lose high availability once a single node goes down for any
  reason (for example, during a maintenance cycle).
* Prepare the material for
  building the Raft `Metadata` configuration. **Note: all the channels should receive
  the same Raft `Metadata` configuration**. Refer to the [Raft configuration guide](raft_configuration.html)
  for more information on these fields, and see the sketch after this list for the
  general shape of this metadata. Note: you may find it easiest to bootstrap
  a new ordering network with the Raft consensus protocol, then copy and modify
  the consensus metadata section from its config. In any case, you will need
  (for each ordering node):
  - `hostname`
  - `port`
  - `server certificate`
  - `client certificate`
* Compile a list of all channels (system and application) in the system. Make
  sure you have the correct credentials to sign the configuration updates. For
  example, the relevant ordering service admin identities.
* Ensure all ordering service nodes are running the same version of Fabric, and
  that this version is v1.4.2 or greater.
* Ensure all peers are running at least v1.4.2 of Fabric. Make sure all channels
  are configured with the channel capability that enables migration.
  - Orderer capability `V1_4_2` (or above).
  - Channel capability `V1_4_2` (or above).

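As an illustration only, here is a minimal sketch of what that material looks
like once assembled into the JSON form of the `etcdraft` metadata. The
hostnames, ports, file name, and option values below are hypothetical; the
field names follow the [Raft configuration guide](raft_configuration.html),
which remains the authoritative reference.

```bash
# A sketch of the etcdraft metadata in JSON form (hypothetical hosts/ports).
# The same object should later be supplied to every channel's ConsensusType.
cat > raft_metadata.json <<'EOF'
{
  "consenters": [
    {
      "host": "orderer0.example.com",
      "port": 7050,
      "client_tls_cert": "<base64 of the PEM client TLS certificate>",
      "server_tls_cert": "<base64 of the PEM server TLS certificate>"
    },
    {
      "host": "orderer1.example.com",
      "port": 7050,
      "client_tls_cert": "<base64 of the PEM client TLS certificate>",
      "server_tls_cert": "<base64 of the PEM server TLS certificate>"
    },
    {
      "host": "orderer2.example.com",
      "port": 7050,
      "client_tls_cert": "<base64 of the PEM client TLS certificate>",
      "server_tls_cert": "<base64 of the PEM server TLS certificate>"
    }
  ],
  "options": {
    "tick_interval": "500ms",
    "election_tick": 10,
    "heartbeat_tick": 1,
    "max_inflight_blocks": 5,
    "snapshot_interval_size": 16777216
  }
}
EOF
# The certificate fields carry base64-encoded PEM, for example (GNU coreutils):
#   base64 -w0 tls/server.crt
```
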
### Entry to maintenance mode

Prior to setting the ordering service into maintenance mode, it is recommended
that the peers and clients of the network be stopped. Leaving peers or clients
up and running is safe; however, because the ordering service will reject all of
their requests, their logs will fill with benign but misleading failures.

Follow the process in the [Add an Organization to a Channel](channel_update_tutorial.html)
tutorial to pull, translate, and scope the configuration of **each channel,
starting with the system channel**. The only field you should change during
this step is in the channel configuration at `/Channel/Orderer/ConsensusType`.
In a JSON representation of the channel configuration, this would be
`.channel_group.groups.Orderer.values.ConsensusType`.

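As a concrete but hypothetical sketch of that step, assuming the `peer`,
`configtxlator`, and `jq` binaries are on your path and that the orderer
endpoint, channel name, and TLS CA file below are adjusted to your network,
the current `ConsensusType` of a channel can be pulled and inspected like this:

```bash
# Hypothetical endpoint, channel name, and CA file; adjust for your network.
export ORDERER=orderer0.example.com:7050
export CH_NAME=mychannel
export TLS_CA=/path/to/orderer/tls-ca.pem

# Fetch the latest config block and decode it to JSON.
peer channel fetch config config_block.pb -o $ORDERER -c $CH_NAME --tls --cafile $TLS_CA
configtxlator proto_decode --input config_block.pb --type common.Block | \
  jq '.data.data[0].payload.data.config' > config.json

# Inspect the current consensus Type, Metadata, and State.
jq '.channel_group.groups.Orderer.values.ConsensusType.value' config.json
```
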
The `ConsensusType` is represented by three values: `Type`, `Metadata`, and
`State`, where:

  * `Type` is either `kafka` or `etcdraft` (Raft). This value can only be
     changed while in maintenance mode.
  * `Metadata` will be empty if the `Type` is `kafka`, but must carry valid Raft
     metadata if the `ConsensusType` is `etcdraft`. More on this below.
  * `State` is either `STATE_NORMAL`, when the channel is processing transactions, or
    `STATE_MAINTENANCE`, during the migration process.

In the first step of the channel configuration update, only change the `State`
from `STATE_NORMAL` to `STATE_MAINTENANCE`. Do not change the `Type` or the `Metadata` field
yet. Note that the `Type` should currently be `kafka`.

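For illustration, continuing the hypothetical `config.json` from the sketch
above, the edit for this first update and the (abridged) steps that turn it
into a config update might look as follows; the full
encode/compute-update/wrap/sign/submit flow is exactly the one described in the
channel update tutorial.

```bash
# Change only the State; leave Type (kafka) and Metadata untouched.
jq '.channel_group.groups.Orderer.values.ConsensusType.value.state = "STATE_MAINTENANCE"' \
  config.json > modified_config.json

# Abridged: encode both versions and compute the config update, as in the
# channel update tutorial; then wrap it in an envelope, sign it with an
# ordering service admin identity, and submit it with `peer channel update`.
configtxlator proto_encode --input config.json --type common.Config --output config.pb
configtxlator proto_encode --input modified_config.json --type common.Config --output modified_config.pb
configtxlator compute_update --channel_id $CH_NAME --original config.pb --updated modified_config.pb --output config_update.pb
```
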
While in maintenance mode, normal transactions, config updates unrelated to
migration, and `Deliver` requests from the peers used to retrieve new blocks are
rejected. This is done in order to avoid the need to back up, and if necessary
restore, the peers during migration, as they only receive updates once
migration has successfully completed. In other words, we want to keep the
ordering service backup point, which is the next step, ahead of the peers' ledgers,
in order to be able to perform rollback if needed. However, ordering node admins
can issue `Deliver` requests (which they need to be able to do in order to
continue the migration process).

**Verify** that each ordering service node has entered maintenance mode on each
of the channels. This can be done by fetching the last config block and making
sure that the `Type`, `Metadata`, and `State` on each channel are `kafka`, empty
(recall that there is no metadata for Kafka), and `STATE_MAINTENANCE`, respectively.

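One way to perform this verification, reusing the hypothetical variables from
the earlier sketch (and repeating it against each ordering node in turn by
changing the target address), is to fetch the last config block again and read
the three fields directly:

```bash
# Re-fetch the latest config block and print the ConsensusType value.
peer channel fetch config latest_config.pb -o $ORDERER -c $CH_NAME --tls --cafile $TLS_CA
configtxlator proto_decode --input latest_config.pb --type common.Block | \
  jq '.data.data[0].payload.data.config.channel_group.groups.Orderer.values.ConsensusType.value'
# Expected at this point: type "kafka", empty metadata, state "STATE_MAINTENANCE".
```
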
If the channels have been updated successfully, the ordering service is now
ready for backup.

#### Backup files and shut down servers

Shut down all ordering nodes, Kafka servers, and Zookeeper servers. It is
important to **shut down the ordering service nodes first**. Then, after allowing
the Kafka service to flush its logs to disk (this typically takes about 30
seconds, but might take longer depending on your system), the Kafka servers
should be shut down. Shutting down the Kafka brokers at the same time as the
orderers can result in the filesystem state of the orderers being more recent
than that of the Kafka brokers, which could prevent your network from starting.

Create a backup of the file system of these servers. Then restart first the
Kafka service and then the ordering service nodes.

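As a minimal sketch, assuming the (hypothetical) data directories below, the
file-system backup can be as simple as archiving each server's data directory
after it has been stopped; the actual paths depend entirely on how your
orderers, Kafka brokers, and Zookeeper nodes were deployed.

```bash
# On each ordering node (after it has stopped); the path is a common default
# FileLedger location, but verify it against your orderer.yaml.
tar czf orderer-backup-$(hostname)-$(date +%F).tar.gz /var/hyperledger/production/orderer

# On each Kafka broker and Zookeeper node (after they have stopped); the paths
# below are hypothetical examples of the log.dirs and dataDir settings.
tar czf kafka-backup-$(hostname)-$(date +%F).tar.gz /tmp/kafka-logs
tar czf zookeeper-backup-$(hostname)-$(date +%F).tar.gz /var/lib/zookeeper
```
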
### Switch to Raft in maintenance mode

The next step in the migration process is another channel configuration update
for each channel. In this configuration update, switch the `Type` to `etcdraft`
(for Raft) while keeping the `State` in `STATE_MAINTENANCE`, and fill in the
`Metadata` configuration. It is highly recommended that the `Metadata` configuration be
identical on all channels. If you want to establish different consenter sets
with different nodes, you will be able to reconfigure the `Metadata` configuration
after the system is restarted into `etcdraft` mode. Supplying an identical metadata
object, and hence, an identical consenter set, means that when the nodes are
restarted, if the system channel forms a quorum and can exit maintenance mode,
other channels will likely be able to do the same. Supplying different consenter
sets to each channel can cause one channel to succeed in forming a cluster while
another channel will fail.

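A sketch of this edit, continuing the hypothetical `config.json` and the
`raft_metadata.json` prepared earlier; this assumes the decoded JSON form of
the channel configuration, in which the `metadata` field is expressed as the
structured object rather than raw bytes.

```bash
# Re-fetch and re-decode config.json first, since the previous update changed it.
# Then switch the Type to etcdraft, supply the prepared metadata, and keep the
# channel in maintenance mode.
jq --slurpfile md raft_metadata.json '
  .channel_group.groups.Orderer.values.ConsensusType.value.type = "etcdraft" |
  .channel_group.groups.Orderer.values.ConsensusType.value.metadata = $md[0] |
  .channel_group.groups.Orderer.values.ConsensusType.value.state = "STATE_MAINTENANCE"
' config.json > modified_config.json
# Encode, compute the update, wrap, sign, and submit exactly as before.
```
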
Then, validate that each ordering service node has committed the `ConsensusType`
change configuration update by pulling and inspecting the configuration of each
channel.

Note: For each channel, the transaction that changes the `ConsensusType` must be the last
configuration transaction before restarting the nodes (in the next step). If
some other configuration transaction happens after this step, the nodes will
most likely crash on restart, or exhibit undefined behavior.

#### Restart and validate leader

Note: exit of maintenance mode **must** be done **after** restart.

After the `ConsensusType` update has been completed on each channel, stop all
ordering service nodes, stop all Kafka brokers and Zookeepers, and then restart
only the ordering service nodes. They should restart as Raft nodes, form a cluster per
channel, and elect a leader on each channel.

**Note**: Since a Raft-based ordering service requires mutual TLS between orderer nodes,
**additional configurations** are required before you start them again; see
[Section: Local Configuration](./raft_configuration.html#local-configuration) for more details.

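Purely as an assumption about a typical container-based deployment, the
relevant local settings live under `General.TLS` and `General.Cluster` in
`orderer.yaml` and can also be supplied as environment variables; the
certificate paths below are hypothetical.

```bash
# Server-side TLS for the orderer, plus the client credentials the node uses
# when dialing the other consenters (typically the same TLS material that was
# listed in the consenter set). Paths are hypothetical.
export ORDERER_GENERAL_TLS_ENABLED=true
export ORDERER_GENERAL_TLS_CERTIFICATE=/var/hyperledger/orderer/tls/server.crt
export ORDERER_GENERAL_TLS_PRIVATEKEY=/var/hyperledger/orderer/tls/server.key
export ORDERER_GENERAL_TLS_ROOTCAS='[/var/hyperledger/orderer/tls/ca.crt]'
export ORDERER_GENERAL_CLUSTER_CLIENTCERTIFICATE=/var/hyperledger/orderer/tls/server.crt
export ORDERER_GENERAL_CLUSTER_CLIENTPRIVATEKEY=/var/hyperledger/orderer/tls/server.key
export ORDERER_GENERAL_CLUSTER_ROOTCAS='[/var/hyperledger/orderer/tls/ca.crt]'
```
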
After the restart process has finished, make sure to **validate** that a
leader has been elected on each channel by inspecting the node logs (you can see
what to look for below). This will confirm that the process has been completed
successfully.

When a leader is elected, the log will show, for each channel:

```
"Raft leader changed: 0 -> node-number channel=channel-name node=node-number"
```

For example:

```
2019-05-26 10:07:44.075 UTC [orderer.consensus.etcdraft] serveRequest ->
INFO 047 Raft leader changed: 0 -> 1 channel=testchannel1 node=2
```

In this example `node 2` reports that a leader was elected (the leader is
`node 1`) by the cluster of channel `testchannel1`.

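For example, assuming the orderers run as Docker containers named after their
hosts (a purely hypothetical naming scheme), the check could be scripted as:

```bash
# Look for the leader-election message in each orderer's log.
for node in orderer0.example.com orderer1.example.com orderer2.example.com; do
  echo "== $node =="
  docker logs "$node" 2>&1 | grep "Raft leader changed"
done
```
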
### Switch out of maintenance mode

Perform another channel configuration update on each channel (sending the config
update to the same ordering node you have been sending configuration updates to
until now), switching the `State` from `STATE_MAINTENANCE` to `STATE_NORMAL`. Start with the
system channel, as usual. If it succeeds on the ordering system channel,
migration is likely to succeed on all channels. To verify, fetch the last config
block of the system channel from the ordering node, verifying that the `State`
is now `STATE_NORMAL`. For completeness, verify this on each ordering node.

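Continuing the same hypothetical sketches, the edit and the subsequent
verification might look like this:

```bash
# Re-fetch and re-decode config.json from the latest config block first, then
# flip only the State back to normal; Type and Metadata stay as already set.
jq '.channel_group.groups.Orderer.values.ConsensusType.value.state = "STATE_NORMAL"' \
  config.json > modified_config.json
# Encode, compute the update, wrap, sign, and submit as before. Then verify:
peer channel fetch config latest_config.pb -o $ORDERER -c $CH_NAME --tls --cafile $TLS_CA
configtxlator proto_decode --input latest_config.pb --type common.Block | \
  jq '.data.data[0].payload.data.config.channel_group.groups.Orderer.values.ConsensusType.value.state'
```
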
When this process is completed, the ordering service is now ready to accept all
transactions on all channels. If you stopped your peers and application as
recommended, you may now restart them.

## Abort and rollback

If a problem emerges during the migration process **before exiting maintenance
mode**, simply perform the rollback procedure below.

1. Shut down the ordering nodes and the Kafka service (servers and Zookeeper
   ensemble).
2. Roll back the file system of these servers to the backup taken at maintenance
   mode before changing the `ConsensusType`.
3. Restart said servers; the ordering nodes will bootstrap to Kafka in
   maintenance mode.
4. Send a configuration update exiting maintenance mode to continue using Kafka
   as your consensus mechanism, or resume the instructions after the point of
   backup, fix the error which prevented a Raft quorum from forming, and retry
   migration with corrected Raft configuration `Metadata`.

There are a few states which might indicate migration has failed:

1. Some nodes crash or shut down.
2. There is no record of a successful leader election per channel in the logs.
3. The attempt to flip to `STATE_NORMAL` mode on the system channel fails.

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/) -->