
# Migrating from Kafka to Raft

**Note: this document presumes a high degree of expertise with channel
configuration update transactions. As the process for migration involves
several channel configuration update transactions, do not attempt to migrate
from Kafka to Raft without first familiarizing yourself with the [Add an
Organization to a Channel](channel_update_tutorial.html) tutorial, which
describes the channel update process in detail.**

For users who want to transition channels from using Kafka-based ordering
services to [Raft-based](./orderer/ordering_service.html#Raft) ordering services,
nodes at v1.4.2 or higher allow this to be accomplished through a series of configuration update
transactions on each channel in the network.

This tutorial will describe this process at a high level, calling out specific
details where necessary, rather than show each command in detail.

## Assumptions and considerations

Before attempting migration, take the following into account:

1. This process is solely for migration from Kafka to Raft. Migrating between
any other orderer consensus types is not currently supported.

2. Migration is one way. Once the ordering service is migrated to Raft, and
starts committing transactions, it is not possible to go back to Kafka.

3. Because the ordering nodes must go down and be brought back up, downtime must
be allowed during the migration.

4. Recovering from a botched migration is possible only if a backup is taken at
the point in migration prescribed later in this document. If you do not take a
backup, and migration fails, you will not be able to recover your previous state.

5. All channels must be migrated during the same maintenance window. It is not
possible to migrate only some channels before resuming operations.

6. At the end of the migration process, every channel will have the same
consenter set of Raft nodes. This is the same consenter set that will exist in
the ordering system channel. This makes it possible to diagnose a successful
migration.

7. Migration is done in place, utilizing the existing ledgers for the deployed
ordering nodes. Addition or removal of orderers should be performed after the
migration.

## High level migration flow

Migration is carried out in five phases.

1. The system is placed into a maintenance mode where application transactions
   are rejected and only ordering service admins can make changes to the channel
   configuration.
2. The system is stopped, and a backup is taken in case an error occurs during
   migration.
3. The system is started, and each channel has its consensus type and metadata
   modified.
4. The system is restarted and is now operating on Raft consensus; each channel
   is checked to confirm that it has successfully achieved a quorum.
5. The system is moved out of maintenance mode and normal function resumes.

## Preparing to migrate

There are several steps you should take before attempting to migrate.

* Design the Raft deployment, deciding which ordering service nodes are going to
  remain as Raft consenters. You should deploy at least three ordering nodes in
  your cluster, but note that deploying a consenter set of at least five nodes
  will maintain high availability should a node go down, whereas a three node
  configuration will lose high availability once a single node goes down for any
  reason (for example, during a maintenance cycle).
* Prepare the material for
  building the Raft `Metadata` configuration. **Note: all the channels should receive
  the same Raft `Metadata` configuration**. Refer to the [Raft configuration guide](raft_configuration.html)
  for more information on these fields, and see the sketch after this list for the
  general shape of this metadata. Note: you may find it easiest to bootstrap
  a new ordering network with the Raft consensus protocol, then copy and modify
  the consensus metadata section from its config. In any case, you will need
  (for each ordering node):
  - `hostname`
  - `port`
  - `server certificate`
  - `client certificate`
* Compile a list of all channels (system and application) in the system. Make
  sure you have the correct credentials to sign the configuration updates. For
  example, the relevant ordering service admin identities.
* Ensure all ordering service nodes are running the same version of Fabric, and
  that this version is v1.4.2 or greater.
* Ensure all peers are running at least v1.4.2 of Fabric. Make sure all channels
  are configured with the channel capability that enables migration.
  - Orderer capability `V1_4_2` (or above).
  - Channel capability `V1_4_2` (or above).

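As an illustration only, here is a minimal sketch of what that material looks
like once assembled into the JSON form of the `etcdraft` metadata. The
hostnames, ports, file name, and option values below are hypothetical; the
field names follow the [Raft configuration guide](raft_configuration.html),
which remains the authoritative reference.

```bash
# A sketch of the etcdraft metadata in JSON form (hypothetical hosts/ports).
# The same object should later be supplied to every channel's ConsensusType.
cat > raft_metadata.json <<'EOF'
{
  "consenters": [
    {
      "host": "orderer0.example.com",
      "port": 7050,
      "client_tls_cert": "<base64 of the PEM client TLS certificate>",
      "server_tls_cert": "<base64 of the PEM server TLS certificate>"
    },
    {
      "host": "orderer1.example.com",
      "port": 7050,
      "client_tls_cert": "<base64 of the PEM client TLS certificate>",
      "server_tls_cert": "<base64 of the PEM server TLS certificate>"
    },
    {
      "host": "orderer2.example.com",
      "port": 7050,
      "client_tls_cert": "<base64 of the PEM client TLS certificate>",
      "server_tls_cert": "<base64 of the PEM server TLS certificate>"
    }
  ],
  "options": {
    "tick_interval": "500ms",
    "election_tick": 10,
    "heartbeat_tick": 1,
    "max_inflight_blocks": 5,
    "snapshot_interval_size": 16777216
  }
}
EOF
# The certificate fields carry base64-encoded PEM, for example (GNU coreutils):
#   base64 -w0 tls/server.crt
```
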
### Entry to maintenance mode

Prior to setting the ordering service into maintenance mode, it is recommended
that the peers and clients of the network be stopped. Leaving peers or clients
up and running is safe; however, because the ordering service will reject all of
their requests, their logs will fill with benign but misleading failures.

Follow the process in the [Add an Organization to a Channel](channel_update_tutorial.html)
tutorial to pull, translate, and scope the configuration of **each channel,
starting with the system channel**. The only field you should change during
this step is in the channel configuration at `/Channel/Orderer/ConsensusType`.
In a JSON representation of the channel configuration, this would be
`.channel_group.groups.Orderer.values.ConsensusType`.

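As a concrete but hypothetical sketch of that step, assuming the `peer`,
`configtxlator`, and `jq` binaries are on your path and that the orderer
endpoint, channel name, and TLS CA file below are adjusted to your network,
the current `ConsensusType` of a channel can be pulled and inspected like this:

```bash
# Hypothetical endpoint, channel name, and CA file; adjust for your network.
export ORDERER=orderer0.example.com:7050
export CH_NAME=mychannel
export TLS_CA=/path/to/orderer/tls-ca.pem

# Fetch the latest config block and decode it to JSON.
peer channel fetch config config_block.pb -o $ORDERER -c $CH_NAME --tls --cafile $TLS_CA
configtxlator proto_decode --input config_block.pb --type common.Block | \
  jq '.data.data[0].payload.data.config' > config.json

# Inspect the current consensus Type, Metadata, and State.
jq '.channel_group.groups.Orderer.values.ConsensusType.value' config.json
```
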
The `ConsensusType` is represented by three values: `Type`, `Metadata`, and
`State`, where:

  * `Type` is either `kafka` or `etcdraft` (Raft). This value can only be
     changed while in maintenance mode.
  * `Metadata` will be empty if the `Type` is `kafka`, but must carry valid Raft
     metadata if the `ConsensusType` is `etcdraft`. More on this below.
  * `State` is either `STATE_NORMAL`, when the channel is processing transactions, or
    `STATE_MAINTENANCE`, during the migration process.

In the first step of the channel configuration update, only change the `State`
from `STATE_NORMAL` to `STATE_MAINTENANCE`. Do not change the `Type` or the `Metadata` field
yet. Note that the `Type` should currently be `kafka`.

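For illustration, continuing the hypothetical `config.json` from the sketch
above, the edit for this first update and the (abridged) steps that turn it
into a config update might look as follows; the full
encode/compute-update/wrap/sign/submit flow is exactly the one described in the
channel update tutorial.

```bash
# Change only the State; leave Type (kafka) and Metadata untouched.
jq '.channel_group.groups.Orderer.values.ConsensusType.value.state = "STATE_MAINTENANCE"' \
  config.json > modified_config.json

# Abridged: encode both versions and compute the config update, as in the
# channel update tutorial; then wrap it in an envelope, sign it with an
# ordering service admin identity, and submit it with `peer channel update`.
configtxlator proto_encode --input config.json --type common.Config --output config.pb
configtxlator proto_encode --input modified_config.json --type common.Config --output modified_config.pb
configtxlator compute_update --channel_id $CH_NAME --original config.pb --updated modified_config.pb --output config_update.pb
```
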
While in maintenance mode, normal transactions, config updates unrelated to
migration, and `Deliver` requests from the peers used to retrieve new blocks are
rejected. This is done in order to avoid the need to back up, and if necessary
restore, the peers during migration, as they only receive updates once
migration has successfully completed. In other words, we want to keep the
ordering service backup point, which is the next step, ahead of the peers' ledgers,
in order to be able to perform rollback if needed. However, ordering node admins
can issue `Deliver` requests (which they need to be able to do in order to
continue the migration process).

**Verify** that each ordering service node has entered maintenance mode on each
of the channels. This can be done by fetching the last config block and making
sure that the `Type`, `Metadata`, and `State` on each channel are `kafka`, empty
(recall that there is no metadata for Kafka), and `STATE_MAINTENANCE`, respectively.

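One way to perform this verification, reusing the hypothetical variables from
the earlier sketch (and repeating it against each ordering node in turn by
changing the target address), is to fetch the last config block again and read
the three fields directly:

```bash
# Re-fetch the latest config block and print the ConsensusType value.
peer channel fetch config latest_config.pb -o $ORDERER -c $CH_NAME --tls --cafile $TLS_CA
configtxlator proto_decode --input latest_config.pb --type common.Block | \
  jq '.data.data[0].payload.data.config.channel_group.groups.Orderer.values.ConsensusType.value'
# Expected at this point: type "kafka", empty metadata, state "STATE_MAINTENANCE".
```
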
If the channels have been updated successfully, the ordering service is now
ready for backup.

#### Backup files and shut down servers

Shut down all ordering nodes, Kafka servers, and Zookeeper servers. It is
important to **shut down the ordering service nodes first**. Then, after allowing
the Kafka service to flush its logs to disk (this typically takes about 30
seconds, but might take longer depending on your system), the Kafka servers
should be shut down. Shutting down the Kafka brokers at the same time as the
orderers can result in the filesystem state of the orderers being more recent
than that of the Kafka brokers, which could prevent your network from starting.

Create a backup of the file system of these servers. Then restart first the
Kafka service and then the ordering service nodes.

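As a minimal sketch, assuming the (hypothetical) data directories below, the
file-system backup can be as simple as archiving each server's data directory
after it has been stopped; the actual paths depend entirely on how your
orderers, Kafka brokers, and Zookeeper nodes were deployed.

```bash
# On each ordering node (after it has stopped); the path is a common default
# FileLedger location, but verify it against your orderer.yaml.
tar czf orderer-backup-$(hostname)-$(date +%F).tar.gz /var/hyperledger/production/orderer

# On each Kafka broker and Zookeeper node (after they have stopped); the paths
# below are hypothetical examples of the log.dirs and dataDir settings.
tar czf kafka-backup-$(hostname)-$(date +%F).tar.gz /tmp/kafka-logs
tar czf zookeeper-backup-$(hostname)-$(date +%F).tar.gz /var/lib/zookeeper
```
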
### Switch to Raft in maintenance mode

The next step in the migration process is another channel configuration update
for each channel. In this configuration update, switch the `Type` to `etcdraft`
(for Raft) while keeping the `State` in `STATE_MAINTENANCE`, and fill in the
`Metadata` configuration. It is highly recommended that the `Metadata` configuration be
identical on all channels. If you want to establish different consenter sets
with different nodes, you will be able to reconfigure the `Metadata` configuration
after the system is restarted into `etcdraft` mode. Supplying an identical metadata
object, and hence, an identical consenter set, means that when the nodes are
restarted, if the system channel forms a quorum and can exit maintenance mode,
other channels will likely be able to do the same. Supplying different consenter
sets to each channel can cause one channel to succeed in forming a cluster while
another channel will fail.

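A sketch of this edit, continuing the hypothetical `config.json` and the
`raft_metadata.json` prepared earlier; this assumes the decoded JSON form of
the channel configuration, in which the `metadata` field is expressed as the
structured object rather than raw bytes.

```bash
# Re-fetch and re-decode config.json first, since the previous update changed it.
# Then switch the Type to etcdraft, supply the prepared metadata, and keep the
# channel in maintenance mode.
jq --slurpfile md raft_metadata.json '
  .channel_group.groups.Orderer.values.ConsensusType.value.type = "etcdraft" |
  .channel_group.groups.Orderer.values.ConsensusType.value.metadata = $md[0] |
  .channel_group.groups.Orderer.values.ConsensusType.value.state = "STATE_MAINTENANCE"
' config.json > modified_config.json
# Encode, compute the update, wrap, sign, and submit exactly as before.
```
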
Then, validate that each ordering service node has committed the `ConsensusType`
change configuration update by pulling and inspecting the configuration of each
channel.

Note: For each channel, the transaction that changes the `ConsensusType` must be the last
configuration transaction before restarting the nodes (in the next step). If
some other configuration transaction happens after this step, the nodes will
most likely crash on restart, or exhibit undefined behavior.

#### Restart and validate leader

Note: exit of maintenance mode **must** be done **after** restart.

After the `ConsensusType` update has been completed on each channel, stop all
ordering service nodes, stop all Kafka brokers and Zookeepers, and then restart
only the ordering service nodes. They should restart as Raft nodes, form a cluster per
channel, and elect a leader on each channel.

**Note**: Since a Raft-based ordering service requires mutual TLS between orderer nodes,
**additional configurations** are required before you start them again; see
[Section: Local Configuration](./raft_configuration.html#local-configuration) for more details.

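Purely as an assumption about a typical container-based deployment, the
relevant local settings live under `General.TLS` and `General.Cluster` in
`orderer.yaml` and can also be supplied as environment variables; the
certificate paths below are hypothetical.

```bash
# Server-side TLS for the orderer, plus the client credentials the node uses
# when dialing the other consenters (typically the same TLS material that was
# listed in the consenter set). Paths are hypothetical.
export ORDERER_GENERAL_TLS_ENABLED=true
export ORDERER_GENERAL_TLS_CERTIFICATE=/var/hyperledger/orderer/tls/server.crt
export ORDERER_GENERAL_TLS_PRIVATEKEY=/var/hyperledger/orderer/tls/server.key
export ORDERER_GENERAL_TLS_ROOTCAS='[/var/hyperledger/orderer/tls/ca.crt]'
export ORDERER_GENERAL_CLUSTER_CLIENTCERTIFICATE=/var/hyperledger/orderer/tls/server.crt
export ORDERER_GENERAL_CLUSTER_CLIENTPRIVATEKEY=/var/hyperledger/orderer/tls/server.key
export ORDERER_GENERAL_CLUSTER_ROOTCAS='[/var/hyperledger/orderer/tls/ca.crt]'
```
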
After the restart process has finished, make sure to **validate** that a
leader has been elected on each channel by inspecting the node logs (you can see
what to look for below). This will confirm that the process has been completed
successfully.

When a leader is elected, the log will show, for each channel:

```
"Raft leader changed: 0 -> node-number channel=channel-name node=node-number"
```

For example:

```
2019-05-26 10:07:44.075 UTC [orderer.consensus.etcdraft] serveRequest ->
INFO 047 Raft leader changed: 0 -> 1 channel=testchannel1 node=2
```

In this example `node 2` reports that a leader was elected (the leader is
`node 1`) by the cluster of channel `testchannel1`.

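For example, assuming the orderers run as Docker containers named after their
hosts (a purely hypothetical naming scheme), the check could be scripted as:

```bash
# Look for the leader-election message in each orderer's log.
for node in orderer0.example.com orderer1.example.com orderer2.example.com; do
  echo "== $node =="
  docker logs "$node" 2>&1 | grep "Raft leader changed"
done
```
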
### Switch out of maintenance mode

Perform another channel configuration update on each channel (sending the config
update to the same ordering node you have been sending configuration updates to
until now), switching the `State` from `STATE_MAINTENANCE` to `STATE_NORMAL`. Start with the
system channel, as usual. If it succeeds on the ordering system channel,
migration is likely to succeed on all channels. To verify, fetch the last config
block of the system channel from the ordering node, verifying that the `State`
is now `STATE_NORMAL`. For completeness, verify this on each ordering node.

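Continuing the same hypothetical sketches, the edit and the subsequent
verification might look like this:

```bash
# Re-fetch and re-decode config.json from the latest config block first, then
# flip only the State back to normal; Type and Metadata stay as already set.
jq '.channel_group.groups.Orderer.values.ConsensusType.value.state = "STATE_NORMAL"' \
  config.json > modified_config.json
# Encode, compute the update, wrap, sign, and submit as before. Then verify:
peer channel fetch config latest_config.pb -o $ORDERER -c $CH_NAME --tls --cafile $TLS_CA
configtxlator proto_decode --input latest_config.pb --type common.Block | \
  jq '.data.data[0].payload.data.config.channel_group.groups.Orderer.values.ConsensusType.value.state'
```
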
When this process is completed, the ordering service is now ready to accept all
transactions on all channels. If you stopped your peers and application as
recommended, you may now restart them.

## Abort and rollback

If a problem emerges during the migration process **before exiting maintenance
mode**, simply perform the rollback procedure below.

1. Shut down the ordering nodes and the Kafka service (servers and Zookeeper
   ensemble).
2. Roll back the file system of these servers to the backup taken at maintenance
   mode before changing the `ConsensusType`.
3. Restart said servers; the ordering nodes will bootstrap to Kafka in
   maintenance mode.
4. Send a configuration update exiting maintenance mode to continue using Kafka
   as your consensus mechanism, or resume the instructions after the point of
   backup, fix the error which prevented a Raft quorum from forming, and retry
   migration with corrected Raft configuration `Metadata`.

There are a few states which might indicate migration has failed:

1. Some nodes crash or shut down.
2. There is no record of a successful leader election per channel in the logs.
3. The attempt to flip to `STATE_NORMAL` mode on the system channel fails.

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/) -->