# Configuring and operating a Raft ordering service

**Audience**: *Raft ordering node admins*

Note: this topic describes the process for configuring a Raft ordering service that has not been bootstrapped with a system channel genesis block. For a version of this topic that includes information about the system channel, check out [Configuring and operating a Raft ordering service](https://hyperledger-fabric.readthedocs.io/en/release-2.2/raft_configuration.html).

## Conceptual overview

For a high-level overview of the concept of ordering and of how the supported
ordering service implementations (including Raft) work, check out our
conceptual documentation on the [Ordering Service](./orderer/ordering_service.html).

To learn about the process of setting up an ordering node, check out our
documentation on [Planning for an ordering service](./deployorderer/ordererplan.html).

## Configuration

A Raft cluster is configured in two places:

  * **Local configuration**: Governs node-specific aspects, such as TLS
  communication, replication behavior, and file storage.

  * **Channel configuration**: Defines the membership of the Raft cluster for the
  corresponding channel, as well as protocol-specific parameters such as heartbeat
  frequency, leader timeouts, and more.

Raft nodes identify each other using TLS pinning, so in order to impersonate a
Raft node, an attacker needs to obtain the **private key** of its TLS
certificate. As a result, it is not possible to run a Raft node without a valid
TLS configuration.

Recall that each channel has its own instance of the Raft protocol running. Thus, a
Raft node must be referenced in the configuration of each channel it belongs to
by adding its server and client TLS certificates (in `PEM` format) to the channel
config. This ensures that when other nodes receive a message from it, they can
securely confirm the identity of the node that sent the message.

The following section from `configtx.yaml` shows three Raft nodes (also called
“consenters”) in the channel:

```
       Consenters:
            - Host: raft0.example.com
              Port: 7050
              ClientTLSCert: path/to/ClientTLSCert0
              ServerTLSCert: path/to/ServerTLSCert0
            - Host: raft1.example.com
              Port: 7050
              ClientTLSCert: path/to/ClientTLSCert1
              ServerTLSCert: path/to/ServerTLSCert1
            - Host: raft2.example.com
              Port: 7050
              ClientTLSCert: path/to/ClientTLSCert2
              ServerTLSCert: path/to/ServerTLSCert2
```

When the channel config block is created, the `configtxgen` tool reads the paths
to the TLS certificates, and replaces the paths with the corresponding bytes of
the certificates.

Note: it is possible to dynamically add an ordering node to a channel or remove one from it without affecting the other nodes, a process described in the Reconfiguration section below.

### Local configuration

The `orderer.yaml` has two configuration sections that are relevant for Raft
orderers:

**Cluster**, which determines the TLS communication configuration, and
**Consensus**, which determines where Write Ahead Logs and Snapshots are
stored.

**Cluster parameters:**

By default, the Raft service is running on the same gRPC server as the client
facing server (which is used to send transactions or pull blocks), but it can be
configured to have a separate gRPC server with a separate port.

This is useful for cases where you want TLS certificates issued by the
organizational CAs, but used only by the cluster nodes to communicate among each
other, and TLS certificates issued by a public TLS CA for the client facing API.

  * `ClientCertificate`, `ClientPrivateKey`: The file path of the client TLS certificate
  and corresponding private key.
  * `ListenPort`: The port the cluster listens on.
  It must be the same as `consenters[i].Port` in the channel configuration.
  If blank, the port is the same as the orderer general port (`general.listenPort`).
  * `ListenAddress`: The address the cluster service is listening on.
  * `ServerCertificate`, `ServerPrivateKey`: The TLS server certificate key pair
  which is used when the cluster service is running on a separate gRPC server
  (different port).

Note: `ListenPort`, `ListenAddress`, `ServerCertificate`, `ServerPrivateKey` must
be either set together or unset together.
If they are unset, they are inherited from the general TLS section,
for example `general.tls.{privateKey, certificate}`.
When general TLS is disabled:
 - Use a different `ListenPort` than the orderer general port.
 - Properly configure TLS root CAs in the channel configuration.

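
When running the cluster service on its own listener, the corresponding part of
`orderer.yaml` might look like the following minimal sketch (the address, port,
and all file paths are placeholders to adapt to your deployment):

```
General:
    Cluster:
        ListenAddress: 0.0.0.0
        ListenPort: 7051
        ServerCertificate: /var/hyperledger/orderer/tls-cluster/server.crt
        ServerPrivateKey: /var/hyperledger/orderer/tls-cluster/server.key
        ClientCertificate: /var/hyperledger/orderer/tls-cluster/client.crt
        ClientPrivateKey: /var/hyperledger/orderer/tls-cluster/client.key
```

If these listener-related keys are left unset, the cluster service shares the
orderer's general listener and TLS material instead.
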
There are also hidden configuration parameters for `general.cluster` which can be
used to further fine tune the cluster communication or replication mechanisms:

  * `SendBufferSize`: Regulates the number of messages in the egress buffer.
  * `DialTimeout`, `RPCTimeout`: Specify the timeouts of creating connections and
  establishing streams.
  * `ReplicationBufferSize`: The maximum number of bytes that can be allocated
  for each in-memory buffer used for block replication from other cluster nodes.
  Each channel has its own memory buffer. Defaults to `20971520` which is `20MB`.
  * `PullTimeout`: The maximum duration the ordering node will wait for a block
  to be received before it aborts. Defaults to five seconds.
  * `ReplicationRetryTimeout`: The maximum duration the ordering node will wait
  between two consecutive attempts. Defaults to five seconds.
  * `ReplicationBackgroundRefreshInterval`: The time between two consecutive
  attempts to replicate existing channels that this node was added to, or
  channels that this node failed to replicate in the past. Defaults to five
  minutes.
  * `TLSHandshakeTimeShift`: If the TLS certificates of the ordering nodes
  expire and are not replaced in time (see TLS certificate rotation below),
  communication between them cannot be established, and it will be impossible
  to send new transactions to the ordering service.
  To recover from such a scenario, it is possible to make TLS handshakes
  between ordering nodes consider the time to be shifted backwards by the
  amount configured in `TLSHandshakeTimeShift`.
  This setting only applies when a separate cluster listener is in use. If
  the cluster service is sharing the orderer's main gRPC server, then instead
  specify `TLSHandshakeTimeShift` in the `General.TLS` section.
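
Because these parameters are hidden, they do not appear in the sample
`orderer.yaml`; to override a default, add the key under `General.Cluster`. A
minimal sketch with illustrative values (not recommendations) might look like:

```
General:
    Cluster:
        SendBufferSize: 100
        DialTimeout: 5s
        RPCTimeout: 7s
        ReplicationBufferSize: 20971520
        PullTimeout: 5s
        ReplicationRetryTimeout: 5s
        ReplicationBackgroundRefreshInterval: 5m
```
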

**Consensus parameters:**

  * `WALDir`: the location at which Write Ahead Logs for `etcd/raft` are stored.
  Each channel will have its own subdirectory named after the channel ID.
  * `SnapDir`: specifies the location at which snapshots for `etcd/raft` are stored.
  Each channel will have its own subdirectory named after the channel ID.
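
In `orderer.yaml` these live under the `Consensus` section. The paths shown
below follow the Fabric sample configuration and should be adapted to your
deployment:

```
Consensus:
    WALDir: /var/hyperledger/production/orderer/etcdraft/wal
    SnapDir: /var/hyperledger/production/orderer/etcdraft/snapshot
```
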

There are also two hidden configuration parameters that can each be set by adding
them to the Consensus section in the `orderer.yaml`:

  * `EvictionSuspicion`: The cumulative period of time of channel eviction
  suspicion that triggers the node to pull blocks from other nodes and see if it
  has been evicted from the channel in order to confirm its suspicion. If the
  suspicion is confirmed (the inspected block doesn't contain the node's TLS
  certificate), the node halts its operation for that channel. A node suspects
  it has been evicted from a channel when it doesn't know about any elected
  leader and cannot itself be elected as leader in the channel. Defaults to 10
  minutes.
  * `TickIntervalOverride`: If set, this value will be preferred over the tick
  interval configured in all channels where this ordering node is a consenter.
  This value should be set only with great care, as a mismatch in tick interval
  across orderers could result in a loss of quorum for one or more channels.
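
A sketch of these two hidden keys added alongside the `Consensus` section shown
above (`EvictionSuspicion` is shown at its default; the `TickIntervalOverride`
value is purely illustrative):

```
Consensus:
    WALDir: /var/hyperledger/production/orderer/etcdraft/wal
    SnapDir: /var/hyperledger/production/orderer/etcdraft/snapshot
    EvictionSuspicion: 10m
    TickIntervalOverride: 500ms
```
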

### Channel configuration

Apart from the (already discussed) consenters, the Raft channel configuration has
an `Options` section which relates to protocol-specific knobs. It is currently
not possible to change these values dynamically while a node is running. The
nodes have to be reconfigured and restarted.

The only exception is `SnapshotIntervalSize`, which can be adjusted at runtime.

Note: It is recommended to avoid changing the following values, as a misconfiguration
might lead to a state where a leader cannot be elected at all (i.e., if the
`TickInterval` and `ElectionTick` are extremely low). Situations where a leader
cannot be elected are impossible to resolve, as leaders are required to make
changes. Because of such dangers, we suggest not tuning these parameters for most
use cases.

  * `TickInterval`: The time interval between two `Node.Tick` invocations.
  * `ElectionTick`: The number of `Node.Tick` invocations that must pass between
  elections. That is, if a follower does not receive any message from the leader
  of the current term before `ElectionTick` has elapsed, it will become a candidate
  and start an election. `ElectionTick` must be greater than `HeartbeatTick`.
  * `HeartbeatTick`: The number of `Node.Tick` invocations that must pass between
  heartbeats. That is, a leader sends heartbeat messages to maintain its
  leadership every `HeartbeatTick` ticks.
  * `MaxInflightBlocks`: Limits the max number of in-flight append blocks during
  the optimistic replication phase.
  * `SnapshotIntervalSize`: Defines the number of bytes of data after which a
  snapshot is taken.
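
For reference, these knobs appear under the `EtcdRaft` `Options` section of
`configtx.yaml`. The values below are the defaults shipped in the Fabric sample
configuration; confirm against the sample file of your release before relying on
them:

```
        EtcdRaft:
            Options:
                TickInterval: 500ms
                ElectionTick: 10
                HeartbeatTick: 1
                MaxInflightBlocks: 5
                SnapshotIntervalSize: 16 MB
```
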

## Reconfiguration

The Raft orderer supports dynamic (meaning, while the channel is being serviced)
addition and removal of nodes as long as only one node is added or removed at a
time. Note that your cluster must be operational and able to achieve consensus
before you attempt to reconfigure it. For instance, if you have three nodes, and
two nodes fail, you will not be able to reconfigure your cluster to remove those
nodes. Similarly, if you have one failed node in a channel with three nodes, you
should not attempt to rotate a certificate, as this would induce a second fault.
As a rule, you should never attempt any configuration changes to the Raft
consenters, such as adding or removing a consenter or rotating a consenter's
certificate, unless all consenters are online and healthy.

If you do decide to change these parameters, it is recommended to only attempt
such a change during a maintenance cycle. Problems are most likely to occur when
a configuration change is attempted in clusters with only a few nodes while a node
is down. For example, if you have three nodes in your consenter set and one of them
is down, it means you have two out of three nodes alive. If you extend the cluster
to four nodes while in this state, you will have only two out of four nodes alive,
which is not a quorum. The fourth node won't be able to onboard because nodes can
only onboard to functioning clusters (unless the total size of the cluster is
one or two).

So by extending a cluster of three nodes to four nodes (while only two are
alive) you are effectively stuck until the original offline node is resurrected.

To add a new node to the ordering service:

  1. **Ensure the orderer organization that owns the new node is one of the orderer organizations on the channel**. If the orderer organization is not an administrator, the node will be unable to pull blocks as a follower or be joined to the consenter set.
  2. **Start the new ordering node**. For information about how to deploy an ordering node, check out [Planning for an ordering service](./deployorderer/ordererdeploy.html). Note that when you use the `osnadmin` CLI to create and join a channel, you do not need to point to a configuration block when starting the node.
  3. **Use the `osnadmin` CLI to add the first orderer to the channel**. For more information, check out the [Create a channel](./create_channel/create_channel_participation.html#step-two-use-the-osnadmin-cli-to-add-the-first-orderer-to-the-channel) tutorial.
  4. **Wait for the Raft node to replicate the blocks** from existing nodes for all channels its certificates have been added to. When an ordering node is added to a channel, it is added as a "follower", a state in which it can replicate blocks but is not part of the "consenter set" actively servicing the channel. When the node finishes replicating the blocks, its status should change from "onboarding" to "active". Note that an "active" ordering node is still not part of the consenter set.
  5. **Add the new ordering node to the consenter set**. For more information, check out the [Create a channel](./create_channel/create_channel_participation.html#step-three-join-additional-ordering-nodes) tutorial.

It is possible to add a node that is already running (and participates in some
channels already) to a channel while the node itself is running. To do this, simply
add the node’s certificate to the channel config of the channel. The node will
autonomously detect its addition to the new channel (the detection interval
defaults to five minutes, but if you want the node to detect the new channel more
quickly, restart the node) and will pull the channel blocks from an orderer in the
channel, and then start the Raft instance for that chain.

After it has successfully done so, the channel configuration can be updated to
include the endpoint of the new Raft orderer.

To remove an ordering node from the consenter set of a channel, use the `osnadmin channel remove` command to remove its endpoint and certificates from the channel. For more information, check out [Add or remove orderers from existing channels](./create_channel/create_channel_participation.html#add-or-remove-orderers-from-existing-channels).

Once an ordering node is removed from the channel, the other ordering nodes stop communicating with the removed orderer in the context of the removed channel. They might still be communicating on other channels.

The node that is removed from the channel automatically detects its removal either immediately or after `EvictionSuspicion` time has passed (10 minutes by default) and shuts down its Raft instance on that channel.

If the intent is to delete the node entirely, remove it from all channels before shutting down the node.

### TLS certificate rotation for an orderer node

All TLS certificates have an expiration date that is determined by the issuer.
These expiration dates can range from 10 years from the date of issuance to as
little as a few months, so check with your issuer. Before the expiration date,
you will need to rotate these certificates on the node itself and every channel
the node is joined to.

**Note:** In case the public key of the TLS certificate remains the same,
there is no need to issue channel configuration updates.

For each channel the node participates in:

  1. Update the channel configuration with the new certificates.
  2. Replace its certificates in the file system of the node.
  3. Restart the node.

Because a node can only have a single TLS certificate key pair, the node will be
unable to service channels to which its new certificates have not yet been added
during the update process, degrading fault tolerance. Because of this,
**once the certificate rotation process has been started, it should be completed
as quickly as possible.**

If for some reason the rotation of the TLS certificates has started but cannot
complete in all channels, it is advised to rotate the TLS certificates back to
what they were and attempt the rotation later.

### Certificate expiration related authentication

Whenever a client with an identity that has an expiration date (such as an identity based on an x509 certificate)
sends a transaction to the orderer, the orderer checks whether the identity has expired, and if
so, rejects the transaction submission.

However, it is possible to configure the orderer to ignore expiration of identities by enabling
the `General.Authentication.NoExpirationChecks` configuration option in the `orderer.yaml`.
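
A minimal sketch of this override in `orderer.yaml` (add the key under
`General.Authentication` if it is not already present):

```
General:
    Authentication:
        NoExpirationChecks: true
```
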

This should be done only under extreme circumstances, where the certificates of the administrators
have expired and, because of this, it is not possible to send configuration updates that replace the
administrator certificates with renewed ones, since config transactions signed by the expired
administrator identities are rejected.
After updating the channel, it is recommended to change back to the default configuration, which enforces
expiration checks on identities.

## Metrics

For a description of the Operations Service and how to set it up, check out
[our documentation on the Operations Service](operations_service.html).

For a list of the metrics that are gathered by the Operations Service, check out
our [reference material on metrics](metrics_reference.html).

While the metrics you prioritize will have a lot to do with your particular use
case and configuration, there are a few metrics in particular you might want to
monitor:

* `consensus_etcdraft_is_leader`: identifies which node in the cluster is
   currently the leader. If no nodes have this set, you have lost quorum.
* `consensus_etcdraft_data_persist_duration`: indicates how long write operations
   to the Raft cluster's persistent write ahead log take. For protocol safety,
   messages must be persisted durably, calling `fsync` where appropriate, before
   they can be shared with the consenter set. If this value begins to climb, this
   node may not be able to participate in consensus (which could lead to a
   service interruption for this node and possibly the network).
* `consensus_etcdraft_cluster_size` and `consensus_etcdraft_active_nodes`: these
   channel metrics help track the "active" nodes (which, as it sounds, are the nodes that
   are currently contributing to the cluster, as compared to the total number of
   nodes in the cluster). If the number of active nodes falls below a majority of
   the nodes in the cluster, quorum will be lost and the ordering service will
   stop processing blocks on the channel.
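
To collect these metrics, the operations endpoint must be enabled in
`orderer.yaml`. A minimal sketch assuming a Prometheus scraper (the listen
address is a placeholder; see the Operations Service documentation for TLS and
StatsD options):

```
Operations:
    ListenAddress: 127.0.0.1:8443
Metrics:
    Provider: prometheus
```
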

## Troubleshooting

* The more stress you put on your nodes, the more you might have to change certain
parameters. As with any system, computer or mechanical, stress can lead to a drag
in performance. As we noted in the conceptual documentation, leader elections in
Raft are triggered when follower nodes do not receive either a "heartbeat"
message or an "append" message that carries data from the leader for a certain
amount of time. Because Raft nodes share the same communication layer across
channels (this does not mean they share data --- they do not!), if a Raft node is
part of the consenter set in many channels, you might want to lengthen the amount
of time it takes to trigger an election to avoid inadvertent leader elections.
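
For example, with the default `TickInterval` of 500ms, raising `ElectionTick`
from its default of 10 to 20 in the channel's `EtcdRaft` `Options` lengthens the
election timeout from roughly 5 seconds to roughly 10 seconds (illustrative
values only; see the cautions in the Channel configuration section before tuning
these parameters):

```
        EtcdRaft:
            Options:
                TickInterval: 500ms
                ElectionTick: 20
                HeartbeatTick: 1
```
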

<!--- Licensed under Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/ -->