github.com/onflow/flow-go@v0.35.7-crescendo-preview.23-atree-inlining/consensus/hotstuff/cruisectl/README.md

# Cruise Control: Automated Block Time Adjustment for Precise Epoch Switchover Timing

# Overview

## Context

Epochs have a fixed length, measured in views.
The actual view rate of the network varies depending on network conditions, e.g. load, number of offline replicas, etc.
We would like consensus nodes to observe the actual view rate of the committee and adjust how quickly they proceed
through views accordingly, to target a desired weekly epoch switchover time.

## High-Level Design

The `BlockTimeController` observes the current view rate and adjusts the time at which the proposal should be released.
It is a [PID controller](https://en.wikipedia.org/wiki/PID_controller). The essential idea is to take into account the
current error, the rate of change of the error, and the cumulative error, when determining how much compensation to apply.
The compensation function $u[v]$ has three terms:

- $P[v]$ compensates proportionally to the magnitude of the instantaneous error
- $I[v]$ compensates proportionally to the magnitude of the error and how long it has persisted
- $D[v]$ compensates proportionally to the rate of change of the error


📚 This document uses ideas from:

- the paper [Fast self-tuning PID controller specially suited for mini robots](https://www.frba.utn.edu.ar/wp-content/uploads/2021/02/EWMA_PID_7-1.pdf)
- the ‘Leaky Integrator’ [[forum discussion](https://engineering.stackexchange.com/questions/29833/limiting-the-integral-to-a-time-window-in-pid-controller), [technical background](https://www.music.mcgill.ca/~gary/307/week2/node4.html)]


### Choice of Process Variable: Targeted Epoch Switchover Time

The process variable is the variable which:

- has a target desired value, or setpoint ($SP$)
- is successively measured by the controller to compute the error $e$

---
👉 The `BlockTimeController` controls the progression through views, such that the epoch switchover happens at the intended point in time. We define:

- $\gamma = k\cdot \tau_0$ is the remaining epoch duration of a hypothetical ideal system, where *all* remaining $k$ views of the epoch progress with the ideal view time  $\tau_0$.
- The parameter $\tau_0$ is computed solely based on the Epoch configuration as
  $\tau_0 := \frac{<{\rm total\ epoch\ time}>}{<{\rm total\ views\ in\ epoch}>}$ (for mainnet 22, Epoch 75, we have $\tau_0 \simeq$  1250ms).
- $\Gamma$ is the *actual* time remaining until the desired epoch switchover.

The error, which the controller should drive towards zero, is defined as:

```math
e := \gamma - \Gamma
```
---


From our definition it follows that:

- $e > 0$ implies that the estimated epoch switchover (assuming ideal system behaviour) happens too late. Therefore, to hit the desired epoch switchover time, the time we spend in views has to be *smaller* than $\tau_0$.
- $e < 0$ implies that the estimated epoch switchover happens too early. Therefore, we should slow down and spend *more* than $\tau_0$ in the following views.
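
To make the sign convention concrete, consider a worked example with made-up numbers (they are not taken from any real epoch): $k = 1000$ views remain, $\tau_0 = 1.25$ s, and $\Gamma = 1200$ s remain until the desired switchover.

```math
\gamma = k\cdot\tau_0 = 1000 \cdot 1.25\,\textrm{s} = 1250\,\textrm{s}, \qquad e = \gamma - \Gamma = 1250\,\textrm{s} - 1200\,\textrm{s} = 50\,\textrm{s} > 0
```

In this example the ideal system would finish 50 s too late, so the remaining views would need to average $\Gamma / k = 1.2$ s, which is less than $\tau_0$.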

**Reasoning:**

The desired idealized system behaviour would be a constant view duration $\tau_0$ throughout the entire epoch.

However, in the real-world system we have disturbances (varying message relay times, slow or offline nodes, etc.) and measurement uncertainty (a node can only observe its local view times, but not the committee’s collective swarm behaviour).

<img src='https://github.com/onflow/flow-go/blob/master/docs/CruiseControl_BlockTimeController/PID_controller_for_block-rate-delay.png' width='600'>


After a disturbance, we want the controller to drive the system back to a state where it can closely follow the ideal behaviour from there on.

- Simulations have shown that this approach produces a *very* stable controller with the intended behaviour.

    **Controller driving  $e := \gamma - \Gamma \rightarrow 0$**
    - With the derivative term disabled ($K_d=0$), the controller responds as expected with damped oscillatory behaviour
      to a singular strong disturbance. Setting $K_d=3$ suppresses oscillations and the controller's performance improves as it responds more effectively.

      <img src='https://github.com/onflow/flow-go/blob/master/docs/CruiseControl_BlockTimeController/EpochSimulation_029.png' width='900'>

      <img src='https://github.com/onflow/flow-go/blob/master/docs/CruiseControl_BlockTimeController/EpochSimulation_030.png' width='900'>

    - The controller very quickly compensates for moderate disturbances and observational noise in a well-behaved system:

      <img src='https://github.com/onflow/flow-go/blob/master/docs/CruiseControl_BlockTimeController/EpochSimulation_028.png' width='900'>

    - The controller effectively compensates for a massive anomaly (100s network partition):

      <img src='https://github.com/onflow/flow-go/blob/master/docs/CruiseControl_BlockTimeController/EpochSimulation_000.png' width='900'>

    - The controller effectively stabilizes the system under continued larger disturbances (20% of consensus participants offline) and notable observational noise:

      <img src='https://github.com/onflow/flow-go/blob/master/docs/CruiseControl_BlockTimeController/EpochSimulation_005-0.png' width='900'>

    **References:**

    - Statistical model for happy-path view durations: [PID controller for ``block-rate-delay``](https://www.notion.so/ID-controller-for-block-rate-delay-cc9c2d9785ac4708a37bb952557b5ef4?pvs=21)
    - Python implementation with additional disturbances (offline nodes) and observational noise: GitHub repo [flow-internal/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller](https://github.com/dapperlabs/flow-internal/tree/master/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller) → [controller_tuning_v01.py](https://github.com/dapperlabs/flow-internal/blob/master/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller/controller_tuning_v01.py)

# Detailed PID controller specification

Each consensus participant runs a local instance of the controller described below. Hence, all the quantities are based on the node’s local observation.

## Definitions

**Observables** (quantities provided to the node or directly measurable by the node):

- $v$ is the node’s current view
- ideal view time $\tau_0$ is computed solely based on the Epoch configuration:
$\tau_0 := \frac{<{\rm total\ epoch\ time}>}{<{\rm total\ views\ in\ epoch}>}$  (for mainnet 22, Epoch 75, we have $\tau_0 \simeq$  1250ms).
- $t[v]$ is the time the node entered view $v$
- $F[v]$  is the final view of the current epoch
- $T[v]$ is the target end time of the current epoch

**Derived quantities**

- remaining views of the epoch $k[v] := F[v] +1 - v$
- time remaining until the desired epoch switchover $\Gamma[v] := T[v]-t[v]$
- error $e[v] := \underbrace{k[v]\cdot\tau_0}_{\gamma[v]} - \Gamma[v] = t[v] + k[v] \cdot\tau_0 - T[v]$
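
The derived quantities translate directly into code. The following Python sketch is illustrative only: `EpochTiming`, `timing_error` and all field names are hypothetical, times are assumed to be in seconds, and $\tau_0$ is assumed to be the epoch's target duration divided by its number of views.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochTiming:
    """Hypothetical container for the epoch configuration (illustrative names)."""
    first_view: int         # first view of the epoch
    final_view: int         # F[v]: final view of the epoch
    start_time: float       # epoch start time, in seconds
    target_end_time: float  # T[v]: target end time of the epoch, in seconds

    @property
    def ideal_view_time(self) -> float:
        """tau_0 = total epoch time / total views in epoch."""
        total_views = self.final_view - self.first_view + 1
        return (self.target_end_time - self.start_time) / total_views


def timing_error(epoch: EpochTiming, view: int, entered_at: float) -> float:
    """e[v] = k[v] * tau_0 - Gamma[v]."""
    remaining_views = epoch.final_view + 1 - view        # k[v] = F[v] + 1 - v
    remaining_time = epoch.target_end_time - entered_at  # Gamma[v] = T[v] - t[v]
    return remaining_views * epoch.ideal_view_time - remaining_time


# Usage: a toy epoch of 1000 views with tau_0 = 1.25 s.
epoch = EpochTiming(first_view=0, final_view=999, start_time=0.0, target_end_time=1250.0)
behind = timing_error(epoch, view=500, entered_at=650.0)  # positive: behind schedule
ahead = timing_error(epoch, view=500, entered_at=600.0)   # negative: ahead of schedule
```

A positive result means the node reached the view later than the ideal schedule, matching the sign convention for $e$ above.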

### Precise convention of View Timing

Upon observing block `B` with view $v$, the controller updates its internal state.

Note the '+1' term in the computation of the remaining views $k[v] := F[v] +1 - v$. This is related to our convention that the epoch begins (happy path) when observing the first block of the epoch. Only upon observing this block do the nodes transition to the first view of the epoch. Up to that point, the consensus replicas remain in the last view of the previous epoch, in the state of `having processed the last block of the old epoch and voted for it` (happy path). Replicas remain in this state until they see a confirmation of the view (either QC or TC for the last view of the previous epoch).

<img src='https://github.com/onflow/flow-go/blob/master/docs/CruiseControl_BlockTimeController/ViewDurationConvention.png' width='600'>

In accordance with this convention, observing the proposal for the last view of an epoch marks the start of the last view. By observing the proposal, nodes enter the last view, verify the block, and vote for it; the primary aggregates the votes and constructs the child (for the first view of the new epoch). The last view of the epoch ends when the child proposal is published.

### Controller

The goal of the controller is to drive the system towards an error of zero, i.e. $e[v] \rightarrow 0$. For a [PID controller](https://en.wikipedia.org/wiki/PID_controller), the output $u$ for view $v$ has the form:

```math
u[v] = K_p \cdot e[v]+K_i \cdot \mathcal{I}[v] + K_d \cdot \Delta[v]
```

With error terms (computed from observations)

- $e[v]$ representing the *instantaneous* error as of view $v$
(commonly referred to as ‘proportional term’)
- $\mathcal{I} [v] = \sum_{i \leq v} e[i]$ the sum of the errors up to and including view $v$
(commonly referred to as ‘integral term’)
- $\Delta[v]=e[v]-e[v-1]$ the rate of change of the error
(commonly referred to as ‘derivative term’)

and controller parameters (values derived from controller tuning):

- $K_p$ is the proportional coefficient
- $K_i$ is the integral coefficient
- $K_d$ is the derivative coefficient
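
As a minimal sketch of the unfiltered formula above, the following Python function evaluates $u[v]$ for a sequence of errors. The function name `pid_outputs` is hypothetical, and treating the error before the first observation as zero is an assumption made for this illustration.

```python
def pid_outputs(errors, k_p, k_i, k_d):
    """Evaluate u[v] = K_p*e[v] + K_i*I[v] + K_d*Delta[v] for each error in order."""
    outputs = []
    integral = 0.0        # I[v]: running sum of all errors so far
    previous_error = 0.0  # assumption: e[v-1] := 0 before the first observation
    for error in errors:
        integral += error
        delta = error - previous_error  # Delta[v] = e[v] - e[v-1]
        outputs.append(k_p * error + k_i * integral + k_d * delta)
        previous_error = error
    return outputs


# Usage with arbitrary example gains (not the tuned parameters of this document).
example = pid_outputs([1.0, 1.0, 2.0], k_p=2.0, k_i=0.5, k_d=3.0)
```

Note that the integral term keeps growing as long as the error has a constant sign, even while the proportional and derivative terms stay flat.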

## Measuring view duration

Each consensus participant observes the error $e[v]$ based on its local view evolution. As the following figure illustrates, the view duration is highly variable on small time scales.

![](/docs/CruiseControl_BlockTimeController/ViewRate.png)

Therefore, we expect $e[v]$ to be highly variable. Furthermore, note that a node uses its local view transition times as an estimator for the collective behaviour of the entire committee. This adds observational noise that obscures the underlying collective behaviour. Hence, we expect notable noise in $e[v]$.

## Managing noise

Noisy values for $e[v]$ also impact the derivative term $\Delta[v]$ and integral term $\mathcal{I}[v]$. This can impact the controller’s performance.

### **Managing noise in the proportional term**

An established approach for managing noise in observables is to use an [exponentially weighted moving average [EWMA]](https://en.wikipedia.org/wiki/Moving_average) instead of the instantaneous values. Specifically, let $\bar{e}[v]$ denote the EWMA of the instantaneous error, which is computed as follows:

```math
\eqalign{
\textnormal{initialization: }\quad \bar{e} :&= 0 \\
\textnormal{update with instantaneous error\ } e[v]:\quad \bar{e}[v] &= \alpha \cdot e[v] + (1-\alpha)\cdot \bar{e}[v-1]
}
```

The parameter $\alpha$ relates to the averaging time window. Let $\alpha \equiv \frac{1}{N_\textnormal{ewma}}$ and consider that the input changes from $x_\textnormal{old}$ to $x_\textnormal{new}$ as a step function. Then $N_\textnormal{ewma}$ is the number of samples required to move the output average about 2/3 of the way from  $x_\textnormal{old}$ to $x_\textnormal{new}$.

See also the [Python `Ewma` implementation](https://github.com/dapperlabs/flow-internal/blob/423d927421c073e4c3f66165d8f51b829925278f/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller/controller_tuning_v01.py#L405-L431).
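
The update rule and the step-response claim can be checked with a few lines of Python. This is a minimal sketch written for this document; it is not a copy of the linked `Ewma` class, and the helper `step_response` is hypothetical.

```python
class Ewma:
    """Exponentially weighted moving average, initialised to 0 by default."""

    def __init__(self, alpha: float, initial: float = 0.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = initial

    def add(self, sample: float) -> float:
        # e_bar[v] = alpha * e[v] + (1 - alpha) * e_bar[v-1]
        self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value


def step_response(n_ewma: int, steps: int) -> list:
    """Feed a unit step (x_old = 0, x_new = 1) and record the average after each sample."""
    ewma = Ewma(alpha=1.0 / n_ewma)
    return [ewma.add(1.0) for _ in range(steps)]


response = step_response(n_ewma=5, steps=5)
```

With $N_\textnormal{ewma} = 5$, the last entry of `response` equals $1 - 0.8^5 \approx 0.67$, i.e. about 2/3 of the way to the new value after $N_\textnormal{ewma}$ samples.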

### **Managing noise in the integral term**

Systematic observation bias is particularly problematic, as it leads to a diverging integral term. The commonly adopted approach is to use a ‘leaky integrator’ [[1](https://www.music.mcgill.ca/~gary/307/week2/node4.html), [2](https://engineering.stackexchange.com/questions/29833/limiting-the-integral-to-a-time-window-in-pid-controller)], which we denote as $\bar{\mathcal{I}}[v]$.

```math
\eqalign{
\textnormal{initialization: }\quad \bar{\mathcal{I}} :&= 0 \\
\textnormal{update with instantaneous error\ } e[v]:\quad \bar{\mathcal{I}}[v] &= e[v] + (1-\lambda)\cdot\bar{\mathcal{I}}[v-1]
}
```

Intuitively, the loss factor $\lambda$ relates to the time window of the integrator. A factor of 0 means an infinite time horizon, while $\lambda =1$  makes the integrator only memorize the last input. Let  $\lambda \equiv \frac{1}{N_\textnormal{itg}}$ and consider a constant input value $x$. Then $N_\textnormal{itg}$ relates to the number of past samples that the integrator remembers:

- the integrator's output will saturate at $x\cdot N_\textnormal{itg}$
- an integrator initialized with 0 reaches roughly 2/3 of the saturation value $x\cdot N_\textnormal{itg}$ after consuming $N_\textnormal{itg}$ inputs

See also the [Python `LeakyIntegrator` implementation](https://github.com/dapperlabs/flow-internal/blob/423d927421c073e4c3f66165d8f51b829925278f/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller/controller_tuning_v01.py#L444-L468).
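
Both properties of the leaky integrator can be reproduced numerically. The class below is a minimal sketch written for this document rather than the linked `LeakyIntegrator`; the helper `feed_constant` and the input value `x = 2.0` are arbitrary choices for illustration.

```python
class LeakyIntegrator:
    """Integrator that forgets a fraction `lam` of its state on every update."""

    def __init__(self, lam: float, initial: float = 0.0):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must be in [0, 1]")
        self.lam = lam
        self.value = initial

    def add(self, sample: float) -> float:
        # I_bar[v] = e[v] + (1 - lambda) * I_bar[v-1]
        self.value = sample + (1.0 - self.lam) * self.value
        return self.value


def feed_constant(n_itg: int, x: float, steps: int) -> float:
    """Feed the constant input x for `steps` updates and return the final output."""
    integrator = LeakyIntegrator(lam=1.0 / n_itg)
    for _ in range(steps):
        integrator.add(x)
    return integrator.value


n_itg, x = 50, 2.0
after_n = feed_constant(n_itg, x, steps=n_itg)   # after N_itg inputs
saturated = feed_constant(n_itg, x, steps=2000)  # long-run behaviour
```

For a constant input the output after $n$ updates is $x \cdot N_\textnormal{itg} \cdot \big(1-(1-\lambda)^n\big)$, so `saturated` approaches $x\cdot N_\textnormal{itg} = 100$ and `after_n` is about 64% of that value for $N_\textnormal{itg} = 50$.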

### **Managing noise in the derivative term**

Similarly to the proportional term, we apply an EWMA to the derivative term and denote the averaged value as $\bar{\Delta}[v]$:

```math
\eqalign{
\textnormal{initialization: }\quad \bar{\Delta} :&= 0 \\
\textnormal{update with instantaneous error\ } e[v]:\quad \bar{\Delta}[v] &= \bar{e}[v] - \bar{e}[v-1]
}
```
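
The formula above is written as the difference of two consecutive EWMA values, whereas the text speaks of applying an EWMA to the derivative term. Because the EWMA is linear, both readings give the same result when all quantities are initialised to zero. The Python sketch below checks this numerically; the function names are hypothetical and the error values are arbitrary.

```python
def ewma_series(samples, alpha):
    """Return the EWMA after each sample, starting from an average of 0."""
    averages = []
    average = 0.0
    for sample in samples:
        average = alpha * sample + (1.0 - alpha) * average
        averages.append(average)
    return averages


def smoothed_delta(errors, alpha):
    """Delta_bar[v] = e_bar[v] - e_bar[v-1], with e_bar initialised to 0."""
    smoothed = ewma_series(errors, alpha)
    previous = [0.0] + smoothed[:-1]
    return [current - prev for current, prev in zip(smoothed, previous)]


def ewma_of_raw_delta(errors, alpha):
    """EWMA applied to Delta[v] = e[v] - e[v-1], assuming a zero error before the first view."""
    previous = [0.0] + list(errors[:-1])
    raw_delta = [current - prev for current, prev in zip(errors, previous)]
    return ewma_series(raw_delta, alpha)


errors = [0.5, -0.2, 0.9, 0.4, -1.3, 0.0, 2.1]
from_smoothed_errors = smoothed_delta(errors, alpha=0.2)
from_raw_deltas = ewma_of_raw_delta(errors, alpha=0.2)
```

Both lists agree up to floating-point rounding, so only $\bar{e}[v]$ and its previous value need to be stored to obtain $\bar{\Delta}[v]$.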

## Final formula for PID controller

We have used a statistical model of the view duration extracted from mainnet 22 (Epoch 75) and manually added disturbances, observational noise, and systemic observational bias.

The following parameters have proven to generate stable controller behaviour over a large variety of network conditions:

---
👉 The controller is given by

```math
u[v] = K_p \cdot \bar{e}[v]+K_i \cdot \bar{\mathcal{I}}[v] + K_d \cdot \bar{\Delta}[v]
```

with parameters:

- $K_p = 2.0$
- $K_i = 0.6$
- $K_d = 3.0$
- $N_\textnormal{ewma} = 5$, i.e. $\alpha = \frac{1}{N_\textnormal{ewma}} = 0.2$
- $N_\textnormal{itg} = 50$, i.e.  $\lambda = \frac{1}{N_\textnormal{itg}} = 0.02$

The controller output $u[v]$ represents the amount of time by which the controller wishes to deviate from the ideal view duration $\tau_0$. In other words, the duration of view $v$ that the controller wants to set is
```math
\widehat{\tau}[v] = \tau_0 - u[v]
```
---
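
Putting the three filtered terms together, a per-view update could look like the following Python sketch. The class name `BlockTimeControllerSketch` and the method `on_view` are hypothetical and do not mirror the Go implementation; the error is assumed to be given in seconds, and the limits of authority described below are deliberately left out.

```python
class BlockTimeControllerSketch:
    """Illustrative PID update using the parameters listed above."""

    def __init__(self, tau_0, k_p=2.0, k_i=0.6, k_d=3.0, n_ewma=5, n_itg=50):
        self.tau_0 = tau_0          # ideal view duration, in seconds
        self.k_p, self.k_i, self.k_d = k_p, k_i, k_d
        self.alpha = 1.0 / n_ewma   # EWMA weight
        self.lam = 1.0 / n_itg      # loss factor of the leaky integrator
        self.e_bar = 0.0            # EWMA of the error
        self.i_bar = 0.0            # leaky integral of the error

    def on_view(self, error):
        """Consume e[v] and return the desired view duration tau_hat[v] = tau_0 - u[v]."""
        previous_e_bar = self.e_bar
        self.e_bar = self.alpha * error + (1.0 - self.alpha) * previous_e_bar
        self.i_bar = error + (1.0 - self.lam) * self.i_bar
        delta_bar = self.e_bar - previous_e_bar
        u = self.k_p * self.e_bar + self.k_i * self.i_bar + self.k_d * delta_bar
        return self.tau_0 - u


# Usage: a node that is 0.1 s behind schedule in its first observed view.
controller = BlockTimeControllerSketch(tau_0=1.25)
desired = controller.on_view(0.1)
```

In this example $\bar{e} = 0.02$, $\bar{\mathcal{I}} = 0.1$ and $\bar{\Delta} = 0.02$, so $u = 0.04 + 0.06 + 0.06 = 0.16$ s and the desired view duration shrinks from 1.25 s to 1.09 s.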

### Limits of authority

[Latest update: Crescendo Upgrade, June 2024]

In general, there is no bound on the controller output $u$. However, it is important to limit the controller’s influence and keep $u$ within a sensible range.

- upper bound on view duration $\widehat{\tau}[v]$ that we allow the controller to set:

  The current timeout threshold is set to 1045ms and the largest view duration we want to allow the controller to set is $\tau_\textrm{max}$ = 910ms.
  Thereby, we have a buffer $\beta$ = 135ms remaining for message propagation and the replicas validating the proposal for view $v$.

  Note the subtle but important aspect: the primary for view $v$ controls the duration of view $v-1$. This is because its proposal for view $v$
  contains the proof (Quorum Certificate [QC]) that view $v-1$ concluded on the happy path. By observing the QC for view $v-1$, nodes enter the
  subsequent view $v$.


- lower bound on the view duration:

  Let $t_\textnormal{p}[v]$ denote the time when the primary for view $v$ has constructed its block proposal.
  On the happy path, a replica concludes view $v-1$ and transitions to view $v$ when it observes the proposal for view $v$.
  The duration $t_\textnormal{p}[v] - t[v-1]$ is the time the primary needs, after observing the parent block (view $v-1$), to collect votes,
  construct a QC for view $v-1$, and subsequently construct its own proposal for view $v$. This duration is the minimally required time to execute the protocol.
  The controller can only *delay* broadcasting the block,
  but it cannot release the block before  $t_\textnormal{p}[v]$ simply because the proposal isn’t ready any earlier.



👉 Let $\hat{t}[v]$ denote the time when the primary for view $v$ *broadcasts* its proposal. We assign:

```math
\hat{t}[v] := \max\Big(t[v-1] +\min(\widehat{\tau}[v-1],\ \tau_\textrm{max}),\  t_\textnormal{p}[v]\Big) 
```
This equation guarantees that the controller does not drive consensus into a timeout, as long as broadcasting the block and its validation
together require less time than $\beta$. Currently, we have $\tau_\textrm{max}$ = 910ms as the upper bound for view durations that the controller can set.
In comparison, for HotStuff's timeout threshold we set $\texttt{hotstuff-min-timeout} = \tau_\textrm{max} + \beta$, with $\beta$ = 135ms.
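
The clamping rule can be illustrated with a small Python function. The name `broadcast_time` and its parameter names are hypothetical, all times are assumed to be in seconds, and the timestamps in the usage lines are made up.

```python
TAU_MAX = 0.910  # largest view duration the controller may set, in seconds
BETA = 0.135     # buffer for message propagation and proposal validation, in seconds


def broadcast_time(entered_previous_view, desired_view_duration, proposal_ready, tau_max=TAU_MAX):
    """t_hat[v] = max(t[v-1] + min(tau_hat[v-1], tau_max), t_p[v])."""
    capped_duration = min(desired_view_duration, tau_max)  # upper limit of authority
    earliest_allowed = entered_previous_view + capped_duration
    return max(earliest_allowed, proposal_ready)           # cannot publish before it exists


# Usage: the primary entered view v-1 at t = 100.0 s; its proposal is ready at t = 100.3 s.
capped = broadcast_time(100.0, desired_view_duration=1.5, proposal_ready=100.3)    # capped by tau_max
delayed = broadcast_time(100.0, desired_view_duration=0.5, proposal_ready=100.3)   # controller delays
asap = broadcast_time(100.0, desired_view_duration=0.2, proposal_ready=100.3)      # released when ready
min_timeout = TAU_MAX + BETA  # corresponds to hotstuff-min-timeout
```

The three calls cover the three regimes: the controller's request is capped at $\tau_\textrm{max}$, the controller delays a proposal that is already built, and the controller asks for a shorter view than the protocol can deliver, in which case the proposal goes out as soon as it is ready.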



### Further reading

- the statistical model of the view duration, see [PID controller for ``block-rate-delay``](https://www.notion.so/ID-controller-for-block-rate-delay-cc9c2d9785ac4708a37bb952557b5ef4?pvs=21)
- the simulation and controller tuning, see  [flow-internal/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller](https://github.com/dapperlabs/flow-internal/tree/master/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller) → [controller_tuning_v01.py](https://github.com/dapperlabs/flow-internal/blob/master/analyses/pacemaker_timing/2023-05_Blocktime_PID-controller/controller_tuning_v01.py)
- The most recent parameter setting was derived here:
    - [Cruise-Control headroom for speedups](https://www.notion.so/flowfoundation/Cruise-Control-headroom-for-speedups-46dc17e07ae14462b03341e4432a907d?pvs=4) contains the formal analysis and discusses the numerical results in detail
    - Python code for figures and calculating the final parameter settings: [flow-internal/analyses/pacemaker_timing/2024-03_Block-timing-update](https://github.com/dapperlabs/flow-internal/tree/master/analyses/pacemaker_timing/2024-03_Block-timing-update) → [timeout-attacks.py](https://github.com/dapperlabs/flow-internal/blob/master/analyses/pacemaker_timing/2024-03_Block-timing-update/timeout-attacks.py)


## Edge Cases

### A node is catching up

When a node is catching up, it observes the blocks significantly later than they were published. In other words, from the perspective
of the node catching up, the blocks are too late. However, as it reaches the most recent blocks, the observed timing error also approaches zero
(assuming approximately correct block publication by the honest supermajority). Nevertheless, due to its biased error observations, the node
catching up could still try to compensate for the network being behind, and publish its proposal as early as possible.

**Assumption:** With only a small fraction of nodes being offline or catching up, the effect is expected to be small and easily compensated for by the supermajority of online nodes.

### A node has a misconfigured clock

We cap the maximum deviation from the default delay, which limits the general impact of any error introduced by the `BlockTimeController`. A node with a misconfigured clock will contribute to the error in a limited way, but as long as the majority of nodes have an accurate clock, they will offset this error.

**Assumption:** With only a small fraction of nodes having misconfigured clocks, the effect will be small enough to be easily compensated for by the supermajority of correct nodes.

### Near epoch boundaries

We might incorrectly compute a high error in the target view rate if the local current view and the current epoch are not exactly synchronized. By default, they would not be, because `EpochTransition` events occur upon finalization, whereas the current view is updated as soon as a QC/TC is available.

**Solution:** determine the epoch locally based on the view only; do not use the `EpochTransition` event.

### EFM

When the network is in EFM, epoch timing is disrupted anyway. The main thing we want to avoid is the controller driving consensus into a timeout.
This is largely guaranteed by the limits of authority. Beyond that, pretty much any block timing on the happy path is acceptable.
That said, the optimal solution would be a consistent view time throughout normal Epochs as well as EFM.

## Testing

[Cruise Control: Benchnet Testing Notes](https://www.notion.so/Cruise-Control-Benchnet-Testing-Notes-ea08f49ba9d24ce2a158fca9358966df?pvs=21)