# ADR 006: Trust Metric Design

## Context

The proposed trust metric will allow Tendermint to maintain local trust rankings for peers it has directly interacted with, which can then be used to implement soft security controls. The calculations were obtained from the [TrustGuard](https://dl.acm.org/citation.cfm?id=1060808) project.

### Background

The Tendermint Core project developers would like to improve Tendermint security and reliability by keeping track of the level of trustworthiness peers have demonstrated within the peer-to-peer network. This way, undesirable outcomes from peers will not immediately result in them being dropped from the network (potentially causing drastic changes to take place). Instead, peers' behavior can be monitored with appropriate metrics, and a peer can be removed from the network once Tendermint Core is certain it is a threat. For example, when the PEXReactor requests peers' network addresses from an already known peer, and the returned network addresses are unreachable, this untrustworthy behavior should be tracked. Returning a few bad network addresses probably shouldn’t cause a peer to be dropped, while excessive amounts of this behavior do qualify the peer for being dropped.

Trust metrics can be circumvented by malicious nodes through the use of strategic oscillation techniques, which adapt the malicious node’s behavior pattern in order to maximize its goals. For instance, if the malicious node learns that the time interval of the Tendermint trust metric is _X_ hours, then it could wait _X_ hours in-between malicious activities. We could try to combat this issue by increasing the interval length, yet this would make the system less adaptive to recent events.

Instead, having shorter intervals, but keeping a history of interval values, will give our metric the flexibility needed to keep the network stable, while also making it resilient against a strategic malicious node in the Tendermint peer-to-peer network. Also, the metric can access trust data over a rather long period of time without greatly increasing its history size, by aggregating older history values over a larger number of intervals while maintaining great precision for the recent intervals. This approach is referred to as fading memories, and closely resembles the way human beings remember their experiences. The trade-off of using history data is that the interval values should be preserved in-between executions of the node.

### References

M. Srivatsa, L. Xiong, and L. Liu, “TrustGuard: Countering Vulnerabilities in Reputation Management for Decentralized Overlay Networks,” in _Proceedings of the 14th International Conference on World Wide Web_, pp. 422-431, May 2005.

## Decision

The proposed trust metric will allow a developer to inform the trust metric store of all good and bad events relevant to a peer's behavior, and at any time, the metric can be queried for a peer's current trust ranking.

The three subsections below will cover the process being considered for calculating the trust ranking, the concept of the trust metric store, and the interface for the trust metric.

### Proposed Process

The proposed trust metric will count good and bad events relevant to the object, and calculate the percentage of events that are good over an interval with a predefined duration. This procedure will continue for the life of the trust metric. When the trust metric is queried for the current **trust value**, a resilient equation will be utilized to perform the calculation.

The equation being proposed resembles a Proportional-Integral-Derivative (PID) controller used in control systems. The proportional component allows us to be sensitive to the value of the most recent interval, while the integral component allows us to incorporate trust values stored in the history data, and the derivative component allows us to give weight to sudden changes in the behavior of a peer. We compute the trust value of a peer in interval _i_ based on its current trust ranking, its trust ranking history prior to interval _i_ (over the past _maxH_ number of intervals), and its trust ranking fluctuation. We will break up the equation into the three components.

```math
(1) Proportional Value = a * R[i]
```

where _R_[*i*] denotes the raw trust value at time interval _i_ (with _i_ = 0 being the current time) and _a_ is the weight applied to the contribution of the current reports. The next component of our equation uses a weighted sum over the last _maxH_ intervals to calculate the history value for time _i_:

`H[i] =` ![formula1](img/formula1.png "Weighted Sum Formula")

The weights can be chosen either optimistically or pessimistically. An optimistic weight creates larger weights for newer history data values, while the pessimistic weight creates larger weights for time intervals with lower scores. The default weights used during the calculation of the history value are optimistic and calculated as _Wk_ = 0.8^_k_, for time interval _k_. With the history value available, we can now finish calculating the integral value:

```math
(2) Integral Value = b * H[i]
```

where _H_[*i*] denotes the history value at time interval _i_ and _b_ is the weight applied to the contribution of past performance for the object being measured. The derivative component will be calculated as follows:

```math
D[i] = R[i] - H[i]

(3) Derivative Value = c(D[i]) * D[i]
```

where the value of _c_ is selected based on the _D_[*i*] value relative to zero. The default selection process makes _c_ equal to 0 unless _D_[*i*] is a negative value, in which case _c_ is equal to 1. The result is that the maximum penalty is applied when current behavior is worse than previously experienced behavior. If the current behavior is better than the previously experienced behavior, then the Derivative Value has no impact on the trust value. With the three components brought together, our trust value equation is calculated as follows:

```math
TrustValue[i] = a * R[i] + b * H[i] + c(D[i]) * D[i]
```
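
The three components can be combined in a few lines of Go. The sketch below is illustrative only: the function names are hypothetical, the weights _a_ = 0.4 and _b_ = 0.6 are assumed values rather than ones stated in this document, the history value is assumed to be normalized by the sum of its weights, and clamping negative results to zero is an assumption made so the result stays within the range of 0 to 1.

```go
package main

import (
	"fmt"
	"math"
)

// historyValue returns a weighted average of past raw interval values.
// history[0] is the most recent past interval (k = 1). The weights follow
// the optimistic default w_k = 0.8^k; dividing by the sum of the weights
// is an assumption of this sketch.
func historyValue(history []float64) float64 {
	var sum, weights float64
	for idx, r := range history {
		w := math.Pow(0.8, float64(idx+1))
		sum += r * w
		weights += w
	}
	if weights == 0 {
		return 0
	}
	return sum / weights
}

// trustValue evaluates a*R + b*H + c(D)*D, where D = R - H and
// c is 1 when D is negative and 0 otherwise.
func trustValue(a, b, r, h float64) float64 {
	d := r - h
	c := 0.0
	if d < 0 {
		c = 1.0
	}
	tv := a*r + b*h + c*d
	if tv < 0 {
		tv = 0 // assumption: never report a negative trust value
	}
	return tv
}

func main() {
	const a, b = 0.4, 0.6 // assumed weights, for illustration only

	h := historyValue([]float64{1, 1, 1})          // a perfect past
	fmt.Printf("%.2f\n", trustValue(a, b, 0.5, h)) // behavior got worse: penalized
	fmt.Printf("%.2f\n", trustValue(a, b, 1.0, 0.5)) // behavior improved: no bonus
}
```

With these assumed weights, a peer whose current interval drops to 0.5 after a perfect history receives 0.4 * 0.5 + 0.6 * 1.0 - 0.5 = 0.3, while a peer that improves from a history of 0.5 to a current value of 1.0 receives 0.7, since the derivative term only ever subtracts.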

As a performance optimization that keeps the amount of raw interval data being saved to a reasonable size of _m_, while allowing us to represent 2^_m_ - 1 history intervals, we can employ the fading memories technique, which reduces space and time complexity at the cost of precision in the history data values by summarizing larger quantities of less recent values. While our equation above attempts to access up to _maxH_ intervals (which can be 2^_m_ - 1), we will map those requests down to _m_ values using equation 4 below:

```math
(4) j = index, where index > 0
```

where _j_ is one of the indices _(0, 1, 2, …, m - 1)_ used to access history interval data. Now we can access the raw intervals using the following calculations:

```math
R[0] = raw data for current time interval
```

`R[j] =` ![formula2](img/formula2.png "Fading Memories Formula")
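
To make the fading memories idea more concrete, here is a rough Go sketch. It is not taken from this document: mapping an interval offset to a slot with floor(log2(offset)) is an assumption that happens to cover offsets 1 through 2^_m_ - 1 with _m_ slots, and the update rule in `fade` is one plausible form in the spirit of the TrustGuard paper. The formula in the image above remains the authoritative definition, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// historyIndex maps an interval offset (1 = most recent past interval) to
// one of m storage slots. Using floor(log2(offset)) is an assumption of
// this sketch: offsets 1 .. 2^m - 1 land in slots 0 .. m-1.
func historyIndex(offset int) int {
	if offset < 1 {
		return 0
	}
	return bits.Len(uint(offset)) - 1
}

// fade folds a new raw interval value into the stored history.
// Slot 0 keeps the newest value at full precision. Slot j blends its old
// contents with the previous value of slot j-1, weighted so that older
// slots change more slowly. This update rule is an assumption.
func fade(history []float64, newest float64) {
	// Walk from the oldest slot down so history[j-1] is still the old value.
	for j := len(history) - 1; j >= 1; j-- {
		x := float64(uint(1) << uint(j)) // 2^j
		history[j] = (history[j]*(x-1) + history[j-1]) / x
	}
	if len(history) > 0 {
		history[0] = newest
	}
}

func main() {
	for _, offset := range []int{1, 2, 3, 4, 7, 8} {
		fmt.Printf("offset %d -> slot %d\n", offset, historyIndex(offset))
	}

	history := []float64{1, 1, 1}
	for i := 0; i < 3; i++ {
		fade(history, 0) // three bad intervals in a row
	}
	fmt.Println(history)
}
```

After three bad intervals, the older slots in this sketch still remember most of the earlier good behavior, which is the intended effect: recent intervals are precise, older ones are summarized.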

### Trust Metric Store

Similar to the P2P subsystem AddrBook, the trust metric store will maintain information relevant to Tendermint peers. Additionally, the trust metric store will ensure that trust metrics will only be active for peers that a node is currently and directly engaged with.

Reactors will provide a peer key to the trust metric store in order to retrieve the associated trust metric. The trust metric can then record new positive and negative events experienced by the reactor, as well as provide the current trust score calculated by the metric.

When the node is shutting down, the trust metric store will save history data for trust metrics associated with all known peers. This saved information allows experiences with a peer to be preserved across node executions, which can span a tracking window of days or weeks. The trust history data is loaded automatically during OnStart.

### Interface Detailed Design

Each trust metric allows for the recording of positive/negative events, querying the current trust value/score, and the stopping/pausing of tracking over time intervals. This can be seen below:

```go
// TrustMetric - keeps track of peer reliability
type TrustMetric struct {
    // Private elements.
}

// Pause tells the metric to pause recording data over time intervals.
// All method calls that indicate events will unpause the metric
func (tm *TrustMetric) Pause() {}

// Stop tells the metric to stop recording data over time intervals
func (tm *TrustMetric) Stop() {}

// BadEvents indicates that an undesirable event(s) took place
func (tm *TrustMetric) BadEvents(num int) {}

// GoodEvents indicates that a desirable event(s) took place
func (tm *TrustMetric) GoodEvents(num int) {}

// TrustValue gets the dependable trust value; always between 0 and 1
func (tm *TrustMetric) TrustValue() float64 {}

// TrustScore gets a score based on the trust value always between 0 and 100
func (tm *TrustMetric) TrustScore() int {}

// NewMetric returns a trust metric with the default configuration
func NewMetric() *TrustMetric {}

//------------------------------------------------------------------------------------------------
// For example

tm := NewMetric()

tm.BadEvents(1)
score := tm.TrustScore()

tm.Stop()
```
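
As a rough illustration of what sits behind these methods, the sketch below shows how event counts for a single interval could yield a raw value, and how a trust value between 0 and 1 could be turned into a score between 0 and 100. The type `intervalCounter` and the function `score` are hypothetical helpers, not part of the proposed API. Treating an interval with no events as fully trusted and rounding to the nearest integer are both assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// intervalCounter is a hypothetical helper that counts the events
// reported during one interval.
type intervalCounter struct {
	good, bad int
}

// rawValue returns the fraction of events in the interval that were good.
// An interval with no events is treated as fully trusted (assumption).
func (c intervalCounter) rawValue() float64 {
	total := c.good + c.bad
	if total == 0 {
		return 1.0
	}
	return float64(c.good) / float64(total)
}

// score converts a trust value in [0, 1] to an integer in [0, 100].
// Rounding to the nearest integer is an assumption of this sketch.
func score(trustValue float64) int {
	return int(math.Round(trustValue * 100))
}

func main() {
	c := intervalCounter{good: 3, bad: 1}
	r := c.rawValue()
	fmt.Println(r, score(r))
}
```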

Some of the trust metric parameters can be configured. The weight values should probably be left alone in most cases, yet the time durations for the tracking window and individual time interval should be considered.

```go
// TrustMetricConfig - Configures the weight functions and time intervals for the metric
type TrustMetricConfig struct {
    // Determines the percentage given to current behavior
    ProportionalWeight float64

    // Determines the percentage given to prior behavior
    IntegralWeight float64

    // The window of time that the trust metric will track events across.
    // This can be set to cover many days without issue
    TrackingWindow time.Duration

    // Each interval should be short for adaptability.
    // Less than 30 seconds is too sensitive,
    // and greater than 5 minutes will make the metric numb
    IntervalLength time.Duration
}

// DefaultConfig returns a config with values that have been tested and produce desirable results
func DefaultConfig() TrustMetricConfig {}

// NewMetricWithConfig returns a trust metric with a custom configuration
func NewMetricWithConfig(tmc TrustMetricConfig) *TrustMetric {}

//------------------------------------------------------------------------------------------------
// For example

config := TrustMetricConfig{
    TrackingWindow: time.Minute * 60 * 24, // one day
    IntervalLength: time.Minute * 2,
}

tm := NewMetricWithConfig(config)

tm.BadEvents(10)
tm.Pause()
tm.GoodEvents(1) // becomes active again
```

A trust metric store should be created with a DB that has persistent storage so it can save history data across node executions. All trust metrics instantiated by the store will be created with the provided TrustMetricConfig configuration.

When you attempt to fetch the trust metric for a peer and no entry exists in the trust metric store, a new metric is automatically created and an entry is made within the store.

In addition to the fetching method, GetPeerTrustMetric, the trust metric store provides a method to call when a peer has disconnected from the node. This is so the metric can be paused (history data will not be saved) for periods of time when the node is not having direct experiences with the peer.

```go
// TrustMetricStore - Manages all trust metrics for peers
type TrustMetricStore struct {
    cmn.BaseService

    // Private elements
}

// OnStart implements Service
func (tms *TrustMetricStore) OnStart(context.Context) error { return nil }

// OnStop implements Service
func (tms *TrustMetricStore) OnStop() {}

// NewTrustMetricStore returns a store that saves data to the DB
// and uses the config when creating new trust metrics
func NewTrustMetricStore(db dbm.DB, tmc TrustMetricConfig) *TrustMetricStore {}

// Size returns the number of entries in the trust metric store
func (tms *TrustMetricStore) Size() int {}

// GetPeerTrustMetric returns a trust metric by peer key
func (tms *TrustMetricStore) GetPeerTrustMetric(key string) *TrustMetric {}

// PeerDisconnected pauses the trust metric associated with the peer identified by the key
func (tms *TrustMetricStore) PeerDisconnected(key string) {}

//------------------------------------------------------------------------------------------------
// For example

db := dbm.NewDB("trusthistory", "goleveldb", dirPathStr)
tms := NewTrustMetricStore(db, DefaultConfig())

tm := tms.GetPeerTrustMetric(key)
tm.BadEvents(1)

tms.PeerDisconnected(key)
```
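
The get-or-create and pause-on-disconnect behavior described above can be sketched with an in-memory map. This is a simplified stand-in rather than the proposed implementation: `peerMetric` and `memStore` are hypothetical types, there is no persistence or service lifecycle, and the method names only mirror the interface above for readability.

```go
package main

import (
	"fmt"
	"sync"
)

// peerMetric is a hypothetical stand-in for TrustMetric that only tracks
// event counts and whether the metric is paused.
type peerMetric struct {
	mu        sync.Mutex
	good, bad int
	paused    bool
}

// BadEvents records bad events and unpauses the metric.
func (m *peerMetric) BadEvents(num int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.paused = false
	m.bad += num
}

// Pause marks the metric as paused until the next event arrives.
func (m *peerMetric) Pause() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.paused = true
}

// Paused reports whether the metric is currently paused.
func (m *peerMetric) Paused() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.paused
}

// memStore is a hypothetical in-memory store keyed by peer key.
type memStore struct {
	mu      sync.Mutex
	metrics map[string]*peerMetric
}

func newMemStore() *memStore {
	return &memStore{metrics: make(map[string]*peerMetric)}
}

// GetPeerTrustMetric returns the metric for key, creating it if needed.
func (s *memStore) GetPeerTrustMetric(key string) *peerMetric {
	s.mu.Lock()
	defer s.mu.Unlock()
	m, ok := s.metrics[key]
	if !ok {
		m = &peerMetric{}
		s.metrics[key] = m
	}
	return m
}

// PeerDisconnected pauses the metric for key, if one exists.
func (s *memStore) PeerDisconnected(key string) {
	s.mu.Lock()
	m, ok := s.metrics[key]
	s.mu.Unlock()
	if ok {
		m.Pause()
	}
}

// Size returns the number of entries in the store.
func (s *memStore) Size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.metrics)
}

func main() {
	store := newMemStore()
	tm := store.GetPeerTrustMetric("peer1") // created on first fetch
	tm.BadEvents(1)
	store.PeerDisconnected("peer1")
	fmt.Println(store.Size(), tm.Paused())
}
```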

## Status

Approved.

## Consequences

### Positive

- The trust metric will allow Tendermint to make non-binary security and reliability decisions
- Will help Tendermint implement deterrents that provide soft security controls, yet avoid disruption on the network
- Will provide useful profiling information when analyzing performance over time related to peer interaction

### Negative

- Requires saving the trust metric history data across node executions

### Neutral

- Keep in mind that good events need to be recorded just as bad events do when using this implementation