
---
order: 1
parent:
  title: Tendermint Core QA Results v0.34.x
  description: This is a report on the results obtained when running v0.34.x on testnets
  order: 2
---

# Tendermint Core QA Results v0.34.x

## 200 Node Testnet

### Finding the Saturation Point

The first goal when examining the results of the tests is identifying the saturation point.
The saturation point is a setup with a transaction load big enough to prevent the testnet
from being stable: the load runner tries to produce slightly more transactions than can
be processed by the testnet.

The following table summarizes the results for v0.34.x, for the different experiments
(extracted from file [`v034_report_tabbed.txt`](img34/v034_report_tabbed.txt)).

The X axis of this table is `c`, the number of connections created by the load runner process to the target node.
The Y axis of this table is `r`, the rate or number of transactions issued per second.

|        |  c=1  |  c=2  |  c=4  |
| :---   | ----: | ----: | ----: |
| r=25   |  2225 | 4450  | 8900  |
| r=50   |  4450 | 8900  | 17800 |
| r=100  |  8900 | 17800 | 35600 |
| r=200  | 17800 | 35600 | 38660 |

The table shows the number of 1024-byte-long transactions that were produced by the load runner,
and processed by Tendermint Core, during the 90 seconds of the experiment's duration.
Each cell in the table refers to an experiment with a particular number of websocket connections (`c`)
to a chosen validator, and the number of transactions per second that the load runner
tries to produce (`r`). Note that the overall load that the tool attempts to generate is $c \cdot r$.

We can see that the saturation point is beyond the diagonal that spans cells

* `r=200,c=2`
* `r=100,c=4`

given that the total number of transactions should be close to the product of the rate, the number of connections, and the experiment duration.

All experiments beyond the saturation diagonal (`r=200,c=4`) have in common that the total
number of transactions processed is noticeably less than the product $c \cdot r \cdot 89$ (89 seconds, since the last batch never gets sent),
which is the expected number of transactions when the system is able to deal well with the
load.
With (`r=200,c=4`), we obtained 38660 whereas the theoretical number of transactions should
have been $200 \cdot 4 \cdot 89 = 71200$.
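
As a sanity check on this reasoning, the snippet below recomputes the expected totals ($c \cdot r \cdot 89$) for the cells around the diagonal and compares them with the observed figures from the table above; it is an illustrative sketch only, not part of the QA tooling.

```go
// Recompute expected vs. observed transactions for selected cells of the table above.
// The figures below are taken directly from the report; nothing here belongs to the
// actual load runner or QA scripts.
package main

import "fmt"

func main() {
	const seconds = 89 // 89 seconds of effective load: the last batch never gets sent

	// key: {r, c}; value: transactions processed, as reported in the table
	observed := map[[2]int]int{
		{200, 2}: 35600,
		{100, 4}: 35600,
		{200, 4}: 38660,
	}

	for rc, got := range observed {
		r, c := rc[0], rc[1]
		expected := r * c * seconds
		fmt.Printf("r=%d,c=%d: expected %d, observed %d (%.0f%%)\n",
			r, c, expected, got, 100*float64(got)/float64(expected))
	}
}
```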

At this point, we chose an experiment at the limit of the saturation diagonal,
in order to further study the performance of this release.
**The chosen experiment is (`r=200,c=2`)**.

This is a plot of the CPU load (average over 1 minute, as output by `top`) of the load runner for (`r=200,c=2`),
where we can see that the load stays close to 0 most of the time.

![load-load-runner](img34/v034_r200c2_load-runner.png)

### Examining latencies

The method described [here](method.md) allows us to plot the latencies of transactions
for all experiments.

![all-latencies](img34/v034_200node_latencies.png)

As we can see, even the experiments beyond the saturation diagonal managed to keep
transaction latency stable (i.e., not constantly increasing).
Our interpretation of this is that contention within Tendermint Core was propagated,
via the websockets, to the load runner,
so the load runner could only produce a fraction of the target load.

Further examination of the Prometheus data (see below) showed that the mempool contained many transactions
at steady state, and that, while it occasionally grew, it quickly returned to this steady state. This demonstrates
that the Tendermint Core network was able to process transactions at least as quickly as they
were submitted to the mempool. Moreover, the test script made sure that, at the end of an experiment, the
mempool was empty, so that all transactions submitted to the chain were processed.

Finally, the number of points present in the plot appears to be much smaller than expected given the
number of transactions in each experiment, particularly close to or above the saturation diagonal.
This is a visual effect of the plot: what appear to be single points are actually potentially huge
clusters of points. To corroborate this, we have zoomed in on the plot above by setting (carefully chosen)
tiny axis intervals. The cluster shown below looks like a single point in the plot above.

![all-latencies-zoomed](img34/v034_200node_latencies_zoomed.png)

The plot of latencies can be used as a baseline to compare with other releases.

The following plot summarizes average latencies versus overall throughput
across different numbers of WebSocket connections to the node into which
transactions are being loaded.

![latency-vs-throughput](img34/v034_latency_throughput.png)

### Prometheus Metrics on the Chosen Experiment

As mentioned [above](#finding-the-saturation-point), the chosen experiment is `r=200,c=2`.
This section further examines key metrics for this experiment extracted from Prometheus data.
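
For readers who want to reproduce these plots, the sketch below shows one possible way to pull the relevant time series with the Prometheus Go client. The Prometheus address, the time window, and the metric names (assuming the default `tendermint` namespace) are illustrative assumptions, not part of the QA scripts.

```go
// Fetch the time series examined in this section (mempool size, peers, rounds,
// height, total txs) from a Prometheus server over a fixed range.
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"}) // assumed address
	if err != nil {
		panic(err)
	}
	prom := v1.NewAPI(client)

	// Range covering the (r=200,c=2) experiment; the window below is a placeholder.
	window := v1.Range{
		Start: time.Now().Add(-15 * time.Minute),
		End:   time.Now(),
		Step:  10 * time.Second,
	}

	for _, query := range []string{
		"tendermint_mempool_size",
		"tendermint_p2p_peers",
		"tendermint_consensus_rounds",
		"tendermint_consensus_height",
		"tendermint_consensus_total_txs",
	} {
		value, warnings, err := prom.QueryRange(context.Background(), query, window)
		if err != nil {
			panic(err)
		}
		if len(warnings) > 0 {
			fmt.Println("warnings:", warnings)
		}
		fmt.Printf("%s:\n%v\n\n", query, value)
	}
}
```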

#### Mempool Size

The mempool size, a count of the number of transactions in the mempool, was shown to be stable and homogeneous
at all full nodes. It did not exhibit any unconstrained growth.
The plot below shows the evolution over time of the cumulative number of transactions inside all full nodes' mempools
at a given time.
The two spikes that can be observed correspond to a period where consensus instances proceeded beyond the initial round
at some nodes.

![mempool-cumulative](img34/v034_r200c2_mempool_size.png)

The plot below shows the evolution of the average over all full nodes, which oscillates between 1500 and 2000
outstanding transactions.

![mempool-avg](img34/v034_r200c2_mempool_size_avg.png)

The peaks observed coincide with the moments when some nodes proceeded beyond the initial round of consensus (see below).

#### Peers

The number of peers was stable at all nodes.
It was higher for the seed nodes (around 140) than for the rest (between 21 and 74).
The fact that non-seed nodes reach more than 50 peers is due to #9548.

![peers](img34/v034_r200c2_peers.png)

#### Consensus Rounds per Height

Most nodes used only round 0 for most heights, but some nodes needed to advance to round 1 for some heights.

![rounds](img34/v034_r200c2_rounds.png)

#### Blocks Produced per Minute, Transactions Processed per Minute

The number of blocks produced per minute is the slope of this plot.

![heights](img34/v034_r200c2_heights.png)

Over a period of 2 minutes, the height goes from 530 to 569.
This results in an average of 19.5 blocks produced per minute.

The number of transactions processed per minute is the slope of this plot.

![total-txs](img34/v034_r200c2_total-txs.png)

Over a period of 2 minutes, the total goes from 64525 to 100125 transactions,
resulting in 17800 transactions per minute. However, we can see in the plot that
all transactions in the load are processed well before the two minutes elapse.
If we restrict the time window to the period when transactions were actually being processed (approx. 105 seconds),
we obtain 20343 transactions per minute.
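
The sketch below simply reproduces these rate computations from the figures read off the two plots; it is illustrative only.

```go
// Recompute the block and transaction rates quoted above from the plot readings.
package main

import "fmt"

func main() {
	// Blocks per minute: height goes from 530 to 569 over a 2-minute window.
	blocksPerMin := float64(569-530) / 2
	fmt.Printf("blocks/min: %.1f\n", blocksPerMin) // 19.5

	// Transactions per minute over the same 2-minute window.
	txPerMin := float64(100125-64525) / 2
	fmt.Printf("txs/min (2-minute window): %.0f\n", txPerMin) // 17800

	// Adjusted to the ~105 seconds during which transactions were actually processed.
	txPerMinAdjusted := float64(100125-64525) / 105 * 60
	fmt.Printf("txs/min (105-second window): %.0f\n", txPerMinAdjusted) // ~20343
}
```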

#### Memory Resident Set Size

Resident Set Size of all monitored processes is plotted below.

![rss](img34/v034_r200c2_rss.png)

The average over all processes oscillates around 1.2 GiB and does not demonstrate unconstrained growth.

![rss-avg](img34/v034_r200c2_rss_avg.png)

#### CPU utilization

The best metric from Prometheus to gauge CPU utilization on a Unix machine is `load1`,
as it usually appears in the
[output of `top`](https://www.digitalocean.com/community/tutorials/load-average-in-linux).

![load1](img34/v034_r200c2_load1.png)

In most cases it stays below 5, which is generally considered an acceptable load.

### Test Result

**Result: N/A** (v0.34.x is the baseline)

Date: 2022-10-14

Version: 3ec6e424d6ae4c96867c2dcf8310572156068bb6

## Rotating Node Testnet

For this testnet, we will use a load that can safely be considered below the saturation
point for the size of this testnet (between 13 and 38 full nodes): `c=4,r=800`.

N.B.: The version of CometBFT used for these tests is affected by #9539.
However, the reduced load that reaches the mempools is orthogonal to the functionality
we are focusing on here.

### Latencies

All latencies can be seen in the following plot.

![rotating-all-latencies](img34/v034_rotating_latencies.png)

We can observe some very high latencies towards the end of the test.
Suspecting that they were caused by duplicate transactions, we examined the raw latencies
file and discovered that it contains more than 100K duplicate transactions.

The following plot shows the latencies after all duplicate transactions have
been removed, i.e., only the first occurrence of each duplicated transaction is kept.

![rotating-all-latencies-uniq](img34/v034_rotating_latencies_uniq.png)

This problem, which exists in `v0.34.x`, will need to be addressed, perhaps in the same way
we addressed it when running the 200 node test with high loads: by increasing the `cache_size`
configuration parameter.
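
For reference, the de-duplication used to produce the plot above can be reproduced with a small script along the lines of the sketch below. The file name and the assumption that the first CSV column identifies the transaction are illustrative; the actual raw latencies file format may differ.

```go
// Keep only the first occurrence of each transaction in a raw latencies CSV file.
package main

import (
	"encoding/csv"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("latencies_raw.csv") // hypothetical file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	records, err := csv.NewReader(f).ReadAll()
	if err != nil {
		panic(err)
	}

	seen := make(map[string]bool)
	var unique [][]string
	for _, rec := range records {
		txID := rec[0] // assumed: first column identifies the transaction
		if seen[txID] {
			continue // drop duplicates, keeping only the first occurrence
		}
		seen[txID] = true
		unique = append(unique, rec)
	}

	fmt.Printf("kept %d of %d records\n", len(unique), len(records))
}
```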

### Prometheus Metrics

The set of metrics shown here is smaller than for the 200 node experiment.
We are only interested in those on which the catch-up process (blocksync) may have an impact.

#### Blocks and Transactions per minute

Just as shown for the 200 node test, the number of blocks produced per minute is the slope of this plot.

![rotating-heights](img34/v034_rotating_heights.png)

Over a period of 5229 seconds, the height goes from 2 to 3638.
This results in an average of 41 blocks produced per minute.

The following plot shows only the heights reported by ephemeral nodes
(which are also included in the plot above). Note that the _height_ metric
is only shown _once the node has switched to consensus_, hence the gaps
when nodes are killed, wiped out, restarted from scratch, and catching up.

![rotating-heights-ephe](img34/v034_rotating_heights_ephe.png)

The number of transactions processed per minute is the slope of this plot.

![rotating-total-txs](img34/v034_rotating_total-txs.png)

The small lines we see periodically close to `y=0` are the transactions that
ephemeral nodes start processing once they have caught up.

Over a period of 5229 seconds, the total goes from 0 to 387697 transactions,
resulting in 4449 transactions per minute. We can see some abrupt changes in
the plot's slope. This will need to be investigated.

#### Peers

The plot below shows the evolution of the number of peers throughout the experiment.
The periodic changes observed are due to the ephemeral nodes being stopped,
wiped out, and recreated.

![rotating-peers](img34/v034_rotating_peers.png)

The validators' plots are concentrated in the upper part of the graph, whereas the ephemeral nodes
are mostly in the lower part.

#### Memory Resident Set Size

The average Resident Set Size (RSS) over all processes seems stable, growing slightly toward the end.
This might be related to the increase in transaction load observed above.

![rotating-rss-avg](img34/v034_rotating_rss_avg.png)

The memory taken by the validators and the ephemeral nodes (when they are up) is comparable.

#### CPU utilization

The plot shows the metric `load1` for all nodes.

![rotating-load1](img34/v034_rotating_load1.png)

It stays under 5 most of the time, which is considered a normal load.
The purple line, which follows a different pattern, corresponds to the validator receiving all
transactions, via RPC, from the load runner process.
### Test Result

**Result: N/A**

Date: 2022-10-10

Version: a28c987f5a604ff66b515dd415270063e6fb069d