
---
title: Performance Test Proposal
authors:
  - "@edisonxiang"
  - "@pavan187"
approvers:
  - "@qizha"
  - "@CindyXing"
  - "@kevin-wangzefeng"
  - "@Baoqiang-Zhang"
  - "@m1093782566"
creation-date: 2019-03-28
last-updated: 2019-04-26
status: implementable
---

# Performance Test Proposal

* [Performance Test Proposal](#performance-test-proposal)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non\-goals](#non-goals)
  * [Proposal](#proposal)
    * [Performance Test Deployment](#performance-test-deployment)
    * [Performance Test Framework](#performance-test-framework)
    * [Performance Test Metrics Tools](#performance-test-metrics-tools)
    * [Performance Test Scenarios](#performance-test-scenarios)
  * [Thresholds](#thresholds)

## Motivation

Currently, KubeEdge testing is focused on automated suites for unit, integration, and E2E testing and validation.
KubeEdge allows users to manage large numbers of edge nodes and devices from the cloud.
A set of tests, specifically performance tests, can be used to determine the non-functional
characteristics of KubeEdge such as latency, throughput, CPU usage, and memory usage.
As a result, we can also evaluate future improvement items for KubeEdge.

This proposal lists the possible performance test scenarios and test cases for KubeEdge.

### Goals

* Benchmark the performance against the following Service Level Objectives:
  * Latency: the time from the moment the server receives a request until the last byte of the response is sent to the user.
  * Throughput: how many requests can be served within a given time.
  * Scalability: potential scaling capacity (including the number of nodes, pods, devices, etc. at the edge) under different load conditions.
  * CPU Usage: the CPU usage of KubeEdge under different load conditions.
  * Memory Usage: the memory usage of KubeEdge under different load conditions.
* Performance tests should be able to run against both containerized and non-containerized versions of KubeEdge.

### Non-goals

* To design the specific implementation details of any single performance test.

## Proposal

### Performance Test Deployment
<img src="../images/perf/perf-deploy-type.png">

Every running KubeEdge Performance Test setup looks like the following:

1. A real K8S Cluster that has a K8S Master and Nodes, shown as **K8S Cluster** (including **VM2** and other **VMs**) in the above figure.
This Cluster is used to provision KubeEdge Edge Nodes.
2. A Cluster that has a K8S Master and Nodes, shown as **KubeEdge Cluster** (including **VM3** and **VM4**) in the above figure.
The KubeEdge Cloud Part and KubeEdge Edge Nodes are also included in this Cluster.
This Cluster is used to deploy the KubeEdge Cloud Part pod and run the performance test.
3. The KubeEdge Cloud Part image and KubeEdge Edge Node image are built and pushed to any reachable container image registry.
4. The Test Client, shown as **VM1** in the above figure, uses the deployment controller
to deploy the KubeEdge Cloud Part pod in the **KubeEdge Cluster**
and KubeEdge Edge Node pods in the **K8S Cluster** respectively,
and then launches the performance test against the **KubeEdge Cluster**.

Before running the KubeEdge Performance Test, the developer is responsible for creating items 1-3 above.
The Test Client uses a deployment object to deploy the KubeEdge Cloud Part pod in the **KubeEdge Cluster**
and KubeEdge Edge Node pods in the **K8S Cluster**,
and waits until all the pods come up and are **Running** (a sketch of this flow follows below).
The KubeEdge Cloud Part pod will be running in an independent VM, shown as **VM3** in the above figure.
The KubeEdge Edge Node pods will be running in the **K8S Cluster**.
Once the KubeEdge Cloud Part and KubeEdge Edge Nodes are running,
the KubeEdge Cloud Part will try to connect with the K8S Master, shown as **VM4** in the above figure,
and the KubeEdge Edge Nodes will try to connect with the KubeEdge Cloud Part.
Finally, these together make up the Cluster shown as **KubeEdge Cluster** in the above figure.

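A minimal sketch of this flow, using client-go of the K8S 1.13 era; the kubeconfig path, namespace, labels, image, and replica count are placeholders rather than the actual test configuration:

```
// Sketch: deploy KubeEdge Edge Node pods and wait until they are Running.
// Image, namespace and labels are hypothetical placeholders.
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "edge-node"},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(100), // number of simulated Edge Nodes
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "edge-node"}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "edge-node"}},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "edge-node",
						Image: "example-registry/kubeedge-edge-node:latest", // placeholder image
					}},
				},
			},
		},
	}
	if _, err := client.AppsV1().Deployments("default").Create(deployment); err != nil {
		panic(err)
	}

	// Poll until every pod of the deployment is Running.
	for {
		pods, err := client.CoreV1().Pods("default").List(metav1.ListOptions{LabelSelector: "app=edge-node"})
		if err != nil {
			panic(err)
		}
		running := 0
		for _, p := range pods.Items {
			if p.Status.Phase == corev1.PodRunning {
				running++
			}
		}
		fmt.Printf("%d/%d edge node pods running\n", running, len(pods.Items))
		if len(pods.Items) > 0 && running == len(pods.Items) {
			break
		}
		time.Sleep(5 * time.Second)
	}
}
```
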
#### Test Client
| Subject                        | Description                                  |
|--------------------------------|----------------------------------------------|
| OS                             |  Ubuntu 18.04 server 64bit                   |
| CPU                            |  4vCPUs                                      |
| RAM                            |  8GB                                         |
| Disk Size                      |  40GB                                        |
| Count                          |  1                                           |

This VM is used to deploy KubeEdge and run the performance test for KubeEdge.

#### K8S Masters
| Subject                        | Description                                  |
|--------------------------------|----------------------------------------------|
| OS                             |  Ubuntu 18.04 server 64bit                   |
| K8S Version                    |  v1.13.5                                     |
| Docker Version                 |  v17.09                                      |
| CPU                            |  32vCPUs                                     |
| RAM                            |  128GB                                       |
| Disk Size                      |  40GB                                        |
| Count                          |  2                                           |

These two VMs are used to run K8S Master services, including the K8S API Server, the K8S Scheduler, and so on.
One of them is used to deploy KubeEdge Edge Node pods.
The other is used to deploy the KubeEdge Cloud Part pod and run the performance test for KubeEdge.

#### K8S Nodes
| Subject                        | Description                                  |
|--------------------------------|----------------------------------------------|
| OS                             |  Ubuntu 18.04 server 64bit                   |
| K8S Version                    |  v1.13.5                                     |
| Docker Version                 |  v17.09                                      |
| CPU                            |  32vCPUs                                     |
| RAM                            |  128GB                                       |
| Disk Size                      |  40GB                                        |
| Count                          |  2...N                                       |

One of these VMs is used to run the KubeEdge Cloud Part pod, which runs the Controllers, CloudHub, and so on.
The other VMs are used to run a number of KubeEdge Edge Node pods, which run Edged, EdgeHub, and so on.
We will adjust the count of VMs based on the number of KubeEdge Edge Nodes.

The KubeEdge Performance Test setup is similar to the K8S Kubemark setup,
which simulates numbers of hollow-node pods and deploys them on a K8S Cluster.
In KubeEdge we do a similar kind of simulation, creating KubeEdge Edge Node pods
and deploying them through a deployment; the difference is that we use Docker-in-Docker for the KubeEdge Edge Nodes.
That means the applications deployed by KubeEdge will be running inside the KubeEdge Edge Node pods.
Each of our pods takes up resources as below:
- 1 pod : 0.10 vCPU & 250MB RAM

With this pod deployment we can accommodate approximately 10 pods per vCPU.
Based on the above K8S Node flavor, the CPU and RAM are 32vCPUs and 128GB respectively.
Per K8S Node we should be able to deploy 320 pods (KubeEdge Edge Nodes) on 32vCPUs,
and RAM consumption would be around 80GB. If we have 5 K8S Nodes with a similar flavor,
on the whole we should be able to deploy around 1500 pods (KubeEdge Edge Nodes) across the 5 K8S Nodes.

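As a back-of-the-envelope check of this estimate (the raw CPU-bound product is 1600 pods; the 1500 figure above presumably leaves some headroom):

```
// Rough capacity estimate for simulated Edge Node pods, using the
// per-pod footprint and K8S Node flavor given above.
package main

import "fmt"

func main() {
	const (
		podCPU   = 0.10 // vCPU per Edge Node pod
		podRAM   = 250  // MB per Edge Node pod
		nodeCPU  = 32   // vCPUs per K8S Node
		k8sNodes = 5
	)
	podsPerNode := int(nodeCPU / podCPU)      // CPU-bound: 320 pods per node
	ramPerNode := podsPerNode * podRAM / 1024 // ~78GB of the 128GB available
	fmt.Printf("pods per node: %d (~%dGB RAM)\n", podsPerNode, ramPerNode)
	fmt.Printf("total pods on %d K8S Nodes: %d\n", k8sNodes, k8sNodes*podsPerNode)
}
```
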
### Performance Test Framework
<img src="../images/perf/perf-test-framework.png">

The KubeEdge Performance Test Framework will be designed based on **Ginkgo** and **Gomega**.

The Performance Test Framework mainly relates to a Utils Library and different types of tests:
- E2E Test
- Latency Test
- Load Test
- Scalability Test
- ...

E2E Test Sample:

```
It("E2E_Test_1: Create deployment and check the pods are coming up correctly", func() {
	var deploymentList v1.DeploymentList
	var podlist metav1.PodList
	replica := 1
	// Generate the random string and assign as a UID
	UID = "deployment-app-" + utils.GetRandomString(5)
	IsAppDeployed := utils.HandleDeployment(http.MethodPost, ctx.Cfg.ApiServer+DeploymentHandler, UID, ctx.Cfg.AppImageUrl[1], nodeSelector, replica)
	Expect(IsAppDeployed).Should(BeTrue())
	err := utils.GetDeployments(&deploymentList, ctx.Cfg.ApiServer+DeploymentHandler)
	Expect(err).To(BeNil())
	for _, deployment := range deploymentList.Items {
		if deployment.Name == UID {
			label := nodeName
			podlist, err = utils.GetPods(ctx.Cfg.ApiServer+AppHandler, label)
			Expect(err).To(BeNil())
			break
		}
	}
	utils.CheckPodRunningState(ctx.Cfg.ApiServer+AppHandler, podlist)
})
```
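
Latency and Load tests can follow the same structure. As a sketch, reusing the helpers from the sample above (the time bound and sample count are illustrative), Ginkgo's built-in `Measure` block can run an operation repeatedly and report timing statistics:

```
Measure("Latency_Test_1: Time the deployment request against the cluster", func(b Benchmarker) {
	// Generate the random string and assign as a UID
	UID := "deployment-app-" + utils.GetRandomString(5)
	runtime := b.Time("create deployment", func() {
		IsAppDeployed := utils.HandleDeployment(http.MethodPost, ctx.Cfg.ApiServer+DeploymentHandler, UID, ctx.Cfg.AppImageUrl[1], nodeSelector, 1)
		Expect(IsAppDeployed).Should(BeTrue())
	})
	// Fail this sample if a single request takes unreasonably long.
	Expect(runtime.Seconds()).Should(BeNumerically("<", 5))
}, 10) // take 10 samples; Ginkgo reports min/max/mean
```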

By default, the Performance Test Framework will run all tests when the user runs the **perf.sh** script.
The user can also provide specific tests to run as command line input to the **perf.sh** script.

The Performance Test Framework also supports a command line interface with plenty of handy arguments
for running tests and generating test files. For example:

    perf.test -focus="LoadTest"
    perf.test -skip="ScalabilityTest"

Performance Test Framework features include:

- A comprehensive test runner.
- Built-in support for testing asynchronicity.
- Modular and easy to customize.
- Logging and reporting.
- Scalable to add more features.
- Built-in support for a command line interface.
- ...

### Performance Test Metrics Tools
* [Prometheus](https://github.com/prometheus/prometheus)
* [Grafana](https://github.com/grafana/grafana)

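As a sketch of how the test harness could expose its measurements for Prometheus to scrape (and Grafana to visualize), using the Go client library; the metric name and label are illustrative, not an agreed naming scheme:

```
// Sketch: expose performance test measurements on /metrics for
// Prometheus to scrape. Metric and label names are illustrative only.
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var e2eLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "kubeedge_perf_e2e_latency_seconds",
		Help: "E2E latency observed by the performance test.",
	},
	[]string{"scenario"},
)

func init() {
	prometheus.MustRegister(e2eLatency)
}

// observe records one measured operation for a given test scenario.
func observe(scenario string, start time.Time) {
	e2eLatency.WithLabelValues(scenario).Observe(time.Since(start).Seconds())
}

func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
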
### Performance Test Scenarios

#### 1. Edge Nodes join in K8S Cluster
<img src="../images/perf/perf-edgenodes-join-cluster.png">

Different numbers of Edge Nodes need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure the startup time for Edge Nodes to join the K8S Cluster (see the sketch below).

  This test case ends when all Edge Nodes are in `Ready` status.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

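A minimal sketch of the startup-time measurement, assuming the client-go setup from the deployment sketch above and a hypothetical `node-role=edge` label on Edge Nodes:

```
// Sketch: measure how long it takes for all Edge Nodes to become Ready.
// The "node-role=edge" label selector is hypothetical.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func measureNodeJoinTime(client kubernetes.Interface, expected int) time.Duration {
	start := time.Now()
	for {
		nodes, err := client.CoreV1().Nodes().List(metav1.ListOptions{LabelSelector: "node-role=edge"})
		if err != nil {
			panic(err)
		}
		ready := 0
		for _, n := range nodes.Items {
			for _, c := range n.Status.Conditions {
				if c.Type == corev1.NodeReady && c.Status == corev1.ConditionTrue {
					ready++
				}
			}
		}
		if ready >= expected {
			return time.Since(start) // all expected Edge Nodes are Ready
		}
		time.Sleep(2 * time.Second)
	}
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	fmt.Printf("edge nodes ready in %v\n", measureNodeJoinTime(client, 100))
}
```
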
#### 2. Create Devices from Cloud
<img src="../images/perf/perf-create-device.png">

This scenario is expected to measure the northbound API of KubeEdge.

Test Cases:
* Measure the latency between K8S Master and KubeEdge Cloud Part.

* Measure the throughput between K8S Master and KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

#### 3. Report Device Status to Edge
<img src="../images/perf/perf-report-devicestatus.png">

This scenario is expected to measure the southbound API of KubeEdge.

Different numbers of Devices need to be tested.
* Device numbers per Edge Node are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure the latency between KubeEdge Edge Part and device.

* Measure the throughput between KubeEdge Edge Part and device.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

From the latency and throughput results with different device numbers,
we can evaluate the device scalability of the KubeEdge Edge Part
and measure how many devices can be handled per Edge Node.
<img src="../images/perf/perf-multi-devices.png">

Different protocols should be considered for testing between the KubeEdge Edge Part and devices,
e.g. Bluetooth, MQTT, ZigBee, BACnet, Modbus, and so on.
Currently, a latency of less than 20ms is considered acceptable in Edge IoT scenarios.
Two kinds of test setups can be adopted: emulators of different devices, and actual devices.

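For MQTT, for instance, a round-trip latency probe through the Edge Node's broker could look like the following sketch; the broker address and topic names are placeholders, and a simulated device is assumed to echo each message:

```
// Sketch: measure round-trip latency through the Edge Node's MQTT broker.
// Broker URL and topic names are hypothetical placeholders.
package main

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().AddBroker("tcp://127.0.0.1:1883")
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}

	done := make(chan time.Duration, 1)
	var start time.Time

	// A simulated device is assumed to echo messages from "perf/ping"
	// back on "perf/pong".
	if token := client.Subscribe("perf/pong", 0, func(c mqtt.Client, m mqtt.Message) {
		done <- time.Since(start)
	}); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}

	start = time.Now()
	client.Publish("perf/ping", 0, false, []byte("probe")).Wait()
	fmt.Printf("MQTT round-trip latency: %v\n", <-done)
}
```
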
#### 4. Application Deployment from Cloud to Edge
<img src="../images/perf/perf-app-deploy.png">

This scenario is expected to measure the performance of KubeEdge from Cloud to Edge.
The docker image download latency is not included in this scenario.
In the following test cases, we need to make sure that docker images are already downloaded on the Edge Nodes.

Different numbers of Edge Nodes and Pods need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Pod numbers per Edge Node are one of `[1, 2, 5, 10, 20...]`.

Test Cases:
* Measure the pod startup time (see the sketch below).

  This test case ends when all pods are in `Ready` status.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

From the pod startup time results, we can evaluate the scalability of KubeEdge Edge Nodes:
measure how many Edge Nodes can be handled by the KubeEdge Cloud Part,
and measure how many pods can be handled per Edge Node.

<img src="../images/perf/perf-multi-edgenodes.png">

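Per-pod startup time can be derived from the pod's own timestamps, so that polling overhead on the Test Client does not skew the result. A sketch, assuming pods are listed via client-go as in the deployment sketch above:

```
// Sketch: compute a pod's startup time as the gap between its creation
// and the moment its Ready condition became true.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
)

func podStartupTime(pod corev1.Pod) (time.Duration, bool) {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
			return c.LastTransitionTime.Sub(pod.CreationTimestamp.Time), true
		}
	}
	return 0, false // pod is not Ready yet
}

func main() {
	// In the real test, pods would come from a client-go List call
	// as in the deployment sketch above.
	var pods []corev1.Pod
	for _, p := range pods {
		if d, ok := podStartupTime(p); ok {
			fmt.Printf("%s started in %v\n", p.Name, d)
		}
	}
}
```
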
#### 5. Update Device Twin State from Cloud to Device
<img src="../images/perf/perf-update-devicetwin.png">

This scenario is expected to measure the E2E performance of KubeEdge.

Different numbers of Edge Nodes and Devices need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Device numbers per Edge Node are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure E2E latency.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

These test cases should be run both when the system is idle and when it is under heavy load.

#### 6. Add Pod from CloudHub to EdgeHub
<img src="../images/perf/perf-cloudhub-edgehub.png">

This scenario is expected to measure the performance of KubeEdge between CloudHub and EdgeHub.
This is not actually an E2E Test scenario for KubeEdge,
but the message delivery channel between CloudHub and EdgeHub may be our bottleneck.
Currently we are using WebSocket as the communication protocol between Cloud and Edge.
In the following test cases, we need to mock the behaviors of CloudHub and EdgeHub:
simulated "add pod" messages will be sent to EdgeHub,
and simulated "pod status" messages will be sent back to CloudHub,
so that we can get the exact latency and throughput between CloudHub and EdgeHub.

Different numbers of Edge Nodes and Pods need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Pod numbers per Edge Node are one of `[1, 2, 5, 10, 20...]`.

Test Cases:
* Measure the latency between KubeEdge CloudHub and KubeEdge EdgeHub.

* Measure the throughput between KubeEdge CloudHub and KubeEdge EdgeHub.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

From the latency and throughput results, we can evaluate the scalability of KubeEdge EdgeHubs, which is equivalent to the scalability of KubeEdge Edge Nodes.

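A minimal sketch of such a mock (the use of the gorilla/websocket library here is an assumption, and the URL and payloads are placeholders): a CloudHub stand-in dials an EdgeHub stand-in, sends an "add pod" message, and times the "pod status" reply:

```
// Sketch: time the CloudHub->EdgeHub->CloudHub message round trip over
// WebSocket using mocked hubs. URL and payloads are placeholders.
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

// Mock EdgeHub: replies to every "add pod" message with a "pod status" message.
func edgeHub(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println(err)
		return
	}
	defer conn.Close()
	for {
		mt, msg, err := conn.ReadMessage()
		if err != nil {
			return
		}
		_ = msg // a real mock would parse the pod spec here
		conn.WriteMessage(mt, []byte(`{"type":"pod-status","phase":"Running"}`))
	}
}

func main() {
	http.HandleFunc("/edgehub", edgeHub)
	go http.ListenAndServe("127.0.0.1:9000", nil)
	time.Sleep(time.Second) // wait for the mock server to start

	// Mock CloudHub: dial the EdgeHub and time one round trip.
	conn, _, err := websocket.DefaultDialer.Dial("ws://127.0.0.1:9000/edgehub", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	start := time.Now()
	conn.WriteMessage(websocket.TextMessage, []byte(`{"type":"add-pod","name":"nginx"}`))
	if _, _, err := conn.ReadMessage(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("CloudHub<->EdgeHub round trip: %v\n", time.Since(start))
}
```
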
## Thresholds

From the results of the Performance Test, we expect to determine the performance and scalability of KubeEdge.
This is critical for identifying improvement items for KubeEdge.
On the other hand, it will give the users a recommended setup and user guides for KubeEdge.

As [this K8S document](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md) mentions,
since the 1.6 release K8S can support 5000 Nodes and 150000 Pods in a single cluster.
KubeEdge is based on the K8S Master; the difference between K8S Nodes and KubeEdge Edge Nodes
is that KubeEdge Edge Nodes are not directly connected with the K8S Master like the K8S Nodes are.
The KubeEdge Cloud Part connects the K8S Master with the KubeEdge Edge Nodes.
The KubeEdge Edge Nodes are also lightweight and use fewer resources such as CPU and memory.

Currently we have no KubeEdge performance data that can be compared with other systems.
But we can measure the performance and scalability of KubeEdge using the Performance Test data.
We can get the original test data from the KubeEdge 0.3 release, and also run the Performance Test for follow-up releases.
We define the following thresholds, which will be based on the Performance Test data for KubeEdge.
In most cases, exceeding these thresholds does not mean KubeEdge fails;
it just means its overall performance degrades.

| Quantity                            | 0.3 Release    | 1.0 Release | Long Term Goal |
|-------------------------------------|----------------|-------------|----------------|
| Edge Node numbers                   |                |             |                |
| Pod numbers                         |                |             |                |
| Pod numbers per Edge Node           |                |             |                |
| Device numbers                      |                |             |                |
| Device numbers per Edge Node        |                |             |                |

The KubeEdge Performance Test Cases will exceed 5000 Edge Nodes and 150000 Pods,
so that we can make a comparison with a K8S Cluster.
The table will be filled in with the first round of Performance Test data.