---
title: Performance Test Proposal
authors:
  - "@edisonxiang"
  - "@pavan187"
approvers:
  - "@qizha"
  - "@CindyXing"
  - "@kevin-wangzefeng"
  - "@Baoqiang-Zhang"
  - "@m1093782566"
creation-date: 2019-03-28
last-updated: 2019-04-26
status: implementable
---

# Performance Test Proposal

* [Performance Test Proposal](#performance-test-proposal)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non\-goals](#non-goals)
  * [Proposal](#proposal)
    * [Performance Test Deployment](#performance-test-deployment)
    * [Performance Test Framework](#performance-test-framework)
    * [Performance Test Metrics Tools](#performance-test-metrics-tools)
    * [Performance Test Scenarios](#performance-test-scenarios)
  * [Thresholds](#thresholds)

## Motivation

Currently KubeEdge testing is focused on automated test suites for unit, integration and E2E testing and validation.
KubeEdge allows users to manage a large number of edge nodes and devices from the cloud.
A set of tests, specifically performance tests, can be used to determine the non-functional
characteristics of KubeEdge such as latency, throughput, CPU usage, memory usage and so on.
As a result, we can also evaluate future improvement items for KubeEdge.

This proposal lists the possible performance test scenarios and test cases for KubeEdge.

### Goals

* Benchmark the performance against the following Service Level Objectives:
  * Latency: the time from the moment the server receives a request until the last byte of the response is sent to the user.
  * Throughput: measure how many requests can be served within a given time.
  * Scalability: potential scaling capacity (including the number of nodes, pods, devices etc. at the edge) under different load conditions.
  * CPU Usage: measure the CPU usage of KubeEdge under different load conditions.
  * Memory Usage: measure the memory usage of KubeEdge under different load conditions.
* Performance tests should be able to run against both the containerized and un-containerized versions of KubeEdge.

### Non-goals

* To design the specific implementation details of any single performance test.

## Proposal

### Performance Test Deployment
<img src="../images/perf/perf-deploy-type.png">

Every running KubeEdge Performance Test setup looks like the following:

1. A real K8S Cluster that has a K8S Master and Nodes, shown as **K8S Cluster** (including **VM2** and other **VMs**) in the above figure.
   This Cluster is used to provision the KubeEdge Edge Nodes.
2. A Cluster that has a K8S Master and Nodes, shown as **KubeEdge Cluster** (including **VM3** and **VM4**) in the above figure.
   The KubeEdge Cloud Part and KubeEdge Edge Nodes are also included in this Cluster.
   This Cluster is used to deploy the KubeEdge Cloud Part pod and run the performance test.
3. The KubeEdge Cloud Part image and KubeEdge Edge Node image are built and pushed to any reachable container image registry.
4. The Test Client, shown as **VM1** in the above figure, uses the deployment controller
   to deploy the KubeEdge Cloud Part pod in the **KubeEdge Cluster**
   and the KubeEdge Edge Node pods in the **K8S Cluster** respectively,
   and then launches the performance test against the **KubeEdge Cluster** (a deployment sketch follows below).

Before running the KubeEdge Performance Test, the developer is responsible for creating items 1~3 above.
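As a rough illustration of step 4, the sketch below uses `client-go` (in the pre-1.18 style matching the K8S v1.13.5 listed later in this proposal) to create a Deployment of simulated Edge Node pods and poll until they are all **Running**. The kubeconfig path, namespace, labels, image name and replica count are illustrative assumptions, not fixed choices.

```
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	// Assumption: kubeconfig of the K8S Cluster that will host the Edge Node pods.
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	labels := map[string]string{"app": "edge-node"}
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "edge-node", Namespace: "default"},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(10), // number of simulated KubeEdge Edge Nodes
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "edge-node",
						Image: "kubeedge/edge-node:latest", // hypothetical image name
					}},
				},
			},
		},
	}
	if _, err := client.AppsV1().Deployments("default").Create(deployment); err != nil {
		panic(err)
	}

	// Poll until every Edge Node pod reports the Running phase.
	for {
		pods, err := client.CoreV1().Pods("default").List(metav1.ListOptions{LabelSelector: "app=edge-node"})
		if err != nil {
			panic(err)
		}
		running := 0
		for _, pod := range pods.Items {
			if pod.Status.Phase == corev1.PodRunning {
				running++
			}
		}
		fmt.Printf("%d/%d Edge Node pods Running\n", running, len(pods.Items))
		if len(pods.Items) > 0 && running == len(pods.Items) {
			return
		}
		time.Sleep(5 * time.Second)
	}
}
```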
The Test Client uses the deployment object to deploy the KubeEdge Cloud Part pod in the **KubeEdge Cluster**
and the KubeEdge Edge Node pods in the **K8S Cluster**,
and waits until all the pods come up and are **Running**.
The KubeEdge Cloud Part pod will be running in an independent VM, shown as **VM3** in the above figure.
The KubeEdge Edge Node pods will be running in the **K8S Cluster**.
Once the KubeEdge Cloud Part and KubeEdge Edge Nodes are running,
the KubeEdge Cloud Part will try to connect with the K8S Master shown as **VM4** in the above figure,
and the KubeEdge Edge Nodes will try to connect with the KubeEdge Cloud Part.
In the end, another Cluster is formed, shown as **KubeEdge Cluster** in the above figure.

#### Test Client
| Subject        | Description               |
|----------------|---------------------------|
| OS             | Ubuntu 18.04 server 64bit |
| CPU            | 4vCPUs                    |
| RAM            | 8GB                       |
| Disk Size      | 40GB                      |
| Count          | 1                         |

This VM is used to deploy KubeEdge and run the performance test for KubeEdge.

#### K8S Masters
| Subject        | Description               |
|----------------|---------------------------|
| OS             | Ubuntu 18.04 server 64bit |
| K8S Version    | v1.13.5                   |
| Docker Version | v17.09                    |
| CPU            | 32vCPUs                   |
| RAM            | 128GB                     |
| Disk Size      | 40GB                      |
| Count          | 2                         |

These two VMs are used to run the K8S Master services, including the K8S API Server, the K8S Scheduler and so on.
One of them is used to deploy the KubeEdge Edge Node pods.
The other one is used to deploy the KubeEdge Cloud Part pod and run the performance test for KubeEdge.

#### K8S Nodes
| Subject        | Description               |
|----------------|---------------------------|
| OS             | Ubuntu 18.04 server 64bit |
| K8S Version    | v1.13.5                   |
| Docker Version | v17.09                    |
| CPU            | 32vCPUs                   |
| RAM            | 128GB                     |
| Disk Size      | 40GB                      |
| Count          | 2...N                     |

One of these VMs is used to run the KubeEdge Cloud Part pod, which runs the Controllers, CloudHub and so on.
The other VMs are used to run a number of KubeEdge Edge Node pods, which run Edged, EdgeHub and so on.
We will adjust the Count of VMs based on the number of KubeEdge Edge Nodes.

The KubeEdge Performance Test setup is similar to the K8S Kubemark setup,
which simulates a number of hollow-node pods deployed on a K8S Cluster.
In KubeEdge we do a similar kind of simulation, creating KubeEdge Edge Node pods
and deploying them through a deployment; the difference is that we use Docker-in-Docker for the KubeEdge Edge Nodes.
That means the applications deployed by KubeEdge will be running inside the KubeEdge Edge Node pods.
Each of our pods takes up resources as below:
- 1 pod : 0.10 vCPU & 250MB RAM

With this KubeEdge pod deployment we can accommodate approximately 10 pods per vCPU.
Based on the above K8S Node flavor, the CPU and RAM are 32vCPUs and 128GB respectively.
Per K8S Node we should be able to deploy about 320 pods (KubeEdge Edge Nodes) on 32vCPUs,
and the RAM consumption would be around 80GB (320 x 250MB). With 5 K8S Nodes of a similar flavor,
we should be able to deploy roughly 1500 pods (KubeEdge Edge Nodes) across the 5 K8S Nodes as a whole.

### Performance Test Framework
<img src="../images/perf/perf-test-framework.png">

The KubeEdge Performance Test Framework will be designed based on **Gomega** and **Ginkgo**.
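As a minimal sketch of how such a Ginkgo/Gomega suite is bootstrapped (the package and suite names are illustrative):

```
package perf

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// TestPerf wires Gomega's fail handler into Ginkgo and runs every spec
// registered in this package (latency, load, scalability tests and so on).
func TestPerf(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "KubeEdge Performance Test Suite")
}
```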
The Performance Test Framework mainly relates to the Utils Library and different types of tests:
- E2E Test
- Latency Test
- Load Test
- Scalability Test
- ...

E2E Test Sample:

```
It("E2E_Test_1: Create deployment and check the pods are coming up correctly", func() {
	// Assumed imports: appsv1 "k8s.io/api/apps/v1", corev1 "k8s.io/api/core/v1"
	var deploymentList appsv1.DeploymentList
	var podList corev1.PodList
	replica := 1
	// Generate a random string and use it as the UID of the deployment.
	UID = "deployment-app-" + utils.GetRandomString(5)
	IsAppDeployed := utils.HandleDeployment(http.MethodPost, ctx.Cfg.ApiServer+DeploymentHandler, UID, ctx.Cfg.AppImageUrl[1], nodeSelector, replica)
	Expect(IsAppDeployed).Should(BeTrue())
	err := utils.GetDeployments(&deploymentList, ctx.Cfg.ApiServer+DeploymentHandler)
	Expect(err).To(BeNil())
	for _, deployment := range deploymentList.Items {
		if deployment.Name == UID {
			label := nodeName
			podList, err = utils.GetPods(ctx.Cfg.ApiServer+AppHandler, label)
			Expect(err).To(BeNil())
			break
		}
	}
	// Block until all the listed pods reach the Running state.
	utils.CheckPodRunningState(ctx.Cfg.ApiServer+AppHandler, podList)
})
```

By default the Performance Test Framework will run all tests when the users run the **perf.sh** script.
The users can also provide the specific tests to run as a command line input to the **perf.sh** script.

The Performance Test Framework also supports a command line interface with plenty of handy command line arguments
for running your tests and generating test files. For example:

- perf.test -focus="LoadTest" and perf.test -skip="ScalabilityTest"

Performance Test Framework features include:

- A comprehensive test runner.
- Built-in support for testing asynchronicity.
- Modular and easy to customize.
- Logging and Reporting.
- Scalable to add more features.
- Built-in support for a command line interface.
- ...

### Performance Test Metrics Tools
* [Prometheus](https://github.com/prometheus/prometheus)
* [Grafana](https://github.com/grafana/grafana)

### Performance Test Scenarios

#### 1. Edge Nodes join in K8S Cluster
<img src="../images/perf/perf-edgenodes-join-cluster.png">

Different numbers of Edge Nodes need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure the startup time for Edge Nodes to join the K8S Cluster.

  This test case ends when all Edge Nodes are in `Ready` status.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

#### 2. Create Devices from Cloud
<img src="../images/perf/perf-create-device.png">

This scenario is expected to measure the northbound API of KubeEdge.

Test Cases:
* Measure the latency between the K8S Master and KubeEdge Cloud Part.

* Measure the throughput between the K8S Master and KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

#### 3. Report Device Status to Edge
<img src="../images/perf/perf-report-devicestatus.png">

This scenario is expected to measure the southbound API of KubeEdge.

Different numbers of Devices need to be tested.
* Device numbers per Edge Node are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure the latency between KubeEdge Edge Part and the device (a measurement sketch follows this list).

* Measure the throughput between KubeEdge Edge Part and the device.

* Measure CPU and Memory Usage of KubeEdge Edge Part.
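A minimal sketch of one way to take this latency measurement over MQTT, assuming a broker on the Edge Node at `127.0.0.1:1883`; the twin topics and payload below are simplified assumptions, and the real KubeEdge topics and message format may differ:

```
package main

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	// Assumption: the Edge Node's MQTT broker listens on localhost:1883.
	opts := mqtt.NewClientOptions().AddBroker("tcp://127.0.0.1:1883")
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}
	defer client.Disconnect(250)

	done := make(chan struct{}, 1)
	// Hypothetical result topic on which the Edge Part acknowledges the report.
	token := client.Subscribe("$hw/events/device/sensor-01/twin/update/result", 0,
		func(c mqtt.Client, m mqtt.Message) {
			select {
			case done <- struct{}{}: // signal the first acknowledgement only
			default:
			}
		})
	if token.Wait() && token.Error() != nil {
		panic(token.Error())
	}

	start := time.Now()
	// Publish a simulated device status report (payload is illustrative).
	client.Publish("$hw/events/device/sensor-01/twin/update", 0, false,
		`{"twin":{"temperature":{"actual":{"value":"25"}}}}`)
	<-done
	fmt.Printf("device status report latency: %v\n", time.Since(start))
}
```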
Based on the latency and throughput results for different device numbers,
we can evaluate the device scalability of KubeEdge Edge Part
and measure how many devices can be handled per Edge Node.
<img src="../images/perf/perf-multi-devices.png">

Different protocols should be considered for testing between KubeEdge Edge Part and the devices,
e.g. Bluetooth, MQTT, ZigBee, BACnet, Modbus and so on.
Currently, a latency of less than 20ms is considered acceptable in Edge IoT scenarios.
Two kinds of test setups can be adopted: emulators of the different devices, and actual devices.

#### 4. Application Deployment from Cloud to Edge
<img src="../images/perf/perf-app-deploy.png">

This scenario is expected to measure the performance of KubeEdge from Cloud to Edge.
The docker image download latency is not included in this scenario.
In the following test cases, we need to make sure that the docker images are already downloaded on the Edge Nodes.

Different numbers of Edge Nodes and Pods need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Pod numbers per Edge Node are one of `[1, 2, 5, 10, 20...]`.

Test Cases:
* Measure the pod startup time.

  This test case ends when all pods are in `Ready` status.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

Based on the pod startup time, we can evaluate the scalability of the KubeEdge Edge Nodes:
measure how many Edge Nodes can be handled by KubeEdge Cloud Part,
and measure how many pods can be handled per Edge Node.

<img src="../images/perf/perf-multi-edgenodes.png">

#### 5. Update Device Twin State from Cloud to Device
<img src="../images/perf/perf-update-devicetwin.png">

This scenario is expected to measure the E2E performance of KubeEdge.

Different numbers of Edge Nodes and Devices need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Device numbers per Edge Node are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure the E2E latency.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

These test cases should be run both when the system is idle and when it is under heavy load.

#### 6. Add Pod from CloudHub to EdgeHub
<img src="../images/perf/perf-cloudhub-edgehub.png">

This scenario is expected to measure the performance of KubeEdge between CloudHub and EdgeHub.
This is not actually an E2E test scenario for KubeEdge,
but the message delivery channel between CloudHub and EdgeHub may be our bottleneck.
Currently we are using WebSocket as the communication protocol between Cloud and Edge.
In the following test cases, we need to mock the behaviors of CloudHub and EdgeHub:
simulated messages for adding a pod will be sent to EdgeHub,
and simulated pod status messages will be sent back to CloudHub,
so that we can get the exact latency and throughput between CloudHub and EdgeHub
(a mock sketch follows the test case list below).

Different numbers of Edge Nodes and Pods need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Pod numbers per Edge Node are one of `[1, 2, 5, 10, 20...]`.

Test Cases:
* Measure the latency between KubeEdge CloudHub and KubeEdge EdgeHub.

* Measure the throughput between KubeEdge CloudHub and KubeEdge EdgeHub.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.
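Below is a minimal sketch of the mocked latency measurement from the CloudHub side, assuming a mocked EdgeHub that echoes a pod status message back over a `gorilla/websocket` connection; the endpoint URL and message payloads are illustrative assumptions, not the real beehive message format:

```
package main

import (
	"fmt"
	"time"

	"github.com/gorilla/websocket"
)

func main() {
	// Assumption: the mocked hub pair exposes a websocket endpoint at this URL.
	conn, _, err := websocket.DefaultDialer.Dial("ws://127.0.0.1:10000/events", nil)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Simulated "add pod" message (illustrative payload).
	addPod := []byte(`{"route":{"operation":"insert","resource":"default/pod/nginx"}}`)

	start := time.Now()
	if err := conn.WriteMessage(websocket.TextMessage, addPod); err != nil {
		panic(err)
	}
	// Wait for the mocked EdgeHub to send the pod status message back.
	if _, _, err := conn.ReadMessage(); err != nil {
		panic(err)
	}
	fmt.Printf("CloudHub<->EdgeHub round trip: %v\n", time.Since(start))
}
```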
From the latency and throughput results, we can evaluate the scalability of KubeEdge EdgeHub,
and likewise of the KubeEdge Edge Nodes.

## Thresholds

As the result of the Performance Test, we expect to determine the performance and scalability of KubeEdge.
This is critical for deriving improvement items for KubeEdge.
On the other hand, it will give the users a recommended setup and user guides for KubeEdge.

As [this K8S document](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md) mentions,
since the 1.6 release K8S can support 5000 Nodes and 150000 Pods in a single cluster.
KubeEdge is based on the K8S Master; the difference between K8S Nodes and KubeEdge Edge Nodes
is that KubeEdge Edge Nodes are not directly connected with the K8S Master like the K8S Nodes are.
The KubeEdge Cloud Part connects the K8S Master with the KubeEdge Edge Nodes.
The KubeEdge Edge Nodes are lightweight and make use of fewer resources such as CPU and memory.

Currently we have no KubeEdge performance data that can be compared with other systems.
But we can measure the performance and scalability of KubeEdge using the Performance Test data.
We can get the original test data from the KubeEdge 0.3 release, and also run the Performance Test for the follow-up releases.
We define the following thresholds, which will be based on the Performance Test data for KubeEdge.
In most cases, exceeding these thresholds does not mean KubeEdge fails;
it just means its overall performance degrades.

| Quantity                     | 0.3 Release | 1.0 Release | Long Term Goal |
|------------------------------|-------------|-------------|----------------|
| Edge Node numbers            |             |             |                |
| Pod numbers                  |             |             |                |
| Pod numbers per Edge Node    |             |             |                |
| Device numbers               |             |             |                |
| Device numbers per Edge Node |             |             |                |

The KubeEdge Performance Test Cases will exceed 5000 Edge Nodes and 150000 Pods,
so that we can make a comparison with a K8S Cluster.
The table will be filled in with the first round of Performance Test data.