
---
title: Performance Test Proposal
authors:
  - "@edisonxiang"
  - "@pavan187"
approvers:
  - "@qizha"
  - "@CindyXing"
  - "@kevin-wangzefeng"
  - "@Baoqiang-Zhang"
  - "@m1093782566"
creation-date: 2019-03-28
last-updated: 2019-04-26
status: implementable
---

# Performance Test Proposal

* [Performance Test Proposal](#performance-test-proposal)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non\-goals](#non-goals)
  * [Proposal](#proposal)
    * [Performance Test Deployment](#performance-test-deployment)
    * [Performance Test Framework](#performance-test-framework)
    * [Performance Test Metrics Tools](#performance-test-metrics-tools)
    * [Performance Test Scenarios](#performance-test-scenarios)
  * [Thresholds](#thresholds)

## Motivation

Currently, KubeEdge testing is focused on automated suites for unit, integration, and E2E testing and validation.
KubeEdge allows users to manage large numbers of edge nodes and devices from the cloud.
A set of tests, specifically performance tests, can be used to determine the non-functional
characteristics of KubeEdge such as latency, throughput, CPU usage, and memory usage.
As a result, we can also evaluate future improvement items for KubeEdge.

This proposal lists the possible performance test scenarios and test cases for KubeEdge.

### Goals

* Benchmark the performance against the following Service Level Objectives:
  * Latency: the time from the moment the server receives a request until the last byte of the response is sent to the user.
  * Throughput: how many requests can be served within a given time.
  * Scalability: potential scaling capacity (including the number of nodes, pods, devices, etc. at the edge) under different load conditions.
  * CPU Usage: the CPU usage of KubeEdge under different load conditions.
  * Memory Usage: the memory usage of KubeEdge under different load conditions.
* Performance tests should be able to run against both containerized and non-containerized versions of KubeEdge.

### Non-goals

* To design the specific implementation details of any single performance test.

## Proposal

### Performance Test Deployment
<img src="../images/perf/perf-deploy-type.png">

Every running KubeEdge Performance Test setup looks like the following:

1. A real K8S Cluster that has a K8S Master and Nodes, shown as **K8S Cluster** (including **VM2** and other **VMs**) in the above figure.
This Cluster is used to provision KubeEdge Edge Nodes.
2. A Cluster that has a K8S Master and Nodes, shown as **KubeEdge Cluster** (including **VM3** and **VM4**) in the above figure.
The KubeEdge Cloud Part and KubeEdge Edge Nodes are also included in this Cluster.
This Cluster is used to deploy the KubeEdge Cloud Part pod and run the performance test.
3. The KubeEdge Cloud Part image and KubeEdge Edge Node image are built and pushed to any reachable container image registry.
4. The Test Client, shown as **VM1** in the above figure, uses the deployment controller
to deploy the KubeEdge Cloud Part pod in the **KubeEdge Cluster**
and KubeEdge Edge Node pods in the **K8S Cluster** respectively,
and then launches the performance test against the **KubeEdge Cluster**.

Before running the KubeEdge Performance Test, the developer is responsible for creating items 1-3 above.
The Test Client uses a deployment object to deploy the KubeEdge Cloud Part pod in the **KubeEdge Cluster**
and KubeEdge Edge Node pods in the **K8S Cluster**,
and waits until all the pods come up and are **Running** (a sketch of this flow follows below).
The KubeEdge Cloud Part pod will be running in an independent VM, shown as **VM3** in the above figure.
The KubeEdge Edge Node pods will be running in the **K8S Cluster**.
Once the KubeEdge Cloud Part and KubeEdge Edge Nodes are running,
the KubeEdge Cloud Part will try to connect with the K8S Master, shown as **VM4** in the above figure,
and the KubeEdge Edge Nodes will try to connect with the KubeEdge Cloud Part.
Finally, these together make up the Cluster shown as **KubeEdge Cluster** in the above figure.

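A minimal sketch of this flow, using client-go of the K8S 1.13 era; the kubeconfig path, namespace, labels, image, and replica count are placeholders rather than the actual test configuration:

```
// Sketch: deploy KubeEdge Edge Node pods and wait until they are Running.
// Image, namespace and labels are hypothetical placeholders.
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "edge-node"},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(100), // number of simulated Edge Nodes
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "edge-node"}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "edge-node"}},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "edge-node",
						Image: "example-registry/kubeedge-edge-node:latest", // placeholder image
					}},
				},
			},
		},
	}
	if _, err := client.AppsV1().Deployments("default").Create(deployment); err != nil {
		panic(err)
	}

	// Poll until every pod of the deployment is Running.
	for {
		pods, err := client.CoreV1().Pods("default").List(metav1.ListOptions{LabelSelector: "app=edge-node"})
		if err != nil {
			panic(err)
		}
		running := 0
		for _, p := range pods.Items {
			if p.Status.Phase == corev1.PodRunning {
				running++
			}
		}
		fmt.Printf("%d/%d edge node pods running\n", running, len(pods.Items))
		if len(pods.Items) > 0 && running == len(pods.Items) {
			break
		}
		time.Sleep(5 * time.Second)
	}
}
```
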
#### Test Client
| Subject                        | Description                                  |
|--------------------------------|----------------------------------------------|
| OS                             |  Ubuntu 18.04 server 64bit                   |
| CPU                            |  4vCPUs                                      |
| RAM                            |  8GB                                         |
| Disk Size                      |  40GB                                        |
| Count                          |  1                                           |

This VM is used to deploy KubeEdge and run the performance test for KubeEdge.

#### K8S Masters
| Subject                        | Description                                  |
|--------------------------------|----------------------------------------------|
| OS                             |  Ubuntu 18.04 server 64bit                   |
| K8S Version                    |  v1.13.5                                     |
| Docker Version                 |  v17.09                                      |
| CPU                            |  32vCPUs                                     |
| RAM                            |  128GB                                       |
| Disk Size                      |  40GB                                        |
| Count                          |  2                                           |

These two VMs are used to run K8S Master services, including the K8S API Server, the K8S Scheduler, and so on.
One of them is used to deploy KubeEdge Edge Node pods.
The other is used to deploy the KubeEdge Cloud Part pod and run the performance test for KubeEdge.

#### K8S Nodes
| Subject                        | Description                                  |
|--------------------------------|----------------------------------------------|
| OS                             |  Ubuntu 18.04 server 64bit                   |
| K8S Version                    |  v1.13.5                                     |
| Docker Version                 |  v17.09                                      |
| CPU                            |  32vCPUs                                     |
| RAM                            |  128GB                                       |
| Disk Size                      |  40GB                                        |
| Count                          |  2...N                                       |

One of these VMs is used to run the KubeEdge Cloud Part pod, which runs the Controllers, CloudHub, and so on.
The other VMs are used to run a number of KubeEdge Edge Node pods, which run Edged, EdgeHub, and so on.
We will adjust the count of VMs based on the number of KubeEdge Edge Nodes.

The KubeEdge Performance Test setup is similar to the K8S Kubemark setup,
which simulates numbers of hollow-node pods and deploys them on a K8S Cluster.
In KubeEdge we do a similar kind of simulation, creating KubeEdge Edge Node pods
and deploying them through a deployment; the difference is that we use Docker-in-Docker for the KubeEdge Edge Nodes.
That means the applications deployed by KubeEdge will be running inside the KubeEdge Edge Node pods.
Each of our pods takes up resources as below:
- 1 pod : 0.10 vCPU & 250MB RAM

With this pod deployment we can accommodate approximately 10 pods per vCPU.
Based on the above K8S Node flavor, the CPU and RAM are 32vCPUs and 128GB respectively.
Per K8S Node we should be able to deploy 320 pods (KubeEdge Edge Nodes) on 32vCPUs,
and RAM consumption would be around 80GB. If we have 5 K8S Nodes with a similar flavor,
on the whole we should be able to deploy around 1500 pods (KubeEdge Edge Nodes) across the 5 K8S Nodes.

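As a back-of-the-envelope check of this estimate (the raw CPU-bound product is 1600 pods; the 1500 figure above presumably leaves some headroom):

```
// Rough capacity estimate for simulated Edge Node pods, using the
// per-pod footprint and K8S Node flavor given above.
package main

import "fmt"

func main() {
	const (
		podCPU   = 0.10 // vCPU per Edge Node pod
		podRAM   = 250  // MB per Edge Node pod
		nodeCPU  = 32   // vCPUs per K8S Node
		k8sNodes = 5
	)
	podsPerNode := int(nodeCPU / podCPU)      // CPU-bound: 320 pods per node
	ramPerNode := podsPerNode * podRAM / 1024 // ~78GB of the 128GB available
	fmt.Printf("pods per node: %d (~%dGB RAM)\n", podsPerNode, ramPerNode)
	fmt.Printf("total pods on %d K8S Nodes: %d\n", k8sNodes, k8sNodes*podsPerNode)
}
```
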
### Performance Test Framework
<img src="../images/perf/perf-test-framework.png">

The KubeEdge Performance Test Framework will be designed based on **Ginkgo** and **Gomega**.

The Performance Test Framework mainly relates to a Utils Library and different types of tests:
- E2E Test
- Latency Test
- Load Test
- Scalability Test
- ...

E2E Test Sample:

```
It("E2E_Test_1: Create deployment and check the pods are coming up correctly", func() {
	var deploymentList v1.DeploymentList
	var podlist metav1.PodList
	replica := 1
	// Generate the random string and assign as a UID
	UID = "deployment-app-" + utils.GetRandomString(5)
	IsAppDeployed := utils.HandleDeployment(http.MethodPost, ctx.Cfg.ApiServer+DeploymentHandler, UID, ctx.Cfg.AppImageUrl[1], nodeSelector, replica)
	Expect(IsAppDeployed).Should(BeTrue())
	err := utils.GetDeployments(&deploymentList, ctx.Cfg.ApiServer+DeploymentHandler)
	Expect(err).To(BeNil())
	for _, deployment := range deploymentList.Items {
		if deployment.Name == UID {
			label := nodeName
			podlist, err = utils.GetPods(ctx.Cfg.ApiServer+AppHandler, label)
			Expect(err).To(BeNil())
			break
		}
	}
	utils.CheckPodRunningState(ctx.Cfg.ApiServer+AppHandler, podlist)
})
```
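
Latency and Load tests can follow the same structure. As a sketch, reusing the helpers from the sample above (the time bound and sample count are illustrative), Ginkgo's built-in `Measure` block can run an operation repeatedly and report timing statistics:

```
Measure("Latency_Test_1: Time the deployment request against the cluster", func(b Benchmarker) {
	// Generate the random string and assign as a UID
	UID := "deployment-app-" + utils.GetRandomString(5)
	runtime := b.Time("create deployment", func() {
		IsAppDeployed := utils.HandleDeployment(http.MethodPost, ctx.Cfg.ApiServer+DeploymentHandler, UID, ctx.Cfg.AppImageUrl[1], nodeSelector, 1)
		Expect(IsAppDeployed).Should(BeTrue())
	})
	// Fail this sample if a single request takes unreasonably long.
	Expect(runtime.Seconds()).Should(BeNumerically("<", 5))
}, 10) // take 10 samples; Ginkgo reports min/max/mean
```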

By default, the Performance Test Framework will run all tests when the user runs the **perf.sh** script.
The user can also provide specific tests to run as command line input to the **perf.sh** script.

The Performance Test Framework also supports a command line interface with plenty of handy arguments
for running tests and generating test files. For example:

    perf.test -focus="LoadTest"
    perf.test -skip="ScalabilityTest"

Performance Test Framework features include:

- A comprehensive test runner.
- Built-in support for testing asynchronicity.
- Modular and easy to customize.
- Logging and reporting.
- Scalable to add more features.
- Built-in support for a command line interface.
- ...

### Performance Test Metrics Tools
* [Prometheus](https://github.com/prometheus/prometheus)
* [Grafana](https://github.com/grafana/grafana)

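As a sketch of how the test harness could expose its measurements for Prometheus to scrape (and Grafana to visualize), using the Go client library; the metric name and label are illustrative, not an agreed naming scheme:

```
// Sketch: expose performance test measurements on /metrics for
// Prometheus to scrape. Metric and label names are illustrative only.
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var e2eLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "kubeedge_perf_e2e_latency_seconds",
		Help: "E2E latency observed by the performance test.",
	},
	[]string{"scenario"},
)

func init() {
	prometheus.MustRegister(e2eLatency)
}

// observe records one measured operation for a given test scenario.
func observe(scenario string, start time.Time) {
	e2eLatency.WithLabelValues(scenario).Observe(time.Since(start).Seconds())
}

func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
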
### Performance Test Scenarios

#### 1. Edge Nodes join in K8S Cluster
<img src="../images/perf/perf-edgenodes-join-cluster.png">

Different numbers of Edge Nodes need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure the startup time for Edge Nodes to join the K8S Cluster (see the sketch below).

  This test case ends when all Edge Nodes are in `Ready` status.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

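A minimal sketch of the startup-time measurement, assuming the client-go setup from the deployment sketch above and a hypothetical `node-role=edge` label on Edge Nodes:

```
// Sketch: measure how long it takes for all Edge Nodes to become Ready.
// The "node-role=edge" label selector is hypothetical.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func measureNodeJoinTime(client kubernetes.Interface, expected int) time.Duration {
	start := time.Now()
	for {
		nodes, err := client.CoreV1().Nodes().List(metav1.ListOptions{LabelSelector: "node-role=edge"})
		if err != nil {
			panic(err)
		}
		ready := 0
		for _, n := range nodes.Items {
			for _, c := range n.Status.Conditions {
				if c.Type == corev1.NodeReady && c.Status == corev1.ConditionTrue {
					ready++
				}
			}
		}
		if ready >= expected {
			return time.Since(start) // all expected Edge Nodes are Ready
		}
		time.Sleep(2 * time.Second)
	}
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	fmt.Printf("edge nodes ready in %v\n", measureNodeJoinTime(client, 100))
}
```
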
#### 2. Create Devices from Cloud
<img src="../images/perf/perf-create-device.png">

This scenario is expected to measure the northbound API of KubeEdge.

Test Cases:
* Measure the latency between K8S Master and KubeEdge Cloud Part.

* Measure the throughput between K8S Master and KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

#### 3. Report Device Status to Edge
<img src="../images/perf/perf-report-devicestatus.png">

This scenario is expected to measure the southbound API of KubeEdge.

Different numbers of Devices need to be tested.
* Device numbers per Edge Node are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure the latency between KubeEdge Edge Part and device.

* Measure the throughput between KubeEdge Edge Part and device.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

From the latency and throughput results with different device numbers,
we can evaluate the device scalability of the KubeEdge Edge Part
and measure how many devices can be handled per Edge Node.
<img src="../images/perf/perf-multi-devices.png">

Different protocols should be considered for testing between the KubeEdge Edge Part and devices,
e.g. Bluetooth, MQTT, ZigBee, BACnet, Modbus, and so on.
Currently, a latency of less than 20ms is considered acceptable in Edge IoT scenarios.
Two kinds of test setups can be adopted: emulators of different devices, and actual devices.

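For MQTT, for instance, a round-trip latency probe through the Edge Node's broker could look like the following sketch; the broker address and topic names are placeholders, and a simulated device is assumed to echo each message:

```
// Sketch: measure round-trip latency through the Edge Node's MQTT broker.
// Broker URL and topic names are hypothetical placeholders.
package main

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().AddBroker("tcp://127.0.0.1:1883")
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}

	done := make(chan time.Duration, 1)
	var start time.Time

	// A simulated device is assumed to echo messages from "perf/ping"
	// back on "perf/pong".
	if token := client.Subscribe("perf/pong", 0, func(c mqtt.Client, m mqtt.Message) {
		done <- time.Since(start)
	}); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}

	start = time.Now()
	client.Publish("perf/ping", 0, false, []byte("probe")).Wait()
	fmt.Printf("MQTT round-trip latency: %v\n", <-done)
}
```
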
#### 4. Application Deployment from Cloud to Edge
<img src="../images/perf/perf-app-deploy.png">

This scenario is expected to measure the performance of KubeEdge from Cloud to Edge.
The docker image download latency is not included in this scenario.
In the following test cases, we need to make sure that docker images are already downloaded on the Edge Nodes.

Different numbers of Edge Nodes and Pods need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Pod numbers per Edge Node are one of `[1, 2, 5, 10, 20...]`.

Test Cases:
* Measure the pod startup time (see the sketch below).

  This test case ends when all pods are in `Ready` status.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

From the pod startup time results, we can evaluate the scalability of KubeEdge Edge Nodes:
measure how many Edge Nodes can be handled by the KubeEdge Cloud Part,
and measure how many pods can be handled per Edge Node.

<img src="../images/perf/perf-multi-edgenodes.png">

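Per-pod startup time can be derived from the pod's own timestamps, so that polling overhead on the Test Client does not skew the result. A sketch, assuming pods are listed via client-go as in the deployment sketch above:

```
// Sketch: compute a pod's startup time as the gap between its creation
// and the moment its Ready condition became true.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
)

func podStartupTime(pod corev1.Pod) (time.Duration, bool) {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
			return c.LastTransitionTime.Sub(pod.CreationTimestamp.Time), true
		}
	}
	return 0, false // pod is not Ready yet
}

func main() {
	// In the real test, pods would come from a client-go List call
	// as in the deployment sketch above.
	var pods []corev1.Pod
	for _, p := range pods {
		if d, ok := podStartupTime(p); ok {
			fmt.Printf("%s started in %v\n", p.Name, d)
		}
	}
}
```
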
#### 5. Update Device Twin State from Cloud to Device
<img src="../images/perf/perf-update-devicetwin.png">

This scenario is expected to measure the E2E performance of KubeEdge.

Different numbers of Edge Nodes and Devices need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Device numbers per Edge Node are one of `[1, 10, 20, 50, 100, 200...]`.

Test Cases:
* Measure E2E latency.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

These test cases should be run both when the system is idle and when it is under heavy load.

#### 6. Add Pod from CloudHub to EdgeHub
<img src="../images/perf/perf-cloudhub-edgehub.png">

This scenario is expected to measure the performance of KubeEdge between CloudHub and EdgeHub.
This is not actually an E2E Test scenario for KubeEdge,
but the message delivery channel between CloudHub and EdgeHub may be our bottleneck.
Currently we are using WebSocket as the communication protocol between Cloud and Edge.
In the following test cases, we need to mock the behaviors of CloudHub and EdgeHub:
simulated "add pod" messages will be sent to EdgeHub,
and simulated "pod status" messages will be sent back to CloudHub,
so that we can get the exact latency and throughput between CloudHub and EdgeHub.

Different numbers of Edge Nodes and Pods need to be tested.
* Edge Node numbers are one of `[1, 10, 20, 50, 100, 200...]`.

* Pod numbers per Edge Node are one of `[1, 2, 5, 10, 20...]`.

Test Cases:
* Measure the latency between KubeEdge CloudHub and KubeEdge EdgeHub.

* Measure the throughput between KubeEdge CloudHub and KubeEdge EdgeHub.

* Measure CPU and Memory Usage of KubeEdge Cloud Part.

* Measure CPU and Memory Usage of KubeEdge Edge Part.

From the latency and throughput results, we can evaluate the scalability of KubeEdge EdgeHubs, which is equivalent to the scalability of KubeEdge Edge Nodes.

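A minimal sketch of such a mock (the use of the gorilla/websocket library here is an assumption, and the URL and payloads are placeholders): a CloudHub stand-in dials an EdgeHub stand-in, sends an "add pod" message, and times the "pod status" reply:

```
// Sketch: time the CloudHub->EdgeHub->CloudHub message round trip over
// WebSocket using mocked hubs. URL and payloads are placeholders.
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

// Mock EdgeHub: replies to every "add pod" message with a "pod status" message.
func edgeHub(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println(err)
		return
	}
	defer conn.Close()
	for {
		mt, msg, err := conn.ReadMessage()
		if err != nil {
			return
		}
		_ = msg // a real mock would parse the pod spec here
		conn.WriteMessage(mt, []byte(`{"type":"pod-status","phase":"Running"}`))
	}
}

func main() {
	http.HandleFunc("/edgehub", edgeHub)
	go http.ListenAndServe("127.0.0.1:9000", nil)
	time.Sleep(time.Second) // wait for the mock server to start

	// Mock CloudHub: dial the EdgeHub and time one round trip.
	conn, _, err := websocket.DefaultDialer.Dial("ws://127.0.0.1:9000/edgehub", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	start := time.Now()
	conn.WriteMessage(websocket.TextMessage, []byte(`{"type":"add-pod","name":"nginx"}`))
	if _, _, err := conn.ReadMessage(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("CloudHub<->EdgeHub round trip: %v\n", time.Since(start))
}
```
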
## Thresholds

From the results of the Performance Test, we expect to determine the performance and scalability of KubeEdge.
This is critical for identifying improvement items for KubeEdge.
On the other hand, it will give the users a recommended setup and user guides for KubeEdge.

As [this K8S document](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md) mentions,
since the 1.6 release K8S can support 5000 Nodes and 150000 Pods in a single cluster.
KubeEdge is based on the K8S Master; the difference between K8S Nodes and KubeEdge Edge Nodes
is that KubeEdge Edge Nodes are not directly connected with the K8S Master like the K8S Nodes are.
The KubeEdge Cloud Part connects the K8S Master with the KubeEdge Edge Nodes.
The KubeEdge Edge Nodes are also lightweight and use fewer resources such as CPU and memory.

Currently we have no KubeEdge performance data that can be compared with other systems.
But we can measure the performance and scalability of KubeEdge using the Performance Test data.
We can get the original test data from the KubeEdge 0.3 release, and also run the Performance Test for follow-up releases.
We define the following thresholds, which will be based on the Performance Test data for KubeEdge.
In most cases, exceeding these thresholds does not mean KubeEdge fails;
it just means its overall performance degrades.

| Quantity                            | 0.3 Release    | 1.0 Release | Long Term Goal |
|-------------------------------------|----------------|-------------|----------------|
| Edge Node numbers                   |                |             |                |
| Pod numbers                         |                |             |                |
| Pod numbers per Edge Node           |                |             |                |
| Device numbers                      |                |             |                |
| Device numbers per Edge Node        |                |             |                |

The KubeEdge Performance Test Cases will exceed 5000 Edge Nodes and 150000 Pods,
so that we can make a comparison with a K8S Cluster.
The table will be filled in with the first round of Performance Test data.