k8s.io/perf-tests/clusterloader2@v0.0.0-20240304094227-64bdb12da87e/docs/GETTING_STARTED.md

# ClusterLoader2

In this tutorial, we will:
- Set up the perf-tests repository for local development
- Create a single-node cluster using [Kind]
- Implement a simple CL2 test and run it
- Run a load test on a 100-node cluster

## Clone perf-tests repository

Start by cloning the perf-tests repository:
```bash
git clone git@github.com:kubernetes/perf-tests.git
cd perf-tests
```

## Install GVM
Follow the instructions in [GVM install].
Install a specific Go version (this tutorial was tested with 1.15.12):
```bash
gvm install go1.15.12
gvm use go1.15.12
```
Next, add the perf-tests repository to GOPATH:

```bash
gvm linkthis k8s.io/perf-tests
```

## Create cluster using kind
Follow the [kind installation][Kind install] guide.

Create a v1.21.1 cluster with a single node that acts as both control plane and worker:
```bash
kind create cluster --image=kindest/node:v1.21.1 --name=test-cluster --wait=5m
```

This command also generates cluster access credentials, which are
stored in `${HOME}/.kube/config` under a context named `kind-test-cluster`.

Check that you can connect to the cluster:
```bash
kubectl get nodes
```

## Prepare simple test to run
Let's prepare our first test config (`config.yaml`).
This test will:
- Create one namespace
- Create a single deployment with 10 pods inside that namespace
- Measure the startup latency of these pods

We will create a `config.yaml` file that describes this test.
First, we define the test name:
```yaml
name: test
```
CL2 creates namespaces automatically, but we need to specify
how many namespaces we want:
```yaml
namespace:
  number: 1
```
Next, we need to specify TuningSets.
A TuningSet describes how actions are executed.
In our case, we have only 1 deployment,
so there is only 1 action to execute
and the choice of TuningSet has little effect on how the cluster moves between states.
```yaml
tuningSets:
- name: Uniform1qps
  qpsLoad:
    qps: 1
```
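A rough estimate shows why the TuningSet hardly matters for a single action. The sketch below assumes that a `qpsLoad` TuningSet spaces actions uniformly at the configured rate, so that N actions take on the order of N / qps seconds. `estimate_seconds` is a hypothetical helper written for this tutorial, not part of CL2.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CL2): rough time, in seconds, needed to
# issue ACTIONS actions at QPS actions per second, assuming uniform spacing.
estimate_seconds() {
  local actions="$1" qps="$2"
  LC_ALL=C awk -v a="$actions" -v q="$qps" 'BEGIN { printf "%.1f\n", a / q }'
}

estimate_seconds 1 1      # our test: one deployment at 1 qps
estimate_seconds 100 10   # a larger, made-up case: 100 objects at 10 qps
```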
A test definition consists of a list of steps.
A step is a collection of either Phases or Measurements.
A Phase defines a state the cluster should reach.
A Measurement measures something or waits for something.
You can find the list of available measurements in [Measurements].

Our first step starts two measurements.
We want to start measuring pod startup latency
and also start a measurement that waits for all pods to reach the Running state.
Setting the `action` field to `start` begins execution of a measurement.
For both measurements we need to specify a `labelSelector`
so they know which objects to take into account.
PodStartupLatency also takes a `threshold`. If the 99th percentile of latency
exceeds this threshold, the test fails.
```yaml
steps:
- name: Start measurements
  measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: start
      labelSelector: group = test-pod
      threshold: 20s
  - Identifier: WaitForControlledPodsRunning
    Method: WaitForControlledPodsRunning
    Params:
      action: start
      apiVersion: apps/v1
      kind: Deployment
      labelSelector: group = test-deployment
      operationTimeout: 120s
```
Once these two measurements are started,
the next step creates the deployment.
We need to specify the namespaces in which the deployment should be created
and how many deployments to create per namespace.
We will also need to specify a template for our deployment,
which we will do later.
For now, let's assume that this template allows us
to specify the number of replicas in the deployment.
```yaml
- name: Create deployment
  phases:
  - namespaceRange:
      min: 1
      max: 1
    replicasPerNamespace: 1
    tuningSet: Uniform1qps
    objectBundle:
    - basename: test-deployment
      objectTemplatePath: "deployment.yaml"
      templateFillMap:
        Replicas: 10
```
Now we need to wait for the pods in this deployment to reach the Running state:
```yaml
- name: Wait for pods to be running
  measurements:
  - Identifier: WaitForControlledPodsRunning
    Method: WaitForControlledPodsRunning
    Params:
      action: gather
```
Now we can gather the results of PodStartupLatency in the next step:
```yaml
- name: Measure pod startup latency
  measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: gather
```
The whole `config.yaml` looks like this:
```yaml
name: test

namespace:
  number: 1

tuningSets:
- name: Uniform1qps
  qpsLoad:
    qps: 1

steps:
- name: Start measurements
  measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: start
      labelSelector: group = test-pod
      threshold: 20s
  - Identifier: WaitForControlledPodsRunning
    Method: WaitForControlledPodsRunning
    Params:
      action: start
      apiVersion: apps/v1
      kind: Deployment
      labelSelector: group = test-deployment
      operationTimeout: 120s
- name: Create deployment
  phases:
  - namespaceRange:
      min: 1
      max: 1
    replicasPerNamespace: 1
    tuningSet: Uniform1qps
    objectBundle:
    - basename: test-deployment
      objectTemplatePath: "deployment.yaml"
      templateFillMap:
        Replicas: 10
- name: Wait for pods to be running
  measurements:
  - Identifier: WaitForControlledPodsRunning
    Method: WaitForControlledPodsRunning
    Params:
      action: gather
- name: Measure pod startup latency
  measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: gather
```
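
In this config each measurement is started in one step and gathered in a later one, and it is easy to forget the second half when editing by hand. A minimal sketch of a sanity check follows. `unmatched_measurements` is a hypothetical helper, not a CL2 feature, and it assumes the simple layout used in this tutorial, where every `Identifier:` line precedes the `action:` line of the same measurement.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of CL2): print the Identifier of every
# measurement that has "action: start" but no matching "action: gather".
# This is a line-based scan, not a YAML parser.
unmatched_measurements() {
  awk '
    $1 == "Identifier:" || $2 == "Identifier:" { id = $NF }
    $1 == "action:" {
      if ($2 == "start")  started[id] = 1
      if ($2 == "gather") gathered[id] = 1
    }
    END {
      for (id in started) if (!(id in gathered)) print id
    }
  ' "$1"
}

# Prints nothing when every started measurement is also gathered.
if [ -f config.yaml ]; then
  unmatched_measurements config.yaml
fi
```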

By default, clusterloader deletes the namespaces it created automatically,
so we don't need to worry about cleaning up the cluster.

To finish our first test, we need to define the deployment template.
You can think of it as a regular Kubernetes object, but with templating.
By default, CL2 adds a `Name` parameter that you can use in your template.
In our config, we also passed the `Replicas` parameter.
We need to set the correct labels so that PodStartupLatency
and WaitForControlledPodsRunning watch the right objects.
Our deployment template (the `deployment.yaml` file) looks like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{.Name}}
  labels:
    group: test-deployment
spec:
  replicas: {{.Replicas}}
  selector:
    matchLabels:
      group: test-pod
  template:
    metadata:
      labels:
        group: test-pod
    spec:
      containers:
      - image: registry.k8s.io/pause:3.9
        name: {{.Name}}
```
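Because the measurements find their objects only through labels, a mismatch between `config.yaml` and `deployment.yaml` can leave a measurement watching nothing. The sketch below is one way to cross-check the two files. Both helper names are hypothetical, and the check is a plain text match that assumes selectors of the form `key = value`, as used in this tutorial.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn lines like "labelSelector: group = test-pod"
# in a config file into "group: test-pod", one per line.
selector_labels() {
  sed -n 's/^ *labelSelector: *\([^ =]*\) *= *\(.*\)$/\1: \2/p' "$1" | sort -u
}

# Hypothetical helper: print every label selected in CONFIG ($1) that does
# not appear as "key: value" text anywhere in TEMPLATE ($2).
missing_labels() {
  selector_labels "$1" | while IFS= read -r label; do
    grep -Fq -- "$label" "$2" || echo "$label"
  done
}

# Prints nothing when every selected label is present in the template.
if [ -f config.yaml ] && [ -f deployment.yaml ]; then
  missing_labels config.yaml deployment.yaml
fi
```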
## Execute test
Before running the test, make sure that the current kubeconfig
context points to the kind cluster:
```bash
$ kubectl config current-context
> kind-test-cluster
```
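The same check can be scripted. This is a minimal sketch, assuming the cluster was created with `--name=test-cluster` as above. kind prefixes the context name with `kind-`, and `kind_context_name` is a hypothetical helper that encodes that convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: kind names the kubeconfig context "kind-<cluster name>".
kind_context_name() {
  printf 'kind-%s\n' "$1"
}

expected="$(kind_context_name test-cluster)"

# Only query kubectl when it is installed; report a mismatch on stderr.
if command -v kubectl >/dev/null 2>&1; then
  current="$(kubectl config current-context 2>/dev/null || true)"
  if [ "$current" != "$expected" ]; then
    echo "current context is '${current}', expected '${expected}'" >&2
    echo "switch with: kubectl config use-context ${expected}" >&2
  fi
fi
```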
To execute the test, run the commands below. They assume that `config.yaml`
and `deployment.yaml` are located in the `clusterloader2/` directory:
```bash
cd clusterloader2/
go run cmd/clusterloader.go --testconfig=config.yaml --provider=kind --kubeconfig=${HOME}/.kube/config --v=2
```

At the end of the clusterloader output you should see the pod startup latency:
```json
{
  "data": {
    "Perc50": 7100.534796,
    "Perc90": 8702.523037,
    "Perc99": 9122.894555
  },
  "unit": "ms",
  "labels": {
    "Metric": "pod_startup"
  }
},
```
`pod_startup` measures the time from when a pod was created
until it was observed via watch as running.

You should also see that the test succeeded:
```
--------------------------------------------------------------------------------
Test Finished
Test: ./config.yaml
Status: Success
--------------------------------------------------------------------------------
```
As an exercise, you can set the threshold for PodStartupLatency
below the values you observed in your run and check that the test fails.
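
To pick a threshold for that exercise, it helps to compare the reported `Perc99` (in milliseconds) with a candidate threshold (in seconds). The sketch below does only that arithmetic, using the sample value from the output above. `exceeds_threshold` is a hypothetical helper and does not reproduce CL2's own check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the 99th percentile, given in
# milliseconds, is above the threshold, given in seconds.
exceeds_threshold() {
  LC_ALL=C awk -v p="$1" -v t="$2" 'BEGIN { exit !(p > t * 1000) }'
}

perc99_ms=9122.894555   # sample value from the output above

for threshold_s in 20 5; do
  if exceeds_threshold "$perc99_ms" "$threshold_s"; then
    echo "threshold ${threshold_s}s: test would fail"
  else
    echo "threshold ${threshold_s}s: test would pass"
  fi
done
```

With the sample value, a `threshold: 20s` setting passes, while something like `threshold: 5s` would be expected to fail.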

## Delete kind cluster
To delete the kind cluster, run:
```bash
kind delete cluster --name test-cluster
```

## Running 100-node scale test

The general-purpose [Load test] is a release-blocking test
that we use to evaluate the scalability of Kubernetes.
It consists of 3 main phases:
- Creating objects
- Scaling objects to between 50% and 150% of their original size
- Deleting objects

It can be used to test clusters from 100 nodes up to 5k nodes.
The load test creates roughly 30 * nodes pods. It creates:
- deployments
- jobs
- statefulsets
- services
- secrets
- configmaps

There are small (5 pods), medium (30 pods) and big (250 pods) versions of
deployments, jobs and statefulsets.
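
To get a feel for the scale, the "roughly 30 * nodes" rule can be evaluated for the smallest and largest supported cluster sizes. This is plain arithmetic on the figures stated above. `estimate_pods` is a hypothetical helper, and the real pod count depends on the test configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate number of pods the load test creates,
# using the "about 30 pods per node" rule of thumb.
estimate_pods() {
  echo $(( 30 * $1 ))
}

for nodes in 100 5000; do
  echo "${nodes} nodes -> ~$(estimate_pods "$nodes") pods"
done
```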

First, create a 100-node cluster.
Then you can run the cluster scale test with this command:
```bash
./run-e2e-with-prometheus-fw-rule.sh cluster-loader2 --testconfig=./testing/load/config.yaml --nodes=100 --provider=gke --enable-prometheus-server=true --kubeconfig=${HOME}/.kube/config --v=2
```

`--enable-prometheus-server=true` deploys a Prometheus server
using prometheus-operator.

Various measurements depend on Prometheus metrics, for example:
- API responsiveness, which measures the latency of requests to kube-apiserver
- Scheduling throughput
- NodeLocalDNS latency

[Kind]: https://kind.sigs.k8s.io/
[GVM install]: https://github.com/moovweb/gvm#installing
[Kind config]: https://kind.sigs.k8s.io/docs/user/quick-start/#advanced
[Kind install]: https://kind.sigs.k8s.io/docs/user/quick-start#installation
[Load test]: https://github.com/kubernetes/perf-tests/tree/master/clusterloader2/testing/load
[Measurements]: https://github.com/kubernetes/perf-tests/tree/master/clusterloader2#measurement