# ClusterLoader

ClusterLoader2 (CL2) is a "bring your own yaml" Kubernetes load testing tool
and the official Kubernetes scalability and performance testing framework.

CL2 tests are written in yaml using a semi-declarative paradigm.
A test defines a set of states in which a cluster should be
(e.g. I want to run 10k pods, 2k cluster-ip services, 5 daemon-sets, etc.)
and specifies how fast each state should be reached (e.g. pod throughput).
In addition, it defines which performance characteristics should be measured
(see the [measurements list](#measurement)).
Last but not least, CL2 provides extra observability of the cluster
during the test with [Prometheus](#prometheus-metrics).

The CL2 test API is described [here][api].

## Getting started

See the [Getting started] guide if you are a new user of ClusterLoader.

### Flags

#### Required

These flags are required for any test to be run.
- `kubeconfig` - path to the kubeconfig file.
- `testconfig` - path to the test config file. This flag can be used multiple times
  if more than one test should be run.
- `provider` - cluster provider; options are: gce, gke, kind, kubemark, aws, local, vsphere, skeleton

#### Optional

- `nodes` - number of nodes in the cluster.
  If not provided, the test will use the number of schedulable nodes in the cluster.
- `report-dir` - path to the directory where summary files should be stored.
  If not specified, summaries are printed to the standard log.
- `mastername` - name of the master node.
- `masterip` - DNS name / IP of the master node.
- `testoverrides` - path to a file with overrides.
- `kubelet-port` - port of the kubelet to use (*default: 10250*)

## Tests

### Test definition

A test definition is an instantiation of this [api] (in json or yaml).
The motivation for and description of the API can be found in the [design doc].
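To make the shape of the API concrete, here is a minimal, illustrative sketch of a test definition. All names and values below are hypothetical; see the [api] and the [load test] example for the authoritative structure.

```yaml
# Illustrative sketch only - names and values are made up for this example.
name: example-test
namespace:
  number: 1                  # number of test namespaces to create
tuningSets:
- name: Uniform5qps          # limits object creation to 5 qps
  qpsLoad:
    qps: 5
steps:
- name: Create deployments
  phases:
  - namespaceRange:
      min: 1
      max: 1
    replicasPerNamespace: 10 # create 10 copies of the object bundle per namespace
    tuningSet: Uniform5qps
    objectBundle:
    - basename: test-deployment
      objectTemplatePath: deployment.yaml
```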
Definitions of tests, as well as definitions of individual objects, support templating.
Templates for test definitions come with one predefined value - ```{{.Nodes}}```,
which represents the number of schedulable nodes in the cluster. \
An example of a test definition can be found here: [load test].

### Modules

ClusterLoader2 supports modularization of test configs via the [Module API](https://github.com/kubernetes/perf-tests/blob/1bbb8bd493e5ce6370b0e18f3deaf821f3f28fd0/clusterloader2/api/types.go#L77).
With the Module API, you can divide a single test config file into multiple
module files. A module can be parameterized and used multiple times by
the test or by another module. This provides a convenient way to avoid copy-pasting
and maintaining super-long, unreadable test configs<sup id="a1">[1](#f1)</sup>.

[TODO(mm4tt)]: <> (Point to the load config based on modules here once we migrate it)

### Object template

An object template is similar to a standard kubernetes object definition,
with the only difference being the templating mechanism.
Parameters can be passed from the test definition to the object template
using the ```templateFillMap``` map.
Two always-available parameters are ```{{.Name}}``` and ```{{.Index}}```,
which specify the object name and the object replica index respectively. \
An example of a template can be found here: [load deployment template].

### Overrides

Overrides allow injecting new variable values into the template. \
Many tests define input parameters. An input parameter is a variable
that may be provided by the test framework. Because input parameters are optional,
each reference has to be wrapped in the ```DefaultParam``` function, which
handles the case where the given variable doesn't exist.
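For instance, a test config can declare an optional parameter with a default (the variable name here is hypothetical):

```yaml
# In the test config - falls back to 5 when REPLICAS_PER_NAMESPACE is not provided:
{{$replicasPerNamespace := DefaultParam .REPLICAS_PER_NAMESPACE 5}}
```

An overrides file passed via the `testoverrides` flag can then set `REPLICAS_PER_NAMESPACE: 10` to replace the default.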
An example of overrides can be found here: [overrides]

#### Passing environment variables

Instead of using overrides in a file, it is possible to depend on environment
variables. Only variables that start with the `CL2_` prefix will be parsed and
made available in the test definition.

Environment variables can be used with the `DefaultParam` function to provide sane
default values.

##### Setting variables in shell
```shell
export CL2_ACCESS_TOKENS_QPS=5
```

##### Usage from test definition
```yaml
{{$qpsPerToken := DefaultParam .CL2_ACCESS_TOKENS_QPS 0.1}}
```

## Measurement

Currently available measurements are:
- **APIAvailabilityMeasurement** \
  This measurement collects information about the availability of the cluster's control plane. \
  There are two slightly different ways this is measured:
  - cluster-level availability, where we periodically issue an API call to `/readyz`,
  - host-level availability, where we periodically poll the `/readyz` endpoint of each control plane host. \
    This requires the [exec service](https://github.com/kubernetes/perf-tests/tree/master/clusterloader2/pkg/execservice) to be enabled.
- **APIResponsivenessPrometheusSimple** \
  This measurement computes latency and count percentiles for API server calls based on data collected by the Prometheus server.
  API calls are grouped by resource, subresource, verb and scope. \
  This measurement verifies that the [API call latencies SLO] is satisfied.
  If the Prometheus server is not available, the measurement will be skipped.
- **APIResponsivenessPrometheus** \
  This measurement creates a latency and count summary for API server calls
  based on data collected by the Prometheus server.
  API calls are grouped by resource, subresource, verb and scope. \
  This measurement verifies that the [API call latencies SLO] is satisfied.
  If the Prometheus server is not available, the measurement will be skipped.
- **CPUProfile** \
  This measurement gathers the CPU usage profile provided by pprof for a given component.
- **EtcdMetrics** \
  This measurement gathers a set of etcd metrics and the etcd database size.
- **MemoryProfile** \
  This measurement gathers the memory profile provided by pprof for a given component.
- **MetricsForE2E** \
  This measurement gathers metrics from kube-apiserver, controller manager,
  scheduler and, optionally, all kubelets.
- **PodPeriodicCommand** \
  This measurement continually runs commands on an interval in pods targeted
  with a label selector. The output of each command is collected, allowing
  information such as CPU and memory profiles to be polled throughout the
  duration of the measurement.
- **PodStartupLatency** \
  This measurement verifies that the [pod startup SLO] is satisfied.
- **ResourceUsageSummary** \
  This measurement collects resource usage per component. During the gather phase,
  the collected data is converted into a summary presenting the 90th, 99th and 100th usage percentiles
  for each observed component. \
  Optionally, a resource constraints file can be provided to the measurement.
  A resource constraints file specifies CPU and/or memory constraints for a given component.
  If any of the constraints is violated, an error will be returned, causing the test to fail.
- **SchedulingMetrics** \
  This measurement gathers a set of scheduler metrics.
- **SchedulingThroughput** \
  This measurement gathers scheduling throughput.
- **Timer** \
  Timer allows measuring latencies of certain parts of the test
  (a single timer allows for independent measurements of different actions).
- **WaitForControlledPodsRunning** \
  This measurement works as a barrier that waits until the specified controlling objects
  (ReplicationController, ReplicaSet, Deployment, DaemonSet and Job) have all pods running.
  Controlling objects can be specified by label selector, field selector and namespace.
  In case of timeout, the test continues to run, but an error is logged and the test is marked as failed.
- **WaitForRunningPods** \
  This is a barrier that waits until the required number of pods are running.
  Pods can be specified by label selector, field selector and namespace.
  In case of timeout, the test continues to run, but an error is logged and the test is marked as failed.
- **Sleep** \
  This is a barrier that waits until the requested amount of time passes.
- **WaitForGenericK8sObjects** \
  This is a barrier that waits until the required number of k8s objects fulfill the given condition requirements.
  The conditions can be specified as a list of requirements in `Type=Status` format, e.g. `NodeReady=True`.
  In case of timeout, the test continues to run, but an error is logged and the test is marked as failed.

## Prometheus metrics

There are two ways of scraping metrics from pods within the cluster:
- **ServiceMonitor** \
  Allows scraping metrics from all pods behind a service. An example can be found here: [Service monitor]
- **PodMonitor** \
  Allows scraping metrics from all pods with a specific label. An example can be found here: [Pod monitor]

## Vendor

Vendor is created using [Go modules].

---

<sup><b id="f1">1.</b> As an example and anti-pattern, see the 900-line [load test config.yaml](https://github.com/kubernetes/perf-tests/blob/92cc27ff529ae3702c87e8f154ea62f3f2d8e837/clusterloader2/testing/load/config.yaml) we ended up maintaining at some point.
[↩](#a1)</sup>

[api]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/api/types.go
[API call latencies SLO]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md
[exec service]: https://github.com/kubernetes/perf-tests/tree/master/clusterloader2/pkg/execservice
[design doc]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/docs/design.md
[Go modules]: https://blog.golang.org/using-go-modules
[Getting started]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/docs/GETTING_STARTED.md
[load deployment template]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/deployment.yaml
[load test]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/config.yaml
[overrides]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/density/scheduler/pod-affinity/overrides.yaml
[pod startup SLO]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
[Service monitor]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/pkg/prometheus/manifests/default/prometheus-serviceMonitorKubeProxy.yaml
[Pod monitor]: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/pkg/prometheus/manifests/default/prometheus-podMonitorNodeLocalDNS.yaml