# Cluster loader vision

Author: wojtek-t

Last update time: 1st Aug 2018

## Background

As of 31/03/2018, all our scalability tests are regular e2e tests written in Go.
This makes them really unfriendly for developers not working on scalability who just
want to load test the Kubernetes features they are working on. Doing so in many
situations requires understanding how those tests really work, modifying their code
to test the new feature, and only then running and debugging the tests. Alternatively,
developers may create a dedicated load test for their particular feature on their own,
which may be easier, but on the other hand may not exercise the important metrics that
our performance tests check. This workflow is far from optimal.

That said, a long time ago we came up with the idea that users should be able to
just bring their own object definitions in JSON format, potentially annotate them
with some metadata describing how load should be generated from them, and the
testing infrastructure should do everything else automatically.

In early 2017 a prototype of "Cluster Loader" was created. It proved that
configuring tests with json/yaml files is possible, but its functionality is very
limited and it is very far from enabling migration of any existing scalability
tests to that framework.

We would like to get back to that idea, build a fully functional Cluster Loader and
use it as a framework to run all our scalability tests. This doc describes a
high-level vision of how this will work.


## Vision

At a high level, a single test will consist of a number of steps. In each of
those steps we will be creating, updating and/or deleting a number of different
objects.
Additionally, we will introduce a set of predefined operations (that users will be
able to use as phases).
They will allow users to monitor/measure the performance impact
of user-defined phases.
The following subsections describe this in a bit more detail.

### Config

A single test scenario will be defined by a `Config`. Its schema will be as follows:

```
struct Config {
  // Number of namespaces automanaged by ClusterLoader.
  Namespaces int32
  // Steps of the test.
  Steps []Step
  // Tuning sets that are used by steps.
  TuningSets []TuningSet
}
```

With a test being defined by a single json/yaml file, it should be pretty simple
to modify scenarios and fork them into new ones.

Note that before running any steps, ClusterLoader will create all the requested
namespaces, and after running all of them it will delete them (together with all
objects that remained undeleted after running the test). Namespaces are described
in more detail in a later part of this document.

### Step

Each step will consist of a number of create, update and delete operations (potentially
on many different object types) or a number of monitoring/measurement-related actions.
A single step is defined as follows:

```
struct Step {
  // Only one of these can be non-empty.
  Phases []Phase
  Measurements []string
  Name string
}
```

We make `Phases` and `Measurements` separate concepts to ensure correct ordering
between those two types of actions. It's very important to ensure that the proper
measurements are started before we start given actions and that they are stopped when
all actions are done.

Also note that all `Phases` and `Measurements` within a single `Step` will be
run in parallel - a `Step` ends when all its `Phases` or `Measurements` finish.
That also means that individual steps run in serial.

A step has an optional `Name`. If a step is named, a timer will be fired
for that step automatically.

### Phase

A phase declaratively defines the state of objects we should reach in the underlying
Kubernetes cluster. A single declaration may result in a number of create, update
and delete operations depending on the current state of the cluster.

We define the phase as follows:

```
struct Phase {
  // Set of namespaces in which objects should be reconciled.
  // If null, objects are assumed to be cluster scoped.
  NamespaceRange *NamespaceRange
  // Number of instances of a given object to exist in each
  // of referenced namespaces.
  ReplicasPerNamespace int32
  // Name of TuningSet to be used.
  TuningSet string
  // A set of objects that should be reconciled.
  Objects []Object
}
```

```
struct Object {
  // Type definition for a given object.
  ObjectType ObjectType
  // Base name from which names of objects will be created.
  // Names of objects will be "basename-0", "basename-1", ...
  Basename string
  // A file path to object definition.
  ObjectTemplatePath string
}
```

The semantics of the above structure will be as follows:
- `Phases` within a single `Step` will be run in parallel (to recall,
  individual `Steps` run in serial).
- `Objects` within a single `Phase` will be reconciled in serial for a given
  (namespace, replica number) pair. For different (namespace, replica number)
  pairs they will be spread using a given tuning set.

The rationale for having such a structure is the following:
- `Objects` represent a collection of Kubernetes objects that can be logically
  thought of as a unit of workload (e.g. an application comprised of a service,
  a deployment and a volume). Conceptually, this collection is our unit of
  replication. Note that we process the `Objects` slice serially, which allows
  ordering between objects of a unit (e.g. create a service before a deployment).
  The replication itself is done according to the `TuningSet` and
  `ReplicasPerNamespace` parameters of the `Phase`.
- Running multiple `Phases` in parallel allows running different workloads at the
  same time. As an example, it allows creating two different types of
  applications in parallel (possibly using different tuning sets).
- Running `Steps` in serial allows you to synchronize between `Phases` (and,
  for example, block finishing a measurement on all phases from the previous step
  being finished).

Within a single `Phase` we make the explicit assumption that if `ReplicasPerNamespace`
changes, no `ObjectTemplatePath` can change at the same time (assuming it
already exists for a given set of objects). That basically means that within a
single `Phase`, operations for a given `Object` may only be of a single type
(create, update or delete).

All of the objects are assumed to be units of workload.
Therefore, if an object comes with dependents, all of its dependents will be affected
by the operation performed on this object. E.g. removing an instance of a `ReplicationController`
will also result in removing its dependent `Pods`.

To make it more explicit:
- if `ReplicasPerNamespace` is different than it previously was, we will create
  or delete a number of objects to reach the expected cluster state
- if `ReplicasPerNamespace` is the same as it previously was, we will update all
  objects to the referenced template.

Appropriate validation will be added to Cluster Loader to ensure the above for
a given input config.

Note that the (namespace number, object type, basename) tuple defines a set of
replicated objects.

All `Object` changes for a given (namespace, replica number) pair are treated as
a unit of action. Such units will be spread over time using the referenced tuning
set (described below).

This definition makes the API declarative and thus somewhat similar to the
Kubernetes API.

Caveats:

- Note that even with such a declarative approach, we may e.g. express a phase of
  randomly scaling a number of objects. This would be possible by expressing e.g.
  `spec.replicas: <3+RAND()%5>` in a DeploymentSpec.
  This will require evaluating templates once for every object, but that should
  be fine.

### Multiple copies of the same workload

To fill in large clusters, we need to spread objects across different namespaces.
In many cases, it will be enough for many namespaces to contain the same
objects (or to be more specific: objects created from the same templates). Obviously,
we want the config for the test to be as small as possible.
As a result, we will introduce the following rules:
- In the top-level test definition, we will define the number of namespaces that will
  be automanaged by ClusterLoader.
- The automanaged namespaces will have names of the form "namespace-<number>" for
  number in range 1..N (where N is the number of namespaces in a test).

However, users may want to create their own namespaces (as part of `Phases`) and
create objects in them. That is a perfectly valid usecase that will be supported.

To make it possible to reference a set of namespaces (both automanaged and user-created),
we introduce the following type:

```
struct NamespaceRange {
  Min int32
  Max int32
  Basename *string
}
```

The `NamespaceRange` selects all namespaces `<Basename>-<i>` for `i` in the
range [Min, Max]. If `Basename` is unset, it defaults to the basename used for
automanaged namespaces (i.e. `namespace`).

#### Defining object type

In order to update or delete an object, users need to be able to define the type of
object that this operation is about.
Thus, we introduce the following type for
this purpose:

```
struct ObjectType {
  APIGroup string
  APIVersion string
  Kind string
}
```

Using this will allow us to easily use the dynamic client in most places,
which may significantly simplify Cluster Loader itself.

#### Tuning Set

Since we would like to be able to fully load even very big clusters, we need to
be able to create a number of "similar" objects. The "Tuning Set" concept will allow
us to spread those operations over time.
We define a Tuning Set as follows:

```
struct TuningSet {
  Name string
  InitialDelay time.Duration
  // Exactly one of the following should be set.
  QpsLoad *QpsLoad
  RandomizedLoad *RandomizedLoad
  SteppedLoad *SteppedLoad
}

// QpsLoad defines a uniform load with a given QPS.
struct QpsLoad {
  Qps float
}

// RandomizedLoad defines a load that is spread randomly
// across a given total time.
struct RandomizedLoad {
  AverageQps float
}

// SteppedLoad defines a load that generates a burst of
// a given size every X seconds.
struct SteppedLoad {
  BurstSize int32
  StepDelay time.Duration
}
```

More policies can be introduced in the future.

### Measurements

A critical part of Cluster Loader is the ability to check whether tests (defined by
configs) satisfy a set of Kubernetes performance SLOs.
Fortunately, when testing a specific functionality, we don't really change the SLOs.
We may want to, from time to time, tweak how we measure existing SLOs or introduce
a new one, but it is fine to require changes to the framework to achieve that.

As a result, mechanisms to measure specific SLOs (or gather other types of metrics)
will be incorporated into the Cluster Loader framework.
We will expect that developers
trying to introduce a new SLO (or change how we measure an existing one) will
modify that part of the Cluster Loader codebase. Within the codebase, we will try
to provide a relatively easy framework for achieving this, though.

At a high level, to implement gathering a given portion of data or measuring a new
SLO, you will need to implement a very simple Go interface:

```
type Measurement interface {
  Execute(config *MeasurementConfig) error
}

// An instance of the below struct would be constructed by clusterloader during runtime
// and passed to the Execute method.
struct MeasurementConfig {
  // Client to access the k8s api.
  Clientset *k8sclient.ClientSet
  // Interface to access the cloud-provider api (can be skipped for initial version).
  CloudProvider *cloudprovider.Interface
  // Params is a map of {name: value} pairs enabling injection of arbitrary
  // config into the Execute method. This is copied over as-is from the Params field
  // in the Measurement config (explained later).
  Params map[string]interface{}
}
```

Once you implement such an interface, registering it in the correct
place will allow you to use it as a phase in your config.
As an example, consider gathering resource usage from system components.
It will be enough to implement something like the following:

```
struct ResourceGatherer {
  // Some fields that you need.
}

func (r *ResourceGatherer) Execute(c MeasurementConfig) error {
  if c.Params["start"] {
    // Initialize gatherer.
    // Start the gathering goroutines.
    return nil
  }
  if c.Params["stop"] {
    // Stop the gatherer goroutines.
    // Validate and/or save the results.
    return nil
  }
  // Handling of any other potential cases.
}
```

and registering this type in some factory, to enable the use of `ResourceGatherer`
as a measurement `Method` in your test. Finally, at the config level,
each `Measurement` is defined as:

```
struct Measurement {
  // The measurement method to be run.
  // Such a method has to be registered in the ClusterLoader factory.
  Method string
  // Identifier is a string for differentiating this measurement instance
  // from other instances of the same method.
  Identifier string
  // Params is a map of {name: value} pairs which will be passed to the
  // measurement method - allowing for injection of arbitrary parameters to it.
  Params map[string]interface{}
}
```

To begin with, we will provide a few built-in measurement methods, such as:

```
ResourceGatherer
ProfileGatherer
MetricsGatherer
APICallLatencyValidator
PodStartupLatencyValidator
```


## Future enhancements

This section contains future enhancements that will need to happen, but not
necessarily at the very beginning.

1. Simple templating in json files.
   This would be an extremely useful (necessary) feature to enable referencing
   objects from other objects. As an example, let's say that we want to reference
   secret number `i` from deployment number `i`.
   We would achieve that by providing a very simple templating mechanism at the
   level of files with object definitions. The exact details are TBD, but the
   high-level proposal is to:
   - use the `{{ param }}` syntax for templates
   - support only very simple mathematical operations and symbols:
     - `N` would mean the number of that object (as defined in `basename-<N>`)
     - `RAND` will be a random integer
     - the `%` (modulo) operation will be supported
     - the `+` operation will be supported
     - though, only simple expressions (like `{{ N+i%5 }}` or `{{ RAND%3+5 }}`) will
       be supported (at least initially).

2. Feedback loop from monitoring.
   Assume that we defined some SLO (that Cluster Loader is able to measure) and
   now we want to understand what conditions need to be satisfied to meet this
   SLO (e.g. what throughput we can support while meeting the latency SLO).
   Providing a feedback loop from measurements to load generation tuning can
   solve that problem for us automatically.
   There are a number of details that need to be figured out to do that; this is
   not needed for the initial version (or for migrating existing scalability
   tests), but it should happen once the framework is usable.