# Pod Sorting And Precise Execution For Crane Agent

This proposal enriches the sorting strategies of the crane agent and improves the general sorting. In addition, it implements a framework for precise execution (throttle/eviction): when throttling, evicting or performing other operations, execution stops as soon as the water level specified by the user is reached, which avoids over-operating on low-priority pods.

Specifically:

- Enrich the sorting strategies of the crane agent, improving the general sorting and the CPU-dimension sorting that takes CPU usage as the main reference;

- For CPU usage, implement the precise execution logic that stops throttling/eviction as soon as the water level specified by the user is reached, avoiding over-operation on low-priority pods;

- Implement a framework for precise execution (throttle/eviction). By implementing a series of attributes and methods for a user-defined metric, the same precise execution capability as for CPU usage is obtained without caring about the concrete details, so the framework has a certain universality and scalability.

## Table of Contents

<!-- TOC -->

- [Pod Sorting And Precise Execution For Crane Agent](#pod-sorting-and-precise-execution-for-crane-agent)
    - [Table of Contents](#table-of-contents)
    - [Motivation](#motivation)
        - [Goals](#goals)
    - [Proposal](#proposal)
        - [Enrich the sorting strategy of pod](#enrich-the-sorting-strategy-of-pod)
        - [Definition of metric attribute](#definition-of-metric-attribute)
        - [How to control accurately according to the water level](#how-to-control-accurately-according-to-the-water-level)
        - [Precise operation of pod based on water level](#precise-operation-of-pod-based-on-water-level)
            - [Analyzer phase](#analyzer-phase)
            - [Executor phase](#executor-phase)
        - [Non-Goals/Future Work](#non-goalsfuture-work)
        - [User Stories](#user-stories)

<!-- /TOC -->

## Motivation

Currently, when the water level specified in a NodeQosEnsurancePolicy is exceeded, the crane agent sorts the low-priority pods first and then throttles or evicts them. The current sorting is based only on the priority class of the pods.

The existing problems are:

1. Sorting only refers to the priority class and cannot take other features into account. It also cannot provide the flexible ordering needed for precise execution against a water level, i.e. bringing the node back to the specified water level as quickly as possible. For example, when we want to reduce the CPU usage of low-priority services as quickly as possible, we should pick the pods with the highest CPU usage first, which lowers CPU usage faster and ensures that high-priority services are not affected.

2. After the watermark specified in the NodeQosEnsurancePolicy is triggered, all pods on the node whose priority class is lower than the specified one are operated on. For example, if there are 10 such pods on the node, all 10 will be operated on once the water level is triggered. In fact, the node may already be below the metric value in the NodeQosEnsurancePolicy after the first pod has been operated on.
   The operations on the remaining pods are then excessive and could be avoided. If the metric value in the NodeQosEnsurancePolicy is used as a watermark for precisely operating on pods, it is more appropriate to operate only until the node is just below the watermark, avoiding excessive impact on low-priority services.

### Goals

- Enrich the sorting strategies of the crane agent, including sorting with pod CPU usage as the main reference, sorting with pod memory usage as the main reference, sorting based on running time, and sorting based on extended resource utilization.

- Implement a framework that combines sorting and precise execution, supports adding sorting rules for different metrics, and realizes precise execution.

- Achieve precise execution for CPU usage and memory usage: when the machine load exceeds the water level specified in the NodeQosEnsurancePolicy, low-priority pods are sorted first and then operated on in order until the node is just below the water level.

## Proposal

### Enrich the sorting strategy of pod

- The proposal implements some general sorting methods (to be improved later):

  classAndPriority: compare the QoS class and the priority class value of two pods, QoS class first and then priority value; the pod with the higher priority is ranked later.

  runningTime: compare the running time of two pods; the one with the longer running time is ranked later and has a higher priority.

  If you only need these two sorting strategies, you can use the default sorting method, which first compares the priority of the pods, then the usage of the corresponding metric, and then the running time; as soon as one dimension yields a difference, that is the sorting result of the pods:

```go
func GeneralSorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, runningTime).Sort(pods)
}
```

- Sorting by CPU usage

  The priority of the two pods is compared first. If the priorities are equal, the CPU usage is compared. If the CPU usage is also equal, the ext CPU resource usage is compared (this is specific to the CPU metric). Finally, the running time of the pods is compared. As soon as one dimension yields a difference, the comparison result is returned:

```go
func CpuUsageSorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)
}
```

- Sorting by ext CPU usage

  First, it is compared whether the two pods use extended CPU resources at all. If both do, the ratio of extended CPU resource usage to extended CPU resource limit is compared; a sketch of such a comparison follows below.

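For illustration only, a minimal sketch of what the extCpuUsage comparison used by CpuUsageSorter could look like. `podUsesExtCpu` and `podExtCpuUsageRatio` are hypothetical helpers, and the sign convention of the return value has to follow the other comparison functions used by orderedBy:

```go
// extCpuUsage compares two pods by their extended CPU resource usage
// (sketch only; podUsesExtCpu and podExtCpuUsageRatio are hypothetical helpers).
func extCpuUsage(p1, p2 podinfo.PodContext) int32 {
	use1, use2 := podUsesExtCpu(p1), podUsesExtCpu(p2)
	// If only one of the pods uses extended CPU resources, that alone decides the order.
	if use1 != use2 {
		if use1 {
			return 1
		}
		return -1
	}
	if !use1 {
		// Neither pod uses extended CPU resources: no difference in this dimension.
		return 0
	}
	// Both pods use extended CPU resources: compare usage/limit ratios.
	r1, r2 := podExtCpuUsageRatio(p1), podExtCpuUsageRatio(p2)
	switch {
	case r1 > r2:
		return 1
	case r1 < r2:
		return -1
	default:
		return 0
	}
}
```
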
- For metrics that need customized sorting, the following method can be implemented; flexible, customized sorting of pods is then easily achieved by freely combining it with the general sorting methods. Here `<metric>` stands for the custom metric and `<metric-sort-func>` for the customized comparison function for `<metric>`:

```go
func <metric>Sorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, <metric-sort-func>, runningTime).Sort(pods)
}
```

  The `<metric-sort-func>` only needs to implement the following comparison signature:

```go
func (p1, p2 podinfo.PodContext) int32
```

### Definition of metric attribute

In order to sort and precisely control the metrics configured in a NodeQosEnsurancePolicy in a uniform way, the concept of attributes is introduced for a metric.

The attributes of a metric are the following:

1. Name: the name of the metric; it must be consistent with the metric name collected in the collector module.
2. ActionPriority: the priority of the metric; 0 is the lowest, 10 the highest.
3. SortAble: whether the metric can be sorted on.
4. SortFunc: the corresponding sorting method; it can be assembled from the general comparison methods combined with a metric-specific comparison, as introduced above.
5. ThrottleAble: whether pods can be throttled for this metric. For example, for the CPU usage metric there are corresponding throttling methods, but for memory usage pods can only be evicted; no effective throttling is possible.
6. ThrottleQuantified: whether the amount of the metric's resource released by throttling a pod can be calculated accurately. Metrics that can be accurately quantified are called quantifiable, otherwise they are not quantifiable.
   For example, CPU usage can be throttled by limiting the cgroup, and the released CPU usage can be calculated from the value before and after throttling. Memory usage, by contrast, is not throttle-quantifiable, because memory has no throttle implementation, so the amount of memory released by throttling a pod cannot be measured accurately.
7. ThrottleFunc: the concrete method that executes the throttle action. If the metric cannot be throttled, the returned released value is empty.
8. RestoreFunc: the concrete method that restores a pod after it has been throttled. If the metric cannot be throttled, the returned released value is empty.
9. EvictAble, EvictQuantified and EvictFunc: the corresponding definitions for the evict action, analogous to those of the throttle action.

```go
type metric struct {
	Name WaterLineMetric

	ActionPriority int

	SortAble bool
	SortFunc func(pods []podinfo.PodContext)

	ThrottleAble       bool
	ThrottleQuantified bool
	ThrottleFunc       func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)
	RestoreFunc        func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)

	EvictAble       bool
	EvictQuantified bool
	EvictFunc       func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource)
}
```

You can define your own metric; after it has been constructed, register it through registerMetricMap().

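As an illustration only, the following sketch shows how a user-defined metric might be constructed and registered. `MemUsage`, `MemUsageSorter` and `evictOnePodByMemUsage` are hypothetical names; only the `metric` struct and registerMetricMap() come from this proposal, and the exact registration signature may differ:

```go
// Sketch of a hypothetical user-defined "memory usage" metric: it can be
// sorted on and evicted for, but not throttled (memory has no throttle
// implementation), so the throttle-related fields are left unset.
var memUsageMetric = metric{
	Name: MemUsage, // assumed WaterLineMetric constant for memory usage

	ActionPriority: 5,

	SortAble: true,
	SortFunc: MemUsageSorter, // e.g. orderedBy(classAndPriority, memUsage, runningTime)

	ThrottleAble:       false,
	ThrottleQuantified: false,

	EvictAble:       true,
	EvictQuantified: true,                  // assume released memory can be measured on eviction
	EvictFunc:       evictOnePodByMemUsage, // hypothetical eviction implementation
}

func init() {
	// Registration makes the metric visible to the sorting and
	// precise-execution framework described in this proposal.
	registerMetricMap(memUsageMetric)
}
```
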
### How to control accurately according to the water level

- Build multiple waterlines from the NodeQosEnsurancePolicies and the ObjectiveEnsurances they contain:

  1. Classify the ObjectiveEnsurances by their corresponding action. The crane agent currently has three operations to guarantee node QoS: Evict, ThrottleDown (throttle pod usage when the current usage is above the value in the ObjectiveEnsurance) and ThrottleUp (relax and restore pod usage when the current usage is below the value in the ObjectiveEnsurance). There are therefore three waterline sets: ThrottleDownWaterLine, ThrottleUpWaterLine and EvictWaterLine.

  2. Then classify the waterlines within the same operation category by their metric rule (for example metric A and metric Z), and record the value of each ObjectiveEnsurance water level as a WaterLine.

  The structures of ThrottleDownWaterLine, ThrottleUpWaterLine and EvictWaterLine are:
  `type WaterLines map[WaterLineMetric]*WaterLine`

  where WaterLineMetric is the Name field of the metric described above and the value WaterLine is the resource quantity:
  `type WaterLine resource.Quantity`

  This yields one waterline store per operation, keyed by metric.

- Construct the gap between real-time usage and the waterlines (see the sketch after this list):

  The following data structure is built from the difference between the real-time usage of a metric on the current node and the smallest waterline for that metric in WaterLines; it represents the gap between the current usage and the waterline:
  `type GapToWaterLines map[WaterLineMetric]float64`

  where the key is the Name field of the metric and the value is the gap between usage and waterline.

  Note that for ThrottleUp the gap is the minimum waterline minus the current usage, while for the other two the gap is the current usage minus the minimum waterline, so the gap value is always kept positive.

  The following three structures represent the metrics for which Evict, ThrottleDown and ThrottleUp operations need to be performed, together with their gaps to the lowest waterline:

```go
EvictGapToWaterLines[metrics]
ThrottleDownGapToWaterLines[metrics]
ThrottleUpGapToWaterLines[metrics]
```

- Taking the metric CpuUsage as an example, the waterlines related to node CPU usage are constructed following exactly this process and these data structures.

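For illustration only, the following is a minimal sketch of how the gap values could be derived from a waterline set and the node's real-time usage. `buildGaps`, `usage` and `smallestWaterLineValue` are hypothetical names (the proposal's executor uses buildGapToWaterLine() for this); only the sign convention described above is taken from the proposal:

```go
// buildGaps sketches how a GapToWaterLines map could be computed from a
// WaterLines set and the node's real-time usage. usage maps metric names to
// current values (e.g. obtained through getStateFunc()); throttleUp selects
// the sign convention so that gaps stay positive.
func buildGaps(lines WaterLines, usage map[WaterLineMetric]float64, throttleUp bool) GapToWaterLines {
	gaps := make(GapToWaterLines)
	for metricName := range lines {
		current, ok := usage[metricName]
		if !ok {
			// Real-time usage for this metric is missing; the executor
			// handles this case separately (HasUsageMissedMetric).
			continue
		}
		// smallestWaterLineValue is a hypothetical helper returning the
		// lowest waterline configured for this metric as a float64.
		lowest := smallestWaterLineValue(lines, metricName)
		if throttleUp {
			gaps[metricName] = lowest - current // usage should be raised up to the waterline
		} else {
			gaps[metricName] = current - lowest // usage must be brought down to the waterline
		}
	}
	return gaps
}
```
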
### Precise operation of pod based on water level

To realize the precise operation of pods based on the water level, the proposal modifies the analyzer and the executor. The general flow is:

In the analyzer phase, the waterlines are constructed for the different operations (eviction, throttle, etc.) and the different metrics. The original sorting logic is removed and moved to the executor phase, where the formal operations take place and multiple rounds of sorting may be required.

In the executor phase, the pods are sorted according to the metrics involved in the waterlines, the latest usage is obtained, the GapToWaterLines is constructed, and the precise operations are carried out.

#### Analyzer phase

In this phase, the NodeQosEnsurancePolicies are converted into waterlines, and rules with the same action name and metric rule are merged, as described above.

#### Executor phase

Throttle:

1. First, analyze the metrics involved in ThrottleDownGapToWaterLines and divide them into two groups according to their quantified attribute. If there is a metric that cannot be quantified, get the throttleable metric (one with a ThrottleFunc) with the highest ActionPriority through GetHighestPriorityThrottleAbleMetric and throttle all the selected pods with it, because as long as one metric cannot be quantified, a precise operation is impossible.

2. Get the latest usage of the current node and its workloads through getStateFunc(), and construct ThrottleDownGapToWaterLines from the waterlines and the real-time usage. (Note that the construction iterates over the registered metrics, so only registered metrics end up in ThrottleDownGapToWaterLines; this avoids the situation where a metric configured in the NodeQosEnsurancePolicy does not exist or has not been registered.)

3. If there is a metric in ThrottleDownGapToWaterLines whose real-time usage cannot be obtained (HasUsageMissedMetric), get the throttleable metric (one with a ThrottleFunc) with the highest ActionPriority through GetHighestPriorityThrottleAbleMetric and throttle all the selected pods with it, because if the real-time usage of a metric is unknown, the gap to the waterline cannot be known and a precise operation is impossible.

4. If the situation in step 3 does not occur, traverse the quantifiable metrics in ThrottleDownGapToWaterLines: if the metric has a sorting method, its SortFunc is used directly to sort the pods;
   if not, GeneralSorter is used. Then the metric's ThrottleFunc is used to throttle the pods, and the released resources of the corresponding metric are calculated and subtracted from the gap, until the gap for this metric in ThrottleDownGapToWaterLines no longer exists.

```go
metricsQuantified, MetricsNotQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified()
if len(MetricsNotQuantified) != 0 {
	highestPriorityMetric := GetHighestPriorityThrottleAbleMetric()
	if highestPriorityMetric != "" {
		t.throttlePods(ctx, &totalReleased, highestPriorityMetric)
	}
} else {
	ThrottleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc())
	if ThrottleDownGapToWaterLines.HasUsageMissedMetric() {
		highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
		if highestPriorityMetric != "" {
			throttlePods(ctx, &totalReleased, highestPriorityMetric)
		}
	} else {
		var released ReleaseResource
		for _, m := range metricsQuantified {
			if m.SortAble {
				m.SortFunc(ThrottleDownPods)
			} else {
				GeneralSorter(ThrottleDownPods)
			}

			for !ThrottleDownGapToWaterLines.TargetGapsRemoved(m) {
				for index := range ThrottleDownPods {
					released = m.ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased)
					ThrottleDownGapToWaterLines[m] -= released[m]
				}
			}
		}
	}
}
```

Eviction:

The eviction process is the same as the throttle process, except that it must check whether a pod has already been evicted before operating on it: take a pod that has not yet been operated on, execute the eviction, calculate the released metric resources, and subtract the released value from the corresponding gap, until the waterline requirement of the current metric is met.

```go
metricsEvictQuantified, MetricsNotEvictQuantified := EvictWaterLine.DivideMetricsByEvictQuantified()

if len(MetricsNotEvictQuantified) != 0 {
	highestPriorityMetric := e.EvictWaterLine.GetHighestPriorityEvictAbleMetric()
	if highestPriorityMetric != "" {
		e.evictPods(ctx, &totalReleased, highestPriorityMetric)
	}
} else {
	EvictGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc(), ThrottleExecutor{}, *e)
	if EvictGapToWaterLines.HasUsageMissedMetric() {
		highestPriorityMetric := EvictWaterLine.GetHighestPriorityEvictAbleMetric()
		if highestPriorityMetric != "" {
			e.evictPods(ctx, &totalReleased, highestPriorityMetric)
		}
	} else {
		wg := sync.WaitGroup{}
		var released ReleaseResource
		for _, m := range metricsEvictQuantified {
			if MetricMap[m].SortAble {
				MetricMap[m].SortFunc(e.EvictPods)
			} else {
				execsort.GeneralSorter(e.EvictPods)
			}

			for !EvictGapToWaterLines.TargetGapsRemoved(m) {
				if podinfo.HasNoExecutedPod(e.EvictPods) {
					index := podinfo.GetFirstNoExecutedPod(e.EvictPods)
					released = MetricMap[m].EvictFunc(&wg, ctx, index, &totalReleased, e.EvictPods)

					e.EvictPods[index].HasBeenActioned = true
					ctx.EvictGapToWaterLines[m] -= released[m]
				}
			}
		}
		wg.Wait()
	}
}
```

### Non-Goals/Future Work

- Currently, only the precise operation of CPU usage is supported, but the framework can be reused; based on it, precise control of metrics in more dimensions can be implemented in the future.
- During precise control, only the release of the operated metric itself is considered for now; interactions between different metrics are not considered. For example, when throttling CPU usage, memory usage is also affected. With many metrics, the relationships between them become very complex, so the mutual influence of different metrics is not taken into account for the time being.

### User Stories

- Users can use the crane agent for better QoS guarantees: node load is reduced faster, ensuring that high-priority services are not affected, while the throttling/eviction of low-priority services is precisely controlled to avoid excessive operations.
- With the help of the precise execution (throttle/eviction) framework, users can easily implement QoS features with precise execution and sorting capabilities based on user-defined metrics, simply by implementing the metric's attributes and methods, without having to pay attention to the details.