# Pod Sorting And Precise Execution For Crane Agent

This proposal enriches the sorting strategies of the crane agent and improves the general sorting. In addition, it implements a framework for precise execution (throttle/eviction): when throttling, evicting or performing other operations, execution stops as soon as the water level specified by the user is reached, which avoids over-operating on low-priority pods.

Specifically:

- Enrich the sorting strategies of the crane agent, improving the general sorting and the CPU-dimension sorting that takes CPU usage as the main reference;

- For CPU usage, implement the precise execution logic that stops throttling/eviction as soon as the water level specified by the user is reached, avoiding over-operation on low-priority pods;

- Implement a framework for precise execution (throttle/eviction). By implementing a series of attributes and methods for a user-defined metric, the same precise execution capability as for CPU usage is obtained without caring about the concrete details, so the framework has a certain universality and scalability.

## Table of Contents

<!-- TOC -->

- [Pod Sorting And Precise Execution For Crane Agent](#pod-sorting-and-precise-execution-for-crane-agent)
    - [Table of Contents](#table-of-contents)
    - [Motivation](#motivation)
        - [Goals](#goals)
    - [Proposal](#proposal)
        - [Enrich the sorting strategy of pod](#enrich-the-sorting-strategy-of-pod)
        - [Definition of metric attribute](#definition-of-metric-attribute)
        - [How to control accurately according to the water level](#how-to-control-accurately-according-to-the-water-level)
        - [Precise operation of pod based on water level](#precise-operation-of-pod-based-on-water-level)
            - [Analyzer phase](#analyzer-phase)
            - [Executor phase](#executor-phase)
        - [Non-Goals/Future Work](#non-goalsfuture-work)
        - [User Stories](#user-stories)

<!-- /TOC -->

## Motivation

Currently, when the water level specified in a NodeQosEnsurancePolicy is exceeded, the crane agent sorts the low-priority pods first and then throttles or evicts them. The current sorting is based only on the priority class of the pods.

The existing problems are:

1. Sorting only refers to the priority class and cannot take other features into account. It also cannot provide the flexible ordering needed for precise execution against a water level, i.e. bringing the node back to the specified water level as quickly as possible. For example, when we want to reduce the CPU usage of low-priority services as quickly as possible, we should pick the pods with the highest CPU usage first, which lowers CPU usage faster and ensures that high-priority services are not affected.

2. After the watermark specified in the NodeQosEnsurancePolicy is triggered, all pods on the node whose priority class is lower than the specified one are operated on. For example, if there are 10 such pods on the node, all 10 will be operated on once the water level is triggered. In fact, the node may already be below the metric value in the NodeQosEnsurancePolicy after the first pod has been operated on.
   The operations on the remaining pods are then excessive and could be avoided. If the metric value in the NodeQosEnsurancePolicy is used as a watermark for precisely operating on pods, it is more appropriate to operate only until the node is just below the watermark, avoiding excessive impact on low-priority services.

### Goals

- Enrich the sorting strategies of the crane agent, including sorting with pod CPU usage as the main reference, sorting with pod memory usage as the main reference, sorting based on running time, and sorting based on extended resource utilization.

- Implement a framework that combines sorting and precise execution, supports adding sorting rules for different metrics, and realizes precise execution.

- Achieve precise execution for CPU usage and memory usage: when the machine load exceeds the water level specified in the NodeQosEnsurancePolicy, low-priority pods are sorted first and then operated on in order until the node is just below the water level.

## Proposal

### Enrich the sorting strategy of pod

- The proposal implements some general sorting methods (to be improved later):

  classAndPriority: compare the QoS class and the priority class value of two pods, QoS class first and then priority value; the pod with the higher priority is ranked later.

  runningTime: compare the running time of two pods; the one with the longer running time is ranked later and has a higher priority.

  If you only need these two sorting strategies, you can use the default sorting method, which first compares the priority of the pods, then the usage of the corresponding metric, and then the running time; as soon as one dimension yields a difference, that is the sorting result of the pods:

```go
func GeneralSorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, runningTime).Sort(pods)
}
```

- Sorting by CPU usage

  The priority of the two pods is compared first. If the priorities are equal, the CPU usage is compared. If the CPU usage is also equal, the ext CPU resource usage is compared (this is specific to the CPU metric). Finally, the running time of the pods is compared. As soon as one dimension yields a difference, the comparison result is returned:

```go
func CpuUsageSorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, cpuUsage, extCpuUsage, runningTime).Sort(pods)
}
```

- Sorting by ext CPU usage

  First, it is compared whether the two pods use extended CPU resources at all. If both do, the ratio of extended CPU resource usage to extended CPU resource limit is compared; a sketch of such a comparison follows below.

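For illustration only, a minimal sketch of what the extCpuUsage comparison used by CpuUsageSorter could look like. `podUsesExtCpu` and `podExtCpuUsageRatio` are hypothetical helpers, and the sign convention of the return value has to follow the other comparison functions used by orderedBy:

```go
// extCpuUsage compares two pods by their extended CPU resource usage
// (sketch only; podUsesExtCpu and podExtCpuUsageRatio are hypothetical helpers).
func extCpuUsage(p1, p2 podinfo.PodContext) int32 {
	use1, use2 := podUsesExtCpu(p1), podUsesExtCpu(p2)
	// If only one of the pods uses extended CPU resources, that alone decides the order.
	if use1 != use2 {
		if use1 {
			return 1
		}
		return -1
	}
	if !use1 {
		// Neither pod uses extended CPU resources: no difference in this dimension.
		return 0
	}
	// Both pods use extended CPU resources: compare usage/limit ratios.
	r1, r2 := podExtCpuUsageRatio(p1), podExtCpuUsageRatio(p2)
	switch {
	case r1 > r2:
		return 1
	case r1 < r2:
		return -1
	default:
		return 0
	}
}
```
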
- For metrics that need customized sorting, the following method can be implemented; flexible, customized sorting of pods is then easily achieved by freely combining it with the general sorting methods. Here `<metric>` stands for the custom metric and `<metric-sort-func>` for the customized comparison function for `<metric>`:

```go
func <metric>Sorter(pods []podinfo.PodContext) {
	orderedBy(classAndPriority, <metric-sort-func>, runningTime).Sort(pods)
}
```

  The `<metric-sort-func>` only needs to implement the following comparison signature:

```go
func (p1, p2 podinfo.PodContext) int32
```

### Definition of metric attribute

In order to sort and precisely control the metrics configured in a NodeQosEnsurancePolicy in a uniform way, the concept of attributes is introduced for a metric.

The attributes of a metric are the following:

1. Name: the name of the metric; it must be consistent with the metric name collected in the collector module.
2. ActionPriority: the priority of the metric; 0 is the lowest, 10 the highest.
3. SortAble: whether the metric can be sorted on.
4. SortFunc: the corresponding sorting method; it can be assembled from the general comparison methods combined with a metric-specific comparison, as introduced above.
5. ThrottleAble: whether pods can be throttled for this metric. For example, for the CPU usage metric there are corresponding throttling methods, but for memory usage pods can only be evicted; no effective throttling is possible.
6. ThrottleQuantified: whether the amount of the metric's resource released by throttling a pod can be calculated accurately. Metrics that can be accurately quantified are called quantifiable, otherwise they are not quantifiable.
   For example, CPU usage can be throttled by limiting the cgroup, and the released CPU usage can be calculated from the value before and after throttling. Memory usage, by contrast, is not throttle-quantifiable, because memory has no throttle implementation, so the amount of memory released by throttling a pod cannot be measured accurately.
7. ThrottleFunc: the concrete method that executes the throttle action. If the metric cannot be throttled, the returned released value is empty.
8. RestoreFunc: the concrete method that restores a pod after it has been throttled. If the metric cannot be throttled, the returned released value is empty.
9. EvictAble, EvictQuantified and EvictFunc: the corresponding definitions for the evict action, analogous to those of the throttle action.

```go
type metric struct {
	Name WaterLineMetric

	ActionPriority int

	SortAble bool
	SortFunc func(pods []podinfo.PodContext)

	ThrottleAble       bool
	ThrottleQuantified bool
	ThrottleFunc       func(ctx *ExecuteContext, index int, ThrottleDownPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)
	RestoreFunc        func(ctx *ExecuteContext, index int, ThrottleUpPods ThrottlePods, totalReleasedResource *ReleaseResource) (errPodKeys []string, released ReleaseResource)

	EvictAble       bool
	EvictQuantified bool
	EvictFunc       func(wg *sync.WaitGroup, ctx *ExecuteContext, index int, totalReleasedResource *ReleaseResource, EvictPods EvictPods) (errPodKeys []string, released ReleaseResource)
}
```

You can define your own metric; after it has been constructed, register it through registerMetricMap().

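As an illustration only, the following sketch shows how a user-defined metric might be constructed and registered. `MemUsage`, `MemUsageSorter` and `evictOnePodByMemUsage` are hypothetical names; only the `metric` struct and registerMetricMap() come from this proposal, and the exact registration signature may differ:

```go
// Sketch of a hypothetical user-defined "memory usage" metric: it can be
// sorted on and evicted for, but not throttled (memory has no throttle
// implementation), so the throttle-related fields are left unset.
var memUsageMetric = metric{
	Name: MemUsage, // assumed WaterLineMetric constant for memory usage

	ActionPriority: 5,

	SortAble: true,
	SortFunc: MemUsageSorter, // e.g. orderedBy(classAndPriority, memUsage, runningTime)

	ThrottleAble:       false,
	ThrottleQuantified: false,

	EvictAble:       true,
	EvictQuantified: true,                  // assume released memory can be measured on eviction
	EvictFunc:       evictOnePodByMemUsage, // hypothetical eviction implementation
}

func init() {
	// Registration makes the metric visible to the sorting and
	// precise-execution framework described in this proposal.
	registerMetricMap(memUsageMetric)
}
```
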
### How to control accurately according to the water level

- Build multiple waterlines from the NodeQosEnsurancePolicies and the ObjectiveEnsurances they contain:

  1. Classify the ObjectiveEnsurances by their corresponding action. The crane agent currently has three operations to guarantee node QoS: Evict, ThrottleDown (throttle pod usage when the current usage is above the value in the ObjectiveEnsurance) and ThrottleUp (relax and restore pod usage when the current usage is below the value in the ObjectiveEnsurance). There are therefore three waterline sets: ThrottleDownWaterLine, ThrottleUpWaterLine and EvictWaterLine.

  2. Then classify the waterlines within the same operation category by their metric rule (for example metric A and metric Z), and record the value of each ObjectiveEnsurance water level as a WaterLine.

  The structures of ThrottleDownWaterLine, ThrottleUpWaterLine and EvictWaterLine are:
  `type WaterLines map[WaterLineMetric]*WaterLine`

  where WaterLineMetric is the Name field of the metric described above and the value WaterLine is the resource quantity:
  `type WaterLine resource.Quantity`

  This yields one waterline store per operation, keyed by metric.

- Construct the gap between real-time usage and the waterlines (see the sketch after this list):

  The following data structure is built from the difference between the real-time usage of a metric on the current node and the smallest waterline for that metric in WaterLines; it represents the gap between the current usage and the waterline:
  `type GapToWaterLines map[WaterLineMetric]float64`

  where the key is the Name field of the metric and the value is the gap between usage and waterline.

  Note that for ThrottleUp the gap is the minimum waterline minus the current usage, while for the other two the gap is the current usage minus the minimum waterline, so the gap value is always kept positive.

  The following three structures represent the metrics for which Evict, ThrottleDown and ThrottleUp operations need to be performed, together with their gaps to the lowest waterline:

```go
EvictGapToWaterLines[metrics]
ThrottleDownGapToWaterLines[metrics]
ThrottleUpGapToWaterLines[metrics]
```

- Taking the metric CpuUsage as an example, the waterlines related to node CPU usage are constructed following exactly this process and these data structures.

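For illustration only, the following is a minimal sketch of how the gap values could be derived from a waterline set and the node's real-time usage. `buildGaps`, `usage` and `smallestWaterLineValue` are hypothetical names (the proposal's executor uses buildGapToWaterLine() for this); only the sign convention described above is taken from the proposal:

```go
// buildGaps sketches how a GapToWaterLines map could be computed from a
// WaterLines set and the node's real-time usage. usage maps metric names to
// current values (e.g. obtained through getStateFunc()); throttleUp selects
// the sign convention so that gaps stay positive.
func buildGaps(lines WaterLines, usage map[WaterLineMetric]float64, throttleUp bool) GapToWaterLines {
	gaps := make(GapToWaterLines)
	for metricName := range lines {
		current, ok := usage[metricName]
		if !ok {
			// Real-time usage for this metric is missing; the executor
			// handles this case separately (HasUsageMissedMetric).
			continue
		}
		// smallestWaterLineValue is a hypothetical helper returning the
		// lowest waterline configured for this metric as a float64.
		lowest := smallestWaterLineValue(lines, metricName)
		if throttleUp {
			gaps[metricName] = lowest - current // usage should be raised up to the waterline
		} else {
			gaps[metricName] = current - lowest // usage must be brought down to the waterline
		}
	}
	return gaps
}
```
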
### Precise operation of pod based on water level

To realize the precise operation of pods based on the water level, the proposal modifies the analyzer and the executor. The general flow is:

In the analyzer phase, the waterlines are constructed for the different operations (eviction, throttle, etc.) and the different metrics. The original sorting logic is removed and moved to the executor phase, where the formal operations take place and multiple rounds of sorting may be required.

In the executor phase, the pods are sorted according to the metrics involved in the waterlines, the latest usage is obtained, the GapToWaterLines is constructed, and the precise operations are carried out.

#### Analyzer phase

In this phase, the NodeQosEnsurancePolicies are converted into waterlines, and rules with the same action name and metric rule are merged, as described above.

#### Executor phase

Throttle:

1. First, analyze the metrics involved in ThrottleDownGapToWaterLines and divide them into two groups according to their quantified attribute. If there is a metric that cannot be quantified, get the throttleable metric (one with a ThrottleFunc) with the highest ActionPriority through GetHighestPriorityThrottleAbleMetric and throttle all the selected pods with it, because as long as one metric cannot be quantified, a precise operation is impossible.

2. Get the latest usage of the current node and its workloads through getStateFunc(), and construct ThrottleDownGapToWaterLines from the waterlines and the real-time usage. (Note that the construction iterates over the registered metrics, so only registered metrics end up in ThrottleDownGapToWaterLines; this avoids the situation where a metric configured in the NodeQosEnsurancePolicy does not exist or has not been registered.)

3. If there is a metric in ThrottleDownGapToWaterLines whose real-time usage cannot be obtained (HasUsageMissedMetric), get the throttleable metric (one with a ThrottleFunc) with the highest ActionPriority through GetHighestPriorityThrottleAbleMetric and throttle all the selected pods with it, because if the real-time usage of a metric is unknown, the gap to the waterline cannot be known and a precise operation is impossible.

4. If the situation in step 3 does not occur, traverse the quantifiable metrics in ThrottleDownGapToWaterLines: if the metric has a sorting method, its SortFunc is used directly to sort the pods;
   if not, GeneralSorter is used. Then the metric's ThrottleFunc is used to throttle the pods, and the released resources of the corresponding metric are calculated and subtracted from the gap, until the gap for this metric in ThrottleDownGapToWaterLines no longer exists.

```go
metricsQuantified, MetricsNotQuantified := ThrottleDownWaterLine.DivideMetricsByQuantified()
if len(MetricsNotQuantified) != 0 {
	highestPriorityMetric := GetHighestPriorityThrottleAbleMetric()
	if highestPriorityMetric != "" {
		t.throttlePods(ctx, &totalReleased, highestPriorityMetric)
	}
} else {
	ThrottleDownGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc())
	if ThrottleDownGapToWaterLines.HasUsageMissedMetric() {
		highestPriorityMetric := ThrottleDownWaterLine.GetHighestPriorityThrottleAbleMetric()
		if highestPriorityMetric != "" {
			throttlePods(ctx, &totalReleased, highestPriorityMetric)
		}
	} else {
		var released ReleaseResource
		for _, m := range metricsQuantified {
			if m.SortAble {
				m.SortFunc(ThrottleDownPods)
			} else {
				GeneralSorter(ThrottleDownPods)
			}

			for !ThrottleDownGapToWaterLines.TargetGapsRemoved(m) {
				for index := range ThrottleDownPods {
					released = m.ThrottleFunc(ctx, index, ThrottleDownPods, &totalReleased)
					ThrottleDownGapToWaterLines[m] -= released[m]
				}
			}
		}
	}
}
```

Eviction:

The eviction process is the same as the throttle process, except that it must check whether a pod has already been evicted before operating on it: take a pod that has not yet been operated on, execute the eviction, calculate the released metric resources, and subtract the released value from the corresponding gap, until the waterline requirement of the current metric is met.

```go
metricsEvictQuantified, MetricsNotEvictQuantified := EvictWaterLine.DivideMetricsByEvictQuantified()

if len(MetricsNotEvictQuantified) != 0 {
	highestPriorityMetric := e.EvictWaterLine.GetHighestPriorityEvictAbleMetric()
	if highestPriorityMetric != "" {
		e.evictPods(ctx, &totalReleased, highestPriorityMetric)
	}
} else {
	EvictGapToWaterLines = buildGapToWaterLine(ctx.getStateFunc(), ThrottleExecutor{}, *e)
	if EvictGapToWaterLines.HasUsageMissedMetric() {
		highestPriorityMetric := EvictWaterLine.GetHighestPriorityEvictAbleMetric()
		if highestPriorityMetric != "" {
			e.evictPods(ctx, &totalReleased, highestPriorityMetric)
		}
	} else {
		wg := sync.WaitGroup{}
		var released ReleaseResource
		for _, m := range metricsEvictQuantified {
			if MetricMap[m].SortAble {
				MetricMap[m].SortFunc(e.EvictPods)
			} else {
				execsort.GeneralSorter(e.EvictPods)
			}

			for !EvictGapToWaterLines.TargetGapsRemoved(m) {
				if podinfo.HasNoExecutedPod(e.EvictPods) {
					index := podinfo.GetFirstNoExecutedPod(e.EvictPods)
					released = MetricMap[m].EvictFunc(&wg, ctx, index, &totalReleased, e.EvictPods)

					e.EvictPods[index].HasBeenActioned = true
					ctx.EvictGapToWaterLines[m] -= released[m]
				}
			}
		}
		wg.Wait()
	}
}
```

### Non-Goals/Future Work

- Currently, only the precise operation of CPU usage is supported, but the framework can be reused; based on it, precise control of metrics in more dimensions can be implemented in the future.
- During precise control, only the release of the operated metric itself is considered for now; interactions between different metrics are not considered. For example, when throttling CPU usage, memory usage is also affected. With many metrics, the relationships between them become very complex, so the mutual influence of different metrics is not taken into account for the time being.

### User Stories

- Users can use the crane agent for better QoS guarantees: node load is reduced faster, ensuring that high-priority services are not affected, while the throttling/eviction of low-priority services is precisely controlled to avoid excessive operations.
- With the help of the precise execution (throttle/eviction) framework, users can easily implement QoS features with precise execution and sorting capabilities based on user-defined metrics, simply by implementing the metric's attributes and methods, without having to pay attention to the details.