## Dynamic resource overselling enhanced by prediction algorithm
To improve stability, users usually set resource requests higher than actual usage when deploying applications, which wastes resources. To raise node utilization, users then co-locate some best-effort applications that run on the idle resources, which amounts to overselling the node.
However, because these applications carry no resource requests or limits, the scheduler may still place their pods on heavily loaded nodes, which defeats the purpose. It is better to schedule them based on the free resources of each node.

Crane collects the idle resources of a node in the following two ways and combines the results into the node's idle resources, which improves the accuracy of the estimate.

CPU is used as the example below; Crane also supports reclaiming idle memory.

1. CPU usage information collected locally

    `nodeCpuCannotBeReclaimed := nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal`

    `exclusiveCPUIdle` is the idle CPU held by pods whose CPU manager policy is exclusive. Although this CPU is idle, it cannot be reused because it is exclusively assigned, so it is counted as used.

    `extResContainerCpuUsageTotal` is the CPU consumed by containers running on dynamic resources. It is subtracted to avoid counting it twice.

2. A TimeSeriesPrediction (TSP) for node CPU usage, which is created automatically by default and predicts node CPU usage from its history

```yaml
apiVersion: v1
data:
  spec: |
    predictionMetrics:
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators:
            fft:
            - highFrequencyThreshold: "0.05"
              lowAmplitudeThreshold: "1.0"
              marginFraction: "0.2"
              maxNumOfSpectrumItems: 20
              minNumOfSpectrumItems: 10
          historyLength: 3d
          sampleInterval: 60s
      resourceIdentifier: cpu
      type: ExpressionQuery
      expressionQuery:
        expression: 'sum(count(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}[5m]))'
    predictionWindowSeconds: 3600
kind: ConfigMap
metadata:
  name: noderesource-tsp-template
  namespace: default
```

Crane combines the prediction with the current actual consumption to calculate the remaining available resources of the node, and publishes the result on the node as an extended resource. A pod that requests this extended resource is treated as an offline job that runs on idle resources, which raises the resource utilization of the node.

How to use:
When deploying a pod, set its limits and requests with `gocrane.io/<$resourcename>:<$value>`, as follows:
```yaml
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: extended-resource-demo-ctr
    resources:
      limits:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
      requests:
        gocrane.io/cpu: "2"
        gocrane.io/memory: "2000Mi"
```

## Elastic resource restriction function
Native best-effort applications have no guarantee of fair resource usage. Crane keeps the CPU usage of a best-effort pod that uses dynamic resources within the range it is allowed to use: the agent ensures that the actual consumption of a pod using extended resources does not exceed its declared limit. When CPU is contended, such pods compete fairly in proportion to their declared amounts. Pods using elastic resources are also managed by the watermark function.

How to use:
When deploying a pod, set its limits and requests with `gocrane.io/<$resourcename>:<$value>`.

## Suitable scenarios
To increase node load, offline jobs or less important jobs can be scheduled onto the cluster using dynamic resources. Such jobs run on idle elastic resources.
With the QoS watermark guarantee, these jobs are throttled and evicted first when a node is under high load, so node utilization improves while the stability of high-priority services is preserved.
See the section "Used with dynamic resources" in qos-interference-detection-and-active-avoidance.md.
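
To make the idle-CPU calculation from the first section concrete, here is a minimal Python sketch. It is not Crane's implementation. The local formula follows the document, but the way the local figure is combined with the TSP prediction (taking the larger of the two as "in use" and subtracting it from node capacity) is an assumption made for illustration, as are the names `NodeCpuSample`, `capacity`, and `reclaimable_cpu`.

```python
from dataclasses import dataclass


@dataclass
class NodeCpuSample:
    """CPU figures for one node, in cores. Field names are illustrative."""
    capacity: float                 # total CPU of the node (assumed input)
    usage_total: float              # nodeCpuUsageTotal
    exclusive_cpu_idle: float       # exclusiveCPUIdle
    ext_res_container_usage: float  # extResContainerCpuUsageTotal


def cpu_cannot_be_reclaimed(s: NodeCpuSample) -> float:
    """Local estimate, following the formula given in the document."""
    return s.usage_total + s.exclusive_cpu_idle - s.ext_res_container_usage


def reclaimable_cpu(s: NodeCpuSample, predicted_usage: float) -> float:
    """Idle CPU that could be offered as an extended resource.

    Assumption: the larger of the local and predicted figures is treated
    as in use, and the result is clamped at zero.
    """
    in_use = max(cpu_cannot_be_reclaimed(s), predicted_usage)
    return max(0.0, s.capacity - in_use)


if __name__ == "__main__":
    sample = NodeCpuSample(capacity=16.0, usage_total=6.0,
                           exclusive_cpu_idle=2.0,
                           ext_res_container_usage=1.0)
    print(cpu_cannot_be_reclaimed(sample))
    print(reclaimable_cpu(sample, predicted_usage=9.0))
```

Taking the larger of the two figures is a conservative choice: a predicted spike reduces the amount offered to offline jobs before the spike arrives.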