github.com/gocrane/crane@v0.11.0/docs/tutorials/qos-dynamic-resource-oversold-and-limit.md

## Dynamic resource oversold enhanced by prediction algorithm
To improve stability, users usually set request values higher than actual usage when deploying applications, which wastes resources. To raise node utilization, users co-locate besteffort applications that consume the idle resources, which in effect oversells the node.
However, because these applications carry no resource requests or limits, the scheduler may still place their pods on heavily loaded nodes, which defeats the purpose. It is better to schedule them based on the free resources of each node.

Crane collects the idle resources of a node in the following two ways and combines the results into a single estimate, which improves the accuracy of resource evaluation.

The steps below use CPU as the example; Crane also supports reclaiming idle memory.

1. CPU usage information collected locally

`nodeCpuCannotBeReclaimed := nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal`

`exclusiveCPUIdle` is the idle CPU held by pods whose CPU manager policy is exclusive. Although this CPU is idle, it is pinned to those pods and cannot be reused, so it is counted as used.

`extResContainerCpuUsageTotal` is the CPU consumed by containers that already run on dynamic resources. It is subtracted to avoid counting it twice.

2. A TSP (TimeSeriesPrediction) of node CPU usage, which is created automatically by default and predicts node CPU usage from historical data
```yaml
apiVersion: v1
data:
  spec: |
    predictionMetrics:
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators:
            fft:
            - highFrequencyThreshold: "0.05"
              lowAmplitudeThreshold: "1.0"
              marginFraction: "0.2"
              maxNumOfSpectrumItems: 20
              minNumOfSpectrumItems: 10
          historyLength: 3d
          sampleInterval: 60s
      resourceIdentifier: cpu
      type: ExpressionQuery
      expressionQuery:
        expression: 'sum(count(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}) by (mode, cpu)) - sum(irate(node_cpu_seconds_total{mode="idle",instance=~"({{.metadata.name}})(:\\d+)?"}[5m]))'
    predictionWindowSeconds: 3600
kind: ConfigMap
metadata:
  name: noderesource-tsp-template
  namespace: default
```

Crane combines the prediction with the current actual consumption to calculate the remaining available resources of the node, and publishes them on the node as an extended resource. A pod that requests this extended resource runs as an offline job on the idle resources, which improves the resource utilization of the node.

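The calculation can be illustrated with a minimal Go sketch. The local formula is taken from the text above. How the local figure and the TSP prediction are merged is not specified in this document, so the sketch assumes the more conservative (larger) of the two is treated as unavailable and subtracted from the node's allocatable CPU. The `LocalSample` type, the `ReclaimableCPU` function, and the millicore unit are hypothetical names chosen for illustration and are not Crane's API.

```go
package main

import "fmt"

// LocalSample holds locally collected CPU figures in millicores.
// The field names mirror the terms of the formula in the text;
// the struct itself is hypothetical and not part of Crane.
type LocalSample struct {
	NodeCPUUsageTotal            int64 // total CPU in use on the node
	ExclusiveCPUIdle             int64 // idle CPU pinned to exclusive pods
	ExtResContainerCPUUsageTotal int64 // CPU used by dynamic-resource containers
}

// CannotBeReclaimed applies the formula from the text:
// nodeCpuUsageTotal + exclusiveCPUIdle - extResContainerCpuUsageTotal.
func (s LocalSample) CannotBeReclaimed() int64 {
	return s.NodeCPUUsageTotal + s.ExclusiveCPUIdle - s.ExtResContainerCPUUsageTotal
}

// ReclaimableCPU estimates the CPU that could be offered as an extended resource.
// Assumption (not stated in the source): the larger of the local figure and the
// predicted usage is treated as unavailable, and the result is clamped at zero.
func ReclaimableCPU(allocatable, predictedUsage int64, s LocalSample) int64 {
	unavailable := s.CannotBeReclaimed()
	if predictedUsage > unavailable {
		unavailable = predictedUsage
	}
	if free := allocatable - unavailable; free > 0 {
		return free
	}
	return 0
}

func main() {
	s := LocalSample{
		NodeCPUUsageTotal:            3000,
		ExclusiveCPUIdle:             500,
		ExtResContainerCPUUsageTotal: 1000,
	}
	fmt.Println(s.CannotBeReclaimed())         // 3000 + 500 - 1000 = 2500
	fmt.Println(ReclaimableCPU(8000, 4000, s)) // 8000 - max(4000, 2500) = 4000
}
```

Taking the larger of the two estimates errs on the side of offering fewer elastic resources, which suits a mechanism whose goal is to avoid disturbing high-priority workloads.
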
How to use:
When deploying a pod, set both limits and requests with `gocrane.io/<$resourcename>: <$value>`, as follows:
```yaml
spec:
   containers:
   - image: nginx
     imagePullPolicy: Always
     name: extended-resource-demo-ctr
     resources:
       limits:
         gocrane.io/cpu: "2"
         gocrane.io/memory: "2000Mi"
       requests:
         gocrane.io/cpu: "2"
         gocrane.io/memory: "2000Mi"
```

## Elastic resource restriction function
Native besteffort applications have no fair guarantee of resource usage. Crane keeps the CPU usage of besteffort pods that use dynamic resources within their permitted range: the agent ensures that the actual consumption of a pod using extended resources does not exceed its declared limit, and when CPU is contended, pods compete fairly in proportion to their declared amounts. Pods using elastic resources are also managed by the watermark function.

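One common way to enforce such a cap and proportional sharing on Linux is through the CFS controls of the CPU cgroup: a quota per scheduling period bounds consumption, and shares weight the pod under contention. The sketch below shows that arithmetic for a declared `gocrane.io/cpu` amount. It assumes the amount is expressed in whole cores and uses the conventional 100 ms period and 1024 shares per core. Whether the Crane agent uses exactly these controls and values is not stated in this document, and `ControlsForElasticCPU` is a hypothetical helper.

```go
package main

import "fmt"

const (
	cfsPeriodUs   = 100000 // conventional CFS period: 100 ms in microseconds
	sharesPerCore = 1024   // conventional cgroup v1 cpu.shares weight for one core
)

// CPUControls holds the cgroup values derived from a declared CPU amount.
type CPUControls struct {
	QuotaUs int64 // run time allowed per period; caps actual consumption
	Shares  int64 // relative weight used when CPUs are contended
}

// ControlsForElasticCPU converts a declared amount in whole cores into a
// quota and a share weight. Hypothetical helper for illustration only.
func ControlsForElasticCPU(cores int64) (CPUControls, error) {
	if cores <= 0 {
		return CPUControls{}, fmt.Errorf("elastic CPU amount must be positive, got %d", cores)
	}
	return CPUControls{
		QuotaUs: cores * cfsPeriodUs,
		Shares:  cores * sharesPerCore,
	}, nil
}

func main() {
	c, err := ControlsForElasticCPU(2)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.QuotaUs, c.Shares) // 200000 2048
}
```

With these values, a pod declaring two cores can run for at most 200 ms of CPU time in every 100 ms period, and under contention it receives twice the weight of a pod declaring one core.
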
How to use:
When deploying a pod, set both limits and requests with `gocrane.io/<$resourcename>: <$value>`.

## Suitable scenarios
To increase node load, offline or less important jobs can be scheduled onto the cluster using dynamic resources. Such jobs consume idle elastic resources.
With the QoS watermark guarantee, these jobs are the first to be evicted or throttled when the node load is high, so node utilization improves while the stability of high-priority services is preserved.
See the section "Used with dynamic resources" in qos-interference-detection-and-active-avoidance.md.