github.com/gocrane/crane@v0.11.0/docs/tutorials/replicas-recommendation.md

# Replicas Recommendation

Kubernetes users often set workload replicas or HPA configurations based on empirical values. Replicas recommendation analyzes actual application usage and gives advice on replicas and HPA configurations. You can refer to this advice and adopt it for your workloads to improve cluster resource utilization.

## Features

1. Algorithm: The algorithm for calculating replicas follows the HPA approach and supports customizing the algorithm arguments.
2. HPA recommendations: Scans for applications that are suitable for horizontal elasticity and gives advice on EHPA configuration. [EHPA](using-effective-hpa-to-scaling-with-effectiveness.md) is a smart horizontal elasticity product provided by Crane.
3. Batch analysis: With the ResourceSelector, users can analyze multiple workloads in one batch.

## Create HPA Analytics

Create a **Replicas** `Analytics` to generate recommendations, using the deployment `nginx-deployment` as a sample.

=== "Main"

      ```bash
      kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/examples/analytics/nginx-deployment.yaml
      kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/examples/analytics/analytics-replicas.yaml
      ```

=== "Mirror"

      ```bash
      kubectl apply -f https://gitee.com/finops/crane/raw/main/examples/analytics/nginx-deployment.yaml
      kubectl apply -f https://gitee.com/finops/crane/raw/main/examples/analytics/analytics-replicas.yaml
      ```

The created `Analytics` YAML is as follows:

```yaml title="analytics-replicas.yaml"
apiVersion: analysis.crane.io/v1alpha1
kind: Analytics
metadata:
  name: nginx-replicas
spec:
  type: Replicas                        # This can only be "Resource" or "Replicas".
  completionStrategy:
    completionStrategyType: Periodical  # This can only be "Once" or "Periodical".
    periodSeconds: 600                  # Analyze the selected resources every 10 minutes.
  resourceSelectors:                    # Defines the resources to be selected.
    - kind: Deployment
      apiVersion: apps/v1
      name: nginx-deployment
  config:                               # Defines the configuration for this Analytics.
    ehpa.deployment-min-replicas: "1"
    ehpa.fluctuation-threshold: "0"
    ehpa.min-cpu-usage-threshold: "0"
```

You can get the created recommendations from the `Analytics` status:

```bash
kubectl get analytics nginx-replicas -o yaml
```

The output is similar to:

```yaml
apiVersion: analysis.crane.io/v1alpha1
kind: Analytics
metadata:
  name: nginx-replicas
  namespace: default
spec:
  completionStrategy:
    completionStrategyType: Periodical
    periodSeconds: 600
  config:
    replicas.fluctuation-threshold: "0"
    replicas.min-cpu-usage-threshold: "0"
    replicas.workload-min-replicas: "1"
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    labelSelector: {}
    name: nginx-deployment
  type: Replicas
status:
  conditions:
  - lastTransitionTime: "2022-06-02T09:44:54Z"
    message: Analytics is ready
    reason: AnalyticsReady
    status: "True"
    type: Ready
  lastUpdateTime: "2022-06-02T09:44:54Z"
  recommendations:
  - lastStartTime: "2022-06-02T09:44:54Z"
    message: Success
    name: nginx-replicas-replicas-7qspm
    namespace: default
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: nginx-deployment
      namespace: default
    uid: c853043c-5ff6-4ee0-a941-e04c8ec3093b
```

## Recommendation: Analytics result

Use a label selector to get the recommendations owned by the `Analytics`.

```bash
kubectl get recommend -l analysis.crane.io/analytics-name=nginx-replicas -o yaml
```

The output is similar to:

```yaml
apiVersion: v1
items:
   - apiVersion: analysis.crane.io/v1alpha1
     kind: Recommendation
     metadata:
        creationTimestamp: "2022-06-02T09:44:54Z"
        generateName: nginx-replicas-replicas-
        generation: 2
        labels:
           analysis.crane.io/analytics-name: nginx-replicas
           analysis.crane.io/analytics-type: Replicas
           analysis.crane.io/analytics-uid: e9168c6e-329f-40e9-8d0f-a1ddc35b0d47
           app: nginx
        name: nginx-replicas-replicas-7qspm
        namespace: default
        ownerReferences:
           - apiVersion: analysis.crane.io/v1alpha1
             blockOwnerDeletion: false
             controller: false
             kind: Analytics
             name: nginx-replicas
             uid: e9168c6e-329f-40e9-8d0f-a1ddc35b0d47
        resourceVersion: "818959913"
        selfLink: /apis/analysis.crane.io/v1alpha1/namespaces/default/recommendations/nginx-replicas-replicas-7qspm
        uid: c853043c-5ff6-4ee0-a941-e04c8ec3093b
     spec:
        adoptionType: StatusAndAnnotation
        completionStrategy:
           completionStrategyType: Once
        targetRef:
           apiVersion: apps/v1
           kind: Deployment
           name: nginx-deployment
           namespace: default
        type: Replicas
     status:
        conditions:
           - lastTransitionTime: "2022-06-02T09:44:54Z"
             message: Recommendation is ready
             reason: RecommendationReady
             status: "True"
             type: Ready
        lastUpdateTime: "2022-06-02T09:44:54Z"
        recommendedValue: |
           effectiveHPA:
             maxReplicas: 3
             metrics:
             - resource:
                 name: cpu
                 target:
                   averageUtilization: 75
                   type: Utilization
               type: Resource
             minReplicas: 3
           replicasRecommendation:
             replicas: 3
kind: List
metadata:
   resourceVersion: ""
   selfLink: ""
```

## Batch recommendation

The following sample shows how to generate recommendations for all Deployments and StatefulSets with a single `Analytics`:

```yaml
apiVersion: analysis.crane.io/v1alpha1
kind: Analytics
metadata:
   name: workload-replicas
   namespace: crane-system               # An Analytics in crane-system selects resources across all namespaces.
spec:
   type: Replicas                        # This can only be "Resource" or "Replicas".
   completionStrategy:
      completionStrategyType: Periodical  # This can only be "Once" or "Periodical".
      periodSeconds: 86400                # Analyze the selected resources once a day.
   resourceSelectors:                    # Defines the resources to be selected.
      - kind: Deployment
        apiVersion: apps/v1
      - kind: StatefulSet
        apiVersion: apps/v1
```

1. When the `Analytics` is created in the `crane-system` namespace, it selects resources across all namespaces. In any other namespace, it selects only the resources in its own namespace.
2. `resourceSelectors` defines the resources to analyze. `kind` and `apiVersion` are mandatory; `name` is optional.
3. `resourceSelectors` supports any resource that implements the [Scale Subresource](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource).
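
The first two selection rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the list, not Crane's implementation: `Selector`, `Resource`, and `is_selected` are hypothetical names, and the sketch ignores label selectors and the Scale Subresource check.

```python
from dataclasses import dataclass
from typing import Optional

CRANE_SYSTEM = "crane-system"


@dataclass
class Selector:
    """Hypothetical stand-in for one resourceSelectors entry."""
    kind: str
    api_version: str
    name: Optional[str] = None  # optional: None matches every name


@dataclass
class Resource:
    """Hypothetical stand-in for a workload in the cluster."""
    kind: str
    api_version: str
    name: str
    namespace: str


def is_selected(analytics_namespace: str, selector: Selector, resource: Resource) -> bool:
    # Rule 1: only an Analytics in crane-system looks beyond its own namespace.
    if analytics_namespace != CRANE_SYSTEM and resource.namespace != analytics_namespace:
        return False
    # Rule 2: kind and apiVersion are mandatory and must both match.
    if (selector.kind, selector.api_version) != (resource.kind, resource.api_version):
        return False
    # Rule 2 (continued): name is optional and only narrows the match when set.
    return selector.name is None or selector.name == resource.name


web = Resource("Deployment", "apps/v1", "web", "team-a")
print(is_selected("crane-system", Selector("Deployment", "apps/v1"), web))
print(is_selected("team-b", Selector("Deployment", "apps/v1"), web))
```

The first call matches because the `Analytics` lives in `crane-system`; the second does not because `team-b` differs from the workload's namespace.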

## HPA Recommendation Algorithm model

### Inspecting

1. Workload with low replicas: If the replica count is too low, the workload may not be suitable for HPA recommendation. Associated configuration: `ehpa.deployment-min-replicas` | `ehpa.statefulset-min-replicas` | `ehpa.workload-min-replicas`
2. Workload with a certain percentage of Pods not running: If most of the workload's Pods can't run normally, the workload may not be suitable for elasticity. Associated configuration: `ehpa.pod-min-ready-seconds` | `ehpa.pod-available-ratio`
3. Workload with low CPU usage: Low CPU usage means that there is no load pressure, so the workload can't be estimated. Associated configuration: `ehpa.min-cpu-usage-threshold`
4. Workload with low fluctuation of CPU usage: The fluctuation rate is defined as the maximum usage divided by the minimum usage. If the fluctuation rate is too low, the workload will not benefit much from HPA. Associated configuration: `ehpa.fluctuation-threshold`
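
The four checks above can be read as a chain of filters. The following Python sketch shows one possible reading under stated assumptions: `inspect_workload` and its parameters are hypothetical, the low-usage check is applied to the peak usage (the text does not say which statistic Crane compares), the usage unit is whatever unit the threshold uses, and Pod readiness is assumed to be counted by the caller.

```python
import math


def inspect_workload(replicas, ready_pods, total_pods, min_cpu_usage, max_cpu_usage, *,
                     min_replicas=1, pod_available_ratio=0.5,
                     min_cpu_usage_threshold=10, fluctuation_threshold=1.5):
    """Return the reasons a workload is skipped; an empty list means it passes."""
    reasons = []
    # Check 1: too few replicas.
    if replicas < min_replicas:
        reasons.append("replicas too low")
    # Check 2: too small a share of Pods is ready.
    if total_pods > 0 and ready_pods / total_pods < pod_available_ratio:
        reasons.append("too many pods not running")
    # Check 3 (assumption: compared against the peak usage).
    if max_cpu_usage < min_cpu_usage_threshold:
        reasons.append("cpu usage too low")
    # Check 4: fluctuation rate = maximum usage / minimum usage.
    # Assumption: a zero minimum is treated as unbounded fluctuation.
    fluctuation = max_cpu_usage / min_cpu_usage if min_cpu_usage > 0 else math.inf
    if fluctuation < fluctuation_threshold:
        reasons.append("cpu fluctuation too low")
    return reasons


print(inspect_workload(3, 3, 3, min_cpu_usage=20.0, max_cpu_usage=80.0))
print(inspect_workload(3, 3, 3, min_cpu_usage=50.0, max_cpu_usage=60.0))
```

The second workload is flagged because its fluctuation rate, 60 / 50 = 1.2, is below the default `ehpa.fluctuation-threshold` of 1.5.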

### Advising

In the advising phase, an EffectiveHPA spec is recommended using the following algorithm model. The recommendation logic for each field is as follows:

**Recommend TargetUtilization**

Principle: Use the Pod P99 resource utilization to recommend the HPA target. Because the application has stayed at or below this utilization for 99% of the time, it can be inferred to be an acceptable target for elasticity.

1. Get the Pod P99 CPU usage of the past seven days using the Percentile algorithm: $pod\_cpu\_usage\_p99$
2. Corresponding utilization:

      $target\_pod\_cpu\_utilization = \frac{pod\_cpu\_usage\_p99}{pod\_cpu\_request}$

3. To prevent over-utilization or under-utilization, target_pod_cpu_utilization must be greater than `ehpa.min-cpu-target-utilization` and less than `ehpa.max-cpu-target-utilization`:

   $ehpa.min\mbox{-}cpu\mbox{-}target\mbox{-}utilization < target\_pod\_cpu\_utilization < ehpa.max\mbox{-}cpu\mbox{-}target\mbox{-}utilization$
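
As a worked example of the steps above, the sketch below computes the target utilization as a percentage and keeps it within the configured bounds. `recommend_target_utilization` is a hypothetical helper; expressing CPU in millicores and enforcing the bounds by clamping are assumptions of this sketch, since the text only states that the value must lie between the two settings.

```python
def recommend_target_utilization(pod_cpu_usage_p99, pod_cpu_request,
                                 min_target=30.0, max_target=75.0):
    """Target CPU utilization in percent, kept within [min_target, max_target]."""
    if pod_cpu_request <= 0:
        raise ValueError("pod_cpu_request must be positive")
    # Step 2: utilization = P99 usage / request, expressed as a percentage.
    utilization = 100.0 * pod_cpu_usage_p99 / pod_cpu_request
    # Step 3 (assumption: out-of-range values are clamped to the nearest bound).
    return min(max(utilization, min_target), max_target)


# A Pod requesting 1000m CPU whose P99 usage is 600m.
print(recommend_target_utilization(600, 1000))  # 60.0
# A P99 usage of 900m would give 90%, which is capped at the upper bound.
print(recommend_target_utilization(900, 1000))  # 75.0
```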

**Recommend minReplicas**

Principle: Recommend minReplicas based on the lowest hourly workload utilization over the past seven days.

1. Calculate the lowest median workload CPU usage of the past seven days: $workload\_cpu\_usage\_median\_min$
2. Corresponding replicas:

      $minReplicas = \frac{\mathrm{workload\_cpu\_usage\_median\_min} }{pod\_cpu\_request \times ehpa.max\mbox{-}cpu\mbox{-}target\mbox{-}utilization}$

3. To prevent minReplicas from being too small, it must be greater than or equal to `ehpa.default-min-replicas`:

      $minReplicas \geq ehpa.default\mbox{-}min\mbox{-}replicas$
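
The minReplicas steps translate into a short calculation. In this sketch, `recommend_min_replicas` is a hypothetical helper, CPU is expressed in millicores, the utilization is a percentage, and rounding up to a whole replica is an assumption that the formula above does not spell out.

```python
import math


def recommend_min_replicas(workload_cpu_usage_median_min, pod_cpu_request,
                           max_cpu_target_utilization=75.0, default_min_replicas=2):
    """minReplicas from the lowest median workload usage, floored at the default."""
    # CPU one Pod is expected to serve when running at the maximum target utilization.
    per_pod_capacity = pod_cpu_request * max_cpu_target_utilization / 100.0
    # Step 2 (assumption: fractional replicas are rounded up).
    replicas = math.ceil(workload_cpu_usage_median_min / per_pod_capacity)
    # Step 3: never go below ehpa.default-min-replicas.
    return max(replicas, default_min_replicas)


# Lowest median usage of 3000m with 1000m requests: 3000 / (1000 * 0.75) = 4.
print(recommend_min_replicas(3000, 1000))  # 4
# A nearly idle workload still gets the default minimum.
print(recommend_min_replicas(300, 1000))   # 2
```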

**Recommend maxReplicas**

Principle: Use the workload's load over the past seven days and its predicted load for the next seven days to recommend the maximum replicas.

1. Calculate the P95 workload CPU usage across the past seven days and the next seven days: $workload\_cpu\_usage\_p95$
2. Corresponding replicas:

     $max\_replicas\_origin = \frac{\mathrm{workload\_cpu\_usage\_p95} }{pod\_cpu\_request \times target\_pod\_cpu\_utilization}$

3. To handle peak traffic, magnify the result by a factor:

   $max\_replicas = max\_replicas\_origin \times  ehpa.max\mbox{-}replicas\mbox{-}factor$
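
The same style of calculation applies to maxReplicas. Again, `recommend_max_replicas` is a hypothetical helper, CPU is in millicores, the utilization is a percentage, and rounding up the final value is an assumption of this sketch.

```python
import math


def recommend_max_replicas(workload_cpu_usage_p95, pod_cpu_request,
                           target_utilization, max_replicas_factor=3.0):
    """maxReplicas from the P95 workload usage, magnified by the configured factor."""
    # Step 2: replicas needed to serve the P95 load at the target utilization.
    max_replicas_origin = workload_cpu_usage_p95 / (pod_cpu_request * target_utilization / 100.0)
    # Step 3: magnify to leave headroom for peak traffic (assumption: round up).
    return math.ceil(max_replicas_origin * max_replicas_factor)


# P95 usage of 1500m, 1000m requests, 75% target: 1500 / 750 = 2, times 3 = 6.
print(recommend_max_replicas(1500, 1000, 75.0))  # 6
```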

**Recommend MetricSpec (except CpuUtilization)**

1. If an HPA is already configured for the workload, its MetricSpecs other than CpuUtilization are inherited.

**Recommend Behavior**

1. If an HPA is already configured for the workload, its Behavior configuration is inherited.

**Recommend Prediction**

1. Try to predict the workload's CPU usage for the next seven days using DSP.
2. If the prediction succeeds, add the prediction configuration.
3. If the workload is not predictable, do not add the prediction configuration.

## Configurations for HPA Recommendation

| Configuration | Default Value | Description |
| ------------- | ------------- | ----------- |
| ehpa.deployment-min-replicas | 1 | HPA recommendations are not made for Deployments with fewer replicas than this value. |
| ehpa.statefulset-min-replicas | 1 | HPA recommendations are not made for StatefulSets with fewer replicas than this value. |
| ehpa.workload-min-replicas | 1 | HPA recommendations are not made for workloads with fewer replicas than this value. |
| ehpa.pod-min-ready-seconds | 30 | The number of seconds used to decide whether a Pod is ready. |
| ehpa.pod-available-ratio | 0.5 | HPA recommendations are not made for workloads whose ready Pod ratio is lower than this value. |
| ehpa.default-min-replicas | 2 | The lower bound for the recommended minReplicas. |
| ehpa.max-replicas-factor | 3 | The factor used to calculate maxReplicas. |
| ehpa.min-cpu-usage-threshold | 10 | HPA recommendations are not made for workloads whose CPU usage is lower than this value. |
| ehpa.fluctuation-threshold | 1.5 | HPA recommendations are not made for workloads whose CPU usage fluctuation rate is lower than this value. |
| ehpa.min-cpu-target-utilization | 30 | The lower bound for the recommended target CPU utilization. |
| ehpa.max-cpu-target-utilization | 75 | The upper bound for the recommended target CPU utilization. |
| ehpa.reference-hpa | true | Whether to inherit the existing HPA configuration. |
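
Values under `config` in an `Analytics` are strings (for example `"0"`), so they have to be merged with the defaults and converted before use. The sketch below shows one way to do that with the defaults from the table. `DEFAULTS` and `resolve_config` are hypothetical names, and rejecting unknown keys is a choice made for this sketch, not documented Crane behavior.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "ehpa.deployment-min-replicas": 1,
    "ehpa.statefulset-min-replicas": 1,
    "ehpa.workload-min-replicas": 1,
    "ehpa.pod-min-ready-seconds": 30,
    "ehpa.pod-available-ratio": 0.5,
    "ehpa.default-min-replicas": 2,
    "ehpa.max-replicas-factor": 3,
    "ehpa.min-cpu-usage-threshold": 10,
    "ehpa.fluctuation-threshold": 1.5,
    "ehpa.min-cpu-target-utilization": 30,
    "ehpa.max-cpu-target-utilization": 75,
    "ehpa.reference-hpa": True,
}


def resolve_config(overrides):
    """Merge string overrides from an Analytics config into the defaults."""
    resolved = dict(DEFAULTS)
    for key, raw in overrides.items():
        if key not in DEFAULTS:
            # Assumption of this sketch: unknown keys are treated as errors.
            raise KeyError(f"unknown configuration key: {key}")
        if isinstance(DEFAULTS[key], bool):
            resolved[key] = raw.strip().lower() == "true"
        else:
            # Numeric overrides are parsed as floats for simplicity.
            resolved[key] = float(raw)
    return resolved


config = resolve_config({
    "ehpa.deployment-min-replicas": "1",
    "ehpa.fluctuation-threshold": "0",
    "ehpa.min-cpu-usage-threshold": "0",
})
print(config["ehpa.fluctuation-threshold"], config["ehpa.default-min-replicas"])
```

Setting both thresholds to `"0"`, as the sample `Analytics` does, effectively disables the low-usage and low-fluctuation checks so that the sample workload always receives a recommendation.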