github.com/gocrane/crane@v0.11.0/docs/tutorials/using-effective-hpa-to-scaling-with-effectiveness.md

github.com/gocrane/crane@v0.11.0/docs/tutorials/using-effective-hpa-to-scaling-with-effectiveness.md (about)

     1  # EffectiveHorizontalPodAutoscaler
     2  
     3  EffectiveHorizontalPodAutoscaler helps you manage application scaling in an easy way. 
     4  
     5  It is compatible with HorizontalPodAutoscaler but extends more features.
     6  
     7  EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling. 
     8  
     9  With this capability, user can forecast the incoming peak flow and scale up their application ahead, also user can know when the peak flow will end and scale down their application gracefully.
    10  
    11  Besides that, EffectiveHorizontalPodAutoscaler also defines several scale strategies to support different scaling scenarios.
    12  
    13  ## Features
    14  A EffectiveHorizontalPodAutoscaler sample yaml looks like below:
    15  
    16  ```yaml
    17  apiVersion: autoscaling.crane.io/v1alpha1
    18  kind: EffectiveHorizontalPodAutoscaler
    19  metadata:
    20    name: php-apache
    21  spec:
    22    scaleTargetRef: #(1)
    23      apiVersion: apps/v1
    24      kind: Deployment
    25      name: php-apache
    26    minReplicas: 1 #(2)
    27    maxReplicas: 10 #(3)
    28    scaleStrategy: Auto #(4)
    29    metrics: #(5)
    30    - type: Resource
    31      resource:
    32        name: cpu
    33        target:
    34          type: Utilization
    35          averageUtilization: 50
    36    prediction: #(6)
    37      predictionWindowSeconds: 3600 #(7)
    38      predictionAlgorithm:
    39        algorithmType: dsp
    40        dsp:
    41          sampleInterval: "60s"
    42          historyLength: "3d"
    43  ```
    44  
    45  1. ScaleTargetRef is the reference to the workload that should be scaled.
    46  2. MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to.
    47  3. MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to.
    48  4. ScaleStrategy indicates the strategy to scaling target, value can be "Auto" and "Preview".
    49  5. Metrics contains the specifications for which to use to calculate the desired replica count.
    50  6. Prediction defines configurations for predict resources.If unspecified, defaults don't enable prediction.
    51  7. PredictionWindowSeconds is the time window to predict metrics in the future.
    52  
    53  ### Prediction-driven autoscaling
    54  Most of online applications follow regular pattern. We can predict future trend of hours or days. DSP is a time series prediction algorithm that applicable for application metrics prediction.
    55  
    56  The following shows a sample EffectiveHorizontalPodAutoscaler yaml with prediction enabled.
    57  ```yaml
    58  apiVersion: autoscaling.crane.io/v1alpha1
    59  kind: EffectiveHorizontalPodAutoscaler
    60  spec:
    61    prediction:
    62      predictionWindowSeconds: 3600
    63      predictionAlgorithm:
    64        algorithmType: dsp
    65        dsp:
    66          sampleInterval: "60s"
    67          historyLength: "3d"
    68  
    69  ```
    70  
    71  #### Metric conversion
    72  
    73  When user defines `spec.metrics` in EffectiveHorizontalPodAutoscaler and prediction configuration is enabled, EffectiveHPAController will convert it to a new metric and configure the background HorizontalPodAutoscaler. 
    74  
    75  This is a source EffectiveHorizontalPodAutoscaler yaml for metric definition.
    76  ```yaml
    77  apiVersion: autoscaling.crane.io/v1alpha1
    78  kind: EffectiveHorizontalPodAutoscaler
    79  spec:
    80    metrics:
    81    - type: Resource
    82      resource:
    83        name: cpu
    84        target:
    85          type: Utilization
    86          averageUtilization: 50
    87  ```
    88  
    89  It's converted to underlying HorizontalPodAutoscaler metrics yaml.
    90  ```yaml
    91  apiVersion: autoscaling/v2beta1
    92  kind: HorizontalPodAutoscaler
    93  spec:
    94    metrics:
    95      - pods:
    96          metric:
    97            name: crane_pod_cpu_usage
    98            selector:
    99              matchLabels:
   100                autoscaling.crane.io/effective-hpa-uid: f9b92249-eab9-4671-afe0-17925e5987b8
   101          target:
   102            type: AverageValue
   103            averageValue: 100m
   104        type: Pods
   105      - resource:
   106          name: cpu
   107          target:
   108            type: Utilization
   109            averageUtilization: 50
   110        type: Resource
   111  ```
   112  
   113  In this sample, the resource metric defined by user is converted into two metrics: prediction metric and origin metric.
   114  
   115  * **prediction metric** is custom metrics that provided by component MetricAdapter. Since custom metric doesn't support `targetAverageUtilization`, it's converted to `targetAverageValue` based on target pod cpu request.
   116  * **origin metric** is equivalent to user defined metrics in EffectiveHorizontalPodAutoscaler, to fall back to baseline user defined in case of some unexpected situation e.g. business traffic sudden growth.
   117  
   118  HorizontalPodAutoscaler will calculate on each metric, and propose new replicas based on that. The **largest** one will be picked as the new scale.
   119  
   120  #### Horizontal scaling process
   121  There are six steps of prediction and scaling process:
   122  
   123  1. EffectiveHPAController create HorizontalPodAutoscaler and TimeSeriesPrediction instance 
   124  2. PredictionCore get historic metric from prometheus and persist into TimeSeriesPrediction
   125  3. HPAController read metrics from KubeApiServer
   126  4. KubeApiServer forward requests to MetricAdapter and MetricServer
   127  5. HPAController calculate all metric results and propose a new scale replicas for target
   128  6. HPAController scale target with Scale Api
   129  
   130  Below is the process flow.
   131  ![crane-ehpa](../images/crane-ehpa.png)
   132  
   133  #### Use case
   134  Let's take one use case that using EffectiveHorizontalPodAutoscaler in production cluster.
   135  
   136  We did a profiling on the load history of one application in production and replayed it in staging environment. With the same application, we leverage both EffectiveHorizontalPodAutoscaler and HorizontalPodAutoscaler to manage the scale and compare the result.
   137  
   138  From the red line in below chart, we can see its actual total cpu usage is high at ~8am, ~12pm, ~8pm and low in midnight. The green line shows the prediction cpu usage trend.
   139  ![crane-ehpa-metrics-chart](../images/crane-ehpa-metrics-chart.png)
   140  
   141  Below is the comparison result between EffectiveHorizontalPodAutoscaler and HorizontalPodAutoscaler. The red line is the replica number generated by HorizontalPodAutoscaler and the green line is the result from EffectiveHorizontalPodAutoscaler.
   142  ![crane-ehpa-metrics-replicas-chart](../images/crane-ehpa-replicas-chart.png)
   143  
   144  We can see significant improvement with EffectiveHorizontalPodAutoscaler:
   145  
   146  * Scale up in advance before peek flow
   147  * Scale down gracefully after peek flow
   148  * Fewer replicas changes than HorizontalPodAutoscaler
   149  
   150  ### ScaleStrategy
   151  EffectiveHorizontalPodAutoscaler provides two strategies for scaling: `Auto` and `Preview`. User can change the strategy at runtime, and it will take effect on the fly.
   152  
   153  #### Auto
   154  Auto strategy achieves automatic scaling based on metrics. It is the default strategy. With this strategy, EffectiveHorizontalPodAutoscaler will create and control a HorizontalPodAutoscaler instance in backend. We don't recommend explicit configuration on the underlying HorizontalPodAutoscaler because it will be overridden by EffectiveHPAController. If user delete EffectiveHorizontalPodAutoscaler, HorizontalPodAutoscaler will be cleaned up too.
   155  
   156  #### Preview
   157  Preview strategy means EffectiveHorizontalPodAutoscaler won't change target's replicas automatically, so you can preview the calculated replicas and control target's replicas by themselves. User can switch from default strategy to this one by applying `spec.scaleStrategy` to `Preview`. It will take effect immediately, During the switch, EffectiveHPAController will disable HorizontalPodAutoscaler if exists and scale the target to the value `spec.specificReplicas`, if user not set `spec.specificReplicas`, when ScaleStrategy is change to Preview, it will just stop scaling.
   158  
   159  A sample preview configuration looks like following:
   160  ```yaml
   161  apiVersion: autoscaling.crane.io/v1alpha1
   162  kind: EffectiveHorizontalPodAutoscaler
   163  spec:
   164    scaleStrategy: Preview   # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Preview".
   165    specificReplicas: 5      # SpecificReplicas specify the target replicas.
   166  status:
   167    expectReplicas: 4        # expectReplicas is the calculated replicas that based on prediction metrics or spec.specificReplicas.
   168    currentReplicas: 4       # currentReplicas is actual replicas from target
   169  ```
   170  
   171  ### HorizontalPodAutoscaler compatible
   172  EffectiveHorizontalPodAutoscaler is designed to be compatible with k8s native HorizontalPodAutoscaler, because we don't reinvent the autoscaling part but take advantage of the extension from HorizontalPodAutoscaler and build a high level autoscaling CRD. EffectiveHorizontalPodAutoscaler support all abilities from HorizontalPodAutoscaler like metricSpec and behavior.
   173  
   174  EffectiveHorizontalPodAutoscaler will continue support incoming new feature from HorizontalPodAutoscaler.
   175  
   176  ### EffectiveHorizontalPodAutoscaler status
   177  This is a yaml from EffectiveHorizontalPodAutoscaler.Status
   178  ```yaml
   179  apiVersion: autoscaling.crane.io/v1alpha1
   180  kind: EffectiveHorizontalPodAutoscaler
   181  status:
   182    conditions:                                               
   183    - lastTransitionTime: "2021-11-30T08:18:59Z"
   184      message: the HPA controller was able to get the target's current scale
   185      reason: SucceededGetScale
   186      status: "True"
   187      type: AbleToScale
   188    - lastTransitionTime: "2021-11-30T08:18:59Z"
   189      message: Effective HPA is ready
   190      reason: EffectiveHorizontalPodAutoscalerReady
   191      status: "True"
   192      type: Ready
   193    currentReplicas: 1
   194    expectReplicas: 0
   195  
   196  ```
   197  
   198  ### Cron-based autoscaling
   199  EffectiveHorizontalPodAutoscaler supports cron based autoscaling. 
   200  
   201  Besides based on monitoring metrics, sometimes there are differences between holiday and weekdays in workload traffic, and a simple prediction algorithm may not work relatively well. Then you can make up for the lack of prediction by setting the weekend cron to have a larger number of replicas.
   202  
   203  For some non-web traffic applications, for example, some applications do not need to work on weekends, and then want to reduce the workload replicas to 1, you can also configure cron to reduce the cost for your service.
   204  
   205  Following are cron main fields in the ehpa spec:
   206  
   207   - CronSpec: You can set multiple cron autoscaling configurations, cron cycle can set the start time and end time of the cycle, and the number of replicas of the workload can be continuously guaranteed to the set target value within the time range.
   208   - Name: cron identifier
   209   - TargetReplicas: the target number of replicas of the workload in this cron time range.
   210   - Start: The start time of the cron, in the standard linux crontab format
   211   - End: the end time of the cron, in the standard linux crontab format
   212  
   213  
   214  Current cron autoscaling capabilities from some manufacturers and communities have some shortcomings.
   215  
   216  1. The cron capability is provided separately, has no global view of autoscaling, poor compatibility with HPA, and  conflicts with other scale trigger.
   217  2. The semantics and behavior of cron do not match very well, and are even very difficult to understand when used, which can easily mislead users and lead to autoscaling failures.
   218  
   219  The following figure shows the comparison between the current EHPA cron autoscaling implementation and other cron capabilities.
   220  
   221  ![crane-keda-ali-compare-cron.png](../images/crane-keda-ali-compare-cron.png)
   222  
   223  
   224  To address the above issues, the cron autoscaling implemented by EHPA is designed on the basis of compatibility with HPA, and cron, as an indicator of HPA, acts on the workload object together with other indicators. In addition, the setting of cron is also very simple. When cron is configured separately, the default scaling of the workload will not be performed when it is not in the active time range.
   225  
   226  
   227  #### Cron working without other metrics
   228  You can just configure cron itself to work, assume you have no other metrics configured.
   229  ```yaml
   230  apiVersion: autoscaling.crane.io/v1alpha1
   231  kind: EffectiveHorizontalPodAutoscaler
   232  metadata:
   233    name: php-apache-local
   234  spec:
   235    # ScaleTargetRef is the reference to the workload that should be scaled.
   236    scaleTargetRef:
   237      apiVersion: apps/v1
   238      kind: Deployment
   239      name: php-apache
   240    minReplicas: 1        # MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to.
   241    maxReplicas: 100       # MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to.
   242    scaleStrategy: Auto   # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Manual".
   243    # Better to setting cron to fill the one complete time period such as one day, one week
   244    # Below is one day cron scheduling, it
   245    #(targetReplicas)
   246    #80                  --------     ---------        ----------
   247    #                    |       |    |        |       |         |
   248    #10       ------------       -----         --------          ----------
   249    #(time)   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
   250    # Local timezone means you use the server's(or maybe is a container's) timezone which the craned running in. for example, if your craned started as utc timezone, then it is utc. if it started as Asia/Shanghai, then it is Asia/Shanghai.
   251    crons:
   252      - name: "cron1"
   253        timezone: "Local"
   254        description: "scale down"
   255        start: "0 0 ? * *"
   256        end: "0 6 ? * *"
   257        targetReplicas: 10
   258      - name: "cron2"
   259        timezone: "Local"
   260        description: "scale up"
   261        start: "0 6 ? * *"
   262        end: "0 9 ? * *"
   263        targetReplicas: 80
   264      - name: "cron3"
   265        timezone: "Local"
   266        description: "scale down"
   267        start: "00 9 ? * *"
   268        end: "00 11 ? * *"
   269        targetReplicas: 10
   270      - name: "cron4"
   271        timezone: "Local"
   272        description: "scale up"
   273        start: "00 11 ? * *"
   274        end: "00 14 ? * *"
   275        targetReplicas: 80
   276      - name: "cron5"
   277        timezone: "Local"
   278        description: "scale down"
   279        start: "00 14 ? * *"
   280        end: "00 17 ? * *"
   281        targetReplicas: 10
   282      - name: "cron6"
   283        timezone: "Local"
   284        description: "scale up"
   285        start: "00 17 ? * *"
   286        end: "00 20 ? * *"
   287        targetReplicas: 80
   288      - name: "cron7"
   289        timezone: "Local"
   290        description: "scale down"
   291        start: "00 20 ? * *"
   292        end: "00 00 ? * *"
   293        targetReplicas: 10
   294  ``` 
   295  
   296  CronSpec has following fields.
   297  
   298  * **name** defines the name of the cron, cron name must be unique in the same ehpa
   299  * **description** defines the details description of the cron. it can be empty.
   300  * **timezone** defines the timezone of the cron which the crane to schedule in. If unspecified, default use `UTC` timezone. you can set it to `Local` which means you use timezone of the container of crane service running in. Also, `America/Los_Angeles` is ok.
   301  * **start** defines the cron start time schedule, which is crontab format. see https://en.wikipedia.org/wiki/Cron
   302  * **end** defines the cron end time schedule, which is crontab format. see https://en.wikipedia.org/wiki/Cron
   303  * **targetReplicas** defines the target replicas the workload to scale when the cron is active, which means current time is between start and end.
   304  
   305  Above means each day, the workload needs to keep the replicas hourly.
   306  ```
   307    #80                  --------     ---------        ----------
   308    #                    |       |    |        |       |         |
   309    #1        ------------       -----         --------          ----------
   310    #(time)   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
   311  ```
   312  
   313  Remember **not to set start time is after end**. For example, when you set following:
   314  ```
   315    crons:
   316      - name: "cron2"
   317        timezone: "Local"
   318        description: "scale up"
   319        start: "0 9 ? * *"
   320        end: "0 6 ? * *"
   321        targetReplicas: 80
   322  ```
   323  Above is not valid because the start will be always later than end. The hpa controller will always get the workload's desired replica to scale, which means keep the original replicas.
   324  
   325  
   326  #### Horizontal scaling process
   327  There are six steps of cron-driven and scaling process:
   328  
   329  1. EffectiveHPAController creates HorizontalPodAutoscaler which is injected to external cron metrics in spec.
   330  2. HPAController reads cron external metrics from KubeApiServer
   331  3. KubeApiServer forwards requests to MetricAdapter and MetricServer
   332  4. The MetricAdapter finds the cron scaler for target hpa, and detect if the cron scaler is active, which means the current time is between the cron start and end schedule time. It will return the `TargetReplicas` specified in the `CronSpec`.
   333  5. HPAController calculates all metric results and propose a new scale replicas for target by selecting the largest one.
   334  6. HPAController scales target with Scale Api
   335  
   336  
   337  When use ehpa, users can configure only cron metric, let the ehpa to be used as cron hpa.
   338  
   339  Multiple crons of one ehpa will be transformed to one external metric. HPA will fetch this external cron metric and calculates target replicas when reconcile. HPA will select the largest proposal replicas to scale the workload from multiple metrics.
   340  
   341  
   342  
   343  #### Cron working with other metrics together
   344  
   345  EffectiveHorizontalPodAutoscaler is compatible with HorizontalPodAutoscaler(Which is kubernetes built in). So if you configured metrics for HPA such as cpu or memory, then the HPA will scale by the real time metric it observed. 
   346  
   347  With EHPA, users can configure CronMetric、PredictionMetric、OriginalMetric at the same time.
   348  
   349  **We highly recommend you configure metrics of all dimensions. They are represtenting the cron replicas, prior predicted replicas, posterior observed replicas.**
   350  
   351  This is a powerful feature. Because HPA always pick the largest replicas calculated by all dimensional metrics to scale. Which will guarantee your workload's QoS, when you configure three types of autoscaling at the same time, the replicas caculated by real metric observed is largest, then it will use the max one. Although the replicas caculated by prediction metric is smaller for some unexpected reason. So you don't be worried about the QoS.
   352  
   353  
   354  #### Mechanism
   355  When metrics adapter deal with the external cron metric requests, metrics adapter will do following steps.
   356  
   357  ``` mermaid
   358  graph LR
   359    A[Start] --> B{Active Cron?};
   360    B -->|Yes| C(largest targetReplicas) --> F;
   361    B -->|No| D{Work together with other metrics?};
   362    D -->|Yes| G(minimum replicas) --> F;
   363    D -->|No| H(current replicas) --> F;
   364    F[Result workload replicas];
   365  ```
   366  
   367  1. No active cron now, there are two cases:
   368     
   369     - no other hpa metrics work with cron together, then return current workload replicas to keep the original desired replicas
   370     - other hpa metrics work with cron together, then return min value to remove the cron impact for other metrics. when cron is working with other metrics together, it should not return workload's original desired replicas, because there maybe other metrics want to trigger the workload to scale in. hpa controller select max replicas computed by all metrics(this is hpa default policy in hard code), cron will impact the hpa. so we should remove the cron effect when cron is not active, it should return min value.
   371  		
   372  		
   373  2. Has active ones.  we use the largest targetReplicas specified in cron spec. Basically, there should not be more then one active cron at the same time period, it is not a best practice.
   374  
   375  HPA will get the cron external metric value, then it will compute the replicas by itself.
   376  
   377  #### Use Case
   378  
   379  When you need to keep the workload replicas to minimum at midnight, you configured cron. And you need the HPA to get the real metric observed by metrics server to do scale based on real time observed metric. At last you configure a prediction-driven metric to do scale up early and scale down lately by predicting way.
   380  
   381  ```yaml
   382  apiVersion: autoscaling.crane.io/v1alpha1
   383  kind: EffectiveHorizontalPodAutoscaler
   384  metadata:
   385    name: php-apache-multi-dimensions
   386  spec:
   387    # ScaleTargetRef is the reference to the workload that should be scaled.
   388    scaleTargetRef:
   389      apiVersion: apps/v1
   390      kind: Deployment
   391      name: php-apache
   392    minReplicas: 1        # MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to.
   393    maxReplicas: 100       # MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to.
   394    scaleStrategy: Auto   # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Manual".
   395    # Metrics contains the specifications for which to use to calculate the desired replica count.
   396    metrics:
   397      - type: Resource
   398        resource:
   399          name: cpu
   400          target:
   401            type: Utilization
   402            averageUtilization: 50
   403    # Prediction defines configurations for predict resources.
   404    # If unspecified, defaults don't enable prediction.
   405    prediction:
   406      predictionWindowSeconds: 3600   # PredictionWindowSeconds is the time window to predict metrics in the future.
   407      predictionAlgorithm:
   408        algorithmType: dsp
   409        dsp:
   410          sampleInterval: "60s"
   411          historyLength: "3d"
   412    crons:
   413      - name: "cron1"
   414        description: "scale up"
   415        start: "0 0 ? * 6"
   416        end: "00 23 ? * 0"
   417        targetReplicas: 100
   418  ```
   419  
   420  
   421  ## FAQ
   422  
   423  ### error: unable to get metric crane_pod_cpu_usage 
   424  
   425  When checking the status for EffectiveHorizontalPodAutoscaler, you may see this error: 
   426  
   427  ```yaml
   428  - lastTransitionTime: "2022-05-15T14:05:43Z"
   429    message: 'the HPA was unable to compute the replica count: unable to get metric
   430      crane_pod_cpu_usage: unable to fetch metrics from custom metrics API: TimeSeriesPrediction
   431      is not ready. '
   432    reason: FailedGetPodsMetric
   433    status: "False"
   434    type: ScalingActive
   435  ```
   436  
   437  reason: Not all workload's cpu metric are predictable, if predict your workload failed, it will show above errors. 
   438  
   439  solution: 
   440  
   441  - Just waiting. the Prediction algorithm need more time, you can see `DSP` section to know more about this algorithm.
   442  - EffectiveHorizontalPodAutoscaler have a protection mechanism when prediction failed, it will use the actual cpu utilization to do autoscaling.