github.com/gocrane/crane@v0.11.0/docs/tutorials/using-effective-hpa-to-scaling-with-effectiveness.md (about) 1 # EffectiveHorizontalPodAutoscaler 2 3 EffectiveHorizontalPodAutoscaler helps you manage application scaling in an easy way. 4 5 It is compatible with HorizontalPodAutoscaler but extends more features. 6 7 EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling. 8 9 With this capability, user can forecast the incoming peak flow and scale up their application ahead, also user can know when the peak flow will end and scale down their application gracefully. 10 11 Besides that, EffectiveHorizontalPodAutoscaler also defines several scale strategies to support different scaling scenarios. 12 13 ## Features 14 A EffectiveHorizontalPodAutoscaler sample yaml looks like below: 15 16 ```yaml 17 apiVersion: autoscaling.crane.io/v1alpha1 18 kind: EffectiveHorizontalPodAutoscaler 19 metadata: 20 name: php-apache 21 spec: 22 scaleTargetRef: #(1) 23 apiVersion: apps/v1 24 kind: Deployment 25 name: php-apache 26 minReplicas: 1 #(2) 27 maxReplicas: 10 #(3) 28 scaleStrategy: Auto #(4) 29 metrics: #(5) 30 - type: Resource 31 resource: 32 name: cpu 33 target: 34 type: Utilization 35 averageUtilization: 50 36 prediction: #(6) 37 predictionWindowSeconds: 3600 #(7) 38 predictionAlgorithm: 39 algorithmType: dsp 40 dsp: 41 sampleInterval: "60s" 42 historyLength: "3d" 43 ``` 44 45 1. ScaleTargetRef is the reference to the workload that should be scaled. 46 2. MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to. 47 3. MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to. 48 4. ScaleStrategy indicates the strategy to scaling target, value can be "Auto" and "Preview". 49 5. Metrics contains the specifications for which to use to calculate the desired replica count. 50 6. Prediction defines configurations for predict resources.If unspecified, defaults don't enable prediction. 51 7. PredictionWindowSeconds is the time window to predict metrics in the future. 52 53 ### Prediction-driven autoscaling 54 Most of online applications follow regular pattern. We can predict future trend of hours or days. DSP is a time series prediction algorithm that applicable for application metrics prediction. 55 56 The following shows a sample EffectiveHorizontalPodAutoscaler yaml with prediction enabled. 57 ```yaml 58 apiVersion: autoscaling.crane.io/v1alpha1 59 kind: EffectiveHorizontalPodAutoscaler 60 spec: 61 prediction: 62 predictionWindowSeconds: 3600 63 predictionAlgorithm: 64 algorithmType: dsp 65 dsp: 66 sampleInterval: "60s" 67 historyLength: "3d" 68 69 ``` 70 71 #### Metric conversion 72 73 When user defines `spec.metrics` in EffectiveHorizontalPodAutoscaler and prediction configuration is enabled, EffectiveHPAController will convert it to a new metric and configure the background HorizontalPodAutoscaler. 74 75 This is a source EffectiveHorizontalPodAutoscaler yaml for metric definition. 76 ```yaml 77 apiVersion: autoscaling.crane.io/v1alpha1 78 kind: EffectiveHorizontalPodAutoscaler 79 spec: 80 metrics: 81 - type: Resource 82 resource: 83 name: cpu 84 target: 85 type: Utilization 86 averageUtilization: 50 87 ``` 88 89 It's converted to underlying HorizontalPodAutoscaler metrics yaml. 90 ```yaml 91 apiVersion: autoscaling/v2beta1 92 kind: HorizontalPodAutoscaler 93 spec: 94 metrics: 95 - pods: 96 metric: 97 name: crane_pod_cpu_usage 98 selector: 99 matchLabels: 100 autoscaling.crane.io/effective-hpa-uid: f9b92249-eab9-4671-afe0-17925e5987b8 101 target: 102 type: AverageValue 103 averageValue: 100m 104 type: Pods 105 - resource: 106 name: cpu 107 target: 108 type: Utilization 109 averageUtilization: 50 110 type: Resource 111 ``` 112 113 In this sample, the resource metric defined by user is converted into two metrics: prediction metric and origin metric. 114 115 * **prediction metric** is custom metrics that provided by component MetricAdapter. Since custom metric doesn't support `targetAverageUtilization`, it's converted to `targetAverageValue` based on target pod cpu request. 116 * **origin metric** is equivalent to user defined metrics in EffectiveHorizontalPodAutoscaler, to fall back to baseline user defined in case of some unexpected situation e.g. business traffic sudden growth. 117 118 HorizontalPodAutoscaler will calculate on each metric, and propose new replicas based on that. The **largest** one will be picked as the new scale. 119 120 #### Horizontal scaling process 121 There are six steps of prediction and scaling process: 122 123 1. EffectiveHPAController create HorizontalPodAutoscaler and TimeSeriesPrediction instance 124 2. PredictionCore get historic metric from prometheus and persist into TimeSeriesPrediction 125 3. HPAController read metrics from KubeApiServer 126 4. KubeApiServer forward requests to MetricAdapter and MetricServer 127 5. HPAController calculate all metric results and propose a new scale replicas for target 128 6. HPAController scale target with Scale Api 129 130 Below is the process flow. 131  132 133 #### Use case 134 Let's take one use case that using EffectiveHorizontalPodAutoscaler in production cluster. 135 136 We did a profiling on the load history of one application in production and replayed it in staging environment. With the same application, we leverage both EffectiveHorizontalPodAutoscaler and HorizontalPodAutoscaler to manage the scale and compare the result. 137 138 From the red line in below chart, we can see its actual total cpu usage is high at ~8am, ~12pm, ~8pm and low in midnight. The green line shows the prediction cpu usage trend. 139  140 141 Below is the comparison result between EffectiveHorizontalPodAutoscaler and HorizontalPodAutoscaler. The red line is the replica number generated by HorizontalPodAutoscaler and the green line is the result from EffectiveHorizontalPodAutoscaler. 142  143 144 We can see significant improvement with EffectiveHorizontalPodAutoscaler: 145 146 * Scale up in advance before peek flow 147 * Scale down gracefully after peek flow 148 * Fewer replicas changes than HorizontalPodAutoscaler 149 150 ### ScaleStrategy 151 EffectiveHorizontalPodAutoscaler provides two strategies for scaling: `Auto` and `Preview`. User can change the strategy at runtime, and it will take effect on the fly. 152 153 #### Auto 154 Auto strategy achieves automatic scaling based on metrics. It is the default strategy. With this strategy, EffectiveHorizontalPodAutoscaler will create and control a HorizontalPodAutoscaler instance in backend. We don't recommend explicit configuration on the underlying HorizontalPodAutoscaler because it will be overridden by EffectiveHPAController. If user delete EffectiveHorizontalPodAutoscaler, HorizontalPodAutoscaler will be cleaned up too. 155 156 #### Preview 157 Preview strategy means EffectiveHorizontalPodAutoscaler won't change target's replicas automatically, so you can preview the calculated replicas and control target's replicas by themselves. User can switch from default strategy to this one by applying `spec.scaleStrategy` to `Preview`. It will take effect immediately, During the switch, EffectiveHPAController will disable HorizontalPodAutoscaler if exists and scale the target to the value `spec.specificReplicas`, if user not set `spec.specificReplicas`, when ScaleStrategy is change to Preview, it will just stop scaling. 158 159 A sample preview configuration looks like following: 160 ```yaml 161 apiVersion: autoscaling.crane.io/v1alpha1 162 kind: EffectiveHorizontalPodAutoscaler 163 spec: 164 scaleStrategy: Preview # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Preview". 165 specificReplicas: 5 # SpecificReplicas specify the target replicas. 166 status: 167 expectReplicas: 4 # expectReplicas is the calculated replicas that based on prediction metrics or spec.specificReplicas. 168 currentReplicas: 4 # currentReplicas is actual replicas from target 169 ``` 170 171 ### HorizontalPodAutoscaler compatible 172 EffectiveHorizontalPodAutoscaler is designed to be compatible with k8s native HorizontalPodAutoscaler, because we don't reinvent the autoscaling part but take advantage of the extension from HorizontalPodAutoscaler and build a high level autoscaling CRD. EffectiveHorizontalPodAutoscaler support all abilities from HorizontalPodAutoscaler like metricSpec and behavior. 173 174 EffectiveHorizontalPodAutoscaler will continue support incoming new feature from HorizontalPodAutoscaler. 175 176 ### EffectiveHorizontalPodAutoscaler status 177 This is a yaml from EffectiveHorizontalPodAutoscaler.Status 178 ```yaml 179 apiVersion: autoscaling.crane.io/v1alpha1 180 kind: EffectiveHorizontalPodAutoscaler 181 status: 182 conditions: 183 - lastTransitionTime: "2021-11-30T08:18:59Z" 184 message: the HPA controller was able to get the target's current scale 185 reason: SucceededGetScale 186 status: "True" 187 type: AbleToScale 188 - lastTransitionTime: "2021-11-30T08:18:59Z" 189 message: Effective HPA is ready 190 reason: EffectiveHorizontalPodAutoscalerReady 191 status: "True" 192 type: Ready 193 currentReplicas: 1 194 expectReplicas: 0 195 196 ``` 197 198 ### Cron-based autoscaling 199 EffectiveHorizontalPodAutoscaler supports cron based autoscaling. 200 201 Besides based on monitoring metrics, sometimes there are differences between holiday and weekdays in workload traffic, and a simple prediction algorithm may not work relatively well. Then you can make up for the lack of prediction by setting the weekend cron to have a larger number of replicas. 202 203 For some non-web traffic applications, for example, some applications do not need to work on weekends, and then want to reduce the workload replicas to 1, you can also configure cron to reduce the cost for your service. 204 205 Following are cron main fields in the ehpa spec: 206 207 - CronSpec: You can set multiple cron autoscaling configurations, cron cycle can set the start time and end time of the cycle, and the number of replicas of the workload can be continuously guaranteed to the set target value within the time range. 208 - Name: cron identifier 209 - TargetReplicas: the target number of replicas of the workload in this cron time range. 210 - Start: The start time of the cron, in the standard linux crontab format 211 - End: the end time of the cron, in the standard linux crontab format 212 213 214 Current cron autoscaling capabilities from some manufacturers and communities have some shortcomings. 215 216 1. The cron capability is provided separately, has no global view of autoscaling, poor compatibility with HPA, and conflicts with other scale trigger. 217 2. The semantics and behavior of cron do not match very well, and are even very difficult to understand when used, which can easily mislead users and lead to autoscaling failures. 218 219 The following figure shows the comparison between the current EHPA cron autoscaling implementation and other cron capabilities. 220 221  222 223 224 To address the above issues, the cron autoscaling implemented by EHPA is designed on the basis of compatibility with HPA, and cron, as an indicator of HPA, acts on the workload object together with other indicators. In addition, the setting of cron is also very simple. When cron is configured separately, the default scaling of the workload will not be performed when it is not in the active time range. 225 226 227 #### Cron working without other metrics 228 You can just configure cron itself to work, assume you have no other metrics configured. 229 ```yaml 230 apiVersion: autoscaling.crane.io/v1alpha1 231 kind: EffectiveHorizontalPodAutoscaler 232 metadata: 233 name: php-apache-local 234 spec: 235 # ScaleTargetRef is the reference to the workload that should be scaled. 236 scaleTargetRef: 237 apiVersion: apps/v1 238 kind: Deployment 239 name: php-apache 240 minReplicas: 1 # MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to. 241 maxReplicas: 100 # MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to. 242 scaleStrategy: Auto # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Manual". 243 # Better to setting cron to fill the one complete time period such as one day, one week 244 # Below is one day cron scheduling, it 245 #(targetReplicas) 246 #80 -------- --------- ---------- 247 # | | | | | | 248 #10 ------------ ----- -------- ---------- 249 #(time) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 250 # Local timezone means you use the server's(or maybe is a container's) timezone which the craned running in. for example, if your craned started as utc timezone, then it is utc. if it started as Asia/Shanghai, then it is Asia/Shanghai. 251 crons: 252 - name: "cron1" 253 timezone: "Local" 254 description: "scale down" 255 start: "0 0 ? * *" 256 end: "0 6 ? * *" 257 targetReplicas: 10 258 - name: "cron2" 259 timezone: "Local" 260 description: "scale up" 261 start: "0 6 ? * *" 262 end: "0 9 ? * *" 263 targetReplicas: 80 264 - name: "cron3" 265 timezone: "Local" 266 description: "scale down" 267 start: "00 9 ? * *" 268 end: "00 11 ? * *" 269 targetReplicas: 10 270 - name: "cron4" 271 timezone: "Local" 272 description: "scale up" 273 start: "00 11 ? * *" 274 end: "00 14 ? * *" 275 targetReplicas: 80 276 - name: "cron5" 277 timezone: "Local" 278 description: "scale down" 279 start: "00 14 ? * *" 280 end: "00 17 ? * *" 281 targetReplicas: 10 282 - name: "cron6" 283 timezone: "Local" 284 description: "scale up" 285 start: "00 17 ? * *" 286 end: "00 20 ? * *" 287 targetReplicas: 80 288 - name: "cron7" 289 timezone: "Local" 290 description: "scale down" 291 start: "00 20 ? * *" 292 end: "00 00 ? * *" 293 targetReplicas: 10 294 ``` 295 296 CronSpec has following fields. 297 298 * **name** defines the name of the cron, cron name must be unique in the same ehpa 299 * **description** defines the details description of the cron. it can be empty. 300 * **timezone** defines the timezone of the cron which the crane to schedule in. If unspecified, default use `UTC` timezone. you can set it to `Local` which means you use timezone of the container of crane service running in. Also, `America/Los_Angeles` is ok. 301 * **start** defines the cron start time schedule, which is crontab format. see https://en.wikipedia.org/wiki/Cron 302 * **end** defines the cron end time schedule, which is crontab format. see https://en.wikipedia.org/wiki/Cron 303 * **targetReplicas** defines the target replicas the workload to scale when the cron is active, which means current time is between start and end. 304 305 Above means each day, the workload needs to keep the replicas hourly. 306 ``` 307 #80 -------- --------- ---------- 308 # | | | | | | 309 #1 ------------ ----- -------- ---------- 310 #(time) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 311 ``` 312 313 Remember **not to set start time is after end**. For example, when you set following: 314 ``` 315 crons: 316 - name: "cron2" 317 timezone: "Local" 318 description: "scale up" 319 start: "0 9 ? * *" 320 end: "0 6 ? * *" 321 targetReplicas: 80 322 ``` 323 Above is not valid because the start will be always later than end. The hpa controller will always get the workload's desired replica to scale, which means keep the original replicas. 324 325 326 #### Horizontal scaling process 327 There are six steps of cron-driven and scaling process: 328 329 1. EffectiveHPAController creates HorizontalPodAutoscaler which is injected to external cron metrics in spec. 330 2. HPAController reads cron external metrics from KubeApiServer 331 3. KubeApiServer forwards requests to MetricAdapter and MetricServer 332 4. The MetricAdapter finds the cron scaler for target hpa, and detect if the cron scaler is active, which means the current time is between the cron start and end schedule time. It will return the `TargetReplicas` specified in the `CronSpec`. 333 5. HPAController calculates all metric results and propose a new scale replicas for target by selecting the largest one. 334 6. HPAController scales target with Scale Api 335 336 337 When use ehpa, users can configure only cron metric, let the ehpa to be used as cron hpa. 338 339 Multiple crons of one ehpa will be transformed to one external metric. HPA will fetch this external cron metric and calculates target replicas when reconcile. HPA will select the largest proposal replicas to scale the workload from multiple metrics. 340 341 342 343 #### Cron working with other metrics together 344 345 EffectiveHorizontalPodAutoscaler is compatible with HorizontalPodAutoscaler(Which is kubernetes built in). So if you configured metrics for HPA such as cpu or memory, then the HPA will scale by the real time metric it observed. 346 347 With EHPA, users can configure CronMetricăPredictionMetricăOriginalMetric at the same time. 348 349 **We highly recommend you configure metrics of all dimensions. They are represtenting the cron replicas, prior predicted replicas, posterior observed replicas.** 350 351 This is a powerful feature. Because HPA always pick the largest replicas calculated by all dimensional metrics to scale. Which will guarantee your workload's QoS, when you configure three types of autoscaling at the same time, the replicas caculated by real metric observed is largest, then it will use the max one. Although the replicas caculated by prediction metric is smaller for some unexpected reason. So you don't be worried about the QoS. 352 353 354 #### Mechanism 355 When metrics adapter deal with the external cron metric requests, metrics adapter will do following steps. 356 357 ``` mermaid 358 graph LR 359 A[Start] --> B{Active Cron?}; 360 B -->|Yes| C(largest targetReplicas) --> F; 361 B -->|No| D{Work together with other metrics?}; 362 D -->|Yes| G(minimum replicas) --> F; 363 D -->|No| H(current replicas) --> F; 364 F[Result workload replicas]; 365 ``` 366 367 1. No active cron now, there are two cases: 368 369 - no other hpa metrics work with cron together, then return current workload replicas to keep the original desired replicas 370 - other hpa metrics work with cron together, then return min value to remove the cron impact for other metrics. when cron is working with other metrics together, it should not return workload's original desired replicas, because there maybe other metrics want to trigger the workload to scale in. hpa controller select max replicas computed by all metrics(this is hpa default policy in hard code), cron will impact the hpa. so we should remove the cron effect when cron is not active, it should return min value. 371 372 373 2. Has active ones. we use the largest targetReplicas specified in cron spec. Basically, there should not be more then one active cron at the same time period, it is not a best practice. 374 375 HPA will get the cron external metric value, then it will compute the replicas by itself. 376 377 #### Use Case 378 379 When you need to keep the workload replicas to minimum at midnight, you configured cron. And you need the HPA to get the real metric observed by metrics server to do scale based on real time observed metric. At last you configure a prediction-driven metric to do scale up early and scale down lately by predicting way. 380 381 ```yaml 382 apiVersion: autoscaling.crane.io/v1alpha1 383 kind: EffectiveHorizontalPodAutoscaler 384 metadata: 385 name: php-apache-multi-dimensions 386 spec: 387 # ScaleTargetRef is the reference to the workload that should be scaled. 388 scaleTargetRef: 389 apiVersion: apps/v1 390 kind: Deployment 391 name: php-apache 392 minReplicas: 1 # MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to. 393 maxReplicas: 100 # MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to. 394 scaleStrategy: Auto # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Manual". 395 # Metrics contains the specifications for which to use to calculate the desired replica count. 396 metrics: 397 - type: Resource 398 resource: 399 name: cpu 400 target: 401 type: Utilization 402 averageUtilization: 50 403 # Prediction defines configurations for predict resources. 404 # If unspecified, defaults don't enable prediction. 405 prediction: 406 predictionWindowSeconds: 3600 # PredictionWindowSeconds is the time window to predict metrics in the future. 407 predictionAlgorithm: 408 algorithmType: dsp 409 dsp: 410 sampleInterval: "60s" 411 historyLength: "3d" 412 crons: 413 - name: "cron1" 414 description: "scale up" 415 start: "0 0 ? * 6" 416 end: "00 23 ? * 0" 417 targetReplicas: 100 418 ``` 419 420 421 ## FAQ 422 423 ### error: unable to get metric crane_pod_cpu_usage 424 425 When checking the status for EffectiveHorizontalPodAutoscaler, you may see this error: 426 427 ```yaml 428 - lastTransitionTime: "2022-05-15T14:05:43Z" 429 message: 'the HPA was unable to compute the replica count: unable to get metric 430 crane_pod_cpu_usage: unable to fetch metrics from custom metrics API: TimeSeriesPrediction 431 is not ready. ' 432 reason: FailedGetPodsMetric 433 status: "False" 434 type: ScalingActive 435 ``` 436 437 reason: Not all workload's cpu metric are predictable, if predict your workload failed, it will show above errors. 438 439 solution: 440 441 - Just waiting. the Prediction algorithm need more time, you can see `DSP` section to know more about this algorithm. 442 - EffectiveHorizontalPodAutoscaler have a protection mechanism when prediction failed, it will use the actual cpu utilization to do autoscaling.