github.com/gocrane/crane@v0.11.0/docs/tutorials/effective-hpa-with-prometheus-adapter.zh.md

# Intelligent Autoscaling on Custom Metrics with Effective HPA

The Kubernetes HPA provides rich autoscaling capabilities: platform developers can deploy services that expose custom metrics, and users can configure multiple built-in resource metrics as well as custom metrics for horizontal autoscaling.
Effective HPA is compatible with the community Kubernetes HPA and adds smarter scaling policies, such as prediction-based scaling and Cron-based periodic scaling.
Prometheus is a popular open-source monitoring system, through which the custom metrics configured by users can be retrieved.

This article walks through an example of intelligent autoscaling on custom metrics with Effective HPA. Part of the configuration comes from the [official walkthrough](https://github.com/kubernetes-sigs/prometheus-adapter/blob/master/docs/walkthrough.md).

## Prerequisites

- Kubernetes 1.18+
- Helm 3.1.0
- Crane v0.6.0+
- Prometheus

Follow the [installation guide](https://docs.gocrane.io/dev/installation/) to install Crane in the cluster. For Prometheus, you can use the one from the installation guide or an existing deployment.

## Environment Setup

### Install PrometheusAdapter

The Crane component Metric-Adapter and PrometheusAdapter both implement the Custom Metric and External Metric ApiServices based on [custom-metric-apiserver](https://github.com/kubernetes-sigs/custom-metrics-apiserver). Installing Crane registers the corresponding ApiServices to Crane's Metric-Adapter, so before installing PrometheusAdapter you need to delete these ApiServices to ensure the Helm installation succeeds.

```bash
# List the ApiServices in the current cluster
kubectl get apiservice
```

Since Crane is installed, the result looks like this:

```bash
NAME                                   SERVICE                           AVAILABLE   AGE
v1beta1.batch                          Local                             True        35d
v1beta1.custom.metrics.k8s.io          crane-system/metric-adapter       True        18d
v1beta1.discovery.k8s.io               Local                             True        35d
v1beta1.events.k8s.io                  Local                             True        35d
v1beta1.external.metrics.k8s.io        crane-system/metric-adapter       True        18d
v1beta1.flowcontrol.apiserver.k8s.io   Local                             True        35d
v1beta1.metrics.k8s.io                 kube-system/metrics-service       True        35d
```

Delete the ApiServices installed by Crane:

```bash
kubectl delete apiservice v1beta1.custom.metrics.k8s.io
kubectl delete apiservice v1beta1.external.metrics.k8s.io
```

Install PrometheusAdapter via Helm:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter -n crane-system prometheus-community/prometheus-adapter
```

Then point the ApiServices back to Crane's Metric-Adapter:

```bash
kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/deploy/metric-adapter/apiservice.yaml
```

### Configure Metric-Adapter to Enable the RemoteAdapter Feature

The PrometheusAdapter installation did not point the ApiServices at PrometheusAdapter, so to let PrometheusAdapter serve custom metrics as well, the `RemoteAdapter` feature of Crane's Metric-Adapter is used to forward requests to PrometheusAdapter.

Edit the Metric-Adapter configuration and set PrometheusAdapter's Service as the RemoteAdapter of Crane's Metric-Adapter:

```bash
# Edit the metric-adapter Deployment
kubectl edit deploy metric-adapter -n crane-system
```

Make the following changes based on the PrometheusAdapter configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metric-adapter
  namespace: crane-system
spec:
  template:
    spec:
      containers:
      - args:
          # Add the remote adapter configuration
        - --remote-adapter=true
        - --remote-adapter-service-namespace=crane-system
        - --remote-adapter-service-name=prometheus-adapter
        - --remote-adapter-service-port=443
```

#### How RemoteAdapter Works

![](../images/remote-adapter.png)

Kubernetes allows an ApiService to be backed by only one service. To use both the metrics provided by Crane and those provided by PrometheusAdapter within a single cluster, Crane supports the RemoteAdapter to solve this problem:

- Crane's Metric-Adapter can be configured with a Kubernetes Service as a remote adapter
- When handling a request, Crane's Metric-Adapter first checks whether it is a local metric provided by Crane; if not, the request is forwarded to the remote adapter
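
The dispatch described above can be sketched as shell pseudologic. The metric names and the decision flow below are illustrative assumptions for demonstration, not Crane's actual implementation:

```bash
# Illustrative sketch of the RemoteAdapter dispatch flow (not Crane's real code);
# the metric names below are assumptions for demonstration only.
local_metrics="crane_pod_cpu_usage crane_custom.pods_http_requests"
request_metric="http_requests"
if echo "$local_metrics" | grep -qw "$request_metric"; then
  echo "serve locally"                   # a local metric: Crane answers the request itself
else
  echo "forward to prometheus-adapter"   # otherwise the request goes to the remote adapter
fi
```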

## Running the Example

### Prepare the Application

Deploy the following application to the cluster. The application exposes a metric showing the number of HTTP requests received per second.

<summary>sample-app.deploy.yaml</summary>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - image: luxas/autoscale-demo:v0.1.2
        name: metrics-provider
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
        ports:
        - name: http
          containerPort: 8080
```

<summary>sample-app.service.yaml</summary>

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: sample-app
  name: sample-app
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: sample-app
  type: ClusterIP
```

```bash
kubectl create -f sample-app.deploy.yaml
kubectl create -f sample-app.service.yaml
```

Once the application is deployed, you can check the `http_requests_total` metric with:

```bash
curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
```
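
The endpoint returns the Prometheus text exposition format. As a minimal sketch, here is how the counter value could be pulled out of such output; the payload below is a fabricated sample, not a live response:

```bash
# Extract the current value of http_requests_total from sample /metrics output
# (the payload below is a fabricated example standing in for a live response)
metrics='# HELP http_requests_total The amount of requests served by the server in total
# TYPE http_requests_total counter
http_requests_total 3'
echo "$metrics" | awk '/^http_requests_total/ {print $2}'
# -> 3
```

Against the real service you would pipe the `curl` output above into the same `awk` filter.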

### Configure the Scrape Rule

Configure Prometheus's scrape config to collect the application metric http_requests_total:

```bash
kubectl edit configmap -n crane-system prometheus-server
```

Add the following configuration:

```yaml
    - job_name: sample-app
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: default;sample-app-(.+)
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_pod_name
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
```

You can now run the PromQL query in Prometheus: sum(rate(http_requests_total[5m])) by (pod)
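
As a quick sanity check on what this query computes: rate() is the per-second increase of the counter over the window, so it can be approximated by hand. The counter values below are assumptions for illustration:

```bash
# Approximate rate(http_requests_total[5m]) by hand:
# (value_now - value_5m_ago) / 300s; the counter values are assumed examples.
start=120; end=420; window=300
awk -v s="$start" -v e="$end" -v w="$window" 'BEGIN { printf "%.1f\n", (e - s) / w }'
# -> 1.0  (requests per second)
```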

### Verify PrometheusAdapter

PrometheusAdapter's default rules support converting http_requests_total into a Pods-type custom metric. Verify with:

```bash
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
```

The result should include `pods/http_requests`:

```bash
{
  "name": "pods/http_requests",
  "singularName": "",
  "namespaced": true,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
```

This means that HPA can now be configured with the Pod metric.
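
If you want to check for the metric non-interactively, the discovery response can be filtered for the metric name. The JSON below is a trimmed, fabricated stand-in for the real `kubectl get --raw` output:

```bash
# Filter a (fabricated, trimmed) discovery response for the metric name;
# against a real cluster, pipe `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1` instead.
response='{"kind":"APIResourceList","resources":[{"name":"pods/http_requests","namespaced":true},{"name":"pods/fs_usage_bytes","namespaced":true}]}'
echo "$response" | grep -o 'pods/http_requests'
# -> pods/http_requests
```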

### Configure Autoscaling

Now we can create the Effective HPA. It can scale the workload on the Pod metric `http_requests`:

#### Defining a Custom Metric with Prediction Enabled

Add the configuration to the Effective HPA's annotations following this rule:

```yaml
annotations:
  # metric-query.autoscaling.crane.io is the fixed prefix, followed by the metric name, which must match Metric.name in spec.metrics; Pods and External metric types are supported
  metric-query.autoscaling.crane.io/http_requests: "sum(rate(http_requests_total[5m])) by (pod)"
```

<summary>sample-app-hpa.yaml</summary>

```yaml
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
  name: php-apache
  annotations:
    # metric-query.autoscaling.crane.io is the fixed prefix, followed by the metric name, which must match Metric.name in spec.metrics; Pods and External metric types are supported
    metric-query.autoscaling.crane.io/http_requests: "sum(rate(http_requests_total[5m])) by (pod)"
spec:
  # ScaleTargetRef is the reference to the workload that should be scaled.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1        # MinReplicas is the lower limit replicas to the scale target which the autoscaler can scale down to.
  maxReplicas: 10       # MaxReplicas is the upper limit replicas to the scale target which the autoscaler can scale up to.
  scaleStrategy: Auto   # ScaleStrategy indicate the strategy to scaling target, value can be "Auto" and "Manual".
  # Metrics contains the specifications for which to use to calculate the desired replica count.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: http_requests
      target:
        type: AverageValue
        averageValue: 500m
  # Prediction defines configurations for predict resources.
  # If unspecified, defaults don't enable prediction.
  prediction:
    predictionWindowSeconds: 3600   # PredictionWindowSeconds is the time window to predict metrics in the future.
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "7d"
```

```bash
kubectl create -f sample-app-hpa.yaml
```
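
For a Pods metric with an AverageValue target, the HPA computes desiredReplicas = ceil(total metric value / target average value); a target of 500m means 0.5 requests per second per Pod. A sketch of the arithmetic with an assumed total rate:

```bash
# desiredReplicas = ceil(sum of http_requests across Pods / target average value)
# total_rate is an assumed observation; target 0.5 corresponds to the 500m above.
total_rate=2.4
target=0.5
awk -v t="$total_rate" -v g="$target" 'BEGIN { r = t / g; printf "%d\n", (r > int(r)) ? int(r) + 1 : r }'
# -> 5  (ceil(2.4 / 0.5) = ceil(4.8) = 5 replicas)
```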

Check the status of the TimeSeriesPrediction. If the application has not been running for long, the metrics may not be predictable yet:

```yaml
apiVersion: prediction.crane.io/v1alpha1
kind: TimeSeriesPrediction
metadata:
  creationTimestamp: "2022-07-11T16:10:09Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: effective-hpa-controller
    app.kubernetes.io/name: ehpa-php-apache
    app.kubernetes.io/part-of: php-apache
    autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
  name: ehpa-php-apache
  namespace: default
spec:
  predictionMetrics:
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators: {}
          historyLength: 7d
          sampleInterval: 60s
      resourceIdentifier: crane_pod_cpu_usage
      resourceQuery: cpu
      type: ResourceQuery
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators: {}
          historyLength: 7d
          sampleInterval: 60s
      expressionQuery:
        expression: sum(rate(http_requests_total[5m])) by (pod)
      resourceIdentifier: crane_custom.pods_http_requests
      type: ExpressionQuery
  predictionWindowSeconds: 3600
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
    namespace: default
status:
  conditions:
    - lastTransitionTime: "2022-07-12T06:54:42Z"
      message: not all metric predicted
      reason: PredictPartial
      status: "False"
      type: Ready
  predictionMetrics:
    - ready: false
      resourceIdentifier: crane_pod_cpu_usage
    - prediction:
        - labels:
            - name: pod
              value: sample-app-7cfb596f98-8h5vv
          samples:
            - timestamp: 1657608900
              value: "0.01683"
            - timestamp: 1657608960
              value: "0.01683"
            ......
      ready: true
      resourceIdentifier: crane_custom.pods_http_requests
```

Inspect the HPA object created by the Effective HPA. You can see that a metric based on the prediction of the custom metric has been created: `crane_custom.pods_http_requests`

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: "2022-07-11T16:10:10Z"
  labels:
    app.kubernetes.io/managed-by: effective-hpa-controller
    app.kubernetes.io/name: ehpa-php-apache
    app.kubernetes.io/part-of: php-apache
    autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
  name: ehpa-php-apache
  namespace: default
spec:
  maxReplicas: 10
  metrics:
  - pods:
      metric:
        name: http_requests
      target:
        averageValue: 500m
        type: AverageValue
    type: Pods
  - pods:
      metric:
        name: crane_custom.pods_http_requests
        selector:
          matchLabels:
            autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
      target:
        averageValue: 500m
        type: AverageValue
    type: Pods
  - resource:
      name: cpu
      target:
        averageUtilization: 50
        type: Utilization
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
```

## Summary

Given the complexity of production environments, scaling on multiple metrics (CPU, memory, and custom metrics) is a common choice for production applications. Effective HPA therefore covers multi-metric scaling with its prediction algorithms, helping more workloads adopt horizontal autoscaling in production.