github.com/gocrane/crane@v0.11.0/docs/tutorials/scheduling-pods-based-on-actual-node-load.zh.md (about)

# Crane-scheduler

## Overview
Crane-scheduler is a set of scheduling plugins built on the [scheduler framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), including:

- [Dynamic scheduler: a load-aware scheduler plugin](./dynamic-scheduler-plugin.md)

## Getting Started

### Install Prometheus
Make sure Prometheus is installed in your Kubernetes cluster. If it is not, see [Install Prometheus](https://github.com/gocrane/fadvisor/blob/main/README.md#prerequests).

### Configure Prometheus Rules

Configure Prometheus recording rules to produce the aggregated metrics that Crane-scheduler expects:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
    name: example-record
spec:
    groups:
    - name: cpu_mem_usage_active
      interval: 30s
      rules:
      - record: cpu_usage_active
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100)
      - record: mem_usage_active
        expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)
    - name: cpu-usage-5m
      interval: 5m
      rules:
      - record: cpu_usage_max_avg_1h
        expr: max_over_time(cpu_usage_avg_5m[1h])
      - record: cpu_usage_max_avg_1d
        expr: max_over_time(cpu_usage_avg_5m[1d])
    - name: cpu-usage-1m
      interval: 1m
      rules:
      - record: cpu_usage_avg_5m
        expr: avg_over_time(cpu_usage_active[5m])
    - name: mem-usage-5m
      interval: 5m
      rules:
      - record: mem_usage_max_avg_1h
        expr: max_over_time(mem_usage_avg_5m[1h])
      - record: mem_usage_max_avg_1d
        expr: max_over_time(mem_usage_avg_5m[1d])
    - name: mem-usage-1m
      interval: 1m
      rules:
      - record: mem_usage_avg_5m
        expr: avg_over_time(mem_usage_active[5m])
```

!!! warning "Troubleshooting"

    The Prometheus scrape interval must be less than 30 seconds. Otherwise rules such as `cpu_usage_active` may not take effect, because `irate(...[30s])` needs at least two samples inside the 30-second window.

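To make the rule chain easier to follow, here is a minimal Python sketch of the arithmetic the recording rules perform for a single node. The sample values are made up, and the helper functions are plain Python stand-ins named after the PromQL functions; Prometheus evaluates the real expressions over its own time series.

```python
from statistics import mean


def cpu_usage_active(idle_rate: float) -> float:
    """Mirror `100 - idle_rate * 100`; idle_rate is the idle CPU fraction (0..1)."""
    return 100 - idle_rate * 100


def mem_usage_active(mem_available: float, mem_total: float) -> float:
    """Mirror `100*(1-MemAvailable/MemTotal)`; both arguments are in bytes."""
    return 100 * (1 - mem_available / mem_total)


def avg_over_time(samples):
    """Stand-in for PromQL avg_over_time over an already selected window."""
    return mean(samples)


def max_over_time(samples):
    """Stand-in for PromQL max_over_time over an already selected window."""
    return max(samples)


# Hypothetical idle fractions observed during one 5-minute window.
idle_rates = [0.75, 0.5, 0.25, 0.5]
active = [cpu_usage_active(r) for r in idle_rates]  # percent busy per sample

# cpu_usage_avg_5m: average of cpu_usage_active over the last 5 minutes.
cpu_usage_avg_5m = avg_over_time(active)

# cpu_usage_max_avg_1h: highest 5-minute average seen in the last hour.
# The other two values are hypothetical earlier results of the same rule.
cpu_usage_max_avg_1h = max_over_time([cpu_usage_avg_5m, 62.5, 40.0])

print(active, cpu_usage_avg_5m, cpu_usage_max_avg_1h)
```

The `mem_usage_*` rules follow the same pattern, starting from `mem_usage_active` instead of `cpu_usage_active`.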
### Install Crane-scheduler
There are two options:

- Install Crane-scheduler as a second scheduler
- Replace the native Kube-scheduler with Crane-scheduler

#### Install Crane-scheduler as a Second Scheduler
=== "Main"

    ```bash
    helm repo add crane https://gocrane.github.io/helm-charts
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

=== "Mirror"

    ```bash
    helm repo add crane https://finops-helm.pkg.coding.net/gocrane/gocrane
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

#### Replace the Native Kube-scheduler With Crane-scheduler

1. Back up `/etc/kubernetes/manifests/kube-scheduler.yaml`:
```bash
cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/
```
2. Edit the kube-scheduler configuration file (`scheduler-config.yaml`) to enable the Dynamic scheduler plugin and configure its arguments:
```yaml title="scheduler-config.yaml"
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
...
profiles:
- schedulerName: default-scheduler
  plugins:
    filter:
      enabled:
      - name: Dynamic
    score:
      enabled:
      - name: Dynamic
        weight: 3
  pluginConfig:
  - name: Dynamic
    args:
      policyConfigPath: /etc/kubernetes/policy.yaml
...
```
3. Create `/etc/kubernetes/policy.yaml`, which serves as the scheduling policy of the Dynamic plugin:
```yaml title="/etc/kubernetes/policy.yaml"
apiVersion: scheduler.policy.crane.io/v1alpha1
kind: DynamicSchedulerPolicy
spec:
  syncPolicy:
    ## cpu usage
    - name: cpu_usage_avg_5m
      period: 3m
    - name: cpu_usage_max_avg_1h
      period: 15m
    - name: cpu_usage_max_avg_1d
      period: 3h
    ## memory usage
    - name: mem_usage_avg_5m
      period: 3m
    - name: mem_usage_max_avg_1h
      period: 15m
    - name: mem_usage_max_avg_1d
      period: 3h

  predicate:
    ## cpu usage
    - name: cpu_usage_avg_5m
      maxLimitPecent: 0.65
    - name: cpu_usage_max_avg_1h
      maxLimitPecent: 0.75
    ## memory usage
    - name: mem_usage_avg_5m
      maxLimitPecent: 0.65
    - name: mem_usage_max_avg_1h
      maxLimitPecent: 0.75

  priority:
    ## cpu usage
    - name: cpu_usage_avg_5m
      weight: 0.2
    - name: cpu_usage_max_avg_1h
      weight: 0.3
    - name: cpu_usage_max_avg_1d
      weight: 0.5
    ## memory usage
    - name: mem_usage_avg_5m
      weight: 0.2
    - name: mem_usage_max_avg_1h
      weight: 0.3
    - name: mem_usage_max_avg_1d
      weight: 0.5

  hotValue:
    - timeRange: 5m
      count: 5
    - timeRange: 1m
      count: 2
```
4. Edit `kube-scheduler.yaml` and replace the kube-scheduler image with the Crane-scheduler image:
```yaml title="kube-scheduler.yaml"
...
 image: docker.io/gocrane/crane-scheduler:0.0.23
...
```
5. Install [crane-scheduler-controller](https://github.com/gocrane/crane-scheduler/tree/main/deploy/controller):

=== "Main"

    ```bash
    kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/rbac.yaml
    kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/deployment.yaml
    ```

=== "Mirror"

    ```bash
    kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/rbac.yaml
    kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/deployment.yaml
    ```

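The `predicate` and `priority` sections of `policy.yaml` from step 3 can be read as a filter followed by a weighted score. The Python sketch below illustrates one plausible reading and is not the plugin's actual code. It assumes that each usage value is a fraction between 0 and 1, that a node is rejected when any metric exceeds its `maxLimitPecent`, and that the score is the weighted average of `(1 - usage) * 100`. The `syncPolicy` and `hotValue` sections are not modeled, and the node usage values are hypothetical.

```python
# Thresholds and weights copied from the policy.yaml in step 3.
PREDICATE = {
    "cpu_usage_avg_5m": 0.65,
    "cpu_usage_max_avg_1h": 0.75,
    "mem_usage_avg_5m": 0.65,
    "mem_usage_max_avg_1h": 0.75,
}
PRIORITY = {
    "cpu_usage_avg_5m": 0.2,
    "cpu_usage_max_avg_1h": 0.3,
    "cpu_usage_max_avg_1d": 0.5,
    "mem_usage_avg_5m": 0.2,
    "mem_usage_max_avg_1h": 0.3,
    "mem_usage_max_avg_1d": 0.5,
}
MAX_SCORE = 100  # assumed score scale, not taken from the tutorial


def passes_predicate(usage, predicate=PREDICATE):
    """Return True if no metric exceeds its threshold (assumed filter rule)."""
    return all(
        usage[name] <= limit
        for name, limit in predicate.items()
        if name in usage  # assumption: metrics without data are skipped
    )


def weighted_score(usage, priority=PRIORITY):
    """Weighted average of (1 - usage) * MAX_SCORE (assumed scoring rule).

    Expects `usage` to contain every metric listed in `priority`.
    """
    total_weight = sum(priority.values())
    total = sum(
        (1 - usage[name]) * MAX_SCORE * weight
        for name, weight in priority.items()
    )
    return round(total / total_weight)


# Hypothetical nodes: one at 50% on every metric, one with a short CPU spike.
calm_node = {name: 0.5 for name in PRIORITY}
busy_node = dict(calm_node, cpu_usage_avg_5m=0.7)

print(passes_predicate(calm_node), weighted_score(calm_node))
print(passes_predicate(busy_node))
```

Under these assumptions `calm_node` passes the filter and scores 50, while `busy_node` is filtered out because its `cpu_usage_avg_5m` of 0.7 exceeds the 0.65 threshold.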
### Schedule Pods With Crane-scheduler
Test Crane-scheduler with the following example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"
```

!!! note

    If you want to use `crane-scheduler` as the default scheduler, set `schedulerName` to `default-scheduler` instead of `crane-scheduler`.

If the test pod is scheduled successfully, you will see the following event:
```bash
Type    Reason     Age   From             Message
----    ------     ----  ----             -------
Normal  Scheduled  28s   crane-scheduler  Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu
```