# Crane-scheduler

## Overview

Crane-scheduler is a collection of scheduler plugins based on the [scheduler framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), including:

- [Dynamic scheduler: a load-aware scheduler plugin](./dynamic-scheduler-plugin.md)

## Getting Started

### Install Prometheus

Make sure Prometheus is installed in your Kubernetes cluster. If it is not, see [Install Prometheus](https://github.com/gocrane/fadvisor/blob/main/README.md#prerequests).

### Configure Prometheus Rules

Configure Prometheus recording rules to produce the aggregated metrics the scheduler expects:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-record
spec:
  groups:
  - name: cpu_mem_usage_active
    interval: 30s
    rules:
    - record: cpu_usage_active
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100)
    - record: mem_usage_active
      expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)
  - name: cpu-usage-5m
    interval: 5m
    rules:
    - record: cpu_usage_max_avg_1h
      expr: max_over_time(cpu_usage_avg_5m[1h])
    - record: cpu_usage_max_avg_1d
      expr: max_over_time(cpu_usage_avg_5m[1d])
  - name: cpu-usage-1m
    interval: 1m
    rules:
    - record: cpu_usage_avg_5m
      expr: avg_over_time(cpu_usage_active[5m])
  - name: mem-usage-5m
    interval: 5m
    rules:
    - record: mem_usage_max_avg_1h
      expr: max_over_time(mem_usage_avg_5m[1h])
    - record: mem_usage_max_avg_1d
      expr: max_over_time(mem_usage_avg_5m[1d])
  - name: mem-usage-1m
    interval: 1m
    rules:
    - record: mem_usage_avg_5m
      expr: avg_over_time(mem_usage_active[5m])
```

!!! warning "Troubleshooting"

    The Prometheus scrape interval must be shorter than 30 seconds. Otherwise some rules, such as `cpu_usage_active`, may not work properly, because `irate` needs at least two samples inside its `[30s]` window.

### Install Crane-scheduler

There are two options:

- Install Crane-scheduler as a second scheduler
- Replace the native Kube-scheduler with Crane-scheduler

#### Install Crane-scheduler as a second scheduler

=== "Main"

    ```bash
    helm repo add crane https://gocrane.github.io/helm-charts
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

=== "Mirror"

    ```bash
    helm repo add crane https://finops-helm.pkg.coding.net/gocrane/gocrane
    helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
    ```

#### Replace the native Kube-scheduler with Crane-scheduler

1. Back up `/etc/kubernetes/manifests/kube-scheduler.yaml`:

    ```bash
    cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/
    ```

2. Edit the kube-scheduler configuration file (`scheduler-config.yaml`) to enable the Dynamic scheduler plugin and set its arguments:

    ```yaml title="scheduler-config.yaml"
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    ...
    profiles:
    - schedulerName: default-scheduler
      plugins:
        filter:
          enabled:
          - name: Dynamic
        score:
          enabled:
          - name: Dynamic
            weight: 3
      pluginConfig:
      - name: Dynamic
        args:
          policyConfigPath: /etc/kubernetes/policy.yaml
    ...
    ```

3. Create `/etc/kubernetes/policy.yaml`, which holds the scheduling policy for the Dynamic plugin:

    ```yaml title="/etc/kubernetes/policy.yaml"
    apiVersion: scheduler.policy.crane.io/v1alpha1
    kind: DynamicSchedulerPolicy
    spec:
      syncPolicy:
        ##cpu usage
        - name: cpu_usage_avg_5m
          period: 3m
        - name: cpu_usage_max_avg_1h
          period: 15m
        - name: cpu_usage_max_avg_1d
          period: 3h
        ##memory usage
        - name: mem_usage_avg_5m
          period: 3m
        - name: mem_usage_max_avg_1h
          period: 15m
        - name: mem_usage_max_avg_1d
          period: 3h

      predicate:
        ##cpu usage
        - name: cpu_usage_avg_5m
          maxLimitPecent: 0.65
        - name: cpu_usage_max_avg_1h
          maxLimitPecent: 0.75
        ##memory usage
        - name: mem_usage_avg_5m
          maxLimitPecent: 0.65
        - name: mem_usage_max_avg_1h
          maxLimitPecent: 0.75

      priority:
        ##cpu usage
        - name: cpu_usage_avg_5m
          weight: 0.2
        - name: cpu_usage_max_avg_1h
          weight: 0.3
        - name: cpu_usage_max_avg_1d
          weight: 0.5
        ##memory usage
        - name: mem_usage_avg_5m
          weight: 0.2
        - name: mem_usage_max_avg_1h
          weight: 0.3
        - name: mem_usage_max_avg_1d
          weight: 0.5

      hotValue:
        - timeRange: 5m
          count: 5
        - timeRange: 1m
          count: 2
    ```

4. Edit `kube-scheduler.yaml` and replace the kube-scheduler image with the Crane-scheduler image:

    ```yaml title="kube-scheduler.yaml"
    ...
     image: docker.io/gocrane/crane-scheduler:0.0.23
    ...
    ```

5. Install [crane-scheduler-controller](https://github.com/gocrane/crane-scheduler/tree/main/deploy/controller):

    === "Main"

        ```bash
        kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/rbac.yaml
        kubectl apply -f https://raw.githubusercontent.com/gocrane/crane-scheduler/main/deploy/controller/deployment.yaml
        ```

    === "Mirror"

        ```bash
        kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/rbac.yaml
        kubectl apply -f https://gitee.com/finops/crane-scheduler/raw/main/deploy/controller/deployment.yaml
        ```

### Schedule Pods with Crane-scheduler

Use the following example to test Crane-scheduler:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"
```

!!! Note

    If you want to use Crane-scheduler as the default scheduler, change `schedulerName` from `crane-scheduler` to `default-scheduler`.

If the test pod is scheduled successfully, you will see an event like the following:

```bash
Type    Reason     Age   From             Message
----    ------     ----  ----             -------
Normal  Scheduled  28s   crane-scheduler  Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu
```
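
To make the `predicate` and `priority` sections of `policy.yaml` more concrete, the following is a minimal Python sketch of one plausible reading of them. It is not Crane-scheduler's implementation. It assumes that a `predicate` entry rejects a node when the named metric exceeds `maxLimitPecent`, that `priority` entries form a weighted average of the idle fraction of each metric, and that usage values are fractions between 0 and 1. The function names `passes_predicate` and `priority_score` are hypothetical, and `syncPolicy` and `hotValue` are left out. See the Dynamic scheduler plugin page for the actual behavior.

```python
from typing import Dict, List, Tuple

# Simplified copies of the `predicate` and `priority` lists from policy.yaml.
PREDICATE: List[Tuple[str, float]] = [
    ("cpu_usage_avg_5m", 0.65),
    ("cpu_usage_max_avg_1h", 0.75),
    ("mem_usage_avg_5m", 0.65),
    ("mem_usage_max_avg_1h", 0.75),
]
PRIORITY: List[Tuple[str, float]] = [
    ("cpu_usage_avg_5m", 0.2),
    ("cpu_usage_max_avg_1h", 0.3),
    ("cpu_usage_max_avg_1d", 0.5),
    ("mem_usage_avg_5m", 0.2),
    ("mem_usage_max_avg_1h", 0.3),
    ("mem_usage_max_avg_1d", 0.5),
]


def passes_predicate(usage: Dict[str, float]) -> bool:
    """Assumed filter rule: reject the node if any reported metric exceeds its limit."""
    return all(usage[name] <= limit for name, limit in PREDICATE if name in usage)


def priority_score(usage: Dict[str, float]) -> float:
    """Assumed scoring rule: weighted average of idle fractions, scaled to 0..100."""
    total_weight = sum(weight for name, weight in PRIORITY if name in usage)
    if total_weight == 0:
        return 0.0  # no metrics reported for this node
    weighted_idle = sum(
        (1.0 - usage[name]) * weight for name, weight in PRIORITY if name in usage
    )
    return 100.0 * weighted_idle / total_weight


if __name__ == "__main__":
    # Hypothetical node whose six metrics all sit at 20% usage.
    idle_node = {name: 0.2 for name, _ in PRIORITY}
    # Same node, but with a recent CPU spike above the 0.65 limit.
    busy_node = dict(idle_node, cpu_usage_avg_5m=0.9)
    print(passes_predicate(idle_node), priority_score(idle_node))
    print(passes_predicate(busy_node), priority_score(busy_node))
```

Under these assumptions, a lightly loaded node passes the filter and scores high, while a node with a recent CPU spike is filtered out before scoring. The larger weights on the `1d` metrics mean sustained load lowers a node's score more than a short burst does.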