# multi-scheduling

## Background

A cluster may run multiple schedulers for different workloads, e.g. the default scheduler for system daemons and the volcano scheduler for business workloads.
When the schedulers work in parallel, resource conflicts can easily occur.

## Motivation

- Classify Pods into different resource groups, where each resource group specifies its own scheduler.
- There are no resource conflicts when multiple schedulers work in parallel.

## Design Action

To divide the cluster resources, users need to set taints and labels on the cluster nodes and set the corresponding tolerations and nodeSelector on the pods. Modifying every YAML file by hand takes a lot of effort.
To reduce user operations, volcano handles the pod side automatically: a new MutatingAdmissionWebhook in the volcano admission deployment patches the pods.

### New configmap (volcano-admission-configmap)

The configmap defines resource groups. Each resource group contains:
- The identification information of the pod object. Volcano supports two fields:
  - Namespace field
  - Annotation field
- The pod data that volcano needs to patch. Volcano supports patching the following fields. All of them are optional, and users can set them according to the application scenario.
  - Tolerations
  - Affinity
  - NodeSelector
  - SchedulerName

If the object field is not set, it is filled with the following default:
````
- resourceGroup: XXX
  object:
    key: annotation                                # set the field and the value to be matched
    value:
    - "volcano.sh/resource-group: XXX"             # XXX is the value in resourceGroup field
````

For example:
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: management                    # set the resource group name
      object:
        key: namespace                             # set the field and the value to be matched
        value:
        - mng-ns-1
      schedulerName: default-scheduler             # set the scheduler for patching
      tolerations:                                 # set the tolerations for patching
      - effect: NoSchedule
        key: taint
        operator: Exists
      labels:
        volcano.sh/nodetype: management            # set the nodeSelector for patching
    - resourceGroup: cpu
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group: cpu"
      schedulerName: volcano
      labels:
        volcano.sh/nodetype: cpu
    - resourceGroup: gpu                           # if the object is not set, the default applies: the key is annotation,
      schedulerName: volcano                       # the annotation key is fixed to "volcano.sh/resource-group", and its value is the resourceGroup field
      labels:
        volcano.sh/nodetype: gpu
    - resourceGroup: fixed
      schedulerName: volcano
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group-job-role: master"
      # set the affinity for patching, the format is a json string.
      affinity: "{\"nodeAffinity\":{\"requiredDuringSchedulingIgnoredDuringExecution\":{\"nodeSelectorTerms\":[{\"matchExpressions\":[{\"key\":\"volcano.sh/nodetype\",\"operator\":\"In\",\"values\":[\"fixed\"]}]}]}}}"
````

### The pod mutate process



If the pod matches several resource groups, volcano will match the resource groups in order.


## Usage
### case 1

The default scheduler handles system daemons, and the volcano scheduler handles business workloads.

The cluster is as follows:

|node| label | taint|
|----|-----|-----|
|node1| volcano.sh/nodetype: management| management-taint:NoSchedule|
|node2| none| none|

|pod | kind | namespace|
|----|----|----|
|deployment-A|deployment| mng-ns-1|
|volcano-job-B|volcano job | default|

1. Edit volcano-admission-configmap
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: management                    # set the resource group name
      object:
        key: namespace                             # set the field and the value to be matched
        value:
        - mng-ns-1
      schedulerName: default-scheduler             # set the scheduler for patching
      tolerations:                                 # set the tolerations for patching
      - effect: NoSchedule
        key: management-taint
        operator: Exists
      labels:
        volcano.sh/nodetype: management            # set the nodeSelector for patching
````

2. Submit deployment-A and volcano-job-B

3. Check the Pod information

````
deployment-A:
  ....
  nodeSelector:
    volcano.sh/nodetype: management
  ...
  schedulerName: default-scheduler
  ...
  tolerations:
  - effect: NoSchedule
    key: management-taint
    operator: Exists

volcano-job-B:
  ....
  <none>
  ....
````
4. Check the result of the pod's scheduling
````
The pod in deployment-A is scheduled to node1.
The pod in volcano-job-B is scheduled to node2.
````

### case 2

The cluster is as follows:

|node|label|
|----|-----|
|node1| volcano.sh/nodetype: cpu|
|node2| volcano.sh/nodetype: gpu|

|volcano job | annotation|
|----|----|
|job-A|volcano.sh/resource-group: cpu|
|job-B|volcano.sh/resource-group: gpu|

1. Edit volcano-admission-configmap
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: cpu
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group: cpu"
      schedulerName: volcano
      labels:
        volcano.sh/nodetype: cpu
    - resourceGroup: gpu
      schedulerName: volcano
      labels:
        volcano.sh/nodetype: gpu
````
2. Submit job-A and job-B

3. Check the Pod information

````
job-A:
  ....
  nodeSelector:
    volcano.sh/nodetype: cpu
  ...
  schedulerName: volcano
  ....

job-B:
  ....
  nodeSelector:
    volcano.sh/nodetype: gpu
  ...
  schedulerName: volcano
  ....
````
4. Check the result of the pod's scheduling
````
The pod in job-A is scheduled to node1.
The pod in job-B is scheduled to node2.
````

### case 3

The cluster is as follows:

|node|label|
|----|-----|
|node1| volcano.sh/nodetype: fixed|
|node2| none|

|volcano job | annotation|
|----|----|
|job-A|volcano.sh/resource-group-job-role: master|
|job-B|none|

1. Edit volcano-admission-configmap
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: fixed
      schedulerName: volcano
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group-job-role: master"
      # set the affinity for patching, the format is a json string.
      affinity: "{\"nodeAffinity\":{\"requiredDuringSchedulingIgnoredDuringExecution\":{\"nodeSelectorTerms\":[{\"matchExpressions\":[{\"key\":\"volcano.sh/nodetype\",\"operator\":\"In\",\"values\":[\"fixed\"]}]}]}}}"
````
2. Submit job-A and job-B

3. Check the Pod information

````
job-A:
  ....
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: volcano.sh/nodetype
            operator: In
            values:
            - fixed
  ...
  schedulerName: volcano
  ....

job-B:
  ....
  ...
  schedulerName: volcano
  ....
````
4. Check the result of the pod's scheduling
````
The pod in job-A is scheduled to node1.
The pod in job-B is scheduled to node1 or node2.
````

## NOTE

Enabling this feature may modify pod information and affect resource utilization.
- The feature divides the cluster resources and may decrease resource usage.
- The feature adds some additional information to the pods, such as tolerations and nodeSelector data.
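
To make the matching and patching rules from "The pod mutate process" concrete, here is a minimal Python sketch. It is not volcano's webhook code; it only models the behaviour described in this document. The function names `group_matches` and `mutate_pod` and the plain-dict shapes for pods and resource groups are hypothetical. It also assumes that the first matching group in configured order wins, and that tolerations and nodeSelector entries are merged into whatever the pod already has.

```python
import copy
import json

# Annotation key used when a resource group leaves "object" unset.
DEFAULT_ANNOTATION_KEY = "volcano.sh/resource-group"


def group_matches(pod, group):
    """Return True if the pod belongs to the resource group."""
    obj = group.get("object")
    if obj is None:
        # Default from this document: match the annotation
        # "volcano.sh/resource-group: <resourceGroup>".
        obj = {
            "key": "annotation",
            "value": [f"{DEFAULT_ANNOTATION_KEY}: {group['resourceGroup']}"],
        }
    if obj["key"] == "namespace":
        return pod["metadata"].get("namespace") in obj["value"]
    if obj["key"] == "annotation":
        annotations = pod["metadata"].get("annotations", {})
        for entry in obj["value"]:
            # Each entry has the form "<annotation key>: <annotation value>".
            key, _, value = entry.partition(":")
            if annotations.get(key.strip()) == value.strip():
                return True
    return False


def mutate_pod(pod, groups):
    """Return a patched copy of the pod; the input pod is left untouched."""
    patched = copy.deepcopy(pod)
    for group in groups:  # assumption: first match in configured order wins
        if not group_matches(pod, group):
            continue
        spec = patched.setdefault("spec", {})
        if "schedulerName" in group:
            spec["schedulerName"] = group["schedulerName"]
        if "labels" in group:
            # "labels" in the config becomes the pod's nodeSelector.
            spec.setdefault("nodeSelector", {}).update(group["labels"])
        if "tolerations" in group:
            spec.setdefault("tolerations", []).extend(
                copy.deepcopy(group["tolerations"])
            )
        if "affinity" in group:
            # The config stores affinity as a JSON string.
            spec["affinity"] = json.loads(group["affinity"])
        break
    return patched


# Resource groups mirroring case 1 and the "gpu" group of case 2.
groups = [
    {
        "resourceGroup": "management",
        "object": {"key": "namespace", "value": ["mng-ns-1"]},
        "schedulerName": "default-scheduler",
        "tolerations": [
            {"effect": "NoSchedule", "key": "management-taint", "operator": "Exists"}
        ],
        "labels": {"volcano.sh/nodetype": "management"},
    },
    {
        "resourceGroup": "gpu",  # no "object": the default annotation applies
        "schedulerName": "volcano",
        "labels": {"volcano.sh/nodetype": "gpu"},
    },
]

gpu_pod = {
    "metadata": {
        "namespace": "default",
        "annotations": {"volcano.sh/resource-group": "gpu"},
    },
    "spec": {},
}
print(mutate_pod(gpu_pod, groups)["spec"])
```

A pod in namespace `mng-ns-1` would pick up the `management` group's scheduler, toleration, and nodeSelector, while a pod that matches no group is returned unchanged, which corresponds to `volcano-job-B` in case 1.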