# Nodegroup Plugin User Guide

## Introduction

The **Nodegroup plugin** isolates resources by assigning labels to nodes and setting node label affinity on a Queue.

## Usage

### Assign a label to a node
Assign a label to the node. The label key is `volcano.sh/nodegroup-name`.
```shell script
kubectl label nodes <nodename> volcano.sh/nodegroup-name=<groupname>
```

### Configure the queue
Create a queue and bind node groups to it.

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  reclaimable: true
  weight: 1
  affinity:            # added field
    nodeGroupAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - <groupname>
      preferredDuringSchedulingIgnoredDuringExecution:
      - <groupname>
    nodeGroupAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - <groupname>
      preferredDuringSchedulingIgnoredDuringExecution:
      - <groupname>
```
### Submit a vcjob

Submit the vcjob job-1 to the default queue.

```shell script
$ cat <<EOF | kubectl apply -f -
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-1
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: nginx
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 1
                limits:
                  cpu: 1
          restartPolicy: Never
EOF
```

### Validate that the queue affinity and anti-affinity rules take effect

Query pod information and verify whether the pod has been scheduled on the correct node.
The pod should be scheduled on nodes whose label matches `nodeGroupAffinity.requiredDuringSchedulingIgnoredDuringExecution` or `nodeGroupAffinity.preferredDuringSchedulingIgnoredDuringExecution`. Otherwise, the pod should be scheduled on nodes whose label matches `nodeGroupAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution`. The pod must not be scheduled on nodes whose label matches `nodeGroupAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution`.

```shell script
kubectl get po job-1-nginx-0 -o wide
```

## How the Nodegroup Plugin Works

The nodegroup design document provides the most detailed information about node groups. The following tips help avoid common issues. They are based on a four-node cluster and a vcjob called job-1:

| Node | Label |
| ------ | ------ |
| node1 | groupname1 |
| node2 | groupname2 |
| node3 | groupname3 |
| node4 | groupname4 |

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-1
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: nginx
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 1
                limits:
                  cpu: 1
          restartPolicy: Never
```

1. Soft constraints are a subset of hard constraints, including both affinity and anti-affinity.
Consider a queue configured as follows:
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  reclaimable: true
  weight: 1
  affinity:            # added field
    nodeGroupAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname1
      - groupname2
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname1
    nodeGroupAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname3
      - groupname4
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname3
```
This implies that tasks in the "default" queue will be scheduled on "groupname1" and "groupname2", with a preference for "groupname1". Tasks are restricted from running on "groupname3" and "groupname4". However, if the resources in other node groups are insufficient, the task can run on "groupname3".

2. If soft constraints do not form a subset of hard constraints, the queue configuration is incorrect, leading to tasks running on "groupname2":
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  reclaimable: true
  weight: 1
  affinity:            # added field
    nodeGroupAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname2
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname1
    nodeGroupAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname4
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname3
```

3. If there is a conflict between nodeGroupAffinity and nodeGroupAntiAffinity, nodeGroupAntiAffinity takes higher priority.
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  reclaimable: true
  weight: 1
  affinity:            # added field
    nodeGroupAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname1
      - groupname2
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname1
    nodeGroupAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname1
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname2
```
This implies that tasks in the "default" queue can only run on "groupname2".

4. Generally, tasks run on "groupname1" first because it is listed as a soft constraint. However, the final node score combines the scores of several plugins, so the task may sometimes run on "groupname2".
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: default
spec:
  reclaimable: true
  weight: 1
  affinity:            # added field
    nodeGroupAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname1
      - groupname2
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname1
    nodeGroupAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - groupname3
      - groupname4
      preferredDuringSchedulingIgnoredDuringExecution:
      - groupname3
```
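
Tips 1 and 2 rely on every soft (preferred) group also being listed as a hard (required) group. The sketch below shows one way this could be checked before a Queue is applied. It is not part of Volcano: the helper name `soft_outside_hard` is hypothetical, and it assumes the `spec.affinity` section of the Queue has already been loaded into a plain Python dictionary.

```python
# Hypothetical pre-flight check, not a Volcano API: report preferred (soft)
# groups that are missing from the required (hard) list of the same rule.
REQUIRED = "requiredDuringSchedulingIgnoredDuringExecution"
PREFERRED = "preferredDuringSchedulingIgnoredDuringExecution"


def soft_outside_hard(affinity):
    """Return {rule name: [preferred groups not present in the required list]}."""
    problems = {}
    for rule in ("nodeGroupAffinity", "nodeGroupAntiAffinity"):
        spec = affinity.get(rule) or {}
        required = set(spec.get(REQUIRED) or [])
        preferred = spec.get(PREFERRED) or []
        extra = [group for group in preferred if group not in required]
        if extra:
            problems[rule] = extra
    return problems


# The affinity section of the queue from tip 1 (soft is a subset of hard).
tip1 = {
    "nodeGroupAffinity": {
        REQUIRED: ["groupname1", "groupname2"],
        PREFERRED: ["groupname1"],
    },
    "nodeGroupAntiAffinity": {
        REQUIRED: ["groupname3", "groupname4"],
        PREFERRED: ["groupname3"],
    },
}

# The affinity section of the queue from tip 2 (soft is not a subset of hard).
tip2 = {
    "nodeGroupAffinity": {
        REQUIRED: ["groupname2"],
        PREFERRED: ["groupname1"],
    },
    "nodeGroupAntiAffinity": {
        REQUIRED: ["groupname4"],
        PREFERRED: ["groupname3"],
    },
}

print(soft_outside_hard(tip1))  # {}
print(soft_outside_hard(tip2))
# {'nodeGroupAffinity': ['groupname1'], 'nodeGroupAntiAffinity': ['groupname3']}
```

An empty result means the queue follows the subset rule from tip 1. A non-empty result names the groups that make the configuration incorrect in the sense of tip 2. The check covers only this subset rule. It does not model how the scheduler filters or scores nodes.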