# GPU Number User Guide

## Environment setup

### Install volcano

#### 1. Install from source

Refer to [Install Guide](../../installer/README.md) to install volcano.

> **Note** The Volcano vGPU feature has been transferred to the HAMi project. Click [here](https://github.com/Project-HAMi/volcano-vgpu-device-plugin) to access it.

After installation, update the scheduler configuration:

```shell script
kubectl edit cm -n volcano-system volcano-scheduler-configmap
```

For Volcano versions later than v1.8.2 (v1.8.2 excluded), use the following ConfigMap:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: deviceshare
        arguments:
          deviceshare.GPUNumberEnable: true # enable gpu number
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
```

For Volcano v1.8.2 and earlier (v1.8.2 included), use the following ConfigMap:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
        arguments:
          predicate.GPUNumberEnable: true # enable gpu number
      - name: proportion
      - name: nodeorder
      - name: binpack
```

#### 2. Install from release package

As above, after installation, update the scheduler configuration in the `volcano-scheduler-configmap` ConfigMap.
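
After editing the ConfigMap, it can help to confirm that GPU number is actually switched on. The sketch below is a hypothetical helper, not part of Volcano: it only looks for an uncommented `deviceshare.GPUNumberEnable: true` or `predicate.GPUNumberEnable: true` line in the scheduler configuration text, and it does not verify that the argument is nested under the plugin that matches your Volcano version.

```shell
# Hypothetical helper (not part of Volcano): reads the text of
# volcano-scheduler.conf on stdin and succeeds if it contains an uncommented
# `deviceshare.GPUNumberEnable: true` (releases after v1.8.2) or
# `predicate.GPUNumberEnable: true` (v1.8.2 and earlier) line.
# It does not check which plugin the argument is nested under.
check_gpu_number_enabled() {
  if grep -Eq '^[[:space:]]*(deviceshare|predicate)\.GPUNumberEnable:[[:space:]]*true([[:space:]]|#|$)'; then
    echo "gpu-number enabled"
  else
    echo "gpu-number NOT enabled" >&2
    return 1
  fi
}

# Example (requires access to the cluster); the jsonpath escapes the dot in
# the volcano-scheduler.conf key:
#   kubectl get cm -n volcano-system volcano-scheduler-configmap \
#     -o jsonpath='{.data.volcano-scheduler\.conf}' | check_gpu_number_enabled
```

A plain `grep` keeps the check dependency-free; a stricter version would parse the YAML and confirm the argument sits under the `deviceshare` or `predicates` plugin entry.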

### Install Volcano device plugin

Refer to the [volcano device plugin](https://github.com/volcano-sh/devices/blob/master/README.md#quick-start) quick start.

* To support GPU number, configure the volcano device plugin with `--gpu-strategy=number`. For more information, see [volcano device plugin configuration](https://github.com/volcano-sh/devices/blob/master/doc/config.md).

### Verify environment is ready

Check the node status. The environment is ready if `volcano.sh/gpu-number` is included in the allocatable resources.

```shell script
$ kubectl describe node {node name}
...
Capacity:
  attachable-volumes-gce-pd:  127
  cpu:                        2
  ephemeral-storage:          98868448Ki
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     7632596Ki
  pods:                       110
  volcano.sh/gpu-memory:      0
  volcano.sh/gpu-number:      1
Allocatable:
  attachable-volumes-gce-pd:  127
  cpu:                        1930m
  ephemeral-storage:          47093746742
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     5752532Ki
  pods:                       110
  volcano.sh/gpu-memory:      0
  volcano.sh/gpu-number:      1
```

### Running Jobs With Multiple GPU Cards

Jobs can request multiple exclusive NVIDIA GPU cards by defining the container-level resource requirement `volcano.sh/gpu-number`:

```shell script
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/gpu-number: 1 # requesting 1 GPU card
EOF
```

If the pods above claim multiple GPU cards, you can see that each of them has exclusive GPU cards:

```shell script
$ kubectl exec -ti gpu-pod1 -- env
...
NVIDIA_VISIBLE_DEVICES=0
VOLCANO_GPU_ALLOCATED=1
...
```

### Understanding How the Multiple GPU Cards Requirement Works

The main architecture is similar to the previous one, but the `volcano.sh/gpu-index` result of each pod is a list of GPU card indexes.

1. Create a pod with a `volcano.sh/gpu-number` resource request.

2. The Volcano scheduler runs predicates, allocates GPU cards to the pod, and adds the following annotations:

   ```yaml
   annotations:
     volcano.sh/gpu-index: "0"
     volcano.sh/predicate-time: "1593764466550835304"
   ```

3. The kubelet watches the pods bound to its node and calls the device plugin's Allocate API to set environment variables before running the container:

   ```yaml
   env:
     NVIDIA_VISIBLE_DEVICES: "0" # GPU card index
     VOLCANO_GPU_ALLOCATED: "1" # GPU number allocated
   ```
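
Step 3 implies a simple consistency check inside the container: the number of indexes in `NVIDIA_VISIBLE_DEVICES` should equal `VOLCANO_GPU_ALLOCATED`. The sketch below is a hypothetical helper that checks this from `env` output. It assumes that, for multiple cards, `NVIDIA_VISIBLE_DEVICES` holds a comma-separated list such as `0,1` (the format the NVIDIA container runtime accepts); this guide does not state the separator explicitly.

```shell
# Hypothetical helper (not part of Volcano): reads `env` output (KEY=VALUE
# lines) on stdin and checks that the number of GPU indexes in
# NVIDIA_VISIBLE_DEVICES (assumed comma-separated, e.g. "0,1") equals
# VOLCANO_GPU_ALLOCATED. Prints a summary and returns non-zero on mismatch.
check_gpu_allocation() {
  awk -F= '
    $1 == "NVIDIA_VISIBLE_DEVICES" { devices = $2; seen_dev = 1 }
    $1 == "VOLCANO_GPU_ALLOCATED"  { allocated = $2; seen_alloc = 1 }
    END {
      if (!seen_dev || !seen_alloc) { print "missing GPU variables"; exit 1 }
      n = split(devices, idx, ",")   # count the listed GPU indexes
      if (n != allocated + 0) {
        printf "mismatch: %d index(es) in %s, %s allocated\n", n, devices, allocated
        exit 1
      }
      printf "ok: %d GPU(s) [%s]\n", n, devices
    }'
}

# Example (requires access to the cluster). Omit -ti so the output is not
# passed through a TTY, which would add carriage returns to each line:
#   kubectl exec gpu-pod1 -- env | check_gpu_allocation
```

For the single-card pod above, the helper would print `ok: 1 GPU(s) [0]`.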