volcano.sh/volcano@v1.9.0/docs/user-guide/how_to_use_gpu_number.md

# GPU Number User Guide

## Environment setup

### Install Volcano

#### 1. Install from source

Refer to the [Install Guide](../../installer/README.md) to install Volcano.

> **Note** The Volcano vGPU feature has been transferred to the HAMi project. See [volcano-vgpu-device-plugin](https://github.com/Project-HAMi/volcano-vgpu-device-plugin) for details.

After installation, update the scheduler configuration:

```shell script
kubectl edit cm -n volcano-system volcano-scheduler-configmap
```

For Volcano versions later than v1.8.2 (v1.8.2 excluded), use the following ConfigMap:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: deviceshare
        arguments:
          deviceshare.GPUNumberEnable: true # enable gpu number
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
```

For Volcano v1.8.2 and earlier (v1.8.2 included), use the following ConfigMap:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: drf
      - name: predicates
        arguments:
          predicate.GPUNumberEnable: true # enable gpu number
      - name: proportion
      - name: nodeorder
      - name: binpack
```
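
To confirm that the edit was saved, the stored configuration can be read back from the cluster. The following is a minimal sketch, assuming the ConfigMap name and namespace shown above; the helper name `gpu_number_enabled` is hypothetical and not part of Volcano.

```shell
# Hypothetical helper: succeeds if the stored scheduler configuration enables
# GPU number under either key (deviceshare.* for versions later than v1.8.2,
# predicate.* for v1.8.2 and earlier).
gpu_number_enabled() {
  kubectl get cm -n volcano-system volcano-scheduler-configmap \
    -o jsonpath='{.data.volcano-scheduler\.conf}' |
    grep -Eq '(deviceshare|predicate)\.GPUNumberEnable:[[:space:]]*true'
}

# Usage (requires cluster access):
#   gpu_number_enabled && echo "GPU number is enabled"
```

The check only inspects the ConfigMap contents; it does not tell you whether the running scheduler has already picked up the change.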

#### 2. Install from release package

As above, after installation, update the scheduler configuration in the `volcano-scheduler-configmap` ConfigMap.

### Install Volcano device plugin

Please refer to the [Volcano device plugin](https://github.com/volcano-sh/devices/blob/master/README.md#quick-start) quick start.

* To support gpu-number, configure the Volcano device plugin with `--gpu-strategy=number`. For more information, see [volcano device plugin configuration](https://github.com/volcano-sh/devices/blob/master/doc/config.md).

### Verify environment is ready

Check the node status. The environment is ready if `volcano.sh/gpu-number` is included in the allocatable resources.

```shell script
$ kubectl describe node {node name}
...
Capacity:
  attachable-volumes-gce-pd:  127
  cpu:                        2
  ephemeral-storage:          98868448Ki
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     7632596Ki
  pods:                       110
  volcano.sh/gpu-memory:      0
  volcano.sh/gpu-number:      1
Allocatable:
  attachable-volumes-gce-pd:  127
  cpu:                        1930m
  ephemeral-storage:          47093746742
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     5752532Ki
  pods:                       110
  volcano.sh/gpu-memory:      0
  volcano.sh/gpu-number:      1
```
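
For scripted checks, the single value can be extracted with a JSONPath query instead of reading the full node output. This is a sketch only; the helper name `node_gpu_number` is hypothetical.

```shell
# Hypothetical helper: print a node's allocatable volcano.sh/gpu-number.
# The dot in the resource name is escaped so JSONPath treats it as one key.
node_gpu_number() {
  kubectl get node "$1" -o jsonpath='{.status.allocatable.volcano\.sh/gpu-number}'
}

# Usage (requires cluster access):
#   node_gpu_number {node name}
```

An empty result suggests the device plugin has not registered the resource on that node.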

### Running Jobs With Multiple GPU Cards

Jobs can request multiple exclusive NVIDIA GPU cards by defining the container-level resource requirement `volcano.sh/gpu-number`:

```shell script
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/gpu-number: 1 # requesting 1 GPU card
EOF
```
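
The example above requests a single card. To try a request for more than one card, the same manifest can be generated with a different count. The sketch below is an illustration under stated assumptions: the helper `render_gpu_pod` and the pod name `gpu-pod2` are hypothetical, and the `schedulerName: volcano` line assumes the Volcano scheduler is deployed under that name (the original example does not set it).

```shell
# Hypothetical helper: print a Pod manifest that requests whole GPU cards.
# Arguments: $1 = pod name, $2 = number of cards.
render_gpu_pod() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  schedulerName: volcano # assumption: the Volcano scheduler runs under this name
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/gpu-number: $2 # requesting $2 GPU card(s)
EOF
}

# Usage (requires cluster access):
#   render_gpu_pod gpu-pod2 2 | kubectl apply -f -
```

The pod can only be scheduled if some node reports at least that many cards under allocatable `volcano.sh/gpu-number`.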

When pods like the one above claim GPU cards, each pod is given exclusive cards, which you can see in its environment:

```shell script
$ kubectl exec -ti gpu-pod1 -- env
...
NVIDIA_VISIBLE_DEVICES=0
VOLCANO_GPU_ALLOCATED=1
...
```
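
The two variables can also be compared automatically. The following sketch checks that the number of visible device indexes matches the allocated count. It assumes that multiple indexes in `NVIDIA_VISIBLE_DEVICES` are comma-separated (for example `0,1`); the helper name `check_gpu_env` is hypothetical.

```shell
# Hypothetical helper: read `env` output on stdin and succeed only if the number
# of indexes in NVIDIA_VISIBLE_DEVICES equals VOLCANO_GPU_ALLOCATED.
# Assumption: multiple indexes are comma-separated, e.g. "0,1".
check_gpu_env() {
  awk -F= '
    $1 == "NVIDIA_VISIBLE_DEVICES" { visible = split($2, idx, ",") }
    $1 == "VOLCANO_GPU_ALLOCATED"  { allocated = $2 + 0 }
    END { exit !(visible > 0 && visible == allocated) }
  '
}

# Usage (requires cluster access):
#   kubectl exec gpu-pod1 -- env | check_gpu_env && echo "GPU env is consistent"
```
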

### Understanding How Multiple GPU Cards Requirement Works

The main architecture is similar to the previous one, but the gpu-index result for each pod is a list of GPU card indexes.

![gpu_number](../images/gpu-number.png)

1. Create a pod with a `volcano.sh/gpu-number` resource request.

2. The Volcano scheduler runs its predicates, allocates GPU cards to the pod, and adds the following annotations:

```yaml
annotations:
  volcano.sh/gpu-index: "0"
  volcano.sh/predicate-time: "1593764466550835304"
```

3. The kubelet watches the pods bound to its node and calls the allocate API to set environment variables before running the container.

```yaml
env:
  NVIDIA_VISIBLE_DEVICES: "0" # GPU card index
  VOLCANO_GPU_ALLOCATED: "1" # GPU number allocated
```
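
The annotations written in step 2 can be read back from a scheduled pod to see which cards were assigned. This is a minimal sketch; the helper name `pod_volcano_annotation` is hypothetical, and the annotation keys are the ones listed above.

```shell
# Hypothetical helper: print one volcano.sh/<key> annotation of a pod.
# Arguments: $1 = pod name, $2 = annotation key suffix (e.g. gpu-index).
# The dot in "volcano.sh" is escaped so JSONPath treats it as part of the key.
pod_volcano_annotation() {
  kubectl get pod "$1" -o jsonpath="{.metadata.annotations.volcano\.sh/$2}"
}

# Usage (requires cluster access):
#   pod_volcano_annotation gpu-pod1 gpu-index
#   pod_volcano_annotation gpu-pod1 predicate-time
```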
   160  env:
   161    NVIDIA_VISIBLE_DEVICES: “0” # GPU card index
   162    VOLCANO_GPU_ALLOCATED: “1” # GPU number allocated
   163  ```