volcano.sh/volcano@v1.9.0/docs/design/proportional.md

## Background

The Volcano scheduler handles jobs that require different types of resources, such as GPU, CPU, and memory. In some circumstances we may designate a 'primary' resource (e.g., GPU in deep learning) and reserve the associated 'secondary' resources according to a preset proportion. This plugin works in the predicate phase: it ensures that a node's idle resources still satisfy the proportion after jobs requiring secondary resources are scheduled.

## Scenario of default scheduler

Consider a node with 74 CPUs, 8 GPUs, and 128G memory. Since no job has been submitted yet, NodeAllocatable is equal to NodeIdle.

Node | NodeAllocatable | NodeIdle
---|---|---
nodeC0-0 | cpu 74, memory 128G, nvidia.com/gpu 8 | cpu 74, memory 128G, nvidia.com/gpu 8

Then two jobs, each requiring 8 CPUs and 8G memory, are submitted and scheduled to the node. The resource status is as follows:

Job | Pod | Resource | Node | NodeAllocatable | NodeIdle
---|---|---|---|---|---
default/single-1000-0 | single-1000-0 | cpu 8, memory 8G, nvidia.com/gpu 0 | nodeC0-0 | cpu 74, memory 128G, nvidia.com/gpu 8 | cpu 66, memory 120G, nvidia.com/gpu 8
default/single-1000-1 | single-1000-1 | cpu 8, memory 8G, nvidia.com/gpu 0 | nodeC0-0 | cpu 66, memory 120G, nvidia.com/gpu 8 | cpu 58, memory 112G, nvidia.com/gpu 8

If we take GPU as the primary resource and want each GPU 'bound' to 8 CPUs, the remaining 58 CPUs are insufficient for the 8 GPUs, which need 64 CPUs in total. The proportion plugin is designed to solve this problem.

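The shortfall can be checked with a little arithmetic. The following is a minimal sketch in Python, not Volcano code; the helper name `backable_gpus` is hypothetical. It computes how many idle GPUs can still be paired with their share of CPUs.

```python
def backable_gpus(idle_cpu: int, idle_gpu: int, cpu_per_gpu: int) -> int:
    """Return how many idle GPUs can still get cpu_per_gpu CPUs each."""
    return min(idle_gpu, idle_cpu // cpu_per_gpu)


# Numbers from the scenario above: 58 idle CPUs, 8 idle GPUs, 8 CPUs per GPU.
print(backable_gpus(idle_cpu=58, idle_gpu=8, cpu_per_gpu=8))  # 7
print(8 * 8 - 58)  # 6 CPUs are missing to serve all eight GPUs
```

With the default scheduler, one of the eight GPUs can no longer get its 8 CPUs.
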
## Scenario with proportion plugin

![](./images/proportional-diagram.png)

First, set the proportion binding in volcano-scheduler.conf:

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: predicates
    arguments:
      predicate.ProportionalEnable: true
      predicate.resources: nvidia.com/gpu,nvidia.com/v100-sxm2-16gb
      predicate.resources.nvidia.com/gpu.cpu: 8
      predicate.resources.nvidia.com/gpu.memory: 8
      predicate.resources.nvidia.com/v100-sxm2-16gb.cpu: 16
      predicate.resources.nvidia.com/v100-sxm2-16gb.memory: 16
```

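To make the key layout of these arguments explicit, here is a minimal Python sketch of how they could be collected into per-resource ratios. It is an illustration only, not the plugin's actual parsing code: the function name `parse_proportions` and the returned dictionary shape are assumptions, and the sketch expects both the `.cpu` and `.memory` keys to be present for every listed resource.

```python
from typing import Dict, Mapping, Union

ArgValue = Union[str, int, float, bool]


def parse_proportions(arguments: Mapping[str, ArgValue]) -> Dict[str, Dict[str, float]]:
    """Collect {primary resource: {"cpu": ratio, "memory": ratio}} (hypothetical shape)."""
    if not arguments.get("predicate.ProportionalEnable", False):
        return {}
    # predicate.resources is a comma-separated list of primary resource names.
    names = str(arguments.get("predicate.resources", "")).split(",")
    proportions: Dict[str, Dict[str, float]] = {}
    for name in (n.strip() for n in names):
        if not name:
            continue
        prefix = f"predicate.resources.{name}."
        proportions[name] = {
            "cpu": float(arguments[prefix + "cpu"]),
            "memory": float(arguments[prefix + "memory"]),
        }
    return proportions


arguments = {
    "predicate.ProportionalEnable": True,
    "predicate.resources": "nvidia.com/gpu,nvidia.com/v100-sxm2-16gb",
    "predicate.resources.nvidia.com/gpu.cpu": 8,
    "predicate.resources.nvidia.com/gpu.memory": 8,
    "predicate.resources.nvidia.com/v100-sxm2-16gb.cpu": 16,
    "predicate.resources.nvidia.com/v100-sxm2-16gb.memory": 16,
}
print(parse_proportions(arguments)["nvidia.com/gpu"])  # {'cpu': 8.0, 'memory': 8.0}
```
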
The proportion is GPU:CPU:MEMORY = 1:8:8. Take the same test scenario as above:

Node | NodeAllocatable | NodeIdle
---|---|---
nodeC0-0 | cpu 74, memory 128G, nvidia.com/gpu 8 | cpu 74, memory 128G, nvidia.com/gpu 8

Job | Pod | Resource | Node | NodeAllocatable | NodeIdle
---|---|---|---|---|---
default/single-1000-0 | single-1000-0 | cpu 8, memory 8G, nvidia.com/gpu 0 | nodeC0-0 | cpu 74, memory 128G, nvidia.com/gpu 8 | cpu 66, memory 120G, nvidia.com/gpu 8
default/single-1000-1 | single-1000-1 | cpu 8, memory 8G, nvidia.com/gpu 0 | - | - | -

After job single-1000-0 is scheduled, the idle resources are 8 GPUs, 66 CPUs, and 120G memory. During the predicate phase, the plugin calculates the resources that would be left if job single-1000-1 were scheduled, and rejects the node when `node.Idle.CPU - task.Resreq.CPU < node.Idle.GPU * cpuRatio || node.Idle.Memory - task.Resreq.Memory < node.Idle.GPU * memoryRatio`. Here the result would be 8 GPUs, 58 CPUs, and 112G memory, which does not satisfy the 1:8:8 proportion because 8 idle GPUs require 64 CPUs. Therefore nodeC0-0 is removed from predicateNodes, and NodeIdle remains 8 GPUs, 66 CPUs, and 120G memory.

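The check described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the plugin's Go implementation: the `Resources` type and `fits_proportion` function are hypothetical, CPU is counted in cores and memory in G, and only tasks that request no GPU are considered.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu: float      # CPU cores
    memory: float   # memory in G
    gpu: float = 0  # amount of the primary resource


def fits_proportion(idle: Resources, request: Resources,
                    cpu_ratio: float, memory_ratio: float) -> bool:
    """Return True if the node keeps enough CPU and memory for its idle GPUs."""
    cpu_left = idle.cpu - request.cpu
    memory_left = idle.memory - request.memory
    # Secondary resources that must stay free for the idle GPUs.
    cpu_reserved = idle.gpu * cpu_ratio
    memory_reserved = idle.gpu * memory_ratio
    return cpu_left >= cpu_reserved and memory_left >= memory_reserved


task = Resources(cpu=8, memory=8)

# single-1000-0 on the empty node: 66 CPUs and 120G would remain, 64 CPUs and 64G are reserved.
print(fits_proportion(Resources(74, 128, 8), task, 8, 8))  # True

# single-1000-1 afterwards: only 58 CPUs would remain, fewer than the 64 reserved.
print(fits_proportion(Resources(66, 120, 8), task, 8, 8))  # False
```
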