## Introduction

This feature lets Volcano schedule workloads according to a `[min, max]` configuration, which improves resource utilization and shortens the execution time of training jobs.

For example, suppose a K8s cluster has 10 GPUs and we use Volcano to schedule training jobs (tfjob/pytorchjob/vcjob) in two queues, queue1 and queue2:

| queue  | weight | reclaimable | deserved GPUs |
|--------|--------|-------------|---------------|
| queue1 | 1      | false       | 5             |
| queue2 | 1      | false       | 5             |

![](images/elastic-scheduler-job1-1.png)

Suppose job1-1 is running in queue1. We treat pod6 to pod10 as elastic pods (elastic resources), which can be preempted when queue1 runs short of resources. Elastic pods have the lowest priority. Specifically, **these pods are created last and preempted first**:
1. Elastic pods can be created only when there are free resources.
2. Elastic pods are preempted if there are not enough resources to run the minAvailable pods.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job1-1
spec:
  minAvailable: 5     # min
  queue: queue1
  tasks:
    - replicas: 10    # max
      name: job1-1
      template:
        metadata:
          name: job1-1
        spec:
          containers:
            - image: train_script
              name: xx
              resources:
                limits:
                  cpu: 1
                  nvidia.com/gpu: 1
```
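
The `[min, max]` split in this manifest can be made concrete with a short sketch. The Python below is only an illustration, not Volcano code: `split_pods` and the `pod1`..`podN` naming are hypothetical, and it assumes the first `minAvailable` pods form the guaranteed set and the remaining `replicas - minAvailable` pods are elastic.

```python
from __future__ import annotations


def split_pods(min_available: int, replicas: int) -> tuple[list[str], list[str]]:
    """Split a job's pods into guaranteed (min) pods and elastic pods.

    Hypothetical helper for illustration only; the pod names are made up
    and do not follow any Volcano naming scheme.
    """
    if not 0 <= min_available <= replicas:
        raise ValueError("expected 0 <= minAvailable <= replicas")
    pods = [f"pod{i}" for i in range(1, replicas + 1)]
    # The first minAvailable pods are guaranteed; the rest are elastic.
    return pods[:min_available], pods[min_available:]


guaranteed, elastic = split_pods(min_available=5, replicas=10)
print(guaranteed)  # ['pod1', 'pod2', 'pod3', 'pod4', 'pod5']
print(elastic)     # ['pod6', 'pod7', 'pod8', 'pod9', 'pod10']
```

With `minAvailable: 5` and `replicas: 10`, the sketch yields five guaranteed pods and five elastic pods, matching the pod6 to pod10 example.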

In detail, elastic scheduling follows these principles:
1. If job1-1 and job1-2 are submitted at the same time, the `minAvailable` pods of both jobs are created first. The elastic pods of job1-1 and job1-2 are created afterwards, if there are extra resources.
   ![](images/elastic-scheduler-job1-1-2.png)
2. If job1-1 is submitted first and job1-2 is then submitted to queue1, the elastic pods of job1-1 are preempted.
   ![](images/elastic-scheduler-job1-2.png)
3. If job1-1 is submitted first and job2-1 is then submitted to queue2, the elastic pods of job1-1 are preempted.
   ![](images/elastic-scheduler-job2-1.png)
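
The "min first, elastic afterwards" ordering can be sketched as a two-phase allocation. This is a simplified model under stated assumptions, not the scheduler's implementation: every pod needs exactly one GPU, queue shares are ignored, and leftover GPUs go to elastic pods in submission order (the text does not specify how extra resources are divided between jobs). `Job` and `allocate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_available: int  # "min": pods that must run together
    replicas: int       # "max": min pods plus elastic pods


def allocate(jobs: list[Job], total_gpus: int) -> dict[str, dict[str, int]]:
    """Two-phase allocation sketch, one GPU per pod.

    Phase 1 admits each job only if all of its minAvailable pods fit.
    Phase 2 gives leftover GPUs to the elastic pods of admitted jobs,
    in submission order (an assumed policy).
    """
    free = total_gpus
    result: dict[str, dict[str, int]] = {}
    admitted: list[Job] = []
    for job in jobs:  # phase 1: minAvailable pods
        if job.min_available <= free:
            free -= job.min_available
            admitted.append(job)
            result[job.name] = {"min": job.min_available, "elastic": 0}
        else:
            result[job.name] = {"min": 0, "elastic": 0}  # job stays pending
    for job in admitted:  # phase 2: elastic pods
        extra = min(job.replicas - job.min_available, free)
        free -= extra
        result[job.name]["elastic"] = extra
    return result


# One job alone on a 10-GPU cluster also runs its 5 elastic pods.
print(allocate([Job("job1-1", 5, 10)], total_gpus=10))
# Two such jobs together: both minimums fit, no GPUs are left for elastic pods.
print(allocate([Job("job1-1", 5, 10), Job("job1-2", 5, 10)], total_gpus=10))
```

The second call also describes the end state of principles 2 and 3 in this model: once the later job's minimum is satisfied, job1-1 is back to its five `minAvailable` pods.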

## Design

1. Enqueue action
- Modify the job enqueue logic. Because elastic pods can be preempted at any time, elastic resources count as free resources in a queue. We therefore update `jobEnqueueableFns` in the `overcommit` and `proportion` plugins. Note that if the total elastic resources cannot meet the new job's minRequest, the new job still stays pending.
  ![](images/elastic-scheduler-job1-3.png)

2. Allocate action (already implemented)
- All pods are created initially (by the controller/operator), but the minAvailable pods are scheduled first; elastic pods are scheduled afterwards if there are free resources.

3. Preempt action in queue scope
- Preempt elastic pods if there is a starving job in the same queue (already implemented).
- There is no need to preempt elastic pods if the total elastic resources cannot meet the starving job's minRequest.

4. Reclaim action in cluster scope
- If a queue is overused, reclaim its elastic resources regardless of whether the queue's `reclaimable` field is true or false.
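
The enqueue and preempt rules above can be read as two small checks. The sketch below is a hedged illustration rather than the plugin code: `can_enqueue` and `pods_to_preempt` are hypothetical helpers, resources are counted as whole GPUs with one GPU per pod, "free" is assumed to mean idle plus elastic resources, and the eviction order among elastic pods is an assumption.

```python
from __future__ import annotations


def can_enqueue(min_request: int, idle: int, elastic_in_use: int) -> bool:
    """Enqueue check: elastic resources count as free resources.

    Assumes the new job may enqueue when idle plus elastic resources
    cover its minRequest; otherwise it stays pending.
    """
    return min_request <= idle + elastic_in_use


def pods_to_preempt(min_request: int, idle: int, elastic_pods: list[str]) -> list[str]:
    """Pick elastic pods to evict for a starving job (one GPU per pod).

    Returns an empty list when no eviction is needed, or when evicting
    every elastic pod would still not meet minRequest.
    """
    shortfall = min_request - idle
    if shortfall <= 0:
        return []  # idle resources already cover the request
    if shortfall > len(elastic_pods):
        return []  # preemption cannot help, so do not evict anything
    # Assumed order: evict the most recently listed elastic pods first.
    return elastic_pods[-shortfall:]


elastic = ["pod6", "pod7", "pod8", "pod9", "pod10"]
print(can_enqueue(min_request=5, idle=0, elastic_in_use=len(elastic)))  # True
print(pods_to_preempt(min_request=5, idle=0, elastic_pods=elastic))
print(can_enqueue(min_request=6, idle=0, elastic_in_use=len(elastic)))  # False
```

In the 10-GPU example, a new job with a minRequest of 5 can enqueue and all five elastic pods of job1-1 are evicted, while a job with a minRequest of 6 stays pending and nothing is preempted.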