# Volcano Resource Reservation For Queue

@[qiankunli](https://github.com/qiankunli); Oct 11th, 2021

## Motivation

In our case, we use Volcano to schedule training jobs (tfjob/pytorchjob/vcjob) in a Kubernetes cluster. There are many groups, such as ad/recommend/tts, and each queue represents a group.
To ensure full utilization of resources, we generally do not configure `queue.capability`. However, this allows a single queue to use up all resources, leaving new jobs in other queues unable to run.
We therefore want to reserve some resources for a queue, so that any new job in that queue can be submitted immediately.

As [issue 1101](https://github.com/volcano-sh/volcano/issues/1101) mentions, Volcano should support resource reservation
for a specified queue. The detailed requirements are as follows:
* Support reserving specified resources for a specified queue.
* Only consider non-preemptive reservation.
* Support enabling and disabling resource reservation for a specified queue dynamically, without restarting Volcano.
* Support specifying the reservation both as a hard amount and as a percentage.

@[Thor-wl](https://github.com/Thor-wl) has already provided a design doc, [Volcano Resource Reservation For Queue](https://github.com/volcano-sh/volcano/blob/master/docs/design/queue-resource-reservation-design.md).
This design does not implement all of the features above. The supported features are as follows:

* Support reserving specified resources for a specified queue.
* Only consider non-preemptive reservation.
* Support enabling and disabling resource reservation for a specified queue dynamically, without restarting Volcano.
* Support specifying the reservation as a hard amount.

## Consideration
### Resource Request
* The reserved resource cannot exceed the total resource amount of the cluster in any dimension.
* If `capability` is set for a queue, the reserved resource must not exceed it in any dimension.
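
The two constraints above could be checked along the following lines. This is only a minimal sketch: the `Resources` map type and the `validateGuarantee` function are hypothetical names introduced for illustration, not Volcano's actual `api.Resource` type or admission code. It also assumes that a dimension missing from `capability` is unlimited.

```go
package main

import "fmt"

// Resources is a simplified, hypothetical stand-in for a resource list:
// a map from resource name (e.g. "cpu") to a plain numeric quantity.
type Resources map[string]float64

// validateGuarantee checks every dimension of the guarantee against the
// cluster total and, when present, against the queue's capability.
// A nil capability (or a dimension absent from it) is treated as "no limit",
// which is an assumption of this sketch.
func validateGuarantee(guarantee, capability, clusterTotal Resources) error {
	for name, want := range guarantee {
		if total := clusterTotal[name]; want > total {
			return fmt.Errorf("guarantee %s=%v exceeds cluster total %v", name, want, total)
		}
		// Reading from a nil map is safe in Go and reports ok == false.
		if limit, ok := capability[name]; ok && want > limit {
			return fmt.Errorf("guarantee %s=%v exceeds capability %v", name, want, limit)
		}
	}
	return nil
}

func main() {
	total := Resources{"cpu": 64, "memory": 256}
	// Within the cluster total, no capability set: accepted.
	fmt.Println(validateGuarantee(Resources{"cpu": 2, "memory": 4}, nil, total))
	// Guarantee larger than the queue's own capability: rejected.
	fmt.Println(validateGuarantee(Resources{"cpu": 8}, Resources{"cpu": 4}, total))
}
```

The check is per dimension, so a guarantee that fits in CPU but exceeds the limit in memory is still rejected.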

### Safety
* A malicious request for a large resource reservation will block jobs in other queues.

## Design
### API
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: q1
spec:
  reclaimable: true
  weight: 1
  guarantee:             # reservation keyword
    resource:            # resources to reserve
      cpu: 2
      memory: 4G
```

`guarantee.resource` lists the resource categories to reserve and the amount of each.

## Implementation

To support the guarantee mechanism, there are two scenarios to consider:
1. Supporting `spec.guarantee` during scheduling.
2. Creating a new queue whose `spec.guarantee` is not nil, or increasing the `spec.guarantee` of an existing queue.

### support `spec.guarantee` during scheduling

Suppose there are three queues and 30 GPUs in the cluster.

|queue/attr|guarantee GPUs|capability GPUs|realCapability GPUs|
|---|---|---|---|
|queue1|5|nil|30|
|queue2|nil|nil|25|
|queue3|nil|10|10|

```go
// /volcano/pkg/scheduler/plugins/proportion/proportion.go
type queueAttr struct {
	queueID api.QueueID
	name    string

	deserved  *api.Resource
	allocated *api.Resource
	request   *api.Resource
	// inqueue represents the resource request of the inqueue job
	inqueue        *api.Resource
	capability     *api.Resource
	realCapability *api.Resource
	guarantee      *api.Resource
}
```

On each scheduling cycle, the proportion plugin calculates `queueAttr.deserved` for each queue, which is the amount of resources the queue may use. When a new task is considered,
it can be scheduled if `queueAttr.deserved` is greater than `queueAttr.allocated`.
1. `queueAttr.deserved` must be no less than `queueAttr.guarantee`.
2. If `queueAttr.guarantee` is not nil (like queue1), the 5 GPUs can only be used by queue1, even if no job is running in queue1. We use `queueAttr.realCapability` to represent the upper limit of resources that a queue can use.
   1. If `queueAttr.capability` is nil (like queue2), `realCapability = total resources - sum(other-queue.guarantee)`.
   2. If `queueAttr.capability` is not nil (like queue3), `realCapability = min(capability, total resources - sum(other-queue.guarantee))`.
3. Replace `queueAttr.capability` with `queueAttr.realCapability` everywhere.

After this change, the resources a queue owns are no less than `queueAttr.guarantee` and no more than `queueAttr.realCapability`.
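
The `realCapability` rules can be sketched in a few lines of Go. This is an illustrative simplification, not the proportion plugin's code: the `queue` struct tracks a single GPU dimension instead of `*api.Resource`, a negative `capability` stands in for nil, and `realCapability` and `clampDeserved` are hypothetical helper names. The clamping step is one reading of the bounds stated above.

```go
package main

import "fmt"

// queue is a single-dimension (GPU count) stand-in for queueAttr.
// A negative capability models a nil `capability` in this sketch.
type queue struct {
	name       string
	guarantee  int
	capability int
}

// realCapability returns the cluster total minus the guarantees of all
// other queues, capped by the queue's own capability when one is set.
func realCapability(q queue, all []queue, total int) int {
	othersGuarantee := 0
	for _, o := range all {
		if o.name != q.name {
			othersGuarantee += o.guarantee
		}
	}
	limit := total - othersGuarantee
	if q.capability >= 0 && q.capability < limit {
		limit = q.capability
	}
	return limit
}

// clampDeserved keeps deserved within [guarantee, realCapability].
func clampDeserved(deserved, guarantee, realCap int) int {
	if deserved < guarantee {
		deserved = guarantee
	}
	if deserved > realCap {
		deserved = realCap
	}
	return deserved
}

func main() {
	queues := []queue{
		{name: "queue1", guarantee: 5, capability: -1},
		{name: "queue2", guarantee: 0, capability: -1},
		{name: "queue3", guarantee: 0, capability: 10},
	}
	for _, q := range queues {
		fmt.Printf("%s realCapability=%d\n", q.name, realCapability(q, queues, 30))
	}
	// A deserved share below the guarantee is raised to the guarantee.
	fmt.Println(clampDeserved(3, 5, 30))
}
```

With 30 GPUs in total, the sketch yields 30, 25, and 10 for queue1, queue2, and queue3, matching the `realCapability GPUs` column in the table above.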

### create a new queue whose `spec.guarantee` is not nil

Suppose there are three queues and 30 GPUs in the cluster, and the many tasks in queue1/queue2/queue3 have used up all 30 GPUs.

|queue/attr|weight|deserved GPUs|guarantee GPUs|capability GPUs|realCapability GPUs|
|---|---|---|---|---|---|
|queue1|1|10|5|nil|30|
|queue2|1|10|nil|nil|25|
|queue3|1|10|nil|10|10|

We then create queue4 (weight 2, guarantee 10 GPUs) and submit a new job (requesting 2 GPUs) to queue4.

|queue/attr|weight|deserved GPUs|guarantee GPUs|capability GPUs|realCapability GPUs|
|---|---|---|---|---|---|
|queue1|1|6|5|nil|20|
|queue2|1|6|nil|nil|15|
|queue3|1|6|nil|10|10|
|queue4|2|12|10|nil|25|

1. The overcommit plugin will reject the new job in queue4 because there are no free GPUs in the cluster. We should therefore change the logic: if `job.request < queue4.guarantee`, the job can become `Inqueue` whether or not there are free GPUs.
2. We should enable the reclaim action, so that Volcano can reclaim tasks from overused queues.
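
The changed enqueue rule from item 1 might look roughly like the sketch below. The function name `canEnqueue` and the single-dimension integer arguments are assumptions made for illustration. The fallback branch (`jobRequest <= idle`) is a deliberately simplified stand-in for the overcommit plugin's existing check, not its real logic.

```go
package main

import "fmt"

// canEnqueue reports whether a job may move to Inqueue.
// A job requesting less than its queue's guarantee is admitted even when
// the cluster has no idle resources; the reclaim action is then expected
// to free resources from overused queues.
func canEnqueue(jobRequest, queueGuarantee, idle int) bool {
	if jobRequest < queueGuarantee {
		return true
	}
	// Simplified placeholder for the pre-existing idle-resource check.
	return jobRequest <= idle
}

func main() {
	// queue4: guarantee 10 GPUs, job requests 2 GPUs, cluster fully used.
	fmt.Println(canEnqueue(2, 10, 0))
	// A queue without a guarantee still depends on idle resources.
	fmt.Println(canEnqueue(2, 0, 0))
}
```

The first call returns `true` and the second `false`, which is the behavior difference the guarantee is meant to introduce.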

## Usage
Configure the guarantee for a queue.
```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: q1
spec:
  reclaimable: true
  weight: 1
  guarantee:             # reservation keyword
    resource:            # resources to reserve
      cpu: 2
      memory: 4G
```
Enable the reclaim action for the scheduler.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue,allocate,reclaim,backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
```