# Volcano Resource Reservation For Queue

@[Thor-wl](https://github.com/Thor-wl); Nov 3rd, 2020

## Motivation
As [issue 1101](https://github.com/volcano-sh/volcano/issues/1101) mentioned, Volcano should support resource reservation
for a specified queue. The detailed requirements are as follows:
* Support reserving specified resources for a specified queue.
* Only non-preemption reservation is considered.
* Support enabling and disabling resource reservation for a specified queue dynamically, without restarting Volcano.
* Support specifying the reservation either as a hard resource amount or as a percentage.

## Consideration
### Resource Request
* The request cannot exceed the total resource amount of the cluster in any dimension.
* If `capability` is set, the request must not exceed it in any dimension.
* The reserved resource amount must be no less than the request in every dimension, but should not exceed it by much. An
algorithm that keeps the reserved amount close to the request is necessary.
* Support expressing the request as a percentage of the total node resources of the cluster. If a reservation resource
amount is also specified, it is important to decide which configuration is adopted. This feature is most useful when the
resource specifications of all nodes are almost the same.

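The first two rules above can be sketched as a small validation step. This is only an illustration: the `Resources` type, the `validateRequest` function, and the use of plain `float64` amounts are assumptions made for the example and are not part of the Volcano API. The sketch also assumes that a dimension missing from `capability` is unlimited.

```go
package main

import "fmt"

// Resources maps a resource dimension (for example "cpu" or "memory") to an
// amount. Plain float64 values are an assumption made to keep the sketch short.
type Resources map[string]float64

// validateRequest checks that the request does not exceed the cluster total in
// any dimension and, when capability is set (non-nil), that it does not exceed
// capability in any dimension that capability lists.
func validateRequest(request, capability, clusterTotal Resources) error {
	for dim, want := range request {
		if want > clusterTotal[dim] {
			return fmt.Errorf("%s: request %v exceeds cluster total %v", dim, want, clusterTotal[dim])
		}
		if limit, ok := capability[dim]; ok && want > limit {
			return fmt.Errorf("%s: request %v exceeds capability %v", dim, want, limit)
		}
	}
	return nil
}

func main() {
	cluster := Resources{"cpu": 16, "memory": 64}
	capability := Resources{"cpu": 8, "memory": 32}

	// Within both limits: no error.
	fmt.Println(validateRequest(Resources{"cpu": 2, "memory": 4}, capability, cluster))
	// Exceeds capability in the cpu dimension: returns an error.
	fmt.Println(validateRequest(Resources{"cpu": 10, "memory": 4}, capability, cluster))
}
```

Rejecting the request up front keeps an impossible reservation from ever reaching the node-locking step.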
### Reservation Algorithm
* Reservation should proceed gently, to balance scheduling performance against reserving performance.
* Nodes locked by a target job cannot be chosen as locked nodes.
* Nodes locked by another queue cannot be chosen as locked nodes.
* Among the node sets that satisfy the requirement, try to lock the one with the smallest total resource and the fewest nodes, to avoid
dramatic scheduling performance degradation.

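The two exclusion rules above amount to a filter over the cluster's nodes. The sketch below is hypothetical: the `Node` struct and its fields are invented for illustration and do not correspond to Volcano's internal node representation. It also assumes that nodes already locked by the same queue remain candidates, which the design does not state.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a cluster node.
type Node struct {
	Name          string
	LockedByQueue string // queue that already locked the node, "" if none
	LockedByJob   bool   // true if a target job already locked the node
}

// candidateNodes returns the nodes that queue may still lock. Nodes locked by
// a target job or by a different queue are skipped.
func candidateNodes(nodes []Node, queue string) []Node {
	var out []Node
	for _, n := range nodes {
		if n.LockedByJob {
			continue
		}
		if n.LockedByQueue != "" && n.LockedByQueue != queue {
			continue
		}
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "n1"},
		{Name: "n2", LockedByQueue: "q2"},
		{Name: "n3", LockedByJob: true},
		{Name: "n4", LockedByQueue: "q1"},
	}
	// Only n1 and n4 remain candidates for q1.
	for _, n := range candidateNodes(nodes, "q1") {
		fmt.Println(n.Name)
	}
}
```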
### Safety
* A malicious request to reserve a large amount of resources will block jobs in other queues.

## Design
### API
```
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: q1
spec:
  reclaimable: true
  weight: 1
  guarantee:             # reservation key word
    policy: Best-Effort  # preemption reservation or non-preemption reservation
    percentage:          # locked nodes resource percentage in cluster
      dimensions: ["cpu", "memory", "gpu", "other-scalable-resource-type"]  # and so on
      value: 0.2
    resource:            # specified reserving resource
      cpu: 2c
      memory: 4G

status:
  state: Open

  reservation:          # reservation status key word
    nodes: [n1, n2]     # locked nodes list
    resource:           # total idle resource in locked nodes
      cpu: 1C
      memory: 2G
```
### Detail
#### Fields
##### policy
Optional field. Valid values are "Best-Effort" and "Guaranteed"; the default is "Best-Effort". If "Best-Effort" is set, the scheduler
reserves resources without evicting running workloads, while "Guaranteed" allows it to evict running workloads.
##### percentage
Optional field, for use when `resource` is not specified. The legal value range is [0, 1]. The number of locked nodes is:
```
math.Floor(clusterNodeNumber * percentage)
```
Note that nodes are locked randomly until the percentage is reached. If `resource` is also specified, the
configuration that reserves more resources is used as the final result.
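
As a quick check of the formula above, the following sketch computes the locked node count for a given percentage. The function name `lockedNodeCount` and the range check are assumptions added for the example.

```go
package main

import (
	"fmt"
	"math"
)

// lockedNodeCount returns how many nodes a percentage reservation locks,
// rejecting percentages outside the legal range [0, 1].
func lockedNodeCount(clusterNodeNumber int, percentage float64) (int, error) {
	if percentage < 0 || percentage > 1 {
		return 0, fmt.Errorf("percentage %v is outside [0, 1]", percentage)
	}
	return int(math.Floor(float64(clusterNodeNumber) * percentage)), nil
}

func main() {
	n, err := lockedNodeCount(10, 0.25)
	fmt.Println(n, err) // 10 * 0.25 = 2.5, rounded down to 2
}
```

Because the result is rounded down, a small cluster with a low percentage can end up locking no nodes at all, for example 3 nodes at 0.25.
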
##### guarantee.resource
Optional field, for use when `percentage` is not specified. Lists the resource categories to reserve and the amount of each.
##### nodes
List of locked nodes.
##### status.resource
Total idle resource of the locked nodes. Watch this field to judge whether the reservation has met the demand.

#### Algorithm

Enumerate all node combinations that satisfy the queue's resource request in all dimensions, then sort them. The following
factors affect the order of the node combinations:

* sum(nodes' resource)

The lower the total resource of the nodes in the combination, the higher the priority. The weight is 0.4.

* queue already used

The more resources are already used on the nodes that belong to the queue, the higher the priority. The weight is 0.35.

* sum(nodes' number)

The fewer nodes in the combination, the higher the priority. The weight is 0.15.

* sum(nodes' idle resource)

The more idle resources on the nodes, the higher the priority. The weight is 0.1.

#### Complete formula

##### Explanation of terms
* sum    : the total resource of the nodes in the combination.
* target : the resource requested by the target queue.
* used   : the sum of used resources on the nodes.
* n      : the number of nodes in the combination.
* idle   : the sum of idle resources on the nodes.

The weights reflect the following priorities:

* The value of (sum-target) should be as small as possible to avoid wasting cluster resources, so this factor has the highest weight.
* The `used` factor affects the efficiency and stability of locking nodes.
* When the scores of the two factors above are close, the combination with the fewest nodes is preferred, because the fewer nodes are
locked, the smaller the impact on the cluster.
* The least important factor is the idle resource of the nodes: the more idle the nodes are, the more efficiently they can be locked.
```
0.4*1/(sum-target)/[(sum-target)+used+n+idle]
  + 0.35*used/[(sum-target)+used+n+idle]
  + 0.15*1/n/[(sum-target)+used+n+idle]
  + 0.1*idle/[(sum-target)+used+n+idle]
```

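The formula above can be evaluated directly. The sketch below is a minimal illustration under simplifying assumptions: the `Combination` type and `score` function are hypothetical names, and every term is collapsed to a single scalar even though real resources have several dimensions. The formula is undefined when `sum` equals `target` or when `n` is zero; the design does not say how to treat these cases, so the sketch simply reports them as not scorable.

```go
package main

import "fmt"

// Combination summarises one candidate set of nodes using the terms above.
type Combination struct {
	Sum  float64 // total resource of the nodes in the combination
	Used float64 // sum of used resources on the nodes
	N    int     // number of nodes in the combination
	Idle float64 // sum of idle resources on the nodes
}

// score evaluates the weighted formula for one combination. It returns false
// when the combination does not exceed the target or has no nodes, because the
// formula would then divide by zero or by a negative surplus.
func score(c Combination, target float64) (float64, bool) {
	surplus := c.Sum - target
	if surplus <= 0 || c.N <= 0 {
		return 0, false
	}
	n := float64(c.N)
	denom := surplus + c.Used + n + c.Idle
	return (0.4/surplus + 0.35*c.Used + 0.15/n + 0.1*c.Idle) / denom, true
}

func main() {
	c := Combination{Sum: 10, Used: 4, N: 2, Idle: 6}
	if s, ok := score(c, 8); ok {
		// (0.4/2 + 0.35*4 + 0.15/2 + 0.1*6) / (2+4+2+6) = 2.275 / 14
		fmt.Printf("%.4f\n", s)
	}
}
```

Sorting the candidate combinations by this value in descending order gives the priority order described in the Algorithm section.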
##### Lock Strategy
* schedule relock

Every 5s or 10s, the worker in the queueController finds the best node combination for a queue,
updates the queue's locked nodes, and calculates the idle resources on the locked nodes.

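A periodic relock of this kind is commonly written in Go with `time.Ticker`. The following is a minimal sketch, not the queue controller's actual code: `relockLoop` and its `relock` callback are hypothetical names, and the callback stands in for finding the best node combination and updating the queue status.

```go
package main

import (
	"fmt"
	"time"
)

// relockLoop calls relock once per interval until stop is closed.
func relockLoop(interval time.Duration, stop <-chan struct{}, relock func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			relock()
		}
	}
}

func main() {
	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		// The design suggests 5s or 10s; a short interval keeps the demo quick.
		relockLoop(50*time.Millisecond, stop, func() {
			fmt.Println("recompute locked nodes and idle resources")
		})
	}()
	time.Sleep(180 * time.Millisecond)
	close(stop)
	<-done
}
```

Passing a stop channel lets the controller shut the worker down cleanly, and stopping the ticker on exit releases its resources.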
## Implementation
### Controller
Add a new worker, reserveWorker, to the queue_controller. reserveWorker takes care of queues that require
reservation: it finds and updates the nodes for the queue to lock, and updates the queue's reservation status.

![Workflow](./images/queue_reservation_lock_workfow.png)

### Plugin
Add a new plugin, node_reservation, to implement the algorithm detailed above.

![Workflow](./images/queue_reservation_allocate_workflow.png)