# Volcano Resource Reservation For Queue

@[Thor-wl](https://github.com/Thor-wl); Nov 3rd, 2020

## Motivation
As [issue 1101](https://github.com/volcano-sh/volcano/issues/1101) mentioned, Volcano should support resource reservation
for a specified queue. The detailed requirements are as follows:
* Support reserving specified resources for a specified queue.
* Only non-preemption reservation is considered.
* Support enabling and disabling resource reservation for a specified queue dynamically, without restarting Volcano.
* Support specifying the reservation either as a hard resource amount or as a percentage.

## Consideration
### Resource Request
* The request cannot exceed the total resource amount of the cluster in any dimension.
* If `capability` is set, the request must not exceed it in any dimension.
* The reserved resource amount must be no less than the request in every dimension, but should not exceed it by much. An algorithm
that keeps the reserved amount close to the request is necessary.
* Support a percentage of the total node resources of the cluster as the request. If a reservation resource amount is also specified, it is
important to decide which configuration takes effect. This feature is most useful when the resource specifications
of all nodes are almost the same.

### Reservation Algorithm
* Reservation needs gentle treatment to balance scheduling and reserving performance.
* Nodes locked by a target job cannot be chosen as locked nodes.
* Nodes locked by another queue cannot be chosen as locked nodes.
* Try to lock nodes whose total resource and node count are as small as possible while still satisfying the requirement, to avoid dramatic
scheduling performance degradation.


### Safety
* A malicious request to reserve a large amount of resources will cause jobs in other queues to block.
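
The upper bounds listed under Resource Request (no more than the cluster total, and no more than `capability` when it is set) can be checked mechanically before a reservation is accepted, which also bounds how much an oversized request can take from other queues. The following is a minimal Go sketch of such a check, not Volcano's actual code. The `Resources` type and the `validateRequest` function are hypothetical names introduced for illustration, and treating a dimension missing from `capability` as unbounded is an assumption.

```go
package main

import "fmt"

// Resources maps a dimension name (for example "cpu" or "memory") to an amount.
// Hypothetical type for illustration; Volcano has its own resource types.
type Resources map[string]float64

// validateRequest returns an error if request exceeds clusterTotal in any
// dimension, or exceeds capability in any dimension that capability lists.
// A nil capability means "capability is not set".
// Assumption: a dimension absent from capability is treated as unbounded.
func validateRequest(request, clusterTotal, capability Resources) error {
	for dim, want := range request {
		// A dimension the cluster does not have reads as 0, so any
		// positive request for it is rejected here.
		if total := clusterTotal[dim]; want > total {
			return fmt.Errorf("%s: request %v exceeds cluster total %v", dim, want, total)
		}
		if limit, ok := capability[dim]; ok && want > limit {
			return fmt.Errorf("%s: request %v exceeds capability %v", dim, want, limit)
		}
	}
	return nil
}

func main() {
	total := Resources{"cpu": 8, "memory": 16}
	capability := Resources{"cpu": 4}

	// Within both the cluster total and the capability.
	fmt.Println(validateRequest(Resources{"cpu": 2, "memory": 4}, total, capability))
	// Within the cluster total but above the capability.
	fmt.Println(validateRequest(Resources{"cpu": 6}, total, capability))
}
```

The sketch covers only the two upper bounds. Keeping the reserved amount close to the request is the job of the node selection algorithm described in the Design section.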

## Design
### API
```
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: q1
spec:
  reclaimable: true
  weight: 1
  guarantee:             # reservation key word
    policy: Best-Effort  # preemption reservation or non-preemption reservation
    percentage:          # locked nodes resource percentage in cluster
      dimensions: ["cpu", "memory", "gpu", "other-scalable-resource-type"...]
      value: 0.2
    resource:            # specified reserving resource
      cpu: 2c
      memory: 4G

status:
  state: Open

  reservation:           # reservation status key word
    nodes: [n1, n2]      # locked nodes list
    resource:            # total idle resource in locked nodes
      cpu: 1C
      memory: 2G
```
### Detail
#### Fields
##### policy
Optional field. The options are "Best-Effort" and "Guaranteed"; the default is "Best-Effort". If "Best-Effort" is set, the scheduler
reserves resources without evicting running workloads, while "Guaranteed" does the opposite.
##### percentage
Optional field if `resource` is not specified. The legal value range is [0, 1]. The number of locked nodes is computed as follows:
```
math.Floor(clusterNodeNumber * percentage)
```
Note that nodes are locked randomly until the percentage is reached. If `resource` is also specified, the option
that reserves more resources is chosen as the final result.
##### guarantee.resource
Optional field if `percentage` is not specified. A list of the resource categories to reserve and their amounts.
##### nodes
List of locked nodes.
##### status.resource
Total idle resource of the locked nodes (`resource` under `status.reservation` in the example above). You can judge whether the
reservation has met the demand by watching this field.

#### Algorithm

Sort all node combinations that satisfy the queue's resource request in all dimensions.

The following factors affect the order of the node combinations.

* sum(nodes' resource)

  The lower the total resource of the nodes in the combination, the higher the priority. The weight is 0.4.

* queue already used

  The more resources already used on the nodes that belong to the queue, the higher the priority. The weight is 0.35.

* sum(nodes' number)

  The fewer nodes in the combination, the higher the priority. The weight is 0.15.

* sum(nodes' idle resource)

  The more idle resources on the nodes, the higher the priority. The weight is 0.1.

#### Complete formula

##### Explanation of terms
* sum : the total resource of the nodes in the combination.
* target: the resource requested by the target queue.
* used : the sum of used resources on the nodes.
* n : the number of nodes in the combination.
* idle : the sum of idle resources on the nodes.

* We want (sum-target) to be as small as possible to avoid wasting cluster resources, so this factor has the highest weight.
* We consider that `used` affects the efficiency and stability of locking nodes.
* When the scores of the above two factors are close, we prefer the combination with the fewest nodes, because the fewer nodes
are locked, the smaller the impact on the cluster.
* The least important factor is the idle resource of the nodes: the more idle the nodes are, the more efficient it is to lock them.
```
0.4*1/(sum-target)/[(sum-target)+used+n+idle] + 0.35*used/[(sum-target)+used+n+idle] + 0.15*1/n/[(sum-target)+used+n+idle] + 0.1*idle/[(sum-target)+used+n+idle]
```

##### Lock Strategy
* schedule relock

  Every 5s or 10s, the worker in the queueController finds the best combination of nodes for a queue,
  updates the queue's locked nodes, and calculates the idle resources on the locked nodes.


## Implementation
### Controller
Add a new worker in the queue_controller: reserveWorker.
reserveWorker takes care of queues that require reservation, including finding and updating the nodes
for the queue to lock and updating the queue's reservation status.



### Plugin
Add a new plugin, node_reservation, to implement the algorithm detailed above.


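
To make the complete formula from the Algorithm section concrete, here is a minimal Go sketch of how a plugin might score one node combination. It is an illustration under stated assumptions, not the node_reservation implementation. The `Candidate` type and the `score` function are hypothetical names. The sketch assumes every quantity has already been reduced to a single number, since the design does not say how multi-dimensional resources are collapsed. It also assumes that an exact fit (sum equal to target), where the term `1/(sum-target)` is undefined, is treated as the best possible score.

```go
package main

import (
	"fmt"
	"math"
)

// Candidate describes one node combination with scalar quantities.
// Hypothetical type; the scalar reduction of resources is an assumption.
type Candidate struct {
	Sum  float64 // total resource of the nodes in the combination
	Used float64 // resource already used on those nodes
	N    int     // number of nodes in the combination
	Idle float64 // idle resource on those nodes
}

// score evaluates the weighted formula for one candidate. The second return
// value is false when the combination is empty or cannot satisfy target.
func score(c Candidate, target float64) (float64, bool) {
	if c.N <= 0 || c.Sum < target {
		return 0, false
	}
	surplus := c.Sum - target
	if surplus == 0 {
		// Assumption: an exact fit wastes nothing, so rank it first
		// instead of dividing by zero.
		return math.Inf(1), true
	}
	n := float64(c.N)
	// Shared denominator of all four terms: (sum-target)+used+n+idle.
	denom := surplus + c.Used + n + c.Idle
	return (0.4/surplus + 0.35*c.Used + 0.15/n + 0.1*c.Idle) / denom, true
}

func main() {
	c := Candidate{Sum: 10, Used: 4, N: 2, Idle: 6}
	if s, ok := score(c, 8); ok {
		fmt.Printf("score = %.4f\n", s)
	}
}
```

Because all four terms share one denominator, the sketch computes it once and divides the weighted sum by it. A caller would evaluate `score` for every feasible combination and lock the nodes of the highest-scoring one.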