volcano.sh/volcano@v1.9.0/docs/design/device-sharing.md

# Sharing devices in volcano

## Introduction

We implement a common interface for shareable devices (GPU, NPU, FPGA, ...) called `Devices`, and use it to reimplement the current gpu-share mechanism. The goal is to make device sharing easy to implement and better organised. If you want to give vc-scheduler the ability to share another device, all you need to do is implement the methods of `Devices` and place your logic under `pkg/scheduler/api/devices`.

## Background

We intend to give volcano the ability to share third-party resources such as GPUs and NPUs in the near future. At first, I tried to implement this logic on top of predicate.gpushare, but I soon realised that its logic was scattered across device_info.go, node_info.go, pod_info.go and the whole predicate folder. Had I followed the implementation of predicate.gpushare, I would have had no choice but to hack deeply into the vc-scheduler api. Sooner or later the vc-scheduler api would become crowded with various device-sharing logic, which is probably not what we want.

## Implementation

### Design of the Devices interface

The design of `Devices` is shown below:

```go
package api

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// Devices is the common interface implemented by every device-sharing policy.
type Devices interface {
	// The following two functions are used in node_info.
	// AddResource adds the device resources used by 'pod' to the current scheduler cache.
	AddResource(pod *v1.Pod)
	// SubResource subtracts the device resources used by 'pod' from the current scheduler cache.
	SubResource(pod *v1.Pod)

	// The following four functions are used in the predicate plugin.
	// HasDeviceRequest checks whether 'pod' requests this device.
	HasDeviceRequest(pod *v1.Pod) bool
	// FilterNode checks whether 'pod' fits on the current node.
	// The first return value is the filtering result; its value is one of 0, 1, 2, 3.
	//
	// 0: Success
	// The plugin ran correctly and found the pod schedulable.
	//
	// 1: Error
	// Used for internal plugin errors, unexpected input, etc.
	//
	// 2: Unschedulable
	// The plugin found the pod unschedulable. The scheduler might attempt to
	// preempt other pods to get this pod scheduled. Use UnschedulableAndUnresolvable
	// to make the scheduler skip preemption.
	// The accompanying status message should explain why the pod is unschedulable.
	//
	// 3: UnschedulableAndUnresolvable
	// The plugin found the pod unschedulable and preemption would not change anything.
	// Plugins should return Unschedulable if it is possible that the pod can get
	// scheduled with preemption.
	// The accompanying status message should explain why the pod is unschedulable.
	FilterNode(pod *v1.Pod) (int, string, error)

	// Allocate is the allocation action in the predicate plugin.
	Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error
	// Release is the release action in the predicate plugin.
	Release(kubeClient kubernetes.Interface, pod *v1.Pod) error

	// GetStatus is used for debugging and monitoring.
	GetStatus() string
}
```

The first two methods are used by node_info to keep the cluster status up to date. The next four are used in the predicate plugin, where allocation and deallocation actually take place. Finally, `GetStatus` is a monitoring method for debugging.
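The integer returned by `FilterNode` is easier to work with through named values. The following is a minimal sketch; the constant names and the helpers `describeFilterCode` and `preemptionMayHelp` are hypothetical and not part of volcano. The values follow the comments on the interface, which appear to mirror the status codes of the Kubernetes scheduler framework.

```go
package main

import "fmt"

// Hypothetical names for the codes documented on Devices.FilterNode.
const (
	Success                      = 0
	Error                        = 1
	Unschedulable                = 2
	UnschedulableAndUnresolvable = 3
)

// describeFilterCode returns a readable name for a FilterNode result code.
func describeFilterCode(code int) string {
	switch code {
	case Success:
		return "Success"
	case Error:
		return "Error"
	case Unschedulable:
		return "Unschedulable"
	case UnschedulableAndUnresolvable:
		return "UnschedulableAndUnresolvable"
	default:
		return fmt.Sprintf("Unknown(%d)", code)
	}
}

// preemptionMayHelp reports whether preempting other pods could make the pod
// fit: per the interface comments, only Unschedulable leaves that possibility open.
func preemptionMayHelp(code int) bool {
	return code == Unschedulable
}

func main() {
	for code := Success; code <= UnschedulableAndUnresolvable; code++ {
		fmt.Printf("%d %s preemption=%v\n", code, describeFilterCode(code), preemptionMayHelp(code))
	}
}
```

Keeping the distinction between codes 2 and 3 matters because it tells the scheduler whether trying preemption is worthwhile.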

### Create a separate package for gpushare-related methods and reimplement them with the Devices interface

There are two steps. First, we create a new package in "pkg/scheduler/api/devices/nvidia/gpushare" and implement the `Devices` methods in it. Then we separate the gpushare-related logic from "scheduler.api" and the "predicate plugin", and move it into the package "pkg/scheduler/api/devices/nvidia/gpushare". The package contains the following files: device_info.go (which implements the `Devices` interface methods), share.go (which contains private helpers for device_info.go) and type.go (which contains constants and definitions).

The mapping from the original files to the new package is shown in the table below:

| original file | corresponding file(s) in the new package |
| ------------- | ------------- |
| pkg/scheduler/api/node_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/api/device_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/api/pod_info.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/plugins/predicates/predicates.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go |
| pkg/scheduler/plugins/predicates/gpu.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go |
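After this split, the predicate plugin no longer needs GPU-specific code: it can drive every registered device through the same methods. Below is a minimal sketch of that loop. It assumes a stand-in `Pod` type instead of `*v1.Pod`, a hypothetical `counter` device and a hypothetical `filterNode` helper; volcano's actual predicate code may be organised differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod stands in for *v1.Pod; Requests maps a device name to the amount requested.
type Pod struct {
	Name     string
	Requests map[string]int
}

// device is the subset of the Devices interface used by the filtering step.
type device interface {
	HasDeviceRequest(pod *Pod) bool
	FilterNode(pod *Pod) (int, string, error)
}

// counter is a hypothetical shareable device with a fixed free capacity.
type counter struct {
	name string
	free int
}

func (c *counter) HasDeviceRequest(pod *Pod) bool { return pod.Requests[c.name] > 0 }

func (c *counter) FilterNode(pod *Pod) (int, string, error) {
	if pod.Requests[c.name] <= c.free {
		return 0, "", nil // Success
	}
	return 2, c.name + ": insufficient capacity", nil // Unschedulable
}

// filterNode asks each registered device, in name order, whether the pod fits.
// Devices the pod does not request are skipped; the first non-success result wins.
func filterNode(devs map[string]device, pod *Pod) (int, string, error) {
	names := make([]string, 0, len(devs))
	for name := range devs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		d := devs[name]
		if !d.HasDeviceRequest(pod) {
			continue
		}
		if code, msg, err := d.FilterNode(pod); err != nil || code != 0 {
			return code, msg, err
		}
	}
	return 0, "", nil
}

func main() {
	devs := map[string]device{
		"gpu": &counter{name: "gpu", free: 4},
		"npu": &counter{name: "npu", free: 1},
	}
	for _, pod := range []*Pod{
		{Name: "p1", Requests: map[string]int{"gpu": 2}},
		{Name: "p2", Requests: map[string]int{"npu": 2}},
	} {
		code, msg, err := filterNode(devs, pod)
		fmt.Println(pod.Name, code, msg, err)
	}
}
```

Sorting the names only makes the sketch deterministic when several devices would reject the same pod.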

## How to add a new device-share policy

### 1. Define your device in /pkg/scheduler/api/shared_device_pool.go

Name your policy and add it to shared_device_pool.go as follows:

```go
const (
	GPUSharingDevice     = "GpuShare"
	YourNewSharingPolicy = "xxxxx" // placeholder: the name of your policy
)
```

### 2. Create a new package in /pkg/scheduler/api/devices/"your device name"/"your policy name"

For example, if you are implementing an NPU share policy, we recommend creating a package in /pkg/scheduler/api/devices/ascend/npushare.

### 3. Implement the methods of the Devices interface (shared_device_pool.go) in your new package

Note that you can't refer to any struct or method in scheduler.api, because that would create an import cycle. If there is anything in scheduler.api you really need, modify the `Devices` interface to pass it in.
The methods defined in the `Devices` interface and their roles are shown in the table below:

| interface method | invoker file | description |
| ------------- | ------------ | ------------- |
| `AddResource(pod *v1.Pod)` | pkg/scheduler/api/node_info.go | Add the 'pod' and its resources to the scheduler cache |
| `SubResource(pod *v1.Pod)` | pkg/scheduler/api/node_info.go | Remove the 'pod' and subtract its resources from the scheduler cache |
| `HasDeviceRequest(pod *v1.Pod) bool` | pkg/scheduler/plugins/predicates/predicates.go | Check whether this 'pod' requests a portion of this device |
| `FilterNode(pod *v1.Pod) (int, string, error)` | pkg/scheduler/plugins/predicates/predicates.go | Check whether the portion of the device this pod requests fits on the current node |
| `Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error` | pkg/scheduler/plugins/predicates/predicates.go | Allocate the portion of this device from the current node to this pod |
| `Release(kubeClient kubernetes.Interface, pod *v1.Pod) error` | pkg/scheduler/plugins/predicates/predicates.go | Deallocate the portion of this device from this pod |
| `GetStatus() string` | none | Used for debugging and monitoring |
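To make the method contract concrete, here is a toy policy implementing a trimmed-down copy of the interface. It is a sketch under stated assumptions: `Pod` stands in for `*v1.Pod`, the `kubernetes.Interface` parameter is dropped, and `toyDevice` with its `MemoryRequest` field is hypothetical. The `var _ Devices = (*toyDevice)(nil)` line is the usual Go idiom for a compile-time check that a type satisfies an interface, and is worth keeping in a real policy package too.

```go
package main

import "fmt"

// Pod stands in for *v1.Pod; MemoryRequest is a hypothetical device request.
type Pod struct {
	Name          string
	MemoryRequest int
}

// Devices is a trimmed-down copy of the interface, without kubernetes.Interface.
type Devices interface {
	AddResource(pod *Pod)
	SubResource(pod *Pod)
	HasDeviceRequest(pod *Pod) bool
	FilterNode(pod *Pod) (int, string, error)
	Allocate(pod *Pod) error
	Release(pod *Pod) error
	GetStatus() string
}

// toyDevice tracks how much memory of one shareable device is in use.
type toyDevice struct {
	total, used int
}

// Compile-time check that *toyDevice implements Devices.
var _ Devices = (*toyDevice)(nil)

func (d *toyDevice) AddResource(pod *Pod) { d.used += pod.MemoryRequest }
func (d *toyDevice) SubResource(pod *Pod) { d.used -= pod.MemoryRequest }

func (d *toyDevice) HasDeviceRequest(pod *Pod) bool { return pod.MemoryRequest > 0 }

func (d *toyDevice) FilterNode(pod *Pod) (int, string, error) {
	if free := d.total - d.used; pod.MemoryRequest > free {
		return 2, fmt.Sprintf("need %d, free %d", pod.MemoryRequest, free), nil // Unschedulable
	}
	return 0, "", nil // Success
}

// Allocate re-checks the fit, then reuses AddResource for bookkeeping. The real
// interface also receives a kubernetes.Interface, presumably to update the pod.
func (d *toyDevice) Allocate(pod *Pod) error {
	code, msg, err := d.FilterNode(pod)
	if err != nil {
		return err
	}
	if code != 0 {
		return fmt.Errorf("allocate %s: %s", pod.Name, msg)
	}
	d.AddResource(pod)
	return nil
}

// Release returns the pod's portion of the device.
func (d *toyDevice) Release(pod *Pod) error {
	d.SubResource(pod)
	return nil
}

func (d *toyDevice) GetStatus() string { return fmt.Sprintf("used %d/%d", d.used, d.total) }

func main() {
	dev := &toyDevice{total: 16}
	a := &Pod{Name: "a", MemoryRequest: 10}
	if err := dev.Allocate(a); err != nil {
		fmt.Println(err)
	}
	fmt.Println(dev.GetStatus())                                  // used 10/16
	fmt.Println(dev.Allocate(&Pod{Name: "b", MemoryRequest: 8}))  // allocate b: need 8, free 6
	_ = dev.Release(a)
	fmt.Println(dev.GetStatus())                                  // used 0/16
}
```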

### 4. Add your initialization code in /pkg/scheduler/api/node_info.go

This is the *only* place where you touch scheduler.api: you register your policy while the NodeInfo struct is initialized.

```go
// setNodeOthersResource initializes the shareable devices of this node.
func (ni *NodeInfo) setNodeOthersResource(node *v1.Node) {
	ni.Others[GPUSharingDevice] = gpushare.NewGPUDevices(ni.Name, node)
	// ni.Others["your device sharing policy name"] = your device sharing package's initialization method
}
```

### 5. Check if your policy is enabled in /pkg/scheduler/plugins/predicates/predicates.go

This is the *only* place where you touch predicates.go: here the scheduler checks whether your policy is enabled in the scheduler configuration.

predicates.go:

```go
...
// Check whether predicate.GPUSharingEnable and the other keys are provided; if a key is given, overwrite the corresponding enable variable.
args.GetBool(&gpushare.GpuSharingEnable, GPUSharingPredicate)
args.GetBool(&gpushare.GpuNumberEnable, GPUNumberPredicate)
args.GetBool(&gpushare.NodeLockEnable, NodeLockEnable)
args.GetBool(&yourpolicy.YourSharingEnable, YourSharingPredicate) // placeholders: your policy's enable variable and configuration key
...
```
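`args.GetBool` is expected to overwrite the variable only when the key is present, so a policy stays at its default unless the configuration sets it. The sketch below re-implements that behaviour on a stand-in `Arguments` map type; it is assumed to match volcano's helper in spirit rather than copied from it, and `DemoSharingEnable`/`DemoSharingPredicate` are hypothetical names for a new policy.

```go
package main

import "fmt"

// Arguments stands in for the predicate plugin's arguments: configuration
// keys mapped to values parsed from the scheduler configuration.
type Arguments map[string]interface{}

// GetBool overwrites *ptr only when key exists and holds a bool, so the
// variable keeps its default value otherwise.
func (a Arguments) GetBool(ptr *bool, key string) {
	if ptr == nil {
		return
	}
	if v, ok := a[key].(bool); ok {
		*ptr = v
	}
}

// Hypothetical enable switch and configuration key for a new policy.
var DemoSharingEnable = false

const DemoSharingPredicate = "predicate.DemoSharingEnable"

func main() {
	args := Arguments{DemoSharingPredicate: true}
	args.GetBool(&DemoSharingEnable, DemoSharingPredicate)
	fmt.Println("demo sharing enabled:", DemoSharingEnable)
}
```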