volcano.sh/volcano@v1.9.0/docs/design/multi-scheduler.md (about)

# Multi-scheduling

## Background
A cluster may run multiple schedulers for different workloads, e.g. the default scheduler for system daemons and the volcano scheduler for business workloads.
When the schedulers work in parallel, resource conflicts occur easily.

## Motivation

- Classify Pods into different resource groups, where each resource group specifies its own scheduler.
- Avoid resource conflicts when multiple schedulers work in parallel.

## Design Action

To divide the cluster resources, users need to set taints and labels on the cluster nodes and set the corresponding tolerations and nodeSelector on the pods. Modifying all the yaml files by hand takes a lot of effort.
To reduce user operations, volcano handles the pod side automatically: a new MutatingAdmissionWebhook in the volcano admission deployment patches the pods.
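
To make the "pod side" concrete, here is a minimal Go sketch of how a mutating webhook could express the patched fields as JSON Patch (RFC 6902) operations. The `patchOp` and `toleration` types and the `buildPatch` function are hypothetical names chosen for illustration; this is not volcano's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single RFC 6902 JSON Patch operation (hypothetical helper type).
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// toleration mirrors the subset of a pod toleration used in this document.
type toleration struct {
	Key      string `json:"key,omitempty"`
	Operator string `json:"operator,omitempty"`
	Effect   string `json:"effect,omitempty"`
}

// buildPatch returns the operations that set the scheduler name, node selector
// and tolerations on a pod. Empty inputs are skipped because every patched
// field is optional.
func buildPatch(schedulerName string, nodeSelector map[string]string, tolerations []toleration) []patchOp {
	var ops []patchOp
	if schedulerName != "" {
		ops = append(ops, patchOp{Op: "add", Path: "/spec/schedulerName", Value: schedulerName})
	}
	if len(nodeSelector) > 0 {
		ops = append(ops, patchOp{Op: "add", Path: "/spec/nodeSelector", Value: nodeSelector})
	}
	if len(tolerations) > 0 {
		ops = append(ops, patchOp{Op: "add", Path: "/spec/tolerations", Value: tolerations})
	}
	return ops
}

func main() {
	ops := buildPatch(
		"default-scheduler",
		map[string]string{"volcano.sh/nodetype": "management"},
		[]toleration{{Key: "management-taint", Operator: "Exists", Effect: "NoSchedule"}},
	)
	out, err := json.MarshalIndent(ops, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

For simplicity the sketch overwrites `nodeSelector` and `tolerations` wholesale; a real webhook would probably need to merge with values the pod already carries.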

### New configmap (volcano-admission-configmap)

The configmap defines resource groups. Each resource group contains:
- The identification information of the pod object. Volcano supports two fields:
  - Namespace field
  - Annotation field
- The pod data that volcano patches. All of the following fields are optional; users can set them according to the application scenario:
  - Tolerations
  - Affinity
  - NodeSelector
  - SchedulerName

If the object field is not set, it defaults to the following:
````
- resourceGroup: XXX
  object:
    key: annotation                             # set the field and the value to be matched
    value:
    - "volcano.sh/resource-group: XXX"          # XXX is the value in resourceGroup field
````
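
The defaulting rule above can be sketched in Go as follows. The `object` and `resourceGroup` types and the `withDefaultObject` function are hypothetical stand-ins for the configmap fields, and treating "not set" as "both key and value empty" is an assumption of this sketch.

```go
package main

import "fmt"

// object and resourceGroup are hypothetical types mirroring the configmap
// fields shown in this document.
type object struct {
	Key   string
	Value []string
}

type resourceGroup struct {
	Name   string // the resourceGroup field
	Object object
}

// defaultAnnotationKey is the fixed annotation key used by the default.
const defaultAnnotationKey = "volcano.sh/resource-group"

// withDefaultObject fills in the object field when it is not set: the key
// becomes "annotation" and the value becomes
// "volcano.sh/resource-group: <group name>".
func withDefaultObject(g resourceGroup) resourceGroup {
	if g.Object.Key == "" && len(g.Object.Value) == 0 {
		g.Object = object{
			Key:   "annotation",
			Value: []string{fmt.Sprintf("%s: %s", defaultAnnotationKey, g.Name)},
		}
	}
	return g
}

func main() {
	g := withDefaultObject(resourceGroup{Name: "gpu"})
	fmt.Println(g.Object.Key, g.Object.Value)
}
```

A group that already names a namespace or an annotation is returned unchanged.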

For example:
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: management                    # set the resource group name
      object:
        key: namespace                             # set the field and the value to be matched
        value:
        - mng-ns-1
      schedulerName: default-scheduler             # set the scheduler for patching
      tolerations:                                 # set the tolerations for patching
      - effect: NoSchedule
        key: taint
        operator: Exists
      labels:                                      # set the nodeSelector for patching
        volcano.sh/nodetype: management
    - resourceGroup: cpu
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group: cpu"
      schedulerName: volcano
      labels:
        volcano.sh/nodetype: cpu
    - resourceGroup: gpu                          # object is not set, so the default applies: the key is annotation, the annotation key
      schedulerName: volcano                      # is fixed to "volcano.sh/resource-group" and its value is the resourceGroup field
      labels:
        volcano.sh/nodetype: gpu
    - resourceGroup: fixed
      schedulerName: volcano
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group-job-role: master"
      # set the affinity for patching, the format is a json string.
      affinity: "{\"nodeAffinity\":{\"requiredDuringSchedulingIgnoredDuringExecution\":{\"nodeSelectorTerms\":[{\"matchExpressions\":[{\"key\":\"volcano.sh/nodetype\",\"operator\":\"In\",\"values\":[\"fixed\"]}]}]}}}"
````
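
Because the `affinity` field is a JSON string embedded in YAML, a single escaping mistake makes it invalid. The following Go sketch decodes such a string with the standard `encoding/json` package so that malformed input is reported early. The struct types are hypothetical and cover only the fields used in the example; they are not the Kubernetes API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal hypothetical structs covering only the affinity fields used above.
type matchExpression struct {
	Key      string   `json:"key"`
	Operator string   `json:"operator"`
	Values   []string `json:"values"`
}

type nodeSelectorTerm struct {
	MatchExpressions []matchExpression `json:"matchExpressions"`
}

type nodeSelector struct {
	NodeSelectorTerms []nodeSelectorTerm `json:"nodeSelectorTerms"`
}

type nodeAffinity struct {
	Required *nodeSelector `json:"requiredDuringSchedulingIgnoredDuringExecution"`
}

type affinity struct {
	NodeAffinity *nodeAffinity `json:"nodeAffinity"`
}

// parseAffinity decodes the JSON string stored in the affinity field and
// returns an error when the string is not valid JSON.
func parseAffinity(s string) (*affinity, error) {
	var a affinity
	if err := json.Unmarshal([]byte(s), &a); err != nil {
		return nil, fmt.Errorf("invalid affinity JSON: %w", err)
	}
	return &a, nil
}

func main() {
	raw := `{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"volcano.sh/nodetype","operator":"In","values":["fixed"]}]}]}}}`
	a, err := parseAffinity(raw)
	if err != nil {
		panic(err)
	}
	expr := a.NodeAffinity.Required.NodeSelectorTerms[0].MatchExpressions[0]
	fmt.Println(expr.Key, expr.Operator, expr.Values)
}
```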

### The pod mutate process
![](./images/pod-webhook-mutate.png)

If a pod matches several resource groups, volcano selects the resource group according to the order of the groups.
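
A minimal Go sketch of order-based matching is shown below. It assumes that the groups are walked in the order in which they are listed and that the first match wins; the document does not spell out the exact ordering, so treat this as one plausible reading. The `pod` and `resourceGroup` types and both functions are hypothetical, and the sketch also assumes that matching any one listed value is sufficient.

```go
package main

import (
	"fmt"
	"strings"
)

// pod holds only the fields used for identification (hypothetical type).
type pod struct {
	Namespace   string
	Annotations map[string]string
}

// resourceGroup holds the identification part of a group (hypothetical type).
type resourceGroup struct {
	Name        string
	ObjectKey   string   // "namespace" or "annotation"
	ObjectValue []string // namespaces, or "key: value" annotation strings
}

// matches reports whether the pod belongs to the resource group.
func matches(p pod, g resourceGroup) bool {
	switch g.ObjectKey {
	case "namespace":
		for _, ns := range g.ObjectValue {
			if ns == p.Namespace {
				return true
			}
		}
	case "annotation":
		for _, kv := range g.ObjectValue {
			parts := strings.SplitN(kv, ":", 2)
			if len(parts) != 2 {
				continue // skip malformed entries
			}
			key, want := strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1])
			if got, ok := p.Annotations[key]; ok && got == want {
				return true
			}
		}
	}
	return false
}

// firstMatch walks the groups in order and returns the first one that matches.
func firstMatch(p pod, groups []resourceGroup) (resourceGroup, bool) {
	for _, g := range groups {
		if matches(p, g) {
			return g, true
		}
	}
	return resourceGroup{}, false
}

func main() {
	groups := []resourceGroup{
		{Name: "management", ObjectKey: "namespace", ObjectValue: []string{"mng-ns-1"}},
		{Name: "cpu", ObjectKey: "annotation", ObjectValue: []string{"volcano.sh/resource-group: cpu"}},
	}
	// This pod matches both groups; the earlier one is selected.
	p := pod{Namespace: "mng-ns-1", Annotations: map[string]string{"volcano.sh/resource-group": "cpu"}}
	if g, ok := firstMatch(p, groups); ok {
		fmt.Println("matched group:", g.Name)
	}
}
```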

## Usage
### Case 1
The default scheduler handles system daemons and the volcano scheduler handles business workloads.
The cluster is as follows:

|node| label | taint|
|----|-----|-----|
|node1| volcano.sh/nodetype: management| management-taint:NoSchedule|
|node2| none| none|

|pod | kind | namespace|
|----|----|----|
|deployment-A|deployment| mng-ns-1|
|volcano-job-B|volcano job | default|

1. Edit volcano-admission-configmap
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: management                    # set the resource group name
      object:
        key: namespace                             # set the field and the value to be matched
        value:
        - mng-ns-1
      schedulerName: default-scheduler             # set the scheduler for patching
      tolerations:                                 # set the tolerations for patching
      - effect: NoSchedule
        key: management-taint
        operator: Exists
      labels:                                      # set the nodeSelector for patching
        volcano.sh/nodetype: management
````

2. Submit deployment-A and volcano-job-B

3. Check the Pod information

````
deployment-A:
....
nodeSelector:
  volcano.sh/nodetype: management
...
schedulerName: default-scheduler
...
tolerations:
  - effect: NoSchedule
    key: management-taint
    operator: Exists

volcano-job-B:
....
  <none>
....
````
4. Check the result of the pod's scheduling
````
The pod in deployment-A is scheduled to node1.
The pod in volcano-job-B is scheduled to node2.
````
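
The placement in this case follows from standard Kubernetes nodeSelector and taint semantics. The Go sketch below models a simplified version of that check to show why each pod lands where it does. All types and functions are hypothetical, and only the `Exists` toleration operator and the `NoSchedule` effect are modeled.

```go
package main

import "fmt"

type taint struct{ Key, Effect string }

type toleration struct{ Key, Operator, Effect string }

type node struct {
	Name   string
	Labels map[string]string
	Taints []taint
}

type pod struct {
	NodeSelector map[string]string
	Tolerations  []toleration
}

// tolerates reports whether a toleration covers a taint. An empty key or
// effect on the toleration matches any key or effect.
func tolerates(tol toleration, t taint) bool {
	return tol.Operator == "Exists" &&
		(tol.Key == "" || tol.Key == t.Key) &&
		(tol.Effect == "" || tol.Effect == t.Effect)
}

// fits reports whether the pod may be placed on the node: every nodeSelector
// label must be present on the node and every NoSchedule taint must be
// tolerated by the pod.
func fits(p pod, n node) bool {
	for k, v := range p.NodeSelector {
		if got, ok := n.Labels[k]; !ok || got != v {
			return false
		}
	}
	for _, t := range n.Taints {
		if t.Effect != "NoSchedule" {
			continue
		}
		tolerated := false
		for _, tol := range p.Tolerations {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	node1 := node{
		Name:   "node1",
		Labels: map[string]string{"volcano.sh/nodetype": "management"},
		Taints: []taint{{Key: "management-taint", Effect: "NoSchedule"}},
	}
	node2 := node{Name: "node2"}

	// deployment-A after patching; volcano-job-B is left unpatched.
	deploymentA := pod{
		NodeSelector: map[string]string{"volcano.sh/nodetype": "management"},
		Tolerations:  []toleration{{Key: "management-taint", Operator: "Exists", Effect: "NoSchedule"}},
	}
	jobB := pod{}

	for _, n := range []node{node1, node2} {
		fmt.Printf("%s: deployment-A fits=%v, volcano-job-B fits=%v\n",
			n.Name, fits(deploymentA, n), fits(jobB, n))
	}
}
```

The patched deployment-A pod fits only node1, while the unpatched volcano-job-B pod is kept off node1 by the taint and fits only node2.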

### Case 2

The cluster is as follows:

|node|label|
|----|-----|
|node1| volcano.sh/nodetype: cpu|
|node2| volcano.sh/nodetype: gpu|

|volcano job | annotation|
|----|----|
|job-A|volcano.sh/resource-group: cpu|
|job-B|volcano.sh/resource-group: gpu|

1. Edit volcano-admission-configmap
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: cpu
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group: cpu"
      schedulerName: volcano
      labels:
        volcano.sh/nodetype: cpu
    - resourceGroup: gpu
      schedulerName: volcano
      labels:
        volcano.sh/nodetype: gpu
````
2. Submit job-A and job-B

3. Check the Pod information

````
job-A:
....
nodeSelector:
  volcano.sh/nodetype: cpu
...
schedulerName: volcano
....

job-B:
....
nodeSelector:
  volcano.sh/nodetype: gpu
...
schedulerName: volcano
....
````
4. Check the result of the pod's scheduling
````
The pod in job-A is scheduled to node1.
The pod in job-B is scheduled to node2.
````

### Case 3

The cluster is as follows:

|node|label|
|----|-----|
|node1| volcano.sh/nodetype: fixed|
|node2| none|

|volcano job | annotation|
|----|----|
|job-A|volcano.sh/resource-group-job-role: master|
|job-B|none|

1. Edit volcano-admission-configmap
````
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-admission-configmap
  namespace: volcano-system
data:
  volcano-admission.conf: |
    resourceGroups:
    - resourceGroup: fixed
      schedulerName: volcano
      object:
        key: annotation
        value:
        - "volcano.sh/resource-group-job-role: master"
      # set the affinity for patching, the format is a json string.
      affinity: "{\"nodeAffinity\":{\"requiredDuringSchedulingIgnoredDuringExecution\":{\"nodeSelectorTerms\":[{\"matchExpressions\":[{\"key\":\"volcano.sh/nodetype\",\"operator\":\"In\",\"values\":[\"fixed\"]}]}]}}}"
````
2. Submit job-A and job-B

3. Check the Pod information

````
job-A:
....
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: volcano.sh/nodetype
          operator: In
          values:
          - fixed
...
schedulerName: volcano
....

job-B:
....
...
schedulerName: volcano
....
````
4. Check the result of the pod's scheduling
````
The pod in job-A is scheduled to node1.
The pod in job-B is scheduled to node1 or node2.
````

## NOTE

Enabling this feature may modify pod information and affect resource utilization.
- The feature divides the cluster resources, which may decrease resource usage.
- The feature adds some additional information to the pods, such as tolerations and nodeSelector data.