# Support MinAvailable For Task Level

@[shinytang6](https://github.com/shinytang6); Nov 19th, 2020

## Motivation
As mentioned in [issue 988](https://github.com/volcano-sh/volcano/issues/988), Volcano should support setting `minAvailable` at the task level.

## Design

### Support minAvailable for task

Before describing the specific implementation, I will first explain the purpose of the `MinAvailable` field, which is currently supported only at the job level:

1. `MinAvailable` decides whether a vcjob can be scheduled during gang scheduling. If sumof(valid tasks) >= job.minAvailable, the job is considered valid (it can be scheduled).
2. `MinAvailable` decides the final status of the vcjob. For example, if the number of successful tasks in a finished job >= job.minAvailable, the job status is set to `Completed`; otherwise it is set to `Failed`.
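
The two job-level rules above can be sketched in Go as follows. This is a minimal illustration, not Volcano's scheduler or controller code: `jobValid` and `finalJobPhase` are hypothetical helpers, and job phases are plain strings rather than Volcano's own phase type.

```go
package main

import "fmt"

// jobValid mirrors rule 1 (hypothetical helper): during gang scheduling a
// job is schedulable only if the number of valid tasks reaches minAvailable.
func jobValid(validTasks, minAvailable int32) bool {
	return validTasks >= minAvailable
}

// finalJobPhase mirrors rule 2 (hypothetical helper): a finished job is
// Completed when enough tasks succeeded, otherwise Failed.
func finalJobPhase(succeeded, minAvailable int32) string {
	if succeeded >= minAvailable {
		return "Completed"
	}
	return "Failed"
}

func main() {
	fmt.Println(jobValid(3, 4))      // false: one valid task short of minAvailable
	fmt.Println(finalJobPhase(4, 4)) // Completed
}
```

Both rules only look at job-wide totals, so they cannot tell whether the counted tasks all belong to one role of the job; this is the gap the task-level field addresses.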

To support `minAvailable` at the task level, I think the following changes are involved:
1. Define a `minAvailable` field at the task level, and validate it and set its default value in the webhook.

    The new field is as follows:
    ```
    // TaskSpec specifies the task specification of Job.
    type TaskSpec struct {
        ...

        // The minimal available pods to run for this Task
        // Defaults to the task replicas
        // +optional
        MinAvailable *int32 `json:"minAvailable,omitempty" protobuf:"bytes,2,opt,name=minAvailable"`
    }
    ```

2. Since the scheduler is not aware of jobs and schedules based on `PodGroup`, we need to add a field to `PodGroup` that describes the minMember of each task.

    The new field is as follows:
    ```
    // PodGroupSpec represents the template of a pod group.
    type PodGroupSpec struct {
        ...

        // MinTaskMember defines the minimal number of pods to run each task in the pod group;
        // if there are not enough resources to start each task, the scheduler
        // will not start any of them.
        MinTaskMember map[string]int32
    }
    ```

3. Add a new check to the current gang-scheduling logic. Currently, a job is considered valid (it can be scheduled) if sumof(valid tasks) >= job.minAvailable. A new check runs before that one: for each task whose `minAvailable` field is set, the task must satisfy (valid pods of the task) >= job.task.minAvailable; only then can the job be considered valid.

4. Modify how the job status is determined. Currently, if sumof(successful pods) >= job.minAvailable, the job status is set to `Completed`. This change may require another field in `JobStatus` to record the status of the pods under each task, because `JobStatus` currently records only the number of pods in each state and cannot tell which task a pod belongs to.

    The newly added fields are as follows:

    ```
    // TaskState contains details for the current state of the task.
    type TaskState struct {
        // The phase of Task.
        // +optional
        Phase map[v1.PodPhase]int32 `json:"phase,omitempty" protobuf:"bytes,11,opt,name=phase"`
    }

    // JobStatus represents the current status of a Job.
    type JobStatus struct {
        ...

        // The status of pods for each task
        // +optional
        TaskStatusCount map[string]TaskState `json:"taskStatusCount,omitempty" protobuf:"bytes,21,opt,name=taskStatusCount"`
    }
    ```
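
Steps 1, 3 and 4 can be sketched together in Go. This is a minimal sketch under stated assumptions, not Volcano's implementation: `TaskSpec` and `TaskState` are trimmed-down stand-ins (pod phases are plain strings instead of `v1.PodPhase`), the helpers `defaultMinAvailable`, `meetsMinimums`, `gangReady` and `jobPhase` are hypothetical, and the completion rule (succeeded pods must meet every task minimum as well as the job-level minimum) is an assumption, since the design does not spell it out.

```go
package main

import "fmt"

// TaskSpec is a hypothetical, trimmed-down stand-in for the proposed field.
type TaskSpec struct {
	Name         string
	Replicas     int32
	MinAvailable *int32 // nil means the user did not set it
}

// TaskState mirrors the proposed per-task status; pod phases are plain
// strings here instead of v1.PodPhase (assumption for self-containment).
type TaskState struct {
	Phase map[string]int32
}

// defaultMinAvailable mimics step 1: the webhook defaults an unset
// task-level minAvailable to the task replicas.
func defaultMinAvailable(t *TaskSpec) {
	if t.MinAvailable == nil {
		r := t.Replicas
		t.MinAvailable = &r
	}
}

// meetsMinimums reports whether every task listed in minTaskMember has at
// least that many counted pods and the total across tasks reaches jobMin.
func meetsMinimums(perTask, minTaskMember map[string]int32, jobMin int32) bool {
	for task, need := range minTaskMember {
		if perTask[task] < need {
			return false
		}
	}
	var total int32
	for _, n := range perTask {
		total += n
	}
	return total >= jobMin
}

// gangReady applies step 3: per-task minimums are checked before the
// existing job-level check, using counts of valid pods.
func gangReady(validPerTask, minTaskMember map[string]int32, jobMin int32) bool {
	return meetsMinimums(validPerTask, minTaskMember, jobMin)
}

// jobPhase applies step 4 using the per-task "Succeeded" counts that a
// TaskStatusCount-like map would record.
func jobPhase(taskStatusCount map[string]TaskState, minTaskMember map[string]int32, jobMin int32) string {
	succeeded := make(map[string]int32, len(taskStatusCount))
	for task, state := range taskStatusCount {
		succeeded[task] = state.Phase["Succeeded"]
	}
	if meetsMinimums(succeeded, minTaskMember, jobMin) {
		return "Completed"
	}
	return "Failed"
}

func main() {
	ps := TaskSpec{Name: "ps", Replicas: 2}
	defaultMinAvailable(&ps) // ps.MinAvailable becomes 2
	minTaskMember := map[string]int32{ps.Name: *ps.MinAvailable, "worker": 3}

	// 5 valid pods in total (>= 4), but "ps" has only 1 of its required 2.
	fmt.Println(gangReady(map[string]int32{"ps": 1, "worker": 4}, minTaskMember, 4)) // false

	status := map[string]TaskState{
		"ps":     {Phase: map[string]int32{"Succeeded": 2}},
		"worker": {Phase: map[string]int32{"Succeeded": 3}},
	}
	fmt.Println(jobPhase(status, minTaskMember, 4)) // Completed
}
```

Sharing one predicate between scheduling and status keeps the two checks consistent: a job whose total count is high enough is still held back, or marked `Failed`, when a single task falls short of its own minimum.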