volcano.sh/volcano@v1.9.0/docs/design/podgroup-status.md

# PodGroup Status Enhancement

@k82cn; Jan 2, 2019

## Table of Contents

* [Table of Contents](#table-of-contents)
* [Motivation](#motivation)
* [Function Detail](#function-detail)
* [Feature Interaction](#feature-interaction)
   * [Cluster AutoScale](#cluster-autoscale)
   * [Operators/Controllers](#operatorscontrollers)
* [Reference](#reference)

## Motivation

In the [Coscheduling v1alpha1](https://github.com/kubernetes/enhancements/pull/639) design, the `PodGroup` status
only includes counters of related pods, which is not enough for `PodGroup` lifecycle management. This design doc
adds more information to the `PodGroup` status for lifecycle management, e.g. `PodGroupPhase`.

## Function Detail

To expose more information about a PodGroup's current status/phase, the following types are introduced:

```go
import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PodGroupPhase is the phase of a pod group at the current time.
type PodGroupPhase string

// These are the valid phases of a PodGroup.
const (
    // PodGroupPending means the pod group has been accepted by the system, but the scheduler
    // cannot allocate enough resources to it.
    PodGroupPending PodGroupPhase = "Pending"

    // PodGroupRunning means `spec.minMember` pods of the PodGroup are in the running phase.
    PodGroupRunning PodGroupPhase = "Running"

    // PodGroupUnknown means some of the `spec.minMember` pods are running but the rest cannot
    // be scheduled, e.g. not enough resources; the scheduler will wait for the related
    // controller to recover it.
    PodGroupUnknown PodGroupPhase = "Unknown"
)

// PodGroupConditionType is the type of a PodGroup condition.
type PodGroupConditionType string

const (
    // PodGroupUnschedulableType is the condition type for an unschedulable PodGroup.
    PodGroupUnschedulableType PodGroupConditionType = "Unschedulable"
)

// PodGroupCondition contains details for the current state of this pod group.
type PodGroupCondition struct {
    // Type is the type of the condition.
    Type PodGroupConditionType `json:"type,omitempty" protobuf:"bytes,1,opt,name=type"`

    // Status is the status of the condition.
    Status v1.ConditionStatus `json:"status,omitempty" protobuf:"bytes,2,opt,name=status"`

    // The ID of the condition transition.
    TransitionID string `json:"transitionID,omitempty" protobuf:"bytes,3,opt,name=transitionID"`

    // Last time the phase transitioned from another phase to the current one.
    // +optional
    LastTransitionTime metav1.Time `json:"lastTransitionTime,omitempty" protobuf:"bytes,4,opt,name=lastTransitionTime"`

    // Unique, one-word, CamelCase reason for the phase's last transition.
    // +optional
    Reason string `json:"reason,omitempty" protobuf:"bytes,5,opt,name=reason"`

    // Human-readable message indicating details about the last transition.
    // +optional
    Message string `json:"message,omitempty" protobuf:"bytes,6,opt,name=message"`
}

const (
    // PodFailedReason is reported if a pod of the PodGroup failed.
    PodFailedReason string = "PodFailed"

    // PodDeletedReason is reported if a pod of the PodGroup was deleted.
    PodDeletedReason string = "PodDeleted"

    // NotEnoughResourcesReason is reported if there are not enough resources to schedule pods.
    NotEnoughResourcesReason string = "NotEnoughResources"

    // NotEnoughPodsReason is reported if there are not enough tasks compared to `spec.minMember`.
    NotEnoughPodsReason string = "NotEnoughTasks"
)

// PodGroupStatus represents the current state of a pod group.
type PodGroupStatus struct {
    // Current phase of PodGroup.
    Phase PodGroupPhase `json:"phase,omitempty" protobuf:"bytes,1,opt,name=phase"`

    // The conditions of PodGroup.
    // +optional
    Conditions []PodGroupCondition `json:"conditions,omitempty" protobuf:"bytes,2,opt,name=conditions"`

    // The number of actively running pods.
    // +optional
    Running int32 `json:"running,omitempty" protobuf:"bytes,3,opt,name=running"`

    // The number of pods which reached phase Succeeded.
    // +optional
    Succeeded int32 `json:"succeeded,omitempty" protobuf:"bytes,4,opt,name=succeeded"`

    // The number of pods which reached phase Failed.
    // +optional
    Failed int32 `json:"failed,omitempty" protobuf:"bytes,5,opt,name=failed"`
}
```
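
As an illustration of how a condition might be recorded in `PodGroupStatus.Conditions`, the following is a minimal sketch rather than the scheduler's actual code. It uses simplified local stand-ins for the types above (plain `string` and `time.Time` instead of `v1.ConditionStatus` and `metav1.Time`) so that it compiles without Kubernetes dependencies. The helper `setCondition` and its replace-or-append policy are assumptions made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the design's types. Field names follow the design;
// the plain string and time.Time field types are assumptions of this sketch.
type PodGroupConditionType string

const PodGroupUnschedulableType PodGroupConditionType = "Unschedulable"

const NotEnoughResourcesReason = "NotEnoughResources"

type PodGroupCondition struct {
	Type               PodGroupConditionType
	Status             string // "True", "False" or "Unknown"
	TransitionID       string
	LastTransitionTime time.Time
	Reason             string
	Message            string
}

type PodGroupStatus struct {
	Conditions []PodGroupCondition
}

// setCondition is a hypothetical helper: it replaces the existing condition
// of the same type, or appends the condition if none exists yet.
func setCondition(status *PodGroupStatus, cond PodGroupCondition) {
	for i := range status.Conditions {
		if status.Conditions[i].Type == cond.Type {
			status.Conditions[i] = cond
			return
		}
	}
	status.Conditions = append(status.Conditions, cond)
}

func main() {
	var status PodGroupStatus
	setCondition(&status, PodGroupCondition{
		Type:               PodGroupUnschedulableType,
		Status:             "True",
		LastTransitionTime: time.Now(),
		Reason:             NotEnoughResourcesReason,
		Message:            "not enough resources to schedule minMember pods",
	})
	for _, c := range status.Conditions {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

Keeping one entry per condition type keeps the list short; a design that wants a history of transitions would append instead.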

According to the PodGroup's lifecycle, the following phase transitions are valid. The related
reason is recorded in the `Reason` field.

| From    | To            | Reason  |
|---------|---------------|---------|
| Pending | Running       | When all `spec.minMember` pods are running |
| Running | Unknown       | When some of the `spec.minMember` pods are restarted but cannot be rescheduled |
| Unknown | Pending       | When all pods (`spec.minMember`) in the PodGroup are deleted |
   120  ## Feature Interaction
   121  
   122  ### Cluster AutoScale
   123  
   124  [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) is a tool that
   125  automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:
   126  
   127  * there are pods that failed to run in the cluster due to insufficient resources,
   128  * there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
   129  
   130  When Cluster-Autoscaler scale-out a new node, it leverage predicates in scheduler to check whether the new node can be
   131  scheduled. But Coscheduling is not an implementation of predicates for now; so it'll not work well together with
   132  Cluster-Autoscaler right now. Alternative solution will be proposed later for that.
   133  
### Operators/Controllers

The lifecycle of a `PodGroup` is managed by operators/controllers; the scheduler only reports the related state to
them. For example, if the `PodGroup` of an MPI job is `Unknown`, the controller needs to restart all pods in the `PodGroup`.
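
A controller's reaction to the `Unknown` phase can be sketched as follows. This is a hedged illustration of the MPI example only: `podsToRestart` is a hypothetical helper, pods are represented by plain names, and a real controller would delete and recreate pods through the Kubernetes API instead of returning a list.

```go
package main

import "fmt"

type PodGroupPhase string

const (
	PodGroupPending PodGroupPhase = "Pending"
	PodGroupRunning PodGroupPhase = "Running"
	PodGroupUnknown PodGroupPhase = "Unknown"
)

// podsToRestart is a hypothetical controller-side helper for a gang-style job
// such as MPI: when the PodGroup is Unknown, every pod in the group is
// selected for restart; in any other phase nothing is restarted.
func podsToRestart(phase PodGroupPhase, pods []string) []string {
	if phase != PodGroupUnknown {
		return nil
	}
	// Copy so the caller's slice is not aliased.
	selected := make([]string, len(pods))
	copy(selected, pods)
	return selected
}

func main() {
	pods := []string{"mpi-worker-0", "mpi-worker-1"}
	fmt.Println(len(podsToRestart(PodGroupUnknown, pods))) // 2
	fmt.Println(len(podsToRestart(PodGroupRunning, pods))) // 0
}
```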

## Reference

* [Coscheduling](https://github.com/kubernetes/enhancements/pull/639)
* [Add phase/conditions into PodGroup.Status](https://github.com/kubernetes-sigs/kube-batch/issues/521)
* [Add Pod Condition and unblock cluster autoscaler](https://github.com/kubernetes-sigs/kube-batch/issues/526)