# PodGroup Status Enhancement

@k82cn; Jan 2, 2019

## Table of Contents

* [Table of Contents](#table-of-contents)
* [Motivation](#motivation)
* [Function Detail](#function-detail)
* [Feature Interaction](#feature-interaction)
    * [Cluster AutoScale](#cluster-autoscale)
    * [Operators/Controllers](#operatorscontrollers)
* [Reference](#reference)

## Motivation

In the [Coscheduling v1alpha1](https://github.com/kubernetes/enhancements/pull/639) design, `PodGroup`'s status
only includes counters of related pods, which is not enough for `PodGroup` lifecycle management. This design doc
introduces more information about a `PodGroup`'s status for lifecycle management, e.g. `PodGroupPhase`.

## Function Detail

To describe the current status/phase of a PodGroup in more detail, the following types are introduced:

```go
// In the types below, v1 refers to k8s.io/api/core/v1 and metav1 refers to
// k8s.io/apimachinery/pkg/apis/meta/v1.

// PodGroupPhase is the phase of a pod group at the current time.
type PodGroupPhase string

// These are the valid phases of podGroups.
const (
	// PodGroupPending means the pod group has been accepted by the system, but the scheduler cannot allocate
	// enough resources to it.
	PodGroupPending PodGroupPhase = "Pending"

	// PodGroupRunning means `spec.minMember` pods of the PodGroup are in the running phase.
	PodGroupRunning PodGroupPhase = "Running"

	// PodGroupUnknown means some of the `spec.minMember` pods are running but the rest cannot
	// be scheduled, e.g. not enough resources; the scheduler will wait for the related controller to recover it.
	PodGroupUnknown PodGroupPhase = "Unknown"
)

// PodGroupConditionType is the type of a PodGroupCondition.
type PodGroupConditionType string

const (
	// PodGroupUnschedulableType marks a condition reporting that the pod group is unschedulable.
	PodGroupUnschedulableType PodGroupConditionType = "Unschedulable"
)

// PodGroupCondition contains details for the current state of this pod group.
type PodGroupCondition struct {
	// Type is the type of the condition.
	Type PodGroupConditionType `json:"type,omitempty" protobuf:"bytes,1,opt,name=type"`

	// Status is the status of the condition.
	Status v1.ConditionStatus `json:"status,omitempty" protobuf:"bytes,2,opt,name=status"`

	// The ID of the condition transition.
	TransitionID string `json:"transitionID,omitempty" protobuf:"bytes,3,opt,name=transitionID"`

	// Last time the phase transitioned from another phase to the current one.
	// +optional
	LastTransitionTime metav1.Time `json:"lastTransitionTime,omitempty" protobuf:"bytes,4,opt,name=lastTransitionTime"`

	// Unique, one-word, CamelCase reason for the phase's last transition.
	// +optional
	Reason string `json:"reason,omitempty" protobuf:"bytes,5,opt,name=reason"`

	// Human-readable message indicating details about the last transition.
	// +optional
	Message string `json:"message,omitempty" protobuf:"bytes,6,opt,name=message"`
}

const (
	// PodFailedReason is probed if a pod of the PodGroup failed.
	PodFailedReason string = "PodFailed"

	// PodDeletedReason is probed if a pod of the PodGroup was deleted.
	PodDeletedReason string = "PodDeleted"

	// NotEnoughResourcesReason is probed if there are not enough resources to schedule pods.
	NotEnoughResourcesReason string = "NotEnoughResources"

	// NotEnoughPodsReason is probed if there are not enough tasks compared to `spec.minMember`.
	NotEnoughPodsReason string = "NotEnoughTasks"
)

// PodGroupStatus represents the current state of a pod group.
type PodGroupStatus struct {
	// Current phase of PodGroup.
	Phase PodGroupPhase `json:"phase,omitempty" protobuf:"bytes,1,opt,name=phase"`

	// The conditions of PodGroup.
	// +optional
	Conditions []PodGroupCondition `json:"conditions,omitempty" protobuf:"bytes,2,opt,name=conditions"`

	// The number of actively running pods.
	// +optional
	Running int32 `json:"running,omitempty" protobuf:"bytes,3,opt,name=running"`

	// The number of pods which reached phase Succeeded.
	// +optional
	Succeeded int32 `json:"succeeded,omitempty" protobuf:"bytes,4,opt,name=succeeded"`

	// The number of pods which reached phase Failed.
	// +optional
	Failed int32 `json:"failed,omitempty" protobuf:"bytes,5,opt,name=failed"`
}
```

According to the PodGroup's lifecycle, the following phase/state transitions are reasonable, and the related
reasons will be appended to the `Reason` field.

| From    | To      | Reason  |
|---------|---------|---------|
| Pending | Running | When all `spec.minMember` pods are running |
| Running | Unknown | When some of the `spec.minMember` pods are restarted but cannot be rescheduled |
| Unknown | Pending | When all pods (`spec.minMember`) in the PodGroup are deleted |

## Feature Interaction

### Cluster AutoScale

[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) is a tool that
automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

* there are pods that failed to run in the cluster due to insufficient resources,
* there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.

When Cluster Autoscaler scales out a new node, it leverages the scheduler's predicates to check whether pods can be
scheduled onto the new node. Coscheduling is not currently implemented as a predicate, so it does not work well
together with Cluster Autoscaler for now. An alternative solution will be proposed later.

### Operators/Controllers

The lifecycle of a `PodGroup` is managed by operators/controllers; the scheduler only probes the related state for
controllers. For example, if a `PodGroup` is `Unknown` for an MPI job, the controller needs to restart all pods in the `PodGroup`.

## Reference

* [Coscheduling](https://github.com/kubernetes/enhancements/pull/639)
* [Add phase/conditions into PodGroup.Status](https://github.com/kubernetes-sigs/kube-batch/issues/521)
* [Add Pod Condition and unblock cluster autoscaler](https://github.com/kubernetes-sigs/kube-batch/issues/526)
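
As an illustration of the phase transitions listed under Function Detail (Pending to Running, Running to Unknown, Unknown to Pending), the following is a minimal sketch of how a controller might check a proposed phase change against that list. The names `allowedTransitions` and `ValidateTransition` are hypothetical and are not part of the scheduler's API. Treating an unchanged phase as valid and rejecting every transition the design does not list are assumptions of this sketch, since the design only calls the listed transitions "reasonable".

```go
package main

import "fmt"

// PodGroupPhase and its values mirror the types defined in this design doc.
type PodGroupPhase string

const (
	PodGroupPending PodGroupPhase = "Pending"
	PodGroupRunning PodGroupPhase = "Running"
	PodGroupUnknown PodGroupPhase = "Unknown"
)

// allowedTransitions is a hypothetical lookup table that encodes the three
// transitions listed in the design: from-phase -> set of to-phases.
var allowedTransitions = map[PodGroupPhase]map[PodGroupPhase]bool{
	PodGroupPending: {PodGroupRunning: true},
	PodGroupRunning: {PodGroupUnknown: true},
	PodGroupUnknown: {PodGroupPending: true},
}

// ValidateTransition is a hypothetical helper. It returns nil when the phase
// is unchanged (an assumption of this sketch) or when from -> to is one of
// the listed transitions, and an error otherwise.
func ValidateTransition(from, to PodGroupPhase) error {
	if from == to {
		return nil
	}
	// Indexing a missing inner map yields false, so unknown phases are rejected.
	if allowedTransitions[from][to] {
		return nil
	}
	return fmt.Errorf("transition %s -> %s is not listed in the design", from, to)
}

func main() {
	// A listed transition is accepted; an unlisted one returns an error.
	fmt.Println(ValidateTransition(PodGroupPending, PodGroupRunning))
	fmt.Println(ValidateTransition(PodGroupPending, PodGroupUnknown))
}
```

Keeping the transitions in a lookup table rather than a chain of `if` statements means a newly agreed transition only needs one more map entry.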