# Enhance Generate PodGroup OwnerReferences for Normal Pod

## Background
The Volcano scheduler can schedule normal workloads such as a single Pod, Deployment, CronJob, and StatefulSet, as well as workloads defined by other CRDs. For these normal pods, the Volcano controller creates a podgroup for the workload. For a single Pod, the controller sets the podgroup's `ownerReferences` to point to that pod; otherwise, the controller uses the pod's `ownerReferences` as the podgroup's.

For issue [2143](https://github.com/volcano-sh/volcano/issues/2143), when a Deployment is updated, it creates a new ReplicaSet, and the Volcano controller creates a new podgroup for that ReplicaSet. After several updates, the Deployment has many `Inqueue` podgroups. For example, it is not reasonable for one Deployment to have four podgroups (three in `Inqueue` status and one in `Running` status), all of which consume resources in the `overcommit` plugin.

<table>
  <tr>
    <th>Deployment</th>
    <th>ReplicaSet</th>
    <th>PodGroup</th>
  </tr>
  <tr>
    <td rowspan="4">deploy-with-volcano</td>
    <td>deploy-with-volcano-7bd985746b</td>
    <td>podgroup-4f5210d9-3cba-4a59-b61c-1b243f464708</td>
  </tr>
  <tr>
    <td>deploy-with-volcano-7f95dd7984</td>
    <td>podgroup-17fec14a-232c-4e1c-942e-e4d1289fc60b</td>
  </tr>
  <tr>
    <td>deploy-with-volcano-fd9dd69b6</td>
    <td>podgroup-5edaeb31-f16e-49c4-ac81-4aa331ea8f19</td>
  </tr>
  <tr>
    <td>deploy-with-volcano-995f85c44</td>
    <td>podgroup-0296521d-9dad-4d68-8abf-2d4b32f3bfbc</td>
  </tr>
</table>

Let's review how the Volcano controller creates a podgroup:

## Solution
Some workloads control pods directly, such as Job and SparkApplication. For these workloads, we can use the pod's `ownerReferences` as the podgroup's `ownerReferences`.
Other workloads control pods through intermediate resources: a Deployment controls pods through a ReplicaSet, and a Workflow controls pods through other resources.

The Volcano controller will introduce a configmap that stores an array of (apiversion, kind, adjust-level) rules. Together, apiversion and kind form a GVK (group/version/kind). adjust-level is an integer that is used for workflow cases.

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-controller-configmap
  namespace: volcano-system
data:
  volcano-controller.conf: |
    podgroup-level-rules:
      - apiversion: apps/v1
        kind: deployment
        adjust-level: 0
      - apiversion: batch/v1
        kind: job
        adjust-level: 0
      - apiversion: argoproj.io/v1alpha1
        kind: workflow
        adjust-level: -1
```

In the workflow case, one workflow consists of many different resources (job/application), and each job/application consists of pods. The podgroup should therefore be created at the job/application level rather than the workflow level. We can use the workflow GVK with `adjust-level: -1` to achieve this: "-1" means the podgroup should sit exactly one level below this GVK.

When the controller needs to create a podgroup, it checks whether the apiversion and kind in the pod's `ownerReferences` match one of the configmap rules. If they do, the controller uses this `ownerReferences` as the podgroup's `ownerReferences`. If they do not, the controller uses the `dynamicClient` to check the parent resource's `ownerReferences` recursively until it finds one that matches a configmap rule or reaches the root `ownerReferences`.

### DiscoveryClient & DynamicClient
From the `ownerReferences`, we can build a GVK and get the owner resource name.
The discovery client deals with the Kubernetes resources served by the API server, so it can map a GVK to a GVR (group/version/resource). Once we have the GVR and the owner resource name, we can use the dynamic client to get the owner resource's `ownerReferences`.
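The first step above, deriving a GVK from an owner reference, can be sketched as follows. This is a minimal illustration in Python rather than the controller's actual code: the `GVK` type and the `gvk_from_owner_reference` helper are hypothetical names, and the owner reference is modelled as a plain dict with the `apiVersion` and `kind` fields that Kubernetes owner references carry. Mapping the GVK to a GVR requires the discovery client and a live API server, so that step is not modelled here.

```python
from typing import NamedTuple


class GVK(NamedTuple):
    """Group/version/kind triple (hypothetical helper type for this sketch)."""
    group: str
    version: str
    kind: str


def gvk_from_owner_reference(owner_ref: dict) -> GVK:
    """Split an owner reference's apiVersion into group and version.

    "apps/v1" has group "apps" and version "v1"; a bare "v1" belongs to
    the core group, which is represented by an empty group string.
    """
    group, _, version = owner_ref["apiVersion"].rpartition("/")
    return GVK(group=group, version=version, kind=owner_ref["kind"])


# Example owner reference as found on a pod created by a ReplicaSet.
replicaset_ref = {
    "apiVersion": "apps/v1",
    "kind": "ReplicaSet",
    "name": "deploy-with-volcano-7bd985746b",
}
replicaset_gvk = gvk_from_owner_reference(replicaset_ref)
```

The owner resource name (`replicaset_ref["name"]` here) is carried alongside the GVK, since the dynamic client needs both to fetch the owner object.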

### RBAC


## User Cases
Configmap:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-controller-configmap
  namespace: volcano-system
data:
  volcano-controller.conf: |
    podgroupLevel:
      - apiversion: apps/v1
        kind: deployment
        adjust-level: 0
      - apiversion: argoproj.io/v1alpha1
        kind: workflow
        adjust-level: -1
```
### 1. Single pod workload with scheduling.k8s.io/group-name annotation
The pod has the `scheduling.k8s.io/group-name` annotation, so the controller will not create a pg for it.

### 2. Single pod workload without scheduling.k8s.io/group-name annotation
The pod does not have an owner reference, so the controller will create a pg and set this pod as the pg's owner reference.

### 3. Argo k8s job workflow
Consider an Argo workflow such as:
```
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: k8s-jobs-
spec:
  entrypoint: pi-tmpl
  templates:
  - name: pi-tmpl
    resource:
      action: create
      successCondition: status.succeeded > 0
      failureCondition: status.failed > 3
      manifest: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          generateName: pi-job-
        spec:
          template:
            metadata:
              name: pi
            spec:
              containers:
              - name: pi
                image: perl
                command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
              restartPolicy: Never
          backoffLimit: 4
```
The Workflow creates a Job, and the pod's owner reference points to that Job. Because the configmap does not contain any information about workflow and job, the controller will create a pg and use the pod's owner reference (Job) as the pg's owner reference.

### 4. Deployment workload
The pod's owner reference points to a ReplicaSet, and the ReplicaSet's owner reference points to a Deployment. The controller searches the owner references recursively and finds that the ReplicaSet's owner reference (Deployment) matches one of the configmap rules.
The controller will use this Deployment as the pg's owner reference.
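
The four cases above can be traced with a small in-memory model of the owner lookup. This is a sketch under stated assumptions, not the controller's implementation: the cluster is replaced by a hypothetical `OBJECTS` dict, `resolve_podgroup_owner` and the sample object names are invented for illustration, only the first owner reference of each object is followed, the rule `kind` is compared case-insensitively, and when no rule matches the walk falls back to the topmost owner it reached.

```python
from typing import Dict, List, Optional, Tuple

Ref = Dict[str, str]  # owner reference: apiVersion, kind, name
GROUP_NAME_ANNOTATION = "scheduling.k8s.io/group-name"


def ref(api_version: str, kind: str, name: str) -> Ref:
    return {"apiVersion": api_version, "kind": kind, "name": name}


def key(r: Ref) -> Tuple[str, str, str]:
    return (r["apiVersion"], r["kind"], r["name"])


def match_rule(r: Ref, rules: List[dict]) -> Optional[int]:
    """Return the adjust-level of the first matching rule, else None."""
    for rule in rules:
        # Assumption: kind is compared case-insensitively, because the
        # configmap writes "deployment" while objects use "Deployment".
        if (rule["apiversion"] == r["apiVersion"]
                and rule["kind"].lower() == r["kind"].lower()):
            return int(rule["adjust-level"])
    return None


def resolve_podgroup_owner(pod: Ref, objects: dict, rules: List[dict]) -> Optional[Ref]:
    """Pick the object a new pg should reference; None means no pg is created."""
    if GROUP_NAME_ANNOTATION in objects[key(pod)].get("annotations", {}):
        return None  # case 1: the pod already names its group
    chain = [pod]  # chain[0] is the pod, later entries are its ancestors
    while True:
        owners = objects[key(chain[-1])].get("ownerReferences", [])
        if not owners:
            return chain[-1]  # root reached without a match (cases 2 and 3)
        chain.append(owners[0])  # assumption: follow the first owner only
        level = match_rule(owners[0], rules)
        if level is not None:
            # adjust-level 0 keeps the matched owner, -1 steps one level down.
            return chain[max(len(chain) - 1 + level, 0)]


RULES = [
    {"apiversion": "apps/v1", "kind": "deployment", "adjust-level": 0},
    {"apiversion": "argoproj.io/v1alpha1", "kind": "workflow", "adjust-level": -1},
]

# Hypothetical objects; the pod and Job names are made up for this sketch.
DEPLOY = ref("apps/v1", "Deployment", "deploy-with-volcano")
RS = ref("apps/v1", "ReplicaSet", "deploy-with-volcano-7bd985746b")
DEPLOY_POD = ref("v1", "Pod", "deploy-pod")
JOB = ref("batch/v1", "Job", "pi-job-example")
JOB_POD = ref("v1", "Pod", "pi-job-pod")
SINGLE_POD = ref("v1", "Pod", "single-pod")
ANNOTATED_POD = ref("v1", "Pod", "annotated-pod")

OBJECTS = {
    key(DEPLOY): {},
    key(RS): {"ownerReferences": [DEPLOY]},
    key(DEPLOY_POD): {"ownerReferences": [RS]},
    key(JOB): {},
    key(JOB_POD): {"ownerReferences": [JOB]},
    key(SINGLE_POD): {},
    key(ANNOTATED_POD): {"annotations": {GROUP_NAME_ANNOTATION: "my-group"}},
}
```

With these sample objects, `resolve_podgroup_owner(DEPLOY_POD, OBJECTS, RULES)` yields the Deployment reference, so every ReplicaSet created by later updates would resolve to the same owner.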