volcano.sh/volcano@v1.9.0/docs/design/Enhance-Generate-PodGroup-OwnerReferences-for-Normal-Pod.md (about)

     1  # Enhance Generate PodGroup OwnerReferences for Normal Pod
     2  
     3  ## Background
     4  The Volcano scheduler can schedule normal workloads such as a single Pod, Deployment, CronJob, or StatefulSet, as well as workloads managed by other CRDs. For the pods of these workloads, the Volcano controller creates a PodGroup. For a single Pod, the controller sets the PodGroup's `ownerReferences` to point to that Pod; otherwise, the controller uses the Pod's `ownerReferences` as the PodGroup's.
     5  
     6  As reported in issue [2143](https://github.com/volcano-sh/volcano/issues/2143), every update of a Deployment creates a new ReplicaSet, and the Volcano controller creates a new PodGroup for that ReplicaSet. After several updates, the Deployment has accumulated many `Inqueue` PodGroups. For example, it is not reasonable for one Deployment to have four PodGroups (three in `Inqueue` status and one in `Running` status), all of which consume resources in the `overcommit` plugin.
     7  
     8  <table>
     9  	<tr>
    10  	    <th>Deployment</th>
    11  	    <th>ReplicaSet</th>
    12  	    <th>PodGroup</th>  
    13  	</tr>
    14  	<tr>
    15  	    <td rowspan="4">deploy-with-volcano</td>
    16  	    <td>deploy-with-volcano-7bd985746b</td>
    17  	    <td>podgroup-4f5210d9-3cba-4a59-b61c-1b243f464708</td>
    18  	</tr>
    19  	<tr>
    20  	    <td>deploy-with-volcano-7f95dd7984</td>
    21  	    <td>podgroup-17fec14a-232c-4e1c-942e-e4d1289fc60b</td>
    22  	</tr>
    23  	<tr>
    24  	    <td>deploy-with-volcano-fd9dd69b6</td>
    25  	    <td>podgroup-5edaeb31-f16e-49c4-ac81-4aa331ea8f19</td>
    26  	</tr>
    27  	<tr>
    28  	    <td>deploy-with-volcano-995f85c44</td>
    29  	    <td>podgroup-0296521d-9dad-4d68-8abf-2d4b32f3bfbc</td>
    30  	</tr>
    31  </table>
    32  
    33  Let us review how the Volcano controller currently creates a PodGroup:
    34  
    35  ![set pg ownerreference original-process](./images/original-process.png) 
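
The current behavior described in the Background can be sketched roughly as follows. This is a simplified model, not the controller's actual code: `OwnerRef`, `Pod`, and `originalPodGroupOwners` are hypothetical names, and the pod name suffix is made up for illustration.

```go
package main

import "fmt"

// OwnerRef is a simplified, hypothetical stand-in for an owner reference.
type OwnerRef struct {
	APIVersion, Kind, Name string
}

// Pod is a simplified, hypothetical stand-in for a Pod object.
type Pod struct {
	Name   string
	Owners []OwnerRef
}

// originalPodGroupOwners models the current rule: a Pod without owners
// becomes the PodGroup's owner; otherwise the Pod's owners are copied.
func originalPodGroupOwners(p Pod) []OwnerRef {
	if len(p.Owners) == 0 {
		return []OwnerRef{{APIVersion: "v1", Kind: "Pod", Name: p.Name}}
	}
	return append([]OwnerRef(nil), p.Owners...)
}

func main() {
	// Two ReplicaSets of the same Deployment lead to two different
	// PodGroup owners, hence two PodGroups.
	replicaSets := []string{
		"deploy-with-volcano-7bd985746b",
		"deploy-with-volcano-7f95dd7984",
	}
	for _, rs := range replicaSets {
		p := Pod{Name: rs + "-abcde", Owners: []OwnerRef{{"apps/v1", "ReplicaSet", rs}}}
		fmt.Println(originalPodGroupOwners(p)[0].Name)
	}
}
```

Because the PodGroup owner is the ReplicaSet rather than the Deployment, each rollout adds another PodGroup, which matches the table above.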
    36  
    37  ## Solution
    38  Some workloads, such as Job and SparkApplication, control pods directly. For these workloads, we can use the pod's `ownerReferences` as the PodGroup's `ownerReferences`. Other workloads control pods through intermediate resources: a Deployment controls pods through a ReplicaSet, and a Workflow controls pods through the resources it creates.
    39  
    40  The Volcano controller will introduce a ConfigMap that stores an array of (apiversion, kind, adjust-level) rules. `apiversion` and `kind` together identify a GVK (group/version/kind). `adjust-level` is an integer that is used for workflow cases.
    41  
    42  ```
    43  apiVersion: v1
    44  kind: ConfigMap
    45  metadata:
    46    name: volcano-controller-configmap
    47    namespace: volcano-system
    48  data:
    49    volcano-controller.conf: |
    50      podgroup-level-rules:
    51      - apiversion: apps/v1                    
    52        kind: deployment
    53        adjust-level: 0
    54      - apiversion: batch/v1                    
    55        kind: job
    56        adjust-level: 0
    57      - apiversion: argoproj.io/v1alpha1                   
    58        kind: workflow
    59        adjust-level: -1
    60  ```
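
As a rough illustration of how these rules might be represented and looked up in the controller, consider the sketch below. The `Rule` type and `matchRule` function are hypothetical, and the case-insensitive comparison of kinds is an assumption: the ConfigMap writes `deployment` while an owner reference carries `Deployment`, and the design does not state how the two are compared.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule mirrors one entry of podgroup-level-rules (hypothetical type).
type Rule struct {
	APIVersion  string
	Kind        string
	AdjustLevel int
}

// rules reproduces the three entries of the ConfigMap above.
var rules = []Rule{
	{"apps/v1", "deployment", 0},
	{"batch/v1", "job", 0},
	{"argoproj.io/v1alpha1", "workflow", -1},
}

// matchRule returns the rule for the given apiVersion and kind, if any.
// Assumption: kinds are compared case-insensitively.
func matchRule(rules []Rule, apiVersion, kind string) (Rule, bool) {
	for _, r := range rules {
		if r.APIVersion == apiVersion && strings.EqualFold(r.Kind, kind) {
			return r, true
		}
	}
	return Rule{}, false
}

func main() {
	r, ok := matchRule(rules, "argoproj.io/v1alpha1", "Workflow")
	fmt.Println(ok, r.AdjustLevel) // a Workflow owner matches with adjust-level -1

	_, ok = matchRule(rules, "apps/v1", "ReplicaSet")
	fmt.Println(ok) // a ReplicaSet owner matches no rule
}
```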
    61  
    62  In the workflow case, one workflow consists of many different resources (job/application), and each job/application consists of pods. The PodGroup should therefore be matched at the job/application level rather than the workflow level. We can list the workflow GVK with `adjust-level: -1` to achieve this, where `-1` means that the PodGroup is owned by the resource one level below this GVK.
    63  
    64  
    65  When the controller needs to create a PodGroup, it checks whether the apiversion and kind of the pod's `ownerReferences` match one of the ConfigMap rules. If they do, it uses this `ownerReferences` entry as the PodGroup's `ownerReferences`. If not, the controller uses `dynamicClient` to check the parent resource's `ownerReferences` recursively until it finds one that matches a ConfigMap rule or reaches the root `ownerReferences`.
    66   
    67  ![set pg ownerreference workflow](./images/workflow.png) 
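
The lookup described above, including the `adjust-level` offset, can be sketched as follows. This is a minimal model under stated assumptions, not the controller's implementation: the `parents` map stands in for `dynamicClient` lookups, all type and function names are hypothetical, kinds are compared case-insensitively, and an offset that would leave the visited chain is clamped to the chain's bounds.

```go
package main

import (
	"fmt"
	"strings"
)

// Ref is a simplified owner reference (hypothetical type).
type Ref struct{ APIVersion, Kind, Name string }

// Rule is one (apiversion, kind, adjust-level) entry from the ConfigMap.
type Rule struct {
	APIVersion, Kind string
	AdjustLevel      int
}

// parents maps a resource to its owner; it stands in for dynamicClient.
type parents map[Ref]Ref

// resolveOwner walks up from the pod's direct owner and returns the
// reference that the PodGroup should use.
func resolveOwner(first Ref, rules []Rule, cluster parents) Ref {
	chain := []Ref{first} // chain[0] is the pod's direct owner
	seen := map[Ref]bool{first: true}
	for {
		cur := chain[len(chain)-1]
		for _, r := range rules {
			if r.APIVersion == cur.APIVersion && strings.EqualFold(r.Kind, cur.Kind) {
				// A negative adjust-level steps back down toward the pod.
				i := len(chain) - 1 + r.AdjustLevel
				if i < 0 {
					i = 0 // assumption: never go below the pod's direct owner
				}
				if i > len(chain)-1 {
					i = len(chain) - 1
				}
				return chain[i]
			}
		}
		next, ok := cluster[cur]
		if !ok || seen[next] {
			return cur // root reached without a matching rule
		}
		seen[next] = true
		chain = append(chain, next)
	}
}

func main() {
	rules := []Rule{
		{"apps/v1", "deployment", 0},
		{"argoproj.io/v1alpha1", "workflow", -1},
	}
	// Resource names below are made up for illustration.
	rs := Ref{"apps/v1", "ReplicaSet", "deploy-with-volcano-7bd985746b"}
	deploy := Ref{"apps/v1", "Deployment", "deploy-with-volcano"}
	job := Ref{"batch/v1", "Job", "pi-job-x"}
	wf := Ref{"argoproj.io/v1alpha1", "Workflow", "k8s-jobs-x"}
	cluster := parents{rs: deploy, job: wf}

	fmt.Println(resolveOwner(rs, rules, cluster).Kind)  // Deployment
	fmt.Println(resolveOwner(job, rules, cluster).Kind) // Job
}
```

With this model, every ReplicaSet of a Deployment resolves to the same Deployment, while a Job owned by a Workflow resolves to the Job itself because of `adjust-level: -1`.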
    68  
    69  ### DiscoveryClient & DynamicClient
    70  From an `ownerReferences` entry, we can build a GVK and obtain the owner resource's name.
    71  The discovery client deals with the API resources served by Kubernetes, so it can map a GVK to a GVR. Once we have the GVR and the owner resource's name, we can use the dynamic client to fetch the owner resource and read its `ownerReferences`.
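
The GVK to GVR step can be illustrated with the standalone sketch below. It does not import client-go: the `discovery` table is a hypothetical in-memory stand-in for what the discovery client reports, and `GVK`, `GVR`, `parseGVK`, and `toGVR` are illustrative names. In a real controller this mapping would more likely come from a RESTMapper built on discovery data, with the dynamic client then fetching the owner by GVR and name.

```go
package main

import (
	"fmt"
	"strings"
)

// GVK and GVR are simplified stand-ins for the apimachinery schema types.
type GVK struct{ Group, Version, Kind string }
type GVR struct{ Group, Version, Resource string }

// parseGVK splits an owner reference's apiVersion ("apps/v1" or "v1")
// into group and version and combines them with the kind.
func parseGVK(apiVersion, kind string) GVK {
	group, version := "", apiVersion
	if i := strings.Index(apiVersion, "/"); i >= 0 {
		group, version = apiVersion[:i], apiVersion[i+1:]
	}
	return GVK{group, version, kind}
}

// discovery is a hypothetical table of what the API server serves.
var discovery = map[GVK]string{
	{"apps", "v1", "ReplicaSet"}: "replicasets",
	{"apps", "v1", "Deployment"}: "deployments",
	{"batch", "v1", "Job"}:       "jobs",
}

// toGVR resolves a GVK to a GVR, failing for kinds the table lacks.
func toGVR(gvk GVK) (GVR, error) {
	res, ok := discovery[gvk]
	if !ok {
		return GVR{}, fmt.Errorf("no resource found for %v", gvk)
	}
	return GVR{gvk.Group, gvk.Version, res}, nil
}

func main() {
	gvr, err := toGVR(parseGVK("apps/v1", "ReplicaSet"))
	fmt.Println(gvr, err)
}
```

An unknown kind returns an error here; how the controller should react in that case (for example, stop the walk and use the last known owner) is left open by the design.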
    72  
    73  ### RBAC
    74  
    75  
    76  ## Use Cases
    77  Configmap:
    78  ```
    79  apiVersion: v1
    80  kind: ConfigMap
    81  metadata:
    82    name: volcano-controller-configmap
    83    namespace: volcano-system
    84  data:
    85    volcano-controller.conf: |
    86      podgroup-level-rules:
    87      - apiversion: apps/v1                    
    88        kind: deployment
    89        adjust-level: 0
    90      - apiversion: argoproj.io/v1alpha1                   
    91        kind: workflow
    92        adjust-level: -1
    93  ```
    94  ### 1. Single pod workload with scheduling.k8s.io/group-name annotation
    95  If a Pod has the `scheduling.k8s.io/group-name` annotation, the controller will not create a pg for it.
    96  
    97  ### 2. Single pod workload without scheduling.k8s.io/group-name annotation
    98  If a Pod does not have an owner reference, the controller will create a pg and set this Pod as the pg's owner reference.
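
Use cases 1 and 2 amount to a small decision made before any owner lookup. A minimal sketch, assuming a simplified `Pod` type and a hypothetical `podGroupAction` helper that only reports the decision as text:

```go
package main

import "fmt"

// groupNameAnnotation is the annotation named in use cases 1 and 2.
const groupNameAnnotation = "scheduling.k8s.io/group-name"

// Pod is a simplified, hypothetical stand-in for a Pod object.
type Pod struct {
	Name        string
	Annotations map[string]string
	HasOwner    bool
}

// podGroupAction reports what the controller does for a pod.
func podGroupAction(p Pod) string {
	if _, ok := p.Annotations[groupNameAnnotation]; ok {
		return "skip" // case 1: the pod already names its PodGroup
	}
	if !p.HasOwner {
		return "create, owned by pod " + p.Name // case 2
	}
	return "create, owner resolved from ownerReferences" // cases 3 and 4
}

func main() {
	annotated := Pod{Name: "a", Annotations: map[string]string{groupNameAnnotation: "pg-a"}}
	fmt.Println(podGroupAction(annotated))
	fmt.Println(podGroupAction(Pod{Name: "b"}))
}
```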
    99  
   100  ### 3. Argo k8s job workflow
   101  Consider an Argo workflow such as:
   102  ```
   103  apiVersion: argoproj.io/v1alpha1
   104  kind: Workflow
   105  metadata:
   106    generateName: k8s-jobs-
   107  spec:
   108    entrypoint: pi-tmpl
   109    templates:
   110    - name: pi-tmpl
   111      resource:                  
   112        action: create           
   113        successCondition: status.succeeded > 0
   114        failureCondition: status.failed > 3
   115        manifest: |              
   116          apiVersion: batch/v1
   117          kind: Job
   118          metadata:
   119            generateName: pi-job-
   120          spec:
   121            template:
   122              metadata:
   123                name: pi
   124              spec:
   125                containers:
   126                - name: pi
   127                  image: perl
   128                  command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
   129                restartPolicy: Never
   130            backoffLimit: 4
   131  ```
   132  The Workflow creates a Job, and the Pod's owner reference points to that Job. The ConfigMap above has no rule for Job, and its Workflow rule (`adjust-level: -1`) targets the level just below the Workflow, which is the Job itself. The controller therefore creates a pg and uses the Pod's owner reference (the Job) as the pg's owner reference.
   133  
   134  ### 4. Deployment workload
   135  The Pod's owner reference points to a ReplicaSet, and the ReplicaSet's owner reference points to a Deployment. The controller searches the owner references recursively and finds that the ReplicaSet's owner reference (the Deployment) matches one of the ConfigMap rules. The controller uses this Deployment as the pg's owner reference.