github.com/oam-dev/kubevela@v1.9.11/design/vela-core/workflow_policy.md

     1  # Application-Level Policies and Customized Control-Logic Workflow Design
     2  
     3  ## Background
     4  
The current model consists mainly of Components and Traits.
While this enables the Application object to plug in operational capabilities, it is still not flexible enough.
Specifically, it has the following limitations:

- The current control logic cannot be customized. Once the Vela controller renders the final k8s resources, it simply applies them without any extension points. In some scenarios, users want to perform more complex operations like:
  - Blue-green style upgrade of the app.
  - User interaction like manual approval/rollback.
  - Distributing workloads across multiple clusters.
  - Actions to enforce policies and audit.
  - Pushing final k8s resources to other config stores (e.g. Git repos).
- There is only per-component config, but no application-level policies. In some scenarios, users want to define policies like:
  - Security: RBAC rules, audit settings, secret backend types.
  - Insights: app delivery lead time, frequency, MTTR.
    18  
    19  Here is an overview of the features we want to expose and the capabilities we want to plug in:
    20  
    21  ![alt](../../docs/resources/workflow-feature.jpg)
    22  
    23  ## Proposal
    24  
To resolve the aforementioned problems, we propose to add app-level policies and a customizable workflow to the Application CRD:
    26  
```yaml
kind: Application
spec:
  components: ...

  # Policies are rendered after components are rendered but before the workflow is started
  policies:
    - type: security
      name: my-rule
      properties:
        rbac: enabled
        audit: enabled
        secretBackend: vault

    - type: deployment-insights
      name: my-deploy-insight
      properties:
        leadTime: enabled
        frequency: enabled
        mttr: enabled

  # workflow is used to customize the control logic.
  # If workflow is specified, Vela won't apply any resources, but will provide the rendered resources in a ConfigMap, referenced via AppRevision.
  # Workflow steps are executed in array order, and each step:
  # - will have a context in an annotation.
  # - should mark the "finish" phase in status.conditions.
  workflow:

    steps:

    # blue-green rollout
    - type: blue-green-rollout
      stage: post-render # stage can be pre-render or post-render. Default is post-render.
      properties:
        partition: "50%"

    # suspend manually stops the workflow until it is resumed. It also allows a suspend policy for the workflow.
    - type: suspend

    # traffic shift
    - type: traffic-shift
      properties:
        partition: "50%"

    # promote/rollback
    - type: rollout-promotion
      properties:
        manualApproval: true
        rollbackIfNotApproved: true
```
    77  
This also implies that we will add two Definition CRDs: `PolicyDefinition` and `WorkflowStepDefinition`.

A PolicyDefinition looks like the following:
    81  
```yaml
apiVersion: core.oam.dev/v1beta1
kind: PolicyDefinition
spec:
  schematic:
    cue:
      template: |
        parameters: {
          frequency: *"enabled" | "disabled"
        }
        // String values must be quoted to be valid CUE.
        output: {
          apiVersion: "app.oam.dev/v1"
          kind:       "Insight"
          spec: {
            frequency: parameters.frequency
          }
        }
```
    99  
   100  ### CUE-Based Workflow Task
   101  
Outputting a CR object to complete a task in a workflow requires users to implement an Operator, which incurs heavy overhead.
To simplify this, especially for users with simple use cases, we provide a lightweight CUE-based workflow task.
   104  
   105  ```yaml
   106  apiVersion: core.oam.dev/v1beta1
   107  kind: WorkflowStepDefinition
   108  metadata:
   109    name: apply
   110  spec:
   111    schematic:
   112      cue:
   113        template: |
   114          import "vela/op"
   115  
   116          parameters: {
   117            image: string
   118          }
   119  
   120          apply: op.#Apply & {
   121            resource: context.workload
   122          }
   123  
   124          wait: op.#ConditionalWait & {
   125            continue: apply.status.ready == true
   126          }
   127  
   128          export: op.#Export & {
   129            secret: apply.status.secret
   130          }
   131  ```
   132  
   133  ### Stability mechanism
   134  
   135  #### Backoff Time
   136  
   137  Sometimes a workflow step can take a long time, so we need a backoff time for workflow reconciliation.
   138  
If the status of a workflow step is `waiting` or `failed`, the workflow will be reconciled after a backoff time in seconds, computed as follows, where `n` is the number of reconciliations so far:
   140  
   141  ```
   142  int(0.05 * 2^(n-1))
   143  ```
   144  
The result of the above formula is clamped to a minimum of `1s` and a maximum of `60s`. You can change the maximum by setting `MaxWorkflowWaitBackoffTime`.

For example, if the workflow is `waiting`, the first ten reconciliations will be requeued as follows:
   148  
   149  | Times | 2^(n-1) | 0.05*2^(n-1) | Requeue After(s) |
   150  | ------ | ------ | ------ | ------ |
   151  | 1 | 1 | 0.05 | 1 |
   152  | 2 | 2 | 0.1 | 1 |
   153  | 3 | 4 | 0.2 | 1 |
   154  | 4 | 8 | 0.4 | 1 |
   155  | 5 | 16 | 0.8 | 1 |
   156  | 6 | 32 | 1.6 | 1 |
   157  | 7 | 64 | 3.2 | 3 |
   158  | 8 | 128 | 6.4 | 6 |
   159  | 9 | 256 | 12.8 | 12 |
   160  | 10 | 512 | 25.6 | 25 |
   161  | ... | ... | ... | ... |
   162  
   163  #### Failed Workflow Steps
   164  
If a workflow step is `failed`, there may be an error in the workflow step itself, such as a CUE error.

> Note that if the workflow step is unhealthy, it will be marked as `wait` rather than `failed`, and it will wait until it becomes healthy.

In this case, we will retry the workflow step 10 times by default. If the workflow step is still `failed`, we will terminate the workflow, and its message will be `The workflow terminates automatically because the failed times of steps have reached the limit`. You can change the number of retries by setting `MaxWorkflowStepErrorRetryTimes`.
   170  
   171  ## Implementation
   172  
   173  In this section we will discuss the implementation details for supporting policies and workflow tasks.
   174  
   175  Here's a diagram of how workflow internals work:
   176  
   177  ![alt](../../docs/resources/workflow-internals.png)
   178  
   179  
   180  ### 1. Application Controller
   181  
   182  Here are the steps in Application Controller:
   183  
   184  - On reconciling an Application event, Application Controller will render out all resources from components, traits, policies.
   185    It will also put rendered resources into a ConfigMap, and reference the ConfigMap name in AppRevision as below:
   186  
   187    ```yaml
   188    kind: ApplicationRevision
   189    spec:
   190      ...
   191      resourcesConfigMap:
   192        name: my-app-v1
   193    ---
   194  
   195    kind: ConfigMap
   196    metadata:
   197      name: my-app-v1
   198    data:
   199      mysvc: |
   200        {
   201          "apiVersion": "apps/v1",
   202          "kind": "Deployment",
   203          "metadata": {
   204              "name": "mysvc"
   205          },
   206          "spec": {
   207              "replicas": 1
   208          }
   209        }
   210      ...more name:data pairs...
   211    ```
   212  
- After rendering, Application Controller will execute `spec.workflow`.
  This calls Workflow Manager to execute the workflow tasks, starting either from scratch or, on retry, from the last-run step.
   215  
   216  
   217  ### 2. Workflow Manager
   218  
   219  Here are the steps in Workflow Manager:
   220  
- The Workflow Manager will get the current workflow step via `status.workflow.stepIndex`.
- If `stepIndex` is equal to the total number of steps, the workflow is done and the Workflow Manager returns immediately.
- If there are workflow tasks left, they will be run step by step. For each step, Workflow Manager will call Task Manager to handle it.
- On return from calling Task Manager, Workflow Manager checks the return result:
  - If `status = completed`, Workflow Manager will increment `status.workflow.stepIndex`, and continue to run the next step, if any.
  - Otherwise, it will retry later.
   227  
   228  ### 3. Task Manager
   229  
   230  Here are the steps in Task Manager:
   231  
- A workflow task is executed synchronously, which requires that the steps of a task be non-blocking.
- A workflow task will be parsed with its properties first to retrieve the full CUE data.
- Task Manager will collect all do-able steps from the CUE data by checking whether each step has a `#do` field.
   235    Here is an example:
   236  
   237    ```
   238    apply: op.#Apply & {
   239      resource: ...
   240    }
   241    ```
   242  
   243    The `op.#Apply` contains a [hidden field][2] `#do`:
   244  
   245    ```yaml
   246    #Apply: {
   247      #do: "apply"
   248      ...
   249    }
   250    ```
   251  
   252    This will inject the `#do` field to the `apply` step.
   253  
   254  - All do-able steps will be executed one by one by Task Manager.
   255  
   256  
   257  ### 4. CUE Step Execution
   258  
   259  - Task Manager will keep a map of actions.
   260    An action follows this interface:
   261  
   262    ```go
   263    type TaskAction interface {
   264      // cueValue is the parsed CUE value for this action
   265      Run(cueValue interface{}) (TaskStatus, error)
   266    }
   267    ```
   268  
   269  - Task Manager will use the `#do` field of the CUE step as the key to find an action to run.
   270  
- An action returns a status indicating what to do next:
  - continue: continue to run the next action.
  - wait: makes the Workflow Manager retry later.
  - break: makes the Workflow Manager stop the entire workflow.
  - failedAfterRetries: if there are no other running steps, makes the Workflow Manager suspend the workflow.

- Task Manager will change the status as needed based on the returned TaskStatus, e.g. change it to wait.
   278  
   279  
   280  ## Task Action
   281  
   282  These are the task actions to be supported in `vela/op` CUE lib:
   283  
   284  
   285  - Load: loads the rendered component resources
   286  
   287    ```
   288    #Load: {
   289      #do: "load"
   290      component?: string
   291    }
   292    ```
   293  
- Read: reads a k8s resource object
   295  
   296    ```
   297    #Read: {
   298      #do: "read"
   299      apiVersion: string
   300      kind: string
   301      namespace: string
   302      name: string
   303    }
   304    ```
   305  
   306  - Apply: applies a k8s resource object
   307  
   308    ```
   309    #Apply: {
   310      #do: "apply"
   311      resource: string
   312    }
   313    ```
   314  
- Wait: waits until the `continue` condition is true; otherwise makes the controller reconcile later.
   316  
   317    ```
   318    #Wait: {
   319      #do: "wait"
   320      continue: bool
   321    }
   322    ```
   323  
- Break: breaks out of the workflow and reports the reason in a message.
   325  
   326    ```
   327    #Break: {
   328      #do: "break"
   329      message: string
   330    }
   331    ```
   332  
   333  
   334  - Export: exports the data into context for other workflow tasks to reuse
   335  
   336    ```
   337    #Export: {
   338      #do: "export"
   339      type: "patch" | *"var"
   340      if type == "patch" {
   341        component: string
   342      }
   343      value: _
   344    }
   345    ```
   346  
   347  ## Workflow Operation
   348  
These are the operations that users can use to control the workflow at the global level.
   350  
   351  ### 1. Terminate Workflow
   352  
If the execution of the workflow does not meet expectations, it may be necessary to terminate the workflow.

There are two ways to achieve that:

1. Set the `workflow.terminated` field in the status to `true`:
   358  
   359  ```yaml
   360    kind: Application
   361    metadata:
   362      name: foo
   363    status:
   364      phase: runningWorkflow
   365      workflow:
   366        stepIndex: 1
   367        terminated: true
   368        steps:
   369        - name: ...
   370  ```
   371  
   372  
2. Use `op.#Break` in the WorkflowStepDefinition. When the task is executed, the `op.#Break` is captured and the workflow then reports the terminated status:
   374  
   375  ```yaml
   376  if job.status == "failed"{
   377    break: op.#Break & {
   378        message: "job failed: "+ job.status.message
   379    }
   380  }
   381  ```
   382  
   383  ### 2. Pause Workflow
   384  
1. Set the `workflow.suspend` field to `true` to pause the workflow:
   386  
   387  ```yaml
   388  kind: Application
   389  metadata:
   390    name: foo
   391  status:
   392    phase: runningWorkflow
   393    workflow:
   394      stepIndex: 1
   395      suspend: true
   396      steps:
   397      - name: ...
   398  ```
   399  
2. The built-in `suspend` task pauses the workflow, as in the following example:

```yaml
kind: Application
spec:
  components: ...
  workflow:
    steps:
    - name: manual-approve
      type: suspend
```

The `workflow.suspend` field will be set to true after the suspend-type task is started.
   413  
   414  ### 3. Resume Workflow
   415  
Set the `workflow.suspend` field to `false` to resume the workflow:
   417  
   418  ```yaml
   419  kind: Application
   420  metadata:
   421    name: foo
   422  status:
   423    phase: runningWorkflow
   424    workflow:
   425      stepIndex: 1
   426      suspend: false
   427      steps:
   428      - name: ...
   429   ```
   430  
   431  ### 4. Restart Workflow
   432  
   433  The workflow will be restarted in the following two cases:
   434  
1. The `status.phase` field is set to `runningWorkflow` and the workflow status is cleared:
   436  
   437  ```yaml
   438  kind: Application
   439  metadata:
   440    name: foo
   441  status:
   442    phase: runningWorkflow
   443    workflow: {}
   444   ```
   445  
   446  
2. The application spec changes

A spec change also means that the application needs to be re-executed, so the application controller will clear the application status, including the workflow status.
   450  
   451  
   452  ## Operator Best Practice
   453  
   454  Each workflow task has similar interactions with Task Manager as follows:
   455  
- The Task Manager will apply the workflow object with the annotation `app.oam.dev/workflow-context`. This annotation passes in the context marshalled as JSON, defined as follows:
   457    ```go
   458      type WorkflowContext struct {
   459      	cli        client.Client
   460      	store      *corev1.ConfigMap
   461      	components map[string]*ComponentManifest
   462      	vars       *value.Value
   463      	modified   bool
   464      }
   465    ```
   466  
- The workflow object's status condition should have status `True` and reason `Succeeded`, and its `observedGeneration` should match the resource's own generation.
  This is to solve the [issue of passing data from the old generation][1].
  We will provide a CUE op library to check this condition to decide whether to wait.
   470  
   471    ```yaml
   472    kind: SomeTask
   473    metadata:
   474      generation: 2
   475    status:
   476      observedGeneration: 2
   477      conditions:
   478        - type: workflow-progress
   479          status: 'True'
   480          reason: 'Succeeded'
   481    ```
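
The completion check described above can be sketched in Go as follows. The struct types are simplified, hypothetical stand-ins for the task object's metadata and status. The proposal itself plans to expose this check through the CUE op library, so this only illustrates the logic.

```go
package main

import "fmt"

// condition and taskObject are simplified stand-ins for a task CR.
type condition struct {
	Type   string
	Status string
	Reason string
}

type taskObject struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	Conditions         []condition
}

// isFinished reports whether the task has succeeded for its current
// generation. A stale observedGeneration means the status still describes
// an older spec, so the caller should keep waiting.
func isFinished(obj taskObject) bool {
	if obj.ObservedGeneration != obj.Generation {
		return false
	}
	for _, c := range obj.Conditions {
		if c.Type == "workflow-progress" && c.Status == "True" && c.Reason == "Succeeded" {
			return true
		}
	}
	return false
}

func main() {
	done := condition{Type: "workflow-progress", Status: "True", Reason: "Succeeded"}
	fmt.Println(isFinished(taskObject{Generation: 2, ObservedGeneration: 2, Conditions: []condition{done}}))
	fmt.Println(isFinished(taskObject{Generation: 2, ObservedGeneration: 1, Conditions: []condition{done}}))
}
```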
   482  
   483  ## Use Cases
   484  
   485  In this section we will walk through how we implement workflow solutions for the following use cases.
   486  
   487  ### Case 1: Multi-cluster
   488  
In this case, users want to distribute workloads to multiple clusters. The dispatcher implementation is flexible and could be based on [open-cluster-management](https://open-cluster-management.io/) or other methods.
   490  
   491  ```yaml
   492  workflow:
   493    steps:
   494    - type: open-cluster-management
   495      properties:
   496        placement:
   497          - clusterSelector:
   498              region: east
   499            replicas: "70%"
   500          - clusterSelector:
   501              region: west
   502            replicas: "20%"
   503  ```
   504  
The process goes as follows:

- During infra setup, the Cluster objects are applied and agents are set up in each cluster to manage the lifecycle of k8s clusters.
- Once the Application is applied, the OCM controller can retrieve all rendered resources from AppRevision. It will apply a ManifestWork object including all resources. Then the OCM agent will execute the workload creation in each cluster.
   509  
   510  ### Case 2: Blue-green rollout
   511  
In this case, users want to roll out a new version of the application components in a blue-green rolling upgrade style.
   513  
   514  ```yaml
   515  workflow:
   516    steps:
   517    # blue-green rollout
   518    - type: blue-green-rollout
   519      properties:
   520        partition: "50%"
   521  
   522    # traffic shift
   523    - type: traffic-shift
   524      properties:
   525        partition: "50%"
   526  
   527    # promote/rollback
   528    - type: rollout-promotion
   529      properties:
   530        manualApproval: true
   531        rollbackIfNotApproved: true
   532  ```
   533  
The process goes as follows:

- By default, each modification of the Application object will generate an AppRevision object. The rollout controller will get the current revision from the context and retrieve the previous revision via the kube API.
- Then the rollout controller will roll replicas between the two revisions (the actual behavior depends on the workload type, e.g. Deployment or CloneSet).
- Once the rollover is done, the rollout controller can shift partial traffic to the new revision too.
- The rollout controller will then wait for manual approval, which in this case is recorded in the status of the Rollout object:
   540    ```yaml
   541    kind: Rollout
   542    status:
   543      pause: true # change this to false
   544    ```
   545  
   546    The reference to the rollout object will be in the Application object:
   547    ```yaml
   548    apiVersion: core.oam.dev/v1beta1
   549    kind: Application
   550    status:
   551      workflow:
   552        steps:
   553        - type: rollout-promotion
   554          resourceRef:
   555            kind: Rollout
   556            name: ...
   557    ```
   558  
   559  ### Case 3: Data Passing
   560  
In this case, users want to deploy a database component first, wait for the database to be up and ready, and then deploy the application with the database connection secret.
   562  
   563  ```yaml
   564  components:
   565    - name: my-db
   566      type: mysql
   567      properties:
   568  
   569    - name: my-app
   570      type: webservice
   571  
   572  
   573  workflow:
   574    steps:
   575    # Wait for the MySQL object's status.connSecret to have value.
   576    - type: apply-component
   577      outputs:
   578        - name: connSecret
   579          valueFrom: output.status.connSecret
   580      properties:
   581        name: my-db
   582  
   583    # Patch my-app Deployment object's field with the secret name
   584    # emitted from MySQL object. And then apply my-app component.
   585    - type: apply-component
   586      inputs:
   587        - from: connSecret
   588          parameterKey: patch.valueFrom.field
   589      properties:
   590        name: my-app
   591        patch:
   592          to:
   593            field: spec.containers[0].envFrom[0].secretRef.name
   594          valueFrom:
   595            apiVersion: database.example.org/v1alpha1
   596            kind: MySQLInstance
   597            name: my-db
   598  ```
   599  
   600  ### Case 4: GitOps rollout
   601  
In this case, users just want Vela to provide the final k8s resources and push them to Git, and then integrate with ArgoCD/Flux to do the final rollout. Users will set up a GitOps workflow like the one below:
   603  
   604  ```yaml
   605  workflow:
   606    steps:
   607    - type: gitops # This part configures how to push resources to Git repo
   608      properties:
   609        gitRepo: git-repo-url
   610        branch: branch
   611        credentials: ...
   612  ```
   613  
The process goes as follows:

- Every time an Application event is triggered, the GitOps workflow controller will push the rendered resources to a Git repo. This will trigger ArgoCD/Flux to do continuous deployment.
   617  
   618  ### Case 5: Template-based rollout
   619  
   620  In this case, a template for Application object has already been defined. Instead of writing the `spec.components`, users will reference the template and provide parameters/patch to it.
   621  
   622  ```yaml
   623  workflow:
   624    steps:
   625    - type: helm-template
   626      stage: pre-render
   627      properties:
   628        source: git-repo-url
   629        path: chart/folder/path
   630        parameters:
   631          image: my-image
   632          replicas: 3
   633  ---
   634  workflow:
   635    steps:
   636    - type: kustomize-patch
   637      stage: pre-render
   638      properties:
   639        source: git-repo-url
   640        path: base/folder/path
   641        patch:
   642          spec:
   643            components:
   644              - name: instance
   645                properties:
   646                  image: prod-image
   647  ```
   648  
The process goes as follows:

- On creating the application, the app controller will apply the HelmTemplate/KustomizePatch objects and wait for their status.
- The HelmTemplate/KustomizePatch controller will read the template from the specified source and render the final config. It will compare the config with the Application object and, if there is a difference, write it back to the Application object itself.
- The update of the Application will trigger another event, and the app controller will apply the HelmTemplate/KustomizePatch objects with the new context. This time, the HelmTemplate/KustomizePatch controller will find no diff after rendering, so it will skip the write-back.
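
The compare-then-write-back loop above terminates because the second pass finds no difference. A minimal sketch of that idempotence, using a hypothetical simplified `appSpec` type and `syncRendered` helper rather than the real Application API:

```go
package main

import (
	"fmt"
	"reflect"
)

// appSpec is a simplified, hypothetical stand-in for the Application spec.
type appSpec struct {
	Components map[string]string // component name -> image
}

// syncRendered writes the rendered config back to app only when it differs,
// and reports whether a write happened. A write triggers another event;
// no write ends the loop.
func syncRendered(app *appSpec, rendered appSpec) bool {
	if reflect.DeepEqual(*app, rendered) {
		return false
	}
	*app = rendered
	return true
}

func main() {
	app := &appSpec{}
	rendered := appSpec{Components: map[string]string{"instance": "prod-image"}}
	fmt.Println(syncRendered(app, rendered)) // first event: diff found, written back
	fmt.Println(syncRendered(app, rendered)) // second event: no diff, skipped
}
```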
   654  
   655  ### Case 6: Conditional Check
   656  
   657  In this case, users want to execute different steps based on the responseCode. When the `if` condition is not met, the step will be skipped.
   658  
   659  ```yaml
   660  workflow:
   661    steps:
   662      - name: request
   663        type: webhook
   664      - name: handle-200
   665        type: deploy
   666        if: request.output.responseCode == 200
   667      - name: handle-400
   668        type: notification
   669        if: request.output.responseCode == 400
   670      - name: handle-500
   671        type: rollback
   672        if: request.output.responseCode == 500
   673  ```
   674  
If users want to execute a step no matter what, they can use `if: always` in the step. That way, the step will be executed whether the workflow is successful or not.

```yaml
workflow:
  steps:
    - type: deploy
      name: deploy-app
    - name: notificationA
      if: always
      type: notification
```
   686  
### Case 7: Step Group

In this case, the user runs multiple workflow steps in a step of type `step-group`. The subSteps in a step group will be executed in DAG mode.
   690  ```yaml
   691  workflow:
   692    steps:
   693    - type: step-group
   694      name: run-step-group1
   695      subSteps: 
   696      - name: sub-step1
   697        type: ...
   698        ...
   699      - name: sub-step2
   700        type: ...
   701        ...
   702  
   703  ```
   704  
The process is as follows:

- When executing a `step-group` step, the subSteps in the step group are executed in DAG mode. A step group completes only when all of its subSteps have run to completion.
   708  
   709  ## Considerations
   710  
   711  ### Comparison with Argo Workflow/Tekton
   712  
The workflow defined here is k8s-resource based and is a very simple one-directional workflow. It is mainly used to customize the Vela control logic to perform more complex deployment operations.

While Argo Workflow/Tekton share a similar idea of providing workflow functionality, they are container based and provide more complex features like parameter sharing (using volumes and sidecars). More importantly, these projects cannot satisfy our needs; otherwise, we could simply use them in our implementation.
   716  
   717  [1]: https://github.com/crossplane/oam-kubernetes-runtime/issues/222
   718  [2]: https://cuetorials.com/overview/scope-and-visibility/#hidden-fields-and-values