# JobFlow

## Introduction

Many workloads consist of several VCJobs with dependencies between them. Today these jobs have to cooperate with each other and be orchestrated manually or by another job orchestration platform to get the work done. JobFlow is a new way of orchestrating VCJobs. It proposes two concepts, JobTemplate and JobFlow, for running multiple batch jobs automatically, so end users can easily declare their jobs and run them with complex control primitives, for example sequential or parallel execution, if-then-else statements, switch-case statements, loops and so on.

JobFlow helps migrate AI, big data and HPC workloads to the cloud-native world. Although some workflow engines already exist, they are not designed for batch job workloads. Those jobs typically have complex running dependencies and take a long time to run, for example days or weeks. JobFlow lets end users declare their jobs as JobTemplates and then reuse them. JobFlow also orchestrates those jobs using complex control primitives and launches them automatically. This can significantly reduce the time consumed by a complex job and improve resource utilization. Finally, JobFlow is not a general-purpose workflow engine: it knows the details of VCJobs. End users get a better understanding of their jobs, for example each job's running state, start and end timestamps, the next jobs to run, the pod failure ratio and so on.

## Scope

### In Scope
- Define the API of JobFlow
- Define the behaviour of JobFlow
- Start sequence between multiple jobs
- Dependency completion state of the job start sequence
- DAG-based job dependency startup

### Out of Scope
- Support for other job types
- Gang scheduling at the vcjob level

## Scenarios

- Some jobs depend on the completion, or on another status, of a previous job. Otherwise, the correct result cannot be calculated.
- Sometimes inter-job dependencies also require diverse dependency types, such as conditional dependencies, circular dependencies, probes, and so on.

![jobflow-2.jpg](../images/jobflow-2.jpg)

## Design

![jobflow-1.png](../images/jobflow-1.png)

The blue parts are components of Kubernetes itself, the orange parts are existing Volcano definitions, and the red parts are the new JobFlow definitions.

**Complete process of JobFlow job submission**:

1. After the request passes admission, kubectl creates the JobTemplate and JobFlow (Volcano CRD) objects in kube-apiserver.

2. The JobFlowController uses each JobTemplate as a template according to the configuration of the JobFlow, and creates the corresponding VcJob according to the flow dependency rules.

3. After a VcJob is created, the VcJobController creates the corresponding Pods and PodGroups according to the configuration of the VcJob.

4. After the Pods and PodGroups are created, vc-scheduler fetches the Pod/PodGroup and node information from kube-apiserver.

5. After obtaining the information, vc-scheduler selects the appropriate node for each Pod according to its configured scheduling policy.

6. After nodes are assigned to the Pods, kubelet gets each Pod's configuration from kube-apiserver and starts the corresponding containers.

**update jobflow**:

Currently, JobFlow does not support the update operation; updates to a jobflow are blocked by the webhook.

**delete jobflow**:

Deleting a jobflow that is in a non-complete state is intercepted by the webhook. Otherwise, after the jobflow is deleted, all vcjobs created by it are deleted directly.

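
As an illustration of step 2 of the submission process, here is a minimal sketch of a JobFlow whose `flows` form a small DAG. It assumes that four JobTemplates named `a`, `b`, `c` and `d` already exist in the `default` namespace; those names and the JobFlow name `dag-example` are hypothetical. The `flows`, `dependsOn` and `targets` fields are the ones listed under Key Fields in this document.

```yaml
apiVersion: flow.volcano.sh/v1alpha1
kind: JobFlow
metadata:
  name: dag-example        # hypothetical name
  namespace: default
spec:
  jobRetainPolicy: retain  # keep the generated vcjobs after the JobFlow succeeds
  flows:
    - name: a              # no dependsOn: nothing has to finish before this vcjob is created
    - name: b
      dependsOn:
        targets: ["a"]     # vcjob for b is created only after a satisfies the dependency
    - name: c
      dependsOn:
        targets: ["a"]     # b and c depend only on a, not on each other
    - name: d
      dependsOn:
        targets: ["b", "c"]  # d waits for both b and c
```

Under the dependency rules described above, the controller would be expected to create the vcjob for `a` first, then those for `b` and `c`, and the one for `d` last.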
### Controller

![jobflowController](../images/jobflow-2.png)

### Webhook

```
Create a JobFlow check
1. There cannot be a template with the same name in a JobFlow dependency
   Such as: A->B->A->C  A appears twice
2. Closed loops cannot occur in JobFlow
   E.g.: A -> B -> C
         ^       /
         |      /
         < -  D

Create a JobTemplate check (following the vcjob parameter specification)
E.g.: job minAvailable must be greater than or equal to zero
      job maxRetry must be greater than or equal to zero
      tasks cannot be empty, and cannot have tasks with the same name
      The number of task replicas cannot be less than zero
      task minAvailable cannot be greater than task replicas...
```

### JobFlow

#### Introduction

JobFlow defines the running flow of a set of jobs. Fields in JobFlow define how jobs are orchestrated.

JobFlow is abbreviated as `jf`, and the resource can be viewed with `kubectl get jf`.

JobFlow aims to realize dependency-driven execution of vcjobs in Volcano: vcjobs are created according to the dependencies between them.

#### Key Fields

##### Top-Level Attributes

The top-level attributes of a jobflow define its apiVersion, kind, metadata, spec and status.

| Attribute | Type | Required | Default Value | Description |
| ------------ | ----------------------- | -------- | -------------------------- | ------------------------------------------------------------ |
| `apiVersion` | `string` | Y | `flow.volcano.sh/v1alpha1` | A string that identifies the version of the schema the object should have. The core types use `flow.volcano.sh/v1alpha1` in this version of the documentation. |
| `kind` | `string` | Y | `JobFlow` | Must be `JobFlow` |
| `metadata` | [`Metadata`](#Metadata) | Y | | Information about the JobFlow resource. |
| `spec` | [`Spec`](#Spec) | Y | | A specification for the JobFlow resource attributes. |
| `status` | [`Status`](#Status) | Y | | A specification for the JobFlow status attributes. |

<a id="Metadata"></a>

##### Metadata

Metadata provides basic information about the JobFlow.

| Attribute | Type | Required | Default Value | Description |
| ------------- | ------------------- | -------- | ------------- | ------------------------------------------------------------ |
| `name` | `string` | Y | | A name for the schematic. `name` is subject to the restrictions listed beneath this table. |
| `namespace` | `string` | Y | | A namespace for the schematic. `namespace` is subject to the restrictions listed beneath this table. |
| `labels` | `map[string]string` | N | | A set of string key/value pairs used as arbitrary labels on this component. Labels follow the [Kubernetes specification](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/). |
| `annotations` | `map[string]string` | N | | A set of string key/value pairs used as arbitrary descriptive text associated with this object. Annotations follow the [Kubernetes specification](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set). |

<a id="Spec"></a>

##### Spec

The spec defines the dependencies between the vcjobs of the JobFlow and the retention policy for the vcjobs it generates.

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `flows` | [`Flow array`](#Flow) | Y | | Describes the dependencies between vcjobs. |
| `jobRetainPolicy` | `string` | Y | retain | Whether to keep (`retain`) or delete (`delete`) the generated vcjobs after the JobFlow succeeds. |

<a id="Flow"></a>

##### Flow

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `name` | `string` | Y | | JobTemplate name |
| `dependsOn` | [`DependsOn`](#DependsOn) | Y | | JobTemplate dependencies |
| `patch` | [`Patch`](#Patch) | N | | Patch JobTemplate |

<a id="DependsOn"></a>

##### DependsOn

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `targets` | `string array` | Y | | Names of all jobtemplates that this JobTemplate depends on |
| `probe` | [`Probe`](#Probe) | N | | Probe type dependency |
| `strategy` | `string` | Y | all | Whether all dependencies need to be satisfied |

<a id="Patch"></a>

##### Patch

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `spec` | `spec` | Y | | Patch the contents of the jobtemplate's spec |

<a id="Probe"></a>

##### Probe

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `httpGetList` | [`HttpGet array`](#HttpGet) | N | | HttpGet type dependencies |
| `tcpSocketList` | [`TcpSocket array`](#TcpSocket) | N | | TcpSocket type dependencies |
| `taskStatusList` | [`TaskStatus array`](#TaskStatus) | N | | TaskStatus type dependencies |

<a id="HttpGet"></a>

##### HttpGet

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `TaskName` | `string` | Y | | The name of the task under vcjob |
| `Path` | `string` | Y | | The path of httpget |
| `Port` | `int` | Y | | The port of httpget |
| `httpHeader` | `HTTPHeader` | N | | The httpHeader of httpget |

<a id="TcpSocket"></a>

##### TcpSocket

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `TaskName` | `string` | Y | | The name of the task under vcjob |
| `Port` | `int` | Y | | The port of TcpSocket |

<a id="TaskStatus"></a>

##### TaskStatus

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `TaskName` | `string` | Y | | The name of the task under vcjob |
| `Phase` | `string` | Y | | The phase of task |

<a id="Status"></a>

##### Status

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `pendingJobs` | `string array` | N | | Vcjobs in pending state |
| `runningJobs` | `string array` | N | | Vcjobs in running state |
| `failedJobs` | `string array` | N | | Vcjobs in failed state |
| `completedJobs` | `string array` | N | | Vcjobs in completed and completing state |
| `terminatedJobs` | `string array` | N | | Vcjobs in terminated and terminating state |
| `unKnowJobs` | `string array` | N | | Vcjobs in unknown state |
| `jobStatusList` | [`JobStatus array`](#JobStatus) | N | | Status information of all split vcjobs |
| `conditions` | [`map[string]Condition`](#Condition) | N | | Describes the current state, creation time, completion time and information of all vcjobs. The vcjob state here additionally includes a waiting state for vcjobs whose dependencies are not yet satisfied. |
| `state` | [`State`](#State) | N | | State of JobFlow |

<a id="JobStatus"></a>

##### JobStatus

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `name` | `string` | N | | Name of vcjob |
| `state` | `string` | N | | State of vcjob |
| `startTimestamp` | `Time` | N | | StartTimestamp of vcjob |
| `endTimestamp` | `Time` | N | | EndTimestamp of vcjob |
| `restartCount` | `int32` | N | | RestartCount of vcjob |
| `runningHistories` | [`JobRunningHistory array`](#JobRunningHistory) | N | | Historical information of various states of vcjob |

<a id="Condition"></a>

##### Condition

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `phase` | `string` | N | | Phase of vcjob |
| `createTime` | `Time` | N | | CreateTime of vcjob |
| `runningDuration` | `Duration` | N | | RunningDuration of vcjob |
| `taskStatusCount` | `map[string]TaskState` | N | | The number of tasks in different states |

<a id="State"></a>

##### State

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `phase` | `string` | N | | Succeed: All vcjobs have reached completed state. <br/>Terminating: Jobflow is deleting. <br/>Failed: A vcjob in the flow is in the failed state, so the remaining vcjobs in the flow cannot be created. <br/>Running: Flow contains a vcjob in Running state. <br/>Pending: When the vcjobs under the jobflow are in none of the above situations, the jobflow is in pending state. |

<a id="JobRunningHistory"></a>

##### JobRunningHistory

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `startTimestamp` | `Time` | N | | The start time of a certain state of the vcjob |
| `endTimestamp` | `Time` | N | | The end time of a certain state of the vcjob |
| `state` | `string` | N | | Vcjob status |

**Scope of influence of JobFlow state change**:

Changes in the current JobFlow state will not affect other resources.

**JobFlow supports patching a JobTemplate. An example in JobFlow is as follows**:

```
apiVersion: flow.volcano.sh/v1alpha1
kind: JobFlow
metadata:
  name: test
  namespace: default
spec:
  jobRetainPolicy: delete
  flows:
    - name: a
      patch:
        spec:
          tasks:
            - name: "default-nginx"
              template:
                spec:
                  containers:
                    - name: nginx
                      command:
                        - sh
                        - -c
                        - sleep 10s
```

Here is an example of jobflow:

[the sample file of JobFlow](../../../example/jobflow/JobFlow.yaml)

### JobTemplate

#### Introduction

* JobTemplate is the template of a vcjob. After a JobTemplate is created, it is not processed by vc-controller like a vcjob; it waits to be referenced by a JobFlow.
* JobFlow can reference multiple jobtemplates
* A jobtemplate can be referenced by multiple jobflows
* JobTemplate can be converted to and from vcjob.
* JobTemplate is abbreviated as `jt`, and the resource can be viewed with `kubectl get jt`
* The difference between jobtemplate and vcjob is that a jobtemplate is not processed by the job controller; a jobflow references the JobTemplate by name to create the vcjob.
* JobFlow supports making changes to jobtemplate when referencing jobtemplate

#### Actions on jobtemplate and their impact

**create jobtemplate**:

Create a jobtemplate to be used by jobflow.

**update jobtemplate**:

Updating a jobtemplate does not affect vcjobs that have already been created from it, nor jobflows that have already executed successfully. It may affect jobflows that have not finished executing: for example, a jobflow that has not yet reached the stage using this jobtemplate will use the updated jobtemplate.

**delete jobtemplate**:

When the jobtemplate is being referenced by a non-complete jobflow, the webhook will intercept the jobtemplate deletion request.

#### Key Fields

##### Top-Level Attributes

The top-level attributes of a jobtemplate define its apiVersion, kind, metadata, spec and status.

| Attribute | Type | Required | Default Value | Description |
| ------------ | ----------------------- | -------- | -------------------------- | ------------------------------------------------------------ |
| `apiVersion` | `string` | Y | `flow.volcano.sh/v1alpha1` | A string that identifies the version of the schema the object should have. The core types use `flow.volcano.sh/v1alpha1` in this version of the documentation. |
| `kind` | `string` | Y | `JobTemplate` | Must be `JobTemplate` |
| `metadata` | [`Metadata`](#JobTemplateMetadata) | Y | | Information about the JobTemplate resource. |
| `spec` | [`Spec`](#JobTemplateSpec) | Y | | A specification for the JobTemplate resource attributes. |
| `status` | [`Status`](#JobTemplateStatus) | Y | | A specification for the JobTemplate status attributes. |

<a id="JobTemplateMetadata"></a>

##### Metadata

Metadata provides basic information about the JobTemplate.

| Attribute | Type | Required | Default Value | Description |
| ------------- | ------------------- | -------- | ------------- | ------------------------------------------------------------ |
| `name` | `string` | Y | | A name for the schematic. `name` is subject to the restrictions listed beneath this table. |
| `namespace` | `string` | Y | | A namespace for the schematic. `namespace` is subject to the restrictions listed beneath this table. |
| `labels` | `map[string]string` | N | | A set of string key/value pairs used as arbitrary labels on this component. Labels follow the [Kubernetes specification](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/). |
| `annotations` | `map[string]string` | N | | A set of string key/value pairs used as arbitrary descriptive text associated with this object. Annotations follow the [Kubernetes specification](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set). |

<a id="JobTemplateSpec"></a>

##### JobTemplateSpec

The spec of jobtemplate directly follows the spec of vcjob.

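
Because the spec follows the vcjob spec, a JobTemplate manifest looks like a vcjob with a different `apiVersion` and `kind`. The sketch below is a minimal illustration and not the project's sample file. The template name `a`, the task name, the image tag and the choice of spec fields (`schedulerName`, `minAvailable`, `maxRetry`, `queue`, `tasks`) are assumptions, taken from commonly used vcjob fields.

```yaml
apiVersion: flow.volcano.sh/v1alpha1
kind: JobTemplate
metadata:
  name: a                     # hypothetical; a JobFlow refers to it through flows[].name
  namespace: default
spec:                         # same shape as a vcjob spec
  schedulerName: volcano
  minAvailable: 1
  maxRetry: 5
  queue: default
  tasks:
    - name: "default-nginx"   # task name that a JobFlow patch can target
      replicas: 1
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: nginx
              image: nginx:1.14.2          # illustrative image
              command: ["sh", "-c", "sleep 10s"]
```

Creating this object alone starts no Pods; per the design above, a vcjob is created from it only when a JobFlow references the template.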

<a id="JobTemplateStatus"></a>

##### JobTemplateStatus

| Attribute | Type | Required | Default Value | Description |
| ----------------- | ------------------------------------ | -------- | ------------- | ------------------------------------------------------------ |
| `jobDependsOnList` | `string array` | Y | | Vcjobs created from this jobtemplate. |

You can view [the sample file of JobTemplate](../../../example/jobflow/JobTemplate.yaml)

## JobFlow task scheduling

![jobflow-3.png](../images/jobflow-3.png)

## Demo video

https://www.bilibili.com/video/BV1c44y1Y7FX

## Usage

- Create the jobtemplates that need to be used.
- Create a jobflow. The `flows` field of the jobflow lists the jobtemplates used to create vcjobs.
- The `jobRetainPolicy` field indicates whether to delete the vcjobs created by the jobflow after the jobflow succeeds (`delete`/`retain`). The default is `retain`.

## JobFlow Features

### Features that have been implemented

* Create JobFlow and JobTemplate CRD
* Support sequential start of vcjob
* Support vcjob to depend on other vcjobs to start
* Support the conversion of vcjob and JobTemplate to each other
* Support viewing the running status of JobFlow

### Features not yet implemented

* JobFlow supports making changes to jobtemplate when referencing jobtemplate
* `if` statements
* `switch` statements
* `for` statements
* Support job failure retry in JobFlow
* Integration with volcano-scheduler
* Support for scheduling plugins at JobFlow level