# Job API

[@k82cn](http://github.com/k82cn); Dec 27, 2018

## Motivation

`Job` is the fundamental object for high-performance workloads; this document defines `Job` in Volcano.

## Scope

### In Scope

* Define the API of Job
* Define the behaviour of Job
* Clarify the interaction with other features

### Out of Scope

* Volumes: volume management is out of scope for job management features
* Network: addressing between tasks will be described in another project

## Function Detail

The definition of `Job` follows the Kubernetes style, e.g. `Status` and `Spec`. The following sections describe only
the major functions of `Job`; refer to the [Appendix](#appendix) section for the complete definition of `Job`.

### Multiple Pod Template

Most high-performance workloads include different types of tasks, e.g. TensorFlow (ps/worker) and Spark (driver/executor),
so `Job` introduces `tasks` to support multiple pod templates, defined as follows. `Policies` is described in the
[Error Handling](#error-handling) section.

```go
// JobSpec describes how the job execution will look like and when it will actually run
type JobSpec struct {
    ...

    // Tasks specifies the task specification of Job
    // +optional
    Tasks []TaskSpec `json:"tasks,omitempty" protobuf:"bytes,5,opt,name=tasks"`
}

// TaskSpec specifies the task specification of Job
type TaskSpec struct {
    // Name specifies the name of task
    Name string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"`

    // Replicas specifies the replicas of this TaskSpec in Job
    Replicas int32 `json:"replicas,omitempty" protobuf:"bytes,2,opt,name=replicas"`

    // Specifies the pod that will be created for this TaskSpec
    // when executing a Job
    Template v1.PodTemplateSpec `json:"template,omitempty" protobuf:"bytes,3,opt,name=template"`

    // Specifies the lifecycle of tasks
    // +optional
    Policies []LifecyclePolicy `json:"policies,omitempty" protobuf:"bytes,4,opt,name=policies"`
}
```

`JobController` creates Pods based on the templates and replicas in `spec.tasks`;
the controller `OwnerReference` of each Pod is set to the `Job`. The following is
an example YAML with multiple pod templates.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: tf-job
spec:
  tasks:
  - name: "ps"
    replicas: 2
    template:
      spec:
        containers:
        - name: ps
          image: ps-img
  - name: "worker"
    replicas: 5
    template:
      spec:
        containers:
        - name: worker
          image: worker-img
```
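As a rough illustration of how `spec.tasks` expands into Pods, the following Go sketch lists one pod name per replica of each task. The simplified `TaskSpec` type, the `podNames` helper and the `<job>-<task>-<index>` naming scheme are assumptions made for this example, not the controller's actual implementation.

```go
package main

import "fmt"

// TaskSpec keeps only the fields of the design's TaskSpec that this sketch needs.
type TaskSpec struct {
	Name     string
	Replicas int32
}

// podNames returns one pod name per replica of every task.
// The "<job>-<task>-<index>" naming scheme is an assumption for illustration.
func podNames(jobName string, tasks []TaskSpec) []string {
	var names []string
	for _, task := range tasks {
		for i := int32(0); i < task.Replicas; i++ {
			names = append(names, fmt.Sprintf("%s-%s-%d", jobName, task.Name, i))
		}
	}
	return names
}

func main() {
	// Same shape as the tf-job example: 2 ps replicas and 5 worker replicas.
	tasks := []TaskSpec{{Name: "ps", Replicas: 2}, {Name: "worker", Replicas: 5}}
	for _, name := range podNames("tf-job", tasks) {
		fmt.Println(name)
	}
}
```

For the `tf-job` example this yields seven names, two for `ps` and five for `worker`.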

### Job Input/Output

Most high-performance workloads handle data that is considered the input/output of a Job.
The following types are introduced for a Job's input/output.

```go
// VolumeSpec defines the specification of a volume, e.g. a PVC
type VolumeSpec struct {
    // MountPath is the path at which the volume is mounted
    MountPath string `json:"mountPath" protobuf:"bytes,1,opt,name=mountPath"`

    // VolumeClaimName defines the PVC name
    // +optional
    VolumeClaimName string `json:"volumeClaimName,omitempty" protobuf:"bytes,2,opt,name=volumeClaimName"`

    // VolumeClaim defines the PVC used by the VolumeSpec.
    // +optional
    VolumeClaim *v1.PersistentVolumeClaimSpec `json:"volumeClaim,omitempty" protobuf:"bytes,3,opt,name=volumeClaim"`
}

type JobSpec struct {
    ...

    // The volumes mounted on the Job
    // +optional
    Volumes []VolumeSpec `json:"volumes,omitempty" protobuf:"bytes,1,opt,name=volumes"`
}
```

The `Volumes` of a Job can be `nil`, which means users manage data themselves. If `VolumeSpec.volumeClaim` is `nil` and `VolumeSpec.volumeClaimName` is empty or does not name an existing PersistentVolumeClaim, an `emptyDir` volume is used for each Task/Pod.
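The fallback rule above can be sketched as a small decision function. This is only an illustration: `PVCSpec` stands in for `v1.PersistentVolumeClaimSpec`, the `existingClaims` lookup is a hypothetical stand-in for querying the cluster, and the two non-`emptyDir` outcomes are assumed behaviour that the text does not spell out.

```go
package main

import "fmt"

// PVCSpec stands in for v1.PersistentVolumeClaimSpec so the sketch has no dependencies.
type PVCSpec struct{}

// VolumeSpec mirrors the fields of the design's VolumeSpec.
type VolumeSpec struct {
	MountPath       string
	VolumeClaimName string
	VolumeClaim     *PVCSpec
}

// volumeSource reports which kind of volume would back the given VolumeSpec.
// Only the "emptyDir" outcome is stated by the design; "existingClaim" and
// "claimFromSpec" are assumptions for illustration.
func volumeSource(v VolumeSpec, existingClaims map[string]bool) string {
	switch {
	case v.VolumeClaimName != "" && existingClaims[v.VolumeClaimName]:
		return "existingClaim"
	case v.VolumeClaim != nil:
		return "claimFromSpec"
	default:
		// No claim spec, and the name is empty or does not match an existing PVC.
		return "emptyDir"
	}
}

func main() {
	claims := map[string]bool{"data-pvc": true}
	fmt.Println(volumeSource(VolumeSpec{MountPath: "/data"}, claims))
	fmt.Println(volumeSource(VolumeSpec{MountPath: "/data", VolumeClaimName: "data-pvc"}, claims))
}
```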

### Conditions and Phases

The following phases give a simple, high-level summary of where the Job is in its lifecycle; the conditions array and
the reason and message fields contain more detail about the job's status.

```go
type JobPhase string

const (
    // Pending is the phase that job is pending in the queue, waiting for scheduling decision
    Pending JobPhase = "Pending"
    // Aborting is the phase that job is aborted, waiting for releasing pods
    Aborting JobPhase = "Aborting"
    // Aborted is the phase that job is aborted by user or error handling
    Aborted JobPhase = "Aborted"
    // Running is the phase that minimal available tasks of Job are running
    Running JobPhase = "Running"
    // Restarting is the phase that the Job is restarted, waiting for pod releasing and recreating
    Restarting JobPhase = "Restarting"
    // Completed is the phase that all tasks of Job are completed successfully
    Completed JobPhase = "Completed"
    // Terminating is the phase that the Job is terminated, waiting for releasing pods
    Terminating JobPhase = "Terminating"
    // Terminated is the phase that the job finished unexpectedly, e.g. because of events
    Terminated JobPhase = "Terminated"
)

// JobState contains details for the current state of the job.
type JobState struct {
    // The phase of Job.
    // +optional
    Phase JobPhase `json:"phase,omitempty" protobuf:"bytes,1,opt,name=phase"`

    // Unique, one-word, CamelCase reason for the phase's last transition.
    // +optional
    Reason string `json:"reason,omitempty" protobuf:"bytes,2,opt,name=reason"`

    // Human-readable message indicating details about last transition.
    // +optional
    Message string `json:"message,omitempty" protobuf:"bytes,3,opt,name=message"`
}

// JobStatus represents the current state of a Job
type JobStatus struct {
    // Current state of Job.
    State JobState `json:"state,omitempty" protobuf:"bytes,1,opt,name=state"`

    ...
}
```

The following table shows the available transitions between phases. A phase cannot transition to the target
phase if the cell is empty.

| From \ To     | Pending | Aborted | Running | Completed | Terminated |
| ------------- | ------- | ------- | ------- | --------- | ---------- |
| Pending       | *       | *       | *       |           |            |
| Aborted       | *       | *       |         |           |            |
| Running       |         | *       | *       | *         | *          |
| Completed     |         |         |         | *         |            |
| Terminated    |         |         |         |           | *          |

`Restarting`, `Aborting` and `Terminating` are temporary states that avoid race conditions, e.g. a `TerminateJobAction`
causes several `PodEvictedEvent`s, which should not be handled again.
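The transition table can also be read as a lookup structure. The following Go sketch encodes exactly the marked cells of the table; the `allowed` map and the `canTransition` helper are hypothetical names used only for this illustration, and the temporary phases are left out because the table does not list them.

```go
package main

import "fmt"

type JobPhase string

const (
	Pending    JobPhase = "Pending"
	Aborted    JobPhase = "Aborted"
	Running    JobPhase = "Running"
	Completed  JobPhase = "Completed"
	Terminated JobPhase = "Terminated"
)

// allowed lists, for each source phase, the target phases marked in the table.
var allowed = map[JobPhase][]JobPhase{
	Pending:    {Pending, Aborted, Running},
	Aborted:    {Pending, Aborted},
	Running:    {Aborted, Running, Completed, Terminated},
	Completed:  {Completed},
	Terminated: {Terminated},
}

// canTransition reports whether the table marks the cell (from, to).
func canTransition(from, to JobPhase) bool {
	for _, target := range allowed[from] {
		if target == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Running, Completed)) // cell is marked
	fmt.Println(canTransition(Completed, Running)) // cell is empty
}
```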

### Error Handling

After a Job is created, several events related to it may occur, e.g. Pod succeeded, Pod failed;
some of these events are critical to the Job, e.g. a Pod of an MPIJob failing. `LifecyclePolicy` is therefore introduced
to handle different events based on the user's configuration.

```go
// Event is the type of Event related to the Job
type Event string

const (
    // AllEvents means all events
    AllEvents Event = "*"
    // PodFailedEvent is triggered if a Pod failed
    PodFailedEvent Event = "PodFailed"
    // PodEvictedEvent is triggered if a Pod was deleted
    PodEvictedEvent Event = "PodEvicted"
    // The events below can lead to job 'Unknown':
    // 1. Task Unschedulable, which is triggered when some pods
    //    can't be scheduled while others are already running in the gang-scheduling case.
    JobUnknownEvent Event = "Unknown"

    // OutOfSyncEvent is triggered if Pod/Job were updated
    OutOfSyncEvent Event = "OutOfSync"
    // CommandIssuedEvent is triggered if a command is raised by user
    CommandIssuedEvent Event = "CommandIssued"
    // TaskCompletedEvent is triggered if the 'Replicas' amount of pods in one task have succeeded
    TaskCompletedEvent Event = "TaskCompleted"
)

// Action is the type of event handling
type Action string

const (
    // AbortJobAction if this action is set, the whole job will be aborted:
    // all Pods of the Job will be evicted, and no Pod will be recreated
    AbortJobAction Action = "AbortJob"
    // RestartJobAction if this action is set, the whole job will be restarted
    RestartJobAction Action = "RestartJob"
    // TerminateJobAction if this action is set, the whole job will be terminated
    // and cannot be resumed: all Pods of the Job will be evicted, and no Pod will be recreated.
    TerminateJobAction Action = "TerminateJob"
    // CompleteJobAction if this action is set, the unfinished pods will be killed, job completed.
    CompleteJobAction Action = "CompleteJob"

    // ResumeJobAction is the action to resume an aborted job.
    ResumeJobAction Action = "ResumeJob"
    // SyncJobAction is the action to sync Job/Pod status.
    SyncJobAction Action = "SyncJob"
)

// LifecyclePolicy specifies the lifecycle and error handling of task and job.
type LifecyclePolicy struct {
    Event   Event            `json:"event,omitempty" protobuf:"bytes,1,opt,name=event"`
    Action  Action           `json:"action,omitempty" protobuf:"bytes,2,opt,name=action"`
    Timeout *metav1.Duration `json:"timeout,omitempty" protobuf:"bytes,3,opt,name=timeout"`
}
```

Both `JobSpec` and `TaskSpec` include lifecycle policies: the policies in `JobSpec` are the defaults used when a `TaskSpec`
defines no policies; the policies in `TaskSpec` override the defaults.

```go
// JobSpec describes how the job execution will look like and when it will actually run
type JobSpec struct {
    ...

    // Specifies the default lifecycle of tasks
    // +optional
    Policies []LifecyclePolicy `json:"policies,omitempty" protobuf:"bytes,5,opt,name=policies"`

    // Tasks specifies the task specification of Job
    // +optional
    Tasks []TaskSpec `json:"tasks,omitempty" protobuf:"bytes,6,opt,name=tasks"`
}

// TaskSpec specifies the task specification of Job
type TaskSpec struct {
    ...

    // Specifies the lifecycle of tasks
    // +optional
    Policies []LifecyclePolicy `json:"policies,omitempty" protobuf:"bytes,4,opt,name=policies"`
}
```
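One way to read the precedence between job-level and task-level policies is sketched below. It assumes that a matching task policy wins and that the job policies are consulted only when no task policy matches the event; the design text leaves open whether the override applies per event or to the whole policy list. `lookup` and `resolveAction` are hypothetical helpers, not part of the API.

```go
package main

import "fmt"

type Event string
type Action string

// AllEvents is the wildcard event, as in the design.
const AllEvents Event = "*"

// LifecyclePolicy keeps only the fields this sketch needs.
type LifecyclePolicy struct {
	Event  Event
	Action Action
}

// lookup returns the action of the first policy whose event equals e or is the wildcard.
func lookup(policies []LifecyclePolicy, e Event) (Action, bool) {
	for _, p := range policies {
		if p.Event == e || p.Event == AllEvents {
			return p.Action, true
		}
	}
	return "", false
}

// resolveAction prefers task policies and falls back to job policies.
// The per-event fallback is an assumption made for this sketch.
func resolveAction(jobPolicies, taskPolicies []LifecyclePolicy, e Event) (Action, bool) {
	if action, ok := lookup(taskPolicies, e); ok {
		return action, true
	}
	return lookup(jobPolicies, e)
}

func main() {
	jobPolicies := []LifecyclePolicy{{Event: AllEvents, Action: "RestartJob"}}
	taskPolicies := []LifecyclePolicy{{Event: "PodFailed", Action: "AbortJob"}}

	action, _ := resolveAction(jobPolicies, taskPolicies, "PodFailed")
	fmt.Println("PodFailed:", action)
	action, _ = resolveAction(jobPolicies, taskPolicies, "PodEvicted")
	fmt.Println("PodEvicted:", action)
}
```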

The following examples demonstrate the usage of `LifecyclePolicy` for jobs and tasks.

For a training job of a machine learning framework, the whole job should be restarted if any task fails or is evicted.
To simplify the configuration, a job-level `LifecyclePolicy` is set as follows. As no `LifecyclePolicy` is set for any
task, all tasks use the policies in `spec.policies`.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: tf-job
spec:
  # On any event, restart the whole job.
  policies:
  - event: "*"
    action: RestartJob
  tasks:
  - name: "ps"
    replicas: 1
    template:
      spec:
        containers:
        - name: ps
          image: ps-img
  - name: "worker"
    replicas: 5
    template:
      spec:
        containers:
        - name: worker
          image: worker-img
  ...
```

Some BigData frameworks (e.g. Spark) may have different requirements. Taking Spark as an example, the whole job is restarted
if the 'driver' task fails, while only the failed task is restarted if an 'executor' task fails. The `OnFailure` restartPolicy
is set for the executor and `RestartJob` is set in the driver's `spec.tasks.policies`, as follows.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: spark-job
spec:
  tasks:
  - name: "driver"
    replicas: 1
    policies:
    - event: "*"
      action: RestartJob
    template:
      spec:
        containers:
        - name: driver
          image: driver-img
  - name: "executor"
    replicas: 5
    template:
      spec:
        containers:
        - name: executor
          image: executor-img
        restartPolicy: OnFailure
```

## Features Interaction

### Admission Controller

The following validations must be included to ensure the expected behaviour:

* `spec.minAvailable` <= sum(`spec.tasks.replicas`)
* no duplicated name in the `spec.tasks` array
* no duplicated event handler in a `LifecyclePolicy` array, for both job policies and task policies
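The three checks listed above could look roughly like the following Go sketch. The trimmed-down types and the `validate` and `checkPolicies` helpers are assumptions for illustration; a real admission webhook would work on the full API objects and report errors through the admission response.

```go
package main

import "fmt"

// Trimmed-down versions of the design's types, keeping only validated fields.
type LifecyclePolicy struct {
	Event  string
	Action string
}

type TaskSpec struct {
	Name     string
	Replicas int32
	Policies []LifecyclePolicy
}

type JobSpec struct {
	MinAvailable int32
	Policies     []LifecyclePolicy
	Tasks        []TaskSpec
}

// checkPolicies rejects a policy list that handles the same event twice.
func checkPolicies(where string, policies []LifecyclePolicy) error {
	seen := map[string]bool{}
	for _, p := range policies {
		if seen[p.Event] {
			return fmt.Errorf("%s: duplicated handler for event %q", where, p.Event)
		}
		seen[p.Event] = true
	}
	return nil
}

// validate applies the three rules from the list above.
func validate(spec JobSpec) error {
	var total int32
	names := map[string]bool{}
	for _, task := range spec.Tasks {
		if names[task.Name] {
			return fmt.Errorf("duplicated task name %q", task.Name)
		}
		names[task.Name] = true
		total += task.Replicas
		if err := checkPolicies("task "+task.Name, task.Policies); err != nil {
			return err
		}
	}
	if spec.MinAvailable > total {
		return fmt.Errorf("minAvailable %d exceeds total replicas %d", spec.MinAvailable, total)
	}
	return checkPolicies("job", spec.Policies)
}

func main() {
	spec := JobSpec{
		MinAvailable: 7,
		Tasks:        []TaskSpec{{Name: "ps", Replicas: 1}, {Name: "worker", Replicas: 5}},
	}
	fmt.Println(validate(spec)) // rejected: 7 > 1 + 5
}
```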

### CoScheduling

CoScheduling (or gang-scheduling) is required by most high-performance workloads, e.g. TF training jobs and MPI jobs.
`spec.minAvailable` identifies how many pods are scheduled together. The default value of `spec.minAvailable`
is the sum of `spec.tasks.replicas`. The admission controller webhook checks `spec.minAvailable` against
the sum of `spec.tasks.replicas`; job creation is rejected if `spec.minAvailable` > sum(`spec.tasks.replicas`).
If `spec.minAvailable` < sum(`spec.tasks.replicas`), the pods of `spec.tasks` are created in random order;
refer to the [Task Priority within Job](#task-priority-within-job) section on how to create tasks in order.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: tf-job
spec:
  # minAvailable to run job
  minAvailable: 6
  tasks:
  - name: "ps"
    replicas: 1
    template:
      spec:
        containers:
        - name: "ps"
          image: "ps-img"
  - name: "worker"
    replicas: 5
    template:
      spec:
        containers:
        - name: "worker"
          image: "worker-img"
```

### Task Priority within Job

In addition to multiple pod templates, the priority of each task may differ. The `PriorityClass` of the `PodTemplate` is reused
to define the priority of a task within a job. The following example runs a Spark job with 1 driver and 5 executors; the driver's
priority is `master-pri`, which is higher than that of normal pods. As `spec.minAvailable` is 3, the scheduler makes sure that one
driver and 2 executors are scheduled if resources are insufficient.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: spark-job
spec:
  minAvailable: 3
  tasks:
  - name: "driver"
    replicas: 1
    template:
      spec:
        priorityClassName: "master-pri"
        containers:
        - name: driver
          image: driver-img
  - name: "executor"
    replicas: 5
    template:
      spec:
        containers:
        - name: executor
          image: executor-img
```

**NOTE**: although the scheduler makes sure that high-priority pods within a job are scheduled first, there is still a race
condition between different kubelets, so a low-priority pod may be launched earlier; job/task dependencies will be introduced
later to handle this kind of race condition.
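The selection of the first `spec.minAvailable` pods by priority can be pictured with the following Go sketch. It is a simplification under stated assumptions: the numeric priorities (1000 for `master-pri`, 0 otherwise) are made up, and `firstBatch` is a hypothetical helper that ignores resource fit, which a real scheduler also has to consider.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod carries only a name and a resolved priority value.
type Pod struct {
	Name     string
	Priority int32
}

// firstBatch returns the minAvailable pods with the highest priority.
// A stable sort keeps the input order among pods of equal priority.
func firstBatch(pods []Pod, minAvailable int) []Pod {
	sorted := append([]Pod(nil), pods...) // copy so the caller's slice is untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	if minAvailable < 0 {
		minAvailable = 0
	}
	if minAvailable > len(sorted) {
		minAvailable = len(sorted)
	}
	return sorted[:minAvailable]
}

func main() {
	// Hypothetical priorities: 1000 for the "master-pri" class, 0 for normal pods.
	pods := []Pod{
		{Name: "executor-0"}, {Name: "executor-1"}, {Name: "executor-2"},
		{Name: "executor-3"}, {Name: "executor-4"},
		{Name: "driver-0", Priority: 1000},
	}
	for _, pod := range firstBatch(pods, 3) {
		fmt.Println(pod.Name)
	}
}
```

With `minAvailable` set to 3, the batch contains the driver followed by two executors, matching the example above.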

### Resource sharing between Job

By default, `spec.minAvailable` is set to the sum of `spec.tasks.replicas`; if it is set to a smaller value,
the pods beyond `spec.minAvailable` share resources between jobs.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: spark-job
spec:
  minAvailable: 3
  tasks:
  - name: "driver"
    replicas: 1
    template:
      spec:
        priorityClassName: "master-pri"
        containers:
        - name: driver
          image: driver-img
  - name: "executor"
    replicas: 5
    template:
      spec:
        containers:
        - name: executor
          image: executor-img
```

### Plugins for Job

Many jobs of AI frameworks, e.g. TensorFlow, MPI and MXNet, need environment variables to be set, communication between pods,
and passwordless SSH login. Job API plugins are provided so that users can focus on their core business.
There are currently three plugins; each plugin takes parameters and uses defaults if none are provided.

* env: sets `VK_TASK_INDEX` in each container, an index that gives the container its identity.
* svc: creates a Service and `*.host` to enable communication between pods.
* ssh: enables passwordless SSH login, e.g. for commands such as `mpirun` or `mpiexec`.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mpi-job
spec:
  minAvailable: 2
  schedulerName: volcano
  policies:
  - event: PodEvicted
    action: RestartJob
  plugins:
    ssh: []
    env: []
    svc: []
  tasks:
  - replicas: 1
    name: mpimaster
    template:
      spec:
        containers:
        - image: mpi-image
          name: mpimaster
  - replicas: 2
    name: mpiworker
    template:
      spec:
        containers:
        - image: mpi-image
          name: mpiworker
```
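Inside a container, a workload could read the index provided by the `env` plugin roughly as follows. The sketch assumes that `VK_TASK_INDEX` holds a decimal integer; `taskIndex` is a hypothetical helper that takes the lookup function as a parameter so it can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// taskIndex parses VK_TASK_INDEX using the given lookup function.
// The decimal-integer format of the value is an assumption of this sketch.
func taskIndex(getenv func(string) string) (int, error) {
	raw := getenv("VK_TASK_INDEX")
	if raw == "" {
		return 0, fmt.Errorf("VK_TASK_INDEX is not set")
	}
	index, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid VK_TASK_INDEX %q: %w", raw, err)
	}
	return index, nil
}

func main() {
	index, err := taskIndex(os.Getenv)
	if err != nil {
		fmt.Println("no task index available:", err)
		return
	}
	fmt.Println("task index:", index)
}
```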

## Appendix

```go
type Job struct {
    metav1.TypeMeta `json:",inline"`

    metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

    // Specification of the desired behavior of a job, including the minAvailable
    // +optional
    Spec JobSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`

    // Current status of Job
    // +optional
    Status JobStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

// JobSpec describes how the job execution will look like and when it will actually run
type JobSpec struct {
    // SchedulerName is the default value of `tasks.template.spec.schedulerName`.
    // +optional
    SchedulerName string `json:"schedulerName,omitempty" protobuf:"bytes,1,opt,name=schedulerName"`

    // The minimal available pods to run for this Job
    // +optional
    MinAvailable int32 `json:"minAvailable,omitempty" protobuf:"bytes,2,opt,name=minAvailable"`

    // The volumes mounted on the Job
    Volumes []VolumeSpec `json:"volumes,omitempty" protobuf:"bytes,3,opt,name=volumes"`

    // Tasks specifies the task specification of Job
    // +optional
    Tasks []TaskSpec `json:"tasks,omitempty" protobuf:"bytes,4,opt,name=tasks"`

    // Specifies the default lifecycle of tasks
    // +optional
    Policies []LifecyclePolicy `json:"policies,omitempty" protobuf:"bytes,5,opt,name=policies"`

    // Specifies the plugin of job
    // Key is plugin name, value is the arguments of the plugin
    // +optional
    Plugins map[string][]string `json:"plugins,omitempty" protobuf:"bytes,6,opt,name=plugins"`

    // Specifies the queue that will be used in the scheduler; the "default" queue is used if this is left empty.
    Queue string `json:"queue,omitempty" protobuf:"bytes,7,opt,name=queue"`

    // Specifies the maximum number of retries before marking this Job failed.
    // Defaults to 3.
    // +optional
    MaxRetry int32 `json:"maxRetry,omitempty" protobuf:"bytes,8,opt,name=maxRetry"`
}

// VolumeSpec defines the specification of Volume, e.g. PVC
type VolumeSpec struct {
    MountPath string `json:"mountPath" protobuf:"bytes,1,opt,name=mountPath"`

    // VolumeClaimName defines the PVC name
    VolumeClaimName string `json:"volumeClaimName,omitempty" protobuf:"bytes,2,opt,name=volumeClaimName"`

    // VolumeClaim defines the PVC used by the VolumeMount.
    VolumeClaim *v1.PersistentVolumeClaimSpec `json:"volumeClaim,omitempty" protobuf:"bytes,3,opt,name=volumeClaim"`
}

// Event is the type of Event related to the Job, e.g. PodFailed.
type Event string

const (
    // AllEvents means all events
    AllEvents Event = "*"
    // PodFailedEvent is triggered if a Pod failed
    PodFailedEvent Event = "PodFailed"
    // PodEvictedEvent is triggered if a Pod was deleted
    PodEvictedEvent Event = "PodEvicted"
    // The events below can lead to job 'Unknown':
    // 1. Task Unschedulable, which is triggered when some pods
    //    can't be scheduled while others are already running in the gang-scheduling case.
    JobUnknownEvent Event = "Unknown"

    // OutOfSyncEvent is triggered if Pod/Job were updated
    OutOfSyncEvent Event = "OutOfSync"
    // CommandIssuedEvent is triggered if a command is raised by user
    CommandIssuedEvent Event = "CommandIssued"
    // TaskCompletedEvent is triggered if the 'Replicas' amount of pods in one task have succeeded
    TaskCompletedEvent Event = "TaskCompleted"
)

// Action is the action that Job controller will take according to the event.
type Action string

const (
    // AbortJobAction if this action is set, the whole job will be aborted:
    // all Pods of the Job will be evicted, and no Pod will be recreated
    AbortJobAction Action = "AbortJob"
    // RestartJobAction if this action is set, the whole job will be restarted
    RestartJobAction Action = "RestartJob"
    // TerminateJobAction if this action is set, the whole job will be terminated
    // and cannot be resumed: all Pods of the Job will be evicted, and no Pod will be recreated.
    TerminateJobAction Action = "TerminateJob"
    // CompleteJobAction if this action is set, the unfinished pods will be killed, job completed.
    CompleteJobAction Action = "CompleteJob"

    // ResumeJobAction is the action to resume an aborted job.
    ResumeJobAction Action = "ResumeJob"
    // SyncJobAction is the action to sync Job/Pod status.
    SyncJobAction Action = "SyncJob"
)

// LifecyclePolicy specifies the lifecycle and error handling of task and job.
type LifecyclePolicy struct {
    // The action that will be taken to the PodGroup according to Event.
    // One of "Restart", "None".
    // Default to None.
    // +optional
    Action Action `json:"action,omitempty" protobuf:"bytes,1,opt,name=action"`

    // The Event recorded by scheduler; the controller takes actions
    // according to this Event.
    // +optional
    Event Event `json:"event,omitempty" protobuf:"bytes,2,opt,name=event"`

    // Timeout is the grace period for controller to take actions.
    // Default to nil (take action immediately).
    // +optional
    Timeout *metav1.Duration `json:"timeout,omitempty" protobuf:"bytes,3,opt,name=timeout"`
}

// TaskSpec specifies the task specification of Job
type TaskSpec struct {
    // Name specifies the name of tasks
    Name string `json:"name,omitempty" protobuf:"bytes,1,opt,name=name"`

    // Replicas specifies the replicas of this TaskSpec in Job
    Replicas int32 `json:"replicas,omitempty" protobuf:"bytes,2,opt,name=replicas"`

    // Specifies the pod that will be created for this TaskSpec
    // when executing a Job
    Template v1.PodTemplateSpec `json:"template,omitempty" protobuf:"bytes,3,opt,name=template"`

    // Specifies the lifecycle of task
    // +optional
    Policies []LifecyclePolicy `json:"policies,omitempty" protobuf:"bytes,4,opt,name=policies"`
}

type JobPhase string

const (
    // Pending is the phase that job is pending in the queue, waiting for scheduling decision
    Pending JobPhase = "Pending"
    // Aborting is the phase that job is aborted, waiting for releasing pods
    Aborting JobPhase = "Aborting"
    // Aborted is the phase that job is aborted by user or error handling
    Aborted JobPhase = "Aborted"
    // Running is the phase that minimal available tasks of Job are running
    Running JobPhase = "Running"
    // Restarting is the phase that the Job is restarted, waiting for pod releasing and recreating
    Restarting JobPhase = "Restarting"
    // Completing is the phase that required tasks of job are completed, job starts to clean up
    Completing JobPhase = "Completing"
    // Completed is the phase that all tasks of Job are completed successfully
    Completed JobPhase = "Completed"
    // Terminating is the phase that the Job is terminated, waiting for releasing pods
    Terminating JobPhase = "Terminating"
    // Terminated is the phase that the job finished unexpectedly, e.g. because of events
    Terminated JobPhase = "Terminated"
    // Failed is the phase that the job failed to restart after reaching the maximum number of retries.
    Failed JobPhase = "Failed"
)

// JobState contains details for the current state of the job.
type JobState struct {
    // The phase of Job.
    // +optional
    Phase JobPhase `json:"phase,omitempty" protobuf:"bytes,1,opt,name=phase"`

    // Unique, one-word, CamelCase reason for the phase's last transition.
    // +optional
    Reason string `json:"reason,omitempty" protobuf:"bytes,2,opt,name=reason"`

    // Human-readable message indicating details about last transition.
    // +optional
    Message string `json:"message,omitempty" protobuf:"bytes,3,opt,name=message"`
}

// JobStatus represents the current status of a Job
type JobStatus struct {
    // Current state of Job.
    State JobState `json:"state,omitempty" protobuf:"bytes,1,opt,name=state"`

    // The number of pending pods.
    // +optional
    Pending int32 `json:"pending,omitempty" protobuf:"bytes,2,opt,name=pending"`

    // The number of running pods.
    // +optional
    Running int32 `json:"running,omitempty" protobuf:"bytes,3,opt,name=running"`

    // The number of pods which reached phase Succeeded.
    // +optional
    Succeeded int32 `json:"succeeded,omitempty" protobuf:"bytes,4,opt,name=succeeded"`

    // The number of pods which reached phase Failed.
    // +optional
    Failed int32 `json:"failed,omitempty" protobuf:"bytes,5,opt,name=failed"`

    // The minimal available pods to run for this Job
    // +optional
    MinAvailable int32 `json:"minAvailable,omitempty" protobuf:"bytes,6,opt,name=minAvailable"`

    // The number of pods which reached phase Terminating.
    // +optional
    Terminating int32 `json:"terminating,omitempty" protobuf:"bytes,7,opt,name=terminating"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type JobList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

    Items []Job `json:"items" protobuf:"bytes,2,rep,name=items"`
}
```