// Source: github.com/kubeflow/training-operator@v1.7.0/docs/api/kubeflow.org_v1_generated.asciidoc
// Generated documentation. Please do not edit.
:anchor_prefix: k8s-api

[id="{p}-api-reference"]
= API Reference

.Packages
- xref:{anchor_prefix}-kubeflow-org-v1[$$kubeflow.org/v1$$]


[id="{anchor_prefix}-kubeflow-org-v1"]
== kubeflow.org/v1

Package v1 is the v1 version of the API.

Package v1 contains API Schema definitions for the kubeflow.org v1 API group.

.Resource Types
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijob[$$MPIJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijoblist[$$MPIJobList$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjob[$$MXJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjoblist[$$MXJobList$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejob[$$PaddleJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejoblist[$$PaddleJobList$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjob[$$PyTorchJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjoblist[$$PyTorchJobList$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjob[$$TFJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjoblist[$$TFJobList$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjob[$$XGBoostJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjoblist[$$XGBoostJobList$$]


=== Definitions

[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-elasticpolicy"]
==== ElasticPolicy


.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjobspec[$$PyTorchJobSpec$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`minReplicas`* __integer__ | minReplicas is the lower limit for the number of replicas to which the training job can scale down. It defaults to null.
| *`maxReplicas`* __integer__ | maxReplicas is the upper limit for the number of pods that can be set by the autoscaler. It cannot be smaller than minReplicas and defaults to null.
| *`rdzvBackend`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-rdzvbackend[$$RDZVBackend$$]__ |
| *`rdzvPort`* __integer__ |
| *`rdzvHost`* __string__ |
| *`rdzvId`* __string__ |
| *`rdzvConf`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-rdzvconf[$$RDZVConf$$] array__ | RDZVConf contains additional rendezvous configuration (<key1>=<value1>,<key2>=<value2>,...).
| *`standalone`* __boolean__ | Start a local standalone rendezvous backend that is represented by a C10d TCP store on port 29400. Useful when launching a single-node, multi-worker job. If specified, `--rdzv_backend`, `--rdzv_endpoint`, and `--rdzv_id` are auto-assigned; any explicitly set values are ignored.
| *`nProcPerNode`* __integer__ | Number of workers per node; supported values: [auto, cpu, gpu, int]. Deprecated: this field is deprecated in v1.7+. Use `.spec.nprocPerNode` instead.
| *`maxRestarts`* __integer__ |
| *`metrics`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#metricspec-v2-autoscaling[$$MetricSpec$$] array__ | Metrics contains the specifications which are used to calculate the desired replica count (the maximum replica count across all metrics will be used).
The desired replica count is calculated by multiplying the ratio between the target value and the current value by the current number of pods. Ergo, metrics used must decrease as the pod count is increased, and vice-versa. See the individual metric source types for more information about how each type of metric must respond. If not set, the HPA will not be created.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobcondition"]
==== JobCondition

JobCondition describes the state of the job at a certain point.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`type`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobconditiontype[$$JobConditionType$$]__ | Type of job condition.
| *`status`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#conditionstatus-v1-core[$$ConditionStatus$$]__ | Status of the condition, one of True, False, Unknown.
| *`reason`* __string__ | The reason for the condition's last transition.
| *`message`* __string__ | A human-readable message indicating details about the transition.
| *`lastUpdateTime`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta[$$Time$$]__ | The last time this condition was updated.
| *`lastTransitionTime`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta[$$Time$$]__ | Last time the condition transitioned from one status to another.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobconditiontype"]
==== JobConditionType (string)

JobConditionType defines the types of conditions that can be reported in a JobStatus.
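
A JobStatus reports job state as a list of JobCondition entries, each pairing a JobConditionType with a ConditionStatus of True, False, or Unknown. The Go sketch below is a minimal illustration rather than the operator's own code: it uses a local stand-in type to check whether a job has finished by looking for a terminal condition whose status is True. The condition type strings `Created`, `Running`, `Succeeded`, and `Failed` are assumptions, since this reference does not enumerate the JobConditionType values.

```go
package main

import "fmt"

// JobCondition is a local, simplified mirror of two documented JobCondition
// fields; it is not the operator's own Go type.
type JobCondition struct {
	Type   string // a JobConditionType value, e.g. "Succeeded" (assumed)
	Status string // one of "True", "False", "Unknown"
}

// hasCondition reports whether conditions holds an entry of the given type
// whose status is "True".
func hasCondition(conditions []JobCondition, condType string) bool {
	for _, c := range conditions {
		if c.Type == condType && c.Status == "True" {
			return true
		}
	}
	return false
}

// isFinished treats a job as finished once a "Succeeded" or "Failed"
// condition is True. Both type strings are assumptions for this sketch.
func isFinished(conditions []JobCondition) bool {
	return hasCondition(conditions, "Succeeded") || hasCondition(conditions, "Failed")
}

func main() {
	conditions := []JobCondition{
		{Type: "Created", Status: "True"},
		{Type: "Running", Status: "False"},
		{Type: "Succeeded", Status: "True"},
	}
	fmt.Println(isFinished(conditions)) // true
}
```

Matching on both `type` and `status` matters: a condition can be present in the list while its status is False or Unknown.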

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobcondition[$$JobCondition$$]
****


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobmodetype"]
==== JobModeType (string)

JobModeType is the type for JobMode.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjobspec[$$MXJobSpec$$]
****


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus"]
==== JobStatus

JobStatus represents the current observed state of the training Job.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijob[$$MPIJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjob[$$MXJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejob[$$PaddleJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjob[$$PyTorchJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjob[$$TFJob$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjob[$$XGBoostJob$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`conditions`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobcondition[$$JobCondition$$] array__ | Conditions is an array of current observed job conditions.
| *`replicaStatuses`* __object (keys:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype[$$ReplicaType$$], values:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicastatus[$$ReplicaStatus$$])__ | ReplicaStatuses is a map of ReplicaType to ReplicaStatus that specifies the status of each replica.
| *`startTime`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta[$$Time$$]__ | Represents the time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
| *`completionTime`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta[$$Time$$]__ | Represents the time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
| *`lastReconcileTime`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#time-v1-meta[$$Time$$]__ | Represents the last time the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijob"]
==== MPIJob


.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijoblist[$$MPIJobList$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `MPIJob`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ |
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta[$$ObjectMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`spec`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijobspec[$$MPIJobSpec$$]__ |
| *`status`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]__ |
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijoblist"]
==== MPIJobList


[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `MPIJobList`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ |
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#listmeta-v1-meta[$$ListMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`items`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijob[$$MPIJob$$] array__ |
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijobspec"]
==== MPIJobSpec


.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijob[$$MPIJob$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`slotsPerWorker`* __integer__ | Specifies the number of slots per worker used in the hostfile. Defaults to 1.
| *`cleanPodPolicy`* __CleanPodPolicy__ | CleanPodPolicy defines whether to kill pods after the job completes. Defaults to None.
| *`mpiReplicaSpecs`* __object (keys:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype[$$ReplicaType$$], values:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec[$$ReplicaSpec$$])__ | `MPIReplicaSpecs` contains a map from `MPIReplicaType` to `ReplicaSpec` that specifies the MPI replicas to run.
| *`mainContainer`* __string__ | MainContainer specifies the name of the main container that executes the MPI code.
| *`runPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy[$$RunPolicy$$]__ | `RunPolicy` encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjob"]
==== MXJob

MXJob is the Schema for the mxjobs API.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjoblist[$$MXJobList$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `MXJob`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ |
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta[$$ObjectMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`spec`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjobspec[$$MXJobSpec$$]__ |
| *`status`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]__ |
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjoblist"]
==== MXJobList

MXJobList contains a list of MXJobs.


[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `MXJobList`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ |
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#listmeta-v1-meta[$$ListMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`items`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjob[$$MXJob$$] array__ |
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjobspec"]
==== MXJobSpec

MXJobSpec defines the desired state of MXJob.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjob[$$MXJob$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`runPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy[$$RunPolicy$$]__ | RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
| *`jobMode`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobmodetype[$$JobModeType$$]__ | JobMode specifies the kind of MXJob to run. Different modes may require different MXReplicaSpecs.
| *`mxReplicaSpecs`* __object (keys:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype[$$ReplicaType$$], values:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec[$$ReplicaSpec$$])__ | MXReplicaSpecs is a map of ReplicaType to ReplicaSpec that specifies the MX replicas to run.
For example, { "Scheduler": ReplicaSpec, "Server": ReplicaSpec, "Worker": ReplicaSpec, }
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddleelasticpolicy"]
==== PaddleElasticPolicy


.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejobspec[$$PaddleJobSpec$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`minReplicas`* __integer__ | minReplicas is the lower limit for the number of replicas to which the training job can scale down. It defaults to null.
| *`maxReplicas`* __integer__ | maxReplicas is the upper limit for the number of pods that can be set by the autoscaler. It cannot be smaller than minReplicas and defaults to null.
| *`maxRestarts`* __integer__ | MaxRestarts is the limit on the number of pod restarts in elastic mode.
| *`metrics`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#metricspec-v2-autoscaling[$$MetricSpec$$] array__ | Metrics contains the specifications which are used to calculate the desired replica count (the maximum replica count across all metrics will be used). The desired replica count is calculated by multiplying the ratio between the target value and the current value by the current number of pods. Ergo, metrics used must decrease as the pod count is increased, and vice-versa. See the individual metric source types for more information about how each type of metric must respond. If not set, the HPA will not be created.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejob"]
==== PaddleJob

PaddleJob represents a PaddleJob resource.
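
The `metrics` field of PaddleElasticPolicy (and of ElasticPolicy) defers to the Kubernetes HorizontalPodAutoscaler. The Go sketch below assumes the standard HPA rule, `desiredReplicas = ceil(currentReplicas × currentValue / targetValue)`, evaluates it per metric, keeps the maximum across metrics, and clamps the result to the `minReplicas` and `maxReplicas` bounds. The `metricSample` type is hypothetical, and the real HPA's tolerance and stabilization behavior are deliberately left out.

```go
package main

import (
	"fmt"
	"math"
)

// metricSample is a hypothetical pairing of one metric's observed value and
// its target value, as an HPA would see them.
type metricSample struct {
	Current float64
	Target  float64
}

// desiredReplicas applies ceil(currentReplicas * current / target) per metric,
// keeps the largest result, and clamps it to [minReplicas, maxReplicas].
// Tolerance and stabilization windows used by the real HPA are omitted.
func desiredReplicas(currentReplicas, minReplicas, maxReplicas int, metrics []metricSample) int {
	desired := 0
	for _, m := range metrics {
		n := int(math.Ceil(float64(currentReplicas) * m.Current / m.Target))
		if n > desired {
			desired = n
		}
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 4 pods, utilization at 90 against a target of 60: ceil(4 * 90 / 60) = 6.
	fmt.Println(desiredReplicas(4, 2, 8, []metricSample{{Current: 90, Target: 60}}))
}
```

Because the current value sits in the numerator, a metric that rises with load pushes the replica count up, which is why the description requires metrics that fall as pods are added.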

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejoblist[$$PaddleJobList$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `PaddleJob`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard Kubernetes type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta[$$ObjectMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`spec`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejobspec[$$PaddleJobSpec$$]__ | Specification of the desired state of the PaddleJob.
| *`status`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]__ | Most recently observed status of the PaddleJob. Read-only (modified by the system).
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejoblist"]
==== PaddleJobList

PaddleJobList is a list of PaddleJobs.


[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `PaddleJobList`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#listmeta-v1-meta[$$ListMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`items`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejob[$$PaddleJob$$] array__ | List of PaddleJobs.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejobspec"]
==== PaddleJobSpec

PaddleJobSpec is a desired state description of the PaddleJob.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejob[$$PaddleJob$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`runPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy[$$RunPolicy$$]__ | RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
| *`elasticPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddleelasticpolicy[$$PaddleElasticPolicy$$]__ | ElasticPolicy holds the elastic policy for the PaddleJob.
| *`paddleReplicaSpecs`* __object (keys:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype[$$ReplicaType$$], values:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec[$$ReplicaSpec$$])__ | A map of PaddleReplicaType (type) to ReplicaSpec (value). Specifies the Paddle cluster configuration. For example, { "Master": PaddleReplicaSpec, "Worker": PaddleReplicaSpec, }
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjob"]
==== PyTorchJob

PyTorchJob represents a PyTorchJob resource.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjoblist[$$PyTorchJobList$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `PyTorchJob`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard Kubernetes type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta[$$ObjectMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`spec`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjobspec[$$PyTorchJobSpec$$]__ | Specification of the desired state of the PyTorchJob.
| *`status`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]__ | Most recently observed status of the PyTorchJob. Read-only (modified by the system).
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjoblist"]
==== PyTorchJobList

PyTorchJobList is a list of PyTorchJobs.


[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `PyTorchJobList`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#listmeta-v1-meta[$$ListMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`items`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjob[$$PyTorchJob$$] array__ | List of PyTorchJobs.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjobspec"]
==== PyTorchJobSpec

PyTorchJobSpec is a desired state description of the PyTorchJob.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjob[$$PyTorchJob$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`runPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy[$$RunPolicy$$]__ | RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
| *`elasticPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-elasticpolicy[$$ElasticPolicy$$]__ |
| *`pytorchReplicaSpecs`* __object (keys:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype[$$ReplicaType$$], values:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec[$$ReplicaSpec$$])__ | A map of PyTorchReplicaType (type) to ReplicaSpec (value). Specifies the PyTorch cluster configuration. For example, { "Master": PyTorchReplicaSpec, "Worker": PyTorchReplicaSpec, }
| *`nprocPerNode`* __string__ | Number of workers per node; supported values: [auto, cpu, gpu, int]. For more details, see https://github.com/pytorch/pytorch/blob/26f7f470df64d90e092081e39507e4ac751f55d6/torch/distributed/run.py#L629-L658. Defaults to auto.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-rdzvbackend"]
==== RDZVBackend (string)


.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-elasticpolicy[$$ElasticPolicy$$]
****


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-rdzvconf"]
==== RDZVConf


.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-elasticpolicy[$$ElasticPolicy$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`key`* __string__ |
| *`value`* __string__ |
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec"]
==== ReplicaSpec

ReplicaSpec is a description of the replica.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijobspec[$$MPIJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjobspec[$$MXJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejobspec[$$PaddleJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjobspec[$$PyTorchJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjobspec[$$TFJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjobspec[$$XGBoostJobSpec$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`replicas`* __integer__ | Replicas is the desired number of replicas of the given template. If unspecified, defaults to 1.
| *`template`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core[$$PodTemplateSpec$$]__ | Template is the object that describes the pod that will be created for this replica. The RestartPolicy in the PodTemplateSpec is overridden by the RestartPolicy in the ReplicaSpec.
| *`restartPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-restartpolicy[$$RestartPolicy$$]__ | Restart policy for all replicas within the job. One of Always, OnFailure, Never, and ExitCode. Defaults to Never.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicastatus"]
==== ReplicaStatus

ReplicaStatus represents the current observed state of the replica.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`active`* __integer__ | The number of actively running pods.
| *`succeeded`* __integer__ | The number of pods which reached phase Succeeded.
| *`failed`* __integer__ | The number of pods which reached phase Failed.
| *`labelSelector`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#labelselector-v1-meta[$$LabelSelector$$]__ | Deprecated: Use Selector instead.
| *`selector`* __string__ | A Selector is a label query over a set of resources. The results of matchLabels and matchExpressions are ANDed. An empty Selector matches all objects. A null Selector matches no objects.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype"]
==== ReplicaType (string)

ReplicaType represents the type of the replica. Each operator needs to define its own set of ReplicaTypes.
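
ReplicaType values serve as the keys of each job's replica-spec map (for example `Master` and `Worker` in PyTorchJobSpec), and ReplicaSpec documents that an unspecified `replicas` defaults to 1. The Go sketch below combines those two facts to count the replicas a job requests across all replica types. It uses local stand-in types, not the operator's own Go types.

```go
package main

import "fmt"

// ReplicaType is a local stand-in for the documented string type.
type ReplicaType string

// ReplicaSpec keeps only the replicas field; nil means "unspecified".
type ReplicaSpec struct {
	Replicas *int32
}

// totalReplicas sums the desired replicas over every replica type,
// counting an unspecified Replicas as the documented default of 1.
func totalReplicas(specs map[ReplicaType]ReplicaSpec) int32 {
	var total int32
	for _, spec := range specs {
		if spec.Replicas == nil {
			total++ // documented default when replicas is unspecified
			continue
		}
		total += *spec.Replicas
	}
	return total
}

func main() {
	workers := int32(3)
	specs := map[ReplicaType]ReplicaSpec{
		"Master": {},                   // replicas unspecified, counts as 1
		"Worker": {Replicas: &workers}, // explicit count
	}
	fmt.Println(totalReplicas(specs)) // 4
}
```

A pointer distinguishes "unspecified" from an explicit `0`, which a plain integer field could not do.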

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijobspec[$$MPIJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjobspec[$$MXJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejobspec[$$PaddleJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjobspec[$$PyTorchJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjobspec[$$TFJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjobspec[$$XGBoostJobSpec$$]
****


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-restartpolicy"]
==== RestartPolicy (string)

RestartPolicy describes how the replicas should be restarted. Only one restart policy may be specified. If none is specified, the default is RestartPolicyAlways.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec[$$ReplicaSpec$$]
****


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy"]
==== RunPolicy

RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mpijobspec[$$MPIJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-mxjobspec[$$MXJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-paddlejobspec[$$PaddleJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-pytorchjobspec[$$PyTorchJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjobspec[$$TFJobSpec$$]
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjobspec[$$XGBoostJobSpec$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`cleanPodPolicy`* __CleanPodPolicy__ | CleanPodPolicy defines the policy for killing pods after the job completes. Defaults to None.
| *`ttlSecondsAfterFinished`* __integer__ | TTLSecondsAfterFinished is the TTL to clean up jobs. Cleanup may take an extra ReconcilePeriod seconds, since reconcile is called periodically. Defaults to infinite.
| *`activeDeadlineSeconds`* __integer__ | Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; the value must be a positive integer.
| *`backoffLimit`* __integer__ | Optional number of retries before marking this job failed.
| *`schedulingPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-schedulingpolicy[$$SchedulingPolicy$$]__ | SchedulingPolicy defines the policy related to scheduling, e.g. gang-scheduling.
| *`suspend`* __boolean__ | suspend specifies whether the Job controller should create Pods or not. If a Job is created with suspend set to true, no Pods are created by the Job controller. If a Job is suspended after creation (i.e.
the flag goes from false to true), the Job controller will delete all active Pods and PodGroups associated with this Job. Users must design their workload to gracefully handle this. Suspending a Job will reset the StartTime field of the Job.
Defaults to false.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-schedulingpolicy"]
==== SchedulingPolicy

SchedulingPolicy encapsulates various scheduling policies of the distributed training job, for example `minAvailable` for gang-scheduling.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy[$$RunPolicy$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`minAvailable`* __integer__ |
| *`queue`* __string__ |
| *`minResources`* __Quantity__ |
| *`priorityClass`* __string__ |
| *`scheduleTimeoutSeconds`* __integer__ |
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-successpolicy"]
==== SuccessPolicy (string)

SuccessPolicy defines the policy used to mark a TFJob as succeeded.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjobspec[$$TFJobSpec$$]
****


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjob"]
==== TFJob

TFJob represents a TFJob resource.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjoblist[$$TFJobList$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `TFJob`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard Kubernetes type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta[$$ObjectMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`spec`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjobspec[$$TFJobSpec$$]__ | Specification of the desired state of the TFJob.
| *`status`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]__ | Most recently observed status of the TFJob. Populated by the system. Read-only.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjoblist"]
==== TFJobList

TFJobList is a list of TFJobs.

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `TFJobList`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#listmeta-v1-meta[$$ListMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`items`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjob[$$TFJob$$] array__ | List of TFJobs.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjobspec"]
==== TFJobSpec

TFJobSpec is a desired state description of the TFJob.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-tfjob[$$TFJob$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`runPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy[$$RunPolicy$$]__ | RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
| *`successPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-successpolicy[$$SuccessPolicy$$]__ | SuccessPolicy defines the policy for marking the TFJob as succeeded. Defaults to `""`, which applies the default rules.
| *`tfReplicaSpecs`* __object (keys:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype[$$ReplicaType$$], values:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec[$$ReplicaSpec$$])__ | A map of TFReplicaType (type) to ReplicaSpec (value) that specifies the TF cluster configuration, for example `{"PS": ReplicaSpec, "Worker": ReplicaSpec}`.
| *`enableDynamicWorker`* __boolean__ | A switch to enable dynamic workers.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjob"]
==== XGBoostJob

XGBoostJob is the Schema for the xgboostjobs API.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjoblist[$$XGBoostJobList$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `XGBoostJob`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard Kubernetes type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#objectmeta-v1-meta[$$ObjectMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`spec`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjobspec[$$XGBoostJobSpec$$]__ | Specification of the desired state of the XGBoostJob.
| *`status`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-jobstatus[$$JobStatus$$]__ | Most recently observed status of the XGBoostJob.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjoblist"]
==== XGBoostJobList

XGBoostJobList contains a list of XGBoostJobs.

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`apiVersion`* __string__ | `kubeflow.org/v1`
| *`kind`* __string__ | `XGBoostJobList`
| *`TypeMeta`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#typemeta-v1-meta[$$TypeMeta$$]__ | Standard Kubernetes type metadata.
| *`metadata`* __link:https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#listmeta-v1-meta[$$ListMeta$$]__ | Refer to Kubernetes API documentation for fields of `metadata`.

| *`items`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjob[$$XGBoostJob$$] array__ | List of XGBoostJobs.
|===


[id="{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjobspec"]
==== XGBoostJobSpec

XGBoostJobSpec defines the desired state of XGBoostJob.

.Appears In:
****
- xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-xgboostjob[$$XGBoostJob$$]
****

[cols="25a,75a", options="header"]
|===
| Field | Description
| *`runPolicy`* __xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-runpolicy[$$RunPolicy$$]__ | RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
| *`xgbReplicaSpecs`* __object (keys:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicatype[$$ReplicaType$$], values:xref:{anchor_prefix}-github-com-kubeflow-training-operator-pkg-apis-kubeflow-org-v1-replicaspec[$$ReplicaSpec$$])__ | A map of ReplicaType (type) to ReplicaSpec (value) that specifies the XGBoost cluster configuration.
|===