---
title: "Run a PyTorchJob"
date: 2023-08-09
weight: 6
description: >
  Run a Kueue scheduled PyTorchJob
---

This page shows how to leverage Kueue's scheduling and resource management capabilities when running [Training Operator](https://www.kubeflow.org/docs/components/training/pytorch/) PyTorchJobs.

This guide is for [batch users](/docs/tasks#batch-user) who have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

## Before you begin

Check [administer cluster quotas](/docs/tasks/administer_cluster_quotas) for details on the initial cluster setup.

Check [the Training Operator installation guide](https://github.com/kubeflow/training-operator#installation).

Note that the minimum required Training Operator version is v1.7.0.

You can [modify the Kueue configuration of an installed release](/docs/installation#install-a-custom-configured-released-version) to include PyTorchJobs as an allowed workload.

## PyTorchJob definition

### a. Queue selection

The target [local queue](/docs/concepts/local_queue) should be specified in the `metadata.labels` section of the PyTorchJob configuration.

```yaml
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
```

### b. Optionally set the `suspend` field in PyTorchJobs

```yaml
spec:
  runPolicy:
    suspend: true
```

By default, Kueue sets `suspend` to true through a webhook and unsuspends the PyTorchJob when it is admitted.

## Sample PyTorchJob

This example is based on https://github.com/kubeflow/training-operator/blob/855e0960668b34992ba4e1fd5914a08a3362cfb1/examples/pytorch/simple.yaml.

{{< include "examples/jobs/sample-pytorchjob.yaml" "yaml" >}}
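
To show where the queue label and the optional `suspend` field sit relative to the rest of a manifest, here is a minimal sketch. It is not the included sample: the job name, namespace, image, command, and resource requests are placeholders assumed for illustration. Only the `kueue.x-k8s.io/queue-name` label, the `user-queue` value, and `spec.runPolicy.suspend` come from this page.

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-sketch              # hypothetical name
  namespace: default                # assumed; use the namespace of your LocalQueue
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # target LocalQueue (section a)
spec:
  runPolicy:
    suspend: true                   # optional (section b); set by Kueue's webhook by default
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: example.com/pytorch-train:latest      # placeholder image
              command: ["python3", "/workspace/train.py"]  # placeholder command
              resources:
                requests:           # placeholder values
                  cpu: "1"
                  memory: 1Gi
    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: example.com/pytorch-train:latest      # placeholder image
              command: ["python3", "/workspace/train.py"]  # placeholder command
              resources:
                requests:           # placeholder values
                  cpu: "1"
                  memory: 1Gi
```

The sketch sets explicit resource requests on every container on the assumption that your ClusterQueue quota is defined for `cpu` and `memory`; adjust them to match the resources your queue actually covers.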