---
title: "Run a PyTorchJob"
date: 2023-08-09
weight: 6
description: >
  Run a Kueue-scheduled PyTorchJob
---

This page shows how to use Kueue's scheduling and resource management capabilities when running [Training Operator](https://www.kubeflow.org/docs/components/training/pytorch/) PyTorchJobs.

This guide is for [batch users](/docs/tasks#batch-user) who have a basic understanding of Kueue. For more information, see [Kueue's overview](/docs/overview).

## Before you begin

Check [administer cluster quotas](/docs/tasks/administer_cluster_quotas) for details on the initial cluster setup.

Check [the Training Operator installation guide](https://github.com/kubeflow/training-operator#installation).

The minimum required Training Operator version is v1.7.0.

You can [modify Kueue configurations from installed releases](/docs/installation#install-a-custom-configured-released-version) to include PyTorchJobs as an allowed workload.
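
As a rough illustration of that configuration change, the fragment below sketches the `integrations.frameworks` list of a Kueue `Configuration` with PyTorchJob support enabled. It is an excerpt under the assumption that your release uses the `config.kueue.x-k8s.io/v1beta1` configuration API; the rest of the configuration is omitted, so compare it against the manifest shipped with your installed version before applying anything.

```yaml
# Sketch only: excerpt of the Kueue controller configuration.
# All other Configuration fields are omitted here.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
    - "batch/job"                # assumed to be enabled already
    - "kubeflow.org/pytorchjob"  # allows Kueue to manage PyTorchJobs
```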

## PyTorchJob definition

### a. Queue selection

Specify the target [local queue](/docs/concepts/local_queue) in the `metadata.labels` section of the PyTorchJob configuration.

```yaml
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
```
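
The label value has to name a LocalQueue that exists in the same namespace as the PyTorchJob. A minimal sketch of such a LocalQueue follows; the `default` namespace and the ClusterQueue name `cluster-queue` are assumptions for illustration and are not taken from this page.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default           # assumption: must match the PyTorchJob's namespace
  name: user-queue             # the value used in the kueue.x-k8s.io/queue-name label
spec:
  clusterQueue: cluster-queue  # assumption: an existing ClusterQueue set up by the administrator
```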

### b. Optionally set the `suspend` field in PyTorchJobs

```yaml
spec:
  runPolicy:
    suspend: true
```

By default, Kueue sets `suspend` to `true` through its webhook and unsuspends the PyTorchJob when it is admitted, so setting the field yourself is optional.

## Sample PyTorchJob

This example is based on the Training Operator's [simple PyTorch example](https://github.com/kubeflow/training-operator/blob/855e0960668b34992ba4e1fd5914a08a3362cfb1/examples/pytorch/simple.yaml).

{{< include "examples/jobs/sample-pytorchjob.yaml" "yaml" >}}
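
To show how the queue label fits into a complete manifest, here is a minimal sketch of a PyTorchJob. It is not the content of `sample-pytorchjob.yaml`: the job name, namespace, image, and resource requests are hypothetical placeholders, and the `pytorch` container name is assumed to be the default the Training Operator looks for.

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-sketch              # hypothetical name
  namespace: default                # assumption: namespace that contains user-queue
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch         # assumed default container name
              image: registry.example.com/pytorch-train:latest  # hypothetical image
              resources:
                requests:           # requests are what Kueue counts against quota
                  cpu: 1
                  memory: "200Mi"
    Worker:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch         # assumed default container name
              image: registry.example.com/pytorch-train:latest  # hypothetical image
              resources:
                requests:
                  cpu: 1
                  memory: "200Mi"
```

The sketch leaves `spec.runPolicy.suspend` unset and relies on the webhook behavior described above.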