### Distributed MNIST Examples

This folder contains an example that trains a model on MNIST with PyTorch. The example is also used for e2e testing.

The Python training script takes several arguments, including one that switches the distributed backend.
The manifests that launch distributed training of this script with the PyTorch operator are under the
respective version folders: [v1](./v1).
Each folder contains manifests with example usage of the different backends.
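
As a rough illustration of how such a backend switch might look inside a training script, here is a minimal sketch. It assumes a `--backend` flag and a `WORLD_SIZE` environment variable set by the launcher. The helper names (`parse_args`, `should_distribute`, `init_distributed`, `main`) are hypothetical and are not taken from the example script itself.

```python
import argparse
import os

# Backend names accepted by torch.distributed; which ones actually work
# depends on how PyTorch was built and on the available hardware.
SUPPORTED_BACKENDS = ("gloo", "nccl", "mpi")


def parse_args(argv=None):
    """Parse the (assumed) --backend flag; defaults to gloo."""
    parser = argparse.ArgumentParser(description="Distributed MNIST (sketch)")
    parser.add_argument(
        "--backend",
        choices=SUPPORTED_BACKENDS,
        default="gloo",
        help="distributed backend to use",
    )
    return parser.parse_args(argv)


def should_distribute():
    """Treat the run as distributed only when WORLD_SIZE > 1 (assumption)."""
    return int(os.environ.get("WORLD_SIZE", "1")) > 1


def init_distributed(backend):
    """Initialise the process group with the chosen backend."""
    # Imported lazily so argument parsing works even without PyTorch installed.
    import torch.distributed as dist

    # With the default env:// init method, MASTER_ADDR, MASTER_PORT, RANK and
    # WORLD_SIZE are read from the environment.
    dist.init_process_group(backend=backend)


def main(argv=None):
    args = parse_args(argv)
    if should_distribute():
        init_distributed(args.backend)
    return args
```

A manifest would then select the backend through the container arguments, for example by passing `--backend gloo` to the script.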

**Note**: By default, a PyTorch job does not work in a user namespace because of Istio [automatic sidecar injection](https://istio.io/v1.3/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). To get it running, add the annotation `sidecar.istio.io/inject: "false"` to disable injection for either the PyTorch pods or the namespace.
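
To show where the annotation sits in the pod-level case, the sketch below patches a PyTorchJob manifest that has already been loaded into a Python dict. The helper `disable_istio_injection` and the sample job are hypothetical. The sketch assumes the usual `spec.pytorchReplicaSpecs.<role>.template` layout of a `kubeflow.org/v1` PyTorchJob.

```python
import copy

ISTIO_INJECT_ANNOTATION = "sidecar.istio.io/inject"


def disable_istio_injection(job):
    """Return a copy of the manifest with sidecar injection disabled
    on the pod template of every replica type (Master, Worker, ...)."""
    patched = copy.deepcopy(job)
    replica_specs = patched.get("spec", {}).get("pytorchReplicaSpecs", {})
    for replica in replica_specs.values():
        metadata = replica.setdefault("template", {}).setdefault("metadata", {})
        # The annotation value must be the string "false", not a boolean.
        metadata.setdefault("annotations", {})[ISTIO_INJECT_ANNOTATION] = "false"
    return patched


# Hypothetical, trimmed-down manifest used only for illustration.
job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "example-job"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {"replicas": 1, "template": {"spec": {"containers": []}}},
            "Worker": {"replicas": 1, "template": {"spec": {"containers": []}}},
        }
    },
}

patched = disable_istio_injection(job)
```

The same result can be had by writing the annotation directly under `template.metadata.annotations` of each replica in the YAML manifest.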

**Build Image**

The default image name and tag is `kubeflow/pytorch-dist-mnist-test:1.0`.

```shell
docker build -f Dockerfile -t kubeflow/pytorch-dist-mnist-test:1.0 ./
```

**Note**: If you are working on a Power system, `Dockerfile.ppc64le` can be used instead.

**Create the mnist PyTorch job**

The example below uses the gloo backend.

```shell
kubectl create -f ./v1/pytorch_job_mnist_gloo.yaml
```