### Distributed MNIST Examples

This folder contains an example that trains a model on MNIST with PyTorch. The same example is also used for end-to-end (e2e) testing.

The Python script that trains MNIST with PyTorch takes several arguments that can be used to switch between distributed backends. The manifests that launch distributed training of this script with the PyTorch operator are in the per-version folders: [v1](./v1). Each folder contains manifests showing how to use the different backends.

**Note**: By default, a PyTorch job doesn't work in a user namespace because of Istio [automatic sidecar injection](https://istio.io/v1.3/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). To get it running, disable injection by adding the annotation `sidecar.istio.io/inject: "false"` to either the PyTorch pods or the namespace.

**Build Image**

The default image name and tag is `kubeflow/pytorch-dist-mnist-test:1.0`.

```shell
docker build -f Dockerfile -t kubeflow/pytorch-dist-mnist-test:1.0 ./
```

NOTE: If you are working on a Power system, use `Dockerfile.ppc64le` instead.

**Create the mnist PyTorch job**

The example below uses the gloo backend.

```shell
kubectl create -f ./v1/pytorch_job_mnist_gloo.yaml
```
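You can script the **Build Image** step so that it picks `Dockerfile` or `Dockerfile.ppc64le` automatically. Below is a minimal sketch. It assumes a Linux host where `uname -m` reports `ppc64le` on little-endian Power systems, and it must run from this folder. The helper names `pick_dockerfile` and `build_image` are hypothetical and are not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Dockerfile to use for an architecture.
# Uses the first argument if given, otherwise the host's `uname -m`.
# Assumption: Power (little-endian) hosts report "ppc64le".
pick_dockerfile() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    ppc64le) echo "Dockerfile.ppc64le" ;;
    *)       echo "Dockerfile" ;;
  esac
}

# Hypothetical helper: build the example image with the default name and tag
# (overridable via the first argument). Run from the folder with the Dockerfiles.
build_image() {
  local tag="${1:-kubeflow/pytorch-dist-mnist-test:1.0}"
  docker build -f "$(pick_dockerfile)" -t "$tag" ./
}
```

After sourcing the script, run `build_image` to build `kubeflow/pytorch-dist-mnist-test:1.0` with whichever Dockerfile matches the host. Keying the choice on `uname -m` keeps the build command the same on every architecture.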