# Distributed Training with MXNet and CPU on Volcano

This is an example of running distributed training with MXNet on CPUs with Volcano. The source code is taken from
the MXNet team's example [here](https://github.com/apache/incubator-mxnet/blob/master/example/distributed_training-horovod/gluon_mnist.py).

The directory contains the following files:
* Dockerfile: Builds the independent worker image.
* Makefile: Builds the above image.
* train-mnist-cpu.yaml: The Volcano Job spec.

To run the example, edit `train-mnist-cpu.yaml` to set your image's name and version. Then run
```
kubectl apply -f train-mnist-cpu.yaml -n ${NAMESPACE}
```
to create the job.

Then use
```
kubectl -n ${NAMESPACE} describe job.batch.volcano.sh mxnet-job
```
to see the status.
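
For a more compact status check than `describe`, the following is a minimal sketch of two helper functions. It assumes the job is named `mxnet-job` (as in the command above), that Volcano labels the job's pods with `volcano.sh/job-name`, and that the job phase is exposed at `.status.state.phase`. The function names and the `KUBECTL` override variable are hypothetical and not part of this example's files.

```shell
# Assumption: NAMESPACE is the namespace the job was created in.
NAMESPACE="${NAMESPACE:-default}"
# Hypothetical override hook: set KUBECTL=echo to print commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# List the pods that belong to the job.
# Assumes Volcano sets the volcano.sh/job-name label on each pod.
list_job_pods() {
  "$KUBECTL" get pods -n "$NAMESPACE" -l "volcano.sh/job-name=mxnet-job"
}

# Print only the job's current phase.
# Assumes the phase is stored under .status.state.phase.
job_phase() {
  "$KUBECTL" get job.batch.volcano.sh mxnet-job -n "$NAMESPACE" \
    -o jsonpath='{.status.state.phase}'
}
```

After sourcing the snippet, call `list_job_pods` to see the pods and `job_phase` to print the phase. If the label or status path differs in your Volcano version, adjust the selector and the JSONPath expression accordingly.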