# Distributed Training with MXNet and CPU on Volcano

This is an example of running distributed training with MXNet on CPU using Volcano. The source code is taken from
the MXNet team's example, [gluon_mnist.py](https://github.com/apache/incubator-mxnet/blob/master/example/distributed_training-horovod/gluon_mnist.py).

The directory contains the following files:
* Dockerfile: Builds the independent worker image.
* Makefile: For building the above image.
* train-mnist-cpu.yaml: The Volcano Job spec.

To run the example, edit `train-mnist-cpu.yaml` to set your image's name and version. Then run
```
kubectl apply -f train-mnist-cpu.yaml -n ${NAMESPACE}
```
to create the job.
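
If you would rather script the image edit than change the YAML by hand, the following is a minimal sketch. The `set_image` helper is hypothetical (it is not part of this example directory), and it assumes every container image in the manifest is declared on its own `image: ...` line. The registry, image name and tag in the usage comment are placeholders.

```shell
# Hypothetical helper: rewrite every `image:` field in a manifest.
# Assumption: each image is declared on its own line, e.g. "- image: foo:bar"
# or "image: foo:bar", and the image reference contains no "|" or "&".
set_image() {
  manifest="$1"
  image="$2"
  # Keep the indentation and the "image:" key, replace only the value.
  sed -E "s|^([[:space:]]*(- )?image:[[:space:]]*).*$|\1${image}|" "$manifest" > "${manifest}.tmp" &&
    mv "${manifest}.tmp" "$manifest"
}

# Example usage (placeholder image reference):
# set_image train-mnist-cpu.yaml registry.example.com/mxnet-mnist-cpu:v1
```

Review the result with `git diff` or a text editor before applying it, since the helper rewrites all `image:` lines in the file.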

Then use
```
kubectl -n ${NAMESPACE} describe job.batch.volcano.sh mxnet-job
```
to see the status.
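
For unattended runs, polling the job's phase can be more convenient than reading the `describe` output. The sketch below assumes the Volcano Job reports its phase under `.status.state.phase` and treats `Completed` as success and `Failed`, `Aborted` or `Terminated` as failure. The `job_phase` and `wait_for_job` helpers and the `KUBECTL` override are illustrative names, not part of this example.

```shell
# Command used to talk to the cluster; override KUBECTL to use another binary.
KUBECTL="${KUBECTL:-kubectl}"

# Print the current phase of a Volcano Job (assumes .status.state.phase is set).
job_phase() {
  "$KUBECTL" -n "$1" get job.batch.volcano.sh "$2" -o jsonpath='{.status.state.phase}'
}

# Poll until the job reaches a terminal phase.
# Returns 0 on Completed, 1 on Failed/Aborted/Terminated.
wait_for_job() {
  namespace="$1"
  job="$2"
  interval="${3:-10}"   # seconds between polls
  while :; do
    phase="$(job_phase "$namespace" "$job")"
    echo "phase: ${phase:-unknown}"
    case "$phase" in
      Completed) return 0 ;;
      Failed|Aborted|Terminated) return 1 ;;
    esac
    sleep "$interval"
  done
}

# Example usage:
# wait_for_job "${NAMESPACE}" mxnet-job
```

The loop has no timeout, so wrap it in your CI system's own time limit if the job may stay pending indefinitely.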