# Volcano Job Plugin -- Env User Guidance

## Background
**Env Plugin** is designed for workloads in which a pod needs to know its index within the task, such as [MPI](https://www.open-mpi.org/)
and [TensorFlow](https://tensorflow.google.cn/). The indices are registered as **environment variables** automatically
when the Volcano job is created. For example, suppose a TensorFlow job consists of *1* ps and *2* workers, and each worker maps
to a slice of the raw data. So that each worker knows its target slice, it reads its index from the environment
variables.

## Key Points
* The environment variable keys for the index are `VK_TASK_INDEX` and `VC_TASK_INDEX`; both hold the same value.
* The value of the index is an integer ranging from `0` to `length - 1`, where `length` is the number of replicas
of the task. It is also the index of the pod within the task.

## Examples
```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: tensorflow-dist-mnist
spec:
  minAvailable: 3
  schedulerName: volcano
  plugins:
    env: []   ## Env plugin register, note that no values are needed in the array.
    svc: []
  policies:
    - event: PodEvicted
      action: RestartJob
  queue: default
  tasks:
    - replicas: 1
      name: ps
      template:
        spec:
          containers:
            - command:
                - sh
                - -c
                - |
                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
                  export TF_CONFIG={\"cluster\":{\"ps\":[${PS_HOST}],\"worker\":[${WORKER_HOST}]},\"task\":{\"type\":\"ps\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"};  ## Get the index from the environment variable and configure it in the TF job.
                  python /var/tf_dist_mnist/dist_mnist.py
              image: volcanosh/dist-mnist-tf-example:0.0.1
              name: tensorflow
              ports:
                - containerPort: 2222
                  name: tfjob-port
              resources: {}
          restartPolicy: Never
    - replicas: 2
      name: worker
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - sh
                - -c
                - |
                  PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
                  WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | sed 's/^/"/;s/$/"/' | tr "\n" ","`;
                  export TF_CONFIG={\"cluster\":{\"ps\":[${PS_HOST}],\"worker\":[${WORKER_HOST}]},\"task\":{\"type\":\"worker\",\"index\":${VK_TASK_INDEX}},\"environment\":\"cloud\"};
                  python /var/tf_dist_mnist/dist_mnist.py
              image: volcanosh/dist-mnist-tf-example:0.0.1
              name: tensorflow
              ports:
                - containerPort: 2222
                  name: tfjob-port
              resources: {}
          restartPolicy: Never
```
Note:
* When the env plugin is configured as in the TensorFlow job above, every pod has two environment variables, `VK_TASK_INDEX` and
`VC_TASK_INDEX`. The environment variables registered in the `ps` pod are as follows.
  ```
  [root@tensorflow-dist-mnist-ps-0 /] env | grep TASK_INDEX
  VK_TASK_INDEX=0
  VC_TASK_INDEX=0
  ```
* The 2 workers are named `tensorflow-dist-mnist-worker-0` and
`tensorflow-dist-mnist-worker-1`, and the corresponding values of their index environment variables are `0` and `1`.
  ```
  [root@tensorflow-dist-mnist-worker-0 /] env | grep TASK_INDEX
  VK_TASK_INDEX=0
  VC_TASK_INDEX=0
  ```
  ```
  [root@tensorflow-dist-mnist-worker-1 /] env | grep TASK_INDEX
  VK_TASK_INDEX=1
  VC_TASK_INDEX=1
  ```
## Note
* For historical reasons, the environment variables `VK_TASK_INDEX` and `VC_TASK_INDEX` both exist. `VK_TASK_INDEX` will
be **deprecated** in a future release.
* No values are needed when registering the env plugin in the Volcano job.
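
To illustrate how a process inside a pod might consume the index, here is a minimal Python sketch. It is not taken from `dist_mnist.py`; the helper names `task_index` and `slice_for`, the strided slicing scheme, and the shard names are assumptions made for illustration only. It prefers `VC_TASK_INDEX` and falls back to `VK_TASK_INDEX`, since the latter is slated for deprecation.

```python
import os


def task_index(environ=None):
    """Return this pod's index within its task, as set by the env plugin.

    Prefers VC_TASK_INDEX and falls back to the older VK_TASK_INDEX.
    `environ` defaults to the process environment; passing a dict makes
    the function easy to try out without running inside a pod.
    """
    if environ is None:
        environ = os.environ
    for key in ("VC_TASK_INDEX", "VK_TASK_INDEX"):
        value = environ.get(key)
        if value is not None:
            return int(value)
    raise RuntimeError(
        "neither VC_TASK_INDEX nor VK_TASK_INDEX is set; "
        "is the env plugin enabled for this job?"
    )


def slice_for(index, replicas, items):
    """Hypothetical helper: give replica `index` every `replicas`-th item."""
    return items[index::replicas]


# Simulate worker-1 of the two-worker task from the example job.
shards = [f"shard-{i}" for i in range(6)]  # illustrative data only
idx = task_index({"VC_TASK_INDEX": "1", "VK_TASK_INDEX": "1"})
print(slice_for(idx, 2, shards))  # ['shard-1', 'shard-3', 'shard-5']
```

Inside a real pod, calling `task_index()` with no argument reads the variables from the container environment. The replica count is not provided by the index variables themselves, so in this sketch it has to be supplied by the caller.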