# Task Topology Plugin

## Introduction

In big data processing jobs such as TensorFlow and Spark, tasks exchange large amounts of data with each other, so transmission delay accounts for a large proportion of job execution time. The task-topology plugin adjusts the scheduling strategy according to the transmission topology inside a job, in order to reduce the amount of data transmitted between nodes, decrease the share of transmission delay in job execution time, and improve resource utilization.

## Theory

- For simplicity, the task-topology plugin creates the task topology of a job from the task affinities set in the job annotations, then creates buckets to hold tasks. Tasks with affinity tend to be placed in the same bucket, and tasks with anti-affinity tend to be placed in different buckets. Finally, buckets are mapped to nodes so as to minimize data transmission between nodes.

- Here is an example of what the task-topology plugin does.

- Suppose a TensorFlow job has 6 tasks: `ps0, ps1, worker0, worker1, worker2, worker3`. For simplicity, each task needs just 1 CPU. Set the task affinity as `"affinity": "ps,worker"`, `"anti-affinity": "ps"`.

- In `OnSessionOpen`, the task-topology plugin generates the buckets by affinity:
  - Sort the tasks by `taskAffinityOrder`. In this order, anti-affinity takes precedence over affinity, because anti-affinity generates more buckets.
  - Suppose the tasks are ordered as: `ps0, ps1, worker0, worker1, worker2, worker3`

  - Generate the buckets:
    1. ps0: there is no bucket yet, so generate bucket 1.
    2. ps1: there is 1 bucket, but ps1 has anti-affinity with ps0 in it, so generate bucket 2.
    3. worker0: has affinity to both buckets; choose bucket 1.
    4. worker1: has affinity to both buckets, but for resource balancing, choose bucket 2.
    5. worker2: choose bucket 1.
    6. worker3: choose bucket 2.
    7. Now we have the buckets:

       | bucket | tasks |
       | - | - |
       | bucket1 | ps0, worker0, worker2 |
       | bucket2 | ps1, worker1, worker3 |

- After bucket generation, the task-topology plugin provides `taskOrderFn` for the `allocate` action, which uses it to build the `priorityQueue` for allocation. In the sample above, the task order will be: `ps0, worker0, worker2, ps1, worker1, worker3`

- Suppose there are 3 nodes available in the cluster:

  | node | resources |
  | - | - |
  | node1 | cpu: 2 |
  | node2 | cpu: 1 |
  | node3 | cpu: 4 |

- The task-topology plugin also provides `nodeOrderFn` to compute a priority score for each node. The score is mapped to [0, 10], but the raw bucket score is used here for simplicity:
  - for ps0:

    | node | bucket in node | score |
    | - | - | - |
    | node1 | ps0 worker0 | 2 |
    | node2 | ps0 | 1 |
    | node3 | ps0 worker0 worker2 | 3 |

    Obviously, ps0 will bind to node3.

    | node1 | node2 | node3 |
    | - | - | - |
    | | | ps0 |

  - for worker0:

    | node | own tasks | bucket in node | score |
    | - | - | - | - |
    | node1 | | worker0 worker2 | 2 |
    | node2 | | worker0 | 1 |
    | node3 | ps0 | worker0 worker2 | 3 |

    worker0 then follows ps0 and binds to node3.

  - The same applies to worker2, which also binds to node3.

    | node1 | node2 | node3 |
    | - | - | - |
    | | | ps0, worker0, worker2 |

  - for the next task, ps1:

    | node | own tasks | bucket in node | score |
    | - | - | - | - |
    | node1 | | ps1 worker1 | 2 |
    | node2 | | ps1 | 1 |
    | node3 | ps0, worker0, worker2 | ps1 | 0 (anti-affinity) |

    So ps1 will bind to node1.

    | node1 | node2 | node3 |
    | - | - | - |
    | ps1 | | ps0, worker0, worker2 |

  - worker1 will then bind to node1.

    | node1 | node2 | node3 |
    | - | - | - |
    | ps1, worker1 | | ps0, worker0, worker2 |

  - for worker3, node2 and node3 have the same score, so the choice is decided by other plugins such as `binpack` or `leastRequestPriority`.

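The walkthrough above can be reproduced with a small sketch. This is an illustrative Python model and not the plugin's actual Go implementation. The helper names `role_pairs`, `build_buckets` and `bucket_score` are hypothetical, and two rules are assumptions inferred from the example tables: "resource balancing" is modelled as choosing the smallest eligible bucket (lowest index on ties), and the bucket score is modelled as the bucket's tasks already on the node plus the bucket's pending tasks that still fit, with every task needing 1 CPU.

```python
from itertools import combinations


def role_pairs(groups):
    """Expand role groups such as [["ps", "worker"]] into unordered role pairs.

    A single-role group such as ["ps"] relates that role to itself.
    """
    pairs = set()
    for group in groups:
        if len(group) == 1:
            pairs.add(frozenset(group))
        for a, b in combinations(group, 2):
            pairs.add(frozenset((a, b)))
    return pairs


def build_buckets(tasks, affinity, anti_affinity):
    """Greedily assign (name, role) tasks, given in scheduling order, to buckets."""
    affine = role_pairs(affinity)
    anti = role_pairs(anti_affinity)
    buckets = []
    for name, role in tasks:
        candidates = []
        for bucket in buckets:
            roles = {r for _, r in bucket}
            if any(frozenset((role, r)) in anti for r in roles):
                continue  # anti-affinity: never share this bucket
            if any(frozenset((role, r)) in affine for r in roles):
                candidates.append(bucket)
        if candidates:
            # Assumed balancing rule: smallest bucket wins; min() keeps the first on ties.
            target = min(candidates, key=len)
        else:
            target = []  # no eligible bucket, so open a new one
            buckets.append(target)
        target.append((name, role))
    return [[name for name, _ in bucket] for bucket in buckets]


def bucket_score(placed, pending, free_cpu, conflict):
    """Assumed bucket score of a node when every task needs exactly 1 CPU.

    placed:   tasks of this bucket already bound to the node
    pending:  tasks of this bucket not yet bound (including the one being scored)
    free_cpu: CPUs still free on the node
    conflict: True if the node holds a task that is anti-affine to this one
    """
    if conflict:
        return 0
    return placed + min(pending, free_cpu)


tasks = [("ps0", "ps"), ("ps1", "ps"), ("worker0", "worker"),
         ("worker1", "worker"), ("worker2", "worker"), ("worker3", "worker")]
buckets = build_buckets(tasks, affinity=[["ps", "worker"]], anti_affinity=[["ps"]])
# buckets == [["ps0", "worker0", "worker2"], ["ps1", "worker1", "worker3"]]

# Flattening the buckets gives the task order used in the example.
task_order = [name for bucket in buckets for name in bucket]
```

With these assumptions, `bucket_score(0, 3, cpu, False)` for `cpu` in 2, 1, 4 gives 2, 1, 3, matching the ps0 table, and `bucket_score(1, 2, 3, False)` gives 3 for worker0 on node3. The real plugin additionally maps scores to [0, 10] and combines them with other plugins' scores.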
## Future Improvement

1. Currently the task-topology plugin takes annotations as input arguments. This makes it easy to integrate with upper-layer applications through various operators, but it is not an official interface. As a next step, the task-topology plugin could be added as a job plugin like `svc` and `ssh`, so that it can still be configured for each individual job.
2. Currently the task-topology plugin only creates the task topology from task types and affinities, but a more detailed topology may need a full matrix of data volumes. One more interface will be needed once the task-topology plugin is extended in this way.
3. Currently the task-topology plugin does not interact with other Volcano arguments such as `minAvailable`; support for this may be added if necessary.