---
title: Bigslice - parallelism
layout: default
---

# Parallelism in Bigslice

## Tasks
Bigslice computations are specified as data transformations operating on
"slices" of data. When a computation is evaluated with `(*exec.Session).Run`, it
is compiled into an acyclic directed graph. Each node is a *task* that computes
a portion of the whole computation. For example, a single task may perform a
`Map` transformation on a particular shard. Each edge is a dependency between
tasks.

A task is the unit of parallelism. Any task whose dependencies are satisfied can
be scheduled to run.

## Procs
Bigslice represents a computational resource that can be used to evaluate tasks
as a *proc*. This generally corresponds to a single CPU core. For example, if we are
using EC2, each core of each instance provides a single proc.

By default, each task occupies a single proc for its evaluation.

## Controlling parallelism
The set of procs made available and used by a computation is a function of
configuration parameters and the computation itself.

### Proc supply
`bigslice parallelism` specifies the number of procs that Bigslice will try to
make available, e.g. by launching new EC2 instances. Bigslice will only make
procs available as needed by the computation. For example, if a computation only
ever needs to compute `5` tasks in parallel, Bigslice will only make `5` procs
available even if `bigslice parallelism` is set `>5`.

If we're using an EC2 system, e.g. `bigmachine/ec2system`, procs are made
available by launching EC2 instances. Bigslice provides parameters to control
the type and number of instances to launch and use.

- `bigmachine/ec2system instance` specifies the EC2 instance type. Each vCPU
  provides a proc, e.g. `m4.xlarge` will provide 4 procs per instance.
- `bigslice max-load` specifies the maximum number of vCPUs to use per instance
  as a proportion in `[0.0, 1.0]`. For example, if we are using `m4.xlarge`
  instances with `max-load` of `0.6`, each instance will only provide `2` procs,
  `floor(4*0.6)`. Tune this parameter if your computation is constrained by
  factors other than CPU.

Let's consider a complete example. Suppose we are performing a mapping
computation with 1000 shards, i.e. there are 1000 tasks that we could run in
parallel given no other constraints. We also have the following configuration:

```
param bigmachine/ec2system instance = "m4.xlarge"
param bigslice parallelism = 16
param bigslice max-load = 0.9
```

When Bigslice evaluates our computation, it will see a demand for 1000 procs.
However, `parallelism` will cap this at `16`. Bigslice knows that each instance
provides 3 procs, `floor((4 vCPUs per instance) * 0.9)`. Bigslice will launch 6
instances, resulting in 18 available procs. Notice that this is more than the
`16` we specified; Bigslice will launch (and fully utilize) the minimum number
of machines necessary to provide the requested procs/parallelism.

### Proc demand
By default, each task occupies a single proc for its evaluation.
Pragmas[^pragma] can be specified on slice operations to customize this
behavior.

#### `bigslice.Procs`
`bigslice.Procs(n int)` specifies the number of procs that each task compiled
from the slice will occupy. (`bigslice.Procs(1)` is a no-op, as that's the
default.)

For example,
```go
// fn is the mapping function applied to each element (elided here).
slice = bigslice.Map(slice, fn, bigslice.Procs(6))
```

This mapping slice will be compiled into `S` tasks, where `S` is the number of
shards of the input slice. When Bigslice evaluates one of these tasks, it will
occupy `6` procs.

The number of procs a task requires is clamped to the number of procs a single
instance provides.
A single task cannot be divided across multiple instances.

There are (at least) two use cases for `bigslice.Procs`.

1. Your computation has internal parallelism, e.g. your function passed to
   `bigslice.Map` uses multiple threads to perform the mapping of a single
   element. In general, it's preferable to allow Bigslice to manage parallelism,
   but this isn't always convenient.
2. Your computation is constrained on resources other than CPU. This is similar
   to the usage of `bigslice max-load` but specified at the slice level instead
   of the whole-computation level.

#### `bigslice.Exclusive`
`bigslice.Exclusive` specifies that each task compiled from a slice should
occupy an entire instance, regardless of the type of instance. (It is
practically equivalent to
`bigslice.Procs(nThatIsAtLeastNumberOfProcsPerInstance)`.)

Use `bigslice.Exclusive` if your tasks will consume the entire resources of a
machine, e.g. fully occupy a GPU.

[^pragma]: A pragma is a directive used to specify some intention that may
    modify Bigslice evaluation. Pragmas are passed as optional arguments to
    slice operations. They do not affect the results of a computation but may
    change how machines are allocated, tasks are distributed, results are
    materialized, etc.
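
The arithmetic from the proc supply example and the clamping rule for `bigslice.Procs` can be sketched in a few lines of Go. This is only an illustration of the rules as described on this page, not Bigslice's actual implementation; the helper names `procsPerInstance`, `instancesNeeded`, and `clampTaskProcs` are hypothetical and do not exist in the Bigslice API.

```go
package main

import (
	"fmt"
	"math"
)

// procsPerInstance returns floor(vcpus * maxLoad), the number of procs a
// single instance provides under the max-load rule described above.
// Hypothetical helper, not part of Bigslice.
func procsPerInstance(vcpus int, maxLoad float64) int {
	return int(math.Floor(float64(vcpus) * maxLoad))
}

// instancesNeeded returns the minimum number of instances that together
// supply at least min(demand, parallelism) procs.
// Hypothetical helper, not part of Bigslice.
func instancesNeeded(demand, parallelism, perInstance int) int {
	if perInstance <= 0 {
		return 0 // guard for this sketch only: no instance can supply a proc
	}
	want := demand
	if parallelism < want {
		want = parallelism
	}
	return (want + perInstance - 1) / perInstance // ceiling division
}

// clampTaskProcs limits a task's requested procs to what one instance
// provides, since a task cannot span instances.
// Hypothetical helper, not part of Bigslice.
func clampTaskProcs(requested, perInstance int) int {
	if requested > perInstance {
		return perInstance
	}
	return requested
}

func main() {
	per := procsPerInstance(4, 0.9)     // m4.xlarge with max-load 0.9
	n := instancesNeeded(1000, 16, per) // 1000 shards, parallelism 16
	fmt.Println(per, n, per*n)          // prints "3 6 18"
	fmt.Println(clampTaskProcs(6, per)) // prints "3"
}
```

With the configuration from the example, this gives 3 procs per instance, 6 instances, and 18 available procs. A task annotated with `bigslice.Procs(6)` would occupy all 3 procs of one instance.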