---
title: Bigslice - parallelism
layout: default
---

# Parallelism in Bigslice

## Tasks
Bigslice computations are specified as data transformations operating on
"slices" of data. When a computation is evaluated with `(*exec.Session).Run`, it
is compiled into a directed acyclic graph. Each node is a *task* that computes
a portion of the whole computation. For example, a single task may perform a
`Map` transformation on a particular shard. Each edge is a dependency between
tasks.

A task is the unit of parallelism. Any task whose dependencies are satisfied can
be scheduled to run.

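The scheduling rule above can be illustrated with a minimal sketch. This is not Bigslice's scheduler; the `task` type, the `runnable` function, and the task names are hypothetical and exist only to show how "dependencies satisfied" selects the tasks that may run in parallel.

```go
package main

import (
	"fmt"
	"sort"
)

// task is a hypothetical stand-in for a compiled task: a name plus the
// names of the tasks it depends on (the incoming edges of the graph).
type task struct {
	name string
	deps []string
}

// runnable returns the names of tasks that are not yet done and whose
// dependencies are all done, sorted for stable output.
func runnable(tasks []task, done map[string]bool) []string {
	var ready []string
	for _, t := range tasks {
		if done[t.name] {
			continue
		}
		ok := true
		for _, d := range t.deps {
			if !done[d] {
				ok = false
				break
			}
		}
		if ok {
			ready = append(ready, t.name)
		}
	}
	sort.Strings(ready)
	return ready
}

func main() {
	// Two shards of a Map feeding a single downstream task.
	tasks := []task{
		{name: "map@0"},
		{name: "map@1"},
		{name: "reduce@0", deps: []string{"map@0", "map@1"}},
	}
	done := map[string]bool{}
	fmt.Println(runnable(tasks, done)) // [map@0 map@1]
	done["map@0"], done["map@1"] = true, true
	fmt.Println(runnable(tasks, done)) // [reduce@0]
}
```

Both `map` tasks are runnable at once because neither has dependencies; the downstream task becomes runnable only after both have completed.
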
## Procs
Bigslice represents a computational resource that can be used to evaluate tasks
as a *proc*. A proc generally corresponds to a single CPU core. For example, if
we are using EC2, each core of each instance provides a single proc.

By default, each task occupies a single proc for its evaluation.

## Controlling parallelism
The set of procs made available and used by a computation is a function of
configuration parameters and the computation itself.

### Proc supply
`bigslice parallelism` specifies the number of procs that Bigslice will try to
make available, e.g. by launching new EC2 instances. Bigslice will only make
procs available as needed by the computation. For example, if a computation only
ever needs to compute `5` tasks in parallel, Bigslice will only make `5` procs
available even if `bigslice parallelism` is set higher than `5`.

If we're using an EC2 system, e.g. `bigmachine/ec2system`, procs are made
available by launching EC2 instances. Bigslice provides parameters to control
the type and number of instances to launch and use.

- `bigmachine/ec2system instance` specifies the EC2 instance type. Each vCPU
  provides a proc, e.g. `m4.xlarge` will provide 4 procs per instance.
- `bigslice max-load` specifies the maximum number of vCPUs to use per instance
  as a proportion in `[0.0, 1.0]`. For example, if we are using `m4.xlarge`
  instances with `max-load` of `0.6`, each instance will only provide `2` procs,
  `floor(4*0.6)`. Tune this parameter if your computation is constrained by
  factors other than CPU.

Let's consider a complete example. Suppose we are performing a mapping
computation with 1000 shards, i.e. there are 1000 tasks that we could run in
parallel given no other constraints. We also have the following configuration:

```
param bigmachine/ec2system instance = "m4.xlarge"
param bigslice parallelism = 16
param bigslice max-load = 0.9
```

When Bigslice evaluates our computation, it will see a demand for 1000 procs.
However, `parallelism` will cap this at `16`. Bigslice knows that each instance
provides 3 procs, `floor((4 vCPUs per instance) * 0.9)`. Bigslice will launch 6
instances, resulting in 18 available procs. Notice that this is more than the
`16` we specified; Bigslice will launch (and fully utilize) the minimum number
of machines necessary to provide the requested procs/parallelism.

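The arithmetic in this example can be reproduced with a short sketch. It only mirrors the calculation described above and is not Bigslice's allocation code; the function names `procsPerInstance` and `plan` are hypothetical, and the guard for instances that provide no procs is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// procsPerInstance returns floor(vcpus * maxLoad), the number of procs a
// single instance provides under the given max-load.
func procsPerInstance(vcpus int, maxLoad float64) int {
	return int(math.Floor(float64(vcpus) * maxLoad))
}

// plan caps the demand at parallelism, then returns the minimum number of
// instances that supplies that many procs and the procs that result.
func plan(demand, parallelism, perInstance int) (instances, procs int) {
	if perInstance <= 0 {
		// Assumption: an instance that provides no procs cannot be used.
		return 0, 0
	}
	want := demand
	if parallelism < want {
		want = parallelism
	}
	instances = (want + perInstance - 1) / perInstance // ceiling division
	return instances, instances * perInstance
}

func main() {
	per := procsPerInstance(4, 0.9) // m4.xlarge has 4 vCPUs
	instances, procs := plan(1000, 16, per)
	fmt.Println(per, instances, procs) // 3 6 18
}
```

Rounding the instance count up is what produces 18 procs rather than the 16 requested.
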
### Proc demand
By default, each task occupies a single proc for its evaluation.
Pragmas[^pragma] can be specified on slice operations to customize this
behavior.

#### `bigslice.Procs`
`bigslice.Procs(n int)` specifies the number of procs that each task compiled
from the slice will occupy. (`bigslice.Procs(1)` is a no-op, as that's the
default.)

For example,
```go
// fn is the mapping function applied to each element; the pragma follows it.
slice = bigslice.Map(slice, fn, bigslice.Procs(6))
```

This mapping slice will be compiled into `S` tasks, where `S` is the number of
shards of the input slice. When Bigslice evaluates one of these tasks, it will
occupy `6` procs.

The number of procs a task requires is clamped to the number of procs a single
instance provides. A single task cannot be divided across multiple instances.

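The clamping rule can be sketched as follows. The function names `effectiveProcs` and `tasksPerInstance` are hypothetical, and the assumptions that a request below 1 falls back to the default of one proc and that tasks pack onto an instance by simple integer division are illustrative, not a description of Bigslice's implementation.

```go
package main

import "fmt"

// effectiveProcs clamps a task's requested procs to what one instance
// provides. Assumption: requests below 1 fall back to the default of 1.
// perInstance is expected to be at least 1.
func effectiveProcs(requested, perInstance int) int {
	if requested < 1 {
		return 1
	}
	if requested > perInstance {
		return perInstance
	}
	return requested
}

// tasksPerInstance returns how many such tasks fit on one instance at the
// same time, assuming packing by integer division.
func tasksPerInstance(requested, perInstance int) int {
	return perInstance / effectiveProcs(requested, perInstance)
}

func main() {
	// Procs(6) on an instance that provides 4 procs is clamped to 4.
	fmt.Println(effectiveProcs(6, 4)) // 4
	// One clamped task then fills the whole instance.
	fmt.Println(tasksPerInstance(6, 4)) // 1
	// On a hypothetical instance with 16 procs, two such tasks fit.
	fmt.Println(tasksPerInstance(6, 16)) // 2
}
```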
There are (at least) two use cases for `bigslice.Procs`.

1. Your computation has internal parallelism, e.g. your function passed to
   `bigslice.Map` uses multiple threads to perform the mapping of a single
   element. In general, it's preferable to allow Bigslice to manage parallelism,
   but this isn't always convenient.
2. Your computation is constrained on resources other than CPU. This is similar
   to the usage of `bigslice max-load` but specified at the slice level instead
   of the whole-computation level.

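As an illustration of the first use case, here is a minimal sketch of a mapping function with internal parallelism, written with the standard library only. The function `sumSquares` and the constant `workers` are hypothetical; the idea is that a function like this would be paired with `bigslice.Procs(workers)` so that the procs a task occupies match the goroutines it actually keeps busy.

```go
package main

import (
	"fmt"
	"sync"
)

// workers is the internal parallelism of the mapping function. The matching
// pragma on the slice operation would be bigslice.Procs(workers).
const workers = 4

// sumSquares is a hypothetical per-element mapping function that spreads the
// work for a single element (a slice of ints) across several goroutines.
func sumSquares(xs []int) int {
	partial := make([]int, workers) // one slot per worker, so no locking
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Worker w handles indices w, w+workers, w+2*workers, ...
			for i := w; i < len(xs); i += workers {
				partial[w] += xs[i] * xs[i]
			}
		}(w)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4, 5})) // 55
}
```

Without the pragma, each task would be accounted as one proc while keeping up to `workers` cores busy, which could oversubscribe an instance.
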
#### `bigslice.Exclusive`
`bigslice.Exclusive` specifies that each task compiled from a slice should
occupy an entire instance, regardless of the type of instance. (It is
practically equivalent to
`bigslice.Procs(nThatIsAtLeastNumberOfProcsPerInstance)`.)

Use `bigslice.Exclusive` if your tasks will consume the entire resources of a
machine, e.g. fully occupy a GPU.

[^pragma]: A pragma is a directive used to specify some intention that may
    modify Bigslice evaluation. Pragmas are passed as optional arguments to
    slice operations. They do not affect the results of a computation but may
    change how machines are allocated, tasks are distributed, results are
    materialized, etc.