github.com/grailbio/bigslice@v0.0.0-20230519005545-30c4c12152ad/doc.go

// Copyright 2018 GRAIL, Inc. All rights reserved.
// Use of this source code is governed by the Apache 2.0
// license that can be found in the LICENSE file.

// TODO(marius): fill this in some more, especially once tooling
// improves.

/*
	Package bigslice implements a distributed data processing system.
	Users compose computations by operating over large collections ("big
	slices") of data, transforming them with a handful of combinators.
	While users express computations using collections-style operations,
	bigslice takes care of the details of parallel execution and
	distribution across multiple machines.

	Bigslice jobs can run locally, but use bigmachine for distribution
	among a cluster of compute nodes. In either case, user code does not
	change; the details of distribution are handled by the combination
	of bigmachine and bigslice.

	Because Go cannot easily serialize code to be sent over the wire and
	executed remotely, bigslice programs have to be written with a few
	constraints:

	1. All slices must be constructed by bigslice funcs (bigslice.Func), and
	all such functions must be instantiated before exec.Start is called. This
	rule is easy to follow: if funcs are global variables, and exec.Start is
	called from a program's main, then the program is compliant.

	2. The driver program must be compiled for the same GOOS and GOARCH as the
	target machines. When running locally, this is not a concern, but
	programs that require distribution must be run from a linux/amd64 binary.
	Bigslice also supports the fat binary format implemented by
	github.com/grailbio/base/fatbin. The bigslice tool
	(github.com/grailbio/bigslice/cmd/bigslice) uses this package to compile
	portable fat binaries.
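
	The reasoning behind constraint 1 can be illustrated with a toy model
	that uses only the standard library. It is a sketch, not bigslice's
	implementation: register, start, invokeRemote, and funcValue are
	hypothetical names standing in for bigslice.Func, exec.Start, and the
	machinery that dispatches work to workers. The point it shows is that
	funcs declared as package-level variables are registered in the same
	order in every process running the same binary, so a small index can be
	sent over the wire in place of code.

```go
package main

import "fmt"

// funcValue is a hypothetical stand-in for a registered func: a
// process-independent index paired with the Go function it wraps.
type funcValue struct {
	index int
	fn    func(n int) []int
}

var (
	// registry holds every func in registration order. Driver and worker
	// run the same binary, so initialization fills it identically in both.
	registry []*funcValue
	// started records whether start has been called.
	started bool
)

// register is a toy analogue of bigslice.Func. It refuses registrations
// made after start, because a remote process could not know about them.
func register(fn func(n int) []int) *funcValue {
	if started {
		panic("register called after start")
	}
	f := &funcValue{index: len(registry), fn: fn}
	registry = append(registry, f)
	return f
}

// Package-level funcs are registered during package initialization,
// which completes before main runs.
var (
	evens = register(func(n int) []int {
		out := make([]int, 0, n)
		for i := 0; i < n; i++ {
			out = append(out, 2*i)
		}
		return out
	})
	squares = register(func(n int) []int {
		out := make([]int, 0, n)
		for i := 0; i < n; i++ {
			out = append(out, i*i)
		}
		return out
	})
)

// start is a toy analogue of exec.Start: it freezes the registry.
func start() { started = true }

// invokeRemote simulates a worker that receives only (index, argument)
// and looks the function up in its own copy of the registry.
func invokeRemote(index, n int) []int {
	return registry[index].fn(n)
}

func main() {
	start()
	// Prints: [0 2 4] [0 1 4 9]
	fmt.Println(invokeRemote(evens.index, 3), invokeRemote(squares.index, 4))
}
```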

	Some Bigslice operations may be annotated with runtime pragmas: directives
	for the Bigslice runtime. See Pragma for details.

	User-provided functions in Bigslice

	Functions provided to the various bigslice combinators (e.g., bigslice.Map)
	may take an additional argument of type context.Context. If specified, then
	the lifetime of the context is tied to that of the underlying bigslice task.
	Additionally, the context carries a metrics scope
	(github.com/grailbio/bigslice/metrics.Scope) which can be used to update
	metric values during data processing.
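
	The pattern described above can be sketched with the standard library
	alone. This is a toy model under stated assumptions, not the bigslice or
	metrics API: scope, withScope, scopeFrom, mapInts, and runTask are
	hypothetical names. It shows a per-task context that is canceled when the
	task ends and that carries a set of counters which the user function
	updates while processing elements.

```go
package main

import (
	"context"
	"fmt"
)

// scopeKey is the context key under which the toy scope is stored.
type scopeKey struct{}

// scope is a hypothetical, simplified stand-in for a metrics scope:
// a set of named counters owned by one task.
type scope struct {
	counters map[string]int64
}

// withScope returns a context that carries s.
func withScope(ctx context.Context, s *scope) context.Context {
	return context.WithValue(ctx, scopeKey{}, s)
}

// scopeFrom returns the scope carried by ctx, or nil if there is none.
func scopeFrom(ctx context.Context) *scope {
	s, _ := ctx.Value(scopeKey{}).(*scope)
	return s
}

// mapInts is a toy combinator. It passes the task context to fn for
// every element and stops early once the context is done.
func mapInts(ctx context.Context, in []int, fn func(context.Context, int) int) ([]int, error) {
	out := make([]int, 0, len(in))
	for _, v := range in {
		if err := ctx.Err(); err != nil {
			return out, err
		}
		out = append(out, fn(ctx, v))
	}
	return out, nil
}

// runTask models a single task: the context lives exactly as long as
// the task, and the scope collects metrics updated by the user function.
func runTask(in []int) ([]int, int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // the context ends when the task ends

	s := &scope{counters: map[string]int64{}}
	out, err := mapInts(withScope(ctx, s), in, func(ctx context.Context, v int) int {
		if v%2 != 0 {
			scopeFrom(ctx).counters["odd"]++
		}
		return v * 2
	})
	return out, s.counters["odd"], err
}

func main() {
	out, odd, err := runTask([]int{1, 2, 3})
	// Prints: [2 4 6] 2 <nil>
	fmt.Println(out, odd, err)
}
```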

*/
package bigslice