# Qri Starlark Transformation Syntax

Qri ("query") is about datasets. Transformations are repeatable scripts for generating a dataset. [Starlark](https://github.com/google/starlark-go/blob/master/doc/spec.md) is a scripting language from Google that feels a lot like python. This package implements starlark as a _transformation syntax_. Starlark transformations are about as close as a transformation syntax can get to the full power of a programming language, and you often need that degree of control to generate a dataset.

Typical examples of a starlark transformation include:
* combining paginated calls to an API into a single dataset
* downloading unstructured data from the internet and extracting structured data from it
* pulling raw data off the web & turning it into a dataset
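
The first case in the list above, combining paginated API calls, can be sketched in plain Python as follows. This is only an illustration: `fetch_page` and `FAKE_PAGES` are hypothetical stand-ins for a real HTTP request and are not part of qri or starlark. The loop is a bounded `for` rather than a `while`, since starlark omits `while`.

```python
# Hypothetical in-memory "API": page number -> records. Not a real endpoint.
FAKE_PAGES = {
    1: [{"id": 1}, {"id": 2}],
    2: [{"id": 3}],
}

def fetch_page(page):
    """Stand-in for an HTTP request; returns one page of records (empty when done)."""
    return FAKE_PAGES.get(page, [])

def combine_pages(fetch, max_pages=100):
    """Collect every page into a single list of rows."""
    rows = []
    for page in range(1, max_pages + 1):  # bounded loop, no `while` needed
        batch = fetch(page)
        if not batch:
            break  # an empty page marks the end of the results
        rows.extend(batch)
    return rows

rows = combine_pages(fetch_page)
```

The `max_pages` cap is a design choice for the sketch: it keeps the loop bounded even if the remote API never returns an empty page.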

We're excited about starlark for a few reasons:
* **python syntax** - _many_ people working in data science these days write python. We like that, starlark likes that. dope.
* **deterministic subset of python** - unlike python, starlark removes features that make code behaviour hard to reason about. Things like `while` loops and recursive functions are omitted, making it possible for qri to infer how a given transformation will behave.
* **parallel execution** - thanks to this deterministic requirement (and the lack of a global interpreter lock), starlark functions can be executed in parallel. Combined with peer-to-peer networking, we're hoping to advance transformations toward peer-driven distributed computing. More on that in the coming months.
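
To give a feel for working without recursion or `while`, here is a sketch that flattens a nested list using an explicit stack and a bounded `for` loop. It is written in plain Python and the names (`flatten`, `max_steps`) are our own. Starlark's built-ins differ slightly from Python's (type checks, for example), so treat it as a pattern rather than copy-paste starlark.

```python
def flatten(nested, max_steps=10000):
    """Flatten arbitrarily nested lists without recursion or `while`.

    Stops early (returning a partial result) if max_steps is exhausted.
    """
    stack = [nested]
    flat = []
    for _ in range(max_steps):  # explicit upper bound on the amount of work
        if not stack:
            break
        item = stack.pop()
        if isinstance(item, list):
            # push children in reverse so they pop in their original order
            stack.extend(reversed(item))
        else:
            flat.append(item)
    return flat

result = flatten([1, [2, [3, 4]], 5])
```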


## Getting started
If you're mainly interested in learning how to write starlark transformations, our [documentation](https://qri.io/docs) is a better place to start. If you're interested in contributing to the way starlark transformations work, this is the place!

The easiest way to see starlark transformations in action is to use [qri](https://github.com/qri-io/qri). This `startf` package powers all the starlark stuff in qri. Assuming you have the [go programming language](https://golang.org/) installed, the following should work from a terminal:

<!--
docrun:
  pass: true
-->
```shell
# get this package
$ go get github.com/qri-io/startf

# navigate to package
$ cd $GOPATH/src/github.com/qri-io/startf
```

Then run the tests:

<!--
docrun:
  pass: true
-->
```shell
$ go test ./...
```

Often the next steps are to install [qri](https://github.com/qri-io/qri), mess with this `startf` package, then rebuild qri with your changes to see them in action within qri itself.

## Starlark Special Functions

_Special Functions_ are the core of a starlark transform script. Here's an example of a simple special function that sets a metadata field of a dataset to a constant:

<!--
docrun:
  test:
    call: transform(ds, ctx)
    actual: ds.get_meta()
    expect: {"hello": "world", "qri": "md:0"}
-->
```python
def transform(ds, ctx):
  # set the "hello" metadata field to the constant "world"
  ds.set_meta("hello", "world")
```

Here's something slightly more complicated (but still very contrived) that replaces a dataset body with the summed length of all of its elements:

<!--
docrun:
  test:
    setup: ds.set_body(["a","b","c"])
    call: transform(ds, ctx)
    actual: ds.get_body()
    expect: [{"total": 3.0}]
-->
```python
def transform(ds, ctx):
  body = ds.get_body()
  # initialize outside the `if` so `count` is defined even when there is no body
  count = 0
  if body != None:
    for entry in body:
      count += len(entry)
  ds.set_body([{"total": count}])
```
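
To experiment with logic like this outside of qri, one option is a small stand-in for `ds` in plain Python. `FakeDataset` below is hypothetical: it mimics only the `get_body` and `set_body` calls used above and is not qri's dataset type.

```python
class FakeDataset:
    """Hypothetical stand-in exposing only get_body/set_body."""

    def __init__(self, body=None):
        self._body = body

    def get_body(self):
        return self._body

    def set_body(self, body):
        self._body = body

def transform(ds, ctx):
    body = ds.get_body()
    count = 0  # defined up front so the no-body case still works
    if body != None:  # starlark has no `is`; `!=` also works in Python
        for entry in body:
            count += len(entry)
    ds.set_body([{"total": count}])

ds = FakeDataset(["a", "b", "c"])
transform(ds, None)  # ctx is unused by this transform

empty = FakeDataset()
transform(empty, None)
```

Running the same function against a dataset with and without a body is a cheap way to catch mistakes such as a variable that is only assigned on one branch.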

Starlark special functions have a few rules on top of starlark itself:
* Special functions *always* accept a _transformation context_ (the `ctx` arg)
* When you define a special function, qri calls it for you
* All special functions are optional (you don't _need_ to define them), except `transform`. `transform` is required.
* Special functions are always called in the same order

Another important special function is `download`, which allows access to the `http` package:

<!--
docrun:
  test:
    webproxy:
      url: http://example.com/data.json
      response: {"data":[4,5,6]}
    call: download(ctx)
    actual: ctx.download
    expect: {"data":[4.0,5.0,6.0]}
  save:
    filename: transform.star
-->
```python
load("http.star", "http")

def download(ctx):
  data = http.get("http://example.com/data.json")
  return data
```

The result of this special function can be accessed using `ctx.download`:

<!--
docrun:
  test:
    setup: ctx.download = ["test"]
    call: transform(ds, ctx)
    actual: ds.get_body()
    expect: ["test"]
  save:
    filename: transform.star
    append: true
-->
```python
def transform(ds, ctx):
  ds.set_body(ctx.download)
```
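
Putting the rules together, the calling sequence can be pictured roughly as below. This is a plain-Python sketch, not qri's implementation: `run_script`, `Context`, and `FakeDataset` are hypothetical names, and the sketch assumes `download` runs before `transform`, which is what reading `ctx.download` inside `transform` implies.

```python
class Context:
    """Hypothetical context; holds results of earlier special functions."""

    def __init__(self):
        self.download = None

class FakeDataset:
    """Hypothetical stand-in exposing only get_body/set_body."""

    def __init__(self):
        self._body = None

    def get_body(self):
        return self._body

    def set_body(self, body):
        self._body = body

def run_script(funcs, ds):
    """Call the special functions found in `funcs` in a fixed order."""
    if "transform" not in funcs:
        raise ValueError("transform is required")
    ctx = Context()
    if "download" in funcs:  # optional special function
        ctx.download = funcs["download"](ctx)
    funcs["transform"](ds, ctx)
    return ds

def download(ctx):
    # canned response instead of a real HTTP call
    return {"data": [4, 5, 6]}

def transform(ds, ctx):
    ds.set_body(ctx.download)

result = run_script({"download": download, "transform": transform}, FakeDataset())
```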

More docs on the provided API are coming soon.

## Running a transform

Let's say the above functions are saved as `transform.star`. You can run the script to create a new dataset by using:

<!--
docrun:
  pass: true
  # TODO: Run this command in a sandbox, using the transform.star created above.
-->
```
qri save --file=transform.star me/dataset_name
```

Or, you can add more details by creating a dataset file (saved as `dataset.yaml`, for example) with additional structure:

<!--
docrun:
  pass: true
  # TODO: Save this file to use in the command below.
-->
```
name: dataset_name
transform:
  scriptpath: transform.star
meta:
  title: My awesome dataset
```

Then invoke qri:

<!--
docrun:
  pass: true
  # TODO: Run this command in a sandbox, using the dataset.yaml created above.
-->
```
qri save --file=dataset.yaml
```

Fun! More info over on our [docs site](https://qri.io/docs).