# Qri Starlark Transformation Syntax

Qri ("query") is about datasets. Transformations are repeatable scripts for generating a dataset. [Starlark](https://github.com/google/starlark-go/blob/master/doc/spec.md) is a scripting language from Google that feels a lot like python. This package implements starlark as a _transformation syntax_. Starlark transformations are about as close as one can get to the full power of a programming language as a transformation syntax. Often you need this degree of control to generate a dataset.

Typical examples of a starlark transformation include:
* combining paginated calls to an API into a single dataset
* downloading unstructured data from the internet and extracting structured data from it
* pulling raw data off the web & turning it into a dataset

We're excited about starlark for a few reasons:
* **python syntax** - _many_ people working in data science these days write python, we like that, starlark likes that. dope.
* **deterministic subset of python** - unlike python, starlark removes properties that reduce introspection into code behaviour. Things like `while` loops and recursive functions are omitted, making it possible for qri to infer how a given transformation will behave.
* **parallel execution** - thanks to this deterministic requirement (and the lack of a global interpreter lock) starlark functions can be executed in parallel. Combined with peer-to-peer networking, we're hoping to advance transformations toward peer-driven distributed computing. More on that in the coming months.

## Getting started

If you're mainly interested in learning how to write starlark transformations, our [documentation](https://qri.io/docs) is a better place to start. If you're interested in contributing to the way starlark transformations work, this is the place!
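To make the _deterministic subset_ point concrete: with no `while` loop available, a task like combining paginated API calls is typically written as a `for` loop with a fixed upper bound that breaks early. The sketch below is illustrative only. `fetch_page`, `PAGES`, and `MAX_PAGES` are hypothetical stand-ins (canned data instead of a real HTTP call), and the code is intended to stay within syntax that starlark also accepts.

```python
# Hypothetical canned "API": each page lists its items and the next page number.
PAGES = {
    1: {"items": [1, 2], "next": 2},
    2: {"items": [3, 4], "next": 3},
    3: {"items": [5], "next": None},
}

def fetch_page(n):
    # hypothetical helper standing in for a single HTTP request
    return PAGES[n]

# assumed upper bound: without `while`, the loop must have a fixed limit
MAX_PAGES = 100

def combine_pages():
    items = []
    page = 1
    for _ in range(MAX_PAGES):
        resp = fetch_page(page)
        items.extend(resp["items"])
        # stop early once the last page has been read
        if resp["next"] == None:
            break
        page = resp["next"]
    return items

print(combine_pages())  # → [1, 2, 3, 4, 5]
```

Because the loop is bounded, a host can reason about the worst case (at most `MAX_PAGES` requests) before the script ever runs.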
The easiest way to see starlark transformations in action is to use [qri](https://github.com/qri-io/qri). This `startf` package powers all the starlark stuff in qri. Assuming you have the [go programming language](https://golang.org/) installed, the following should work from a terminal:

<!--
docrun:
  pass: true
-->
```shell
# get this package
$ go get github.com/qri-io/startf

# navigate to package
$ cd $GOPATH/src/github.com/qri-io/startf
```

<!--
docrun:
  pass: true
-->
```shell
# run tests
$ go test ./...
```

Often the next steps are to install [qri](https://github.com/qri-io/qri), mess with this `startf` package, then rebuild qri with your changes to see them in action within qri itself.

## Starlark Special Functions

_Special Functions_ are the core of a starlark transform script. Here's an example of a simple data function that sets a field in a dataset's metadata to a constant:

<!--
docrun:
  test:
    call: transform(ds, ctx)
    actual: ds.get_meta()
    expect: {"hello": "world", "qri": "md:0"}
-->
```python
def transform(ds, ctx):
    # add a "hello" field to the dataset's metadata
    ds.set_meta("hello", "world")
```

Here's something slightly more complicated (but still very contrived) that modifies a dataset by adding up the length of all of the elements in a dataset body:

<!--
docrun:
  test:
    setup: ds.set_body(["a","b","c"])
    call: transform(ds, ctx)
    actual: ds.get_body()
    expect: [{"total": 3.0}]
-->
```python
def transform(ds, ctx):
    body = ds.get_body()
    if body != None:
        # sum the length of every entry in the body
        count = 0
        for entry in body:
            count += len(entry)
        # replace the body with a single summary row
        ds.set_body([{"total": count}])
```

Starlark special functions have a few rules on top of starlark itself:
* special functions *always* accept a _transformation context_ (the `ctx` arg)
* when you define a data function, qri calls it for you
* all special functions are optional (you don't _need_ to define them), except `transform`, which is required
* special functions are always called in the same order

Another important special function is `download`, which allows access to the `http` package:

<!--
docrun:
  test:
    webproxy:
      url: http://example.com/data.json
      response: {"data":[4,5,6]}
    call: download(ctx)
    actual: ctx.download
    expect: {"data":[4.0,5.0,6.0]}
  save:
    filename: transform.star
-->
```python
load("http.star", "http")

def download(ctx):
    # fetch remote data; the return value is exposed as ctx.download
    data = http.get("http://example.com/data.json")
    return data
```

The result of this special function can be accessed using `ctx.download`:

<!--
docrun:
  test:
    setup: ctx.download = ["test"]
    call: transform(ds, ctx)
    actual: ds.get_body()
    expect: ["test"]
  save:
    filename: transform.star
    append: true
-->
```python
def transform(ds, ctx):
    ds.set_body(ctx.download)
```

More docs on the provided API are coming soon.

## Running a transform

Let's say the above functions are saved as `transform.star`. You can run them to create a new dataset by using:

<!--
docrun:
  pass: true
  # TODO: Run this command in a sandbox, using the transform.star created above.
-->
```shell
qri save --file=transform.star me/dataset_name
```

Or, you can add more details by creating a dataset file (saved as `dataset.yaml`, for example) with additional structure:

<!--
docrun:
  pass: true
  # TODO: Save this file to use in the command below.
-->
```yaml
name: dataset_name
transform:
  scriptpath: transform.star
meta:
  title: My awesome dataset
```

Then invoke qri:

<!--
docrun:
  pass: true
  # TODO: Run this command in a sandbox, using the dataset.yaml created above.
-->
```shell
qri save --file=dataset.yaml
```

Fun! More info over on our [docs site](https://qri.io/docs)
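For contributors, the special-function rules above (every function receives `ctx`, `transform` is required, and `download`'s result shows up as `ctx.download`) can be approximated in plain Python. This is not qri's implementation: `MockContext`, `MockDataset`, and `run_transform` are hypothetical names, and the sketch assumes, as the `ctx.download` example implies, that `download` runs before `transform`. It reuses the two functions from the `download` section, with the HTTP call replaced by canned data.

```python
# Hypothetical mock of a host calling special functions in a fixed order.
# Not qri's implementation; all names here are illustrative only.

class MockContext:
    def __init__(self):
        self.download = None  # filled with download()'s return value

class MockDataset:
    def __init__(self):
        self._body = None
        self._meta = {}

    def get_body(self):
        return self._body

    def set_body(self, body):
        self._body = body

    def get_meta(self):
        return dict(self._meta)

    def set_meta(self, key, value):
        self._meta[key] = value

def run_transform(script_globals, ds):
    ctx = MockContext()
    # `transform` is the only required special function
    if "transform" not in script_globals:
        raise ValueError("script must define transform(ds, ctx)")
    # assumed order: download (if defined) runs before transform
    if "download" in script_globals:
        ctx.download = script_globals["download"](ctx)
    script_globals["transform"](ds, ctx)
    return ds

# The functions from the download section, with the HTTP call swapped for canned data
def download(ctx):
    return {"data": [4, 5, 6]}

def transform(ds, ctx):
    ds.set_body(ctx.download)

ds = run_transform({"download": download, "transform": transform}, MockDataset())
print(ds.get_body())  # → {'data': [4, 5, 6]}
```

Passing the script's globals as a plain dict keeps the mock independent of any starlark interpreter, so it only models the calling convention, not sandboxing or the `http` module.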