# Starlark in Go: Implementation

This document (a work in progress) describes some of the design
choices of the Go implementation of Starlark.

* [Scanner](#scanner)
* [Parser](#parser)
* [Resolver](#resolver)
* [Evaluator](#evaluator)
  * [Data types](#data-types)
  * [Freezing](#freezing)
* [Testing](#testing)


## Scanner

The scanner is derived from Russ Cox's
[buildifier](https://github.com/bazelbuild/buildtools/tree/master/buildifier)
tool, which pretty-prints Bazel BUILD files.

Most of the work happens in `(*scanner).nextToken`.

## Parser

The parser is a hand-written recursive-descent parser. It uses the
technique of [precedence
climbing](http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing)
to reduce the number of productions.

In some places the parser accepts a larger set of programs than are
strictly valid, leaving the task of rejecting them to the subsequent
resolver pass. For example, in the function call `f(a, b=c)` the
parser accepts any expression for `a` and `b`, even though `b` may
legally be only an identifier. For the parser to distinguish these
cases would require additional lookahead.

## Resolver

The resolver reports structural errors in the program, such as the use
of `break` and `continue` outside of a loop.

Starlark has stricter syntactic limitations than Python. For example,
it does not permit `for` loops or `if` statements at top level, nor
does it permit global variables to be bound more than once.
These limitations come from the Bazel project's desire to make it easy
to identify the sole statement that defines each global, permitting
accurate cross-reference documentation.

In addition, the resolver validates all variable names, classifying
them as references to universal, global, local, or free variables.
Local and free variables are mapped to a small integer, allowing the
evaluator to use an efficient (flat) representation for the
environment.

Not all features of the Go implementation are "standard" (that is,
supported by Bazel's Java implementation), at least for now, so
non-standard features such as `lambda`, `float`, and `set`
are flag-controlled. The resolver reports
any uses of dialect features that have not been enabled.


## Evaluator

### Data types

<b>Integers:</b> Integers are represented using `big.Int`, an
arbitrary precision integer. This representation was chosen because,
for many applications, Starlark must be able to handle without loss
protocol buffer values containing signed and unsigned 64-bit integers,
which requires 65 bits of precision.

Small integers (<256) are preallocated, but all other values require
memory allocation. Integer performance is relatively poor, but it
matters little for Bazel-like workloads which depend much
more on lists of strings than on integers. (Recall that a typical loop
over a list in Starlark does not materialize the loop index as an `int`.)

An optimization worth trying would be to represent integers using
either an `int32` or `big.Int`, with the `big.Int` used only when
`int32` does not suffice. Using `int32`, not `int64`, for "small"
numbers would make it easier to detect overflow from operations like
`int32 * int32`, which would trigger the use of `big.Int`.

<b>Floating point</b>:
Floating point numbers are represented using Go's `float64`.
Again, `float` support is required to support protocol buffers. The
existence of floating-point NaN and its infamous comparison behavior
(`NaN != NaN`) had many ramifications for the API, since we cannot
assume the result of an ordered comparison is either less than,
greater than, or equal: it may also fail.

<b>Strings</b>:

TODO: discuss UTF-8 and string.bytes method.

<b>Dictionaries and sets</b>:
Starlark dictionaries have predictable iteration order.
Furthermore, many Starlark values are hashable in Starlark even though
the Go values that represent them are not hashable in Go: big
integers, for example.
Consequently, we cannot use Go maps to implement Starlark's dictionary.

We use a simple hash table whose buckets are linked lists, each
element of which holds up to 8 key/value pairs. In a well-distributed
table the list should rarely exceed length 1. In addition, each
key/value item is part of a doubly-linked list that maintains the
insertion order of the elements for iteration.

<b>Struct:</b>
The `starlarkstruct` Go package provides a non-standard Starlark
extension data type, `struct`, that maps field identifiers to
arbitrary values. Fields are accessed using dot notation: `y = s.f`.
This data type is extensively used in Bazel, but its specification is
currently evolving.

Starlark has no `class` mechanism, nor equivalent of Python's
`namedtuple`, though it is likely that future versions will support
some way to define a record data type of several fields, with a
representation more efficient than a hash table.


### Freezing

All mutable values created during module initialization are _frozen_
upon its completion. It is this property that permits a Starlark module
to be referenced by two Starlark threads running concurrently (such as
the initialization threads of two other modules) without the
possibility of a data race.

The Go implementation supports freezing by storing an additional
"frozen" Boolean variable in each mutable object. Once this flag is set,
all subsequent attempts at mutation fail.
Every value defines a
`Freeze` method that sets its own frozen flag if not already set, and
calls `Freeze` for each value that it contains.
For example, when a list is frozen, it freezes each of its elements;
when a dictionary is frozen, it freezes each of its keys and values;
and when a function value is frozen, it freezes each of the free
variables and parameter default values implicitly referenced by its closure.
Application-defined types must also follow this discipline.

The freeze mechanism in the Go implementation is finer grained than in
the Java implementation: in effect, the latter has one "frozen" flag
per module, and every value holds a reference to the frozen flag of
its module. This makes setting the frozen flag more efficient (a
simple bit flip, with no need to traverse the object graph) but coarser
grained. Also, it complicates the API slightly because constructing a
list, say, requires a reference to the frozen flag it should use.

The Go implementation would also permit the freeze operation to be
exposed to the program, for example as a built-in function.
This has proven valuable in writing tests of the freeze mechanism
itself, but is otherwise mostly a curiosity.


### Fail-fast iterators

In some languages (such as Go), a program may mutate a data structure
while iterating over it; for example, a range loop over a map may
delete map elements. In other languages (such as Java), iterators do
extra bookkeeping so that modification of the underlying collection
invalidates the iterator, and the next attempt to use it fails.
This often helps to detect subtle mistakes.

Starlark takes this a step further. Instead of mutation of the
collection invalidating the iterator, the act of iterating makes the
collection temporarily immutable, so that an attempt to, say, delete a
dict element while looping over the dict will fail.
The error is
reported against the delete operation, not the iteration.

This is implemented by having each mutable iterable value record a
counter of active iterators. Starting a loop increments this counter,
and completing a loop decrements it. A collection with a nonzero
counter behaves as if frozen. If the collection is actually frozen,
the counter bookkeeping is unnecessary. (Consequently, iterator
bookkeeping is needed only while objects are still mutable, before
they can have been published to another thread, and thus no
synchronization is necessary.)

A consequence of this design is that in the Go API, it is imperative
to call `Done` on each iterator once it is no longer needed.

```
TODO
starlark.Value interface and subinterfaces
argument passing to builtins: UnpackArgs, UnpackPositionalArgs.
```

<b>Evaluation strategy:</b>
The evaluator uses a simple recursive tree walk, returning a value or
an error for each expression. We have experimented with just-in-time
compilation of syntax trees to bytecode, but two limitations in the
current Go compiler prevent this strategy from outperforming the
tree-walking evaluator.

First, the Go compiler does not generate a "computed goto" for a
switch statement ([Go issue
5496](https://github.com/golang/go/issues/5496)). A bytecode
interpreter's main loop is a for-loop around a switch statement with
dozens or hundreds of cases, and the speed with which each case can be
dispatched strongly affects overall performance.
Currently, a switch statement generates a binary tree of ordered
comparisons, requiring several branches instead of one.

Second, the Go compiler's escape analysis assumes that the underlying
array from a `make([]Value, n)` allocation always escapes
([Go issue 20533](https://github.com/golang/go/issues/20533)).
Because the bytecode interpreter's operand stack has a non-constant
length, it must be allocated with `make`. The resulting allocation
adds to the cost of each Starlark function call; this can be tolerated
by amortizing one very large stack allocation across many calls.
More problematic appears to be the cost of the additional GC write
barriers incurred by every VM operation: every intermediate result is
saved to the VM's operand stack, which is on the heap.
By contrast, intermediate results in the tree-walking evaluator are
never stored to the heap.

```
TODO
frames, backtraces, errors.
threads
Print
Load
```

## Testing

```
TODO
starlarktest package
`assert` module
starlarkstruct
integration with Go testing.T
```


## TODO


```
Discuss practical separation of code and data.
```