# Proposal: Proper scope management

This has already been implemented, but these docs remain here as a sort
of rationale. In the end we implemented proposal one, requiring new variables
to be declared with the **var** keyword.

## Abstract

Currently in nash there is no way to work properly
with closures, because scope management is very limited.

Let's elaborate on the problem by implementing a
list object as a set of functions
that manipulate the same data.

```sh
fn list() {
    l = ()

    fn add(val) {
        l <= append($l, $val)
    }

    fn get(i) {
        return $l[$i]
    }

    fn string() {
        print("list: [%s]\n", $l)
    }

    return $add, $get, $string
}
```

The idea is to hide all list data behind these three functions,
which manipulate the same data. The problem is that today
this is not possible. Using this code:

```sh
add, get, string <= list()

$add("1")
$add("2")
$string()

v <= $get("0")
echo $v
```

will result in:

```
list: []
/tmp/test.sh:27:5: /tmp/test.sh:11:23: Index out of bounds. len($l) == 0, but given 0
```

As you can see, even after we call the **add** function the list
remains empty. Why is that? The problem is in the **add** function:

```sh
fn add(val) {
    l <= append($l, $val)
}
```

When we reference the **l** variable, it reads the one from the
outer scope (the empty list), but there is no way to express syntactically
that we want to change the list in the outer scope instead of creating
a new variable **l** (shadowing the outer **l**).

That is why the **get** and **string** functions
always reference an outer list **l** that is empty: a new one
is created each time the **add** function is called.
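
The behavior described above can be modeled outside nash. The sketch below is only an illustration in Python, not nash's implementation: it assumes each function call gets a fresh scope chained to its parent, and uses `collections.ChainMap`, where lookups search the whole chain but writes only touch the innermost mapping. The names `list_scope` and `call_scope` are hypothetical.

```python
from collections import ChainMap

# Scope of list(): holds the original, empty l.
list_scope = ChainMap({"l": ()})


def add(val):
    # Each call gets a fresh child scope chained to list()'s scope.
    call_scope = list_scope.new_child()
    # The read finds the outer l; the write lands in the child scope only,
    # so a new, shadowing l is created and the outer one is untouched.
    call_scope["l"] = call_scope["l"] + (val,)
    return call_scope


first_call = add("1")
second_call = add("2")

print(first_call["l"])   # ('1',)
print(second_call["l"])  # ('2',)
print(list_scope["l"])   # ()
```

Every call to `add` builds its own one-element `l`, while the `l` seen by the other closures stays empty, which matches the `list: []` output above.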

In this document we explore the solution space for this problem.

## Proposal I - Create new variables explicitly

In this proposal, creating a new variable requires an explicit
syntax construction.

We could add a new keyword `var` that is used to declare and
initialize variables in the local scope, like this:

```js
var i = "0"
```

While the current syntax:

```js
i = "0"
```

will assign a new value to an already existing variable **i**.
The assignment first looks for the target variable in the local
scope and then in the parent, traversing the entire stack until it is
found and updated. If the variable is not found,
the interpreter must abort with an error.

```sh
var count = "0" # declare local variable

fn inc() {
    # update outer variable
    count, _ <= expr $count "+" 1
}

inc()
print($count) # outputs: 1
```

Below is how this proposal solves the list example:

```sh
fn list() {
    # initialize an "l" variable in this scope
    var l = ()

    fn add(val) {
        # use the "l" variable from the parent scope:
        # look in this scope first and, if not found,
        # look for the variable in the parent scope
        l <= append($l, $val)
    }

    fn get(i) {
        # use the "l" variable from the parent scope
        return $l[$i]
    }

    fn string() {
        # use the "l" variable from the parent scope
        print("list: [%s]\n", $l)
    }

    fn not_clear() {
        # force the creation of a new "l" variable in this scope;
        # because of this, the "l" list in the parent scope is not cleared
        var l = ()
    }

    return $add, $get, $string
}
```

Syntactically, the `var` statement is an extension of the assignment
and exec-assignment statements, so it should also support multiple
declarations in a single statement.
For example:

```sh
var i, j = "0", "1"

var body, err <= curl -f $url

var name, surname, err <= getAuthor()
```

Using `var` always creates new variables, shadowing previous ones.
For example:

```sh
var a, b = "0", "1" # works fine, the variables didn't exist before

var a, b, c = "4", "5", "6" # also works, creating new a, b, c
```

In a dynamically typed language there is very little difference between
creating a new variable and just reassigning it, since variables are just
references that store no type information at all. For example,
what is the difference between this:

```
var a = "1"
a = ()
```

And this?

```
var a = "1"
var a = ()
```

The behavior will be exactly the same. There is no semantic error
in reassigning the same variable to a value of a different type,
so there is no difference between reassigning and redeclaring (although
the distinction makes sense for statically typed languages).

Statements are evaluated in order, so this:

```
a = ()
var a = "1"
```

Is **NOT** the same as this:

```
var a = "1"
var a = ()
```

In the first case, the assignment targets an **a** that must already
exist (otherwise it is an error), and only then is a new **a** declared.

This is easier to understand when using closures. Let's go
back to our list implementation, where we had something like this:

```
var l = ()

fn add(val) {
    # use the "l" variable from the parent scope:
    # look in this scope first and, if not found,
    # look for the variable in the parent scope
    l <= append($l, $val)
}
```

If we write this:

```
var l = ()

fn add(val) {
    # creates a new var
    var l = ()
    # manipulates the new l var
    l <= append($l, $val)
}
```

The **add** function will not manipulate the **l** variable from the
outer scope, and our list implementation will not work properly.
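
The declare-versus-assign rule of this proposal can be sketched as a small scope chain. This is a hypothetical Python model written for illustration, not the nash interpreter's code: the `Scope` class and its `declare`, `assign` and `lookup` methods are assumed names, and raising `NameError` stands in for the interpreter aborting with an error.

```python
class Scope:
    """Toy model of a lexical scope with a link to its parent scope."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def declare(self, name, value):
        # Models `var name = value`: always binds in the current scope,
        # shadowing any variable with the same name in outer scopes.
        self.vars[name] = value

    def _owner(self, name):
        # Walk from the current scope outwards until the name is found.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope
            scope = scope.parent
        raise NameError(f"variable {name!r} is not declared")

    def assign(self, name, value):
        # Models `name = value`: updates the nearest existing variable.
        self._owner(name).vars[name] = value

    def lookup(self, name):
        return self._owner(name).vars[name]


list_scope = Scope()
list_scope.declare("l", ())

# add() without `var`: the assignment reaches the l of the parent scope.
call = Scope(parent=list_scope)
call.assign("l", call.lookup("l") + ("1",))

# add() that declares its own l first: only the local l changes.
shadowed = Scope(parent=list_scope)
shadowed.declare("l", ())
shadowed.assign("l", shadowed.lookup("l") + ("2",))

print(list_scope.lookup("l"))  # ('1',)
print(shadowed.lookup("l"))    # ('2',)
```

In this model the `shadowed` scope corresponds to the **add** variant that redeclares **l**: its append never reaches the list held by the parent scope. Assigning to a name that was never declared, such as `Scope().assign("x", 1)`, raises `NameError`.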

If, in **add**, the `var` declaration comes after the assignment instead:

```
var l = ()

fn add(val) {
    # manipulates the outer l var
    l <= append($l, $val)
    # creates a new var, which is useless
    var l = ()
}
```

it will work, since we assign a new value to the outer **l**
before creating a new **l** variable.

The approach described here is very similar to how variables
are handled in [Lua](https://www.lua.org/), except
that Lua uses the **local** keyword instead of **var**.

Also, Lua allows global variables to be created by default. In
Nash we prefer to avoid global state and produce an error when
assigning new values to variables that do not exist.

In summary, in this proposal creating new variables is explicit
and referencing existing variables in outer scopes is implicit.


## Proposal II - Manipulate outer scope explicitly

This proposal adds a new `outer` keyword that permits updating
variables in the outer scope. The default and implicit behavior of
variable assignment is to always create a new variable.

Considering our list example:

```sh
fn list() {
    # initialize an "l" variable in this scope
    l = ()

    fn add(val) {
        # update the "l" variable from the parent scope
        outer l <= append($l, $val)
    }

    fn get(i) {
        # read the "l" variable from the parent scope
        return $l[$i]
    }

    fn string() {
        # read the "l" variable from the parent scope
        print("list: [%s]\n", $l)
    }

    return $add, $get, $string
}
```

The `outer` keyword plays a role similar to Python's `global`
keyword (or `nonlocal`, for variables in an enclosing function scope).
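
For comparison, here is a minimal Python sketch of the same kind of closure. It uses `nonlocal`, Python's counterpart of `global` for variables that live in an enclosing function rather than at module level. The function names `make_list` and `double_add` are made up for this illustration.

```python
def make_list():
    items = []

    def double_add(val):
        # One declaration covers every assignment in this function body.
        nonlocal items
        items = items + [val]  # rebinds the items of make_list
        items = items + [val]  # still the same enclosing variable

    def get(i):
        # Reading an enclosing variable needs no declaration at all.
        return items[i]

    return double_add, get


double_add, get = make_list()
double_add("a")
print(get(0), get(1))  # a a
```

Note that in Python the declaration is made once per function, not once per assignment.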

Unlike Python's `global`, `outer` must appear on every assignment,
like this:

```sh
fn list() {
    # initialize an "l" variable in this scope
    l = ()

    fn doubleadd(val) {
        outer l <= append($l, $val)
        outer l <= append($l, $val)
    }

    return $doubleadd
}
```

This would be buggy and only add once, because the second
assignment creates a new local **l**:

```sh
fn list() {
    # initialize an "l" variable in this scope
    l = ()

    fn doubleadd(val) {
        outer l <= append($l, $val)
        l <= append($l, $val)
    }

    return $doubleadd
}
```

Trying to elaborate more on the possible combinations
when using the **outer** keyword, we arrive at some hard
questions, such as what **outer** means in this case:

```
fn list() {
    # initialize an "l" variable in this scope
    l = ()
    fn doubleadd(val) {
        l <= append($l, $val)
        outer l <= append($l, $val)
    }
    return $doubleadd
}
```

Will **outer** just handle the reference in its own scope, or
will it skip its own scope and manipulate the outer variable?

The name **outer** implies that it will manipulate the outer scope,
bypassing its own current scope, but then how do you read the outer
variable? We would need to support something like:

```
fn list() {
    # initialize an "l" variable in this scope
    l = ()
    fn add(val) {
        l <= "whatever"
        outer l <= append(outer $l, $val)
    }
    return $add
}
```

With **outer**, it is as if we were bypassing the lexical semantics
of the code: the order of declarations is no longer relevant,
since you have a form of "goto" to jump out of the current scope.

## Comparing both approaches

As with everything in life, the design space for scope
management is full of tradeoffs.

Making outer scope management explicit makes declaring
new variables easier, since you have to type less to
create new variables.

But managing scope with closures gets more cumbersome.
Consider these nested closures with the **outer** keyword:

```sh
fn list() {
    l = ()

    fn add(val) {
        # use the "l" variable from the parent
        outer l <= append($l, $val)
        fn addagain() {
            outer l <= append($l, $val)
        }
        return $addagain
    }

    return $add
}
```

And this one with **var**:

```sh
fn list() {
    var l = ()

    fn add(val) {
        # use the "l" variable from the parent
        l <= append($l, $val)
        fn addagain() {
            l <= append($l, $val)
        }
        return $addagain
    }

    return $add
}
```

The **var** option requires more typing for the common
case of declaring new variables (this is especially annoying on the
interactive shell), but makes closures quite
natural to write: you just manipulate the variables
that exist lexically in your scope, as you would
inside an **if** or **for** block.

Thinking about cognition, it seems easier to write buggy code
by forgetting to add an **outer** than by forgetting
to add a **var** and mistakenly manipulating a variable outside
the scope.

The decision to fail if the variable does not exist also makes
the **var** option less bug-prone: no new variable is
created if you forget the **var**, although lexically reachable variables
will be manipulated (this is mitigated by the fact that we don't have
global variables).

If we go for **outer**, it seems that we are going to write less,
but some code involving closures will be harder to read (and write).
Since code is usually read more often than it is written, it seems like a sensible
choice to optimize for readability and understandability rather than just
saving a few keystrokes.

But any statement about cognition is really hard to
treat as a universal truth, since all human beings are biased, which makes
identifying common patterns of cognition really hard. But if software
design has any kind of goal, it must be this =).