# Proposal: Proper scope management

This proposal has already been implemented; the document is kept here as
rationale. In the end we implemented Proposal I, which requires new variables
to be declared with the **var** keyword.

## Abstract

Currently in nash there is no way to properly work
with closures because scope management is very limited.

Let's elaborate on the problem by implementing a
list object as a set of functions that manipulate
the same data.

```sh
fn list() {
	l = ()

	fn add(val) {
		l <= append($l, $val)
	}

	fn get(i) {
		return $l[$i]
	}

	fn string() {
		print("list: [%s]\n", $l)
	}

	return $add, $get, $string
}
```

The idea is to hide all list data behind these 3 functions
that manipulate the same data. The problem is that today
this is not possible. Using this code:

```sh
add, get, string <= list()

$add("1")
$add("2")
$string()

v <= $get("0")
echo $v
```

results in:

```
list: []
/tmp/test.sh:27:5: /tmp/test.sh:11:23: Index out of bounds. len($l) == 0, but given 0
```

As you can see, even after we call the **add** function the list
remains empty. Why is that? The problem is in the add function:

```sh
fn add(val) {
	l <= append($l, $val)
}
```

When we read the **l** variable it resolves to the reference in the
outer scope (the empty list), but there is no way to express syntactically
that we want to change the list in the outer scope instead of creating
a new variable **l** (shadowing the outer **l**).

That is why the **get** and **string** functions
always reference an outer list **l** that is empty: a new local **l**
is created each time the add function is called.

In this document we navigate the solution space for this problem.
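
The failure described in this section can be modeled outside nash. The sketch below is only an illustration in Python, not code from the nash interpreter: it uses `collections.ChainMap`, whose reads search the whole chain while writes always land in the innermost mapping, which is the same rule that makes **add** shadow the outer **l**. The names `list_scope` and `call_scope` are made up for the example.

```python
from collections import ChainMap

# Scope of list(): holds the real list "l".
list_scope = ChainMap({"l": ()})

def add(val):
    # Each call gets a fresh child scope, as a function call would.
    call_scope = list_scope.new_child()
    # Reading "l" walks the chain and finds the outer (empty) list...
    current = call_scope["l"]
    # ...but writing "l" always lands in the innermost scope,
    # creating a new variable that shadows the outer one.
    call_scope["l"] = current + (val,)
    return call_scope

scope_after_add = add("1")
add("2")

print(scope_after_add.maps[0])  # the shadowing copy made by the first call
print(list_scope["l"])          # the outer list, still empty
```

Every call builds its own one-element copy and throws it away, so **get** and **string** never see any of the added values.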

## Proposal I - Create new variables explicitly

In this proposal, creating a new variable requires an explicit
syntax construction.

We could add a new keyword `var` that is used to declare and
initialize variables in the local scope, like this:

```js
var i = "0"
```

While the current syntax:

```js
i = "0"
```

assigns a new value to an already existing variable **i**.
The assignment first looks for the target variable in the local
scope and then in each parent scope, traversing the entire stack,
until the variable is found and updated. If the variable is not
found, the interpreter must abort with an error.

```sh
var count = "0" # declare local variable

fn inc() {
	# update outer variable
	count, _ <= expr $count "+" 1
}

inc()
print($count) 	# outputs: 1
```
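
The lookup rule above can be sketched as a chain of scopes. This is a minimal Python model under our own assumptions, not the actual nash implementation: the `Scope` class and its `declare`, `assign` and `lookup` methods are hypothetical names chosen for the illustration.

```python
class Scope:
    """Hypothetical model of a scope; not taken from the nash interpreter."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def declare(self, name, value):
        # "var name = value": always binds in the current scope.
        self.vars[name] = value

    def assign(self, name, value):
        # "name = value": update the nearest scope that already has the name.
        scope = self
        while scope is not None:
            if name in scope.vars:
                scope.vars[name] = value
                return
            scope = scope.parent
        raise NameError(f"variable {name!r} is not declared")

    def lookup(self, name):
        # "$name": read from the nearest scope that has the name.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise NameError(f"variable {name!r} is not declared")


top = Scope()
top.declare("count", "0")        # var count = "0"

inc_call = Scope(parent=top)     # scope of one inc() call
inc_call.assign("count", str(int(inc_call.lookup("count")) + 1))

print(top.lookup("count"))       # the outer variable was updated
```

Assigning to a name that was never declared raises an error instead of silently creating a variable, which mirrors the "abort with an error" rule.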

Below is how this proposal solves the list example:

```sh
fn list() {
	# initialize an "l" variable in this scope
	var l = ()

	fn add(val) {
		# use the "l" variable from the parent scope:
		# it is looked up first in this scope and,
		# if not found, in the parent scope
		l <= append($l, $val)
	}

	fn get(i) {
		# use the "l" variable from parent scope
		return $l[$i]
	}

	fn string() {
		# use the "l" variable from parent scope
		print("list: [%s]\n", $l)
	}

	fn not_clear() {
		# force the creation of a new "l" variable in this scope,
		# so the "l" list in the parent scope is not cleared
		var l = ()
	}

	return $add, $get, $string
}
```

Syntactically, the `var` statement is an extension of the assignment
and exec-assignment statements, so it should also support multiple
declarations in a single statement, e.g.:

```sh
var i, j = "0", "1"

var body, err <= curl -f $url

var name, surname, err <= getAuthor()
```

Using var always creates new variables, shadowing previous ones,
for example:

```sh
var a, b = "0", "1" # works fine, the variables did not exist before

var a, b, c = "4", "5", "6" # also works, creating new a, b, c
```

In a dynamically typed language there is very little difference between
creating a new var and just reassigning it, since variables are just
references that store no type information at all. For example,
what is the difference between this:

```
var a = "1"
a = ()
```

And this?

```
var a = "1"
var a = ()
```

The behavior is exactly the same. There is no semantic error
in reassigning the same variable to a value of a different type,
so reassigning and redeclaring make no difference at all (although the
distinction makes sense for statically typed languages).

Statements are evaluated in order, so this:

```
a = ()
var a = "1"
```

Is **NOT** the same as this:

```
var a = "1"
var a = ()
```

This is easier to understand when using closures. Let's go
back to our list implementation, where we had something like this:

```
var l = ()

fn add(val) {
        # use the "l" variable from the parent scope:
        # it is looked up first in this scope and,
        # if not found, in the parent scope
        l <= append($l, $val)
}
```

If we write this:

```
var l = ()

fn add(val) {
        # creates new var
        var l = ()
        # manipulates new l var
        l <= append($l, $val)
}
```

The **add** function will not manipulate the **l** variable from the
outer scope, and our list implementation will not work properly.

But writing this:

```
var l = ()

fn add(val) {
        # manipulates outer l var
        l <= append($l, $val)
        # creates new var that is useless
        var l = ()
}
```

will work, since we assigned a new value to the outer **l**
before creating a new **l** var.
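
The effect of statement order can be checked with a small model. The sketch below is an illustration in Python, not nash code: `declare` and `assign` are hypothetical helpers that stand in for `var l = ...` and `l <= ...`, and a `collections.ChainMap` plays the role of the scope stack.

```python
from collections import ChainMap

def declare(scope, name, value):
    # "var name = value": bind in the innermost scope only.
    scope.maps[0][name] = value

def assign(scope, name, value):
    # "name <= value": update the nearest scope that already has the name.
    for mapping in scope.maps:
        if name in mapping:
            mapping[name] = value
            return
    raise NameError(f"variable {name!r} is not declared")

def add_var_first(outer, val):
    call = outer.new_child()
    declare(call, "l", ())                  # var l = ()
    assign(call, "l", call["l"] + (val,))   # l <= append($l, $val)

def add_var_last(outer, val):
    call = outer.new_child()
    assign(call, "l", call["l"] + (val,))   # l <= append($l, $val)
    declare(call, "l", ())                  # var l = ()

first = ChainMap({"l": ()})
add_var_first(first, "1")

last = ChainMap({"l": ()})
add_var_last(last, "1")

print(first["l"])  # outer list untouched: the local var took the append
print(last["l"])   # outer list updated before the shadowing var appeared
```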

The approach described here is very similar to how variables
are handled in [Lua](https://www.lua.org/), with the exception
that Lua uses the **local** keyword instead of var.

Also, Lua allows global variables to be created by default. In
Nash we prefer to avoid global state and produce an error when
assigning new values to variables that do not exist.

In summary, in this proposal creating new variables is explicit
and referencing existing variables in outer scopes is implicit.

## Proposal II - Manipulate outer scope explicitly

This proposal adds a new `outer` keyword that permits updating
variables in the outer scope. The default and implicit behavior of
variable assignments is to always create a new variable.

Considering our list example:

```sh
fn list() {
	# initialize an "l" variable in this scope
	l = ()

	fn add(val) {
		# update the "l" variable from the parent
		outer l <= append($l, $val)
	}

	fn get(i) {
		# read the "l" variable from the parent
		return $l[$i]
	}

	fn string() {
		# read the "l" variable from the parent
		print("list: [%s]\n", $l)
	}

	return $add, $get, $string
}
```

The `outer` keyword plays a role similar to Python's `global`
keyword (or, for enclosing function scopes, `nonlocal`).
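
For comparison, here is roughly how the list example looks in Python itself. This is only an analogy, not nash code: in Python the statement that targets a variable in an enclosing function is `nonlocal` (`global` only reaches module-level names), and the names `make_list` and `items` are ours.

```python
def make_list():
    items = ()

    def add(val):
        # Without this declaration, "items" would be treated as
        # local to add() and the outer tuple would never change.
        nonlocal items
        items = items + (val,)

    def get(i):
        # Reads need no declaration; lookup reaches the enclosing scope.
        return items[i]

    def string():
        return "list: [%s]" % " ".join(items)

    return add, get, string

add, get, string = make_list()
add("1")
add("2")
print(string())
print(get(0))
```

Note that Python asks for the declaration once per function, not once per assignment.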

Unlike Python's `global`, `outer` must appear on every assignment,
like this:

```sh
fn list() {
	# initialize an "l" variable in this scope
	l = ()

	fn doubleadd(val) {
		outer l <= append($l, $val)
		outer l <= append($l, $val)
	}

	return $doubleadd
}
```

This would be buggy and only add once:

```sh
fn list() {
	# initialize an "l" variable in this scope
	l = ()

	fn doubleadd(val) {
		outer l <= append($l, $val)
		l <= append($l, $val)
	}

	return $doubleadd
}
```
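
To see why the second append is lost, the buggy version can be traced with a small model. This is a sketch in Python under one assumed reading of the proposal (plain assignment binds locally, `outer` writes to the nearest enclosing scope that has the name); `assign_local` and `assign_outer` are hypothetical helpers, not nash APIs.

```python
from collections import ChainMap

def assign_local(scope, name, value):
    # Plain assignment under Proposal II: always binds in the current scope.
    scope.maps[0][name] = value

def assign_outer(scope, name, value):
    # "outer name <= value": update the nearest enclosing scope with the name.
    for mapping in scope.maps[1:]:
        if name in mapping:
            mapping[name] = value
            return
    raise NameError(f"no outer variable {name!r}")

def doubleadd_buggy(list_scope, val):
    call = list_scope.new_child()
    assign_outer(call, "l", call["l"] + (val,))  # outer l <= append($l, $val)
    assign_local(call, "l", call["l"] + (val,))  # l <= append($l, $val)
    return call

list_scope = ChainMap({"l": ()})
call = doubleadd_buggy(list_scope, "x")

print(list_scope["l"])    # what the other closures would see
print(call.maps[0]["l"])  # the local copy, discarded when the call returns
```

The second append does happen, but on a local **l** that nobody else can reach.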

Trying to elaborate on the possible combinations
when using the **outer** keyword, we arrive at some hard
questions, such as what outer means in this case:

```
fn list() {
    # initialize an "l" variable in this scope
    l = ()
    fn doubleadd(val) {
        l <= append($l, $val)
        outer l <= append($l, $val)
    }
    return $doubleadd
}
```

Will outer just handle the reference in its own scope, or
will it skip its own scope and manipulate the outer variable?

The name outer implies that it will manipulate the outer scope,
bypassing its own current scope, but then how do you read the outer
variable? We would need to support something like:

```
fn list() {
    # initialize an "l" variable in this scope
    l = ()
    fn add(val) {
        l <= "whatever"
        outer l <= append(outer $l, $val)
    }
    return $add
}
```

With outer it is as if we were bypassing the lexical semantics
of the code: the order of declarations is not relevant anymore,
since you have a form of "goto" to jump over the current scope.
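
The "bypass" reading can be made concrete with a short model. The sketch below is a Python illustration of the hypothetical `outer $l` read, not anything nash implements: `lookup_outer` is an invented helper, and `ChainMap.parents` is used to skip the innermost scope.

```python
from collections import ChainMap

def lookup(scope, name):
    # "$name": nearest scope first, the usual lexical read.
    return scope[name]

def lookup_outer(scope, name):
    # Hypothetical "outer $name": skip the current scope when reading.
    return scope.parents[name]

list_scope = ChainMap({"l": ("a",)})
call = list_scope.new_child()
call["l"] = "whatever"  # l <= "whatever" (a new local under Proposal II)

print(lookup(call, "l"))        # the local shadow
print(lookup_outer(call, "l"))  # the enclosing list

# outer l <= append(outer $l, $val): both the read and the write skip "call".
call.parents["l"] = lookup_outer(call, "l") + ("b",)
print(list_scope["l"])
```

Two different values are reachable under the same name from the same line of code, which is the readability cost discussed above.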

## Comparing both approaches

As with everything in life, the design space for how to handle
scope management is full of tradeoffs.

Making outer scope management explicit makes declaring
new variables easier, since you have to type less to
create new vars.

But managing scope using closures gets more cumbersome.
Consider these nested closures with the **outer** keyword:

```sh
fn list() {
	l = ()

	fn add(val) {
		# use the "l" variable from the parent
		outer l <= append($l, $val)
		fn addagain() {
			outer l <= append($l, $val)
		}
		return $addagain
	}

	return $add
}
```

And this one with **var**:

```sh
fn list() {
	var l = ()

	fn add(val) {
		# use the "l" variable from the parent
		l <= append($l, $val)
		fn addagain() {
			l <= append($l, $val)
		}
		return $addagain
	}

	return $add
}
```
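
Python takes the explicit-outer route, so the cost of the first variant can be seen in a real language. The sketch below is a Python analogy rather than nash code (the names `make_list`, `items` and `snapshot` are ours): every nested function that rebinds the shared variable has to repeat the declaration.

```python
def make_list():
    items = ()

    def add(val):
        nonlocal items          # needed here...
        items = items + (val,)

        def add_again():
            nonlocal items      # ...and again in every nested function
            items = items + (val,)

        return add_again

    def snapshot():
        return items

    return add, snapshot

add, snapshot = make_list()
add_again = add("x")
add_again()
print(snapshot())
```

Dropping either `nonlocal` line makes that function treat `items` as its own local, which is the kind of mistake the **var** option avoids.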

The **var** option requires more writing for the common
case of declaring new variables (especially in the interactive shell,
where this is pretty annoying), but makes closures pretty
natural to write: you just manipulate the variables
that exist lexically in your scope, like you would do
inside an **if** or **for** block.

Thinking about cognition, it seems easier to write buggy code
by forgetting to add an **outer** than by forgetting
to add a **var** and manipulating a variable outside
the scope by mistake.

The decision to fail if the variable does not exist also makes
the **var** option less bug-prone, since no new variable is
created if you forget the **var**. Lexically reachable variables
will still be manipulated, but this is ameliorated by the fact that
we don't have global variables.

If we go for **outer** it seems that we are going to write less,
but some code involving closures will be harder to read (and write).
Since code is usually read more often than it is written, it seems like a
sensible choice to optimize for readability and understandability rather
than just save a few keystrokes.

But any statement made about cognition is really hard to
consider a global truth, since all human beings are biased, which makes
identifying common patterns of cognition really hard. But if software
design has any kind of goal, it must be this =).