github.com/xiaq/elvish@v0.12.0/website/src/learn/unique-semantics.md

github.com/xiaq/elvish@v0.12.0/website/src/learn/unique-semantics.md (about)

     1  <!-- toc -->
     2  
     3  <div class="clear"></div>
     4  
     5  The semantics of Elvish is unique in many aspects when compared to other
     6  shells. This can be surprising if you are used to other shells, and it is a
     7  result of the [design choice](/ref/philosophy.html) of making Elvish a
     8  full-fledged programming language.
     9  
    10  # Structureful IO
    11  
    12  Elvish offers the ability to build elaborate data structures, "return" them
    13  from functions, and pass them through pipelines.
    14  
    15  ## Motivation
    16  
    17  Traditional shells use strings for all kinds of data. They can be stored in
    18  variables, used as function arguments, written to output and read from input.
    19  Strings are very simple to use, but they fall short if your data has an
    20  inherent structure. A common solution is using pseudo-structures of "each line
    21  representing a record, and each (whitespace-separated) field represents a
    22  property", which is fine as long as your data do not contain whitespaces. If
    23  they do, you will quickly run into problems with escaping and quotation and
    24  find yourself doing black magics with strings.
    25  
    26  Some shells provide data structures like lists and maps, but they are usually
    27  not first-class values. You can store them in variables, but you might not to
    28  be able to nest them, pass them to functions, or returned them from functions.
    29  
    30  ## Data Structures and "Returning" Them
    31  
    32  Elvish offers first-class support for data structures such as lists and maps.
    33  Here is an example that uses a list:
    34  
    35  ```elvish-transcript
    36  ~> li = [foo bar 'lorem ipsum']
    37  ~> kind-of $li # "kind" is like type
    38  ▶ list
    39  ~> count $li # count the number of elements in a list
    40  ▶ 3
    41  ```
    42  
    43  (See the [language reference](/ref/language.html) for a more complete description
    44  of the builtin data structures.)
    45  
    46  As you can see, you can store lists in variables and use them as command
    47  arguments. But they would be much less useful if you cannot **return** them
    48  from a function. A naive way to do this is by `echo`ing the list and use output
    49  capture to recover it:
    50  
    51  ```elvish-transcript
    52  ~> fn f {
    53       echo [foo bar 'lorem ipsum']
    54     }
    55  ~> li = (f) # (...) is output capture, like $(...) in other shells
    56  ~> kind-of $li
    57  ▶ string
    58  ~> count $li # count the number of bytes, since $li is now a string
    59  ▶ 23
    60  ```
    61  
    62  As we have seen, our attempt to output a list has turned it into a string.
    63  This is because the `echo` command in Elvish, like in other shells, is
    64  string-oriented. To echo a list, it has to be first converted to a string.
    65  
    66  Elvish provides a `put` command to output structured values as they are:
    67  
    68  ```elvish-transcript
    69  ~> fn f {
    70       put [foo bar 'lorem ipsum']
    71     }
    72  ~> li = (f)
    73  ~> kind-of $li
    74  ▶ list
    75  ~> count $li
    76  ▶ 3
    77  ```
    78  
    79  So how does `put` work differently from `echo` under the hood?
    80  
    81  In Elvish, the standard output is made up of two parts: one traditional
    82  byte-oriented **file**, and one internal **value-oriented channel**. The
    83  `echo` command writes to the file, so it has to serialize its arguments into
    84  strings; the `put` command writes to the value-oriented channel, preserving
    85  all the internal structures of the values.
    86  
    87  If you invoke `put` directly from the command prompt, the values it output have
    88  a leading `▶`:
    89  
    90  ```elvish-transcript
    91  ~> put [foo bar]
    92  ▶ [foo bar]
    93  ```
    94  
    95  The leading arrow is a way to visualize that a command has written something
    96  onto the value channel, and not part of the value itself.
    97  
    98  In retrospect, you may discover that the `kind-of` and `count` builtin commands
    99  also write their output to the value channel.
   100  
   101  
   102  ## Passing Data Structures Through the Pipeline
   103  
   104  When I said that standard output in Elvish comprises two parts, I was not
   105  telling the full story: pipelines in Elvish also have these two parts, in a
   106  very similar way. Data structures can flow in the value-oriented part of the
   107  pipeline as well. For instance, the `each` command takes **input** from the
   108  value-oriented channel, and apply a function to each value:
   109  
   110  ```elvish-transcript
   111  ~> put lorem ipsum | each [x]{ echo "Got "$x }
   112  Got lorem
   113  Got ipsum
   114  ```
   115  
   116  There are many builtin commands that inputs or outputs values. As another
   117  example, the `take` commands retains a fixed number of items:
   118  
   119  ```elvish-transcript
   120  ~> put [lorem ipsum] "foo\nbar" [&key=value] | take 2
   121  ▶ [lorem ipsum]
   122  ▶ "foo\nbar"
   123  ```
   124  
   125  ## Interoperability with External Commands
   126  
   127  Unfortunately, the ability of passing structured values is not available to
   128  external commands. However, Elvish comes with a pair of commands for JSON
   129  serialization/deserialization. The following snippet illustrates how
   130  to interoperate with a Python script:
   131  
   132  ```elvish-transcript
   133  ~> cat sort-list.py
   134  import json, sys
   135  li = json.load(sys.stdin)
   136  li.sort()
   137  json.dump(li, sys.stdout)
   138  ~> put [lorem ipsum foo bar] | to-json | python sort-list.py | from-json
   139  ▶ [bar foo ipsum lorem]
   140  ```
   141  
   142  It is easy to write a wrapper for such external commands:
   143  
   144  ```elvish-transcript
   145  ~> fn sort-list { to-json | python sort-list.py | from-json }
   146  ~> put [lorem ipsum foo bar] | sort-list
   147  ▶ [bar foo ipsum lorem]
   148  ```
   149  
   150  More serialization/deserialization commands may be added to the language in
   151  the future.
   152  
   153  
   154  # Exit Status and Exceptions
   155  
   156  Unix commands exit with a non-zero value to signal errors. This is available
   157  traditionally as a `$?` variable in other shells:
   158  
   159  ```bash
   160  true
   161  echo $? # prints "0"
   162  false
   163  echo $? # prints "1"
   164  ```
   165  
   166  Builtin commands and user-defined functions also do this to signal errors,
   167  although they are not Unix commands:
   168  
   169  ```bash
   170  bad() {
   171    return 2
   172  }
   173  bad
   174  echo $? # prints "2"
   175  ```
   176  
   177  This model is fine, only if most errors are non-fatal (so that errors from a
   178  previous command normally do not affect the execution of subsequence ones) and
   179  the script author remembers to check `$?` for the rare fatal errors.
   180  
   181  Elvish has no concept of exit status. Instead, it has exceptions that, when
   182  thrown, interrupt the flow of execution. The equivalency of the `bad` function
   183  in elvish is as follows:
   184  
   185  ```elvish
   186  fn bad {
   187    fail "bad things have happened" # throw an exception
   188  }
   189  bad # will print a stack trace and stop execution
   190  echo "after bad" # not executed
   191  ```
   192  
   193  (If you run this interactively, you need to enter a literal newline after
   194  `bad` by pressing <span class="key">Alt-Enter</span> to make sure that it is
   195  executed in the same chunk as `echo "after bad"`.)
   196  
   197  And, non-zero exit status from external commands are turned into exceptions:
   198  
   199  ```elvish
   200  false # will print a stack trace and stop execution
   201  echo "after false"
   202  ```
   203  
   204  (If you are running a version of Elvish older than 0.9, you might need to use
   205  `e:false`, as Elvish used to have a builtin version of `false` that does
   206  something different.)
   207  
   208  An alternative way to describe this is that Elvish **does** have exit
   209  statuses, but non-zero exit statuses terminates execution by default.
   210  
   211  If you come from POSIX shell, this is aknin to `set -e` or having a `&&`
   212  between each pair of pipelines. In a sense, `&&` is written as `;` (or more
   213  commonly, a newline) in Elvish.
   214  
   215  Defaulting on stopping execution when bad things happen makes Elvish safer and
   216  code behavior more predictable.
   217  
   218  
   219  ## Predicates and `if`
   220  
   221  The use of exit status is not limited to errors, however. In the Unix toolbox,
   222  quite a few commands exit with 0 to signal "true" and 1 to signal "false".
   223  Notably ones are:
   224  
   225  *   `test` aka `[`: testing file types, comparing numbers and strings;
   226  
   227  *   `grep`: exits with 0 when there are matches, with 1 otherwise;
   228  
   229  *   `diff`: exits with 0 when files are the same, with 1 otherwise;
   230  
   231  *   `true` and `false`, always exit with 0 and 1 respectively.
   232  
   233  The `if` control structure in POSIX shell is designed to work with such
   234  predicate commands: it takes a pipeline, and executes the body if the last
   235  command in the pipeline exits with 0. Examples:
   236  
   237  ```sh
   238  # command 1
   239  if true; then
   240    echo 'always executes'
   241  fi
   242  
   243  # command 2
   244  n=10
   245  if test $n -gt 2; then
   246    echo 'executes when $n > 2'
   247  fi
   248  
   249  # command 3
   250  if diff a.txt b.txt; then
   251    echo 'a.txt and b.txt are the same'
   252  fi
   253  ```
   254  
   255  Since Elvish treats non-zero exit status as a kind of exception, the way that
   256  predicate commands and `if` work in POSIX shell does not work well for Elvish.
   257  Instead, Elvish's `if` is like most non-shell programming languages: it takes
   258  a value, and executes the body if the value is booleanly true. The first
   259  command above is written in Elvish as:
   260  
   261  ```elvish
   262  if $true {
   263    echo 'always executes'
   264  }
   265  ```
   266  
   267  The way to write the second command in Elvish warrants an explanation of how
   268  predicates work in Elvish first. Predicates in Elvish simply write a boolean
   269  output, either `$true` or `$false`:
   270  
   271  ```elvish-transcript
   272  ~> > 10 2
   273  ▶ $true
   274  ~> > 1 2
   275  ▶ $false
   276  ```
   277  
   278  To use predicates in `if`, you simply capture its output with `()`. So the
   279  second command is written in Elvish as:
   280  
   281  ```elvish
   282  n = 10
   283  if (> $n 2) {
   284    echo 'executes when $n > 2'
   285  }
   286  ```
   287  
   288  The parentheses after `if` are different to those in C: In C it is a
   289  syntactical requirement to put them around the condition; in Elvish, it
   290  functions as output capture operator.
   291  
   292  Sometimes it can be useful to have a condition on whether an external commands
   293  exits with 0. In this case, you can use the exception capture operator `?()`:
   294  
   295  ```elvish
   296  if ?(diff a.txt b.txt) {
   297    echo 'a.txt and b.txt are the same'
   298  }
   299  ```
   300  
   301  In Elvish, all exceptions are booleanly false, while the special `$ok` value
   302  is booleanly true. If the `diff` exits with 0, the `?(...)` construct
   303  evaluates to `$ok`, which is booleanly true. Otherwise, it evaluates to an
   304  exception, which is booleanly false. Overall, this leads to a similar
   305  semantics with the POSIX `if` command.
   306  
   307  Note that the following code does have a severe downside: `?()` will prevent
   308  any kind of exceptions from throwing. In this case, we only want to turn one
   309  sort of exception into a boolean: `diff` exits with 1. If `diff` exits with 2,
   310  it usually means that there was a genuine error (e.g. `a.txt` does not exist).
   311  Swallowing this error defeats Elvish's philosophy of erring on the side of
   312  caution; a more sophisticated system of handling exit status is still being
   313  considered.
   314  
   315  
   316  # Phases of Code Execution
   317  
   318  A piece of code that gets evaluated as a whole is called a **chunk** (a
   319  loanword from Lua). If you run `elvish some-script.elv` from the command line,
   320  the entire script is one chunk; in interactive mode, each time you hit Enter,
   321  the code you have written is one chunk.
   322  
   323  Elvish interprets a code chunk in 3 phases: it first **parse**s the code into
   324  a syntax tree, then **compile**s the syntax tree code to an internal
   325  representation, and finally **evaluate**s the just-generated internal
   326  representation.
   327  
   328  If any error happens during the first two phases, Elvish rejects the chunk
   329  without executing any of it. For instance, in Elvish unclosed parenthesis is a
   330  syntax error (error in the first parsing phase). The following code, when
   331  executed as a chunk, does nothing other than printing the parse error:
   332  
   333  ```elvish-bad
   334  echo before
   335  echo (
   336  ```
   337  
   338  The same code, interpreted as bash, also contains a syntax error. However, if
   339  you save this file to `bad.bash` and run `bash bad.bash`, bash will execute the
   340  first line before complaining about the syntax error on the second line.
   341  
   342  Likewise, in Elvish using an unassigned variable is a compilation error, so the
   343  following code does nothing either:
   344  
   345  ```elvish
   346  # assuming $nonexistent was not assigned
   347  echo before
   348  echo $nonexistent
   349  ```
   350  
   351  There seems to be no equivalency of compilation errors in other shells, but
   352  this extra compilation phase makes the language safer. In future, optional
   353  type checking may be introduced, which will fit into the compilation phase.
   354  
   355  
   356  # Assignment Semantics
   357  
   358  In Python, JavaScript and many other languages, if you assign a container
   359  (e.g. a map) to multiple variables, modifications via those variables mutate
   360  the same container. This is best illustrated with an example:
   361  
   362  ```python
   363  m = {'foo': 'bar', 'lorem': 'ipsum'}
   364  m2 = m
   365  m2['foo'] = 'quux'
   366  print(m['foo']) # prints "quux"
   367  ```
   368  
   369  This is because in such languages, variables do not hold the "actual" map, but
   370  a reference to it. After the assignment `m2 = m`, both variables refer to the
   371  same map. The subsequent element assignment `m2['foo'] = 'quux'` mutates the
   372  underlying map, so `m['foo']`  is also changed.
   373  
   374  This is not the case for Elvish:
   375  
   376  ```elvish-transcript
   377  ~> m = [&foo=bar &lorem=ipsum]
   378  ~> m2 = $m
   379  ~> m2[foo] = quux
   380  ~> put $m[foo]
   381  ▶ bar
   382  ```
   383  
   384  It seems that when you assign `m2 = $m`, the entire map is copied from `$m`
   385  into `$m2`, so any subsequent changes to `$m2` does not affect the original
   386  map in `$m`. You can entirely think of it this way: thinking
   387  **assignment as copying** correctly models the behavior of Elvish.
   388  
   389  But wouldn't it be expensive to copy an entire list or map every time
   390  assignment happens? No, the "copying" is actually very cheap. Is it
   391  implemented as [copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write) --
   392  i.e. the copying is delayed until `$m2` gets modified? No, subsequent
   393  modifications to the new `$m2` is also very cheap. Read on if you are
   394  interested in how it is possible.
   395  
   396  ## Implementation Detail: Persistent Data Structures
   397  
   398  Like in Python and JavaScript, Elvish variables like `$m` and `$m2` also only
   399  hold a reference to the underlying map. However, that map is **immutable**,
   400  meaning that they never change after creation. That explains why `$m` did not
   401  change: because the map `$m` refers to never changes. But how is it possible
   402  to do `m2[foo] = quux` if the map is immutable?
   403  
   404  The map implementation of Elvish has another property: although the map is
   405  immutable, it is easy to create a slight variation of one map. Given a map, it
   406  is easy to create another map that is almost the same, either 1) with one more
   407  key/value pair, or 2) with the value for one key changed, or 3) with one fewer
   408  key/value pair. This operation is fast, even if the original map is very
   409  large.
   410  
   411  This low-level functionality is exposed by the `assoc` (associate) and
   412  `dissoc` (dissociate) builtins:
   413  
   414  ```elvish-transcript
   415  ~> assoc [&] foo quux # "add" one pair
   416  ▶ [&foo=quux]
   417  ~> assoc [&foo=bar &lorem=ipsum] foo quux # "modify" one pair
   418  ▶ [&lorem=ipsum &foo=quux]
   419  ~> dissoc [&foo=bar &lorem=ipsum] foo # "remove" one pair
   420  ▶ [&lorem=ipsum]
   421  ```
   422  
   423  Now, although maps are immutable, variables are mutable. So when you try to
   424  assign an element of `$m2`, Elvish turns that into an assignment of `$m2`
   425  itself:
   426  
   427  ```elvish
   428  m2[foo] = quux
   429  # is just syntax sugar for:
   430  m2 = (assoc $m2 foo quux)
   431  ```
   432  
   433  The sort of immutable data structures that support cheap creation of "slight
   434  variations" are called [persistent data
   435  structures](https://en.wikipedia.org/wiki/Persistent_data_structure) and is
   436  used in functional programming languages. However, the way Elvish turns
   437  assignment to `$m2[foo]` into an assignment to `$m2` itself seems to be a new
   438  approach.