github.com/xiaq/elvish@v0.12.0/website/src/learn/unique-semantics.md (about) 1 <!-- toc --> 2 3 <div class="clear"></div> 4 5 The semantics of Elvish is unique in many aspects when compared to other 6 shells. This can be surprising if you are used to other shells, and it is a 7 result of the [design choice](/ref/philosophy.html) of making Elvish a 8 full-fledged programming language. 9 10 # Structureful IO 11 12 Elvish offers the ability to build elaborate data structures, "return" them 13 from functions, and pass them through pipelines. 14 15 ## Motivation 16 17 Traditional shells use strings for all kinds of data. They can be stored in 18 variables, used as function arguments, written to output and read from input. 19 Strings are very simple to use, but they fall short if your data has an 20 inherent structure. A common solution is using pseudo-structures of "each line 21 representing a record, and each (whitespace-separated) field represents a 22 property", which is fine as long as your data do not contain whitespaces. If 23 they do, you will quickly run into problems with escaping and quotation and 24 find yourself doing black magics with strings. 25 26 Some shells provide data structures like lists and maps, but they are usually 27 not first-class values. You can store them in variables, but you might not to 28 be able to nest them, pass them to functions, or returned them from functions. 29 30 ## Data Structures and "Returning" Them 31 32 Elvish offers first-class support for data structures such as lists and maps. 33 Here is an example that uses a list: 34 35 ```elvish-transcript 36 ~> li = [foo bar 'lorem ipsum'] 37 ~> kind-of $li # "kind" is like type 38 ▶ list 39 ~> count $li # count the number of elements in a list 40 ▶ 3 41 ``` 42 43 (See the [language reference](/ref/language.html) for a more complete description 44 of the builtin data structures.) 45 46 As you can see, you can store lists in variables and use them as command 47 arguments. But they would be much less useful if you cannot **return** them 48 from a function. A naive way to do this is by `echo`ing the list and use output 49 capture to recover it: 50 51 ```elvish-transcript 52 ~> fn f { 53 echo [foo bar 'lorem ipsum'] 54 } 55 ~> li = (f) # (...) is output capture, like $(...) in other shells 56 ~> kind-of $li 57 ▶ string 58 ~> count $li # count the number of bytes, since $li is now a string 59 ▶ 23 60 ``` 61 62 As we have seen, our attempt to output a list has turned it into a string. 63 This is because the `echo` command in Elvish, like in other shells, is 64 string-oriented. To echo a list, it has to be first converted to a string. 65 66 Elvish provides a `put` command to output structured values as they are: 67 68 ```elvish-transcript 69 ~> fn f { 70 put [foo bar 'lorem ipsum'] 71 } 72 ~> li = (f) 73 ~> kind-of $li 74 ▶ list 75 ~> count $li 76 ▶ 3 77 ``` 78 79 So how does `put` work differently from `echo` under the hood? 80 81 In Elvish, the standard output is made up of two parts: one traditional 82 byte-oriented **file**, and one internal **value-oriented channel**. The 83 `echo` command writes to the file, so it has to serialize its arguments into 84 strings; the `put` command writes to the value-oriented channel, preserving 85 all the internal structures of the values. 86 87 If you invoke `put` directly from the command prompt, the values it output have 88 a leading `▶`: 89 90 ```elvish-transcript 91 ~> put [foo bar] 92 ▶ [foo bar] 93 ``` 94 95 The leading arrow is a way to visualize that a command has written something 96 onto the value channel, and not part of the value itself. 97 98 In retrospect, you may discover that the `kind-of` and `count` builtin commands 99 also write their output to the value channel. 100 101 102 ## Passing Data Structures Through the Pipeline 103 104 When I said that standard output in Elvish comprises two parts, I was not 105 telling the full story: pipelines in Elvish also have these two parts, in a 106 very similar way. Data structures can flow in the value-oriented part of the 107 pipeline as well. For instance, the `each` command takes **input** from the 108 value-oriented channel, and apply a function to each value: 109 110 ```elvish-transcript 111 ~> put lorem ipsum | each [x]{ echo "Got "$x } 112 Got lorem 113 Got ipsum 114 ``` 115 116 There are many builtin commands that inputs or outputs values. As another 117 example, the `take` commands retains a fixed number of items: 118 119 ```elvish-transcript 120 ~> put [lorem ipsum] "foo\nbar" [&key=value] | take 2 121 ▶ [lorem ipsum] 122 ▶ "foo\nbar" 123 ``` 124 125 ## Interoperability with External Commands 126 127 Unfortunately, the ability of passing structured values is not available to 128 external commands. However, Elvish comes with a pair of commands for JSON 129 serialization/deserialization. The following snippet illustrates how 130 to interoperate with a Python script: 131 132 ```elvish-transcript 133 ~> cat sort-list.py 134 import json, sys 135 li = json.load(sys.stdin) 136 li.sort() 137 json.dump(li, sys.stdout) 138 ~> put [lorem ipsum foo bar] | to-json | python sort-list.py | from-json 139 ▶ [bar foo ipsum lorem] 140 ``` 141 142 It is easy to write a wrapper for such external commands: 143 144 ```elvish-transcript 145 ~> fn sort-list { to-json | python sort-list.py | from-json } 146 ~> put [lorem ipsum foo bar] | sort-list 147 ▶ [bar foo ipsum lorem] 148 ``` 149 150 More serialization/deserialization commands may be added to the language in 151 the future. 152 153 154 # Exit Status and Exceptions 155 156 Unix commands exit with a non-zero value to signal errors. This is available 157 traditionally as a `$?` variable in other shells: 158 159 ```bash 160 true 161 echo $? # prints "0" 162 false 163 echo $? # prints "1" 164 ``` 165 166 Builtin commands and user-defined functions also do this to signal errors, 167 although they are not Unix commands: 168 169 ```bash 170 bad() { 171 return 2 172 } 173 bad 174 echo $? # prints "2" 175 ``` 176 177 This model is fine, only if most errors are non-fatal (so that errors from a 178 previous command normally do not affect the execution of subsequence ones) and 179 the script author remembers to check `$?` for the rare fatal errors. 180 181 Elvish has no concept of exit status. Instead, it has exceptions that, when 182 thrown, interrupt the flow of execution. The equivalency of the `bad` function 183 in elvish is as follows: 184 185 ```elvish 186 fn bad { 187 fail "bad things have happened" # throw an exception 188 } 189 bad # will print a stack trace and stop execution 190 echo "after bad" # not executed 191 ``` 192 193 (If you run this interactively, you need to enter a literal newline after 194 `bad` by pressing <span class="key">Alt-Enter</span> to make sure that it is 195 executed in the same chunk as `echo "after bad"`.) 196 197 And, non-zero exit status from external commands are turned into exceptions: 198 199 ```elvish 200 false # will print a stack trace and stop execution 201 echo "after false" 202 ``` 203 204 (If you are running a version of Elvish older than 0.9, you might need to use 205 `e:false`, as Elvish used to have a builtin version of `false` that does 206 something different.) 207 208 An alternative way to describe this is that Elvish **does** have exit 209 statuses, but non-zero exit statuses terminates execution by default. 210 211 If you come from POSIX shell, this is aknin to `set -e` or having a `&&` 212 between each pair of pipelines. In a sense, `&&` is written as `;` (or more 213 commonly, a newline) in Elvish. 214 215 Defaulting on stopping execution when bad things happen makes Elvish safer and 216 code behavior more predictable. 217 218 219 ## Predicates and `if` 220 221 The use of exit status is not limited to errors, however. In the Unix toolbox, 222 quite a few commands exit with 0 to signal "true" and 1 to signal "false". 223 Notably ones are: 224 225 * `test` aka `[`: testing file types, comparing numbers and strings; 226 227 * `grep`: exits with 0 when there are matches, with 1 otherwise; 228 229 * `diff`: exits with 0 when files are the same, with 1 otherwise; 230 231 * `true` and `false`, always exit with 0 and 1 respectively. 232 233 The `if` control structure in POSIX shell is designed to work with such 234 predicate commands: it takes a pipeline, and executes the body if the last 235 command in the pipeline exits with 0. Examples: 236 237 ```sh 238 # command 1 239 if true; then 240 echo 'always executes' 241 fi 242 243 # command 2 244 n=10 245 if test $n -gt 2; then 246 echo 'executes when $n > 2' 247 fi 248 249 # command 3 250 if diff a.txt b.txt; then 251 echo 'a.txt and b.txt are the same' 252 fi 253 ``` 254 255 Since Elvish treats non-zero exit status as a kind of exception, the way that 256 predicate commands and `if` work in POSIX shell does not work well for Elvish. 257 Instead, Elvish's `if` is like most non-shell programming languages: it takes 258 a value, and executes the body if the value is booleanly true. The first 259 command above is written in Elvish as: 260 261 ```elvish 262 if $true { 263 echo 'always executes' 264 } 265 ``` 266 267 The way to write the second command in Elvish warrants an explanation of how 268 predicates work in Elvish first. Predicates in Elvish simply write a boolean 269 output, either `$true` or `$false`: 270 271 ```elvish-transcript 272 ~> > 10 2 273 ▶ $true 274 ~> > 1 2 275 ▶ $false 276 ``` 277 278 To use predicates in `if`, you simply capture its output with `()`. So the 279 second command is written in Elvish as: 280 281 ```elvish 282 n = 10 283 if (> $n 2) { 284 echo 'executes when $n > 2' 285 } 286 ``` 287 288 The parentheses after `if` are different to those in C: In C it is a 289 syntactical requirement to put them around the condition; in Elvish, it 290 functions as output capture operator. 291 292 Sometimes it can be useful to have a condition on whether an external commands 293 exits with 0. In this case, you can use the exception capture operator `?()`: 294 295 ```elvish 296 if ?(diff a.txt b.txt) { 297 echo 'a.txt and b.txt are the same' 298 } 299 ``` 300 301 In Elvish, all exceptions are booleanly false, while the special `$ok` value 302 is booleanly true. If the `diff` exits with 0, the `?(...)` construct 303 evaluates to `$ok`, which is booleanly true. Otherwise, it evaluates to an 304 exception, which is booleanly false. Overall, this leads to a similar 305 semantics with the POSIX `if` command. 306 307 Note that the following code does have a severe downside: `?()` will prevent 308 any kind of exceptions from throwing. In this case, we only want to turn one 309 sort of exception into a boolean: `diff` exits with 1. If `diff` exits with 2, 310 it usually means that there was a genuine error (e.g. `a.txt` does not exist). 311 Swallowing this error defeats Elvish's philosophy of erring on the side of 312 caution; a more sophisticated system of handling exit status is still being 313 considered. 314 315 316 # Phases of Code Execution 317 318 A piece of code that gets evaluated as a whole is called a **chunk** (a 319 loanword from Lua). If you run `elvish some-script.elv` from the command line, 320 the entire script is one chunk; in interactive mode, each time you hit Enter, 321 the code you have written is one chunk. 322 323 Elvish interprets a code chunk in 3 phases: it first **parse**s the code into 324 a syntax tree, then **compile**s the syntax tree code to an internal 325 representation, and finally **evaluate**s the just-generated internal 326 representation. 327 328 If any error happens during the first two phases, Elvish rejects the chunk 329 without executing any of it. For instance, in Elvish unclosed parenthesis is a 330 syntax error (error in the first parsing phase). The following code, when 331 executed as a chunk, does nothing other than printing the parse error: 332 333 ```elvish-bad 334 echo before 335 echo ( 336 ``` 337 338 The same code, interpreted as bash, also contains a syntax error. However, if 339 you save this file to `bad.bash` and run `bash bad.bash`, bash will execute the 340 first line before complaining about the syntax error on the second line. 341 342 Likewise, in Elvish using an unassigned variable is a compilation error, so the 343 following code does nothing either: 344 345 ```elvish 346 # assuming $nonexistent was not assigned 347 echo before 348 echo $nonexistent 349 ``` 350 351 There seems to be no equivalency of compilation errors in other shells, but 352 this extra compilation phase makes the language safer. In future, optional 353 type checking may be introduced, which will fit into the compilation phase. 354 355 356 # Assignment Semantics 357 358 In Python, JavaScript and many other languages, if you assign a container 359 (e.g. a map) to multiple variables, modifications via those variables mutate 360 the same container. This is best illustrated with an example: 361 362 ```python 363 m = {'foo': 'bar', 'lorem': 'ipsum'} 364 m2 = m 365 m2['foo'] = 'quux' 366 print(m['foo']) # prints "quux" 367 ``` 368 369 This is because in such languages, variables do not hold the "actual" map, but 370 a reference to it. After the assignment `m2 = m`, both variables refer to the 371 same map. The subsequent element assignment `m2['foo'] = 'quux'` mutates the 372 underlying map, so `m['foo']` is also changed. 373 374 This is not the case for Elvish: 375 376 ```elvish-transcript 377 ~> m = [&foo=bar &lorem=ipsum] 378 ~> m2 = $m 379 ~> m2[foo] = quux 380 ~> put $m[foo] 381 ▶ bar 382 ``` 383 384 It seems that when you assign `m2 = $m`, the entire map is copied from `$m` 385 into `$m2`, so any subsequent changes to `$m2` does not affect the original 386 map in `$m`. You can entirely think of it this way: thinking 387 **assignment as copying** correctly models the behavior of Elvish. 388 389 But wouldn't it be expensive to copy an entire list or map every time 390 assignment happens? No, the "copying" is actually very cheap. Is it 391 implemented as [copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write) -- 392 i.e. the copying is delayed until `$m2` gets modified? No, subsequent 393 modifications to the new `$m2` is also very cheap. Read on if you are 394 interested in how it is possible. 395 396 ## Implementation Detail: Persistent Data Structures 397 398 Like in Python and JavaScript, Elvish variables like `$m` and `$m2` also only 399 hold a reference to the underlying map. However, that map is **immutable**, 400 meaning that they never change after creation. That explains why `$m` did not 401 change: because the map `$m` refers to never changes. But how is it possible 402 to do `m2[foo] = quux` if the map is immutable? 403 404 The map implementation of Elvish has another property: although the map is 405 immutable, it is easy to create a slight variation of one map. Given a map, it 406 is easy to create another map that is almost the same, either 1) with one more 407 key/value pair, or 2) with the value for one key changed, or 3) with one fewer 408 key/value pair. This operation is fast, even if the original map is very 409 large. 410 411 This low-level functionality is exposed by the `assoc` (associate) and 412 `dissoc` (dissociate) builtins: 413 414 ```elvish-transcript 415 ~> assoc [&] foo quux # "add" one pair 416 ▶ [&foo=quux] 417 ~> assoc [&foo=bar &lorem=ipsum] foo quux # "modify" one pair 418 ▶ [&lorem=ipsum &foo=quux] 419 ~> dissoc [&foo=bar &lorem=ipsum] foo # "remove" one pair 420 ▶ [&lorem=ipsum] 421 ``` 422 423 Now, although maps are immutable, variables are mutable. So when you try to 424 assign an element of `$m2`, Elvish turns that into an assignment of `$m2` 425 itself: 426 427 ```elvish 428 m2[foo] = quux 429 # is just syntax sugar for: 430 m2 = (assoc $m2 foo quux) 431 ``` 432 433 The sort of immutable data structures that support cheap creation of "slight 434 variations" are called [persistent data 435 structures](https://en.wikipedia.org/wiki/Persistent_data_structure) and is 436 used in functional programming languages. However, the way Elvish turns 437 assignment to `$m2[foo]` into an assignment to `$m2` itself seems to be a new 438 approach.