<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character `_` (U+005F) and the dollar sign `$` (U+0024) are
considered letters.

```
letter        = unicode_letter | "_" | "$" .
decimal_digit = "0" … "9" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments
Comments serve as program documentation.
CUE supports line comments that start with the character sequence `//`
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.
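
As an illustration, the following CUE fragment (field names are hypothetical) shows a comment on its own line, a trailing comment, and a `//` sequence inside a string literal, where it does not start a comment.

```
// A comment occupying a whole line.
replicas: 3 // A trailing comment; it stops at the end of the line.

// The // below is part of the string value, not a comment.
url: "https://example.com"
```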


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas "," as terminators in a number of productions.
CUE programs may omit most of these commas using the following two rules:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, or `?`
- an ellipsis `...`


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
these rules.
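
The fragment below sketches how these rules play out; the field names are hypothetical. Fields on separate lines need no commas, fields sharing a line are separated explicitly, and list elements are always separated by commas.

```
// A comma is inserted automatically after 1 and after "two".
a: 1
b: "two"

// Fields sharing one line need explicit commas.
c: {x: 1, y: 2}

// List elements are separated by explicit commas, even across lines.
l: [
    1,
    2,
    3,
]
```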


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters (which includes `_` and `$`)
and digits, optionally preceded by `#` or `_#`.
It may not be `_` or `$`.
The first character in an identifier, or after a `#` if it contains one,
must be a letter.
Identifiers starting with a `#` or `_` are reserved for definitions and hidden
fields.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier  = [ "#" | "_#" ] letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```
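
For illustration only (all field names here are hypothetical), the following fragment uses the identifier prefixes described above: `#` for a definition, `_` for a hidden field, and `_#` for a hidden definition. The identifier `$ref` is permitted because `$` counts as a letter.

```
// A definition: the identifier starts with #.
#Server: {
    host: string
    port: int
}

// A hidden field: the identifier starts with _.
_secret: "hidden"

// A hidden definition: the identifier starts with _#.
_#Defaults: {
    port: 8080
}

// $ counts as a letter, so $ref is a valid identifier.
$ref: "allowed"
```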

<!-- TODO: Allow Unicode identifiers TR 32 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of predeclared identifiers.

All keywords may be used as labels (field names).
Unless noted otherwise, they can also be used as identifiers to refer to
the same name.


#### Values

The following keywords are values.

```
null         true         false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.
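
A minimal sketch of this restriction, with hypothetical field names. The label is written as a quoted string so the example does not depend on how a particular implementation parses unquoted keyword labels.

```
// A field whose label is the string "null".
"null": "a field, not the null value"

// The keyword null always denotes the null value;
// it never refers to the field above.
a: null

// An ordinary identifier, by contrast, refers to the field of that name.
name: "web"
b:    name // "web"
```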


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package      import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for          in           if           let
```

<!--
TODO:
    reduce [to]
    order [by]
-->


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     &&    ==    <     =     (     )
-     ||    !=    >     :     {     }
*     &     =~    <=    ?     [     ]     ,
/     |     !~    >=    !     _|_   ...   .
```
<!--
Free tokens:  ; ~ ^
// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow then at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
 -->


### Numeric literals

There are several kinds of numeric literals.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals  multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

An _integer literal_ is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: `0o` for octal,
`0x` or `0X` for hexadecimal, and `0b` for binary.
In hexadecimal literals, letters `a`-`f` and `A`-`F` represent values 10 through 15.
All integers allow interstitial underscores `_`;
these have no meaning and are solely for readability.

Integer literals may have an SI or IEC multiplier:
`K`, `M`, `G`, `T`, and `P` denote powers of 1000, and the same letters
followed by `i` (`Ki`, `Mi`, and so on) denote powers of 1024.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
42
1.5G    // 1_500_000_000
1.3Ki   // 1.3 * 1024 = trunc(1331.2) = 1331
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```
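
The sketch below contrasts SI and IEC multipliers and shows truncation of a fractional result; the field names are hypothetical, and each comment gives the value the literal denotes under the rules above.

```
a: 1K     // 1000 (SI: a power of 1000)
b: 1Ki    // 1024 (IEC: a power of 1024)
c: 2.5M   // 2_500_000
d: 0.5Mi  // 0.5 * 1_048_576 = 524_288
e: 1.3Ki  // 1.3 * 1024 = 1331.2, truncated to 1331
```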

A _decimal floating-point literal_ is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```

<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

Neither a `float_lit` nor an `si_lit` may appear after a token that is:

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, `?`, or `.`.

<!--
So
`a + 3.2Ti`  -> `a`, `+`, `3.2Ti`
`a 3.2Ti`    -> `a`, `3`, `.`, `2`, `Ti`
`a + .5e3`   -> `a`, `+`, `.5e3`
`a .5e3`     -> `a`, `.`, `5`, `e3`.
-->


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#` by padding the start and end of a string or byte sequence
literal with this number of hash symbols.

There are four ways to represent an integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits; and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points, so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting value; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
hex_byte_value   = `\` { `#` } "x" hex_digit hex_digit .
little_u_value   = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = "\" { `#` } "(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
multiline_string_lit = `"""` newline
                             { unicode_value | interpolation | newline }
                             newline `"""` .
multiline_bytes_lit  = "'''" newline
                             { unicode_value | interpolation | byte_value | newline }
                             newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
'\xff\u00FF'
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```
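
To make the role of the escape delimiter concrete, the following sketch (with a hypothetical `name` field) annotates each literal with the value it evaluates to.

```
name: "world"

greeting: "Hello, \(name)!"   // "Hello, world!"

// With one # of padding, the escape delimiter becomes \#:
// \( is then ordinary text and \#( starts an interpolation.
raw:    #"Hello, \(name)!"#   // the text Hello, \(name)! verbatim
interp: #"Hello, \#(name)!"#  // "Hello, world!"
```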

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The newline preceding the closing quote is discarded from the string contents.
The whitespace before a closing triple quote must appear at the start of every
non-empty line after the opening quote and is removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include one, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```
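
As a further sketch (the field name is hypothetical), the multiline string below keeps indentation beyond the closing quote's prefix and includes a literal triple quote by escaping its first quote character.

```
// doc evaluates to these three lines, with the four-space
// prefix shared with the closing """ removed:
//   First line.
//     Indented two extra spaces.
//   An escaped quote: """ does not end the string.
doc: """
    First line.
      Indented two extra spaces.
    An escaped quote: \""" does not end the string.
    """
```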

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound
and a single least upper bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is, if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```
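
The same ordering can be observed directly in CUE. In this sketch (field names are hypothetical), declaring a field twice or using the `&` operator described under Unification combines a basic type with one of its instances to yield the instance, while combining two unrelated values yields bottom.

```
// A basic type combined with one of its instances yields the instance.
port: int
port: 8080             // port evaluates to 8080, since 8080 ⊑ int

kind: string & "web"   // "web"

// Top combined with any value yields that value.
flag: _ & true         // true

// No value is an instance of both int and string.
bad: int & "web"       // _|_ (evaluation reports an error)
```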


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`,
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.
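
A short sketch of these properties, using hypothetical field names: unifying structs merges their fields, swapping the operands does not change the result, and unifying two distinct atoms yields bottom.

```
a: {x: 1} & {y: 2}   // {x: 1, y: 2}
b: {y: 2} & {x: 1}   // {x: 1, y: 2}: the order of operands is irrelevant
c: "tcp" & "tcp"     // "tcp": a value unified with itself
d: "tcp" & "udp"     // _|_: two distinct atoms have no common instance
```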



<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunction is sometimes also referred to as a sum type.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.
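
The following fragment sketches distribution and normalization; the definition `#Protocol` and the other field names are hypothetical. The normalized forms in the comments follow from the definition above; how an implementation prints an unresolved disjunction may differ.

```
#Protocol: "tcp" | "udp"

p: #Protocol & "udp"    // "udp": ("tcp" & "udp") | ("udp" & "udp")
q: #Protocol & "http"   // _|_: every disjunct unifies to bottom

// Normalized forms:
r: "tcp" | "tcp"        // normalizes to "tcp"
s: int | 1              // normalizes to int, since 1 ⊑ int
```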

<!--
Normalization is important, as we need to account for spurious elements
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

  ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

  x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
  y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

  x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
  y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any value `v` _may_ be associated with a default value `d`,
where `d` must be an instance of `v` (`d ⊑ v`).

Default values are introduced by means of disjunctions.
Any element of a disjunction can be _marked_ as a default
by prefixing it with an asterisk `*` ([a unary expression](#operators)).
Syntactically consecutive disjunctions are considered to be
part of a single disjunction,
whereby multiple disjuncts can be marked as default.
A _marked disjunction_ is one where any of its terms are marked.
So `a | b | *c | d` is a single marked disjunction of four terms,
whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
one of which is a marked disjunction of three terms.
During unification, if all the marked disjuncts of a marked disjunction are
eliminated, then the remaining unmarked disjuncts are considered as if they
originated from an unmarked disjunction.
<!-- TODO: this formulation should be worked out more.  -->
As explained below, distinguishing the nesting of disjunctions like this
is only relevant when both an outer and a nested disjunction are marked.

Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.
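
The intuition can be sketched with a few hypothetical fields; the comments give the values they resolve to when a concrete value is required.

```
// Resolves to the default, "tcp", when a concrete value is needed.
protocol: *"tcp" | "udp"

// Unification eliminates the default, so the remaining disjunct is used.
other: (*"tcp" | "udp") & "udp"   // "udp"

// + is neither unification nor disjunction, so its operand
// is resolved to its default before the addition: 1 + 10.
n: (*1 | 2) + 10                  // 11
```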
   725  
   726  To define the unification and disjunction operation we use the notation
   727  `⟨v⟩` to denote a CUE value `v` that is not associated with a default
   728  and the notation `⟨v, d⟩` to denote a value `v` associated with a default
   729  value `d`.
   730  
   731  The rewrite rules for unifying such values are as follows:
   732  ```
   733  U0: ⟨v1⟩ & ⟨v2⟩         => ⟨v1&v2⟩
   734  U1: ⟨v1, d1⟩ & ⟨v2⟩     => ⟨v1&v2, d1&v2⟩
   735  U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
   736  ```
   737  
   738  The rewrite rules for disjoining terms of unmarked disjunctions are
   739  ```
   740  D0: ⟨v1⟩ | ⟨v2⟩         => ⟨v1|v2⟩
   741  D1: ⟨v1, d1⟩ | ⟨v2⟩     => ⟨v1|v2, d1⟩
   742  D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
   743  ```
   744  
   745  Terms of marked disjunctions are first rewritten according to the following
   746  rules:
   747  ```
   748  M0:  ⟨v⟩    => ⟨v⟩        don't introduce defaults for unmarked term
   749  M1: *⟨v⟩    => ⟨v, v⟩     introduce identical default for marked term
   750  M2: *⟨v, d⟩ => ⟨v, d⟩     keep existing defaults for marked term
   751  M3:  ⟨v, d⟩ => ⟨v⟩        strip existing defaults from unmarked term
   752  ```
   753  
   754  Note that for any marked disjunction `a`,
   755  the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.
   756  
   757  ```
   758  Expression               Value-default pair     Rules applied
   759  *"tcp" | "udp"           ⟨"tcp"|"udp", "tcp"⟩    M1, D1
   760  string | *"foo"          ⟨string, "foo"⟩         M1, D1
   761  
   762  *1 | 2 | 3               ⟨1|2|3, 1⟩              M1, D1
   763  
   764  (*1|2|3) | (1|*2|3)      ⟨1|2|3, 1|2⟩            M1, D1, D2
   765  (*1|2|3) | *(1|*2|3)     ⟨1|2|3, 2⟩              M1, M2, M3, D1, D2
   766  (*1|2|3) | (1|*2|3)&2    ⟨1|2|3, 1|2⟩            M1, D1, U1, D2
   767  
   768  (*1|2) & (1|*2)          ⟨1|2, _|_⟩              M1, D1, U2
   769  ```
   770  
   771  The rules of subsumption for defaults can be derived from the above definitions
   772  and are as follows.
   773  
   774  ```
   775  ⟨v2, d2⟩ ⊑ ⟨v1, d1⟩  if v2 ⊑ v1 and d2 ⊑ d1
   776  ⟨v1, d1⟩ ⊑ ⟨v⟩       if v1 ⊑ v
   777  ⟨v⟩      ⊑ ⟨v1, d1⟩  if v ⊑ d1
   778  ```
   779  
   780  <!--
   781  For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.
   782  
   783  The last one is so restrictive as v could still be made more specific by
   784  associating it with a default that is not subsumed by d1.
   785  
   786  Proof:
   787    by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
   788    where the most general value is (v, v).
   789    Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
   790    from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
   791    exactly defines the boundary of this subsumption.
   792  -->
   793  
   794  <!--
   795  (non-normalized entries could also be implicitly marked, allowing writing
   796  int | 1, instead of int | *1, but that can be done in a backwards
   797  compatible way later if really desirable, as long as we require that
   798  disjunction literals be normalized).
   799  -->
   800  
   801  ```
   802  Expression                       Resolves to
   803  "tcp" | "udp"                    "tcp" | "udp"
   804  *"tcp" | "udp"                   "tcp"
   805  float | *1                       1
   806  *string | 1.0                    string
   807  (*1|2) + (2|*3)                  4
   808  
   809  (*1|2|3) | (1|*2|3)              1|2
   810  (*1|2|3) & (1|*2|3)              1|2|3 // default is _|_
   811  
   812  (* >=5 | int) & (* <=5 | int)    5
   813  
   814  (*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
   815  (*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
   816  (*"tcp"|"udp") & "tcp"           "tcp"
   817  (*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_
   818  
   819  (*true | false) & bool           true
   820  (*true | false) & (true | false) true
   821  
{a: 1} | {b: 1}                  {a: 1} | {b: 1}
{a: 1} | *{b: 1}                 {b: 1}
*{a: 1} | *{b: 1}                {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a: 1}       {a: 1} | {a: 1, b: 1}
({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b: 1}
   827  ```
   828  
   829  
   830  ### Bottom and errors
   831  
Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.
   836  
   837  Implementations may associate error strings with different instances of bottom;
   838  logically they all remain the same value.
   839  
   840  ```
   841  bottom_lit = "_|_" .
   842  ```
   843  
   844  
   845  ### Top
   846  
   847  Top is represented by the underscore character `_`, lexically an identifier.
   848  Unifying any value `v` with top results `v` itself.
   849  
   850  ```
   851  Expr        Result
   852  _ &  5        5
   853  _ &  _        _
   854  _ & _|_      _|_
   855  _ | _|_       _
   856  ```
   857  
   858  
   859  ### Null
   860  
   861  The _null value_ is represented with the keyword `null`.
   862  It has only one parent, top, and one child, bottom.
   863  It is unordered with respect to any other value.
   864  
   865  ```
   866  null_lit   = "null" .
   867  ```
   868  
   869  ```
   870  null & 8     _|_
   871  null & _     null
   872  null & _|_   _|_
   873  ```
   874  
   875  
   876  ### Boolean values
   877  
   878  A _boolean type_ represents the set of Boolean truth values denoted by
   879  the keywords `true` and `false`.
   880  The predeclared boolean type is `bool`; it is a defined type and a separate
   881  element in the lattice.
   882  
   883  ```
   884  bool_lit = "true" | "false" .
   885  ```
   886  
   887  ```
   888  bool & true          true
   889  true & true          true
   890  true & false         _|_
   891  bool & (false|true)  false | true
   892  bool & (true|false)  true | false
   893  ```
   894  
   895  
   896  ### Numeric values
   897  
   898  The _integer type_ represents the set of all integral numbers.
   899  The _decimal floating-point type_ represents the set of all decimal floating-point
   900  numbers.
   901  They are two distinct types.
   902  Both are instances instances of a generic `number` type.
   903  
   904  <!--
   905                      number
   906                     /      \
   907                  int      float
   908  -->
   909  
   910  The predeclared number, integer, decimal floating-point types are
   911  `number`, `int` and `float`; they are defined types.
   912  <!--
   913  TODO: should we drop float? It is somewhat preciser and probably a good idea
   914  to have it in the programmatic API, but it may be confusing to have to deal
   915  with it in the language.
   916  -->
   917  
   918  A decimal floating-point literal always has type `float`;
   919  it is not an instance of `int` even if it is an integral number.
   920  
   921  Integer literals are always of type `int` and don't match type `float`.
   922  
   923  Numeric literals are exact values of arbitrary precision.
   924  If the operation permits it, numbers should be kept in arbitrary precision.
   925  
   926  Implementation restriction: although numeric values have arbitrary precision
   927  in the language, implementations may implement them using an internal
   928  representation with limited precision.
   929  That said, every implementation must:
   930  
   931  - Represent integer values with at least 256 bits.
   932  - Represent floating-point values, with a mantissa of at least 256 bits and
   933  a signed binary exponent of at least 16 bits.
   934  - Give an error if unable to represent an integer value precisely.
   935  - Give an error if unable to represent a floating-point value due to overflow.
   936  - Round to the nearest representable value if unable to represent
   937  a floating-point value due to limits on precision.
   938  These requirements apply to the result of any expression except for builtin
   939  functions for which an unusual loss of precision must be explicitly documented.
   940  
   941  
   942  ### Strings
   943  
   944  The _string type_ represents the set of UTF-8 strings,
   945  not allowing surrogates.
   946  The predeclared string type is `string`; it is a defined type.
   947  
   948  The length of a string `s` (its size in bytes) can be discovered using
   949  the built-in function `len`.
   950  
   951  
   952  ### Bytes
   953  
   954  The _bytes type_ represents the set of byte sequences.
   955  A byte sequence value is a (possibly empty) sequence of bytes.
   956  The number of bytes is called the length of the byte sequence
   957  and is never negative.
   958  The predeclared byte sequence type is `bytes`; it is a defined type.
   959  
   960  
   961  ### Bounds
   962  
   963  A _bound_, syntactically a [unary expression](#operands), defines
   964  an infinite disjunction of concrete values than can be represented
   965  as a single comparison.
   966  
   967  For any [comparison operator](#comparison-operators) `op` except `==`,
   968  `op a` is the disjunction of every `x` such that `x op a`.
   969  
   970  ```
   971  2 & >=2 & <=5           // 2, where 2 is either an int or float.
   972  2.5 & >=1 & <=5         // 2.5
   973  2 & >=1.0 & <3.0        // 2.0
   974  2 & >1 & <3.0           // 2.0
   975  2.5 & int & >1 & <5     // _|_
   976  2.5 & float & >1 & <5   // 2.5
   977  int & 2 & >1.0 & <3.0   // _|_
   978  2.5 & >=(int & 1) & <5  // _|_
   979  >=0 & <=7 & >=3 & <=10  // >=3 & <=7
   980  !=null & 1              // 1
   981  >=5 & <=5               // 5
   982  ```
   983  
   984  
   985  ### Structs
   986  
   987  A _struct_ is a set of elements called _fields_, each of
   988  which has a name, called a _label_, and value.
   989  
   990  We say a label is defined for a struct if the struct has a field with the
   991  corresponding label.
   992  The value for a label `f` of struct `a` is denoted `a.f`.
   993  A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
   994  defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
   995  Note that if `a` is an instance of `b` it may have fields with labels that
   996  are not defined for `b`.
   997  
   998  The (unique) struct with no fields, written `{}`, has every struct as an
   999  instance. It can be considered the type of all structs.
  1000  
  1001  ```
  1002  {a: 1} ⊑ {}
  1003  {a: 1, b: 1} ⊑ {a: 1}
  1004  {a: 1} ⊑ {a: int}
  1005  {a: 1, b: 1.0} ⊑ {a: int, b: float}
  1006  
  1007  {} ⋢ {a: 1}
  1008  {a: 2} ⋢ {a: 1}
  1009  {a: 1} ⋢ {b: 1}
  1010  ```
  1011  
  1012  A field may be required or optional.
  1013  The successful unification of structs `a` and `b` is a new struct `c` which
  1014  has all fields of both `a` and `b`, where
  1015  the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
  1016  or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
  1017  If a field `f` is in both `a` and `b`, `c.f` is optional only if both
  1018  `a.f` and `b.f` are optional.
  1019  Any [references](#references) to `a` or `b`
  1020  in their respective field values need to be replaced with references to `c`.
  1021  The result of a unification is bottom (`_|_`) if any of its non-optional
  1022  fields evaluates to bottom, recursively.
  1023  
  1024  <!--NOTE: About bottom values for optional fields being okay.
  1025  
  1026  The proposition ¬P is a close cousin of P → ⊥ and is often used
  1027  as an approximation to avoid the issues of using not.
  1028  Bottom (⊥) is also frequently used to mean undefined. This makes sense.
  1029  Consider `{a?: 2} & {a?: 3}`.
  1030  Both structs say `a` is optional; in other words, it may be omitted.
  1031  So we can still get a valid result by omitting `a`, even in
  1032  case of a conflict.
  1033  
  1034  Granted, this definition may lead to confusing results, especially in
  1035  definitions, when tightening an optional field leads to unintentionally
  1036  discarding it.
  1037  It could be a role of vet checkers to identify such cases (and suggest users
  1038  to explicitly use `_|_` to discard a field, for instance).
  1039  -->
  1040  
  1041  Syntactically, a field is marked as optional by following its label with a `?`.
  1042  The question mark is not part of the field name.
  1043  A struct literal may contain multiple fields with
  1044  the same label, the result of which is a single field with the same properties
  1045  as defined as the unification of two fields resulting from unifying two structs.
  1046  
  1047  These examples illustrate required fields only.
  1048  Examples with optional fields follow below.
  1049  
  1050  ```
  1051  Expression                             Result (without optional fields)
  1052  {a: int, a: 1}                         {a: 1}
  1053  {a: int} & {a: 1}                      {a: 1}
  1054  {a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
  1055  {a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}
  1056  
  1057  {a: 1} & {b: 2}                        {a: 1, b: 2}
  1058  {a: 1, b: int} & {b: 2}                {a: 1, b: 2}
  1059  
  1060  {a: 1} & {a: 2}                        _|_
  1061  ```
  1062  
  1063  A struct may define constraints that apply to fields that are added when unified
  1064  with another struct using pattern or default constraints (_Note_: default
  1065  constraints are not yet implemented).
  1066  
  1067  A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
  1068  is a value of type string, and a value to unify with fields whose label
  1069  match that pattern.
  1070  When unifying structs `a` and `b`,
  1071  a pattern constraint `[p]: v` declared in `a`
  1072  defines that the value `v` should unify with any field in the resulting struct `c`
  1073  whose label unifies with pattern `p`.
  1074  
  1075  <!-- TODO: Update grammar and support this.
  1076  A pattern constraints with a pattern preceded by `...` indicates
  1077  the pattern can only matches fields in `b` for which there
  1078  exists no field in `a` with the same label.
  1079  -->
  1080  
  1081  Additionally, a _default constraint_, denoted `...value`, defines a value
  1082  to unify with any field for which there is no other declaration in a struct.
  1083  When unifying structs `a` and `b`,
  1084  a default constraint `...v` declared in `a`
  1085  defines that the value `v` should unify with any field in the resulting struct `c`
  1086  whose label does not unify with any of the patterns of the pattern
  1087  constraints defined for `a` _and_ for which there exists no field in `a`
  1088  with that label.
  1089  The token `...` is a shorthand for `..._`.
  1090  _Note_: default constraints are not yet implemented.
  1091  
  1092  
  1093  ```
  1094  a: {
  1095      foo:    string    // foo is a string
  1096      [=~"^i"]: int     // all other fields starting with i are integers
  1097      [=~"^b"]: bool    // all other fields starting with b are booleans
  1098      ...string         // all other fields must be a string. Note: default constraints are not yet implemented.
  1099  }
  1100  
  1101  b: a & {
  1102      i3:    3
  1103      bar:   true
  1104      other: "a string"
  1105  }
  1106  ```
  1107  
  1108  Concrete field labels may be an identifier or string, the latter of which may be
  1109  interpolated.
  1110  Fields with identifier labels can be referred to within the scope they are
  1111  defined, string labels cannot.
  1112  References within such interpolated strings are resolved within
  1113  the scope of the struct in which the label sequence is
  1114  defined and can reference concrete labels lexically preceding
  1115  the label within a label sequence.
  1116  <!-- We allow this so that rewriting a CUE file to collapse or expand
  1117  field sequences has no impact on semantics.
  1118  -->
  1119  
  1120  <!--TODO: first implementation round will not yet have expression labels
  1121  
  1122  An ExpressionLabel sets a collection of optional fields to a field value.
  1123  By default it defines this value for all possible string labels.
  1124  An optional expression limits this to the set of optional fields which
  1125  labels match the expression.
  1126  -->
  1127  
  1128  
  1129  <!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->
  1130  
  1131  
  1132  <!-- NOTE:
  1133  A DefinitionDecl does not allow repeated labels. This is to avoid
  1134  any ambiguity or confusion about whether earlier path components
  1135  are to be interpreted as declarations or normal fields (they should
  1136  always be normal fields.)
  1137  -->
  1138  
  1139  <!--NOTE:
  1140  The syntax has been deliberately restricted to allow for the following
  1141  future extensions and relaxations:
  1142    - Allow omitting a "?" in an expression label to indicate a concrete
  1143      string value (but maybe we want to use () for that).
  1144    - Make the "?" in expression label optional if expression labels
  1145      are always optional.
  1146    - Or allow eliding the "?" if the expression has no references and
  1147      is obviously not concrete (such as `[string]`).
  1148    - The expression of an expression label may also indicate a struct with
  1149      integer or even number labels
  1150      (beware of imprecise computation in the latter).
  1151        e.g. `{ [int]: string }` is a map of integers to strings.
  1152    - Allow for associative lists (`foo [@.field]: {field: string}`)
  1153    - The `...` notation can be extended analogously to that of a ListList,
  1154      by allowing it to follow with an expression for the remaining properties.
  1155      In that case it is no longer a shorthand for `[string]: _`, but rather
  1156      would define the value for any other value for which there is no field
  1157      defined.
  1158      Like the definition with List, this is somewhat odd, but it allows the
  1159      encoding of JSON schema's and (non-structural) OpenAPI's
  1160      additionalProperties and additionalItems.
  1161  -->
  1162  
  1163  ```
  1164  StructLit       = "{" { Declaration "," } "}" .
  1165  Declaration     = Field | Ellipsis | Embedding | LetClause | attribute .
  1166  Ellipsis        = "..." [ Expression ] .
  1167  Embedding       = Comprehension | AliasExpr .
  1168  Field           = Label ":" { Label ":" } AliasExpr { attribute } .
  1169  Label           = [ identifier "=" ] LabelExpr .
  1170  LabelExpr       = LabelName [ "?" ] | "[" AliasExpr "]" .
  1171  LabelName       = identifier | simple_string_lit  .
  1172  
  1173  attribute       = "@" identifier "(" attr_tokens ")" .
  1174  attr_tokens     = { attr_token |
  1175                      "(" attr_tokens ")" |
  1176                      "[" attr_tokens "]" |
  1177                      "{" attr_tokens "}" } .
  1178  attr_token      = /* any token except '(', ')', '[', ']', '{', or '}' */
  1179  ```
  1180  
  1181  ```
  1182  Expression                             Result (without optional fields)
  1183  a: { foo?: string }                    {}
  1184  b: { foo: "bar" }                      { foo: "bar" }
  1185  c: { foo?: *"bar" | string }           {}
  1186  
  1187  d: a & b                               { foo: "bar" }
  1188  e: b & c                               { foo: "bar" }
  1189  f: a & c                               {}
  1190  g: a & { foo?: number }                {}
  1191  h: b & { foo?: number }                _|_
  1192  i: c & { foo: string }                 { foo: "bar" }
  1193  
  1194  intMap: [string]: int
  1195  intMap: {
  1196      t1: 43
  1197      t2: 2.4  // error: 2.4 is not an integer
  1198  }
  1199  
  1200  nameMap: [string]: {
  1201      firstName: string
  1202      nickName:  *firstName | string
  1203  }
  1204  
  1205  nameMap: hank: { firstName: "Hank" }
  1206  ```
  1207  The optional field set defined by `nameMap` matches every field,
  1208  in this case just `hank`, and unifies the associated constraint
  1209  with the matched field, resulting in:
  1210  ```
  1211  nameMap: hank: {
  1212      firstName: "Hank"
  1213      nickName:  "Hank"
  1214  }
  1215  ```
  1216  
  1217  
  1218  #### Closed structs
  1219  
  1220  By default, structs are open to adding fields.
  1221  Instances of an open struct `p` may contain fields not defined in `p`.
  1222  This is makes it easy to add fields, but can lead to bugs:
  1223  
  1224  ```
  1225  S: {
  1226      field1: string
  1227  }
  1228  
  1229  S1: S & { field2: "foo" }
  1230  
  1231  // S1 is { field1: string, field2: "foo" }
  1232  
  1233  
  1234  A: {
  1235      field1: string
  1236      field2: string
  1237  }
  1238  
  1239  A1: A & {
  1240      feild1: "foo"  // "field1" was accidentally misspelled
  1241  }
  1242  
  1243  // A1 is
  1244  //    { field1: string, field2: string, feild1: "foo" }
  1245  // not the intended
  1246  //    { field1: "foo", field2: string }
  1247  ```
  1248  
  1249  A _closed struct_ `c` is a struct whose instances may not declare any field
  1250  with a name that does not match the name of a field
  1251  or the pattern of a pattern constraint defined in `c`.
  1252  Hidden fields are excluded from this limitation.
  1253  A struct that is the result of unifying any struct with a [`...`](#structs)
  1254  declaration is defined for all regular fields.
  1255  Closing a struct is equivalent to adding `..._|_` to it.
  1256  
  1257  Syntactically, structs are closed explicitly with the `close` builtin or
  1258  implicitly and recursively by [definitions](#definitions-and-hidden-fields).
  1259  
  1260  
  1261  ```
  1262  A: close({
  1263      field1: string
  1264      field2: string
  1265  })
  1266  
  1267  A1: A & {
  1268      feild1: string
  1269  } // _|_ feild1 not defined for A
  1270  
  1271  A2: A & {
  1272      for k,v in { feild1: string } {
  1273          k: v
  1274      }
  1275  }  // _|_ feild1 not defined for A
  1276  
  1277  C: close({
  1278      [_]: _
  1279  })
  1280  
  1281  C2: C & {
  1282      for k,v in { thisIsFine: string } {
  1283          "\(k)": v
  1284      }
  1285  }
  1286  
  1287  D: close({
  1288      // Values generated by comprehensions are treated as embeddings.
  1289      for k,v in { x: string } {
  1290          "\(k)": v
  1291      }
  1292  })
  1293  ```
  1294  
  1295  <!-- (jba) Somewhere it should be said that optional fields are only
  1296       interesting inside closed structs. -->
  1297  
  1298  <!-- TODO: move embedding section to above the previous one -->
  1299  
  1300  #### Embedding
  1301  
  1302  A struct may contain an _embedded value_, an operand used as a declaration.
  1303  An embedded value of type struct is unified with the struct in which it is
  1304  embedded, but disregarding the restrictions imposed by closed structs.
  1305  So if an embedding resolves to a closed struct, the corresponding enclosing
  1306  struct will also be closed, but may have fields that are not allowed if
  1307  normal rules for closed structs were observed.
  1308  
  1309  If an embedded value is not of type struct, the struct may only have
  1310  definitions or hidden fields. Regular fields are not allowed in such case.
  1311  
  1312  The result of `{ A }` is `A` for any `A` (including definitions).
  1313  
  1314  Syntactically, embeddings may be any expression.
  1315  
  1316  ```
  1317  S1: {
  1318      a: 1
  1319      b: 2
  1320      {
  1321          c: 3
  1322      }
  1323  }
  1324  // S1 is { a: 1, b: 2, c: 3 }
  1325  
  1326  S2: close({
  1327      a: 1
  1328      b: 2
  1329      {
  1330          c: 3
  1331      }
  1332  })
  1333  // same as close(S1)
  1334  
  1335  S3: {
  1336      a: 1
  1337      b: 2
  1338      close({
  1339          c: 3
  1340      })
  1341  }
  1342  // same as S2
  1343  ```
  1344  
  1345  
  1346  #### Definitions and hidden fields
  1347  
  1348  A field is a _definition_ if its identifier starts with `#` or `_#`.
  1349  A field is _hidden_ if its identifier starts with a `_`.
  1350  All other fields are _regular_.
  1351  
  1352  Definitions and hidden fields are not emitted when converting a CUE program
  1353  to data and are never required to be concrete.
  1354  
  1355  Referencing a definition will recursively [close](#closed-structs) it.
  1356  That is, a referenced definition will not unify with a struct
  1357  that would add a field anywhere within the definition that it does not
  1358  already define or explicitly allow with a pattern constraint or `...`.
  1359  [Embeddings](#embedding) allow bypassing this check.
  1360  
  1361  If referencing a definition would always result in an error, implementations
  1362  may report this inconsistency at the point of its declaration.
  1363  
  1364  ```
  1365  #MyStruct: {
  1366      sub: field:    string
  1367  }
  1368  
  1369  #MyStruct: {
  1370      sub: enabled?: bool
  1371  }
  1372  
  1373  myValue: #MyStruct & {
  1374      sub: feild:   2     // error, feild not defined in #MyStruct
  1375      sub: enabled: true  // okay
  1376  }
  1377  
  1378  #D: {
  1379      #OneOf
  1380  
  1381      c: int // adds this field.
  1382  }
  1383  
  1384  #OneOf: { a: int } | { b: int }
  1385  
  1386  
  1387  D1: #D & { a: 12, c: 22 }  // { a: 12, c: 22 }
  1388  D2: #D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
  1389  ```
  1390  
  1391  
  1392  ```
  1393  #A: {a: int}
  1394  
  1395  B: {
  1396      #A
  1397      b: c: int
  1398  }
  1399  
  1400  x: B
  1401  x: d: 3  // not allowed, as closed by embedded #A
  1402  
  1403  y: B.b
  1404  y: d: 3  // allowed as nothing closes b
  1405  
  1406  #B: {
  1407      #A
  1408      b: c: int
  1409  }
  1410  
  1411  z: #B.b
  1412  z: d: 3  // not allowed, as referencing #B closes b
  1413  ```
  1414  
  1415  
  1416  <!---
  1417  JSON fields are usual camelCase. Clashes can be avoided by adopting the
  1418  convention that definitions be TitleCase. Unexported definitions are still
  1419  subject to clashes, but those are likely easier to resolve because they are
  1420  package internal.
  1421  --->
  1422  
  1423  
  1424  #### Attributes
  1425  
  1426  Attributes allow associating meta information with values.
  1427  Their primary purpose is to define mappings between CUE and
  1428  other representations.
  1429  Attributes do not influence the evaluation of CUE.
  1430  
  1431  An attribute associates an identifier with a value, a balanced token sequence,
  1432  which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
  1433  The sequence may not contain interpolations.
  1434  
  1435  Fields, structs and packages can be associated with a set of attributes.
  1436  Attributes accumulate during unification, but implementations may remove
  1437  duplicates that have the same source string representation.
  1438  The interpretation of an attribute, including the handling of multiple
  1439  attributes for a given identifier, is up to the consumer of the attribute.
  1440  
  1441  Field attributes define additional information about a field,
  1442  such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
  1443  name of the field when mapping to a different language.
  1444  
  1445  
  1446  ```
  1447  // Package attribute
  1448  @protobuf(proto3)
  1449  
  1450  myStruct1: {
  1451      // Struct attribute:
  1452      @jsonschema(id="https://example.org/mystruct1.json")
  1453  
  1454      // Field attributes
  1455      field: string @go(Field)
  1456      attr:  int    @xml(,attr) @go(Attr)
  1457  }
  1458  
  1459  myStruct2: {
  1460      field: string @go(Field)
  1461      attr:  int    @xml(a1,attr) @go(Attr)
  1462  }
  1463  
  1464  Combined: myStruct1 & myStruct2
  1465  // field: string @go(Field)
  1466  // attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
  1467  ```
  1468  
  1469  
  1470  #### Aliases
  1471  
  1472  Aliases name values that can be referred to
  1473  within the [scope](#declarations-and-scopes) in which they are declared.
  1474  The name of an alias must be unique within its scope.
  1475  
  1476  ```
  1477  AliasExpr  = [ identifier "=" ] Expression .
  1478  ```
  1479  
  1480  Aliases can appear in several positions:
  1481  
  1482  <!--- TODO: consider allowing this. It should be considered whether
  1483  having field aliases isn't already sufficient.
  1484  
  1485  As a declaration in a struct (`X=value`):
  1486  
  1487  - binds identifier `X` to a value embedded within the struct.
  1488  --->
  1489  
  1490  In front of a Label (`X=label: value`):
  1491  
  1492  - binds the identifier to the same value as `label` would be bound
  1493    to if it were a valid identifier.
  1494  - for optional fields (`foo?: bar` and `[foo]: bar`),
  1495    the bound identifier is only visible within the field value (`bar`).
  1496  
  1497  Before a value (`foo: X=x`)
  1498  
  1499  - binds the identifier to the value it precedes within the scope of that value.
  1500  
  1501  Inside a bracketed label (`[X=expr]: value`):
  1502  
  1503  - binds the identifier to the concrete label that matches `expr`
  1504    within the instances of the field value (`value`).
  1505  
  1506  Before a list element (`[ X=value, X+1 ]`) (Not yet implemented)
  1507  
  1508  - binds the identifier to the list element it precedes within the scope of the
  1509    list expression.
  1510  
  1511  <!-- TODO: explain the difference between aliases and definitions.
  1512       Now that you have definitions, are aliases really necessary?
  1513       Consider removing.
  1514  -->
  1515  
  1516  ```
  1517  // A field alias
  1518  foo: X  // 4
  1519  X="not an identifier": 4
  1520  
  1521  // A value alias
  1522  foo: X={x: X.a}
  1523  bar: foo & {a: 1}  // {a: 1, x: 1}
  1524  
  1525  // A label alias
  1526  [Y=string]: { name: Y }
  1527  foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
  1528  ```
  1529  
  1530  <!-- TODO: also allow aliases as lists -->
  1531  
  1532  
  1533  #### Let declarations
  1534  
  1535  _Let declarations_ bind an identifier to an expression.
  1536  The identifier is visible within the [scope](#declarations-and-scopes)
  1537  in which it is declared.
  1538  The identifier must be unique within its scope.
  1539  
  1540  ```
  1541  let x = expr
  1542  
  1543  a: x + 1
  1544  b: x + 2
  1545  ```
  1546  
  1547  #### Shorthand notation for nested structs
  1548  
  1549  A field whose value is a struct with a single field may be written as
  1550  a colon-separated sequence of the two field names,
  1551  followed by a colon and the value of that single field.
  1552  
  1553  ```
  1554  job: myTask: replicas: 2
  1555  ```
  1556  expands to
  1557  ```
  1558  job: {
  1559      myTask: {
  1560          replicas: 2
  1561      }
  1562  }
  1563  ```
  1564  
  1565  <!-- OPTIONAL FIELDS:
  1566  
  1567  The optional marker solves the issue of having to print large amounts of
  1568  boilerplate when dealing with large types with many optional or default
  1569  values (such as Kubernetes).
  1570  Writing such optional values in terms of *null | value is tedious,
  1571  unpleasant to read, and as it is not well defined what can be dropped or not,
  1572  all null values have to be emitted from the output, even if the user
  1573  doesn't override them.
  1574  Part of the issue is how null is defined. We could adopt a Typescript-like
  1575  approach of introducing "void" or "undefined" to mean "not defined and not
  1576  part of the output". But having all of null, undefined, and void can be
  1577  confusing. If these ever are introduced anyway, the ? operator could be
  1578  expressed along the lines of
  1579     foo?: bar
  1580  being a shorthand for
  1581     foo: void | bar
  1582  where void is the default if no other default is given.
  1583  
  1584  The current mechanical definition of "?" is straightforward, though, and
  1585  probably avoids the need for void, while solving a big issue.
  1586  
  1587  Caveats:
  1588  [1] this definition requires explicitly defined fields to be emitted, even
  1589  if they could be elided (for instance if the explicit value is the default
  1590  value defined an optional field). This is probably a good thing.
  1591  
  1592  [2] a default value may still need to be included in an output if it is not
  1593  the zero value for that field and it is not known if any outside system is
  1594  aware of defaults. For instance, which defaults are specified by the user
  1595  and which by the schema understood by the receiving system.
  1596  The use of "?" together with defaults should therefore be used carefully
  1597  in non-schema definitions.
  1598  Problematic cases should be easy to detect by a vet-like check, though.
  1599  
  1600  [3] It should be considered how this affects the trim command.
  1601  Should values implied by optional fields be allowed to be removed?
  1602  Probably not. This restriction is unlikely to limit the usefulness of trim,
  1603  though.
  1604  
  1605  [4] There should be an option to emit all concrete optional values.
  1606  ```
  1607  -->

### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list is bounded below by the number of its elements
and has no upper bound.

```
ListLit       = "[" [ ElementList [ "," ] ] "]" .
ElementList   = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] .
```

Lists can be thought of as structs:

```
List: *null | {
    Elem: _
    Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:
```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```
and the closed version of this list, `[ 1, 2 ]`, as:
```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rules for lists can
be derived from those of structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special, so `len` will not work as
expected on this representation.

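To make the open/closed distinction concrete, the fragment below sketches how such lists unify with each other. The field names are arbitrary, and the results in the comments are derived from the rules above rather than taken from an implementation's output:

```
a: [1, 2, ...]       // open list: at least two elements
b: [1, 2, ...int]    // open list: any further elements must be ints
c: [1, 2]            // closed list: exactly two elements

d: a & [1, 2, 3]     // [1, 2, 3]
e: b & [1, 2, "x"]   // _|_: "x" does not unify with int
f: c & [1, 2, 3]     // _|_: a closed list of length 2 cannot hold 3 elements
```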

## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence scoping.


### Declarations and scope

A _declaration_ may bind an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
A field identifier, however, may be declared more than once within the same
block; the value of the resulting field is the unification of the values
of all fields declared with that identifier.
String labels do not bind an identifier to the respective field.

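A minimal illustration of repeated field declarations, using hypothetical field names; compatible declarations merge, and conflicting ones yield bottom:

```
a: {x: 1}
a: {y: 2}    // a is {x: 1, y: 2}

b: int
b: 3         // b is 3

c: 1
c: 2         // _|_: conflicting values 1 and 2
```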
The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field
   declared at top level (outside any struct literal) is the package block.
1. The scope of an identifier denoting an alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of a let identifier
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field, alias, or let identifier declared inside a struct
   literal is the innermost containing block.

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.

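The shadowing rule can be sketched as follows (field names are illustrative; the values in the comments follow from the scoping rules listed above):

```
a: 1
b: {
    a: 2
    c: a    // 2: the inner a shadows the outer a
}
d: a        // 1: outside b, a refers to the top-level field
```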
The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword, which is the name
of the predefined identifier prefixed with `__` (for instance, `__int` for `int`).

```
Functions
len       close     and       or

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
          <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```

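Since the derived types are ordinary constraints, they can be unified with concrete values like any other value. A short sketch, with results worked out from the bounds in the table:

```
a: uint8 & 200     // 200
b: uint8 & 256     // _|_: 256 exceeds the upper bound 255
c: int8 & -128     // -128
d: number & 1.5    // 1.5
e: int & 1.5       // _|_: 1.5 is not an integral number
```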

### Exported identifiers

<!-- move to a more logical spot -->

An identifier of a package may be exported to permit access to it
from another package.
All identifiers not starting with `_` (so all regular fields and definitions
starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a namespace separate from that of namesake identifiers in other packages.

```
package mypackage

foo:   string  // visible outside mypackage
"bar": string  // visible outside mypackage

#Foo: {      // visible outside mypackage
    a:  1    // visible outside mypackage
    _b: 2    // not visible outside mypackage

    #C: {    // visible outside mypackage
        d: 4 // visible outside mypackage
    }
    _#E: foo // not visible outside mypackage
}
```


### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
Unicode Annex #31.
Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
Otherwise, they are the same.


### Field declarations

A field associates the value of an expression with a label within a struct.
If this label is an identifier, it binds the field to that identifier,
so the field's value can be referenced by writing the identifier.
String labels are not bound to fields.
```
a: {
    b: 2
    "s": 3

    c: b   // 2
    d: s   // _|_ unresolved identifier "s"
    e: a.s // 3
}
```

If an expression may result in a value associated with a default value
as described in [default values](#default-values), the field binds to this
value-default pair.


<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.

The issue is with code generation. As no guarantee can be given that
a predeclared identifier is not overridden in one of the enclosing scopes,
code will have to handle detecting such cases and renaming them.
An alternative is to have the predeclared identifiers be aliases for namesake
equivalents starting with a double underscore (e.g. string -> __string),
allowing generated code (normal code would keep using `string`) to refer
to these directly.
-->


### Let declarations

Within a struct, a let clause binds an identifier to the given expression.

Within the scope of the identifier, the identifier refers to the
_locally declared_ expression.
The expression is evaluated in the scope in which it was declared.

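As an illustration (a sketch under the rules above, with made-up field names), a let identifier can be referenced within its scope but is not itself a field of the enclosing struct:

```
a: {
    let x = 3
    y: x * 2   // 6
    z: x + 1   // 4
}
// a evaluates to {y: 6, z: 4}; x is not emitted as a field
```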

## Expressions

An expression specifies the computation of a value by applying operators and
built-in functions to operands.

An expression that requires concrete values is called _incomplete_ if any of
its operands is not concrete but still defines a value that would be legal for
that expression.
Incomplete expressions may be left unevaluated until a concrete value is
requested at the application level.

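For instance, in the following sketch the expression for `y` is incomplete as long as `x` is only known to be an `int`; the additional declaration of `x` makes it concrete, so `y` can be evaluated:

```
x: int
y: x + 1    // incomplete while x is only known to be an int
x: 4        // x unifies to 4, so y evaluates to 5
```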
### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field, alias, or let declaration, or a parenthesized expression.

```
Operand     = Literal | OperandName | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported](#import-declarations).
The identifier must be exported and declared in the [package block](#blocks) of that package.

```
math.Sin    // denotes the Sin function in package math
```

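A qualified identifier is only usable together with the corresponding import declaration. A minimal sketch using the `strings` package of the CUE standard library:

```
import "strings"

upper: strings.ToUpper("cue")   // "CUE"
```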
### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different evaluation mechanism, as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```



### Primary expressions

Primary expressions are the operands for unary and binary expressions.

```
PrimaryExpr =
	Operand |
	PrimaryExpr Selector |
	PrimaryExpr Index |
	PrimaryExpr Slice |
	PrimaryExpr Arguments .

Selector       = "." (identifier | simple_string_lit) .
Index          = "[" Expression "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
TODO:
	PrimaryExpr Query |
Query          = "." Filters .
Filters        = Filter { Filter } .
Filter         = "[" [ "?" ] AliasExpr "]" .

TODO: maybe reintroduce slices, as they are useful in queries, probably this
time with Python semantics.
Slice          = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .

Argument       = Expression | ( identifier ":" Expression ).

// & expression type
// string_lit: same as label. Arguments is current node.
// If selector is applied to list, it performs the operation for each
// element.

TODO: consider allowing decimal_lit for selectors.
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the element of a <!--list or -->struct `x` identified by `f`.
<!--For structs, -->
`f` must be an identifier or a string literal identifying
any definition or regular non-optional field.
The identifier `f` is called the field selector.

<!--
Allowing strings to be used as field selectors obviates the need for
backquoted identifiers. Note that some standards use names for structs that
are not standard identifiers (such as "Fn::Foo"). Note that indexing does not
allow access to identifiers.
-->

<!--
For lists, `f` must be an integer and follows the same lookup rules as
for the index operation.
The type of the selector expression is the type of `f`.
-->

If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo")
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a <!--list or -->struct,
or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x:     int
    y:     3
    "x-y": 4
}

a: T.x     // int
b: T.y     // 3
c: T.z     // _|_ // field 'z' not found in T
d: T."x-y" // 4

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of a list or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply:

If `a` is not a struct:

- `a` must be a list (which need not be complete);
- the index `x` unified with `int` must be concrete;
- the index `x` is in range if `0 <= x < len(a)`, where only the
  explicitly defined values of an open-ended list are considered;
  otherwise it is out of range.

For `a` of list type, the result of `a[x]` is:

- the list element at index `x`, if `x` is within range;
- bottom (an error), otherwise.

For `a` of struct type, the index `x` unified with `string` must be concrete,
and the result of `a[x]` is:

- the value of the regular and non-optional field named `x` of struct `a`,
  if this field exists;
- bottom (an error), otherwise.

```
[ 1, 2 ][1]     // 2
[ 1, 2 ][2]     // _|_
[ 1, 2, ...][2] // _|_
```

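The struct case can be illustrated the same way. In this sketch, indexing with a string reaches fields whose labels are not identifiers; the results follow the rules above:

```
s: {a: 1, "b-c": 2}

x: s["a"]     // 1
y: s["b-c"]   // 2
z: s["d"]     // _|_: s has no field named "d"
```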
Both the operand and index value may be a value-default pair.
```
va[vi]              =>  va[vi]
va[(vi, di)]        =>  (va[vi], va[di])
(va, da)[vi]        =>  (va[vi], da[vi])
(va, da)[(vi, di)]  =>  (va[vi], da[di])
```

```
Fields                  Result
x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
i: int | *1             (int, 1)

v: x[i]                 (x[i], 4)
```

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op  .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.

<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

Operands of unary and binary expressions may be associated with a default.
In that case the operator is applied to the values and to the defaults
separately, as the following examples show:

<!--
```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```
-->

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /
    6             +  -
    5             ==  !=  <  <=  >  >= =~ !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```

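A few worked cases of these rules; the results in the comments are computed by hand from the precedence table:

```
a: 3 + 4 * 2          // 11: * binds tighter than +
b: 10 - 4 - 3         // 3: left-associative, so (10 - 4) - 3
c: 1 | 2 & int        // parsed as 1 | (2 & int), which is 1 | 2
d: 1 < 2 && 2 < 3     // true: comparisons bind tighter than &&
```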
#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. The four standard arithmetic operators
`+`, `-`, `*`, and `/` apply to integer and decimal floating-point types;
`+` and `*` also apply to strings and bytes.

```
+    sum                    integers, floats, strings, bytes
-    difference             integers, floats
*    product                integers, floats, strings, bytes
/    quotient               integers, floats
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float`
if it cannot be represented as an `int` or if any of the operands are `float`,
or `int` otherwise.
So the result of `1 / 2` is `0.5` and is of type `float`.

The result of division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->
Integer division is implemented through the builtin functions
`quo`, `rem`, `div`, and `mod`.

The unary operators `+` and `-` are defined for numeric values as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```

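The sketch below applies these rules. The results are worked out by hand; the integer division lines assume that `quo` and `rem` truncate toward zero while `div` and `mod` implement Euclidean division:

```
a: 1 + 2         // 3, an int
b: 1 + 2.0       // 3.0, a float, since one operand is a float
c: 1 / 2         // 0.5, a float
d: 7 / 0         // _|_: division by zero

e: quo(-7, 2)    // -3: truncated division
f: rem(-7, 2)    // -1: remainder of truncated division
g: div(-7, 2)    // -4: Euclidean division
h: mod(-7, 2)    // 1: Euclidean modulus, never negative
```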
#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it by an integer:

```
s: "etc. "*3  // "etc. etc. etc. "
```

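The operator table also lists `+` and `*` for bytes. Assuming the byte operations mirror the string operations, a sketch would look like this (the results are not verified against an implementation):

```
b: 'ab' + 'cd'   // 'abcd'
c: 'ab' * 2      // 'abab'
```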
<!-- jba: Do these work for byte sequences? If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```

<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify, or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and a regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal; null is unequal to anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating-point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match the regular expression `r`.

<!--- TODO: consider the following
- For regular expressions, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```

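A small sketch of the logical operators, with results worked out from the table above:

```
a: true && false   // false
b: !a              // true
c: a || b          // true
```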

<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` are of different underlying types, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of x.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000))  // uint16(1000)
b: uint8(1000)        // _|_ // overflow
c: int(2.5)           // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8')   // "hellø"
string(bytes([0x20]))    // " "
```

A string value is always convertible to a list of bytes.

```
bytes("hellø")   // 'hell\xc3\xb8'
bytes("")        // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP:        4*[byte]
Private10: IP([10, ...])  // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
  the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments a1, a2, … an. Arguments must be expressions
whose values are instances of the parameter types of `F`;
they are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return values of the function are passed by value back to the caller
when the function returns.

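As an illustration, the following sketch calls the predeclared `len` and the standard library function `strings.Join`. The last line assumes, per the rule above, that an argument that is not an instance of the parameter type results in bottom:

```
import "strings"

a: len([1, 2, 3])                 // 3
b: len("abc")                     // 3
c: strings.Join(["a", "b"], "-")  // "a-b"
d: len(3)                         // _|_: 3 is not a valid argument for len
```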
  2467  
  2468  ### Comprehensions
  2469  
  2470  Lists and fields can be constructed using comprehensions.
  2471  
  2472  Comprehensions define a clause sequence that consists of a sequence of
  2473  `for`, `if`, and `let` clauses, nesting from left to right.
  2474  The sequence must start with a `for` or `if` clause.
  2475  The `for` and `let` clauses each define a new scope in which new values are
  2476  bound to be available for the next clause.
  2477  
  2478  The `for` clause binds the defined identifiers, on each iteration, to the next
  2479  value of some iterable value in a new scope.
  2480  A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value of
a list element or struct field.
If there are two identifiers, the first is bound to the key or index,
if available, and the second to the value.
  2485  
  2486  For lists, `for` iterates over all elements in the list after closing it.
  2487  For structs, `for` iterates over all non-optional regular fields.
  2488  
  2489  An `if` clause, or guard, specifies an expression that terminates the current
  2490  iteration if it evaluates to false.
  2491  
  2492  The `let` clause binds the result of an expression to the defined identifier
  2493  in a new scope.
  2494  
  2495  A current iteration is said to complete if the innermost block of the clause
  2496  sequence is reached.
  2497  Syntactically, the comprehension value is a struct.
  2498  A comprehension can generate non-struct values by embedding such values within
  2499  this struct.
  2500  
  2501  Within lists, the values yielded by a comprehension are inserted in the list
  2502  at the position of the comprehension.
  2503  Within structs, the values yielded by a comprehension are embedded within the
  2504  struct.
  2505  Both structs and lists may contain multiple comprehensions.
  2506  
  2507  ```
  2508  Comprehension       = Clauses StructLit .
  2509  
  2510  Clauses             = StartClause { [ "," ] Clause } .
  2511  StartClause         = ForClause | GuardClause .
  2512  Clause              = StartClause | LetClause .
  2513  ForClause           = "for" identifier [ "," identifier ] "in" Expression .
  2514  GuardClause         = "if" Expression .
  2515  LetClause           = "let" identifier "=" Expression .
  2516  ```
  2517  
  2518  ```
  2519  a: [1, 2, 3, 4]
  2520  b: [ for x in a if x > 1 { x+1 } ]  // [3, 4, 5]
  2521  
  2522  c: {
  2523      for x in a
  2524      if x < 4
  2525      let y = 1 {
  2526          "\(x)": x + y
  2527      }
  2528  }
d: { "1": 2, "2": 3, "3": 4 }  // the value that c evaluates to
  2530  ```
  2531  
  2532  
  2533  ### String interpolation
  2534  
  2535  String interpolation allows constructing strings by replacing placeholder
  2536  expressions with their string representation.
  2537  String interpolation may be used in single- and double-quoted strings, as well
  2538  as their multiline equivalent.
  2539  
A placeholder consists of `\(` followed by an expression and a `)`.
  2541  The expression is evaluated in the scope within which the string is defined.
  2542  
  2543  The result of the expression is substituted as follows:
  2544  - string: as is
  2545  - bool: the JSON representation of the bool
  2546  - number: a JSON representation of the number that preserves the
  2547  precision of the underlying binary coded decimal
- bytes: as is when the placeholder occurs in a single-quoted string;
otherwise, converted to valid UTF-8 by replacing each
maximal subpart of an ill-formed subsequence with a single
replacement character (W3C Encoding Standard)
  2552  - list: illegal
  2553  - struct: illegal
  2554  
  2555  
  2556  ```
  2557  a: "World"
  2558  b: "Hello \( a )!" // Hello World!
  2559  ```
  2560  
  2561  
  2562  ## Builtin Functions
  2563  
  2564  Built-in functions are predeclared. They are called like any other function.
  2565  
  2566  
  2567  ### `len`
  2568  
  2569  The built-in function `len` takes arguments of various types and returns
  2570  a result of type int.
  2571  
  2572  ```
  2573  Argument type    Result
  2574  
  2575  string            string length in bytes
  2576  bytes             length of byte sequence
  2577  list              list length, smallest length for an open list
  2578  struct            number of distinct data fields, excluding optional
  2579  ```
  2580  <!-- TODO: consider not supporting len, but instead rely on more
  2581  precisely named builtin functions:
  2582    - strings.RuneLen(x)
  2583    - bytes.Len(x)  // x may be a string
  2584    - struct.NumFooFields(x)
  2585    - list.Len(x)
  2586  -->
  2587  
  2588  ```
  2589  Expression           Result
  2590  len("Hellø")         6
  2591  len([1, 2, 3])       3
  2592  len([1, 2, ...])     >=2
  2593  ```
  2594  
  2595  
  2596  ### `close`
  2597  
  2598  The builtin function `close` converts a partially defined, or open, struct
  2599  to a fully defined, or closed, struct.
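A minimal sketch with illustrative names. After closing, unification can still constrain the declared field `x`, but adding an undeclared field fails.

```
a: close({
    x: int
})

b: a & {x: 1} // {x: 1}
c: a & {y: 1} // _|_: field y is not allowed by the closed struct a
```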
  2600  
  2601  
  2602  ### `and`
  2603  
  2604  The built-in function `and` takes a list and returns the result of applying
  2605  the `&` operator to all elements in the list.
  2606  It returns top for the empty list.
  2607  
  2608  ```
  2609  Expression:          Result
  2610  and([a, b])          a & b
  2611  and([a])             a
  2612  and([])              _
  2613  ```
  2614  
  2615  ### `or`
  2616  
  2617  The built-in function `or` takes a list and returns the result of applying
  2618  the `|` operator to all elements in the list.
  2619  It returns bottom for the empty list.
  2620  
  2621  ```
  2622  Expression:          Result
  2623  or([a, b])           a | b
  2624  or([a])              a
  2625  or([])               _|_
  2626  ```
  2627  
  2628  ### `div`, `mod`, `quo` and `rem`
  2629  
  2630  For two integer values `x` and `y`,
  2631  the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
  2632  implement Euclidean division and
  2633  satisfy the following relationship:
  2634  
  2635  ```
  2636  r = x - y*q  with 0 <= r < |y|
  2637  ```
  2638  where `|y|` denotes the absolute value of `y`.
  2639  
  2640  ```
  2641   x     y   div(x, y)  mod(x, y)
  2642   5     3        1          2
  2643  -5     3       -2          1
  2644   5    -3       -1          2
  2645  -5    -3        2          1
  2646  ```
  2647  
  2648  For two integer values `x` and `y`,
  2649  the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
  2650  implement truncated division and
  2651  satisfy the following relationship:
  2652  
  2653  ```
  2654  x = q*y + r  and  |r| < |y|
  2655  ```
  2656  
  2657  with `quo(x, y)` truncated towards zero.
  2658  
  2659  ```
  2660   x     y   quo(x, y)  rem(x, y)
  2661   5     3        1          2
  2662  -5     3       -1         -2
  2663   5    -3       -1          2
  2664  -5    -3        1         -2
  2665  ```
  2666  
  2667  A zero divisor in either case results in bottom (an error).
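A sketch contrasting the two pairs on a negative dividend, with illustrative field names. Each result can be checked against the relationships given above.

```
// Euclidean division: the remainder is never negative.
a: div(-7, 2) // -4
b: mod(-7, 2) // 1

// Truncated division: a non-zero remainder has the sign of the dividend.
c: quo(-7, 2) // -3
d: rem(-7, 2) // -1

// For non-negative operands both pairs agree.
e: [div(7, 2), mod(7, 2), quo(7, 2), rem(7, 2)] // [3, 1, 3, 1]

// A zero divisor yields bottom.
f: div(1, 0) // _|_
```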
  2668  
  2669  
  2670  ## Cycles
  2671  
  2672  Implementations are required to interpret or reject cycles encountered
  2673  during evaluation according to the rules in this section.
  2674  
  2675  
  2676  ### Reference cycles
  2677  
  2678  A _reference cycle_ occurs if a field references itself, either directly or
  2679  indirectly.
  2680  
  2681  ```
  2682  // x references itself
  2683  x: x
  2684  
  2685  // indirect cycles
  2686  b: c
  2687  c: d
  2688  d: b
  2689  ```
  2690  
  2691  Implementations should treat these as `_`.
  2692  Two particular cases are discussed below.
  2693  
  2694  
  2695  #### Expressions that unify an atom with an expression
  2696  
  2697  An expression of the form `a & e`, where `a` is an atom
  2698  and `e` is an expression, always evaluates to `a` or bottom.
  2699  As it does not matter how we fail, we can assume the result to be `a`
  2700  and postpone validating `a == e` until after all references
  2701  in `e` have been resolved.
  2702  
  2703  ```
  2704  // Config            Evaluates to (requiring concrete values)
  2705  x: {                  x: {
  2706      a: b + 100            a: _|_ // cycle detected
  2707      b: a - 100            b: _|_ // cycle detected
  2708  }                     }
  2709  
  2710  y: x & {              y: {
  2711      a: 200                a: 200 // asserted that 200 == b + 100
  2712                            b: 100
  2713  }                     }
  2714  ```
  2715  
  2716  
  2717  #### Field values
  2718  
  2719  A field value of the form `r & v`,
  2720  where `r` evaluates to a reference cycle and `v` is a concrete value,
  2721  evaluates to `v`.
  2722  Unification is idempotent and unifying a value with itself ad infinitum,
  2723  which is what the cycle represents, results in this value.
  2724  Implementations should detect cycles of this kind, ignore `r`,
  2725  and take `v` as the result of unification.
  2726  
  2727  <!-- Tomabechi's graph unification algorithm
  2728  can detect such cycles at near-zero cost. -->
  2729  
  2730  ```
  2731  Configuration    Evaluated
  2732  //    c           Cycles in nodes of type struct evaluate
  2733  //  ↙︎   ↖         to the fixed point of unifying their
  2734  // a  →  b        values ad infinitum.
  2735  
  2736  a: b & { x: 1 }   // a: { x: 1, y: 2, z: 3 }
  2737  b: c & { y: 2 }   // b: { x: 1, y: 2, z: 3 }
  2738  c: a & { z: 3 }   // c: { x: 1, y: 2, z: 3 }
  2739  
  2740  // resolve a             b & {x:1}
  2741  // substitute b          c & {y:2} & {x:1}
  2742  // substitute c          a & {z:3} & {y:2} & {x:1}
  2743  // eliminate a (cycle)   {z:3} & {y:2} & {x:1}
  2744  // simplify              {x:1,y:2,z:3}
  2745  ```
  2746  
  2747  This rule also applies to field values that are disjunctions of unification
  2748  operations of the above form.
  2749  
  2750  ```
  2751  a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
  2752  b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
  2753  c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}
  2754  
  2755  
  2756  // resolving a           b&{x:1} | {y:1}
  2757  // substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
  2758  // simplify              c&{z:2}&{x:1} | {y:1}
  2759  // substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
  2760  // simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
  2761  // eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
  2762  // expand                {x:1,y:3,z:2} | {y:1}
  2763  ```
  2764  
Note that all nodes that together form a reference cycle of structs evaluate
to the same value.
  2767  If a field value is a disjunction, any element that is part of a cycle will
  2768  evaluate to this value.
  2769  
  2770  
  2771  ### Structural cycles
  2772  
A structural cycle occurs when a node references one of its ancestor nodes.
  2774  It is possible to construct a structural cycle by unifying two acyclic values:
  2775  ```
  2776  // acyclic
  2777  y: {
  2778      f: h: g
  2779      g: _
  2780  }
  2781  // acyclic
  2782  x: {
  2783      f: _
  2784      g: f
  2785  }
  2786  // introduces structural cycle
  2787  z: x & y
  2788  ```
  2789  Implementations should be able to detect such structural cycles dynamically.
  2790  
  2791  A structural cycle can result in infinite structure or evaluation loops.
  2792  ```
  2793  // infinite structure
  2794  a: b: a
  2795  
  2796  // infinite evaluation
  2797  f: {
  2798      n:   int
  2799      out: n + (f & {n: 1}).out
  2800  }
  2801  ```
Whether CUE allows or disallows a structural cycle depends on the circumstances
described below.
  2803  
  2804  If a node `a` references an ancestor node, we call it and any of its
  2805  field values `a.f` _cyclic_.
  2806  So if `a` is cyclic, all of its descendants are also regarded as cyclic.
  2807  A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
  2808  is valid if any of its conjuncts is not cyclic.
  2809  
  2810  ```
  2811  // Disallowed: a list of infinite length with all elements being 1.
  2812  #List: {
  2813      head: 1
  2814      tail: #List
  2815  }
  2816  
  2817  // Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
  2818  a: {
  2819      b: c
  2820  }
  2821  c: {
  2822      d: a
  2823  }
  2824  
  2825  // #List defines a list of arbitrary length. Because the recursive reference
  2826  // is part of a disjunction, this does not result in a structural cycle.
  2827  #List: {
  2828      head: _
  2829      tail: null | #List
  2830  }
  2831  
// Usage of #List. The value of tail in the most deeply nested element will
// be `null`: there, the disjunct referring to #List is the only conjunct,
// so all conjuncts are cyclic, and the value is invalid and therefore
// eliminated from the disjunction.
  2836  MyList: #List & { head: 1, tail: { head: 2 }}
  2837  ```
  2838  
  2839  <!--
  2840  ### Unused fields
  2841  
  2842  TODO: rules for detection of unused fields
  2843  
  2844  1. Any alias value must be used
  2845  -->
  2846  
  2847  
  2848  ## Modules, instances, and packages
  2849  
CUE configurations are constructed by combining _instances_.
  2851  An instance, in turn, is constructed from one or more source files belonging
  2852  to the same _package_ that together declare the data representation.
  2853  Elements of this data representation may be exported and used
  2854  in other instances.
  2855  
  2856  ### Source file organization
  2857  
Each source file consists of an optional package clause defining the collection
of files to which it belongs,
  2860  followed by a possibly empty set of import declarations that declare
  2861  packages whose contents it wishes to use, followed by a possibly empty set of
  2862  declarations.
  2863  
  2864  Like with a struct, a source file may contain embeddings.
  2865  Unlike with a struct, the embedded expressions may be any value.
  2866  If the result of the unification of all embedded values is not a struct,
  2867  it will be output instead of its enclosing file when exporting CUE
to a data format.
  2869  
  2870  ```
  2871  SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
  2872  ```
  2873  
  2874  ```
  2875  "Hello \(#place)!"
  2876  
  2877  #place: "world"
  2878  
  2879  // Outputs "Hello world!"
  2880  ```
  2881  
  2882  ### Package clause
  2883  
A package clause is an optional clause that defines the package to which
a source file belongs.
  2886  
  2887  ```
  2888  PackageClause  = "package" PackageName .
  2889  PackageName    = identifier .
  2890  ```
  2891  
  2892  The PackageName must not be the blank identifier or a definition identifier.
  2893  
  2894  ```
  2895  package math
  2896  ```
  2897  
  2898  ### Modules and instances
  2899  A _module_ defines a tree of directories, rooted at the _module root_.
  2900  
All source files within a module that declare the same package name belong
to the same package.
  2903  <!-- jba: I can't make sense of the above sentence. -->
  2904  A module may define multiple packages.
  2905  
  2906  An _instance_ of a package is any subset of files belonging
  2907  to the same package.
  2908  <!-- jba: Are you saying that -->
  2909  <!-- if I have a package with files a, b and c, then there are 8 instances of -->
  2910  <!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
  2911  <!-- purpose of that definition? -->
  2912  It is interpreted as the concatenation of these files.
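As a sketch of this, assume a hypothetical package `config` split over two files, `a.cue` and `b.cue`. An instance containing both files is interpreted as their concatenation, so a reference in one file may resolve to a field declared in the other.

```
// a.cue
package config

x: 1

// b.cue
package config

y: x + 1

// An instance consisting of a.cue and b.cue evaluates to:
// x: 1
// y: 2
```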
  2913  
  2914  An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
  2916  For example, an instance may be defined as the subset of package files
  2917  belonging to a directory and all its ancestors.
  2918  <!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->
  2919  
  2920  
  2921  ### Import declarations
  2922  
  2923  An import declaration states that the source file containing the declaration
  2924  depends on definitions of the _imported_ package
  2925  and enables access to exported identifiers of that package.
  2926  The import names an identifier (PackageName) to be used for access and an
  2927  ImportPath that specifies the package to be imported.
  2928  
  2929  ```
  2930  ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
  2931  ImportSpec       = [ PackageName ] ImportPath .
  2932  ImportLocation   = { unicode_value } .
  2933  ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
  2934  ```
  2935  
  2936  The PackageName is used in qualified identifiers to access
  2937  exported identifiers of the package within the importing source file.
  2938  It is declared in the file block.
  2939  It defaults to the identifier specified in the package clause of the imported
  2940  package, which must match either the last path component of ImportLocation
  2941  or the identifier following it.
  2942  
  2943  <!--
  2944  Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of making it possible to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
  2947  when using package hierarchies, one is more likely to want to include multiple
  2948  packages within the same directory structure. This mechanism allows
  2949  disambiguation in these cases.
  2950  -->
  2951  
  2952  The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualified location
  2954  of a package within a source code repository.
  2955  
  2956  An ImportLocation must be a non-empty string using only characters belonging to
  2957  Unicode's L, M, N, P, and S general categories
  2958  (the Graphic characters without spaces)
and may not include the characters `` !"#$%&'()*,:;<=>?[\]^`{|} ``
  2960  or the Unicode replacement character U+FFFD.
  2961  
Assume we have a package containing the package clause `package math`,
which exports the function `Sin` at the path identified by `"lib/math"`.
  2964  This table illustrates how Sin is accessed in files
  2965  that import the package after the various types of import declaration.
  2966  
  2967  ```
  2968  Import declaration          Local name of Sin
  2969  
  2970  import   "lib/math"         math.Sin
  2971  import   "lib/math:math"    math.Sin
  2972  import m "lib/math"         m.Sin
  2973  ```
  2974  
  2975  An import declaration declares a dependency relation between the importing and
  2976  imported package. It is illegal for a package to import itself, directly or
  2977  indirectly, or to directly import a package without referring to any of its
  2978  exported identifiers.
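A sketch combining the grouped form of ImportDecl with a renamed import, using the standard `list` and `strings` packages. Here `str` is an arbitrary local name, and each imported package is referenced, as required above.

```
import (
    "list"
    str "strings"
)

total: list.Sum([1, 2, 3]) // 6
loud:  str.ToUpper("cue")  // "CUE"
```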
  2979  
  2980  
  2981  ### An example package
  2982  
  2983  TODO