cuelang.org/go@v0.10.1/doc/ref/spec.md

     1  <!--
     2   Copyright 2018 The CUE Authors
     3  
     4   Licensed under the Apache License, Version 2.0 (the "License");
     5   you may not use this file except in compliance with the License.
     6   You may obtain a copy of the License at
     7  
     8       http://www.apache.org/licenses/LICENSE-2.0
     9  
    10   Unless required by applicable law or agreed to in writing, software
    11   distributed under the License is distributed on an "AS IS" BASIS,
    12   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    13   See the License for the specific language governing permissions and
    14   limitations under the License.
    15  -->
    16  
    17  # The CUE Language Specification
    18  
    19  ## Introduction
    20  
    21  This is a reference manual for the CUE data constraint language.
    22  CUE, pronounced cue or Q, is a general-purpose and strongly typed
    23  constraint-based language.
    24  It can be used for data templating, data validation, code generation, scripting,
    25  and many other applications involving structured data.
    26  The CUE tooling, layered on top of CUE, provides
    27  a general-purpose scripting language for creating scripts as well as
    28  simple servers, also expressed in CUE.
    29  
    30  CUE was designed with cloud configuration and related systems in mind,
    31  but is not limited to this domain.
    32  It derives its formalism from relational programming languages.
    33  This formalism allows for managing and reasoning over large amounts of
    34  data in a straightforward manner.
    35  
    36  The grammar is compact and regular, allowing for easy analysis by automatic
    37  tools such as integrated development environments.
    38  
    39  This document is maintained by mpvl@golang.org.
    40  CUE has many similarities with the Go language, and this document draws
    41  heavily from the Go specification as a result.
    42  
    43  CUE draws its influence from many languages.
    44  Its main influences were BCL/GCL (internal to Google),
    45  LKB (LinGO), Go, and JSON.
    46  Others are Swift, Typescript, Javascript, Prolog, NCL (internal to Google),
    47  Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.
    48  
    49  
    50  ## Notation
    51  
    52  The syntax is specified using Extended Backus-Naur Form (EBNF):
    53  
    54  ```
    55  Production  = production_name "=" [ Expression ] "." .
    56  Expression  = Alternative { "|" Alternative } .
    57  Alternative = Term { Term } .
    58  Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
    59  Group       = "(" Expression ")" .
    60  Option      = "[" Expression "]" .
    61  Repetition  = "{" Expression "}" .
    62  ```
    63  
    64  Productions are expressions constructed from terms and the following operators,
    65  in increasing precedence:
    66  
    67  ```
    68  |   alternation
    69  ()  grouping
    70  []  option (0 or 1 times)
    71  {}  repetition (0 to n times)
    72  ```
    73  
    74  Lower-case production names are used to identify lexical tokens. Non-terminals
    75  are in CamelCase. Lexical tokens are enclosed in double quotes `""` or back
    76  quotes ` `` `.
    77  
    78  The form `a … b` represents the set of characters from a through b as
    79  alternatives. The horizontal ellipsis `…` is also used elsewhere in the spec to
    80  informally denote various enumerations or code snippets that are not further
    81  specified. The character `…` (as opposed to the three characters `...`) is not a
    82  token of the CUE language.
    83  
    84  
    85  ## Source code representation
    86  
    87  Source code is Unicode text encoded in UTF-8.
    88  Unless otherwise noted, the text is not canonicalized, so a single
    89  accented code point is distinct from the same character constructed from
    90  combining an accent and a letter; those are treated as two code points.
    91  For simplicity, this document will use the unqualified term character to refer
    92  to a Unicode code point in the source text.
    93  
    94  Each code point is distinct; for instance, upper and lower case letters are
    95  different characters.
    96  
    97  Implementation restriction: For compatibility with other tools, a compiler may
    98  disallow the NUL character (U+0000) in the source text.
    99  
   100  Implementation restriction: For compatibility with other tools, a compiler may
   101  ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
   102  point in the source text. A byte order mark may be disallowed anywhere else in
   103  the source.
   104  
   105  
   106  ### Characters
   107  
   108  The following terms are used to denote specific Unicode character classes:
   109  
   110  ```
   111  newline        = /* the Unicode code point U+000A */ .
   112  unicode_char   = /* an arbitrary Unicode code point except newline */ .
   113  unicode_letter = /* a Unicode code point classified as "Letter" */ .
   114  unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
   115  ```
   116  
   117  In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
   118  character categories.
   119  CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
   120  as Unicode letters, and those in the Number category Nd as Unicode digits.
   121  
   122  
   123  ### Letters and digits
   124  
   125  The underscore character `_` (U+005F) and the dollar sign `$` (U+0024) are considered letters.
   126  
   127  ```
   128  letter        = unicode_letter | "_" | "$" .
   129  decimal_digit = "0" … "9" .
   130  binary_digit  = "0" … "1" .
   131  octal_digit   = "0" … "7" .
   132  hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
   133  ```
   134  
   135  
   136  ## Lexical elements
   137  
   138  ### Comments
   139  
   140  Comments serve as program documentation.
   141  CUE supports line comments that start with the character sequence `//`
   142  and stop at the end of the line.
   143  
   144  A comment cannot start inside a string literal or inside a comment.
   145  A comment acts like a newline.
   146  
   147  
   148  ### Tokens
   149  
   150  Tokens form the vocabulary of the CUE language. There are four classes:
   151  identifiers, keywords, operators and punctuation, and literals. White space,
   152  formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
   153  (U+000D), and newlines (U+000A), is ignored except as it separates tokens that
   154  would otherwise combine into a single token. Also, a newline or end of file may
   155  trigger the insertion of a comma. While breaking the input into tokens, the
   156  next token is the longest sequence of characters that form a valid token.
   157  
   158  
   159  ### Commas
   160  
   161  The formal grammar uses commas `,` as terminators in a number of productions.
   162  CUE programs may omit most of these commas using the following rules:
   163  
   164  When the input is broken into tokens, a comma is automatically inserted into
   165  the token stream immediately after a line's final token if that token is
   166  
   167  - an identifier, keyword, or bottom
   168  - a number or string literal, including an interpolation
   169  - one of the characters `)`, `]`, `}`, or `?`
   170  - an ellipsis `...`
   171  
   172  
   173  Although commas are automatically inserted, the parser will require
   174  explicit commas between two list elements.
   175  
   176  <!--
   177  TODO: remove the above exception
   178  -->
   179  
   180  To reflect idiomatic use, examples in this document elide commas using
   181  these rules.
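
As an illustration of the comma rules above, the following sketch (all field
names are hypothetical) shows where commas are inserted implicitly and where
they still have to be written:

```
// The newline after `1`, after `"x"`, and after each `}` triggers an
// implicit comma.
a: 1
b: "x"
c: {
	d: true
}

// The same style of declarations on one line needs the commas spelled out.
e: {f: 1, g: "x"}

// List elements are the exception: the commas between them are required.
list: [
	1,
	2,
	3,
]
```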
   182  
   183  
   184  ### Identifiers
   185  
   186  Identifiers name entities such as fields and aliases.
   187  An identifier is a sequence of one or more letters (which includes `_` and `$`)
   188  and digits, optionally preceded by `#` or `_#`.
   189  It may not be `_` or `$`.
   190  The first character in an identifier, or after an `#` if it contains one,
   191  must be a letter.
   192  Identifiers starting with a `#` or `_` are reserved for definitions and hidden
   193  fields.
   194  
   195  <!--
   196  TODO: allow identifiers as defined in Unicode UAX #31
   197  (https://unicode.org/reports/tr31/).
   198  
   199  Identifiers are normalized using the NFC normal form.
   200  -->
   201  
   202  ```
   203  identifier  = [ "#" | "_#" ] letter { letter | unicode_digit } .
   204  ```
   205  
   206  ```
   207  a
   208  _x9
   209  fieldName
   210  αβ
   211  ```
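
The optional `#` and `_#` prefixes and the `$` letter can be illustrated with
a few more declarations. This is a sketch only; the names are made up for the
example:

```
#Port:    int        // `#` prefix: a definition
_cache:   "tmp"      // `_` prefix: a hidden field
_#Limits: {max: 10}  // `_#` prefix: a hidden definition
$schema:  "v1"       // `$` counts as a letter
```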
   212  
   213  <!-- TODO: Allow Unicode identifiers UAX #31 http://unicode.org/reports/tr31/ -->
   214  
   215  Some identifiers are [predeclared](#predeclared-identifiers).
   216  
   217  
   218  ### Keywords
   219  
   220  CUE has a limited set of keywords.
   221  In addition, CUE reserves all identifiers starting with `__` (double underscores)
   222  as keywords.
   223  These are typically targets of pre-declared identifiers.
   224  
   225  All keywords may be used as labels (field names).
   226  Unless noted otherwise, they can also be used as identifiers to refer to
   227  the same name.
   228  
   229  
   230  #### Values
   231  
   232  The following keywords are values.
   233  
   234  ```
   235  null         true         false
   236  ```
   237  
   238  These can never be used to refer to a field of the same name.
   239  This restriction is to ensure compatibility with JSON configuration files.
   240  
   241  
   242  #### Preamble
   243  
   244  The following keywords are used at the preamble of a CUE file.
   245  After the preamble, they may be used as identifiers to refer to namesake fields.
   246  
   247  ```
   248  package      import
   249  ```
   250  
   251  
   252  #### Comprehension clauses
   253  
   254  The following keywords are used in comprehensions.
   255  
   256  ```
   257  for          in           if           let
   258  ```
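
For orientation, here is a small sketch of these keywords in use. The field
names are hypothetical, and the comprehension syntax itself is defined later
in the specification:

```
nums: [1, 2, 3, 4]

// `for ... in` iterates, `if` filters, and `let` binds an intermediate value.
big:     [for n in nums if n > 2 {n}]          // [3, 4]
doubled: [for n in nums let d = n * 2 {d}]     // [2, 4, 6, 8]
```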
   259  
   260  <!--
   261  TODO:
   262      reduce [to]
   263      order [by]
   264  -->
   265  
   266  
   267  ### Operators and punctuation
   268  
   269  The following character sequences represent operators and punctuation:
   270  
   271  ```
   272  +     &&    ==    <     =     (     )
   273  -     ||    !=    >     :     {     }
   274  *     &     =~    <=    ?     [     ]     ,
   275  /     |     !~    >=    !     _|_   ...   .
   276  ```
   277  <!--
   278  Free tokens:  ; ~ ^
   279  // To be used:
   280    @   at: associative lists.
   281  
   282  // Idea: use # instead of @ for attributes and allow then at declaration level.
   283  // This will open up the possibility of defining #! at the start of a file
   284  // without requiring special syntax. Although probably not quite.
   285   -->
   286  
   287  
   288  ### Numeric literals
   289  
   290  There are several kinds of numeric literals.
   291  
   292  ```
   293  int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
   294  decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
   295  decimals    = decimal_digit { [ "_" ] decimal_digit } .
   296  si_lit      = decimals [ "." decimals ] multiplier |
   297                "." decimals multiplier .
   298  binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
   299  hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
   300  octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
   301  multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .
   302  
   303  float_lit   = decimals "." [ decimals ] [ exponent ] |
   304                decimals exponent |
   305                "." decimals [ exponent ] .
   306  exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
   307  ```
   308  
   309  An _integer literal_ is a sequence of digits representing an integer value.
   310  An optional prefix sets a non-decimal base: `0o` for octal,
   311  `0x` or `0X` for hexadecimal, and `0b` for binary.
   312  In hexadecimal literals, letters `a … f` and `A … F` represent values 10 through 15.
   313  All integers allow interstitial underscores `_`;
   314  these have no meaning and are solely for readability.
   315  
   316  Integer literals may have an SI or IEC multiplier.
   317  Multipliers can be used with fractional numbers.
   318  When multiplying a fraction by a multiplier, the result is truncated
   319  towards zero if it is not an integer.
   320  
   321  ```
   322  42
   323  1.5G    // 1_500_000_000
   324  1.3Ki   // 1.3 * 1024 = trunc(1331.2) = 1331
   325  170_141_183_460_469_231_731_687_303_715_884_105_727
   326  0xBad_Face
   327  0o755
   328  0b0101_0001
   329  ```
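
A few more worked cases may help when reading the multiplier and prefix
rules. The values in the comments were computed by hand from the rules above:
a bare multiplier is a power of 1000, and a trailing `i` selects the
corresponding power of 1024.

```
1K       // 1_000
1Ki      // 1_024
2M       // 2_000_000
0.5K     // 500
1.5Gi    // 1.5 * 1024 * 1024 * 1024 = 1_610_612_736
0x1F     // 31
0o17     // 15
0b1010   // 10
```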
   330  
   331  A _decimal floating-point literal_ is a representation of
   332  a decimal floating-point value (a _float_).
   333  It has an integer part, a decimal point, a fractional part, and an
   334  exponent part.
   335  The integer and fractional part comprise decimal digits; the
   336  exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
   337  One of the integer part or the fractional part may be elided; one of the decimal
   338  point or the exponent may be elided.
   339  
   340  ```
   341  0.
   342  72.40
   343  072.40  // == 72.40
   344  2.71828
   345  1.e+0
   346  6.67428e-11
   347  1E6
   348  .25
   349  .12345E+5
   350  ```
   351  
   352  <!--
   353  TODO: consider allowing Exo (and up), if not followed by a sign
   354  or number. Alternatively one could only allow Ei, Yi, and Zi.
   355  -->
   356  
   357  Neither a `float_lit` nor an `si_lit` may appear after a token that is:
   358  
   359  - an identifier, keyword, or bottom
   360  - a number or string literal, including an interpolation
   361  - one of the characters `)`, `]`, `}`, `?`, or `.`.
   362  
   363  <!--
   364  So
   365  `a + 3.2Ti`  -> `a`, `+`, `3.2Ti`
   366  `a 3.2Ti`    -> `a`, `3`, `.`, `2`, `Ti`
   367  `a + .5e3`   -> `a`, `+`, `.5e3`
   368  `a .5e3`     -> `a`, `.`, `5`, `e3`.
   369  -->
   370  
   371  
   372  ### String and byte sequence literals
   373  
   374  A string literal represents a string constant obtained from concatenating a
   375  sequence of characters.
   376  A byte sequence literal represents a sequence of bytes.
   377  
   378  String and byte sequence literals are character sequences between,
   379  respectively, double and single quotes, as in `"bar"` and `'bar'`.
   380  Within the quotes, any character may appear except newline and,
   381  respectively, unescaped double or single quote.
   382  String literals may only be valid UTF-8.
   383  Byte sequences may contain any sequence of bytes.
   384  
   385  Several escape sequences allow arbitrary values to be encoded as ASCII text.
   386  An escape sequence starts with an _escape delimiter_, which is `\` by default.
   387  The escape delimiter may be altered to be `\` plus a fixed number of
   388  hash symbols `#` by padding the start and end of a string or byte sequence
   389  literal with this number of hash symbols.
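
The effect of padding a literal with hash symbols can be sketched as follows.
The comments state what the rule above implies for each literal:

```
"a\tb"        // `\t` is an escape sequence: a horizontal tab
#"a\tb"#      // `\t` is two literal characters; the escape delimiter is now `\#`
#"a\#tb"#     // `\#t` is a horizontal tab again
##"a\#tb"##   // `\#t` is literal text; the escape delimiter is now `\##`
```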
   390  
   391  <!--
   392  TODO: move these examples further up so it's evident why #" exists.
   393  	#"This is not an \(interpolation)"#
   394  	#"This is an \#(interpolation)"#
   395  	#"The sequence "\U0001F604" renders as \#U0001F604."#
   396  -->
   397  
   398  There are four ways to represent the integer value as a numeric constant: `\x`
   399  followed by exactly two hexadecimal digits; `\u` followed by exactly four
   400  hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
   401  plain backslash `\` followed by exactly three octal digits.
   402  In each case the value of the literal is the value represented by the
   403  digits in the corresponding base.
   404  Hexadecimal and octal escapes are only allowed within byte sequences
   405  (single quotes).
   406  
   407  Although these representations all result in an integer, they have different
   408  valid ranges.
   409  Octal escapes must represent a value between 0 and 255 inclusive.
   410  Hexadecimal escapes satisfy this condition by construction.
   411  The escapes `\u` and `\U` represent Unicode code points so within them
   412  some values are illegal, in particular those above `0x10FFFF`.
   413  Surrogate halves are allowed,
   414  but are translated into their non-surrogate equivalent internally.
   415  
   416  The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
   417  represent individual bytes of the resulting string; all other escapes represent
   418  the (possibly multi-byte) UTF-8 encoding of individual characters.
   419  Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
   420  value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
   421  the two bytes `0xc3 0xbf` of the UTF-8 encoding of character `U+00FF`.
   422  
   423  ```
   424  \a   U+0007 alert or bell
   425  \b   U+0008 backspace
   426  \f   U+000C form feed
   427  \n   U+000A line feed or newline
   428  \r   U+000D carriage return
   429  \t   U+0009 horizontal tab
   430  \v   U+000b vertical tab
   431  \/   U+002f slash (solidus)
   432  \\   U+005c backslash
   433  \'   U+0027 single quote  (valid escape only within single quoted literals)
   434  \"   U+0022 double quote  (valid escape only within double quoted literals)
   435  ```
   436  
   437  The escape `\(` is used as an escape for string interpolation.
   438  A `\(` must be followed by a valid CUE Expression, followed by a `)`.
   439  
   440  A backslash at the end of a line elides the line terminator that follows it.
   441  This may not escape the final newline inside a multiline string: that
   442  newline is already implicitly elided.
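
A short sketch of interpolation, using hypothetical fields `name` and `n`.
The last line shows how the escape delimiter of a hash-padded literal decides
which sequences are interpolated:

```
name: "World"
n:    3

greeting: "Hello, \(name)!"             // "Hello, World!"
summary:  "\(n) items, \(n * 2) halves" // "3 items, 6 halves"
raw:      #"\(name) stays literal, \#(name) is interpolated"#
```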
   443  
   444  All other sequences starting with a backslash are illegal inside literals.
   445  
   446  ```
   447  escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
   448  byte_value       = octal_byte_value | hex_byte_value .
   449  octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
   450  hex_byte_value   = `\` { `#` } "x" hex_digit hex_digit .
   451  little_u_value   = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
   452  big_u_value      = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
   453                             hex_digit hex_digit hex_digit hex_digit .
   454  unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
   455  interpolation    = `\` { `#` } "(" Expression ")" .
   456  
   457  string_lit       = simple_string_lit |
   458                     multiline_string_lit |
   459                     simple_bytes_lit |
   460                     multiline_bytes_lit |
   461                     `#` string_lit `#` .
   462  
   463  simple_string_lit    = `"` { unicode_value | interpolation } `"` .
   464  simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
   465  multiline_string_lit = `"""` newline
   466                               { unicode_value | interpolation | newline }
   467                               newline `"""` .
   468  multiline_bytes_lit  = "'''" newline
   469                               { unicode_value | interpolation | byte_value | newline }
   470                               newline "'''" .
   471  ```
   472  
   473  Carriage return characters (`\r`) inside string literals are discarded from
   474  the string value.
   475  
   476  ```
   477  'a\000\xab'
   478  '\007'
   479  '\377'
   480  '\xa'        // illegal: too few hexadecimal digits
   481  "\n"
   482  "\""
   483  'Hello, world!\n'
   484  "Hello, \( name )!"
   485  "日本語"
   486  "\u65e5本\U00008a9e"
   487  '\xff\u00FF'
   488  "\uD800"             // illegal: surrogate half (TODO: probably should allow)
   489  "\U00110000"         // illegal: invalid Unicode code point
   490  
   491  #"This is not an \(interpolation)"#
   492  #"This is an \#(interpolation)"#
   493  #"The sequence "\U0001F604" renders as \#U0001F604."#
   494  ```
   495  
   496  These examples all represent the same string:
   497  
   498  ```
   499  "日本語"                                 // UTF-8 input text
   500  '日本語'                                 // UTF-8 input text as byte sequence
   501  "\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
   502  "\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
   503  '\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'  // the explicit UTF-8 bytes
   504  ```
   505  
   506  If the source code represents a character as two code points, such as a
   507  combining form involving an accent and a letter, the result will appear as two
   508  code points if placed in a string literal.
   509  
   510  Strings and byte sequences have a multiline equivalent.
   511  Multiline strings are like their single-line equivalent,
   512  but allow newline characters.
   513  
   514  Multiline strings and byte sequences respectively start with
   515  a triple double quote (`"""`) or triple single quote (`'''`),
   516  immediately followed by a newline, which is discarded from the string contents.
   517  The string is closed by a matching triple quote, which must be by itself
   518  on a new line, preceded by optional whitespace.
   519  The newline preceding the closing quote is discarded from the string contents.
   520  The whitespace before a closing triple quote must appear before any non-empty
   521  line after the opening quote and will be removed from each of these
   522  lines in the string literal.
   523  A closing triple quote may not appear in the string.
   524  To include it, it suffices to escape one of the quotes.
   525  
   526  ```
   527  """
   528      lily:
   529      out of the water
   530      out of itself
   531  
   532      bass
   533      picking \
   534      bugs
   535      off the moon
   536          — Nick Virgilio, Selected Haiku, 1988
   537      """
   538  ```
   539  
   540  This represents the same string as:
   541  
   542  ```
   543  "lily:\nout of the water\nout of itself\n\n" +
   544  "bass\npicking bugs\noff the moon\n" +
   545  "    — Nick Virgilio, Selected Haiku, 1988"
   546  ```
   547  
   548  <!-- TODO: other values
   549  
   550  Support for other values:
   551  - Duration literals
   552  - regular expressions: `re("[a-z]")`
   553  -->
   554  
   555  
   556  ## Values
   557  
   558  In addition to simple values like `"hello"` and `42.0`, CUE has [structs](#structs).
   559  A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
   560  Structs are CUE's only way of building up complex values;
   561  lists, which we will see later,
   562  are defined in terms of structs.
   563  
   564  All possible values are ordered in a lattice,
   565  a partial order where every two elements have a single greatest lower bound.
   566  A value `a` is an _instance_ of a value `b`,
   567  denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
   568  that is if `a` orders before `b` in the partial order
   569  (`⊑` is _not_ a CUE operator).
   570  We also say that `b` _subsumes_ `a` in this case.
   571  In graphical terms, `b` is "above" `a` in the lattice.
   572  
   573  <!-- TODO: link to https://cuelang.org/docs/concepts/logic/ as more reading
   574  material, especially for those new to lattices
   575  -->
   576  
   577  At the top of the lattice is the single ancestor of all values, called
   578  [top](#top), denoted `_` in CUE.
   579  Every value is an instance of top.
   580  
   581  At the bottom of the lattice is the value called [bottom](#bottom), denoted `_|_`.
   582  A bottom value usually indicates an error.
   583  Bottom is an instance of every value.
   584  
   585  An _atom_ is any value whose only instances are itself and bottom.
   586  Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.
   587  
   588  A value is _concrete_ if it is either an atom, or a struct whose field values
   589  are all concrete, recursively.
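
To make the definition tangible, here are a few values (with made-up field
names) classified according to it:

```
a: {x: 1, y: "s"}        // concrete: every field holds an atom
b: {x: 1, y: {z: null}}  // concrete: a nested struct of atoms
c: {x: int}              // not concrete: `int` has many instances
d: {x: 1 | 2}            // not concrete: the disjunction is unresolved
```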
   590  
   591  CUE's values also include what we normally think of as types, like `string` and
   592  `float`.
   593  It does not distinguish between types and values:
   594  only the relationship of values in the lattice is important.
   595  Each CUE "type" subsumes the concrete values that one would normally think
   596  of as part of that type.
   597  For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
   598  `float`.
   599  In addition to `string` and `float`, CUE has `null`, `int`, `bool`, and `bytes`.
   600  We informally call these CUE's "basic types".
   601  
   602  
   603  ```
   604  false ⊑ bool
   605  true  ⊑ bool
   606  true  ⊑ true
   607  5.0   ⊑ float
   608  bool  ⊑ _
   609  _|_   ⊑ _
   610  _|_   ⊑ _|_
   611  
   612  _     ⋢ _|_
   613  _     ⋢ bool
   614  int   ⋢ bool
   615  bool  ⋢ int
   616  false ⋢ true
   617  true  ⋢ false
   618  float ⋢ 5.0
   619  5     ⋢ 6
   620  ```
   621  
   622  
   623  ### Unification
   624  
   625  The _unification_ of values `a` and `b`
   626  is defined as the greatest lower bound of `a` and `b`. (That is, the
   627  value `u` such that `u ⊑ a` and `u ⊑ b`,
   628  and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
   629  it holds that `v ⊑ u`.)
   630  Since CUE values form a lattice, the unification of two CUE values is
   631  always unique.
   632  
   633  These all follow from the definition of unification:
   634  - The unification of `a` with itself is always `a`.
   635  - The unification of values `a` and `b` where `a ⊑ b` is always `a`.
   636  - The unification of a value with bottom is always bottom.
   637  
   638  Unification in CUE is a [binary expression](#operands), written `a & b`.
   639  It is commutative, associative, and idempotent.
   640  As a consequence, order of evaluation is irrelevant, a property that is key
   641  to many of the constructs in the CUE language as well as the tooling layered
   642  on top of it.
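
The properties listed above can be checked against a few small cases. The
table is an illustrative sketch that follows directly from the definition:

```
Expression           Result
int & 5              5
string & "foo"       "foo"
5 & 5                5
_ & "x"              "x"
{a: 1} & {b: 2}      {a: 1, b: 2}
{a: int} & {a: 5}    {a: 5}
1 & 2                _|_
int & string         _|_
```

In the last two rows the operands have no common instance other than bottom,
which is how conflicting values surface as errors.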
   643  
   644  
   645  
   646  <!-- TODO: explicitly mention that disjunction is not a binary operation
   647  but a definition of a single value?-->
   648  
   649  
   650  ### Disjunction
   651  
   652  The _disjunction_ of values `a` and `b`
   653  is defined as the least upper bound of `a` and `b`.
   654  (That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
   655  and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
   656  it holds that `d ⊑ e`.)
   657  This style of disjunctions is sometimes also referred to as sum types.
   658  Since CUE values form a lattice, the disjunction of two CUE values is always unique.
   659  
   660  
   661  These all follow from the definition of disjunction:
   662  - The disjunction of `a` with itself is always `a`.
   663  - The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
   664  - The disjunction of a value `a` with bottom is always `a`.
   665  - The disjunction of two bottom values is bottom.
   666  
   667  Disjunction in CUE is a [binary expression](#operands), written `a | b`.
   668  It is commutative, associative, and idempotent.
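
As a sketch, the properties listed above play out as follows for a few
simple operands:

```
Expression         Result
"a" | "a"          "a"              a value with itself
5 | int            int              since 5 ⊑ int
"a" | _            _                since "a" ⊑ _
_|_ | "a"          "a"              bottom is dropped
"tcp" | "udp"      "tcp" | "udp"    neither operand subsumes the other
```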
   669  
   670  The unification of a disjunction with another value is equal to the disjunction
   671  composed of the unification of this value with all of the original elements
   672  of the disjunction.
   673  In other words, unification distributes over disjunction.
   674  
   675  ```
   676  (a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b
   677  ```
   678  
   679  ```
   680  Expression                Result
   681  ({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
   682  (int | string) & "foo"    "foo"
   683  ("a" | "b") & "c"         _|_
   684  ```
   685  
   686  A disjunction is _normalized_ if there is no element
   687  `a` for which there is an element `b` such that `a ⊑ b`.
   688  
   689  <!--
   690  Normalization is important, as we need to account for spurious elements
   691  For instance "tcp" | "tcp" should resolve to "tcp".
   692  
   693  Also consider
   694  
   695    ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},
   696  
   697  in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
   698  this expression is logically equivalent to {a:1} and should therefore be
   699  considered to be unambiguous and resolve to {a:1} if a concrete value is needed.
   700  
   701  For instance, in
   702  
   703    x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
   704    y: x.a // 1
   705  
   706  y should resolve to 1, and not an error.
   707  
   708  For comparison, in
   709  
   710    x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
   711    y: x.a // _|_
   712  
   713  y should be an error as x is still ambiguous before the selector is applied,
   714  even though `a` resolves to 1 in all cases.
   715  -->
   716  
   717  
   718  #### Default values
   719  
   720  Any value `v` _may_ be associated with a default value `d`,
   721  where `d` must be an instance of `v` (`d ⊑ v`).
   722  
   723  Default values are introduced by means of disjunctions.
   724  Any element of a disjunction can be _marked_ as a default
   725  by prefixing it with an asterisk `*` ([a unary expression](#operators)).
   726  Syntactically consecutive disjunctions are considered to be
   727  part of a single disjunction,
   728  whereby multiple disjuncts can be marked as default.
   729  A _marked disjunction_ is one where any of its terms are marked.
   730  So `a | b | *c | d` is a single marked disjunction of four terms,
   731  whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
   732  one of which is a marked disjunction of three terms.
   733  During unification, if all the marked disjuncts of a marked disjunction are
   734  eliminated, then the remaining unmarked disjuncts are considered as if they
   735  originated from an unmarked disjunction.
   736  <!-- TODO: this formulation should be worked out more.  -->
   737  As explained below, distinguishing the nesting of disjunctions like this
   738  is only relevant when both an outer and nested disjunction are marked.
   739  
   740  Intuitively, when an expression needs to be resolved for an operation other
   741  than unification or disjunction,
   742  non-starred elements are dropped in favor of starred ones if the starred ones
   743  do not resolve to bottom.
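
The following sketch shows this intuition at work. The field names are
hypothetical; the comments give the values the rules in this section imply:

```
port: *8080 | int
addr: "localhost:\(port)"   // "localhost:8080": interpolation resolves the default

replicas: int | *1
total:    replicas + 1      // 2

// A more specific value eliminates the default instead of conflicting with it.
level: *"info" | "debug" | "error"
level: "debug"              // level is "debug"
```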
   744  
   745  To define the unification and disjunction operation we use the notation
   746  `⟨v⟩` to denote a CUE value `v` that is not associated with a default
   747  and the notation `⟨v, d⟩` to denote a value `v` associated with a default
   748  value `d`.
   749  
   750  The rewrite rules for unifying such values are as follows:
   751  ```
   752  U0: ⟨v1⟩ & ⟨v2⟩         => ⟨v1&v2⟩
   753  U1: ⟨v1, d1⟩ & ⟨v2⟩     => ⟨v1&v2, d1&v2⟩
   754  U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
   755  ```
   756  
   757  The rewrite rules for disjoining terms of unmarked disjunctions are
   758  ```
   759  D0: ⟨v1⟩ | ⟨v2⟩         => ⟨v1|v2⟩
   760  D1: ⟨v1, d1⟩ | ⟨v2⟩     => ⟨v1|v2, d1⟩
   761  D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
   762  ```
   763  
   764  Terms of marked disjunctions are first rewritten according to the following
   765  rules:
   766  ```
   767  M0:  ⟨v⟩    => ⟨v⟩        don't introduce defaults for unmarked term
   768  M1: *⟨v⟩    => ⟨v, v⟩     introduce identical default for marked term
   769  M2: *⟨v, d⟩ => ⟨v, d⟩     keep existing defaults for marked term
   770  M3:  ⟨v, d⟩ => ⟨v⟩        strip existing defaults from unmarked term
   771  ```
   772  
   773  Note that for any marked disjunction `a`,
   774  the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.
   775  
   776  ```
   777  Expression               Value-default pair     Rules applied
   778  *"tcp" | "udp"           ⟨"tcp"|"udp", "tcp"⟩    M1, D1
   779  string | *"foo"          ⟨string, "foo"⟩         M1, D1
   780  
   781  *1 | 2 | 3               ⟨1|2|3, 1⟩              M1, D1
   782  
   783  (*1|2|3) | (1|*2|3)      ⟨1|2|3, 1|2⟩            M1, D1, D2
   784  (*1|2|3) | *(1|*2|3)     ⟨1|2|3, 2⟩              M1, M2, M3, D1, D2
   785  (*1|2|3) | (1|*2|3)&2    ⟨1|2|3, 1|2⟩            M1, D1, U1, D2
   786  
   787  (*1|2) & (1|*2)          ⟨1|2, _|_⟩              M1, D1, U2
   788  ```
   789  
   790  <!-- TODO: define and consistently use the value-default pair syntax -->
   791  
   792  The rules of subsumption for defaults can be derived from the above definitions
   793  and are as follows.
   794  
   795  ```
   796  ⟨v2, d2⟩ ⊑ ⟨v1, d1⟩  if v2 ⊑ v1 and d2 ⊑ d1
   797  ⟨v1, d1⟩ ⊑ ⟨v⟩       if v1 ⊑ v
   798  ⟨v⟩      ⊑ ⟨v1, d1⟩  if v ⊑ d1
   799  ```
   800  
   801  <!--
   802  For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.
   803  
   804  The last one is so restrictive as v could still be made more specific by
   805  associating it with a default that is not subsumed by d1.
   806  
   807  Proof:
   808    by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
   809    where the most general value is (v, v).
   810    Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
   811    from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
   812    exactly defines the boundary of this subsumption.
   813  -->
   814  
   815  <!--
   816  (non-normalized entries could also be implicitly marked, allowing writing
   817  int | 1, instead of int | *1, but that can be done in a backwards
   818  compatible way later if really desirable, as long as we require that
   819  disjunction literals be normalized).
   820  -->
   821  
   822  ```
   823  Expression                       Resolves to
   824  "tcp" | "udp"                    "tcp" | "udp"
   825  *"tcp" | "udp"                   "tcp"
   826  float | *1                       1
   827  *string | 1.0                    string
   828  (*1|2) + (2|*3)                  4
   829  
   830  (*1|2|3) | (1|*2|3)              1|2
   831  (*1|2|3) & (1|*2|3)              1|2|3 // default is _|_
   832  
   833  (* >=5 | int) & (* <=5 | int)    5
   834  
   835  (*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
   836  (*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
   837  (*"tcp"|"udp") & "tcp"           "tcp"
   838  (*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_
   839  
   840  (*true | false) & bool           true
   841  (*true | false) & (true | false) true
   842  
   843  {a: 1} | {b: 1}                  {a: 1} | {b: 1}
   844  {a: 1} | *{b: 1}                 {b:1}
   845  *{a: 1} | *{b: 1}                {a: 1} | {b: 1}
   846  ({a: 1} | {b: 1}) & {a:1}        {a:1}  | {a: 1, b: 1}
   847  ({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b:1}
   848  ```
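
As an illustration of how these rules play out in a configuration, the
following sketch uses hypothetical field names (`protocol`, `port` and
`override`). The results in the comments are derived from rules M1, D1 and U1
above, not taken from the output of a particular implementation.

```
// Hypothetical fields, for illustration only.
protocol: *"tcp" | "udp" // resolves to "tcp"
port:     *8080 | int    // resolves to 8080

override: {
    protocol: *"tcp" | "udp"
    protocol: "udp" // ⟨"tcp"|"udp", "tcp"⟩ & ⟨"udp"⟩ => ⟨"udp", _|_⟩
}
```

In `override`, the default unifies to bottom, so only the explicitly supplied
value `"udp"` remains.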
   849  
   850  
   851  ### Bottom and errors
   852  
   853  Any evaluation error in CUE results in a bottom value, represented by
   854  the token `_|_`.
   855  Bottom is an instance of every other value.
   857  
   858  Implementations may associate error strings with different instances of bottom;
   859  logically they all remain the same value.
   860  
   861  ```
   862  bottom_lit = "_|_" .
   863  ```
   864  
   865  
   866  ### Top
   867  
   868  Top is represented by the underscore character `_`, lexically an identifier.
   869  Unifying any value `v` with top results in `v` itself.
   870  
   871  ```
   872  Expr        Result
   873  _ &  5        5
   874  _ &  _        _
   875  _ & _|_      _|_
   876  _ | _|_       _
   877  ```
   878  
   879  
   880  ### Null
   881  
   882  The _null value_ is represented with the keyword `null`.
   883  It has only one parent, top, and one child, bottom.
   884  It is unordered with respect to any other value.
   885  
   886  ```
   887  null_lit   = "null" .
   888  ```
   889  
   890  ```
   891  null & 8     _|_
   892  null & _     null
   893  null & _|_   _|_
   894  ```
   895  
   896  
   897  ### Boolean values
   898  
   899  A _boolean type_ represents the set of Boolean truth values denoted by
   900  the keywords `true` and `false`.
   901  The predeclared boolean type is `bool`; it is a defined type and a separate
   902  element in the lattice.
   903  
   904  ```
   905  bool_lit = "true" | "false" .
   906  ```
   907  
   908  ```
   909  bool & true          true
   910  true & true          true
   911  true & false         _|_
   912  bool & (false|true)  false | true
   913  bool & (true|false)  true | false
   914  ```
   915  
   916  
   917  ### Numeric values
   918  
   919  The _integer type_ represents the set of all integral numbers.
   920  The _decimal floating-point type_ represents the set of all decimal floating-point
   921  numbers.
   922  They are two distinct types.
   923  Both are instances of a generic `number` type.
   924  
   925  <!--
   926  TODO: would be nice to make this a rendered diagram with Mermaid.
   927  
   928                      number
   929                     /      \
   930                  int      float
   931  -->
   932  
   933  The predeclared number, integer, and decimal floating-point types are
   934  `number`, `int` and `float`; they are defined types.
   935  <!--
   936  TODO: should we drop float? It is somewhat more precise and probably a good idea
   937  to have it in the programmatic API, but it may be confusing to have to deal
   938  with it in the language.
   939  -->
   940  
   941  A decimal floating-point literal always has type `float`;
   942  it is not an instance of `int` even if it is an integral number.
   943  
   944  Integer literals are always of type `int` and don't match type `float`.
   945  
   946  Numeric literals are exact values of arbitrary precision.
   947  If the operation permits it, numbers should be kept in arbitrary precision.
   948  
   949  Implementation restriction: although numeric values have arbitrary precision
   950  in the language, implementations may implement them using an internal
   951  representation with limited precision.
   952  That said, every implementation must:
   953  
   954  - Represent integer values with at least 256 bits.
   955  - Represent floating-point values with a mantissa of at least 256 bits and
   956  a signed binary exponent of at least 16 bits.
   957  - Give an error if unable to represent an integer value precisely.
   958  - Give an error if unable to represent a floating-point value due to overflow.
   959  - Round to the nearest representable value if unable to represent
   960  a floating-point value due to limits on precision.
   961  These requirements apply to the result of any expression except for builtin
   962  functions, for which an unusual loss of precision must be explicitly documented.
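
The distinction between integer and floating-point literals can be sketched as
follows. The field names are arbitrary, and the comments restate the typing
rules above rather than the output of a particular implementation.

```
a: 1 & int      // 1
b: 1.0 & float  // 1.0
c: 1 & number   // 1
d: 1.0 & number // 1.0
e: 1.0 & int    // _|_: a float literal is not an instance of int
f: 1 & float    // _|_: an integer literal does not match float
```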
   963  
   964  
   965  ### Strings
   966  
   967  The _string type_ represents the set of UTF-8 strings,
   968  not allowing surrogates.
   969  The predeclared string type is `string`; it is a defined type.
   970  
   971  The length of a string `s` (its size in bytes) can be discovered using
   972  the builtin function `len`.
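
Because `len` counts bytes rather than characters, the two can differ for
non-ASCII text. A small sketch, with arbitrary field names:

```
n1: len("abc")    // 3
n2: len("\u00e9") // 2: é is one code point but two bytes in UTF-8
```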
   973  
   974  
   975  ### Bytes
   976  
   977  The _bytes type_ represents the set of byte sequences.
   978  A byte sequence value is a (possibly empty) sequence of bytes.
   979  The number of bytes is called the length of the byte sequence
   980  and is never negative.
   981  The predeclared byte sequence type is `bytes`; it is a defined type.
   982  
   983  
   984  ### Bounds
   985  
   986  A _bound_, syntactically a [unary expression](#operands), defines
   987  a logically infinite disjunction of concrete values represented as a single comparison.
   988  For example, `>= 2` represents the infinite disjunction `2|3|4|5|6|7|…`.
   989  
   990  For any [comparison operator](#comparison-operators) `op` except `==`,
   991  `op a` is the disjunction of every `x` such that `x op a`.
   992  
   993  
   994  ```
   995  2 & >=2 & <=5           // 2, where 2 is either an int or float.
   996  2.5 & >=1 & <=5         // 2.5
   997  2 & >=1.0 & <3.0        // 2.0
   998  2 & >1 & <3.0           // 2.0
   999  2.5 & int & >1 & <5     // _|_
  1000  2.5 & float & >1 & <5   // 2.5
  1001  int & 2 & >1.0 & <3.0   // _|_
  1002  2.5 & >=(int & 1) & <5  // _|_
  1003  >=0 & <=7 & >=3 & <=10  // >=3 & <=7
  1004  !=null & 1              // 1
  1005  >=5 & <=5               // 5
  1006  ```
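
In practice, bounds are often combined with a type to describe a valid range.
The sketch below uses hypothetical field names; the results follow from
treating each bound as a disjunction of the values that satisfy it.

```
// port is a hypothetical constraint used only for this example.
port: int & >=1 & <=65535

http: port & 8080  // 8080
bad:  port & 70000 // _|_: 70000 does not satisfy <=65535
```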
  1007  
  1008  
  1009  ### Structs
  1010  
  1011  A _struct_ is a set of elements called _fields_, each of
  1012  which has a name, called a _label_, and a value.
  1013  
  1014  We say a label is _defined_ for a struct if the struct has a field with the
  1015  corresponding label.
  1016  The value for a label `f` of struct `a` is denoted `a.f`.
  1017  A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
  1018  defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
  1019  Note that if `a` is an instance of `b` it may have fields with labels that
  1020  are not defined for `b`.
  1021  
  1022  The (unique) struct with no fields, written `{}`, has every struct as an
  1023  instance. It can be considered the type of all structs.
  1024  
  1025  ```
  1026  {a: 1} ⊑ {}
  1027  {a: 1, b: 1} ⊑ {a: 1}
  1028  {a: 1} ⊑ {a: int}
  1029  {a: 1, b: 1.0} ⊑ {a: int, b: number}
  1030  
  1031  {} ⋢ {a: 1}
  1032  {a: 2} ⋢ {a: 1}
  1033  {a: 1} ⋢ {b: 1}
  1034  ```
  1035  
  1036  The successful unification of structs `a` and `b` is a new struct `c` which
  1037  has all fields of both `a` and `b`, where
  1038  the value of a field `f` in `c` is `a.f & b.f` if `f` is defined in both `a` and `b`,
  1039  or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
  1040  Any [references](#references) to `a` or `b`
  1041  in their respective field values need to be replaced with references to `c`.
  1042  The result of a unification is bottom (`_|_`) if any of its defined
  1043  fields evaluates to bottom, recursively.
  1044  
  1045  A struct literal may contain multiple fields with the same label,
  1046  the result of which is the unification of all those fields.
  1047  
  1048  ```
  1049  StructLit       = "{" { Declaration "," } "}" .
  1050  Declaration     = Field | Ellipsis | Embedding | LetClause | attribute .
  1051  Ellipsis        = "..." [ Expression ] .
  1052  Embedding       = Comprehension | AliasExpr .
  1053  Field           = Label ":" { Label ":" } AliasExpr { attribute } .
  1054  Label           = [ identifier "=" ] LabelExpr .
  1055  LabelExpr       = LabelName [ "?" | "!" ] | "[" AliasExpr "]" .
  1056  LabelName       = identifier | simple_string_lit | "(" AliasExpr ")" .
  1057  
  1058  attribute       = "@" identifier "(" attr_tokens ")" .
  1059  attr_tokens     = { attr_token |
  1060                      "(" attr_tokens ")" |
  1061                      "[" attr_tokens "]" |
  1062                      "{" attr_tokens "}" } .
  1063  attr_token      = /* any token except '(', ')', '[', ']', '{', or '}' */
  1064  ```
  1065  
  1066  ```
  1067  Expression                             Result
  1068  {a: int, a: 1}                         {a: 1}
  1069  {a: int} & {a: 1}                      {a: 1}
  1070  {a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
  1071  {a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}
  1072  
  1073  {a: 1} & {b: 2}                        {a: 1, b: 2}
  1074  {a: 1, b: int} & {b: 2}                {a: 1, b: 2}
  1075  
  1076  {a: 1} & {a: 2}                        _|_
  1077  ```
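
The rule about replacing references means that a field computed from a sibling
is evaluated against the unified struct, not against the struct in which it
was written. A minimal sketch, with arbitrary field names:

```
a: {
    x: int
    y: x + 1 // refers to the x of whichever struct this ends up in
}

b: a & {x: 2} // {x: 2, y: 3}
```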
  1078  
  1079  
  1080  #### Field constraints
  1081  
  1082  A struct may declare _field constraints_ which define values
  1083  that should be unified with a given field once it is defined.
  1084  The existence of a field constraint declares, but does not define, that field.
  1085  
  1086  Syntactically, a field is marked as a constraint
  1087  by following its label with an _optional_ marker `?`
  1088  or _required_ marker `!`.
  1089  These markers are not part of the field name.
  1090  
  1091  A struct that has a required field constraint with a bottom value
  1092  evaluates to bottom.
  1093  An optional field constraint with a bottom value does _not_ invalidate
  1094  the struct that contains it
  1095  as long as it is not unified with a defined field.
  1096  
  1097  The subsumption relation for fields with the various markers is defined as
  1098  ```
  1099  {a: x} ⊑ {a!: x} ⊑ {a?: x}
  1100  ```
  1101  for any given `x`.
  1102  
  1103  Implementations may error upon encountering a required field constraint
  1104  when manifesting CUE as data.
  1105  
  1106  ```
  1107  Expression                             Result
  1108  {foo?: 3} & {foo: 3}                   {foo: 3}
  1109  {foo!: 3} & {foo: 3}                   {foo: 3}
  1110  
  1111  {foo!: int} & {foo: int}               {foo:  int}
  1112  {foo!: int} & {foo?: <1}               {foo!: <1}
  1113  {foo!: int} & {foo: <=3}               {foo:  <=3}
  1114  {foo!: int} & {foo: 3}                 {foo:  3}
  1115  
  1116  {foo!: 3} & {foo: int}                 {foo: 3}
  1117  {foo!: 3} & {foo: <=4}                 {foo: 3}
  1118  
  1119  {foo?: 1} & {foo?: 2}                  {foo?: _|_} // No error
  1120  {foo?: 1} & {foo!: 2}                  _|_
  1121  {foo?: 1} & {foo: 2}                   _|_
  1122  ```
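
A typical use of these markers is to describe a schema separately from the
data that satisfies it. The following sketch assumes a hypothetical `service`
schema; the results in the comments follow the table above.

```
// service is a hypothetical schema used only for this example.
service: {
    name!: string // required: must be defined by whoever uses the schema
    port?: int    // optional: may be left out
}

full:    service & {name: "web", port: 80} // {name: "web", port: 80}
minimal: service & {name: "web"}           // {name: "web", port?: int}
partial: service & {port: 80}              // {name!: string, port: 80}
```

Here `partial` still carries the required field constraint `name!`, so an
implementation may report an error when asked to manifest it as data.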
  1123  
  1124  <!-- see https://github.com/cue-lang/proposal/blob/main/designs/1951-required-fields-v2.md -->
  1125  
  1126  <!--NOTE: About bottom values for optional fields being okay.
  1127  
  1128  The proposition ¬P is a close cousin of P → ⊥ and is often used
  1129  as an approximation to avoid the issues of using not.
  1130  Bottom (⊥) is also frequently used to mean undefined. This makes sense.
  1131  Consider `{a?: 2} & {a?: 3}`.
  1132  Both structs say `a` is optional; in other words, it may be omitted.
  1133  So we can still get a valid result by omitting `a`, even in
  1134  case of a conflict.
  1135  
  1136  Granted, this definition may lead to confusing results, especially in
  1137  definitions, when tightening an optional field leads to unintentionally
  1138  discarding it.
  1139  It could be a role of vet checkers to identify such cases (and suggest users
  1140  to explicitly use `_|_` to discard a field, for instance).
  1141  
  1142  TODO: These examples show also how field constraints interact with defaults.
  1143  Should we include this? Probably not necessary, as this is an orthogonal
  1144  concern.
  1145  ```
  1146  Expression                             Result
  1147  a: { foo?: string }                    a: { foo?: string }
  1148  b: { foo: "bar" }                      b: { foo: "bar" }
  1149  c: { foo?: *"baz" | string }           c: { foo?: *"baz" | string }
  1150  
  1151  d: a & b                               { foo: "bar" }
  1152  e: b & c                               { foo: "bar" }
  1153  f: a & c                               { foo?: *"baz" | string }
  1154  g: a & { foo?: number }                { foo?: _|_ } // This is fine
  1155  h: b & { foo?: number }                _|_
  1156  i: c & { foo: string }                 { foo: *"baz" | string }
  1157  ```
  1158  -->
  1159  
  1160  
  1161  #### Dynamic fields
  1162  
  1163  A _dynamic field_ is a field whose label is determined by
  1164  an expression wrapped in parentheses.
  1165  A dynamic field may be marked as optional or required.
  1166  
  1167  ```
  1168  Expression                             Result
  1169  a:   "foo"                             a:   "foo"
  1170  b:   "bar"                             b:   "bar"
  1171  (a): "baz"                             foo: "baz"
  1172  
  1173  (a+b): "qux"                           foobar: "qux"
  1174  
  1175  (a)?: string                           foo?: string
  1176  (b)!: string                           bar!: string
  1177  ```
  1178  
  1179  
  1180  #### Pattern and default constraints
  1181  
  1182  A struct may define constraints that apply to a collection of fields.
  1183  
  1184  A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
  1185  is a value of type string, and a value to unify with fields whose label
  1186  unifies with the pattern.
  1187  For a given struct `a` with pattern constraint `[p]: v`, `v` is unified
  1188  with any field with name `f` in `a` for which `p & f` is not bottom.
  1189  When unifying structs `a` and `b`,
  1190  any pattern constraints declared in `a` or `b`
  1191  are also declared in the result of unification.
  1192  
  1193  <!-- TODO: Update grammar and support this.
  1194  A pattern constraint with a pattern preceded by `...` indicates
  1195  the pattern can only match fields in `b` for which there
  1196  exists no field in `a` with the same label.
  1197  -->
  1198  
  1199  Additionally, a _default constraint_, denoted `...value`, defines a value
  1200  to unify with any field for which there is no other declaration in a struct.
  1201  When unifying structs `a` and `b`,
  1202  a default constraint `...v` declared in `a`
  1203  defines that the value `v` should unify with any field in the resulting struct `c`
  1204  whose label does not unify with any of the patterns of the pattern
  1205  constraints defined for `a` _and_ for which there exists no field declaration
  1206  in `a` with that label.
  1207  The token `...` is a shorthand for `..._`.
  1208  _Note_: default constraints of the form `...value` are not yet implemented.
  1209  
  1210  
  1211  ```
  1212  a: {
  1213      foo:      string  // foo is a string
  1214      [=~"^i"]: int     // all other fields starting with i are integers
  1215      [=~"^b"]: bool    // all other fields starting with b are booleans
  1216      [>"c"]:   string  // all other fields lexically after c are strings
  1217  
  1218      ...string         // all other fields must be a string. Note: default constraints are not yet implemented.
  1219  }
  1220  
  1221  b: a & {
  1222      i3:    3
  1223      bar:   true
  1224      other: "a string"
  1225  }
  1226  ```
  1227  
  1228  <!--
  1229  TODO: are these two equivalent? Rog says that maybe you'll be able to refer
  1230  to optional fields at some point, which will never make sense for patterns.
  1231  Marcel says this is already mentioned elsewhere.
  1232  
  1233  a: {
  1234  	["foo"]: int
  1235  	foo?: int
  1236  }
  1237  -->
  1238  
  1239  Concrete field labels may be an identifier or string, the latter of which may be
  1240  interpolated.
  1241  Fields with identifier labels can be referred to within the scope in which
  1242  they are defined; fields with string labels cannot.
  1243  References within such interpolated strings are resolved within
  1244  the scope of the struct in which the label sequence is
  1245  defined and can reference concrete labels lexically preceding
  1246  the label within a label sequence.
  1247  <!-- We allow this so that rewriting a CUE file to collapse or expand
  1248  field sequences has no impact on semantics.
  1249  -->
  1250  
  1251  <!--TODO: first implementation round will not yet have expression labels
  1252  
  1253  An ExpressionLabel sets a collection of optional fields to a field value.
  1254  By default it defines this value for all possible string labels.
  1255  An optional expression limits this to the set of optional fields which
  1256  labels match the expression.
  1257  -->
  1258  
  1259  
  1260  <!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->
  1261  
  1262  
  1263  <!-- NOTE:
  1264  A DefinitionDecl does not allow repeated labels. This is to avoid
  1265  any ambiguity or confusion about whether earlier path components
  1266  are to be interpreted as declarations or normal fields (they should
  1267  always be normal fields.)
  1268  -->
  1269  
  1270  <!--NOTE:
  1271  The syntax has been deliberately restricted to allow for the following
  1272  future extensions and relaxations:
  1273    - Allow omitting a "?" in an expression label to indicate a concrete
  1274      string value (but maybe we want to use () for that).
  1275    - Make the "?" in expression label optional if expression labels
  1276      are always optional.
  1277    - Or allow eliding the "?" if the expression has no references and
  1278      is obviously not concrete (such as `[string]`).
  1279    - The expression of an expression label may also indicate a struct with
  1280      integer or even number labels
  1281      (beware of imprecise computation in the latter).
  1282        e.g. `{ [int]: string }` is a map of integers to strings.
  1283    - Allow for associative lists (`foo [@.field]: {field: string}`)
  1284    - The `...` notation can be extended analogously to that of a ListLit,
  1285      by allowing it to follow with an expression for the remaining properties.
  1286      In that case it is no longer a shorthand for `[string]: _`, but rather
  1287      would define the value for any other value for which there is no field
  1288      defined.
  1289      Like the definition with List, this is somewhat odd, but it allows the
  1290      encoding of JSON schema's and (non-structural) OpenAPI's
  1291      additionalProperties and additionalItems.
  1292  -->
  1293  
  1294  ```
  1295  intMap: [string]: int
  1296  intMap: {
  1297      t1: 43
  1298      t2: 2.4  // error: 2.4 is not an integer
  1299  }
  1300  
  1301  nameMap: [string]: {
  1302      firstName: string
  1303      nickName:  *firstName | string
  1304  }
  1305  
  1306  nameMap: hank: firstName: "Hank"
  1307  ```
  1308  
  1309  The pattern constraint declared in `nameMap` matches every field,
  1310  in this case just `hank`, and unifies the associated constraint
  1311  with the matched field, resulting in:
  1312  
  1313  ```
  1314  nameMap: hank: {
  1315      firstName: "Hank"
  1316      nickName:  "Hank"
  1317  }
  1318  ```
  1319  
  1320  
  1321  #### Closed structs
  1322  
  1323  By default, structs are open to adding fields.
  1324  Instances of an open struct `p` may contain fields not defined in `p`.
  1325  This makes it easy to add fields, but can lead to bugs:
  1326  
  1327  ```
  1328  S: {
  1329      field1: string
  1330  }
  1331  
  1332  S1: S & { field2: "foo" }
  1333  
  1334  // S1 is { field1: string, field2: "foo" }
  1335  
  1336  
  1337  A: {
  1338      field1: string
  1339      field2: string
  1340  }
  1341  
  1342  A1: A & {
  1343      feild1: "foo"  // "field1" was accidentally misspelled
  1344  }
  1345  
  1346  // A1 is
  1347  //    { field1: string, field2: string, feild1: "foo" }
  1348  // not the intended
  1349  //    { field1: "foo", field2: string }
  1350  ```
  1351  
  1352  A _closed struct_ `c` is a struct whose instances may not declare any field
  1353  with a name that does not match the name of a field
  1354  or the pattern of a pattern constraint defined in `c`.
  1355  Hidden fields are excluded from this limitation.
  1356  A struct that is the result of unifying any struct with a [`...`](#structs)
  1357  declaration is defined for all regular fields.
  1358  Closing a struct is equivalent to adding `..._|_` to it.
  1359  
  1360  Syntactically, structs are closed explicitly with the `close` builtin or
  1361  implicitly and recursively by [definitions](#definitions-and-hidden-fields).
  1362  
  1363  
  1364  ```
  1365  A: close({
  1366      field1: string
  1367      field2: string
  1368  })
  1369  
  1370  A1: A & {
  1371      feild1: string
  1372  } // _|_ feild1 not defined for A
  1373  
  1374  A2: A & {
  1375      for k,v in { feild1: string } {
  1376          k: v
  1377      }
  1378  }  // _|_ feild1 not defined for A
  1379  
  1380  C: close({
  1381      [_]: _
  1382  })
  1383  
  1384  C2: C & {
  1385      for k,v in { thisIsFine: string } {
  1386          "\(k)": v
  1387      }
  1388  }
  1389  
  1390  D: close({
  1391      // Values generated by comprehensions are treated as embeddings.
  1392      for k,v in { x: string } {
  1393          "\(k)": v
  1394      }
  1395  })
  1396  ```
  1397  
  1398  <!-- (jba) Somewhere it should be said that optional fields are only
  1399       interesting inside closed structs. -->
  1400  
  1401  <!-- TODO: move embedding section to above the previous one -->
  1402  
  1403  #### Embedding
  1404  
  1405  A struct may contain an _embedded value_, an operand used as a declaration.
  1406  An embedded value of type struct is unified with the struct in which it is
  1407  embedded, but disregarding the restrictions imposed by closed structs.
  1408  So if an embedding resolves to a closed struct, the corresponding enclosing
  1409  struct will also be closed, but may have fields that are not allowed if
  1410  normal rules for closed structs were observed.
  1411  
  1412  If an embedded value is not of type struct, the struct may only have
  1413  definitions or hidden fields. Regular fields are not allowed in such a case.
  1414  
  1415  The result of `{ A }` is `A` for any `A` (including definitions).
  1416  
  1417  Syntactically, embeddings may be any expression.
  1418  
  1419  ```
  1420  S1: {
  1421      a: 1
  1422      b: 2
  1423      {
  1424          c: 3
  1425      }
  1426  }
  1427  // S1 is { a: 1, b: 2, c: 3 }
  1428  
  1429  S2: close({
  1430      a: 1
  1431      b: 2
  1432      {
  1433          c: 3
  1434      }
  1435  })
  1436  // same as close(S1)
  1437  
  1438  S3: {
  1439      a: 1
  1440      b: 2
  1441      close({
  1442          c: 3
  1443      })
  1444  }
  1445  // same as S2
  1446  ```
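
The restriction on embedded values that are not structs can be sketched as
follows. The names `s`, `#meta` and `_note` are arbitrary, and the comments
restate the rule above rather than the behavior of a particular
implementation.

```
s: {
    "a string" // embedded value that is not a struct
    #meta: 1   // allowed: definition
    _note: 2   // allowed: hidden field
    // A regular field such as `extra: 3` would not be allowed here.
}
```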
  1447  
  1448  
  1449  #### Definitions and hidden fields
  1450  
  1451  A field is a _definition_ if its identifier starts with `#` or `_#`.
  1452  A field is _hidden_ if its identifier starts with a `_`.
  1453  All other fields are _regular_.
  1454  
  1455  Definitions and hidden fields are not emitted when converting a CUE program
  1456  to data and are never required to be concrete.
  1457  
  1458  Referencing a definition will recursively [close](#closed-structs) it.
  1459  That is, a referenced definition will not unify with a struct
  1460  that would add a field anywhere within the definition that it does not
  1461  already define or explicitly allow with a pattern constraint or `...`.
  1462  [Embedding](#embedding) allows bypassing this check.
  1463  
  1464  If referencing a definition would always result in an error, implementations
  1465  may report this inconsistency at the point of its declaration.
  1466  
  1467  ```
  1468  #MyStruct: {
  1469      sub: field:    string
  1470  }
  1471  
  1472  #MyStruct: {
  1473      sub: enabled?: bool
  1474  }
  1475  
  1476  myValue: #MyStruct & {
  1477      sub: feild:   2     // error, feild not defined in #MyStruct
  1478      sub: enabled: true  // okay
  1479  }
  1480  
  1481  #D: {
  1482      #OneOf
  1483  
  1484      c: int // adds this field.
  1485  }
  1486  
  1487  #OneOf: { a: int } | { b: int }
  1488  
  1489  
  1490  D1: #D & { a: 12, c: 22 }  // { a: 12, c: 22 }
  1491  D2: #D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
  1492  ```
  1493  
  1494  
  1495  ```
  1496  #A: {a: int}
  1497  
  1498  B: {
  1499      #A
  1500      b: c: int
  1501  }
  1502  
  1503  x: B
  1504  x: d: 3  // not allowed, as closed by embedded #A
  1505  
  1506  y: B.b
  1507  y: d: 3  // allowed as nothing closes b
  1508  
  1509  #B: {
  1510      #A
  1511      b: c: int
  1512  }
  1513  
  1514  z: #B.b
  1515  z: d: 3  // not allowed, as referencing #B closes b
  1516  ```
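
The following sketch illustrates that definitions and hidden fields can be
used in computations without appearing in the resulting data. The names are
hypothetical.

```
_base:  10             // hidden field
#scale: 2              // definition
out:    _base * #scale // 20
```

When this is converted to data, only the regular field `out` is emitted.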
  1517  
  1518  
  1519  <!---
  1520  JSON fields are usually camelCase. Clashes can be avoided by adopting the
  1521  convention that definitions be TitleCase. Unexported definitions are still
  1522  subject to clashes, but those are likely easier to resolve because they are
  1523  package internal.
  1524  --->
  1525  
  1526  
  1527  #### Attributes
  1528  
  1529  Attributes allow associating meta information with values.
  1530  Their primary purpose is to define mappings between CUE and
  1531  other representations.
  1532  Attributes do not influence the evaluation of CUE.
  1533  
  1534  An attribute associates an identifier with a value, a balanced token sequence,
  1535  which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
  1536  The sequence may not contain interpolations.
  1537  
  1538  Fields, structs and packages can be associated with a set of attributes.
  1539  Attributes accumulate during unification, but implementations may remove
  1540  duplicates that have the same source string representation.
  1541  The interpretation of an attribute, including the handling of multiple
  1542  attributes for a given identifier, is up to the consumer of the attribute.
  1543  
  1544  Field attributes define additional information about a field,
  1545  such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
  1546  name of the field when mapping to a different language.
  1547  
  1548  
  1549  ```
  1550  // Package attribute
  1551  @protobuf(proto3)
  1552  
  1553  myStruct1: {
  1554      // Struct attribute:
  1555      @jsonschema(id="https://example.org/mystruct1.json")
  1556  
  1557      // Field attributes
  1558      field: string @go(Field)
  1559      attr:  int    @xml(,attr) @go(Attr)
  1560  }
  1561  
  1562  myStruct2: {
  1563      field: string @go(Field)
  1564      attr:  int    @xml(a1,attr) @go(Attr)
  1565  }
  1566  
  1567  Combined: myStruct1 & myStruct2
  1568  // field: string @go(Field)
  1569  // attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
  1570  ```
  1571  
  1572  
  1573  #### Aliases
  1574  
  1575  Aliases name values that can be referred to
  1576  within the [scope](#declarations-and-scopes) in which they are declared.
  1577  The name of an alias must be unique within its scope.
  1578  
  1579  ```
  1580  AliasExpr  = [ identifier "=" ] Expression .
  1581  ```
  1582  
  1583  Aliases can appear in several positions:
  1584  
  1585  <!--- TODO: consider allowing this. It should be considered whether
  1586  having field aliases isn't already sufficient.
  1587  
  1588  As a declaration in a struct (`X=value`):
  1589  
  1590  - binds identifier `X` to a value embedded within the struct.
  1591  --->
  1592  
  1593  In front of a Label (`X=label: value`):
  1594  
  1595  - binds the identifier to the same value as `label` would be bound
  1596    to if it were a valid identifier.
  1597  
  1598  In front of a dynamic field (`X=(label): value`):
  1599  
  1600  - binds the identifier to the same value as `label` if it were a valid
  1601    static identifier.
  1602  
  1603  In front of a dynamic field expression (`(X=expr): value`):
  1604  
  1605  - binds the identifier to the concrete label resulting from evaluating `expr`.
  1606  
  1607  In front of a pattern constraint (`X=[expr]: value`):
  1608  
  1609  - binds the identifier to the same field as the one matched by the pattern
  1610    within the instance of the field value (`value`).
  1611  
  1612  In front of a pattern constraint expression (`[X=expr]: value`):
  1613  
  1614  - binds the identifier to the concrete label that matches `expr`
  1615    within the instances of the field value (`value`).
  1616  
  1617  Before a value (`foo: X=x`):
  1618  
  1619  - binds the identifier to the value it precedes within the scope of that value.
  1620  
  1621  Before a list element (`[ X=value, X+1 ]`) (Not yet implemented):
  1622  
  1623  - binds the identifier to the list element it precedes within the scope of the
  1624    list expression.
  1625  
  1626  <!-- TODO: explain the difference between aliases and definitions.
  1627       Now that you have definitions, are aliases really necessary?
  1628       Consider removing.
  1629  -->
  1630  
  1631  ```
  1632  // A field alias
  1633  foo: X  // 4
  1634  X="not an identifier": 4
  1635  
  1636  // A value alias
  1637  foo: X={x: X.a}
  1638  bar: foo & {a: 1}  // {a: 1, x: 1}
  1639  
  1640  // A label alias
  1641  [Y=string]: { name: Y }
  1642  foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
  1643  ```
  1644  
  1645  <!-- TODO: also allow aliases as lists -->
  1646  
  1647  
  1648  #### Let declarations
  1649  
  1650  _Let declarations_ bind an identifier to an expression.
  1651  The identifier is only visible within the [scope](#declarations-and-scopes)
  1652  in which it is declared.
  1653  The identifier must be unique within its scope.
  1654  
  1655  ```
  1656  let x = expr
  1657  
  1658  a: x + 1
  1659  b: x + 2
  1660  ```
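
As a minimal sketch with a concrete expression in place of `expr` (the names `base`, `a`, and `b` are illustrative assumptions, not part of the specification):

```
// Illustrative only: base, a, and b are hypothetical names.
let base = 10

a: base + 1 // 11
b: base + 2 // 12
```

The let identifier `base` is only a local name; it is not a field, so it does not appear in the resulting struct.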
  1661  
  1662  #### Shorthand notation for nested structs
  1663  
  1664  A field whose value is a struct with a single field may be written as
  1665  a colon-separated sequence of the two field names,
  1666  followed by a colon and the value of that single field.
  1667  
  1668  ```
  1669  job: myTask: replicas: 2
  1670  ```
  1671  expands to
  1672  ```
  1673  job: {
  1674      myTask: {
  1675          replicas: 2
  1676      }
  1677  }
  1678  ```
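
Because fields with the same label unify, several shorthand declarations can contribute to the same nested struct. A minimal sketch, assuming a hypothetical `image` field added purely for illustration:

```
// Illustrative only: the image field is a hypothetical addition.
job: myTask: replicas: 2
job: myTask: image:    "nginx"

// Together these are equivalent to:
// job: {
//     myTask: {
//         replicas: 2
//         image:    "nginx"
//     }
// }
```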
  1679  
  1680  <!-- OPTIONAL FIELDS:
  1681  
  1682  The optional marker solves the issue of having to print large amounts of
  1683  boilerplate when dealing with large types with many optional or default
  1684  values (such as Kubernetes).
  1685  Writing such optional values in terms of *null | value is tedious,
  1686  unpleasant to read, and as it is not well defined what can be dropped or not,
  1687  all null values have to be emitted from the output, even if the user
  1688  doesn't override them.
  1689  Part of the issue is how null is defined. We could adopt a Typescript-like
  1690  approach of introducing "void" or "undefined" to mean "not defined and not
  1691  part of the output". But having all of null, undefined, and void can be
  1692  confusing. If these ever are introduced anyway, the ? operator could be
  1693  expressed along the lines of
  1694     foo?: bar
  1695  being a shorthand for
  1696     foo: void | bar
  1697  where void is the default if no other default is given.
  1698  
  1699  The current mechanical definition of "?" is straightforward, though, and
  1700  probably avoids the need for void, while solving a big issue.
  1701  
  1702  Caveats:
  1703  [1] this definition requires explicitly defined fields to be emitted, even
  1704  if they could be elided (for instance if the explicit value is the default
  1705  value defined an optional field). This is probably a good thing.
  1706  
  1707  [2] a default value may still need to be included in an output if it is not
  1708  the zero value for that field and it is not known if any outside system is
  1709  aware of defaults. For instance, which defaults are specified by the user
  1710  and which by the schema understood by the receiving system.
  1711  The use of "?" together with defaults should therefore be used carefully
  1712  in non-schema definitions.
  1713  Problematic cases should be easy to detect by a vet-like check, though.
  1714  
  1715  [3] It should be considered how this affects the trim command.
  1716  Should values implied by optional fields be allowed to be removed?
  1717  Probably not. This restriction is unlikely to limit the usefulness of trim,
  1718  though.
  1719  
  1720  [4] There should be an option to emit all concrete optional values.
  1721  ```
  1722  -->
  1723  
  1724  ### Lists
  1725  
  1726  A list literal defines a new value of type list.
  1727  A list may be open or closed.
  1728  An open list is indicated with a `...` at the end of an element list,
  1729  optionally followed by a value for the remaining elements.
  1730  
  1731  The length of a closed list is the number of elements it contains.
  1732  The length of an open list has the number of its elements as a lower bound
  1733  and no upper bound.
  1734  
  1735  ```
  1736  ListLit       = "[" [ ElementList [ "," ] ] "]" .
  1737  ElementList   = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] .
  1738  ```
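
The difference between open and closed lists shows up under unification. The following is a small sketch; all field names are hypothetical and chosen only for illustration:

```
// Illustrative only: all field names are hypothetical.
closed: [1, 2]          // exactly two elements
open:   [1, 2, ...]     // two or more elements, the rest unconstrained
typed:  [1, 2, ...int]  // two or more elements, the rest must be int

a: open & [1, 2, "x"]   // [1, 2, "x"]
b: typed & [1, 2, 3]    // [1, 2, 3]
c: typed & [1, 2, "x"]  // _|_ : "x" is not an int
d: closed & [1, 2, 3]   // _|_ : the list lengths differ
```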
  1739  
  1740  Lists can be thought of as structs:
  1741  
  1742  ```
  1743  List: *null | {
  1744      Elem: _
  1745      Tail: List
  1746  }
  1747  ```
  1748  
  1749  For closed lists, `Tail` is `null` for the last element; for open lists it is
  1750  `*null | List`, defaulting to the shortest variant.
  1751  For instance, the open list `[ 1, 2, ... ]` can be represented as:
  1752  ```
  1753  open: List & { Elem: 1, Tail: { Elem: 2 } }
  1754  ```
  1755  and the closed version of this list, `[ 1, 2 ]`, as
  1756  ```
  1757  closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
  1758  ```
  1759  
  1760  Using this representation, the subsumption rule for lists can
  1761  be derived from those of structs.
  1762  Implementations are not required to implement lists as structs.
  1763  The `Elem` and `Tail` fields are not special and `len` will not work as
  1764  expected in these cases.
  1765  
  1766  
  1767  ## Declarations and Scopes
  1768  
  1769  
  1770  ### Blocks
  1771  
  1772  A _block_ is a possibly empty sequence of declarations.
  1773  The braces of a struct literal `{ ... }` form a block, but there are
  1774  others as well:
  1775  
  1776  - The _universe block_ encompasses all CUE source text.
  1777  - Each [package](#modules-instances-and-packages) has a _package block_
  1778    containing all CUE source text in that package.
  1779  - Each file has a _file block_ containing all CUE source text in that file.
  1780  - Each `for` and `let` clause in a [comprehension](#comprehensions)
  1781    is considered to be its own implicit block.
  1782  
  1783  Blocks nest and influence scoping.
  1784  
  1785  
  1786  ### Declarations and scope
  1787  
  1788  A _declaration_ may bind an identifier to a field, alias, or package.
  1789  Every identifier in a program must be declared.
  1790  Other than for fields,
  1791  no identifier may be declared twice within the same block.
  1792  For fields, an identifier may be declared more than once within the same block,
  1793  resulting in a field with a value that is the result of unifying the values
  1794  of all fields with the same identifier.
  1795  String labels do not bind an identifier to the respective field.
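
The rule for repeated field declarations can be sketched as follows; the field names are hypothetical and used only for illustration:

```
// Illustrative only: a and b are hypothetical field names.
a: int
a: >=0
a: 5       // the three declarations unify: int & >=0 & 5 is 5

b: {x: 1}
b: {y: 2}  // b is {x: 1, y: 2}
```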
  1796  
  1797  The _scope_ of a declared identifier is the extent of source text in which the
  1798  identifier denotes the specified field, alias, or package.
  1799  
  1800  CUE is lexically scoped using blocks:
  1801  
  1802  1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
  1803  1. The scope of an identifier denoting a field
  1804    declared at top level (outside any struct literal) is the package block.
  1805  1. The scope of an identifier denoting an alias
  1806    declared at top level (outside any struct literal) is the file block.
  1807  1. The scope of a let identifier
  1808    declared at top level (outside any struct literal) is the file block.
  1809  1. The scope of the package name of an imported package is the file block of the
  1810    file containing the import declaration.
  1811  1. The scope of a field, alias or let identifier declared inside a struct
  1812     literal is the innermost containing block.
  1813  
  1814  An identifier declared in a block may be redeclared in an inner block.
  1815  While the identifier of the inner declaration is in scope, it denotes the entity
  1816  declared by the inner declaration.
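
A minimal sketch of redeclaration in an inner block, using hypothetical field names:

```
// Illustrative only: x, y, a, and b are hypothetical names.
x: 1
a: {
    x: 2
    y: x // 2: refers to the inner x
}
b: {
    y: x // 1: no inner x is declared, so this refers to the outer x
}
```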
  1817  
  1818  The package clause is not a declaration;
  1819  the package name does not appear in any scope.
  1820  Its purpose is to identify the files belonging to the same package
  1821  and to specify the default name for import declarations.
  1822  
  1823  
  1824  ### Predeclared identifiers
  1825  
  1826  CUE predefines a set of types and builtin functions.
  1827  For each of these there is a corresponding keyword which is the name
  1828  of the predefined identifier, prefixed with `__`.
  1829  
  1830  ```
  1831  Functions
  1832  len close and or
  1833  
  1834  Types
  1835  null      The null type and value
  1836  bool      All boolean values
  1837  int       All integral numbers
  1838  float     All decimal floating-point numbers
  1839  string    Any valid UTF-8 sequence
  1840  bytes     Any valid byte sequence
  1841  
  1842  Derived   Value
  1843  number    int | float
  1844  uint      >=0
  1845  uint8     >=0 & <=255
  1846  int8      >=-128 & <=127
  1847  uint16    >=0 & <=65535
  1848  int16     >=-32_768 & <=32_767
  1849  rune      >=0 & <=0x10FFFF
  1850  uint32    >=0 & <=4_294_967_295
  1851  int32     >=-2_147_483_648 & <=2_147_483_647
  1852  uint64    >=0 & <=18_446_744_073_709_551_615
  1853  int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
  1854  uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
  1855  int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
  1856             <=170_141_183_460_469_231_731_687_303_715_884_105_727
  1857  float32   >=-3.40282346638528859811704183484516925440e+38 &
  1858            <=3.40282346638528859811704183484516925440e+38
  1859  float64   >=-1.797693134862315708145274237317043567981e+308 &
  1860            <=1.797693134862315708145274237317043567981e+308
  1861  ```
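
Since the derived identifiers are ordinary constraints, they can be unified with values like any other expression. A short sketch, with hypothetical field names:

```
// Illustrative only: port, n, and bad are hypothetical field names.
port: uint16 & 8080  // 8080
n:    number & 1.5   // 1.5
bad:  uint8 & 300    // _|_ : 300 does not satisfy <=255
```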
  1862  
  1863  
  1864  ### Exported identifiers
  1865  
  1866  <!-- move to a more logical spot -->
  1867  
  1868  An identifier of a package may be exported to permit access to it
  1869  from another package.
  1870  All identifiers not starting with `_` (so all regular fields and definitions
  1871  starting with `#`) are exported.
  1872  Any identifier starting with `_` is not visible outside the package and resides
  1873  in a separate namespace from namesake identifiers of other packages.
  1874  
  1875  ```
  1876  package mypackage
  1877  
  1878  foo:   string  // visible outside mypackage
  1879  "bar": string  // visible outside mypackage
  1880  
  1881  #Foo: {      // visible outside mypackage
  1882      a:  1    // visible outside mypackage
  1883      _b: 2    // not visible outside mypackage
  1884  
  1885      #C: {    // visible outside mypackage
  1886          d: 4 // visible outside mypackage
  1887      }
  1888      _#E: foo // not visible outside mypackage
  1889  }
  1890  ```
  1891  
  1892  
  1893  ### Uniqueness of identifiers
  1894  
  1895  Given a set of identifiers, an identifier is called unique if it is different
  1896  from every other in the set, after applying normalization following
  1897  [Unicode Annex #31](https://unicode.org/reports/tr31/).
  1898  Two identifiers are different if they are spelled differently
  1899  or if they appear in different packages and are not exported.
  1900  Otherwise, they are the same.
  1901  
  1902  
  1903  ### Field declarations
  1904  
  1905  A field associates the value of an expression with a label within a struct.
  1906  If this label is an identifier, it binds the field to that identifier,
  1907  so the field's value can be referenced by writing the identifier.
  1908  String labels are not bound to fields.
  1909  ```
  1910  a: {
  1911      b: 2
  1912      "s": 3
  1913  
  1914      c: b   // 2
  1915      d: s   // _|_ unresolved identifier "s"
  1916      e: a.s // 3
  1917  }
  1918  ```
  1919  
  1920  If an expression may result in a value associated with a default value
  1921  as described in [default values](#default-values), the field binds to this
  1922  value-default pair.
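
As a sketch of a field bound to a value-default pair (field names are hypothetical):

```
// Illustrative only: a, b, and c are hypothetical field names.
a: int | *1  // any int, with 1 as the default
b: a + 1     // 2, using the default of a
c: a & 5     // 5: the default 1 conflicts with 5 and is dropped
```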
  1923  
  1924  
  1925  <!-- TODO: disallow creating identifiers starting with __
  1926  ...and reserve them for builtin values.
  1927  
  1928  The issue is with code generation. As no guarantee can be given that
  1929  a predeclared identifier is not overridden in one of the enclosing scopes,
  1930  code will have to handle detecting such cases and renaming them.
  1931  An alternative is to have the predeclared identifiers be aliases for namesake
  1932  equivalents starting with a double underscore (e.g. string -> __string),
  1933  allowing generated code (normal code would keep using `string`) to refer
  1934  to these directly.
  1935  -->
  1936  
  1937  
  1938  ### Let declarations
  1939  
  1940  <!--
  1941  TODO: why are there two "Let declarations" sections?
  1942  -->
  1943  
  1944  Within a struct, a let clause binds an identifier to the given expression.
  1945  
  1946  Within the scope of the identifier, the identifier refers to the
  1947  _locally declared_ expression.
  1948  The expression is evaluated in the scope in which it was declared.
  1949  
  1950  
  1951  ## Expressions
  1952  
  1953  An expression specifies the computation of a value by applying operators and
  1954  builtin functions to operands.
  1955  
  1956  Expressions that require concrete values are called _incomplete_ if any of
  1957  their operands are not concrete, but define a value that would be legal for
  1958  that expression.
  1959  Incomplete expressions may be left unevaluated until a concrete value is
  1960  requested at the application level.
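
A minimal sketch of an incomplete expression that becomes complete once its operand is made concrete (all names are hypothetical):

```
// Illustrative only: a, b, x, and y are hypothetical names.
a: {
    x: int
    y: x + 1 // incomplete: x is not yet concrete
}
b: a & {x: 2} // {x: 2, y: 3}
```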
  1961  
  1962  ### Operands
  1963  
  1964  Operands denote the elementary values in an expression.
  1965  An operand may be a literal, a (possibly qualified) identifier denoting
  1966  a field, alias, or let declaration, or a parenthesized expression.
  1967  
  1968  ```
  1969  Operand     = Literal | OperandName | "(" Expression ")" .
  1970  Literal     = BasicLit | ListLit | StructLit .
  1971  BasicLit    = int_lit | float_lit | string_lit |
  1972                null_lit | bool_lit | bottom_lit .
  1973  OperandName = identifier | QualifiedIdent .
  1974  ```
  1975  
  1976  ### Qualified identifiers
  1977  
  1978  A qualified identifier is an identifier qualified with a package name prefix.
  1979  
  1980  ```
  1981  QualifiedIdent = PackageName "." identifier .
  1982  ```
  1983  
  1984  A qualified identifier accesses an identifier in a different package,
  1985  which must be [imported](#import-declarations).
  1986  The identifier must be declared in the [package block](#blocks) of that package.
  1987  
  1988  ```
  1989  math.Sin    // denotes the Sin function in package math
  1990  ```
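
A qualified identifier is only usable after the package has been imported. A small sketch using the builtin `strings` package (the field name `shout` is a hypothetical choice):

```
import "strings"

// Illustrative only: shout is a hypothetical field name.
shout: strings.ToUpper("hello") // "HELLO"
```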
  1991  
  1992  ### References
  1993  
  1994  An identifier operand refers to a field and is called a reference.
  1995  The value of a reference is a copy of the expression associated with the field
  1996  that it is bound to,
  1997  with any references within that expression bound to the respective copies of
  1998  the fields they were originally bound to.
  1999  Implementations may use a different mechanism to evaluate as long as
  2000  these semantics are maintained.
  2001  
  2002  ```
  2003  a: {
  2004      place:    string
  2005      greeting: "Hello, \(place)!"
  2006  }
  2007  
  2008  b: a & { place: "world" }
  2009  c: a & { place: "you" }
  2010  
  2011  d: b.greeting  // "Hello, world!"
  2012  e: c.greeting  // "Hello, you!"
  2013  ```
  2014  
  2015  
  2016  
  2017  ### Primary expressions
  2018  
  2019  Primary expressions are the operands for unary and binary expressions.
  2020  
  2021  ```
  2022  PrimaryExpr =
  2023  	Operand |
  2024  	PrimaryExpr Selector |
  2025  	PrimaryExpr Index |
  2026  	PrimaryExpr Arguments .
  2027  
  2028  Selector       = "." (identifier | simple_string_lit) .
  2029  Index          = "[" Expression "]" .
  2030  Argument       = Expression .
  2031  Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
  2032  ```
  2033  <!---
  2034  TODO:
  2035  	PrimaryExpr Query |
  2036  Query          = "." Filters .
  2037  Filters        = Filter { Filter } .
  2038  Filter         = "[" [ "?" ] AliasExpr "]" .
  2039  
  2040  TODO: maybe reintroduce slices, as they are useful in queries, probably this
  2041  time with Python semantics.
  2042  	PrimaryExpr Slice |
  2043  Slice          = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .
  2044  
  2045  Argument       = Expression | ( identifier ":" Expression ).
  2046  
  2047  // & expression type
  2048  // string_lit: same as label. Arguments is current node.
  2049  // If selector is applied to list, it performs the operation for each
  2050  // element.
  2051  
  2052  TODO: consider allowing decimal_lit for selectors.
  2053  --->
  2054  
  2055  ```
  2056  x
  2057  2
  2058  (s + ".txt")
  2059  f(3.1415, true)
  2060  m["foo"]
  2061  obj.color
  2062  f.p[i].x
  2063  ```
  2064  
  2065  
  2066  ### Selectors
  2067  
  2068  For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
  2069  the selector expression
  2070  
  2071  ```
  2072  x.f
  2073  ```
  2074  
  2075  denotes the element of a <!--list or -->struct `x` identified by `f`.
  2076  <!--For structs, -->
  2077  `f` must be an identifier or a string literal identifying
  2078  any definition or regular non-optional field.
  2079  The identifier `f` is called the field selector.
  2080  
  2081  <!--
  2082  Allowing strings to be used as field selectors obviates the need for
  2083  backquoted identifiers. Note that some standards use names for structs that
  2084  are not standard identifiers (such as "Fn::Foo"). Note that indexing does not
  2085  allow access to identifiers.
  2086  -->
  2087  
  2088  <!--
  2089  For lists, `f` must be an integer and follows the same lookup rules as
  2090  for the index operation.
  2091  The type of the selector expression is the type of `f`.
  2092  -->
  2093  
  2094  If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).
  2095  
  2096  <!--
  2097  TODO: consider allowing this and also for selectors. It needs to be considered
  2098  how defaults are carried forward in cases like:
  2099  
  2100      x: { a: string | *"foo" } | *{ a: int | *4 }
  2101      y: x.a & string
  2102  
  2103  What is y in this case?
  2104     (x.a & string, _|_)
  2105     (string|"foo", _|_)
  2106     (string|"foo", "foo")
  2107  If the latter, then why?
  2108  
  2109  For a disjunction of the form `x1 | ... | xn`,
  2110  the selector is applied to each element `x1.f | ... | xn.f`.
  2111  -->
  2112  
  2113  Otherwise, if `x` is not a <!--list or -->struct,
  2114  or if `f` does not exist in `x`,
  2115  the result of the expression is bottom (an error).
  2116  In the latter case the expression is incomplete.
  2117  The operand of a selector may be associated with a default.
  2118  
  2119  ```
  2120  T: {
  2121      x:     int
  2122      y:     3
  2123      "x-y": 4
  2124  }
  2125  
  2126  a: T.x     // int
  2127  b: T.y     // 3
  2128  c: T.z     // _|_ // field 'z' not found in T
  2129  d: T."x-y" // 4
  2130  
  2131  e: {a: 1|*2} | *{a: 3|*4}
  2132  f: e.a  // 4 (default value)
  2133  ```
  2134  
  2135  <!--
  2136  ```
  2137  (v, d).f  =>  (v.f, d.f)
  2138  
  2139  e: {a: 1|*2} | *{a: 3|*4}
  2140  f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)
  2141  
  2142  ```
  2143  -->
  2144  
  2145  
  2146  ### Index expressions
  2147  
  2148  A primary expression of the form
  2149  
  2150  ```
  2151  a[x]
  2152  ```
  2153  
  2154  denotes the element of a list or struct `a` indexed by `x`.
  2155  The value `x` is called the index or field name, respectively.
  2156  The following rules apply:
  2157  
  2158  If `a` is not a struct:
  2159  
  2160  - `a` is a list (which need not be complete)
  2161  - the index `x` unified with `int` must be concrete.
  2162  - the index `x` is in range if `0 <= x < len(a)`, where only the
  2163    explicitly defined values of an open-ended list are considered,
  2164    otherwise it is out of range
  2165  
  2166  The result of `a[x]` is
  2167  
  2168  for `a` of list type:
  2169  
  2170  - the list element at index `x`, if `x` is within range
  2171  - bottom (an error), otherwise
  2172  
  2173  
  2174  for `a` of struct type:
  2175  
  2176  - the index `x` unified with `string` must be concrete.
  2177  - the value of the regular and non-optional field named `x` of struct `a`,
  2178    if this field exists
  2179  - bottom (an error), otherwise
  2180  
  2181  
  2182  ```
  2183  a: [ 1, 2 ][1]     // 2
  2184  b: [ 1, 2 ][2]     // _|_
  2185  c: [ 1, 2, ...][2] // _|_
  2186  
  2187  // Defaults are selected for both operand and index:
  2188  x: [1, 2] | *[3, 4]
  2189  y: int | *1
  2190  z: x[y]  // 4
  2191  ```
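
The rules for struct operands can be sketched in the same way; the names below are hypothetical and serve only as an illustration:

```
// Illustrative only: s, k, v, w, and u are hypothetical names.
s: {a: 1, "x-y": 2}
k: "x-y"

v: s[k]   // 2
w: s["a"] // 1
u: s["z"] // _|_ : s has no field named "z"
```

Indexing with a string is useful for labels such as `"x-y"` that are not valid identifiers, or when the field name is itself computed.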
  2192  
  2193  ### Operators
  2194  
  2195  Operators combine operands into expressions.
  2196  
  2197  ```
  2198  Expression = UnaryExpr | Expression binary_op Expression .
  2199  UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .
  2200  
  2201  binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op  .
  2202  rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
  2203  add_op     = "+" | "-" .
  2204  mul_op     = "*" | "/" .
  2205  unary_op   = "+" | "-" | "!" | "*" | rel_op .
  2206  ```
  2207  
  2208  Comparisons are discussed [elsewhere](#comparison-operators).
  2209  For any binary operator, the operand types must unify.
  2210  
  2211  <!-- TODO: durations
  2212   unless the operation involves durations.
  2213  
  2214  Except for duration operations, if one operand is an untyped [literal] and the
  2215  other operand is not, the constant is [converted] to the type of the other
  2216  operand.
  2217  -->
  2218  
  2219  <!--
  2220  Operands of unary and binary expressions may be associated with a default using
  2221  the following:
  2222  
  2223  ```
  2224  O1: op (v1, d1)          => (op v1, op d1)
  2225  
  2226  O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
  2227  and because v => (v, v)
  2228  O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
  2229  O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
  2230  ```
  2231  
  2232  ```
  2233  Field               Resulting Value-Default pair
  2234  a: *1|2             (1|2, 1)
  2235  b: -a               (-a, -1)
  2236  
  2237  c: a + 2            (a+2, 3)
  2238  d: a + a            (a+a, 2)
  2239  ```
  2240  -->
  2241  
  2242  #### Operator precedence
  2243  
  2244  Unary operators have the highest precedence.
  2245  
  2246  There are seven precedence levels for binary operators.
  2247  Multiplication operators bind strongest, followed by
  2248  addition operators, comparison operators,
  2249  `&&` (logical AND), `||` (logical OR), `&` (unification),
  2250  and finally `|` (disjunction):
  2251  
  2252  ```
  2253  Precedence    Operator
  2254      7             *  /
  2255      6             +  -
  2256      5             ==  !=  <  <=  >  >= =~ !~
  2257      4             &&
  2258      3             ||
  2259      2             &
  2260      1             |
  2261  ```
  2262  
  2263  Binary operators of the same precedence associate from left to right.
  2264  For instance, `x / y * z` is the same as `(x / y) * z`.
  2265  
  2266  ```
  2267  +x
  2268  23 + 3*x[i]
  2269  x <= f()
  2270  f() || g()
  2271  x == y+1 && y == z-1
  2272  2 | int
  2273  { a: 1 } & { b: 2 }
  2274  ```
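
To make the grouping explicit, the following sketch pairs a few expressions with their parenthesized equivalents (field names are hypothetical):

```
// Illustrative only: a, b, c, and d are hypothetical field names.
a: 1 + 2*3           // 7: same as 1 + (2*3)
b: 10 - 4 - 3        // 3: same as (10 - 4) - 3
c: 1 < 2 && 2 < 3    // true: same as (1 < 2) && (2 < 3)
d: int & >0 | string // same as (int & >0) | string
```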
  2275  
  2276  #### Arithmetic operators
  2277  
  2278  Arithmetic operators apply to numeric values and yield a result of the same type
  2279  as the first operand. The four standard arithmetic operators
  2280  `(+, -, *, /)` apply to integer and decimal floating-point types;
  2281  `+` and `*` also apply to strings and bytes.
  2282  
  2283  ```
  2284  +    sum                    integers, floats, strings, bytes
  2285  -    difference             integers, floats
  2286  *    product                integers, floats, strings, bytes
  2287  /    quotient               integers, floats
  2288  ```
  2289  
  2290  For any operator that accepts operands of type `float`, any operand may be
  2291  of type `int` or `float`, in which case the result will be `float`
  2292  if it cannot be represented as an `int` or if any of the operands are `float`,
  2293  or `int` otherwise.
  2294  So the result of `1 / 2` is `0.5` and is of type `float`.
  2295  
  2296  The result of division by zero is bottom (an error).
  2297  <!-- TODO: consider making it +/- Inf -->
  2298  Integer division is implemented through the builtin functions
  2299  `quo`, `rem`, `div`, and `mod`.
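
As a sketch of how `/` differs from the integer division builtins (field names are hypothetical; `quo` and `rem` truncate toward zero, while `div` and `mod` use Euclidean division):

```
// Illustrative only: a through e are hypothetical field names.
a: 7 / 2       // 3.5
b: quo(-5, 3)  // -1
c: rem(-5, 3)  // -2
d: div(-5, 3)  // -2
e: mod(-5, 3)  // 1
```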
  2300  
  2301  The unary operators `+` and `-` are defined for numeric values as follows:
  2302  
  2303  ```
  2304  +x                          is 0 + x
  2305  -x    negation              is 0 - x
  2306  ```
  2307  
  2308  #### String operators
  2309  
  2310  Strings can be concatenated using the `+` operator:
  2311  ```
  2312  s: "hi " + name + " and good bye"
  2313  ```
  2314  String addition creates a new string by concatenating the operands.
  2315  
  2316  A string can be repeated by multiplying it:
  2317  
  2318  ```
  2319  s: "etc. "*3  // "etc. etc. etc. "
  2320  ```
  2321  
  2322  <!-- jba: Do these work for byte sequences? If not, why not? -->
  2323  
  2324  
  2325  #### Comparison operators
  2326  
  2327  Comparison operators compare two operands and yield an untyped boolean value.
  2328  
  2329  ```
  2330  ==    equal
  2331  !=    not equal
  2332  <     less
  2333  <=    less or equal
  2334  >     greater
  2335  >=    greater or equal
  2336  =~    matches regular expression
  2337  !~    does not match regular expression
  2338  ```
  2339  
  2340  <!-- regular expression operator inspired by Bash, Perl, and Ruby. -->
  2341  
  2342  In any comparison, the types of the two operands must unify or one of the
  2343  operands must be null.
  2344  
  2345  The equality operators `==` and `!=` apply to operands that are comparable.
  2346  The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
  2347  The matching operators `=~` and `!~` apply to a string and a regular
  2348  expression operand.
  2349  These terms and the result of the comparisons are defined as follows:
  2350  
  2351  - Null is comparable with itself and any other type.
  2352    Two null values are always equal, null is unequal with anything else.
  2353  - Boolean values are comparable.
  2354    Two boolean values are equal if they are either both true or both false.
  2355  - Integer values are comparable and ordered, in the usual way.
  2356  - Floating-point values are comparable and ordered, as per the definitions
  2357    for binary coded decimals in the IEEE-754-2008 standard.
  2358  - Floating-point numbers may be compared with integers.
  2359  - String and bytes values are comparable and ordered lexically byte-wise.
  2360  - Structs are not comparable.
  2361  - Lists are not comparable.
  2362  - The regular expression syntax is the one accepted by RE2,
  2363    described in https://github.com/google/re2/wiki/Syntax,
  2364    except for `\C`.
  2365  - `s =~ r` is true if `s` matches the regular expression `r`.
  2366  - `s !~ r` is true if `s` does not match regular expression `r`.
  2367  
  2368  <!--- TODO: consider the following
  2369  - For regular expression, named capture groups are interpreted as CUE references
  2370    that must unify with the strings matching this capture group.
  2371  --->
  2372  <!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
  2373  <!-- Consider implementing Level 2 of Unicode regular expression. -->
  2374  
  2375  ```
  2376  3 < 4       // true
  2377  3 < 4.0     // true
  2378  null == 2   // false
  2379  null != {}  // true
  2380  {} == {}    // _|_: structs are not comparable against structs
  2381  
  2382  "Wild cats" =~ "cat"   // true
  2383  "Wild cats" !~ "dog"   // true
  2384  
  2385  "foo" =~ "^[a-z]{3}$"  // true
  2386  "foo" =~ "^[a-z]{4}$"  // false
  2387  ```
  2388  
  2389  <!-- jba
  2390  I think I know what `3 < a` should mean if
  2391  
  2392      a: >=1 & <=5
  2393  
  2394  It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.
  2395  
  2396  But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
  2397  -->
  2398  
  2399  #### Logical operators
  2400  
  2401  Logical operators apply to boolean values and yield a result of the same type
  2402  as the operands. The right operand is evaluated conditionally.
  2403  
  2404  ```
  2405  &&    conditional AND    p && q  is  "if p then q else false"
  2406  ||    conditional OR     p || q  is  "if p then true else q"
  2407  !     NOT                !p      is  "not p"
  2408  ```
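
A minimal sketch of the three logical operators applied to concrete values (field names are hypothetical):

```
// Illustrative only: p, q, a, b, and c are hypothetical field names.
p: true
q: false

a: p && q // false
b: p || q // true
c: !p     // false
```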
  2409  
  2410  
  2411  <!--
  2412  ### TODO TODO TODO
  2413  
  2414  3.14 / 0.0   // illegal: division by zero
  2415  Illegal conversions always apply to CUE.
  2416  
  2417  Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
  2418  -->
  2419  
  2420  <!--- TODO(mpvl): conversions
  2421  ### Conversions
  2422  Conversions are expressions of the form `T(x)` where `T` and `x` are
  2423  expressions.
  2424  The result is always an instance of `T`.
  2425  
  2426  ```
  2427  Conversion = Expression "(" Expression [ "," ] ")" .
  2428  ```
  2429  --->
  2430  <!---
  2431  
  2432  A literal value `x` can be converted to type T if `x` is representable by a
  2433  value of `T`.
  2434  
  2435  As a special case, an integer literal `x` can be converted to a string type
  2436  using the same rule as for non-constant x.
  2437  
  2438  Converting a literal yields a typed value as result.
  2439  
  2440  ```
  2441  uint(iota)               // iota value of type uint
  2442  float32(2.718281828)     // 2.718281828 of type float32
  2443  complex128(1)            // 1.0 + 0.0i of type complex128
  2444  float32(0.49999999)      // 0.5 of type float32
  2445  float64(-1e-1000)        // 0.0 of type float64
  2446  string('x')              // "x" of type string
  2447  string(0x266c)           // "♬" of type string
  2448  MyString("foo" + "bar")  // "foobar" of type MyString
  2449  string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
  2450  (*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
  2451  int(1.2)                 // illegal: 1.2 cannot be represented as an int
  2452  string(65.0)             // illegal: 65.0 is not an integer constant
  2453  ```
  2454  --->
  2455  <!---
  2456  
  2457  A conversion is always allowed if `x` is an instance of `T`.
  2458  
  2459  If `T` and `x` are of different underlying type, a conversion is allowed if
  2460  `x` can be converted to a value `x'` of `T`'s type, and
  2461  `x'` is an instance of `T`.
  2462  A value `x` can be converted to the type of `T` in any of these cases:
  2463  
  2464  - `x` is a struct and is subsumed by `T`.
  2465  - `x` and `T` are both integer or floating points.
  2466  - `x` is an integer or a byte sequence and `T` is a string.
  2467  - `x` is a string and `T` is a byte sequence.
  2468  
  2469  Specific rules apply to conversions between numeric types, structs,
  2470  or to and from a string type. These conversions may change the representation
  2471  of `x`.
  2472  All other conversions only change the type but not the representation of `x`.
  2473  
  2474  
  2475  #### Conversions between numeric ranges
  2476  For the conversion of numeric values, the following rules apply:
  2477  
  2478  1. Any integer value can be converted into any other integer value
  2479     provided that it is within range.
  2480  2. When converting a decimal floating-point number to an integer, the fraction
  2481     is discarded (truncation towards zero). TODO: or disallow truncating?
  2482  
  2483  ```
  2484  a: uint16(int(1000))  // uint16(1000)
  2485  b: uint8(1000)        // _|_ // overflow
  2486  c: int(2.5)           // 2  TODO: TBD
  2487  ```
  2488  
  2489  
  2490  #### Conversions to and from a string type
  2491  
  2492  Converting a list of bytes to a string type yields a string whose successive
  2493  bytes are the elements of the list.
  2494  Invalid UTF-8 is converted to `"\uFFFD"`.
  2495  
  2496  ```
  2497  string('hell\xc3\xb8')   // "hellø"
  2498  string(bytes([0x20]))    // " "
  2499  ```
  2500  
  2501  A string value is always convertible to a list of bytes.
  2502  
  2503  ```
  2504  bytes("hellø")   // 'hell\xc3\xb8'
  2505  bytes("")        // ''
  2506  ```
  2507  
  2508  #### Conversions between list types
  2509  
  2510  Conversions between list types are possible only if `T` strictly subsumes `x`
  2511  and the result will be the unification of `T` and `x`.
  2512  
  2513  If we introduce named types this would be different from IP & [10, ...]
  2514  
  2515  Consider removing this until it has a different meaning.
  2516  
  2517  ```
  2518  IP:        4*[byte]
  2519  Private10: IP([10, ...])  // [10, byte, byte, byte]
  2520  ```
  2521  
  2522  #### Conversions between struct types
  2523  
  2524  A conversion from `x` to `T`
  2525  is applied using the following rules:
  2526  
  2527  1. `x` must be an instance of `T`,
  2528  2. all fields defined for `x` that are not defined for `T` are removed from
  2529    the result of the conversion, recursively.
  2530  
  2531  <!-- jba: I don't think you say anywhere that the matching fields are unified.
  2532  mpvl: they are not, x must be an instance of T, in which case x == T&x,
  2533  so unification would be unnecessary.
  2534  -->
  2535  <!--
  2536  ```
  2537  T: {
  2538      a: { b: 1..10 }
  2539  }
  2540  
  2541  x1: {
  2542      a: { b: 8, c: 10 }
  2543      d: 9
  2544  }
  2545  
  2546  c1: T(x1)             // { a: { b: 8 } }
  2547  c2: T({})             // _|_  // missing field 'a' in '{}'
  2548  c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
  2549  ```
  2550  -->
  2551  
  2552  ### Calls
  2553  
  2554  Calls can be made to core library functions, called builtins.
  2555  Given an expression `f` of function type F,
  2556  ```
  2557  f(a1, a2, … an)
  2558  ```
  2559  calls `f` with arguments `a1, a2, … an`. Arguments must be expressions
  2560  whose values are instances of the corresponding parameter types of `F`;
  2561  they are evaluated before the function is called.
  2562  
  2563  ```
  2564  a: math.Atan2(x, y)
  2565  ```
  2566  
  2567  In a function call, the function value and arguments are evaluated in the usual
  2568  order.
  2569  After they are evaluated, the parameters of the call are passed by value
  2570  to the function and the called function begins execution.
  2571  The return parameters
  2572  of the function are passed by value back to the calling function when the
  2573  function returns.
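
As an illustration of the call syntax, the following sketch calls two functions from the builtin `strings` package. The field names are chosen for this example only.

```
import "strings"

// Arguments are evaluated first, then passed by value to the builtin.
upper:  strings.ToUpper("cue")              // "CUE"
joined: strings.Join(["a", "b", "c"], "-")  // "a-b-c"
```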
  2574  
  2575  
  2576  ### Comprehensions
  2577  
  2578  Lists and fields can be constructed using comprehensions.
  2579  
  2580  Comprehensions define a clause sequence that consists of a sequence of
  2581  `for`, `if`, and `let` clauses, nesting from left to right.
  2582  The sequence must start with a `for` or `if` clause.
  2583  The `for` and `let` clauses each define a new scope in which new values are
  2584  bound to be available for the next clause.
  2585  
  2586  The `for` clause binds the defined identifiers, on each iteration, to the next
  2587  value of some iterable value in a new scope.
  2588  A `for` clause may bind one or two identifiers.
  2589  If there is one identifier, it binds it to the value of
  2590  a list element or struct field value.
  2591  If there are two identifiers, the first value will be the key or index,
  2592  if available, and the second will be the value.
  2593  
  2594  For lists, `for` iterates over all elements in the list after closing it.
  2595  For structs, `for` iterates over all non-optional regular fields.
  2596  
  2597  An `if` clause, or guard, specifies an expression that terminates the current
  2598  iteration if it evaluates to false.
  2599  
  2600  The `let` clause binds the result of an expression to the defined identifier
  2601  in a new scope.
  2602  
  2603  A current iteration is said to complete if the innermost block of the clause
  2604  sequence is reached.
  2605  Syntactically, the comprehension value is a struct.
  2606  A comprehension can generate non-struct values by embedding such values within
  2607  this struct.
  2608  
  2609  Within lists, the values yielded by a comprehension are inserted in the list
  2610  at the position of the comprehension.
  2611  Within structs, the values yielded by a comprehension are embedded within the
  2612  struct.
  2613  Both structs and lists may contain multiple comprehensions.
  2614  
  2615  ```
  2616  Comprehension       = Clauses StructLit .
  2617  
  2618  Clauses             = StartClause { [ "," ] Clause } .
  2619  StartClause         = ForClause | GuardClause .
  2620  Clause              = StartClause | LetClause .
  2621  ForClause           = "for" identifier [ "," identifier ] "in" Expression .
  2622  GuardClause         = "if" Expression .
  2623  LetClause           = "let" identifier "=" Expression .
  2624  ```
  2625  
  2626  ```
  2627  a: [1, 2, 3, 4]
  2628  b: [for x in a if x > 1 { x+1 }]  // [3, 4, 5]
  2629  
  2630  c: {
  2631      for x in a
  2632      if x < 4
  2633      let y = 1 {
  2634          "\(x)": x + y
  2635      }
  2636  }
  2637  d: { "1": 2, "2": 3, "3": 4 }
  2638  ```
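
The two-identifier form of `for` and a comprehension that starts with `if` can be sketched as follows. The field names (`ports`, `byPort`, `indexed`, `debug`, `settings`) are made up for this illustration.

```
ports: {http: 80, https: 443}

// Two identifiers over a struct: the first is the field name, the second the value.
byPort: {
    for name, port in ports {
        "\(port)": name
    }
}
// byPort: { "80": "http", "443": "https" }

// Two identifiers over a list: the first is the index.
indexed: [for i, v in ["a", "b"] { "\(i)-\(v)" }]  // ["0-a", "1-b"]

// A clause sequence may start with a guard instead of a `for` clause.
debug: true
settings: {
    if debug {
        logLevel: "verbose"
    }
}
// settings: { logLevel: "verbose" }
```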
  2639  
  2640  
  2641  ### String interpolation
  2642  
  2643  String interpolation allows constructing strings by replacing placeholder
  2644  expressions with their string representation.
  2645  String interpolation may be used in single- and double-quoted strings, as well
  2646  as their multiline equivalent.
  2647  
  2648  A placeholder consists of `\(` followed by an expression and `)`.
  2649  The expression is evaluated in the scope within which the string is defined.
  2650  
  2651  The result of the expression is substituted as follows:
  2652  - string: as is
  2653  - bool: the JSON representation of the bool
  2654  - number: a JSON representation of the number that preserves the
  2655    precision of the underlying binary coded decimal
  2656  - bytes: as is if substituted within single quotes; otherwise
  2657    converted to valid UTF-8, replacing each
  2658    maximal subpart of an ill-formed subsequence with a single
  2659    replacement character (W3C encoding standard)
  2660  - list: illegal
  2661  - struct: illegal
  2662  
  2663  
  2664  ```
  2665  a: "World"
  2666  b: "Hello \( a )!" // Hello World!
  2667  ```
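
A slightly larger sketch, with field names invented for this example, shows substitution of string, number, and bool values, and interpolation in a multiline string.

```
name:  "CUE"
count: 3
ok:    true

// string is inserted as is; number and bool use their JSON representation.
msg: "\(name) has \(count) items, ok=\(ok)"  // "CUE has 3 items, ok=true"

// Interpolation also works in multiline strings; this evaluates to "Hello CUE!".
multi: """
    Hello \(name)!
    """
```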
  2668  
  2669  
  2670  ## Builtin Functions
  2671  
  2672  Builtin functions are predeclared. They are called like any other function.
  2673  
  2674  
  2675  ### `len`
  2676  
  2677  The builtin function `len` takes arguments of various types and returns
  2678  a result of type int.
  2679  
  2680  ```
  2681  Argument type    Result
  2682  
  2683  bytes, string    length of byte sequence
  2684  list             list length, smallest length for an open list
  2685  struct           number of distinct data fields, excluding field constraints
  2686  ```
  2687  <!-- TODO: consider not supporting len, but instead rely on more
  2688  precisely named builtin functions:
  2689    - strings.RuneLen(x)
  2690    - bytes.Len(x)  // x may be a string
  2691    - struct.NumFooFields(x)
  2692    - list.Len(x)
  2693  -->
  2694  
  2695  ```
  2696  Expression           Result
  2697  len("Hellø")         6
  2698  len([1, 2, 3])       3
  2699  len([1, 2, ...])     2
  2700  ```
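
The remaining rows of the table can be illustrated in the same way. This is a sketch with made-up field names; the results follow from the rules stated above.

```
n1: len('Hellø')       // 6: length of the byte sequence (ø takes two bytes)
n2: len({a: 1, b: 2})  // 2: two distinct data fields
n3: len([...int])      // 0: smallest length for an open list
```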
  2701  
  2702  
  2703  ### `close`
  2704  
  2705  The builtin function `close` converts a partially defined, or open, struct
  2706  to a fully defined, or closed, struct.
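
A minimal sketch of the effect of `close`, using field names chosen for illustration: once a struct is closed, unifying it with a struct that adds a new field fails.

```
s: close({x: int})

ok:  s & {x: 1}  // { x: 1 }
bad: s & {y: 2}  // _|_ // field y is not allowed in the closed struct
```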
  2707  
  2708  
  2709  ### `and`
  2710  
  2711  The builtin function `and` takes a list and returns the result of applying
  2712  the `&` operator to all elements in the list.
  2713  It returns top for the empty list.
  2714  
  2715  ```
  2716  Expression           Result
  2717  and([a, b])          a & b
  2718  and([a])             a
  2719  and([])              _
  2720  ```
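
One possible use of `and` is to combine a list of constraints into a single value. The names `#Port` and `p` are hypothetical.

```
#Port: and([int, >0, <65536])  // int & >0 & <65536
p:     #Port & 8080            // 8080
```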
  2721  
  2722  ### `or`
  2723  
  2724  The builtin function `or` takes a list and returns the result of applying
  2725  the `|` operator to all elements in the list.
  2726  It returns bottom for the empty list.
  2727  
  2728  ```
  2729  Expression           Result
  2730  or([a, b])           a | b
  2731  or([a])              a
  2732  or([])               _|_
  2733  ```
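
One possible use of `or` is to derive an enumeration from a list of allowed values. The names `#Names`, `#Color`, and `c` are hypothetical.

```
#Names: ["red", "green", "blue"]
#Color: or(#Names)       // "red" | "green" | "blue"
c:      #Color & "green" // "green"
```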
  2734  
  2735  ### `div`, `mod`, `quo` and `rem`
  2736  
  2737  For two integer values `x` and `y`,
  2738  the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
  2739  implement Euclidean division and
  2740  satisfy the following relationship:
  2741  
  2742  ```
  2743  r = x - y*q  with 0 <= r < |y|
  2744  ```
  2745  where `|y|` denotes the absolute value of `y`.
  2746  
  2747  ```
  2748   x     y   div(x, y)  mod(x, y)
  2749   5     3        1          2
  2750  -5     3       -2          1
  2751   5    -3       -1          2
  2752  -5    -3        2          1
  2753  ```
  2754  
  2755  For two integer values `x` and `y`,
  2756  the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
  2757  implement truncated division and
  2758  satisfy the following relationship:
  2759  
  2760  ```
  2761  x = q*y + r  and  |r| < |y|
  2762  ```
  2763  
  2764  with `quo(x, y)` truncated towards zero.
  2765  
  2766  ```
  2767   x     y   quo(x, y)  rem(x, y)
  2768   5     3        1          2
  2769  -5     3       -1         -2
  2770   5    -3       -1          2
  2771  -5    -3        1         -2
  2772  ```
  2773  
  2774  A zero divisor in either case results in bottom (an error).
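
The difference between the two pairs shows up only for negative operands. The following sketch, with field names chosen for this example, evaluates all four builtins for `x = -7` and `y = 2` and checks the stated relationships.

```
a: div(-7, 2)  // -4 (Euclidean quotient)
b: mod(-7, 2)  //  1 (0 <= b < |2|)
c: quo(-7, 2)  // -3 (truncated towards zero)
d: rem(-7, 2)  // -1 (|d| < |2|)

// Both pairs reconstruct x from the quotient and remainder.
euclidean: -7 == 2*a + b  // true
truncated: -7 == c*2 + d  // true
```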
  2775  
  2776  
  2777  ## Cycles
  2778  
  2779  Implementations are required to interpret or reject cycles encountered
  2780  during evaluation according to the rules in this section.
  2781  
  2782  
  2783  ### Reference cycles
  2784  
  2785  A _reference cycle_ occurs if a field references itself, either directly or
  2786  indirectly.
  2787  
  2788  ```
  2789  // x references itself
  2790  x: x
  2791  
  2792  // indirect cycles
  2793  b: c
  2794  c: d
  2795  d: b
  2796  ```
  2797  
  2798  Implementations should treat these as `_`.
  2799  Two particular cases are discussed below.
  2800  
  2801  
  2802  #### Expressions that unify an atom with an expression
  2803  
  2804  An expression of the form `a & e`, where `a` is an atom
  2805  and `e` is an expression, always evaluates to `a` or bottom.
  2806  As it does not matter how we fail, we can assume the result to be `a`
  2807  and postpone validating `a == e` until after all references
  2808  in `e` have been resolved.
  2809  
  2810  ```
  2811  // Config            Evaluates to (requiring concrete values)
  2812  x: {                  x: {
  2813      a: b + 100            a: _|_ // cycle detected
  2814      b: a - 100            b: _|_ // cycle detected
  2815  }                     }
  2816  
  2817  y: x & {              y: {
  2818      a: 200                a: 200 // asserted that 200 == b + 100
  2819                            b: 100
  2820  }                     }
  2821  ```
  2822  
  2823  
  2824  #### Field values
  2825  
  2826  A field value of the form `r & v`,
  2827  where `r` evaluates to a reference cycle and `v` is a concrete value,
  2828  evaluates to `v`.
  2829  Unification is idempotent and unifying a value with itself ad infinitum,
  2830  which is what the cycle represents, results in this value.
  2831  Implementations should detect cycles of this kind, ignore `r`,
  2832  and take `v` as the result of unification.
  2833  
  2834  <!-- Tomabechi's graph unification algorithm
  2835  can detect such cycles at near-zero cost. -->
  2836  
  2837  ```
  2838  Configuration    Evaluated
  2839  //    c           Cycles in nodes of type struct evaluate
  2840  //  ↙︎   ↖         to the fixed point of unifying their
  2841  // a  →  b        values ad infinitum.
  2842  
  2843  a: b & { x: 1 }   // a: { x: 1, y: 2, z: 3 }
  2844  b: c & { y: 2 }   // b: { x: 1, y: 2, z: 3 }
  2845  c: a & { z: 3 }   // c: { x: 1, y: 2, z: 3 }
  2846  
  2847  // resolve a             b & {x:1}
  2848  // substitute b          c & {y:2} & {x:1}
  2849  // substitute c          a & {z:3} & {y:2} & {x:1}
  2850  // eliminate a (cycle)   {z:3} & {y:2} & {x:1}
  2851  // simplify              {x:1,y:2,z:3}
  2852  ```
  2853  
  2854  This rule also applies to field values that are disjunctions of unification
  2855  operations of the above form.
  2856  
  2857  ```
  2858  a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
  2859  b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
  2860  c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}
  2861  
  2862  
  2863  // resolving a           b&{x:1} | {y:1}
  2864  // substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
  2865  // simplify              c&{z:2}&{x:1} | {y:1}
  2866  // substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
  2867  // simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
  2868  // eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
  2869  // expand                {x:1,y:3,z:2} | {y:1}
  2870  ```
  2871  
  2872  Note that all nodes that together form a reference cycle yielding a struct
  2873  evaluate to the same value.
  2874  If a field value is a disjunction, any element that is part of a cycle will
  2875  evaluate to this value.
  2876  
  2877  
  2878  ### Structural cycles
  2879  
  2880  A _structural cycle_ occurs when a node references one of its ancestor nodes.
  2881  It is possible to construct a structural cycle by unifying two acyclic values:
  2882  ```
  2883  // acyclic
  2884  y: {
  2885      f: h: g
  2886      g: _
  2887  }
  2888  // acyclic
  2889  x: {
  2890      f: _
  2891      g: f
  2892  }
  2893  // introduces structural cycle
  2894  z: x & y
  2895  ```
  2896  Implementations should be able to detect such structural cycles dynamically.
  2897  
  2898  A structural cycle can result in infinite structure or evaluation loops.
  2899  ```
  2900  // infinite structure
  2901  a: b: a
  2902  
  2903  // infinite evaluation
  2904  f: {
  2905      n:   int
  2906      out: n + (f & {n: 1}).out
  2907  }
  2908  ```
  2909  CUE must allow or disallow structural cycles under certain circumstances.
  2910  
  2911  If a node `a` references an ancestor node, we call it and any of its
  2912  field values `a.f` _cyclic_.
  2913  So if `a` is cyclic, all of its descendants are also regarded as cyclic.
  2914  A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
  2915  is valid if any of its conjuncts is not cyclic.
  2916  
  2917  ```
  2918  // Disallowed: a list of infinite length with all elements being 1.
  2919  #List: {
  2920      head: 1
  2921      tail: #List
  2922  }
  2923  
  2924  // Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
  2925  a: {
  2926      b: c
  2927  }
  2928  c: {
  2929      d: a
  2930  }
  2931  
  2932  // #List defines a list of arbitrary length. Because the recursive reference
  2933  // is part of a disjunction, this does not result in a structural cycle.
  2934  #List: {
  2935      head: _
  2936      tail: null | #List
  2937  }
  2938  
  2939  // Usage of #List. The value of tail in the most deeply nested element will
  2940  // be `null`: as the value of the disjunct referring to #List is the only
  2941  // conjunct, all conjuncts are cyclic and the value is invalid and so
  2942  // eliminated from the disjunction.
  2943  MyList: #List & { head: 1, tail: { head: 2 }}
  2944  ```
  2945  
  2946  <!--
  2947  ### Unused fields
  2948  
  2949  TODO: rules for detection of unused fields
  2950  
  2951  1. Any alias value must be used
  2952  -->
  2953  
  2954  
  2955  ## Modules, instances, and packages
  2956  
  2957  CUE configurations are constructed by combining _instances_.
  2958  An instance, in turn, is constructed from one or more source files belonging
  2959  to the same _package_ that together declare the data representation.
  2960  Elements of this data representation may be exported and used
  2961  in other instances.
  2962  
  2963  ### Source file organization
  2964  
  2965  Each source file consists of an optional package clause defining the collection
  2966  of files to which it belongs,
  2967  followed by a possibly empty set of import declarations that declare
  2968  packages whose contents it wishes to use, followed by a possibly empty set of
  2969  declarations.
  2970  
  2971  As with a struct, a source file may contain embeddings.
  2972  Unlike with a struct, the embedded expressions may be any value.
  2973  If the result of the unification of all embedded values is not a struct,
  2974  it will be output instead of its enclosing file when exporting CUE
  2975  to a data format.
  2976  
  2977  ```
  2978  SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
  2979  ```
  2980  
  2981  ```
  2982  "Hello \(#place)!"
  2983  
  2984  #place: "world"
  2985  
  2986  // Outputs "Hello world!"
  2987  ```
  2988  
  2989  ### Package clause
  2990  
  2991  A package clause is an optional clause that defines the package to which
  2992  a source file belongs.
  2993  
  2994  ```
  2995  PackageClause  = "package" PackageName .
  2996  PackageName    = identifier .
  2997  ```
  2998  
  2999  The PackageName must not be a definition identifier.
  3000  
  3001  If the PackageName is the blank identifier (`_`), it is treated the same
  3002  as if there were no package clause. This can be useful to allow adding
  3003  package level attributes or doc comments to a CUE file without a package
  3004  name.
  3005  
  3006  ```
  3007  package math
  3008  ```
  3009  
  3010  ### Modules and instances
  3011  
  3012  A _module_ defines a tree of directories, rooted at the _module root_.
  3013  
  3014  All source files within a module with the same package name belong to the same
  3015  package.
  3016  <!-- jba: I can't make sense of the above sentence. -->
  3017  A module may define multiple packages.
  3018  
  3019  An _instance_ of a package is any subset of files belonging
  3020  to the same package.
  3021  <!-- jba: Are you saying that -->
  3022  <!-- if I have a package with files a, b and c, then there are 8 instances of -->
  3023  <!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
  3024  <!-- purpose of that definition? -->
  3025  It is interpreted as the concatenation of these files.
  3026  
  3027  An implementation may impose conventions on the layout of package files
  3028  to determine which files of a package belong to an instance.
  3029  For example, an instance may be defined as the subset of package files
  3030  belonging to a directory and all its ancestors.
  3031  <!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->
  3032  
  3033  
  3034  ### Import declarations
  3035  
  3036  An import declaration states that the source file containing the declaration
  3037  depends on definitions of the _imported_ package
  3038  and enables access to exported identifiers of that package.
  3039  The import names an identifier (PackageName) to be used for access and an
  3040  ImportPath that specifies the package to be imported.
  3041  
  3042  ```
  3043  ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
  3044  ImportSpec       = [ PackageName ] ImportPath .
  3045  ImportLocation   = { unicode_value } .
  3046  ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
  3047  ```
  3048  
  3049  The PackageName is used in qualified identifiers to access
  3050  exported identifiers of the package within the importing source file.
  3051  It is declared in the file block.
  3052  It defaults to the identifier specified in the package clause of the imported
  3053  package, which must match either the last path component of ImportLocation
  3054  or the identifier following it.
  3055  
  3056  <!--
  3057  Note: this deviates from the Go spec where there is no such restriction.
  3058  This restriction has the benefit of being able to determine the identifiers
  3059  for packages from within the file itself. But for CUE it has another benefit:
  3060  when using package hierarchies, one is more likely to want to include multiple
  3061  packages within the same directory structure. This mechanism allows
  3062  disambiguation in these cases.
  3063  -->
  3064  
  3065  The interpretation of the ImportPath is implementation-dependent but it is
  3066  typically either the path of a builtin package or a fully qualifying location
  3067  of a package within a source code repository.
  3068  
  3069  An ImportLocation must be a non-empty string using only characters belonging to
  3070  Unicode's L, M, N, P, and S general categories
  3071  (the Graphic characters without spaces)
  3072  and may not include the characters ``!"#$%&'()*,:;<=>?[\\]^`{|}``
  3073  or the Unicode replacement character U+FFFD.
  3074  
  3075  Assume we have a package containing the package clause `package math`,
  3076  which exports function `Sin` at the path identified by `lib/math`.
  3077  This table illustrates how `Sin` is accessed in files
  3078  that import the package after the various types of import declaration.
  3079  
  3080  <!-- TODO: a better example than lib/math:math, where the suffix is a no-op -->
  3081  
  3082  ```
  3083  Import declaration          Local name of Sin
  3084  
  3085  import   "lib/math"         math.Sin
  3086  import   "lib/math:math"    math.Sin
  3087  import m "lib/math"         m.Sin
  3088  ```
  3089  
  3090  An import declaration declares a dependency relation between the importing and
  3091  imported package. It is illegal for a package to import itself, directly or
  3092  indirectly, or to directly import a package without referring to any of its
  3093  exported identifiers.
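
A complete source file using a grouped import declaration might look like the sketch below. It assumes the builtin packages `strings` and `math`; the package name `config` and the field names are hypothetical. Both imported packages are referenced, as required by the rule above.

```
package config

import (
    "strings"
    m "math"
)

name: strings.ToUpper("cue")  // accessed through the default package name
tau:  2 * m.Pi                // accessed through the local name m
```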
  3094  
  3095  
  3096  ### An example package
  3097  
  3098  TODO