<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has many similarities with the Go language; as a result, this document
draws heavily from the Go specification.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes `""` or back quotes
`` ` ``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character `_` (U+005F) and the dollar sign `$` (U+0024) are
considered letters.

```
letter        = unicode_letter | "_" | "$" .
decimal_digit = "0" … "9" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence `//`
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.
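As an illustrative sketch (the field names `host` and `url` are hypothetical), the fragment below shows a comment on its own line and one following a field. The `//` inside the string literal is part of the string, not the start of a comment.

```cue
// A line comment runs until the end of the line.
host: "example.com" // a comment may also follow a field

// The "//" below is inside a string literal, so it starts no comment.
url: "https://example.com/path"
```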


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas `,` as terminators in a number of productions.
CUE programs may omit most of these commas using the following rule.

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, or `?`
- an ellipsis `...`

Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
this rule.
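The fragment below is a sketch of how comma insertion plays out in practice; the field names are hypothetical. Commas at the end of a line are inserted automatically, while fields that share a line, and all list elements, still need explicit commas.

```cue
// No commas needed: one is inserted after `"web"` and after `3`.
name:     "web"
replicas: 3

// Fields on a single line are separated by explicit commas.
labels: {app: "web", tier: "frontend"}

// List elements need explicit commas, even when each is on its own line.
ports: [
	80,
	443,
]
```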


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters (which includes `_` and `$`)
and digits, optionally preceded by `#` or `_#`.
It may not be `_` or `$`.
The first character in an identifier, or after a `#` if it contains one,
must be a letter.
Identifiers starting with a `#` or `_` are reserved for definitions and hidden
fields.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier  = [ "#" | "_#" ] letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 32 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).
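A minimal sketch of the two reserved prefixes, using hypothetical names: `#Port` is a definition and `_defaultPort` is a hidden field. Neither is emitted as data by tools such as `cue export`; only the regular field `port` is.

```cue
// A definition: its identifier starts with `#`.
#Port: int & >0 & <65536

// A hidden field: its identifier starts with `_`.
_defaultPort: 8080

// A regular field that refers to both.
port: #Port & _defaultPort
```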


### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of predeclared identifiers.

All keywords may be used as labels (field names).
Unless noted otherwise, they can also be used as identifiers to refer to
the same name.


#### Values

The following keywords are values.

```
null         true         false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.
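A small sketch of this rule (the field `a` is hypothetical): `null` may label a field, but when `null` appears as an expression it always denotes the null value, never the field of that name.

```cue
// A keyword used as a label.
null: "a field labeled null"

// Here `null` is the null value, not a reference to the field above.
a: null
```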


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package      import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for          in           if           let
```

<!--
TODO:
    reduce [to]
    order [by]
-->


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     &&    ==    <     =     (     )
-     ||    !=    >     :     {     }
*     &     =~    <=    ?     [     ]     ,
/     |     !~    >=    !     _|_   ...   .
```
<!--
Free tokens:  ; ~ ^
// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow then at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
 -->


### Numeric literals

There are several kinds of numeric literals.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals  multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

An _integer literal_ is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: `0o` for octal,
`0x` or `0X` for hexadecimal, and `0b` for binary.
In hexadecimal literals, letters `a`-`f` and `A`-`F` represent values 10 through 15.
All integers allow interstitial underscores `_`;
these have no meaning and are solely for readability.

Integer literals may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
42
1.5G    // 1_500_000_000
1.3Ki   // 1.3 * 1024 = trunc(1331.2) = 1331
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```
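The following sketch contrasts SI multipliers (powers of 1000) with IEC multipliers (suffix `i`, powers of 1024) and shows two prefixed bases. The comments give the integer each literal should denote under the rules above; the field names are hypothetical.

```cue
a: 1K          // 1_000
b: 1Ki         // 1_024
c: 1.5G        // 1.5 * 1000^3 = 1_500_000_000
d: 1.5Gi       // 1.5 * 1024^3 = 1_610_612_736
e: 0o755       // 7*64 + 5*8 + 5 = 493
f: 0b0101_0001 // 64 + 16 + 1 = 81
```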

A _decimal floating-point literal_ is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```

<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

Neither a `float_lit` nor an `si_lit` may appear after a token that is:

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, `?`, or `.`.

<!--
So
`a + 3.2Ti`  -> `a`, `+`, `3.2Ti`
`a 3.2Ti`    -> `a`, `3`, `.`, `2`, `Ti`
`a + .5e3`   -> `a`, `+`, `.5e3`
`a .5e3`     -> `a`, `.`, `5`, `e3`.
-->


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#`
by padding the start and end of a string or byte sequence literal
with this number of hash symbols.
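As a sketch of why a custom delimiter is useful (the field names are hypothetical): regular expressions contain many backslashes, and with one `#` the sequence `\d` is no longer an escape, so it is kept exactly as written.

```cue
// With the default delimiter `\`, a literal backslash must be escaped.
a: "\\d+"

// With the delimiter `\#`, the backslash in `\d` is kept as is.
b: #"\d+"#

same: a == b // true: both hold the three characters \d+

// Convenient for patterns used with the unary `=~` operator.
zip: "12345" & =~#"^\d{5}$"#
```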

There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points, so some values are
illegal within them, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting byte sequence; all other escapes
represent the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.
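A sketch of this byte-level difference, using the predeclared `len` function, which returns the number of bytes in a byte sequence (the field names are hypothetical):

```cue
a: len('\xFF')            // 1: a single byte with value 0xFF
b: len('\u00FF')          // 2: the UTF-8 encoding of U+00FF
c: '\u00FF' == '\xc3\xbf' // true: the same two bytes
```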

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.
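A minimal interpolation sketch with hypothetical field names. The interpolated expression can be any CUE expression, including arithmetic on a referenced field.

```cue
name:  "world"
count: 3

greeting: "Hello, \(name)!"  // "Hello, world!"
summary:  "\(count+1) items" // "4 items"
```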

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
hex_byte_value   = `\` { `#` } "x" hex_digit hex_digit .
little_u_value   = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = `\` { `#` } "(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
multiline_string_lit = `"""` newline
                             { unicode_value | interpolation | newline }
                             newline `"""` .
multiline_bytes_lit  = "'''" newline
                             { unicode_value | interpolation | byte_value | newline }
                             newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
'\xff\u00FF'
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same text, either as a string or as a byte sequence:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
#"日本語"#                               // UTF-8 input text with escape delimiter \#
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be on a line
of its own, preceded by optional whitespace.
The newline preceding the closing quote is discarded from the string contents.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is, if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.
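For illustration (hypothetical field names), the first struct below is concrete, while the second is not, because `port` is still the type `int` rather than a specific value. A command such as `cue export`, which emits data, would be expected to reject the second struct as incomplete.

```cue
concrete: {name: "web", port: 8080}

// Not concrete: `port` is constrained to `int` but has no specific value.
incomplete: {name: "web", port: int}
```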

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool`, and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#Operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.
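A short sketch of unification in practice, with hypothetical field names. Each comment gives the result that follows from the greatest-lower-bound definition; the conflicting case is commented out because it would evaluate to bottom, an error.

```cue
a: int & 5                   // 5, since 5 ⊑ int
b: {x: int} & {x: 3, y: "z"} // {x: 3, y: "z"}
c: >=1 & <=10 & 7            // 7

// Unification is commutative: d and e have the same value.
d: {x: 1} & {y: 2}
e: {y: 2} & {x: 1}

// Conflicting atoms unify to bottom (_|_):
// f: 5 & 6
```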


<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunction is sometimes also referred to as a sum type.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#Operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```
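A sketch of a disjunction used as an enumeration, with hypothetical names. Unification distributes over the disjuncts, so only the matching one survives; a value that matches no disjunct would yield bottom, so that line is commented out.

```cue
#Protocol: "tcp" | "udp"

a: #Protocol & "udp"   // "udp"
b: (int | string) & 42 // 42

// No disjunct matches, so this would be _|_:
// c: #Protocol & "http"
```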

A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.

<!--
Normalization is important, as we need to account for spurious elements
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

  ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

  x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
  y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

  x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
  y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any value `v` _may_ be associated with a default value `d`,
where `d` must be an instance of `v` (`d ⊑ v`).

Default values are introduced by means of disjunctions.
Any element of a disjunction can be _marked_ as a default
by prefixing it with an asterisk `*` ([a unary expression](#Operators)).
Syntactically consecutive disjunctions are considered to be
part of a single disjunction,
whereby multiple disjuncts can be marked as default.
A _marked disjunction_ is one where any of its terms are marked.
So `a | b | *c | d` is a single marked disjunction of four terms,
whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
one of which is a marked disjunction of three terms.
During unification, if all the marked disjuncts of a marked disjunction are
eliminated, then the remaining unmarked disjuncts are considered as if they
originated from an unmarked disjunction.
<!-- TODO: this formulation should be worked out more.  -->
As explained below, distinguishing the nesting of disjunctions like this
is only relevant when both an outer and nested disjunction are marked.

Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.
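A short sketch of defaults in a configuration; all field names are hypothetical. A default is used only when nothing more specific is known: unifying `port` with `9090` eliminates the marked disjunct `8080`, so the remaining unmarked disjunct `int` yields `9090`.

```cue
replicas: *1 | int       // resolves to 1
protocol: *"tcp" | "udp" // resolves to "tcp"

port: *8080 | int
port: 9090 // port resolves to 9090
```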

To define the unification and disjunction operations we use the notation
`⟨v⟩` to denote a CUE value `v` that is not associated with a default
and the notation `⟨v, d⟩` to denote a value `v` associated with a default
value `d`.

The rewrite rules for unifying such values are as follows:
```
U0: ⟨v1⟩ & ⟨v2⟩         => ⟨v1&v2⟩
U1: ⟨v1, d1⟩ & ⟨v2⟩     => ⟨v1&v2, d1&v2⟩
U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
```

The rewrite rules for disjoining terms of unmarked disjunctions are
```
D0: ⟨v1⟩ | ⟨v2⟩         => ⟨v1|v2⟩
D1: ⟨v1, d1⟩ | ⟨v2⟩     => ⟨v1|v2, d1⟩
D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
```

Terms of marked disjunctions are first rewritten according to the following
rules:
```
M0:  ⟨v⟩    => ⟨v⟩        don't introduce defaults for unmarked term
M1: *⟨v⟩    => ⟨v, v⟩     introduce identical default for marked term
M2: *⟨v, d⟩ => ⟨v, d⟩     keep existing defaults for marked term
M3:  ⟨v, d⟩ => ⟨v⟩        strip existing defaults from unmarked term
```

Note that for any marked disjunction `a`,
the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.

```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ⟨"tcp"|"udp", "tcp"⟩    M1, D1
string | *"foo"          ⟨string, "foo"⟩         M1, D1

*1 | 2 | 3               ⟨1|2|3, 1⟩              M1, D1

(*1|2|3) | (1|*2|3)      ⟨1|2|3, 1|2⟩            M1, D1, D2
(*1|2|3) | *(1|*2|3)     ⟨1|2|3, 2⟩              M1, M2, M3, D1, D2
(*1|2|3) | (1|*2|3)&2    ⟨1|2|3, 1|2⟩            M1, D1, U1, D2

(*1|2) & (1|*2)          ⟨1|2, _|_⟩              M1, D1, U2
```

The rules of subsumption for defaults can be derived from the above definitions
and are as follows.

```
⟨v2, d2⟩ ⊑ ⟨v1, d1⟩  if v2 ⊑ v1 and d2 ⊑ d1
⟨v1, d1⟩ ⊑ ⟨v⟩       if v1 ⊑ v
⟨v⟩      ⊑ ⟨v1, d1⟩  if v ⊑ d1
```

<!--
For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.

The last one is so restrictive as v could still be made more specific by
associating it with a default that is not subsumed by d1.

Proof:
  by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
  where the most general value is (v, v).
  Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
  from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
  exactly defines the boundary of this subsumption.
-->

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable, as long as we require that
disjunction literals be normalized).
-->

   802  ```
   803  Expression                       Resolves to
   804  "tcp" | "udp"                    "tcp" | "udp"
   805  *"tcp" | "udp"                   "tcp"
   806  float | *1                       1
   807  *string | 1.0                    string
   808  (*1|2) + (2|*3)                  4
   809  
   810  (*1|2|3) | (1|*2|3)              1|2
   811  (*1|2|3) & (1|*2|3)              1|2|3 // default is _|_
   812  
   813  (* >=5 | int) & (* <=5 | int)    5
   814  
   815  (*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
   816  (*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
   817  (*"tcp"|"udp") & "tcp"           "tcp"
   818  (*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_
   819  
   820  (*true | false) & bool           true
   821  (*true | false) & (true | false) true
   822  
   823  {a: 1} | {b: 1}                  {a: 1} | {b: 1}
   824  {a: 1} | *{b: 1}                 {b:1}
   825  *{a: 1} | *{b: 1}                {a: 1} | {b: 1}
   826  ({a: 1} | {b: 1}) & {a:1}        {a:1}  | {a: 1, b: 1}
   827  ({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b:1}
   828  ```
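To see how these rules play out in a schema, consider the following sketch. The definition `#Service` and the fields `web` and `dns` are hypothetical. Each default applies only when unification does not make the field more specific. For `dns`, unifying `port` with `53` applies rule U1: the default `80 & 53` is bottom, so only the value `53` remains.

```cue
#Service: {
    // Defaults apply unless a field is made more specific.
    protocol: *"tcp" | "udp"
    port:     *80 | int
}

web: #Service                                // protocol: "tcp", port: 80
dns: #Service & {protocol: "udp", port: 53}  // protocol: "udp", port: 53
```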


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.

```
bottom_lit = "_|_" .
```
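A minimal sketch of expressions that evaluate to bottom; the field names are hypothetical. The last line follows from bottom being an instance of every value: unifying anything with bottom yields bottom.

```cue
a: 1 & 2        // _|_: conflicting values 1 and 2
b: int & "foo"  // _|_: a string is not an instance of int
c: a & 5        // _|_: unifying with bottom yields bottom
```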


### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr        Result
_ & 5       5
_ & _       _
_ & _|_     _|_
_ | _|_     _
```
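Because every value is an instance of top, `_` works as a placeholder constraint for a field that may hold any value. A sketch with a hypothetical field name:

```cue
// _ places no constraint on metadata, so unifying it with a struct
// simply yields that struct.
metadata: _
metadata: {name: "frontend"}  // metadata is {name: "frontend"}
```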


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null" .
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```
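A common use of `null` is as a default in a disjunction, so that a field stays explicitly unset unless a concrete value is provided. A sketch with a hypothetical field name; by rule U1 the default `null & 30` is bottom, so the value `30` wins.

```cue
timeout: *null | int  // null unless an int is given
timeout: 30           // timeout is 30
```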


### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
bool_lit = "true" | "false" .
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```
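A sketch of a boolean flag with a default; the definition `#Feature` and the fields `on` and `off` are hypothetical.

```cue
#Feature: {
    enabled: *false | bool  // off unless set explicitly
}

on:  #Feature & {enabled: true}  // enabled: true
off: #Feature                    // enabled: false
```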


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
                    number
                   /      \
                int      float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int`, and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.
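The following sketch shows how these typing rules behave under unification; the field names are hypothetical, and the results follow directly from the rules stated above.

```cue
a: number & 3    // 3: an int is an instance of number
b: number & 3.0  // 3.0: a float is an instance of number
c: int & 3.0     // _|_: 3.0 is a float literal, not an int
d: float & 3     // _|_: 3 is an int literal, not a float
```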

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
  a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
  a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions, for which an unusual loss of precision must be explicitly documented.


### Strings

The _string type_ represents the set of UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

The length of a string `s` (its size in bytes) can be discovered using
the built-in function `len`.
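Because `len` reports the size in bytes rather than the number of characters, a multi-byte UTF-8 character contributes more than one to the length. A sketch with hypothetical field names:

```cue
ascii: len("hello")  // 5
utf8:  len("héllo")  // 6: "é" is encoded as two bytes in UTF-8
```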


### Bytes

The _bytes type_ represents the set of byte sequences.
A byte sequence value is a (possibly empty) sequence of bytes.
The number of bytes is called the length of the byte sequence
and is never negative.
The predeclared byte sequence type is `bytes`; it is a defined type.
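In CUE source, byte sequences are written as single-quoted literals. A sketch with hypothetical field names relating such literals to `bytes` and `len`:

```cue
greeting: bytes & 'hello'  // a single-quoted literal is a byte sequence
size:     len(greeting)    // 5
empty:    len('')          // 0: the empty byte sequence
```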


### Bounds

A _bound_, syntactically a [unary expression](#Operands), defines
an infinite disjunction of concrete values that can be represented
as a single comparison.

For any [comparison operator](#Comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.

```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```
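Bounds are typically combined with a type to express a range constraint. A sketch using a hypothetical definition `#Port`:

```cue
#Port: int & >=1 & <=65535  // any valid TCP/UDP port number

http:  #Port & 80     // 80
https: #Port & 443    // 443
bad:   #Port & 70000  // _|_: 70000 exceeds the upper bound
```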


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1.0} ⊑ {a: int, b: float}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```

A field may be required or optional.
The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
If a field `f` is in both `a` and `b`, `c.f` is optional only if both
`a.f` and `b.f` are optional.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its non-optional
fields evaluates to bottom, recursively.
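The following sketch, with hypothetical field names, applies these rules: shared fields are unified, and a shared field stays optional only if it is optional in both structs.

```cue
x: {a: int, b?: string, c?: bool} & {a: 1, b: "hi", c?: true}
// x is {a: 1, b: "hi", c?: true}:
//   a is in both structs, so its value is int & 1, which is 1.
//   b is required because it is required in the second struct.
//   c stays optional because it is optional in both structs.

y: {a: int} & {a: "one"}  // _|_: the required field a is int & "one"
```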

<!--NOTE: About bottom values for optional fields being okay.

The proposition ¬P is a close cousin of P → ⊥ and is often used
as an approximation to avoid the issues of using not.
Bottom (⊥) is also frequently used to mean undefined. This makes sense.
Consider `{a?: 2} & {a?: 3}`.
Both structs say `a` is optional; in other words, it may be omitted.
So we can still get a valid result by omitting `a`, even in
case of a conflict.

Granted, this definition may lead to confusing results, especially in
definitions, when tightening an optional field leads to unintentionally
discarding it.
It could be a role of vet checkers to identify such cases (and suggest users
to explicitly use `_|_` to discard a field, for instance).
-->

Syntactically, a field is marked as optional by following its label with a `?`.
The question mark is not part of the field name.
A struct literal may contain multiple fields with the same label;
the result is a single field with the same properties as the field
that results from unifying two structs that each contain one of these fields.

These examples illustrate required fields only.
Examples with optional fields follow below.

```
Expression                             Result (without optional fields)
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
```

Using pattern or default constraints, a struct may define constraints that
apply to fields added when it is unified with another struct.

A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
is a value of type string, and a value to unify with fields whose labels
match that pattern.
When unifying structs `a` and `b`,
a pattern constraint `[p]: v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label unifies with pattern `p`.

<!-- TODO: Update grammar and support this.
A pattern constraint with a pattern preceded by `...` indicates
the pattern can only match fields in `b` for which there
exists no field in `a` with the same label.
-->

Additionally, a _default constraint_, denoted `...value`, defines a value
to unify with any field for which there is no other declaration in a struct.
When unifying structs `a` and `b`,
a default constraint `...v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label does not unify with any of the patterns of the pattern
constraints defined for `a` _and_ for which there exists no field in `a`
with that label.
The token `...` is a shorthand for `..._`.


```
a: {
    foo:      string  // foo is a string
    [=~"^i"]: int     // all other fields starting with i are integers
    [=~"^b"]: bool    // all other fields starting with b are booleans
    ...string         // all other fields must be a string
}

b: a & {
    i3:    3
    bar:   true
    other: "a string"
}
```

Concrete field labels may be an identifier or a string, the latter of which may be
interpolated.
Fields with identifier labels can be referred to within the scope in which they are
defined; fields with string labels cannot.
References within such interpolated strings are resolved within
the scope of the struct in which the label sequence is
defined and can reference concrete labels lexically preceding
the label within a label sequence.
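A sketch of an interpolated label, with hypothetical field names. The reference to `env` inside the label is resolved in the enclosing scope. Since the resulting string label cannot be referenced as an identifier, the field is selected with an index expression instead.

```cue
env: "prod"

hosts: {
    // The label is computed by interpolation; this defines the field "prod-db".
    "\(env)-db": "10.0.0.5"
}

db: hosts["prod-db"]  // "10.0.0.5"
```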
<!-- We allow this so that rewriting a CUE file to collapse or expand
field sequences has no impact on semantics.
-->

<!--TODO: first implementation round will not yet have expression labels

An ExpressionLabel sets a collection of optional fields to a field value.
By default it defines this value for all possible string labels.
An optional expression limits this to the set of optional fields whose
labels match the expression.
-->


<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->


<!-- NOTE:
A DefinitionDecl does not allow repeated labels. This is to avoid
any ambiguity or confusion about whether earlier path components
are to be interpreted as declarations or normal fields (they should
always be normal fields.)
-->

<!--NOTE:
The syntax has been deliberately restricted to allow for the following
future extensions and relaxations:
  - Allow omitting a "?" in an expression label to indicate a concrete
    string value (but maybe we want to use () for that).
  - Make the "?" in expression label optional if expression labels
    are always optional.
  - Or allow eliding the "?" if the expression has no references and
    is obviously not concrete (such as `[string]`).
  - The expression of an expression label may also indicate a struct with
    integer or even number labels
    (beware of imprecise computation in the latter).
      e.g. `{ [int]: string }` is a map of integers to strings.
  - Allow for associative lists (`foo [@.field]: {field: string}`)
  - The `...` notation can be extended analogously to that of a ListLit,
    by allowing it to follow with an expression for the remaining properties.
    In that case it is no longer a shorthand for `[string]: _`, but rather
    would define the value for any other value for which there is no field
    defined.
    Like the definition with List, this is somewhat odd, but it allows the
    encoding of JSON schema's and (non-structural) OpenAPI's
    additionalProperties and additionalItems.
-->

```
StructLit       = "{" { Declaration "," } "}" .
Declaration     = Field | Ellipsis | Embedding | LetClause | attribute .
Ellipsis        = "..." [ Expression ] .
Embedding       = Comprehension | AliasExpr .
Field           = Label ":" { Label ":" } AliasExpr { attribute } .
Label           = [ identifier "=" ] LabelExpr .
LabelExpr       = LabelName [ "?" ] | "[" AliasExpr "]" .
LabelName       = identifier | simple_string_lit  .

attribute       = "@" identifier "(" attr_tokens ")" .
attr_tokens     = { attr_token |
                    "(" attr_tokens ")" |
                    "[" attr_tokens "]" |
                    "{" attr_tokens "}" } .
attr_token      = /* any token except '(', ')', '[', ']', '{', or '}' */
```

```
Expression                             Result (without optional fields)
a: { foo?: string }                    {}
b: { foo: "bar" }                      { foo: "bar" }
c: { foo?: *"bar" | string }           {}

d: a & b                               { foo: "bar" }
e: b & c                               { foo: "bar" }
f: a & c                               {}
g: a & { foo?: number }                {}
h: b & { foo?: number }                _|_
i: c & { foo: string }                 { foo: "bar" }

intMap: [string]: int
intMap: {
    t1: 43
    t2: 2.4  // error: 2.4 is not an integer
}

nameMap: [string]: {
    firstName: string
    nickName:  *firstName | string
}

nameMap: hank: { firstName: "Hank" }
```
The pattern constraint `[string]` defined for `nameMap` matches every field,
in this case just `hank`, and unifies the associated constraint
with the matched field, resulting in:
```
nameMap: hank: {
    firstName: "Hank"
    nickName:  "Hank"
}
```


#### Closed structs

By default, structs are open to adding fields.
Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:

```
S: {
    field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
    field1: string
    field2: string
}

A1: A & {
    feild1: "foo"  // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not declare any field
with a name that does not match the name of a field
or the pattern of a pattern constraint defined in `c`.
Hidden fields are excluded from this limitation.
A struct that is the result of unifying any struct with a [`...`](#Structs)
declaration is defined for all regular fields.
Closing a struct is equivalent to adding `..._|_` to it.

Syntactically, structs are closed explicitly with the `close` builtin or
implicitly and recursively by [definitions](#definitions-and-hidden-fields).


```
A: close({
    field1: string
    field2: string
})

A1: A & {
    feild1: string
} // _|_ feild1 not defined for A

A2: A & {
    for k,v in { feild1: string } {
        "\(k)": v
    }
}  // _|_ feild1 not defined for A

C: close({
    [_]: _
})

C2: C & {
    for k,v in { thisIsFine: string } {
        "\(k)": v
    }
}

D: close({
    // Values generated by comprehensions are treated as embeddings.
    for k,v in { x: string } {
        "\(k)": v
    }
})
```
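As a further sketch (the names `Labels`, `ok`, and `bad` are hypothetical), a pattern constraint in a closed struct admits every new field whose label matches the pattern, while other new labels are rejected:

```cue
Labels: close({
    app:       string
    [=~"^x-"]: string  // any label starting with "x-" is allowed
})

ok: Labels & {
    app:      "web"
    "x-team": "infra"  // allowed: matches the pattern
}

bad: Labels & {
    team: "infra"  // _|_: team is neither declared nor matched by a pattern
}
```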

<!-- (jba) Somewhere it should be said that optional fields are only
     interesting inside closed structs. -->

<!-- TODO: move embedding section to above the previous one -->

#### Embedding

A struct may contain an _embedded value_, an operand used as a declaration.
An embedded value of type struct is unified with the struct in which it is
embedded, but disregarding the restrictions imposed by closed structs.
So if an embedding resolves to a closed struct, the corresponding enclosing
struct will also be closed, but may have fields that would not be allowed if
the normal rules for closed structs were observed.

If an embedded value is not of type struct, the struct may only have
definitions or hidden fields. Regular fields are not allowed in such a case.

The result of `{ A }` is `A` for any `A` (including definitions).

Syntactically, embeddings may be any expression.

```
S1: {
    a: 1
    b: 2
    {
        c: 3
    }
}
// S1 is { a: 1, b: 2, c: 3 }

S2: close({
    a: 1
    b: 2
    {
        c: 3
    }
})
// same as close(S1)

S3: {
    a: 1
    b: 2
    close({
        c: 3
    })
}
// same as S2
```
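Per the rules above, a struct that embeds a non-struct value may still declare definitions and hidden fields, and the struct as a whole evaluates to the embedded value. A sketch with hypothetical field names:

```cue
port: {
    #min: 1024   // a definition is allowed alongside a non-struct embedding
    _max: 65535  // so is a hidden field
    >=#min & <=_max & 8080
}
// per the rules above, port evaluates to 8080
```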


#### Definitions and hidden fields

A field is a _definition_ if its identifier starts with `#` or `_#`.
A field is _hidden_ if its identifier starts with a `_`.
All other fields are _regular_.

Definitions and hidden fields are not emitted when converting a CUE program
to data and are never required to be concrete.
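A sketch mixing the three kinds of fields; the names are hypothetical. When this is converted to data, only the regular field `out` is emitted.

```cue
#Limit:  10               // definition: not emitted
_factor: 2                // hidden field: not emitted
out:     #Limit * _factor // regular field: emitted as out: 20
```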

Referencing a definition will recursively [close](#closed-structs) it.
That is, a referenced definition will not unify with a struct
that would add a field anywhere within the definition that it does not
already define or explicitly allow with a pattern constraint or `...`.
[Embeddings](#embedding) allow bypassing this check.

If referencing a definition would always result in an error, implementations
may report this inconsistency at the point of its declaration.

```
#MyStruct: {
    sub: field:    string
}

#MyStruct: {
    sub: enabled?: bool
}

myValue: #MyStruct & {
    sub: feild:   2     // error, feild not defined in #MyStruct
    sub: enabled: true  // okay
}

#D: {
    #OneOf

    c: int // adds this field.
}

#OneOf: { a: int } | { b: int }


D1: #D & { a: 12, c: 22 }  // { a: 12, c: 22 }
D2: #D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
```


```
#A: {a: int}

B: {
    #A
    b: c: int
}

x: B
x: d: 3  // not allowed, as closed by embedded #A

y: B.b
y: d: 3  // allowed as nothing closes b

#B: {
    #A
    b: c: int
}

z: #B.b
z: d: 3  // not allowed, as referencing #B closes b
```


<!---
JSON fields are usually camelCase. Clashes can be avoided by adopting the
convention that definitions be TitleCase. Unexported definitions are still
subject to clashes, but those are likely easier to resolve because they are
package internal.
--->


#### Attributes

Attributes allow associating meta information with values.
Their primary purpose is to define mappings between CUE and
other representations.
Attributes do not influence the evaluation of CUE.

An attribute associates an identifier with a value, a balanced token sequence,
which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
The sequence may not contain interpolations.

Fields, structs and packages can be associated with a set of attributes.
Attributes accumulate during unification, but implementations may remove
duplicates that have the same source string representation.
The interpretation of an attribute, including the handling of multiple
attributes for a given identifier, is up to the consumer of the attribute.

Field attributes define additional information about a field,
such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
name of the field when mapping to a different language.


```
// Package attribute
@protobuf(proto3)

myStruct1: {
    // Struct attribute:
    @jsonschema(id="https://example.org/mystruct1.json")

    // Field attributes
    field: string @go(Field)
    attr:  int    @xml(,attr) @go(Attr)
}

myStruct2: {
    field: string @go(Field)
    attr:  int    @xml(a1,attr) @go(Attr)
}

Combined: myStruct1 & myStruct2
// field: string @go(Field)
// attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
```


#### Aliases

Aliases name values that can be referred to
within the [scope](#declarations-and-scopes) in which they are declared.
The name of an alias must be unique within its scope.

```
AliasExpr  = [ identifier "=" ] Expression .
```

Aliases can appear in several positions:

<!--- TODO: consider allowing this. It should be considered whether
having field aliases isn't already sufficient.

As a declaration in a struct (`X=value`):

- binds identifier `X` to a value embedded within the struct.
--->

In front of a label (`X=label: value`):

- binds the identifier to the same value as `label` would be bound
  to if it were a valid identifier.
- for optional fields (`foo?: bar` and `[foo]: bar`),
  the bound identifier is only visible within the field value (`bar`).

Before a value (`foo: X=x`):

- binds the identifier to the value it precedes within the scope of that value.

Inside a bracketed label (`[X=expr]: value`):

- binds the identifier to the concrete label that matches `expr`
  within the instances of the field value (`value`).

Before a list element (`[ X=value, X+1 ]`) (not yet implemented):

- binds the identifier to the list element it precedes within the scope of the
  list expression.

<!-- TODO: explain the difference between aliases and definitions.
     Now that you have definitions, are aliases really necessary?
     Consider removing.
-->

```
// A field alias
foo: X  // 4
X="not an identifier": 4

// A value alias
foo: X={x: X.a}
bar: foo & {a: 1}  // {a: 1, x: 1}

// A label alias
[Y=string]: { name: Y }
foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
```

<!-- TODO: also allow aliases as lists -->


#### Let declarations

_Let declarations_ bind an identifier to an expression.
The identifier is visible within the [scope](#declarations-and-scopes)
in which it is declared.
The identifier must be unique within its scope.

```
let x = expr

a: x + 1
b: x + 2
```
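A concrete sketch of the pattern above; the names `base`, `web`, and `api` are hypothetical. The `let` binding can be used by the fields that follow it but, unlike a regular field, is not part of the resulting data.

```cue
let base = 8000

web: port: base + 80  // 8080
api: port: base + 81  // 8081
// The resulting data contains web and api only; base is not a field.
```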
  1545  
  1546  #### Shorthand notation for nested structs
  1547  
  1548  A field whose value is a struct with a single field may be written as
  1549  a colon-separated sequence of the two field names,
  1550  followed by a colon and the value of that single field.
  1551  
  1552  ```
  1553  job: myTask: replicas: 2
  1554  ```
  1555  expands to
  1556  ```
  1557  job: {
  1558      myTask: {
  1559          replicas: 2
  1560      }
  1561  }
  1562  ```
  1563  
  1564  <!-- OPTIONAL FIELDS:
  1565  
  1566  The optional marker solves the issue of having to print large amounts of
  1567  boilerplate when dealing with large types with many optional or default
  1568  values (such as Kubernetes).
  1569  Writing such optional values in terms of *null | value is tedious,
  1570  unpleasant to read, and as it is not well defined what can be dropped or not,
  1571  all null values have to be emitted from the output, even if the user
  1572  doesn't override them.
  1573  Part of the issue is how null is defined. We could adopt a Typescript-like
  1574  approach of introducing "void" or "undefined" to mean "not defined and not
  1575  part of the output". But having all of null, undefined, and void can be
  1576  confusing. If these ever are introduced anyway, the ? operator could be
  1577  expressed along the lines of
  1578     foo?: bar
  1579  being a shorthand for
  1580     foo: void | bar
  1581  where void is the default if no other default is given.
  1582  
  1583  The current mechanical definition of "?" is straightforward, though, and
  1584  probably avoids the need for void, while solving a big issue.
  1585  
  1586  Caveats:
  1587  [1] this definition requires explicitly defined fields to be emitted, even
  1588  if they could be elided (for instance if the explicit value is the default
  1589  value defined an optional field). This is probably a good thing.
  1590  
  1591  [2] a default value may still need to be included in an output if it is not
  1592  the zero value for that field and it is not known if any outside system is
  1593  aware of defaults. For instance, which defaults are specified by the user
  1594  and which by the schema understood by the receiving system.
  1595  The use of "?" together with defaults should therefore be used carefully
  1596  in non-schema definitions.
  1597  Problematic cases should be easy to detect by a vet-like check, though.
  1598  
  1599  [3] It should be considered how this affects the trim command.
  1600  Should values implied by optional fields be allowed to be removed?
  1601  Probably not. This restriction is unlikely to limit the usefulness of trim,
  1602  though.
  1603  
  1604  [4] There should be an option to emit all concrete optional values.
  1605  ```
  1606  -->
  1607  
  1608  ### Lists
  1609  
  1610  A list literal defines a new value of type list.
  1611  A list may be open or closed.
  1612  An open list is indicated with a `...` at the end of an element list,
  1613  optionally followed by a value for the remaining elements.
  1614  
  1615  The length of a closed list is the number of elements it contains.
The length of an open list has its number of elements as a lower bound
and an unlimited number of elements as its upper bound.
  1618  
  1619  ```
  1620  ListLit       = "[" [ ElementList [ "," ] ] "]" .
  1621  ElementList   = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] .
  1622  ```
  1623  
  1624  Lists can be thought of as structs:
  1625  
  1626  ```
  1627  List: *null | {
  1628      Elem: _
  1629      Tail: List
  1630  }
  1631  ```
  1632  
For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[1, 2, ...]` can be represented as:
  1636  ```
  1637  open: List & { Elem: 1, Tail: { Elem: 2 } }
  1638  ```
and the closed version of this list, `[1, 2]`, as
  1640  ```
  1641  closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
  1642  ```
  1643  
  1644  Using this representation, the subsumption rule for lists can
  1645  be derived from those of structs.
  1646  Implementations are not required to implement lists as structs.
  1647  The `Elem` and `Tail` fields are not special and `len` will not work as
  1648  expected in these cases.
  1649  
  1650  
  1651  ## Declarations and Scopes
  1652  
  1653  
  1654  ### Blocks
  1655  
  1656  A _block_ is a possibly empty sequence of declarations.
  1657  The braces of a struct literal `{ ... }` form a block, but there are
  1658  others as well:
  1659  
  1660  - The _universe block_ encompasses all CUE source text.
  1661  - Each [package](#modules-instances-and-packages) has a _package block_
  1662    containing all CUE source text in that package.
  1663  - Each file has a _file block_ containing all CUE source text in that file.
  1664  - Each `for` and `let` clause in a [comprehension](#comprehensions)
  1665    is considered to be its own implicit block.
  1666  
  1667  Blocks nest and influence scoping.
  1668  
  1669  
  1670  ### Declarations and scope
  1671  
A _declaration_ may bind an identifier to a field, alias, or package.
  1673  Every identifier in a program must be declared.
  1674  Other than for fields,
  1675  no identifier may be declared twice within the same block.
  1676  For fields an identifier may be declared more than once within the same block,
  1677  resulting in a field with a value that is the result of unifying the values
  1678  of all fields with the same identifier.
  1679  String labels do not bind an identifier to the respective field.
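For example, the two declarations of `a` below are in the same block, so they are unified into a single field. Two conflicting declarations of `d` unify to bottom. The field names are hypothetical.

```
a: {b: int}
a: {b: 3, c: "x"}
// a evaluates to {b: 3, c: "x"}

d: 1
d: 2  // _|_: 1 and 2 do not unify
```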
  1680  
  1681  The _scope_ of a declared identifier is the extent of source text in which the
  1682  identifier denotes the specified field, alias, or package.
  1683  
  1684  CUE is lexically scoped using blocks:
  1685  
  1686  1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
  1687  1. The scope of an identifier denoting a field
  1688    declared at top level (outside any struct literal) is the package block.
  1689  1. The scope of an identifier denoting an alias
  1690    declared at top level (outside any struct literal) is the file block.
  1691  1. The scope of the package name of an imported package is the file block of the
  1692    file containing the import declaration.
  1693  1. The scope of a field, alias or let identifier declared inside a struct
  1694     literal is the innermost containing block.
  1695  
  1696  An identifier declared in a block may be redeclared in an inner block.
  1697  While the identifier of the inner declaration is in scope, it denotes the entity
  1698  declared by the inner declaration.
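A minimal sketch of shadowing, using hypothetical field names. Inside `a`, the inner declaration of `x` is in scope. Outside `a`, `x` refers to the top-level field.

```
x: 1
a: {
    x: 2
    y: x  // 2: the inner declaration of x is in scope here
}
b: x      // 1: refers to the top-level x
```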
  1699  
  1700  The package clause is not a declaration;
  1701  the package name does not appear in any scope.
  1702  Its purpose is to identify the files belonging to the same package
  1703  and to specify the default name for import declarations.
  1704  
  1705  
  1706  ### Predeclared identifiers
  1707  
  1708  CUE predefines a set of types and builtin functions.
  1709  For each of these there is a corresponding keyword which is the name
  1710  of the predefined identifier, prefixed with `__`.
  1711  
  1712  ```
  1713  Functions
  1714  len       close and or
  1715  
  1716  Types
  1717  null      The null type and value
  1718  bool      All boolean values
  1719  int       All integral numbers
  1720  float     All decimal floating-point numbers
  1721  string    Any valid UTF-8 sequence
  1722  bytes     Any valid byte sequence
  1723  
  1724  Derived   Value
  1725  number    int | float
  1726  uint      >=0
  1727  uint8     >=0 & <=255
  1728  int8      >=-128 & <=127
uint16    >=0 & <=65535
  1730  int16     >=-32_768 & <=32_767
  1731  rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
  1733  int32     >=-2_147_483_648 & <=2_147_483_647
  1734  uint64    >=0 & <=18_446_744_073_709_551_615
  1735  int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
  1736  uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
  1737  int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
  1738             <=170_141_183_460_469_231_731_687_303_715_884_105_727
  1739  float32   >=-3.40282346638528859811704183484516925440e+38 &
  1740            <=3.40282346638528859811704183484516925440e+38
  1741  float64   >=-1.797693134862315708145274237317043567981e+308 &
  1742            <=1.797693134862315708145274237317043567981e+308
  1743  ```
  1744  
  1745  
  1746  ### Exported identifiers
  1747  
  1748  <!-- move to a more logical spot -->
  1749  
  1750  An identifier of a package may be exported to permit access to it
  1751  from another package.
  1752  All identifiers not starting with `_` (so all regular fields and definitions
  1753  starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a namespace separate from namesake identifiers of other packages.
  1756  
  1757  ```
  1758  package mypackage
  1759  
  1760  foo:   string  // visible outside mypackage
  1761  "bar": string  // visible outside mypackage
  1762  
  1763  #Foo: {      // visible outside mypackage
  1764      a:  1    // visible outside mypackage
  1765      _b: 2    // not visible outside mypackage
  1766  
  1767      #C: {    // visible outside mypackage
  1768          d: 4 // visible outside mypackage
  1769      }
  1770      _#E: foo // not visible outside mypackage
  1771  }
  1772  ```
  1773  
  1774  
  1775  ### Uniqueness of identifiers
  1776  
  1777  Given a set of identifiers, an identifier is called unique if it is different
  1778  from every other in the set, after applying normalization following
  1779  Unicode Annex #31.
  1780  Two identifiers are different if they are spelled differently
  1781  or if they appear in different packages and are not exported.
  1782  Otherwise, they are the same.
  1783  
  1784  
  1785  ### Field declarations
  1786  
  1787  A field associates the value of an expression to a label within a struct.
  1788  If this label is an identifier, it binds the field to that identifier,
  1789  so the field's value can be referenced by writing the identifier.
  1790  String labels are not bound to fields.
  1791  ```
  1792  a: {
  1793      b: 2
  1794      "s": 3
  1795  
  1796      c: b   // 2
  1797      d: s   // _|_ unresolved identifier "s"
  1798      e: a.s // 3
  1799  }
  1800  ```
  1801  
  1802  If an expression may result in a value associated with a default value
  1803  as described in [default values](#default-values), the field binds to this
  1804  value-default pair.
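As an illustration (field names hypothetical), `b` below binds to the whole value-default pair of `a`, not just its default. Unifying `b` with `2` therefore succeeds, even though the default `1` does not unify with `2`.

```
a: int | *1
b: a      // binds to the pair (int | 1, 1); evaluates to 1
c: b & 2  // 2: unifies with the value int | 1, not only the default
```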
  1805  
  1806  
  1807  <!-- TODO: disallow creating identifiers starting with __
  1808  ...and reserve them for builtin values.
  1809  
  1810  The issue is with code generation. As no guarantee can be given that
  1811  a predeclared identifier is not overridden in one of the enclosing scopes,
  1812  code will have to handle detecting such cases and renaming them.
  1813  An alternative is to have the predeclared identifiers be aliases for namesake
  1814  equivalents starting with a double underscore (e.g. string -> __string),
  1815  allowing generated code (normal code would keep using `string`) to refer
  1816  to these directly.
  1817  -->
  1818  
  1819  
  1820  ### Let declarations
  1821  
  1822  Within a struct, a let clause binds an identifier to the given expression.
  1823  
  1824  Within the scope of the identifier, the identifier refers to the
  1825  _locally declared_ expression.
The expression is evaluated in the scope in which it was declared.
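A small sketch of a let declaration, with illustrative names. Because field order does not matter, the let expression may refer to a field declared after it. The let identifier itself does not become a field.

```
a: {
    let double = 2 * n
    n: 3
    y: double + 1  // 7
}
// a evaluates to {n: 3, y: 7}; double is not part of the output
```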
  1827  
  1828  
  1829  ## Expressions
  1830  
  1831  An expression specifies the computation of a value by applying operators and
  1832  built-in functions to operands.
  1833  
  1834  Expressions that require concrete values are called _incomplete_ if any of
  1835  their operands are not concrete, but define a value that would be legal for
  1836  that expression.
  1837  Incomplete expressions may be left unevaluated until a concrete value is
  1838  requested at the application level.
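For instance, in the sketch below (names hypothetical), `a.y` is incomplete because `x` is not yet concrete. Once `x` becomes concrete through unification, the same expression evaluates normally.

```
a: {
    x: int
    y: x + 1   // incomplete: x is not concrete
}
b: a & {x: 2}  // {x: 2, y: 3}
```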
  1839  
  1840  ### Operands
  1841  
  1842  Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field, alias, or let declaration, or a parenthesized expression.
  1845  
  1846  ```
  1847  Operand     = Literal | OperandName | "(" Expression ")" .
  1848  Literal     = BasicLit | ListLit | StructLit .
  1849  BasicLit    = int_lit | float_lit | string_lit |
  1850                null_lit | bool_lit | bottom_lit .
  1851  OperandName = identifier | QualifiedIdent .
  1852  ```
  1853  
  1854  ### Qualified identifiers
  1855  
  1856  A qualified identifier is an identifier qualified with a package name prefix.
  1857  
  1858  ```
  1859  QualifiedIdent = PackageName "." identifier .
  1860  ```
  1861  
  1862  A qualified identifier accesses an identifier in a different package,
which must be [imported](#import-declarations).
The identifier must be declared in the [package block](#blocks) of that package.
  1865  
  1866  ```
  1867  math.Sin    // denotes the Sin function in package math
  1868  ```
  1869  
  1870  ### References
  1871  
  1872  An identifier operand refers to a field and is called a reference.
  1873  The value of a reference is a copy of the expression associated with the field
  1874  that it is bound to,
  1875  with any references within that expression bound to the respective copies of
  1876  the fields they were originally bound to.
  1877  Implementations may use a different mechanism to evaluate as long as
  1878  these semantics are maintained.
  1879  
  1880  ```
  1881  a: {
  1882      place:    string
  1883      greeting: "Hello, \(place)!"
  1884  }
  1885  
  1886  b: a & { place: "world" }
  1887  c: a & { place: "you" }
  1888  
  1889  d: b.greeting  // "Hello, world!"
  1890  e: c.greeting  // "Hello, you!"
  1891  ```
  1892  
  1893  
  1894  
  1895  ### Primary expressions
  1896  
  1897  Primary expressions are the operands for unary and binary expressions.
  1898  
  1899  ```
  1900  PrimaryExpr =
  1901  	Operand |
  1902  	PrimaryExpr Selector |
  1903  	PrimaryExpr Index |
  1904  	PrimaryExpr Slice |
  1905  	PrimaryExpr Arguments .
  1906  
  1907  Selector       = "." (identifier | simple_string_lit) .
  1908  Index          = "[" Expression "]" .
  1909  Argument       = Expression .
  1910  Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
  1911  ```
  1912  <!---
  1913  TODO:
  1914  	PrimaryExpr Query |
  1915  Query          = "." Filters .
  1916  Filters        = Filter { Filter } .
  1917  Filter         = "[" [ "?" ] AliasExpr "]" .
  1918  
  1919  TODO: maybe reintroduce slices, as they are useful in queries, probably this
  1920  time with Python semantics.
  1921  Slice          = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .
  1922  
  1923  Argument       = Expression | ( identifer ":" Expression ).
  1924  
  1925  // & expression type
  1926  // string_lit: same as label. Arguments is current node.
  1927  // If selector is applied to list, it performs the operation for each
  1928  // element.
  1929  
  1930  TODO: considering allowing decimal_lit for selectors.
  1931  --->
  1932  
  1933  ```
  1934  x
  1935  2
  1936  (s + ".txt")
  1937  f(3.1415, true)
  1938  m["foo"]
  1939  obj.color
  1940  f.p[i].x
  1941  ```
  1942  
  1943  
  1944  ### Selectors
  1945  
  1946  For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
  1947  the selector expression
  1948  
  1949  ```
  1950  x.f
  1951  ```
  1952  
  1953  denotes the element of a <!--list or -->struct `x` identified by `f`.
  1954  <!--For structs, -->
  1955  `f` must be an identifier or a string literal identifying
  1956  any definition or regular non-optional field.
  1957  The identifier `f` is called the field selector.
  1958  
  1959  <!--
  1960  Allowing strings to be used as field selectors obviates the need for
  1961  backquoted identifiers. Note that some standards use names for structs that
  1962  are not standard identifiers (such "Fn::Foo"). Note that indexing does not
  1963  allow access to identifiers.
  1964  -->
  1965  
  1966  <!--
  1967  For lists, `f` must be an integer and follows the same lookup rules as
  1968  for the index operation.
  1969  The type of the selector expression is the type of `f`.
  1970  -->
  1971  
  1972  If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).
  1973  
  1974  <!--
  1975  TODO: consider allowing this and also for selectors. It needs to be considered
  1976  how defaults are carried forward in cases like:
  1977  
  1978      x: { a: string | *"foo" } | *{ a: int | *4 }
  1979      y: x.a & string
  1980  
  1981  What is y in this case?
  1982     (x.a & string, _|_)
  1983     (string|"foo", _|_)
  1984     (string|"foo", "foo)
  1985  If the latter, then why?
  1986  
  1987  For a disjunction of the form `x1 | ... | xn`,
  1988  the selector is applied to each element `x1.f | ... | xn.f`.
  1989  -->
  1990  
  1991  Otherwise, if `x` is not a <!--list or -->struct,
  1992  or if `f` does not exist in `x`,
  1993  the result of the expression is bottom (an error).
  1994  In the latter case the expression is incomplete.
  1995  The operand of a selector may be associated with a default.
  1996  
  1997  ```
  1998  T: {
  1999      x:     int
  2000      y:     3
  2001      "x-y": 4
  2002  }
  2003  
  2004  a: T.x     // int
  2005  b: T.y     // 3
  2006  c: T.z     // _|_ // field 'z' not found in T
  2007  d: T."x-y" // 4
  2008  
  2009  e: {a: 1|*2} | *{a: 3|*4}
  2010  f: e.a  // 4 (default value)
  2011  ```
  2012  
  2013  <!--
  2014  ```
  2015  (v, d).f  =>  (v.f, d.f)
  2016  
  2017  e: {a: 1|*2} | *{a: 3|*4}
  2018  f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)
  2019  
  2020  ```
  2021  -->
  2022  
  2023  
  2024  ### Index expressions
  2025  
  2026  A primary expression of the form
  2027  
  2028  ```
  2029  a[x]
  2030  ```
  2031  
  2032  denotes the element of a list or struct `a` indexed by `x`.
  2033  The value `x` is called the index or field name, respectively.
  2034  The following rules apply:
  2035  
  2036  If `a` is not a struct:
  2037  
  2038  - `a` is a list (which need not be complete)
  2039  - the index `x` unified with `int` must be concrete.
  2040  - the index `x` is in range if `0 <= x < len(a)`, where only the
  2041    explicitly defined values of an open-ended list are considered,
  2042    otherwise it is out of range
  2043  
  2044  The result of `a[x]` is
  2045  
  2046  for `a` of list type:
  2047  
  2048  - the list element at index `x`, if `x` is within range
  2049  - bottom (an error), otherwise
  2050  
  2051  
  2052  for `a` of struct type:
  2053  
  2054  - the index `x` unified with `string` must be concrete.
  2055  - the value of the regular and non-optional field named `x` of struct `a`,
  2056    if this field exists
  2057  - bottom (an error), otherwise
  2058  
  2059  
  2060  ```
  2061  [ 1, 2 ][1]     // 2
  2062  [ 1, 2 ][2]     // _|_
  2063  [ 1, 2, ...][2] // _|_
  2064  ```
  2065  
  2066  Both the operand and index value may be a value-default pair.
  2067  ```
  2068  va[vi]              =>  va[vi]
  2069  va[(vi, di)]        =>  (va[vi], va[di])
  2070  (va, da)[vi]        =>  (va[vi], da[vi])
  2071  (va, da)[(vi, di)]  =>  (va[vi], da[di])
  2072  ```
  2073  
  2074  ```
  2075  Fields                  Result
  2076  x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
  2077  i: int | *1             (int, 1)
  2078  
  2079  v: x[i]                 (x[i], 4)
  2080  ```
  2081  
  2082  ### Operators
  2083  
  2084  Operators combine operands into expressions.
  2085  
  2086  ```
  2087  Expression = UnaryExpr | Expression binary_op Expression .
  2088  UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .
  2089  
  2090  binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op  .
  2091  rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
  2092  add_op     = "+" | "-" .
  2093  mul_op     = "*" | "/" .
  2094  unary_op   = "+" | "-" | "!" | "*" | rel_op .
  2095  ```
  2096  
Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.
  2099  
  2100  <!-- TODO: durations
  2101   unless the operation involves durations.
  2102  
  2103  Except for duration operations, if one operand is an untyped [literal] and the
  2104  other operand is not, the constant is [converted] to the type of the other
  2105  operand.
  2106  -->
  2107  
Operands of unary and binary expressions may be associated with a default.
In that case the operator is applied to both the values and the defaults
of its operands, as in the following examples:
  2110  
  2111  <!--
  2112  ```
  2113  O1: op (v1, d1)          => (op v1, op d1)
  2114  
  2115  O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
  2116  and because v => (v, v)
  2117  O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
  2118  O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
  2119  ```
  2120  -->
  2121  
  2122  ```
  2123  Field               Resulting Value-Default pair
  2124  a: *1|2             (1|2, 1)
  2125  b: -a               (-a, -1)
  2126  
  2127  c: a + 2            (a+2, 3)
  2128  d: a + a            (a+a, 2)
  2129  ```
  2130  
  2131  #### Operator precedence
  2132  
  2133  Unary operators have the highest precedence.
  2134  
There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
  2137  addition operators, comparison operators,
  2138  `&&` (logical AND), `||` (logical OR), `&` (unification),
  2139  and finally `|` (disjunction):
  2140  
  2141  ```
  2142  Precedence    Operator
  2143      7             *  /
  2144      6             +  -
  2145      5             ==  !=  <  <=  >  >= =~ !~
  2146      4             &&
  2147      3             ||
  2148      2             &
  2149      1             |
  2150  ```
  2151  
  2152  Binary operators of the same precedence associate from left to right.
  2153  For instance, `x / y * z` is the same as `(x / y) * z`.
  2154  
  2155  ```
  2156  +x
  2157  23 + 3*x[i]
  2158  x <= f()
  2159  f() || g()
  2160  x == y+1 && y == z-1
  2161  2 | int
  2162  { a: 1 } & { b: 2 }
  2163  ```
  2164  
  2165  #### Arithmetic operators
  2166  
  2167  Arithmetic operators apply to numeric values and yield a result of the same type
  2168  as the first operand. The four standard arithmetic operators
(`+`, `-`, `*`, `/`) apply to integer and decimal floating-point types;
  2170  `+` and `*` also apply to strings and bytes.
  2171  
  2172  ```
  2173  +    sum                    integers, floats, strings, bytes
  2174  -    difference             integers, floats
  2175  *    product                integers, floats, strings, bytes
  2176  /    quotient               integers, floats
  2177  ```
  2178  
  2179  For any operator that accepts operands of type `float`, any operand may be
  2180  of type `int` or `float`, in which case the result will be `float`
  2181  if it cannot be represented as an `int` or if any of the operands are `float`,
  2182  or `int` otherwise.
  2183  So the result of `1 / 2` is `0.5` and is of type `float`.
  2184  
  2185  The result of division by zero is bottom (an error).
  2186  <!-- TODO: consider making it +/- Inf -->
  2187  Integer division is implemented through the builtin functions
  2188  `quo`, `rem`, `div`, and `mod`.
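The following sketch illustrates mixed `int`/`float` arithmetic and the integer division builtins. It assumes that `quo` and `rem` truncate toward zero, while `div` and `mod` implement Euclidean division. The two pairs can differ when the dividend is negative.

```
a: 1 / 2       // 0.5, a float
b: 3 * 1.5     // 4.5: an int combined with a float yields a float
c: quo(-5, 3)  // -1: truncated toward zero
d: rem(-5, 3)  // -2: has the sign of the dividend
e: div(-5, 3)  // -2: Euclidean division
f: mod(-5, 3)  // 1: never negative
g: 1 / 0       // _|_: division by zero
```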
  2189  
  2190  The unary operators `+` and `-` are defined for numeric values as follows:
  2191  
  2192  ```
  2193  +x                          is 0 + x
  2194  -x    negation              is 0 - x
  2195  ```
  2196  
  2197  #### String operators
  2198  
  2199  Strings can be concatenated using the `+` operator:
  2200  ```
  2201  s: "hi " + name + " and good bye"
  2202  ```
  2203  String addition creates a new string by concatenating the operands.
  2204  
  2205  A string can be repeated by multiplying it:
  2206  
  2207  ```
  2208  s: "etc. "*3  // "etc. etc. etc. "
  2209  ```
  2210  
  2211  <!-- jba: Do these work for byte sequences? If not, why not? -->
  2212  
  2213  
#### Comparison operators
  2215  
  2216  Comparison operators compare two operands and yield an untyped boolean value.
  2217  
  2218  ```
  2219  ==    equal
  2220  !=    not equal
  2221  <     less
  2222  <=    less or equal
  2223  >     greater
  2224  >=    greater or equal
  2225  =~    matches regular expression
  2226  !~    does not match regular expression
  2227  ```
  2228  
  2229  <!-- regular expression operator inspired by Bash, Perl, and Ruby. -->
  2230  
  2231  In any comparison, the types of the two operands must unify or one of the
  2232  operands must be null.
  2233  
  2234  The equality operators `==` and `!=` apply to operands that are comparable.
  2235  The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
  2236  The matching operators `=~` and `!~` apply to a string and regular
  2237  expression operand.
  2238  These terms and the result of the comparisons are defined as follows:
  2239  
  2240  - Null is comparable with itself and any other type.
  2241    Two null values are always equal, null is unequal with anything else.
  2242  - Boolean values are comparable.
  2243    Two boolean values are equal if they are either both true or both false.
  2244  - Integer values are comparable and ordered, in the usual way.
  2245  - Floating-point values are comparable and ordered, as per the definitions
  2246    for binary coded decimals in the IEEE-754-2008 standard.
  2247  - Floating point numbers may be compared with integers.
  2248  - String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
  2250  - Lists are not comparable.
  2251  - The regular expression syntax is the one accepted by RE2,
  2252    described in https://github.com/google/re2/wiki/Syntax,
  2253    except for `\C`.
  2254  - `s =~ r` is true if `s` matches the regular expression `r`.
  2255  - `s !~ r` is true if `s` does not match regular expression `r`.
  2256  
  2257  <!--- TODO: consider the following
  2258  - For regular expression, named capture groups are interpreted as CUE references
  2259    that must unify with the strings matching this capture group.
  2260  --->
  2261  <!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
  2262  <!-- Consider implementing Level 2 of Unicode regular expression. -->
  2263  
  2264  ```
  2265  3 < 4       // true
  2266  3 < 4.0     // true
  2267  null == 2   // false
  2268  null != {}  // true
  2269  {} == {}    // _|_: structs are not comparable against structs
  2270  
  2271  "Wild cats" =~ "cat"   // true
  2272  "Wild cats" !~ "dog"   // true
  2273  
  2274  "foo" =~ "^[a-z]{3}$"  // true
  2275  "foo" =~ "^[a-z]{4}$"  // false
  2276  ```
  2277  
  2278  <!-- jba
  2279  I think I know what `3 < a` should mean if
  2280  
  2281      a: >=1 & <=5
  2282  
  2283  It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.
  2284  
  2285  But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
  2286  -->
  2287  
  2288  #### Logical operators
  2289  
  2290  Logical operators apply to boolean values and yield a result of the same type
  2291  as the operands. The right operand is evaluated conditionally.
  2292  
  2293  ```
  2294  &&    conditional AND    p && q  is  "if p then q else false"
  2295  ||    conditional OR     p || q  is  "if p then true else q"
  2296  !     NOT                !p      is  "not p"
  2297  ```
  2298  
  2299  
  2300  <!--
  2301  ### TODO TODO TODO
  2302  
  2303  3.14 / 0.0   // illegal: division by zero
  2304  Illegal conversions always apply to CUE.
  2305  
  2306  Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
  2307  -->
  2308  
  2309  <!--- TODO(mpvl): conversions
  2310  ### Conversions
  2311  Conversions are expressions of the form `T(x)` where `T` and `x` are
  2312  expressions.
  2313  The result is always an instance of `T`.
  2314  
  2315  ```
  2316  Conversion = Expression "(" Expression [ "," ] ")" .
  2317  ```
  2318  --->
  2319  <!---
  2320  
  2321  A literal value `x` can be converted to type T if `x` is representable by a
  2322  value of `T`.
  2323  
  2324  As a special case, an integer literal `x` can be converted to a string type
  2325  using the same rule as for non-constant x.
  2326  
  2327  Converting a literal yields a typed value as result.
  2328  
  2329  ```
  2330  uint(iota)               // iota value of type uint
  2331  float32(2.718281828)     // 2.718281828 of type float32
  2332  complex128(1)            // 1.0 + 0.0i of type complex128
  2333  float32(0.49999999)      // 0.5 of type float32
  2334  float64(-1e-1000)        // 0.0 of type float64
  2335  string('x')              // "x" of type string
  2336  string(0x266c)           // "♬" of type string
  2337  MyString("foo" + "bar")  // "foobar" of type MyString
  2338  string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
  2339  (*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
  2340  int(1.2)                 // illegal: 1.2 cannot be represented as an int
  2341  string(65.0)             // illegal: 65.0 is not an integer constant
  2342  ```
  2343  --->
  2344  <!---
  2345  
  2346  A conversion is always allowed if `x` is an instance of `T`.
  2347  
  2348  If `T` and `x` of different underlying type, a conversion is allowed if
  2349  `x` can be converted to a value `x'` of `T`'s type, and
  2350  `x'` is an instance of `T`.
  2351  A value `x` can be converted to the type of `T` in any of these cases:
  2352  
  2353  - `x` is a struct and is subsumed by `T`.
  2354  - `x` and `T` are both integer or floating points.
  2355  - `x` is an integer or a byte sequence and `T` is a string.
  2356  - `x` is a string and `T` is a byte sequence.
  2357  
  2358  Specific rules apply to conversions between numeric types, structs,
  2359  or to and from a string type. These conversions may change the representation
  2360  of `x`.
  2361  All other conversions only change the type but not the representation of x.
  2362  
  2363  
  2364  #### Conversions between numeric ranges
  2365  For the conversion of numeric values, the following rules apply:
  2366  
  2367  1. Any integer value can be converted into any other integer value
  2368     provided that it is within range.
  2369  2. When converting a decimal floating-point number to an integer, the fraction
  2370     is discarded (truncation towards zero). TODO: or disallow truncating?
  2371  
  2372  ```
  2373  a: uint16(int(1000))  // uint16(1000)
  2374  b: uint8(1000)        // _|_ // overflow
  2375  c: int(2.5)           // 2  TODO: TBD
  2376  ```
  2377  
  2378  
  2379  #### Conversions to and from a string type
  2380  
  2381  Converting a list of bytes to a string type yields a string whose successive
  2382  bytes are the elements of the slice.
  2383  Invalid UTF-8 is converted to `"\uFFFD"`.
  2384  
  2385  ```
  2386  string('hell\xc3\xb8')   // "hellø"
  2387  string(bytes([0x20]))    // " "
  2388  ```
  2389  
  2390  As string value is always convertible to a list of bytes.
  2391  
  2392  ```
  2393  bytes("hellø")   // 'hell\xc3\xb8'
  2394  bytes("")        // ''
  2395  ```
  2396  
  2397  #### Conversions between list types
  2398  
  2399  Conversions between list types are possible only if `T` strictly subsumes `x`
  2400  and the result will be the unification of `T` and `x`.
  2401  
  2402  If we introduce named types this would be different from IP & [10, ...]
  2403  
  2404  Consider removing this until it has a different meaning.
  2405  
  2406  ```
  2407  IP:        4*[byte]
  2408  Private10: IP([10, ...])  // [10, byte, byte, byte]
  2409  ```
  2410  
  2411  #### Conversions between struct types
  2412  
  2413  A conversion from `x` to `T`
  2414  is applied using the following rules:
  2415  
  2416  1. `x` must be an instance of `T`,
  2417  2. all fields defined for `x` that are not defined for `T` are removed from
  2418    the result of the conversion, recursively.
  2419  
  2420  <!-- jba: I don't think you say anywhere that the matching fields are unified.
  2421  mpvl: they are not, x must be an instance of T, in which case x == T&x,
  2422  so unification would be unnecessary.
  2423  -->
  2424  <!--
  2425  ```
  2426  T: {
  2427      a: { b: 1..10 }
  2428  }
  2429  
  2430  x1: {
  2431      a: { b: 8, c: 10 }
  2432      d: 9
  2433  }
  2434  
  2435  c1: T(x1)             // { a: { b: 8 } }
  2436  c2: T({})             // _|_  // missing field 'a' in '{}'
  2437  c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
  2438  ```
  2439  -->
  2440  
  2441  ### Calls
  2442  
  2443  Calls can be made to core library functions, called builtins.
  2444  Given an expression `f` of function type F,
  2445  ```
  2446  f(a1, a2, … an)
  2447  ```
calls `f` with arguments a1, a2, … an. Arguments must be expressions
whose values are instances of the parameter types of `F`;
they are evaluated before the function is called.
  2451  
  2452  ```
  2453  a: math.Atan2(x, y)
  2454  ```
  2455  
  2456  In a function call, the function value and arguments are evaluated in the usual
  2457  order.
  2458  After they are evaluated, the parameters of the call are passed by value
  2459  to the function and the called function begins execution.
  2460  The return parameters
  2461  of the function are passed by value back to the calling function when the
  2462  function returns.
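A few calls to predeclared builtins are sketched below. The sketch assumes `len` and `or` as listed under predeclared identifiers, and that `len` of a string counts its bytes.

```
a: len([1, 2, 3])  // 3
b: len("abc")      // 3
c: or([1, 2])      // 1 | 2
```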
  2463  
  2464  
### Comprehensions

Lists and fields can be constructed using comprehensions.

Comprehensions define a clause sequence that consists of a sequence of
`for`, `if`, and `let` clauses, nesting from left to right.
The sequence must start with a `for` or `if` clause.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

On each iteration, the `for` clause binds the defined identifiers in a new
scope to the next value of an iterable value.
A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value of
a list element or struct field.
If there are two identifiers, the first is bound to the key or index,
if available, and the second to the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

The current iteration is said to complete if the innermost block of the clause
sequence is reached.
Syntactically, the comprehension value is a struct.
A comprehension can generate non-struct values by embedding such values within
this struct.

Within lists, the values yielded by a comprehension are inserted in the list
at the position of the comprehension.
Within structs, the values yielded by a comprehension are embedded within the
struct.
Both structs and lists may contain multiple comprehensions.

```
Comprehension       = Clauses StructLit .

Clauses             = StartClause { [ "," ] Clause } .
StartClause         = ForClause | GuardClause .
Clause              = StartClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ for x in a if x > 1 { x+1 } ]  // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
d: { "1": 2, "2": 3, "3": 4 }  // same value as c
```
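As an informal model, the clause sequence of `c` above behaves like loops and conditionals nested from left to right. The innermost block yields one field per completed iteration. The Go sketch below illustrates only that nesting and makes several assumptions: the function `comprehendC` is hypothetical, a Go map stands in for the resulting struct, and this is not how a CUE implementation evaluates comprehensions.

```go
package main

import (
	"fmt"
	"strconv"
)

// comprehendC is a hypothetical model of `c` above: each clause nests
// inside the previous one, and the innermost block yields one field.
func comprehendC(a []int) map[string]int {
	c := map[string]int{}
	for _, x := range a { // for x in a
		if x < 4 { // if x < 4
			y := 1 // let y = 1
			// innermost block: "\(x)": x + y
			c[strconv.Itoa(x)] = x + y
		}
	}
	return c
}

func main() {
	fmt.Println(comprehendC([]int{1, 2, 3, 4})) // map[1:2 2:3 3:4]
}
```

The guard skips `x == 4`, so only three fields are produced. This matches `d`.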


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalents.

A placeholder consists of `\(` followed by an expression and a `)`.
The expression is evaluated in the scope within which the string is defined.

The result of the expression is substituted as follows:
- string: as is
- bool: the JSON representation of the bool
- number: a JSON representation of the number that preserves the
precision of the underlying binary coded decimal
- bytes: as is when the placeholder is within single quotes;
otherwise converted to valid UTF-8, replacing each
maximal subpart of an ill-formed subsequence with a single
replacement character (W3C Encoding Standard)
- list: illegal
- struct: illegal


```
a: "World"
b: "Hello \( a )!" // Hello World!
```


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes an argument of one of the types below and
returns a result of type int.

```
Argument type     Result

string            string length in bytes
bytes             length of byte sequence
list              list length, smallest length for an open list
struct            number of distinct data fields, excluding optional fields
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2
```


### `close`

The builtin function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.


### `and`

The built-in function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top (`_`) for the empty list.

```
Expression           Result
and([a, b])          a & b
and([a])             a
and([])              _
```

### `or`

The built-in function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom (`_|_`) for the empty list.

```
Expression           Result
or([a, b])           a | b
or([a])              a
or([])               _|_
```
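Both builtins can be read as folds over the list that start from the identity element of their operator: top for `&` and bottom for `|`. The Go sketch below illustrates only that folding structure. For illustration it assumes that a value can be modeled as a predicate over integers (the type `Constraint` is hypothetical), where top accepts every integer and bottom accepts none. Real CUE values are much richer than this model.

```go
package main

import "fmt"

// Constraint is a hypothetical stand-in for a CUE value: the set of
// integers it admits, represented as a predicate.
type Constraint func(int) bool

var (
	top    Constraint = func(int) bool { return true }  // models _
	bottom Constraint = func(int) bool { return false } // models _|_
)

// and folds & over the list, starting from top, the identity of &.
func and(cs []Constraint) Constraint {
	acc := top
	for _, c := range cs {
		prev, c := acc, c
		acc = func(n int) bool { return prev(n) && c(n) }
	}
	return acc
}

// or folds | over the list, starting from bottom, the identity of |.
func or(cs []Constraint) Constraint {
	acc := bottom
	for _, c := range cs {
		prev, c := acc, c
		acc = func(n int) bool { return prev(n) || c(n) }
	}
	return acc
}

func main() {
	gt0 := Constraint(func(n int) bool { return n > 0 })
	lt10 := Constraint(func(n int) bool { return n < 10 })

	fmt.Println(and([]Constraint{gt0, lt10})(5))  // true: 5 satisfies >0 & <10
	fmt.Println(and([]Constraint{gt0, lt10})(12)) // false
	fmt.Println(and(nil)(42), or(nil)(42))        // true false: top vs bottom
}
```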

### `div`, `mod`, `quo` and `rem`

For two integer values `x` and `y`,
the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with 0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y   div(x, y)  mod(x, y)
 5     3        1          2
-5     3       -2          1
 5    -3       -1          2
-5    -3        2          1
```

For two integer values `x` and `y`,
the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `quo(x, y)` truncated towards zero.

```
 x     y   quo(x, y)  rem(x, y)
 5     3        1          2
-5     3       -1         -2
 5    -3       -1          2
-5    -3        1         -2
```

A zero divisor in either case results in bottom (an error).
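Go's `math/big` package uses the same names for the same two conventions. `Int.Div` and `Int.Mod` implement Euclidean division, while `Int.Quo` and `Int.Rem` truncate like Go's `/` and `%`. The sketch below uses them to reproduce both tables. The helper `divmod` is hypothetical. Where CUE yields bottom for a zero divisor, `math/big` panics instead, so the sketch requires a non-zero `y`.

```go
package main

import (
	"fmt"
	"math/big"
)

// divmod returns div, mod, quo and rem for x and y; y must be non-zero.
func divmod(x, y int64) (div, mod, quo, rem int64) {
	bx, by := big.NewInt(x), big.NewInt(y)
	div = new(big.Int).Div(bx, by).Int64() // Euclidean quotient
	mod = new(big.Int).Mod(bx, by).Int64() // Euclidean remainder: 0 <= mod < |y|
	quo = new(big.Int).Quo(bx, by).Int64() // quotient truncated toward zero
	rem = new(big.Int).Rem(bx, by).Int64() // truncated remainder: sign follows x
	return
}

func main() {
	fmt.Println(" x   y  div mod quo rem")
	for _, p := range [][2]int64{{5, 3}, {-5, 3}, {5, -3}, {-5, -3}} {
		d, m, q, r := divmod(p[0], p[1])
		fmt.Printf("%2d  %2d  %3d %3d %3d %3d\n", p[0], p[1], d, m, q, r)
	}
}
```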


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should treat these as `_`.
Two particular cases are discussed below.


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and postpone validating `a == e` until after all references
in `e` have been resolved.

```
// Config            Evaluates to (requiring concrete values)
x: {                  x: {
    a: b + 100            a: _|_ // cycle detected
    b: a - 100            b: _|_ // cycle detected
}                     }

y: x & {              y: {
    a: 200                a: 200 // asserted that 200 == b + 100
                          b: 100
}                     }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a concrete value,
evaluates to `v`.
Unification is idempotent, so unifying a value with itself ad infinitum,
which is what the cycle represents, results in that same value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.

<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
// Configuration  Evaluated
//    c           Cycles in nodes of type struct evaluate
//  ↙︎   ↖         to the fixed point of unifying their
// a  →  b        values ad infinitum.

a: b & { x: 1 }   // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }   // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }   // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a           b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that form a reference cycle of structs evaluate
to the same value.
If a field value is a disjunction, any element that is part of a cycle
evaluates to this value.
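The substitution steps for the struct cycle `a: b & {x: 1}`, `b: c & {y: 2}`, `c: a & {z: 3}` can be mimicked by a resolver. The resolver substitutes references and replaces any reference to a field that is already being resolved with top, the identity of unification. The Go sketch below assumes a toy model in which a field value is at most one reference unified with a struct of integer fields. The type `field` and the function `resolve` are hypothetical and not taken from any CUE implementation. Disjunctions are not modeled.

```go
package main

import "fmt"

// field models a field value of the form `ref & {lit}`; ref may be empty.
type field struct {
	ref string
	lit map[string]int
}

// resolve evaluates name by substituting references. A reference to a field
// that is already being resolved closes a cycle and is replaced by top
// (an empty struct here), which is the identity of unification.
func resolve(cfg map[string]field, name string, visiting map[string]bool) (map[string]int, error) {
	if visiting[name] {
		return map[string]int{}, nil // eliminate the cyclic reference
	}
	visiting[name] = true
	defer delete(visiting, name)

	f := cfg[name]
	out := map[string]int{}
	if f.ref != "" {
		sub, err := resolve(cfg, f.ref, visiting)
		if err != nil {
			return nil, err
		}
		out = sub
	}
	// Unify the literal struct into the result, failing on conflicting values.
	for k, v := range f.lit {
		if old, ok := out[k]; ok && old != v {
			return nil, fmt.Errorf("conflict on %s: %d != %d", k, old, v)
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	cfg := map[string]field{
		"a": {"b", map[string]int{"x": 1}},
		"b": {"c", map[string]int{"y": 2}},
		"c": {"a", map[string]int{"z": 3}},
	}
	for _, n := range []string{"a", "b", "c"} {
		v, err := resolve(cfg, n, map[string]bool{})
		fmt.Println(n, v, err) // each line shows map[x:1 y:2 z:3] <nil>
	}
}
```

As the spec notes, every node in the cycle reaches the same fixed point, regardless of which node resolution starts from.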


### Structural cycles

A _structural cycle_ occurs when a node references one of its ancestor nodes.
It is possible to construct a structural cycle by unifying two acyclic values:
```
// acyclic
y: {
    f: h: g
    g: _
}
// acyclic
x: {
    f: _
    g: f
}
// introduces structural cycle
z: x & y
```
Implementations should be able to detect such structural cycles dynamically.

A structural cycle can result in infinite structure or evaluation loops.
```
// infinite structure
a: b: a

// infinite evaluation
f: {
    n:   int
    out: n + (f & {n: 1}).out
}
```
The rules below determine under which circumstances CUE allows or disallows
structural cycles.

If a node `a` references an ancestor node, we call it and any of its
field values `a.f` _cyclic_.
So if `a` is cyclic, all of its descendants are also regarded as cyclic.
A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
is valid if any of its conjuncts is not cyclic.

```
// Disallowed: a list of infinite length with all elements being 1.
#List: {
    head: 1
    tail: #List
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}

// #List defines a list of arbitrary length. Because the recursive reference
// is part of a disjunction, this does not result in a structural cycle.
#List: {
    head: _
    tail: null | #List
}

// Usage of #List. The value of tail in the most deeply nested element will
// be `null`: as the value of the disjunct referring to #List is the only
// conjunct, all conjuncts are cyclic, so the value is invalid and is
// eliminated from the disjunction.
MyList: #List & { head: 1, tail: { head: 2 }}
```

<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of a possibly empty set of attributes,
followed by an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

Like a struct, a source file may contain embeddings.
Unlike in a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it is output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(place)!"

place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier or a definition identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module with the same package name belong to the
same package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package (§Program initialization and
execution) and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent, but it is
typically either the path of a builtin package or a fully qualified location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging
to Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters `` !"#$%&'()*,:;<=>?[\]^`{|} ``
or the Unicode replacement character U+FFFD.
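These character restrictions can be checked with Go's `unicode` package. Its category tables `unicode.L`, `unicode.M`, `unicode.N`, `unicode.P` and `unicode.S` correspond to the general categories named above. The function `validImportLocation` below is a hypothetical sketch of such a check, not the validation performed by any CUE tool.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// excluded lists the ASCII characters that may not appear in an ImportLocation.
const excluded = "!\"#$%&'()*,:;<=>?[\\]^`{|}"

// validImportLocation reports whether loc is non-empty, uses only graphic
// characters without spaces (categories L, M, N, P, S), contains none of the
// excluded characters, and contains no U+FFFD. Invalid UTF-8 decodes to
// U+FFFD when ranging over a string, so it is rejected as well.
func validImportLocation(loc string) bool {
	if loc == "" {
		return false
	}
	for _, r := range loc {
		if !unicode.In(r, unicode.L, unicode.M, unicode.N, unicode.P, unicode.S) {
			return false
		}
		if strings.ContainsRune(excluded, r) || r == unicode.ReplacementChar {
			return false
		}
	}
	return true
}

func main() {
	for _, loc := range []string{"lib/math", "example.com/pkg", "", "lib math", "lib:math"} {
		fmt.Printf("%q: %v\n", loc, validImportLocation(loc))
	}
}
```

Excluding `:` from ImportLocation lets the optional `:identifier` suffix of an ImportPath be split off without ambiguity.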

Assume we have a package containing the package clause `package math`,
which exports function `Sin` and is located at the path identified by `"lib/math"`.
This table illustrates how `Sin` is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```
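The naming rule behind this table can be sketched as follows. An explicit PackageName wins. Otherwise the local name is the identifier after `:`, if present, and otherwise the last path component of the ImportLocation. The Go function `localName` below is hypothetical and works on the ImportPath without its quotes. It deliberately skips the check that the imported package's own package clause matches the derived name.

```go
package main

import (
	"fmt"
	"strings"
)

// localName returns the name under which an imported package is accessed.
// explicit is the optional PackageName of the ImportSpec ("" if absent) and
// path is the ImportPath without its surrounding quotes.
func localName(explicit, path string) string {
	if explicit != "" {
		return explicit // import m "lib/math"
	}
	if i := strings.LastIndex(path, ":"); i >= 0 {
		return path[i+1:] // import "lib/math:math": identifier after ':'
	}
	return path[strings.LastIndex(path, "/")+1:] // last path component
}

func main() {
	fmt.Println(localName("", "lib/math"))      // math
	fmt.Println(localName("", "lib/math:math")) // math
	fmt.Println(localName("m", "lib/math"))     // m
}
```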

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.


### An example package

TODO