<!--
Copyright 2018 The CUE Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration and related systems in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has many similarities with the Go language; as a result, this document
draws heavily from the Go specification.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes `""` or back
quotes ` `` `.

The form `a … b` represents the set of characters from a through b as
alternatives. The horizontal ellipsis `…` is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character `…` (as opposed to the three characters `...`) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document uses the unqualified term _character_ to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.
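
As a small illustration of the two rules above (no canonicalization, and every code point distinct), the CUE fragment below compares a precomposed `é` with its decomposed form and declares two fields that differ only in case. It is a sketch: the field names are arbitrary, the code points are written with `\u` escapes so they stay visible, and the results in the comments assume that `len` reports a string's size in bytes, as described under Strings.

```cue
// U+00E9 LATIN SMALL LETTER E WITH ACUTE: a single code point.
precomposed: "\u00e9"
// "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed: "e\u0301"

// No canonicalization takes place, so the two strings differ.
same: precomposed == decomposed // false

// Sizes of the UTF-8 encodings, in bytes.
lenPrecomposed: len(precomposed) // 2
lenDecomposed:  len(decomposed)  // 3

// Upper and lower case letters are different characters,
// so these declare two distinct fields.
name: "lower"
Name: "upper"
```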

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character `_` (U+005F) and the dollar sign `$` (U+0024) are
considered letters.

```
letter        = unicode_letter | "_" | "$" .
decimal_digit = "0" … "9" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence `//`
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals.
White space, formed from spaces (U+0020), horizontal tabs (U+0009), carriage
returns (U+000D), and newlines (U+000A), is ignored except as it separates
tokens that would otherwise combine into a single token. Also, a newline or end
of file may trigger the insertion of a comma. While breaking the input into
tokens, the next token is the longest sequence of characters that form a valid
token.


### Commas

The formal grammar uses commas `,` as terminators in a number of productions.
CUE programs may omit most of these commas using the following rules:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, or `?`
- an ellipsis `...`


Although commas are automatically inserted, the parser requires
explicit commas between two list elements.

<!--
TODO: remove the above exception
-->

To reflect idiomatic use, examples in this document elide commas using
these rules.


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters (which includes `_` and `$`)
and digits, optionally preceded by `#` or `_#`.
It may not be `_` or `$`.
The first character in an identifier, or after a `#` if it contains one,
must be a letter.
Identifiers starting with a `#` or `_` are reserved for definitions and hidden
fields.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier = [ "#" | "_#" ] letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 32 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of predeclared identifiers.

All keywords may be used as labels (field names).
Unless noted otherwise, they can also be used as identifiers to refer to
the same name.


#### Values

The following keywords are values.

```
null    true    false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for in if let
```

<!--
TODO:
    reduce [to]
    order [by]
-->


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     &&    ==    <     =     (     )
-     ||    !=    >     :     {     }
*     &     =~    <=    ?     [     ]     ,
/     |     !~    >=    !     _|_   ...   .
```
<!--
Free tokens: ; ~ ^
// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow then at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
-->


### Numeric literals

There are several kinds of numeric literals.
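
For orientation, the fragment below writes each kind of integer literal covered in this section as a CUE field; the comments give the value each literal denotes under the rules stated here (underscores are ignored, `K`/`M`/`G` are powers of 1000, `Ki` is 1024, and fractional results are truncated toward zero). The field names are illustrative only.

```cue
decimal:   1_000_000 // underscores are ignored: 1000000
hex:       0xFF      // 255
octal:     0o17      // 15
binary:    0b1010    // 10
si:        1.5G      // 1.5 * 1000^3 = 1500000000
iec:       2Ki       // 2 * 1024 = 2048
truncated: 1.3Ki     // 1.3 * 1024 = 1331.2, truncated toward zero: 1331

// A literal with a multiplier is an integer, even when written with a fraction.
isInt: int & 1.5G
```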

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

An _integer literal_ is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: `0o` for octal,
`0x` or `0X` for hexadecimal, and `0b` for binary.
In hexadecimal literals, letters `a … f` and `A … F` represent values 10 through 15.
All integers allow interstitial underscores `_`;
these have no meaning and are solely for readability.

Integer literals may have an SI or IEC multiplier:
`K`, `M`, `G`, `T`, and `P` denote powers of 1000 (SI), while
`Ki`, `Mi`, `Gi`, `Ti`, and `Pi` denote powers of 1024 (IEC).
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
42
1.5G    // 1_500_000_000
1.3Ki   // 1.3 * 1024 = trunc(1331.2) = 1331
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```

A _decimal floating-point literal_ is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional part comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
Either the integer part or the fractional part may be elided, but not both;
likewise, either the decimal point or the exponent may be elided, but not both.

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```

<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

Neither a `float_lit` nor an `si_lit` may appear after a token that is:

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, `?`, or `.`.

<!--
So
`a + 3.2Ti`  -> `a`, `+`, `3.2Ti`
`a 3.2Ti`    -> `a`, `3`, `.`, `2`, `Ti`
`a + .5e3`   -> `a`, `+`, `.5e3`
`a .5e3`     -> `a`, `.`, `5`, `e3`.
-->


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#` by padding the start and end of a string or byte sequence
literal with this number of hash symbols.

<!--
TODO: move these examples further up so it's evident why #" exists.
#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
-->

There are four ways to represent an integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits; and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points, so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting value; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8 encoding of character `U+00FF`.
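
To make the byte-versus-character distinction above concrete, the sketch below compares single-byte escapes with a code point escape. It assumes, as stated under Strings and Bytes, that the builtin `len` reports a length in bytes for both strings and byte sequences; the field names are illustrative.

```cue
// Octal and hexadecimal escapes denote single bytes, so they are only
// allowed in byte sequences (single quotes).
octalByte: '\377' // one byte, value 0xFF
hexByte:   '\xFF' // the same single byte
sameByte:  octalByte == hexByte // true

// \u and \U escapes denote code points, which are encoded as UTF-8:
// U+00FF becomes the two bytes 0xC3 0xBF.
codePoint: "\u00FF"
utf8Bytes: '\xc3\xbf'

lenByte:      len(octalByte) // 1
lenCodePoint: len(codePoint) // 2
lenUTF8:      len(utf8Bytes) // 2
```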

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single-quoted literals)
\"   U+0022 double quote  (valid escape only within double-quoted literals)
```

The escape `\(` introduces a string interpolation.
A `\(` must be followed by a valid CUE expression, followed by a `)`.

A backslash at the end of a line elides the line terminator that follows it.
This may not escape the final newline inside a multiline string: that
newline is already implicitly elided.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
hex_byte_value   = `\` { `#` } "x" hex_digit hex_digit .
little_u_value   = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = `\` { `#` } "(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
multiline_string_lit = `"""` newline
                       { unicode_value | interpolation | newline }
                       newline `"""` .
multiline_bytes_lit  = "'''" newline
                       { unicode_value | interpolation | byte_value | newline }
                       newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
'\xff\u00FF'
"\uD800"     // illegal: surrogate half (TODO: probably should allow)
"\U00110000" // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The newline preceding the closing quote is discarded from the string contents.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking \
    bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has [structs](#structs).
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

<!-- TODO: link to https://cuelang.org/docs/concepts/logic/ as more reading
material, especially for those new to lattices
-->

At the top of the lattice is the single ancestor of all values, called
[top](#top), denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called [bottom](#bottom), denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, and `null`.

A value is _concrete_ if it is either an atom, or a struct whose field values
are all concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
CUE does not distinguish between types and values:
only the relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool`, and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#operands), written `a & b`.
It is commutative, associative, and idempotent.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.


<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunction is sometimes also referred to as a sum type.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b.
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

A disjunction is _normalized_ if there is no element
`a` for which there is a different element `b` such that `a ⊑ b`.

<!--
Normalization is important, as we need to account for spurious elements
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

    ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

    x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
    y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

    x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
    y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any value `v` _may_ be associated with a default value `d`,
where `d` must be an instance of `v` (`d ⊑ v`).

Default values are introduced by means of disjunctions.
Any element of a disjunction can be _marked_ as a default
by prefixing it with an asterisk `*` ([a unary expression](#operators)).
Syntactically consecutive disjunctions are considered to be
part of a single disjunction,
whereby multiple disjuncts can be marked as default.
A _marked disjunction_ is one in which at least one term is marked.
So `a | b | *c | d` is a single marked disjunction of four terms,
whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
one of which is a marked disjunction of three terms.
During unification, if all the marked disjuncts of a marked disjunction are
eliminated, then the remaining unmarked disjuncts are considered as if they
originated from an unmarked disjunction.
<!-- TODO: this formulation should be worked out more. -->
As explained below, distinguishing the nesting of disjunctions like this
is only relevant when both an outer and nested disjunction are marked.
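
The following sketch shows a marked disjunction supplying a default, and what happens when unification eliminates the marked disjuncts. The field names are hypothetical; the outcomes in the comments follow from the rewrite rules for defaults given in this section (in particular U1).

```cue
// The marked disjunct supplies a default: port resolves to 8080
// when nothing else constrains it.
port: *8080 | int

// Unifying with "udp" eliminates the marked disjunct "tcp",
// so proto is simply "udp".
proto: *"tcp" | "udp"
proto: "udp"

// When every marked disjunct is eliminated, the remaining terms behave
// as an unmarked disjunction: mode stays "b" | "c" with no default.
mode: *"a" | "b" | "c"
mode: "b" | "c"
```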

Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.

To define the unification and disjunction operations, we use the notation
`⟨v⟩` to denote a CUE value `v` that is not associated with a default
and the notation `⟨v, d⟩` to denote a value `v` associated with a default
value `d`.

The rewrite rules for unifying such values are as follows:
```
U0: ⟨v1⟩ & ⟨v2⟩         => ⟨v1&v2⟩
U1: ⟨v1, d1⟩ & ⟨v2⟩     => ⟨v1&v2, d1&v2⟩
U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
```

The rewrite rules for disjoining terms of unmarked disjunctions are as follows:
```
D0: ⟨v1⟩ | ⟨v2⟩         => ⟨v1|v2⟩
D1: ⟨v1, d1⟩ | ⟨v2⟩     => ⟨v1|v2, d1⟩
D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
```

Terms of marked disjunctions are first rewritten according to the following
rules:
```
M0:  ⟨v⟩    => ⟨v⟩        don't introduce defaults for unmarked term
M1: *⟨v⟩    => ⟨v, v⟩     introduce identical default for marked term
M2: *⟨v, d⟩ => ⟨v, d⟩     keep existing defaults for marked term
M3:  ⟨v, d⟩ => ⟨v⟩        strip existing defaults from unmarked term
```

Note that for any marked disjunction `a`,
the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.

```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ⟨"tcp"|"udp", "tcp"⟩    M1, D1
string | *"foo"          ⟨string, "foo"⟩         M1, D1

*1 | 2 | 3               ⟨1|2|3, 1⟩              M1, D1

(*1|2|3) | (1|*2|3)      ⟨1|2|3, 1|2⟩            M1, D1, D2
(*1|2|3) | *(1|*2|3)     ⟨1|2|3, 2⟩              M1, M2, M3, D1, D2
(*1|2|3) | (1|*2|3)&2    ⟨1|2|3, 1|2⟩            M1, D1, U1, D2

(*1|2) & (1|*2)          ⟨1|2, _|_⟩              M1, D1, U2
```

<!-- TODO: define and consistently use the value-default pair syntax -->

The rules of subsumption for defaults can be derived from the above definitions
and are as follows.

```
⟨v2, d2⟩ ⊑ ⟨v1, d1⟩  if v2 ⊑ v1 and d2 ⊑ d1
⟨v1, d1⟩ ⊑ ⟨v⟩       if v1 ⊑ v
⟨v⟩      ⊑ ⟨v1, d1⟩  if v ⊑ d1
```

<!--
For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.

The last one is so restrictive as v could still be made more specific by
associating it with a default that is not subsumed by d1.

Proof:
  by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
  where the most general value is (v, v).
  Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
  from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
  exactly defines the boundary of this subsumption.
-->

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable, as long as we require that
disjunction literals be normalized).
-->

```
Expression                       Resolves to
"tcp" | "udp"                    "tcp" | "udp"
*"tcp" | "udp"                   "tcp"
float | *1                       1
*string | 1.0                    string
(*1|2) + (2|*3)                  4

(*1|2|3) | (1|*2|3)              1|2
(*1|2|3) & (1|*2|3)              1|2|3 // default is _|_

(* >=5 | int) & (* <=5 | int)    5

(*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
(*"tcp"|"udp") & "tcp"           "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_

(*true | false) & bool           true
(*true | false) & (true | false) true

{a: 1} | {b: 1}                  {a: 1} | {b: 1}
{a: 1} | *{b: 1}                 {b:1}
*{a: 1} | *{b: 1}                {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a:1}        {a:1} | {a: 1, b: 1}
({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b:1}
```


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.
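
A few ways an evaluation error, and hence bottom, can arise are sketched below. The fragment is deliberately erroneous: evaluating it is expected to report conflicts rather than produce output, and the wording of any error messages is implementation-specific. The field names are illustrative.

```cue
// Conflicting concrete values: their greatest lower bound is bottom.
a: 1 & 2

// Conflicting types: no value is both an int and a string.
b: int & "hello"

// Bottom may also be written explicitly.
c: _|_
```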

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.

```
bottom_lit = "_|_" .
```


### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr        Result
_ & 5       5
_ & _       _
_ & _|_     _|_
_ | _|_     _
```


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit = "null" .
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```


### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
bool_lit = "true" | "false" .
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
TODO: would be nice to make this a rendered diagram with Mermaid.

    number
     / \
  int  float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int`, and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat preciser and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
  a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
  a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions, for which an unusual loss of precision must be explicitly documented.


### Strings

The _string type_ represents the set of UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

The length of a string `s` (its size in bytes) can be discovered using
the builtin function `len`.


### Bytes

The _bytes type_ represents the set of byte sequences.
A byte sequence value is a (possibly empty) sequence of bytes.
The number of bytes is called the length of the byte sequence
and is never negative.
The predeclared byte sequence type is `bytes`; it is a defined type.
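The typing rules for numbers, strings, and bytes above can be checked with a small sketch. The field names are arbitrary, and the bytes literal `'abc'` assumes the single-quoted byte sequence syntax defined in the literals section of this specification.

```cue
i: int & 1      // 1
f: float & 1.0  // 1.0
n: number & 1   // 1: int is an instance of number

// A float literal is never an int, and an int literal never matches float.
x: int & 1.0    // _|_
y: float & 1    // _|_

// len reports the length in bytes.
s: len("hello") // 5
b: len('abc')   // 3: 'abc' is a bytes literal
```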


### Bounds

A _bound_, syntactically a [unary expression](#operands), defines
a logically infinite disjunction of concrete values represented as a single comparison.
For example, `>= 2` represents the infinite disjunction `2|3|4|5|6|7|…`.

For any [comparison operator](#comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.


```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is _defined_ for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1.0} ⊑ {a: int, b: number}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```

The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is defined in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
Any [references](#references) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its defined
fields evaluates to bottom, recursively.

A struct literal may contain multiple fields with the same label,
the result of which is the unification of all those fields.

```
StructLit       = "{" { Declaration "," } "}" .
Declaration     = Field | Ellipsis | Embedding | LetClause | attribute .
Ellipsis        = "..." [ Expression ] .
Embedding       = Comprehension | AliasExpr .
Field           = Label ":" { Label ":" } AliasExpr { attribute } .
Label           = [ identifier "=" ] LabelExpr .
LabelExpr       = LabelName [ "?" | "!" ] | "[" AliasExpr "]" .
LabelName       = identifier | simple_string_lit | "(" AliasExpr ")" .

attribute       = "@" identifier "(" attr_tokens ")" .
attr_tokens     = { attr_token |
                    "(" attr_tokens ")" |
                    "[" attr_tokens "]" |
                    "{" attr_tokens "}" } .
attr_token      = /* any token except '(', ')', '[', ']', '{', or '}' */
```

```
Expression                             Result
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
```


#### Field constraints

A struct may declare _field constraints_ which define values
that should be unified with a given field once it is defined.
The existence of a field constraint declares, but does not define, that field.

Syntactically, a field is marked as a constraint
by following its label with an _optional_ marker `?`
or _required_ marker `!`.
These markers are not part of the field name.

A struct that has a required field constraint with a bottom value
evaluates to bottom.
An optional field constraint with a bottom value does _not_ invalidate
the struct that contains it
as long as it is not unified with a defined field.

The subsumption relation for fields with the various markers is defined as
```
{a: x} ⊑ {a!: x} ⊑ {a?: x}
```
for any given `x`.

Implementations may error upon encountering a required field constraint
when manifesting CUE as data.
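A typical use of field constraints is in a schema. In this sketch `#Service`, `web`, and `draft` are hypothetical names: `web` defines the required field, while `draft` leaves `name` as a constraint only, so manifesting `draft` as data may fail as described above.

```cue
#Service: {
	name!: string // every concrete instance must define name
	port?: int    // port may be omitted
}

// name is defined, so the required constraint is satisfied.
web: #Service & {
	name: "web"
	port: 8080
}

// name! is declared but still not defined here.
draft: #Service & {
	port: 9000
}
```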

```
Expression                      Result
{foo?: 3} & {foo: 3}            {foo: 3}
{foo!: 3} & {foo: 3}            {foo: 3}

{foo!: int} & {foo: int}        {foo: int}
{foo!: int} & {foo?: <1}        {foo!: <1}
{foo!: int} & {foo: <=3}        {foo: <=3}
{foo!: int} & {foo: 3}          {foo: 3}

{foo!: 3} & {foo: int}          {foo: 3}
{foo!: 3} & {foo: <=4}          {foo: 3}

{foo?: 1} & {foo?: 2}           {foo?: _|_} // No error
{foo?: 1} & {foo!: 2}           _|_
{foo?: 1} & {foo: 2}            _|_
```

<!-- see https://github.com/cue-lang/proposal/blob/main/designs/1951-required-fields-v2.md -->

<!--NOTE: About bottom values for optional fields being okay.

The proposition ¬P is a close cousin of P → ⊥ and is often used
as an approximation to avoid the issues of using not.
Bottom (⊥) is also frequently used to mean undefined. This makes sense.
Consider `{a?: 2} & {a?: 3}`.
Both structs say `a` is optional; in other words, it may be omitted.
So we can still get a valid result by omitting `a`, even in
case of a conflict.

Granted, this definition may lead to confusing results, especially in
definitions, when tightening an optional field leads to unintentionally
discarding it.
It could be a role of vet checkers to identify such cases (and suggest users
to explicitly use `_|_` to discard a field, for instance).

TODO: These examples show also how field constraints interact with defaults.
Should we include this? Probably not necessary, as this is an orthogonal
concern.
```
Expression                          Result
a: { foo?: string }                 a: { foo?: string }
b: { foo: "bar" }                   b: { foo: "bar" }
c: { foo?: *"baz" | string }        c: { foo?: *"baz" | string }

d: a & b                            { foo: "bar" }
e: b & c                            { foo: "bar" }
f: a & c                            { foo?: *"baz" | string }
g: a & { foo?: number }             { foo?: _|_ } // This is fine
h: b & { foo?: number }             _|_
i: c & { foo: string }              { foo: *"baz" | string }
```
-->


#### Dynamic fields

A _dynamic field_ is a field whose label is determined by
an expression wrapped in parentheses.
A dynamic field may be marked as optional or required.

```
Expression                          Result
a: "foo"                            a: "foo"
b: "bar"                            b: "bar"
(a): "baz"                          foo: "baz"

(a+b): "qux"                        foobar: "qux"

(a)?: string                        foo?: string
(b)!: string                        bar!: string
```


#### Pattern and default constraints

A struct may define constraints that apply to a collection of fields.

A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
is a value of type string, and a value to unify with fields whose label
unifies with the pattern.
For a given struct `a` with pattern constraint `[p]: v`, `v` is unified
with any field with name `f` in `a` for which `p & f` is not bottom.
When unifying structs `a` and `b`,
any pattern constraints declared in `a` and `b`
are also declared in the result of unification.

<!-- TODO: Update grammar and support this.
A pattern constraint with a pattern preceded by `...` indicates
the pattern can only match fields in `b` for which there
exists no field in `a` with the same label.
-->

Additionally, a _default constraint_, denoted `...value`, defines a value
to unify with any field for which there is no other declaration in a struct.
When unifying structs `a` and `b`,
a default constraint `...v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label does not unify with any of the patterns of the pattern
constraints defined for `a` _and_ for which there exists no field declaration
in `a` with that label.
The token `...` is a shorthand for `..._`.
_Note_: default constraints of the form `..._` are not yet implemented.


```
a: {
	foo:      string // foo is a string
	[=~"^i"]: int    // all other fields starting with i are integers
	[=~"^b"]: bool   // all other fields starting with b are booleans
	[>"c"]:   string // all other fields lexically after c are strings

	...string // all other fields must be a string. Note: default constraints are not yet implemented.
}

b: a & {
	i3:    3
	bar:   true
	other: "a string"
}
```

<!--
TODO: are these two equivalent? Rog says that maybe you'll be able to refer
to optional fields at some point, which will never make sense for patterns.
Marcel says this is already mentioned elsewhere.

	a: {
		["foo"]: int
		foo?:    int
	}
-->

Concrete field labels may be an identifier or string, the latter of which may be
interpolated.
Fields with identifier labels can be referred to within the scope in which they
are defined; string labels cannot.
References within such interpolated strings are resolved within
the scope of the struct in which the label sequence is
defined and can reference concrete labels lexically preceding
the label within a label sequence.
<!-- We allow this so that rewriting a CUE file to collapse or expand
field sequences has no impact on semantics.
-->

<!--TODO: first implementation round will not yet have expression labels

An ExpressionLabel sets a collection of optional fields to a field value.
By default it defines this value for all possible string labels.
An optional expression limits this to the set of optional fields whose
labels match the expression.
-->


<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->


<!-- NOTE:
A DefinitionDecl does not allow repeated labels. This is to avoid
any ambiguity or confusion about whether earlier path components
are to be interpreted as declarations or normal fields (they should
always be normal fields.)
-->

<!--NOTE:
The syntax has been deliberately restricted to allow for the following
future extensions and relaxations:
  - Allow omitting a "?" in an expression label to indicate a concrete
    string value (but maybe we want to use () for that).
  - Make the "?" in expression label optional if expression labels
    are always optional.
  - Or allow eliding the "?" if the expression has no references and
    is obviously not concrete (such as `[string]`).
  - The expression of an expression label may also indicate a struct with
    integer or even number labels
    (beware of imprecise computation in the latter).
      e.g. `{ [int]: string }` is a map of integers to strings.
  - Allow for associative lists (`foo [@.field]: {field: string}`)
  - The `...` notation can be extended analogously to that of a ListList,
    by allowing it to follow with an expression for the remaining properties.
    In that case it is no longer a shorthand for `[string]: _`, but rather
    would define the value for any other value for which there is no field
    defined.
    Like the definition with List, this is somewhat odd, but it allows the
    encoding of JSON schema's and (non-structural) OpenAPI's
    additionalProperties and additionalItems.
-->

```
intMap: [string]: int
intMap: {
	t1: 43
	t2: 2.4 // error: 2.4 is not an integer
}

nameMap: [string]: {
	firstName: string
	nickName:  *firstName | string
}

nameMap: hank: firstName: "Hank"
```

The pattern constraint `[string]` declared for `nameMap` matches every field,
in this case just `hank`, and unifies the associated constraint
with the matched field, resulting in:

```
nameMap: hank: {
	firstName: "Hank"
	nickName:  "Hank"
}
```


#### Closed structs

By default, structs are open to adding fields.
Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:

```
S: {
	field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
	field1: string
	field2: string
}

A1: A & {
	feild1: "foo" // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not declare any field
with a name that does not match the name of a field
or the pattern of a pattern constraint defined in `c`.
Hidden fields are excluded from this limitation.
A struct that is the result of unifying any struct with a [`...`](#structs)
declaration is defined for all regular fields.
Closing a struct is equivalent to adding `..._|_` to it.

Syntactically, structs are closed explicitly with the `close` builtin or
implicitly and recursively by [definitions](#definitions-and-hidden-fields).


```
A: close({
	field1: string
	field2: string
})

A1: A & {
	feild1: string
} // _|_ feild1 not defined for A

A2: A & {
	for k,v in { feild1: string } {
		"\(k)": v
	}
} // _|_ feild1 not defined for A

C: close({
	[_]: _
})

C2: C & {
	for k,v in { thisIsFine: string } {
		"\(k)": v
	}
}

D: close({
	// Values generated by comprehensions are treated as embeddings.
	for k,v in { x: string } {
		"\(k)": v
	}
})
```

<!-- (jba) Somewhere it should be said that optional fields are only
interesting inside closed structs. -->

<!-- TODO: move embedding section to above the previous one -->

#### Embedding

A struct may contain an _embedded value_, an operand used as a declaration.
An embedded value of type struct is unified with the struct in which it is
embedded, but disregarding the restrictions imposed by closed structs.
So if an embedding resolves to a closed struct, the corresponding enclosing
struct will also be closed, but may have fields that are not allowed if
normal rules for closed structs were observed.

If an embedded value is not of type struct, the struct may only have
definitions or hidden fields. Regular fields are not allowed in such a case.

The result of `{ A }` is `A` for any `A` (including definitions).

Syntactically, embeddings may be any expression.
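Struct embeddings are the common case; for a non-struct embedding, the rules above can be sketched as follows (all field names are hypothetical). Only definitions and hidden fields may accompany the embedded scalar, and, as stated for definitions and hidden fields in general, neither is emitted when the value is converted to data.

```cue
// The embedded scalar becomes the value of the struct: x evaluates to 8080.
x: {
	#Port: int & >0 // a definition may accompany a scalar embedding
	_note: "hidden" // so may a hidden field
	8080
}

// The result of { A } is A, so z also evaluates to 8080.
z: {8080}

// A regular field cannot be combined with a scalar embedding: y is an error.
y: {
	a: 1
	8080
}
```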

```
S1: {
	a: 1
	b: 2
	{
		c: 3
	}
}
// S1 is { a: 1, b: 2, c: 3 }

S2: close({
	a: 1
	b: 2
	{
		c: 3
	}
})
// same as close(S1)

S3: {
	a: 1
	b: 2
	close({
		c: 3
	})
}
// same as S2
```


#### Definitions and hidden fields

A field is a _definition_ if its identifier starts with `#` or `_#`.
A field is _hidden_ if its identifier starts with a `_`.
All other fields are _regular_.

Definitions and hidden fields are not emitted when converting a CUE program
to data and are never required to be concrete.

Referencing a definition will recursively [close](#closed-structs) it.
That is, a referenced definition will not unify with a struct
that would add a field anywhere within the definition that it does not
already define or explicitly allow with a pattern constraint or `...`.
[Embedding](#embedding) allows bypassing this check.

If referencing a definition would always result in an error, implementations
may report this inconsistency at the point of its declaration.

```
#MyStruct: {
	sub: field: string
}

#MyStruct: {
	sub: enabled?: bool
}

myValue: #MyStruct & {
	sub: feild: 2      // error, feild not defined in #MyStruct
	sub: enabled: true // okay
}

#D: {
	#OneOf

	c: int // adds this field.
}

#OneOf: { a: int } | { b: int }


D1: #D & { a: 12, c: 22 }  // { a: 12, c: 22 }
D2: #D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
```


```
#A: {a: int}

B: {
	#A
	b: c: int
}

x: B
x: d: 3  // not allowed, as closed by embedded #A

y: B.b
y: d: 3  // allowed as nothing closes b

#B: {
	#A
	b: c: int
}

z: #B.b
z: d: 3  // not allowed, as referencing #B closes b
```


<!---
JSON fields are usually camelCase. Clashes can be avoided by adopting the
convention that definitions be TitleCase. Unexported definitions are still
subject to clashes, but those are likely easier to resolve because they are
package internal.
--->


#### Attributes

Attributes allow associating meta information with values.
Their primary purpose is to define mappings between CUE and
other representations.
Attributes do not influence the evaluation of CUE.

An attribute associates an identifier with a value, a balanced token sequence,
which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
The sequence may not contain interpolations.

Fields, structs and packages can be associated with a set of attributes.
Attributes accumulate during unification, but implementations may remove
duplicates that have the same source string representation.
The interpretation of an attribute, including the handling of multiple
attributes for a given identifier, is up to the consumer of the attribute.

Field attributes define additional information about a field,
such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
name of the field when mapping to a different language.


```
// Package attribute
@protobuf(proto3)

myStruct1: {
	// Struct attribute:
	@jsonschema(id="https://example.org/mystruct1.json")

	// Field attributes
	field: string @go(Field)
	attr:  int    @xml(,attr) @go(Attr)
}

myStruct2: {
	field: string @go(Field)
	attr:  int    @xml(a1,attr) @go(Attr)
}

Combined: myStruct1 & myStruct2
// field: string @go(Field)
// attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
```


#### Aliases

Aliases name values that can be referred to
within the [scope](#declarations-and-scopes) in which they are declared.
The name of an alias must be unique within its scope.

```
AliasExpr  = [ identifier "=" ] Expression .
```

Aliases can appear in several positions:

<!--- TODO: consider allowing this. It should be considered whether
having field aliases isn't already sufficient.

As a declaration in a struct (`X=value`):

- binds identifier `X` to a value embedded within the struct.
--->

In front of a Label (`X=label: value`):

- binds the identifier to the same value as `label` would be bound
  to if it were a valid identifier.

In front of a dynamic field (`X=(label): value`):

- binds the identifier to the same value as `label` if it were a valid
  static identifier.

In front of a dynamic field expression (`(X=expr): value`):

- binds the identifier to the concrete label resulting from evaluating `expr`.

In front of a pattern constraint (`X=[expr]: value`):

- binds the identifier to the same field as that matched by the pattern
  within the instance of the field value (`value`).

In front of a pattern constraint expression (`[X=expr]: value`):

- binds the identifier to the concrete label that matches `expr`
  within the instances of the field value (`value`).

Before a value (`foo: X=x`):

- binds the identifier to the value it precedes within the scope of that value.

Before a list element (`[ X=value, X+1 ]`) (Not yet implemented):

- binds the identifier to the list element it precedes within the scope of the
  list expression.

<!-- TODO: explain the difference between aliases and definitions.
Now that you have definitions, are aliases really necessary?
Consider removing.
-->

```
// A field alias
foo: X // 4
X="not an identifier": 4

// A value alias
foo: X={x: X.a}
bar: foo & {a: 1}  // {a: 1, x: 1}

// A label alias
[Y=string]: { name: Y }
foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
```

<!-- TODO: also allow aliases as lists -->


#### Let declarations

_Let declarations_ bind an identifier to an expression.
The identifier is only visible within the [scope](#declarations-and-scopes)
in which it is declared.
The identifier must be unique within its scope.

```
let x = expr

a: x + 1
b: x + 2
```

#### Shorthand notation for nested structs

A field whose value is a struct with a single field may be written as
a colon-separated sequence of the two field names,
followed by a colon and the value of that single field.

```
job: myTask: replicas: 2
```
expands to
```
job: {
	myTask: {
		replicas: 2
	}
}
```

<!-- OPTIONAL FIELDS:

The optional marker solves the issue of having to print large amounts of
boilerplate when dealing with large types with many optional or default
values (such as Kubernetes).
Writing such optional values in terms of *null | value is tedious,
unpleasant to read, and as it is not well defined what can be dropped or not,
all null values have to be emitted from the output, even if the user
doesn't override them.
Part of the issue is how null is defined. We could adopt a Typescript-like
approach of introducing "void" or "undefined" to mean "not defined and not
part of the output". But having all of null, undefined, and void can be
confusing. If these ever are introduced anyway, the ? operator could be
expressed along the lines of
   foo?: bar
being a shorthand for
   foo: void | bar
where void is the default if no other default is given.

The current mechanical definition of "?" is straightforward, though, and
probably avoids the need for void, while solving a big issue.

Caveats:
[1] this definition requires explicitly defined fields to be emitted, even
if they could be elided (for instance if the explicit value is the default
value defined for an optional field). This is probably a good thing.

[2] a default value may still need to be included in an output if it is not
the zero value for that field and it is not known if any outside system is
aware of defaults. For instance, which defaults are specified by the user
and which by the schema understood by the receiving system.
The use of "?" together with defaults should therefore be used carefully
in non-schema definitions.
Problematic cases should be easy to detect by a vet-like check, though.

[3] It should be considered how this affects the trim command.
Should values implied by optional fields be allowed to be removed?
Probably not. This restriction is unlikely to limit the usefulness of trim,
though.

[4] There should be an option to emit all concrete optional values.
```
-->

### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list is the number of elements as a lower bound
and an unlimited number of elements as its upper bound.

```
ListLit       = "[" [ ElementList [ "," ] ] "]" .
ElementList   = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] .
```

Lists can be thought of as structs:

```
List: *null | {
	Elem: _
	Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[1, 2, ...]` can be represented as:
```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```
and the closed version of this list, `[1, 2]`, as
```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rule for lists can
be derived from those of structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special and `len` will not work as
expected in these cases.


## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence scoping.


### Declarations and scope

A _declaration_ may bind an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
For fields, an identifier may be declared more than once within the same block,
resulting in a field with a value that is the result of unifying the values
of all fields with the same identifier.
String labels do not bind an identifier to the respective field.

The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field
   declared at top level (outside any struct literal) is the package block.
1. The scope of an identifier denoting an alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of a let identifier
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field, alias or let identifier declared inside a struct
   literal is the innermost containing block.
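The scoping rules above, together with the redeclaration rule that follows, can be illustrated with a short sketch; all identifiers in it are hypothetical.

```cue
// A top-level let is scoped to the file block.
let greeting = "hi"

// A top-level field is scoped to the package block.
x: 1

a: {
	x: 2        // redeclares x in the block of this struct literal
	y: x        // 2: the innermost declaration of x is in scope
	z: greeting // "hi": the file-scoped let is visible in nested blocks
}

b: x // 1: outside a, x refers to the top-level field
```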

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.

The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword which is the name
of the predefined identifier, prefixed with `__`.

```
Functions
len       close     and       or

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
              <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
              <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
              <=1.797693134862315708145274237317043567981e+308
```


### Exported identifiers

<!-- move to a more logical spot -->

An identifier of a package may be exported to permit access to it
from another package.
All identifiers not starting with `_` (so all regular fields and definitions
starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a separate namespace from namesake identifiers of other packages.

```
package mypackage

foo:   string // visible outside mypackage
"bar": string // visible outside mypackage

#Foo: {      // visible outside mypackage
	a:  1    // visible outside mypackage
	_b: 2    // not visible outside mypackage

	#C: {    // visible outside mypackage
		d: 4 // visible outside mypackage
	}
	_#E: foo // not visible outside mypackage
}
```


### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
[Unicode Annex #31](https://unicode.org/reports/tr31/).
Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
Otherwise, they are the same.


### Field declarations

A field associates the value of an expression with a label within a struct.
If this label is an identifier, it binds the field to that identifier,
so the field's value can be referenced by writing the identifier.
String labels are not bound to fields.
```
a: {
	b:   2
	"s": 3

	c: b   // 2
	d: s   // _|_ unresolved identifier "s"
	e: a.s // 3
}
```

If an expression may result in a value associated with a default value
as described in [default values](#default-values), the field binds to this
value-default pair.


<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.

The issue is with code generation.
As no guarantee can be given that
a predeclared identifier is not overridden in one of the enclosing scopes,
code will have to handle detecting such cases and renaming them.
An alternative is to have the predeclared identifiers be aliases for namesake
equivalents starting with a double underscore (e.g. string -> __string),
allowing generated code (normal code would keep using `string`) to refer
to these directly.
-->


### Let declarations

<!--
TODO: why are there two "Let declarations" sections?
-->

Within a struct, a let clause binds an identifier to the given expression.

Within the scope of the identifier, the identifier refers to the
_locally declared_ expression.
The expression is evaluated in the scope in which it was declared.


## Expressions

An expression specifies the computation of a value by applying operators and
builtin functions to operands.

Expressions that require concrete values are called _incomplete_ if any of
their operands are not concrete, but define a value that would be legal for
that expression.
Incomplete expressions may be left unevaluated until a concrete value is
requested at the application level.

### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting
a field, alias, or let declaration, or a parenthesized expression.

```
Operand     = Literal | OperandName | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported](#import-declarations).
The identifier must be declared in the [package block](#blocks) of that package.

```
math.Sin  // denotes the Sin function in package math
```

### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different mechanism to evaluate as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```


### Primary expressions

Primary expressions are the operands for unary and binary expressions.

```
PrimaryExpr =
    Operand |
    PrimaryExpr Selector |
    PrimaryExpr Index |
    PrimaryExpr Arguments .

Selector  = "." (identifier | simple_string_lit) .
Index     = "[" Expression "]" .
Argument  = Expression .
Arguments = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
TODO:
    PrimaryExpr Query |
Query   = "." Filters .
Filters = Filter { Filter } .
Filter  = "[" [ "?" ] AliasExpr "]" .

TODO: maybe reintroduce slices, as they are useful in queries, probably this
time with Python semantics.
    PrimaryExpr Slice |
Slice = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .

Argument = Expression | ( identifier ":" Expression ).

// & expression type
// string_lit: same as label. Arguments is current node.
// If selector is applied to list, it performs the operation for each
// element.

TODO: consider allowing decimal_lit for selectors.
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the element of a <!--list or -->struct `x` identified by `f`.
<!--For structs, -->
`f` must be an identifier or a string literal identifying
any definition or regular non-optional field.
The identifier `f` is called the field selector.

<!--
Allowing strings to be used as field selectors obviates the need for
backquoted identifiers. Note that some standards use names for structs that
are not standard identifiers (such as "Fn::Foo"). Note that indexing does not
allow access to identifiers.
-->

<!--
For lists, `f` must be an integer and follows the same lookup rules as
for the index operation.
The type of the selector expression is the type of `f`.
-->

If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
    (x.a & string, _|_)
    (string|"foo", _|_)
    (string|"foo", "foo")
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
2111 --> 2112 2113 Otherwise, if `x` is not a <!--list or -->struct, 2114 or if `f` does not exist in `x`, 2115 the result of the expression is bottom (an error). 2116 In the latter case the expression is incomplete. 2117 The operand of a selector may be associated with a default. 2118 2119 ``` 2120 T: { 2121 x: int 2122 y: 3 2123 "x-y": 4 2124 } 2125 2126 a: T.x // int 2127 b: T.y // 3 2128 c: T.z // _|_ // field 'z' not found in T 2129 d: T."x-y" // 4 2130 2131 e: {a: 1|*2} | *{a: 3|*4} 2132 f: e.a // 4 (default value) 2133 ``` 2134 2135 <!-- 2136 ``` 2137 (v, d).f => (v.f, d.f) 2138 2139 e: {a: 1|*2} | *{a: 3|*4} 2140 f: e.a // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4) 2141 2142 ``` 2143 --> 2144 2145 2146 ### Index expressions 2147 2148 A primary expression of the form 2149 2150 ``` 2151 a[x] 2152 ``` 2153 2154 denotes the element of a list or struct `a` indexed by `x`. 2155 The value `x` is called the index or field name, respectively. 2156 The following rules apply: 2157 2158 If `a` is not a struct: 2159 2160 - `a` is a list (which need not be complete) 2161 - the index `x` unified with `int` must be concrete. 2162 - the index `x` is in range if `0 <= x < len(a)`, where only the 2163 explicitly defined values of an open-ended list are considered, 2164 otherwise it is out of range 2165 2166 The result of `a[x]` is 2167 2168 for `a` of list type: 2169 2170 - the list element at index `x`, if `x` is within range 2171 - bottom (an error), otherwise 2172 2173 2174 for `a` of struct type: 2175 2176 - the index `x` unified with `string` must be concrete. 
- the value of the regular and non-optional field named `x` of struct `a`,
  if this field exists
- bottom (an error), otherwise


```
a: [ 1, 2 ][1]      // 2
b: [ 1, 2 ][2]      // _|_
c: [ 1, 2, ...][2]  // _|_

// Defaults are selected for both operand and index:
x: [1, 2] | *[3, 4]
y: int | *1
z: x[y]  // 4
```

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.

<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

<!--
Operands of unary and binary expressions may be associated with a default using
the following:

```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```
-->

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /
    6             +  -
    5             ==  !=  <  <=  >  >=  =~  !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. The four standard arithmetic operators
`(+, -, *, /)` apply to integer and decimal floating-point types;
`+` and `*` also apply to strings and bytes.

```
+    sum          integers, floats, strings, bytes
-    difference   integers, floats
*    product      integers, floats, strings, bytes
/    quotient     integers, floats
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float`
if it cannot be represented as an `int` or if any of the operands are `float`,
or `int` otherwise.
So the result of `1 / 2` is `0.5` and is of type `float`.

The result of division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->
Integer division is implemented through the builtin functions
`quo`, `rem`, `div`, and `mod`.
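
For illustration, the fragment below applies these rules. It is a minimal sketch: the field names are arbitrary, and the results in the comments follow from the int/float rule above and from the definitions of `div`, `mod`, `quo`, and `rem` given under builtin functions. The division by zero is commented out so that the rest of the fragment evaluates without error.

```cue
// Only int operands: the result is an int.
a: 3 + 4       // 7
// A float operand makes the result a float.
b: 3 + 4.0     // 7.0
// A quotient that is not integral is a float.
c: 1 / 2       // 0.5

// Integer division uses builtin functions rather than an operator.
d: div(-7, 2)  // -4 (Euclidean: -7 == 2*-4 + 1, with 0 <= 1 < 2)
e: mod(-7, 2)  // 1
f: quo(-7, 2)  // -3 (truncated towards zero)
g: rem(-7, 2)  // -1 (-7 == -3*2 + -1)

// Division by zero is bottom (an error):
// h: 1 / 0    // _|_
```

Note how `div`/`mod` and `quo`/`rem` agree for non-negative operands but differ once the dividend is negative.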

The unary operators `+` and `-` are defined for numeric values as follows:

```
+x                is 0 + x
-x    negation    is 0 - x
```

#### String operators

Strings can be concatenated using the `+` operator:

```
s: "hi " + name + " and good bye"
```

String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```

<!-- jba: Do these work for byte sequences? If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```

<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and a regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal; null is unequal to anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match the regular expression `r`.

<!--- TODO: consider the following
- For regular expression, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```


<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` of different underlying type, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of x.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000))  // uint16(1000)
b: uint8(1000)        // _|_ // overflow
c: int(2.5)           // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8')   // "hellø"
string(bytes([0x20]))    // " "
```

A string value is always convertible to a list of bytes.

```
bytes("hellø")   // 'hell\xc3\xb8'
bytes("")        // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP:        4*[byte]
Private10: IP([10, ...])  // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
   the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,

```
f(a1, a2, … an)
```

calls `f` with arguments `a1, a2, … an`. Arguments must be expressions
whose values are instances of the parameter types of `F`;
they are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.


### Comprehensions

Lists and fields can be constructed using comprehensions.

Comprehensions define a clause sequence that consists of a sequence of
`for`, `if`, and `let` clauses, nesting from left to right.
The sequence must start with a `for` or `if` clause.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value of
a list element or struct field.
If there are two identifiers, the first value will be the key or index,
if available, and the second will be the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

An iteration is said to complete if the innermost block of the clause
sequence is reached.
Syntactically, the comprehension value is a struct.
A comprehension can generate non-struct values by embedding such values within
this struct.

Within lists, the values yielded by a comprehension are inserted in the list
at the position of the comprehension.
Within structs, the values yielded by a comprehension are embedded within the
struct.
Both structs and lists may contain multiple comprehensions.

```
Comprehension = Clauses StructLit .

Clauses       = StartClause { [ "," ] Clause } .
StartClause   = ForClause | GuardClause .
Clause        = StartClause | LetClause .
ForClause     = "for" identifier [ "," identifier ] "in" Expression .
GuardClause   = "if" Expression .
LetClause     = "let" identifier "=" Expression .
2624 ``` 2625 2626 ``` 2627 a: [1, 2, 3, 4] 2628 b: [for x in a if x > 1 { x+1 }] // [3, 4, 5] 2629 2630 c: { 2631 for x in a 2632 if x < 4 2633 let y = 1 { 2634 "\(x)": x + y 2635 } 2636 } 2637 d: { "1": 2, "2": 3, "3": 4 } 2638 ``` 2639 2640 2641 ### String interpolation 2642 2643 String interpolation allows constructing strings by replacing placeholder 2644 expressions with their string representation. 2645 String interpolation may be used in single- and double-quoted strings, as well 2646 as their multiline equivalent. 2647 2648 A placeholder consists of `\(` followed by an expression and `)`. 2649 The expression is evaluated in the scope within which the string is defined. 2650 2651 The result of the expression is substituted as follows: 2652 - string: as is 2653 - bool: the JSON representation of the bool 2654 - number: a JSON representation of the number that preserves the 2655 precision of the underlying binary coded decimal 2656 - bytes: as if substituted within single quotes or 2657 converted to valid UTF-8 replacing the 2658 maximal subpart of ill-formed subsequences with a single 2659 replacement character (W3C encoding standard) otherwise 2660 - list: illegal 2661 - struct: illegal 2662 2663 2664 ``` 2665 a: "World" 2666 b: "Hello \( a )!" // Hello World! 2667 ``` 2668 2669 2670 ## Builtin Functions 2671 2672 Builtin functions are predeclared. They are called like any other function. 2673 2674 2675 ### `len` 2676 2677 The builtin function `len` takes arguments of various types and returns 2678 a result of type int. 

```
Argument type    Result

string           length of the UTF-8 encoding, in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct data fields, excluding field constraints
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     2
```


### `close`

The builtin function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.


### `and`

The builtin function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

```
Expression:          Result
and([a, b])          a & b
and([a])             a
and([])              _
```

### `or`

The builtin function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression:          Result
or([a, b])           a | b
or([a])              a
or([])               _|_
```

### `div`, `mod`, `quo` and `rem`

For two integer values `x` and `y`,
the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with  0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y    div(x, y)  mod(x, y)
 5     3        1          2
-5     3       -2          1
 5    -3       -1          2
-5    -3        2          1
```

For two integer values `x` and `y`,
the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `quo(x, y)` truncated towards zero.

```
 x     y    quo(x, y)  rem(x, y)
 5     3        1          2
-5     3       -1         -2
 5    -3       -1          2
-5    -3        1         -2
```

A zero divisor in either case results in bottom (an error).


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should treat these as `_`.
Two particular cases are discussed below.


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and postpone validating `a == e` until after all references
in `e` have been resolved.

```
// Config            Evaluates to (requiring concrete values)
x: {                 x: {
    a: b + 100           a: _|_ // cycle detected
    b: a - 100           b: _|_ // cycle detected
}                    }

y: x & {             y: {
    a: 200               a: 200 // asserted that 200 == b + 100
                         b: 100
}                    }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a concrete value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.

<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//    c           Cycles in nodes of type struct evaluate
//  ↙︎   ↖         to the fixed point of unifying their
// a  →  b        values ad infinitum.

a: b & { x: 1 }  // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }  // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }  // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a           b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that are part of a reference cycle evaluating to a struct
will evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

A structural cycle occurs when a node references one of its ancestor nodes.
It is possible to construct a structural cycle by unifying two acyclic values:

```
// acyclic
y: {
    f: h: g
    g: _
}
// acyclic
x: {
    f: _
    g: f
}
// introduces structural cycle
z: x & y
```

Implementations should be able to detect such structural cycles dynamically.

A structural cycle can result in infinite structure or evaluation loops.

```
// infinite structure
a: b: a

// infinite evaluation
f: {
    n:   int
    out: n + (f & {n: 1}).out
}
```

CUE must therefore decide under which circumstances a structural cycle is
allowed; the rules are as follows.

If a node `a` references an ancestor node, we call it and any of its
field values `a.f` _cyclic_.
So if `a` is cyclic, all of its descendants are also regarded as cyclic.
A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
is valid if any of its conjuncts is not cyclic.

```
// Disallowed: a list of infinite length with all elements being 1.
#List: {
    head: 1
    tail: #List
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}

// #List defines a list of arbitrary length. Because the recursive reference
// is part of a disjunction, this does not result in a structural cycle.
#List: {
    head: _
    tail: null | #List
}

// Usage of #List. The value of tail in the most deeply nested element will
// be `null`: as the value of the disjunct referring to list is the only
// conjunct, all conjuncts are cyclic and the value is invalid and so
// eliminated from the disjunction.
MyList: #List & { head: 1, tail: { head: 2 }}
```

<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

As with a struct, a source file may contain embeddings.
Unlike with a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(#place)!"

#place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause = "package" PackageName .
PackageName   = identifier .
```

The PackageName must not be a definition identifier.

If the PackageName is the blank identifier (`_`), it is treated the same
as if there were no package clause. This can be useful to allow adding
package level attributes or doc comments to a CUE file without a package
name.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module with the same package name belong to the same
package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package
and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl     = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec     = [ PackageName ] ImportPath .
ImportLocation = { unicode_value } .
ImportPath     = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.
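
The following sketch illustrates how the local PackageName is determined for each form of ImportSpec. The import paths `example.com/lib/math` and `example.com/lib/shapes:poly`, and the exported identifiers `Pi` and `#Point`, are hypothetical; only `strings` is a builtin package. Every imported package is referenced at least once, because importing a package without referring to any of its exported identifiers is illegal.

```cue
package geometry

import (
	"strings"                     // builtin package; local name is strings
	m "example.com/lib/math"      // hypothetical path; explicit local name m
	"example.com/lib/shapes:poly" // hypothetical path; selects package poly, local name poly
)

title:         strings.ToUpper("unit circle")
circumference: m.Pi * 2                   // assumes the package exports a number Pi
origin:        poly.#Point & {x: 0, y: 0} // assumes the package exports a definition #Point
```

Without the explicit `m`, the second import would be accessed as `math`, which must then match the package clause of the imported package. In the third import, the identifier after the colon selects package `poly` within that location and also serves as its default local name.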

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualified location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging to
Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters ``!"#$%&'()*,:;<=>?[\\]^`{|}``
or the Unicode replacement character U+FFFD.

Assume we have a package containing the package clause `package math`,
which exports the function `Sin` at the path identified by `lib/math`.
This table illustrates how `Sin` is accessed in files
that import the package using the various forms of import declaration.

<!-- TODO: a better example than lib/math:math, where the suffix is a no-op -->

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```

An import declaration declares a dependency relation between the importing and
imported packages. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.


### An example package

TODO
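
Pending a fuller example, here is a minimal sketch of a single-file package. The file name `people.cue`, the package name `people`, and all field names are hypothetical. Under a directory-based instance convention such as the one described above, other files in the same directory that declare `package people` would be concatenated with this one.

```cue
// Hypothetical file people.cue; other files declaring
// `package people` in the same directory would join this package.
package people

import "strings"

// #Person is a definition, so it is not emitted on export.
#Person: {
	name:  string
	shout: strings.ToUpper(name)
}

alice: #Person & {name: "alice"}
```

Exporting this package to JSON should produce `{"alice": {"name": "alice", "shout": "ALICE"}}`. The constraint in `#Person` applies to `alice` through unification, and the definition itself is not exported.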