<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form `a … b` represents the set of characters from `a` through `b` as
alternatives. The horizontal ellipsis `…` is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character `…` (as opposed to the three characters `...`) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character `_` (U+005F) and the dollar sign `$` (U+0024) are
considered letters.

```
letter        = unicode_letter | "_" | "$" .
decimal_digit = "0" … "9" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence `//`
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals. White space,
formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas `,` as terminators in a number of productions.
CUE programs may omit most of these commas using the following rule:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, or `?`
- an ellipsis `...`


Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
this rule.


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters (which includes `_` and `$`)
and digits, optionally preceded by `#` or `_#`.
It may not be `_` or `$`.
The first character in an identifier, or after a `#` if it contains one,
must be a letter.
Identifiers starting with a `#` or `_` are reserved for definitions and hidden
fields.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier  = [ "#" | "_#" ] letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 31 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of pre-declared identifiers.

All keywords may be used as labels (field names).
Unless noted otherwise, they can also be used as identifiers to refer to
the same name.


#### Values

The following keywords are values.

```
null         true         false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used at the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package      import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for          in           if           let
```

<!--
TODO:
    reduce [to]
    order [by]
-->


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     &&    ==    <     =     (     )
-     ||    !=    >     :     {     }
*     &     =~    <=    ?     [     ]     ,
/     |     !~    >=    !     _|_   ...   .
```
<!--
Free tokens:  ; ~ ^
// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow them at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
-->


### Numeric literals

There are several kinds of numeric literals.

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

An _integer literal_ is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: `0o` for octal,
`0x` or `0X` for hexadecimal, and `0b` for binary.
In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
All integers allow interstitial underscores `_`;
these have no meaning and are solely for readability.

Integer literals may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
42
1.5G    // 1_500_000_000
1.3Ki   // 1.3 * 1024 = trunc(1331.2) = 1331
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```

A _decimal floating-point literal_ is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.
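
The literal forms described in this section can be seen side by side in a small CUE file. This is only an illustrative sketch: the field names are hypothetical, and the values in the comments follow from the base prefixes and the multiplier rule stated above.

```cue
// Integer literals in different bases; underscores are only for readability.
answer: 42
perms:  0o755       // octal, decimal 493
color:  0xBad_Face  // hexadecimal
flags:  0b0101_0001 // binary, decimal 81

// SI multipliers are powers of 1000, IEC multipliers (with "i") powers of 1024.
quota:    1.5G  // 1_500_000_000
pageSize: 4Ki   // 4096
odd:      1.3Ki // 1331.2, truncated towards zero to 1331

// Decimal floating-point literals.
ratio:   .25
gravity: 6.67428e-11
million: 1E6
```

Each literal follows a `:` token, so the restriction on where a `float_lit` or `si_lit` may appear does not come into play here.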

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```

<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

Neither a `float_lit` nor an `si_lit` may appear after a token that is:

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, `?`, or `.`.

<!--
So
`a + 3.2Ti` -> `a`, `+`, `3.2Ti`
`a 3.2Ti` -> `a`, `3`, `.`, `2`, `Ti`
`a + .5e3` -> `a`, `+`, `.5e3`
`a .5e3` -> `a`, `.`, `5`, `e3`.
-->


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
A byte sequence literal represents a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals must be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#`
by padding the start and end of a string or byte sequence literal
with this number of hash symbols.

There are four ways to represent an integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits; and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting byte sequence; all other escapes
represent the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The sequence `\(` introduces a string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
hex_byte_value   = `\` { `#` } "x" hex_digit hex_digit .
little_u_value   = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
                                   hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = `\` { `#` } "(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
multiline_string_lit = `"""` newline
                             { unicode_value | interpolation | newline }
                             newline `"""` .
multiline_bytes_lit  = "'''" newline
                             { unicode_value | interpolation | byte_value | newline }
                             newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
'\xff\u00FF'
"\uD800"             // illegal: surrogate half (TODO: probably should allow)
"\U00110000"         // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be by itself
on a new line, preceded by optional whitespace.
The newline preceding the closing quote is discarded from the string contents.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include one, it suffices to escape one of the quotes.
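
The indentation rule above also applies when a multiline string contains an interpolation. The following is a minimal sketch with hypothetical field names; it assumes the closing quotes are indented by four spaces, so four spaces are stripped from each line.

```cue
name: "world"

// The closing triple quote is preceded by four spaces, so that much
// leading whitespace is removed from every line of the literal.
greeting: """
    Hello,
      \(name)!
    """

// The same value written as a single-line string.
same: "Hello,\n  world!"

// Unification of two equal strings succeeds and yields that string.
check: greeting & same
```

The newline after the opening quotes and the one before the closing quotes are both discarded, which is why `same` neither starts nor ends with `\n`.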

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.



<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunction is sometimes also referred to as sum types.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... |a_n) & b ==> a_0&b | ... | a_n&b.
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

A disjunction is _normalized_ if there is no element
`a` for which there is a different element `b` such that `a ⊑ b`.

<!--
Normalization is important, as we need to account for spurious elements.
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

  ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

  x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
  y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

  x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
  y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any value `v` _may_ be associated with a default value `d`,
where `d` must be an instance of `v` (`d ⊑ v`).

Default values are introduced by means of disjunctions.
Any element of a disjunction can be _marked_ as a default
by prefixing it with an asterisk `*` ([a unary expression](#operators)).
Syntactically consecutive disjunctions are considered to be
part of a single disjunction,
whereby multiple disjuncts can be marked as default.
A _marked disjunction_ is one where any of its terms are marked.
So `a | b | *c | d` is a single marked disjunction of four terms,
whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
one of which is a marked disjunction of three terms.
During unification, if all the marked disjuncts of a marked disjunction are
eliminated, then the remaining unmarked disjuncts are considered as if they
originated from an unmarked disjunction.
<!-- TODO: this formulation should be worked out more. -->
As explained below, distinguishing the nesting of disjunctions like this
is only relevant when both an outer and nested disjunction are marked.

Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.
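
As an informal illustration of this intuition, consider the following sketch. The field names are hypothetical, and the results in the comments follow from the description above: a marked disjunct is preferred whenever a single value is needed and it has not been eliminated.

```cue
// "tcp" is marked as the default; "udp" remains an allowed alternative.
protocol: *"tcp" | "udp"

// Unifying with "udp" eliminates the marked disjunct, so the remaining
// unmarked disjunct is the result: "udp".
override: (*"tcp" | "udp") & "udp"

// Any int is accepted; 8080 is used when nothing more specific is given.
port: *8080 | int

// Addition needs a single operand value, so the default is used: 8081.
next: port + 1
```

Marking a default does not narrow what a field accepts. It only determines which value is picked when the expression must be resolved.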

To define the unification and disjunction operation we use the notation
`⟨v⟩` to denote a CUE value `v` that is not associated with a default
and the notation `⟨v, d⟩` to denote a value `v` associated with a default
value `d`.

The rewrite rules for unifying such values are as follows:
```
U0: ⟨v1⟩ & ⟨v2⟩         => ⟨v1&v2⟩
U1: ⟨v1, d1⟩ & ⟨v2⟩     => ⟨v1&v2, d1&v2⟩
U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
```

The rewrite rules for disjoining terms of unmarked disjunctions are
```
D0: ⟨v1⟩ | ⟨v2⟩         => ⟨v1|v2⟩
D1: ⟨v1, d1⟩ | ⟨v2⟩     => ⟨v1|v2, d1⟩
D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
```

Terms of marked disjunctions are first rewritten according to the following
rules:
```
M0:  ⟨v⟩    => ⟨v⟩        don't introduce defaults for unmarked term
M1: *⟨v⟩    => ⟨v, v⟩     introduce identical default for marked term
M2: *⟨v, d⟩ => ⟨v, d⟩     keep existing defaults for marked term
M3:  ⟨v, d⟩ => ⟨v⟩        strip existing defaults from unmarked term
```

Note that for any marked disjunction `a`,
the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.

```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ⟨"tcp"|"udp", "tcp"⟩    M1, D1
string | *"foo"          ⟨string, "foo"⟩         M1, D1

*1 | 2 | 3               ⟨1|2|3, 1⟩              M1, D1

(*1|2|3) | (1|*2|3)      ⟨1|2|3, 1|2⟩            M1, D1, D2
(*1|2|3) | *(1|*2|3)     ⟨1|2|3, 2⟩              M1, M2, M3, D1, D2
(*1|2|3) | (1|*2|3)&2    ⟨1|2|3, 1|2⟩            M1, D1, U1, D2

(*1|2) & (1|*2)          ⟨1|2, _|_⟩              M1, D1, U2
```

The rules of subsumption for defaults can be derived from the above definitions
and are as follows.

```
⟨v2, d2⟩ ⊑ ⟨v1, d1⟩  if v2 ⊑ v1 and d2 ⊑ d1
⟨v1, d1⟩ ⊑ ⟨v⟩       if v1 ⊑ v
⟨v⟩      ⊑ ⟨v1, d1⟩  if v ⊑ d1
```

<!--
For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.

The last one is so restrictive as v could still be made more specific by
associating it with a default that is not subsumed by d1.

Proof:
  by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
  where the most general value is (v, v).
  Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
  from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
  exactly defines the boundary of this subsumption.
-->

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable, as long as we require that
disjunction literals be normalized).
-->

```
Expression                       Resolves to
"tcp" | "udp"                    "tcp" | "udp"
*"tcp" | "udp"                   "tcp"
float | *1                       1
*string | 1.0                    string
(*1|2) + (2|*3)                  4

(*1|2|3) | (1|*2|3)              1|2
(*1|2|3) & (1|*2|3)              1|2|3 // default is _|_

(* >=5 | int) & (* <=5 | int)    5

(*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
(*"tcp"|"udp") & "tcp"           "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_

(*true | false) & bool           true
(*true | false) & (true | false) true

{a: 1} | {b: 1}                  {a: 1} | {b: 1}
{a: 1} | *{b: 1}                 {b:1}
*{a: 1} | *{b: 1}                {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a:1}        {a:1}  | {a: 1, b: 1}
({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b:1}
```


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.

```
bottom_lit = "_|_" .
```


### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr        Result
_ &  5        5
_ &  _        _
_ & _|_      _|_
_ | _|_       _
```


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null" .
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```


### Boolean values

The _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
bool_lit = "true" | "false" .
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
    number
   /      \
 int      float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int` and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.
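
The relationship between `number`, `int` and `float` can be sketched with a few unifications. The field names are hypothetical and the results in the comments are derived from the rules above; note that evaluating this file as a whole reports an error because of the deliberately conflicting `bad` field.

```cue
i:  int & 42       // 42
f:  float & 42.0   // 42.0
n1: number & 42    // 42
n2: number & 42.0  // 42.0

// An integral float literal is still not an instance of int,
// so this unification fails and the field evaluates to bottom.
bad: int & 42.0    // _|_
```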

Implementation restriction: although numeric values have arbitrary precision
in the language, implementations may implement them using an internal
representation with limited precision.
That said, every implementation must:

- Represent integer values with at least 256 bits.
- Represent floating-point values with a mantissa of at least 256 bits and
  a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer value precisely.
- Give an error if unable to represent a floating-point value due to overflow.
- Round to the nearest representable value if unable to represent
  a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin
functions for which an unusual loss of precision must be explicitly documented.


### Strings

The _string type_ represents the set of UTF-8 strings,
not allowing surrogates.
The predeclared string type is `string`; it is a defined type.

The length of a string `s` (its size in bytes) can be discovered using
the built-in function `len`.


### Bytes

The _bytes type_ represents the set of byte sequences.
A byte sequence value is a (possibly empty) sequence of bytes.
The number of bytes is called the length of the byte sequence
and is never negative.
The predeclared byte sequence type is `bytes`; it is a defined type.


### Bounds

A _bound_, syntactically a [unary expression](#operands), defines
an infinite disjunction of concrete values that can be represented
as a single comparison.

For any [comparison operator](#comparison-operators) `op` except `==`,
`op a` is the disjunction of every `x` such that `x op a`.

```
2 & >=2 & <=5           // 2, where 2 is either an int or float.
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1.0} ⊑ {a: int, b: float}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```

A field may be required or optional.
The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
If a field `f` is in both `a` and `b`, `c.f` is optional only if both
`a.f` and `b.f` are optional.
Any [references](#references) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its non-optional
fields evaluates to bottom, recursively.

<!--NOTE: About bottom values for optional fields being okay.
1025 1026 The proposition ¬P is a close cousin of P → ⊥ and is often used 1027 as an approximation to avoid the issues of using not. 1028 Bottom (⊥) is also frequently used to mean undefined. This makes sense. 1029 Consider `{a?: 2} & {a?: 3}`. 1030 Both structs say `a` is optional; in other words, it may be omitted. 1031 So we can still get a valid result by omitting `a`, even in 1032 case of a conflict. 1033 1034 Granted, this definition may lead to confusing results, especially in 1035 definitions, when tightening an optional field leads to unintentionally 1036 discarding it. 1037 It could be a role of vet checkers to identify such cases (and suggest users 1038 to explicitly use `_|_` to discard a field, for instance). 1039 --> 1040 1041 Syntactically, a field is marked as optional by following its label with a `?`. 1042 The question mark is not part of the field name. 1043 A struct literal may contain multiple fields with 1044 the same label, the result of which is a single field with the same properties 1045 as defined as the unification of two fields resulting from unifying two structs. 1046 1047 These examples illustrate required fields only. 1048 Examples with optional fields follow below. 1049 1050 ``` 1051 Expression Result (without optional fields) 1052 {a: int, a: 1} {a: 1} 1053 {a: int} & {a: 1} {a: 1} 1054 {a: >=1 & <=7} & {a: >=5 & <=9} {a: >=5 & <=7} 1055 {a: >=1 & <=7, a: >=5 & <=9} {a: >=5 & <=7} 1056 1057 {a: 1} & {b: 2} {a: 1, b: 2} 1058 {a: 1, b: int} & {b: 2} {a: 1, b: 2} 1059 1060 {a: 1} & {a: 2} _|_ 1061 ``` 1062 1063 A struct may define constraints that apply to fields that are added when unified 1064 with another struct using pattern or default constraints (_Note_: default 1065 constraints are not yet implemented). 1066 1067 A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which 1068 is a value of type string, and a value to unify with fields whose label 1069 match that pattern. 
1070 When unifying structs `a` and `b`, 1071 a pattern constraint `[p]: v` declared in `a` 1072 defines that the value `v` should unify with any field in the resulting struct `c` 1073 whose label unifies with pattern `p`. 1074 1075 <!-- TODO: Update grammar and support this. 1076 A pattern constraints with a pattern preceded by `...` indicates 1077 the pattern can only matches fields in `b` for which there 1078 exists no field in `a` with the same label. 1079 --> 1080 1081 Additionally, a _default constraint_, denoted `...value`, defines a value 1082 to unify with any field for which there is no other declaration in a struct. 1083 When unifying structs `a` and `b`, 1084 a default constraint `...v` declared in `a` 1085 defines that the value `v` should unify with any field in the resulting struct `c` 1086 whose label does not unify with any of the patterns of the pattern 1087 constraints defined for `a` _and_ for which there exists no field in `a` 1088 with that label. 1089 The token `...` is a shorthand for `..._`. 1090 _Note_: default constraints are not yet implemented. 1091 1092 1093 ``` 1094 a: { 1095 foo: string // foo is a string 1096 [=~"^i"]: int // all other fields starting with i are integers 1097 [=~"^b"]: bool // all other fields starting with b are booleans 1098 ...string // all other fields must be a string. Note: default constraints are not yet implemented. 1099 } 1100 1101 b: a & { 1102 i3: 3 1103 bar: true 1104 other: "a string" 1105 } 1106 ``` 1107 1108 Concrete field labels may be an identifier or string, the latter of which may be 1109 interpolated. 1110 Fields with identifier labels can be referred to within the scope they are 1111 defined, string labels cannot. 1112 References within such interpolated strings are resolved within 1113 the scope of the struct in which the label sequence is 1114 defined and can reference concrete labels lexically preceding 1115 the label within a label sequence. 
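
The difference between identifier labels and string labels described above can be sketched as follows. All field names here (`config`, `env`, `replicas`, `total`) are hypothetical and chosen for illustration only.

```
config: {
    env: "prod"

    // Identifier label: binds the identifier replicas,
    // so it can be referenced within this scope.
    replicas: 3
    total:    replicas * 2   // 6

    // String label: defines a field but binds no identifier.
    "max-size": 10

    // Interpolated string label: env is resolved in the scope of the
    // enclosing struct, so this declares the field "prod-config".
    "\(env)-config": {enabled: true}
}
```

A field declared with a string label can still be reached from elsewhere through a selector or index expression on the enclosing struct, such as `config."max-size"`.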
1116 <!-- We allow this so that rewriting a CUE file to collapse or expand 1117 field sequences has no impact on semantics. 1118 --> 1119 1120 <!--TODO: first implementation round will not yet have expression labels 1121 1122 An ExpressionLabel sets a collection of optional fields to a field value. 1123 By default it defines this value for all possible string labels. 1124 An optional expression limits this to the set of optional fields which 1125 labels match the expression. 1126 --> 1127 1128 1129 <!-- NOTE: if we allow ...Expr, as in list, it would mean something different. --> 1130 1131 1132 <!-- NOTE: 1133 A DefinitionDecl does not allow repeated labels. This is to avoid 1134 any ambiguity or confusion about whether earlier path components 1135 are to be interpreted as declarations or normal fields (they should 1136 always be normal fields.) 1137 --> 1138 1139 <!--NOTE: 1140 The syntax has been deliberately restricted to allow for the following 1141 future extensions and relaxations: 1142 - Allow omitting a "?" in an expression label to indicate a concrete 1143 string value (but maybe we want to use () for that). 1144 - Make the "?" in expression label optional if expression labels 1145 are always optional. 1146 - Or allow eliding the "?" if the expression has no references and 1147 is obviously not concrete (such as `[string]`). 1148 - The expression of an expression label may also indicate a struct with 1149 integer or even number labels 1150 (beware of imprecise computation in the latter). 1151 e.g. `{ [int]: string }` is a map of integers to strings. 1152 - Allow for associative lists (`foo [@.field]: {field: string}`) 1153 - The `...` notation can be extended analogously to that of a ListList, 1154 by allowing it to follow with an expression for the remaining properties. 1155 In that case it is no longer a shorthand for `[string]: _`, but rather 1156 would define the value for any other value for which there is no field 1157 defined. 
1158 Like the definition with List, this is somewhat odd, but it allows the 1159 encoding of JSON schema's and (non-structural) OpenAPI's 1160 additionalProperties and additionalItems. 1161 --> 1162 1163 ``` 1164 StructLit = "{" { Declaration "," } "}" . 1165 Declaration = Field | Ellipsis | Embedding | LetClause | attribute . 1166 Ellipsis = "..." [ Expression ] . 1167 Embedding = Comprehension | AliasExpr . 1168 Field = Label ":" { Label ":" } AliasExpr { attribute } . 1169 Label = [ identifier "=" ] LabelExpr . 1170 LabelExpr = LabelName [ "?" ] | "[" AliasExpr "]" . 1171 LabelName = identifier | simple_string_lit . 1172 1173 attribute = "@" identifier "(" attr_tokens ")" . 1174 attr_tokens = { attr_token | 1175 "(" attr_tokens ")" | 1176 "[" attr_tokens "]" | 1177 "{" attr_tokens "}" } . 1178 attr_token = /* any token except '(', ')', '[', ']', '{', or '}' */ 1179 ``` 1180 1181 ``` 1182 Expression Result (without optional fields) 1183 a: { foo?: string } {} 1184 b: { foo: "bar" } { foo: "bar" } 1185 c: { foo?: *"bar" | string } {} 1186 1187 d: a & b { foo: "bar" } 1188 e: b & c { foo: "bar" } 1189 f: a & c {} 1190 g: a & { foo?: number } {} 1191 h: b & { foo?: number } _|_ 1192 i: c & { foo: string } { foo: "bar" } 1193 1194 intMap: [string]: int 1195 intMap: { 1196 t1: 43 1197 t2: 2.4 // error: 2.4 is not an integer 1198 } 1199 1200 nameMap: [string]: { 1201 firstName: string 1202 nickName: *firstName | string 1203 } 1204 1205 nameMap: hank: { firstName: "Hank" } 1206 ``` 1207 The optional field set defined by `nameMap` matches every field, 1208 in this case just `hank`, and unifies the associated constraint 1209 with the matched field, resulting in: 1210 ``` 1211 nameMap: hank: { 1212 firstName: "Hank" 1213 nickName: "Hank" 1214 } 1215 ``` 1216 1217 1218 #### Closed structs 1219 1220 By default, structs are open to adding fields. 1221 Instances of an open struct `p` may contain fields not defined in `p`. 
This makes it easy to add fields, but can lead to bugs:

```
S: {
    field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
    field1: string
    field2: string
}

A1: A & {
    feild1: "foo"  // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not declare any field
with a name that does not match the name of a field
or the pattern of a pattern constraint defined in `c`.
Hidden fields are excluded from this limitation.
A struct that is the result of unifying any struct with a [`...`](#structs)
declaration is defined for all regular fields.
Closing a struct is equivalent to adding `..._|_` to it.

Syntactically, structs are closed explicitly with the `close` builtin or
implicitly and recursively by [definitions](#definitions-and-hidden-fields).


```
A: close({
    field1: string
    field2: string
})

A1: A & {
    feild1: string
} // _|_ feild1 not defined for A

A2: A & {
    for k,v in { feild1: string } {
        "\(k)": v
    }
}  // _|_ feild1 not defined for A

C: close({
    [_]: _
})

C2: C & {
    for k,v in { thisIsFine: string } {
        "\(k)": v
    }
}

D: close({
    // Values generated by comprehensions are treated as embeddings.
    for k,v in { x: string } {
        "\(k)": v
    }
})
```

<!-- (jba) Somewhere it should be said that optional fields are only
     interesting inside closed structs.
--> 1297 1298 <!-- TODO: move embedding section to above the previous one --> 1299 1300 #### Embedding 1301 1302 A struct may contain an _embedded value_, an operand used as a declaration. 1303 An embedded value of type struct is unified with the struct in which it is 1304 embedded, but disregarding the restrictions imposed by closed structs. 1305 So if an embedding resolves to a closed struct, the corresponding enclosing 1306 struct will also be closed, but may have fields that are not allowed if 1307 normal rules for closed structs were observed. 1308 1309 If an embedded value is not of type struct, the struct may only have 1310 definitions or hidden fields. Regular fields are not allowed in such case. 1311 1312 The result of `{ A }` is `A` for any `A` (including definitions). 1313 1314 Syntactically, embeddings may be any expression. 1315 1316 ``` 1317 S1: { 1318 a: 1 1319 b: 2 1320 { 1321 c: 3 1322 } 1323 } 1324 // S1 is { a: 1, b: 2, c: 3 } 1325 1326 S2: close({ 1327 a: 1 1328 b: 2 1329 { 1330 c: 3 1331 } 1332 }) 1333 // same as close(S1) 1334 1335 S3: { 1336 a: 1 1337 b: 2 1338 close({ 1339 c: 3 1340 }) 1341 } 1342 // same as S2 1343 ``` 1344 1345 1346 #### Definitions and hidden fields 1347 1348 A field is a _definition_ if its identifier starts with `#` or `_#`. 1349 A field is _hidden_ if its identifier starts with a `_`. 1350 All other fields are _regular_. 1351 1352 Definitions and hidden fields are not emitted when converting a CUE program 1353 to data and are never required to be concrete. 1354 1355 Referencing a definition will recursively [close](#closed-structs) it. 1356 That is, a referenced definition will not unify with a struct 1357 that would add a field anywhere within the definition that it does not 1358 already define or explicitly allow with a pattern constraint or `...`. 1359 [Embeddings](#embedding) allow bypassing this check. 
1360 1361 If referencing a definition would always result in an error, implementations 1362 may report this inconsistency at the point of its declaration. 1363 1364 ``` 1365 #MyStruct: { 1366 sub: field: string 1367 } 1368 1369 #MyStruct: { 1370 sub: enabled?: bool 1371 } 1372 1373 myValue: #MyStruct & { 1374 sub: feild: 2 // error, feild not defined in #MyStruct 1375 sub: enabled: true // okay 1376 } 1377 1378 #D: { 1379 #OneOf 1380 1381 c: int // adds this field. 1382 } 1383 1384 #OneOf: { a: int } | { b: int } 1385 1386 1387 D1: #D & { a: 12, c: 22 } // { a: 12, c: 22 } 1388 D2: #D & { a: 12, b: 33 } // _|_ // cannot define both `a` and `b` 1389 ``` 1390 1391 1392 ``` 1393 #A: {a: int} 1394 1395 B: { 1396 #A 1397 b: c: int 1398 } 1399 1400 x: B 1401 x: d: 3 // not allowed, as closed by embedded #A 1402 1403 y: B.b 1404 y: d: 3 // allowed as nothing closes b 1405 1406 #B: { 1407 #A 1408 b: c: int 1409 } 1410 1411 z: #B.b 1412 z: d: 3 // not allowed, as referencing #B closes b 1413 ``` 1414 1415 1416 <!--- 1417 JSON fields are usual camelCase. Clashes can be avoided by adopting the 1418 convention that definitions be TitleCase. Unexported definitions are still 1419 subject to clashes, but those are likely easier to resolve because they are 1420 package internal. 1421 ---> 1422 1423 1424 #### Attributes 1425 1426 Attributes allow associating meta information with values. 1427 Their primary purpose is to define mappings between CUE and 1428 other representations. 1429 Attributes do not influence the evaluation of CUE. 1430 1431 An attribute associates an identifier with a value, a balanced token sequence, 1432 which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`). 1433 The sequence may not contain interpolations. 1434 1435 Fields, structs and packages can be associated with a set of attributes. 1436 Attributes accumulate during unification, but implementations may remove 1437 duplicates that have the same source string representation. 
1438 The interpretation of an attribute, including the handling of multiple 1439 attributes for a given identifier, is up to the consumer of the attribute. 1440 1441 Field attributes define additional information about a field, 1442 such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative 1443 name of the field when mapping to a different language. 1444 1445 1446 ``` 1447 // Package attribute 1448 @protobuf(proto3) 1449 1450 myStruct1: { 1451 // Struct attribute: 1452 @jsonschema(id="https://example.org/mystruct1.json") 1453 1454 // Field attributes 1455 field: string @go(Field) 1456 attr: int @xml(,attr) @go(Attr) 1457 } 1458 1459 myStruct2: { 1460 field: string @go(Field) 1461 attr: int @xml(a1,attr) @go(Attr) 1462 } 1463 1464 Combined: myStruct1 & myStruct2 1465 // field: string @go(Field) 1466 // attr: int @xml(,attr) @xml(a1,attr) @go(Attr) 1467 ``` 1468 1469 1470 #### Aliases 1471 1472 Aliases name values that can be referred to 1473 within the [scope](#declarations-and-scopes) in which they are declared. 1474 The name of an alias must be unique within its scope. 1475 1476 ``` 1477 AliasExpr = [ identifier "=" ] Expression . 1478 ``` 1479 1480 Aliases can appear in several positions: 1481 1482 <!--- TODO: consider allowing this. It should be considered whether 1483 having field aliases isn't already sufficient. 1484 1485 As a declaration in a struct (`X=value`): 1486 1487 - binds identifier `X` to a value embedded within the struct. 1488 ---> 1489 1490 In front of a Label (`X=label: value`): 1491 1492 - binds the identifier to the same value as `label` would be bound 1493 to if it were a valid identifier. 1494 - for optional fields (`foo?: bar` and `[foo]: bar`), 1495 the bound identifier is only visible within the field value (`bar`). 1496 1497 Before a value (`foo: X=x`) 1498 1499 - binds the identifier to the value it precedes within the scope of that value. 
1500 1501 Inside a bracketed label (`[X=expr]: value`): 1502 1503 - binds the identifier to the concrete label that matches `expr` 1504 within the instances of the field value (`value`). 1505 1506 Before a list element (`[ X=value, X+1 ]`) (Not yet implemented) 1507 1508 - binds the identifier to the list element it precedes within the scope of the 1509 list expression. 1510 1511 <!-- TODO: explain the difference between aliases and definitions. 1512 Now that you have definitions, are aliases really necessary? 1513 Consider removing. 1514 --> 1515 1516 ``` 1517 // A field alias 1518 foo: X // 4 1519 X="not an identifier": 4 1520 1521 // A value alias 1522 foo: X={x: X.a} 1523 bar: foo & {a: 1} // {a: 1, x: 1} 1524 1525 // A label alias 1526 [Y=string]: { name: Y } 1527 foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 } 1528 ``` 1529 1530 <!-- TODO: also allow aliases as lists --> 1531 1532 1533 #### Let declarations 1534 1535 _Let declarations_ bind an identifier to an expression. 1536 The identifier is visible within the [scope](#declarations-and-scopes) 1537 in which it is declared. 1538 The identifier must be unique within its scope. 1539 1540 ``` 1541 let x = expr 1542 1543 a: x + 1 1544 b: x + 2 1545 ``` 1546 1547 #### Shorthand notation for nested structs 1548 1549 A field whose value is a struct with a single field may be written as 1550 a colon-separated sequence of the two field names, 1551 followed by a colon and the value of that single field. 1552 1553 ``` 1554 job: myTask: replicas: 2 1555 ``` 1556 expands to 1557 ``` 1558 job: { 1559 myTask: { 1560 replicas: 2 1561 } 1562 } 1563 ``` 1564 1565 <!-- OPTIONAL FIELDS: 1566 1567 The optional marker solves the issue of having to print large amounts of 1568 boilerplate when dealing with large types with many optional or default 1569 values (such as Kubernetes). 
1570 Writing such optional values in terms of *null | value is tedious, 1571 unpleasant to read, and as it is not well defined what can be dropped or not, 1572 all null values have to be emitted from the output, even if the user 1573 doesn't override them. 1574 Part of the issue is how null is defined. We could adopt a Typescript-like 1575 approach of introducing "void" or "undefined" to mean "not defined and not 1576 part of the output". But having all of null, undefined, and void can be 1577 confusing. If these ever are introduced anyway, the ? operator could be 1578 expressed along the lines of 1579 foo?: bar 1580 being a shorthand for 1581 foo: void | bar 1582 where void is the default if no other default is given. 1583 1584 The current mechanical definition of "?" is straightforward, though, and 1585 probably avoids the need for void, while solving a big issue. 1586 1587 Caveats: 1588 [1] this definition requires explicitly defined fields to be emitted, even 1589 if they could be elided (for instance if the explicit value is the default 1590 value defined an optional field). This is probably a good thing. 1591 1592 [2] a default value may still need to be included in an output if it is not 1593 the zero value for that field and it is not known if any outside system is 1594 aware of defaults. For instance, which defaults are specified by the user 1595 and which by the schema understood by the receiving system. 1596 The use of "?" together with defaults should therefore be used carefully 1597 in non-schema definitions. 1598 Problematic cases should be easy to detect by a vet-like check, though. 1599 1600 [3] It should be considered how this affects the trim command. 1601 Should values implied by optional fields be allowed to be removed? 1602 Probably not. This restriction is unlikely to limit the usefulness of trim, 1603 though. 1604 1605 [4] There should be an option to emit all concrete optional values. 
1606 ``` 1607 --> 1608 1609 ### Lists 1610 1611 A list literal defines a new value of type list. 1612 A list may be open or closed. 1613 An open list is indicated with a `...` at the end of an element list, 1614 optionally followed by a value for the remaining elements. 1615 1616 The length of a closed list is the number of elements it contains. 1617 The length of an open list is the number of elements as a lower bound 1618 and an unlimited number of elements as its upper bound. 1619 1620 ``` 1621 ListLit = "[" [ ElementList [ "," ] ] "]" . 1622 ElementList = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] . 1623 ``` 1624 1625 Lists can be thought of as structs: 1626 1627 ``` 1628 List: *null | { 1629 Elem: _ 1630 Tail: List 1631 } 1632 ``` 1633 1634 For closed lists, `Tail` is `null` for the last element, for open lists it is 1635 `*null | List`, defaulting to the shortest variant. 1636 For instance, the open list [ 1, 2, ... ] can be represented as: 1637 ``` 1638 open: List & { Elem: 1, Tail: { Elem: 2 } } 1639 ``` 1640 and the closed version of this list, [ 1, 2 ], as 1641 ``` 1642 closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } } 1643 ``` 1644 1645 Using this representation, the subsumption rule for lists can 1646 be derived from those of structs. 1647 Implementations are not required to implement lists as structs. 1648 The `Elem` and `Tail` fields are not special and `len` will not work as 1649 expected in these cases. 1650 1651 1652 ## Declarations and Scopes 1653 1654 1655 ### Blocks 1656 1657 A _block_ is a possibly empty sequence of declarations. 1658 The braces of a struct literal `{ ... }` form a block, but there are 1659 others as well: 1660 1661 - The _universe block_ encompasses all CUE source text. 1662 - Each [package](#modules-instances-and-packages) has a _package block_ 1663 containing all CUE source text in that package. 1664 - Each file has a _file block_ containing all CUE source text in that file. 
1665 - Each `for` and `let` clause in a [comprehension](#comprehensions) 1666 is considered to be its own implicit block. 1667 1668 Blocks nest and influence scoping. 1669 1670 1671 ### Declarations and scope 1672 1673 A _declaration_ may bind an identifier to a field, alias, or package. 1674 Every identifier in a program must be declared. 1675 Other than for fields, 1676 no identifier may be declared twice within the same block. 1677 For fields, an identifier may be declared more than once within the same block, 1678 resulting in a field with a value that is the result of unifying the values 1679 of all fields with the same identifier. 1680 String labels do not bind an identifier to the respective field. 1681 1682 The _scope_ of a declared identifier is the extent of source text in which the 1683 identifier denotes the specified field, alias, or package. 1684 1685 CUE is lexically scoped using blocks: 1686 1687 1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block. 1688 1. The scope of an identifier denoting a field 1689 declared at top level (outside any struct literal) is the package block. 1690 1. The scope of an identifier denoting an alias 1691 declared at top level (outside any struct literal) is the file block. 1692 1. The scope of a let identifier 1693 declared at top level (outside any struct literal) is the file block. 1694 1. The scope of the package name of an imported package is the file block of the 1695 file containing the import declaration. 1696 1. The scope of a field, alias or let identifier declared inside a struct 1697 literal is the innermost containing block. 1698 1699 An identifier declared in a block may be redeclared in an inner block. 1700 While the identifier of the inner declaration is in scope, it denotes the entity 1701 declared by the inner declaration. 1702 1703 The package clause is not a declaration; 1704 the package name does not appear in any scope. 
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword which is the name
of the predefined identifier, prefixed with `__`.

```
Functions
len       close and or

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
           <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```


### Exported identifiers

<!-- move to a more logical spot -->

An identifier of a package may be exported to permit access to it
from another package.
All identifiers not starting with `_` (so all regular fields and definitions
starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a separate namespace from namesake identifiers of other packages.
1759 1760 ``` 1761 package mypackage 1762 1763 foo: string // visible outside mypackage 1764 "bar": string // visible outside mypackage 1765 1766 #Foo: { // visible outside mypackage 1767 a: 1 // visible outside mypackage 1768 _b: 2 // not visible outside mypackage 1769 1770 #C: { // visible outside mypackage 1771 d: 4 // visible outside mypackage 1772 } 1773 _#E: foo // not visible outside mypackage 1774 } 1775 ``` 1776 1777 1778 ### Uniqueness of identifiers 1779 1780 Given a set of identifiers, an identifier is called unique if it is different 1781 from every other in the set, after applying normalization following 1782 Unicode Annex #31. 1783 Two identifiers are different if they are spelled differently 1784 or if they appear in different packages and are not exported. 1785 Otherwise, they are the same. 1786 1787 1788 ### Field declarations 1789 1790 A field associates the value of an expression to a label within a struct. 1791 If this label is an identifier, it binds the field to that identifier, 1792 so the field's value can be referenced by writing the identifier. 1793 String labels are not bound to fields. 1794 ``` 1795 a: { 1796 b: 2 1797 "s": 3 1798 1799 c: b // 2 1800 d: s // _|_ unresolved identifier "s" 1801 e: a.s // 3 1802 } 1803 ``` 1804 1805 If an expression may result in a value associated with a default value 1806 as described in [default values](#default-values), the field binds to this 1807 value-default pair. 1808 1809 1810 <!-- TODO: disallow creating identifiers starting with __ 1811 ...and reserve them for builtin values. 1812 1813 The issue is with code generation. As no guarantee can be given that 1814 a predeclared identifier is not overridden in one of the enclosing scopes, 1815 code will have to handle detecting such cases and renaming them. 1816 An alternative is to have the predeclared identifiers be aliases for namesake 1817 equivalents starting with a double underscore (e.g. 
string -> __string), 1818 allowing generated code (normal code would keep using `string`) to refer 1819 to these directly. 1820 --> 1821 1822 1823 ### Let declarations 1824 1825 Within a struct, a let clause binds an identifier to the given expression. 1826 1827 Within the scope of the identifier, the identifier refers to the 1828 _locally declared_ expression. 1829 The expression is evaluated in the scope it was declared. 1830 1831 1832 ## Expressions 1833 1834 An expression specifies the computation of a value by applying operators and 1835 built-in functions to operands. 1836 1837 Expressions that require concrete values are called _incomplete_ if any of 1838 their operands are not concrete, but define a value that would be legal for 1839 that expression. 1840 Incomplete expressions may be left unevaluated until a concrete value is 1841 requested at the application level. 1842 1843 ### Operands 1844 1845 Operands denote the elementary values in an expression. 1846 An operand may be a literal, a (possibly qualified) identifier denoting 1847 field, alias, or let declaration, or a parenthesized expression. 1848 1849 ``` 1850 Operand = Literal | OperandName | "(" Expression ")" . 1851 Literal = BasicLit | ListLit | StructLit . 1852 BasicLit = int_lit | float_lit | string_lit | 1853 null_lit | bool_lit | bottom_lit . 1854 OperandName = identifier | QualifiedIdent . 1855 ``` 1856 1857 ### Qualified identifiers 1858 1859 A qualified identifier is an identifier qualified with a package name prefix. 1860 1861 ``` 1862 QualifiedIdent = PackageName "." identifier . 1863 ``` 1864 1865 A qualified identifier accesses an identifier in a different package, 1866 which must be [imported](#import-declarations). 1867 The identifier must be declared in the [package block](#blocks) of that package. 1868 1869 ``` 1870 math.Sin // denotes the Sin function in package math 1871 ``` 1872 1873 ### References 1874 1875 An identifier operand refers to a field and is called a reference. 
1876 The value of a reference is a copy of the expression associated with the field 1877 that it is bound to, 1878 with any references within that expression bound to the respective copies of 1879 the fields they were originally bound to. 1880 Implementations may use a different mechanism to evaluate as long as 1881 these semantics are maintained. 1882 1883 ``` 1884 a: { 1885 place: string 1886 greeting: "Hello, \(place)!" 1887 } 1888 1889 b: a & { place: "world" } 1890 c: a & { place: "you" } 1891 1892 d: b.greeting // "Hello, world!" 1893 e: c.greeting // "Hello, you!" 1894 ``` 1895 1896 1897 1898 ### Primary expressions 1899 1900 Primary expressions are the operands for unary and binary expressions. 1901 1902 ``` 1903 PrimaryExpr = 1904 Operand | 1905 PrimaryExpr Selector | 1906 PrimaryExpr Index | 1907 PrimaryExpr Slice | 1908 PrimaryExpr Arguments . 1909 1910 Selector = "." (identifier | simple_string_lit) . 1911 Index = "[" Expression "]" . 1912 Argument = Expression . 1913 Arguments = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" . 1914 ``` 1915 <!--- 1916 TODO: 1917 PrimaryExpr Query | 1918 Query = "." Filters . 1919 Filters = Filter { Filter } . 1920 Filter = "[" [ "?" ] AliasExpr "]" . 1921 1922 TODO: maybe reintroduce slices, as they are useful in queries, probably this 1923 time with Python semantics. 1924 Slice = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" . 1925 1926 Argument = Expression | ( identifier ":" Expression ). 1927 1928 // & expression type 1929 // string_lit: same as label. Arguments is current node. 1930 // If selector is applied to list, it performs the operation for each 1931 // element. 1932 1933 TODO: considering allowing decimal_lit for selectors. 
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the element of a <!--list or -->struct `x` identified by `f`.
<!--For structs, -->
`f` must be an identifier or a string literal identifying
any definition or regular non-optional field.
The identifier `f` is called the field selector.

<!--
Allowing strings to be used as field selectors obviates the need for
backquoted identifiers. Note that some standards use names for structs that
are not standard identifiers (such as "Fn::Foo"). Note that indexing does not
allow access to identifiers.
-->

<!--
For lists, `f` must be an integer and follows the same lookup rules as
for the index operation.
The type of the selector expression is the type of `f`.
-->

If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo)
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a <!--list or -->struct,
or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x:     int
    y:     3
    "x-y": 4
}

a: T.x     // int
b: T.y     // 3
c: T.z     // _|_ // field 'z' not found in T
d: T."x-y" // 4

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of a list or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply:

If `a` is not a struct:

- `a` is a list (which need not be complete)
- the index `x` unified with `int` must be concrete.
- the index `x` is in range if `0 <= x < len(a)`, where only the
  explicitly defined values of an open-ended list are considered,
  otherwise it is out of range

The result of `a[x]` is

for `a` of list type:

- the list element at index `x`, if `x` is within range
- bottom (an error), otherwise


for `a` of struct type:

- the index `x` unified with `string` must be concrete.
- the value of the regular and non-optional field named `x` of struct `a`,
  if this field exists
- bottom (an error), otherwise


```
[ 1, 2 ][1]     // 2
[ 1, 2 ][2]     // _|_
[ 1, 2, ...][2] // _|_
```

Both the operand and index value may be a value-default pair.
```
va[vi]              =>  va[vi]
va[(vi, di)]        =>  (va[vi], va[di])
(va, da)[vi]        =>  (va[vi], da[vi])
(va, da)[(vi, di)]  =>  (va[vi], da[di])
```

```
Fields                  Result
x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
i: int | *1             (int, 1)

v: x[i]                 (x[i], 4)
```

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op  .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#comparison-operators).
For any binary operator, the operand types must unify.

<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

Operands of unary and binary expressions may be associated with a default,
as in the following example:

<!--
```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1       op (v2, d2) => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```
-->

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /
    6             +  -
    5             ==  !=  <  <=  >  >= =~ !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. The four standard arithmetic operators
`(+, -, *, /)` apply to integer and decimal floating-point types;
`+` and `*` also apply to strings and bytes.

```
+    sum                    integers, floats, strings, bytes
-    difference             integers, floats
*    product                integers, floats, strings, bytes
/    quotient               integers, floats
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float`
if it cannot be represented as an `int` or if any of the operands are `float`,
or `int` otherwise.
So the result of `1 / 2` is `0.5` and is of type `float`.

The result of division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->
Integer division is implemented through the builtin functions
`quo`, `rem`, `div`, and `mod`.
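
The arithmetic rules can be illustrated with a short sketch.
The field names are arbitrary and chosen only for illustration; the commented
results follow from the rules stated in this section and from the description
of the builtins `div` and `quo`.

```
half:    1 / 2        // 0.5, of type float
sum:     3 + 4        // 7, of type int
scaled:  2 * 1.5      // 3.0, of type float because one operand is a float
joined:  "ab" + "cd"  // "abcd"
euclid:  div(-5, 3)   // -2 (Euclidean division)
trunc:   quo(-5, 3)   // -1 (truncated division)
invalid: 1 / 0        // _|_ (division by zero)
```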

The unary operators `+` and `-` are defined for numeric values as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```

#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```

<!-- jba: Do these work for byte sequences? If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```

<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal, null is unequal with anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match regular expression `r`.

<!--- TODO: consider the following
- For regular expression, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```


<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` are of different underlying type, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of x.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000))  // uint16(1000)
b: uint8(1000)        // _|_ // overflow
c: int(2.5)           // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8')   // "hellø"
string(bytes([0x20]))    // " "
```

A string value is always convertible to a list of bytes.

```
bytes("hellø")   // 'hell\xc3\xb8'
bytes("")        // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP:        4*[byte]
Private10: IP([10, ...])  // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
  the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type F,
```
f(a1, a2, … an)
```
calls `f` with arguments a1, a2, … an. Arguments must be expressions
of which the values are an instance of the parameter types of `F`
and are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.


### Comprehensions

Lists and fields can be constructed using comprehensions.

Comprehensions define a clause sequence that consists of a sequence of
`for`, `if`, and `let` clauses, nesting from left to right.
The sequence must start with a `for` or `if` clause.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it binds it to the value of
a list element or struct field value.
If there are two identifiers, the first value will be the key or index,
if available, and the second will be the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

A current iteration is said to complete if the innermost block of the clause
sequence is reached.
Syntactically, the comprehension value is a struct.
A comprehension can generate non-struct values by embedding such values within
this struct.

Within lists, the values yielded by a comprehension are inserted in the list
at the position of the comprehension.
Within structs, the values yielded by a comprehension are embedded within the
struct.
Both structs and lists may contain multiple comprehensions.

```
Comprehension       = Clauses StructLit .

Clauses             = StartClause { [ "," ] Clause } .
StartClause         = ForClause | GuardClause .
Clause              = StartClause | LetClause .
ForClause           = "for" identifier [ "," identifier ] "in" Expression .
GuardClause         = "if" Expression .
LetClause           = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ for x in a if x > 1 { x+1 } ]  // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
d: { "1": 2, "2": 3, "3": 4 }
```


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalents.

A placeholder consists of "\\(" followed by an expression and a ")".
The expression is evaluated in the scope within which the string is defined.

The result of the expression is substituted as follows:
- string: as is
- bool: the JSON representation of the bool
- number: a JSON representation of the number that preserves the
  precision of the underlying binary coded decimal
- bytes: as if substituted within single quotes or
  converted to valid UTF-8 replacing the
  maximal subpart of ill-formed subsequences with a single
  replacement character (W3C encoding standard) otherwise
- list: illegal
- struct: illegal


```
a: "World"
b: "Hello \( a )!" // Hello World!
```


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes arguments of various types and returns
a result of type int.

```
Argument type    Result

string            string length in bytes
bytes             length of byte sequence
list              list length, smallest length for an open list
struct            number of distinct data fields, excluding optional
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2
```


### `close`

The builtin function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.


### `and`

The built-in function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

```
Expression:          Result
and([a, b])          a & b
and([a])             a
and([])              _
```

### `or`

The built-in function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression:          Result
or([a, b])           a | b
or([a])              a
or([])               _|_
```

### `div`, `mod`, `quo` and `rem`

For two integer values `x` and `y`,
the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with 0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y   div(x, y)  mod(x, y)
 5     3        1          2
-5     3       -2          1
 5    -3       -1          2
-5    -3        2          1
```

For two integer values `x` and `y`,
the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `quo(x, y)` truncated towards zero.

```
 x     y   quo(x, y)  rem(x, y)
 5     3        1          2
-5     3       -1         -2
 5    -3       -1          2
-5    -3        1         -2
```

A zero divisor in either case results in bottom (an error).


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should treat these as `_`.
Two particular cases are discussed below.


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and postpone validating `a == e` until after all references
in `e` have been resolved.

```
// Config            Evaluates to (requiring concrete values)
x: {                  x: {
    a: b + 100            a: _|_ // cycle detected
    b: a - 100            b: _|_ // cycle detected
}                     }

y: x & {              y: {
    a: 200                a: 200 // asserted that 200 == b + 100
                          b: 100
}                     }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a concrete value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.

<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration    Evaluated
//    c          Cycles in nodes of type struct evaluate
//  ↙︎   ↖        to the fixed point of unifying their
// a  →  b       values ad infinitum.

a: b & { x: 1 }  // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }  // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }  // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a           b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that are part of a reference cycle forming a struct will
evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

A structural cycle occurs when a node references one of its ancestor nodes.
It is possible to construct a structural cycle by unifying two acyclic values:
```
// acyclic
y: {
    f: h: g
    g: _
}
// acyclic
x: {
    f: _
    g: f
}
// introduces structural cycle
z: x & y
```
Implementations should be able to detect such structural cycles dynamically.

A structural cycle can result in infinite structure or evaluation loops.
```
// infinite structure
a: b: a

// infinite evaluation
f: {
    n:   int
    out: n + (f & {n: 1}).out
}
```
CUE must allow or disallow structural cycles under certain circumstances.

If a node `a` references an ancestor node, we call it and any of its
field values `a.f` _cyclic_.
So if `a` is cyclic, all of its descendants are also regarded as cyclic.
A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
is valid if any of its conjuncts is not cyclic.

```
// Disallowed: a list of infinite length with all elements being 1.
#List: {
    head: 1
    tail: #List
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}

// #List defines a list of arbitrary length. Because the recursive reference
// is part of a disjunction, this does not result in a structural cycle.
#List: {
    head: _
    tail: null | #List
}

// Usage of #List. The value of tail in the most deeply nested element will
// be `null`: as the value of the disjunct referring to list is the only
// conjunct, all conjuncts are cyclic and the value is invalid and so
// eliminated from the disjunction.
MyList: #List & { head: 1, tail: { head: 2 }}
```

<!--
### Unused fields

TODO: rules for detection of unused fields

1.
Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

Like with a struct, a source file may contain embeddings.
Unlike with a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(#place)!"

#place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
a source file belongs.

```
PackageClause  = "package" PackageName .
PackageName    = identifier .
```

The PackageName must not be the blank identifier or a definition identifier.

```
package math
```

### Modules and instances
A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module with the same package belong to the same
package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package
and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl       = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec       = [ PackageName ] ImportPath .
ImportLocation   = { unicode_value } .
ImportPath       = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being able to determine the identifiers
for packages from within the file itself. But for CUE it has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualifying location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging to
Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters !"#$%&'()*,:;<=>?[\\]^`{|}
or the Unicode replacement character U+FFFD.

Assume we have a package containing the package clause "package math",
which exports function Sin at the path identified by "lib/math".
This table illustrates how Sin is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.


### An example package

TODO
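
Until a complete example is provided, the following minimal sketch shows how a
package clause, an import declaration, and a qualified identifier fit together
in one source file.
The package name `geometry` and the fields `radius` and `circumference` are
hypothetical and chosen only for illustration; the sketch assumes that the
builtin package `math` provides the constant `Pi`.

```
// Hypothetical package name, for illustration only.
package geometry

// Assumption: "math" resolves to a builtin package that provides Pi.
import "math"

// radius is left non-concrete so that another instance can constrain it.
radius: number

// math.Pi is a qualified identifier referring to the imported package.
// This expression is incomplete until radius is given a concrete value.
circumference: 2 * math.Pi * radius
```

Because the file refers to `math.Pi`, the import satisfies the rule that an
imported package must be used.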