<!--
 Copyright 2018 The CUE Authors

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

# The CUE Language Specification

## Introduction

This is a reference manual for the CUE data constraint language.
CUE, pronounced cue or Q, is a general-purpose and strongly typed
constraint-based language.
It can be used for data templating, data validation, code generation, scripting,
and many other applications involving structured data.
The CUE tooling, layered on top of CUE, provides
a general-purpose scripting language for creating scripts as well as
simple servers, also expressed in CUE.

CUE was designed with cloud configuration, and related systems, in mind,
but is not limited to this domain.
It derives its formalism from relational programming languages.
This formalism allows for managing and reasoning over large amounts of
data in a straightforward manner.

The grammar is compact and regular, allowing for easy analysis by automatic
tools such as integrated development environments.

This document is maintained by mpvl@golang.org.
CUE has a lot of similarities with the Go language. This document draws heavily
from the Go specification as a result.

CUE draws its influence from many languages.
Its main influences were BCL/GCL (internal to Google),
LKB (LinGO), Go, and JSON.
Others are Swift, TypeScript, JavaScript, Prolog, NCL (internal to Google),
Jsonnet, HCL, Flabbergast, Nix, JSONPath, Haskell, Objective-C, and Python.


## Notation

The syntax is specified using Extended Backus-Naur Form (EBNF):

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Productions are expressions constructed from terms and the following operators,
in increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

Lower-case production names are used to identify lexical tokens. Non-terminals
are in CamelCase. Lexical tokens are enclosed in double quotes "" or back quotes
``.

The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec to
informally denote various enumerations or code snippets that are not further
specified. The character … (as opposed to the three characters ...) is not a
token of the CUE language.


## Source code representation

Source code is Unicode text encoded in UTF-8.
Unless otherwise noted, the text is not canonicalized, so a single
accented code point is distinct from the same character constructed from
combining an accent and a letter; those are treated as two code points.
For simplicity, this document will use the unqualified term character to refer
to a Unicode code point in the source text.

Each code point is distinct; for instance, upper and lower case letters are
different characters.

Implementation restriction: For compatibility with other tools, a compiler may
disallow the NUL character (U+0000) in the source text.

Implementation restriction: For compatibility with other tools, a compiler may
ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code
point in the source text. A byte order mark may be disallowed anywhere else in
the source.


### Characters

The following terms are used to denote specific Unicode character classes:

```
newline        = /* the Unicode code point U+000A */ .
unicode_char   = /* an arbitrary Unicode code point except newline */ .
unicode_letter = /* a Unicode code point classified as "Letter" */ .
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */ .
```

In The Unicode Standard 8.0, Section 4.5 "General Category" defines a set of
character categories.
CUE treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo
as Unicode letters, and those in the Number category Nd as Unicode digits.


### Letters and digits

The underscore character _ (U+005F) is considered a letter.

```
letter        = unicode_letter | "_" .
decimal_digit = "0" … "9" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
```


## Lexical elements

### Comments

Comments serve as program documentation.
CUE supports line comments that start with the character sequence `//`
and stop at the end of the line.

A comment cannot start inside a string literal or inside a comment.
A comment acts like a newline.


### Tokens

Tokens form the vocabulary of the CUE language. There are four classes:
identifiers, keywords, operators and punctuation, and literals.
White space, formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns
(U+000D), and newlines (U+000A), is ignored except as it separates tokens that
would otherwise combine into a single token. Also, a newline or end of file may
trigger the insertion of a comma. While breaking the input into tokens, the
next token is the longest sequence of characters that form a valid token.


### Commas

The formal grammar uses commas "," as terminators in a number of productions.
CUE programs may omit most of these commas using the following rule:

When the input is broken into tokens, a comma is automatically inserted into
the token stream immediately after a line's final token if that token is

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, or `?`
- an ellipsis `...`

Although commas are automatically inserted, the parser will require
explicit commas between two list elements.

To reflect idiomatic use, examples in this document elide commas using
this rule.


### Identifiers

Identifiers name entities such as fields and aliases.
An identifier is a sequence of one or more letters (which includes `_` and `$`)
and digits, optionally preceded by `#` or `_#`.
It may not be `_` or `$`.
The first character in an identifier, or after a `#` if it contains one,
must be a letter.
Identifiers starting with a `#` or `_` are reserved for definitions and hidden
fields.

<!--
TODO: allow identifiers as defined in Unicode UAX #31
(https://unicode.org/reports/tr31/).

Identifiers are normalized using the NFC normal form.
-->

```
identifier = [ "#" | "_#" ] letter { letter | unicode_digit } .
```

```
a
_x9
fieldName
αβ
```

<!-- TODO: Allow Unicode identifiers TR 32 http://unicode.org/reports/tr31/ -->

Some identifiers are [predeclared](#predeclared-identifiers).


### Keywords

CUE has a limited set of keywords.
In addition, CUE reserves all identifiers starting with `__` (double underscores)
as keywords.
These are typically targets of pre-declared identifiers.

All keywords may be used as labels (field names).
Unless noted otherwise, they can also be used as identifiers to refer to
the same name.


#### Values

The following keywords are values.

```
null         true         false
```

These can never be used to refer to a field of the same name.
This restriction is to ensure compatibility with JSON configuration files.


#### Preamble

The following keywords are used in the preamble of a CUE file.
After the preamble, they may be used as identifiers to refer to namesake fields.

```
package      import
```


#### Comprehension clauses

The following keywords are used in comprehensions.

```
for          in           if           let
```

<!--
TODO:
    reduce [to]
    order [by]
-->


### Operators and punctuation

The following character sequences represent operators and punctuation:

```
+     &&    ==    <     =     (     )
-     ||    !=    >     :     {     }
*     &     =~    <=    ?     [     ]     ,
/     |     !~    >=    !     _|_   ...   .
```
<!--
Free tokens:  ; ~ ^
// To be used:
  @   at: associative lists.

// Idea: use # instead of @ for attributes and allow them at declaration level.
// This will open up the possibility of defining #! at the start of a file
// without requiring special syntax. Although probably not quite.
 -->


### Numeric literals

There are several kinds of numeric literals.
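
As a rough cross-check of the integer forms defined below, Go's `math/big` package (used here only as an illustration) parses the same base prefixes and interstitial underscores when given base 0. Go is more permissive than CUE's grammar: it also reads a bare leading `0` as octal and accepts `0O`, `0B` and a sign. The hypothetical helper `parseIntLit` therefore rejects those cases explicitly; it does not attempt to catch every other difference (for example an underscore directly after a prefix), and it ignores SI/IEC multipliers.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// parseIntLit evaluates an integer literal without a multiplier.
// math/big's base-0 parsing understands the 0x/0X, 0o and 0b prefixes
// and underscores between digits; forms Go accepts but the grammar
// does not are rejected first.
func parseIntLit(lit string) (*big.Int, error) {
	switch {
	case strings.HasPrefix(lit, "+"), strings.HasPrefix(lit, "-"):
		return nil, fmt.Errorf("%q: a sign is not part of the literal", lit)
	case strings.HasPrefix(lit, "0O"), strings.HasPrefix(lit, "0B"):
		return nil, fmt.Errorf("%q: octal and binary prefixes are lower case", lit)
	case len(lit) > 1 && lit[0] == '0' && (lit[1] == '_' || ('0' <= lit[1] && lit[1] <= '9')):
		// decimal_lit is either "0" alone or starts with 1-9.
		return nil, fmt.Errorf("%q: leading zero not allowed", lit)
	}
	n, ok := new(big.Int).SetString(lit, 0)
	if !ok {
		return nil, fmt.Errorf("%q: invalid integer literal", lit)
	}
	return n, nil
}

func main() {
	for _, lit := range []string{"42", "0xBad_Face", "0o755", "0b0101_0001", "0755"} {
		n, err := parseIntLit(lit)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(lit, "=", n)
	}
}
```

Using `big.Int` rather than `int64` keeps literals such as the 128-bit example below exact.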

```
int_lit     = decimal_lit | si_lit | octal_lit | binary_lit | hex_lit .
decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .
decimals    = decimal_digit { [ "_" ] decimal_digit } .
si_lit      = decimals [ "." decimals ] multiplier |
              "." decimals multiplier .
binary_lit  = "0b" binary_digit { [ "_" ] binary_digit } .
hex_lit     = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
octal_lit   = "0o" octal_digit { [ "_" ] octal_digit } .
multiplier  = ( "K" | "M" | "G" | "T" | "P" ) [ "i" ] .

float_lit   = decimals "." [ decimals ] [ exponent ] |
              decimals exponent |
              "." decimals [ exponent ] .
exponent    = ( "e" | "E" ) [ "+" | "-" ] decimals .
```

An _integer literal_ is a sequence of digits representing an integer value.
An optional prefix sets a non-decimal base: 0o for octal,
0x or 0X for hexadecimal, and 0b for binary.
In hexadecimal literals, letters a-f and A-F represent values 10 through 15.
All integers allow interstitial underscores "_";
these have no meaning and are solely for readability.

Integer literals may have an SI or IEC multiplier.
Multipliers can be used with fractional numbers.
When multiplying a fraction by a multiplier, the result is truncated
towards zero if it is not an integer.

```
42
1.5G    // 1_500_000_000
1.3Ki   // 1.3 * 1024 = trunc(1331.2) = 1331
170_141_183_460_469_231_731_687_303_715_884_105_727
0xBad_Face
0o755
0b0101_0001
```

A _decimal floating-point literal_ is a representation of
a decimal floating-point value (a _float_).
It has an integer part, a decimal point, a fractional part, and an
exponent part.
The integer and fractional parts comprise decimal digits; the
exponent part is an `e` or `E` followed by an optionally signed decimal exponent.
One of the integer part or the fractional part may be elided; one of the decimal
point or the exponent may be elided.
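
The multiplier rule for integer literals described earlier in this section can be checked with a short Go sketch. It assumes the usual SI factors (K = 10^3 up to P = 10^15) and IEC factors (Ki = 2^10 up to Pi = 2^50), which agree with the `1.3Ki` example; it scales the decimal part exactly with `math/big` before truncating toward zero, and it does not handle underscores. The function name `siValue` is hypothetical.

```go
package main

import (
	"fmt"
	"math/big"
)

// multipliers maps each SI suffix (powers of 1000) and IEC suffix
// (powers of 1024) to its integer factor.
var multipliers = map[string]int64{
	"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15,
	"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40, "Pi": 1 << 50,
}

// siValue evaluates a literal such as "1.5G" or "1.3Ki".
func siValue(lit string) (*big.Int, error) {
	// Try two-character IEC suffixes before one-character SI suffixes.
	for _, n := range []int{2, 1} {
		if len(lit) <= n {
			continue
		}
		num, suffix := lit[:len(lit)-n], lit[len(lit)-n:]
		factor, ok := multipliers[suffix]
		if !ok {
			continue
		}
		r, ok := new(big.Rat).SetString(num)
		if !ok {
			return nil, fmt.Errorf("invalid number %q", num)
		}
		// Exact product, then big.Int.Quo truncates toward zero.
		r.Mul(r, new(big.Rat).SetInt64(factor))
		return new(big.Int).Quo(r.Num(), r.Denom()), nil
	}
	return nil, fmt.Errorf("no multiplier in %q", lit)
}

func main() {
	for _, lit := range []string{"1.5G", "1.3Ki", "2Mi"} {
		v, err := siValue(lit)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(lit, "=", v)
	}
}
```

Exact rational arithmetic avoids binary rounding before truncation; with `float64`, a product that should land exactly on an integer could come out slightly below it and truncate to the wrong value.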

```
0.
72.40
072.40  // == 72.40
2.71828
1.e+0
6.67428e-11
1E6
.25
.12345E+5
```

<!--
TODO: consider allowing Exo (and up), if not followed by a sign
or number. Alternatively one could only allow Ei, Yi, and Zi.
-->

Neither a `float_lit` nor an `si_lit` may appear after a token that is:

- an identifier, keyword, or bottom
- a number or string literal, including an interpolation
- one of the characters `)`, `]`, `}`, `?`, or `.`.

<!--
So
`a + 3.2Ti`  -> `a`, `+`, `3.2Ti`
`a 3.2Ti`    -> `a`, `3`, `.`, `2`, `Ti`
`a + .5e3`   -> `a`, `+`, `.5e3`
`a .5e3`     -> `a`, `.`, `5`, `e3`.
-->


### String and byte sequence literals

A string literal represents a string constant obtained from concatenating a
sequence of characters.
Byte sequences are a sequence of bytes.

String and byte sequence literals are character sequences between,
respectively, double and single quotes, as in `"bar"` and `'bar'`.
Within the quotes, any character may appear except newline and,
respectively, unescaped double or single quote.
String literals may only be valid UTF-8.
Byte sequences may contain any sequence of bytes.

Several escape sequences allow arbitrary values to be encoded as ASCII text.
An escape sequence starts with an _escape delimiter_, which is `\` by default.
The escape delimiter may be altered to be `\` plus a fixed number of
hash symbols `#`
by padding the start and end of a string or byte sequence literal
with this number of hash symbols.

There are four ways to represent the integer value as a numeric constant: `\x`
followed by exactly two hexadecimal digits; `\u` followed by exactly four
hexadecimal digits; `\U` followed by exactly eight hexadecimal digits, and a
plain backslash `\` followed by exactly three octal digits.
In each case the value of the literal is the value represented by the
digits in the corresponding base.
Hexadecimal and octal escapes are only allowed within byte sequences
(single quotes).

Although these representations all result in an integer, they have different
valid ranges.
Octal escapes must represent a value between 0 and 255 inclusive.
Hexadecimal escapes satisfy this condition by construction.
The escapes `\u` and `\U` represent Unicode code points so within them
some values are illegal, in particular those above `0x10FFFF`.
Surrogate halves are allowed,
but are translated into their non-surrogate equivalent internally.

The three-digit octal (`\nnn`) and two-digit hexadecimal (`\xnn`) escapes
represent individual bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
Thus inside a byte sequence literal `\377` and `\xFF` represent a single byte of
value `0xFF=255`, while `ÿ`, `\u00FF`, `\U000000FF` and `\xc3\xbf` represent
the two bytes `0xc3 0xbf` of the UTF-8
encoding of character `U+00FF`.

```
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\/   U+002F slash (solidus)
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within single quoted literals)
\"   U+0022 double quote  (valid escape only within double quoted literals)
```

The escape `\(` is used as an escape for string interpolation.
A `\(` must be followed by a valid CUE Expression, followed by a `)`.

All other sequences starting with a backslash are illegal inside literals.

```
escaped_char     = `\` { `#` } ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | "/" | `\` | "'" | `"` ) .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` { `#` } octal_digit octal_digit octal_digit .
hex_byte_value   = `\` { `#` } "x" hex_digit hex_digit .
little_u_value   = `\` { `#` } "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` { `#` } "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
interpolation    = "\" { `#` } "(" Expression ")" .

string_lit       = simple_string_lit |
                   multiline_string_lit |
                   simple_bytes_lit |
                   multiline_bytes_lit |
                   `#` string_lit `#` .

simple_string_lit    = `"` { unicode_value | interpolation } `"` .
simple_bytes_lit     = `'` { unicode_value | interpolation | byte_value } `'` .
multiline_string_lit = `"""` newline
                         { unicode_value | interpolation | newline }
                         newline `"""` .
multiline_bytes_lit  = "'''" newline
                         { unicode_value | interpolation | byte_value | newline }
                         newline "'''" .
```

Carriage return characters (`\r`) inside string literals are discarded from
the string value.

```
'a\000\xab'
'\007'
'\377'
'\xa'        // illegal: too few hexadecimal digits
"\n"
"\""
'Hello, world!\n'
"Hello, \( name )!"
"日本語"
"\u65e5本\U00008a9e"
'\xff\u00FF'
"\uD800"         // illegal: surrogate half (TODO: probably should allow)
"\U00110000"     // illegal: invalid Unicode code point

#"This is not an \(interpolation)"#
#"This is an \#(interpolation)"#
#"The sequence "\U0001F604" renders as \#U0001F604."#
```

These examples all represent the same string:

```
"日本語"                                 // UTF-8 input text
'日本語'                                 // UTF-8 input text as byte sequence
`日本語`                                 // UTF-8 input text as a raw literal
"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'  // the explicit UTF-8 bytes
```

If the source code represents a character as two code points, such as a
combining form involving an accent and a letter, the result will appear as two
code points if placed in a string literal.

Strings and byte sequences have a multiline equivalent.
Multiline strings are like their single-line equivalent,
but allow newline characters.

Multiline strings and byte sequences respectively start with
a triple double quote (`"""`) or triple single quote (`'''`),
immediately followed by a newline, which is discarded from the string contents.
The string is closed by a matching triple quote, which must be on a line
by itself, preceded by optional whitespace.
The newline preceding the closing quote is discarded from the string contents.
The whitespace before a closing triple quote must appear before any non-empty
line after the opening quote and will be removed from each of these
lines in the string literal.
A closing triple quote may not appear in the string.
To include it, it suffices to escape one of the quotes.
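
The indentation rule can be sketched in Go as follows. This is an illustration under stated assumptions, not CUE's implementation: the hypothetical `stripMultiline` receives the raw text between the opening and closing triple quotes, treats only truly empty lines as exempt from the indentation check, and ignores escapes, interpolation, and carriage-return removal.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// stripMultiline applies the multiline rules to raw, the text between
// the opening and closing triple quotes. raw must begin with the newline
// that follows the opening quote; its last line is the whitespace in
// front of the closing quote.
func stripMultiline(raw string) (string, error) {
	if !strings.HasPrefix(raw, "\n") {
		return "", errors.New("opening quote must be followed by a newline")
	}
	// Dropping raw[0] discards the newline after the opening quote.
	lines := strings.Split(raw[1:], "\n")
	indent := lines[len(lines)-1]
	if strings.TrimLeft(indent, " \t") != "" {
		return "", errors.New("closing quote must be on a line by itself")
	}
	body := lines[:len(lines)-1]
	for i, line := range body {
		if line == "" {
			continue // empty lines need not carry the indentation
		}
		if !strings.HasPrefix(line, indent) {
			return "", fmt.Errorf("line %d does not start with %q", i+1, indent)
		}
		body[i] = line[len(indent):]
	}
	// Joining without a trailing "\n" discards the newline that
	// precedes the closing quote.
	return strings.Join(body, "\n"), nil
}

func main() {
	// Text between the quotes when the closing quote is indented by four spaces.
	raw := "\n    lily:\n    out of the water\n    out of itself\n    "
	s, err := stripMultiline(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", s) // "lily:\nout of the water\nout of itself"
}
```

Applied to the haiku example that follows, the same steps remove four spaces from every non-empty line, leaving the extra indentation of the attribution line in place.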

```
"""
    lily:
    out of the water
    out of itself

    bass
    picking bugs
    off the moon
        — Nick Virgilio, Selected Haiku, 1988
    """
```

This represents the same string as:

```
"lily:\nout of the water\nout of itself\n\n" +
"bass\npicking bugs\noff the moon\n" +
"    — Nick Virgilio, Selected Haiku, 1988"
```

<!-- TODO: other values

Support for other values:
- Duration literals
- regular expressions: `re("[a-z]")`
-->


## Values

In addition to simple values like `"hello"` and `42.0`, CUE has _structs_.
A struct is a map from labels to values, like `{a: 42.0, b: "hello"}`.
Structs are CUE's only way of building up complex values;
lists, which we will see later,
are defined in terms of structs.

All possible values are ordered in a lattice,
a partial order where every two elements have a single greatest lower bound.
A value `a` is an _instance_ of a value `b`,
denoted `a ⊑ b`, if `b == a` or `b` is more general than `a`,
that is if `a` orders before `b` in the partial order
(`⊑` is _not_ a CUE operator).
We also say that `b` _subsumes_ `a` in this case.
In graphical terms, `b` is "above" `a` in the lattice.

At the top of the lattice is the single ancestor of all values, called
_top_, denoted `_` in CUE.
Every value is an instance of top.

At the bottom of the lattice is the value called _bottom_, denoted `_|_`.
A bottom value usually indicates an error.
Bottom is an instance of every value.

An _atom_ is any value whose only instances are itself and bottom.
Examples of atoms are `42.0`, `"hello"`, `true`, `null`.

A value is _concrete_ if it is either an atom, or a struct all of whose
field values are themselves concrete, recursively.

CUE's values also include what we normally think of as types, like `string` and
`float`.
But CUE does not distinguish between types and values; only the
relationship of values in the lattice is important.
Each CUE "type" subsumes the concrete values that one would normally think
of as part of that type.
For example, `"hello"` is an instance of `string`, and `42.0` is an instance of
`float`.
In addition to `string` and `float`, CUE has `null`, `int`, `bool` and `bytes`.
We informally call these CUE's "basic types".


```
false ⊑ bool
true  ⊑ bool
true  ⊑ true
5.0   ⊑ float
bool  ⊑ _
_|_   ⊑ _
_|_   ⊑ _|_

_     ⋢ _|_
_     ⋢ bool
int   ⋢ bool
bool  ⋢ int
false ⋢ true
true  ⋢ false
float ⋢ 5.0
5     ⋢ 6
```


### Unification

The _unification_ of values `a` and `b`
is defined as the greatest lower bound of `a` and `b`. (That is, the
value `u` such that `u ⊑ a` and `u ⊑ b`,
and for any other value `v` for which `v ⊑ a` and `v ⊑ b`
it holds that `v ⊑ u`.)
Since CUE values form a lattice, the unification of two CUE values is
always unique.

These all follow from the definition of unification:
- The unification of `a` with itself is always `a`.
- The unification of values `a` and `b` where `a ⊑ b` is always `a`.
- The unification of a value with bottom is always bottom.

Unification in CUE is a [binary expression](#Operands), written `a & b`.
It is commutative and associative.
As a consequence, order of evaluation is irrelevant, a property that is key
to many of the constructs in the CUE language as well as the tooling layered
on top of it.



<!-- TODO: explicitly mention that disjunction is not a binary operation
but a definition of a single value?-->


### Disjunction

The _disjunction_ of values `a` and `b`
is defined as the least upper bound of `a` and `b`.
(That is, the value `d` such that `a ⊑ d` and `b ⊑ d`,
and for any other value `e` for which `a ⊑ e` and `b ⊑ e`,
it holds that `d ⊑ e`.)
This style of disjunctions is sometimes also referred to as sum types.
Since CUE values form a lattice, the disjunction of two CUE values is always unique.


These all follow from the definition of disjunction:
- The disjunction of `a` with itself is always `a`.
- The disjunction of values `a` and `b` where `a ⊑ b` is always `b`.
- The disjunction of a value `a` with bottom is always `a`.
- The disjunction of two bottom values is bottom.

Disjunction in CUE is a [binary expression](#Operands), written `a | b`.
It is commutative, associative, and idempotent.

The unification of a disjunction with another value is equal to the disjunction
composed of the unification of this value with all of the original elements
of the disjunction.
In other words, unification distributes over disjunction.

```
(a_0 | ... | a_n) & b ==> a_0&b | ... | a_n&b.
```

```
Expression                Result
({a:1} | {b:2}) & {c:3}   {a:1, c:3} | {b:2, c:3}
(int | string) & "foo"    "foo"
("a" | "b") & "c"         _|_
```

A disjunction is _normalized_ if there is no element
`a` for which there is another element `b` such that `a ⊑ b`.

<!--
Normalization is important, as we need to account for spurious elements
For instance "tcp" | "tcp" should resolve to "tcp".

Also consider

  ({a:1} | {b:1}) & ({a:1} | {b:2}) -> {a:1} | {a:1,b:1} | {a:1,b:2},

in this case, elements {a:1,b:1} and {a:1,b:2} are subsumed by {a:1} and thus
this expression is logically equivalent to {a:1} and should therefore be
considered to be unambiguous and resolve to {a:1} if a concrete value is needed.

For instance, in

  x: ({a:1} | {b:1}) & ({a:1} | {b:2}) // -> {a:1} | {a:1,b:1} | {a:1,b:2}
  y: x.a // 1

y should resolve to 1, and not an error.

For comparison, in

  x: ({a:1, b:1} | {b:2}) & {a:1} // -> {a:1,b:1} | {a:1,b:2}
  y: x.a // _|_

y should be an error as x is still ambiguous before the selector is applied,
even though `a` resolves to 1 in all cases.
-->


#### Default values

Any value `v` _may_ be associated with a default value `d`,
where `d` must be an instance of `v` (`d ⊑ v`).

Default values are introduced by means of disjunctions.
Any element of a disjunction can be _marked_ as a default
by prefixing it with an asterisk `*` ([a unary expression](#Operators)).
Syntactically consecutive disjunctions are considered to be
part of a single disjunction,
whereby multiple disjuncts can be marked as default.
A _marked disjunction_ is one where any of its terms are marked.
So `a | b | *c | d` is a single marked disjunction of four terms,
whereas `a | (b | *c | d)` is an unmarked disjunction of two terms,
one of which is a marked disjunction of three terms.
During unification, if all the marked disjuncts of a marked disjunction are
eliminated, then the remaining unmarked disjuncts are considered as if they
originated from an unmarked disjunction.
<!-- TODO: this formulation should be worked out more.  -->
As explained below, distinguishing the nesting of disjunctions like this
is only relevant when both an outer and nested disjunction are marked.

Intuitively, when an expression needs to be resolved for an operation other
than unification or disjunction,
non-starred elements are dropped in favor of starred ones if the starred ones
do not resolve to bottom.
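
The intuition above and the rewrite rules that follow can be exercised with a small Go model. It is a sketch under strong simplifying assumptions: values are finite sets of small integers, so unification is set intersection, disjunction is set union, and bottom is the empty set. `Set`, `Val`, `Term`, `Unify`, `Disjoin` and the helpers are hypothetical names. The model implements rules U0 to U2, D0 to D2 and M0 to M3 as stated, but not the special treatment of a marked disjunction whose marked terms have all been eliminated.

```go
package main

import "fmt"

// Set is a finite set of small integers (0..63) kept as a bit mask.
// Here & (intersection) is unification, | (union) is disjunction,
// and the empty set is bottom.
type Set uint64

// Of builds the set containing xs.
func Of(xs ...int) Set {
	var s Set
	for _, x := range xs {
		s |= 1 << uint(x)
	}
	return s
}

// Val is ⟨V⟩ when HasDef is false and ⟨V, D⟩ when HasDef is true.
type Val struct {
	V, D   Set
	HasDef bool
}

func Plain(s Set) Val { return Val{V: s} }

// Term is one disjunct and whether it was marked with '*'.
type Term struct {
	Val    Val
	Marked bool
}

func term(xs ...int) Term { return Term{Val: Plain(Of(xs...))} }
func star(xs ...int) Term { return Term{Val: Plain(Of(xs...)), Marked: true} }

// Unify applies U0, U1 (in either operand order) and U2.
func Unify(a, b Val) Val {
	switch {
	case !a.HasDef && !b.HasDef: // U0
		return Val{V: a.V & b.V}
	case a.HasDef && !b.HasDef: // U1
		return Val{V: a.V & b.V, D: a.D & b.V, HasDef: true}
	case !a.HasDef && b.HasDef: // U1, operands swapped
		return Unify(b, a)
	default: // U2
		return Val{V: a.V & b.V, D: a.D & b.D, HasDef: true}
	}
}

// orVals applies D0, D1 (in either operand order) and D2.
func orVals(a, b Val) Val {
	v := Val{V: a.V | b.V} // D0 if neither operand has a default
	switch {
	case a.HasDef && b.HasDef: // D2
		v.D, v.HasDef = a.D|b.D, true
	case a.HasDef: // D1
		v.D, v.HasDef = a.D, true
	case b.HasDef: // D1, operands swapped
		v.D, v.HasDef = b.D, true
	}
	return v
}

// Disjoin evaluates one syntactically flat disjunction: if any term is
// marked, every term is first rewritten with M0-M3, then terms are
// combined pairwise with orVals.
func Disjoin(terms ...Term) Val {
	anyMarked := false
	for _, t := range terms {
		anyMarked = anyMarked || t.Marked
	}
	var acc Val
	for i, t := range terms {
		v := t.Val
		if anyMarked {
			switch {
			case t.Marked && !v.HasDef: // M1
				v = Val{V: v.V, D: v.V, HasDef: true}
			case !t.Marked && v.HasDef: // M3
				v = Val{V: v.V}
			} // M0 and M2 leave the term unchanged
		}
		if i == 0 {
			acc = v
		} else {
			acc = orVals(acc, v)
		}
	}
	return acc
}

func main() {
	// (*1|2) & (1|*2): the value is 1|2, but the defaults 1 and 2 conflict.
	r := Unify(Disjoin(star(1), term(2)), Disjoin(term(1), star(2)))
	fmt.Printf("value=%b default=%b\n", r.V, r.D) // value=110 default=0
}
```

Feeding the model rows of the value-default table below, such as `(*1|2|3) | *(1|*2|3)`, yields the listed pairs, for example ⟨1|2|3, 2⟩.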

To define the unification and disjunction operations we use the notation
`⟨v⟩` to denote a CUE value `v` that is not associated with a default
and the notation `⟨v, d⟩` to denote a value `v` associated with a default
value `d`.

The rewrite rules for unifying such values are as follows:
```
U0: ⟨v1⟩ & ⟨v2⟩         => ⟨v1&v2⟩
U1: ⟨v1, d1⟩ & ⟨v2⟩     => ⟨v1&v2, d1&v2⟩
U2: ⟨v1, d1⟩ & ⟨v2, d2⟩ => ⟨v1&v2, d1&d2⟩
```

The rewrite rules for disjoining terms of unmarked disjunctions are:
```
D0: ⟨v1⟩ | ⟨v2⟩         => ⟨v1|v2⟩
D1: ⟨v1, d1⟩ | ⟨v2⟩     => ⟨v1|v2, d1⟩
D2: ⟨v1, d1⟩ | ⟨v2, d2⟩ => ⟨v1|v2, d1|d2⟩
```

Terms of marked disjunctions are first rewritten according to the following
rules:
```
M0:  ⟨v⟩    => ⟨v⟩        don't introduce defaults for unmarked term
M1: *⟨v⟩    => ⟨v, v⟩     introduce identical default for marked term
M2: *⟨v, d⟩ => ⟨v, d⟩     keep existing defaults for marked term
M3:  ⟨v, d⟩ => ⟨v⟩        strip existing defaults from unmarked term
```

Note that for any marked disjunction `a`,
the expressions `a|a`, `*a|a` and `*a|*a` all resolve to `a`.

```
Expression               Value-default pair      Rules applied
*"tcp" | "udp"           ⟨"tcp"|"udp", "tcp"⟩    M1, D1
string | *"foo"          ⟨string, "foo"⟩         M1, D1

*1 | 2 | 3               ⟨1|2|3, 1⟩              M1, D1

(*1|2|3) | (1|*2|3)      ⟨1|2|3, 1|2⟩            M1, D1, D2
(*1|2|3) | *(1|*2|3)     ⟨1|2|3, 2⟩              M1, M2, M3, D1, D2
(*1|2|3) | (1|*2|3)&2    ⟨1|2|3, 1|2⟩            M1, D1, U1, D2

(*1|2) & (1|*2)          ⟨1|2, _|_⟩              M1, D1, U2
```

The rules of subsumption for defaults can be derived from the above definitions
and are as follows.

```
⟨v2, d2⟩ ⊑ ⟨v1, d1⟩  if v2 ⊑ v1 and d2 ⊑ d1
⟨v1, d1⟩ ⊑ ⟨v⟩       if v1 ⊑ v
⟨v⟩      ⊑ ⟨v1, d1⟩  if v ⊑ d1
```

<!--
For the second rule, note that by definition d1 ⊑ v1, so d1 ⊑ v1 ⊑ v.

The last one is so restrictive as v could still be made more specific by
associating it with a default that is not subsumed by d1.

Proof:
  by definition for any d ⊑ v, it holds that (v, d) ⊑ v,
  where the most general value is (v, v).
  Given the subsumption rule for (v2, d2) ⊑ (v1, d1),
  from (v, v) ⊑ v ⊑ (v1, d1) it follows that v ⊑ d1
  exactly defines the boundary of this subsumption.
-->

<!--
(non-normalized entries could also be implicitly marked, allowing writing
int | 1, instead of int | *1, but that can be done in a backwards
compatible way later if really desirable, as long as we require that
disjunction literals be normalized).
-->

```
Expression                       Resolves to
"tcp" | "udp"                    "tcp" | "udp"
*"tcp" | "udp"                   "tcp"
float | *1                       1
*string | 1.0                    string
(*1|2) + (2|*3)                  4

(*1|2|3) | (1|*2|3)              1|2
(*1|2|3) & (1|*2|3)              1|2|3 // default is _|_

(* >=5 | int) & (* <=5 | int)    5

(*"tcp"|"udp") & ("udp"|*"tcp")  "tcp"
(*"tcp"|"udp") & ("udp"|"tcp")   "tcp"
(*"tcp"|"udp") & "tcp"           "tcp"
(*"tcp"|"udp") & (*"udp"|"tcp")  "tcp" | "udp" // default is _|_

(*true | false) & bool           true
(*true | false) & (true | false) true

{a: 1} | {b: 1}                  {a: 1} | {b: 1}
{a: 1} | *{b: 1}                 {b:1}
*{a: 1} | *{b: 1}                {a: 1} | {b: 1}
({a: 1} | {b: 1}) & {a:1}        {a:1} | {a: 1, b: 1}
({a:1}|*{b:1}) & ({a:1}|*{b:1})  {b:1}
```


### Bottom and errors

Any evaluation error in CUE results in a bottom value, represented by
the token `_|_`.
Bottom is an instance of every other value.

Implementations may associate error strings with different instances of bottom;
logically they all remain the same value.

```
bottom_lit = "_|_" .
```


### Top

Top is represented by the underscore character `_`, lexically an identifier.
Unifying any value `v` with top results in `v` itself.

```
Expr        Result
_ & 5       5
_ & _       _
_ & _|_     _|_
_ | _|_     _
```


### Null

The _null value_ is represented with the keyword `null`.
It has only one parent, top, and one child, bottom.
It is unordered with respect to any other value.

```
null_lit   = "null" .
```

```
null & 8     _|_
null & _     null
null & _|_   _|_
```


### Boolean values

A _boolean type_ represents the set of Boolean truth values denoted by
the keywords `true` and `false`.
The predeclared boolean type is `bool`; it is a defined type and a separate
element in the lattice.

```
bool_lit = "true" | "false" .
```

```
bool & true          true
true & true          true
true & false         _|_
bool & (false|true)  false | true
bool & (true|false)  true | false
```


### Numeric values

The _integer type_ represents the set of all integral numbers.
The _decimal floating-point type_ represents the set of all decimal floating-point
numbers.
They are two distinct types.
Both are instances of a generic `number` type.

<!--
        number
       /      \
    int      float
-->

The predeclared number, integer, and decimal floating-point types are
`number`, `int` and `float`; they are defined types.
<!--
TODO: should we drop float? It is somewhat more precise and probably a good idea
to have it in the programmatic API, but it may be confusing to have to deal
with it in the language.
-->

A decimal floating-point literal always has type `float`;
it is not an instance of `int` even if it is an integral number.

Integer literals are always of type `int` and don't match type `float`.

Numeric literals are exact values of arbitrary precision.
If the operation permits it, numbers should be kept in arbitrary precision.
926 927 Implementation restriction: although numeric values have arbitrary precision 928 in the language, implementations may implement them using an internal 929 representation with limited precision. 930 That said, every implementation must: 931 932 - Represent integer values with at least 256 bits. 933 - Represent floating-point values, with a mantissa of at least 256 bits and 934 a signed binary exponent of at least 16 bits. 935 - Give an error if unable to represent an integer value precisely. 936 - Give an error if unable to represent a floating-point value due to overflow. 937 - Round to the nearest representable value if unable to represent 938 a floating-point value due to limits on precision. 939 These requirements apply to the result of any expression except for builtin 940 functions for which an unusual loss of precision must be explicitly documented. 941 942 943 ### Strings 944 945 The _string type_ represents the set of UTF-8 strings, 946 not allowing surrogates. 947 The predeclared string type is `string`; it is a defined type. 948 949 The length of a string `s` (its size in bytes) can be discovered using 950 the built-in function `len`. 951 952 953 ### Bytes 954 955 The _bytes type_ represents the set of byte sequences. 956 A byte sequence value is a (possibly empty) sequence of bytes. 957 The number of bytes is called the length of the byte sequence 958 and is never negative. 959 The predeclared byte sequence type is `bytes`; it is a defined type. 960 961 962 ### Bounds 963 964 A _bound_, syntactically a [unary expression](#Operands), defines 965 an infinite disjunction of concrete values than can be represented 966 as a single comparison. 967 968 For any [comparison operator](#Comparison-operators) `op` except `==`, 969 `op a` is the disjunction of every `x` such that `x op a`. 970 971 ``` 972 2 & >=2 & <=5 // 2, where 2 is either an int or float. 
2.5 & >=1 & <=5         // 2.5
2 & >=1.0 & <3.0        // 2.0
2 & >1 & <3.0           // 2.0
2.5 & int & >1 & <5     // _|_
2.5 & float & >1 & <5   // 2.5
int & 2 & >1.0 & <3.0   // _|_
2.5 & >=(int & 1) & <5  // _|_
>=0 & <=7 & >=3 & <=10  // >=3 & <=7
!=null & 1              // 1
>=5 & <=5               // 5
```


### Structs

A _struct_ is a set of elements called _fields_, each of
which has a name, called a _label_, and a value.

We say a label is defined for a struct if the struct has a field with the
corresponding label.
The value for a label `f` of struct `a` is denoted `a.f`.
A struct `a` is an instance of `b`, or `a ⊑ b`, if for any label `f`
defined for `b`, label `f` is also defined for `a` and `a.f ⊑ b.f`.
Note that if `a` is an instance of `b` it may have fields with labels that
are not defined for `b`.

The (unique) struct with no fields, written `{}`, has every struct as an
instance. It can be considered the type of all structs.

```
{a: 1} ⊑ {}
{a: 1, b: 1} ⊑ {a: 1}
{a: 1} ⊑ {a: int}
{a: 1, b: 1.0} ⊑ {a: int, b: float}

{} ⋢ {a: 1}
{a: 2} ⋢ {a: 1}
{a: 1} ⋢ {b: 1}
```

A field may be required or optional.
The successful unification of structs `a` and `b` is a new struct `c` which
has all fields of both `a` and `b`, where
the value of a field `f` in `c` is `a.f & b.f` if `f` is in both `a` and `b`,
or just `a.f` or `b.f` if `f` is in just `a` or `b`, respectively.
If a field `f` is in both `a` and `b`, `c.f` is optional only if both
`a.f` and `b.f` are optional.
Any [references](#References) to `a` or `b`
in their respective field values need to be replaced with references to `c`.
The result of a unification is bottom (`_|_`) if any of its non-optional
fields evaluates to bottom, recursively.

<!--NOTE: About bottom values for optional fields being okay.

The proposition ¬P is a close cousin of P → ⊥ and is often used
as an approximation to avoid the issues of using not.
Bottom (⊥) is also frequently used to mean undefined. This makes sense.
Consider `{a?: 2} & {a?: 3}`.
Both structs say `a` is optional; in other words, it may be omitted.
So we can still get a valid result by omitting `a`, even in
case of a conflict.

Granted, this definition may lead to confusing results, especially in
definitions, when tightening an optional field leads to unintentionally
discarding it.
It could be a role of vet checkers to identify such cases (and suggest users
to explicitly use `_|_` to discard a field, for instance).
-->

Syntactically, a field is marked as optional by following its label with a `?`.
The question mark is not part of the field name.
A struct literal may contain multiple fields with
the same label. The result is a single field with the same properties
as the field obtained by unifying structs that each declare one of these fields.

These examples illustrate required fields only.
Examples with optional fields follow below.

```
Expression                             Result (without optional fields)
{a: int, a: 1}                         {a: 1}
{a: int} & {a: 1}                      {a: 1}
{a: >=1 & <=7} & {a: >=5 & <=9}        {a: >=5 & <=7}
{a: >=1 & <=7, a: >=5 & <=9}           {a: >=5 & <=7}

{a: 1} & {b: 2}                        {a: 1, b: 2}
{a: 1, b: int} & {b: 2}                {a: 1, b: 2}

{a: 1} & {a: 2}                        _|_
```

A struct may define constraints that apply to fields that are added when unified
with another struct using pattern or default constraints.

A _pattern constraint_, denoted `[pattern]: value`, defines a pattern, which
is a value of type string, and a value to unify with fields whose labels
match that pattern.
When unifying structs `a` and `b`,
a pattern constraint `[p]: v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label unifies with pattern `p`.

<!-- TODO: Update grammar and support this.
A pattern constraints with a pattern preceded by `...` indicates
the pattern can only matches fields in `b` for which there
exists no field in `a` with the same label.
-->

Additionally, a _default constraint_, denoted `...value`, defines a value
to unify with any field for which there is no other declaration in a struct.
When unifying structs `a` and `b`,
a default constraint `...v` declared in `a`
defines that the value `v` should unify with any field in the resulting struct `c`
whose label does not unify with any of the patterns of the pattern
constraints defined for `a` _and_ for which there exists no field in `a`
with that label.
The token `...` is a shorthand for `..._`.


```
a: {
    foo:      string // foo is a string
    [=~"^i"]: int    // all other fields starting with i are integers
    [=~"^b"]: bool   // all other fields starting with b are booleans
    ...string        // all other fields must be a string
}

b: a & {
    i3:    3
    bar:   true
    other: "a string"
}
```

Concrete field labels may be identifiers or strings, and string labels may be
interpolated.
Fields with identifier labels can be referred to within the scope in which
they are defined; fields with string labels cannot.
References within such interpolated strings are resolved within
the scope of the struct in which the label sequence is
defined and can reference concrete labels lexically preceding
the label within a label sequence.
<!-- We allow this so that rewriting a CUE file to collapse or expand
field sequences has no impact on semantics.
-->

<!--TODO: first implementation round will not yet have expression labels

An ExpressionLabel sets a collection of optional fields to a field value.
By default it defines this value for all possible string labels.
An optional expression limits this to the set of optional fields which
labels match the expression.
-->


<!-- NOTE: if we allow ...Expr, as in list, it would mean something different. -->


<!-- NOTE:
A DefinitionDecl does not allow repeated labels. This is to avoid
any ambiguity or confusion about whether earlier path components
are to be interpreted as declarations or normal fields (they should
always be normal fields.)
-->

<!--NOTE:
The syntax has been deliberately restricted to allow for the following
future extensions and relaxations:
 - Allow omitting a "?" in an expression label to indicate a concrete
   string value (but maybe we want to use () for that).
 - Make the "?" in expression label optional if expression labels
   are always optional.
 - Or allow eliding the "?" if the expression has no references and
   is obviously not concrete (such as `[string]`).
 - The expression of an expression label may also indicate a struct with
   integer or even number labels
   (beware of imprecise computation in the latter).
     e.g. `{ [int]: string }` is a map of integers to strings.
 - Allow for associative lists (`foo [@.field]: {field: string}`)
 - The `...` notation can be extended analogously to that of a ListList,
   by allowing it to follow with an expression for the remaining properties.
   In that case it is no longer a shorthand for `[string]: _`, but rather
   would define the value for any other value for which there is no field
   defined.
   Like the definition with List, this is somewhat odd, but it allows the
   encoding of JSON schema's and (non-structural) OpenAPI's
   additionalProperties and additionalItems.
-->

```
StructLit       = "{" { Declaration "," } "}" .
Declaration     = Field | Ellipsis | Embedding | LetClause | attribute .
Ellipsis        = "..." [ Expression ] .
Embedding       = Comprehension | AliasExpr .
Field           = Label ":" { Label ":" } AliasExpr { attribute } .
Label           = [ identifier "=" ] LabelExpr .
LabelExpr       = LabelName [ "?" ] | "[" AliasExpr "]" .
LabelName       = identifier | simple_string_lit  .

attribute       = "@" identifier "(" attr_tokens ")" .
attr_tokens     = { attr_token |
                    "(" attr_tokens ")" |
                    "[" attr_tokens "]" |
                    "{" attr_tokens "}" } .
attr_token      = /* any token except '(', ')', '[', ']', '{', or '}' */
```

```
Expression                             Result (without optional fields)
a: { foo?: string }                    {}
b: { foo: "bar" }                      { foo: "bar" }
c: { foo?: *"bar" | string }           {}

d: a & b                               { foo: "bar" }
e: b & c                               { foo: "bar" }
f: a & c                               {}
g: a & { foo?: number }                {}
h: b & { foo?: number }                _|_
i: c & { foo: string }                 { foo: "bar" }

intMap: [string]: int
intMap: {
    t1: 43
    t2: 2.4  // error: 2.4 is not an integer
}

nameMap: [string]: {
    firstName: string
    nickName:  *firstName | string
}

nameMap: hank: { firstName: "Hank" }
```

The pattern constraint `[string]` declared for `nameMap` matches every field
of `nameMap`, in this case just `hank`, and unifies the associated constraint
with the matched field, resulting in:

```
nameMap: hank: {
    firstName: "Hank"
    nickName:  "Hank"
}
```


#### Closed structs

By default, structs are open to adding fields.
Instances of an open struct `p` may contain fields not defined in `p`.
This makes it easy to add fields, but can lead to bugs:

```
S: {
    field1: string
}

S1: S & { field2: "foo" }

// S1 is { field1: string, field2: "foo" }


A: {
    field1: string
    field2: string
}

A1: A & {
    feild1: "foo"  // "field1" was accidentally misspelled
}

// A1 is
//    { field1: string, field2: string, feild1: "foo" }
// not the intended
//    { field1: "foo", field2: string }
```

A _closed struct_ `c` is a struct whose instances may not declare any field
with a name that does not match the name of a field
or the pattern of a pattern constraint defined in `c`.
Hidden fields are excluded from this limitation.
A struct that is the result of unifying any struct with a [`...`](#Structs)
declaration is defined for all regular fields.
Closing a struct is equivalent to adding `..._|_` to it.

Syntactically, structs are closed explicitly with the `close` builtin or
implicitly and recursively by [definitions](#definitions-and-hidden-fields).


```
A: close({
    field1: string
    field2: string
})

A1: A & {
    feild1: string
} // _|_ feild1 not defined for A

A2: A & {
    for k, v in { feild1: string } {
        "\(k)": v
    }
} // _|_ feild1 not defined for A

C: close({
    [_]: _
})

C2: C & {
    for k, v in { thisIsFine: string } {
        "\(k)": v
    }
}

D: close({
    // Values generated by comprehensions are treated as embeddings.
    for k, v in { x: string } {
        "\(k)": v
    }
})
```

<!-- (jba) Somewhere it should be said that optional fields are only
    interesting inside closed structs.
-->

<!-- TODO: move embedding section to above the previous one -->

#### Embedding

A struct may contain an _embedded value_, an operand used as a declaration.
An embedded value of type struct is unified with the struct in which it is
embedded, but disregarding the restrictions imposed by closed structs.
So if an embedding resolves to a closed struct, the corresponding enclosing
struct will also be closed, but may have fields that would not be allowed if
the normal rules for closed structs were observed.

If an embedded value is not of type struct, the struct may only have
definitions or hidden fields. Regular fields are not allowed in such a case.

The result of `{ A }` is `A` for any `A` (including definitions).

Syntactically, embeddings may be any expression.

```
S1: {
    a: 1
    b: 2
    {
        c: 3
    }
}
// S1 is { a: 1, b: 2, c: 3 }

S2: close({
    a: 1
    b: 2
    {
        c: 3
    }
})
// same as close(S1)

S3: {
    a: 1
    b: 2
    close({
        c: 3
    })
}
// same as S2
```


#### Definitions and hidden fields

A field is a _definition_ if its identifier starts with `#` or `_#`.
A field is _hidden_ if its identifier starts with a `_`.
All other fields are _regular_.

Definitions and hidden fields are not emitted when converting a CUE program
to data and are never required to be concrete.

Referencing a definition will recursively [close](#closed-structs) it.
That is, a referenced definition will not unify with a struct
that would add a field anywhere within the definition that it does not
already define or explicitly allow with a pattern constraint or `...`.
[Embeddings](#embedding) allow bypassing this check.
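
Here is a minimal sketch of the rules above. The names `#Server`, `_scratch`, `ok`, and `bad` are hypothetical. Referencing `#Server` closes it. A hidden field is still accepted, because hidden fields are exempt from closedness. Neither the definition nor any hidden field is emitted when the result is converted to data.

```
#Server: {
    host: string
    port: int
}

_scratch: int // hidden: never emitted, so it may stay non-concrete

ok: #Server & {
    host:   "localhost"
    port:   8080
    _debug: true // allowed: hidden fields are exempt from closedness
}
// Converted to data, ok is { host: "localhost", port: 8080 }.

bad: #Server & {
    hots: "localhost" // _|_: hots is not defined in #Server
}
```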

If referencing a definition would always result in an error, implementations
may report this inconsistency at the point of its declaration.

```
#MyStruct: {
    sub: field:    string
}

#MyStruct: {
    sub: enabled?: bool
}

myValue: #MyStruct & {
    sub: feild:   2     // error, feild not defined in #MyStruct
    sub: enabled: true  // okay
}

#D: {
    #OneOf

    c: int // adds this field.
}

#OneOf: { a: int } | { b: int }


D1: #D & { a: 12, c: 22 }  // { a: 12, c: 22 }
D2: #D & { a: 12, b: 33 }  // _|_ // cannot define both `a` and `b`
```


```
#A: {a: int}

B: {
    #A
    b: c: int
}

x: B
x: d: 3  // not allowed, as closed by embedded #A

y: B.b
y: d: 3  // allowed as nothing closes b

#B: {
    #A
    b: c: int
}

z: #B.b
z: d: 3  // not allowed, as referencing #B closes b
```


<!---
JSON fields are usual camelCase. Clashes can be avoided by adopting the
convention that definitions be TitleCase. Unexported definitions are still
subject to clashes, but those are likely easier to resolve because they are
package internal.
--->


#### Attributes

Attributes allow associating meta information with values.
Their primary purpose is to define mappings between CUE and
other representations.
Attributes do not influence the evaluation of CUE.

An attribute associates an identifier with a value, a balanced token sequence,
which is a sequence of CUE tokens with balanced brackets (`()`, `[]`, and `{}`).
The sequence may not contain interpolations.

Fields, structs and packages can be associated with a set of attributes.
Attributes accumulate during unification, but implementations may remove
duplicates that have the same source string representation.
The interpretation of an attribute, including the handling of multiple
attributes for a given identifier, is up to the consumer of the attribute.

Field attributes define additional information about a field,
such as a mapping to a protocol buffer <!-- TODO: add link --> tag or alternative
name of the field when mapping to a different language.


```
// Package attribute
@protobuf(proto3)

myStruct1: {
    // Struct attribute:
    @jsonschema(id="https://example.org/mystruct1.json")

    // Field attributes
    field: string @go(Field)
    attr:  int    @xml(,attr) @go(Attr)
}

myStruct2: {
    field: string @go(Field)
    attr:  int    @xml(a1,attr) @go(Attr)
}

Combined: myStruct1 & myStruct2
// field: string @go(Field)
// attr:  int    @xml(,attr) @xml(a1,attr) @go(Attr)
```


#### Aliases

Aliases name values that can be referred to
within the [scope](#declarations-and-scopes) in which they are declared.
The name of an alias must be unique within its scope.

```
AliasExpr  = [ identifier "=" ] Expression .
```

Aliases can appear in several positions:

<!--- TODO: consider allowing this. It should be considered whether
having field aliases isn't already sufficient.

As a declaration in a struct (`X=value`):

- binds identifier `X` to a value embedded within the struct.
--->

In front of a label (`X=label: value`):

- binds the identifier to the same value as `label` would be bound
  to if it were a valid identifier.
- for optional fields (`foo?: bar` and `[foo]: bar`),
  the bound identifier is only visible within the field value (`bar`).

Before a value (`foo: X=x`):

- binds the identifier to the value it precedes within the scope of that value.

Inside a bracketed label (`[X=expr]: value`):

- binds the identifier to the concrete label that matches `expr`
  within the instances of the field value (`value`).

Before a list element (`[ X=value, X+1 ]`) (not yet implemented):

- binds the identifier to the list element it precedes within the scope of the
  list expression.

<!-- TODO: explain the difference between aliases and definitions.
     Now that you have definitions, are aliases really necessary?
     Consider removing.
-->

```
// A field alias
foo: X  // 4
X="not an identifier": 4

// A value alias
foo: X={x: X.a}
bar: foo & {a: 1}  // {a: 1, x: 1}

// A label alias
[Y=string]: { name: Y }
foo: { value: 1 } // outputs: foo: { name: "foo", value: 1 }
```

<!-- TODO: also allow aliases as lists -->


#### Let declarations

_Let declarations_ bind an identifier to an expression.
The identifier is visible within the [scope](#declarations-and-scopes)
in which it is declared.
The identifier must be unique within its scope.

```
let x = expr

a: x + 1
b: x + 2
```

#### Shorthand notation for nested structs

A field whose value is a struct with a single field may be written as
a colon-separated sequence of the two field names,
followed by a colon and the value of that single field.

```
job: myTask: replicas: 2
```

expands to

```
job: {
    myTask: {
        replicas: 2
    }
}
```

<!-- OPTIONAL FIELDS:

The optional marker solves the issue of having to print large amounts of
boilerplate when dealing with large types with many optional or default
values (such as Kubernetes).
Writing such optional values in terms of *null | value is tedious,
unpleasant to read, and as it is not well defined what can be dropped or not,
all null values have to be emitted from the output, even if the user
doesn't override them.
Part of the issue is how null is defined. We could adopt a Typescript-like
approach of introducing "void" or "undefined" to mean "not defined and not
part of the output". But having all of null, undefined, and void can be
confusing. If these ever are introduced anyway, the ? operator could be
expressed along the lines of
   foo?: bar
being a shorthand for
   foo: void | bar
where void is the default if no other default is given.

The current mechanical definition of "?" is straightforward, though, and
probably avoids the need for void, while solving a big issue.

Caveats:
[1] this definition requires explicitly defined fields to be emitted, even
if they could be elided (for instance if the explicit value is the default
value defined an optional field). This is probably a good thing.

[2] a default value may still need to be included in an output if it is not
the zero value for that field and it is not known if any outside system is
aware of defaults. For instance, which defaults are specified by the user
and which by the schema understood by the receiving system.
The use of "?" together with defaults should therefore be used carefully
in non-schema definitions.
Problematic cases should be easy to detect by a vet-like check, though.

[3] It should be considered how this affects the trim command.
Should values implied by optional fields be allowed to be removed?
Probably not. This restriction is unlikely to limit the usefulness of trim,
though.

[4] There should be an option to emit all concrete optional values.
-->

### Lists

A list literal defines a new value of type list.
A list may be open or closed.
An open list is indicated with a `...` at the end of an element list,
optionally followed by a value for the remaining elements.

The length of a closed list is the number of elements it contains.
The length of an open list is bounded below by its number of elements
and has no upper bound.

```
ListLit       = "[" [ ElementList [ "," ] ] "]" .
ElementList   = Ellipsis | Embedding { "," Embedding } [ "," Ellipsis ] .
```

Lists can be thought of as structs:

```
List: *null | {
    Elem: _
    Tail: List
}
```

For closed lists, `Tail` is `null` for the last element; for open lists it is
`*null | List`, defaulting to the shortest variant.
For instance, the open list `[ 1, 2, ... ]` can be represented as:

```
open: List & { Elem: 1, Tail: { Elem: 2 } }
```

and the closed version of this list, `[ 1, 2 ]`, as

```
closed: List & { Elem: 1, Tail: { Elem: 2, Tail: null } }
```

Using this representation, the subsumption rule for lists can
be derived from that of structs.
Implementations are not required to implement lists as structs.
The `Elem` and `Tail` fields are not special and `len` will not work as
expected in these cases.


## Declarations and Scopes


### Blocks

A _block_ is a possibly empty sequence of declarations.
The braces of a struct literal `{ ... }` form a block, but there are
others as well:

- The _universe block_ encompasses all CUE source text.
- Each [package](#modules-instances-and-packages) has a _package block_
  containing all CUE source text in that package.
- Each file has a _file block_ containing all CUE source text in that file.
- Each `for` and `let` clause in a [comprehension](#comprehensions)
  is considered to be its own implicit block.

Blocks nest and influence scoping.


### Declarations and scope

A _declaration_ may bind an identifier to a field, alias, or package.
Every identifier in a program must be declared.
Other than for fields,
no identifier may be declared twice within the same block.
For fields an identifier may be declared more than once within the same block,
resulting in a field with a value that is the result of unifying the values
of all fields with the same identifier.
String labels do not bind an identifier to the respective field.

The _scope_ of a declared identifier is the extent of source text in which the
identifier denotes the specified field, alias, or package.

CUE is lexically scoped using blocks:

1. The scope of a [predeclared identifier](#predeclared-identifiers) is the universe block.
1. The scope of an identifier denoting a field
   declared at top level (outside any struct literal) is the package block.
1. The scope of an identifier denoting an alias
   declared at top level (outside any struct literal) is the file block.
1. The scope of the package name of an imported package is the file block of the
   file containing the import declaration.
1. The scope of a field, alias or let identifier declared inside a struct
   literal is the innermost containing block.

An identifier declared in a block may be redeclared in an inner block.
While the identifier of the inner declaration is in scope, it denotes the entity
declared by the inner declaration.

The package clause is not a declaration;
the package name does not appear in any scope.
Its purpose is to identify the files belonging to the same package
and to specify the default name for import declarations.
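
The following sketch walks through these rules; the field names are illustrative. An inner declaration of `x` shadows the top-level one. A reference with no inner declaration resolves in the enclosing block. A `let` identifier is only visible inside its struct. Repeated field declarations in one block are unified.

```
x: 1

a: {
    x: 2
    y: x // 2: the inner x shadows the top-level x
}

b: {
    y: x // 1: b declares no x, so the top-level field is used
}

c: {
    let z = x + 1 // z is only in scope within c
    w: z          // 2
}

d: {
    v: int
    v: 3 // repeated field: d.v is int & 3, that is, 3
}
```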


### Predeclared identifiers

CUE predefines a set of types and builtin functions.
For each of these there is a corresponding keyword which is the name
of the predefined identifier, prefixed with `__`.

```
Functions
len       close     and       or

Types
null      The null type and value
bool      All boolean values
int       All integral numbers
float     All decimal floating-point numbers
string    Any valid UTF-8 sequence
bytes     Any valid byte sequence

Derived   Value
number    int | float
uint      >=0
uint8     >=0 & <=255
int8      >=-128 & <=127
uint16    >=0 & <=65535
int16     >=-32_768 & <=32_767
rune      >=0 & <=0x10FFFF
uint32    >=0 & <=4_294_967_295
int32     >=-2_147_483_648 & <=2_147_483_647
uint64    >=0 & <=18_446_744_073_709_551_615
int64     >=-9_223_372_036_854_775_808 & <=9_223_372_036_854_775_807
uint128   >=0 & <=340_282_366_920_938_463_463_374_607_431_768_211_455
int128    >=-170_141_183_460_469_231_731_687_303_715_884_105_728 &
          <=170_141_183_460_469_231_731_687_303_715_884_105_727
float32   >=-3.40282346638528859811704183484516925440e+38 &
          <=3.40282346638528859811704183484516925440e+38
float64   >=-1.797693134862315708145274237317043567981e+308 &
          <=1.797693134862315708145274237317043567981e+308
```


### Exported identifiers

<!-- move to a more logical spot -->

An identifier of a package may be exported to permit access to it
from another package.
All identifiers not starting with `_` (so all regular fields and definitions
starting with `#`) are exported.
Any identifier starting with `_` is not visible outside the package and resides
in a separate namespace from namesake identifiers of other packages.

```
package mypackage

foo:   string // visible outside mypackage
"bar": string // visible outside mypackage

#Foo: {          // visible outside mypackage
    a:  1        // visible outside mypackage
    _b: 2        // not visible outside mypackage

    #C: {        // visible outside mypackage
        d: 4     // visible outside mypackage
    }
    _#E: foo     // not visible outside mypackage
}
```


### Uniqueness of identifiers

Given a set of identifiers, an identifier is called unique if it is different
from every other in the set, after applying normalization following
Unicode Annex #31.
Two identifiers are different if they are spelled differently
or if they appear in different packages and are not exported.
Otherwise, they are the same.


### Field declarations

A field associates the value of an expression to a label within a struct.
If this label is an identifier, it binds the field to that identifier,
so the field's value can be referenced by writing the identifier.
String labels are not bound to fields.

```
a: {
    b:   2
    "s": 3

    c: b   // 2
    d: s   // _|_ unresolved identifier "s"
    e: a.s // 3
}
```

If an expression may result in a value associated with a default value
as described in [default values](#default-values), the field binds to this
value-default pair.


<!-- TODO: disallow creating identifiers starting with __
...and reserve them for builtin values.

The issue is with code generation. As no guarantee can be given that
a predeclared identifier is not overridden in one of the enclosing scopes,
code will have to handle detecting such cases and renaming them.
An alternative is to have the predeclared identifiers be aliases for namesake
equivalents starting with a double underscore (e.g.
string -> __string),
allowing generated code (normal code would keep using `string`) to refer
to these directly.
-->


### Let declarations

Within a struct, a let clause binds an identifier to the given expression.

Within the scope of the identifier, the identifier refers to the
_locally declared_ expression.
The expression is evaluated in the scope in which it was declared.


## Expressions

An expression specifies the computation of a value by applying operators and
built-in functions to operands.

Expressions that require concrete values are called _incomplete_ if any of
their operands are not concrete, but define a value that would be legal for
that expression.
Incomplete expressions may be left unevaluated until a concrete value is
requested at the application level.

### Operands

Operands denote the elementary values in an expression.
An operand may be a literal, a (possibly qualified) identifier denoting a
field, alias, or let declaration, or a parenthesized expression.

```
Operand     = Literal | OperandName | "(" Expression ")" .
Literal     = BasicLit | ListLit | StructLit .
BasicLit    = int_lit | float_lit | string_lit |
              null_lit | bool_lit | bottom_lit .
OperandName = identifier | QualifiedIdent .
```

### Qualified identifiers

A qualified identifier is an identifier qualified with a package name prefix.

```
QualifiedIdent = PackageName "." identifier .
```

A qualified identifier accesses an identifier in a different package,
which must be [imported].
The identifier must be declared in the [package block] of that package.

```
math.Sin    // denotes the Sin function in package math
```

### References

An identifier operand refers to a field and is called a reference.
The value of a reference is a copy of the expression associated with the field
that it is bound to,
with any references within that expression bound to the respective copies of
the fields they were originally bound to.
Implementations may use a different mechanism to evaluate as long as
these semantics are maintained.

```
a: {
    place:    string
    greeting: "Hello, \(place)!"
}

b: a & { place: "world" }
c: a & { place: "you" }

d: b.greeting  // "Hello, world!"
e: c.greeting  // "Hello, you!"
```



### Primary expressions

Primary expressions are the operands for unary and binary expressions.

```
PrimaryExpr =
    Operand |
    PrimaryExpr Selector |
    PrimaryExpr Index |
    PrimaryExpr Slice |
    PrimaryExpr Arguments .

Selector       = "." (identifier | simple_string_lit) .
Index          = "[" Expression "]" .
Argument       = Expression .
Arguments      = "(" [ ( Argument { "," Argument } ) [ "," ] ] ")" .
```
<!---
TODO:
    PrimaryExpr Query |
Query          = "." Filters .
Filters        = Filter { Filter } .
Filter         = "[" [ "?" ] AliasExpr "]" .

TODO: maybe reintroduce slices, as they are useful in queries, probably this
time with Python semantics.
Slice          = "[" [ Expression ] ":" [ Expression ] [ ":" [Expression] ] "]" .

Argument       = Expression | ( identifer ":" Expression ).

// & expression type
// string_lit: same as label. Arguments is current node.
// If selector is applied to list, it performs the operation for each
// element.

TODO: considering allowing decimal_lit for selectors.
--->

```
x
2
(s + ".txt")
f(3.1415, true)
m["foo"]
obj.color
f.p[i].x
```


### Selectors

For a [primary expression](#primary-expressions) `x` that is not a [package name](#package-clause),
the selector expression

```
x.f
```

denotes the element of a <!--list or -->struct `x` identified by `f`.
<!--For structs, -->
`f` must be an identifier or a string literal identifying
any definition or regular non-optional field.
The identifier `f` is called the field selector.

<!--
Allowing strings to be used as field selectors obviates the need for
backquoted identifiers. Note that some standards use names for structs that
are not standard identifiers (such "Fn::Foo"). Note that indexing does not
allow access to identifiers.
-->

<!--
For lists, `f` must be an integer and follows the same lookup rules as
for the index operation.
The type of the selector expression is the type of `f`.
-->

If `x` is a package name, see the section on [qualified identifiers](#qualified-identifiers).

<!--
TODO: consider allowing this and also for selectors. It needs to be considered
how defaults are carried forward in cases like:

    x: { a: string | *"foo" } | *{ a: int | *4 }
    y: x.a & string

What is y in this case?
   (x.a & string, _|_)
   (string|"foo", _|_)
   (string|"foo", "foo)
If the latter, then why?

For a disjunction of the form `x1 | ... | xn`,
the selector is applied to each element `x1.f | ... | xn.f`.
-->

Otherwise, if `x` is not a <!--list or -->struct,
or if `f` does not exist in `x`,
the result of the expression is bottom (an error).
In the latter case the expression is incomplete.
The operand of a selector may be associated with a default.

```
T: {
    x:     int
    y:     3
    "x-y": 4
}

a: T.x     // int
b: T.y     // 3
c: T.z     // _|_ // field 'z' not found in T
d: T."x-y" // 4

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 (default value)
```

<!--
```
(v, d).f  =>  (v.f, d.f)

e: {a: 1|*2} | *{a: 3|*4}
f: e.a  // 4 after selecting default from (({a: 1|*2} | {a: 3|*4}).a, 4)

```
-->


### Index expressions

A primary expression of the form

```
a[x]
```

denotes the element of a list or struct `a` indexed by `x`.
The value `x` is called the index or field name, respectively.
The following rules apply:

If `a` is not a struct:

- `a` is a list (which need not be complete)
- the index `x` unified with `int` must be concrete.
- the index `x` is in range if `0 <= x < len(a)`, where only the
  explicitly defined values of an open-ended list are considered,
  otherwise it is out of range

The result of `a[x]` is

for `a` of list type:

- the list element at index `x`, if `x` is within range
- bottom (an error), otherwise


for `a` of struct type:

- the index `x` unified with `string` must be concrete.
- the value of the regular and non-optional field named `x` of struct `a`,
  if this field exists
- bottom (an error), otherwise


```
[ 1, 2 ][1]     // 2
[ 1, 2 ][2]     // _|_
[ 1, 2, ...][2] // _|_
```

Both the operand and index value may be a value-default pair.
```
va[vi]              =>  va[vi]
va[(vi, di)]        =>  (va[vi], va[di])
(va, da)[vi]        =>  (va[vi], da[vi])
(va, da)[(vi, di)]  =>  (va[vi], da[di])
```

```
Fields                  Result
x: [1, 2] | *[3, 4]     ([1,2]|[3,4], [3,4])
i: int | *1             (int, 1)

v: x[i]                 (x[i], 4)
```

### Operators

Operators combine operands into expressions.

```
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr  = PrimaryExpr | unary_op UnaryExpr .

binary_op  = "|" | "&" | "||" | "&&" | "==" | rel_op | add_op | mul_op .
rel_op     = "!=" | "<" | "<=" | ">" | ">=" | "=~" | "!~" .
add_op     = "+" | "-" .
mul_op     = "*" | "/" .
unary_op   = "+" | "-" | "!" | "*" | rel_op .
```

Comparisons are discussed [elsewhere](#Comparison-operators).
For any binary operator, the operand types must unify.

<!-- TODO: durations
 unless the operation involves durations.

Except for duration operations, if one operand is an untyped [literal] and the
other operand is not, the constant is [converted] to the type of the other
operand.
-->

Operands of unary and binary expressions may be associated with a default.
The operation is then applied to the values and to the defaults separately,
as in the following examples:

<!--
```
O1: op (v1, d1)          => (op v1, op d1)

O2: (v1, d1) op (v2, d2) => (v1 op v2, d1 op d2)
and because v => (v, v)
O3: v1 op (v2, d2)       => (v1 op v2, v1 op d2)
O4: (v1, d1) op v2       => (v1 op v2, d1 op v2)
```
-->

```
Field               Resulting Value-Default pair
a: *1|2             (1|2, 1)
b: -a               (-a, -1)

c: a + 2            (a+2, 3)
d: a + a            (a+a, 2)
```

#### Operator precedence

Unary operators have the highest precedence.

There are seven precedence levels for binary operators.
Multiplication operators bind strongest, followed by
addition operators, comparison operators,
`&&` (logical AND), `||` (logical OR), `&` (unification),
and finally `|` (disjunction):

```
Precedence    Operator
    7             *  /
    6             +  -
    5             ==  !=  <  <=  >  >=  =~  !~
    4             &&
    3             ||
    2             &
    1             |
```

Binary operators of the same precedence associate from left to right.
For instance, `x / y * z` is the same as `(x / y) * z`.

```
+x
23 + 3*x[i]
x <= f()
f() || g()
x == y+1 && y == z-1
2 | int
{ a: 1 } & { b: 2 }
```

#### Arithmetic operators

Arithmetic operators apply to numeric values and yield a result of the same type
as the first operand. The four standard arithmetic operators
`(+, -, *, /)` apply to integer and decimal floating-point types;
`+` and `*` also apply to strings and bytes.

```
+    sum                    integers, floats, strings, bytes
-    difference             integers, floats
*    product                integers, floats, strings, bytes
/    quotient               integers, floats
```

For any operator that accepts operands of type `float`, any operand may be
of type `int` or `float`, in which case the result will be `float`
if it cannot be represented as an `int` or if any of the operands are `float`,
or `int` otherwise.
So the result of `1 / 2` is `0.5` and is of type `float`.

The result of division by zero is bottom (an error).
<!-- TODO: consider making it +/- Inf -->
Integer division is implemented through the builtin functions
`quo`, `rem`, `div`, and `mod`.

The unary operators `+` and `-` are defined for numeric values as follows:

```
+x                          is 0 + x
-x    negation              is 0 - x
```

#### String operators

Strings can be concatenated using the `+` operator:
```
s: "hi " + name + " and good bye"
```
String addition creates a new string by concatenating the operands.

A string can be repeated by multiplying it:

```
s: "etc. "*3  // "etc. etc. etc. "
```

<!-- jba: Do these work for byte sequences? If not, why not? -->


#### Comparison operators

Comparison operators compare two operands and yield an untyped boolean value.

```
==    equal
!=    not equal
<     less
<=    less or equal
>     greater
>=    greater or equal
=~    matches regular expression
!~    does not match regular expression
```

<!-- regular expression operator inspired by Bash, Perl, and Ruby. -->

In any comparison, the types of the two operands must unify or one of the
operands must be null.

The equality operators `==` and `!=` apply to operands that are comparable.
The ordering operators `<`, `<=`, `>`, and `>=` apply to operands that are ordered.
The matching operators `=~` and `!~` apply to a string and regular
expression operand.
These terms and the result of the comparisons are defined as follows:

- Null is comparable with itself and any other type.
  Two null values are always equal; null is unequal to anything else.
- Boolean values are comparable.
  Two boolean values are equal if they are either both true or both false.
- Integer values are comparable and ordered, in the usual way.
- Floating-point values are comparable and ordered, as per the definitions
  for binary coded decimals in the IEEE-754-2008 standard.
- Floating point numbers may be compared with integers.
- String and bytes values are comparable and ordered lexically byte-wise.
- Structs are not comparable.
- Lists are not comparable.
- The regular expression syntax is the one accepted by RE2,
  described in https://github.com/google/re2/wiki/Syntax,
  except for `\C`.
- `s =~ r` is true if `s` matches the regular expression `r`.
- `s !~ r` is true if `s` does not match the regular expression `r`.

<!--- TODO: consider the following
- For regular expression, named capture groups are interpreted as CUE references
  that must unify with the strings matching this capture group.
--->
<!-- TODO: Implementations should adopt an algorithm that runs in linear time? -->
<!-- Consider implementing Level 2 of Unicode regular expression. -->

```
3 < 4       // true
3 < 4.0     // true
null == 2   // false
null != {}  // true
{} == {}    // _|_: structs are not comparable against structs

"Wild cats" =~ "cat"   // true
"Wild cats" !~ "dog"   // true

"foo" =~ "^[a-z]{3}$"  // true
"foo" =~ "^[a-z]{4}$"  // false
```

<!-- jba
I think I know what `3 < a` should mean if

    a: >=1 & <=5

It should be a constraint on `a` that can be evaluated once `a`'s value is known more precisely.

But what does `3 < (>=1 & <=5)` mean? We'll never get more information, so it must have a definite value.
-->

#### Logical operators

Logical operators apply to boolean values and yield a result of the same type
as the operands. The right operand is evaluated conditionally.

```
&&    conditional AND    p && q  is  "if p then q else false"
||    conditional OR     p || q  is  "if p then true else q"
!     NOT                !p      is  "not p"
```


<!--
### TODO TODO TODO

3.14 / 0.0   // illegal: division by zero
Illegal conversions always apply to CUE.

Implementation restriction: A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
-->

<!--- TODO(mpvl): conversions
### Conversions
Conversions are expressions of the form `T(x)` where `T` and `x` are
expressions.
The result is always an instance of `T`.

```
Conversion = Expression "(" Expression [ "," ] ")" .
```
--->
<!---

A literal value `x` can be converted to type T if `x` is representable by a
value of `T`.

As a special case, an integer literal `x` can be converted to a string type
using the same rule as for non-constant x.

Converting a literal yields a typed value as result.

```
uint(iota)               // iota value of type uint
float32(2.718281828)     // 2.718281828 of type float32
complex128(1)            // 1.0 + 0.0i of type complex128
float32(0.49999999)      // 0.5 of type float32
float64(-1e-1000)        // 0.0 of type float64
string('x')              // "x" of type string
string(0x266c)           // "♬" of type string
MyString("foo" + "bar")  // "foobar" of type MyString
string([]byte{'a'})      // not a constant: []byte{'a'} is not a constant
(*int)(nil)              // not a constant: nil is not a constant, *int is not a boolean, numeric, or string type
int(1.2)                 // illegal: 1.2 cannot be represented as an int
string(65.0)             // illegal: 65.0 is not an integer constant
```
--->
<!---

A conversion is always allowed if `x` is an instance of `T`.

If `T` and `x` of different underlying type, a conversion is allowed if
`x` can be converted to a value `x'` of `T`'s type, and
`x'` is an instance of `T`.
A value `x` can be converted to the type of `T` in any of these cases:

- `x` is a struct and is subsumed by `T`.
- `x` and `T` are both integer or floating points.
- `x` is an integer or a byte sequence and `T` is a string.
- `x` is a string and `T` is a byte sequence.

Specific rules apply to conversions between numeric types, structs,
or to and from a string type. These conversions may change the representation
of `x`.
All other conversions only change the type but not the representation of x.


#### Conversions between numeric ranges
For the conversion of numeric values, the following rules apply:

1. Any integer value can be converted into any other integer value
   provided that it is within range.
2. When converting a decimal floating-point number to an integer, the fraction
   is discarded (truncation towards zero). TODO: or disallow truncating?

```
a: uint16(int(1000))  // uint16(1000)
b: uint8(1000)        // _|_ // overflow
c: int(2.5)           // 2  TODO: TBD
```


#### Conversions to and from a string type

Converting a list of bytes to a string type yields a string whose successive
bytes are the elements of the slice.
Invalid UTF-8 is converted to `"\uFFFD"`.

```
string('hell\xc3\xb8')   // "hellø"
string(bytes([0x20]))    // " "
```

As string value is always convertible to a list of bytes.

```
bytes("hellø")   // 'hell\xc3\xb8'
bytes("")        // ''
```

#### Conversions between list types

Conversions between list types are possible only if `T` strictly subsumes `x`
and the result will be the unification of `T` and `x`.

If we introduce named types this would be different from IP & [10, ...]

Consider removing this until it has a different meaning.

```
IP:        4*[byte]
Private10: IP([10, ...])  // [10, byte, byte, byte]
```

#### Conversions between struct types

A conversion from `x` to `T`
is applied using the following rules:

1. `x` must be an instance of `T`,
2. all fields defined for `x` that are not defined for `T` are removed from
  the result of the conversion, recursively.

<!-- jba: I don't think you say anywhere that the matching fields are unified.
mpvl: they are not, x must be an instance of T, in which case x == T&x,
so unification would be unnecessary.
-->
<!--
```
T: {
    a: { b: 1..10 }
}

x1: {
    a: { b: 8, c: 10 }
    d: 9
}

c1: T(x1)             // { a: { b: 8 } }
c2: T({})             // _|_  // missing field 'a' in '{}'
c3: T({ a: {b: 0} })  // _|_  // field a.b does not unify (0 & 1..10)
```
-->

### Calls

Calls can be made to core library functions, called builtins.
Given an expression `f` of function type `F`,
```
f(a1, a2, … an)
```
calls `f` with arguments a1, a2, … an. Arguments must be expressions
whose values are instances of the parameter types of `F`;
they are evaluated before the function is called.

```
a: math.Atan2(x, y)
```

In a function call, the function value and arguments are evaluated in the usual
order.
After they are evaluated, the parameters of the call are passed by value
to the function and the called function begins execution.
The return parameters
of the function are passed by value back to the calling function when the
function returns.


### Comprehensions

Lists and fields can be constructed using comprehensions.

Comprehensions define a clause sequence that consists of a sequence of
`for`, `if`, and `let` clauses, nesting from left to right.
The sequence must start with a `for` or `if` clause.
The `for` and `let` clauses each define a new scope in which new values are
bound to be available for the next clause.

The `for` clause binds the defined identifiers, on each iteration, to the next
value of some iterable value in a new scope.
A `for` clause may bind one or two identifiers.
If there is one identifier, it is bound to the value of
a list element or struct field.
If there are two identifiers, the first value will be the key or index,
if available, and the second will be the value.

For lists, `for` iterates over all elements in the list after closing it.
For structs, `for` iterates over all non-optional regular fields.

An `if` clause, or guard, specifies an expression that terminates the current
iteration if it evaluates to false.

The `let` clause binds the result of an expression to the defined identifier
in a new scope.

The current iteration is said to complete if the innermost block of the clause
sequence is reached.
Syntactically, the comprehension value is a struct.
A comprehension can generate non-struct values by embedding such values within
this struct.

Within lists, the values yielded by a comprehension are inserted in the list
at the position of the comprehension.
Within structs, the values yielded by a comprehension are embedded within the
struct.
Both structs and lists may contain multiple comprehensions.

```
Comprehension = Clauses StructLit .

Clauses       = StartClause { [ "," ] Clause } .
StartClause   = ForClause | GuardClause .
Clause        = StartClause | LetClause .
ForClause     = "for" identifier [ "," identifier ] "in" Expression .
GuardClause   = "if" Expression .
LetClause     = "let" identifier "=" Expression .
```

```
a: [1, 2, 3, 4]
b: [ for x in a if x > 1 { x+1 } ]  // [3, 4, 5]

c: {
    for x in a
    if x < 4
    let y = 1 {
        "\(x)": x + y
    }
}
d: { "1": 2, "2": 3, "3": 4 }  // the value of c
```


### String interpolation

String interpolation allows constructing strings by replacing placeholder
expressions with their string representation.
String interpolation may be used in single- and double-quoted strings, as well
as their multiline equivalents.

A placeholder consists of `\(` followed by an expression and a `)`.
The expression is evaluated in the scope within which the string is defined.

The result of the expression is substituted as follows:
- string: as is
- bool: the JSON representation of the bool
- number: a JSON representation of the number that preserves the
  precision of the underlying binary coded decimal
- bytes: as if substituted within single quotes or
  converted to valid UTF-8 replacing the
  maximal subpart of ill-formed subsequences with a single
  replacement character (W3C encoding standard) otherwise
- list: illegal
- struct: illegal


```
a: "World"
b: "Hello \( a )!" // Hello World!
```


## Builtin Functions

Built-in functions are predeclared. They are called like any other function.


### `len`

The built-in function `len` takes arguments of various types and returns
a result of type int.

```
Argument type    Result

string           string length in bytes
bytes            length of byte sequence
list             list length, smallest length for an open list
struct           number of distinct data fields, excluding optional
```
<!-- TODO: consider not supporting len, but instead rely on more
precisely named builtin functions:
  - strings.RuneLen(x)
  - bytes.Len(x)  // x may be a string
  - struct.NumFooFields(x)
  - list.Len(x)
-->

```
Expression           Result
len("Hellø")         6
len([1, 2, 3])       3
len([1, 2, ...])     >=2
```


### `close`

The builtin function `close` converts a partially defined, or open, struct
to a fully defined, or closed, struct.


### `and`

The built-in function `and` takes a list and returns the result of applying
the `&` operator to all elements in the list.
It returns top for the empty list.

```
Expression           Result
and([a, b])          a & b
and([a])             a
and([])              _
```

### `or`

The built-in function `or` takes a list and returns the result of applying
the `|` operator to all elements in the list.
It returns bottom for the empty list.

```
Expression           Result
or([a, b])           a | b
or([a])              a
or([])               _|_
```

### `div`, `mod`, `quo` and `rem`

For two integer values `x` and `y`,
the integer quotient `q = div(x, y)` and remainder `r = mod(x, y)`
implement Euclidean division and
satisfy the following relationship:

```
r = x - y*q  with  0 <= r < |y|
```
where `|y|` denotes the absolute value of `y`.

```
 x     y    div(x, y)  mod(x, y)
 5     3        1          2
-5     3       -2          1
 5    -3       -1          2
-5    -3        2          1
```

For two integer values `x` and `y`,
the integer quotient `q = quo(x, y)` and remainder `r = rem(x, y)`
implement truncated division and
satisfy the following relationship:

```
x = q*y + r  and  |r| < |y|
```

with `quo(x, y)` truncated towards zero.

```
 x     y    quo(x, y)  rem(x, y)
 5     3        1          2
-5     3       -1         -2
 5    -3       -1          2
-5    -3        1         -2
```

A zero divisor in either case results in bottom (an error).


## Cycles

Implementations are required to interpret or reject cycles encountered
during evaluation according to the rules in this section.


### Reference cycles

A _reference cycle_ occurs if a field references itself, either directly or
indirectly.

```
// x references itself
x: x

// indirect cycles
b: c
c: d
d: b
```

Implementations should treat these as `_`.
Two particular cases are discussed below.


#### Expressions that unify an atom with an expression

An expression of the form `a & e`, where `a` is an atom
and `e` is an expression, always evaluates to `a` or bottom.
As it does not matter how we fail, we can assume the result to be `a`
and postpone validating `a == e` until after all references
in `e` have been resolved.

```
// Config            Evaluates to (requiring concrete values)
x: {                  x: {
    a: b + 100            a: _|_ // cycle detected
    b: a - 100            b: _|_ // cycle detected
}                     }

y: x & {              y: {
    a: 200                a: 200 // asserted that 200 == b + 100
                          b: 100
}                     }
```


#### Field values

A field value of the form `r & v`,
where `r` evaluates to a reference cycle and `v` is a concrete value,
evaluates to `v`.
Unification is idempotent and unifying a value with itself ad infinitum,
which is what the cycle represents, results in this value.
Implementations should detect cycles of this kind, ignore `r`,
and take `v` as the result of unification.

<!-- Tomabechi's graph unification algorithm
can detect such cycles at near-zero cost. -->

```
Configuration       Evaluated
//    c             Cycles in nodes of type struct evaluate
//  ↙︎   ↖           to the fixed point of unifying their
// a  →  b          values ad infinitum.

a: b & { x: 1 }     // a: { x: 1, y: 2, z: 3 }
b: c & { y: 2 }     // b: { x: 1, y: 2, z: 3 }
c: a & { z: 3 }     // c: { x: 1, y: 2, z: 3 }

// resolve a             b & {x:1}
// substitute b          c & {y:2} & {x:1}
// substitute c          a & {z:3} & {y:2} & {x:1}
// eliminate a (cycle)   {z:3} & {y:2} & {x:1}
// simplify              {x:1,y:2,z:3}
```

This rule also applies to field values that are disjunctions of unification
operations of the above form.

```
a: b&{x:1} | {y:1}  // {x:1,y:3,z:2} | {y:1}
b: {x:2} | c&{z:2}  // {x:2} | {x:1,y:3,z:2}
c: a&{y:3} | {z:3}  // {x:1,y:3,z:2} | {z:3}


// resolving a           b&{x:1} | {y:1}
// substitute b          ({x:2} | c&{z:2})&{x:1} | {y:1}
// simplify              c&{z:2}&{x:1} | {y:1}
// substitute c          (a&{y:3} | {z:3})&{z:2}&{x:1} | {y:1}
// simplify              a&{y:3}&{z:2}&{x:1} | {y:1}
// eliminate a (cycle)   {y:3}&{z:2}&{x:1} | {y:1}
// expand                {x:1,y:3,z:2} | {y:1}
```

Note that all nodes that together form a reference cycle resulting in a struct
evaluate to the same value.
If a field value is a disjunction, any element that is part of a cycle will
evaluate to this value.


### Structural cycles

A structural cycle occurs when a node references one of its ancestor nodes.
It is possible to construct a structural cycle by unifying two acyclic values:
```
// acyclic
y: {
    f: h: g
    g: _
}
// acyclic
x: {
    f: _
    g: f
}
// introduces structural cycle
z: x & y
```
Implementations should be able to detect such structural cycles dynamically.

A structural cycle can result in infinite structure or evaluation loops.
```
// infinite structure
a: b: a

// infinite evaluation
f: {
    n:   int
    out: n + (f & {n: 1}).out
}
```
CUE must allow or disallow structural cycles under certain circumstances.

If a node `a` references an ancestor node, we call it and any of its
field values `a.f` _cyclic_.
So if `a` is cyclic, all of its descendants are also regarded as cyclic.
A given node `x`, whose value is composed of the conjuncts `c1 & ... & cn`,
is valid if any of its conjuncts is not cyclic.

```
// Disallowed: a list of infinite length with all elements being 1.
#List: {
    head: 1
    tail: #List
}

// Disallowed: another infinite structure (a:{b:{d:{b:{d:{...}}}}}, ...).
a: {
    b: c
}
c: {
    d: a
}

// #List defines a list of arbitrary length. Because the recursive reference
// is part of a disjunction, this does not result in a structural cycle.
#List: {
    head: _
    tail: null | #List
}

// Usage of #List. The value of tail in the most deeply nested element will
// be `null`: as the value of the disjunct referring to list is the only
// conjunct, all conjuncts are cyclic and the value is invalid and so
// eliminated from the disjunction.
MyList: #List & { head: 1, tail: { head: 2 }}
```

<!--
### Unused fields

TODO: rules for detection of unused fields

1. Any alias value must be used
-->


## Modules, instances, and packages

CUE configurations are constructed by combining _instances_.
An instance, in turn, is constructed from one or more source files belonging
to the same _package_ that together declare the data representation.
Elements of this data representation may be exported and used
in other instances.

### Source file organization

Each source file consists of an optional package clause defining the collection
of files to which it belongs,
followed by a possibly empty set of import declarations that declare
packages whose contents it wishes to use, followed by a possibly empty set of
declarations.

Like with a struct, a source file may contain embeddings.
Unlike with a struct, the embedded expressions may be any value.
If the result of the unification of all embedded values is not a struct,
it will be output instead of its enclosing file when exporting CUE
to a data format.

```
SourceFile = { attribute "," } [ PackageClause "," ] { ImportDecl "," } { Declaration "," } .
```

```
"Hello \(place)!"

place: "world"

// Outputs "Hello world!"
```

### Package clause

A package clause is an optional clause that defines the package to which
the source file belongs.

```
PackageClause = "package" PackageName .
PackageName   = identifier .
```

The PackageName must not be the blank identifier or a definition identifier.

```
package math
```

### Modules and instances

A _module_ defines a tree of directories, rooted at the _module root_.

All source files within a module with the same package belong to the same
package.
<!-- jba: I can't make sense of the above sentence. -->
A module may define multiple packages.

An _instance_ of a package is any subset of files belonging
to the same package.
<!-- jba: Are you saying that -->
<!-- if I have a package with files a, b and c, then there are 8 instances of -->
<!-- that package, some of which are {a, b}, {c}, {b, c}, and so on? What's the -->
<!-- purpose of that definition? -->
It is interpreted as the concatenation of these files.

An implementation may impose conventions on the layout of package files
to determine which files of a package belong to an instance.
For example, an instance may be defined as the subset of package files
belonging to a directory and all its ancestors.
<!-- jba: OK, that helps a little, but I still don't see what the purpose is. -->


### Import declarations

An import declaration states that the source file containing the declaration
depends on definitions of the _imported_ package (§Program initialization and
execution) and enables access to exported identifiers of that package.
The import names an identifier (PackageName) to be used for access and an
ImportPath that specifies the package to be imported.

```
ImportDecl     = "import" ( ImportSpec | "(" { ImportSpec "," } ")" ) .
ImportSpec     = [ PackageName ] ImportPath .
ImportLocation = { unicode_value } .
ImportPath     = `"` ImportLocation [ ":" identifier ] `"` .
```

The PackageName is used in qualified identifiers to access
exported identifiers of the package within the importing source file.
It is declared in the file block.
It defaults to the identifier specified in the package clause of the imported
package, which must match either the last path component of ImportLocation
or the identifier following it.

<!--
Note: this deviates from the Go spec where there is no such restriction.
This restriction has the benefit of being to determine the identifiers
for packages from within the file itself. But for CUE it is has another benefit:
when using package hierarchies, one is more likely to want to include multiple
packages within the same directory structure. This mechanism allows
disambiguation in these cases.
-->

The interpretation of the ImportPath is implementation-dependent but it is
typically either the path of a builtin package or a fully qualifying location
of a package within a source code repository.

An ImportLocation must be a non-empty string using only characters belonging to
Unicode's L, M, N, P, and S general categories
(the Graphic characters without spaces)
and may not include the characters `` !"#$%&'()*,:;<=>?[\]^`{|} ``
or the Unicode replacement character U+FFFD.

Assume we have a package containing the package clause "package math",
which exports the function Sin at the path identified by "lib/math".
This table illustrates how Sin is accessed in files
that import the package after the various types of import declaration.

```
Import declaration          Local name of Sin

import   "lib/math"         math.Sin
import   "lib/math:math"    math.Sin
import m "lib/math"         m.Sin
```

An import declaration declares a dependency relation between the importing and
imported package. It is illegal for a package to import itself, directly or
indirectly, or to directly import a package without referring to any of its
exported identifiers.


### An example package

TODO