cuelang.org/go@v0.10.1/internal/core/adt/cycle.go

// Copyright 2022 CUE Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package adt

// Cycle detection:
//
//   - Current algorithm does not allow for early non-cyclic conjunct detection.
//   - Record possibly cyclic references.
//   - Mark as cyclic if no evidence of a non-cyclic conjunct is found.
//   - Note that this also activates the same reference in other (parent) conjuncts.

// TODO:
//   - get rid of nodeContext.{hasCycle|hasNonCycle}.
//   - compiler support for detecting cross-pattern references.
//   - handle propagation of cyclic references to root across disjunctions.

// CYCLE DETECTION ALGORITHM
//
// BACKGROUND
//
// The cycle detection is inspired by the cycle detection used by Tomabechi's
// [Tomabechi COLING 1992] and Van Lohuizen's [Van Lohuizen ACL 2000] graph
// unification algorithms.
//
// Unlike with traditional graph unification, however, CUE uses references,
// which, unlike node equivalence, are unidirectional. This means that the
// technique to track equivalence through dereference, as common in graph
// unification algorithms like Tomabechi's, does not work unaltered.
//
// The unidirectional nature of references implies that each reference equates a
// facsimile of the value it points to. This renders the original approach of
// node-pointer equivalence useless.
//
//
// PRINCIPLE OF ALGORITHM
//
// The solution for CUE is based on two observations:
//
//   - the CUE algorithm tracks all conjuncts that define a node separately,
//   - accumulating used references on a per-conjunct basis causes duplicate
//     references to uniquely identify cycles.
//
// A structural cycle, as defined by the spec, can then be detected if all
// conjuncts are marked as a cycle.
//
// References are accumulated as follows:
//
//  1. If a conjunct is a reference, the reference is associated with that
//     conjunct as well as the conjunct corresponding to the value it refers to.
//  2. If a conjunct is a struct (including lists), its references are associated
//     with all embedded values and fields.
//
// To narrow down the specifics of the reference-based cycle detection, let us
// explore structural cycles in a bit more detail.
//
//
// STRUCTURAL CYCLES
//
// See the language specification for a higher-level and more complete overview.
//
// We have to define when a cycle is detected. CUE implementations MUST report
// an error upon a structural cycle, and SHOULD report cycles at the shortest
// possible paths at which they occur, but MAY report these at deeper paths. For
// instance, the following CUE has a structural cycle
//
//	f: g: f
//
// The shortest path at which the cycle can be reported is f.g, but as all
// failed configurations are logically equal, it is fine for implementations to
// report them at f.g.g, for instance.
//
// It is not, however, correct to assume that a reference to a parent is always
// a cycle. Consider this case:
//
//	a: [string]: b: a
//
// Even though reference `a` refers to a parent node, the cycle needs to be fed
// by a concrete field in struct `a` to persist, meaning that, as written, it
// cannot result in a cycle as defined in the spec.
// Note, however, that a specialization of this configuration _can_ result in a
// cycle. Consider
//
//	a: [string]: b: a
//	a: c: _
//
// Here reference `a` is guaranteed to result in a structural cycle, as field
// `c` will match the pattern constraint unconditionally.
//
// In other words, it is not possible to exclude tracking references across
// pattern constraints from cycle checking.
//
// It is tempting to try to find a complete set of these edge cases with the aim
// to statically determine cases in which this occurs. But as [Carpenter 1992]
// demonstrates, it is possible for cycles to be created as a result of unifying
// two graphs that are themselves acyclic. The following example is a
// translation of Carpenter's example to CUE:
//
//	y: {
//		f: h: g
//		g: _
//	}
//	x: {
//		f: _
//		g: f
//	}
//
// Even though the above contains no cycles, the result of `x & y` is cyclic:
//
//	f: h: g
//	g: f
//
// This means that, in practice, cycle detection has at least partially a
// dynamic component to it.
//
//
// ABSTRACT ALGORITHM
//
// The algorithm is described declaratively by defining what it means for a
// field to have a structural cycle. In the below, a _reference_ is uniquely
// identified by the pointer identity of a Go Resolver instance.
//
// Cycles are tracked on a per-conjunct basis and are not aggregated per Vertex:
// administrative information is only passed on from parent to child conjunct.
//
// A conjunct is a _parent_ of another conjunct if the latter is a conjunct of
// one of the non-optional fields of the former. For instance, conjunct `x` with
// value `{b: y & z}`, is a parent of conjunct `y` as well as `z`. Within field
// `b`, the conjuncts `y` and `z` would be tracked individually, though.
//
// A conjunct is _associated with a reference_ if its value was obtained by
// evaluating a reference. Note that a conjunct may be associated with many
// references if its evaluation requires evaluating a chain of references. For
// instance, consider
//
//	a: {x: d}
//	b: a
//	c: b & e
//	d: y: 1
//
// the first conjunct of field `c` (reference `b`) has the value `{x: y: 1}` and
// is associated with references `b` and `a`.
//
// The _tracked references_ of a conjunct are all references that are associated
// with it or any of its ancestors (parents of parents). For instance, the
// tracked references of conjunct `b.x` of field `c.x` are `a`, `b`, and `d`.
//
// A conjunct is a violating cycle if it is a reference that:
//   - occurs in the tracked references of the conjunct, or
//   - directly refers to a parent node of the conjunct.
//
// A conjunct is cyclic if it is a violating cycle or if any of its ancestors
// are a violating cycle.
//
// A field has a structural cycle if it is composed of at least one conjunct
// that is a violating cycle and no conjunct that is not cyclic.
//
// Note that a field can be composed of only cyclic conjuncts while still not
// being a structural cycle: as long as there are no conjuncts that are a
// violating cycle, it is not a structural cycle. This is important for the
// following case:
//
//	a: [string]: b: a
//	x: a
//	x: c: b: c: {}
//
// Here, reference `a` is never a cycle as the recursive reference crosses a
// pattern constraint that only instantiates if it is unified with something
// else.
//
//
// DISCUSSION
//
// The goal of the conjunct cycle marking algorithm is twofold:
//   - mark conjuncts that are proven to propagate indefinitely
//   - mark them as early as possible (shortest CUE path).
//
// TODO: Prove all cyclic conjuncts will eventually be marked as cyclic.
//
// TODO:
//   - reference marks whether it crosses a pattern, improving the case
//     a: [string]: b: c: b
//     This requires a compile-time detection mechanism.
//
//
// REFERENCES
// [Tomabechi COLING 1992]: https://aclanthology.org/C92-2068
//     Hideto Tomabechi. 1992. Quasi-Destructive Graph Unification with
//     Structure-Sharing. In COLING 1992 Volume 2: The 14th International
//     Conference on Computational Linguistics.
//
// [Van Lohuizen ACL 2000]: https://aclanthology.org/P00-1045/
//     Marcel P. van Lohuizen. 2000. "Memory-Efficient and Thread-Safe
//     Quasi-Destructive Graph Unification". In Proceedings of the 38th Annual
//     Meeting of the Association for Computational Linguistics, pages 352–359,
//     Hong Kong. Association for Computational Linguistics.
//
// [Carpenter 1992]:
//     Bob Carpenter, "The logic of typed feature structures."
//     Cambridge University Press, ISBN:0-521-41932-8

type CycleInfo struct {
	// IsCyclic indicates whether this conjunct, or any of its ancestors,
	// had a violating cycle.
	IsCyclic bool

	// Inline is used to detect expressions referencing themselves, for instance:
	//     {x: out, out: x}.out
	Inline bool

	// TODO(perf): pack this in with CloseInfo. Make a uint32 pointing into
	// a buffer maintained in OpContext, using a mark-release mechanism.
	Refs *RefNode
}

// A RefNode is a linked list of associated references.
type RefNode struct {
	Ref Resolver
	Arc *Vertex // Ref points to this Vertex

	// Node is the Vertex of which Ref is evaluated as a conjunct.
	// If there is a cyclic reference (not structural cycle), then
	// the reference will have the same node.
	// This allows detecting reference cycles for nodes referring to nodes
	// with an evaluation cycle (mode tracked to Evaluating status). Examples:
	//
	//      a: x
	//      Y: x
	//      x: {Y}
	//
	// and
	//
	//      Y: x.b
	//      a: x
	//      x: b: {Y} | null
	//
	// Neither case is a structural cycle, so both need to be distinguished
	// from regular structural cycles.
	Node *Vertex

	Next  *RefNode
	Depth int32
}

// cyclicConjunct is used in nodeContext to postpone the computation of
// cyclic conjuncts until a non-cyclic conjunct permits it to be processed.
type cyclicConjunct struct {
	c   Conjunct
	arc *Vertex // cached Vertex
}

// markCycle checks whether the reference x is cyclic. There are two cases:
//  1. it was previously used in this conjunct, or
//  2. it directly references a parent node.
//
// Other inputs:
//
//	arc      the Vertex to which x points
//	env, ci  the components of the Conjunct from which x originates
//
// A cyclic node is added to a queue for later processing if no evidence of a
// non-cyclic node has so far been found. updateCyclicStatus processes delayed
// nodes down the line once such evidence is found.
//
// If a cycle is the result of "inline" processing (an expression referencing
// itself), an error is reported immediately.
//
// It returns the CloseInfo with tracked cyclic conjuncts updated, and
// whether or not its processing should be skipped, which is the case either if
// the conjunct seems to be fully cyclic so far or if there is a valid reference
// cycle.
func (n *nodeContext) markCycle(arc *Vertex, env *Environment, x Resolver, ci CloseInfo) (_ CloseInfo, skip bool) {
	n.assertInitialized()

	// TODO(perf): this optimization can work if we also check for any
	// references pointing to arc within arc. This can be done with compiler
	// support.
	// With this optimization, almost all references could avoid cycle
	// checking altogether!
	// if arc.status == Finalized && arc.cyclicReferences == nil {
	// 	return v, false
	// }

	// Check whether the reference already occurred in the list, signaling
	// a potential cycle.
	found := false
	depth := int32(0)
	for r := ci.Refs; r != nil; r = r.Next {
		if r.Ref != x {
			// TODO(share): this is a bit of a hack. We really should implement
			// (*Vertex).cyclicReferences for the new evaluator. However,
			// implementing cyclicReferences is somewhat tricky, as it requires
			// referenced nodes to be evaluated, which is a guarantee we may not
			// want to give. Moreover, it seems we can find a simpler solution
			// based on structure sharing. So punt on this solution for now.
			if r.Arc != arc || !n.ctx.isDevVersion() {
				continue
			}
			found = true
		}

		// A reference that is within a graph that is being evaluated
		// may repeat with a different arc and will point to a
		// non-finalized arc. A repeating reference that points outside the
		// graph will always be the same address. Hence, if this is a
		// finalized arc with a different address, it resembles a reference that
		// is included through a different path and is not a cycle.
		if !equalDeref(r.Arc, arc) && arc.status == finalized {
			continue
		}

		// For dynamically created structs we mark this as an error. Otherwise
		// there is only an error if we have visited the arc before.
		if ci.Inline && (arc.IsDynamic || equalDeref(r.Arc, arc)) {
			n.reportCycleError()
			return ci, true
		}

		// We have a reference cycle, as distinguished from a structural
		// cycle. Reference cycles represent equality, and thus are equal
		// to top. We can stop processing here.
		// var nn1, nn2 *Vertex
		// if u := r.Node.state.underlay; u != nil {
		// 	nn1 = u.node
		// }
		// if u := n.node.state.underlay; u != nil {
		// 	nn2 = u.node
		// }
		if equalDeref(r.Node, n.node) {
			return ci, true
		}

		depth = r.Depth
		found = true

		// Mark all conjuncts of this Vertex that refer to the same node as
		// cyclic. This is an extra safety measure to ensure that two conjuncts
		// cannot work in tandem to circumvent a cycle. It also tightens
		// structural cycle detection in some cases. Late detection of cycles
		// can result in a lot of redundant work.
		//
		// TODO: this loop is not on a critical path, but at some point it may
		// be worth evaluating whether it should be kept.
		for i, c := range n.node.Conjuncts {
			if c.CloseInfo.IsCyclic {
				continue
			}
			for rr := c.CloseInfo.Refs; rr != nil; rr = rr.Next {
				// TODO: Is it necessary to find another way to find
				// "parent" conjuncts? This mechanism seems not entirely
				// accurate. Maybe a pointer up to find the root and then
				// "spread" downwards?
				if r.Ref == x && equalDeref(r.Arc, rr.Arc) {
					n.node.Conjuncts[i].CloseInfo.IsCyclic = true
					break
				}
			}
		}

		break
	}

	if arc.state != nil {
		if d := arc.state.evalDepth; d > 0 && d >= n.ctx.optionalMark {
			arc.IsCyclic = true
		}
	}

	// The code in this switch statement registers structural cycles caught
	// through EvaluatingArcs to the root of the cycle. This way, any node
	// referencing this value can track these nodes early. This is mostly an
	// optimization to shorten the path for which structural cycles are
	// detected, which may be critical for performance.
outer:
	switch arc.status {
	case evaluatingArcs: // also Evaluating?
		if arc.state.evalDepth < n.ctx.optionalMark {
			break
		}

		// The reference may already be there if we had non-cyclic structure
		// invalidating the cycle.
		for r := arc.cyclicReferences; r != nil; r = r.Next {
			if r.Ref == x {
				break outer
			}
		}

		arc.cyclicReferences = &RefNode{
			Arc:  deref(arc),
			Ref:  x,
			Next: arc.cyclicReferences,
		}

	case finalized:
		// Insert cyclic references from found arc, if any.
		for r := arc.cyclicReferences; r != nil; r = r.Next {
			if r.Ref == x {
				// We have detected a cycle, with the only exception if arc is
				// a disjunction, as evaluation always stops at unresolved
				// disjunctions.
				if _, ok := arc.BaseValue.(*Disjunction); !ok {
					found = true
				}
			}
			ci.Refs = &RefNode{
				Arc:  deref(r.Arc),
				Node: deref(n.node),

				Ref:   x,
				Next:  ci.Refs,
				Depth: n.depth,
			}
		}
	}

	// NOTE: we need to add a tracked reference even if arc is not cyclic: it
	// may still cause a cycle that does not refer to a parent node. For
	// instance:
	//
	//      y: [string]: b: y
	//      x: y
	//      x: c: x
	//
	// ->
	//          - in conjuncts
	//             - out conjuncts: these count for cycle detection.
	//      x: {
	//          [string]: <1: y> b: y
	//          c: x
	//      }
	//      x.c: {
	//          <1: y> b: y
	//          <2: x> y
	//             [string]: <3: x, y> b: y
	//          <2: x> c: x
	//      }
	//      x.c.b: {
	//          <1: y> y
	//             [string]: <4: y; Cyclic> b: y
	//          <3: x, y> b: y
	//      }
	//      x.c.b.b: {
	//          <3: x, y> y
	//               [string]: <5: x, y, Cyclic> b: y
	//          <4: y, Cyclic> y
	//               [string]: <5: x, y, Cyclic> b: y
	//      }
	//      x.c.c: { // structural cycle
	//          <3: x, y> b: y
	//          <2: x> x
	//               <6: x, Cyclic>: y
	//                    [string]: <8: x, y; Cyclic> b: y
	//               <7: x, Cyclic>: c: x
	//      }
	//      x.c.c.b: { // structural cycle
	//          <3: x, y> y
	//               [string]: <3: x, y; Cyclic> b: y
	//          <8: x, y; Cyclic> y
	//      }
	// ->
	//          x: [string]: b: y
	//          x: c: b: y
	//          x: c: [string]: b: y
	//          x: c: b: b: y
	//          x: c: b: [string]: b: y
	//          x: c: b: b: b: y
	//          ....       // structural cycle 1
	//          x: c: c: x // structural cycle 2
	//
	// Note that in this example there is a structural cycle at x.c.c, but we
	// would need to guarantee that cycle is detected before the algorithm
	// descends into x.c.b.
	if !found || depth != n.depth {
		// Adding this in case there is a definite cycle is unnecessary, but
		// gives somewhat better error messages.
		// We also need to add the reference again if the depth differs, as
		// the depth is used for tracking "new structure".
		// var nn *Vertex
		// if u := n.node.state.underlay; u != nil {
		// 	nn = u.node
		// }
		ci.Refs = &RefNode{
			Arc:   deref(arc),
			Ref:   x,
			Node:  deref(n.node),
			Next:  ci.Refs,
			Depth: n.depth,
		}
	}

	if !found && arc.status != evaluatingArcs {
		// No cycle.
		return ci, false
	}

	// TODO: consider if we should bail if a cycle is detected using this
	// mechanism. Ultimately, especially when the old evaluator is removed
	// and the status field purged, this should be used instead of the above.
	// if !found && arc.state.evalDepth < n.ctx.optionalMark {
	// 	// No cycle.
	// 	return ci, false
	// }

	alreadyCycle := ci.IsCyclic
	ci.IsCyclic = true

	// TODO: depth might legitimately be 0 if it is a root vertex.
	// In the worst case, this may lead to a spurious cycle.
	// Fix this by ensuring the root vertex starts with a depth of 1, for
	// instance.
	if depth > 0 {
		// Look for evidence of "new structure" to invalidate the cycle.
		// This is done by checking for non-cyclic conjuncts between the
		// current vertex up to the ancestor to which the reference points.
		// Note that the cyclic conjunct may not be marked as such, so we
		// look for at least one other non-cyclic conjunct if this is the case.
		upCount := n.depth - depth
		for p := n.node.Parent; p != nil; p = p.Parent {
			if upCount--; upCount <= 0 {
				break
			}
			a := p.Conjuncts
			count := 0
			for _, c := range a {
				count += getNonCyclicCount(c)
			}
			if !alreadyCycle {
				count--
			}
			if count > 0 {
				return ci, false
			}
		}
	}

	n.hasCycle = true
	if !n.hasNonCycle && env != nil {
		// TODO: investigate if we can get rid of cyclicConjuncts in the new
		// evaluator.
		v := Conjunct{env, x, ci}
		if n.ctx.isDevVersion() {
			n.node.cc.incDependent(n.ctx, DEFER, nil)
		}
		n.cyclicConjuncts = append(n.cyclicConjuncts, cyclicConjunct{v, arc})
		return ci, true
	}

	return ci, false
}

// getNonCyclicCount returns the number of conjuncts in c that are not marked
// as cyclic, descending recursively into any ConjunctGroup.
func getNonCyclicCount(c Conjunct) int {
	switch a, ok := c.x.(*ConjunctGroup); {
	case ok:
		count := 0
		for _, c := range *a {
			count += getNonCyclicCount(c)
		}
		return count

	case !c.CloseInfo.IsCyclic:
		return 1

	default:
		return 0
	}
}

// updateCyclicStatus looks for proof of non-cyclic conjuncts to override
// a structural cycle.
func (n *nodeContext) updateCyclicStatus(c CloseInfo) {
	if !c.IsCyclic {
		n.hasNonCycle = true
		for _, c := range n.cyclicConjuncts {
			if n.ctx.isDevVersion() {
				ci := c.c.CloseInfo
				ci.cc = n.node.rootCloseContext(n.ctx)
				n.scheduleVertexConjuncts(c.c, c.arc, ci)
				n.node.cc.decDependent(n.ctx, DEFER, nil)
			} else {
				n.addVertexConjuncts(c.c, c.arc, false)
			}
		}
		n.cyclicConjuncts = n.cyclicConjuncts[:0]
	}
}

// assertStructuralCycle reports a structural cycle error on n and returns true
// if n has seen a cycle and no non-cyclic conjunct to override it.
func assertStructuralCycle(n *nodeContext) bool {
	// TODO: is this the right place to put it?
	if n.ctx.isDevVersion() {
		for range n.cyclicConjuncts {
			n.node.cc.decDependent(n.ctx, DEFER, nil)
		}
		n.cyclicConjuncts = n.cyclicConjuncts[:0]
	}

	if n.hasCycle && !n.hasNonCycle {
		n.reportCycleError()
		return true
	}
	return false
}

// reportCycleError sets the value of n's node to a structural cycle error and
// discards its arcs.
func (n *nodeContext) reportCycleError() {
	n.node.BaseValue = CombineErrors(nil,
		n.node.Value(),
		&Bottom{
			Code:  StructuralCycleError,
			Err:   n.ctx.Newf("structural cycle"),
			Value: n.node.Value(),
			// TODO: probably, this should have the referenced arc.
		})
	n.node.Arcs = nil
}

// makeAnonymousConjunct creates a conjunct that tracks self-references when
// evaluating an expression.
//
// Example:
// TODO:
func makeAnonymousConjunct(env *Environment, x Expr, refs *RefNode) Conjunct {
	return Conjunct{
		env, x, CloseInfo{CycleInfo: CycleInfo{
			Inline: true,
			Refs:   refs,
		}},
	}
}

// incDepth increments the evaluation depth. This should typically be called
// before descending into a child node.
func (n *nodeContext) incDepth() {
	n.ctx.evalDepth++
}

// decDepth decrements the evaluation depth. It should be paired with a call to
// incDepth and be called after the processing of child nodes is done.
func (n *nodeContext) decDepth() {
	n.ctx.evalDepth--
}

// markOptional marks that we are about to process an "optional element" that
// allows errors. In these cases, structural cycles are not "terminal".
//
// Examples of such constructs are:
//
// Optional fields:
//
//	a: b?: a
//
// Pattern constraints:
//
//	a: [string]: a
//
// Disjunctions:
//
//	a: b: null | a
//
// A call to markOptional should be paired with a call to unmarkOptional.
func (n *nodeContext) markOptional() (saved int) {
	saved = n.ctx.evalDepth
	n.ctx.optionalMark = n.ctx.evalDepth
	return saved
}

// See markOptional.
func (n *nodeContext) unmarkOptional(saved int) {
	n.ctx.optionalMark = saved
}

// markDepth assigns the current evaluation depth to the receiving node.
// Any previously assigned depth is saved and returned and should be restored
// using unmarkDepth after processing n.
//
// When a node is encountered with a depth set to a non-zero value this
// indicates a cycle. The cycle is an evaluation cycle when the node's depth
// is equal to the current depth and a structural cycle otherwise.
func (n *nodeContext) markDepth() (saved int) {
	saved = n.evalDepth
	n.evalDepth = n.ctx.evalDepth
	return saved
}

// See markDepth.
func (n *nodeContext) unmarkDepth(saved int) {
	n.evalDepth = saved
}
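
The declarative rules in the ABSTRACT ALGORITHM comment can be illustrated with a small standalone program. This is only a sketch and not the evaluator's implementation: the types `ref`, `refNode`, and `conjunct` and the functions `follow` and `hasStructuralCycle` are hypothetical stand-ins invented for this example. It models just one rule (a reference that recurs in a conjunct's tracked references is a violating cycle) and ignores arcs, evaluation depth, optional marks, and deferred conjuncts.

```go
package main

import "fmt"

// ref is a hypothetical stand-in for a Resolver. As in the text above, a
// reference is identified by pointer identity.
type ref struct{ name string }

// refNode is a persistent linked list of tracked references, loosely
// modelled on RefNode (assumption: only the reference itself is recorded).
type refNode struct {
	ref  *ref
	next *refNode
}

// conjunct carries the cycle bookkeeping for one conjunct.
type conjunct struct {
	refs      *refNode // tracked references inherited from ancestors
	violating bool     // this conjunct itself is a violating cycle
	cyclic    bool     // this conjunct or an ancestor is a violating cycle
}

// follow returns the child conjunct obtained by evaluating reference r
// within conjunct c. The child inherits the parent's cyclic status and
// tracked references; it is a violating cycle if r was already tracked.
func (c conjunct) follow(r *ref) conjunct {
	child := conjunct{cyclic: c.cyclic}
	for n := c.refs; n != nil; n = n.next {
		if n.ref == r {
			child.violating = true
			child.cyclic = true
			break
		}
	}
	child.refs = &refNode{ref: r, next: c.refs}
	return child
}

// hasStructuralCycle applies the field-level rule: at least one conjunct is
// a violating cycle and no conjunct is non-cyclic.
func hasStructuralCycle(cs []conjunct) bool {
	violating := false
	for _, c := range cs {
		if !c.cyclic {
			return false
		}
		violating = violating || c.violating
	}
	return violating
}

func main() {
	f, g := &ref{name: "f"}, &ref{name: "g"}
	root := conjunct{}
	once := root.follow(f)  // first use of f: not a cycle
	twice := once.follow(f) // f recurs: violating cycle
	below := twice.follow(g) // cyclic through its ancestor, not violating

	fmt.Println(hasStructuralCycle([]conjunct{once}))        // false
	fmt.Println(hasStructuralCycle([]conjunct{twice}))       // true
	fmt.Println(hasStructuralCycle([]conjunct{twice, root})) // false
	fmt.Println(hasStructuralCycle([]conjunct{below}))       // false
}
```

The third call shows how a single non-cyclic conjunct overrides the cycle, and the last call shows a field made up only of cyclic conjuncts that is still not a structural cycle because none of them is a violating cycle.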