github.com/ipld/go-ipld-prime@v0.21.0/traversal/selector/selector.go

package selector

import (
	"fmt"

	"github.com/ipld/go-ipld-prime/datamodel"
)

// Selector is a "compiled" and executable IPLD Selector.
// It can be put to work with functions like traversal.Walk,
// which will use the Selector's guidance to decide how to traverse an IPLD data graph.
// A user will not generally call any of the methods of Selector themselves, nor implement the interface;
// it is produced by "compile" functions in this package, and used by functions in the `traversal` package.
//
// A Selector is created by parsing an IPLD Data Model document that declares a Selector
// (this is accomplished with functions like CompileSelector).
// To make this even easier, there is a `parse` subpackage,
// which contains helper methods for parsing directly from a JSON Selector document to a compiled Selector value.
// Alternatively, there is a `builder` subpackage,
// which may be useful if you would rather create the Selector declaration programmatically in golang
// (however, we recommend using this sparingly, because part of what makes Selectors cool is their language-agnostic declarative nature).
//
// There is no way to go backwards from this "compiled" Selector type into the declarative IPLD data model information that produced it.
// That declaration information is discarded after compilation in order to limit the amount of memory held.
// Therefore, if you're building APIs about Selector composition, keep in mind that
// you'll probably want to approach this by composing the Data Model declaration documents,
// and you should *not* attempt to compose this type, which is only for the "compiled" result.
type Selector interface {
	// Notes for those who implement a Selector:
	// this type holds the state describing what we will do at one step in a traversal.
	// The actual traversal stepping is applied *from the outside* (and this is implemented mostly in the `traversal` package);
	// this type just gives it instructions on how to step.
	// Each of the functions on this type should be pure; they can read the Selector's fields, but should treat them as config, not as state -- the Selector should never mutate.
	//
	// The traversal process will ask things of a Selector in three phases,
	// and control flow will bounce back and forth between traversal logic and selector evaluation --
	// traversal owns the actual walking (and any data loading), and just briefly dips down into the Selector so it can answer questions:
	//   T1. Traversal starts at some Node with some Selector.
	//   S1. First, the traversal asks the Selector what its "interests" are.
	//       This lets the Selector hint to the traversal process what it should load,
	//       which can be important for performance if not all of the next data elements are in memory already.
	//       (This is applicable to ADLs which contain large sharded data, for example.)
	//       (The "interests" phase should be _fast_; more complicated checks, and anything that actually looks at the children, should wait until the "explore" phase;
	//       in fact, for this reason, the `Interests` function doesn't even get to look at the data at all yet.)
	//   T2. The traversal looks at the Node and its actual fields, and what the Selector just said are interesting,
	//       and between the two of them figures out what's actually here to act on.
	//       (Note that the Selector can say that certain paths are interesting, and those paths may then turn out not to be there.)
	//   S2. Second, the code driving the traversal will ask us to "explore", **stepwise**.
	//       The "explore" step is applied **repeatedly**: once per pathSegment that identifies a child in the Node.
	//       (If `Interests()` returned a list, `Explore` will be called for each element in the list (as long as that pathSegment actually existed in the Node, of course);
	//       or if `Interests()` returned no guidance, `Explore` will be called for everything in the object.)
	//   S2.a. The "explore" step returns a new Selector object, with instructions about how to continue the walk for the reached object and beneath.
	//       (Note that the "explore" step can also return `nil` here to say "actually, don't look any further",
	//       and it may do so even if the "interests" phase suggested there might be something to follow up on here. (Remember "interests" had to be fast, and was a first pass only.))
	//   T2.a. ***Recursion time!***
	//       The traversal now takes that pathSegment and that subsequent Selector produced by `Explore`,
	//       gets the child Node at that pathSegment, and recurses into traversing on that Node with that Selector!
	//       It is also possibly ***link load time***, right before recursing:
	//       if the child node is a Link, the traversal may choose to load it now,
	//       and then do the recursion on the loaded Node (instead of on the actual direct child Node, which was a Link) with the next Selector.
	//   T2.b. When the recursion is done, the traversal goes on to repeat S2, with the next pathSegment,
	//       until it runs out of things to do.
	//   T3. The traversal asks the Selector to "decide" if this current Node is one that is "matched" or not.
	//       See the Selector specs for discussion on "matched" vs "reached"/"visited" nodes.
	//       (Long story short: the traversal probably fires off callbacks for "matched" nodes, aka if `Decide` says `true`.)
	//   S3. The selector does so.
	//   T4. The traversal for this node is done.
	//
	// Phase T3+S3 can also be T0+S0, which makes for a pre-order traversal instead of a post-order traversal.
	// The Selector doesn't know the difference.
	// (In particular, a Selector implementation absolutely may **not** assume `Decide` will be called before `Interests`, and may **not** hold onto a Node statefully, etc.)
	//
	// Note that it's not until phase T2.a that the traversal actually loads child Nodes.
	// This is interesting because it's *after* when the Selector is asked to `Explore` and yield a subsequent Selector to use on that upcoming Node.
	//
	// Can `Explore` and `Decide` do Link loading on their own? Do they need to?
	// Right now, no, they can't. (Sort of.) They don't have access to a LinkLoader; the traversal would have to give them one.
	// This might be needed in the future, e.g. if the Selector has a Condition clause that requires looking deeper; so far, we don't have those features, so it hasn't been needed.
	// The "sort of" is for ADLs. ADLs that work with large sharded data sometimes hold onto their own LinkLoader and apply it transparently.
	// In that case, of course, `Explore` and `Decide` can just interrogate the Node they've been given, and that may cause link loading.
	// (If that happens, we're currently assuming the ADL has a reasonable caching behavior. It's very likely that the traversal will look up the same paths that Explore just looked up (assuming the Condition told exploration to continue).)
	//

	// Interests should return either a list of PathSegment we're likely interested in,
	// **or nil**, which indicates we're a high-cardinality or expression-based selection clause and thus we'll need all segments proposed to us.
	// Note that a non-nil zero length list of PathSegment is distinguished from nil: this would mean this selector is interested in absolutely nothing.
	//
	// Traversal will call this before calling Explore, and use it to try to call Explore less often (or even avoid iterating on the data node at all).
	Interests() []datamodel.PathSegment

	// Explore is told about the node we're at, and the pathSegment inside it to consider,
	// and returns either nil, if we shouldn't explore that path any further,
	// or returns a Selector, which should then be used to explore the child at that path.
	//
	// Note that the node parameter is not the child, it's the node we're currently at.
	// (Often, this is sufficient information: consider ExploreFields,
	// which only even needs to regard the pathSegment, and not the node at all.)
	//
	// Remember that Explore does **not** iterate `node` itself; the visits to any children of `node` will be driven from the outside, by the traversal function.
	// (The Selector's job is just guiding that process by returning information.)
	// The architecture works this way so that a sufficiently clever traversal function could consider several reasons for exploring a node before deciding whether to do so.
	Explore(node datamodel.Node, child datamodel.PathSegment) (subsequent Selector, err error)

	// Decide returns true if the subject node is "matched".
	//
	// Only "Matcher" clauses actually implement this in a way that ever returns "true".
	// See the Selector specs for discussion on "matched" vs "reached"/"visited" nodes.
	Decide(node datamodel.Node) bool

	// Match is an extension to Decide allowing the matcher to `decide` a transformation of
	// the matched node. This is used for `Subset` match behavior. If the node is matched,
	// the first return value will be the matched node. If it is not matched, the first return value
	// will be nil. If there is an error, the first return value will be nil.
	Match(node datamodel.Node) (datamodel.Node, error)
}

// REVIEW: do ParsedParent and ParseContext need to be exported? They're mostly used during the compilation process.

// ParsedParent is created whenever you are parsing a selector node that may have
// child selector nodes that need to know about it.
type ParsedParent interface {
	Link(s Selector) bool
}

// ParseContext tracks the progress when parsing a selector.
type ParseContext struct {
	parentStack []ParsedParent
}

// CompileSelector accepts a datamodel.Node which should contain data that declares a Selector.
// The data layout expected for this declaration is documented in https://datamodel.io/specs/selectors/ .
//
// If the Selector is compiled successfully, it is returned.
// Otherwise, if the given data Node doesn't match the expected shape for a Selector declaration,
// or there are any other problems compiling the selector
// (such as a recursion edge with no enclosing recursion declaration, etc),
// then nil and an error will be returned.
func CompileSelector(dmt datamodel.Node) (Selector, error) {
	return ParseContext{}.ParseSelector(dmt)
}

// ParseSelector is an alias for CompileSelector, and is deprecated.
// Prefer CompileSelector.
func ParseSelector(dmt datamodel.Node) (Selector, error) {
	return CompileSelector(dmt)
}

// ParseSelector creates a Selector from an IPLD Selector Node with the given context.
func (pc ParseContext) ParseSelector(n datamodel.Node) (Selector, error) {
	if n.Kind() != datamodel.Kind_Map {
		return nil, fmt.Errorf("selector spec parse rejected: selector is a keyed union and thus must be a map")
	}
	if n.Length() != 1 {
		return nil, fmt.Errorf("selector spec parse rejected: selector is a keyed union and thus must be single-entry map")
	}
	kn, v, _ := n.MapIterator().Next()
	kstr, _ := kn.AsString()
	// Switch over the single key to determine which selector body comes next.
	// (This switch is where the keyed union discriminators concretely happen.)
	switch kstr {
	case SelectorKey_ExploreFields:
		return pc.ParseExploreFields(v)
	case SelectorKey_ExploreAll:
		return pc.ParseExploreAll(v)
	case SelectorKey_ExploreIndex:
		return pc.ParseExploreIndex(v)
	case SelectorKey_ExploreRange:
		return pc.ParseExploreRange(v)
	case SelectorKey_ExploreUnion:
		return pc.ParseExploreUnion(v)
	case SelectorKey_ExploreRecursive:
		return pc.ParseExploreRecursive(v)
	case SelectorKey_ExploreRecursiveEdge:
		return pc.ParseExploreRecursiveEdge(v)
	case SelectorKey_ExploreInterpretAs:
		return pc.ParseExploreInterpretAs(v)
	case SelectorKey_Matcher:
		return pc.ParseMatcher(v)
	default:
		return nil, fmt.Errorf("selector spec parse rejected: %q is not a known member of the selector union", kstr)
	}
}

// PushParent puts a parent onto the stack of parents for a parse context
func (pc ParseContext) PushParent(parent ParsedParent) ParseContext {
	l := len(pc.parentStack)
	parents := make([]ParsedParent, 0, l+1)
	parents = append(parents, parent)
	parents = append(parents, pc.parentStack...)
	return ParseContext{parents}
}

// SegmentIterator iterates either a list or a map, generating PathSegments
// instead of indexes or keys
type SegmentIterator interface {
	Next() (pathSegment datamodel.PathSegment, value datamodel.Node, err error)
	Done() bool
}

// NewSegmentIterator generates a new iterator based on the node type
func NewSegmentIterator(n datamodel.Node) SegmentIterator {
	if n.Kind() == datamodel.Kind_List {
		return listSegmentIterator{n.ListIterator()}
	}
	return mapSegmentIterator{n.MapIterator()}
}

type listSegmentIterator struct {
	datamodel.ListIterator
}

func (lsi listSegmentIterator) Next() (pathSegment datamodel.PathSegment, value datamodel.Node, err error) {
	i, v, err := lsi.ListIterator.Next()
	return datamodel.PathSegmentOfInt(i), v, err
}

func (lsi listSegmentIterator) Done() bool {
	return lsi.ListIterator.Done()
}

type mapSegmentIterator struct {
	datamodel.MapIterator
}

func (msi mapSegmentIterator) Next() (pathSegment datamodel.PathSegment, value datamodel.Node, err error) {
	k, v, err := msi.MapIterator.Next()
	if err != nil {
		return datamodel.PathSegment{}, v, err
	}
	kstr, _ := k.AsString()
	return datamodel.PathSegmentOfString(kstr), v, err
}

func (msi mapSegmentIterator) Done() bool {
	return msi.MapIterator.Done()
}
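
The Selector doc comment above describes how a traversal alternates between its own stepping logic and the Selector's Interests, Explore, and Decide answers. The following is a minimal standalone sketch of that control flow, not the code of the `traversal` package. Every name in it (node, selector, exploreFields, exploreAll, matcher, walk, matchedPaths, sampleTree, sampleSelector) is a hypothetical stand-in invented for illustration. Path segments are plain strings, errors and link loading are omitted, and Decide is asked before exploring (the pre-order "T0+S0" variant mentioned in the comment).

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical stand-in for datamodel.Node: a map of named children.
type node struct {
	children map[string]*node
}

// selector mirrors the shape of the Selector interface with simplified types.
type selector interface {
	Interests() []string                  // nil means "propose every segment to me"
	Explore(n *node, seg string) selector // nil means "do not go further here"
	Decide(n *node) bool                  // true means the node is "matched"
}

// exploreFields follows only the listed fields, each with its own next selector.
type exploreFields struct {
	order []string
	next  map[string]selector
}

func (s exploreFields) Interests() []string                  { return s.order }
func (s exploreFields) Explore(_ *node, seg string) selector { return s.next[seg] }
func (s exploreFields) Decide(*node) bool                    { return false }

// exploreAll gives no guidance (nil interests) and applies next to every child.
type exploreAll struct{ next selector }

func (s exploreAll) Interests() []string            { return nil }
func (s exploreAll) Explore(*node, string) selector { return s.next }
func (s exploreAll) Decide(*node) bool              { return false }

// matcher matches the node it reaches; its empty, non-nil interest list
// means it is interested in no children at all.
type matcher struct{}

func (matcher) Interests() []string            { return []string{} }
func (matcher) Explore(*node, string) selector { return nil }
func (matcher) Decide(*node) bool              { return true }

// walk owns the stepping; the selector only answers questions.
func walk(path string, n *node, s selector, visit func(path string)) {
	if s.Decide(n) { // T0+S0: pre-order decide
		visit(path)
	}
	segs := s.Interests() // S1: ask for hints without looking at the data
	if segs == nil {
		// No guidance: propose every child (sorted here for determinism).
		for k := range n.children {
			segs = append(segs, k)
		}
		sort.Strings(segs)
	}
	for _, seg := range segs {
		child, ok := n.children[seg] // T2: an interesting path may be absent
		if !ok {
			continue
		}
		next := s.Explore(n, seg) // S2: one Explore call per existing segment
		if next == nil {
			continue
		}
		walk(path+"/"+seg, child, next, visit) // T2.a: recurse with the next selector
	}
}

// matchedPaths collects the paths of all matched nodes.
func matchedPaths(root *node, s selector) []string {
	var out []string
	walk("", root, s, func(p string) { out = append(out, p) })
	return out
}

func sampleTree() *node {
	return &node{children: map[string]*node{
		"a": &node{},
		"b": &node{children: map[string]*node{"c": &node{}}},
		"d": &node{},
	}}
}

func sampleSelector() selector {
	return exploreFields{
		order: []string{"a", "b", "missing"},
		next: map[string]selector{
			"a":       matcher{},
			"b":       exploreFields{order: []string{"c"}, next: map[string]selector{"c": matcher{}}},
			"missing": matcher{},
		},
	}
}

func main() {
	fmt.Println(matchedPaths(sampleTree(), sampleSelector()))         // [/a /b/c]
	fmt.Println(matchedPaths(sampleTree(), exploreAll{next: matcher{}})) // [/a /b /d]
}
```

In this sketch the "missing" segment is listed as an interest but never explored because the node has no such child, and "d" is skipped in the first call because the selector never named it. Both behaviours follow the T2 and S2 notes in the comment above.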