github.com/ipld/go-ipld-prime@v0.21.0/traversal/selector/selector.go

package selector

import (
	"fmt"

	"github.com/ipld/go-ipld-prime/datamodel"
)

// Selector is a "compiled" and executable IPLD Selector.
// It can be put to work with functions like traversal.Walk,
// which will use the Selector's guidance to decide how to traverse an IPLD data graph.
// A user will not generally call any of the methods of Selector themselves, nor implement the interface;
// it is produced by "compile" functions in this package, and used by functions in the `traversal` package.
//
// A Selector is created by parsing an IPLD Data Model document that declares a Selector
// (this is accomplished with functions like CompileSelector).
// To make this even easier, there is a `parse` subpackage,
// which contains helper methods for parsing directly from a JSON Selector document to a compiled Selector value.
// Alternatively, there is a `builder` subpackage,
// which may be useful if you would rather create the Selector declaration programmatically in Go
// (however, we recommend using this sparingly, because part of what makes Selectors cool is their language-agnostic declarative nature).
//
// There is no way to go backwards from this "compiled" Selector type into the declarative IPLD data model information that produced it.
// That declaration information is discarded after compilation in order to limit the amount of memory held.
// Therefore, if you're building APIs about Selector composition, keep in mind that
// you'll probably want to approach this by composing the Data Model declaration documents,
// and you should *not* attempt to compose this type, which is only for the "compiled" result.
type Selector interface {
	// Notes for those implementing a Selector:
	// this type holds the state describing what we will do at one step in a traversal.
	// The actual traversal stepping is applied *from the outside* (this is implemented mostly in the `traversal` package);
	// this type just gives it instructions on how to step.
	// Each of the functions on this type should be pure; they can read the Selector's fields, but should treat them as config, not as state -- the Selector should never mutate.
	//
	// The traversal process will ask things of a Selector in three phases,
	// and control flow will bounce back and forth between traversal logic and selector evaluation --
	// traversal owns the actual walking (and any data loading), and just briefly dips down into the Selector so it can answer questions:
	//   T1. Traversal starts at some Node with some Selector.
	//   S1. First, the traversal asks the Selector what its "interests" are.
	//        This lets the Selector hint to the traversal process what it should load,
	//        which can be important for performance if not all of the next data elements are in memory already.
	//        (This is applicable to ADLs which contain large sharded data, for example.)
	//        (The "interests" phase should be _fast_; more complicated checks, and anything that actually looks at the children, should wait until the "explore" phase;
	//        in fact, for this reason, the `Interests` function doesn't even get to look at the data at all yet.)
	//   T2. The traversal looks at the Node and its actual fields, and at what the Selector just said is interesting,
	//        and between the two of them figures out what's actually here to act on.
	//        (Note that the Selector can say that certain paths are interesting, and those paths may then turn out not to be there.)
	//   S2. Second, the code driving the traversal will ask us to "explore", **stepwise**.
	//        The "explore" step is applied **repeatedly**: once per pathSegment that identifies a child in the Node.
	//        (If `Interests()` returned a list, `Explore` will be called for each element in the list (as long as that pathSegment actually existed in the Node, of course);
	//        or if `Interests()` returned no guidance, `Explore` will be called for everything in the object.)
	//   S2.a.  The "explore" step returns a new Selector object, with instructions about how to continue the walk for the reached object and beneath.
	//            (Note that the "explore" step can also return `nil` here to say "actually, don't look any further",
	//            and it may do so even if the "interests" phase suggested there might be something to follow up on here.  (Remember "interests" had to be fast, and was a first pass only.))
	//   T2.a.  ***Recursion time!***
	//            The traversal now takes that pathSegment and that subsequent Selector produced by `Explore`,
	//            gets the child Node at that pathSegment, and recurses into traversing on that Node with that Selector!
	//            It is also possibly ***link load time***, right before recursing:
	//            if the child node is a Link, the traversal may choose to load it now,
	//            and then do the recursion on the loaded Node (instead of on the actual direct child Node, which was a Link) with the next Selector.
	//   T2.b.  When the recursion is done, the traversal goes on to repeat S2, with the next pathSegment,
	//            until it runs out of things to do.
	//   T3.  The traversal asks the Selector to "decide" if this current Node is one that is "matched" or not.
	//        See the Selector specs for discussion on "matched" vs "reached"/"visited" nodes.
	//        (Long story short: the traversal probably fires off callbacks for "matched" nodes, aka if `Decide` says `true`.)
	//   S3.  The selector does so.
	//   T4.  The traversal for this node is done.
	//
	// Phase T3+S3 can also be T0+S0, which makes for a pre-order traversal instead of a post-order traversal.
	// The Selector doesn't know the difference.
	// (In particular, a Selector implementation absolutely may **not** assume `Decide` will be called before `Interests`, and may **not** hold onto a Node statefully, etc.)
	//
	// Note that it's not until phase T2.a that the traversal actually loads child Nodes.
	// This is interesting because it's *after* the Selector is asked to `Explore` and yield a subsequent Selector to use on that upcoming Node.
	//
	// Can `Explore` and `Decide` do Link loading on their own?  Do they need to?
	// Right now, no, they can't.  (Sort of.)  They don't have access to a LinkLoader; the traversal would have to give them one.
	// This might be needed in the future, e.g. if the Selector has a Condition clause that requires looking deeper; so far, we don't have those features, so it hasn't been needed.
	// The "sort of" is for ADLs.  ADLs that work with large sharded data sometimes hold onto their own LinkLoader and apply it transparently.
	// In that case, of course, `Explore` and `Decide` can just interrogate the Node they've been given, and that may cause link loading.
	// (If that happens, we're currently assuming the ADL has a reasonable caching behavior.  It's very likely that the traversal will look up the same paths that Explore just looked up (assuming the Condition told exploration to continue).)

	// Interests should return either a list of PathSegment we're likely interested in,
	// **or nil**, which indicates we're a high-cardinality or expression-based selection clause and thus we'll need all segments proposed to us.
	// Note that a non-nil zero-length list of PathSegment is distinguished from nil: this would mean this selector is interested in absolutely nothing.
	//
	// Traversal will call this before calling Explore, and use it to try to call Explore less often (or even avoid iterating on the data node at all).
	Interests() []datamodel.PathSegment

	// Explore is told about the node we're at, and the pathSegment inside it to consider,
	// and returns either nil, if we shouldn't explore that path any further,
	// or returns a Selector, which should then be used to explore the child at that path.
	//
	// Note that the node parameter is not the child, it's the node we're currently at.
	// (Often, this is sufficient information: consider ExploreFields,
	// which only ever needs to regard the pathSegment, and not the node at all.)
	//
	// Remember that Explore does **not** iterate `node` itself; the visits to any children of `node` will be driven from the outside, by the traversal function.
	// (The Selector's job is just guiding that process by returning information.)
	// The architecture works this way so that a sufficiently clever traversal function could consider several reasons for exploring a node before deciding whether to do so.
	Explore(node datamodel.Node, child datamodel.PathSegment) (subsequent Selector, err error)

	// Decide returns true if the subject node is "matched".
	//
	// Only "Matcher" clauses actually implement this in a way that ever returns "true".
	// See the Selector specs for discussion on "matched" vs "reached"/"visited" nodes.
	Decide(node datamodel.Node) bool

	// Match is an extension to Decide allowing the matcher to `decide` a transformation of
	// the matched node. This is used for `Subset` match behavior. If the node is matched,
	// the first return value will be the matched node. If it is not matched, the first return value
	// will be nil. If there is an error, the first return value will be nil.
	Match(node datamodel.Node) (datamodel.Node, error)
}

// REVIEW: do ParsedParent and ParseContext need to be exported?  They're mostly used during the compilation process.

// ParsedParent is created whenever you are parsing a selector node that may have
// child selector nodes that need to know about it.
type ParsedParent interface {
	Link(s Selector) bool
}

// ParseContext tracks the progress when parsing a selector.
type ParseContext struct {
	parentStack []ParsedParent
}

// CompileSelector accepts a datamodel.Node which should contain data that declares a Selector.
// The data layout expected for this declaration is documented in https://ipld.io/specs/selectors/ .
//
// If the Selector is compiled successfully, it is returned.
// Otherwise, if the given data Node doesn't match the expected shape for a Selector declaration,
// or there are any other problems compiling the selector
// (such as a recursion edge with no enclosing recursion declaration, etc),
// then nil and an error will be returned.
func CompileSelector(dmt datamodel.Node) (Selector, error) {
	return ParseContext{}.ParseSelector(dmt)
}

// ParseSelector is an alias for CompileSelector, and is deprecated.
// Prefer CompileSelector.
func ParseSelector(dmt datamodel.Node) (Selector, error) {
	return CompileSelector(dmt)
}

// ParseSelector creates a Selector from an IPLD Selector Node with the given context.
func (pc ParseContext) ParseSelector(n datamodel.Node) (Selector, error) {
	if n.Kind() != datamodel.Kind_Map {
		return nil, fmt.Errorf("selector spec parse rejected: selector is a keyed union and thus must be a map")
	}
	if n.Length() != 1 {
		return nil, fmt.Errorf("selector spec parse rejected: selector is a keyed union and thus must be single-entry map")
	}
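	// Read the single entry; surface iterator and key errors instead of discarding them,
	// since a failed Next would otherwise leave kn nil and make the AsString call panic.
	kn, v, err := n.MapIterator().Next()
	if err != nil {
		return nil, fmt.Errorf("selector spec parse rejected: cannot read the selector union entry: %w", err)
	}
	kstr, err := kn.AsString()
	if err != nil {
		return nil, fmt.Errorf("selector spec parse rejected: selector union key must be a string: %w", err)
	}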
	// Switch over the single key to determine which selector body comes next.
	// (This switch is where the keyed union discriminators concretely happen.)
	switch kstr {
	case SelectorKey_ExploreFields:
		return pc.ParseExploreFields(v)
	case SelectorKey_ExploreAll:
		return pc.ParseExploreAll(v)
	case SelectorKey_ExploreIndex:
		return pc.ParseExploreIndex(v)
	case SelectorKey_ExploreRange:
		return pc.ParseExploreRange(v)
	case SelectorKey_ExploreUnion:
		return pc.ParseExploreUnion(v)
	case SelectorKey_ExploreRecursive:
		return pc.ParseExploreRecursive(v)
	case SelectorKey_ExploreRecursiveEdge:
		return pc.ParseExploreRecursiveEdge(v)
	case SelectorKey_ExploreInterpretAs:
		return pc.ParseExploreInterpretAs(v)
	case SelectorKey_Matcher:
		return pc.ParseMatcher(v)
	default:
		return nil, fmt.Errorf("selector spec parse rejected: %q is not a known member of the selector union", kstr)
	}
}

// PushParent returns a new ParseContext with the given parent placed at the front of the stack of parents.
// The receiver's own stack is copied, not modified.
func (pc ParseContext) PushParent(parent ParsedParent) ParseContext {
	l := len(pc.parentStack)
	parents := make([]ParsedParent, 0, l+1)
	parents = append(parents, parent)
	parents = append(parents, pc.parentStack...)
	return ParseContext{parents}
}

// SegmentIterator iterates either a list or a map, generating PathSegments
// instead of indexes or keys.
type SegmentIterator interface {
	Next() (pathSegment datamodel.PathSegment, value datamodel.Node, err error)
	Done() bool
}

// NewSegmentIterator generates a new iterator based on the node's kind:
// a list iterator for list nodes, and a map iterator for everything else.
func NewSegmentIterator(n datamodel.Node) SegmentIterator {
	if n.Kind() == datamodel.Kind_List {
		return listSegmentIterator{n.ListIterator()}
	}
	return mapSegmentIterator{n.MapIterator()}
}

type listSegmentIterator struct {
	datamodel.ListIterator
}

func (lsi listSegmentIterator) Next() (pathSegment datamodel.PathSegment, value datamodel.Node, err error) {
	i, v, err := lsi.ListIterator.Next()
	return datamodel.PathSegmentOfInt(i), v, err
}

func (lsi listSegmentIterator) Done() bool {
	return lsi.ListIterator.Done()
}

type mapSegmentIterator struct {
	datamodel.MapIterator
}

func (msi mapSegmentIterator) Next() (pathSegment datamodel.PathSegment, value datamodel.Node, err error) {
	k, v, err := msi.MapIterator.Next()
	if err != nil {
		return datamodel.PathSegment{}, v, err
	}
	kstr, _ := k.AsString()
	return datamodel.PathSegmentOfString(kstr), v, err
}

func (msi mapSegmentIterator) Done() bool {
	return msi.MapIterator.Done()
}