# Dataset pulling algorithm
The approach is to explore the chunk graph of both sink and source in order of decreasing ref-height. As the code walks, it uses what it learns about which chunks are present in the sink both to prune the walk of the source graph and to build up a set of `hints` that can be sent to a remote Database to aid in chunk validation.

## Basic algorithm

- let `sink` be the *sink* database
- let `source` be the *source* database
- let `snkQ` and `srcQ` be priority queues of `Ref` prioritized by highest `Ref.height`
- let `hints` be a map of `hash => hash`
- let `reachableChunks` be a set of hashes
- let `snkHdRef` be the ref (of `Commit`) of the head of the *sink* dataset
- let `srcHdRef` be the ref of the *source* `Commit`, which must descend from the `Commit` indicated by `snkHdRef`

- let `traverseSource(srcRef, srcQ, sink, source, reachableChunks)` be
  - pop `srcRef` from `srcQ`
  - if `!sink.has(srcRef)`
    - let `c` = `source.batchStore().Get(srcRef.targetHash)`
    - let `v` = `types.DecodeValue(c, source)`
    - insert all child refs, `cr`, from `v` into `srcQ` and into `reachableChunks`
    - `sink.batchStore().Put(c, srcRef.height, no hints)`
      - (hints will all be gathered and handed to `sink.batchStore` at the end)


- let `traverseSink(snkRef, snkQ, sink, hints)` be
  - pop `snkRef` from `snkQ`
  - if `snkRef.height` > 1
    - let `v` = `sink.readValue(snkRef.targetHash)`
    - insert all child refs, `cr`, from `v` into `snkQ` and set `hints[cr] = snkRef`


- let `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)` be
  - pop `comRef` from both `snkQ` and `srcQ`
  - if `comRef.height` > 1
    - if `comRef` is a `Ref` of `Commit`
      - let `v` = `sink.readValue(comRef.targetHash)`
      - if `comRef` == `snkHdRef`
        - *ignore all parent refs*
        - insert each other child ref `cr` from `v` into `snkQ` *only*, set `hints[cr] = comRef`
      - else
        - insert each child ref `cr` from `v` into both `snkQ` and `srcQ`, set `hints[cr] = comRef`


- let `pull(source, sink, srcHdRef, snkHdRef)` be
  - insert `snkHdRef` into `snkQ` and `srcHdRef` into `srcQ`
  - create empty `hints` and `reachableChunks`
  - while `srcQ` is non-empty
    - let `srcHt` and `snkHt` be the respective heights of the *top* `Ref` in each of `srcQ` and `snkQ`
    - if `srcHt` > `snkHt`, for every `ref` in `srcQ` which is of greater height than `snkHt`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `snkHt` > `srcHt`, for every `ref` in `snkQ` which is of greater height than `srcHt`
      - `traverseSink(ref, snkQ, sink, hints)`
    - else
      - for every `comRef` which is common to `snkQ` and `srcQ` and is of height `srcHt` (and `snkHt`)
        - `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)`
      - for every `ref` in `srcQ` which is of height `srcHt`
        - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
      - for every `ref` in `snkQ` which is of height `snkHt`
        - `traverseSink(ref, snkQ, sink, hints)`
  - for all `hash` in `reachableChunks`
    - `sink.batchStore().addHint(hints[hash])`


## Isomorphic, but less clear, algorithm

- let all identifiers be as above
- let `traverseSource`, `traverseSink`, and `traverseCommon` be as above

- let `higherThan(refA, refB)` be
  - if `refA.height` == `refB.height`
    - return `refA.targetHash` < `refB.targetHash`
  - return `refA.height` > `refB.height`

- let `pull(source, sink, srcHdRef, snkHdRef)` be
  - insert `snkHdRef` into `snkQ` and `srcHdRef` into `srcQ`
  - create empty `hints` and `reachableChunks`
  - while `srcQ` is non-empty
    - if `snkQ` is empty
      - let `ref` be the head of `srcQ`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `higherThan(head of srcQ, head of snkQ)`
      - let `ref` be the head of `srcQ`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `higherThan(head of snkQ, head of srcQ)`
      - let `ref` be the head of `snkQ`
      - `traverseSink(ref, snkQ, sink, hints)`
    - else, heads of both queues are the same
      - let `comRef` be the common head of `snkQ` and `srcQ`
      - `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)`
  - for all `hash` in `reachableChunks`
    - `sink.batchStore().addHint(hints[hash])`
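
The second formulation of `pull` can be sketched as a small in-memory program. This is only an illustration under stated assumptions, not the code in this package: databases are modeled as plain dicts from hash to a hypothetical `Chunk` record, `Ref` is a hypothetical two-field tuple, whether a ref points at a `Commit` is read from the chunk rather than from the ref's type, and the final hints are returned as a set instead of being passed to `addHint`. Each `traverse_*` helper pops its own ref from the queue.

```python
import heapq
from typing import NamedTuple


class Ref(NamedTuple):
    target_hash: str
    height: int


class Chunk(NamedTuple):
    children: tuple = ()              # every child Ref, commit parents included
    parents: frozenset = frozenset()  # parent Refs; non-empty only for commits
    is_commit: bool = False


def push(queue, ref):
    # heapq is a min-heap, so negate the height: greatest height first,
    # ties broken by smaller hash, matching higherThan.
    heapq.heappush(queue, ((-ref.height, ref.target_hash), ref))


def traverse_source(src_q, sink, source, reachable):
    _, ref = heapq.heappop(src_q)
    if ref.target_hash not in sink:
        chunk = source[ref.target_hash]
        for cr in chunk.children:
            push(src_q, cr)
            reachable.add(cr.target_hash)
        sink[ref.target_hash] = chunk  # stands in for batchStore().Put


def traverse_sink(snk_q, sink, hints):
    _, ref = heapq.heappop(snk_q)
    if ref.height > 1:
        for cr in sink[ref.target_hash].children:
            push(snk_q, cr)
            hints[cr.target_hash] = ref.target_hash


def traverse_common(snk_head, snk_q, src_q, sink, hints):
    _, ref = heapq.heappop(snk_q)
    heapq.heappop(src_q)
    chunk = sink[ref.target_hash]
    if ref.height > 1 and chunk.is_commit:
        for cr in chunk.children:
            if ref == snk_head:
                if cr in chunk.parents:
                    continue  # never walk the parents of the sink head
                push(snk_q, cr)
            else:
                push(snk_q, cr)
                push(src_q, cr)
            hints[cr.target_hash] = ref.target_hash


def pull(source, sink, src_head, snk_head):
    """Copy chunks missing from sink (mutated in place); return the hint hashes."""
    src_q, snk_q = [], []
    push(snk_q, snk_head)
    push(src_q, src_head)
    hints, reachable = {}, set()
    while src_q:
        if not snk_q or src_q[0][0] < snk_q[0][0]:
            traverse_source(src_q, sink, source, reachable)
        elif snk_q[0][0] < src_q[0][0]:
            traverse_sink(snk_q, sink, hints)
        else:  # heads of both queues are the same ref
            traverse_common(snk_head, snk_q, src_q, sink, hints)
    return {hints[h] for h in reachable if h in hints}


def build_example():
    """Hypothetical data: commit c2 descends from c1 and reuses leaf chunk a."""
    c1, v1, v2 = Ref("c1", 3), Ref("v1", 2), Ref("v2", 2)
    a, b, c = Ref("a", 1), Ref("b", 1), Ref("c", 1)
    sink = {
        "c1": Chunk(children=(v1,), is_commit=True),
        "v1": Chunk(children=(a, b)),
        "a": Chunk(),
        "b": Chunk(),
    }
    source = dict(sink)
    source["c2"] = Chunk(children=(c1, v2), parents=frozenset({c1}), is_commit=True)
    source["v2"] = Chunk(children=(a, c))
    source["c"] = Chunk()
    return source, sink, Ref("c2", 4), c1


if __name__ == "__main__":
    source, sink, src_head, snk_head = build_example()
    print(sorted(pull(source, sink, src_head, snk_head)), sorted(sink))
```

In this toy run only `c2`, `v2`, and `c` are copied; the shared leaf `a` is recognized as common and pruned, and the chunk that references it in the sink (`v1`) becomes the hint for validating `v2`.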