# Dataset pulling algorithm
The approach is to explore the chunk graph of both sink and source in order of decreasing ref-height. As the code walks, it uses the knowledge gained about which chunks are present in the sink to both prune the source-graph-walk and build up a set of `hints` that can be sent to a remote Database to aid in chunk validation.

## Basic algorithm

- let `sink` be the *sink* database
- let `source` be the *source* database
- let `snkQ` and `srcQ` be priority queues of `Ref` prioritized by highest `Ref.height`
- let `hints` be a map of `hash => hash`
- let `reachableChunks` be a set of hashes
- let `snkHdRef` be the ref (of `Commit`) of the head of the *sink* dataset
- let `srcHdRef` be the ref of the *source* `Commit`, which must descend from the `Commit` indicated by `snkHdRef`
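
As a rough illustration of the height-ordered queues defined above, the following sketch builds one on Go's `container/heap`. The `Ref` struct, its string `TargetHash`, and the `popHeights` helper are assumptions made for the example, not the types used by the real code.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Ref is a hypothetical stand-in for the document's `Ref`: a target hash plus
// the height of the chunk graph beneath it.
type Ref struct {
	TargetHash string
	Height     uint64
}

// RefHeap implements heap.Interface so that the highest Ref pops first.
type RefHeap []Ref

func (h RefHeap) Len() int           { return len(h) }
func (h RefHeap) Less(i, j int) bool { return h[i].Height > h[j].Height }
func (h RefHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *RefHeap) Push(x any)        { *h = append(*h, x.(Ref)) }
func (h *RefHeap) Pop() any {
	old := *h
	n := len(old)
	r := old[n-1]
	*h = old[:n-1]
	return r
}

// popHeights (illustrative helper) queues every ref and returns the heights
// in the order the queue yields them.
func popHeights(refs []Ref) []uint64 {
	q := &RefHeap{}
	for _, r := range refs {
		heap.Push(q, r)
	}
	var out []uint64
	for q.Len() > 0 {
		out = append(out, heap.Pop(q).(Ref).Height)
	}
	return out
}

func main() {
	fmt.Println(popHeights([]Ref{{"a", 1}, {"b", 3}, {"c", 2}}))
}
```

Inverting `Less` turns the standard min-heap into a max-heap, so the tallest ref always sits at the head of the queue.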

- let `traverseSource(srcRef, srcQ, sink, source, reachableChunks)` be
  - pop `srcRef` from `srcQ`
  - if `!sink.has(srcRef)`
    - let `c` = `source.batchStore().Get(srcRef.targetHash)`
    - let `v` = `types.DecodeValue(c, source)`
    - insert all child refs, `cr`, from `v` into `srcQ` and into `reachableChunks`
    - `sink.batchStore().Put(c, srcRef.height, no hints)`
      - (hints will all be gathered and handed to `sink.batchStore()` at the end)
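
The `traverseSource` step can be sketched against a toy in-memory model. Everything here is an assumption for illustration: `Store` is a plain map standing in for a database and its batch store, `Chunk` holds only child refs, and `Queue` is a sorted slice rather than a real priority queue.

```go
package main

import (
	"fmt"
	"sort"
)

type Ref struct {
	TargetHash string
	Height     uint64
}

// Chunk is a toy decoded value: just the child refs it contains.
type Chunk struct{ Children []Ref }

// Store is a hypothetical in-memory chunk store keyed by hash.
type Store map[string]Chunk

// Queue is a slice kept sorted by decreasing height.
type Queue []Ref

func (q *Queue) Insert(r Ref) {
	*q = append(*q, r)
	sort.SliceStable(*q, func(i, j int) bool { return (*q)[i].Height > (*q)[j].Height })
}

func (q *Queue) Pop() Ref {
	r := (*q)[0]
	*q = (*q)[1:]
	return r
}

// traverseSource pops the head of srcQ and, if the sink lacks that chunk,
// copies it to the sink and queues its children for later visits.
func traverseSource(srcQ *Queue, sink, source Store, reachable map[string]bool) {
	srcRef := srcQ.Pop()
	if _, ok := sink[srcRef.TargetHash]; ok {
		return // the sink already has this chunk, so the walk is pruned here
	}
	c := source[srcRef.TargetHash]
	for _, cr := range c.Children {
		srcQ.Insert(cr)
		reachable[cr.TargetHash] = true
	}
	sink[srcRef.TargetHash] = c
}

// demo walks a three-chunk source graph into a sink that already holds leafA.
func demo() (Store, map[string]bool) {
	source := Store{
		"root":  {Children: []Ref{{"leafA", 1}, {"leafB", 1}}},
		"leafA": {},
		"leafB": {},
	}
	sink := Store{"leafA": {}}
	reachable := map[string]bool{}
	srcQ := &Queue{{"root", 2}}
	for len(*srcQ) > 0 {
		traverseSource(srcQ, sink, source, reachable)
	}
	return sink, reachable
}

func main() {
	sink, reachable := demo()
	fmt.Println(len(sink), len(reachable))
}
```

In this toy run only `root` and `leafB` are copied. `leafA` is still recorded as reachable, but the walk stops there because the sink already holds it.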

- let `traverseSink(snkRef, snkQ, sink, hints)` be
  - pop `snkRef` from `snkQ`
  - if `snkRef.height` > 1
    - let `v` = `sink.readValue(snkRef.targetHash)`
    - insert all child refs, `cr`, from `v` into `snkQ` and set `hints[cr] = snkRef`
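
To make the role of `hints` in `traverseSink` concrete, here is a small sketch under simplifying assumptions: `Store` maps a hash directly to the child refs of the value stored there, hashes are plain strings, and the queue is a sorted slice. None of these names come from the real code.

```go
package main

import (
	"fmt"
	"sort"
)

type Ref struct {
	TargetHash string
	Height     uint64
}

// Store maps a chunk hash to the child refs of the value stored there (a toy model).
type Store map[string][]Ref

// insert adds r to q, keeping q ordered by decreasing height.
func insert(q []Ref, r Ref) []Ref {
	q = append(q, r)
	sort.SliceStable(q, func(i, j int) bool { return q[i].Height > q[j].Height })
	return q
}

// traverseSink pops the head of snkQ and returns the updated queue. Children of
// a non-leaf chunk are queued, and each child is recorded in hints as being
// referenced by the popped chunk.
func traverseSink(snkQ []Ref, sink Store, hints map[string]string) []Ref {
	snkRef, rest := snkQ[0], snkQ[1:]
	if snkRef.Height > 1 {
		for _, cr := range sink[snkRef.TargetHash] {
			rest = insert(rest, cr)
			hints[cr.TargetHash] = snkRef.TargetHash
		}
	}
	return rest
}

// demo drains a sink queue that starts at a height-3 head chunk.
func demo() map[string]string {
	sink := Store{
		"head": {{"map", 2}, {"leaf", 1}},
		"map":  {{"leaf2", 1}},
	}
	hints := map[string]string{}
	snkQ := []Ref{{"head", 3}}
	for len(snkQ) > 0 {
		snkQ = traverseSink(snkQ, sink, hints)
	}
	return hints
}

func main() {
	hints := demo()
	fmt.Println(hints["leaf2"], len(hints))
}
```

Each entry in `hints` maps a child hash to a sink chunk that references it. Height-1 refs are never read, since they have no children to contribute.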

- let `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)` be
  - pop `comRef` from both `snkQ` and `srcQ`
  - if `comRef.height` > 1
    - if `comRef` is a `Ref` of `Commit`
      - let `v` = `sink.readValue(comRef.targetHash)`
      - if `comRef` == `snkHdRef`
        - *ignore all parent refs*
        - insert each other child ref `cr` from `v` into `snkQ` *only*, set `hints[cr] = comRef`
      - else
        - insert each child ref `cr` from `v` into both `snkQ` and `srcQ`, set `hints[cr] = comRef`

- let `pull(source, sink, srcHdRef, snkHdRef)` be
  - insert `snkHdRef` into `snkQ` and `srcHdRef` into `srcQ`
  - create empty `hints` and `reachableChunks`
  - while `srcQ` is non-empty
    - let `srcHt` and `snkHt` be the respective heights of the *top* `Ref` in each of `srcQ` and `snkQ`
    - if `srcHt` > `snkHt`, for every `ref` in `srcQ` which is of greater height than `snkHt`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `snkHt` > `srcHt`, for every `ref` in `snkQ` which is of greater height than `srcHt`
      - `traverseSink(ref, snkQ, sink, hints)`
    - else
      - for every `comRef` which is common to `snkQ` and `srcQ` and is of height `srcHt` (and `snkHt`)
        - `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)`
      - for every `ref` in `srcQ` which is of height `srcHt`
        - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
      - for every `ref` in `snkQ` which is of height `snkHt`
        - `traverseSink(ref, snkQ, sink, hints)`
  - for all `hash` in `reachableChunks`
    - `sink.batchStore().addHint(hints[hash])`

## Isomorphic, but less clear, algorithm

- let all identifiers be as above
- let `traverseSource`, `traverseSink`, and `traverseCommon` be as above

- let `higherThan(refA, refB)` be
  - if `refA.height == refB.height`
    - return `refA.targetHash < refB.targetHash`
  - return `refA.height > refB.height`
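
The ordering defined by `higherThan` translates almost directly into Go. In this sketch the `Ref` struct is hypothetical, and comparing string hashes stands in for whatever ordering the real hash type provides.

```go
package main

import (
	"fmt"
	"sort"
)

type Ref struct {
	TargetHash string
	Height     uint64
}

// higherThan reports whether a sorts ahead of b: greater height first, with
// ties broken by the smaller target hash.
func higherThan(a, b Ref) bool {
	if a.Height == b.Height {
		return a.TargetHash < b.TargetHash
	}
	return a.Height > b.Height
}

func main() {
	refs := []Ref{{"bb", 2}, {"cc", 3}, {"aa", 2}}
	sort.Slice(refs, func(i, j int) bool { return higherThan(refs[i], refs[j]) })
	fmt.Println(refs) // prints [{cc 3} {aa 2} {bb 2}]
}
```

Because ties are broken by hash, two refs compare as equal in both directions only when they are the same ref, which is what lets the loop detect a common head.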

- let `pull(source, sink, srcHdRef, snkHdRef)` be
  - insert `snkHdRef` into `snkQ` and `srcHdRef` into `srcQ`
  - create empty `hints` and `reachableChunks`
  - while `srcQ` is non-empty
    - if `snkQ` is empty
      - let `ref` be the head of `srcQ`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `higherThan(head of srcQ, head of snkQ)`
      - let `ref` be the head of `srcQ`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `higherThan(head of snkQ, head of srcQ)`
      - let `ref` be the head of `snkQ`
      - `traverseSink(ref, snkQ, sink, hints)`
    - else, heads of both queues are the same
      - let `comRef` be the common head of `snkQ` and `srcQ`
      - `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)`
  - for all `hash` in `reachableChunks`
    - `sink.batchStore().addHint(hints[hash])`
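
The branch selection inside the loop above can be isolated into a small function. This is only a sketch: `nextStep` is a hypothetical helper that names the traversal to run, both queues are modelled as slices whose first element is the head, and `Ref` is again a stand-in type.

```go
package main

import "fmt"

type Ref struct {
	TargetHash string
	Height     uint64
}

// higherThan orders refs by greater height, then by smaller target hash.
func higherThan(a, b Ref) bool {
	if a.Height == b.Height {
		return a.TargetHash < b.TargetHash
	}
	return a.Height > b.Height
}

// nextStep names the traversal the loop would run for the current queue heads.
// Both slices are assumed to be ordered so that index 0 is the head.
func nextStep(srcQ, snkQ []Ref) string {
	switch {
	case len(srcQ) == 0:
		return "done"
	case len(snkQ) == 0 || higherThan(srcQ[0], snkQ[0]):
		return "traverseSource"
	case higherThan(snkQ[0], srcQ[0]):
		return "traverseSink"
	default:
		return "traverseCommon" // same height and same hash
	}
}

func main() {
	src := []Ref{{"cc", 3}}
	snk := []Ref{{"aa", 2}}
	fmt.Println(nextStep(src, snk))
}
```

Folding the empty-sink case into the same branch as a higher source head mirrors the pseudocode, where both cases lead to `traverseSource`.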