github.com/unigraph-dev/dgraph@v1.1.1-0.20200923154953-8b52b426f765/query/thoughts.md

How to generate a unique list of uids by querying a list of posting lists?

Sol 1:
- Say there are k posting lists involved, holding N uids in total.
- One way to do so is to keep a min-heap of k elements, one per posting list.
- At each iteration, we pop() the smallest element from the heap (log k).
- Advance the pointer of that posting list, and retrieve its next element (involves a mutex read lock).
- Push() that element into the heap (log k).
- Since the popped uids come out in sorted order, duplicates are adjacent and can be dropped by comparing against the last uid emitted.
- This would give us O(N log k), with the mutex lock acquired N times.
- With N=1000 and k=5, this gives us 1000 * ln(5) ~ 1600.
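
The heap-based merge above can be sketched in Go roughly as follows. This is a minimal illustration, not the actual dgraph code: the `cursor` type, the `mergeUnique` function, and the use of plain in-memory `[]uint64` slices in place of lock-protected posting lists are all assumptions made for the example, and each input list is assumed to be sorted.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the read position within one posting list.
// Hypothetical type: a real posting list would sit behind a mutex.
type cursor struct {
	uids []uint64 // sorted uids of one posting list
	pos  int      // index of the uid this cursor currently points at
}

// uidHeap is a min-heap of cursors, ordered by the uid each one points at.
type uidHeap []*cursor

func (h uidHeap) Len() int            { return len(h) }
func (h uidHeap) Less(i, j int) bool  { return h[i].uids[h[i].pos] < h[j].uids[h[j].pos] }
func (h uidHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *uidHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *uidHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeUnique merges k sorted lists into one sorted, duplicate-free list.
func mergeUnique(lists [][]uint64) []uint64 {
	h := &uidHeap{}
	for _, l := range lists {
		if len(l) > 0 {
			heap.Push(h, &cursor{uids: l})
		}
	}
	var out []uint64
	for h.Len() > 0 {
		c := heap.Pop(h).(*cursor) // O(log k)
		uid := c.uids[c.pos]
		// Output is sorted, so a duplicate can only equal the last uid emitted.
		if len(out) == 0 || out[len(out)-1] != uid {
			out = append(out, uid)
		}
		c.pos++
		if c.pos < len(c.uids) {
			heap.Push(h, c) // O(log k)
		}
	}
	return out
}

func main() {
	lists := [][]uint64{{1, 3, 5}, {1, 2, 5, 8}, {3, 9}}
	fmt.Println(mergeUnique(lists)) // [1 2 3 5 8 9]
}
```

In the locked variant described above, the `c.pos++` step is where the read lock on the posting list would be taken, which is what makes the lock count grow with N.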

Performance Improvements (memory tradeoff) [Sol1a]:
- We can alleviate the need for mutex locks by copying all the posting list uids into separate vectors.
- This would avoid N lock acquisitions, requiring only k of them (one per posting list).
- But this also means all the posting list uids would be held in memory at once.
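
A rough sketch of the copy-out step, assuming a hypothetical `postingList` type guarded by a `sync.RWMutex` (the type, its fields, and the `snapshot`/`snapshotAll` names are invented for illustration and are not the dgraph API):

```go
package main

import (
	"fmt"
	"sync"
)

// postingList is a hypothetical stand-in for a lock-protected posting list.
type postingList struct {
	mu   sync.RWMutex
	uids []uint64 // kept sorted
}

// snapshot copies every uid out under a single read lock.
func (pl *postingList) snapshot() []uint64 {
	pl.mu.RLock()
	defer pl.mu.RUnlock()
	out := make([]uint64, len(pl.uids))
	copy(out, pl.uids)
	return out
}

// snapshotAll takes one read lock per list (k in total) and returns
// private copies that can then be merged without any further locking.
func snapshotAll(pls []*postingList) [][]uint64 {
	out := make([][]uint64, 0, len(pls))
	for _, pl := range pls {
		out = append(out, pl.snapshot())
	}
	return out
}

func main() {
	pls := []*postingList{
		{uids: []uint64{1, 3, 5}},
		{uids: []uint64{2, 3}},
	}
	fmt.Println(snapshotAll(pls)) // [[1 3 5] [2 3]]
}
```

The memory cost mentioned above shows up in `snapshot`: every list is duplicated in full before the merge starts.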

Performance with Memory [Sol1b]:
- Use k channels, with each channel only maintaining a buffer of, say, 1000 uids.
- In fact, keep the read lock held during this process, to prevent the posting list from changing during a query.
- So, basically, have a way for a posting list to stream uids to a blocking channel after having acquired a read lock.
- Overall, this process of merging uids shouldn't take that long anyway, so this won't starve writes, only delay them.
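
One way the streaming idea could look in Go is sketched below. The `postingList` type, `stream`, and `mergeStreams` are hypothetical names for this example only. For brevity the merge picks the smallest head with a linear scan over the k channels rather than a heap, which is fine for small k but is not the O(N log k) structure from Sol 1.

```go
package main

import (
	"fmt"
	"sync"
)

// postingList is a hypothetical stand-in for a lock-protected posting list.
type postingList struct {
	mu   sync.RWMutex
	uids []uint64 // kept sorted
}

// stream takes the read lock once, then sends the uids over a buffered
// channel. The lock is released only after every uid has been sent, so
// the list cannot change mid-query. The consumer must drain the channel.
func (pl *postingList) stream(bufSize int) <-chan uint64 {
	ch := make(chan uint64, bufSize)
	pl.mu.RLock()
	go func() {
		defer pl.mu.RUnlock()
		defer close(ch)
		for _, uid := range pl.uids {
			ch <- uid // blocks while the buffer is full
		}
	}()
	return ch
}

// mergeStreams merges sorted channels into a sorted, duplicate-free slice.
func mergeStreams(chans []<-chan uint64) []uint64 {
	heads := make([]uint64, len(chans))
	live := make([]bool, len(chans))
	for i, ch := range chans {
		heads[i], live[i] = <-ch
	}
	var out []uint64
	for {
		lo := -1 // index of the channel holding the smallest head
		for i := range heads {
			if live[i] && (lo == -1 || heads[i] < heads[lo]) {
				lo = i
			}
		}
		if lo == -1 {
			return out // every channel is closed and drained
		}
		if len(out) == 0 || out[len(out)-1] != heads[lo] {
			out = append(out, heads[lo])
		}
		heads[lo], live[lo] = <-chans[lo]
	}
}

func main() {
	pls := []*postingList{
		{uids: []uint64{1, 3, 5}},
		{uids: []uint64{1, 2, 5, 8}},
		{uids: []uint64{3, 9}},
	}
	chans := make([]<-chan uint64, 0, len(pls))
	for _, pl := range pls {
		chans = append(chans, pl.stream(1000))
	}
	fmt.Println(mergeStreams(chans)) // [1 2 3 5 8 9]
}
```

Memory stays bounded by k times the buffer size, and each list is locked exactly once, at the cost of writers waiting until the merge has drained that list.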

Another way [Sol2]:
- Pick a posting list, copy all its uids in one go (one mutex lock).
- Use a binary tree to store uids. Eliminate duplicates.
- Iterate over each element in the uids vector, and insert it into the binary tree. [O(log N) max per insert, assuming the tree is kept balanced]
- Repeat with the other posting lists.
- This would give us O(N log N) complexity, with the mutex lock acquired k times.
- With N=1000 and k=5, this gives us 1000 * ln(1000) ~ 7000.
- Not choosing this path.
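
For comparison, the tree-based alternative might look like the sketch below. The notes do not say which binary tree was intended. An AA tree is used here purely as an assumption, because it stays balanced with little code, and the `node`, `insert`, and `inorder` names are hypothetical.

```go
package main

import "fmt"

// node is one AA-tree node (a balanced binary search tree variant).
type node struct {
	uid         uint64
	level       int
	left, right *node
}

// skew removes a left child on the same level with a right rotation.
func skew(n *node) *node {
	if n != nil && n.left != nil && n.left.level == n.level {
		l := n.left
		n.left = l.right
		l.right = n
		return l
	}
	return n
}

// split removes two consecutive same-level right links with a left rotation.
func split(n *node) *node {
	if n != nil && n.right != nil && n.right.right != nil &&
		n.right.right.level == n.level {
		r := n.right
		n.right = r.left
		r.left = n
		r.level++
		return r
	}
	return n
}

// insert adds uid to the tree and silently ignores duplicates.
func insert(n *node, uid uint64) *node {
	if n == nil {
		return &node{uid: uid, level: 1}
	}
	switch {
	case uid < n.uid:
		n.left = insert(n.left, uid)
	case uid > n.uid:
		n.right = insert(n.right, uid)
	default:
		return n // already present
	}
	return split(skew(n))
}

// inorder appends the uids in ascending order to out.
func inorder(n *node, out []uint64) []uint64 {
	if n == nil {
		return out
	}
	out = inorder(n.left, out)
	out = append(out, n.uid)
	return inorder(n.right, out)
}

func main() {
	// Each inner slice stands for one posting list copied out under one lock.
	lists := [][]uint64{{1, 3, 5}, {1, 2, 5, 8}, {3, 9}}
	var root *node
	for _, l := range lists {
		for _, uid := range l {
			root = insert(root, uid)
		}
	}
	fmt.Println(inorder(root, nil)) // [1 2 3 5 8 9]
}
```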

Solution: Sol1b