github.com/unigraph-dev/dgraph@v1.1.1-0.20200923154953-8b52b426f765/query/thoughts.md

How to generate a unique list of uids by querying a list of posting lists?

Sol 1:
- Say there are k posting lists involved.
- One way to do so is to keep a heap of k elements.
- At each iteration, we pop() an element from the heap (O(log k)).
- Advance the pointer of that posting list, and retrieve another element (involves a mutex read lock).
- Push() that element into the heap (O(log k)).
- This would give us O(N log k), with the mutex lock acquired N times.
- With N=1000 and k=5, this gives us 1000 * ln(5) ~ 1600

Performance Improvements (memory tradeoff) [Sol1a]:
- We can alleviate the need for mutex locks by copying all the posting list uids into separate vectors.
- This would avoid N lock acquisitions, requiring only k of them, which is the best case.
- But this also means all the posting list uids would be stored in memory.

Performance with Memory [Sol1b]:
- Use k channels, with each channel only maintaining a buffer of, say, 1000 uids.
- In fact, keep the read lock held during this process, to prevent the posting list from changing during a query.
- So, basically, have a way for a posting list to stream uids to a blocking channel after having acquired a read lock.
- Overall, this process of merging uids shouldn't take that long anyway, so this won't starve writes, only delay them.

Another way [Sol2]:
- Pick a posting list, copy all its uids in one go (one mutex lock).
- Use a (balanced) binary tree to store uids. Eliminate duplicates.
- Iterate over each element in the uids vector, and insert it into the binary tree. [O(log N) max per insert]
- Repeat with the other posting lists.
- This would give us O(N log N) complexity, with the mutex lock acquired k times.
- With N=1000 and k=5, this gives us 1000 * ln(1000) ~ 7000
- Not choosing this path.

Solution: Sol1b
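
A rough sketch of how Sol1b could look in Go, to make the idea concrete. This is not the actual dgraph code: `postingList`, `stream`, `head`, `minHeap` and `mergeUnique` are hypothetical names, and the sketch assumes each posting list already holds its uids in sorted order. Each list streams its uids into a buffered channel while holding its read lock, and a heap of k elements merges the channel heads, dropping duplicates.

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

// postingList is a hypothetical stand-in for a real posting list:
// a sorted slice of uids guarded by a read-write mutex.
type postingList struct {
	sync.RWMutex
	uids []uint64
}

// stream sends the uids, in order, into a channel with the given buffer
// size. The read lock is held until every uid has been sent, so the list
// cannot change while a query is reading it.
func (pl *postingList) stream(buf int) <-chan uint64 {
	ch := make(chan uint64, buf)
	go func() {
		pl.RLock()
		defer pl.RUnlock()
		defer close(ch) // runs before RUnlock (deferred calls are LIFO)
		for _, u := range pl.uids {
			ch <- u // blocks once the buffer is full
		}
	}()
	return ch
}

// head is the current smallest unread uid of one posting list.
type head struct {
	uid uint64
	src int // index of the channel this uid came from
}

// minHeap implements heap.Interface, ordered by uid.
type minHeap []head

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].uid < h[j].uid }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(head)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeUnique merges k sorted posting lists into one sorted, duplicate-free
// slice of uids. The caller must drain everything, which this function does;
// stopping early would leave the streaming goroutines blocked with the
// read locks still held.
func mergeUnique(lists []*postingList, buf int) []uint64 {
	chans := make([]<-chan uint64, len(lists))
	h := &minHeap{}
	for i, pl := range lists {
		chans[i] = pl.stream(buf)
		if u, ok := <-chans[i]; ok {
			heap.Push(h, head{u, i})
		}
	}
	var out []uint64
	for h.Len() > 0 {
		top := heap.Pop(h).(head) // O(log k)
		if len(out) == 0 || out[len(out)-1] != top.uid {
			out = append(out, top.uid) // skip uids already emitted
		}
		if u, ok := <-chans[top.src]; ok {
			heap.Push(h, head{u, top.src}) // O(log k)
		}
	}
	return out
}

func main() {
	lists := []*postingList{
		{uids: []uint64{1, 4, 7}},
		{uids: []uint64{2, 4, 8}},
		{uids: []uint64{1, 8, 9}},
	}
	fmt.Println(mergeUnique(lists, 1000)) // prints [1 2 4 7 8 9]
}
```

Because the output is produced in sorted order, a duplicate can only be equal to the last uid emitted, so a single comparison is enough and no separate set is needed. Memory stays bounded by roughly k times the channel buffer size, rather than by the total number of uids as in Sol1a.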