github.com/grumpyhome/grumpy@v0.3.1-0.20201208125205-7b775405bdf1/grumpy-runtime-src/third_party/stdlib/heapq.py (about)

     1  # -*- coding: latin-1 -*-
     2  
     3  """Heap queue algorithm (a.k.a. priority queue).
     4  
     5  Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
     6  all k, counting elements from 0.  For the sake of comparison,
     7  non-existing elements are considered to be infinite.  The interesting
     8  property of a heap is that a[0] is always its smallest element.
     9  
    10  Usage:
    11  
    12  heap = []            # creates an empty heap
    13  heappush(heap, item) # pushes a new item on the heap
    14  item = heappop(heap) # pops the smallest item from the heap
    15  item = heap[0]       # smallest item on the heap without popping it
    16  heapify(x)           # transforms list into a heap, in-place, in linear time
    17  item = heapreplace(heap, item) # pops and returns smallest item, and adds
    18                                 # new item; the heap size is unchanged
    19  
    20  Our API differs from textbook heap algorithms as follows:
    21  
    22  - We use 0-based indexing.  This makes the relationship between the
    23    index for a node and the indexes for its children slightly less
    24    obvious, but is more suitable since Python uses 0-based indexing.
    25  
    26  - Our heappop() method returns the smallest item, not the largest.
    27  
    28  These two make it possible to view the heap as a regular Python list
    29  without surprises: heap[0] is the smallest item, and heap.sort()
    30  maintains the heap invariant!
    31  """
    32  
    33  # Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
    34  
    35  __about__ = """Heap queues
    36  
    37  [explanation by François Pinard]
    38  
    39  Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
    40  all k, counting elements from 0.  For the sake of comparison,
    41  non-existing elements are considered to be infinite.  The interesting
    42  property of a heap is that a[0] is always its smallest element.
    43  
    44  The strange invariant above is meant to be an efficient memory
    45  representation for a tournament.  The numbers below are `k', not a[k]:
    46  
    47                                     0
    48  
    49                    1                                 2
    50  
    51            3               4                5               6
    52  
    53        7       8       9       10      11      12      13      14
    54  
    55      15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
    56  
    57  
    58  In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'.  In
    59  a usual binary tournament we see in sports, each cell is the winner
    60  over the two cells it tops, and we can trace the winner down the tree
    61  to see all opponents s/he had.  However, in many computer applications
    62  of such tournaments, we do not need to trace the history of a winner.
    63  To be more memory efficient, when a winner is promoted, we try to
    64  replace it by something else at a lower level, and the rule becomes
    65  that a cell and the two cells it tops contain three different items,
    66  but the top cell "wins" over the two topped cells.
    67  
    68  If this heap invariant is protected at all time, index 0 is clearly
    69  the overall winner.  The simplest algorithmic way to remove it and
    70  find the "next" winner is to move some loser (let's say cell 30 in the
    71  diagram above) into the 0 position, and then percolate this new 0 down
    72  the tree, exchanging values, until the invariant is re-established.
    73  This is clearly logarithmic on the total number of items in the tree.
    74  By iterating over all items, you get an O(n ln n) sort.
    75  
    76  A nice feature of this sort is that you can efficiently insert new
    77  items while the sort is going on, provided that the inserted items are
    78  not "better" than the last 0'th element you extracted.  This is
    79  especially useful in simulation contexts, where the tree holds all
    80  incoming events, and the "win" condition means the smallest scheduled
    81  time.  When an event schedule other events for execution, they are
    82  scheduled into the future, so they can easily go into the heap.  So, a
    83  heap is a good structure for implementing schedulers (this is what I
    84  used for my MIDI sequencer :-).
    85  
    86  Various structures for implementing schedulers have been extensively
    87  studied, and heaps are good for this, as they are reasonably speedy,
    88  the speed is almost constant, and the worst case is not much different
    89  than the average case.  However, there are other representations which
    90  are more efficient overall, yet the worst cases might be terrible.
    91  
    92  Heaps are also very useful in big disk sorts.  You most probably all
    93  know that a big sort implies producing "runs" (which are pre-sorted
    94  sequences, which size is usually related to the amount of CPU memory),
    95  followed by a merging passes for these runs, which merging is often
    96  very cleverly organised[1].  It is very important that the initial
    97  sort produces the longest runs possible.  Tournaments are a good way
    98  to that.  If, using all the memory available to hold a tournament, you
    99  replace and percolate items that happen to fit the current run, you'll
   100  produce runs which are twice the size of the memory for random input,
   101  and much better for input fuzzily ordered.
   102  
   103  Moreover, if you output the 0'th item on disk and get an input which
   104  may not fit in the current tournament (because the value "wins" over
   105  the last output value), it cannot fit in the heap, so the size of the
   106  heap decreases.  The freed memory could be cleverly reused immediately
   107  for progressively building a second heap, which grows at exactly the
   108  same rate the first heap is melting.  When the first heap completely
   109  vanishes, you switch heaps and start a new run.  Clever and quite
   110  effective!
   111  
   112  In a word, heaps are useful memory structures to know.  I use them in
   113  a few applications, and I think it is good to keep a `heap' module
   114  around. :-)
   115  
   116  --------------------
   117  [1] The disk balancing algorithms which are current, nowadays, are
   118  more annoying than clever, and this is a consequence of the seeking
   119  capabilities of the disks.  On devices which cannot seek, like big
   120  tape drives, the story was quite different, and one had to be very
   121  clever to ensure (far in advance) that each tape movement will be the
   122  most effective possible (that is, will best participate at
   123  "progressing" the merge).  Some tapes were even able to read
   124  backwards, and this was also used to avoid the rewinding time.
   125  Believe me, real good tape sorts were quite spectacular to watch!
   126  From all times, sorting has always been a Great Art! :-)
   127  """
   128  
   129  __all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
   130             'nlargest', 'nsmallest', 'heappushpop']
   131  
   132  import itertools
   133  islice = itertools.islice
   134  count = itertools.count
   135  imap = itertools.imap
   136  izip = itertools.izip
   137  tee = itertools.tee
   138  chain = itertools.chain
   139  import operator
   140  itemgetter = operator.itemgetter
   141  
   142  def cmp_lt(x, y):
   143      # Use __lt__ if available; otherwise, try __le__.
   144      # In Py3.x, only __lt__ will be called.
   145      return (x < y) if hasattr(x, '__lt__') else (not y <= x)
   146  
   147  def heappush(heap, item):
   148      """Push item onto heap, maintaining the heap invariant."""
   149      heap.append(item)
   150      _siftdown(heap, 0, len(heap)-1)
   151  
   152  def heappop(heap):
   153      """Pop the smallest item off the heap, maintaining the heap invariant."""
   154      lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
   155      if heap:
   156          returnitem = heap[0]
   157          heap[0] = lastelt
   158          _siftup(heap, 0)
   159      else:
   160          returnitem = lastelt
   161      return returnitem
   162  
   163  def heapreplace(heap, item):
   164      """Pop and return the current smallest value, and add the new item.
   165  
   166      This is more efficient than heappop() followed by heappush(), and can be
   167      more appropriate when using a fixed-size heap.  Note that the value
   168      returned may be larger than item!  That constrains reasonable uses of
   169      this routine unless written as part of a conditional replacement:
   170  
   171          if item > heap[0]:
   172              item = heapreplace(heap, item)
   173      """
   174      returnitem = heap[0]    # raises appropriate IndexError if heap is empty
   175      heap[0] = item
   176      _siftup(heap, 0)
   177      return returnitem
   178  
   179  def heappushpop(heap, item):
   180      """Fast version of a heappush followed by a heappop."""
   181      if heap and cmp_lt(heap[0], item):
   182          item, heap[0] = heap[0], item
   183          _siftup(heap, 0)
   184      return item
   185  
   186  def heapify(x):
   187      """Transform list into a heap, in-place, in O(len(x)) time."""
   188      n = len(x)
   189      # Transform bottom-up.  The largest index there's any point to looking at
   190      # is the largest with a child index in-range, so must have 2*i + 1 < n,
   191      # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
   192      # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
   193      # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
   194      for i in reversed(xrange(n//2)):
   195          _siftup(x, i)
   196  
   197  def _heappushpop_max(heap, item):
   198      """Maxheap version of a heappush followed by a heappop."""
   199      if heap and cmp_lt(item, heap[0]):
   200          item, heap[0] = heap[0], item
   201          _siftup_max(heap, 0)
   202      return item
   203  
   204  def _heapify_max(x):
   205      """Transform list into a maxheap, in-place, in O(len(x)) time."""
   206      n = len(x)
   207      for i in reversed(range(n//2)):
   208          _siftup_max(x, i)
   209  
   210  def nlargest(n, iterable):
   211      """Find the n largest elements in a dataset.
   212  
   213      Equivalent to:  sorted(iterable, reverse=True)[:n]
   214      """
   215      if n < 0:
   216          return []
   217      it = iter(iterable)
   218      result = list(islice(it, n))
   219      if not result:
   220          return result
   221      heapify(result)
   222      _heappushpop = heappushpop
   223      for elem in it:
   224          _heappushpop(result, elem)
   225      result.sort(reverse=True)
   226      return result
   227  
   228  def nsmallest(n, iterable):
   229      """Find the n smallest elements in a dataset.
   230  
   231      Equivalent to:  sorted(iterable)[:n]
   232      """
   233      if n < 0:
   234          return []
   235      it = iter(iterable)
   236      result = list(islice(it, n))
   237      if not result:
   238          return result
   239      _heapify_max(result)
   240      _heappushpop = _heappushpop_max
   241      for elem in it:
   242          _heappushpop(result, elem)
   243      result.sort()
   244      return result
   245  
   246  # 'heap' is a heap at all indices >= startpos, except possibly for pos.  pos
   247  # is the index of a leaf with a possibly out-of-order value.  Restore the
   248  # heap invariant.
   249  def _siftdown(heap, startpos, pos):
   250      newitem = heap[pos]
   251      # Follow the path to the root, moving parents down until finding a place
   252      # newitem fits.
   253      while pos > startpos:
   254          parentpos = (pos - 1) >> 1
   255          parent = heap[parentpos]
   256          if cmp_lt(newitem, parent):
   257              heap[pos] = parent
   258              pos = parentpos
   259              continue
   260          break
   261      heap[pos] = newitem
   262  
   263  # The child indices of heap index pos are already heaps, and we want to make
   264  # a heap at index pos too.  We do this by bubbling the smaller child of
   265  # pos up (and so on with that child's children, etc) until hitting a leaf,
   266  # then using _siftdown to move the oddball originally at index pos into place.
   267  #
   268  # We *could* break out of the loop as soon as we find a pos where newitem <=
   269  # both its children, but turns out that's not a good idea, and despite that
   270  # many books write the algorithm that way.  During a heap pop, the last array
   271  # element is sifted in, and that tends to be large, so that comparing it
   272  # against values starting from the root usually doesn't pay (= usually doesn't
   273  # get us out of the loop early).  See Knuth, Volume 3, where this is
   274  # explained and quantified in an exercise.
   275  #
   276  # Cutting the # of comparisons is important, since these routines have no
   277  # way to extract "the priority" from an array element, so that intelligence
   278  # is likely to be hiding in custom __cmp__ methods, or in array elements
   279  # storing (priority, record) tuples.  Comparisons are thus potentially
   280  # expensive.
   281  #
   282  # On random arrays of length 1000, making this change cut the number of
   283  # comparisons made by heapify() a little, and those made by exhaustive
   284  # heappop() a lot, in accord with theory.  Here are typical results from 3
   285  # runs (3 just to demonstrate how small the variance is):
   286  #
   287  # Compares needed by heapify     Compares needed by 1000 heappops
   288  # --------------------------     --------------------------------
   289  # 1837 cut to 1663               14996 cut to 8680
   290  # 1855 cut to 1659               14966 cut to 8678
   291  # 1847 cut to 1660               15024 cut to 8703
   292  #
   293  # Building the heap by using heappush() 1000 times instead required
   294  # 2198, 2148, and 2219 compares:  heapify() is more efficient, when
   295  # you can use it.
   296  #
   297  # The total compares needed by list.sort() on the same lists were 8627,
   298  # 8627, and 8632 (this should be compared to the sum of heapify() and
   299  # heappop() compares):  list.sort() is (unsurprisingly!) more efficient
   300  # for sorting.
   301  
   302  def _siftup(heap, pos):
   303      endpos = len(heap)
   304      startpos = pos
   305      newitem = heap[pos]
   306      # Bubble up the smaller child until hitting a leaf.
   307      childpos = 2*pos + 1    # leftmost child position
   308      while childpos < endpos:
   309          # Set childpos to index of smaller child.
   310          rightpos = childpos + 1
   311          if rightpos < endpos and not cmp_lt(heap[childpos], heap[rightpos]):
   312              childpos = rightpos
   313          # Move the smaller child up.
   314          heap[pos] = heap[childpos]
   315          pos = childpos
   316          childpos = 2*pos + 1
   317      # The leaf at pos is empty now.  Put newitem there, and bubble it up
   318      # to its final resting place (by sifting its parents down).
   319      heap[pos] = newitem
   320      _siftdown(heap, startpos, pos)
   321  
   322  def _siftdown_max(heap, startpos, pos):
   323      'Maxheap variant of _siftdown'
   324      newitem = heap[pos]
   325      # Follow the path to the root, moving parents down until finding a place
   326      # newitem fits.
   327      while pos > startpos:
   328          parentpos = (pos - 1) >> 1
   329          parent = heap[parentpos]
   330          if cmp_lt(parent, newitem):
   331              heap[pos] = parent
   332              pos = parentpos
   333              continue
   334          break
   335      heap[pos] = newitem
   336  
   337  def _siftup_max(heap, pos):
   338      'Maxheap variant of _siftup'
   339      endpos = len(heap)
   340      startpos = pos
   341      newitem = heap[pos]
   342      # Bubble up the larger child until hitting a leaf.
   343      childpos = 2*pos + 1    # leftmost child position
   344      while childpos < endpos:
   345          # Set childpos to index of larger child.
   346          rightpos = childpos + 1
   347          if rightpos < endpos and not cmp_lt(heap[rightpos], heap[childpos]):
   348              childpos = rightpos
   349          # Move the larger child up.
   350          heap[pos] = heap[childpos]
   351          pos = childpos
   352          childpos = 2*pos + 1
   353      # The leaf at pos is empty now.  Put newitem there, and bubble it up
   354      # to its final resting place (by sifting its parents down).
   355      heap[pos] = newitem
   356      _siftdown_max(heap, startpos, pos)
   357  
   358  # If available, use C implementation
   359  #try:
   360  #    import _heapq
   361  #except ImportError:
   362  #    pass
   363  
   364  def merge(*iterables):
   365      '''Merge multiple sorted inputs into a single sorted output.
   366  
   367      Similar to sorted(itertools.chain(*iterables)) but returns a generator,
   368      does not pull the data into memory all at once, and assumes that each of
   369      the input streams is already sorted (smallest to largest).
   370  
   371      >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
   372      [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
   373  
   374      '''
   375      _heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration
   376      _len = len
   377  
   378      h = []
   379      h_append = h.append
   380      for itnum, it in enumerate(map(iter, iterables)):
   381          try:
   382              next = it.next
   383              h_append([next(), itnum, next])
   384          except _StopIteration:
   385              pass
   386      heapify(h)
   387  
   388      while _len(h) > 1:
   389          try:
   390              while 1:
   391                  v, itnum, next = s = h[0]
   392                  yield v
   393                  s[0] = next()               # raises StopIteration when exhausted
   394                  _heapreplace(h, s)          # restore heap condition
   395          except _StopIteration:
   396              _heappop(h)                     # remove empty iterator
   397      if h:
   398          # fast case when only a single iterator remains
   399          v, itnum, next = h[0]
   400          yield v
   401          for v in next.__self__:
   402              yield v
   403  
   404  # Extend the implementations of nsmallest and nlargest to use a key= argument
   405  _nsmallest = nsmallest
   406  def nsmallest(n, iterable, key=None):
   407      """Find the n smallest elements in a dataset.
   408  
   409      Equivalent to:  sorted(iterable, key=key)[:n]
   410      """
   411      # Short-cut for n==1 is to use min() when len(iterable)>0
   412      if n == 1:
   413          it = iter(iterable)
   414          head = list(islice(it, 1))
   415          if not head:
   416              return []
   417          if key is None:
   418              return [min(chain(head, it))]
   419          return [min(chain(head, it), key=key)]
   420  
   421      # When n>=size, it's faster to use sorted()
   422      try:
   423          size = len(iterable)
   424      except (TypeError, AttributeError):
   425          pass
   426      else:
   427          if n >= size:
   428              return sorted(iterable, key=key)[:n]
   429  
   430      # When key is none, use simpler decoration
   431      if key is None:
   432          it = izip(iterable, count())                        # decorate
   433          result = _nsmallest(n, it)
   434          return map(itemgetter(0), result)                   # undecorate
   435  
   436      # General case, slowest method
   437      in1, in2 = tee(iterable)
   438      it = izip(imap(key, in1), count(), in2)                 # decorate
   439      result = _nsmallest(n, it)
   440      return map(itemgetter(2), result)                       # undecorate
   441  
   442  _nlargest = nlargest
   443  def nlargest(n, iterable, key=None):
   444      """Find the n largest elements in a dataset.
   445  
   446      Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
   447      """
   448  
   449      # Short-cut for n==1 is to use max() when len(iterable)>0
   450      if n == 1:
   451          it = iter(iterable)
   452          head = list(islice(it, 1))
   453          if not head:
   454              return []
   455          if key is None:
   456              return [max(chain(head, it))]
   457          return [max(chain(head, it), key=key)]
   458  
   459      # When n>=size, it's faster to use sorted()
   460      try:
   461          size = len(iterable)
   462      except (TypeError, AttributeError):
   463          pass
   464      else:
   465          if n >= size:
   466              return sorted(iterable, key=key, reverse=True)[:n]
   467  
   468      # When key is none, use simpler decoration
   469      if key is None:
   470          it = izip(iterable, count(0,-1))                    # decorate
   471          result = _nlargest(n, it)
   472          return map(itemgetter(0), result)                   # undecorate
   473  
   474      # General case, slowest method
   475      in1, in2 = tee(iterable)
   476      it = izip(imap(key, in1), count(0,-1), in2)             # decorate
   477      result = _nlargest(n, it)
   478      return map(itemgetter(2), result)                       # undecorate
   479  
   480  #if __name__ == "__main__":
   481  #    # Simple sanity test
   482  #    heap = []
   483  #    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
   484  #    for item in data:
   485  #        heappush(heap, item)
   486  #    sort = []
   487  #    while heap:
   488  #        sort.append(heappop(heap))
   489  #    print sort
   490  #
   491  #    import doctest
   492  #    doctest.testmod()