Whilst the Python list is great for storing an arbitrary number of elements, and for retrieving elements quickly by position, it does not perform well when we repeatedly need the largest or smallest item from a huge volume of data. For example, imagine we have a list with millions of objects in random order, and we need to continually retrieve the largest object. In this situation, Python needs to scan the entire list every time to find it. This is not an efficient approach!
And that’s where a priority queue comes in. A priority queue is an abstract data structure designed to allow fast retrieval of the largest or smallest value, even when a very large number of items is being stored.
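As a rough illustration of the difference, the sketch below retrieves the largest values twice: once by scanning a list each time, and once by popping from a heap built with Python's heapq module (covered later in this section). The function names and sample data are invented for this example. Because heapq always pops the smallest value, the values are negated on the way in and out.

```python
import heapq

def largest_by_scanning(values, count):
    """Repeatedly scan the whole list for the largest value: O(n) per retrieval."""
    remaining = list(values)
    result = []
    for _ in range(count):
        biggest = max(remaining)      # looks at every remaining element
        remaining.remove(biggest)     # another linear pass to remove it
        result.append(biggest)
    return result

def largest_by_heap(values, count):
    """Pop from a heap instead: O(log n) per retrieval after an O(n) build."""
    heap = [-v for v in values]       # negate so the largest value pops first
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(count)]

sample = [17, 3, 42, 8, 25, 1]
print(largest_by_scanning(sample, 3))  # [42, 25, 17]
print(largest_by_heap(sample, 3))      # [42, 25, 17]
```

With six items the two approaches are indistinguishable; the gap only becomes noticeable as the collection grows.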
Typical use cases for a priority queue are:
What does abstract data structure even mean? It means that however we implement this data structure, it needs to adhere to a few rules. An implementation of a priority queue should include these operations:
Heaps are commonly used to implement priority queues with performance guarantees that depend only on the size of the overall data structure. This is possible because heaps implement a priority queue in the form of a binary tree.
Recall from graphs that a tree is an acyclic graph - i.e. one in which no loops are created - where any two vertices are connected by exactly one path.
A heap has additional characteristics, called the heap property:
The time taken to push to or pop from a heap is guaranteed to be, at worst, proportional to the base-2 log of the size of the queue, i.e. O(log n).
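One way to see this bound in practice is to count comparisons. The sketch below is only an illustration: Counted is a hypothetical wrapper class written for this example, and it uses Python's heapq module as the heap. It pushes a new smallest item onto a heap of 1023 items, which forces the item to travel all the way up to the root, and compares the number of comparisons made with the base-2 log of the heap size.

```python
import heapq
import math

class Counted:
    """Wraps a value and counts every '<' comparison made on any instance."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

heap = [Counted(v) for v in range(1, 1024)]   # 1023 items
heapq.heapify(heap)

Counted.comparisons = 0                       # ignore the cost of building the heap
heapq.heappush(heap, Counted(0))              # new smallest item rises to the root
push_comparisons = Counted.comparisons

print("heap size:", len(heap))
print("comparisons for one push:", push_comparisons)
print("floor(log2(size)):", math.floor(math.log2(len(heap))))
```

The push compares the new item with one parent per level of the tree, so the work grows with the height of the tree rather than with the number of items.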
In Python, we have a ready-made heap available in the form of the heapq module.
It is a complete binary tree implemented on top of a list. For this reason, it is easy to convert an existing list or iterable to a heapq.
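A minimal sketch of that conversion, using heapq.heapify() on some made-up numbers:

```python
import heapq

numbers = [9, 4, 7, 1, 8, 2]
heapq.heapify(numbers)        # rearranges the list in place, in linear time

print(type(numbers))          # still a plain list
print(numbers[0])             # the smallest item always sits at index 0

# Any other iterable can be copied into a list first, then heapified.
from_iterable = list(range(5, 0, -1))
heapq.heapify(from_iterable)
smallest = heapq.heappop(from_iterable)
print(smallest)
```

Note that heapify() does not create a new type of object. The heap is an ordinary list whose items have been arranged to satisfy the heap property, so it should only be modified through the heapq functions from then on.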
Recall that the heap property requires that a parent has a value no larger than (i.e. a priority at least as high as) either of its children. This is implemented in the heapq as shown below.
The arrows indicate the children of any given parent, and show where the children are always located in the heapq implementation.
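In index terms, heapq stores the children of the item at index k at indices 2k + 1 and 2k + 2 of the underlying list. The helper functions below are not part of heapq; they are hypothetical names written for this sketch to make the layout explicit and to check the heap property.

```python
import heapq

def children(k):
    """Indices where the children of the item at index k are stored."""
    return 2 * k + 1, 2 * k + 2

def parent(k):
    """Index where the parent of the item at index k is stored."""
    return (k - 1) // 2

def satisfies_heap_property(items):
    """True if no item is smaller than its parent."""
    return all(items[parent(k)] <= items[k] for k in range(1, len(items)))

heap = [5, 12, 3, 20, 7, 9]
heapq.heapify(heap)

print(children(0))                     # (1, 2)
print(satisfies_heap_property(heap))   # True
```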
heapq pops based on the smallest value having the highest priority.
So, if we (say) store state objects in the queue, we could implement the __lt__() method of the state object such that a < b if a is closer to the goal.
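A minimal sketch of this idea follows. The State class and its distance_to_goal attribute are hypothetical, invented purely for illustration; how "closer to the goal" is measured depends on the problem being solved.

```python
import heapq
from dataclasses import dataclass

@dataclass
class State:
    name: str
    distance_to_goal: int   # hypothetical measure; smaller means closer

    def __lt__(self, other):
        # a < b when a is closer to the goal, so a is popped before b
        return self.distance_to_goal < other.distance_to_goal

frontier = []
for state in (State("far", 12), State("near", 2), State("middle", 6)):
    heapq.heappush(frontier, state)

closest = heapq.heappop(frontier)
print(closest.name)  # near
```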
heapq.heappushpop(heap, element) - pushes element, then pops and returns the smallest item; more efficient than a separate push followed by a pop
heapq.merge(iterable1, iterable2, …) - merges already-sorted iterables and returns an iterator over the sorted result
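A short sketch of both functions, using made-up values:

```python
import heapq

heap = [4, 9, 6]
heapq.heapify(heap)

# Push 5, then pop the smallest item (4) in a single operation.
smallest = heapq.heappushpop(heap, 5)
print(smallest)        # 4
print(sorted(heap))    # [5, 6, 9]

# If the pushed item is itself the smallest, it comes straight back
# and the heap is left unchanged.
also_smallest = heapq.heappushpop(heap, 1)
print(also_smallest)   # 1

# merge() expects each input to be sorted already, and yields items lazily.
merged = list(heapq.merge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
print(merged)          # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```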
Recall that heapq always pops the item with the smallest value. Thus, highest priority = smallest value. Bear this in mind when adding objects to a heapq.
An alternative way to add things to a queue with our desired priority is to add them as the second element of a tuple. This works because if we add tuples to a heapq, the first item in the tuple is used for the priority.
heapq.heappush(frontier, (priority, item))
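One caveat with plain (priority, item) tuples: tuples are compared element by element, so when two priorities tie, Python goes on to compare the items themselves, which raises a TypeError if the items do not support <. A common workaround, sketched below, is to add an increasing counter as a tie-breaker between the priority and the item. The add_item() and pop_item() helpers and the task data are assumptions made for this example, not part of heapq.

```python
import heapq
import itertools

frontier = []
counter = itertools.count()   # tie-breaker, so the items are never compared

def add_item(item, priority):
    """Push item with the given priority (smaller number = popped sooner)."""
    heapq.heappush(frontier, (priority, next(counter), item))

def pop_item():
    """Pop and return the item with the smallest priority."""
    priority, _, item = heapq.heappop(frontier)
    return item

# Dictionaries cannot be compared with <, so the tie-breaker matters here.
add_item({"task": "write report"}, 2)
add_item({"task": "fix bug"}, 1)
add_item({"task": "review code"}, 2)   # same priority as "write report"

order = [pop_item()["task"] for _ in range(3)]
print(order)  # ['fix bug', 'write report', 'review code']
```

Items with equal priority come out in the order they were added, because the counter value is compared before the item is ever reached.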
We’ll look at some real heapq usage when we move on to look at Shortest Path Algorithms.