Python heapq: Priority Queues and Heap Operations Made Simple
Updated on
You need to process items by priority -- tasks in a job scheduler, events in a simulation, or the shortest path in a graph. A sorted list seems like the obvious choice, but inserting into a sorted list is O(n), which becomes a bottleneck when you're processing millions of items. Sorting the entire list every time you add an element is even worse at O(n log n).
The cost compounds quickly. A job scheduler that re-sorts its queue on every insertion wastes most of its time on bookkeeping. A pathfinding algorithm that scans a full list for the minimum distance on every step turns a linear-time operation into a quadratic one.
Python's heapq module provides a heap-based priority queue that offers O(log n) insertion and O(log n) extraction of the smallest element. It also includes optimized functions for finding the N largest or smallest elements from any iterable without sorting the entire collection. This guide covers every function in heapq with practical examples.
What is a Heap?
A heap is a binary tree where each parent node is smaller than or equal to its children (min-heap). Python's heapq implements a min-heap using a regular list, where the smallest element is always at index 0.
The key operations and their time complexities:
| Operation | Function | Time Complexity |
|---|---|---|
| Push an item | heapq.heappush(heap, item) | O(log n) |
| Pop smallest | heapq.heappop(heap) | O(log n) |
| Push then pop | heapq.heappushpop(heap, item) | O(log n) |
| Pop then push | heapq.heapreplace(heap, item) | O(log n) |
| Build heap from list | heapq.heapify(list) | O(n) |
| N smallest items | heapq.nsmallest(n, iterable) | O(n log k) |
| N largest items | heapq.nlargest(n, iterable) | O(n log k) |
Basic Heap Operations
Creating and Using a Heap
import heapq
# Start with an empty heap (just a regular list)
heap = []
# Push items
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
heapq.heappush(heap, 2)
print(heap) # [1, 2, 3, 5] (heap order, not sorted)
print(heap[0]) # 1 (smallest element, always at index 0)
# Pop smallest item
smallest = heapq.heappop(heap)
print(smallest) # 1
print(heap) # [2, 5, 3]Converting a List to a Heap
import heapq
data = [5, 3, 8, 1, 2, 7, 4, 6]
heapq.heapify(data) # In-place, O(n)
print(data) # [1, 2, 4, 3, 5, 7, 8, 6] (heap order)
# Pop elements in sorted order
sorted_data = []
while data:
sorted_data.append(heapq.heappop(data))
print(sorted_data) # [1, 2, 3, 4, 5, 6, 7, 8]heappushpop and heapreplace
These combined operations are more efficient than separate push/pop calls:
import heapq
heap = [1, 3, 5, 7]
heapq.heapify(heap)
# heappushpop: push first, then pop smallest
result = heapq.heappushpop(heap, 2)
print(result) # 1 (pushed 2, popped 1)
print(heap) # [2, 3, 5, 7]
# heapreplace: pop smallest first, then push
result = heapq.heapreplace(heap, 10)
print(result) # 2 (popped 2, pushed 10)
print(heap) # [3, 7, 5, 10]nlargest and nsmallest
These functions find the N largest or smallest elements without sorting the entire collection.
import heapq
scores = [85, 92, 78, 95, 88, 72, 91, 83, 97, 69]
# Top 3 scores
top_3 = heapq.nlargest(3, scores)
print(top_3) # [97, 95, 92]
# Bottom 3 scores
bottom_3 = heapq.nsmallest(3, scores)
print(bottom_3) # [69, 72, 78]With a Key Function
Both accept a key parameter for custom comparison:
import heapq
students = [
{'name': 'Alice', 'gpa': 3.9},
{'name': 'Bob', 'gpa': 3.5},
{'name': 'Charlie', 'gpa': 3.8},
{'name': 'Diana', 'gpa': 3.95},
{'name': 'Eve', 'gpa': 3.2},
]
top_students = heapq.nlargest(3, students, key=lambda s: s['gpa'])
for s in top_students:
print(f"{s['name']}: {s['gpa']}")
# Diana: 3.95
# Alice: 3.9
# Charlie: 3.8When to Use nlargest/nsmallest vs sorted()
| Scenario | Best Approach | Why |
|---|---|---|
| N is 1 | min() / max() | O(n), simplest |
| N is small relative to total | nlargest / nsmallest | O(n log k), less memory |
| N is close to total length | sorted(iterable)[:n] | O(n log n), but fast in practice |
| Need all items sorted | sorted() | Full sort is appropriate |
Implementing a Max-Heap
Python's heapq only provides a min-heap. To create a max-heap, negate the values:
import heapq
# Max-heap using negation
max_heap = []
for val in [5, 1, 3, 8, 2]:
heapq.heappush(max_heap, -val)
# Pop largest items
while max_heap:
print(-heapq.heappop(max_heap), end=' ')
# 8 5 3 2 1Priority Queue with Custom Objects
For objects that aren't directly comparable, use tuples (priority, counter, item):
import heapq
from dataclasses import dataclass
@dataclass
class Task:
name: str
description: str
# Use (priority, tie_breaker, task) tuples
task_queue = []
counter = 0
def add_task(priority, task):
global counter
heapq.heappush(task_queue, (priority, counter, task))
counter += 1
def get_next_task():
if task_queue:
priority, _, task = heapq.heappop(task_queue)
return task
return None
add_task(3, Task("Low", "Clean up logs"))
add_task(1, Task("Critical", "Fix production bug"))
add_task(2, Task("Medium", "Update docs"))
add_task(1, Task("Critical", "Patch security hole"))
while task_queue:
task = get_next_task()
print(f"{task.name}: {task.description}")
# Critical: Fix production bug
# Critical: Patch security hole
# Medium: Update docs
# Low: Clean up logsThe counter serves as a tie-breaker so items with equal priority are returned in insertion order (FIFO).
Practical Use Cases
Merge Sorted Iterables
import heapq
list1 = [1, 4, 7, 10]
list2 = [2, 5, 8, 11]
list3 = [3, 6, 9, 12]
merged = list(heapq.merge(list1, list2, list3))
print(merged) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]heapq.merge() is memory-efficient -- it doesn't load all lists into memory at once.
Running Median
import heapq
class RunningMedian:
"""Track the median of a stream of numbers."""
def __init__(self):
self.low = [] # max-heap (negated values)
self.high = [] # min-heap
def add(self, num):
heapq.heappush(self.low, -num)
heapq.heappush(self.high, -heapq.heappop(self.low))
if len(self.high) > len(self.low):
heapq.heappush(self.low, -heapq.heappop(self.high))
def median(self):
if len(self.low) > len(self.high):
return -self.low[0]
return (-self.low[0] + self.high[0]) / 2
rm = RunningMedian()
for num in [2, 1, 5, 7, 2, 0, 5]:
rm.add(num)
print(f"Added {num}, median = {rm.median()}")Dijkstra's Shortest Path
import heapq
def dijkstra(graph, start):
"""Find shortest paths from start to all nodes."""
distances = {node: float('inf') for node in graph}
distances[start] = 0
pq = [(0, start)]
while pq:
dist, node = heapq.heappop(pq)
if dist > distances[node]:
continue
for neighbor, weight in graph[node]:
new_dist = dist + weight
if new_dist < distances[neighbor]:
distances[neighbor] = new_dist
heapq.heappush(pq, (new_dist, neighbor))
return distances
graph = {
'A': [('B', 1), ('C', 4)],
'B': [('C', 2), ('D', 5)],
'C': [('D', 1)],
'D': [],
}
print(dijkstra(graph, 'A'))
# {'A': 0, 'B': 1, 'C': 3, 'D': 4}Analyzing Heap Performance in Jupyter
When comparing heap-based algorithms against alternatives or profiling priority queue performance, interactive experimentation speeds up the process. RunCell (opens in a new tab) is an AI agent for Jupyter that can help you benchmark heap operations, visualize performance characteristics, and optimize your data structure choices interactively.
FAQ
What is heapq in Python?
heapq is a standard library module that implements a min-heap using a regular Python list. It provides functions for pushing and popping items in O(log n) time while maintaining the heap property (smallest element always at index 0). It is the standard way to implement priority queues in Python.
How do I create a max-heap in Python?
Python's heapq only supports min-heaps. To simulate a max-heap, negate the values when pushing and negate again when popping: heapq.heappush(heap, -value) to push, -heapq.heappop(heap) to get the largest. For complex objects, negate the priority in a tuple: (-priority, counter, item).
When should I use heapq vs sorted()?
Use heapq.nlargest(n, data) or heapq.nsmallest(n, data) when you need only the top N items and N is small relative to the total data size. Use sorted() when you need all items in order or when N is close to the total size. Use min()/max() when N is 1.
Is heapq thread-safe?
No. Individual heappush and heappop calls are not atomic. If multiple threads access the same heap, you need external synchronization. For thread-safe priority queues, use queue.PriorityQueue, which wraps heapq with proper locking.
What is the time complexity of heapq operations?
heappush and heappop are both O(log n). heapify converts a list to a heap in O(n). nlargest(k, data) and nsmallest(k, data) run in O(n log k) where n is the data size and k is the number of items requested.
Conclusion
Python's heapq module is the standard tool for priority queues and top-N selection. Its heap-based approach provides O(log n) push and pop operations, making it dramatically faster than maintaining a sorted list. Use heappush/heappop for priority queue patterns, nlargest/nsmallest for top-N selection, merge for combining sorted sequences, and heapify when you have an existing list to convert.