When you need to find the kth largest (or smallest) element in a collection, your first instinct might be to sort the entire array and pick the element at the appropriate index. While this works, it requires O(n log n) time complexity, which may be overkill when you only need one specific element.
A more efficient approach uses a heap data structure. A heap is a binary tree that maintains a specific ordering property: in a min-heap, each parent node is smaller than its children, making the root the smallest element. This property allows us to efficiently maintain a collection of the "best" elements we've seen so far.
For finding the kth largest element, we use a min-heap of size k. As we iterate through the array, we add elements to the heap. When the heap grows beyond size k, we remove the smallest element (the root). This ensures our heap always contains the k largest elements we've encountered, with the smallest of these k elements at the root.
The key insight is that we don't need to track all elements—just the k largest ones. After processing all elements, the root of our size-k min-heap is exactly the kth largest element in the original array. This approach has O(n log k) time complexity, which is significantly better than O(n log n) when k is much smaller than n.
Python's heapq module provides an efficient min-heap implementation. You can add elements with heappush(), remove the smallest with heappop(), and access the smallest element at index 0 of the heap list.
1import heapq
2
3# Find the 3 largest numbers in a stream of data
4numbers = [15, 8, 23, 4, 19, 12, 7]
5k = 3
6min_heap = []
7
8for num in numbers:
9 heapq.heappush(min_heap, num)
10 if len(min_heap) > k:
11 heapq.heappop(min_heap) # Remove smallest
12 print(f"After {num}: heap = {min_heap}")
13
14print(f"The {k}rd largest number is: {min_heap[0]}")