K Most Frequent Elements

mediumPython

Lesson

Heap Data Structures for Top-K Problems

When you need to find the "top K" or "bottom K" elements from a dataset, heap data structures provide an elegant and efficient solution. A heap is a specialized binary tree that maintains a specific ordering property: in a min-heap, each parent node is smaller than its children, while in a max-heap, each parent is larger.

The key insight for top-K problems is that you don't need to sort the entire dataset. Sorting would give you O(n log n) time complexity, but heaps allow you to achieve better performance. Python's heapq module provides a min-heap implementation with several useful functions.

For finding the K largest elements, you have two main approaches. First, you can maintain a min-heap of size K, adding elements one by one and removing the smallest when the heap exceeds size K. This gives you O(n log k) complexity. Second, you can use heapq.nlargest(), which internally builds a heap from all elements and extracts the K largest ones, resulting in O(n + k log n) complexity.

The choice between approaches depends on your constraints. If K is much smaller than n, the first approach is more efficient. If K is close to n, the second approach often performs better due to optimized internal implementation.

Frequency-based problems add another layer: you first count occurrences (typically O(n) with a hash table), then apply heap operations to find the most frequent items. This pattern appears frequently in data analysis, recommendation systems, and search algorithms where you need to identify the most popular or relevant items from large datasets.

Example
1import heapq 2from collections import Counter 3 4# Example: Find 3 largest numbers in an array 5numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5] 6 7# Method 1: Using nlargest - O(n + k log n) 8largest_3 = heapq.nlargest(3, numbers) 9print(f"3 largest: {largest_3}") # [9, 6, 5] 10 11# Method 2: Min-heap of size k - O(n log k) 12heap = [] 13for num in numbers: 14 if len(heap) < 3: 15 heapq.heappush(heap, num) 16 elif num > heap[0]: # num is larger than smallest in heap 17 heapq.heapreplace(heap, num) 18 19print(f"3 largest (heap): {sorted(heap, reverse=True)}")
L6nlargest() handles the heap operations internally for convenience
L12Manual heap approach gives more control over the process
L15heapreplace() efficiently removes min and adds new element

Key Takeaways

  • •Heaps solve top-K problems more efficiently than full sorting (O(n + k log n) vs O(n log n))
  • •Min-heaps are counterintuitively useful for finding maximum elements by maintaining a 'threshold'
  • •Frequency problems combine counting (O(n)) with heap operations for optimal performance
Loading...