DAG Task Scheduler

hardPython

Lesson

Topological Sorting and DAG Processing

A Directed Acyclic Graph (DAG) is a graph with directed edges and no cycles. This structure is perfect for representing dependencies where one task must complete before another can begin. Think of it like a recipe: you need to chop vegetables before you can cook them, and cook them before you can serve the dish.

Topological sorting is an algorithm that produces a linear ordering of vertices in a DAG such that for every directed edge from vertex A to vertex B, A appears before B in the ordering. This is exactly what we need for dependency resolution.

Kahn's Algorithm is one of the most intuitive approaches:

  1. Calculate the in-degree (number of incoming edges) for each vertex
  2. Start with vertices that have zero in-degree (no dependencies)
  3. Remove these vertices and decrease the in-degree of their neighbors
  4. Repeat until all vertices are processed or a cycle is detected

Cycle detection is crucial because circular dependencies make execution impossible. If we can't process all vertices, it means some are stuck waiting for each other in a cycle.

Alternative approaches include DFS-based topological sorting, which processes vertices in post-order after exploring all their dependencies. Both methods have O(V + E) time complexity, making them efficient for large dependency graphs.

The key insight is that topological sorting only works on DAGs. If cycles exist, no valid ordering is possible, and we must detect this condition and handle it appropriately.

Example
1from collections import defaultdict, deque 2 3def topological_sort(graph): 4 # Calculate in-degrees 5 in_degree = defaultdict(int) 6 for node in graph: 7 for neighbor in graph[node]: 8 in_degree[neighbor] += 1 9 10 # Start with nodes having no dependencies 11 queue = deque([node for node in graph if in_degree[node] == 0]) 12 result = [] 13 14 while queue: 15 current = queue.popleft() 16 result.append(current) 17 18 # Remove current node and update in-degrees 19 for neighbor in graph[current]: 20 in_degree[neighbor] -= 1 21 if in_degree[neighbor] == 0: 22 queue.append(neighbor) 23 24 # Check for cycles 25 if len(result) != len(graph): 26 raise ValueError("Cycle detected!") 27 28 return result
L4Track how many dependencies each node has
L8Begin with nodes that have no dependencies
L16Decrease dependency count as we process each node
L21If we can't process all nodes, there's a cycle

Key Takeaways

  • •Topological sorting provides a way to order tasks while respecting dependencies
  • •Kahn's algorithm uses in-degree counting to iteratively process nodes with no remaining dependencies
  • •Cycle detection is essential - if not all vertices are processed, cycles exist in the graph
Loading...