Lab 07 — Topological Sort: Course Schedule II (Kahn’s Vs DFS)
Goal
Master both topological sort algorithms — Kahn’s BFS and DFS-postorder — applied to LC 210. Deliverable produces a valid course order in O(V + E) time, O(V + E) space, with cycle detection wired in. You can articulate when each algorithm is preferable, the standard cycle-detection check, and how this generalizes to dependency resolution and build systems.
Background Concepts
DAG topological order; indegree array as a Kahn’s prerequisite; DFS three-color marking (white/gray/black) for cycle detection; postorder reverse for DFS-topo; the equivalence of “topo order exists” and “graph is a DAG”; existence of cycle = len(order) != V for Kahn’s. Review pattern Topological Sort and Graph Foundations.
Interview Context
Topological sort appears at Meta (build dependencies), Amazon (course scheduling, package ordering), Google (Spanner schema migrations). The interview signal is whether you naturally pick Kahn’s for explicit ordering (where the BFS structure makes the algorithm self-documenting) and DFS for cycle detection or recursive structure (where the call stack mirrors the recursion). Weak candidates only know one. Strong candidates code Kahn’s and explain the DFS variant. Elite candidates discuss stable ordering (preserving input order on ties via priority queue) and comment on parallelizability.
Problem Statement
You must take numCourses numbered 0..n-1. Some courses have prerequisites: prerequisites[i] = [a, b] means you must take b before a. Return any valid ordering of courses to finish all of them, or an empty array if impossible (cycle).
Constraints
1 ≤ numCourses ≤ 20000 ≤ |prerequisites| ≤ 5000- All
[a, b]pairs unique;a != b.
Clarifying Questions
- Are duplicate prerequisite pairs possible? (Per constraints, no — but worth confirming, as it affects indegree counting.)
- Are self-loops
[a, a]possible? (Per constraints,a != b— no self-loops.) - If multiple valid orderings exist, return which one? (Any. But mention you can return the lex-smallest with a min-heap-based Kahn’s.)
- Output empty array on cycle, or null/exception? (LC says empty.)
- Are course IDs always
0..n-1contiguous? (Yes per LC; otherwise you’d need to map.)
Examples
| numCourses | prerequisites | Output |
|---|---|---|
| 2 | [[1,0]] | [0,1] |
| 4 | [[1,0],[2,0],[3,1],[3,2]] | [0,1,2,3] or [0,2,1,3] |
| 2 | [[1,0],[0,1]] | [] (cycle) |
| 1 | [] | [0] |
| 3 | [[0,1],[1,2],[2,0]] | [] (cycle) |
Initial Brute Force
Repeatedly find a course with no remaining prerequisites; remove it and its outgoing edges; repeat. If at some point no such course exists but courses remain, there’s a cycle.
def find_order_brute(n, prereqs):
deps = [set() for _ in range(n)]
for a, b in prereqs:
deps[a].add(b)
order = []
for _ in range(n):
for i in range(n):
if deps[i] is not None and not deps[i]:
order.append(i)
deps[i] = None
for j in range(n):
if deps[j] is not None: deps[j].discard(i)
break
else:
return []
return order
Brute Force Complexity
Time O(V³) (V scans × V lookups × V edge updates per round). At V=2000, ~8×10⁹ — TLEs.
Optimization Path
Kahn’s algorithm replaces “find a no-prereq course” + “remove it” with O(1) amortized operations:
- Compute
indegree[v]= number of incoming edges. - Initialize a queue with all
vhavingindegree[v] == 0. - Repeat: pop
ufrom queue, append to order, decrement indegree of each successorv; ifindegree[v]becomes 0, pushv. - If
len(order) == V: that’s the topological order. Else: cycle.
Each edge processed once. Each vertex enqueued/dequeued once. Total O(V + E).
DFS variant: run DFS from each unvisited node; on finish (postorder), prepend the node to the order. Detect cycles via the gray-vertex check.
Final Expected Approach (Kahn’s)
from collections import deque
def find_order(n, prereqs):
adj = [[] for _ in range(n)]
indeg = [0] * n
for a, b in prereqs:
adj[b].append(a) # edge b -> a (b is prereq of a)
indeg[a] += 1
q = deque(v for v in range(n) if indeg[v] == 0)
order = []
while q:
u = q.popleft()
order.append(u)
for v in adj[u]:
indeg[v] -= 1
if indeg[v] == 0:
q.append(v)
return order if len(order) == n else []
Final Expected Approach (DFS)
def find_order_dfs(n, prereqs):
adj = [[] for _ in range(n)]
for a, b in prereqs:
adj[b].append(a)
color = [0] * n # 0=white, 1=gray (on stack), 2=black (done)
order = []
def dfs(u):
if color[u] == 1: return False # back edge -> cycle
if color[u] == 2: return True
color[u] = 1
for v in adj[u]:
if not dfs(v): return False
color[u] = 2
order.append(u)
return True
for u in range(n):
if color[u] == 0 and not dfs(u):
return []
return order[::-1]
Data Structures Used
- Kahn’s: adjacency list, indegree array, queue, output list.
- DFS: adjacency list, color array, output list, implicit recursion stack.
Correctness Argument (Kahn’s)
Invariant: at any point, order contains a valid prefix of some topological order, and the queue contains exactly the unprocessed vertices with no remaining unsatisfied prerequisites (i.e., remaining indegree zero in the subgraph of unprocessed vertices).
Maintenance: when we pop u, all its prereqs are already in order (since indeg[u] == 0 in the residual graph means all its prereqs have been processed). Adding u extends a valid topo-prefix. For each successor v, decrementing indeg[v] reflects that u is now “satisfied”; if v’s residual indegree hits 0, all its prereqs are satisfied, so it’s eligible.
Cycle detection: if len(order) < n, some vertices were never enqueued, meaning their residual indegree never reached 0 — they’re inside a strongly connected component with a cycle (or downstream of one).
Correctness Argument (DFS)
The classical theorem: a directed graph is a DAG iff DFS encounters no back edges. We mark vertices gray when entering DFS, black on finish. Encountering a gray vertex via an outgoing edge is a back edge → cycle. Postorder reversed gives a valid topological order: when u finishes, all reachable-from-u vertices have already finished and are earlier in order; reversing puts them after u.
Complexity
Both: O(V + E) time, O(V + E) space (adjacency + queue/recursion-stack).
Implementation Requirements
- Build the adjacency list with edge direction
prereq → course(so we can decrementindeg[course]when aprereqis processed). The reverse direction also works but flips the topo-order interpretation; pick one convention and stay consistent. - Use
deque(Python) /ArrayDeque(Java) /container/list(Go) for the BFS queue, not a Pythonlist.pop(0)which is O(N). - DFS recursion: watch Python’s default recursion limit (1000) for large N. Either iterative DFS or
sys.setrecursionlimitfor V > ~900. - Cycle check after the loop (Kahn’s) or during (DFS gray check).
Tests
- Smoke:
(2, [[1,0]])→[0,1]. - Unit: no prereqs (
(3, [])→ any permutation of[0,1,2]); single chain ((4, [[1,0],[2,1],[3,2]])→[0,1,2,3]). - Cycle:
(2, [[1,0],[0,1]])→[]; longer cycle(3, [[0,1],[1,2],[2,0]])→[]. - DAG with multiple roots:
(4, [[2,0],[2,1],[3,2]])→ either[0,1,2,3]or[1,0,2,3]. Validate by checking the output is a permutation and all prereqs respected. - Edge: N=1, no prereqs →
[0]. - Large: N=2000, E=5000, random DAG; assert sub-100ms.
- Validator helper (write this!):
def is_valid_topo(order, n, prereqs): if len(order) != n or set(order) != set(range(n)): return False pos = {v: i for i, v in enumerate(order)} return all(pos[b] < pos[a] for a, b in prereqs)
Follow-up Questions
- “Return the lex-smallest valid order.” → replace the queue with a min-heap. Time O((V + E) log V).
- “Detect all vertices in cycles, not just whether any exist.” → SCC decomposition (Tarjan’s / Kosaraju’s), Phase 3.
- “Topological sort under updates (edges added/removed)?” → online topological order maintenance, hard problem; offline batched updates with reordering.
- “Parallel topological sort?” → at each round, all indegree-0 vertices can run in parallel; this is the natural parallelization for build systems (Bazel, Buck).
- “Schedule with time costs per task: minimize total wall time?” → critical-path method; longest path in DAG, computable in O(V + E) after topo-sort.
- “If V is huge (10^9) and the graph is implicit?” → streaming variant; need indegree of each vertex computed via input stream.
Product Extension
This pattern is dependency resolution. Build systems (Bazel, Make, Maven, npm/yarn lockfile resolution), database migration runners, terraform depends_on, container orchestration (Kubernetes init-containers), spreadsheet recalc, even React’s effect-dependency ordering. Course Schedule II is the toy version of “given declared dependencies, output a valid execution order” — and the DFS variant is what most build systems use, because they want to detect cycles early with a clear error path showing the offending cycle.
Language/Runtime Follow-ups
- Python:
collections.dequefor BFS queue;sys.setrecursionlimit(10**5)if doing DFS on large input. List-as-queue withpop(0)is O(N) — never use it. - Java:
ArrayDeque<Integer>is the canonical queue. Boxing tax forQueue<Integer>— for hot loops, useint[]ring buffer. - Go: no built-in queue; use a slice and
q = q[1:](cheap ifcap(q)doesn’t change) orcontainer/list(heavier). Slice-as-queue grows O(N) memory per shrink because Go doesn’t truncate-and-shift; if memory matters, periodically copy the live portion. - C++:
std::queue<int>for Kahn’s;std::vector<int>+ recursive DFS (iterative if V > ~10⁵ to avoid stack overflow with default 8MB stack). - JS/TS:
Array.prototype.shift()is O(N) — use index-based queue (let head = 0; q[head++]) for O(1) amortized. - Stack overflow: any DFS-topo on V > recursion-limit needs iterative implementation. The iterative version uses an explicit stack of
(vertex, iterator)pairs.
Common Bugs
- Edge direction confusion.
[a, b]means “b before a”, so the edge isb → a. Reversing it inverts the topological order and breaks the indegree computation. - Forgetting cycle detection. Returning
ordereven whenlen(order) < nproduces a partial order that misses some courses. - Using
list.pop(0)in Python (orQueue.poll()on aLinkedList<Integer>— actually fine, butArrayList.remove(0)is O(N)). - Python recursion limit in DFS on V > 1000 — silent
RecursionError. Set the limit explicitly. - DFS gray check missing. Without distinguishing gray (on-stack) from black (finished), you can’t detect back edges; you’d accept cyclic graphs with the wrong order.
- Java boxing penalty in
Queue<Integer>— ~2× slowdown vsint[]ring buffer. - Adjacency list as
Map<Integer, List<Integer>>when courses are 0..n-1 — wastes time on hashing; useList<List<Integer>>indexed by ID.
Debugging Strategy
- Trace
(4, [[1,0],[2,0],[3,1],[3,2]])Kahn’s: indeg =[0,1,1,2]. Queue:[0]. Pop 0, decrement indeg of 1 and 2: indeg =[0,0,0,2]. Push 1, 2. Pop 1, decrement indeg of 3:[0,0,0,1]. Pop 2, decrement:[0,0,0,0]. Push 3. Pop 3. Order =[0,1,2,3]. ✓ - Run the validator helper on every output during development.
- For cycle issues: build a small cycle by hand, ensure your code returns
[], not a partial order.
Mastery Criteria
- Recognized “valid order respecting prereqs” as topological sort within 30 seconds.
- Wrote Kahn’s correctly within 8 minutes, with cycle detection.
- Wrote DFS variant within 8 more minutes, with three-color cycle detection.
- Articulated both correctness arguments without prompting.
- Identified the language-specific recursion-limit / boxing trap.
- Generalized to LC 207 (Course Schedule, just yes/no), LC 269 (Alien Dictionary, infer edges), LC 1136 (Parallel Courses, layer-by-layer Kahn’s) within a week.