Lab 07 — Topological Sort: Course Schedule II (Kahn’s Vs DFS)

Goal

Master both topological sort algorithms — Kahn’s BFS and DFS-postorder — applied to LC 210. Deliverable produces a valid course order in O(V + E) time, O(V + E) space, with cycle detection wired in. You can articulate when each algorithm is preferable, the standard cycle-detection check, and how this generalizes to dependency resolution and build systems.

Background Concepts

DAG topological order; indegree array as a Kahn’s prerequisite; DFS three-color marking (white/gray/black) for cycle detection; postorder reverse for DFS-topo; the equivalence of “topo order exists” and “graph is a DAG”; existence of cycle = len(order) != V for Kahn’s. Review pattern Topological Sort and Graph Foundations.

Interview Context

Topological sort appears at Meta (build dependencies), Amazon (course scheduling, package ordering), Google (Spanner schema migrations). The interview signal is whether you naturally pick Kahn’s for explicit ordering (where the BFS structure makes the algorithm self-documenting) and DFS for cycle detection or recursive structure (where the call stack mirrors the recursion). Weak candidates only know one. Strong candidates code Kahn’s and explain the DFS variant. Elite candidates discuss stable ordering (preserving input order on ties via priority queue) and comment on parallelizability.

Problem Statement

You must take numCourses numbered 0..n-1. Some courses have prerequisites: prerequisites[i] = [a, b] means you must take b before a. Return any valid ordering of courses to finish all of them, or an empty array if impossible (cycle).

Constraints

1 ≤ numCourses ≤ 2000
0 ≤ |prerequisites| ≤ 5000
All [a, b] pairs unique; a != b.

Clarifying Questions

Are duplicate prerequisite pairs possible? (Per constraints, no — but worth confirming, as it affects indegree counting.)
Are self-loops [a, a] possible? (Per constraints, a != b — no self-loops.)
If multiple valid orderings exist, return which one? (Any. But mention you can return the lex-smallest with a min-heap-based Kahn’s.)
Output empty array on cycle, or null/exception? (LC says empty.)
Are course IDs always 0..n-1 contiguous? (Yes per LC; otherwise you’d need to map.)

Examples

numCourses	prerequisites	Output
2	`[[1,0]]`	`[0,1]`
4	`[[1,0],[2,0],[3,1],[3,2]]`	`[0,1,2,3]` or `[0,2,1,3]`
2	`[[1,0],[0,1]]`	`[]` (cycle)
1	`[]`	`[0]`
3	`[[0,1],[1,2],[2,0]]`	`[]` (cycle)

Initial Brute Force

Repeatedly find a course with no remaining prerequisites; remove it and its outgoing edges; repeat. If at some point no such course exists but courses remain, there’s a cycle.

def find_order_brute(n, prereqs):
    deps = [set() for _ in range(n)]
    for a, b in prereqs:
        deps[a].add(b)
    order = []
    for _ in range(n):
        for i in range(n):
            if deps[i] is not None and not deps[i]:
                order.append(i)
                deps[i] = None
                for j in range(n):
                    if deps[j] is not None: deps[j].discard(i)
                break
        else:
            return []
    return order

Brute Force Complexity

Time O(V³) (V scans × V lookups × V edge updates per round). At V=2000, ~8×10⁹ — TLEs.

Optimization Path

Kahn’s algorithm replaces “find a no-prereq course” + “remove it” with O(1) amortized operations:

Compute indegree[v] = number of incoming edges.
Initialize a queue with all v having indegree[v] == 0.
Repeat: pop u from queue, append to order, decrement indegree of each successor v; if indegree[v] becomes 0, push v.
If len(order) == V: that’s the topological order. Else: cycle.

Each edge processed once. Each vertex enqueued/dequeued once. Total O(V + E).

DFS variant: run DFS from each unvisited node; on finish (postorder), prepend the node to the order. Detect cycles via the gray-vertex check.

Final Expected Approach (Kahn’s)

from collections import deque

def find_order(n, prereqs):
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in prereqs:
        adj[b].append(a)        # edge b -> a (b is prereq of a)
        indeg[a] += 1
    q = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else []

Final Expected Approach (DFS)

def find_order_dfs(n, prereqs):
    adj = [[] for _ in range(n)]
    for a, b in prereqs:
        adj[b].append(a)
    color = [0] * n               # 0=white, 1=gray (on stack), 2=black (done)
    order = []

    def dfs(u):
        if color[u] == 1: return False    # back edge -> cycle
        if color[u] == 2: return True
        color[u] = 1
        for v in adj[u]:
            if not dfs(v): return False
        color[u] = 2
        order.append(u)
        return True

    for u in range(n):
        if color[u] == 0 and not dfs(u):
            return []
    return order[::-1]

Data Structures Used

Kahn’s: adjacency list, indegree array, queue, output list.
DFS: adjacency list, color array, output list, implicit recursion stack.

Correctness Argument (Kahn’s)

Invariant: at any point, order contains a valid prefix of some topological order, and the queue contains exactly the unprocessed vertices with no remaining unsatisfied prerequisites (i.e., remaining indegree zero in the subgraph of unprocessed vertices).

Maintenance: when we pop u, all its prereqs are already in order (since indeg[u] == 0 in the residual graph means all its prereqs have been processed). Adding u extends a valid topo-prefix. For each successor v, decrementing indeg[v] reflects that u is now “satisfied”; if v’s residual indegree hits 0, all its prereqs are satisfied, so it’s eligible.

Cycle detection: if len(order) < n, some vertices were never enqueued, meaning their residual indegree never reached 0 — they’re inside a strongly connected component with a cycle (or downstream of one).

Correctness Argument (DFS)

The classical theorem: a directed graph is a DAG iff DFS encounters no back edges. We mark vertices gray when entering DFS, black on finish. Encountering a gray vertex via an outgoing edge is a back edge → cycle. Postorder reversed gives a valid topological order: when u finishes, all reachable-from-u vertices have already finished and are earlier in order; reversing puts them after u.

Complexity

Both: O(V + E) time, O(V + E) space (adjacency + queue/recursion-stack).

Implementation Requirements

Build the adjacency list with edge direction prereq → course (so we can decrement indeg[course] when a prereq is processed). The reverse direction also works but flips the topo-order interpretation; pick one convention and stay consistent.
Use deque (Python) / ArrayDeque (Java) / container/list (Go) for the BFS queue, not a Python list.pop(0) which is O(N).
DFS recursion: watch Python’s default recursion limit (1000) for large N. Either iterative DFS or sys.setrecursionlimit for V > ~900.
Cycle check after the loop (Kahn’s) or during (DFS gray check).

Tests

Smoke: (2, [[1,0]]) → [0,1].
Unit: no prereqs ((3, []) → any permutation of [0,1,2]); single chain ((4, [[1,0],[2,1],[3,2]]) → [0,1,2,3]).
Cycle: (2, [[1,0],[0,1]]) → []; longer cycle (3, [[0,1],[1,2],[2,0]]) → [].
DAG with multiple roots: (4, [[2,0],[2,1],[3,2]]) → either [0,1,2,3] or [1,0,2,3]. Validate by checking the output is a permutation and all prereqs respected.
Edge: N=1, no prereqs → [0].
Large: N=2000, E=5000, random DAG; assert sub-100ms.

Validator helper (write this!):

def is_valid_topo(order, n, prereqs):
    if len(order) != n or set(order) != set(range(n)): return False
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[b] < pos[a] for a, b in prereqs)

Follow-up Questions

“Return the lex-smallest valid order.” → replace the queue with a min-heap. Time O((V + E) log V).
“Detect all vertices in cycles, not just whether any exist.” → SCC decomposition (Tarjan’s / Kosaraju’s), Phase 3.
“Topological sort under updates (edges added/removed)?” → online topological order maintenance, hard problem; offline batched updates with reordering.
“Parallel topological sort?” → at each round, all indegree-0 vertices can run in parallel; this is the natural parallelization for build systems (Bazel, Buck).
“Schedule with time costs per task: minimize total wall time?” → critical-path method; longest path in DAG, computable in O(V + E) after topo-sort.
“If V is huge (10^9) and the graph is implicit?” → streaming variant; need indegree of each vertex computed via input stream.

Product Extension

This pattern is dependency resolution. Build systems (Bazel, Make, Maven, npm/yarn lockfile resolution), database migration runners, terraform depends_on, container orchestration (Kubernetes init-containers), spreadsheet recalc, even React’s effect-dependency ordering. Course Schedule II is the toy version of “given declared dependencies, output a valid execution order” — and the DFS variant is what most build systems use, because they want to detect cycles early with a clear error path showing the offending cycle.

Language/Runtime Follow-ups

Python: collections.deque for BFS queue; sys.setrecursionlimit(10**5) if doing DFS on large input. List-as-queue with pop(0) is O(N) — never use it.
Java: ArrayDeque<Integer> is the canonical queue. Boxing tax for Queue<Integer> — for hot loops, use int[] ring buffer.
Go: no built-in queue; use a slice and q = q[1:] (cheap if cap(q) doesn’t change) or container/list (heavier). Slice-as-queue grows O(N) memory per shrink because Go doesn’t truncate-and-shift; if memory matters, periodically copy the live portion.
C++: std::queue<int> for Kahn’s; std::vector<int> + recursive DFS (iterative if V > ~10⁵ to avoid stack overflow with default 8MB stack).
JS/TS: Array.prototype.shift() is O(N) — use index-based queue (let head = 0; q[head++]) for O(1) amortized.
Stack overflow: any DFS-topo on V > recursion-limit needs iterative implementation. The iterative version uses an explicit stack of (vertex, iterator) pairs.

Common Bugs

Edge direction confusion. [a, b] means “b before a”, so the edge is b → a. Reversing it inverts the topological order and breaks the indegree computation.
Forgetting cycle detection. Returning order even when len(order) < n produces a partial order that misses some courses.
Using list.pop(0) in Python (or Queue.poll() on a LinkedList<Integer> — actually fine, but ArrayList.remove(0) is O(N)).
Python recursion limit in DFS on V > 1000 — silent RecursionError. Set the limit explicitly.
DFS gray check missing. Without distinguishing gray (on-stack) from black (finished), you can’t detect back edges; you’d accept cyclic graphs with the wrong order.
Java boxing penalty in Queue<Integer> — ~2× slowdown vs int[] ring buffer.
Adjacency list as Map<Integer, List<Integer>> when courses are 0..n-1 — wastes time on hashing; use List<List<Integer>> indexed by ID.

Debugging Strategy

Trace (4, [[1,0],[2,0],[3,1],[3,2]]) Kahn’s: indeg = [0,1,1,2]. Queue: [0]. Pop 0, decrement indeg of 1 and 2: indeg = [0,0,0,2]. Push 1, 2. Pop 1, decrement indeg of 3: [0,0,0,1]. Pop 2, decrement: [0,0,0,0]. Push 3. Pop 3. Order = [0,1,2,3]. ✓
Run the validator helper on every output during development.
For cycle issues: build a small cycle by hand, ensure your code returns [], not a partial order.

Mastery Criteria

Recognized “valid order respecting prereqs” as topological sort within 30 seconds.
Wrote Kahn’s correctly within 8 minutes, with cycle detection.
Wrote DFS variant within 8 more minutes, with three-color cycle detection.
Articulated both correctness arguments without prompting.
Identified the language-specific recursion-limit / boxing trap.
Generalized to LC 207 (Course Schedule, just yes/no), LC 269 (Alien Dictionary, infer edges), LC 1136 (Parallel Courses, layer-by-layer Kahn’s) within a week.

LeetCode Interview — Extreme Coding