p62 — Course Schedule II

Source: LeetCode 210 · Medium · Topics: Graph, Topological Sort, BFS, DFS Companies (2024–2025): Amazon (very high), Meta (high), Google (high), Microsoft (high), LinkedIn (medium-high) Loop position: Topological sort canonical. Tests Kahn’s BFS (indegree counter) AND DFS post-order — both expected. Build systems, task scheduling, dependency resolution all reduce here.

1. Quick Context

numCourses courses 0..n-1, prerequisites[i] = [a, b] meaning take b before a. Return a valid order, or [] if cycle exists (impossible).

Two equivalent algorithms:

Kahn’s BFS (indegree)

Compute indegree of every node. Push indegree-0 nodes onto a queue. Pop, append to result, decrement indegrees of neighbors; push any that drop to 0. If final result has all n nodes → return it; else cycle.

DFS post-order

DFS with three-color states (WHITE / GRAY / BLACK). On entering a node mark GRAY; recurse on neighbors; if we hit a GRAY neighbor → cycle. On finish mark BLACK and append to stack. Reverse the stack at the end.

Both O(V + E).

🔗 https://leetcode.com/problems/course-schedule-ii/

STOP. 25-min timer. Aim to know BOTH algorithms.

3. Prerequisite Concepts

  • Adjacency list construction.
  • BFS with collections.deque.
  • DFS with explicit recursion + three colors (white/gray/black) for cycle detection.

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Linear order respecting partial-order dependencies.

2. Clarify: Edges duplicated? Self-loop? [a, a] is a cycle. Disconnected components allowed?

3. Examples: n=2, prereqs=[[1,0]] → [0,1]; n=4, [[1,0],[2,0],[3,1],[3,2]] → e.g. [0,1,2,3] or [0,2,1,3]; with cycle [[0,1],[1,0]] → [].

5. Brute force: Try all n! permutations and check validity. n! infeasible.

6. Target: O(V + E).

7. Pattern: Topological sort.

8. Develop: see solution.py.

9. Test: empty prereqs; single course; chain; tree; cycle of length 2 and 3; disconnected components; self-loop.

5. Progressive Hints

See hints.md.

6. Deeper Insight

6.1 Edge direction convention — read it carefully

prerequisites[i] = [a, b] says “to take a, you must take b first.” This is a dependency from b → a. Build the adjacency list as adj[b].append(a) and the indegree of a increments.

Getting the direction wrong gives a topologically-sorted but BACKWARDS answer. The most common LC bug here.

6.2 Kahn’s BFS — full algorithm

1. Build adj list + indegree[]
2. queue = deque of all v with indegree[v] == 0
3. result = []
4. while queue:
     v = queue.popleft()
     result.append(v)
     for u in adj[v]:
         indegree[u] -= 1
         if indegree[u] == 0:
             queue.append(u)
5. return result if len(result) == n else []

Why correct: Every node with no remaining predecessors can safely come next. Cycle ⇔ some node never reaches indegree 0 ⇔ len(result) < n.

6.3 DFS post-order — full algorithm

Three colors: WHITE (unvisited), GRAY (in current DFS stack), BLACK (done).

def dfs(v):
    if color[v] == GRAY: cycle = True; return
    if color[v] == BLACK: return
    color[v] = GRAY
    for u in adj[v]:
        dfs(u)
        if cycle: return
    color[v] = BLACK
    order.append(v)        # post-order push

for v in 0..n-1: dfs(v)
return [] if cycle else order[::-1]

Why correct: DFS finishes children before parents. Reversing post-order gives a sequence where every edge u→v has u before v. Cycle ⇔ encountering a GRAY node (back-edge to ancestor).

6.4 Why both algorithms are worth knowing

  • Kahn’s BFS generalizes to “lexicographically smallest topo order” (use a heapq priority queue instead of deque).
  • DFS generalizes to strongly-connected components (Tarjan, Kosaraju), Euler tours, articulation points.
  • Interviewers often follow up “what about cycle DETECTION only?” or “what if we want lex-smallest?” — having both algorithms in pocket answers either fork.

6.5 Subtleties

  • Self-loop [a, a] is a cycle of length 1 — both algorithms must detect.
  • Duplicate edges in prerequisites inflate indegrees if not deduped — Kahn’s still works (it just decrements multiple times), but the input usually doesn’t have dupes; mention if asked.
  • Disconnected components: both algorithms handle naturally (Kahn enqueues all 0-indegree starts; DFS loops over all nodes).
  • Recursion depth: DFS on n=2000 may hit Python’s default recursion limit; use iterative DFS or sys.setrecursionlimit(10**6) for safety.

6.6 Lex-smallest topological order

Replace deque with heapq (min-heap):

heap = [v for v in range(n) if indegree[v] == 0]
heapq.heapify(heap)
while heap:
    v = heapq.heappop(heap)
    result.append(v)
    for u in adj[v]:
        indegree[u] -= 1
        if indegree[u] == 0:
            heapq.heappush(heap, u)

O((V + E) log V). Standard follow-up.

6.7 Cycle reporting (not just detection)

Sometimes interviewers ask “return the cycle.” Modify DFS: when you hit a GRAY node, walk parent pointers backwards from current node to the GRAY node to recover the cycle. Useful in dependency-resolution diagnostics.

6.8 Production use cases

  • Build systems (Make, Bazel, Ninja): topological order of compilation targets.
  • Task schedulers (Airflow, Luigi, Dagster): DAG of operators.
  • Spreadsheet engines (Excel, Google Sheets): cell evaluation order.
  • Package managers (npm, pip, apt): install order respecting deps.
  • DB query optimizers: rule application order.

Every one of these crashes / loops on a cycle, so detection is essential.

7. Anti-Pattern Analysis

  1. Wrong edge directionadj[a].append(b) instead of adj[b].append(a). Produces a valid reverse-topo, fails the LC test.
  2. Indegree decremented to negative — bug from duplicate edges or wrong direction.
  3. DFS without GRAY state — only WHITE/BLACK; cannot detect cycles, mistakes back-edges for cross-edges.
  4. Recursive DFS without raised recursion limit — silent crash for n > 1000.
  5. Forget to check len(result) == n for Kahn — partial result returned on a cycle, LC fails.
  6. Use a set instead of indegree counter — works but adds log/hash overhead, makes follow-up “lex-smallest” awkward.

8. Skills & Takeaways

  • Kahn’s BFS (indegree) and DFS post-order — both in pocket.
  • Three-color DFS for cycle detection.
  • Adjacency list + indegree counter pattern.
  • Iterative DFS via explicit stack for large graphs.
  • Lex-smallest variant via heapq.

9. When to Move On

  • Solve LC 207 (Course Schedule, cycle detection only) in 5 min.
  • Solve LC 210 with BOTH algorithms in 15 min total.
  • Solve LC 269 (Alien Dictionary) — topo sort over inferred char ordering.
  • Solve LC 1857 (Largest Color Value in DAG) — topo + DP.
  • Explain three-color DFS in 60 seconds.

10. Company Context

  • Amazon: L5/L6; productized as “service deployment order across regions.”
  • Meta: E5; LC 269 follow-up common.
  • Google: L5; lex-smallest variant a frequent twist.
  • Microsoft: Compiler / build-system teams.
  • LinkedIn: L5; skill-tree / course-recommendation analogue.

Scorecard for strong: “Implemented both Kahn and DFS, articulated three-color cycle detection, mentioned lex-smallest extension via heap, related to build systems.”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff+
Edge directionConfusedCorrect after promptCorrect + articulates+ draws DAG to verify
Algorithm choiceOne onlyBoth namesBoth implementable+ chooses based on follow-up potential
Cycle detectionForgetsIndegree check+ DFS GRAY state+ reports cycle, not just detects
ExtensionsNoneLC 207LC 269 alien dict+ Tarjan SCC + Kahn-with-heap lex-smallest

12. Level Delta

  • Mid (L4): Kahn’s BFS, correct edge direction, returns [] on cycle.
  • Senior (L5): + DFS post-order + three-color cycle detection + lex-smallest with heapq + LC 269 alien dictionary.
  • Staff (L6): + iterative DFS to avoid recursion limit + Tarjan SCC awareness + reports actual cycle + production parallels (build systems).
  • Principal (L7+): + parallel topological sort (Kahn’s BFS is naturally levelizable across cores) + incremental topo-sort (Pearce-Kelly 2007 algorithm for online edge insertions) + transitive closure / DAG reachability via DFS+memoization.

13. Follow-up Q&A

Q1: Return only “can it be done” (cycle detection). A1: LC 207. Same algorithm; return len(result) == n (Kahn) or not cycle (DFS).

Q2: Lex-smallest topological order. A2: Kahn’s with heapq instead of deque. O((V+E) log V).

Q3: All valid topological orders (not just one). A3: Backtrack on the set of “ready” (indegree-0) nodes; for each choice, decrement and recurse. Exponential in pathological cases (a DAG with no constraints has n! orders).

Q4: Detect AND report the cycle. A4: In DFS, maintain parent pointers. On encountering GRAY neighbor u from v, walk parent chain from v back to u to recover the cycle.

Q5: What if prerequisites has 10^9 edges? A5: Adjacency list won’t fit. Stream edges; if you can bound indegree-0 sets externally or process by batches; otherwise the problem isn’t tractable in-memory. Production: graph DBs (Neo4j) for huge dependency graphs.

14. Solution Walkthrough

See solution.py:

  • find_order_kahn — BFS with indegree counter and deque.
  • find_order_dfs — three-color DFS, post-order push, reverse at end.
  • find_order_brute — try all permutations (stress n ≤ 6).

15. Beyond the Problem

Principal code review: “Topological sort is one of the dozen algorithms you’ll touch every week of an engineering career. Every build system, every CI pipeline, every workflow engine, every database query plan — they all reduce to this. The interview asks for an order; production asks for parallel scheduling (which subset can run concurrently?), incremental rebuilds (only redo what changed downstream?), cycle diagnostics (when a circular dep is added in code review, surface WHICH cycle?). Kahn’s algorithm naturally answers “what can run now” — it levelizes. That’s why Bazel’s job scheduler is essentially Kahn-with-a-thread-pool. Learn the algorithm; then learn its three or four most common extensions; you’ll use them forever.“