p23 — Course Schedule
Source: LeetCode 207 · Medium · Topics: DFS, BFS, Topological Sort, Graph Companies (2024–2025 frequency): Google (very high), Meta (high), Amazon (high), Microsoft (high), Apple (medium), Bloomberg (medium), LinkedIn (high) Loop position: mid-to-late onsite. Heavy “do you understand graphs as a model” signal.
1. Quick Context
numCourses courses and a list of [a, b] prerequisite pairs (meaning “to take a, you must first take b”). Return whether it’s possible to finish all courses.
This is cycle detection on a directed graph, framed as a real-world planning problem. Two canonical algorithms solve it and you should know both:
- Kahn’s algorithm (BFS-based topological sort): track in-degrees, repeatedly remove zero-in-degree nodes. If you process all
Nnodes → no cycle. Otherwise → cycle exists. - DFS 3-coloring (WHITE / GRAY / BLACK): WHITE = unvisited, GRAY = on current recursion path, BLACK = fully processed. A GRAY node encountered during DFS = back-edge = cycle.
What it looks like it tests: can you detect cycles. What it actually tests: (a) do you model “prereq” correctly as a directed edge (which way does the arrow go?), (b) do you know BOTH algorithms or just one, (c) can you extend to LC 210 (return the actual ordering — Course Schedule II) instantly, (d) do you handle self-loops as the trivial cycle case.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/course-schedule/
STOP. 25-min timer. Solve both ways before reading.
3. Prerequisite Concepts
- Directed graph adjacency list construction
- BFS with deque / DFS with explicit recursion
- phase-04-graphs/labs/lab-06-topological-sort.md
- Concept of in-degree
4. How to Approach (FRAMEWORK Steps 1–9)
Step 1 — Restate: “Given N courses and a list of [a, b] pairs meaning ‘must take b before a,’ determine whether a complete ordering exists. Equivalent: does the dependency graph have NO directed cycle?”
Step 2 — Clarify (CRITICAL):
- “For
[a, b], doesbpoint toaorapoint tob?” THIS ARROW DIRECTION IS THE QUESTION. By LC convention,[a, b]meansb → a(“completeb, then can doa”). State your interpretation explicitly. - “Are duplicate prereq pairs allowed?” (Doesn’t affect correctness, but slight constant factor.)
- “Can a course be its own prereq (
[a, a])?” (LC says no, but state your algorithm handles it as a trivial cycle.) - “Disconnected components?” (Yes — your loop must seed BFS/DFS from EVERY zero-in-degree node, not just node 0.)
Step 3 — Constraints: N ≤ 2000, len(prerequisites) ≤ 5000. O(V+E) trivially fits.
Step 4 — Examples:
N=2, prereqs=[[1,0]]→ True (take 0, then 1).N=2, prereqs=[[1,0],[0,1]]→ False (cycle).N=3, prereqs=[]→ True.
Step 5 — Brute Force: “Try all N! orderings, check each.” Trivially correct, useless complexity. Mention as proof of correctness baseline; don’t implement.
Step 6 — Complexity: O(V+E) time and space, both algorithms.
Step 7 — Pattern Recognition: “DAG-ness / cycle detection on a directed graph.” Pattern names to drop aloud: “Kahn’s algorithm,” “topological sort,” “3-color DFS.”
Step 8 — Develop (Kahn’s):
def canFinish(N, prereqs):
adj = [[] for _ in range(N)]
indeg = [0] * N
for a, b in prereqs:
adj[b].append(a)
indeg[a] += 1
q = deque(i for i in range(N) if indeg[i] == 0)
processed = 0
while q:
u = q.popleft()
processed += 1
for v in adj[u]:
indeg[v] -= 1
if indeg[v] == 0:
q.append(v)
return processed == N
Develop (DFS 3-color):
WHITE, GRAY, BLACK = 0, 1, 2
def canFinish(N, prereqs):
adj = [[] for _ in range(N)]
for a, b in prereqs:
adj[b].append(a)
color = [WHITE] * N
def dfs(u):
if color[u] == GRAY: return False # back-edge = cycle
if color[u] == BLACK: return True
color[u] = GRAY
for v in adj[u]:
if not dfs(v): return False
color[u] = BLACK
return True
return all(dfs(i) for i in range(N))
Step 9 — Test: Empty prereqs, self-loop, two-node cycle, chain, diamond, two-component (one valid + one cyclic).
5. Progressive Hints
If stuck for 5 min, open hints.md.
6. Deeper Insight — Kahn vs DFS 3-Color
Both are O(V+E). Pick by what you need next:
| Need | Algorithm |
|---|---|
| Just “is there a cycle?” | DFS 3-color (slightly less bookkeeping) |
| The actual topological order (LC 210) | Kahn’s (the BFS order IS the topo order) |
| Identify all nodes in a cycle | DFS 3-color (when you hit GRAY, walk the stack back) |
| Stream-friendly (nodes arrive over time) | Kahn’s (decrement in-degrees as edges arrive) |
Why the 3-coloring works: the only way for DFS to discover a cycle is to encounter a node still on its current recursion stack — that’s exactly the “GRAY” state. WHITE = not yet touched. BLACK = fully explored and known cycle-free.
Common pitfall — using a single visited set: if you collapse WHITE/GRAY/BLACK into “visited / not visited,” you can no longer distinguish “I’m currently descending through this node” from “I’ve finished with this node.” A node that’s fully processed (BLACK) shouldn’t trigger a cycle alarm just because some unrelated path revisits it.
7. Anti-Pattern Analysis
- Wrong arrow direction.
[a, b]meansb → a. Reversing it builds the wrong graph and gives wrong answers on asymmetric examples. State your interpretation OUT LOUD before coding. - Using a single
visitedset instead of 3 colors. Reports false cycles on diamond graphs where two paths converge. - Seeding BFS from only node 0. Misses disconnected components. Kahn’s MUST initialize the queue from ALL zero-in-degree nodes.
- Forgetting
processed == Ncheck. If you just returnTruewhen the loop ends, you return True for cyclic graphs (the queue empties early because no node ever reaches in-degree 0). - Mutating the original
indegarray without copying — only matters if you reuse across runs, but flag it. - No self-loop handling. A self-loop
[a, a]is a cycle of length 1. Kahn’s catches it (indeg[a] is permanently ≥1, never enters queue). DFS 3-color catches it (enteringdfs(a)sets GRAY, recurses intoa, sees GRAY, returns False). - Recursive DFS without
sys.setrecursionlimiton N=2000 chain. Python default is 1000. A chain0→1→2→...→1999blows the stack.
8. Skills & Takeaways
- Arrow-direction discipline — half of all topo-sort bugs are reversed-edge bugs. State direction OUT LOUD.
- Kahn’s = topo order for free — same loop solves LC 210.
- 3-color DFS — the canonical cycle-detection pattern. GRAY = “on current path.”
processed == Ninvariant — Kahn’s correctness depends on this final check.- Disconnected components — seed from ALL valid roots, not just node 0.
- The reduction — many real problems (build systems, course planning, package managers, spreadsheet formula recalc) reduce to topological sort. Recognize the reduction in 30 seconds.
9. When to Move On
Move on when:
- You can write both Kahn’s and 3-color DFS from memory in under 5 minutes each.
- You can extend Kahn’s to LC 210 (return the ordering) in under 2 minutes.
- You handle: empty prereqs, self-loop, disconnected components, chain of length N=2000.
- Your stress test runs both algorithms + a brute-force
N!checker (small N) and confirms agreement. - You can articulate aloud which algorithm to pick given each downstream need (just-detect vs need-order vs find-cycle).
10. Company Context
Google:
- Asks LC 207 then immediately LC 210 (return the order) then LC 269 (Alien Dictionary — derive the order from sample dictionary words). Test of pattern transfer across three problems.
- Will ask about parallel topological sort: “If we have 8 workers, schedule courses to finish in min wall-clock time.” Reduces to “compute longest path in the DAG” (critical path method).
- Scorecard phrase: “Algorithm — Kahn’s + 3-color, both correct; extended cleanly to ordering and parallel cases.”
Meta:
- Variant: “given the prerequisite list, return the courses that are guaranteed to be in every topo order.” Reduces to checking which nodes have a unique predecessor at every Kahn step. Hard variant.
- Scorecard phrase: “Strong — handled topo + variant.”
Amazon:
- Pairs with a real-world frame: “build the deployment order for our microservices given the dependency manifest.” Same problem, business vocabulary.
- Asks LC 210 nearly always. Have it ready.
- Scorecard phrase: “Customer Obsession + Coding Excellence — translated business framing, shipped clean topo sort.”
Microsoft:
- May ask only LC 207 but expects you to volunteer LC 210 as the “useful generalization.” Senior+ candidates volunteer.
- Scorecard phrase: “Solid — recognized topo sort, demonstrated 2 algorithms.”
Apple:
- Often presents as build-graph cycle detection in their internal dependency system. Asks specifically about handling self-loops and reporting WHICH cycle exists, not just yes/no.
- Scorecard phrase: “Strong — used 3-color DFS, reported the cycle path.”
Bloomberg:
- Asks about computing the order in which to evaluate spreadsheet cells given inter-cell formula dependencies. Topo sort with the additional wrinkle of “cells can be added/removed dynamically.” Mention incremental topological sort as a Staff-level discussion.
- Scorecard phrase: “Strong — pragmatic translation to spreadsheet problem.”
LinkedIn:
- Frequent. Often pairs with “alien dictionary” (LC 269) in same loop.
- Scorecard phrase: “Pattern recognition — saw topo sort instantly.”
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff/Principal |
|---|---|---|---|---|
| Arrow direction | Guesses | Asks once, then codes | States interpretation BEFORE coding | States + writes a 1-line comment in the code |
| Algorithm count | DFS only (no cycle handling) | Kahn’s OR DFS 3-color | Both, picks by downstream need | Both + discusses incremental + parallel variants |
| Disconnected | Misses | Catches after one test fail | Seeds all roots first try | Seeds + tests with a multi-component example |
| Final check | None | Returns True when loop ends (wrong) | processed == N | processed == N + asserts in code |
| Extension | Stuck | Solves LC 210 after a hint | Volunteers LC 210 unprompted | Volunteers LC 210 + LC 269 + parallel scheduling |
12. Level Delta
Mid (L4):
- Kahn’s algorithm, correct, with
processed == Ncheck. - Builds adjacency from
[a, b]with the right direction (after one clarifying question). - Handles disconnected components correctly.
Senior (L5):
- Both Kahn’s AND 3-color DFS, both clean.
- States algorithm choice aloud: “I’ll use Kahn’s because the follow-up will likely ask for the order.”
- Solves LC 210 in 2 minutes when asked.
- Handles self-loop explicitly.
Staff (L6):
- Discusses incremental topological sort (when edges arrive/depart dynamically).
- Knows LC 269 (Alien Dictionary) as the same algorithm + character-pair extraction.
- Discusses parallel scheduling: longest path in DAG = critical-path-method = min wall-clock with infinite parallelism.
- Discusses lexicographically smallest topo order via min-heap variant of Kahn’s.
Principal (L7+):
- Reframes as CSP (constraint satisfaction problem) with one constraint type (ordering); discusses how this generalizes to richer constraint solvers.
- Notes that this is the algorithm shipped in
make,ninja,Bazel,npm,pip,cargo, every build system on Earth. Discusses how those systems detect and report cycles to users (cycle reconstruction in DFS 3-color: when you hit GRAY, walk the call stack backwards). - Discusses how spreadsheet engines (Excel, Google Sheets) handle this incrementally — recalc only the cells downstream of changed input cells (downstream subgraph of a topologically sorted DAG).
- Discusses what topological sort does NOT solve: graphs with weighted edges and constraints on cumulative weight (need ILP / scheduling theory).
13. Follow-up Q&A
Q1: Return the actual order (LC 210).
A: Use Kahn’s. The order in which nodes are popped from the queue IS a valid topological order. Append to a list; if len(list) == N at end, return list; else return [] (cycle).
Q2: Return the lexicographically smallest topo order. A: Replace Kahn’s queue (FIFO) with a min-heap. When multiple zero-in-degree nodes exist, always pop the smallest. Still O((V+E) log V) due to heap.
Q3: There are multiple valid orderings. Return them all. A: Backtracking. At each step, enumerate every zero-in-degree node, recurse, then unselect. Worst case exponential (think K courses with no prereqs — K! orderings). Use only when the count is small or use it for enumeration with early termination.
Q4: A new prerequisite arrives in real time. Update the answer.
A: Incremental topological sort. Pearce-Kelly algorithm: maintain ordinal labels; when an edge u→v is added, if ord(u) < ord(v), nothing to do; else recompute labels for the affected subgraph. Amortized O(δ²·5) per edge where δ is the affected region size. The Staff signal is naming Pearce-Kelly without prompting.
Q5: Detect the actual cycle if one exists (not just yes/no).
A: Use DFS 3-color and maintain a parent pointer. When you hit a GRAY node v from current node u, walk parent pointers from u back to v and prepend v. That’s the cycle.
14. Solution Walkthrough
See solution.py:
can_finish_kahns(N, prereqs)— BFS / Kahn’s. Returns bool.can_finish_dfs_coloring(N, prereqs)— 3-color DFS. Returns bool.find_order_kahns(N, prereqs)— LC 210 bonus. Returns the order or[].find_cycle_dfs(N, prereqs)— Q5 follow-up. Returns the cycle as a list of node ids orNone.has_cycle_brute(N, prereqs)— O(N · (V+E)) baseline: run DFS from every node, track parent path. Used as the oracle in stress tests.stress_test()— 1000 random graphs (both guaranteed-DAG and possibly-cyclic), all three implementations agree with brute baseline;find_order_kahnsalso verified to produce a valid order.edge_cases()— empty prereqs, self-loop, two-node cycle, chain length 1000, diamond, disconnected with one cyclic component.
Run: python3 solution.py → “ALL TESTS PASSED”.
15. Beyond the Problem
Principal-engineer code review comment:
“Two things. First: please log the cycle path when you detect one.
return Falseis what LeetCode wants; what production needs israise CycleError(cycle=[3, 7, 12, 3])so the on-call engineer can read the dependency log at 3am and know which three services to break. Cycle reconstruction is 6 extra lines in the 3-color DFS — please do it. Second: if you’re about to ship Kahn’s, also exporttopological_order()— once we have the in-degree machinery built, refusing to return the order is just rude to the next caller.”
Connections forward:
- p24 (Pacific Atlantic) — multi-source BFS, another canonical “inversion” pattern.
- LC 210 (Course Schedule II), LC 269 (Alien Dictionary), LC 444 (Sequence Reconstruction) — all same algorithm.
- LC 802 (Find Eventual Safe States) — 3-color DFS variant.
- Production: every build system on Earth (
make,ninja,Bazel,npm,pip,cargo,gradle), spreadsheet recalc engines, course catalog planners, dataflow schedulers (Airflow, Dagster, Prefect), database query planners (join order via topo sort over dependency graph of intermediate results). - Theoretical: topological sort is THE entry point to scheduling theory; from here, CPM (critical path method), PERT, and the entire field of project scheduling open up.