p23 — Course Schedule

Source: LeetCode 207 · Medium · Topics: DFS, BFS, Topological Sort, Graph Companies (2024–2025 frequency): Google (very high), Meta (high), Amazon (high), Microsoft (high), Apple (medium), Bloomberg (medium), LinkedIn (high) Loop position: mid-to-late onsite. Heavy “do you understand graphs as a model” signal.

1. Quick Context

numCourses courses and a list of [a, b] prerequisite pairs (meaning “to take a, you must first take b”). Return whether it’s possible to finish all courses.

This is cycle detection on a directed graph, framed as a real-world planning problem. Two canonical algorithms solve it and you should know both:

  1. Kahn’s algorithm (BFS-based topological sort): track in-degrees, repeatedly remove zero-in-degree nodes. If you process all N nodes → no cycle. Otherwise → cycle exists.
  2. DFS 3-coloring (WHITE / GRAY / BLACK): WHITE = unvisited, GRAY = on current recursion path, BLACK = fully processed. A GRAY node encountered during DFS = back-edge = cycle.

What it looks like it tests: can you detect cycles. What it actually tests: (a) do you model “prereq” correctly as a directed edge (which way does the arrow go?), (b) do you know BOTH algorithms or just one, (c) can you extend to LC 210 (return the actual ordering — Course Schedule II) instantly, (d) do you handle self-loops as the trivial cycle case.


🔗 https://leetcode.com/problems/course-schedule/

STOP. 25-min timer. Solve both ways before reading.


3. Prerequisite Concepts


4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Given N courses and a list of [a, b] pairs meaning ‘must take b before a,’ determine whether a complete ordering exists. Equivalent: does the dependency graph have NO directed cycle?”

Step 2 — Clarify (CRITICAL):

  • “For [a, b], does b point to a or a point to b?” THIS ARROW DIRECTION IS THE QUESTION. By LC convention, [a, b] means b → a (“complete b, then can do a”). State your interpretation explicitly.
  • “Are duplicate prereq pairs allowed?” (Doesn’t affect correctness, but slight constant factor.)
  • “Can a course be its own prereq ([a, a])?” (LC says no, but state your algorithm handles it as a trivial cycle.)
  • “Disconnected components?” (Yes — your loop must seed BFS/DFS from EVERY zero-in-degree node, not just node 0.)

Step 3 — Constraints: N ≤ 2000, len(prerequisites) ≤ 5000. O(V+E) trivially fits.

Step 4 — Examples:

  • N=2, prereqs=[[1,0]] → True (take 0, then 1).
  • N=2, prereqs=[[1,0],[0,1]] → False (cycle).
  • N=3, prereqs=[] → True.

Step 5 — Brute Force: “Try all N! orderings, check each.” Trivially correct, useless complexity. Mention as proof of correctness baseline; don’t implement.

Step 6 — Complexity: O(V+E) time and space, both algorithms.

Step 7 — Pattern Recognition: “DAG-ness / cycle detection on a directed graph.” Pattern names to drop aloud: “Kahn’s algorithm,” “topological sort,” “3-color DFS.”

Step 8 — Develop (Kahn’s):

def canFinish(N, prereqs):
    adj = [[] for _ in range(N)]
    indeg = [0] * N
    for a, b in prereqs:
        adj[b].append(a)
        indeg[a] += 1
    q = deque(i for i in range(N) if indeg[i] == 0)
    processed = 0
    while q:
        u = q.popleft()
        processed += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return processed == N

Develop (DFS 3-color):

WHITE, GRAY, BLACK = 0, 1, 2
def canFinish(N, prereqs):
    adj = [[] for _ in range(N)]
    for a, b in prereqs:
        adj[b].append(a)
    color = [WHITE] * N
    def dfs(u):
        if color[u] == GRAY: return False  # back-edge = cycle
        if color[u] == BLACK: return True
        color[u] = GRAY
        for v in adj[u]:
            if not dfs(v): return False
        color[u] = BLACK
        return True
    return all(dfs(i) for i in range(N))

Step 9 — Test: Empty prereqs, self-loop, two-node cycle, chain, diamond, two-component (one valid + one cyclic).


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — Kahn vs DFS 3-Color

Both are O(V+E). Pick by what you need next:

NeedAlgorithm
Just “is there a cycle?”DFS 3-color (slightly less bookkeeping)
The actual topological order (LC 210)Kahn’s (the BFS order IS the topo order)
Identify all nodes in a cycleDFS 3-color (when you hit GRAY, walk the stack back)
Stream-friendly (nodes arrive over time)Kahn’s (decrement in-degrees as edges arrive)

Why the 3-coloring works: the only way for DFS to discover a cycle is to encounter a node still on its current recursion stack — that’s exactly the “GRAY” state. WHITE = not yet touched. BLACK = fully explored and known cycle-free.

Common pitfall — using a single visited set: if you collapse WHITE/GRAY/BLACK into “visited / not visited,” you can no longer distinguish “I’m currently descending through this node” from “I’ve finished with this node.” A node that’s fully processed (BLACK) shouldn’t trigger a cycle alarm just because some unrelated path revisits it.


7. Anti-Pattern Analysis

  1. Wrong arrow direction. [a, b] means b → a. Reversing it builds the wrong graph and gives wrong answers on asymmetric examples. State your interpretation OUT LOUD before coding.
  2. Using a single visited set instead of 3 colors. Reports false cycles on diamond graphs where two paths converge.
  3. Seeding BFS from only node 0. Misses disconnected components. Kahn’s MUST initialize the queue from ALL zero-in-degree nodes.
  4. Forgetting processed == N check. If you just return True when the loop ends, you return True for cyclic graphs (the queue empties early because no node ever reaches in-degree 0).
  5. Mutating the original indeg array without copying — only matters if you reuse across runs, but flag it.
  6. No self-loop handling. A self-loop [a, a] is a cycle of length 1. Kahn’s catches it (indeg[a] is permanently ≥1, never enters queue). DFS 3-color catches it (entering dfs(a) sets GRAY, recurses into a, sees GRAY, returns False).
  7. Recursive DFS without sys.setrecursionlimit on N=2000 chain. Python default is 1000. A chain 0→1→2→...→1999 blows the stack.

8. Skills & Takeaways

  • Arrow-direction discipline — half of all topo-sort bugs are reversed-edge bugs. State direction OUT LOUD.
  • Kahn’s = topo order for free — same loop solves LC 210.
  • 3-color DFS — the canonical cycle-detection pattern. GRAY = “on current path.”
  • processed == N invariant — Kahn’s correctness depends on this final check.
  • Disconnected components — seed from ALL valid roots, not just node 0.
  • The reduction — many real problems (build systems, course planning, package managers, spreadsheet formula recalc) reduce to topological sort. Recognize the reduction in 30 seconds.

9. When to Move On

Move on when:

  • You can write both Kahn’s and 3-color DFS from memory in under 5 minutes each.
  • You can extend Kahn’s to LC 210 (return the ordering) in under 2 minutes.
  • You handle: empty prereqs, self-loop, disconnected components, chain of length N=2000.
  • Your stress test runs both algorithms + a brute-force N! checker (small N) and confirms agreement.
  • You can articulate aloud which algorithm to pick given each downstream need (just-detect vs need-order vs find-cycle).

10. Company Context

Google:

  • Asks LC 207 then immediately LC 210 (return the order) then LC 269 (Alien Dictionary — derive the order from sample dictionary words). Test of pattern transfer across three problems.
  • Will ask about parallel topological sort: “If we have 8 workers, schedule courses to finish in min wall-clock time.” Reduces to “compute longest path in the DAG” (critical path method).
  • Scorecard phrase: “Algorithm — Kahn’s + 3-color, both correct; extended cleanly to ordering and parallel cases.”

Meta:

  • Variant: “given the prerequisite list, return the courses that are guaranteed to be in every topo order.” Reduces to checking which nodes have a unique predecessor at every Kahn step. Hard variant.
  • Scorecard phrase: “Strong — handled topo + variant.”

Amazon:

  • Pairs with a real-world frame: “build the deployment order for our microservices given the dependency manifest.” Same problem, business vocabulary.
  • Asks LC 210 nearly always. Have it ready.
  • Scorecard phrase: “Customer Obsession + Coding Excellence — translated business framing, shipped clean topo sort.”

Microsoft:

  • May ask only LC 207 but expects you to volunteer LC 210 as the “useful generalization.” Senior+ candidates volunteer.
  • Scorecard phrase: “Solid — recognized topo sort, demonstrated 2 algorithms.”

Apple:

  • Often presents as build-graph cycle detection in their internal dependency system. Asks specifically about handling self-loops and reporting WHICH cycle exists, not just yes/no.
  • Scorecard phrase: “Strong — used 3-color DFS, reported the cycle path.”

Bloomberg:

  • Asks about computing the order in which to evaluate spreadsheet cells given inter-cell formula dependencies. Topo sort with the additional wrinkle of “cells can be added/removed dynamically.” Mention incremental topological sort as a Staff-level discussion.
  • Scorecard phrase: “Strong — pragmatic translation to spreadsheet problem.”

LinkedIn:

  • Frequent. Often pairs with “alien dictionary” (LC 269) in same loop.
  • Scorecard phrase: “Pattern recognition — saw topo sort instantly.”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
Arrow directionGuessesAsks once, then codesStates interpretation BEFORE codingStates + writes a 1-line comment in the code
Algorithm countDFS only (no cycle handling)Kahn’s OR DFS 3-colorBoth, picks by downstream needBoth + discusses incremental + parallel variants
DisconnectedMissesCatches after one test failSeeds all roots first trySeeds + tests with a multi-component example
Final checkNoneReturns True when loop ends (wrong)processed == Nprocessed == N + asserts in code
ExtensionStuckSolves LC 210 after a hintVolunteers LC 210 unpromptedVolunteers LC 210 + LC 269 + parallel scheduling

12. Level Delta

Mid (L4):

  • Kahn’s algorithm, correct, with processed == N check.
  • Builds adjacency from [a, b] with the right direction (after one clarifying question).
  • Handles disconnected components correctly.

Senior (L5):

  • Both Kahn’s AND 3-color DFS, both clean.
  • States algorithm choice aloud: “I’ll use Kahn’s because the follow-up will likely ask for the order.”
  • Solves LC 210 in 2 minutes when asked.
  • Handles self-loop explicitly.

Staff (L6):

  • Discusses incremental topological sort (when edges arrive/depart dynamically).
  • Knows LC 269 (Alien Dictionary) as the same algorithm + character-pair extraction.
  • Discusses parallel scheduling: longest path in DAG = critical-path-method = min wall-clock with infinite parallelism.
  • Discusses lexicographically smallest topo order via min-heap variant of Kahn’s.

Principal (L7+):

  • Reframes as CSP (constraint satisfaction problem) with one constraint type (ordering); discusses how this generalizes to richer constraint solvers.
  • Notes that this is the algorithm shipped in make, ninja, Bazel, npm, pip, cargo, every build system on Earth. Discusses how those systems detect and report cycles to users (cycle reconstruction in DFS 3-color: when you hit GRAY, walk the call stack backwards).
  • Discusses how spreadsheet engines (Excel, Google Sheets) handle this incrementally — recalc only the cells downstream of changed input cells (downstream subgraph of a topologically sorted DAG).
  • Discusses what topological sort does NOT solve: graphs with weighted edges and constraints on cumulative weight (need ILP / scheduling theory).

13. Follow-up Q&A

Q1: Return the actual order (LC 210). A: Use Kahn’s. The order in which nodes are popped from the queue IS a valid topological order. Append to a list; if len(list) == N at end, return list; else return [] (cycle).

Q2: Return the lexicographically smallest topo order. A: Replace Kahn’s queue (FIFO) with a min-heap. When multiple zero-in-degree nodes exist, always pop the smallest. Still O((V+E) log V) due to heap.

Q3: There are multiple valid orderings. Return them all. A: Backtracking. At each step, enumerate every zero-in-degree node, recurse, then unselect. Worst case exponential (think K courses with no prereqs — K! orderings). Use only when the count is small or use it for enumeration with early termination.

Q4: A new prerequisite arrives in real time. Update the answer. A: Incremental topological sort. Pearce-Kelly algorithm: maintain ordinal labels; when an edge u→v is added, if ord(u) < ord(v), nothing to do; else recompute labels for the affected subgraph. Amortized O(δ²·5) per edge where δ is the affected region size. The Staff signal is naming Pearce-Kelly without prompting.

Q5: Detect the actual cycle if one exists (not just yes/no). A: Use DFS 3-color and maintain a parent pointer. When you hit a GRAY node v from current node u, walk parent pointers from u back to v and prepend v. That’s the cycle.


14. Solution Walkthrough

See solution.py:

  • can_finish_kahns(N, prereqs) — BFS / Kahn’s. Returns bool.
  • can_finish_dfs_coloring(N, prereqs) — 3-color DFS. Returns bool.
  • find_order_kahns(N, prereqs) — LC 210 bonus. Returns the order or [].
  • find_cycle_dfs(N, prereqs) — Q5 follow-up. Returns the cycle as a list of node ids or None.
  • has_cycle_brute(N, prereqs) — O(N · (V+E)) baseline: run DFS from every node, track parent path. Used as the oracle in stress tests.
  • stress_test() — 1000 random graphs (both guaranteed-DAG and possibly-cyclic), all three implementations agree with brute baseline; find_order_kahns also verified to produce a valid order.
  • edge_cases() — empty prereqs, self-loop, two-node cycle, chain length 1000, diamond, disconnected with one cyclic component.

Run: python3 solution.py → “ALL TESTS PASSED”.


15. Beyond the Problem

Principal-engineer code review comment:

“Two things. First: please log the cycle path when you detect one. return False is what LeetCode wants; what production needs is raise CycleError(cycle=[3, 7, 12, 3]) so the on-call engineer can read the dependency log at 3am and know which three services to break. Cycle reconstruction is 6 extra lines in the 3-color DFS — please do it. Second: if you’re about to ship Kahn’s, also export topological_order() — once we have the in-degree machinery built, refusing to return the order is just rude to the next caller.”

Connections forward:

  • p24 (Pacific Atlantic) — multi-source BFS, another canonical “inversion” pattern.
  • LC 210 (Course Schedule II), LC 269 (Alien Dictionary), LC 444 (Sequence Reconstruction) — all same algorithm.
  • LC 802 (Find Eventual Safe States) — 3-color DFS variant.
  • Production: every build system on Earth (make, ninja, Bazel, npm, pip, cargo, gradle), spreadsheet recalc engines, course catalog planners, dataflow schedulers (Airflow, Dagster, Prefect), database query planners (join order via topo sort over dependency graph of intermediate results).
  • Theoretical: topological sort is THE entry point to scheduling theory; from here, CPM (critical path method), PERT, and the entire field of project scheduling open up.