Lab 06 — Topological Sort (Alien Dictionary)
Goal
Build a topological sort over an inferred constraint graph — a problem whose graph is not given but must be extracted from the input. After this lab you should be able to recognize “ordering with constraints” problems in <60 seconds, write Kahn’s algorithm from a blank screen in <8 minutes, and identify and handle the three degenerate cases of alien-dictionary parsing (prefix violation, no constraint differs, cyclic dependency).
Background Concepts
A topological sort orders the vertices of a DAG such that every directed edge u → v has u before v. Kahn’s algorithm repeatedly removes a node of in-degree 0, decrementing its neighbors’ in-degrees. If the final order has length V, the graph is a DAG and the order is valid. Otherwise, a cycle exists.
For alien dictionary, the input is a list of words known to be sorted lexicographically in some unknown alien alphabet. Each adjacent pair of words gives at most one ordering constraint between two characters: the first position where they differ tells you a < b for those characters. Build the constraint graph, run topological sort, output an order. Handle three degenerate inputs:
- No constraint differs between adjacent words but the second is a prefix of the first (e.g.,
["abc", "ab"]) — invalid; return"". - No constraint differs but it’s because the words are equal up to a common length and the second is the longer one (
["ab", "abc"]) — fine; no constraint added. - Cycle in the constraint graph — invalid; return
"".
Interview Context
Alien Dictionary (LC 269) is one of the most-asked Hard graph problems at Meta, Google, and Airbnb. Premium-only on LeetCode but widely leaked. It tests: (1) recognizing topological sort, (2) constraint extraction from non-graph input, (3) handling all three edge cases. Candidates who solve only the happy path lose major points. The senior signal: enumerate the three failure modes upfront, before writing code.
Problem Statement
There is a new alien language using English letters. The order of letters is unknown. Given a list of words sorted lexicographically by the alien language’s rules, return any valid letter ordering. If no valid ordering exists, return "". If multiple are valid, return any.
Constraints
- 1 ≤ |words| ≤ 100
- 1 ≤ |words[i]| ≤ 100
- All words consist of lowercase English letters.
Clarifying Questions
- Is the answer unique? (No — any valid topological order.)
- Does the answer include letters not appearing in any word? (No — only letters that appear.)
- Are duplicate words possible? (Possible; treat normally — they yield no constraint.)
- What does “lexicographically sorted” mean for words of different length? (Standard prefix rule; if A is a prefix of B, then A < B; if B is a prefix of A, the input is invalid.)
- What if
wordsis a single word? (Output any permutation of its unique letters.)
Examples
words = ["wrt","wrf","er","ett","rftt"]
→ "wertf" (one valid ordering)
words = ["z","x"]
→ "zx"
words = ["z","x","z"]
→ "" (z and x must precede each other — cycle)
words = ["abc","ab"]
→ "" (prefix violation — "abc" can't come before "ab")
Initial Brute Force
Enumerate all permutations of the alphabet of size ≤ 26; for each, verify that the input words are in that lexicographic order. O(26!) — infeasible.
Brute Force Complexity
O(26! · ΣL) — astronomical.
Optimization Path
The answer requires O(V + E) where V = number of distinct letters (≤ 26) and E = number of pairwise constraints (≤ |words| - 1). Kahn’s algorithm runs in O(V + E). Constraint extraction is O(Σ |words[i]|).
The total is O(V + E + Σ L) — bounded by Σ L since V ≤ 26. Trivially fast.
Final Expected Approach
- Initialize
in_degree[c] = 0for every letter that appears anywhere. - Build adjacency: for each adjacent pair
(w1, w2)inwords:- Walk both words in parallel; at the first index
iwhere they differ, add edgew1[i] → w2[i](if not already present); break. - If no differing index found and
len(w1) > len(w2): prefix violation — return"".
- Walk both words in parallel; at the first index
- Run Kahn’s: queue all letters with
in_degree[c] == 0; pop, append to order, decrement neighbors. - If the order length equals the number of distinct letters, return the order; else cycle → return
"".
Data Structures Used
defaultdict(set)for adjacency (set prevents duplicate edges).dict[char, int]forin_degree.collections.dequefor the Kahn queue.list[char]for the result.
Correctness Argument
Each adjacent pair contributes at most one constraint — the first differing character. This is sound: if the words are correctly sorted, the relative order of w1[i] and w2[i] (at the first differing index) must be w1[i] < w2[i] in the alien alphabet. No constraint can be inferred from differences after the first; those are consistent with but not implied by the sortedness.
Topological sort over these constraints produces an ordering where every constraint a < b is satisfied (a precedes b in the output). If a cycle exists, the constraints are unsatisfiable and the input is impossible. The prefix-violation case is the only constraint-extraction-time invalid input.
Complexity
| Operation | Time | Space |
|---|---|---|
| Constraint extraction | O(Σ L) | O(unique edges) ≤ O(26²) |
| Kahn’s | O(V + E) | O(V + E) |
| Total | O(Σ L) | O(unique letters + edges) |
Implementation Requirements
from collections import defaultdict, deque
def alienOrder(words):
adj = defaultdict(set)
in_deg = {c: 0 for w in words for c in w}
for i in range(len(words) - 1):
w1, w2 = words[i], words[i+1]
found = False
for j in range(min(len(w1), len(w2))):
if w1[j] != w2[j]:
if w2[j] not in adj[w1[j]]:
adj[w1[j]].add(w2[j])
in_deg[w2[j]] += 1
found = True
break
if not found and len(w1) > len(w2):
return ""
queue = deque([c for c, d in in_deg.items() if d == 0])
order = []
while queue:
c = queue.popleft()
order.append(c)
for nb in adj[c]:
in_deg[nb] -= 1
if in_deg[nb] == 0:
queue.append(nb)
return "".join(order) if len(order) == len(in_deg) else ""
Tests
- Standard:
["wrt","wrf","er","ett","rftt"]→ some topo ofw<e, r<t, t<f. - Single word:
["abc"]→ some permutation of{a,b,c}. - Prefix violation:
["abc","ab"]→"". - Tie/equal words:
["a","a"]→"a". - Cycle:
["z","x","z"]→"". - All same length, all letters used:
["aa","ab","cb"]. - Single letter:
["z"]→"z". - Long words, tiny alphabet: stress for in-degree correctness.
Follow-up Questions
- “Find the lexicographically smallest valid order (in standard a-z order).” → Kahn’s with a min-heap instead of a queue.
- “Find all valid orders.” → Backtracking over Kahn’s choices; exponential.
- “Verify a given ordering.” → For each adjacent word pair, scan for the first differing char and check ordering. O(Σ L).
- “Online: words arrive one by one.” → Maintain the adjacency incrementally; rerun Kahn’s lazily on query.
- “What if the input has typos (wrongly-ordered pairs)?” → Return any consistent ordering, or report the conflict edge.
Product Extension
Build systems (Bazel, Make, Gradle) compute build orders via topological sort over the dependency DAG; cycle detection is a critical correctness property. Database query planners use topo sort over join-graph dependencies. Distributed task schedulers (Airflow, Argo) execute DAGs of jobs in topological order.
Language/Runtime Follow-ups
- Python:
defaultdict(set)andcollections.dequeare essential.dict.items()iteration is fine. - Java:
Map<Character, Set<Character>>andMap<Character, Integer>for in-degree;ArrayDeque<Character>for queue. Useint[26]for in-degree if alphabet is fixed. - Go:
map[byte]map[byte]boolfor adjacency;map[byte]intfor in-degree; slice as queue. Or[26]intfor in-degree as alphabet is fixed. - C++:
unordered_map<char, unordered_set<char>>;array<int, 26>for in-degree;queue<char>. - JS/TS:
Map<string, Set<string>>andMap<string, number>; array-as-queue with care (shiftis O(N)).
Common Bugs
- Adding duplicate edges to in-degree — use a set for adjacency, check membership before incrementing.
- Missing the prefix violation check —
["abc","ab"]returns"abc"if you don’t handle this. - Building in-degree only for letters that have outgoing edges, missing letters that only appear as targets.
- Initializing
in_degreeonly for the first word’s letters — letters appearing only in later words get missed. - Comparing
len(order) == 26instead of== len(in_deg)(only used letters count). - Using a list instead of a set for adjacency, then double-incrementing in-degree.
- Returning the order in the wrong direction (Kahn’s gives the right direction; DFS post-order needs to be reversed).
Debugging Strategy
Print the adjacency and in-degree maps after constraint extraction. Verify each constraint is justified by tracing back to the input pair. For cycles, print the in-degree map at the point Kahn’s stalls — the remaining-positive in-degrees identify nodes in the cycle. For prefix-violation false negatives, print the pair (w1, w2) at each iteration to confirm the check fires.
Mastery Criteria
- Recognized “ordering with constraints” as topological sort in <60 seconds.
- Wrote constraint extraction from word pairs from cold start in <5 minutes.
- Wrote Kahn’s algorithm from blank screen in <6 minutes.
- Enumerated the three degenerate cases (cycle, prefix violation, equal-prefix-shorter-second) before coding.
- Solved LC 269 in <25 minutes from cold start.
- Solved LC 207 (Course Schedule) in <8 minutes by extracting the constraint structure from cold.
- Articulated the white-path lemma / DFS-post-order alternative in <60 seconds.