p25 — Word Ladder

Source: LeetCode 127 · Hard · Topics: BFS, Hash Table, String, Implicit Graph Companies (2024–2025 frequency): Amazon (high), Meta (high), Google (high), Microsoft (high), Bloomberg (medium), Apple (medium) Loop position: late-onsite, the “graph round.” Most candidates fail by writing O(N²·L) naive adjacency.

1. Quick Context

Given a beginWord, an endWord, and a wordList, return the length of the shortest transformation sequence from beginWord to endWord (each step changes exactly one letter; each intermediate word must be in wordList).

This is the canonical implicit graph + BFS problem. The challenge isn’t BFS itself — it’s:

The graph isn’t given. Nodes = words, edges = pairs of words that differ by one letter. You can construct adjacency by either (a) comparing all pairs O(N²·L), or (b) the pattern-bucket trick O(N·L²) — usually a massive speedup.
The bucket trick: for each word, generate L patterns by replacing each letter with * (e.g., "hit" → "*it", "h*t", "hi*"). Group words by these patterns. Two words share a bucket iff they differ by exactly one letter. Adjacency lookup becomes O(L) instead of O(N·L).
Bidirectional BFS: for very large dictionaries, BFS from both ends simultaneously, alternating which frontier to expand (smaller one each iteration). Search space goes from b^d to 2·b^(d/2).
BFS, not DFS. This is unweighted shortest-path. DFS gives a path, not the shortest path.

What it looks like it tests: can you do BFS on a graph. What it actually tests: (a) do you recognize “1-letter-edit” as an implicit graph, (b) do you skip O(N²·L) adjacency in favor of bucket adjacency, (c) do you correctly handle endWord not in wordList (return 0), (d) can you discuss bidirectional BFS as the Hard follow-up.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/word-ladder/

STOP. 30-min timer. This is Hard — try the bucket adjacency before peeking.

3. Prerequisite Concepts

p21–p24 — comfort with BFS on grids and explicit graphs
phase-04-graphs/labs/lab-01-bfs-shortest-path.md
phase-04-graphs/labs/lab-09-graph-modeling.md
BFS level-counting (distance = level # in BFS tree)

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Given beginWord, endWord, and a list of valid intermediate words, find the minimum number of words in a sequence from beginWord to endWord where each consecutive pair differs by exactly one letter and every word in the sequence (except beginWord) is in the wordList. Return 0 if impossible. The ‘length’ is the count of words including endpoints.”

Step 2 — Clarify:

“Is beginWord always in wordList? Always out?” (LC: not guaranteed in. State that this doesn’t affect the algorithm.)
“Must endWord be in wordList?” (LC: yes, otherwise return 0. Check this BEFORE running BFS.)
“Case sensitive?” (LC: all lowercase. Easy to forget.)
“Word length L?” (LC: up to 10. Small. Bucket trick fast.)
“Dictionary size N?” (LC: up to 5000. Bucket beats naive comfortably.)

Step 3 — Constraints: N ≤ 5000, L ≤ 10. Naive O(N²·L) = 5000² · 10 = 2.5e8 — borderline TLE in Python. Bucket O(N·L²) = 5000 · 100 = 5e5 — instant. Always use bucket.

Step 4 — Examples:

beginWord="hit", endWord="cog", wordList=["hot","dot","dog","lot","log","cog"] → 5 (hit→hot→dot→dog→cog).
beginWord="hit", endWord="cog", wordList=["hot","dot","dog","lot","log"] → 0 (endWord not in list).

Step 5 — Brute Force: Compare every pair of words for 1-letter-edit; build full adjacency list; BFS. Time O(N²·L) for adjacency + O(N + E) for BFS. State; show; pivot to bucket.

Step 6 — Complexity: O(N·L²) time and O(N·L) space (each word generates L bucket keys, each O(L) to construct).

Step 7 — Pattern Recognition: “Words differ by one letter” → implicit graph + pattern-bucket adjacency. “Shortest sequence” → BFS (not DFS). “Hard variant for huge dictionaries” → bidirectional BFS.

Step 8 — Develop:

def ladderLength(begin, end, wordList):
    word_set = set(wordList)
    if end not in word_set: return 0
    L = len(begin)
    # Build pattern → list of words
    buckets = defaultdict(list)
    for w in word_set:
        for i in range(L):
            buckets[w[:i] + "*" + w[i+1:]].append(w)
    visited = {begin}
    q = deque([(begin, 1)])
    while q:
        word, dist = q.popleft()
        if word == end: return dist
        for i in range(L):
            key = word[:i] + "*" + word[i+1:]
            for neigh in buckets[key]:
                if neigh not in visited:
                    visited.add(neigh)
                    q.append((neigh, dist + 1))
            buckets[key] = []  # optimization: each bucket processed once
    return 0

Step 9 — Test: endWord not in dict (return 0), beginWord == endWord (return 1 per LC), single-letter words, no path exists.

5. Progressive Hints

If stuck for 5 min, open hints.md.

6. Deeper Insight — Bucket Adjacency vs Bidirectional BFS

Bucket adjacency math: for word length L over alphabet Σ, naive adjacency needs O(L · |Σ|) tries per word to generate all 1-edit neighbors and check set membership. That’s O(N·L·|Σ|). The bucket approach pre-groups words by pattern: O(N·L) preprocessing, then O(L) per word during BFS. The win is that we don’t enumerate non-existent neighbors.

Bidirectional BFS math: if branching factor is b and shortest distance is d, ordinary BFS visits ~b^d nodes. Bidirectional BFS visits ~2·b^(d/2) — exponentially smaller for big d. Implementation:

Maintain two frontiers, forward (from begin) and backward (from end). Initialize with {begin} and {end}.
At each step, EXPAND THE SMALLER FRONTIER (this is the crucial optimization).
For each word in the expanded frontier, generate 1-edit neighbors. If a neighbor is in the OTHER frontier → answer found (current_distance + 1).
Otherwise add to next-level frontier and visited set.

For LC 127, bidirectional BFS turns “TLE” into “instant” on the worst-case dictionaries. Discuss it as the Hard follow-up — implementation in solution.py.

Why BFS, not DFS: unweighted shortest path requires BFS. DFS gives a path but not the shortest; checking shortest with DFS requires exhaustive search (exponential).

The bucket-clear optimization: buckets[key] = [] after first visit ensures each bucket is processed at most once. Without it, the same set of neighbors can be re-checked through other words in the same bucket. With it, total work stays bounded by O(N·L²).

7. Anti-Pattern Analysis

DFS instead of BFS. Gives a path, not the shortest. Or gives shortest but only after exponential exploration.
O(N²·L) adjacency. Comparing every pair character-by-character. Times out on N=5000.
Mutating wordList (a list) for membership tests. in list is O(N); in set is O(1). Always convert.
Not checking endWord in word_set upfront. Wastes a BFS run that’s doomed to fail.
Off-by-one on distance. LC counts the number of WORDS in the path (including both endpoints), not the number of edges. hit→hot→cog is distance 3, not 2.
Forgetting to mark visited. Words re-enqueue infinitely. Mark when ENQUEUEING.
Building buckets but iterating wordList (list) instead of set during construction. Tiny perf issue, but if wordList has duplicates you process them twice.
Bidirectional BFS without “expand smaller frontier” optimization. Loses most of the speedup.
Trying generic-alphabet brute neighbors (for ch in 'abcdefghijklmnopqrstuvwxyz') without bucket pre-grouping. Works but wasteful — generates 26·L neighbors per word, most of which aren’t in the dictionary.

8. Skills & Takeaways

Implicit graph recognition — nodes are not given as a list; edges are implied by a predicate.
Pattern-bucket adjacency — pre-group items by a “with one feature missing” key for fast equality-up-to-one-feature.
BFS = unweighted shortest path — DFS doesn’t.
Bidirectional BFS — the search-tree pruning trick for big graphs.
Set vs list for membership — O(1) vs O(N); never use list for hot-path lookups.
Distance counting convention — clarify “edges” vs “vertices” in step 1.

9. When to Move On

Move on when:

You can write bucket-BFS Word Ladder in under 10 minutes.
You can sketch bidirectional BFS in under 15 minutes.
You can articulate the math: why bidirectional is ~b^(d/2) vs b^d.
Your stress test runs naive-BFS, bucket-BFS, and bidirectional-BFS on 50 random small dictionaries and confirms all agree.
You can pivot to LC 433 (Min Genetic Mutation — same algorithm with letters from {A,C,G,T}) instantly.
You can sketch LC 126 (Word Ladder II — return ALL shortest paths) as a “BFS for distances, then DFS backtracking with predecessor map” extension.

10. Company Context

Amazon:

Late-onsite. Bar Raiser frequently asks this exact problem to senior candidates.
Watches for: do you start with naive then pivot, or jump straight to bucket? Both fine; Senior+ pivots to bucket within 5 min.
Follow-up: LC 126 (all shortest paths). Very few candidates finish in time; signal is whether you can describe the algorithm cleanly even without finishing code.
Scorecard phrase: “Algorithm + Insist on the Highest Standards — recognized implicit graph, used bucket adjacency, sketched bidirectional BFS as scale extension.”

Meta:

Pairs with LC 433 (Min Genetic Mutation) in the same loop. Tests pattern transfer.
Expects bidirectional BFS as the first approach for the “billions of words” variant.
Scorecard phrase: “Strong — bucket BFS + bidirectional + LC 433 transfer.”

Google:

May ask correctness proof: “prove bucket adjacency gives the same edges as pairwise comparison.” Phrase as “two words share a bucket key iff they agree at all but one position iff edit distance is exactly 1.”
May ask: “what if the alphabet was huge (Unicode codepoints) so the wildcard approach explodes?” Discuss: alphabet doesn’t affect bucket construction; the bucket key length is L*L irrespective of |Σ|.
Scorecard phrase: “Algorithm — clear bucket trick, correctness argued.”

Microsoft:

Often the simpler variant: just need bucket-BFS. Bidirectional is bonus.
Scorecard phrase: “Solid — bucket adjacency + BFS.”

Bloomberg:

Frames in finance: “ticker symbols differ by one character — find shortest substitution chain between two symbols given a list of valid tickers.” Same algorithm.
Scorecard phrase: “Pragmatic — applied bucket adjacency to ticker problem.”

Apple:

Variant: “the cost of swapping each letter depends on the letter pair (Dvorak vs QWERTY distance).” Now it’s weighted shortest path → Dijkstra. Discuss the upgrade path.
Scorecard phrase: “Strong — recognized when to upgrade BFS to Dijkstra.”

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff/Principal
First instinct	DFS	BFS w/ O(N²) adjacency	BFS w/ bucket adjacency	BFS w/ bucket + bidirectional
Adjacency	All-pair compare	All-pair (then bucket after hint)	Bucket first try	Bucket + clears buckets per visit
Distance	Off-by-one	Right	Right + states “I count vertices not edges”	Right + writes invariant comment
`endWord` check	Doesn’t	Adds after debug	Adds upfront	Adds upfront + comments why
Scale extension	Stuck	“BFS scales fine”	Sketches bidirectional	Implements + explains b^d → 2b^(d/2) math

12. Level Delta

Mid (L4):

BFS with set-membership for wordList.
Bucket adjacency (possibly after one hint).
Correct distance counting.
Handles endWord not in wordList.

Senior (L5):

Bucket adjacency first try.
Clears buckets per visit (the small optimization).
Sketches bidirectional BFS verbally if time runs out.
States BFS-vs-DFS reasoning aloud.

Staff (L6):

Implements bidirectional BFS cleanly.
Discusses the “expand smaller frontier” optimization and why it matters.
Sketches LC 126 (Word Ladder II) algorithm (BFS for distances + DFS backtracking via predecessor map).
Discusses weighted variant (Dijkstra) for non-uniform letter-swap costs.

Principal (L7+):

Reframes as edit-distance-1 graph + BFS shortest path, a special case of the broader “metric BFS” family.
Discusses how this generalizes to A* with edit-distance heuristic for asymmetric letter alphabets.
Discusses memory: bidirectional BFS doubles peak memory; trade off depending on whether memory or CPU is the bottleneck.
Discusses production: spell-check / suggestion engines, autocorrect, DNA mutation pathway analysis, Levenshtein-1 indexes (FuzzyWuzzy library, Lucene’s FuzzyQuery).

13. Follow-up Q&A

Q1: Return ALL shortest paths (LC 126). A: Two-phase: (1) BFS to compute distance from beginWord to every word, AND for each word remember its predecessors (the words at distance-1 that ARE valid 1-edits). (2) DFS from endWord backwards through the predecessor map to reconstruct all shortest paths. Time can be exponential in the number of shortest paths (a real combinatorial blowup is possible).

Q2: Implement bidirectional BFS. A: Maintain two sets forward = {begin} and backward = {end}. Loop: pick the smaller set, generate all 1-edit neighbors in the dictionary, check if any is in the other set (success → return current step count + 1). Otherwise replace forward with the new neighbors and continue. ~2·b^(d/2) work instead of b^d.

Q3: What if wordList contains 10^7 words? A: Bidirectional BFS becomes mandatory. Also: bucket construction memory becomes a concern (10^7 · 10 = 10^8 keys, each ~10 chars — gigabytes). Solutions: hash buckets to fixed-size integers, or stream bucket construction (don’t materialize all at once).

Q4: What if letters have unequal swap costs (e.g., neighboring keys cheaper)? A: BFS → Dijkstra. Adjacency unchanged (still bucket-based). Replace deque with heapq. Cost vector cost[i][j] for swapping letter i to letter j.

Q5: Add a constraint: at most K consecutive same-letter swaps. A: State-augmented BFS. Each state is (word, consecutive_same_position_swaps). Distance grows polynomially with K. Standard graph-product trick.

14. Solution Walkthrough

See solution.py:

ladder_length_bucket_bfs(begin, end, wordList) — canonical bucket BFS.
ladder_length_bidirectional_bfs(begin, end, wordList) — bidirectional bucket BFS with “expand smaller frontier.”
ladder_length_brute_bfs(begin, end, wordList) — O(N²·L) pairwise-compare BFS. Oracle.
stress_test() — 50 random small dictionaries, all three implementations agree.
edge_cases() — endWord not in dict, begin == end, no path, single-letter words, LC canonical.

Run: python3 solution.py → “ALL TESTS PASSED”.

15. Beyond the Problem

Principal-engineer code review comment:

“The bucket trick is great, but please name it in a docstring. It’s called wildcard-pattern indexing or Levenshtein-1 inverted index, depending on which subfield you ask. Lucene calls it ‘fuzzy query.’ Our autocomplete system uses essentially this code path for ‘did you mean?’ suggestions. If you ever generalize it for our search service, please make the wildcard character configurable (we use \\0 because user queries can contain *), and please factor out build_buckets(words) so that the BFS code becomes 10 lines instead of 30. Also: yes, ship the bidirectional version too — our dictionary has 800k entries and the speedup is 100×.”

Connections forward:

Phase 5 starts dynamic programming; many DP problems boil down to “BFS on a small state graph.”
LC 433 (Min Genetic Mutation), LC 126 (Word Ladder II), LC 752 (Open the Lock), LC 854 (K-Similar Strings), LC 854 — all implicit graph BFS variants.
Production: spell correction (Levenshtein-1 index), DNA mutation pathway analysis, password-cracking dictionaries (mutation chains), autocomplete suggestion engines.
Theoretical: this is BFS on the Hamming-1 graph of strings; the same algorithm extends to BFS on the Levenshtein-k graph for general edit-distance shortest paths.

LeetCode Interview — Extreme Coding