Lab 01 — BFS Shortest Path (Word Ladder)
Goal
Implement an unweighted shortest-path search on an implicit graph where the nodes are dictionary words and the edges connect any two words that differ in exactly one character. After this lab you should be able to recognize a word-ladder / state-transformation problem in <60 seconds, build the wildcard-bucket adjacency in <3 minutes, and write the BFS body from a blank screen in <5 minutes with correct visited-on-enqueue semantics.
Background Concepts
BFS on an unweighted graph visits nodes in non-decreasing order of distance from the source: the source first (distance 0), then its neighbors (distance 1), then their neighbors (distance 2), and so on. The first time a node is dequeued, its distance is final. This phase teaches the wildcard-bucket trick that makes word-ladder graphs tractable: rather than checking all O(N²) word-pairs for adjacency, build a dict mapping each L-character “wildcarded” pattern (e.g., h*t, *ot, ho*) to the list of words matching that pattern. Two words are adjacent iff they share at least one bucket.
The buckets are O(N · L) entries total; constructing them is O(N · L); finding all neighbors of a word is O(L · σ) where σ is the average bucket size. The total BFS cost is O(N · L²) instead of O(N² · L).
Interview Context
Word Ladder (LC 127) is a top-50 interview problem at Meta and Amazon — both companies have asked it within the past year. Variants appear at Google (“minimum genetic mutation” — LC 433) and Bloomberg. It tests three things at once: (1) recognizing the implicit graph, (2) building it efficiently, (3) running clean BFS. Candidates who try for each pair: if differs by one: connect time out at N = 5000. Bombing this problem on a phone screen is a serious negative signal at L4+.
Problem Statement
Given a beginWord, an endWord, and a list wordList, return the length of the shortest transformation sequence from beginWord to endWord such that:
- Only one letter changes per step.
- Every intermediate word must be in
wordList. beginWorddoes not need to be inwordList.
Return 0 if no such sequence exists.
Constraints
- 1 ≤
beginWord.length≤ 10 - All words have the same length, L.
- 1 ≤
wordList.length≤ 5000 - All words consist of lowercase English letters.
beginWord != endWord.- All words in
wordListare unique.
Clarifying Questions
- Is
beginWordrequired to differ fromendWord? (Yes — guaranteed.) - Does the answer count
beginWordandendWord? (Yes — sequence length includes both endpoints.) - If
endWordis not inwordList, is the answer 0? (Yes — by the rules.) - Are case-sensitive comparisons required? (No — all lowercase.)
- Can
beginWorditself appear inwordList? (Yes; treat normally.)
Examples
beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
→ 5 (hit → hot → dot → dog → cog)
beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
→ 0 (cog not in wordList)
Initial Brute Force
Materialize the graph: for each pair of words, check if they differ by one letter (O(L) check per pair). Build adjacency in O(N² · L). Then run BFS in O(V + E) = O(N²) edges worst case.
Brute Force Complexity
Time O(N² · L) for graph construction, O(N²) for BFS. Total O(N² · L). Space O(N²). At N = 5000, L = 10: 2.5 × 10^8 ops — borderline TLE in Python; passes in C++ tightly.
Optimization Path
The bottleneck is graph construction. Wildcard buckets eliminate it: for each word, generate L wildcards and append to a bucket dict. Construction is O(N · L²) — at N = 5000, L = 10, that’s 5 × 10^5 ops, two orders of magnitude better. Neighbor enumeration is also faster: only words sharing a bucket are candidates, which prunes dramatically vs scanning all N words.
Bidirectional BFS is a further optimization that ~halves the work in practice (search from both ends, meet in the middle), but adds complexity and is overkill at N = 5000.
Final Expected Approach
- If
endWordnot inwordList, return 0. - Build
buckets: a dict mapping each wildcard pattern to the list of words matching it. - BFS from
beginWordwith distance 0; on dequeue, generate L wildcards; for each, look up the bucket and enqueue any unvisited word with distance + 1. - On reaching
endWord, return distance + 1 (because the answer counts both endpoints). - If queue exhausts, return 0.
Data Structures Used
dict[str, list[str]]— wildcard buckets.set[str]— visited words.collections.deque— BFS queue, holding(word, distance)tuples.
Correctness Argument
BFS on an unweighted graph: when endWord is first dequeued, its distance is the minimum number of edges from beginWord. Each edge represents a one-letter change between dictionary words. Therefore the distance equals the minimum number of one-letter changes — and the sequence length is distance + 1. Visited-on-enqueue ensures each word enters the queue at most once, so total work is O(V · L · σ) where σ is the average bucket size.
Complexity
| Operation | Time | Space |
|---|---|---|
| Build buckets | O(N · L²) | O(N · L) |
| BFS | O(N · L²) | O(N) |
| Total | O(N · L²) | O(N · L) |
Implementation Requirements
from collections import defaultdict, deque
def ladderLength(beginWord, endWord, wordList):
word_set = set(wordList)
if endWord not in word_set:
return 0
L = len(beginWord)
buckets = defaultdict(list)
for w in word_set:
for i in range(L):
buckets[w[:i] + '*' + w[i+1:]].append(w)
visited = {beginWord}
queue = deque([(beginWord, 1)])
while queue:
word, d = queue.popleft()
if word == endWord:
return d
for i in range(L):
pat = word[:i] + '*' + word[i+1:]
for nb in buckets[pat]:
if nb not in visited:
visited.add(nb)
queue.append((nb, d + 1))
buckets[pat] = [] # optional: clear bucket to avoid reprocessing
return 0
Tests
- Standard:
hit → cogwith full path → 5. - Missing endWord: return 0.
beginWord == endWord: technically violates constraints, but should return 1 if asked.- Single-step:
hit → hotwith wordList=[“hot”] → 2. - No path: disconnected words → 0.
- Long L: words of length 10, N = 5000 (load test).
- All same-length: invariant must hold; assert in code.
Follow-up Questions
- “Return all shortest paths, not just length.” → BFS to identify the layer of
endWord, then DFS backward through parent pointers stored at each layer. - “What if word lengths differ across the list?” → Edges are now insert/delete/substitute; problem reduces to edit-distance graph. Out of scope here.
- “What if N = 10^6?” → Bidirectional BFS halves the layer count; trie-based neighbor finding can replace bucket dicts.
- “Stream of new words being added live.” → Maintain buckets incrementally; BFS becomes a per-query operation.
- “What if changing a letter has a cost?” → Now weighted; switch to Dijkstra.
Product Extension
Spell-correctors, fuzzy-matching APIs, and DNA-mutation analyzers all use similar implicit-graph BFS. Google’s “did you mean” suggestion historically used Levenshtein-distance graphs over its query log; word-ladder BFS is the toy version of that.
Language/Runtime Follow-ups
- Python:
defaultdict(list)andcollections.dequeare essential. String slicing is O(L); thew[:i] + '*' + w[i+1:]pattern allocates a new string per call (3 × 10^5 per call * 5000 words = 1.5 × 10^9 — measured ~1.5s in Python). Acceptable; for faster, precompute patterns once per word. - Java: use
HashMap<String, List<String>>andArrayDeque<String>.StringBuilderfor pattern construction is faster than string concat. Useintdistance via a parallel map or wrap in a custom record. - Go:
map[string][]stringand a slice-based queue (q = q[1:]is O(1) amortized for slice queues, but channels are easier). Strings are immutable so building patterns is O(L) regardless. - C++:
unordered_map<string, vector<string>>andqueue<pair<string,int>>. Preallocate to avoid rehashing. Usestring_viewif possible to avoid copies; or(word_index, distance)to avoid string keys entirely. - JS/TS:
Mapand an array used as a queue (shift()is O(N) — instead, use a deque library or two-stack approach). Strings are immutable; pattern construction allocates.
Common Bugs
- Forgetting to check
endWord in word_setup front — wastes work if missing. - Visited check on dequeue, not on enqueue — exponential blowup of queue size.
- Returning the BFS distance instead of distance + 1 (or vice versa) — off by one.
- Including
beginWordinword_setand then visiting it on a wildcard match — easy if you don’t initializevisited = {beginWord}first. - Generating wildcards with the wrong character (
'_'vs'*') and getting collision-free buckets that are also empty. - Forgetting that
wordListmay contain duplicates if you stored as list — use a set.
Debugging Strategy
Print the buckets for a tiny example (["hot", "dot", "dog"], L=3) and verify each wildcard pattern maps to the expected words. Print the BFS queue state after each layer. If the BFS terminates too early, trace which word was dequeued at the failure point and which neighbors weren’t generated. If too slow, profile with cProfile (Python) — likely you’re not visited-marking on enqueue.
Mastery Criteria
- Recognized the implicit-graph signal (one-character-difference adjacency) within 60 seconds.
- Wrote the bucket construction from blank screen in <3 minutes, no off-by-ones.
- Wrote correct BFS with visited-on-enqueue in <5 minutes from cold start.
- Stated O(N · L²) complexity unprompted.
- Solved LC 127 in <15 minutes from cold start.
- Solved LC 433 (Minimum Genetic Mutation, a near-clone) in <10 minutes.
- Articulated the wildcard-bucket vs all-pairs tradeoff in <30 seconds when asked.