Lab 08 — Backtracking: Word Search II With Trie Pruning

Goal

Master the backtracking-with-pruning pattern in its highest-yield form: a grid DFS guided by a trie, with in-place visit-marking and post-recurse restoration. Deliverable solves LC 212 in O(M·N·4·3^(L-1) + W·L) time, where M·N is grid size, L is max word length, W is number of words. You can articulate why a trie cuts the brute O(W·M·N·4·3^(L-1)) by a factor of W, and why dead-branch pruning of the trie is the speed-of-light optimization.

Background Concepts

DFS on a 2D grid; backtracking with explicit make/undo of state; trie as a multi-pattern matcher; pruning via “if no children, abandon”; visit-marking via in-place mutation (board[r][c] = '#') vs an explicit visited set. Review pattern Backtracking and Trie.

Interview Context

Word Search II is an Amazon / Apple / Bloomberg favorite, and a top-3 hardest commonly-asked Mediums (often listed Hard). The interview signal is recognizing that running LC 79 (Word Search single-word) once per word is catastrophically slow for many words, and that a trie collapses the W independent searches into a single grid traversal. Strong candidates code the trie + DFS in 25 minutes. Elite candidates also implement trie pruning (deleting fully-found subtrees) to avoid revisiting.

Problem Statement

Given an M x N grid of characters and a list of words, return all words from the list that exist in the grid. A word can be constructed from letters of sequentially adjacent cells (horizontal/vertical), each cell used at most once per word.

Constraints

1 ≤ M, N ≤ 12
1 ≤ |words| ≤ 3 × 10^4
1 ≤ |word| ≤ 10
board[i][j] and words[i][j] are lowercase English letters.
All words are distinct.

Clarifying Questions

Each word individually uses each cell ≤ once, but can different words reuse the same cells? (Yes — independent searches per word.)
Diagonal adjacency? (No — only 4-connected.)
Are words guaranteed distinct? (Per constraints, yes.)
Are duplicates in the result allowed? (No — return distinct words found.)
Lower-case only? (Per constraints, yes — alphabet of size 26 simplifies trie nodes to fixed arrays.)

Examples

board = [["o","a","a","n"],
         ["e","t","a","e"],
         ["i","h","k","r"],
         ["i","f","l","v"]]
words = ["oath","pea","eat","rain"]
output = ["eat","oath"]

board = [["a","b"],["c","d"]]
words = ["abcb"]
output = []   (cells not adjacent or reused)

Initial Brute Force

For each word, run LC 79 (single-word search) on the grid.

def find_words_brute(board, words):
    return [w for w in words if exists_in_grid(board, w)]

def exists_in_grid(board, word):
    M, N = len(board), len(board[0])
    def dfs(r, c, i):
        if i == len(word): return True
        if not (0 <= r < M and 0 <= c < N) or board[r][c] != word[i]: return False
        ch, board[r][c] = board[r][c], '#'
        ok = dfs(r+1,c,i+1) or dfs(r-1,c,i+1) or dfs(r,c+1,i+1) or dfs(r,c-1,i+1)
        board[r][c] = ch
        return ok
    return any(dfs(r, c, 0) for r in range(M) for c in range(N))

Brute Force Complexity

Per word: O(M·N · 4·3^(L-1)) — for each starting cell, DFS explores at most 4 branches initially then 3 (one cell visited). Across W words: O(W · M · N · 4 · 3^(L-1)). With W = 3×10⁴, L = 10, M·N = 144: ~3×10⁴ × 144 × 4 × 3⁹ = enormous (~10^11). TLEs.

Optimization Path

Insight: all W single-word searches share grid traversal. If we have a trie of all words, a single DFS over the grid can simultaneously match all words, advancing through trie nodes as we step. At each grid cell (r, c), instead of asking “does this cell match word[i]”, we ask “does the current trie node have a child for board[r][c]”. If yes, descend. If the current trie node has the word field set, that word has been found — record and clear (to dedupe).

Pruning: after backtracking, if the trie node we just descended into has no children left (all its words have been found and we cleared them), prune it from the parent. This avoids revisiting empty subtrees on later starting cells.

Final Expected Approach

def find_words(board, words):
    # 1) Build trie
    trie = {}
    for w in words:
        node = trie
        for c in w:
            node = node.setdefault(c, {})
        node['$'] = w                  # marker: word ends here

    M, N = len(board), len(board[0])
    found = []

    def dfs(r, c, parent):
        ch = board[r][c]
        node = parent.get(ch)
        if not node: return
        if '$' in node:
            found.append(node.pop('$'))    # dedup: clear marker
        board[r][c] = '#'
        for dr, dc in ((1,0),(-1,0),(0,1),(0,-1)):
            nr, nc = r+dr, c+dc
            if 0 <= nr < M and 0 <= nc < N and board[nr][nc] != '#':
                dfs(nr, nc, node)
        board[r][c] = ch
        if not node:                       # prune dead branch
            parent.pop(ch)

    for r in range(M):
        for c in range(N):
            dfs(r, c, trie)

    return found

Data Structures Used

Trie: nested dict (Python). In Java/C++, an explicit TrieNode class with TrieNode[26] children array.
Grid: mutated in place ('#' marker for visited).
Output list: distinct words found.

Correctness Argument

Trie invariant: the trie initially encodes all words; each leaf-marker '$' carries the word. The DFS descends one trie level per grid step; arriving at a node with '$' means we’ve matched the complete word from the start cell.

Backtracking correctness: in-place marking with '#' and explicit restoration in the post-recurse line guarantee that on entry to any DFS call, the grid reflects only ancestor cells as visited. The restore is symmetric to the mark; no leaks.

Dedup via pop('$'): clearing the marker on first find ensures each word is reported exactly once even if multiple grid paths spell it.

Pruning correctness: pruning a child after recursion only removes a subtree that has no remaining words to find (no '$' markers anywhere below). Future searches that would have entered this subtree gain nothing from doing so, so pruning is safe and accelerates the algorithm.

Complexity

Time O(M·N · 4·3^(L-1)) after trie pruning, in the best case (when found words deplete the trie quickly). Worst case (all words distinct, no early termination): O(M·N · 4·3^(L-1)) for the grid traversal + O(W·L) for trie build. Space: O(W·L) for the trie + O(L) for recursion.

The W factor is gone because all words share the traversal.

Implementation Requirements

One DFS per starting cell, not W DFSes per starting cell. The trie unifies them.
Restore the cell (board[r][c] = ch) on every code path. The cleanest pattern: mark before the recursive calls, restore after — never restore inside conditionals.
Prune by removing the child entry when its subtree empties. This is a 5%-50% speedup depending on input.
Dedup by pop('$') on found, not by if word not in found: found.append(word) (the latter is O(W) per check).
For Python, use a nested dict; explicit TrieNode classes are slower due to attribute lookup.

Tests

Smoke: the LC example above.
Unit: single-cell grid, single-letter word; word identical to one row.
No matches: words with letters not in the grid → [].
All matches: every word findable.
Diagonal trap: word that exists only along a diagonal — should NOT be found.
Reuse trap: board=[["a","a"]], words=["aaa"] → [] (cell reuse forbidden).
Stress: M=12, N=12, W=3×10⁴, L=10, random; assert <1s in optimized languages, <5s in Python.
Adversarial: words with long common prefix (e.g., 1000 variants of "prefix....") — exercises the trie’s prefix-sharing benefit.

Follow-up Questions

“What if words can use diagonal adjacency too?” → 8 directions; everything else identical.
“What if the same cell can be reused?” → no marking needed; but then word length is unbounded by grid size, exponential blowup risk; need a cycle guard (e.g., max-length cap = some threshold).
“What if you want all paths spelling each word, not just whether it exists?” → don’t dedup; collect on every match.
“Memory blow-up for huge dictionaries?” → use a compressed trie (radix tree) or DAWG.
“Distributed: shard grid across machines?” → grids small enough to not matter; for huge grids, partition with overlap of size L-1.
“Online: words added/removed dynamically?” → trie supports insert/delete in O(L); the search needs no change.

Product Extension

This pattern underlies multi-pattern string matching in DLP (data-loss prevention) — scanning documents for any of a list of forbidden phrases — and in IDE autocomplete-on-context (which words from the dictionary can be formed by adjacent identifiers in scope?). It’s also used in spell-checkers that operate on keyboard layouts (find dictionary words spellable by adjacent keys). The Aho-Corasick algorithm is the streaming generalization (multi-pattern matching in O(text + total-pattern-length + matches)).

Language/Runtime Follow-ups

Python: nested dict is the fastest trie representation in Python; class-based is slower due to attribute lookup. dict.setdefault(c, {}) is the canonical insert; dict.pop(key, None) is safe pop with default.
Java: explicit TrieNode { TrieNode[] next = new TrieNode[26]; String word; } is fastest. HashMap<Character, TrieNode> is slower (autoboxing + hash).
Go: struct with [26]*TrieNode array. Map-based is slower.
C++: struct with TrieNode* next[26] = {nullptr}; and string* word. Avoid std::map<char, TrieNode>.
JS/TS: plain object as a hashmap is fine, but for hot loops, a Map or fixed-array index works.
Recursion depth: Python L = 10 fits the default 1000-frame stack. For deeper word lengths, set sys.setrecursionlimit.
Mutation safety: in-place grid mutation with '#' is fast but has thread-safety implications; if the function must be re-entrant, use an explicit visited set per call.

Common Bugs

Forgetting to restore the cell. Causes false negatives later: cells stay marked '#' permanently, blocking unrelated words.
Restoring inside if/return paths. Always restore at the end, after all branches have explored. Easiest: structure as mark; for each direction: recurse; restore;.
Visited check on the neighbor, not on the current cell. You should refuse to step into a '#' cell, but the current cell mark happens after entering it.
Adding the same word multiple times. Without pop('$'), a word findable by 5 paths gets reported 5 times.
Walking the trie root-back-to-root for each starting cell in non-trie code — re-enumerating words you’ve already found.
Boxing in Java’s Map<Character, TrieNode> — silent ~3× slowdown vs TrieNode[26].
Using word in found (O(W)) instead of pop('$') (O(1)) for dedup.
Pruning incorrectly: popping the trie node’s '$' marker but leaving its empty dict in the parent — then visited subtrees are revisited as empty traversals. Always check if not node: parent.pop(ch) after recursion.

Debugging Strategy

Build a tiny trie by hand for ["oath", "oat"] — verify the structure with a print.
Trace one DFS from cell (0,0) for the smoke example. The cell o matches root’s 'o' child; descend; mark; try neighbors; etc. Verify oath is found exactly once.
Test the prune with a single word: after finding "oath", the trie should be empty. Subsequent starts at any cell return immediately.
Cross-check against the brute-force LC 79 per word for the smoke and stress tests on M=N=4, W=10.

Mastery Criteria

Recognized “many words in a grid” as trie + DFS within 60 seconds.
Built the trie in ≤ 8 lines with setdefault.
Wrote the DFS with mark; recurse; restore symmetry on first attempt.
Implemented dedup via pop('$') and pruning via if not node: parent.pop(ch).
Articulated the W factor savings vs running LC 79 W times.
Identified the language-specific trie-representation trade-off.
Solved LC 79 (single-word) in <8 minutes within a week.
Solved LC 1268 (Search Suggestions System — autocomplete) within two weeks.

LeetCode Interview — Extreme Coding