p39 — Word Search

Source: LeetCode 79 · Medium-Hard · Topics: Array, Backtracking, Matrix, DFS Companies (2024–2025 frequency): Meta (very high), Amazon (high), Microsoft (high), Google (high), Apple (medium) Loop position: algorithm round, frequently the centerpiece. The “in-place visited marking” trick is a senior-vs-mid discriminator.

1. Quick Context

Given a 2D grid of letters and a target word, return True if the word can be formed by a path through horizontally/vertically adjacent cells, where each cell is used at most once.

The pattern is grid backtracking: for each cell matching word[0], DFS into the 4 neighbors trying to match word[1], word[2], etc., with in-place visited marking (overwrite the cell with a sentinel like '#', restore on backtrack).

def dfs(r, c, i):
    if i == len(word): return True
    if not in_bounds(r, c) or board[r][c] != word[i]: return False
    board[r][c] = '#'                  # mark visited (in-place)
    found = any(dfs(r+dr, c+dc, i+1) for dr, dc in DIRS)
    board[r][c] = word[i]              # restore
    return found

Time: O(R · C · 4^L) worst case (L = word length). Space: O(L) recursion (no extra visited set).

The discriminators:

In-place visited marking vs. a separate visited set (saves O(R·C) memory for the set per call).
Pruning — return immediately on first character mismatch.
The Trie extension (LC 212 Word Search II): search for MANY words simultaneously via a Trie root.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/word-search/

STOP. 20-min timer.

3. Prerequisite Concepts

Grid DFS (Phase 01 Lab 09 — tree traversal generalizes).
The 4-direction delta convention: DIRS = ((-1,0),(1,0),(0,-1),(0,1)).
Backtracking template (p36).

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Find ANY path in the grid spelling word, using each cell at most once. Return bool.”

Step 2 — Clarify:

“Diagonal moves?” (No — 4-directional.)
“Case-sensitive?” (Yes per LC.)
“Can the same cell appear twice in the path?” (No.)
“Grid mutation allowed?” (For in-place visited — verify with interviewer; usually yes if restored.)
“Empty word?” (Per LC, word.length ≥ 1.)
“What’s R, C, L?” (R, C ≤ 6 usually; L ≤ 10. Tight constraints because 4^L explodes.)

Step 3 — Constraints: R, C ≤ 6, L ≤ 15 typical → 36 starting cells × 4^15 ≈ 1B worst case, but pruning on first mismatch keeps it well under the limit in practice.

Step 4 — Examples:

[["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCCED" → True.
Same grid, "SEE" → True.
Same grid, "ABCB" → False (can’t reuse the B).

Step 5 — Brute Force: Try every starting cell as the first letter; DFS in all 4 directions, tracking visited.

Step 6 — Complexity: O(R · C · 4^L) time. O(L) space if in-place. O(R · C + L) with separate visited set.

Step 7 — Pattern:

“Find a path in a grid with character constraints” → grid backtracking.
“In-place visited” is the standard memory optimization.
“Many words at once” → Trie + LC 212.

Step 8 — Develop: see solution.py. In-place AND separate-set variants. Bonus: Trie-based LC 212 sketch.

Step 9 — Test: word starts at every corner, word loops back into used cell, word longer than grid, single-cell grid.

5. Progressive Hints

If stuck for 5 min, open hints.md.

6. Deeper Insight — Grid backtracking patterns

6.1 The DFS template for grid backtracking

DIRS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def dfs(r, c, i):
    # Base: matched the entire word
    if i == len(word): return True
    # Bounds + character match (BOTH checks here — single source of truth)
    if r < 0 or r >= R or c < 0 or c >= C or board[r][c] != word[i]:
        return False
    # Mark visited (in-place sentinel)
    saved = board[r][c]
    board[r][c] = '#'
    # Recurse on all 4 neighbors
    for dr, dc in DIRS:
        if dfs(r + dr, c + dc, i + 1):
            board[r][c] = saved      # restore before short-circuit return
            return True
    board[r][c] = saved              # restore on backtrack
    return False

The single-source-of-truth check (bounds + character + visitedness all in one if) is the bug-resistant version. Many candidates check bounds in the caller’s loop, character in the function, and visitedness separately — three places to be wrong.

6.2 In-place marking vs separate `visited` set — why it matters

Separate set: O(R·C) memory per recursion stack frame’s visited copy (if you naively copy), OR a shared set + remove on backtrack (cleaner). Either way, hashing adds constant overhead.
In-place: O(L) recursion stack only. No extra memory. Hot in practice.

Caveat: the function must NOT throw between mark and restore (it leaves the grid corrupted). Wrap restore in try/finally if the caller’s grid is precious.

6.3 Pruning order

The early-return on bounds + character mismatch is the entire ballgame. Without it, you explore 4^L paths for every starting cell — TLE.

Some implementations also count character frequencies in the grid vs the word before any DFS — if the grid lacks even one letter of the word, return False immediately. Saves a full search on impossible inputs. Cheap O(R·C + L) preprocessing.

Even better: if word[0] appears MORE often in the grid than word[-1], reverse the word before searching. Counter-intuitive but it works because we prune on first mismatch — starting from the rarer character gives fewer wasted DFS calls.

6.4 LC 212 — Word Search II (the Trie extension)

For K words, the naive approach runs LC 79 K times: O(K · R·C · 4^L). For K=10000 this is unacceptable.

The Trie approach builds a Trie of all K words, then does ONE DFS per starting cell, following Trie edges. When the DFS reaches a Trie node marked “end-of-word,” record the word and delete the leaf (so we don’t report it twice and to prune dead branches).

Complexity: O(R · C · 4^L) where L = max word length. The constant is small because the Trie heavily prunes (4 directions × Trie branching, but most paths die quickly).

Trie pruning is a staff+ signal for this problem family.

6.5 Why we DON’T memoize

This problem looks DP-shaped (“subproblem: can we match word[i:] starting at (r, c)?”) but it’s NOT memoizable: the answer depends not only on (r, c, i) but ALSO on which cells have been visited. The visited state is exponential. So backtracking is the right tool; DP is the wrong one.

This is a classic interview “trap” — the candidate who tries to memoize wastes 10 minutes.

7. Anti-Pattern Analysis

Separate visited set passed by copy. O(R·C) memory per call × 4^L calls = death.
Mark visited BEFORE the bounds/char check. Corrupts the grid on invalid recursion.
Restore visited ONLY on the False path (forget to restore on early-True return). Leaves grid corrupted; subsequent starting-cell searches see wrong values.
Memoize (r, c, i). Wrong because visitedness matters. Fundamentally unsound.
Iterate starting cells but skip the early character match. if board[r][c] == word[0]: dfs(...). The inner DFS handles it, but the outer skip saves a function call per non-matching cell.
Allocate DIRS inside the recursion. Tiny waste but a code smell — hoist to module scope.
For LC 212, run LC 79 once per word without Trie. Times out at K > 100.
Path string concatenation (path + ch per call) — for this problem irrelevant since we don’t track path, but a common bug elsewhere.

8. Skills & Takeaways

Grid backtracking template: single-source-of-truth check + in-place visited + restore.
DIRS = ((-1,0),(1,0),(0,-1),(0,1)) — module-level constant.
Pruning is everything: bounds + char-match is the single line that prevents TLE.
In-place sentinel ('#') trick — saves O(R·C) memory.
Trie extension for LC 212 — multi-word search.
DON’T memoize — visited state is part of the subproblem.

9. When to Move On

Move on when:

You can write the LC 79 template in under 10 min.
You can sketch the Trie extension (LC 212) in under 15 min.
You can articulate WHY memoization is unsound here.
Stress test on 100 random grids agrees with a brute oracle.

10. Company Context

Meta:

LC 79 is one of the most-asked Medium problems at Meta. Bar Raiser expectation: clean code, in-place visited, articulate complexity.
LC 212 (Trie variant) is asked for L5+ — a “warm” candidate is expected to mention Trie immediately when “many words” comes up.
Scorecard phrase: “Algorithm — clean grid backtracking, in-place visited, articulated 4^L complexity.”

Amazon:

L5 staple. The “in-place visited” is often the deciding code-quality signal.
Bar Raisers test the “now do it for 10000 words” follow-up.

Microsoft:

L63. Often pairs LC 79 with “now path-track and return the path” (Word Search variant returning the cell sequence).

Google:

L4/L5. Often combined with “now visualize the search frontier” to test BFS vs DFS reasoning.

Apple:

Common in algo loops; education and accessibility teams use grid-pathing in features.

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff/Principal
Visited	Set (copy per call)	Shared set + add/remove	In-place sentinel	+ explains memory trade-off
Pruning	At leaf	At each call	Early bounds + char check in one `if`	+ character-frequency precheck + reverse-word heuristic
LC 212	“Run LC 79 K times”	“Trie?” (vague)	Trie + DFS along Trie nodes	+ leaf-deletion + dead-branch pruning
Memoize?	Tries to memoize	“Maybe?”	“No — visited state is exponential”	+ cites: depends on the search state, not just position

12. Level Delta

Mid (L4):

Works with separate visited set; complexity is fuzzy.
Handles 1×1 grid, word = “A”.

Senior (L5):

In-place visited; restores correctly on all paths.
Cites O(R·C · 4^L).
Mentions Trie for the multi-word variant.

Staff (L6):

LC 212 with Trie + leaf-deletion.
Articulates why memoization is unsound.
Character-frequency precheck + reverse-word pruning.
Returns path if asked, not just boolean.

Principal (L7+):

Discusses production: text-on-image (OCR post-processing — search a known dictionary in extracted glyph grid), game-board solvers (Boggle, Scrabble), bioinformatics (sequence alignment in 2D scaffolds).
Discusses parallelization: shard starting cells across workers; each worker has its own grid copy.
Discusses approximate match: edit-distance Trie traversal for fuzzy word search (production spellcheck).

13. Follow-up Q&A

Q1: Return the actual path (cells), not just bool. A: Maintain a path list of (r, c). Append/pop in lockstep with mark/restore. On success, deep-copy and return.

Q2: LC 212 — find ALL words from a dictionary present in the grid. A: Build a Trie of all words. DFS each starting cell, walking the Trie in parallel. On end-of-word, record + delete leaf to avoid dupes and dead branches.

Q3: What if cells can be reused? A: Then memoization works: subproblem is (r, c, i) independent of history. O(R · C · L) time, O(R · C · L) memo.

Q4: 8-directional (include diagonals)? A: DIRS = [(dr, dc) for dr in (-1,0,1) for dc in (-1,0,1) if (dr,dc) != (0,0)]. 8^L worst case — same algorithm, larger constant.

Q5: How to scale to a 1000×1000 grid? A: Same template. Worst-case complexity unchanged, but practical pruning (character frequency, starting from rarer characters) dominates. For DICTIONARY search (LC 212), Trie is essential.

14. Solution Walkthrough

See solution.py:

exist(board, word) — in-place visited, single-source-of-truth check.
exist_with_set(board, word) — separate visited set (less efficient but doesn’t mutate input).
exist_brute(board, word) — naive exhaustive search without pruning (for stress comparison; very slow).
edge_cases() + stress_test() — random grids 3×3 to 4×4 with random words.

Run: python3 solution.py.

15. Beyond the Problem

Principal-engineer code review comment:

“Three things. (1) The in-place mutation is the key optimization but ALSO the dangerous one. If this function is called from multi-threaded code, you’ve created a race condition — make the contract crystal clear in the docstring (Mutates board temporarily; not thread-safe) or accept a copy. (2) For LC 212, the leaf-deletion is non-obvious — write the test that exercises the case where the same prefix has 5 words ending under it; without leaf-deletion you’ll report duplicates AND fail to prune. (3) Consider adding a max_length_in_dict Trie-walk depth limit when the word lengths vary wildly — saves the DFS from continuing past the longest possible match at each starting cell.”

Connections forward:

p36, p37, p38 — backtracking siblings.
p40 — N-Queens: stateful backtracking with sets.
LC 212 — Word Search II: Trie extension.
LC 200 — Number of Islands: grid DFS without backtracking.
LC 130 — Surrounded Regions: grid DFS with boundary trick.
LC 1219 — Path with Maximum Gold: grid backtracking + value accumulation.
Phase 03 Lab 07 — Trie Applications.
Phase 04 Lab 02 — DFS / Connected Components.
Production: Boggle/Scrabble solvers, OCR dictionary post-processing, game-board path validation, formal verification of small state-spaces.

LeetCode Interview — Extreme Coding