p39 — Word Search

Source: LeetCode 79 · Medium-Hard · Topics: Array, Backtracking, Matrix, DFS Companies (2024–2025 frequency): Meta (very high), Amazon (high), Microsoft (high), Google (high), Apple (medium) Loop position: algorithm round, frequently the centerpiece. The “in-place visited marking” trick is a senior-vs-mid discriminator.

1. Quick Context

Given a 2D grid of letters and a target word, return True if the word can be formed by a path through horizontally/vertically adjacent cells, where each cell is used at most once.

The pattern is grid backtracking: for each cell matching word[0], DFS into the 4 neighbors trying to match word[1], word[2], etc., with in-place visited marking (overwrite the cell with a sentinel like '#', restore on backtrack).

def dfs(r, c, i):
    if i == len(word): return True
    if not in_bounds(r, c) or board[r][c] != word[i]: return False
    board[r][c] = '#'                  # mark visited (in-place)
    found = any(dfs(r+dr, c+dc, i+1) for dr, dc in DIRS)
    board[r][c] = word[i]              # restore
    return found

Time: O(R · C · 4^L) worst case (L = word length). Space: O(L) recursion (no extra visited set).

The discriminators:

  1. In-place visited marking vs. a separate visited set (saves O(R·C) memory for the set per call).
  2. Pruning — return immediately on first character mismatch.
  3. The Trie extension (LC 212 Word Search II): search for MANY words simultaneously via a Trie root.

🔗 https://leetcode.com/problems/word-search/

STOP. 20-min timer.


3. Prerequisite Concepts

  • Grid DFS (Phase 01 Lab 09 — tree traversal generalizes).
  • The 4-direction delta convention: DIRS = ((-1,0),(1,0),(0,-1),(0,1)).
  • Backtracking template (p36).

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Find ANY path in the grid spelling word, using each cell at most once. Return bool.”

Step 2 — Clarify:

  • “Diagonal moves?” (No — 4-directional.)
  • “Case-sensitive?” (Yes per LC.)
  • “Can the same cell appear twice in the path?” (No.)
  • “Grid mutation allowed?” (For in-place visited — verify with interviewer; usually yes if restored.)
  • “Empty word?” (Per LC, word.length ≥ 1.)
  • “What’s R, C, L?” (R, C ≤ 6 usually; L ≤ 10. Tight constraints because 4^L explodes.)

Step 3 — Constraints: R, C ≤ 6, L ≤ 15 typical → 36 starting cells × 4^15 ≈ 1B worst case, but pruning on first mismatch keeps it well under the limit in practice.

Step 4 — Examples:

  • [["A","B","C","E"],["S","F","C","S"],["A","D","E","E"]], "ABCCED" → True.
  • Same grid, "SEE" → True.
  • Same grid, "ABCB" → False (can’t reuse the B).

Step 5 — Brute Force: Try every starting cell as the first letter; DFS in all 4 directions, tracking visited.

Step 6 — Complexity: O(R · C · 4^L) time. O(L) space if in-place. O(R · C + L) with separate visited set.

Step 7 — Pattern:

  • “Find a path in a grid with character constraints” → grid backtracking.
  • “In-place visited” is the standard memory optimization.
  • “Many words at once” → Trie + LC 212.

Step 8 — Develop: see solution.py. In-place AND separate-set variants. Bonus: Trie-based LC 212 sketch.

Step 9 — Test: word starts at every corner, word loops back into used cell, word longer than grid, single-cell grid.


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — Grid backtracking patterns

6.1 The DFS template for grid backtracking

DIRS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def dfs(r, c, i):
    # Base: matched the entire word
    if i == len(word): return True
    # Bounds + character match (BOTH checks here — single source of truth)
    if r < 0 or r >= R or c < 0 or c >= C or board[r][c] != word[i]:
        return False
    # Mark visited (in-place sentinel)
    saved = board[r][c]
    board[r][c] = '#'
    # Recurse on all 4 neighbors
    for dr, dc in DIRS:
        if dfs(r + dr, c + dc, i + 1):
            board[r][c] = saved      # restore before short-circuit return
            return True
    board[r][c] = saved              # restore on backtrack
    return False

The single-source-of-truth check (bounds + character + visitedness all in one if) is the bug-resistant version. Many candidates check bounds in the caller’s loop, character in the function, and visitedness separately — three places to be wrong.

6.2 In-place marking vs separate visited set — why it matters

  • Separate set: O(R·C) memory per recursion stack frame’s visited copy (if you naively copy), OR a shared set + remove on backtrack (cleaner). Either way, hashing adds constant overhead.
  • In-place: O(L) recursion stack only. No extra memory. Hot in practice.

Caveat: the function must NOT throw between mark and restore (it leaves the grid corrupted). Wrap restore in try/finally if the caller’s grid is precious.

6.3 Pruning order

The early-return on bounds + character mismatch is the entire ballgame. Without it, you explore 4^L paths for every starting cell — TLE.

Some implementations also count character frequencies in the grid vs the word before any DFS — if the grid lacks even one letter of the word, return False immediately. Saves a full search on impossible inputs. Cheap O(R·C + L) preprocessing.

Even better: if word[0] appears MORE often in the grid than word[-1], reverse the word before searching. Counter-intuitive but it works because we prune on first mismatch — starting from the rarer character gives fewer wasted DFS calls.

6.4 LC 212 — Word Search II (the Trie extension)

For K words, the naive approach runs LC 79 K times: O(K · R·C · 4^L). For K=10000 this is unacceptable.

The Trie approach builds a Trie of all K words, then does ONE DFS per starting cell, following Trie edges. When the DFS reaches a Trie node marked “end-of-word,” record the word and delete the leaf (so we don’t report it twice and to prune dead branches).

Complexity: O(R · C · 4^L) where L = max word length. The constant is small because the Trie heavily prunes (4 directions × Trie branching, but most paths die quickly).

Trie pruning is a staff+ signal for this problem family.

6.5 Why we DON’T memoize

This problem looks DP-shaped (“subproblem: can we match word[i:] starting at (r, c)?”) but it’s NOT memoizable: the answer depends not only on (r, c, i) but ALSO on which cells have been visited. The visited state is exponential. So backtracking is the right tool; DP is the wrong one.

This is a classic interview “trap” — the candidate who tries to memoize wastes 10 minutes.


7. Anti-Pattern Analysis

  1. Separate visited set passed by copy. O(R·C) memory per call × 4^L calls = death.
  2. Mark visited BEFORE the bounds/char check. Corrupts the grid on invalid recursion.
  3. Restore visited ONLY on the False path (forget to restore on early-True return). Leaves grid corrupted; subsequent starting-cell searches see wrong values.
  4. Memoize (r, c, i). Wrong because visitedness matters. Fundamentally unsound.
  5. Iterate starting cells but skip the early character match. if board[r][c] == word[0]: dfs(...). The inner DFS handles it, but the outer skip saves a function call per non-matching cell.
  6. Allocate DIRS inside the recursion. Tiny waste but a code smell — hoist to module scope.
  7. For LC 212, run LC 79 once per word without Trie. Times out at K > 100.
  8. Path string concatenation (path + ch per call) — for this problem irrelevant since we don’t track path, but a common bug elsewhere.

8. Skills & Takeaways

  • Grid backtracking template: single-source-of-truth check + in-place visited + restore.
  • DIRS = ((-1,0),(1,0),(0,-1),(0,1)) — module-level constant.
  • Pruning is everything: bounds + char-match is the single line that prevents TLE.
  • In-place sentinel ('#') trick — saves O(R·C) memory.
  • Trie extension for LC 212 — multi-word search.
  • DON’T memoize — visited state is part of the subproblem.

9. When to Move On

Move on when:

  • You can write the LC 79 template in under 10 min.
  • You can sketch the Trie extension (LC 212) in under 15 min.
  • You can articulate WHY memoization is unsound here.
  • Stress test on 100 random grids agrees with a brute oracle.

10. Company Context

Meta:

  • LC 79 is one of the most-asked Medium problems at Meta. Bar Raiser expectation: clean code, in-place visited, articulate complexity.
  • LC 212 (Trie variant) is asked for L5+ — a “warm” candidate is expected to mention Trie immediately when “many words” comes up.
  • Scorecard phrase: “Algorithm — clean grid backtracking, in-place visited, articulated 4^L complexity.”

Amazon:

  • L5 staple. The “in-place visited” is often the deciding code-quality signal.
  • Bar Raisers test the “now do it for 10000 words” follow-up.

Microsoft:

  • L63. Often pairs LC 79 with “now path-track and return the path” (Word Search variant returning the cell sequence).

Google:

  • L4/L5. Often combined with “now visualize the search frontier” to test BFS vs DFS reasoning.

Apple:

  • Common in algo loops; education and accessibility teams use grid-pathing in features.

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
VisitedSet (copy per call)Shared set + add/removeIn-place sentinel+ explains memory trade-off
PruningAt leafAt each callEarly bounds + char check in one if+ character-frequency precheck + reverse-word heuristic
LC 212“Run LC 79 K times”“Trie?” (vague)Trie + DFS along Trie nodes+ leaf-deletion + dead-branch pruning
Memoize?Tries to memoize“Maybe?”“No — visited state is exponential”+ cites: depends on the search state, not just position

12. Level Delta

Mid (L4):

  • Works with separate visited set; complexity is fuzzy.
  • Handles 1×1 grid, word = “A”.

Senior (L5):

  • In-place visited; restores correctly on all paths.
  • Cites O(R·C · 4^L).
  • Mentions Trie for the multi-word variant.

Staff (L6):

  • LC 212 with Trie + leaf-deletion.
  • Articulates why memoization is unsound.
  • Character-frequency precheck + reverse-word pruning.
  • Returns path if asked, not just boolean.

Principal (L7+):

  • Discusses production: text-on-image (OCR post-processing — search a known dictionary in extracted glyph grid), game-board solvers (Boggle, Scrabble), bioinformatics (sequence alignment in 2D scaffolds).
  • Discusses parallelization: shard starting cells across workers; each worker has its own grid copy.
  • Discusses approximate match: edit-distance Trie traversal for fuzzy word search (production spellcheck).

13. Follow-up Q&A

Q1: Return the actual path (cells), not just bool. A: Maintain a path list of (r, c). Append/pop in lockstep with mark/restore. On success, deep-copy and return.

Q2: LC 212 — find ALL words from a dictionary present in the grid. A: Build a Trie of all words. DFS each starting cell, walking the Trie in parallel. On end-of-word, record + delete leaf to avoid dupes and dead branches.

Q3: What if cells can be reused? A: Then memoization works: subproblem is (r, c, i) independent of history. O(R · C · L) time, O(R · C · L) memo.

Q4: 8-directional (include diagonals)? A: DIRS = [(dr, dc) for dr in (-1,0,1) for dc in (-1,0,1) if (dr,dc) != (0,0)]. 8^L worst case — same algorithm, larger constant.

Q5: How to scale to a 1000×1000 grid? A: Same template. Worst-case complexity unchanged, but practical pruning (character frequency, starting from rarer characters) dominates. For DICTIONARY search (LC 212), Trie is essential.


14. Solution Walkthrough

See solution.py:

  • exist(board, word) — in-place visited, single-source-of-truth check.
  • exist_with_set(board, word) — separate visited set (less efficient but doesn’t mutate input).
  • exist_brute(board, word) — naive exhaustive search without pruning (for stress comparison; very slow).
  • edge_cases() + stress_test() — random grids 3×3 to 4×4 with random words.

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“Three things. (1) The in-place mutation is the key optimization but ALSO the dangerous one. If this function is called from multi-threaded code, you’ve created a race condition — make the contract crystal clear in the docstring (Mutates board temporarily; not thread-safe) or accept a copy. (2) For LC 212, the leaf-deletion is non-obvious — write the test that exercises the case where the same prefix has 5 words ending under it; without leaf-deletion you’ll report duplicates AND fail to prune. (3) Consider adding a max_length_in_dict Trie-walk depth limit when the word lengths vary wildly — saves the DFS from continuing past the longest possible match at each starting cell.”

Connections forward:

  • p36, p37, p38 — backtracking siblings.
  • p40 — N-Queens: stateful backtracking with sets.
  • LC 212 — Word Search II: Trie extension.
  • LC 200 — Number of Islands: grid DFS without backtracking.
  • LC 130 — Surrounded Regions: grid DFS with boundary trick.
  • LC 1219 — Path with Maximum Gold: grid backtracking + value accumulation.
  • Phase 03 Lab 07 — Trie Applications.
  • Phase 04 Lab 02 — DFS / Connected Components.
  • Production: Boggle/Scrabble solvers, OCR dictionary post-processing, game-board path validation, formal verification of small state-spaces.