p72 — Regular Expression Matching
Source: LeetCode 10 · Hard · Topics: DP, Recursion, String Companies (2024–2025): Google (high), Meta (high), Amazon (medium-high), Microsoft (medium) Loop position: The DP problem where the recurrence has a branching
*case. Tests careful case analysis:*can match ZERO or MORE preceding chars, leading to a “skip the (char,*) pair” vs “consume one char” decision at every step.
1. Quick Context
Implement regex matching for . (matches any single char) and * (matches zero or more of the PRECEDING element). Match the ENTIRE string (not substring).
Strategy: 2D DP. dp[i][j] = True iff s[:i] matches p[:j]. Three cases for the pattern’s last character:
- Plain char or
.:dp[i][j] = dp[i-1][j-1] AND (s[i-1] matches p[j-1]). *(forms pair withp[j-2]): two choices —- Match ZERO of
p[j-2]:dp[i][j-2]. - Match ONE MORE of
p[j-2]:dp[i-1][j]ANDs[i-1] matches p[j-2].
- Match ZERO of
O(m·n).
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/regular-expression-matching/
STOP. 40-min timer. The case-analysis is the hardest part.
3. Prerequisite Concepts
- 2D string DP (alignment style).
- Why “match zero” via
*is the non-obvious case. - Empty string can be matched by
a*,a*b*c*, etc. - The
*always pairs with the preceding character — indexj-2.
4. How to Approach (FRAMEWORK 1–9)
1. Restate: Boolean: does full string match full pattern with . and * semantics?
2. Clarify: * always follows a real char or . (LC promises valid input). Match is FULL not substring.
3. Examples: s="aa", p="a*" → True. s="ab", p=".*" → True. s="aab", p="c*a*b" → True.
5. Brute force: Recursive char-by-char matching with branching on *. Exponential without memoization.
6. Target: O(m·n) DP.
7. Pattern: 2D DP, alignment-style with a branching * rule.
8. Develop: see solution.py.
9. Test: empty s vs empty p; a* matches empty; .* matches anything; nested stars like a*b*c*.
5. Progressive Hints
See hints.md.
6. Deeper Insight
6.1 The dp[i][j] definition matters
Use dp[i][j] = “s[0..i-1] matches p[0..j-1]” (1-indexed end-exclusive). So dp[0][0] = True (empty matches empty).
Initialization row dp[0][j]: empty string matches pattern iff pattern is zero-or-more star pairs. dp[0][j] = dp[0][j-2] whenever p[j-1] == '*'.
6.2 The * case is the entire problem
if p[j-1] == '*':
# Option A: match zero of p[j-2]
dp[i][j] = dp[i][j-2]
# Option B: match one MORE of p[j-2]
if p[j-2] == '.' or p[j-2] == s[i-1]:
dp[i][j] = dp[i][j] or dp[i-1][j]
“Match one more” means: we already used the (char,*) pair to consume zero or more chars up through s[i-2]; now we consume one more (s[i-1] if it matches p[j-2]). This stays on the SAME j because the * is still active.
6.3 Why dp[i-1][j] not dp[i-1][j-2] in Option B
The * is “greedy” in the sense that it remains in scope as we consume more chars. dp[i-1][j] says: “up to s[i-2] matched the SAME pattern p[:j] (including the active *)” — we just extend by one more matching char.
6.4 NFA/DFA framing
a*b*c* is a classic regular language. The DP is essentially simulating an NFA where each state is a position in the pattern, and * transitions are ε-moves. Production regex engines use either Thompson’s NFA construction (linear time) or backtracking (exponential worst case — see the StackOverflow July 2016 outage). This DP is conceptually the NFA simulation, written iteratively.
6.5 Backtracking RegEx engines and ReDoS
Most production regex engines (Perl, Python’s re, JavaScript) use backtracking, which can hit exponential worst cases on patterns like (a+)+$ against "aaaa...!". RE2 (Google) uses NFA construction to guarantee linear time. Knowing this distinction matters at Staff+.
6.6 LC 44 Wildcard Matching
LC 44 uses * differently — it matches ANY sequence of any characters (no preceding char binding). Different DP recurrence: dp[i][j] = dp[i-1][j] OR dp[i][j-1] for *. Easier than LC 10.
6.7 Memoized recursion variant
Often the cleanest for interview pressure: recurse on (i, j) = “does s[i:] match p[j:]?” with memoization. Two cases on next pattern token (regular vs c*). Many candidates find this easier than iterative.
7. Anti-Pattern Analysis
- Treat
*as “matches any single char” — that’s LC 44 wildcard. LC 10’s*modifies the PRECEDING char. - Forget the empty-prefix row
dp[0][j]— wrong answer for inputs likes="", p="a*b*". - Use
dp[i-1][j-1]instead ofdp[i-1][j]for “match one more” — silently wrong ons="aaa", p="a*". - Use 0-indexed
dp[i][j]= match s[:i+1] vs p[:j+1]` — boundary handling becomes painful. - Greedy “match longest first” — fails on
s="aa", p="a*a"and others. - Catastrophic backtracking in a recursive solution — TLE on adversarial inputs without memoization.
8. Skills & Takeaways
- 2D string DP with branching recurrence.
*modifies preceding char; pair withp[j-2].- Three cases at each step (plain,
.,*). - Initialization row for empty-string matching.
- Connection to NFA/DFA theory.
9. When to Move On
- Write iterative or memoized version cold in 35 min.
- State the recurrence in English: “If pattern ends in
*: either skip the (char,*) pair or consume one more matching char.” - Solve LC 44 (Wildcard Matching) by analogy.
- Articulate when backtracking regex engines blow up.
10. Company Context
- Google: L5/L6; “implement a small regex” is a classic.
- Meta: E5/E6; tests case analysis carefully.
- Amazon: L6; rare but discriminating.
- Microsoft: L65; engine-implementation flavor.
Scorecard for strong: “Defined dp[i][j] cleanly with empty-prefix row, articulated *’s two cases, used dp[i-1][j] for ‘match one more’ correctly, mentioned NFA framing.”
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff+ |
|---|---|---|---|---|
| Case analysis | Confused | After hints | Cold 3-case split | + cites NFA |
* recurrence | Off-by-one | Correct after debugging | Cold | + explains intuition |
| Initialization | Misses | After failing test | Cold | + works backward from empty match |
| Generalization | None | LC 44 by name | LC 44 solved by analogy | + RE2 vs backtracking ReDoS discussion |
12. Level Delta
- Mid (L4): Memoized recursion after hints; passes most tests.
- Senior (L5): Iterative DP cold; correct empty-prefix row; correct
*recurrence; cites LC 44. - Staff (L6): + NFA/Thompson construction reference + ReDoS pitfall + RE2 mention.
- Principal (L7+): + full regex-to-NFA conversion + DFA construction tradeoffs + production grep engine architecture.
13. Follow-up Q&A
Q1: Add + (one or more) and ? (zero or one).
A1: + reduces to a copy: a+ ≡ aa*. ? is “match 0 or 1”: dp[i][j] = dp[i][j-2] or (matching dp[i-1][j-2]).
Q2: What’s the time complexity of brute recursion?
A2: Exponential. Each * doubles the search tree. Memoization reduces to O(m·n).
Q3: Implement using NFA construction. A3: Build Thompson NFA, then simulate with set-of-states. O((m+n)·s) where s is input length.
Q4: What about anchors (^, $)?
A4: LC version assumes full match (implicit anchors). Adding partial-match support requires extending the recurrence.
Q5: Production case where backtracking regex bit you?
A5: ReDoS attacks — (a+)+$ against "aaaaaa!". Use RE2 or rewrite the regex to avoid catastrophic backtracking.
14. Solution Walkthrough
See solution.py:
is_match_dp— iterative 2D DP, canonical.is_match_memo— top-down with@lru_cache.is_match_brute— naive recursion (oracle for small inputs).
15. Beyond the Problem
Principal code review: “Regex matching is the gateway to NFA/DFA theory. The DP you write here is essentially a tabular NFA simulation: each
dp[i][j]cell is ‘can the NFA, after consuming s[:i], be in pattern-state j?’ The branching*recurrence corresponds to ε-transitions in the NFA. Production regex engines split into two camps: backtracking (Perl, Pythonre, PCRE) — concise to implement, exponential worst case (ReDoS), and NFA simulation (Go’s regexp/RE2, Rust’s regex) — slightly more complex code, linear time guaranteed. The 2016 StackOverflow outage and several CVE-rated ReDoS attacks demonstrate that this isn’t an academic distinction. When you write this DP in an interview, mention RE2; it shows you understand that the elegant n×m table is the same algorithm Google’s grep uses for trillions of lines of logs daily.”