p74 — Distinct Subsequences
Source: LeetCode 115 · Hard · Topics: DP, String Companies (2024–2025): Meta (high), Amazon (medium-high), Microsoft (medium), Google (medium) Loop position: The COUNTING DP — not a boolean, not an optimum, but how many ways. Tests whether candidate can derive an additive recurrence cleanly and reason about why the “include vs exclude” split avoids double-counting.
1. Quick Context
Given strings s and t. Return the number of distinct subsequences of s that equal t.
Strategy: 2D DP. dp[i][j] = number of distinct subsequences of s[:i] equal to t[:j]. Recurrence:
- If
s[i-1] != t[j-1]:dp[i][j] = dp[i-1][j](must skip s[i-1]). - If
s[i-1] == t[j-1]:dp[i][j] = dp[i-1][j-1] + dp[i-1][j](either use s[i-1] to match t[j-1], or skip it).
O(m·n) time and space; reducible to O(n).
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/distinct-subsequences/
STOP. 35-min timer. Counting DPs trip people who only practice boolean/min/max DPs.
3. Prerequisite Concepts
- Subsequence ≠ substring (no contiguity required).
- Include/exclude framework for counting.
- Distinct = each subsequence counted once even if it appears multiple times.
4. How to Approach (FRAMEWORK 1–9)
1. Restate: Count subsequences of s (preserving order, can skip) that equal t exactly.
2. Clarify: “Distinct” means count once per matching index-set, not per string value. Result can be very large (LC guarantees fits in 32-bit signed int).
3. Examples: s="rabbbit", t="rabbit" → 3 (drop one of the three b’s). s="babgbag", t="bag" → 5.
5. Brute force: Enumerate all subsequences of s with length |t|, compare to t. C(m, |t|) combinations → exponential.
6. Target: O(m·n) DP.
7. Pattern: 2D string DP, counting variant.
8. Develop: see solution.py.
9. Test: t longer than s → 0; t empty → 1 (empty subsequence); s == t.
5. Progressive Hints
See hints.md.
6. Deeper Insight
6.1 Why dp[i-1][j-1] + dp[i-1][j] when chars match
Two disjoint cases when s[i-1] == t[j-1]:
- Use s[i-1] to match t[j-1]: count = ways to form t[:j-1] from s[:i-1] =
dp[i-1][j-1]. - Skip s[i-1]: count = ways to form t[:j] from s[:i-1] =
dp[i-1][j].
The cases are disjoint by construction: the first FIXES the use of s[i-1]; the second EXCLUDES it. No double-counting.
6.2 Base cases
dp[i][0] = 1for all i — empty t has exactly one subsequence (the empty one).dp[0][j] = 0for j ≥ 1 — empty s has no way to form non-empty t.
6.3 The “distinct” trap
“Distinct” doesn’t mean “distinct as strings” — it means each subsequence (as an index set into s) is counted once. s="aa", t="a" → 2 (use either ‘a’). The DP naturally counts index sets, so this just works.
6.4 Space optimization
dp[i][j] depends on dp[i-1][j] and dp[i-1][j-1] — previous row only. Reduce to 1D, iterating j from RIGHT to LEFT (so dp[j-1] still refers to the previous row when read).
6.5 Connection to LC 72 Edit Distance and LC 1143 LCS
Same alignment-DP family. LC 115 differs because it’s COUNTING (additive) instead of BOOLEAN (OR) or OPTIMIZING (min/max). The recurrence shape is identical — just the combiner changes.
6.6 Polynomial coefficient framing
When t = "a" and s has k ’a’s, the answer is k. When t = "aa" and s has k ’a’s, it’s C(k, 2). This counts combinations subject to ordering — a generalization of binomial coefficients.
6.7 Numerical overflow
LC promises 32-bit int, but in general counting DPs can overflow. Mention 64-bit accumulator or modular arithmetic at Staff+.
7. Anti-Pattern Analysis
- Treat “distinct” as set-of-strings — overcomplicates; the DP naturally counts index sets.
- Enumerate combinations of s indices — exponential, won’t pass for m=1000.
- Forget
dp[i][0] = 1— wrong answer (zeros propagate). - Use OR instead of
+— that’s a boolean DP, not counting. - 1D rollback in wrong direction — must iterate j right-to-left to preserve previous-row values.
- Off-by-one in
s[i-1] == t[j-1]check — easy under pressure.
8. Skills & Takeaways
- Counting DP recurrence design.
- Include/exclude framework.
- Distinction between counting / boolean / optimizing DP.
- 1D space optimization with right-to-left iteration.
9. When to Move On
- Solve cold in 30 min with correct O(mn) DP.
- Reduce to O(n) space with right-to-left j-loop.
- Articulate “include vs exclude — disjoint, no double-count.”
- Solve LC 583, 72, 1143 by recognizing the family.
10. Company Context
- Meta: E5/E6; appeared multiple times in onsite rotation.
- Amazon: L6; counting DP discriminator.
- Microsoft: L65; less common, but appears.
- Google: L5; rare; tests case-analysis maturity.
Scorecard for strong: “Defined counting DP with disjoint include/exclude split, dp[i][0]=1 base case, O(mn) recurrence cold, optimized to O(n) with right-to-left scan.”
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff+ |
|---|---|---|---|---|
| Recurrence | Confused | After hints | Cold | + cites disjointness |
| Base cases | Wrong | After failing test | Cold | + explains why dp[i][0]=1 |
| Space opt | None | 2D only | O(n) with right-to-left | + cites why direction matters |
| DP family | None | None | Cites LC 72, 1143 | + counting vs boolean vs optimizing taxonomy |
12. Level Delta
- Mid (L4): 2D DP after one or two hints; correct on most tests.
- Senior (L5): Cold 2D DP + disjoint include/exclude articulation + O(n) optimization.
- Staff (L6): + DP taxonomy (counting/boolean/optimizing) + overflow discussion + cites combinatorial framing.
- Principal (L7+): + connection to formal grammar parsing (Earley counts) + production code-path counting in static analysis.
13. Follow-up Q&A
Q1: Return the actual subsequences, not just the count.
A1: Backtrack from dp[m][n]; collect index sets where both branches contributed. Output exponential; only feasible for small inputs.
Q2: What if t is much longer than s? A2: Return 0 trivially. Length precheck saves work.
Q3: With repeated patterns in t and large s, can you do faster than O(mn)? A3: For specific structures (e.g., t = single char) collapse to O(m) counting. For general t, O(mn) is tight in worst case.
Q4: Overflow risk?
A4: Real risk for large inputs. Use 64-bit (Python int is unbounded; in Java/C++ use long long) or modular arithmetic if the problem specifies modulo.
Q5: Generalization with weights (count weighted by occurrence)?
A5: Replace + with weighted sum. The recurrence shape is identical; the semiring changes.
14. Solution Walkthrough
See solution.py:
num_distinct_dp— iterative 2D DP, canonical.num_distinct_1d— O(n) space, right-to-left j-loop.num_distinct_memo— top-down with@lru_cache.num_distinct_brute— enumerate viaitertools.combinations(oracle).
15. Beyond the Problem
Principal code review: “Distinct Subsequences is the ‘counting twin’ of the alignment-DP family (LCS, Edit Distance, Interleaving String). Same O(m·n) table shape, same neighbor dependencies — only the combiner changes from
max/min/orto+. Recognizing this is the difference between solving one problem and solving an entire class. The deeper lesson: DPs live in a semiring — replace(+, ×)with(max, +)for shortest paths, with(or, and)for reachability, with(min, +)for tropical algebras used in scheduling. Production parsing libraries (Earley, CYK) use this exact pattern to count parse trees in O(n³). Static analyzers count program paths the same way. When you writedp[i][j] = dp[i-1][j] + dp[i-1][j-1], mention that this is the COUNTING instance of the alignment-DP semiring — interviewers at the Staff+ bar will lean in.”