p30 — Minimum Window Substring

Source: LeetCode 76 · Hard · Topics: Hash Table, String, Sliding Window Companies (2024–2025 frequency): Meta (very high), Amazon (high), Google (high), Microsoft (medium), Bloomberg (medium) Loop position: algorithm round, often the discriminator between Mid and Senior.

1. Quick Context

Given strings s and t, return the minimum window (shortest contiguous substring) in s such that every character in t (counting multiplicity) is included in the window. If no such window exists, return "".

This is the canonical “expand-then-contract” variable-window problem and the natural progression from p29. Three things are different from p29:

  1. The invariant is more complex: “window contains all required chars with sufficient multiplicity” — not just “no duplicates.”
  2. The optimization that separates Mid from Senior: the match counter. Naive implementations call Counter == Counter (O(σ)) on each shrink. The match counter tracks how many distinct characters of t are currently satisfied; checking formed == required is O(1).
  3. The structure: expand right until the window is valid; then contract left while it stays valid, recording the tightest. Repeat.

This problem appears in Meta loops with high frequency specifically because the naive O(N²) → O(N·σ) → O(N) optimization sequence reveals algorithmic depth in 15 minutes.


🔗 https://leetcode.com/problems/minimum-window-substring/

STOP. 35-min timer. This is Hard. If you can’t finish in 35, write the brute force, state the match-counter idea, and ship it for partial credit.


3. Prerequisite Concepts

  • p29 — Longest Substring Without Repeating: variable-window template.
  • collections.Counter and defaultdict(int).
  • Multiset / multiplicity reasoning.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Find the shortest contiguous substring w of s such that for every character c in t, the count of c in w is at least the count of c in t. Return w; if none exists, return empty string.”

Step 2 — Clarify:

  • “Multiplicity matters?” (Yes. t="AABC" requires TWO 'A's in the window.)
  • “Case-sensitive?” (Yes. 'A' vs 'a' distinct.)
  • “If multiple windows tie?” (LC: any one is acceptable. Often: leftmost.)
  • “Empty t?” (Spec says non-empty. Confirm — if empty, return "".)
  • t longer than s?” (No valid window → return "".)
  • “Characters in t not appearing in s at all?” (No valid window → "".)

Step 3 — Constraints: N (len of s) ≤ 10⁵, M (len of t) ≤ 10⁵, alphabet uppercase/lowercase English. O(N+M) target.

Step 4 — Examples:

  • s="ADOBECODEBANC", t="ABC""BANC".
  • s="a", t="a""a".
  • s="a", t="aa""" (not enough 'a's).
  • s="aa", t="aa""aa".
  • s="ab", t="b""b".
  • s="bba", t="ab""ba" (note: leftmost minimum is "ba", length 2).

Step 5 — Brute Force: Enumerate all (i, j) substrings, for each check Counter(s[i:j+1]) >= Counter(t). O(N²·M). State and pivot.

Step 6 — Complexity: Optimal O(N + M) time, O(σ) space.

Step 7 — Pattern Recognition:

  • “Shortest contiguous range satisfying a multiset constraint” → variable sliding window with expand-then-contract.
  • “Validity check is hot path” → incremental match counter (don’t re-compare maps).

Step 8 — Develop: see solution.py.

Step 9 — Test: valid example, no-valid case, single char, full string is the answer, t with duplicates, alphabet collisions.


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — The match counter trick

6.1 The naive validity check

The straightforward way:

def is_valid(have, need):
    for c, k in need.items():
        if have.get(c, 0) < k:
            return False
    return True

This is O(σ) per call (where σ = |distinct chars in t|). If called inside both expand and contract loops, total complexity becomes O(N · σ) — fine for small σ but wasteful.

6.2 The match-counter optimization

Maintain two counters and TWO integers:

  • need = Counter(t) — required frequencies.
  • have = defaultdict(int) — current window frequencies.
  • required = len(need) — number of DISTINCT characters that need satisfying.
  • formed = 0 — number of DISTINCT characters currently satisfied (i.e., have[c] >= need[c]).

Expand (right += 1 adds char c):

have[c] += 1
if c in need and have[c] == need[c]:
    formed += 1

The == need[c] (not >=) is critical: we increment formed ONLY at the moment we just met the requirement. Going further (have > need) does NOT change formed.

Validity check: formed == required. O(1).

Contract (left += 1 removes char c):

if c in need and have[c] == need[c]:
    formed -= 1
have[c] -= 1

Order matters: we decrement formed BEFORE reducing have[c] below need[c]. (Equivalently: check have[c] == need[c] first, then decrement have[c].)

Now the main loop:

left = 0
formed = 0
best_len = inf
best_l = 0
for right in range(n):
    expand(s[right])
    while formed == required:           # window valid → try to shrink
        if right - left + 1 < best_len:
            best_len = right - left + 1
            best_l = left
        contract(s[left])
        left += 1
return s[best_l : best_l + best_len] if best_len < inf else ""

Each character is processed at most twice (once on expand, once on contract) → O(N + M). The formed == required check is O(1).

6.3 Why the order of operations matters

If you reverse the contract order:

have[c] -= 1
if c in need and have[c] < need[c]:
    formed -= 1

This is also correct (and arguably more intuitive). But you must NOT do:

have[c] -= 1
if c in need and have[c] == need[c] - 1:
    formed -= 1

unless need[c] > 0 for c — which it is by construction (c in need implies need[c] >= 1). Subtle. Use the cleaner “check then update” form.

6.4 Generalizations

This template extends directly to:

  • LC 567 — Permutation in String: fixed-window version (length = len(t)).
  • LC 438 — Find All Anagrams in a String: fixed-window, return all start indices.
  • LC 30 — Substring with Concatenation of All Words: tokenized version, window over word boundaries.
  • LC 727 — Minimum Window Subsequence: similar API but subsequence, requires DP not sliding window.

7. Anti-Pattern Analysis

  1. Counter == Counter on each shrink check. O(σ) per check — turns the algorithm into O(N·σ). Use match counter.
  2. have[c] >= need[c] instead of have[c] == need[c] when incrementing formed. With >=, you’d increment formed every time you add a c after the first match, double-counting.
  3. Wrong contract order. Decrementing have[c] before checking against need[c] flips the comparison.
  4. Tracking the substring with s[left:right+1] inside the loop. O(N) slice per iteration → O(N²) total. Track (best_l, best_len), slice once at the end.
  5. Initializing best_len = 0. Then the first valid window of any size fails the < best_len check (0 is smaller). Use inf or n + 1.
  6. Forgetting the no-window case. Return "" (empty string), not None or -1.
  7. Counting characters NOT in t. Wasteful but not incorrect. To avoid, guard with if c in need: have[c] += 1. Trade-off: complicates contract logic.
  8. Building need as a dict by iterating t manually with if/else. Counter(t) is faster, clearer, and harder to bug.
  9. Treating t as a SET. Loses multiplicity info. t="AABC" requires TWO 'A's.

8. Skills & Takeaways

  • Variable sliding window with expand-then-contract — the second canonical pattern after p29.
  • Match counter (formed/required) — an O(1) incremental validity check, replacing O(σ) comparisons. Reuse this pattern for any “multiset coverage” problem.
  • Counter / defaultdict fluency.
  • (best_l, best_len) tracking — avoid string slicing in hot loops.
  • Hard vs Medium discrimination — the match counter is the senior signal that elevates this from Medium-quality code.

9. When to Move On

Move on when:

  • You can write the match-counter version in under 18 minutes, no bugs.
  • You can explain WHY have[c] == need[c] (not >=) when incrementing formed.
  • You can articulate why total work is O(N + M) despite the inner while-loop.
  • Your stress test passes 200 random small (s, t) pairs vs brute.
  • You can pivot to LC 567 (Permutation in String) in under 10 minutes using the same machinery (with a fixed window length).

10. Company Context

Meta:

  • One of the highest-frequency Hard problems in 2024-25 Meta loops. Expected solid Senior-level fluency.
  • Bar Raiser specifically watches for the match-counter optimization. Naive Counter == is a partial-credit signal.
  • Scorecard phrase: “Algorithm — sliding window with match-counter optimization, O(N+M).”

Amazon:

  • Often paired with LC 438 (Anagrams) in the same loop — same template, fixed window.
  • Scorecard phrase: “Strong — variable window template + correct match-counter.”

Google:

  • Frequently extended: “now return the K shortest non-overlapping such windows.” Tricky greedy + sliding.
  • Scorecard phrase: “Algorithm + extension — generalized sliding window with constraint tracking.”

Microsoft:

  • Sometimes asked as the warm-up problem in an algorithm-heavy loop.
  • Scorecard phrase: “Strong — Hard problem cleanly solved with no false starts.”

Bloomberg:

  • Variant: “given a stream of trades and a basket of required tickers, find the shortest time window where all required tickers traded.”
  • Scorecard phrase: “Pragmatic — recognized streaming variable-window over time.”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
First instinctBrute O(N²·M)Variable window with Counter == checkMatch counter from the startMatch counter + clean test plan + complexity proof
Validity checkMap equality each stepCounter == or full dict scanformed == required O(1)Same + verbal complexity argument
Substring extractions[left:right+1] inside loopSame(best_l, best_len), slice onceSame + discusses memory churn in production
GeneralizationNone“Probably works for anagrams”Notes LC 567, LC 438 use same templateNotes streaming variant, range/temporal extension

12. Level Delta

Mid (L4):

  • Variable-window solution with Counter == Counter validity check.
  • O(N·σ) time, correct answer.
  • All edge cases handled.

Senior (L5):

  • Match counter from the start.
  • O(N + M) time stated and justified.
  • (best_l, best_len) extraction, no in-loop slicing.
  • Tests t with duplicates and no-valid-window case.

Staff (L6):

  • Implements match counter with clean expand/contract symmetry.
  • Discusses LC 567, LC 438, LC 30 as same-template variants.
  • Discusses memory churn: in JVM/Go, the per-iteration slice creates GC pressure — track indices and slice once.
  • Discusses streaming variant (data arriving over time).

Principal (L7+):

  • Discusses the K minimum windows generalization: greedy over non-overlapping intervals.
  • Discusses multi-pattern: given M required substrings (not just chars), find shortest window containing all of them. Reduces to Aho-Corasick + sliding window in O(N·log M).
  • Discusses alphabet size σ very large (Unicode, custom tokens): hash-of-hash for O(1) match counter, or roll our own bitset. For ASCII, fixed-size int[128] beats dict.
  • Discusses production: search-result diversity (find minimum window of search results covering all required facets), log-anomaly detection (minimum window covering all required events), bioinformatics (shortest DNA region containing all required motifs).

13. Follow-up Q&A

Q1: Why have[c] == need[c] (not >=) when incrementing formed? A: formed counts the number of DISTINCT chars that are currently satisfied. We want to increment it exactly once per char — at the moment we first meet the requirement. Using >= would increment on every additional c, making formed larger than required and breaking the formed == required invariant.

Q2: What if the alphabet is huge (Unicode)? A: Replace defaultdict(int) with a plain dict. formed and required still work. For ASCII-only, replace dict with a fixed-size array for ~3-5x speedup; access is O(1) without hashing.

Q3: LC 567 — does s2 contain a permutation of s1? A: Fixed-window of length len(s1). Initialize the window’s have from s2[:len(s1)]. Then slide one char at a time: expand right, contract left. Check formed == required at each step. O(|s2|) time. Use Counter equality at the end if you prefer; match counter is more efficient.

Q4: Find ALL minimum windows (not just one). A: Two passes. First pass: find the minimum length L. Second pass: collect all windows of length exactly L that satisfy the constraint. Same O(N+M) bound.

Q5: Streaming s (data arriving over time). A: Right-expanding sliding window works online. Left can only advance forward. Memory: O(σ). On each character arrival: expand step. Periodically (or on query): check formed == required and try to contract. Online streaming with bounded memory ✓.


14. Solution Walkthrough

See solution.py:

  • min_window(s, t) — canonical match-counter solution. O(N + M).
  • min_window_naive(s, t) — variable window with O(σ) validity check on each shrink. O(N·σ).
  • min_window_brute(s, t) — try every (i, j) pair, compare Counters. O(N²·M). Oracle.
  • stress_test() — 200 random (s, t) pairs; all three agree on the LENGTH (any tied answer is acceptable; we compare lengths).
  • edge_cases() — empty t, t > s, no valid window, LC canonical s="ADOBECODEBANC", t="ABC", duplicates t="AABC", full string, single chars.

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“The match-counter trick is great, but bury it under a clear comment block at the top of the function: ‘INVARIANT: formed = count of distinct chars c in need such that have[c] >= need[c].’ Without that, the future engineer maintaining this code will see the == need[c] increment and ‘fix’ it to >= need[c], breaking the invariant silently. Also: the have = defaultdict(int) allocates on every call. If this is a hot path, pass a reusable Counter in. And one more thing — the s[best_l:best_l+best_len] allocates a new string; if the caller only needs the indices for highlighting in a UI, return (best_l, best_len) instead and let the caller decide. Don’t pre-allocate what the caller might not need.”

Connections forward:

  • p29 — Longest Substring Without Repeating: simpler invariant, same template.
  • LC 567 — Permutation in String: fixed-window version.
  • LC 438 — Find All Anagrams: fixed-window, return all start indices.
  • LC 30 — Substring with Concatenation of All Words: tokenized version.
  • LC 727 — Minimum Window Subsequence: requires DP, NOT sliding window — common trap.
  • LC 992 — Subarrays with K Different Integers: “exactly K” trick = “at most K” - “at most K-1.”
  • Phase 02 Lab 02 — Sliding Window: template lab.
  • Production: search-result diversity, log-anomaly minimal coverage, DNA motif finding, ad-targeting “shortest sequence of impressions covering all required categories.”