p30 — Minimum Window Substring

Source: LeetCode 76 · Hard · Topics: Hash Table, String, Sliding Window Companies (2024–2025 frequency): Meta (very high), Amazon (high), Google (high), Microsoft (medium), Bloomberg (medium) Loop position: algorithm round, often the discriminator between Mid and Senior.

1. Quick Context

Given strings s and t, return the minimum window (shortest contiguous substring) in s such that every character in t (counting multiplicity) is included in the window. If no such window exists, return "".

This is the canonical “expand-then-contract” variable-window problem and the natural progression from p29. Three things are different from p29:

The invariant is more complex: “window contains all required chars with sufficient multiplicity” — not just “no duplicates.”
The optimization that separates Mid from Senior: the match counter. Naive implementations call Counter == Counter (O(σ)) on each shrink. The match counter tracks how many distinct characters of t are currently satisfied; checking formed == required is O(1).
The structure: expand right until the window is valid; then contract left while it stays valid, recording the tightest. Repeat.

This problem appears in Meta loops with high frequency specifically because the naive O(N²) → O(N·σ) → O(N) optimization sequence reveals algorithmic depth in 15 minutes.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/minimum-window-substring/

STOP. 35-min timer. This is Hard. If you can’t finish in 35, write the brute force, state the match-counter idea, and ship it for partial credit.

3. Prerequisite Concepts

p29 — Longest Substring Without Repeating: variable-window template.
collections.Counter and defaultdict(int).
Multiset / multiplicity reasoning.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Find the shortest contiguous substring w of s such that for every character c in t, the count of c in w is at least the count of c in t. Return w; if none exists, return empty string.”

Step 2 — Clarify:

“Multiplicity matters?” (Yes. t="AABC" requires TWO 'A's in the window.)
“Case-sensitive?” (Yes. 'A' vs 'a' distinct.)
“If multiple windows tie?” (LC: any one is acceptable. Often: leftmost.)
“Empty t?” (Spec says non-empty. Confirm — if empty, return "".)
“t longer than s?” (No valid window → return "".)
“Characters in t not appearing in s at all?” (No valid window → "".)

Step 3 — Constraints: N (len of s) ≤ 10⁵, M (len of t) ≤ 10⁵, alphabet uppercase/lowercase English. O(N+M) target.

Step 4 — Examples:

s="ADOBECODEBANC", t="ABC" → "BANC".
s="a", t="a" → "a".
s="a", t="aa" → "" (not enough 'a's).
s="aa", t="aa" → "aa".
s="ab", t="b" → "b".
s="bba", t="ab" → "ba" (note: leftmost minimum is "ba", length 2).

Step 5 — Brute Force: Enumerate all (i, j) substrings, for each check Counter(s[i:j+1]) >= Counter(t). O(N²·M). State and pivot.

Step 6 — Complexity: Optimal O(N + M) time, O(σ) space.

Step 7 — Pattern Recognition:

“Shortest contiguous range satisfying a multiset constraint” → variable sliding window with expand-then-contract.
“Validity check is hot path” → incremental match counter (don’t re-compare maps).

Step 8 — Develop: see solution.py.

Step 9 — Test: valid example, no-valid case, single char, full string is the answer, t with duplicates, alphabet collisions.

5. Progressive Hints

If stuck for 5 min, open hints.md.

6. Deeper Insight — The match counter trick

6.1 The naive validity check

The straightforward way:

def is_valid(have, need):
    for c, k in need.items():
        if have.get(c, 0) < k:
            return False
    return True

This is O(σ) per call (where σ = |distinct chars in t|). If called inside both expand and contract loops, total complexity becomes O(N · σ) — fine for small σ but wasteful.

6.2 The match-counter optimization

Maintain two counters and TWO integers:

need = Counter(t) — required frequencies.
have = defaultdict(int) — current window frequencies.
required = len(need) — number of DISTINCT characters that need satisfying.
formed = 0 — number of DISTINCT characters currently satisfied (i.e., have[c] >= need[c]).

Expand (right += 1 adds char c):

have[c] += 1
if c in need and have[c] == need[c]:
    formed += 1

The == need[c] (not >=) is critical: we increment formed ONLY at the moment we just met the requirement. Going further (have > need) does NOT change formed.

Validity check: formed == required. O(1).

Contract (left += 1 removes char c):

if c in need and have[c] == need[c]:
    formed -= 1
have[c] -= 1

Order matters: we decrement formed BEFORE reducing have[c] below need[c]. (Equivalently: check have[c] == need[c] first, then decrement have[c].)

Now the main loop:

left = 0
formed = 0
best_len = inf
best_l = 0
for right in range(n):
    expand(s[right])
    while formed == required:           # window valid → try to shrink
        if right - left + 1 < best_len:
            best_len = right - left + 1
            best_l = left
        contract(s[left])
        left += 1
return s[best_l : best_l + best_len] if best_len < inf else ""

Each character is processed at most twice (once on expand, once on contract) → O(N + M). The formed == required check is O(1).

6.3 Why the order of operations matters

If you reverse the contract order:

have[c] -= 1
if c in need and have[c] < need[c]:
    formed -= 1

This is also correct (and arguably more intuitive). But you must NOT do:

have[c] -= 1
if c in need and have[c] == need[c] - 1:
    formed -= 1

unless need[c] > 0 for c — which it is by construction (c in need implies need[c] >= 1). Subtle. Use the cleaner “check then update” form.

6.4 Generalizations

This template extends directly to:

LC 567 — Permutation in String: fixed-window version (length = len(t)).
LC 438 — Find All Anagrams in a String: fixed-window, return all start indices.
LC 30 — Substring with Concatenation of All Words: tokenized version, window over word boundaries.
LC 727 — Minimum Window Subsequence: similar API but subsequence, requires DP not sliding window.

7. Anti-Pattern Analysis

Counter == Counter on each shrink check. O(σ) per check — turns the algorithm into O(N·σ). Use match counter.
have[c] >= need[c] instead of have[c] == need[c] when incrementing formed. With >=, you’d increment formed every time you add a c after the first match, double-counting.
Wrong contract order. Decrementing have[c] before checking against need[c] flips the comparison.
Tracking the substring with s[left:right+1] inside the loop. O(N) slice per iteration → O(N²) total. Track (best_l, best_len), slice once at the end.
Initializing best_len = 0. Then the first valid window of any size fails the < best_len check (0 is smaller). Use inf or n + 1.
Forgetting the no-window case. Return "" (empty string), not None or -1.
Counting characters NOT in t. Wasteful but not incorrect. To avoid, guard with if c in need: have[c] += 1. Trade-off: complicates contract logic.
Building need as a dict by iterating t manually with if/else. Counter(t) is faster, clearer, and harder to bug.
Treating t as a SET. Loses multiplicity info. t="AABC" requires TWO 'A's.

8. Skills & Takeaways

Variable sliding window with expand-then-contract — the second canonical pattern after p29.
Match counter (formed/required) — an O(1) incremental validity check, replacing O(σ) comparisons. Reuse this pattern for any “multiset coverage” problem.
Counter / defaultdict fluency.
(best_l, best_len) tracking — avoid string slicing in hot loops.
Hard vs Medium discrimination — the match counter is the senior signal that elevates this from Medium-quality code.

9. When to Move On

Move on when:

You can write the match-counter version in under 18 minutes, no bugs.
You can explain WHY have[c] == need[c] (not >=) when incrementing formed.
You can articulate why total work is O(N + M) despite the inner while-loop.
Your stress test passes 200 random small (s, t) pairs vs brute.
You can pivot to LC 567 (Permutation in String) in under 10 minutes using the same machinery (with a fixed window length).

10. Company Context

Meta:

One of the highest-frequency Hard problems in 2024-25 Meta loops. Expected solid Senior-level fluency.
Bar Raiser specifically watches for the match-counter optimization. Naive Counter == is a partial-credit signal.
Scorecard phrase: “Algorithm — sliding window with match-counter optimization, O(N+M).”

Amazon:

Often paired with LC 438 (Anagrams) in the same loop — same template, fixed window.
Scorecard phrase: “Strong — variable window template + correct match-counter.”

Google:

Frequently extended: “now return the K shortest non-overlapping such windows.” Tricky greedy + sliding.
Scorecard phrase: “Algorithm + extension — generalized sliding window with constraint tracking.”

Microsoft:

Sometimes asked as the warm-up problem in an algorithm-heavy loop.
Scorecard phrase: “Strong — Hard problem cleanly solved with no false starts.”

Bloomberg:

Variant: “given a stream of trades and a basket of required tickers, find the shortest time window where all required tickers traded.”
Scorecard phrase: “Pragmatic — recognized streaming variable-window over time.”

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff/Principal
First instinct	Brute O(N²·M)	Variable window with `Counter ==` check	Match counter from the start	Match counter + clean test plan + complexity proof
Validity check	Map equality each step	`Counter ==` or full dict scan	`formed == required` O(1)	Same + verbal complexity argument
Substring extraction	`s[left:right+1]` inside loop	Same	`(best_l, best_len)`, slice once	Same + discusses memory churn in production
Generalization	None	“Probably works for anagrams”	Notes LC 567, LC 438 use same template	Notes streaming variant, range/temporal extension

12. Level Delta

Mid (L4):

Variable-window solution with Counter == Counter validity check.
O(N·σ) time, correct answer.
All edge cases handled.

Senior (L5):

Match counter from the start.
O(N + M) time stated and justified.
(best_l, best_len) extraction, no in-loop slicing.
Tests t with duplicates and no-valid-window case.

Staff (L6):

Implements match counter with clean expand/contract symmetry.
Discusses LC 567, LC 438, LC 30 as same-template variants.
Discusses memory churn: in JVM/Go, the per-iteration slice creates GC pressure — track indices and slice once.
Discusses streaming variant (data arriving over time).

Principal (L7+):

Discusses the K minimum windows generalization: greedy over non-overlapping intervals.
Discusses multi-pattern: given M required substrings (not just chars), find shortest window containing all of them. Reduces to Aho-Corasick + sliding window in O(N·log M).
Discusses alphabet size σ very large (Unicode, custom tokens): hash-of-hash for O(1) match counter, or roll our own bitset. For ASCII, fixed-size int[128] beats dict.
Discusses production: search-result diversity (find minimum window of search results covering all required facets), log-anomaly detection (minimum window covering all required events), bioinformatics (shortest DNA region containing all required motifs).

13. Follow-up Q&A

Q1: Why have[c] == need[c] (not >=) when incrementing formed? A: formed counts the number of DISTINCT chars that are currently satisfied. We want to increment it exactly once per char — at the moment we first meet the requirement. Using >= would increment on every additional c, making formed larger than required and breaking the formed == required invariant.

Q2: What if the alphabet is huge (Unicode)? A: Replace defaultdict(int) with a plain dict. formed and required still work. For ASCII-only, replace dict with a fixed-size array for ~3-5x speedup; access is O(1) without hashing.

Q3: LC 567 — does s2 contain a permutation of s1? A: Fixed-window of length len(s1). Initialize the window’s have from s2[:len(s1)]. Then slide one char at a time: expand right, contract left. Check formed == required at each step. O(|s2|) time. Use Counter equality at the end if you prefer; match counter is more efficient.

Q4: Find ALL minimum windows (not just one). A: Two passes. First pass: find the minimum length L. Second pass: collect all windows of length exactly L that satisfy the constraint. Same O(N+M) bound.

Q5: Streaming s (data arriving over time). A: Right-expanding sliding window works online. Left can only advance forward. Memory: O(σ). On each character arrival: expand step. Periodically (or on query): check formed == required and try to contract. Online streaming with bounded memory ✓.

14. Solution Walkthrough

See solution.py:

min_window(s, t) — canonical match-counter solution. O(N + M).
min_window_naive(s, t) — variable window with O(σ) validity check on each shrink. O(N·σ).
min_window_brute(s, t) — try every (i, j) pair, compare Counters. O(N²·M). Oracle.
stress_test() — 200 random (s, t) pairs; all three agree on the LENGTH (any tied answer is acceptable; we compare lengths).
edge_cases() — empty t, t > s, no valid window, LC canonical s="ADOBECODEBANC", t="ABC", duplicates t="AABC", full string, single chars.

Run: python3 solution.py.

15. Beyond the Problem

Principal-engineer code review comment:

“The match-counter trick is great, but bury it under a clear comment block at the top of the function: ‘INVARIANT: formed = count of distinct chars c in need such that have[c] >= need[c].’ Without that, the future engineer maintaining this code will see the == need[c] increment and ‘fix’ it to >= need[c], breaking the invariant silently. Also: the have = defaultdict(int) allocates on every call. If this is a hot path, pass a reusable Counter in. And one more thing — the s[best_l:best_l+best_len] allocates a new string; if the caller only needs the indices for highlighting in a UI, return (best_l, best_len) instead and let the caller decide. Don’t pre-allocate what the caller might not need.”

Connections forward:

p29 — Longest Substring Without Repeating: simpler invariant, same template.
LC 567 — Permutation in String: fixed-window version.
LC 438 — Find All Anagrams: fixed-window, return all start indices.
LC 30 — Substring with Concatenation of All Words: tokenized version.
LC 727 — Minimum Window Subsequence: requires DP, NOT sliding window — common trap.
LC 992 — Subarrays with K Different Integers: “exactly K” trick = “at most K” - “at most K-1.”
Phase 02 Lab 02 — Sliding Window: template lab.
Production: search-result diversity, log-anomaly minimal coverage, DNA motif finding, ad-targeting “shortest sequence of impressions covering all required categories.”

LeetCode Interview — Extreme Coding