p30 — Minimum Window Substring
Source: LeetCode 76 · Hard · Topics: Hash Table, String, Sliding Window Companies (2024–2025 frequency): Meta (very high), Amazon (high), Google (high), Microsoft (medium), Bloomberg (medium) Loop position: algorithm round, often the discriminator between Mid and Senior.
1. Quick Context
Given strings s and t, return the minimum window (shortest contiguous substring) in s such that every character in t (counting multiplicity) is included in the window. If no such window exists, return "".
This is the canonical “expand-then-contract” variable-window problem and the natural progression from p29. Three things are different from p29:
- The invariant is more complex: “window contains all required chars with sufficient multiplicity” — not just “no duplicates.”
- The optimization that separates Mid from Senior: the match counter. Naive implementations call
Counter == Counter(O(σ)) on each shrink. The match counter tracks how many distinct characters oftare currently satisfied; checkingformed == requiredis O(1). - The structure: expand
rightuntil the window is valid; then contractleftwhile it stays valid, recording the tightest. Repeat.
This problem appears in Meta loops with high frequency specifically because the naive O(N²) → O(N·σ) → O(N) optimization sequence reveals algorithmic depth in 15 minutes.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/minimum-window-substring/
STOP. 35-min timer. This is Hard. If you can’t finish in 35, write the brute force, state the match-counter idea, and ship it for partial credit.
3. Prerequisite Concepts
- p29 — Longest Substring Without Repeating: variable-window template.
collections.Counteranddefaultdict(int).- Multiset / multiplicity reasoning.
4. How to Approach (FRAMEWORK Steps 1–9)
Step 1 — Restate: “Find the shortest contiguous substring w of s such that for every character c in t, the count of c in w is at least the count of c in t. Return w; if none exists, return empty string.”
Step 2 — Clarify:
- “Multiplicity matters?” (Yes.
t="AABC"requires TWO'A's in the window.) - “Case-sensitive?” (Yes.
'A'vs'a'distinct.) - “If multiple windows tie?” (LC: any one is acceptable. Often: leftmost.)
- “Empty
t?” (Spec says non-empty. Confirm — if empty, return"".) - “
tlonger thans?” (No valid window → return"".) - “Characters in
tnot appearing insat all?” (No valid window →"".)
Step 3 — Constraints: N (len of s) ≤ 10⁵, M (len of t) ≤ 10⁵, alphabet uppercase/lowercase English. O(N+M) target.
Step 4 — Examples:
s="ADOBECODEBANC", t="ABC"→"BANC".s="a", t="a"→"a".s="a", t="aa"→""(not enough'a's).s="aa", t="aa"→"aa".s="ab", t="b"→"b".s="bba", t="ab"→"ba"(note: leftmost minimum is"ba", length 2).
Step 5 — Brute Force: Enumerate all (i, j) substrings, for each check Counter(s[i:j+1]) >= Counter(t). O(N²·M). State and pivot.
Step 6 — Complexity: Optimal O(N + M) time, O(σ) space.
Step 7 — Pattern Recognition:
- “Shortest contiguous range satisfying a multiset constraint” → variable sliding window with expand-then-contract.
- “Validity check is hot path” → incremental match counter (don’t re-compare maps).
Step 8 — Develop: see solution.py.
Step 9 — Test: valid example, no-valid case, single char, full string is the answer, t with duplicates, alphabet collisions.
5. Progressive Hints
If stuck for 5 min, open hints.md.
6. Deeper Insight — The match counter trick
6.1 The naive validity check
The straightforward way:
def is_valid(have, need):
for c, k in need.items():
if have.get(c, 0) < k:
return False
return True
This is O(σ) per call (where σ = |distinct chars in t|). If called inside both expand and contract loops, total complexity becomes O(N · σ) — fine for small σ but wasteful.
6.2 The match-counter optimization
Maintain two counters and TWO integers:
need = Counter(t)— required frequencies.have = defaultdict(int)— current window frequencies.required = len(need)— number of DISTINCT characters that need satisfying.formed = 0— number of DISTINCT characters currently satisfied (i.e.,have[c] >= need[c]).
Expand (right += 1 adds char c):
have[c] += 1
if c in need and have[c] == need[c]:
formed += 1
The == need[c] (not >=) is critical: we increment formed ONLY at the moment we just met the requirement. Going further (have > need) does NOT change formed.
Validity check: formed == required. O(1).
Contract (left += 1 removes char c):
if c in need and have[c] == need[c]:
formed -= 1
have[c] -= 1
Order matters: we decrement formed BEFORE reducing have[c] below need[c]. (Equivalently: check have[c] == need[c] first, then decrement have[c].)
Now the main loop:
left = 0
formed = 0
best_len = inf
best_l = 0
for right in range(n):
expand(s[right])
while formed == required: # window valid → try to shrink
if right - left + 1 < best_len:
best_len = right - left + 1
best_l = left
contract(s[left])
left += 1
return s[best_l : best_l + best_len] if best_len < inf else ""
Each character is processed at most twice (once on expand, once on contract) → O(N + M). The formed == required check is O(1).
6.3 Why the order of operations matters
If you reverse the contract order:
have[c] -= 1
if c in need and have[c] < need[c]:
formed -= 1
This is also correct (and arguably more intuitive). But you must NOT do:
have[c] -= 1
if c in need and have[c] == need[c] - 1:
formed -= 1
unless need[c] > 0 for c — which it is by construction (c in need implies need[c] >= 1). Subtle. Use the cleaner “check then update” form.
6.4 Generalizations
This template extends directly to:
- LC 567 — Permutation in String: fixed-window version (length = len(t)).
- LC 438 — Find All Anagrams in a String: fixed-window, return all start indices.
- LC 30 — Substring with Concatenation of All Words: tokenized version, window over word boundaries.
- LC 727 — Minimum Window Subsequence: similar API but subsequence, requires DP not sliding window.
7. Anti-Pattern Analysis
Counter == Counteron each shrink check. O(σ) per check — turns the algorithm into O(N·σ). Use match counter.have[c] >= need[c]instead ofhave[c] == need[c]when incrementingformed. With>=, you’d incrementformedevery time you add acafter the first match, double-counting.- Wrong contract order. Decrementing
have[c]before checking againstneed[c]flips the comparison. - Tracking the substring with
s[left:right+1]inside the loop. O(N) slice per iteration → O(N²) total. Track(best_l, best_len), slice once at the end. - Initializing
best_len = 0. Then the first valid window of any size fails the< best_lencheck (0 is smaller). Useinforn + 1. - Forgetting the no-window case. Return
""(empty string), notNoneor-1. - Counting characters NOT in
t. Wasteful but not incorrect. To avoid, guard withif c in need: have[c] += 1. Trade-off: complicates contract logic. - Building
needas a dict by iteratingtmanually withif/else.Counter(t)is faster, clearer, and harder to bug. - Treating
tas a SET. Loses multiplicity info.t="AABC"requires TWO'A's.
8. Skills & Takeaways
- Variable sliding window with expand-then-contract — the second canonical pattern after p29.
- Match counter (
formed/required) — an O(1) incremental validity check, replacing O(σ) comparisons. Reuse this pattern for any “multiset coverage” problem. - Counter / defaultdict fluency.
(best_l, best_len)tracking — avoid string slicing in hot loops.- Hard vs Medium discrimination — the match counter is the senior signal that elevates this from Medium-quality code.
9. When to Move On
Move on when:
- You can write the match-counter version in under 18 minutes, no bugs.
- You can explain WHY
have[c] == need[c](not>=) when incrementingformed. - You can articulate why total work is O(N + M) despite the inner while-loop.
- Your stress test passes 200 random small (s, t) pairs vs brute.
- You can pivot to LC 567 (Permutation in String) in under 10 minutes using the same machinery (with a fixed window length).
10. Company Context
Meta:
- One of the highest-frequency Hard problems in 2024-25 Meta loops. Expected solid Senior-level fluency.
- Bar Raiser specifically watches for the match-counter optimization. Naive
Counter ==is a partial-credit signal. - Scorecard phrase: “Algorithm — sliding window with match-counter optimization, O(N+M).”
Amazon:
- Often paired with LC 438 (Anagrams) in the same loop — same template, fixed window.
- Scorecard phrase: “Strong — variable window template + correct match-counter.”
Google:
- Frequently extended: “now return the K shortest non-overlapping such windows.” Tricky greedy + sliding.
- Scorecard phrase: “Algorithm + extension — generalized sliding window with constraint tracking.”
Microsoft:
- Sometimes asked as the warm-up problem in an algorithm-heavy loop.
- Scorecard phrase: “Strong — Hard problem cleanly solved with no false starts.”
Bloomberg:
- Variant: “given a stream of trades and a basket of required tickers, find the shortest time window where all required tickers traded.”
- Scorecard phrase: “Pragmatic — recognized streaming variable-window over time.”
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff/Principal |
|---|---|---|---|---|
| First instinct | Brute O(N²·M) | Variable window with Counter == check | Match counter from the start | Match counter + clean test plan + complexity proof |
| Validity check | Map equality each step | Counter == or full dict scan | formed == required O(1) | Same + verbal complexity argument |
| Substring extraction | s[left:right+1] inside loop | Same | (best_l, best_len), slice once | Same + discusses memory churn in production |
| Generalization | None | “Probably works for anagrams” | Notes LC 567, LC 438 use same template | Notes streaming variant, range/temporal extension |
12. Level Delta
Mid (L4):
- Variable-window solution with
Counter == Countervalidity check. - O(N·σ) time, correct answer.
- All edge cases handled.
Senior (L5):
- Match counter from the start.
- O(N + M) time stated and justified.
(best_l, best_len)extraction, no in-loop slicing.- Tests
twith duplicates and no-valid-window case.
Staff (L6):
- Implements match counter with clean expand/contract symmetry.
- Discusses LC 567, LC 438, LC 30 as same-template variants.
- Discusses memory churn: in JVM/Go, the per-iteration slice creates GC pressure — track indices and slice once.
- Discusses streaming variant (data arriving over time).
Principal (L7+):
- Discusses the K minimum windows generalization: greedy over non-overlapping intervals.
- Discusses multi-pattern: given M required substrings (not just chars), find shortest window containing all of them. Reduces to Aho-Corasick + sliding window in O(N·log M).
- Discusses alphabet size σ very large (Unicode, custom tokens): hash-of-hash for O(1) match counter, or roll our own bitset. For ASCII, fixed-size
int[128]beats dict. - Discusses production: search-result diversity (find minimum window of search results covering all required facets), log-anomaly detection (minimum window covering all required events), bioinformatics (shortest DNA region containing all required motifs).
13. Follow-up Q&A
Q1: Why have[c] == need[c] (not >=) when incrementing formed?
A: formed counts the number of DISTINCT chars that are currently satisfied. We want to increment it exactly once per char — at the moment we first meet the requirement. Using >= would increment on every additional c, making formed larger than required and breaking the formed == required invariant.
Q2: What if the alphabet is huge (Unicode)?
A: Replace defaultdict(int) with a plain dict. formed and required still work. For ASCII-only, replace dict with a fixed-size array for ~3-5x speedup; access is O(1) without hashing.
Q3: LC 567 — does s2 contain a permutation of s1?
A: Fixed-window of length len(s1). Initialize the window’s have from s2[:len(s1)]. Then slide one char at a time: expand right, contract left. Check formed == required at each step. O(|s2|) time. Use Counter equality at the end if you prefer; match counter is more efficient.
Q4: Find ALL minimum windows (not just one). A: Two passes. First pass: find the minimum length L. Second pass: collect all windows of length exactly L that satisfy the constraint. Same O(N+M) bound.
Q5: Streaming s (data arriving over time).
A: Right-expanding sliding window works online. Left can only advance forward. Memory: O(σ). On each character arrival: expand step. Periodically (or on query): check formed == required and try to contract. Online streaming with bounded memory ✓.
14. Solution Walkthrough
See solution.py:
min_window(s, t)— canonical match-counter solution. O(N + M).min_window_naive(s, t)— variable window with O(σ) validity check on each shrink. O(N·σ).min_window_brute(s, t)— try every (i, j) pair, compare Counters. O(N²·M). Oracle.stress_test()— 200 random (s, t) pairs; all three agree on the LENGTH (any tied answer is acceptable; we compare lengths).edge_cases()— empty t, t > s, no valid window, LC canonicals="ADOBECODEBANC", t="ABC", duplicatest="AABC", full string, single chars.
Run: python3 solution.py.
15. Beyond the Problem
Principal-engineer code review comment:
“The match-counter trick is great, but bury it under a clear comment block at the top of the function: ‘INVARIANT: formed = count of distinct chars c in need such that have[c] >= need[c].’ Without that, the future engineer maintaining this code will see the
== need[c]increment and ‘fix’ it to>= need[c], breaking the invariant silently. Also: thehave = defaultdict(int)allocates on every call. If this is a hot path, pass a reusable Counter in. And one more thing — thes[best_l:best_l+best_len]allocates a new string; if the caller only needs the indices for highlighting in a UI, return(best_l, best_len)instead and let the caller decide. Don’t pre-allocate what the caller might not need.”
Connections forward:
- p29 — Longest Substring Without Repeating: simpler invariant, same template.
- LC 567 — Permutation in String: fixed-window version.
- LC 438 — Find All Anagrams: fixed-window, return all start indices.
- LC 30 — Substring with Concatenation of All Words: tokenized version.
- LC 727 — Minimum Window Subsequence: requires DP, NOT sliding window — common trap.
- LC 992 — Subarrays with K Different Integers: “exactly K” trick = “at most K” - “at most K-1.”
- Phase 02 Lab 02 — Sliding Window: template lab.
- Production: search-result diversity, log-anomaly minimal coverage, DNA motif finding, ad-targeting “shortest sequence of impressions covering all required categories.”