Elite Coding Interview Mastery Program
A complete, lab-based, phase-structured training system to take you from beginner/intermediate to a candidate who can succeed at any coding interview — FAANG, infrastructure companies, distributed systems teams, compilers/runtimes, quant/HFT, staff/principal practical coding, and competitive-programming-style interviews.
This is not a roadmap. It is a training system: every phase has concept docs, runtime docs, hands-on labs in a fixed format, mock interviews with rubrics, failure analysis, and spaced repetition.
How To Use This Program
- Pick a track in schedules/ — Accelerated (12 weeks), Serious (6 months), or Elite (12 months).
- Read the universal framework first — FRAMEWORK.md. Use it on every problem.
- Work through phases in order — they have dependencies. Do not skip Phase 0 or Phase 1.
- For every problem, fill out REVIEW_TEMPLATE.md and schedule revisits via SPACED_REPETITION.md.
- When you fail, run the diagnosis in FAILURE_ANALYSIS.md. Do not skip this.
- Do at least one mock interview per week; see phase-11-mock-interviews/.
- Graduate only when READINESS_CHECKLIST.md is fully passed.
Global Framework Documents
| Document | Purpose |
|---|---|
| FRAMEWORK.md | The 16-step universal problem-solving framework + 11-step stuck protocol |
| COMMUNICATION.md | What to say (and not say) during an interview, with sample phrases |
| CODE_QUALITY.md | Quality bar for interview code, with bad-vs-good comparisons |
| REVIEW_TEMPLATE.md | Per-problem review template (use after every solve) |
| FAILURE_ANALYSIS.md | 16-category failure taxonomy with diagnosis, fix, drill, re-test |
| SPACED_REPETITION.md | 6-tier review intervals + per-tier protocol |
| READINESS_CHECKLIST.md | Final binary checklist before going to interviews |
Schedules
| Track | Audience | Doc |
|---|---|---|
| 12-Week Accelerated | Urgent prep with a deadline coming up | schedules/12_WEEK_ACCELERATED.md |
| 6-Month Serious | Strong Big Tech readiness | schedules/6_MONTH_SERIOUS.md |
| 12-Month Elite | Top-tier, senior/staff, competitive-heavy | schedules/12_MONTH_ELITE.md |
Phases
| # | Phase | Target Level | Folder |
|---|---|---|---|
| 0 | Interview Execution Baseline | Beginner → Easy | phase-00-execution-baseline/ |
| 1 | Programming & Data Structure Foundations | Easy → Medium | phase-01-foundations/ |
| 2 | Standard Coding Interview Patterns | Medium → Medium-Hard | phase-02-patterns/ |
| 3 | Advanced Data Structures | Medium-Hard → Hard | phase-03-advanced-ds/ |
| 4 | Graph Mastery | Medium-Hard → Hard | phase-04-graphs/ |
| 5 | Dynamic Programming (Basic → Extreme) | Medium → Very Hard | phase-05-dp/ |
| 6 | Greedy, Proofs & Mathematical Thinking | Medium-Hard → Hard | phase-06-greedy/ |
| 7 | Competitive Programming Acceleration | Hard → CP-Hard | phase-07-competitive/ |
| 8 | Practical Engineering Coding Interviews | Medium-Hard → Hard | phase-08-practical-engineering/ |
| 9 | Language & Runtime Deep Dive | All levels (cross-cutting) | phase-09-language-runtime/ |
| 10 | Testing, Debugging & Correctness | All levels (cross-cutting) | phase-10-testing-debugging/ |
| 11 | Mock Interview Mastery | Beginner → Grandmaster | phase-11-mock-interviews/ |
| 12 | Grandmaster / Final Boss | CP-Hard → Grandmaster | phase-12-grandmaster/ |
Difficulty Ladder
| Level | What You Solve | Solving Time | Failure Means | When To Move Up |
|---|---|---|---|---|
| Beginner | Trivial array/string traversals | 5–10 min | Confusion about loops/indexing | After 30 problems with 0 confusion |
| Easy | LeetCode Easy | 10–15 min | Wrong brute force, syntax errors | 90% solved <15 min for 50 problems |
| Medium | LeetCode Medium | 25–35 min | Missed pattern, bad complexity | 75% solved <35 min for 100 problems |
| Medium-Hard | Top-100 Mediums, easy Hards | 30–45 min | Couldn’t optimize past brute force | 70% solved <45 min for 60 problems |
| Hard | LeetCode Hard | 45–75 min | Failed to find any non-trivial approach | 50% solved <60 min for 50 problems |
| Very Hard | LC Hard tagged “very hard”, Codeforces 1900–2100 | 60–120 min | Conceptual gaps in algorithm class | 40% solved unaided for 30 problems |
| CP Hard | Codeforces 2100–2400, AtCoder ARC F | 90–180 min | Missing a CP-specific technique | Solving consistently in contest |
| Grandmaster | CF 2400+, AGC, ICPC WF | Open-ended | Even with hints, can’t make progress | When you stop needing this curriculum |
For each level: failure is normal. The value is in the review process and the failure analysis, not the original solve.
When to repeat a level: if your unaided success rate is below the threshold above, do another 30–50 problems at the same level before moving up. Moving up too early calcifies bad habits.
Progress Tracking
Use a single spreadsheet/journal with columns:
- Date
- Phase / Pattern
- Problem name + source
- Solved unaided? (Y/N)
- Time spent
- Got it on first attempt? (Y/N)
- Why missed (if missed) — link to FAILURE_ANALYSIS.md category
- Next review date — from SPACED_REPETITION.md
Without tracking, the spaced repetition system collapses. Without spaced repetition, knowledge decays faster than you accumulate it.
A Note On Honesty
This curriculum cannot guarantee outcomes. What it guarantees is:
- If you complete every phase honestly (failed problems reviewed, mocks done with full effort, no shortcuts), you will recognize > 95% of common interview problem patterns.
- You will be able to solve unfamiliar problems — because the framework forces you to derive solutions, not memorize them.
- You will fail mock interviews repeatedly and learn from each one. This is the entire point.
- You will know your own weaknesses precisely, and which phase to revisit.
Skip the failure analysis, skip the reviews, skip the mocks — and the curriculum becomes a list of topics. Topics don’t get you hired.
Published · commit 406ccc0 · 2026-05-21 16:15 UTC
Universal Problem-Solving Framework
Use this on every problem. It is non-optional. The goal is to make solving deterministic, not heroic.
The 16-Step Framework
1. Restate The Problem
Say it back in your own words. If you can’t restate, you don’t understand. Ambiguous parts surface here.
2. Ask Clarifying Questions
- Input type, range, sign, precision
- Are inputs sorted? Distinct? Can they be empty/null?
- Output format. One answer or all answers?
- Are duplicates allowed?
- Can the input mutate? Is it streamed?
- What should happen on invalid input?
- Are there constraints that were not stated?
3. Identify Constraints
Constraints dictate the algorithm. Memorize this table:
| N | Acceptable Complexity | Likely Approach |
|---|---|---|
| ≤ 10 | O(N!) or O(2^N · N) | Backtracking, bitmask brute force |
| ≤ 20 | O(2^N · N) | Bitmask DP, meet-in-the-middle |
| ≤ 100 | O(N^4) | Multi-loop brute, Floyd-Warshall |
| ≤ 500 | O(N^3) | Interval DP, matrix chain |
| ≤ 5,000 | O(N^2) | 2D DP, edit distance |
| ≤ 100,000 | O(N log N) or O(N √N) | Sort + scan, segment tree, sqrt decomp |
| ≤ 1,000,000 | O(N) or O(N log N) | Linear scan, hashmap, two pointers |
| ≤ 10^8 | O(N) or O(log N) | Math closed form, binary search on answer |
| ≤ 10^18 | O(log N) | Binary exponentiation, math |
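One way to turn the table into a reflex is to estimate the operation count for each candidate complexity and compare it against a budget. The sketch below assumes a budget of roughly 10^8 simple operations, which is a common rule of thumb rather than a measured figure; the names `OPERATION_BUDGET`, `COMPLEXITIES`, and `feasible_complexities` are illustrative and not part of the program.

```python
import math

# Assumed budget: about 1e8 simple operations (rule of thumb, not a benchmark).
OPERATION_BUDGET = 10**8

# Estimated operation counts for common complexity classes (n >= 1).
COMPLEXITIES = {
    "O(log N)": lambda n: max(1.0, math.log2(n)),
    "O(N)": lambda n: n,
    "O(N log N)": lambda n: n * max(1.0, math.log2(n)),
    "O(N^2)": lambda n: n**2,
    "O(N^3)": lambda n: n**3,
    # Cap the exponent so huge n does not build an enormous integer.
    "O(2^N * N)": lambda n: (2**n) * n if n <= 60 else math.inf,
}


def feasible_complexities(n, budget=OPERATION_BUDGET):
    """Return the complexity classes whose estimated cost fits the budget."""
    return [name for name, cost in COMPLEXITIES.items() if cost(n) <= budget]
```

For example, `feasible_complexities(5000)` keeps everything up to `O(N^2)` and drops the cubic and exponential classes, which matches the ≤ 5,000 row.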
4. Work Through Examples
Use the given example. Then build at least two more:
- A trivial case (size 1, empty)
- An adversarial case (max constraints, all duplicates, all negative, sorted descending)
Work them by hand. Annotate intermediate states. This often reveals the pattern.
5. Identify Brute Force
What is the dumbest correct solution? Write it down (pseudocode). Don’t skip this even if you “see” the optimal — the brute force is your correctness oracle for stress testing.
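As an illustration of using brute force as a correctness oracle, the sketch below stress-tests a fast solution against a slow one on small random inputs. The problem (maximum subarray sum) and all function names are assumptions chosen for the example, not something the framework prescribes.

```python
import random


def max_subarray_brute(nums):
    """O(N^2) oracle: try every non-empty subarray."""
    best = nums[0]
    for start in range(len(nums)):
        running = 0
        for end in range(start, len(nums)):
            running += nums[end]
            best = max(best, running)
    return best


def max_subarray_fast(nums):
    """O(N) Kadane's algorithm, the candidate being checked."""
    best = current = nums[0]
    for value in nums[1:]:
        current = max(value, current + value)
        best = max(best, current)
    return best


def stress_test(trials=500, seed=0):
    """Compare both versions on small random inputs; return a failing input or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        if max_subarray_brute(nums) != max_subarray_fast(nums):
            return nums
    return None
```

Small inputs are deliberate: when the two versions disagree, the returned array is short enough to trace by hand.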
6. Analyze Brute Force Complexity
Time and space, in big-O. If it fits the constraints, you may be done. Often it does for small N.
7. Recognize Patterns
Run through the pattern checklist in phase-02-patterns/:
- Sorted or sortable? → two pointers, binary search
- Asks for max/min over windows? → sliding window, monotonic deque
- “Subarray with property X”? → prefix sum, sliding window
- “K-th something”? → heap, quickselect
- “Number of ways”? → DP, combinatorics
- “Shortest path”? → BFS / Dijkstra / 0-1 BFS
- “Connected components / cycles / dependencies”? → union-find / DFS / topo sort
- “Decision: can we do X with budget Y?” → binary search on answer
- “Optimal sequence with overlapping subproblems”? → DP
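To make one checklist entry concrete, here is a minimal sketch of the "subarray with property X" cue resolved with a prefix sum plus a hash map. The problem (count subarrays summing to a target) and the function name are assumptions picked for illustration.

```python
def count_subarrays_with_sum(nums, target):
    """Count contiguous subarrays whose elements sum to target, in O(N) expected time."""
    # prefix_counts[p] = number of prefixes seen so far with sum p.
    prefix_counts = {0: 1}  # the empty prefix
    running = 0
    total = 0
    for value in nums:
        running += value
        # A subarray ending here sums to target iff an earlier prefix equals running - target.
        total += prefix_counts.get(running - target, 0)
        prefix_counts[running] = prefix_counts.get(running, 0) + 1
    return total
```

A sliding window would not work here once negative numbers are allowed, because the window sum is no longer monotonic; the prefix-sum form has no such restriction.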
8. Derive Optimized Approach
Reduce repeated work. Cache, sort, hash, prune, transform. State the invariant of your approach.
9. Prove Correctness
- For greedy: exchange argument or cut property.
- For DP: state definition + transition + base cases + evaluation order.
- For graphs: cite the algorithm’s correctness theorem and verify preconditions.
- For two-pointer/sliding window: the loop invariant.
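For the two-pointer case, the loop invariant can be written directly into the code as a comment. The sketch below assumes a sorted input and a pair-sum problem; the function name is hypothetical.

```python
def pair_with_sum(sorted_nums, target):
    """Return indices (i, j), i < j, whose values sum to target, or None."""
    left, right = 0, len(sorted_nums) - 1
    # Invariant: any valid pair has both indices within [left, right].
    while left < right:
        current = sorted_nums[left] + sorted_nums[right]
        if current == target:
            return left, right
        if current < target:
            # sorted_nums[left] is too small even with the largest remaining value,
            # so no valid pair uses it; dropping it preserves the invariant.
            left += 1
        else:
            # Symmetric: sorted_nums[right] is too large for any remaining partner.
            right -= 1
    return None
```

When the loop exits without returning, the invariant says any valid pair would have to fit in a window of fewer than two elements, so none exists.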
10. Write Clean Code
- Meaningful names
- Single-responsibility functions
- No premature abstraction
- Avoid mutating function parameters unless intentional
- Follow the idioms of the language
11. Test Smoke Cases
Walk the given examples through your code by hand. Don’t run yet. Find bugs before silicon does.
12. Test Edge Cases
- Empty / null
- Size 1 / size 2
- All duplicates
- All negative / mixed signs
- Sorted ascending / descending
- Max constraint values (overflow risk)
- Multiple valid answers (specify which one)
- Disconnected graph / cycle
- Concurrent access (where relevant)
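The array-shaped items in this checklist can be kept as a reusable table of inputs. The following is a sketch under assumptions: the concrete values, the 32-bit `max_value` default, and the helper names are all illustrative, and `buggy_max` is a deliberately broken example.

```python
def edge_case_arrays(max_value=2**31 - 1):
    """Yield (label, array) pairs covering the array-shaped checklist items."""
    yield "empty", []
    yield "size 1", [5]
    yield "size 2", [2, 1]
    yield "all duplicates", [7, 7, 7, 7]
    yield "all negative", [-3, -1, -2]
    yield "mixed signs", [-2, 0, 3, -1]
    yield "sorted ascending", [1, 2, 3, 4]
    yield "sorted descending", [4, 3, 2, 1]
    yield "max values", [max_value, max_value]


def failing_cases(candidate, reference):
    """Return labels of edge cases where candidate disagrees with a trusted reference."""
    return [
        label
        for label, array in edge_case_arrays()
        if candidate(list(array)) != reference(list(array))
    ]


def buggy_max(nums):
    """Hypothetical buggy solution: the initial value assumes a non-negative maximum."""
    best = 0
    for n in nums:
        best = max(best, n)
    return best
```

Running `failing_cases(buggy_max, lambda nums: max(nums, default=0))` flags only the all-negative case, which is the kind of initialization bug this step is meant to catch.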
13. Test Large Cases
Confirm complexity by reasoning, not by running. Will N=10^6 pass in 1 second? In your language’s runtime?
14. Explain Complexity
State time and space. State whether the bound is tight. Mention amortized vs worst-case if relevant. State assumptions (hash table O(1) average is an assumption).
15. Handle Follow-Ups
Anticipate. Most interviews follow up with one of:
- “What if the input doesn’t fit in memory?” → streaming, external sort, sketches
- “What if it’s distributed?” → sharding, consistent hashing
- “What if reads >> writes?” → caching, replicas
- “What if writes >> reads?” → log-structured, write-back
- “What if we need approximate answers?” → Bloom, HLL, count-min
- “How would you test this?” → unit + property + stress + concurrency
- “How would you debug a production failure?” → logs + metrics + repro
16. Discuss Production Implications
For practical engineering interviews: monitoring, logging, metrics, partial failure, backpressure, retries, idempotency, observability, deployment.
The Stuck Protocol
When you’ve been silent for >2 minutes, or have no progress for 5 minutes, switch into this mode. Do not freeze. Do not flail.
1. Restate What Is Known
Out loud. “I have an array of N integers, I want the longest subarray such that…”
2. Write Brute Force
Even if it’s O(N^4) and you know it won’t pass. Brute force gives you:
- A correctness oracle
- A starting point to optimize from
- Intermediate state to inspect for patterns
3. Inspect Constraints Again
Have you forgotten one? Often the constraint is the hint. N ≤ 20 screams bitmask. K ≤ 10 screams “K is in the state”.
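As a sketch of why N ≤ 20 points to bitmasks: with 20 elements there are about a million subsets, so each one can be enumerated as an integer mask. The subset-sum problem and the function name below are assumptions chosen for illustration.

```python
def subset_sum_exists(nums, target):
    """Check every subset of nums via bitmasks; O(2^N * N), fine for N <= 20."""
    n = len(nums)
    for mask in range(1 << n):
        # Bit i of mask set means nums[i] is in the subset.
        total = sum(nums[i] for i in range(n) if (mask >> i) & 1)
        if total == target:
            return True
    return False
```

Mask 0 is the empty subset, so a target of 0 is always reachable under this definition.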
4. Try Smaller Examples
Solve N=1, N=2, N=3 by hand. Patterns emerge. Often the recurrence falls out.
5. Look For Repeated Work
In the brute force, what’s recomputed? That’s your DP state or memoization target.
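A small sketch of this step, assuming a grid-path counting problem (the problem and names are illustrative): the plain recursion revisits the same cell through many routes, and caching on the cell coordinates removes the repetition.

```python
from functools import lru_cache


def count_grid_paths(rows, cols):
    """Count right/down paths from the top-left to the bottom-right cell."""

    @lru_cache(maxsize=None)
    def paths_from(r, c):
        # Repeated work in the brute force: the same (r, c) is reached by many routes.
        if r >= rows or c >= cols:
            return 0
        if r == rows - 1 and c == cols - 1:
            return 1
        return paths_from(r + 1, c) + paths_from(r, c + 1)

    return paths_from(0, 0)
```

The cache key `(r, c)` is exactly the DP state: without the cache the call tree is exponential, with it there are at most `rows * cols` distinct calls.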
6. Look For Monotonicity
- Is there a value over which the answer is monotonic? → binary search on answer
- Is there a window whose property is monotonic? → sliding window, monotonic stack/deque
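The first bullet can be sketched with a shipping-capacity problem (an assumed example; the function names are hypothetical). Feasibility is monotonic in the capacity, so the smallest feasible value can be found by binary search. The sketch assumes a non-empty list of positive weights and at least one day.

```python
def min_capacity(weights, days):
    """Smallest capacity that ships all weights, in order, within the given days."""

    def days_needed(capacity):
        used, load = 1, 0
        for w in weights:
            if load + w > capacity:
                used += 1
                load = 0
            load += w
        return used

    # Monotonic: if capacity c is feasible, every larger capacity is feasible too.
    low, high = max(weights), sum(weights)
    while low < high:
        mid = (low + high) // 2
        if days_needed(mid) <= days:
            high = mid  # mid works; try smaller
        else:
            low = mid + 1  # mid fails; the answer is larger
    return low
```

The search range starts at `max(weights)` because no smaller capacity can carry the heaviest item, and ends at `sum(weights)` because that capacity ships everything in one day.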
7. Look For Graph Modeling
Words like “depends on”, “leads to”, “transitions”, “groups”, “components”, “blocked by” all suggest graphs. Try modeling explicitly.
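As an example of explicit modeling, a "depends on" relation can be turned into a directed graph and ordered with Kahn's algorithm. This is a minimal sketch; the input shape (a list of `(task, prerequisite)` pairs) and the function name are assumptions.

```python
from collections import deque


def build_order(tasks, depends_on):
    """Return an order where every task follows its prerequisites, or None on a cycle.

    depends_on is a list of (task, prerequisite) pairs.
    """
    dependents = {task: [] for task in tasks}
    remaining = {task: 0 for task in tasks}  # unmet prerequisite counts
    for task, prerequisite in depends_on:
        dependents[prerequisite].append(task)
        remaining[task] += 1

    ready = deque(task for task in tasks if remaining[task] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in dependents[current]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    # Tasks left with unmet prerequisites are part of (or downstream of) a cycle.
    return order if len(order) == len(tasks) else None
```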
8. Look For DP State
Ask: “What information do I need at position i to decide what’s optimal going forward?” That information is the state.
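A sketch of answering that question for one assumed problem (largest sum with no two adjacent elements; the names are illustrative): the only information needed at position i is the best total when the previous element was taken and the best total when it was skipped.

```python
def max_non_adjacent_sum(values):
    """Largest sum of elements with no two adjacent; the empty selection (0) is allowed."""
    # State: best total if the previous element was taken, and if it was skipped.
    took_previous, skipped_previous = 0, 0
    for value in values:
        take_current = skipped_previous + value  # allowed only after a skip
        skip_current = max(took_previous, skipped_previous)
        took_previous, skipped_previous = take_current, skip_current
    return max(took_previous, skipped_previous)
```

Because the state is two numbers, the solution runs in O(N) time and O(1) space without an explicit table.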
9. Look For Greedy Invariant
Ask: “If I make the locally best choice, can I prove I never need to undo it?” If yes, greedy. If no, DP.
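Interval scheduling is a standard case where the answer is yes. The sketch below assumes intervals are `(start, end)` pairs and that intervals touching at an endpoint do not overlap; the function name is hypothetical.

```python
def max_non_overlapping(intervals):
    """Maximum number of pairwise non-overlapping (start, end) intervals."""
    count = 0
    last_end = float("-inf")
    # Greedy choice: always keep the interval that finishes earliest.
    # Exchange argument: any optimal schedule can swap its first interval for the
    # earliest-finishing one without creating an overlap, so the choice is never undone.
    for start, end in sorted(intervals, key=lambda interval: interval[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count
```

Sorting by end time is what makes the exchange argument go through; sorting by start time or by length admits counterexamples.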
10. Ask For A Small Hint Professionally
Sample phrases:
- “I’ve considered X and Y but I’m having trouble seeing the structure. Could you nudge me toward the right family of approach?”
- “Is the input small enough that exponential is acceptable, or are we targeting polynomial?”
A well-asked hint costs you almost nothing. A 10-minute silence is fatal.
11. Recover And Continue
Take the hint, restate the new constraint or insight, commit out loud to a direction, and resume coding. Don’t apologize repeatedly. Move forward.
A Note On Discipline
This framework feels slow for the first 50 problems. By problem 200, you’ll execute steps 1–9 in 4 minutes flat. By problem 500, the framework runs subconsciously and you’ll only consciously invoke it when stuck.
The goal is not to memorize the framework. It’s to internalize it so deeply that when you read a problem, your brain runs steps 1–9 automatically.
Candidates who skip the framework “to save time” lose interviews to candidates who don’t, because the framework users:
- Catch ambiguity before they code
- Get the right complexity on the first attempt
- Don’t waste minutes coding the wrong thing
- Communicate clearly throughout
- Know what to test before they finish coding
Code Quality Standards For Interviews
Interview code is judged differently from production code. The bar is:
- Correct above all else
- Readable — the interviewer should follow it without you explaining every line
- Simple — the simplest solution that works, not the cleverest
- Defensive only at boundaries — validate inputs once, then trust them
- Testable — pure functions and clear data flow
Interview code is not:
- Production-ready (no logging, no metrics, no retries unless asked)
- Heavily commented (good names beat comments)
- Prematurely abstracted (no factory factories)
- Defensive everywhere (validating inside hot loops is noise)
The Quality Dimensions
| Dimension | What “Good” Looks Like |
|---|---|
| Correctness | Handles every edge case the problem allows |
| Simplicity | No clever tricks unless required for complexity |
| Readability | A peer can read it once and understand |
| Naming | parent, visited, frequency — not p, v, f (except in trivial scopes) |
| Modularity | Helper functions for distinct logical units |
| Boundary handling | Empty/null/overflow checked once at the entry |
| No premature abstraction | One-time logic stays inline |
| No overengineering | Don’t build a config system for a 30-line problem |
| No hidden state | Globals/singletons are red flags |
| Minimal mutation | Prefer immutable returns where natural |
| No excessive cleverness | One-liners that need a paragraph to explain are anti-signal |
| Standard library use | Use the language’s built-ins idiomatically |
| Testability | Logic separated from I/O, deterministic |
Bad vs Good Examples
Naming
Bad:
def f(a, b):
    r = []
    for x in a:
        if x > b:
            r.append(x)
    return r
Good:
def values_above(numbers, threshold):
    return [n for n in numbers if n > threshold]
Boundary Handling
Bad (validates in every iteration):
def sum_positive(nums):
    total = 0
    for n in nums:
        if nums is None or len(nums) == 0:  # checked every iteration
            return 0
        if n > 0:
            total += n
    return total
Good (validate once at the boundary):
def sum_positive(nums):
    if not nums:
        return 0
    return sum(n for n in nums if n > 0)
Excessive Cleverness
Bad (one-liner, hard to debug):
def has_duplicate(nums):
    return len(nums) != len({*nums}) if nums else False
Good (clear intent):
def has_duplicate(nums):
    seen = set()
    for n in nums:
        if n in seen:
            return True
        seen.add(n)
    return False
The good version also terminates early: both versions are O(N) in the worst case, but this one stops at the first duplicate instead of always building the full set.
Helper Functions
Bad (everything in one 50-line function):
def shortest_path(grid, start, end):
    # 50 lines of BFS, neighbor computation, distance tracking, all inline
    ...
Good (extract neighbor logic):
from collections import deque

def shortest_path(grid, start, end):
    # BFS over grid cells; returns the number of steps, or -1 if end is unreachable.
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        pos, dist = queue.popleft()
        if pos == end:
            return dist
        for nxt in neighbors(grid, pos):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

def neighbors(grid, pos):
    # Yield in-bounds, non-wall ('#') cells adjacent to pos.
    r, c = pos
    rows, cols = len(grid), len(grid[0])
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
            yield (nr, nc)
Premature Abstraction
Bad (over-engineered for a single use):
class CounterStrategy:
    def count(self, items):
        raise NotImplementedError

class HashCounterStrategy(CounterStrategy):
    def count(self, items):
        d = {}
        for x in items:
            d[x] = d.get(x, 0) + 1
        return d

def majority_element(nums):
    counts = HashCounterStrategy().count(nums)
    return max(counts, key=counts.get)
Good:
from collections import Counter

def majority_element(nums):
    counts = Counter(nums)
    return counts.most_common(1)[0][0]
Mutation
Bad (mutates input):
def normalized(values):
    for i in range(len(values)):
        values[i] = values[i] / max(values)  # also recomputes max each iteration
    return values
Good (no mutation, single max computation):
def normalized(values):
    if not values:
        return []
    peak = max(values)
    return [v / peak for v in values]
Hidden State
Bad (global counter):
_call_count = 0

def fib(n):
    global _call_count
    _call_count += 1
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
Good (no global state; memoization is handled by the decorator):
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
Language-Idiomatic Code
You should write code that looks like it was written by someone fluent in the language. Examples:
Python
- Use list/dict/set comprehensions where natural
- Use `Counter`, `defaultdict`, `deque`, `heapq`, `bisect`
- Use `enumerate`, `zip`, unpacking
- Avoid C-style `for i in range(len(x))` if you only need values
Java
- Use enhanced for-loop where possible
- Use `Map.computeIfAbsent`, `Map.getOrDefault`
- Use `Optional` only where it fits the API; not for short-circuit logic
- Prefer `ArrayDeque` over `Stack` (legacy)
Go
- Prefer slices over arrays
- Use `for range` for both index and value
- Return errors explicitly; don’t panic in interview code
- Buffered channels only when justified
C++
- Use `auto` where it improves readability
- Range-based for-loops
- Prefer `std::vector` and `std::unordered_map`
- Use `emplace_back` over `push_back` for non-trivial types
- `const` references for non-trivial inputs
JavaScript/TypeScript
- Use `Map`/`Set` (not `{}`/array hacks) when keys aren’t strings or order matters
- `const` by default, `let` only when reassignment is needed
- Avoid `var`
- TS: prefer `unknown` over `any` at boundaries
Comments
- Avoid: comments that restate what the code says (`// increment i`).
- Prefer: comments that explain why (the non-obvious tradeoff, the invariant, the reason a less elegant approach was chosen).
- Required: a one-line comment above any non-trivial recurrence/invariant. Example: `# dp[i][j] = max profit with i transactions ending at day j.`
Length
For most coding interview problems:
- Easy: 10–25 lines
- Medium: 20–50 lines
- Hard: 30–80 lines
If your Easy is 80 lines, you’re overengineering. If your Hard is 200 lines, you’ve gone wrong somewhere — re-examine the approach.
Final Self-Review Checklist
Before saying “I’m done”:
- ☐ Function/variable names are meaningful
- ☐ No dead code, no commented-out blocks
- ☐ Boundary checks at the entry, not in the hot loop
- ☐ Helper functions for distinct logical units
- ☐ No mutation of input unless intentional and stated
- ☐ No globals introduced
- ☐ Standard library used idiomatically
- ☐ Indentation/formatting consistent
- ☐ Code I would be willing to send a colleague for review
Interview Communication Rules
Coding interviews are collaborative problem-solving sessions, not exams. Half the signal an interviewer collects is how you communicate. A correct silent solution often scores worse than a slightly imperfect one with strong communication.
Core Principles
- Narrate regularly, but not constantly. Your mouth does not need to track your fingers character-by-character. It tracks your intent.
- Show your reasoning, not just your conclusion. “I’m choosing a hashmap because we need O(1) lookup by ID” beats silently typing `Map<String, Foo>`.
- Two-way street. Pause for confirmation at decision points. The interviewer wants to be involved.
- Hide nothing. If you’re unsure, say so. If you don’t remember an API, say so. Hiding looks worse than admitting.
- Forward motion always. Even when stuck, narrate progress: “I’m now going to try a smaller example” is forward motion.
Phase 1 — Opening (first 1–3 minutes)
What to do
- Read the problem fully (don’t start coding)
- Restate it
- Ask clarifying questions
- Confirm constraints
Sample phrases
“Let me read this through once first… OK. So if I understand correctly, I need to [restate]. Is that right?”
“Before I start, can I confirm a few things about the input? Specifically: can the array be empty? Can it contain negative numbers? Are there duplicates?”
“What’s the expected size of N here? Around 10^5? OK, so we’re targeting O(N log N) or better.”
“If there are multiple valid outputs, can I return any one, or is there a specific one expected?”
Anti-patterns
- Starting to code immediately
- Asking 15 questions in a row (drip-feed them as they become relevant)
- Asking questions whose answers are obvious from the problem statement (signals you didn’t read carefully)
Phase 2 — Brute Force (1–3 minutes)
What to do
- State the dumbest correct solution
- Compute its complexity
- Confirm whether it’s acceptable
Sample phrases
“Let me start with the brute force just to anchor. I could check every pair, which would be O(N^2). Given N is up to 10^5, that’s 10^10 operations — too slow. So we need something better.”
“The naive solution is O(2^N) — that’s fine for N=20 but won’t scale. Let me see if we can do better.”
“If we sort first, that gives us O(N log N) for the sort, and then… let me think about the scan.”
Anti-patterns
- Skipping brute force (“I know the optimal already”)
- Computing brute force complexity wrong (always double-check)
- Stating brute force without saying whether it suffices
Phase 3 — Optimization (3–10 minutes)
What to do
- Think out loud
- State observations
- Propose ideas and evaluate them
- Commit to one direction
- Sanity-check before coding
Sample phrases
“I notice the array is sorted, which means I can probably use two pointers…”
“The fact that K is small — only up to 10 — suggests we might want K in our DP state.”
“I see a pattern of repeated subproblems here. Let me define a state: dp[i][j] = …”
“Let me try a small example to verify my recurrence…”
“OK, I think my approach is: sort by start time, then greedily pick. Let me convince myself this works with an exchange argument…”
Anti-patterns
- Switching ideas every 30 seconds without exploring any
- Coding before stating the approach
- Not verifying correctness on a small example
Phase 4 — Coding (10–25 minutes)
What to do
- Narrate intent at function/block boundaries
- Explain non-obvious choices
- Stay quiet during routine syntax
- Ask “is the API I’m assuming OK?” if uncertain
Sample phrases
“I’m going to use a heap here, in Python that’s heapq. Pushing tuples to break ties by ID.”
“For the visited set, I’ll use a hash set rather than a 2D boolean array, since we’re not sure of the bounds.”
“I’m using a sentinel value of -1 to mean ‘not yet computed’ in the memoization.”
“Let me extract this into a helper function — it’ll make the recursion cleaner.”
Anti-patterns
- Silent typing for 5+ minutes
- Constant low-level narration (“now I’m typing a for loop”)
- Going down a rabbit hole on language minutiae mid-flow
Phase 5 — Testing (3–5 minutes)
What to do
- Walk through the given example
- Walk through your own edge cases
- Catch and fix bugs before the interviewer points them out
Sample phrases
“Let me trace through example 1. Initial state: … after iteration 1: … after iteration 2: … final answer: 7. Matches the expected.”
“Edge cases: what if the array is empty? My code would… let me check… yes, it returns 0, which is correct.”
“What about all negatives? Hmm, my initialization assumes a non-negative max. Let me fix that — initialize to negative infinity.”
“I think there’s an off-by-one here. Let me re-examine the loop bound.”
Anti-patterns
- Skipping testing because “the code looks right”
- Testing only the happy path
- Defending broken code when a bug is pointed out (just fix it)
Phase 6 — Complexity & Follow-Ups (2–5 minutes)
What to do
- State time and space
- Mention assumptions
- Engage with follow-ups thoughtfully
Sample phrases
“Time complexity is O(N log N) — dominated by the sort. Space is O(N) for the auxiliary array, or O(1) if we sort in place.”
“Average case for the hashmap is O(1), but worst case is O(N) under adversarial hashing. In Python the dict implementation handles that reasonably well.”
“If we needed to scale this to 10 million users, I’d consider… [sharding / external sort / approximate counters / etc.]”
“If reads were much more frequent than writes, we might precompute and cache.”
Anti-patterns
- Claiming O(N) when it’s actually O(N log N)
- Forgetting space complexity
- Defensive answers to follow-ups (“I’d need more info” instead of engaging)
Handling Hints
If the interviewer gives a hint:
“Ah, that helps — so you’re saying we should [restate]. Let me adjust my approach…”
Then commit out loud to the new direction. Don’t second-guess. Take the hint, integrate it, move forward.
If you need a hint:
“I’ve explored [X] and [Y]. I’m having trouble seeing how to avoid the O(N^2) here. Could you nudge me toward the family of approach?”
A well-asked hint costs you almost nothing. Frozen silence is fatal.
Handling Mistakes
When you find your own bug:
“Hmm, let me re-examine this — I think there’s a bug at line N. Yes, the comparison should be < not ≤. Let me fix that.”
When the interviewer finds a bug:
“Oh, you’re right — the empty case isn’t handled. Let me add a guard.”
Never:
- Argue
- Explain why the bug is “not really a bug”
- Get flustered
Always:
- Acknowledge briefly
- Fix
- Move on
Handling Pressure / Freezing
If you blank:
“Give me 30 seconds to think.” (Then actually think — don’t fake it.)
Then run the stuck protocol. Out loud:
“OK, let me back up. What I know is [X]. The brute force was [Y]. The constraint that’s hinting at something is [Z]…”
Talking through the stuck protocol restarts your thinking and shows the interviewer you have a process for being stuck.
Closing Strong
Final 1–2 minutes:
“To summarize: I’m using [data structure] with [algorithm]. Time is O(…), space is O(…). Edge cases handled: empty, single element, duplicates, max constraints. The main risk in production would be [X], which I’d address by [Y].”
A clean summary leaves the interviewer with a tidy mental model of your work — much better than ending mid-test.
Body Language & Tone (Video / In-Person)
- Sit up. Look at the interviewer when explaining, at the editor when coding.
- Speak at moderate pace. Faster than normal = nervous, slower = padding.
- Avoid filler (“um”, “like”) — silence is preferable to filler.
- Don’t apologize repeatedly. One apology when you fix a bug is enough.
- Show interest in the problem. Curiosity is a positive signal.
Phrases To Avoid
- “This is easy” — even if it is, this can read as arrogant
- “I’ve seen this before” — be careful; if you go on autopilot you’ll miss the variant
- “I don’t know” with no follow-up — replace with “I don’t know X, but I can reason about it via Y”
- “That won’t work” without explanation
- “Just” used dismissively (“we just need to…”)
What Strong Communication Buys You
- Partial credit when your code is incomplete (the interviewer saw your reasoning)
- Hints offered earlier (interviewers want to help engaged candidates)
- Believability of follow-up answers (a candidate who reasoned clearly throughout is trusted on production tradeoffs)
- Hire signal even on hard problems you didn’t fully solve
Failure Analysis System
Every failed problem (or weak performance) maps to one or more failure categories. The category determines the drill that fixes it. Without this taxonomy, you’ll repeat the same mistake forever.
How To Use
After every miss:
- Identify the primary failure category (the root cause, not the symptom).
- Identify any secondary categories.
- Run the listed drill within 24 hours.
- Re-test on a similar problem within 1 week.
The 16 Failure Categories
1. Did Not Understand The Problem
- Symptom: Solved a problem the interviewer didn’t ask.
- Root cause: Skipped restating, didn’t ask clarifying questions.
- Fix: Always restate. Always ask 3+ clarifying questions before coding.
- Drill: Take 10 LeetCode problems, do only steps 1–4 of FRAMEWORK.md (restate, clarify, identify constraints, examples). Don’t solve them. The drill is reading.
- Re-test: Mock interview with an interviewer who deliberately gives an ambiguous problem.
2. Missed Constraints
- Symptom: Brute force passed at small N but TLE’d at full N. Or used a 32-bit integer when the sum exceeds 2^31 − 1.
- Root cause: Skipped step 3 of the framework.
- Fix: Make a habit: the moment you read a constraint, derive the target complexity out loud.
- Drill: Read 20 problems. For each, before reading the editorial, write down: target complexity, allowed N, integer width needed.
- Re-test: Solve 5 problems where the constraint is the hint (e.g., N≤20 → bitmask).
3. Could Not Find Brute Force
- Symptom: Stared at the problem with no starting point.
- Root cause: Tried to find the optimal directly. Dangerous habit.
- Fix: Brute force is always possible. Iterate over every subset / pair / arrangement / state. State it even if it’s exponential.
- Drill: 10 problems. For each, write only the brute force in pseudocode. Don’t optimize.
- Re-test: Mock with a hard problem; goal is to communicate brute force in <3 minutes.
4. Could Not Optimize
- Symptom: Wrote brute force, then froze.
- Root cause: No systematic optimization toolkit.
- Fix: Run the optimization checklist (steps 7–8 of FRAMEWORK.md): pattern recognition, repeated-work elimination, monotonicity, sortedness exploitation, state compression, math.
- Drill: Take 10 brute-force solutions to known problems. For each, optimize without looking at the editorial using the checklist.
- Re-test: Solve 5 unfamiliar mediums in <30 min each.
5. Chose Wrong Data Structure
- Symptom: Used a list where a set would have given O(1) lookup. Used a heap where a sorted array suffices. Used recursion where a stack would simplify.
- Root cause: Didn’t reason about access patterns.
- Fix: Before choosing a DS, list the operations you need and their frequency. Then pick the DS whose complexity matrix matches.
- Drill: In phase-01-foundations/data-structures/, reread the operations table for each DS. Then re-solve 5 problems consciously documenting why each DS was chosen.
- Re-test: Mock with explicit “why this DS?” follow-ups.
6. Bad Complexity
- Symptom: Said O(N) when it was O(N log N). Or claimed worst-case O(1) for operations that are only amortized O(1).
- Root cause: Sloppy analysis, no habit of double-checking.
- Fix: State complexity and the basis. “O(N log N) because we sort, then linear scan”, not just “O(N log N)”.
- Drill: Take 20 of your own past solutions. Re-derive complexity. Compare to what you said.
- Re-test: Mock where the interviewer specifically grills you on complexity.
7. Buggy Implementation
- Symptom: Approach was right, but the code had off-by-ones, wrong operators, swapped variables.
- Root cause: Coding faster than thinking. Insufficient pre-coding clarity.
- Fix: Write pseudocode first. Trace through one example before running anything.
- Drill: Standard implementations: write binary search, BFS, DFS, union-find from scratch 10 times each. Use templates only after you can write them error-free.
- Re-test: Solve 5 problems with hand-traced verification before submission.
8. Weak Testing
- Symptom: Submitted, got “wrong answer on test 7”. Hadn’t tested edge cases.
- Root cause: Skipped step 12 of the framework.
- Fix: Make the universal edge-case checklist a reflex. Before submission, run through it.
- Drill: Take 5 of your “wrong on hidden test” problems. Without looking at the test that failed, generate 10 edge cases for each.
- Re-test: Solve 5 problems and catch your own bug before submission. Score = problems where you fixed a bug pre-submission.
9. Poor Communication
Symptom: Mock interviewer said “I had no idea what you were thinking.” Root cause: Silent coding, or only narrating low-level mechanics. Fix: COMMUNICATION.md. Narrate intent at decision points. Drill: Solve 10 problems while recording yourself narrating. Listen back. Are you communicating decisions, or just typing aloud? Re-test: Mock with explicit communication scoring.
10. Froze Under Pressure
Symptom: Knew the technique in practice. Couldn’t access it in mock. Root cause: Insufficient mock volume. Fix: More mocks. Pressure tolerance is built only by exposure. Drill: 5 mocks in a week with a real human or pressure-simulating tool. Use the stuck protocol explicitly when you freeze. Re-test: Mock with cold problems. Goal: never go silent for >60 seconds.
11. Missed Edge Cases
Symptom: Solution was correct on examples but failed on size-1, empty, or max-constraint inputs. Root cause: Didn’t run the universal checklist. Fix: Build muscle memory for: empty / 1 / 2 / dup / negative / sorted / reversed / max / disconnected / cycle. Drill: Take 10 past solutions. For each, brainstorm 5 edge cases. Verify your code handles them. Re-test: Mock where edge cases are the gating criterion.
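The array half of that checklist can be turned into a small reusable harness. The sketch below is one way to do it, assuming the problem takes a list of integers; the names `edge_case_arrays` and `check_all` are hypothetical, and graph-specific cases (disconnected, cycle) would need their own generator.

```python
def edge_case_arrays(max_len: int = 1000, max_val: int = 10**9) -> dict[str, list[int]]:
    """Named edge-case inputs for a problem that takes a list of integers."""
    return {
        "empty": [],
        "single": [7],
        "pair": [2, 1],
        "duplicates": [5, 5, 5, 5],
        "negatives": [-3, -1, -2],
        "sorted": list(range(10)),
        "reversed": list(range(9, -1, -1)),
        "max_size": [max_val] * max_len,
    }


def check_all(solution, reference) -> list[str]:
    """Return the names of edge cases where solution disagrees with reference."""
    failures = []
    for name, arr in edge_case_arrays().items():
        # Pass copies so an in-place solution cannot corrupt the other run.
        if solution(list(arr)) != reference(list(arr)):
            failures.append(name)
    return failures


def buggy_max(arr: list[int]) -> int:
    # Deliberately buggy example: starting from 0 breaks all-negative input.
    best = 0
    for x in arr:
        best = max(best, x)
    return best


print(check_all(buggy_max, lambda a: max(a, default=0)))
```

Here the harness flags only the all-negative case, which is exactly the kind of input the given examples tend not to include.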
12. Runtime / Language Issue
Symptom: Code was algorithmically right but ran slow / crashed / behaved wrong due to language behavior. Examples: integer overflow in Java, dict iteration order bug, Python recursion limit, C++ undefined behavior. Root cause: Insufficient runtime depth. Fix: phase-09-language-runtime/ for your primary language. Drill: Read your language’s track end-to-end. Solve 5 problems specifically targeting the gotchas (e.g., overflow problems, recursion-depth problems). Re-test: Runtime-deep-dive mock (mock-09).
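To make the Python recursion-limit gotcha concrete, here is a minimal sketch that assumes CPython with its default recursion limit. The graph is a hypothetical 50,000-node path stored as an adjacency list; the recursive DFS is algorithmically fine but exhausts the interpreter's frame budget, while the explicit-stack version does not.

```python
import sys


def reach_recursive(graph: list[list[int]], node: int, seen: set[int]) -> None:
    # One Python frame per node on the current path: deep paths overflow.
    seen.add(node)
    for nxt in graph[node]:
        if nxt not in seen:
            reach_recursive(graph, nxt, seen)


def reach_iterative(graph: list[list[int]], start: int) -> set[int]:
    # Same traversal with an explicit stack held on the heap.
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


n = 50_000
path = [[i + 1] for i in range(n - 1)] + [[]]   # 0 -> 1 -> ... -> n-1

try:
    reach_recursive(path, 0, set())
    recursive_ok = True
except RecursionError:
    recursive_ok = False   # expected when the limit is below n

print(sys.getrecursionlimit(), recursive_ok, len(reach_iterative(path, 0)))
```

Raising the limit with `sys.setrecursionlimit` is sometimes acceptable in contests, but in an interview the iterative rewrite is usually the safer answer to give.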
13. Concurrency Issue
Symptom: Race condition, deadlock, lost update, visibility bug. Root cause: Missing concurrency mental model. Fix: phase-09-language-runtime/ (concurrency sections) + phase-08-practical-engineering/ thread-pool / job-queue labs. Drill: Implement thread-pool, rate limiter, and producer-consumer queue from scratch with race-condition tests. Re-test: Concurrency-heavy mock (mock-11).
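For the producer-consumer drill, a minimal sketch of a bounded blocking queue with a race-condition test might look like the following. It assumes Python's `threading` module, with two condition variables sharing one lock; the class name and the `run_stress` helper are illustrative, and the test only checks that no item is lost or duplicated.

```python
import threading
from collections import deque


class BoundedBlockingQueue:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items: deque = deque()
        self._capacity = capacity
        lock = threading.Lock()
        # Both conditions share one lock, so the deque is always guarded.
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, item) -> None:
        with self._not_full:
            while len(self._items) >= self._capacity:   # loop guards spurious wakeups
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item


def run_stress(producers: int = 4, consumers: int = 4,
               per_producer: int = 1000, capacity: int = 8) -> bool:
    """Return True if every produced item is consumed exactly once."""
    q = BoundedBlockingQueue(capacity)
    results: list[int] = []
    results_lock = threading.Lock()
    total = producers * per_producer

    def produce(base: int) -> None:
        for i in range(per_producer):
            q.put(base + i)

    def consume(count: int) -> None:
        local = [q.get() for _ in range(count)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=produce, args=(p * per_producer,))
               for p in range(producers)]
    threads += [threading.Thread(target=consume, args=(total // consumers,))
                for _ in range(consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results) == list(range(total))
```

The `while` loops around `wait()` are the part most often gotten wrong: an `if` there is a classic lost-wakeup bug that a small capacity and several threads will usually expose.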
14. Overfit To Memorized Pattern
Symptom: Forced a known pattern that didn’t fit. E.g., applied sliding window to a problem that needed monotonic deque. Root cause: Pattern matching without understanding. Fix: Re-read pattern docs and focus on when the pattern does NOT apply. Drill: Take 10 problems. For each, list 3 patterns it might be, then eliminate 2 with reasoning. Re-test: Mock with problems specifically chosen to be near-misses of common patterns.
15. Did Not Prove Correctness
Symptom: Submitted a greedy that was wrong. Or a DP whose recurrence was incorrect. Root cause: Skipped step 9 of the framework. Fix: Force yourself to state the invariant or recurrence before coding. Drill: Take 10 greedy problems. For each, state the exchange argument. Take 10 DP problems. For each, state the recurrence + base case + evaluation order. Re-test: Solve 5 problems where the proof is the hard part.
16. Could Not Handle Follow-Up
Symptom: Solved the core problem, but interviewer’s follow-up (“how would this scale to 10M users?”) got a vague answer. Root cause: No production / system thinking. Fix: phase-08-practical-engineering/ labs all include the standard follow-up bank. Drill: Take 10 of your past solutions. For each, write a 2-paragraph answer to: “scale to 10M users”, “make distributed”, “handle partial failure”, “add observability”. Re-test: Senior-engineer mock (mock-07) which weighs follow-ups heavily.
Tracking Failures Over Time
Maintain a failures.md (or spreadsheet) with columns:
| Date | Problem | Primary Category | Secondary | Drill Done? | Re-test Date | Re-test Result |
|---|---|---|---|---|---|---|
After 4 weeks, count by category. Your top 3 categories are your personal weakness profile. Spend the next 4 weeks specifically drilling those.
This is how you avoid the “I keep failing at the same thing” trap.
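If you keep the log in a structured form, the 4-week count is a few lines of code. The sketch below assumes each log row is a dict with a hypothetical `"primary"` key holding the category name; the sample rows are made up for illustration.

```python
from collections import Counter


def weakness_profile(rows: list[dict], top_n: int = 3) -> list[tuple[str, int]]:
    """Return the top_n primary failure categories with their counts."""
    counts = Counter(row["primary"] for row in rows)
    return counts.most_common(top_n)


# Hypothetical log entries; only the primary category is used here.
log = [
    {"problem": "A", "primary": "Weak Testing"},
    {"problem": "B", "primary": "Buggy Implementation"},
    {"problem": "C", "primary": "Weak Testing"},
    {"problem": "D", "primary": "Bad Complexity"},
    {"problem": "E", "primary": "Weak Testing"},
    {"problem": "F", "primary": "Buggy Implementation"},
]
print(weakness_profile(log))
```

With the compound-failure rules in mind, it can be worth re-labeling rows to the deeper category before counting, so the profile reflects root causes rather than symptoms.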
Common Compound Failures
Some categories often co-occur. If you see these together, treat the deeper one:
- #1 + #11 (didn’t understand + missed edges) → really #1. Fix understanding first.
- #3 + #4 (no brute force + couldn’t optimize) → really #3. You can’t optimize what you don’t have.
- #7 + #8 (buggy + weak testing) → really #8. Better testing catches bugs.
- #5 + #6 (wrong DS + bad complexity) → really #5. Right DS gives right complexity.
- #10 + #9 (froze + bad communication) → really #9. Talking unfreezes you.
- #14 + #15 (overfit pattern + no proof) → really #15. Proving forces understanding.
When Failures Are Good
Some failures are productive:
- First attempt at a new pattern → expected to fail, the failure teaches.
- Difficulty stretch (jumping a level) → expected to fail 50%+ of the time.
- Mock interviews → 30%+ failure rate is normal and healthy.
Failures are bad when:
- You repeat the same category 5+ times without improvement.
- You stop tracking them.
- You don’t run the drill.
- You blame the problem (“that was unfair”) instead of analyzing yourself.
Final Readiness Checklist
You are ready for the interviews you’re targeting only when every box below is honestly checked. This is binary, not aspirational.
Algorithmic Solving
- Solve LeetCode Easy in <12 minutes (90% success rate over 50 recent problems)
- Solve LeetCode Medium in 25–35 minutes (75% success rate over 100 recent problems)
- Solve LeetCode Hard in 45–60 minutes (50% success rate over 50 recent problems)
- Recognize the pattern within 2 minutes of reading any LeetCode Medium
- Derive a non-trivial optimization without seeing it before, on at least 30 unfamiliar problems
Brute Force & Optimization
- Can state the brute force in <2 minutes for any unseen problem
- Can compute brute force complexity correctly without aid
- Can derive optimal complexity from constraints alone
- Have written brute-force-comparator tests for at least 10 problems
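A brute-force-comparator test can be as small as the sketch below. Maximum subarray sum is used as a stand-in problem (an assumption, not a problem this checklist names): the O(N²) version is the trusted reference, and random small inputs are fed to both until they disagree.

```python
import random


def max_subarray_brute(nums: list[int]) -> int:
    # Reference: try every non-empty subarray, O(N^2).
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best


def max_subarray_fast(nums: list[int]) -> int:
    # Candidate: Kadane's algorithm, O(N).
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def stress(trials: int = 500, seed: int = 0):
    """Return the first input where the two disagree, or None if all agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        if max_subarray_brute(nums) != max_subarray_fast(nums):
            return nums
    return None


print(stress())
```

Keeping the inputs tiny (length at most 8, values in a small range) is deliberate: a failing case that small can be traced by hand.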
Correctness & Proofs
- Can state DP recurrences with base cases and evaluation order on the spot
- Can produce an exchange argument for at least 5 greedy problems
- Can produce a counterexample to a wrong greedy in <5 minutes
- Have proved correctness for at least 20 different solutions (greedy, DP, graph)
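Finding a counterexample to a wrong greedy can be mechanized when a slow exact solution exists. The sketch below uses coin change as an illustrative problem (chosen here as an assumption): largest-coin-first greedy is compared against a DP that is known to be exact, and the first amount where they differ is reported.

```python
def greedy_coins(coins: list[int], amount: int):
    """Largest-coin-first count, or None if greedy leaves a remainder."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else None


def optimal_coins(coins: list[int], amount: int):
    """Minimum number of coins via DP, or None if the amount is unreachable."""
    INF = float("inf")
    dp = [0] + [INF] * amount          # dp[a] = fewest coins summing to a
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
    return dp[amount] if dp[amount] != INF else None


def first_counterexample(coins: list[int], limit: int = 100):
    """Smallest amount up to limit where greedy is not optimal, else None."""
    for amount in range(1, limit + 1):
        if greedy_coins(coins, amount) != optimal_coins(coins, amount):
            return amount
    return None


print(first_counterexample([1, 3, 4]))       # greedy uses 4+1+1, optimal is 3+3
print(first_counterexample([1, 5, 10, 25]))  # no counterexample up to 100
```

A search that finds nothing is not a proof; it only tells you the exchange argument is worth attempting.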
Code Quality
- Can write binary search, BFS, DFS, union-find, topological sort, and Dijkstra from scratch error-free in <10 minutes each
- Code passes the self-review checklist on every solve
- No off-by-one bugs in 10 consecutive binary search problems
- No mutable default argument / shared-state bugs in 10 consecutive recursion problems
Testing
- Run the universal edge-case checklist as reflex on every problem
- Have written stress-test verifiers for at least 5 problems
- Catch your own bugs before submission on 80%+ of problems
- Property-based tested at least 3 implementations
Patterns
- All 28 patterns in phase-02-patterns/ — recognize signals in <2 minutes
- Have solved at least 5 problems per pattern
- Can explain when each pattern does not apply
- Can produce a clean template for each pattern from memory
Data Structures
- Internal representation, complexity, and memory behavior of all foundation DS in phase-01-foundations/
- Can choose between hashmap / sorted array / heap / BST given access patterns
- Can implement segment tree, Fenwick tree, trie, LRU cache from scratch
- Understand iterator invalidation, hash collision behavior, and resize cost in your primary language
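As a reference point for the "from scratch" item, here is a minimal Fenwick tree sketch supporting point updates and range sums. The class and method names are illustrative, and the external interface is 0-based by assumption while the internal array is 1-based.

```python
class FenwickTree:
    def __init__(self, size: int) -> None:
        self._n = size
        self._tree = [0] * (size + 1)   # index 0 is unused

    def add(self, index: int, delta: int) -> None:
        """Add delta to the element at 0-based index."""
        i = index + 1
        while i <= self._n:
            self._tree[i] += delta
            i += i & -i                 # jump to the next node covering i

    def prefix_sum(self, count: int) -> int:
        """Sum of the first count elements (indices 0..count-1)."""
        total = 0
        i = count
        while i > 0:
            total += self._tree[i]
            i -= i & -i                 # strip the lowest set bit
        return total

    def range_sum(self, left: int, right: int) -> int:
        """Sum of elements at indices left..right, inclusive."""
        return self.prefix_sum(right + 1) - self.prefix_sum(left)


ft = FenwickTree(5)
for idx, value in enumerate([5, 2, 9, -3, 6]):
    ft.add(idx, value)
print(ft.prefix_sum(3), ft.range_sum(1, 3))
```

Both operations are O(log N). Being able to explain why `i & -i` isolates the lowest set bit is a fair test of whether the structure is understood or just memorized.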
Graph Algorithms
- BFS, DFS, Dijkstra, topological sort, union-find — implement from scratch in <10 minutes each
- Recognize when 0-1 BFS, Bellman-Ford, or A* is the right choice
- Can model a word problem as a graph (nodes + edges + weight) in <3 minutes
- Have completed all 9 product-style labs in phase-04-graphs/labs/
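For the 0-1 BFS item, a short sketch may help fix the signal: every edge weight is 0 or 1, so a deque can replace Dijkstra's heap. The adjacency-list format (`adj[u]` as a list of `(v, w)` pairs) and the sample graph are assumptions made for this example.

```python
from collections import deque


def zero_one_bfs(adj: list[list[tuple[int, int]]], source: int) -> list[float]:
    """Shortest distances from source when every edge weight is 0 or 1."""
    INF = float("inf")
    dist = [INF] * len(adj)
    dist[source] = 0
    dq = deque([source])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            nd = dist[u] + w
            if nd < dist[v]:
                dist[v] = nd
                if w == 0:
                    dq.appendleft(v)   # same distance: process before heavier nodes
                else:
                    dq.append(v)       # one step further: process later
    return dist


# Edges: 0->1 (w=1), 0->2 (w=0), 2->1 (w=0), 1->3 (w=1); node 4 is unreachable.
graph = [[(1, 1), (2, 0)], [(3, 1)], [(1, 0)], [], []]
print(zero_one_bfs(graph, 0))
```

The run is O(V + E). If weights can exceed 1, fall back to Dijkstra; if they can be negative, Bellman-Ford is the usual candidate.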
Dynamic Programming
- Can derive state + transition + base case for unseen DP problems
- Have completed brute-force → memo → tabulated → space-optimized for at least 15 problems
- Recognize 1D / 2D / interval / tree / digit / bitmask / knapsack signals
- Can articulate why greedy fails for a given DP problem
Language & Runtime
- Read your primary language’s track in phase-09-language-runtime/ end to end
- Can explain stack vs heap, scope/lifetime, value vs reference semantics fluently
- Can explain GC / ownership behavior of your language
- Can explain hash collision and resize behavior of your language’s hashmap
- Have used the language’s profiler at least once on real code
Concurrency
- Can identify race conditions and deadlocks in code review
- Have implemented thread pool, rate limiter, and bounded blocking queue from scratch
- Can articulate memory visibility / happens-before in your language
- Have used the race detector / equivalent at least once
Practical Engineering
- Completed at least 12 of the 23 labs in phase-08-practical-engineering/
- Can answer all 13 standard follow-ups (10M users, distributed, concurrency, race testing, metrics, logging, debugging, partial failure, memory leaks, extensibility, backpressure, retries, deduplication) on demand
- Can sketch a small system (LRU cache, rate limiter, autocomplete) on a whiteboard in 30 minutes
Communication
- Restate every problem before coding
- Ask 3+ clarifying questions per problem
- Narrate intent at decision points (not low-level mechanics)
- Recover from mistakes without flustering
- Handle hints gracefully and integrate them
- Close every solve with a clean summary
Mock Interview Performance
- Pass mock-03 (Medium LeetCode) — 7+/10 average
- Pass mock-05 (Big Tech phone) — 7+/10 average
- Pass mock-06 (Big Tech onsite) — 7+/10 average
- Pass mock-09 (runtime/language) — 7+/10 average
- Pass mock-11 (concurrency) — 7+/10 average
- (For senior+) Pass mock-07 (senior engineer)
- (For staff+) Pass mock-08 (staff practical)
Failure Analysis
- Maintain a failure log per FAILURE_ANALYSIS.md
- Top 3 personal failure categories identified
- At least 2 weeks of focused drilling on top failure category complete
- Re-test results show measurable improvement
Spaced Repetition
- Active spaced repetition rotation per SPACED_REPETITION.md
- At least 50 problems graduated to Tier 6
- No overdue reviews more than 1 week old
Recovery & Stuck Protocol
- Never go silent for >60 seconds in a mock
- Use the stuck protocol explicitly when stuck
- Recover from a wrong direction in <3 minutes
- Ask for hints professionally without losing composure
Production Awareness
- Can extend any solved problem with: scale to 10M, make distributed, handle partial failure, add observability
- Can articulate tradeoffs (memory vs latency, consistency vs availability, accuracy vs speed)
Targeted Roles
Beyond the universal checklist, additional criteria by role:
FAANG / Big Tech
- All universal items
- mock-05 + mock-06 passed twice with different problems
Infrastructure / Backend / Platform
- All universal items
- mock-08 + mock-10 passed
- All Phase 8 labs complete
Distributed Systems
- All universal items
- mock-08 + mock-10 + mock-11 passed
- Phase 8 + Phase 4 (graph) labs complete
Compiler / Runtime
- All universal items
- mock-09 passed twice
- Phase 9 fully complete for primary language
- Phase 3 (advanced DS) labs complete
Quant / HFT
- All universal items
- mock-12 (competitive style) passed
- Phase 7 (competitive) topics complete
- Phase 5 (DP) extreme topics complete
Senior / Staff / Principal Practical
- All universal items
- mock-07 + mock-08 passed twice each
- All Phase 8 labs complete with full follow-up answers
- Code quality bar unwaveringly met
Competitive Programming
- All universal items
- Phase 7 + Phase 12 complete
- mock-12 (competitive style) passed twice
- Solving Codeforces Div 2 D regularly
Honesty Test
For every checked box, ask: “Could I do this right now, cold, with no warm-up?”
If the answer is “after I review my notes” — uncheck it. Notes don’t come into the interview.
If the answer is “if I had a good day” — uncheck it. Interview days are sometimes bad days.
Honesty here is the difference between feeling ready and being ready.
Problem Review Template
Use this after every problem — solved or failed. The review is where the learning compounds. Without it, problems are forgotten in 72 hours.
Save each review as a separate file or notebook entry. Recommended: reviews/YYYY-MM-DD-problem-name.md.
Template
# Problem Name
## Source
LeetCode 42 / Codeforces Round 800 Div 2 C / etc.
## Difficulty
Easy / Medium / Medium-Hard / Hard / Very Hard / CP-Hard / Grandmaster
## Pattern(s)
Two pointers / Sliding window / DP-2D / Graph BFS / etc.
(Multiple patterns possible.)
## First Intuition
What was my first instinct on reading the problem?
What pattern did I think it was?
Was that intuition right?
## Brute Force
- Approach (1–2 sentences)
- Complexity: time / space
- Why it doesn't pass (or "passes within constraints")
## Optimal Idea
- Approach (3–5 sentences)
- Key insight (the *one* thing that unlocks the problem)
- Complexity: time / space
## Why I Missed It (if applicable)
The honest answer. Choose from:
- Didn't recognize the pattern
- Recognized the pattern but applied it wrong
- Couldn't derive the recurrence / invariant
- Wrong data structure choice
- Implementation bug
- Misread the problem
- Ran out of time
- Got the optimal but couldn't prove it
## Key Insight
The single sentence that, if I'd known it, would have unlocked the problem in 5 minutes.
## Data Structures Used
- DS 1: why
- DS 2: why
- (etc.)
## Complexity
Time: O(...)
Space: O(...)
Tight bound? Y/N
Amortized? Y/N
## Bugs I Made
- Off-by-one at line X
- Forgot to handle empty case
- Wrong comparison operator (<= vs <)
- Used wrong variable in inner loop
- Modified collection while iterating
- (etc.)
## Edge Cases I Missed
- Empty input
- Single element
- All duplicates
- All negatives
- Max constraint
- Disconnected component
- (etc.)
## Follow-ups Practiced
- "What if the input is streamed?" → answer
- "What if N is 10^9?" → answer
- "What if memory is constrained?" → answer
## Product Extension
How does this map to a real-world system?
e.g., "This LRU cache pattern is exactly what a CDN edge node uses for hot-content eviction."
## Language/Runtime Notes
- Specific stdlib gotcha I hit
- Memory behavior surprise
- Concurrency consideration if relevant
## How I Would Recognize This Again
A pattern signal in plain English:
"When the problem asks for the minimum window covering K elements over a stream, with constant-time element insertion/removal and order matters, it's a sliding window with a hashmap."
This is the most important field. Optimize for *recognition*, not memorization.
## Re-solve Schedule
Per [SPACED_REPETITION.md](SPACED_REPETITION.md):
- Same day: ☐ done / not done
- 2 days later: ☐ scheduled for [date]
- 1 week later: ☐ scheduled for [date]
- 2 weeks later: ☐ scheduled for [date]
- 1 month later: ☐ scheduled for [date]
- 3 months later: ☐ scheduled for [date]
## Attempts Log
| Date | Unaided? | Time | Outcome | Notes |
|---|---|---|---|---|
| 2026-05-20 | N | 45 min | Wrong, then hint | Missed monotonic stack pattern |
| 2026-05-22 | Y | 18 min | Correct | Recognized pattern immediately |
| 2026-05-29 | Y | 12 min | Correct | Optimal first attempt |
How To Fill It Out (Do’s and Don’ts)
DO
- Be brutally honest. “I gave up after 20 minutes” is more useful than “I solved it but slowly”.
- Name the one insight. Forcing yourself to a single sentence forces understanding.
- Re-solve from a blank file at scheduled intervals.
- Tag heavily. Pattern, difficulty, data structure — these become your search keys.
DON’T
- Copy the editorial verbatim. Translate into your own words.
- Skip the “Why I Missed It” field when you got it right — what was hard about it? What might have tripped you up if you’d been less lucky?
- Skip the “How I Would Recognize This Again” field. This is where 80% of the value lives.
- Move on without scheduling the next re-solve.
Review Aggregation
Every Friday: skim the week’s reviews. Look for patterns:
- “I keep missing monotonic stack opportunities” → drill that pattern.
- “I keep making off-by-one bugs in binary search” → write a personal binary search template and use it.
- “I keep choosing the wrong DP state on tree problems” → revisit phase-05-dp/categories/dp-tree.md.
Every month: aggregate into a personal weakness list. Top 3 weaknesses get dedicated drilling for the next month.
Spaced Repetition System
The brain forgets new information on a curve. Without re-exposure, ~70% of what you learn today is gone in 7 days. Spaced repetition counteracts this with strategically-timed reviews.
This system applies to two things:
- Problems you’ve solved (especially failed-then-solved ones)
- Concepts (patterns, data structures, algorithms)
The 6-Tier Interval Schedule
| Tier | Interval | Action |
|---|---|---|
| 1 | Same day (within 4 hours of first solve) | Re-solve from scratch |
| 2 | 2 days later | Re-solve from scratch |
| 3 | 1 week later | Re-solve from scratch |
| 4 | 2 weeks later | Re-solve from scratch + try a harder variant |
| 5 | 1 month later | Verbal explanation + complexity + sketch tests |
| 6 | 3 months later | Verbal explanation only — proves it’s in long-term memory |
Graduation criterion: if Tier 6 is unaided in <120% of your best time, the problem is “owned” — drop it from active rotation. Otherwise, restart at Tier 3.
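The schedule above is easy to compute mechanically. The sketch below assumes every tier is measured from the first solve date and treats "1 month" and "3 months" as 30 and 90 days; both are simplifying assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Day offsets for tiers 1..6, taken from the table above.
# Measuring each tier from the first solve is an assumption of this sketch.
TIER_OFFSETS_DAYS = [0, 2, 7, 14, 30, 90]


def review_dates(first_solved: date) -> list[tuple[int, date]]:
    """Return (tier, due date) pairs for all six tiers."""
    return [
        (tier, first_solved + timedelta(days=offset))
        for tier, offset in enumerate(TIER_OFFSETS_DAYS, start=1)
    ]


def due_today(problems: dict[str, date], today: date) -> list[str]:
    """Names of problems with any tier due on the given day, sorted."""
    return sorted(
        name
        for name, solved in problems.items()
        if any(due == today for _, due in review_dates(solved))
    )


for tier, due in review_dates(date(2026, 5, 20)):
    print(tier, due.isoformat())
```

This version ignores demotions. If a review fails, one option is to record the new tier and recompute from the failed review's date rather than the first solve.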
What “Re-solve” Means
Re-solving is not reading your old solution. It is:
- Fresh editor / blank file.
- Re-read the problem statement.
- Solve from scratch.
- Compare to your previous solution after.
If you can’t do it without peeking → demote one tier (tier 4 → tier 3) and continue.
What “Verbal Explanation” Means (Tier 5+)
Without writing code, explain out loud (record yourself if alone):
- Restate the problem in your own words.
- State the brute force.
- State the optimal approach.
- State the key insight.
- State the complexity (time + space) and why.
- State 3 edge cases and how they’re handled.
- Sketch 2 follow-up answers.
Listen back. Did you stumble? Demote one tier. Did you flow? Tier graduated.
Concept-Level Spaced Repetition
For patterns and algorithms (not problems), use a similar schedule but with different review actions:
| Tier | Interval | Action |
|---|---|---|
| 1 | Same day as learning | Solve 2 problems applying it |
| 2 | 2 days | Solve 1 problem (different difficulty) |
| 3 | 1 week | Teach the concept verbally (recorded or to a peer) |
| 4 | 2 weeks | Solve 1 problem in a domain you don’t usually associate with it |
| 5 | 1 month | Compare/contrast with a related pattern |
| 6 | 3 months | Solve 1 hard problem cold; if you spot the pattern in <2 minutes, owned |
Logistics: How To Maintain The Schedule
Option 1: Spreadsheet
Columns: Problem | Last Solved | Tier | Next Review Date | Difficulty | Pattern | Notes
Sort by Next Review Date. Top of the list = today’s reviews.
Option 2: Anki (or SRS app)
- One card per problem.
- Front: problem name + difficulty.
- Back: pattern + key insight + complexity.
- Use the SRS scheduling.
Custom intervals matter — the default Anki intervals are tuned for vocabulary, not problems. Use 1d, 2d, 7d, 14d, 30d, 90d.
Option 3: Folder structure (no tooling)
reviews/
  today.md              # editable list of today's review problems
  upcoming/
    2026-05-22.md       # problems due that date
    2026-05-29.md
    ...
  archive/
    YYYY-MM-DD-problem-name.md
Each evening: move tomorrow’s file to today.md.
Daily Volume Guidelines
When you have N problems in active rotation, your daily review load looks like:
| Active Problems | Daily Reviews | New Problems |
|---|---|---|
| 0–50 | 0–5 | 4–6 |
| 50–150 | 5–12 | 3–5 |
| 150–300 | 10–20 | 2–3 |
| 300+ | 15–25 | 1–2 |
When daily reviews exceed your capacity:
- Graduate aggressively (drop owned problems).
- Slow down new problem intake.
- Consolidate easy/owned problems into “weekly batch reviews” instead of individual reviews.
What To Do When You Fall Behind
Inevitable. When you have 50+ overdue reviews:
- Don’t panic-skip. Don’t mark them all done.
- Triage by tier. Tier 1 + 2 are the most fragile — do those first.
- Drop tier 5 + 6 for 1 week — they decay slowly.
- Reduce new intake to 0 until reviews are caught up.
- Audit: what made you fall behind? Too many new problems? Underestimated review time? Adjust intake rate.
Why This Matters
Without spaced repetition, your interview prep is a leaky bucket. You add 10 problems a week, lose 8 to forgetting, net +2.
With spaced repetition, you add 5 problems a week, retain 5, and after 3 months you have 60 deeply-owned problems instead of 30 vaguely-remembered ones.
In an interview:
- Vaguely-remembered → “I think I’ve seen this before, but I can’t quite…”
- Deeply-owned → “This is a [pattern] problem. The key insight is [X]. I’d solve it with [approach].”
The latter is a hire signal. The former is not.
Integration With The Curriculum
- Every problem solved during the curriculum enters Tier 1 automatically.
- Every problem in phase-11-mock-interviews/ you fail enters Tier 1 with a failure_category tag.
- Every concept in a phase README enters Tier 1 the day you finish the phase.
- The Tier 6 graduation criterion is also part of the READINESS_CHECKLIST.md.
12-Week Accelerated Track
Audience: You have a deadline in 8–14 weeks (a known interview, an offer expiring, a layoff window). You can put in 25–35 hours/week.
Tradeoffs you are accepting:
- You will skip Phase 7 (competitive programming) almost entirely.
- You will skip Phase 12 (grandmaster) entirely.
- You will get to “competent at FAANG mediums” not “consistently solves FAANG hards in 45 minutes”.
- Concept depth is sacrificed for problem volume in some weeks.
This track is sufficient for: new-grad / SWE2 FAANG, scaleups, most backend/platform roles. This track is NOT sufficient for: staff/principal interviews, quant/HFT, compiler/runtime, distributed systems specialty roles.
Daily Cadence
- Weekday: 3–5 hours
- 90 min: new content (read concept doc + work through 1 lab)
- 90 min: problem solving (3–5 problems)
- 30 min: review (spaced repetition queue)
- 30 min: failure analysis if you missed anything
- Saturday: 6 hours
- 1 mock interview (90 min including review)
- 4 hours problem solving
- 30 min review
- Sunday: 4 hours
- Weakness drilling on top failure category
- Re-solve all this week’s failures
- Plan next week
Total: 25–35 hours/week
Weekly Plan
Week 1 — Foundation Reset
Goal: Stop being random. Internalize the framework.
- Read README.md, FRAMEWORK.md, COMMUNICATION.md, CODE_QUALITY.md in full
- Complete all 7 labs in phase-00-execution-baseline/
- 25 LeetCode Easy problems applying the framework rigorously (full restate, brute force, optimize, test)
- 0 mocks (you’re not ready)
Mastery check: You can solve LeetCode Easy in 12 minutes with the full framework, narrating throughout.
Week 2 — Foundations Of Data Structures
- Read all 15 DS docs in phase-01-foundations/data-structures/
- 30 problems: array, string, hashmap, stack, queue (mostly Easy + a few Medium)
- 1 mock (Easy level)
- Read phase-01-foundations/runtime/ — stack vs heap, scope/lifetime, mutable vs immutable
Mastery check: Operations + complexity table for every DS reproducible from memory.
Week 3 — Linked Lists, Trees, Recursion
- Phase 1 labs: linked list reversal, recursion, binary search, basic trees
- 25 problems (15 Easy + 10 Medium)
- Read remaining runtime docs in Phase 1
- 1 mock (Easy/Medium mix)
Week 4 — Pattern Onboarding (Part 1)
- Read patterns: two pointers, sliding window, prefix sums, hashing, sorting+greedy, binary search, monotonic stack
- For each pattern: 4–5 problems (Medium-leaning)
- 1 mock (Medium)
Mastery check: Can recognize each pattern’s signal in <2 minutes for unseen problems.
Week 5 — Pattern Onboarding (Part 2)
- Read patterns: monotonic queue, intervals, linked list manipulation, tree DFS/BFS, graph DFS/BFS, topological sort
- 30 problems across these patterns
- 1 mock (Medium)
Week 6 — Pattern Onboarding (Part 3) + DP Intro
- Read patterns: union find, backtracking, basic DP, 1D DP
- Read phase-05-dp/concepts/ — memoization, state definition, transitions, base cases
- 25 DP problems (Easy → Medium)
- 1 mock (Medium)
Mastery check: Can write the brute → memo → tabulated → space-optimized progression for any 1D DP.
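The four-step progression named in this mastery check can be sketched on a single 1D DP. House Robber (maximize the sum of non-adjacent elements) is used here as an assumed example problem; the recurrence is best(i) = max(nums[i] + best(i + 2), best(i + 1)) with best(i) = 0 once i runs past the end.

```python
from functools import lru_cache


def rob_brute(nums: list[int]) -> int:
    # Step 1: plain recursion on the recurrence, exponential time.
    def best(i: int) -> int:
        if i >= len(nums):
            return 0
        return max(nums[i] + best(i + 2), best(i + 1))
    return best(0)


def rob_memo(nums: list[int]) -> int:
    # Step 2: same recursion, cached per index. O(N) time, O(N) space.
    @lru_cache(maxsize=None)
    def best(i: int) -> int:
        if i >= len(nums):
            return 0
        return max(nums[i] + best(i + 2), best(i + 1))
    return best(0)


def rob_tab(nums: list[int]) -> int:
    # Step 3: bottom-up table, filled right to left so dependencies exist.
    n = len(nums)
    dp = [0] * (n + 2)
    for i in range(n - 1, -1, -1):
        dp[i] = max(nums[i] + dp[i + 2], dp[i + 1])
    return dp[0]


def rob_opt(nums: list[int]) -> int:
    # Step 4: only dp[i+1] and dp[i+2] are ever read, so keep two variables.
    next1 = next2 = 0
    for x in reversed(nums):
        next1, next2 = max(x + next2, next1), next1
    return next1


houses = [2, 7, 9, 3, 1]
print(rob_brute(houses), rob_memo(houses), rob_tab(houses), rob_opt(houses))
```

Each step changes only the evaluation strategy, never the recurrence. If two versions disagree, the bug is in the translation, which narrows the search considerably.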
Week 7 — Trie, Heap, K-way Merge, 2D DP
- Read patterns: trie, heap top-K, K-way merge, 2D DP, knapsack, subsequence DP, string DP
- 25 problems
- 1 mock (Medium-Hard)
Week 8 — Graphs Deep Dive
- Read phase-04-graphs/algorithms/: BFS, DFS, multi-source BFS, 0-1 BFS, Dijkstra, topological sort, cycle detection, MST (Kruskal+Prim), union find
- 25 graph problems
- 2 product-style labs from phase-04-graphs/labs/
- 1 mock (Medium-Hard graph-heavy)
Week 9 — DP Deeper + Greedy
- Read DP categories: tree DP, interval DP, bitmask DP (overview), knapsack variants, LIS, edit distance
- Read phase-06-greedy/concepts/: greedy choice, exchange argument, invariants
- 25 problems (mix DP + greedy)
- 2 greedy labs (with proofs)
- 1 mock (Medium-Hard mixed)
Week 10 — Practical Engineering Coding
- Complete labs from phase-08-practical-engineering/: LRU cache, rate limiter, autocomplete, thread pool, KV store, retry+backoff
- Read your primary language’s track in phase-09-language-runtime/
- 15 problems (mediums + 1–2 hards)
- 2 mocks (1 Big Tech phone, 1 Big Tech onsite)
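To give a sense of scale for these labs, a rate limiter can start as small as the token-bucket sketch below. This is one common design, not necessarily the one the lab specifies; the class name, parameters, and injectable `clock` are assumptions, and the sketch is single-threaded (a lock around `allow` would be needed for concurrent callers).

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic) -> None:
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock              # injectable so tests can control time
        self._tokens = float(capacity)   # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill by elapsed time, then spend cost tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 1.0
after_refill = bucket.allow()
print(burst, after_refill)
```

Injecting the clock is the design choice worth defending in a follow-up: it makes the limiter testable without sleeping and sets up the "how would you test this?" discussion.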
Week 11 — Hard Problems + Mock Marathon
- 20 LeetCode Hard problems
- Daily mock interviews (5 mocks this week, mix of phase-11-mock-interviews/ types)
- Failure analysis on every mock
- Drill top failure category
Week 12 — Polish, Confidence, Rest
- 3 mocks (cold, varied — no pre-warmup)
- Re-solve top 30 problems from spaced repetition queue
- Complete READINESS_CHECKLIST.md honestly
- 1 day complete rest before interview
Problem Volume Targets
| Week | New Problems | Reviews | Mocks |
|---|---|---|---|
| 1 | 25 (Easy) | 0 | 0 |
| 2 | 30 (E+M) | 10 | 1 |
| 3 | 25 (E+M) | 15 | 1 |
| 4 | 25 (M) | 20 | 1 |
| 5 | 30 (M) | 25 | 1 |
| 6 | 25 (M) | 30 | 1 |
| 7 | 25 (M-H) | 30 | 1 |
| 8 | 25 (M-H) | 30 | 1 |
| 9 | 25 (M-H) | 30 | 1 |
| 10 | 15 (M-H + H) | 30 | 2 |
| 11 | 20 (H) | 30 | 5 |
| 12 | 0 | 30 (re-solves) | 3 |
| Total | ~270 | ~280 | 18 |
Review Schedule (per SPACED_REPETITION.md)
Use the abbreviated tier schedule for accelerated track:
- Tier 1: same day
- Tier 2: 2 days
- Tier 3: 1 week
- Tier 4: 2 weeks
- (Skip tiers 5–6 — you don’t have time before interviews)
What This Track Cannot Buy You
- Deep competitive-programming intuition (Phase 7)
- Grandmaster-level pattern recognition (Phase 12)
- Long-term retention of all 270 problems (you’ll forget ~30% within 3 months without continued review)
- The kind of polish that comes from 6+ months of practice
If you finish Week 12 and your interview is delayed by 8+ weeks, switch to the 6-month track for the remaining time. Use the buffer to backfill Phase 7 basics and harden weak patterns.
6-Month Serious Track
Audience: You’re targeting strong Big Tech readiness. You can put in 15–25 hours/week sustainably for 6 months.
Tradeoffs:
- You will cover Phase 7 (competitive) selectively — not full grandmaster prep.
- You will get to “consistently solves FAANG hards in 45–60 minutes” with strong follow-ups.
- You will have meaningful depth in your primary language’s runtime.
- This is the recommended track for most readers.
Sufficient for: all FAANG levels including senior, infrastructure, platform, distributed systems, most backend specialties.
Daily Cadence
- Weekday: 2–3 hours
- 60 min: new content (read + 1 lab or 2–3 concept docs)
- 60 min: problem solving (2–4 problems)
- 30 min: review queue
- Saturday: 4 hours
- 1 mock interview (90 min)
- 2 hours problem solving
- 30 min: failure analysis + review
- Sunday: 2 hours
- Re-solves at Tier 4–5
- Weakness drilling
Total: 16–21 hours/week
Monthly Milestones
Month 1 — Foundations (Weeks 1–4)
Cover: Phase 0 + Phase 1 fully, Phase 9 (your primary language) partially.
- Week 1: Phase 0 (all 7 labs), 25 Easy problems
- Week 2: Phase 1 DS docs (arrays, strings, hashmaps, stacks, queues, linked lists), Phase 1 runtime (stack vs heap, scope, value/reference, mutability), 30 problems
- Week 3: Phase 1 DS continued (heaps, sorting, binary search, recursion, basic trees, basic graphs, basic DP), 30 problems, 2 labs
- Week 4: Phase 1 runtime complete (hash collisions, iterator invalidation, GC, memory leaks, deep/shallow copy), Phase 9 your-language (first half), 30 problems, 3 labs
End-of-month: all Phase 1 mastery checks pass. 1 mock at Easy level, scored 7+/10.
Volume: ~115 problems, 10+ labs.
Month 2 — Patterns Mastery (Weeks 5–8)
Cover: Phase 2 fully, Phase 9 your-language fully.
- Week 5: two pointers, sliding window, prefix sums, difference arrays, hashing, sorting+greedy. 5 problems per pattern.
- Week 6: binary search, binary search on answer, monotonic stack, monotonic queue, intervals, linked list manipulation. 5 per pattern.
- Week 7: tree DFS/BFS, graph DFS/BFS, topological sort, union find. 5 per pattern. 3 labs.
- Week 8: backtracking, basic DP, 1D DP, 2D DP, knapsack basics, subsequence DP, string DP, trie, heap top-K, K-way merge. 4 per pattern.
End-of-month: mastery check on all 28 patterns. 4 mocks (Easy-Medium-Medium-Medium-Hard).
Volume: ~140 problems, 9 labs.
Month 3 — Graphs + DP Deep (Weeks 9–12)
Cover: Phase 4 fully, Phase 5 fully (except the convex hull trick and Knuth optimization, which are deferred to Month 6).
- Week 9: Phase 4 fundamentals (BFS variants, DFS, Dijkstra, Bellman-Ford, Floyd-Warshall, topo). Solve 30 graph problems. 2 graph labs.
- Week 10: Phase 4 advanced (SCCs, bridges/articulation, MST, bipartite, max flow basics, graph modeling). Solve 25. 3 graph labs.
- Week 11: Phase 5 categories: 1D, 2D, knapsack, LIS, edit distance, palindrome, string DP. Solve 25. 3 DP labs (full brute→memo→tab→space progression).
- Week 12: Phase 5 categories: tree DP, interval DP, bitmask DP, digit DP (intro), DP on DAGs, game DP. Solve 25. 3 DP labs.
End-of-month: all of Phase 4 + Phase 5 mastery checks. 4 mocks (Medium-Medium-Medium-Hard mix).
Volume: ~110 problems, 11 labs.
Month 4 — Greedy, Advanced DS, Practical Engineering (Weeks 13–16)
- Week 13: Phase 6 fully (greedy choice, exchange arg, cut property, invariants, monovariants, amortized analysis). 6 greedy labs (with proofs). 20 greedy problems.
- Week 14: Phase 3 (segment tree, Fenwick tree, sparse table, KMP, Z, rolling hash, trie variants, bit manipulation, bitmask DP, meet-in-the-middle). 20 problems. 3 advanced DS labs.
- Week 15: Phase 8 first half (LRU, LFU, rate limiter, task scheduler, thread pool, job queue, autocomplete, log parser, file dedup). 9 labs with full follow-up answers.
- Week 16: Phase 8 second half (consistent hashing, message dispatcher, pubsub, timer wheel, KV store, retry/backoff, circuit breaker, metrics, web crawler, in-memory FS). 10 labs.
End-of-month: Phase 3 + Phase 6 + Phase 8 (most of it) complete. 4 mocks including 1 Big Tech phone screen.
Volume: ~80 problems, 28 labs.
Month 5 — Hard Problems + Concurrency + Senior-Level Skills (Weeks 17–20)
- Week 17: Phase 8 final 4 labs + Phase 10 (testing/debugging concept docs). 5 testing/debugging labs. 15 problems.
- Week 18: Concurrency deep dive — re-read Phase 9 concurrency sections, solve 5 concurrency-flavored problems, do mock-11 twice. 15 problems.
- Week 19: Hard problem week — 25 LeetCode Hards across all patterns. 1 mock per day (5 mocks).
- Week 20: Phase 7 selectively (modular arithmetic, sieve, binary exponentiation, combinatorics, sweep line, coordinate compression — skip ICPC-only topics). 20 problems including 8 from Codeforces Div 3. 1 mock.
End-of-month: all of Phase 8, Phase 10, parts of Phase 7. 8 mocks total.
Volume: ~100 problems including ~25 hards.
Month 6 — Polish, Mock Marathon, Production Awareness (Weeks 21–24)
- Week 21: Mock-heavy week. 5 mocks (mix mock-05, 06, 07, 09, 11). Failure analysis on each. 15 problems focused on top failure category.
- Week 22: Phase 12 selective topics (only those relevant to your role). For backend/platform: max flow modeling and advanced combinatorics (inclusion-exclusion). For systems: nothing additional; focus on Phase 8 polish. 20 problems.
- Week 23: Re-solve marathon — re-solve all Tier 5 and Tier 6 problems. Verify READINESS_CHECKLIST.md honestly. 3 mocks.
- Week 24: Light week. 2 mocks. 10 problems. Rest. Final readiness check.
End-of-program: READINESS_CHECKLIST.md fully passed.
Volume: ~50 problems, 13 mocks across the month.
Aggregate Volume
| Month | New Problems | Labs | Mocks |
|---|---|---|---|
| 1 | 115 | 10 | 1 |
| 2 | 140 | 9 | 4 |
| 3 | 110 | 11 | 4 |
| 4 | 80 | 28 | 4 |
| 5 | 100 | 5 | 8 |
| 6 | 50 | 0 | 13 |
| Total | ~595 | ~63 | ~34 |
Review / Spaced Repetition
Full 6-tier schedule from SPACED_REPETITION.md:
- Tier 1: same day
- Tier 2: 2 days
- Tier 3: 1 week
- Tier 4: 2 weeks
- Tier 5: 1 month
- Tier 6: 3 months
By Month 6 you should have ~50–80 Tier 6 graduates.
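As a rough illustration, the tier intervals above can be turned into concrete review dates. The sketch below assumes each interval is measured from the day of the original solve, and it approximates "1 month" as 30 days and "3 months" as 90 days; SPACED_REPETITION.md may define these differently. The names `TIER_OFFSETS_DAYS` and `review_dates` are hypothetical.

```python
from datetime import date, timedelta

# Days after the first solve for each tier.
# Assumptions: offsets are measured from the solve date;
# 1 month is treated as 30 days and 3 months as 90 days.
TIER_OFFSETS_DAYS = {1: 0, 2: 2, 3: 7, 4: 14, 5: 30, 6: 90}


def review_dates(solved_on: date) -> dict:
    """Return a mapping of tier number to planned review date."""
    return {
        tier: solved_on + timedelta(days=days)
        for tier, days in TIER_OFFSETS_DAYS.items()
    }


if __name__ == "__main__":
    for tier, when in review_dates(date(2025, 1, 6)).items():
        print(f"Tier {tier}: {when.isoformat()}")
```

If your protocol measures each interval from the previous review rather than from the first solve, the offsets would need to be accumulated instead.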
Mock Schedule
Required mocks across the program (see phase-11-mock-interviews/):
| Mock | When | Pass Threshold |
|---|---|---|
| mock-02 (Easy) | end of Month 1 | 7/10 |
| mock-03 (Medium) ×3 | Month 2 | 6/10 → 7/10 |
| mock-04 (Hard) ×2 | Month 3 | 6/10 |
| mock-05 (Big Tech phone) ×3 | Months 4–6 | 7/10 |
| mock-06 (Big Tech onsite) ×3 | Months 5–6 | 7/10 |
| mock-07 (senior) ×2 | Months 5–6 | 6/10 |
| mock-09 (runtime) ×2 | Months 5–6 | 7/10 |
| mock-11 (concurrency) ×2 | Month 5 | 6/10 |
| mock-10 (system-heavy) ×2 | Month 6 | 6/10 |
Revision Plan
- Weekly: Friday evening. Skim the review log and identify the week’s failure pattern.
- Monthly: Last weekend. Aggregate failures, identify the top 3 categories, and dedicate next month’s Sunday drilling to category #1.
- Bi-monthly: End of months 2, 4, 6. Do a “blind mock” with a problem you’ve never seen, no warmup. Score honestly.
What This Track Buys You
- ~600 deeply-owned problems
- All 28 patterns recognizable in <2 minutes
- ~35 production-style labs with full follow-up answers
- Strong concurrency + runtime depth in your primary language
- Genuine readiness for FAANG senior-level coding interviews
- 30+ mock interviews of practice — pressure tolerance is real
What It Does Not Buy You
- Codeforces Div 1 / AtCoder AGC level (you’d need Phase 7 + 12 fully)
- Quant/HFT-level math depth (math heavy, beyond Month 5 scope)
- Compiler/runtime team specialty (you’d need Phase 9 fully across multiple languages)
12-Month Elite Track
Audience: Top-tier targets. Senior/staff/principal practical, FAANG L6+, Jane Street / Citadel / Two Sigma, distributed systems specialty teams, compiler/runtime teams, or competitive programmers building toward Codeforces 2100+.
Commitment: 15–20 hours/week sustained for 12 months. Not a sprint.
This track will get you to:
- Phase 7 + Phase 12 substantial depth
- Codeforces Div 2 reliable, Div 1 attempt-capable
- Multi-language runtime fluency in primary + 1 secondary
- Production-system thinking comparable to a working senior/staff engineer
- The specific gap between “Big Tech ready” and “elite-tier ready”
Annual Phase Map
| Months | Phases | Focus |
|---|---|---|
| 1 | Phases 0–1 | Foundations + framework internalization |
| 2 | Phase 2 | All 28 patterns to mastery level |
| 3 | Phase 4 | Graph algorithms deep |
| 4 | Phase 5 | DP from basic to extreme |
| 5 | Phase 3 + Phase 6 | Advanced DS + greedy proofs |
| 6 | Phase 8 + Phase 10 | Practical engineering + testing/debugging |
| 7 | Phase 9 (full) | Language/runtime — 2 languages |
| 8 | Phase 7 (first half) | Competitive programming — math, sieve, modular, geometry |
| 9 | Phase 7 (second half) | Competitive — sweep line, Mo’s, parallel BS, contests |
| 10 | Phase 12 (selective) | Grandmaster topics relevant to your role |
| 11 | Mock marathon | Interview-realistic prep |
| 12 | Polish + readiness | Final checks, blind mocks, rest |
Daily Cadence
- Weekday: 2 hours
- Saturday: 5 hours (mock + problems + labs)
- Sunday: 3 hours (review + drilling)
- Total: 17–18 hours/week
You will scale up in Months 8–11 to 22 hours/week temporarily.
Monthly Detail
Month 1 — Phase 0 + Phase 1
Identical to Month 1 of 6-Month Serious, but with extra depth on runtime docs (read all 10) and an additional 30 Easy problems for fluency.
Volume: 130 problems, 12 labs, 1 mock.
Month 2 — Phase 2 (Patterns)
- All 28 pattern docs read in detail
- 8 problems per pattern (vs 5 in Serious track)
- Every pattern: write a personal template after solving the 8 problems
- 5 mocks across the month
Volume: 224 problems, 9 labs, 5 mocks.
Mastery check: Can derive each pattern’s template from memory in <5 minutes.
Month 3 — Phase 4 (Graphs)
- All 21 graph algorithms with implementation
- All 9 product-style labs
- 60 graph problems (mix of LeetCode + Codeforces Div 3 graph problems)
- 4 mocks
Volume: 60 problems, 9 labs, 4 mocks.
Mastery check: Can implement BFS / DFS / Dijkstra / Bellman-Ford / Floyd-Warshall / Kruskal / Prim / Kosaraju / Tarjan from scratch error-free in <12 minutes each.
Month 4 — Phase 5 (Dynamic Programming)
- All 22 DP concept + category docs
- All 10 DP labs with full progression (brute → memo → tab → space)
- 80 DP problems
- Every problem you solve: explicitly state recurrence + base case + evaluation order before coding
- 4 mocks (DP-heavy)
Volume: 80 problems, 10 labs, 4 mocks.
Mastery check: Can derive state + transition for unseen DP problems in <8 minutes.
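The brute → memo → tab → space progression named above can be sketched on Climbing Stairs (LC 70), chosen here only because its recurrence is short. The function names are illustrative and are not taken from the Phase 5 labs.

```python
from functools import lru_cache

# Recurrence: ways(n) = ways(n - 1) + ways(n - 2)
# Base cases: ways(0) = ways(1) = 1
# Evaluation order (bottom-up): increasing n


def ways_brute(n: int) -> int:
    """Plain recursion: exponential time."""
    return 1 if n <= 1 else ways_brute(n - 1) + ways_brute(n - 2)


@lru_cache(maxsize=None)
def ways_memo(n: int) -> int:
    """Top-down with memoization: O(n) time, O(n) space."""
    return 1 if n <= 1 else ways_memo(n - 1) + ways_memo(n - 2)


def ways_tab(n: int) -> int:
    """Bottom-up table filled in increasing n: O(n) time, O(n) space."""
    dp = [1] * (n + 1)
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]


def ways_space(n: int) -> int:
    """Keep only the last two states: O(n) time, O(1) space."""
    prev, cur = 1, 1
    for _ in range(2, n + 1):
        prev, cur = cur, prev + cur
    return cur
```

Stating the recurrence, base cases, and evaluation order as comments first, as done at the top of the block, mirrors the "before coding" habit this month asks for.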
Month 5 — Phase 3 (Advanced DS) + Phase 6 (Greedy/Proofs)
- All 24 advanced DS docs
- All 9 advanced DS labs
- All 7 greedy concept docs + all 6 labs (with proofs)
- 50 problems mixing advanced DS and greedy
- 4 mocks
Volume: 50 problems, 15 labs, 4 mocks.
Mastery check: Can implement segment tree (with lazy propagation), Fenwick tree, KMP, rolling hash, trie, treap from scratch.
Month 6 — Phase 8 (Practical Engineering) + Phase 10 (Testing/Debugging)
- All 23 practical engineering labs
- Every lab includes: working implementation + unit tests + smoke tests + concurrency tests where relevant + answers to all 13 standard follow-ups
- All 13 testing/debugging concept docs + all 5 labs
- Write property-based tests for 5 implementations
- 30 problems
- 5 mocks (mix mock-08 staff practical, mock-10 system-heavy, mock-11 concurrency)
Volume: 30 problems, 28 labs, 5 mocks.
Mastery check: Can build any of (LRU, rate limiter, autocomplete, KV store, thread pool) from scratch in 45 minutes with tests.
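One way to read the "property-based test" and "with tests" requirements above is a model-based random test: drive a fast implementation and an obviously correct slow one with the same random operations and require identical answers. The sketch below does this for an LRU cache using only the standard library. The class names, `check_against_model`, and the `-1` miss value (borrowed from LC 146) are illustrative assumptions, not the Phase 8 lab's API.

```python
import random
from collections import OrderedDict


class LRUCache:
    """LRU cache backed by OrderedDict (most recently used at the end)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.data:
            return -1
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used


class NaiveLRU:
    """Slow reference model: a list ordered oldest to newest."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = []

    def get(self, key: int) -> int:
        for i, (k, v) in enumerate(self.items):
            if k == key:
                self.items.append(self.items.pop(i))
                return v
        return -1

    def put(self, key: int, value: int) -> None:
        self.items = [(k, v) for k, v in self.items if k != key]
        self.items.append((key, value))
        if len(self.items) > self.capacity:
            self.items.pop(0)


def check_against_model(trials: int = 200, ops: int = 50, seed: int = 0) -> bool:
    """Run random get/put sequences; compare both caches on every get."""
    rng = random.Random(seed)
    for _ in range(trials):
        cap = rng.randint(1, 4)
        fast, slow = LRUCache(cap), NaiveLRU(cap)
        for _ in range(ops):
            key = rng.randint(0, 5)
            if rng.random() < 0.5:
                value = rng.randint(0, 99)
                fast.put(key, value)
                slow.put(key, value)
            elif fast.get(key) != slow.get(key):
                return False
    return True
```

Small capacities and a small key range are deliberate: they force frequent evictions, which is where LRU bugs tend to hide. A dedicated property-testing library could replace the hand-rolled generator.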
Month 7 — Phase 9 (Language & Runtime)
- Primary language: read entire track end to end. Take written notes. Solve 5 problems per major topic targeting the gotchas.
- Secondary language: read 50%+ of the track. Solve 20 problems in this language to build fluency.
- 30 problems total in primary language (focused on runtime gotchas: overflow, recursion limits, hash adversarial, GC pressure, concurrency)
- 4 mocks including 2 mock-09 (runtime/language deep dives)
Volume: 50 problems, 0 labs (concept-heavy month), 4 mocks.
Mastery check: Can fluently explain stack vs heap, GC behavior, hashmap internals, concurrency model for both languages.
Month 8 — Phase 7 First Half
Competitive programming acceleration starts here. Move to Codeforces full-time for problem solving (LeetCode for review only).
- Topics: fast I/O, modular arithmetic, GCD/LCM, sieve, prime factorization, modular inverse, combinatorics, binary exponentiation, matrix exponentiation, geometry basics, coordinate compression
- Codeforces Div 3 → solve 6 contests (virtual or real). Goal: solve 4–5 problems per Div 3.
- AtCoder Beginner Contest: solve 4 contests. Goal: solve 5 of A–F.
- 1 mock per week (varied — competitive-flavored mock-13)
Volume: ~80 problems (Codeforces + AtCoder), 6 labs, 4 mocks.
Mastery check: Solve Codeforces 1500–1700 rated problems consistently within contest time.
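Three of the topics listed for this month (binary exponentiation, modular inverse, combinatorics) fit in one short sketch. It assumes a prime modulus, which is what makes the Fermat-based inverse valid; the constant `MOD` and the function names are illustrative.

```python
MOD = 1_000_000_007  # a prime modulus commonly used in contest problems


def power_mod(base: int, exp: int, mod: int = MOD) -> int:
    """Binary exponentiation: O(log exp) multiplications."""
    result = 1 % mod
    base %= mod
    while exp > 0:
        if exp & 1:                  # current low bit set: multiply it in
            result = result * base % mod
        base = base * base % mod     # square for the next bit
        exp >>= 1
    return result


def inverse_mod(a: int, mod: int = MOD) -> int:
    """Modular inverse via Fermat's little theorem.

    Requires mod to be prime and a not divisible by mod.
    """
    return power_mod(a, mod - 2, mod)


def n_choose_k(n: int, k: int, mod: int = MOD) -> int:
    """C(n, k) mod a prime, using running products and one inverse.

    Assumes k < mod so the denominator is invertible.
    """
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(1, k + 1):
        num = num * (n - k + i) % mod
        den = den * i % mod
    return num * inverse_mod(den, mod) % mod
```

For many queries, precomputing factorials and inverse factorials once is the usual next step; this version keeps the per-call cost at O(k + log mod).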
Month 9 — Phase 7 Second Half
- Topics: sweep line, offline queries, Mo’s algorithm, parallel binary search (overview), randomized stress testing, interactive problems, game theory (Nim, Sprague-Grundy)
- Codeforces Div 2 → 6 virtual contests. Goal: solve A–C reliably, attempt D.
- AtCoder Regular Contest → 3 contests, attempt A–C.
- Implement a stress-testing harness for 5 problems
- 4 mocks including mock-13 (competitive hard) twice
Volume: ~70 problems, 4 mocks.
Mastery check: Codeforces Div 2 ABC reliable. D attempted and sometimes solved.
Month 10 — Phase 12 (Selective)
Pick topics by role:
Backend/platform:
- Max flow modeling, min-cost max-flow overview
- Advanced combinatorics + inclusion-exclusion
- Randomized algorithms
Distributed systems:
- Persistent data structures
- Consistent hashing deep
- Lockless / wait-free data structures (research)
Compiler/runtime:
- Suffix automaton
- Heavy-light decomposition
- Constraint solving / SAT-like reasoning
Quant/HFT:
- FFT/NTT
- Computational geometry
- Probability DP + expected value
- Game theory deep
Pure competitive:
- HLD, centroid decomposition, segment tree beats, FFT/NTT, suffix automaton
Pick 5–7 topics. Read deeply. Implement reference code. 3 labs from Phase 12.
Volume: 30 problems (very hard / CP-hard), 3+ labs, 4 mocks.
Mastery check: can articulate selected topics + implementation risks; have working reference implementations.
Month 11 — Mock Marathon
This is the highest-pressure month.
- 4 mocks per week (16 total)
- Mix all 14 mock types — including grandmaster final boss (mock-14) at least twice
- Failure analysis on every mock
- Every Sunday: aggregate failure pattern, drill the top category Mon–Wed
- 60 problems (almost entirely focused on weak areas surfaced by mocks)
Volume: 60 problems, 16 mocks.
Month 12 — Polish, Re-solve, Final Readiness
- Week 49: Re-solve all Tier 5 + Tier 6 problems. Aim for 80+ Tier 6 graduates.
- Week 50: 4 cold blind mocks (no warmup, varied difficulty). Score brutally honestly.
- Week 51: Address any remaining gaps. Light schedule (10 hours total).
- Week 52: Rest, sleep, exercise. 1 final readiness check via READINESS_CHECKLIST.md. Interview confidently.
Volume: 30 re-solves, 4 mocks, 1 readiness audit.
Aggregate Volume
| Month | Problems | Labs | Mocks |
|---|---|---|---|
| 1 | 130 | 12 | 1 |
| 2 | 224 | 9 | 5 |
| 3 | 60 | 9 | 4 |
| 4 | 80 | 10 | 4 |
| 5 | 50 | 15 | 4 |
| 6 | 30 | 28 | 5 |
| 7 | 50 | 0 | 4 |
| 8 | 80 | 6 | 4 |
| 9 | 70 | 0 | 4 |
| 10 | 30 | 3 | 4 |
| 11 | 60 | 0 | 16 |
| 12 | 30 | 0 | 4 |
| Total | ~894 | ~92 | ~59 |
Contest Practice (unique to this track)
By end of Month 9, you should have completed:
- 12+ Codeforces Div 3/4 contests
- 6+ Codeforces Div 2 contests
- 6+ AtCoder Beginner Contests
- 3+ AtCoder Regular Contests
Goal Codeforces rating after 12 months: 1700–1900 (Expert range).
What Distinguishes This Track
vs 6-month Serious:
- ~1.5× the problems
- ~1.5× the labs (especially in Phase 7 / 12)
- ~1.7× the mocks (mostly in Months 8–11)
- Real contest experience (vs 0 in 6-month track)
- Multi-language runtime depth
- Phase 12 awareness (selective implementation)
vs Competitive-only training:
- This track does NOT skip practical engineering, system thinking, communication, or runtime depth
- A candidate trained purely on competitive programming often struggles in mock-08 (staff practical) and mock-10 (system-heavy) — this track does not have that gap
Honest Limitations
Even after this track:
- You will not reach grandmaster (CF 2400+) — that requires sustained contest practice for years
- You will not be a domain expert in a specific area (e.g., compilers, distributed consensus) — that requires actual work on those systems
- 12 months of part-time prep cannot replicate 5 years of full-time engineering experience
What it can do is take you from intermediate to elite-candidate level — enough to interview confidently for very hard roles and have a fighting chance.
6-Month Serious Track — Implementation
This is the execution layer for schedules/6_MONTH_SERIOUS.md. The schedule tells you what to study and when. This folder is the actual substance — curated flagship problems, original company-sourced problems, graduated reading, proof-first solutions, and senior/staff/principal-level signal layers.
What This Is (And Is Not)
This IS:
- 91 hand-picked flagship problems across 20 active weeks
- 6 ORIGINAL problems sourced from real Amazon/AWS, Google, Meta, Microsoft interview intelligence (not on LeetCode)
- A graduated 15-section reading layout per problem: approach → hints → insight → company adversarial → level delta → follow-ups
- Python solutions with brute-force comparators and stress tests
- Above-FAANG difficulty content targeting senior, staff, and principal interviews
This is NOT:
- A duplicate of LeetCode Premium. You should have LC Premium. We never restate problem statements or constraints — those are one click away.
- Every problem from the 6-month schedule (~595). The other ~500 are listed as “Problem Bank” in each week’s README — work them on LC after you own the flagship.
- A passive read. Each problem has an Attempt Gate — you set a timer and try the problem cold before reading anything else.
Why This Exists
LeetCode editorials give you code. They do not give you:
- The exchange argument that proves the greedy works
- The invariant that the two-pointer maintains
- The misleading example a Google interviewer will use to trap you on this exact problem
- The scorecard language the interviewer writes when you do (or do not) ace it
- The Level Delta: what a Mid vs Senior vs Staff vs Principal answer looks like for this same problem
- The production reality: if this ran at 10M RPS, what would break first?
- The Anti-pattern: the wrong-but-tempting approach that catches 70% of candidates, with exact bug location
- The original problem Amazon will ask you that has never appeared on LeetCode
This track provides all of those.
How To Use
- Read HOW_TO_USE.md — the graduated reading protocol
- Read FRAMEWORK.md once if you have not — every “How to Approach” section here references its 16 steps
- Start at month-01-foundations/, week 01, problem 01. Do not skip ahead.
- For each problem: Attempt Gate first (20-min timer, cold), then graduate through the README sections
- Stress-test your solution: `python solution.py` runs the brute-vs-optimal comparator
- Log every solve in your tracking spreadsheet per SPACED_REPETITION.md
Problem Index (All 91)
Month 1 — Foundations (20 problems)
| # | Problem | Difficulty | Source | Week |
|---|---|---|---|---|
| p01 | Two Sum | Easy | LC 1 | 1 |
| p02 | Valid Parentheses | Easy | LC 20 | 1 |
| p03 | Best Time to Buy and Sell Stock | Easy | LC 121 | 1 |
| p04 | Merge Sorted Array | Easy | LC 88 | 1 |
| p05 | Climbing Stairs | Easy | LC 70 | 1 |
| p06 | Product of Array Except Self | Medium | LC 238 | 2 |
| p07 | Group Anagrams | Medium | LC 49 | 2 |
| p08 | Rotate Array | Medium | LC 189 | 2 |
| p09 | Longest Substring Without Repeating | Medium | LC 3 | 2 |
| p10 | Valid Anagram | Easy | LC 242 | 2 |
| p11 | Binary Tree Level Order Traversal | Medium | LC 102 | 3 |
| p12 | Kth Largest Element | Medium | LC 215 | 3 |
| p13 | Search in Rotated Sorted Array | Medium | LC 33 | 3 |
| p14 | Maximum Depth of Binary Tree | Easy | LC 104 | 3 |
| p15 | Merge Intervals | Medium | LC 56 | 3 |
| p16 | LRU Cache | Medium | LC 146 | 4 |
| p17 | Number of Islands | Medium | LC 200 | 4 |
| p18 | Coin Change | Medium | LC 322 | 4 |
| p19 | Binary Tree Right Side View | Medium | LC 199 | 4 |
| p20 | Word Search | Medium | LC 79 | 4 |
Month 2 — Patterns Mastery (20 problems)
| # | Problem | Difficulty | Source | Week |
|---|---|---|---|---|
| p21 | 3Sum | Medium | LC 15 | 5 |
| p22 | Trapping Rain Water | Hard | LC 42 | 5 |
| p23 | Subarray Sum Equals K | Medium | LC 560 | 5 |
| p24 | Minimum Size Subarray Sum | Medium | LC 209 | 5 |
| p25 | Maximum Product Subarray | Medium | LC 152 | 5 |
| p26 | Find Minimum in Rotated Sorted Array | Medium | LC 153 | 6 |
| p27 | Daily Temperatures | Medium | LC 739 | 6 |
| p28 | Meeting Rooms II | Medium | LC 253 | 6 |
| p29 | Largest Rectangle in Histogram | Hard | LC 84 | 6 |
| p30 | Jump Game II | Medium | LC 45 | 6 |
| p31 | Course Schedule | Medium | LC 207 | 7 |
| p32 | Pacific Atlantic Water Flow | Medium | LC 417 | 7 |
| p33 | Number of Connected Components | Medium | LC 323 | 7 |
| p34 | Word Ladder | Hard | LC 127 | 7 |
| p35 | Lowest Common Ancestor of BST | Medium | LC 235 | 7 |
| p36 | Combination Sum | Medium | LC 39 | 8 |
| p37 | Unique Paths | Medium | LC 62 | 8 |
| p38 | Implement Trie | Medium | LC 208 | 8 |
| p39 | Find Median from Data Stream | Hard | LC 295 | 8 |
| p40 | Decode Ways | Medium | LC 91 | 8 |
Month 3 — Graphs + DP Deep (20 problems)
| # | Problem | Difficulty | Source | Week |
|---|---|---|---|---|
| p41 | Network Delay Time | Medium | LC 743 | 9 |
| p42 | Cheapest Flights Within K Stops | Medium | LC 787 | 9 |
| p43 | Shortest Path in Binary Matrix | Medium | LC 1091 | 9 |
| p44 | Word Ladder II | Hard | LC 126 | 9 |
| p45 | Clone Graph | Medium | LC 133 | 9 |
| p46 | Critical Connections in a Network | Hard | LC 1192 | 10 |
| p47 | Alien Dictionary | Hard | LC 269 | 10 |
| p48 | Min Cost to Connect All Points | Medium | LC 1584 | 10 |
| p49 | Is Graph Bipartite? | Medium | LC 785 | 10 |
| p50 | Reconstruct Itinerary | Hard | LC 332 | 10 |
| p51 | Longest Increasing Subsequence | Medium | LC 300 | 11 |
| p52 | Edit Distance | Hard | LC 72 | 11 |
| p53 | Partition Equal Subset Sum | Medium | LC 416 | 11 |
| p54 | Regular Expression Matching | Hard | LC 10 | 11 |
| p55 | Interleaving String | Medium | LC 97 | 11 |
| p56 | Burst Balloons | Hard | LC 312 | 12 |
| p57 | Palindrome Partitioning II | Hard | LC 132 | 12 |
| p58 | Strange Printer | Hard | LC 664 | 12 |
| p59 | Dungeon Game | Hard | LC 174 | 12 |
| p60 | Number of Ways to Wear Different Hats | Hard | LC 1434 | 12 |
Month 4 — Greedy, Advanced DS, Practical Engineering (20 problems)
| # | Problem | Difficulty | Source | Week |
|---|---|---|---|---|
| p61 | Queue Reconstruction by Height | Medium | LC 406 | 13 |
| p62 | Gas Station | Medium | LC 134 | 13 |
| p63 | Task Scheduler | Medium | LC 621 | 13 |
| p64 | Minimum Number of Arrows | Medium | LC 452 | 13 |
| p65 | Split Array Largest Sum | Hard | LC 410 | 13 |
| p66 | Range Sum Query Mutable | Medium | LC 307 | 14 |
| p67 | Count of Smaller Numbers After Self | Hard | LC 315 | 14 |
| p68 | Implement strStr() (KMP) | Easy | LC 28 | 14 |
| p69 | Repeated DNA Sequences | Medium | LC 187 | 14 |
| p70 | Sliding Window Maximum | Hard | LC 239 | 14 |
| p71 | Design LRU Cache (distributed follow-ups) | Medium | LC 146 + extensions | 15 |
| p72 | Design Hit Counter | Medium | LC 362 | 15 |
| p73 | Design Autocomplete System | Hard | LC 642 | 15 |
| p74 | LFU Cache | Hard | LC 460 | 15 |
| p75 | ORIGINAL — Consistent Hashing Ring | Hard | Amazon/AWS | 15 |
| p76 | Design In-Memory File System | Hard | LC 588 | 16 |
| p77 | ORIGINAL — Distributed Rate Limiter | Hard | Meta Infra | 16 |
| p78 | Text Justification | Hard | LC 68 | 16 |
| p79 | ORIGINAL — Service Mesh Min Deployment | Hard | Google SRE | 16 |
| p80 | Basic Calculator II | Medium | LC 227 | 16 |
Month 5 — Hards + Concurrency + Above-FAANG (10 problems)
| # | Problem | Difficulty | Source | Week |
|---|---|---|---|---|
| p81 | Median of Two Sorted Arrays | Hard | LC 4 | 17 |
| p82 | Russian Doll Envelopes | Hard | LC 354 | 17 |
| p83 | First Missing Positive | Hard | LC 41 | 17 |
| p84 | Minimum Window Substring | Hard | LC 76 | 17 |
| p85 | ORIGINAL — Concurrent Web Crawler with Backpressure | Hard | Amazon | 18 |
| p86 | Design Bounded Blocking Queue | Medium | LC 1188 | 18 |
| p87 | ORIGINAL — S3 Cost-Optimal Eviction | Hard | AWS | 18 |
| p88 | N-Queens | Hard | LC 51 | 19 |
| p89 | Trapping Rain Water II | Hard | LC 407 | 19 |
| p90 | ORIGINAL — Distributed Job DAG Scheduler | Hard | | 19 |
Month 6 — Polish + Mock Marathon
| Capstone | Source | Week |
|---|---|---|
| p91 — Skyline Problem | LC 218 | 20 |
| MOCK_MARATHON.md | — | 21 |
| RE_SOLVE_GUIDE.md | — | 23 |
| FAILURE_PATTERNS.md | — | 21–24 |
| FINAL_READINESS.md | — | 24 |
Cross-References to the Rest of the Workspace
- FRAMEWORK.md — universal 16-step problem-solving framework
- CODE_QUALITY.md — quality bar every `solution.py` must meet
- COMMUNICATION.md — what to say in interviews
- FAILURE_ANALYSIS.md — 16-category failure taxonomy
- SPACED_REPETITION.md — 6-tier review schedule
- READINESS_CHECKLIST.md — final binary checklist
- phase-02-patterns/ — concept docs for patterns (3Sum, sliding window, etc.)
- phase-04-graphs/ — graph algorithm concept docs
- phase-05-dp/ — DP concept docs and labs
Honest Promise
If you complete every problem in this track honestly — Attempt Gate respected, hints used sparingly, follow-ups answered before reading the answers, stress tests passing, every “When to Move On” checklist green — you will be in the top 5% of candidates entering senior, staff, and principal interview loops at Amazon, Google, Meta, Microsoft, Apple, and infrastructure-specialty companies.
If you skip the Attempt Gates, peek at hints early, or skim the Level Delta sections without honestly self-assessing — this is just a list of problems you’ve seen. Seen ≠ owned. Owned is what gets offers.
How To Use This Track — The Graduated Reading Protocol
Every problem folder has three files. They are meant to be read in a specific order with specific gates between them. Reading them out of order destroys the value.
The Three Files
pXX-problem-name/
README.md ← Graduated reading: 15 sections, each gated
solution.py ← Brute + optimal + stress test (Python, runnable)
hints.md ← 5 progressive hints (only opened when stuck)
The Per-Problem Reading Protocol
Phase 1 — Cold Attempt (mandatory)
- Open `README.md`, read only sections 1–2 (Quick Context + LeetCode Link/Attempt Gate)
- Click the LeetCode link, read the problem statement on LC
- Set a 20-minute timer. Code on paper or in a scratch file. No IDE autocomplete that helps with the algorithm. No reading further in README.md. No `hints.md`.
- If you solve it: write down your time and approach. Now jump to Phase 3.
- If 20 min passes: write down where you got stuck. Move to Phase 2.
Why the gate? You cannot build pattern-recognition by reading approaches. You build it by failing to find approaches, then learning the gap. If you skip the cold attempt, you are training your recognition memory, not your derivation skill. Interviews test derivation.
Phase 2 — Hint Ladder (only if stuck)
Open hints.md. Read one hint. Set another 10-min timer. Try again.
- Solved after Hint 1 → write down which insight unlocked it. Move to Phase 3.
- Still stuck after 10 min → next hint.
- After Hint 5 without solving → you’ve hit the conceptual gap. Open `README.md` section 4 (“How to Approach”) and read straight through. Then Phase 3.
Rule: never read two hints back to back. Always 10 minutes between hints.
Phase 3 — The Real Learning
Now read README.md sections 3–15 in order. This is where the value is.
- Section 3 (Prerequisites): if any link goes to a phase lab you haven’t done, do that lab first
- Section 4 (How to Approach): compare to YOUR approach. Where did you diverge? Why?
- Section 6 (Deeper Insight): the proof or invariant. You must be able to restate this in your own words before moving on.
- Section 7 (Anti-Pattern): “did I almost do this?” — if yes, this is a flag for your weakness log
- Section 10 (Company Context): mental note of how the company you’re targeting twists this problem
- Section 12 (Level Delta): honestly: which level was your answer? Mid, Senior, Staff, or Principal?
- Section 13 (Follow-ups): cover the answer with your hand. Try each follow-up. Then read the answer.
Phase 4 — Code & Test
- Open `solution.py`. Read the brute force. Read the optimal.
- Close the file. Re-implement the optimal from scratch in your own scratch file.
- Diff your re-implementation against the reference. Identify every difference. Some differences are style (fine). Some are bugs (not fine).
- Run the stress test: `python solution.py`. It must pass.
- Now run your own implementation against the stress test. It must also pass.
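The track's `solution.py` files are not reproduced here, so the following is only a sketch of what a brute-vs-optimal comparator could look like. It uses Maximum Subarray (LC 53) as a stand-in problem; the function names, trial count, and input ranges are assumptions.

```python
import random


def max_subarray_brute(nums: list) -> int:
    """O(N^2) reference: try every non-empty subarray."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best


def max_subarray_optimal(nums: list) -> int:
    """Kadane's algorithm: O(N) time, O(1) extra space."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)   # extend the current run or restart at x
        best = max(best, cur)
    return best


def stress(trials: int = 1000, seed: int = 0) -> None:
    """Compare both implementations on small random inputs."""
    rng = random.Random(seed)
    for t in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        expected = max_subarray_brute(nums)
        got = max_subarray_optimal(nums)
        assert got == expected, (
            f"trial {t}: nums={nums} expected={expected} got={got}"
        )


if __name__ == "__main__":
    stress()
    print("all trials passed")
```

To test your own re-implementation, swap it in for `max_subarray_optimal`. Small arrays with small values keep any failing case short enough to trace by hand.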
Phase 5 — Move-On Gate
Walk through Section 9 (“When to Move On”) — the binary checklist. Every box must be honestly checked. If even one is no, you stay on this problem (re-do tomorrow). No exceptions.
Log the solve in your tracking spreadsheet:
- Date, problem, time-to-solve, hint depth used (0–5), follow-ups answered correctly (count/total), Level Delta self-assessment
- Schedule next review per SPACED_REPETITION.md
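If you would rather generate the tracking spreadsheet than type it, the fields listed above map onto a small record type. This is a hypothetical layout: the class name, column names, and `to_csv` helper are assumptions, not a format defined by SPACED_REPETITION.md.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class SolveRecord:
    date: str               # ISO date of the solve, e.g. "2025-01-06"
    problem: str            # e.g. "p01 Two Sum"
    minutes_to_solve: int
    hint_depth: int         # number of hints used, 0-5
    followups_correct: int
    followups_total: int
    level_delta: str        # "Mid", "Senior", "Staff", or "Principal"


def to_csv(records: list) -> str:
    """Render records as CSV text: a header row, then one row per solve."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(SolveRecord)])
    for record in records:
        writer.writerow(astuple(record))
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv([SolveRecord("2025-01-06", "p01 Two Sum", 7, 0, 3, 4, "Mid")]))
```

The resulting text can be pasted into or imported by any spreadsheet tool.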
What Each README.md Section Is For
| # | Section | Read When |
|---|---|---|
| 1 | Quick Context | Before cold attempt |
| 2 | LeetCode Link + Attempt Gate | Before cold attempt |
| 3 | Prerequisite Concepts | After cold attempt; do prereq labs if needed |
| 4 | How to Approach | After Hint 5 fails OR after solving |
| 5 | Progressive Hints (→ hints.md) | When stuck, one at a time |
| 6 | Deeper Insight | Always, after solving |
| 7 | Anti-Pattern Analysis | Always — check if you fell into it |
| 8 | Skills & Takeaways | Always — note analogous problems |
| 9 | When to Move On | Mandatory gate |
| 10 | Company Context | Read for companies you’re targeting |
| 11 | Interviewer’s Lens | Always — internalize scorecard language |
| 12 | Level Delta | Self-assess honestly |
| 13 | Follow-ups & Answers | Attempt each cold, then read |
| 14 | Full Solution Walkthrough | After re-implementing |
| 15 | Beyond the Problem | Always — production reality |
Common Mistakes (Do Not Do)
- Reading README.md before the Attempt Gate. Destroys the entire training value.
- Reading all hints at once. Same problem.
- Skipping the “re-implement from scratch” step. Reading code ≠ writing code.
- Skipping the Level Delta self-assessment. This is the single highest-signal section.
- Skimming Section 10 (Company Context). This is where the differentiated content lives.
- Marking “When to Move On” green without honestly checking each box. Calcifies bad habits.
- Not stress-testing your own implementation. A solution that passes LC may have a bug a stress test catches.
- Skipping originals (p75, p77, p79, p85, p87, p90). These are the highest-value problems in the track. They’re original because they don’t exist on LC — meaning your competition hasn’t seen them either.
Time Budget Per Problem
| Difficulty | Cold Attempt | Hints + Re-attempt | Reading + Re-implement | Total |
|---|---|---|---|---|
| Easy | 20 min | 10–20 min | 30 min | ~60–70 min |
| Medium | 25 min | 20–40 min | 45 min | ~90–110 min |
| Hard | 30 min | 30–60 min | 60 min | ~120–150 min |
| Original (any) | 30 min | 20–40 min | 60–90 min | ~110–160 min |
Plan accordingly when fitting problems into the weekly schedule.
The One Rule
If you remember nothing else: the Attempt Gate is not optional. Every other shortcut is recoverable. Skipping the cold attempt is not — it permanently degrades the training signal for that problem.
Month 1 — Foundations
Weeks 1–4 · 20 flagship problems · ~115 LC Bank problems · 1 mock
Goals
By end of month, you can:
- Execute the 16-step framework on every problem without thinking about it
- Solve any LC Easy in <12 min with full edge-case coverage
- Solve straightforward Mediums (array/string/hashmap/tree/basic-DP) in <30 min
- Explain hashmap collision handling, GC, stack vs heap in your primary language
- Pass mock-02 (Easy) at 7+/10
Weekly Map
| Week | Theme | Flagship Problems | Phase Reading |
|---|---|---|---|
| 1 | Execution baseline + first patterns | p01–p05 | phase-00, phase-01 §1–3 |
| 2 | Arrays + strings + hashmaps deep | p06–p10 | phase-01 §1–3 runtime, phase-09 your lang first half |
| 3 | Heaps, binary search, trees | p11–p15 | phase-01 §4–7 |
| 4 | LRU, DFS/BFS, DP intro, backtracking intro | p16–p20 | phase-01 §8–9, phase-09 second half |
End-of-Month Gate
- All 20 flagship problems: Section 9 (“When to Move On”) checklist green
- All Phase 1 mastery checks in phase-01-foundations/README.md pass
- mock-02 scored 7+/10
- Tracking spreadsheet has 20 entries with Level Delta self-assessment
If any item fails: do NOT enter Month 2. Repeat the weakest week’s drilling.
Why These 20 Problems
These are the 20 most-asked Easy-and-low-Medium problems in 2024–2025 Big Tech interview loops (cross-referenced against company tag data, real-time interview reports, and recruiter intel). They are the floor. If you cannot solve all 20 fluently, you will fail the phone screen. The problems are not hard; the issue is that failing to finish an Easy within 12 minutes is an instant no-hire signal.
Week 1 — Execution Baseline + First Patterns
Days 1–7 · 5 flagship problems · ~25 LC Bank · 0 mocks
Goals
- Internalize the 16-step framework on trivial problems (so you can use it on hard ones)
- Two-pass over Phase 0 labs
- Acquire HashMap-1-pass, Stack-validation, Greedy-1-pass, Two-pointer-from-end, 1D-DP-intro patterns
Daily Schedule
| Day | Reading | Flagship | Bank |
|---|---|---|---|
| Mon | FRAMEWORK.md re-read; phase-00 labs 1–3 | p01 Two Sum | 3 LC Easies |
| Tue | phase-00 labs 4–5 | p02 Valid Parentheses | 3 LC Easies |
| Wed | phase-00 labs 6–7 | p03 Best Time Buy/Sell | 3 LC Easies |
| Thu | phase-01 §1 Arrays | p04 Merge Sorted Array | 3 LC Easies |
| Fri | phase-01 §2 Strings + §6 Heap intro | p05 Climbing Stairs | 3 LC Easies |
| Sat | Re-solve p01–p05 unaided | — | 5 LC Easies + REVIEW |
| Sun | COMMUNICATION.md + spaced repetition logging | — | 5 LC Easies |
LC Bank (Problems to solve on your own after flagship)
LC 217 (Contains Duplicate), 169 (Majority Element), 268 (Missing Number), 53 (Maximum Subarray), 136 (Single Number), 283 (Move Zeroes), 26 (Remove Duplicates from Sorted Array), 27 (Remove Element), 1 (Two Sum — variant), 9 (Palindrome Number), 14 (Longest Common Prefix), 28 (strStr — naive), 35 (Search Insert Position), 66 (Plus One), 67 (Add Binary), 69 (Sqrt(x) — binary search intro), 88 (Merge Sorted Array — variant), 100 (Same Tree), 101 (Symmetric Tree), 104 (Maximum Depth — preview), 108 (Sorted Array → BST), 112 (Path Sum), 118 (Pascal’s Triangle), 226 (Invert Binary Tree), 543 (Diameter of Binary Tree).
Readiness Gate
- All 5 flagship problems Section 9 checklists green
- 25+ Bank problems solved unaided
- Framework Steps 1–9 executed audibly (talk through) on at least 10 problems
- No off-by-one errors on 5 consecutive binary-search-flavored problems
- Honest self-assessment: Level Delta = Mid or above on at least 3 flagships
p01 — Two Sum
Source: LeetCode 1 · Easy · Topics: Array, Hash Table
Companies (2024–2025 frequency): Amazon (very high), Google (high), Meta (high), Apple (medium), Microsoft (medium), Bloomberg (very high)
Loop position: phone screen warmup, or first 10 min of onsite to calibrate
1. Quick Context
This is the most-asked Easy in Big Tech history. The interviewer is not testing whether you can solve it — they expect you to solve it in <8 minutes. They are testing whether you:
- Clarify before coding (duplicates? multiple answers? sorted?)
- State the brute force out loud before optimizing
- Pick the right optimal (one-pass hash, not two-pass)
- Handle the “what about the same element twice” edge case
- Communicate cleanly through a problem you’ve obviously seen before
What it looks like it tests: array iteration. What it actually tests: disciplined communication under “I’ve done this a million times” complacency. Senior candidates fail this by going too fast and skipping clarifications. The interviewer is watching for the framework, not the answer.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/two-sum/
STOP. Set a 15-minute timer. Code it cold in a scratch file. Do not read past this section until you have either solved it or the timer expired.
If you’ve solved Two Sum before: do it again anyway, and time yourself. Target: 6 min including narration. If you’re over 8 min, you have a framework/communication gap, not an algorithm gap — and that gap will kill you on harder problems.
3. Prerequisite Concepts
- Hash table average O(1) lookup + the assumption that makes it true (phase-01 §3 HashMap Mastery)
- “Complement search” pattern: instead of looking for pairs, transform to lookup of (target − x)
- In your primary language: how hash collisions are handled and what a hash table resize costs (see phase-09)
4. How to Approach (FRAMEWORK Steps 1–9 applied)
Step 1 — Restate: “Given an integer array and a target integer, return the indices of two distinct elements that sum to target. Exactly one valid pair exists per problem statement.”
Step 2 — Clarify (ask out loud, do NOT skip even though it’s Easy):
- “Can the same element be used twice?” (No — indices must be distinct.)
- “If multiple pairs sum to target, which one do I return?” (Problem says exactly one solution — but in real interviews they may relax this; ask.)
- “Can the array be empty / size 1?” (Per constraints, N ≥ 2.)
- “Are values bounded? Can they be negative?” (Per LC, yes, both negative and positive in int32 range.)
- “Is the array sorted?” (No. If it were, you’d use two pointers — explicitly call this out; this is a senior signal.)
- “Return indices or values?” (Indices, in any order.)
Step 3 — Constraints: N up to 10^4 in the classic statement. O(N²) brute force is about 10^8 operations at that size, which is borderline and may TLE on some judges, so O(N) is expected.
Step 4 — Examples (build your own):
- [2,7,11,15], target=9 → [0,1] (the given one)
- [3,3], target=6 → [0,1] ← critical: the “same value, different indices” case
- [-3,4,3,90], target=0 → [0,2] (negative handling)
- [1,2,3,4], target=100 → does not occur per statement, but ask: “if no solution, what do I return?”
Step 5 — Brute Force: Nested loop. For each i, check every j > i. O(N²) time, O(1) space.
Step 6 — Brute Force Complexity: Time O(N²), space O(1). Trivially correct. State this out loud BEFORE optimizing.
Step 7 — Pattern Recognition: “Given an array, find a pair satisfying a property” + “complement is computable” + “order of result doesn’t matter” → HashMap complement search. (Sorted + ordered → two pointers. Unsorted + complement-computable → HashMap.)
Step 8 — Optimize: Walk the array once. For each x at index i, compute complement = target − x. If complement is in the map, return (map[complement], i). Else add x → i to the map. One pass, not two. Two-pass works but signals weaker intuition.
Step 9 — Prove correctness: Loop invariant: after processing index i, the map contains exactly {nums[0..i] : their indices} for every distinct value (or the latest index for duplicates, but since exactly one solution exists, duplicates can only be the answer pair — and at index j, the complement was stored at index i < j, so we find it).
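Steps 5 through 9 can be sketched as follows. This is a minimal illustration, not the contents of solution.py: the function names mirror the ones the walkthrough mentions, but the bodies and the empty-list return for "no pair" are assumptions made here.

```python
from typing import Dict, List


def two_sum_brute(nums: List[int], target: int) -> List[int]:
    """Step 5: check every pair i < j. O(N^2) time, O(1) space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []  # assumption: empty list when no pair exists


def two_sum(nums: List[int], target: int) -> List[int]:
    """Step 8: one-pass complement search. O(N) time, O(N) space."""
    seen: Dict[int, int] = {}  # value -> index, for indices strictly before i
    for i, x in enumerate(nums):
        j = seen.get(target - x)  # check first ...
        if j is not None:
            return [j, i]
        seen[x] = i  # ... insert second
    return []


print(two_sum([2, 7, 11, 15], 9))
print(two_sum([3, 3], 6))
```

Both functions return indices in increasing order, so they can be compared directly on inputs with exactly one valid pair.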
5. Progressive Hints
If you’re stuck for more than 5 minutes, open hints.md and read one hint only. Set another 5-min timer between hints.
6. Deeper Insight — Why It Works
The transformation: “Find two numbers summing to T” is a 2D search (i, j pairs) → O(N²). By computing T − nums[i] for each i, we reduce to a 1D lookup (“does this value exist?”) which is O(1) amortized with a hash table. The hash table is the data structure that turns 2D pair-search into 1D existence-search. This is the master pattern for entire problem families: 3Sum, 4Sum, Subarray-Sum-K, etc., all reduce 1 dimension via complement-hashing.
The single-pass insight: You don’t need to load the full array into the map first. By the time you encounter the second element of the answer pair, the first one has already been stored. So insert AFTER you check — never before, or [3,3] breaks (you’d find 3 → 0 and return [0,0], two equal indices).
Order matters: check first, insert second. Reversing this is the single most common bug. The map state at step i represents “everything seen STRICTLY BEFORE i” — that’s the invariant.
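To see the failure concretely, here is a deliberately buggy sketch that inserts before it checks. The function name is hypothetical and exists only to demonstrate the bug described above.

```python
from typing import Dict, List


def two_sum_insert_first(nums: List[int], target: int) -> List[int]:
    """Deliberately WRONG: the map already contains index i when we look up."""
    seen: Dict[int, int] = {}
    for i, x in enumerate(nums):
        seen[x] = i               # bug: insert happens before the check
        j = seen.get(target - x)  # may find the element we just stored
        if j is not None:
            return [j, i]
    return []


print(two_sum_insert_first([3, 3], 6))  # pairs index 0 with itself
```

On [3,3] with target 6 the lookup at index 0 finds the entry stored one line earlier, so the function returns the same index twice.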
7. Anti-Pattern Analysis
Wrong-but-tempting #1 — Two-pass with enumerate:
d = {x: i for i, x in enumerate(nums)}
for i, x in enumerate(nums):
    j = d.get(target - x)
    if j is not None and j != i:
        return [i, j]
This works but: (a) O(N) extra memory written then read again, (b) the j != i guard signals you didn’t think about why one-pass avoids the issue, (c) for duplicates [3,3], the dict only stores the last index — works only because j != i saves you. Two-pass is a code smell that says “I memorized the pattern but didn’t internalize the invariant.”
Wrong-but-tempting #2 — Sort then two-pointer:
sorted_nums = sorted(enumerate(nums), key=lambda p: p[1])
# ... two-pointer scan
Works, but O(N log N) when O(N) exists. Some candidates do this because two-pointer is their hammer. At Google in particular, choosing N log N when N exists is marked down on the algorithmic complexity rubric.
Wrong-but-tempting #3 — Brute force “to be safe”: The brute force fits N ≤ 10⁴ barely. If the interviewer expands the constraint to 10⁵, brute force times out. They WILL expand the constraint to test you — be ready.
8. Skills & Takeaways
Generalizable pattern: Complement search via HashMap. Any time you see “find pair/triple/group summing or differing or relating to value V”, first ask: can I express the missing piece as a function of what I have? If yes, HashMap that function’s range.
Analogous problems (do these on LC after):
- LC 167 — Two Sum II (sorted input — uses two pointers instead; teaches when NOT to use hashmap)
- LC 653 — Two Sum IV BST (BFS + set; same complement pattern, different traversal)
- LC 1 vs LC 15 (3Sum) — the recursive extension; outer loop fixes one element, reduces to 2Sum
- LC 560 — Subarray Sum Equals K (the prefix-sum complement variant — same idea, applied to running sums)
- LC 454 — 4Sum II (split into two halves, hash one half, lookup complements of the other — meet-in-the-middle flavor)
When NOT to use this: Sorted input (two pointers is O(1) space, hashmap is O(N)). Streaming input where you can’t afford O(N) memory.
9. When to Move On (binary; must all be YES)
- I solved p01 unaided in <8 min including narration on the second attempt
- I can state the loop invariant of one-pass Two Sum without looking
- I can explain why “check first, insert second” matters, with the [3,3] example
- I can name when two-pointer is preferred over hashmap (sorted input, O(1) space requirement)
- I implemented this from scratch and my version passes stress_test() in solution.py
- I solved LC 167 (Two Sum II) and articulated why the optimal approach changed
- I solved LC 560 (Subarray Sum K) and recognized the same complement-hashing pattern
If any unchecked: redo tomorrow before moving to p02.
10. Company Context
Amazon (LP-heavy; coding bar = “are bugs going to ship?”)
- The framing: Often given as “given a list of Order objects with a price field, find two orders whose prices sum to the target promo discount.”
- Misleading example: They’ll give you [5, 5, 5, 5], target=10. Many candidates lock onto “return the first two indices that work”, but Amazon interviewers want you to ASK whether you should return any valid pair or the first in some defined order. Asking shows Customer Obsession (knowing the requirement).
- Adversarial extension: “Now there could be millions of orders streamed one at a time. How does your solution change?” → streaming with a HashSet, return the first matching pair.
- What they want to hear: “Let me clarify the requirements before I optimize.” Verbatim phrases like “Let me first state the brute force so we have a baseline” earn rubric points.
Google (algorithmic complexity is a hard rubric line)
- The framing: Often the cleanest, just the original problem.
- Misleading example: A small-N example where O(N²) clearly fits. Google interviewers do this to see if junior candidates will say “brute force is fine” and stop. The right move: “Brute force fits this size, but I want to do better as a habit — and to handle the extension where N is 10⁶.”
- Adversarial extension: “Return all pairs that sum to target.” Now it’s not 1 answer; you must dedupe pairs (3Sum-style logic).
- What they want to hear: Explicit complexity for brute and optimal, stated separately. “O(N²) brute → O(N) one-pass hashmap, O(N) extra space.”
Meta (heavy on follow-ups; expect 3–4 in 25 min)
- The framing: “Two Sum, then variants in rapid succession.”
- Misleading example: They start with the canonical example, then immediately follow up: “Now what if the array is sorted?” If you stayed on hashmap, you scored less than the candidate who pivots to two-pointers.
- Adversarial extension: “What if duplicates are allowed in input AND we want all unique pairs?” → 2Sum-with-duplicates → introduces dedup via sort or a counted-set.
- What they want to hear: Recognition that the optimal algorithm depends on input properties. The phrase “if the input were sorted, I’d use two pointers” wins them over.
Microsoft (clarity + cleanliness over speed)
- The framing: Phone screen warmup.
- Misleading example: None — Microsoft tends to be straightforward here.
- Adversarial extension: “Now make it work for any K-Sum” (K is a parameter).
- What they want to hear: Clean function decomposition, clear variable names, edge-case enumeration. Microsoft phone screens reward boring, correct code.
Bloomberg (financial framing — be ready)
- The framing: “Given a list of trade prices and a settlement amount, find two trades that exactly settle to the amount.”
- Misleading example: Includes negative prices (refunds). Candidates who hard-code i < j ordering may break on negatives if they switch to a sort approach mid-stream.
- What they want to hear: Explicit “I’m assuming negative values are allowed; my hash approach handles them naturally.”
11. Interviewer’s Lens
| Phase | Strong signal | Weak signal | Scorecard phrase (strong) |
|---|---|---|---|
| Reading problem | Asks 3+ clarifying questions even though it’s Easy | Says “oh I’ve seen this” and dives in | “Disciplined clarifying behavior — would translate to fewer production bugs” |
| Pre-coding | States brute force, then states optimal, with complexities for both | Jumps to “I’ll use a hashmap” without justifying | “Communicates derivation, not just memorization” |
| Coding | Names variable complement, comments invariant once | Uses d, m, no comments | “Code is interview-readable; would pass our internal code review bar” |
| Edge cases | Tests [3,3] before submission | Tests only the given example | “Self-catches bugs before code review — strong production instinct” |
| Post-coding | Articulates time AND space complexity unprompted | Says only “it’s linear” | “Owns full complexity analysis” |
The scorecard line that gets you the offer: “Candidate demonstrated framework discipline on a trivial problem, suggesting it will scale to hard ones.”
The scorecard line that loses you: “Skipped clarifying questions; rushed to known answer; did not test [3,3]; missed senior signal opportunity.”
12. Level Delta
| Level | What their answer looks like |
|---|---|
| Mid | One-pass hashmap solution. States O(N) time, O(N) space. Tests given example. ~10 min. |
| Senior | All of Mid + clarifies 3+ questions upfront + explicitly states brute force first + tests [3,3] + mentions “if sorted, I’d use two pointers” + completes in ~7 min. |
| Staff | All of Senior + articulates loop invariant before coding + names the complement-search pattern by family + connects to LC 15 (3Sum) and LC 560 (Subarray Sum K) as the same family + mentions hash collision worst-case (O(N²)) as a footnote + completes in ~6 min. |
| Principal | All of Staff + asks “what’s the production context — are these orders, transactions, ad bids?” + identifies that for very large N you’d shard the hashmap or use a Bloom-filter prefilter + mentions GC pressure from large dict allocation as an Amazon/Google production concern + offers the streaming variant unprompted + completes in ~5 min with time to discuss tradeoffs. |
Honest self-assessment: Which level was YOUR answer? If “Mid”, you have 4 sections to add to your toolkit. If “Senior” — good baseline; aim for Staff on the next 5 problems.
13. Follow-up Questions & Full Answers
Follow-up 1: “What if the array is sorted?”
Signal sought: Do you recognize that input properties change the optimal algorithm?
Full answer: “If sorted, I’d switch to two pointers: left=0, right=N-1. If sum > target, decrement right; if sum < target, increment left; if equal, return. O(N) time, O(1) space (better than hashmap’s O(N) space). The correctness comes from monotonicity: incrementing left only increases the sum; decrementing right only decreases it. We never miss the answer because at each step, the element we move past cannot pair with any element still in range.”
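A minimal sketch of that two-pointer scan, assuming ascending input and 0-based indices (LC 167 itself asks for 1-based indices); the function name is illustrative.

```python
from typing import List


def two_sum_sorted(nums: List[int], target: int) -> List[int]:
    """Two pointers on ascending input. O(N) time, O(1) extra space."""
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            return [left, right]
        if s < target:
            left += 1    # nums[left] is too small for every remaining partner
        else:
            right -= 1   # nums[right] is too large for every remaining partner
    return []  # assumption: empty list when no pair exists


print(two_sum_sorted([1, 2, 3, 4, 6], 6))
```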
Follow-up 2: “What if there are multiple valid pairs and we want all unique ones?”
Signal sought: Can you handle dedup without explosive complexity?
Full answer: “Two approaches. (a) Sort + two pointers + skip duplicates on both sides — O(N log N) time, O(1) extra. (b) Hashmap + use a set of frozensets to dedup result pairs — O(N) time, O(N) extra. Approach (a) is preferred unless we cannot mutate input. The key insight: dedup happens by skipping equal adjacent values after sorting, not by post-filtering.”
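Approach (a) might look like the sketch below. It sorts a copy rather than mutating the input, so it uses O(N) extra space instead of the O(1) quoted above; the function name and the choice to return value pairs (not indices) are assumptions.

```python
from typing import List, Tuple


def unique_pairs(nums: List[int], target: int) -> List[Tuple[int, int]]:
    """All distinct value pairs (a, b), a <= b, with a + b == target."""
    vals = sorted(nums)  # copy; use nums.sort() if mutating the input is allowed
    pairs: List[Tuple[int, int]] = []
    left, right = 0, len(vals) - 1
    while left < right:
        s = vals[left] + vals[right]
        if s < target:
            left += 1
        elif s > target:
            right -= 1
        else:
            pairs.append((vals[left], vals[right]))
            left += 1
            right -= 1
            # Dedup by skipping equal neighbours, not by post-filtering.
            while left < right and vals[left] == vals[left - 1]:
                left += 1
            while left < right and vals[right] == vals[right + 1]:
                right -= 1
    return pairs


print(unique_pairs([1, 1, 2, 2, 3, 3], 4))
```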
Follow-up 3: “Now there are billions of integers streamed one at a time, infinite stream. Detect any sum-pair as fast as possible.”
Signal sought: Streaming / unbounded-input thinking.
Full answer: “Use a HashSet (not a Map: we just need existence). For each incoming x: check (target − x) in set; if found, emit the pair; else add x to set. O(1) per element, but memory is O(N) and grows without bound on an infinite stream. To bound memory: use a Bloom filter as a prefilter (false positives OK; we verify by querying upstream), trading bounded memory for occasional false alarms. If we need bounded memory AND zero false positives, we accept that we may miss pairs: fundamentally we cannot remember an arbitrary stream. State this tradeoff explicitly.”
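The exact-set part of that answer can be sketched as a generator. This covers only the unbounded-memory variant; the Bloom-filter prefilter is not shown, and the function name is hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def stream_pairs(stream: Iterable[int], target: int) -> Iterator[Tuple[int, int]]:
    """Emit (earlier, current) whenever the current value completes a pair."""
    seen: Set[int] = set()  # grows with the number of distinct values seen
    for x in stream:
        if target - x in seen:
            yield (target - x, x)
        else:
            seen.add(x)


print(list(stream_pairs([1, 5, 3, 4, 2], 6)))
```

Because it is a generator, a caller that only needs the first match can stop consuming after one `next()` call.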
Follow-up 4 (Hard): “Distribute across 100 machines. Each holds 1% of the array. Find a pair summing to target.”
Signal sought: Distributed systems thinking, communication cost awareness.
Full answer: “Broadcast the target. The array is split arbitrarily, so first give every value a home machine: hash(value) mod 100. In one shuffle round, each machine sends every local x to x’s home machine and also sends a query (target − x, x, machine_id) to the home machine of (target − x). Each receiving machine builds a set of the values it received and checks whether it holds the complement named in each query (and that it is a different element, for the x = target − x case). Communication is O(N) total messages, O(N/100) per machine assuming an even hash spread. Latency: 1 shuffle round. Correctness: every valid pair (x, y) with x + y = target is checked because x’s query is routed by y = target − x to y’s home machine, which is where y was sent. Caveat: if the array is so large that even local sets don’t fit in RAM, we partition further or stream from disk with external-sort-style processing.”
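A single-process simulation can make the shuffle concrete. This sketch assumes each value is routed to a home machine chosen by value mod M, and that each element also sends a query for its complement to the complement's home machine; the names and the in-memory "machines" are illustrative, not a real distributed framework.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Elem = Tuple[int, int]  # (source machine, offset within that machine's shard)


def distributed_two_sum(shards: List[List[int]], target: int) -> Optional[Tuple[Elem, Elem]]:
    """Simulate one shuffle round followed by purely local matching."""
    m = len(shards)
    values: List[Dict[int, List[Elem]]] = [defaultdict(list) for _ in range(m)]
    queries: List[List[Tuple[int, Elem]]] = [[] for _ in range(m)]

    # Shuffle: every element goes to its home, every query to the complement's home.
    for s, shard in enumerate(shards):
        for k, x in enumerate(shard):
            values[x % m][x].append((s, k))
            queries[(target - x) % m].append((target - x, (s, k)))

    # Local resolution: no further communication is needed.
    for machine in range(m):
        for wanted, asker in queries[machine]:
            for owner in values[machine].get(wanted, ()):
                if owner != asker:  # an element may not pair with itself
                    return asker, owner
    return None


print(distributed_two_sum([[2, 11], [7, 15]], 9))
```

The result identifies the two elements by (machine, offset), which is what a coordinator would need to fetch the original records.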
Follow-up 5 (Senior signal): “How would you test this code?”
Signal sought: Testing discipline, not just unit tests.
Full answer: “Four layers. (1) Unit: given example, [3,3], negatives, two valid pairs choosing one. (2) Edge: minimum N=2, maximum constraint. (3) Property test: random N, random ints, brute-force comparator — assert both find the same pair (modulo order). My solution.py does this via stress_test(). (4) Adversarial fuzz: hash-collision DoS inputs — known pathological inputs that cause O(N²) hashmap behavior. Production code would also include perf regression tests and memory profiling for large N.”
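Layer (3) could be sketched like this. It is not the stress_test() from solution.py; the harness name, input sizes, and value ranges are assumptions. Since several pairs can be valid, it checks the property (distinct indices, correct sum) rather than exact index equality.

```python
import random
from itertools import combinations
from typing import Callable, List


def brute(nums: List[int], target: int) -> List[int]:
    """Oracle: first pair of distinct indices whose values sum to target."""
    for (i, a), (j, b) in combinations(enumerate(nums), 2):
        if a + b == target:
            return [i, j]
    return []


def stress(solver: Callable[[List[int], int], List[int]],
           trials: int = 1000, seed: int = 0) -> int:
    """Run solver on random inputs that are guaranteed to contain a valid pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 12)
        nums = [rng.randint(-20, 20) for _ in range(n)]
        i, j = rng.sample(range(n), 2)
        target = nums[i] + nums[j]  # plant a solution
        got, want = solver(nums, target), brute(nums, target)
        for a, b in (got, want):
            assert a != b and nums[a] + nums[b] == target, (nums, target, got)
    return trials


print(stress(brute, trials=200))
```

Passing the solver as a parameter lets the same harness check any candidate implementation against the oracle.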
14. Full Solution Walkthrough
See solution.py.
The file has four sections:
- two_sum_brute(nums, target): nested loop, O(N²). This is your correctness oracle.
- two_sum(nums, target): the one-pass hashmap. Note the order: check first, insert after. A comment in the code marks the invariant.
- stress_test(): generates 1000 random arrays, runs both, and asserts that each returned pair sums to target. This is the bar: every flagship problem has a stress test.
- __main__: runs the given example, [3,3], a negative case, then the stress test.
Decisions justified in the file:
- Why seen.get(complement) instead of if complement in seen: return [seen[complement], i]: one hash lookup instead of two.
- Why we return as a list [a, b] not a tuple: matches LC signature exactly.
- Why no input validation: per the framework, we validate at system boundaries; interview code assumes valid input per the problem statement.
15. Beyond the Problem — Production Reality
At 10M RPS:
- The hashmap allocation per request becomes a GC pressure point. In production, you’d pool the map or use a primitive-keyed map (e.g., Long2IntOpenHashMap from the fastutil library in Java, or a dict allocated once and .clear()’d in Python).
- For very large N per request, the O(N) memory dominates. Spilling to off-heap or using a compact open-addressing hashmap would matter.
Real system this is the kernel of:
- Ad bid matching: given a target CPM, find two bids that sum to the publisher’s floor. Same algorithm, with bid-objects carrying metadata. Real ad exchanges (Google Ad Exchange, Meta Audience Network) do variants of this billions of times per second.
- Promo discount calculator: e-commerce platforms match “if customer buys X and Y, the bundle costs the promo target.”
- Settlement matching at exchanges: pair buy and sell orders that exactly clear.
Principal-engineer code review comment: “Why is this a one-off function? In our codebase, complement-search-against-hashmap is a building block. Extract find_pair_by_property(items, key_fn, target) so we can reuse for 3Sum, k-Sum, prefix-sum, and the promo-bundle code path. Also: thread safety? If this map is shared across requests, we have a race.”
p02 — Valid Parentheses
Source: LeetCode 20 · Easy · Topics: String, Stack
Companies: Amazon (high), Google (medium), Meta (medium), Microsoft (high), Bloomberg (high), Salesforce (very high)
Loop position: phone screen, sometimes paired with p80 (Basic Calculator II) at onsite
1. Quick Context
The canonical stack problem. Looks trivial; the trap is the early-return logic and the empty-stack-on-close case. Senior candidates lose points by writing 15 lines when 8 suffice, by forgetting to check that the stack is empty at the end, or by not asking whether non-bracket characters can appear.
What it tests: Disciplined use of the right data structure (stack), not loop-and-counter hacks. Anti-signal: counting opens and closes separately (“balanced count”) — that fails on "(]". If you proposed this and didn’t catch it yourself, you’ve failed the cognitive-trap check.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/valid-parentheses/
STOP. Set a 12-min timer. Solve cold. Do not read on until done or timed out.
3. Prerequisite Concepts
- LIFO semantics of a stack; why “matching last-opened” is intrinsically a stack property — phase-01 §5
- Mapping closer → opener via a constant dict (O(1) lookup, more idiomatic than chained if)
4. How to Approach
Restate: “Given a string of ()[]{} characters, return true iff every opener has a matching closer of the right type, in the right nesting order.”
Clarify:
- “Can the string contain other characters?” (LC: no, only brackets — but ASK, real interviews vary.)
- “Empty string?” (LC: valid = true. Yes, common gotcha.)
- “Maximum length?” (Implies whether we care about stack overflow in recursive solutions; iterative + explicit stack avoids that anyway.)
Constraints: N up to 10⁴. O(N) expected.
Examples to build:
- "()[]{}" → true (sequential pairs)
- "([{}])" → true (full nesting)
- "(]" → false ← the counter-fail case
- "]" → false (close-first; empty-stack case)
- "((" → false (unmatched open at end; non-empty-stack-at-end case)
- "" → true (empty)
Brute force: Repeatedly scan and remove adjacent (), [], {} pairs until no change. If empty, valid. O(N²).
Pattern: “Match most recent unclosed thing” → stack. Period.
Optimal: One pass. For each char: if opener, push. If closer, pop and verify match. At end, stack must be empty.
Proof of correctness: Stack invariant: at any point, the stack contains exactly the unmatched openers in order of opening. A closer matches iff it pairs with the top (most recent opener). If a closer arrives with an empty stack, it has no match → false. If the stack is non-empty at the end → unmatched openers → false.
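The optimal approach and its invariant can be sketched in a few lines. This is an illustration rather than the is_valid from solution.py, and it assumes the input contains only bracket characters, as on LC.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}  # closer -> opener


def is_valid(s: str) -> bool:
    """Stack holds the unmatched openers, oldest first. O(N) time and space."""
    stack = []
    for c in s:
        if c in PAIRS:
            # Empty stack or wrong opener type on top: no valid match.
            if not stack or stack.pop() != PAIRS[c]:
                return False
        else:
            stack.append(c)  # assumption: anything that is not a closer is an opener
    return not stack  # leftover openers mean the string is invalid


for case in ["()[]{}", "([{}])", "(]", "]", "((", ""]:
    print(repr(case), is_valid(case))
```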
5. Progressive Hints
If stuck >5 min: hints.md. One at a time.
6. Deeper Insight — Why It Works
Why a stack and not a counter? A counter (opens - closes) is sufficient for a single bracket type — but it cannot detect "(]". The stack is necessary because we need to remember not just that a bracket is open, but which type. Each push commits the type; each pop verifies the type. The stack is the minimum data structure that captures both “how many open” and “which order/type.”
Why we check empty BEFORE popping: If we blindly pop an empty stack on a closer, we error out (or in some languages, get undefined behavior). The check if not stack: return False handles "]", "})", etc.
Why we check empty at END: The string "((" walks through pushing twice; loop ends; stack non-empty; without the final check, we’d return True. The end-check is the second necessary correctness step.
7. Anti-Pattern Analysis
Wrong-but-tempting #1 — Count-and-compare:
if s.count('(') == s.count(')') and s.count('[') == s.count(']') and ...:
    return True
Fails on "(]" (each count is 1, returns True). This is the single most common bug junior candidates ship. If you proposed this, you missed the type-ORDER requirement.
Wrong-but-tempting #2 — Regex / replace loop:
while '()' in s or '[]' in s or '{}' in s:
    s = s.replace('()','').replace('[]','').replace('{}','')
return s == ''
Works but O(N²) and signals you reached for a hammer. Interviewer’s note: “didn’t recognize stack pattern.”
Wrong-but-tempting #3 — Forgetting end-check:
for c in s:
    if c in '([{': stack.append(c)
    elif c in ')]}':
        if not stack or pairs[c] != stack.pop(): return False
return True  # ← BUG: returns True for "((" because stack non-empty but no closer caught it
The fix is return not stack. Forgetting this is the #2 bug here.
8. Skills & Takeaways
Generalizable pattern: “Match most recent unresolved item” → stack. Applies to:
- LC 32 — Longest Valid Parentheses (stack of indices, harder cousin)
- LC 84 — Largest Rectangle in Histogram (monotonic stack — same family)
- LC 739 — Daily Temperatures (monotonic stack — same family)
- LC 71 — Simplify Path (each segment as stack frame)
- LC 394 — Decode String (stack of (count, prefix) frames)
- LC 1249 — Minimum Remove to Make Parentheses Valid (stack of indices)
When NOT to use: Single bracket type, just balance counting → counter is sufficient and uses O(1) space.
9. When to Move On
- Solved unaided in <8 min on second attempt
- Tested "(]", "]", "((", "" without prompting
- Can articulate why a counter fails for multiple bracket types
- Implemented with constant closer → opener map (not chained ifs)
- Stress test in solution.py passes
- Solved LC 1249 and recognized the same pattern with index tracking
10. Company Context
Amazon
- The framing: Often paired with “and write the matched pairs as (open_idx, close_idx)” — extends to index-tracking variant.
- Misleading example: They give "{[()]}" and let you confirm true. Then sneak in "{[(])}", which looks balanced character-counts-wise; it trips candidates who didn’t catch the type-order failure.
- Extension: “Now allow <> too.” → tests whether your code generalizes (constant dict makes it 1-line; chained ifs make it 4 new branches).
Salesforce (this is THEIR favorite)
- The framing: Often appears in onsite, not phone screen — they care about the cleanliness more than the algorithm.
- What they want to hear: “I’ll use a stack and a dictionary mapping closers to openers.” That sentence alone is a green check.
- Adversarial extension: “What if [ could match ) (mixed mode)?” → tests whether your code is data-driven enough to change one line of config.
Microsoft
- The framing: Phone screen warmup, sometimes followed by p80 (Basic Calculator).
- What they want to hear: Single-pass justification, end-of-loop empty-check stated out loud.
Bloomberg
- The framing: Often as part of a JSON/parser problem — “validate this serialized message structure”.
- Extension: “Parse the bracketed expression into an AST.”
11. Interviewer’s Lens
| Phase | Strong | Weak | Scorecard (strong) |
|---|---|---|---|
| Reading | Asks “other characters?” “empty string?” “max length?” | Dives in | “Clarifies the contract before coding” |
| Pre-coding | Names the stack pattern explicitly | Says “I’ll iterate and track” | “Recognizes the canonical pattern” |
| Coding | Uses constant dict {')':'(', ']':'[', '}':'{'} | Chains 6+ if-branches | “Writes data-driven, extensible code” |
| Edge cases | Tests "(]", "]", "" proactively | Tests only "()" | “Anticipates failure modes” |
| Finish | States complexity and stack invariant | Says “done” | “Owns the analysis” |
12. Level Delta
| Level | Answer |
|---|---|
| Mid | Stack solution, works, ~8 min. Tests only the given example. |
| Senior | + clarifies upfront + tests "(]" + uses constant dict + end-empty-check articulated. |
| Staff | + names “most-recent-unresolved” pattern family + connects to monotonic stack problems + addresses “what if char set were configurable” before being asked. |
| Principal | + asks production context (“are we validating Markdown? JSON? code?”) + identifies that real validators need position tracking for error messages + offers extension to track unmatched positions for IDE-style red squiggles + mentions that for adversarial input we’d cap stack depth to prevent DoS. |
13. Follow-up Questions & Full Answers
Q1: “Return the index of the first invalid character instead of true/false.”
Signal: Can you adapt without rewriting?
Answer: Track current index i; when popping mismatches or empty-pop happens, return i. At end if stack non-empty, return the index of the unmatched opener at stack top (so push (char, index) tuples, not just chars). One-pass O(N), no extra time cost.
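One way to sketch this adaptation is shown below. The -1 sentinel for "valid" and the function name are assumptions; the interviewer may want a different contract.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}  # closer -> opener


def first_invalid_index(s: str) -> int:
    """Return -1 if s is valid, else the index of the first offending bracket."""
    stack = []  # (opener, index) tuples
    for i, c in enumerate(s):
        if c in PAIRS:
            if not stack or stack[-1][0] != PAIRS[c]:
                return i  # closer with no opener, or with the wrong opener
            stack.pop()
        else:
            stack.append((c, i))
    # Unmatched opener: report the one on top of the stack, as described above.
    return stack[-1][1] if stack else -1


print(first_invalid_index("{[(])}"))
```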
Q2: “Mixed bracket modes: also accept [) and (] as valid in this protocol.”
Signal: Data-driven code awareness.
Answer: Replace {')':'(', ']':'[', '}':'{'} with {')':'([', ']':'[(', '}':'{'} and change the match check from stack.pop() == pairs[c] to stack.pop() in pairs[c]. Zero algorithm change — only the config dict moves. Demonstrates extensibility.
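A sketch of the data-driven version, where the pair table is a parameter and each closer maps to a string of acceptable openers. The table names STRICT and MIXED and the function name are illustrative.

```python
from typing import Dict

STRICT: Dict[str, str] = {")": "(", "]": "[", "}": "{"}
MIXED: Dict[str, str] = {")": "([", "]": "[(", "}": "{"}  # also accepts [) and (]


def is_valid_with(s: str, pairs: Dict[str, str]) -> bool:
    """Same stack algorithm; only the config table differs between modes."""
    openers = set("".join(pairs.values()))
    stack = []
    for c in s:
        if c in pairs:
            if not stack or stack.pop() not in pairs[c]:
                return False
        elif c in openers:
            stack.append(c)
        else:
            return False  # assumption: unknown characters are rejected
    return not stack


print(is_valid_with("(]", STRICT), is_valid_with("(]", MIXED))
```

Adding a new bracket type such as <> is then a one-entry change to the table rather than a new branch.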
Q3 (Hard): “Parse and evaluate the matched expression as a Lisp-like AST.”
Signal: Stack-of-frames extension.
Answer: Each frame is a list of tokens accumulating between matching opens. On open: push current frame, start new. On close: pop, append the inner frame to the outer. Same stack discipline, now each entry is a list, not a char. This is the bridge from p02 to LC 394 (Decode String) and to p80 (Basic Calculator).
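The stack-of-frames idea can be sketched on bracket-only input, producing nested lists. This covers only the parsing half; there are no tokens to evaluate, and the function name and ValueError behaviour are assumptions.

```python
from typing import List

PAIRS = {")": "(", "]": "[", "}": "{"}  # closer -> opener


def parse_nested(s: str) -> List[list]:
    """Turn a valid bracket string into nested lists, one list per bracket pair."""
    frames: List[list] = [[]]  # frames[0] is the root; the last entry is the open frame
    openers: List[str] = []    # opener characters, parallel to frames[1:]
    for c in s:
        if c in PAIRS:
            if not openers or openers.pop() != PAIRS[c]:
                raise ValueError(f"unmatched closer {c!r}")
            inner = frames.pop()      # close the current frame ...
            frames[-1].append(inner)  # ... and attach it to its parent
        elif c in PAIRS.values():
            openers.append(c)
            frames.append([])         # start a new frame
        else:
            raise ValueError(f"unexpected character {c!r}")
    if openers:
        raise ValueError("unmatched opener")
    return frames[0]


print(parse_nested("(()())"))
```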
Q4: “Now there are millions of strings per second — same algorithm, but optimize for throughput.”
Signal: Production thinking.
Answer: Three wins. (a) Replace Python list-as-stack with a fixed-size preallocated array + integer top pointer, which eliminates allocation. (b) Use a 256-entry lookup table mapping char codes to (type, is_opener) instead of dict lookup, for a branch-free hot loop. (c) For a truly hot path, write a SIMD-friendly state machine in C, but only after profiling shows this is the bottleneck.
Q5: “How would you test it?”
Signal: Testing discipline.
Answer: (1) Unit: all 6 fail modes + valid + empty. (2) Property test: brute (replace-loop) vs optimal on random bracket strings, must agree. My solution.py does this. (3) Adversarial: deeply nested input (depth = 10⁵), which confirms the solution is iterative and does not overflow the call stack. (4) Fuzz: random non-bracket chars to confirm the “ask about other characters” clarification.
14. Full Solution Walkthrough
See solution.py.
- is_valid_brute: repeated replace loop. Correct but O(N²). Used as oracle.
- is_valid: stack + closer→opener dict. Three correctness branches: (i) push on open, (ii) check-then-pop on close (empty + mismatch both → False), (iii) final stack-empty check.
- stress_test: generates random bracket strings (balanced and unbalanced), asserts brute and optimal agree.
15. Beyond the Problem
Real systems this is the kernel of:
- JSON / XML / YAML parsers — every structural parser uses this exact stack discipline. Errors in production parsers (e.g., “Unexpected end of input”) are the end-empty-check firing.
- IDE bracket matching — your editor’s “matching bracket” highlight runs this algorithm on every keystroke, scoped to a window around the cursor.
- Code linters: tools such as ESLint surface unbalanced brackets as parse errors with line and column, which is this check with position tracking.
- Network protocols with framing (e.g., BSON, MessagePack) use stack-based parsers.
Principal-engineer code review comment: “We have three places in the codebase that re-implement bracket matching (JSON parser, query parser, markdown renderer). They’ve each drifted to handle their edge cases differently. Extract a generic match_pairs(input, pair_table, on_unmatched) and centralize. The bug we hit last quarter was the JSON parser missing the end-check; the markdown one had the check but the wrong error message.”
p03 — Best Time to Buy and Sell Stock
Source: LeetCode 121 · Easy · Topics: Array, DP, Greedy
Companies: Amazon (very high), Bloomberg (very high), Meta (high), Apple (medium), Uber (medium)
Loop position: phone screen, or first warmup before harder DP onsite
1. Quick Context
This is the “single-transaction maximum profit” problem and the entry door to a 6-problem ladder (LC 121, 122, 123, 188, 309, 714) that culminates in the hardest stock problems on LC. Mastering p03 properly — by recognizing it as “max forward-difference with constraint i < j” and NOT brute force — unlocks the whole family.
What it looks like it tests: array iteration. What it actually tests: Whether you see the invariant transformation: instead of trying all (buy, sell) pairs, track the minimum buy-price seen-so-far and compute profit-if-sold-today. This is a one-pass O(N), O(1) algorithm; the brute force is O(N²).
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/best-time-to-buy-and-sell-stock/
12-min timer. Cold attempt. No reading on.
3. Prerequisite Concepts
- “Running min/max” pattern — phase-02 §3 Prefix Sums (same family — running aggregate)
- Greedy correctness: why local-min + global-max-of-(today - min) is globally optimal
4. How to Approach
Restate: Given prices indexed by day, pick a buy-day i and a sell-day j > i to maximize prices[j] - prices[i]. If no profit possible, return 0. One transaction only.
Clarify:
- “Can I sell on the same day I buy?” (No: strict j > i.)
- “If prices monotonically decrease, return 0 or negative?” (Per LC: 0, meaning no transaction is made.)
- “Multiple transactions allowed?” (No, this is LC 121. LC 122 is the multi-transaction version. ASK to confirm.)
- “Length bounds?” (LC: up to 10⁵ → O(N²) will TLE.)
Examples:
- [7,1,5,3,6,4] → 5 (buy at 1, sell at 6)
- [7,6,4,3,1] → 0 (no profit possible)
- [1] → 0 (no transaction possible: single price)
- [2,4,1] → 2 (buy at 2, sell at 4, NOT buy at 1, since no sell day comes after it; common trap)
Brute force: All (i, j) pairs with j > i; track max diff. O(N²) time, O(1) space.
Pattern recognition: “Max value of a[j] - a[i] with i < j” → equivalent to “at each j, what’s the min of a[0..j-1]?” → running-min + per-element subtract → O(N).
Optimal:
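def max_profit(prices):
    min_so_far = float("inf")  # cheapest price on any earlier day
    best = 0                   # 0 means "make no trade"
    for price in prices:
        best = max(best, price - min_so_far)  # sell today against an earlier min
        min_so_far = min(min_so_far, price)   # only now may today count as a buy day
    return best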
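Order matters: Compute best BEFORE updating min_so_far, so that min_so_far covers only strictly earlier days (j > i). If you update first, the same-day “trade” contributes price - price = 0, which is harmless on LC 121, but the same habit of reading a state already updated for today causes real bugs in the state-machine variants (for example LC 309 with cooldown).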
Correctness proof (greedy): For any optimal pair (i*, j*) with j* > i*, when the loop reaches j*, min_so_far ≤ prices[i*] (because i* ≤ j*-1, and min_so_far covers prices[0..j*-1]). So prices[j*] - min_so_far ≥ prices[j*] - prices[i*] = optimal. Since we take the max over all j, we capture this value.
5. Progressive Hints
hints.md — one at a time, 5-min timer.
6. Deeper Insight — Why It Works
The transformation: A 2D search “find (i, j) max difference” becomes 1D “at each j, what’s the best historical min?” by recognizing that for any fixed j, the optimal i is argmin(prices[0..j-1]). We don’t need to remember which i — just its value. This is the same compression that powers Kadane’s algorithm (maximum subarray): instead of trying all subarrays, track the best one ending here.
Why O(1) space: The running-min subsumes all history we need. We never look back; we only look at the current price vs. the cheapest ever.
Connection to Kadane’s algorithm: If you compute diffs[i] = prices[i+1] - prices[i], then LC 121 becomes “max sum subarray over diffs” — Kadane’s algorithm. This equivalence is a Staff-level observation.
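The equivalence rests on a telescoping sum: diffs[i] + ... + diffs[j-1] = prices[j] - prices[i], so the best contiguous run of diffs is the best single trade. A short sketch, assuming the empty subarray (no trade, profit 0) is allowed; the function name follows the `max_profit_kadane` mentioned for solution.py, but the body is illustrative.

```python
def max_profit_kadane(prices):
    """LC 121 as maximum-sum subarray over day-to-day differences."""
    best = 0      # empty subarray = no transaction = profit 0
    current = 0   # best sum of a diff-subarray ending at the current position
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        # Restart the subarray whenever the running sum would go negative.
        current = max(0, current + diff)
        best = max(best, current)
    return best


print(max_profit_kadane([7, 1, 5, 3, 6, 4]))  # 5
print(max_profit_kadane([7, 6, 4, 3, 1]))     # 0
```

Resetting `current` to 0 plays the same role as lowering `min_so_far` in the running-min version: both discard a buy day that can no longer be part of the best trade.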
7. Anti-Pattern Analysis
Wrong #1 — Two pointers from both ends: Some try left = 0, right = N-1, shrink to find max. Doesn’t work: the optimal pair isn’t necessarily at the extremes.
Wrong #2 — Sort: Sorting destroys the time-order constraint. The buy-day must come before the sell-day in original order.
Wrong #3 — Update min before max:
for p in prices:
    min_so_far = min(min_so_far, p)  # ← wrong order
    best = max(best, p - min_so_far)
On LC 121 this gives the same answer, because the only extra candidate is buy = sell = i with profit 0. In stateful variants such as LC 309 (cooldown), the analogous mistake of reading an already-updated state lets a trade skip a required step and inflates profit.
Wrong #4 — Greedy “buy every local min”: Confuses LC 121 (single transaction) with LC 122 (multiple). Read the prompt.
8. Skills & Takeaways
Pattern: running-min/max + per-element decision. Direct applications:
- LC 122 — Buy/Sell II (multiple tx → sum positive diffs greedily)
- LC 123 — Buy/Sell III (at most 2 tx → DP over states)
- LC 188 — Buy/Sell IV (at most k tx → generalized DP)
- LC 309 — with cooldown (state machine DP)
- LC 714 — with fee (state machine DP)
- LC 53 — Maximum Subarray (Kadane — same family via diff transform)
- LC 152 — Maximum Product Subarray (track both running min and max because negatives)
9. When to Move On
- Solved unaided <8 min, O(N), O(1)
- Tested decreasing input, single-element, two-element
- Articulated the “running min + per-step profit” transformation
- Connected to Kadane’s algorithm via the diff trick
- Stress test passes
- Solved LC 122, LC 53; saw the family resemblance
10. Company Context
Amazon
- The framing: “You’re shown daily stock prices for a company. What’s the most an investor could have made with one buy and one sell?”
- Misleading example: They give [2, 4, 1] to bait the “buy at 1” mistake. The trap: 1 is the LAST day, no sell possible after.
- Adversarial extension: “Now they can do multiple transactions” (→ LC 122) immediately followed by “now there’s a $1 fee per transaction” (→ LC 714). Tests whether you generalize cleanly.
Bloomberg (terminal company — they LIVE on time-series)
- The framing: Often pure LC 121, sometimes with timestamps not indices (test if you read the order from the input).
- What they want: Recognition that this is a time-series invariant problem. The phrase “I’ll track the running minimum” is a green check.
- Extension: “What if prices stream in?” → same algorithm; the running-min works incrementally.
Meta
- The framing: Followed RAPIDLY by LC 122, then “at most k transactions” (LC 188).
- What they want: You finish LC 121 in 5 minutes so they get to LC 188. If you spend 15 min on LC 121, you’ll never reach the real interview question.
Uber
- Frame: “Surge pricing history — best moment to launch a promotional ride.”
- Extension: “What if some days are weekends and weekend buys are forbidden?” → tests masking, not algorithm.
11. Interviewer’s Lens
| Phase | Strong | Weak | Scorecard |
|---|---|---|---|
| Reading | Confirms “single transaction” + “j > i strict” | Assumes multi-tx | “Verifies the contract” |
| Pre-coding | States O(N²) brute, then O(N) optimal with proof sketch | Jumps to “iterate and track” | “Derives, doesn’t memorize” |
| Coding | Updates best before min_so_far | Wrong order, gets lucky on LC 121 | “Subtle correctness awareness” |
| Edge | Tests decreasing array, single price | Tests only sample | “Anticipates degenerate inputs” |
| Finish | Connects to Kadane / LC 122 family | Says “done” | “Sees the pattern family” |
12. Level Delta
| Level | Answer |
|---|---|
| Mid | One-pass running-min, ~10 min. O(N), O(1). Correct. |
| Senior | + clarifies single vs multi tx + tests decreasing array + states correctness invariant. |
| Staff | + names the “max subarray of diffs” equivalence (Kadane) + offers to immediately extend to LC 122. |
| Principal | + asks production context (algo trading? backtest? UI dashboard?) + notes that real backtests need transaction fees, slippage, position size — and that “max profit” alone is a wrong metric (drawdown, Sharpe matter) + mentions that on real streams you’d window the running-min for non-stationarity. |
13. Follow-up Questions & Full Answers
Q1: “Now allow unlimited transactions.” → LC 122.
Answer: Sum every positive consecutive diff: sum(max(0, prices[i+1] - prices[i]) for i in range(N-1)). Proof: each maximal rising run from a local min to the next local max contributes (max - min), and the positive diffs inside that run telescope to exactly that amount; falling runs contribute nothing. O(N) one pass, O(1) space.
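A one-function sketch of that answer; the name `max_profit_unlimited` is made up for this example.

```python
def max_profit_unlimited(prices):
    """LC 122: capture every upward day-to-day move."""
    return sum(
        max(0, prices[i + 1] - prices[i])
        for i in range(len(prices) - 1)
    )


print(max_profit_unlimited([7, 1, 5, 3, 6, 4]))  # 7  (1→5 and 3→6)
print(max_profit_unlimited([1, 2, 3, 4, 5]))     # 4  (one rising run)
```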
Q2: “At most k transactions.” → LC 188.
Answer: DP. State: dp[t][i] = max profit using ≤ t transactions through day i. Transition: dp[t][i] = max(dp[t][i-1], max over j<i of (dp[t-1][j-1] + prices[i] - prices[j])), reading dp[t-1][-1] as 0 for a buy on day 0. Naively O(k·N²); optimize the inner max to O(1) by tracking max(dp[t-1][j-1] - prices[j]) as we scan, giving O(k·N). When k ≥ N/2 the cap never binds (N days hold at most N/2 profitable non-overlapping trades), so the problem degenerates to unlimited tx (LC 122); handle this case separately.
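One way to realize the O(k·N) version is sketched below, keeping only the previous row of the table. The function name `max_profit_k` and the rolling-row layout are choices made for this example, not something the text prescribes.

```python
def max_profit_k(k, prices):
    """LC 188 sketch: at most k transactions, O(k*N) time, O(N) space."""
    n = len(prices)
    if n < 2 or k <= 0:
        return 0
    if k >= n // 2:
        # The cap is not binding: same as unlimited transactions (LC 122).
        return sum(max(0, prices[i + 1] - prices[i]) for i in range(n - 1))

    prev = [0] * n  # prev[i] = best profit with <= t-1 transactions through day i
    for _ in range(k):
        cur = [0] * n
        # best_buy = max over buy days j < i of (prev[j-1] - prices[j]),
        # with prev[-1] read as 0 for a buy on day 0.
        best_buy = -prices[0]
        for i in range(1, n):
            cur[i] = max(cur[i - 1], prices[i] + best_buy)
            best_buy = max(best_buy, prev[i - 1] - prices[i])
        prev = cur
    return prev[-1]


print(max_profit_k(2, [3, 2, 6, 5, 0, 3]))  # 7  (2→6, then 0→3)
print(max_profit_k(1, [7, 1, 5, 3, 6, 4]))  # 5  (reduces to LC 121)
```

Updating `best_buy` after computing `cur[i]` keeps the buy day strictly before the sell day, the same ordering discipline as in the single-transaction scan.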
Q3: “What about transaction fee f per buy-sell pair?” → LC 714.
Answer: State machine. hold[i] = max profit ending day i holding a stock; cash[i] = max profit ending day i not holding. hold[i] = max(hold[i-1], cash[i-1] - prices[i]). cash[i] = max(cash[i-1], hold[i-1] + prices[i] - f). Pay the fee when selling. O(N), O(1) with two scalars.
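A two-scalar sketch of that state machine. The function name is hypothetical; the fee is charged on the sell, as in the answer above.

```python
def max_profit_with_fee(prices, fee):
    """LC 714 sketch: unlimited transactions, `fee` paid on each sell."""
    if not prices:
        return 0
    cash = 0            # best profit so far while holding no stock
    hold = -prices[0]   # best profit so far while holding one share
    for price in prices[1:]:
        # Tuple assignment evaluates both right-hand sides from
        # yesterday's values before either name is rebound.
        cash, hold = max(cash, hold + price - fee), max(hold, cash - price)
    return cash


print(max_profit_with_fee([1, 3, 2, 8, 4, 9], fee=2))   # 8
print(max_profit_with_fee([1, 3, 7, 5, 10, 3], fee=3))  # 6
```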
Q4: “Streaming prices, infinite stream. Output the running best-possible-profit-so-far.”
Answer: Same algorithm, incremental. Maintain min_so_far and best. On each tick, update both. The answer is best at all times. O(1) per tick.
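As a sketch, the incremental version is the same two updates wrapped in an object that holds the state between ticks. The class name `RunningBestProfit` and its `on_tick` method are invented for this illustration.

```python
class RunningBestProfit:
    """Maintains the best single-trade profit over all ticks seen so far."""

    def __init__(self):
        self.min_so_far = float("inf")
        self.best = 0

    def on_tick(self, price):
        # Same order as the batch version: score first, then update the min.
        self.best = max(self.best, price - self.min_so_far)
        self.min_so_far = min(self.min_so_far, price)
        return self.best


tracker = RunningBestProfit()
print([tracker.on_tick(p) for p in [7, 1, 5, 3, 6, 4]])  # [0, 0, 4, 4, 5, 5]
```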
Q5: “How do you test it?”
Answer: (1) Edge: empty (or N=1) → 0, decreasing → 0, increasing → last - first. (2) Property: brute-force O(N²) vs optimal on random arrays. (3) Adversarial: arrays where the buy is on day 0 vs day N-2 — tests whether running-min update timing is right.
14. Full Solution Walkthrough
See solution.py.
Three solutions for didactic value:
- max_profit_brute: all pairs, O(N²). Oracle.
- max_profit: running-min one-pass, O(N), O(1).
- max_profit_kadane: Kadane on diffs, to demonstrate equivalence.
All three should agree under stress_test.
15. Beyond the Problem
Real systems this is the kernel of:
- Backtest engines (Zipline, Backtrader): the “perfect foresight” upper bound used to score strategies. No strategy limited to one buy and one sell can beat the single-transaction max profit on a sequence; this is the benchmark.
- A/B test analysis: “what’s the largest sustained delta we observed?” — same running-min/max.
- Latency/throughput dashboards: “what’s the largest drop in throughput?” — running-max + per-point delta.
Principal-engineer code review comment: “This algorithm assumes prices is a value-typed array. In our pipeline, prices are timestamped events with possible gaps. Either resample to fixed-interval before feeding, or change the algorithm to work on (timestamp, price) tuples and handle missing intervals. Also: what’s the contract when prices contains NaN (market closed)? Define it explicitly.”
p04 — Merge Sorted Array
Source: LeetCode 88 · Easy · Topics: Array, Two Pointers, Sorting
Companies: Facebook (high), Bloomberg (high), Microsoft (high), Adobe (medium), Apple (medium)
Loop position: phone screen or onsite warmup; often paired with a follow-up to merge k arrays (LC 23, Merge k Sorted Lists)
1. Quick Context
A deceptively simple problem with a sharp trick: you must merge in place in nums1 (which has extra trailing zeros). The naive approach (merge from the left) requires shifting → O(N²). The optimal approach (merge from the right, writing the largest elements into the back) is O(M+N), O(1) extra.
What it tests: in-place pointer manipulation and the recognition that writing backwards avoids the overwrite problem. The trap: candidates instinctively merge from the front and either get O(N²) or need an extra buffer (O(N) space) and forget that the prompt forbids it.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/merge-sorted-array/
12-min timer. Cold attempt. The “merge from the back” insight should be the first thing you reach for — if not, that’s a strong signal you need this rep.
3. Prerequisite Concepts
- Two-pointer technique — phase-02 §1
- “In-place transformation” pattern — when extra buffer is forbidden
4. How to Approach
Restate: nums1 has length m + n. Its first m entries are valid, sorted; the last n are placeholder zeros. nums2 has n valid sorted entries. Merge so that nums1 holds all m + n values sorted. Modify nums1 in place.
Clarify:
- “Can I use O(N) extra space?” (Usually no — the whole point is in-place. ASK.)
- “Are duplicates allowed?” (Yes; merge is stable.)
- “Can m or n be 0?” (Yes. Both edge cases.)
- “Are negative numbers allowed?” (Yes, per constraints.)
Examples:
- nums1=[1,2,3,0,0,0], m=3, nums2=[2,5,6], n=3 → [1,2,2,3,5,6]
- nums1=[1], m=1, nums2=[], n=0 → [1] (n=0 edge)
- nums1=[0], m=0, nums2=[1], n=1 → [1] (m=0 edge)
- nums1=[4,5,6,0,0,0], m=3, nums2=[1,2,3], n=3 → [1,2,3,4,5,6] (all of nums2 < all of nums1: tests handling when nums2 isn’t exhausted)
Brute force: Copy nums2 into the tail of nums1, then sort. O((M+N) log (M+N)). Trivially correct but wastes the “already sorted” property.
Pattern: Two sorted sequences + in-place + extra room at the END → merge from the back, largest first.
Optimal:
i = m - 1          # pointer into nums1's valid part
j = n - 1          # pointer into nums2
write = m + n - 1  # pointer into nums1's tail (where to write next)
while j >= 0:      # only need to keep going while nums2 has elements
    if i >= 0 and nums1[i] > nums2[j]:
        nums1[write] = nums1[i]
        i -= 1
    else:
        nums1[write] = nums2[j]
        j -= 1
    write -= 1
Why we loop on j >= 0, not i >= 0: If nums2 is exhausted, the remaining nums1[0..i] is already in its final position (since we wrote bigger elements to the right). If nums1’s valid prefix is exhausted (i < 0), we must still copy remaining nums2 elements. The loop condition reflects which case requires action.
Correctness: At each step we write the larger of the two unprocessed maxima into the next free slot from the right. The pointers satisfy write = i + j + 1 throughout: there are (i + 1) + (j + 1) unprocessed elements, and they will fill exactly the slots 0..write. While the loop runs, j ≥ 0, so write ≥ i + 1 > i: the slot being written is always strictly to the right of every nums1 entry we still need to read. No overwrite.
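The argument can be checked mechanically. Below is a runnable sketch of the reverse merge with the invariant asserted on every iteration. The up-front length check is an added assumption about the contract (LeetCode guarantees it; a production caller might not), and the `assert` is for demonstration only.

```python
def merge(nums1, m, nums2, n):
    """Merge sorted nums2 into sorted nums1[:m] in place, writing from the back."""
    if len(nums1) != m + n or len(nums2) != n:
        raise ValueError("expected len(nums1) == m + n and len(nums2) == n")
    i, j, write = m - 1, n - 1, m + n - 1
    while j >= 0:
        # Invariant: write == i + j + 1, hence write > i while j >= 0,
        # so the slot we write has no unread nums1 value in it.
        assert write == i + j + 1 and write > i
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[write] = nums1[i]
            i -= 1
        else:
            nums1[write] = nums2[j]
            j -= 1
        write -= 1


a = [1, 2, 3, 0, 0, 0]
merge(a, 3, [2, 5, 6], 3)
print(a)  # [1, 2, 2, 3, 5, 6]

b = [4, 5, 6, 0, 0, 0]
merge(b, 3, [1, 2, 3], 3)
print(b)  # [1, 2, 3, 4, 5, 6]
```

The second call is the case where nums1's valid prefix runs out first (i reaches -1), which is exactly what the `i >= 0` guard protects.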
5. Progressive Hints
hints.md. One at a time.
6. Deeper Insight — Why It Works
The reverse-merge insight: Merging from the front into nums1 would overwrite valid nums1 data before reading it (because the write pointer would catch up to the read pointer for nums1). Merging from the back writes into slots that are guaranteed unused (either they’re trailing zeros, or they’re slots holding values already moved).
The invariant: At any point, nums1[write+1 .. m+n-1] contains the largest (m+n-1 - write) elements of the final merged result, in sorted order. nums1[0..i] and nums2[0..j] are the unprocessed prefixes.
Why i+j+1 == write always: Initially i+j+1 = (m-1) + (n-1) + 1 = m+n-1 = write. Each iteration decrements write by 1 and decrements either i or j by 1, so the relation holds. The loop runs only while j ≥ 0, so write = i + j + 1 ≥ i + 1: the write slot is always strictly to the right of nums1[0..i], the only part of nums1 still unread. Its old contents are either padding or a value that has already been copied further right. This is the formal proof of “no overwrite.”
7. Anti-Pattern Analysis
Wrong #1 — Front merge with shift:
i, j = 0, 0
while j < n:
    if i < m and nums1[i] <= nums2[j]:
        i += 1
    else:
        # shift nums1[i..] right to make room
        nums1.insert(i, nums2[j])  # O(M+N) shift!
        nums1.pop()                # O(1): drop one trailing placeholder
        i += 1
        m += 1
        j += 1
Correct but O(N × (M+N)): up to N inserts, each shifting up to M+N elements, so quadratic when the arrays are of similar size. Senior interviewers see this and ask “can you do better?”; if you can’t pivot to the reverse-merge, you’ve revealed a gap.
Wrong #2 — Copy nums1 valid prefix to an aux buffer:
aux = nums1[:m]
# then merge aux and nums2 into nums1 from the front
Correct and O(M+N) time but O(M) extra space. Violates the in-place spirit — passes LC but loses interview points.
Wrong #3 — Sort after concat:
nums1[m:] = nums2
nums1.sort()
O((M+N) log (M+N)). Passes LC. But: “you ignored the ‘sorted input’ property — what was the point of this problem?”
Wrong #4 — Forget the i >= 0 guard in the comparison:
while j >= 0:
    if nums1[i] > nums2[j]:  # BUG when i < 0: Python's nums1[-1] silently wraps to the last element; Java/C++ read out of bounds
The guard if i >= 0 and nums1[i] > nums2[j] matters when nums1’s valid prefix is exhausted before nums2.
8. Skills & Takeaways
Pattern: write-backwards-to-avoid-overwrite. Direct applications:
- LC 26 / 27 / 80 — Remove Duplicates / Remove Element (write-forward variant)
- LC 283 — Move Zeroes
- LC 977 — Squares of a Sorted Array (write from back: largest absolute value at each end)
- LC 167 — Two Sum II (two-pointer from both ends; sibling family)
The “merge from back” trick generalizes: Any in-place merge where one container has trailing room becomes O(N) by writing backward. Used in some qsort partition tricks and in compaction passes of generational garbage collectors.
9. When to Move On
- Solved unaided <10 min using reverse-merge
- Tested m=0, n=0, all-of-nums2-smaller, equal elements
- Articulated why front-merge fails (overwrite) and back-merge succeeds (slot guaranteed free)
- Stress test passes
- Solved LC 977; saw the same write-backwards idea
10. Company Context
Facebook / Meta
- Frame: Often appears as a warmup before LC 23 (Merge k Sorted Lists). The interviewer expects you to do p04 in 5 min, then they ask “now generalize to k arrays.”
- What they want: Reverse-merge offered without prompting. The phrase “I’ll merge from the back to avoid shifting” is a green check.
- Trap: They give you nums1 with extra space already allocated. Candidates who treat nums1 as size-m and try to grow it don’t see the “trailing zeros” hint.
Bloomberg
- Frame: Often as merge of two sorted price-tick streams (“merge into a single time-ordered view”).
- Extension: “Now they may have duplicate timestamps — preserve order from stream A first.” Tests stable-merge awareness.
Microsoft
- Frame: Phone screen warmup. They watch your pointer manipulation cleanliness.
- What they want: Correct, no off-by-one, comment on the loop condition. Boring + correct = pass.
Adobe
- Frame: Often given with the explicit constraint “no extra memory”. Tests whether you internalize the in-place requirement.
11. Interviewer’s Lens
| Phase | Strong | Weak | Scorecard |
|---|---|---|---|
| Reading | Notices the trailing zeros are intentional padding | Ignores zeros, treats nums1 as size-m | “Reads spec carefully” |
| Pre-coding | States “merge from back to avoid overwrite” | Plans front-merge with shifting | “Recognizes the optimal pattern” |
| Coding | Three pointers (i, j, write); correct guards | Off-by-one, IndexError | “Disciplined pointer code” |
| Edge | Tests m=0, n=0, all-A-bigger | Tests only sample | “Tests boundary conditions” |
| Finish | Articulates the i+j+1 == write invariant | Says “done” | “Proves correctness” |
12. Level Delta
| Level | Answer |
|---|---|
| Mid | Reverse-merge, O(M+N), works. ~10 min. |
| Senior | + clarifies extra-space constraint + tests m=0/n=0 + articulates why reverse direction avoids overwrite. |
| Staff | + states the i+j+1 == write invariant formally + offers to extend to merge-k-lists with a heap + notes that stable merge order matters in some applications. |
| Principal | + asks production context (database compaction? log merge?) + notes that real merge-sort tape algorithms used this exact pattern in pre-RAM era + identifies that for cache efficiency, even on RAM, sequential write patterns matter and reverse-merge here is sequential-from-the-end (still cache-friendly). |
13. Follow-up Questions & Full Answers
Q1: “Now merge k sorted arrays into one.” → LC 23 family
Answer: Min-heap of (value, array_id, element_id). Pop smallest, append to output, push the next from the same array. O(N log k) where N is total elements. For very large k, consider tournament tree (same complexity, lower constant).
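A short sketch of the heap approach using Python's standard `heapq` module. The function name `merge_k_sorted` is made up here, and the input is assumed to be a list of already-sorted lists.

```python
import heapq


def merge_k_sorted(arrays):
    """Merge k sorted lists into one sorted list in O(N log k)."""
    # Heap entries are (value, array_id, element_id); array_id breaks ties,
    # so equal values come out in array order.
    heap = [(arr[0], array_id, 0) for array_id, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    out = []
    while heap:
        value, array_id, element_id = heapq.heappop(heap)
        out.append(value)
        nxt = element_id + 1
        if nxt < len(arrays[array_id]):
            heapq.heappush(heap, (arrays[array_id][nxt], array_id, nxt))
    return out


print(merge_k_sorted([[1, 4, 5], [1, 3, 4], [2, 6]]))  # [1, 1, 2, 3, 4, 4, 5, 6]
```

The heap never holds more than k entries, which is where the log k factor comes from.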
Q2: “What if nums1 doesn’t have extra room at the end — same length as valid data?”
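Answer: Then nums1 cannot hold all m + n values, so the result must go into a separately allocated output buffer of size m + n: the standard out-of-place merge, O(M+N) time and O(M+N) extra space. If the two sorted runs sit back to back in one buffer (as in merge sort), the textbook merge copies the smaller run aside and needs O(min(M,N)) scratch space. Linear-time merges with O(1) extra space exist in the literature, but they are intricate and not something to attempt in an interview.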
Q3: “What if both arrays are huge and stored on disk?”
Answer: External merge sort’s merge phase. Read both arrays in buffered chunks; merge into output buffer; flush when full. Classical pattern. The key complexity unit becomes I/O, not comparisons.
Q4: “Merge two sorted linked lists in place.” → LC 21
Answer: Two-pointer with rewiring. Maintain a dummy head + a tail pointer. At each step, point tail.next at whichever of a, b has the smaller val, advance that pointer, then advance tail. After the loop, append whatever remains. O(M+N), O(1) extra.
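A sketch of the rewiring, assuming a minimal `ListNode` class of the shape LeetCode uses. The `from_list` and `to_list` helpers exist only to make the example runnable.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_two_lists(a, b):
    """Merge two sorted linked lists by relinking nodes; no new nodes besides the dummy."""
    dummy = ListNode()
    tail = dummy
    while a and b:
        if a.val <= b.val:          # <= takes ties from `a` first (stable)
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a if a else b       # append whatever remains
    return dummy.next


def from_list(values):
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(node):
    out = []
    while node:
        out.append(node.val)
        node = node.next
    return out


print(to_list(merge_two_lists(from_list([1, 2, 4]), from_list([1, 3, 4]))))
# [1, 1, 2, 3, 4, 4]
```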
Q5: “How do you test?”
Answer: (1) Edge: m=0, n=0, all-A-smaller, all-A-larger, equal elements, single-element each. (2) Property: brute (sort after concat) vs optimal on random sorted inputs, must agree element-for-element. (3) Adversarial: very different sizes (m=1, n=10⁵) — confirms loop conditions handle imbalance.
14. Full Solution Walkthrough
See solution.py.
- merge_brute(nums1, m, nums2, n): append nums2 then sort. Oracle.
- merge(nums1, m, nums2, n): reverse-merge with three pointers. Mutates nums1 in place.
- stress_test: random sorted arrays of random sizes; brute vs optimal must produce identical nums1.
15. Beyond the Problem
Real systems this is the kernel of:
- Database compaction: LSM trees (LevelDB, RocksDB, Cassandra) periodically merge sorted SSTables. The merge phase is exactly this algorithm, scaled to disk-resident sorted runs with buffered I/O.
- Log merging: distributed tracing systems (Jaeger, Zipkin) merge per-shard time-sorted spans into a global view.
- External sort: when data exceeds RAM, sort runs in memory, write to disk, then merge runs — same algorithm.
- Generational garbage collectors: compaction passes merge live objects into a destination region; the “write backwards” idea appears in some collectors to avoid overwriting unmoved objects.
Principal-engineer code review: “If this is being called in a tight loop, the algorithm is fine but the API is wasteful — caller must allocate nums1 with extra capacity. Consider a builder pattern that allocates the output buffer once and reuses it, especially if M and N vary. Also: the function silently assumes nums1 has length m+n; add a precondition or a contract test, or someone will pass an under-sized array and you’ll get a confusing IndexError instead of a clear contract violation.”
p05 — Climbing Stairs
Source: LeetCode 70 · Easy · Topics: Math, DP, Memoization
Companies: Adobe (high), Amazon (medium), Apple (medium)
Loop position: the canonical “intro to DP”, almost always a warmup before a real DP question
1. Quick Context
The “Fibonacci in disguise” problem. It’s the cleanest possible 1D DP and exists to teach you the recipe: (1) define state, (2) write recurrence, (3) identify base cases, (4) decide order, (5) compress space. If you can’t articulate those 5 steps on p05, you can’t on p51 (House Robber), p55 (Jump Game), p60 (Longest Increasing Subsequence), or anywhere else.
What it looks like it tests: ability to recognize Fibonacci. What it actually tests: the DP recipe. Interviewers use this as a calibration — if you write recursion without memoization (O(2^N)), or memoization without realizing iteration is simpler, you’ve revealed your DP fluency level.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/climbing-stairs/
10-min timer. Cold attempt. If you reach for recursion-without-memo, that’s the signal you need this rep most.
3. Prerequisite Concepts
- Recursion + memoization basics — phase-01 §8
- “State + recurrence + base case” — the universal DP recipe
- Space compression: when only the last k states are needed, store only those
4. How to Approach
Restate: From step 0, reach step n, taking either 1 or 2 steps at a time. How many distinct ways?
Clarify:
- “Distinct sequences of moves, or distinct step sets?” (Sequences: 1+2 and 2+1 are different. ASK; some problems differ.)
- “n ≥ 1?” (Per LC: 1 ≤ n ≤ 45.)
- “Why n ≤ 45?” (f(45) = 1,836,311,903 is the last value that fits in a signed 32-bit int; f(46) overflows. In Python no issue. Worth a verbal acknowledgment.)
Examples:
- n=1 → 1 (just [1])
- n=2 → 2 ([1,1], [2])
- n=3 → 3 ([1,1,1], [1,2], [2,1])
- n=4 → 5 (Fibonacci pattern starts to emerge: 1, 2, 3, 5)
- n=5 → 8
Brute force (recursive): f(n) = f(n-1) + f(n-2), base f(1)=1, f(2)=2. O(2^N) time without memo — exponential.
Pattern recognition: The recurrence is Fibonacci. Anywhere you see “ways to reach state N built from prior states” with overlapping subproblems → DP.
Optimal: Iterative DP, two scalars (prev1, prev2). O(N) time, O(1) space.
if n <= 2: return n
prev2, prev1 = 1, 2  # f(1), f(2)
for i in range(3, n+1):
    prev2, prev1 = prev1, prev1 + prev2
return prev1
Recurrence justification: To reach step n, the last move was either a 1-step (from step n-1) or a 2-step (from step n-2). These are disjoint sets of paths, so total = f(n-1) + f(n-2).
5. Progressive Hints
hints.md. One at a time.
6. Deeper Insight — Why It Works
The DP recipe applied:
- State: f(n) = number of distinct ways to reach step n.
- Recurrence: f(n) = f(n-1) + f(n-2) (last move was 1 or 2).
- Base cases: f(1) = 1, f(2) = 2. (Why two base cases? Recurrence reaches back two steps; we need both grounded.)
- Order: Bottom-up (iteratively i = 3 .. n) because f(n) depends only on smaller i.
- Space compression: Only the last two values are needed, so two scalars suffice: O(1) space.
Why two base cases, not one? The recurrence f(n) = f(n-1) + f(n-2) requires f(n-2) to be defined when n=3, so we must specify both f(1) and f(2). Trying to derive f(2) from f(0) + f(1) requires defining f(0) = 1 (the empty path) — a valid alternative formulation, but the natural problem space starts at step 1.
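Closed form (Staff signal): Fibonacci has a closed form, Binet’s formula: F(n) = (φ^n − ψ^n) / √5 with φ = (1 + √5)/2 and ψ = (1 − √5)/2. Because |ψ^n / √5| < 1/2, F(n) is also φ^n / √5 rounded to the nearest integer. For interview purposes, mention it exists but don’t compute it: double-precision floating point loses exactness for n > ~70, and the problem caps at 45 anyway.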
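A small sketch comparing the rounded closed form against exact integer arithmetic. It uses the index shift f(n) = F(n+1), since f(1) = 1 and f(2) = 2 line up with F(2) and F(3) when F(1) = F(2) = 1. Both function names are invented for this example.

```python
import math


def climb_binet(n):
    """Closed-form estimate of f(n) = F(n+1), via rounding phi**(n+1) / sqrt(5)."""
    phi = (1 + math.sqrt(5)) / 2
    return round(phi ** (n + 1) / math.sqrt(5))


def climb_exact(n):
    """Exact integer answer, for comparison."""
    a, b = 1, 1  # f(0), f(1)
    for _ in range(n - 1):
        a, b = b, a + b
    return b


# Within the LeetCode range the two agree.
assert all(climb_binet(n) == climb_exact(n) for n in range(1, 46))
print(climb_binet(5), climb_exact(5))  # 8 8
```

Extending the comparison range well past the problem's cap is a quick way to see where doubles stop being exact on a given machine.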
Matrix exponentiation (Staff signal): Fibonacci can be computed in O(log N) via [[1,1],[1,0]]^n. Overkill for n ≤ 45; useful when n is large or when the recurrence generalizes (e.g., f(n) = a·f(n-1) + b·f(n-2)).
7. Anti-Pattern Analysis
Wrong #1 — Plain recursion, no memo:
def climb(n):
    if n <= 2: return n
    return climb(n-1) + climb(n-2)
O(2^N) as a loose bound (tight: about 1.618^N). At n=45 that is roughly 2.3 billion calls. TLE for sure. Interviewer’s note: “didn’t recognize overlapping subproblems.”
Wrong #2 — Memo with dict overhead:
from functools import lru_cache

@lru_cache(maxsize=None)
def climb(n):
    if n <= 2: return n
    return climb(n-1) + climb(n-2)
Works in O(N) but uses O(N) call stack and dict memory. Iterative is strictly better — no recursion limit risk, O(1) space. Mention lru_cache exists but show why iterative wins.
Wrong #3 — DP table O(N) space:
if n <= 2: return n  # guard: dp[2] does not exist when n = 1
dp = [0]*(n+1)
dp[1], dp[2] = 1, 2
for i in range(3, n+1): dp[i] = dp[i-1] + dp[i-2]
return dp[n]
Correct, O(N) time, O(N) space. Misses the O(1) compression. Senior signal: notice you only need the last 2 values, drop the array.
Wrong #4 — Off-by-one on base cases:
Forgetting n==1 returns 1, not 2; or forgetting to handle n==2 directly and letting the loop run. Tests with n=1, 2, 3 catch these.
8. Skills & Takeaways
The DP recipe — burn it in:
- State definition
- Recurrence with justification
- Base cases (one per step the recurrence reaches back)
- Computation order
- Space compression (only keep what’s referenced)
Generalizations — variants you’ll see:
- LC 746 — Min Cost Climbing Stairs (same structure, minimize cost instead of count)
- LC 198 — House Robber (same structure with “can’t be adjacent” constraint)
- LC 213 — House Robber II (circular variant)
- LC 91 — Decode Ways (Fibonacci with validity guards)
- LC 1137 — N-th Tribonacci (extend recurrence to 3 prior terms)
- LC 509 — Fibonacci Number (literal Fibonacci)
When you can take 1, 2, or 3 steps: f(n) = f(n-1) + f(n-2) + f(n-3) (Tribonacci). When you can take 1..k steps: f(n) = sum f(n-i) for i=1..k → can be O(N) with a sliding window.
9. When to Move On
- Solved iteratively, O(N) time, O(1) space, <8 min
- Tested n=1, n=2, n=3
- Articulated all 5 DP recipe steps without prompting
- Connected to House Robber (LC 198) as the same pattern with cost
- Stress test passes
- Mentioned closed-form / matrix exp exist (Staff signal)
10. Company Context
Adobe (this is their go-to DP warmup)
- Frame: Often given verbatim.
- What they want: Iterative O(1) space, articulated DP recipe. Recursion without memo is a near-instant fail.
- Extension: “What if you can take 1, 2, or 3 steps?” → Tribonacci. Tests generalization.
Amazon
- Frame: Sometimes as a paving / tiling problem (“ways to tile a 1×N corridor with 1×1 and 1×2 tiles”) — same Fibonacci underneath.
- Extension: “Now there’s a step you can’t land on (broken stair).” → DP with constraint: f(broken) = 0, recurrence unchanged otherwise.
Apple
- Frame: Often paired with LC 198 (House Robber) — they want to see if you generalize.
11. Interviewer’s Lens
| Phase | Strong | Weak | Scorecard |
|---|---|---|---|
| Reading | Asks “sequences vs sets” + clarifies n bounds | Dives in | “Verifies the contract” |
| Pre-coding | Names the recurrence with justification | Says “I’ll recurse” | “Articulates DP structure” |
| Coding | Iterative + 2 scalars | Recursive without memo | “Optimal space upfront” |
| Edge | n=1, n=2 covered | Misses n=1 | “Catches base-case bugs” |
| Finish | Mentions DP recipe steps + extensions | Says “done” | “Frameworks generalize” |
12. Level Delta
| Level | Answer |
|---|---|
| Mid | Iterative O(N), O(1). Works. ~8 min. |
| Senior | + states the 5-step DP recipe explicitly + tests n=1,2,3 + offers to extend to k-step. |
| Staff | + mentions closed-form (Binet) and matrix exponentiation as O(log N) alternatives + connects to House Robber as same family with different reduction operator (sum → max). |
| Principal | + asks production context (“counting paths in an event graph? combinatorics for a recommender?”) + identifies that for very large n we need modular arithmetic (problem caps prevent overflow but real systems don’t) + suggests matrix exponentiation under modulo for n in the millions. |
13. Follow-up Questions & Full Answers
Q1: “What if you can take 1, 2, or 3 steps?”
Answer: Tribonacci. f(n) = f(n-1) + f(n-2) + f(n-3), base cases f(1)=1, f(2)=2, f(3)=4. O(N) time, O(1) space with three scalars.
Q2: “What if you can take 1..k steps?”
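Answer: f(n) = sum f(n-i) for i in 1..k, with f(0) = 1 and f(x) = 0 for x < 0. Naive O(N·k). Optimize via sliding window: for n ≥ 2, f(n) = 2·f(n-1) - f(n-1-k), because the window for n is the window for n-1 plus f(n-1) minus f(n-1-k); equivalently, keep a running sum of the last k values. O(N) time, O(k) space.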
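A sketch of the running-sum form of that window, which avoids the special cases of the subtraction formula at small n. The function name `climb_k` is hypothetical, and k ≥ 1 is assumed.

```python
from collections import deque


def climb_k(n, k):
    """Ways to reach step n taking 1..k steps at a time. O(N) time, O(k) space."""
    window = deque([1])  # the most recent (up to k) values; starts with f(0) = 1
    window_sum = 1       # sum of the values currently in the window
    for _ in range(n):
        f = window_sum   # f(i) = f(i-1) + ... + f(i-k)
        window.append(f)
        window_sum += f
        if len(window) > k:
            window_sum -= window.popleft()  # f(i-k) leaves the window
    return window[-1]


print(climb_k(5, 2))  # 8  (plain Climbing Stairs)
print(climb_k(4, 3))  # 7  (Tribonacci variant: 1, 2, 4, 7)
```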
Q3: “Now some steps are broken — you can’t land there.”
Answer: f(broken[i]) = 0. Recurrence unchanged. Still O(N) time and O(1) extra space beyond the set of broken steps; it is just a guard inside the loop.
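A sketch of the guard, assuming the broken steps arrive as any iterable of step numbers. The function name is made up for this example.

```python
def climb_with_broken(n, broken):
    """Ways to reach step n with 1- or 2-steps, never landing on a broken step."""
    broken = set(broken)   # O(1) membership checks
    prev2, prev1 = 0, 1    # f(-1) = 0, f(0) = 1 (the empty path)
    for step in range(1, n + 1):
        cur = 0 if step in broken else prev1 + prev2
        prev2, prev1 = prev1, cur
    return prev1


print(climb_with_broken(3, []))      # 3
print(climb_with_broken(3, [2]))     # 1  (only 0 → 1 → 3)
print(climb_with_broken(3, [1, 2]))  # 0  (step 3 is unreachable)
```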
Q4: “Same problem but n can be up to 10^18.”
Answer: Matrix exponentiation. [f(n), f(n-1)]^T = M^(n-1) · [f(1), f(0)]^T where M = [[1,1],[1,0]]. Compute M^(n-1) via binary exponentiation in O(log N) matrix multiplications, each O(1) for a 2×2 matrix. Total O(log N). Use modular arithmetic to prevent overflow.
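A sketch of the matrix-power approach under a modulus. The modulus 1_000_000_007 is an assumption for illustration (the question as stated does not name one), and n ≥ 1 is assumed.

```python
MOD = 1_000_000_007  # assumed modulus; not specified in the question


def mat_mul(a, b, mod):
    """2x2 matrix product modulo `mod`."""
    return [
        [(a[0][0] * b[0][0] + a[0][1] * b[1][0]) % mod,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % mod],
        [(a[1][0] * b[0][0] + a[1][1] * b[1][0]) % mod,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % mod],
    ]


def climb_mod(n, mod=MOD):
    """f(n) mod `mod` using M^(n-1) applied to [f(1), f(0)] = [1, 1]."""
    result = [[1, 0], [0, 1]]  # identity matrix
    base = [[1, 1], [1, 0]]
    power = n - 1
    while power > 0:           # binary exponentiation: O(log n) multiplications
        if power & 1:
            result = mat_mul(result, base, mod)
        base = mat_mul(base, base, mod)
        power >>= 1
    # [f(n), f(n-1)]^T = result · [1, 1]^T, so f(n) is the top row summed.
    return (result[0][0] + result[0][1]) % mod


print(climb_mod(5))       # 8
print(climb_mod(10**18))  # finishes after about 60 squarings
```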
Q5: “How do you test it?”
Answer: (1) Small cases against hand-computed Fibonacci values (1, 2, 3, 5, 8, 13). (2) Property: brute recursive (with memoization) vs iterative agree on n in 1..30. (3) Stress on the boundary n=1 (off-by-one trap) and n=45 (max constraint).
14. Full Solution Walkthrough
See solution.py.
- climb_brute(n): plain recursion, no memo. Used as oracle for small n only (would TLE for large).
- climb(n): iterative two-scalar version. O(N) time, O(1) space.
- climb_memo(n): top-down memoization, included for comparison. O(N) time, O(N) space.
- stress_test: brute vs optimal vs memo on n ∈ [1, 25], all three agree.
15. Beyond the Problem
Real systems this is the kernel of:
- Path-counting in DAGs: number of distinct routes through a graph. The 1D variant generalizes to the topological order of a DAG.
- State-machine reachability: number of distinct token sequences leading to a state in a parser.
- Combinatorial enumeration in recommender systems: counting compatible item sequences under constraints.
- Probabilistic state aggregation: when transition probabilities are uniform, counts and probabilities differ only by a normalizing constant.
Principal-engineer code review: “This function is correct but its name climb_stairs is so specific that it won’t be reused. The underlying primitive — counting paths under additive recurrence — appears in three other places in our codebase, each reimplemented. Extract count_paths(n, allowed_step_sizes) and use it everywhere. Also: cache the result if called repeatedly with the same n — but for n ≤ 45 the compute cost is negligible, so probably not worth the cache complexity.”
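One possible shape for the `count_paths(n, allowed_step_sizes)` primitive named in that review comment, as a sketch. The signature comes from the comment; the validation rules and the O(N · number of step sizes) table are assumptions of this example.

```python
def count_paths(n, allowed_step_sizes):
    """Number of ordered step sequences from 0 to n using the given step sizes."""
    steps = sorted(set(allowed_step_sizes))
    if n < 0 or any(s <= 0 for s in steps):
        raise ValueError("n must be >= 0 and every step size must be positive")
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty path
    for i in range(1, n + 1):
        ways[i] = sum(ways[i - s] for s in steps if s <= i)
    return ways[n]


print(count_paths(5, [1, 2]))     # 8  (Climbing Stairs)
print(count_paths(4, [1, 2, 3]))  # 7  (1-, 2- or 3-steps)
print(count_paths(3, [2]))        # 0  (odd target, even steps only)
```

With a general step set the O(1)-space compression no longer applies directly; the table needs to reach back as far as the largest step size.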
Phase 0 — Interview Execution Baseline
Target level: Beginner → Easy
Expected duration: 1 week (12-week track) / 1–2 weeks (6-month) / 2 weeks (12-month)
Weekly cadence: 7 labs + 25 Easy problems applying the framework rigorously
Why This Phase Exists
Most candidates fail interviews not because they lack algorithms knowledge, but because of execution failures: they jump to coding before understanding the problem, panic when stuck, miss obvious edge cases, or communicate so poorly that the interviewer can’t tell whether they actually solved it.
Phase 0 fixes execution. It does not teach algorithms. It teaches you how to use what you already know.
Concepts To Master
1. How Coding Interviews Are Evaluated
Every modern coding interview at Big Tech uses a multi-dimensional rubric. The dimensions are roughly:
- Problem understanding — did you grasp what was asked?
- Approach quality — did you find a reasonable solution?
- Optimality — did you reach the optimal complexity?
- Implementation correctness — does your code actually work?
- Code quality — would your code pass code review?
- Testing — did you verify your solution?
- Communication — could the interviewer follow your thinking?
- Tradeoff awareness — do you understand what you chose and why?
Each is scored ~independently. A “hire” decision typically requires strong scores on most dimensions, not perfect on one. Code that works but is uncommunicated is often a no-hire.
2. How To Communicate While Solving
See ../COMMUNICATION.md in full. The summary: narrate your intent, not your typing. Pause at decision points. Ask before assuming.
3. How To Ask Clarifying Questions
Good questions:
- “Can the input be empty?”
- “Are there duplicates allowed?”
- “What’s the expected size of N — is 10^5 reasonable?”
- “Does the order of output matter?”
- “What should happen on invalid input?”
- “If multiple valid answers exist, can I return any one?”
Bad questions:
- Re-asking what’s already in the problem statement (signals poor reading)
- “What’s the optimal complexity?” (this is your job to derive)
- 15 questions in a row (drip them as relevant)
4. How To Derive Constraints
Constraints dictate the algorithm. Memorize this table:
| N | Acceptable Complexity | Likely Approach |
|---|---|---|
| ≤ 10 | O(N!) or O(2^N · N) | Backtracking, full enumeration |
| ≤ 20 | O(2^N · N) | Bitmask DP, meet-in-the-middle |
| ≤ 100 | O(N^4) | Multi-loop brute force |
| ≤ 500 | O(N^3) | Interval DP, matrix chain, Floyd-Warshall |
| ≤ 5,000 | O(N^2) | 2D DP, edit distance |
| ≤ 100,000 | O(N log N) | Sort + scan, heap, segment tree |
| ≤ 1,000,000 | O(N) | Linear scan, hashmap, two pointers |
| ≤ 10^8 | O(N) with a tiny constant, else O(log N) | Single tight pass, binary search, math closed form |
| ≤ 10^18 | O(log N) | Binary exponentiation, math |
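Rule of thumb: on modern judges, about 10^8 simple operations per second is safe in a compiled language and 10^9 is risky; pure Python is typically one to two orders of magnitude slower, so budget closer to 10^6 to 10^7 there.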
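The table can be reproduced with back-of-envelope arithmetic. The sketch below estimates operation counts for a given N and keeps the complexity classes that fit a budget. The 10^8 budget is the rule of thumb from the text, and the helper name `feasible_complexities` is invented for this example; treat the output as a rough guide, not a guarantee for any particular judge.

```python
import math

OPS_BUDGET = 10**8  # rule-of-thumb budget; adjust per language and judge


def feasible_complexities(n, budget=OPS_BUDGET):
    """Return the complexity classes whose estimated op count fits the budget."""
    log_n = max(1.0, math.log2(n)) if n > 1 else 1.0
    estimates = [
        ("O(log N)", log_n),
        ("O(N)", n),
        ("O(N log N)", n * log_n),
        ("O(N^2)", n ** 2),
        ("O(N^3)", n ** 3),
        # Skip the huge power for large n; it is far over budget anyway.
        ("O(2^N)", 2 ** n if n <= 62 else math.inf),
    ]
    return [name for name, ops in estimates if ops <= budget]


print(feasible_complexities(5_000))
# ['O(log N)', 'O(N)', 'O(N log N)', 'O(N^2)']
print(feasible_complexities(100_000))
# ['O(log N)', 'O(N)', 'O(N log N)']
```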
5. How To Create Examples
The given examples are minimum-coverage. You should construct:
- Trivial: size 0, size 1.
- Boundary: all duplicates, all negatives, all sorted, all reversed.
- Adversarial: max-constraint values, edge of integer range.
- Multi-answer: if multiple valid outputs exist, pick a specific one.
Working through your own examples often reveals the pattern faster than reading the problem 5 more times.
6. How To Identify Edge Cases
Universal checklist (run through this every problem):
- Empty input
- Null input (where applicable)
- Single element
- Two elements
- Duplicates
- Negative numbers
- Maximum constraint values (overflow risk)
- Sorted input
- Reversed input
- Disconnected graph
- Cycle in graph
- Multiple valid answers
- All identical values
- Off-by-one at boundaries
7. How To Start With Brute Force
The brute force is mandatory. Even when you “see” the optimal:
- It anchors correctness — you have a working oracle.
- It’s a starting point to optimize from.
- It’s a fallback if optimization fails.
- It demonstrates you understand the problem at all.
State the brute force in pseudocode within the first 3 minutes.
8. How To Optimize
Optimization checklist:
- Pattern recognition — does this match a known pattern? See Phase 2.
- Repeated work — what does the brute force recompute? That’s your DP / memo target.
- Sortedness — would sorting help? Two pointers, binary search, sweep line.
- Monotonicity — is the answer monotonic in some parameter? Binary search on answer.
- State compression — can the state space be made smaller? Bitmask, prefix sum.
- Math — closed form, modular arithmetic, combinatorics.
- Data structure — would a heap / BST / segment tree change the complexity?
9. How To Prove Correctness
- Greedy: exchange argument — show that any optimal solution can be modified to use the greedy choice without losing optimality.
- DP: state definition, transition, base cases, evaluation order.
- Two-pointer / sliding window: loop invariant.
- Graph algorithms: cite the algorithm’s correctness theorem and verify preconditions hold.
- Math: induction or direct calculation.
10. How To Explain Complexity
State time and space. Mention:
- Whether the bound is tight.
- Amortized vs worst case (e.g., dynamic array push is O(1) amortized, O(N) worst case).
- Assumptions (hash table O(1) average requires non-adversarial input).
- Constants when they matter (e.g., bitset gives 64× speedup over bool array).
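To make the amortized claim concrete, the sketch below simulates a dynamic array that doubles its capacity when full and counts the element copies caused by resizing. The doubling policy and the function name `count_copies` are assumptions for illustration; real runtimes use different growth factors.

```python
def count_copies(num_appends: int) -> int:
    """Simulate a doubling dynamic array and return total copies made by resizes."""
    capacity, size, copies = 1, 0, 0
    for _ in range(num_appends):
        if size == capacity:
            copies += size   # a resize copies every existing element: O(N) worst case
            capacity *= 2
        size += 1
    return copies


print(count_copies(1000))  # 1023
```

A single append can cost O(N), but the total copy work stays below 2N across N appends, which is why the per-append cost is O(1) amortized.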
11. How To Write Clean Code Under Time Pressure
See ../CODE_QUALITY.md. Quick rules:
- Meaningful names.
- Helper functions for distinct units.
- Validate at the boundary, not in hot loops.
- No globals.
- Standard library used idiomatically.
12. How To Test Manually
- Walk through given example by hand. Record intermediate state.
- Walk through one trivial case (empty / size 1).
- Walk through one adversarial case.
- Find at least one bug before submitting.
13. How To Recover When Stuck
Use the stuck protocol. Restate, brute force, examine constraints, try smaller examples, look for repeated work, look for monotonicity, look for graph modeling, ask for a hint.
14. How To Use Hints Without Failing The Interview
A hint is not a failure. Frozen silence is. Sample phrasing:
“I’ve explored two pointers and sliding window, but I’m having trouble seeing how to handle duplicates without O(N^2). Could you nudge me toward the right family of approach?”
Receive the hint, restate it, commit out loud, resume.
Why These Concepts Matter In Interviews
Algorithms are necessary but not sufficient. Of all rejected candidates with strong algorithm knowledge, the most common failure modes are:
- “Could not communicate clearly” (60%+)
- “Did not test their code” (40%+)
- “Could not articulate complexity” (30%+)
- “Got stuck and couldn’t recover” (30%+)
- “Misunderstood the problem” (20%+)
(Percentages are approximate, based on common interviewer feedback patterns.)
Phase 0 fixes all of these.
Required Problem Categories
Phase 0 problems are not algorithmically hard. The challenge is execution. Use only Easy problems:
- Two Sum
- Reverse String
- Valid Palindrome
- Maximum Subarray (Kadane’s)
- Best Time To Buy And Sell Stock
- Single Number
- Merge Two Sorted Lists
- Linked List Cycle
- Binary Tree Inorder Traversal
- Symmetric Tree
- Maximum Depth Of Binary Tree
- Climbing Stairs
- Move Zeros
- Contains Duplicate
- Intersection Of Two Arrays
- Reverse Linked List
- Valid Parentheses
- Implement Stack Using Queues
- First Bad Version
- Squares Of A Sorted Array
Solve each one applying the full framework. The point is not the answer; it’s the execution discipline.
Recommended Resources
- This curriculum’s FRAMEWORK.md, COMMUNICATION.md, CODE_QUALITY.md
- LeetCode Easy tier (filter by Easy)
- “Cracking the Coding Interview” (Gayle Laakmann McDowell) — chapters 1, 2, 6, 7 (the soft-skills chapters; skip the rest until later)
Hands-On Labs
Complete in order. Each lab uses the strict lab format from the curriculum spec.
- Lab 01 — Problem Statement Reading
- Lab 02 — Constraints Extraction
- Lab 03 — Brute Force First
- Lab 04 — Optimization Pathway
- Lab 05 — Manual Testing
- Lab 06 — Communication
- Lab 07 — Stuck Recovery
Common Mistakes In Phase 0
- Skipping Phase 0 thinking “I already know this stuff” — execution skills are different from knowledge
- Solving Easy problems silently — the whole point is communication practice
- Skipping the brute force because the optimal is obvious
- Skipping the testing step because the code “looks right”
- Not timing yourself — you don’t know your real solving speed until you measure
- Treating clarifying questions as a checklist — they should feel natural and motivated, not robotic
Mastery Checklist
- Solve any LeetCode Easy in 12 minutes including brute force, optimization, testing, complexity
- Restate every problem in your own words without re-reading the prompt
- Ask 3+ relevant clarifying questions on every problem
- Always state the brute force first (even if 10 seconds long)
- Walk through at least one example by hand before submitting
- Explain complexity correctly on every problem
- Find at least one bug pre-submission on at least 30% of problems (this is good!)
- Never go silent for >60 seconds when working a problem
- Recover from being stuck using the stuck protocol within 3 minutes
Exit Criteria
Move to Phase 1 only when:
- Mastery checklist 100% checked
- Completed all 7 labs
- Solved 25 Easy problems applying the full framework
- Recorded yourself solving 3 problems and reviewed the playback for communication quality
- Run a self-mock with one Easy problem you’ve never seen — pass with full framework execution
If any item fails, repeat Phase 0 with another 25 problems. Do not move forward with a weak baseline.
Lab 01 — Problem Statement Reading
Goal
Train the discipline of reading a problem statement deliberately — extracting structure, constraints, examples, and ambiguity — before any solving begins. Most candidate failures begin in the first 60 seconds when the candidate reads superficially and locks in a wrong mental model.
Background Concepts
- Active reading vs passive reading
- Structural parsing of a problem (input shape, output shape, constraints, examples, follow-ups)
- Identifying ambiguity vs underspecification
- Restating in your own words as a comprehension test
Interview Context
In a real interview, the prompt is often delivered verbally with a brief written version. You have ~3 minutes to load the entire problem into working memory before you start engaging. If you misunderstand something here, every subsequent step is wasted. Strong candidates always restate the problem out loud and ask 2–4 clarifying questions before touching anything.
Problem Statement
Given the problem statement below, in 5 minutes:
- Read it twice.
- Restate it in your own words.
- List all constraints (explicit + implicit).
- List 3 ambiguities you’d ask the interviewer about.
- Construct 3 examples beyond the one given.
The problem (use as a fixed text for this lab):
“Given a list of meeting time intervals, determine if a single person could attend all meetings.”
Example: `[[0,30],[5,10],[15,20]]` → `false`.
Constraints
The problem deliberately omits constraint specification. That’s the point.
Clarifying Questions (you should generate)
Examples of good questions to surface from the prompt above:
- Are intervals inclusive on both ends, or `[start, end)`?
- Can intervals be zero-duration (`[5,5]`)?
- Is the input pre-sorted?
- Are negative times possible?
- What are the realistic bounds on N and on the time values?
- Can two meetings sharing an endpoint be attended (e.g. one ends at 10, next starts at 10)?
- Are there any null or invalid intervals to validate?
Examples (you should generate)
Beyond the given:
- `[]` → `true` (empty)
- `[[1,5]]` → `true` (single)
- `[[1,5],[5,10]]` → endpoint-sharing case (depends on inclusivity)
- `[[1,5],[2,3]]` → fully nested overlap
- `[[10,20],[1,5]]` → unsorted, non-overlapping
- `[[1,1],[1,1]]` → zero-duration duplicates
Initial Brute Force
Compare every pair: O(N^2). For each pair, check if intervals overlap (max(a.start, b.start) < min(a.end, b.end) for half-open).
Brute Force Complexity
Time O(N^2), space O(1).
Optimization Path
Sort by start time, then scan and check that intervals[i].start >= intervals[i-1].end for every i ≥ 1. Two questions surface while sorting: how ties on start are handled, and what the inclusivity semantics are. Both lead back to the clarifying questions. Once those are answered, sort + scan is the expected approach for unsorted input.
Final Expected Approach
Sort + linear scan. O(N log N) time, O(1) extra space (or O(N) if sorting requires a copy).
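For reference after you have finished the written deliverable, here is a minimal sketch of both approaches. It assumes half-open `[start, end)` semantics and that every meeting has `start < end`; both are answers to the clarifying questions that the prompt does not give, and zero-duration meetings would need a separate decision. The function names are hypothetical.

```python
from itertools import combinations


def can_attend_all_brute(intervals):
    """O(N^2): compare every pair; half-open overlap test."""
    return not any(
        max(a[0], b[0]) < min(a[1], b[1])
        for a, b in combinations(intervals, 2)
    )


def can_attend_all(intervals):
    """O(N log N): sort a copy by start, then check adjacent pairs only."""
    ordered = sorted(intervals)  # copy, so the caller's list is not mutated
    return all(
        ordered[i][0] >= ordered[i - 1][1]
        for i in range(1, len(ordered))
    )


print(can_attend_all([[0, 30], [5, 10], [15, 20]]))  # False
print(can_attend_all([[1, 5], [5, 10]]))             # True under half-open semantics
```

Sorting a copy costs O(N) extra space; sorting in place avoids that but mutates the input, which is worth asking the interviewer about.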
Data Structures Used
- The interval array (input)
- Sort comparator on start time
Correctness Argument
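After sorting by start, some pair of meetings overlaps ↔ some adjacent pair overlaps. Proof: suppose intervals i < j overlap, so intervals[j].start < intervals[i].end. Sorting gives intervals[i+1].start ≤ intervals[j].start, hence intervals[i+1].start < intervals[i].end, and the adjacent pair (i, i+1) overlaps. The converse is immediate, so checking adjacent pairs detects every conflict. (This argument assumes half-open intervals of positive duration; the zero-duration case goes back to the clarifying questions.)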
Complexity
- Time: O(N log N) (sort dominates)
- Space: O(1) auxiliary if in-place sort, else O(N)
Implementation Requirements
For this lab, do NOT implement. Instead produce a written deliverable (described below).
Tests
Not applicable for this lab — written exercise only.
Follow-up Questions
- “What if instead of yes/no, we wanted the number of conflicts?”
- “What if we had to schedule N people across M rooms?” (becomes Meeting Rooms II)
- “What if the meetings stream in one at a time and we want online detection?”
Product Extension
This is “calendar conflict detection” — a real product feature. Ask yourself: what would Google Calendar do for 10,000 events? (Hint: indexed by day → small per-day scan, or interval tree for general queries.)
Language/Runtime Follow-ups
- In Python, sort is `O(N log N)` Timsort, stable. Beware of `key=lambda x: x[0]`: it costs a Python-level function call per element, and `operator.itemgetter(0)` is typically faster.
- In Java, sort uses dual-pivot quicksort for primitives, Timsort for objects. Comparator allocation can dominate small inputs.
- In Go, `sort.Slice` is reflection-based and slower than `sort.Sort` with a typed slice; use `sort.Slice` when convenience matters more than speed.
- In C++, `std::sort` is typically introsort; the comparator must be a strict weak ordering or behavior is undefined.
Common Bugs
- Using `<=` instead of `<` (or vice versa) due to misreading inclusivity
- Mutating input when interviewer expects pure function
- Off-by-one on boundary cases (`[1,5]` and `[5,9]`)
- Not handling empty input
Debugging Strategy
If a test fails:
- Print pairs where the algorithm decided “overlap” or “no overlap”
- Walk through the smallest failing case by hand
- Check inclusivity assumption — single source of most bugs here
Deliverable For This Lab
In your notebook (or a markdown file beside this lab), write:
- Restatement. A 1–2 sentence paraphrase of the problem in your own words.
- Clarifying questions list. 6+ questions, prioritized.
- Implicit constraints list. What did the prompt fail to specify? (Inclusivity, sortedness, N bounds, etc.)
- Examples list. 8+ examples covering: trivial, boundary, adversarial, multi-answer.
- Brute force pseudocode. O(N^2) approach.
- Optimization sketch. Just one paragraph.
- Self-critique. Where in your reading did you make assumptions that the prompt didn’t justify?
Mastery Criteria
You complete this lab to mastery when:
- You restated the problem without re-reading the prompt
- You produced 6+ clarifying questions, none of which were answered by the prompt
- You found 3+ implicit constraints
- You produced 8+ examples spanning all category types
- You can articulate which of your assumptions were wrong (everyone makes some — the skill is noticing)
Repeat this lab with 3 different problem statements (pick any from LeetCode Easy) before declaring mastery.
Lab 02 — Constraints Extraction
Goal
Train the discipline of converting written constraints into algorithmic targets. Given 1 ≤ N ≤ 10^5, you should immediately think “O(N log N) is the budget, O(N^2) will TLE”.
Background Concepts
- The constraint-to-complexity mapping (FRAMEWORK.md)
- Operations-per-second budget on competitive judges (~10^8 simple ops/sec safe)
- Implicit constraints (memory, integer overflow, recursion depth)
Interview Context
In live interviews, the interviewer often omits explicit constraints, expecting you to ask. Constraints discipline the algorithm choice. Candidates who skip this step often write an O(N^2) algorithm that the interviewer was hoping they’d avoid.
Problem Statement
For each of the 10 problem prompts below, derive:
- The complexity budget
- At least one algorithm family that fits
- At least one approach that does not fit and why
Prompts:
1. `1 ≤ N ≤ 20`: count subsets satisfying property X
2. `1 ≤ N ≤ 200`: shortest path in weighted graph with up to N^2 edges
3. `1 ≤ N ≤ 10^4`, queries `≤ 10^5`: range sum
4. `1 ≤ N ≤ 10^5`: find duplicate
5. `1 ≤ N ≤ 5 × 10^5`: kth smallest in array
6. `1 ≤ N ≤ 10^6`, integers up to 10^9: count of pairs with sum ≤ K
7. N ≤ 10^9, queries Q ≤ 10^5: count of integers in `[1,N]` divisible by K
8. T test cases, T ≤ 10^4, each with `N ≤ 10^3`: pairwise XOR maximum
9. `1 ≤ a, b ≤ 10^18`: compute `a^b mod p`
10. Stream of up to 10^7 elements: top-K running
Constraints
The point of this lab is constraints. Treat each prompt as separate.
Clarifying Questions
For each prompt, list:
- What’s the operation budget?
- Is the time limit explicit (e.g., 1 sec, 2 sec)?
- Is there a memory limit (e.g., 256 MB)?
- Are values within int32 / int64 range?
Examples
Worked example for prompt #4 (N ≤ 10^5, find duplicate):
- Budget: ~10^7–10^8 ops → O(N), O(N log N), and O(N · log(max value)) all fit
- Fits: hashset O(N), sort+scan O(N log N), Floyd’s cycle detection if input values ∈ `[1,N]`
- Doesn’t fit: O(N^2) brute force (10^10 ops)
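The two general-purpose fits from the worked example can be sketched as follows. This is an illustration under the assumption that "find duplicate" means "return any value that appears more than once, or `None` if there is none"; the prompt itself does not say, and the function names are hypothetical.

```python
def find_duplicate_hashset(a):
    """O(N) time, O(N) space: remember every value seen so far."""
    seen = set()
    for x in a:
        if x in seen:
            return x
        seen.add(x)
    return None


def find_duplicate_sorted(a):
    """O(N log N) time: after sorting, equal values sit next to each other."""
    ordered = sorted(a)
    for prev, cur in zip(ordered, ordered[1:]):
        if prev == cur:
            return cur
    return None


print(find_duplicate_hashset([3, 1, 4, 1, 5]))  # 1
print(find_duplicate_sorted([3, 1, 4, 1, 5]))   # 1
```

When several values repeat, the two functions may return different duplicates: the hashset version returns the first repeat in input order, the sorted version the smallest repeated value.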
Initial Brute Force
Not applicable — meta-lab.
Brute Force Complexity
N/A.
Optimization Path
The optimization path here is constraint-to-algorithm mapping. Memorize the table from FRAMEWORK.md.
Final Expected Approach
For each prompt, write your final answer as:
Prompt #K:
Budget: O(...)
Fitting algorithm family: ...
Disqualified approach: ... because budget / memory / overflow / etc.
Data Structures Used
The point is to map N range → DS choice:
- `N ≤ 20`: bitmask, recursion (no DS needed)
- `N ≤ 200`: 2D arrays (Floyd, etc.)
- `N ≤ 10^4`: O(N^2) DP arrays, simple sort
- `N ≤ 10^5`: hashmap, heap, sorted set, segment tree, Fenwick
- `N ≤ 10^6`: array + linear scan, two pointers, hash, no log factors
- `N ≤ 10^9`: math, binary search on answer, segmented sieve
Correctness Argument
The argument here is budget: justify why your chosen algorithm fits within ~10^8 ops/sec * time-limit. Be explicit: N · log N = 10^5 · 17 ≈ 1.7 × 10^6 — comfortably fits 1-second limit.
Complexity
For each prompt, you produce both:
- The budget
- The justification per the table
Implementation Requirements
Written deliverable. No code.
Tests
Not applicable for this lab.
Follow-up Questions
For prompt #6 (10^6 elements, pairs with sum ≤ K):
- “What if values can be negative?” → may need different sort + two-pointer logic
- “What if we want to enumerate the pairs, not just count?” → output limit changes everything
- “What if the array streams in?” → online algorithm needed
For prompt #9 (a, b ≤ 10^18):
- “What if `p` is not prime?” → can’t use Fermat’s little theorem inverse
- “What if we need `a^b` exactly (no mod)?” → impossible for these sizes
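For prompt #9, the O(log b) family is binary exponentiation. The sketch below is a hand-written version for illustration (the name `mod_pow` is hypothetical); in Python the built-in three-argument `pow(a, b, p)` already does this and is what you would normally use.

```python
def mod_pow(a: int, b: int, p: int) -> int:
    """Compute a^b mod p in O(log b) multiplications by squaring."""
    result = 1 % p          # handles p == 1, where every result is 0
    base = a % p
    while b > 0:
        if b & 1:           # current lowest bit of the exponent is set
            result = result * base % p
        base = base * base % p
        b >>= 1
    return result


print(mod_pow(2, 10, 1000))  # 24
```

The loop runs about 60 times for `b ≤ 10^18`, and it works for any modulus, prime or not. In fixed-width languages the intermediate product `base * base` can overflow int64 when `p` is near 10^18, which is one of the implicit constraints this lab asks you to notice.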
Product Extension
In production, “constraint” often means “expected QPS × max payload size × peak time”. A request handler that’s O(payload_size^2) is fine for size 10 but catastrophic for size 10^4. Same intuition as competitive judges, just different vocabulary.
Language/Runtime Follow-ups
- Python: constant factor ~30–100× slower than C++ in tight loops. Divide your effective budget by the same factor. `N=10^4` with O(N^2) is already risky in Python.
- Java: ~2–4× slower than C++. Beware autoboxing in hot loops.
- Go: ~2× slower than C++. Map operations can have higher constants than `unordered_map`.
- C++: baseline. Use `ios_base::sync_with_stdio(false)` for fast I/O.
- JavaScript: ~3–10× slower than C++. Avoid object lookups in hot loops; prefer typed arrays.
Common Bugs
- Forgetting that Q queries multiply your operation count (10^5 queries × 10^5 per-query = 10^10!)
- Forgetting T test cases (e.g., T = 10^4 with O(N^2) per test, N = 10^3 → 10^10)
- Underestimating constants in Python/JS
- Forgetting recursion depth limits (Python default 1000)
- Forgetting integer overflow at int32 boundary (~2 × 10^9)
Debugging Strategy
When code TLEs:
- Recompute total operations: outer loops × inner work × test cases × queries
- Check the constant factor (string concat in a loop is a classic disaster)
- Profile to find the actual hotspot (often I/O, not algorithm)
Deliverable
For each of the 10 prompts above, write the structured answer block. Compare yours to the table in FRAMEWORK.md.
Mastery Criteria
- Correctly mapped 10/10 prompts to budget within 30 seconds each
- Identified at least one disqualifying approach for each (per query / total ops)
- Recognized the multiplier effect of T test cases / Q queries
- Identified at least 2 prompts where Python/JS would need extra care
Lab 03 — Brute Force First
Goal
Internalize the discipline of always producing a brute force before optimizing — even when the optimal is obvious. The brute force is your correctness oracle, your fallback, and your communication anchor.
Background Concepts
- Brute force as a baseline: it must be correct, even if slow
- Brute force as oracle: cross-check optimal output with brute on random small inputs
- Brute force as recovery: when stuck, you have something to deliver
Interview Context
Many candidates hear a problem, immediately recognize the optimal pattern, and start coding. The interviewer can’t tell whether they understand the problem or just memorized the answer. State the brute force first, even if it takes 30 seconds. Then, “I see this can be optimized to O(N log N) using … — would you like me to go straight to that, or step through a middle approach?”
Problem Statement
For each of the 5 problems below, in 10 minutes total:
- State the brute force in pseudocode
- State its complexity
- State the optimization path (one or two sentences)
Problems:
- Two Sum: find indices `i, j` such that `a[i] + a[j] == target`
- Maximum Subarray: maximum sum contiguous subarray
- Longest Substring Without Repeating Characters
- Trapping Rain Water: given heights, total water trapped
- Median Of Two Sorted Arrays
Constraints
For each problem, assume N ≤ 10^5 for sizing.
Clarifying Questions
For each problem, ask: “Are we returning the value or the actual subarray/indices?” — this affects implementation but not the brute-force complexity.
Examples
For Two Sum, brute is for i: for j > i: if a[i] + a[j] == target: return [i,j]. Test on [2,7,11,15], target=9 → [0,1].
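Written out as runnable code, the Two Sum brute force looks like the sketch below. Returning `None` when no pair exists is an assumption; the prompt does not say what to do in that case, and the function name is hypothetical.

```python
def two_sum_brute(a, target):
    """O(N^2): try every pair i < j and return the first match."""
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):   # j starts after i so an element is never reused
            if a[i] + a[j] == target:
                return [i, j]
    return None  # assumed behavior when no pair sums to target


print(two_sum_brute([2, 7, 11, 15], 9))  # [0, 1]
```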
Initial Brute Force
Per problem:
- Two Sum: O(N^2) double loop
- Maximum Subarray: O(N^3) — three nested loops (i, j, sum) — or O(N^2) with running sum
- Longest Substring No Repeat: O(N^2) or O(N^3) — for each pair (i, j), check if substring has duplicates (set construction is O(N) per pair, so O(N^3))
- Trapping Rain Water: O(N^2) — for each index, find max-left and max-right by scanning
- Median Of Two Sorted Arrays: O(N+M) — merge then take median; even simpler O((N+M) log(N+M)) — concatenate + sort
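Two of the brute forces above can be sketched directly. These are illustrations, not the lab deliverable: the function names are hypothetical, and `max_subarray_brute` assumes a non-empty array, which is itself a clarifying question.

```python
def max_subarray_brute(a):
    """O(N^2): for each start index, extend a running sum to every end index."""
    best = a[0]                      # assumes a is non-empty
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]
            best = max(best, running)
    return best


def trapped_water_brute(heights):
    """O(N^2): water above index i is min(max-left, max-right) - heights[i]."""
    total = 0
    for i, h in enumerate(heights):
        max_left = max(heights[: i + 1])   # includes i, so the result is never negative
        max_right = max(heights[i:])
        total += min(max_left, max_right) - h
    return total


print(max_subarray_brute([-2, 1, -3, 4, -1, 2, 1, -5, 4]))          # 6
print(trapped_water_brute([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))    # 6
```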
Brute Force Complexity
As listed above. Be explicit about which O(N^k) you mean.
Optimization Path
- Two Sum: Hashmap of seen values → O(N).
- Maximum Subarray: Kadane’s running sum, restarting at the current element whenever the running sum would drag it down (cur = max(x, cur + x)) → O(N).
- Longest Substring No Repeat: Sliding window with hashset → O(N).
- Trapping Rain Water: Two-pointer with running max-left, max-right → O(N) time, O(1) space; or precompute prefix/suffix max → O(N) time, O(N) space.
- Median Of Two Sorted Arrays: Binary search on partition → O(log min(N,M)).
Final Expected Approach
In the deliverable, write:
Problem K:
Brute force pseudocode: ...
Brute complexity: O(...)
Optimization sketch: <pattern> → O(...)
Why the optimization works (1 sentence)
Data Structures Used
For each problem, identify the diff between brute force DS and optimized DS — that diff is usually where the optimization lives.
Correctness Argument
For brute force, correctness is by exhaustive enumeration, which is usually trivial. For the optimized version, the correctness lives in why exhaustive enumeration is unnecessary (e.g., “if a[j] is the later element of the answer pair, then target - a[j] is already in the seen set when we reach j”).
Complexity
Stated per problem above.
Implementation Requirements
For this lab, write the brute force only for problems 1, 2, 3 in your preferred language. Run on the given examples. Do not write the optimal version. The exercise is to make the brute force a habit.
Tests
For each implemented brute force:
- Smoke: 1 given example
- Edge: empty / single
- Random: 10 random small inputs (N ≤ 20), check no crash, plausible output
Follow-up Questions
- For Two Sum: what if there can be multiple valid pairs? Return all? First found?
- For Trapping Rain Water: what about 2D version (Trapping Rain Water II)? — different algorithm (heap-based BFS).
- For Median: what if K-th smallest instead of median? — same binary-search-on-partition idea.
Product Extension
In production code review, the brute force is often the only defensible code if you can’t justify the optimal. Reviewers prefer “obviously correct slow code” over “supposedly fast code with a subtle bug” — until the optimization is properly tested.
Language/Runtime Follow-ups
- Python: brute force should still complete for N ≤ 10^3 in <1s. Use it as oracle.
- Java: beware of `Integer` boxing in `HashMap<Integer, Integer>` for Two Sum; it causes a measurable slowdown.
- C++: brute force in C++ for `N ≤ 5 · 10^3` finishes in well under 1s, useful for stress testing.
Common Bugs
- Off-by-one: `for j = i+1` vs `for j = i` (Two Sum: j must be > i unless reusing the same element is allowed)
- Maximum Subarray: starting `max_so_far = 0` instead of `-infinity` fails on all-negative arrays
- Sliding Window No-Repeat: not removing characters from the set when shrinking the window
- Trapping Rain Water: using `<` vs `<=` when comparing left and right pointers
Debugging Strategy
When the optimal disagrees with the brute on a small input: trust the brute first. The brute is simple enough to verify by inspection, so the bug is almost always in your optimized version.
This is the stress testing pattern:
- Write brute (slow, correct)
- Write optimal (fast, suspicious)
- Generate random small inputs
- Compare outputs; on mismatch, print the input and inspect
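The four steps can be sketched as a small harness. Everything here is illustrative: the names `stress`, `max_subarray_oracle`, and `max_subarray_kadane` are hypothetical, Maximum Subarray is just a convenient example, and the optimized function appears only to show the comparison (the lab itself asks you to write the brute forces only).

```python
import random


def max_subarray_oracle(a):
    """Slow and obviously correct: O(N^3) over every non-empty slice."""
    n = len(a)
    return max(sum(a[i:j]) for i in range(n) for j in range(i + 1, n + 1))


def max_subarray_kadane(a):
    """Fast candidate under test: O(N)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def stress(brute, fast, trials=200, max_n=8, max_abs=10, seed=0):
    """Return the first mismatching (input, expected, actual), or None if all agree."""
    rng = random.Random(seed)        # fixed seed so a failure can be reproduced
    for _ in range(trials):
        a = [rng.randint(-max_abs, max_abs) for _ in range(rng.randint(1, max_n))]
        expected, actual = brute(a), fast(a)
        if expected != actual:
            return a, expected, actual
    return None


print(stress(max_subarray_oracle, max_subarray_kadane))  # None
```

Keeping the inputs tiny (N ≤ 8, small values) is deliberate: a failing case that small can be traced by hand.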
Deliverable
In a notebook:
- For each of the 5 problems, the structured block (brute pseudocode, complexity, optimization sketch).
- For problems 1, 2, 3: real code for the brute force, with smoke + edge + random tests.
- A 3-sentence reflection: which problem was hardest to resist writing the optimal first?
Mastery Criteria
- Wrote a brute force for all 5 in <2 minutes each (verbally or on paper)
- Stated the brute force complexity correctly
- Stated the optimization path in one sentence
- Resisted the urge to write the optimal first
- Used the brute force as oracle in at least one stress test
Lab 04 — Optimization Pathway
Goal
Train the explicit transition from brute force to optimal: identify what the brute force wastes (repeated work, missed monotonicity, overlooked sortedness), then close that gap with a specific technique.
Background Concepts
The optimization checklist (canonical):
- Repeated work → memoize / DP
- Sortedness → two pointers, binary search, sweep line
- Monotonicity → binary search on answer
- Local + global structure → sliding window, prefix sums
- Pattern match to known DS → heap, monotonic stack/queue, segment tree, trie, union find
- State compression → bitmask, hash of state
- Math closed form → combinatorics, modular arithmetic, geometry
- Graph modeling → BFS/DFS/Dijkstra/topo even on non-graph problems
- Randomization / hashing → reservoir sampling, rolling hash
- Approximation / amortization → when exact O(N) is hard, amortize
Interview Context
The interviewer wants to see your thinking process during optimization, not just the answer. Narrate: “The brute is O(N^2) because we recompute the prefix sum on every iteration. If we precompute prefix sums once, each query is O(1) and total is O(N).”
Problem Statement
For each of the 7 brute force descriptions below, identify which optimization checklist item applies, and produce a one-paragraph optimized approach.
Problems:
- Brute: for each i, sum a[i..j] for all j; O(N^2). Goal: answer multiple range-sum queries.
- Brute: for each pair (i, j), check if `a[i] + a[j] = target`; O(N^2). Goal: find pair sum.
- Brute: generate all subsets, count those with sum K; O(2^N). N up to 30. Goal: count.
- Brute: for each query (l, r), find min in `a[l..r]`; O(N) per query, N queries → O(N^2). Goal: range min queries on static array.
- Brute: simulate game state recursively; many overlapping subproblems; O(2^depth). Goal: optimal game value.
- Brute: for each i, find next j > i with `a[j] > a[i]`; O(N^2). Goal: next-greater-element array.
- Brute: try every candidate K in `[L, R]` in increasing order, verifying each in O(N); O(N · K). The property is monotonic in K, so binary search would work. Goal: find min K satisfying property.
Constraints
Assume N ≤ 10^5 for #1, #2, #4, #6, #7. N ≤ 30 for #3, allowing a 2^(N/2) meet-in-the-middle. N is the state-space size for #5.
Clarifying Questions
- “Are queries online or offline?” — offline allows different algorithms (Mo’s, offline BIT)
- “Is the array static or updated?” — static allows sparse table, dynamic needs Fenwick / segment tree
Examples
#1: prefix sums → query in O(1). #7: monotonic predicate → binary search the answer in O(log range).
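A minimal prefix-sum sketch for #1, assuming inclusive query bounds `l..r` on a static array. The helper names are hypothetical.

```python
from itertools import accumulate


def build_prefix(a):
    """prefix[i] = sum of a[0..i-1]; prefix[0] = 0. O(N) preprocessing."""
    return [0, *accumulate(a)]


def range_sum(prefix, l, r):
    """Sum of a[l..r], inclusive, in O(1)."""
    return prefix[r + 1] - prefix[l]


prefix = build_prefix([3, 1, 4, 1, 5])
print(range_sum(prefix, 1, 3))  # 6
```

The extra leading zero is a design choice that removes the special case for `l = 0`.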
Initial Brute Force
As stated above per problem.
Brute Force Complexity
As stated.
Optimization Path
For each problem, write the checklist item that applies and the optimized approach:
- Local + global structure (static array): prefix sums → O(N) preprocess, O(1) query
- Sortedness / hashing: sort + two pointers → O(N log N), or hashmap of `target - a[i]` → O(N)
- State compression: subset-sum DP O(N · K), or meet-in-the-middle O(2^(N/2)) → fits N=30
- Pattern → known DS: sparse table O(N log N) preprocess, O(1) query (static); segment tree if dynamic
- Repeated work: memoization → O(unique states)
- Pattern → monotonic stack: O(N)
- Monotonicity: binary search on answer → O(log range × verify)
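The last item, binary search on the answer, is easiest to see on a concrete predicate. The sketch below uses a hypothetical instance chosen for illustration (smallest daily capacity that ships all packages, in order, within `max_days`); the problem, `min_satisfying`, `days_needed`, and `min_capacity` are all assumptions, not part of the lab's problem list.

```python
def min_satisfying(lo, hi, ok):
    """Smallest k in [lo, hi] with ok(k) true, assuming ok flips from False to True once.

    Returns None if ok(hi) is false. Calls ok O(log(hi - lo)) times.
    """
    if not ok(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid          # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1      # mid fails, so the answer is strictly larger
    return lo


def days_needed(weights, capacity):
    """O(N) verify step: greedy packing in order. Assumes capacity >= max(weights)."""
    days, load = 1, 0
    for w in weights:
        if load + w > capacity:
            days += 1
            load = 0
        load += w
    return days


def min_capacity(weights, max_days):
    """Larger capacity never needs more days, so the predicate is monotonic."""
    return min_satisfying(
        max(weights), sum(weights),
        lambda c: days_needed(weights, c) <= max_days,
    )


print(min_capacity([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # 15
```

Total cost is O(N · log(sum of weights)), matching the "log range × verify" shape above.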
Final Expected Approach
For your deliverable: each of the 7 problems gets:
- Checklist item identified
- Optimized algorithm
- Final complexity
- One-line reasoning
Data Structures Used
- Prefix sum array
- Hashmap
- DP table / memo dict
- Sparse table (immutable RMQ) / segment tree
- Monotonic stack
- (Binary search needs no DS)
Correctness Argument
For each optimization, the correctness argument is “the brute force result is preserved because X”:
- Prefix sums: `sum(l..r) = prefix[r+1] - prefix[l]` (algebraic identity)
- Hashmap pair sum: every pair `(i, j)` with `i < j` is examined exactly once, when we process `j` and look up `target - a[j]`
- Subset-sum DP: state `(i, sum)` covers all subsets of `a[0..i]` with the given sum
- Sparse table: range min over `[l, r]` of length L is `min(table[k][l], table[k][r - 2^k + 1])` where `k = floor(log2(L))`; the overlap is fine because min is idempotent
- Memoization: same input → same output → cached
- Monotonic stack: scanning left to right, pop while the top is smaller than the current element; the current element is the next greater for every popped element
- Binary search on answer: predicate is monotonic by problem assumption
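The sparse-table argument is the least obvious of these, so here is a minimal sketch for range minimum on a static array. The class name `SparseTableMin` and the inclusive `query(l, r)` convention are assumptions for illustration.

```python
class SparseTableMin:
    """table[k][i] = min of a[i .. i + 2^k - 1]. O(N log N) build, O(1) query."""

    def __init__(self, a):
        n = len(a)
        self.table = [list(a)]
        k = 1
        while (1 << k) <= n:
            prev = self.table[-1]
            half = 1 << (k - 1)
            # A block of length 2^k is the min of its two halves of length 2^(k-1).
            self.table.append(
                [min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)]
            )
            k += 1

    def query(self, l, r):
        """Min of a[l..r], inclusive, assuming 0 <= l <= r < len(a)."""
        k = (r - l + 1).bit_length() - 1      # floor(log2(length))
        row = self.table[k]
        # The two blocks may overlap; that is harmless because min is idempotent.
        return min(row[l], row[r - (1 << k) + 1])


st = SparseTableMin([5, 2, 4, 7, 1, 3])
print(st.query(0, 2))  # 2
print(st.query(2, 3))  # 4
```

The same structure would give wrong answers for range sum, because overlapping blocks would count elements twice.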
Complexity
Stated per problem.
Implementation Requirements
Implement #1, #2, #6 in your preferred language. Verify against brute force on random small inputs.
Tests
- Smoke: given example
- Edge: empty, single
- Random: 50 random tests against brute force; on mismatch, dump input
- Large: stress test at constraint maximum, time it
Follow-up Questions
- For #1: what if the array is updated? → Fenwick tree
- For #2: what if k-sum (k > 2)? → recurse to (k-1)-sum with target adjustment, or sort + multi-pointer
- For #4: what about range max? Range gcd? Range sum? → which are idempotent (sparse table) vs which need segment tree
- For #6: what about previous greater? Next smaller? → mirror the stack
- For #7: what about minimum fractional answer? → binary search on real numbers with precision
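For the #6 follow-up, "mirror the stack" can be sketched as below. Both functions return indices, with `-1` meaning "none"; that return convention and the function names are assumptions for illustration.

```python
def next_greater(a):
    """result[i] = index of the first j > i with a[j] > a[i], or -1. O(N)."""
    result = [-1] * len(a)
    stack = []  # indices still waiting for their next greater element
    for j, x in enumerate(a):
        while stack and a[stack[-1]] < x:
            result[stack.pop()] = j   # x is the answer for every popped index
        stack.append(j)
    return result


def previous_greater(a):
    """result[i] = index of the nearest j < i with a[j] > a[i], or -1. O(N)."""
    result = [-1] * len(a)
    stack = []  # indices whose values are strictly decreasing
    for i, x in enumerate(a):
        while stack and a[stack[-1]] <= x:
            stack.pop()               # cannot be "previous greater" for x or anything later
        if stack:
            result[i] = stack[-1]
        stack.append(i)
    return result


print(next_greater([2, 1, 2, 4, 3]))      # [3, 2, 3, -1, -1]
print(previous_greater([2, 1, 2, 4, 3]))  # [-1, 0, -1, -1, 3]
```

Each index is pushed once and popped at most once, which is where the O(N) bound comes from. Swapping the comparison direction gives the next-smaller and previous-smaller variants.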
Product Extension
The optimization checklist is a real code-review tool. When reviewing a colleague’s PR with O(N^2) in a hot path, run through this checklist mentally. 80% of N^2 → N transitions are one of: prefix sum, hashmap, sort + two-pointer, monotonic stack.
Language/Runtime Follow-ups
- Python: the prefix-sum approach can get a 5–10× speedup if you use NumPy `np.cumsum` instead of a Python list.
- Java: monotonic stack with `Deque<Integer>` (autoboxing) is slower than a plain `int[]` with manual top index.
- C++: `std::lower_bound` / `upper_bound` give log-N binary search on sorted vectors with no manual implementation.
- JavaScript: `Map` is generally faster than plain object for hashmap when keys are non-string.
Common Bugs
- Prefix sum: off-by-one in `prefix[r+1] - prefix[l]`
- Two-pointer: forgetting to advance one pointer
- DP: wrong base case or wrong evaluation order
- Sparse table: log table off-by-one
- Monotonic stack: comparing `>` vs `>=` for the next-greater-or-equal variant
- Binary search on answer: monotonicity direction reversed
Debugging Strategy
For each optimization, do stress testing: write brute, write optimal, run on 100 random small inputs, compare outputs. This catches off-by-one bugs that survive the given tests.
Deliverable
- The 7-problem structured analysis
- Real implementation + tests for #1, #2, #6
- A reflection paragraph: for which problem was the optimization checklist most useful? Was there a problem where you would have gone in the wrong direction without it?
Mastery Criteria
- Correctly identified the optimization checklist item for all 7
- Implemented #1, #2, #6 with passing tests including stress
- Can explain why each optimization preserves correctness, not just that it’s faster
- Found at least 1 bug via stress testing (good — that’s the point)
Lab 05 — Manual Testing
Goal
Train the discipline of manually walking through your code before claiming it’s done — and finding bugs before the interviewer does.
Background Concepts
- Trace tables: writing variable state row-by-row through a loop
- Edge cases as a reflex
- The “rubber duck” verbalization while tracing
- Pre-submission checklist
Interview Context
Strong candidates always test their code by walking through at least one example by hand, vocalizing variable state. The interviewer learns whether you can verify your own work — a critical engineering signal.
Weak candidates write the code, say “I think this works” without verification, and submit; the interviewer then finds a bug the candidate would have caught by tracing.
Problem Statement
You are given 4 small functions below, each with at least one bug. For each:
- Construct a trace table for one input
- Identify the bug
- State a fix
- Construct a test case that exposes the bug
Function 1 — is_palindrome(s: str) -> bool:
```python
def is_palindrome(s):
    i, j = 0, len(s)
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True
```
Function 2 — binary_search(a: list[int], target: int) -> int:
```python
def binary_search(a, target):
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid
        else:
            hi = mid
    return -1
```
Function 3 — reverse_linked_list(head):
```python
def reverse_linked_list(head):
    prev = None
    curr = head
    while curr.next is not None:
        nxt = curr.next
        curr.next = prev
        prev = curr
        curr = nxt
    return prev
```
Function 4 — max_subarray(a: list[int]) -> int:
def max_subarray(a):
    best = 0
    cur = 0
    for x in a:
        cur += x
        if cur < 0:
            cur = 0
        if cur > best:
            best = cur
    return best
Constraints
Standard.
Clarifying Questions
- For palindrome: case-sensitive? Treat as is.
- For binary search: input is sorted ascending, no duplicates.
- For reverse linked list: input may be empty.
- For max subarray: input may be all negative.
Examples
For each function, work a non-trivial input by trace table.
Initial Brute Force
N/A — debugging lab.
Brute Force Complexity
N/A.
Optimization Path
N/A.
Final Expected Approach
For each function, identify:
- The bug
- The minimal fix
- The triggering input
Reference answers (don’t peek until you’ve tried):
- is_palindrome: `j = len(s)` should be `j = len(s) - 1` (or keep `j` exclusive and compare `s[i]` with `s[j-1]`). As written, `s[j]` raises IndexError on the first iteration. Trigger: any non-empty string.
- binary_search: `lo = mid` should be `lo = mid + 1`; otherwise the loop never terminates once `hi - lo == 1` and `a[lo] < target`, because `mid == lo` and `lo` stops advancing. Present targets are still found, so the trigger is an absent target greater than `a[0]`. Smallest case: `a=[1], target=2` gives `lo=0, hi=1, mid=0, a[0]=1 < 2, lo=0`, forever. A target greater than the max element (`a=[1,2,3], target=4`) hangs the same way.
- reverse_linked_list: the loop condition `while curr.next is not None` stops before relinking the last node, so the returned list drops it. Should be `while curr is not None`. It also crashes on empty input (`head = None`).
- max_subarray: initializing `best = 0` fails on all-negative input: it returns 0 instead of the largest (least negative) element. Initialize `best = -infinity` or `best = a[0]`, and compare `cur` against `best` before resetting `cur` to 0; otherwise the reset still lifts `best` to 0.
Data Structures Used
- Trace table: a small ASCII grid of variable values per loop iteration
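A trace table can also be generated mechanically while you are learning the habit. The sketch below records the loop variables at each iteration of a fixed `is_palindrome` (assuming the `j = len(s) - 1` fix); the helper names `trace_is_palindrome` and `format_trace` and the column layout are illustrative choices, not part of the lab.

```python
def trace_is_palindrome(s):
    """Return (result, rows); each row is (i, j, s[i], s[j]) for one loop iteration."""
    rows = []
    i, j = 0, len(s) - 1  # assumed fix: j starts at the last valid index
    while i < j:
        rows.append((i, j, s[i], s[j]))
        if s[i] != s[j]:
            return False, rows
        i += 1
        j -= 1
    return True, rows


def format_trace(rows):
    """Render the recorded rows as a small ASCII grid."""
    lines = [" i |  j | s[i] | s[j]", "---+----+------+-----"]
    for i, j, left, right in rows:
        lines.append(f"{i:2d} | {j:2d} | {left:<4} | {right}")
    return "\n".join(lines)


result, rows = trace_is_palindrome("racecar")
print(format_trace(rows))
print("result:", result)
```

On paper the same grid takes three rows for "racecar"; the point of writing it by hand is that you see the out-of-range `j` in row one of the buggy version.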
Correctness Argument
A function is correct iff for every valid input it produces the right output. Manual tracing on edge cases is a cheap, reliable way to falsify “I think it works”.
Complexity
N/A.
Implementation Requirements
For each function:
- Build a trace table on paper
- Identify the bug
- Write the fix
- Write 4 unit tests covering the bug and adjacent cases
Tests
For function 1:
assert is_palindrome("") == True
assert is_palindrome("a") == True
assert is_palindrome("aa") == True
assert is_palindrome("ab") == False
assert is_palindrome("racecar") == True
For function 2:
assert binary_search([], 5) == -1
assert binary_search([5], 5) == 0
assert binary_search([1,2,3], 0) == -1
assert binary_search([1,2,3], 4) == -1 # the trigger
assert binary_search([1,3,5,7], 5) == 2
(Build similar for #3, #4.)
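For #3 and #4, a comparable set of checks might look like the sketch below. It assumes one possible pair of fixes (`while curr is not None`, and a Kadane-style update that starts from `a[0]` and never resets to 0) plus a minimal `ListNode` class; `from_list` and `to_list` are hypothetical helpers introduced only to keep the assertions readable.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_list(values):
    """Build a linked list from a Python list; returns None for []."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def reverse_linked_list(head):
    prev, curr = None, head
    while curr is not None:  # assumed fix: also relink the last node; safe on None
        nxt = curr.next
        curr.next = prev
        prev, curr = curr, nxt
    return prev


def max_subarray(a):
    """Maximum subarray sum for a non-empty list."""
    best = cur = a[0]  # assumed fix: start from a real element, not 0
    for x in a[1:]:
        cur = max(x, cur + x)  # extend the run or restart at x
        best = max(best, cur)
    return best


# Function 3
assert to_list(reverse_linked_list(from_list([]))) == []
assert to_list(reverse_linked_list(from_list([1]))) == [1]        # buggy version returns None
assert to_list(reverse_linked_list(from_list([1, 2]))) == [2, 1]
assert to_list(reverse_linked_list(from_list([1, 2, 3]))) == [3, 2, 1]

# Function 4
assert max_subarray([-3, -1, -2]) == -1                            # buggy version returns 0
assert max_subarray([5]) == 5
assert max_subarray([1, 2, 3]) == 6
assert max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]) == 6
```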
Follow-up Questions
- “How would you find this bug in production?” — logs, test failure, customer report
- “How would you prevent this class of bug?” — property-based testing, fuzz testing
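The fuzz-testing answer can be approximated with the standard library alone: generate many small random inputs and compare against an obviously correct oracle. The sketch below assumes the fixed `binary_search` (`lo = mid + 1`) and uses a linear scan as the oracle; `stress`, `trials`, and `seed` are hypothetical names chosen for illustration.

```python
import random


def binary_search(a, target):
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1  # assumed fix: move past mid so the range shrinks
        else:
            hi = mid
    return -1


def linear_search(a, target):
    """Oracle: slow but obviously correct."""
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1


def stress(trials=1000, seed=0):
    """Compare binary_search with the oracle on small random sorted inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 8)
        a = sorted(rng.sample(range(20), n))  # sorted ascending, no duplicates
        target = rng.randint(-1, 20)
        got, want = binary_search(a, target), linear_search(a, target)
        assert got == want, f"a={a} target={target}: got {got}, want {want}"
    return trials


stress()
```

Small inputs are deliberate: a failing case of length 3 is far easier to trace by hand than one of length 3000.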
- “What’s your debugging strategy when you can’t reproduce locally?” — see phase-10-testing-debugging/
Product Extension
These four bugs are real bug archetypes:
- Off-by-one (#1, #2)
- Loop termination (#2, #3)
- Initialization (#4)
In code review, scan specifically for: array bounds, loop conditions, accumulator initial values, null inputs.
Language/Runtime Follow-ups
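- Python: indexing `s[len(s)]` raises `IndexError` immediately, a defensive crash
- C/C++: an out-of-bounds read is undefined behavior; it may silently read garbage and continue. Always be more defensive
- Java: throws `ArrayIndexOutOfBoundsException` on arrays (`StringIndexOutOfBoundsException` from `String.charAt`), like Python
- Go: panics with a clear “index out of range” message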
Common Bugs
The four bugs themselves are the common bugs:
- Off-by-one in array bounds
- Off-by-one in binary search update
- Loop terminates one iteration early
- Wrong initial accumulator value
Debugging Strategy
- Trace table for the smallest failing input
- Print state every iteration (the printf approach)
- Verbalize the loop invariant — does the code uphold it?
- Step through with a debugger (last resort, not first — debugger usage is slower than tracing for small problems)
Deliverable
- Trace tables for all 4 functions (one per function)
- Bug identification + fix + test suite for each
- A “trace table template” you’ll reuse going forward
Mastery Criteria
- Found all 4 bugs via tracing (without peeking at answers)
- Wrote correct fixes
- Wrote tests that exposed each bug
- Can construct a trace table for an unseen function in <2 minutes
- Adopted “always trace one example before submitting” as a permanent habit
Lab 06 — Communication
Goal
Train explicit, structured communication during a coding interview: narrate intent, signpost transitions, ask before assuming, summarize at decision points.
Background Concepts
- The 6-phase communication structure (see COMMUNICATION.md)
- “Signposting” — short phrases that tell the listener which phase you’re in
- “Thinking out loud” without rambling
- Pause-points and decision-points
Interview Context
In a real interview, silence is your enemy. After ~30 seconds of silence, the interviewer becomes uncertain about your progress. After 90 seconds, they may interrupt — disrupting your thinking. Skilled candidates fill the time with structured narration that costs them little extra cognitive load.
Problem Statement
Record yourself solving 3 problems out loud, applying explicit communication scaffolding. Then transcribe and grade.
Recommended problems (Easy to keep cognitive load low):
- Two Sum
- Valid Parentheses
- Linked List Cycle
For each, follow this script structure:
Phase 1 — Restatement (60–90 sec)
- “Let me restate to make sure I have it: …”
- Ask 2–3 clarifying questions explicitly
Phase 2 — Examples (60 sec)
- “Let me work through an example by hand: …”
- Construct one trivial and one adversarial example
Phase 3 — Brute Force (60–90 sec)
- “The simplest approach would be …”
- “That’s O(…) — clearly not optimal because …”
Phase 4 — Optimization Discussion (90 sec)
- “I notice …; that suggests …”
- Explicitly mention the checklist item you matched
Phase 5 — Coding (5–10 min)
- Narrate intent at each block: “Now I’ll initialize the hashmap, then iterate…”
- Pause and verify after each block
Phase 6 — Testing & Closing (2–3 min)
- “Let me trace through the given example…”
- “Edge cases I should check: …”
- “Final complexity: …”
Constraints
Time-box yourself: 12 minutes total per problem. If you’re going over, cut narration first, not coding.
Clarifying Questions
For Two Sum:
- “Can the input be empty?”
- “Can there be duplicates?”
- “Is exactly one valid pair guaranteed?”
- “Can I assume integers, or could they be floats?”
For Valid Parentheses:
- “Are only `()[]{}` allowed, or other characters?”
- “Does an empty string count as valid?”
For Linked List Cycle:
- “Should I detect the cycle’s start, or just whether it exists?”
- “Can I modify the list?”
Examples
For each problem, narrate one example by hand before coding.
Initial Brute Force
State and narrate.
Brute Force Complexity
State out loud.
Optimization Path
Narrate the transition: “The brute is O(N^2) because we compare every pair. If I use a hashmap, I can check in O(1) whether target - a[i] is already seen, giving O(N).”
Final Expected Approach
State the final approach as a single sentence before coding.
Data Structures Used
State the data structure explicitly: “I’ll use a hashmap from value → index.”
Correctness Argument
State the loop invariant: “After processing index i, the map contains every (value, index) pair from a[0..i].”
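A minimal Two Sum sketch with that invariant written into the code as comments. The function name `two_sum` and the choice to return `None` when no pair exists are assumptions made for illustration; in an interview, confirm the expected return shape first.

```python
def two_sum(a, target):
    """Return indices (i, j) with i < j and a[i] + a[j] == target, or None."""
    seen = {}  # value -> index of an earlier occurrence
    for i, x in enumerate(a):
        # Invariant here: seen holds a (value, index) entry for a[0..i-1].
        need = target - x
        if need in seen:
            return seen[need], i
        seen[x] = i  # invariant now holds for a[0..i]
    return None
```

Checking `need in seen` before inserting `x` is what prevents an element from pairing with itself.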
Complexity
State at end: time + space + amortized vs worst case.
Implementation Requirements
You must produce:
- Three audio recordings (or written transcripts) of yourself solving the 3 problems with the script structure
- Working code for all three (the bar is correct + clean, not optimal at first try)
Tests
For each problem, state aloud:
- “Smoke test: given example”
- “Edge: empty input”
- “Edge: single element / smallest non-trivial”
- “Adversarial: all duplicates / all negatives / max constraint”
Then trace through at least one example.
Follow-up Questions
After your solution, narrate:
- “If we wanted all pairs instead of one, we’d …”
- “If the input were a stream, we’d …”
- “If memory were tight, we could …”
Product Extension
These same communication patterns scale to design interviews, on-call discussions, and code reviews. The skill is permanent.
Language/Runtime Follow-ups
When narrating, mention language-specific notes:
- “In Python I’m using `dict`, which is O(1) average; with adversarial input collisions it degrades to O(N).”
- “I’m using `enumerate` for index + value to keep the code clean.”
Common Bugs (in communication)
- Going silent for >60 seconds. Even saying “I’m trying to figure out the right invariant for the inner loop” is enough.
- Narrating typing instead of intent. “Now I’m typing `for i in range(n)`…” is bad. “Now I’ll iterate forward to maintain the prefix…” is good.
- Asking too many questions at once. Drip them; ask each one when the reason for it comes up.
- Restating the prompt verbatim. Paraphrase, demonstrate comprehension.
- Refusing hints when stuck. Frozen silence is far worse than asking for a nudge.
- Forgetting to summarize at the end. “So the final approach is X with O(N) time and O(N) space; tested on the given example and edge cases.”
Debugging Strategy
After recording, listen back and grade yourself on:
- Did I signpost each phase?
- Did I ever go silent for >60 seconds?
- Did I narrate intent or just typing?
- Did I summarize at decision points?
- Did I sound confident or apologetic?
- Was my code-talk congruent with my code? (i.e., did I narrate one thing while coding another?)
Deliverable
- 3 recordings or transcripts (12 minutes each)
- Self-grading rubric per recording (the 6 questions above + score 0–10 per phase)
- A list of 3 phrases / patterns you’ll reuse next time, and 3 you’ll avoid
Mastery Criteria
- Recorded all 3 problems
- Self-graded honestly (most candidates score 4–6/10 on first attempt — that’s expected)
- Identified at least 3 specific things to improve
- On a re-recording of one problem, scored at least 2 points higher
- No silent gaps > 30 seconds in your final recording
- Narration tracked code 90%+ of the time (no divergence)
Lab 07 — Stuck Recovery
Goal
Train the explicit “stuck protocol” so that when you don’t see the optimal in an interview, you don’t freeze — you systematically work through a checklist that has a high chance of unblocking you within 3 minutes.
Background Concepts
- The 11-step stuck protocol (see FRAMEWORK.md)
- The difference between productive struggle and unproductive freeze
- How to ask for a hint without losing the interview
Interview Context
Every candidate gets stuck. The signal in your favor is how you respond. A candidate who gets stuck and applies a visible recovery protocol is a strong signal. A candidate who gets stuck and goes silent is a weak signal regardless of whether they recover.
Problem Statement
Three problems below are deliberately chosen to be above what you can solve cold without a known pattern. For each:
- Apply the 11-step stuck protocol explicitly, narrating each step
- Time-cap each step at 60 seconds
- If you’ve used 8+ steps without progress, ask for a hint (in this lab, “ask for a hint” means: read the hint section at the bottom)
- Continue until solved
- Document which step generated the breakthrough
Problems (medium-difficulty, but you don’t have the pattern yet):
- Container With Most Water: given heights, find two indices `i, j` such that `min(h[i], h[j]) * (j - i)` is maximum.
- Longest Palindromic Substring: find the longest palindromic substring of a string.
- Search In Rotated Sorted Array: given a sorted array rotated at an unknown pivot, find target in O(log N).
(If you already know all 3 cold, replace with 3 unfamiliar mediums.)
Constraints
Standard. Assume N ≤ 10^5.
Clarifying Questions
For each problem, list 3 you’d ask the interviewer.
Examples
Construct 3 examples for each.
Initial Brute Force
For each problem, write the brute force first: O(N^2) pair enumeration for #1, O(N^3) substring checking for #2, and an O(N) linear scan for #3.
Brute Force Complexity
State.
Optimization Path — Apply The Stuck Protocol
The 11 steps in order:
- Restate the problem. Out loud, in your own words.
- Write the brute force. It is correct, even if slow.
- Examine the constraints. What does N suggest?
- Construct smaller examples. Solve N=2, N=3 by hand.
- Look for repeated work. What does the brute force recompute?
- Look for sortedness or monotonicity. Could sorting help? Is something monotonic?
- Look for symmetry / pattern from a known DS family. Two-pointer? Sliding window? Heap?
- Try graph modeling. Some non-graph problems become BFS/DFS when you reframe.
- Try a math approach. Closed form? Modular arithmetic? Combinatorial identity?
- State the simplest approach you have. Even if not optimal — get it down.
- Ask for a nudge. Phrase: “I’ve explored A, B, and C. I think the answer might involve D-family of approaches but I’m not seeing it cleanly. Could you nudge me?”
Final Expected Approach
For each problem, the recovery protocol leads to:
- Container With Most Water: Two pointers from both ends, moving the smaller one inward. Step 7 (two-pointer pattern) gets you there. Loop invariant: the optimal answer doesn’t include any pair `(L, R)` we’ve already discarded.
- Longest Palindromic Substring: Either expand-around-center (step 4: by hand on N=3,4,5 you spot the center idea) or DP (step 5: repeated work on overlapping substrings). Manacher’s is optimal, but step 4/5 is enough.
- Search In Rotated Sorted Array: Modified binary search. Step 6 (“is something monotonic?”) reveals that at least one half is sorted. Use that to decide which half to recurse into.
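As one concrete instance of where the protocol lands, here is a sketch of the two-pointer approach for Container With Most Water. The function name `max_area` and the convention of returning 0 for fewer than two heights are assumptions made for illustration.

```python
def max_area(h):
    """Largest min(h[i], h[j]) * (j - i) over all pairs i < j; 0 if fewer than 2 heights."""
    lo, hi = 0, len(h) - 1
    best = 0
    while lo < hi:
        best = max(best, min(h[lo], h[hi]) * (hi - lo))
        # Discard the shorter side: every remaining pair that keeps it is
        # narrower and still capped by its height, so none can beat `best`.
        if h[lo] < h[hi]:
            lo += 1
        else:
            hi -= 1
    return best
```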
Data Structures Used
- Two pointers (#1)
- Center expansion / DP table (#2)
- Modified binary search (#3)
Correctness Argument
For each, the post-recovery solution has a clear correctness argument:
- The “moving the smaller one” strategy is correct because moving the larger pointer cannot increase `min(h[i], h[j])` (still bounded by the smaller) and reduces width.
- Center-expansion enumerates every (center, length) pair exactly once.
- The sorted half contains target iff target lies between that half’s endpoint values (`a[lo] ≤ target ≤ a[mid]` for the left half, `a[mid] ≤ target ≤ a[hi]` for the right); otherwise recurse on the other half.
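The third argument translates almost line for line into code. This is a sketch that assumes distinct elements (duplicates break the "which half is sorted" test); the name `search_rotated` is illustrative.

```python
def search_rotated(a, target):
    """Index of target in a rotated ascending array of distinct values, or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[lo] <= a[mid]:
            # Left half a[lo..mid] is sorted: target is there iff it is in range.
            if a[lo] <= target < a[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # Otherwise the right half a[mid..hi] must be sorted.
            if a[mid] < target <= a[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
```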
Complexity
- #1: O(N) time, O(1) space
- #2: O(N^2) time, O(1) space (center expansion); Manacher’s is O(N)
- #3: O(log N) time, O(1) space
Implementation Requirements
For each problem, after applying the protocol, implement it and run tests. The deliverable is the protocol log + working code.
Tests
- Smoke: given example
- Edge: N=0, N=1, N=2
- Adversarial: all-equal heights (#1), all-same-char string (#2), no rotation (#3 with sorted input)
- Random: 50 small random inputs vs brute
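The "random vs brute" item can be sketched for problem #2 as follows. Both functions and the `stress` harness are illustrative, not a prescribed solution; because several longest palindromes may tie, the harness compares lengths rather than exact strings.

```python
import random


def longest_palindrome_brute(s):
    """O(N^3) oracle: test every substring."""
    best = ""
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            t = s[i:j]
            if len(t) > len(best) and t == t[::-1]:
                best = t
    return best


def longest_palindrome(s):
    """O(N^2) expand-around-center over all 2N-1 centers."""
    best_lo, best_hi = 0, 0  # half-open window s[best_lo:best_hi]
    for center in range(2 * len(s) - 1):
        lo, hi = center // 2, (center + 1) // 2  # odd centers sit between chars
        while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
            lo -= 1
            hi += 1
        # The loop overshoots by one on each side, so the match is s[lo+1:hi].
        if hi - lo - 1 > best_hi - best_lo:
            best_lo, best_hi = lo + 1, hi
    return s[best_lo:best_hi]


def stress(trials=50, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 8)))
        fast, slow = longest_palindrome(s), longest_palindrome_brute(s)
        assert fast == fast[::-1] and fast in s, s
        assert len(fast) == len(slow), f"s={s!r}: {fast!r} vs {slow!r}"
    return trials


stress()
```

A two-letter alphabet is used on purpose: it makes long palindromes common, which exercises the expansion loop far more than uniformly random text would.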
Follow-up Questions
- “Why didn’t you see the optimal immediately?” — be honest: “I hadn’t internalized the two-pointer pattern yet” or “I tried sweeping linearly and missed the symmetry.”
- “What pattern would you check next time?” — internalize the answer.
- “When would you fail to recognize this in an interview?” — when nervous, or when the problem is wrapped in unfamiliar context.
Product Extension
In production, “stuck on a bug” follows the same shape: restate symptom, find a minimal reproduction, narrow the search, hypothesize cause, verify by experiment, ask a teammate. The protocol generalizes.
Language/Runtime Follow-ups
- Python: for #2, O(N^2) center expansion will likely TLE at `N = 10^5` (billions of character comparisons on an all-same-char string); consider Manacher’s for full credit
- Java: `String.substring` shared the parent’s backing array before Java 7u6 (O(1)); since 7u6 it copies (O(length)). Beware version differences
- C++: raw character arrays can beat `std::string` in hot loops
Common Bugs
- Stuck protocol going too fast — you skip step 4 (small examples) and miss the breakthrough that lives there
- Stuck protocol going too slow — you spend 5 minutes on step 5 instead of moving on
- Not asking for a hint when 11 steps are exhausted (rare, but happens — pride costs the interview)
- Asking for a hint too early (before step 6) — looks weak
Debugging Strategy
If your protocol runs took >15 minutes per problem, you spent too long per step. Cap individual steps at 60 seconds; if no progress, move on.
If you finished without a hint, congratulations — but verify that you didn’t already know the pattern, in which case use a different problem.
Deliverable
- Per-problem stuck protocol log: for each of the 3 problems, a written record of which step you tried, how long, what you tried, did it advance you?
- The breakthrough step annotated.
- Working implementation + tests for each problem.
- Reflection: which 2 steps are weakest for you? (e.g., many candidates skip step 4 — small examples — because it feels too elementary, but it’s often the highest-yield step.)
Mastery Criteria
- Solved all 3 problems with explicit protocol use
- Spent ≤ 15 min per problem (including coding + tests)
- Identified the breakthrough step accurately
- Used at most 1 hint across the 3 problems
- Identified your two weakest protocol steps
- Re-attempted with the weak steps as starting points and noticed faster progress
Hints (peek only after attempting)
Container With Most Water hint: “Think about what happens at the two extreme ends — and which pointer moving inward could possibly improve the answer.”
Longest Palindrome hint: “Every palindrome has a center. How many possible centers are there?”
Search In Rotated Sorted Array hint: “Even though the full array is rotated, one half across any midpoint must be sorted. Can you tell which half?”
Phase 1 — Programming & Data Structure Foundations
Target level: Easy → low-Medium
Expected duration: 2 weeks (12-week track) / 4 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~5 labs/week + 30–50 problems applying every data structure under the framework
Why This Phase Exists
Phase 0 fixed your execution — how to read, communicate, derive constraints, brute force, optimize, test, and recover. Phase 1 fixes your vocabulary: the data structures and language-runtime concepts that 95% of every coding interview rests on.
If you cannot, on demand, state the amortized cost of a dynamic-array push, the worst-case behavior of a hash map under adversarial keys, why s += c in a Python loop is O(N²), what happens to your for-loop when you mutate the collection it iterates, and the difference between deep and shallow copy in your language — you do not have a foundation. You have a list of words.
This phase makes the foundation real. Every concept comes with: internal representation, complexity table, memory behavior, language-specific gotchas (Python, Java, Go, C++, JS/TS), interview traps, common bugs, and testing strategy.
What You Will Be Able To Do After This Phase
- Pick the correct data structure for any Easy/Medium problem in under 60 seconds.
- State the worst-case, average-case, and amortized complexity of every operation on every fundamental DS.
- Predict your code’s memory and cache behavior, not just its asymptotic time.
- Write idiomatic code in your interview language without falling into language-specific traps.
- Recognize when a problem is really a hash map / heap / monotonic stack problem in disguise.
- Implement, from scratch, every data structure listed below — without notes — in under 20 minutes each.
Concepts To Master
You must master every item below before moving to Phase 2. Pattern problems in Phase 2 assume fluency with these primitives.
Data Structures
- Arrays — operations, complexity, memory layout, cache behavior, dynamic resizing amortization, gotchas per language
- Strings — immutability, encoding pitfalls, concat-in-loop blowup, substring complexity
- Hash Maps — hashing, collisions, load factor, adversarial inputs, ordered vs unordered, custom hash for tuples
- Hash Sets — operations, set algebra, when to use vs map
- Linked Lists — singly/doubly, sentinels, tail pointer, common manipulation patterns
- Stacks — array-backed vs linked, monotonic stack preview
- Queues — deque, ring buffer, priority queue preview
- Heaps — binary heap, sift up/down, complexity, top-K pattern
- Sorted arrays / sorted sets — binary search, bisect, sorted-set complexity per language
- Trees — binary, BST, traversals iterative + recursive, balanced vs unbalanced
- Tries — operations, space cost, alphabet size considerations
- Graphs — adjacency list, matrix, edge list, when each
- Disjoint Set Union — path compression + union by rank
- Bitsets / Bit Manipulation — set/clear/popcount, common idioms
- Counters / Multisets — `Counter`, `HashMap` + counter pattern, multiset alternatives
Runtime Concepts
- Stack vs heap memory
- Scope and lifetime
- Value vs reference semantics
- Mutable vs immutable
- Hash collisions and adversarial keys
- Iterator invalidation
- Garbage collection basics (refcount vs tracing)
- Memory leaks (especially in GC’d languages)
- Deep vs shallow copy
- Recursion depth and stack overflow
Why These Concepts Matter In Interviews
Most “I knew the algorithm but couldn’t get it to pass” failures aren’t algorithmic. They are foundation failures:
- “My hash map is slow” → adversarial collision pattern in the input.
- “My recursion crashed” → no idea Python’s default recursion limit is 1000.
- “I edited the list while iterating and got weird output” → iterator invalidation.
- “My BFS queue is slow” → using `list.pop(0)` instead of `deque.popleft`.
- “Java said `ConcurrentModificationException`” → the JDK’s fail-fast iterator policy.
- “Go map iteration ordered differently every run” → intentional randomization, so code cannot come to depend on iteration order.
- “C++ vector reference invalidated after push_back” → reallocation moved the storage.
- “JS object stringified my int keys to strings” → object property keys are strings (or symbols); use `Map`.
Every one of these is on the rubric somewhere as “implementation correctness” or “language fluency.” Phase 1 closes them.
Inline Data Structure Reference
The remainder of this README is a data-structure and runtime reference manual. Read it linearly the first time. Skim it as a reference any time you forget a complexity, language API, or gotcha.
1. Arrays
Internal Representation
A contiguous block of memory holding fixed-size elements indexed by offset. Static arrays have fixed size; dynamic arrays (Python list, Java ArrayList, Go slice, C++ vector, JS Array) wrap a static array and grow it geometrically.
Operations and Complexity
| Operation | Static | Dynamic (avg) | Dynamic (worst) |
|---|---|---|---|
| Index read/write `a[i]` | O(1) | O(1) | O(1) |
| Append `push_back` | n/a | O(1) amortized | O(N) (resize) |
| Prepend / insert middle | n/a | O(N) | O(N) |
| Pop end | n/a | O(1) | O(1) |
| Pop front | n/a | O(N) | O(N) |
| Search (unsorted) | O(N) | O(N) | O(N) |
| Search (sorted, binary search) | O(log N) | O(log N) | O(log N) |
| Length | O(1) | O(1) | O(1) |
Memory Layout & Cache Behavior
Contiguous memory means the CPU prefetcher can stream elements predictively, giving arrays the best cache locality of any data structure. A linear scan over an int[] is typically 5–20× faster than the same scan over a linked list of the same length, even though both are O(N). When constants matter (HFT, hot loops), this difference dominates.
Dynamic Resizing Amortization
Geometric growth (typically 2× or 1.5×) gives O(1) amortized append: doubling means total work to grow from 1 to N is 1 + 2 + 4 + … + N = 2N - 1, amortized to O(1) per push. Linear growth (+1) gives O(N) amortized — never use it.
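The difference is easy to check by simulation. The sketch below counts only the element copies caused by reallocation (the N writes of the appends themselves are extra, which is where the 2N - 1 total comes from); `total_copies` and the `grow` callback are hypothetical names for illustration.

```python
def total_copies(n, grow):
    """Count element copies made by reallocations while appending n items."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size             # reallocate: copy every existing element
            capacity = grow(capacity)
        size += 1
    return copies


n = 1024
doubling = total_copies(n, lambda c: 2 * c)   # 1 + 2 + 4 + ... + 512
plus_one = total_copies(n, lambda c: c + 1)   # 1 + 2 + 3 + ... + 1023
print(doubling, plus_one)
```

With doubling the copy count stays below N; with +1 growth it is roughly N²/2.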
Language-Specific Gotchas
- Python: `list` overallocates with growth factor ~1.125; `arr.pop(0)` is O(N); use `collections.deque` for a queue. Lists are heterogeneous (each slot is a `PyObject*`), defeating cache locality. Use `array.array` or NumPy for primitive-typed storage.
- Java: `ArrayList` is backed by `Object[]`, so primitives are boxed. Use `int[]` for hot paths. `ArrayList.remove(0)` is O(N). `Arrays.asList(arr)` returns a fixed-size view, not a real `ArrayList`.
- Go: Slices are a 3-tuple `(ptr, len, cap)`. `append` may or may not reallocate, so appending to one slice can silently mutate another sharing storage. Always check capacity. `s = s[:0]` reuses storage; `s = nil` releases it.
- C++: `std::vector<T>` reallocates on `push_back` past capacity, invalidating all iterators and references. Reserve up front when you know the size. `vector<bool>` is a packed bitset, not `bool[]`; its `operator[]` returns a proxy.
- JS/TS: Arrays are dense or sparse; sparse arrays (`a[1000000] = 1`) have terrible perf. `arr.shift()` is O(N). Holes (`new Array(10)`) are skipped by `forEach` but not by an index-based `for` loop.
Common Interview Traps
- “Insert in the middle” — be sure they don’t actually want a different DS.
- “In-place” — explicitly disallows the cheap copy-to-new-array solution.
- Off-by-one at boundaries: `[l, r)` vs `[l, r]`.
2. Strings
Internal Representation
An array of code units (bytes for char[], UTF-16 code units in Java/JS; in Python 3, PEP 393 picks a fixed width per string: Latin-1, UCS-2, or UCS-4). In most languages, strings are immutable: every “modification” allocates a new string.
Operations and Complexity
| Operation | Complexity |
|---|---|
| Index `s[i]` | O(1) for fixed-width encoding, O(i) for UTF-8 by codepoint |
| Length | O(1) (cached) |
| Concat `s + t` | O(len(s) + len(t)), new allocation |
| Substring `s[l:r]` | O(r − l) (copy) in most langs, O(1) (view) in Go and Java pre-7u6 |
| Equality | O(min len) worst case; a length mismatch is typically rejected in O(1) |
| Find substring (naive) | O(NM) |
| Find substring (KMP / Z) | O(N + M) |
Immutability
Java/Python/JS strings are immutable. So this loop:
s = ""
for c in chars:
    s += c  # O(N) per iteration → O(N²) total
This bug shows up in real interviews and gets candidates dinged for not knowing language internals. Use "".join(chars) (Python), StringBuilder (Java), arr.join('') (JS), strings.Builder (Go).
Encoding Pitfalls
- “String length” in characters vs bytes vs grapheme clusters can all differ. `"é"` may be 1 codepoint, or 2 (e + combining accent); the 2-codepoint form is still 1 grapheme.
- Python `len("😀") == 1`; Java `"😀".length() == 2` (UTF-16 surrogate pair).
- JS `"😀".length === 2` for the same reason; iterate with `for…of` to get codepoints.
- Always clarify: “Are inputs ASCII?” If not, ask whether the unit of “character” is byte, codepoint, or grapheme.
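These length differences can be reproduced from Python alone using the standard `unicodedata` module. The variable names are illustrative; the UTF-16 figure is computed by encoding, which mirrors what Java and JS count as `length`.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "e\u0301")    # one codepoint: U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # 'e' + U+0301 combining acute

emoji = "\U0001F600"  # 😀, outside the Basic Multilingual Plane
utf16_units = len(emoji.encode("utf-16-le")) // 2     # code units, as Java/JS count
utf8_bytes = len(emoji.encode("utf-8"))

print(len(composed), len(decomposed), composed == decomposed)
print(len(emoji), utf16_units, utf8_bytes)
```

Both forms of "é" render identically but compare unequal, so normalize before comparing or hashing user-supplied text.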
Substring Complexity
Substring extraction copies the underlying chars in most modern languages. Go strings are immutable byte slices and substring is O(1) view (but converting to/from []byte is O(N)).
Language-Specific Gotchas
- Python: strings are immutable; use lists of chars and `"".join()` for building. `s[::-1]` is idiomatic reverse.
- Java: `s += c` in a loop is quadratic. Use `StringBuilder`. `String.intern()` exists but is rarely needed in interviews.
- Go: `string` is immutable bytes; `[]byte(s)` and `string(b)` each allocate. Range over a string yields `(i, rune)`, not `(i, byte)`.
- C++: `std::string` is mutable; SSO (small string optimization) stores short strings inside the string object itself, avoiding a heap allocation. `s.c_str()` is null-terminated.
- JS/TS: strings are UTF-16 code units; emoji and non-BMP chars are 2 units long.
3. Hash Maps
Internal Representation
A bucket array, indexed by hash(key) % capacity. Collisions resolved by either separate chaining (linked list / tree per bucket — Java since 8 promotes long chains to red-black trees) or open addressing (Python, Ruby — probe sequence in the same array).
Operations and Complexity
| Operation | Average | Worst |
|---|---|---|
| Insert | O(1) | O(N) (all collide) |
| Lookup | O(1) | O(N) |
| Delete | O(1) | O(N) |
| Iterate | O(N + capacity) | O(N + capacity) |
Load Factor
The size / capacity ratio (stored entries divided by bucket count). When the load factor exceeds a threshold (~0.75 typical), the table doubles and rehashes: the resize itself is O(N), but inserts remain amortized O(1).
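To make the resize rule concrete, here is a minimal separate-chaining map that doubles when size / capacity exceeds 0.75. It is a teaching sketch under stated assumptions (list-based buckets, Python's built-in `hash`, no deletion), not how any production runtime implements its table; the class name `ChainedHashMap` is hypothetical.

```python
class ChainedHashMap:
    """Separate-chaining hash map that doubles when size/capacity > 0.75."""

    MAX_LOAD = 0.75

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite, size unchanged
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self.MAX_LOAD:
            self._resize()

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self):
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:  # rehash every entry: the O(N) step
                self._bucket(k).append((k, v))

    def __len__(self):
        return self._size

    @property
    def capacity(self):
        return len(self._buckets)


m = ChainedHashMap()
for i in range(100):
    m.put(i, i * i)
```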
Adversarial Inputs
A static, public hash function lets an attacker craft N keys that all hash to the same bucket → O(N²) total insertion time. Real-world history: the attack was described publicly in 2003 and demonstrated against web stacks including PHP and Java in 2011. Modern languages (Python PYTHONHASHSEED, Go map randomization, Java tree fallback) defend against this.
In an interview, if the problem says “the input may be adversarial,” do not rely on hash maps for worst-case bounds — use sorting + binary search or a balanced BST.
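The degradation can be observed directly by forcing every key to collide. The sketch below uses a hypothetical `CollidingKey` class whose `__hash__` is constant and counts `__eq__` calls during insertion into a plain `dict`; the exact counts depend on the interpreter, so only the growth trend is meaningful.

```python
class CollidingKey:
    """Key whose hash is constant, so every instance collides with every other."""

    eq_calls = 0

    def __init__(self, ident):
        self.ident = ident

    def __hash__(self):
        return 0  # deliberately terrible: all keys share one hash value

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.ident == other.ident


def comparisons_for(n):
    """Insert n colliding keys into a dict; return how many __eq__ calls it took."""
    CollidingKey.eq_calls = 0
    table = {}
    for i in range(n):
        table[CollidingKey(i)] = i
    return CollidingKey.eq_calls


small, large = comparisons_for(10), comparisons_for(100)
print(small, large)
```

Ten times as many keys costs far more than ten times as many comparisons, which is the quadratic behaviour an attacker exploits.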
Ordered vs Unordered
- Insertion-ordered: Python 3.7+ `dict`, Java `LinkedHashMap`, JS `Map`.
- Sorted (by key): Java `TreeMap`, C++ `std::map`, Python `sortedcontainers.SortedDict` (third-party).
- Hash, no order guarantee: C++ `std::unordered_map`, Java `HashMap` (iteration order is unspecified and can change on resize), Go `map` (intentionally randomized).
If the problem requires ordered iteration, do not use a plain hash map.
Custom Hash for Tuples / Composite Keys
- Python: tuples of hashable items are hashable for free.
- Java: must implement both `equals` and `hashCode` for any custom key class. Forgetting one is a top-10 interview bug.
- Go: map keys must be “comparable types”: structs of comparables work, slices/maps don’t.
- C++: must specialize `std::hash<T>` or pass a custom hasher to `unordered_map`.
- JS: object keys are coerced to strings; use `Map` for object-keyed maps.
Common Interview Traps
- Mutating a key after insertion (its hash changes; the map can’t find it).
- Iterating while mutating → `ConcurrentModificationException` (Java) / `RuntimeError` (Python).
- Assuming O(1) without acknowledging worst case.
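The iterate-while-mutating trap, and one common way around it, in Python. Both function names are hypothetical; the safe version iterates over a snapshot of the keys, which costs O(N) extra memory.

```python
def unsafe_prune_raises(counts):
    """Try deleting zero-count keys during iteration; report whether it blew up."""
    try:
        for key in counts:
            if counts[key] == 0:
                del counts[key]  # resizes the dict while its iterator is live
    except RuntimeError:
        return True
    return False


def safe_prune(counts):
    """Delete zero-count keys by iterating over a snapshot of the keys."""
    for key in list(counts):
        if counts[key] == 0:
            del counts[key]
    return counts


print(unsafe_prune_raises({"a": 1, "b": 0, "c": 2}))
print(safe_prune({"a": 1, "b": 0, "c": 2}))
```

An alternative with the same effect is to build a new dict with a comprehension and rebind the name.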
4. Hash Sets
Operations and Complexity
Same as hash map (a hash set is conceptually a hash map with null values).
| Operation | Average | Worst |
|---|---|---|
| add / contains / remove | O(1) | O(N) |
| union | O(len(A) + len(B)) | |
| intersection | O(min(len(A), len(B))) | |
| difference | O(len(A)) | |
Set Algebra
- Union: `A ∪ B`. Python `A | B`, Java `A.addAll(B)` (mutates `A`), C++ manual.
- Intersection: `A ∩ B`. Iterate the smaller, lookup in the larger.
- Difference: `A \ B`. Iterate A, skip if in B.
- Symmetric difference: `A △ B`, i.e. `(A ∪ B) \ (A ∩ B)`.
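In Python the four operations map onto built-in operators. The hand-written `intersect` below is an illustrative helper showing the "iterate the smaller, look up in the larger" rule; the built-in `&` should be preferred in real code.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b
intersection = a & b
difference = a - b
symmetric = a ^ b


def intersect(x, y):
    """Iterate the smaller set and probe the larger one."""
    small, large = (x, y) if len(x) <= len(y) else (y, x)
    return {item for item in small if item in large}


print(union, intersection, difference, symmetric)
```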
Set vs Map: When To Choose
Use a set when you only need presence; use a map when you need associated value (count, index, parent, etc.). Many “use a set” problems become trivially easier with a map (e.g., Two Sum needs a value → index map, not a set).
Language-Specific Gotchas
- Python: `set` and `frozenset`; tuples are hashable, lists are not.
- Java: `HashSet`, `LinkedHashSet`, `TreeSet`.
- Go: no built-in set; use `map[T]struct{}` (zero-byte value).
- C++: `std::unordered_set`, `std::set` (sorted).
- JS/TS: `Set` preserves insertion order; objects are compared by reference, so two structurally equal objects are distinct members.
5. Linked Lists
Singly vs Doubly
- Singly: each node has `next`. Reverse, cycle detection, two-pointer dance.
- Doubly: each node has `next` and `prev`. Required for O(1) erase given an iterator (LRU cache pattern).
Sentinels (Dummy Nodes)
A dummy head node simplifies edge cases: “what if the list is empty?” “what if we delete the first node?” become non-special. Always use a dummy when you have to return a head pointer that may change.
dummy = ListNode(0)
dummy.next = head
prev = dummy
# … operations on prev.next …
return dummy.next
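The fragment above becomes a complete function once a concrete operation is chosen. As an assumed example, the sketch below removes every node with a given value; `remove_elements`, `from_list`, and `to_list` are illustrative names, and the point is that deleting the first node needs no special case.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def from_list(values):
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def remove_elements(head, target):
    """Remove every node whose value equals target; return the new head."""
    dummy = ListNode(0, head)
    prev = dummy
    while prev.next is not None:
        if prev.next.val == target:
            prev.next = prev.next.next  # unlink; prev stays to re-check the new next
        else:
            prev = prev.next
    return dummy.next


print(to_list(remove_elements(from_list([7, 1, 7, 2, 7]), 7)))
```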
Tail Pointer
Maintaining a tail pointer makes append O(1) (otherwise O(N)). Always note whether your manipulation invalidates the tail.
Common Manipulation Patterns
- Reverse iteratively: three pointers `prev`, `curr`, `next`.
- Reverse recursively: `reverse(head.next)` then `head.next.next = head; head.next = None`.
- Find middle: slow/fast pointers; fast moves 2× slow.
- Detect cycle: Floyd’s tortoise and hare.
- Merge two sorted lists: dummy head + zip pattern.
- Remove Nth from end: lead pointer ahead by N.
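Two of these patterns share the same slow/fast skeleton, sketched below. The `ListNode` class and the `build` helper are assumptions for illustration; note that `middle` returns the second of the two middle nodes for even lengths, which is a convention you should state out loud.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def build(values):
    """Return (head, tail) of a list built from values; (None, None) if empty."""
    head = tail = None
    for v in values:
        node = ListNode(v)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head, tail


def middle(head):
    """Middle node (second middle for even length), or None for an empty list."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    return slow


def has_cycle(head):
    """Floyd's tortoise and hare: the pointers meet iff there is a cycle."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```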
Operations and Complexity
| Operation | Singly | Doubly |
|---|---|---|
| Index | O(N) | O(N) |
| Insert at known node | O(1) | O(1) |
| Delete at known node | O(N)* | O(1) |
| Search | O(N) | O(N) |
*Singly: O(N) because you need the previous node; O(1) if you have the previous pointer.
Language-Specific Gotchas
- Python: define `class ListNode: __slots__ = ('val', 'next')` for memory; the default uses a per-instance `__dict__`.
- Java: `LinkedList<T>` exists but is rarely the right choice; `ArrayDeque` beats it on most operations.
- Go: standard library has `container/list` (doubly linked; it stores values as `any` and is not generic).
- C++: `std::list` (doubly), `std::forward_list` (singly).
- JS: typically write `class Node { constructor(v) { this.val = v; this.next = null; } }`.
6. Stacks
Implementations
Array-backed (dynamic array, push/pop end) or linked (push/pop head). Array-backed is faster in practice due to cache locality.
Operations and Complexity
| Operation | Complexity |
|---|---|
| push | O(1) amortized |
| pop | O(1) |
| peek (top) | O(1) |
Monotonic Stack Preview
A monotonic stack maintains elements in increasing or decreasing order. Used for next-greater / next-smaller / largest rectangle in histogram. Preview only — full pattern in Phase 2.
for x in arr:
    while stack and stack[-1] < x:
        # process element being popped: x is its next-greater
        stack.pop()
    stack.append(x)
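A runnable version of the preview loop, as a sketch: it stores indices rather than values so the answer can be written back per position. The function name next_greater and the -1 sentinel for "no greater element" are choices made for this illustration.

```python
def next_greater(arr):
    """For each index, return the next strictly greater value to its right, or -1."""
    result = [-1] * len(arr)
    stack = []  # indices still waiting for a greater value; their values are non-increasing
    for i, x in enumerate(arr):
        while stack and arr[stack[-1]] < x:
            result[stack.pop()] = x  # x is the next-greater for the popped index
        stack.append(i)
    return result
```

Each index is pushed once and popped at most once, so the whole pass is O(N) despite the nested loop.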
Language-Specific Gotchas
- Python: use list directly with append / pop. Don’t use queue.LifoQueue (locks).
- Java: prefer ArrayDeque over the legacy Stack (synchronized, slow).
- Go: slice with s = append(s, x) and s[len(s)-1] / s = s[:len(s)-1].
- C++: std::stack<T> adapter on std::deque; or just use vector.
- JS: array push / pop.
7. Queues
Variants
- Plain queue (FIFO): enqueue rear, dequeue front.
- Deque (double-ended): push/pop both ends in O(1).
- Ring buffer / circular buffer: fixed-capacity deque on a static array.
- Priority queue: see Heaps.
Operations and Complexity
| Operation | Linked | Array deque | Ring buffer |
|---|---|---|---|
| enqueue rear | O(1) | O(1) amortized | O(1) (if not full) |
| dequeue front | O(1) | O(1) | O(1) |
| peek front | O(1) | O(1) | O(1) |
Language-Specific Gotchas
- Python: collections.deque for queue; never use list.pop(0) (O(N)).
- Java: ArrayDeque<T> for both stack and queue. LinkedList works but is slower. Avoid java.util.Queue<T> q = new LinkedList<>(); for hot paths.
- Go: no built-in deque; container/list exists. Most CP code uses a slice as a queue with q[head:] (lazy popfront).
- C++: std::deque (O(1) random access and O(1) push/pop at both ends) or the std::queue adapter.
- JS: array push / shift works but shift is O(N); use a custom ring buffer for large queues.
8. Heaps
Binary Heap
A complete binary tree where each parent ≤ children (min-heap) or ≥ (max-heap). Stored in an array: parent of i is (i-1)/2, children are 2i+1 and 2i+2.
Operations and Complexity
| Operation | Complexity |
|---|---|
| push (sift up) | O(log N) |
| pop top (sift down) | O(log N) |
| peek top | O(1) |
| heapify (build from N elements) | O(N) |
| decrease-key | O(log N) (if you know the index) |
| arbitrary delete | O(log N) (if you know the index), else O(N) |
Top-K Pattern (Preview)
- “Top K largest” → min-heap of size K. Push every element; if size > K, pop. Final heap = top K.
- “Stream median” → max-heap (lower half) + min-heap (upper half), balanced.
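Both previews can be sketched with the standard heapq module. The names top_k_largest and RunningMedian are illustrative, and the median sketch assumes numeric inputs and at least one add() call before median().

```python
import heapq


def top_k_largest(nums, k):
    """Keep a min-heap of size k; the root is the smallest of the current top k."""
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)  # evict the smallest, which cannot be in the top k
    return sorted(heap, reverse=True)


class RunningMedian:
    """Two heaps: lo is a max-heap (stored negated) for the lower half, hi a min-heap."""

    def __init__(self):
        self.lo = []
        self.hi = []

    def add(self, x):
        # Route through lo so that every element of lo stays <= every element of hi.
        heapq.heappush(self.lo, -x)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        # Rebalance so that len(lo) is len(hi) or len(hi) + 1.
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2
```

The top-K loop costs O(N log K) rather than the O(N log N) of a full sort, which is the point of bounding the heap size.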
Language-Specific Gotchas
- Python: heapq is a min-heap of any orderable; for max-heap, push -x. Tuples break ties lexicographically; be careful with non-orderable secondary keys.
- Java: PriorityQueue<T> is a min-heap by default; pass Comparator.reverseOrder() for max. peek() returns null on empty.
- Go: container/heap requires you to implement the heap.Interface. Significant boilerplate.
- C++: std::priority_queue<T> is a max-heap by default. Use std::priority_queue<T, vector<T>, greater<T>> for min-heap. Or use make_heap + push_heap + pop_heap on a vector.
- JS: no built-in; write your own or use a library.
9. Sorted Arrays / Sorted Sets
Binary Search
On a sorted array, find a target or its insertion point in O(log N). Three canonical variants:
- Lower bound: smallest index where a[i] >= target.
- Upper bound: smallest index where a[i] > target.
- Exact match: lower bound + check a[i] == target.
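The three variants differ by a single comparison. A minimal sketch on a half-open search range [lo, hi); the function names mirror the C++ terminology and are not from any particular library.

```python
def lower_bound(a, target):
    """Smallest index i with a[i] >= target, or len(a) if none."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1   # everything up to mid is too small
        else:
            hi = mid       # mid may be the answer; keep it in range
    return lo


def upper_bound(a, target):
    """Smallest index i with a[i] > target, or len(a) if none."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find(a, target):
    """Exact match: lower bound plus an equality check; -1 if absent."""
    i = lower_bound(a, target)
    return i if i < len(a) and a[i] == target else -1
```

In Python these correspond to bisect.bisect_left and bisect.bisect_right; upper_bound(a, x) - lower_bound(a, x) counts the occurrences of x.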
Operations on Sorted Set / Multiset
| Operation | Complexity |
|---|---|
| Insert | O(log N) |
| Erase | O(log N) |
| Find / lower_bound / upper_bound | O(log N) |
| Iterate in order | O(N) |
| Min / max | O(1) (or O(log N)) |
| Kth element | O(log N) (with order statistics tree) or O(K) iteration |
Language-Specific Gotchas
- Python: bisect.bisect_left / bisect_right for sorted lists. sortedcontainers.SortedList (third-party) for an ordered multiset (O(log N) insert).
- Java: TreeSet<T> / TreeMap<K,V>; floor, ceiling, higher, lower are essential APIs to know.
- Go: no standard sorted set; implement one or use a third-party library.
- C++: std::set / std::multiset (red-black tree); lower_bound / upper_bound member functions.
- JS: no standard sorted set; use a sorted array with binary search or write a treap.
10. Trees
Binary Tree Definitions
- Binary tree: each node has ≤ 2 children.
- BST: left subtree < node < right subtree (in-order traversal yields sorted sequence).
- Balanced (AVL, RB): height O(log N) guaranteed.
- Unbalanced: worst case O(N) (degenerate to linked list).
Traversals (Recursive)
def inorder(n):
    if not n: return
    inorder(n.left)
    visit(n)
    inorder(n.right)
Pre-order: visit, left, right. Post-order: left, right, visit. Level-order: BFS with a queue.
Traversals (Iterative)
- In-order with stack: push left chain, pop and visit, then go right.
- Pre-order with stack: push root, pop, visit, push right then left.
- Post-order with stack: trickier — use a marker or a 2-stack trick.
- Morris traversal: O(1) extra space using threaded pointers; advanced.
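The first three bullets can be sketched as follows. TreeNode is a hypothetical minimal node class defined here for self-containment, and the post-order version uses the marker technique (a boolean paired with each node) rather than two stacks.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def inorder_iter(root):
    out, stack, node = [], [], root
    while stack or node:
        while node:               # push the whole left chain
            stack.append(node)
            node = node.left
        node = stack.pop()        # visit
        out.append(node.val)
        node = node.right         # then go right
    return out


def preorder_iter(root):
    if root is None:
        return []
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node.val)
        if node.right:            # push right first so left is popped first
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return out


def postorder_iter(root):
    out, stack = [], [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if node is None:
            continue
        if children_done:
            out.append(node.val)  # both subtrees were emitted already
        else:
            stack.append((node, True))
            stack.append((node.right, False))
            stack.append((node.left, False))
    return out
```

All three run in O(N) time with O(H) extra space, where H is the tree height.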
Operations and Complexity (Balanced)
| Operation | Balanced | Unbalanced |
|---|---|---|
| Insert / delete / search | O(log N) | O(N) |
| Min / max | O(log N) | O(N) |
| In-order traversal | O(N) | O(N) |
Language-Specific Gotchas
- Python: sys.setrecursionlimit; CPython has no tail-call elimination.
- Java: TreeMap is red-black; recursion depth is limited by the JVM thread stack (typically thousands of frames at default settings; tune with -Xss).
- Go: no standard balanced BST.
- C++: std::map / std::set are typically red-black trees; iterators traverse in order.
- JS: no standard balanced BST; recursion depth is engine-dependent.
11. Tries
Internal Representation
Each node has up to alphabet-size children (e.g., 26 for lowercase, 256 for byte, larger for unicode). End-of-word flag per node.
Operations and Complexity
| Operation | Complexity |
|---|---|
| Insert word of length L | O(L) |
| Search word of length L | O(L) |
| Prefix search | O(L) |
| Space | O(total chars × alphabet size) |
Alphabet Size Considerations
Fixed array per node: O(σ) memory per node, fast O(1) child lookup. HashMap per node: O(actual children) memory, slightly slower lookup. For 26 letters use array; for full unicode use hash map.
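A minimal sketch of the hash-map-per-node variant, which works for any alphabet. The class and method names (TrieNode, Trie, starts_with) are illustrative choices, not a standard API.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}     # char -> TrieNode; only children that exist are stored
        self.is_word = False   # end-of-word flag


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_word = True

    def _walk(self, s):
        """Follow s from the root; return the final node, or None if the path breaks."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None
```

For a fixed 26-letter alphabet, replacing the dict with a 26-slot list indexed by ord(ch) - ord('a') trades memory per node for a cheaper child lookup.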
Language-Specific Gotchas
- Python: dict of dicts; can use defaultdict(dict).
- Java: Map<Character, TrieNode> or TrieNode[26].
- Go: struct with [26]*TrieNode.
- C++: struct with TrieNode* children[26]. Manual memory management or unique_ptr.
- JS: plain objects or Map.
12. Graphs
Representations
| Representation | Space | Edge query | Iterate neighbors |
|---|---|---|---|
| Adjacency list | O(V + E) | O(deg(v)) | O(deg(v)) |
| Adjacency matrix | O(V²) | O(1) | O(V) |
| Edge list | O(E) | O(E) | O(E) |
When To Use Each
- Adjacency list: sparse graphs (E ≪ V²) — almost always the right answer for interviews.
- Adjacency matrix: dense graphs (E ≈ V²), Floyd-Warshall, when V is small (≤ 500).
- Edge list: Kruskal’s MST, when you need to sort edges, when graph is given as edges and you don’t need neighbor queries.
Common Forms in Interviews
- List<List<Integer>> adjacency.
- Map<String, List<String>> for non-integer node IDs.
- Implicit graph (grid: neighbors are (±1, 0) and (0, ±1)).
Language-Specific Gotchas
- Python: defaultdict(list) is ideal for adj[u].append(v).
- Java: List<List<Integer>> with explicit ArrayList initialization in a loop; primitive int adjacency lists need third-party libraries (Eclipse Collections, fastutil).
- Go: [][]int slice-of-slices.
- C++: vector<vector<int>>; for performance, use vector<int> with offsets (CSR format).
- JS: array of arrays or Map<string, string[]>.
13. Disjoint Set Union (Union-Find)
Operations
- find(x): which component is x in?
- union(x, y): merge components of x and y.
Optimizations
- Path compression: during find, set parent of every visited node to the root.
- Union by rank/size: attach the shorter tree under the taller.
- Together: O(α(N)) per op (inverse Ackermann: practically constant, ≤ 4 for any realistic N).
Naive vs Optimized
| Variant | find | union |
|---|---|---|
| Naive | O(N) | O(N) |
| Path compression only | O(log N) amortized | O(log N) amortized |
| Path compression + union by rank | O(α(N)) amortized | O(α(N)) amortized |
Reference Implementation (Python)
parent = list(range(N))  # N = number of elements
rank = [0] * N
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving: a one-pass form of path compression
        x = parent[x]
    return x
def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry: return False
    if rank[rx] < rank[ry]: rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]: rank[rx] += 1
    return True
Language-Specific Gotchas
- Python: a recursive find can exceed the default recursion limit (1000) on long parent chains, for example when union by rank is omitted; prefer the iterative form.
- Java: prefer int[] parent over Map<Integer, Integer> for primitive perf.
- Go: straightforward with []int.
- C++: vector<int> parent and rank.
- JS: typed array Int32Array for speed.
14. Bitsets / Bit Manipulation
Common Idioms
| Operation | Idiom |
|---|---|
| Set bit i | x \| (1 << i) |
| Clear bit i | x & ~(1 << i) |
| Toggle bit i | x ^ (1 << i) |
| Test bit i | (x >> i) & 1 |
| Lowest set bit | x & -x |
| Pop lowest set bit | x & (x - 1) |
| Popcount | language builtin (__builtin_popcount, Integer.bitCount, bin(x).count('1')) |
| Iterate subsets of mask | s = mask; while s > 0: …; s = (s - 1) & mask |
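A few of these idioms as runnable Python, as a sketch assuming non-negative integers. The helper names are illustrative; in practice int.bit_count() (Python 3.10+) replaces the manual popcount.

```python
def lowest_set_bit(x):
    """Isolate the lowest set bit, e.g. 0b1100 -> 0b0100."""
    return x & -x


def clear_lowest_set_bit(x):
    """Drop the lowest set bit, e.g. 0b1100 -> 0b1000."""
    return x & (x - 1)


def popcount(x):
    """Count set bits by repeatedly clearing the lowest one (Kernighan's loop)."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


def submasks(mask):
    """Yield every non-empty submask of mask, in decreasing numeric order."""
    s = mask
    while s > 0:
        yield s
        s = (s - 1) & mask  # step to the next smaller subset of mask's bits
```

The submask loop as written skips the empty subset; handle 0 separately if the problem needs it.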
Bitsets
A packed array of bits: 8× the density of a byte-per-element bool[] and up to 64× the throughput on word-parallel operations (AND, OR, XOR, shifts over 64-bit words). Use when N is up to ~10⁵ and you need fast set operations.
Language-Specific Gotchas
- Python: ints are arbitrary precision; no overflow but no SIMD either. bin(x).count('1') works but int.bit_count() (3.10+) is faster.
- Java: int is 32-bit, long is 64-bit. Negative numbers: >> is arithmetic, >>> is logical.
- Go: untyped constants vs typed; explicit cast required (int(x), uint32(x)).
- C++: std::bitset<N> for compile-time-known N; vector<bool> is a specialization (proxy reference).
- JS: bitwise ops coerce to 32-bit signed int, so beware truncation. Use BigInt for 64-bit ops.
15. Counters / Multisets
Counter Pattern
A hash map from key to count. Used for: frequency analysis, anagram detection, sliding window distinct-element count.
Operations
| Operation | Complexity |
|---|---|
| Increment count[k] | O(1) |
| Decrement / remove if zero | O(1) |
| Total count | O(distinct keys) |
| Sorted by count | O(N log N) |
Multiset
A counter is essentially a multiset (allows duplicates, remembers count). For an ordered multiset (need min/max/kth in order), use a TreeMap+count or sortedcontainers.SortedList.
Language-Specific Gotchas
- Python: collections.Counter: Counter(s), most_common(k), arithmetic operators (c1 - c2 drops zero and negative counts).
- Java: HashMap<K, Integer> with getOrDefault(k, 0) + 1 or merge(k, 1, Integer::sum). There is no built-in Counter.
- Go: map[K]int. Manual increment.
- C++: std::unordered_map<K, int>; ++m[k] works because a value-initialized int is 0.
- JS: Map (preserves insertion order) or plain object (string keys only).
Inline Runtime Concepts Reference
These concepts cut across all data structures. They are interview rubric line items in their own right.
1. Stack vs Heap Memory
The call stack holds function frames: locals, args, return addresses. Fast, fixed-size (typically 1–8 MB in interview environments). Allocations are pointer-bump and bound to the frame’s lifetime.
The heap is dynamically managed memory — new, malloc, Python objects, JVM objects. Slower allocation, can be GC’d or manually freed. Survives beyond the function that created it.
Why It Matters In Interviews
- “Why does my recursion overflow at depth 10⁵?” → call stack ~1 MB / ~64 bytes per frame ≈ 16K frames before crash.
- “Why is my linked list slower than my array?” → heap-allocated nodes scattered in memory, no cache locality.
Per-Language Gotchas
- Python: all values, including ints, are heap objects (CPython pre-allocates and reuses small ints). Locals are name bindings, not stack-allocated values.
- Java: primitives in locals are stack-allocated; objects always heap. Escape analysis can sometimes elide a heap alloc.
- Go: escape analysis decides stack vs heap; returning &x, or capturing x in a closure that outlives the function, forces heap allocation.
- C++: explicit (int x; is stack, new int is heap). Stack allocation is much faster.
- JS: all objects are heap; primitives may be stack-internal but the language hides it.
2. Scope and Lifetime
Scope = where a name is visible. Lifetime = how long the value lives.
These can differ! In a closure, a variable’s scope ends with the function but its lifetime extends as long as the closure references it.
Per-Language Gotchas
- Python: late binding in closures: [lambda: i for i in range(3)] all return 2, not 0/1/2.
- Java: local variables captured by lambdas must be effectively final.
- Go: loop variable capture changed in Go 1.22; pre-1.22, for i := range … { go func() { use(i) }() } captures the same i.
- C++: dangling reference if you return &local. Lifetime extension via const& is a niche rule.
- JS: var is function-scoped, let / const are block-scoped. Hoisting trap.
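The Python late-binding trap is easy to reproduce. A small sketch: the variable names are illustrative, and the default-argument trick shown is one common workaround, not the only one.

```python
# Late binding: every lambda closes over the same variable i, read when called.
late = [lambda: i for i in range(3)]
late_values = [f() for f in late]      # all see the final value of i

# Workaround: bind the current value at definition time via a default argument.
bound = [lambda i=i: i for i in range(3)]
bound_values = [f() for f in bound]    # each lambda keeps its own value

print(late_values)   # [2, 2, 2]
print(bound_values)  # [0, 1, 2]
```

The scope of i ended with the comprehension, but its lifetime continues for as long as the lambdas reference it, which is exactly the scope-versus-lifetime split described above.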
3. Value vs Reference Semantics
Does assigning or passing a variable copy the value or share a reference?
| Language | Primitives | Objects/Arrays |
|---|---|---|
| Python | by value (immutable) | by reference |
| Java | by value | reference passed by value |
| Go | by value (struct, array) | slice/map/chan = ref-ish |
| C++ | by value (default) | explicit & or * for ref |
| JS | by value | by reference |
Trap
“I passed my array to the function, modified it inside, and the caller saw the change!” Yes: the function received a reference to the same array object (Python list, Java int[], JS array), not a copy.
“I passed my int to the function, modified it inside, but the caller didn’t see the change!” — yes, because the int is by value.
4. Mutable vs Immutable
Immutable values cannot be modified after creation; “modification” returns a new value.
| Language | Immutable types |
|---|---|
| Python | str, int, tuple, frozenset, bytes |
| Java | String, all primitive wrappers, LocalDate, etc. |
| Go | string |
| C++ | const-qualified |
| JS | primitives (string, number, boolean, undefined, null, bigint, symbol) |
Trap
- Using a mutable object as a hash map key, then mutating it → key lost.
- “Why is s += c slow?” The string is immutable, so every iteration copies the whole string.
5. Hash Collisions and Adversarial Keys
A hash map’s O(1) average requires a “good” hash function and “uniform” inputs. An adversary who knows the hash function can craft keys all colliding to one bucket → O(N²) blowup.
Defenses
- Random seed (Python’s PYTHONHASHSEED, Go map random seed): attacker can’t predict the function.
- Tree fallback: Java HashMap since 8 converts long collision chains into red-black trees, capping worst case at O(log N).
- Cryptographic hash: overkill, but immune.
Interview Note
If the problem explicitly says “adversarial” or “competitive” inputs, do not rely on hash maps. Use a sorted structure (TreeMap, sortedcontainers, std::map).
6. Iterator Invalidation
Modifying a collection while iterating can break the iterator.
Behaviors
- Python: adding or removing keys in a dict during iteration raises RuntimeError.
- Java: ConcurrentModificationException (fail-fast iterator).
- Go: map iteration order is randomized; modifying the map mid-iteration is technically allowed but the new keys may or may not be visited.
- C++: vector::push_back invalidates all iterators if it reallocates. unordered_map::insert invalidates all iterators on rehash.
- JS: Map and Set iteration sees later insertions; deletions are honored.
Safe Pattern
Collect mutations into a list during iteration; apply after the loop.
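A short Python sketch of the unsafe and safe forms, using a dict of counts as the collection. The function names prune_unsafe and prune_safe are hypothetical.

```python
def prune_unsafe(counts):
    """Deletes while iterating: CPython raises RuntimeError on the next step."""
    for key in counts:
        if counts[key] == 0:
            del counts[key]   # changes the dict's size mid-iteration


def prune_safe(counts):
    """Collect the keys to delete first, then mutate after the loop."""
    doomed = [key for key, value in counts.items() if value == 0]
    for key in doomed:
        del counts[key]
    return counts
```

An equally common alternative is to build a new dict with a comprehension and rebind, at the cost of O(N) extra space.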
7. Garbage Collection Basics
Two main strategies:
- Reference counting (refcount): each object has a count of references to it; when 0, freed. Fast but cannot collect cycles. CPython uses refcount + cycle collector.
- Tracing GC (mark-and-sweep, generational): periodically traces from roots; unreachable objects freed. Java, Go, JS, C# use variants.
Why It Matters
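- Refcount: when the last reference goes away (for example via del x), the object is freed immediately; destruction is deterministic.
- Tracing: deallocation happens “later”; historically this meant “stop-the-world” pauses, which modern collectors mostly run concurrently or amortize.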
Per-Language Gotchas
- Python: cycles between objects with __del__ finalizers may never collect (pre-3.4).
- Java: “GC pause” is a real concern in latency-sensitive interviews; mention G1, ZGC.
- Go: GC is concurrent and non-generational; predictable sub-millisecond pauses.
- C++: no GC; you must delete what you new. RAII (smart pointers) automates this.
- JS: V8 uses generational GC.
8. Memory Leaks (in GC’d Languages)
A “leak” in a GC’d language means: the object is unreachable from the developer’s intent, but reachable from the GC’s perspective — so it’s never freed.
Common Sources
- Listeners not removed: event handler holds a reference to the listener forever.
- Caches without eviction: map grows monotonically.
- Closure capture: closure holds reference to large enclosing object.
- Static fields: lives forever.
- ThreadLocals not cleared in pooled threads: classic JVM leak.
Per-Language Gotchas
- Python: circular refs with finalizers; module-level state.
- Java: ThreadLocal + thread pool; classloader leaks (PermGen/Metaspace).
- Go: goroutine leak (blocking on a channel that never receives).
- C++: real memory leak via missing delete.
- JS: detached DOM nodes, timers not cleared.
9. Deep vs Shallow Copy
- Shallow copy: new container, but elements are shared references.
- Deep copy: recursively copies elements too.
When It Matters
Backtracking: if you store snapshots of a mutable list, you need a deep (or at least one-level) copy, otherwise all snapshots reflect the latest state.
result = []
path = []
def backtrack(...):
    if done: result.append(path[:])  # MUST copy; otherwise all entries are the same list
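To see the aliasing bug concretely, here is a runnable sketch that generates all subsets twice: once copying path, once appending path itself. The function names and the choice of the subsets problem are illustrative.

```python
def subsets(nums):
    """Correct: each snapshot is a one-level copy of path."""
    result, path = [], []

    def backtrack(start):
        result.append(path[:])          # copy the current state
        for i in range(start, len(nums)):
            path.append(nums[i])
            backtrack(i + 1)
            path.pop()                  # undo the choice

    backtrack(0)
    return result


def subsets_aliased(nums):
    """Buggy: every entry is the same list object, which ends up empty."""
    result, path = [], []

    def backtrack(start):
        result.append(path)             # no copy: stores a reference
        for i in range(start, len(nums)):
            path.append(nums[i])
            backtrack(i + 1)
            path.pop()

    backtrack(0)
    return result
```

A one-level copy is enough here because the elements are ints; if path held mutable objects that were themselves modified, a deeper copy would be needed.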
Per-Language Gotchas
- Python: list(x) / x[:] shallow; copy.deepcopy(x) deep.
- Java: clone() is shallow by default; deep requires explicit traversal.
- Go: copy(dst, src) shallow; for deep, write your own.
- C++: the implicit copy constructor copies raw pointers shallowly; STL containers copy their elements by value, shared_ptr copies share ownership, and unique_ptr cannot be copied.
- JS: spread [...arr] and {...obj} are shallow; deep via structuredClone(x) (modern) or JSON round-trip (lossy).
10. Recursion Depth and Stack Overflow
Each recursive call adds a frame to the call stack. The stack has a fixed size; exceeding it crashes (or in Python, raises RecursionError).
Default Limits
- Python: sys.getrecursionlimit() defaults to 1000. Raise with sys.setrecursionlimit(10**6) plus threading.stack_size(...); the latter only affects threads created afterwards, so run the recursion in a new thread.
- Java: ~10K frames typical; tune with -Xss.
- Go: goroutine stacks start small (a few KB) and grow on demand up to 1 GB by default on 64-bit platforms.
- C++: thread stack ~1 MB default; tune at thread creation.
- JS: ~10K frames typical, engine-dependent.
Mitigation
- Convert to iteration with an explicit stack (DFS, in-order traversal).
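- Tail-call optimization cannot be relied on in most popular languages: CPython, Java, and Go do not perform it, and among JS engines only JavaScriptCore (Safari) implements ES2015 proper tail calls. C++ compilers may apply it as an optimization without guaranteeing it; Scheme guarantees it.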
- For trees, balance matters: a degenerate (chain-shaped) tree blows recursion at depth N, not log N.
Recommended Problem Categories
After mastering the concepts, drill these problem categories. Each maps to a lab below.
| Category | Sample Problems |
|---|---|
| Array manipulation | Rotate Array, Move Zeros, Plus One, Spiral Matrix |
| String mechanics | Reverse Words, Implement strStr, Length of Last Word |
| Hash map / set | Two Sum, Group Anagrams, Contains Duplicate II, Intersection of Two Arrays |
| Linked list | Reverse Linked List, Merge Two Sorted Lists, Palindrome Linked List, Linked List Cycle |
| Stack / queue | Valid Parentheses, Min Stack, Implement Queue using Stacks, Daily Temperatures |
| Heap | Kth Largest in Stream, Last Stone Weight, K Closest Points to Origin |
| Binary search | First Bad Version, Search Insert Position, Find Peak Element |
| Recursion | Generate Parentheses, Letter Combinations of a Phone Number, Subsets |
| Tree traversal | Inorder Traversal (iterative + recursive), Symmetric Tree, Same Tree, Binary Tree Paths |
Aim for ~30 problems across these categories minimum (more for the 6/12-month tracks).
Mastery Checklist
- Implement, from scratch, in 20 minutes each, with passing tests: dynamic array, linked list (singly + doubly), stack, queue (deque), binary heap, hash map (open addressing), trie, union-find.
- State worst-case, average-case, and amortized complexity for every operation in the table above.
- Explain why s += c in a loop is O(N²) in three of {Python, Java, JS, Go}.
- Write iterative in-order, pre-order, and post-order tree traversals (no recursion).
- Solve any LeetCode Easy in 15 minutes using the framework.
- Solve any LeetCode Easy-Medium hash-map problem in under 20 minutes.
- Recognize when a problem needs a heap vs sorted set vs counter without prompting.
- Articulate the iterator invalidation rules for your interview language.
- Articulate refcount vs tracing GC for your interview language.
- Predict whether a recursive solution will overflow on N = 10⁵ in your interview language.
Exit Criteria
Move to Phase 2 only when:
- Mastery checklist 100% complete.
- All 9 labs completed with passing tests.
- Solved ≥ 30 problems across the recommended categories using the framework, each reviewed via REVIEW_TEMPLATE.md.
- Implemented every primary data structure from scratch — no notes, no internet, just the language’s basic features.
- Self-mocked one Easy and one low-Medium problem with full framework execution; passed using READINESS_CHECKLIST.md section criteria for “data-structure fluency.”
- For every failure during these problems, ran the diagnosis in FAILURE_ANALYSIS.md and queued it via SPACED_REPETITION.md.
If any item fails, do not move on. Phase 2 patterns assume Phase 1 fluency. Without it, you will plateau.
Hands-On Labs
Complete in order. Each lab uses the strict 22-section format used throughout the curriculum.
- Lab 01 — Array Fundamentals
- Lab 02 — String Mechanics
- Lab 03 — Hashmap Mastery
- Lab 04 — Linked List Pointers
- Lab 05 — Stack & Queue Applications
- Lab 06 — Heap Priority
- Lab 07 — Binary Search Fundamentals
- Lab 08 — Recursion & Stack
- Lab 09 — Tree Traversal Fundamentals
Common Mistakes In Phase 1
- Memorizing complexities without understanding them — when an interviewer probes “why is dict insert O(1) on average?”, you must derive it.
- Skipping “from scratch” implementations: using import heapq is fine for problems, but you must be able to write the heap yourself.
- Treating Python list as a queue: pop(0) is O(N); use deque.
- Forgetting to check overflow in C++/Java: int + int wraps silently in Java and is undefined behavior in C++; widen to long (long long in C++) before adding.
- Recursion in Python without setting the limit: a degenerate tree with N = 10⁴ nodes exceeds the default recursion limit of 1000.
- Confusing == with is / equals / ===: identity vs equality differs by language.
- Using a mutable object as a hash key.
- Not knowing your language’s iterator invalidation rules.
If any of these still trip you up, you are not done with Phase 1.
Lab 01 — Array Fundamentals: Rotate Array In Place
Goal
Master in-place array rotation. The deliverable shows you understand pointer arithmetic, the O(N) reversal trick, dynamic-array memory layout, and edge cases that catch ~70% of candidates on this exact problem.
Background Concepts
Arrays as contiguous memory; index arithmetic mod N; in-place vs auxiliary-space transformations; the “three reversals” identity for a right rotation by k: rotate(arr, k) == reverse(reverse(arr[:N-k]) + reverse(arr[N-k:])). Review the Arrays section of the Phase 1 README and the value-vs-reference rules in section 3 of the runtime concepts.
Interview Context
This is the canonical “looks easy, traps everyone” Easy/low-Medium problem. Real interviews from Microsoft, Amazon, Meta, Apple, Google. The interviewer is watching for: do you do the auxiliary-array brute force first? Do you spot the O(N) in-place trick? Do you handle k > N? Do you survive k == 0 and N == 1?
Problem Statement
Given an integer array nums and a non-negative integer k, rotate the array to the right by k steps in place. After the rotation, element originally at index i ends up at index (i + k) % N.
Constraints
- 1 ≤ N ≤ 10^5
- -2^31 ≤ nums[i] ≤ 2^31 - 1
- 0 ≤ k ≤ 10^9
- Must run in O(N) time and O(1) extra space.
Clarifying Questions
- Can k be greater than N? (Yes; must reduce mod N.)
- Can k be 0? (Yes; should be a no-op, no array mutation.)
- Can the array be empty / size 1? (Per the constraints, N ≥ 1. Confirm.)
- Right rotation, not left? (Confirm direction; getting it backward is a top-3 bug here.)
- Must it be in place, or is auxiliary memory allowed? (In place — that’s the spirit of the problem.)
Examples
| Input | k | Output | Notes |
|---|---|---|---|
| [1,2,3,4,5,6,7] | 3 | [5,6,7,1,2,3,4] | Standard case |
| [1,2] | 3 | [2,1] | k > N: effective k = 1 |
| [1,2,3] | 0 | [1,2,3] | No-op |
| [1] | 100 | [1] | Trivial size |
| [1,2,3] | 3 | [1,2,3] | k == N: no-op |
Initial Brute Force
Allocate out[N]; for each i, set out[(i + k) % N] = nums[i]; copy out back into nums.
def rotate_brute(nums, k):
    n = len(nums)
    k %= n
    out = [0] * n
    for i in range(n):
        out[(i + k) % n] = nums[i]
    nums[:] = out
Brute Force Complexity
Time: O(N). Space: O(N) auxiliary. Fails the in-place constraint despite optimal time.
Optimization Path
We need O(1) extra space. Two well-known approaches:
- Cyclic replacement: start at index 0, jump to (0 + k) % N, place the displaced element, and continue until the cycle returns to its start. A single cycle covers every index only when gcd(N, k) == 1; otherwise there are gcd(N, k) disjoint cycles, so you restart from the next index and keep a counter of elements moved to know when to stop.
- Three reversals: reverse the whole array, reverse the first k, reverse the last N-k. This works because rotation by k is reversal of (reversal of left, reversal of right). Easier to write correctly.
Pick the three-reversal approach for the interview unless the interviewer explicitly asks for cyclic replacement.
Final Expected Approach
Three-reversal in place.
def reverse(nums, l, r):
    while l < r:
        nums[l], nums[r] = nums[r], nums[l]
        l += 1
        r -= 1
def rotate(nums, k):
    n = len(nums)
    k %= n  # normalize
    reverse(nums, 0, n - 1)
    reverse(nums, 0, k - 1)
    reverse(nums, k, n - 1)
Data Structures Used
A single mutable array. No auxiliary structures.
Correctness Argument
Loop invariant for reverse(l, r): at the start of each iteration, the elements at positions before l and after r (within the original range) have already been swapped into their mirrored, final positions. Termination at l ≥ r leaves the closed range fully reversed. The three-reversal identity can be checked on a small example: [A B C D E] with k=2 → reverse all → [E D C B A] → reverse first 2 → [D E C B A] → reverse last 3 → [D E A B C], which is the original rotated right by 2.
Complexity
Time: O(N) (three passes, each touching at most N elements). Space: O(1) (in-place swaps; no allocation).
Implementation Requirements
- Helper reverse(nums, l, r) with explicit bounds (closed interval).
- Always do k %= n first, before any loop or reversal.
- Handle k == 0 and k == n: both reduce to no-op via k %= n.
- No allocation outside the input array (verify by reading your code).
- Clean variable names (l, r, n, k are interview-acceptable).
Tests
- Smoke: the canonical [1,2,3,4,5,6,7] with k=3.
- Unit: k=0 (no-op), k=N (no-op), k=1 (single right shift).
- Edge: N=1, N=2 with k=1, large k=10^9 mod N.
- Large: N=10^5, k=N//2; assert in-place (capture id(nums) in Python).
- Random: generate random arrays and ks; check against brute force as oracle.
- Invalid: negative k (per constraints not allowed; if interviewer extends, decide left rotation semantics).
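The random tier can be sketched as a small seeded harness. The three-reversal rotate is restated so the block runs on its own; rotate_oracle and run_random_trials are hypothetical names, and the trial count, array sizes, and value ranges are arbitrary choices.

```python
import random


def rotate(nums, k):
    """Three-reversal rotation, restated here so the harness is self-contained."""
    def reverse(l, r):
        while l < r:
            nums[l], nums[r] = nums[r], nums[l]
            l += 1
            r -= 1

    n = len(nums)
    k %= n
    reverse(0, n - 1)
    reverse(0, k - 1)
    reverse(k, n - 1)


def rotate_oracle(nums, k):
    """Brute-force oracle: returns a new list, leaving the input untouched."""
    n = len(nums)
    out = [0] * n
    for i, x in enumerate(nums):
        out[(i + k) % n] = x
    return out


def run_random_trials(trials=500, seed=0):
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        n = rng.randint(1, 20)
        nums = [rng.randint(-50, 50) for _ in range(n)]
        # Mix boundary values of k with a large random one.
        k = rng.choice([0, 1, n, n + 1, rng.randint(0, 10**9)])
        expected = rotate_oracle(nums, k)
        actual = list(nums)
        rotate(actual, k)      # must mutate the caller's list
        assert actual == expected, (nums, k, actual, expected)
    return trials
```

Because rotate returns nothing, the equality check only passes if the caller's list was mutated, which also catches the rebinding bug.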
Follow-up Questions
- “Can you do it without the modulo?” (Yes, but ugly: branch on k <= n.)
- “What if the array is given as a linked list?” (Different problem: find length, find pivot, splice.)
- “What if k can be negative (left rotation)?” (Convert via k = ((k % n) + n) % n.)
- “Solve using a single reverse loop without a helper.” (Inline the swaps three times.)
- “Implement with cyclic replacement instead.” (Demonstrate the gcd cycle counter trick.)
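For the cyclic-replacement follow-up, one possible sketch uses a moved counter to handle the gcd(N, k) disjoint cycles. The function name rotate_cyclic and the variable names are illustrative.

```python
def rotate_cyclic(nums, k):
    """Right-rotate nums by k in place using cyclic replacement."""
    n = len(nums)
    k %= n
    if k == 0:
        return
    moved = 0   # how many elements have reached their final position
    start = 0   # first index of the current cycle
    while moved < n:
        curr = start
        carried = nums[start]          # value waiting to be placed
        while True:
            nxt = (curr + k) % n
            # Drop the carried value at its destination, pick up what was there.
            nums[nxt], carried = carried, nums[nxt]
            curr = nxt
            moved += 1
            if curr == start:          # cycle closed
                break
        start += 1                     # next cycle begins at the next index
```

With N = 6 and k = 2 there are gcd(6, 2) = 2 cycles of length 3; the moved counter is what stops the outer loop after exactly N placements.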
Product Extension
A circular buffer for a metrics dashboard storing the last N seconds of samples. Rotation isn’t done on append — instead, a head index advances mod N. The “three reversals” trick is what you do when the buffer must be flattened to a linear export. Discuss tradeoffs: head-index buffer is O(1) per append but harder to debug; rotation on read is O(N) but storage is always linear.
Language/Runtime Follow-ups
- Python: nums[:] = nums[-k:] + nums[:-k] is one-line and Pythonic but allocates O(N). Acceptable to mention but interviewer may rule it “not in place.” Pure swap version uses no allocation.
- Java: int[] (primitive) avoids boxing. Don’t reach for Collections.rotate; understand it.
- Go: slice indexing, careful with n := len(nums) capture; nums = nums[:] aliasing makes no copy.
- C++: std::reverse(nums.begin() + l, nums.begin() + r + 1); use the standard.
- JS: in-place using [a, b] = [b, a] swap or temp = a. arr.reverse() is in-place.
Common Bugs
- Forgetting k %= n: when k > n the reversals overlap incorrectly.
- Off-by-one in reverse(l, r): using r vs r - 1 as the bound; using < vs ≤.
- Reversing wrong segments: confusing first-k with last-k. Right rotation: [reverse first k of reversed array] then [reverse last n-k].
- Allocating in disguise: nums = nums[-k:] + nums[:-k] rebinds the local name and does not mutate the caller’s array (in Python). Use nums[:] = ….
- Left vs right confusion: re-read the problem statement once before submitting.
Debugging Strategy
- Print the array after each of the three reversals; compare to a hand-traced [1,2,3,4,5,6,7] k=3 walk-through.
- If output is wrong by a constant shift, suspect an off-by-one in segment bounds.
- If output looks reflected ([3,2,1, 7,6,5,4] instead of [5,6,7,1,2,3,4]), one of the three reversals fired in the wrong region.
Mastery Criteria
- Wrote the three-reversal solution in under 4 minutes, no bugs, in-place verified.
- Traced through k > N, k == 0, N == 1 without prompting.
- Stated the loop invariant for reverse aloud.
- Named the cyclic-replacement alternative and acknowledged its gcd complication.
- Identified and avoided the Python nums = nums[-k:] + nums[:-k] allocation trap.
Lab 02 — String Mechanics: Reverse Words In A String
Goal
Master string immutability, builder patterns, encoding gotchas, and the cost of naive concatenation. The deliverable: reverse the order of words in a sentence efficiently in your interview language, demonstrating you understand why the “obvious” solution can be O(N²) in some languages and O(N) in others.
Background Concepts
String immutability and the resulting cost of s += c loops; StringBuilder / strings.Builder / "".join() patterns; substring complexity; whitespace tokenization; Unicode pitfalls. Review the Strings section of the Phase 1 README and item 4 in the runtime concepts (mutable vs immutable).
Interview Context
A staple of Microsoft, Amazon, and Bloomberg phone screens. The trap is candidates who write result = ""; for w in reversed(words): result += w + " " and don’t realize they just shipped O(N²) code in Java or Python. Strong candidates state the immutability fact aloud and choose a builder pattern.
Problem Statement
Given a string s representing a sentence, return a new string with the order of words reversed. A “word” is a maximal run of non-space characters. Multiple spaces between words and leading/trailing spaces must be collapsed to single spaces; the output has no leading/trailing space.
Constraints
- 1 ≤ |s| ≤ 10^4
- s contains printable ASCII characters and spaces.
- s contains at least one word.
Clarifying Questions
- Is the input ASCII or arbitrary Unicode? (Affects iteration model; ASCII is the default unless stated.)
- Should multiple internal spaces be preserved or collapsed? (Standard problem says collapse; confirm.)
- Trim leading/trailing whitespace? (Yes; output has none.)
- Punctuation: is "hello," one word? (Per problem: a “word” is non-space-separated; "hello," is one word.)
- Can I allocate a new string, or must I work in place? (For Python/Java/JS, strings are immutable, so a new string is unavoidable. For C++ std::string, in-place is feasible.)
Examples
| Input | Output |
|---|---|
| "the sky is blue" | "blue is sky the" |
| " hello world " | "world hello" |
| "a good example" | "example good a" |
| "single" | "single" |
| " " | invalid per constraints |
Initial Brute Force
Split on whitespace, reverse the list, join with single spaces.
def reverse_words(s):
    return " ".join(reversed(s.split()))
s.split() (no arg) collapses runs of whitespace and trims, which is exactly the spec. This is one line in Python — but the interviewer wants you to explain what it does.
Brute Force Complexity
Time: O(N) — split is one linear pass, reversed is O(k) where k is word count, join is one linear pass. Space: O(N) for the list of words and the output. This is already optimal asymptotically.
Optimization Path
For interviews where the one-liner is “too easy,” the interviewer escalates: “Do it without split/join; use only character-level operations.” Or: “Reverse in place in a char[] with O(1) extra memory.”
The classic in-place trick on a mutable buffer: reverse the entire buffer, then reverse each word, then collapse internal whitespace. This is the same three-reversal identity from Lab 01, applied to characters.
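The three-reversal idea can be sketched in Python on a list of characters, which stands in for a mutable char[]. This is a minimal sketch, not a canonical implementation: the function name reverse_words_in_place and the convention of returning the new logical length are assumptions made for illustration. It reverses the whole buffer, then compacts whitespace and un-reverses each word in the same pass.

```python
def reverse_words_in_place(buf):
    """Reverse word order inside a mutable list of characters.

    Returns the new logical length; buf[:length] holds the result.
    (Hypothetical helper: name and return convention are illustrative.)
    """

    def reverse(lo, hi):
        # Reverse buf[lo..hi], inclusive on both ends.
        while lo < hi:
            buf[lo], buf[hi] = buf[hi], buf[lo]
            lo += 1
            hi -= 1

    n = len(buf)
    reverse(0, n - 1)              # step 1: reverse the entire buffer

    write = 0                      # next slot for compacted output
    read = 0                       # next unread character
    while read < n:
        if buf[read] == ' ':       # skip runs of spaces
            read += 1
            continue
        if write > 0:              # one separator before every word except the first
            buf[write] = ' '
            write += 1
        start = write
        while read < n and buf[read] != ' ':
            buf[write] = buf[read]  # write never overtakes read, so nothing unread is lost
            write += 1
            read += 1
        reverse(start, write - 1)  # step 2: un-reverse this word
    return write


buf = list("  the sky   is blue ")
length = reverse_words_in_place(buf)
result = "".join(buf[:length])
print(result)
```

The only extra state is a handful of indices, so the auxiliary space is O(1) beyond the buffer itself; each character is moved a constant number of times, so the time stays O(N).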
Final Expected Approach
State the one-liner first. Then offer the manual two-pointer approach for languages with mutable strings or as an “I-understand-the-internals” demonstration.
def reverse_words(s):
    # tokenize without builtins
    words = []
    i = 0
    n = len(s)
    while i < n:
        while i < n and s[i] == ' ':
            i += 1
        j = i
        while j < n and s[j] != ' ':
            j += 1
        if j > i:
            words.append(s[i:j])
        i = j
    # reverse and join via builder
    out = []
    for w in reversed(words):
        out.append(w)
    return ' '.join(out)
In Java, replace the final join with StringBuilder. In Go, with strings.Builder.
Data Structures Used
A list of word strings (or substrings); a builder for the output. No advanced structures.
Correctness Argument
Tokenization invariant: at each outer iteration, i points at the start of unscanned input; the inner loops skip whitespace and capture a word. Each character is examined O(1) times, so tokenization is O(N). Reversed iteration over words produces them in opposite order; joining with ' ' produces single-space separation; no leading/trailing space because we never push empty words and we don’t terminate with a separator (Python join handles this).
Complexity
Time: O(N) — single tokenization pass plus single output assembly pass. Space: O(N) — output and word list.
Implementation Requirements
- Use an explicit builder (StringBuilder, strings.Builder, [].join, ''.join).
- Never use += to build the output in a loop in Java/Python/JS.
- Don’t rely on regex unless the interviewer is explicitly fine with it (s.split(/\s+/) works but is overkill).
- Verify trimming works on "  word  ".
- Verify multiple internal spaces collapse on "a   b".
Tests
- Smoke: "the sky is blue" → "blue is sky the".
- Unit: leading/trailing spaces; multiple internal spaces; single-word input.
- Edge: all-spaces input (per constraints invalid; handle gracefully if extended); single character; punctuation as part of word.
- Large: N = 10^4 input with 10^3 words; assert no quadratic behavior (time it).
- Random: randomly generate space-and-letter strings; cross-check against " ".join(reversed(s.split())).
- Invalid: non-ASCII in extended versions (define behavior per language’s iteration model).
Follow-up Questions
- “Reverse character order within each word as well.” (Reverse each word in place after splitting.)
- “Reverse in O(1) extra space on a mutable char[].” (The three-reversal trick.)
- “Handle Unicode where ‘word’ is grapheme-cluster bounded.” (Need an ICU library or equivalent.)
- “Preserve original whitespace runs.” (Don’t collapse; keep separator tokens.)
- “What if the string is huge and streamed?” (Process word-by-word from a buffered reader.)
Product Extension
A search-engine query normalizer. Inputs from users have inconsistent whitespace, varying word order. Reverse word order is a feature for “did-you-mean” inversion testing. In production: keep the original for display, normalize for indexing, and accept that the cost of String immutability in Java means hot paths use StringBuilder or even byte arrays directly.
Language/Runtime Follow-ups
- Python: s.split() (no args) is the magical normalizer. "".join(...) is a single allocation; never use += on str in a loop.
- Java: String.split("\\s+") returns an array; call String.trim() first, otherwise leading whitespace yields a leading empty token. Output via StringBuilder. String.join(" ", parts) is the modern one-liner.
- Go: strings.Fields(s) splits on any whitespace and trims; strings.Join(parts, " ") rebuilds. Both are O(N).
- C++: std::stringstream for tokenizing; build via std::string +=, which is amortized O(1) per appended character (O(N) total) and benefits from small-string optimization on short results.
- JS/TS: s.trim().split(/\s+/).reverse().join(' '). Beware: the empty-string case "".split(/\s+/) returns [""], not [].
- Unicode subtlety: in Java/JS, length counts UTF-16 code units; '😀' is length 2. Doesn’t matter here unless emoji-as-word.
Common Bugs
- Off-by-one on whitespace: leaving a trailing space after a hand-rolled join.
- s += c in a loop: O(N²) in Java, and potentially in Python and JS (some runtimes optimize repeated concatenation, but you cannot rely on it). Catastrophic on large input.
- Splitting by ' ' instead of by whitespace: "a  b".split(' ') returns ["a", "", "b"] in Java/JS; you get empty tokens.
- Trim missed: leading whitespace becomes a leading empty token.
- Mutating the input: in some languages strings are immutable so this is a type error; in C++/Go-byte-slice it’s a semantic bug.
Debugging Strategy
- Print the tokenized list. If it has empty strings, your splitter doesn’t collapse.
- If output has trailing space, your join builds it manually rather than via the standard library.
- Time on a 10^4 input. If it’s > 1 second, you have a quadratic concat hidden somewhere.
Mastery Criteria
- Wrote the one-liner in 30 seconds and explained why each piece is needed.
- Wrote the manual tokenizer in under 5 minutes.
- Stated aloud “in this language, strings are immutable, so I will use a builder.”
- Identified the difference between split(' ') and split('\\s+') / split() (no-arg).
- Acknowledged the in-place char[] three-reversal alternative without writing it (or wrote it on follow-up).
- Tested with "  a   b  " and confirmed clean output.
Lab 03 — Hashmap Mastery: Group Anagrams
Goal
Master hash-map design with composite keys, adversarial input awareness, and the equality/hashcode contract. The deliverable groups N strings into anagram buckets in O(N · L) time and articulates exactly why your hash key works and what an adversary could do to break it.
Background Concepts
Hash function design; key equality contract; adversarial inputs and load factor; ordered-vs-unordered map choice; counter pattern. Review the Hash Maps and Hash Sets sections of the Phase 1 README, plus runtime concept 5 (hash collisions).
Interview Context
Group Anagrams is interview-evergreen: appears at Meta, Google, Amazon, Microsoft. The interview signal is whether you reach for the right key. Naive candidates compare every pair (O(N² · L)). Decent candidates sort each string into a key (O(N · L log L)). Strong candidates use a counter tuple key (O(N · L)). Elite candidates discuss adversarial hash flooding and language-specific custom-hash mechanics.
Problem Statement
Given an array strs of N lowercase-ASCII strings, group the strings that are anagrams of each other. Return the groups as a list of lists. Within and across groups, any order is acceptable.
Constraints
- 1 ≤ N ≤ 10^4
- 0 ≤ |s_i| ≤ 100
- s_i consists of lowercase English letters.
- Total characters: Σ |s_i| ≤ 10^6.
Clarifying Questions
- Lowercase only? (Per constraints — confirm.)
- Do empty strings group together? (Yes — the empty string is an anagram of itself.)
- Is output order significant within or across groups? (Standard problem: no.)
- Are duplicates in the input allowed? (Yes: ["aa","aa"] is one group of size 2.)
- Memory constraints? (Should fit comfortably; mention you’ll discuss tradeoffs.)
Examples
| Input | Output (any order) |
|---|---|
| ["eat","tea","tan","ate","nat","bat"] | [["eat","tea","ate"], ["tan","nat"], ["bat"]] |
| [""] | [[""]] |
| ["a"] | [["a"]] |
| ["abc","cab","bca","xyz"] | [["abc","cab","bca"], ["xyz"]] |
Initial Brute Force
For each pair (i, j), check if strs[j] is an anagram of strs[i] (e.g., by sorting both). Use a seen[] array. O(N² · L log L).
def group_anagrams_brute(strs):
    seen = [False] * len(strs)
    groups = []
    for i, s in enumerate(strs):
        if seen[i]:
            continue
        g = [s]
        seen[i] = True
        ks = sorted(s)
        for j in range(i + 1, len(strs)):
            if not seen[j] and sorted(strs[j]) == ks:
                g.append(strs[j])
                seen[j] = True
        groups.append(g)
    return groups
Brute Force Complexity
Time: O(N² · L log L) due to repeated sorting. Space: O(N) for the seen array plus O(L) per sort. Fails for N = 10^4 (10⁸ operations × log).
Optimization Path
The key insight: anagrams have the same multiset of characters. We need a hashable key derived from this multiset. Two canonical forms:
- Sorted string as key: sorted(s) → "aet" for "eat", "tea", "ate". Cost per key: O(L log L). Total: O(N · L log L).
- Count tuple as key: a length-26 tuple of counts. Cost per key: O(L). Total: O(N · L). Optimal for large L.
Pick the count tuple unless L is tiny.
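For comparison, the sorted-string key can be sketched as follows. The function name group_anagrams_sorted_key is an assumption made for illustration. One detail worth stating aloud: in Python, sorted(s) returns a list, which is not hashable, so the key has to be joined back into a str (or converted to a tuple) before it can index a dict.

```python
from collections import defaultdict


def group_anagrams_sorted_key(strs):
    """Bucket strings by their sorted characters: O(N * L log L) overall."""
    buckets = defaultdict(list)
    for s in strs:
        # sorted(s) is a list of characters; join it into a hashable str key.
        key = "".join(sorted(s))
        buckets[key].append(s)
    return list(buckets.values())


groups = group_anagrams_sorted_key(["eat", "tea", "tan", "ate", "nat", "bat"])
print(groups)
```

For short words the sort is usually fast enough that the difference from the count-tuple key is hard to measure; the count tuple wins asymptotically as L grows.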
Final Expected Approach
Bucket strings by count tuple in a hash map.
from collections import defaultdict
def group_anagrams(strs):
    buckets = defaultdict(list)
    for s in strs:
        counts = [0] * 26
        for c in s:
            counts[ord(c) - ord('a')] += 1
        buckets[tuple(counts)].append(s)
    return list(buckets.values())
Data Structures Used
- A hash map keyed by a 26-tuple of int counts; values are lists of strings.
- A constant-size 26-int array for tallying (per word).
Correctness Argument
Two strings are anagrams iff they have identical character multisets iff their count vectors are equal. Equal count vectors hash to the same bucket and compare equal under tuple equality, so anagrams land in the same bucket. Different vectors compare unequal under tuple equality, so non-anagrams don’t share a bucket (modulo accidental hash collisions, which the equality check resolves correctly — that’s the equality/hashcode contract at work).
Complexity
Time: O(Σ |s_i|) = O(N · L) for tallying plus O(N · 26) for tuple hashing = O(N · L). Space: O(N · L) for the buckets and keys.
Implementation Requirements
- Use a hashable key: tuples in Python, String (built from the count array) in Java, struct or stringified key in Go, std::array<int, 26> in C++.
- Don’t use the sorted string for very large L (suboptimal but acceptable for interview presentation).
- Use defaultdict(list) or equivalent (computeIfAbsent in Java) to avoid manual “if not in map” branching.
- Return values as a list-of-lists, not the dict itself.
Tests
- Smoke: the canonical 6-string example above.
- Unit: singletons, all-anagrams (["abc","bca","cab"]), no-anagrams (["a","b","c"]).
- Edge: empty strings, single-char strings, duplicates (["aa","aa"]).
- Large: N = 10⁴, L = 100, mix of group sizes; assert sub-second.
- Random: generate random words; verify bucketing matches a reference (e.g., the sorted-string variant).
- Invalid: uppercase or non-ASCII (per constraints disallowed; if extended, normalize first).
Follow-up Questions
- “What if strings can be Unicode?” → switch to a Counter / HashMap<Char, Int> as the key (more expensive hashing). Or, use the sorted string with a Unicode-aware sort.
- “What if the input is streamed?” → emit groups lazily as you find duplicates, but you can’t finalize a group until input ends.
- “What if memory is tight (you can’t store N count arrays)?” → use the sorted-string key (only 1 allocation per word, free after bucketing) or a rolling hash with secondary check.
- “Adversarial input — can the interviewer construct N strings whose count tuples all hash to the same bucket?” → yes for predictable-hash languages; mitigation is randomized hash seeds or a TreeMap fallback.
- “Implement without a hash map.” → sort all strings by their sort-key, then group consecutive equal keys. O(N · L log L) due to sorting strings.
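The last follow-up (no hash map) can be sketched with a sort followed by a single grouping pass. This is a minimal sketch under the assumption that sorting (key, word) pairs is acceptable; the function name group_anagrams_no_hashmap is hypothetical. itertools.groupby only merges adjacent equal keys, which is exactly why the sort has to come first.

```python
from itertools import groupby


def group_anagrams_no_hashmap(strs):
    """Group anagrams by sorting on a canonical key, then merging neighbors."""
    # Pair each word with its sorted-character key, then sort the pairs so
    # that words sharing a key become adjacent.
    keyed = sorted(("".join(sorted(s)), s) for s in strs)
    # groupby merges consecutive pairs that share the same key.
    return [[word for _, word in grp] for _, grp in groupby(keyed, key=lambda p: p[0])]


groups = group_anagrams_no_hashmap(["eat", "tea", "tan", "ate", "nat", "bat"])
print(groups)
```

The comparison sort over N keys of length up to L dominates the cost, so this trades the hash map for an extra log N factor.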
Product Extension
A duplicate-document detector. Each document is hashed by a content fingerprint (e.g., sorted shingles); documents with the same fingerprint are grouped. The same data-structure pattern (hash by canonical form) underlies large-scale dedup at file-storage and email systems. Discuss false positives (two different docs with the same fingerprint), the role of secondary equality check, and the tradeoff between fingerprint cost and accuracy.
Language/Runtime Follow-ups
- Python: tuple(counts) is hashable; Counter is not hashable, so freeze it via frozenset(c.items()). defaultdict(list) is the idiom.
- Java: must build a String (or use a custom class that wraps the int[] with proper hashCode/equals, e.g., via Arrays.hashCode and Arrays.equals). The classic interview shortcut is to convert the count array to a string like "1#0#0#…#1". Beware HashMap<int[], List<String>>: int[] does not override hashCode/equals, so it defaults to object identity. This is a top-3 Java bug on this problem.
- Go: map keys must be comparable. [26]int is comparable; []int is not. Use the array.
- C++: std::array<int, 26> has no default hash for std::unordered_map; supply boost::hash or a custom hash functor. Or stringify.
- JS/TS: Map keys can be any value but use reference equality for arrays/objects. Use a string key like counts.join(',') in a Map<string, string[]>.
- Adversarial keys: Java’s String.hashCode is well-known and allows hash flooding. Java HashMap mitigates by converting an overfull bucket from a linked list into a balanced tree (treeification) once it holds more than about 8 entries in a sufficiently large table, which bounds lookups in that bucket at O(log n).
Common Bugs
- Java int[] as map key: uses object identity, not value equality. Every string lands in its own group. Fix: stringify or use Arrays.hashCode + custom wrapper.
- Mutating the count array between map ops: if you reuse one buffer and mutate, your inserted keys all alias the same buffer. Allocate fresh per word.
- Out-of-range index from ord(c) - ord('a'): non-lowercase input goes negative or out of range (in Python, a negative index silently wraps around instead of failing).
- Empty-string handling: count array is all zeros; should still bucket correctly. Verify.
- Returning dict.values() directly in Python: works but the type is dict_values, not list. Wrap with list(...).
Debugging Strategy
- Print the keys of the resulting map. If you see N keys for N strings, your key derivation is wrong (likely identity-based).
- For Java: assert myKey.hashCode() == myOtherKey.hashCode() and myKey.equals(myOtherKey) for a hand-crafted anagram pair.
- Time on N=10⁴: should run in well under a second.
Mastery Criteria
- Selected the count-tuple approach within 60 seconds, explaining why over the sorted-string approach.
- Stated the equality/hashcode contract and how it affects key choice in Java.
- Identified the int[] reference-equality trap (or its language equivalent) before coding.
- Articulated the adversarial-input concern and the language’s defense.
- Wrote a clean implementation in under 8 minutes.
- Tested with the empty-string and all-duplicates edge cases.
Lab 04 — Linked List Pointers: Reverse Linked List
Goal
Master pointer manipulation under aliasing, the classic three-pointer iterative reverse, the recursive variant with stack-frame analysis, and the dummy-node technique. The deliverable: reverse a singly linked list iteratively and recursively, articulating exactly which references move when, and identifying the recursion-depth risk on long lists.
Background Concepts
Pointers as references; aliasing; null sentinels; recursion stack frames; tail-call elimination (or absence thereof). Review the Linked Lists section of the Phase 1 README, plus runtime concepts 3 (value vs reference) and 10 (recursion depth).
Interview Context
Reverse Linked List is the warm-up question at every FAANG. The signal isn’t whether you can do it — most candidates can — it’s whether you can do it cleanly, in two ways, on the whiteboard, while talking through pointer movement. It’s also the “are you ready for harder linked-list problems” gate.
Problem Statement
Given the head of a singly linked list 1 → 2 → 3 → 4 → 5 → null, return the head of the reversed list 5 → 4 → 3 → 2 → 1 → null. The original list nodes are reused (no new node allocation).
Constraints
- 0 ≤ N ≤ 5000 (LeetCode classic), but realistic interview lists may have N up to 10⁵; recursion depth matters.
- -5000 ≤ node.val ≤ 5000 (irrelevant for traversal logic).
Clarifying Questions
- Is the list singly or doubly linked? (Singly; this affects whether we need to update prev pointers.)
- Is head ever null? (Yes: return null. Top edge case.)
- Single-node list? (Return the same node; its next is already null.)
- Should the reversal be in place (reuse nodes) or allocate new nodes? (In place is the standard; allocating new nodes is a different problem.)
- Should I also support reversing a segment [m, n]? (That’s a follow-up; see “Reverse Linked List II”.)
Examples
| Input | Output |
|---|---|
| 1 → 2 → 3 → 4 → 5 | 5 → 4 → 3 → 2 → 1 |
| 1 → 2 | 2 → 1 |
| 1 | 1 |
| null | null |
Initial Brute Force
Walk the list, push values onto a stack, walk again and reassign values from the stack.
def reverse_brute(head):
    vals = []
    cur = head
    while cur:
        vals.append(cur.val)
        cur = cur.next
    cur = head
    while cur:
        cur.val = vals.pop()
        cur = cur.next
    return head
Brute Force Complexity
Time: O(N). Space: O(N) auxiliary. Two passes. Doesn’t reverse pointers — only mutates values, which violates the spirit (and breaks if val is immutable, e.g., final field).
Optimization Path
We want O(1) extra space by manipulating pointers directly. Two canonical approaches:
- Iterative three-pointer: prev, cur, next. Walk forward, flip cur.next to prev, advance.
- Recursive: reverse the tail, then attach the head behind it. Beautiful but O(N) stack.
Iterative is preferred for production (no stack-overflow risk). Recursive is preferred for explaining the idea. Strong candidates write both.
Final Expected Approach
Iterative, three pointers.
def reverse_list(head):
    prev = None
    cur = head
    while cur:
        nxt = cur.next   # save the rest of the list
        cur.next = prev  # flip
        prev = cur       # advance prev
        cur = nxt        # advance cur
    return prev          # prev is the new head
Recursive form:
def reverse_list_rec(head):
    if head is None or head.next is None:
        return head
    new_head = reverse_list_rec(head.next)
    head.next.next = head
    head.next = None
    return new_head
Data Structures Used
The input list itself; three local pointers. No new allocation.
Correctness Argument
Iterative invariant: before each iteration, the sub-list ending at prev is fully reversed and cur points at the head of the not-yet-reversed remainder. The body of the loop preserves the invariant: we save cur.next, flip cur.next to point at the reversed prefix, then advance prev and cur by one. When cur is None, the entire input has been processed and prev is the head of the reversed list.
Recursive correctness: by induction on length. Base: a list of length 0 or 1 is its own reverse. Inductive step: assume reverse_list_rec(head.next) correctly returns the head of the reversed tail. The call never touches head itself, so head.next still points at the original second node, which is now the last node of the reversed tail. Set head.next.next = head to append head; set head.next = None to terminate.
Complexity
Iterative: O(N) time, O(1) space. Recursive: O(N) time, O(N) space due to the call stack.
Implementation Requirements
- Three named pointers: prev, cur, nxt (or next, but watch out for shadowing built-ins in some languages).
- Initialize prev = null. Top bug: forgetting this means the head’s next becomes self-referential or stale.
- Save cur.next before overwriting it. Forgetting to save loses the rest of the list.
- Return prev, not cur (which is null at termination).
- For recursion: handle the base case at head is None first.
Tests
- Smoke: 1 → 2 → 3 → 4 → 5.
- Unit: length 1, length 2, length 3.
- Edge: null head; list with all-equal values; list with cycle (should not be passed in, but if defensive, detect with Floyd’s).
- Large: N = 10⁵; if recursive, expect StackOverflowError in Java and RecursionError in Python (unless the limit is raised with sys.setrecursionlimit).
- Random: build random lists, reverse, reverse again, assert equality with the original.
- Invalid: ensure the original head’s next is null after reversal (it’s now the tail).
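The random round-trip test above can be sketched as a small harness. The lab does not define a node class, so ListNode, build_list, to_values, and round_trip_check below are hypothetical helpers introduced for illustration; reverse_list mirrors the iterative three-pointer approach.

```python
import random


class ListNode:
    """Minimal singly linked node (assumed shape; the lab does not define one)."""

    def __init__(self, val=0, nxt=None):
        self.val = val
        self.next = nxt


def build_list(values):
    # A dummy head avoids special-casing the first node.
    dummy = ListNode()
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_values(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


def reverse_list(head):
    # Iterative three-pointer reversal.
    prev, cur = None, head
    while cur:
        nxt = cur.next
        cur.next = prev
        prev, cur = cur, nxt
    return prev


def round_trip_check(trials=200, max_len=50, seed=0):
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(trials):
        values = [rng.randint(-5000, 5000) for _ in range(rng.randint(0, max_len))]
        head = build_list(values)
        once = reverse_list(head)
        assert to_values(once) == values[::-1]
        if head is not None:
            assert head.next is None  # the original head is now the tail
        assert to_values(reverse_list(once)) == values
    return True


print(round_trip_check())
```

Seeding the generator is a deliberate choice: when a random case fails, the same seed replays it for hand-tracing.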
Follow-up Questions
- “Reverse a sublist [m, n].” → “Reverse Linked List II”: needs a dummy node and careful pointer wiring.
- “Reverse in groups of K.” → “Reverse Nodes in K-Group”: apply the iterative reverse on each chunk.
- “Reverse a doubly linked list.” → swap prev/next per node.
- “Detect and handle a cycle before reversing.” → Floyd’s tortoise and hare.
- “Iterative without saving next (write it as a swap).” → trickier; usually a teaching exercise.
- “Why is iterative preferred in production?” → no stack-overflow risk on long lists.
Product Extension
A document undo/redo stack implemented as a linked list. To replay actions in reverse temporal order, you reverse the list. In-place reversal is preferred because the list nodes carry references to large objects (action payloads) and reallocation would be expensive. The null-handling and dummy-node patterns transfer directly to LRU-cache implementations and free-list management.
Language/Runtime Follow-ups
- Python: no tail-call elimination; sys.setrecursionlimit(N+100) for deep lists. Default recursion limit is 1000.
- Java: default thread stacks are typically on the order of 512 KB to 1 MB (tunable with -Xss), which holds roughly tens of thousands of simple frames; expect StackOverflowError for recursive on N=10⁵+.
- Go: goroutine stacks start small (a few KB) and grow automatically. Recursion is safe for moderate N. Pointers are explicit (*ListNode).
- C++: stack usually 1–8 MB; recursive risk depends on frame size and N. Use -fsanitize=address to catch use-after-free if you mis-rewire.
- JS/TS: V8 doesn’t reliably tail-call optimize. Iterative is the only safe choice for large N.
- Pointer aliasing: mutating cur.next while another reference (e.g., head) still points to the same node is exactly the operation we want, but only because we intentionally preserve the old next in nxt first.
Common Bugs
- Losing the rest of the list: overwriting cur.next before saving it. Symptom: the “reversed” list is truncated to one or two nodes. Fix: always save first.
- Forgetting to set the original head’s next to null: in recursive form, omitting head.next = None leaves the original head pointing at its old successor, which now points back at it, creating a two-node cycle.
- Returning head instead of prev: returns the now-tail of the reversed list. Always return prev.
- Initializing prev to head instead of null: first iteration creates a self-loop.
- Using next as a variable name in Python: shadows the built-in next() function. Harmless here but tags you as junior.
Debugging Strategy
- Hand-trace on a 3-node list. Draw arrows. After each iteration, write down where prev, cur, nxt point.
- After running, walk the result and assert it terminates at null within N steps (cycle check).
- If output is shortened (only 1 element), you lost the rest; debug the save step.
- If output reverses but the last element points back to the previous, you forgot the head.next = None (recursive only).
Mastery Criteria
- Wrote the iterative version cleanly in under 90 seconds.
- Wrote the recursive version on demand and explained the inductive correctness argument.
- Identified the null head and length-1 edge cases without prompting.
- Stated why iterative is safer for production.
- Drew the three-pointer dance on a whiteboard (or in comments) for one full iteration.
- Acknowledged that recursive depth = N and called out the stack risk.
Lab 05 — Stack & Queue Applications: Valid Parentheses + Min Stack
Goal
Master the stack as a structural matching tool, the dual-stack technique for augmented operations, and the queue/deque distinction. The deliverable: validate balanced bracket strings in linear time, then extend to a Min Stack supporting O(1) push, pop, top, getMin.
Background Concepts
LIFO discipline; stack invariants; dual-stack trick for tracking auxiliary state; deque vs queue. Review the Stacks and Queues sections of the Phase 1 README, plus runtime concept 1 (stack vs heap memory).
Interview Context
Valid Parentheses is the warm-up; Min Stack is the follow-up. Together they probe whether you grasp the stack as a general tool (not just a recursion bookkeeping device). Asked at Amazon, Google, Bloomberg, Microsoft. The signal: do you generalize from “match ()” to “match (){}[]”? Do you reach for the dual-stack trick on Min Stack instead of O(N) getMin?
Problem Statement
Part A (Valid Parentheses): Given a string s of bracket characters from (){}[], return true iff every opener is matched with the correct closer in the correct order.
Part B (Min Stack): Design a stack that supports push(x), pop(), top(), and getMin() all in O(1).
Constraints
- A: 1 ≤ |s| ≤ 10^4; s contains only the six bracket characters.
- B: pop, top, getMin are not called on an empty stack; up to 3 · 10^4 operations.
Clarifying Questions
- A: Are non-bracket characters possible? (Per constraints, no — but if extended, ignore them.)
- A: Is the empty string valid? (Conventionally yes — vacuous truth.)
- B: Are integer values bounded? (Affects whether int suffices.)
- B: Is getMin of an empty stack defined? (Per constraints, never called on empty.)
- B: Should top and pop be separate, or is pop returning the value acceptable? (LeetCode classic: separate. Match the spec.)
Examples
A:
| Input | Output |
|---|---|
| "()" | true |
| "()[]{}" | true |
| "(]" | false |
| "([)]" | false (interleaved, not nested) |
| "{[]}" | true |
| "" | true |
B: Sequence push(-2), push(0), push(-3), getMin() → -3, pop(), top() → 0, getMin() → -2.
Initial Brute Force
A: Repeatedly scan for adjacent matching pairs (), [], {} and remove them. If string empties, valid; else invalid.
def valid_brute(s):
    while True:
        new = s.replace("()", "").replace("[]", "").replace("{}", "")
        if new == s:
            break
        s = new
    return s == ""
B: Push x to a normal stack. getMin walks the entire stack each call.
Brute Force Complexity
A: O(N²) worst case: each pass costs O(N) and may remove only one pair (e.g., deeply nested input such as "((()))"), so up to N/2 passes are needed. B: getMin is O(N) per call, total O(N²) for N operations.
Optimization Path
A: Single pass with a stack: push openers, on closer pop and verify match.
B: Maintain a parallel min-stack so each push records the current minimum. On pop, also pop from min-stack. getMin returns the top of min-stack.
Final Expected Approach
A — Valid Parentheses:
def is_valid(s):
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for c in s:
        if c in '([{':
            stack.append(c)
        else:
            if not stack or stack.pop() != pairs[c]:
                return False
    return not stack
B — Min Stack:
class MinStack:
    def __init__(self):
        self.s = []  # main stack
        self.m = []  # min stack: m[i] = min of s[0..i]
    def push(self, x):
        self.s.append(x)
        self.m.append(x if not self.m else min(x, self.m[-1]))
    def pop(self):
        self.s.pop()
        self.m.pop()
    def top(self):
        return self.s[-1]
    def getMin(self):
        return self.m[-1]
Data Structures Used
- A: A single stack of opener characters.
- B: Two parallel stacks, both supporting O(1) push/pop.
Correctness Argument
A: Loop invariant — at each iteration, the stack contains the unclosed openers of the prefix of s consumed so far, in order. A closer is valid iff it matches the most-recent opener (LIFO). At end, an empty stack means all openers were closed in order.
B: Invariant — m[i] is min(s[0..i]). When we push x, the new minimum is min(x, current_min) — pure local computation. When we pop, both stacks shrink; the new top of m is correct because it was correct before this push. Hence getMin is m[-1], O(1).
Complexity
A: O(N) time, O(N) space (worst case, all openers). B: O(1) time per operation, O(N) total space.
Implementation Requirements
- Use a single map closer → opener to avoid six-way if/else.
- For Min Stack, use two stacks (or one stack of pairs); never recompute min by scanning.
- Don’t pre-validate s (e.g., for invalid characters) unless the problem demands.
- Handle the empty stack case before popping in is_valid.
Tests
- A smoke: "()[]{}" valid; "(]" invalid.
- A unit: unbalanced opener-only "(("; closer-first ")"; nested correctly "{[]}"; interleaved "([)]".
- A edge: empty string; single character.
- A large: 10⁴ openers followed by 10⁴ closers; should still run in milliseconds.
- B smoke: the canonical sequence above.
- B edge: push the same value twice, pop, ensure min is still correct (this is the “duplicate-min trap”; naive single-stack solutions fail here).
- Random: generate random op sequences; cross-check against a “min via scan” reference.
Follow-up Questions
- A: “What if s may contain other characters (letters, digits)?” → ignore them or treat as “skip.”
- A: “Return the index of the first invalid bracket.” → modify the loop to return i instead of False.
- A: “Generate all valid bracket strings of length 2N.” → that’s “Generate Parentheses” (Lab 08).
- B: “Use only one stack.” → store (value, current_min) as pairs.
- B: “Use only constant extra space (no parallel stack).” → encoding trick: store 2x - currentMin when x < currentMin, then decode on pop. Watch for overflow.
- B: “Add getMax.” → add a third parallel stack.
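The encoding trick from the constant-extra-space follow-up can be sketched as below. The class name MinStackEncoded and the field cur_min are assumptions made for illustration. The key observation: when x is a new minimum, the stored value 2x - old_min is strictly less than x, so "stored value below the current minimum" marks an encoded entry, and the old minimum can be recovered as 2 · cur_min - stored.

```python
class MinStackEncoded:
    """Min stack using one list plus a single field for the current minimum."""

    def __init__(self):
        self.s = []
        self.cur_min = None

    def push(self, x):
        if not self.s:
            self.s.append(x)
            self.cur_min = x
        elif x < self.cur_min:
            # New minimum: store an encoded value (< x) that remembers the old min.
            self.s.append(2 * x - self.cur_min)
            self.cur_min = x
        else:
            self.s.append(x)

    def pop(self):
        stored = self.s.pop()
        if stored < self.cur_min:
            # Encoded entry: recover the previous minimum.
            self.cur_min = 2 * self.cur_min - stored
        if not self.s:
            self.cur_min = None

    def top(self):
        stored = self.s[-1]
        # An encoded entry stands for the current minimum itself.
        return self.cur_min if stored < self.cur_min else stored

    def getMin(self):
        return self.cur_min


ms = MinStackEncoded()
ms.push(-2)
ms.push(0)
ms.push(-3)
first_min = ms.getMin()
ms.pop()
top_after_pop = ms.top()
second_min = ms.getMin()
print(first_min, top_after_pop, second_min)
```

Python integers are arbitrary precision, so the sketch cannot overflow; in Java, Go, or C++ the expression 2x - currentMin can exceed a 32-bit int, so a wider type is the usual fix.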
Product Extension
A real-time expression evaluator for a spreadsheet engine. As users type formulas, you validate parenthesization on every keystroke (must be O(N) for snappy UX) and maintain a “min/max running aggregate” for selected cells (Min Stack pattern). The two-stack technique generalizes to maintaining any associative aggregate over a stack-shaped sliding context.
Language/Runtime Follow-ups
- Python: list is a fast stack via append/pop(). Don’t use list.pop(0) (O(N)).
- Java: prefer ArrayDeque over Stack (the latter is synchronized, slower, and inherits from Vector). Deque<Integer> with push/pop/peek.
- Go: slices as stacks: s = append(s, x) and x, s = s[len(s)-1], s[:len(s)-1].
- C++: std::stack (LIFO adapter) or just std::vector. std::stack’s pop returns void; use top then pop.
- JS/TS: Array.push/Array.pop are O(1) amortized. The same array is fine.
- Memory: stacks here grow on the heap (the data structure), even though the conceptual abstraction is named “stack.” Don’t confuse with the call stack.
Common Bugs
- A, popping an empty stack: Python raises IndexError. Check if not stack first.
- A, accepting "((": forgetting the final empty-stack check (return not stack). The string ends with openers still on the stack.
- A, wrong pair table: {')': '(', ']': '[', '}': '{'}. It is easy to mistype a pair or invert the direction of the mapping.
- B, naive getMin: scanning the stack is O(N), violating the contract.
- B, duplicate-min handling: if you maintain “the min” as a single field and pop the value equal to it without secondary tracking, the min is wrong after pop. Two-stack design avoids this.
- B, pop on empty stack: per constraints not called, but if defensive, raise.
Debugging Strategy
- A: print the stack and the current char at each step; trace "([)]".
- B: print both stacks after each op; verify m[i] == min(s[0..i]).
- For the duplicate-min trap, manually trace push(-1), push(-1), pop(), getMin() → must still be -1.
Mastery Criteria
- Wrote is_valid cleanly in under 4 minutes; under 5 lines of logic.
- Recognized the closer-table pattern over a six-way conditional.
- Designed Min Stack with two stacks; explained why one stack with a single min field fails on duplicates.
- Sketched the “encoded delta” optimization without needing it.
- Handled the empty-stack defensive checks.
- Selected ArrayDeque over Stack in Java without prompting.
Lab 06 — Heap Priority: Kth Largest In A Stream
Goal
Master the binary heap as the canonical streaming top-K device, the min-heap-of-size-K trick, and the cost model for push/pop. The deliverable: an online data structure that, after an O(N log K) initialization over the N initial elements, returns the Kth largest element in O(log K) per add.
Background Concepts
Binary heap as an array; sift-up / sift-down; min-heap vs max-heap; heapify is O(N); push/pop are O(log N). Review the Heaps section of the Phase 1 README and runtime concept 1 (stack vs heap memory — note the distinction between the call stack and the heap data structure).
Interview Context
Asked at Amazon, Apple, Bloomberg, and any role touching streaming systems. The signal: do you reach for a heap immediately when “online K-th largest” is mentioned? Do you choose a min-heap of size K (not a sorted list, not a max-heap)? Do you state the O(log K) per add?
Problem Statement
Design a KthLargest(k, nums) class. The constructor receives the integer k and an initial array nums. The method add(val) inserts val into the stream and returns the Kth largest element among all elements seen so far.
Constraints
- 1 ≤ k ≤ 10^4
- 0 ≤ |nums| ≤ 10^4
- -10^4 ≤ val, nums[i] ≤ 10^4
- At most 10^4 calls to add.
- Guaranteed: at the time of any add return, there are at least k elements seen.
Clarifying Questions
- Is k fixed for the lifetime of the object? (Yes: set once.)
- Are duplicates allowed? (Yes: add(5) twice keeps both.)
- What if fewer than k elements have been seen? (Per constraints, won’t happen at return time. Confirm.)
- Is “Kth largest” 1-indexed? (Yes: K=1 is the maximum.)
- Streaming: do we ever remove elements? (No, additions only.)
Examples
KthLargest(3, [4, 5, 8, 2])
add(3) → 4 // sorted desc: 8, 5, 4, 3, 2 → 3rd is 4
add(5) → 5 // 8, 5, 5, 4, 3, 2 → 3rd is 5
add(10) → 5 // 10, 8, 5, 5, 4, 3, 2 → 3rd is 5
add(9) → 8 // 10, 9, 8, 5, 5, 4, 3, 2 → 3rd is 8
add(4) → 8 // 10, 9, 8, 5, 5, 4, 4, 3, 2 → 3rd is 8
Initial Brute Force
Maintain a sorted list. On add, insert in sorted order (O(N)) and read index N-k.
import bisect

class KthLargestBrute:
    def __init__(self, k, nums):
        self.k = k
        self.arr = sorted(nums)            # ascending order

    def add(self, val):
        # binary-search the position, then insert (O(N) shift)
        bisect.insort(self.arr, val)
        return self.arr[-self.k]           # Kth largest = Kth from the end
Brute Force Complexity
bisect.insort is O(log N) for the search but O(N) for the actual insertion (array shift). Total O(N) per add. For 10⁴ adds and 10⁴ initial size: 10⁸ ops. Borderline.
Optimization Path
We don’t need to track all elements. Only the top K. A min-heap of size K keeps the K largest seen, with the smallest of them at the top — that’s the Kth largest.
- On add: push, then if size > K, pop (the smallest, which is no longer in the top K).
- Return heap[0].
Final Expected Approach
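import heapq

class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.heap = []                     # min-heap holding at most k values
        for x in nums:
            self.add(x)

    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)       # evict the smallest: it fell out of the top k
        return self.heap[0]                # smallest of the top k = Kth largest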
For the constructor, you can shave constants: take the first k elements, heapify them (O(k)), then for each remaining element, replace the top if the element beats it, else skip. The worst case is still O(N log K), so the simple version above is acceptable.
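The heapify-first constructor described above could look like the following. This is a minimal sketch, not a reference implementation: the class name KthLargestHeapify is made up for illustration, and it assumes nums may contain fewer than k elements (the constraints allow an empty nums).

```python
import heapq

class KthLargestHeapify:
    """Hypothetical variant of KthLargest; only the constructor strategy differs."""

    def __init__(self, k, nums):
        self.k = k
        self.heap = list(nums[:k])          # may hold fewer than k values
        heapq.heapify(self.heap)            # bottom-up heapify, O(k)
        for x in nums[k:]:                  # empty when len(nums) <= k
            if x > self.heap[0]:            # x beats the current Kth largest
                heapq.heapreplace(self.heap, x)   # pop min and push x in one sift

    def add(self, val):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, val)  # still filling up to k values
        elif val > self.heap[0]:
            heapq.heapreplace(self.heap, val)
        return self.heap[0]
```

Values that do not beat the current top are skipped without touching the heap, which is where the constant-factor saving comes from.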
Data Structures Used
A binary min-heap. Underlying storage: a dynamic array. Capacity: K.
Correctness Argument
Invariant: self.heap contains the K largest values seen so far (when ≥ K have been seen), and self.heap[0] is the minimum of those — i.e., the Kth largest overall.
After heappush(val): heap may have K+1 elements; the smallest is at the top. Popping removes it. The remaining K elements are still the K largest (we only removed the smallest of K+1, which by definition is excluded from the top K of K+1). Hence self.heap[0] is the Kth largest of K+1 = the Kth largest overall.
Complexity
- Constructor: O(N log K) using the per-element approach; O(K + (N − K) log K) using bottom-up heapify on the first K and then sifting the rest, which is the same worst-case order with a smaller constant.
- add: O(log K).
- Space: O(K).
Implementation Requirements
- Use the language’s built-in min-heap; don’t roll your own unless asked.
- Bound the heap size to K explicitly; if you don’t, heap[0] is the minimum of everything seen, not the Kth largest.
- Know your library’s default orientation: Java’s PriorityQueue and Python’s heapq are min-heaps (fine here), while C++’s std::priority_queue is a max-heap by default. If the default points the wrong way, negate the values or pass a comparator.
- Don’t allocate fresh on every add.
Tests
- Smoke: the canonical example above.
- Unit: K=1 (always returns max); K==N (returns min after each add).
- Edge: empty nums and a stream that brings size up to K; duplicate values; negative values.
- Large: 10⁴ adds of random ints with K=100; assert per-call O(log K) by timing.
- Random: maintain a brute-force sorted-list reference; assert equality of returned value on each call.
- Invalid: add before reaching K elements (per constraints not happening; if defensive, raise or buffer).
Follow-up Questions
- “What if the stream supports remove(val)?” → switch to a balanced BST or two heaps with lazy deletion.
- “Maintain the K smallest.” → max-heap of size K (mirror).
- “K-th most frequent element in a stream.” → counter + heap with re-inserts on count change.
- “Top K trending hashtags over a sliding 1-hour window.” → heap + circular buffer + lazy deletion of stale entries.
- “Implement the min-heap from scratch.” → array-backed, sift-up on push, sift-down on pop, parent at (i-1)//2, children at 2i+1, 2i+2.
- “Why O(N) heapify rather than N pushes?” → bottom-up sift-down sums to O(N); pushes sum to O(N log N).
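For the “implement the min-heap from scratch” follow-up, a compact array-backed version might look like this. It is a sketch for practice: the class name MinHeap and its method names are illustrative choices, and pop/peek on an empty heap simply raise IndexError.

```python
class MinHeap:
    """Array-backed binary min-heap (illustrative sketch)."""

    def __init__(self):
        self.a = []

    def __len__(self):
        return len(self.a)

    def peek(self):
        return self.a[0]

    def push(self, x):
        a = self.a
        a.append(x)
        i = len(a) - 1
        while i > 0:                         # sift-up
            parent = (i - 1) // 2
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        a = self.a
        top = a[0]
        last = a.pop()
        if a:                                # move last to root, then sift-down
            a[0] = last
            i, n = 0, len(a)
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < n and a[left] < a[smallest]:
                    smallest = left
                if right < n and a[right] < a[smallest]:
                    smallest = right
                if smallest == i:
                    break
                a[i], a[smallest] = a[smallest], a[i]
                i = smallest
        return top
```

Pushing a list of values and popping until empty yields them in ascending order, which is a quick self-check to run at the whiteboard.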
Product Extension
A leaderboard service that streams game scores and surfaces the top 100. Memory budget per shard is tight; the min-heap-of-size-K pattern is the standard approach. Combine with sharding: each shard maintains its own top-100, and the aggregator merges the per-shard lists through its own size-100 heap. The same pattern powers “top-N alerts” and “trending content” feeds, and tail-latency tracking such as p99 when the tail fits in a bounded heap.
Language/Runtime Follow-ups
- Python: heapq is min-heap only. For max-heap behavior, push -x and negate on pop. heapq.heapify(list) is O(N) in place.
- Java: PriorityQueue defaults to min-heap. PriorityQueue<Integer> pq = new PriorityQueue<>();. Reversed: new PriorityQueue<>(Comparator.reverseOrder()). pq.poll() and pq.peek() are O(log N) and O(1).
- Go: must implement heap.Interface (Len, Less, Swap, Push, Pop). Verbose; the stand-alone helpers (heap.Init, heap.Push, heap.Pop) live in the container/heap package.
- C++: std::priority_queue<int> is a max-heap by default. Use std::priority_queue<int, std::vector<int>, std::greater<int>> for a min-heap.
- JS/TS: no built-in heap. Must implement or pull a library. This is a not-uncommon interview surprise.
- Memory model: the data-structure heap lives in the process heap (not the call stack). Sizes up to 10⁴ are trivial.
Common Bugs
- Maintaining a max-heap: works for finding the max, but you’d need to extract K elements per call. Wrong tool.
- Forgetting to bound size to K: the heap grows to N and heap[0] becomes the global minimum instead of the Kth largest, so answers are wrong; adds also cost O(log N) and memory is O(N).
- Returning heap[-1]: Python’s heap[-1] is not the largest; only heap[0] is the min. Other indices are unordered.
- Off-by-one on K: K=1 should track the maximum; if you accidentally maintain K-1 elements, you’re answering the wrong query.
- Java PriorityQueue reversed-comparator trap: (a, b) -> b - a overflows when the operands have opposite signs and large magnitude. Use Integer.compare(b, a).
Debugging Strategy
- After each add, print the heap. Should be ≤ K elements with the K-th-largest at index 0.
- Cross-check against sorted(all_seen)[-K].
- For perf: time 10⁴ adds; should be milliseconds.
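The cross-check against sorted(all_seen)[-K] can be automated with a small randomized harness. The sketch below repeats the final KthLargest class so that it runs on its own; the helper name cross_check, the seed, and the value ranges are arbitrary choices for illustration.

```python
import heapq
import random

class KthLargest:
    # Same logic as the final approach above, repeated so the harness is runnable.
    def __init__(self, k, nums):
        self.k, self.heap = k, []
        for x in nums:
            self.add(x)

    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)
        return self.heap[0]

def cross_check(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        k = rng.randint(1, 5)
        # Start with at least k values so every add has a defined answer.
        nums = [rng.randint(-20, 20) for _ in range(rng.randint(k, 10))]
        kl, seen = KthLargest(k, nums), list(nums)
        for _ in range(20):
            val = rng.randint(-20, 20)
            seen.append(val)
            got = kl.add(val)
            assert len(kl.heap) <= k                      # size bound holds
            assert got == sorted(seen)[-k], (k, seen, got)  # matches brute force
    return True
```

Small value ranges are deliberate: they force many duplicates, which is where top-K implementations tend to go wrong.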
Mastery Criteria
- Selected the min-heap-of-size-K pattern within 30 seconds of hearing “Kth largest streaming.”
- Stated the loop invariant aloud.
- Wrote the implementation in under 5 minutes.
- Identified the K=1 and K=N degenerate cases.
- Knew the language idiom: heapq Python, PriorityQueue Java, priority_queue<…, greater<…>> C++.
- Mentioned the (a, b) -> b - a overflow trap in Java.
- Sketched the bottom-up heapify alternative for the constructor (O(K) heapify of the first K elements, then sift the rest).
Lab 07 — Binary Search Fundamentals
Goal
Master the half-open invariant [lo, hi), the overflow-safe midpoint, the lower-bound / upper-bound generalizations, and the discipline that makes binary search bug-free. The deliverable: implement Search Insert Position cleanly and explain why your loop terminates.
Background Concepts
Sorted arrays; monotone predicates; loop invariants; integer overflow on (lo + hi) / 2. Review the Sorted Arrays / Sorted Sets section of the Phase 1 README.
Interview Context
Binary search is asked at every FAANG and is the #1 source of “I solved it but had off-by-one bugs” complaints. The signal isn’t whether you can find an exact match — it’s whether you can correctly answer “first index where predicate flips from false to true” in one of three loop variants without bugs. Lower-bound and upper-bound are the general tools.
Problem Statement
Given a sorted array nums of distinct integers and a target target, return the index where target is found, or the index where it would be inserted to keep nums sorted.
This is exactly lower_bound (first index i such that nums[i] >= target).
Constraints
- 1 ≤ |nums| ≤ 10^4
- -10^4 ≤ nums[i], target ≤ 10^4
- nums is sorted ascending and contains no duplicates.
Clarifying Questions
- Are duplicates possible? (Per constraints, no; but the lower-bound formulation handles them: returns leftmost.)
- Can nums be empty? (Per constraints no, but the implementation handles it via lo = hi = 0.)
- Should we return len(nums) if target exceeds all elements? (Yes: it inserts at the end.)
- Is the result expected to be the first match or any match? (For Search Insert, lower-bound semantics: leftmost.)
- Recursive or iterative? (Iterative is preferred: no stack growth.)
Examples
| nums | target | Output |
|---|---|---|
| [1, 3, 5, 6] | 5 | 2 |
| [1, 3, 5, 6] | 2 | 1 |
| [1, 3, 5, 6] | 7 | 4 |
| [1, 3, 5, 6] | 0 | 0 |
| [1] | 1 | 0 |
| [] | 5 | 0 |
Initial Brute Force
Linear scan: walk the array, return the first index i where nums[i] >= target. If none, return len(nums).
def insert_pos_brute(nums, target):
    for i, x in enumerate(nums):
        if x >= target:
            return i
    return len(nums)
Brute Force Complexity
O(N) time, O(1) space. For 10⁴ elements with 10⁴ queries, 10⁸ ops — borderline. Misses the entire point of “sorted.”
Optimization Path
Sorted + monotone predicate = binary search. The predicate is nums[i] >= target, monotone false → true as i increases. We want the first true index.
Three loop styles compete:
- Closed [lo, hi]: while lo <= hi, terminate at lo > hi.
- Half-open [lo, hi): while lo < hi, terminate at lo == hi. Recommended.
- Inclusive find-or-not-found: while lo < hi, post-loop check lo validity.
Half-open is the cleanest because the answer-pointer lo always satisfies the invariant “all indices < lo have predicate false; all indices >= hi have predicate true.” When lo == hi, that’s the boundary.
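For contrast with the recommended half-open form, here is one way the closed-interval style can be written for the same lower-bound query. It is a sketch for comparison only; the name lower_bound_closed is illustrative.

```python
def lower_bound_closed(nums, target):
    lo, hi = 0, len(nums) - 1          # closed interval [lo, hi]
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] < target:
            lo = mid + 1               # predicate false: answer is right of mid
        else:
            hi = mid - 1               # predicate true: mid stays reachable via lo
    return lo                          # loop exits with lo == hi + 1
```

Note the asymmetry this style forces: hi moves to mid - 1 even though mid may be the answer, and correctness relies on lo ending up on it. That extra reasoning step is the usual argument for preferring the half-open form.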
Final Expected Approach
def search_insert(nums, target):
    lo, hi = 0, len(nums)              # half-open [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2      # overflow-safe
        if nums[mid] < target:
            lo = mid + 1               # predicate false → exclude mid
        else:
            hi = mid                   # predicate true → keep mid as candidate
    return lo                          # lo == hi; first true index
Data Structures Used
The input array. Three integer indices: lo, hi, mid.
Correctness Argument
Invariant: at every iteration, the answer (the smallest index i such that nums[i] >= target, or len(nums) if none) lies in [lo, hi] (closed interval over the half-open search range). Equivalently: nums[lo-1] < target (or lo == 0) and nums[hi] >= target (or hi == len(nums)).
Body: if nums[mid] < target, predicate at mid is false, so the answer is in [mid+1, hi]. We set lo = mid+1. Otherwise predicate at mid is true; the answer is in [lo, mid]. We set hi = mid. Both branches strictly shrink the range.
Termination: each iteration shrinks hi - lo by at least 1 (since mid is in [lo, hi-1]). Loop exits when lo == hi. Invariant gives us: lo is the answer.
Complexity
O(log N) time. O(1) space.
Implementation Requirements
- Use lo + (hi - lo) // 2, never (lo + hi) // 2: the sum can overflow fixed-width ints in Java/C++/Go.
- Half-open [lo, hi) with hi = len(nums) initial.
- Loop condition lo < hi.
- Update lo = mid + 1 (exclude mid); hi = mid (include mid as candidate).
- Return lo.
- Don’t write three nested if/else; there are only two branches.
Tests
- Smoke: the table above.
- Unit: target equals an existing element; target less than all; target greater than all; single-element array (target equal, less, greater).
- Edge: empty array → return 0.
- Large: N = 10⁵, sorted; binary search with random targets. Time should be sub-millisecond.
- Random: generate sorted random arrays; cross-check against linear scan.
- Invalid: array not sorted (undefined behavior; if defensive, assert).
Follow-up Questions
- “Find the upper bound (first index where nums[i] > target; the last index with nums[i] <= target is one before it).” → change the comparison to nums[mid] <= target; or use bisect.bisect_right.
- “Search in a rotated sorted array.” → modify the comparison: identify which half is sorted.
- “Search for a peak element.” → binary search on the slope: compare nums[mid] with nums[mid+1].
- “First bad version (the predicate is the only oracle).” → same exact loop with is_bad(mid) as the predicate.
- “Search a 2D matrix.” → flatten conceptually if rows are sorted continuations; else two passes.
- “Why does lo + (hi - lo) // 2 matter in Python?” → it doesn’t (Python ints are unbounded), but it’s the universal idiom.
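Several of these follow-ups reduce to the same loop once the predicate is passed in as a function. The sketch below shows that reuse; the helper names first_true, lower_bound, upper_bound, and first_bad_version are illustrative, and the predicate is assumed to be monotone (false, then true).

```python
def first_true(lo, hi, pred):
    """Smallest i in [lo, hi) with pred(i) true, or hi if there is none."""
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if pred(mid):
            hi = mid                   # mid is a candidate
        else:
            lo = mid + 1               # mid is excluded
    return lo

def lower_bound(nums, target):
    return first_true(0, len(nums), lambda i: nums[i] >= target)

def upper_bound(nums, target):
    return first_true(0, len(nums), lambda i: nums[i] > target)

def first_bad_version(n, is_bad):
    # Versions are numbered 1..n; is_bad is the only oracle.
    return first_true(1, n + 1, is_bad)
```

For nums = [1, 2, 2, 2, 5], lower_bound(nums, 2) returns 1 and upper_bound(nums, 2) returns 4, so the run of 2s occupies indices 1 through 3.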
Product Extension
A timestamp-indexed log store. Find the first log line at or after a given timestamp: that’s lower_bound. The same primitive powers range queries (lower_bound(start) to upper_bound(end)) and is the basis for B-tree leaf-node lookups. Library functions like bisect, lower_bound, Arrays.binarySearch already implement this; a senior engineer reaches for them, not for a hand-rolled loop.
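As a small illustration of the range-query idea, the sketch below uses Python's bisect module on a sorted list of timestamps. The function name logs_in_range and the sample timestamps are made up for the example.

```python
from bisect import bisect_left, bisect_right

def logs_in_range(timestamps, start, end):
    """Return (first, stop) so that timestamps[first:stop] are all ts with start <= ts <= end."""
    first = bisect_left(timestamps, start)    # lower_bound(start)
    stop = bisect_right(timestamps, end)      # upper_bound(end)
    return first, stop

ts = [100, 105, 105, 110, 120]
first, stop = logs_in_range(ts, 105, 110)
matching = ts[first:stop]                     # [105, 105, 110]
```

An empty result shows up as first == stop, so callers do not need a separate not-found signal.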
Language/Runtime Follow-ups
- Python: bisect.bisect_left(nums, target) is the library answer. Returns exactly the lower-bound index.
- Java: Arrays.binarySearch(arr, target) returns either the match index or -(insertion_point) - 1. Decode with result < 0 ? -result - 1 : result. Note the bit-shift idiom (lo + hi) >>> 1, an unsigned right shift that avoids the overflow.
- Go: sort.SearchInts(arr, target) returns the lower-bound directly.
- C++: std::lower_bound(v.begin(), v.end(), target) - v.begin(). Returns an iterator; subtract begin() for the index.
- JS/TS: no library. Must implement.
- Overflow: Java/C++ ints are 32-bit by default. (lo + hi) can overflow when both ~2³⁰. Use the safe form.
Common Bugs
- (lo + hi) // 2 overflow in Java/C++/Go (32-bit ints). Use lo + (hi - lo) // 2, or >>> 1 in Java.
- Wrong update direction: lo = mid (instead of mid + 1) on the false branch. Causes infinite loop when lo + 1 == hi.
- Closed-interval while lo <= hi with half-open updates: mixing the two styles. Pick one and stick to it.
- Returning mid instead of lo: mid is wherever the loop happens to stop, not the answer.
- Off-by-one on the initial hi: hi = len(nums) - 1 for closed; hi = len(nums) for half-open.
- Forgetting the empty-array case: half-open form handles it naturally (lo = hi = 0); closed form starts with hi = -1, which works with signed ints but underflows with unsigned indices (e.g., size_t in C++), so check explicitly there.
Debugging Strategy
- Print lo, hi, mid, and nums[mid] each iteration. The range should strictly shrink.
- If you hit an infinite loop, you almost certainly have lo = mid (not mid + 1) on the false branch.
- For random testing, compare against bisect.bisect_left as the reference.
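The random-testing advice can be turned into a few lines of harness code. This sketch repeats search_insert so that it runs on its own; the helper name fuzz, the seed, and the value ranges are arbitrary.

```python
import bisect
import random

def search_insert(nums, target):
    # Half-open lower bound, same as the final approach above.
    lo, hi = 0, len(nums)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def fuzz(trials=1000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 12)                           # includes the empty array
        nums = sorted(rng.sample(range(-30, 30), n))     # distinct, sorted
        target = rng.randint(-35, 35)                    # may fall outside the range
        expected = bisect.bisect_left(nums, target)
        assert search_insert(nums, target) == expected, (nums, target)
    return trials
```

Keeping arrays tiny makes any failing case easy to trace by hand from the assertion message.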
Mastery Criteria
- Wrote the half-open form from memory in under 2 minutes, no off-by-ones.
- Stated the invariant aloud: “all indices < lo are false; all indices ≥ hi are true.”
- Identified the overflow trap and used the safe midpoint.
- Recognized that Search Insert Position is lower_bound.
- Knew the library function in Python, Java, Go, C++.
- Solved a follow-up (rotated sorted array OR upper-bound) in under 10 minutes by reusing the same skeleton.
Lab 08 — Recursion & Stack: Generate Parentheses
Goal
Master backtracking with partial-state validity, the recursion tree as a mental model, the bound on recursion depth, and the Catalan-number cost analysis. The deliverable: enumerate all well-formed parenthesizations of N pairs and explain why the count is C_n (the n-th Catalan number).
Background Concepts
Recursion as a tree of choices; partial-state pruning vs full-state validation; recursion depth = call-stack frames; iterative backtracking as an explicit-stack alternative. Review runtime concept 10 (recursion depth) in the Phase 1 README and the Stacks section.
Interview Context
Generate Parentheses is asked at Google, Microsoft, Meta. The signal: do you generate only valid prefixes (prune early) instead of generating all 2^(2n) strings and filtering? Do you know the Catalan-number complexity? Can you also produce an iterative version using an explicit stack?
Problem Statement
Given an integer n, return all combinations of well-formed parentheses using exactly n pairs of ( and ).
Constraints
- 1 ≤ n ≤ 8. The count grows as C_n = (2n)! / ((n+1)! n!). C_8 = 1430.
- Output order is not specified; any valid enumeration is acceptable.
Clarifying Questions
- Should output be sorted? (Usually no, but lexicographic order falls out naturally if we always try ( before ).)
- Is duplication possible? (No: each generated string is unique by construction.)
- Should we return a list or stream the results? (List is canonical; streaming/yield is a follow-up.)
- Empty case n = 0? (Per constraints n ≥ 1. If allowed: return [""].)
- Are the parentheses always ( and )? (Yes for the canonical problem; brackets and braces is a generalization.)
Examples
| n | Output |
|---|---|
| 1 | ["()"] |
| 2 | ["(())", "()()"] |
| 3 | ["((()))", "(()())", "(())()", "()(())", "()()()"] |
| 4 | 14 strings |
Initial Brute Force
Generate all 2^(2n) strings of length 2n over {(, )}. Filter by validity (use the stack from Lab 05). Return the survivors.
def gen_brute(n):
    out = []
    def rec(s):
        if len(s) == 2*n:
            if is_valid(s):            # Lab 05 routine
                out.append(s)
            return
        rec(s + "(")
        rec(s + ")")
    rec("")
    return out
Brute Force Complexity
2^(2n) strings; each takes O(n) to validate. Total O(n · 4^n). For n = 8 that is 65,536 strings and roughly 10⁶ character checks: fast. For n = 16 it would be billions of strings.
Optimization Path
Prune as we build. Track (open, close) counts; the rules are:
- We may add ( if open < n.
- We may add ) if close < open (otherwise we’d close before opening).
Every leaf of this pruned tree is a valid string; no validation needed. The number of leaves is exactly C_n.
Final Expected Approach
def generate_parentheses(n):
    out = []
    def backtrack(s, opens, closes):
        if len(s) == 2 * n:
            out.append(s)
            return
        if opens < n:
            backtrack(s + "(", opens + 1, closes)
        if closes < opens:
            backtrack(s + ")", opens, closes + 1)
    backtrack("", 0, 0)
    return out
Data Structures Used
- The recursion call stack (depth = 2n).
- An accumulator string built up by concatenation (or, for efficiency, a list of chars joined at the leaf).
- An output list of strings.
Correctness Argument
Soundness: every leaf has length 2n, opens == n, closes == n (else we wouldn’t reach length 2n under the pruning rules). At every prefix, closes ≤ opens (we only added ) when closes < opens). Therefore every leaf is balanced.
Completeness: any valid parenthesization satisfies the same two rules at every prefix (it’s the characterization of valid prefixes). Therefore the recursion explores it. By induction on length: every valid prefix s of length < 2n extended by ( (if extensible) or ) (if extensible) appears in the tree.
Uniqueness: at each node we make distinct choices (( vs )), so two leaves cannot have the same string.
Complexity
- Number of leaves: C_n = (2n)! / ((n+1)! n!) ≈ 4^n / (n^(3/2) · √π).
- Cost per leaf: O(n) to copy the final string.
- Total time: O(n · C_n) = O(4^n / √n).
- Space: output is O(n · C_n). Recursion stack is O(n).
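The leaf count can be checked numerically against the closed form. In this sketch, catalan uses math.comb (Python 3.8+), and count_leaves is an illustrative helper that walks the same pruned tree as the backtracking solution without building any strings.

```python
from math import comb

def catalan(n):
    # Closed form: C_n = C(2n, n) / (n + 1); the division is always exact.
    return comb(2 * n, n) // (n + 1)

def count_leaves(n, opens=0, closes=0):
    """Count leaves of the pruned recursion tree using only the two counters."""
    if opens == n and closes == n:
        return 1
    total = 0
    if opens < n:
        total += count_leaves(n, opens + 1, closes)
    if closes < opens:
        total += count_leaves(n, opens, closes + 1)
    return total
```

For n = 1 through 4 both functions give 1, 2, 5, 14, and both give 1430 for n = 8, matching the constraint note.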
Implementation Requirements
- Two counters: opens, closes. Don’t track the full prefix’s validity; the counters are sufficient.
- Termination at len(s) == 2 * n, not when both counters hit n (equivalent, but the length check is clearer).
- Pass s immutably (string concat) for clarity, or mutate a list and append/pop for performance; for n ≤ 8 the difference is negligible.
- Don’t generate then filter. The whole point is to not visit invalid branches.
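The mutate-and-undo alternative mentioned in the requirements might look like the sketch below. The function name generate_parentheses_mutable is illustrative; the important detail is that every append is paired with a pop before the next choice is tried.

```python
def generate_parentheses_mutable(n):
    out, path = [], []                 # path is the shared, mutable prefix

    def backtrack(opens, closes):
        if len(path) == 2 * n:
            out.append("".join(path))  # copy the prefix only at a leaf
            return
        if opens < n:
            path.append("(")
            backtrack(opens + 1, closes)
            path.pop()                 # undo before trying the next choice
        if closes < opens:
            path.append(")")
            backtrack(opens, closes + 1)
            path.pop()                 # undo before returning to the caller

    backtrack(0, 0)
    return out
```

The output order is the same as the string-concatenation version because the choices are tried in the same order.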
Tests
- Smoke: n = 3 → 5 strings.
- Unit: n = 1 → ["()"]; n = 2 → 2 strings.
- Edge: n = 0 (if allowed) → [""].
- Property: count of returned strings equals C_n (compute reference Catalan number).
- Property: every string in the output is valid (run Lab 05’s is_valid).
- Property: all strings are distinct (length = len(set(out))).
- Large: n = 8 returns 1430 strings in milliseconds.
Follow-up Questions
- “Generate iteratively using an explicit stack.” → push partial states (s, opens, closes); pop and expand.
- “Return only the count, not the strings.” → that’s just C_n; closed form: comb(2n, n) // (n + 1).
- “Brackets, braces, and parens (multi-type).” → harder; simple counters are not enough because the closer must match the most-recent opener, so carry a stack of pending openers.
- “Stream results lazily (generator/yield).” → in Python, yield from each leaf; saves memory.
- “Memoize.” → the canonical formulation has no overlapping subproblems on (opens, closes, s) because s is unique at every state. If you parametrize by just counts, you lose the actual string.
- “Why is the count C_n?” → bijection with Dyck paths, full binary trees with n+1 leaves, etc.
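Two of these follow-ups, the explicit-stack version and the lazy generator, can be sketched as follows. The function names generate_parentheses_iter and iter_parentheses are illustrative. In the iterative version the ")" child is pushed first so that the "(" child is popped first, which keeps the output order identical to the recursive solution.

```python
def generate_parentheses_iter(n):
    out = []
    stack = [("", 0, 0)]               # partial states: (prefix, opens, closes)
    while stack:
        s, opens, closes = stack.pop()
        if len(s) == 2 * n:
            out.append(s)
            continue
        if closes < opens:             # pushed first, so explored second
            stack.append((s + ")", opens, closes + 1))
        if opens < n:                  # pushed last, so explored first
            stack.append((s + "(", opens + 1, closes))
    return out

def iter_parentheses(n, s="", opens=0, closes=0):
    """Lazy variant: yields one string at a time instead of building a list."""
    if len(s) == 2 * n:
        yield s
        return
    if opens < n:
        yield from iter_parentheses(n, s + "(", opens + 1, closes)
    if closes < opens:
        yield from iter_parentheses(n, s + ")", opens, closes + 1)
```

The generator is useful when the caller only needs the first few results: next(iter_parentheses(3)) produces "((()))" without enumerating the rest.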
Product Extension
A SQL/expression-grammar generator for fuzz testing. Generating syntactically valid parenthesized expressions is a backtracking-with-pruning problem; arbitrary depth-bounded grammars use the same technique. Grammar-based program generators used for compiler fuzzing (“synthesize a small valid program”) follow the same pattern.
Language/Runtime Follow-ups
- Python: strings are immutable, so each s + "(" allocates. For larger n, build with a list and "".join(...) at the leaf.
- Java: use StringBuilder and deleteCharAt/setLength at backtrack, the canonical mutable-builder pattern. Pass the builder by reference; remember to undo each append on return.
- Go: strings are immutable; use a []byte and truncate by reslicing on backtrack. strings.Builder has no truncation method, so it does not suit backtracking.
- C++: use std::string mutated in place with push_back/pop_back. Pass by reference.
- JS/TS: strings are immutable; concat is fine for small n. For larger, use an array.
- Recursion depth: 2n. For n ≤ 8, depth ≤ 16, which is trivial. Even n = 1000 (academic, depth 2000) fits most native stacks, but it exceeds Python’s default recursion limit of 1000.
- Tail-call optimization: absent in Python and in most JS engines; this code isn’t tail-recursive anyway because there are two recursive branches.
Common Bugs
- Adding ) without checking closes < opens: generates )(... prefixes that can never become valid, so the output contains invalid strings.
- Adding ( without checking opens < n: overshoots. With the length-based leaf check it emits unbalanced strings such as ((((; with a counters-based leaf check it never reaches the leaf condition and recurses without bound.
- Awkward termination: if opens == n and closes == n instead of if len(s) == 2n is fine but harder to reason about.
- Backtracking with mutation but not undoing: append (, recurse, forget to pop before recursing again. Adds spurious chars.
- Catalan miscount: saying complexity is O(2^(2n)) instead of O(4^n / √n) is a forgivable but loose answer.
Debugging Strategy
- Print the recursion tree: indent by len(s) and show (s, opens, closes).
- Run for n = 2 and verify the output is exactly ["(())", "()()"].
- Count outputs and compare to comb(2n, n) // (n + 1).
Mastery Criteria
- Identified backtracking-with-pruning as the right tool within 60 seconds.
- Wrote the two pruning rules without help.
- Stated complexity as O(n · C_n) ≈ O(4^n / √n).
- Acknowledged the Catalan-number connection.
- Wrote the iterative-with-explicit-stack version on demand.
- Selected the appropriate language idiom (StringBuilder, []byte, etc.) and remembered to undo mutations on backtrack.
Lab 09 — Tree Traversal Fundamentals
Goal
Master the three depth-first traversals (preorder, inorder, postorder) in both recursive and iterative forms, plus level-order (BFS). Understand the explicit-stack simulation of recursion, the postorder trick, and Morris traversal as the O(1)-space follow-up. The deliverable: implement iterative inorder cleanly and explain the stack invariant.
Background Concepts
Binary trees; DFS vs BFS; recursion as implicit stack; explicit stack as iterative replacement; visited flags. Review the Trees section and the Stacks section of the Phase 1 README.
Interview Context
Tree traversal is a Day-1 interview topic. Recursive forms are trivial; the interesting signal is iterative inorder (the canonical “implement recursion with a stack” question). Postorder iteratively is harder still — and Morris traversal (O(1) space) shows up in senior interviews.
Problem Statement
Given the root of a binary tree, return the inorder traversal as a list. Implement iteratively (no recursion).
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right
Constraints
- Number of nodes in [0, 10^4].
- -100 ≤ Node.val ≤ 100.
- The tree is not necessarily balanced.
Clarifying Questions
- What’s the node definition? (As above; confirm with interviewer.)
- Empty tree allowed? (Yes: return [].)
- Duplicate values? (Allowed; doesn’t affect traversal.)
- Are we limited on stack space? (For 10^4 nodes in a degenerate (linked-list) tree, recursion exceeds Python’s default limit of 1000 frames. Iterative is required.)
- Must we use O(1) extra space? (If yes: Morris traversal. Otherwise the explicit stack is fine.)
Examples
| Tree | Inorder |
|---|---|
| [1, null, 2, 3] (LeetCode array form) | [1, 3, 2] |
| [] | [] |
| [1] | [1] |
| BST [2, 1, 3] | [1, 2, 3] (sorted!) |
Initial Brute Force
Recursive — visit left, root, right.
def inorder_recursive(root):
    out = []
    def rec(node):
        if not node: return
        rec(node.left)
        out.append(node.val)
        rec(node.right)
    rec(root)
    return out
Brute Force Complexity
O(N) time, O(H) space (recursion stack, where H = tree height). H = N for a degenerate tree, log N for balanced.
Optimization Path
The iteration cost is the same — O(N) — but recursion uses the system stack which has a fixed limit (~1000 frames in Python by default). For N = 10⁴ in a worst-case skewed tree, recursion crashes. We replace with an explicit stack.
The pattern: “go left as far as possible, pushing each node along the way; when we can’t go further left, pop, visit, then move to right child and repeat.”
Final Expected Approach
def inorder_iterative(root):
    out, stack, node = [], [], root
    while node or stack:
        while node:                    # walk left, pushing
            stack.append(node)
            node = node.left
        node = stack.pop()             # leftmost unvisited
        out.append(node.val)           # visit
        node = node.right              # explore right subtree
    return out
For preorder: visit before descending left.
def preorder_iterative(root):
    if not root: return []
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node.val)
        if node.right: stack.append(node.right)   # right first → left popped first
        if node.left: stack.append(node.left)
    return out
For postorder: hardest. Two clean approaches:
- Modified preorder + reverse: do preorder but visit right before left; reverse the result.
- Visited-flag trick: push (node, visited) tuples; on first pop, push back as visited and push children; on second pop, emit.
def postorder_iterative(root):
    if not root: return []
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node.val)                      # emits root, right, left order
        if node.left: stack.append(node.left)     # left first → right popped first
        if node.right: stack.append(node.right)
    return out[::-1]                              # reversed → left, right, root
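The code above is the modified-preorder approach. The second approach, the visited-flag trick, could be sketched as follows; the function name postorder_flagged is illustrative, and TreeNode is repeated so that the block runs on its own.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def postorder_flagged(root):
    if root is None:
        return []
    out, stack = [], [(root, False)]
    while stack:
        node, visited = stack.pop()
        if visited:
            out.append(node.val)               # second pop: children are done
        else:
            stack.append((node, True))         # revisit after both subtrees
            if node.right:
                stack.append((node.right, False))
            if node.left:
                stack.append((node.left, False))   # pushed last, processed first
    return out
```

Unlike the reverse trick, this version emits values in true postorder as it goes, which matters when the visit has side effects (for example, freeing children before their parent).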
Data Structures Used
- An explicit stack (list/deque/Stack/std::stack).
- A pointer/cursor node.
- An output list.
- For Morris: only the output list and tree pointers themselves.
Correctness Argument
Inorder invariant: at the start of each outer-loop iteration, the stack contains the chain of ancestors (along left-edges) of the next-to-visit node; each stacked node is still unvisited, as is its right subtree. The inner while node walk pushes new ancestors. After popping a node, its left subtree is finished (it was either null or fully consumed in earlier iterations); we visit the node, then move to its right subtree, where the same invariant resumes.
Termination: every node is pushed exactly once and popped exactly once (visited once). N pushes + N pops = O(N) work. The loop ends when node is null and the stack is empty, meaning the right subtree of the last (rightmost) node is exhausted and no ancestor is waiting.
Complexity
O(N) time, O(H) auxiliary space (the stack holds at most one chain of ancestors).
For a skewed tree, H = N. For balanced, H = log N.
Implementation Requirements
- Initialize node = root, stack = [].
- Outer loop: while node or stack.
- Inner loop walks left and pushes.
- After inner loop: pop, append val, set node = popped.right.
- Don’t push None. Don’t visit on the way down.
- For preorder, push right before left (LIFO order).
- For postorder, the “modified preorder + reverse” form is the cleanest one-pass solution.
Tests
- Smoke: the example table.
- Unit: single node; empty tree; left-skewed (degenerate); right-skewed; balanced BST.
- Property: inorder of a BST is sorted ascending.
- Property: preorder + inorder uniquely determine a binary tree (round-trip test if you implement the reconstruction).
- Edge: root with only-left child; root with only-right child.
- Large: N = 10⁴ skewed tree — must not stack-overflow.
Follow-up Questions
- “Implement Morris traversal (O(1) space).” → temporarily rewire the tree using “threaded” pointers from inorder predecessors back to successors; restore on the way through.
- “Level-order traversal.” → BFS with a queue.
- “Zigzag level-order.” → BFS with alternating direction; reverse every other level.
- “Reconstruct a tree from its preorder + inorder.”
- “Boundary traversal of a tree.” → combination of left boundary, leaves left-to-right, right boundary reversed.
- “Verify a BST.” → inorder traversal must be strictly ascending; or carry (min, max) bounds recursively.
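For the Morris follow-up, a minimal sketch of inorder traversal with O(1) auxiliary space is shown below. The function name inorder_morris is illustrative, and TreeNode is repeated so that the block runs on its own. The tree is temporarily modified during the walk and restored before the function returns.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def inorder_morris(root):
    out, node = [], root
    while node:
        if node.left is None:
            out.append(node.val)
            node = node.right              # may follow a thread back to an ancestor
        else:
            # Find the inorder predecessor: rightmost node of the left subtree.
            pred = node.left
            while pred.right is not None and pred.right is not node:
                pred = pred.right
            if pred.right is None:
                pred.right = node          # first arrival: create the thread
                node = node.left
            else:
                pred.right = None          # second arrival: remove the thread
                out.append(node.val)
                node = node.right
    return out
```

Each edge is walked a constant number of times, so the running time stays O(N) despite the inner loop.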
Product Extension
A directory tree being indexed: BFS gives breadth-first crawl (sibling priority); preorder DFS visits parent before children (renderers); postorder visits children before parent (size accumulation, deletion). A code formatter walks the AST in postorder so children’s formatted text is available when the parent emits its own. A serializer uses preorder. The choice of traversal is a design decision tied to dependency direction.
Language/Runtime Follow-ups
- Python: list works as a stack via append/pop (amortized O(1)). Use collections.deque when you need O(1) pops from the left, as in BFS. Default recursion limit is 1000; call sys.setrecursionlimit(10**5) if recursing on huge trees.
- Java: Deque<TreeNode> stack = new ArrayDeque<>(). Avoid java.util.Stack (legacy, synchronized). For BFS, use Deque<TreeNode> q = new ArrayDeque<>().
- Go: slices as stacks: stack = append(stack, n), pop with stack = stack[:len(stack)-1]. The stdlib has no stack type; generics (Go 1.18+) let you write a reusable one.
- C++: std::stack<TreeNode*> and std::queue<TreeNode*>.
- JS/TS: array push/pop. For BFS, Array.shift() is O(N); use a real deque or an index pointer to avoid quadratic blowup on large trees.
- Recursion depth: Python ~1000 default; Java ~10⁴ on default -Xss; Go grows stacks dynamically. For 10⁴ skewed trees, iterative is mandatory in Python unless you raise the recursion limit.
Common Bugs
- Pushing None onto the stack: bloats the stack and requires defensive pops.
- Visiting on the way down for inorder: that’s preorder.
- In preorder iterative, pushing left before right: visits the right subtree before the left (root, right, left), which is not preorder.
- For postorder, forgetting to reverse in the modified-preorder approach.
- BFS using list.pop(0) in Python: an O(queue length) shift on every dequeue; quadratic on wide trees. Use collections.deque and popleft().
- Inner loop not consuming left children: only pushing root; you never reach the leftmost node.
- Mutation in Morris traversal: forgetting to restore the threaded pointer; leaves the tree corrupted.
Debugging Strategy
- Print stack contents and node at the top of each outer-loop iteration.
- For inorder, the first visit should be the leftmost node.
- For a known BST, the output of inorder must be sorted; if not, the loop is wrong.
- For large skewed inputs, iterative must finish without RecursionError.
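The skewed-input check can be scripted as below. This is a sketch: build_left_skewed is a hypothetical helper that chains n nodes through left pointers, and inorder_iterative is repeated from the final approach so that the block runs on its own.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def build_left_skewed(n):
    """Return a tree where node v has node v-1 as its left child (height n)."""
    root = None
    for v in range(n):
        root = TreeNode(v, left=root)
    return root

def inorder_iterative(root):
    out, stack, node = [], [], root
    while node or stack:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.val)
        node = node.right
    return out

big = build_left_skewed(10**4)
result = inorder_iterative(big)            # the explicit stack peaks at 10^4 entries
```

With Python's default recursion limit, the recursive version would be expected to raise RecursionError on the same input, whereas the iterative version only grows a heap-allocated list.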
Mastery Criteria
- Wrote iterative inorder from memory in under 3 minutes.
- Stated the stack invariant (chain of ancestors along left edges).
- Wrote preorder iteratively and explained the right-before-left push order.
- Articulated two postorder strategies (reverse-preorder vs visited-flag).
- Knew that BFS uses a queue; can explain why list.pop(0) is wrong in Python.
- Could sketch Morris traversal at a high level: rewire to predecessors, restore on the way through.
- Recognized that inorder of a BST is sorted (and used it to verify a BST).
Phase 2 — Standard Coding Interview Patterns
Target level: Medium → Medium-Hard
Expected duration: 4 weeks (12-week track) / 4 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~7 patterns introduced per week + 50–80 problems applying them under the framework
Why This Phase Is The Keystone
Phase 0 fixed your execution. Phase 1 fixed your vocabulary. Phase 2 fixes the only thing standing between you and a 95% Medium solve rate: pattern recognition.
Here is the empirical claim, and it is the entire reason this phase exists:
Any unseen LeetCode Medium becomes a 5-minute problem if you immediately recognize the pattern. The recognition takes ~30 seconds. The remaining 4–5 minutes are mechanical: instantiate the template, adapt to the problem’s specifics, write clean code, test.
Candidates who fail Mediums almost never fail because the pattern was hard. They fail because they did not recognize the pattern, so they tried to derive the algorithm from first principles in 25 minutes — a task the original algorithm researcher needed weeks for. Pattern recognition is not memorization; it is the compiled, searchable index of the entire algorithmic literature, indexed by problem-statement signal.
This is the difference between a candidate who looks at “find longest substring with at most K distinct characters” and thinks “sliding window with a frequency map, variable-size, shrink while violation, O(N)” in 20 seconds — and one who thinks “hmm, maybe two pointers? or sort? or…” and starts coding the wrong thing.
The 28 patterns below cover >90% of the problems asked at Big Tech, infrastructure companies, quant firms, and systems-engineering interviews. They are not all the patterns in existence — Phases 3–7 add advanced data structures, hard graphs, DP families, and competitive-programming techniques. But these 28 are the ones that, once internalized, transform Medium-level problems from “puzzles to solve” into “templates to instantiate”.
What You Will Be Able To Do After This Phase
- For any Medium problem, recognize the dominant pattern in <2 minutes of reading the problem statement.
- For each of the 28 patterns, write the canonical template from memory in <5 minutes.
- Distinguish between superficially-similar patterns (e.g., binary search on index vs binary search on answer) by their signal, not their syntax.
- Combine two patterns when one alone is insufficient (e.g., monotonic deque inside a sliding window; trie inside a backtracking DFS).
- Diagnose, when a pattern almost fits but not quite, exactly which generalization is needed (e.g., “this is sliding window but the window is variable and we need the max — we need a monotonic deque, not just a counter”).
- Communicate the pattern out loud at the moment of recognition: “This is X because of signal Y; the template is Z; expected complexity is W; the canonical pitfall is P.”
How To Read This Phase
This README is a reference manual for all 28 patterns, plus a recognition cheat sheet, plus a mastery checklist. Each pattern entry has a fixed structure:
- Signal recognition — the words/structure in the problem statement that should fire this pattern within 2 minutes of reading
- Canonical template — pseudocode you should be able to write from memory
- Complexity — time and space, with the constants that matter
- Common variants — the family tree (e.g., sliding window has fixed-size, variable-size, count-based variants)
- Classic problems — 4–8 LeetCode problems where this pattern is the intended solution
- Common bugs — the specific failure modes seen on this pattern in interviews
Read it linearly the first time. Refer back to specific patterns as you work the labs. After all labs, re-read the cheat-sheet table at the bottom — it should now read as obvious.
A Word On The 28 Patterns As A System
The patterns are not 28 unrelated tricks. They form a small number of meta-strategies:
- Linear scans with state (1, 2, 3, 4, 5, 9, 10, 11) — one pass, maintain a structure
- Reduce-to-sorted (6, 7, 11) — sort first, then exploit order
- Decision-on-monotonic-axis (8) — binary search where the axis is the answer itself
- Local-update primitives on linear/tree/graph topology (12, 13, 14, 15, 16, 17, 18) — propagate information along edges/pointers
- Enumerate with pruning (19) — exhaustive search with backtracking
- Memoize over a state space (20, 21, 22, 23, 24, 25) — cache answers to a DAG of subproblems
- Specialized structures for prefix/order queries (26, 27, 28) — trie, heap, K-way merge
Recognizing the meta-strategy first, then drilling down to the specific pattern, is often faster than trying to match all 28 patterns linearly.
Inline Pattern Reference
1. Two Pointers (opposite ends + same direction)
Signal Recognition (<2 min)
- The input is sorted (or can be sorted cheaply) and the problem asks for a pair/triplet with a property.
- The problem says “in-place” and you are scanning an array.
- The answer is symmetric: it depends on values from both ends shrinking inward.
- “Find pair such that `a + b = target`” with sorted input.
- “Remove duplicates in place” / “Move zeros”.
Canonical Template (Opposite Ends)
l, r = 0, len(a) - 1
while l < r:
    if condition(a[l], a[r]):
        # record / move both
        l += 1; r -= 1
    elif a[l] + a[r] < target:
        l += 1
    else:
        r -= 1
Canonical Template (Same Direction / Read-Write Pointers)
write = 0
for read in range(len(a)):
    if keep(a[read]):
        a[write] = a[read]
        write += 1
return write  # new length
Complexity
Time O(N) (each pointer moves monotonically — total moves bounded by N). Space O(1).
Common Variants
- Opposite ends: Two Sum on sorted, 3Sum, container with most water, valid palindrome.
- Same direction (slow/fast): remove duplicates, move zeros, partitioning around a pivot.
- Two arrays merge: merge two sorted arrays / lists.
- Cycle detection (Floyd): linked-list two-pointer where fast moves 2× slow.
Classic Problems
- LeetCode 1 — Two Sum (variant: sorted input becomes two pointers)
- LeetCode 15 — 3Sum
- LeetCode 11 — Container With Most Water
- LeetCode 26 — Remove Duplicates from Sorted Array
- LeetCode 75 — Sort Colors (Dutch national flag)
- LeetCode 125 — Valid Palindrome
- LeetCode 167 — Two Sum II Sorted
- LeetCode 283 — Move Zeroes
Common Bugs
- Forgetting to advance both pointers when a match is recorded → infinite loop.
- Off-by-one in `while l < r` vs `l <= r` (depends on whether a single element is meaningful).
- Skipping duplicates: forgetting the inner `while l < r and a[l] == a[l+1]: l += 1` after a recorded match (3Sum).
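The first and last bugs above both show up in 3Sum. The following is a minimal sketch of the opposite-ends template with duplicate skipping; the function name `three_sum` is an assumption for illustration.

```python
def three_sum(nums):
    """Return all unique triplets summing to zero (sort, then two pointers)."""
    a = sorted(nums)
    n = len(a)
    out = []
    for i in range(n - 2):
        if i > 0 and a[i] == a[i - 1]:
            continue  # this anchor value was already handled
        l, r = i + 1, n - 1
        while l < r:
            s = a[i] + a[l] + a[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                out.append([a[i], a[l], a[r]])
                # Skip duplicates of the recorded pair, then advance both pointers.
                while l < r and a[l] == a[l + 1]:
                    l += 1
                while l < r and a[r] == a[r - 1]:
                    r -= 1
                l += 1
                r -= 1
    return out
```

Each anchor runs an O(N) scan, so the whole routine is O(N²) after the sort; the O(N) bound in the Complexity note applies to a single two-pointer pass.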
2. Sliding Window (fixed size + variable size)
Signal Recognition (<2 min)
- “Longest / shortest / count of subarrays / substrings with property X.”
- “Maximum sum of K consecutive elements.”
- “Subarray containing all of …” / “smallest substring that contains all chars of T.”
- The brute force is O(N²) over all subarrays. The property is monotone as the window grows or shrinks.
Canonical Template (Variable Size, Shrink-While-Violation)
l = 0
state = init()
best = 0
for r in range(len(a)):
    state = add(state, a[r])
    while violates(state):
        state = remove(state, a[l])
        l += 1
    best = max(best, r - l + 1)
Canonical Template (Fixed Size K)
state = init()
for i in range(K): state = add(state, a[i])
best = report(state)
for r in range(K, len(a)):
    state = add(state, a[r])
    state = remove(state, a[r - K])
    best = update(best, report(state))
Complexity
Time O(N) — each element enters and leaves the window at most once. Space O(window state size).
Common Variants
- Fixed-size: maximum sum, average, min/max via deque.
- Variable-size with constraint to shrink under: at most K distinct, sum ≤ S, no repeats.
- Variable-size with constraint to grow until satisfied: smallest window containing all of T (then shrink while still satisfying).
- Counting “good” windows: for each right endpoint `r`, add the number of valid left endpoints, often `count += r - l + 1` after each step.
Classic Problems
- LeetCode 3 — Longest Substring Without Repeating Characters
- LeetCode 76 — Minimum Window Substring
- LeetCode 209 — Minimum Size Subarray Sum
- LeetCode 340 — Longest Substring with At Most K Distinct Characters
- LeetCode 424 — Longest Repeating Character Replacement
- LeetCode 567 — Permutation in String
- LeetCode 992 — Subarrays with K Different Integers (the “exactly K = atMost(K) − atMost(K-1)” trick)
- LeetCode 1004 — Max Consecutive Ones III
Common Bugs
- Updating the answer inside the shrink loop instead of after — leads to recording invalid windows.
- Forgetting that `while` (not `if`) is required when shrinking: adding one element can force more than one removal from the left.
- Counting “exactly K” as `atMost(K)` instead of `atMost(K) − atMost(K-1)`.
- For “no repeats”, forgetting that the freq map needs a decrement on shrink, not just a delete.
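The counting variant and the “exactly K” trick can be sketched together. The function names below are assumptions, and the sketch expects `k >= 1` in `exactly_k_distinct`.

```python
from collections import defaultdict


def at_most_k_distinct(a, k):
    """Count subarrays of `a` with at most k distinct values (k >= 0)."""
    freq = defaultdict(int)
    distinct = 0
    l = 0
    count = 0
    for r, x in enumerate(a):
        if freq[x] == 0:
            distinct += 1
        freq[x] += 1
        while distinct > k:        # `while`, not `if`
            freq[a[l]] -= 1        # decrement on shrink
            if freq[a[l]] == 0:
                distinct -= 1
            l += 1
        count += r - l + 1         # valid windows ending at r start anywhere in [l, r]
    return count


def exactly_k_distinct(a, k):
    """Count subarrays with exactly k distinct values, assuming k >= 1."""
    return at_most_k_distinct(a, k) - at_most_k_distinct(a, k - 1)
```

The subtraction works because “at most K” is monotone under shrinking, while “exactly K” is not, so the latter cannot drive a shrink loop directly.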
3. Prefix Sums (1D + 2D)
Signal Recognition (<2 min)
- “Sum/count over a range `[l, r]`” with many queries, or asked once with N up to 10^5.
- “Subarray with sum equal to K” (prefix sum + hashmap of seen prefix sums).
- “Number of subarrays with sum divisible by K” (prefix sums mod K).
- 2D: “matrix region sum” / “rectangle of ones”.
Canonical Template (1D)
prefix = [0] * (n + 1)
for i in range(n):
    prefix[i + 1] = prefix[i] + a[i]
# range sum a[l..r]: prefix[r + 1] - prefix[l]
Canonical Template (2D)
P = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(n):
    for j in range(m):
        P[i+1][j+1] = a[i][j] + P[i][j+1] + P[i+1][j] - P[i][j]
# region (r1,c1)..(r2,c2):
# P[r2+1][c2+1] - P[r1][c2+1] - P[r2+1][c1] + P[r1][c1]
Complexity
Build O(N) (1D) or O(NM) (2D). Each query O(1). Space O(N) or O(NM).
Common Variants
- Subarray-sum-equals-K with hashmap `{prefix_sum: count}`.
- XOR prefix for “subarray XOR equals K”: same trick, different operator (any invertible group operation works).
- Mod K prefix for “subarray sum divisible by K” — bucket prefixes by their value mod K.
- Count of negative numbers in a sorted matrix via row prefix.
- 2D rectangle sum, 2D max-sum submatrix.
Classic Problems
- LeetCode 303 — Range Sum Query Immutable
- LeetCode 304 — Range Sum Query 2D Immutable
- LeetCode 560 — Subarray Sum Equals K
- LeetCode 525 — Contiguous Array
- LeetCode 974 — Subarray Sums Divisible by K
- LeetCode 1248 — Count Number of Nice Subarrays
- LeetCode 1314 — Matrix Block Sum
Common Bugs
- Off-by-one: `prefix` is size N+1, indexed 0..N. Range `[l, r]` is `prefix[r+1] - prefix[l]`. Get this wrong once and it’s wrong forever.
- 2D inclusion-exclusion sign flip.
- Initializing the hashmap: `{0: 1}` is needed for “subarrays starting at index 0” in subarray-sum-equals-K.
- Integer overflow: prefix sums at N=10^5 with values up to 10^9 exceed 32-bit. Use 64-bit.
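The `{0: 1}` initialization is easiest to see in running code. This is a minimal sketch of subarray-sum-equals-K that keeps only a running prefix instead of the full array; the function name is an assumption.

```python
from collections import defaultdict


def subarray_sum_equals_k(a, k):
    """Count contiguous subarrays whose sum is exactly k (negatives allowed)."""
    seen = defaultdict(int)
    seen[0] = 1           # the empty prefix: covers subarrays starting at index 0
    prefix = 0
    count = 0
    for x in a:
        prefix += x
        # Every earlier prefix equal to prefix - k closes a subarray of sum k.
        count += seen[prefix - k]
        seen[prefix] += 1
    return count
```

The lookup happens before the insert, so a prefix never pairs with itself; that ordering matters when `k == 0`.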
4. Difference Arrays (range update O(1))
Signal Recognition (<2 min)
- “Apply many range updates `(l, r, +v)` then query the final array.”
- “How many flights on day X” given `(start, end, count)` triples.
- “Maximum overlap of intervals.”
- The brute would be O(N · Q); a difference array makes it O(N + Q).
Canonical Template
diff = [0] * (n + 1)
for (l, r, v) in updates:
    diff[l] += v
    diff[r + 1] -= v
a = [0] * n
cur = 0
for i in range(n):
    cur += diff[i]
    a[i] = cur
Complexity
O(N + Q) total. Space O(N).
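As a concrete instance of the template, here is a minimal runnable sketch. The function name and the convention that updates are 0-indexed and inclusive are assumptions made for the example.

```python
def apply_range_updates(n, updates):
    """Apply inclusive, 0-indexed (l, r, v) range adds to a zero array of length n."""
    diff = [0] * (n + 1)      # one extra slot so diff[r + 1] is always in range
    for l, r, v in updates:
        diff[l] += v
        diff[r + 1] -= v      # cancel the add just past the right end
    out = []
    cur = 0
    for i in range(n):
        cur += diff[i]        # running prefix sum rebuilds the final array
        out.append(cur)
    return out


# Three bookings over five days, already converted to 0-indexed days.
print(apply_range_updates(5, [(0, 1, 10), (1, 2, 20), (1, 4, 25)]))
```

Each update costs O(1) regardless of how wide the range is; the single prefix-sum pass at the end pays the O(N).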
Common Variants
- Booking-system style: count of overlapping intervals at each point.
- 2D difference (imos method): stamp rectangles, prefix-sum twice.
- Sweep line equivalence: events at `l` and `r+1` are exactly the events of a sweep; difference array is the “discretized sweep”.
- Range add + point query with later updates: Fenwick/BIT becomes more flexible (Phase 3).
Classic Problems
- LeetCode 1109 — Corporate Flight Bookings
- LeetCode 1854 — Maximum Population Year
- LeetCode 2381 — Shifting Letters II
- LeetCode 370 — Range Addition
- LeetCode 731 — My Calendar II
- LeetCode 2536 — Increment Submatrices by One (2D diff)
Common Bugs
- Forgetting the `r + 1` cancellation → all suffixes get incremented.
- Using `[0] * n` instead of `[0] * (n + 1)`: index out of bounds on `diff[r + 1]` when `r = n - 1`.
- For 2D: forgetting the inclusion-exclusion of all four corners.
5. Hashing Patterns (frequency / complement / grouping)
Signal Recognition (<2 min)
- “Find `target - x`” for some target → complement in a hashmap.
- “Most/least frequent X” → frequency map, often paired with heap/sort.
- “Group by canonical form” (anagrams, isomorphic strings) → grouping map keyed by canonical form.
- “Has any element appeared twice within K positions?” → sliding window of size K with a hashset.
Canonical Templates
# complement
seen = {}
for i, x in enumerate(a):
    if (target - x) in seen:
        return [seen[target - x], i]
    seen[x] = i
# frequency
from collections import Counter
freq = Counter(a)
top = freq.most_common(K)
# grouping by canonical form
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
    groups[canonical(s)].append(s)
Complexity
O(N) average (hash). Space O(N) worst case. Adversarial inputs may degrade to O(N²) — see Phase 1 §3.
Common Variants
- Two-Sum (complement).
- Group anagrams (grouping by char-count tuple).
- Longest consecutive sequence (set-membership test for `x-1` to find sequence starts).
- Subarray sum = K (prefix-sum + complement; see pattern 3).
- Bullet-proof word ladder (wildcards as keys).
Classic Problems
- LeetCode 1 — Two Sum
- LeetCode 49 — Group Anagrams
- LeetCode 128 — Longest Consecutive Sequence
- LeetCode 217 — Contains Duplicate
- LeetCode 219 — Contains Duplicate II
- LeetCode 347 — Top K Frequent Elements
- LeetCode 451 — Sort Characters by Frequency
Common Bugs
- Java `int[]` as a key: uses object identity, not value equality. (See Phase 1 lab 03.)
- Inserting into `seen` before the lookup, when the problem needs distinct indices.
- Using an ordered map when unordered suffices (e.g., Java `TreeMap` instead of `HashMap`) → log-N factor.
- Reusing a mutable buffer as a key: all keys alias to the latest buffer.
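The grouping template and the mutable-buffer pitfall can be sketched together. In Python a `list` is unhashable, so the count buffer has to be frozen into a `tuple` before it can serve as a key. The function name is an assumption, and the sketch expects lowercase a-z input.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group lowercase a-z words that are anagrams of each other."""
    groups = defaultdict(list)
    for w in words:
        counts = [0] * 26                  # fresh buffer per word
        for ch in w:
            counts[ord(ch) - ord("a")] += 1
        # Freeze the buffer into an immutable key; a list cannot be a dict key.
        groups[tuple(counts)].append(w)
    return list(groups.values())


print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
```

Keying by the count tuple costs O(L) per word of length L; keying by `"".join(sorted(w))` is a common alternative at O(L log L).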
6. Sorting + Greedy (sort to enable greedy)
Signal Recognition (<2 min)
- “Maximum number of non-overlapping …” → sort by end, take earliest end.
- “Minimum number of meeting rooms / arrows / platforms” → sort by start; sweep.
- “Schedule jobs to maximize profit / minimize lateness.”
- “Pair items optimally” → sort one or both, pair by index.
- The brute force is “try all pairings” (factorial); sorting collapses it to a single linear pass after an O(N log N) sort.
Canonical Template
a.sort(key=lambda x: x[1])  # sort by end
chosen = []
last_end = float("-inf")
for (s, e) in a:
    if s >= last_end:
        chosen.append((s, e))
        last_end = e
return len(chosen)
Complexity
Time O(N log N) for the sort, O(N) for the sweep. Space O(1) beyond the sort buffer if only the count is kept; O(N) if the chosen intervals are stored, as in the template.
Common Variants
- Activity selection — sort by end, take earliest end.
- Minimum platforms / arrows — sort by start (or by end for arrows).
- Pairing: sort and pair by index (e.g., “minimum pair-sum to fit a target”).
- Two arrays joined — sort both, two-pointer merge.
- Custom comparator — sort by a derived value (profit/time, deadline-then-profit, etc.) requires proving the exchange argument.
Classic Problems
- LeetCode 56 — Merge Intervals
- LeetCode 252 — Meeting Rooms (and 253 — Meeting Rooms II)
- LeetCode 435 — Non-overlapping Intervals
- LeetCode 452 — Minimum Number of Arrows to Burst Balloons
- LeetCode 502 — IPO (sort by capital, pq by profit)
- LeetCode 630 — Course Schedule III
- LeetCode 881 — Boats to Save People
Common Bugs
- Sorting by the wrong key (start vs end). Activity selection by start is wrong.
- Forgetting to prove the exchange argument before committing to greedy. (See Phase 6.)
- For “non-overlap” problems: confusing `s >= last_end` (touching allowed) vs `s > last_end` (strict).
- For comparator: subtraction overflow in Java when the comparator returns a difference of `int`s; use `Integer.compare` instead.
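The touching-versus-strict distinction changes the answer on small inputs. Below is a minimal sketch of activity selection with that choice exposed as a flag; the function name and the `touching_ok` parameter are assumptions for illustration.

```python
def max_non_overlapping(intervals, touching_ok=True):
    """Size of a largest set of pairwise non-overlapping (start, end) intervals."""
    count = 0
    last_end = float("-inf")
    # Earliest end first: the classic exchange-argument ordering.
    for s, e in sorted(intervals, key=lambda iv: iv[1]):
        fits = s >= last_end if touching_ok else s > last_end
        if fits:
            count += 1
            last_end = e
    return count


meetings = [(1, 2), (2, 3), (3, 4), (1, 3)]
print(max_non_overlapping(meetings), max_non_overlapping(meetings, touching_ok=False))
```

For “remove the minimum number of intervals” problems, the answer is `len(intervals)` minus this count.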
7. Binary Search On Index (sorted array)
Signal Recognition (<2 min)
- The input is sorted (or has a sorted property like a rotated sorted array).
- The task is “find X” / “find first / last X” / “find insertion point”.
- N is large (10^5+), and the brute O(N) is acceptable but O(log N) is wanted (or there are many queries).
Canonical Template (lower_bound)
def lower_bound(a, target):
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo  # first index with a[i] >= target
Complexity
Time O(log N) per query. Space O(1).
Common Variants
- `lower_bound`, `upper_bound`, exact-match.
- Rotated sorted array: pick the half that is sorted, decide which half contains the target.
- Search in 2D matrix — flatten coordinates, binary search the 1D index, or descend from top-right.
- Find peak — local-property binary search (no global sort required).
Classic Problems
- LeetCode 33 — Search in Rotated Sorted Array
- LeetCode 34 — Find First and Last Position
- LeetCode 35 — Search Insert Position
- LeetCode 74 — Search a 2D Matrix
- LeetCode 153 — Find Minimum in Rotated Sorted Array
- LeetCode 162 — Find Peak Element
- LeetCode 240 — Search a 2D Matrix II (descend from top-right; not binary search per se)
Common Bugs
- `(lo + hi) / 2` overflow in C++/Java: use `lo + (hi - lo) / 2`.
- Wrong loop condition (`<` vs `<=`) interacting with wrong update (`mid` vs `mid + 1` vs `mid - 1`): pick a single canonical form (we use half-open `[lo, hi)` here) and stick with it.
- Off-by-one when reconstructing the actual index after finding the bound.
- For rotated arrays, forgetting that duplicates break the binary search invariant.
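Sticking to one canonical form pays off when deriving the variants. In this minimal sketch, `upper_bound` differs from `lower_bound` in a single comparison, and the first/last position query is built from the two; `search_range` is an illustrative name.

```python
def lower_bound(a, target):
    """First index with a[i] >= target (len(a) if none)."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def upper_bound(a, target):
    """First index with a[i] > target; only the comparison changes."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def search_range(a, target):
    """First and last index of target in sorted a, or [-1, -1] if absent."""
    first = lower_bound(a, target)
    if first == len(a) or a[first] != target:
        return [-1, -1]
    return [first, upper_bound(a, target) - 1]
```

The `- 1` in `search_range` is the index reconstruction step: `upper_bound` points one past the last match.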
8. Binary Search On Answer (parametric / monotonic predicate)
Signal Recognition (<2 min)
- The problem asks for the minimum X such that property P(X) holds (or maximum X such that ¬P).
- `P` is monotonic in X (if P(X) holds, P(X+1) also holds, or vice versa).
- Direct construction is hard, but verifying a candidate answer in O(N) or O(N log N) is easy.
- Constraints: answer’s range is enormous (10^9, 10^18), but verification per candidate is cheap.
- Keywords: “smallest capacity / speed / time”, “largest minimum”, “split into K parts minimize max sum”.
Canonical Template
def feasible(x): ...  # True if x works; monotone, so every larger x works too
lo, hi = LOW, HIGH    # HIGH must itself be feasible
while lo < hi:
    mid = (lo + hi) // 2
    if feasible(mid):
        hi = mid
    else:
        lo = mid + 1
return lo  # smallest feasible value
Complexity
Time O(log(range) · cost_of_feasible). Space O(1) beyond feasible.
Common Variants
- Min-max / max-min (split array into K parts to minimize the maximum part sum).
- Capacity / rate (capacity to ship within D days; Koko eating bananas).
- Time (earliest day to finish; latest day before failure).
- K-th smallest in matrix / multiplication table (binary search the value, count “≤ value” entries).
- Floating-point binary search: replace `lo < hi` with `hi - lo > eps`, update with `lo = mid` instead of `lo = mid + 1`, and pick the right output.
Classic Problems
- LeetCode 410 — Split Array Largest Sum
- LeetCode 875 — Koko Eating Bananas
- LeetCode 1011 — Capacity To Ship Packages Within D Days
- LeetCode 1283 — Find Smallest Divisor Given a Threshold
- LeetCode 1482 — Minimum Number of Days to Make m Bouquets
- LeetCode 668 — Kth Smallest Number in Multiplication Table
- LeetCode 1539 — Kth Missing Positive Number
Common Bugs
- Wrong direction of monotonicity — verify by hand on small cases before committing.
- Wrong search bounds (lo too high → the answer is skipped; hi too low → the loop still terminates but returns an infeasible value).
- `feasible` has a subtle off-by-one: write and test `feasible` independently before plugging it into the binary search.
- Returning `lo - 1` or `hi + 1` accidentally: the half-open `[lo, hi)` template returns `lo`, period.
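A minimal sketch of the template applied to the eating-speed problem (Koko Eating Bananas). The function name is an assumption, and the sketch assumes `h >= len(piles)` so that the upper bound is feasible.

```python
def min_eating_speed(piles, h):
    """Smallest integer speed k that finishes every pile within h hours."""
    def feasible(k):
        # Hours needed at speed k: ceil(p / k) per pile, in integer arithmetic.
        return sum((p + k - 1) // k for p in piles) <= h

    lo, hi = 1, max(piles)    # hi is feasible whenever h >= len(piles)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid          # mid works; the answer is mid or smaller
        else:
            lo = mid + 1      # mid fails; the answer is strictly larger
    return lo
```

`feasible` is monotone here because a faster speed never needs more hours, which is exactly the property the binary search relies on.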
9. Monotonic Stack (next-greater / histogram / span)
Signal Recognition (<2 min)
- “Next/previous greater/smaller element” on each index.
- “Largest rectangle in histogram” / “max area of submatrix of 1’s” (uses histogram per row).
- “Daily temperatures” / “stock span” / “trapping rainwater” (an O(N) variant).
- The brute force is “for each i, scan right (or left) until …” — O(N²); the monotonic stack collapses it to O(N).
Canonical Template (Next Greater)
n = len(a)
result = [-1] * n
stack = []  # indices; values non-increasing from bottom to top
for i in range(n):
    while stack and a[stack[-1]] < a[i]:
        result[stack.pop()] = a[i]
    stack.append(i)
Complexity
Time O(N) — each index pushed and popped at most once. Space O(N) for the stack.
Common Variants
- Next/previous, greater/smaller (4 combinations) — a sign flip and a comparator change.
- Histogram problems: maintain stack of indices with strictly increasing heights; on pop, the popped index sees the current as its right boundary and the new top as its left.
- Sum of subarray minimums — for each element, count subarrays where it is the min.
- Trapping rainwater — stack of decreasing heights; each pop produces a “trapped” volume.
- Sliding window max — uses a monotonic deque (pattern 10), not stack.
Classic Problems
- LeetCode 84 — Largest Rectangle in Histogram
- LeetCode 85 — Maximal Rectangle (histogram per row)
- LeetCode 42 — Trapping Rain Water (stack variant)
- LeetCode 496 — Next Greater Element I
- LeetCode 503 — Next Greater Element II (circular)
- LeetCode 739 — Daily Temperatures
- LeetCode 901 — Online Stock Span
- LeetCode 907 — Sum of Subarray Minimums
Common Bugs
- Comparator: `<` vs `<=` matters when there are duplicates and the problem wants “strictly greater” vs “greater-or-equal”. Pick the variant that gives unique boundary assignment.
- Forgetting to drain the stack at the end (for problems where unprocessed elements have no next-greater).
- Histogram: forgetting the sentinel `0` appended at the end; without it, bars still on the stack are never evaluated.
- Storing values vs indices: almost always store indices, derive values when needed.
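The histogram variant, including the sentinel, can be sketched as follows. The function name is an assumption, and `heights` is expected to be a list of non-negative integers.

```python
def largest_rectangle(heights):
    """Area of the largest rectangle that fits under the histogram."""
    best = 0
    stack = []                                  # indices; heights strictly increasing
    for i, h in enumerate(heights + [0]):       # sentinel 0 flushes the stack
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            # New top is the nearest strictly lower bar to the left (or none).
            left = stack[-1] if stack else -1
            best = max(best, height * (i - left - 1))
        stack.append(i)
    return best
```

On each pop, the current index `i` is the right boundary and the new stack top is the left boundary, so the width is `i - left - 1`.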
10. Monotonic Queue (sliding window max/min in O(N))
Signal Recognition (<2 min)
- “Maximum / minimum of every window of size K” (or variable size) in O(N).
- DP transitions of the form `dp[i] = max(dp[j] + ...)` for `j` in some window: the deque maintains the candidate `j`s.
- Constrained Subsequence Sum, Jump Game VI.
Canonical Template (Sliding Window Max)
from collections import deque
dq = deque()  # holds indices, a[dq] strictly decreasing
result = []
for i, x in enumerate(a):
    while dq and a[dq[-1]] <= x:
        dq.pop()
    dq.append(i)
    if dq[0] <= i - K:
        dq.popleft()
    if i >= K - 1:
        result.append(a[dq[0]])
Complexity
Time O(N). Space O(K) for the deque.
Common Variants
- Sliding-window min (flip comparator).
- DP optimization: when `dp[i] = f(max{dp[j] : j ∈ window})`, the deque maintains the max efficiently.
- Shortest subarray with sum at least K (LC 862): combine prefix sums with a monotonic deque on prefix-sum values.
Classic Problems
- LeetCode 239 — Sliding Window Maximum
- LeetCode 862 — Shortest Subarray with Sum at Least K
- LeetCode 918 — Maximum Sum Circular Subarray
- LeetCode 1425 — Constrained Subsequence Sum
- LeetCode 1696 — Jump Game VI
Common Bugs
- Storing values, not indices: you lose the ability to evict by window position.
- `<=` vs `<` for the back-eviction (with duplicates, the wrong choice can leave stale entries that survive past their window).
- Forgetting to evict the front when its index is out of window.
- Reporting before the window is full (report only once `i >= K - 1`).
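The prefix-sum variant (shortest subarray with sum at least K) uses the deque on prefix values instead of raw elements. This is a minimal sketch assuming `k > 0`; the function name is illustrative.

```python
from collections import deque


def shortest_subarray_at_least(a, k):
    """Length of the shortest non-empty subarray with sum >= k, or -1 if none."""
    n = len(a)
    prefix = [0] * (n + 1)
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x

    best = n + 1
    dq = deque()          # indices into prefix; prefix values strictly increasing
    for j in range(n + 1):
        # Front: once a start satisfies the sum, any later end only gives a
        # longer window, so record it and discard that start.
        while dq and prefix[j] - prefix[dq[0]] >= k:
            best = min(best, j - dq.popleft())
        # Back: a later index with a smaller-or-equal prefix dominates earlier ones.
        while dq and prefix[dq[-1]] >= prefix[j]:
            dq.pop()
        dq.append(j)
    return best if best <= n else -1
```

A plain sliding window fails here because negative values break the monotonicity of the window sum; the deque restores a monotone structure over the prefix sums.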
11. Intervals (sort by start, merge / sweep)
Signal Recognition (<2 min)
- “Merge overlapping intervals”, “insert interval”, “remove minimum to make non-overlapping”.
- “Meeting rooms” / “minimum platforms” / “maximum concurrent events”.
- “Employee free time” / “interval intersection”.
Canonical Template (Merge)
intervals.sort(key=lambda x: x[0])
merged = []
for s, e in intervals:
    if merged and merged[-1][1] >= s:
        merged[-1][1] = max(merged[-1][1], e)
    else:
        merged.append([s, e])
Canonical Template (Sweep Line)
events = []
for s, e in intervals:
    events.append((s, +1))
    events.append((e, -1))  # or (e + 1, -1) for closed intervals on integers
events.sort()  # at equal times, -1 sorts before +1, so ends are processed first
cur = peak = 0
for _, delta in events:
    cur += delta
    peak = max(peak, cur)
Complexity
Sort O(N log N), sweep O(N). Space O(N) for events.
Common Variants
- Merge (sort by start, fold).
- Sweep (events at endpoints, count concurrent).
- Heap-of-end-times (for “minimum platforms / rooms”).
- Interval trees / balanced BSTs (Phase 3) for online updates.
- Tie-breaking: end events before start events (or vice versa) depending on whether endpoint contact counts as overlap.
Classic Problems
- LeetCode 56 — Merge Intervals
- LeetCode 57 — Insert Interval
- LeetCode 252 — Meeting Rooms
- LeetCode 253 — Meeting Rooms II
- LeetCode 435 — Non-overlapping Intervals
- LeetCode 759 — Employee Free Time
- LeetCode 986 — Interval List Intersections
- LeetCode 1851 — Minimum Interval to Include Each Query
Common Bugs
- Sorting by end when start was needed (or vice versa).
- Tie-breaking events at the same time wrong — touching intervals counted as overlap (or not) depending on the problem.
- Mutating the input list while iterating (Java `ConcurrentModificationException`).
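The heap-of-end-times variant can be sketched in a few lines. The function name is an assumption, and meetings are treated as half-open `[start, end)`, so a meeting may start exactly when another ends.

```python
import heapq


def min_meeting_rooms(intervals):
    """Minimum number of rooms so that overlapping meetings never share one."""
    ends = []                              # min-heap: end times of rooms in use
    for s, e in sorted(intervals):         # process meetings by start time
        if ends and ends[0] <= s:
            heapq.heapreplace(ends, e)     # earliest-ending room is free: reuse it
        else:
            heapq.heappush(ends, e)        # every room is busy: open a new one
    return len(ends)
```

The heap never shrinks, so its final size is the peak number of rooms in use at once, the same quantity the sweep-line template reports as `peak`.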
12. Linked List Manipulation (reverse / detect cycle / merge)
Signal Recognition (<2 min)
- The data structure given is `ListNode`.
- Tasks: reverse, reverse in groups, detect cycle, find middle, merge sorted, partition, deep copy.
- Often combined with a dummy head for return-pointer simplification.
Canonical Templates
# reverse
prev, curr = None, head
while curr:
    nxt = curr.next
    curr.next = prev
    prev, curr = curr, nxt
return prev
# detect cycle (Floyd)
slow = fast = head
while fast and fast.next:
    slow = slow.next
    fast = fast.next.next
    if slow is fast: return True
return False
Complexity
All operations O(N) time, O(1) space (recursion variants are O(N) stack).
Common Variants
- Reverse, reverse in K-group, reverse between m..n.
- Floyd’s cycle detection + finding cycle start (mathematical trick: reset slow to head, advance both at speed 1).
- Find middle with slow/fast (handle even/odd length).
- Merge two sorted with dummy.
- Deep copy with random pointer — interleave clones, then split.
- LRU cache (Phase 3) is doubly-linked list + hashmap.
Classic Problems
- LeetCode 21 — Merge Two Sorted Lists
- LeetCode 25 — Reverse Nodes in k-Group
- LeetCode 138 — Copy List with Random Pointer
- LeetCode 141 — Linked List Cycle
- LeetCode 142 — Linked List Cycle II
- LeetCode 206 — Reverse Linked List
- LeetCode 234 — Palindrome Linked List
- LeetCode 876 — Middle of the Linked List
Common Bugs
- Not using a dummy head when the head can change → special-case branches everywhere.
- Reverse: losing the next pointer because of assignment order.
- Cycle detection: incorrect math for finding the cycle start.
- Returning the wrong end (`curr` is null after the loop; `prev` is the new head).
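The cycle-start step is the one most often gotten wrong, so here is a minimal sketch of both phases of Floyd's algorithm. The `ListNode` class is a hypothetical stand-in with the usual `val`/`next` shape.

```python
class ListNode:
    """Hypothetical minimal singly-linked node for this sketch."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def cycle_start(head):
    """Return the node where the cycle begins, or None if the list is acyclic."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            # Phase 2: restart one pointer at head; moving both at speed 1,
            # they meet exactly at the cycle entry.
            slow = head
            while slow is not fast:
                slow = slow.next
                fast = fast.next
            return slow
    return None
```

Comparisons use `is`, not `==`: the question is whether two pointers reference the same node, not whether two nodes hold equal values.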
13. Tree DFS (preorder / inorder / postorder)
Signal Recognition (<2 min)
- The input is a tree (binary, n-ary, or just a graph that happens to be a tree).
- The answer is computed by combining results from subtrees (postorder) or by augmenting a top-down state (preorder).
- BST in-order traversal yields sorted values.
- “Validate”, “diameter”, “lowest common ancestor”, “serialize / deserialize”, “path sum”.
Canonical Template (Postorder)
def dfs(node):
    if not node: return base
    L = dfs(node.left)
    R = dfs(node.right)
    return combine(node.val, L, R)
Complexity
Time O(N) — each node visited once. Space O(H) recursion (H = height; up to N for skewed trees).
Common Variants
- Inorder for BSTs (yields sorted; use for “kth smallest” / “validate BST”).
- Preorder when state flows top-down (e.g., max value on path).
- Postorder when answer combines subtree results (e.g., diameter, LCA).
- Iterative with explicit stack: required when recursion depth could overflow (a skewed tree with N=10^5 nodes far exceeds Python’s default recursion limit of 1000).
- Morris traversal for O(1) extra space (Phase 3).
Classic Problems
- LeetCode 94 — Binary Tree Inorder Traversal
- LeetCode 98 — Validate Binary Search Tree
- LeetCode 104 — Maximum Depth of Binary Tree
- LeetCode 124 — Binary Tree Maximum Path Sum
- LeetCode 230 — Kth Smallest Element in a BST
- LeetCode 236 — Lowest Common Ancestor of a Binary Tree
- LeetCode 297 — Serialize and Deserialize Binary Tree
- LeetCode 543 — Diameter of Binary Tree
Common Bugs
- “Validate BST” by checking only `left.val < node.val < right.val` (local check): must pass `(min, max)` bounds top-down.
- Confusing “max path through node” with “max path starting at node” (the diameter trick).
- Stack overflow for deep trees in Python (default limit 1000): raise it with `sys.setrecursionlimit` or go iterative.
- Mutating shared `path` list without backtracking → wrong “all paths” output.
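The first bug above is worth seeing against a tree that passes every local check but is not a BST. This is a minimal sketch with an explicit stack, so deep trees are safe; `TreeNode` is a hypothetical stand-in class.

```python
class TreeNode:
    """Hypothetical minimal binary-tree node for this sketch."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def is_valid_bst(root):
    """True if every node lies strictly inside the bounds set by its ancestors."""
    stack = [(root, float("-inf"), float("inf"))]
    while stack:
        node, low, high = stack.pop()
        if node is None:
            continue
        if not (low < node.val < high):
            return False
        stack.append((node.left, low, node.val))     # left subtree: tighten upper bound
        stack.append((node.right, node.val, high))   # right subtree: tighten lower bound
    return True


# 4 sits under 8 (locally fine) but in the right subtree of 5 (globally wrong).
tricky = TreeNode(5, TreeNode(3), TreeNode(8, TreeNode(4)))
print(is_valid_bst(tricky))
```

The bounds travel top-down with each node, which is the preorder state-passing style listed under Common Variants.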
14. Tree BFS (level order / right side view)
Signal Recognition (<2 min)
- “Level order / level by level / per-depth” output.
- “Right (or left) side view” — last node at each level.
- “Minimum depth” — first leaf encountered in BFS.
- “Connect next pointers per level” (LC 116/117).
- “Vertical / diagonal traversal” — same machinery with different keying.
Canonical Template
from collections import deque
q = deque([root]) if root else deque()  # empty tree: no levels
levels = []
while q:
    size = len(q)
    cur = []
    for _ in range(size):
        node = q.popleft()
        cur.append(node.val)
        if node.left: q.append(node.left)
        if node.right: q.append(node.right)
    levels.append(cur)
Complexity
Time O(N), space O(W) (W = max width; up to about N/2 for a complete binary tree).
Common Variants
- Plain level order, with or without per-level grouping.
- Zigzag (alternate appending direction).
- Right-/left-side view (last/first per level).
- Minimum depth (first leaf — cuts BFS short on the goal).
- Bottom-up (collect all levels then reverse).
Classic Problems
- LeetCode 102 — Binary Tree Level Order Traversal
- LeetCode 103 — Binary Tree Zigzag Level Order Traversal
- LeetCode 107 — Binary Tree Level Order Traversal II
- LeetCode 111 — Minimum Depth of Binary Tree
- LeetCode 116 — Populating Next Right Pointers in Each Node
- LeetCode 199 — Binary Tree Right Side View
- LeetCode 314 — Binary Tree Vertical Order Traversal
Common Bugs
- Using `list.pop(0)` (Python) → O(N²). Use `deque`.
- Forgetting to capture `size = len(q)` before the inner loop: `q` grows during the loop and you’d over-iterate.
- Returning the level structure backwards (or forwards) accidentally.
- Null root not handled.
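A minimal sketch of the right-side-view variant, with the null-root guard and the frozen level size both in place. `TreeNode` is a hypothetical stand-in class.

```python
from collections import deque


class TreeNode:
    """Hypothetical minimal binary-tree node for this sketch."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def right_side_view(root):
    """Values of the last node on each level, top to bottom."""
    if root is None:
        return []
    view = []
    q = deque([root])
    while q:
        size = len(q)                  # freeze the level size before q grows
        for i in range(size):
            node = q.popleft()
            if i == size - 1:          # last node dequeued on this level
                view.append(node.val)
            if node.left:
                q.append(node.left)
            if node.right:
                q.append(node.right)
    return view
```

Switching `i == size - 1` to `i == 0` gives the left-side view with no other changes.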
15. Graph DFS (cycle / connected components / topo via DFS)
Signal Recognition (<2 min)
- The structure is a graph (general, not necessarily tree).
- Tasks: count connected components, detect cycle, topologically order, find bridges/articulation points (Phase 4).
- Recursion is fine (or you simulate it with an explicit stack).
Canonical Template (Connected Components)
visited = [False] * n
def dfs(u):
    visited[u] = True
    for v in adj[u]:
        if not visited[v]: dfs(v)
components = 0
for u in range(n):
    if not visited[u]:
        dfs(u); components += 1
Canonical Template (Cycle Detection in Directed Graph)
WHITE, GRAY, BLACK = 0, 1, 2
color = [WHITE] * n
def has_cycle(u):
    color[u] = GRAY
    for v in adj[u]:
        if color[v] == GRAY: return True
        if color[v] == WHITE and has_cycle(v): return True
    color[u] = BLACK
    return False
Complexity
Time O(V + E). Space O(V) for the recursion + visited arrays.
Common Variants
- Connected components (undirected).
- Strongly connected components (Tarjan / Kosaraju — Phase 4).
- Cycle detection: undirected uses a parent-check; directed uses tri-color (WHITE/GRAY/BLACK).
- Topological sort via DFS — postorder of a DAG, reversed.
- Number of islands / flood fill on grid graphs.
Classic Problems
- LeetCode 200 — Number of Islands
- LeetCode 207 — Course Schedule (cycle detection in directed graph)
- LeetCode 261 — Graph Valid Tree
- LeetCode 323 — Number of Connected Components
- LeetCode 695 — Max Area of Island
- LeetCode 1192 — Critical Connections (Tarjan bridges)
Common Bugs
- Forgetting to mark visited before recursing → infinite recursion.
- For undirected cycle detection, treating “going back to parent” as a cycle.
- Stack overflow for deep recursion in Python on N=10^5 — convert to iterative.
- For grid problems, going out of bounds because the bounds check runs after the cell is accessed instead of before.
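For inputs where recursion depth is a risk, the component count can be written with an explicit stack. This is a minimal sketch; the function name and the edge-list input format are assumptions.

```python
def count_components(n, edges):
    """Number of connected components of an undirected graph on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    visited = [False] * n
    components = 0
    for start in range(n):
        if visited[start]:
            continue
        components += 1
        visited[start] = True
        stack = [start]                    # explicit stack replaces the call stack
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if not visited[v]:
                    visited[v] = True      # mark on push, so no node is pushed twice
                    stack.append(v)
    return components
```

A path graph with 10^5 nodes would push a recursive DFS 10^5 frames deep; here the depth lives in a Python list instead.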
16. Graph BFS (shortest unweighted / multi-source / 0-1)
Signal Recognition (<2 min)
- Shortest path on an unweighted graph (or weights ∈ {0, 1}).
- “Minimum number of steps / moves / transformations.”
- Multi-source: “shortest distance from any of these sources” (rotting oranges, walls and gates).
- 0-1 BFS: use a deque, push 0-weight to front, 1-weight to back.
Canonical Template
from collections import deque
dist = [-1] * n
q = deque([src]); dist[src] = 0
while q:
    u = q.popleft()
    for v in adj[u]:
        if dist[v] == -1:
            dist[v] = dist[u] + 1
            q.append(v)
Complexity
Time O(V + E). Space O(V).
Common Variants
- Standard BFS for unweighted shortest path.
- Multi-source BFS — push all sources with distance 0, then run.
- 0-1 BFS with deque for graphs with weights ∈ {0, 1}.
- Bidirectional BFS for shortest path between fixed source and target — both halves explore O(b^(d/2)) instead of O(b^d).
- Word-ladder pattern — neighbors via wildcard buckets, not adjacency list.
Classic Problems
- LeetCode 127 — Word Ladder
- LeetCode 286 — Walls and Gates (multi-source)
- LeetCode 542 — 01 Matrix (multi-source)
- LeetCode 752 — Open the Lock
- LeetCode 994 — Rotting Oranges (multi-source)
- LeetCode 1162 — As Far from Land as Possible
- LeetCode 2290 — Minimum Obstacle Removal to Reach Corner (0-1 BFS)
Common Bugs
- Marking visited at dequeue time (lets duplicates pile up) instead of at enqueue time.
- Using BFS for weighted graphs (distinct positive weights) — wrong; use Dijkstra (Phase 4).
- Forgetting that “minimum depth of binary tree” is a natural BFS problem: DFS must visit every leaf, while BFS can stop at the first leaf it dequeues.
17. Topological Sort (Kahn’s / DFS-based)
Signal Recognition (<2 min)
- “Order tasks given dependencies” / “course prerequisites” / “build order” / “alien dictionary”.
- Detecting whether a DAG has a cycle (failure = there’s a cycle).
- DP on DAG (some tasks need a topological order to evaluate).
Canonical Template (Kahn’s BFS)
from collections import deque
indeg = [0] * n
for u in range(n):
    for v in adj[u]:
        indeg[v] += 1
q = deque([u for u in range(n) if indeg[u] == 0])
order = []
while q:
    u = q.popleft()
    order.append(u)
    for v in adj[u]:
        indeg[v] -= 1
        if indeg[v] == 0: q.append(v)
return order if len(order) == n else [] # else: cycle
Canonical Template (DFS Postorder)
order = []
WHITE, GRAY, BLACK = 0, 1, 2
color = [WHITE] * n
def dfs(u):
    color[u] = GRAY
    for v in adj[u]:
        if color[v] == GRAY: raise ValueError("cycle detected")  # back edge
        if color[v] == WHITE: dfs(v)
    color[u] = BLACK
    order.append(u)  # postorder
for u in range(n):
    if color[u] == WHITE: dfs(u)
order.reverse()
Complexity
Time O(V + E). Space O(V + E).
Common Variants
- Kahn’s: gives a valid order; cycle detection by len(order) != n.
- DFS postorder reversed: alternate algorithm; also yields a valid order.
- Lexicographically smallest topological order — use a min-heap instead of FIFO queue (Kahn’s).
- All possible topological orderings: backtrack over which zero-indegree node to emit next (the output can be exponential in size).
- DP on DAG following topological order.
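The lexicographically-smallest variant above can be sketched by swapping the FIFO queue in Kahn’s algorithm for a min-heap. This is an illustrative sketch; the function name lex_smallest_topo_order and the adjacency-list input shape (adj[u] = list of successors) are assumptions for this example.

```python
import heapq

def lex_smallest_topo_order(n, adj):
    """Kahn's algorithm with a min-heap instead of a FIFO queue.

    Assumed input shape: adj[u] lists the nodes that depend on u (edges u -> v).
    Returns [] if the graph has a cycle.
    """
    indeg = [0] * n
    for u in range(n):
        for v in adj[u]:
            indeg[v] += 1
    heap = [u for u in range(n) if indeg[u] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        u = heapq.heappop(heap)      # always emit the smallest ready node
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                heapq.heappush(heap, v)
    return order if len(order) == n else []
```

The heap raises the cost from O(V + E) to O((V + E) log V), which is the price of the ordering guarantee.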
Classic Problems
- LeetCode 207 — Course Schedule
- LeetCode 210 — Course Schedule II
- LeetCode 269 — Alien Dictionary
- LeetCode 310 — Minimum Height Trees (leaf peeling, a close relative of Kahn’s)
- LeetCode 329 — Longest Increasing Path in a Matrix (DP on DAG)
- LeetCode 444 — Sequence Reconstruction
- LeetCode 1136 — Parallel Courses
- LeetCode 2115 — Find All Possible Recipes
Common Bugs
- Building the graph with reversed edges (prerequisites vs unlocks).
- Not detecting cycles (returning a partial order silently).
- For lexicographic smallest, using a regular queue instead of a heap.
- Indegrees off-by-one when the same edge is duplicated in input.
18. Union-Find (connectivity / Kruskal preview)
Signal Recognition (<2 min)
- “Are X and Y connected after these operations?” (online connectivity).
- “Number of connected components” with dynamic union operations (DFS once-and-done is sufficient for static).
- Kruskal’s MST (Phase 4) — sort edges, union components.
- Equation problems (LC 399 — Evaluate Division — weighted union-find).
Canonical Template
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]] # path halving (one-pass path compression)
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb: return False
        if self.rank[ra] < self.rank[rb]: ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]: self.rank[ra] += 1
        return True
Complexity
Per operation: amortized O(α(N)) ≈ O(1) with both path compression and union by rank/size. Path compression alone: O(log N) amortized. Union by rank/size alone: O(log N) worst case. Neither: O(N) worst case.
Common Variants
- Vanilla connectivity.
- With size — track component sizes for “find largest component”.
- Weighted — each edge has a multiplier (LC 399 — Evaluate Division).
- With rollback (Phase 3) — for offline / divide-and-conquer queries.
- Kruskal MST — sort edges by weight, union the endpoints if they’re in different components.
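The Kruskal variant can be sketched with an inlined union-find. This is a minimal illustration, assuming edges arrive as (weight, u, v) tuples; the function name kruskal_mst_weight, the tuple layout, and the choice of union by size are assumptions for this example.

```python
def kruskal_mst_weight(n, edges):
    """Total weight of a minimum spanning tree, or None if the graph is disconnected.

    Assumed input shape: edges is a list of (weight, u, v) tuples, nodes 0..n-1.
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total = 0
    used = 0
    for w, u, v in sorted(edges):           # cheapest edges first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                        # would close a cycle; skip
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                     # attach smaller tree under larger
        size[ru] += size[rv]
        total += w
        used += 1
        if used == n - 1:
            break
    return total if used == n - 1 else None
```

The sort dominates at O(E log E); the union-find work is near-linear.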
Classic Problems
- LeetCode 200 — Number of Islands (DSU alternative)
- LeetCode 261 — Graph Valid Tree
- LeetCode 305 — Number of Islands II (online)
- LeetCode 399 — Evaluate Division
- LeetCode 547 — Number of Provinces
- LeetCode 684 — Redundant Connection
- LeetCode 721 — Accounts Merge
- LeetCode 1319 — Number of Operations to Make Network Connected
Common Bugs
- Forgetting path compression — TLE on adversarial chain inputs.
- find recursion that deepens the stack (use iterative or two-pass).
- Incrementing rank on every union: rank should grow by 1 only when the two roots’ ranks are equal; otherwise it overstates the tree height.
- Comparing parent[a] == parent[b] instead of find(a) == find(b): parent pointers are not roots until the paths have been compressed.
19. Backtracking (subsets / permutations / combinations / N-queens)
Signal Recognition (<2 min)
- “Find all subsets / permutations / combinations / arrangements satisfying X.”
- “Place K items respecting constraints” (N-queens, Sudoku).
- The brute force is exponential, and you can’t shave it polynomially — but you can prune aggressively.
Canonical Template
def backtrack(state, choices):
    if is_solution(state):
        record(state); return
    for choice in choices:
        if not valid(state, choice): continue
        apply(state, choice)
        backtrack(state, next_choices(choices, choice))
        undo(state, choice)
Complexity
- Subsets: O(N · 2^N).
- Permutations: O(N · N!).
- N-queens: O(N!) worst case, dramatically pruned in practice.
Space O(depth) for recursion + O(state size).
Common Variants
- Subsets (include/exclude each element).
- Permutations (choose unused; track used set or swap-in-place).
- Combinations (start index to avoid reordering duplicates).
- Partition into subsets (assign each element to a bucket; prune by sorting in descending order and skipping a bucket whose running sum equals that of a bucket already tried).
- Constraint satisfaction (N-queens, Sudoku) — prune with row/column/box bitmasks.
- String backtracking (palindrome partitioning, restore IP addresses, generate parentheses).
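As a concrete instance of the template, here is a sketch of subsets over an input with duplicates (the LC 90 shape). The function name subsets_with_dup is chosen for this example; the two points it illustrates are copying the path when recording and skipping equal siblings at the same depth.

```python
def subsets_with_dup(nums):
    """All distinct subsets of nums, where nums may contain duplicates."""
    nums = sorted(nums)          # sorting puts equal values next to each other
    out = []
    path = []

    def backtrack(start):
        out.append(path[:])      # record a copy, not a reference to path
        for i in range(start, len(nums)):
            # Skip a value already tried at this depth to avoid duplicate subsets.
            if i > start and nums[i] == nums[i - 1]:
                continue
            path.append(nums[i])     # apply
            backtrack(i + 1)
            path.pop()               # undo

    backtrack(0)
    return out
```

Replacing path[:] with path would make every recorded subset alias the same (eventually empty) list.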
Classic Problems
- LeetCode 17 — Letter Combinations of a Phone Number
- LeetCode 22 — Generate Parentheses
- LeetCode 39 — Combination Sum
- LeetCode 46 — Permutations
- LeetCode 51 — N-Queens
- LeetCode 78 — Subsets
- LeetCode 79 — Word Search
- LeetCode 90 — Subsets II (with duplicates)
- LeetCode 131 — Palindrome Partitioning
- LeetCode 212 — Word Search II (with trie)
Common Bugs
- Forgetting to undo the choice before returning → state corruption.
- Recording state by reference, not by copy → all results alias the final state.
- Duplicate handling: forgetting if i > start and a[i] == a[i-1]: continue (for sorted input with duplicates).
- For grid backtracking, forgetting to mark visited or not unmarking on return.
20. Basic DP (memoization vs tabulation)
Signal Recognition (<2 min)
- “Number of ways to …” / “Min/max … over choices.”
- Recursive structure with overlapping subproblems: the same sub-question is asked multiple times.
- Optimal substructure: the optimal answer combines optimal answers to subproblems.
- The brute is exponential; with memo the state space is polynomial.
Canonical Template (Top-Down Memoization)
from functools import lru_cache
@lru_cache(maxsize=None)
def solve(state):
    if is_base(state): return base_value(state)
    return combine(solve(next_state(state, c)) for c in choices(state))
Canonical Template (Bottom-Up Tabulation)
dp = init_table()
for state in topological_order_of_states():
    dp[state] = combine(dp[prev] for prev in predecessors(state))
return dp[final_state]
Complexity
Time = (# states) × (work per state). Space = (# states), often optimizable to a slice.
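To make the two templates concrete, here is a sketch of minimum-coin change (the LC 322 shape) written both ways. The function names are chosen for this example. Both versions have amount + 1 states and O(len(coins)) work per state, so the time is O(amount · len(coins)).

```python
from functools import lru_cache

def coin_change_top_down(coins, amount):
    """Fewest coins summing to amount, or -1 if impossible (memoized recursion)."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def solve(rem):
        if rem == 0:
            return 0
        best = float("inf")
        for c in coins:
            if c <= rem:
                best = min(best, solve(rem - c) + 1)
        return best

    ans = solve(amount)
    return -1 if ans == float("inf") else ans

def coin_change_bottom_up(coins, amount):
    """Same recurrence, filled in increasing order of amount (tabulation)."""
    INF = float("inf")
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and dp[a - c] + 1 < dp[a]:
                dp[a] = dp[a - c] + 1
    return -1 if dp[amount] == INF else dp[amount]
```

In Python the top-down version can hit the recursion limit for large amounts; the bottom-up version has no such limit.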
Common Variants (covered separately below)
- 1D DP (pattern 21).
- 2D DP (pattern 22).
- Knapsack (pattern 23).
- Subsequence DP (pattern 24).
- String DP (pattern 25).
- Bitmask / interval / digit / probability / tree (Phase 5).
Classic Problems
- LeetCode 70 — Climbing Stairs
- LeetCode 198 — House Robber
- LeetCode 322 — Coin Change
- LeetCode 416 — Partition Equal Subset Sum
Common Bugs
- Wrong state definition — too coarse to reconstruct, too fine to fit in memory.
- Wrong base case (off-by-one in the empty / single-element base).
- Wrong evaluation order in tabulation — predecessors computed after dependents.
- Memo key that omits part of the state, so two different subproblems share one cache entry (also: lru_cache needs hashable arguments, so pass tuples, not lists).
21. 1D DP (climbing stairs / house robber / decode ways)
Signal Recognition (<2 min)
- The state is a single index: “the answer at position i depends on positions ≤ i”.
- Transitions look at the last 1–3 positions.
- The answer is at dp[n] or dp[n-1].
Canonical Template
dp = [0] * (n + 1)
dp[0] = base
for i in range(1, n + 1):
    dp[i] = combine(dp[i - 1], dp[i - 2], a[i - 1])
return dp[n]
Complexity
Time O(N), space O(N) — usually compressible to O(1).
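The O(1)-space compression can be sketched on house robber (the LC 198 shape). The function name rob and the variable names prev1/prev2 are choices for this example; the tuple assignment is what keeps dp[i-2] from being overwritten before it is read.

```python
def rob(nums):
    """Maximum sum of non-adjacent elements, using two rolling variables."""
    prev2, prev1 = 0, 0          # dp[i-2], dp[i-1]
    for x in nums:
        # Both right-hand values are evaluated before either name is rebound,
        # so prev2 + x still uses the old dp[i-2].
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1
```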
Common Variants
- Climbing stairs (Fibonacci-shaped).
- House robber / robber II (linear / circular).
- Decode ways (transitions depend on a 2-digit window).
- Maximum subarray (Kadane’s).
- Min cost climbing stairs.
Classic Problems
- LeetCode 53 — Maximum Subarray (Kadane’s)
- LeetCode 70 — Climbing Stairs
- LeetCode 91 — Decode Ways
- LeetCode 121 — Best Time to Buy and Sell Stock
- LeetCode 198 — House Robber
- LeetCode 213 — House Robber II
- LeetCode 264 — Ugly Number II
- LeetCode 746 — Min Cost Climbing Stairs
Common Bugs
- Off-by-one in the dp size (n vs n+1).
- Wrong base for empty input.
- Decode ways: forgetting that 0 is not a valid single-digit decoding.
- Compressing to O(1) but updating the rolling variables in the wrong order, so dp[i-2] is overwritten before it is read.
22. 2D DP (grid / unique paths / LCS preview)
Signal Recognition (<2 min)
- The state is a pair (i, j), typically (row, col) or (index_in_A, index_in_B).
- Transitions look at neighboring cells: dp[i-1][j], dp[i][j-1], dp[i-1][j-1].
- Common shapes: grid path counting/min sum, LCS, edit distance, palindrome substring.
Canonical Template
dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
    for j in range(1, m + 1):
        dp[i][j] = combine(dp[i-1][j], dp[i][j-1], dp[i-1][j-1], a[i-1], b[j-1])
return dp[n][m]
Complexity
Time O(NM), space O(NM) — often compressible to O(min(N, M)) by keeping two rows.
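The two-row compression can be sketched on LCS length. The function name lcs_length is chosen for this example; swapping the inputs so the shorter string runs along the row is what gives the O(min(N, M)) space bound.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, keeping only two rows."""
    if len(b) > len(a):
        a, b = b, a                      # shorter string indexes the columns
    prev = [0] * (len(b) + 1)            # row i-1
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)         # row i
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1             # diagonal: dp[i-1][j-1]
            else:
                cur[j] = max(prev[j], cur[j - 1])    # up, left
        prev = cur
    return prev[len(b)]
```

Compression keeps the final value but discards the table, so it does not support reconstructing the subsequence itself.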
Common Variants
- Grid DP: unique paths, min path sum, max path sum.
- Two-string DP: LCS, edit distance, regex matching, distinct subsequences (covered in 24/25).
- Matrix DP: maximal square, dungeon game.
- Backwards-traversal (start from (n, m)) when transitions need future state.
Classic Problems
- LeetCode 62 — Unique Paths
- LeetCode 64 — Minimum Path Sum
- LeetCode 72 — Edit Distance
- LeetCode 174 — Dungeon Game
- LeetCode 221 — Maximal Square
- LeetCode 1143 — Longest Common Subsequence
Common Bugs
- Initializing the first row/column wrong (not the identity element for the combining operator: 0 for sums and counts, +∞ for min, −∞ for max).
- Allocating [[0] * m] * n in Python: all rows alias the same list (top-3 Python DP bug).
- 1D-compression bug: reading the new value when the old one was needed (or vice versa).
- For grids with obstacles: forgetting the obstacle ⇒ 0-paths-here invariant.
23. Knapsack (0/1 + unbounded)
Signal Recognition (<2 min)
- “Pick a subset of items to maximize value subject to a capacity constraint” (0/1 knapsack — each item once).
- “Pick items with repetition allowed” (unbounded knapsack — coin change min coins, integer break).
- “Number of ways to make sum K from given items” (counting variant).
Canonical Template (0/1 Knapsack — Compressed 1D)
dp = [0] * (W + 1)
for v, w in items:
    for c in range(W, w - 1, -1): # reverse to avoid re-using item
        dp[c] = max(dp[c], dp[c - w] + v)
Canonical Template (Unbounded Knapsack)
dp = [0] * (W + 1)
for c in range(1, W + 1): # forward, so each item can be reused
    for v, w in items:
        if c >= w: dp[c] = max(dp[c], dp[c - w] + v)
Complexity
Time O(N · W). Space O(W) (compressed) or O(N · W) (uncompressed).
Common Variants
- 0/1 vs unbounded vs bounded (limited copies of each item).
- Min coins, count-of-ways, can-we-make-this-sum.
- Subset sum (knapsack with value = weight).
- Partition equal subset sum (subset sum to total/2).
Classic Problems
- LeetCode 322 — Coin Change (unbounded, min)
- LeetCode 416 — Partition Equal Subset Sum (0/1, decision)
- LeetCode 474 — Ones and Zeroes (0/1 with two capacities)
- LeetCode 494 — Target Sum (count-of-ways)
- LeetCode 518 — Coin Change II (unbounded, count)
- LeetCode 879 — Profitable Schemes
- LeetCode 1049 — Last Stone Weight II
Common Bugs
- 0/1 with forward inner loop → double-counts items.
- Unbounded with reverse inner loop → behaves like 0/1.
- For “count of ways” with order-insensitive: outer is items, inner is capacity (LC 518). Order-sensitive: opposite (LC 377).
- Forgetting that dp[0] = 1 for count-of-ways, dp[0] = 0 for max-value.
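The loop-order bug for counting is easiest to see side by side. The sketch below assumes positive coin values; the function names are chosen for this example. With items in the outer loop each multiset is counted once (the LC 518 shape); with capacity in the outer loop each ordering is counted separately (the LC 377 shape).

```python
def count_combinations(coins, target):
    """Order-insensitive count: items outer, capacity inner."""
    dp = [1] + [0] * target              # dp[0] = 1: one way to make 0
    for c in coins:
        for s in range(c, target + 1):
            dp[s] += dp[s - c]
    return dp[target]

def count_permutations(coins, target):
    """Order-sensitive count: capacity outer, items inner."""
    dp = [1] + [0] * target
    for s in range(1, target + 1):
        for c in coins:
            if c <= s:
                dp[s] += dp[s - c]
    return dp[target]
```

For coins [1, 2, 3] and target 4 the first returns 4 and the second returns 7; the only difference between the two functions is which loop is outermost.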
24. Subsequence DP (LIS / LCS / edit distance)
Signal Recognition (<2 min)
- “Longest increasing / common / non-decreasing subsequence.”
- “Edit distance / minimum operations to transform A to B.”
- “Distinct subsequences / supersequences.”
- “Longest palindromic subsequence” (it’s LCS of s and s[::-1]).
Canonical Template (LIS, O(N log N))
import bisect
tails = []
for x in a:
    i = bisect.bisect_left(tails, x) # bisect_right for non-decreasing
    if i == len(tails): tails.append(x)
    else: tails[i] = x
return len(tails)
Canonical Template (LCS / Edit Distance — 2D DP)
dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
    for j in range(1, m + 1):
        if a[i-1] == b[j-1]:
            dp[i][j] = dp[i-1][j-1] + 1 # LCS
        else:
            dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Complexity
LIS: O(N log N) (patience-sort) or O(N²) (DP). LCS / edit distance: O(NM).
Common Variants
- LIS, longest non-decreasing, count of LIS, reconstruction.
- LCS, shortest common supersequence (N + M − LCS).
- Edit distance (Levenshtein), with weighted operations.
- Distinct subsequences (count occurrences of T as subsequence of S).
- Longest palindromic subsequence (= LCS of s and reversed s).
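Edit distance uses the same 2D table shape as LCS but with a non-zero base row and column and a min over three neighbors. The following is a sketch with unit-cost insert, delete, and replace; the function name edit_distance is chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b with unit-cost operations."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i                     # delete all i characters of a[:i]
    for j in range(m + 1):
        dp[0][j] = j                     # insert all j characters of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]          # characters match: no cost
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],        # delete a[i-1]
                    dp[i][j - 1],        # insert b[j-1]
                    dp[i - 1][j - 1],    # replace a[i-1] with b[j-1]
                )
    return dp[n][m]
```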
Classic Problems
- LeetCode 72 — Edit Distance
- LeetCode 115 — Distinct Subsequences
- LeetCode 300 — Longest Increasing Subsequence
- LeetCode 354 — Russian Doll Envelopes (LIS in 2D)
- LeetCode 516 — Longest Palindromic Subsequence
- LeetCode 583 — Delete Operation for Two Strings
- LeetCode 673 — Number of LIS
- LeetCode 1143 — Longest Common Subsequence
Common Bugs
- LIS: bisect_left vs bisect_right controls strict vs non-strict; pick the wrong one and ties are mishandled.
- Edit distance: forgetting that the base row is 0..m and the base column is 0..n (dp[i][0] = i deletions, dp[0][j] = j insertions to or from the empty string).
- Reconstruction: walking the dp table backward, easy to off-by-one.
25. String DP (palindrome / partitioning)
Signal Recognition (<2 min)
- “Longest palindromic substring / subsequence.”
- “Minimum cuts to partition into palindromes.”
- “Word break / segment string into dictionary words.”
- “Regex / wildcard matching.”
Canonical Template (Palindrome Substring DP)
n = len(s)
dp = [[False] * n for _ in range(n)]
for i in range(n): dp[i][i] = True
for length in range(2, n + 1):
    for i in range(n - length + 1):
        j = i + length - 1
        if s[i] == s[j] and (length == 2 or dp[i+1][j-1]):
            dp[i][j] = True
Complexity
Most string-DP problems: O(N²) time, O(N²) space (often compressible to O(N)). Manacher (Phase 3) gives O(N) for longest-palindrome.
Common Variants
- Longest palindromic substring (DP, expand-around-center, or Manacher).
- Longest palindromic subsequence (LCS-based).
- Minimum cuts (palindrome partitioning II).
- Word break (boolean DP) and word break II (recover all decompositions).
- Regex / wildcard matching (?, *, .).
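The boolean word-break variant can be sketched as a prefix DP, where dp[i] says whether s[:i] can be segmented. The function name word_break and the max_len cutoff are choices for this example; the cutoff is an optional optimization, not required for correctness.

```python
def word_break(s, words):
    """True if s can be split into a sequence of dictionary words."""
    word_set = set(words)
    max_len = max(map(len, word_set), default=0)
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True                         # the empty prefix is always segmentable
    for i in range(1, n + 1):
        # Only look back as far as the longest dictionary word.
        for j in range(max(0, i - max_len), i):
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break
    return dp[n]
```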
Classic Problems
- LeetCode 5 — Longest Palindromic Substring
- LeetCode 10 — Regular Expression Matching
- LeetCode 44 — Wildcard Matching
- LeetCode 132 — Palindrome Partitioning II
- LeetCode 139 — Word Break
- LeetCode 140 — Word Break II
- LeetCode 516 — Longest Palindromic Subsequence
- LeetCode 647 — Palindromic Substrings
Common Bugs
- Iteration order: filling dp[i][j] requires dp[i+1][j-1] already filled; iterate by length, not by i then j.
- Word break: building all decompositions naively is O(2^N); memoize, but be aware total output can still be exponential.
- Wildcard * matching empty vs many: both transitions needed.
- Off-by-one when j = i + length - 1.
26. Trie (prefix queries / autocomplete preview)
Signal Recognition (<2 min)
- “Many strings, prefix queries” — does any word start with X? Count words starting with X?
- Autocomplete / spell check.
- Word search II (LC 212) — combine trie with backtracking on a grid.
- Maximum XOR pair (LC 421) — bit-level trie.
- Replace words / dictionary lookup.
Canonical Template
class TrieNode:
    __slots__ = ('children', 'end')
    def __init__(self):
        self.children = {}
        self.end = False
class Trie:
    def __init__(self): self.root = TrieNode()
    def insert(self, word):
        node = self.root
        for c in word:
            node = node.children.setdefault(c, TrieNode())
        node.end = True
    def search(self, word):
        node = self._walk(word)
        return node is not None and node.end
    def starts_with(self, prefix):
        return self._walk(prefix) is not None
    def _walk(self, s):
        node = self.root
        for c in s:
            if c not in node.children: return None
            node = node.children[c]
        return node
Complexity
Insert / search / prefix: O(L) per operation. Space O(N · L) worst case (no shared prefixes).
Common Variants
- Character trie (by char).
- Bit trie (by bit) — for XOR / Hamming-distance problems.
- Compressed (radix) trie — Phase 3.
- Trie + DFS for “all words on a board” (LC 212) — early-prune by failing nodes.
- Suffix trie / suffix automaton — Phase 3 / Phase 12.
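The bit-trie variant can be sketched on maximum pairwise XOR (the LC 421 shape). This sketch uses nested dicts as trie nodes and assumes non-negative integers below 2**bits; the function name find_maximum_xor and the bits parameter are choices for this example.

```python
def find_maximum_xor(nums, bits=31):
    """Maximum x ^ y over pairs in nums (assumes 0 <= x < 2**bits)."""
    root = {}
    best = 0
    for x in nums:
        # Insert x, most significant bit first.
        node = root
        for b in range(bits - 1, -1, -1):
            node = node.setdefault((x >> b) & 1, {})
        # Query: at each level, prefer the child with the opposite bit.
        node = root
        cur = 0
        for b in range(bits - 1, -1, -1):
            bit = (x >> b) & 1
            if (1 - bit) in node:
                cur |= 1 << b            # this bit of the XOR can be 1
                node = node[1 - bit]
            else:
                node = node[bit]         # forced to follow the same bit
        best = max(best, cur)
    return best
```

Inserting x before querying guarantees the trie is never empty during a query; the worst it can do is pair x with itself for an XOR of 0.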
Classic Problems
- LeetCode 208 — Implement Trie (Prefix Tree)
- LeetCode 211 — Design Add and Search Words Data Structure
- LeetCode 212 — Word Search II
- LeetCode 336 — Palindrome Pairs
- LeetCode 421 — Maximum XOR of Two Numbers in an Array (bit trie)
- LeetCode 648 — Replace Words
- LeetCode 677 — Map Sum Pairs
- LeetCode 1268 — Search Suggestions System
Common Bugs
- Forgetting the end flag (or whatever marks a complete word): search("app") returns true when only "apple" was inserted.
- Using a 26-element array vs a hashmap: array is faster but only for fixed alphabets.
- Iterating node.children while wrongly assuming a particular order (Python dicts keep insertion order, not sorted order).
- For LC 212, not pruning nodes after they’ve been used (still works correctly but wastes time).
27. Heap Top-K (k-largest / k-frequent / k-closest)
Signal Recognition (<2 min)
- “Find the K largest / smallest / most frequent / closest.”
- Online/streaming K-th element.
- Median maintenance (two heaps).
- “Merge K sorted streams” (pattern 28 — see below).
Canonical Template (Top-K with Min-Heap of Size K)
import heapq
heap = []
for x in stream:
    if len(heap) < K:
        heapq.heappush(heap, x)
    elif x > heap[0]:
        heapq.heapreplace(heap, x)
return heap # K largest, unsorted
Complexity
Time O(N log K). Space O(K). Compare to full sort O(N log N) — beats it when K << N.
Common Variants
- Top K largest / smallest.
- Top K frequent — bucket sort gives O(N) when frequencies fit (LC 347).
- K closest points to origin — heap of K by distance.
- Median from data stream — two heaps (max-heap of low half, min-heap of high half).
- K-th smallest in matrix / K-th smallest in BST — heap or controlled traversal.
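The two-heap median variant can be sketched as follows. The class and method names (MedianFinder, add, median) are choices for this example; since heapq only provides a min-heap, the low half is stored negated. Calling median before any add raises IndexError in this sketch.

```python
import heapq

class MedianFinder:
    """Running median: max-heap of the low half, min-heap of the high half."""

    def __init__(self):
        self.low = []    # max-heap via negated values; holds the smaller half
        self.high = []   # min-heap; holds the larger half

    def add(self, x):
        # Route x through low so that every element in low <= every element in high.
        heapq.heappush(self.low, -x)
        heapq.heappush(self.high, -heapq.heappop(self.low))
        # Rebalance so that len(low) is len(high) or len(high) + 1.
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        if len(self.low) > len(self.high):
            return float(-self.low[0])
        return (-self.low[0] + self.high[0]) / 2
```

Each add costs O(log N) and median is O(1).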
Classic Problems
- LeetCode 215 — Kth Largest Element in an Array
- LeetCode 295 — Find Median from Data Stream
- LeetCode 347 — Top K Frequent Elements
- LeetCode 451 — Sort Characters by Frequency
- LeetCode 692 — Top K Frequent Words
- LeetCode 703 — Kth Largest Element in a Stream
- LeetCode 973 — K Closest Points to Origin
- LeetCode 1046 — Last Stone Weight
Common Bugs
- Using a max-heap for top-K-largest — wrong; use a min-heap of size K (we evict the smallest).
- Java PriorityQueue is min-heap by default; use Comparator.reverseOrder() for max.
- Python heapq is min-heap only; negate values for max-heap.
- For “top K frequent words” with tie-breaking (alphabetical), the comparator gets tricky in Java/Python.
28. K-Way Merge (merge K lists / smallest range covering K lists)
Signal Recognition (<2 min)
- K sorted lists/streams; merge them into one sorted output.
- “Find the smallest range that contains at least one element from each of K lists.”
- “Find the K-th smallest in K sorted lists / matrix.”
- External-merge / external-sort flavor.
Canonical Template (Merge K Sorted Lists)
import heapq
heap = []
for i, lst in enumerate(lists):
    if lst: heapq.heappush(heap, (lst[0].val, i, lst[0]))
dummy = ListNode(0); tail = dummy
while heap:
    val, i, node = heapq.heappop(heap)
    tail.next = node; tail = tail.next
    if node.next:
        heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
Complexity
Time O(N log K) where N is the total number of elements. Space O(K) for the heap.
Common Variants
- Merge K sorted lists / arrays / streams.
- Smallest range covering at least one from each list — heap holds one element per list, track current max; pop the min and advance the popped list.
- K-th smallest in sorted matrix — heap of (value, row, col); pop, push next-in-row (or use binary search on answer instead).
- Find smallest pair sums (LC 373) — heap from two sorted lists.
- Skyline problem (LC 218) — sweep over events with a heap of active heights.
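The smallest-range variant can be sketched with one heap entry per list plus a running maximum. This sketch assumes a non-empty collection of non-empty sorted lists; the function name smallest_range is chosen for this example.

```python
import heapq

def smallest_range(lists):
    """Smallest [lo, hi] containing at least one element from each sorted list.

    Assumes lists is non-empty and every inner list is non-empty and sorted.
    """
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists)]   # (value, list, index)
    heapq.heapify(heap)
    cur_max = max(lst[0] for lst in lists)
    best = (heap[0][0], cur_max)
    while True:
        val, i, j = heapq.heappop(heap)          # current minimum across lists
        if cur_max - val < best[1] - best[0]:
            best = (val, cur_max)
        if j + 1 == len(lists[i]):
            # This list is exhausted: no later window can cover it.
            return list(best)
        nxt = lists[i][j + 1]
        cur_max = max(cur_max, nxt)              # O(1) max maintenance
        heapq.heappush(heap, (nxt, i, j + 1))
```

The strict comparison keeps the earliest window on ties, which is also the one with the smaller left endpoint because popped minimums never decrease.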
Classic Problems
- LeetCode 23 — Merge K Sorted Lists
- LeetCode 218 — The Skyline Problem
- LeetCode 264 — Ugly Number II
- LeetCode 313 — Super Ugly Number
- LeetCode 373 — Find K Pairs with Smallest Sums
- LeetCode 378 — Kth Smallest Element in a Sorted Matrix
- LeetCode 632 — Smallest Range Covering Elements from K Lists
- LeetCode 1675 — Minimize Deviation in Array
Common Bugs
- Heap items must include a tiebreaker: comparing ListNode directly raises TypeError in Python.
- Forgetting to push the next element in the same list after popping.
- For “smallest range”: confusing max so far (cheap to maintain) with re-scanning the heap (O(K)).
- Off-by-one when one list is exhausted before others.
Pattern Recognition Cheat Sheet (Signal → Pattern)
This is the table you should be able to traverse, top-to-bottom, in <60 seconds for any new Medium.
| Signal in problem statement | Likely pattern(s) | First template to try |
|---|---|---|
| Sorted input + pair/triplet sum | Two pointers (1) | opposite-ends two-pointer |
| In-place removal / partition | Two pointers (1) | read/write pointer |
| Subarray with property over contiguous elements | Sliding window (2) or prefix sum (3) | shrink-while-violation |
| Max/min of every window K | Monotonic queue (10) | deque indices |
| Subarray sum equals K / divisible K | Prefix sum + hash (3, 5) | prefix + complement map |
| Many range updates then final state | Difference array (4) | diff + prefix |
| “Find pair summing to target” | Hash complement (5) | seen[target − x] |
| “Group by canonical form” | Hashing — grouping (5) | dict[canonical] → list |
| “Maximum non-overlapping …” | Sort + greedy (6, 11) | sort by end, sweep |
| “Number of meeting rooms” | Intervals — sweep (11) | events, +1/−1 |
| Sorted, find element / first ≥ X | Binary search on index (7) | lower_bound |
| “Min capacity / time / speed s.t. P” | Binary search on answer (8) | binary search + feasible() |
| “Next greater / span / histogram” | Monotonic stack (9) | strictly-decreasing stack |
| Linked-list reverse / cycle / merge | Linked-list patterns (12) | dummy + 3-pointer |
| Tree value combined from subtrees | Tree DFS postorder (13) | recursive combine |
| Tree level-by-level | Tree BFS (14) | queue, capture size |
| Graph “connected components” | Graph DFS (15) | visited + DFS |
| Shortest path on unweighted graph | Graph BFS (16) | distances + queue |
| Shortest path with weights ∈ {0,1} | 0-1 BFS (16) | deque, push-front 0 |
| “Order tasks given deps” | Topological sort (17) | Kahn’s BFS |
| “Connectivity with online unions” | Union-find (18) | DSU with path compression |
| “Kruskal MST / spanning tree” | Union-find (18) | sort edges + DSU |
| “All subsets / permutations” | Backtracking (19) | recurse + undo |
| “Constraint satisfaction (N-queens)” | Backtracking (19) | bitmask state |
| “Min/max ways with overlapping subproblems” | DP (20) | memoize state |
| “Single-index recurrence” | 1D DP (21) | dp[i] from dp[i-1..i-3] |
| “Two-index recurrence” | 2D DP (22) | dp[i][j] from neighbors |
| “Pick subset under capacity” | 0/1 knapsack (23) | reverse inner loop |
| “Pick with repetition” | Unbounded knapsack (23) | forward inner loop |
| “LIS / LCS / edit distance” | Subsequence DP (24) | 2D dp or patience sort |
| “Longest palindromic *” / “min cuts” | String DP (25) | palindrome dp + outer loop |
| “Many strings, prefix queries” | Trie (26) | trie + insert/search |
| “K largest/smallest/closest” | Heap top-K (27) | min-heap of size K |
| “Merge K sorted …” | K-way merge (28) | heap of one-per-list |
When two patterns plausibly fit, try both signals on a small example. Often one fits cleanly and the other forces awkward special cases.
Mastery Checklist For This Phase
You are ready to leave Phase 2 when, for every one of the 28 patterns:
- You can recognize the signal in <2 minutes on a fresh Medium.
- You can write the canonical template from memory in <5 minutes, without lookup.
- You can articulate the time/space complexity precisely, including amortized vs worst case.
- You can name 4+ classic LeetCode problems where this pattern is the intended solution.
- You can list at least 2 common bugs that the pattern is famous for.
- You have solved at least 5 problems applying this pattern, with at least 2 reviewed via REVIEW_TEMPLATE.md.
And, more generally:
- You can produce the cheat-sheet table above from scratch (or close to it) on a whiteboard.
- You can name, given a signal, the most likely pattern plus the second-most-likely (because tricky problems disguise themselves).
- You can combine two patterns when one alone is insufficient (e.g., monotonic deque over prefix sums for LC 862; trie inside backtracking for LC 212).
- You have run mock interviews on Mediums and your time-to-recognize is reliably under 2 minutes.
Required Problem Volume
The patterns are not learned from this README. They are learned from solving lots of problems per pattern and reviewing each via REVIEW_TEMPLATE.md, then revisiting via SPACED_REPETITION.md.
Recommended minimums for Phase 2 completion (per pattern):
- 5–8 Medium problems that explicitly use the pattern as their intended solution.
- 2 mock-interview Mediums where the pattern is not hinted (you must recognize it).
- 1 problem at the boundary where two patterns plausibly apply — pick one, justify, solve.
Total over the phase: ~150–200 Mediums. This is the cost. The benefit is that, after Phase 2, almost every Medium becomes a 5-minute problem.
Exit Criteria
You may move to Phase 3 — Advanced Data Structures when:
- You score ≥ 75% on a 10-problem random-Medium mock (35 min each, no hints, no lookup), with the pattern recognition and template write-up completed in the first 5 minutes of each problem.
- You can pass the READINESS_CHECKLIST.md entries for “pattern recognition” without lookup.
- You have completed all 9 labs in labs/ with the format-required deliverables.
- You have at least 40 problems in your spaced-repetition queue from this phase, with first reviews passed.
If your unaided solve rate on random Mediums is below 75%, do not advance. Spend another 1–2 weeks at this level, focusing on the patterns you missed. The patterns calcify here. Calcify wrong patterns and Phase 3+ becomes a fight against your own intuition.
How This Phase Connects To The Rest
- Phase 0 / Phase 1 are prerequisites. You cannot recognize patterns if you cannot read the problem and you do not know the data structures.
- Phase 3 (Advanced DS) generalizes patterns 9, 10, 18, 26, 27, 28 with segment trees, BITs, balanced BSTs, suffix arrays.
- Phase 4 (Graphs) generalizes patterns 15, 16, 17, 18 with Dijkstra, Bellman-Ford, SCC, max flow.
- Phase 5 (DP) generalizes patterns 20–25 with bitmask, interval, tree, digit, probability DP.
- Phase 6 (Greedy) formalizes the proof techniques behind pattern 6 (sort + greedy).
- Phase 9 (Language/Runtime) drills the language-specific footguns called out in each pattern’s “Common Bugs”.
You will return to this README dozens of times over the rest of the curriculum — each return faster than the last, until eventually the patterns are no longer something you look up but something you simply see.
Lab 01 — Two Pointers: 3Sum (Deep Duplicate Handling)
Goal
Master the opposite-ends two-pointer pattern on a sorted array, with the specific discipline required for correct duplicate suppression. Deliverable solves 3Sum in O(N²) time, O(1) extra space, returning all unique triplets — and you can articulate, line by line, why each duplicate-skip is needed and what bug it prevents.
Background Concepts
Sorting as a precondition for two-pointer; loop invariants on (l, r); duplicate suppression at three loci (the outer i, the inner l, the inner r); the relationship between 3Sum and 2Sum-on-sorted; why hash-set deduplication is the wrong tool here. Review pattern Two Pointers and pattern Hashing.
Interview Context
3Sum is one of the most frequently asked problems in Big Tech onsite loops; it appears at Meta, Amazon, Bloomberg, and Apple. The signal is whether you can articulate the duplicate-handling logic before you code, not retroactively. Naive candidates generate candidate triplets in O(N²) and dedup via a hash set (set(tuple(sorted(triplet)))), which works but signals weak invariant reasoning. Strong candidates handle duplicates inside the two-pointer loop with three skip-clauses and explain each one’s purpose. Elite candidates also address the trade-off vs the hash-based 3Sum (when the input is unsorted and you can’t sort).
Problem Statement
Given an integer array nums, return all unique triplets [nums[i], nums[j], nums[k]] such that i, j, k are distinct indices and nums[i] + nums[j] + nums[k] == 0. The result must not contain duplicate triplets.
Constraints
- 3 ≤ N ≤ 3000
- -10^5 ≤ nums[i] ≤ 10^5
- Output is a list of triplets in any order; each triplet’s internal order doesn’t matter.
- Triplets are deduplicated by value, not by index: [-1, 0, 1] from indices (0, 2, 4) and from (1, 3, 5) count once.
Clarifying Questions
- Should [-1, 0, 1] and [0, 1, -1] count as different triplets? (No: they are the same triplet by value.)
- Are duplicates in the input array allowed? (Yes; many test cases hinge on this.)
- Can the input be empty / size < 3? (Per constraints, N ≥ 3, but mention you’d guard.)
- Can values exceed 32-bit range when summed? (Per constraints, max |sum| ≤ 3·10^5, safe in 32-bit, but you should mention overflow as a habit.)
- Is there a strict no-extra-space requirement, or is the output allocation OK?
Examples
| Input | Output |
|---|---|
| [-1,0,1,2,-1,-4] | [[-1,-1,2],[-1,0,1]] |
| [0,0,0,0] | [[0,0,0]] |
| [1,2,-2,-1] | [] |
| [] | [] |
| [0] | [] |
Initial Brute Force
Three nested loops; check nums[i] + nums[j] + nums[k] == 0; dedup with a hash set of sorted triplets.
def three_sum_brute(nums):
    out = set()
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if nums[i] + nums[j] + nums[k] == 0:
                    out.add(tuple(sorted((nums[i], nums[j], nums[k]))))
    return [list(t) for t in out]
Brute Force Complexity
Time O(N³). Space O(N²) worst case for out (number of unique triplets). At N = 3000 the loops visit C(3000, 3) ≈ 4.5×10^9 index triples (N³ ≈ 2.7×10^10 is the loose upper bound), far too slow for any realistic interview or judge time limit.
Optimization Path
The sub-problem after fixing nums[i] is 2Sum on the remaining sorted array with target -nums[i]. 2Sum-on-sorted is O(N) via opposite-ends two-pointer. Total: O(N) outer × O(N) inner = O(N²).
Why sort? Sortedness gives 2Sum-on-sorted an O(N) two-pointer algorithm; without sort, 2Sum is O(N) via hash but combining hash-2Sum with outer triplet enumeration makes duplicate-handling much trickier.
Final Expected Approach
def three_sum(nums):
    nums.sort()
    n = len(nums)
    out = []
    for i in range(n - 2):
        if nums[i] > 0: break # early exit
        if i > 0 and nums[i] == nums[i - 1]: continue # skip i-duplicate
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                out.append([nums[i], nums[l], nums[r]])
                l += 1; r -= 1
                while l < r and nums[l] == nums[l - 1]: l += 1 # skip l-duplicate
                while l < r and nums[r] == nums[r + 1]: r -= 1 # skip r-duplicate
    return out
Data Structures Used
- The input array, sorted in place.
- An output list of triplets.
- Three integer indices (i, l, r).
No hash structures; no auxiliary lists beyond output.
Correctness Argument
After sorting, fix i. The remaining array nums[i+1..n-1] is sorted, and we run the canonical opposite-ends 2Sum: when the sum is too small we advance l, when too large we retreat r, when equal we record and advance both. The standard 2Sum-on-sorted invariant proves that every value pair (nums[l], nums[r]) with l < r and nums[l] + nums[r] == target is found exactly once, in sorted order.
For duplicates: the three skip-clauses ensure that each distinct triplet by value is recorded exactly once.
- if i > 0 and nums[i] == nums[i-1]: continue. The previous outer iteration already enumerated all triplets with first element equal to nums[i]. Without this, [-1,-1,0,1,2] would record [-1,0,1] twice (once for each -1 as the outer pivot); [-1,-1,2] is still recorded only once, because the second -1 has no -1 to its right.
- while l < r and nums[l] == nums[l-1]: l += 1 (after recording). If multiple nums[l] values exist within (i, r), they all pair with the same nums[r] to give the same triplet. Skip them.
- while l < r and nums[r] == nums[r+1]: r -= 1 (symmetric for the right pointer). Same reasoning.
The early exit if nums[i] > 0: break is correct because once the smallest element of the triplet is positive, no triplet sums to zero (the array is sorted).
Complexity
Time O(N²): O(N log N) sort + O(N²) total work in the nested two-pointer (each l and r move monotonically over a window of size O(N), and the outer i runs N times). Space O(1) extra (excluding output and the sort’s O(log N) recursion stack).
Implementation Requirements
- Sort in place (do not allocate a sorted copy).
- Skip duplicates inside the loop, not via a hash set on output.
- Use the early-exit if nums[i] > 0: break (small but real speedup for typical inputs).
- Move both pointers on a match, then run the skip loops (not before, or you’ll never advance off the matched element).
- Return a List<List<Integer>> / list[list[int]], not a set, not tuples.
Tests
- Smoke: [-1,0,1,2,-1,-4] → [[-1,-1,2],[-1,0,1]].
- Unit: all-zeros ([0,0,0,0] → [[0,0,0]]); no triplets ([1,2,3] → []); minimum size ([0,0,0] → [[0,0,0]]).
- Edge: size 0 / 1 / 2 → []; all-negative; all-positive (sum > 0 from index 0 → break immediately, return []).
- Large: N = 3000, values in [-10^5, 10^5]; assert sub-second; verify count against the brute force on a 100-element prefix.
- Random: generate 50 random inputs of size ≤ 200; compare against the brute force solution as oracle.
- Adversarial: [0]*3000 (all zeros, exactly one triplet [0,0,0]); long run of duplicates of a single value.
- Invalid: non-integer / null input is out of scope per constraints, but mention you’d validate at the boundary.
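The random and smoke tests above can be sketched as a small harness. This is an illustrative setup, not a prescribed one: the helper names canonical and cross_check are made up here, three_sum sorts a copy so the harness can reuse its input, and sizes are capped at 40 (smaller than the ≤ 200 suggested above) so the cubic oracle stays fast.

```python
import random


def three_sum(nums):
    """Two-pointer 3Sum; sorts a copy so the caller's list is left untouched."""
    nums = sorted(nums)
    n, out = len(nums), []
    for i in range(n - 2):
        if nums[i] > 0:
            break
        if i > 0 and nums[i] == nums[i - 1]:
            continue
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                out.append([nums[i], nums[l], nums[r]])
                l += 1
                r -= 1
                while l < r and nums[l] == nums[l - 1]:
                    l += 1
                while l < r and nums[r] == nums[r + 1]:
                    r -= 1
    return out


def three_sum_brute(nums):
    """Cubic oracle: every index triple, deduplicated by sorted value tuple."""
    n, found = len(nums), set()
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if nums[i] + nums[j] + nums[k] == 0:
                    found.add(tuple(sorted((nums[i], nums[j], nums[k]))))
    return [list(t) for t in found]


def canonical(triplets):
    """Order-insensitive form: sort inside each triplet, then sort the list."""
    return sorted(sorted(t) for t in triplets)


def cross_check(trials=50, max_size=40, seed=0):
    """Return the first input on which the two solutions disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(0, max_size))]
        if canonical(three_sum(nums)) != canonical(three_sum_brute(nums)):
            return nums
    return None
```

A small value range such as [-10, 10] is deliberate: it forces many duplicates, which is where the skip clauses are exercised.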
Follow-up Questions
- “What about kSum?” → recurse: kSum(nums, target, k) calls (k-1)Sum(remaining, target - nums[i]), base case is 2Sum-sorted. Time O(N^(k-1)).
- “What if the array is unsorted and you cannot sort it?” → 2Sum-with-hash inside an outer enumeration; duplicate handling becomes per-output-set deduplication, more memory.
- “What if values repeat extremely (e.g., 99% zeros)?” → the duplicate skips keep the O(N²) worst-case bound, and in practice each outer iteration over a repeated value is O(1), so you’d see a near-O(N) effective runtime (after the sort) on that adversarial input.
- “Can you do better than O(N²)?” → not by a polynomial factor, as far as anyone knows: the 3SUM conjecture states that no O(N^(2-ε)) algorithm exists for any ε > 0, and only polylogarithmic-factor improvements over N² are known. This is a conjecture, not a proven lower bound. A truly subquadratic 3Sum would imply subquadratic algorithms for many problems in computational geometry.
- “What about returning closest triplet to a target?” → 3SumClosest variant; same skeleton, track the best |s - target| seen.
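The kSum recursion from the first follow-up can be sketched as below. This is one possible shape under stated assumptions: the name k_sum, the inner helper rec, and the choice to sort a copy are illustrative, and the base case is the same two-pointer 2Sum with duplicate skips.

```python
def k_sum(nums, target, k):
    """Return all distinct value k-tuples (as sorted lists) summing to target."""
    nums = sorted(nums)  # work on a sorted copy
    n = len(nums)
    out = []

    def rec(start, k, target, prefix):
        if k == 2:
            # Base case: opposite-ends 2Sum on nums[start:].
            l, r = start, n - 1
            while l < r:
                s = nums[l] + nums[r]
                if s < target:
                    l += 1
                elif s > target:
                    r -= 1
                else:
                    out.append(prefix + [nums[l], nums[r]])
                    l += 1
                    r -= 1
                    while l < r and nums[l] == nums[l - 1]:
                        l += 1
                    while l < r and nums[r] == nums[r + 1]:
                        r -= 1
            return
        # Fix one element, skip equal values at this depth, recurse on the rest.
        for i in range(start, n - k + 1):
            if i > start and nums[i] == nums[i - 1]:
                continue
            rec(i + 1, k - 1, target - nums[i], prefix + [nums[i]])

    if k >= 2:
        rec(0, k, target, [])
    return out
```

Note the duplicate check is i > start rather than i > 0: at deeper levels the first candidate may legitimately equal the element fixed one level up.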
Product Extension
Detecting fraud rings of size 3 in a transaction graph where the signed sum of three transactions must net to zero (cancel out), under the constraint that the three transactions have distinct IDs. The same skeleton (fix one transaction, two-pointer the rest by signed amount) works, with the wrinkle that “duplicate” must be defined carefully: transactions are distinct by ID even if amounts are equal, so the duplicate skipping is replaced by a per-amount enumeration that emits all distinct ID combinations summing to zero.
Language/Runtime Follow-ups
- Python: nums.sort() is in place. Use out.append([nums[i], nums[l], nums[r]]), not out.append((..)), if a list-of-lists is required (LC accepts either, but the contract is list).
- Java: sort with Arrays.sort(nums); note this is dual-pivot quicksort for primitives, average O(N log N). For Integer[] it’s TimSort. Use List<List<Integer>> and add Arrays.asList(a, b, c) per triplet; Arrays.asList returns a fixed-size list, which is fine because we never mutate it.
- Go: sort.Ints(nums) sorts in place; the rest is plain index arithmetic.
- C++: std::sort(nums.begin(), nums.end()). Use std::vector<std::vector<int>> for output. Beware: integer addition nums[i] + nums[l] + nums[r] can overflow 32-bit if value bounds were larger; with the given constraints it’s safe, but as a habit, use long long.
- JS/TS: Array.prototype.sort() defaults to lexicographic comparison; this is a top-3 JS interview bug. Use nums.sort((a, b) => a - b). JS numbers are 64-bit floats, so a - b does not wrap around the way a 32-bit int comparator can in Java or C++; the subtraction comparator is safe for finite values.
- Adversarial: the O(N log N) sort is dominated by the O(N²) scan, so it only affects the constant; on already-sorted input an adaptive sort such as TimSort finishes in O(N).
Common Bugs
- Forgetting the nums[i] == nums[i-1] skip on outer: produces duplicate output triplets like [-1,0,1] repeated.
- Forgetting to advance pointers on match: l += 1; r -= 1 must come before the inner duplicate-skip loops. If the skips run first, they compare the matched elements against neighbours outside the window (nums[l-1] may be nums[i], and nums[r+1] is out of bounds when r == n - 1); if the pointers are never advanced at all, the while l < r loop never terminates.
- Using <= instead of < in while l < r: l == r would mean the same index appearing as both pointers, which is invalid (i, j, k must be distinct indices).
- JS lexicographic sort: [-1, -1, 2, -4, 0, 1] after default sort is [-1, -1, -4, 0, 1, 2] (string-sorted). Always pass a numeric comparator.
- Missing 32-bit overflow in C++/Java/Go when constraints allow large values. With LC-3Sum’s constraints it doesn’t bite, but the habit costs nothing and saves you on related problems (4Sum, kSum-with-target).
- Hash-set deduplication on output: works, but signals you didn’t internalize the invariant. Time still O(N²) but space O(N²) instead of O(1).
- Sorting twice by accident (once explicitly, once implicit in a downstream API): innocuous but signals carelessness.
Debugging Strategy
- Run on [0,0,0,0] first: it should return [[0,0,0]]. If you get [[0,0,0],[0,0,0]], your outer-skip is broken. If you get [], your inner loop never matches (likely l < r typo).
- Run on [-2,0,0,2,2]. Expected: [[-2,0,2]]. If you get [[-2,0,2],[-2,0,2]], both inner skips are missing or broken (either one alone is enough to suppress this duplicate).
- Diff against the brute force on 50 random inputs of size 50. The answers should match as sets (order of triplets and inner order doesn’t matter; sort each triplet and the list of triplets to compare).
- Time a 3000-element random input. Should run in well under a second in Python; under 50ms in C++ / Java / Go.
Mastery Criteria
- Recognized the signal “sorted, find triplet summing to target” as 3Sum within 30 seconds of reading.
- Articulated the three skip-clauses before writing them.
- Wrote a correct implementation in under 8 minutes, no lookup.
- Passed the all-zeros, single-duplicate, and large-N tests on the first attempt.
- Stated the conditional Ω(N²) lower bound when asked “can you do better?”.
- Identified the JS lexicographic-sort trap (or its language equivalent) without prompting.
- Generalized verbally to kSum and to 3SumClosest.
Lab 02 — Sliding Window: Longest Substring With At Most K Distinct Characters
Goal
Master the variable-size sliding window with a frequency map and a “shrink while violation” invariant. Deliverable solves the problem in O(N) time, O(K) extra space, with the loop invariant articulated explicitly: at the end of every iteration, s[l..r] contains at most K distinct characters.
Background Concepts
The shrink-while-violation skeleton; using a count of distinct items vs a full hashmap traversal; the difference between “at most K” and “exactly K” (the atMost(K) - atMost(K-1) trick); when fixed-size and variable-size windows apply. Review pattern Sliding Window.
Interview Context
This problem (LeetCode 340) is a Google / Meta favorite, and a near-direct ancestor of LeetCode 76 (Minimum Window Substring), 159 (At Most Two Distinct), 992 (Subarrays With K Different Integers), and 424 (Longest Repeating Character Replacement). The interview signal is whether you maintain the invariant cleanly: a single while violation: loop that shrinks until valid, then unconditional answer-update. Weak candidates write nested if-statements with off-by-one errors. Strong candidates write the canonical 5-line shrink-and-update.
Problem Statement
Given a string s and an integer K, return the length of the longest substring of s that contains at most K distinct characters.
Constraints
- 1 ≤ |s| ≤ 5 × 10^4
- 0 ≤ K ≤ |s|
- s consists of arbitrary characters (Unicode in some variants; ASCII in the standard variant).
Clarifying Questions
- Is the alphabet ASCII or Unicode? (If Unicode, hashmap; if ASCII-lowercase, an int[26].)
- What if K == 0? (Empty substring is valid; answer is 0.)
- What if K >= number of distinct chars in s? (Answer is len(s).)
- Is the answer the length or the substring itself? (LC asks length; mention you can record (start, length) for the substring.)
- Can s be empty? (Per constraints, |s| ≥ 1, but the function should return 0 on empty input.)
Examples
| Input | Output | Note |
|---|---|---|
| s="eceba", K=2 | 3 | “ece” |
| s="aa", K=1 | 2 | “aa” |
| s="abcabc", K=2 | 2 | “ab” (every window of length 3 contains a, b, and c) |
| s="a", K=0 | 0 | empty |
| s="abcdef", K=10 | 6 | full string |
Initial Brute Force
Enumerate all substrings, check distinct-count, track max.
def longest_brute(s, K):
    best = 0
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) <= K:
                best = max(best, j - i + 1)
            else:
                break
    return best
The inner loop breaks on the first violation, so each start index scans only as far as its longest valid window; the worst case is still O(N²) (when K is large enough that no break happens).
Brute Force Complexity
Time O(N²) worst case, with O(1) average set operations; space O(min(N, |alphabet|)) for the set. At N=5×10^4, N² ≈ 2.5×10^9, which is too slow for typical time limits.
Optimization Path
Observation: as r advances, the set of distinct characters in s[l..r] is monotone non-decreasing; as l advances (keeping r fixed), it is monotone non-increasing. So we can use a two-pointer / sliding window: advance r, and while the window has more than K distinct chars, advance l. Each character enters and leaves the window at most once, total O(N).
Use a frequency map freq[c] keyed by character; distinct is the count of characters with freq[c] > 0. Increment distinct when a key first reaches 1; decrement when a key drops to 0.
Final Expected Approach
def longest_at_most_k_distinct(s, K):
    if K == 0: return 0
    freq = {}
    l = 0
    best = 0
    for r, c in enumerate(s):
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > K:
            freq[s[l]] -= 1
            if freq[s[l]] == 0:
                del freq[s[l]]
            l += 1
        best = max(best, r - l + 1)
    return best
We use len(freq) directly as the distinct count, since we delete keys that hit zero.
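For comparison, here is a sketch of the explicit distinct-counter variant from the optimization path, using a fixed array instead of a dict. It assumes the input is lowercase ASCII a-z only; the function name is illustrative.

```python
def longest_at_most_k_distinct_lower(s, K):
    """Same window, but counts live in a 26-slot list (assumes s is a-z only)."""
    counts = [0] * 26
    distinct = 0  # number of slots with counts[idx] > 0
    l = 0
    best = 0
    for r, c in enumerate(s):
        idx = ord(c) - ord("a")
        if counts[idx] == 0:
            distinct += 1  # this character enters the window for the first time
        counts[idx] += 1
        while distinct > K:
            left = ord(s[l]) - ord("a")
            counts[left] -= 1
            if counts[left] == 0:
                distinct -= 1  # last copy of this character left the window
            l += 1
        best = max(best, r - l + 1)
    return best
```

With the counter maintained this way, K == 0 needs no special case: the shrink loop empties the window and r - l + 1 evaluates to 0.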
Data Structures Used
- A hashmap freq: char → int of size at most K+1 during the violation, ≤ K otherwise.
- Two integer pointers l, r.
- A running maximum best.
Correctness Argument
Invariant: at the start of each iteration of the outer loop and after the inner shrink, freq contains exactly the characters of s[l..r] with their counts, and len(freq) ≤ K.
Base: before iteration 0, l = 0, freq = {}, len(freq) = 0 ≤ K. ✓
Step: we add s[r] to freq. If this brings len(freq) > K, we shrink: decrement freq[s[l]], delete on zero, advance l. The shrink loop terminates with l ≤ r: each shrink step removes one character from the window, and after at most r - l shrinks the window is the single character s[r], so len(freq) = 1 ≤ K (K ≥ 1 is guaranteed by the K == 0 guard).
Optimality (max): for each r, the smallest l such that the window has ≤ K distinct is recorded; this gives the longest valid window ending at r. Taking the max over all r gives the global longest.
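Why while, not if: a single shrink step is not always enough to restore the invariant, because removing s[l] lowers the distinct count only when its frequency drops to zero. With if, the returned length still happens to be correct for this problem (the window then never shrinks, and it only grows when it is valid), but the window itself can be invalid, so the invariant above no longer holds. In cousin problems (LC 76) the shrink must run repeatedly for the answer to be right. Always default to while; pay the (zero) cost of generality.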
Complexity
Time O(N): each character is added once, removed at most once. Each hash op is O(1) average. Space O(min(N, K+1)) for the frequency map.
Implementation Requirements
- Use len(freq) (or maintain a distinct counter); do not iterate the map to count.
- Delete keys that hit zero, or your len(freq) will be wrong.
- Update best after the shrink, not inside it. (Inside the shrink, the window is invalid.)
- For ASCII-lowercase, an int[26] plus a separate distinct counter is faster than a hashmap (no hashing constant).
- Guard K == 0 (or K < 0) at the top.
Tests
- Smoke: ("eceba", 2) → 3, ("aa", 1) → 2.
- Unit: K = 0 → 0; K ≥ |distinct(s)| → |s|; single-character string.
- Edge: s = "" → 0; K = 1, s = "abcdef" → 1; K = |s| → |s|.
- Large: |s| = 5 × 10^4, alphabet 26, K = 3; should run in well under 100ms.
- Random: generate 100 random strings of length ≤ 200, alphabet of varying size, varying K; cross-check against the brute force.
- Adversarial: all-same-character ("aaaa...", K=1 → N); strictly increasing alphabet ("abcdef...", K → answer is K).
- Unicode follow-up: make sure your Java/JS code iterates by codepoint, not by char, if the spec extends to Unicode.
Follow-up Questions
- “Now solve exactly K distinct.” → atMost(K) - atMost(K-1) for the count variant; for the longest-with-exactly-K, run the same sliding window but only record best when len(freq) == K (not ≤ K).
- “Now solve LC 76 (Minimum Window Substring).” → same skeleton, but the violation is “window does not yet contain all required chars”; we grow until satisfied, then shrink while still satisfied, recording the minimum each time we’re satisfied.
- “What if the string is streamed and only r advances?” → two-pointer doesn’t apply directly; you’d need an order-preserving structure. (Out of scope here.)
- “What if K can change with each query?” → preprocess differently; this problem is not amenable to a single offline sliding window for many K values.
- “What if the input is very large, alphabet huge, but s only contains a few distinct chars?” → no change; the hashmap is bounded by min(K+1, |distinct(s)|).
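The atMost(K) - atMost(K-1) trick from the first follow-up can be sketched for the counting variant (the LC 992 shape). The function names are illustrative; the code works on any indexable sequence of hashable items, so both strings and integer lists are accepted.

```python
from collections import defaultdict


def count_at_most_k_distinct(seq, K):
    """Number of non-empty contiguous windows of seq with at most K distinct items."""
    if K < 0:
        return 0  # guard: the shrink loop below assumes K >= 0
    freq = defaultdict(int)
    l = 0
    total = 0
    for r, item in enumerate(seq):
        freq[item] += 1
        while len(freq) > K:
            freq[seq[l]] -= 1
            if freq[seq[l]] == 0:
                del freq[seq[l]]
            l += 1
        # Every window ending at r and starting in [l, r] is valid.
        total += r - l + 1
    return total


def count_exactly_k_distinct(seq, K):
    """Windows with exactly K distinct = (at most K) minus (at most K - 1)."""
    return count_at_most_k_distinct(seq, K) - count_at_most_k_distinct(seq, K - 1)
```

The subtraction works because "at most K" windows split cleanly into "exactly K" and "at most K - 1", with no overlap.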
Product Extension
A real-time content-moderation system tracks the longest run of messages in a chat where at most K distinct emojis are used (an indicator of spam-bot activity, which tends to use a small bag of emojis on repeat). The sliding window updates per message in O(1) amortized. The same skeleton applies to “longest session window with at most K distinct user-agents” for fraud detection, and “longest range of cells with at most K distinct values” for spreadsheet anomaly detection.
Language/Runtime Follow-ups
- Python: dict overhead is real; for ASCII-lowercase, [0]*26 plus a distinct int avoids hashing and can be faster (measure it in CPython before relying on it). Use s = list(s) only if you need indexing speed (string indexing is already O(1)).
- Java: HashMap<Character, Integer> boxes keys and values. For ASCII, use int[128] and track distinct manually. The chars() method exists, but s.charAt(i) is fine.
- Go: map[byte]int for ASCII; map[rune]int for Unicode. range s on a string yields (byte_index, rune); this is a top-3 Go string trap: for i, c := range s does not give you i as character index for multi-byte chars.
- C++: std::unordered_map<char, int> works; for ASCII, std::array<int, 128> is faster.
- JS/TS: Map<string, number> works. Iterating for (const c of s) yields codepoints, not UTF-16 code units, which matters if Unicode is in scope. Otherwise, s[i] works for ASCII.
- Unicode caveat: “distinct character” might mean codepoint, grapheme cluster, or UTF-16 code unit; clarify with the interviewer.
Common Bugs
- Updating best inside the shrink loop: records invalid windows, returns wrong answers when the only valid window length is small.
- Forgetting to delete zero-count keys: len(freq) becomes stale, breaks the violation check. Equivalent bug: maintaining a separate distinct counter and forgetting to decrement when a count hits zero.
- Using if instead of while: works for this problem but breaks for LC 76 and friends. Build the habit of while.
- Off-by-one in r - l + 1: both r and l are inclusive, so the window length is r - l + 1.
- Java boxing in HashMap<Character, Integer>: autoboxing tax is ~3× over the primitive int[] approach for ASCII.
- Go string range bug: iterating with for i := 0; i < len(s); i++ and indexing as s[i] gives bytes, not runes; for UTF-8 data this misclassifies multi-byte chars as multiple distinct ones.
- JS UTF-16 surrogate pair bug: s.length for "😀abc" is 5 (two surrogate halves + 3 ASCII), and s[0], s[1] are the surrogate halves, not the emoji.
Debugging Strategy
- Trace ("eceba", 2) by hand. Window evolution: e | ec | ece | eb | ba. When b enters, the window eceb has 3 distinct characters (e, c, b); the shrink removes e at index 0 (e still has count 1, so there are still 3 distinct), then c at index 1, leaving eb. When a enters, the shrink removes e at index 2, leaving ba. best stays 3. If your trace doesn’t match, your shrink logic is wrong.
- Diff against the brute force on 50 random inputs.
- Print (l, r, len(freq), best) per iteration; len(freq) should never exceed K after the shrink completes.
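The debugging steps above can be sketched as two small helpers. The names trace_window and diff_against_brute are illustrative, and the random-input sizes (length ≤ 30, alphabet "abcd") are arbitrary choices that keep the quadratic oracle fast while still forcing many shrinks.

```python
import random


def longest_brute(s, K):
    """Quadratic oracle: extend each start until the distinct count exceeds K."""
    best = 0
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) > K:
                break
            best = max(best, j - i + 1)
    return best


def trace_window(s, K):
    """Run the sliding window and return one (l, r, window, best) row per r."""
    freq, l, best, rows = {}, 0, 0, []
    for r, c in enumerate(s):
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > K:
            freq[s[l]] -= 1
            if freq[s[l]] == 0:
                del freq[s[l]]
            l += 1
        assert len(freq) <= K  # invariant once the shrink has completed
        best = max(best, r - l + 1)
        rows.append((l, r, s[l:r + 1], best))
    return rows


def diff_against_brute(trials=50, seed=0):
    """Return the first (s, K) where window and oracle disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice("abcd") for _ in range(rng.randint(0, 30)))
        K = rng.randint(0, 5)
        rows = trace_window(s, K)
        got = rows[-1][3] if rows else 0
        if got != longest_brute(s, K):
            return (s, K)
    return None
```

Printing the rows of trace_window("eceba", 2) one per line gives a table you can compare against your hand trace.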
Mastery Criteria
- Recognized “longest substring with property” as sliding window in <30 seconds.
- Wrote the canonical shrink-while-violation template with the answer-update outside the shrink, in <5 minutes.
- Articulated the loop invariant and the termination of the shrink loop.
- Generalized to “exactly K distinct” via the atMost(K) - atMost(K-1) trick.
- Identified the language-specific Unicode / boxing trap.
- Solved LC 76, LC 424, LC 992 within a week, observing the same skeleton.
Lab 03 — Prefix Sums: Subarray Sum Equals K
Goal
Master the prefix-sum + hashmap-of-complements pattern. Deliverable solves LeetCode 560 in O(N) time, O(N) space, and you can articulate why a hashmap of prefix sums (not of values) is the right abstraction, why {0: 1} is the required base case, and why this generalizes to subarray-XOR-equals-K, subarray-sum-divisible-by-K, and friends.
Background Concepts
Prefix sum identity: sum(a[l..r]) = prefix[r+1] - prefix[l]. Reformulating “find subarrays with sum K” as “find pairs of prefix sums differing by K”. Hashmap as a complement-finder. Generalization to any group operation (XOR, mod, addition over any abelian group). Review pattern Prefix Sums and Hashing.
Interview Context
This is a rite-of-passage Medium: appears at Meta, Google, Amazon, Stripe. Naive candidates write O(N²) double loops over all subarrays. Decent candidates write a prefix-sum array then double loop over endpoints — still O(N²). Strong candidates collapse to O(N) with a hashmap-of-prefix-counts. Elite candidates immediately generalize to LC 974 (subarray sums divisible by K) and LC 525 (contiguous array — recast as prefix-balance equals zero) without prompting.
Problem Statement
Given an integer array nums and an integer K, return the total number of contiguous subarrays whose sum equals K.
Constraints
- 1 ≤ N ≤ 2 × 10^4
- -1000 ≤ nums[i] ≤ 1000
- -10^7 ≤ K ≤ 10^7
- The array can contain negative numbers (this matters: sliding window does not apply).
Clarifying Questions
- Are values negative or non-negative? (Per constraints — both. This is the crucial clarification: with non-negatives, sliding window works in O(N); with negatives, you must use prefix sums.)
- Are zeros allowed? (Yes, and they create multiple subarrays of the same sum; the count must reflect this.)
- Empty subarrays: count them? (No; subarrays have ≥ 1 element. But the prefix-sum technique uses an “empty prefix” of value 0, hence the {0: 1} initialization.)
- Is K always reachable? (No assumption.)
- Can K = 0? (Yes; this counts subarrays summing to 0, including those that are entirely zero.)
Examples
| Input | Output | Note |
|---|---|---|
| nums=[1,1,1], K=2 | 2 | [1,1] at indices (0,1) and (1,2) |
| nums=[1,2,3], K=3 | 2 | [3] and [1,2] |
| nums=[1,-1,0], K=0 | 3 | [1,-1], [0], [1,-1,0] |
| nums=[0,0,0], K=0 | 6 | every contiguous subarray |
| nums=[100], K=100 | 1 | trivial |
Initial Brute Force
Two nested loops over (l, r), sum nums[l..r], count.
def subarray_sum_brute(nums, K):
    count = 0
    for l in range(len(nums)):
        s = 0
        for r in range(l, len(nums)):
            s += nums[r]
            if s == K: count += 1
    return count
Brute Force Complexity
Time O(N²). Space O(1). At N=2×10⁴, ~4×10⁸ ops — borderline; passes in C++ but TLEs in Python.
Optimization Path
The key reformulation: a subarray nums[l..r] sums to K iff prefix[r+1] - prefix[l] == K, i.e., prefix[l] == prefix[r+1] - K.
So as we walk r from 0 to N-1, computing the running prefix sum, we ask: “How many earlier prefix sums equal prefix - K?” This is a hashmap lookup. Each step is O(1). Total O(N).
The base case is subtle and important: before processing any element, we have prefix sum 0, “seen once”. This accounts for subarrays starting at index 0 (where the missing earlier prefix is the empty prefix of value 0).
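To make the reformulation concrete, here is a sketch of the intermediate step: build the prefix array explicitly and count pairs of prefix sums that differ by K. It is still O(N²), and the helper names are illustrative, but it shows why p[0] = 0 (the empty prefix) has to be part of the picture.

```python
from itertools import accumulate


def prefix_sums(nums):
    """p[i] = sum(nums[:i]); p[0] = 0 is the empty prefix."""
    return [0] + list(accumulate(nums))


def count_via_prefix_pairs(nums, K):
    """O(N^2) count of index pairs l <= r with p[r + 1] - p[l] == K."""
    p = prefix_sums(nums)
    n = len(nums)
    return sum(1 for l in range(n) for r in range(l, n) if p[r + 1] - p[l] == K)
```

The hashmap solution replaces the inner scan over l with a single lookup of how many earlier prefix values equal p[r + 1] - K.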
Final Expected Approach
def subarray_sum(nums, K):
    count = 0
    prefix = 0
    seen = {0: 1}  # empty prefix
    for x in nums:
        prefix += x
        count += seen.get(prefix - K, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return count
Crucial ordering: lookup before insert. Otherwise the case K == 0 over-counts: every position would match itself.
Data Structures Used
- A hashmap seen: int → int mapping each prefix-sum value to the number of times it has occurred.
- A running prefix integer.
- A running count integer.
Correctness Argument
Let p_i = sum(nums[0..i-1]) (so p_0 = 0). A subarray nums[l..r] sums to K iff p_{r+1} - p_l = K iff p_l = p_{r+1} - K.
For each r, the number of valid l ∈ [0, r] is |{l : p_l == p_{r+1} - K}|. As we iterate, seen after processing index r contains exactly {p_0, p_1, ..., p_{r+1}} with multiplicities. Looking up seen[prefix - K] before inserting prefix gives |{l ∈ [0, r] : p_l == p_{r+1} - K}| — the count of valid l for the current r.
Summing over all r gives the total count of valid subarrays. The {0: 1} initialization handles the case l == 0 (where p_0 == 0 is consulted).
Complexity
Time O(N) average (hashmap ops). Space O(N) for the hashmap (worst case: all distinct prefix sums). Worst-case time degrades to O(N²) under hash collisions on adversarial input — see Phase 1 §3.
Implementation Requirements
- Initialize seen = {0: 1} before the loop. Forgetting this undercounts subarrays starting at index 0.
- Lookup before insert. This isn’t just style: for K == 0, swapping the order miscounts trivially.
- Use 64-bit accumulator if values × N could overflow 32-bit (here, 2×10^4 × 1000 = 2×10^7, safe; but build the habit).
- Don’t precompute the prefix-sum array if you don’t need to; a single running int suffices.
Tests
- Smoke: ([1,1,1], 2) → 2.
- Unit: K = 0 cases; K larger than total sum (returns 0); single element matching K (returns 1); single element not matching K (returns 0).
- Edge: nums = [0]*N, K = 0 → N*(N+1)/2. This validates that zeros are correctly counted.
- Adversarial: alternating positives and negatives summing to K many times. Construct [1,-1,1,-1,...,1,-1] with K=0; for even N the prefix sums alternate 0, 1, 0, 1, ..., 0, so the expected count is C(N/2+1, 2) + C(N/2, 2) = N²/4. Validate against the brute force as well.
- Large: N = 2×10⁴, random values; assert ms-level. Compare to brute force on a 1000-element prefix.
- Random: 100 random inputs of size ≤ 200; cross-check against brute.
- Negative K: include K negative; must work because prefix sums and prefix - K use the same arithmetic.
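A possible harness for the test plan above is sketched here. The name cross_check and the random ranges (values in [-3, 3], size ≤ 30) are illustrative choices that produce many repeated prefix sums; the closed-form checks in the usage note follow from counting equal prefix values.

```python
import random


def subarray_sum(nums, K):
    """Prefix-sum + hashmap count of subarrays summing to K."""
    count, prefix = 0, 0
    seen = {0: 1}  # the empty prefix
    for x in nums:
        prefix += x
        count += seen.get(prefix - K, 0)  # lookup before insert
        seen[prefix] = seen.get(prefix, 0) + 1
    return count


def subarray_sum_brute(nums, K):
    """Quadratic oracle over all (l, r) pairs."""
    count = 0
    for l in range(len(nums)):
        s = 0
        for r in range(l, len(nums)):
            s += nums[r]
            if s == K:
                count += 1
    return count


def cross_check(trials=100, seed=0):
    """Return the first (nums, K) where the two disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-3, 3) for _ in range(rng.randint(0, 30))]
        K = rng.randint(-5, 5)
        if subarray_sum(nums, K) != subarray_sum_brute(nums, K):
            return (nums, K)
    return None
```

For example, ten zeros with K = 0 should give 10·11/2 = 55, and [1, -1] repeated six times (N = 12) with K = 0 should give 12²/4 = 36.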
Follow-up Questions
- “Subarray sum divisible by K?” → key by prefix % K (taking care to normalize to non-negative for languages where % can return a negative value: ((prefix % K) + K) % K). Count pairs of equal residues.
- “Subarray with XOR equal to K?” → exact same skeleton, replace + with ^. The group operation just needs an identity and an inverse.
- “Longest subarray with sum K?” → store the first occurrence of each prefix-sum value; when you see prefix - K, the length is r+1 - first[prefix - K].
- “Contiguous array (LC 525): equal 0s and 1s?” → re-encode 0s as -1; the problem becomes “longest subarray with sum 0”.
- “If memory is tight (cannot store hashmap)?” → if values are non-negative, sliding window in O(1) extra; otherwise no known better.
- “What if the array is updated?” → Fenwick tree for prefix sums; each update O(log N), each query O(log N) (Phase 3).
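Two of the follow-ups above can be sketched with the same skeleton. The function names are illustrative; subarrays_div_by_k assumes K > 0, and relies on Python's % returning a non-negative result for a positive modulus, so no extra normalization is needed in this language.

```python
def subarrays_div_by_k(nums, K):
    """Count subarrays whose sum is divisible by K (assumes K > 0)."""
    count, prefix = 0, 0
    seen = {0: 1}  # residue of the empty prefix
    for x in nums:
        prefix = (prefix + x) % K  # non-negative in Python for K > 0
        count += seen.get(prefix, 0)  # equal residues => difference divisible by K
        seen[prefix] = seen.get(prefix, 0) + 1
    return count


def longest_subarray_with_sum(nums, K):
    """Length of the longest subarray summing to K (0 if none)."""
    first = {0: 0}  # prefix value -> smallest prefix index where it occurs
    prefix, best = 0, 0
    for r, x in enumerate(nums):
        prefix += x
        if prefix - K in first:
            best = max(best, r + 1 - first[prefix - K])
        first.setdefault(prefix, r + 1)  # keep only the first occurrence
    return best
```

The LC 525 re-encoding then becomes a one-liner: longest_subarray_with_sum([1 if b else -1 for b in bits], 0).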
Product Extension
Anomaly detection in cumulative metric streams. Imagine network ingress where a “subarray sum equals K” query asks: “how many time windows had exactly K bytes of traffic?” — useful for detecting periodicity or replay patterns. The same prefix-hashmap pattern, run online, detects these in O(1) per event with O(window-size) memory. Extends to log-aggregation systems (CloudWatch, Datadog) where dashboards scan millions of events and need O(N) algorithms.
Language/Runtime Follow-ups
- Python: defaultdict(int) is cleaner than dict.get(...) for the increment, but seen[prefix - K] on a defaultdict inserts the missing key (the same insertion-on-read behavior as C++ operator[]), so keep .get for lookups. Performance is similar.
- Java: HashMap<Long, Integer> for safety. Boxing! Each getOrDefault(prefix, 0) + 1 followed by put(prefix, ...) does up to 4 boxes; pre-boxing or merge() with Integer::sum reduces it.
- Go: map[int]int; the zero value of int is 0 so seen[prefix - K] returns 0 if absent, with no need for an ok check on lookups (but you do need to insert seen[0] = 1 initially).
- C++: std::unordered_map<long long, int>. Beware: unordered_map<int, int>::operator[] inserts on read of a missing key, growing the map by one for every miss. Use find() for lookup-only.
- JS/TS: prefer Map<number, number>; Object keys are coerced to strings. For very large prefix sums (> 2^53), use BigInt or strings as keys.
- Adversarial hashing: crafted inputs producing many distinct prefix sums that collide can degrade to O(N²) in languages with deterministic hash. Java’s HashMap upgrades long chains to red-black trees; Go seeds its map hash randomly; CPython randomizes str and bytes hashes but not int hashes; C++ does not randomize by default.
Common Bugs
- Missing {0: 1} initialization: undercounts every subarray starting at index 0. Concretely: ([3], 3) returns 0 instead of 1.
- Insert before lookup: for K == 0, every position matches itself once, returning N spurious counts.
- Using dict[key] instead of dict.get(key, 0) in Python: KeyError on first miss.
- Java autoboxing penalty: silent ~3× slowdown; use primitive maps (e.g., Eclipse Collections IntIntMap) or HashMap.merge.
- C++ unordered_map::operator[] insertion-on-read: bloats the map and can trigger rehashes; use find() for read-only.
- Negative-modulo bug in the LC 974 follow-up: -3 % 5 is -3 in C++/Java/Go but 2 in Python. Normalize: ((x % K) + K) % K.
- Recomputing prefix - K twice: minor, but (prefix - K) should be a single local for readability.
Debugging Strategy
- Trace ([1,1,1], 2) by hand. Expected seen evolution: {0:1} → {0:1,1:1} → {0:1,1:1,2:1} → {0:1,1:1,2:1,3:1}. At each step, lookup prefix - K: at step 2, prefix=2, lookup 0: count += 1. At step 3, prefix=3, lookup 1: count += 1. Total 2.
- For K = 0 issues: trace ([0,0,0], 0). seen evolves {0:1} → (lookup 0: count+=1) → {0:2} → (lookup 0: count+=2) → {0:3} → (lookup 0: count+=3). Total 6 = 3*(3+1)/2. ✓
- Diff against brute force on 100 random small inputs.
Mastery Criteria
- Recognized “subarray with sum X” + negatives as prefix-sum + hashmap in <30 seconds.
- Articulated the {0: 1} base case before coding.
- Lookup-before-insert ordering correct on first try.
- Generalized to LC 974 (mod) and LC 525 (re-encoding) without lookup.
- Identified the C++ operator[] insertion-on-read trap.
- Solved on the first try with no off-by-one bugs in 5 random small tests.
Lab 04 — Binary Search On Answer: Capacity To Ship Packages Within D Days
Goal
Master the parametric/binary-search-on-answer pattern: identify the monotonic predicate, prove its monotonicity, write a correct feasible(x) independently, then drive a half-open binary search over the answer space. Deliverable solves LC 1011 in O(N · log(sum)) time, O(1) space, and you can articulate the bounds, the predicate’s monotonicity proof, and the canonical [lo, hi) template’s termination.
Background Concepts
Decision problem vs optimization problem; parametric search; monotone predicates; the half-open [lo, hi) invariant feasible(hi) = true, feasible(lo-1) = false (values outside the search range act as sentinels); greedy verification of feasibility. Review pattern Binary Search On Answer and the constraint→complexity table.
Interview Context
Binary search on answer is the single highest-value pattern for distinguishing strong from elite candidates in 30-minute Mediums. Most candidates can do binary search on a sorted index. Few can recognize that “min capacity such that we can ship in D days” is the same binary search, just over a different domain. Once you internalize this, an entire family of problems collapses: Koko bananas, split-array-largest-sum, smallest divisor, minimum days for bouquets, magnetic force balls, kth-smallest-in-multiplication-table. The pattern is identical; only feasible changes.
Problem Statement
A conveyor belt has packages with weights weights[i]. Each day you load packages in order and ship them on a boat with weight capacity C. Once loaded, the boat ships and starts again the next day. Return the minimum capacity C such that all packages ship within D days.
Constraints
- 1 ≤ D ≤ |weights| ≤ 5 × 10^4
- 1 ≤ weights[i] ≤ 500
Clarifying Questions
- Must packages be loaded in given order? (Yes — this is the crux. If we could reorder, it becomes a bin-packing problem, which is NP-hard.)
- Can a package’s weight exceed the daily capacity? (No: capacity must be at least max(weights), else that package never ships.)
- Can the boat be partially filled and still ship that day? (Yes: when adding the next package would exceed C, you ship the current load and start fresh.)
- Is D strict (must use all D days) or upper-bound (≤ D days)? (Upper-bound: any number of days ≤ D is acceptable.)
- Are weights integers, and if so, is the answer always an integer? (Yes and yes.)
Examples
| Input | Output |
|---|---|
| weights=[1,2,3,4,5,6,7,8,9,10], D=5 | 15 |
| weights=[3,2,2,4,1,4], D=3 | 6 |
| weights=[1,2,3,1,1], D=4 | 3 |
| weights=[10,50,100,100,50,100,100,100], D=5 | 160 |
For the first: C=15 ships as [1,2,3,4,5] (15) | [6,7] (13) | [8] (8) | [9] (9) | [10] (10), which is 5 days. C=14 would split into [1,2,3,4] | [5,6] | [7] | [8] | [9] | [10] and need 6 days.
Initial Brute Force
Try every capacity from max(weights) to sum(weights); for each, simulate and check if D days suffice; return the first that works.
def ship_capacity_brute(weights, D):
    for C in range(max(weights), sum(weights) + 1):
        if feasible(weights, D, C): return C

def feasible(weights, D, C):
    days, load = 1, 0
    for w in weights:
        if load + w > C:
            days += 1; load = 0
        load += w
    return days <= D
Brute Force Complexity
Time O(N · range), where range = sum - max ≈ N · 500. So O(N²·500) ≈ 1.25×10^12 at N = 5×10^4. TLEs.
Optimization Path
Key observation: feasibility is monotonic in capacity. If C works (ships in ≤ D days), any C' > C also works (the greedy simulation can only do better — it never has to start a new day earlier). Equivalently, if C doesn’t work, no C' < C works.
So the set {C : feasible(C) == true} is an upward-closed half-line [C*, ∞). We binary search for C*.
Bounds:
- Low: max(weights); any capacity below this can’t even ship the heaviest package alone.
- High: sum(weights); capacity ≥ total ships everything in 1 day, trivially ≤ D days.
Final Expected Approach
def ship_capacity(weights, D):
    def feasible(C):
        days, load = 1, 0
        for w in weights:
            if load + w > C:
                days += 1; load = w
            else:
                load += w
            if days > D: return False
        return True

    lo, hi = max(weights), sum(weights) + 1  # half-open [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
Note hi = sum + 1 (half-open) and the canonical lo < hi loop condition.
Data Structures Used
None beyond the input array and three integers (lo, hi, mid) plus two more inside feasible.
Correctness Argument
Monotonicity: Let feasible(C) be true. For C' > C, the greedy simulation with capacity C' packs at least as much per day as it would with C (because the “if load + w > C” branch fires at most as often). So days-needed-with-C' ≤ days-needed-with-C ≤ D. Hence feasible(C') is true. The set of feasible C is upward-closed.
Greedy feasibility: the greedy “ship today as much as fits” minimizes the number of days for a fixed C, by an exchange argument. Take any valid schedule; if some day ends earlier than the greedy would end it, move the first package of the following day onto it. Every day stays within C and the day count never increases. Repeating this turns the schedule into the greedy one, so the greedy uses the fewest days, and feasible(C) correctly reports whether some schedule fits in D days.
Binary search invariant (half-open [lo, hi)): every candidate below lo is infeasible (or below max(weights)), and every candidate ≥ hi is feasible (initially hi = sum + 1, and sum itself is feasible). Each branch preserves this: hi = mid is taken only when feasible(mid) is true, lo = mid + 1 only when it is false. On termination lo == hi, so feasible(lo) = true and feasible(lo - 1) = false. So lo is the minimum feasible C.
Termination: since lo ≤ mid < hi, both branches strictly shrink hi - lo and roughly halve it. The loop runs O(log(hi - lo)) iterations.
Complexity
Time O(N · log(sum - max)). With N = 5×10^4 and sum bounded by 2.5×10^7, that’s ~5×10^4 × 25 ≈ 1.25×10^6 ops. Easily sub-100ms in any language. Space O(1) additional.
Implementation Requirements
- Write `feasible` first, test it independently on the examples, then wrap it in binary search. Many bugs are in `feasible`, not the search.
- Use `hi = sum + 1` half-open, or `hi = sum` and `<=`; pick one convention and stick to it. The half-open `[lo, hi)` template returns `lo` exactly.
- Early-return from `feasible` once `days > D`; this saves time on small `C`.
- Don’t forget that on the “doesn’t fit” branch, the new day starts with the current package as its load, not zero (otherwise that package’s weight is silently dropped and days are undercounted).
Tests
- Smoke: the four examples above.
- Unit: `D = N` (each package its own day) → answer is `max(weights)`. `D = 1` → answer is `sum(weights)`.
- Edge: all-equal weights, `[5,5,5,5], D=2` → answer 10. Single weight, `[42], D=1` → 42.
- Independence test for `feasible`: for the first example, verify `feasible(15)=true, feasible(14)=false, feasible(55)=true, feasible(10)=false`.
- Large: N = 5×10⁴, weights random in `[1, 500]`, D = 1000; assert sub-100ms. Cross-check against brute on a 100-element prefix.
- Adversarial: sorted ascending, sorted descending, all-max (weights all 500). Reordering is forbidden, so the answer depends on the given order, but the greedy check stays optimal for whatever order is supplied.
Follow-up Questions
- “What if package order is flexible?” → the feasibility check becomes bin packing, which is NP-hard. Approximation: First-Fit-Decreasing uses at most 11/9 · OPT + 6/9 bins.
- “What if D is very large (D ≥ N)?” → answer is `max(weights)`; you can early-return.
- “What if you must use exactly D days?” → still binary-searchable. Monotonicity holds for “≤ D days”, and when D ≤ N any schedule using fewer than D days can split a multi-package day without raising the maximum load, so the answer is unchanged. In practice you almost always want ≤ D.
- “What about Koko Eating Bananas (LC 875)?” → identical pattern; `feasible(speed) = sum(ceil(p / speed) for p in piles) ≤ H`.
- “Split array largest sum (LC 410)?” → identical pattern; `feasible(maxSum) = (greedy partition into pieces of sum ≤ maxSum, count ≤ K)`.
- “Floating-point answer?” → loop until `hi - lo < eps`, return `lo`. Watch for non-termination if eps is below float precision.
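The Koko follow-up can be sketched with the same half-open template. This is a minimal illustration, assuming `h >= len(piles)` so that an answer exists; the name `min_eating_speed` and the integer-ceiling trick are choices made for this sketch, not something the lab prescribes.

```python
def min_eating_speed(piles, h):
    """Smallest integer speed that finishes every pile within h hours."""
    def feasible(speed):
        # Each pile costs ceil(p / speed) hours; (p + speed - 1) // speed
        # computes that ceiling without floating point.
        return sum((p + speed - 1) // speed for p in piles) <= h

    # Half-open [lo, hi): speed max(piles) always works when h >= len(piles).
    lo, hi = 1, max(piles) + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Only `feasible` and the bounds changed relative to `ship_capacity`; the search loop is identical, which is the point of memorizing one template.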
Product Extension
Capacity planning for a build pipeline: given a stream of CI jobs with known durations and a deadline D, find the minimum machine-count or the minimum machine-capacity that meets the deadline. Same pattern: monotonic in capacity, binary-search on answer with greedy feasibility. Generalizes to load-balancer auto-scaling: minimum number of pods s.t. p99 latency stays below SLA, given a workload trace. (The feasibility check becomes a simulator instead of a one-line greedy, but the structure is identical.)
Language/Runtime Follow-ups
- Python: `(lo + hi) // 2` is fine; Python ints are arbitrary precision. No overflow.
- Java: `int mid = lo + (hi - lo) / 2;` to avoid 32-bit overflow when `lo + hi` exceeds `Integer.MAX_VALUE`. (For this problem’s constraints, `lo + hi` won’t overflow, but build the habit.)
- Go: same overflow caveat as Java for fixed-width ints; use `lo + (hi - lo) / 2`.
- C++: same; use `lo + (hi - lo) / 2`. `int` may also be too small for `sum(weights)` if constraints expand; use `long long`.
- JS/TS: numbers are 64-bit floats, integer-precise up to 2^53. No overflow risk for this problem. `(lo + hi) >>> 1` (zero-fill right shift) is a common idiom in place of `Math.floor((lo + hi) / 2)`, but `>>>` converts its operand to an unsigned 32-bit integer, so it is only correct while `lo + hi < 2^32`.
- Edge: for floating-point binary search (not this problem), terminate by iteration count (e.g., 100 rounds) rather than `hi - lo < eps` to dodge non-termination near float precision.
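The fixed-round advice for floating-point search can be illustrated on a toy problem. The sketch below approximates a square root by bisection; `bisect_sqrt` and the choice of 100 rounds are assumptions made for the example, and it assumes `x >= 0`.

```python
def bisect_sqrt(x, rounds=100):
    """Approximate sqrt(x) for x >= 0 using a fixed number of bisection rounds."""
    lo, hi = 0.0, max(1.0, x)   # sqrt(x) always lies in [0, max(1, x)]
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if mid * mid >= x:      # monotone predicate: true for every mid >= sqrt(x)
            hi = mid
        else:
            lo = mid
    return hi
```

A fixed round count always terminates, whereas a `hi - lo < eps` loop can spin forever once `mid` rounds to `lo` or `hi`.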
Common Bugs
- Wrong bounds. `lo = 1` instead of `lo = max(weights)`: the `feasible` above places an overweight package alone on a fresh day without rejecting it, so a capacity below the heaviest package can be reported feasible (for example, `weights = [10], D = 2` would return 1). Keep `lo = max(weights)`, or make `feasible` return false whenever `w > C`. `hi = sum` instead of `sum + 1` with `lo < hi` still returns the right answer here, because `sum` is always feasible; `sum + 1` only keeps the strict half-open convention.
- Inverted predicate direction. Searching for “max C such that infeasible” instead of “min C such that feasible” flips the bounds and breaks the invariant.
- `feasible` bug: forgetting to start the new day with the current package. Setting `load = 0` instead of `load = w` after exceeding capacity drops that package’s weight and undercounts days.
- `feasible` bug: starting `days = 0` instead of `days = 1`. The first day exists before any package ships.
- Off-by-one in the binary search template. Mixing `<` with `<=` and `mid` with `mid - 1` and `mid + 1` is the most common interview bug. Memorize one template (we use the half-open one) and never deviate.
- C++/Java/Go integer overflow on `(lo + hi) / 2` for very large constraints.
- Calling `feasible` with the wrong arg type (e.g., float when int expected) in dynamically-typed languages: silent rounding.
Debugging Strategy
- Test `feasible` independently on the given examples for several values of C. The interactive trace `feasible(15)=true, feasible(14)=false, …` is your ground truth.
- Add an iteration counter to the binary search; cap it at 100. If the cap fires, your bounds or update direction is wrong.
- Print `(lo, mid, hi, feasible(mid))` per iteration; the `lo, hi` interval should shrink monotonically to a singleton.
- For overflow suspicions, replace ints with `long` / `long long` / `bigint` and rerun.
Mastery Criteria
- Identified “minimum X such that property P” + monotone P as binary-search-on-answer in <60 seconds.
- Stated the monotonicity argument in plain English before coding.
- Wrote `feasible` first, tested it independently, then wrapped it in binary search.
- Used a single canonical binary-search template (half-open) without confusing `<` vs `<=`.
- Generalized verbally to LC 875 (Koko), LC 410 (Split Array Largest Sum), LC 1283 (Smallest Divisor) without prompting.
- Identified the language-specific overflow / template trap.
- Solved a similar new problem from this family in <10 minutes within a week of completing this lab.
Lab 05 — Monotonic Stack: Largest Rectangle In Histogram
Goal
Master the monotonic-stack pattern on its hardest canonical problem. Deliverable solves LC 84 in O(N) time, O(N) space, and you can articulate why each bar is pushed and popped at most once, why a sentinel 0 at the end is required (or how to handle the leftover stack), and how this generalizes to maximal-rectangle-of-1s in a 2D grid.
Background Concepts
Monotonic stack invariant; index-vs-value storage in stacks; sentinel technique for clean termination; amortized analysis (each index pushed and popped once); the rectangle’s “left boundary = new top after pop” / “right boundary = current index” trick. Review pattern Monotonic Stack.
Interview Context
LC 84 is rated Hard on LeetCode and is one of the hardest commonly asked stack problems. It appears at Google, Meta, and quant firms. The interview signal is whether you can derive the algorithm from a smaller cousin (LC 496, Next Greater Element). Naive candidates write O(N²) “for each bar, expand left and right”. Decent candidates derive a left-bounds array and right-bounds array via two monotonic stack passes. Strong candidates do it in one pass with the popped-bar’s-rectangle trick. Elite candidates immediately observe that LC 85 (Maximal Rectangle) is just LC 84 applied per row of a derived heights array.
Problem Statement
Given an array heights representing histogram bar heights of equal width 1, return the area of the largest rectangle that fits within the histogram.
Constraints
- `1 ≤ N ≤ 10^5`
- `0 ≤ heights[i] ≤ 10^4`
Clarifying Questions
- Are heights non-negative? (Per constraints, yes.)
- Can heights be zero? (Yes — and zero-height bars effectively reset the candidates, since no rectangle can include them.)
- Are bars unit-width? (Yes — width 1 each, so the rectangle’s width is just the number of consecutive bars that all have at least the rectangle’s height.)
- Multiple equal-area rectangles — return any area, or specify? (Just the area; LC asks for the max.)
- Is the answer always achievable in 32-bit? (`max_height × N = 10^4 × 10^5 = 10^9` fits 32-bit signed, with only about 2× headroom. Use 64-bit for safety in C++/Java.)
Examples
| Input | Output |
|---|---|
| `[2,1,5,6,2,3]` | 10 (height 5 across indices 2..3: width 2, area 10; height 6 over index 3 alone gives only 6) |
| `[2,4]` | 4 |
| `[2,1,2]` | 3 (rectangle of height 1 spanning all 3 bars) |
| `[6,7,5,2,4,5,9,3]` | 16 |
| `[0]` | 0 |
Initial Brute Force
For each bar i, expand left and right while heights stay ≥ heights[i]; rectangle area is heights[i] × (right - left + 1).
```python
def largest_rect_brute(heights):
    n = len(heights)
    best = 0
    for i in range(n):
        l = r = i
        # Expand while neighbours are at least as tall as bar i.
        while l > 0 and heights[l - 1] >= heights[i]:
            l -= 1
        while r < n - 1 and heights[r + 1] >= heights[i]:
            r += 1
        best = max(best, heights[i] * (r - l + 1))
    return best
```
Brute Force Complexity
Time O(N²) worst case (all-equal heights). Space O(1). At N=10⁵, ~10^10 ops — TLEs everywhere.
Optimization Path
Observation: for each bar i, we want the widest rectangle of height exactly heights[i] that contains bar i. Width = (next-smaller-to-the-right) − (previous-smaller-to-the-left) − 1. If we know “previous smaller index” pl[i] and “next smaller index” pr[i] for every bar, area is heights[i] * (pr[i] - pl[i] - 1), computed in O(N) total.
Both pl and pr are computable by a single monotonic-stack pass each. Even better: a single pass with a stack of indices whose heights are non-decreasing from bottom to top. When we encounter a smaller bar, we pop the stack; for each popped index j, the current index i is its pr[j] and the new top of the stack (after the pop) serves as its left boundary pl[j]. Compute j’s rectangle area on the spot.
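The two-pass idea can be sketched as follows. This is an illustrative version rather than the lab's reference solution; the function name `largest_rectangle_two_pass` is an assumption, and `pl` / `pr` follow the definitions above (strictly smaller neighbours, with -1 and n as out-of-range defaults).

```python
def largest_rectangle_two_pass(heights):
    n = len(heights)
    pl = [-1] * n   # index of the previous strictly smaller bar, or -1
    pr = [n] * n    # index of the next strictly smaller bar, or n

    stack = []
    for i in range(n):                       # left-to-right pass fills pl
        while stack and heights[stack[-1]] >= heights[i]:
            stack.pop()
        pl[i] = stack[-1] if stack else -1
        stack.append(i)

    stack = []
    for i in range(n - 1, -1, -1):           # right-to-left pass fills pr
        while stack and heights[stack[-1]] >= heights[i]:
            stack.pop()
        pr[i] = stack[-1] if stack else n
        stack.append(i)

    return max((heights[i] * (pr[i] - pl[i] - 1) for i in range(n)), default=0)
```

This costs two stack passes and two extra arrays; the single-pass version gets the same result by computing each rectangle at pop time.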
Final Expected Approach
```python
def largest_rectangle_area(heights):
    stack = []   # indices; heights[stack] is non-decreasing bottom to top
    best = 0
    n = len(heights)
    for i in range(n + 1):
        cur = 0 if i == n else heights[i]   # sentinel 0 at i == n
        while stack and heights[stack[-1]] > cur:
            top = stack.pop()
            left = stack[-1] if stack else -1
            width = i - left - 1
            best = max(best, heights[top] * width)
        stack.append(i)
    return best
```
The trick: iterate to n + 1 with a sentinel cur = 0. This forces every remaining bar of positive height to be popped (0 is strictly less than any positive height; zero-height bars left on the stack contribute area 0 anyway), so we don’t need a separate post-loop drain.
Data Structures Used
- A stack of indices into `heights`, holding indices whose heights are non-decreasing from stack-bottom to stack-top.
- A running `best` integer.
Correctness Argument
Stack invariant: at every point, heights[stack[0]] ≤ heights[stack[1]] ≤ ... ≤ heights[stack[-1]]. (With the strict > pop condition, equal heights stay on the stack, so the order is non-decreasing rather than strictly increasing.)
Maintenance: before pushing i, we pop every index j with heights[j] > heights[i]. After popping, all remaining stack entries have height ≤ heights[i], so pushing i preserves the invariant.
Per-popped-bar rectangle is correct: when we pop top, every index between top and i was pushed after top and has already been popped, and none of them was smaller than heights[top] (a smaller bar would have popped top on arrival). So i is the first index to the right with heights[i] < heights[top], i.e. pr[top] = i. On the left, the new stack top (or -1 if the stack is empty) has height ≤ heights[top]. If it is strictly smaller, it is exactly pl[top] and the width i - left - 1 is the full extent. If it has equal height, the popped bar’s width is understated, but that equal bar is popped next in the same while loop with the same i and a boundary further left, so the leftmost bar of each equal-height run reports the full rectangle and the maximum is unaffected.
Sentinel correctness: at i = n we use cur = 0, smaller than any positive height. This pops every remaining positive-height index, computing each one’s rectangle with pr = n.
Amortized O(N): each index pushed once, popped at most once. Inner while loop’s total iterations across the outer loop sum to ≤ N.
Complexity
Time O(N) amortized. Space O(N) for the stack worst case (strictly increasing input).
Implementation Requirements
- Store indices, not heights, in the stack — you need indices to compute width.
- Use the sentinel trick (`i in range(n + 1)` with `cur = 0` at `i = n`) for clean code, OR drain the stack after the main loop with `pr = n`. Pick one; the sentinel is preferred.
- Use strict `>` in the pop condition. For “largest rectangle”, `>` and `>=` give the same answer (with `>`, a run of equal heights is accounted for by its leftmost bar), but `>=` changes which bar owns the boundary and confuses the bookkeeping in cousin problems.
- Use 64-bit arithmetic in Java/C++ for the area: `10^4 × 10^5 = 10^9` fits in 32 bits (`Integer.MAX_VALUE ≈ 2.1×10^9`), but the margin is small and habits matter.
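For comparison with the sentinel form, the post-loop drain alternative mentioned in the requirements might look like this. It is a sketch only; `largest_rectangle_drain` is a name chosen for illustration.

```python
def largest_rectangle_drain(heights):
    stack, best, n = [], 0, len(heights)
    for i, h in enumerate(heights):
        while stack and heights[stack[-1]] > h:
            top = stack.pop()
            left = stack[-1] if stack else -1
            best = max(best, heights[top] * (i - left - 1))
        stack.append(i)
    # Drain: every bar still on the stack extends to the right edge (pr = n).
    while stack:
        top = stack.pop()
        left = stack[-1] if stack else -1
        best = max(best, heights[top] * (n - left - 1))
    return best
```

The drain loop duplicates the pop logic with `i` replaced by `n`, which is exactly the duplication the sentinel removes.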
Tests
- Smoke: `[2,1,5,6,2,3] → 10`.
- Unit: `[1] → 1`, `[1,1,1,1] → 4`, `[1,2,3,4,5] → 9` (rectangle of height 3 over indices 2..4), `[5,4,3,2,1] → 9`.
- Edge: `[0] → 0`, `[0,0,0] → 0`, `[N copies of 1] → N`, `[10000] → 10000`.
- Adversarial: strictly increasing (every bar stays on the stack until the sentinel; tests the drain). Strictly decreasing (every bar popped immediately; tests the per-bar bookkeeping).
- Large: N = 10⁵, random heights; assert <100ms. Cross-check against brute on a 1000-prefix.
- All same: `[7,7,7,7,7,7,7,7] → 56`.
- Random: 100 random inputs of size ≤ 200 against brute.
Follow-up Questions
- “Maximal rectangle of 1s in a 2D matrix (LC 85)?” → for each row `r`, build `heights[c] = (heights[c] + 1 if mat[r][c] == 1 else 0)`. Then run LC 84 on each row’s heights. Total O(R·C).
- “Largest rectangle of equal value?” → modify the predicate; same skeleton.
- “Number of submatrices with all 1s (LC 1504)?” → variant where for each row + column we count.
- “What if heights can be updated?” → segment tree with merge function; Phase 3 territory.
- “What if N is so large the stack doesn’t fit?” → the stack is bounded by N; if N doesn’t fit in memory, you have a bigger problem. (Answer: stream-based algorithms with reduced memory exist for some restricted versions.)
- “Why does `>=` give the same answer here?” → with `>=`, a bar is popped early by an equal-height bar to its right, so its own rectangle is cut short; but the rightmost bar of the run then sits directly on the previous strictly smaller bar and gets the full width when it is popped. The maximum is unchanged. Subtle, worth a sentence in the interview.
Product Extension
In ad-hoc analytics, the “largest rectangle” maps to “the longest time window during which all of K monitored systems exceeded a threshold” — useful for SLA breach detection. Each system’s per-time-bin status forms a histogram; the largest rectangle is the worst sustained breach. The same single-pass stack algorithm processes a stream of metric snapshots in O(1) amortized per snapshot.
Language/Runtime Follow-ups
- Python: native `list` as a stack; `append` / `pop` are O(1) amortized. Skip `collections.deque` here; the stack-only access pattern doesn’t benefit from a deque.
- Java: prefer `ArrayDeque<Integer>` over `Stack` (the latter is a synchronized legacy class with overhead). Or use `int[]` with a manual top index, which is fastest for hot loops.
- Go: slice-as-stack: `stack = append(stack, i)`, `top := stack[len(stack)-1]; stack = stack[:len(stack)-1]`. Beware: capacity may grow geometrically and not shrink, which is fine here since N is bounded.
- C++: `std::vector<int>` is fastest. `std::stack<int>` adds an unnecessary wrapper. Reserve capacity with `stack.reserve(n)`.
- JS/TS: native `Array.prototype.push` / `pop`, O(1) amortized. Not as fast as a typed `Int32Array` for hot loops.
- Hot-loop: in Java, `int[]` + `top` int outperforms `ArrayDeque<Integer>` by ~3× due to no boxing.
Common Bugs
- Off-by-one width: `i - left - 1` vs `i - left`. The popped bar’s rectangle starts at `left + 1` and ends at `i - 1` (both inclusive), width = `i - left - 1`.
- Forgetting the sentinel / drain. Without the sentinel `0`, indices remaining on the stack are never processed. Their rectangles extend to `n - 1`, i.e. `pr = n`.
- Using `>=` instead of `>` (or vice versa). For LC 84, both happen to produce the right answer, but cousin problems break; pick the variant that gives a unique boundary.
- Storing heights instead of indices. You then can’t compute width.
- Integer overflow in C++/Java for max-bar × max-N. Use 64-bit.
- Recursive simulation instead of iterative: Python’s default recursion limit is 1000, breaks at N > 1000.
- Java boxing in `Stack<Integer>` or `ArrayDeque<Integer>`: silent slowdown. Use `int[]` with manual top.
Debugging Strategy
- Trace `[2,1,5,6,2,3]`. Stack evolution: push 0; at i=1 (height 1), pop 0 (height 2, left=-1, width=1, area=2); push 1; push 2; push 3; at i=4 (height 2), pop 3 (height 6, left=2, width=1, area=6); pop 2 (height 5, left=1, width=2, area=10); push 4; push 5; sentinel pops everything. Best = 10. ✓
- Trace `[1,1,1,1]`. With `>`, the stack just accumulates `[0,1,2,3]`; sentinel pops 3 (area 1), pops 2 (area 2), pops 1 (area 3), pops 0 (area 4). Best = 4. ✓
- Cross-check 50 random inputs of size 50 against brute force.
Mastery Criteria
- Recognized “largest rectangle in histogram” as a monotonic-stack problem within 60 seconds.
- Stated the monotonic-stack invariant (non-decreasing heights under the `>` pop rule) before coding.
- Used the sentinel-`0` trick on first attempt (or correctly drained post-loop).
- Wrote `i - left - 1` correctly, no off-by-one.
- Generalized to LC 85 (max rectangle of 1s) without prompting.
- Articulated the amortized O(N) bound (each index pushed and popped once).
- Solved a cousin problem (LC 42 trapping rain water with stack, or LC 901 stock span) in <10 minutes within a week.
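The LC 85 generalization named in this lab can be sketched by feeding each row's running column heights into the histogram routine. This is a minimal, self-contained illustration; `maximal_rectangle` is an assumed name, and accepting both `1` and `"1"` cells is a convenience for LeetCode-style character grids.

```python
def largest_rectangle_area(heights):
    # Single-pass monotonic stack with a sentinel 0 at index n.
    stack, best, n = [], 0, len(heights)
    for i in range(n + 1):
        cur = 0 if i == n else heights[i]
        while stack and heights[stack[-1]] > cur:
            top = stack.pop()
            left = stack[-1] if stack else -1
            best = max(best, heights[top] * (i - left - 1))
        stack.append(i)
    return best


def maximal_rectangle(matrix):
    """Largest all-ones rectangle; cells may be 1/0 or "1"/"0"."""
    if not matrix or not matrix[0]:
        return 0
    cols = len(matrix[0])
    heights = [0] * cols          # consecutive ones ending at the current row
    best = 0
    for row in matrix:
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] in (1, "1") else 0
        best = max(best, largest_rectangle_area(heights))
    return best
```

Each row costs O(C) for the height update plus O(C) for the histogram pass, giving O(R·C) overall.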
Lab 06 — Intervals: Meeting Rooms II (Heap Of Ends + Sweep-Line Alternate)
Goal
Master the two canonical interval algorithms — min-heap of end times and event-based sweep line — applied to the same problem (LC 253). Deliverable solves it both ways, in O(N log N) time, O(N) space, and you can articulate the tie-breaking rule, why one approach is more intuitive while the other generalizes more cleanly to “max concurrent X” problems.
Background Concepts
Sorting by start time as the canonical interval-prep; min-heap of end times tracking active intervals; sweep line as event stream (time, ±1) with stable tie-breaking; the “end before start” tie-break for closed-on-start, open-on-end intervals. Review pattern Intervals and Heap.
Interview Context
Interval problems appear at Meta, Google, and Amazon — and Meeting Rooms II in particular is a top-15 most-asked Medium. The interview signal is whether you can recognize that “min number of rooms” = “max concurrent meetings”, and then compute concurrency either via heap-of-ends or sweep. Strong candidates do one approach correctly; elite candidates do both, articulate the trade-off, and handle the open/closed interval tie-break correctly without prompting.
Problem Statement
Given an array of meeting time intervals intervals[i] = [start_i, end_i], return the minimum number of meeting rooms required.
Constraints
- `1 ≤ N ≤ 10^4`
- `0 ≤ start_i < end_i ≤ 10^6`
- Each interval is half-open `[start, end)`: a meeting ending at time `t` and one starting at time `t` can share a room.
Clarifying Questions
- Are intervals half-open or closed? (Crucial: `[1,3]` and `[3,5]`, same room or not? Per LC 253, half-open: same room. This dictates the tie-break.)
- Can two meetings start at the same time? (Yes; they need separate rooms.)
- Are intervals sorted? (No assumption.)
- Is `start < end` strict? (Per constraints, yes; no zero-duration meetings.)
- Are room IDs significant, or just the count? (Just the count.)
Examples
| Input | Output |
|---|---|
| `[[0,30],[5,10],[15,20]]` | 2 |
| `[[7,10],[2,4]]` | 1 |
| `[[1,5],[5,10],[10,15]]` | 1 (chained, half-open) |
| `[[1,5],[2,5],[5,10]]` | 2 (the first two overlap on [2,5); `[5,10]` reuses a room) |
| `[[1,2]]` | 1 |
Initial Brute Force
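For each distinct endpoint time t, count the intervals active at t (s ≤ t < e, half-open); the answer is the maximum count. Concurrency only changes at endpoints, so checking those times is enough.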
```python
def min_rooms_brute(intervals):
    times = sorted(set(t for s, e in intervals for t in (s, e)))
    best = 0
    for t in times:
        count = sum(1 for s, e in intervals if s <= t < e)  # half-open
        best = max(best, count)
    return best
```
Brute Force Complexity
Time O(N²). Space O(N). At N=10⁴, ~10⁸ ops — borderline; passes in C++ but TLEs in Python.
Optimization Path A — Heap of End Times
Sort intervals by start. Maintain a min-heap of end times for currently active meetings. For each new interval (start, end):
- If the heap’s smallest end ≤ start, that room frees up — pop it.
- Push the new end.
The number of rooms needed is the maximum heap size ever reached. Each iteration pops at most one entry and pushes exactly one, so the heap size never decreases; the final heap size is therefore the maximum, and rooms = len(heap) at the end.
```python
import heapq

def min_rooms_heap(intervals):
    intervals.sort(key=lambda x: x[0])   # sorts in place by start time
    heap = []                            # min-heap of end times, one per room
    for s, e in intervals:
        if heap and heap[0] <= s:
            heapq.heappop(heap)          # reuse a room
        heapq.heappush(heap, e)
    return len(heap)
```
Optimization Path B — Sweep Line
Convert intervals to events (start, +1), (end, -1). Sort: by time, with end events before start events on ties (because [1,5] and [5,10] share a room). Sweep, tracking max concurrent.
```python
def min_rooms_sweep(intervals):
    events = []
    for s, e in intervals:
        events.append((s, +1))
        events.append((e, -1))
    events.sort()   # ties on time: (t, -1) sorts before (t, +1) since -1 < 1
    cur = best = 0
    for _, delta in events:
        cur += delta
        best = max(best, cur)
    return best
```
The tie-break is automatic: (t, -1) < (t, +1) because -1 < 1 lexicographically. End events fire before start events at the same t, so a freed room is reused.
Final Expected Approach
Either A or B is acceptable; mention both in the interview.
Data Structures Used
- Heap approach: the input array (sorted) + a min-heap of end times.
- Sweep approach: an events array of size 2N + a single integer counter.
Correctness Argument (Heap)
Invariant: after processing intervals 0..i-1 in sorted-by-start order, heap holds one entry per room allocated so far: the end time of the last meeting assigned to that room. Some entries may belong to meetings that have already ended; they stay in the heap until a later meeting reuses that room.
When processing (s, e):
- Any heap top `≤ s` corresponds to a room whose meeting has ended; it’s reusable. Pop one.
- Push `e` for the new meeting.
A new room is opened only when the earliest end time is > s. Because intervals arrive in start order, that means every allocated room holds a meeting that started at or before s and is still running at s, so all of them plus the new meeting genuinely overlap. The room count therefore never exceeds the maximum concurrency, and it cannot be lower either. Since each step pops at most one entry and pushes one, the heap never shrinks, and the final len(heap) equals the number of rooms ever opened, i.e. max-rooms-ever-needed.
(Note: we only pop one room even if multiple have ended, but that’s fine: each subsequent meeting will pop its own.)
Correctness Argument (Sweep)
A sweep at time t maintains cur = number of intervals whose start ≤ t < end, half-open. The max of cur over all t is the max concurrency, which is the min rooms needed. The tie-break “end before start at the same time” implements the half-open convention: an end at t decrements cur before a start at t increments it, so cur correctly reflects “intervals active at exactly t” with the half-open semantics.
Complexity
Both: time O(N log N). The sort dominates; after sorting, the heap pass costs O(N log N) and the sweep pass O(N). Space O(N).
Implementation Requirements
- Sort by start ascending for the heap approach. The tie-break among equal starts doesn’t matter for the heap (we pop whenever `heap[0] ≤ s`, which is correct in either order).
- For the sweep approach, the natural tuple sort `(time, delta)` with `delta ∈ {-1, +1}` already gives the right tie-break; don’t reverse the comparator.
- Use a min-heap (Python `heapq`, Java `PriorityQueue` default, C++ `priority_queue<int, vector<int>, greater<int>>`).
- Sorting start times and end times separately into two arrays is a third valid approach (the “two pointers” approach, equivalent to sweep). If you use it, make sure you understand that it’s distinct from the heap approach.
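The two-pointer variant mentioned in the last requirement can be sketched as below. This is an illustrative version with an assumed name, `min_rooms_two_pointers`; it uses the same half-open convention as the heap and sweep solutions.

```python
def min_rooms_two_pointers(intervals):
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    rooms = best = 0
    j = 0   # index of the earliest end that has not been released yet
    for s in starts:
        # Half-open intervals: a meeting ending exactly at s frees its room.
        while j < len(ends) and ends[j] <= s:
            rooms -= 1
            j += 1
        rooms += 1
        best = max(best, rooms)
    return best
```

It is the sweep line with the event list split in two: the `while` loop plays the role of the end events that sort before a start at the same time.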
Tests
- Smoke: the five examples above.
- Unit: all-disjoint intervals → 1; all-identical intervals → N; chain `[1,2],[2,3],[3,4]` → 1.
- Edge: N = 1 → 1; intervals all start at 0 → N (all overlap).
- Tie-break: `[[5,10],[10,15]]` → 1 (half-open). If your test returns 2, your sort comparator is wrong.
- Adversarial: tournament-bracket mix, e.g. `[[1,4],[2,5],[7,9],[1,5]]`; validate against brute on small inputs.
- Large: N = 10⁴, random intervals; both heap and sweep should run sub-50ms.
Follow-up Questions
- “Return the actual schedule (which interval goes in which room)?” → augment heap entries to `(end, room_id)`; reuse the popped `room_id` for the new interval.
- “Real-time scheduling: intervals arrive in order, decide room on the fly?” → still works as long as they arrive in start order; no offline assumption needed for the heap version (sweep needs all events upfront).
- “Closed intervals `[s, e]` instead of half-open?” → flip the tie-break: start before end at the same time, i.e., sort `(t, +1)` before `(t, -1)`. Or in heap version, change `heap[0] <= s` to `heap[0] < s`.
- “Maximum number of overlapping intervals?” → identical pattern (compare LC 732, My Calendar III, which asks for the maximum k-booking).
- “Insert / delete intervals dynamically?” → balanced BST keyed on start; O(log N) per op (Phase 3).
- “Generalize to weighted intervals?” → DP on intervals (interval scheduling maximization, LC 1235), Phase 3.
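The first follow-up (returning the actual room assignment) can be sketched by carrying a room id alongside each end time in the heap. The function name `assign_rooms` and the choice to number rooms from 0 in order of first use are assumptions for this example.

```python
import heapq

def assign_rooms(intervals):
    """Return room_of, where room_of[i] is the room id given to intervals[i]."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    room_of = [None] * len(intervals)
    heap = []       # (end_time, room_id) of the last meeting in each room
    next_id = 0
    for i in order:
        s, e = intervals[i]
        if heap and heap[0][0] <= s:
            _, room = heapq.heappop(heap)   # reuse the room that frees up first
        else:
            room = next_id                  # every room is busy: open a new one
            next_id += 1
        room_of[i] = room
        heapq.heappush(heap, (e, room))
    return room_of
```

For `[[0,30],[5,10],[15,20]]` the first meeting takes one room and the other two share a second one; the number of distinct ids equals the answer of the counting version.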
Product Extension
Resource allocation in CI/CD: minimum number of build agents required to handle a queue of (start, duration) jobs without delay, given they’re all queued ahead of time. Or: minimum servers needed to handle a known load profile (each request has a known service window). The same heap-of-ends pattern, run on a stream, computes peak concurrency in real time.
Language/Runtime Follow-ups
- Python: `heapq` is min-only; for max-heap, negate. `heapq.heappop(h)` and `heapq.heappush(h, x)` are O(log N). For sweep, `events.sort()` on a list of tuples works directly thanks to lexicographic tuple ordering.
- Java: `PriorityQueue<Integer>` defaults to min-heap. Boxing tax on primitives; for hot loops, use `IntPriorityQueue` from a primitive collections lib. Sort intervals via `Arrays.sort(intervals, (a, b) -> a[0] - b[0])`, but beware overflow (use `Integer.compare(a[0], b[0])`).
- Go: `container/heap` requires implementing the `heap.Interface` (5 methods). Verbose but flexible.
- C++: `std::priority_queue<int, std::vector<int>, std::greater<int>>` for min-heap (default is max). Sort with `std::sort`.
- JS/TS: no built-in heap. Implement one (~30 lines) or use a library. For sweep, `events.sort((a, b) => a[0] - b[0] || a[1] - b[1])`. `Array.prototype.sort` is guaranteed stable only since ES2019; older V8 versions were not stable for larger arrays. It’s fine on modern Node, but worth mentioning.
- Edge: Java’s `PriorityQueue.peek()` is O(1); `poll()` is O(log N). C++’s `top()` is O(1); `pop()` is O(log N) but doesn’t return the value, so call `top()` first.
Common Bugs
- Wrong tie-break in sweep. For half-open intervals, end events must fire before start events at the same time. The natural `(time, ±1)` tuple sort gives this for free (since -1 < +1); reversing the comparator breaks it.
- Wrong heap direction. Java’s `PriorityQueue` is a min-heap by default; C++’s `priority_queue` is a max-heap by default. Easy to forget which is which.
- Sorting by end instead of start in the heap approach: gives wrong room counts.
- Popping heap on `heap[0] < s` instead of `<=`: for half-open `[s, e)`, `<=` is correct (a meeting ending at `s` has ended, the room is free for one starting at `s`).
- Forgetting to push the new end after popping: heap loses an entry, undercount.
- Comparator overflow in Java: `a[0] - b[0]` overflows when values are huge; use `Integer.compare`.
- Sweeping events without separating same-time events: fragile; always tie-break explicitly even if the data “happens” to not have ties.
Debugging Strategy
- Trace `[[0,30],[5,10],[15,20]]` with the heap approach: sort doesn’t change order. Process `[0,30]`: heap = `[30]`. Process `[5,10]`: 30 > 5, no pop; push 10; heap = `[10, 30]`. Process `[15,20]`: 10 ≤ 15, pop; push 20; heap = `[20, 30]`. Final size = 2. ✓
- For half-open tie-break: trace `[[5,10],[10,15]]`. Heap: `[10]`; second interval, `10 <= 10` → pop, push 15; heap = `[15]`. Final size 1. ✓ If you used `<`, you’d get 2.
- For sweep: events `[(5,+1),(10,-1),(10,+1),(15,-1)]` sorted: `[(5,+1),(10,-1),(10,+1),(15,-1)]`. Sweep: `cur` becomes 1, 0, 1, 0; max 1. ✓
- Validate against brute on 30 random inputs of size ≤ 50.
Mastery Criteria
- Recognized “min rooms” / “max concurrent intervals” within 30 seconds.
- Wrote both heap and sweep solutions within 10 minutes total.
- Articulated the half-open tie-break before coding.
- Handled the language-specific heap default (min vs max) without bugs.
- Identified the connection to LC 732 (My Calendar III, maximum concurrent bookings) and LC 1094 (Car Pooling).
- Solved LC 56 (Merge Intervals) and LC 57 (Insert Interval) within a week, observing the same sort-by-start skeleton.
Lab 07 — Topological Sort: Course Schedule II (Kahn’s Vs DFS)
Goal
Master both topological sort algorithms — Kahn’s BFS and DFS-postorder — applied to LC 210. Deliverable produces a valid course order in O(V + E) time, O(V + E) space, with cycle detection wired in. You can articulate when each algorithm is preferable, the standard cycle-detection check, and how this generalizes to dependency resolution and build systems.
Background Concepts
DAG topological order; indegree array as a Kahn’s prerequisite; DFS three-color marking (white/gray/black) for cycle detection; postorder reverse for DFS-topo; the equivalence of “topo order exists” and “graph is a DAG”; existence of cycle = len(order) != V for Kahn’s. Review pattern Topological Sort and Graph Foundations.
Interview Context
Topological sort appears at Meta (build dependencies), Amazon (course scheduling, package ordering), Google (Spanner schema migrations). The interview signal is whether you naturally pick Kahn’s for explicit ordering (where the BFS structure makes the algorithm self-documenting) and DFS for cycle detection or recursive structure (where the call stack mirrors the recursion). Weak candidates only know one. Strong candidates code Kahn’s and explain the DFS variant. Elite candidates discuss stable ordering (preserving input order on ties via priority queue) and comment on parallelizability.
Problem Statement
You must take numCourses numbered 0..n-1. Some courses have prerequisites: prerequisites[i] = [a, b] means you must take b before a. Return any valid ordering of courses to finish all of them, or an empty array if impossible (cycle).
Constraints
- `1 ≤ numCourses ≤ 2000`
- `0 ≤ |prerequisites| ≤ 5000`
- All `[a, b]` pairs unique; `a != b`.
Clarifying Questions
- Are duplicate prerequisite pairs possible? (Per constraints, no — but worth confirming, as it affects indegree counting.)
- Are self-loops `[a, a]` possible? (Per constraints, `a != b`, so no self-loops.)
- If multiple valid orderings exist, return which one? (Any. But mention you can return the lex-smallest with a min-heap-based Kahn’s.)
- Output empty array on cycle, or null/exception? (LC says empty.)
- Are course IDs always `0..n-1` contiguous? (Yes per LC; otherwise you’d need to map.)
Examples
| numCourses | prerequisites | Output |
|---|---|---|
| 2 | [[1,0]] | [0,1] |
| 4 | [[1,0],[2,0],[3,1],[3,2]] | [0,1,2,3] or [0,2,1,3] |
| 2 | [[1,0],[0,1]] | [] (cycle) |
| 1 | [] | [0] |
| 3 | [[0,1],[1,2],[2,0]] | [] (cycle) |
Initial Brute Force
Repeatedly find a course with no remaining prerequisites; remove it and its outgoing edges; repeat. If at some point no such course exists but courses remain, there’s a cycle.
```python
def find_order_brute(n, prereqs):
    deps = [set() for _ in range(n)]      # deps[a] = unmet prerequisites of a
    for a, b in prereqs:
        deps[a].add(b)
    order = []
    for _ in range(n):
        for i in range(n):
            if deps[i] is not None and not deps[i]:
                order.append(i)
                deps[i] = None            # mark course i as taken
                for j in range(n):
                    if deps[j] is not None:
                        deps[j].discard(i)
                break
        else:
            return []                     # no takeable course left: cycle
    return order
```
Brute Force Complexity
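Time O(V²) for the code as written: V rounds, each doing an O(V) scan for a ready course plus O(V) set discards. At V = 2000 that is about 4×10⁶ set operations, which passes at these constraints, but every round rescans all courses instead of following edges, so it scales poorly as V grows.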
Optimization Path
Kahn’s algorithm replaces “find a no-prereq course” + “remove it” with O(1) amortized operations:
- Compute `indegree[v]` = number of incoming edges.
- Initialize a queue with all `v` having `indegree[v] == 0`.
- Repeat: pop `u` from queue, append to order, decrement indegree of each successor `v`; if `indegree[v]` becomes 0, push `v`.
- If `len(order) == V`: that’s the topological order. Else: cycle.
Each edge processed once. Each vertex enqueued/dequeued once. Total O(V + E).
DFS variant: run DFS from each unvisited node; on finish (postorder), prepend the node to the order. Detect cycles via the gray-vertex check.
Final Expected Approach (Kahn’s)
from collections import deque

def find_order(n, prereqs):
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in prereqs:
        adj[b].append(a)  # edge b -> a (b is prereq of a)
        indeg[a] += 1
    q = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else []
Final Expected Approach (DFS)
def find_order_dfs(n, prereqs):
    adj = [[] for _ in range(n)]
    for a, b in prereqs:
        adj[b].append(a)
    color = [0] * n  # 0=white, 1=gray (on stack), 2=black (done)
    order = []

    def dfs(u):
        if color[u] == 1: return False  # back edge -> cycle
        if color[u] == 2: return True
        color[u] = 1
        for v in adj[u]:
            if not dfs(v): return False
        color[u] = 2
        order.append(u)
        return True

    for u in range(n):
        if color[u] == 0 and not dfs(u):
            return []
    return order[::-1]
Data Structures Used
- Kahn’s: adjacency list, indegree array, queue, output list.
- DFS: adjacency list, color array, output list, implicit recursion stack.
Correctness Argument (Kahn’s)
Invariant: at any point, order contains a valid prefix of some topological order, and the queue contains exactly the unprocessed vertices with no remaining unsatisfied prerequisites (i.e., remaining indegree zero in the subgraph of unprocessed vertices).
Maintenance: when we pop u, all its prereqs are already in order (since indeg[u] == 0 in the residual graph means all its prereqs have been processed). Adding u extends a valid topo-prefix. For each successor v, decrementing indeg[v] reflects that u is now “satisfied”; if v’s residual indegree hits 0, all its prereqs are satisfied, so it’s eligible.
Cycle detection: if len(order) < n, some vertices were never enqueued, meaning their residual indegree never reached 0 — they’re inside a strongly connected component with a cycle (or downstream of one).
Correctness Argument (DFS)
The classical theorem: a directed graph is a DAG iff DFS encounters no back edges. We mark vertices gray when entering DFS, black on finish. Encountering a gray vertex via an outgoing edge is a back edge → cycle. Postorder reversed gives a valid topological order: when u finishes, all reachable-from-u vertices have already finished and are earlier in order; reversing puts them after u.
Complexity
Both: O(V + E) time, O(V + E) space (adjacency + queue/recursion-stack).
Implementation Requirements
- Build the adjacency list with edge direction prereq → course (so we can decrement indeg[course] when a prereq is processed). The reverse direction also works but flips the topo-order interpretation; pick one convention and stay consistent.
- Use deque (Python) / ArrayDeque (Java) / container/list (Go) for the BFS queue, not a Python list.pop(0) which is O(N).
- DFS recursion: watch Python’s default recursion limit (1000) for large N. Either iterative DFS or sys.setrecursionlimit for V > ~900.
- Cycle check after the loop (Kahn’s) or during (DFS gray check).
Tests
- Smoke: (2, [[1,0]]) → [0,1].
- Unit: no prereqs ((3, []) → any permutation of [0,1,2]); single chain ((4, [[1,0],[2,1],[3,2]]) → [0,1,2,3]).
- Cycle: (2, [[1,0],[0,1]]) → []; longer cycle (3, [[0,1],[1,2],[2,0]]) → [].
- DAG with multiple roots: (4, [[2,0],[2,1],[3,2]]) → either [0,1,2,3] or [1,0,2,3]. Validate by checking the output is a permutation and all prereqs respected.
- Edge: N=1, no prereqs → [0].
- Large: N=2000, E=5000, random DAG; assert sub-100ms.
- Validator helper (write this!):
def is_valid_topo(order, n, prereqs):
    if len(order) != n or set(order) != set(range(n)):
        return False
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[b] < pos[a] for a, b in prereqs)
Follow-up Questions
- “Return the lex-smallest valid order.” → replace the queue with a min-heap. Time O((V + E) log V).
- “Detect all vertices in cycles, not just whether any exist.” → SCC decomposition (Tarjan’s / Kosaraju’s), Phase 3.
- “Topological sort under updates (edges added/removed)?” → online topological order maintenance, hard problem; offline batched updates with reordering.
- “Parallel topological sort?” → at each round, all indegree-0 vertices can run in parallel; this is the natural parallelization for build systems (Bazel, Buck).
- “Schedule with time costs per task: minimize total wall time?” → critical-path method; longest path in DAG, computable in O(V + E) after topo-sort.
- “If V is huge (10^9) and the graph is implicit?” → streaming variant; need indegree of each vertex computed via input stream.
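The lex-smallest follow-up can be sketched by swapping the FIFO queue for a min-heap, so that among all currently ready courses the smallest ID is always emitted first. This is a minimal sketch under the same input conventions as the lab; the function name find_order_lex_smallest is hypothetical, not part of the lab’s reference code.

```python
import heapq

def find_order_lex_smallest(n, prereqs):
    """Kahn's algorithm with a min-heap: always take the smallest ready course."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in prereqs:              # [a, b] means b must come before a
        adj[b].append(a)
        indeg[a] += 1

    ready = [v for v in range(n) if indeg[v] == 0]
    heapq.heapify(ready)              # O(V) heap build over the initial roots
    order = []
    while ready:
        u = heapq.heappop(ready)      # smallest course with no pending prereqs
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                heapq.heappush(ready, v)
    return order if len(order) == n else []   # short order means a cycle
```

Each vertex is pushed and popped once at O(log V) apiece, which is where the extra log factor over plain Kahn’s comes from.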
Product Extension
This pattern is dependency resolution. Build systems (Bazel, Make, Maven, npm/yarn lockfile resolution), database migration runners, terraform depends_on, container orchestration (Kubernetes init-containers), spreadsheet recalc, even React’s effect-dependency ordering. Course Schedule II is the toy version of “given declared dependencies, output a valid execution order” — and the DFS variant is what most build systems use, because they want to detect cycles early with a clear error path showing the offending cycle.
Language/Runtime Follow-ups
- Python: collections.deque for BFS queue; sys.setrecursionlimit(10**5) if doing DFS on large input. List-as-queue with pop(0) is O(N); never use it.
- Java: ArrayDeque<Integer> is the canonical queue. Boxing tax for Queue<Integer>; for hot loops, use an int[] ring buffer.
- Go: no built-in queue; use a slice and q = q[1:] (O(1) per pop) or container/list (heavier). Re-slicing does not release the popped prefix: the backing array stays alive until append reallocates it, so if memory matters, periodically copy the live portion into a fresh slice.
- C++: std::queue<int> for Kahn’s; std::vector<int> + recursive DFS (iterative if V > ~10⁵ to avoid stack overflow with default 8MB stack).
- JS/TS: Array.prototype.shift() is O(N); use an index-based queue (let head = 0; q[head++]) for O(1) amortized.
- Stack overflow: any DFS-topo on V > recursion-limit needs iterative implementation. The iterative version uses an explicit stack of (vertex, iterator) pairs.
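The explicit-stack version mentioned in the last bullet can be sketched as follows. It keeps the same three-color scheme as the recursive DFS; each stack frame holds a vertex and an iterator over its successors, so resuming a frame continues where it left off. The name find_order_dfs_iterative is hypothetical.

```python
def find_order_dfs_iterative(n, prereqs):
    """Three-color DFS topological sort without recursion."""
    adj = [[] for _ in range(n)]
    for a, b in prereqs:                  # edge b -> a (b is prereq of a)
        adj[b].append(a)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n
    postorder = []
    for root in range(n):
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(adj[root]))]     # (vertex, iterator over successors)
        while stack:
            u, successors = stack[-1]
            v = next(successors, None)        # None means u has no successors left
            if v is None:
                color[u] = BLACK              # u is finished: emit in postorder
                postorder.append(u)
                stack.pop()
            elif color[v] == GRAY:
                return []                     # back edge to an on-stack vertex: cycle
            elif color[v] == WHITE:
                color[v] = GRAY
                stack.append((v, iter(adj[v])))
            # BLACK successors are already finished; skip them
    return postorder[::-1]
```

A chain of 50,000 courses runs without touching the recursion limit, which is the case the recursive version cannot handle at default settings.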
Common Bugs
- Edge direction confusion. [a, b] means “b before a”, so the edge is b → a. Reversing it inverts the topological order and breaks the indegree computation.
- Forgetting cycle detection. Returning order even when len(order) < n produces a partial order that misses some courses.
- Using list.pop(0) in Python or ArrayList.remove(0) in Java as the dequeue operation: both are O(N). (Queue.poll() on a LinkedList<Integer> is O(1) and fine.)
- Python recursion limit in DFS on V > 1000: the recursive version raises RecursionError. Raise the limit explicitly or switch to the iterative version.
- DFS gray check missing. Without distinguishing gray (on-stack) from black (finished), you can’t detect back edges; you’d accept cyclic graphs with the wrong order.
- Java boxing penalty in Queue<Integer>: ~2× slowdown vs an int[] ring buffer.
- Adjacency list as Map<Integer, List<Integer>> when courses are 0..n-1: wastes time on hashing; use List<List<Integer>> indexed by ID.
Debugging Strategy
- Trace (4, [[1,0],[2,0],[3,1],[3,2]]) Kahn’s: indeg = [0,1,1,2]. Queue: [0]. Pop 0, decrement indeg of 1 and 2: indeg = [0,0,0,2]. Push 1, 2. Pop 1, decrement indeg of 3: [0,0,0,1]. Pop 2, decrement: [0,0,0,0]. Push 3. Pop 3. Order = [0,1,2,3]. ✓
- Run the validator helper on every output during development.
- For cycle issues: build a small cycle by hand, ensure your code returns [], not a partial order.
Mastery Criteria
- Recognized “valid order respecting prereqs” as topological sort within 30 seconds.
- Wrote Kahn’s correctly within 8 minutes, with cycle detection.
- Wrote DFS variant within 8 more minutes, with three-color cycle detection.
- Articulated both correctness arguments without prompting.
- Identified the language-specific recursion-limit / boxing trap.
- Generalized to LC 207 (Course Schedule, just yes/no), LC 269 (Alien Dictionary, infer edges), LC 1136 (Parallel Courses, layer-by-layer Kahn’s) within a week.
Lab 08 — Backtracking: Word Search II With Trie Pruning
Goal
Master the backtracking-with-pruning pattern in its highest-yield form: a grid DFS guided by a trie, with in-place visit-marking and post-recurse restoration. Deliverable solves LC 212 in O(M·N·4·3^(L-1) + W·L) time, where M·N is grid size, L is max word length, W is number of words. You can articulate why a trie cuts the brute O(W·M·N·4·3^(L-1)) by a factor of W, and why dead-branch pruning of the trie is the speed-of-light optimization.
Background Concepts
DFS on a 2D grid; backtracking with explicit make/undo of state; trie as a multi-pattern matcher; pruning via “if no children, abandon”; visit-marking via in-place mutation (board[r][c] = '#') vs an explicit visited set. Review pattern Backtracking and Trie.
Interview Context
Word Search II is an Amazon / Apple / Bloomberg favorite and one of the hardest commonly asked grid problems (LeetCode lists it as Hard). The interview signal is recognizing that running LC 79 (Word Search single-word) once per word is catastrophically slow for many words, and that a trie collapses the W independent searches into a single grid traversal. Strong candidates code the trie + DFS in 25 minutes. Elite candidates also implement trie pruning (deleting fully-found subtrees) to avoid revisiting.
Problem Statement
Given an M x N grid of characters and a list of words, return all words from the list that exist in the grid. A word can be constructed from letters of sequentially adjacent cells (horizontal/vertical), each cell used at most once per word.
Constraints
- 1 ≤ M, N ≤ 12
- 1 ≤ |words| ≤ 3 × 10^4
- 1 ≤ |word| ≤ 10
- board[i][j] and words[i][j] are lowercase English letters.
- All words are distinct.
Clarifying Questions
- Each word individually uses each cell ≤ once, but can different words reuse the same cells? (Yes — independent searches per word.)
- Diagonal adjacency? (No — only 4-connected.)
- Are words guaranteed distinct? (Per constraints, yes.)
- Are duplicates in the result allowed? (No — return distinct words found.)
- Lower-case only? (Per constraints, yes — alphabet of size 26 simplifies trie nodes to fixed arrays.)
Examples
board = [["o","a","a","n"],
         ["e","t","a","e"],
         ["i","h","k","r"],
         ["i","f","l","v"]]
words = ["oath","pea","eat","rain"]
output = ["eat","oath"]
board = [["a","b"],["c","d"]]
words = ["abcb"]
output = [] (cells not adjacent or reused)
Initial Brute Force
For each word, run LC 79 (single-word search) on the grid.
def find_words_brute(board, words):
    return [w for w in words if exists_in_grid(board, w)]

def exists_in_grid(board, word):
    M, N = len(board), len(board[0])

    def dfs(r, c, i):
        if i == len(word): return True
        if not (0 <= r < M and 0 <= c < N) or board[r][c] != word[i]: return False
        ch, board[r][c] = board[r][c], '#'
        ok = dfs(r+1,c,i+1) or dfs(r-1,c,i+1) or dfs(r,c+1,i+1) or dfs(r,c-1,i+1)
        board[r][c] = ch
        return ok

    return any(dfs(r, c, 0) for r in range(M) for c in range(N))
Brute Force Complexity
Per word: O(M·N · 4·3^(L-1)). For each starting cell, DFS explores at most 4 branches from the first cell, then at most 3 from each later cell (the cell it just came from is already marked). Across W words: O(W · M · N · 4 · 3^(L-1)). With W = 3×10⁴, L = 10, M·N = 144: 3×10⁴ × 144 × 4 × 3⁹ ≈ 3.4×10¹¹ steps in the worst case. TLEs.
Optimization Path
Insight: all W single-word searches share grid traversal. If we have a trie of all words, a single DFS over the grid can simultaneously match all words, advancing through trie nodes as we step. At each grid cell (r, c), instead of asking “does this cell match word[i]”, we ask “does the current trie node have a child for board[r][c]”. If yes, descend. If the current trie node has the word field set, that word has been found — record and clear (to dedupe).
Pruning: after backtracking, if the trie node we just descended into has no children left (all its words have been found and we cleared them), prune it from the parent. This avoids revisiting empty subtrees on later starting cells.
Final Expected Approach
def find_words(board, words):
    # 1) Build trie
    trie = {}
    for w in words:
        node = trie
        for c in w:
            node = node.setdefault(c, {})
        node['$'] = w  # marker: word ends here

    M, N = len(board), len(board[0])
    found = []

    def dfs(r, c, parent):
        ch = board[r][c]
        node = parent.get(ch)
        if not node: return
        if '$' in node:
            found.append(node.pop('$'))  # dedup: clear marker
        board[r][c] = '#'
        for dr, dc in ((1,0),(-1,0),(0,1),(0,-1)):
            nr, nc = r+dr, c+dc
            if 0 <= nr < M and 0 <= nc < N and board[nr][nc] != '#':
                dfs(nr, nc, node)
        board[r][c] = ch
        if not node:  # prune dead branch
            parent.pop(ch)

    for r in range(M):
        for c in range(N):
            dfs(r, c, trie)
    return found
Data Structures Used
- Trie: nested dict (Python). In Java/C++, an explicit TrieNode class with TrieNode[26] children array.
- Grid: mutated in place ('#' marker for visited).
- Output list: distinct words found.
Correctness Argument
Trie invariant: the trie initially encodes all words; each leaf-marker '$' carries the word. The DFS descends one trie level per grid step; arriving at a node with '$' means we’ve matched the complete word from the start cell.
Backtracking correctness: in-place marking with '#' and explicit restoration in the post-recurse line guarantee that on entry to any DFS call, the grid reflects only ancestor cells as visited. The restore is symmetric to the mark; no leaks.
Dedup via pop('$'): clearing the marker on first find ensures each word is reported exactly once even if multiple grid paths spell it.
Pruning correctness: pruning a child after recursion only removes a subtree that has no remaining words to find (no '$' markers anywhere below). Future searches that would have entered this subtree gain nothing from doing so, so pruning is safe and accelerates the algorithm.
Complexity
Time: O(W·L) to build the trie plus O(M·N · 4·3^(L-1)) for the grid traversal in the worst case (no word is found early, so nothing gets pruned). Pruning does not improve the worst-case bound; it shrinks the trie as words are found, so later starting cells terminate sooner in practice. Space: O(W·L) for the trie + O(L) for recursion.
The W factor is gone because all words share the traversal.
Implementation Requirements
- One DFS per starting cell, not W DFSes per starting cell. The trie unifies them.
- Restore the cell (board[r][c] = ch) on every code path. The cleanest pattern: mark before the recursive calls, restore after; never restore inside conditionals.
- Prune by removing the child entry when its subtree empties. This is a 5%-50% speedup depending on input.
- Dedup by pop('$') on found, not by if word not in found: found.append(word) (the latter is O(W) per check).
- For Python, use a nested dict; explicit TrieNode classes are slower due to attribute lookup.
Tests
- Smoke: the LC example above.
- Unit: single-cell grid, single-letter word; word identical to one row.
- No matches: words with letters not in the grid → [].
- All matches: every word findable.
- Diagonal trap: word that exists only along a diagonal; it should NOT be found.
- Reuse trap: board=[["a","a"]], words=["aaa"] → [] (cell reuse forbidden).
- Stress: M=12, N=12, W=3×10⁴, L=10, random; assert <1s in optimized languages, <5s in Python.
- Adversarial: words with long common prefix (e.g., 1000 variants of "prefix...."), which exercises the trie’s prefix-sharing benefit.
Follow-up Questions
- “What if words can use diagonal adjacency too?” → 8 directions; everything else identical.
- “What if the same cell can be reused?” → no marking needed; but then word length is unbounded by grid size, exponential blowup risk; need a cycle guard (e.g., max-length cap = some threshold).
- “What if you want all paths spelling each word, not just whether it exists?” → don’t dedup; collect on every match.
- “Memory blow-up for huge dictionaries?” → use a compressed trie (radix tree) or DAWG.
- “Distributed: shard grid across machines?” → grids small enough to not matter; for huge grids, partition with overlap of size L-1.
- “Online: words added/removed dynamically?” → trie supports insert/delete in O(L); the search needs no change.
Product Extension
This pattern underlies multi-pattern string matching in DLP (data-loss prevention) — scanning documents for any of a list of forbidden phrases — and in IDE autocomplete-on-context (which words from the dictionary can be formed by adjacent identifiers in scope?). It’s also used in spell-checkers that operate on keyboard layouts (find dictionary words spellable by adjacent keys). The Aho-Corasick algorithm is the streaming generalization (multi-pattern matching in O(text + total-pattern-length + matches)).
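For reference, the Aho-Corasick idea named above can be sketched in a few dozen lines: build the same kind of trie, add a failure link per state (the longest proper suffix of the current match that is also a trie prefix), and scan the text once. This is an illustrative sketch, not a tuned implementation; the names build_automaton and find_all are hypothetical, and patterns are assumed non-empty.

```python
from collections import deque

def build_automaton(patterns):
    """Return (goto, fail, out) arrays indexed by state id; state 0 is the root."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:                       # 1) plain trie insertion
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)

    q = deque(goto[0].values())              # depth-1 states fail to the root
    while q:                                 # 2) BFS so shallower links exist first
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:   # walk up until some suffix extends by ch
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit patterns ending at the suffix
    return goto, fail, out

def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

The difference from the grid lab is that text is one-dimensional, so a mismatch falls back along a failure link instead of backtracking through the board.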
Language/Runtime Follow-ups
- Python: nested dict is the fastest trie representation in Python; class-based is slower due to attribute lookup. dict.setdefault(c, {}) is the canonical insert; dict.pop(key, None) is safe pop with default.
- Java: explicit TrieNode { TrieNode[] next = new TrieNode[26]; String word; } is fastest. HashMap<Character, TrieNode> is slower (autoboxing + hash).
- Go: struct with [26]*TrieNode array. Map-based is slower.
- C++: struct with TrieNode* next[26] = {nullptr}; and string* word. Avoid std::map<char, TrieNode>.
- JS/TS: plain object as a hashmap is fine, but for hot loops, a Map or fixed-array index works.
- Recursion depth: in Python, L = 10 fits easily within the default 1000-frame limit. For deeper word lengths, set sys.setrecursionlimit.
- Mutation safety: in-place grid mutation with '#' is fast but has thread-safety implications; if the function must be re-entrant, use an explicit visited set per call.
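The re-entrant variant from the mutation-safety note can be sketched by replacing the '#' marker with a per-call visited set. The board is only read, so it can even be an immutable tuple of strings. The name find_words_reentrant is hypothetical; everything else mirrors the reference solution.

```python
def find_words_reentrant(board, words):
    """Trie-guided grid DFS that never writes to the board."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = w                        # marker: word ends here

    M, N = len(board), len(board[0])
    found = []
    visited = set()                          # cells on the current DFS path

    def dfs(r, c, parent):
        ch = board[r][c]
        node = parent.get(ch)
        if not node:
            return
        if '$' in node:
            found.append(node.pop('$'))      # dedup: clear marker on first find
        visited.add((r, c))                  # mark
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < M and 0 <= nc < N and (nr, nc) not in visited:
                dfs(nr, nc, node)
        visited.discard((r, c))              # restore, symmetric to the mark
        if not node:                         # prune dead branch
            parent.pop(ch)

    for r in range(M):
        for c in range(N):
            dfs(r, c, trie)
    return found
```

The trade-off is a hash lookup per neighbor instead of a character compare, so expect this to run somewhat slower than the in-place version.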
Common Bugs
- Forgetting to restore the cell. Causes false negatives later: cells stay marked '#' permanently, blocking unrelated words.
- Restoring inside if/return paths. Always restore at the end, after all branches have explored. Easiest: structure as mark; for each direction: recurse; restore;.
- Visited check on the neighbor, not on the current cell. You should refuse to step into a '#' cell, but the current cell mark happens after entering it.
- Adding the same word multiple times. Without pop('$'), a word findable by 5 paths gets reported 5 times.
- Walking the trie root-back-to-root for each starting cell in non-trie code, re-enumerating words you’ve already found.
- Boxing in Java’s Map<Character, TrieNode>: silent ~3× slowdown vs TrieNode[26].
- Using word in found (O(W)) instead of pop('$') (O(1)) for dedup.
- Pruning incorrectly: popping the trie node’s '$' marker but leaving its empty dict in the parent, so exhausted subtrees are revisited as empty traversals. Always check if not node: parent.pop(ch) after recursion.
Debugging Strategy
- Build a tiny trie by hand for ["oath", "oat"] and verify the structure with a print.
- Trace one DFS from cell (0,0) for the smoke example. The cell o matches root’s 'o' child; descend; mark; try neighbors; etc. Verify oath is found exactly once.
- Test the prune with a single word: after finding "oath", the trie should be empty. Subsequent starts at any cell return immediately.
- Cross-check against the brute-force LC 79 per word for the smoke and stress tests on M=N=4, W=10.
Mastery Criteria
- Recognized “many words in a grid” as trie + DFS within 60 seconds.
- Built the trie in ≤ 8 lines with setdefault.
- Wrote the DFS with mark; recurse; restore symmetry on first attempt.
- Implemented dedup via pop('$') and pruning via if not node: parent.pop(ch).
- Articulated the W factor savings vs running LC 79 W times.
- Identified the language-specific trie-representation trade-off.
- Solved LC 79 (single-word) in <8 minutes within a week.
- Solved LC 1268 (Search Suggestions System — autocomplete) within two weeks.
Lab 09 — Heap For Top-K: Top K Frequent Elements (Heap Vs Bucket Sort)
Goal
Master the two canonical “top K” algorithms — min-heap of size K for streaming/general cases and bucket sort by frequency for bounded-frequency cases. Deliverable solves LC 347 with both, articulates the time-space trade-off, and recognizes which language-specific gotcha (Python heapq is min-only; Java PriorityQueue boxes; C++ defaults to max-heap) applies.
Background Concepts
Min-heap of size K as the canonical “running top K” structure: pop when size exceeds K, ensuring O(N log K). Bucket sort by frequency (not value) when frequencies are bounded by N. The duality “top K frequent” / “K largest” / “K smallest” via heap-direction inversion. QuickSelect as the O(N) average alternative. Review pattern Heap Top K and Heap Foundations.
Interview Context
Top-K problems are interview gold — they appear at every Big Tech, often as the warmup or the second problem. The interview signal is whether you can match the right structure to the input shape: heap when N is huge or streaming, bucket sort when frequencies are bounded (which they always are in this problem since the max frequency is N). Strong candidates write the heap solution. Elite candidates write the heap solution, then also mention the O(N) bucket-sort alternative and articulate when each is preferred.
Problem Statement
Given an integer array nums and an integer K, return the K most frequent elements. The answer can be returned in any order.
Constraints
- 1 ≤ |nums| ≤ 10^5
- -10^4 ≤ nums[i] ≤ 10^4
- K is in the range [1, |distinct values in nums|].
- The answer is unique (no ties at the K-th position that would create ambiguity).
Clarifying Questions
- Tie-breaking — what if two values share the K-th frequency rank? (Per constraints, the answer is unique. But if it weren’t, ask: arbitrary, or some specified rule like smallest value first?)
- Output order — sorted by frequency, by value, or any? (LC: any.)
- Are floats / strings possible, or strictly ints? (Per constraints, ints — but the algorithm extends trivially to any hashable type.)
- Streaming or offline? (Offline, but extending to streaming — windowed top-K — is a follow-up.)
- Memory-constrained? (No special constraint, but the algorithm should be O(N) space because the frequency map is unavoidable.)
Examples
| Input | Output |
|---|---|
| nums=[1,1,1,2,2,3], K=2 | [1,2] |
| nums=[1], K=1 | [1] |
| nums=[4,1,-1,2,-1,2,3], K=2 | [-1,2] |
Initial Brute Force
Count frequencies; sort by frequency; take top K.
from collections import Counter

def top_k_brute(nums, K):
    cnt = Counter(nums)
    return [x for x, _ in sorted(cnt.items(), key=lambda kv: -kv[1])[:K]]
Brute Force Complexity
Time O(N + D log D) where D is the number of distinct values. Space O(D). At N=10⁵, D≤2×10⁴+1, this is fast — but it sorts more than needed (full O(D log D) when we only want top K).
Optimization Path A — Min-Heap of Size K
Build a frequency map. Walk the (value, freq) pairs maintaining a min-heap of size K keyed by freq. For each pair, push; if heap size > K, pop. The K survivors are the top K.
Why min-heap? We want to drop the smallest frequency when the heap overflows; min-heap.peek() gives the smallest in O(1).
Time: O(N) frequency count + O(D log K) heap. For D >> K, this is much faster than O(D log D).
Optimization Path B — Bucket Sort by Frequency
Frequencies are integers in [1, N]. Create buckets bucket[f] = list of values with freq == f. Walk f from N down to 1, collecting values into the result until we have K.
Time: O(N). Space: O(N). The cleanest O(N) solution.
Final Expected Approach (Heap)
import heapq
from collections import Counter

def top_k_frequent_heap(nums, K):
    cnt = Counter(nums)
    heap = []  # (freq, value), min-heap
    for v, f in cnt.items():
        heapq.heappush(heap, (f, v))
        if len(heap) > K:
            heapq.heappop(heap)
    return [v for _, v in heap]
Final Expected Approach (Bucket Sort)
from collections import Counter

def top_k_frequent_bucket(nums, K):
    cnt = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    for v, f in cnt.items():
        buckets[f].append(v)
    out = []
    for f in range(len(nums), 0, -1):
        for v in buckets[f]:
            out.append(v)
            if len(out) == K: return out
    return out
Data Structures Used
- Heap approach: Counter (hashmap) + min-heap of size K.
- Bucket approach: Counter + a list-of-lists buckets indexed by frequency.
Correctness Argument (Heap)
Invariant: after processing the first i distinct values, the heap contains the top-min(K, i) most frequent among them. Adding the next value either grows the heap (if size < K) or replaces the min if the new freq exceeds it (push then pop-if-over-K does both correctly: push always grows; pop removes the smallest, which is the new value if it’s the smallest, leaving the heap unchanged).
Final step: after processing all D values, heap = top K. ✓
Correctness Argument (Bucket)
Frequencies range in [1, N], so buckets indexed [0..N] capture all. Walking f from high to low and collecting until K values found gives exactly the K most-frequent (with arbitrary tie-breaking within a bucket, acceptable per constraints).
Complexity
| Approach | Time | Space |
|---|---|---|
| Brute (full sort) | O(N + D log D) | O(D) |
| Heap | O(N + D log K) | O(D + K) |
| Bucket | O(N) | O(N) |
| QuickSelect | O(N) average, O(N²) worst | O(D) |
For D ≈ N and K ≪ D, heap and bucket are both linear-class; bucket’s hidden constant is smaller. For streaming, heap is the natural choice.
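The QuickSelect row has no reference code in this lab, so here is one possible sketch: count frequencies, then partition the (value, freq) pairs in descending frequency order until index K-1 is settled. The function name top_k_frequent_quickselect is hypothetical, and the Hoare-style partition with a random pivot is one choice among several.

```python
import random
from collections import Counter

def top_k_frequent_quickselect(nums, K):
    """Average O(N): after selection, items[:K] hold the K highest frequencies."""
    items = list(Counter(nums).items())      # D (value, freq) pairs
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)][1]
        i, j = lo, hi
        while i <= j:                        # Hoare partition, descending by freq
            while items[i][1] > pivot:
                i += 1
            while items[j][1] < pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Now items[lo..j] have freq >= pivot and items[i..hi] have freq <= pivot.
        if K - 1 <= j:
            hi = j                           # the K-th slot is in the left part
        elif K - 1 >= i:
            lo = i                           # the K-th slot is in the right part
        else:
            break                            # index K-1 sits between j and i: settled
    return [v for v, _ in items[:K]]
```

The random pivot keeps the O(N²) worst case unlikely on adversarial input; it does not remove it.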
Implementation Requirements
- Use a min-heap, even though we want the largest K — popping the min when overflowing keeps the largest K in the heap.
- For Python’s heapq, push tuples (freq, value); the comparison is lexicographic, so freq dominates.
- Java’s PriorityQueue is a min-heap by default. No comparator inversion needed for “top-K largest” via the min-heap-of-size-K trick. Push new int[]{freq, value} with Comparator.comparingInt(a -> a[0]).
- C++’s priority_queue is a max-heap by default. For min-heap: priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>>.
- For bucket sort, size buckets N + 1 (frequencies 1..N, plus index 0 unused).
Tests
- Smoke: ([1,1,1,2,2,3], 2) → [1,2] (in any order).
- Unit: K = 1 → most frequent only; K = D → all distinct values.
- All distinct values: [1,2,3,4,5]. Every frequency ties at 1, so any K < 5 has no unique answer and falls outside the problem’s guarantee; test K = 5, which must return all five values.
- All same: [7]*100, K=1 → [7].
- Negative values: [-1,-1,1,1,2], K=2 → [-1,1] in any order. (K = 1 would be ambiguous here because -1 and 1 tie at frequency 2, so it is not a valid input under the uniqueness guarantee.)
- Large: N = 10⁵, K = 10, random values; both approaches sub-50ms.
- Adversarial: all-distinct → bucket sort touches all of bucket[1]; heap pushes D items.
Follow-up Questions
- “Streaming top-K — elements arrive one at a time, query top K at any moment.” → maintain a min-heap of size K plus a hashmap; per element, look up old freq, decrement-and-rebuild. Or use Misra-Gries / Space-Saving sketches for approximate.
- “Top K from multiple sorted streams (LC 23 generalized)?” → merge-K-sorted via min-heap.
- “K closest points to origin (LC 973)?” → max-heap of size K keyed by distance, push and pop-if-over-K.
- “Kth largest element (LC 215)?” → min-heap of size K → root is K-th largest. Or QuickSelect for O(N) average.
- “Sliding window top-K?” → monotonic deque (LC 239 for K=1) or balanced BST (Phase 3).
- “Memory-constrained: top K of a billion items?” → heap-of-K only — O(K) memory, O(N log K) time.
Product Extension
Recommendation systems, trending topics, top-N queries on dashboards, “most frequent error code in last 5 minutes” log monitors: all are top-K problems. Real-world systems use approximate algorithms (Count-Min Sketch + heap) for frequency estimation at internet scale, but the exact algorithm is what an interview is testing. The Misra-Gries summary (heavy hitters) is the streaming, memory-bounded, approximate counterpart of the exact counter-plus-heap approach.
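To make the Misra-Gries reference concrete, here is a minimal sketch. It keeps at most k-1 counters; any item occurring more than n/k times in a stream of length n is guaranteed to survive, and each surviving count underestimates the true frequency by at most n/k. The function name misra_gries and the parameter k are illustrative choices.

```python
def misra_gries(stream, k):
    """One pass, at most k-1 counters: candidates for items with frequency > n/k."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free slot: decrement every counter and drop the ones that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The result is a candidate set, not an exact answer: a second pass over the data (when one is possible) is what confirms true frequencies.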
Language/Runtime Follow-ups
- Python: heapq is min-only. To use it as max-heap, push -x (or (-freq, value) and decode at the end). Counter(nums).most_common(K) is a one-liner shortcut; mention it in the interview as the “Pythonic” answer, but be ready to write the algorithm by hand.
- Java: PriorityQueue<int[]> with Comparator.comparingInt(a -> a[0]) is the canonical pattern. Boxing tax if you use PriorityQueue<Integer>. Sorting via Arrays.sort(arr, comparator) works on Integer[] but not int[].
- Go: container/heap requires implementing 5 methods (Len, Less, Swap, Push, Pop). Verbose. For Top-K problems, a simple sort.Slice(items, less) and slicing top K is often clearer.
- C++: priority_queue<int> is a max-heap by default. For min-heap: priority_queue<int, vector<int>, greater<int>>. The top() is O(1), pop() is O(log N) and returns void; you must call top() first.
- JS/TS: no built-in heap. Implement (~30 LOC) or use a library. For interview, Array.sort with a comparator and slicing is often acceptable for offline cases; mention you’d need a heap for streaming.
- Heap tuple comparison: Python compares tuples lexicographically; if freq ties, comparison falls back to value, which raises TypeError for non-comparable types. Wrap in a class with __lt__ defined on freq only, or insert a unique tie-breaker ahead of the value, as in (freq, seq, value).
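One way to sidestep the tuple-comparison pitfall is to put a unique sequence number between the frequency and the value, so a frequency tie is always resolved before Python ever compares two values. A sketch, with the hypothetical name top_k_frequent_any_hashable:

```python
import heapq
from collections import Counter
from itertools import count

def top_k_frequent_any_hashable(items, K):
    """Min-heap of size K that works even when values cannot be ordered."""
    cnt = Counter(items)
    seq = count()                    # unique, increasing tie-breaker
    heap = []
    for value, freq in cnt.items():
        # (freq, seq, value): ties on freq are settled by seq, never by value
        heapq.heappush(heap, (freq, next(seq), value))
        if len(heap) > K:
            heapq.heappop(heap)
    return [value for _, _, value in heap]
```

For example, top_k_frequent_any_hashable(["a", "a", 1, 1, None], 2) returns "a" and 1 in some order, whereas plain (freq, value) tuples would attempt to compare "a" with 1 and raise TypeError.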
Common Bugs
- Using a max-heap of size D instead of a min-heap of size K — wastes time and space; O(D log D) instead of O(D log K).
- Java boxing in PriorityQueue<Integer>: silent ~2-3× slowdown.
- Python heapq confusion: forgetting it’s min-only; writing heappush(heap, -x) for max-heap and forgetting to negate when reading. Use a tuple with negated key cleanly: heappush(heap, (-freq, value)).
- C++ default direction wrong: priority_queue<int> is max-heap; using it for “min-heap of size K” gives wrong results.
- Heap size check after push: if len(heap) > K: pop is correct. If you swap to if len(heap) >= K: pop you’ll never have K items.
- Bucket-sort allocation cost: [[] for _ in range(N + 1)] is O(N); the C++ equivalent is vector<vector<int>> buckets(N + 1);. Same idea, just be aware of the per-bucket overhead.
- Unstable comparison on ties in heap tuples: for (freq, value), tied freqs compare values. If values are non-comparable (custom objects), this errors. Pre-encode or wrap.
Debugging Strategy
- Trace ([1,1,1,2,2,3], 2). Counter: {1:3, 2:2, 3:1}. Heap evolution: push (3,1) → heap [(3,1)], size 1 ≤ 2 ✓; push (2,2) → [(2,2),(3,1)], size 2 ≤ 2 ✓; push (1,3) → [(1,3),(3,1),(2,2)], size 3 > 2 → pop (1,3); heap = [(2,2),(3,1)]. Result: values [2, 1]. ✓
- Bucket trace: counter same; buckets [[], [3], [2], [1], [], [], []] (N + 1 = 7 slots). Walk f=6,5,4 (empty), f=3 → take 1; f=2 → take 2; size=2, return [1, 2]. ✓
- Cross-check the two approaches on 50 random inputs (compare as sets). Random inputs can tie at the K-th rank, where the two approaches may legitimately pick different values; in that case compare the frequencies of the returned values instead.
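A cross-check harness along these lines might look like the sketch below. Because random inputs do not honor the unique-answer guarantee, it compares the multiset of frequencies each solver returns against the true top-K frequencies rather than comparing value sets directly. The names top_k_heap, top_k_bucket, and cross_check are hypothetical; the two solvers restate the lab’s approaches so the block runs on its own.

```python
import heapq
import random
from collections import Counter

def top_k_heap(nums, K):
    heap = []                                   # (freq, value) min-heap of size K
    for v, f in Counter(nums).items():
        heapq.heappush(heap, (f, v))
        if len(heap) > K:
            heapq.heappop(heap)
    return [v for _, v in heap]

def top_k_bucket(nums, K):
    buckets = [[] for _ in range(len(nums) + 1)]
    for v, f in Counter(nums).items():
        buckets[f].append(v)
    out = []
    for f in range(len(nums), 0, -1):           # highest frequency first
        for v in buckets[f]:
            out.append(v)
            if len(out) == K:
                return out
    return out

def cross_check(trials=50, seed=0):
    rng = random.Random(seed)                   # fixed seed: failures are reproducible
    for _ in range(trials):
        nums = [rng.randint(-5, 5) for _ in range(rng.randint(1, 30))]
        cnt = Counter(nums)
        K = rng.randint(1, len(cnt))
        expected = sorted(cnt.values(), reverse=True)[:K]
        for solver in (top_k_heap, top_k_bucket):
            got = solver(nums, K)
            assert len(set(got)) == K, (solver.__name__, nums, K)
            # Tie-tolerant check: the returned values must carry the top-K frequencies.
            assert sorted((cnt[v] for v in got), reverse=True) == expected, \
                (solver.__name__, nums, K)
    return True
```

Printing the failing (nums, K) pair in the assertion message gives you a ready-made regression test when something breaks.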
Mastery Criteria
- Recognized “Top K” pattern within 30 seconds.
- Wrote the min-heap-of-size-K solution within 6 minutes.
- Mentioned bucket sort as the O(N) alternative when frequencies are bounded.
- Identified the language-specific heap-direction trap (Python min-only, Java min-default, C++ max-default).
- Solved LC 215 (Kth Largest), LC 973 (K Closest), LC 23 (Merge K Sorted Lists) within a week — same pattern, different keys.
- Discussed QuickSelect as an O(N)-average alternative when asked.
Phase 3 — Advanced Data Structures
Target level: Medium → Hard
Expected duration: 3 weeks (12-week track) / 3 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~8 advanced structures per week + 30–60 problems applying them under the framework
Why Advanced Data Structures Unlock Hards
Phase 2 gave you 28 patterns that solve the vast majority of Mediums. The patterns work because each one carries an O(N) or O(N log N) algorithm in its template — you recognize the signal, instantiate the template, and the runtime falls out for free.
Hards are different. The signal still fires — you still recognize “this is sliding window with a tricky max”, “this is DP with a state transition”, “this is shortest path with a constraint” — but the vanilla template’s complexity is one factor too high. A sliding-window max over a stream of N updates becomes O(N²) with a sorted list. A DP with N=20 and “set of visited” in the state explodes to O(2^N · N²) without bitmask compression. A range-sum problem with both updates and queries blows past prefix sums. A string match against a pattern of length M in a text of length N is O(N·M) with naive comparison; that’s 10^10 ops at N=10^5, M=10^5.
The advanced data structures in this phase are the augmented engines that bring Hard problems back into reach. Each one is a 1–2 log-factor improvement over a naive structure. They are not “tricks”. They are well-defined, well-proven structures with known invariants, well-understood failure modes, and known operating ranges. The skill is not to invent them — it is to recognize when the vanilla template is one log factor short, identify which augmented structure plugs the hole, and instantiate it correctly under interview pressure.
There are roughly three families:
- Range query / range update structures — segment tree, Fenwick tree, sparse table, sqrt decomposition. These turn O(N) per range query into O(log N) or O(√N), and (with augmentation) handle range updates the same way. They show up whenever the problem has a sequence and you need both updates and aggregates over arbitrary subranges in the same workload.
- String-matching / hashing / suffix structures — KMP, Z, Manacher, rolling hash, suffix array, suffix automaton, Aho-Corasick, tries. These bring per-character work down from O(M) (full pattern recompare) to O(1) amortized, enabling O(N+M) or O(N log N) algorithms over strings. They show up whenever the problem mentions “substring”, “match”, “occurrence”, “palindrome”, or “common”.
- State-compression and amortization — bitmask DP, meet-in-the-middle, DSU with α(N), bit manipulation idioms, Bloom/skip/LRU-LFU. These exploit problem-specific structural facts (small N, splittable input, near-constant amortized work, probabilistic acceptance) to clear constraints that naive DP/search cannot.
You will not memorize 24 implementations cold. You will understand the invariants well enough to derive each implementation in 5–15 minutes under pressure, and instantly recognize which one is needed from the problem signal.
After this phase you can solve unmistakably-Hard LeetCode problems on first attempt: range queries with updates, palindromic counts in linear time, multi-pattern matching, exact-cover by bitmask DP, subset-sum at N=40 by meet-in-the-middle, equation-solving by weighted DSU, dynamic LRU caches. You also become visibly stronger in mock interviews because you no longer flinch at “what if the input is updated?”, “what if N is 40?”, or “what if there are 10^5 patterns to match?”.
What You Will Be Able To Do After This Phase
- For any range-query Hard, identify within 60 seconds whether vanilla prefix sums suffice, or whether Fenwick / segment tree / sparse table / sqrt decomposition is required, and why.
- Implement a segment tree (point update, range query) from memory in <12 minutes.
- Add lazy propagation when the problem demands range updates, and articulate the lazy-tag-push invariant.
- Recognize a string-match Hard and pick the right tool: KMP (single pattern), Z (border + offsets), Manacher (palindromes), rolling hash (probabilistic equality, multi-substring), Aho-Corasick (multi-pattern), suffix array/automaton (overview-level for “longest common substring”, “distinct substrings”).
- Build a trie augmented with counts / deletion / prefix-sum cache for word-search and autocomplete-class problems.
- Recognize bitmask DP from the N ≤ 20 constraint, formulate the state, and implement the transition without bugs.
- Recognize meet-in-the-middle from N ≤ 40 (split into 20+20) and code the two-half merge.
- Implement DSU with path compression + union by rank, and prove the α(N) amortized bound.
- Use bit manipulation idioms (popcount, lowbit, isolate trailing one, parity tricks) without thinking.
How To Read This Phase
Read the inline reference below in two passes. Pass 1: linear, end to end, to assemble a mental map of which structure plugs which hole. Pass 2: as you work through the labs, refer back to the structure entries to clarify invariants and pitfalls. Each entry has a fixed shape:
- When to use — the problem signal that should fire this structure within 2 minutes of reading.
- Complexity — build, query, update, space.
- Implementation pitfalls — the bugs that consume the most interview minutes.
- Classic problems — 3–6 representative problems where the structure is the intended solution.
Where labs cover the structure hands-on, the entry references the lab. Where the structure is overview-only (rare in interviews but expected of strong candidates), the entry says so explicitly.
Inline Advanced Data Structure Reference
1. Segment Tree (point update + range query)
When To Use
- Sequence of N elements, with both point updates and range aggregates (sum / min / max / gcd / xor) on the same workload.
- The aggregate is associative — i.e., it can be combined from two disjoint segments.
- Q queries + Q updates, with N · Q too large for naive O(N) per query and prefix sums (which are O(1) per query but O(N) per update) too rigid for the update mix.
Complexity
Build O(N). Point update O(log N). Range query O(log N). Space O(4N) — the tree array is conventionally sized 4N to fit any nearly-balanced binary tree on N leaves.
Key Implementation Pitfalls
- Off-by-one on the recursive boundaries — query(node, nl, nr, ql, qr): total miss is qr < nl or ql > nr (closed intervals); total cover is ql <= nl and nr <= qr. Mixing open and closed intervals is the #1 cause of broken segment trees.
- Sizing the tree array — 4N is safe; 2 * next_power_of_two(N) is tight. If you pick 2N, a recursive tree indexes out of bounds (a segfault in C++) for non-power-of-two N.
- Combining two subtree results — for sum, just add; for min/max, take the extreme; for gcd, take gcd(a, b). Define a single combine(a, b) so you can swap aggregates without rewriting the body.
- Recursive vs iterative — iterative segment tree with n rounded to power of 2, leaves at [n, 2n), is shorter and faster. Pick one style and stick to it.
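The pitfalls above can be made concrete with a minimal sketch of an iterative, point-update segment tree. The class name SegmentTree and its method names are illustrative, not taken from the labs. This variant stores leaves at [n, 2n) without rounding n up to a power of two, which is an assumption of this sketch and works for associative aggregates.

```python
class SegmentTree:
    """Iterative segment tree; `combine` must be associative (illustrative sketch)."""

    def __init__(self, data, combine=lambda a, b: a + b, identity=0):
        self.n = len(data)
        self.combine = combine
        self.identity = identity
        # Leaves live at [n, 2n); internal node i has children 2i and 2i + 1.
        self.tree = [identity] * self.n + list(data)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, pos, value):
        """Set data[pos] = value, then recompute every ancestor."""
        i = pos + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.combine(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def query(self, left, right):
        """Aggregate over the closed interval [left, right]."""
        res_left, res_right = self.identity, self.identity
        lo, hi = left + self.n, right + self.n + 1  # half-open [lo, hi) internally
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step right
                res_left = self.combine(res_left, self.tree[lo])
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it
                hi -= 1
                res_right = self.combine(self.tree[hi], res_right)
            lo //= 2
            hi //= 2
        return self.combine(res_left, res_right)
```

Swapping the aggregate only requires a different combine and identity, for example SegmentTree(data, min, float("inf")) for range minimum.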
Classic Problems
- LeetCode 307 — Range Sum Query Mutable
- LeetCode 308 — Range Sum Query 2D Mutable (with row segment trees)
- LeetCode 1157 — Online Majority Element In Subarray (segment tree of candidates + frequency check)
- LeetCode 715 — Range Module (segment tree of intervals, or coordinate-compressed)
Hands-on: see Lab 01.
2. Segment Tree With Lazy Propagation (range update + range query)
When To Use
- Same as #1 but updates affect ranges, not single points: “add v to all elements in [l, r]”, “set all elements in [l, r] to v”, “flip all elements in [l, r]”.
- The update operation has a clean composition rule: applying update u₁ then u₂ to a node is equivalent to a single combined update.
Complexity
All ops O(log N). Space O(4N) for the values + O(4N) for the lazy tags.
Key Implementation Pitfalls
- Pushdown order — before recursing into children, push the parent’s lazy tag down. After recursing, recompute the parent’s value from the children. Forgetting either side breaks the structure silently — queries return stale values.
- Composing lazy tags — for “set” and “add” mixed, “set” must override the pending “add”. Define a clear apply(child_lazy, parent_lazy) rule and write it down before coding.
- Tag identity / no-op — every lazy slot needs an “identity” value (e.g., 0 for add, sentinel for set) that means “no pending op”. Don’t conflate “identity” with a legal value.
- Range-set with empty intersection — apply only on full-cover nodes; recurse only on partial overlap.
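To illustrate the push-down and recompute order, here is a minimal sketch of a lazy segment tree for range add + range sum. The class LazySegmentTree and its helper names are hypothetical. It assumes closed intervals and uses 0 as the "no pending op" tag, which is safe for add but would not be safe for set.

```python
class LazySegmentTree:
    """Range add + range sum over closed intervals [l, r] (illustrative sketch)."""

    def __init__(self, data):
        self.n = len(data)
        self.total = [0] * (4 * self.n)  # subtree sums
        self.lazy = [0] * (4 * self.n)   # pending per-element add; 0 means no pending op
        if self.n:
            self._build(1, 0, self.n - 1, data)

    def _build(self, node, nl, nr, data):
        if nl == nr:
            self.total[node] = data[nl]
            return
        mid = (nl + nr) // 2
        self._build(2 * node, nl, mid, data)
        self._build(2 * node + 1, mid + 1, nr, data)
        self.total[node] = self.total[2 * node] + self.total[2 * node + 1]

    def _apply(self, node, nl, nr, delta):
        # Apply an add to a whole node: fix its sum, remember the tag for children.
        self.total[node] += delta * (nr - nl + 1)
        self.lazy[node] += delta

    def _push_down(self, node, nl, nr):
        if self.lazy[node]:
            mid = (nl + nr) // 2
            self._apply(2 * node, nl, mid, self.lazy[node])
            self._apply(2 * node + 1, mid + 1, nr, self.lazy[node])
            self.lazy[node] = 0

    def add(self, ql, qr, delta, node=1, nl=0, nr=None):
        if nr is None:
            nr = self.n - 1
        if qr < nl or nr < ql:            # total miss
            return
        if ql <= nl and nr <= qr:         # total cover: tag and stop
            self._apply(node, nl, nr, delta)
            return
        self._push_down(node, nl, nr)     # push before recursing
        mid = (nl + nr) // 2
        self.add(ql, qr, delta, 2 * node, nl, mid)
        self.add(ql, qr, delta, 2 * node + 1, mid + 1, nr)
        # Recompute after recursing.
        self.total[node] = self.total[2 * node] + self.total[2 * node + 1]

    def query(self, ql, qr, node=1, nl=0, nr=None):
        if nr is None:
            nr = self.n - 1
        if qr < nl or nr < ql:
            return 0
        if ql <= nl and nr <= qr:
            return self.total[node]
        self._push_down(node, nl, nr)
        mid = (nl + nr) // 2
        return (self.query(ql, qr, 2 * node, nl, mid)
                + self.query(ql, qr, 2 * node + 1, mid + 1, nr))
```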
Classic Problems
- LeetCode 732 — My Calendar III (count of overlapping intervals — coord-compressed lazy seg tree)
- LeetCode 2569 — Handling Sum Queries After Update (range flip + range sum, lazy + bitmask)
- Codeforces “EDU: Segment Tree” sections
Hands-on: see Lab 02.
3. Fenwick Tree / Binary Indexed Tree (prefix sums with updates)
When To Use
- Workload is point updates + prefix queries (or range queries via two prefix queries).
- The aggregate is invertible (sum, xor) — Fenwick can’t express min/max naturally because they’re not invertible.
- You want a smaller, faster, simpler structure than a segment tree, with smaller constants and easier code.
Complexity
Build O(N) (or O(N log N) trivially). Point update O(log N). Prefix query O(log N). Range query (for sum/xor) = query(r) - query(l - 1). Space O(N).
Key Implementation Pitfalls
- 1-indexed — Fenwick trees are conventionally 1-indexed. Calling update(0, v) infinite-loops because 0 & -0 == 0. Use update(i + 1, v) if your data is 0-indexed.
- i += i & -i (update) vs i -= i & -i (query) — the directions are not symmetric. Memorize: update goes up, query goes down.
- Range-update + point-query is a different Fenwick variant — subtract on r+1, add on l. Not the same code path.
- Range-update + range-query needs two Fenwick trees (the BIT² trick). Out of interview scope unless the problem explicitly demands.
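A minimal sketch of the point-update + prefix-query variant follows. The class name Fenwick and the choice to expose a 0-indexed API over a 1-indexed internal array are assumptions of this example.

```python
class Fenwick:
    """Point update + prefix sum; 0-indexed API over a 1-indexed array (sketch)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):
        """Add delta to element i (0-indexed)."""
        i += 1  # shift to 1-indexed so that i & -i is never 0
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # update goes up

    def prefix_sum(self, i):
        """Sum of elements 0..i inclusive; returns 0 for i == -1."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # query goes down
        return total

    def range_sum(self, left, right):
        """Sum over the closed interval [left, right]."""
        return self.prefix_sum(right) - self.prefix_sum(left - 1)
```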
Classic Problems
- LeetCode 307 — Range Sum Query Mutable (canonical)
- LeetCode 315 — Count of Smaller Numbers After Self (canonical Fenwick on coord-compressed values)
- LeetCode 327 — Count of Range Sum
- LeetCode 493 — Reverse Pairs
Hands-on: see Lab 03.
4. 2D Fenwick Tree (matrix prefix sums with updates)
When To Use
- Matrix problems with both point updates and rectangle-sum queries.
- A static 2D prefix sum would suffice without updates, but each update forces an O(NM) recompute, so use a 2D Fenwick tree.
- Coordinate-compressed if the matrix is sparse (most cells empty).
Complexity
Update O(log N · log M). Rectangle query O(log N · log M). Space O(N · M).
Key Implementation Pitfalls
- Nested i & -i — outer loop walks the row index, inner loop walks the column index. Independent.
- Inclusion-exclusion for rectangle sum: Q(r2, c2) - Q(r1-1, c2) - Q(r2, c1-1) + Q(r1-1, c1-1). Forgetting the + corner is the canonical bug.
- Sparse matrices — if N · M = 10^10 you cannot allocate the array. Use a dict of dict or coordinate compression along each axis.
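The nested loops and the inclusion-exclusion formula can be sketched as follows. Fenwick2D and its method names are illustrative, and the sketch assumes a dense matrix small enough to allocate.

```python
class Fenwick2D:
    """Point update + rectangle sum on a dense rows x cols grid (sketch)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.tree = [[0] * (cols + 1) for _ in range(rows + 1)]

    def update(self, r, c, delta):
        """Add delta to cell (r, c), 0-indexed."""
        i = r + 1
        while i <= self.rows:
            j = c + 1  # the column walk restarts for every row step
            while j <= self.cols:
                self.tree[i][j] += delta
                j += j & -j
            i += i & -i

    def prefix(self, r, c):
        """Sum of the rectangle (0, 0)..(r, c) inclusive; 0 if r or c is -1."""
        total = 0
        i = r + 1
        while i > 0:
            j = c + 1
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total

    def rect_sum(self, r1, c1, r2, c2):
        """Sum of the closed rectangle (r1, c1)..(r2, c2)."""
        return (self.prefix(r2, c2) - self.prefix(r1 - 1, c2)
                - self.prefix(r2, c1 - 1) + self.prefix(r1 - 1, c1 - 1))
```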
Classic Problems
- LeetCode 308 — Range Sum Query 2D Mutable (canonical)
- LeetCode 327 — Count of Range Sum (often reduced to 2D via coords)
- LeetCode 1505 — Minimum Possible Integer After at Most K Adjacent Swaps On Digits (the standard solution uses a 1D BIT over digit positions)
5. Sparse Table (immutable RMQ)
When To Use
- The array is static (no updates) and you need idempotent range queries (min, max, gcd, “is there a 1 in this range”).
- O(1) query is required (segment tree’s O(log N) is too slow).
- Builds in O(N log N) — fine for one-time use.
Complexity
Build O(N log N). Query O(1) for idempotent ops, O(log N) for non-idempotent ops (sum). Space O(N log N).
Key Implementation Pitfalls
- Idempotent op required for O(1) — the trick is query(l, r) = combine(table[k][l], table[k][r - 2^k + 1]) where k = floor(log2(r - l + 1)). The two halves overlap; this only works if combining the same element twice equals once. Sum is not idempotent (counts twice). Min, max, gcd, “or”, “and” are.
- floor(log2(len)) precomputation — build a log_floor[] table of size N (or use an integer bit-length intrinsic) for true O(1). A floating-point log2 per query is slower and can round incorrectly near powers of two.
- Memory — at N = 10^6, N log N ≈ 2 × 10^7 ints = 80 MB. Plan for it.
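A minimal sketch of a range-minimum sparse table with a precomputed floor-log table follows. The class name SparseTableMin is illustrative, and the sketch hardcodes min as the idempotent operation.

```python
class SparseTableMin:
    """Static range-minimum queries in O(1) after O(N log N) build (sketch)."""

    def __init__(self, data):
        n = len(data)
        # log[len] = floor(log2(len)), built with integer arithmetic only.
        self.log = [0] * (n + 1)
        for length in range(2, n + 1):
            self.log[length] = self.log[length // 2] + 1
        # table[k][i] = min of data[i .. i + 2^k - 1]
        self.table = [list(data)]
        k = 1
        while (1 << k) <= n:
            prev = self.table[k - 1]
            half = 1 << (k - 1)
            self.table.append(
                [min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)]
            )
            k += 1

    def query(self, left, right):
        """Minimum over the closed interval [left, right]."""
        k = self.log[right - left + 1]
        # The two windows overlap; that is harmless because min is idempotent.
        return min(self.table[k][left], self.table[k][right - (1 << k) + 1])
```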
Classic Problems
- LeetCode 1851 — Minimum Interval to Include Each Query (one approach uses sparse table + offline sort)
- Codeforces RMQ classics
Hands-on: see Lab 04.
6. Sqrt Decomposition (block-based queries)
When To Use
- Workload mixes point/range updates and range queries, but the operation is hard to fit into segment tree (e.g., “sum of distinct values in a range”, “k-th smallest in a range with offline queries”).
- You want a much simpler implementation than a segment tree, accepting an O(√N) factor.
- Mo’s algorithm (offline range queries, total cost O((N+Q) · √N)) is a sqrt-decomp specialization.
Complexity
Build O(N). Query/update O(√N) per op. Space O(N).
Key Implementation Pitfalls
- Block size choice — block size B = ⌈√N⌉ minimizes total cost Q · (N/B + B). Slightly larger B (e.g., 1.5√N) sometimes wins for cache reasons.
- Edge of block — left partial block (l to end of block) and right partial block (start of block to r) handled as scalars; full middle blocks summed via block totals.
- Mo’s algorithm — sort queries by (block of l, r) (with even/odd block hack to halve constants), then move pointers. Don’t confuse “block of l” sorting with sorting by l.
Classic Problems
- “Range Distinct Count” (offline) — Mo’s classic.
- LeetCode 850 — Rectangle Area II (sweep + sqrt-block coord compression)
- Codeforces “EDU: Sqrt Decomposition”
7. Persistent Segment Tree (overview)
When To Use
- You need to query prior versions of an array — “sum over [l, r] as of version v”, “k-th smallest in [l, r] (offline, treat as 2D)”.
- Classic application: “find k-th smallest in subarray” by building one persistent version per array prefix and, during the descent, subtracting the counts in version l-1 from those in version r.
- Rare in standard interviews; expected for Grandmaster / CP-style.
Complexity
Update O(log N) creating a new version (path-copy). Query on any version O(log N). Space O(N + Q log N) over Q versions.
Key Implementation Pitfalls
- Reference each version’s root — store an array roots[v]; never mutate an old node.
- Memory blow-up — N + Q log N at N = 10^5, Q = 10^5 ≈ 1.7M nodes × ~24 bytes = 40 MB. Pre-allocate node pool.
- Garbage collection — in GC’d languages, hold the root references to keep nodes alive; in C++, use a node pool + manual indices.
Classic Problems
- “K-th smallest in range” (offline-via-persistent-seg-tree).
- “Count distinct in range” (with persistent seg tree of last-occurrence positions).
This is overview-only in this phase; you should know it exists, what it solves, and the rough cost. Implementing it is a Phase 7 exercise.
8. Treap / Implicit Treap (overview)
When To Use
- Balanced BST with expected O(log N) operations via randomized priorities — simpler than red-black or AVL.
- Implicit treap keys by position in an array, supporting O(log N) array splice / split / merge / range reverse / range sum — classic for “rope” data structures.
- Order-statistics tree (find k-th, count less than) when sorted-container library doesn’t expose it.
Complexity
Insert / delete / split / merge / range-op all O(log N) expected.
Key Implementation Pitfalls
- Heap property on random priorities — re-bubble after insertion / deletion. Without correct rotations, you lose the log bound.
- Lazy reverse / lazy add on implicit treap mirrors lazy segment tree — push tag before recursing.
- Expected vs worst case — adversarial input can’t degrade because priorities are random; this is the entire point.
Classic Problems
- “Array splice” with O(log N) per op — implicit treap canonical.
- Order-statistics queries on an indexed multiset.
Overview only.
9. Splay Tree (overview, when used)
When To Use
- Self-adjusting BST — recently accessed nodes move toward the root.
- Useful when the access pattern has temporal locality (LRU-like): O(log N) amortized but O(1) for hot keys.
- Library-link: the data structure underlying many compiler symbol tables.
Complexity
All ops O(log N) amortized; individual ops up to O(N) worst case. The amortization argument uses a potential function.
Key Implementation Pitfalls
- Splay step — zig (one rotation), zig-zig (same-side double), zig-zag (opposite-side double). The choice depends on grandparent direction; getting it wrong destroys amortization.
- Splay on every access — including failed search. Forgetting this means the amortization breaks.
Classic Problems
- Rare in competitive interviews; common in systems-engineering follow-ups about LRU implementations.
Overview only.
10. KMP (Knuth–Morris–Pratt failure function)
When To Use
- Single pattern P matched against a text T.
- You need either all occurrences of P in T, or just first, in O(N + M).
- Or you need the longest border (longest proper prefix = suffix) of a string — that’s the failure function itself.
Complexity
Failure-function build O(M). Match O(N + M). Space O(M).
Key Implementation Pitfalls
- Failure function recurrence — fail[i] is the length of the longest proper prefix of P[0..i] that’s also a suffix. The recurrence starts from j = fail[i - 1] and falls back via j = fail[j - 1] until P[i] == P[j] or j reaches 0.
- 0 vs 1 indexed — pick one and stick with it. Most resources are 0-indexed; many CP templates are 1-indexed.
- Forgetting to reset j to 0 between independent matches — the matcher state is per-text, not global.
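The failure-function recurrence and the matcher can be sketched as below, 0-indexed. The function names build_failure and kmp_search are illustrative choices for this example.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]          # fall back to the next shorter border
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    return fail


def kmp_search(text, pattern):
    """Return the start indices of all (possibly overlapping) occurrences."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    matches = []
    j = 0                            # matcher state is local to this text
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = fail[j - 1]          # keep going to allow overlapping matches
    return matches
```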
Classic Problems
- LeetCode 28 — Find the Index of the First Occurrence in a String (canonical strstr)
- LeetCode 459 — Repeated Substring Pattern (one-shot via failure function)
- LeetCode 214 — Shortest Palindrome (failure function on s + '#' + reverse(s))
- LeetCode 1392 — Longest Happy Prefix
Hands-on: see Lab 05.
11. Z Algorithm
When To Use
- Compute, for each position i in a string S, the length z[i] of the longest substring starting at i that is also a prefix of S.
- Substring matching — concatenate P + '#' + T and look for z[i] = M in the T half.
- Pattern problems where “longest prefix matching at offset” is the natural query.
Complexity
Build O(N) using a sliding [l, r] window of the rightmost reaching match. Match O(N + M).
Key Implementation Pitfalls
- Maintaining the [l, r] Z-box — when i ≤ r, copy from z[i - l] capped at r - i + 1, then extend; otherwise start fresh from i. Off-by-one on the cap is the canonical bug.
- Sentinel character — must not appear in either P or T. Use \0 or a fresh symbol; in Python, a tuple of (0, ord(c)) for the sentinel and (1, ord(c)) for real chars works.
- Z and KMP overlap — they solve the same problems with different invariants. Picking one and being fluent is better than knowing both shallowly.
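A minimal sketch of the Z-function with a closed [l, r] Z-box, plus pattern search via concatenation, follows. The function names are illustrative, and the search assumes the "\0" separator appears in neither string.

```python
def z_function(s):
    """z[i] = length of the longest substring starting at i that is a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    l = r = 0  # closed Z-box [l, r]: the rightmost-reaching prefix match so far
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])   # reuse, capped at the box edge
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1                         # extend by direct comparison
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


def z_search(text, pattern):
    """Start indices of pattern in text; assumes '\\0' occurs in neither string."""
    m = len(pattern)
    if m == 0:
        return []
    z = z_function(pattern + "\0" + text)
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]
```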
Classic Problems
- Same as KMP — LC 28, 214, 459, 1392.
- “Number of occurrences of P in T overlapping” — count z[i] >= M in the T region.
12. Manacher’s Algorithm (longest palindrome in O(N))
When To Use
- “Longest palindromic substring” or “count of palindromic substrings”.
- Naive expand-around-center is O(N²); Manacher’s is O(N).
- The trick is mirroring around the rightmost-reaching palindrome center.
Complexity
O(N). Space O(N).
Key Implementation Pitfalls
- Even-length palindromes — Manacher’s classic trick is to insert # between every pair of chars (and at ends): "abba" → "#a#b#b#a#". Now every palindrome (odd or even original) is odd-length in the transformed string.
- P[i] vs original-string radius — P[i] after the transform is the radius in the transformed string. The original palindrome has length P[i].
- Maintaining the rightmost-reaching center C and right-boundary R — P[i] = min(R - i, P[2C - i]) if i < R, else expand from scratch. Off-by-one on R - i (vs R - i + 1) is the canonical bug.
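The transform and the mirror rule can be sketched as follows. The function names are illustrative. The sketch assumes the convention where the palindrome centered at i in the transformed string covers [i - P[i], i + P[i]], so R = C + P[C].

```python
def manacher_radii(s):
    """Radii in the '#'-interleaved string; p[i] equals the original palindrome length."""
    t = "#" + "#".join(s) + "#"
    n = len(t)
    p = [0] * n
    center = right = 0  # rightmost-reaching palindrome covers [2*center - right, right]
    for i in range(n):
        if i < right:
            p[i] = min(right - i, p[2 * center - i])  # mirror, capped at the boundary
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        if i + p[i] > right:
            center, right = i, i + p[i]
    return p


def longest_palindrome(s):
    """One longest palindromic substring of s."""
    if not s:
        return ""
    p = manacher_radii(s)
    length, center = max((radius, i) for i, radius in enumerate(p))
    start = (center - length) // 2  # map back to an index in the original string
    return s[start:start + length]


def count_palindromes(s):
    """Number of palindromic substrings (counted by position)."""
    return sum((radius + 1) // 2 for radius in manacher_radii(s))
```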
Classic Problems
- LeetCode 5 — Longest Palindromic Substring (canonical)
- LeetCode 647 — Palindromic Substrings (count)
- LeetCode 1960 — Maximum Product of the Length of Two Palindromic Substrings
13. Rolling Hash (Rabin–Karp + double hashing)
When To Use
- Compare a sliding-window substring against a pattern in O(1) per shift (after O(M) preprocessing).
- Find duplicate substrings of length L in O(N) instead of O(N²) — the canonical “longest duplicate substring” via binary search on L + hashing.
- Compare two substrings of S equal in O(1) — precompute prefix hashes.
Complexity
Preprocess O(N). Per-comparison O(1) hash plus O(M) verify (in adversarial settings; often skipped). For “longest duplicate substring”: O(N log N).
Key Implementation Pitfalls
- Hash collisions — a single hash with mod ~10^9 is almost certain (>99%) to produce at least one collision among 10^5 distinct strings (birthday paradox: about n²/(2 · mod) ≈ 5 expected colliding pairs; the 50% mark is already reached near 4 × 10^4 strings). Always double-hash in interview answers, or single-hash + explicit verify on match.
- Modular arithmetic — pick a base at least as large as the alphabet (~30 for lowercase letters) and a large prime modulus ~10^9. In Python, use pow(base, k, mod) for modular powers, and pow(base, -1, mod) (Python 3.8+) when the negative-power trick needs a modular inverse. In Java/C++, use long to avoid overflow on base * value.
- Anti-hash adversarial inputs — Codeforces has problems specifically constructed to break common base/mod choices. Use random base from [26, mod-1] per run.
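A minimal sketch of double-hashed prefix hashes with O(1) substring comparison follows. The class name SubstringHasher, the two moduli, and the base range are assumptions of this example, not prescribed values.

```python
import random


class SubstringHasher:
    """Prefix polynomial hashes under two moduli; compare substrings in O(1) (sketch)."""

    MODS = (1_000_000_007, 998_244_353)  # two well-known primes, chosen for illustration

    def __init__(self, s, base=None):
        # A random base per run makes precomputed anti-hash inputs ineffective.
        self.base = base if base is not None else random.randrange(256, self.MODS[1] - 1)
        n = len(s)
        self.prefix = [[0] * (n + 1) for _ in self.MODS]
        self.power = [[1] * (n + 1) for _ in self.MODS]
        for k, mod in enumerate(self.MODS):
            for i, ch in enumerate(s):
                self.prefix[k][i + 1] = (self.prefix[k][i] * self.base + ord(ch)) % mod
                self.power[k][i + 1] = self.power[k][i] * self.base % mod

    def hash(self, left, right):
        """Pair of hashes of s[left:right] (half-open interval)."""
        return tuple(
            (self.prefix[k][right] - self.prefix[k][left] * self.power[k][right - left]) % mod
            for k, mod in enumerate(self.MODS)
        )
```

Equal substrings always produce equal pairs. Unequal substrings collide only with small probability, so an explicit character compare on match is still the safe choice in adversarial settings.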
Classic Problems
- LeetCode 187 — Repeated DNA Sequences (canonical small-window hashing)
- LeetCode 1044 — Longest Duplicate Substring (binary search on length + rolling hash)
- LeetCode 28 — Find the Index of the First Occurrence (Rabin–Karp variant)
- LeetCode 1392 — Longest Happy Prefix
Hands-on: see Lab 06.
14. Suffix Array (overview, applications)
When To Use
- All suffixes of S sorted lexicographically — enables binary search for any pattern in O(M log N).
- “Longest common substring of two strings” via combined suffix array + LCP.
- “Number of distinct substrings of S” = N(N+1)/2 − Σ LCP[i].
Complexity
Build O(N log² N) (rank doubling with a comparison sort), O(N log N) (rank doubling with radix sort), or O(N) (DC3 / SA-IS, harder). LCP via Kasai’s algorithm O(N). Per-pattern search O(M log N).
Key Implementation Pitfalls
- Doubling-the-rank — sort suffixes by their first 1, 2, 4, …, N characters using the previous round’s ranks. Each round is a radix sort.
- LCP array — Kasai’s algorithm: walk in original-index order, decrement an h counter that tracks current LCP. Subtle but linear.
- Sentinel — append a unique smaller-than-all-others character to avoid prefix-of-another-suffix issues.
Classic Problems
- “Longest common substring of two strings” (SA + LCP + range minimum).
- “Number of distinct substrings”.
- “K-th lexicographically smallest substring”.
Overview only — labs don’t drill suffix arrays directly; rolling hash + Z/KMP cover most interview cases.
15. Suffix Automaton (overview, applications)
When To Use
- Smallest DFA accepting all substrings of S, built in O(N) — strictly stronger than a suffix tree for many queries.
- “Number of distinct substrings”, “longest common substring”, “count occurrences of a pattern” — all O(M) per query after O(N) build.
Complexity
Build O(N · σ) where σ is alphabet size. Space O(N · σ).
Key Implementation Pitfalls
- link (suffix-link) pointers — the equivalence-class tree of states. Subtle to derive; templates exist.
- cnt augmentation — counting occurrences requires DP on the suffix-link tree.
- Online construction — add char at a time, maintain the latest state.
Classic Problems
- “Distinct substrings count”.
- “Longest common substring of K strings”.
- Codeforces / SPOJ string problems.
Overview only. Rare in interviews; expected at Grandmaster.
16. Trie Variants (compressed, with counts, with deletion)
When To Use
- Prefix queries: “does any inserted word start with prefix P?”, “all words with prefix P”.
- Multi-word search (LC 212 — Word Search II): compile dictionary into a trie, DFS the grid against the trie for O((R · C) · 4^L) instead of per-word DFS.
- Autocomplete with frequency ranking — trie augmented with word counts and best-K-at-each-node.
Complexity
Insert O(L). Search prefix O(L). Space O(Σ L · σ) for plain array-based trie; O(Σ L · branching) for hash-based.
Key Implementation Pitfalls
- Array-of-26 vs hash-of-char — array-of-26 is faster (no hashing); hash is more memory-efficient on sparse tries. Pick one based on problem constraints.
- End-of-word marker — use a separate is_end boolean, not a sentinel char that could collide with input.
- Compressed (radix) trie — chains of single-child nodes are merged into a single edge labeled with the substring. Saves memory at the cost of more complex insertion (split an edge mid-substring).
- Deletion — typically just clear is_end and prune empty subtrees. Don’t free shared nodes.
- Counts at each node — increment on insert, decrement on delete; useful for “count words with prefix P” in O(L).
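The is_end marker, per-node counts, and pruning on delete can be sketched together. The class and attribute names are illustrative, and the sketch assumes set semantics (inserting a word twice is a no-op).

```python
class TrieNode:
    __slots__ = ("children", "prefix_count", "is_end")

    def __init__(self):
        self.children = {}      # hash-of-char variant
        self.prefix_count = 0   # number of stored words passing through this node
        self.is_end = False


class Trie:
    """Trie with prefix counts and deletion (illustrative sketch, set semantics)."""

    def __init__(self):
        self.root = TrieNode()

    def _walk(self, s):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_end

    def insert(self, word):
        if self.contains(word):
            return
        node = self.root
        node.prefix_count += 1
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            node.prefix_count += 1
        node.is_end = True

    def count_prefix(self, prefix):
        node = self._walk(prefix)
        return 0 if node is None else node.prefix_count

    def delete(self, word):
        if not self.contains(word):
            return False
        node = self.root
        node.prefix_count -= 1
        for ch in word:
            child = node.children[ch]
            child.prefix_count -= 1
            if child.prefix_count == 0:
                del node.children[ch]   # no other word uses this subtree: prune it
                return True
            node = child
        node.is_end = False             # shared node: only clear the marker
        return True
```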
Classic Problems
- LeetCode 208 — Implement Trie (canonical)
- LeetCode 211 — Design Add and Search Words Data Structure (with . wildcard — DFS)
- LeetCode 212 — Word Search II (canonical trie-on-grid)
- LeetCode 421 — Maximum XOR of Two Numbers in an Array (binary trie)
- LeetCode 642 — Design Search Autocomplete System
Hands-on: see Lab 07.
17. Aho–Corasick (multi-pattern matching)
When To Use
- Match a set of K patterns simultaneously against a text T — “find all dictionary words that occur in T”.
- Naive: O(N · ΣM) — too slow for K = 10^4 patterns.
- Aho–Corasick: O(N + ΣM + #matches) — linear in everything.
Complexity
Build O(ΣM · σ). Match O(N + #matches). Space O(ΣM · σ).
Key Implementation Pitfalls
- Failure links — analog of KMP’s failure function on a trie. Built via BFS over the trie.
- Output (dict-suffix) links — for each node, follow failure links to collect every matching pattern that ends at this position. Without dict-suffix links, you’d miss patterns that are suffixes of other patterns.
- Node pool size — total nodes ≤ ΣM + 1. Pre-allocate.
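A minimal sketch of the automaton with BFS-built failure links follows. The function names are illustrative. As a simplification, this sketch merges each node's output list with its failure target's list instead of storing separate dict-suffix links, which is simpler but can use more memory. It assumes all patterns are non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Return (goto, fail, out) for the given non-empty patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link per state
    out = [[]]    # indices of patterns that end at this state (own + inherited)
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit suffix matches
    return goto, fail, out


def find_all(text, patterns):
    """List of (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            matches.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return matches
```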
Classic Problems
- LeetCode 1032 — Stream of Characters (canonical; solvable with a reversed trie or with Aho-Corasick)
- “Find all dictionary words occurring in document”.
18. Bloom Filter (probabilistic membership)
When To Use
- “Is X in the set?” with a tolerated false-positive rate, zero false negatives.
- Memory tight (you can’t store the full set), or you want a fast pre-filter before a slow exact check (e.g., disk lookup).
- Streaming dedup with a fixed false-positive budget.
Complexity
Insert / query O(K) where K is the number of hash functions. Space O(M) bits where M is chosen for the target false-positive rate (1 - e^(-KN/M))^K.
Key Implementation Pitfalls
- K (#hash functions) and M (#bits) — given target FPR p and capacity n, optimum is M = -n ln(p) / ln(2)², K = (M/n) ln 2.
- No deletion — standard Bloom can’t delete (can’t tell which other element shares the bit). Counting Bloom does, by replacing each bit with a small counter.
- False positives compound with set size — a 1% Bloom filter at capacity is 1% per query, not 1% over a workload. Rebuild when growing.
Classic Problems
- System-design follow-ups (“how do you check whether a URL has been crawled in the last 30 days?”).
- LeetCode-adjacent: not common as a graded problem, but expected in design interviews.
19. Skip List (overview)
When To Use
- Randomized alternative to balanced BST — O(log N) expected, simpler to implement than AVL/RB.
- Used in practice in Redis sorted sets, LevelDB MemTable.
- Order-statistics, range queries on a sorted set, with concurrent-modification flexibility.
Complexity
Insert / delete / search O(log N) expected. Space O(N) expected (geometric level distribution).
Key Implementation Pitfalls
- Level distribution — each new node’s level is geometric with p = 1/2 (or 1/4 in production). Cap at log N.
- Update array on insert/delete — track the predecessor at each level; splice carefully.
- Concurrent skip list — much simpler than concurrent BST; standard library impls in Java (ConcurrentSkipListMap).
Classic Problems
- LeetCode 1206 — Design Skiplist (canonical implementation problem)
- System-design discussions of Redis ZSET / LevelDB.
Overview only; the implementation problem (LC 1206) is good practice but rare.
20. LRU / LFU Implementation Deep Dive
When To Use
- Cache eviction problems with O(1) get/put requirement.
- LRU: hash map + doubly-linked list. Touched node moves to head; evict tail.
- LFU: hash map of (key → node) + hash map of (freq → doubly-linked list). On hit, move node to next-freq list; on evict, drop tail of min-freq list.
Complexity
LRU: O(1) get and put. Space O(capacity).
LFU: O(1) get and put. Space O(capacity). Maintaining min_freq is the subtle bookkeeping bit.
Key Implementation Pitfalls
- LRU: doubly-linked list with sentinel head/tail — eliminates null checks. Always add at head, evict from tail.
- LRU: hash map points to nodes, not keys — so you can splice the node in O(1) without searching.
- LFU: min_freq invariant — increment min_freq only when the touched node was the last one in the min_freq list (that list just became empty); reset min_freq to 1 whenever a new key is inserted.
- LFU: list-per-frequency — implement as a doubly-linked list of nodes; ordering within a freq is LRU.
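The LRU half of this entry can be sketched with a hash map plus a sentinel-bounded doubly-linked list. The class and helper names are illustrative, and returning -1 on a miss follows the LeetCode 146 convention rather than anything required by the structure.

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    """O(1) get/put: dict of key -> node, most recent at head (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = {}
        # Sentinels remove every null check from the splice code.
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.nodes.get(key)
        if node is None:
            return -1
        self._unlink(node)
        self._push_front(node)      # touched node moves to head
        return node.value

    def put(self, key, value):
        node = self.nodes.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if self.capacity <= 0:
            return
        if len(self.nodes) >= self.capacity:
            victim = self.tail.prev  # least recently used
            self._unlink(victim)
            del self.nodes[victim.key]
        node = _Node(key, value)
        self.nodes[key] = node
        self._push_front(node)
```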
Classic Problems
- LeetCode 146 — LRU Cache (canonical)
- LeetCode 460 — LFU Cache
- LeetCode 432 — All O(1) Data Structure (frequency buckets)
Both are bread-and-butter for systems-engineering interviews.
21. Disjoint Set Union (DSU) with Path Compression and Union by Rank — Proof of α(N) Amortization
When To Use
- Online connectivity (#1 trigger).
- Kruskal’s MST.
- Equation problems (weighted DSU — LC 399 Evaluate Division).
- Offline divide-and-conquer queries with rollback (advanced).
Complexity
Each op amortized inverse-Ackermann α(N) — for all practical N (up to 2^65536), α(N) ≤ 4. Effectively constant.
Proof Sketch (Tarjan)
- Without compression or rank: worst-case chain → O(N) per op.
- Path compression alone: each find shortens the path. Amortized O(log N) per op.
- Union by rank (or size) alone: depth bounded by O(log N). Per-op O(log N) worst case.
- Both together: per-op amortized O(α(N)). Tarjan’s potential function counts “blocks” of nodes by rank and shows the total cost over M ops is O(M · α(N)). The proof uses Ackermann’s hierarchy A(k, n) and α(N) is its inverse.
- For an interview: state “with both heuristics, amortized O(α(N)), where α grows so slowly it’s ≤ 4 for any N you’ll see in practice; you treat it as O(1)”. Cite Tarjan 1975.
Key Implementation Pitfalls
- Recursive find can exceed Python’s default recursion limit (1000) on long parent chains, e.g. at N = 10^5 without union by rank. Use iterative two-pass: walk up to root, then walk again compressing.
- Path-halving variant (parent[x] = parent[parent[x]] per step) — simpler, asymptotically equivalent, often faster than full compression in practice.
- Union by rank vs union by size — both work. Rank is the height upper bound of the tree (compression doesn’t decrease rank); size is the count of nodes in the tree. Pick one.
- Forgetting to update rank when ranks are equal — break the tie and increment the survivor’s rank.
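A minimal sketch combining iterative two-pass compression with union by rank follows. The class name DSU and the components counter are illustrative additions for this example.

```python
class DSU:
    """Disjoint set union: two-pass path compression + union by rank (sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n

    def find(self, x):
        # Pass 1: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Pass 2: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b; return False if already connected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra          # ra is now the higher-rank survivor
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1       # tie: survivor's rank grows by one
        self.components -= 1
        return True
```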
Classic Problems
- See pattern 18 in Phase 2 README.
- LeetCode 200 — Number of Islands (DSU alternative)
- LeetCode 305 — Number of Islands II (online DSU canonical)
- LeetCode 547 — Number of Provinces
- LeetCode 684 — Redundant Connection
- LeetCode 721 — Accounts Merge
- LeetCode 952 — Largest Component Size by Common Factor
- LeetCode 399 — Evaluate Division (weighted DSU)
Phase 2 covered DSU mechanically. Phase 3’s contribution is the proof of α(N) and the weighted / rollback variants.
22. Bit Manipulation Idioms (popcount, lowbit, isolate trailing one, parity)
When To Use
- Bitmask DP (pattern 23) requires fluency in these primitives.
- Subset enumeration, parity tricks, fast set operations.
- Hot-loop optimization where each int represents a tiny set (≤ 64 elements).
Idioms
- Popcount: __builtin_popcount(x) (GCC/Clang C/C++), Integer.bitCount(x) (Java), bin(x).count('1') (Python — slow), x.bit_count() (Python 3.10+, fast).
- Lowbit / lowest set bit: x & -x gives the value of the lowest set bit. x & (x - 1) clears it.
- Isolate trailing ones: x & ~(x + 1). Set the lowest zero bit: x | (x + 1).
- Iterate subsets of mask: s = mask; while s: ... ; s = (s - 1) & mask enumerates each non-empty subset of mask exactly once, in decreasing order; handle the empty subset separately.
- Iterate set bits: while x: lb = x & -x; ... ; x ^= lb. Each step does O(1) work, total O(popcount).
- Parity: bin(x).count('1') & 1 or — faster — XOR-fold for a 32-bit x: x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1; return x & 1.
- Power of two test: x > 0 and (x & (x - 1)) == 0.
- Swap without temp: a ^= b; b ^= a; a ^= b — academic; never use in production.
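The idioms above can be checked quickly with small helper functions. The function names are illustrative, and the parity helper assumes a non-negative value below 2^32.

```python
def lowbit(x):
    """Value of the lowest set bit."""
    return x & -x


def clear_lowbit(x):
    """x with its lowest set bit cleared."""
    return x & (x - 1)


def trailing_ones(x):
    """The block of consecutive 1 bits at the low end of x."""
    return x & ~(x + 1)


def set_bits(x):
    """Yield the indices of set bits from low to high, O(popcount) steps."""
    while x:
        lb = x & -x
        yield lb.bit_length() - 1
        x ^= lb


def submasks(mask):
    """Yield every subset of mask in decreasing order, the empty subset last."""
    s = mask
    while s:
        yield s
        s = (s - 1) & mask   # the & mask keeps s inside mask
    yield 0


def is_power_of_two(x):
    return x > 0 and (x & (x - 1)) == 0


def parity32(x):
    """Parity of the set-bit count; assumes 0 <= x < 2**32."""
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 1
```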
Classic Problems
- LeetCode 191 — Number of 1 Bits (popcount).
- LeetCode 338 — Counting Bits (DP using dp[x] = dp[x >> 1] + (x & 1)).
- LeetCode 461 — Hamming Distance (popcount of XOR).
- LeetCode 78 — Subsets via bitmask iteration.
Mastery here is a prerequisite for Pattern 23.
23. Bitmask DP Foundation
When To Use
- N ≤ ~20 and the problem asks for an optimum over subsets or assignments of N items.
- Examples: traveling salesman (TSP), assignment problem, “shortest path visiting all nodes”, “minimum cost to cover all groups”.
- The state is a bitmask of “which items are used / visited / completed”.
Canonical Forms
- Permutation DP: dp[mask][i] = min over j in mask\{i}: dp[mask \ {i}][j] + cost(j, i). Result: min over i: dp[full_mask][i] (or back-to-start for TSP cycle).
- Subset cover DP: dp[mask] = min over partition of mask into subset s and rest: cost(s) + dp[mask \ s].
- Assignment DP: dp[mask] = min cost to assign people 0..popcount(mask)-1 to the jobs in mask.
Complexity
Permutation DP: O(2^N · N²) time, O(2^N · N) space. N=20 → 4 × 10^8 — borderline. Subset DP: O(3^N) time (enumerating subset of subset). N=15 → 14M — comfortable.
Key Implementation Pitfalls
- Subset-of-subset enumeration uses s = mask; while s: ... ; s = (s - 1) & mask. The & mask step is what keeps s a subset of mask; dropping it enumerates unrelated integers.
- Initial mask — for permutation DP, initialize dp[1 << i][i] for the first city; iterate masks in increasing order so dependencies are resolved.
- Reconstruction — to recover the order, store predecessor (mask, i) → (prev_mask, prev_i) and walk back.
- N too large — N > 20 is not bitmask DP territory. Reach for meet-in-the-middle (#24) or heuristics.
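A minimal sketch of the permutation DP for a TSP cycle follows. It uses the "push" form (relaxing successors of each reached state) rather than the "pull" recurrence, and it fixes city 0 as the start. Both are choices of this example, and the function name is illustrative.

```python
def tsp_min_cycle(cost):
    """Minimum cost of a cycle visiting every city once, starting and ending at city 0."""
    n = len(cost)
    INF = float("inf")
    full = (1 << n) - 1
    # dp[mask][last] = cheapest path that starts at 0, visits exactly mask, ends at last.
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0
    for mask in range(1, 1 << n):        # increasing masks: predecessors come first
        if not mask & 1:
            continue                     # every valid state contains the start city
        for last in range(n):
            cur = dp[mask][last]
            if cur == INF:
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue             # already visited
                new_mask = mask | (1 << nxt)
                cand = cur + cost[last][nxt]
                if cand < dp[new_mask][nxt]:
                    dp[new_mask][nxt] = cand
    # Close the cycle by returning to city 0.
    return min(dp[full][last] + cost[last][0] for last in range(n))
```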
Classic Problems
- LeetCode 847 — Shortest Path Visiting All Nodes (canonical bitmask + BFS)
- LeetCode 1125 — Smallest Sufficient Team (subset cover DP)
- LeetCode 943 — Find the Shortest Superstring (TSP-like, DP over permutations)
- LeetCode 698 — Partition to K Equal Sum Subsets (subset assignment)
- LeetCode 1494 — Parallel Courses II
- LeetCode 1879 — Minimum XOR Sum of Two Arrays (assignment DP)
Hands-on: see Lab 08.
24. Meet-in-the-Middle (split, sort, two-pointer)
When To Use
- N ≤ 40 (or up to 50), the problem asks for “subset with property X”, and 2^N is too large but 2^(N/2) is fine.
- Examples: “subset sum closest to T at N=40”, “count subsets with XOR equal to K”, “split items into two groups minimizing difference”.
Canonical Template
left, right = a[:n // 2], a[n // 2:]
sums_left = sorted(sum(combo) for combo in subsets(left)) # 2^(N/2)
sums_right = sorted(sum(combo) for combo in subsets(right)) # 2^(N/2)
# for each L in sums_left, binary-search the closest R such that L + R ≈ T.
Complexity
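Time O(2^(N/2) · N/2) for enumeration + O(2^(N/2) · log(2^(N/2))) = O(N · 2^(N/2)) for the sort, then O(2^(N/2) · N/2) for a binary-search merge (or O(2^(N/2)) with two pointers). At N=40 each half has 2^20 ≈ 10^6 subset sums, so the total is roughly 2 × 10^7 basic operations. Space O(2^(N/2)) for the two subset-sum lists.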
Key Implementation Pitfalls
- Enumerate subsets correctly: for mask in range(1 << k), the subset sum is the sum of the elements whose bits are set in mask (iterate over the set bits). Or use recursive include/exclude.
- Two-pointer or binary search: once both halves are sorted, sweep with two pointers (one ascending through the left sums, one descending through the right sums) to minimize / count target.
- Memory: at N=40, each half holds 2^20 ≈ 1M entries × 8 bytes = 8 MB. Comfortable, but watch out at N=44.
- Counting (not just existence): binary-search carefully for the lo, hi bounds; use bisect_left and bisect_right.
Classic Problems
- LeetCode 1755 — Closest Subsequence Sum (canonical N=40)
- LeetCode 956 — Tallest Billboard (subset DP alternative; meet-in-the-middle viable)
- LeetCode 805 — Split Array With Same Average (meet-in-the-middle)
- “Subset sum at N=40” — competitive-programming staple.
Hands-on: see Lab 09.
Recognition Cheat Sheet
| Problem Signal | Structure |
|---|---|
| Range query + point update, sum/min/max | Segment tree (#1) or Fenwick (#3, if invertible) |
| Range query + range update | Lazy segment tree (#2) |
| Static range min/max with O(1) queries | Sparse table (#5) |
| Range distinct count, hard-to-segment-tree aggregate | Sqrt decomposition / Mo’s (#6) |
| Single pattern in text | KMP (#10) or Z (#11) |
| Longest palindrome / count palindromes | Manacher (#12) |
| Many-substring equality / longest duplicate | Rolling hash (#13) |
| Multi-pattern dictionary in text | Aho–Corasick (#17) |
| Prefix queries, autocomplete, word-on-grid | Trie variants (#16) |
| Probabilistic membership | Bloom filter (#18) |
| Cache with O(1) get/put | LRU / LFU (#20) |
| Connectivity / equation graphs | DSU (#21) |
| N ≤ 20, subset / assignment optimum | Bitmask DP (#23) |
| N ≤ 40, subset existence / closest sum | Meet-in-the-middle (#24) |
| Bit-level state mechanics | Bit idioms (#22) |
Mastery Checklist
You have completed Phase 3 when you can, on demand and from memory:
- Implement a segment tree (point update, range sum/min/max) in <12 minutes, with no off-by-ones, on the first attempt.
- Add lazy propagation for range-add + range-sum in <20 minutes, articulating the push-down invariant.
- Implement a Fenwick tree (1-indexed, prefix-sum + point update) in <8 minutes.
- State why Fenwick can’t do range-min naturally and which segment-tree augmentation handles it.
- Build a sparse table for static RMQ in <10 minutes, including the log_floor[] precompute.
- Choose between segment tree, Fenwick, sparse table, and sqrt decomposition based on the workload (read-only vs mixed; aggregate type) in <30 seconds.
- Compute KMP’s failure function on a string of length 20 by hand, no errors.
- Implement KMP match in <12 minutes.
- Implement Manacher’s longest palindrome in <20 minutes (this one is hard; that’s expected).
- Implement double-hashing rolling hash in <15 minutes; explain why single hash is insufficient.
- Implement a trie (insert, search, startsWith) in <8 minutes.
- Implement Aho–Corasick at the conceptual level (failure + dict-suffix links) and state its complexity.
- State the Bloom filter formula: target FPR p, capacity n → M = -n ln(p) / (ln 2)², K = (M/n) ln 2.
- Implement LRU cache (146) in <10 minutes; LFU (460) in <25 minutes.
- Implement DSU with path compression + union by rank in <8 minutes; state the α(N) bound and cite Tarjan.
- Use x & -x, x & (x - 1), and subset-of-mask enumeration without thinking.
- Recognize bitmask-DP from N ≤ 20 and write the transition in <10 minutes for an unfamiliar problem.
- Recognize meet-in-the-middle from N = 40 and write both halves + merge in <20 minutes.
If any of these takes >2× the budget, drill it again — that structure is your weakest link. Hards rarely fail because all your structures are weak; they fail because one of them is, and that’s the one this Hard happened to need.
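The Bloom filter sizing formula from the checklist is easy to sanity-check numerically. A small sketch, with an illustrative function name; rounding M up and K to the nearest integer is a choice made here, not something the formula prescribes.

```python
import math


def bloom_parameters(n: int, p: float):
    """Bit count M and hash count K for capacity n and target false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k
```

For n = 10⁶ and p = 1%, this gives about 9.6 bits per element and 7 hash functions.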
Exit Criteria
You may proceed to Phase 4 — Graph Mastery only when:
- All 9 labs are complete, with the deliverable code written, tested, and reviewed via the REVIEW_TEMPLATE.
- Mastery checklist is fully ticked.
- 30+ Hard problems solved across the structures above (10 segment-tree-class, 5 string-algo, 5 trie/AC, 5 DSU/bitmask, 5 free choice).
- Mock interview at Phase 3 level: you receive a Hard segment-tree problem, a Hard string problem, and a Medium-Hard bitmask DP problem in a 90-minute window. Solve at least 2 of the 3 cleanly.
- No structure is “the one I always get wrong” — drill it until it isn’t.
If any of these fails, do not proceed. Phase 4 builds on the assumption that DSU, segment trees, and bitmask are reflexes. If they are not, Phase 4’s harder graph problems will compound the gap.
Labs
| # | Lab | Structure | Canonical Problem |
|---|---|---|---|
| 01 | Segment tree (range query) | Point update + range sum/min/max | LC 307 |
| 02 | Segment tree with lazy propagation | Range update + range query | Range-add + range-sum |
| 03 | Fenwick tree (BIT) | Coord-compressed Fenwick | LC 315 |
| 04 | Sparse table for RMQ | Static O(1) RMQ | Range-min array |
| 05 | KMP string matching | Failure function + match | LC 28 / 459 |
| 06 | Rolling hash | Double hashing | LC 187 / 1044 |
| 07 | Trie applications | Trie with is_end + DFS-on-trie | LC 208 / 212 |
| 08 | Bitmask DP | Permutation DP over subsets | LC 847 |
| 09 | Meet-in-the-middle | Split-sort-merge | LC 1755 |
Common Failures At This Phase
These are the failure modes that consume the most candidate time at Phase-3 level. Tag them when they occur using FAILURE_ANALYSIS.md.
- Segment tree off-by-ones: closed-vs-open intervals mixed mid-recursion. Fix: always closed [l, r], never mix with [l, r).
- Fenwick tree 0-index trap: update(0, …) infinite-loops. Fix: shift to 1-index at the boundary.
- KMP failure function off-by-one: j = fail[j - 1] vs j = fail[j]. Fix: derive on a 5-char example.
- Rolling hash single-mod collisions: pass random unit tests, fail adversarial. Fix: double-hash always.
- Bitmask DP transition direction: computing dp[mask] from dp[mask & ~bit] (pull) vs updating dp[mask | bit] from dp[mask] (push). Both work; mixing them mid-implementation breaks. Fix: pick one before coding.
- DSU recursive find stack overflow at N=10^5 in Python. Fix: iterative two-pass.
- Lazy segment tree forgetting to push before recursing into a child. Fix: write push_down(node) as the first line of any non-leaf recursion.
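The "iterative two-pass" fix for the DSU failure can be sketched as follows. The class layout is an assumption for illustration, and union by size is used here in place of union by rank; both give the same asymptotic bound.

```python
class DSU:
    """Union-find with iterative two-pass path compression and union by size."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:   # pass 1: locate the root
            root = self.parent[root]
        while self.parent[x] != root:      # pass 2: repoint the path at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False                   # already in the same component
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra               # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]
        return True
```

Because find never recurses, a degenerate chain of any length cannot hit Python's recursion limit.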
Cross-References
- FRAMEWORK.md — apply on every Hard.
- CODE_QUALITY.md — Hards do not get graded leniency; clean code still required.
- COMMUNICATION.md — out loud at the recognition step, the structure name and complexity must be explicit. “I’ll use a segment tree with lazy propagation; build O(N), query and update O(log N), space O(4N).”
- SPACED_REPETITION.md — segment tree and KMP should be on a 7-day cycle for the first month after this phase. Bitmask and meet-in-the-middle on 14-day.
- Phase 4 — Graphs — DSU shows up immediately; review #21 the day before starting Phase 4.
- Phase 5 — DP — bitmask DP is the bridge. Without #23 fluency, Phase 5’s “DP on graphs / DAG / interval” labs will hurt.
- Phase 7 — Competitive — persistent seg tree, suffix array/automaton, splay/treap deepen here.
Lab 01 — Segment Tree (Point Update + Range Query)
Goal
Implement a segment tree from scratch that supports point updates and range-sum / range-min / range-max queries on an array of N integers. Build in O(N), query and update in O(log N) each. Internalize the recursion structure so you can re-derive any aggregate variant on the fly. After this lab you should be able to write a working segment tree from blank slate in under 12 minutes with zero off-by-ones.
Background Concepts
A segment tree represents an array as a near-balanced binary tree where each internal node stores the aggregate (sum / min / max / gcd / …) of a contiguous range. Leaves correspond to individual array elements; internal nodes correspond to the union of their children’s ranges. The tree has depth O(log N), so any range [l, r] decomposes into at most 2 · log₂(N) disjoint subtree-ranges. That is the entire complexity argument: each query and update walks O(log N) nodes.
The tree is conventionally stored in a flat array of size 4N (worst-case nearly-balanced binary tree on N leaves) with the root at index 1, left child at 2 · i, right child at 2 · i + 1. This avoids pointer overhead and is cache-friendly.
The aggregate must be associative so that subtree results can be combined. Sum, min, max, gcd, xor, “and”, “or”, and matrix multiplication all qualify. Median, mode, and “k-th smallest” do not combine cleanly and need different structures.
Interview Context
Range queries with updates appear in 3–5% of FAANG-tier Hard pools, but they appear more often on the bar-raiser round. Recognizing that prefix sums (O(1) query, O(N) update) won’t survive the workload — that you need O(log N) for both — is the reflex this lab builds. Companies that screen with segment trees: Meta (frequent), Google (occasional), Amazon (rare), Stripe / HFT shops (very frequent — order book, sliding aggregates). Bombing this is a no-hire signal at L5+.
Problem Statement
Implement a class NumArray initialized with an integer array nums. Support:
- update(i, val): set nums[i] = val.
- sumRange(left, right): return the sum of nums[left..right] inclusive.
Both must be O(log N). After implementing the sum variant, refactor so swapping combine = + for combine = min or combine = max requires changing one line.
Constraints
- 1 ≤ N ≤ 3 × 10⁴
- −100 ≤ nums[i] ≤ 100
- 0 ≤ i, left, right < N
- Up to 3 × 10⁴ calls to update and sumRange combined.
Clarifying Questions
- Are queries inclusive on both ends? (Yes: [left, right].)
- Is nums mutable in place, or owned by NumArray? (Owned; copy on construction.)
- Are the values guaranteed to fit in int32? (Yes; sum across N elements at value ±100 fits comfortably.)
- Is update an assignment or delta? (Assignment: set, not add.)
Examples
NumArray([1, 3, 5])
sumRange(0, 2) → 9
update(1, 2) // array becomes [1, 2, 5]
sumRange(0, 2) → 8
update(0, 10) // array becomes [10, 2, 5]
sumRange(1, 2) → 7
Initial Brute Force
Store nums as a plain list. update(i, v): nums[i] = v (O(1)). sumRange(l, r): sum(nums[l:r+1]) (O(N)). Updates are fast; queries are linear.
Brute Force Complexity
Update O(1). Query O(N). Total over Q queries + U updates: O(N · Q + U). At N = Q = 3 × 10⁴: 9 × 10⁸ ops — TLE.
Optimization Path
Two natural alternatives.
Prefix sums. prefix[i] = nums[0] + ... + nums[i-1]. Query is prefix[r+1] - prefix[l] in O(1). But update(i, v) requires recomputing prefix[i+1..N] in O(N). Wrong tradeoff for this workload.
Sqrt decomposition. Block size √N; per-block sums. Update O(1), query O(√N). Total O(N · √N) = O(N^1.5) — at N = 3 × 10⁴, ~5 × 10⁶ ops. Passes but is suboptimal and crusty.
Segment tree. Build O(N), update O(log N), query O(log N). Total O((N + Q) log N) = ~5 × 10⁵ ops. Clean fit.
Final Expected Approach
Recursive segment tree on a flat array of size 4N.
- Build. build(node, nl, nr): if nl == nr, leaf = arr[nl]; else recurse left and right, then set tree[node] = tree[2*node] + tree[2*node + 1].
- Update. update(node, nl, nr, idx, val): recurse into the child whose range contains idx; on return, recompute the parent.
- Query. query(node, nl, nr, ql, qr): total miss → identity (0 for sum, +∞ for min); total cover → return tree[node]; partial → recurse into both children and combine.
Public API wraps with node = 1, nl = 0, nr = N - 1.
Data Structures Used
- A single integer array tree[] of size 4N (sum aggregates).
- Optional integer n storing the original length.
Correctness Argument
Build establishes the invariant tree[node] = combine over [nl, nr] by induction on subtree size. Update: along the recursion, only nodes whose range contains idx are touched; each is recomputed from its (now-correct) children, so the invariant is preserved. Query decomposes [ql, qr] into a disjoint union of subtree ranges; the result is the combine of those. The decomposition has size ≤ 2 log N because at each depth at most two nodes partially overlap the query (the one containing the left boundary and the one containing the right boundary); every other visited node is a full cover or a miss and returns immediately. Each level therefore costs O(1), and the total work is O(log N).
Complexity
| Operation | Time | Space |
|---|---|---|
| Build | O(N) | O(4N) |
| Update | O(log N) | O(log N) recursion |
| Query | O(log N) | O(log N) recursion |
Implementation Requirements
class NumArray:
    def __init__(self, nums):
        self.n = len(nums)
        self.tree = [0] * (4 * self.n)  # 4N slots is a safe upper bound
        self._build(1, 0, self.n - 1, nums)

    def _build(self, node, nl, nr, a):
        if nl == nr:  # leaf
            self.tree[node] = a[nl]
            return
        mid = (nl + nr) // 2
        self._build(2*node, nl, mid, a)
        self._build(2*node + 1, mid + 1, nr, a)
        self.tree[node] = self.tree[2*node] + self.tree[2*node + 1]

    def update(self, i, val):
        self._update(1, 0, self.n - 1, i, val)

    def _update(self, node, nl, nr, idx, val):
        if nl == nr:  # leaf for idx
            self.tree[node] = val
            return
        mid = (nl + nr) // 2
        if idx <= mid:
            self._update(2*node, nl, mid, idx, val)
        else:
            self._update(2*node + 1, mid + 1, nr, idx, val)
        # Recompute the parent on the way back up.
        self.tree[node] = self.tree[2*node] + self.tree[2*node + 1]

    def sumRange(self, l, r):
        return self._query(1, 0, self.n - 1, l, r)

    def _query(self, node, nl, nr, ql, qr):
        if qr < nl or ql > nr:
            return 0  # miss
        if ql <= nl and nr <= qr:
            return self.tree[node]  # cover
        mid = (nl + nr) // 2
        return (self._query(2*node, nl, mid, ql, qr)
                + self._query(2*node + 1, mid + 1, nr, ql, qr))
Refactor to support min by changing the identity (+∞), the leaf assignment (still a[nl]), and the combine (min).
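One way to make that refactor a one-line change is to pass the combine function and its identity into the constructor. The sketch below is an assumption about how such a refactor could look (the class name SegmentTree and its parameters are illustrative), not the lab's required API.

```python
import operator


class SegmentTree:
    """Point-update / range-query tree parameterised by (combine, identity)."""

    def __init__(self, values, combine=operator.add, identity=0):
        self.n = len(values)
        self.combine = combine
        self.identity = identity
        self.tree = [identity] * (4 * max(1, self.n))
        if self.n:
            self._build(1, 0, self.n - 1, values)

    def _build(self, node, nl, nr, a):
        if nl == nr:
            self.tree[node] = a[nl]
            return
        mid = (nl + nr) // 2
        self._build(2 * node, nl, mid, a)
        self._build(2 * node + 1, mid + 1, nr, a)
        self.tree[node] = self.combine(self.tree[2 * node], self.tree[2 * node + 1])

    def update(self, idx, val, node=1, nl=0, nr=None):
        if nr is None:
            nr = self.n - 1
        if nl == nr:
            self.tree[node] = val
            return
        mid = (nl + nr) // 2
        if idx <= mid:
            self.update(idx, val, 2 * node, nl, mid)
        else:
            self.update(idx, val, 2 * node + 1, mid + 1, nr)
        self.tree[node] = self.combine(self.tree[2 * node], self.tree[2 * node + 1])

    def query(self, ql, qr, node=1, nl=0, nr=None):
        if nr is None:
            nr = self.n - 1
        if qr < nl or ql > nr:
            return self.identity  # miss: contributes nothing to the combine
        if ql <= nl and nr <= qr:
            return self.tree[node]  # full cover
        mid = (nl + nr) // 2
        return self.combine(self.query(ql, qr, 2 * node, nl, mid),
                            self.query(ql, qr, 2 * node + 1, mid + 1, nr))
```

A min tree is then SegmentTree(values, combine=min, identity=float("inf")), and a max tree swaps in max and float("-inf").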
Tests
- N=1: update(0, 5); sumRange(0, 0) == 5.
- All zeros: every range query returns 0.
- All same value: sumRange(l, r) == val * (r - l + 1).
- After update(i, v): sumRange(i, i) == v; sum across full range matches direct sum of array.
- Random fuzz: 1000 ops alternating updates and queries against a brute-force list.
- Min variant: build [3, 1, 4, 1, 5, 9, 2, 6], query(2, 5) == 1; after update(3, 10), query(2, 5) == 4.
Follow-up Questions
- “Now I want range updates.” → Lab 02 (lazy propagation).
- “Now I want O(1) queries on a static array.” → sparse table (Lab 04).
- “Now the array is 2D.” → segment tree of segment trees, O(log² N) per op.
- “Make it iterative.” → power-of-two-padded leaves at indices [N, 2N); update walks i // 2 upward, query walks l, r toward the middle.
- “How would you support count of values ≥ K in [l, r]?” → merge sort tree (segment tree with sorted lists) or wavelet tree.
Product Extension
Real-time analytics dashboards: a stream of N metrics with both edits and arbitrary-range aggregates (e.g., “total revenue from days 17–24” while a correction is being applied to day 19). The naive list is fine until the workload has both fast updates and fast arbitrary-range queries on the same data — then a segment tree over the time axis, keyed by index, is what powers the underlying store.
Language/Runtime Follow-ups
- Python: recursion depth is only O(log N) (about 16 frames at N=3×10⁴), so the recursion limit is not the issue; the per-op constant is. For larger N, convert to the iterative version. Consider array.array('i', ...) over plain list for cache locality.
- Java: use int[] tree = new int[4 * n]; the 4n allocation is the safe choice because computing next_power_of_two(n) * 2 is a fencepost-error magnet. Method dispatch has a real cost, so inline the recursion if hot.
- Go: no recursion-limit issue; keep the slice. Use int (sized to platform) unless the problem demands int64.
- C++: the canonical implementation. vector<long long> tree(4 * n). Inline the body; mark methods inline. For competitive problems use the iterative version (template by Adrian Panaete or Codeforces “EDU”).
- JS/TS: typed arrays (new Int32Array(4 * n)) outperform plain arrays. Recursion depth at N=3×10⁴ is fine in V8.
Common Bugs
- Mixing closed-interval [ql, qr] with half-open [ql, qr) between query and recursion. Always pick closed and stay consistent.
- Sizing tree as 2N instead of 4N: works for power-of-two N, can index out of bounds otherwise.
- Forgetting to recompute tree[node] after recursing in update. The leaf updates correctly but the parent stays stale.
- Identity wrong for the aggregate: 0 for sum, but +∞ (float('inf') / Long.MAX_VALUE) for min and −∞ for max. Returning 0 for a missed range in a min query gives wrong answers silently.
- Building from nums[mid + 1] vs nums[mid]: pick one slicing convention.
- Iterative version: forgetting to round N up to a power of two before placing leaves.
Debugging Strategy
When tests fail, drop into a tiny instance (N=4, indices 0..3) and print(tree) after each op. Verify by hand: tree[1] should equal sum over [0,3], tree[2] over [0,1], tree[3] over [2,3]. If those don’t hold post-build, your build recursion is broken; fix that before touching update or query. Then add a temporary debug print in the cover/miss/partial branches showing (node, nl, nr, ql, qr) to spot which sub-call returns the wrong total.
Mastery Criteria
- Recognized the segment-tree signal in <60 seconds from a “range query + point update” problem statement.
- Wrote build/update/query on a blank screen in <12 minutes with no off-by-ones, first try.
- Refactored sum → min → max in <2 minutes by changing identity + combine only.
- Stated complexity (build O(N), update/query O(log N), space O(4N)) without prompting.
- Solved LC 307 in <15 minutes from cold start.
- Solved one cousin problem (LC 308 or LC 1157) in <30 minutes from cold start.
Lab 02 — Segment Tree With Lazy Propagation
Goal
Extend Lab 01’s segment tree to support range updates in O(log N) using lazy propagation. Implement range-add + range-sum and articulate the push-down invariant so cleanly that you can re-derive lazy on a different aggregate (range-set + range-sum, range-flip + range-count) under interview pressure.
Background Concepts
Lazy propagation defers work. When an update covers a whole subtree, instead of recursing into all O(2^depth) descendants, you stamp a single lazy tag on that subtree’s root and update its aggregate in O(1). The descendants stay stale until something forces a deeper visit; at that point you push_down the tag — apply it to the children’s aggregates and merge into their lazy slots — and clear the parent’s tag.
This works whenever:
- The update operation has an O(1) batch form: applying “add v to all of [nl, nr]” to tree[node] is tree[node] += v * (nr - nl + 1).
- The lazy tags compose: a pending “add 3” followed by a new “add 5” composes to “add 8”. Without composition, you cannot stack tags; you must push first.
- There is an identity lazy value (e.g., 0 for add) meaning “nothing pending”.
For mixed update types (“add” and “set” both), composition needs an explicit rule: a new “set” wipes any pending “add”; a new “add” composes with a pending “set” by changing the set value.
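The mixed add/set composition rule can be written down as three small functions. This is a sketch under an assumed tag representation (a NamedTuple called Tag with fields set_to and add); none of these names come from the lab itself.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    """Pending lazy tag: assign set_to first (if not None), then add `add`."""
    set_to: Optional[int] = None
    add: int = 0


IDENTITY = Tag()  # nothing pending


def stamp_set(tag: Tag, v: int) -> Tag:
    """A new "set" wipes whatever was pending."""
    return Tag(set_to=v, add=0)


def stamp_add(tag: Tag, v: int) -> Tag:
    """A new "add" folds into a pending "set", otherwise into the pending add."""
    if tag.set_to is not None:
        return Tag(set_to=tag.set_to + v, add=0)
    return Tag(set_to=None, add=tag.add + v)


def apply_tag(tag: Tag, x: int) -> int:
    """Effect of the tag on a single element value x."""
    base = x if tag.set_to is None else tag.set_to
    return base + tag.add
```

Using None rather than 0 as the "no set pending" marker matters because 0 is a legal value to assign.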
Interview Context
Asked at: companies with high-frequency-trading or analytics flavor (Stripe, Two Sigma, Jane Street), and Meta in bar-raiser slots. Most interview problems that need this dress up as “support add v to a range and report sum of a range” or as a count-of-overlapping-intervals problem like LC 732. Failing to know this structure caps you at Mediums; recognizing it and implementing it correctly is a green-light at L5+.
Problem Statement
Implement a class RangeArray over n integers (initially zero) supporting:
- add(l, r, v): add v to every index in [l, r].
- sumRange(l, r): return the sum of arr[l..r].
Both O(log n).
Constraints
- 1 ≤ n ≤ 10⁵
- 1 ≤ Q ≤ 10⁵ ops total.
- −10⁴ ≤ v ≤ 10⁴.
- Sums fit in 64-bit (max |sum| ≈ 10⁵ · 10⁵ · 10⁴ = 10¹⁴).
Clarifying Questions
- Endpoints inclusive? (Yes.)
- Is add cumulative or assignment? (Cumulative: additive.)
- Should arr be mutable in place at the leaves? (Conceptually yes; in practice the segment tree owns it.)
- 0-indexed or 1-indexed externally? (0-indexed.)
Examples
RangeArray(5)
add(0, 2, 3) // [3, 3, 3, 0, 0]
sumRange(0, 4) → 9
add(1, 3, 2) // [3, 5, 5, 2, 0]
sumRange(2, 4) → 7
sumRange(0, 0) → 3
Initial Brute Force
Plain list. add(l, r, v) → for i in range(l, r+1): arr[i] += v (O(N)). sumRange → sum(arr[l:r+1]) (O(N)). Combined per-op O(N).
Brute Force Complexity
O(N) per op. Total O(N · Q) = 10¹⁰ at the limits, nearly 4 orders of magnitude more work than an O(Q log N) solution. TLE.
Optimization Path
Difference array? diff[l] += v; diff[r+1] -= v is O(1) per add, but you can only query the final prefix sum after all updates — not interleaved with sum queries. Doesn’t survive mixed workload.
Two Fenwick trees (BIT-RU + BIT-PQ)? Yes, this works for range-add + range-sum specifically — the BIT² trick. Slightly faster constants than segment tree, but only handles invertible aggregates. Segment tree generalizes to range-set, range-min-after-add, range-affine, etc.
Lazy segment tree is the canonical answer.
Final Expected Approach
Augment Lab 01’s tree with a parallel lazy[] array of size 4N, all initialized to 0 (the identity for add).
- push_down(node, nl, nr): if lazy[node] != 0, apply it to both children’s aggregates (tree[child] += lazy[node] * child_len) and compose it into the children’s tags (lazy[child] += lazy[node]). Then clear lazy[node] = 0. Called at the start of any non-leaf update or query that recurses into children.
- update(node, nl, nr, ql, qr, v):
  - If qr < nl or ql > nr: return (miss).
  - If ql <= nl and nr <= qr: stamp tree[node] += v * (nr - nl + 1); lazy[node] += v; return.
  - Else: push_down; recurse both children; tree[node] = tree[left] + tree[right].
- query(node, nl, nr, ql, qr): identical structure, with push_down before recursing.
Data Structures Used
- tree[]: sum aggregates, size 4N, int64.
- lazy[]: pending add tags, size 4N, int64.
Correctness Argument
Invariant: for every node, tree[node] already includes every update stamped on that node or below it, including its own lazy[node]; the only thing it can be missing is tags still pending on its strict ancestors. So once the path from the root has been pushed down, tree[node] is exact, while each child may still be stale by exactly lazy[node] * child_len.
push_down repairs the children: it adds the missing contribution to their aggregates and composes the tag into theirs (so their own descendants will, later, be repaired similarly). It then clears lazy[node]. After push_down, tree[node] is unchanged and the children are now correct, so descendants of children may be stale only by the children’s own pending tags.
update either (a) misses, doing nothing, (b) totally covers, applying the O(1) batch update directly to tree[node] and stamping the tag, or (c) partially overlaps, which requires push_down before recursing so the children are correct, then recomputes tree[node] from now-current children.
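query is symmetric: a miss returns the identity, a full cover returns tree[node], and a partial overlap calls push_down before recursing into both children.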
Complexity
| Operation | Time | Space |
|---|---|---|
| Build | O(N) | O(4N) tree + O(4N) lazy |
| Range update | O(log N) | O(log N) recursion |
| Range query | O(log N) | O(log N) recursion |
Implementation Requirements
class RangeArray:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (4 * n)  # range sums
        self.lazy = [0] * (4 * n)  # pending add per node; 0 means nothing pending

    def _push_down(self, node, nl, nr):
        if self.lazy[node]:
            mid = (nl + nr) // 2
            left, right = 2*node, 2*node + 1
            self.tree[left] += self.lazy[node] * (mid - nl + 1)
            self.lazy[left] += self.lazy[node]
            self.tree[right] += self.lazy[node] * (nr - mid)
            self.lazy[right] += self.lazy[node]
            self.lazy[node] = 0

    def add(self, l, r, v):
        self._add(1, 0, self.n - 1, l, r, v)

    def _add(self, node, nl, nr, ql, qr, v):
        if qr < nl or ql > nr:
            return  # miss
        if ql <= nl and nr <= qr:  # full cover: O(1) batch update plus tag
            self.tree[node] += v * (nr - nl + 1)
            self.lazy[node] += v
            return
        self._push_down(node, nl, nr)  # partial overlap: repair children first
        mid = (nl + nr) // 2
        self._add(2*node, nl, mid, ql, qr, v)
        self._add(2*node + 1, mid + 1, nr, ql, qr, v)
        self.tree[node] = self.tree[2*node] + self.tree[2*node + 1]

    def sumRange(self, l, r):
        return self._sum(1, 0, self.n - 1, l, r)

    def _sum(self, node, nl, nr, ql, qr):
        if qr < nl or ql > nr:
            return 0  # miss
        if ql <= nl and nr <= qr:
            return self.tree[node]  # full cover
        self._push_down(node, nl, nr)
        mid = (nl + nr) // 2
        return (self._sum(2*node, nl, mid, ql, qr)
                + self._sum(2*node + 1, mid + 1, nr, ql, qr))
Tests
- N=1, single index: add(0, 0, 5); sumRange(0, 0) == 5.
- All-zero: any sumRange before any add returns 0.
- Disjoint adds: add(0, 2, 1), add(3, 5, 2); sumRange(0, 5) == 3 + 6 = 9.
- Overlapping adds: add(0, 4, 1), add(2, 4, 1); sumRange(2, 4) == 2 + 2 + 2 = 6.
- Stress: 10⁴ random adds + queries against a brute-force list.
- Stack of pending tags: add(0, n-1, 1) 100 times; sumRange(i, i) == 100 for all i.
Follow-up Questions
- “Now add becomes set (assignment, not delta).” → identity = sentinel None; on push_down, if lazy_parent != None, set the child’s tree to lazy_parent * child_len and overwrite the child’s lazy.
- “Both add and set operations.” → two lazy slots. Composition: a new “set” wipes pending “add”; a new “add” applied while a “set” is pending modifies the set value.
- “Range-flip on a binary array, with range-count-of-ones.” → tree[node] = count of 1s; flip → tree[node] = (nr-nl+1) - tree[node]; lazy is a boolean toggle.
- “Range-affine: replace a[i] with b · a[i] + c.” → lazy holds (b, c); composition: (b₂, c₂) ∘ (b₁, c₁) = (b₂ b₁, b₂ c₁ + c₂).
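The range-affine composition rule is easy to get backwards, so it is worth checking on concrete numbers. A small sketch with illustrative function names; tags are (b, c) pairs meaning x → b·x + c.

```python
def compose(new, old):
    """Tag equivalent to applying `old` first and `new` second."""
    b2, c2 = new
    b1, c1 = old
    return (b2 * b1, b2 * c1 + c2)


def apply_to_sum(tag, total, length):
    """Batch form: new sum of a range whose old sum is `total` over `length` items."""
    b, c = tag
    return b * total + c * length
```

The batch form works because Σ(b·x + c) over a range equals b·Σx + c·length, which is what lets a full-cover update stay O(1).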
Product Extension
A live spreadsheet with array formulas — =ARRAYFORMULA(A1:A1000 + 5) — is exactly range-add. With range-set you get fill-down. With range-affine you get scaling formulas. The backend has to support thousands of these per second per spreadsheet; lazy segment trees are one viable engine.
Language/Runtime Follow-ups
- Python: allocating the two 4 × 10⁵-slot lists is slow; allocate them once up front and never resize. Recursion depth at N=10⁵ is ≈ 17, fine.
- Java: long[] tree, lazy to avoid sum overflow at the limits. Synchronization-free for single-thread; LongAdder is unrelated.
- Go: same template; []int64 slices.
- C++: the canonical use case. vector<long long>. Compile with -O2; benchmark on N=10⁶.
- JS/TS: BigInt64Array is heavy; if values fit in Number’s 53-bit safe range, use Float64Array despite being float (IEEE 754 doubles represent every integer of magnitude up to 2⁵³ exactly).
Common Bugs
- Forgetting push_down before recursing on partial cover. The leaf updates correctly but its sibling subtree returns stale aggregates on later queries. Manifests as queries that depend on update order.
- Push down on full-cover branch: wasted work but not wrong; only push on partial overlap.
- Identity confusion: 0 is identity for add but a legal value for set. Use None or a sentinel for set-style lazy.
- Composition direction: when stamping a new tag onto a parent that already has a tag, write the rule down before coding. For add it’s commutative; for set it isn’t.
- int overflow in Java/C++: a single add can contribute up to 10⁵ · 10⁴ = 10⁹ to a range sum, and repeated adds push it toward 10¹⁴. Use 64-bit.
- Calling push_down on a leaf: guard with if nl != nr.
Debugging Strategy
Add an assert_consistent() helper that walks the tree and verifies, for every internal node, tree[node] == tree[left] + tree[right] + lazy[node] * (nr - nl + 1). The lazy term is needed because tree[node] already includes its own pending tag while the stored children do not yet. An easier debug helper: after each op, force a full push_down from root to leaves and rebuild aggregates; compare against the brute-force array. If they diverge, you have a push-down-order bug.
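A consistency checker of this kind could look like the sketch below. It assumes the range-add tree from this lab, with tree and lazy stored as flat lists rooted at index 1 over the index range [0, n − 1]; the signature is illustrative.

```python
def assert_consistent(tree, lazy, n, node=1, nl=0, nr=None):
    """Check tree[node] == tree[left] + tree[right] + lazy[node] * length
    for every internal node of a range-add / range-sum lazy segment tree."""
    if nr is None:
        nr = n - 1
    if nl == nr:
        return  # leaves have no children to compare against
    mid = (nl + nr) // 2
    left, right = 2 * node, 2 * node + 1
    expected = tree[left] + tree[right] + lazy[node] * (nr - nl + 1)
    assert tree[node] == expected, (node, nl, nr, tree[node], expected)
    assert_consistent(tree, lazy, n, left, nl, mid)
    assert_consistent(tree, lazy, n, right, mid + 1, nr)
```

Calling it as assert_consistent(ra.tree, ra.lazy, ra.n) after every operation in a fuzz run points at the first node where the aggregate and the pending tag disagree.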
Mastery Criteria
- Stated the push-down invariant in one sentence.
- Wrote range-add + range-sum lazy seg tree from scratch in <20 minutes, first try.
- Adapted the same template to range-set + range-sum in <10 additional minutes.
- Solved LC 732 (My Calendar III) using a coord-compressed lazy seg tree.
- Stated when not to use lazy (single-point updates → no benefit; non-composing operations → impossible).
- Pinpointed the canonical bug (missing push_down) within 5 minutes of seeing a failing test.
Lab 03 — Fenwick Tree (Binary Indexed Tree)
Goal
Implement a Fenwick tree (BIT) and use it to solve LeetCode 315 — Count of Smaller Numbers After Self. Internalize the bit-tricks (i & -i) and the 1-indexed convention so well that you can write a Fenwick tree in under 8 minutes from a blank page.
Background Concepts
A Fenwick tree is a compact encoding of prefix sums in O(N) space supporting prefix_sum(i) and point_update(i, delta) each in O(log N). The key insight: index i in 1-indexed form is associated with a “responsibility range” of size i & -i (the lowest set bit of i). Index 12 = 1100₂ has responsibility for the 4 values at positions 9..12. Index 8 = 1000₂ for 1..8. A prefix query walks down (i -= i & -i), accumulating non-overlapping responsibility ranges that tile exactly [1, i]; an update walks up (i += i & -i), visiting exactly the buckets whose range contains i.
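The responsibility ranges and the two walks can be made concrete with a few helper functions. These are illustrative sketches (the names are not part of the lab) that only compute index paths; they do not touch any stored values.

```python
def responsibility_range(i: int):
    """1-indexed inclusive range [lo, hi] whose sum a Fenwick tree stores at slot i."""
    low_bit = i & -i
    return i - low_bit + 1, i


def query_path(i: int):
    """Slots read by prefix(i): strip the lowest set bit until reaching 0."""
    path = []
    while i > 0:
        path.append(i)
        i -= i & -i
    return path


def update_path(i: int, n: int):
    """Slots written by update(i, delta) in a tree of size n: add the lowest set bit."""
    path = []
    while i <= n:
        path.append(i)
        i += i & -i
    return path
```

For example, prefix(13) reads slots 13, 12, and 8, whose ranges [13, 13], [9, 12], and [1, 8] tile [1, 13] with no overlap.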
The structure is invertible-only: it stores prefix sums and you derive range_sum(l, r) = prefix(r) - prefix(l - 1). This is fine for sum, xor, count, and “frequency-prefix” aggregates. It does not generalize to min/max because subtraction doesn’t undo a min.
For LC 315, the trick is coordinate compression + Fenwick of frequencies. Process the array right-to-left; for each nums[i], query “how many values strictly less than nums[i] have I seen so far?” by computing prefix(rank(nums[i]) - 1) on the frequency Fenwick; then increment update(rank(nums[i]), 1).
Interview Context
Fenwick trees are asked roughly as often as segment trees but the audience skews more competitive-programming. Stripe, Jane Street, Two Sigma, Bloomberg quant — all reach for them. The signal is “count inversions / count-of-X-after-Y / range-sum-with-updates and the aggregate is invertible”. Faster constants than segment tree, ~5x fewer lines of code; if both work, prefer Fenwick.
Problem Statement
Given an integer array nums, return an array counts where counts[i] is the number of elements to the right of nums[i] that are strictly smaller than nums[i].
Constraints
- 1 ≤ N ≤ 10⁵
- −10⁴ ≤ nums[i] ≤ 10⁴
Clarifying Questions
- Strictly smaller, or ≤? (Strictly smaller.)
- Return order: same as input order? (Yes: counts[i] aligns with nums[i].)
- Are duplicates allowed? (Yes; they don’t count toward “smaller”.)
Examples
nums = [5, 2, 6, 1] → counts = [2, 1, 1, 0]
5: indices 1,3 (vals 2,1) are smaller → 2
2: index 3 (val 1) is smaller → 1
6: index 3 (val 1) is smaller → 1
1: nothing to the right is smaller → 0
Initial Brute Force
For each i, scan j > i and count nums[j] < nums[i]. O(N²).
Brute Force Complexity
O(N²) time. At N=10⁵: 10¹⁰ ops, nearly 4 orders of magnitude more work than an O(N log N) solution. TLE.
Optimization Path
Merge sort with inversion counting. Sort indices by value; during the merge step, each time an element from the left half is placed, every right-half element already placed is strictly smaller than it and originally sat to its right, so add that running count to the left element’s answer. O(N log N) time, O(N) extra space. Works, idiomatic.
Fenwick of frequencies after coordinate compression. Equally O(N log N), simpler to extend (e.g., to “count of values in [a, b] after self”).
For this lab, Fenwick is the assigned approach because it generalizes farther.
Final Expected Approach
- Coordinate compression: build sorted_unique = sorted(set(nums)); map each value v to rank = bisect_left(sorted_unique, v) + 1 (1-indexed for Fenwick).
- Right-to-left sweep: for each i from n-1 down to 0:
  - r = rank[nums[i]]
  - counts[i] = bit.prefix(r - 1), the count of strictly smaller previously-seen values.
  - bit.update(r, 1).
- Return counts.
Data Structures Used
- BIT(size): Fenwick tree of size = number of distinct values.
- rank map: value → 1-indexed compressed rank.
- counts[]: output array.
Correctness Argument
When the sweep reaches index i (before its update), the BIT contains exactly the multiset of ranks for nums[i+1..n-1] (the elements to the right of i, since we go right-to-left). bit.prefix(r - 1) returns the count of those whose rank is < r, i.e., strictly smaller than nums[i]. Coordinate compression preserves order, so “rank smaller” iff “value smaller”. The update bit.update(r, 1) then registers nums[i], so the BIT holds nums[i..n-1] when the sweep moves on to i-1. By induction the invariant “BIT == multiset of ranks of strictly-right-of-current” holds at every query.
Complexity
| Operation | Time | Space |
|---|---|---|
| Coordinate compression | O(N log N) | O(N) |
| Right-to-left sweep | O(N log N) | O(N) Fenwick |
| Total | O(N log N) | O(N) |
Implementation Requirements
class BIT:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed; tree[0] is unused

    def update(self, i, delta):
        # Add delta at position i: walk up to every node whose range covers i.
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        # Sum of positions 1..i: walk down, stripping the lowest set bit.
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

def countSmaller(nums):
    sorted_unique = sorted(set(nums))
    # Same mapping as bisect_left(sorted_unique, v) + 1, precomputed as a dict.
    rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
    bit = BIT(len(sorted_unique))
    counts = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        r = rank[nums[i]]
        counts[i] = bit.prefix(r - 1)  # strictly smaller values already seen
        bit.update(r, 1)
    return counts
Tests
- [5, 2, 6, 1] → [2, 1, 1, 0].
- [1, 2, 3, 4] → [0, 0, 0, 0] (already sorted).
- [4, 3, 2, 1] → [3, 2, 1, 0] (reverse sorted, so every pair is an inversion).
- All same: [5, 5, 5] → [0, 0, 0].
- Negatives: [-1, -1, 0, -2] → [1, 1, 1, 0].
- Single element: [7] → [0].
- Stress: 10⁴ random arrays of size 1000 against the O(N²) brute force.
Follow-up Questions
- “Now count strictly larger after self.” → mirror: sum over ranks rank+1 to n = bit.prefix(n) - bit.prefix(rank).
- “Count of values in [a, b] after self.” → bit.prefix(hi) - bit.prefix(lo), where hi = bisect_right(sorted_unique, b) and lo = bisect_left(sorted_unique, a). Use binary search rather than rank[a] and rank[b], because a and b need not occur in nums.
- “Reverse pairs (LC 493)”: nums[i] > 2 * nums[j] for i < j. Adapt the rank/query: compute “count of v's in BIT with v < nums[i] / 2”. Be careful with integer division, especially for negatives; comparing 2 * v < nums[i] avoids division entirely.
- “Sum of values smaller after self instead of count.” → BIT stores value sums, not counts; update(rank, nums[i]) instead of update(rank, 1).
- “Now updates are interleaved with queries on the original problem.” → Fenwick tree of frequencies still works because both ops are O(log N).
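Two of the follow-ups above (strictly larger after self, and values in [a, b] after self) can be sketched with the same right-to-left sweep. This is a minimal sketch: the function names are illustrative, and the BIT class is repeated so the block runs on its own. Ranks are found with bisect, since a and b are not assumed to occur in nums.

```python
from bisect import bisect_left, bisect_right

class BIT:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

def count_in_range_after_self(nums, a, b):
    """out[i] = number of j > i with a <= nums[j] <= b."""
    values = sorted(set(nums))
    hi = bisect_right(values, b)  # ranks 1..hi hold the values <= b
    lo = bisect_left(values, a)   # ranks 1..lo hold the values < a
    bit = BIT(len(values))
    out = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        out[i] = bit.prefix(hi) - bit.prefix(lo)
        bit.update(bisect_left(values, nums[i]) + 1, 1)
    return out

def count_larger_after_self(nums):
    """out[i] = number of j > i with nums[j] > nums[i]."""
    values = sorted(set(nums))
    bit = BIT(len(values))
    out = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        r = bisect_left(values, nums[i]) + 1
        out[i] = bit.prefix(bit.n) - bit.prefix(r)  # everything seen minus ranks <= r
        bit.update(r, 1)
    return out
```

Both variants change only the query line; the update and the sweep direction stay identical to the base solution.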
Product Extension
A leaderboard service that streams game scores and reports “your rank percentile” as scores arrive. Fenwick of frequencies indexed by score bucket; as a new score arrives, query prefix to know how many scored less, divide by total. Each event costs O(log B) for B score buckets, cheap enough for millions of events per second.
Language/Runtime Follow-ups
- Python: integer ops are arbitrary-precision but slow; the BIT loop is hot. PyPy or C-extension if N=10⁶. array.array('q') over list only marginally helps.
- Java: int[] for the tree at this N. Math.floorMod not needed (values are positive ranks). Watch for long if you store sums.
- Go: idiomatic: tree []int. No surprises.
- C++: canonical CP template. vector<int> bit(n + 1, 0). The i & -i lowbit relies on two's complement for signed int, which C++20 mandates and mainstream compilers already used before that.
- JS/TS: Int32Array(n + 1) outperforms regular arrays. The bitwise & converts its operands to 32-bit signed ints, which works for N < 2³¹.
Common Bugs
- 0-indexing the BIT. In update(0, …), i & -i = 0, so i never advances and the loop spins forever; prefix(0) silently returns 0, which hides the bug. Always 1-index.
- Reversing the walk directions. Mnemonic: update goes up the responsibility tree (i += lowbit, so future prefix walks see it); query walks down (i -= lowbit, collecting predecessor ranges).
- Forgetting to add 1 when going from bisect_left rank to BIT index.
- Compressing nums but using the original value when querying.
- tree array size n, not n + 1, for 1-indexed.
- Using Fenwick for min/max. It doesn't work because min has no inverse: a prefix-min cannot be “subtracted” to recover a range-min.
Debugging Strategy
For a length-8 input, print tree[1..8] after each update. Recall: tree[i] stores the sum over [i - lowbit(i) + 1, i]. So tree[8] should equal the sum over [1, 8], tree[6] over [5, 6], tree[4] over [1, 4], etc. (In a larger tree, tree[12] covers [9, 12].) Verify by hand for a 3-update sequence.
If the inversion-count is off by 1 at every position, you almost certainly forgot the +1 in rank shifting (1-indexed vs 0-indexed). If it’s off by a lot, your update is going down instead of up, or your prefix is going up instead of down.
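The coverage rule used in this debugging step can be checked mechanically. The helpers below are a hypothetical debugging aid (names fenwick_coverage, build_tree, check_tree are illustrative, not part of the lab's required API): they list which range each node is responsible for and assert that a built tree matches it.

```python
def lowbit(i):
    return i & -i

def fenwick_coverage(n):
    """Maps each node i to the 1-indexed inclusive range it sums."""
    return {i: (i - lowbit(i) + 1, i) for i in range(1, n + 1)}

def build_tree(values):
    """Fenwick array for `values` (values[0] is stored at index 1)."""
    n = len(values)
    tree = [0] * (n + 1)
    for pos, v in enumerate(values, start=1):
        i = pos
        while i <= n:
            tree[i] += v
            i += lowbit(i)
    return tree

def check_tree(values, tree):
    """True if every tree[i] equals the sum over its coverage range."""
    for i, (lo, hi) in fenwick_coverage(len(values)).items():
        if tree[i] != sum(values[lo - 1:hi]):
            return False
    return True

if __name__ == "__main__":
    for node, (lo, hi) in fenwick_coverage(8).items():
        print(f"tree[{node}] covers [{lo}, {hi}]")
```

If check_tree fails right after a single update, the update walk is the likely culprit; if it passes but answers are wrong, look at prefix or at the rank mapping.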
Mastery Criteria
- Wrote update and prefix with i & -i correctly on first try.
- Used 1-indexed throughout without bugs.
- Solved LC 315 in <15 minutes from blank slate.
- Solved one cousin (LC 327, LC 493) in <30 minutes.
- Articulated why Fenwick can't do range-min.
- Stated when Fenwick beats segment tree (smaller code, smaller constants; pick Fenwick when the aggregate is invertible).
- Estimated memory at N=10⁶ (~4 MB for int) without prompting.
Lab 04 — Sparse Table for Range Minimum Queries
Goal
Implement a sparse table supporting O(1) range-min queries on a static array, after O(N log N) preprocessing. Internalize the “two overlapping intervals of the largest power-of-two length” trick. After this lab you should be able to write a sparse table from blank in under 10 minutes and instantly choose between sparse table and segment tree based on whether updates are required.
Background Concepts
A sparse table is a preprocessing structure for range queries on immutable arrays where the aggregate is idempotent — combining the same element twice gives the same result. Min, max, gcd, bitwise OR, bitwise AND, and “is-there-a-1-in-this-range” are idempotent. Sum is not (counts twice).
Construction. st[k][i] = min of arr[i .. i + 2^k - 1]. Build by:
- st[0][i] = arr[i] for all i.
- st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]): the range of length 2^k splits cleanly into two halves of length 2^(k-1).
Query. Given [l, r], let k = floor(log2(r - l + 1)). Then the answer is min(st[k][l], st[k][r - 2^k + 1]). The two intervals each have length 2^k and cover [l, l + 2^k - 1] and [r - 2^k + 1, r]. Their union is exactly [l, r] because l + 2^k - 1 ≥ r - 2^k + 1 whenever 2^(k+1) ≥ r - l + 2, which holds because the choice of k gives r - l + 1 < 2^(k+1). They overlap, but for an idempotent op that's harmless.
For O(1) per query you also need a precomputed log_floor[len] table — calling math.log2 each query has too much overhead and floating-point trouble.
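The integer-only log table and the two-block cover can be sketched in a few lines. This is a minimal sketch; build_log_floor and cover are illustrative names, and cover exists only to make the overlapping blocks visible.

```python
def build_log_floor(n):
    """log_floor[x] = floor(log2(x)) for 1 <= x <= n, using integers only."""
    log_floor = [0] * (n + 1)
    for x in range(2, n + 1):
        log_floor[x] = log_floor[x // 2] + 1
    return log_floor

def cover(l, r, log_floor):
    """The two length-2^k blocks whose union is [l, r] (inclusive bounds)."""
    k = log_floor[r - l + 1]
    return (l, l + (1 << k) - 1), (r - (1 << k) + 1, r)

if __name__ == "__main__":
    lf = build_log_floor(16)
    print(cover(0, 6, lf))  # two length-4 blocks sharing index 3
    print(cover(2, 5, lf))  # length is a power of two: both blocks coincide
```

When the query length is an exact power of two, both blocks are the same cell, which is a useful sanity check while debugging.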
Interview Context
Sparse tables show up in problems with a read-only array and many range-min/max queries. The signal: “static array, Q queries with Q ≫ N”. Common cousin: range-LCA via Euler tour + sparse table over depth array — Phase 4 territory but rooted here. Asked at: Google occasionally, CP-flavored shops always. Rejecting an O(log N) segment tree in favor of a sparse table when O(1) queries matter (e.g., 10⁷ queries on a 10⁵ array) is a senior-level signal.
Problem Statement
Given a static integer array arr of length N, build a structure that answers query(l, r) = min of arr[l..r] in O(1) per query.
Constraints
- 1 ≤ N ≤ 10⁵
- 1 ≤ Q ≤ 10⁷ queries
- 0 ≤ l ≤ r < N
- −10⁹ ≤ arr[i] ≤ 10⁹
Clarifying Questions
- Is the array static? (Yes — that’s the entire premise.)
- Inclusive endpoints? (Yes.)
- Min, or min and max? (Just min for this lab; max is identical with min → max.)
Examples
arr = [3, 1, 4, 1, 5, 9, 2, 6]
query(0, 7) → 1
query(2, 5) → 1
query(4, 7) → 2
query(3, 3) → 1
Initial Brute Force
min(arr[l:r+1]) per query — O(N) per call. Total O(N · Q).
Brute Force Complexity
At N=10⁵, Q=10⁷: 10¹² ops, roughly 10⁴× over a ~10⁸-ops time budget. TLE.
Optimization Path
Segment tree gives O(log N) per query, O(N) per build. Total O(Q log N) = ~2 × 10⁸ at the limits — borderline TLE in Python, fine in C++.
Sparse table gives O(1) per query, O(N log N) per build. Total O(N log N + Q) = ~2 × 10⁶ + 10⁷ = 1.2 × 10⁷ ops. Comfortable everywhere.
The deciding factor: updates. Sparse table is read-only. If the array mutates between queries, sparse table is wrong; segment tree is required. The interviewer asking “what if I want updates?” is a real follow-up — answer: “Switch to a segment tree; sparse table doesn’t support point updates without an O(N log N) full rebuild.”
Final Expected Approach
- Precompute log_floor[1..N] via log_floor[i] = log_floor[i // 2] + 1, base case log_floor[1] = 0.
- Allocate st as a 2D array of size (K + 1) × N where K = log_floor[N].
- st[0][i] = arr[i] for all i.
- For k = 1 .. K: for i = 0 .. N - 2^k: st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]).
- query(l, r): k = log_floor[r - l + 1]; return min(st[k][l], st[k][r - 2^k + 1]).
Data Structures Used
- 2D array st[K+1][N], where K = floor(log2(N)).
- 1D array log_floor[N+1].
Correctness Argument
By induction on k: st[0][i] = arr[i] (length-1 range, trivially correct). Given st[k-1][·] correct: st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]) covers [i, i + 2^(k-1) - 1] ∪ [i + 2^(k-1), i + 2^k - 1] = [i, i + 2^k - 1], and the min over a union of ranges is the min of the per-range mins.
For query, k = floor(log2(r - l + 1)) ⇒ 2^k ≤ len ≤ 2^(k+1) - 1 ⇒ 2^(k+1) ≥ len + 1 = r - l + 2 ⇒ l + 2^k - 1 ≥ r - 2^k + 1, so the two intervals [l, l + 2^k - 1] and [r - 2^k + 1, r] share at least one index. Both lie inside [l, r] because 2^k ≤ len, so their union is exactly [l, r]. Min over the union equals min of the two.
Complexity
| Operation | Time | Space |
|---|---|---|
| Build | O(N log N) | O(N log N) |
| Query | O(1) | — |
At N=10⁵: K = 16, so the table has K + 1 = 17 rows and ≈ 1.7 × 10⁶ cells. At 8 bytes each, ~14 MB.
Implementation Requirements
class SparseTableMin:
    def __init__(self, arr):
        n = len(arr)
        # log[x] = floor(log2(x)), built without floating point
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        K = self.log[n]
        # Row 0 is a copy of the array; rows 1..K are filled below.
        self.st = [list(arr)] + [[0] * n for _ in range(K)]
        for k in range(1, K + 1):
            half = 1 << (k - 1)
            for i in range(n - (1 << k) + 1):
                self.st[k][i] = min(self.st[k-1][i], self.st[k-1][i + half])

    def query(self, l, r):
        # Inclusive [l, r]; two overlapping blocks of length 2^k.
        k = self.log[r - l + 1]
        return min(self.st[k][l], self.st[k][r - (1 << k) + 1])
Tests
- N=1: query(0, 0) == arr[0].
- All same: query(l, r) == arr[0] for all valid (l, r).
- Sorted ascending: query(l, r) == arr[l].
- Sorted descending: query(l, r) == arr[r].
- Random: 10⁴ queries on a length-1000 random array vs brute force.
- Edge: query(0, n-1) should equal min(arr).
- Power-of-two length and non-power-of-two length both must pass.
Follow-up Questions
- “Now also support range-max.” → second sparse table or pack (min, max) into each cell.
- “Now updates are required.” → switch to segment tree; a sparse table needs an O(N log N) rebuild after any point update.
- “Now I want range-sum.” → sum is not idempotent. You can still answer in O(log N) by combining up to log(len) non-overlapping power-of-two blocks (one per set bit of len), but at that point a segment tree, or plain prefix sums for a static array, is simpler.
- “Range LCA.” → reduce to range-min on Euler tour depth array + sparse table over depths. Lab in Phase 4.
- “Reduce memory at the cost of complexity.” → Fischer–Heun style block decomposition gives O(N) preprocessing + O(1) query (Bender–Farach-Colton covers the ±1 special case) but is conceptually heavy.
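The range-max follow-up, and any other idempotent aggregate, needs no new idea: only the combining function changes. A minimal sketch, assuming a generic class named SparseTable with an op parameter (both are illustrative, not part of the lab's required interface):

```python
from math import gcd

class SparseTable:
    """Static range queries for any idempotent, associative op (min, max, gcd, ...)."""

    def __init__(self, arr, op=min):
        n = len(arr)
        self.op = op
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        self.st = [list(arr)]
        k = 1
        while (1 << k) <= n:
            prev, half = self.st[-1], 1 << (k - 1)
            # Row k only stores the n - 2^k + 1 valid starting positions.
            self.st.append([op(prev[i], prev[i + half])
                            for i in range(n - (1 << k) + 1)])
            k += 1

    def query(self, l, r):
        """Aggregate over the inclusive range [l, r]."""
        k = self.log[r - l + 1]
        row = self.st[k]
        return self.op(row[l], row[r - (1 << k) + 1])

if __name__ == "__main__":
    arr = [3, 1, 4, 1, 5, 9, 2, 6]
    print(SparseTable(arr, min).query(4, 7))
    print(SparseTable(arr, max).query(2, 5))
    print(SparseTable([12, 18, 24, 36], gcd).query(0, 3))
```

Passing a non-idempotent op such as addition would double-count the overlap, which is exactly the sum bug described in this lab.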
Product Extension
Static analytics dashboards (pre-aggregated, refreshed nightly) over time-series metrics: “min latency in this 5-minute window over the last 24 hours, sliding”. Pre-aggregate the time-series as a sparse table at end-of-day; serve queries at the dashboard at single-microsecond latencies. The “static” condition matches because the data is read-only between rebuilds.
Language/Runtime Follow-ups
- Python: list-of-lists is cache-unfriendly; flatten to one big list with manual indexing for ~3x speedup. PyPy if benchmarking.
- Java: int[][] st = new int[K+1][n]. JIT will hoist invariants. For N=10⁶ allocate carefully.
- Go: [][]int is fine; make([]int, n) inside a loop is idiomatic.
- C++: vector<vector<int>> st(K + 1, vector<int>(n)). With -O2 this is the canonical fast implementation.
- JS/TS: Int32Array per row beats Array for numeric ops. JS doesn't have integer log2 (Math.log2 is float and slow); use (31 - Math.clz32(x)) for 32-bit ints.
Common Bugs
- Computing log2 per query → floating-point risk: a float log can land just below an exact integer (e.g. 2.9999... instead of 3 for length 8) and floor to the wrong k. Always use the precomputed log_floor[].
- Building st[k][i] for i + 2^k - 1 ≥ N: out-of-bounds. The last valid i is n - 2^k.
- Sizing st with too few rows: K = log_floor[N], but allocating K rows misses the K-th. Allocate K+1.
- Using sparse table for sum and then puzzling over wrong answers. Sum is not idempotent.
- Forgetting that the array must be static. If a query is interleaved with mutation, the structure silently returns stale answers.
- Off-by-one in query: r - (1 << k) + 1 vs r - (1 << k). The interval is [r - 2^k + 1, r] of length 2^k; verify by hand on a tiny case.
Debugging Strategy
Print st[k] for small N=8 and verify by hand: st[0] is the array, st[1][i] = min(arr[i], arr[i+1]), st[2][i] = min(arr[i..i+3]), st[3][0] = min(arr[0..7]). If those don’t hold, your build loop is wrong. If queries fail but build is correct, suspect log_floor and/or query indexing — trace the formula on query(2, 5) where len=4, k=2, indices=2 and 5-4+1=2, both pointing at the same precomputed cell.
Mastery Criteria
- Stated the idempotence requirement and gave 3 ops that satisfy it and 1 that doesn’t.
- Wrote sparse table from scratch in <10 minutes.
- Wrote log_floor table without using math.log2.
- Chose sparse table over segment tree for a read-only workload by stating the constants (1 vs log N per query).
- Identified the failure mode “what if updates are needed” and named segment tree as the replacement.
- Solved one classic RMQ problem and one problem reducible to RMQ.
Lab 05 — KMP String Matching
Goal
Implement Knuth–Morris–Pratt (KMP): build the failure function (longest proper prefix-suffix) of a pattern in O(M), then match the pattern against a text in O(N + M). Apply it to LeetCode 28 (strStr) and LeetCode 459 (Repeated Substring Pattern). After this lab, you should be able to derive fail[] on a 10-character string by hand and write the matcher in <12 minutes.
Background Concepts
The naive substring search compares the pattern against every position in the text: O(N · M) worst case. KMP exploits the fact that when a mismatch occurs at pattern position j, the prefix P[0..j-1] did match. So we already know the last j characters of the text. From that we compute “what is the longest proper prefix of P that is also a suffix of P[0..j-1]?” — call that length fail[j-1]. Then we resume matching at pattern position fail[j-1] without backtracking the text pointer.
The failure function (also called “longest proper prefix-suffix” or LPS):
- fail[i] = length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i].
- “Proper” = strictly shorter than i + 1.
- fail[0] = 0 always.
Build in O(M) using a two-pointer recurrence: j = fail[i-1]; if P[j] == P[i], fail[i] = j + 1; else fall back j = fail[j-1] and retry. If j reaches 0 and P[0] != P[i], then fail[i] = 0.
The matcher: walk text pointer i forward; pattern pointer j advances on match, falls back to fail[j-1] on mismatch (without resetting i).
Interview Context
KMP is the bedrock single-pattern string algorithm. Asked at every FAANG, every quant shop, every search-infra team. The give-away signal: “find pattern in text” with N, M up to 10⁵ — naive is 10¹⁰ ops. Most candidates know strStr exists; few can derive fail[] correctly under pressure. Doing it cleanly is a strong signal at L4+.
Problem Statement (LC 28)
Given two strings haystack and needle, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Constraints
- 1 ≤ haystack.length, needle.length ≤ 10⁴ (LC 28); generalize to 10⁵.
- All printable ASCII.
Clarifying Questions
- First occurrence (leftmost), or any? (Leftmost.)
- Return 0 for empty needle? (Yes, by convention.)
- Case-sensitive? (Yes by default.)
Examples
strStr("sadbutsad", "sad") → 0
strStr("leetcode", "leeto") → -1
strStr("hello", "ll") → 2
strStr("aabaaabaaac", "aabaaac") → 4
Initial Brute Force
Two nested loops: for each starting index i ∈ [0, N - M], compare text[i..i+M-1] against pattern; return i on full match.
Brute Force Complexity
O((N - M + 1) · M) ≈ O(N · M). Worst case at text = "aaa..a", pattern = "aaa..b": each starting position fails on the last character. At N = M = 10⁵: 10¹⁰ ops. TLE.
Optimization Path
KMP O(N + M). Z-algorithm O(N + M): equally good, different invariants. Rabin–Karp O(N + M) expected with hashing; needs verification whenever the hashes match. Suffix automaton: O(N) build on the text, then O(M) per pattern; overkill for one pattern.
KMP is the canonical answer because it (a) is exact (no probabilistic concerns), (b) generalizes to “longest border” / “shortest period” follow-ups, (c) is preferred by interviewers as a known-quantity algorithm.
Final Expected Approach
- Build fail[] of length M for the pattern.
- Walk a single pointer i over the text and j over the pattern.
  - If text[i] == pat[j]: advance both; if j == M, report match at i - M.
  - If mismatch and j > 0: j = fail[j - 1] (don't advance i).
  - If mismatch and j == 0: i += 1.
Data Structures Used
fail[]— int array of length M.- Two pointers,
iandj.
Correctness Argument
Failure function invariant: at the end of the build loop’s iteration on i, fail[i] equals the length of the longest proper prefix of P[0..i] matching its suffix.
Proof sketch: assume fail[0..i-1] is correct. Set j = fail[i-1] — the longest proper border of P[0..i-1]. Try to extend: if P[j] == P[i], the border extends by one to j + 1, and there is no longer border (any longer would give a longer border at i-1). If not, fall back to the next-shorter border via j = fail[j-1]; repeat.
Match invariant: when the matcher is at text position i and pattern position j, the last j characters of text up to i-1 equal P[0..j-1]. On mismatch, falling back to fail[j-1] finds the longest proper prefix of P that is a suffix of P[0..j-1], which is also a suffix of the text so far — preserving the invariant without rescanning text.
Linearity: in the matcher, the variable i + (i - j) strictly increases on every iteration. Since i ≤ N and j ≥ 0, the loop runs ≤ 2N times. Same trick for the build: i + (i - j) ≤ 2M.
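The linearity argument can be observed directly by counting loop iterations. The sketch below follows the while-loop formulation from the approach section and returns the iteration count alongside the match index; kmp_first_match and the steps counter are illustrative additions, not part of the lab's required interface.

```python
def build_failure(pat):
    fail = [0] * len(pat)
    j = 0
    for i in range(1, len(pat)):
        while j > 0 and pat[j] != pat[i]:
            j = fail[j - 1]
        if pat[j] == pat[i]:
            j += 1
        fail[i] = j
    return fail

def kmp_first_match(text, pat):
    """Returns (index of first match or -1, loop iterations used)."""
    n, m = len(text), len(pat)
    if m == 0:
        return 0, 0
    fail = build_failure(pat)
    i = j = steps = 0
    while i < n:
        steps += 1
        if text[i] == pat[j]:
            i += 1
            j += 1
            if j == m:
                return i - m, steps  # i is one past the end of the match
        elif j > 0:
            j = fail[j - 1]          # fall back; the text pointer stays put
        else:
            i += 1
    return -1, steps

if __name__ == "__main__":
    idx, steps = kmp_first_match("a" * 1000, "a" * 10 + "b")
    print(idx, steps, steps <= 2 * 1000)
```

Every iteration raises 2i - j by at least one, so steps can never exceed 2N, even on the all-'a' adversarial input that makes the naive scan quadratic.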
Complexity
| Operation | Time | Space |
|---|---|---|
| Failure function build | O(M) | O(M) |
| Match | O(N + M) | O(M) |
Implementation Requirements
def build_failure(pat):
    m = len(pat)
    fail = [0] * m
    j = 0  # length of the current border
    for i in range(1, m):
        # Fall back through shorter borders until one extends, or none is left.
        while j > 0 and pat[j] != pat[i]:
            j = fail[j - 1]
        if pat[j] == pat[i]:
            j += 1
        fail[i] = j
    return fail

def strStr(text, pat):
    if not pat: return 0
    n, m = len(text), len(pat)
    if m > n: return -1
    fail = build_failure(pat)
    j = 0  # number of pattern characters currently matched
    for i in range(n):
        while j > 0 and text[i] != pat[j]:
            j = fail[j - 1]
        if text[i] == pat[j]:
            j += 1
        if j == m:
            return i - m + 1  # i is the last index of the match
    return -1
Tests
- Empty pattern → 0.
- Pattern not in text → −1.
- Pattern at the start: strStr("abcd", "ab") == 0.
- Pattern at the end: strStr("abcd", "cd") == 2.
- Pattern equals text: strStr("hello", "hello") == 0.
- Repeated chars: strStr("aaaa", "aa") == 0.
- Worst-case backtrack: strStr("aaaaab", "aaab") == 2.
- Verify fail for "aabaaab" → [0, 1, 0, 1, 2, 2, 3].
- Stress: random texts/patterns, compared against text.find(pat).
Follow-up Questions
- “Find all occurrences.” → on match, instead of returning, record i - m + 1 and continue with j = fail[j - 1].
- “Repeated substring pattern (LC 459).” → s is composed of k ≥ 2 repetitions of a substring iff n % (n - fail[n-1]) == 0 and fail[n-1] > 0.
- “Shortest palindrome (LC 214).” → run KMP on s + '#' + reverse(s); fail[-1] is the length of the longest palindromic prefix of s, and the answer prepends the reverse of the remaining suffix.
- “Multi-pattern matching.” → Aho–Corasick (Phase 3 #17) generalizes KMP to a trie of patterns.
- “Shortest period” of S = n - fail[n-1]; longest border = fail[n-1]. They are dual.
- “Z algorithm: implement it instead.” → different invariant, same asymptotics; pick whichever is fluent.
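Three of these follow-ups reuse the failure function almost unchanged. The sketch below is one possible way to write them; the function names are illustrative, build_failure is repeated so the block runs on its own, and the '#' separator is assumed not to occur in the input.

```python
def build_failure(pat):
    fail = [0] * len(pat)
    j = 0
    for i in range(1, len(pat)):
        while j > 0 and pat[j] != pat[i]:
            j = fail[j - 1]
        if pat[j] == pat[i]:
            j += 1
        fail[i] = j
    return fail

def find_all(text, pat):
    """All start indices of pat in text, overlapping matches included."""
    if not pat:
        return []
    fail, out, j = build_failure(pat), [], 0
    for i, c in enumerate(text):
        while j > 0 and c != pat[j]:
            j = fail[j - 1]
        if c == pat[j]:
            j += 1
        if j == len(pat):
            out.append(i - len(pat) + 1)
            j = fail[j - 1]  # keep the longest border so overlaps are found
    return out

def repeated_substring_pattern(s):
    """LC 459: True iff s is k >= 2 copies of some substring."""
    n = len(s)
    if n == 0:
        return False
    border = build_failure(s)[-1]
    return border > 0 and n % (n - border) == 0

def shortest_palindrome(s):
    """LC 214: shortest palindrome obtainable by prepending characters to s."""
    if not s:
        return s
    rev = s[::-1]
    k = build_failure(s + "#" + rev)[-1]  # longest palindromic prefix of s
    return rev[:len(s) - k] + s
```

In find_all, resetting j to 0 after a match instead of fail[j - 1] would miss overlapping occurrences such as the three matches of "aa" in "aaaa".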
Product Extension
Search-engine snippet generation: for each query term, find the first match in each candidate document. Multi-pattern at scale uses Aho–Corasick; single-pattern intra-doc still uses KMP because of its predictable cache behavior. Anti-virus signature scanning of binaries is the same problem, multi-pattern, with patterns numbered in the millions — Aho–Corasick territory but KMP per-pattern is the building block.
Language/Runtime Follow-ups
- Python: built-in str.find is C-implemented Two-Way / Crochemore, usually faster than Python-level KMP. KMP wins when you want all occurrences or the failure function for other purposes.
- Java: String.indexOf is a naive scan. KMP wins for adversarial inputs. Use char[] over String.charAt for the inner loop.
- Go: strings.Index uses Rabin–Karp with a fallback. KMP useful when you want explicit fail[].
- C++: string::find is naive in libstdc++. KMP from scratch is the canonical CP move. std::vector<int> for fail.
- JS/TS: String.prototype.indexOf is engine-dependent; V8 uses Boyer–Moore–Horspool. KMP needed when you implement custom matchers.
Common Bugs
- Setting fail[0] = 1 instead of 0. The “proper” prefix excludes the full string.
- In the build, writing the fallback while j > 0 and pat[j] != pat[i]: j = fail[j-1] as an if instead of a while. A single fallback is not always enough: on "aaab" the correct table is [0, 1, 2, 0], but the if version leaves fail[3] = 1.
- Resetting j = 0 between independent matches but forgetting to reset i, a leftover from a “match all” loop.
- Using pat[i] != pat[j] vs pat[i] != pat[j-1]: pick a consistent indexing for the LPS (length, not index) and don't mix.
- Off-by-one when reporting the match index: i - m + 1 (0-indexed start) vs i - m.
- For LC 459, forgetting the fail[n-1] > 0 guard. Without it, a non-repeating string passes the divisibility check trivially.
Debugging Strategy
Compute fail[] for a 7-char pattern by hand and compare. For "aabaaab": fail = [0, 1, 0, 1, 2, 2, 3]. Walk the recurrence: at i=4, j=fail[3]=1, pat[1]==‘a’==pat[4]=‘a’ → fail[4]=2. At i=5, j=fail[4]=2, pat[2]==‘b’!=pat[5]=‘a’ → j=fail[1]=1, pat[1]==‘a’==pat[5]=‘a’ → fail[5]=2. If your code disagrees, instrument with prints.
For the matcher, trace (i, j) per iteration on text="aabaaabaaac", pat="aabaaac". After an early mismatch at (i=6, j=6) (text=‘b’ vs pat=‘c’), j=fail[5]=2, so we resume at text 6 vs pat 2.
Mastery Criteria
- Computed fail[] for an unfamiliar 8-character pattern by hand in <2 minutes.
- Wrote build_failure and strStr in <12 minutes total, no off-by-ones.
- Solved LC 28, LC 459, and LC 1392 from cold start.
- Stated longest-border vs shortest-period duality.
- Identified KMP as the single-pattern engine; identified Aho–Corasick as the multi-pattern generalization.
- Stated the linear-time argument (potential function i + (i - j)).
Lab 06 — Rolling Hash (Rabin–Karp + Double Hashing)
Goal
Implement a polynomial rolling hash with two independent (base, mod) pairs. Use it to (a) find repeated substrings of a fixed length (LC 187 — Repeated DNA Sequences) and (b) find the longest duplicate substring via binary search on length (LC 1044 — Longest Duplicate Substring). Internalize the modular arithmetic well enough to avoid collisions on adversarial inputs.
Background Concepts
A polynomial rolling hash treats a string as a base-b number mod p:
H(S) = (S[0] · b^(L-1) + S[1] · b^(L-2) + ... + S[L-1]) mod p
The “rolling” part: when you slide the window from S[i..i+L-1] to S[i+1..i+L], you update in O(1):
H_new = ((H_old - S[i] · b^(L-1)) · b + S[i+L]) mod p
For substring equality from prefix hashes, precompute pref[i] = (S[0] · b^(i-1) + ... + S[i-1]) mod p. Then H(S[l..r]) = (pref[r+1] - pref[l] · b^(r - l + 1)) mod p.
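The three formulas above (direct hash, rolling update, prefix-hash substring extraction) should all agree on every window. A minimal sketch that makes this checkable; the constants BASE and MOD and the function names are assumptions chosen for the demo, and a fixed base is only acceptable here because nothing adversarial is involved.

```python
MOD = (1 << 61) - 1  # assumed modulus for the demo (a Mersenne prime)
BASE = 131           # assumed fixed base; randomize per run outside of demos

def direct_hash(t):
    """H(t) computed from scratch in O(len(t))."""
    h = 0
    for c in t:
        h = (h * BASE + ord(c)) % MOD
    return h

def prefix_hashes(s):
    """pref[i] = hash of s[:i]; pw[i] = BASE^i mod MOD."""
    pref, pw = [0], [1]
    for c in s:
        pref.append((pref[-1] * BASE + ord(c)) % MOD)
        pw.append(pw[-1] * BASE % MOD)
    return pref, pw

def substring_hash(pref, pw, l, r):
    """Hash of s[l..r] inclusive, in O(1)."""
    return (pref[r + 1] - pref[l] * pw[r - l + 1]) % MOD

def rolling_hashes(s, L):
    """Hashes of every length-L window, left to right, O(1) per slide."""
    if L <= 0 or L > len(s):
        return []
    top = pow(BASE, L - 1, MOD)  # weight of the character leaving the window
    h = direct_hash(s[:L])
    out = [h]
    for i in range(len(s) - L):
        h = ((h - ord(s[i]) * top) * BASE + ord(s[i + L])) % MOD
        out.append(h)
    return out
```

Python's % always returns a non-negative result for a positive modulus, so the subtraction needs no extra correction here; in Java, C or JS the same line would.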
Single hash collision risk. A single mod-p hash has birthday-paradox collision probability ≈ 1 − exp(−k²/(2p)) for k distinct strings, which is ≈ k²/(2p) when that ratio is small. At p ~ 10⁹ and k = 10⁵, k²/(2p) = 5, so at least one collision is all but certain (≈ 99%). Adversarial inputs constructed to collide on a fixed (b, p) make single hashing unsafe even for small k.
Double hashing uses two independent (b₁, p₁) and (b₂, p₂); some pair among k strings collides on both with probability ~k²/(2 · p₁ · p₂) ≈ 5 × 10⁻⁹ for ~10⁵ strings and two ~10⁹ moduli. Effectively safe for interviews. Anti-hash-resistant code uses random bases per run.
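The birthday bound is easy to evaluate numerically, which helps when stating collision risk in an interview. A small sketch, assuming hashes behave like uniform random values over the hash space (an idealization that adversarial inputs break); collision_probability is an illustrative name.

```python
from math import expm1

def collision_probability(k, space):
    """Birthday estimate: P(some pair among k items collides) in a space of given size."""
    x = k * (k - 1) / (2.0 * space)
    return -expm1(-x)  # 1 - exp(-x), computed accurately even for tiny x

if __name__ == "__main__":
    k = 10**5
    print("single hash, p ~ 1e9:  ", collision_probability(k, 10**9))
    print("double hash, p1*p2 ~ 1e18:", collision_probability(k, 10**18))
```

Using expm1 instead of 1 - exp(-x) matters for the double-hash case, where x is around 10⁻⁹ and naive subtraction loses most of the significant digits.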
Interview Context
Rolling hash shows up whenever the brute-force algorithm involves “compare every substring of length L to every other” — a quadratic-in-N algorithm. The reduction is “convert string equality to integer equality, get O(1) per comparison”. Asked at: Google (frequent — duplicate detection, plagiarism), search-infra teams, biotech companies (DNA sequence problems), Stripe.
The signal: “many substring equality checks”, “longest repeated”, “find duplicates of length L”.
Problem Statement A (LC 187)
Find all 10-letter sequences that occur more than once in a DNA string s over {A, C, G, T}. Return them in any order.
Problem Statement B (LC 1044)
Given a string s, find the longest duplicated substring (any substring of length ≥ 1 that appears at least twice — overlaps allowed). Return any longest. If no duplicates, return "".
Constraints
- LC 187: 1 ≤ |s| ≤ 10⁵; alphabet ACGT.
- LC 1044: 2 ≤ |s| ≤ 3 × 10⁴; lowercase ASCII.
Clarifying Questions
- Overlaps allowed? (Yes — both problems.)
- Single longest, or all? (LC 1044: any one longest.)
- Case sensitivity / encoding? (As given by problem.)
- May we use suffix array / suffix automaton? (Yes, but the lab assignment is rolling hash.)
Examples
LC 187: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" → ["AAAAACCCCC", "CCCCCAAAAA"].
LC 1044: s = "banana" → "ana".
Initial Brute Force
LC 187: hashtable of every length-10 substring. O(N · L) time, O(N · L) memory — already passes.
LC 1044: for each L from N-1 down to 1, scan all length-L substrings and check duplicates with a set of substrings. That is O(N · L) per length and O(N³) overall: roughly N³/6 ≈ 4.5 × 10¹² character operations at the limits. TLE.
Brute Force Complexity
LC 187: O(N · L) acceptable. LC 1044: O(N²) at minimum, O(N³) naively. TLE.
Optimization Path
LC 187 with rolling hash: O(N) amortized per scan, O(N) total. Useful when L is large; for L = 10 the brute force is fine, but rolling hash demonstrates the technique.
LC 1044 with rolling hash + binary search on L: outer loop binary-searches L in [1, N-1]; inner loop hashes every length-L substring and looks up duplicates in a dict. O(N log N) total, with double-hashing or hash+verify.
Final Expected Approach
BASE, MOD = 131, (1 << 61) - 1  # example constants (assumed); randomize BASE per run in practice

def has_duplicate_of_length(s, L):
    # Returns the starting index of a duplicate of length L, or -1.
    # Assumes 1 <= L <= len(s).
    h = 0; pow_L = 1
    seen = {}
    for i in range(L):
        h = (h * BASE + ord(s[i])) % MOD
        if i: pow_L = (pow_L * BASE) % MOD  # ends as BASE^(L-1)
    seen[h] = 0
    for i in range(1, len(s) - L + 1):
        # Drop s[i-1] from the front, shift, append s[i+L-1].
        h = ((h - ord(s[i-1]) * pow_L) * BASE + ord(s[i + L - 1])) % MOD
        if h in seen:
            # verify (or use double-hash) to avoid false-positive
            if s[seen[h]:seen[h]+L] == s[i:i+L]:
                return i
        else:
            seen[h] = i
    return -1

# Binary search L in [1, N-1].
For LC 187, just collect a multiset of length-10 hashes; output any whose count > 1 (and verify).
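For LC 187 specifically, the four-letter alphabet allows a simplification: each base fits in 2 bits, so a 10-letter window packs into 20 bits and the "hash" is exact. The sketch below swaps the modular polynomial hash for that 2-bit encoding (a deliberate substitution, so no verify step is needed); find_repeated_dna and the code table are illustrative names, and input outside ACGT would raise a KeyError.

```python
def find_repeated_dna(s, L=10):
    """All length-L substrings of s that occur more than once (sorted)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    if len(s) <= L:
        return []
    mask = (1 << (2 * L)) - 1  # keeps only the last L characters (2 bits each)
    h = 0
    seen, repeated = set(), set()
    for i, c in enumerate(s):
        # Rolling update: shift in the new base, mask off the oldest one.
        h = ((h << 2) | code[c]) & mask
        if i >= L - 1:
            if h in seen:
                repeated.add(s[i - L + 1:i + 1])
            else:
                seen.add(h)
    return sorted(repeated)

if __name__ == "__main__":
    print(find_repeated_dna("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"))
```

The shift-and-mask line is the same rolling update as the polynomial version with base 4 and no modulus, which is why collisions are impossible here.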
Data Structures Used
- Two integer mods (~10⁹ primes), two random bases.
- Dict from hash (or hash-pair) → starting index.
- Precomputed power array pow_b[i].
Correctness Argument
If two strings have the same content, their polynomial hash is equal: exact, no probability. The other direction (same hash → same content) is not guaranteed; this is why we verify on collision (or use double-hashing, which drops the collision probability to ~10⁻¹⁸ per compared pair with two ~10⁹ moduli).
For LC 1044, monotonicity: if a duplicate of length L exists, every L’ ≤ L has a duplicate (a prefix of one of the duplicate occurrences). So the answer set {L : duplicate exists} is a prefix [1, L*], and binary search finds L* in O(log N) calls of has_duplicate_of_length.
Complexity
| Operation | Time | Space |
|---|---|---|
| Single-length scan | O(N) amortized | O(N) |
| LC 187 | O(N · L) | O(N · L) |
| LC 1044 (binary search) | O(N log N) | O(N) |
Verifying on collision adds an O(L) hit per match; with double-hashing it’s negligible.
Implementation Requirements
import random

class RollingHash:
    def __init__(self, s):
        self.n = len(s)
        # Use two (base, mod) pairs.
        self.MOD1, self.MOD2 = (1 << 31) - 1, (1 << 61) - 1
        self.B1 = random.randint(27, self.MOD1 - 1)
        self.B2 = random.randint(27, self.MOD2 - 1)
        # h[i] = hash of s[:i]; p[i] = base^i, one pair of tables per modulus.
        self.h1 = [0] * (self.n + 1)
        self.h2 = [0] * (self.n + 1)
        self.p1 = [1] * (self.n + 1)
        self.p2 = [1] * (self.n + 1)
        for i, c in enumerate(s):
            v = ord(c)
            self.h1[i+1] = (self.h1[i] * self.B1 + v) % self.MOD1
            self.h2[i+1] = (self.h2[i] * self.B2 + v) % self.MOD2
            self.p1[i+1] = (self.p1[i] * self.B1) % self.MOD1
            self.p2[i+1] = (self.p2[i] * self.B2) % self.MOD2

    def hash_pair(self, l, r):  # [l, r)
        a = (self.h1[r] - self.h1[l] * self.p1[r - l]) % self.MOD1
        b = (self.h2[r] - self.h2[l] * self.p2[r - l]) % self.MOD2
        return (a, b)

def longestDupSubstring(s):
    n = len(s)
    rh = RollingHash(s)

    def find(L):
        # Start index of some repeated length-L substring, or -1.
        seen = {}
        for i in range(n - L + 1):
            h = rh.hash_pair(i, i + L)
            if h in seen: return i
            seen[h] = i
        return -1

    lo, hi, best_start, best_len = 1, n - 1, 0, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        i = find(mid)
        if i != -1:
            best_start, best_len = i, mid
            lo = mid + 1
        else:
            hi = mid - 1
    return s[best_start:best_start + best_len]
Tests
- LC 187 sample → ["AAAAACCCCC", "CCCCCAAAAA"].
- LC 1044 "banana" → "ana".
- LC 1044 "abcd" → "".
- LC 1044 "aaaaaa" → "aaaaa" (overlapping duplicates).
- LC 1044 "abab" → "ab".
- Single-hash adversary: paste a Codeforces anti-hash test for (b=31, p=10^9+7); verify your double-hash survives.
- Stress: 1000 random strings of length 100, compare LC 1044 result to a brute-force suffix-array approach (compare lengths, since any longest answer is valid).
Follow-up Questions
- “What if I demand zero false positives?” → either verify on every match (O(L) extra) or use suffix array / suffix automaton (deterministic O(N log N) or O(N)).
- “Multiple texts share patterns.” → hash all, group by hash-pair, verify within group.
- “Stream the text.” → keep a rolling hash; emit matches online; constant memory beyond the dictionary.
- “Detect substrings that are cyclically equal.” → hash the minimal rotation (Booth's algorithm) or all rotations.
- “Avoid Python big-int slowdown.” → use (1 << 61) - 1 (Mersenne prime) and bitwise reduction; or numpy.
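The "bitwise reduction" for the Mersenne modulus 2⁶¹ − 1 relies on 2⁶¹ ≡ 1 (mod 2⁶¹ − 1): the bits above position 61 can be folded back in with a shift and an add. A minimal sketch, with mulmod61 as an illustrative name; whether this beats a plain % in CPython is not claimed here, and the pattern matters most when porting to fixed-width languages.

```python
MOD61 = (1 << 61) - 1

def mulmod61(a, b):
    """(a * b) mod (2^61 - 1) for 0 <= a, b < 2^61, using folds instead of %."""
    x = a * b                    # fewer than 122 bits
    x = (x & MOD61) + (x >> 61)  # 2^61 ≡ 1, so high bits fold onto low bits
    x = (x & MOD61) + (x >> 61)  # second fold brings x down to at most 2^61
    return x - MOD61 if x >= MOD61 else x

if __name__ == "__main__":
    a, b = MOD61 - 1, MOD61 - 2
    print(mulmod61(a, b) == (a * b) % MOD61)
```

Two folds are enough because the first leaves a value below 2⁶², and the second can then add at most 1 to a 61-bit remainder.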
Product Extension
Plagiarism detection over a corpus: shingle each document into k-grams, hash each shingle, store (doc_id, hash) tuples; near-duplicate documents share many shingles. MinHash + LSH (locality-sensitive hashing) is the production technology, built on the same rolling-hash foundation. Snapchat / Imgur-style “hash this image” reuses the same logic with perceptual hashes.
Language/Runtime Follow-ups
- Python: integers are bignums; prefer mods that fit in 63 bits to keep ops fast. Avoid pow(b, k, m) in hot loops and precompute pow tables instead.
- Java: use long for mod ~10⁹ to keep b · h + c from overflowing int. For mod 2⁶¹-1 you need a 128-bit product (Math.multiplyHigh in JDK 9+), plus Math.floorMod for negative differences.
- Go: uint64 with mod 2⁶¹-1: take the 128-bit product with math/bits.Mul64, then reduce by folding the bits above position 61 back in with shifts. Idiomatic CP technique.
- C++: __int128 (a GCC/Clang extension) for mod 2⁶¹-1 multiplications. Otherwise unsigned long long.
- JS/TS: Number is only exact for integers up to 2⁵³ − 1. Use BigInt (slow) or pick mod ≤ 2²⁶ with two-pair hashing to fit safely.
Common Bugs
- Single-hash collision: passes random tests, fails LC 1044 hidden cases. Double-hash always.
- Negative mod in languages with truncated division (Java, C, JS): (a - b) % MOD can be negative; add MOD and re-mod.
- Power table off-by-one: p[i] = b^i. hash[l..r) uses p[r - l], not p[r - l - 1].
- Reusing the same (b, p) across runs on adversarial inputs. Randomize per run.
- Forgetting verify on single-hash → wrong answer.
- Order of operations in the rolling update: subtract the leftmost char's contribution first (the char times pow_L = b^(L-1)), then multiply by b and add the rightmost.
Debugging Strategy
Construct two strings of the same length with known equality and known inequality; assert your hash_pair(0, L) gives equal pairs iff the strings are equal. For LC 1044, when the binary search returns the wrong length, instrument find(L) to dump (i, hash, prior_index) on each match candidate. If two equal substrings ever produce different hashes, your hash code is wrong (usually the power table or the subtraction). If verify rejects a candidate, you hit a genuine collision: the single hash worked but barely, so switch to double-hash.
Mastery Criteria
- Implemented double-hash rolling hash from scratch in <15 minutes.
- Stated the collision probability for single vs double hash.
- Solved LC 187 in <10 minutes; LC 1044 in <30.
- Recognized binary-search-on-length as the standard “longest duplicate” reduction.
- Used random base per run.
- Stated the alternative (suffix array) and when to prefer it.
Lab 07 — Trie Applications (Implement Trie + Word Search II)
Goal
Build a trie supporting insert, search, startsWith (LC 208), then use a trie to solve LC 212 — Word Search II — by pruning a DFS over a 2D grid against a trie of dictionary words. After this lab, the trie should be a reflex for any “many strings, prefix-shared, batch-query” problem.
Background Concepts
A trie (prefix tree) is a tree where each edge is labeled with one character and each path from the root spells a string. A node may carry an is_end flag meaning “a word terminates here”. Two strings sharing a prefix share the prefix path in the trie, giving O(L) per insert / search regardless of how many words exist — a fundamental advantage over hashtable + per-word check when prefixes overlap.
Children can be stored as:
- Array of 26 (or 256) — fastest dispatch, fixed alphabet, slightly memory-heavy. Best for grid-DFS hot loops.
- Hash map char → node — flexible alphabet, slower constant. Good for arbitrary Unicode or large alphabets.
- Compressed (radix tree / Patricia) — collapse single-child chains; smaller memory, harder to implement.
For LC 212, the killer move is: instead of running a separate backtracking search per word against the grid (O(W · m · n · 4^L) in the worst case), build a trie of all words and run one DFS that explores the grid while walking the trie in parallel, terminating any branch whose current grid letter has no trie-child. This converts the cost from “many independent searches” into “one search with multi-end”.
Interview Context
Tries are asked at every FAANG with high frequency: Meta, Google, Amazon all have multiple variants in their pool. The pattern is “many strings share prefixes; query, autocomplete, or grid-search”. Recognizing the trie-prune-DFS reduction for LC 212 is a strong-hire signal — it’s a 5-line code change with a 100x speedup.
Cousins: autocomplete (LC 642), longest word in dictionary (LC 720), word break (LC 139 — sometimes trie-friendly), maximum xor of two numbers (LC 421 — bit-trie).
Problem Statement A (LC 208)
Implement Trie with insert(word), search(word) (exact match), startsWith(prefix) (any word with this prefix).
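A minimal sketch of the three operations, assuming lowercase ASCII input and the array-of-26 child layout discussed above. The helper name _walk and the is_end field are illustrative choices, not something the problem prescribes.

```python
class TrieNode:
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = [None] * 26   # one slot per lowercase letter
        self.is_end = False           # True if a word terminates here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            i = ord(ch) - ord("a")
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.is_end = True

    def _walk(self, s):
        """Follow s from the root; return the final node or None."""
        node = self.root
        for ch in s:
            node = node.children[ord(ch) - ord("a")]
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_end

    def startsWith(self, prefix):
        return self._walk(prefix) is not None
```

Both lookups share one traversal helper, so the only difference between exact match and prefix match is the is_end check.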
Problem Statement B (LC 212)
Given an m × n grid of letters and a list words, return all words that can be formed by a sequence of adjacent (4-directionally connected) cells in the grid, where each cell is used at most once per word.
Constraints
- LC 208: ≤ 3 × 10⁴ ops; lowercase ASCII; word length ≤ 2000.
- LC 212: 1 ≤ m, n ≤ 12; 1 ≤ |words| ≤ 3 × 10⁴; each word length ≤ 10; lowercase ASCII.
Clarifying Questions
- Are duplicate words possible in the dictionary? (LC 212: assume no, dedupe defensively.)
- Can the same cell be reused across different words? (Yes — only “once per word”.)
- Is alphabet exactly lowercase ASCII? (Yes — use array of 26.)
Examples
LC 212:
board = [['o','a','a','n'],
['e','t','a','e'],
['i','h','k','r'],
['i','f','l','v']]
words = ["oath", "pea", "eat", "rain"]
→ ["oath", "eat"]
Initial Brute Force
For each word, run a DFS from every starting cell that matches the word’s first letter; backtrack on dead ends.
Per-word DFS: O(m · n · 4^L). Total: O(W · m · n · 4^L). With W=3×10⁴, m·n=144, L=10, the bound is 3×10⁴ · 144 · 4¹⁰ ≈ 4.5 × 10¹² ops. TLE.
Brute Force Complexity
O(W · m · n · 4^L). At the limits: about 4.5 × 10¹² in the worst case. TLE by more than 4 orders of magnitude.
Optimization Path
Trie-pruned DFS. Build a trie of all words once: O(total chars) ≈ 3 × 10⁵. Then run a single DFS from each cell, walking the trie in parallel; whenever a trie-child for the current letter is missing, prune. Whenever an is_end is hit, record that word.
Total: O(m · n · 4^L) for the DFS structure, but the effective branching is small because most paths are pruned within 2-3 chars. Practical speedup: ~100×.
Optimization on top: when a word is recorded, mark its trie-end node and prune empty trie branches as you backtrack — keeps the trie shrinking.
Final Expected Approach
- Build trie: each node has children[26], word: Optional[str] (set on insert at the terminal).
- For each cell (i, j), DFS:
  - Read c = board[i][j].
  - If node.children[c - 'a'] is None, return.
  - Descend: node = node.children[...].
  - If node.word: append to results, set node.word = None to dedupe.
  - Mark board[i][j] = '#' to prevent reuse.
  - Recurse 4 directions.
  - Restore board[i][j] = c.
- Return results.
Data Structures Used
- TrieNode { children: List[Optional[TrieNode]], word: Optional[str] }.
- 2D grid (mutable for the visited-mark trick).
- Result list.
Correctness Argument
The trie-DFS enumerates exactly the set of (path-in-grid, path-in-trie) pairs where each step matches the current grid letter to a trie child. A word is reported iff the DFS reaches a trie node with word != None along a non-self-intersecting grid path — by construction this is exactly the set of words present in the grid. Setting node.word = None after recording deduplicates without affecting other paths (the children remain reachable for other words sharing this prefix).
board[i][j] = '#' ensures non-reuse: the only way to visit a cell already on the path is if the trie has ‘#’ as a child of the current node, which it doesn’t (alphabet is lowercase).
Complexity
| Operation | Time | Space |
|---|---|---|
| Trie build | O(total characters in words) | O(total characters) |
| DFS (worst) | O(4 · 3^(L-1)) per starting cell | O(L) recursion |
| Total | O(m · n · 4 · 3^(L-1)) | O(W · L) trie + O(L) recursion |
The 3^(L-1) (not 4^L) is because after the first step, you can’t immediately go back, so each node has ≤ 3 forward neighbors.
Implementation Requirements
class TrieNode:
    __slots__ = ('children', 'word')

    def __init__(self):
        self.children = [None] * 26
        self.word = None

def findWords(board, words):
    # Build the trie; the full word is stored at its terminal node.
    root = TrieNode()
    for w in words:
        node = root
        for c in w:
            i = ord(c) - ord('a')
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.word = w
    m, n = len(board), len(board[0])
    res = []

    def dfs(i, j, parent):
        c = board[i][j]
        if c == '#': return              # cell already on the current path
        idx = ord(c) - ord('a')
        node = parent.children[idx]
        if node is None: return          # no word continues with this letter
        if node.word is not None:
            res.append(node.word)
            node.word = None             # dedupe: report each word once
        board[i][j] = '#'
        for di, dj in ((-1,0),(1,0),(0,-1),(0,1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < m and 0 <= nj < n:
                dfs(ni, nj, node)
        board[i][j] = c
        # Optional pruning: if node has no children and no word, parent.children[idx] = None.

    for i in range(m):
        for j in range(n):
            dfs(i, j, root)
    return res
Tests
- LC 212 sample → ["oath", "eat"].
- Empty grid / empty word list → [].
- Word equals a single cell: board=[['a']], words=["a"] → ["a"].
- Word longer than grid: returns nothing.
- Duplicate paths to same word: appears once thanks to node.word = None.
- Words sharing prefixes: words=["oath", "oat"]; both should be found if both are present.
- Stress: random grids of size 12×12, random word list of 100 words length 5; verify against per-word DFS brute force.
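For the stress test, a per-word brute force along the following lines can serve as the oracle. This is a sketch under stated assumptions: the function name find_words_brute is hypothetical, it tracks visited cells in a set instead of mutating the board, and it is only meant for small random inputs.

```python
def find_words_brute(board, words):
    """Per-word backtracking search; slow, intended only as a test oracle."""
    if not board or not board[0]:
        return []
    m, n = len(board), len(board[0])

    def dfs(i, j, word, k, used):
        # Does word[k:] start at (i, j) without reusing cells in `used`?
        if board[i][j] != word[k] or (i, j) in used:
            return False
        if k == len(word) - 1:
            return True
        used.add((i, j))
        found = False
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < m and 0 <= nj < n and dfs(ni, nj, word, k + 1, used):
                found = True
                break
        used.discard((i, j))
        return found

    out = []
    for w in dict.fromkeys(words):        # dedupe, keep input order
        if w and any(dfs(i, j, w, 0, set())
                     for i in range(m) for j in range(n)):
            out.append(w)
    return out
```

Compare sorted outputs of the two implementations, since the trie version reports words in discovery order rather than input order.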
Follow-up Questions
- “How would you support deletion?” → reference-count children or do recursive cleanup; tricky due to shared prefixes.
- “Autocomplete with frequency.” → store count per word at the terminal; on prefix lookup, walk the subtree and pick top-k by count (or maintain a per-node max-heap).
- “What if the alphabet is Unicode?” → switch from children[26] to a dict. Per-node memory grows but dispatch is O(1) hash.
- “Compress the trie.” → radix / Patricia trie collapses single-child chains; helpful when you have million-word dictionaries (e.g., DNS).
- “Bit trie for max-XOR (LC 421).” → trie of binary representations, depth 31 (bits 30 down to 0) since LC 421 values are below 2³¹; greedy descent picks the opposite bit when possible.
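The bit-trie follow-up can be sketched as below. The node layout (a two-element list per node) and the function name find_maximum_xor are illustrative assumptions; the default of 31 bits assumes non-negative values below 2³¹.

```python
def find_maximum_xor(nums, bits=31):
    """Max XOR of any pair in nums via a binary trie (greedy descent)."""
    root = [None, None]                 # node = [child for bit 0, child for bit 1]
    best = 0
    for x in nums:
        # Insert x, most significant bit first.
        node = root
        for b in range(bits - 1, -1, -1):
            bit = (x >> b) & 1
            if node[bit] is None:
                node[bit] = [None, None]
            node = node[bit]
        # Query: at each level prefer the opposite bit if that branch exists.
        # x itself is already in the trie, so a same-bit branch always exists.
        node = root
        cur = 0
        for b in range(bits - 1, -1, -1):
            bit = (x >> b) & 1
            if node[bit ^ 1] is not None:
                cur |= 1 << b
                node = node[bit ^ 1]
            else:
                node = node[bit]
        best = max(best, cur)
    return best
```

Inserting before querying means each number is compared against all earlier numbers and itself, so every pair is covered once and a single-element input yields 0.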
Product Extension
Search-as-you-type / autocomplete: as a user types each character, walk the trie down one node and emit the top-k completions stored at that node. Production search engines (ES, Solr) build inverted indices, but for “small dictionary, fast prefix lookup” use cases (CLI command completion, query suggestion within an admin tool), an in-memory trie is the right call. DNS resolution uses radix tries internally.
Language/Runtime Follow-ups
- Python: __slots__ on TrieNode trims memory by ~40%. For LC 212 use list of children rather than dict; dict’s per-key overhead dominates at this size.
- Java: TrieNode { TrieNode[] children = new TrieNode[26]; String word; }. JIT inlines the array dispatch.
- Go: type TrieNode struct { children [26]*TrieNode; word string }. Value-typed array of pointers is cache-friendlier than a slice.
- C++: struct TrieNode { array<TrieNode*, 26> children{}; string word; };. Use a deque<TrieNode> arena to avoid new per node.
- JS/TS: plain object {children: Array(26), word: null} works; for large tries Map with char keys uses less memory than 26-array per node.
Common Bugs
- Forgetting to mark board[i][j] = '#' before recursing: same cell reused, wrong matches reported.
- Forgetting to restore board[i][j] = c after recursion: corrupts the board for subsequent DFS calls.
- Recording the word multiple times because the same prefix is reached via different paths. Fix: node.word = None after recording.
- Storing only an is_end flag at the terminal: works if you carry the path string in DFS, wasteful otherwise. Correct: store the word at the terminal trie node.
- Initializing children = [None] (length 1) instead of [None] * 26: IndexError at the first non-‘a’ insert.
- Walking the trie before reading the grid letter: off-by-one; you should read the grid letter, look up parent.children[c], then descend.
Debugging Strategy
Trace the DFS on the LC 212 sample by hand: starting at (0, 0)=‘o’, root has child ‘o’ → ok; recurse to (1, 0)=‘e’ or (0, 1)=‘a’; descend to “oa” → child ‘a’; etc. If your output is missing words, add print(node.word) on the entry to dfs and verify the word terminations are correctly placed in the trie. If your output has spurious words, the visited-mark or the is_end placement is wrong.
Mastery Criteria
- Wrote Trie class with insert/search/startsWith in <8 minutes.
- Solved LC 212 in <30 minutes from cold start.
- Stated the speedup over per-word DFS and the complexity of the combined DFS.
- Used node.word storage (not is_end + carry-string) for clean dedup.
- Picked array-of-26 over hashmap for performance and justified it.
- Solved LC 421 (bit-trie max XOR) using the same structure.
Lab 08 — Bitmask Dynamic Programming
Goal
Solve LC 847 — Shortest Path Visiting All Nodes — using bitmask DP over (visited-set, current-node) state. Internalize the recipe: when N ≤ 20-ish and the state involves “subset of which items have been used / visited / assigned”, a bitmask is the state and the transition is a bit OR.
Background Concepts
A bitmask is an integer interpreted as a set: bit k is 1 iff element k is in the set. Set operations:
- Union: a | b. Intersection: a & b. Difference: a & ~b. Symmetric difference: a ^ b.
- Add element k: a | (1 << k). Remove: a & ~(1 << k). Test: (a >> k) & 1 or a & (1 << k).
- Iterate all subsets of mask: s = mask; while s > 0: ...; s = (s - 1) & mask. Visits the 2^popcount(mask) - 1 non-empty submasks; handle the empty set separately if it matters.
- Iterate set bits: while mask: k = (mask & -mask).bit_length() - 1; mask &= mask - 1.
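The two iteration idioms in the list above can be wrapped as generators. This is a small sketch; the names submasks and set_bits are illustrative, not part of any library.

```python
def submasks(mask):
    """Yield every non-empty submask of mask, largest first."""
    s = mask
    while s > 0:
        yield s
        s = (s - 1) & mask        # drop to the next smaller submask


def set_bits(mask):
    """Yield the indices of the set bits of mask, lowest first."""
    while mask:
        low = mask & -mask        # isolate the lowest set bit
        yield low.bit_length() - 1
        mask &= mask - 1          # clear that bit


print(list(submasks(0b101)))      # [5, 4, 1]
print(list(set_bits(0b10110)))    # [1, 2, 4]
```

The submask loop stops before reaching 0, so the empty set needs separate handling if the DP transition requires it.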
Bitmask DP stores dp[mask][...] indexed by the subset. Useful when N ≤ ~20 (so 2^N ≤ 10⁶) and the state must remember “exactly which subset has been processed”. It generalizes:
- TSP-like: dp[mask][i] = min cost path that visited exactly mask, ending at i.
- Subset-cover: dp[mask] = min cost to cover mask summed over groups.
- Assignment problem: dp[mask] = min cost to assign first popcount(mask) people to the jobs in mask.
For LC 847, we want shortest walk (edges may repeat) visiting all nodes. The state (mask, i) captures “I’ve visited the set mask of nodes (at least once) and I’m currently at node i”. Transitions: from (mask, i), move to any neighbor j, new state (mask | (1 << j), j), cost +1. We want the smallest distance to any state (full_mask, *). BFS suffices since edge cost is 1.
Interview Context
Bitmask DP is a 1-2% problem family but a strong-hire signal when recognized fast. The trigger: N ≤ 20 with a constraint involving subsets. Asked at: Google occasionally, Stripe / Two Sigma, Meta in bar-raiser slot. The common trap is failing to check the size of the state space: 2^N · N is feasible at N=15 (~5 × 10⁵) but not at N=25 (~8 × 10⁸).
Problem Statement (LC 847)
Given an undirected, connected graph of N nodes labeled 0..N-1 as adjacency lists, return the length of the shortest path that visits every node. You may start and stop at any node, may revisit nodes and edges.
Constraints
- 1 ≤ N ≤ 12
- 0 ≤ graph[i].length < N
- Graph is connected.
Clarifying Questions
- Length = number of edges (not nodes)? (Yes — number of edges traversed.)
- Are self-loops possible? (No.)
- May the path start and end at different nodes? (Yes.)
- Is the graph guaranteed connected? (Yes — answer always finite.)
Examples
graph = [[1, 2, 3], [0], [0], [0]]
(star with center 0, leaves 1, 2, 3)
shortest path visiting all = 4 (e.g., 1 → 0 → 2 → 0 → 3)
graph = [[1], [0, 2, 4], [1, 3, 4], [2], [1, 2]]
shortest = 4
Initial Brute Force
DFS / backtracking from every starting node, exploring all walks up to some bounded length. Without memoization, walks can be exponential in length even for small graphs. A timeout and a hand-tuned bound make this brittle.
Brute Force Complexity
Unbounded (or exponential with bound). Practically TLE for any non-trivial test.
Optimization Path
The state space is (mask, current_node) with mask ∈ [0, 2^N) and current_node ∈ [0, N). Total states: N · 2^N. For N=12: 12 · 4096 = 49152. Each state has ≤ N-1 outgoing transitions; total edges: N² · 2^N ≈ 6 × 10⁵. Trivially feasible.
Since edge weights are 1, BFS over the state graph from all starting states {(1 << i, i) : i ∈ [0, N)} (all “I’ve visited just myself” states) gives the shortest distance to each state. Stop when we dequeue a state with mask = (1 << N) - 1.
Final Expected Approach
- full = (1 << N) - 1.
- Initialize a queue with all (mask=1<<i, node=i) for i ∈ [0, N).
- seen[(mask, node)] initialized for those starts.
- BFS: pop (mask, u); for each neighbor v, new_mask = mask | (1 << v); if (new_mask, v) unseen, enqueue with dist+1.
- First time a state with mask == full is dequeued, return its distance.
Data Structures Used
- deque for BFS frontier.
- 2D seen of shape [2^N][N] (boolean) or a set.
- Distance tracked alongside state in the queue (dist+1 per step).
Correctness Argument
The state graph is a directed graph on N · 2^N states; an edge (mask, u) → (mask | (1 << v), v) exists iff v is a graph neighbor of u. A walk in the original graph that visits all N nodes corresponds to a path in the state graph from some (1<<i, i) to some (full, j). Edge costs are 1 (one edge traversed per state-graph edge). Therefore shortest walk = shortest path in state graph from the start set to any final state, computed by multi-source BFS.
BFS visits each state once and terminates at the first finalized state. Correct because all edge weights equal.
Complexity
| Quantity | Value |
|---|---|
| States | N · 2^N |
| Transitions | up to N² · 2^N |
| Time | O(N² · 2^N) |
| Space | O(N · 2^N) |
At N=12: ~6 × 10⁵ ops. Fast.
Implementation Requirements
from collections import deque

def shortestPathLength(graph):
    n = len(graph)
    if n == 1: return 0
    full = (1 << n) - 1
    # State: (mask, node). Distance tracked by BFS layer.
    seen = [[False] * n for _ in range(1 << n)]
    q = deque()
    for i in range(n):
        seen[1 << i][i] = True
        q.append((1 << i, i, 0))
    while q:
        mask, u, d = q.popleft()
        if mask == full:
            return d
        for v in graph[u]:
            new_mask = mask | (1 << v)
            if not seen[new_mask][v]:
                seen[new_mask][v] = True
                q.append((new_mask, v, d + 1))
    return -1  # unreachable for connected graphs
Tests
- N=1: return 0.
- N=2 with one edge: return 1.
- Star example: 4.
- Linear chain 0-1-2-3-4: shortest visiting all = 4.
- Complete graph K_5: shortest = 4 (any Hamiltonian path).
- Disconnected (constraint says connected, but defensive): return -1 / handle.
- Stress: random connected graphs N=8..12 vs an O(N² · 2^N) Held-Karp DP run on all-pairs shortest-path distances, for cross-check.
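One way to build the cross-check oracle is sketched below. It relies on the observation that the shortest walk visiting every node equals the cheapest Hamiltonian path in the graph's shortest-path closure. The function name shortest_walk_held_karp is hypothetical, and the input is assumed to be connected.

```python
from collections import deque

def shortest_walk_held_karp(graph):
    """Oracle for LC 847: Held-Karp over all-pairs BFS distances."""
    n = len(graph)
    INF = float("inf")
    # All-pairs shortest paths; unit weights, so one BFS per source.
    dist = [[INF] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[s][v] == INF:
                    dist[s][v] = dist[s][u] + 1
                    q.append(v)
    # dp[mask][i]: cheapest way to first-visit exactly `mask`, ending at i.
    dp = [[INF] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = 0
    for mask in range(1 << n):          # transitions only go to larger masks
        for i in range(n):
            d = dp[mask][i]
            if d == INF:
                continue
            for j in range(n):
                if (mask >> j) & 1:
                    continue
                nm = mask | (1 << j)
                if d + dist[i][j] < dp[nm][j]:
                    dp[nm][j] = d + dist[i][j]
    return min(dp[(1 << n) - 1])
```

Because this oracle shares no code path with the state-graph BFS, agreement on random graphs is meaningful evidence that both are right.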
Follow-up Questions
- “What if edges have weights?” → Dijkstra instead of BFS; same state graph.
- “What if I must start and end at node 0 (TSP)?” → state (mask, i) with cost dp[mask][i] = min cost, recurrence dp[mask | (1 << j)][j] = min(dp[mask][i] + w(i, j)). Answer: min(dp[full][i] + w(i, 0)).
- “What if N=20?” → 20 · 2^20 = 2 × 10⁷ states, still ok. At N=25 we hit 8 × 10⁸, likely TLE. The constraint cap on bitmask DP is N ~ 22.
- “Subset-cover variant (LC 1125, Smallest Sufficient Team).” → dp[mask] = min team to cover skill-mask mask; transition: for each person p with skill-mask pm, dp[mask | pm] = min(dp[mask] + 1).
- “Assignment problem in bitmask DP.” → dp[mask] = min cost to assign popcount(mask) people to the jobs in mask; transition over which job person popcount(mask) takes.
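The assignment follow-up can be sketched as below, assuming a square cost matrix where cost[p][j] is the cost of giving job j to person p. The function name min_assignment_cost is an illustrative choice.

```python
def min_assignment_cost(cost):
    """dp[mask] = min cost to give the jobs in `mask` to the first popcount(mask) people."""
    n = len(cost)
    INF = float("inf")
    dp = [INF] * (1 << n)
    dp[0] = 0
    for mask in range(1 << n):
        if dp[mask] == INF:
            continue
        person = bin(mask).count("1")      # next person to receive a job
        if person == n:
            continue
        for job in range(n):
            if not (mask >> job) & 1:      # job still free
                nm = mask | (1 << job)
                c = dp[mask] + cost[person][job]
                if c < dp[nm]:
                    dp[nm] = c
    return dp[(1 << n) - 1]
```

The person index is implied by popcount(mask), which is why the state is one-dimensional and the table has only 2^N entries.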
Product Extension
Vehicle routing / drone delivery with ≤ 20 stops: bitmask DP precomputes optimal tours offline. Assignment problems such as LC 1066 (Campus Bikes II): assign N workers to N tasks minimizing cost; the assignment-DP variant of bitmask DP runs in O(N · 2^N), beating the O(N · N!) brute force at N=15 by roughly 7 orders of magnitude (15!/2¹⁵ ≈ 4 × 10⁷).
Language/Runtime Follow-ups
- Python: BFS with deque; the inner loop can be slow at N=12. PyPy if benchmarking. For N>14, switch to numpy or pre-flatten the seen array to bytearray.
- Java: boolean[][] seen = new boolean[1 << n][n]. The deque ArrayDeque<int[]> boxes each state; for performance pack (mask, node, dist) into a long.
- Go: idiomatic [][]bool. Use slice queue with head/tail indices to avoid alloc churn.
- C++: vector<vector<bool>> seen(1 << n, vector<bool>(n)). Pack state into int (mask * n + node) and use vector<bool> of size n * (1 << n) for cache.
- JS/TS: Uint8Array(n * (1 << n)) for seen; bitwise ops are 32-bit signed, fine for n ≤ 30.
Common Bugs
- Forgetting that the path may revisit nodes: implementing as Hamiltonian path (mask exactly indicates visited once) is wrong for LC 847; use the right transition mask | (1 << v) (idempotent on already-visited nodes).
- Initializing only one start state instead of all N: gives wrong answers because the optimal path may not start at node 0.
- Returning the first full-mask state encountered without distance: BFS guarantees minimal distance only because of FIFO ordering; correct here, but easy to swap for DFS by accident.
- Using mask & (1 << v) as a boolean test: works in C/C++ and via truthiness in JS/Python, but Java does not accept an int as a condition; write (mask & (1 << v)) != 0 or ((mask >> v) & 1) == 1.
- Allocating seen = [[False] * n] * (1 << n) (shared row reference): Python beginner trap.
- Off-by-one on full = (1 << n) - 1 vs (1 << n).
Debugging Strategy
For the N=4 star, hand-simulate with masks written in binary. Seeding only the center: start (0001, 0) at d=0; expand to the leaves → (0011, 1), (0101, 2), (1001, 3) at d=1; from (0011, 1) go back to 0 → (0011, 0) at d=2; expand to 2 → (0111, 2) at d=3; back to 0 → (0111, 0) at d=4; on to 3 → (1111, 3) at d=5. That is one edge more than the expected answer of 4. The optimum starts at a leaf: (0010, 1) at d=0 → (0011, 0) at d=1 → (0111, 2) at d=2 → (0111, 0) at d=3 → (1111, 3) at d=4, which is the walk 1 → 0 → 2 → 0 → 3.
If your code returns 5, you forgot to seed BFS with all start states.
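The seeding effect can be reproduced with a variant of the BFS that takes its start nodes as a parameter. This is a debugging sketch; the function name shortest_walk and the starts parameter are assumptions introduced for illustration.

```python
from collections import deque

def shortest_walk(graph, starts):
    """State-graph BFS for LC 847, seeded only from the given start nodes."""
    n = len(graph)
    full = (1 << n) - 1
    seen = set()
    q = deque()
    for s in starts:
        seen.add((1 << s, s))
        q.append((1 << s, s, 0))
    while q:
        mask, u, d = q.popleft()
        if mask == full:
            return d
        for v in graph[u]:
            state = (mask | (1 << v), v)
            if state not in seen:
                seen.add(state)
                q.append((state[0], v, d + 1))
    return -1


star = [[1, 2, 3], [0], [0], [0]]
print(shortest_walk(star, [0]))        # 5: seeded at the center only
print(shortest_walk(star, range(4)))   # 4: seeded from every node
```

Running both calls side by side makes the missing-seed bug visible without stepping through the queue by hand.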
Mastery Criteria
- Recognized “bitmask DP” within 60 seconds when N ≤ 20 and state involves subsets.
- Wrote shortestPathLength from scratch in <20 minutes.
- Stated state space size and confirmed feasibility for the given N.
- Solved one cousin (LC 1125, LC 943, LC 691) from cold start.
- Used proper bit operations (no string-based mask handling).
- Articulated when bitmask DP fails (N > 22 → 2^N too large).
Lab 09 — Meet in the Middle
Goal
Solve LC 1755 (Closest Subsequence Sum) via meet-in-the-middle: split the array into two halves, enumerate 2^(N/2) subset sums in each half, sort one half, and use binary search / two-pointer to find the pair-sum closest to the goal. Internalize the technique as the standard recipe whenever N is in the awkward zone 30 ≤ N ≤ 40: too large for 2^N enumeration, and with values too wide for a sum-indexed polynomial DP.
Background Concepts
Meet in the middle (MITM) trades exponential time for exponential space, halving the exponent: instead of enumerating all 2^N subsets in one go (infeasible at N=40), enumerate 2^(N/2) subsets of each half (feasible at N=20: ~10⁶ each), then combine.
The combination step depends on the problem:
- Closest sum to goal (this lab): sort one half’s sums; for each sum L of the left half, binary-search the right half’s sums for goal − L.
- Count of subset-pairs with sum ≤ K: sort both halves; two-pointer.
- Find any subset summing to S: hashmap of one half’s sums; for each sum L, check if S − L is in the map.
- k-th smallest subset sum: more elaborate; heap-merge two sorted lists.
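As an illustration of the hashmap-style combine from the list above, here is a minimal sketch of the "find any subset summing to S" variant. The function name has_subset_sum is hypothetical, and the empty subset is counted, so a target of 0 always succeeds.

```python
def has_subset_sum(nums, target):
    """Meet in the middle: is there a subset (possibly empty) summing to target?"""
    def all_sums(arr):
        sums = [0]                              # the empty subset
        for x in arr:
            sums += [s + x for s in sums]       # each element doubles the list
        return sums

    half = len(nums) // 2
    right = set(all_sums(nums[half:]))          # O(1) membership for one half
    return any(target - a in right for a in all_sums(nums[:half]))


print(has_subset_sum([3, 34, 4, 12, 5, 2], 9))    # True  (4 + 5)
print(has_subset_sum([3, 34, 4, 12, 5, 2], 30))   # False
```

A set suffices here because only existence matters; the closest-sum variant in this lab needs the sorted list instead.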
The asymptotics: O(2^(N/2) · N/2) to enumerate, O(2^(N/2) · log) for the combine. At N=40: 2^20 ≈ 10⁶, total ~2 × 10⁷. Feasible.
Interview Context
MITM is a niche but high-impact technique. Asked at Google (occasional), CP-flavored shops (frequent), and any problem set with N in [30, 45]. The signal: “N is around 30-40, brute force is 2^N, no polynomial DP visible because the state involves arbitrary subset sums”. Recognizing it converts a hopeless problem into a 30-minute solve. Not recognizing it caps you at “I would brute force but it TLEs” — a soft no-hire.
Problem Statement (LC 1755)
Given an integer array nums and integer goal, return the minimum absolute difference |sum(sub) − goal| over all subsequences (subsets) of nums. LC 1755 allows the empty subsequence (sum 0), so |goal| is always a candidate answer.
Constraints
- 1 ≤ |nums| ≤ 40
- −10⁷ ≤ nums[i] ≤ 10⁷
- −10⁹ ≤ goal ≤ 10⁹
Clarifying Questions
- Subsequence = subset (unordered)? (Yes — LC’s “subsequence” here is order-independent because we only care about sum.)
- Empty subsequence allowed (sum 0)? (Yes per LC 1755.)
- Sums fit in 64-bit? (Max |sum| = 40 · 10⁷ = 4 × 10⁸, which fits in 32-bit Java int. Use 64-bit defensively.)
Examples
nums = [5, -7, 3, 5], goal = 6 → 0 (take the whole array: 5 - 7 + 3 + 5 = 6)
nums = [7, -9, 15, -2], goal = -5 → 1 (take [7, -9, -2]: sum = -4, |-4 - (-5)| = 1)
nums = [1, 2, 3], goal = -7 → 7 (closest is empty sum 0)
Initial Brute Force
Enumerate all 2^N subsets, compute each sum, track minimum |sum - goal|.
Brute Force Complexity
O(2^N · N). At N=40: 4 × 10¹³. TLE by more than 5 orders of magnitude.
Optimization Path
DP by sum? Sums range over [-4 × 10⁸, 4 × 10⁸] — too wide for a value-indexed DP. So polynomial DP is out.
O(2^(N/2)) enumeration: 2^20 ≈ 10⁶ per half. Feasible in time, requires the combine step.
For closest-sum-to-goal, sort one half’s sums; for each L in the other half, binary-search goal - L; check the two candidates around the insertion point. Total: O(2^(N/2) · N/2 + 2^(N/2) · log(2^(N/2))) = O(2^(N/2) · N).
Final Expected Approach
- Split nums into halves A (first N/2) and B (last N - N/2).
- Enumerate all subset sums of A: list sumsA of size 2^|A|.
- Enumerate all subset sums of B: list sumsB of size 2^|B|.
- Sort sumsB.
- best = min(|s - goal|) for s in sumsA (handles the case where the right side contributes 0; since we include 0 as a subset sum of B, this is also captured by the binary-search step below).
- For each a in sumsA: binary-search sumsB for goal - a; check sumsB[idx] and sumsB[idx-1] (the two closest); update best.
- Return best.
Data Structures Used
- Two flat lists of subset sums.
- bisect (Python) / Arrays.binarySearch (Java) / sort.Search (Go) / lower_bound (C++).
Correctness Argument
Every subset of nums decomposes uniquely as (left ∪ right) where left ⊆ A and right ⊆ B. So sum(subset) = a + b for some a ∈ sumsA, b ∈ sumsB. We want min_{a, b} |a + b - goal| = min_a min_b |b - (goal - a)|. For a fixed a, the inner min over b is solved by binary search in sorted sumsB: the closest element is at the insertion point or one position to its left. Iterating over all a ∈ sumsA covers all subsets.
Including 0 in both sumsA and sumsB covers the empty-side cases.
Complexity
| Quantity | Value |
|---|---|
| Enumerate sums | O(N · 2^(N/2)) |
| Sort | O(2^(N/2) · log(2^(N/2))) = O(N · 2^(N/2)) |
| Binary search loop | O(2^(N/2) · log(2^(N/2))) = O(N · 2^(N/2)) |
| Total time | O(N · 2^(N/2)) |
| Space | O(2^(N/2)) |
At N=40: ~4 × 10⁷ ops. Fits in 1 sec C++, ~3 sec Python.
Implementation Requirements
from bisect import bisect_left

def minAbsDifference(nums, goal):
    def all_sums(arr):
        # Start from the empty subset; each element doubles the list.
        sums = [0]
        for x in arr:
            sums = sums + [s + x for s in sums]
        return sums

    n = len(nums)
    A = nums[: n // 2]
    B = nums[n // 2 :]
    sumsA = all_sums(A)
    sumsB = sorted(all_sums(B))
    best = abs(goal)  # corresponds to the empty subset
    for a in sumsA:
        target = goal - a
        idx = bisect_left(sumsB, target)
        # The closest value is at the insertion point or just left of it.
        if idx < len(sumsB):
            best = min(best, abs(a + sumsB[idx] - goal))
        if idx > 0:
            best = min(best, abs(a + sumsB[idx - 1] - goal))
        if best == 0:
            return 0
    return best
Tests
- N=1, [5], goal=5 → 0.
- N=1, [5], goal=0 → 0 (empty subset).
- LC 1755 sample: [5, -7, 3, 5], goal=6 → 0.
- LC 1755 sample 2: [7, -9, 15, -2], goal=-5 → 1.
- LC 1755 sample 3: [1, 2, 3], goal=-7 → 7.
- All zeros: any goal → |goal|.
- Single huge value: [10^7] * 40, goal=0 → 0 (empty).
- Adversary: random N=40 with random values; cross-check against brute force at N=20.
Follow-up Questions
- “Now I need to count subsets with sum exactly S.” → enumerate sums of both halves; for each sum a in A, count occurrences of S - a in B (bucket by value or use a Counter).
- “Now I need subsets with sum in [L, R].” → sort B; for each a, binary-search the count of B-elements in [L - a, R - a] using two bisect calls.
- “What if the array has 50 elements?” → 2^25 = 3 × 10⁷, borderline. Memory at 8 bytes per sum is 256 MB. Need to drop to bitset or stream.
- “Subset-product instead of sum?” → enumerate products; the combine is identical.
- “k-th smallest subset sum across all subsets.” → k-way merge using a min-heap from sorted subset-sum lists per half.
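The range-count follow-up can be sketched with two bisect calls per left-half sum. The function name count_subsets_in_range and the parameter names lo and hi are illustrative; subsets are counted by index (equal values at different positions count separately), and the empty subset is included.

```python
from bisect import bisect_left, bisect_right

def count_subsets_in_range(nums, lo, hi):
    """Count subsets (including the empty one) whose sum lies in [lo, hi]."""
    def all_sums(arr):
        sums = [0]
        for x in arr:
            sums += [s + x for s in sums]
        return sums

    half = len(nums) // 2
    left = all_sums(nums[:half])
    right = sorted(all_sums(nums[half:]))
    total = 0
    for a in left:
        # Number of right-half sums b with lo - a <= b <= hi - a.
        total += bisect_right(right, hi - a) - bisect_left(right, lo - a)
    return total


print(count_subsets_in_range([1, 2, 3], 3, 4))   # 3: {3}, {1, 2}, {1, 3}
```

Setting lo equal to hi turns this into the exact-count variant without needing a Counter.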
Product Extension
Cryptographic key knapsack (Merkle–Hellman) and certain integer programming problems with ~40 binary variables: MITM is the textbook attack / solver. Portfolio optimization with a small basket of asset switches; molecular conformation enumeration. Whenever you have a “binary vector with cost”, N ≈ 40, and no obvious polynomial structure: MITM is the move.
Language/Runtime Follow-ups
- Python: list-comprehension enumeration as shown is clean. For tighter constants use numpy to compute all subset sums via repeated concatenate((s, s + x)).
- Java: long[] sumsA, sumsB. Arrays.sort is O(n log n); Arrays.binarySearch is O(log n) per call and returns -(insertion point) - 1 on a miss. Watch heap pressure at 2 × 10⁶ longs ≈ 16 MB.
- Go: sort.Slice(sumsB, ...); sort.SearchInts for binary search.
- C++: vector<long long> of size 2^20 = 8 MB each. std::sort, std::lower_bound. Idiomatic.
- JS/TS: Number is safe to ±2⁵³; sums of ±4 × 10⁸ are tiny. Use plain Array + Array.prototype.sort((a, b) => a - b).
Common Bugs
- Splitting halves as [:n/2] and [n/2:] but accidentally using n // 2 + 1 somewhere → mismatched sizes; you’ll miss subsets.
- Forgetting to include 0 (empty subset) in either half: fix by initializing sums = [0].
- Sorting only one half but binary-searching as if both are sorted, or vice versa.
- Initial best = float('inf') is fine, but initial best = abs(goal) is more honest about the empty-subset case.
- After binary search, only checking sumsB[idx] and missing sumsB[idx-1] (the next-smaller): the closest element can be on either side.
- Using a set instead of sorted list: kills the binary search.
Debugging Strategy
For small N=4, enumerate all 16 subset sums by hand and verify the MITM result. Print sumsA, sumsB (sorted) and walk one binary search by hand. If the result is consistently too large by some |x|, you forgot to include 0; if too small, you’re double-counting (e.g., overlapping halves).
Mastery Criteria
- Recognized N=30..45 + arbitrary subset-sum constraint as the MITM trigger within 90 seconds.
- Wrote MITM closest-sum from scratch in <25 minutes.
- Stated time complexity O(N · 2^(N/2)) and space O(2^(N/2)) without prompting.
- Solved LC 1755 in <30 minutes from cold start.
- Solved one cousin (LC 956 — Tallest Billboard, with a twist; or “find subset with sum closest to half”) from cold start.
- Articulated MITM’s failure point (N ≥ 50 → memory and time both blow up).
Phase 4 — Graph Mastery
Target level: Medium → Hard
Expected duration: 3 weeks (12-week track) / 3 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~7 algorithms per week + 40–70 problems applying them under the framework
Why Graphs Are The Most-Tested Algorithm Family In Senior Interviews
Phase 2 taught you 28 patterns that solve most Mediums. Phase 3 taught you the augmented data structures that make Hards tractable. Phase 4 teaches the single algorithmic family that shows up more often than any other in senior, staff, and infrastructure interviews: graphs.
Here is the empirical claim, and it is the entire reason this phase exists as its own three-to-four-week unit:
Roughly one in three of all Medium-Hard and Hard interview problems at top-tier companies is a graph problem in disguise. Of those, at least half are not labeled “graph” — they are labeled “string”, “scheduler”, “permission system”, “currency conversion”, “build pipeline”, or “bus route”. The first job is recognizing that the problem is a graph problem. The second is picking the right algorithm. The third is implementing it without bugs.
Why graphs dominate senior interviews specifically:
- Graphs are the universal modeling language. Almost every relational or topological structure in a real production system — service dependencies, ACL inheritance, package builds, request routing, social networks, fraud rings, knowledge bases, scheduling DAGs, currency markets — is a graph. A senior engineer is expected to see the graph in a problem that doesn’t mention one.
- Graphs combine almost every Phase 1–3 building block. A real graph problem will fold in a hash map (adjacency list), a queue (BFS), a stack (DFS), a heap (Dijkstra), a DSU (Kruskal / cycle detection), a topological sort (dependency resolution), and sometimes a segment tree (Euler tour + RMQ for LCA). The interviewer gets to test ten primitives in a single 35-minute round.
- Graphs have a clean correctness story. Each algorithm here is a named result with a known proof, known preconditions, and a known complexity. There is no “I think this works because…” — there is “this is BFS, BFS gives shortest paths in unweighted graphs, the precondition is unit-weight edges, the proof is on layer numbers.” Senior interviewers want to hear that proof come out unprompted.
- The graph algorithm space is large but finite. Roughly 20 algorithms cover everything you’ll see at L4–L6. Past that, max-flow, min-cost-max-flow, and matching variants cover staff-and-above. There is a definite ceiling — but it’s higher than candidates expect, and it’s where senior interviewers live.
Candidates who stall on graph rounds almost always fail at recognition or modeling, not at the algorithm itself. They fail because:
- They didn’t recognize “alien dictionary” as a topological sort over inferred constraints.
- They didn’t see “minimum cost to connect all points” as Kruskal/Prim on the complete metric graph.
- They reached for Dijkstra on a graph with negative weights and produced wrong answers.
- They tried BFS on a 0-1 weighted graph instead of 0-1 BFS or Dijkstra.
- They forgot to coordinate-compress an implicit graph and exploded the state space.
This phase is structured to make those failures impossible. You will internalize the signal for each of 21 algorithms, the modeling reflex for implicit graphs, and the algorithm-selection decision tree that maps a problem statement to a single correct technique within 90 seconds.
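To make one of the failure modes above concrete, here is a minimal sketch of 0-1 BFS, the technique that replaces plain BFS when edge weights are 0 or 1. The function name zero_one_bfs and the adjacency format (adj[u] is a list of (v, w) pairs) are assumptions chosen for illustration.

```python
from collections import deque

def zero_one_bfs(n, adj, src):
    """Shortest distances from src when every edge weight is 0 or 1."""
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    dq = deque([src])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            nd = dist[u] + w
            if nd < dist[v]:
                dist[v] = nd
                if w == 0:
                    dq.appendleft(v)   # same distance layer: go to the front
                else:
                    dq.append(v)       # next layer: go to the back
    return dist


# Node 1 is one hop from node 0 (weight 1) but reachable for free via 2 and 3.
adj = [[(1, 1), (2, 0)], [], [(3, 0)], [(1, 0)]]
print(zero_one_bfs(4, adj, 0))         # [0, 0, 0, 0]
```

Plain BFS would settle node 1 at distance 1 after a single hop; the deque ordering is what lets the zero-weight detour win.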
After this phase, you can solve unmistakably-Hard graph problems on first attempt: alien dictionary in 20 minutes, network delay time in 10, cheapest flights in K stops in 25, accounts merge in 20, bus routes in 30, min cost to connect all points in 15. You also become visibly stronger in mock interviews because you immediately reach for adjacency lists, write from collections import deque before you write any logic, and articulate which algorithm you’re running and why.
What You Will Be Able To Do After This Phase
- Recognize that a problem is a graph problem in <2 minutes of reading, even when none of “graph”, “node”, or “edge” appears in the statement.
- Choose between BFS / DFS / Dijkstra / Bellman-Ford / Floyd-Warshall / 0-1 BFS / topological sort / DSU / MST in <60 seconds based on the problem’s edge weights, query type, and size.
- Implement a clean adjacency-list representation in <2 minutes for any graph variant (directed, undirected, weighted, multi-edge, self-loop, implicit grid).
- Implement BFS, DFS (recursive + iterative), Dijkstra (eager and lazy), and topological sort (Kahn + DFS) from memory in <8 minutes each.
- Detect cycles in directed and undirected graphs by both DFS-coloring and DSU.
- Run Kruskal and Prim end-to-end with a DSU you write by hand.
- Identify when a Hard problem reduces to bipartite matching or max-flow at the modeling level (you do not need to memorize Dinic).
- Articulate the correctness theorem for every algorithm you use (“Dijkstra is correct because the heap always extracts the next-closest unsettled node, and that node’s tentative distance is its true shortest distance under non-negative weights”).
- Recognize negative-cycle problems and reach for Bellman-Ford / SPFA correctly.
- Construct the implicit graph for grid problems, word ladders, state-space search, and bus-route problems without ever materializing all edges.
How To Read This Phase
Read this README in two passes. Pass 1: linear, end to end, building a mental map of which algorithm answers which signal. Pass 2: as you work the labs, refer back to specific algorithm entries to clarify invariants and pitfalls.
Each algorithm entry has a fixed shape:
- When To Use — the problem signal that should fire this algorithm in <2 minutes.
- Complexity — time and space, with the assumptions that matter.
- Correctness Sketch — one paragraph that you should be able to recite under interviewer pressure.
- Common Pitfalls — the bugs that consume the most interview minutes.
- Classic Problems — 3–6 representative LeetCode problems where the algorithm is the intended solution.
The phase ends with a Graph-Modeling Cheat Sheet (how to recognize a graph problem in disguise), an Implicit-Graph Catalog (grid / word-ladder / state-space), a Mastery Checklist, and Exit Criteria.
Inline Graph Algorithm Reference
1. Graph Representation
When To Use
Every graph problem starts here. The choice between adjacency list, adjacency matrix, edge list, and implicit graph is the first decision you make, and it shapes every subsequent algorithm’s complexity.
- Adjacency list — the default. `adj[u]` is a list of (neighbor, weight) pairs. Use a `dict` of `list` (Python), `Map<Integer, List<int[]>>` (Java), `[][]int` or `map[int][]edge` (Go), `vector<vector<pair<int,int>>>` (C++).
- Adjacency matrix — `M[u][v]` is the edge weight (or 0 / ∞ for absence). Use only when (a) `V ≤ 500` so the O(V²) memory fits, (b) you do many `(u, v)` edge-existence queries, or (c) you’re running Floyd-Warshall.
- Edge list — a flat list of `(u, v, w)` triples. Use only when the algorithm is edge-centric: Kruskal, Bellman-Ford.
- Implicit graph — never materialize the edges. The neighbors of a state are computed on demand. Used for grids, word ladders, sliding puzzles, state-space search.
Complexity
| Representation | Space | Edge query | Iterate neighbors |
|---|---|---|---|
| Adjacency list | O(V + E) | O(deg(u)) | O(deg(u)) |
| Adjacency matrix | O(V²) | O(1) | O(V) |
| Edge list | O(E) | O(E) | O(E) |
| Implicit | O(state) | O(neighbor-fn) | O(neighbor-fn) |
Correctness Sketch
The representation is a faithful encoding of the graph; the algorithm’s correctness is independent of representation as long as iteration over neighbors is exhaustive and edge weights are preserved. Use the representation that minimizes the algorithm’s dominant cost.
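As a minimal sketch of the default representation, the helper below builds a weighted adjacency list from an edge list. The name `build_adjacency` and the `(u, v, w)` input shape are illustrative assumptions, not part of any lab.

```python
from collections import defaultdict


def build_adjacency(edges, directed=False):
    """Build adj[u] -> list of (neighbor, weight) from (u, v, w) triples.

    Hypothetical helper for illustration; lists (not sets) keep multi-edges.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        if directed:
            adj.setdefault(v, [])  # make sure sink-only nodes still get a key
        else:
            adj[v].append((u, w))  # undirected: store both directions
    return adj
```

Keeping neighbors in a list means parallel edges survive; switching to a set would be a deliberate decision to drop them.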
Common Pitfalls
- Undirected edges added once instead of twice. `adj[u].append(v)` without `adj[v].append(u)` silently breaks every traversal that relies on bidirectionality.
- Multi-edges silently lost when using a `set` instead of a `list` for neighbors. If the problem permits multi-edges, use lists; if not, decide explicitly.
- Self-loops can appear in problems that don’t seem to allow them (e.g., topological sorts of “course depends on itself”). Handle defensively.
- Indexing on string keys — convert string node IDs to ints once, up front. Hash lookups inside hot loops cost real time.
Classic Problems
- LeetCode 261 — Graph Valid Tree (tests representation + cycle detection).
- LeetCode 332 — Reconstruct Itinerary (multi-edges matter; use a heap-of-destinations).
- LeetCode 547 — Number of Provinces (matrix vs list tradeoff).
2. BFS — Breadth-First Search (Unweighted Shortest Path)
When To Use
- Find shortest path in number of edges in an unweighted (or unit-weight) graph.
- Layer-by-layer traversal: “all nodes at distance ≤ k”, “minimum number of moves”, “level-order traversal”.
- The problem says “minimum / shortest / fewest” and the edge weights are all equal.
- Implicit graph variants: shortest word ladder, shortest path in a maze, fewest knight moves on a chessboard.
Complexity
Time O(V + E). Space O(V) for the queue and visited set.
Correctness Sketch
BFS visits nodes in non-decreasing order of distance from the source. When a node is first dequeued, its distance is exactly the shortest path length, because any earlier-enqueued node has distance ≤ the current node’s distance, and the current node was enqueued by a neighbor at distance d - 1 — so any other path to it must go through some node at distance ≥ d - 1, giving total distance ≥ d.
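The argument above can be sketched in a few lines. This assumes `adj` is a dict mapping each node to a list of unweighted neighbors; the function name `bfs_distances` is illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return {node: edge-count distance from source} for reachable nodes."""
    dist = {source: 0}            # doubles as the visited set
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:     # check and mark at enqueue time
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

Unreachable nodes are simply absent from the returned dict, so callers should use `dist.get(v)` rather than indexing.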
Common Pitfalls
- Marking visited on dequeue, not on enqueue. If you mark on dequeue, the same node can be enqueued by every neighbor before it’s processed once — exploding the queue to O(E) size and degrading performance.
- Tracking distance via `len(queue)` incorrectly. Use either a `(node, dist)` tuple or process the queue in level batches via `for _ in range(len(queue))`.
- Not pairing the visited check with the enqueue. `if v not in visited: visited.add(v); queue.append(v)` is the canonical idiom.
- Forgetting to handle the source itself. The source’s distance is 0; it should be marked visited at start.
Classic Problems
- LeetCode 102 — Binary Tree Level Order Traversal.
- LeetCode 127 — Word Ladder (canonical BFS on implicit graph). See Lab 01.
- LeetCode 200 — Number of Islands (BFS variant on grid).
- LeetCode 433 — Minimum Genetic Mutation.
- LeetCode 1091 — Shortest Path in Binary Matrix.
3. DFS — Depth-First Search (Recursive + Iterative; Pre/Post Numbering)
When To Use
- Connected-component enumeration, cycle detection, topological sort, tree traversal, articulation-point detection.
- Backtracking-style problems where you exhaustively explore a state space.
- When path matters more than distance — DFS finds some path, not necessarily the shortest.
- When the graph has small branching but deep paths.
Complexity
Time O(V + E). Space O(V) for the recursion stack (or explicit stack).
Correctness Sketch
DFS examines each edge a constant number of times: twice in an undirected graph (once from each endpoint) and once in a directed graph, which gives the O(V + E) bound. Pre-order numbering captures discovery time; post-order captures finish time. The discovery/finish interval structure underpins SCC, articulation-point, and bridge algorithms (Tarjan’s lowlink uses pre-order numbers as ranks).
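A minimal sketch of iterative DFS that records both orders, assuming `adj` is a dict of neighbor lists. The function name `dfs_orders` is illustrative; the `(node, iterator)` stack entries remember where each node's neighbor scan left off.

```python
def dfs_orders(adj, root):
    """Return (pre_order, post_order) for the nodes reachable from root."""
    pre, post = [root], []
    visited = {root}
    stack = [(root, iter(adj.get(root, [])))]
    while stack:
        node, neighbors = stack[-1]
        for nxt in neighbors:              # resumes where this node left off
            if nxt not in visited:
                visited.add(nxt)
                pre.append(nxt)            # discovery: pre-order
                stack.append((nxt, iter(adj.get(nxt, []))))
                break
        else:                              # neighbors exhausted: node is finished
            post.append(node)              # finish: post-order
            stack.pop()
    return pre, post
```

Reversing `post` gives the reverse post-order that topological sort relies on.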
Common Pitfalls
- Stack overflow at V = 10^5 in Python. Default recursion limit is 1000. Either `sys.setrecursionlimit(2 * 10**5)` or convert to an explicit stack.
- Iterative DFS state. When converting to a stack, you need to track where in the neighbor iteration you are — a tuple of `(node, iterator)` works; a tuple of `(node, neighbor_index)` is faster.
- Pre vs post processing confusion. “Print on entry” is pre-order; “print on completion” is post-order; topological sort uses reverse post-order.
- Visited semantics differ from BFS. For cycle detection in directed graphs, you need three states: white (unvisited), gray (on the current DFS path), black (fully explored). A single boolean visited is insufficient.
Classic Problems
- LeetCode 200 — Number of Islands. See Lab 02.
- LeetCode 695 — Max Area of Island.
- LeetCode 207 — Course Schedule (cycle detection via DFS coloring).
- LeetCode 332 — Reconstruct Itinerary (Hierholzer’s = post-order DFS).
- LeetCode 130 — Surrounded Regions.
4. Multi-Source BFS
When To Use
- “From any of these K starting points, what’s the shortest distance to every other node?” — common in grids.
- Equivalent to adding a virtual super-source connected to all K starts with weight 0 and running single-source BFS. But you don’t materialize the super-source: you just enqueue all K starts at distance 0 simultaneously.
- Examples: “rotting oranges” (every rotten orange is a source), “walls and gates” (every gate is a source), “01 matrix distance from nearest zero” (every zero is a source).
Complexity
Time O(V + E). Same as single-source BFS — the K starts add O(K) but are absorbed into the V + E term.
Correctness Sketch
The super-source argument: imagine a node S₀ connected to every start by a weight-0 edge. A shortest-path search from S₀ reaches each real node v at distance min over i of dist(start_i, v), because the first hop is free. Initializing the queue with all starts at distance 0, instead of materializing S₀, reproduces the state of that search right after S₀ has been expanded, so the same layer-by-layer behavior and the same correctness proof apply.
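A minimal sketch of the multi-source pattern on a grid, in the style of the “01 matrix” problem: every cell holding 0 is a source. The function name `distance_to_nearest_zero` and the 4-directional movement are assumptions for illustration.

```python
from collections import deque


def distance_to_nearest_zero(grid):
    """For each cell, return the 4-directional distance to the nearest 0 cell."""
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]   # -1 means not yet visited
    queue = deque()
    # Classify cells in one pass: every source is enqueued at distance 0.
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```

Cells stay at -1 only when the grid has no source at all.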
Common Pitfalls
- Initializing one start at a time in a loop and running single-source BFS K times. That’s O(K · (V + E)), not O(V + E).
- Forgetting to mark all starts visited up front. If you only mark the first as visited, the others are treated as unvisited targets and get re-enqueued at distance > 0.
- Mixing source types in problems where some sources and some targets are both special (e.g., “rotting oranges” has rotten=source, fresh=target, empty=skip). Always classify cells in a single pass before BFS.
Classic Problems
- LeetCode 994 — Rotting Oranges. See Lab 03.
- LeetCode 286 — Walls and Gates.
- LeetCode 542 — 01 Matrix.
- LeetCode 1162 — As Far From Land As Possible.
- LeetCode 815 — Bus Routes (multi-source on the bus-line graph). See Lab 09.
5. 0-1 BFS
When To Use
- Edge weights are in `{0, 1}` (or any two values, with 0-weight as the “free” edge).
- The graph mixes “free” transitions (0-weight) and “step” transitions (1-weight). Examples: grid with portals, terrain with roads (free) and trails (cost 1).
- Dijkstra would also work but has a `log V` overhead. 0-1 BFS is O(V + E) — strictly faster.
Complexity
Time O(V + E). Space O(V).
Correctness Sketch
Use a deque. When relaxing an edge of weight 0, push the neighbor to the front; when relaxing weight 1, push to the back. The deque thus holds nodes in non-decreasing order of tentative distance, with at most two distinct distance values present at any moment. The first time a node is popped, its distance is final — same correctness argument as Dijkstra, with the deque playing the role of a 2-bucket priority queue.
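The deque discipline above can be sketched as follows, assuming nodes are numbered `0..n-1` and `adj[u]` is a list of `(v, w)` pairs with `w` in `{0, 1}`. The name `zero_one_bfs` is illustrative.

```python
from collections import deque


def zero_one_bfs(n, adj, source):
    """Shortest distances from source when every edge weight is 0 or 1."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    dq = deque([source])
    while dq:
        u = dq.popleft()
        for v, w in adj[u]:
            nd = dist[u] + w
            if nd < dist[v]:
                dist[v] = nd
                if w == 0:
                    dq.appendleft(v)   # free edge: same distance bucket, go to the front
                else:
                    dq.append(v)       # cost-1 edge: next bucket, go to the back
    return dist
```

This variant stores bare nodes, so a node can be popped more than once; the `nd < dist[v]` test keeps those extra pops from changing any distance.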
Common Pitfalls
- Pushing weight-1 edges to the front is the canonical bug — it breaks the monotone-distance invariant.
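- Re-processing nodes because you didn’t check `if d > dist[u]: continue` after popping. This is a Dijkstra-style guard.
- Generalizing beyond two weight classes. Weights `{0, k}` still work (divide every weight by k), but two distinct positive weights, or three or more distinct weights, break the two-bucket invariant; you need either a multi-bucket BFS or actual Dijkstra.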
Classic Problems
- LeetCode 1368 — Minimum Cost to Make at Least One Valid Path in a Grid (canonical 0-1 BFS).
- LeetCode 2290 — Minimum Obstacle Removal to Reach Corner.
- “Shortest path with at most K edges of cost 1, others free” — folklore.
6. Dijkstra (Lazy + Eager Variants; Non-Negative Weights)
When To Use
- Single-source shortest path in a graph with non-negative edge weights.
- The default for any “shortest / cheapest / minimum cost” path problem with weighted edges. If weights can be negative, use Bellman-Ford instead.
- Variants: “shortest path with at most K edges” (relax with a `(dist, edges_used)` state), “second shortest path” (two distance arrays), “shortest path on multi-criteria” (state expansion).
Complexity
- Lazy (binary heap): O((V + E) log V). The heap holds up to E entries because we don’t decrease-key — we re-insert and skip stale entries on pop.
- Eager (binary heap with decrease-key): O((V + E) log V) — same asymptotic, smaller constant, but decrease-key requires an indexed heap.
- Fibonacci heap: O(E + V log V) — theoretical, never used in interviews.
- Space O(V) for the dist array + O(E) for the heap (lazy).
Correctness Sketch
Maintain a tentative distance dist[v] for every node, initialized to ∞ except the source (0). Repeatedly extract the unsettled node u with smallest dist[u] (the heap gives this in O(log V)). At the moment of extraction, dist[u] is final: any other path to u must go through some other unsettled node w with dist[w] ≥ dist[u], and since edges are non-negative, the total path length to u via w is ≥ dist[u]. Relax all outgoing edges from u and push updated neighbors. Termination: every node is extracted at most once.
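A minimal sketch of the lazy variant using the standard-library `heapq`, assuming nodes are numbered `0..n-1` and `adj[u]` is a list of `(v, w)` pairs with `w >= 0`. The function name `dijkstra` is illustrative.

```python
import heapq


def dijkstra(n, adj, source):
    """Lazy Dijkstra: re-insert on improvement, skip stale entries on pop."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    heap = [(0, source)]               # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                # stale entry: u was settled with a smaller d
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Unreachable nodes keep `float("inf")`, so check for it before reporting an answer.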
Common Pitfalls
- Using Dijkstra on negative-weight edges. It produces wrong answers — the relaxation invariant fails. Use Bellman-Ford.
- Lazy variant: forgetting the staleness check. `if d > dist[u]: continue` after popping. Without it, you re-process nodes and the asymptotic blows up to O(E²).
- Pushing `(dist[u], u)` instead of `(new_dist, neighbor)` when relaxing — the heap orders on the first tuple element, so put `dist` first.
- Initializing `dist[source] = 0` but not pushing the source to the heap. The first pop must be the source.
- Forgetting to handle disconnected components. `dist[v] = ∞` is the answer for unreachable v; check before printing.
Classic Problems
- LeetCode 743 — Network Delay Time. See Lab 04.
- LeetCode 1631 — Path With Minimum Effort.
- LeetCode 778 — Swim in Rising Water.
- LeetCode 787 — Cheapest Flights Within K Stops (modified Dijkstra with edge-budget).
- LeetCode 1514 — Path with Maximum Probability (Dijkstra on max-multiplicative).
7. Bellman-Ford (Negative Weights, Negative-Cycle Detection)
When To Use
- Shortest path with negative edge weights but no negative cycle reachable from source.
- Negative-cycle detection itself: if a V-th iteration relaxes any edge, the graph has a negative cycle on the source’s reachable component.
- “Shortest path with at most K edges” — Bellman-Ford’s iteration index is the edge budget. This is the canonical reframing of LeetCode 787.
- All-pairs negative shortest paths via Johnson’s algorithm (Bellman-Ford + Dijkstra), but this is overview-only.
Complexity
Time O(V · E). Space O(V).
Correctness Sketch
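After i rounds of relaxing all E edges, dist[v] is at most the length of the shortest path from source to v using at most i edges. (It is exactly that length only if each round reads from a snapshot of the previous round’s distances; in-place relaxation can run ahead, which is harmless for plain shortest paths but matters for edge-budget problems.) By induction: in round 1, the source’s out-edges are relaxed, covering all 1-edge paths. In round i, the last edge (u, v) of any shortest path with at most i edges is relaxed when dist[u] is already correct for i - 1 edges. Since shortest paths in a graph with no negative cycle have ≤ V - 1 edges, V - 1 rounds suffice. A V-th round that still relaxes an edge proves a negative cycle.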
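A minimal sketch of the round structure, assuming nodes are numbered `0..n-1` and `edges` is a list of `(u, v, w)` triples. Returning `None` for a reachable negative cycle is an illustrative convention, as is the name `bellman_ford`.

```python
def bellman_ford(n, edges, source):
    """Return shortest distances, or None if a negative cycle is reachable."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):                 # V - 1 rounds suffice without negative cycles
        changed = False
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                    # early exit: a full round relaxed nothing
            break
    for u, v, w in edges:                  # the extra V-th pass
        if dist[u] != INF and dist[u] + w < dist[v]:
            return None                    # still relaxable: negative cycle
    return dist
```

The `dist[u] != INF` guard keeps unreachable nodes from propagating, so only cycles reachable from the source are reported.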
Common Pitfalls
- Iterating until no edge is relaxed (early termination) is a common variant — but you still need V - 1 rounds in the worst case for correctness, and the V-th round for cycle detection.
- Using a dict for distances instead of an array indexed by int — slow on hot iteration loops.
- Confusing “shortest path with at most K edges” with “K hops” — read the problem carefully. K stops in LC 787 is K + 1 edges.
- Negative-cycle reachability. A negative cycle exists in the graph but doesn’t affect the source if it’s unreachable. Run from the source, not globally.
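For the edge-budget framing, a minimal sketch is shown below. It copies the previous round's distances so that round i can only extend paths of at most i - 1 edges. The function name and the `-1` return for “unreachable within budget” are assumptions modeled on the LC 787 convention.

```python
def cheapest_within_k_edges(n, edges, source, target, max_edges):
    """Cheapest source->target cost using at most max_edges edges, or -1."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for _ in range(max_edges):
        prev = dist[:]                     # snapshot: this round reads only old values
        for u, v, w in edges:
            if prev[u] != INF and prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist[target] if dist[target] != INF else -1
```

For a problem phrased as “at most K stops”, call it with `max_edges = K + 1`.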
Classic Problems
- LeetCode 787 — Cheapest Flights Within K Stops (canonical). See Lab 05.
- “Detect arbitrage in currency markets” (negative cycle in `-log(rate)` graph).
- “Minimum steps to make k operations” with negative-cost shortcuts.
8. SPFA (Shortest Path Faster Algorithm)
When To Use
- Bellman-Ford with a queue-based optimization that avoids re-relaxing edges whose source `u` hasn’t been updated since the last visit.
- In practice, ~2–10× faster than vanilla Bellman-Ford on sparse graphs with random structure.
- Caveat: worst-case is still O(V · E). On adversarial inputs (e.g., gridded negative cycles), SPFA can be slower than Bellman-Ford. Codeforces problems are sometimes designed to break SPFA. Use it for negative-weight graphs in interviews only when you’ve stated the caveat.
Complexity
Empirically about O(k · E) on sparse random graphs for a small constant k (often k ≤ 2); this is an observed average, not a proven bound. Worst-case O(V · E). Space O(V) for the queue + an in_queue flag.
Correctness Sketch
A node u is enqueued whenever its dist[u] improves. On dequeue, relax all outgoing edges. The in_queue flag prevents duplicate enqueues. Termination follows from the fact that each dist[u] decreases monotonically and is bounded below — for a graph with no negative cycle, the total number of decreases is ≤ V · E.
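A minimal sketch of the queue-driven loop, assuming nodes are numbered `0..n-1` and `adj[u]` is a list of `(v, w)` pairs. The per-node enqueue counter used for negative-cycle detection and the `None` return are illustrative conventions.

```python
from collections import deque


def spfa(n, adj, source):
    """Return shortest distances, or None if a negative cycle is reachable."""
    INF = float("inf")
    dist = [INF] * n
    in_queue = [False] * n
    enqueues = [0] * n                 # how many times each node entered the queue
    dist[source] = 0
    queue = deque([source])
    in_queue[source] = True
    enqueues[source] = 1
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if not in_queue[v]:    # the flag prevents duplicate queue entries
                    enqueues[v] += 1
                    if enqueues[v] >= n:
                        return None    # enqueued n times: negative cycle
                    queue.append(v)
                    in_queue[v] = True
    return dist
```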
Common Pitfalls
- Forgetting the `in_queue` flag. Without it, the queue can grow to O(E) size and SPFA degrades.
- Negative-cycle detection in SPFA requires tracking the number of times each node is enqueued; if a node is enqueued ≥ V times, there’s a negative cycle. Counting individual relaxations instead can give false positives, because one node can be improved several times within a single pass.
- Adversarial inputs. State the caveat to the interviewer; don’t claim asymptotic improvement.
Classic Problems
- Same as Bellman-Ford. Choose Bellman-Ford for the K-edge-budget framing (LC 787); SPFA only when raw single-source negative shortest path is the goal and average performance matters.
9. Floyd-Warshall (All-Pairs Shortest Paths)
When To Use
- All-pairs shortest paths on a small graph: V ≤ ~500 (V³ = 10^8, ~1 second).
- Negative weights are fine (no negative cycle assumed).
- Transitive closure: replace `min`/`+` with `OR`/`AND` to compute reachability in O(V³).
- Density independence: the algorithm is O(V³) regardless of E. On dense graphs (E ~ V²) it is the practical all-pairs choice; on sparse graphs (E ~ V) with non-negative weights, Dijkstra from each node costs O(V · E · log V), which is better when V is large.
Complexity
Time O(V³). Space O(V²) for the distance matrix.
Correctness Sketch
Define dp[k][i][j] = shortest path from i to j using only intermediate vertices in {1, ..., k}. The recurrence is dp[k][i][j] = min(dp[k-1][i][j], dp[k-1][i][k] + dp[k-1][k][j]) — either don’t use k, or use it as a midpoint. The 2D in-place version dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]) works because column k (dist[i][k]) and row k (dist[k][j]) do not change during iteration k: improving them would require a path that passes through k on its way to or from k itself, which means traversing a cycle through k, and with no negative cycle that never beats the path without it. So every read in iteration k sees the value from iteration k - 1.
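The in-place recurrence can be sketched as below, assuming nodes are numbered `0..n-1` and `edges` holds directed `(u, v, w)` triples. The function name `floyd_warshall` is illustrative, and `float("inf")` stands in for ∞ since Python floats do not overflow on addition.

```python
def floyd_warshall(n, edges):
    """Return the all-pairs distance matrix for a directed weighted graph."""
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)        # keep the cheapest parallel edge
    for k in range(n):                         # k must be the outermost loop
        for i in range(n):
            if dist[i][k] == INF:
                continue                       # no path i -> k, nothing to improve
            for j in range(n):
                cand = dist[i][k] + dist[k][j]
                if cand < dist[i][j]:
                    dist[i][j] = cand
    return dist
```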
Common Pitfalls
- Loop order is `k`, then `i`, then `j` — and not `i, j, k`. The latter computes garbage.
- Negative-cycle detection is `dist[i][i] < 0` after the algorithm completes.
- Initializing diagonals — `dist[i][i] = 0`; missing edges are `∞` (use a large finite number like `10^9` to avoid overflow when summing).
- Path reconstruction requires a parent matrix — easy to add but double the memory.
Classic Problems
- LeetCode 1334 — Find the City With the Smallest Number of Neighbors at a Threshold Distance (canonical V ≤ 100 Floyd-Warshall).
- LeetCode 743 — Network Delay Time (single-source, but Floyd-Warshall on V ≤ 100 still passes).
- “Transitive closure of a DAG” via `OR`/`AND` Floyd-Warshall.
10. Topological Sort (Kahn + DFS-Based; Cycle-Detection Equivalence)
When To Use
- DAG ordering where edge `u → v` means “u must come before v”.
- “Course schedule”, “task dependencies”, “build order”, “alien dictionary” (after extracting constraint edges).
- Used as a precondition check: if topological sort fails (cycle present), the problem has no valid ordering.
Complexity
Time O(V + E). Space O(V).
Correctness Sketch — Kahn’s
Repeatedly remove a node with in-degree 0, append it to the order, and decrement its neighbors’ in-degrees. If the order has length V at the end, the graph is a DAG and the order is valid. If not, the remaining nodes contain at least one cycle, because each of them still has an incoming edge from another remaining node. Correctness: a node with in-degree 0 has no predecessors, so it can safely come first. After removal, the remaining graph is still a DAG (removing vertices can’t create cycles).
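A minimal sketch of Kahn's procedure, assuming nodes are numbered `0..n-1` and each pair `(u, v)` in `edges` means u must come before v. Returning `None` on a cycle is an illustrative convention.

```python
from collections import deque


def kahn_topological_order(n, edges):
    """Return one valid topological order, or None if the graph has a cycle."""
    adj = [[] for _ in range(n)]
    indegree = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(n) if indegree[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:       # all predecessors of v are already placed
                queue.append(v)
    return order if len(order) == n else None
```

Swapping the deque for a min-heap yields the lexicographically smallest order.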
Correctness Sketch — DFS-Based
Run DFS from each unvisited node; on finishing a node (post-order), prepend it to the result. This works because, in a DAG, every edge u → v has v finishing before u. If DFS discovers u first, the white-path lemma puts v in u’s subtree, so v finishes first. If DFS discovers v first, v cannot reach u (that would close a cycle), so v finishes before u is even discovered. Either way v is prepended earlier than u, which places u ahead of v in the final order.
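A short recursive sketch of the DFS-based variant, assuming the input is already known to be a DAG and `adj[u]` lists the successors of u. It appends on finish and reverses once at the end, which is equivalent to prepending. The name `dfs_topological_order` is illustrative.

```python
def dfs_topological_order(n, adj):
    """Return reverse post-order of a DAG; assumes there is no cycle."""
    visited = [False] * n
    post = []

    def visit(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                visit(v)
        post.append(u)                 # u finishes after all of its descendants

    for s in range(n):
        if not visited[s]:
            visit(s)
    return post[::-1]
```

For deep graphs the recursion would need an explicit stack, as with any recursive DFS in Python.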
Common Pitfalls
- Edge direction confusion. “X depends on Y” usually means edge `Y → X` (Y must come first). Read the problem carefully.
- Detecting cycles via Kahn’s — count nodes processed; if < V, cycle exists.
- Multiple valid orders — both Kahn’s and DFS-based produce some valid order, not a unique one. If the problem demands lexicographically smallest, use Kahn’s with a min-heap instead of a queue.
Classic Problems
- LeetCode 207 — Course Schedule.
- LeetCode 210 — Course Schedule II.
- LeetCode 269 — Alien Dictionary. See Lab 06.
- LeetCode 1136 — Parallel Courses.
- LeetCode 2115 — Find All Possible Recipes from Given Supplies.
11. Cycle Detection In Directed Graphs (DFS Color States)
When To Use
- “Does this directed graph have a cycle?” — in dependency / scheduling / DAG-validation problems.
- More fine-grained than Kahn’s: tells you which edge closes a cycle.
Complexity
Time O(V + E). Space O(V) for the color array.
Correctness Sketch
Maintain three colors: white (unvisited), gray (on the current DFS path), black (fully explored). On entering a node, mark it gray; on finishing, mark it black. If during DFS we encounter a gray neighbor, that’s a back-edge and proves a cycle. White → recurse. Black → already processed; not part of current path; safe to ignore. Correctness: a back-edge is exactly an edge from a descendant to an ancestor in the DFS tree, which closes a cycle. Forward and cross edges don’t.
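The three-color rule can be sketched recursively as follows, assuming nodes are numbered `0..n-1` and `adj[u]` lists the successors of u. The constant names and `has_directed_cycle` are illustrative.

```python
WHITE, GRAY, BLACK = 0, 1, 2


def has_directed_cycle(n, adj):
    """Return True if the directed graph contains a cycle."""
    color = [WHITE] * n

    def visit(u):
        color[u] = GRAY                    # u is on the current DFS path
        for v in adj[u]:
            if color[v] == GRAY:           # back-edge to an ancestor
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK                   # fully explored and cycle-free
        return False

    return any(color[s] == WHITE and visit(s) for s in range(n))
```

Black neighbors are skipped silently, which is what keeps forward and cross edges from being misread as cycles.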
Common Pitfalls
- Using a single boolean visited. A visited node could be a back-edge target (cycle) or a forward/cross-edge target (no cycle). Two-state visited can’t distinguish. Three colors are required.
- Resetting gray to white on finish. Wrong — you lose the record that the node’s subtree is cycle-free, so every later path that reaches it re-explores it, and the running time can become exponential. Mark black on finish and stay black.
- Forgetting to check both directions in undirected graphs. This algorithm is for directed graphs; for undirected, see #12.
Classic Problems
- LeetCode 207 — Course Schedule (DFS-color variant).
- LeetCode 802 — Find Eventual Safe States (reverse-direction + cycle-detection).
- LeetCode 1059 — All Paths from Source Lead to Destination.
12. Cycle Detection In Undirected Graphs (DFS / Union-Find)
When To Use
- “Is this undirected graph a tree (acyclic + connected)?”, “redundant edge”, “valid tree”.
- Two equivalent approaches: DFS with parent tracking, or DSU.
Complexity
DFS: O(V + E). DSU: O(E · α(V)).
Correctness Sketch — DFS
In an undirected graph, every edge is bidirectional in the adjacency list. When DFS visits v from u, the edge (v, u) shows up in v’s adjacency. Skip the parent: for w in adj[v]: if w != parent: dfs(w, v). If a non-parent neighbor is already visited, that’s a cycle-closing edge.
Correctness Sketch — DSU
Process edges in any order. For each edge (u, v): if find(u) == find(v), the edge would close a cycle (both endpoints already in same component); else union(u, v). After processing all edges, the graph is acyclic iff no cycle was reported.
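A minimal sketch of the DSU approach with path halving, assuming nodes are numbered `0..n-1`. The function name and the choice to return the first cycle-closing edge (or `None`) are illustrative; union by rank is omitted for brevity.

```python
def find_cycle_closing_edge(n, edges):
    """Return the first edge whose endpoints are already connected, else None."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return (u, v)                  # both endpoints already in one component
        parent[root_u] = root_v            # union
    return None
```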
Common Pitfalls
- DFS: forgetting to skip the parent. Without `if w != parent`, every edge looks like a back-edge.
- DFS: parallel edges — if the graph has multi-edges between `u` and `v`, the second edge looks like a back-edge through a non-parent neighbor. Track edge IDs, not just parent node.
- DSU: ignoring connectedness. “Valid tree” requires both acyclic and connected (V - 1 edges + DSU returning a single component).
Classic Problems
- LeetCode 261 — Graph Valid Tree.
- LeetCode 684 — Redundant Connection (DSU canonical).
- LeetCode 685 — Redundant Connection II (directed variant — harder).
13. Strongly Connected Components (Kosaraju + Tarjan)
When To Use
- Decompose a directed graph into maximal sets where every pair `(u, v)` has paths in both directions.
- Reduces a directed graph to a DAG of SCCs (the condensation).
- Used in 2-SAT, transitive-closure compression, and “find all nodes that can reach all others”.
Complexity
Both Kosaraju and Tarjan: O(V + E). Space O(V).
Correctness Sketch — Kosaraju
- DFS on the original graph, pushing nodes to a stack in post-order.
- Build the reverse graph.
- Pop nodes from the stack; for each unvisited node, DFS on the reverse graph — that DFS visits exactly one SCC.
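The post-order ordering ensures the first node popped is in a “source” SCC of the condensation. In the original graph no edge enters a source SCC from outside, so in the reversed graph no edge leaves it, and the reverse-DFS from that node can only reach nodes in its own SCC. Later pops repeat the argument on the graph with the already-labeled SCCs removed.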
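The three steps can be sketched as below, assuming nodes are numbered `0..n-1` and `adj[u]` lists the successors of u. The recursive helpers and the `(count, comp)` return shape are illustrative choices.

```python
def kosaraju_scc(n, adj):
    """Return (number of SCCs, comp) where comp[u] is the SCC label of u."""
    visited = [False] * n
    order = []                             # nodes in post-order of the first pass

    def dfs_forward(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                dfs_forward(v)
        order.append(u)

    for s in range(n):
        if not visited[s]:
            dfs_forward(s)

    radj = [[] for _ in range(n)]          # separate reversed adjacency list
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)

    comp = [-1] * n

    def dfs_reverse(u, label):
        comp[u] = label
        for v in radj[u]:
            if comp[v] == -1:
                dfs_reverse(v, label)

    count = 0
    for u in reversed(order):              # pop in decreasing finish time
        if comp[u] == -1:
            dfs_reverse(u, count)          # labels exactly one SCC
            count += 1
    return count, comp
```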
Correctness Sketch — Tarjan
Single DFS maintaining a stack of “currently active” nodes plus disc[u] (discovery time) and low[u] (lowest discovery time reachable via the subtree + at most one back-edge). When DFS finishes a node u and low[u] == disc[u], pop the stack down to and including u — those popped nodes form an SCC. Tarjan does it in one pass; Kosaraju in two passes but with simpler bookkeeping.
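A recursive sketch of the single-pass version, assuming nodes are numbered `0..n-1` and `adj[u]` lists the successors of u. The name `tarjan_scc` and the list-of-lists return shape are illustrative.

```python
def tarjan_scc(n, adj):
    """Return the SCCs as a list of node lists, found in one DFS pass."""
    disc = [-1] * n                # discovery time, -1 = unvisited
    low = [0] * n
    on_stack = [False] * n
    stack, sccs = [], []
    timer = 0

    def visit(u):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        stack.append(u)
        on_stack[u] = True
        for v in adj[u]:
            if disc[v] == -1:              # tree edge
                visit(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:              # edge into the active stack
                low[u] = min(low[u], disc[v])
            # neighbors already popped belong to a finished SCC: ignore
        if low[u] == disc[u]:              # u is the root of its SCC
            component = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                component.append(w)
                if w == u:
                    break
            sccs.append(component)

    for s in range(n):
        if disc[s] == -1:
            visit(s)
    return sccs
```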
Common Pitfalls
- Kosaraju: forgetting to reverse all edges in the second graph. Use a separate adjacency list.
- Tarjan: confusing `low` with `disc` — `low[u] = min(low[u], low[v])` for tree-edge children; `low[u] = min(low[u], disc[v])` for back-edge neighbors that are still on the stack. Both updates are needed.
- Tarjan: handling cross-edges to other SCCs. A neighbor that has already been popped off the stack belongs to a completed SCC; do not update `low[u]` from it. A finished neighbor that is still on the stack is in the current SCC and does count.
Classic Problems
- LeetCode 1192 — Critical Connections (Tarjan for bridges; SCC-adjacent).
- “2-SAT solvability” via SCCs in the implication graph.
- “Number of source SCCs in the condensation” — Codeforces classic.
14. Bridges And Articulation Points (Tarjan’s Lowlink)
When To Use
- A bridge is an edge whose removal disconnects the graph.
- An articulation point is a vertex whose removal disconnects the graph.
- Used in “critical connections” problems and network-resilience analysis.
Complexity
O(V + E). Single DFS, one pass.
Correctness Sketch
Compute disc[u] and low[u] as in Tarjan’s SCC. For an edge (u, v) where v is a tree child: low[v] > disc[u] ⇒ (u, v) is a bridge (no back-edge from v’s subtree reaches anything at-or-above u, so removing (u, v) disconnects). For articulation points: u is an articulation point if (a) u is the DFS root and has ≥ 2 tree children, or (b) u is not the root and has a tree child v with low[v] ≥ disc[u].
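A minimal sketch of the bridge test, assuming an undirected graph given as a list of `(u, v)` pairs over nodes `0..n-1`. Skipping the parent by edge ID rather than by node is a deliberate choice so that parallel edges are handled; the name `find_bridges` is illustrative.

```python
def find_bridges(n, edges):
    """Return the list of bridge edges of an undirected graph."""
    adj = [[] for _ in range(n)]
    for edge_id, (u, v) in enumerate(edges):
        adj[u].append((v, edge_id))
        adj[v].append((u, edge_id))
    disc = [-1] * n
    low = [0] * n
    bridges = []
    timer = 0

    def visit(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, edge_id in adj[u]:
            if edge_id == parent_edge:     # skip only the edge we arrived on
                continue
            if disc[v] == -1:              # tree edge
                visit(v, edge_id)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:       # nothing below v climbs to u or above
                    bridges.append(edges[edge_id])
            else:                          # back-edge
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if disc[s] == -1:
            visit(s, -1)
    return bridges
```

The articulation-point test reuses the same `disc` and `low` arrays with `>=` in place of `>`, plus the root-child count.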
Common Pitfalls
- Using ≥ vs > — bridges use `low[v] > disc[u]`; articulation points use `low[v] ≥ disc[u]`. The off-by-one between them is critical.
- DFS root special case for articulation points — must count tree children; with one tree child, removing the root doesn’t disconnect.
- Multi-edges — a multi-edge between `u` and `v` is not a bridge (the parallel edge keeps the graph connected). Treat parallel edges by edge-ID, not endpoint pair.
Classic Problems
- LeetCode 1192 — Critical Connections in a Network (canonical bridges).
- “Find all articulation points in a network” — Codeforces classic.
15. Minimum Spanning Tree — Kruskal (With DSU)
When To Use
- Connect all V nodes with the minimum total edge weight, using exactly V - 1 chosen edges.
- Edge-centric algorithm: sort edges, add greedily, skip those that close a cycle (DSU detects).
- Best on sparse graphs (E ~ V), where sorting the E edges is cheap and dominates the running time.
Complexity
O(E log E) for sort + O(E · α(V)) for DSU = O(E log E). Space O(V).
Correctness Sketch (Cut Property)
The minimum-weight edge crossing any cut of the graph belongs to some MST. Kruskal repeatedly takes the next-cheapest edge; if it doesn’t close a cycle (DSU find(u) != find(v)), it’s the cheapest edge crossing the cut between its DSU components. By the cut property, it’s in some MST. Adding it preserves the invariant that the chosen edges are a subset of some MST. After V - 1 edges, the chosen set is a spanning tree.
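A minimal sketch of Kruskal with an inline DSU, assuming nodes are numbered `0..n-1` and `edges` holds undirected `(u, v, w)` triples. Returning `None` for a disconnected graph is an illustrative convention, and union by rank is omitted for brevity.

```python
def kruskal_mst_weight(n, edges):
    """Return the total MST weight, or None if the graph is disconnected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving, iterative
            x = parent[x]
        return x

    total, used = 0, 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):   # cheapest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:               # edge crosses a cut between components
            parent[root_u] = root_v
            total += w
            used += 1
            if used == n - 1:
                break
    return total if used == n - 1 else None
```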
Common Pitfalls
- Forgetting to sort edges. Kruskal without sorting is just a random spanning tree.
- DSU bugs. Phase 3’s α(N) DSU is required. Recursive `find` blows the stack at V = 10^5; use iterative two-pass or path-halving.
- Disconnected graph. If after processing all edges DSU has > 1 component, no spanning tree exists. Return failure or compute a minimum spanning forest.
- Tie-breaking on equal weights — any tie-breaking rule works; they all produce some MST.
Classic Problems
- LeetCode 1584 — Min Cost to Connect All Points. See Lab 08.
- LeetCode 1135 — Connecting Cities With Minimum Cost.
- LeetCode 1489 — Find Critical and Pseudo-Critical Edges in MST.
16. Minimum Spanning Tree — Prim (With Priority Queue)
When To Use
- Same problem as Kruskal — minimum total edge weight to connect all V.
- Vertex-centric algorithm: grow the tree one vertex at a time, always adding the minimum-weight edge from the tree to a non-tree vertex.
- Best on dense graphs where the heap pays off relative to sorting all E edges.
Complexity
With binary heap: O((V + E) log V). With Fibonacci heap: O(E + V log V) (theoretical). Space O(V + E).
Correctness Sketch (Cut Property, Variant)
At every step, the partial tree T defines a cut (T vs not-T). The minimum-weight edge crossing this cut is added next. By the cut property, it belongs to some MST. The invariant — chosen edges ⊆ some MST — is preserved. After V - 1 additions, the tree spans all of V.
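A minimal sketch of lazy Prim with `heapq`, assuming nodes are numbered `0..n-1`, `adj[u]` is a list of `(v, w)` pairs for an undirected graph, and the tree is grown from node 0. Returning `None` for a disconnected graph is an illustrative convention.

```python
import heapq


def prim_mst_weight(n, adj):
    """Return the total MST weight, or None if the graph is disconnected."""
    in_tree = [False] * n
    heap = [(0, 0)]                    # (edge weight into the tree, node); start at node 0
    total, count = 0, 0
    while heap and count < n:
        w, u = heapq.heappop(heap)
        if in_tree[u]:                 # lazy deletion: u joined via a cheaper edge
            continue
        in_tree[u] = True
        total += w
        count += 1
        for v, weight in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (weight, v))
    return total if count == n else None
```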
Common Pitfalls
- Lazy vs eager Prim. Lazy: push every (weight, neighbor) to the heap, skip duplicates on pop. Eager: maintain a “best known weight to enter T” per non-tree vertex and use decrease-key. Lazy is simpler and asymptotically equivalent.
- Disconnected graph — same as Kruskal; the heap empties before V - 1 edges are added.
- Picking starting vertex — any vertex works for connected graphs.
Classic Problems
- LeetCode 1584 — Min Cost to Connect All Points (also solvable via Prim).
- “Maximum spanning tree” via negated weights.
17. Bipartite Check (BFS/DFS 2-Coloring)
When To Use
- “Can we partition the V into two groups such that every edge crosses groups?”
- Equivalent to: graph has no odd cycle.
- Used as a precondition for bipartite matching, and in problems like “is this set of dislike-pairs separable into two camps?”
Complexity
O(V + E). Space O(V).
Correctness Sketch
BFS/DFS, coloring each newly visited node with the opposite color (0 or 1) of the node it was reached from. If an edge ever joins two nodes that already have the same color, the graph has an odd cycle and is not bipartite. Correctness: BFS layers alternate colors, and in an undirected graph every edge joins either the same layer or adjacent layers; a same-layer edge violates bipartiteness, and any odd cycle forces one.
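A minimal sketch of BFS 2-coloring that also covers disconnected graphs, assuming nodes are numbered `0..n-1` and `adj[u]` lists the undirected neighbors of u. The name `is_bipartite` is illustrative.

```python
from collections import deque


def is_bipartite(n, adj):
    """Return True if the undirected graph can be 2-colored."""
    color = [-1] * n                       # -1 = uncolored
    for start in range(n):                 # one BFS per connected component
        if color[start] != -1:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:  # same color on both ends: odd cycle
                    return False
    return True
```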
Common Pitfalls
- Disconnected graph. Run BFS from every unvisited node; the bipartiteness of each component is independent.
- Initializing colors as `-1` (uncolored), `0`, `1`. A `boolean visited` is insufficient; you need the actual color.
- Counting color-0 vs color-1 sizes when the problem asks for “minimum group size” — but the partitioning is unique only up to swapping the two colors per connected component.
Classic Problems
- LeetCode 785 — Is Graph Bipartite?
- LeetCode 886 — Possible Bipartition.
- “2-coloring as a sanity check before bipartite matching.”
18. Bipartite Matching (Hungarian / Hopcroft-Karp Overview)
When To Use
- Maximum cardinality matching in a bipartite graph: pair up as many left-side nodes with right-side nodes as possible, using each at most once.
- Job assignment, “find as many distinct words to slots as possible”, “schedule maximum tasks to workers”.
- Hungarian algorithm (here meaning the simple augmenting-path method, also known as Kuhn’s algorithm): O(V · E) via one augmenting-path search per left-side vertex.
- Hopcroft-Karp: O(E · √V) — strictly better; the algorithm of choice for large bipartite matching.
Complexity
Hungarian: O(V · E). Hopcroft-Karp: O(E · √V). Space O(V + E).
Correctness Sketch (König’s Theorem and Augmenting Paths)
Berge’s theorem: a matching M is maximum iff there is no augmenting path (a path alternating unmatched / matched edges, starting and ending at unmatched vertices). The Hungarian algorithm repeatedly finds augmenting paths via BFS/DFS and augments. Hopcroft-Karp accelerates by finding all shortest augmenting paths in a single BFS phase, then augmenting all of them in one DFS phase.
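To make the augmenting-path idea concrete, here is a minimal sketch of the simple O(V · E) variant (not Hopcroft-Karp). It assumes left nodes are 0..n_left-1, right nodes are 0..n_right-1, and `adj[u]` lists the right-side neighbors of left node `u`; the names are illustrative.

```python
def max_bipartite_matching(n_left, n_right, adj):
    """Return the size of a maximum matching via repeated augmenting-path DFS."""
    match_right = [-1] * n_right          # match_right[v] = left partner of v, or -1

    def try_augment(u, seen):
        # Look for an augmenting path that starts at left node u.
        for v in adj[u]:
            if seen[v]:
                continue
            seen[v] = True
            # v is free, or its current partner can be re-matched elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(n_left):
        if try_augment(u, [False] * n_right):   # fresh seen-array per search
            size += 1
    return size
```

The recursion depth is bounded by the length of an augmenting path, so for very large inputs an iterative search (or Hopcroft-Karp) would be the safer choice.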
Common Pitfalls
- Confusing maximum matching with maximum-weight matching. The latter needs the weighted Hungarian algorithm (O(V³)) or min-cost max-flow; both are harder to implement.
- Implementing matching from scratch in interviews is rare. State the algorithm by name, reduce to it, and note the complexity. Senior interviewers accept “this is bipartite matching, O(E√V) via Hopcroft-Karp” without code.
- Modeling. The hard part is recognizing the bipartite structure. “Are there two disjoint sets where edges only cross sets?”
Classic Problems
- LeetCode 1947 — Maximum Compatibility Score Sum (small N: bitmask DP. Large N: bipartite matching + weights).
- “Maximum number of distinct words to slots” — folklore.
- LeetCode 1349 — Maximum Students Taking Exam (bipartite + bitmask).
Overview-level only. Implementation drills are Phase 7 / 12.
19. Max Flow (Ford-Fulkerson / Edmonds-Karp / Dinic — Overview + When To Use)
When To Use
- “Maximum amount of flow from source S to sink T in a capacitated network.”
- Reductions: bipartite matching → max flow. Edge-disjoint paths → max flow. Min cut → max flow.
- Algorithms:
- Ford-Fulkerson (DFS-based augmenting): O(E · max-flow) — pseudo-polynomial; can loop forever on irrational capacities.
- Edmonds-Karp (BFS-based augmenting): O(V · E²). Polynomial.
- Dinic’s: O(V² · E) — uses BFS levels + DFS-blocking-flow. Practical for V, E up to 10^4–10^5.
Complexity
See above. Space O(V + E) for the residual graph.
Correctness Sketch (Max-Flow Min-Cut Theorem)
The maximum flow equals the minimum cut capacity. An augmenting path in the residual graph proves the current flow is not maximum; absence of any augmenting path proves it is. Dinic’s enhancement: BFS to compute layered graph, DFS to push blocking flow, repeat until no augmenting path. Each phase strictly increases the BFS distance from S to T, bounded by V phases.
Common Pitfalls
- Implementation in 35-minute interviews is rare. State the algorithm by name, model the problem, and let the interviewer guide depth.
- Residual graph forgetting reverse edges. Every forward edge u → v of capacity c adds a reverse edge v → u of capacity 0; pushing flow f on forward subtracts from forward residual and adds to reverse residual. Without reverse edges, augmenting paths can’t “undo” prior bad choices and the max-flow can be wrong.
- Modeling errors. “Each node has capacity” requires node-splitting (split v into v_in and v_out with edge capacity = node capacity).
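The reverse-edge bookkeeping is easiest to see in code. The following is a minimal Edmonds-Karp sketch, assuming nodes 0..n-1, integer capacities, `s != t`, and an input list of `(u, v, capacity)` triples; storing each edge next to its reverse (indices `e` and `e ^ 1`) is one common layout, not the only one.

```python
from collections import deque

def max_flow(n, edges, s, t):
    """Edmonds-Karp: BFS augmenting paths on a residual graph. Assumes s != t."""
    to, cap = [], []
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append(len(to))            # forward edge gets an even index
        to.append(v)
        cap.append(c)
        adj[v].append(len(to))            # paired reverse edge, capacity 0
        to.append(u)
        cap.append(0)

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent_edge = [-1] * n
        visited = [False] * n
        visited[s] = True
        queue = deque([s])
        while queue and not visited[t]:
            u = queue.popleft()
            for e in adj[u]:
                v = to[e]
                if cap[e] > 0 and not visited[v]:
                    visited[v] = True
                    parent_edge[v] = e
                    queue.append(v)
        if not visited[t]:
            return flow                   # no augmenting path: flow is maximum
        # Walk back from t to find the bottleneck, then push that much flow.
        bottleneck = float("inf")
        v = t
        while v != s:
            e = parent_edge[v]
            bottleneck = min(bottleneck, cap[e])
            v = to[e ^ 1]                 # tail of edge e
        v = t
        while v != s:
            e = parent_edge[v]
            cap[e] -= bottleneck          # forward residual shrinks
            cap[e ^ 1] += bottleneck      # reverse residual grows ("undo" capacity)
            v = to[e ^ 1]
        flow += bottleneck
```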
Classic Problems
- “Maximum bipartite matching via max-flow.”
- “Edge-disjoint paths from S to T.”
- LeetCode-style: very rare. Common in Google L5+ system rounds and competitive programming.
Overview-level. Implementation in Phase 12.
20. Min-Cut / Max-Flow Duality (Problem Modeling)
When To Use
- “Minimum cost to disconnect S from T” → min-cut problem.
- “Minimum number of edges to remove to disconnect” → min-cut on unit-capacity edges.
- “Image segmentation as binary labeling” → min-cut on a constructed graph.
- “Project selection” (some projects depend on others; pick a subset to maximize profit) → min-cut.
Complexity
Same as max-flow (compute the cut from the residual graph after max-flow terminates).
Correctness Sketch (Max-Flow Min-Cut Theorem)
In any flow network, max-flow value = min-cut capacity. After running max-flow, the min cut consists of the edges from {nodes reachable from S in residual graph} to {nodes not reachable}. Their original capacities sum to the max-flow value.
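As a hedged illustration of reading the cut off the residual graph, the sketch below runs a compact BFS-based max-flow on a dict-of-dicts residual network and then collects the original edges that leave the set of nodes still reachable from S. The function name and the `(u, v, capacity)` input format are assumptions for this example; it also assumes `s != t`.

```python
from collections import defaultdict, deque

def min_cut(edges, s, t):
    """Return (cut_value, cut_edges) for a directed network of (u, v, capacity) triples."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, c in edges:
        residual[u][v] += c
        residual[v][u] += 0               # make sure the reverse entry exists

    def bfs_parents():
        # Parent map of every node reachable from s via positive residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    value = 0
    while True:
        parent = bfs_parents()
        if t not in parent:
            break                         # no augmenting path left
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        value += bottleneck

    reachable = set(parent)               # S-side of the cut after the final BFS
    cut_edges = [(u, v) for u, v, c in edges
                 if u in reachable and v not in reachable]
    return value, cut_edges
```

The capacities of the returned `cut_edges` sum to the returned `value`, which is the max-flow min-cut theorem made observable.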
Common Pitfalls
- Recognizing the model. This is the hardest part. “Project selection” doesn’t look like a flow problem; recognizing the bipartite encoding is the senior-level skill.
- Edge orientation. Min-cut on undirected graphs: each undirected edge becomes two directed edges, each with capacity c.
Classic Problems
- “Project Selection Problem” — folklore.
- “Image Segmentation via Min-Cut” — vision systems.
- “Minimum number of edges to disconnect” — Menger’s theorem.
21. Eulerian Path / Circuit (Hierholzer’s Algorithm)
When To Use
- “Visit every edge exactly once” — Eulerian path/circuit.
- An undirected graph has an Eulerian circuit iff every vertex has even degree (and the graph is connected on edges). It has an Eulerian path iff exactly 0 or 2 vertices have odd degree.
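- A directed graph has an Eulerian circuit iff every vertex has in-degree = out-degree (and all edges lie in one connected component). It has an Eulerian path iff, with the same connectivity, either the circuit condition holds or exactly one vertex has out-degree − in-degree = 1 (start), exactly one has in-degree − out-degree = 1 (end), and every other vertex is balanced.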
- Hierholzer’s algorithm finds the path/circuit in O(E).
Complexity
O(V + E).
Correctness Sketch (Hierholzer’s)
DFS from the start vertex, consuming edges as you traverse them (remove from adjacency). When stuck at a vertex with no unused outgoing edges, append it to the recorded sequence, then backtrack and continue from earlier vertices that still have unused outgoing edges. The reverse of the recorded sequence is an Eulerian path. Correctness: each edge is consumed exactly once; the post-order finishing structure naturally constructs the path in reverse.
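An iterative version of this procedure avoids deep recursion. The sketch below handles the directed case and assumes that an Eulerian path starting at `start` exists (it does not verify the degree conditions); the function name is illustrative.

```python
from collections import defaultdict

def eulerian_path(edges, start):
    """Directed Hierholzer's. Assumes an Eulerian path from `start` exists."""
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)

    path = []
    stack = [start]
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())    # consume one unused edge u -> v
        else:
            path.append(stack.pop())      # stuck: u is finished, record it
    path.reverse()                        # post-order, reversed, is the path
    return path
```

For the lexicographically smallest path, one option is to sort each adjacency list in descending order first, so that `pop()` always takes the smallest remaining neighbor.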
Common Pitfalls
- Multi-edges and self-loops are common in Eulerian problems. Use a multiset or a list of edges with a “consumed” flag.
- Disconnected components on edges (vs vertices). Isolated vertices with degree 0 are fine; they don’t break Eulerian-ness.
- Lexicographically smallest Eulerian path (LC 332) — sort each adjacency list and use a multiset / heap; pop the smallest unused edge first.
Classic Problems
- LeetCode 332 — Reconstruct Itinerary (canonical Hierholzer’s).
- LeetCode 753 — Cracking the Safe (Eulerian path on de Bruijn graph).
Graph-Modeling Cheat Sheet — How To Recognize A Graph Problem In Disguise
The hardest skill in this phase is modeling: recognizing that a problem is a graph problem when nothing in the statement says “graph”. Here is a battery of signals.
| Signal in problem statement | Graph interpretation | Likely algorithm |
|---|---|---|
| “Depends on” / “must come before” / “prerequisite” | DAG, edge pre → post | Topological sort |
| “Connected” / “linked” / “merged” / “in same group” | Undirected, components | DFS / BFS / DSU |
| “Shortest” / “fewest steps” / “minimum moves” with unit cost | Unweighted graph | BFS |
| “Cheapest” / “minimum cost” with non-negative weights | Weighted graph | Dijkstra |
| “Cheapest with negative discounts” | Weighted graph with neg edges | Bellman-Ford |
| “Minimum cost to connect all” | Spanning tree | Kruskal / Prim |
| “Cycle” / “loop” / “redundant” | Graph + cycle test | DFS coloring / DSU |
| “Two groups” / “partition” / “no two together” | Bipartite | 2-coloring |
| “Pair up X with Y” | Bipartite matching | Hungarian / Hopcroft-Karp |
| “Maximum throughput” / “bottleneck” / “max disjoint paths” | Flow network | Max-flow |
| “Minimum to disconnect” / “critical edges” | Min-cut / bridges | Max-flow / Tarjan |
| “Visit all edges once” | Eulerian | Hierholzer’s |
| “Visit all vertices once with min cost” | Hamiltonian / TSP | Bitmask DP (Phase 3) |
| “Currency conversion” / “exchange rate” | Weighted directed; cycles = arbitrage | Bellman-Ford |
| “ACL inheritance” / “permission propagation” | Reachability | DFS / BFS / transitive closure |
| “Build pipeline” / “task DAG” | Topological + critical path | Topo sort + DP |
| “Friend of friend” / “social network” | Undirected | BFS / DSU |
| “Word transformation” / “step-by-step transform” | Implicit graph on states | BFS |
| “Sliding puzzle” / “8-puzzle” / “Rubik’s cube” | Implicit state graph | BFS / IDA* |
| “Routes between cities” / “flight network” | Directed weighted | Dijkstra |
| “Spread / infection / contamination over time” | Multi-source unweighted | Multi-source BFS |
Common Implicit Graphs
These are the four canonical implicit-graph patterns. You should be able to spot all four within 30 seconds of reading the problem.
1. Grid Graphs
Each cell (r, c) is a node; edges go to the 4 (or 8) neighbors that satisfy bounds and the cell-type constraint. Never materialize the up-to-4·R·C edges of an R × C grid; compute neighbors on demand.
DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
def neighbors(r, c):
    for dr, dc in DIRS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] != '#':
            yield (nr, nc)
Variants: 8-connected, weighted edges (cost = cell value), constraint-aware (can only enter from certain directions), gravity-based.
2. Word-Ladder Graphs
Each word is a node; an edge connects two words that differ in exactly one character. With N words of length L, checking every pair materializes the edges in O(N² · L) time. The trick: for each word, generate its L “wildcarded” patterns (one per position) and use a dict-of-lists keyed by pattern. A word’s neighbors are then the other members of its L buckets, found in O(L²) for building the patterns plus the total size of those buckets.
from collections import defaultdict
buckets = defaultdict(list)
for w in words:
    for i in range(len(w)):
        buckets[w[:i] + '*' + w[i+1:]].append(w)
def neighbors(w):
    # Yields w itself too; the BFS visited set filters it out.
    for i in range(len(w)):
        yield from buckets[w[:i] + '*' + w[i+1:]]
3. State-Space Search
Each “state” of the system is a node; transitions are edges. The state encodes the full configuration: e.g., a tuple of (position, keys_collected) for “shortest path with key-collection”.
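def neighbors(state):
    pos, keys = state
    for next_pos in adjacent_cells(pos):  # adjacent_cells: problem-specific helper
        k = key_at(next_pos)              # hypothetical helper: key index at the cell, or None
        if k is not None:
            yield (next_pos, keys | (1 << k))
        else:
            yield (next_pos, keys)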
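A runnable instance of the (position, keys) state encoding is sketched below, in the style of the “shortest path to collect all keys” problem. The grid encoding is an assumption for this example: '@' is the start, '#' a wall, '.' empty, 'a'-'f' keys, and 'A'-'F' the matching locks.

```python
from collections import deque

def shortest_path_all_keys(grid):
    """BFS over (row, col, keys_bitmask) states; returns steps to hold every key, or -1."""
    R, C = len(grid), len(grid[0])
    all_keys = 0
    for r in range(R):
        for c in range(C):
            ch = grid[r][c]
            if ch == '@':
                start = (r, c)            # assumes exactly one start cell
            elif ch.islower():
                all_keys |= 1 << (ord(ch) - ord('a'))

    first = (start[0], start[1], 0)
    visited = {first}
    queue = deque([(first, 0)])
    while queue:
        (r, c, keys), dist = queue.popleft()
        if keys == all_keys:
            return dist
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < R and 0 <= nc < C) or grid[nr][nc] == '#':
                continue
            ch = grid[nr][nc]
            new_keys = keys
            if ch.islower():
                new_keys |= 1 << (ord(ch) - ord('a'))   # pick up the key
            elif ch.isupper() and not keys & (1 << (ord(ch) - ord('A'))):
                continue                                 # locked, key not held yet
            state = (nr, nc, new_keys)
            if state not in visited:                     # visited-on-enqueue, per state
                visited.add(state)
                queue.append((state, dist + 1))
    return -1
```

The same cell can be visited once per key set, so the state count is R · C · 2^K for K keys; that product, not R · C, is the V in the BFS bound.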
4. Bipartite “Token / Container” Graphs
Two sets of nodes — e.g., users and groups, buses and stops, courses and prerequisites. An edge connects a token to a container it belongs to. Multi-source BFS over this graph gives “minimum tokens needed to traverse from container A to container B” — the canonical “bus routes” framing in LeetCode 815.
See Lab 09 for the bus-routes modeling exercise.
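As a preview of the token/container idea, here is one possible sketch of the bus-routes BFS. It treats routes as the nodes being counted and stops as the connectors between them; the function name and the argument layout (`routes` as a list of stop lists) follow the LC 815 framing but are otherwise assumptions.

```python
from collections import defaultdict, deque

def min_buses(routes, source, target):
    """Fewest buses to ride from stop `source` to stop `target`, or -1."""
    if source == target:
        return 0
    stop_to_routes = defaultdict(list)            # stop -> indices of routes serving it
    for i, route in enumerate(routes):
        for stop in route:
            stop_to_routes[stop].append(i)

    # Multi-source start: every route serving the source stop costs one bus.
    seen_routes = set(stop_to_routes[source])
    seen_stops = {source}
    queue = deque((i, 1) for i in seen_routes)
    while queue:
        i, buses = queue.popleft()
        for stop in routes[i]:
            if stop == target:
                return buses
            if stop in seen_stops:
                continue
            seen_stops.add(stop)
            for j in stop_to_routes[stop]:        # transfer to an unseen route
                if j not in seen_routes:
                    seen_routes.add(j)
                    queue.append((j, buses + 1))
    return -1
```

Tracking both `seen_routes` and `seen_stops` keeps the work linear in the total number of (route, stop) memberships.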
Mastery Checklist
Before exiting this phase, verify all of these:
- You can build an adjacency list from an edge list (directed and undirected) in <2 minutes, in your primary language.
- You can write BFS from memory in <5 minutes, including correct visited-on-enqueue semantics.
- You can write DFS recursively and iteratively (with explicit stack) from memory in <8 minutes total.
- You can write Dijkstra from memory in <8 minutes, lazy variant, including the staleness-skip line.
- You can write Kahn’s topological sort from memory in <6 minutes.
- You can write DSU with path compression and union by rank from memory in <5 minutes.
- You can write Kruskal’s MST from memory in <10 minutes (DSU + sort).
- You can recognize “this is a graph problem” within 2 minutes of reading any of the 30 classic graph problems on this list.
- You can correctly choose between BFS / Dijkstra / Bellman-Ford / 0-1 BFS based on edge weights.
- You can model the bus-routes problem (LC 815) as a graph in <5 minutes, articulating the bipartite structure.
- You can model the alien-dictionary problem (LC 269) as a topological sort in <5 minutes, articulating the constraint-extraction step.
- You can articulate the cut property and why it makes Kruskal correct, in <30 seconds.
- You can articulate why Dijkstra fails on negative weights, in <30 seconds.
- You can articulate the white-path lemma and its connection to topological sort via reverse post-order.
Exit Criteria
You may move to Phase 5 (Dynamic Programming) when all of the following are true:
- You have completed all nine labs in this phase, with the lab’s mastery criteria checked off for each.
- You have solved at least 40 unaided graph problems from LeetCode (mix of Medium, Medium-Hard, Hard) and reviewed each via REVIEW_TEMPLATE.md.
- Your unaided success rate on Medium-Hard graph problems is ≥ 70%.
- In a mock interview (phase-11-mock-interviews/), you correctly identify the algorithm family within 2 minutes for at least 8 of 10 graph problems.
- You can write Dijkstra, BFS, DFS, Kahn’s topological sort, and DSU + Kruskal — five algorithms — from a blank slate in under 45 minutes total.
If any of these fails, do another 15–25 graph problems before moving on. Skipping this gate calcifies bad habits that compound in Phase 5 (where DP-on-graphs and DAG-DP build directly on this material).
Labs
Hands-on practice. Each lab follows the strict 22-section format from Phase 0.
- Lab 01 — BFS Shortest Path (Word Ladder)
- Lab 02 — DFS Connected Components (Number of Islands)
- Lab 03 — Multi-Source BFS (Rotting Oranges / Walls and Gates)
- Lab 04 — Dijkstra (Network Delay Time / Cheapest Flights)
- Lab 05 — Bellman-Ford (Cheapest Flights Within K Stops)
- Lab 06 — Topological Sort (Alien Dictionary)
- Lab 07 — Union-Find Applications (Accounts Merge / Provinces)
- Lab 08 — MST Kruskal (Min Cost to Connect All Points)
- Lab 09 — Graph Modeling (Bus Routes)
Lab 01 — BFS Shortest Path (Word Ladder)
Goal
Implement an unweighted shortest-path search on an implicit graph where the nodes are dictionary words and the edges connect any two words that differ in exactly one character. After this lab you should be able to recognize a word-ladder / state-transformation problem in <60 seconds, build the wildcard-bucket adjacency in <3 minutes, and write the BFS body from a blank screen in <5 minutes with correct visited-on-enqueue semantics.
Background Concepts
BFS on an unweighted graph visits nodes in non-decreasing order of distance from the source: the source first (distance 0), then its neighbors (distance 1), then their neighbors (distance 2), and so on. The first time a node is dequeued, its distance is final. This phase teaches the wildcard-bucket trick that makes word-ladder graphs tractable: rather than checking all O(N²) word-pairs for adjacency, build a dict mapping each L-character “wildcarded” pattern (e.g., h*t, *ot, ho*) to the list of words matching that pattern. Two words are adjacent iff they share at least one bucket.
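The buckets hold O(N · L) entries total. Constructing them takes O(N · L²) character operations, because each of the N · L patterns costs O(L) to build and hash. Finding all neighbors of a word costs O(L²) for its patterns plus O(L · σ) for the lookups, where σ is the average bucket size. With small buckets the total BFS cost is O(N · L²) instead of O(N² · L).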
Interview Context
Word Ladder (LC 127) is a top-50 interview problem at Meta and Amazon; both companies have asked it within the past year. Variants appear at Google (“minimum genetic mutation”, LC 433) and Bloomberg. It tests three things at once: (1) recognizing the implicit graph, (2) building it efficiently, (3) running clean BFS. Candidates who try the all-pairs build (for each pair: if the words differ by one letter, connect them) time out at N = 5000. Bombing this problem on a phone screen is a serious negative signal at L4+.
Problem Statement
Given a beginWord, an endWord, and a list wordList, return the length of the shortest transformation sequence from beginWord to endWord such that:
- Only one letter changes per step.
- Every intermediate word must be in wordList.
- beginWord does not need to be in wordList.
Return 0 if no such sequence exists.
Constraints
- 1 ≤ beginWord.length ≤ 10
- All words have the same length, L.
- 1 ≤ wordList.length ≤ 5000
- All words consist of lowercase English letters.
- beginWord != endWord.
- All words in wordList are unique.
Clarifying Questions
- Is beginWord required to differ from endWord? (Yes, guaranteed.)
- Does the answer count beginWord and endWord? (Yes: sequence length includes both endpoints.)
- If endWord is not in wordList, is the answer 0? (Yes, by the rules.)
- Are case-sensitive comparisons required? (No, all lowercase.)
- Can beginWord itself appear in wordList? (Yes; treat normally.)
Examples
beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
→ 5 (hit → hot → dot → dog → cog)
beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
→ 0 (cog not in wordList)
Initial Brute Force
Materialize the graph: for each pair of words, check if they differ by one letter (O(L) check per pair). Build adjacency in O(N² · L). Then run BFS in O(V + E) = O(N²) edges worst case.
Brute Force Complexity
Time O(N² · L) for graph construction, O(N²) for BFS. Total O(N² · L). Space O(N²). At N = 5000, L = 10: 2.5 × 10^8 ops — borderline TLE in Python; passes in C++ tightly.
Optimization Path
The bottleneck is graph construction. Wildcard buckets eliminate it: for each word, generate L wildcards and append to a bucket dict. Construction is O(N · L²); at N = 5000, L = 10, that’s 5 × 10^5 character operations, roughly 500× fewer than the 2.5 × 10^8 of the all-pairs build. Neighbor enumeration is also faster: only words sharing a bucket are candidates, which prunes dramatically vs scanning all N words.
Bidirectional BFS is a further optimization: search from both ends and meet in the middle, so each side explores only about half the depth, which usually shrinks the explored set considerably. It adds complexity and is overkill at N = 5000.
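For reference, a bidirectional variant might look like the sketch below. It reuses the wildcard buckets, always expands the smaller frontier, and returns the same sequence length as the LC 127 definition; the snake_case name and the choice to add `begin_word` to the bucket set are decisions made for this example.

```python
from collections import defaultdict

def ladder_length_bidirectional(begin_word, end_word, word_list):
    """Bidirectional BFS over wildcard buckets; returns sequence length or 0."""
    words = set(word_list)
    if end_word not in words:
        return 0
    if begin_word == end_word:
        return 1
    words.add(begin_word)
    L = len(begin_word)
    buckets = defaultdict(list)
    for w in words:
        for i in range(L):
            buckets[w[:i] + '*' + w[i+1:]].append(w)

    front, back = {begin_word}, {end_word}
    visited = {begin_word, end_word}
    length = 1                                  # words on the joined path so far, minus one
    while front and back:
        if len(front) > len(back):              # always expand the smaller frontier
            front, back = back, front
        next_front = set()
        for w in front:
            for i in range(L):
                for nb in buckets[w[:i] + '*' + w[i+1:]]:
                    if nb in back:
                        return length + 1       # the two searches meet
                    if nb not in visited:
                        visited.add(nb)
                        next_front.add(nb)
        front = next_front
        length += 1
    return 0
```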
Final Expected Approach
- If endWord not in wordList, return 0.
- Build buckets: a dict mapping each wildcard pattern to the list of words matching it.
- BFS from beginWord with distance 0; on dequeue, generate L wildcards; for each, look up the bucket and enqueue any unvisited word with distance + 1.
- On reaching endWord, return distance + 1 (because the answer counts both endpoints).
- If queue exhausts, return 0.
Data Structures Used
- dict[str, list[str]]: wildcard buckets.
- set[str]: visited words.
- collections.deque: BFS queue, holding (word, distance) tuples.
Correctness Argument
BFS on an unweighted graph: when endWord is first dequeued, its distance is the minimum number of edges from beginWord. Each edge represents a one-letter change between dictionary words. Therefore the distance equals the minimum number of one-letter changes — and the sequence length is distance + 1. Visited-on-enqueue ensures each word enters the queue at most once, so total work is O(V · L · σ) where σ is the average bucket size.
Complexity
| Operation | Time | Space |
|---|---|---|
| Build buckets | O(N · L²) | O(N · L) |
| BFS | O(N · L²) | O(N) |
| Total | O(N · L²) | O(N · L) |
Implementation Requirements
from collections import defaultdict, deque
def ladderLength(beginWord, endWord, wordList):
    word_set = set(wordList)
    if endWord not in word_set:
        return 0
    L = len(beginWord)
    # Map each wildcard pattern (e.g. h*t) to the words that match it.
    buckets = defaultdict(list)
    for w in word_set:
        for i in range(L):
            buckets[w[:i] + '*' + w[i+1:]].append(w)
    visited = {beginWord}
    queue = deque([(beginWord, 1)])  # d counts words on the path, so beginWord is 1
    while queue:
        word, d = queue.popleft()
        if word == endWord:
            return d
        for i in range(L):
            pat = word[:i] + '*' + word[i+1:]
            for nb in buckets[pat]:
                if nb not in visited:
                    visited.add(nb)  # mark on enqueue
                    queue.append((nb, d + 1))
            buckets[pat] = []  # optional: clear bucket to avoid reprocessing
    return 0
Tests
- Standard: hit → cog with full path → 5.
- Missing endWord: return 0.
- beginWord == endWord: technically violates constraints, but should return 1 if asked.
- Single-step: hit → hot with wordList=["hot"] → 2.
- No path: disconnected words → 0.
- Long L: words of length 10, N = 5000 (load test).
- All same-length: invariant must hold; assert in code.
Follow-up Questions
- “Return all shortest paths, not just length.” → BFS to identify the layer of endWord, then DFS backward through parent pointers stored at each layer.
- “What if word lengths differ across the list?” → Edges are now insert/delete/substitute; problem reduces to edit-distance graph. Out of scope here.
- “What if N = 10^6?” → Bidirectional BFS halves the layer count; trie-based neighbor finding can replace bucket dicts.
- “Stream of new words being added live.” → Maintain buckets incrementally; BFS becomes a per-query operation.
- “What if changing a letter has a cost?” → Now weighted; switch to Dijkstra.
Product Extension
Spell-correctors, fuzzy-matching APIs, and DNA-mutation analyzers all use similar implicit-graph BFS. Query-suggestion features of the “did you mean” kind can be framed as search over Levenshtein-distance graphs built from a query log; word-ladder BFS is the toy version of that.
Language/Runtime Follow-ups
- Python: defaultdict(list) and collections.deque are essential. String slicing is O(L); the w[:i] + '*' + w[i+1:] pattern allocates a new string per call, so bucket construction performs N · L = 5 × 10^4 allocations (about 5 × 10^5 character copies) at N = 5000, L = 10, and the BFS repeats roughly the same work. That is acceptable; to go faster, precompute the patterns once per word.
- Java: use HashMap<String, List<String>> and ArrayDeque<String>. StringBuilder for pattern construction is faster than string concat. Track the int distance via a parallel map, or wrap word and distance in a custom record.
- Go: map[string][]string and a slice-based queue (q = q[1:] is O(1) per pop, though the backing array is only released once the slice is reallocated). Strings are immutable so building patterns is O(L) regardless.
- C++: unordered_map<string, vector<string>> and queue<pair<string,int>>. Preallocate to avoid rehashing. Use string_view if possible to avoid copies; or (word_index, distance) to avoid string keys entirely.
- JS/TS: Map and an array used as a queue (shift() can be O(N); instead, use a head index, a deque library, or a two-stack approach). Strings are immutable; pattern construction allocates.
Common Bugs
- Forgetting to check endWord in word_set up front: wastes work if missing.
- Visited check on dequeue, not on enqueue: the same word is enqueued once per incoming edge, so the queue can grow to O(E) entries instead of O(V).
- Returning the BFS distance instead of distance + 1 (or vice versa): off by one.
- Including beginWord in word_set and then visiting it on a wildcard match: easy if you don’t initialize visited = {beginWord} first.
- Generating wildcards with a different placeholder character at build time and at lookup time ('_' vs '*'): the keys never match, so every lookup returns an empty bucket.
- Forgetting that wordList may contain duplicates if you stored it as a list: use a set.
Debugging Strategy
Print the buckets for a tiny example (["hot", "dot", "dog"], L=3) and verify each wildcard pattern maps to the expected words. Print the BFS queue state after each layer. If the BFS terminates too early, trace which word was dequeued at the failure point and which neighbors weren’t generated. If too slow, profile with cProfile (Python) — likely you’re not visited-marking on enqueue.
Mastery Criteria
- Recognized the implicit-graph signal (one-character-difference adjacency) within 60 seconds.
- Wrote the bucket construction from blank screen in <3 minutes, no off-by-ones.
- Wrote correct BFS with visited-on-enqueue in <5 minutes from cold start.
- Stated O(N · L²) complexity unprompted.
- Solved LC 127 in <15 minutes from cold start.
- Solved LC 433 (Minimum Genetic Mutation, a near-clone) in <10 minutes.
- Articulated the wildcard-bucket vs all-pairs tradeoff in <30 seconds when asked.
Lab 02 — DFS Connected Components (Number of Islands)
Goal
Implement DFS on a 2D grid to enumerate connected components, both recursively and iteratively. After this lab you should be able to write numIslands from a blank slate in <8 minutes, convert recursive DFS to iterative DFS in <3 minutes, and extend the template to any grid-component problem (max area, perimeter, surrounded regions) by changing 5 lines or fewer.
Background Concepts
A grid graph treats each cell (r, c) as a node and edges as adjacencies between 4-connected (or 8-connected) neighboring cells of compatible type. A connected component is a maximal set of cells reachable from each other. Counting components reduces to: scan all cells; whenever an unvisited “land” cell is found, increment the count and DFS-mark its entire component as visited.
DFS is naturally recursive: enter a cell, mark visited, recurse to each valid neighbor. The recursion depth equals the longest path in the component; for an R × C grid the worst case is R · C — at 300 × 300 that’s 9 × 10^4, exceeding Python’s default 1000 recursion limit. The iterative version uses an explicit stack and avoids this entirely.
Interview Context
Number of Islands (LC 200) is the most-asked grid-DFS problem in interview history — it has appeared at virtually every FAANG company at least once, and at Amazon, Meta, and Google in the last year. It’s a stock phone-screen problem at L3-L4 and a warm-up at L5+. Bombing it is a no-hire signal at any senior level. The interviewer expects you to know it cold, and the value-add comes from how you handle the follow-ups: max area, surrounded regions, online updates (LC 305), grid as adjacency matrix (LC 547).
Problem Statement
Given an m × n 2D binary grid where '1' represents land and '0' represents water, return the number of islands. An island is a maximal group of land cells connected 4-directionally (horizontally or vertically).
Constraints
- 1 ≤ m, n ≤ 300
- grid[i][j] is '0' or '1'.
- The grid is surrounded implicitly by water on all sides.
Clarifying Questions
- Is adjacency 4-connected or 8-connected, i.e. do diagonals count? (4-connected.)
- Are the cell values strings '0'/'1' or ints? (Strings, per LeetCode.)
- Can the input be modified in place? (Usually yes; saves O(R·C) visited memory.)
- Are very tall thin grids possible? (Yes: 300 × 1 is allowed by the constraint, and on a dense grid the DFS depth can still approach R · C = 9 × 10^4.)
- Is the count required to fit in int32? (Yes; max islands ≈ 4.5 × 10^4.)
Examples
grid = [["1","1","0"],
["1","0","0"],
["0","0","1"]]
→ 2
grid = [["1","1","1"],
["1","1","1"],
["0","0","0"]]
→ 1
Initial Brute Force
For each cell, if it’s land and unvisited, increment count and recursively flood-fill. There is no significantly worse “naive” — DFS is the natural approach.
Brute Force Complexity
O(R · C) — every cell is visited exactly once. Space O(R · C) for the recursion stack worst case.
Optimization Path
There’s no asymptotic improvement; only constant-factor and stack-depth improvements:
- In-place marking (mutate '1' → '0'): saves O(R · C) memory.
- Iterative DFS via stack: avoids Python’s recursion-limit blow-up, which hits as soon as a DFS path exceeds the default limit of about 1000 frames.
- DSU as alternative: slightly slower in practice (α factor) but composable for online problems (LC 305).
- BFS variant: same asymptotic, different constant; sometimes preferred in Python because deque-pop has lower per-call cost than function calls.
Final Expected Approach
Iterative DFS with in-place marking:
- Iterate over every cell.
- If grid[r][c] == '1': increment count, push (r, c) to stack.
- While stack non-empty: pop (r, c); if already water, skip; mark '0'; push 4 neighbors that are land.
Recursive DFS is acceptable for small grids; mention recursion-limit and in-place marking on follow-up.
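For comparison with the iterative version, a recursive flood fill could look like the sketch below. The function name is illustrative, and raising the recursion limit inside the function is a convenience for this example (it is a process-wide side effect, and extremely deep recursion can still exhaust the native stack).

```python
import sys

def num_islands_recursive(grid):
    """Recursive flood fill; mutates grid in place ('1' -> '0')."""
    if not grid or not grid[0]:
        return 0
    R, C = len(grid), len(grid[0])
    # Worst-case depth is about R * C frames; raise the limit with some headroom.
    sys.setrecursionlimit(max(sys.getrecursionlimit(), R * C + 100))

    def sink(r, c):
        if not (0 <= r < R and 0 <= c < C) or grid[r][c] != '1':
            return
        grid[r][c] = '0'                  # mark before recursing
        sink(r - 1, c)
        sink(r + 1, c)
        sink(r, c - 1)
        sink(r, c + 1)

    count = 0
    for r in range(R):
        for c in range(C):
            if grid[r][c] == '1':
                count += 1                # first cell of a new island
                sink(r, c)
    return count
```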
Data Structures Used
- The input grid itself as the visited bookmark (in-place mutation).
- An explicit stack (list, in Python) for iterative DFS.
Correctness Argument
Component-counting via DFS is correct because: (1) DFS from an unvisited land cell visits exactly the cells in its component (closure under adjacency); (2) marking visited prevents re-counting; (3) the outer loop ensures every cell is examined; (4) we increment count only on the first cell of each component. Termination follows from finite grid size.
Complexity
| Operation | Time | Space |
|---|---|---|
| Whole algorithm | O(R · C) | O(R · C) for the DFS stack (recursive or explicit); O(min(R, C)) for a BFS queue |
Implementation Requirements
def numIslands(grid):
    if not grid:
        return 0
    R, C = len(grid), len(grid[0])
    DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    count = 0
    for r0 in range(R):
        for c0 in range(C):
            if grid[r0][c0] != '1':
                continue
            count += 1  # first cell of a new island
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                if grid[r][c] != '1':
                    continue  # already sunk via another neighbor
                grid[r][c] = '0'  # in-place visited mark
                for dr, dc in DIRS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == '1':
                        stack.append((nr, nc))
    return count
Tests
- All water: [["0"]] → 0.
- All land: [["1"]*5 for _ in range(5)] → 1.
- Diagonal 1s only: 4-connected, so each diagonal cell is its own island.
- Single column: [["1"],["0"],["1"],["0"],["1"]] → 3.
- Single row: same logic.
- Snake pattern: alternating 1 rows / 0 rows → ⌈R/2⌉ components (when the first row is land).
- Large: 300 × 300 random; must complete in <1s.
Follow-up Questions
- “Max area of an island.” (LC 695) → DFS returns area; track max.
- “Number of distinct islands” (LC 694) → record canonical-form path of each DFS; dedupe by hash.
- “Surrounded regions” (LC 130) → DFS from border, mark; flip rest.
- “Add land online and report island count after each addition” (LC 305) → DSU.
- “Are two grids the same modulo rotation?” — open-ended modeling; involves shape signatures.
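The online-update follow-up is where DSU earns its place. Below is a minimal sketch in the LC 305 framing, assuming `positions` is a list of `(row, col)` pairs to turn into land one at a time; it uses union by size with path halving, and the function name is an illustrative choice.

```python
def num_islands_online(m, n, positions):
    """Island count after each land addition on an m x n grid, using DSU."""
    parent = {}                                   # cell index -> parent; land cells only
    size = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]         # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra                           # union by size
        size[ra] += size[rb]
        return True

    count = 0
    result = []
    for r, c in positions:
        idx = r * n + c
        if idx not in parent:                     # ignore repeated additions
            parent[idx] = idx
            size[idx] = 1
            count += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < m and 0 <= nc < n and nr * n + nc in parent:
                    if union(idx, nr * n + nc):
                        count -= 1                # two islands merged into one
        result.append(count)
    return result
```

Each addition does at most four unions, so the per-update cost is near-constant, whereas re-running DFS after every update would cost O(R · C) each time.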
Product Extension
Connected-component analysis underpins image segmentation (region labeling in computer vision), ACL group expansion in identity systems, and graph-clustering for fraud-ring detection. The grid is just a constrained adjacency; the structure generalizes to any sparse adjacency.
Language/Runtime Follow-ups
- Python: recursion limit at 1000 means recursive DFS fails at large grids. Use iterative DFS, or raise the limit with sys.setrecursionlimit(10**6) (very deep recursion can still exhaust the interpreter’s native stack). List-as-stack is fast; tuple keys in any auxiliary structures are fine.
- Java: take char[][] grid (or int[][]). Use Deque<int[]> (ArrayDeque) for the stack; int[]{r, c} instead of Pair for performance. Don’t use Stack (legacy synchronized class).
- Go: [][]byte or [][]int32. Slice-as-stack: s = s[:len(s)-1]. Two-int struct (type cell struct{ r, c int }) avoids slice allocation per push.
- C++: vector<vector<char>> or vector<string>. stack<pair<int,int>>. Use emplace (or emplace_back on a vector used as a stack) to avoid copies.
- JS/TS: arrays as stacks (push/pop are O(1)). For 2D grids prefer grid[r][c] over flat-array indexing for clarity; the speed diff is negligible at N = 9 × 10^4.
Common Bugs
- Bounds check missing on neighbor (negative or out-of-grid index).
- Marking visited on pop without re-checking the popped cell: the same cell enters the stack via several neighbors and is processed more than once. Either mark on push, or skip cells that are no longer '1' when popped.
- Recursive DFS without sys.setrecursionlimit blowing the stack on 300 × 300 dense islands.
- Mutating the input when the caller didn’t allow it; clarify in interviews.
- Treating '1' as integer 1 (or vice versa): equality check fails silently.
- Off-by-one on R and C (using < R vs <= R).
- Forgetting one of the four directions (typo in DIRS).
Debugging Strategy
For a 3 × 3 grid, print the grid after each DFS call to verify in-place marking. Add a print((r, c)) on each pop and verify the 4 neighbors are correctly considered. If count is too high, you’re probably re-counting a visited component (visited check missing). If too low, your DFS is exiting early — check the bounds and the equality on '1' vs 1.
Mastery Criteria
- Recognized “count connected groups in a grid” as DFS/BFS in <30 seconds.
- Wrote both recursive and iterative DFS from cold start in <8 minutes total.
- Handled the recursion-limit trap correctly when asked about 10^4 × 10^4 grids.
- Stated O(R · C) complexity unprompted.
- Solved LC 200 in <10 minutes from cold start.
- Solved LC 695 (Max Area) in <12 minutes by extending the template.
- Solved LC 130 (Surrounded Regions) in <20 minutes by inverting the search.
- Articulated when DSU is preferable to DFS (online updates, no spatial constraint).
Lab 03 — Multi-Source BFS (Rotting Oranges)
Goal
Implement multi-source BFS on a grid to compute the minimum time for a process to spread from multiple simultaneous starting points. After this lab you should be able to recognize the multi-source BFS signal (any “infection / spread / nearest-source” problem) in <60 seconds, initialize the queue correctly with all sources at distance 0, and write the layer-by-layer time-tracking logic without off-by-ones.
Background Concepts
Multi-source BFS is single-source BFS with a virtual super-source connected to all real sources by zero-weight edges. We don’t materialize the super-source; we just enqueue all real sources at distance 0 simultaneously. The BFS then proceeds layer by layer, and each cell’s distance is min over all sources of (path length to that source). Critically, this is O(V + E) — same as single-source BFS — not O(K · (V + E)) for K sources.
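A generic form of this idea, independent of the oranges problem, is a “distance to nearest source” map. The sketch below assumes a rectangular grid and takes the cell values that mark sources and walls as parameters; the function and parameter names are illustrative.

```python
from collections import deque

def nearest_source_distance(grid, source, wall):
    """Distance from every cell to its nearest `source` cell; -1 if unreachable or a wall."""
    R, C = len(grid), len(grid[0])
    dist = [[-1] * C for _ in range(R)]
    queue = deque()
    for r in range(R):
        for c in range(C):
            if grid[r][c] == source:
                dist[r][c] = 0                    # every source starts at distance 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < R and 0 <= nc < C
                    and grid[nr][nc] != wall and dist[nr][nc] == -1):
                dist[nr][nc] = dist[r][c] + 1     # visited-on-enqueue
                queue.append((nr, nc))
    return dist
```

Each cell is enqueued at most once regardless of how many sources there are, which is where the O(V + E) bound comes from.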
The “rotting oranges” problem asks: given a grid where some cells contain rotten fruit (sources) and some contain fresh fruit (targets), how many time steps until all fresh fruit rots? Each minute, every rotten orange infects its 4-connected fresh neighbors. The answer is the maximum BFS distance among fresh oranges, or -1 if any fresh orange is unreachable.
Interview Context
Rotting Oranges (LC 994) appears at Amazon, Meta, and Microsoft phone screens regularly. The trap is candidates running single-source BFS K times — one per rotten cell — which is O(K · R · C) and blows up at K = R · C / 2. Multi-source BFS is the senior signal here; recognizing it within the first 90 seconds and stating it explicitly differentiates a strong L4 from a struggling one.
Problem Statement
Given an m × n grid where each cell is:
- 0: empty,
- 1: fresh orange,
- 2: rotten orange,
each minute every rotten orange rots its 4-connected fresh neighbors. Return the minimum minutes until no fresh orange remains, or -1 if some fresh orange can never rot.
Constraints
- 1 ≤ m, n ≤ 10
- grid[i][j] ∈ {0, 1, 2}
- (Note: the grid is small in LC 994; the algorithm is linear in the number of cells, so it extends to much larger grids such as 10^4 × 10^4.)
Clarifying Questions
- If the grid has no fresh oranges initially, what’s the answer? (0 — already done.)
- If a rotten orange has no fresh neighbors and there are no fresh oranges anywhere, return 0. If there are unreachable fresh oranges, return -1.
- Are ties between sources broken consistently? (Doesn’t matter — we want minimum distance, which is unambiguous.)
- Can a rotten orange “re-rot” a previously-rotted cell? (No — once rotten, stays rotten.)
Examples
grid = [[2,1,1],
[1,1,0],
[0,1,1]]
→ 4
grid = [[2,1,1],
[0,1,1],
[1,0,1]]
→ -1 (bottom-left is unreachable)
grid = [[0,2]]
→ 0
Initial Brute Force
Simulate minute by minute: at each step, find every rotten orange, infect its fresh neighbors, and record the newly rotted cells. Repeat until nothing changes. Each step is O(R · C), and the number of steps equals the maximum rot distance, which can reach Θ(R · C) when empty cells force a winding path; total O((R · C)²). At 10 × 10 that is trivially fast, but it doesn’t scale.
Brute Force Complexity
O(R · C · max-distance). At 10 × 10, ~10^4 ops. Passes LC bounds easily but is “embarrassing” — interviewer wants the BFS framing.
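The simulation described above can be sketched as follows. It is intended only as a slow reference to stress-test the BFS solution against; the name `oranges_rotting_simulation` is illustrative. It works on a copy so the caller’s grid is left untouched.

```python
def oranges_rotting_simulation(grid):
    grid = [row[:] for row in grid]  # copy so the input is not mutated
    rows, cols = len(grid), len(grid[0])
    minutes = 0
    while True:
        # Collect this minute's victims first, then apply them all at once,
        # so a cell rotted this minute cannot infect others until the next.
        newly_rotten = []
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] != 2:
                    continue
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                        newly_rotten.append((nr, nc))
        if not newly_rotten:
            break
        for r, c in newly_rotten:
            grid[r][c] = 2
        minutes += 1
    has_fresh = any(1 in row for row in grid)
    return -1 if has_fresh else minutes

print(oranges_rotting_simulation([[2, 1, 1], [1, 1, 0], [0, 1, 1]]))  # 4
```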
Optimization Path
Multi-source BFS gives the optimal O(R · C):
- Scan grid: count fresh oranges; enqueue every rotten orange at distance 0.
- BFS layer by layer; on rotting a fresh cell, decrement the fresh count and enqueue at next distance.
- After BFS, if fresh count > 0, return -1; else return the maximum distance reached.
The simulation is mathematically equivalent, but the BFS framing presents better in interviews: it is the canonical multi-source signal.
Final Expected Approach
fresh_count = count of cells equal to 1 in grid
queue = deque of (r, c, 0) for every cell equal to 2 in grid
time = 0
while queue:
    r, c, t = queue.popleft()
    time = max(time, t)
    for each 4-neighbor (nr, nc):
        if in bounds and grid[nr][nc] == 1:
            grid[nr][nc] = 2
            fresh_count -= 1
            queue.append((nr, nc, t + 1))
return -1 if fresh_count > 0 else time
Data Structures Used
- collections.deque of (row, col, time) tuples.
- The input grid as the visited marker (mutate fresh → rotten on rot).
- An integer fresh_count.
Correctness Argument
The super-source argument: imagine a virtual node S₀ connected to every initial rotten cell by a zero-weight edge. A shortest-path search from S₀ visits cells in non-decreasing distance order, and because the edges out of S₀ cost nothing, each cell’s distance from S₀ is the minimum over rotten cells of the path length to that cell. Initializing the queue with all rotten cells at distance 0 yields exactly these distances. The “minutes until no fresh remains” equals the maximum BFS distance among initially-fresh cells; this is exactly the time the last cell rots. Unreachable fresh cells are detected by the fresh_count > 0 check post-BFS.
Complexity
| Operation | Time | Space |
|---|---|---|
| Initial scan | O(R · C) | O(R · C) for queue worst case |
| BFS | O(R · C) | (already counted) |
| Total | O(R · C) | O(R · C) |
Implementation Requirements
from collections import deque

def orangesRotting(grid):
    R, C = len(grid), len(grid[0])
    queue = deque()
    fresh = 0
    # Seed the queue with every rotten orange at time 0 and count the fresh ones.
    for r in range(R):
        for c in range(C):
            if grid[r][c] == 2:
                queue.append((r, c, 0))
            elif grid[r][c] == 1:
                fresh += 1
    if fresh == 0:
        return 0
    DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    time = 0
    while queue:
        r, c, t = queue.popleft()
        time = t  # pops arrive in non-decreasing t, so the last pop holds the max
        for dr, dc in DIRS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == 1:
                grid[nr][nc] = 2  # mark on enqueue so no cell is queued twice
                fresh -= 1
                queue.append((nr, nc, t + 1))
    return -1 if fresh > 0 else time
Tests
- All rotten: 0.
- All fresh, no rotten: -1.
- Mixed with one isolated fresh: -1.
- Single rotten in corner, all fresh: distance = R + C - 2.
- Empty grid (all zeros): 0.
- 1×1 grid with [[0]]: 0; [[1]]: -1; [[2]]: 0.
- Stress: 10×10 random; verify against simulation brute force.
Follow-up Questions
- “What if rotting takes a different amount of time per cell?” → Edge weights vary; use Dijkstra.
- “What if there are obstacle cells?” → Add a grid[nr][nc] == 0 skip; same algorithm.
- “What if rot can spread only in some of the four directions?” → Replace DIRS with the allowed subset.
- “Walls and Gates (LC 286): from each empty room, find distance to nearest gate.” → Same multi-source BFS pattern; gates are the sources.
- “01 Matrix (LC 542): for each cell, distance to nearest 0.” → Sources are the 0s; targets are the 1s.
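For the first follow-up, here is a hedged sketch of multi-source Dijkstra on the same grid encoding. The `cost` grid is an assumption introduced for illustration: `cost[r][c]` is taken to be the number of minutes a fresh cell needs to rot once an adjacent cell is rotten, and it must be non-negative. The function name `rot_time_weighted` is likewise hypothetical.

```python
import heapq

def rot_time_weighted(grid, cost):
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    heap = []
    fresh = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                dist[r][c] = 0
                heap.append((0, r, c))  # every source starts at time 0
            elif grid[r][c] == 1:
                fresh += 1
    heapq.heapify(heap)
    latest = 0
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale entry
        if grid[r][c] == 1:
            fresh -= 1
            latest = d  # pops arrive in non-decreasing time order
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                nd = d + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return -1 if fresh else latest

print(rot_time_weighted([[2, 1], [1, 1]], [[0, 5], [1, 1]]))  # 5
```

With every cost equal to 1 this reduces to the unweighted answer, which makes it easy to cross-check against the BFS solution.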
Product Extension
Spreadable processes — disease propagation in epidemiology, fire spread in simulations, viral content propagation in social graphs, “blast radius” of a deployment failure across a service mesh — all map to multi-source BFS. The minute-by-minute simulation in production tracking systems is exactly this algorithm.
Language/Runtime Follow-ups
- Python: collections.deque is the canonical queue. Tuples for (r, c, t). Avoid list.pop(0), which is O(N).
- Java: ArrayDeque<int[]> with int[]{r, c, t}. Use pollFirst/offerLast.
- Go: a slice as queue (re-slicing with q[1:] is O(1) per dequeue, or use container/list). A [3]int array is fine for the state.
- C++: queue<tuple<int,int,int>>. emplace for efficiency. auto [r, c, t] = q.front() in C++17.
- JS/TS: array shift() is O(N); use a deque or pointer-based queue. queueMicrotask is irrelevant here.
Common Bugs
- Counting time as time + 1 after the BFS terminates; the last layer’s t is already correct.
- Forgetting to enqueue all rotten cells before starting: that is single-source BFS, missing K-1 sources.
- Picking the initial value of time (-1 vs 0) without working through the “no fresh” corner case.
- Decrementing fresh_count too late, or double-decrementing on revisits.
- BFS correctly terminating but reporting t from the wrong layer (e.g., the last enqueued, not last popped).
- Mutating 1 to 2 outside the bounds check.
- Mishandling a grid with no fresh oranges: the answer is 0 minutes, so short-circuit instead of relying on whatever value the BFS loop leaves behind.
Debugging Strategy
Trace a 3 × 3 grid by hand: print the queue contents and grid after each pop. Verify time increments by exactly 1 between layers. If fresh > 0 at the end, identify the unreachable cell — confirm it has no path to any source by visual inspection. If time is off by 1, you’re probably tracking time = t + 1 on enqueue instead of t on pop.
Mastery Criteria
- Recognized “spread from multiple sources” as multi-source BFS in <60 seconds.
- Initialized the queue with all sources at distance 0 unprompted.
- Wrote correct BFS with time tracking from cold start in <8 minutes.
- Stated O(R · C) complexity unprompted; explained why running K single-source BFSs is wrong.
- Solved LC 994 in <12 minutes from cold start.
- Solved LC 286 (Walls and Gates) in <12 minutes by extending the template.
- Solved LC 542 (01 Matrix) in <12 minutes.
- Articulated the super-source equivalence in <30 seconds.
Lab 04 — Dijkstra (Network Delay Time)
Goal
Implement Dijkstra’s algorithm (lazy variant, binary heap) for single-source shortest path on a non-negative-weighted directed graph. After this lab you should be able to write Dijkstra from a blank screen in <8 minutes, including the staleness-skip line; recognize the non-negative-weight signal in <30 seconds; and adapt the template to “shortest path with constraints” (e.g., max K edges) by extending the state.
Background Concepts
Dijkstra’s algorithm computes the shortest path from a source s to every other node in a graph with non-negative edge weights. The core invariant: when a node u is extracted from the priority queue (heap), its tentative distance dist[u] is final. The proof relies on non-negativity: any other path to u must go through some node w not yet extracted, with dist[w] ≥ dist[u], and the path’s total length is dist[w] + (non-negative tail) ≥ dist[u].
Two variants:
- Lazy: push every relaxation (new_dist, neighbor) to the heap; on pop of (d, u), skip if d > dist[u] (stale entry). Heap holds up to E entries. Simpler.
- Eager: maintain a decrease-key indexed heap; each node appears at most once. Faster constants, more code.
In interviews, lazy is the default. State that explicitly.
Interview Context
Dijkstra appears in the top 5 graph algorithms tested at FAANG. Network Delay Time (LC 743) is the canonical version, asked at Google, Amazon, and Bloomberg. Cheapest Flights (LC 787) is the constrained variant. The senior signal is: state “this is Dijkstra, weights are non-negative, lazy variant with binary heap, O((V+E) log V)” within the first two minutes, before writing code. Candidates who don’t articulate this and just dive in lose points even if the code works.
Problem Statement
You are given a network of n nodes labeled 1..n. A list times of edges where times[i] = (u, v, w) means it takes w time for a signal to travel from u to v. A signal is sent from node k. Return the minimum time for all nodes to receive the signal, or -1 if some node never receives it.
Constraints
- 1 ≤ n ≤ 100, 1 ≤ |times| ≤ 6000
- 1 ≤ u, v, k ≤ n; u ≠ v
- 0 ≤ w ≤ 100
- Edges are directed; possibly multi-edges.
Clarifying Questions
- Are weights non-negative? (Yes — Dijkstra applies.)
- Are nodes 1-indexed? (Yes — adjust array sizes accordingly.)
- Are duplicate edges possible? (Yes — they’re allowed; minimum-weight edge between (u, v) is what matters effectively.)
- Is the graph guaranteed connected? (No — return -1 if unreachable.)
- What’s the answer if the graph has only the source? (0 — already received.)
Examples
times = [[2,1,1],[2,3,1],[3,4,1]], n=4, k=2
→ 2 (signal reaches 1, 3 at time 1; 4 at time 2)
times = [[1,2,1]], n=2, k=1
→ 1
times = [[1,2,1]], n=2, k=2
→ -1
Initial Brute Force
Bellman-Ford: V - 1 rounds of relaxing all E edges. Time O(V · E). At V=100, E=6000: 6 × 10^5 ops — passes easily but is the wrong answer when the interviewer asks complexity.
Alternative brute force: BFS that treats every edge as having equal weight. It gives wrong answers on weighted graphs unless every weight is 1.
Brute Force Complexity
Bellman-Ford: O(V · E) = 6 × 10^5. BFS-as-Dijkstra: incorrect for varying weights.
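A sketch of the Bellman-Ford baseline for this problem, useful mainly as a reference implementation to compare the Dijkstra solution against in stress tests. The function name is illustrative, and the early exit when a round changes nothing is an optional optimization rather than part of the basic algorithm.

```python
def network_delay_bellman_ford(times, n, k):
    INF = float("inf")
    dist = [INF] * (n + 1)  # index 0 unused; nodes are 1-indexed
    dist[k] = 0
    for _ in range(n - 1):
        changed = False
        for u, v, w in times:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances have converged
    ans = max(dist[1:])
    return -1 if ans == INF else ans

print(network_delay_bellman_ford([[2, 1, 1], [2, 3, 1], [3, 4, 1]], 4, 2))  # 2
```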
Optimization Path
Dijkstra with binary heap: O((V + E) log V) = ~5 × 10^4 with these constraints. The right answer.
For very dense graphs (E ~ V²), simple Dijkstra without a heap (just scan for the min unsettled node) is O(V²) and can be faster. Floyd-Warshall is O(V³) all-pairs — overkill for single-source, but valid here at V=100.
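The heap-free O(V²) variant mentioned above might look like the following sketch. It assumes an adjacency matrix that keeps the cheapest of any duplicate edges; the function name `network_delay_dense` is illustrative.

```python
def network_delay_dense(times, n, k):
    INF = float("inf")
    # weight[u][v] = cheapest direct edge u -> v (collapses duplicate edges)
    weight = [[INF] * (n + 1) for _ in range(n + 1)]
    for u, v, w in times:
        weight[u][v] = min(weight[u][v], w)
    dist = [INF] * (n + 1)
    dist[k] = 0
    settled = [False] * (n + 1)
    for _ in range(n):
        # Linear scan replaces the heap: pick the closest unsettled node.
        u = -1
        for cand in range(1, n + 1):
            if not settled[cand] and (u == -1 or dist[cand] < dist[u]):
                u = cand
        if dist[u] == INF:
            break  # everything left is unreachable
        settled[u] = True
        for v in range(1, n + 1):
            if dist[u] + weight[u][v] < dist[v]:
                dist[v] = dist[u] + weight[u][v]
    ans = max(dist[1:])
    return -1 if ans == INF else ans

print(network_delay_dense([[2, 1, 1], [2, 3, 1], [3, 4, 1]], 4, 2))  # 2
```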
Final Expected Approach
- Build adjacency list from edge list. adj[u] is a list of (weight, neighbor) pairs.
- dist[i] = ∞ for all i except dist[k] = 0.
- Push (0, k) to a min-heap.
- While heap non-empty: pop (d, u); if d > dist[u], skip (stale); else relax all edges u → v: if d + w < dist[v], update and push.
- After the loop, max(dist[1..n]) is the answer; if any is ∞, return -1.
Data Structures Used
- Adjacency list: dict[int, list[(int, int)]] or list[list[(int, int)]] indexed 1..n.
- Distance array: list[int] of size n+1, init to inf.
- Priority queue: Python’s heapq (min-heap).
Correctness Argument
Loop invariant: when (d, u) is popped from the heap with d == dist[u], that distance is final. Proof: any other path to u goes through some node w not yet popped (else dist[u] would have been updated to a smaller value). Since w is unsettled, dist[w] ≥ dist[u] (by heap order). The path’s total length is dist[w] + tail ≥ dist[u] (non-negativity of tail). So dist[u] is optimal.
Termination: each node is finalized at most once (the staleness-skip ensures repeated pops don’t re-relax). At most V finalizations + E heap pushes: O((V + E) log V).
Complexity
| Operation | Time | Space |
|---|---|---|
| Build adjacency | O(V + E) | O(V + E) |
| Dijkstra | O((V + E) log V) | O(V + E) heap |
| Total | O((V + E) log V) | O(V + E) |
Implementation Requirements
import heapq
from collections import defaultdict

def networkDelayTime(times, n, k):
    # adj[u] holds (weight, neighbor) pairs for each directed edge u -> neighbor.
    adj = defaultdict(list)
    for u, v, w in times:
        adj[u].append((w, v))
    INF = float('inf')
    dist = [INF] * (n + 1)  # index 0 unused; nodes are 1-indexed
    dist[k] = 0
    heap = [(0, k)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a smaller distance
        for w, v in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    ans = max(dist[1:n+1])
    return -1 if ans == INF else ans
Tests
- Standard: 4-node star → 2.
- Disconnected: source can’t reach a node → -1.
- Single node, source = only node → 0.
- Duplicate edges: pick min weight on relaxation.
- Self-loop (problem disallows but defend): doesn’t affect answer; skip.
- Stress: V=100, E=6000 random, compare against Bellman-Ford reference.
- Adversarial: dense graph forcing many heap pushes.
Follow-up Questions
- “Now find shortest path with at most K edges.” (LC 787) → Bellman-Ford OR Dijkstra with state (node, edges_used). See Lab 05.
- “Now weights can be negative.” → Dijkstra is wrong; use Bellman-Ford.
- “Find path with maximum probability” (LC 1514) → Dijkstra with max-heap and product (or -log weights and standard Dijkstra).
- “All-pairs shortest path.” → Run Dijkstra from every node, O(V · (V+E) log V), or Floyd-Warshall O(V³).
- “Graph evolves online.” → Recompute on each query, or use dynamic shortest-path structures (advanced).
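As an illustration of the maximum-probability follow-up, here is a sketch of the max-heap variant, assuming the LC 1514 input shape (an undirected graph given as an edge list with a parallel list of success probabilities). Python’s heapq is a min-heap, so probabilities are negated on push. Probabilities are at most 1, so extending a path never increases its product; that plays the role non-negativity plays in ordinary Dijkstra.

```python
import heapq
from collections import defaultdict

def max_probability(n, edges, succ_prob, start, end):
    adj = defaultdict(list)
    for (a, b), p in zip(edges, succ_prob):
        adj[a].append((b, p))
        adj[b].append((a, p))  # undirected graph
    best = [0.0] * n
    best[start] = 1.0
    heap = [(-1.0, start)]  # negate so the largest probability pops first
    while heap:
        neg_p, u = heapq.heappop(heap)
        p = -neg_p
        if p < best[u]:
            continue  # stale entry
        if u == end:
            return p
        for v, edge_p in adj[u]:
            candidate = p * edge_p
            if candidate > best[v]:
                best[v] = candidate
                heapq.heappush(heap, (-candidate, v))
    return 0.0

print(max_probability(3, [[0, 1], [1, 2], [0, 2]], [0.5, 0.5, 0.2], 0, 2))  # 0.25
```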
Product Extension
Network monitoring tools use Dijkstra-like algorithms for path-cost estimation. Routing protocols like OSPF use Dijkstra (link-state routing) to compute shortest paths in IP networks. CDN edge selection, traffic engineering, and request routing in microservices meshes all rely on shortest-path computations parameterized by latency, cost, or QoS.
Language/Runtime Follow-ups
- Python: heapq is a min-heap; for max-heap, negate weights or use tuples (-w, ...). Tuple comparison is element-by-element, so (d, u) compares by d first.
- Java: PriorityQueue<int[]> with comparator on the weight index, or PriorityQueue<long[]> if distances can overflow int. Don’t use Pair from JavaFX (JavaFX has not been bundled with the JDK since Java 11).
- Go: container/heap requires implementing heap.Interface. Tedious; many candidates inline a slice-based heap. For small N, even O(V²) Dijkstra without heap is fine.
- C++: priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> for min-heap. Or negate weights with default max-heap.
- JS/TS: no built-in heap. Use a library (heap-js) or hand-roll. For small N, an O(V²) scan is acceptable.
Common Bugs
- Forgetting the staleness check if d > dist[u]: continue. The heap can hold several entries for the same node; without the skip, stale pops re-scan that node’s edges, which leaves the distances correct but blows up the running time.
- Pushing (u, dist[u]) instead of (dist[u], u): the heap orders on the first element, so distance must come first.
- 1-indexed vs 0-indexed off-by-one. The problem is 1-indexed; size arrays at n + 1.
- Initial dist not set to ∞: using 0 makes every node look “already optimal”, so nothing is ever relaxed.
- Pushing the source to the heap but forgetting to set dist[source] = 0.
- Returning min(dist), which is always the source’s 0, instead of max(dist), which is the actual question and is ∞ when some node is unreachable.
- Negative weights: Dijkstra silently produces wrong answers; you won’t see this fail unless you stress-test.
Debugging Strategy
Print the heap contents and dist array after each pop. Verify the popped distance equals dist[u] for non-stale entries. For wrong answers, trace a specific node v where dist[v] is wrong: identify the predecessor u that should have relaxed it; check that (d_u + w_uv) is computed correctly. For complexity blowup, check that the staleness skip is present and triggers.
Mastery Criteria
- Recognized “shortest path with non-negative weights” as Dijkstra in <30 seconds.
- Wrote Dijkstra from blank screen with the staleness check in <8 minutes.
- Stated O((V + E) log V) complexity unprompted.
- Articulated why Dijkstra fails on negative weights in <30 seconds.
- Solved LC 743 in <15 minutes from cold start.
- Solved LC 1631 (Path With Minimum Effort) in <20 minutes by adapting weights = max along path.
- Solved LC 778 (Swim in Rising Water) in <20 minutes.
- Stated lazy vs eager difference in <30 seconds.
Lab 05 — Bellman-Ford (Cheapest Flights Within K Stops)
Goal
Implement Bellman-Ford for shortest path on a graph that may contain negative weights, and exploit its iteration structure to solve the canonical “shortest path with at most K edges” problem. After this lab you should be able to recognize the K-edge-budget signal in <60 seconds, write Bellman-Ford from a blank slate in <10 minutes, and articulate the negative-cycle detection extension cleanly.
Background Concepts
Bellman-Ford runs V - 1 rounds; in each round, relax all E edges. After round i, dist[v] equals the shortest path from source to v using at most i edges. This invariant is the key insight for “K-edge-budget” problems: run the algorithm for K rounds (or K + 1, depending on edge-vs-stop semantics) and read off the answer. A V-th round that still relaxes any edge proves a negative cycle reachable from the source.
The complexity is O(V · E). It tolerates negative weights (unlike Dijkstra) but is slower on graphs without them. The “shortest path with at most K edges” framing is the most common interview reason to use Bellman-Ford.
Interview Context
Cheapest Flights Within K Stops (LC 787) is asked at Amazon, Bloomberg, and Adobe. The trap is candidates reaching for Dijkstra and being unable to bound edge count. The strong answer: “this is Bellman-Ford with K + 1 iterations, where K stops means K + 1 edges, complexity O(K · E).” That single sentence wins the round. Negative-cycle detection (e.g., currency arbitrage) is rarer at L4-L5 but standard at staff and on competitive programming exercises.
Problem Statement
You are given n cities and a list flights[i] = (from, to, price) of directed flights. Return the cheapest price from src to dst using at most K stops (i.e., K intermediate cities, K + 1 edges). Return -1 if no such route exists.
Constraints
- 1 ≤ n ≤ 100; 0 ≤ |flights| ≤ n · (n − 1) / 2
- 1 ≤ price ≤ 10^4
- 0 ≤ src, dst, K < n; src ≠ dst
Clarifying Questions
- “K stops” — does this mean K intermediate cities (so K + 1 edges) or K edges total? (Per LC 787 statement: K stops = K intermediate = K + 1 edges.)
- Are prices positive? (Yes; no negative-cycle concerns here.)
- Are duplicate flights allowed? (Possible; pick min on relaxation.)
- Should we count src and dst as “stops”? (No; those are endpoints.)
- Is K = 0 allowed (direct flight only)? (Yes: 0 ≤ K.)
Examples
n=3, flights=[[0,1,100],[1,2,100],[0,2,500]], src=0, dst=2, K=1
→ 200 (0 → 1 → 2 uses 1 stop)
n=3, flights=[[0,1,100],[1,2,100],[0,2,500]], src=0, dst=2, K=0
→ 500 (only direct allowed)
Initial Brute Force
DFS from src exploring all paths up to K + 1 edges; track minimum cost. Time exponential — O(V^(K+1)) — at K = 99, V = 100: infeasible.
Brute Force Complexity
O(V^(K+1)) — TLE for K > ~10.
Optimization Path
Bellman-Ford with K + 1 iterations: O((K + 1) · E). At K = 99, E ~ 5000: 5 × 10^5 ops — fast. The key technique: keep two distance arrays, prev and curr. In each iteration, compute curr[v] = min(prev[u] + w_uv) over all edges. Using prev (last round’s snapshot) prevents using more than one edge per round.
Modified Dijkstra with state (node, edges_used) also works: push (cost, node, edges_remaining) to a heap, expand only if edges_remaining > 0. Slightly slower than Bellman-Ford for this problem because the heap doesn’t prune effectively; both are correct.
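One possible shape for the heap-based alternative is sketched below. It tracks edges used rather than edges remaining, and it adds a pruning rule that is an assumption of this sketch, not something the lab prescribes: a node is re-expanded only when it is popped with strictly fewer edges than any earlier pop of that node. The function name `cheapest_price_dijkstra` is illustrative.

```python
import heapq
from collections import defaultdict

def cheapest_price_dijkstra(n, flights, src, dst, k):
    adj = defaultdict(list)
    for u, v, w in flights:
        adj[u].append((v, w))
    INF = float("inf")
    # fewest_edges[u]: smallest edge count with which u has been expanded so far
    fewest_edges = [INF] * n
    heap = [(0, 0, src)]  # (cost, edges_used, node)
    while heap:
        cost, used, u = heapq.heappop(heap)
        if u == dst:
            return cost  # pops are in non-decreasing cost order
        if used >= fewest_edges[u]:
            continue  # dominated: an earlier pop was no costlier with no more edges
        fewest_edges[u] = used
        if used == k + 1:
            continue  # edge budget exhausted (K stops = K + 1 edges)
        for v, w in adj[u]:
            heapq.heappush(heap, (cost + w, used + 1, v))
    return -1

flights = [[0, 1, 100], [1, 2, 100], [0, 2, 500]]
print(cheapest_price_dijkstra(3, flights, 0, 2, 1))  # 200
print(cheapest_price_dijkstra(3, flights, 0, 2, 0))  # 500
```

A plain dist[node] staleness check is not enough here, because a costlier entry that has used fewer edges may still be the only one able to reach dst within the budget.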
Final Expected Approach
prev[i] = ∞ for all i; prev[src] = 0
for round in 1 .. K + 1:
    curr = prev.copy()
    for (u, v, w) in flights:
        if prev[u] + w < curr[v]:
            curr[v] = prev[u] + w
    prev = curr
return prev[dst] if prev[dst] != ∞ else -1
The curr = prev.copy() carries forward the best costs found so far, and reading prev[u] (never curr[u]) inside the loop means each round extends a path by at most one new edge.
Data Structures Used
- Two arrays of size n: prev and curr.
- The flight list as the edge list (no need to build adjacency).
Correctness Argument
Invariant: after round i, prev[v] = shortest path from src to v using at most i edges. Proof by induction. Base (i = 0): only src has prev = 0, all others ∞. Inductive step: any shortest path with at most i edges either uses at most i-1 edges (already in prev[v] and carried over by the copy) or extends a path of at most i-1 edges to u with edge (u, v); the inner loop catches the latter via prev[u] + w_uv → curr[v]. After K + 1 rounds, prev[dst] is the shortest at-most-K+1-edge path.
Why two arrays: without the copy, in-place updates could chain multiple edges in one round, breaking the at-most-i-edges invariant.
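A small experiment makes the difference concrete. The helper below is a hypothetical harness, not part of the solution: it runs the same relaxation loop either against a snapshot of the previous round or in place, on the lab’s own example with a single round (K = 0).

```python
def relax_rounds(n, flights, src, rounds, snapshot):
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(rounds):
        # Snapshot mode reads last round's values; in-place mode reads
        # the same list it is writing to.
        read = dist[:] if snapshot else dist
        for u, v, w in flights:
            if read[u] + w < dist[v]:
                dist[v] = read[u] + w
    return dist

flights = [[0, 1, 100], [1, 2, 100], [0, 2, 500]]
print(relax_rounds(3, flights, 0, 1, snapshot=True))   # [0, 100, 500]
print(relax_rounds(3, flights, 0, 1, snapshot=False))  # [0, 100, 200]
```

In one round the in-place version already reports 200 for city 2, a price that needs two edges, because the update to city 1 was visible when edge 1 → 2 was processed.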
Complexity
| Operation | Time | Space |
|---|---|---|
| Whole algorithm | O((K + 1) · E) | O(V) |
Implementation Requirements
def findCheapestPrice(n, flights, src, dst, K):
    INF = float('inf')
    prev = [INF] * n
    prev[src] = 0
    # K stops means at most K + 1 edges, so run K + 1 rounds.
    for _ in range(K + 1):
        curr = prev[:]  # snapshot: this round may only extend last round's paths
        for u, v, w in flights:
            if prev[u] + w < curr[v]:
                curr[v] = prev[u] + w
        prev = curr
    return prev[dst] if prev[dst] != INF else -1
Tests
- Standard: 0 → 1 → 2 with K=1 → 200; K=0 → 500.
- No path: disconnected → -1.
- Direct flight only: K=0, no direct → -1.
- K very large (≥ V - 1): equivalent to unrestricted Bellman-Ford.
- Tie: two K-stop paths with same cost — return the cost.
- Stress: V=100 dense, K=99 random — verify against modified Dijkstra.
Follow-up Questions
- “Negative weights? Negative cycles?” → Run V rounds (not K); if round V relaxes any edge, negative cycle reachable from src exists.
- “All-pairs shortest path with negative weights, no cycles.” → Johnson’s algorithm: Bellman-Ford to reweight, then Dijkstra from each source.
- “Currency arbitrage detection.” → Build graph with weight = -log(rate); negative cycle = profitable arbitrage.
- “K is up to 10^9.” → Matrix exponentiation on the (min, +) semiring; O(V³ log K).
- “Online updates: new flights added live.” → Difficult. Restart Bellman-Ford on each query; or maintain incrementally with limited optimizations.
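The negative-cycle follow-up can be sketched as below: V - 1 in-place rounds, then one extra pass that only checks whether any edge still relaxes. The function name and the (u, v, w) edge-tuple shape are assumptions for illustration. Only cycles reachable from src are detected, since unreachable nodes keep an infinite distance and never relax anything.

```python
def has_negative_cycle(n, edges, src):
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # Round V: any edge that still relaxes lies on or after a negative cycle.
    return any(dist[u] + w < dist[v] for u, v, w in edges)

print(has_negative_cycle(3, [(0, 1, 1), (1, 2, -3), (2, 1, 1)], 0))  # True
print(has_negative_cycle(3, [(0, 1, 1), (1, 2, -3), (2, 0, 5)], 0))  # False
```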
Product Extension
Travel-search engines (Kayak, Google Flights, Hipmunk) treat flight networks as graphs and apply variants of K-stop shortest path. The “fewest connections” filter is exactly K-stop. The “cheapest with up to 2 stops” is K=2 Bellman-Ford. Currency-arbitrage bots run Bellman-Ford continuously on FX-rate graphs to detect profit cycles in milliseconds.
Language/Runtime Follow-ups
- Python: prev[:] is O(V) and acceptable. array.array('d', ...) stores floats contiguously, which may help memory locality.
- Java: int[] prev = new int[n]; Arrays.fill(prev, Integer.MAX_VALUE); Watch overflow when summing; use long if weights × edges can overflow int.
- Go: prev := make([]int, n) with manual init to a large constant; copy(curr, prev) for the snapshot. The math.MaxInt32 constant is fine as the sentinel.
- C++: vector<int> prev(n, INT_MAX); Use long long if weights are large. std::copy for the snapshot.
- JS/TS: Array.from({length: n}, () => Infinity) and prev.slice() for copy.
Common Bugs
- In-place updates without the prev/curr split — chains multiple edges per round, gives wrong answers.
- Running K rounds instead of K + 1 — off by one (K stops = K + 1 edges).
- Treating “K stops” as K edges — read carefully; LC 787 means K + 1 edges.
- Forgetting to copy prev to curr at the start of each round, so the round uses a stale curr from the previous iteration.
- Returning prev[dst] without the unreachable check; INF leaks into output.
- Integer overflow on prev[u] + w when prev[u] is set to INT_MAX; guard with the unreachable check first.
- Confusing “K = 0 means direct only” with “K = 0 means src only”.
Debugging Strategy
For a small case, print prev at the end of each round. Verify that after round 1 prev[v] = w(src → v) for direct neighbors only, that after round 2 it also reflects 2-edge paths, and so on. If the answer is wrong, compare with a brute-force enumeration of paths up to K + 1 edges. For negative-cycle problems, verify that round V actually relaxes an edge by tracking a “changed” flag.
Mastery Criteria
- Recognized “shortest path with at most K edges” as Bellman-Ford in <60 seconds.
- Articulated the iteration-as-edge-budget invariant unprompted.
- Wrote Bellman-Ford from blank screen with prev/curr split in <10 minutes.
- Stated O((K + 1) · E) complexity unprompted.
- Solved LC 787 in <20 minutes from cold start.
- Articulated negative-cycle detection in <30 seconds when asked.
- Articulated when Dijkstra is preferable (no edge budget, non-negative weights) in <30 seconds.
Lab 06 — Topological Sort (Alien Dictionary)
Goal
Build a topological sort over an inferred constraint graph — a problem whose graph is not given but must be extracted from the input. After this lab you should be able to recognize “ordering with constraints” problems in <60 seconds, write Kahn’s algorithm from a blank screen in <8 minutes, and identify and handle the three degenerate cases of alien-dictionary parsing (prefix violation, no constraint differs, cyclic dependency).
Background Concepts
A topological sort orders the vertices of a DAG such that every directed edge u → v has u before v. Kahn’s algorithm repeatedly removes a node of in-degree 0, decrementing its neighbors’ in-degrees. If the final order has length V, the graph is a DAG and the order is valid. Otherwise, a cycle exists.
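Kahn’s algorithm as described above, sketched for a graph that is already given as an edge list over nodes 0..num_nodes-1. The function name `kahn_order` and the choice to return None on a cycle are illustrative assumptions.

```python
from collections import deque

def kahn_order(num_nodes, edges):
    """Return a topological order, or None if the graph has a cycle."""
    adj = [[] for _ in range(num_nodes)]
    in_deg = [0] * num_nodes
    for u, v in edges:
        adj[u].append(v)
        in_deg[v] += 1
    queue = deque(i for i in range(num_nodes) if in_deg[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            in_deg[v] -= 1
            if in_deg[v] == 0:
                queue.append(v)
    # Nodes on or behind a cycle never reach in-degree 0, so the order is short.
    return order if len(order) == num_nodes else None

print(kahn_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [0, 1, 2, 3]
print(kahn_order(2, [(0, 1), (1, 0)]))                  # None
```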
For alien dictionary, the input is a list of words known to be sorted lexicographically in some unknown alien alphabet. Each adjacent pair of words gives at most one ordering constraint between two characters: the first position where they differ tells you a < b for those characters. Build the constraint graph, run topological sort, output an order. Handle three degenerate inputs:
- No differing position between adjacent words, and the second is a proper prefix of the first (e.g., ["abc", "ab"]): invalid; return "".
- No differing position because the first word is a prefix of (or equal to) the second (e.g., ["ab", "abc"]): fine; no constraint added.
- Cycle in the constraint graph: invalid; return "".
Interview Context
Alien Dictionary (LC 269) is one of the most-asked Hard graph problems at Meta, Google, and Airbnb. Premium-only on LeetCode but widely leaked. It tests: (1) recognizing topological sort, (2) constraint extraction from non-graph input, (3) handling all three edge cases. Candidates who solve only the happy path lose major points. The senior signal: enumerate the three failure modes upfront, before writing code.
Problem Statement
There is a new alien language using English letters. The order of letters is unknown. Given a list of words sorted lexicographically by the alien language’s rules, return any valid letter ordering. If no valid ordering exists, return "". If multiple are valid, return any.
Constraints
- 1 ≤ |words| ≤ 100
- 1 ≤ |words[i]| ≤ 100
- All words consist of lowercase English letters.
Clarifying Questions
- Is the answer unique? (No — any valid topological order.)
- Does the answer include letters not appearing in any word? (No — only letters that appear.)
- Are duplicate words possible? (Possible; treat normally — they yield no constraint.)
- What does “lexicographically sorted” mean for words of different length? (Standard prefix rule; if A is a prefix of B, then A < B; if B is a prefix of A, the input is invalid.)
- What if words is a single word? (Output any permutation of its unique letters.)
Examples
words = ["wrt","wrf","er","ett","rftt"]
→ "wertf" (one valid ordering)
words = ["z","x"]
→ "zx"
words = ["z","x","z"]
→ "" (z and x must precede each other — cycle)
words = ["abc","ab"]
→ "" (prefix violation — "abc" can't come before "ab")
Initial Brute Force
Enumerate all permutations of the alphabet of size ≤ 26; for each, verify that the input words are in that lexicographic order. O(26!) — infeasible.
Brute Force Complexity
O(26! · ΣL) — astronomical.
Optimization Path
Topological sort needs only O(V + E) time, where V = number of distinct letters (≤ 26) and E = number of pairwise constraints (≤ |words| - 1). Kahn’s algorithm runs in O(V + E). Constraint extraction is O(Σ |words[i]|).
The total is O(V + E + Σ L) — bounded by Σ L since V ≤ 26. Trivially fast.
Final Expected Approach
- Initialize in_degree[c] = 0 for every letter that appears anywhere.
- Build adjacency: for each adjacent pair (w1, w2) in words:
  - Walk both words in parallel; at the first index i where they differ, add edge w1[i] → w2[i] (if not already present); break.
  - If no differing index found and len(w1) > len(w2): prefix violation; return "".
- Run Kahn’s: queue all letters with in_degree[c] == 0; pop, append to order, decrement neighbors.
- If the order length equals the number of distinct letters, return the order; else cycle → return "".
Data Structures Used
- defaultdict(set) for adjacency (set prevents duplicate edges).
- dict[char, int] for in_degree.
- collections.deque for the Kahn queue.
- list[char] for the result.
Correctness Argument
Each adjacent pair contributes at most one constraint — the first differing character. This is sound: if the words are correctly sorted, the relative order of w1[i] and w2[i] (at the first differing index) must be w1[i] < w2[i] in the alien alphabet. No constraint can be inferred from differences after the first; those are consistent with but not implied by the sortedness.
Topological sort over these constraints produces an ordering where every constraint a < b is satisfied (a precedes b in the output). If a cycle exists, the constraints are unsatisfiable and the input is impossible. The prefix-violation case is the only constraint-extraction-time invalid input.
Complexity
| Operation | Time | Space |
|---|---|---|
| Constraint extraction | O(Σ L) | O(unique edges) ≤ O(26²) |
| Kahn’s | O(V + E) | O(V + E) |
| Total | O(Σ L) | O(unique letters + edges) |
Implementation Requirements
from collections import defaultdict, deque

def alienOrder(words):
    adj = defaultdict(set)
    # Every letter that appears anywhere starts with in-degree 0.
    in_deg = {c: 0 for w in words for c in w}
    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i+1]
        found = False
        for j in range(min(len(w1), len(w2))):
            if w1[j] != w2[j]:
                # Only the first differing position yields a constraint.
                if w2[j] not in adj[w1[j]]:
                    adj[w1[j]].add(w2[j])
                    in_deg[w2[j]] += 1
                found = True
                break
        if not found and len(w1) > len(w2):
            return ""  # prefix violation, e.g. ["abc", "ab"]
    queue = deque([c for c, d in in_deg.items() if d == 0])
    order = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for nb in adj[c]:
            in_deg[nb] -= 1
            if in_deg[nb] == 0:
                queue.append(nb)
    # A short order means some letters sit on or behind a cycle.
    return "".join(order) if len(order) == len(in_deg) else ""
Tests
- Standard: ["wrt","wrf","er","ett","rftt"] → some topo of t<f, w<e, r<t, e<r.
- Single word: ["abc"] → some permutation of {a,b,c}.
- Prefix violation: ["abc","ab"] → "".
- Tie/equal words: ["a","a"] → "a".
- Cycle: ["z","x","z"] → "".
- All same length, all letters used: ["aa","ab","cb"].
- Single letter: ["z"] → "z".
- Long words, tiny alphabet: stress for in-degree correctness.
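Because any valid order is accepted, tests cannot simply compare against one expected string. A sketch of a checker, under the assumption that a valid answer must contain exactly the letters that appear and must keep the word list sorted; the name `is_valid_alien_order` is illustrative:

```python
def is_valid_alien_order(words, order):
    letters = {c for w in words for c in w}
    if len(order) != len(letters) or set(order) != letters:
        return False  # missing, extra, or repeated letters
    rank = {c: i for i, c in enumerate(order)}
    for w1, w2 in zip(words, words[1:]):
        for a, b in zip(w1, w2):
            if a != b:
                if rank[a] > rank[b]:
                    return False  # first difference is out of order
                break
        else:
            # No differing position: w1 must not be longer than w2.
            if len(w1) > len(w2):
                return False
    return True

words = ["wrt", "wrf", "er", "ett", "rftt"]
print(is_valid_alien_order(words, "wertf"))  # True
print(is_valid_alien_order(words, "wetrf"))  # False
```

For inputs expected to be invalid, assert that the solution returns "" directly; the checker only applies to non-empty answers.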
Follow-up Questions
- “Find the lexicographically smallest valid order (in standard a-z order).” → Kahn’s with a min-heap instead of a queue.
- “Find all valid orders.” → Backtracking over Kahn’s choices; exponential.
- “Verify a given ordering.” → For each adjacent word pair, scan for the first differing char and check ordering. O(Σ L).
- “Online: words arrive one by one.” → Maintain the adjacency incrementally; rerun Kahn’s lazily on query.
- “What if the input has typos (wrongly-ordered pairs)?” → Return any consistent ordering, or report the conflict edge.
Product Extension
Build systems (Bazel, Make, Gradle) compute build orders via topological sort over the dependency DAG; cycle detection is a critical correctness property. Database query planners use topo sort over join-graph dependencies. Distributed task schedulers (Airflow, Argo) execute DAGs of jobs in topological order.
Language/Runtime Follow-ups
- Python: defaultdict(set) and collections.deque are essential. dict.items() iteration is fine.
- Java: Map<Character, Set<Character>> and Map<Character, Integer> for in-degree; ArrayDeque<Character> for queue. Use int[26] for in-degree if alphabet is fixed.
- Go: map[byte]map[byte]bool for adjacency; map[byte]int for in-degree; slice as queue. Or [26]int for in-degree as alphabet is fixed.
- C++: unordered_map<char, unordered_set<char>>; array<int, 26> for in-degree; queue<char>.
- JS/TS: Map<string, Set<string>> and Map<string, number>; array-as-queue with care (shift is O(N)).
Common Bugs
- Adding duplicate edges to in-degree: use a set for adjacency, check membership before incrementing.
- Missing the prefix violation check: ["abc","ab"] returns "abc" if you don’t handle this.
- Building in-degree only for letters that have outgoing edges, missing letters that only appear as targets.
- Initializing in_degree only for the first word’s letters: letters appearing only in later words get missed.
- Comparing len(order) == 26 instead of == len(in_deg) (only used letters count).
- Using a list instead of a set for adjacency, then double-incrementing in-degree.
- Returning the order in the wrong direction (Kahn’s gives the right direction; DFS post-order needs to be reversed).
Debugging Strategy
Print the adjacency and in-degree maps after constraint extraction. Verify each constraint is justified by tracing back to the input pair. For cycles, print the in-degree map at the point Kahn’s stalls — the remaining-positive in-degrees identify nodes in the cycle. For prefix-violation false negatives, print the pair (w1, w2) at each iteration to confirm the check fires.
Mastery Criteria
- Recognized “ordering with constraints” as topological sort in <60 seconds.
- Wrote constraint extraction from word pairs from cold start in <5 minutes.
- Wrote Kahn’s algorithm from blank screen in <6 minutes.
- Enumerated the three degenerate cases (cycle, prefix violation, equal-prefix-shorter-second) before coding.
- Solved LC 269 in <25 minutes from cold start.
- Solved LC 207 (Course Schedule) in <8 minutes by extracting the constraint structure from cold.
- Articulated the white-path lemma / DFS-post-order alternative in <60 seconds.
Lab 07 — Union-Find Applications (Accounts Merge)
Goal
Implement a disjoint-set union (DSU) with path compression and union by rank, then apply it to a real merge problem where the “elements” are emails and the “groups” are accounts. After this lab you should be able to write DSU from a blank screen in <6 minutes, recognize the merge-by-shared-attribute signal in <60 seconds, and articulate when DSU beats DFS for connectivity (online updates, no spatial structure, simple connectivity-only queries).
Background Concepts
A disjoint-set union (DSU, aka union-find) maintains a partition of N elements under two operations:
- find(x): return the representative (“root”) of x’s set.
- union(x, y): merge the sets containing x and y.
With path compression (find rewrites every visited node to point directly at the root) and union by rank/size (always attach the smaller tree under the larger), both operations run in O(α(N)) amortized, where α is the inverse Ackermann function — effectively constant for any practical N.
DSU is the natural choice when you receive a stream of “merge x and y” operations and need to answer “are x and y in the same group” — and you don’t care about paths between them, only connectivity.
Interview Context
Accounts Merge (LC 721, rated Medium on LeetCode) is a frequently asked question at Amazon and Google. Number of Provinces (LC 547) is the easier sibling at Meta. The trap: candidates default to BFS/DFS on the implicit graph (emails as nodes, “shared email between two accounts” as edges), which works but is messier code than DSU. The senior signal is recognizing the partition structure and reaching for DSU within 90 seconds.
Problem Statement
Given a list of accounts, where accounts[i] = [name, email1, email2, ...], two accounts belong to the same person if they share any common email (names alone are not enough — multiple people can share a name). Merge accounts: return a list where each element is [name, ...sorted unique emails], accounts in any order.
Constraints
- 1 ≤ |accounts| ≤ 1000
- 2 ≤ |accounts[i]| ≤ 10 (one name + 1..9 emails)
- 1 ≤ |email| ≤ 30
- Emails are lowercase and contain @.
Clarifying Questions
- Are emails case-sensitive? (Per LC: lowercase already.)
- Two accounts with the same name but no shared email — are they merged? (No — names don’t merge.)
- Should the output emails be sorted within each account? (Yes — alphabetically.)
- Order of accounts in output? (Any order is accepted.)
- Total emails: ≤ 1000 × 9 = 9000 — DSU on emails is fine.
Examples
accounts = [
  ["John","johnsmith@mail.com","john_newyork@mail.com"],
  ["John","johnsmith@mail.com","john00@mail.com"],
  ["Mary","mary@mail.com"],
  ["John","johnnybravo@mail.com"]
]
→ [["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],
   ["Mary","mary@mail.com"],
   ["John","johnnybravo@mail.com"]]
Initial Brute Force
Build implicit graph: each email is a node; for each account, connect all its emails to the first email of that account; run DFS/BFS to enumerate connected components; emit each component with the corresponding name. Works, but DSU is cleaner.
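The DFS baseline described above can be sketched as follows, so that it can be compared against the DSU version. This is an illustrative sketch under stated assumptions: the function name accounts_merge_dfs and the sample addresses in the usage note are hypothetical, and an explicit stack replaces recursion to stay clear of Python's recursion limit.

```python
from collections import defaultdict


def accounts_merge_dfs(accounts):
    """Connected components over emails; each account links its emails to its first email."""
    graph = defaultdict(set)
    email_to_name = {}
    for name, *emails in accounts:
        for email in emails:
            graph[emails[0]].add(email)  # star edges: first email <-> every email
            graph[email].add(emails[0])
            email_to_name[email] = name

    seen = set()
    merged = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative DFS over one component
            cur = stack.pop()
            component.append(cur)
            for nb in graph[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        merged.append([email_to_name[start]] + sorted(component))
    return merged
```

For example, accounts_merge_dfs([["John", "a@x.com", "b@x.com"], ["John", "a@x.com", "c@x.com"], ["Mary", "m@x.com"]]) yields one merged John account with three emails plus Mary's account. Note the extra adjacency structure that the DSU version does not need.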
Brute Force Complexity
O(Σ |emails| · α) with DSU; O(Σ |emails|) with DFS — both linear in total email count. The DFS version requires building an adjacency list explicitly, which DSU skips.
Optimization Path
DSU directly:
- Treat each unique email as a DSU element.
- For each account, union all its emails to the first email.
- After all unions, group emails by find(email) root.
- For each group, attach the name (looked up via any email in the group → its account → the account’s name).
- Sort emails within each group; output.
This is the cleanest expression. No explicit graph construction needed.
Final Expected Approach
from collections import defaultdict

def accounts_merge(accounts):
    # Compact sketch: path compression only, no union by rank.
    parent = {}

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])  # path compression
        return parent[x]

    def union(x, y):
        parent[find(x)] = find(y)

    email_to_name = {}
    for account in accounts:
        name = account[0]
        for email in account[1:]:
            if email not in parent:
                parent[email] = email  # register before any find/union
            email_to_name[email] = name
            union(account[1], email)

    groups = defaultdict(list)
    for email in parent:
        groups[find(email)].append(email)
    return [[email_to_name[g[0]]] + sorted(g) for g in groups.values()]
Data Structures Used
- dict[str, str] for parent (DSU).
- dict[str, str] for email_to_name.
- defaultdict(list) for grouping by root.
Correctness Argument
DSU correctness: Initially every element is its own set. Each union merges two sets. find returns a canonical representative. After path compression, find(x) == find(y) iff they were ever transitively unioned. Path compression and union by rank/size preserve this invariant and amortize each op to α(N).
Reduction correctness: Two emails belong to the same person iff there is a chain of accounts where consecutive accounts share an email. The unions on each account’s emails form precisely these chains; the resulting partition matches the equivalence-class definition.
Output correctness: Each component’s name is unambiguous because (a) every account contributing to the component has the same name as the others in that component (otherwise they’d be different people, and the input is well-formed by problem statement), and (b) any email in the group recovers the name via email_to_name.
Complexity
| Operation | Time | Space |
|---|---|---|
| Building DSU | O(E · α(E)), where E = total emails | O(E) |
| Grouping + sort | O(E log E) for sorting within each group | O(E) |
| Total | O(E log E) | O(E) |
Implementation Requirements
from collections import defaultdict

class DSU:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

def accountsMerge(accounts):
    dsu = DSU()
    email_to_name = {}
    for account in accounts:
        name = account[0]
        first = account[1]
        for email in account[1:]:
            dsu.add(email)
            email_to_name[email] = name
            dsu.union(first, email)
    groups = defaultdict(list)
    for email in dsu.parent:
        groups[dsu.find(email)].append(email)
    return [[email_to_name[g[0]]] + sorted(g) for g in groups.values()]
Tests
- Standard: 3-account merge → 1 merged + 2 separate.
- All accounts disjoint → each emerges separately.
- All accounts share one email → all merge into one.
- Single account → unchanged.
- Same name, different emails → separate accounts.
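- Empty emails (problem disallows but defend): with the implementation above, first = account[1] raises IndexError on a name-only account. Either guard with a length check and skip the account, or decide explicitly how a no-email account should appear in the output. Verify behavior.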
Follow-up Questions
- “Number of Provinces (LC 547): given an N × N adjacency matrix, count groups.” → DSU over N nodes; union if M[i][j] == 1; count distinct roots.
- “Online: accounts arrive in a stream.” → DSU handles this natively; just keep adding and unioning.
- “What if names matter (same name + shared email merges, different name doesn’t)?” → Keep DSU but check name-compatibility before union; conflict means error or skip.
- “What if you need to remove an account?” → DSU doesn’t support remove. Use Link-Cut Trees or rebuild from scratch.
- “What if path compression isn’t allowed (read-only find)?” → Use union by rank only; O(log N) per op instead of α.
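The Number of Provinces follow-up reduces to counting how many unions actually merge two sets. A minimal sketch, assuming the LC 547 input shape (a symmetric 0/1 matrix); the name find_circle_num and the inlined array-based DSU are illustrative choices rather than a prescribed template:

```python
def find_circle_num(is_connected):
    """Count connected components given an N x N adjacency matrix."""
    n = len(is_connected)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n  # every node starts as its own province
    for i in range(n):
        for j in range(i + 1, n):  # matrix is symmetric: upper triangle suffices
            if is_connected[i][j] == 1:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1  # one successful merge removes one component
    return components
```

Tracking the component count during unions avoids a second pass to collect distinct roots.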
Product Extension
Identity-resolution at LinkedIn, Salesforce, and ad networks merges user records by shared email/phone using DSU. Image segmentation libraries (OpenCV’s connectedComponents) use DSU under the hood. Distributed-system membership-protocols use DSU-like merges to track partition healing. Kruskal’s MST (Lab 08) uses DSU as its core data structure.
Language/Runtime Follow-ups
- Python: recursive find can hit the default recursion limit (1000 frames) on long parent chains, which arise when union by rank is omitted; use an iterative two-pass find (find root, then compress).
- Java: int[] parent for integer keys is significantly faster than HashMap<Integer, Integer>. Use iterative find.
- Go: parent := make(map[string]string) for string keys, or []int for integer indices.
- C++: vector<int> parent(N); iota(parent.begin(), parent.end(), 0); is the clean pattern. Iterative find.
- JS/TS: Map<string, string> is fine; use iterative find to avoid call-stack issues at large N.
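The iterative two-pass find mentioned for Python might look like the sketch below. The standalone function find_iterative is a hypothetical helper that takes the parent mapping explicitly; it works with either a list (integer keys) or a dict (string keys such as emails).

```python
def find_iterative(parent, x):
    """Two-pass find: locate the root, then point every node on the path at it."""
    # Pass 1: walk up to the root without modifying anything.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Pass 2: walk the same path again, rewriting each parent to the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root
```

Because no call stack is involved, a degenerate chain of several thousand nodes is handled without touching sys.setrecursionlimit.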
Common Bugs
- Recursion limit in Python’s find: the default limit is 1000 frames, so a worst-case chain (no union by rank) of a few thousand nodes raises RecursionError. Use iterative find or raise the limit.
- Forgetting path compression: without union by rank as well, find degrades to O(N) per call. Functionally correct, but TLE.
- Union without rank: same TLE risk on adversarial inputs.
- Checking “is root” with find(x) == x instead of parent[x] == x: both are true exactly when x is a root, but parent[x] == x is O(1) and triggers no compression work; prefer it.
- Forgetting to add an email to parent before union (union calls find, which dereferences parent[email]): KeyError.
- Mapping email_to_name per-account but overwriting: last write wins; usually fine here, but be deliberate.
- Not deduplicating emails within an account (LC inputs may not, but the algorithm is robust either way).
Debugging Strategy
Print parent and rank after each union. For wrong groupings, trace which email failed to union with which other and which account broke the chain. For TLE, verify both path compression and union by rank are present; profile to confirm find dominates.
Mastery Criteria
- Recognized “merge by shared attribute” as DSU in <60 seconds.
- Wrote DSU with path compression and union by rank from blank screen in <6 minutes.
- Stated O(α) amortized complexity unprompted.
- Articulated when DFS is a valid alternative (offline, no online updates) and when DSU is mandatory (online stream of merges).
- Solved LC 547 (Number of Provinces) in <8 minutes.
- Solved LC 721 (Accounts Merge) in <20 minutes from cold start.
- Solved LC 305 (Number of Islands II) in <20 minutes — the canonical online-DSU problem.
- Articulated path compression’s effect on amortization in <60 seconds.
Lab 08 — MST via Kruskal (Min Cost to Connect All Points)
Goal
Build a minimum spanning tree (MST) on a complete graph derived from N points using Kruskal’s algorithm. After this lab you should be able to recognize the MST signal in <60 seconds, write Kruskal from a blank screen (sort + DSU + early-exit) in <8 minutes, and reason about when Kruskal beats Prim (sparse graphs, edge list already given) and vice versa (dense graphs, adjacency matrix).
Background Concepts
A spanning tree of a connected graph G is a subgraph that includes all V vertices and exactly V - 1 edges with no cycles. The MST is the spanning tree with minimum total edge weight. Two canonical algorithms:
- Kruskal’s: Sort all edges by weight ascending. Iterate; for each edge, union the endpoints if they’re in different components (use DSU); add to MST. Stop after V - 1 edges. Time O(E log E).
- Prim’s: Start from any vertex; maintain a min-heap of crossing edges; repeatedly extract the lightest edge to a new vertex. Time O(E log V) with a binary heap; O(E + V log V) with a Fibonacci heap.
For “connect all points” with edge weights = pairwise Manhattan distance, the graph is complete: E = V·(V-1)/2 ≈ V²/2. At V = 1000, E ≈ 5 × 10^5 edges. Either algorithm works; Kruskal with DSU is the cleanest expression because the edge list is trivial to generate from the points.
Interview Context
Min Cost to Connect All Points (LC 1584) is a Medium asked at Amazon, Bloomberg, and Salesforce. It’s a clean MST signal: “minimum total cost to make everything connected.” The senior signal is naming the problem ("this is MST on a complete graph") within 60 seconds, then choosing Kruskal vs Prim consciously based on density. Strong candidates also state the Cut Property as the correctness foundation.
Problem Statement
Given an array points where points[i] = [xi, yi] represents a point in 2D, the cost of connecting two points is the Manhattan distance between them: |xi - xj| + |yi - yj|. Return the minimum cost to connect all points, where any two points are connected if there is a path between them.
Constraints
- 1 ≤ |points| ≤ 1000
- −10^6 ≤ xi, yi ≤ 10^6
- All points are distinct.
Clarifying Questions
- Manhattan, Euclidean, or other metric? (Manhattan, per problem.)
- Are diagonal connections counted? (Implicitly yes — we connect any two points directly.)
- Are points distinct? (Yes, per constraint.)
- Single point — cost? (0, no edges needed.)
- Should the answer fit in 32-bit? (Max cost ≈ 999 · 4 × 10^6 ≈ 4 × 10^9 — use 64-bit just in case, though Python int is unbounded.)
Examples
points = [[0,0],[2,2],[3,10],[5,2],[7,0]]
→ 20
points = [[3,12],[-2,5],[-4,1]]
→ 18
points = [[0,0]]
→ 0
Initial Brute Force
Enumerate all spanning trees and pick the one with minimum total weight. There are exponentially many. Infeasible.
A second “brute force” is Prim’s via array-scan (no heap): O(V²). At V = 1000: 10^6 ops — passes easily and is simpler than the heap version. This is actually a competitive option for this problem.
Brute Force Complexity
Spanning-tree enumeration: O(V^(V-2)) by Cayley’s formula. Infeasible. Array-scan Prim: O(V²). At V = 1000: 10^6 ops, well within budget.
Optimization Path
For dense graphs (E ~ V²), array-scan Prim is O(V²) — wins over Kruskal’s O(E log E) = O(V² log V). For sparse graphs, Kruskal or heap-Prim wins.
For LC 1584 specifically (V = 1000, dense), all three pass:
- Kruskal: O(V² log V²) = O(V² log V) = ~10^7 ops; passes in 1-2s.
- Heap-Prim: O(V² log V); same.
- Array-scan Prim: O(V²); fastest.
In interviews, Kruskal is the “safer” choice because the code is mechanical: edges → sort → DSU → loop. Show you can choose array-scan Prim when asked about dense graphs.
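For comparison with Kruskal, the array-scan Prim option discussed above can be sketched as follows. The function name min_cost_connect_points_prim is illustrative; the sketch assumes at least one point, as the constraints guarantee, and computes edge weights on the fly so no O(V²) edge list is stored.

```python
def min_cost_connect_points_prim(points):
    """Array-scan Prim: O(V^2) time, O(V) space, no heap and no edge list."""
    n = len(points)
    dist = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    in_tree = [False] * n
    dist[0] = 0                # start from vertex 0 at zero cost
    total = 0
    for _ in range(n):
        # Linear scan for the closest vertex not yet in the tree.
        u = -1
        for i in range(n):
            if not in_tree[i] and (u == -1 or dist[i] < dist[u]):
                u = i
        in_tree[u] = True
        total += dist[u]
        # Relax distances through the newly added vertex.
        ux, uy = points[u]
        for v in range(n):
            if not in_tree[v]:
                d = abs(ux - points[v][0]) + abs(uy - points[v][1])
                if d < dist[v]:
                    dist[v] = d
    return total
```

This version also doubles as the reference implementation for stress-testing a Kruskal solution on random inputs.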
Final Expected Approach (Kruskal)
- Generate all V·(V-1)/2 edges with weight = Manhattan distance.
- Sort by weight ascending.
- Initialize DSU with V components.
- Iterate edges; if endpoints differ, union and add weight to total; count edges added.
- Stop when edges added == V - 1 (early exit).
Data Structures Used
- List of edges as (weight, u, v) tuples.
- DSU as in Lab 07 (path compression + union by rank).
- Integer accumulator for total cost.
Correctness Argument
Cut Property: For any cut (partition of vertices into two non-empty sets), the minimum-weight edge crossing the cut belongs to some MST. Kruskal greedily picks the lightest edge that doesn’t create a cycle (i.e., the lightest edge crossing some cut between two components); by the Cut Property, this edge is safe — there is an MST containing it. Repeating this V - 1 times produces an MST.
No-cycle invariant: DSU’s find ensures we add an edge only when its endpoints are in different components. Since adding an edge between same-component endpoints creates a cycle, this is exactly the cycle-prevention check.
Termination: Each union reduces component count by 1; after V - 1 unions, the graph is connected. We stop early.
Complexity
| Operation | Time | Space |
|---|---|---|
| Edge generation | O(V²) | O(V²) |
| Sort | O(V² log V) | (in-place possible) |
| DSU loop | O(V² · α) | O(V) |
| Total | O(V² log V) | O(V²) |
Implementation Requirements
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression by halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

def minCostConnectPoints(points):
    n = len(points)
    edges = []
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            edges.append((abs(xi - xj) + abs(yi - yj), i, j))
    edges.sort()
    dsu = DSU(n)
    total, count = 0, 0
    for w, u, v in edges:
        if dsu.union(u, v):
            total += w
            count += 1
            if count == n - 1:
                break
    return total
Tests
- Standard: 5 points → 20.
- Single point: 0.
- Two points: their Manhattan distance.
- Collinear points (all on x-axis): MST weight = max(x) - min(x).
- Stress: V = 1000 random — verify Kruskal and Prim agree.
- Adversarial: points on a grid — many edges of equal weight (test tie-breaking is stable).
Follow-up Questions
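- “Connecting Cities With Minimum Cost (LC 1135)” → Same MST template; every MST has the same total cost, so tie-breaking between equal-weight edges does not affect the answer; return -1 if the graph is disconnected (fewer than V - 1 edges added).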
- “Optimize Water Distribution in a Village (LC 1168)” → Add a virtual node 0 connected to each house with the well-cost; run MST on V + 1 nodes.
- “Critical Connections (bridges)” → Different problem (Tarjan’s bridge algorithm), see Phase README.
- “Maximum Spanning Tree.” → Sort descending; same algorithm.
- “Online edge insertion: maintain MST.” → Link-Cut Trees; advanced.
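The virtual-node trick from the water-distribution follow-up can be sketched as below. It assumes the usual LC 1168 input shape (houses numbered 1..n, wells[i] as the cost of a well at house i + 1, pipes as [house_a, house_b, cost] triples); the function name and the inlined DSU are illustrative.

```python
def min_cost_to_supply_water(n, wells, pipes):
    """Kruskal on n + 1 nodes: node 0 is a virtual water source."""
    # A well at house h becomes an edge from the virtual node 0 to h.
    edges = [(cost, 0, house) for house, cost in enumerate(wells, start=1)]
    edges += [(cost, a, b) for a, b, cost in pipes]
    edges.sort()

    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = used = 0
    for cost, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += cost
            used += 1
            if used == n:  # n + 1 nodes need exactly n edges
                break
    return total
```

Once wells are ordinary edges, the choice between digging a well and laying a pipe is resolved by the same greedy edge selection as any other MST.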
Product Extension
Network design (laying fiber, planning power grids), clustering algorithms (single-linkage clustering = MST followed by cut-largest-edges), image segmentation, and approximation algorithms for TSP all use MST as a primitive. AWS / Azure data-center backbone planning uses MST variants weighted by latency × cost.
Language/Runtime Follow-ups
- Python: edges.sort() on tuples sorts lexicographically, so (weight, u, v) works. heapq is overkill since we need all edges sorted upfront, not on-demand.
- Java: Arrays.sort(int[][]) with a comparator on the weight column. Use int[] triples for cache locality.
- Go: sort.Slice(edges, func(i, j int) bool { return edges[i].w < edges[j].w }).
- C++: vector<tuple<int,int,int>> with default < ordering; sort(edges.begin(), edges.end()).
- JS/TS: edges.sort((a, b) => a[0] - b[0]). Avoid returning the boolean a[0] < b[0] from the comparator (subtle bug).
Common Bugs
- Forgetting the early exit when edges added = V - 1: works but processes more edges than needed.
- Generating duplicate edges (i.e., (i, j) and (j, i)): wastes time but doesn’t break correctness.
- Off-by-one in V - 1: counting edges incorrectly leads to incomplete tree.
- Comparator returns boolean in JS: use subtraction.
- Integer overflow on the total: each Manhattan distance is bounded by 4 × 10^6, but the sum is bounded by ~4 × 10^9, which overflows a signed 32-bit integer and fits in 64-bit. In Python no issue; in Java use long.
- DSU’s union returns nothing vs returns success boolean: pick a consistent API.
- Returning the count of edges instead of total weight: silent.
Debugging Strategy
For small inputs, print the sorted edge list and the DSU state after each union. Verify the total edges added is exactly V - 1. If the result is too large, check that you’re not adding edges within the same component (the cycle check might be skipped). Compare against a Prim’s reference for stress tests.
Mastery Criteria
- Recognized “minimum cost to connect” as MST in <60 seconds.
- Wrote Kruskal from blank screen in <8 minutes.
- Chose Kruskal vs Prim consciously based on density when asked.
- Stated the Cut Property as the correctness foundation in <30 seconds.
- Solved LC 1584 in <20 minutes from cold start.
- Solved LC 1135 (Connecting Cities) in <12 minutes by extending the template.
- Solved LC 1168 (Water Distribution) in <15 minutes with the virtual-node trick.
- Stated the V² Prim option for dense graphs in <30 seconds.
Lab 09 — Graph Modeling (Bus Routes)
Goal
Practice the modeling skill that separates competent graph candidates from L5+ candidates: given a problem with no obvious graph, invent the right node and edge definition. After this lab you should be able to enumerate at least two valid graph models for a given problem, choose the one with the smallest state space, and justify the choice in <90 seconds.
Background Concepts
The hardest interview graph questions don’t say “graph.” Examples:
- Bus Routes (LC 815): “fewest buses to take” — model: nodes = bus routes (not stops!); edges = “two routes share a stop”; multi-source BFS from all routes containing the source stop.
- Word Ladder (LC 127): nodes = words; edges = one-character difference. (See Lab 01.)
- Open the Lock (LC 752): nodes = 4-digit states; edges = single-digit ±1; BFS.
- Sliding Puzzle (LC 773): nodes = board states (encode as string); edges = legal swap; BFS.
The modeling decision space is:
- What is a node? — Often the natural object (stop, word, board state) is wrong; a higher-order or quotient object yields a smaller state space.
- What is an edge? — Direct adjacency (one move), or “shared resource” (two routes share a stop).
- Is it weighted? — If yes → Dijkstra. If unweighted → BFS.
- Sources/targets? — Single, multiple, or “any-of-set” (multi-source BFS).
- Implicit vs explicit? — Build the adjacency upfront or compute on the fly.
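As one concrete instance of these decisions, Open the Lock can be modeled with nodes = 4-digit strings, unweighted edges, a single source, and an implicit graph whose neighbors are generated on the fly. The sketch below assumes the LC 752 conventions (start at "0000", deadends are forbidden states, return -1 if unreachable); the function name open_lock is illustrative.

```python
from collections import deque


def open_lock(deadends, target):
    """BFS over an implicit graph of 10^4 lock states."""
    dead = set(deadends)
    if "0000" in dead:
        return -1
    if target == "0000":
        return 0
    seen = {"0000"}
    queue = deque([("0000", 0)])
    while queue:
        state, d = queue.popleft()
        # Implicit edges: turn each of the 4 wheels one step up or down.
        for i in range(4):
            digit = int(state[i])
            for step in (1, -1):
                nxt = state[:i] + str((digit + step) % 10) + state[i + 1:]
                if nxt in seen or nxt in dead:
                    continue
                if nxt == target:
                    return d + 1
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return -1
```

No adjacency structure is ever built; the 8 neighbors of a state are computed when the state is popped, which is the usual choice when the state space is large but each node has few edges.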
The Bus Routes trap: candidates model nodes = stops, edges = “consecutive stops on a route.” Then BFS distance counts stops traveled, not routes taken, and the answer is wrong. (Connecting every pair of stops on the same route fixes the distance but produces a quadratic number of edges.) Modeling nodes = routes makes the BFS distance count routes, which is the answer up to a ±1 convention for the starting layer.
Interview Context
Bus Routes (LC 815) is asked at Google, Meta, and Amazon at L5+. The question itself is mid-level once modeled correctly; the difficulty is the modeling. Interviewers probe modeling explicitly: “tell me how you’d represent this as a graph” — this is the test. If you flounder for 5 minutes, you’re done. The senior signal: state both modelings (stops vs routes), explain why the route-model gives the right BFS distance, then code.
Problem Statement
You are given an array routes where routes[i] is a bus route that the ith bus repeats forever. For example, routes[0] = [1, 5, 7] means the 0th bus travels in the sequence 1 → 5 → 7 → 1 → 5 → 7 → … forever. You start at source and want to reach target. You can travel between stops by buses only. Return the fewest number of buses required, or -1 if impossible.
Constraints
- 1 ≤ |routes| ≤ 500
- 1 ≤ |routes[i]| ≤ 10^5
- Σ |routes[i]| ≤ 10^5
- 0 ≤ routes[i][j] < 10^6
- 0 ≤ source, target < 10^6
Clarifying Questions
- Do source == target cases return 0? (Yes; no bus needed.)
- Is source guaranteed to be on some route? (Not necessarily; return -1 if not.)
- Are routes circular as stated? (Yes, but that doesn’t matter for “buses taken”; what matters is which stops a route covers.)
- Are stop numbers unique within a route? (Per LC, yes — but treat defensively.)
- Can two routes share a stop? (Yes — that’s exactly how transfers happen.)
Examples
routes = [[1,2,7],[3,6,7]], source = 1, target = 6
→ 2 (take bus 0 from 1 to 7, then bus 1 from 7 to 6)
routes = [[7,12],[4,5,15],[6],[15,19],[9,12,13]], source = 15, target = 12
→ -1
Initial Brute Force
BFS over stops as nodes and edges between any two stops sharing a route. Build adjacency: for each route, add all (stop, stop’) pairs as edges. At Σ |routes| = 10^5, with a 10^5-stop route, that’s 5 × 10^9 edges — TLE / OOM.
Brute Force Complexity
O((Σ L)²) edges in the worst case. Infeasible at Σ L = 10^5.
Optimization Path
Switch the modeling: nodes = routes, edges = “two routes share a stop.” Build an index mapping each stop → the list of routes containing it, in O(Σ L). Every pair of routes sharing a stop is an edge, but we never enumerate those pairs explicitly. Instead, in BFS, when we visit route r, we expand to every other route sharing any of r’s stops, looked up via the index.
To avoid revisiting, mark routes (not stops) as visited. Also mark stops as visited (after expanding all routes through that stop) to avoid the O(L²) blowup of re-expanding a popular stop.
Final BFS distance = number of routes used. Answer is the BFS layer at which we find any route containing target.
Final Expected Approach
- If source == target, return 0.
- Build stop_to_routes: dict of stop → set of route indices.
- BFS over routes. Initialize queue with all routes containing source, distance = 1.
- For each popped (route, d), scan all stops in routes[route]. If any is target, return d.
- Mark each stop visited (skip if already). For each unvisited stop, enqueue all unvisited routes containing it, distance d + 1.
- If queue empties without finding target, return -1.
Data Structures Used
- defaultdict(set) for stop_to_routes.
- collections.deque for BFS queue.
- set for visited routes and visited stops.
Correctness Argument
Distance interpretation: Initializing the queue with routes containing source at distance 1 means: distance d = “number of routes taken so far, including the current one.” When a route at distance d covers target, the answer is d.
No double-counting: Marking stops visited prevents the O(L²) blowup; marking routes visited prevents re-expansion. Both are needed: a stop is visited once we’ve added all its routes; a route is visited once we’ve expanded its stops.
BFS optimality: Standard BFS argument on the route-graph — first time a route is popped, its distance is minimum. Therefore the first time a route containing target is popped, the answer is its distance.
Complexity
| Operation | Time | Space |
|---|---|---|
| Index build | O(Σ L) | O(Σ L) |
| BFS | O(Σ L + R²) where R = routes count | O(Σ L) |
| Total | O(Σ L + R²) | O(Σ L) |
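(The R² term is a loose bound from the number of route pairs that can share a stop. With both visited sets, each route’s stop list and each stop’s route list is scanned once, so the BFS itself does O(Σ L) work.)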
Implementation Requirements
from collections import defaultdict, deque

def numBusesToDestination(routes, source, target):
    if source == target:
        return 0
    stop_to_routes = defaultdict(set)
    for i, r in enumerate(routes):
        for s in r:
            stop_to_routes[s].add(i)
    if source not in stop_to_routes:
        return -1
    visited_routes = set()
    visited_stops = {source}
    queue = deque()
    for r in stop_to_routes[source]:
        queue.append((r, 1))
        visited_routes.add(r)
    while queue:
        route, d = queue.popleft()
        for stop in routes[route]:
            if stop == target:
                return d
            if stop in visited_stops:
                continue
            visited_stops.add(stop)
            for nr in stop_to_routes[stop]:
                if nr not in visited_routes:
                    visited_routes.add(nr)
                    queue.append((nr, d + 1))
    return -1
Tests
- Standard: [[1,2,7],[3,6,7]], src=1, tgt=6 → 2.
- src == tgt: → 0.
- src not on any route: → -1.
- Single route covering both: → 1.
- Disconnected routes: → -1.
- Long route (10^5 stops, 1 route): src and tgt on it → 1; tgt not on it → -1.
- Many routes sharing one hub: BFS expansion through hub.
Follow-up Questions
- “What if buses have different costs?” → Dijkstra over routes with edge weight = cost.
- “What’s the actual sequence of buses taken?” → Maintain parent pointers in BFS; reconstruct.
- “What if you can walk between adjacent stops?” → Add walking edges; might or might not change the model.
- “Multi-source: any of K starting stops to any of M targets.” → Multi-source BFS on the route-graph.
- “Online: routes added/removed live.” → Recompute on each query, or use dynamic-graph techniques.
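The “actual sequence of buses” follow-up needs only one extra structure: a parent pointer per route, which can also serve as the visited-routes set. A minimal sketch, where the function name bus_sequence and the choice to return None when no journey exists are assumptions made for illustration:

```python
from collections import defaultdict, deque


def bus_sequence(routes, source, target):
    """Return one shortest list of route indices from source to target, or None."""
    if source == target:
        return []
    stop_to_routes = defaultdict(list)
    for i, r in enumerate(routes):
        for s in r:
            stop_to_routes[s].append(i)

    parent = {}  # route -> route we transferred from (None for a first bus)
    visited_stops = {source}
    queue = deque()
    for r in stop_to_routes.get(source, []):
        if r not in parent:
            parent[r] = None
            queue.append(r)

    while queue:
        route = queue.popleft()
        for stop in routes[route]:
            if stop == target:
                # Walk parent pointers back to a first bus, then reverse.
                path = []
                while route is not None:
                    path.append(route)
                    route = parent[route]
                return path[::-1]
            if stop in visited_stops:
                continue
            visited_stops.add(stop)
            for nr in stop_to_routes[stop]:
                if nr not in parent:
                    parent[nr] = route
                    queue.append(nr)
    return None
```

The length of the returned list equals the fewest-buses answer, so the distance no longer has to be stored in the queue.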
Product Extension
Real transit-routing systems (Google Maps, Citymapper, Apple Maps) model transit as a multi-modal graph: nodes are stops, walking edges connect nearby stops, transit edges represent “ride a route between two of its stops.” The route-as-node model used here is a simplification useful for “fewest transfers” queries. Production systems combine with time-dependent shortest-path (CSA, RAPTOR) for actual journey planning.
Language/Runtime Follow-ups
- Python: defaultdict(set) and deque. The inner loop scans routes[route] once per route popped; bound this by visiting stops only once.
- Java: HashMap<Integer, Set<Integer>> for the index; ArrayDeque<int[]> for (route, distance).
- Go: map[int]map[int]bool or map[int][]int. Slice-as-queue.
- C++: unordered_map<int, vector<int>>. queue<pair<int,int>>. Reserve capacity if Σ L is known.
- JS/TS: Map<number, Set<number>> and array-as-queue (use a deque polyfill if N large).
Common Bugs
- Modeling stops as nodes — leads to O((Σ L)²) edge count and TLE.
- Returning d instead of d - 1 (or vice versa) — make sure the distance semantics match: d = number of buses taken.
- Not marking the source stop as visited — re-expanding routes through it.
- Not marking routes as visited — exponential queue blowup.
- Returning -1 when source == target instead of 0.
- Forgetting that source not in stop_to_routes is a -1 case.
- Stack overflow in BFS (no: BFS is iterative; this is a DFS-only problem).
Debugging Strategy
For a 3-route example, print the stop_to_routes index. Walk through BFS by hand: queue contents, visited sets after each pop. Verify d increases exactly once per layer of routes. If TLE, profile to confirm visited-stop marking is preventing repeated route expansion. If wrong answer, check whether you’re returning d or d - 1.
Mastery Criteria
- Recognized “fewest buses” as a graph problem in <30 seconds.
- Enumerated both stop-as-node and route-as-node models in <90 seconds.
- Articulated why route-as-node yields the correct distance unprompted.
- Wrote BFS with both visited-routes and visited-stops sets in <12 minutes.
- Stated O(Σ L + R²) complexity unprompted.
- Solved LC 815 in <30 minutes from cold start.
- Solved LC 752 (Open the Lock) in <20 minutes by encoding state as a string.
- Solved LC 773 (Sliding Puzzle) in <30 minutes by encoding the 2D board as a string state.
- When given a new “no obvious graph” problem, can produce a correct model in <3 minutes.
Phase 5 — Dynamic Programming (Basic → Extreme)
Target level: Medium → Very Hard
Expected duration: 4 weeks (12-week track) / 5 weeks (6-month track) / 6 weeks (12-month track)
Weekly cadence: ~6 DP topics per week + 30–60 problems applying them under the framework
Why Dynamic Programming Is The Single Hardest Pattern Family In Coding Interviews
Phase 4 taught you that one-in-three Medium-Hard interview problems is a graph problem in disguise. The other big share is dynamic programming. DP shows up in roughly one in four Medium-Hard rounds at top-tier companies, and the share rises further in staff/principal and quant interviews where exact-counting and optimization questions dominate. More importantly, DP is the topic where the gap between candidates who have a framework and those who don’t is widest. A candidate without a DP framework freezes on dp[i][j] = ?; a candidate with one writes the recurrence in 90 seconds and spends the remaining time on edge cases.
The empirical claim that drives this entire phase:
The hard part of DP is not the code. The hard part is deriving the state. Almost every wrong DP solution is wrong because the state is wrong — too small to capture all the information needed, or so large that the table doesn’t fit in memory or time. Once the state is right, the transition writes itself, the base cases follow from the state, and the code is mechanical.
DP is also the topic where candidates most often memorize problems instead of internalizing the technique, and it is the topic where memorization fails most spectacularly. There are perhaps 60 named DP problems on LeetCode that everyone has seen; an interviewer who wants to filter out the memorizers asks the 61st. The solution, then, is not to drill 60 problems — it is to drill the derivation process until you can derive any DP from the recursive formulation.
This phase is built around one teaching device that we will use on every single problem from start to finish: the brute → memo → tabulated → space-optimized progression. Every problem you solve in this phase will be solved four times in succession:
- Brute force — usually exponential recursion that explores every choice.
- Memoized — the same recursion plus a cache, top-down DP, O(states × transition) time.
- Tabulated — bottom-up loop in topological order on the state DAG, O(states × transition) time, no recursion stack.
- Space-optimized — a rolling-array transformation that keeps only the previous one or two layers and reduces space from O(N · M) to O(M) or O(1).
By the end of this phase, you will execute this progression unconsciously. When an interviewer asks “can you reduce the space?”, you will already have written the tabulated version with the rolling array in mind. When the interviewer asks “what’s the recurrence?”, you will already have derived it from the brute-force recursion in the first 90 seconds of the problem. The four-stage progression is the single most valuable interview-time discipline taught in this entire curriculum, because it converts an open-ended “design a DP” question into a deterministic four-step recipe.
After this phase, you can solve canonical Hard DP problems on first attempt: edit distance in 25 minutes with full progression, longest increasing subsequence at both O(N²) and O(N log N), partition equal subset sum, coin change (count and minimum), burst balloons (interval DP), house robber III (tree DP), and shortest path visiting all nodes (bitmask DP). You will also become visibly stronger in mock interviews because you will reach for dp[i][j] notation on the whiteboard within 90 seconds and articulate the state definition out loud before writing any code.
What You Will Be Able To Do After This Phase
- Recognize that a problem is a DP problem within 2 minutes, even when the words “DP” or “memoize” never appear in the statement.
- Derive the state from a recursive brute force in <3 minutes by identifying the parameters that change across recursive calls.
- Write the transition as a closed-form max/min/sum over a small set of choices, with clear correctness justification.
- Identify base cases as the recursive function’s return statements at the smallest input.
- Write the tabulated version by inverting the recursion into a loop with the right evaluation order on the state DAG.
- Apply the rolling-array trick to reduce O(N · M) space to O(M) or O(1) when the recurrence depends only on the previous row(s).
- Distinguish 0/1 knapsack from unbounded knapsack by a single change in the inner-loop direction.
- Recognize when a “subset” or “partition” problem reduces to subset sum with target total / 2.
- Implement LIS at both O(N²) (canonical DP) and O(N log N) (patience sorting + binary search) and explain the equivalence.
- Implement edit distance with all four variants (brute, memo, tabulated, O(M)-space) in <25 minutes.
- Implement tree DP via post-order recursion, returning multi-tuple state (e.g., “best with root included” and “best with root excluded”).
- Implement interval DP with the canonical for length, for left, right = left + length - 1 loop structure.
- Implement bitmask DP with state (mask, last) for TSP-style problems and (mask) alone for set-cover-style problems.
- Articulate the correctness argument for every DP you write: state definition, transition justification, evaluation order, base case.
- Spot the standard DP bugs unprompted: wrong base case, wrong evaluation order, off-by-one in indices, missed edge case at empty input.
How To Read This Phase
Read this README in two passes. Pass 1: linear, end-to-end, building a mental map of which DP variant solves which problem signal. Do this in one sitting. Pass 2: as you work the labs, refer back to specific topic entries to clarify state-design choices and pitfalls.
Each topic entry has a fixed shape:
- When To Use — the problem signal that should fire this DP variant in <2 minutes.
- State Design — what the state is, why these are the right parameters, why no fewer suffice.
- Transition — the recurrence in closed form.
- Complexity — time and space, and what space optimization is possible.
- Common Pitfalls — the bugs that consume the most interview minutes for this DP variant specifically.
- Classic Problems — 3–5 representative LeetCode problems where this DP is the intended solution.
The phase ends with a DP-Recognition Cheat Sheet (problem signals → DP variant), a Common-Bug Catalog, a Mastery Checklist, and Exit Criteria.
The DP Framework
Before any topic, internalize this framework. Use it on every DP problem.
1. State Definition
The state is the smallest set of parameters that uniquely determines the answer to a subproblem. Write it down explicitly:
dp[i] = the answer to the subproblem ending at / using up to / for prefix-of-length / etc., parameter i.
Sentences that begin “let dp[i] be” are the most valuable two seconds of the entire problem. If you can’t finish the sentence, you don’t have a state — you have a vague hope.
The state must be sufficient (encodes everything the future needs to know) and necessary (every parameter actually changes the answer). A common bug is a state that’s sufficient but not necessary — e.g., tracking both index and remaining-budget when budget is determined by index. Another is necessary but not sufficient — e.g., tracking only the index when the choice depends on what was picked earlier.
A useful test: two equal states must produce equal answers. If two different histories arrive at the same state but have different optimal continuations, the state is missing a dimension.
2. Transition Function
The transition expresses dp[state] as a function of dp[smaller_state] for one or more smaller states. It is the recurrence. For optimization problems, it is min or max over a small set of choices; for counting problems, it is a sum.
The transition has three parts:
- Choices — the discrete set of moves at this state (include item or skip; pick this character or that one; rob this house or skip).
- Cost / value — the contribution of each choice (the item’s weight, the operation’s cost, the gain from picking).
- Aggregation: min / max / sum / OR over the choices.
Write the transition as:
dp[state] = aggregate over choice c in C(state): contribution(c) + dp[state - effect(c)]
Always keep C(state) finite and small — typically O(1), O(K), or O(N). Transitions that aren’t O(small) usually indicate a missing state dimension.
3. Base Cases
The base cases are the values of dp at the smallest (recursion-stopping) states. They are not optional; a missing or wrong base case is the single most common DP bug.
Identify base cases by writing the recursion first and looking at the return-statements:
def f(i):
    if i == 0: return 0  # ← THIS is the base case
    return f(i-1) + something
For 2D DP, the base cases are typically the entire first row and first column — set them explicitly before the main loop. For 3D DP and beyond, they’re a hyper-plane of dimension one less than the state.
A subtle base case bug: two different recursive paths reach the same base case but expect different return values. Usually this means the state is wrong (missing a dimension) and the base case has to “remember” which path it came from — impossible.
4. Evaluation Order
DP states form a DAG: state A points to state B iff dp[B] appears in the recurrence for dp[A]. To compute dp[A] we must have already computed dp[B], so the evaluation order is a reverse topological order of this DAG (dependencies before dependents).
For 1D DP indexed by i, the order is usually i = 0, 1, 2, …, N (increasing) or i = N, N-1, …, 0 (decreasing) — depending on whether your transition looks “back” or “forward”. Both work; pick one consistently.
For 2D DP indexed by (i, j), the order is usually row-major (for i: for j:) or column-major. The right one is the one that fills dp[i-1][j] and dp[i][j-1] before dp[i][j].
For interval DP indexed by (left, right), the order is by interval length ascending: for length in 1..N: for left in 0..N-length: right = left + length - 1. This guarantees all sub-intervals are filled before the enclosing interval.
For tree DP, the order is post-order DFS: children are filled before the parent.
For bitmask DP, the order is by popcount ascending or by mask value ascending (both work, since a proper sub-mask of m has fewer set bits than m and is numerically < m).
5. Space Optimization (The Rolling-Array Technique)
If dp[i][j] depends only on dp[i-1][*] and dp[i][j-1], then once row i is computed we don’t need any earlier row. Keep only the current and previous row — O(N · M) becomes O(M).
If dp[i][j] depends only on dp[i-1][j] and dp[i-1][j-1] (no same-row dependency), you can collapse to a single row using right-to-left iteration in j — O(M) space.
If dp[i][j] depends on more rows (e.g., dp[i-2]), keep that many rows.
The rolling-array transformation is mechanical once the tabulated version is written. The interviewer often asks for it: “can you reduce the space?” Practice the transformation on every lab so it becomes reflex.
6. The Brute → Memo → Tabulated → Space-Optimized Progression
Apply this on every problem in this phase. It is the mandatory teaching device of this phase.
- Brute force: exponential recursion that tries every choice. Often O(2^N) or O(N!). Don’t skip writing this — it is the direct source of your state and transition.
- Memoized: same recursion + a cache (@lru_cache in Python, a HashMap in Java, an array in C++). Time becomes O(states × transition); space includes the recursion stack.
- Tabulated: replace recursion with a loop in topological order on the state DAG. Same time complexity, but no recursion overhead, and the loop structure makes the dependency pattern explicit.
- Space-optimized: roll the table down to O(M) or O(1) by exploiting the recurrence’s locality.
Each stage is a small, mechanical change from the previous one: brute → memo is “add a cache”, memo → tabulated is “invert the call graph”, tabulated → space-optimized is “drop dimensions you don’t reuse”. This is deterministic engineering, not invention.
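As a minimal sketch of the four stages, here they are on LC 70 (Climbing Stairs), assuming n ≥ 0 and the convention that there is one way to climb zero stairs. The function names are illustrative, not taken from the labs.

```python
from functools import lru_cache


def climb_brute(n: int) -> int:
    """Stage 1: exponential recursion that tries both choices (1 step or 2)."""
    if n <= 1:
        return 1
    return climb_brute(n - 1) + climb_brute(n - 2)


def climb_memo(n: int) -> int:
    """Stage 2: the same recursion plus a cache (top-down DP)."""
    @lru_cache(maxsize=None)
    def f(i: int) -> int:
        if i <= 1:
            return 1
        return f(i - 1) + f(i - 2)

    return f(n)


def climb_tab(n: int) -> int:
    """Stage 3: bottom-up loop; dp[i] reads only dp[i-1] and dp[i-2]."""
    dp = [1] * (n + 1)
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]


def climb_opt(n: int) -> int:
    """Stage 4: keep only the last two values, O(1) space."""
    prev2, prev1 = 1, 1
    for _ in range(2, n + 1):
        prev2, prev1 = prev1, prev1 + prev2
    return prev1
```

Checking that all four agree on small inputs is a cheap way to catch a wrong base case before moving to the next stage.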
Inline DP Topic Reference
1. Memoization Vs Tabulation Tradeoffs
When To Use
Both compute the same thing — they’re two evaluation orders on the same state DAG. Choose deliberately.
- Memoization (top-down): when the reachable set of states is much smaller than the full state space (sparse DP). Example: regex matching, where many (i, j) pairs are never visited. Also when the transition is easier to express recursively than as a forward loop.
- Tabulation (bottom-up): when the state space is dense (most states are visited), when you need to reduce space via rolling arrays (which require explicit loop structure), or when recursion depth would exceed the stack limit (e.g., N = 10^5 in Python, whose default recursion limit is 1000).
Common Pitfalls
- Memoization in Python under the default recursion limit raises RecursionError once the recursion depth approaches 1000. Either raise the limit with sys.setrecursionlimit(10**6) or convert to tabulation.
- Memoization with mutable arguments (e.g., lru_cache on a function taking a list): Python’s lru_cache requires hashable arguments; pass tuples or indices, not lists.
- Tabulation with the wrong loop order silently produces garbage; see Topic 5 (Evaluation Order).
Classic Problems
- LC 322 — Coin Change. Memoized recursion is natural; tabulated is faster in tight loops. See Lab 04.
- LC 10 — Regular Expression Matching. Memoization is cleaner here. See String DP below.
2. State Design Principles
When To Use
Every DP problem starts here. Follow this discipline:
- Identify what changes across recursive calls. Those parameters are candidate state dimensions.
- Drop any parameter that is determined by the others.
- Keep any parameter whose value affects the optimal continuation.
- Verify: two states with identical parameters must have identical optimal values.
State Design Patterns
- Prefix DP: dp[i] = answer for first i elements. Used in LIS, house robber, decode ways.
- Two-pointer / interval DP: dp[i][j] = answer for elements in [i, j]. Used in matrix chain, burst balloons, palindromic subsequence.
- Knapsack-style: dp[i][w] = best using first i items with budget w. Used in 0/1 knapsack, partition, coin change.
- Two-string DP: dp[i][j] = answer for prefix-i of A and prefix-j of B. Used in LCS, edit distance, regex matching.
- Tree DP: dp[v] = answer for subtree rooted at v. Often a tuple of values (e.g., “rob” / “skip”).
- Bitmask DP: dp[mask] or dp[mask][last] = answer over the subset specified by mask.
- Game DP: dp[state] = best score the current player can guarantee.
Common Pitfalls
- Adding a parameter that doesn’t affect the answer: wastes time and space. E.g., tracking step_count when the recurrence already encodes it via the index.
- Missing a parameter that does: produces wrong answers because two materially different histories collapse to the same dp cell.
- Encoding choices in the state instead of the transition. The state is “where are we now”; the transition decides “what to do next”. Keep them separate.
3. Transition Function Design
When To Use
Once the state is defined, the transition is constrained: it must express dp[state] in terms of strictly smaller states.
Design Steps
- List all choices available at this state (include / skip / pick which / move where).
- For each choice, identify the contribution and the resulting smaller state.
- Aggregate: min for shortest/cheapest, max for longest/most-valuable, + for counting.
Common Pitfalls
- Forgetting a choice — usually “don’t take the item” or “skip this position”. Often the trivial choice that the recurrence still depends on.
- Double-counting — particularly in counting problems where two distinct paths to the same state are aggregated naively. Often signals a missing dimension.
- Off-by-one in the resulting smaller state: dp[i-1][j-1] vs dp[i][j-1] is the difference between “use this character” and “use the prefix ending here”.
4. Base Case Identification
When To Use
After defining state and transition, the recursion bottoms out at some smallest state. The base case is what dp returns there.
Identifying Base Cases
- For prefix DP dp[i]: the base case is dp[0], the empty prefix. Its value is the natural identity (0 for sums, 1 for counts of “the empty product”, -∞ or +∞ for unreachable).
- For interval DP dp[i][j]: the base case is dp[i][i] (length-1 interval); its value depends on the problem.
- For two-string DP dp[i][j]: the base cases are dp[0][j] = j and dp[i][0] = i for edit distance, or dp[0][j] = dp[i][0] = 0 for LCS.
Common Pitfalls
- Wrong identity for counting problems: the empty prefix has count 1 (one way to make nothing), not 0.
- Wrong identity for min / max: initialize to +∞ for min and -∞ for max, not 0. Initializing to 0 silently makes “do nothing” look optimal.
- Forgetting to set base cases on the boundary of a 2D table: this leaves them as the language’s default (0 in Java arrays; undefined in JS; uninitialized garbage in C).
5. Evaluation Order (Topological Order On The State DAG)
When To Use
Every DP. The evaluation order must be consistent with the dependency structure.
Determining The Order
Treat states as nodes; draw an edge from A to B iff the recurrence for A reads dp[B]. The order to evaluate is reverse topological (children before parents).
For most DPs the order is obvious: increasing i, increasing j, increasing interval length, post-order over the tree, increasing popcount of mask. When in doubt, fix a small example, write out the dependency arrows, and read off the order.
Common Pitfalls
- 2D DP iterated in the wrong order silently computes garbage. When the recurrence reads dp[i-1][j] and dp[i][j-1], swapping the loop nesting to for j: for i: is fine only if both loops still run ascending, so that the cell above and the cell to the left are filled before dp[i][j]; reversing either loop direction reads cells that have not been computed yet.
- Interval DP iterated in (left, right) order instead of (length, left) fails because you compute dp[0][N-1] before dp[1][N-1].
- Bitmask DP iterated in an arbitrary order instead of mask value ascending fails if any sub-mask is read before it has been written.
6. Space Optimization (Rolling Array Technique)
When To Use
Whenever dp[i][...] depends on only the previous one or two i values, you can keep just those rows.
Mechanical Transformation
- Replace dp[i][j] with dp_curr[j] and dp[i-1][j] with dp_prev[j].
- After each i, swap or copy.
- If dp[i][j] doesn’t depend on dp[i][k] for k < j (no same-row dependency), collapse further to a single 1D dp[j]. Iterate j carefully: if the recurrence reads dp[i-1][j-1], iterate j from right-to-left so you read the old value before overwriting.
Common Pitfalls
- Iterating left-to-right when right-to-left is needed overwrites the value you’ll need next. This is the canonical 0/1 knapsack vs unbounded knapsack distinction:
  - 0/1 knapsack: iterate weight right-to-left to use the previous row’s dp[w-wi].
  - Unbounded knapsack: iterate weight left-to-right to use the current row’s dp[w-wi] (because items can be reused).
- Forgetting to reset the rolling array between outer iterations — old values bleed through.
- Optimizing space prematurely, before the tabulated version is correct. Always verify tabulated against memoized on small inputs first.
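The loop-direction distinction can be sketched with two functions that differ only in the direction of the inner loop. This is an illustrative sketch assuming positive integer weights; the function names are hypothetical.

```python
def knapsack_01(weights, values, capacity):
    """Each item at most once: iterate capacity right-to-left."""
    dp = [0] * (capacity + 1)
    for wt, val in zip(weights, values):
        # Descending w: dp[w - wt] still holds the previous row's value,
        # so the current item cannot be counted twice.
        for w in range(capacity, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + val)
    return dp[capacity]


def knapsack_unbounded(weights, values, capacity):
    """Items reusable: iterate capacity left-to-right."""
    dp = [0] * (capacity + 1)
    for wt, val in zip(weights, values):
        # Ascending w: dp[w - wt] may already include the current item,
        # which is exactly what "reuse" means.
        for w in range(wt, capacity + 1):
            dp[w] = max(dp[w], dp[w - wt] + val)
    return dp[capacity]
```

With weights [2, 3], values [3, 4] and capacity 6, the first returns 7 (one of each item) and the second returns 9 (three copies of the first item).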
7. 1D DP
When To Use
The state is a single integer index — dp[i]. Examples: climbing stairs, house robber, decode ways, max subarray (Kadane).
State Design
dp[i] = answer for the prefix ending at index i, OR for the first i elements. Pick one convention and stick with it (the lab uses “answer for first i elements” consistently).
Transition
dp[i] = f(dp[i-1], dp[i-2], ..., dp[i-k]) for some small k. Examples:
- House robber: dp[i] = max(dp[i-1], dp[i-2] + house[i-1]).
- Climbing stairs: dp[i] = dp[i-1] + dp[i-2] (Fibonacci).
- Decode ways: dp[i] = (dp[i-1] if s[i-1] is valid 1-digit) + (dp[i-2] if s[i-2:i] is valid 2-digit).
Complexity
Time O(N). Space O(N) tabulated, O(1) space-optimized (each dp[i] depends on only O(1) previous values).
Common Pitfalls
- Off-by-one between dp[i] and house[i]: confusion between “first i houses” (uses house[i-1] as the latest) and “ending at index i” (uses house[i]). Pick one and never mix.
- Forgetting the empty case: dp[0] for “first 0 elements” must be the identity.
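A short sketch of the “first i elements” convention on LC 91 (Decode Ways), assuming the input contains only digit characters. Here dp[0] = 1 is the identity for counting, and s[i-1] is always the latest character of the prefix of length i.

```python
def num_decodings(s: str) -> int:
    """dp[i] = number of ways to decode the first i characters of s."""
    n = len(s)
    dp = [0] * (n + 1)
    dp[0] = 1  # empty prefix: one way to decode nothing
    for i in range(1, n + 1):
        # Choice 1: the last character alone is a valid code ("1".."9").
        if s[i - 1] != "0":
            dp[i] += dp[i - 1]
        # Choice 2: the last two characters form a valid code ("10".."26").
        if i >= 2 and "10" <= s[i - 2:i] <= "26":
            dp[i] += dp[i - 2]
    return dp[n]
```

Because each dp[i] reads only dp[i-1] and dp[i-2], the array can be rolled down to two variables in the same way as climbing stairs.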
Classic Problems
- LC 70 — Climbing Stairs. See Lab 01.
- LC 198 — House Robber.
- LC 91 — Decode Ways.
- LC 53 — Maximum Subarray (Kadane).
- LC 746 — Min Cost Climbing Stairs.
8. 2D DP
When To Use
The state is a pair of integers — dp[i][j]. Examples: unique paths on a grid, minimum path sum, longest common subsequence, edit distance.
State Design
For grid problems, dp[i][j] = answer for getting to cell (i, j). For two-string problems, dp[i][j] = answer for prefix-i of one string and prefix-j of the other.
Transition
For grid: dp[i][j] = dp[i-1][j] + dp[i][j-1] (count of paths) or min(dp[i-1][j], dp[i][j-1]) + grid[i][j] (min path sum).
Complexity
Time O(N · M). Space O(N · M) tabulated, O(M) with rolling rows, O(M) with right-to-left collapse to 1D when there’s no same-row dependency.
Common Pitfalls
- Initializing first row and first column wrong for grid path problems — these are not always 0 or 1; they may carry obstacles or grid values.
- Adding grid[i][j] to all transitions including the boundary: the boundary needs special handling.
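A minimal sketch of LC 64 (Minimum Path Sum) with a single rolling row and the boundary handled explicitly, assuming a non-empty rectangular grid. The function name is illustrative.

```python
def min_path_sum(grid):
    """dp[j] holds the best cost to reach cell (i, j) for the current row i."""
    rows, cols = len(grid), len(grid[0])
    dp = [0] * cols
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                dp[j] = grid[0][0]                 # start cell
            elif i == 0:
                dp[j] = dp[j - 1] + grid[i][j]     # first row: only from the left
            elif j == 0:
                dp[j] = dp[j] + grid[i][j]         # first column: only from above
            else:
                # dp[j] is still row i-1 (above); dp[j-1] is already row i (left).
                dp[j] = min(dp[j], dp[j - 1]) + grid[i][j]
    return dp[-1]
```

The left-to-right inner loop is correct here because the recurrence has a same-row dependency on dp[i][j-1]; there is nothing from row i-1 to the left that still needs to be read.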
Classic Problems
- LC 62 — Unique Paths.
- LC 63 — Unique Paths II (with obstacles). See Lab 02.
- LC 64 — Minimum Path Sum.
- LC 120 — Triangle.
9. 0/1 Knapsack
When To Use
A set of N items each with weight w_i and value v_i; capacity W; maximize value subject to total weight ≤ W. Each item used at most once. Recognized by discrete choices over a budget.
State Design
dp[i][w] = max value using first i items with capacity w.
Transition
dp[i][w] = max(dp[i-1][w], dp[i-1][w - w_i] + v_i) if w >= w_i, else dp[i-1][w]. The two cases: skip item i, or take it.
Complexity
Time O(N · W). Space O(W) with right-to-left collapse: iterate w from W down to w_i.
Common Pitfalls
- Iterating w left-to-right in the 1D-collapsed version: turns 0/1 knapsack into unbounded knapsack, allowing the same item to be picked multiple times.
- Treating W as a free variable when it’s actually constrained by problem size: at W = 10^9 the table doesn’t fit; switch to meet-in-the-middle or branch-and-bound (out of scope here).
Classic Problems
- LC 416 — Partition Equal Subset Sum (0/1 knapsack reformulation: target = total / 2). See Lab 03.
- LC 494 — Target Sum.
- LC 474 — Ones and Zeroes (2D knapsack).
10. Unbounded Knapsack
When To Use
Same as 0/1 knapsack but each item can be used any number of times. Recognized by “unlimited supply” / “any number of coins” / “items can be reused”.
State Design
dp[w] = best value with capacity w, considering all items as candidates at every step.
Transition
dp[w] = max(dp[w], dp[w - w_i] + v_i) for every item i such that w >= w_i.
Complexity
Time O(N · W). Space O(W). Iterate w left-to-right.
Common Pitfalls
- Iterating wrong direction — same as 0/1 knapsack but inverted. Left-to-right makes items reusable; right-to-left makes them one-use.
- Confusing “min number of items” with “max value”: in coin change (min coins), initialize to +∞, transition is dp[w] = min(dp[w], dp[w - c] + 1).
- Counting orderings vs combinations: for “number of ways to make change as combinations”, the outer loop is over coins and inner over sums; for “number of ordered sequences”, swap them. The two produce different counts.
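The pitfalls above can be seen side by side in a small sketch, assuming positive coin values. The three function names are illustrative; the -1 return for an unreachable target follows the LC 322 convention.

```python
def count_combinations(coins, target):
    """Outer loop over coins: each multiset of coins is counted once."""
    dp = [0] * (target + 1)
    dp[0] = 1  # one way to make 0: take nothing
    for c in coins:
        for s in range(c, target + 1):
            dp[s] += dp[s - c]
    return dp[target]


def count_ordered(coins, target):
    """Outer loop over sums: different orderings are counted separately."""
    dp = [0] * (target + 1)
    dp[0] = 1
    for s in range(1, target + 1):
        for c in coins:
            if c <= s:
                dp[s] += dp[s - c]
    return dp[target]


def min_coins(coins, target):
    """Min aggregation: unreachable sums stay at +infinity, not 0."""
    inf = float("inf")
    dp = [0] + [inf] * target
    for s in range(1, target + 1):
        for c in coins:
            if c <= s and dp[s - c] + 1 < dp[s]:
                dp[s] = dp[s - c] + 1
    return dp[target] if dp[target] != inf else -1
```

For coins [1, 2, 5] and target 5, the combination count is 4 while the ordered count is 9, which is the difference the loop swap makes.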
Classic Problems
- LC 322 — Coin Change (min coins). See Lab 04.
- LC 518 — Coin Change II (count combinations).
- LC 279 — Perfect Squares.
- LC 139 — Word Break (unbounded with “items” = dictionary words).
11. Subset Sum / Partition Equal Subset Sum
When To Use
“Can we pick a subset summing to T?” Recognized in: partition problems, target-sum problems, equal-sum-subsets.
Reformulation
Subset sum is 0/1 knapsack with v_i = w_i and target W = T. Use dp[w] = bool (reachable or not) instead of “max value”, and aggregate with OR instead of max.
Complexity
Time O(N · T). Space O(T) with a bool array; a bitset stores the same table in O(T / 64) machine words and cuts the time to O(N · T / 64).
Common Pitfalls
- Forgetting that target T may be huge. For “partition equal subset sum”, T = total / 2; if total is odd, return false immediately.
- Using max instead of OR for boolean aggregation.
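One way to sketch the bitset idea in Python is to use an arbitrary-precision int as the bitset, where bit s is set iff sum s is reachable. This assumes non-negative integers in nums; it illustrates the technique and is not necessarily the form used in Lab 03.

```python
def can_partition(nums):
    """LC 416 via subset sum: can some subset reach total / 2?"""
    total = sum(nums)
    if total % 2:
        return False  # an odd total can never split into two equal halves
    target = total // 2
    reachable = 1  # bit 0 set: the empty subset reaches sum 0
    for x in nums:
        # OR-aggregation over "skip x" (reachable) and "take x" (shifted copy).
        # The shift reads the old value of reachable, so each x is used at most once.
        reachable |= reachable << x
    return bool((reachable >> target) & 1)
```

The single shift-and-OR per item plays the role of the right-to-left inner loop in the array version.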
Classic Problems
- LC 416 — Partition Equal Subset Sum. See Lab 03.
- LC 698 — Partition to K Equal Sum Subsets (harder; bitmask DP).
12. LIS — Longest Increasing Subsequence
When To Use
“Longest subsequence with property P” where P is monotonic (increasing, non-decreasing, or some order relation).
State Design (O(N²) DP)
dp[i] = length of LIS ending at index i and using arr[i] as the last element.
Transition
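dp[i] = 1 + max(dp[j] for j < i if arr[j] < arr[i]), where the max over an empty set of j is taken as 0 (so dp[i] = 1 when no earlier element is smaller). Answer is max(dp[i]) over all indices i.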
Complexity
O(N²) time, O(N) space.
Patience Sort / O(N log N) Variant
Maintain tails[k] = smallest possible tail of any increasing subsequence of length k+1. For each arr[i], find the leftmost tails[k] >= arr[i] via binary search and replace it with arr[i] (or append if arr[i] > all). The length of tails at the end is the LIS length.
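This is patience sorting: each element is a card placed on the leftmost pile whose top card is ≥ it (or on a new pile to the right if none exists), so each pile is non-increasing from bottom to top, tails holds the current top cards, and the number of piles is the LIS length.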
Complexity
O(N log N) time, O(N) space.
Common Pitfalls
- Confusing “LIS length” with “LIS itself”: tails is not the LIS; reconstructing the actual sequence requires storing predecessors during scan.
- Strict vs non-strict: for non-decreasing, use bisect_right instead of bisect_left.
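Both variants can be sketched in a few lines with the standard bisect module. The strict parameter is an illustrative addition that switches between bisect_left (strictly increasing) and bisect_right (non-decreasing).

```python
from bisect import bisect_left, bisect_right


def lis_quadratic(arr):
    """O(N^2): dp[i] = length of the LIS that ends exactly at index i."""
    n = len(arr)
    dp = [1] * n
    for i in range(n):
        for j in range(i):
            if arr[j] < arr[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)


def lis_patience(arr, strict=True):
    """O(N log N): tails[k] = smallest tail of any valid subsequence of length k+1."""
    locate = bisect_left if strict else bisect_right
    tails = []
    for x in arr:
        k = locate(tails, x)
        if k == len(tails):
            tails.append(x)   # x extends the longest subsequence seen so far
        else:
            tails[k] = x      # x is a smaller tail for length k+1
    return len(tails)
```

Only the length of tails is meaningful; its contents are generally not a valid subsequence of arr.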
Classic Problems
- LC 300 — Longest Increasing Subsequence. See Lab 05.
- LC 354 — Russian Doll Envelopes (sort + LIS).
- LC 673 — Number of Longest Increasing Subsequences.
13. LCS / Edit Distance Family
When To Use
Two strings, asking for similarity, alignment, or transformation cost. Includes longest common subsequence, edit distance (Levenshtein), longest common substring (different state!), and shortest common supersequence.
State Design
dp[i][j] = answer for prefix-i of A and prefix-j of B.
Transitions
- LCS: dp[i][j] = dp[i-1][j-1] + 1 if A[i-1] == B[j-1], else max(dp[i-1][j], dp[i][j-1]).
- Edit distance (Levenshtein): if match, dp[i][j] = dp[i-1][j-1]; else 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1]) for replace / delete / insert.
- Longest common substring (different!): if match, dp[i][j] = dp[i-1][j-1] + 1; else dp[i][j] = 0. Answer is max(dp[i][j]) over all (i, j). The “else = 0” is what makes it substring vs subsequence.
Complexity
Time O(N · M). Space O(N · M) tabulated, O(M) with two rolling rows, O(M) with one row + a single saved diagonal value.
Common Pitfalls
- Confusing subsequence and substring — they have different recurrences. Subsequence allows skipping; substring requires contiguity.
- Edit distance with non-unit costs (insert/delete/replace each have a custom cost): works the same with custom weights instead of +1.
- Reconstructing the alignment requires backtracking through dp choices; store back-pointers or reconstruct from values.
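The “one row + a single saved diagonal value” form of edit distance can be sketched as below, with unit costs assumed. The variable names diag and above are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance in O(len(b)) space."""
    m = len(b)
    dp = list(range(m + 1))          # row 0: dp[0][j] = j (insert j characters)
    for i in range(1, len(a) + 1):
        diag = dp[0]                 # dp[i-1][0], the diagonal for j = 1
        dp[0] = i                    # dp[i][0] = i (delete i characters)
        for j in range(1, m + 1):
            above = dp[j]            # dp[i-1][j], about to be overwritten
            if a[i - 1] == b[j - 1]:
                dp[j] = diag         # match: no cost
            else:
                # replace (diag), delete from a (above), insert into a (dp[j-1])
                dp[j] = 1 + min(diag, above, dp[j - 1])
            diag = above             # becomes dp[i-1][j] for the next column
    return dp[m]
```

The saved diagonal is needed because the left-to-right pass has already overwritten dp[i-1][j-1] by the time column j is computed.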
Classic Problems
- LC 1143 — Longest Common Subsequence.
- LC 72 — Edit Distance. See Lab 06.
- LC 583 — Delete Operation for Two Strings.
- LC 712 — Minimum ASCII Delete Sum.
- LC 718 — Maximum Length of Repeated Subarray (LCS variant; substring).
14. Palindrome DP
When To Use
Anything about palindromic substrings or subsequences: count, longest, partition into palindromes, minimum cuts.
Variant 1: Longest Palindromic Subsequence
dp[i][j] = length of longest palindromic subsequence in s[i..j].
dp[i][j] = dp[i+1][j-1] + 2 if s[i] == s[j]
= max(dp[i+1][j], dp[i][j-1]) otherwise
Answer: dp[0][N-1]. Evaluation order: by interval length ascending.
Variant 2: Longest Palindromic Substring
is_pal[i][j] = boolean. is_pal[i][j] = (s[i] == s[j]) and (j - i < 2 or is_pal[i+1][j-1]). Track max length and start during fill.
(Manacher’s algorithm gives O(N) for this; see Phase 3.)
Variant 3: Palindrome Partitioning Min Cuts
cuts[i] = min cuts to partition s[0..i] into palindromes.
cuts[i] = 0 if s[0..i] is itself a palindrome
= min(cuts[j-1] + 1) for all j ≤ i with s[j..i] palindrome
Precompute is_pal[i][j] first (O(N²)), then run the cut DP (O(N²)). Total O(N²).
Common Pitfalls
- Running the cut DP before is_pal is fully computed: the cut DP reads is_pal[j][i], so the table must be filled first.
- Wrong evaluation order in dp[i][j]: smaller intervals must be filled first; iterate by length ascending.
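A compact sketch of Variant 3 (minimum cuts), filling is_pal by interval length first and only then running the cut DP. The function name is illustrative.

```python
def min_cut(s: str) -> int:
    """LC 132: minimum cuts so that every piece of s is a palindrome."""
    n = len(s)
    if n == 0:
        return 0
    # Step 1: is_pal[i][j] for every interval, shortest intervals first.
    is_pal = [[False] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            is_pal[i][j] = s[i] == s[j] and (length <= 2 or is_pal[i + 1][j - 1])
    # Step 2: cuts[i] = min cuts for the prefix s[0..i].
    cuts = [0] * n
    for i in range(n):
        if is_pal[0][i]:
            continue  # the whole prefix is a palindrome: zero cuts
        # The last piece is s[j..i]; everything before it costs cuts[j-1].
        cuts[i] = min(cuts[j - 1] + 1 for j in range(1, i + 1) if is_pal[j][i])
    return cuts[n - 1]
```

The min in step 2 is never over an empty set, because the single character s[i..i] is always a palindrome.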
Classic Problems
- LC 516 — Longest Palindromic Subsequence. See Lab 07.
- LC 5 — Longest Palindromic Substring.
- LC 132 — Palindrome Partitioning II. See Lab 07.
- LC 647 — Palindromic Substrings.
15. String DP
When To Use
Pattern matching with wildcards or operators: regex, glob/wildcard, interleaving, distinct subsequences. The state is two indices (one per string).
Variant: Regex Matching (LC 10)
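dp[i][j] = do the first j characters of p match the first i characters of s? (So s[i-1] and p[j-1] are the last characters of the two prefixes, and dp[0][0] = True.)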
if p[j-1] == '*':
    dp[i][j] = dp[i][j-2]                                   # match zero of preceding
               or (matches(s[i-1], p[j-2]) and dp[i-1][j])  # match one more
elif matches(s[i-1], p[j-1]):
    dp[i][j] = dp[i-1][j-1]
else:
    dp[i][j] = False
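A runnable sketch of the recurrence above, assuming a well-formed pattern in which every '*' follows some character (as LC 10 guarantees). The helper matches takes table indices rather than characters so that the i = 0 row is handled in one place; that signature is an illustrative choice.

```python
def is_match(s: str, p: str) -> bool:
    """dp[i][j]: do the first j chars of p match the first i chars of s?"""
    n, m = len(s), len(p)
    dp = [[False] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = True  # empty pattern matches empty string

    def matches(i: int, j: int) -> bool:
        """Does s[i-1] match the single pattern char p[j-1]? False when i == 0."""
        return i > 0 and (p[j - 1] == "." or p[j - 1] == s[i - 1])

    for i in range(n + 1):          # i = 0 is included: "a*" can match ""
        for j in range(1, m + 1):
            if p[j - 1] == "*":
                # Zero copies of p[j-2], or one more copy consuming s[i-1].
                dp[i][j] = dp[i][j - 2] or (matches(i, j - 1) and dp[i - 1][j])
            else:
                dp[i][j] = matches(i, j) and dp[i - 1][j - 1]
    return dp[n][m]
```

Short-circuit evaluation keeps dp[i-1][...] from being read when i is 0, which is the off-by-one this variant is known for.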
Variant: Wildcard Matching (LC 44)
Similar but * matches any sequence: dp[i][j] = dp[i-1][j] or dp[i][j-1] when p[j-1] == '*'.
Variant: Interleaving Strings (LC 97)
dp[i][j] = can the first i+j characters of s3 be formed by interleaving the first i characters of s1 and the first j characters of s2? Transition: take from s1 if s1[i-1] == s3[i+j-1] (then look at dp[i-1][j]); take from s2 symmetrically (dp[i][j-1]); OR them.
Common Pitfalls
- Off-by-one between pattern index and dp index: almost universal source of regex DP bugs.
- * semantics differ between regex and glob; read the problem carefully.
Classic Problems
- LC 10 — Regular Expression Matching.
- LC 44 — Wildcard Matching.
- LC 97 — Interleaving String.
- LC 115 — Distinct Subsequences.
16. Tree DP
When To Use
The structure is a tree (rooted or rootable); the answer at a node depends on its subtree. Examples: house robber III, max path sum, longest path / diameter.
State Design
dp[v] = answer for the subtree rooted at v. Often a tuple: (best_with_v_chosen, best_without_v_chosen). Tuples are essential when the parent’s decision depends on whether the child was used.
Evaluation Order
Post-order DFS — fill children before the parent.
Transition
Aggregate over children. For house robber III: rob[v] = val[v] + sum(skip[c] for c in children); skip[v] = sum(max(rob[c], skip[c]) for c in children).
Complexity
Time O(V). Space O(V) for the recursion stack.
Common Pitfalls
- Stack overflow at deep trees in Python (default limit 1000): sys.setrecursionlimit(2 * 10**5) or convert to iterative post-order.
- Mishandling N-ary vs binary children: N-ary requires summing over a dynamic list; binary is hard-coded (left, right).
- Forgetting to handle null children: return identity values (0 or -∞).
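A minimal sketch of the tuple-returning post-order pattern on LC 337 (House Robber III). The TreeNode dataclass is a hypothetical stand-in for whatever node type the problem provides, and node values are assumed non-negative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TreeNode:
    """Hypothetical minimal binary-tree node for this sketch."""
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def rob_tree(root: Optional[TreeNode]) -> int:
    def visit(node: Optional[TreeNode]) -> Tuple[int, int]:
        """Return (best if node is robbed, best if node is skipped)."""
        if node is None:
            return (0, 0)  # identity for a null child
        rob_l, skip_l = visit(node.left)    # children first: post-order
        rob_r, skip_r = visit(node.right)
        rob = node.val + skip_l + skip_r    # robbing node forces both children skipped
        skip = max(rob_l, skip_l) + max(rob_r, skip_r)
        return (rob, skip)

    return max(visit(root))
```

Returning both values from one traversal is what keeps this O(V); calling separate rob and skip recursions on each child would revisit subtrees.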
Classic Problems
- LC 337 — House Robber III. See Lab 08.
- LC 124 — Binary Tree Maximum Path Sum.
- LC 543 — Diameter of Binary Tree (variant).
- LC 968 — Binary Tree Cameras (multi-state tree DP).
17. Interval DP
When To Use
The state is (left, right) — an interval — and the transition picks a “split point” k in [left, right]. Examples: matrix chain multiplication, burst balloons, palindrome partitioning, optimal BST, stone game.
State Design
dp[i][j] = answer for interval [i, j]. Often the meaningful question is “what is the last operation on this interval”, which forces a choice of split point k.
Transition
dp[i][j] = aggregate over k in [i..j]: dp[i][k-1] + dp[k+1][j] + cost(i, j, k).
The cost(i, j, k) typically depends on the boundaries of the interval — not just k — because the interval’s neighbors after the split are still i-1 and j+1.
Evaluation Order
By interval length ascending: for length in 1..N: for left in 0..N-length: right = left + length - 1.
Complexity
Time O(N³) in general (O(N²) intervals × O(N) split points). Space O(N²).
Common Pitfalls
- Iterating (i, j) in the wrong order: must fill smaller intervals first. Length-ascending is the canonical order.
- Choosing the wrong “thing” to split on: e.g., for burst balloons, the right state is “last balloon to burst in [i, j]” rather than “first balloon”.
- Confusing the boundaries: the cost in burst balloons uses nums[i-1] and nums[j+1] as multipliers because those are the surviving neighbors at the moment the last balloon in [i, j] is burst.
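The three points above can be sketched on LC 312 (Burst Balloons), assuming non-negative balloon values. The padded array vals with sentinel 1s on both ends is an illustrative choice that removes the boundary special cases.

```python
def max_coins(nums):
    """dp[left][right] = max coins from bursting every balloon in [left, right]."""
    vals = [1] + list(nums) + [1]        # sentinels at index 0 and n+1
    n = len(nums)
    # Indices 1..n refer to real balloons; empty intervals (left > right) stay 0.
    dp = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(1, n + 1):                 # smaller intervals first
        for left in range(1, n - length + 2):
            right = left + length - 1
            best = 0
            for k in range(left, right + 1):       # k = LAST balloon burst in the interval
                # When k bursts, its neighbors are the survivors just outside the interval.
                gain = vals[left - 1] * vals[k] * vals[right + 1]
                best = max(best, dp[left][k - 1] + gain + dp[k + 1][right])
            dp[left][right] = best
    return dp[1][n]
```

Choosing k as the last balloon is what makes the two sub-intervals independent: neither side can see the other until k is gone.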
Classic Problems
- LC 312 — Burst Balloons. See Lab 09.
- LC 1547 — Minimum Cost to Cut a Stick (matrix-chain-like).
- LC 87 — Scramble String.
- LC 132 — Palindrome Partitioning II. See Lab 07.
18. Bitmask DP
When To Use
Small-N (typically N ≤ 20) problems where the state must remember which subset of items has been used. Examples: TSP, assignment problem, set cover, “shortest path visiting all nodes”.
State Design
dp[mask] = best value over subsets specified by mask. Or dp[mask][last] = best path ending at node last and visiting exactly the nodes in mask (TSP-style).
Transition
For TSP: dp[mask | (1 << v)][v] = min(dp[mask | (1 << v)][v], dp[mask][u] + dist(u, v)) for all u in mask and v not in mask.
Evaluation Order
By mask value ascending, which guarantees dp[submask] is filled before dp[mask] whenever submask ⊂ mask. Iterating by popcount(mask) ascending gives the same guarantee.
Complexity
Time O(2^N · N²) for TSP-style. Space O(2^N · N): at N=20 this is 20 × 2^20 ≈ 21M cells, which fits in memory.
Common Pitfalls
- Iterating bitmasks in the wrong order — by-mask-value ascending is the safe default.
- Off-by-one on 1 << v vs 1 << (v-1) depending on 0- or 1-indexed nodes.
- Forgetting mask includes the source when initializing.
- Underestimating memory: at N=22, 2^N × N = 92M cells; at N=24, 400M+. Bitmask DP is strictly small-N.
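A sketch of the dp[mask][last] pattern on a simplified TSP-style problem: the cheapest path that visits every node exactly once, starting anywhere. It assumes a dense distance matrix dist with at least one node; the function name and problem framing are illustrative, not the exact statement of LC 847.

```python
def shortest_hamiltonian_path(dist):
    """dist[u][v] = cost of edge u -> v. Returns the min cost to visit all nodes once."""
    n = len(dist)
    inf = float("inf")
    full = (1 << n) - 1
    # dp[mask][last] = cheapest path that visits exactly the nodes in mask, ending at last.
    dp = [[inf] * n for _ in range(1 << n)]
    for v in range(n):
        dp[1 << v][v] = 0            # the starting mask includes the source itself
    for mask in range(1 << n):       # ascending: every sub-mask is finished first
        for u in range(n):
            cur = dp[mask][u]
            if cur == inf:
                continue             # unreachable, or u is not in mask
            for v in range(n):
                if (mask >> v) & 1:
                    continue         # v already visited
                nxt = mask | (1 << v)
                if cur + dist[u][v] < dp[nxt][v]:
                    dp[nxt][v] = cur + dist[u][v]
    return min(dp[full])
```

This is a forward ("push") formulation: each finished state relaxes its successors, which is safe because every successor mask is numerically larger.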
Classic Problems
- LC 847 — Shortest Path Visiting All Nodes. See Lab 10.
- LC 943 — Find the Shortest Superstring.
- LC 1349 — Maximum Students Taking Exam (bitmask over rows).
- LC 1125 — Smallest Sufficient Team.
19. Digit DP (Overview)
When To Use
“Count numbers in [L, R] with property P” where P is digit-defined (sum of digits, no consecutive equal, contains a digit, etc.). The state is (position, tight, accumulator…).
State Design
dp[pos][tight][...accumulated state] where tight is a flag indicating whether the prefix so far equals the upper bound’s prefix (so the next digit is bounded).
Transition
For each digit d in 0..(9 if not tight else upper_bound[pos]), recurse to pos + 1 with tight' = tight and d == upper_bound[pos], updating the accumulator.
Complexity
Time O(D × 2 × digit_range × accumulator_size), typically tractable for D = 18 (decimal) and small accumulator.
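As a concrete instance of the (position, tight, accumulator) state, here is a hedged sketch that counts integers whose digits sum to a target. The names count_digit_sum and count_in_range are hypothetical. The accumulator is the remaining digit sum, and only non-tight states are cached. Leading zeros add nothing to a digit sum, so this particular property does not need a "started" flag.

```python
def count_digit_sum(n, target):
    """Count integers x in [0, n] whose decimal digits sum to `target`."""
    if n < 0:
        return 0
    digits = [int(c) for c in str(n)]
    memo = {}  # (pos, remaining) -> count, stored for non-tight states only

    def go(pos, remaining, tight):
        if remaining < 0:
            return 0
        if pos == len(digits):
            return 1 if remaining == 0 else 0
        if not tight and (pos, remaining) in memo:
            return memo[(pos, remaining)]
        hi = digits[pos] if tight else 9  # tight prefix bounds the next digit
        total = 0
        for d in range(hi + 1):
            total += go(pos + 1, remaining - d, tight and d == digits[pos])
        if not tight:
            memo[(pos, remaining)] = total
        return total

    return go(0, target, True)

def count_in_range(lo, hi, target):
    """Count x in [lo, hi] via the count(R) - count(L-1) trick."""
    return count_digit_sum(hi, target) - count_digit_sum(lo - 1, target)
```

For example, count_in_range(10, 20, 2) counts 11 and 20.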
Common Pitfalls
- Off-by-one between [L, R] and [0, R]: the answer is count(R) - count(L-1).
- Leading zeros: track a “started” flag; otherwise “001” and “1” are conflated.
- Memoizing on tight=True paths: they’re path-specific and shouldn’t be memoized; only memoize the tight=False branch.
Classic Problems
- LC 233 — Number of Digit One.
- LC 902 — Numbers At Most N Given Digit Set.
- LC 1012 — Numbers With Repeated Digits.
Overview-only in this phase; depth in Phase 7 (Competitive Programming).
20. DP On DAG
When To Use
The graph is acyclic; you want longest / shortest / count of paths. The DAG itself defines the topological order; the DP runs along it.
State Design
dp[v] = answer for paths ending at v (or starting from v).
Transition
For longest path: dp[v] = max(dp[u] + w(u, v) for u in predecessors(v)). Run in topological order on the DAG.
Complexity
Time O(V + E). Space O(V).
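A minimal sketch of the longest-path recurrence run in topological order, assuming nodes are numbered 0..n-1 and edges arrive as (u, v, w) triples (both are assumptions for this example, as is the name longest_path_dag). It uses Kahn's algorithm, which also detects cycles, and relaxes outgoing edges "push-style" instead of taking the max over predecessors; the two are equivalent.

```python
from collections import deque

def longest_path_dag(n, edges):
    """Longest weighted path in a DAG; raises ValueError if a cycle exists."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # dp[v] = best weight of a path ending at v (a single node counts as 0)
    dp = [0] * n
    queue = deque(v for v in range(n) if indeg[v] == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v, w in adj[u]:
            dp[v] = max(dp[v], dp[u] + w)  # dp[u] is final: all its predecessors are done
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    if processed != n:
        raise ValueError("graph has a cycle; DAG longest-path DP does not apply")
    return max(dp, default=0)
```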
Common Pitfalls
- Running on a graph that has cycles — the recurrence diverges or memoization loops. Confirm DAG-ness with topological sort first.
- Confusing “longest path” (NP-hard in general graphs) with “longest path in a DAG” (polynomial) — always say “in a DAG” out loud.
Classic Problems
- LC 329 — Longest Increasing Path in a Matrix (implicit DAG).
- LC 1857 — Largest Color Value in a Directed Graph.
- “Longest path in a DAG” — folklore.
21. Game DP (Minimax / Nim / Stone Game)
When To Use
Two-player zero-sum perfect-information game; ask whether the first player wins, or by what margin. Examples: stone game, Nim, predict-the-winner.
State Design
dp[state] = the best score margin (the mover’s total minus the opponent’s total) that the player to move can guarantee from state, assuming both play optimally. Often dp[i][j] with i, j being the two ends of a contested range.
Transition
The current player picks the choice that maximizes their own score. The opponent then plays from the resulting state, also optimally — so the value at the resulting state is what the opponent nets, not the current player. Hence:
dp[i][j] = max(stones[i] - dp[i+1][j], stones[j] - dp[i][j-1])
The -dp[...] flips perspective — the opponent’s optimal score becomes a deduction from the current player’s view.
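The recurrence above can be sketched for the take-from-either-end game (LC 486 style). Here dp[i][j] is read as a score margin for whoever moves on stones[i..j]; the function name first_player_margin is a hypothetical choice for this example.

```python
def first_player_margin(stones):
    """Best (mover's score - opponent's score) on the full row, with optimal play."""
    n = len(stones)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = stones[i]  # one pile left: the mover takes it
    for length in range(2, n + 1):  # interval DP: shorter ranges first
        for i in range(n - length + 1):
            j = i + length - 1
            # After the mover takes an end, the opponent is the mover on the
            # remaining range, so their margin is subtracted.
            dp[i][j] = max(stones[i] - dp[i + 1][j],
                           stones[j] - dp[i][j - 1])
    return dp[0][n - 1]
```

The first player wins (or ties) exactly when the returned margin is ≥ 0.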
Common Pitfalls
- Forgetting the perspective flip: +dp[...] instead of -dp[...]. Produces nonsensical “both players cooperate” answers.
- Confusing “current player wins” with “first player wins”: dp[state] is from the perspective of whoever moves at this state, which may not be the original first player after several moves.
Classic Problems
- LC 486 — Predict the Winner.
- LC 877 — Stone Game.
- LC 1140 — Stone Game II.
- LC 464 — Can I Win (game DP + bitmask).
22. Probability And Expected Value DP
When To Use
Random walks, expected number of steps, probability of reaching a state. Examples: knight probability, dice problems, Markov chains in disguise.
State Design
dp[state] = probability of being in state after the random process, OR expected value of some random variable from state.
Transition
For probability: dp[next] = sum(P(s -> next) × dp[s]) over all predecessors s of next. For expected value (with stopping): E[s] = expected_immediate(s) + sum(P(s -> next) × E[next]) over all successors next, for non-terminal s; E[terminal] = 0.
Complexity
Same as the underlying state-space DP.
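The probability recurrence can be sketched on the knight-on-a-chessboard problem (LC 688 style): push each cell's probability mass to its in-board successors, one layer per move. This is an illustrative sketch; the function name and the layer-by-layer formulation are choices made for this example.

```python
def knight_probability(n, k, row, col):
    """Probability that a knight starting at (row, col) on an n x n board
    is still on the board after k uniformly random moves."""
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    dp = [[0.0] * n for _ in range(n)]  # dp[r][c] = P(on board and at (r, c))
    dp[row][col] = 1.0
    for _ in range(k):
        nxt = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                p = dp[r][c]
                if p == 0.0:
                    continue
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < n and 0 <= nc < n:
                        nxt[nr][nc] += p / 8  # each move has probability 1/8
        dp = nxt
    return sum(map(sum, dp))
```

Mass that leaves the board is simply dropped, so the remaining total is the survival probability.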
Common Pitfalls
- Conflating probability DP and expected-value DP — they have different recurrences; pick the right one for the question.
- Numerical stability: many small probabilities multiplied; use log or rational arithmetic when extreme.
- Infinite expected steps: if there’s a non-zero probability of never reaching the terminal, the expected value is infinite; check reachability first.
Classic Problems
- LC 688 — Knight Probability in Chessboard.
- LC 837 — New 21 Game.
- “Expected number of dice rolls to reach sum N” — folklore.
DP-Recognition Cheat Sheet
The hardest skill in this phase is recognizing that a problem is DP. Here is a battery of signals.
| Signal in problem statement | Likely DP variant |
|---|---|
| “Count number of ways” | Counting DP — sum over choices |
| “Maximum / minimum cost” with sequential choices | Optimization DP |
| “Pick subset with property P” / “partition” | Subset / knapsack |
| “Longest / shortest subsequence” | LIS / LCS family |
| “Edit / transform A into B” | Edit distance family |
| “Each item used at most once” | 0/1 knapsack |
| “Each item can be reused” | Unbounded knapsack |
| “Substring / subarray / contiguous” | 1D DP (often Kadane-like) |
| “Subsequence (non-contiguous)” | LCS / LIS family |
| “Palindromic” | Interval DP, expand-around-center, or LCS(s, rev(s)) |
| “Match a pattern with * / .” | Regex / wildcard DP |
| “Tree” + “subtree answer aggregates” | Tree DP, post-order |
| “N ≤ 20” + “visit all” / “subset” | Bitmask DP |
| “N ≤ 100” + “split into intervals” / “merge intervals” | Interval DP, length-ascending |
| “Two-player game, both optimal” | Game DP, perspective-flip |
| “Probability” / “expected” + “random walk / dice” | Probability/EV DP |
| “Number of digits ≤ 18, range [L, R]” | Digit DP |
| “Acyclic graph + longest/count paths” | DP on DAG |
| “Climbing / hopping with steps {a, b, c}” | 1D DP, Fibonacci-like |
| “Decide YES/NO with budget K” | Reachability DP, often boolean knapsack |
Common DP Bugs
A taxonomy. Each one shows up in at least 30% of submitted DP solutions.
- Wrong base case. dp[0] initialized to 0 when it should be 1 for counting, or 0 when it should be +∞ for min. Check by running tabulated against memoized on N=0, 1, 2.
- Wrong evaluation order. 2D DP iterated in (j, i) order when the recurrence reads dp[i-1][j]. Interval DP iterated in (left, right) instead of (length, left). Bitmask DP iterated in arbitrary mask order.
- Off-by-one between value-array index and DP index. If dp[i] is “first i elements”, the latest element is arr[i-1], not arr[i]. If dp[i] is “ending at index i”, the latest element is arr[i]. Pick one and never mix.
- Missing a choice in the transition. The “skip” / “do nothing” choice is the most-often-forgotten. Without it, you over-constrain the answer.
- Wrong direction in 1D-collapsed knapsack. Left-to-right (unbounded) vs right-to-left (0/1). Silently flipping turns one problem into the other.
- Counting orderings instead of combinations. In coin change variants, the loop nesting (coins outer vs sums outer) determines combinations vs permutations.
- Not handling unreachable states. +∞ propagation: if you compute dp[w] = dp[w - c] + 1 and dp[w-c] = +∞, your dp[w] becomes a large finite number (in fixed-width integer types), i.e. overflow. Use INF = 10^9 + 7 and guard with explicit if dp[w-c] == INF: continue.
- Recursion stack overflow in Python at N > 1000. Convert to iterative, or call sys.setrecursionlimit(10**6) and accept the memory cost.
- Memoizing on mutable arguments. lru_cache requires hashable args; lists / dicts must be converted to tuples / frozensets.
- Wrong perspective flip in game DP. +dp[...] instead of -dp[...]. Both players appear to cooperate in your model.
- Including or excluding the boundary of the table inconsistently. Off-by-one in iterators, inclusive/exclusive bounds.
- Time / space estimate ignoring constants. “O(N · M) at N = M = 10^4” is 10^8: TLE in Python, fine in C++. State the constant honestly.
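The "orderings vs combinations" bug is easiest to see side by side. The sketch below (function names are hypothetical) runs the same recurrence dp[s] += dp[s - c] with the two loop nestings; only the nesting differs.

```python
def count_combinations(coins, amount):
    """Coins in the outer loop: each multiset of coins is counted once."""
    dp = [0] * (amount + 1)
    dp[0] = 1
    for c in coins:
        for s in range(c, amount + 1):
            dp[s] += dp[s - c]
    return dp[amount]

def count_permutations(coins, amount):
    """Sums in the outer loop: every ordering of the coins is counted separately."""
    dp = [0] * (amount + 1)
    dp[0] = 1
    for s in range(1, amount + 1):
        for c in coins:
            if c <= s:
                dp[s] += dp[s - c]
    return dp[amount]

print(count_combinations([1, 2, 5], 5))  # 4
print(count_permutations([1, 2, 5], 5))  # 9
```

With coins outer, a coin can only be appended after all earlier coin types are settled, which fixes a canonical order; with sums outer, any coin may come last, so orderings multiply.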
Mastery Checklist
Before exiting this phase, verify all of these:
- You can derive a state from a recursive brute force in <3 minutes for any DP problem.
- You can write the recurrence (transition) in <2 minutes once the state is fixed.
- You execute the brute → memo → tabulated → space-optimized progression on every DP problem in this phase, without skipping stages.
- You can write tabulated 1D DP (house robber, climbing stairs) in <5 minutes from a blank screen.
- You can write tabulated 2D DP (unique paths, edit distance) in <8 minutes from a blank screen.
- You can space-optimize 2D DP to O(M) on demand, including the right-to-left collapse trick for 0/1 knapsack.
- You can implement LIS at O(N²) and at O(N log N) in <15 minutes total.
- You can implement edit distance with full progression in <25 minutes.
- You can implement house robber III (tree DP) with the (rob, skip) tuple pattern in <15 minutes.
- You can implement burst balloons (interval DP) with the length-ascending iteration in <25 minutes.
- You can implement TSP-style bitmask DP (dp[mask][last]) in <30 minutes.
- You can articulate why iterating for j: for i: in 2D DP can produce garbage (i.e., the topological-order argument) in <30 seconds.
- You can articulate why 0/1 knapsack iterates w right-to-left and unbounded iterates left-to-right in <30 seconds.
- You can articulate the perspective-flip in game DP in <30 seconds.
Exit Criteria
You may move to Phase 6 (Greedy and Mathematical Thinking) when all of the following are true:
- You have completed all ten labs in this phase, with each lab’s mastery criteria checked off.
- You have solved at least 50 unaided DP problems from LeetCode (mix of Medium, Medium-Hard, Hard) and reviewed each via REVIEW_TEMPLATE.md.
- Your unaided success rate on Medium-Hard DP problems is ≥ 65%.
- In a mock interview (phase-11-mock-interviews/), you correctly identify the DP variant within 2 minutes for at least 7 of 10 DP problems and produce the recurrence within 4 minutes for at least 6 of 10.
- You execute the brute → memo → tabulated → space-optimized progression on every DP problem in mocks, even when the interviewer doesn’t ask for all four stages — this is the single discipline of this phase, and skipping it is a phase-failure.
If any of these fails, do another 20–30 DP problems before moving on. Skipping this gate calcifies bad habits that destroy you in Phase 7 (competitive programming) where DP shows up at every turn.
Labs
Hands-on practice. Each lab follows the strict 22-section format and demonstrates the four-stage progression in detail.
- Lab 01 — 1D DP Foundations (House Robber)
- Lab 02 — 2D DP (Unique Paths with Obstacles)
- Lab 03 — 0/1 Knapsack (Partition Equal Subset Sum)
- Lab 04 — Unbounded Knapsack (Coin Change)
- Lab 05 — LIS (Longest Increasing Subsequence)
- Lab 06 — LCS / Edit Distance
- Lab 07 — Palindrome DP (LPS + Min Cuts)
- Lab 08 — Tree DP (House Robber III)
- Lab 09 — Interval DP (Burst Balloons)
- Lab 10 — Bitmask DP (Shortest Path Visiting All Nodes)
Lab 01 — 1D DP Foundations (House Robber)
Goal
Implement House Robber (LC 198) four times — brute recursion, memoized, tabulated, and space-optimized — to internalize the brute → memo → tabulated → space-optimized progression that this entire phase is built around. After this lab you should be able to recognize a 1D DP problem in <60 seconds, derive the state and recurrence in <90 seconds, and produce the O(1)-space final solution from a blank screen in under 5 minutes.
Background Concepts
A 1D DP has state dp[i] indexed by a single integer: the prefix length, the position, or the day. The recurrence reads only O(1) previous values, which is what makes the rolling-array (O(1)-space) trick work. House Robber is the canonical example because it has exactly two choices per state (rob this house or skip), each of which determines the next state cleanly. The prefix formulation f(i) = max(f(i-1), f(i-2) + nums[i-1]) reads two previous values; the tabulated version is a direct loop; the space-optimized version keeps two scalars.
The four-stage progression is the discipline of this lab. Don’t skip stages. The interviewer at staff level routinely asks “show me the recursive version first” specifically to test whether you can derive the recurrence from a brute force. Candidates who memorized the iterative solution but never wrote the recursion fail this question.
Interview Context
House Robber is a top-30 phone-screen DP problem at Amazon, Google, Microsoft, and Meta. Its variants — House Robber II (circular), House Robber III (tree, see Lab 08) — extend it. Bombing this problem on a phone screen is a near-instant rejection at L4. The reason: it has the simplest possible state (a single integer) and the simplest possible recurrence (two-way choice). If you can’t do this one, you can’t do any DP.
Problem Statement
You are a robber planning to rob houses arranged in a line. Each house has a non-negative integer amount of cash, given by nums[i]. You cannot rob two adjacent houses (the alarm system links them). Return the maximum amount of cash you can rob.
Constraints
- 1 ≤ nums.length ≤ 100
- 0 ≤ nums[i] ≤ 400
Clarifying Questions
- Are amounts non-negative? (Yes — given.)
- Can nums be empty? (No, length ≥ 1 by constraint, but always confirm.)
- Are houses arranged in a line or a circle? (Line for LC 198; LC 213 is the circular variant.)
- Can two adjacent houses both be skipped? (Yes — skipping is always allowed.)
- Must we rob at least one house? (No — robbing nothing is allowed if all values are 0; in practice, since amounts are non-negative, the optimum is always ≥ 0.)
Examples
nums = [1, 2, 3, 1] → 4 (rob houses 0 and 2: 1 + 3)
nums = [2, 7, 9, 3, 1] → 12 (rob houses 0, 2, 4: 2 + 9 + 1)
nums = [2, 1, 1, 2] → 4 (rob houses 0 and 3: 2 + 2)
nums = [5] → 5
nums = [0, 0, 0] → 0
Initial Brute Force
At each house, two choices: rob it (and skip the next) or skip it. Recursively try both:
def rob_brute(nums):
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)
Brute Force Complexity
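Each call branches into 2 recursive calls, so the call count is bounded by O(2^N). Because one branch skips two houses, the tight bound is Fibonacci-like, about 1.618^N. At N=100, 2^100 ≈ 1.27 × 10^30, and even 1.618^100 ≈ 7.9 × 10^20 is far beyond any time limit. Space is O(N) for the recursion stack.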
Optimization Path
The brute force is exponential because the same f(i) is recomputed exponentially many times. There are only N+1 distinct values of i, so memoization collapses the work to O(N). From there, tabulation removes the recursion overhead. Finally, since the recurrence reads only dp[i-1] and dp[i-2], we keep two scalars instead of the full array — O(1) space.
Each stage strictly improves on the previous: brute → memo (cache; from O(2^N) to O(N) time), memo → tabulated (loop instead of recursion; same complexity, no stack overhead), tabulated → space-optimized (drop the array; from O(N) to O(1) space).
Final Expected Approach
Define dp[i] = maximum cash robbed from the first i houses. Recurrence:
dp[0] = 0                         (no houses to rob)
dp[1] = nums[0]                   (one house: rob it)
dp[i] = max(dp[i-1],              # skip house i-1
            dp[i-2] + nums[i-1])  # rob house i-1
Answer: dp[N]. Since the recurrence reads only dp[i-1] and dp[i-2], keep two scalars: prev2 and prev1.
Data Structures Used
- A 1D array dp of size N+1 (tabulated).
- Two scalars prev2, prev1 (space-optimized).
- For brute / memo: function call stack and a memoization dict / lru_cache.
Correctness Argument
By induction on i. Base: dp[0] = 0 (correct — no houses). dp[1] = nums[0] (correct — one house, rob it). Inductive step: at step i, the optimal robbery either does or does not rob house i-1. If it does, the remaining is the optimal over the first i-2 houses (since we can’t rob i-1’s neighbors), giving dp[i-2] + nums[i-1]. If it does not, the remaining is the optimal over the first i-1 houses, giving dp[i-1]. Taking the max covers both cases — this exhausts the choice space, so the recurrence is correct.
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force | O(2^N) | O(N) (stack) |
| Memoized | O(N) | O(N) (cache + stack) |
| Tabulated | O(N) | O(N) |
| Space-optimized | O(N) | O(1) |
Implementation Requirements
All four stages are required.
# ---- Stage 1: Brute force ----
def rob_brute(nums):
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)

# ---- Stage 2: Memoized ----
from functools import lru_cache

def rob_memo(nums):
    @lru_cache(None)
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)

# ---- Stage 3: Tabulated ----
def rob_tab(nums):
    n = len(nums)
    if n == 0: return 0
    dp = [0] * (n + 1)
    dp[1] = nums[0]
    for i in range(2, n + 1):
        dp[i] = max(dp[i-1], dp[i-2] + nums[i-1])
    return dp[n]

# ---- Stage 4: Space-optimized ----
def rob(nums):
    prev2, prev1 = 0, 0
    for x in nums:
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1
Tests
- [] → 0 (defensive, even if constraint disallows).
- [5] → 5.
- [5, 1] → 5.
- [1, 2, 3, 1] → 4.
- [2, 7, 9, 3, 1] → 12.
- [0, 0, 0, 0] → 0.
- [400, 400, 400] → 800 (rob ends).
- All four implementations should produce identical results: write a randomized stress comparator on nums of length 1..15 and check rob_brute == rob == rob_tab == rob_memo.
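A randomized stress comparator along those lines might look like the sketch below. To stay self-contained it repeats only the brute-force and space-optimized implementations; in practice rob_memo and rob_tab would be added to the same comparison. The stress function, its parameters, and the fixed seed are assumptions for this example.

```python
import random

def rob_brute(nums):
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)

def rob(nums):
    prev2 = prev1 = 0
    for x in nums:
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1

def stress(trials=500, max_len=15, max_val=400, seed=0):
    """Compare the fast solution with the brute force on small random inputs."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        nums = [rng.randint(0, max_val) for _ in range(rng.randint(1, max_len))]
        expected, got = rob_brute(nums), rob(nums)
        assert expected == got, f"mismatch on {nums}: brute={expected}, fast={got}"
    return trials
```

Keeping the inputs short matters twice over: the brute force stays fast, and a failing case is small enough to trace by hand.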
Follow-up Questions
- “What if houses are in a circle?” (LC 213, House Robber II) → Run the line algorithm twice: once excluding house 0, once excluding house N-1; take the max. Handle N=1 separately, since both slices are then empty.
- “What if houses are nodes of a binary tree?” (LC 337, House Robber III) → Tree DP with a (rob, skip) tuple per node. See Lab 08.
- “Reconstruct which houses were robbed.” → Track back-pointers in the tabulated version, or walk dp backwards from i = N: if dp[i] > dp[i-1], house i-1 was robbed, so jump to i-2; otherwise step to i-1.
- “What if the no-rob constraint extends to k-apart instead of adjacent?” → dp[i] = max(dp[i-1], dp[i-k-1] + nums[i-1]), treating dp at negative indices as 0.
- “What if amounts can be negative?” → Same recurrence: dp[i-2] + nums[i-1] may be less than dp[i-1], so the max correctly drops it. The base case must become dp[1] = max(0, nums[0]) if robbing nothing is allowed.
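Two of these follow-ups can be sketched briefly: the circular variant (run the line algorithm twice) and reconstruction by walking dp backwards. The names rob_line, rob_circular, and rob_with_houses are hypothetical, and non-negative amounts are assumed.

```python
def rob_line(nums):
    prev2 = prev1 = 0
    for x in nums:
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1

def rob_circular(nums):
    """House 0 and house N-1 are adjacent, so at most one of them is robbed."""
    if len(nums) == 1:
        return nums[0]  # both slices below would be empty
    return max(rob_line(nums[1:]), rob_line(nums[:-1]))

def rob_with_houses(nums):
    """Return (best total, sorted indices of the robbed houses)."""
    n = len(nums)
    dp = [0] * (n + 1)
    if n:
        dp[1] = nums[0]
    for i in range(2, n + 1):
        dp[i] = max(dp[i - 1], dp[i - 2] + nums[i - 1])
    chosen, i = [], n
    while i >= 1:
        if dp[i] > dp[i - 1]:  # house i-1 must have been robbed
            chosen.append(i - 1)
            i -= 2             # its neighbor i-2 is therefore excluded
        else:
            i -= 1
    return dp[n], chosen[::-1]
```

For nums = [2, 7, 9, 3, 1], rob_with_houses returns (12, [0, 2, 4]).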
Product Extension
Variations of this problem appear in real systems: scheduling non-conflicting jobs (interval scheduling with profit), selecting non-overlapping ad slots, and assigning tasks to time-slots with cooldown. The 1D DP framework generalizes when “no two adjacent” becomes “no two within window K” or “must wait at least T”.
Language/Runtime Follow-ups
- Python: lru_cache makes memoization a one-line addition. At N>1000, the default recursion limit (1000) is exceeded; raise it with sys.setrecursionlimit(10**6) or use the tabulated version. The space-optimized version is idiomatic and fast.
- Java: use int[] dp for tabulated; Arrays.fill(dp, -1) + recursion for memoization. Java’s default thread stack is platform-dependent (commonly 512 KB to 1 MB), so recursion overflows at roughly N ≈ 10^4.
- Go: tabulated is idiomatic; Go has no lru_cache, so memoization needs a manual map[int]int or []int.
- C++: tabulated with vector<int>; memoization with a vector<int> memo(N, -1) and a recursive helper.
- JS/TS: same idiom as Python but no lru_cache; use Map for memoization.
Common Bugs
- Returning dp[N-1] instead of dp[N] (or vice versa): depends on whether dp[i] indexes “first i houses” or “ending at i”. Pick one convention and stick to it.
- Initializing dp[0] = nums[0] and dp[1] = max(nums[0], nums[1]): works, but only if you handle N=1 separately. The cleaner convention is dp[0]=0, dp[1]=nums[0].
- Off-by-one in dp[i-2] + nums[i-1] vs dp[i-2] + nums[i-2]: depends on the index convention. Verify on [5].
- Forgetting that nums can be empty: guard with if not nums: return 0 even though constraints say N ≥ 1.
- Space-optimized version: swapping prev2, prev1 = prev1, max(prev1, prev2 + x) in the wrong order. Tuple-assignment in Python evaluates the RHS first, so this is correct; in Java/C++ you need an explicit temp.
Debugging Strategy
When the answer is wrong by a small amount: print the entire dp array for nums = [2, 7, 9, 3, 1] (expected dp = [0, 2, 7, 11, 11, 12]). If it differs, trace the iteration step where dp first deviates and inspect the recurrence at that index. When the answer is wildly wrong (negative, or much smaller): suspect index off-by-one or an if condition that’s flipped. When TLE: confirm you’re not running the brute force.
Mastery Criteria
- Recognized House Robber as a 1D DP problem within 60 seconds.
- Wrote the brute recursive formulation in <2 minutes from cold start.
- Added @lru_cache to produce the memoized version in <30 seconds.
- Wrote the tabulated version in <3 minutes from blank screen, passing all five test cases first try.
- Wrote the space-optimized version in <2 minutes after the tabulated.
- Stated O(N) time and O(1) space unprompted.
- Articulated the inductive correctness argument in <30 seconds.
- Solved LC 198 unaided in <8 minutes total (all four stages).
- Solved LC 213 (House Robber II) unaided in <12 minutes by running the line algorithm twice.
Lab 02 — 2D DP (Unique Paths with Obstacles)
Goal
Solve Unique Paths II (LC 63) with the full brute → memo → tabulated → space-optimized progression. Internalize the canonical 2D DP loop structure (for i: for j:) and the rolling-row trick that reduces O(m · n) space to O(n). After this lab you should be able to write any grid-DP problem from a blank screen in <8 minutes and apply the rolling-row collapse on demand.
Background Concepts
A 2D DP has state dp[i][j] indexed by two integers. For grid problems, (i, j) is a cell, and the recurrence aggregates over the (at most) two predecessors (i-1, j) and (i, j-1). Because each row depends only on the previous row, the table can be rolled down to a single 1D array of length n (one entry per column): memory drops from O(m · n) to O(n), with identical answers.
The grid-DP family is the cleanest 2D DP family because the dependency graph is trivially layered (row by row). It is the right place to learn the rolling-row mechanic before applying it to harder 2D DPs (knapsack, edit distance, LCS).
Interview Context
Unique Paths II is a top-50 Medium DP problem at Microsoft, Amazon, and Bloomberg. The non-obstacle variant (LC 62) shows up at every L3 phone screen. The obstacle variant adds a wrinkle: cells with grid[i][j] == 1 are blocked and contribute 0. Candidates who try a closed-form combinatorial answer (C(m+n-2, m-1)) get stuck the moment obstacles appear; the only general approach is DP. Showing all four stages (brute, memo, tabulated, O(n)-space) signals senior fluency.
Problem Statement
Given an m × n grid obstacleGrid where each cell is either 0 (open) or 1 (obstacle), count the number of distinct paths from (0, 0) to (m-1, n-1). Movement is restricted to right or down by one cell. If the start or end is blocked, the answer is 0.
Constraints
- 1 ≤ m, n ≤ 100
- obstacleGrid[i][j] is 0 or 1.
- The result is guaranteed to fit in a 32-bit signed integer.
Clarifying Questions
- Can the start or end be an obstacle? (Yes; the answer is 0 if so.)
- Are diagonal moves allowed? (No; only right and down.)
- Are paths considered distinct if they share intermediate cells? (Yes; only the sequence of moves matters.)
- Modular arithmetic required? (No; fits in int32.)
- Is m=1, n=1 valid? (Yes; answer is 1 if open, 0 if blocked.)
Examples
[[0,0,0],
 [0,1,0],
 [0,0,0]] → 2
[[0,1],
 [0,0]] → 1
[[1]] → 0 (start blocked)
[[0]] → 1
Initial Brute Force
At each open cell, try moving right or down recursively:
def paths_brute(grid):
    m, n = len(grid), len(grid[0])
    def f(i, j):
        if i >= m or j >= n or grid[i][j] == 1:
            return 0
        if i == m - 1 and j == n - 1:
            return 1
        return f(i + 1, j) + f(i, j + 1)
    return f(0, 0)
Brute Force Complexity
Each call branches into two; depth is m + n - 2. Worst-case calls: 2^(m+n-2). At m=n=100, that’s 2^198 ≈ 4 × 10^59 — TLE. Space is O(m+n) for the recursion stack.
Optimization Path
The brute force recomputes f(i, j) exponentially. There are only m × n distinct (i, j) pairs, so memoization collapses time to O(m · n). Tabulation replaces recursion with a row-major loop. Since dp[i][j] reads dp[i-1][j] and dp[i][j-1], the previous row plus the in-progress row are sufficient, so the table collapses to a single 1D array iterated left-to-right. There is no same-row conflict: when dp[j] is updated, dp[j-1] has already been updated and holds the current row’s dp[i][j-1], while dp[j] still holds the previous row’s dp[i-1][j].
Final Expected Approach
Define dp[i][j] = number of paths from (0, 0) to (i, j). Recurrence:
dp[0][0] = 1 if grid[0][0] == 0 else 0
dp[i][j] = 0                          if grid[i][j] == 1
         = dp[i-1][j] + dp[i][j-1]    otherwise (treat out-of-bounds as 0)
Roll to 1D: dp[j] += dp[j-1] for each row, with dp[j] = 0 if blocked.
Data Structures Used
- 2D array dp of size m × n (tabulated).
- 1D array dp of size n (rolled).
- For brute / memo: recursion stack + lru_cache.
Correctness Argument
Every path to (i, j) arrives via the cell above or the cell to the left. The number of paths to (i, j) is therefore the sum of the path counts of those two predecessors, where an out-of-bounds or blocked predecessor contributes 0. This holds because the two groups of paths are disjoint (the last move differs) and together exhaust all paths. Blocked cells are set to 0 directly. Base: dp[0][0] = 1 if open, else 0. Induction over the row-major topological order proves correctness for all cells.
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force | O(2^(m+n)) | O(m+n) |
| Memoized | O(m · n) | O(m · n) |
| Tabulated | O(m · n) | O(m · n) |
| Space-optimized | O(m · n) | O(n) |
Implementation Requirements
All four stages.
# ---- Stage 1: Brute force ----
def paths_brute(grid):
    m, n = len(grid), len(grid[0])
    def f(i, j):
        if i >= m or j >= n or grid[i][j] == 1:
            return 0
        if i == m - 1 and j == n - 1:
            return 1
        return f(i + 1, j) + f(i, j + 1)
    return f(0, 0)

# ---- Stage 2: Memoized ----
from functools import lru_cache

def paths_memo(grid):
    m, n = len(grid), len(grid[0])
    @lru_cache(None)
    def f(i, j):
        if i >= m or j >= n or grid[i][j] == 1: return 0
        if i == m - 1 and j == n - 1: return 1
        return f(i + 1, j) + f(i, j + 1)
    return f(0, 0)

# ---- Stage 3: Tabulated ----
def paths_tab(grid):
    m, n = len(grid), len(grid[0])
    if grid[0][0] == 1 or grid[m-1][n-1] == 1: return 0
    dp = [[0] * n for _ in range(m)]
    dp[0][0] = 1
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:
                dp[i][j] = 0
                continue
            if i > 0: dp[i][j] += dp[i-1][j]
            if j > 0: dp[i][j] += dp[i][j-1]
    return dp[m-1][n-1]

# ---- Stage 4: Space-optimized (1D rolled) ----
def uniquePathsWithObstacles(grid):
    m, n = len(grid), len(grid[0])
    if grid[0][0] == 1: return 0
    dp = [0] * n
    dp[0] = 1
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:
                dp[j] = 0
            elif j > 0:
                dp[j] += dp[j-1]
    return dp[n-1]
Tests
- [[0,0,0],[0,1,0],[0,0,0]] → 2.
- [[0,1],[0,0]] → 1.
- [[1]] → 0.
- [[0]] → 1.
- [[0,0],[1,1],[0,0]] → 0 (no path past the blocking row).
- [[0,0,0,0,0]] → 1.
- [[0],[0],[0]] → 1.
- m=n=100 with random 10% obstacles: performance test.
Follow-up Questions
- “What if there are diagonal moves?” → dp[i][j] += dp[i-1][j-1] as a third predecessor.
- “Each cell has a cost; minimize total path cost.” → Min-path-sum (LC 64): dp[i][j] = grid[i][j] + min(dp[i-1][j], dp[i][j-1]), i.e. min over the predecessors instead of +, plus the cell’s own cost.
- “K obstacles can be removed.” → 3D DP dp[i][j][k] = paths to (i, j) having removed k obstacles.
- “Reconstruct one valid path.” → Backtrack through dp from the target; at each cell pick a predecessor with non-zero contribution.
- “Grid is enormous (m=n=10^9) but obstacles are sparse.” → Combinatorial answer (C(m+n-2, n-1)) minus inclusion-exclusion over obstacles. Out of scope here.
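The min-path-sum follow-up (LC 64) reuses the rolled-row structure almost unchanged. A minimal sketch, assuming a non-empty rectangular grid of non-negative costs; the function name min_path_sum is a hypothetical choice:

```python
def min_path_sum(grid):
    """Minimum cost of a right/down path from the top-left to the bottom-right cell."""
    m, n = len(grid), len(grid[0])
    INF = float("inf")
    dp = [INF] * n  # dp[j] = best cost to reach column j in the current row
    dp[0] = 0       # virtual cell "above" the start, so the start costs grid[0][0]
    for i in range(m):
        for j in range(n):
            best = dp[j]                    # from above (previous row's value)
            if j > 0:
                best = min(best, dp[j - 1])  # from the left (current row's value)
            dp[j] = best + grid[i][j]
    return dp[n - 1]
```

Compared with the counting version, the sum over predecessors becomes a min, and each cell adds its own cost.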
Product Extension
Routing on a city grid with road closures, robot path planning with obstacles, dependency-graph traversal with disabled edges. The grid-DP framework generalizes to any DAG where the topological order is row-major.
Language/Runtime Follow-ups
- Python: [[0]*n for _ in range(m)] allocates correctly; [[0]*n]*m shares row references, a classic bug. Use a comprehension.
- Java: int[][] dp = new int[m][n]; zero-initializes by default.
- Go: pre-allocate the slice-of-slices explicitly.
- C++: vector<vector<int>> dp(m, vector<int>(n, 0));.
- JS/TS: Array.from({length: m}, () => new Array(n).fill(0)) to avoid the shared-reference trap.
Common Bugs
- Shared row references in Python: dp = [[0]*n]*m makes all rows alias the same list. Use a comprehension.
- Forgetting to check grid[0][0] == 1: if the start is blocked, the answer is 0, but dp[0][0] = 1 would propagate non-zero counts through the grid.
- Using if i > 0 and j > 0 instead of two separate ifs: silently misses one of the two predecessors.
- Iterating columns outer, rows inner: works for this problem since dp[i][j] only reads upward and leftward, but breaks the rolled-1D version.
- Rolled 1D version: forgetting to set dp[j] = 0 on obstacle; the old non-zero value persists from the previous row.
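The shared-row bug from the first item is quick to demonstrate. In this small sketch (helper names are hypothetical), a single write to one row of the aliased table shows up in every row:

```python
def make_table_aliased(m, n):
    # BUG: the outer * copies the *reference* to one inner list m times
    return [[0] * n] * m

def make_table(m, n):
    # Correct: the comprehension builds a fresh inner list per row
    return [[0] * n for _ in range(m)]

bad = make_table_aliased(3, 2)
bad[0][0] = 7
print(bad)   # [[7, 0], [7, 0], [7, 0]]

good = make_table(3, 2)
good[0][0] = 7
print(good)  # [[7, 0], [0, 0], [0, 0]]
```

A quick check during debugging is dp[0] is dp[1], which is True only for the aliased table.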
Debugging Strategy
Print the full dp table for the 3×3 obstacle example. Expected:
1 1 1
1 0 1
1 1 2
If yours diverges at row 1, suspect the obstacle handling. If at row 2, suspect the dp[j] = 0 reset. For the rolled-1D version, print dp after each row.
Mastery Criteria
- Recognized this as a 2D grid DP within 60 seconds.
- Wrote the brute recursion in <2 minutes.
- Wrote the tabulated 2D version in <5 minutes from blank screen.
- Performed the rolling-row collapse to 1D in <2 minutes from the tabulated version.
- Stated O(m·n) time and O(n) space for the final solution unprompted.
- Articulated why the rolled-1D version iterates left-to-right (no same-row conflict).
- Solved LC 62 (no obstacles) in <5 minutes.
- Solved LC 63 unaided in <12 minutes total.
- Solved LC 64 (min path sum) in <8 minutes by replacing the sum over predecessors with min and adding each cell’s cost.
Lab 03 — 0/1 Knapsack (Partition Equal Subset Sum)
Goal
Solve Partition Equal Subset Sum (LC 416) by reducing it to 0/1 knapsack. Internalize the right-to-left iteration that makes the 1D-collapsed knapsack correct, and articulate why left-to-right iteration would silently turn it into unbounded knapsack. After this lab you should recognize any subset-sum / partition / target-sum / select-with-budget problem as 0/1 knapsack within 90 seconds.
Background Concepts
0/1 knapsack: N items each with weight w_i and value v_i; pick a subset with total weight ≤ W maximizing total value. The 2D DP has state dp[i][w] = max value using first i items with capacity w. The 1D-collapsed version uses dp[w] and iterates w from W down to w_i — the right-to-left iteration is what prevents an item from being reused within the same outer iteration.
Subset sum is 0/1 knapsack with v_i = w_i and a boolean dp instead of integer-valued. Partition equal subset sum reduces to “is there a subset summing to total / 2?”; if total is odd, return false immediately.
Interview Context
Partition Equal Subset Sum is a top-25 Medium DP problem at Amazon and Microsoft. The 0/1 knapsack pattern shows up in disguise constantly: target sum (LC 494), ones and zeroes (LC 474), last stone weight II (LC 1049), tallest billboard (LC 956). Recognizing the reduction is half the battle. The other half is the right-to-left iteration trick — getting that wrong is one of the most common DP bugs across the entire interview corpus.
Problem Statement
Given a non-empty array nums of positive integers, determine whether it can be partitioned into two subsets with equal sums.
Constraints
- 1 ≤ nums.length ≤ 200
- 1 ≤ nums[i] ≤ 100
So the maximum total sum is 200 × 100 = 20,000, and the target is at most 10,000. The 2D DP has 200 × 10001 = 2 × 10^6 cells — comfortable.
Clarifying Questions
- Are elements positive? (Yes — given.)
- Must the partition use all elements? (Yes — that’s what “partition” means.)
- Is the empty subset allowed on either side? (Yes if total is 0, vacuously true. Not the case here since nums[i] ≥ 1.)
- Are duplicates allowed? (Yes; they’re treated as separate items.)
- Return value: bool (true / false).
Examples
[1, 5, 11, 5] → true (1+5+5 == 11)
[1, 2, 3, 5] → false (total=11, odd)
[1, 2, 5] → false (total=8, target=4, no subset sums to 4)
[2, 2, 1, 1] → true (2+1 == 2+1)
[100] → false (total=100, target=50, no subset)
Initial Brute Force
For each element, recurse on “include it” and “skip it”:
def can_partition_brute(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    def f(i, remain):
        if remain == 0: return True
        if i == len(nums) or remain < 0: return False
        return f(i + 1, remain - nums[i]) or f(i + 1, remain)
    return f(0, target)
Brute Force Complexity
O(2^N) time, O(N) stack. At N=200, 2^200 — completely infeasible.
Optimization Path
There are only N × (target + 1) distinct (i, remain) pairs, so memoization gives O(N · target) time and space. Tabulation replaces recursion with a 2D loop. Since dp[i][w] only reads dp[i-1][...], roll to 1D dp[w] — but iterate w right-to-left so that each item is considered at most once per outer iteration.
The right-to-left direction is the defining trick of 0/1 knapsack. If we iterate left-to-right, then dp[w - w_i] may have already been updated to include item i from the current outer iteration; we’d then re-include item i, turning the algorithm into unbounded knapsack.
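The effect of the iteration direction is easy to observe on a one-item input. In the sketch below (function names are hypothetical), the two functions differ only in the direction of the inner loop:

```python
def subset_sum_01(nums, target):
    """Each item usable at most once: inner loop runs right-to-left."""
    dp = [False] * (target + 1)
    dp[0] = True
    for x in nums:
        for w in range(target, x - 1, -1):
            dp[w] = dp[w] or dp[w - x]  # dp[w - x] is still the pre-x value
    return dp[target]

def subset_sum_unbounded(nums, target):
    """Same recurrence, left-to-right: items may be reused any number of times."""
    dp = [False] * (target + 1)
    dp[0] = True
    for x in nums:
        for w in range(x, target + 1):
            dp[w] = dp[w] or dp[w - x]  # dp[w - x] may already include x
    return dp[target]

print(subset_sum_01([3], 6))         # False: the single 3 cannot be used twice
print(subset_sum_unbounded([3], 6))  # True: 3 + 3, the item was reused
```

With nums = [3], the left-to-right loop sets dp[3] first and then reads it again when computing dp[6], which is exactly the silent reuse described above.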
Final Expected Approach
Reduce to subset sum: target = total / 2 (or return false if total is odd).
dp[w] = True if some subset sums to exactly w, considering items processed so far.
dp[0] = True (empty subset sums to 0).
For each num x in nums:
    for w in range(target, x - 1, -1):
        dp[w] = dp[w] or dp[w - x]
Answer: dp[target]
Data Structures Used
- 2D dp[N+1][target+1] boolean array (tabulated).
- 1D dp[target+1] boolean array (space-optimized).
- For brute / memo: recursion + lru_cache.
Correctness Argument
Inductive on items processed. dp[w] = True iff some subset of items processed so far sums to w. Base: dp[0] = True (empty subset). Inductive step: when we process item x, the new dp[w] is True iff (a) it was True before (subset not using x sums to w), OR (b) dp[w - x] was True before processing x (subset summing to w - x plus item x). The right-to-left iteration ensures we read the previous dp[w - x], not the in-iteration one. Termination: we want dp[target] after all items are processed.
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force | O(2^N) | O(N) |
| Memoized | O(N · target) | O(N · target) |
| Tabulated | O(N · target) | O(N · target) |
| Space-optimized | O(N · target) | O(target) |
For LC 416: N≤200, target≤10000, so ~2×10^6 ops — fast.
Implementation Requirements
All four stages.
# ---- Stage 1: Brute force ----
def can_partition_brute(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    def f(i, remain):
        if remain == 0: return True
        if i == len(nums) or remain < 0: return False
        return f(i + 1, remain - nums[i]) or f(i + 1, remain)
    return f(0, target)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def can_partition_memo(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    @lru_cache(None)
    def f(i, remain):
        if remain == 0: return True
        if i == len(nums) or remain < 0: return False
        return f(i + 1, remain - nums[i]) or f(i + 1, remain)
    return f(0, target)

# ---- Stage 3: Tabulated 2D ----
def can_partition_tab(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    n = len(nums)
    dp = [[False] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = True                 # the empty subset sums to 0
    for i in range(1, n + 1):
        for w in range(1, target + 1):
            dp[i][w] = dp[i-1][w]       # skip item i
            if w >= nums[i-1]:
                dp[i][w] = dp[i][w] or dp[i-1][w - nums[i-1]]   # take item i
    return dp[n][target]

# ---- Stage 4: Space-optimized 1D ----
def canPartition(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    dp = [False] * (target + 1)
    dp[0] = True
    for x in nums:
        for w in range(target, x - 1, -1):  # RIGHT-TO-LEFT
            dp[w] = dp[w] or dp[w - x]
    return dp[target]
Tests
- [1, 5, 11, 5] → True.
- [1, 2, 3, 5] → False (odd total).
- [1, 2, 5] → False (no valid subset).
- [2, 2, 1, 1] → True.
- [100] → False.
- [1, 1] → True.
- N=200, all nums[i]=1 → True (target=100; pick 100 of them).
- All four implementations should produce identical bool results; check with a randomized comparator on N≤15.
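The randomized comparator can be sketched as follows. This is one possible harness, not the program's official one: it checks the 1D DP against an exhaustive oracle, and the names can_partition_dp, can_partition_oracle, and run_comparator are hypothetical. Extending it to the other stages is a matter of adding them to the comparison.

```python
import random
from itertools import combinations


def can_partition_dp(nums):
    """Space-optimized 0/1 knapsack (right-to-left inner loop)."""
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    dp = [False] * (target + 1)
    dp[0] = True
    for x in nums:
        for w in range(target, x - 1, -1):
            dp[w] = dp[w] or dp[w - x]
    return dp[target]


def can_partition_oracle(nums):
    """Exhaustive oracle: try every subset (only viable for tiny N)."""
    total = sum(nums)
    if total % 2:
        return False
    return any(sum(c) == total // 2
               for r in range(len(nums) + 1)
               for c in combinations(nums, r))


def run_comparator(trials=300, seed=0):
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        nums = [rng.randint(1, 20) for _ in range(rng.randint(1, 10))]
        assert can_partition_dp(nums) == can_partition_oracle(nums), nums
    return True
```

On a mismatch the assertion message carries the failing input, which is usually small enough to trace by hand.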
Follow-up Questions
- “Return one valid subset, not just yes/no.” → Track parent pointers in the 2D DP; reconstruct by walking backwards through (i, w).
- “Partition into K equal subsets.” (LC 698) → 0/1-knapsack-style DP becomes intractable; use bitmask DP or backtracking with pruning.
- “Target sum: how many ways to assign +/- to each number to total exactly T?” (LC 494) → Reduce to subset sum: count subsets summing to (total + T) / 2.
- “Minimum subset sum difference.” (LC 1049) → Find the largest reachable s ≤ total/2; the answer is total - 2s.
- “What if nums[i] can be huge (up to 10^9)?” → The table is indexed by sum, so knapsack space blows up. Divide every value by the common GCD if that shrinks the target enough, or use meet-in-the-middle over the two halves of the array when N is small (O(2^(N/2))); the problem is NP-hard in general.
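The Target Sum reduction from the list above can be sketched like this. It is an illustrative version under the stated reduction (count subsets summing to (total + T) / 2), assuming non-negative inputs; the function name find_target_sum_ways is hypothetical.

```python
def find_target_sum_ways(nums, target):
    """Count +/- assignments totalling `target` via subset-sum counting.

    If P is the sum of the numbers given '+', then P - (total - P) = target,
    so P = (total + target) / 2 and we count subsets summing to P.
    """
    total = sum(nums)
    if abs(target) > total or (total + target) % 2:
        return 0  # unreachable, or P would not be an integer
    goal = (total + target) // 2
    dp = [0] * (goal + 1)
    dp[0] = 1  # one way to reach 0: the empty subset
    for x in nums:
        for w in range(goal, x - 1, -1):  # right-to-left: each number used once
            dp[w] += dp[w - x]
    return dp[goal]


print(find_target_sum_ways([1, 1, 1, 1, 1], 3))  # 5
```

The only change from the boolean DP is that `or` becomes `+=`; the iteration direction stays right-to-left.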
Product Extension
Resource allocation (split a budget across two teams equally), load balancing (split a workload across two workers), and “is there a subset with this exact total?” appear in billing systems, accounting reconciliation, and cluster-resource schedulers.
Language/Runtime Follow-ups
- Python: dp = [False] * (target + 1) is fine; the inner loop’s range(target, x - 1, -1) is the canonical right-to-left form.
- Java: boolean[] dp = new boolean[target + 1]; defaults to false. For a bit-parallel speedup (roughly 64x fewer word operations), note that java.util.BitSet has no shift operation; a BigInteger works instead: dp = dp.or(dp.shiftLeft(x)) updates the entire row in O(target / 64) word operations.
- Go: make([]bool, target+1) and a manual reversed loop.
- C++: vector<bool> is bit-packed; bitset<10001> is faster but fixed-size.
- JS/TS: new Uint8Array(target + 1) is zero-initialized, which avoids the trap where new Array(target + 1) holds empty slots that read as undefined rather than false.
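The bit-parallel row update mentioned for Java and C++ also has a compact Python form, because Python ints are arbitrary-precision and can serve as a bitset. A minimal sketch, with the hypothetical name can_partition_bits:

```python
def can_partition_bits(nums):
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    reachable = 1  # bit w is set iff some subset sums to w; bit 0 is the empty subset
    for x in nums:
        # The shift adds x to every previously reachable sum in one operation.
        # The right-hand side is computed from the old value, so x is used at most once.
        reachable |= reachable << x
    return bool((reachable >> target) & 1)


print(can_partition_bits([1, 5, 11, 5]))  # True
print(can_partition_bits([1, 2, 5]))      # False
```

This keeps sums above target as well; masking with (1 << (target + 1)) - 1 after each shift would bound the integer's size if that matters.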
Common Bugs
- Iterating w left-to-right in the 1D version: turns 0/1 into unbounded and produces spurious True answers.
- Forgetting the odd-total short circuit: with target = total // 2 this returns wrong answers, not just slow ones (for [1, 2], total = 3, target = 1, and dp[1] is True).
- Using dp[w] = dp[w-x] instead of dp[w] or dp[w-x]: wipes out previously-set True values.
- Off-by-one in range(target, x - 1, -1): the loop should include w == x (since dp[x] = dp[x] or dp[0] = True for any x ≤ target).
- Forgetting dp[0] = True: it must be set once, before any item is processed; otherwise every dp[w] stays False.
Debugging Strategy
For [1, 5, 11, 5]: after processing [1], dp = [T, T, F, F, F, F, F, F, F, F, F, F] (indexes 0..11). After [1, 5]: dp[6] = T (1+5). After [1, 5, 11]: dp[11] = T. Print dp after each item; if dp[target] becomes True earlier than expected, you’re allowing item-reuse (left-to-right bug).
Mastery Criteria
- Recognized partition-equal-subset as 0/1 knapsack within 90 seconds.
- Wrote the reduction to subset sum (target = total / 2) before any code.
- Wrote the brute recursion in <2 minutes.
- Wrote the 2D tabulated version in <5 minutes.
- Performed the 1D collapse with right-to-left iteration in <2 minutes.
- Articulated why left-to-right would be wrong (item reuse → unbounded knapsack) in <30 seconds.
- Stated O(N · target) time and O(target) space unprompted.
- Solved LC 416 unaided in <12 minutes (all four stages).
- Solved LC 494 (Target Sum) in <12 minutes via the reduction.
Lab 04 — Unbounded Knapsack (Coin Change)
Goal
Solve Coin Change (LC 322 — minimum coins) and Coin Change II (LC 518 — count combinations) with the full four-stage progression. Internalize the left-to-right iteration that makes 1D-collapsed unbounded knapsack correct, and the loop-nesting trick that distinguishes counting combinations from counting permutations. After this lab you can solve any “unlimited supply” knapsack in <10 minutes from cold start.
Background Concepts
Unbounded knapsack: items can be reused any number of times. The 1D-collapsed DP iterates w left-to-right, the opposite of 0/1 knapsack. That single direction-change is the entire mechanical difference. The semantic difference: when we read dp[w - c] in the left-to-right pass, it has already been updated this round to include coin c zero or more times — so this round’s update can stack another c on top, achieving “use c multiple times”.
Coin Change has two flavors. LC 322 asks for the minimum number of coins to reach amount; the recurrence is dp[w] = min(dp[w], dp[w - c] + 1), initialized to +∞ with dp[0] = 0. LC 518 asks for the count of combinations summing to amount; the recurrence is dp[w] += dp[w - c], initialized to dp[0] = 1. The combinations-vs-permutations trap: with coins outer, sums inner, you count combinations (each combination of coins is counted once regardless of order); with sums outer, coins inner, you count permutations (different orderings of the same coins count separately).
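The combinations-vs-permutations trap is easiest to see by running both nestings side by side. A small sketch, with hypothetical function names:

```python
def count_combinations(coins, amount):
    dp = [0] * (amount + 1)
    dp[0] = 1
    for c in coins:                     # coins outer
        for w in range(c, amount + 1):  # sums inner, left-to-right
            dp[w] += dp[w - c]
    return dp[amount]


def count_permutations(coins, amount):
    dp = [0] * (amount + 1)
    dp[0] = 1
    for w in range(1, amount + 1):      # sums outer
        for c in coins:                 # coins inner
            if c <= w:
                dp[w] += dp[w - c]
    return dp[amount]


print(count_combinations([1, 2], 3))  # 2: {1,1,1} and {1,2}
print(count_permutations([1, 2], 3))  # 3: 1+1+1, 1+2, 2+1
```

With coins outer, every combination is built in a fixed coin order, so reorderings never appear; with sums outer, any coin can be the last one added to each sum, so each ordering is counted.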
Interview Context
Coin Change (LC 322) is a top-15 phone-screen DP problem at every major company. Coin Change II is asked roughly half as often but tests the deeper combinations-vs-permutations distinction. Bombing LC 322 at L4+ is a near-instant rejection. Senior interviewers often follow up with LC 518 specifically to test whether you understand why the loop nesting matters — the hand-wavy candidate is filtered by this question.
Problem Statement
LC 322 (minimum coins): Given coins of distinct denominations and an integer amount, return the fewest number of coins needed to make up amount. Return -1 if unreachable. Each coin denomination has unlimited supply.
LC 518 (count combinations): Given coins and amount, return the number of distinct combinations that sum to amount.
Constraints
- 1 ≤ coins.length ≤ 12 (LC 322) / 300 (LC 518).
- 1 ≤ coins[i] ≤ 2^31 − 1 (LC 322) / 5000 (LC 518).
- 0 ≤ amount ≤ 10^4 (LC 322) / 5000 (LC 518).
- LC 518: answer fits in a signed 32-bit integer.
Clarifying Questions
- Are coins distinct? (Yes — given.)
- Can each coin be used multiple times? (Yes — unlimited supply; this is what makes it unbounded.)
- Is amount=0 valid? (Yes; minimum coins = 0; combinations = 1 — the empty combination.)
- LC 518: is [1, 2] different from [2, 1]? (No; combinations only, not permutations.)
- Can coins exceed amount? (Yes; they are just unusable for that amount.)
Examples
LC 322:
coins=[1,2,5], amount=11 → 3 (5+5+1)
coins=[2], amount=3 → -1 (unreachable)
coins=[1], amount=0 → 0
coins=[1,2,5], amount=100 → 20 (twenty 5-coins)
LC 518:
coins=[1,2,5], amount=5 → 4 ([5], [2,2,1], [2,1,1,1], [1×5])
coins=[2], amount=3 → 0
coins=[10], amount=10 → 1
Initial Brute Force (LC 322)
At each step, try every coin:
def coinChange_brute(coins, amount):
    def f(remain):
        if remain == 0: return 0
        if remain < 0: return float('inf')   # overshoot: unreachable
        best = float('inf')
        for c in coins:
            best = min(best, f(remain - c) + 1)
        return best
    ans = f(amount)
    return ans if ans != float('inf') else -1
Brute Force Complexity
Each call branches into len(coins) recursive calls; depth amount / min(coins). Worst case O(K^(amount)) — exponential. At K=12, amount=10^4, completely infeasible.
Optimization Path
There are only amount + 1 distinct values of remain, so memoization gives O(K · amount) time. Tabulation replaces recursion with a loop. The 1D version uses left-to-right iteration so each coin can be reused.
For LC 518 (counting), the order matters: coins outer (combinations) vs sums outer (permutations). The combinations interpretation is what LC 518 wants.
Final Expected Approach
LC 322 (minimum):
dp[w] = min coins to make w. dp[0] = 0; dp[w > 0] = INF.
For each w in 1..amount:
    For each c in coins where c <= w:
        dp[w] = min(dp[w], dp[w - c] + 1)
Answer: dp[amount] if dp[amount] != INF else -1.
LC 518 (count combinations):
dp[w] = number of combinations summing to w. dp[0] = 1; rest 0.
For each c in coins:              # COINS OUTER
    For each w in c..amount:      # SUMS INNER, left-to-right
        dp[w] += dp[w - c]
Answer: dp[amount].
Data Structures Used
- 1D dp[amount+1] integer array.
- For brute / memo: recursion + lru_cache.
Correctness Argument
LC 322: dp[w] = min coins to reach w, by induction on w. Base: dp[0] = 0. Inductive step: any optimal solution for w ends with some coin c, leaving w - c to be solved optimally — dp[w - c] + 1. Take the minimum over all coins. Unreachable states stay at INF, propagating correctly under min.
LC 518: by the outer-coins loop, after processing coins c_1, ..., c_k, dp[w] counts combinations of those coins summing to w. Inductive step: when we process c_{k+1} with the inner left-to-right loop, the update dp[w] += dp[w - c_{k+1}] adds combinations that use at least one c_{k+1}. Because the inner loop is left-to-right, dp[w - c_{k+1}] already includes solutions using c_{k+1} zero or more times — so this update accounts for using c_{k+1} exactly 1, 2, 3, … times in turn. Each combination is counted exactly once because every combination has a latest coin index, and only the iteration on that coin index counts it. Outer-coins prevents reordering: [1, 2] and [2, 1] are not separately counted.
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force | O(K^amount) | O(amount) |
| Memoized | O(K · amount) | O(amount) |
| Tabulated | O(K · amount) | O(amount) |
| Space-optimized | (same as tabulated; already 1D) | O(amount) |
Implementation Requirements
All four stages for LC 322; tabulated only for LC 518.
# ==== LC 322: Coin Change (minimum coins) ====
# ---- Stage 1: Brute force ----
def coinChange_brute(coins, amount):
    def f(remain):
        if remain == 0: return 0
        if remain < 0: return float('inf')
        return min((f(remain - c) + 1 for c in coins), default=float('inf'))
    ans = f(amount)
    return ans if ans != float('inf') else -1

# ---- Stage 2: Memoized ----
from functools import lru_cache
def coinChange_memo(coins, amount):
    @lru_cache(None)
    def f(remain):
        if remain == 0: return 0
        if remain < 0: return float('inf')
        return min((f(remain - c) + 1 for c in coins), default=float('inf'))
    ans = f(amount)
    return ans if ans != float('inf') else -1

# ---- Stage 3+4: Tabulated 1D (already optimal space) ----
def coinChange(coins, amount):
    INF = amount + 1                      # larger than any feasible coin count
    dp = [INF] * (amount + 1)
    dp[0] = 0
    for w in range(1, amount + 1):        # SUMS OUTER, COINS INNER for min variant
        for c in coins:
            if c <= w:
                dp[w] = min(dp[w], dp[w - c] + 1)
    return dp[amount] if dp[amount] != INF else -1

# ==== LC 518: Coin Change II (count combinations) ====
def change(amount, coins):
    dp = [0] * (amount + 1)
    dp[0] = 1                             # the empty combination
    for c in coins:                       # COINS OUTER
        for w in range(c, amount + 1):    # SUMS INNER, LEFT-TO-RIGHT
            dp[w] += dp[w - c]
    return dp[amount]
Tests
- LC 322: coins=[1,2,5], amount=11 → 3.
- LC 322: coins=[2], amount=3 → -1.
- LC 322: coins=[1], amount=0 → 0.
- LC 322: coins=[186, 419, 83, 408], amount=6249 → 20.
- LC 518: coins=[1,2,5], amount=5 → 4.
- LC 518: coins=[2], amount=3 → 0.
- LC 518: coins=[10], amount=10 → 1.
- LC 518: amount=0 → 1 (empty combination).
- Compare: for LC 518 with sums-outer (the wrong way), coins=[1,2], amount=3 gives 3 (1+1+1, 1+2, 2+1) instead of 2 (1+1+1, 1+2).
Follow-up Questions
- “Reconstruct one valid combination.” → Track which coin produced each dp[w]; backtrack from dp[amount].
- “What if coins are large (up to 10^9)?” → Large coin values alone are harmless: the table is indexed by amount, and any coin larger than amount is simply unusable. The table stops fitting only when amount itself is huge; then switch to BFS over reachable amounts (still O(amount × K) in the worst case) or to coin-set-specific number theory.
- “Constraint: at most K coins total.” → Add a dimension: dp[w][k] = min/count using ≤ k coins.
- “All combinations summing to exactly amount, not just count.” → Backtracking; output is exponential in worst case.
- “What if some coins have limited supply?” → Bounded knapsack; binary-decompose each coin’s count and reduce to 0/1.
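The reconstruction follow-up can be sketched by recording, for each amount, the coin that produced its best value. This is one way to do it, not the only one; the function name coin_change_with_coins and the last array are hypothetical.

```python
def coin_change_with_coins(coins, amount):
    """Return one minimum-size list of coins summing to amount, or None."""
    INF = amount + 1
    dp = [INF] * (amount + 1)
    last = [-1] * (amount + 1)  # last[w] = coin used in an optimal solution for w
    dp[0] = 0
    for w in range(1, amount + 1):
        for c in coins:
            if c <= w and dp[w - c] + 1 < dp[w]:
                dp[w] = dp[w - c] + 1
                last[w] = c
    if dp[amount] == INF:
        return None  # unreachable
    picked = []
    w = amount
    while w > 0:  # walk back through the recorded choices
        picked.append(last[w])
        w -= last[w]
    return picked


result = coin_change_with_coins([1, 2, 5], 11)
print(len(result), sum(result))  # 3 11
```

Which optimal combination comes back depends on coin order and tie-breaking, so tests should check the length and the sum rather than the exact list.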
Product Extension
Cash-register optimization (which bills/coins to dispense for change), packet-payload composition (combining MTU-aware fragments), and currency-change problems in financial systems — all reduce to coin change variants.
Language/Runtime Follow-ups
- Python: INF = amount + 1 (no solution ever needs more than amount coins) avoids float('inf') arithmetic.
- Java: int[] dp = new int[amount+1]; Arrays.fill(dp, amount+1); dp[0]=0;.
- Go: pre-fill via loop; no Arrays.fill shortcut.
- C++: vector<int> dp(amount+1, amount+1); dp[0]=0;.
- JS/TS: new Array(amount+1).fill(amount+1) then dp[0]=0.
Common Bugs
- LC 322: assuming the loop nesting never matters. Either nesting works for the minimum variant, but carrying the sums-outer, coins-inner habit over to LC 518 counts permutations.
- LC 518: iterating sums outer, coins inner: counts permutations ([1,2] and [2,1] separately) instead of combinations.
- Forgetting dp[0] = 1 in LC 518: every count becomes 0.
- Using float('inf') + 1 arithmetic in Python: works (inf + 1 == inf), but slower and obscures intent. Prefer INF = amount + 1.
- Forgetting the c <= w guard: dp[w - c] is out of bounds when w < c in most languages; in Python the negative index silently wraps around and reads the wrong cell.
- Off-by-one in range(1, amount + 1): must reach amount inclusive.
Debugging Strategy
For LC 322 with coins=[1,2,5], amount=11: after the loop, dp = [0,1,1,2,2,1,2,2,3,3,2,3]. Print dp and check dp[11] = 3. For LC 518 with coins=[1,2,5], amount=5: after processing coin 1, dp = [1,1,1,1,1,1]. After coin 2: dp = [1,1,2,2,3,3]. After coin 5: dp = [1,1,2,2,3,4]. Walking through this manually catches loop-nesting bugs.
Mastery Criteria
- Recognized “unlimited coins” as unbounded knapsack within 60 seconds.
- Wrote LC 322 brute recursion in <2 minutes.
- Wrote LC 322 tabulated in <5 minutes.
- Articulated why unbounded uses left-to-right iteration in <30 seconds.
- Wrote LC 518 with the correct outer-coins loop in <5 minutes.
- Articulated why coins-outer counts combinations and sums-outer counts permutations in <60 seconds.
- Stated O(K · amount) time and O(amount) space.
- Solved LC 322 unaided in <10 minutes (full progression).
- Solved LC 518 unaided in <10 minutes.
Lab 05 — LIS (Longest Increasing Subsequence)
Goal
Solve LC 300 with two distinct algorithms: the canonical O(N²) DP and the patience-sort + binary-search O(N log N) variant. Internalize why both produce the same answer despite very different mechanics. After this lab you can produce both solutions from a blank screen in <15 minutes total and explain the equivalence on a whiteboard.
Background Concepts
The LIS problem is the canonical example of a problem with two equally-valid algorithmic angles. The O(N²) DP defines dp[i] = length of LIS ending at index i; the O(N log N) algorithm maintains an array tails where tails[k] = smallest tail of any increasing subsequence of length k+1. Both produce the same length; the binary-search version is faster but harder to prove correct.
Patience sorting: imagine dealing cards onto piles, placing each card on the leftmost pile whose top is ≥ the new card and starting a new pile if none exists. Each pile is then non-increasing from bottom to top (the top card is the smallest in its pile), and the pile tops increase strictly from left to right. The number of piles equals the LIS length, by Dilworth’s theorem. The tails array tracks the top of each pile.
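A minimal simulation of the dealing rule makes the link between piles and tails visible. The function name patience_piles is hypothetical; the tops list plays the role of tails.

```python
from bisect import bisect_left


def patience_piles(cards):
    piles = []  # piles[k] is a list; its last element is the top card
    tops = []   # tops[k] == piles[k][-1]; strictly increasing left to right
    for card in cards:
        k = bisect_left(tops, card)  # leftmost pile whose top is >= card
        if k == len(piles):
            piles.append([card])     # no such pile: start a new one
            tops.append(card)
        else:
            piles[k].append(card)    # card <= old top, so the pile never increases upward
            tops[k] = card
    return piles


piles = patience_piles([10, 9, 2, 5, 3, 7, 101, 18])
print(piles)       # [[10, 9, 2], [5, 3], [7], [101, 18]]
print(len(piles))  # 4
```

Because tops stays sorted, the leftmost-pile search is a binary search, which is where the O(N log N) bound comes from.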
Interview Context
LIS is a top-20 Medium DP problem and shows up at Google, Bloomberg, and Microsoft regularly. The follow-up “can you do better than O(N²)?” is asked specifically to test whether you know patience sorting. Candidates who know only O(N²) are shipped to L4; candidates who can derive O(N log N) from scratch (or articulate it cleanly) are L5+ material. LIS is also the building block for LC 354 (Russian Doll Envelopes) and LC 673 (Number of LIS).
Problem Statement
Given an integer array nums, return the length of the longest strictly increasing subsequence.
Constraints
- 1 ≤ nums.length ≤ 2500 (canonical LeetCode constraint)
- −10^4 ≤ nums[i] ≤ 10^4
Clarifying Questions
- Strictly increasing or non-decreasing? (Strictly: nums[i] < nums[j] for chosen indices i < j.)
- Subsequence or subarray? (Subsequence; non-contiguous selections allowed.)
- Return the length or the actual sequence? (Length only, per problem.)
- Are duplicates handled? (Yes; strict means duplicates can’t both be in the LIS.)
- Is the empty subsequence allowed (length 0)? (Yes, but since nums.length ≥ 1, the answer is ≥ 1.)
Examples
[10, 9, 2, 5, 3, 7, 101, 18] → 4 ([2, 3, 7, 101] or [2, 5, 7, 101])
[0, 1, 0, 3, 2, 3] → 4 ([0, 1, 2, 3])
[7, 7, 7, 7] → 1
[1] → 1
[5, 4, 3, 2, 1] → 1
Initial Brute Force
For each index, recursively decide include or skip, tracking the previous chosen element to enforce strict-increasing:
def lengthOfLIS_brute(nums):
    def f(i, prev):                  # prev = index of the last chosen element, or -1
        if i == len(nums):
            return 0
        skip = f(i + 1, prev)
        take = 0
        if prev == -1 or nums[i] > nums[prev]:
            take = 1 + f(i + 1, i)
        return max(skip, take)
    return f(0, -1)
Brute Force Complexity
O(2^N) — each step has two choices.
Optimization Path
The state is (i, prev) where prev is the last chosen index (or -1 for none). There are O(N²) such states, so memoization gives O(N²) time and space. Tabulation: define dp[i] = length of LIS ending exactly at index i; recurrence reads only smaller j < i, so we don’t even need the prev dimension — dp[i] = 1 + max(dp[j] for j < i if nums[j] < nums[i]). Final answer is max(dp).
For O(N log N): maintain tails such that tails[k] is the smallest tail of any increasing subsequence of length k+1. For each nums[i], binary-search for the first element in tails ≥ nums[i]; if found, replace it with nums[i]; if not (i.e., nums[i] exceeds every tail), append.
Final Expected Approach
O(N²) DP: prefix DP indexed by ending position; recurrence iterates over all earlier indices.
O(N log N) patience sort: maintain tails as an increasing array; bisect_left(tails, nums[i]) gives the position to replace; if equal to len(tails), append.
The equivalence: each tails[k] corresponds to “the smallest endpoint of a length-(k+1) IS we’ve seen”. When we process nums[i], replacing tails[k] with nums[i] represents “we’ve found a length-(k+1) IS with smaller tail” — which can only help future extensions. The length of tails at the end is the LIS length.
Data Structures Used
- 1D dp of size N (O(N²) version).
- 1D tails array (O(N log N) version), searched with Python’s bisect module.
Correctness Argument
O(N²): by induction. dp[i] = 1 + max(dp[j] : j < i, nums[j] < nums[i]). Base: dp[0] = 1. Inductive step: any LIS ending at i has a previous element at some j < i with nums[j] < nums[i], contributing dp[j] + 1. The max over all valid j is the optimum. Answer is max_i dp[i].
O(N log N) (Patience sort): invariant — tails[k] is the smallest possible tail of any IS of length k+1 over nums[0..i]. When processing nums[i]: binary-search for the leftmost position k with tails[k] >= nums[i]. If k = len(tails), append (we’ve extended the longest IS by one). Otherwise, replace tails[k] with nums[i] (we’ve found a length-(k+1) IS with smaller tail; future extensions are now easier). The invariant is preserved at every step. The length of tails is the LIS length.
Complexity
| Algorithm | Time | Space |
|---|---|---|
| Brute force | O(2^N) | O(N) |
| Memoized | O(N²) | O(N²) |
| Tabulated O(N²) DP | O(N²) | O(N) |
| Patience sort | O(N log N) | O(N) |
Implementation Requirements
All four stages.
# ---- Stage 1: Brute force ----
def lengthOfLIS_brute(nums):
    def f(i, prev):
        if i == len(nums):
            return 0
        skip = f(i + 1, prev)
        take = 0
        if prev == -1 or nums[i] > nums[prev]:
            take = 1 + f(i + 1, i)
        return max(skip, take)
    return f(0, -1)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def lengthOfLIS_memo(nums):
    @lru_cache(None)
    def f(i, prev):
        if i == len(nums): return 0
        skip = f(i + 1, prev)
        take = 0
        if prev == -1 or nums[i] > nums[prev]:
            take = 1 + f(i + 1, i)
        return max(skip, take)
    return f(0, -1)

# ---- Stage 3: Tabulated O(N^2) ----
def lengthOfLIS_tab(nums):
    n = len(nums)
    if n == 0: return 0
    dp = [1] * n                     # every element alone has length 1
    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)

# ---- Stage 4: Patience sort O(N log N) ----
from bisect import bisect_left
def lengthOfLIS(nums):
    tails = []
    for x in nums:
        k = bisect_left(tails, x)    # first tail >= x
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)
Tests
- [10, 9, 2, 5, 3, 7, 101, 18] → 4.
- [0, 1, 0, 3, 2, 3] → 4.
- [7, 7, 7, 7] → 1.
- [1] → 1.
- [1, 2, 3, 4, 5] → 5 (already sorted).
- [5, 4, 3, 2, 1] → 1 (decreasing).
- N=2500 random: performance test for both algorithms.
- Cross-check: random N≤15, the four implementations should agree.
Follow-up Questions
- “Return the actual LIS, not just the length.” → Track parent pointers in the O(N²) DP, or in O(N log N) keep alongside tails an array tails_idx of indices into nums and parent links.
- “Number of distinct LIS’s of maximum length.” (LC 673) → Augment dp[i] with cnt[i] = number of LIS’s ending at i.
- “Longest non-decreasing subsequence.” → bisect_right instead of bisect_left.
- “2D version: stack envelopes (LC 354).” → Sort by width ascending and height descending (to break ties); run LIS on heights.
- “Longest bitonic subsequence.” → Compute LIS forward and LIS backward; combine at each split point.
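The first follow-up (returning the actual LIS in O(N log N)) can be sketched with the tails_idx and parent arrays described above. The function name lis_sequence is hypothetical, and this returns one valid LIS, not necessarily the lexicographically smallest.

```python
from bisect import bisect_left


def lis_sequence(nums):
    tails = []       # tails[k] = smallest tail value of an IS of length k+1
    tails_idx = []   # tails_idx[k] = index in nums of that tail
    parent = [-1] * len(nums)  # parent[i] = index of the element before nums[i]
    for i, x in enumerate(nums):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
        # x extends the best subsequence of length k, whose tail sits at tails_idx[k-1]
        parent[i] = tails_idx[k - 1] if k > 0 else -1
    out = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:   # follow parent links back from the last tail
        out.append(nums[i])
        i = parent[i]
    return out[::-1]


print(lis_sequence([10, 9, 2, 5, 3, 7, 101, 18]))  # [2, 3, 7, 18]
```

The parent link is fixed at the moment an element is placed, which is why the walk stays valid even after later elements overwrite entries of tails.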
Product Extension
LIS underlies version-history compression, longest-monotonic-trend analysis in time-series (e.g., longest streak of growing daily users), and dependency-resolution heuristics. The O(N log N) algorithm is what production code uses when N is large.
Language/Runtime Follow-ups
- Python: bisect_left is in the standard library and uses C-level binary search, so it is very fast.
- Java: Arrays.binarySearch(tails, 0, size, x) returns negative for not-found; convert to insertion point with -(ret + 1).
- Go: sort.SearchInts(tails, x) for bisect_left equivalent.
- C++: lower_bound(tails.begin(), tails.end(), x) for bisect_left; upper_bound for bisect_right.
- JS/TS: no built-in binary search; implement manually or use a third-party helper such as lodash’s sortedIndex.
Common Bugs
- bisect_right vs bisect_left: strict-increasing uses bisect_left; non-decreasing uses bisect_right. Picking the wrong one silently gives the wrong LIS variant.
- Treating tails as the actual LIS: it isn’t; it’s just the smallest-tails-by-length array. Reconstructing the LIS requires extra bookkeeping.
- O(N²) DP: starting dp[i] = 0 instead of 1; every element is itself an increasing subsequence of length 1.
- Returning dp[N-1] instead of max(dp): the LIS may end anywhere, not necessarily at the last index.
- Memoization on (i, prev): dropping the prev == -1 check for the initial state. @lru_cache accepts -1 as a key without complaint, but nums[-1] in Python silently reads the last element instead of raising.
Debugging Strategy
For [10, 9, 2, 5, 3, 7, 101, 18]: trace tails after each element: [10] → [9] → [2] → [2,5] → [2,3] → [2,3,7] → [2,3,7,101] → [2,3,7,18]. Length 4 is the LIS length. If your trace diverges, you’ve made a bisect mistake. For the O(N²) DP, print dp after the loop: [1,1,1,2,2,3,4,4].
Mastery Criteria
- Recognized “longest increasing subsequence” as LIS within 30 seconds.
- Wrote the brute recursion in <2 minutes.
- Wrote the O(N²) DP from blank screen in <4 minutes.
- Wrote the O(N log N) patience-sort version in <5 minutes.
- Articulated the patience-sort invariant (“tails[k] is the smallest tail of length-(k+1) IS”) in <30 seconds.
- Stated O(N log N) time complexity and explained why binary-search is correct here.
- Solved LC 300 unaided in <12 minutes (both algorithms).
- Solved LC 354 (Russian Doll Envelopes) by reduction to LIS in <15 minutes.
- Articulated bisect_left vs bisect_right for strict vs non-strict in <30 seconds.
Lab 06 — LCS / Edit Distance
Goal
Solve Edit Distance (LC 72 — Levenshtein) with the full four-stage progression. Internalize the canonical two-string DP dp[i][j] indexed by prefix lengths, and the three-way min over insert / delete / replace. After this lab you can write any LCS-family DP from a blank screen in <12 minutes and apply the rolling-row collapse to O(M) space.
Background Concepts
Edit distance — sometimes called Levenshtein distance — is the minimum number of single-character edits (insert, delete, replace) needed to transform string A into string B. The state dp[i][j] indexes prefix-i of A and prefix-j of B; the recurrence has one branch per edit operation plus a free pass on character match.
LCS (longest common subsequence) and edit distance are the foundational two-string DPs. They share the index convention (dp[i][j] = answer for prefix-i of A, prefix-j of B), explicit boundary handling for the empty prefix (dp[i][0] = i and dp[0][j] = j for edit distance; all zeros for LCS), and the rolling-row space optimization (O(N · M) → O(M)). Mastering edit distance gives you LCS almost for free.
Interview Context
Edit Distance is a top-15 Hard-tagged DP problem at Google, Microsoft, and Amazon. It shows up in coding rounds at staff level routinely, often paired with a follow-up “now reconstruct the alignment”. LCS (LC 1143) is the gentler Medium variant and tests the same skill. Candidates who can derive both recurrences from scratch and articulate the three edit operations (plus the free match case) precisely demonstrate fluency that translates to nearly every two-string DP problem in the corpus (regex match, distinct subsequences, interleaving strings, longest common substring).
Problem Statement
Given two strings word1 and word2, return the minimum number of operations required to convert word1 into word2. Allowed operations: insert a character, delete a character, replace a character (each cost 1).
Constraints
- 0 ≤ word1.length, word2.length ≤ 500
- word1 and word2 consist of lowercase English letters.
Clarifying Questions
- Are insert/delete/replace each cost 1? (Yes — Levenshtein.)
- Are there any other operations (transpose, e.g.)? (No — Damerau-Levenshtein adds transpose; not in scope.)
- Are characters lowercase only? (Yes — given.)
- Are empty strings valid inputs? (Yes; when one string is empty, the answer is the length of the other.)
- Return the count or the alignment? (Count; alignment is a follow-up.)
Examples
word1="horse", word2="ros" → 3 (horse→rorse→rose→ros)
word1="intention", word2="execution" → 5
word1="", word2="abc" → 3 (insert 3)
word1="abc", word2="" → 3 (delete 3)
word1="abc", word2="abc" → 0
Initial Brute Force
def edit_brute(w1, w2):
    def f(i, j):
        if i == 0: return j          # insert remaining w2
        if j == 0: return i          # delete remaining w1
        if w1[i-1] == w2[j-1]:
            return f(i-1, j-1)       # match: no edit
        return 1 + min(
            f(i-1, j),               # delete w1[i-1]
            f(i, j-1),               # insert w2[j-1]
            f(i-1, j-1),             # replace
        )
    return f(len(w1), len(w2))
Brute Force Complexity
Each non-base call branches into 3; recursion depth N + M. Worst case O(3^(N+M)). At N=M=500, completely infeasible.
Optimization Path
There are (N+1)(M+1) distinct (i, j) pairs — memoization gives O(N · M) time. Tabulation replaces recursion with a row-major loop. Since dp[i][j] depends only on dp[i-1][j-1], dp[i-1][j], dp[i][j-1], the previous row plus the in-progress row are enough — collapse to two 1D arrays of size M+1. With careful use of a saved diagonal, you can collapse to a single 1D array.
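The single-array collapse with a saved diagonal is the step that is easiest to get wrong, so here is a minimal sketch of it. The function name edit_distance_1d and the variables diag and up are hypothetical.

```python
def edit_distance_1d(w1, w2):
    n, m = len(w1), len(w2)
    dp = list(range(m + 1))  # row 0: dp[j] = j
    for i in range(1, n + 1):
        diag = dp[0]         # holds dp[i-1][j-1] for the upcoming column
        dp[0] = i            # dp[i][0] = i
        for j in range(1, m + 1):
            up = dp[j]       # dp[i-1][j], about to be overwritten
            if w1[i - 1] == w2[j - 1]:
                dp[j] = diag
            else:
                # up = delete, dp[j-1] (already row i) = insert, diag = replace
                dp[j] = 1 + min(up, dp[j - 1], diag)
            diag = up        # the old dp[i-1][j] is the next column's diagonal
    return dp[m]


print(edit_distance_1d("horse", "ros"))  # 3
```

The only state carried across columns is diag; everything to the left of j already belongs to row i, and everything from j onward still belongs to row i-1.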
Final Expected Approach
dp[i][j] = edit distance between word1[:i] and word2[:j].
dp[0][j] = j (insert j chars to get word2[:j] from empty word1[:0])
dp[i][0] = i (delete i chars from word1[:i] to get empty word2[:0])
dp[i][j] = dp[i-1][j-1]                                     if word1[i-1] == word2[j-1]
         = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1])    otherwise
The three operations correspond to:
- dp[i-1][j-1] + 1: replace word1[i-1] with word2[j-1].
- dp[i-1][j] + 1: delete word1[i-1].
- dp[i][j-1] + 1: insert word2[j-1].
Data Structures Used
- 2D dp[(N+1) x (M+1)] array (tabulated).
- Two 1D prev, curr arrays of size M+1 (rolled).
Correctness Argument
By induction on (i, j) in row-major order. Base cases: dp[0][j] = j (clearly j inserts), dp[i][0] = i (clearly i deletes). Inductive step: an optimal alignment of word1[:i] with word2[:j] ends with one of: (a) match — word1[i-1] == word2[j-1] aligned; cost is dp[i-1][j-1]. (b) replace — pair word1[i-1] with word2[j-1]; cost dp[i-1][j-1] + 1. (c) delete — word1[i-1] deleted, word1[:i-1] aligned with word2[:j]; cost dp[i-1][j] + 1. (d) insert — word2[j-1] inserted, word1[:i] aligned with word2[:j-1]; cost dp[i][j-1] + 1. The min of these is the optimum. (a) and (b) are mutually exclusive based on character equality.
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force | O(3^(N+M)) | O(N+M) |
| Memoized | O(N · M) | O(N · M) |
| Tabulated | O(N · M) | O(N · M) |
| Space-optimized | O(N · M) | O(min(N, M)) |
Implementation Requirements
All four stages.
# ---- Stage 1: Brute force ----
def edit_brute(w1, w2):
    def f(i, j):
        if i == 0: return j
        if j == 0: return i
        if w1[i-1] == w2[j-1]: return f(i-1, j-1)
        return 1 + min(f(i-1, j), f(i, j-1), f(i-1, j-1))
    return f(len(w1), len(w2))

# ---- Stage 2: Memoized ----
from functools import lru_cache
def edit_memo(w1, w2):
    @lru_cache(None)
    def f(i, j):
        if i == 0: return j
        if j == 0: return i
        if w1[i-1] == w2[j-1]: return f(i-1, j-1)
        return 1 + min(f(i-1, j), f(i, j-1), f(i-1, j-1))
    return f(len(w1), len(w2))

# ---- Stage 3: Tabulated 2D ----
def edit_tab(w1, w2):
    n, m = len(w1), len(w2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1): dp[0][j] = j   # j inserts from the empty prefix
    for i in range(n + 1): dp[i][0] = i   # i deletes down to the empty prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if w1[i-1] == w2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    return dp[n][m]

# ---- Stage 4: Space-optimized O(M) ----
def minDistance(w1, w2):
    n, m = len(w1), len(w2)
    if n < m:
        w1, w2, n, m = w2, w1, m, n  # ensure m is the smaller dim
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m         # column 0 of row i is i
        for j in range(1, m + 1):
            if w1[i-1] == w2[j-1]:
                curr[j] = prev[j-1]
            else:
                curr[j] = 1 + min(prev[j], curr[j-1], prev[j-1])
        prev = curr
    return prev[m]
Tests
- ("horse", "ros") → 3.
- ("intention", "execution") → 5.
- ("", "abc") → 3.
- ("abc", "") → 3.
- ("abc", "abc") → 0.
- ("a", "b") → 1.
- ("a"*500, "b"*500) → 500 (performance test).
- All four implementations equivalent on random N≤8 inputs.
Follow-up Questions
- “Reconstruct the alignment (sequence of operations).” → Backtrack from dp[n][m]: at each (i, j), look at which of the three (or four) predecessors matches the current value; emit the corresponding operation.
- “Custom costs for insert / delete / replace.” → Replace +1 with the appropriate cost in each branch; the 2D versions need no other change. In the space-optimized version, swapping the strings also swaps the roles of insert and delete, so swap those two costs as well (or skip the swap).
- “Levenshtein with transpositions (Damerau-Levenshtein).” → Add a fourth branch dp[i-2][j-2] + 1 if word1[i-1]==word2[j-2] and word1[i-2]==word2[j-1] (for i, j ≥ 2).
- “Longest common subsequence.” (LC 1143) → Same shape; recurrence: dp[i][j] = dp[i-1][j-1] + 1 if match, else max(dp[i-1][j], dp[i][j-1]).
- “Minimum ASCII delete sum.” (LC 712) → Variant where deletes cost ASCII value of the deleted char.
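The alignment-reconstruction follow-up can be sketched as below. This is a minimal illustration, not a canonical solution: the function name edit_script and the tuple format of the operations are assumptions made here for the example. It fills the same 2D table as the tabulated stage, then walks back from dp[n][m], choosing any predecessor that explains the current value.

```python
def edit_script(w1, w2):
    """Return (distance, ops); ops turns w1 into w2 (hypothetical op format)."""
    n, m = len(w1), len(w2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(n + 1):
        dp[i][0] = i
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if w1[i-1] == w2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])

    # Backtrack from the bottom-right cell to (0, 0).
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and w1[i-1] == w2[j-1] and dp[i][j] == dp[i-1][j-1]:
            ops.append(("keep", w1[i-1]))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i-1][j-1] + 1:
            ops.append(("replace", w1[i-1], w2[j-1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i-1][j] + 1:
            ops.append(("delete", w1[i-1]))
            i -= 1
        else:
            ops.append(("insert", w2[j-1]))
            j -= 1
    ops.reverse()
    return dp[n][m], ops
```

Several optimal scripts usually exist; the branch order above only fixes which one is reported. The number of non-"keep" operations always equals the distance.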
Product Extension
Edit distance is the engine behind diff/patch tools, spell correctors, fuzzy search (“did you mean”), DNA-sequence alignment (Needleman-Wunsch is a generalization with custom scoring matrices), and code-review side-by-side comparison. When the full O(N · M) table is too large, Hirschberg’s algorithm reconstructs the alignment in linear space, O(min(N, M)).
Language/Runtime Follow-ups
- Python: lru_cache works on (i, j) since both are ints. The space-optimized version benefits from swapping to ensure m ≤ n.
- Java: int[][] dp with explicit boundary fill. For O(M) space, two int[m+1] arrays.
- Go: 2D slice; make([][]int, n+1) then per-row make([]int, m+1).
- C++: vector<vector<int>> 2D; or two vector<int>(m+1) for O(M).
- JS/TS: 2D array via Array.from({length: n+1}, () => new Array(m+1).fill(0)).
Common Bugs
- Off-by-one between string index and DP index: word1[i-1] not word1[i]. The convention “prefix-i” means index i is exclusive in the prefix, so the latest char is word1[i-1].
- Forgetting the boundary dp[0][j] = j, dp[i][0] = i: defaults to 0 and produces nonsense answers.
- Using max instead of min in the recurrence: wrong direction.
- Including dp[i-1][j-1] + 1 in the match branch: match has no edit cost; should be dp[i-1][j-1] exactly.
- Space-optimized version: forgetting curr[0] = i. The new row’s column 0 is i (deleting i chars to match empty prefix), not 0.
- Treating insert and delete as interchangeable when their costs differ: inserting into word1 is the same as deleting from word2, so swapping the strings swaps the two operations. With unit costs the recurrence can treat them symmetrically; with custom costs it cannot.
Debugging Strategy
For ("horse", "ros"), the full table is:
      ""   r   o   s
""     0   1   2   3
h      1   1   2   3
o      2   2   1   2
r      3   2   2   2
s      4   3   3   2
e      5   4   4   3
Print and check. If the boundary row/column is wrong, the entire table is. For the rolled version, print prev after each row.
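A small helper along these lines makes the "print and check" step mechanical. It is a sketch under stated assumptions: the names edit_table and format_table are invented for this example, and the layout is just one readable choice.

```python
def edit_table(w1, w2):
    """Build and return the full (n+1) x (m+1) edit-distance table."""
    n, m = len(w1), len(w2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(n + 1):
        dp[i][0] = i
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if w1[i-1] == w2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    return dp

def format_table(w1, w2, dp):
    """Render the table with row/column labels, right-aligned."""
    rows = [["", '""'] + list(w2)]
    for i, row in enumerate(dp):
        label = '""' if i == 0 else w1[i-1]
        rows.append([label] + [str(x) for x in row])
    width = max(len(cell) for r in rows for cell in r)
    return "\n".join(" ".join(cell.rjust(width) for cell in r) for r in rows)

if __name__ == "__main__":
    table = edit_table("horse", "ros")
    print(format_table("horse", "ros", table))
```

Comparing the printed rows against the table above localizes a bug to the first row that differs, which is usually the boundary row or the match branch.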
Mastery Criteria
- Recognized “min operations to transform” as edit distance within 60 seconds.
- Wrote the brute recursion with all four cases (match / replace / delete / insert) in <3 minutes.
- Wrote the 2D tabulated version in <5 minutes.
- Performed the rolling-row collapse to two 1D arrays in <3 minutes.
- Stated O(N · M) time and O(M) space.
- Articulated which DP cell corresponds to which edit operation.
- Solved LC 72 unaided in <20 minutes (full progression).
- Solved LC 1143 (LCS) in <8 minutes by changing the recurrence.
- Solved LC 583 (Delete Operation for Two Strings) in <8 minutes (LCS + arithmetic).
Lab 07 — Palindrome DP
Goal
Solve Longest Palindromic Subsequence (LC 516) and Palindrome Partitioning II (LC 132 — minimum cuts) with the four-stage progression. Internalize the length-ascending iteration that makes interval DP correct, and the is_pal[i][j] precompute that powers most palindrome problems. After this lab you can solve any palindrome-DP variant in <12 minutes.
Background Concepts
Palindrome DP problems split into two families:
- Subsequence palindromes (LC 516): dp[i][j] = length of longest palindromic subsequence of s[i..j]. Recurrence: if s[i] == s[j], dp[i][j] = 2 + dp[i+1][j-1]; else dp[i][j] = max(dp[i+1][j], dp[i][j-1]).
- Substring palindromes (LC 132, LC 5, LC 647): precompute is_pal[i][j] = (s[i..j] is a palindrome) via interval DP, then layer the partitioning / counting DP on top.
The shared mechanic: iterate by length ascending, since dp[i][j] depends on intervals strictly shorter. This is the defining feature of interval DP (deeper exploration in Lab 09).
Interview Context
Longest Palindromic Subsequence is a top-30 Medium DP problem; Palindrome Partitioning II is a Hard variant asked at Google and Microsoft. The is_pal precompute is the unlock for ~10 distinct LeetCode problems (5, 131, 132, 516, 647, 1278, 1312, 1771, …). Candidates who can derive the length-ascending loop and articulate the substring-vs-subsequence distinction signal solid interval-DP fluency.
Problem Statement
LC 516 (LPS subsequence): Given a string s, return the length of the longest palindromic subsequence.
LC 132 (min cuts): Given a string s, return the minimum number of cuts needed to partition s into palindromic substrings.
Constraints
- 1 ≤ s.length ≤ 1000 (LC 516) / 2000 (LC 132).
- s is lowercase English.
Clarifying Questions
- Subsequence or substring? (Subsequence for LC 516; substring for LC 132 partitioning.)
- Is a single character a palindrome? (Yes — length 1.)
- Is the empty string a palindrome? (Conventionally yes.)
- LC 132: must each part be non-empty? (Yes.)
- LC 132: 0 cuts means the entire string is a palindrome.
Examples
LC 516:
"bbbab" → 4 ("bbbb")
"cbbd" → 2 ("bb")
"a" → 1
LC 132:
"aab" → 1 ("aa" | "b")
"a" → 0
"ab" → 1
"aabb" → 1 ("aa" | "bb")
"abcbm" → 2
"abc" → 2
Initial Brute Force (LC 516)
def lps_brute(s):
    def f(i, j):
        if i > j: return 0
        if i == j: return 1
        if s[i] == s[j]: return 2 + f(i+1, j-1)
        return max(f(i+1, j), f(i, j-1))
    return f(0, len(s) - 1)
Brute Force Complexity
O(2^N) worst case — every mismatch branches. At N=1000, infeasible.
Optimization Path
(i, j) has only O(N²) distinct values, so memoization is O(N²) time and space. Tabulation: iterate length = 1..N, then i = 0..N-length, then derive j = i + length - 1. The length-ascending order ensures all shorter intervals are computed first.
LC 132 strategy: precompute is_pal[i][j] in O(N²). Then cuts[i] = min cuts for s[:i+1]; cuts[i] = 0 if s[:i+1] is itself a palindrome; else cuts[i] = min(cuts[j-1] + 1 : 1 ≤ j ≤ i, s[j..i] palindrome).
Final Expected Approach
LC 516:
  dp[i][j] = LPS of s[i..j]
  dp[i][i] = 1; dp[i][j] = 0 for i > j
  For length 2..N:
    For i in 0..N-length:
      j = i + length - 1
      if s[i] == s[j]:
        dp[i][j] = (2 if length == 2 else 2 + dp[i+1][j-1])
      else:
        dp[i][j] = max(dp[i+1][j], dp[i][j-1])
  Answer: dp[0][N-1]
LC 132:
  1. Compute is_pal[i][j] (O(N^2)).
  2. cuts[i] = min cuts to partition s[0..i].
     cuts[i] = 0 if is_pal[0][i].
     else cuts[i] = min(cuts[j-1] + 1 : 1 <= j <= i, is_pal[j][i]).
  Answer: cuts[N-1].
Data Structures Used
- 2D dp[N][N] (LC 516).
- 2D is_pal[N][N] boolean + 1D cuts[N] (LC 132).
Correctness Argument
LC 516: by induction on length. Base: length 1 → 1. Inductive: an LPS of s[i..j] either uses both endpoints (must be equal, contributing 2 + LPS of s[i+1..j-1]) or skips at least one endpoint (LPS of s[i+1..j] or s[i..j-1]). The max covers all cases.
LC 132: every valid partition has a last cut at some position j, splitting into s[0..j-1] + s[j..i] where s[j..i] is a palindrome. The minimum is over all valid j. This exhausts all partitions.
Complexity
| Problem | Time | Space |
|---|---|---|
| LC 516 brute | O(2^N) | O(N) |
| LC 516 memo | O(N²) | O(N²) |
| LC 516 tab | O(N²) | O(N²) |
| LC 516 space-opt | O(N²) | O(N) |
| LC 132 | O(N²) | O(N²) |
Implementation Requirements
# ==== LC 516: Longest Palindromic Subsequence ====
# ---- Stage 1: Brute force ----
def lps_brute(s):
    def f(i, j):
        if i > j: return 0
        if i == j: return 1
        if s[i] == s[j]: return 2 + f(i+1, j-1)
        return max(f(i+1, j), f(i, j-1))
    return f(0, len(s) - 1)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def lps_memo(s):
    @lru_cache(None)
    def f(i, j):
        if i > j: return 0
        if i == j: return 1
        if s[i] == s[j]: return 2 + f(i+1, j-1)
        return max(f(i+1, j), f(i, j-1))
    return f(0, len(s) - 1)

# ---- Stage 3: Tabulated 2D ----
def lps_tab(s):
    n = len(s)
    dp = [[0] * n for _ in range(n)]
    for i in range(n): dp[i][i] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                dp[i][j] = 2 if length == 2 else 2 + dp[i+1][j-1]
            else:
                dp[i][j] = max(dp[i+1][j], dp[i][j-1])
    return dp[0][n-1]

# ---- Stage 4: Space-optimized 1D ----
def longestPalindromeSubseq(s):
    n = len(s)
    dp = [0] * n                      # row i+1 of the 2D table
    for i in range(n - 1, -1, -1):
        new = [0] * n                 # row i
        new[i] = 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                new[j] = 2 + (dp[j-1] if j-1 >= i+1 else 0)
            else:
                new[j] = max(dp[j], new[j-1])
        dp = new
    return dp[n-1]

# ==== LC 132: Palindrome Partitioning II ====
def minCut(s):
    n = len(s)
    # Step 1: precompute is_pal in O(N^2)
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n): is_pal[i][i] = True
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j] and (length == 2 or is_pal[i+1][j-1]):
                is_pal[i][j] = True
    # Step 2: cuts DP
    cuts = [0] * n
    for i in range(n):
        if is_pal[0][i]:
            cuts[i] = 0
            continue
        cuts[i] = i  # worst case: cut after every character
        for j in range(1, i + 1):
            if is_pal[j][i]:
                cuts[i] = min(cuts[i], cuts[j-1] + 1)
    return cuts[n-1]
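For cross-checking the cuts DP on small inputs, an independent reference that does not share the is_pal table is useful. The sketch below is one possibility; the name min_cut_reference is an assumption for this example. It counts the minimum number of palindromic pieces directly by slicing and reversing, then converts pieces to cuts. The cache only keeps it quick; correctness does not depend on it.

```python
from functools import lru_cache

def min_cut_reference(s):
    """Minimum cuts so that every piece of s is a palindrome (slow reference)."""
    n = len(s)

    @lru_cache(None)
    def pieces(start):
        # Fewest palindromic pieces covering s[start:].
        if start == n:
            return 0
        best = n  # upper bound: one piece per character
        for end in range(start + 1, n + 1):
            part = s[start:end]
            if part == part[::-1]:
                best = min(best, 1 + pieces(end))
        return best

    # k pieces need k - 1 cuts; the empty string needs none.
    return max(pieces(0) - 1, 0)
```

A typical check generates random strings over a two- or three-letter alphabet (small alphabets produce many palindromes) and compares this against minCut for every string of length up to about 10.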
Tests
- LC 516: "bbbab" → 4. "cbbd" → 2. "a" → 1. "ac" → 1. "aaaa" → 4.
- LC 132: "aab" → 1. "a" → 0. "ab" → 1. "aabb" → 1. "abcbm" → 2. "abacaba" → 0 (already a palindrome). "abacabaca" → 2 (not a palindrome, and no split into two palindromes exists; "aba" | "cabac" | "a" uses 2 cuts).
- Cross-implementation check on random N≤10.
Follow-up Questions
- “Count palindromic substrings.” (LC 647) → Sum is_pal[i][j] over all (i, j) with i ≤ j.
- “Longest palindromic substring.” (LC 5) → Return the length / actual string of the largest (j - i + 1) with is_pal[i][j].
- “Minimum insertions to make palindrome.” (LC 1312) → len(s) - LPS(s).
- “All palindrome partitions.” (LC 131) → Backtracking; output exponentially many.
- “Palindromes with one allowed mismatch.” → Add a dimension dp[i][j][k] where k ∈ {0, 1}.
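The first two follow-ups reduce to scanning the is_pal table. A minimal sketch, assuming helper names (build_is_pal, count_palindromic_substrings, longest_palindromic_substring) chosen here purely for illustration:

```python
def build_is_pal(s):
    """is_pal[i][j] is True when s[i..j] (inclusive) is a palindrome."""
    n = len(s)
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n):
        is_pal[i][i] = True
    for length in range(2, n + 1):          # shorter intervals first
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j] and (length == 2 or is_pal[i+1][j-1]):
                is_pal[i][j] = True
    return is_pal

def count_palindromic_substrings(s):
    """LC 647 style: number of (i, j) pairs with i <= j and s[i..j] a palindrome."""
    is_pal = build_is_pal(s)
    n = len(s)
    return sum(is_pal[i][j] for i in range(n) for j in range(i, n))

def longest_palindromic_substring(s):
    """LC 5 style: the first longest palindromic substring in scan order."""
    is_pal = build_is_pal(s)
    best = ""
    for i in range(len(s)):
        for j in range(i, len(s)):
            if is_pal[i][j] and j - i + 1 > len(best):
                best = s[i:j+1]
    return best
```

Both run in O(N²) time and space. Expand-around-center solves the same two problems in O(1) extra space, so the table is mainly worth building when a later DP (such as the cuts DP) reuses it.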
Product Extension
Palindrome detection appears in DNA-sequence analysis (palindromic motifs are biologically meaningful: restriction sites, hairpins), text-search systems, and as a non-trivial benchmark for compiler optimization. The is_pal precompute is also useful in interview problems that don’t strictly need DP (just O(N²) precomputation).
Language/Runtime Follow-ups
- Python: 2D arrays via list comprehensions; lru_cache for memoization.
- Java: boolean[][] isPal = new boolean[n][n]; defaults to false. int[][] dp = new int[n][n];.
- Go: pre-allocate slice-of-slices; can fuse is_pal and cuts computation in a single function.
- C++: vector<vector<bool>> is_pal(n, vector<bool>(n, false));.
- JS/TS: as Python; watch for the shared-reference trap (new Array(n).fill([]) makes every row the same array; build rows with Array.from instead).
Common Bugs
- Iterating i ascending, j ascending (row-major) for LPS: fails because dp[i][j] depends on dp[i+1][j-1] and dp[i+1][j], which sit in the next row and haven’t been computed yet. Iterate by length, or run i from N-1 down to 0 as the space-optimized version does.
- Forgetting dp[i][i] = 1: base case for single-char palindromes.
- Edge case length == 2: dp[i+1][j-1] is dp[i+1][i], an empty range. Special-case to 0 or just use 2.
- LC 132: forgetting the is_pal[0][i] short-circuit: gives wrong answer for already-palindromic input.
- LC 132: cuts initialization: initialize to i (worst case: cut after every character of s[0..i]).
- Confusing subsequence and substring: LC 516 wants subsequence; many candidates accidentally solve substring (which is LC 5, a different problem).
Debugging Strategy
For LC 516 "bbbab": trace the table by length. Length 1: diagonal all 1. Length 2: dp[0][1]=2 (bb), dp[1][2]=2, dp[2][3]=1, dp[3][4]=1. Length 3: dp[0][2]=3 (bbb). Length 4: dp[0][3]=3, dp[1][4]=3. Length 5: dp[0][4]=4 (bbbb). For LC 132 "aab": is_pal = [[T,T,F],[F,T,F],[F,F,T]]; cuts = [0, 0, 1].
Mastery Criteria
- Recognized “longest palindromic subsequence” as interval DP within 60 seconds.
- Articulated the length-ascending iteration order in <30 seconds.
- Wrote LC 516 brute recursion in <2 minutes.
- Wrote LC 516 tabulated in <5 minutes.
- Wrote is_pal precompute correctly in <4 minutes.
- Wrote LC 132 cuts DP using is_pal in <5 minutes.
- Stated O(N²) time and space.
- Solved LC 516 unaided in <12 minutes (full progression).
- Solved LC 132 unaided in <15 minutes.
- Solved LC 5 (longest palindromic substring) in <8 minutes via is_pal.
Lab 08 — Tree DP (House Robber III)
Goal
Solve House Robber III (LC 337) with post-order DFS returning (rob, skip) per node. Internalize the post-order DP pattern where each node returns a tuple of “best with this node included” and “best with this node excluded”. After this lab you recognize tree DP within 60 seconds and can write any post-order tuple-DP from blank screen in <8 minutes.
A note on the four-stage progression: tree DP doesn’t have a clean tabulated form (there’s no natural row-major order for a tree), and “space optimization” is implicit (each post-order call returns a constant-size tuple, so the working memory is O(1) per node). Stages we can show: brute recursion, memoized recursion, post-order DFS with tuple return (the canonical form), and an iterative post-order using an explicit stack. The tuple-return version is what you write in interviews.
Background Concepts
Tree DP: state is per node; recurrence aggregates children’s states. The natural evaluation order is post-order — process all descendants before the node itself. Most tree DPs return a tuple (or struct) per node carrying the answers for “this node included” vs “this node excluded” (or whatever the binary split is). The parent combines children’s tuples in O(1) per child, giving O(N) total.
House Robber III is the canonical example. Each node v returns (rob_v, skip_v):
- rob_v = val[v] + sum(skip_c for c in children(v)): rob v, must skip every child.
- skip_v = sum(max(rob_c, skip_c) for c in children(v)): skip v, each child is independently best.
Final answer: max(rob_root, skip_root).
Interview Context
LC 337 is a top-30 Medium DP problem at Amazon and Microsoft. The post-order tuple pattern recurs in: LC 124 (Binary Tree Maximum Path Sum), LC 543 (Diameter of Binary Tree), LC 968 (Binary Tree Cameras), LC 1372 (Longest ZigZag Path). Mastering it here generalizes broadly. Senior interviewers specifically test whether you write the tuple-return version (clean, O(N)) versus the memoized-but-redundant version that recurses into both children and all grandchildren on every call.
Problem Statement
A binary tree where each node holds an integer amount of money. The thief cannot rob two directly-linked houses (parent–child). Return the maximum amount the thief can rob without alerting the police.
Constraints
- 1 ≤ tree size ≤ 10^4
- 0 ≤ node.val ≤ 10^4
Clarifying Questions
- Is the tree binary or general? (Binary, per LC 337.)
- Are values non-negative? (Yes — given.)
- What does the tree representation look like? (Standard TreeNode with left, right.)
- Can the tree be empty? (LC 337 guarantees at least one node; handle the empty tree defensively and return 0.)
- Does “linked” mean parent–child only or also siblings? (Parent–child only.)
Examples
    3
   / \
  2   3
   \   \
    3   1      → 7 (rob 3 + 3 + 1)

    3
   / \
  4   5
 / \   \
1   3   1      → 9 (rob 4 + 5)
Initial Brute Force
For each subtree rooted at v: try rob-v (must skip children, recurse on grandchildren) or skip-v (recurse on children).
def rob_brute(root):
    def f(v):
        if v is None: return 0
        # Skip v
        skip = f(v.left) + f(v.right)
        # Rob v: must skip both children
        rob = v.val
        if v.left: rob += f(v.left.left) + f(v.left.right)
        if v.right: rob += f(v.right.left) + f(v.right.right)
        return max(rob, skip)
    return f(root)
Brute Force Complexity
Each call recurses on children and on grandchildren — the same node is visited multiple times via different paths. Worst case O(N · 2^depth). Memoization on the node identity collapses to O(N), but cleaner is to return the tuple in a single post-order traversal.
Optimization Path
The brute force makes up to six recursive calls per node: two on the children (skip branch) and up to four on the grandchildren (rob branch). Every subtree is therefore reached both through a “child” call and through a “grandchild” call, and the repeated work compounds down the tree. Memoizing with a dict keyed by id(node) computes each subtree once; much cleaner is to return both values in one tuple per node, a single post-order pass with no memoization needed.
Final Expected Approach
Post-order DFS returning (rob, skip):
def f(v):
    if v is None: return (0, 0)
    lr, ls = f(v.left)
    rr, rs = f(v.right)
    rob_v = v.val + ls + rs
    skip_v = max(lr, ls) + max(rr, rs)
    return (rob_v, skip_v)
answer = max(f(root))
Time O(N) (each node visited once). Space O(H) for the call stack (H = tree height; O(N) worst case for skewed trees, O(log N) for balanced trees).
Data Structures Used
- Binary tree (input).
- Recursion stack.
- For brute / memo: optional dict keyed by node identity.
Correctness Argument
By structural induction on the tree. Base: empty tree → (0, 0). Inductive: assume f(v.left) and f(v.right) correctly return (rob, skip) for those subtrees. Then:
- rob_v = rob v and skip both children. Since the children must be skipped (parent-child constraint), the contribution from each child subtree is skip_child. Plus v.val.
- skip_v = skip v, each child subtree independently maximized: max(rob_child, skip_child).
The max of the two is the overall optimum, but we return both (not the max) because the parent of v needs skip_v distinct from rob_v. The final answer at the root is max(rob_root, skip_root).
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force | O(N · 2^H) worst | O(H) |
| Memoized (node-keyed) | O(N) | O(N) |
| Tuple-return post-order | O(N) | O(H) |
| Iterative post-order | O(N) | O(N) (explicit stack) |
Implementation Requirements
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

# ---- Stage 1: Brute force (recompute via grandchildren) ----
def rob_brute(root):
    def f(v):
        if v is None: return 0
        skip = f(v.left) + f(v.right)
        rob = v.val
        if v.left: rob += f(v.left.left) + f(v.left.right)
        if v.right: rob += f(v.right.left) + f(v.right.right)
        return max(rob, skip)
    return f(root)

# ---- Stage 2: Memoized on node identity ----
def rob_memo(root):
    memo = {}
    def f(v):
        if v is None: return 0
        if id(v) in memo: return memo[id(v)]
        skip = f(v.left) + f(v.right)
        rob = v.val
        if v.left: rob += f(v.left.left) + f(v.left.right)
        if v.right: rob += f(v.right.left) + f(v.right.right)
        memo[id(v)] = max(rob, skip)
        return memo[id(v)]
    return f(root)

# ---- Stage 3: Tuple-return post-order (canonical) ----
def rob(root):
    def f(v):
        if v is None: return (0, 0)
        lr, ls = f(v.left)
        rr, rs = f(v.right)
        rob_v = v.val + ls + rs
        skip_v = max(lr, ls) + max(rr, rs)
        return (rob_v, skip_v)
    return max(f(root))

# ---- Stage 4: Iterative post-order with explicit stack ----
def rob_iter(root):
    if root is None: return 0
    stack, order = [root], []
    while stack:
        v = stack.pop()
        order.append(v)
        if v.left: stack.append(v.left)
        if v.right: stack.append(v.right)
    # order is now reverse post-order; iterate reversed for true post-order
    state = {}  # id(v) -> (rob, skip)
    for v in reversed(order):
        lr, ls = state.get(id(v.left), (0, 0)) if v.left else (0, 0)
        rr, rs = state.get(id(v.right), (0, 0)) if v.right else (0, 0)
        state[id(v)] = (v.val + ls + rs, max(lr, ls) + max(rr, rs))
    return max(state[id(root)])
Tests
Build trees from level-order input:
- [3,2,3,null,3,null,1] → 7.
- [3,4,5,1,3,null,1] → 9.
- [1] → 1.
- [] → 0 (empty tree).
- Linear left-skewed chain 1→2→3→4→5 → 9 (rob 1, 3, 5).
- Random N=1000 tree (performance test).
- Cross-check all four implementations on random trees of size N≤8.
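Building trees from level-order input needs a small helper. The sketch below assumes the LeetCode-style convention (a flat list with None for a missing child, children listed only for nodes that exist); build_tree and build_left_chain are names invented for this example, and TreeNode is repeated so the block runs on its own.

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def build_tree(values):
    """Build a binary tree from a level-order list; None marks a missing child."""
    if not values or values[0] is None:
        return None
    root = TreeNode(values[0])
    queue = deque([root])
    i = 1
    while queue and i < len(values):
        node = queue.popleft()
        # Next two entries are this node's left and right children.
        if i < len(values) and values[i] is not None:
            node.left = TreeNode(values[i])
            queue.append(node.left)
        i += 1
        if i < len(values) and values[i] is not None:
            node.right = TreeNode(values[i])
            queue.append(node.right)
        i += 1
    return root

def build_left_chain(values):
    """Build a left-skewed chain: values[0] is the root, each next value its left child."""
    root = None
    for val in reversed(values):
        root = TreeNode(val, left=root)
    return root
```

With these, a test reads as rob(build_tree([3, 2, 3, None, 3, None, 1])), and the skewed-chain case as rob(build_left_chain([1, 2, 3, 4, 5])).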
Follow-up Questions
- “Reconstruct which nodes were robbed.” → Augment the tuple to also return the set of robbed nodes (or backtrack from the root after the post-order pass).
- “K-ary tree.” → Same idea; sum over all children.
- “Constraint relaxes to: cannot rob two nodes within distance K.” → Augment state to track distance to last robbed node; state grows by factor K.
- “Negative values allowed.” → Same recurrence; max correctly handles.
- “Maximum path sum of an arbitrary path (LC 124).” → Different but related: post-order returns “best one-sided path from this node”. Combine left + right + node at the root for the best path through it.
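The LC 124 follow-up can be sketched with the same post-order shape. This is an illustrative version, not a reference solution: max_path_sum and the inner helper down are names chosen here, and the tree is assumed non-empty (as LC 124 guarantees). Each call returns the best downward path starting at the node, while a running maximum records the best path that bends at the node.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def max_path_sum(root):
    """Maximum sum over all node-to-node paths; assumes a non-empty tree."""
    best = float("-inf")

    def down(v):
        # Best sum of a path that starts at v and goes down one side.
        nonlocal best
        if v is None:
            return 0
        left = max(down(v.left), 0)    # drop a negative branch entirely
        right = max(down(v.right), 0)
        best = max(best, v.val + left + right)  # path bending at v
        return v.val + max(left, right)

    down(root)
    return best
```

The contrast with House Robber III is that the value returned to the parent (one-sided path) differs from the value that competes for the answer (two-sided path), so the answer is tracked on the side rather than read from the root's return value.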
Product Extension
Tree DP underlies code-review priority computation in commit-trees, expression-tree evaluation in compilers, hierarchical resource allocation (org charts, file systems), and game-tree value computation. The post-order-with-tuple pattern is a workhorse.
Language/Runtime Follow-ups
- Python: tuple unpacking is idiomatic. The default recursion limit (1000) is exceeded on deep skewed trees; raise it with sys.setrecursionlimit.
- Java: return an int[] of size 2 (or a small pair-like class or record). Recursion depth is O(H); the default thread stack size is platform-dependent (commonly 512 KB to 1 MB), which is normally enough for N ≤ 10^4.
- Go: return (int, int) directly via multiple-return.
- C++: pair<int,int> returned by value.
- JS/TS: return [rob, skip] array.
Common Bugs
- Returning only max(rob_v, skip_v) instead of the tuple: the parent then can’t distinguish the two cases and the recurrence is wrong.
- Computing rob_v as v.val + skip_v: wrong, because skip_v already includes max(rob_child, skip_child) not just skip_child.
- Forgetting v is None base case: null pointer / AttributeError.
- Confusing which children’s value to sum: rob_v sums ls + rs (skip both children); skip_v sums max(lr, ls) + max(rr, rs).
- Iterative version: the traversal pops nodes in pre-order and the DP runs over the reversed list. Any reversed pre-order processes children before their parent, which is all the DP needs; the bug is forgetting the reversal, so a parent reads state for a child that has not been processed yet.
- Using @lru_cache on f(v) directly: it works for the plain TreeNode above, because Python instances hash by identity by default, but it raises TypeError if the node class defines __eq__ without __hash__ (for example a default @dataclass), and the cache keeps every node alive. Keying a dict by id(v) avoids both issues.
Debugging Strategy
For [3,2,3,null,3,null,1]:
- Leaves: f(3-leaf, right child of 2) = (3, 0). f(1-leaf, right child of the right 3) = (1, 0).
- Mid-left node (val 2, right child 3): f(2-mid) = (2 + 0, 3) = (2, 3).
- Mid-right (val 3, right child 1): f(3-right) = (3 + 0, 1) = (3, 1).
- Root (val 3): f(root) = (3 + 3 + 1, max(2,3) + max(3,1)) = (7, 6). Answer: 7. ✓
If your tuple values diverge, print (rob, skip) per node in post-order and locate the first inconsistency.
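One way to get that per-node printout is to record the pair as each node finishes. The sketch below is an assumption-laden illustration: rob_trace is a name invented here, and nodes are identified only by their value, which is ambiguous when values repeat.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def rob_trace(root):
    """Return (trace, answer); trace lists (val, rob, skip) in post-order."""
    trace = []

    def f(v):
        if v is None:
            return (0, 0)
        lr, ls = f(v.left)
        rr, rs = f(v.right)
        state = (v.val + ls + rs, max(lr, ls) + max(rr, rs))
        trace.append((v.val, state[0], state[1]))  # recorded after both children
        return state

    answer = max(f(root))
    return trace, answer

if __name__ == "__main__":
    # The first example tree: [3,2,3,null,3,null,1]
    root = TreeNode(3, TreeNode(2, None, TreeNode(3)), TreeNode(3, None, TreeNode(1)))
    trace, answer = rob_trace(root)
    for val, rob_v, skip_v in trace:
        print(f"val={val} rob={rob_v} skip={skip_v}")
    print("answer:", answer)
```

For the first example tree the last line of the trace is the root's pair, (7, 6), matching the hand trace above.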
Mastery Criteria
- Recognized this as tree DP within 60 seconds.
- Articulated the (rob, skip) tuple invariant in <30 seconds.
- Wrote the tuple-return post-order in <5 minutes from blank screen.
- Stated O(N) time and O(H) space.
- Articulated why the tuple is necessary (parent needs both rob_child and skip_child) in <30 seconds.
- Solved LC 337 unaided in <8 minutes.
- Solved LC 124 (Binary Tree Max Path Sum) in <12 minutes using the same post-order pattern.
- Solved LC 543 (Diameter) in <8 minutes.
- Identified the brute-force redundancy (descend twice via children + grandchildren) without prompting.
Lab 09 — Interval DP (Burst Balloons)
Goal
Solve Burst Balloons (LC 312) with the four-stage progression. Internalize the “think backwards” trick: instead of asking “which balloon to burst first?” (which fragments the array), ask “which balloon is burst last in the interval (i, j)?” — that balloon’s left and right neighbors are guaranteed to be nums[i] and nums[j] (the surviving boundary). After this lab you can identify and solve interval DP problems in <15 minutes.
Background Concepts
Interval DP: state dp[i][j] is the answer over the subarray (or substring) from index i to j. Recurrence iterates over a “split point” k in (i, j) and combines two sub-intervals. Defining feature: iterate by interval length ascending, so all shorter intervals are computed before they’re needed.
Burst Balloons is famous because the naive “first to burst” formulation creates non-contiguous subproblems. The “last to burst” reformulation (think backwards) restores contiguity: in interval (i, j), if k is the last balloon to burst, then by the time we burst it, all of (i, k) and (k, j) have already been burst, and k’s left and right neighbors are exactly nums[i] and nums[j] (the original boundary).
Interview Context
Burst Balloons is a top-Hard interval DP problem at Google and Microsoft. It is the problem that teaches the “think backwards” trick. Candidates who fail to recognize the contiguity issue with the forward formulation (and then naively try memoization on subsets — which is 2^N states) get stuck. Senior interviewers love this problem precisely because the reformulation is non-obvious and tests insight, not memorization.
Other interval DP problems: Minimum Cost to Cut a Stick (LC 1547, a Matrix-Chain-Multiplication-style problem), Stone Game (LC 877), Strange Printer (LC 664), Remove Boxes (LC 546).
Problem Statement
You are given n balloons indexed 0 to n−1, each with a number nums[i] painted on it. You are asked to burst all the balloons. If you burst balloon i, you get nums[i-1] * nums[i] * nums[i+1] coins (use 1 if neighbor is out of bounds). After bursting, the neighbors become adjacent. Return the maximum coins you can collect.
Constraints
- N == nums.length
- 1 ≤ N ≤ 500
- 0 ≤ nums[i] ≤ 100
Clarifying Questions
- Are values non-negative? (Yes — given.)
- After bursting, do neighbors become adjacent? (Yes — that’s the rule.)
- What if a neighbor is out of bounds (edge balloon)? (Treat as 1.)
- Must we burst all balloons? (Yes — the question asks max coins from bursting all.)
- Are zero values allowed? (Yes — bursting a zero-balloon gives 0 coins.)
Examples
nums = [3, 1, 5, 8] → 167
  Burst order: 1, 5, 3, 8.
  3 * 1 * 5 = 15; 3 * 5 * 8 = 120; 1 * 3 * 8 = 24; 1 * 8 * 1 = 8; total 167.
nums = [1, 5] → 10
  Burst 1 first → 1*1*5 = 5; burst 5 → 1*5*1 = 5; total 10.
  (Bursting 5 first is worse: 1*5*1 = 5, then 1*1*1 = 1; total 6.)
nums = [9] → 9
nums = [1] → 1
Initial Brute Force
For each balloon, try bursting it first, then recurse on the remaining balloons. Note: this naive form has a fundamental contiguity issue. After bursting k first, the boundary values of the left and right parts change depending on which balloons remain, so the subproblems aren’t independent. We can still write it, but it requires passing the current active set down.
def burst_brute(nums):
    arr = [1] + nums + [1]
    def f(active):
        if not active: return 0
        best = 0
        for idx in range(len(active)):
            k = active[idx]
            left = active[idx-1] if idx > 0 else 0
            right = active[idx+1] if idx+1 < len(active) else len(arr)-1
            gain = arr[left] * arr[k] * arr[right]
            best = max(best, gain + f(active[:idx] + active[idx+1:]))
        return best
    return f(list(range(1, len(arr) - 1)))
Brute Force Complexity
O(N!) — every permutation of bursts. At N=500, completely infeasible.
Optimization Path
The “first to burst” formulation cannot be memoized cleanly on (i, j) because the subproblems depend on what’s outside (i, j). Reframe: ask “in the open interval (i, j), which balloon k is burst last?”. By the time k is burst, all balloons in (i, k) and (k, j) have been burst, independently of each other. k’s neighbors at that moment are nums[i] and nums[j] (the surviving boundary). The recurrence becomes:
dp[i][j] = max over k in (i, j) of:
    dp[i][k] + nums[i] * nums[k] * nums[j] + dp[k][j]
Pad nums with 1 at both ends to handle out-of-bounds neighbors uniformly. Iterate by interval length.
Final Expected Approach
1. arr = [1] + nums + [1] (length N+2)
2. dp[i][j] = max coins from bursting all balloons strictly between i and j (boundaries i, j untouched)
3. dp[i][i+1] = 0 (no balloons between i and i+1)
4. For length 2..N+1:
     For i in 0..N+1-length:
       j = i + length
       dp[i][j] = max over k in (i, j) of dp[i][k] + arr[i]*arr[k]*arr[j] + dp[k][j]
5. Answer: dp[0][N+1]
Data Structures Used
- 2D dp[N+2][N+2].
- Padded array arr of size N+2.
Correctness Argument
Claim: dp[i][j] = max coins from bursting all balloons in (i, j) (exclusive), assuming balloons at indices i and j are still present. Proof by induction on length.
Base: length 1 → (i, i+1) has no balloons strictly between → 0.
Inductive step: any optimal bursting order has a last balloon k in (i, j). When k is burst, all other balloons in (i, k) and (k, j) have already been burst, and k’s neighbors are arr[i] and arr[j] (because k is the last to go). The two subintervals (i, k) and (k, j) are independent — neither affects the other since they’re separated by k itself, which is alive until the end. So:
- dp[i][k] = max coins from bursting (i, k) (boundaries i, k alive).
- dp[k][j] = max coins from bursting (k, j) (boundaries k, j alive).
- arr[i] * arr[k] * arr[j] = coins from bursting k last with neighbors i, j.
Sum and maximize over k. The induction works because both subintervals are strictly shorter than (i, j).
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force | O(N!) | O(N) |
| Memoized | O(N³) | O(N²) |
| Tabulated | O(N³) | O(N²) |
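| Space-optimized | O(N³) | O(N²) (no further reduction; subproblems span all of (i, j)) |
At N=500 the bound N³ is 1.25 × 10^8; the loops actually run about N³/6 ≈ 2 × 10^7 inner iterations, which is comfortable in compiled languages and borderline in CPython.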
Implementation Requirements
# ---- Stage 1: Brute force ----
def burst_brute(nums):
    arr = [1] + nums + [1]
    def f(active):
        if not active: return 0
        best = 0
        for idx in range(len(active)):
            k = active[idx]
            left = active[idx-1] if idx > 0 else 0
            right = active[idx+1] if idx+1 < len(active) else len(arr) - 1
            gain = arr[left] * arr[k] * arr[right]
            best = max(best, gain + f(active[:idx] + active[idx+1:]))
        return best
    return f(list(range(1, len(arr) - 1)))

# ---- Stage 2: Memoized (think-backwards reformulation) ----
from functools import lru_cache
def burst_memo(nums):
    arr = [1] + nums + [1]
    @lru_cache(None)
    def f(i, j):
        if j - i < 2: return 0
        return max(
            f(i, k) + arr[i] * arr[k] * arr[j] + f(k, j)
            for k in range(i + 1, j)
        )
    return f(0, len(arr) - 1)

# ---- Stage 3: Tabulated 2D ----
def maxCoins(nums):
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n):
        for i in range(n - length):
            j = i + length
            best = 0
            for k in range(i + 1, j):
                cand = dp[i][k] + arr[i] * arr[k] * arr[j] + dp[k][j]
                if cand > best: best = cand
            dp[i][j] = best
    return dp[0][n-1]

# ---- Stage 4: No further space optimization (full 2D needed); presented as a tighter inner loop ----
def maxCoins_tight(nums):
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n):
        for i in range(n - length):
            j = i + length
            ai, aj = arr[i], arr[j]
            best = 0
            for k in range(i + 1, j):
                cand = dp[i][k] + ai * arr[k] * aj + dp[k][j]
                if cand > best: best = cand
            dp[i][j] = best
    return dp[0][n-1]
Tests
- [3, 1, 5, 8] → 167.
- [1, 5] → 10.
- [9] → 9.
- [1] → 1.
- [] → 0.
- [1, 1, 1] → 3.
- N=100 random (performance smoke test).
- Cross-check brute vs memo on N≤7.
Follow-up Questions
- “Find the burst order.” → Track the argmax k in each dp[i][j]; recursively reconstruct.
- “Each balloon has a different gain function (not multiplicative).” → Same DP shape, as long as the gain for bursting k last depends only on k and the two boundary balloons i and j.
- “Can we burst at most M balloons?” → Add a 3rd dimension dp[i][j][m].
- “Stone Game family (LC 877).” → Interval DP with two players; dp[i][j] = max-score-difference.
- “Matrix Chain Multiplication.” → Same shape: dp[i][j] = min over k of dp[i][k] + dp[k+1][j] + cost(i,k,j).
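The burst-order follow-up can be sketched by storing the argmax next to each dp cell. This is a minimal illustration; the function name max_coins_with_order and the auxiliary table last are assumptions made for this example. The order is emitted sub-intervals first and k last, mirroring the "last to burst" recurrence.

```python
def max_coins_with_order(nums):
    """Return (max_coins, order), where order lists indices into nums."""
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    last = [[-1] * n for _ in range(n)]   # last[i][j] = balloon burst last in (i, j)
    for length in range(2, n):
        for i in range(n - length):
            j = i + length
            for k in range(i + 1, j):
                cand = dp[i][k] + arr[i] * arr[k] * arr[j] + dp[k][j]
                # Always record some k, even when every candidate is 0.
                if last[i][j] == -1 or cand > dp[i][j]:
                    dp[i][j] = cand
                    last[i][j] = k

    order = []

    def emit(i, j):
        if j - i < 2:
            return
        k = last[i][j]
        emit(i, k)           # everything left of k goes first
        emit(k, j)           # then everything right of k
        order.append(k - 1)  # k is burst last; shift back to nums indexing

    emit(0, n - 1)
    return dp[0][n - 1], order
```

Replaying the returned order against the original array and summing the coins is a cheap end-to-end check that the reconstruction and the DP value agree. The recursive emit goes at most N levels deep, which is fine at the stated constraint.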
Product Extension
Interval DP underlies optimal binary search tree construction, optimal parenthesization for matrix chains in linear algebra libraries, and pricing problems in algorithmic finance (“when to buy/sell a contract that opens an interval”). The “think backwards / last to act” reframing recurs in algorithmic game theory and contract design.
Language/Runtime Follow-ups
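- Python: at N=500 the triple loop runs on the order of N³/6 ≈ 2 × 10^7 inner iterations (N³ = 1.25 × 10^8 is the loose bound); CPython may TLE. PyPy helps, and hoisting arr[i] and arr[j] out of the inner loop (as in maxCoins_tight) trims the constant factor.
- Java/Go/C++: no concern at this size.
- Memoization in Python: lru_cache(None) is fine; works with (i, j) int tuples.
- Iterative version: the plain triple-nested loop is usually the fastest form in CPython; whether a max(...) generator beats it is worth measuring rather than assuming.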
Common Bugs
- Trying “first to burst” recurrence and memoizing on (i, j): incorrect because the subproblems’ boundaries change as outer balloons are burst.
- Forgetting to pad with 1 at both ends: out-of-bounds neighbors then need special-casing in every loop iteration.
- Iterating i ascending, j ascending (row-major): dp[i][j] reads dp[i][k] (same row, already filled) and dp[k][j] for i < k < j (later rows, not yet filled), so it silently combines zeros. Iterate by length, or run i descending with j ascending.
- Off-by-one in range(i + 1, j): k must be strictly between i and j. Easy to write range(i, j) or range(i + 1, j + 1) and break the recurrence.
- Initializing best = -1: unnecessary, because all values are non-negative and dp[i][j] = 0 for empty intervals is correct; start from 0.
Debugging Strategy
For [3, 1, 5, 8] (padded to [1, 3, 1, 5, 8, 1]):
- Length 1 (j = i + 1, no balloons strictly between; length = j - i as in the code): all dp = 0.
- Length 2 (one balloon between): dp[0][2] = arr[0]*arr[1]*arr[2] = 1*3*1 = 3. dp[1][3] = 3*1*5 = 15. dp[2][4] = 1*5*8 = 40. dp[3][5] = 5*8*1 = 40.
- Length 3: e.g., dp[0][3] = max over k=1,2 of dp[0][k] + 1*arr[k]*5 + dp[k][3] = max(0 + 1*3*5 + 15, 3 + 1*1*5 + 0) = max(30, 8) = 30.
- Continue up to dp[0][5] = 167.
Print dp row by row and verify against the trace.
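To make the trace checkable, here is a small sketch that builds the padded table for [3, 1, 5, 8] and prints it row by row. The helper name burst_balloons_table is hypothetical; the recurrence is the "last to burst" form used in the trace above.

```python
def burst_balloons_table(nums):
    """Return the full dp table over the padded array [1] + nums + [1]."""
    arr = [1] + list(nums) + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    for length in range(3, n + 1):            # indices spanned by i..j, inclusive
        for i in range(n - length + 1):
            j = i + length - 1
            best = 0                          # empty intervals stay at 0
            for k in range(i + 1, j):         # k = last balloon burst in (i, j)
                best = max(best, dp[i][k] + arr[i] * arr[k] * arr[j] + dp[k][j])
            dp[i][j] = best
    return dp


dp = burst_balloons_table([3, 1, 5, 8])
for row in dp:
    print(row)
```

Compare dp[0][2], dp[1][3], dp[0][3], and dp[0][5] in the printed rows against the hand trace; the first cell that disagrees tells you which interval length went wrong.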
Mastery Criteria
- Recognized the contiguity issue with “first to burst” within 90 seconds.
- Articulated the “think backwards / last to burst” reformulation in <60 seconds.
- Wrote the corrected recurrence on a whiteboard in <2 minutes.
- Wrote the tabulated O(N³) solution in <8 minutes from blank screen.
- Padded nums with sentinels correctly without prompting.
- Iterated by interval length without prompting.
- Stated O(N³) time and O(N²) space.
- Solved LC 312 unaided in <20 minutes.
- Solved Matrix Chain Multiplication in <12 minutes via the same template.
Lab 10 — Bitmask DP (Shortest Path Visiting All Nodes)
Goal
Solve Shortest Path Visiting All Nodes (LC 847) with both BFS over (node, mask) states and an iterative DP variant. Internalize bitmask state encoding: when N is small (≤ 20), the subset S ⊆ {0, …, N-1} fits in a single integer mask and a 1D array of size 2^N indexes all subsets. After this lab you can handle any “visit all / cover all / select subset” problem with N ≤ 20 in <15 minutes.
Background Concepts
Bitmask DP encodes a subset as an integer’s bits. For N=12, there are 2^12 = 4096 subsets — small enough that dp[mask][i] (mask × last-visited-node) has 4096 × 12 ≈ 50K states. Each state’s transition is O(N), giving O(N² · 2^N) total — feasible for N ≤ 16, manageable for N ≤ 20 with care.
Common bitmask DP patterns:
- Visit-all / TSP-like: dp[mask][i] = min cost to visit nodes in mask ending at i. Final: min over i of dp[(1<<N)-1][i] (+ return-to-start cost for TSP).
- Subset-cover: dp[mask] = best score selecting items whose indicator is mask.
- Assignment problems: assign N people to N tasks with min total cost, using dp[mask] where mask = set of tasks already assigned, with popcount(mask) people processed so far.
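The assignment pattern is the easiest of the three to sketch in a few lines. The function name min_assignment_cost and the cost[person][task] matrix layout are assumptions for illustration; the key idea from the bullet above is that popcount(mask) identifies which person is being placed next.

```python
def min_assignment_cost(cost):
    """cost[p][t] = cost of giving task t to person p (square matrix assumed)."""
    n = len(cost)
    INF = float("inf")
    dp = [INF] * (1 << n)                     # dp[mask] = min cost with tasks in mask taken
    dp[0] = 0
    for mask in range(1 << n):
        if dp[mask] == INF:
            continue
        person = bin(mask).count("1")         # popcount(mask) people already placed
        if person == n:
            continue
        for task in range(n):
            if mask & (1 << task):
                continue                      # task already assigned
            nxt = mask | (1 << task)
            cand = dp[mask] + cost[person][task]
            if cand < dp[nxt]:
                dp[nxt] = cand
    return dp[(1 << n) - 1]


print(min_assignment_cost([[9, 2, 7], [6, 4, 3], [5, 8, 1]]))
```

Iterating masks in increasing numeric order is safe here because every transition sets one more bit, so the target mask is always larger than the source.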
LC 847 is unusual: it’s a shortest unweighted path problem (BFS), not a min-cost (Dijkstra) problem. So BFS over (node, mask) is the natural approach. We can also frame it as DP, but BFS is cleaner here.
Interview Context
Shortest Path Visiting All Nodes (LC 847) is a top-Hard graph + bitmask problem at Google and Meta. The bitmask-on-graph technique recurs in: LC 943 (Find the Shortest Superstring), LC 1125 (Smallest Sufficient Team), LC 1349 (Maximum Students Taking Exam), LC 526 (Beautiful Arrangement). The trick of recognizing N ≤ 20 → bitmask is itself an interview heuristic: any time N is suspiciously small, consider bitmask.
Problem Statement
You have an undirected, connected graph of n nodes labeled from 0 to n − 1. graph[i] is the list of neighbors of node i. Return the length of the shortest path that visits every node. You may start and stop at any node, may revisit nodes, and may reuse edges.
Constraints
- 1 ≤ n ≤ 12
- graph.length == n
- 0 ≤ graph[i].length < n
- The graph is connected.
Clarifying Questions
- Edges are undirected? (Yes — given.)
- Edges weighted or unweighted? (Unweighted — count edges traversed.)
- Can the same node be visited multiple times? (Yes.)
- Can we start anywhere? (Yes — the answer minimizes over all starting nodes.)
- Must the graph be connected? (Yes — given. Otherwise the answer is infeasible.)
Examples
graph = [[1,2,3],[0],[0],[0]] → 4 (visit order: 1→0→2→0→3 or 2→0→1→0→3 etc.)
graph = [[1],[0,2,4],[1,3,4],[2],[1,2]] → 4
graph = [[1],[0]] → 1 (just edge 0–1)
graph = [[]] → 0 (single node, already visited)
Initial Brute Force
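Try every permutation of the nodes as the order of first visits; sum the shortest-path lengths between consecutive nodes (precomputed via BFS). N! permutations × O(N) work per permutation. At N=12, 12! ≈ 4.8 × 10^8 permutations, so roughly 5.7 × 10^9 basic operations: far too slow in practice, but a useful oracle for small N.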
from itertools import permutations
from collections import deque

def shortestPathLength_brute(graph):
    n = len(graph)
    if n == 1: return 0
    # Precompute pairwise shortest path lengths via BFS
    dist = [[float('inf')] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] == float('inf'):
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    best = float('inf')
    for perm in permutations(range(n)):
        cost = sum(dist[perm[i]][perm[i+1]] for i in range(n - 1))
        best = min(best, cost)
    return best
Brute Force Complexity
O(N! · N) time. At N=12, infeasible.
Optimization Path
Observation: at any point, the relevant state is (current_node, set_of_visited_nodes). There are N · 2^N such states. Two ways to solve:
- BFS over states (canonical for unweighted): each state is a node in a meta-graph of (node, mask); transitions to (neighbor, mask | (1 << neighbor)). BFS gives shortest-path lengths to all states; the answer is the smallest distance to any state with mask = (1 << N) - 1. Time O(N · 2^N · degree) ≈ O(N² · 2^N).
- Iterative DP (Held-Karp style for TSP, but TSP minimizes weighted paths; for unweighted with revisits the BFS is more natural).
We present BFS as the primary solution (canonical for LC 847) with the iterative-DP variant as a follow-up.
Final Expected Approach
1. Initialize a queue with all (node, 1 << node) states (one per starting node), distance 0.
2. BFS:
   - Pop (u, mask). If mask == ALL = (1<<N)-1, return current distance.
   - For each neighbor v of u:
       new_mask = mask | (1 << v)
       if (v, new_mask) not yet visited:
           mark, enqueue, distance + 1.
Data Structures Used
- deque for BFS.
- 2D visited[node][mask] boolean (or a set of (node, mask) tuples).
- For brute force: pairwise dist table (BFS-precomputed) and itertools.permutations.
Correctness Argument
The state graph has nodes (u, mask) and edges (u, mask) → (v, mask | (1 << v)) for every graph neighbor v of u. A path in the original graph that visits all nodes corresponds 1-to-1 to a path in the state graph from some starting state (s, 1 << s) to a “complete” state (t, ALL) for some t. Since the state-graph edges are unweighted, BFS finds the shortest such path. Multi-source BFS over all starting states minimizes over all start nodes simultaneously. Termination: the state graph has N · 2^N nodes; BFS visits each at most once.
Complexity
| Stage | Time | Space |
|---|---|---|
| Brute force (perm + BFS dist) | O(N! · N + N·(N + E)) | O(N²) |
| BFS over (node, mask) | O(N² · 2^N) | O(N · 2^N) |
| DP table over (mask, node), queue-based relaxation | O(N² · 2^N) | O(N · 2^N) |
At N=12: 12² · 4096 = 590K ops — fast.
Implementation Requirements
from collections import deque
from itertools import permutations

# ---- Stage 1: Brute force ----
def shortestPathLength_brute(graph):
    n = len(graph)
    if n == 1: return 0
    dist = [[float('inf')] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] == float('inf'):
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    best = float('inf')
    for perm in permutations(range(n)):
        cost = sum(dist[perm[i]][perm[i+1]] for i in range(n - 1))
        if cost < best: best = cost
    return best
# ---- Stage 2: Memoized DFS over (node, mask); cautionary example, incomplete when revisits are required (see NOTE) ----
from functools import lru_cache

def shortestPathLength_memo(graph):
    n = len(graph)
    ALL = (1 << n) - 1

    @lru_cache(None)
    def f(u, mask):
        if mask == ALL: return 0
        best = float('inf')
        for v in graph[u]:
            new_mask = mask | (1 << v)
            if new_mask != mask:  # only progress if v is newly visited
                best = min(best, 1 + f(v, new_mask))
        # Moves onto already-visited nodes are skipped here, which is exactly what breaks this version.
        return best

    # Try every starting node
    return min(f(s, 1 << s) for s in range(n))

# NOTE: this memo version misses cases where you must transit through already-visited nodes
# (on the star graph [[1,2,3],[0],[0],[0]] every leaf dead-ends and the result is inf).
# BFS handles that natively because (v, new_mask) where new_mask == mask is allowed if not yet seen.
# ---- Stage 3: BFS over (node, mask), the canonical solution ----
def shortestPathLength(graph):
    n = len(graph)
    if n == 1: return 0
    ALL = (1 << n) - 1
    visited = set()
    q = deque()
    for s in range(n):
        state = (s, 1 << s)
        visited.add(state)
        q.append((s, 1 << s, 0))
    while q:
        u, mask, dist = q.popleft()
        if mask == ALL: return dist
        for v in graph[u]:
            new_mask = mask | (1 << v)
            state = (v, new_mask)
            if state not in visited:
                visited.add(state)
                q.append((v, new_mask, dist + 1))
    return -1  # unreachable; should not happen for connected graphs
# ---- Stage 4: DP table over (mask, node), filled by queue-based relaxation (alternative formulation) ----
# A plain popcount-ascending fill is not enough here: moves onto already-visited nodes keep the mask unchanged.
def shortestPathLength_dp(graph):
    n = len(graph)
    if n == 1: return 0
    ALL = (1 << n) - 1
    INF = float('inf')
    # dp[mask][u] = min edges to reach state (u, mask) from any starting node
    dp = [[INF] * n for _ in range(1 << n)]
    q = deque()
    for s in range(n):
        dp[1 << s][s] = 0
        q.append((s, 1 << s))
    while q:
        u, mask = q.popleft()
        for v in graph[u]:
            new_mask = mask | (1 << v)
            if dp[new_mask][v] > dp[mask][u] + 1:
                dp[new_mask][v] = dp[mask][u] + 1
                q.append((v, new_mask))
    return min(dp[ALL][u] for u in range(n))
Tests
- [[1,2,3],[0],[0],[0]] → 4.
- [[1],[0,2,4],[1,3,4],[2],[1,2]] → 4.
- [[1],[0]] → 1.
- [[]] → 0 (single node).
- N=12 with sparse and dense connectivity: performance smoke test.
- Cross-check brute vs BFS on N≤6 random connected graphs.
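The random cross-check in the last test can be sketched as a small self-contained harness. All names here (bfs_visit_all, brute_visit_all, random_connected_graph, cross_check) are hypothetical; the graph generator builds a random spanning tree first so that connectivity holds by construction, which is an assumption of this sketch rather than something the lab prescribes.

```python
import random
from collections import deque
from itertools import permutations


def bfs_visit_all(graph):
    """Multi-source BFS over (node, mask) states."""
    n = len(graph)
    full = (1 << n) - 1
    seen = [[False] * n for _ in range(1 << n)]
    q = deque()
    for s in range(n):
        seen[1 << s][s] = True
        q.append((s, 1 << s, 0))
    while q:
        u, mask, d = q.popleft()
        if mask == full:
            return d
        for v in graph[u]:
            nm = mask | (1 << v)
            if not seen[nm][v]:
                seen[nm][v] = True
                q.append((v, nm, d + 1))
    return -1


def brute_visit_all(graph):
    """Oracle: best permutation of first visits over BFS pairwise distances."""
    n = len(graph)
    dist = [[None] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    return min(
        sum(dist[p[i]][p[i + 1]] for i in range(n - 1))
        for p in permutations(range(n))
    )


def random_connected_graph(n, extra_edges, rng):
    edges = set()
    for v in range(1, n):                     # random spanning tree keeps it connected
        edges.add((rng.randrange(v), v))
    if n >= 2:
        for _ in range(extra_edges):
            a, b = rng.sample(range(n), 2)
            edges.add((min(a, b), max(a, b)))
    graph = [[] for _ in range(n)]
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)
    return graph


def cross_check(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        g = random_connected_graph(n, rng.randint(0, n), rng)
        assert bfs_visit_all(g) == brute_visit_all(g), g
    return True
```

A fixed seed keeps failures reproducible; when the assertion fires, the offending graph is printed as the assertion message.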
Follow-up Questions
- “What if edges are weighted?” → Replace BFS with Dijkstra (priority queue) over the same (node, mask) states: O(N² · 2^N · log(N · 2^N)).
- “Must return to start (TSP closed tour).” (Held-Karp) → Compute dp[mask][u] = min cost from start to u visiting mask; final answer min over u of dp[ALL][u] + dist[u][start].
- “Can revisit, weighted, must visit all.” → Floyd-Warshall preprocess to get all-pairs shortest paths, then Held-Karp on the dense complete graph induced by those distances.
- “N up to 20: does this still fit?” → 2^20 ≈ 10^6, N²·2^N ≈ 4×10^8. Borderline; need bit tricks and tight inner loops, often C++/Java only.
- “All Hamiltonian paths (visit each node exactly once).” → Same dp[mask][u] table, but only transition to neighbors v that are not already in mask, so any state with popcount(mask) == N is a Hamiltonian path. NP-hard but bitmask handles N≤20.
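The "can revisit, weighted, must visit all" follow-up can be sketched by chaining the two pieces named above: Floyd-Warshall for all-pairs distances, then a Held-Karp style table over those distances. The function name min_weight_visit_all and the (u, v, w) edge-list input are assumptions for illustration; weights are assumed non-negative and the graph connected.

```python
def min_weight_visit_all(n, edges):
    """edges: list of (u, v, w) for an undirected graph with non-negative weights."""
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
    for k in range(n):                        # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    full = (1 << n) - 1
    # dp[mask][u] = min cost of a walk whose first visits cover mask and that ends at u
    dp = [[INF] * n for _ in range(1 << n)]
    for s in range(n):
        dp[1 << s][s] = 0                     # start anywhere
    for mask in range(1, 1 << n):
        for u in range(n):
            cur = dp[mask][u]
            if cur == INF:
                continue
            for v in range(n):
                if mask & (1 << v):
                    continue                  # revisits are already folded into dist
                nm = mask | (1 << v)
                if cur + dist[u][v] < dp[nm][v]:
                    dp[nm][v] = cur + dist[u][v]
    return min(dp[full])


print(min_weight_visit_all(3, [(0, 1, 5), (1, 2, 2)]))
```

Because revisits are absorbed into the shortest-path distances, the table only ever moves to unvisited nodes, which is what makes a plain ascending loop over masks valid here.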
Product Extension
The bitmask DP / Held-Karp algorithm is the gold-standard exact solution for small TSP-like problems. Real applications: drone delivery routing for ≤ 20 stops, layout optimization in chip design, scheduling N jobs on a single machine with sequence-dependent setup times, optimal-question-ordering in adaptive testing.
Language/Runtime Follow-ups
- Python: bit operations (|, &, <<) on ints are arbitrary-precision; (1 << N) - 1 works for any N. Use bin(mask).count('1') for popcount, or mask.bit_count() in Python 3.10+.
- Java: Integer.bitCount(mask). Use int for N ≤ 31, long for N ≤ 63.
- Go: bits.OnesCount(uint(mask)) from math/bits.
- C++: __builtin_popcount(mask) on GCC/Clang (or std::popcount from <bit> in C++20). Typically compiles to a single CPU instruction when the target supports a hardware popcount.
- JS/TS: bitwise ops are 32-bit signed; for N > 30 use BigInt (slower).
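For the Python bullet, a few helpers cover most of what bitmask labs need. This is a minimal sketch; the helper names popcount, full_mask, and set_bits are made up for the example and use only built-in int operations.

```python
def popcount(mask):
    """Number of set bits; int.bit_count() is an alternative on Python 3.10+."""
    return bin(mask).count("1")


def full_mask(n):
    """Mask with the n lowest bits set; Python ints never overflow."""
    return (1 << n) - 1


def set_bits(mask):
    """Yield the indices of set bits, lowest first."""
    while mask:
        low = mask & -mask                    # isolate the lowest set bit
        yield low.bit_length() - 1
        mask ^= low                           # clear it and continue


print(popcount(0b1011), full_mask(12), list(set_bits(0b10110)))
```

Iterating only the set bits avoids scanning all N positions when masks are sparse.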
Common Bugs
- Forgetting to multi-source initialize the BFS — a single starting node misses better starts.
- Skipping every transition where new_mask == mask (v already visited): wrong, because the shortest walk sometimes has to transit through already-visited nodes. Deduplicate on the full state (v, new_mask) instead; the visited set then prunes only genuinely repeated states.
- Operator precedence around the shift: mask | 1 << v does parse as mask | (1 << v) in C/Java/JS/Python (shift binds tighter than |), but in C/C++/JS the test mask & 1 << v == 0 parses as mask & ((1 << v) == 0) because == binds tighter than &. Always parenthesize both the shift and the mask test.
- Misplacing the mask == ALL check: at dequeue time return dist; if you check at enqueue time instead, return dist + 1 and handle n == 1 separately, because its start state is already complete and is never produced by a transition.
- set of tuples performance: in Python, a 2D bool array is usually faster than a set for the visited check at high N. Use [[False]*N for _ in range(1<<N)].
Debugging Strategy
For [[1,2,3],[0],[0],[0]] (N=4): BFS expands (0,0001), (1,0010), (2,0100), (3,1000) at distance 0. From (1,0010), we go to (0,0011) at distance 1. From (0,0011), we go to (1,0011), (2,0111), (3,1011) at distance 2. From (2,0111) we go to (0,0111) at distance 3. From (0,0111) we go to (3,1111) at distance 4 — done. Print (u, bin(mask), dist) per dequeue and locate where your trace diverges.
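The per-dequeue print suggested above can be sketched as a BFS that records its own trace. The function name bfs_trace and the returned (answer, trace) pair are assumptions for illustration; masks are rendered as fixed-width binary strings so they line up with the hand trace.

```python
from collections import deque


def bfs_trace(graph):
    """Return (answer, trace); trace holds (u, mask_bits, dist) in dequeue order."""
    n = len(graph)
    full = (1 << n) - 1
    seen = {(s, 1 << s) for s in range(n)}
    q = deque((s, 1 << s, 0) for s in range(n))
    trace = []
    while q:
        u, mask, d = q.popleft()
        trace.append((u, format(mask, f"0{n}b"), d))
        if mask == full:
            return d, trace
        for v in graph[u]:
            state = (v, mask | (1 << v))
            if state not in seen:
                seen.add(state)
                q.append((state[0], state[1], d + 1))
    return -1, trace


answer, trace = bfs_trace([[1, 2, 3], [0], [0], [0]])
for entry in trace:
    print(entry)
print("answer:", answer)
```

Diff this output against your own implementation's prints; the first differing line is usually a missing start state or a state that was wrongly pruned.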
Mastery Criteria
- Recognized “N ≤ 12, visit all” as bitmask DP within 60 seconds.
- Stated the state space (node, mask) and its size N · 2^N in <30 seconds.
- Wrote the multi-source BFS in <8 minutes from blank screen.
- Articulated why multi-source initialization is correct (start anywhere).
- Stated O(N² · 2^N) time complexity.
- Solved LC 847 unaided in <20 minutes.
- Articulated the difference between BFS-over-states (unweighted) and Held-Karp (weighted/TSP) in <60 seconds.
- Wrote mask.bit_count() / __builtin_popcount correctly without prompting.
- Solved LC 526 (Beautiful Arrangement) in <12 minutes using dp[mask] over assignments.
Phase 6 — Greedy, Proofs & Mathematical Thinking
Target level: Medium → Hard
Expected duration: 1.5 weeks (12-week track) / 2 weeks (6-month track) / 2.5 weeks (12-month track)
Weekly cadence: ~7 greedy concepts plus 6 labs plus 25–40 problems applying them under the framework
Why Greedy Is The Single Most Dangerous Pattern Family In Coding Interviews
Greedy is the topic where the largest number of candidates fail confidently. Unlike dynamic programming, where the failure mode is “I cannot derive the recurrence” — a visible failure that the interviewer can help with — greedy’s failure mode is “I have a plausible algorithm, I have run it on the given example, it works, and I am wrong.” The candidate writes a clean function, the test cases pass, the complexity is excellent, and the algorithm is silently incorrect on the third hidden test. By the time the interviewer reveals the counterexample, the candidate has consumed 25 minutes building a wrong solution and has 10 minutes left to either patch it (impossible without re-deriving) or restart with DP (also impossible).
The empirical claim that drives this entire phase:
The hard part of greedy is not the algorithm. The hard part is the proof of correctness. Almost every wrong greedy solution is wrong because the candidate convinced themselves “this seems to work” without an exchange argument or invariant — and an interviewer who suspects the candidate is guessing will deliberately construct a counterexample. A candidate who can produce an exchange argument out loud, before the interviewer asks, is signalling “I know what I’m doing”; a candidate who can’t is signalling “I memorized this”.
Greedy is also the topic where the gap between a good engineer and a great one is widest in interview signal. Most candidates can write sort + scan + counter. Very few can articulate why the sort criterion is correct. The ability to say, in 60 seconds, “Suppose for contradiction the optimal solution uses a different first choice; I can exchange it with my greedy choice without making the solution worse, therefore my greedy is also optimal” — that one paragraph is the entire difference between an L4 hire and an L5 hire on greedy questions.
This phase is built around one teaching device that we will use on every single problem from start to finish: the proof comes before the code. Every lab in this phase requires you to write the correctness argument — exchange argument, invariant, or monovariant — before you write the implementation. The implementation is mechanical once the proof is solid; the proof is the whole skill.
After this phase, you can recognize when a problem is amenable to greedy in <2 minutes, produce an exchange argument out loud in <90 seconds, write the algorithm in <5 minutes, and — crucially — identify when a problem looks greedy but is not, falling back to DP from Phase 5 without panic.
What You Will Be Able To Do After This Phase
- Recognize greedy candidates in <2 minutes by spotting the greedy choice property signal: “the locally optimal choice cannot hurt the global optimum.”
- Distinguish greedy-applicable problems from DP-required problems on first read, using the Greedy-vs-DP flowchart.
- Produce an exchange argument for any greedy you propose, in the canonical four-step form (assume optimal differs, locate first divergence, exchange, prove no-worse).
- Cite the cut property for MST correctness and explain why both Kruskal and Prim are correct under it.
- Use loop invariants to scaffold proofs of greedy algorithms whose correctness is not obvious from a single exchange.
- Use monovariants (strictly-decreasing or strictly-increasing quantities) to prove termination and correctness of iterative greedy algorithms.
- Apply amortized analysis (potential method, accounting method, aggregate analysis) to bound the cost of greedy data-structure operations.
- Identify counterexamples to plausible-looking greedy heuristics, including the canonical “0/1 knapsack ≠ fractional knapsack” trap.
- Implement and prove correct: interval scheduling, jump game II, task scheduler, gas station, Huffman coding, and the greedy-vs-DP comparison on coin change.
- Articulate the failure modes of greedy unprompted: missing counterexample, “feels right” without proof, confusing local optimum with global optimum.
How To Read This Phase
Read this README in two passes. Pass 1: linear, end-to-end, building the mental discipline that “I will not ship a greedy solution without an exchange argument.” Do this in one sitting. Pass 2: as you work the labs, refer back to specific concept entries when stuck on a proof.
Each concept entry has a fixed shape:
- Precise Definition — what the concept means, mathematically.
- When Applicable — the problem signal that should fire this concept.
- Worked Example — the concept applied to a canonical problem, end-to-end.
- Common Misuse — the concrete failure mode this concept guards against.
The phase ends with a Greedy-vs-DP flowchart, a Common Greedy Bugs catalog, a Mastery Checklist, and Exit Criteria.
Inline Concept Reference
1. Greedy Choice Property
Precise Definition
A problem has the greedy choice property if there exists an ordering of the input such that, after making the locally optimal choice (the “greedy choice”) at each step, the result is a globally optimal solution. Formally: at every step i, there is a choice c_i such that some globally optimal solution to the original problem extends c_1, c_2, …, c_i. Equivalently, after the greedy choice the remaining problem is a smaller instance of the same problem, and combining the greedy choice with any optimal solution to the residual problem yields an optimal solution to the original.
This is the formal cousin of optimal substructure (which DP also requires) plus the additional claim that a single locally optimal choice — not a search over choices — suffices at each step.
When Applicable
The greedy choice property holds when:
- The problem can be solved by a sequence of irreversible decisions.
- At each step, there is a most attractive choice by some natural metric (earliest deadline, smallest weight, largest ratio, latest start time).
- An exchange argument or cut property can be invoked to prove that the most-attractive choice is never the wrong one.
The greedy choice property does not hold when:
- The right choice at step i depends on choices made at step i+1, i+2, … (i.e., you must “look ahead” to decide). This is the DP regime.
- There are multiple incomparable “locally optimal” candidates and the wrong one creates suboptimal residual problems.
Worked Example: Activity Selection
Given n activities with start and end times, pick the maximum number of non-overlapping activities.
The greedy choice: pick the activity with the earliest end time among those still compatible with the previous picks. This satisfies the greedy choice property because: any optimal solution either contains the earliest-ending activity, or — if it doesn’t — we can swap its first picked activity for the earliest-ending one without overlap and without changing the count, producing a new optimal solution that does. (See Lab 01 — Interval Scheduling for the full exchange argument.)
Common Misuse
The most common error is to assume the greedy choice property without proving it, on the basis that “earliest deadline first feels intuitive”. Counter-examples are everywhere; for example, “earliest start time” also feels intuitive but is wrong (consider one activity from 1 to 100 versus dozens of short activities from 2 to 3, 3 to 4, …). The discipline: every greedy claim must be paired with a proof.
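The contrast between the two "intuitive" rules is easy to see in code. This is a minimal sketch with hypothetical function names; it assumes intervals are (start, end) pairs and that an activity may start exactly when the previous one ends.

```python
def select_by_earliest_end(intervals):
    """Greedy that has the greedy choice property for activity selection."""
    chosen, last_end = [], float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


def select_by_earliest_start(intervals):
    """Plausible-looking greedy that is wrong in general."""
    chosen, last_end = [], float("-inf")
    for start, end in sorted(intervals):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# One long activity versus several short ones, as in the counterexample above.
intervals = [(1, 100), (2, 3), (3, 4), (4, 5)]
print(len(select_by_earliest_end(intervals)))
print(len(select_by_earliest_start(intervals)))
```

The two functions differ only in the sort key, which is exactly why the sort criterion is the part that needs a proof.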
2. The Exchange Argument — The Canonical Greedy Proof Technique
Precise Definition
An exchange argument proves that a greedy solution G is optimal by showing that any other solution O can be transformed into G without increasing cost (or decreasing value) via a sequence of exchanges. Each exchange replaces an element of O with the corresponding element of G and proves the swap is non-worsening. After all exchanges, O has been transformed into G, so G is at least as good as O. Since O was arbitrary, G is at least as good as any solution — i.e., G is optimal.
Step-By-Step Recipe
The recipe is rigid. Memorize it. Use it on every greedy proof.
1. Let G be the greedy solution. Define it precisely (e.g., “the activities chosen by earliest-end-time-first”).
2. Let O be an arbitrary optimal solution. Assume O ≠ G (otherwise we’re done).
3. Locate the first index where they differ. Let i be the smallest index such that O[i] ≠ G[i]. By construction, G[0..i-1] = O[0..i-1].
4. Perform the exchange. Replace O[i] with G[i], producing a new solution O'.
5. Prove O' is feasible (still satisfies all constraints).
6. Prove O' is no worse than O (objective value no smaller for a max problem, no larger for a min problem).
7. Repeat. O' agrees with G on more positions than O did. Iterate; after finitely many exchanges, O has been transformed into G. Therefore G is optimal.
Worked Example: Activity Selection
Greedy G = activities sorted by end time, picked greedily. Suppose O ≠ G is optimal. Let i be the first divergence. By the greedy rule, G[i] has the earliest end time among activities compatible with G[0..i-1] = O[0..i-1]. So G[i].end ≤ O[i].end. Replace O[i] with G[i]: feasibility holds because G[i] ends no later than O[i], so all subsequent activities in O[i+1..], which all start after O[i].end, also start after G[i].end. The objective (count of activities) is unchanged. Repeat until O = G. QED.
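A proof should always be backed by a cheap empirical check. The sketch below compares the earliest-end greedy against an exhaustive search on small random instances; the function names and the random instance shape (integer starts in 0..10, lengths 1..5) are assumptions for illustration, and agreement on random inputs supports the proof without replacing it.

```python
import random
from itertools import combinations


def greedy_count(intervals):
    count, last_end = 0, float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count


def brute_count(intervals):
    """Largest pairwise-compatible subset, found by trying every subset size."""
    def compatible(subset):
        ordered = sorted(subset)
        return all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))

    for size in range(len(intervals), 0, -1):
        if any(compatible(c) for c in combinations(intervals, size)):
            return size
    return 0


def exchange_cross_test(trials=300, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        ivs = []
        for _ in range(rng.randint(0, 7)):
            s = rng.randint(0, 10)
            ivs.append((s, s + rng.randint(1, 5)))   # positive length
        assert greedy_count(ivs) == brute_count(ivs), ivs
    return True
```

Swapping the sort key in greedy_count for the start time makes the assertion fail quickly, which is a useful way to see what a broken exchange argument looks like in practice.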
Common Misuse
- Skipping step 5 (feasibility). Many “exchanges” produce an infeasible solution, invalidating the proof.
- Skipping step 6 (no-worse-than). The exchange must be non-worsening, not merely “different”.
- Stopping after one exchange. A single exchange shows G[0] = O'[0]; you must iterate.
- Picking the wrong index to exchange. Exchanging at the last difference rather than the first often fails because residual structure differs.
- Treating the exchange as a swap of types rather than concrete elements. Exchange a specific element of O with a specific element of G, not “exchange the early-ending one with the late-ending one”.
3. The Cut Property (MST Correctness)
Precise Definition
For a connected, weighted, undirected graph G = (V, E), the cut property states: for any cut (S, V\S) (a partition of vertices into two non-empty sets), the minimum-weight edge crossing the cut belongs to some minimum spanning tree of G. (If the minimum is unique, it belongs to every MST; if tied, at least one MST contains it.)
When Applicable
The cut property is the correctness theorem for greedy MST algorithms: Kruskal’s, Prim’s, Borůvka’s. It also generalizes to matroid theory: for a downward-closed family of feasible sets, the greedy algorithm is optimal for every weight function iff the family is a matroid.
Worked Example: Kruskal’s Algorithm
Kruskal sorts edges ascending by weight and adds each edge that doesn’t create a cycle. Correctness via the cut property: when Kruskal adds edge (u, v), the union-find structure tells us u and v are in different components. Let S be u’s current component and V\S everything else (including v’s component). Edge (u, v) crosses this cut, and it is a minimum-weight edge crossing it: any lighter crossing edge would have been scanned earlier, when its endpoints were also in different components (components only ever merge), so Kruskal would have added it and its endpoints would now share a component, a contradiction. By the cut property, (u, v) belongs to some MST. By induction, the set of edges Kruskal has added so far is a subset of some MST. After processing all edges of a connected graph, Kruskal’s edge set is a spanning tree and therefore exactly an MST.
(Phase 4’s MST labs cover this in algorithmic detail; this phase’s job is the proof.)
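For reference while reading the proof, here is a compact Kruskal sketch. The function name kruskal, the (weight, u, v) edge format, and the path-halving find are choices made for this example, not the Phase 4 lab code.

```python
def kruskal(n, edges):
    """edges: iterable of (weight, u, v). Returns (total_weight, chosen_edges)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            # (u, v) is a lightest edge crossing the cut (component of u, everything else)
            parent[ru] = rv
            total += w
            chosen.append((u, v, w))
    return total, chosen


print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]))
```

The `ru != rv` test is where the cut property is applied: it is the code-level statement that u and v sit on opposite sides of a cut.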
Common Misuse
- Applying the cut property to directed graphs. It only applies to undirected MST.
- Assuming the MST is unique. Tie-breaking matters; multiple MSTs may exist.
- Forgetting connectedness. On a disconnected graph, you compute a minimum spanning forest, not an MST.
4. Loop Invariants (Proof Scaffolding)
Precise Definition
A loop invariant is a property P(state) that holds before the loop, after every iteration, and after the loop terminates. To prove a loop is correct, show:
- Initialization: P holds before the first iteration (i.e., on the initial state).
- Maintenance: if P holds at the start of iteration k, then P holds at the end of iteration k.
- Termination: when the loop exits, P (combined with the exit condition) implies the desired postcondition.
Loop invariants are the workhorse of proving greedy algorithms whose correctness isn’t a single one-shot exchange argument — they’re scaffolding for “this thing stays true throughout the run”.
Worked Example: Gas Station (LC 134)
Greedy: scan stations once, maintain tank = 0 and start = 0. At station i, tank += gas[i] - cost[i]. If tank < 0, set start = i + 1 and reset tank = 0. Final answer: start if total gas ≥ total cost, else -1.
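Loop invariant: at the end of iteration i, tank equals the net fuel accumulated from start to i (and is ≥ 0), and no station in [0, start) is a valid starting point. Maintenance hinges on one fact: if tank drops below 0 at station i, then no station in [start, i] can reach i + 1, so the whole block is discarded at once.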
(Full proof in Lab 04 — Gas Station.)
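The one-pass scan described above can be sketched as follows. The function name can_complete_circuit mirrors the usual LC 134 signature but is otherwise an assumption; the comments mark where the invariant is re-established.

```python
def can_complete_circuit(gas, cost):
    """Return a valid start index, or -1 if total gas is less than total cost."""
    total = tank = start = 0
    for i in range(len(gas)):
        diff = gas[i] - cost[i]
        total += diff
        tank += diff                          # net fuel from start through i
        if tank < 0:
            # No station in [start, i] can reach i + 1, so discard the whole block.
            start = i + 1
            tank = 0
    return start if total >= 0 else -1


print(can_complete_circuit([1, 2, 3, 4, 5], [3, 4, 5, 1, 2]))
print(can_complete_circuit([2, 3, 4], [3, 4, 3]))
```

Tracking total separately from tank is what lets a single pass decide feasibility without a second loop around the circle.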
Common Misuse
- Inventing an invariant after the fact that conveniently equals “the answer is correct”. The invariant must be precisely statable independent of the conclusion.
- Failing to prove maintenance — usually because the invariant is too weak (doesn’t survive one iteration) or too strong (false at initialization).
- Skipping termination — the invariant might hold every iteration but the loop might not terminate at all, or might terminate in a state that doesn’t imply the postcondition.
5. Monovariants (Termination Arguments)
Precise Definition
A monovariant is a quantity that strictly increases (or strictly decreases) with every iteration of an algorithm and is bounded below (or above) by a known value. The existence of a monovariant proves termination: a strictly decreasing integer-valued quantity bounded below by 0 cannot decrease more than its initial value, so the loop runs at most that many iterations.
In greedy proofs, monovariants are also used to prove progress: each iteration makes irreversible progress toward the goal, so we never need to undo a choice.
Worked Example: Jump Game II (LC 45)
Greedy with two pointers: current_end and farthest. Scan; for each index, update farthest = max(farthest, i + nums[i]). When i == current_end, jump: jumps += 1, current_end = farthest.
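Monovariant: current_end strictly increases with every jump. farthest itself is only non-decreasing, but whenever i == current_end we have farthest ≥ i + 1 (otherwise index i + 1 would be unreachable, which is impossible when a solution is guaranteed), so assigning current_end = farthest moves it strictly forward. Because current_end is an integer bounded above by n − 1, the number of jumps is finite, and we never need to backtrack: every position contributes to extending farthest, and current_end only ever moves forward.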
(Full proof in Lab 02 — Jump Game II.)
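The two-pointer scan is short enough to show in full. This is a minimal sketch assuming, as LC 45 does, that the last index is always reachable; the function name min_jumps is hypothetical.

```python
def min_jumps(nums):
    """Minimum number of jumps to reach the last index (assumed reachable)."""
    jumps = current_end = farthest = 0
    for i in range(len(nums) - 1):            # never jump from the last index
        farthest = max(farthest, i + nums[i])
        if i == current_end:                  # the current jump's range is exhausted
            jumps += 1
            current_end = farthest            # moves strictly forward when the end is reachable
    return jumps


print(min_jumps([2, 3, 1, 1, 4]))
```

Stopping the loop at len(nums) - 2 matters: including the last index can add a spurious jump when current_end lands exactly on it.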
Common Misuse
- Confusing “increasing” with “non-decreasing”. A non-decreasing monovariant doesn’t prove termination — the algorithm might loop indefinitely with the quantity stuck. Use strictly increasing/decreasing.
- Using a real-valued monovariant without an explicit lower bound on the rate of change. Real values can decrease toward an infimum without ever reaching it (Zeno’s paradox in algorithm form).
- Treating monovariant as a correctness proof when it’s only a termination proof. Termination + invariant gives correctness; one of them alone does not.
6. Amortized Analysis
Amortized analysis bounds the cost of a sequence of operations by an average per-operation cost, even when individual operations may be expensive. It is essential for proving the cost of greedy data-structure operations — union-find with path compression, dynamic arrays, splay trees — and shows up in Phase 3 and Phase 7. We cover the three classical methods here.
6a. Aggregate Analysis
Bound the total cost of n operations by some function T(n), then divide: amortized cost per operation is T(n) / n. Simple to apply, hardest to derive a tight T(n) for.
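Example. A dynamic array in the style of Python’s list or Java’s ArrayList, modeled here as doubling its capacity when full (real implementations use other growth factors, which changes only the constant). n push operations cost: n for the actual writes, plus 1 + 2 + 4 + … + 2^k < 2n for the resize copies (2^k being the largest power of two below n), total < 3n. Amortized cost per push: less than 3, i.e., O(1).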
6b. The Accounting Method
Charge each operation a fixed amount (the “amortized cost”) which may exceed its actual cost. The excess is stored as “credit” on data-structure elements. Expensive operations pay using accumulated credits, never going into debt. If you can maintain “credits ≥ 0” as an invariant, the amortized cost is a valid upper bound.
Example. Dynamic array push: charge 3 per push. Actual cost is 1 for the write; the remaining 2 are stored as credit on the just-written element. When a full array of capacity c doubles, the c/2 elements written since the previous doubling each hold 2 credits: one pays for moving that element, the other pays for moving one of the c/2 older elements whose credits were already spent. Credits never go negative; amortized cost per push is O(1).
6c. The Potential Method
Define a potential function Φ(D) over data-structure states D, with Φ(D₀) = 0 initially and Φ(D) ≥ 0 always. The amortized cost of an operation is actual_cost + ΔΦ. Total amortized cost over n operations is Σ actual_cost + Φ(D_n) - Φ(D₀) ≥ Σ actual_cost, so it’s a valid upper bound.
Example. Dynamic array: let Φ = 2 · size − capacity. After a doubling, Φ = 0. Each push that doesn’t trigger doubling: actual cost 1, ΔΦ = +2, amortized 3. Doubling push: actual cost size + 1, ΔΦ = 2 − size, amortized 3. Constant O(1) amortized per push.
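The constant 3 that appears in the accounting and potential examples can be checked by simulation. The sketch below is a cost model, not a real container: the function name simulate_pushes, the initial capacity of 1, and the unit costs (1 per write, 1 per copied element) are all assumptions for illustration.

```python
def simulate_pushes(n):
    """Total cost of n pushes into a doubling array (1 per write, 1 per copy)."""
    size, capacity = 0, 1
    total_cost = 0
    for _ in range(n):
        if size == capacity:
            total_cost += size                # copy every element into the new buffer
            capacity *= 2
        total_cost += 1                       # write the pushed element
        size += 1
    return total_cost


for n in (1, 2, 3, 5, 1000):
    print(n, simulate_pushes(n), 3 * n)
```

Under this model the total never exceeds 3n, which is the aggregate-analysis view of the same bound the other two methods derive per operation.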
Common Misuse
- Conflating amortized with average-case. Amortized is a worst-case bound on a sequence of operations; average-case is over a probability distribution on inputs. They are not the same.
- Applying amortized bounds to a single operation. A single resize is O(n); only the average over a sequence is O(1).
- Letting credits go negative in the accounting method: this invalidates the bound. Always verify the invariant.
- Using a potential function that can become negative: this invalidates the bound; Φ ≥ 0 is required.
7. When Greedy FAILS (Counterexamples)
The skill of greedy is not just knowing when it works but recognizing when it doesn’t before you commit. Memorize these traps.
7a. 0/1 Knapsack ≠ Fractional Knapsack
Fractional knapsack (you can take any fraction of an item): greedy by value-to-weight ratio is optimal. Sort items by v_i / w_i descending; take items in order, taking a fraction of the last item if needed.
0/1 knapsack (each item is take-or-skip): greedy by ratio is not optimal. Counterexample:
| Item | Weight | Value | Ratio |
|---|---|---|---|
| A | 1 | 1.5 | 1.5 |
| B | 2 | 2 | 1.0 |
| C | 2 | 2 | 1.0 |
Capacity = 4. Greedy by ratio takes A (capacity left 3), then B (capacity left 1); C no longer fits, so the total is 3.5. Optimum is B + C = 4.
The trap: ratio-greedy is correct under fractional flexibility, fails under integrality. The 0/1 version requires DP — see Phase 5 Lab 03.
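The same trap can be reproduced in code on a separate, integer-weight instance (the numbers below are illustrative and are not the table's). Both function names are hypothetical; the DP is the standard one-dimensional 0/1 knapsack table.

```python
def ratio_greedy_01(items, capacity):
    """Take whole items in descending value/weight order while they fit."""
    total = 0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if weight <= capacity:
            capacity -= weight
            total += value
    return total


def knapsack_01(items, capacity):
    """Exact 0/1 knapsack; assumes integer weights."""
    dp = [0] * (capacity + 1)
    for weight, value in items:
        for c in range(capacity, weight - 1, -1):   # descending: each item used once
            dp[c] = max(dp[c], dp[c - weight] + value)
    return dp[capacity]


items = [(10, 60), (20, 100), (30, 120)]      # (weight, value)
print(ratio_greedy_01(items, 50), knapsack_01(items, 50))
```

The greedy commits to the highest-ratio item and strands capacity it cannot fill, while the DP is free to skip that item entirely.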
7b. Coin Change With Arbitrary Denominations
For US coins {1, 5, 10, 25}, greedy (largest first) is optimal. For arbitrary denominations like {1, 3, 4} with target 6, greedy gives 4 + 1 + 1 = 3 coins; optimum is 3 + 3 = 2 coins. Greedy fails. Fall back to DP — see Lab 06 — Greedy vs DP for the canonical counterexample analysis.
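Both behaviors fit in a few lines. This is a minimal sketch with hypothetical function names; each returns None when the target cannot be formed.

```python
def greedy_coins(coins, target):
    """Largest-denomination-first; optimal only for some coin systems."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += target // c
        target %= c
    return count if target == 0 else None


def dp_coins(coins, target):
    """Exact minimum number of coins via bottom-up DP."""
    INF = float("inf")
    dp = [0] + [INF] * target
    for amount in range(1, target + 1):
        for c in coins:
            if c <= amount and dp[amount - c] + 1 < dp[amount]:
                dp[amount] = dp[amount - c] + 1
    return dp[target] if dp[target] != INF else None


print(greedy_coins([1, 3, 4], 6), dp_coins([1, 3, 4], 6))
print(greedy_coins([1, 5, 10, 25], 63), dp_coins([1, 5, 10, 25], 63))
```

Running the two side by side over a range of targets is a quick way to test whether a given coin system is safe for greedy.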
7c. Scheduling With Weights
Interval scheduling (unweighted): greedy by earliest end time is optimal. Weighted interval scheduling (each interval has a weight, maximize total weight of non-overlapping picks): greedy fails. DP with binary-search predecessor pointer is correct, O(N log N).
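The DP with a binary-searched predecessor can be sketched as follows. The function name max_weight_schedule and the (start, end, weight) tuple format are assumptions for illustration; intervals that touch at an endpoint are treated as compatible.

```python
from bisect import bisect_right


def max_weight_schedule(intervals):
    """intervals: list of (start, end, weight). Returns the best total weight."""
    ivs = sorted(intervals, key=lambda iv: iv[1])
    ends = [iv[1] for iv in ivs]
    dp = [0] * (len(ivs) + 1)                 # dp[t] = best weight using the first t intervals
    for t, (start, end, weight) in enumerate(ivs, start=1):
        # p = how many of the earlier intervals end at or before this start
        p = bisect_right(ends, start, 0, t - 1)
        dp[t] = max(dp[t - 1], dp[p] + weight)    # skip it, or take it on top of dp[p]
    return dp[-1]


# Earliest-end greedy would take the two light intervals and miss the heavy one.
print(max_weight_schedule([(0, 10, 100), (0, 2, 1), (3, 5, 1)]))
```

Sorting costs O(N log N) and each of the N steps does one binary search, which gives the O(N log N) total stated above.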
7d. “Greedy By Farthest Reach” In Reachability
Some problems look like jump game but are not: e.g., on a weighted graph, “earliest arrival” by greedy farthest-reach is not Dijkstra; it requires the priority-queue refinement.
Common Misuse
- Trusting “looks like jump game” reasoning. Always verify the greedy choice property formally. The signal is “I have an exchange argument” not “the example worked”.
- Failing the cross-test. Try N=2 and N=3 by hand. Try a hostile counterexample. Try the input where all elements are equal, all are distinct, all are decreasing.
- Forgetting the DP fallback. Many problems are “greedy if X, DP otherwise.” Know which side you are on before coding.
Greedy-Vs-DP Decision Flowchart
When a problem looks optimization-flavored, this flowchart determines whether to attempt greedy or jump straight to DP.
START → Is the problem an optimization (max / min) or counting?
│
├── No (search / decision / construction) → Greedy candidate; check exchange argument.
│
└── Yes
│
├── Can I sort the input by a single criterion (deadline, weight, ratio)
│ such that processing in that order has the greedy choice property?
│ │
│ ├── Yes — and I can write a 4-step exchange argument in <90s →
│ │ GREEDY. Code in <5 minutes.
│ │
│ └── No / not sure →
│ Does the optimal answer at step i depend on choices at i+1, i+2, …?
│ │
│ ├── Yes → DP. State = (position, accumulated state needed for future).
│ │ See Phase 5.
│ │
│ └── No / unclear → Try greedy with a small N=3, N=4 stress-test against
│ brute force. If matches → write proof attempt.
│ If diverges → DP.
│
└── Try the canonical counterexamples for this problem class:
- Fractional vs 0/1 (knapsack)
- Sorted vs arbitrary (coin change)
- Unweighted vs weighted (scheduling)
If any counterexample defeats your greedy → DP.
The flowchart’s discipline: never commit to greedy without an exchange argument. The 90-second time-box for the proof attempt is exactly the safety check that prevents the failure mode of confidently submitting wrong greedy code.
Common Greedy Bugs
A taxonomy. Each one shows up in at least 30% of submitted greedy solutions in mock interviews.
- Claiming greedy without proof. “I’ll sort by end time and pick” with no exchange argument. The interviewer asks “why?” and the candidate either restates the algorithm or stalls. Fix: practice the 4-step exchange recipe so it comes out automatically when you propose the algorithm.
- Wrong sort key. Sorting by start time when end time is correct (interval scheduling); sorting by weight when ratio is correct (fractional knapsack). Fix: before coding, do an exchange argument with each candidate sort key on a hand-crafted small example. The wrong key fails the exchange step quickly.
- Ignoring counterexamples. “My algorithm passes the examples, ship it.” Fix: always run the algorithm against brute force on N=2, N=3, N=4 random inputs before committing.
- Assuming local optimum = global optimum without justification. “At each step pick the smallest” works for some problems and fails for others. Fix: the greedy choice property is not free; it must be proven for every problem.
- Confusing fractional and integer regimes. Ratio-greedy on 0/1 knapsack; “take half of this” interpretation in problems where items are atomic. Fix: read the integrality constraint before coding.
- Off-by-one in “earliest end time” when ties exist. Ties must be broken consistently — typically by start time ascending — and the proof must handle ties. Fix: state the tie-breaker explicitly and verify the exchange argument handles it.
- Greedy with backtracking masquerading as greedy. Some “greedy” solutions secretly maintain a stack and pop on conflict (e.g., remove-k-digits LC 402, candy distribution LC 135). The pure greedy claim doesn’t apply; the algorithm is greedy plus an undo step. Fix: if your algorithm has an if conflict: undo, it’s not pure greedy; the proof must cover the undo logic.
- Mixing up the proof with the algorithm. “Earliest deadline first works because earliest deadline first is the best choice” is circular. Fix: the proof must reference an exchange or invariant or cut, not restate the algorithm.
- Not handling the empty / size-1 case in scan-based greedy. Fix: explicit guard at the start.
- Using a heap when sorting suffices (or vice versa). Heap is for online greedy where future inputs aren’t yet visible (e.g., merge K sorted lists, scheduling with deadlines streaming in). Sort is for offline where all inputs are known. Fix: ask “is the input streamed or batched?” up front.
Mastery Checklist
Before exiting this phase, verify all of these:
- You can recognize a greedy candidate within 2 minutes by spotting the greedy choice property signal.
- You can produce a 4-step exchange argument for any greedy claim within 90 seconds, out loud, without writing code first.
- You can cite the cut property and apply it to prove Kruskal’s / Prim’s correctness from scratch.
- You can articulate the difference between an invariant, a monovariant, and an exchange argument — and pick the right tool for a given proof.
- You can perform aggregate, accounting, and potential-method amortized analyses on a dynamic-array example, in <5 minutes total.
- You can name and articulate three canonical greedy counterexamples (0/1 vs fractional knapsack; coin change {1,3,4}; weighted interval scheduling).
- You can implement interval scheduling, jump game II, task scheduler, gas station, and Huffman coding with full proofs in <90 minutes total.
- You can articulate the greedy-vs-DP decision in <30 seconds for any optimization problem.
- You catch yourself before committing to a greedy without an exchange argument — every time.
Exit Criteria
You may move to Phase 7 (Competitive Programming Acceleration) when all of the following are true:
- You have completed all six labs in this phase, with each lab’s mastery criteria checked off.
- You have solved at least 25 unaided greedy problems from LeetCode (mix of Medium and Hard) and reviewed each via REVIEW_TEMPLATE.md. On at least 20 of them, you wrote the exchange argument or invariant in the review before peeking at any solution.
- Your unaided success rate on Medium-Hard greedy problems is ≥ 65%.
- In a mock interview (phase-11-mock-interviews/), you correctly identify greedy-applicability within 2 minutes for at least 7 of 10 greedy-flavored problems and produce an exchange argument within 90 seconds for at least 6 of 10. You correctly reject greedy in favor of DP on at least 2 of the 10, citing a counterexample.
- You have never in this phase shipped a greedy solution without a written proof. This is the single discipline of Phase 6, and skipping it is a phase-failure.
If any of these fails, do another 15–20 greedy problems before moving on. Skipping this gate produces engineers who pattern-match “looks like greedy” and ship wrong code under deadline pressure — exactly the failure mode that gets candidates rejected at staff level.
Labs
Hands-on practice. Each lab follows the strict 22-section format. Every lab’s Correctness Argument section contains an explicit exchange argument or invariant + monovariant proof. This is the whole point of Phase 6.
- Lab 01 — Interval Scheduling (Activity Selection) — canonical exchange argument
- Lab 02 — Jump Game II — greedy reach + monovariant
- Lab 03 — Task Scheduler With Cooldown — greedy + math formula
- Lab 04 — Gas Station — greedy + invariant proof
- Lab 05 — Huffman Coding — greedy via heap + exchange-argument optimality
- Lab 06 — Greedy Vs DP (Coin Change Counterexample) — when greedy fails and DP is required
Lab 01 — Interval Scheduling (Activity Selection)
Goal
Master the canonical greedy problem: maximum non-overlapping interval selection. Internalize the earliest-end-time-first greedy and produce its exchange argument out loud, without help, in under 90 seconds.
Background
Interval scheduling is the prototype greedy problem in every algorithms textbook because it has the cleanest exchange argument and the largest gap between “intuitive but wrong” choices (earliest start, shortest duration, fewest conflicts) and the one correct one (earliest end). Mastering this lab is mastering the discipline of proof-before-code for the rest of Phase 6. The same exchange-argument template recurs in LC 452 (minimum arrows to burst balloons), LC 1353 (maximum events attended), and dozens of variants.
Interview Context
A staff-level interviewer at FAANG-tier companies will not accept “I’ll sort by end time” without a justification. The signal they’re testing for is: can this candidate distinguish between intuition and proof? The exchange argument, delivered cleanly in 60–90 seconds before code, is the strongest possible signal. Conversely, candidates who code first and then mumble “earliest end is correct because… well… it just is” almost always fail the question even when their code passes the tests.
Problem Statement
Given n activities, each with a start time s_i and an end time e_i, select the maximum-cardinality subset of activities that are pairwise non-overlapping (an activity ending at time t does not overlap with one starting at time t — boundary touching is allowed). Return the size of that subset, or the subset itself if requested.
LeetCode reference: LC 435 — Non-overlapping Intervals (return the count of intervals to remove to make the rest non-overlapping; equivalent to n − maxNonOverlapping).
Constraints
- 1 ≤ n ≤ 10^5
- −5 · 10^4 ≤ s_i < e_i ≤ 5 · 10^4
- Boundary contact (an interval ending at t and another starting at t) does NOT count as overlap.
- Time complexity must be O(n log n); space O(1) extra (sort in place).
Clarifying Questions
- “Are intervals open, closed, or half-open?” The LC convention is half-open; [s, e) is a common interpretation, but ask the interviewer.
- “Is an interval where s == e (zero-duration) allowed?” Usually yes; treat it as a single-point interval.
- “Should I return the maximum non-overlapping count or the minimum to remove?” LC 435 asks the latter; the answer is n − maxNonOverlapping.
- “Can the intervals be unsorted? Are they ever pre-sorted?” Assume unsorted unless stated; the sort is part of the algorithm.
- “Are there ties on end time?” Yes; tie-break by start time ascending so the algorithm is deterministic and the proof handles ties cleanly.
Examples
- [[1,2], [2,3], [3,4], [1,3]] → max non-overlapping = 3 (pick [1,2], [2,3], [3,4]); remove count = 1.
- [[1,2], [1,2], [1,2]] → max non-overlapping = 1; remove count = 2.
- [[1,2], [2,3]] → max non-overlapping = 2; remove count = 0 (boundary contact does not overlap).
- [] → 0.
Initial Brute Force
Try every subset; for each, check pairwise non-overlap; return the size of the largest valid subset.
from itertools import combinations

def max_non_overlap_brute(intervals):
    n = len(intervals)
    best = 0
    for r in range(n + 1):
        for subset in combinations(intervals, r):
            # Every pair must be disjoint; touching endpoints are allowed.
            # Checking all pairs also rejects duplicates such as [1,2], [1,2].
            ok = all(a[1] <= b[0] or b[1] <= a[0]
                     for a, b in combinations(subset, 2))
            if ok:
                best = max(best, len(subset))
    return best
Brute Force Complexity
Time O(2^n · n^2) — every subset checked pairwise. Space O(n) for combinations. Infeasible at n = 25. Useful only as a stress-test oracle for the greedy on n ≤ 12.
Optimization Path
- Brute force (above): establishes the correctness baseline.
- DP: sort by end time; dp[i] = max non-overlapping using intervals 0..i. Transition: dp[i] = max(dp[i-1], 1 + dp[predecessor(i)]) where predecessor(i) is the latest j < i with e_j ≤ s_i. Time O(n log n) with binary search. Space O(n). This is weighted interval scheduling’s structure when we drop weights.
- Greedy: sort by end time, scan once, pick whenever compatible. O(n log n) time, O(1) extra space. Optimality follows from the exchange argument below.
The DP step is worth deriving even though greedy beats it, because the moment intervals carry weights, greedy fails and DP is the only correct approach. Keep DP in your back pocket.
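The DP with a binary-search predecessor can be sketched as follows, in its weighted form (set every weight to 1 to recover the unweighted count). The function name and the (start, end, weight) tuple layout are assumptions made for this example.

```python
from bisect import bisect_right


def max_weight_schedule(intervals):
    """intervals: list of (start, end, weight); touching endpoints are compatible."""
    ordered = sorted(intervals, key=lambda iv: iv[1])
    ends = [iv[1] for iv in ordered]
    dp = [0] * (len(ordered) + 1)  # dp[i] = best total weight using the first i intervals
    for i, (start, _end, weight) in enumerate(ordered):
        # p = number of earlier intervals (in end order) with end <= start.
        p = bisect_right(ends, start, 0, i)
        # Either skip interval i, or take it on top of the best compatible prefix.
        dp[i + 1] = max(dp[i], weight + dp[p])
    return dp[-1]
```

Because the intervals are sorted by end time, the compatible predecessors always form a prefix, which is what lets a single `bisect_right` call replace a linear scan.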
Final Expected Approach
Sort intervals by end time ascending (tie-break by start ascending). Maintain last_end = -∞. For each interval [s, e] in sorted order, if s ≥ last_end, accept it (count += 1, last_end = e); else skip. Return count.
For LC 435 the answer is n − count.
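As a minimal sketch of the approach just described (the name `max_non_overlap` matches the calls in the Tests section; `erase_overlap_intervals` is a hypothetical wrapper name for the LC 435 variant):

```python
def max_non_overlap(intervals):
    """Maximum number of pairwise non-overlapping intervals (touching allowed)."""
    count = 0
    last_end = float("-inf")
    # Sorts a copy by (end, start); call intervals.sort(key=...) instead
    # if the O(1)-extra-space requirement matters.
    for start, end in sorted(intervals, key=lambda iv: (iv[1], iv[0])):
        if start >= last_end:  # >= so that touching intervals are compatible
            count += 1
            last_end = end
    return count


def erase_overlap_intervals(intervals):
    """LC 435 variant: minimum removals so that the rest do not overlap."""
    return len(intervals) - max_non_overlap(intervals)
```

Initializing `last_end` to negative infinity removes the need for a separate empty-input guard: the loop simply never runs.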
Data Structures Used
- A sorted list / array of intervals.
- One scalar last_end.
- One counter count.
No heap, no DSU, no DP table. The algorithm is sort-and-scan.
Correctness Argument
This is the section the rest of Phase 6 hangs on. We use the canonical 4-step exchange argument.
Setup. Let G = g_1, g_2, …, g_k be the greedy solution: intervals sorted by end time, picked greedily. Let O = o_1, o_2, …, o_m be any optimal solution, also written in sorted-by-end-time order (we can always re-sort an optimal solution; non-overlap is preserved). Assume for contradiction m > k (O is strictly better than G); we will derive a contradiction by showing we can transform O into G step by step without decreasing its size — implying m ≤ k.
Step 1 — Locate the first divergence. Let i be the smallest index where g_i ≠ o_i. By construction, g_1 = o_1, …, g_{i-1} = o_{i-1}.
Step 2 — Compare end times at the divergence. Greedy picks the interval with the earliest end time among those compatible with g_1, …, g_{i-1} — equivalently, those compatible with o_1, …, o_{i-1}. So g_i.end ≤ o_i.end (with equality possible if there is a tie and o_i happens to be the tied alternative).
Step 3 — Exchange. Replace o_i with g_i in O, producing O' = o_1, …, o_{i-1}, g_i, o_{i+1}, …, o_m. We must verify (a) feasibility and (b) size.
- Feasibility. g_i is compatible with o_{i-1} = g_{i-1} because greedy ensured this. g_i is compatible with o_{i+1}: o_{i+1}.start ≥ o_i.end ≥ g_i.end, so g_i ends no later than o_i did, and the rest of O was already non-overlapping with o_i’s end. So O' is feasible.
- Size. |O'| = |O| (we replaced one interval with one interval).
Step 4 — Iterate. O' agrees with G on positions 1..i. Repeat the argument on O' versus G to find the next divergence. After at most min(k, m) exchanges, the resulting solution agrees with G on the first min(k, m) positions.
Conclude. If m > k, after k exchanges the transformed O looks like g_1, …, g_k, o_{k+1}, …, o_m. But greedy stopped at g_k, which means there was no interval compatible with g_1, …, g_k. Therefore o_{k+1} cannot exist — contradiction. So m ≤ k, i.e., G is at least as large as any feasible solution. G is optimal. QED.
Complexity
- Time O(n log n) for the sort, O(n) for the scan; total O(n log n).
- Space O(1) extra beyond the input (sort in place); O(log n) for sort recursion in some implementations.
Implementation Requirements
- Sort key: (end, start), with the tie-break explicit.
- Boundary semantics: s_next ≥ e_prev is “compatible” (touching allowed). The interview’s clarifying question about open/closed determines this; default to half-open.
- Handle empty input (n = 0 → return 0).
- Single pass, no nested loop.
Tests
def test_interval_scheduling():
    # canonical
    assert max_non_overlap([[1,2],[2,3],[3,4],[1,3]]) == 3
    # all overlapping
    assert max_non_overlap([[1,2],[1,2],[1,2]]) == 1
    # no overlap
    assert max_non_overlap([[1,2],[2,3]]) == 2
    # empty
    assert max_non_overlap([]) == 0
    # single
    assert max_non_overlap([[5,10]]) == 1
    # tie on end time
    assert max_non_overlap([[1,3],[2,3],[3,4]]) == 2  # one of {[1,3],[2,3]} + [3,4]
    # negative coords
    assert max_non_overlap([[-5,0],[0,5],[5,10]]) == 3
    # nested
    assert max_non_overlap([[1,10],[2,3],[4,5],[6,7]]) == 3
Stress test: generate random n ≤ 12 and compare greedy to brute force on 1000 trials.
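One way to set up that stress test is sketched below. It is self-contained, so it carries compact copies of the greedy and a brute-force oracle; the names `greedy_count`, `brute_count`, and `stress`, the coordinate range, and the fixed seed are all assumptions of this example.

```python
import random
from itertools import combinations


def greedy_count(intervals):
    count, last_end = 0, float("-inf")
    for start, end in sorted(intervals, key=lambda iv: (iv[1], iv[0])):
        if start >= last_end:
            count, last_end = count + 1, end
    return count


def brute_count(intervals):
    # Size of the largest subset in which every pair is disjoint (touching allowed).
    for r in range(len(intervals), 0, -1):
        for subset in combinations(intervals, r):
            if all(a[1] <= b[0] or b[1] <= a[0] for a, b in combinations(subset, 2)):
                return r
    return 0


def stress(trials=1000, max_n=12, max_coord=10, seed=0):
    """Return the first input on which greedy and brute force disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        starts = [rng.randint(-max_coord, max_coord - 1)
                  for _ in range(rng.randint(0, max_n))]
        intervals = [[s, rng.randint(s + 1, max_coord)] for s in starts]
        if greedy_count(intervals) != brute_count(intervals):
            return intervals
    return None
```

A small coordinate range is deliberate: it forces many ties and touching endpoints, which is where tie-break and `>` versus `≥` bugs surface.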
Follow-up Questions
- Weighted version (each interval has weight w_i; maximize total weight of chosen non-overlapping). Greedy fails. Solution: DP with binary search predecessor pointer, O(n log n).
- Online / streaming version (intervals arrive one at a time, decide accept/reject immediately, no recall). Different problem class: competitive ratio analysis.
- K-machine extension (k parallel resources; each interval scheduled on any one of them; maximize total scheduled). Greedy by end time with k “last_end” trackers kept in an ordered set: assign each interval to the machine whose last_end is the latest one not exceeding its start. A plain min-heap of last_end values picks the earliest-free machine instead, which can waste a better-fitting one.
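For the K-machine extension, the sketch below uses a best-fit rule over a sorted list of machine free times. This is an assumption of the example rather than something the lab prescribes: a sorted Python list stands in for an ordered set, and the function name is hypothetical.

```python
from bisect import bisect_right, insort


def max_scheduled_k_machines(intervals, k):
    """Max intervals schedulable on k identical machines (touching allowed)."""
    free_at = [float("-inf")] * k  # sorted list: time at which each machine frees up
    count = 0
    for start, end in sorted(intervals, key=lambda iv: (iv[1], iv[0])):
        # Best fit: the machine that frees up latest but still no later than start.
        idx = bisect_right(free_at, start) - 1
        if idx >= 0:
            free_at.pop(idx)
            insort(free_at, end)
            count += 1
    return count
```

With k = 1 this reduces to the single-machine greedy. The list `pop` costs O(k), so the sketch runs in O(n log n + n·k); a balanced tree would bring the per-interval cost down to O(log k).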
Product Extension
Calendar systems (Google Calendar’s “find a meeting time” feature, AWS spot-instance scheduling, ad slot allocation): the unweighted version is the prototype, and weighted variants are exactly what production schedulers solve.
Language / Runtime Follow-ups
- Python: sorted(intervals, key=lambda x: (x[1], x[0])). Stable sort; tuple comparison handles the tie-break for free.
- Java: Arrays.sort(intervals, (a, b) -> a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(a[0], b[0])). Use Integer.compare to avoid integer overflow on a[1] - b[1].
- Go: sort.Slice(intervals, func(i, j int) bool { if intervals[i][1] != intervals[j][1] { return intervals[i][1] < intervals[j][1] }; return intervals[i][0] < intervals[j][0] }).
- C++: std::sort with a lambda; prefer std::tie(a[1], a[0]) < std::tie(b[1], b[0]) for clarity.
- JS/TS: intervals.sort((a, b) => a[1] - b[1] || a[0] - b[0]). When a[1] - b[1] is 0 (a tie on end time), the || falls through to the start comparison.
Common Bugs
- Sorting by start time instead of end time (intuitive, wrong).
- Wrong tie-break (e.g., descending start) breaking determinism in the proof.
- Using > instead of ≥ for compatibility, rejecting touching intervals.
- Forgetting to update last_end after accepting an interval.
- Returning the count when the question asked for the removal count (LC 435), or vice versa.
Debugging Strategy
If your greedy disagrees with brute force on some n ≤ 12 input:
- Print both solutions side by side.
- Find the first interval where they differ.
- Manually run the exchange argument step on that point — does the swap preserve feasibility? If not, your sort key or tie-break is wrong.
- If the exchange preserves feasibility but your code didn’t pick g_i, your scan logic has an off-by-one in the compatibility check.
Mastery Criteria
- You can write the algorithm in <5 minutes from a blank screen, including the tie-break in the sort key.
- You can deliver the 4-step exchange argument out loud in <90 seconds, without notes.
- You can extend to LC 452 (minimum arrows to burst balloons) by recognizing it as the same problem with renaming, in <2 minutes.
- You can articulate why the weighted version requires DP, in <30 seconds.
- You correctly reject the wrong sort keys (earliest start, shortest duration, fewest conflicts) by giving a concrete counterexample for each, in <2 minutes total.
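For the last criterion, one possible set of counterexamples is sketched below. The inputs and helper names are assumptions chosen for this illustration; `greedy_by` is a generic scan that accepts an interval whenever it fits with everything kept so far, so only the sort key varies.

```python
def greedy_by(intervals, key):
    """Scan intervals in `key` order, keeping each one that fits with all kept so far."""
    kept = []
    for start, end in sorted(intervals, key=key):
        if all(end <= s or e <= start for s, e in kept):
            kept.append((start, end))
    return len(kept)


def conflict_counts(intervals):
    """For each interval, how many others overlap it (touching does not count)."""
    return [sum(1 for j, b in enumerate(intervals)
                if j != i and b[0] < a[1] and a[0] < b[1])
            for i, a in enumerate(intervals)]


def by_end(iv):
    return (iv[1], iv[0])


def by_start(iv):
    return iv[0]


def by_duration(iv):
    return iv[1] - iv[0]


# Earliest start: one long early interval blocks two short ones.
early = [(1, 10), (2, 3), (4, 5)]
# Shortest duration: the short middle interval blocks both neighbours.
short = [(1, 5), (4, 6), (5, 9)]
# Fewest conflicts: (7, 10) has only 2 conflicts but splits the optimal row of 4.
few = ([(1, 4), (5, 8), (9, 12), (13, 16), (7, 10)]
       + [(3, 6)] * 3 + [(11, 14)] * 3)
few_counts = dict(zip(few, conflict_counts(few)))


def by_fewest_conflicts(iv):
    return few_counts[iv]
```

In each case the wrong key returns one fewer interval than `by_end`: 1 versus 2 for `early` and `short`, and 3 versus 4 for `few`.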
Lab 02 — Jump Game II
Goal
Internalize the greedy reach pattern: a single forward scan with two pointers (current_end, farthest) producing the minimum number of jumps. Prove correctness via a loop invariant + monovariant pair.
Background
Jump Game II (LC 45) is the canonical “greedy with monovariant” problem. It looks like BFS — and indeed, the greedy is BFS in disguise — but the BFS is implemented in O(n) time and O(1) space because the levels are contiguous index ranges. The proof is a two-part argument: an invariant (“at any point, the farthest position reachable in j jumps equals current_end after the j-th jump”) plus a monovariant (farthest non-decreasing, current_end strictly increasing per jump).
Interview Context
This problem (or one of its variants: LC 1306, LC 1326, LC 1024) is a staple of FAANG-tier interviews. The weak solution is “BFS with a queue, mark visited” (O(n²) time, O(n) space). The right solution looks too simple to be correct unless you have the proof, which is why the interviewer asks for it.
Problem Statement
Given a non-negative integer array nums where nums[i] is the maximum jump length from index i, return the minimum number of jumps to reach the last index, starting from index 0. Assume the last index is always reachable.
Constraints
- 1 ≤ n ≤ 10^4
- 0 ≤ nums[i] ≤ 1000
- Last index is always reachable (no -1 case).
Clarifying Questions
- “Is 0 a valid value at intermediate positions, and does it mean we get stuck?” Yes; but the problem guarantees reachability, so we won’t actually get stuck.
- “Can n == 1? Then the answer is 0 (already at the end).” Yes, an edge case to handle.
- “Do I return the count of jumps or the path?” The count.
- “Are negative jumps allowed?” No; non-negative only.
Examples
- [2,3,1,1,4] → 2 (0 → 1 → 4).
- [2,3,0,1,4] → 2.
- [1] → 0 (already at end).
- [1,1,1,1] → 3.
- [5,1,1,1,1] → 1 (one jump from 0 to 4).
Initial Brute Force
Recursive DP with memoization: dp[i] = min jumps from index i to n-1.
from functools import lru_cache

def jumps_brute(nums):
    n = len(nums)

    @lru_cache(maxsize=None)
    def f(i):
        if i >= n - 1:
            return 0
        if nums[i] == 0:
            return float('inf')
        return 1 + min(f(i + j) for j in range(1, nums[i] + 1) if i + j < n)

    return f(0)
Brute Force Complexity
Time O(n · max(nums)) — each cell tried against up to max(nums) next cells. At max(nums) = 1000, that’s 10^7 — borderline. Space O(n) for memo + recursion. The greedy is O(n) time and O(1) space.
Optimization Path
- Recursive DP (above): clear correctness.
- Iterative DP: dp[i] filled right to left, since dp[i] depends on entries to its right; dp[i] = 1 + min(dp[i+j]) over 1 ≤ j ≤ nums[i]. Same complexity.
- BFS: view jumps as edges; level number = answer. O(n²) worst case if naively implemented (revisiting); O(n) if you track the boundary. The boundary tracking is exactly the greedy.
- Greedy with two pointers: O(n) time, O(1) space. The BFS layers are contiguous ranges [L, R]; we process the range and compute the next range’s right endpoint as max(i + nums[i]) for i ∈ [L, R].
Final Expected Approach
def jump(nums):
    n = len(nums)
    if n <= 1:
        return 0
    jumps = 0
    current_end = 0
    farthest = 0
    for i in range(n - 1):  # don't iterate past n-1
        farthest = max(farthest, i + nums[i])
        if i == current_end:
            jumps += 1
            current_end = farthest
            if current_end >= n - 1:
                break
    return jumps
Two pointers: current_end is the right boundary of the BFS layer we’re currently processing; farthest is the right boundary of the next BFS layer being assembled.
Data Structures Used
- Three integers: jumps, current_end, farthest. No arrays, no queue, no recursion. The simplicity is the point.
Correctness Argument
We prove: at termination, jumps equals the minimum number of jumps to reach index n - 1.
Setup. Define level(j) = the set of indices reachable from 0 in exactly j jumps and not in fewer. By induction on j: level(0) = {0}; level(j+1) = {i + k : i ∈ level(j), 1 ≤ k ≤ nums[i]} − ⋃_{j' ≤ j} level(j'). By a simple induction, level(j+1) is a contiguous range of indices [L_{j+1}, R_{j+1}] immediately to the right of R_j. (Proof: level(j) is contiguous by induction; the union of [i, i + nums[i]] over i in a contiguous range is itself a contiguous range; subtracting earlier levels removes a contiguous prefix.)
Loop invariant. At the top of iteration i:
- (1) current_end = R_{jumps}, i.e., the right boundary of the layer reached in jumps jumps so far.
- (2) farthest = max_{k ≤ i, k ≤ current_end} (k + nums[k]), the farthest position reachable from any index processed so far in the current layer.
Initialization. Before the loop: jumps = 0, current_end = 0, farthest = 0. Layer 0 is {0} with R_0 = 0 ✓.
Maintenance. At iteration i:
- We update farthest = max(farthest, i + nums[i]). If i ≤ current_end, this maintains invariant (2).
- If i == current_end, we’ve finished processing the current layer. We “jump”: jumps += 1, current_end = farthest. By invariant (2), farthest = R_{jumps_old + 1} = right boundary of the next layer. So invariant (1) is restored with jumps_new = jumps_old + 1.
Monovariant. farthest is non-decreasing across the loop (each iteration takes a max). At each “jump” event, current_end strictly increases (otherwise the last index isn’t reachable, contradicting the problem’s guarantee). The loop runs n - 1 iterations and the number of “jump” events is bounded above by n - 1, so the algorithm terminates and produces a finite jumps value.
Termination + correctness. The loop terminates after n - 1 iterations (or earlier if current_end ≥ n - 1). At that point, current_end ≥ n - 1 (because the last index is reachable; if it weren’t, we’d have current_end < n - 1 and farthest = current_end — no progress — but the problem guarantees reachability, so farthest > current_end whenever current_end < n - 1). Therefore jumps is exactly the layer index of n - 1 in the BFS — i.e., the minimum jump count. QED.
The two key proof devices: the invariant that current_end tracks layer boundaries, and the monovariant that farthest is non-decreasing. Together they give correctness; the monovariant alone gives termination.
Complexity
- Time O(n): single forward scan.
- Space O(1): three integers.
Implementation Requirements
- Loop bound i < n - 1, not i < n. Iterating to n - 1 would cause an extra spurious jump count when current_end == n - 1.
- if i == current_end: is the trigger for the layer transition. The check happens after farthest is updated.
- if current_end >= n - 1: break is the early exit.
- Handle n == 1 as a special case (answer 0) before the loop.
Tests
def test_jump():
    assert jump([2,3,1,1,4]) == 2
    assert jump([2,3,0,1,4]) == 2
    assert jump([1]) == 0
    assert jump([1,1,1,1]) == 3
    assert jump([5,1,1,1,1]) == 1
    assert jump([1,2]) == 1
    assert jump([0]) == 0  # n=1, no jumps needed
    # large jump from start
    assert jump([100, 1, 1, 1, 1, 1, 1]) == 1
Stress-test versus the recursive DP for n ≤ 20.
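A possible harness is sketched below. To stay self-contained it uses a bottom-up DP as the oracle (an assumption of this example, chosen because it avoids recursion-depth limits) and a compact copy of the greedy. Random arrays whose last index is unreachable are skipped, since they fall outside the problem's guarantee.

```python
import random


def jump_greedy(nums):
    jumps = current_end = farthest = 0
    for i in range(len(nums) - 1):
        farthest = max(farthest, i + nums[i])
        if i == current_end:
            jumps += 1
            current_end = farthest
            if current_end >= len(nums) - 1:
                break
    return jumps


def jump_dp(nums):
    """Bottom-up oracle: dp[i] = min jumps from i to the last index (inf if none)."""
    n = len(nums)
    dp = [float("inf")] * n
    dp[n - 1] = 0
    for i in range(n - 2, -1, -1):  # fill right to left
        reach = min(n - 1, i + nums[i])
        if reach > i:
            dp[i] = 1 + min(dp[i + 1:reach + 1])
    return dp[0]


def stress(trials=2000, max_n=20, max_jump=4, seed=0):
    """Compare greedy with the DP oracle; returns how many inputs were checked."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        nums = [rng.randint(0, max_jump) for _ in range(rng.randint(1, max_n))]
        expected = jump_dp(nums)
        if expected == float("inf"):
            continue  # last index unreachable: outside the problem's guarantee
        checked += 1
        assert jump_greedy(nums) == expected, nums
    return checked
```

Keeping `max_jump` small relative to `max_n` produces multi-layer inputs with zeros in them, which exercises the layer-boundary logic rather than trivial one-jump cases.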
Follow-up Questions
- LC 55 (Jump Game I): can we reach the end? The farthest ≥ i invariant suffices.
- LC 1306 (Jump Game III): bidirectional jumps with fixed offsets; not greedy, BFS.
- LC 1340 (Jump Game V): descent-only, weighted; DP territory.
- Min jumps with cost: DP, not greedy. The cost asymmetry breaks the layer-contiguity argument.
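For the first follow-up, the simplification to LC 55 might look like the sketch below (the function name `can_jump` is an assumption; unlike Jump Game II, reachability is not guaranteed here, so the invariant check can fail).

```python
def can_jump(nums):
    """LC 55: True if the last index is reachable from index 0."""
    farthest = 0
    for i, step in enumerate(nums):
        if i > farthest:  # invariant farthest >= i is broken: index i is unreachable
            return False
        farthest = max(farthest, i + step)
    return True
```

The layer bookkeeping (`jumps`, `current_end`) disappears entirely; only the running maximum survives.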
Product Extension
Network packet routing: minimum hops to reach a destination when each node has a maximum forward-reach. Game pathfinding when movement primitives have variable range.
Language / Runtime Follow-ups
- Python: as shown above. Beware the off-by-one on range(n - 1).
- Java: int n = nums.length; if (n <= 1) return 0; int jumps = 0, end = 0, far = 0; for (int i = 0; i < n - 1; i++) { far = Math.max(far, i + nums[i]); if (i == end) { jumps++; end = far; if (end >= n - 1) break; } } return jumps;.
- Go: identical structure; use if i+nums[i] > far { far = i + nums[i] } to avoid math.Max float overhead.
- C++: same; use std::max.
- JS/TS: same; Math.max(...). Watch for nums.length re-evaluation cost in tight loops on engines that don’t hoist.
Common Bugs
- Iterating for i in range(n) and double-counting the last jump.
- Updating current_end before the bookkeeping check i == current_end.
- Forgetting the n == 1 early return.
- Using BFS with a queue when the contiguous-range observation makes it O(1) space.
- Confusing farthest (assembling next layer) with current_end (current layer’s right edge); labelling them consistently is essential.
Debugging Strategy
- Print (i, current_end, farthest, jumps) at each iteration on a small input. The state should evolve predictably: farthest rises, current_end jumps to farthest exactly when i catches up.
- If your output is one too many: check the loop bound (n - 1 not n).
- If your output is one too few: check that you increment jumps at the layer boundary, not after the last iteration.
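The trace suggested in the first step can be captured with a small instrumented copy of the algorithm. The name `jump_trace` and the row layout are assumptions of this sketch; it records the state after each iteration instead of printing, so the rows can be inspected or asserted on.

```python
def jump_trace(nums):
    """Return (jumps, rows); each row is (i, current_end, farthest, jumps) after step i."""
    n = len(nums)
    jumps = current_end = farthest = 0
    rows = []
    for i in range(n - 1):
        farthest = max(farthest, i + nums[i])
        if i == current_end:
            jumps += 1
            current_end = farthest
        rows.append((i, current_end, farthest, jumps))
        if current_end >= n - 1:
            break
    return jumps, rows


result, rows = jump_trace([2, 3, 1, 1, 4])
# rows == [(0, 2, 2, 1), (1, 2, 4, 1), (2, 4, 4, 2)]
```

In the sample trace, `current_end` changes only on rows where `i` equals the previous `current_end`, which is the pattern to look for when diagnosing an off-by-one.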
Mastery Criteria
- You can write the algorithm in <5 minutes from blank.
- You can articulate the BFS-layer interpretation in <30 seconds.
- You can state the loop invariant precisely and run through initialization → maintenance → termination in <2 minutes.
- You can name the monovariant and explain why it implies termination, in <30 seconds.
- You can extend to LC 55 (Jump Game I) in <3 minutes by simplifying the same template.
Lab 03 — Task Scheduler With Cooldown
Goal
Master the frequency-greedy pattern: schedule a stream of tasks with a per-type cooldown, minimizing total CPU cycles. Derive the closed-form formula (maxFreq - 1) * (n + 1) + countMax, prove its optimality via an exchange argument, and recognize when the formula breaks (when actual task count exceeds the formula).
Background
LC 621 is the canonical “greedy + counting formula” problem. The exchange argument is short but subtle: the most-frequent task type must be scheduled in the densest possible pattern (every n+1 slots), and any optimal schedule that does not schedule the most-frequent type maximally densely can be modified to do so without increasing the total cycle count. This produces the formula. The skill being tested is recognition of the pattern, derivation of the formula from first principles, and the discipline to also show the alternative max-heap simulation that handles the same problem operationally.
Interview Context
Asked at Amazon, Google, Meta. The interviewer is testing whether you can derive a formula or whether you reach for a heap reflexively. Both solutions are accepted; the formula version with proof is the stronger signal. Watch for the follow-up: “What if a new task type can arrive mid-execution?” — that breaks the formula and forces the heap.
Problem Statement
Given an array of tasks (uppercase letters) and an integer n (cooldown), schedule the tasks so that the same task type is separated by at least n other slots (the slots can be idle if necessary). Return the minimum number of CPU cycles to finish all tasks.
LeetCode reference: LC 621 — Task Scheduler.
Constraints
- 1 ≤ |tasks| ≤ 10^4
- tasks[i] is an uppercase English letter (so at most 26 distinct types).
- 0 ≤ n ≤ 100.
Clarifying Questions
- “Does the schedule have to be returned, or just the cycle count?” Count only.
- “Can multiple tasks of different types execute in the same cycle?” No, one task per cycle (or one idle).
- “If n == 0, can same-type tasks run back to back?” Yes; the answer is just len(tasks).
- “Are tasks pre-sorted or arrive in any order?” Order doesn’t matter; only the frequency vector matters.
- “Can a single task type appear more times than total cycles?” No; the formula handles this naturally.
Examples
- tasks = ["A","A","A","B","B","B"], n = 2 → 8. Schedule: A B _ A B _ A B.
- tasks = ["A","A","A","B","B","B"], n = 0 → 6.
- tasks = ["A","A","A","A","A","A","B","C","D","E","F","G"], n = 2 → 16. Formula: (6-1)*(2+1) + 1 = 16.
- tasks = ["A","B","C","D","E","A","B","C","D","E"], n = 4 → 10. Formula gives (2-1)*5 + 5 = 10; actual = 10. Tight.
- tasks = ["A","B","C","D","E","F"], n = 100 → 6. Formula gives (1-1)*101 + 6 = 6 since maxFreq = 1.
Initial Brute Force
Simulate. At each cycle, pick any task type whose cooldown has elapsed and whose remaining count is > 0. Any tiebreaker yields a feasible schedule, but a minimum-length one requires picking the type with the highest remaining count. If none is available, idle. Repeat until all tasks are done.
Brute Force Complexity
O(T) where T is the answer. Tight bound T ≤ |tasks| * (n + 1), so O(|tasks| * n) worst case. Acceptable but slower than the formula.
Optimization Path
- Brute simulation: easy to write, slow.
- Max-heap simulation: at each cycle, pop the highest-count type, decrement it, and push it onto a temporary “cooling” queue with a cooldown timestamp. After n + 1 cycles or a complete pass, restore from the cooling queue.
- Closed-form formula: derive from the structure of an optimal schedule. O(|tasks|) time, O(1) space.
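The max-heap simulation from the second step could be sketched as follows. This is one of several common formulations; the cycle-by-cycle loop, the deque used as the cooling queue, and the name `least_interval_heap` are assumptions of this example.

```python
import heapq
from collections import Counter, deque


def least_interval_heap(tasks, n):
    """Simulate cycle by cycle, always running the ready type with most work left."""
    ready = [-count for count in Counter(tasks).values()]  # max-heap via negation
    heapq.heapify(ready)
    cooling = deque()  # (last cycle of cooldown, negated remaining count)
    cycle = 0
    while ready or cooling:
        cycle += 1
        if ready:
            remaining = heapq.heappop(ready) + 1  # run one instance (counts are negated)
            if remaining < 0:
                cooling.append((cycle + n, remaining))
        # A type run at cycle c becomes ready again for cycle c + n + 1.
        if cooling and cooling[0][0] == cycle:
            heapq.heappush(ready, cooling.popleft()[1])
    return cycle
```

Because at most one task runs per cycle, cooldown expiry times in the deque are strictly increasing, so checking only the front entry is enough. This version also adapts naturally to new task types arriving mid-execution, which the closed-form formula cannot handle.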
Final Expected Approach
from collections import Counter

def least_interval(tasks, n):
    cnt = Counter(tasks)
    max_freq = max(cnt.values())
    count_max = sum(1 for v in cnt.values() if v == max_freq)
    return max(len(tasks), (max_freq - 1) * (n + 1) + count_max)
The max(len(tasks), …) handles the case where the formula gives less than total tasks — i.e., when there are so many distinct task types that we never need to idle.
Data Structures Used
- A frequency counter (26-entry array or Counter).
- Two integers: max_freq, count_max.
Correctness Argument
We prove the formula T = max(|tasks|, (max_freq - 1)(n + 1) + count_max) is optimal.
Setup. Let M = max_freq and K = count_max (number of types tied for the maximum frequency).
Lower bound (no schedule can do better than T). Consider any task type with frequency M. Consecutive instances of this type run at least n + 1 cycles apart, so its last instance runs no earlier than cycle (M - 1)(n + 1) + 1. This holds for each of the K types tied at frequency M, and their K last instances occupy K distinct cycles (one task per cycle), all at or after cycle (M - 1)(n + 1) + 1. The latest of them therefore runs no earlier than cycle (M - 1)(n + 1) + K. So T ≥ (M - 1)(n + 1) + K.
Also trivially T ≥ |tasks| (every task runs in its own cycle).
So T ≥ max(|tasks|, (M - 1)(n + 1) + K).
Upper bound (the formula’s value is achievable). We construct a schedule of length exactly (M - 1)(n + 1) + K (or |tasks| if larger) and show it is feasible.
- Case 1: (M - 1)(n + 1) + K ≥ |tasks|. Lay out M − 1 complete frames of n + 1 slots each, followed by a final frame of K slots. In each frame, slot j (for 0 ≤ j < K) is reserved for the j-th max-frequency task type. The remaining n + 1 − K slots in each of the first M − 1 frames are filled column by column (slot K of frames 0, 1, …, M − 2, then slot K + 1 of each frame, and so on), taking the other task types in decreasing order of frequency. Each of those types has at most M − 1 instances, so its instances land in distinct frames and stay at least n + 1 slots apart; the case condition guarantees there are enough free slots, and any slot left empty is an idle. Exchange argument step: suppose an optimal schedule does not place the most-frequent task at slots 0, n+1, 2(n+1), … of consecutive frames. Then there is a slot i where it could have been placed but wasn’t. Swap it with whatever is there; cooldown is preserved (we are moving an instance to a slot that’s n + 1 away from the previous instance, which is within bounds); other tasks are not constrained by this swap. Iterate. The result is the formula schedule, with the same length.
- Case 2: (M - 1)(n + 1) + K < |tasks|. Then there are more tasks than the formula’s slot count; we have so much variety that no idle is needed. Use the same frame layout and keep filling column by column past slot n: the frames simply grow longer than n + 1 slots, which only increases the distance between instances of the same type, and no slot is left empty. Total cycles = |tasks|.
In both cases, the constructed schedule’s length matches the lower bound. Optimal. QED.
The exchange argument is the crucial step: it converts “the formula is one possible schedule’s length” into “no schedule is shorter.” Without the exchange, you have only an existence claim.
Complexity
- Time O(|tasks|): single pass to compute frequencies.
- Space O(1): at most 26 distinct task types, frequency dictionary is constant-bounded.
Implementation Requirements
- Use Counter or a 26-entry array.
- Compute max_freq and count_max in one pass.
- Return max(len(tasks), (max_freq - 1) * (n + 1) + count_max); do not skip the max.
Tests
def test_least_interval():
    assert least_interval(["A","A","A","B","B","B"], 2) == 8
    assert least_interval(["A","A","A","B","B","B"], 0) == 6
    assert least_interval(["A","A","A","A","A","A","B","C","D","E","F","G"], 2) == 16
    assert least_interval(["A","B","C","D","E","A","B","C","D","E"], 4) == 10
    assert least_interval(["A","B","C","D","E","F"], 100) == 6
    assert least_interval(["A"], 5) == 1
    assert least_interval(["A","A"], 0) == 2
    assert least_interval(["A","A","A","A"], 3) == 13  # (4-1)*4 + 1 = 13
Follow-up Questions
- Streaming tasks. New tasks arrive mid-execution; formula no longer applies because frequencies change. Use the max-heap simulation.
- Different cooldowns per task type. Heap with per-type cooldown tracker.
- Print the actual schedule. Heap simulation produces a schedule; the formula does not directly give one (you’d reconstruct from the proof).
- What if n can be huge (n = 10^9)? Same formula; constant-time arithmetic.
Product Extension
OS task scheduling with cooldown (e.g., a process that touched a hot resource must wait n cycles before re-touching). API rate limiting at the user level. Workout-program scheduling with muscle-group recovery windows.
Language / Runtime Follow-ups
- Python: Counter(tasks) is O(|tasks|).
- Java: int[26] array indexed by c - 'A'. Faster than HashMap<Character, Integer> for the constant-alphabet case.
- Go: [26]int works the same.
- C++: std::array<int, 26>; std::max_element for max_freq.
- JS/TS: Map<string, number> or 26-entry array; the latter is faster.
Common Bugs
- Forgetting max(len(tasks), formula): fails on inputs with many distinct task types.
- Using count_max = 0 when there’s only one max-frequency type (should be 1).
- Off-by-one in the formula: (M - 1) * (n + 1) not M * (n + 1).
- Heap simulation: forgetting to push the cooled-down task back, or pushing it back at the wrong cycle.
Debugging Strategy
- For each test case, hand-compute M, K, and the formula. If formula matches expected, your code is wrong somewhere mechanical.
- If formula doesn’t match expected, you have a conceptual error: either M is wrong (multi-counting) or K is wrong (counting non-tied types) or the max(|tasks|, …) clamp is missing.
- Run brute simulation as a stress oracle for |tasks| ≤ 20.
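One way to build such an oracle is an exhaustive search over every legal choice at every cycle, memoized on the remaining counts and cooldowns. The sketch below is an assumption-laden illustration: the name `least_interval_brute` and the state encoding are hypothetical, and it is only practical for small inputs and small n because the recursion depth equals the schedule length.

```python
from functools import lru_cache

def least_interval_brute(tasks, n):
    """Hypothetical oracle: minimum cycles found by trying every legal move."""
    types = sorted(set(tasks))
    start_counts = tuple(tasks.count(t) for t in types)

    @lru_cache(maxsize=None)
    def best(counts, cool):
        # counts[i]: instances of type i still to run
        # cool[i]: cycles type i must still wait before it may run
        if not any(counts):
            return 0
        ticked = tuple(max(0, c - 1) for c in cool)  # one cycle passes for everyone
        result = float("inf")
        for i, (left, wait) in enumerate(zip(counts, cool)):
            if left > 0 and wait == 0:
                next_counts = counts[:i] + (left - 1,) + counts[i + 1:]
                next_cool = ticked[:i] + (n,) + ticked[i + 1:]
                result = min(result, 1 + best(next_counts, next_cool))
        if any(cool):  # idling is only worth trying while something is cooling
            result = min(result, 1 + best(counts, ticked))
        return result

    return best(start_counts, (0,) * len(types))
```

Comparing this against the formula on random small inputs checks the `max` clamp and the `count_max` term at the same time, since the oracle knows nothing about either.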
Mastery Criteria
- You can derive the formula from first principles in <3 minutes.
- You can deliver the exchange argument out loud in <2 minutes.
- You can write the formula-based solution in <3 minutes.
- You can write the heap-based simulation in <10 minutes when asked for the streaming variant.
- You can articulate why the max(|tasks|, formula) clamp is necessary and which case it covers, in <60 seconds.
Lab 04 — Gas Station
Goal
Master the single-pass invariant greedy: O(n) time, O(1) space, with a non-trivial correctness invariant proving why we can skip ahead instead of retrying every starting station.
Background
LC 134 is the canonical “one-pass with reset” greedy. The naive approach is O(n²): for each candidate starting station, simulate the trip. The greedy collapses this to O(n) via the invariant: if the running tank goes negative at station k starting from station s, then no station in [s, k] can be a valid starting point. Once you see this, the algorithm shrinks to a few lines and the proof is the entire test of skill.
Interview Context
Asked at Google, Bloomberg, Amazon. The candidate who codes O(n²) first then asks “can we do better?” is fine. The candidate who jumps to O(n) without the invariant proof is in danger — interviewers test by asking “why is start = k + 1 correct?” and a candidate without the invariant answers “uh, intuition.” That answer fails staff-level interviews.
Problem Statement
There are n gas stations on a circular route. Station i has gas[i] units of gas; traveling from station i to station i + 1 costs cost[i] units. Starting with an empty tank at some station, find the unique starting station that allows you to complete the full circle, or return -1 if impossible.
LeetCode reference: LC 134 — Gas Station.
Constraints
- 1 ≤ n ≤ 10^5
- 0 ≤ gas[i], cost[i] ≤ 10^4
- The solution is unique if it exists.
- Time O(n), space O(1).
Clarifying Questions
- “Is the route guaranteed circular?” Yes; from station n - 1 you go to 0.
- “Can the answer be ambiguous (multiple valid starts)?” No, the problem guarantees uniqueness when a solution exists.
- “Can gas[i] or cost[i] be negative?” No, both non-negative.
- “Should I return the index or the boolean feasibility?” Index, or -1.
Examples
- gas = [1,2,3,4,5], cost = [3,4,5,1,2] → 3 (start at index 3: tank 0 + 4 - 1 = 3, 3 + 5 - 2 = 6, 6 + 1 - 3 = 4, 4 + 2 - 4 = 2, 2 + 3 - 5 = 0).
- gas = [2,3,4], cost = [3,4,3] → -1 (total gas = 9, total cost = 10, infeasible).
- gas = [5], cost = [4] → 0.
- gas = [3,1,1], cost = [1,2,2] → 0.
Initial Brute Force
For each candidate start s, simulate the full trip; return the first s that succeeds.
def can_complete_brute(gas, cost):
    n = len(gas)
    for s in range(n):
        tank = 0
        for k in range(n):
            i = (s + k) % n
            tank += gas[i] - cost[i]
            if tank < 0:
                break
        else:
            # inner loop finished without break: s completes the lap
            return s
    return -1
Brute Force Complexity
Time O(n²), space O(1). At n = 10^5, n² is 10^{10} — too slow.
Optimization Path
- Brute: O(n²).
- Total-feasibility check: if sum(gas) < sum(cost), no solution exists; return -1 immediately. Reduces wasted work but still O(n²) worst case.
- One-pass with reset: O(n). The invariant below is the key.
Final Expected Approach
def can_complete_circuit(gas, cost):
    if sum(gas) < sum(cost):
        return -1
    start = 0
    tank = 0
    for i in range(len(gas)):
        tank += gas[i] - cost[i]
        if tank < 0:
            start = i + 1
            tank = 0
    return start
Data Structures Used
- Two integers: start, tank. Plus the inputs.
Correctness Argument
We prove two things: (1) if sum(gas) ≥ sum(cost), the algorithm returns a valid starting index; (2) if sum(gas) < sum(cost), no solution exists.
Part 2 is trivial. Across one full lap, the tank changes by exactly sum(gas) - sum(cost). If this is negative, the tank cannot remain non-negative throughout any lap from any start — so no solution.
Part 1 — the key invariant.
Invariant (key claim): suppose we run the algorithm starting from index start = s and the running tank first goes negative at index k (so the partial sum tank after processing index k is < 0, but it was ≥ 0 after processing every index in [s, k - 1]). Then no index in [s, k] can be a valid starting point.
Proof of the key claim. Let T(a, b) = sum(gas[a..b]) - sum(cost[a..b]) be the net fuel from a to b. By assumption, T(s, k - 1) ≥ 0 (we made it past k - 1) and T(s, k) < 0 (we failed at k).
Consider any candidate start s' ∈ [s, k]. To complete the lap from s', we need partial sums T(s', i) ≥ 0 for every i between s' and s' + n - 1 (mod n) — in particular, T(s', k) ≥ 0 (assuming s' ≤ k; otherwise we’d be considering s' = k + 1 which is outside the claim’s range).
But T(s', k) = T(s, k) - T(s, s' - 1) ≤ T(s, k) < 0 (since T(s, s' - 1) ≥ 0 by the assumption that we made it past every index in [s, s' - 1]). So starting from s', the tank goes negative at index k, and s' is not a valid start.
Therefore, after a failure at k, we can safely skip all of [s, k] and resume the search from k + 1. QED for the key claim.
Wrapping up. The loop runs n iterations and processes each index exactly once. Its resets split [0, start - 1] into consecutive failed segments [s_1, k_1], [s_2, k_2], …, each with T(s_i, k_i) < 0, and by the key claim no index in any of them is a valid start. It remains to show that when T_total = sum(gas) - sum(cost) ≥ 0, the final start is itself valid. (Note that start = n cannot happen in that case: it would make [0, n - 1] a union of failed segments and force T_total < 0.)
From start to n - 1 no reset occurred, so T(start, j) ≥ 0 for every j in [start, n - 1]. For the wrap-around portion, take any j in [0, start - 1]. The tank after station j is T(start, n - 1) + T(0, j) = T_total - T(j + 1, start - 1). The range [j + 1, start - 1] is a (possibly empty) suffix of one failed segment followed by whole failed segments. A suffix satisfies T(j + 1, k_i) = T(s_i, k_i) - T(s_i, j) < 0 by the same computation as in the key claim, and every whole segment is negative, so T(j + 1, start - 1) ≤ 0. Hence the tank after station j is at least T_total ≥ 0, and start completes the lap. This argument does not rely on the uniqueness guarantee; uniqueness only tells us that start is the sole answer.
Complexity
- Time O(n): one pass for sum, one pass for the loop.
- Space O(1).
Implementation Requirements
- The pre-check sum(gas) < sum(cost) is required unless you track the total inside the loop: the scan alone returns a valid start only when a solution exists, and a bogus index otherwise.
- Reset tank = 0 (not tank = gas[i+1] - cost[i+1]) when starting fresh.
- start = i + 1 after failure at i.
- The variant where you maintain both running and total in a single pass is also acceptable:
def can_complete_circuit_one_pass(gas, cost):
    total = tank = start = 0
    for i in range(len(gas)):
        diff = gas[i] - cost[i]
        total += diff
        tank += diff
        if tank < 0:
            start = i + 1
            tank = 0
    return start if total >= 0 else -1
Tests
def test_gas_station():
    assert can_complete_circuit([1,2,3,4,5], [3,4,5,1,2]) == 3
    assert can_complete_circuit([2,3,4], [3,4,3]) == -1
    assert can_complete_circuit([5], [4]) == 0
    assert can_complete_circuit([3,1,1], [1,2,2]) == 0
    assert can_complete_circuit([5,1,2,3,4], [4,4,1,5,1]) == 4
    # exact match (zero margin); starts 1 and 2 are both valid here, start 0 is not
    assert can_complete_circuit([1,2,3], [3,2,1]) in (1, 2)
    # all zeros
    assert can_complete_circuit([0,0,0], [0,0,0]) == 0
    # cannot start
    assert can_complete_circuit([1,1,1], [2,2,2]) == -1
Stress-test versus brute force for n ≤ 50.
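A stress harness along those lines might look like the sketch below. It repeats both solutions so the block runs on its own; the harness name `stress_gas_station` and its parameters are assumptions. Random inputs often have several valid starts, which is fine for the comparison: the brute force returns the first valid index, and the greedy only ever skips indices the invariant has ruled out, so both return the smallest valid start.

```python
import random

def can_complete_brute(gas, cost):
    n = len(gas)
    for s in range(n):
        tank = 0
        for k in range(n):
            i = (s + k) % n
            tank += gas[i] - cost[i]
            if tank < 0:
                break
        else:
            return s
    return -1

def can_complete_circuit(gas, cost):
    if sum(gas) < sum(cost):
        return -1
    start = tank = 0
    for i in range(len(gas)):
        tank += gas[i] - cost[i]
        if tank < 0:
            start, tank = i + 1, 0
    return start

def stress_gas_station(trials=2000, max_n=8, max_val=6, seed=0):
    """Hypothetical harness: compare greedy and brute force on random small inputs."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        n = rng.randint(1, max_n)
        gas = [rng.randint(0, max_val) for _ in range(n)]
        cost = [rng.randint(0, max_val) for _ in range(n)]
        expected = can_complete_brute(gas, cost)
        actual = can_complete_circuit(gas, cost)
        assert actual == expected, (gas, cost, expected, actual)
    return trials
```

Keeping the values small (0 to 6 here) makes zero-margin and infeasible cases common, which is where off-by-one resets tend to show up.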
Follow-up Questions
- Find the index where you must idle if the route is infeasible. Slightly different problem; same scan structure.
- Multiple cars on the same circuit. Independent problems per car.
- Variable tank capacity. Now state is two-dimensional; greedy may fail; revert to DP / simulation.
- Two-direction route. Run the greedy in both directions; combine.
Product Extension
Battery-powered EV routing with charging stations of variable wattage and costs. Drone delivery routes with refuel points. Spacecraft trajectory planning with gravity-assist maneuvers (highly idealized).
Language / Runtime Follow-ups
- Python: as shown.
- Java: identical structure; int total = 0, tank = 0, start = 0;.
- Go: total, tank, start := 0, 0, 0.
- C++: int total = 0, tank = 0, start = 0;. Watch overflow if gas[i] and cost[i] are at the upper end and n = 10^5: 10^4 * 10^5 = 10^9, which fits in int32 (max about 2.1 × 10^9) but with little headroom; use long long to be safe.
- JS/TS: let total = 0, tank = 0, start = 0;. JS numbers are 64-bit floats, no overflow worry at this scale.
Common Bugs
- Resetting tank = gas[i] - cost[i] instead of tank = 0 after failure (you’d be double-counting the failure point).
- Setting start = i instead of start = i + 1 after failure.
- Forgetting the total < 0 → -1 check, returning a bogus index.
- Iterating in the wrong direction or two passes when one suffices.
Debugging Strategy
- Print (i, gas[i] - cost[i], tank, start) at each step. The trajectory should show: tank rises and falls, and on each fall below 0 the start jumps to i + 1.
- If your output is off by one (returns start - 1 or start + 1), check the assignment in the failure branch.
- If you return a start but the route is actually infeasible, you missed the total < 0 gate.
Mastery Criteria
- You can write the algorithm in <4 minutes from blank.
- You can state the key invariant (“if tank goes negative at k from start s, no station in [s, k] can be a valid start”) in <30 seconds.
- You can prove the invariant (using the partial-sum decomposition T(s', k) = T(s, k) - T(s, s' - 1)) in <2 minutes, out loud.
- You can articulate why total < 0 → -1 is sufficient and necessary, in <30 seconds.
- You can produce the brute-force baseline as a stress-test oracle in <3 minutes when asked.
Lab 05 — Huffman Coding
Goal
Implement Huffman coding from scratch using a min-heap, and prove its optimality via the canonical exchange argument: in some optimal prefix-free code tree, the two least-frequent symbols are siblings at maximum depth.
Background
Huffman coding is the apex example of “greedy via min-heap” and the most-cited example of greedy optimality in CS curricula. The proof has two non-trivial steps: a swap-to-leaf-depth lemma (any internal node at maximum depth can be assumed to have the two least-frequent symbols), and an induction on the merged tree (the greedy is optimal on n - 1 symbols, and combining the two smallest preserves optimality). Mastering this proof teaches a more sophisticated form of exchange argument than the linear-scan greedy of Labs 1–4.
Interview Context
Huffman is asked occasionally at top-tier interviews — Google, Apple, AWS — usually as an open-ended “design a compression algorithm” or as a follow-up to a lab on heap usage. More commonly, the technique (greedy via min-heap, with optimality proof) appears in adjacent problems: LC 1167 — Minimum Cost to Connect Sticks, LC 23 — Merge K Sorted Lists, and the rope-merging problem. Mastery of Huffman = mastery of the entire family.
Problem Statement
Given a frequency map freq: Symbol -> int over n distinct symbols (n ≥ 2), construct a prefix-free binary code such that the expected code length Σ freq[s] * len(code[s]) is minimized. Return either the code map or the encoding tree.
For interview formulation, often phrased as: “Given n ropes of given lengths, you can merge two ropes at a cost equal to the sum of their lengths. Find the minimum total cost to merge all ropes into one.” (Equivalent to Huffman; rope lengths = frequencies.)
LeetCode reference: LC 1167 — Minimum Cost to Connect Sticks (the rope formulation).
Constraints
- 2 ≤ n ≤ 10^4
- 1 ≤ freq[s] ≤ 10^4
- Time O(n log n); space O(n).
- Tie-break on equal frequencies: any order is acceptable; the optimal cost is invariant.
Clarifying Questions
- “Should I return the codes or just the cost?” — usually cost (rope formulation); for full Huffman, return the tree or the codes.
- “Are frequencies guaranteed positive?” — yes (zero-frequency symbols don’t need codes).
- “Are there always at least 2 symbols?” — assume yes; with 1 symbol, prefix-free coding is trivially “0” (or empty, depending on definition).
- “Is n = 0 a valid input?” Typically no.
- “Should the codes be canonical?” Usually no; any optimal-length code is acceptable.
Examples
- Frequencies: {a: 5, b: 9, c: 12, d: 13, e: 16, f: 45}. Codes (one valid set): f: 0, c: 100, d: 101, a: 1100, b: 1101, e: 111. Total cost: 5*4 + 9*4 + 12*3 + 13*3 + 16*3 + 45*1 = 20 + 36 + 36 + 39 + 48 + 45 = 224.
- Ropes [2, 4, 3]: merge 2+3=5 (cost 5), then 5+4=9 (cost 9), total 14.
- Ropes [1, 8, 3, 5]: merge 1+3=4, merge 4+5=9, merge 9+8=17. Total = 4+9+17 = 30, which is the minimum cost.
Initial Brute Force
Try every binary-tree topology over the leaves; compute the weighted external path length; return the minimum-cost tree. Catalan-number many trees → infeasible past n = 10.
# Sketched only — exponential
def huffman_brute(freq):
# enumerate all binary trees with leaves = freq, return min weighted path length
...
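For the cost-only (rope) formulation, one runnable way to fill in that sketch is to enumerate merge orders rather than tree topologies. Every full binary tree can be built by repeatedly merging two siblings, and the sum of the merge costs equals the weighted path length, so trying every pair at every step covers all trees. The version below is a hypothetical stand-in, named `huffman_brute_cost` to avoid clashing with the stub above, and it is exponential, so it is only suitable for very small inputs.

```python
def huffman_brute_cost(freqs):
    """Hypothetical oracle: minimum total merge cost over every merge order."""
    freqs = list(freqs)
    if len(freqs) <= 1:
        return 0  # nothing left to merge
    best = float("inf")
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            merged = freqs[i] + freqs[j]
            # Remove the two chosen items and put their sum back.
            rest = [f for k, f in enumerate(freqs) if k != i and k != j]
            best = min(best, merged + huffman_brute_cost(rest + [merged]))
    return best
```

The same tree is reached through several merge orders, so this does redundant work; that is acceptable for an oracle whose only job is to be obviously correct.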
Brute Force Complexity
O(C_n) where C_n is the Catalan number — C_{10} ≈ 16800, C_{15} ≈ 9.7M. Useful only as stress-test for n ≤ 8.
Optimization Path
- Brute — exhaustive trees.
- DP on intervals — possible if leaves are ordered (matrix-chain style), but Huffman’s leaves are unordered, so this doesn’t apply.
- Greedy with min-heap: O(n log n) time, O(n) space. Optimality from the exchange argument below.
Final Expected Approach
import heapq

def huffman_cost(freqs):
    heap = list(freqs)  # frequencies only, for cost-only variant
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        s = a + b
        total += s
        heapq.heappush(heap, s)
    return total

def huffman_codes(freqs):
    # freqs: list of (symbol, freq) tuples
    heap = [[f, [[s, ""]]] for s, f in freqs]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        # prepend one bit to every code in each merged subtree
        for pair in lo[1]:
            pair[1] = '0' + pair[1]
        for pair in hi[1]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], lo[1] + hi[1]])
    return dict(heap[0][1])
Data Structures Used
- A min-heap of (frequency, optional payload) pairs.
- The implicit binary tree formed by the merge sequence.
Correctness Argument
We prove the greedy is optimal by induction on n (the number of symbols), using two lemmas.
Lemma 1 (Swap-to-deepest). In some optimal prefix code tree T*, the two least-frequent symbols x and y are siblings at the maximum depth of any leaf.
Proof. Take any optimal tree T'. Let a and b be two siblings at the maximum-depth leaf level of T' (such a pair exists in any full binary tree where every internal node has 2 children). If {a, b} = {x, y}, done. Otherwise, suppose WLOG freq[x] ≤ freq[a] and x ≠ a. Swap x with a (place the symbol x at a’s leaf and vice versa). The cost change is:
Δ = freq[x] * depth(a) + freq[a] * depth(x) - freq[x] * depth(x) - freq[a] * depth(a)
= (freq[a] - freq[x]) * (depth(x) - depth(a))
Since freq[a] ≥ freq[x] (because x is among the two least-frequent) and depth(a) ≥ depth(x) (because a is at maximum depth), Δ ≤ 0. Because T' is already optimal, Δ cannot be negative, so Δ = 0 and the new tree is also optimal. Repeat with y and b. We’ve moved x, y to the maximum-depth pair without increasing cost. QED for Lemma 1.
Lemma 2 (Greedy preserves optimality on the residual). Let x, y be the two least-frequent symbols. Construct freq' by replacing x and y with a single super-symbol z of frequency freq[x] + freq[y]. Then any optimal tree T* for freq' extends to an optimal tree for freq by replacing z’s leaf with an internal node whose children are leaves for x and y.
Proof. Let T_extended be the extension of T* (replace z-leaf with internal node + x, y children). The cost satisfies:
cost(T_extended) = cost(T*) + freq[x] + freq[y]
(The + freq[x] + freq[y] comes from x, y being one level deeper than z was.)
Conversely, any tree T for freq where x, y are siblings (which by Lemma 1 we can assume WLOG) collapses to a tree T_collapsed for freq' by merging x, y into z:
cost(T_collapsed) = cost(T) - freq[x] - freq[y]
So cost(T) = cost(T_collapsed) + freq[x] + freq[y] ≥ cost(T*) + freq[x] + freq[y] = cost(T_extended). Therefore T_extended is at least as good as any tree where x, y are siblings; combined with Lemma 1, T_extended is optimal for freq. QED for Lemma 2.
Inductive proof of greedy optimality. Base case n = 2: only one tree possible, greedy gives it. Inductive step: greedy merges the two least-frequent symbols x, y first, recurses on the residual of size n - 1, and by inductive hypothesis the recursive call produces an optimal tree for freq'. By Lemma 2, the extended tree is optimal for freq. QED.
The two lemmas together are the full exchange-argument proof. Lemma 1 is the swap step; Lemma 2 is the induction step.
Complexity
- Time O(n log n): n - 1 merge operations, each with two pop and one push, each O(log n).
- Space O(n): heap and tree.
Implementation Requirements
- Use a min-heap (heapq in Python uses a min-heap by default; in Java, PriorityQueue is min by default).
- Tie-breakers: when frequencies are equal, the heap may pick either; correctness is unaffected. For deterministic output, add a secondary index.
- For the cost-only variant, the symbol payload can be omitted.
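The secondary-index idea can be sketched as below. This is one possible arrangement, not the lab's reference code: the function name `huffman_codes_stable`, the dict input, the tagged-tuple tree nodes, and the choice of "0" as the code for a lone symbol are all assumptions. The running counter makes every heap entry unique in its second field, so tree nodes are never compared and the output depends only on insertion order.

```python
import heapq
from itertools import count

def huffman_codes_stable(freqs):
    """Hypothetical variant. freqs: dict symbol -> positive frequency."""
    tiebreak = count()  # unique secondary key: equal frequencies never compare payloads
    heap = [(f, next(tiebreak), ("leaf", s)) for s, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2][1]: "0"}  # assumed convention for a single symbol
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (fa + fb, next(tiebreak), ("node", a, b)))
    # Walk the finished tree top-down, extending the prefix at each branch.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if node[0] == "leaf":
            codes[node[1]] = prefix
        else:
            stack.append((node[1], prefix + "0"))
            stack.append((node[2], prefix + "1"))
    return codes
```

Building the tree first and assigning bits in a single top-down walk also avoids touching every symbol's code on every merge.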
Tests
def test_huffman_cost():
    assert huffman_cost([2, 3, 4]) == 14  # 2+3=5, 5+4=9
    assert huffman_cost([1, 8, 3, 5]) == 30
    assert huffman_cost([5]) == 0  # n=1 edge: merge cost is 0
    # uniform
    assert huffman_cost([1, 1, 1, 1]) == 8  # 1+1=2, 1+1=2, 2+2=4; total 2+2+4=8
    # large skew
    assert huffman_cost([1, 1, 1000]) == 1004  # 1+1=2, 2+1000=1002; total 2+1002=1004
Follow-up Questions
- Adaptive Huffman — frequencies are unknown a priori; encoder and decoder maintain a tree that updates as symbols arrive. Used in older compression standards.
- Canonical Huffman — codes are normalized so only code lengths need to be transmitted, not the tree structure. Used in DEFLATE / zlib.
- Length-limited Huffman (max code length L): the package-merge algorithm, more complex than vanilla Huffman.
- Arithmetic coding: beats Huffman when symbol probabilities are not negative powers of 2; not greedy.
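To make the canonical Huffman follow-up concrete, here is a minimal sketch that turns code lengths into codes. It assumes the lengths come from a valid Huffman tree and are all at least 1, and it orders symbols by (length, symbol), which is the usual convention; the function name `canonical_codes` is hypothetical.

```python
def canonical_codes(lengths):
    """Hypothetical helper. lengths: dict symbol -> code length (each >= 1)."""
    code = 0
    prev_len = 0
    out = {}
    # Shorter codes first; ties broken by symbol so encoder and decoder agree.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # pad with zeros when the length grows
        out[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return out
```

With the lengths from the six-symbol example (f: 1; c, d, e: 3; a, b: 4) this produces f = 0, c = 100, d = 101, e = 110, a = 1110, b = 1111. The codes differ from the tree-derived set but have the same lengths, so the total cost is unchanged, and only the length table needs to be transmitted.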
Product Extension
Used in: gzip / DEFLATE (with length-limited variant), HTTP/2 HPACK header compression, JPEG entropy coding stage, MP3 audio coding. Whenever a known-frequency distribution must be losslessly compressed with a prefix-free code, Huffman or its variants are the workhorse.
Language / Runtime Follow-ups
- Python: heapq for the heap. Watch out: heapq is min-heap; for tie-breaking, use (freq, counter, payload) to avoid comparing payloads.
- Java: PriorityQueue<Node> with Comparator.comparingInt(n -> n.freq).
- Go: implement heap.Interface (5 methods) on a slice of nodes; the standard library's container/heap is interface-based rather than a generic typed heap.
- C++: std::priority_queue<Node, std::vector<Node>, std::greater<Node>>. Define operator> on Node to compare by frequency, since std::greater calls operator>.
- JS/TS: no built-in heap; either bring a library (@datastructures-js/priority-queue) or hand-roll a binary heap.
Common Bugs
- Mixing up min-heap and max-heap: with a max-heap, you’d merge the two largest — the answer is wrong by a lot.
- Pushing the merged node back with the wrong frequency (e.g., max(a, b) instead of a + b).
- For the codes variant: getting the bit order wrong. Codes are built bottom-up, so each merge must prepend its bit, and the last bit prepended is the one assigned at the root. Appending instead produces reversed codes.
- Heap of size 1 at the start (single symbol): the while len(heap) > 1 loop is correct; cost is 0.
Debugging Strategy
- Hand-trace a small example (e.g., [1, 1, 1, 1]) and verify each merge step.
- Compare cost output against the brute force for n ≤ 6.
- For codes: verify the tree visually. Every internal node has exactly two children, every leaf is a symbol, and a more frequent symbol never has a longer code than a less frequent one.
Mastery Criteria
- You can implement Huffman cost-only in <8 minutes from blank.
- You can implement Huffman with full code map in <15 minutes.
- You can deliver Lemma 1 (swap-to-deepest) in <2 minutes, out loud.
- You can deliver Lemma 2 (induction on residual) in <2 minutes, out loud.
- You can recognize LC 1167 / connect-sticks as a Huffman variant in <30 seconds.
- You can articulate when Huffman is not optimal (e.g., when the alphabet allows non-binary codes, or when arithmetic coding is admissible).
Lab 06 — Greedy Vs DP (Coin Change Counterexample)
Goal
Internalize the failure mode of greedy by walking through the canonical counterexample: coin change with denominations [1, 3, 4] and target 6. Greedy gives 4 + 1 + 1 (3 coins); DP gives 3 + 3 (2 coins). Make this the test you run on every “looks like greedy” problem before committing.
Background
Many candidates correctly solve coin change with US denominations [1, 5, 10, 25] greedily, then assume greedy works for any denomination set. It does not. The failure on [1, 3, 4] target 6 is the most-cited counterexample in algorithms textbooks (Cormen, Kleinberg-Tardos, Erickson) precisely because it cleanly demonstrates that “greedy felt right” is not a proof. The lesson generalizes: for greedy to work on a problem, the underlying combinatorial structure typically must be a matroid — a property that is rarely obvious from problem statements and almost never holds for arbitrary inputs.
Interview Context
This lab is the meta lab of Phase 6. Its purpose is not to drill a new algorithm but to drill the discipline of testing greedy hypotheses against counterexamples before coding. Interviewers love to ask coin-change variants specifically because they expose candidates who pattern-match without proof. A candidate who says “I’ll greedy by largest denomination” is asked “what about [1, 3, 4] target 6?” and either (a) recovers gracefully and switches to DP, or (b) doubles down and ships wrong code. (b) ends the interview.
Problem Statement
Given an array of distinct positive coin denominations coins and a non-negative integer amount, return the minimum number of coins needed to sum to amount, or -1 if impossible. You have an unlimited supply of each denomination.
LeetCode reference: LC 322 — Coin Change.
Constraints
- 1 ≤ |coins| ≤ 12
- 1 ≤ coins[i] ≤ 2^31 - 1
- 0 ≤ amount ≤ 10^4
- Coins are distinct.
- 1 may or may not be in the set; if not, some amounts are unreachable.
Clarifying Questions
- “Are coins guaranteed sorted?” — typically no; sort if needed.
- “Is 1 always present?” No; the input coins = [3, 5] and amount = 4 is unsolvable, return -1.
- “Can amount = 0?” Yes; answer is 0.
- “Is the order of coins in the answer significant?” No, just the count.
- “Should I count the coins or list them?” — count.
Examples
- coins = [1, 3, 4], amount = 6 → 2 (3 + 3).
- coins = [1, 5, 10, 25], amount = 30 → 2 (25 + 5); greedy works.
- coins = [1, 5, 10, 25], amount = 41 → 4 (25 + 10 + 5 + 1); greedy works.
- coins = [2], amount = 3 → -1.
- coins = [1], amount = 0 → 0.
- coins = [186, 419, 83, 408], amount = 6249 → 20 (random hostile case).
The Greedy Hypothesis (And Why It Fails)
The natural greedy: sort denominations descending, take the largest that fits, recurse on the remainder.
def coin_change_greedy_WRONG(coins, amount):
    coins = sorted(coins, reverse=True)
    count = 0
    for c in coins:
        while amount >= c:
            amount -= c
            count += 1
    return count if amount == 0 else -1
Run this on coins = [1, 3, 4], amount = 6:
- Pick 4: amount = 2, count = 1.
- Pick 3? No, 2 < 3. Skip.
- Pick 1: twice. amount = 0, count = 3.
Result: 3 coins. Optimum: 3 + 3, which is 2 coins. Greedy is wrong.
Why? The greedy choice property does not hold: taking the largest coin (4) at step 1 forces a residual problem (amount = 2) where the available coins ([1, 3, 4]) cannot reach 2 with fewer than 2 coins (1 + 1). But not taking 4 leaves us with amount = 6 and the optimal solution 3 + 3, which uses 2 coins in total. The local optimum (largest fits) is not the global optimum.
The exchange-argument failure at this concrete level:
- Greedy picks coin 4 first. Optimal picks 3 first.
- Try to exchange the optimal’s first 3 with greedy’s 4: residual amount becomes 6 - 4 = 2, which cannot be made with a single coin (there is no coin of value 2), so the exchanged solution needs 3 coins instead of 2. The swap breaks minimality.
- The exchange argument fails. Therefore the greedy is not provably optimal, and indeed isn’t.
DP Fallback (The Correct Algorithm)
def coin_change_dp(coins, amount):
    INF = float('inf')
    dp = [0] + [INF] * amount
    for w in range(1, amount + 1):
        for c in coins:
            if c <= w and dp[w - c] + 1 < dp[w]:
                dp[w] = dp[w - c] + 1
    return dp[amount] if dp[amount] != INF else -1
This is unbounded knapsack — see Phase 5 Lab 04 — Unbounded Knapsack (Coin Change) for the full derivation.
Complexity: O(amount * |coins|) time, O(amount) space.
When Does Greedy WORK On Coin Change?
Greedy is optimal on a coin system iff the system is canonical, a property that depends on the specific denominations. Known canonical systems:
- [1, c, c², c³, …] (powers of a fixed base): always canonical.
- [1, 5, 10, 25, 50, 100] (US currency): canonical.
- [1, 2, 5, 10, 20, 50] (euro): canonical.
A finite check: sort the denominations ascending as 1 = c_1 < c_2 < … < c_m. By a result of Kozen and Zaks, if greedy is suboptimal for any amount, the smallest such amount lies strictly between c_3 + 1 and c_{m-1} + c_m. So the system is canonical iff greedy matches the optimum for every amount below the sum of the two largest denominations, which means comparing O(c_max) amounts against the DP table. For [1, 3, 4] the window is 5 < x < 7, which pins the counterexample at 6.
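A canonicity check can be sketched by comparing greedy with the DP optimum over a bounded range of amounts. The sketch below assumes that 1 is among the denominations and relies on the Kozen and Zaks bound, under which the smallest amount where greedy fails, if any, is below the sum of the two largest denominations. The function name `first_greedy_failure` is hypothetical.

```python
def first_greedy_failure(coins):
    """Hypothetical helper: smallest amount where largest-first greedy is
    suboptimal, or None if none is found below the sum of the two largest coins."""
    assert 1 in coins, "this sketch assumes a 1-coin so every amount is reachable"
    desc = sorted(coins, reverse=True)
    limit = desc[0] + desc[1] if len(desc) > 1 else desc[0]

    # Optimal coin counts for every amount up to the limit (unbounded knapsack DP).
    dp = [0] + [float("inf")] * limit
    for w in range(1, limit + 1):
        for c in coins:
            if c <= w and dp[w - c] + 1 < dp[w]:
                dp[w] = dp[w - c] + 1

    for amount in range(1, limit + 1):
        remaining, greedy = amount, 0
        for c in desc:
            take, remaining = divmod(remaining, c)
            greedy += take
        if greedy != dp[amount]:
            return amount
    return None
```

On [1, 3, 4] this reports 6, and on the US set [1, 5, 10, 25] it reports no failure. In an interview the check itself is rarely asked for; its value is as a quick way to confirm or refute a canonicity assumption before committing to greedy.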
For interview purposes: never assume canonicity unless the problem explicitly states the denominations are canonical (e.g., “US currency” with the standard set). Default to DP.
A Glance At Matroid Theory (Why Some Greedy Problems Work)
A matroid M = (E, I) is a pair where E is a set of elements and I ⊆ 2^E is a family of “independent sets” satisfying:
- Non-empty: ∅ ∈ I.
- Hereditary: if A ∈ I and B ⊆ A, then B ∈ I.
- Exchange property: if A, B ∈ I and |A| < |B|, then there exists b ∈ B \ A such that A ∪ {b} ∈ I.
Theorem (Edmonds–Rado): for a hereditary family (E, I), the greedy algorithm produces a maximum-weight independent set for every non-negative weight function iff (E, I) is a matroid.
Examples of matroids: the cycle-free edge sets of a graph (graphic matroid → Kruskal works), linearly-independent subsets of vectors (linear matroid), independent sets in a uniform matroid.
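Coin change is not a matroid problem: there is no natural matroid structure under which “fewest coins to reach amount” is a max-weight independent set, which is why greedy doesn’t work for arbitrary denominations. Interval scheduling is a useful contrast: sets of compatible activities do not form a matroid in general (one long interval overlapping two short disjoint ones breaks the exchange property), yet earliest-end-time greedy is still optimal by a direct exchange argument. Matroid structure guarantees that greedy works; when it is absent, you owe a problem-specific proof.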
You don’t need to memorize matroid theory for interviews. You do need to know the empirical signal: if greedy doesn’t pass the counterexample stress test, fall back to DP without panic.
Decision Recipe (The Whole Point Of This Lab)
For any optimization problem that “looks greedy”:
- Hypothesize a greedy choice (e.g., largest first, smallest first, by ratio).
- Run it on a hand-crafted small input of size 4–6 with adversarial denominations / weights.
- Compare to brute force (recursive enumeration of all choices).
- If greedy ≠ brute force on any input → fall back to DP, no further deliberation.
- If greedy = brute force on all stress tests → try to prove via exchange argument.
- If exchange argument works → ship greedy.
- If exchange argument fails or is unclear → fall back to DP. Better safe than sorry under interview time pressure.
The discipline: greedy is opt-in, requires positive proof. DP is the default for optimization problems unless greedy is clearly justified.
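Steps 2 to 4 of the recipe can be automated for coin change as in the sketch below. All names here (`greedy_largest_first`, `brute_min_coins`, `find_counterexample`) and the search bounds are assumptions chosen for illustration; the oracle is a memoized recursion that tries every first coin.

```python
from functools import lru_cache
from itertools import combinations

def greedy_largest_first(coins, amount):
    count = 0
    for c in sorted(coins, reverse=True):
        take, amount = divmod(amount, c)
        count += take
    return count if amount == 0 else -1

def brute_min_coins(coins, amount):
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def go(rest):
        if rest == 0:
            return 0
        options = [go(rest - c) for c in coins if c <= rest]
        options = [o for o in options if o >= 0]
        return 1 + min(options) if options else -1

    return go(amount)

def find_counterexample(max_coin=6, max_amount=12):
    """Hypothetical search: first small (coins, amount) where greedy != oracle."""
    for size in (2, 3):
        for rest in combinations(range(2, max_coin + 1), size - 1):
            coins = (1,) + rest  # always include 1 so every amount is reachable
            for amount in range(1, max_amount + 1):
                if greedy_largest_first(coins, amount) != brute_min_coins(coins, amount):
                    return coins, amount
    return None
```

If the search returns anything at all, step 4 applies and the decision is made: switch to DP. An empty result on a small search is only weak evidence, which is why the recipe still demands an exchange argument before shipping greedy.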
Tests
def test_coin_change_dp():
    assert coin_change_dp([1, 3, 4], 6) == 2
    assert coin_change_dp([1, 2, 5], 11) == 3
    assert coin_change_dp([2], 3) == -1
    assert coin_change_dp([1], 0) == 0
    assert coin_change_dp([1, 5, 10, 25], 30) == 2
    assert coin_change_dp([1, 5, 10, 25], 41) == 4
    assert coin_change_dp([186, 419, 83, 408], 6249) == 20

def test_greedy_fails_on_counterexample():
    """Document the failure for posterity."""
    assert coin_change_greedy_WRONG([1, 3, 4], 6) == 3  # WRONG; correct is 2
    assert coin_change_dp([1, 3, 4], 6) == 2  # Right answer

def test_greedy_works_on_canonical():
    assert coin_change_greedy_WRONG([1, 5, 10, 25], 30) == 2
    assert coin_change_greedy_WRONG([1, 5, 10, 25], 41) == 4
Correctness Argument (For DP)
DP correctness follows from optimal substructure: dp[w] = 1 + min(dp[w - c] : c ∈ coins, c ≤ w). Each dp[w] is computed from strictly smaller subproblems, so the table fills in O(amount * |coins|) time. The minimum is over all first-coin choices, exhaustively — so we never miss the optimal first choice (in contrast to greedy, which commits to one). See Phase 5 Lab 04 for the full proof.
Common Bugs (In The DP)
- Initializing dp[0] = INF instead of 0. (dp[0] = 0 because zero amount needs zero coins.)
- Iterating for c: for w (combinations DP) when for w: for c (orderings DP) is intended for a count of orderings. Loop order is not an issue for min coins, but the analogous bug appears in LC 518 (number of ways to make change), where the coin-outer loop is the one that counts combinations.
- Returning dp[amount] without checking INF, which returns a giant number instead of -1.
Common Bugs (In The Greedy, If You Do Try It)
- Assuming canonicity. Always test against DP on hostile cases first.
- Forgetting to return -1 when amount is not zero at the end.
- Treating coins = [1] as always feasible: true, but easy to forget the early return.
Mastery Criteria
- You can deliver the [1, 3, 4] target 6 counterexample by heart, in <30 seconds, without notes.
- You can articulate why greedy on coin change works for [1, 5, 10, 25] but fails for [1, 3, 4] (the canonicity property) in <60 seconds.
- You can write the DP solution in <5 minutes from blank.
- You can name three other classic problems where greedy fails but DP works (0/1 knapsack, weighted interval scheduling, longest path in a DAG).
- When proposing a greedy solution to any problem in mock interviews, you stress-test it against brute force on small adversarial inputs before writing production code.
Phase 7 — Competitive Programming Acceleration
Target level: Hard → Codeforces Div 2 D (rating ~1900–2100)
Expected duration: 2 months (12-month Elite track) / 4 weeks selective topics (6-month Serious track) / skipped or read-only (12-week Accelerated track)
Weekly cadence: ~10 competitive topics + 6 labs + 2 contests/week + 30–60 problems applying them under the framework
A Direct Note On ROI Before You Spend Two Months Here
This phase has the lowest direct ROI per hour for FAANG SWE2 / L4 prep of any phase in this curriculum. If your goal is a Google L4, Meta E4, Amazon SDE2, or similar — you can skip this phase entirely and lose nothing. Phases 01 → 06 plus Phase 8 (practical engineering) cover essentially every problem you will see in those interviews. Modular inverse will not appear in your loop. Convex hull will not appear in your loop. Mo’s algorithm will not appear in your loop. The opportunity cost of two months on competitive programming is two months you could have spent on system design, behavioral prep, or sleep.
This phase has the highest direct ROI per hour for: HFT/quant interviews (Jane Street, HRT, Citadel, Two Sigma, Optiver, IMC, Jump), compiler/runtime/database internals teams (Google’s compiler infra, Microsoft’s CLR, Oracle’s HotSpot, ClickHouse, Snowflake’s query engine), distributed systems coding rounds at the senior+ level where contest-style problems are deliberately used as filters, ICPC-flavored test rounds at startups founded by ex-CP champions, and any interview where the explicit goal is to filter out everyone except the top ~5% of candidates by raw algorithmic horsepower. In those loops, the topics in this phase are not optional decoration — they are the test. A candidate who cannot derive a modular inverse, write binary exponentiation, or sweep events along a coordinate cannot pass an Optiver onsite no matter how good their system design is.
So: decide your target before you start this phase, and do not feel guilty about skipping it if the ROI calculation says skip. The rest of this README assumes you’ve decided to do the work.
What “Competitive Programming Acceleration” Actually Means
Competitive programming is not just “harder LeetCode”. It is a different sport with a different culture, different problem-solving rhythm, and different correctness bar. The differences that matter for interview prep:
- Constraints are everything. A LeetCode Hard might say 1 ≤ N ≤ 10^5 and accept any O(N log N) solution. A Codeforces problem will say 1 ≤ N ≤ 5·10^5, T ≤ 10^4 testcases, sum of N ≤ 5·10^5, 2 second time limit, and your O(N log^2 N) solution will TLE while O(N log N) will pass with 200ms to spare. Reading constraints first, before the problem statement, is the single biggest skill jump from LeetCode to CP.
- Problems are short. A typical CF Div 2 problem is 3–8 sentences plus 2 example testcases. Information density per word is 5–10× LeetCode. Skim-then-deep-read is wrong; deep-read on first pass is correct.
- Brute force is a starting point, not an ending point. Submitting brute force on a contest problem to “lock in partial credit” is a LeetCode habit. On CF you submit only when you believe you have the intended complexity, because each wrong submission costs you: 50 points in standard rounds, penalty time in ICPC-style rounds.
- Stress testing is a normal part of the workflow. Top CP grandmasters run brute-vs-candidate stress tests against random inputs during a contest, on every problem they’re not 100% certain about. This is the muscle Lab 06 builds.
- Editorials are a separate skill. After a contest, reading editorials productively (extracting transferable techniques, not just patching your specific solution) is half the learning. Most candidates read an editorial and take away nothing because they read it as a solution rather than as a textbook.
The competitive programming skill set translates to interview signal in three ways: (1) speed — you become physically faster at typing and debugging, which buys time for harder questions; (2) vocabulary — when an interviewer says “this is a sweep line problem” or “use binary search on the answer”, you have a direct reference rather than re-deriving from scratch; (3) pattern coverage — the long tail of “weird trick” problems that interviewers reach for to filter senior candidates is exactly the long tail of CP techniques.
What You Will Be Able To Do After This Phase
- Read a Codeforces Div 4 / Div 3 problem in <2 minutes, decide brute-vs-intended in <1 minute, and submit Div 4 A–F or Div 3 A–E within contest time.
- Reach Div 2 C consistently and attempt Div 2 D in ~50% of contests.
- Read AtCoder Beginner Contest problems A–F and solve A–E reliably; reach F in ~50% of contests.
- Reach AtCoder Regular Contest A–C, with C being the contest-finisher you usually upsolve afterward rather than solve in-contest.
- Compute nCr mod p for p prime and n up to 10^7, with precomputed factorials and modular inverses, in <5 minutes from blank.
- Implement binary exponentiation (a^b mod m) in <2 minutes and recognize when matrix exponentiation reduces a linear recurrence from O(N) to O(K^3 log N).
- Implement the Sieve of Eratosthenes (basic and linear), the smallest-prime-factor sieve, and trial-division factorization, knowing when each is appropriate.
- Implement modular inverse via Fermat’s Little Theorem (when modulus is prime) and via extended Euclidean (when it isn’t), and know which to reach for.
- Implement Andrew’s monotone chain convex hull in <15 minutes and explain why cross product replaces division for orientation.
- Implement a sweep line for the skyline problem and 1D interval union; recognize the “sort events, scan, maintain active set” pattern under disguise.
- Implement coordinate compression as a one-line preprocessing step and combine it with Fenwick tree to count inversions in O(N log N).
- Implement Mo’s algorithm with the canonical block-sqrt sorting comparator and explain its O((N + Q) √N) complexity.
- Compute Sprague-Grundy numbers for impartial games and reduce composite games via XOR.
- Write a stress-testing harness (brute, candidate, random generator, comparator) and use it to find a planted bug in <5 minutes.
- Solve interactive CP problems (binary search a hidden value, query a hidden function) using line-buffered I/O and explicit flush discipline.
- Configure fast I/O in your language of choice (cin/cout desync in C++, bufio.NewReader + bufio.NewWriter in Go, sys.stdin.readline + sys.stdout.write in Python, BufferedReader + PrintWriter in Java) without thinking about it.
How To Read This Phase
Read this README once, linearly, end-to-end. Do not try to memorize it. The 19 inline topic sections are reference material — internalized when you actually use them on contest problems, not by re-reading. The 9 progression sections are playbooks — they tell you which contests to enter and what the success bar is.
After the linear pass, do this in order:
- Set up your CP toolchain — install your language compiler, configure fast I/O templates, get accounts on Codeforces and AtCoder.
- Work Lab 01 through Lab 06 in order. The labs are designed so each one builds a primitive you reuse in the next.
- Start the contest progression — Div 4 first, then Div 3, then Div 2. Do not skip Div 4 thinking it’s “too easy”; the goal there is speed, not difficulty.
- After every contest, spend at least 2× the contest time on upsolving (problems you didn’t solve in-contest, with the editorial open). Upsolving is where the learning happens.
Each topic entry has a fixed shape:
- Definition — what the technique is.
- When Used — the problem signal that fires this technique.
- Complexity — the canonical time/space.
- Classic Problems — 2–4 representative LC / CF / AtCoder problems.
- Pitfalls — the bugs that consume the most contest minutes for this technique.
The phase ends with a Mastery Checklist, Exit Criteria, and links to all six labs.
CP Problem-Solving Methodology — The Five-Step Loop
The single most teachable skill in competitive programming is the read → constraints → brute → submit → stress loop. Apply it to every problem.
- Read fast. First read takes ~60 seconds. Goal: identify the problem class (graph? DP? math? sweep? game?) and the input/output format. Don’t try to solve yet. If you don’t understand on first read, re-read — but do not start sketching code.
- Look at constraints before optimizing. This is the single biggest behavioral difference between CP and LeetCode habits. The constraint N ≤ 18 says bitmask DP. N ≤ 40 says meet-in-the-middle. N ≤ 5000 says O(N²). N ≤ 2·10^5 says O(N log N). N ≤ 10^9 says you don’t iterate N at all: math, binary search on the answer, or a closed form. The constraint is the algorithm choice. Read it first; do not write a single line of code without it.
- Brute-force first, in your head. Even if brute force won’t pass, the brute force gives you (a) a correctness oracle for stress testing, (b) a starting point for optimization, (c) a 100% reliable answer to “do I understand the problem?”. If you can’t write the brute force, you don’t understand the problem yet; re-read the statement.
- Submit early and often, but only when confident. Do not submit a partial / “maybe correct” solution to lock in points; CF/AtCoder penalize wrong submissions. If your code passes the sample inputs, that is necessary but not sufficient. Sample inputs are the easiest possible cases by construction; passing them is the floor, not the ceiling. Stress-test before submitting on any problem you’re <90% confident on.
- Stress test if uncertain. Lab 06 builds this muscle. The pattern: brute (definitely correct, exponential), candidate (your fast solution), random generator (small inputs, N ≤ 10), comparator that runs both and dies on mismatch. Run it for 1000 random tests in 30 seconds. If it doesn’t fail in 1000 trials, it probably won’t fail on the judge.
The loop applies recursively. If you’re stuck in step 3 (can’t write brute force), drop to “what’s the absolute simplest version of this problem?” — usually a smaller N, a special case, or a related problem. Solve that first. That’s almost always how the intended solution is derived.
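The brute / candidate / generator / comparator pattern from step 5 can be sketched in a few lines. This is an illustrative harness, not Lab 06's template: the function names and the maximum-subarray example problem are assumptions picked only to show a planted bug being caught.

```python
import random

def stress(brute, candidate, gen, trials=1000, seed=0):
    """Return the first input where candidate disagrees with brute, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if brute(case) != candidate(case):
            return case
    return None

# Example problem (illustration only): maximum sum of a non-empty subarray.
def brute_max_subarray(a):
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))

def kadane(a):
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def kadane_buggy(a):
    # Planted bug: starts from 0, so all-negative arrays return 0.
    best = cur = 0
    for x in a:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

def gen_small(rng):
    n = rng.randint(1, 8)
    return [rng.randint(-5, 5) for _ in range(n)]
```

Calling stress(brute_max_subarray, kadane_buggy, gen_small) returns a small all-negative array, which is exactly the kind of minimal failing input that makes the bug obvious. Small generators matter: a failing case of length 3 is debuggable by hand, a failing case of length 10^5 is not.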
Inline Topic Reference
Math
1. Modular Arithmetic
Definition
Arithmetic over the residue ring Z/pZ (typically p = 10^9 + 7 or p = 998244353). Addition, subtraction, multiplication, and exponentiation are all defined modulo p. Division is not defined directly — see modular inverse.
When Used
Whenever the answer is “huge” — count of arrangements, count of paths, sum over all subsets — and the problem says output mod 10^9 + 7. This is the most common modifier on counting problems in CP.
Complexity
Addition / subtraction / multiplication are O(1). Watch for overflow: in C++ (a * b) % p overflows int when p ≈ 10^9; cast to long long first. In C++ and Java, % takes the sign of the dividend, so a negative left operand gives a negative (or zero) result; use ((a % p) + p) % p after subtraction. In Python, integers are arbitrary precision so overflow doesn’t happen, but performance suffers on large values; reduce mod p after every operation.
Classic Problems
- CF 1342E (Placing Rooks): counting arrangements mod 10^9 + 7.
- AtCoder ABC 174 F: count distinct elements queries (off-topic but illustrates mod ergonomics).
- LC 920 (Number of Music Playlists): DP with mod.
Pitfalls
- Forgetting to mod after every multiplication; the value silently overflows and silently corrupts answers.
- Negative numbers after subtraction in C++/Java; always ((x % p) + p) % p.
- Using % on double (always wrong; mod is integer-only).
See Lab 01 — Modular Arithmetic.
2. Modular Inverse
Definition
The modular inverse of a mod p is the unique x in [0, p) such that a · x ≡ 1 (mod p), when it exists. Existence requires gcd(a, p) = 1. When p is prime, every a ≠ 0 has an inverse.
Two computation methods:
- Fermat’s Little Theorem (FLT): if p is prime, a^(p-1) ≡ 1 (mod p), so a^(p-2) ≡ a^(-1) (mod p). Use binary exponentiation in O(log p).
- Extended Euclidean Algorithm (extgcd): find x, y such that a·x + p·y = gcd(a, p). If gcd = 1, then x mod p is the inverse. O(log min(a, p)).
When Used
- Division by a mod p (replace n / a with n · inv(a)).
- Computing nCr mod p from precomputed factorials: nCr = fact[n] · inv(fact[r]) · inv(fact[n-r]).
- Probability problems where the answer is a fraction p/q modulo a prime; the answer is p · q^(-1) mod prime.
Complexity
O(log p) per inverse via either method. For batched inverses of n values, there’s a clever O(n) algorithm using the running product trick — useful when precomputing inverse factorials.
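As a sketch of both ideas in this paragraph, the block below computes a single inverse via FLT and then a whole batch of inverses with one exponentiation (the running product trick). The function names and the choice of 998244353 as the prime are assumptions made for the example.

```python
MOD = 998_244_353  # assumed prime modulus for this example

def inv_fermat(a, p=MOD):
    """Inverse of a modulo a prime p via a^(p-2); requires a % p != 0."""
    a %= p
    if a == 0:
        raise ZeroDivisionError("0 has no modular inverse")
    return pow(a, p - 2, p)

def batch_inverses(values, p=MOD):
    """Inverses of all values (each nonzero mod p) using one exponentiation."""
    n = len(values)
    prefix = [1] * (n + 1)            # prefix[i] = product of values[:i] mod p
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % p
    inv_running = inv_fermat(prefix[n], p)   # inverse of the full product
    result = [0] * n
    for i in range(n - 1, -1, -1):
        # inv(prefix[i+1]) * prefix[i] cancels every factor except values[i].
        result[i] = inv_running * prefix[i] % p
        inv_running = inv_running * values[i] % p   # now inverse of prefix[i]
    return result
```

The batch version does O(n) multiplications plus a single O(log p) power, which is the same trick used to fill an inverse-factorial table from the top down.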
Classic Problems
- CF 1462E2 (Close Tuples, hard version): nCr mod p heavy.
- AtCoder ABC 178 F (Contrast): combinatorics with mod.
- CF 1342E — modular inverse for counting.
Pitfalls
- Using FLT when p is composite: incorrect, must use extgcd.
- Forgetting that inv(0) is undefined; guard before calling.
- Passing p - 1 instead of p - 2 as the FLT exponent when the modulus is prime; a^(p-1) is 1, not the inverse.
3. Binary Exponentiation (Fast Power)
Definition
Compute a^b (or a^b mod m) in O(log b) time by exploiting the binary representation of b. The recurrence: a^b = (a^(b/2))^2 if b is even, a · a^(b-1) if b is odd.
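long long power(long long a, long long b, long long m) {
    long long res = 1 % m;             // 1 % m so that m = 1 yields 0
    a %= m;
    while (b > 0) {
        if (b & 1) res = res * a % m;  // current bit set: multiply it in
        a = a * a % m;                 // square the base for the next bit
        b >>= 1;
    }
    return res;
}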
When Used
Anywhere you’d otherwise loop b times multiplying a. With b up to 10^18, naive looping is impossible; binary exponentiation is mandatory. Also the implementation engine for FLT-based modular inverse and matrix exponentiation.
Complexity
O(log b) multiplications. Each multiplication is O(1) for integers but O(K^3) for K×K matrices (giving O(K^3 log b) for matrix exponentiation).
Classic Problems
- CF 630I, 630J — direct power computation.
- LC 50 (Pow(x, n)) — the canonical binary exponentiation problem.
- AtCoder ABC 178 D — DP with mod, uses fast power for inverses.
Pitfalls
- Negative exponents (LC 50): handle as 1 / power(x, -n) and watch for INT_MIN (negating overflows).
- Base case b = 0 returning 1 when m = 1, where the correct result is 0; start with res = 1 % m.
See Lab 02 — Binary Exponentiation.
4. Matrix Exponentiation
Definition
For a linear recurrence f(n) = c_1 · f(n-1) + c_2 · f(n-2) + ... + c_k · f(n-k), the state vector [f(n), f(n-1), ..., f(n-k+1)] is obtained from the state vector at step n-1 by multiplying by a fixed k×k companion matrix M. Therefore the state at step n is M^n · initial_state, and M^n is computed by binary exponentiation in O(k^3 log n) time.
When Used
Linear recurrences where n is up to 10^18 and k (the recurrence depth) is small (typically k ≤ 60). The textbook example is Fibonacci modulo a prime for n = 10^18. Also: counting walks of length n in a graph (M = adjacency matrix), counting paths in a DFA, certain combinatorial DPs over a small fixed state space.
Complexity
O(k^3 log n) time, O(k^2) space. For k = 2 (Fibonacci), 8 log n multiplications mod p ≈ 500 ops for n = 10^18.
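A minimal sketch of the technique for the k = 2 case, using plain nested lists. The helper names mat_mul, mat_pow, and fib_mod are illustrative, and the convention F(0) = 0, F(1) = 1 is an assumption of this example.

```python
def mat_mul(a, b, mod):
    """Product of matrices a (r x m) and b (m x c), entries reduced mod `mod`."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for t in range(inner):
            a_it = a[i][t]
            if a_it == 0:
                continue
            for j in range(cols):
                out[i][j] = (out[i][j] + a_it * b[t][j]) % mod
    return out

def mat_pow(m, e, mod):
    """m raised to the e-th power by binary exponentiation."""
    size = len(m)
    result = [[int(i == j) % mod for j in range(size)] for i in range(size)]
    base = [[x % mod for x in row] for row in m]
    while e > 0:
        if e & 1:
            result = mat_mul(result, base, mod)
        base = mat_mul(base, base, mod)
        e >>= 1
    return result

def fib_mod(n, mod=10**9 + 7):
    """F(n) mod `mod`, using [[1,1],[1,0]]^n = [[F(n+1),F(n)],[F(n),F(n-1)]]."""
    return mat_pow([[1, 1], [1, 0]], n, mod)[0][1]
```

For example, fib_mod(10) is 55, and fib_mod(10**18) finishes in about 60 squarings. For larger k the same mat_pow applies unchanged; only the companion matrix differs.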
Classic Problems
- Fibonacci mod p, n = 10^18 — the canonical introduction.
- CF 392C (Yet Another Number Sequence) — matrix exponentiation with polynomial coefficients.
- AtCoder DP Contest R (Walk): counting walks of length K in a graph.
Pitfalls
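- Index off-by-one in the state vector (forgetting that the last entry is f(n-k+1), not f(n-k)).
- Forgetting to mod every matrix multiplication entry.
- Using nested Python lists instead of NumPy for matrices: pure Python is slow for K ≈ 50 and log n ≈ 60. With NumPy, int64 sums of products can overflow under a modulus near 10^9, so reduce more often or split the multiplication.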
See Lab 02 — Matrix Exponentiation for Fibonacci.
5. Sieve of Eratosthenes (and Linear Sieve)
Definition
Build a boolean array is_prime[0..N] in O(N log log N) time by, for each prime p ≤ √N, marking all multiples of p (starting from p²) as composite. The linear sieve variant produces the smallest prime factor (SPF) for every integer up to N in exactly O(N) time using the invariant “every composite is sieved once, by its smallest prime factor”.
When Used
- Counting primes up to N for N ≤ 10^7 (Sieve of Eratosthenes is faster than trial division).
- Generating all primes up to N for prime-related problems.
- Building a smallest-prime-factor table for fast factorization (see Topic 6).
- Euler’s totient phi(n) for all n ≤ N in O(N log log N).
Complexity
Sieve of Eratosthenes: O(N log log N), space O(N) (or N/8 with a bitset). Linear sieve: O(N), space O(N) for the SPF table.
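Both variants can be sketched side by side. The function names are illustrative; the linear sieve relies on the invariant quoted in the definition, namely that each composite is written exactly once, by its smallest prime factor.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= n:                       # outer loop stops at sqrt(n)
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):   # start at p^2
                is_prime[multiple] = 0
        p += 1
    return [i for i in range(n + 1) if is_prime[i]]

def linear_sieve_spf(n):
    """Linear sieve: returns (spf, primes) where spf[i] is the smallest prime factor of i."""
    spf = [0] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if spf[i] == 0:                     # i was never marked, so it is prime
            spf[i] = i
            primes.append(i)
        for p in primes:
            # Only primes p <= spf[i] may mark i * p, so each composite is hit once.
            if p > spf[i] or i * p > n:
                break
            spf[i * p] = p
    return spf, primes
```

In Python the plain sieve with slice tricks is usually faster in practice than the linear sieve, despite the worse asymptotic bound; the linear sieve earns its place when you need the SPF table anyway.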
Classic Problems
- LC 204 (Count Primes) — sieve introduction.
- CF 17A (Noldbach problem) — primes near pairs of primes.
- Project Euler 10 (sum of primes below 2M): sieve of size 2·10^6.
Pitfalls
- Iterating to N instead of √N in the outer loop (still correct, but the extra iterations do no useful work, and p·p can overflow a 32-bit int).
- Starting the inner loop at 2p instead of p² (correct but slower; p², p²+p, p²+2p, ... is the optimal start).
- Using vector<bool> in C++ is fine; bool[] is also fine. unordered_set<int> is not fine: too slow.
See Lab 03 — Sieve and Factorization.
6. Prime Factorization
Definition
Decompose n into its prime factors. Two main techniques:
- Trial division. For each p = 2, 3, 5, ..., √n, while p | n, divide. Final n > 1 is itself prime. O(√n) per number.
- Smallest-prime-factor (SPF) sieve. Precompute spf[i] = smallest prime dividing i, for all i ≤ N. Then factor any i ≤ N in O(log i) by repeatedly replacing i with i / spf[i]. O(N log log N) preprocessing; O(log i) per query.
When Used
- Trial division when factoring a single large n (up to 10^14 is feasible).
- SPF sieve when factoring many numbers in a range [1, N] for N ≤ 10^7.
- For n up to 10^18, trial division is too slow; use Pollard’s rho (out of scope for this phase, in Phase 12).
Complexity
Trial division: O(√n). SPF sieve: O(log n) per query after O(N log log N) preprocessing.
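The two techniques can be sketched as follows. The names factorize_trial, build_spf, and factorize_spf are illustrative; both factorizers return a dict mapping each prime to its exponent, which is an assumed output format.

```python
def factorize_trial(n):
    """Prime factorization of n >= 1 by trial division, as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1 if p == 2 else 2          # after 2, test odd candidates only
    if n > 1:                            # leftover factor larger than sqrt(original n)
        factors[n] = factors.get(n, 0) + 1
    return factors

def build_spf(limit):
    """spf[i] = smallest prime factor of i for 2 <= i <= limit (Eratosthenes style)."""
    spf = list(range(limit + 1))
    p = 2
    while p * p <= limit:
        if spf[p] == p:                  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:          # not yet claimed by a smaller prime
                    spf[m] = p
        p += 1
    return spf

def factorize_spf(n, spf):
    """Factor n (1 <= n < len(spf)) in O(log n) using a precomputed SPF table."""
    factors = {}
    while n > 1:
        p = spf[n]
        factors[p] = factors.get(p, 0) + 1
        n //= p
    return factors
```

For instance, factorize_trial(12) gives {2: 2, 3: 1}: two distinct primes, three prime factors with multiplicity.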
Classic Problems
- CF 1325E — factor and sum of exponents.
- LC 263 (Ugly Number) — recursive division by small primes.
- AtCoder ABC 169 D — factor and count exponents.
Pitfalls
- Forgetting the final step after the loop: if n > 1, append n as the last prime. Easy to miss; corrupts every factorization where n has a prime factor > √n_initial.
- Trial-dividing past √n; once p > √n, n is either 1 or itself prime.
- Mixing up “number of distinct primes” with “number of prime factors with multiplicity”; these are very different (e.g., 12 = 2²·3 has 2 distinct, 3 with multiplicity).
See Lab 03 — Sieve and Factorization.
7. Combinatorics (nCr mod p)
Definition
Compute binomial coefficients modulo a prime. For repeated queries, precompute fact[i] = i! mod p and inv_fact[i] = (i!)^(-1) mod p for i up to N. Then nCr = fact[n] · inv_fact[r] · inv_fact[n-r] mod p in O(1) per query.
For n very large (up to 10^18) and p small (p ≤ 10^5), use Lucas’s theorem: C(n, r) mod p = ∏ C(n_i, r_i) mod p, where n_i, r_i are the base-p digits of n, r. The inner C(n_i, r_i) are computed directly because n_i, r_i < p.
When Used
- Counting paths in a grid (C(m+n, m)).
- Stars-and-bars: distribute n identical items into k bins → C(n+k-1, k-1).
- Inclusion-exclusion sums.
- Probability with combinatorial denominators.
- Lucas: when n is up to 10^18 and the modulus p is a small prime (e.g., grid problems with huge dimensions).
Complexity
Preprocess O(N). Each nCr query O(1). Lucas’s theorem: O(p) to precompute factorials up to p, then O(log_p(n)) per query.
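A sketch of the precomputed-table approach and of Lucas's theorem built on top of it. The function names are illustrative, and build_factorials assumes n < p so that fact[n] is invertible.

```python
MOD = 10**9 + 7  # assumed prime modulus

def build_factorials(n, p=MOD):
    """Return (fact, inv_fact) for 0..n modulo prime p; requires n < p."""
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % p
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], p - 2, p)        # one FLT inverse at the top
    for i in range(n, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % p   # walk down: 1/(i-1)! = i / i!
    return fact, inv_fact

def ncr(n, r, fact, inv_fact, p=MOD):
    """C(n, r) mod p in O(1) from the tables."""
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[r] % p * inv_fact[n - r] % p

def ncr_lucas(n, r, p):
    """C(n, r) mod a small prime p via Lucas's theorem (base-p digits)."""
    fact, inv_fact = build_factorials(p - 1, p)
    result = 1
    while n > 0 or r > 0:
        n_i, r_i = n % p, r % p
        if r_i > n_i:
            return 0                             # one zero digit term kills the product
        result = result * ncr(n_i, r_i, fact, inv_fact, p) % p
        n //= p
        r //= p
    return result
```

In a real solution the tables inside ncr_lucas would be built once and reused across queries; rebuilding them per call here just keeps the sketch short.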
Classic Problems
- CF 1342E: uses nCr.
- LC 62 (Unique Paths): direct C(m+n-2, m-1).
- AtCoder ABC 167 E (Colorful Blocks): inclusion-exclusion with nCr.
Pitfalls
- Forgetting to precompute inv_fact separately; computing each query as fact[n] / (fact[r] · fact[n-r]) with integer division mod p (this is wrong; you need the modular inverse).
- Off-by-one in the fact[] array (forgetting fact[0] = 1).
- For Lucas, forgetting that any r_i > n_i gives C(n_i, r_i) = 0, so the whole product is 0.
See Lab 01 — Modular Arithmetic.
8. GCD, LCM, Extended Euclidean
Definition
- gcd(a, b) is the greatest common divisor of a, b. Computed by the Euclidean algorithm: gcd(a, b) = gcd(b, a mod b), base case gcd(a, 0) = a.
- lcm(a, b) = a · b / gcd(a, b). Compute as a / gcd(a, b) · b to avoid overflow on intermediate a · b.
- Extended Euclidean algorithm finds, alongside gcd(a, b), integers x, y such that a·x + b·y = gcd(a, b). This is the engine for modular inverse when the modulus isn’t prime.
When Used
- Reducing fractions.
- Solving linear Diophantine equations a·x + b·y = c (solution exists iff gcd(a, b) | c).
- Cycle-length problems where the answer involves an LCM.
Complexity
O(log min(a, b)) for both gcd and extgcd.
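An iterative extended Euclid, with the three uses listed above built on it, might look like this. All function names are illustrative, and the sketch assumes non-negative inputs to ext_gcd.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def lcm(a, b):
    """Least common multiple; divides before multiplying."""
    if a == 0 or b == 0:
        return 0
    return a // ext_gcd(a, b)[0] * b

def inv_mod(a, m):
    """Inverse of a modulo m (m may be composite); raises if gcd(a, m) != 1."""
    g, x, _ = ext_gcd(a % m, m)
    if g != 1:
        raise ValueError("inverse does not exist")
    return x % m

def solve_diophantine(a, b, c):
    """One integer solution (x, y) of a*x + b*y == c, or None if none exists."""
    g, x, y = ext_gcd(a, b)
    if g == 0:
        return (0, 0) if c == 0 else None
    if c % g != 0:
        return None
    k = c // g
    return x * k, y * k
```

In Python the divide-first ordering in lcm is not needed for overflow safety, but it mirrors what you must write in C++ or Java.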
Classic Problems
- CF 822A — direct LCM use.
- LC 1071 (Greatest Common Divisor of Strings) — repurposed gcd.
- AtCoder ABC 162 D — gcd in a counting problem.
Pitfalls
- lcm(a, b) = a * b / gcd(a, b) overflows when a, b are around 10^9. Reorder: lcm = a / gcd * b.
- gcd(0, 0) is conventionally 0, and C++ __gcd(0, 0) returns 0, but some libraries leave it undefined. Guard.
- Negative a, b: gcd should always be non-negative; some implementations return a negative value. Use abs().
Geometry
9. Coordinate Geometry Basics (Cross Product, Orientation)
Definition
For two 2D vectors u = (ux, uy) and v = (vx, vy), the cross product is the scalar ux·vy − uy·vx. Its sign tells you the relative orientation of the vectors: positive = counter-clockwise turn, negative = clockwise, zero = collinear. The orientation of three points A, B, C is the sign of the cross product of B − A and C − A; this is the most-used primitive in computational geometry.
When Used
- Determining whether three points form a left turn, right turn, or are collinear (convex hull, polygon orientation).
- Determining whether a point is on, left of, or right of a line.
- Computing twice the signed area of a triangle (the cross product is twice the signed area).
- Computing twice the signed area of a polygon (shoelace formula = sum of cross products).
Complexity
O(1) per cross product / orientation test.
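The orientation primitive and the shoelace sum can be sketched with integer tuples. The names cross, orientation, and twice_signed_area are illustrative; points are assumed to be (x, y) pairs of integers.

```python
def cross(o, a, b):
    """Cross product of vectors (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def orientation(a, b, c):
    """+1 if a -> b -> c turns counter-clockwise, -1 if clockwise, 0 if collinear."""
    v = cross(a, b, c)
    return (v > 0) - (v < 0)

def twice_signed_area(polygon):
    """Shoelace formula: twice the signed area (positive for counter-clockwise order)."""
    n = len(polygon)
    total = 0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total
```

Returning twice the area keeps everything in integers; divide by 2 only at output time, and only if the problem asks for the area itself.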
Classic Problems
- LC 587 (Erect the Fence) — convex hull, uses orientation.
- CF 70D (Dynamic Convex Hull) — uses orientation heavily.
- AtCoder ABC 207 D — geometry with cross products.
Pitfalls
- Using floating-point for cross product when integer arithmetic would suffice: introduces rounding errors that cause “almost collinear” misclassifications. Use long long (or arbitrary precision) when coordinates are integers.
- Confusing CCW (counter-clockwise) with CW (clockwise) sign convention.
- Overflow in cross product: with coordinates up to 10^9, the product is up to 10^18, which fits long long but not int.
10. Convex Hull (Andrew’s Monotone Chain)
Definition
Given a set of 2D points, the convex hull is the smallest convex polygon containing all of them. Andrew’s monotone chain algorithm sorts points by (x, y), then builds the lower hull left-to-right and the upper hull right-to-left. Both passes use the same cross-product orientation test: pop the last hull point while it fails to make a counter-clockwise (left) turn with the incoming point. Because the upper hull is traversed right-to-left, the finished hull comes out in counter-clockwise order.
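typedef pair<long long, long long> P;  // assumed point type: (x, y)

// Cross product of (b - a) and (c - a); > 0 means a -> b -> c turns counter-clockwise.
long long cross(const P &a, const P &b, const P &c) {
    return (b.first - a.first) * (c.second - a.second)
         - (b.second - a.second) * (c.first - a.first);
}

sort(points.begin(), points.end());
vector<P> hull;
// lower hull
for (auto &p : points) {
    while (hull.size() >= 2 && cross(hull[hull.size()-2], hull.back(), p) <= 0)
        hull.pop_back();
    hull.push_back(p);
}
// upper hull
size_t lower_size = hull.size() + 1;
for (int i = (int)points.size() - 2; i >= 0; --i) {
    while (hull.size() >= lower_size && cross(hull[hull.size()-2], hull.back(), points[i]) <= 0)
        hull.pop_back();
    hull.push_back(points[i]);
}
hull.pop_back(); // last point is the start, duplicated (assumes at least 2 points)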
When Used
- Smallest enclosing polygon problems.
- Diameter of a point set (rotating calipers on the hull).
- Pre-step for various 2D optimization problems (convex layers, dynamic hulls).
Complexity
O(N log N) — dominated by the sort. The two hull-building passes are O(N) amortized.
Classic Problems
- LC 587 (Erect the Fence) — direct convex hull.
- CF 1093E — uses convex hull as a subroutine.
Pitfalls
- <= 0 vs < 0 in the orientation test: <= 0 removes collinear points from the hull (giving the strict hull); < 0 keeps them (giving the inclusive hull). LC 587 wants the inclusive hull (use < 0); most CP problems want the strict hull (use <= 0).
- Forgetting to remove the duplicated last point.
- Sorting by x alone without a tie-break: for points with the same x but different y, the sort order matters; (x, y) lexicographic is the right tie-break.
11. Closest Pair of Points (Divide & Conquer Overview)
Definition
Given N points in 2D, find the pair with the smallest Euclidean distance. The naive O(N²) algorithm is to compare every pair. The classical O(N log N) algorithm sorts by x, recursively solves the left and right halves, finds the minimum distance d of the two halves, then merges by inspecting only points within horizontal distance d of the dividing line — and within those, only y-neighbors within distance d. The merge step is O(N) because each strip point only needs to compare against ~6 nearest y-neighbors.
When Used
- Direct closest-pair problems.
- Any problem where you need a guarantee on minimum spacing (geometric clustering, collision detection).
Complexity
O(N log N) time, O(N) space. The recursion T(N) = 2 T(N/2) + O(N) resolves to O(N log N).
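A compact sketch of the divide-and-conquer scheme, working entirely in squared integer distances. The names dist_sq and closest_pair_sq are illustrative. Instead of hard-coding the neighbor count, this version stops scanning the strip as soon as the y-gap alone rules out an improvement, which gives the same bound by the packing argument.

```python
import heapq

def dist_sq(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def closest_pair_sq(points):
    """Smallest squared distance between two of the given points (inf if < 2 points)."""
    pts = sorted(points)                       # sort by x, then y
    by_y = lambda p: p[1]

    def solve(lo, hi):
        # Returns (best squared distance in pts[lo:hi], pts[lo:hi] sorted by y).
        if hi - lo <= 3:
            best = min((dist_sq(pts[i], pts[j])
                        for i in range(lo, hi) for j in range(i + 1, hi)),
                       default=float("inf"))
            return best, sorted(pts[lo:hi], key=by_y)
        mid = (lo + hi) // 2
        mid_x = pts[mid][0]
        best_left, left = solve(lo, mid)
        best_right, right = solve(mid, hi)
        best = min(best_left, best_right)
        merged = list(heapq.merge(left, right, key=by_y))   # linear merge by y
        strip = [p for p in merged if (p[0] - mid_x) ** 2 < best]
        for i, p in enumerate(strip):
            for j in range(i + 1, len(strip)):
                q = strip[j]
                if (q[1] - p[1]) ** 2 >= best:  # y-gap alone is already too large
                    break
                best = min(best, dist_sq(p, q))
        return best, merged

    return solve(0, len(pts))[0]
```

Merging the y-sorted halves rather than re-sorting the strip is what keeps the merge step linear, matching the recurrence T(N) = 2 T(N/2) + O(N).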
Classic Problems
- Codeforces educational round problems labeled “closest pair”.
- UVa 10245 (The Closest Pair Problem) — direct.
Pitfalls
- For most interview problems, N is small enough (≤ 5000) that O(N²) brute force passes, and writing the divide-and-conquer version is not worth the complexity.
- Floating-point distance comparison: compare squared distances (integers, exact) instead of square-rooted distances (floats, lossy).
Sweep & Queries
12. Sweep Line
Definition
A sweep line algorithm imagines a vertical (or horizontal) line sweeping across the plane (or 1D number line) and processing events in the order the sweep encounters them. At each event, you update an “active set” — typically a balanced BST or a multiset — and answer queries based on the current state. The key insight is that between events, the active set is constant, so you only need to process at events.
When Used
- 1D interval union (sum of lengths).
- 2D rectangle union area (sweep y-coordinate; active set = x-intervals).
- Segment intersection problems (Bentley-Ottmann).
- Skyline problem (LC 218).
- Closest pair (alternate formulation).
Complexity
Typically O((N + E) log N) where E is the number of events; for rectangle union, E = O(N), giving O(N log N).
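The simplest instance of the pattern is the total length covered by a set of 1D intervals, where the "active set" collapses to a counter. A minimal sketch, with union_length as an illustrative name and half-open intervals (l, r) as an assumed input format:

```python
def union_length(intervals):
    """Total length covered by the union of intervals given as (l, r) with l < r."""
    events = []
    for l, r in intervals:
        if l < r:
            events.append((l, +1))   # interval opens
            events.append((r, -1))   # interval closes
    events.sort()                    # at equal x, closes (-1) sort before opens (+1)
    active = 0                       # number of intervals covering the current gap
    total = 0
    prev_x = None
    for x, delta in events:
        if active > 0:
            total += x - prev_x      # the stretch [prev_x, x] was covered
        active += delta
        prev_x = x
    return total
```

Here the tie-break order does not affect the answer, because a close followed by an open at the same x leaves a gap of length zero. In problems that count the maximum overlap, the same tie-break decides whether touching intervals count as overlapping.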
Classic Problems
- LC 218 (The Skyline Problem) — canonical.
- LC 850 (Rectangle Area II) — 2D rectangle union.
- AtCoder ABC 188 D — 1D event-sweep counting.
Pitfalls
- Tie-breaking on event time: when multiple events occur at the same x, process all opens before closes (or vice versa, problem-dependent). Wrong order → off-by-one in the active set.
- Using set<int> for the active set when you need to handle duplicate values; switch to multiset<int>.
- Updating the answer in the middle of a batch of events that share the same x; update it only after all events at the current x have been processed.
See Lab 04 — Sweep Line for Skyline.
13. Coordinate Compression
Definition
Replace large/sparse coordinate values with their ranks in the sorted set of distinct coordinates. If your data has values [10^9, 5, 10^7, 5, 1], compression maps them to ranks [3, 1, 2, 1, 0]. The transformed problem has the same structure but coordinates fit in [0, N), enabling array-indexed data structures (Fenwick tree, segment tree, bucket sort).
sorted_unique = sorted(set(values))
rank = {v: i for i, v in enumerate(sorted_unique)}
compressed = [rank[v] for v in values]
When Used
- Counting inversions with a Fenwick tree (values up to 10^9 → compress to [0, N)).
- 2D rectangle union via sweep line + segment tree on y-coordinates.
- DP with a state indexed by a coordinate that’s too large to enumerate.
- Almost any problem with value ≤ 10^9 where you would otherwise need a hashmap of size N.
Complexity
O(N log N) for sorting and deduplication; O(N) after that to relabel.
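Combining compression with a Fenwick tree gives the standard inversion counter. This is a sketch under the assumption that an inversion is a pair i < j with values[i] > values[j] (strict); count_inversions is an illustrative name, and ranks are 1-based here because Fenwick indices start at 1.

```python
def count_inversions(values):
    """Number of pairs i < j with values[i] > values[j]."""
    # Coordinate compression to 1-based ranks.
    rank = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    tree = [0] * (len(rank) + 1)     # Fenwick tree over ranks

    def add(i):
        while i < len(tree):
            tree[i] += 1
            i += i & -i

    def prefix(i):
        """How many inserted elements have rank <= i."""
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    inversions = 0
    for seen, v in enumerate(values):
        r = rank[v]
        inversions += seen - prefix(r)   # earlier elements strictly greater than v
        add(r)
    return inversions
```

Without compression the tree would need 10^9 slots; with it, the tree has at most N slots regardless of how large the raw values are.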
Classic Problems
- LC 315 (Count of Smaller Numbers After Self) — Fenwick + compression.
- CF 51A — geometry with coord compression.
- AtCoder ABC 174 F — distinct elements queries.
Pitfalls
- Forgetting to use set (deduplicate) before sorting; otherwise duplicate values get different ranks, breaking equality comparisons.
- Using compression when not needed (values already in a small range): adds complexity for no benefit.
See Lab 05 — Coordinate Compression for Inversions.
14. Mo’s Algorithm
Definition
An offline algorithm for answering Q range queries on an array in O((N + Q) √N) total time when (a) you can move the answer from [l, r] to [l-1, r], [l+1, r], [l, r-1], [l, r+1] in O(1) (or O(log N)); and (b) the queries can be reordered. The trick: sort queries by (l / B, r) where B = √N. Then within a block, r only increases, so total r movement is O(N) per block × √N blocks = O(N √N). Across blocks, l movement is O(√N) per query × Q queries = O(Q √N).
When Used
- “Number of distinct values in [l, r]” queries.
- “Sum of f(count(v)) for distinct v in [l, r]” queries.
- Mode of a range (with auxiliary frequency-of-frequency structure).
- Many problems labeled “offline range queries with no updates” on Codeforces.
Complexity
O((N + Q) √N). With N = Q = 10^5, that’s about 6.3·10^7 pointer moves (2·10^5 × ~316), which passes a 2-second limit comfortably.
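A sketch of the algorithm on the distinct-values query. The name distinct_in_ranges and the inclusive 0-based (l, r) query format are assumptions of this example; the sort key is the plain (l // B, r) comparator described above.

```python
from math import isqrt

def distinct_in_ranges(a, queries):
    """For each inclusive 0-based (l, r), the number of distinct values in a[l..r]."""
    n = len(a)
    block = max(1, isqrt(n))
    order = sorted(range(len(queries)),
                   key=lambda i: (queries[i][0] // block, queries[i][1]))
    count = {}
    distinct = 0
    cur_l, cur_r = 0, -1             # current window a[cur_l..cur_r], initially empty
    answers = [0] * len(queries)

    def add(i):
        nonlocal distinct
        v = a[i]
        count[v] = count.get(v, 0) + 1
        if count[v] == 1:
            distinct += 1

    def remove(i):
        nonlocal distinct
        v = a[i]
        count[v] -= 1
        if count[v] == 0:
            distinct -= 1

    for qi in order:
        l, r = queries[qi]
        # Expand first, then shrink, so the window never becomes inverted.
        while cur_l > l:
            cur_l -= 1
            add(cur_l)
        while cur_r < r:
            cur_r += 1
            add(cur_r)
        while cur_l < l:
            remove(cur_l)
            cur_l += 1
        while cur_r > r:
            remove(cur_r)
            cur_r -= 1
        answers[qi] = distinct
    return answers
```

Answers are written back by original query index, since the processing order differs from the input order. In Python this is a teaching sketch; at N = Q = 10^5 the constant factor usually calls for a compiled language.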
Classic Problems
- CF 86D (Powerful array): sum of cnt² · v over distinct v.
- SPOJ DQUERY: distinct values in a range.
- CF 220B: count of values equal to their frequency.
Pitfalls
- The optimal block size is about √N. A smaller B creates more blocks, each costing a full O(N) sweep of r; a larger B makes every l move longer. Either extreme can TLE.
- Mo’s algorithm doesn’t handle online queries; queries must be batched and reordered.
- The “add/remove element in O(1)” requirement is strict; an O(log N) add/remove makes the total O((N + Q) √N · log N), which usually TLEs.
15. Offline Binary Search / Parallel Binary Search
Definition
When you have Q independent binary-search queries, each of which would naively take O(log V · F) where F is some function evaluation cost, parallel binary search runs all Q queries’ binary searches in lockstep. At each binary-search step, group the queries by their current candidate midpoint, evaluate F once per group, and update each query’s interval. Total cost: O(log V · (F + Q)) instead of O(Q · log V · F).
When Used
- “For each query, find the smallest t such that some property P_query(t) holds”, where P is monotone in t and evaluating P is expensive (e.g., requires processing the first t operations of a stream).
- Problems where each query is a binary search over time/index/threshold and the function F(t) changes globally with t.
Complexity
O(log V · (F + Q)). For V = N, F = O(N), Q = N: O(N log N) total.
Classic Problems
- CF 813F (Bipartite Checking) — parallel binary search on offline DSU.
- POI Meteors — the canonical introduction.
Pitfalls
- Forgetting that this technique requires queries to be independent (one query’s answer doesn’t depend on another).
- Conceptually heavier than Mo’s algorithm; for interview prep, knowing the technique exists and recognizing its signal is more important than implementing it from blank.
Game Theory & Misc
16. Sprague-Grundy / Nim
Definition
In an impartial two-player game (both players have the same available moves at every position), each position has a Grundy number g(pos) defined recursively: g(pos) = mex { g(next) : next ∈ moves(pos) }, where mex is the minimum excluded value (smallest non-negative integer not in the set). A position has Grundy number 0 iff it’s losing for the player to move. Nim’s theorem: the Grundy number of a sum of independent games is the XOR of their individual Grundy numbers.
When Used
- Any “two-player game, both move optimally, who wins?” problem with no chance / no hidden information / both players have the same moves.
- Decomposing complex games into sums of simpler subgames.
- Standard Nim (multiple piles, take any number from any pile, last to move wins): the answer is “first player wins iff XOR of pile sizes ≠ 0”.
Complexity
Computing Grundy via memoization: O(states · branching). For games with state space up to 10^6, this is feasible.
Classic Problems
- Standard Nim, multi-pile — XOR of pile sizes.
- CF 95A — Grundy on a stair-step game.
- AtCoder ABC 195 D — game DP related (not pure Grundy but related).
Pitfalls
- The theorem only applies to impartial games. Partisan games (chess, where pieces are colored) don’t satisfy Sprague-Grundy.
- “Last to move loses” (misère convention) does NOT in general have the simple XOR rule — only the “last to move wins” (normal convention) does, except in degenerate cases.
- Large branching factor + large state space → memoization table doesn’t fit. Look for closed-form patterns by computing Grundy for small n and spotting periodicity.
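The mex recurrence and the XOR rule can be sketched on a single-pile subtraction game (remove m stones for some m in a fixed move set; a player with no move loses). The helper names `mex`, `grundy_table`, and `first_player_wins` are hypothetical, and the move set {1, 3, 4} is only an example.

```python
def mex(values):
    """Smallest non-negative integer not present in values."""
    seen = set(values)
    g = 0
    while g in seen:
        g += 1
    return g

def grundy_table(limit, moves):
    """g[n] = Grundy number of a pile of n stones for the given move set."""
    g = [0] * (limit + 1)
    for n in range(1, limit + 1):
        g[n] = mex(g[n - m] for m in moves if m <= n)
    return g

def first_player_wins(piles, moves):
    """Sum of independent piles: XOR the per-pile Grundy numbers."""
    g = grundy_table(max(piles, default=0), moves)
    total = 0
    for p in piles:
        total ^= g[p]
    return total != 0

print(grundy_table(11, (1, 3, 4)))  # [0, 1, 0, 1, 2, 3, 2, 0, 1, 0, 1, 2]
```

The printed table repeats with period 7, which is the kind of pattern to look for when the state space is too large to tabulate in full.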
17. Randomized Algorithms / Stress Testing
Definition
Two related concepts:
- Randomized algorithms: algorithms that use random choices to achieve good expected complexity (randomized quicksort, treap, hash-based string matching). Las Vegas algorithms are always correct, randomized in time; Monte Carlo are randomized in correctness.
- Stress testing (the bigger interview-prep topic): writing a small brute-force solver, your candidate optimal solver, a random input generator, and a comparator that runs both on every random input until they disagree. This is how CP grandmasters find bugs in their own solutions.
When Used
- Stress testing: on every problem you’re not 100% confident about, before submitting.
- Randomized algorithms: when a deterministic guarantee isn’t required (probabilistic data structures: Bloom filter, count-min sketch, treap, randomized convex hull).
Complexity
Stress testing is overhead-only — if both your brute and candidate are fast enough on small inputs, stress testing is essentially free. 1000 random tests on N = 10 finishes in <1 second.
Classic Problems
- This is a meta-skill, not a problem class. See Lab 06 — Stress Testing.
Pitfalls
- Random generator that doesn’t cover edge cases (e.g., always generating distinct elements when the bug is in duplicate handling). Generate adversarially: small N, small value range, allow duplicates.
- Comparator that compares output as strings without normalizing whitespace — false positives.
- Not seeding the RNG explicitly (and logging the seed); a failing input you cannot regenerate is lost, and one accidentally-passing run hides the bug.
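A minimal sketch of the four-part harness, using maximum subarray sum as a stand-in problem. All names here (`brute`, `candidate`, `gen_case`, `stress`) are hypothetical, and Lab 06 remains the reference implementation.

```python
import random

def brute(a):
    """O(n^2) oracle: best sum over all non-empty subarrays."""
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))

def candidate(a):
    """Kadane's algorithm, the solution under test."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def gen_case(rng):
    """Small N, small value range, duplicates and negatives allowed."""
    return [rng.randint(-3, 3) for _ in range(rng.randint(1, 6))]

def stress(brute_fn, cand_fn, gen, iterations, seed):
    """Return the first (case, expected, got) mismatch, or None if all agree."""
    rng = random.Random(seed)  # explicit seed makes a failure reproducible
    for _ in range(iterations):
        case = gen(rng)
        expected, got = brute_fn(case), cand_fn(case)
        if expected != got:
            return case, expected, got
    return None

print(stress(brute, candidate, gen_case, iterations=1000, seed=1))
```

Comparing return values rather than printed strings sidesteps the whitespace pitfall; when the two solvers are separate executables, normalize their output before comparing.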
18. Interactive Problems (CP-Style)
Definition
The problem statement defines an interactive protocol: you ask the judge queries (e.g., “is element i greater than element j?”), the judge answers, and after at most K queries you must report the answer. The judge runs as a subprocess and communicates via stdin/stdout. The technique is usually a binary search, ternary search, or adaptive query strategy bounded to O(log N) queries.
When Used
- “Find a hidden value in [1, N] in at most ⌈log₂ N⌉ queries” — straight binary search.
- “Find the minimum of a unimodal function” — ternary search.
- Adversarial / interactive game-tree problems.
Complexity
O(log N) queries for binary/ternary search; the algorithm is otherwise mostly bookkeeping.
Classic Problems
- CF 1207E (XOR Guessing) — adaptive queries.
- CF 1486D (Max Median) — binary search on the answer; not interactive, but the same yes/no decision-query pattern.
- AtCoder ABC 178 D — not interactive but related decision-problem-as-binary-search.
Pitfalls
- Forgetting to flush stdout after every query. This is the single most common interactive bug. In C++: cout << ... << endl; (or cout.flush();); in Python: print(...); sys.stdout.flush() or print(..., flush=True). In Go: bufio.NewWriter with explicit Flush(). If you don’t flush, the judge sees nothing, your program waits for input that never comes, and you TLE.
- Mixing cin/cout with scanf/printf — buffering interleaves badly.
- Reading the judge’s response on the wrong line because of an off-by-one in the query loop.
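The flush discipline can be sketched with a made-up protocol: the program prints "? x" and the judge replies "<=" when the hidden value is at most x, and ">" otherwise. The protocol, `ask_judge`, and `find_hidden` are all assumptions for illustration; real problems define their own query format.

```python
import sys

def ask_judge(x):
    """Send one query and read one reply (hypothetical protocol)."""
    print(f"? {x}", flush=True)  # flush after every query
    return sys.stdin.readline().strip()

def find_hidden(n, ask=ask_judge):
    """Binary search for the hidden value in [1, n]."""
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if ask(mid) == "<=":  # hidden value is at most mid
            hi = mid
        else:                 # hidden value is greater than mid
            lo = mid + 1
    return lo
```

Passing the query function as a parameter lets you test the search against a simulated judge locally before touching the real one.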
19. Fast I/O
Definition
The default I/O mechanisms in most languages are line-buffered, locale-aware, and format-aware — which makes them slow. For CP, where you might read 10^6 integers in a 1-second time limit, fast I/O is mandatory. The technique varies by language:
- C++: ios_base::sync_with_stdio(false); cin.tie(nullptr); — disconnects cin/cout from C stdio. Speed-up: ~5×. Even faster: scanf/printf directly.
- Java: BufferedReader + StreamTokenizer for input; PrintWriter (with explicit flush()) for output. Scanner is too slow for CP — never use it.
- Python: sys.stdin.readline instead of input(); sys.stdout.write instead of print for hot loops. For massive input: data = sys.stdin.buffer.read().split() and parse from there. PyPy3 is 5–10× faster than CPython for raw computation; use it whenever available.
- Go: bufio.NewReader(os.Stdin) and bufio.NewWriter(os.Stdout); always defer writer.Flush(). fmt.Scan is slow; use a custom token-by-token reader.
- JavaScript / TypeScript (Node.js): process.stdin raw read, parse all input at once. Generally the slowest mainstream CP language; not recommended for N ≥ 5·10^5.
When Used
- Always, on every CP problem with large input. Cost-of-not-using-fast-I/O: 5–10× slowdown, the difference between AC and TLE.
- Less critical on LeetCode-style problems where input is already parsed for you.
Complexity
I/O is O(input_size) either way; fast I/O reduces the constant by ~5–10×.
Classic Problems
- Any problem with N ≥ 10^6 integers as input. The constraint itself is a hint: “you need fast I/O”.
Pitfalls
- Mixing buffered and unbuffered I/O in the same program. In C++, after sync_with_stdio(false), do not mix cin with scanf or cout with printf. The buffers are independent and output appears out of order.
- Forgetting to Flush() in Go. Your output disappears entirely.
- Java Scanner. Don’t.
- Python print in a loop of 10^6 iterations. Each call formats its arguments and issues its own write on stdout, and that per-call overhead is lethal for CP. Buffer with '\n'.join(...) and one final sys.stdout.write.
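For Python, the read-everything-then-join pattern might look like the sketch below. The task (print running prefix sums of n integers) and the names `solve` and `main` are made up for illustration.

```python
import sys

def solve(data: bytes) -> str:
    """Parse 'n' followed by n integers; return the prefix sums, one per line."""
    tokens = data.split()  # one pass over the whole input
    if not tokens:
        return ""
    n = int(tokens[0])
    out = []
    running = 0
    for tok in tokens[1:1 + n]:
        running += int(tok)
        out.append(str(running))
    return "\n".join(out)  # build the output once

def main():
    # Single read from the binary buffer, single write at the end.
    sys.stdout.write(solve(sys.stdin.buffer.read()) + "\n")
```

Call main() as the program's entry point. Keeping the parsing in a pure function that takes bytes also makes it easy to unit test without touching stdin.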
Progression Playbooks (How To Practice Each Contest Track)
The 19 topics above are the vocabulary. The progression playbooks below are the training plans — which contests to enter, what the success bar is, and what to do after.
1. Codeforces Div 4 Progression
Target rating: unrated → 1200. Goal: solve A–F reliably, in <90 min, cleanly. Contest cadence: every Div 4, ~2/month. Why Div 4: the floor of CF; problems are LC-Easy to LC-Medium difficulty but with CF-style constraints. The skill being trained is speed — you should never get stuck on a Div 4 problem; if you do, the bottleneck is reading/typing speed, not algorithm knowledge.
How to practice: enter every Div 4 live. Aim for A–E in <60 min, F in <90 min total. After contest, upsolve any problems you missed, with editorial open. Track your fastest A–E solve times in a spreadsheet — they should drop ~30% over your first 5 Div 4 contests.
Exit criterion: solve all six problems A–F consistently in <100 min.
2. Codeforces Div 3 Progression
Target rating: 1200 → 1500. Goal: solve A–E reliably, attempt F. Contest cadence: every Div 3, ~2/month. Skill trained: pattern recognition. Div 3 problems mostly use the patterns from this curriculum (sweep, two pointers, basic DP, basic graph), but the framing is less explicit than LeetCode. You’ll see “given an array, do f(...)” with no LC tag telling you “this is binary search on the answer”.
How to practice: enter every Div 3 live. Aim for A–D in <60 min, E in <120 min. Upsolve E and F after contest. Read the editorial carefully — even if you solved it, see if there’s a slicker approach.
Exit criterion: solve A–E within contest time in 5 of 6 contests.
3. Codeforces Div 2 Progression
Target rating: 1500 → 1900. Goal: solve A–C reliably, attempt D in 50% of contests. Contest cadence: every Div 2, ~3/month. Skill trained: problem-solving creativity. Div 2 D is where “knowing the technique” stops being enough — you must combine techniques. Div 2 D might require a sweep + DP, or a binary search + greedy, or a Fenwick tree + coordinate compression.
How to practice: enter every Div 2 live. Don’t worry about D in your first 10 contests. Aim for A, B, C clean. After contest, upsolve D with the editorial open; the goal is to learn techniques, not to spend 4 hours stuck. After 10 contests, start attempting D in-contest.
Exit criterion: solve A–C in 8/10 contests; D in 4/10 contests.
4. AtCoder Beginner Contest Progression
Target rating: unrated → 1400 (AtCoder). Goal: solve A–F. Contest cadence: every Saturday/Sunday, ~4/month. Why ABC: AtCoder’s problem statements are remarkably clean (often a single math problem stated tersely), and the difficulty curve A–F is smoother than CF Div 2/3. ABC F is roughly equivalent to CF Div 2 D but with cleaner statements.
How to practice: enter every ABC live. Aim for A–E in <60 min, F in <100 min. ABC F is famous for requiring exactly one “aha” insight per problem; if you don’t see it, move on and upsolve afterward — don’t grind in-contest.
Exit criterion: solve A–E in 9/10 contests; F in 5/10 contests.
5. AtCoder Regular Contest Progression
Target rating: 1400 → 1800 (AtCoder). Goal: solve A–C; attempt D. Contest cadence: ~1/month. Why ARC: ARC is harder than ABC and tests deeper CP techniques — segment tree beats, advanced combinatorics, harder geometry. ARC C is approximately CF Div 1 B / Div 2 E.
How to practice: enter every ARC live. Don’t expect to finish A–C in your first 5 attempts. Upsolve C and D after every contest. ARC is the contest where editorial reading delivers the most learning per minute, because the problems are designed around a specific technique that the editorial will name.
Exit criterion: solve A–B in 8/10 contests; C in 3/10 contests.
6. Stress Testing Methodology
Skill trained: finding bugs in your own solutions before the judge does. Cadence: every problem you’re <90% sure about. Tools: brute solver, candidate solver, random generator, comparator script. See Lab 06 for the full implementation.
How to practice: during a contest, when your candidate passes the samples but you have any doubt, write the stress test. It takes ~3 minutes; it saves the 50-point penalty of a wrong submission, plus the 30 minutes of re-solving after a WA. After contest, on every problem you got wrong, write a stress test that finds your bug. This builds the habit until it’s automatic.
Exit criterion: in 3 consecutive contests, no wrong submissions caused by bugs that would have been caught by stress testing.
7. Reading Editorials Productively
Skill trained: extracting transferable techniques rather than specific solutions. Cadence: after every contest, every problem. Why it matters: the difference between a 1500-rated and a 1900-rated coder is mostly that the 1900 has read 5× more editorials and retained them.
How to practice:
- Read the editorial before re-implementing your wrong solution. Do not patch your code; rewrite from blank using the editorial’s approach.
- Identify the technique name the editorial uses. “This is binary search on the answer.” “This is a two-pointer sliding window.” Add it to your personal technique catalog (a markdown file with one line per technique → one problem where you saw it).
- If the editorial is terse (AtCoder editorials are famously curt), look for community write-ups on Codeforces blogs.
- Re-implement from blank, in your own style.
Exit criterion: for every solved problem in your CP journal, you can name the technique and cite one other problem that uses it.
8. Implementation Speed Drills
Skill trained: typing your way out of the problem-statement-to-AC pipeline as fast as possible. Cadence: weekly, 30 min. Why it matters: in a 2-hour Div 2, you have ~25 min per problem on average. If you spend 10 min on syntax errors, you’ve lost 40% of your budget.
How to practice: pick 5 problems you’ve already solved. Type them again, from blank, against a stopwatch. Your second attempt should be 3–5× faster than your first. Do this on the canonical primitives — Sieve, modular inverse, binary exponentiation, Fenwick tree, sweep line — until you can write each from blank in <5 minutes without referring to notes.
Exit criterion: all 6 lab implementations in this phase, written from blank in <15 minutes each.
9. Contest-Time Strategy (Problem Ordering, When To Skip, When To Stress Test)
Skill trained: allocating limited contest time. Cadence: every contest.
The contest-time playbook:
- First 10 minutes: read all problems briefly. Mark each as “trivial / hard / ?”. Solve the trivial ones first to build momentum.
- Next 30 minutes: solve the medium ones. Allocate ~15 min per problem.
- When stuck for 15 min on one problem: skip. Move to the next. Come back later with a fresh perspective. The cost of grinding stuck is paid in opportunity cost on the next problem.
- When passing samples but uncertain: write a stress test. 3 minutes invested, 50-point penalty avoided.
- Last 30 minutes: decide between (a) attempting a hard problem you haven’t started, or (b) re-checking your earlier solutions. (b) is usually higher ROI unless the hard problem is worth a lot.
- Never give up before time expires. Even if every problem is solved or skipped, re-read the hardest unsolved problem one more time — sometimes the third reading triggers an insight the first two didn’t.
Common strategy mistakes:
- Grinding A–B for too long when they should take <15 min total. If A is taking >20 min, you’re misreading; re-read.
- Submitting before stress-testing on a problem you’re <90% sure about. The penalty hurts more than the time investment.
- Skipping the editorial post-contest because “I’ll do it later”. Later never comes.
Mastery Checklist
Tick when each item is true unprompted — i.e., you’d reach for it without consulting notes.
- Read constraints first on every problem; can articulate why N ≤ 18 says bitmask, N ≤ 5000 says O(N²), etc.
- Modular inverse via FLT in <2 minutes from blank; can switch to extgcd when modulus is composite.
- Binary exponentiation for integers in <2 minutes from blank.
- Matrix exponentiation for Fibonacci in <10 minutes from blank, including the matrix multiplication primitive.
- Sieve of Eratosthenes (basic) in <3 minutes; SPF sieve in <5 minutes; trial-division factorization in <2 minutes.
- nCr mod p with precomputed factorials in <5 minutes from blank.
- gcd/lcm/extgcd in <3 minutes from blank.
- Cross product orientation test on demand; can identify CCW/CW/collinear by sign.
- Convex hull (Andrew’s monotone chain) in <15 minutes from blank.
- Sweep line for skyline in <20 minutes from blank.
- Coordinate compression as a one-line preprocessing.
- Mo’s algorithm template (block sort + add/remove handlers) in <20 minutes from blank.
- Sprague-Grundy on demand for impartial games up to small state space.
- Stress-testing harness (brute, candidate, generator, comparator) in <10 minutes from blank.
- Interactive-problem template with explicit flush after every query.
- Fast I/O configured by reflex in your primary language.
- Read an editorial productively: name the technique, find one other problem using it.
- Codeforces Div 3 A–E in 5/6 contests.
- Codeforces Div 2 A–C in 8/10 contests.
- AtCoder ABC A–F in 5/10 contests.
Exit Criteria
You graduate Phase 7 when all of the following hold:
- You have entered ≥10 Codeforces contests live (any division) and ≥10 AtCoder Beginner Contests live.
- You have a Codeforces rating of ≥1500 OR you have solved Codeforces Div 2 D in ≥3 contests, in or out of contest.
- All 6 labs in this phase are completed with all mastery criteria ticked.
- You can explain — out loud, in <60 seconds — what each of the 19 inline topics is, when to reach for it, and what its complexity is.
- You have a personal CP journal with ≥30 entries, each one linking the problem name to the named technique used.
- You have a stress-testing harness saved as a snippet in your editor and have used it to find a bug in your own code at least 5 times.
If any of these is missing — especially the contests and the journal — you have not exited this phase. Add 2 weeks and re-check.
Labs
| # | Lab | What It Builds |
|---|---|---|
| 1 | Modular Arithmetic | nCr mod p with factorial precomputation + modular inverse |
| 2 | Binary Exponentiation | a^b mod p and matrix exponentiation for Fibonacci |
| 3 | Sieve and Factorization | Count primes, sum of primes, SPF table for fast factorization |
| 4 | Sweep Line | The Skyline Problem (LC 218) via canonical sweep |
| 5 | Coordinate Compression | Counting inversions via Fenwick tree + compression |
| 6 | Stress Testing | Brute + candidate + random generator + comparator harness |
Lab 01 — Modular Arithmetic: nCr mod p With Precomputed Factorials
Goal
Master modular arithmetic and modular inverse by building a nCr mod p engine that answers any (n, r) query in O(1) after O(N) preprocessing, for n up to 10^7 and p = 10^9 + 7. By the end of this lab you can write the engine from blank in under 5 minutes.
Background
Counting problems modulo a prime are the single most common framing in competitive programming. The output line “print the answer modulo 10^9 + 7” appears on roughly 30% of CF Div 2/3 problems with combinatorial flavor. Behind that line is a fixed engine: precompute fact[i] = i! and inv_fact[i] = (i!)^(-1) modulo p, and C(n, r) = fact[n] · inv_fact[r] · inv_fact[n-r] mod p. Once you have the engine, dozens of problems collapse to “set up the formula, plug in, output”. The same machinery underlies probability problems where the answer is a/b mod p (output a · b^(-1) mod p).
Interview Context
Quant/HFT interviews use modular-counting problems as direct filters: “how many distinct length-k increasing sequences over [1, n], modulo 10^9 + 7?” If you can’t write the formula and the inverse-factorial trick fluently, you’ve failed in 10 minutes. FAANG L4 interviews almost never ask this directly, but a good L5+ candidate signals fluency by reaching for inv_fact[r] without explanation when a counting problem’s answer needs to be reduced. The signal interviewers want is “this candidate has CP background”; that signal is delivered by writing modular inverse without flinching.
Problem Statement
Given Q queries, each (n_i, r_i) with 0 ≤ r_i ≤ n_i ≤ N_max, output C(n_i, r_i) mod p for p = 10^9 + 7. The engine must support Q ≥ 10^6 queries in <2 seconds total.
LeetCode reference: LC 62 (Unique Paths) asks for C(m+n-2, m-1) directly (no mod needed). LC 920 (Number of Music Playlists) is a DP that uses modular arithmetic but not nCr directly. The pure CP framing appears on Codeforces (e.g., CF 1342E, CF 1342B).
Constraints
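- 1 ≤ N_max ≤ 10^7 (table size).
- 1 ≤ Q ≤ 10^6 (queries).
- 0 ≤ r ≤ n ≤ N_max (well-formed query).
- p = 10^9 + 7 (prime).
- Time limit: 2 seconds. Memory limit: 256 MB. Each table of 10^7 64-bit entries is ≈ 80 MB, so fact and inv_fact together take ≈ 160 MB; every entry is < p < 2^31, so 32-bit storage halves that.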
Clarifying Questions
- “Is p always prime?” — yes; the FLT-based inverse requires p prime. If composite, fall back to extgcd-based inverse and watch for non-invertible elements.
- “Are n and r always non-negative?” — yes; if r < 0 or r > n, return 0 by convention.
- “Do queries arrive online or can they be batched?” — for this lab, online (one at a time after preprocessing), O(1) each.
- “Is N_max known in advance?” — yes; we precompute up to N_max.
- “Should I support Lucas’s theorem for n larger than N_max?” — out of scope for this lab; see follow-up.
Examples
- C(5, 2) mod p = 10.
- C(10, 5) mod p = 252.
- C(0, 0) mod p = 1.
- C(1000000, 500000) mod (10^9+7) = a large nonzero value (engine must handle it).
- C(5, 7) mod p = 0 (well-formedness allows r > n only as boundary; we return 0).
Initial Brute Force
For each query, compute C(n, r) = n! / (r! · (n-r)!) using arbitrary-precision arithmetic (e.g., Python’s int), then mod p.
from math import factorial

def nCr_brute(n, r, p):
    if r < 0 or r > n:
        return 0
    return (factorial(n) // (factorial(r) * factorial(n - r))) % p
Brute Force Complexity
Per query, n! has Θ(n log n) bits, and building it by n successive multiplications costs on the order of n² log n bit operations with schoolbook arithmetic (faster big-integer algorithms help, but not enough). For n = 10^7 and Q = 10^6 this is infeasible by many orders of magnitude. Useful only as a stress oracle on n ≤ 30.
Optimization Path
- Brute force (above) — correctness baseline only.
- Precompute factorials, compute inverse on each query. fact[i] table built in O(N); each query computes inv(fact[r]) · inv(fact[n-r]) · fact[n], with inv() taking O(log p) via FLT. Total O(N + Q log p). Better, but each query runs two exponentiations of ~30 squarings each, so Q = 10^6 costs at least ~6·10^7 modular multiplications — passes but wasteful.
- Precompute inverse factorials, query in O(1). After fact[], compute inv_fact[N] = power(fact[N], p-2, p) once, then inv_fact[i] = inv_fact[i+1] · (i+1) mod p going backward. Total O(N + Q). The optimal solution.
The going-backward trick is the key insight: (i!)^(-1) = ((i+1)!)^(-1) · (i+1), because i! · (i+1) = (i+1)!. So one expensive power call plus a backward sweep gives all inverse factorials in O(N).
Final Expected Approach
- Precompute fact[0..N] with fact[0] = 1, fact[i] = fact[i-1] · i mod p.
- Compute inv_fact[N] = power(fact[N], p - 2, p) via binary exponentiation (FLT).
- Compute inv_fact[i] = inv_fact[i+1] · (i+1) mod p for i from N-1 down to 0.
- Each query: if r < 0 or r > n: return 0; else return fact[n] · inv_fact[r] · inv_fact[n-r] mod p.
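The four steps above could be sketched in Python as follows. The class name NCrEngine and its nCr method match the names used in this lab's Tests section; the rest (attribute names, the default modulus) is an assumption. The sketch relies on N < p and p prime, as in the constraints.

```python
class NCrEngine:
    """Factorial and inverse-factorial tables for C(n, r) mod a prime p."""

    def __init__(self, N, p=10**9 + 7):
        self.p = p
        # Step 1: fact[i] = i! mod p.
        self.fact = [1] * (N + 1)
        for i in range(1, N + 1):
            self.fact[i] = self.fact[i - 1] * i % p
        # Step 2: one Fermat inverse for the largest factorial.
        self.inv_fact = [1] * (N + 1)
        self.inv_fact[N] = pow(self.fact[N], p - 2, p)
        # Step 3: backward sweep, inv_fact[i] = inv_fact[i+1] * (i+1).
        for i in range(N - 1, -1, -1):
            self.inv_fact[i] = self.inv_fact[i + 1] * (i + 1) % p

    def nCr(self, n, r):
        # Step 4: guard out-of-range r, then apply the closed form.
        if r < 0 or r > n:
            return 0
        p = self.p
        return self.fact[n] * self.inv_fact[r] % p * self.inv_fact[n - r] % p

eng = NCrEngine(N=20)
print(eng.nCr(5, 2), eng.nCr(10, 5), eng.nCr(5, 7))  # 10 252 0
```

Python's three-argument pow plays the role of the power helper; in C++ or Java the same structure needs an explicit binary-exponentiation loop and 64-bit intermediates.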
Data Structures Used
- Two flat arrays of long long (or int64): fact[0..N] and inv_fact[0..N].
- One scalar p.
- A power(a, b, m) helper (binary exponentiation).
No heaps, no maps, no trees. The whole thing is two arrays and a closed-form formula.
Correctness Argument
FLT proof of inverse. Fermat’s Little Theorem states: for prime p and gcd(a, p) = 1, a^(p-1) ≡ 1 (mod p). Multiplying both sides by a^(-1): a^(p-2) ≡ a^(-1) (mod p). So power(a, p-2, p) is the modular inverse of a whenever a ≢ 0 (mod p). For a = fact[i] with i < p: i! is a product of integers in [1, p-1], none of which is divisible by p; since p is prime, p does not divide i!, so gcd(fact[i], p) = 1 and the inverse exists. (Here N_max ≤ 10^7 < p, so this covers every table entry.)
Backward inverse-factorial recurrence. We have inv_fact[i+1] = ((i+1)!)^(-1) and want inv_fact[i] = (i!)^(-1). Since (i+1)! = i! · (i+1), taking inverses: ((i+1)!)^(-1) = (i!)^(-1) · (i+1)^(-1), equivalently (i!)^(-1) = ((i+1)!)^(-1) · (i+1). So inv_fact[i] = inv_fact[i+1] · (i+1) mod p. Base case at i = N is given by the explicit FLT call.
nCr formula correctness. C(n, r) = n! / (r! · (n-r)!). In Z/pZ, division by x is multiplication by x^(-1). So C(n, r) ≡ fact[n] · inv_fact[r] · inv_fact[n-r] (mod p). ✓
Complexity
- Preprocess: O(N + log p) time (one power call, two linear sweeps), O(N) space.
- Each query: O(1) time, O(1) space.
- Total for N = 10^7 and Q = 10^6: ~10^7 + 10^6 mults, well under 2 seconds in C++/Java/Go; in Python use PyPy or numpy-accelerated arithmetic.
Implementation Requirements
- Use long long (C++) / int64 (Go) / long (Java) / int (Python). Cast operands before multiplying to avoid overflow: (long long)fact[n] * inv_fact[r] % p.
- power helper handles b = 0 returning 1 and m = 1 returning 0.
- Guard r < 0 || r > n returning 0.
- Mod once after every multiplication, not at the end.
Tests
def test_nCr():
    p = 10**9 + 7
    eng = NCrEngine(N=20, p=p)
    assert eng.nCr(5, 2) == 10
    assert eng.nCr(10, 5) == 252
    assert eng.nCr(0, 0) == 1
    assert eng.nCr(1, 0) == 1
    assert eng.nCr(1, 1) == 1
    assert eng.nCr(5, 7) == 0    # r > n
    assert eng.nCr(5, -1) == 0   # r < 0
    # Stress vs brute on small N:
    import random
    for _ in range(1000):
        n = random.randint(0, 20)
        r = random.randint(-2, 22)
        assert eng.nCr(n, r) == nCr_brute(n, r, p)
Edge cases: n = 0, r = 0, r = n, r > n, r < 0, n = N_max.
Follow-up Questions
- “What if p is not prime?” — FLT fails. Use extgcd-based inverse, but be aware that not every a is invertible (only those with gcd(a, p) = 1).
- “What if n is up to 10^18 but p is small (≤10^5)?” — Lucas’s theorem: write n, r in base p, multiply C(n_i, r_i) mod p digit-wise.
- “What if you need C(n, r) mod 4?” — neither FLT nor straightforward Lucas applies; use Kummer’s theorem or direct computation.
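For the Lucas follow-up, a minimal sketch under the stated assumption that p is a small prime. The function name `lucas_nCr` is hypothetical, and rebuilding the factorial table on every call is a simplification; a real engine would build it once.

```python
def lucas_nCr(n, r, p):
    """C(n, r) mod p for a small prime p, via Lucas's theorem."""
    if r < 0 or r > n:
        return 0
    # Factorials mod p for digits 0..p-1 (all nonzero mod p, since p is prime).
    fact = [1] * p
    for i in range(1, p):
        fact[i] = fact[i - 1] * i % p

    def small(a, b):
        """C(a, b) mod p for base-p digits 0 <= a, b < p."""
        if b > a:
            return 0
        denom = fact[b] * fact[a - b] % p
        return fact[a] * pow(denom, p - 2, p) % p  # Fermat inverse

    result = 1
    while n > 0 or r > 0:
        # Multiply the binomials of matching base-p digits.
        result = result * small(n % p, r % p) % p
        if result == 0:
            return 0
        n //= p
        r //= p
    return result

print(lucas_nCr(10, 3, 7))  # 1, since C(10, 3) = 120 = 17 * 7 + 1
```

The loop runs once per base-p digit of n, so even n near 10^18 needs only a handful of iterations after the O(p) table build.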
Product Extension
Probability/statistics services (e.g., AdTech bid pacing, fraud-risk scoring) compute combinatorial denominators on the fly. The factorial-precomputation engine is the production primitive: build it once at service startup, query it O(1) for the lifetime of the server. The same modular-inverse machinery underlies prime-field arithmetic in cryptographic libraries; general-purpose numerical libraries usually compute exact or floating-point binomials instead, so a mod-p engine is typically something you build yourself.
Language / Runtime Follow-ups
- Python: integers are arbitrary precision, so overflow isn’t a concern, but per-mult cost is ~5× C++. Use pow(a, b, m) (built-in O(log b) modular exponentiation). Precompute as numpy.int64 arrays only if N ≤ 10^6; otherwise plain lists.
- Java: use long everywhere; cast to (long) before multiplying. BigInteger.modPow works but is 10× slower than a hand-rolled loop.
- Go: int64 everywhere; math/big if you need extra safety. Hand-roll the loop for performance.
- C++: long long (int64_t), (long long)a * b % p. The product of two values < p ≈ 10^9 fits in long long (< 2^63). With unsigned overflow concerns, use unsigned long long and % p defensively.
- JS/TS: Number is double-precision and loses integer precision above 2^53. Use BigInt, but it’s ~10× slower; avoid for hot loops larger than 10^6 ops.
Common Bugs
- Forgetting to mod after every multiplication; silent overflow.
- Using ** (exponentiation) where the language doesn’t have a modular form; computes a 5MB number first, then mods. Use pow(a, b, m) (Python) or your own loop.
- Computing inv_fact[i] directly via power(fact[i], p-2, p) for every i. Correct but O(N log p); the backward sweep is O(N).
- Off-by-one: inv_fact has N+1 entries (inv_fact[0..N]); allocate accordingly.
- Returning 1 for C(n, -1) or C(n, n+1) instead of 0. Always guard.
Debugging Strategy
If nCr(n, r) disagrees with brute on small cases:
- Print fact[0..n] and confirm fact[i] = i! for i ≤ 10.
- Print inv_fact[0..n] and confirm fact[i] · inv_fact[i] ≡ 1 (mod p) for each i.
- If step 2 fails: the FLT exponent is wrong. Check power(fact[N], p-2, p), not p-1.
- If step 2 passes but nCr is wrong: the formula is fact[n] · inv_fact[r] · inv_fact[n-r]. Check you’re not using fact[n-r] (without inv_).
Mastery Criteria
- Write the engine (factorial table + inverse-factorial backward sweep + query) from blank in <5 minutes.
- Articulate why power(fact[N], p-2, p) gives inv_fact[N] in one sentence (FLT).
- Articulate why inv_fact[i] = inv_fact[i+1] · (i+1) in one sentence (telescope).
- Recognize “answer mod prime” + “counting / paths / arrangements” as the trigger for this engine within 60 seconds of reading a problem.
- Switch to extgcd-based inverse if asked “what if p is not prime?”
- State Lucas’s theorem on demand and explain when it’s needed.
Lab 02 — Binary Exponentiation and Matrix Exponentiation for Fibonacci
Goal
Master O(log b) exponentiation in two settings: integer fast power (a^b mod p) and matrix fast power (Fibonacci F(n) mod p for n up to 10^18). By the end, you can write integer fast power in <2 minutes and matrix Fibonacci in <10 minutes from blank.
Background
Binary exponentiation is the engine behind almost every “compute a^b for huge b” subroutine in CP and cryptography. The same divide-and-conquer pattern — a^b = (a^(b/2))^2 if b even, else a · a^(b-1) — generalizes from integers to any associative operation: matrix multiplication (linear recurrences), polynomial multiplication (signal processing), function composition (rotation by an angle, repeated f(f(f(...)))). Internalize the O(log b) skeleton once and you get all these for free.
Matrix exponentiation is the canonical extension. Fibonacci, defined F(n) = F(n-1) + F(n-2), has the matrix form [F(n+1), F(n)]^T = M · [F(n), F(n-1)]^T where M = [[1, 1], [1, 0]]. Therefore [F(n+1), F(n)]^T = M^n · [F(1), F(0)]^T = M^n · [1, 0]^T. Computing M^n takes O(log n) matrix multiplications, each O(2^3) = O(8) operations. Total: O(log n) for n = 10^18, about 60 multiplications.
Interview Context
Quant interviews use exactly this pair. “Compute 2^(10^18) mod (10^9 + 7)” is a 2-minute warm-up. “Compute F(10^18) mod (10^9 + 7)” is the follow-up that filters out candidates who only know the iterative O(N) Fibonacci. A candidate who reaches for matrix exponentiation reflexively when seeing n ≤ 10^18 and a linear recurrence is signalling 1900+ CF rating, which is a strong positive signal at HFT firms.
Problem Statement
Part 1. Implement power(a, b, m) returning a^b mod m for 0 ≤ a < m, 0 ≤ b ≤ 10^18, 1 ≤ m ≤ 10^9 + 7.
Part 2. Implement fib(n, p) returning F(n) mod p for 0 ≤ n ≤ 10^18, p = 10^9 + 7, where F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2).
LeetCode reference: LC 50 (Pow(x, n)) — Part 1 in real-number form. LC 1137 (N-th Tribonacci) — Part 2 with three-term recurrence (analogous matrix form).
Constraints
- Part 1: b up to 10^18, so naive O(b) looping is impossible.
- Part 2: n up to 10^18, so naive O(n) Fibonacci is impossible.
- Time limit: 2 seconds.
- Memory: O(1) for Part 1, O(1) for Part 2 (matrices are 2×2).
Clarifying Questions
- “Negative b for integer power?” — for modular a^b, undefined unless gcd(a, m) = 1, in which case it means (a^(-1))^|b|. For real a^b (LC 50), return 1 / a^|b| and watch for INT_MIN overflow on -b.
- “What’s 0^0?” — by convention 1.
- “Fibonacci indexing — is F(1) = 1 or F(2) = 1?” — confirm; standard CP convention is F(0) = 0, F(1) = 1.
- “Matrix exponentiation modulus — same p everywhere?” — yes.
- “Required output format for matrices?” — only the scalar Fibonacci value, but you might be asked to return the full state vector.
Examples
- power(2, 10, 1000) = 24 (2^10 = 1024).
- power(3, 0, 7) = 1.
- power(5, 1000000000000000000, 10^9 + 7) = some specific value (must compute).
- fib(0, p) = 0, fib(1, p) = 1, fib(10, p) = 55.
- fib(10^18, 10^9 + 7) = a specific value the engine must produce.
Initial Brute Force
Part 1: result = 1; for i in 1..b: result = result * a % m. O(b).
Part 2: iterative two-variable Fibonacci. O(n).
def power_brute(a, b, m):
    result = 1 % m
    for _ in range(b):
        result = result * a % m
    return result

def fib_brute(n, p):
    if n == 0:
        return 0
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, (a + b) % p
    return b
Brute Force Complexity
Part 1: O(b) mults; with b = 10^18 and a 2-second limit that would require ~5·10^17 mults/sec — impossible. Part 2: same. Useful only as oracles on n ≤ 30.
Optimization Path
Part 1.
- Naive O(b) (above).
- Recursive O(log b): power(a, b) = power(a, b/2)² if b even, else a · power(a, b-1).
- Iterative O(log b): process bits of b low-to-high, square a each iteration, multiply into result when the current bit of b is 1. Preferred for stack safety.
Part 2.
- Naive O(n) (above).
- Memoized doubling: F(2k) = F(k) · (2 · F(k+1) − F(k)), F(2k+1) = F(k)² + F(k+1)². O(log n) recursion. Beautiful but error-prone.
- Matrix exponentiation: build M = [[1,1],[1,0]], compute M^n via integer fast power lifted to matrices, extract M^n[0][1] as F(n). O(log n) matrix products at 8 scalar multiplications each: for n = 10^18 that is about 60 squarings (≈ 480 multiplications) plus at most as many again for the set bits. The general-purpose technique that works for any linear recurrence.
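The doubling identities in the list above can be sketched in Python as follows. This is a minimal illustration, not the lab's reference solution: it walks the bits of n from the most significant end instead of using a memoized recursion, and the name fib_fast_doubling is chosen only for this example.

```python
def fib_fast_doubling(n, p):
    """Return F(n) mod p using F(2k) and F(2k+1) doubling steps."""
    # Invariant: (a, b) == (F(k), F(k+1)) mod p, where k is the prefix of
    # n's binary representation consumed so far (k = 0 initially).
    a, b = 0, 1 % p
    for bit in bin(n)[2:]:  # most significant bit first
        c = a * ((2 * b - a) % p) % p  # F(2k)   = F(k) * (2*F(k+1) - F(k))
        d = (a * a + b * b) % p        # F(2k+1) = F(k)^2 + F(k+1)^2
        if bit == "1":
            a, b = d, (c + d) % p      # advance to k' = 2k + 1
        else:
            a, b = c, d                # advance to k' = 2k
    return a
```

Each bit costs a constant number of multiplications, so the loop runs about 60 times for n = 10^18. Cross-checking it against the O(n) brute force on small n is a cheap way to catch sign or index slips in the identities.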
Final Expected Approach
Part 1 (iterative):
long long power(long long a, long long b, long long m) {
    long long res = 1 % m;
    a %= m;
    while (b > 0) {
        if (b & 1) res = res * a % m;
        a = a * a % m;
        b >>= 1;
    }
    return res;
}
Part 2 (matrix):
#include <vector>
using namespace std;

typedef vector<vector<long long>> Mat;
const long long P = 1e9 + 7;

// C = A * B (mod P) for square matrices; every product is reduced immediately.
Mat matmul(const Mat &A, const Mat &B) {
    int n = A.size();
    Mat C(n, vector<long long>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            if (A[i][k])
                for (int j = 0; j < n; ++j)
                    C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % P;
    return C;
}

// M^e (mod P) by binary exponentiation, starting from the identity.
Mat matpow(Mat M, long long e) {
    int n = M.size();
    Mat res(n, vector<long long>(n, 0));
    for (int i = 0; i < n; ++i) res[i][i] = 1; // identity
    while (e > 0) {
        if (e & 1) res = matmul(res, M);
        M = matmul(M, M);
        e >>= 1;
    }
    return res;
}

// F(n) mod P, read from the off-diagonal entry of M^n.
long long fib(long long n) {
    if (n == 0) return 0;
    Mat M = {{1, 1}, {1, 0}};
    Mat R = matpow(M, n);
    return R[0][1];
}
Data Structures Used
- Part 1: scalars only.
- Part 2: 2×2 matrices as vector<vector<long long>> or array<array<long long, 2>, 2>.
Correctness Argument
Part 1 (binary exponentiation invariant). Let b = sum b_i 2^i be the binary representation of the original exponent. After k iterations, the variable a equals a_initial^(2^k) mod m, and res equals the product of a_initial^(2^i) mod m for all i < k with b_i = 1. After all iterations, res = a_initial^b mod m. The invariant proves correctness; termination is b → 0 after floor(log₂ b) + 1 iterations.
Part 2 (matrix recurrence). Define the column vector v(k) = [F(k+1), F(k)]^T. Then M · v(k) = [[1,1],[1,0]] · [F(k+1), F(k)]^T = [F(k+1) + F(k), F(k+1)]^T = [F(k+2), F(k+1)]^T = v(k+1). By induction, v(n) = M^n · v(0) = M^n · [1, 0]^T, so F(n) = v(n)[1] = (M^n · [1, 0]^T)[1] = M^n[1][0]. Equivalently, M^n[0][1] = F(n) (by symmetry of the Fibonacci matrix). Matrix multiplication is associative, so binary exponentiation lifts directly: same invariant, just with matrices.
Complexity
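- Part 1: O(log b) time, O(1) space.
- Part 2: O(K^3 log n) time for K × K matrices (K = 2 for Fibonacci → 8 multiplications per matrix product; for n = 10^18 about 60 squarings ≈ 480 multiplications, plus at most as many again for set bits), O(K^2) space.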
Implementation Requirements
- Reduce every multiplication mod p immediately.
- Initialize the result of matpow to the identity matrix.
- Handle n = 0 explicitly. M^0 is the identity, so M^0[0][1] = 0 already yields F(0) = 0, but an early return keeps the corner case visible.
- Use long long (or int64); intermediate products are up to (10^9)² ≈ 10^18, just fitting long long.
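The Python tests in this lab call power(a, b, m) and fib(n, p), while the reference code is C++. A hedged Python port of the same two engines might look like the sketch below; the helper names mat_mul and mat_pow are assumptions made for this example, and p is passed explicitly instead of living in a global constant.

```python
def power(a, b, m):
    """a^b mod m by iterative binary exponentiation."""
    res = 1 % m          # 1 % m so that m == 1 yields 0
    a %= m
    while b > 0:
        if b & 1:        # current low bit set: multiply it into the result
            res = res * a % m
        a = a * a % m    # a now holds a_initial^(2^(k+1))
        b >>= 1
    return res


def mat_mul(A, B, p):
    """Product of two square matrices, entries reduced mod p."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def mat_pow(M, e, p):
    """M^e mod p, starting from the identity matrix."""
    n = len(M)
    R = [[(1 if i == j else 0) % p for j in range(n)] for i in range(n)]
    while e > 0:
        if e & 1:
            R = mat_mul(R, M, p)
        M = mat_mul(M, M, p)
        e >>= 1
    return R


def fib(n, p):
    """F(n) mod p, read from entry [0][1] of [[1,1],[1,0]]^n."""
    return mat_pow([[1, 1], [1, 0]], n, p)[0][1]
```

Python integers never overflow, so the long long caveat does not apply here; the modular reductions only keep the numbers small and fast.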
Tests
def test_power():
    assert power(2, 10, 1000) == 24
    assert power(3, 0, 7) == 1
    assert power(0, 0, 7) == 1
    assert power(0, 5, 7) == 0
    # m = 10**18 exceeds the lab's bound on m: fine with Python ints,
    # but it would overflow the C++ long long version.
    assert power(2, 63, 10**18) == 2**63 % 10**18

def test_fib():
    p = 10**9 + 7
    expected = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    for i, e in enumerate(expected):
        assert fib(i, p) == e
    # Stress vs brute up to N = 30
    for n in range(31):
        assert fib(n, p) == fib_brute(n, p)
    # Spot-check large
    assert fib(10**18, p) == 209783453  # known value, mod 1e9+7
Follow-up Questions
- “Compute F(n) for general linear recurrence f(n) = c1 f(n-1) + ... + ck f(n-k)?” Same matrix exponentiation, with a k×k companion matrix.
- “Compute the number of paths of length L between two nodes in a graph?” (adj_matrix)^L, indexed by start/end. Same O(V^3 log L) engine.
- “Compute Tribonacci T(n) in O(log n)?” 3×3 companion matrix, otherwise identical.
- “Avoid O(K^3) per multiplication for huge K?” Research topics: Kitamasa’s algorithm reduces to O(K^2 log N); FFT-based polynomial multiplication reduces further. Out of scope here.
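The companion-matrix generalization from the first follow-up can be sketched as below. This is an illustrative Python version under stated assumptions: the function names companion_matrix and linear_recurrence, and the convention that init holds f(0) through f(k-1), are invented for this example.

```python
def mat_mul(A, B, p):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def mat_pow(M, e, p):
    n = len(M)
    R = [[(1 if i == j else 0) % p for j in range(n)] for i in range(n)]
    while e > 0:
        if e & 1:
            R = mat_mul(R, M, p)
        M = mat_mul(M, M, p)
        e >>= 1
    return R


def companion_matrix(coeffs):
    """coeffs = [c1, ..., ck] for f(n) = c1*f(n-1) + ... + ck*f(n-k)."""
    k = len(coeffs)
    M = [[0] * k for _ in range(k)]
    M[0] = list(coeffs)       # first row produces the new term
    for i in range(1, k):
        M[i][i - 1] = 1       # remaining rows shift the state down by one
    return M


def linear_recurrence(coeffs, init, n, p):
    """Return f(n) mod p, where init = [f(0), ..., f(k-1)]."""
    k = len(coeffs)
    if n < k:
        return init[n] % p
    # State vector v(0) = [f(k-1), ..., f(0)]^T; M^(n-k+1) * v(0) has f(n) on top.
    M = mat_pow(companion_matrix(coeffs), n - (k - 1), p)
    return sum(M[0][j] * init[k - 1 - j] for j in range(k)) % p
```

For example, Tribonacci with T(0) = 0, T(1) = T(2) = 1 is linear_recurrence([1, 1, 1], [0, 1, 1], n, p), and Fibonacci is linear_recurrence([1, 1], [0, 1], n, p).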
Product Extension
Cryptography (RSA encryption: compute m^e mod n with e, n 2048-bit, in milliseconds — same algorithm, with bignum). Computer graphics (rotation matrix R^n for repeated rotations). Markov chain steady-state approximation (P^n for stochastic matrix P, large n). Network reachability (adj^L for paths of length L).
Language / Runtime Follow-ups
- Python: built-in pow(a, b, m) is O(log b) and roughly 10× faster than a hand-rolled Python loop. For matrix power: hand-roll the multiplication; numpy is overkill at K = 2 (FFI overhead exceeds compute).
- Java: BigInteger.modPow exists but is roughly 10× slower than a hand-rolled long loop. Use the hand-rolled version unless values exceed long.
- Go: math/big.Int.Exp(a, b, m) is correct but slow; hand-roll for hot paths. Matrix: use 2D [][]int64 arrays, not math/big.
- C++: __int128 for intermediate products if m exceeds ~3·10^9 (where (long long)² mod p overflows). For standard m = 10^9 + 7, plain long long suffices.
- JS/TS: BigInt is correct but slow. Number is exact only below 2^53, so a plain a * b % m is safe only while m² < 2^53 (m below about 9·10^7); for m up to 2^32 you need a floor-division trick (Math.floor after * a / m), which is easy to get wrong. Matrix: same caveat.
Common Bugs
- Forgetting res = 1 % m at start (returns 1 instead of 0 when m = 1).
- Squaring a before checking the bit: bit k then multiplies in a^(2^(k+1)), so the result is a^(2b) instead of a^b. Check the bit first, then square; the one extra squaring after the final bit is harmless.
- Matrix multiplication order: M^n = M · M · ... · M, but matrix products do not commute in general, so be deliberate about left/right.
- Using int instead of long long for matrix entries; (10^9)² > 2^31.
- Recursive power blowing the stack on b = 10^18: the halving recursion is only about 60 to 120 frames deep, which is fine; recursing on b − 1 at every step (depth b) is what overflows.
Debugging Strategy
If power(a, b, m) is wrong:
- Test on small cases (b ≤ 5) where you can verify by hand.
- Print binary representation of b and confirm the bits are processed.
- If correct on small b but wrong on large: overflow. Check that a * a % m uses long long, not int.
If fib(n, p) is wrong:
- Verify M^1 = M, M^2 = [[2,1],[1,1]], M^3 = [[3,2],[2,1]]. Each entry is a Fibonacci number.
- Verify fib(0..10) matches the canonical sequence.
- If small n works but large doesn’t: same overflow check.
Mastery Criteria
- Write power(a, b, m) from blank in <2 minutes.
- Write the Fibonacci matrix-exponentiation engine from blank in <10 minutes.
- Articulate the binary-exponentiation invariant in one sentence.
- Articulate the matrix-Fibonacci recurrence (v(n+1) = M · v(n)) in 30 seconds.
- Generalize to any linear recurrence: given the recurrence, write the companion matrix in <2 minutes.
- Recognize “n up to 10^18 + linear recurrence” as the trigger for matrix exponentiation in <60 seconds.
Lab 03 — Sieve of Eratosthenes and Smallest-Prime-Factor Factorization
Goal
Master prime enumeration and integer factorization at competitive scale: count primes up to N = 5 · 10^6 in under 100ms, and factorize Q = 10^5 integers ≤ N in O(log n) each via a precomputed smallest-prime-factor (SPF) table. By the end, you can write the linear sieve from blank in <5 minutes.
Background
Many CP problems reduce to “is x prime?”, “what are the prime factors of x?”, or “how many primes ≤ N?”. For N up to a few million, a sieve answers all three in O(N log log N) (Eratosthenes) or O(N) (Euler/linear sieve), and the SPF byproduct lets you factorize any x ≤ N in O(log x) time. For larger N (up to 10^9), you need Miller-Rabin primality tests and Pollard’s rho for factorization (out of scope here).
The sieve is the single most reused primitive across number-theoretic CP problems. Get this fluent and you save 5–10 minutes per problem.
Interview Context
LC 204 (Count Primes) is the canonical screen at FAANG mid-level. Quant interviews ramp it up: “factorize each of 10^5 numbers up to 10^7”. Both questions test the same primitive, but at different scales — the scale forces the candidate to pick the right data structure (bitset vs vector<bool> vs SPF table). A candidate who reaches for SPF without prompting signals CP fluency.
Problem Statement
Three sub-problems on the same engine:
- Count primes ≤ N. Based on LC 204, which counts primes strictly less than n; this lab uses the inclusive bound. N ≤ 5 · 10^6.
- Sum of primes ≤ N. Variant: instead of count, return sum mod 10^9 + 7.
- Factorize Q numbers ≤ N. For each x_i, output its multiset of prime factors. Q ≤ 10^5, x_i ≤ N.
Constraints
- N ≤ 5 · 10^6 for sub-problems 1 and 2.
- N ≤ 10^7, Q ≤ 10^5 for sub-problem 3.
- Time limit: 1 second.
- Memory limit: 256 MB. SPF as an int array uses 4 · 10^7 bytes = 40 MB; fine.
Clarifying Questions
- “Are 0 and 1 prime?” Neither. Standard convention.
- “Factorization output: ordered or multiset?” Multiset, e.g., 12 → [2, 2, 3].
- “Are inputs guaranteed ≤ N?” Yes for this lab. Otherwise switch to trial division O(√x) or Pollard’s rho.
- “Need primes only or all factors?” Primes only here (canonical). Divisors are a separate problem.
- “Single-threaded?” Yes. Sieves don’t parallelize trivially without contention.
Examples
- Sub-problem 1: count_primes(10) = 4 (2, 3, 5, 7). count_primes(2) = 1. count_primes(1) = 0.
- Sub-problem 2: sum_primes(10) = 17.
- Sub-problem 3: factorize(12) = [2, 2, 3]. factorize(1) = []. factorize(7) = [7]. factorize(60) = [2, 2, 3, 5].
Initial Brute Force
For sub-problem 1: for each x in [2, N], run trial division up to √x.
def is_prime(x):
    if x < 2: return False
    for d in range(2, int(x**0.5) + 1):
        if x % d == 0: return False
    return True

def count_primes_brute(N):
    # Inclusive upper bound: this lab counts primes <= N, so count_primes_brute(2) == 1.
    return sum(1 for x in range(2, N + 1) if is_prime(x))
For sub-problem 3: trial division of each x_i.
def factorize_brute(x):
    out = []
    d = 2
    while d * d <= x:
        while x % d == 0:
            out.append(d); x //= d
        d += 1
    if x > 1: out.append(x)
    return out
Brute Force Complexity
Sub-problem 1 trial division: O(N √N). For N = 5·10^6, that’s ≈ 10^10 ops — impossible in 1 second.
Sub-problem 3 trial division per query: O(√x). For Q = 10^5 and x = 10^7, that’s ~3·10^8 ops — borderline. Sieve-based factorization is O(log x) ≈ 24 ops, ~2.4·10^6 total — comfortably faster.
Optimization Path
- Trial division (above).
- Sieve of Eratosthenes for sub-problems 1, 2: mark composites; remaining are primes. O(N log log N).
- Linear (Euler) sieve + SPF table: each composite is crossed off exactly once via its smallest prime factor; the SPF byproduct enables O(log x) factorization. O(N) preprocessing.
For interviews, Eratosthenes is fine; for CP the linear sieve is the default once you’ve drilled it.
Final Expected Approach
Sieve of Eratosthenes (sub-problems 1, 2):
vector<bool> sieve(int N) {
    vector<bool> is_prime(N + 1, true);
    is_prime[0] = false;
    if (N >= 1) is_prime[1] = false; // guard: for N = 0 the array has a single entry
    for (int i = 2; (long long)i * i <= N; ++i)
        if (is_prime[i])
            for (int j = i * i; j <= N; j += i)
                is_prime[j] = false;
    return is_prime;
}
Linear sieve with SPF table (sub-problem 3):
vector<int> spf; // smallest prime factor
vector<int> primes;

void linear_sieve(int N) {
    spf.assign(N + 1, 0);
    primes.clear();
    for (int i = 2; i <= N; ++i) {
        if (spf[i] == 0) { spf[i] = i; primes.push_back(i); }
        for (int p : primes) {
            if ((long long)p * i > N || p > spf[i]) break;
            spf[p * i] = p;
        }
    }
}

vector<int> factorize(int x) {
    vector<int> out;
    while (x > 1) { out.push_back(spf[x]); x /= spf[x]; }
    return out;
}
Data Structures Used
- vector<bool> for the sieve mark array (already bit-packed; std::bitset is the fixed-size alternative).
- vector<int> for the SPF table (one int per index up to N).
- vector<int> for the prime list.
A bit-packed array is 8× denser than a byte-per-flag array such as vector<char>; for very large N (5 · 10^7) it matters.
Correctness Argument
Eratosthenes correctness. Inductive claim: when iteration i starts, every multiple m ≥ p² of every prime p < i has been marked composite. If is_prime[i] is still true, no prime p with p² ≤ i divides i, so i has no proper divisor greater than 1 and is prime. We then mark i², i² + i, i² + 2i, ... as composite; multiples of i below i² have a prime factor smaller than i and were already marked. The outer loop can stop at √N because every composite n ≤ N has a prime factor ≤ √n ≤ √N. By induction, the array is fully correct after the loop terminates.
Linear sieve correctness. Each composite n is marked exactly once, when i = n / spf(n) and the inner loop reaches p = spf(n). The loop guard p > spf[i] ensures p ≤ spf[i], so p · i has smallest prime factor p (because p ≤ spf[i] ≤ all other prime factors of i). So the assignment spf[p * i] = p is correct. Each composite has a unique (i, p) pair, hence is visited exactly once → O(N) total work.
Factorization correctness. While x > 1, spf[x] is x’s smallest prime factor, so emit it and divide. Every iteration strictly decreases x by a factor of at least 2, so loop runs ≤ log₂ x times.
Complexity
- Eratosthenes: O(N log log N) time, O(N / 8) space with bitset.
- Linear sieve + SPF: O(N) time, O(N) space.
- Factorize each query: O(log x).
- Total for sub-problem 3: O(N + Q log x).
Implementation Requirements
- Use (long long)i * i to compute i² to avoid overflow at large N.
- Sieve inner loop starts at i², not 2i; multiples below i² are already marked.
- For sum-of-primes mod p, mod the running sum after each addition.
- SPF allocation: N + 1 entries (for index N).
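For sub-problems 1 and 2, a Python sketch of the Eratosthenes sieve that follows the requirements above could look like this. It uses a bytearray with slice assignment to clear multiples; the function names sieve, count_primes and sum_primes are assumptions chosen to line up with the lab's examples.

```python
def sieve(n):
    """Return a bytearray flags with flags[x] == 1 iff x is prime (0 <= x <= n)."""
    flags = bytearray([1]) * (n + 1)
    for x in range(min(2, n + 1)):   # 0 and 1 are not prime
        flags[x] = 0
    i = 2
    while i * i <= n:
        if flags[i]:
            # Clear i*i, i*i + i, ... in one slice assignment.
            flags[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
        i += 1
    return flags


def count_primes(n):
    """Number of primes <= n (inclusive bound, unlike LC 204)."""
    return sum(sieve(n))


def sum_primes(n, p=10**9 + 7):
    """Sum of primes <= n, reduced mod p after every addition."""
    flags = sieve(n)
    total = 0
    for x in range(2, n + 1):
        if flags[x]:
            total = (total + x) % p
    return total
```

Python integers do not overflow, so the (long long) cast has no counterpart here; the inner loop still starts at i² for the same reason as in the C++ version.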
Tests
def test_count_primes():
    assert count_primes(10) == 4
    assert count_primes(2) == 1
    assert count_primes(1) == 0
    assert count_primes(100) == 25
    # Stress vs brute up to N = 1000
    for n in range(2, 1001):
        assert count_primes(n) == count_primes_brute(n)

def test_factorize():
    linear_sieve(1000)
    assert factorize(12) == [2, 2, 3]
    assert factorize(7) == [7]
    assert factorize(1) == []
    assert factorize(60) == [2, 2, 3, 5]
    # Stress vs brute
    for x in range(1, 1001):
        assert sorted(factorize(x)) == sorted(factorize_brute(x))
Edge cases: N = 0, N = 1, N = 2, prime N, prime power N.
Follow-up Questions
- “What if N = 10^9?” A full SPF table is out of reach (about 4 GB as int), and a plain sieve no longer fits the time and memory limits. Use Miller-Rabin for primality, Pollard’s rho for factorization.
- “What if N = 10^11 and you need only the count?” Meissel-Lehmer method, O(N^(2/3)). Out of scope here.
- “How would you parallelize the sieve?” Segmented sieve: split [2, N] into blocks of size √N, sieve each block independently after computing primes ≤ √N. Each block fits in cache; threads work on disjoint blocks.
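The segmented sieve from the last follow-up can be sketched single-threaded in Python as below. The name count_primes_segmented and the fixed block parameter (instead of exactly √N) are assumptions for this example; each block depends only on the base primes, which is what makes the blocks independent.

```python
from math import isqrt


def count_primes_segmented(n, block=1 << 15):
    """Count primes <= n by sieving [2, n] one block at a time."""
    if n < 2:
        return 0
    limit = isqrt(n)
    # Step 1: plain sieve for the base primes <= sqrt(n).
    base = bytearray([1]) * (limit + 1)
    base[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if base[i]:
            base[i * i :: i] = bytes(len(range(i * i, limit + 1, i)))
    base_primes = [i for i in range(2, limit + 1) if base[i]]

    # Step 2: sieve each block [lo, hi] using only the base primes.
    count = 0
    for lo in range(2, n + 1, block):
        hi = min(lo + block - 1, n)
        seg = bytearray([1]) * (hi - lo + 1)
        for p in base_primes:
            if p * p > hi:
                break
            # First multiple of p inside the block, but never below p*p.
            start = max(p * p, (lo + p - 1) // p * p)
            seg[start - lo :: p] = bytes(len(range(start, hi + 1, p)))
        count += sum(seg)
    return count
```

Memory is one block plus the base sieve, so the working set stays cache-sized regardless of n. A parallel version would hand disjoint blocks to separate workers and add up their counts.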
Product Extension
Cryptography service: precompute small primes for trial division as a Miller-Rabin pre-filter. Number-theoretic libraries (GMP, FLINT) cache a small-prime table at startup. RSA key generation typically trial-divides each candidate by a table of small primes before falling back to Miller-Rabin, which rejects most composites cheaply.
Language / Runtime Follow-ups
- Python: plain Python list of bools is slow; use bytearray (8× faster than list-of-bool). For N = 5·10^6, a bytearray sieve runs in ~0.3 s; PyPy / Cython get to ~50 ms.
- Java: BitSet is more memory-efficient than boolean[] and ~equally fast.
- Go: plain []bool is fine; a built-in bitset doesn’t exist (write your own with []uint64).
- C++: vector<bool> is bit-packed by default; use bitset<N+1> for stack-allocated speed at compile-time-fixed N.
- JS/TS: Uint8Array for the sieve. Number integer math is exact below 2^53.
Common Bugs
- Sieve inner loop starting at 2i instead of i²: correct but slower.
- Marking is_prime[0] and is_prime[1] as true.
- Off-by-one: allocating N entries instead of N + 1.
- In linear sieve, missing the p > spf[i] break condition: each composite is then written once per distinct prime factor, so the O(N) guarantee is lost.
- Computing i * i as int and overflowing for i > ~46340.
- For factorization output, dividing x by spf[x] but forgetting to also push spf[x].
Debugging Strategy
If sieve is wrong:
- Print primes ≤ 30 from your sieve. Should be [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]. If 1 is included or a composite such as 9 slips through, check the initialization and the inner loop start.
If factorize is wrong:
- Print spf[2..30]. Should be [2,3,2,5,2,7,2,3,2,11,2,13,2,3,2,17,...] (smallest prime factor of each index).
- If SPF correct but factorization wrong: the loop should emit spf[x] and divide; common bug is doing one or the other.
Mastery Criteria
- Write Sieve of Eratosthenes from blank in <3 minutes.
- Write linear sieve with SPF from blank in <5 minutes.
- Articulate why each composite is marked exactly once in linear sieve.
- Use SPF to factorize an integer in <30 seconds.
- Recognize N ≤ 10^7 + factor-related question as the sieve trigger within 30 seconds.
- State the alternative when N > 10^9 (Miller-Rabin + Pollard’s rho).
Lab 04 — Sweep Line: The Skyline Problem
Goal
Master the sweep-line paradigm by solving the canonical Skyline Problem (LC 218). Process N rectangles by sweeping left-to-right over event points, maintaining a set of active heights, and emitting key points when the maximum height changes. By the end, you can write the full algorithm from blank in <15 minutes.
Background
Sweep line is a meta-technique: convert geometric or interval problems on N objects into a sequence of 2N events (one when an object enters, one when it leaves), sort the events, then process them in order while maintaining a dynamic data structure (multiset, segment tree, BIT) that summarizes the currently active set. The cost shifts from “compare every pair” O(N²) to “sort + log-time updates” O(N log N).
The skyline problem is the canonical sweep-line interview question because it forces every component: event extraction, event sorting (with non-obvious tie-breaking), a dynamic max query, and careful output deduplication. Internalize this and you can derive sweep-line variants for rectangle area unions, interval intersection counting, point-in-rectangle queries, and convex hull (Andrew’s monotone chain).
Interview Context
LC 218 (Hard) — by far the most-asked Hard at FAANG mid-level interviews when the panel wants to test sweep line. The bar is very high: a passing solution must be O(N log N), must handle ties correctly, must deduplicate output, and must not emit phantom key points. A candidate who hand-waves through tie-breaking will be rejected even with otherwise-correct code. Quant interviews use rectangle-union-area, which is the same engine with an integral instead of a max.
Problem Statement
Given N buildings, each described by [left_i, right_i, height_i], return the skyline outline as a list of key points [x, y], where y is the height of the skyline at x and key points appear only when the height changes. The last key point has y = 0 (where the rightmost building ends).
LeetCode reference: LC 218 (The Skyline Problem).
Constraints
- 1 ≤ N ≤ 10^4 (LC), realistically up to 10^5 for harder variants.
- 0 ≤ left_i < right_i ≤ 2^31 − 1.
- 1 ≤ height_i ≤ 2^31 − 1.
- Output is sorted by x. No two consecutive key points have the same y.
Clarifying Questions
- “Are buildings axis-aligned and non-rotated?” Yes (skyline assumption).
- “Can buildings overlap?” Yes, freely.
- “Is the ground at y = 0?” Yes; the skyline ends with [x_max, 0].
- “Output format: key points where height changes, or every event point?” Only changes; consecutive duplicates are bugs.
- “Tie-breaking for events at the same x?” Opens before closes (the building exists at that exact x, contributing its height).
Examples
- Input: [[2, 9, 10], [3, 7, 15], [5, 12, 12], [15, 20, 10], [19, 24, 8]].
- Output: [[2, 10], [3, 15], [7, 12], [12, 0], [15, 10], [20, 8], [24, 0]].
Walking through: at x = 2, height jumps from 0 to 10 → emit [2, 10]. At x = 3, height jumps to 15 → emit [3, 15]. At x = 7, building of height 15 ends, max drops to 12 → emit [7, 12]. At x = 12, last “left-cluster” building ends → emit [12, 0]. Then the right cluster starts at 15 ([15, 10]), 19 doesn’t change max (still 10 since 8 < 10), 20 ends 10-building → emit [20, 8], 24 ends 8-building → emit [24, 0].
Initial Brute Force
For each x from 0 to x_max, compute the max height over all buildings covering x. Emit [x, h] whenever h changes.
def skyline_brute(buildings):
    x_max = max(b[1] for b in buildings)
    out = []
    prev_h = 0
    for x in range(x_max + 1):
        h = max((b[2] for b in buildings if b[0] <= x < b[1]), default=0)
        if h != prev_h:
            out.append([x, h])
            prev_h = h
    return out
Brute Force Complexity
O(x_max · N). For coordinates up to 2^31, infeasible. Even on small CP-scale x_max = 10^9, no chance. Useful only as oracle on x_max ≤ 100.
Optimization Path
- Brute (above).
- Event sweep with sorted multiset. Generate 2N events: (left, -h, OPEN) and (right, h, CLOSE). Sort. Sweep, maintaining a multiset of active heights. Emit [x, max_active] when max changes. O(N log N).
- Event sweep with a max-heap and lazy deletion. Same idea, but the heap doesn’t natively support deletion; instead, store (height, end_time) pairs and pop stale entries lazily. Sometimes faster constant factor than multiset. O(N log N).
- Divide and conquer. Split buildings into two halves, solve recursively, merge skylines (similar to merge sort). O(N log N).
The multiset approach is the cleanest in C++/Java; Python defaults to the heap-with-lazy-deletion style.
Final Expected Approach
Heap with lazy deletion (Python idiom):
import heapq

def get_skyline(buildings):
    events = []
    for L, R, H in buildings:
        events.append((L, -H, R))  # opening: negative height for max-heap via min-heap
        events.append((R, 0, 0))   # closing sentinel: process at this x
    events.sort()
    result = []
    heap = [(0, float('inf'))]  # (negative height, end_time); ground is height 0 forever
    i = 0
    n = len(events)
    while i < n:
        x = events[i][0]
        # Process all events at this x: add openings.
        while i < n and events[i][0] == x:
            L, neg_H, R = events[i]
            if neg_H < 0:  # an opening
                heapq.heappush(heap, (neg_H, R))
            i += 1
        # Lazy-pop expired buildings.
        while heap[0][1] <= x:
            heapq.heappop(heap)
        cur_max = -heap[0][0]
        if not result or result[-1][1] != cur_max:
            result.append([x, cur_max])
    return result
Data Structures Used
- A list of events, sorted.
- A max-heap (or multiset / sorted set) of active heights with their end times.
- A result list with deduplication on consecutive y.
Correctness Argument
Sweep correctness. Define H(x) = max heights of buildings covering x. The function H changes value only at event coordinates (the boundaries of buildings). So if we sample H at every event coordinate (in sorted order), we capture every change. The result is the unique sequence of changes in H, which is the skyline.
Tie-breaking at equal x. Multiple events can share an x: an opening, a closing, or both. We process all events at this x together: first push all openings (their buildings exist at this x, contributing their height), then lazy-pop every entry on top of the heap whose end time is ≤ x (those buildings cover no point ≥ x). After that, the top of the heap is the tallest building active on [x, x+1), and we read cur_max. Emit [x, cur_max] if it differs from the previous emission. This handles “two buildings of different heights both starting at the same x” (emit the taller), “one building closing at the same x another opens” (emit the new max), and “two buildings of equal height with overlapping ranges” (no change at the boundary).
Lazy deletion correctness. A building is popped only once the sweep has reached its end time. Entries below the top may be stale, but they are irrelevant until they surface; we pop while heap[0].end ≤ x, and once we stop, the top is the current maximum. Since each building is pushed once and popped at most once, total heap work is O(N log N).
Complexity
- O(N log N) time (sort + heap operations).
- O(N) space (events and heap).
- Output size up to O(N) key points.
Implementation Requirements
- Sort events with the right tie-breaking. The (x, -h, R) tuple ordering naturally puts openings before closings at the same x (a negative height sorts before the 0 sentinel).
- Process all events at the same x together before emitting.
- Deduplicate consecutive key points with equal y.
- Initialize the heap with the ground sentinel (0, ∞) so it’s never empty.
Tests
def test_skyline():
    bs = [[2,9,10],[3,7,15],[5,12,12],[15,20,10],[19,24,8]]
    expected = [[2,10],[3,15],[7,12],[12,0],[15,10],[20,8],[24,0]]
    assert get_skyline(bs) == expected
    # Single building
    assert get_skyline([[1, 2, 1]]) == [[1, 1], [2, 0]]
    # Two same-x openings, different heights
    assert get_skyline([[0, 5, 3], [0, 4, 5]]) == [[0, 5], [4, 3], [5, 0]]
    # Stress vs brute on small inputs
    import random
    for _ in range(200):
        N = random.randint(1, 6)
        bs = []
        for _ in range(N):
            L = random.randint(0, 10)
            R = random.randint(L + 1, L + 5)
            H = random.randint(1, 5)
            bs.append([L, R, H])
        assert get_skyline(bs) == skyline_brute(bs)
Follow-up Questions
- “Total area covered (rectangle union)?” Same sweep, but accumulate Δx · max_height between consecutive events. This works because every building sits on the ground; rectangles with arbitrary bottoms need a segment tree over y. O(N log N).
- “Number of overlapping buildings at each x?” Same events, but track the count of active buildings, not max. O(N log N).
- “Online version where buildings stream in?” Segment tree over compressed coordinates, range max query. O(N log N) total.
- “K-th tallest building visible at x?” Segment tree with “K-th order statistic” support, or a balanced BST.
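The first two follow-ups can be sketched in Python as below. Both functions are illustrative assumptions rather than part of the lab's reference code: skyline_area consumes key points in the output format defined in the problem statement, and max_overlap treats each building as the half-open interval [left, right).

```python
def skyline_area(key_points):
    """Area under a skyline given its key points [[x, y], ...] sorted by x."""
    area = 0
    # Between consecutive key points the height is constant at the left one's y.
    for (x0, y0), (x1, _y1) in zip(key_points, key_points[1:]):
        area += (x1 - x0) * y0
    return area


def max_overlap(buildings):
    """Largest number of buildings covering any single x."""
    events = []
    for left, right, _height in buildings:
        events.append((left, 1))    # building becomes active
        events.append((right, -1))  # building stops being active
    # Tuple order puts (x, -1) before (x, 1): at equal x, closings are
    # applied first, so touching intervals do not count as overlapping.
    events.sort()
    active = best = 0
    for _x, delta in events:
        active += delta
        best = max(best, active)
    return best
```

Note the tie-breaking here is the opposite of the skyline's: for a count over half-open intervals, closing first is what keeps [0, 5) and [5, 10) from being reported as overlapping.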
Product Extension
Logging / event analytics: “concurrent active sessions over time” is the same engine with count instead of max. Cloud autoscaling decision: “what’s the peak demand in this 5-minute window?” Same engine with sum instead of max. Calendar conflict detection: pairs of overlapping events found by sweep + active-set membership. Real-time bidding (RTB): impression eligibility windows with priority-tier counts.
Language / Runtime Follow-ups
- Python: heapq is a min-heap, so use negative heights for max. SortedList from sortedcontainers is O(log N) insert / delete / max, the closest match to C++ multiset.
- Java: TreeMap<Integer, Integer> mapping height to count, with lastKey() for max. Or PriorityQueue with lazy deletion.
- Go: container/heap with custom Less; alternatively a sort.IntSlice you maintain manually.
- C++: multiset<int> with *rbegin() for max, erase(find(...)) for single-instance removal. Or priority_queue + lazy deletion.
Common Bugs
- Tie-breaking: closing processed before opening at the same x causes phantom drops.
- Forgetting the ground sentinel (0, ∞) causes empty-heap crashes when all buildings expire.
- Failing to deduplicate consecutive key points: emitting [5, 10], [7, 10], [9, 10] instead of [5, 10].
- Removing both copies when two buildings have the same height by using multiset.erase(value) (which erases all). Use erase(find(value)).
- Confusing <= vs < on the lazy-pop condition; off-by-one drops a building one event too early or late.
- Sorting events by x only and breaking ties arbitrarily, which leads to wrong output on dense inputs.
Debugging Strategy
If output has phantom key points or wrong heights:
- Print events after sorting; verify openings appear before closings at the same x.
- Print heap contents after processing each event; verify the top is the true max active height.
- Run on the smallest failing case and compare against brute on x_max ≤ 30.
If output is missing key points:
- Check the deduplication condition; an off-by-one might filter out real changes.
- Verify all events at the same x are processed before emitting.
Mastery Criteria
- Write the full skyline algorithm from blank in <15 minutes.
- Articulate the tie-breaking rule and why it’s needed in 30 seconds.
- Adapt the sweep to rectangle-union-area in <5 minutes.
- Recognize “intervals + boundary events + dynamic property” as the sweep-line trigger in <60 seconds.
- State the O(N log N) complexity argument in one sentence.
Lab 05 — Coordinate Compression and Fenwick Tree: Count of Smaller Numbers After Self
Goal
Master coordinate compression as a preprocessing step, then use a Fenwick tree (BIT) over the compressed indices to count, for each element of an array, how many elements to its right are strictly smaller. Solve LC 315 in O(N log N). By the end, you can compress + scan + BIT-update from blank in <10 minutes.
Background
Many problems on integer arrays don’t actually depend on the values, only on the relative ordering. When values can be huge (10^9) but N is small (10^5), allocating an array indexed by value is impossible. Coordinate compression replaces each value v by its rank in sort(unique(values)), mapping the value space down to [0, N). This lets us index a BIT or segment tree by rank instead of value, swapping O(value_range) space for O(N).
The pairing of “compress, then BIT over ranks, then scan one direction” is one of the most reused patterns in CP: count inversions, count smaller-on-right, count pairs with sum in a range, K-th smallest in a sliding window — all instances of the same engine.
Interview Context
LC 315 (Hard) — Count of Smaller Numbers After Self. A classic FAANG senior-level question; also asked at quant firms for “count inversions” or “count pairs (i, j) with i < j and a_i > a_j”. Brute is O(N²) and obvious. The expected O(N log N) solution requires the candidate to either:
- Compress + BIT (this lab), or
- Modified merge sort (counts inversions during the merge step).
Both are valid; BIT is more general and extends to more variants. Senior candidates are expected to know both.
Problem Statement
Given an integer array nums of length N, return an array counts where counts[i] is the number of indices j > i such that nums[j] < nums[i].
LeetCode reference: LC 315 (Count of Smaller Numbers After Self).
Constraints
- 1 ≤ N ≤ 10^5.
- −10^4 ≤ nums[i] ≤ 10^4 (LC); generalize to −10^9 ≤ nums[i] ≤ 10^9.
- Time limit: 1 second.
Clarifying Questions
- “Strictly smaller, or ≤?” Strictly smaller (per LC). For ≤, change query upper bound.
- “Are duplicates allowed?” Yes. They must not count themselves as “smaller”.
- “Modify input allowed?” Generally yes; the compression step can sort a copy.
- “Return order?” Same order as input (counts[i] aligned with nums[i]).
Examples
- nums = [5, 2, 6, 1] → counts = [2, 1, 1, 0]. For 5: indices 1 and 3 have values < 5. For 2: only index 3. For 6: only index 3. For 1: nothing to the right.
- nums = [-1] → [0].
- nums = [-1, -1] → [0, 0]. Equal values don’t count.
Initial Brute Force
def count_smaller_brute(nums):
    n = len(nums)
    out = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if nums[j] < nums[i]:
                out[i] += 1
    return out
Brute Force Complexity
O(N²). For N = 10^5, that’s 10^10 ops — infeasible.
Optimization Path
- Brute (above).
- Modified merge sort. Sort indices by value; during the merge, whenever a left-half element is placed, add the number of right-half elements already placed (all of them strictly smaller) to its counter. O(N log N).
- Coordinate compression + Fenwick tree. Compress values to ranks [0, N). Scan right-to-left. For each element, query BIT prefix sum on [0, rank(v) − 1] (= count of strictly smaller values already seen on the right), then update BIT at rank(v). O(N log N).
The BIT approach is more flexible: it handles “smaller”, “equal”, “in range”, “K-th smallest” with the same engine. The merge sort approach is more efficient in constant factor for pure inversion counting.
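For comparison, the merge-sort alternative can be sketched in Python as follows. This is one possible formulation, not the lab's reference solution: it sorts an index array so counts can be attributed to original positions, and the name count_smaller_merge is made up for this example.

```python
def count_smaller_merge(nums):
    """counts[i] = number of j > i with nums[j] < nums[i], via merge sort."""
    n = len(nums)
    counts = [0] * n
    idx = list(range(n))  # original indices, sorted by value as we merge

    def sort(lo, hi):
        """Sort idx[lo:hi] by nums value, updating counts along the way."""
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        merged = []
        i, j = lo, mid
        while i < mid or j < hi:
            # Take from the right half only when strictly smaller, so that
            # equal values never count as "smaller".
            if j < hi and (i == mid or nums[idx[j]] < nums[idx[i]]):
                merged.append(idx[j])
                j += 1
            else:
                # Every right-half element already placed is strictly smaller
                # than this left-half element and lies to its right.
                counts[idx[i]] += j - mid
                merged.append(idx[i])
                i += 1
        idx[lo:hi] = merged

    sort(0, n)
    return counts
```

Summing the returned list gives the total inversion count, which is the usual reason to prefer this version when only that single number is needed.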
Final Expected Approach
Coordinate compression + Fenwick tree, scan right-to-left.
#include <algorithm>
#include <vector>
using namespace std;

// Fenwick tree: 0-indexed interface, 1-indexed storage.
struct BIT {
    vector<int> t;
    BIT(int n) : t(n + 1, 0) {}
    void update(int i, int v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
    int query(int i) { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
};

vector<int> count_smaller(vector<int>& nums) {
    int n = nums.size();
    // Coordinate compression: sorted, deduplicated copy of the values.
    vector<int> sorted_nums(nums.begin(), nums.end());
    sort(sorted_nums.begin(), sorted_nums.end());
    sorted_nums.erase(unique(sorted_nums.begin(), sorted_nums.end()), sorted_nums.end());
    auto rank_of = [&](int v) {
        return (int)(lower_bound(sorted_nums.begin(), sorted_nums.end(), v) - sorted_nums.begin());
    };
    BIT bit(sorted_nums.size());
    vector<int> result(n);
    // Scan right-to-left: the BIT holds the ranks of everything to the right of i.
    for (int i = n - 1; i >= 0; --i) {
        int r = rank_of(nums[i]);
        result[i] = (r > 0) ? bit.query(r - 1) : 0;
        bit.update(r, 1);
    }
    return result;
}
Data Structures Used
- A sorted-uniqued copy of the input for rank lookup (O(N log N) sort).
- A Fenwick tree of size unique_count for prefix-sum updates and queries.
- A result array of size N.
Correctness Argument
Compression preserves order. lower_bound returns the first index whose value is ≥ v; for any v in the original array, this is the unique rank. So rank(a) < rank(b) ↔ a < b, and rank(a) == rank(b) ↔ a == b.
Right-to-left scan invariant. When processing index i, the BIT contains exactly the multiset of ranks for indices [i+1, n−1] (each updated by +1). The query bit.query(rank(nums[i]) − 1) returns the count of those ranks strictly less than rank(nums[i]), which equals the count of nums[j] < nums[i] for j > i. After the query, we insert rank(nums[i]) so it’s visible to the next (lower) i.
Edge case rank == 0. If nums[i] is the minimum, no value is strictly smaller, so result[i] = 0. The code guards r > 0 to avoid querying bit.query(-1).
Complexity
- Sort + unique: O(N log N).
- For each of N elements: one rank lookup (O(log N)), one BIT query (O(log N)), one BIT update (O(log N)).
- Total: O(N log N) time, O(N) space.
Implementation Requirements
- Use 1-indexed BIT internally (++i on entry); 0-indexed externally.
- After sort, deduplicate before binary search; otherwise rank would skip values for ties.
- Scan right-to-left; left-to-right would count smaller-on-left, a different problem.
- Handle n = 0 (return empty array) and n = 1 (return [0]).
Tests
def test_count_smaller():
    assert count_smaller([5, 2, 6, 1]) == [2, 1, 1, 0]
    assert count_smaller([-1]) == [0]
    assert count_smaller([-1, -1]) == [0, 0]
    assert count_smaller([]) == []
    assert count_smaller([1, 2, 3, 4]) == [0, 0, 0, 0]  # already sorted asc
    assert count_smaller([4, 3, 2, 1]) == [3, 2, 1, 0]  # sorted desc
    assert count_smaller([1, 1, 1]) == [0, 0, 0]        # all equal
    # Stress vs brute
    import random
    for _ in range(200):
        n = random.randint(0, 50)
        nums = [random.randint(-100, 100) for _ in range(n)]
        assert count_smaller(nums) == count_smaller_brute(nums)
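The tests above are written in Python, while the reference implementation is C++ and count_smaller_brute is not defined anywhere in this lab. A minimal Python sketch of both functions, assuming a direct port of the C++ logic is acceptable (the nested helper names update and query are illustrative, not part of the lab), could look like this:

```python
from bisect import bisect_left


def count_smaller(nums):
    """For each i, count indices j > i with nums[j] < nums[i]."""
    ranks = sorted(set(nums))            # coordinate compression: sorted, deduplicated
    tree = [0] * (len(ranks) + 1)        # Fenwick tree, 1-indexed internally

    def update(i):                       # insert one value at 0-indexed rank i
        i += 1
        while i < len(tree):
            tree[i] += 1
            i += i & -i

    def query(i):                        # number of inserted ranks in [0, i]
        i += 1
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    result = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):   # scan right-to-left
        r = bisect_left(ranks, nums[i])
        result[i] = query(r - 1) if r > 0 else 0   # same guard as the C++ version
        update(r)
    return result


def count_smaller_brute(nums):
    """O(N^2) oracle: compare every pair directly."""
    return [
        sum(1 for j in range(i + 1, len(nums)) if nums[j] < nums[i])
        for i in range(len(nums))
    ]
```

With these two definitions in scope, test_count_smaller runs as written; the brute version is deliberately a literal transcription of the problem statement so it can serve as the oracle.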
Follow-up Questions
- “Count strictly greater on right?” Change the query to bit.query(maxRank) − bit.query(rank(v)).
- “Count of equal values on right?” bit.query(rank(v)) − bit.query(rank(v) − 1).
- “Count of values in [lo, hi] on right?” bit.query(rank(hi)) − bit.query(rank(lo) − 1) when lo and hi both occur in the array; for arbitrary bounds, take the lower end with lower_bound and the upper end with upper_bound − 1.
- “Total inversions in array?” Sum the result array.
- “Online streaming version?” Compression is impossible until all values have been seen, so replace the BIT with a balanced BST or order-statistics tree that answers rank queries as values arrive.
- “Why not segment tree?” Works equally well; the BIT has a roughly 4× smaller constant and is shorter to code. Use a segment tree if you need range max / range update.
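The range and strictly-greater variants above reuse the same engine and differ only in which prefix sums are subtracted. A hedged Python sketch, assuming arbitrary lo ≤ hi bounds that need not occur in the array (the class and function names Fenwick, count_in_range_on_right, and count_greater_on_right are illustrative):

```python
from bisect import bisect_left, bisect_right


class Fenwick:
    def __init__(self, n):
        self.t = [0] * (n + 1)

    def add(self, i):                 # insert one value at 0-indexed rank i
        i += 1
        while i < len(self.t):
            self.t[i] += 1
            i += i & -i

    def prefix(self, i):              # inserted ranks in [0, i]; prefix(-1) == 0
        i += 1
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s


def count_in_range_on_right(nums, lo, hi):
    """For each i, count j > i with lo <= nums[j] <= hi. Requires lo <= hi."""
    ranks = sorted(set(nums))
    a = bisect_left(ranks, lo)        # first rank whose value is >= lo
    b = bisect_right(ranks, hi)       # one past the last rank whose value is <= hi
    bit = Fenwick(len(ranks))
    out = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        out[i] = bit.prefix(b - 1) - bit.prefix(a - 1)
        bit.add(bisect_left(ranks, nums[i]))
    return out


def count_greater_on_right(nums):
    """For each i, count j > i with nums[j] > nums[i]."""
    ranks = sorted(set(nums))
    bit = Fenwick(len(ranks))
    out = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        r = bisect_left(ranks, nums[i])
        out[i] = bit.prefix(len(ranks) - 1) - bit.prefix(r)
        bit.add(r)
    return out
```

Using bisect_right for the upper bound is the design choice that makes bounds absent from the array safe; with lower_bound on hi, a missing hi would pull in the next larger value.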
Product Extension
Recommendation systems: “for each user’s recently watched item, how many of their next 10 watches were lower-rated?” — same problem on rating arrays. Quant trading: rank-based features (“how many of the next 100 ticks are below this tick’s price?”) computed in batch via this engine. Search ranking: “for each query, count the number of subsequent queries with shorter session length” — feature engineering pipelines.
Language / Runtime Follow-ups
- Python: bisect.bisect_left for rank lookup; BIT as a plain list. sortedcontainers.SortedList is a 1-line alternative (sl.bisect_left(v); sl.add(v)) but ~5× slower than BIT in pure Python.
- Java: Arrays.binarySearch for rank lookup (run it on the deduplicated array; with duplicates it does not guarantee which matching index is returned), int[] BIT.
- Go: sort.SearchInts, []int BIT.
- C++: lower_bound + vector<int> BIT. The pbds order-statistics tree (tree<> from __gnu_pbds) gives find_by_order and order_of_key directly, but is slower than BIT.
Common Bugs
- Forgetting to deduplicate after sort: lower_bound still works, but BIT size becomes N even if values are mostly equal. Wasted space, not incorrect.
- Using upper_bound instead of lower_bound for rank: every rank shifts up by one, so the largest rank falls outside a BIT sized unique_count, and strict vs non-strict counts are easy to get wrong for duplicates.
- Scanning left-to-right instead of right-to-left: solves a different problem.
- 1-indexed vs 0-indexed off-by-one in the BIT.
- Querying rank − 1 without checking rank > 0: bit.query(-1) may return garbage or crash. (The ++i implementation above happens to return 0 for query(-1), but do not rely on that across BIT variants.)
- Comparing with ≤ instead of < (depends on problem statement).
Debugging Strategy
If output is wrong:
- Print compressed ranks alongside original values; check ordering preserved.
- Print BIT state after each insertion (small input). Verify prefix sums equal “count of values ≤ rank”.
- Run on [5, 2, 6, 1] and trace right-to-left: at i=3, BIT empty, result=0. Insert rank(1). At i=2, query rank(6)−1 → count of values with rank ≤ rank(6)−1 already in BIT → expect 1.
- Compare against brute on N ≤ 30 random inputs.
Mastery Criteria
- Write coordinate compression (sort + unique + lower_bound) from blank in <2 minutes.
- Write Fenwick tree (update + query) from memory in <3 minutes.
- Articulate the right-to-left scan invariant in one sentence.
- Adapt the engine to “count in range [lo, hi]” or “total inversions” in <5 minutes.
- Recognize “values up to 10^9, N up to 10^5, count-by-rank query” as the compression+BIT trigger in <60 seconds.
Lab 06 — Stress Testing: Finding Bugs You Can’t Spot by Reading
Goal
Build a stress-testing harness in <10 minutes that pits a brute-force oracle against a candidate solution on randomly generated small inputs, automatically catching mismatches. Use it to find two intentionally-injected bugs in a candidate solution where reading the code wouldn’t reveal them. By the end, stress testing is your default tool when a CP solution passes samples but fails hidden tests, and you can build a fresh harness for any problem in <10 minutes.
Background
In CP and high-stakes interviews, you frequently face: “my code passes the samples but WAs on hidden tests.” Reading harder doesn’t help — your mental model of the algorithm is exactly what’s wrong. Stress testing breaks this by replacing your brain with the machine: write a slow-but-obviously-correct brute, write a fast random input generator, and let the computer compare outputs on millions of small cases. The first mismatch is a tiny counterexample you can debug by hand.
Top competitive programmers (red CF) use stress testing constantly. It’s the single most underused tool by interview candidates and the fastest way to debug a “looks right but doesn’t work” solution.
Interview Context
Interview problems rarely give you the freedom to write a stress test under time pressure, but the meta-skill — converting “I’m stuck” into a structured experiment — is exactly what staff-level interviews probe. In quant interviews, “How would you validate this code?” is a standard question; “I’d write a brute oracle and stress-test against random inputs” is a strong answer. In CP, stress testing is required at the Div 2/Div 1 level.
This lab is the meta-lab for the entire phase: build the tooling that will save you in every other lab.
Problem Statement
Given a candidate solution and a brute-force oracle for some problem, build a harness that:
- Generates random inputs of small size (so brute is fast).
- Runs both solutions.
- Compares outputs and prints / dies on the first mismatch.
- Uses a seeded RNG so failures are reproducible.
We’ll use prefix sum range queries as the test problem. Sub-problems:
- Brute: for each query (l, r), sum a[l..r] directly. O(N) per query.
- Candidate: precompute prefix[i] = a[0] + ... + a[i-1], answer each query as prefix[r+1] - prefix[l]. O(1) per query.
We will deliberately introduce two bugs in the candidate and use the harness to find each.
Constraints
- For stressing: N ≤ 50, values in [-10, 10], Q ≤ 50 queries. Small enough that brute is O(N · Q) = 2500 ops per test, allowing roughly 10⁴ to 10⁵ tests per second depending on language (the high end in C++, fewer in Python).
- The candidate should pass when correct, fail clearly when buggy.
Clarifying Questions
- “What’s the comparison rule for outputs?” — exact equality (lists of ints).
- “How small should random inputs be?” — small enough that brute finishes in microseconds per test, big enough to expose edge cases. Rule of thumb: smallest size where the candidate’s structure is non-trivial.
- “Is determinism required?” — yes; seed the RNG so the same failing test re-runs identically.
- “Output format on mismatch?” — input that triggered, both outputs, the seed.
Examples
A passing harness run prints nothing (or a PASSED line). A failing run prints the first counterexample:
MISMATCH at iteration 47 (seed=12345):
input: N=5, a=[3, -2, 1, 4, -1], queries=[(0, 4), (1, 3), (2, 2)]
brute: [5, 3, 1]
candidate:[5, 3, 0]
Initial Brute Force
The brute oracle is the brute force here:
def prefix_sum_brute(a, queries):
    return [sum(a[l:r+1]) for l, r in queries]
It’s O(N · Q), slow but unambiguous.
Brute Force Complexity
O(N · Q) per test case. For N = Q = 50, ~2500 ops per test. Running 100,000 stress iterations completes in seconds.
Optimization Path
The harness itself doesn’t optimize. The thing being tested (candidate solution) does. The harness’s job is to detect when the optimization is incorrect.
Final Expected Approach
The candidate (intentionally with bugs to discover):
def prefix_sum_candidate(a, queries):
    n = len(a)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + a[i]
    # BUG 1: should be prefix[r+1] - prefix[l], not prefix[r] - prefix[l]
    return [prefix[r] - prefix[l] for l, r in queries]
The harness:
import random

def stress(brute, candidate, gen, n_iters=10000, seed=0):
    rng = random.Random(seed)
    for it in range(n_iters):
        inp = gen(rng)
        b_out = brute(*inp)
        c_out = candidate(*inp)
        if b_out != c_out:
            print(f"MISMATCH at iteration {it} (seed={seed}):")
            print(f"  input: {inp}")
            print(f"  brute: {b_out}")
            print(f"  candidate: {c_out}")
            return False
    print(f"PASSED {n_iters} iterations.")
    return True

def gen_prefix_sum(rng):
    n = rng.randint(1, 10)
    a = [rng.randint(-5, 5) for _ in range(n)]
    q = rng.randint(1, 5)
    queries = []
    for _ in range(q):
        l = rng.randint(0, n - 1)
        r = rng.randint(l, n - 1)
        queries.append((l, r))
    return (a, queries)

stress(prefix_sum_brute, prefix_sum_candidate, gen_prefix_sum, n_iters=1000, seed=42)
The harness will fire and report the first failing input within a few iterations. Fix prefix[r] to prefix[r+1]. Re-run.
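Now introduce BUG 2 (subtle): keep prefix[0] = 0 but write the recurrence as prefix[i+1] = prefix[i] + a[i+1] (off-by-one in the recurrence). In Python this reads a[n] on the last loop iteration and raises IndexError, so the unmodified harness stops with a traceback on the first generated input instead of printing a mismatch; if the loop bound is also shortened to range(n − 1), the crash disappears and the bug becomes a silent wrong answer that only the output comparison catches. Either way, stress finds it again.
After both fixes, the call above reports PASSED 1000 iterations. (PASSED 10000 iterations. with the default n_iters), and you know the candidate is (probably) correct.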
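A candidate that crashes is as much a finding as one that returns a wrong answer, and a recurrence such as prefix[i + 1] = prefix[i] + a[i + 1] reads past the end of a in Python. The following is a minimal sketch of a harness variant that reports both kinds of failure; stress_safe, candidate_bug2, and the dict-shaped return value are illustrative assumptions, not part of the lab's harness.

```python
import random


def brute(a, queries):
    return [sum(a[l:r + 1]) for l, r in queries]


def candidate_bug2(a, queries):
    n = len(a)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + a[i + 1]   # BUG 2: should be a[i]
    return [prefix[r + 1] - prefix[l] for l, r in queries]


def gen(rng):
    n = rng.randint(1, 10)
    a = [rng.randint(-5, 5) for _ in range(n)]
    queries = []
    for _ in range(rng.randint(1, 5)):
        l = rng.randint(0, n - 1)
        queries.append((l, rng.randint(l, n - 1)))
    return a, queries


def stress_safe(brute_fn, cand_fn, gen_fn, n_iters=1000, seed=0):
    """Return None if every iteration agrees, else a dict describing the failure."""
    rng = random.Random(seed)
    for it in range(n_iters):
        inp = gen_fn(rng)
        expected = brute_fn(*inp)
        try:
            got = cand_fn(*inp)
        except Exception as exc:               # a crash counts as a failure
            return {"iteration": it, "seed": seed, "input": inp,
                    "expected": expected, "error": repr(exc)}
        if got != expected:
            return {"iteration": it, "seed": seed, "input": inp,
                    "expected": expected, "got": got}
    return None


if __name__ == "__main__":
    print(stress_safe(brute, candidate_bug2, gen, seed=42))
```

Returning a structured failure rather than printing makes the harness easy to assert on in its own tests; the seed travels with the report so the run can be repeated.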
Data Structures Used
- A seeded RNG (random.Random(seed) in Python; mt19937 in C++).
- A generator function returning a random valid input.
- A brute oracle.
- A candidate solution.
- The harness loop.
Correctness Argument
Why this works. If the brute oracle is correct (small enough that we can verify by hand), and the candidate disagrees, then the candidate is wrong on that input. We have a counterexample. Conversely, if the candidate matches the brute on n_iters = 10⁵ random small inputs, it’s probably correct — the chance that a non-trivial bug survives all of them is small for most input distributions, but not zero. Add adversarial inputs (all same, all max, all min, edge sizes 0, 1) explicitly to the generator to harden coverage.
Why determinism matters. When the harness fires, you want to re-run with the same seed to reproduce; without a seed, the bug might evaporate next run.
Why small inputs. The smaller the input, the faster brute runs (more iterations), and the easier the counterexample is to debug by hand. CF-grade stress tests use N ≤ 5 for the first pass.
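The advice to add adversarial inputs explicitly can be folded into the generator itself. Below is a hedged sketch for the prefix-sum problem; gen_with_corners, its corner_prob parameter, and the particular list of shapes are assumptions chosen for illustration. Size 0 is left out here because a range query needs at least one valid index.

```python
import random


def gen_with_corners(rng, max_n=10, lo=-5, hi=5, corner_prob=0.3):
    """Return (a, queries); with probability corner_prob use a hand-picked shape."""
    n = rng.randint(1, max_n)
    if rng.random() < corner_prob:
        shape = rng.choice(["single", "all_same", "all_min", "all_max", "sorted"])
        if shape == "single":
            a = [rng.randint(lo, hi)]
        elif shape == "all_same":
            a = [rng.randint(lo, hi)] * n
        elif shape == "all_min":
            a = [lo] * n
        elif shape == "all_max":
            a = [hi] * n
        else:
            a = sorted(rng.randint(lo, hi) for _ in range(n))
    else:
        a = [rng.randint(lo, hi) for _ in range(n)]
    n = len(a)
    # Always include the full range and a single-element query at the last index,
    # since off-by-one bugs tend to live at the boundaries.
    queries = [(0, n - 1), (n - 1, n - 1)]
    for _ in range(rng.randint(0, 3)):
        l = rng.randint(0, n - 1)
        queries.append((l, rng.randint(l, n - 1)))
    return a, queries
```

Because every random draw goes through the rng argument, the generator stays reproducible under the same seed, which is what lets a failing iteration be replayed.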
Complexity
Per stress iteration: brute is O(N · Q); candidate is O(N + Q); comparison is O(Q). Harness overhead is negligible. For N = Q = 10 and n_iters = 10⁵, that is ~10⁷ brute operations in total; in Python, expect a run on the order of seconds once input generation is included.
Implementation Requirements
- Seed the RNG explicitly. Print the seed on failure.
- Generator must produce valid inputs (respects all problem constraints: valid index ranges, and non-empty arrays when the problem requires them).
- On mismatch, print the failing input. (Optional refinement: shrink it toward a minimal counterexample by retrying with smaller sizes once you’ve found a failure.)
- Cover the edge cases the problem allows: empty array, single element, all-same values, max-size inputs.
Tests
The harness is itself code; it should be tested.
def test_harness_finds_planted_bug():
    def buggy(a, queries):
        return [sum(a[l:r]) for l, r in queries]  # off-by-one: should be a[l:r+1]
    # The harness should fire (return False) on a buggy candidate.
    result = stress(prefix_sum_brute, buggy, gen_prefix_sum, n_iters=1000, seed=1)
    assert result is False

def test_harness_passes_correct_candidate():
    def correct(a, queries):
        n = len(a)
        prefix = [0] * (n + 1)
        for i in range(n):
            prefix[i + 1] = prefix[i] + a[i]
        return [prefix[r + 1] - prefix[l] for l, r in queries]
    result = stress(prefix_sum_brute, correct, gen_prefix_sum, n_iters=1000, seed=1)
    assert result is True
Follow-up Questions
- “What if brute is also buggy?” Write brute as straightforwardly as possible (read the problem statement and implement it word-for-word). Skip optimizations. If both brute and candidate agree on a bug, you have no oracle and stress testing fails. Mitigation: cross-check brute against the problem’s sample I/O before stressing.
- “How to shrink a counterexample?” Once a failing input is found, repeatedly remove array elements / queries / values; if it still fails, keep the smaller input. Greedy; good enough.
- “Stress testing for graph problems?” The generator emits random small graphs (N ≤ 6 vertices, random edges). Brute is BFS / DFS over all paths.
- “What if the answer isn’t unique?” Write a checker instead of a comparator: validate the candidate’s output as one of many valid answers (e.g., for “any valid topological order”).
- “Multi-threaded stress?” Easy with a process pool; each worker has its own seed offset.
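The greedy shrinking idea from the list above can be sketched for the prefix-sum problem as follows. The helper names shrink and _smaller_variants, and the specific reduction steps (drop a query, trim an unused end of the array, zero a value), are assumptions made for illustration; any set of size-reducing steps works as long as each keeps the input valid.

```python
def _smaller_variants(a, queries):
    """Yield inputs that are strictly simpler than (a, queries) and still valid."""
    # 1. Drop one query (keep at least one).
    if len(queries) > 1:
        for k in range(len(queries)):
            yield a, queries[:k] + queries[k + 1:]
    # 2. Drop the last or first array element if no query touches it.
    if len(a) > 1 and all(r < len(a) - 1 for _, r in queries):
        yield a[:-1], queries
    if len(a) > 1 and all(l > 0 for l, _ in queries):
        yield a[1:], [(l - 1, r - 1) for l, r in queries]
    # 3. Replace one non-zero value with 0.
    for k, v in enumerate(a):
        if v != 0:
            yield a[:k] + [0] + a[k + 1:], queries


def shrink(a, queries, fails):
    """Greedily simplify a failing input while fails(a, queries) stays True."""
    a, queries = list(a), list(queries)
    assert fails(a, queries), "shrink must start from a failing input"
    progress = True
    while progress:
        progress = False
        for cand_a, cand_q in _smaller_variants(a, queries):
            if fails(cand_a, cand_q):
                a, queries = cand_a, cand_q
                progress = True
                break                      # restart from the smaller input
    return a, queries


def brute(a, queries):
    return [sum(a[l:r + 1]) for l, r in queries]


def buggy(a, queries):
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    return [prefix[r] - prefix[l] for l, r in queries]   # should be prefix[r + 1]


if __name__ == "__main__":
    still_fails = lambda a, q: brute(a, q) != buggy(a, q)
    print(shrink([3, -2, 1, 4, -1], [(0, 4), (1, 3), (2, 2)], still_fails))
```

Each accepted step reduces the number of queries, the array length, or the count of non-zero values, so the loop terminates; the result is a local minimum, not necessarily the globally smallest counterexample.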
Product Extension
Property-based testing in production: tools like Hypothesis (Python), QuickCheck (Haskell), proptest (Rust) generate random inputs and check invariants; same idea, different framing. Fuzz testing for security: AFL, libFuzzer feed random / mutated bytes to a binary and check for crashes. Differential testing across implementations: compare a new compiler against an old one on random programs (Csmith generates random C programs for this; SQLancer applies similar randomized testing to SQL database engines). The harness pattern transfers directly.
Language / Runtime Follow-ups
- Python: random.Random(seed) uses MT19937 under the hood. pytest + hypothesis for property-based testing in production code.
- C++: std::mt19937 rng(seed); std::uniform_int_distribution<int> dist(lo, hi); avoid rand() (low-quality, and RAND_MAX is only 32767 on MSVC).
- Java: java.util.Random(seed) (LCG, low-quality but reproducible) or SplittableRandom (better statistical quality).
- Go: rand.New(rand.NewSource(seed)). A generator created this way is not safe for concurrent use (the package-level math/rand functions are), so give each goroutine its own.
- JS/TS: seedable RNG requires a library (seedrandom); built-in Math.random() is not seedable.
Common Bugs
- Forgetting to seed the RNG → non-reproducible failures.
- Brute oracle has the same bug as the candidate → false negative (stress passes a buggy solution).
- Generator produces invalid inputs (e.g., negative array sizes) → both brute and candidate crash → not a useful comparison.
- Generator’s distribution is too narrow → never hits edge cases (all-equal, sorted, reverse-sorted, single element).
- Output comparison uses == on floats with rounding errors → spurious mismatches; use tolerance.
- Harness exits silently on first iteration if generator throws → wrap in try/except and report.
- Letting N grow too large → brute is too slow → fewer iterations → less coverage.
Debugging Strategy
When stress fires:
- Print the failing input. Run just that input through both brute and candidate.
- If brute disagrees with your hand-computed answer, brute is buggy. Fix brute first.
- If candidate disagrees with brute (and brute matches your hand-computed answer): trace candidate’s execution on the failing input by hand or in a debugger. The bug is local to a few lines.
- Once fixed, re-run stress with the same seed; if it passes, increment seed and run again.
When stress passes but the real submission still gets WA:
- Generator might not cover the failing case. Inspect the failing test’s input distribution (size, value range, special structure) and update the generator.
- Add explicit corner cases: empty input, single element, max-size input, all duplicates, all distinct, sorted asc, sorted desc.
- Push N higher; some bugs surface only at scale.
Mastery Criteria
- Build a stress harness from blank for an array problem in <10 minutes.
- Find a planted off-by-one bug in <30 seconds of harness runtime.
- Articulate why the brute oracle must be obviously-correct in one sentence.
- Generate adversarial corner cases (empty, single, all-equal) deliberately, not only random.
- Use the same harness pattern across array, graph, and string problems.
- When a real submission WAs, default to “stress test” instead of “read the code again”.
Phase 8 — Practical Engineering Coding Interviews
Target level: Medium-Hard → Hard (senior+ practical interview track)
Expected duration: 4 weeks
Weekly cadence: 5–6 labs/week, with each lab requiring a complete working implementation, tests, and rehearsed answers to follow-ups
Companies this targets: Big Tech L5+ (Google L5/L6, Meta E5/E6, Amazon SDE-III/Principal, Microsoft Sr/Principal), Stripe, Uber, Airbnb, Cloudflare, Datadog, Snowflake, Databricks, infrastructure-heavy startups
Why This Phase Exists
Phase 2 through Phase 7 trained you to recognize patterns and produce optimal algorithms under a stopwatch. That training is necessary and remains the gating function for the first 30 minutes of most rounds. But there is a second, distinct kind of coding interview that you will face starting at the senior level (and at every level at companies like Stripe, Airbnb, and Uber where the engineering bar is calibrated against production code rather than against contest performance).
That second kind of interview is the practical engineering coding round. You are asked to “build an LRU cache”, “build a rate limiter”, “build a thread pool”, “build a job queue”, “build a small in-memory filesystem”. The problem is not algorithmically extreme — most of these have textbook solutions you could find in a CS curriculum. What the interviewer is testing is whether your code looks like production code:
- Are your data structures encapsulated behind a clean API?
- Are mutations and reads separated cleanly?
- Are concurrency invariants explicit, or did you sprinkle locks “just in case”?
- Do you handle partial failure, shutdown, and resource cleanup?
- Did you write tests that actually exercise the contract — including concurrency tests where relevant?
- Can you answer the inevitable follow-ups about scaling, observability, and operational concerns?
Candidates from a pure LeetCode background routinely fail this round. They produce a one-function LRUCache that passes the LC test cases, then freeze when the interviewer asks “how would you make this thread-safe?” or “how would you observe this in production?” or “what would you do if a put could fail mid-operation?” The interviewer’s note reads: “Strong on the algorithm, weak on engineering. No-hire for senior.”
The bar at senior+ practical interviews is not “did you write code that produces the right answer”. The bar is “did you write code that I would be willing to deploy”. Those are different bars, and this phase trains the second one explicitly.
What Makes Practical Problems Different From LeetCode
| Dimension | LeetCode-style | Practical engineering |
|---|---|---|
| Optimization target | Big-O time, sometimes space | API surface, testability, operational fitness, correctness under concurrency |
| Code length | 20–40 lines | 100–400 lines (a class with several methods + tests) |
| State | Local to a function | Owned by a long-lived object with invariants across calls |
| Concurrency | Almost never tested | Almost always at least raised as a follow-up |
| Failure modes | “Wrong answer on test 47” | Partial failure, restart, poison input, backpressure, shutdown |
| Tests | Provided by the judge | You write them |
| Follow-ups | Variant problems with tweaked constraints | Operational reality questions (“scale to N nodes”, “persist across restarts”) |
| Bar for excellence | Optimal complexity | Production readability + correctness + answers all follow-ups crisply |
A LeetCode answer that nails the algorithm but ships a 60-line wall-of-code with single-letter variables and no separation of concerns will get a no-hire at the senior bar even when the algorithm is correct. Conversely, a practical answer that is a little slower than optimal but is cleanly structured, well-tested, and accompanied by sharp follow-up answers will get a strong hire.
You will not “see” this difference until you’ve practiced enough practical labs to internalize what “clean” looks like at the senior bar. That internalization is the entire point of this phase.
The 13 Standard Follow-Ups
Every problem in this phase will be followed by a subset of these thirteen questions. They are not problem-specific — they are senior-bar questions that recur across the industry. Memorize the question list. Then, for each lab in this phase, rehearse the answer for the 4–6 follow-ups that are most natural for that problem. By the end of Phase 8 you should be able to give a 60-to-90-second answer to any of these for any data structure or service-shaped object you’ve built.
- How would you make it thread-safe? Identify the critical sections, choose between coarse-grained mutex / fine-grained locks / lock-free / CAS / sharded locks, justify the choice, name the failure modes the choice avoids (deadlock, lost update, torn read), and state the contention behavior under load.
- How would you persist state across restarts? Pick between full snapshot, log-structured append (write-ahead log), and snapshot+log; address durability (fsync), atomicity (rename or checksum), and recovery (replay log on boot). State the time-to-recover and the worst-case data loss window.
- How would you scale to N nodes? Decide between sharding (partition the keyspace), replication (read scaling), and routing (consistent hashing + virtual nodes). Address rebalancing, hotspotting, and cross-node operations. Don’t reach for “distribute everything” — most practical objects scale by sharding.
- How would you observe and monitor it? Name the four signals (latency, traffic, errors, saturation — Google’s Golden Signals) and state which metric you’d emit for each. Specify whether you’d export histograms (latency), counters (events), or gauges (queue depth). Describe the dashboard you’d build.
- How would you test it? Three layers minimum: unit tests on each method’s contract; integration / smoke tests on end-to-end flows; concurrency / stress tests where multiple goroutines or threads exercise the object. Mention property-based testing where invariants are clean.
- What metrics would you emit? Per-operation counters (puts, gets, hits, misses); per-operation latency histograms; queue / cache size gauges; failure-class counters (eviction, timeout, retry, poison). Reject the temptation to emit everything — emit what you’d actually look at on a 3 AM page.
- How would you handle backpressure? Decide between blocking the producer, dropping the request, returning an error, or buffering with a bounded queue and rejection policy. State which one you chose and why. The wrong answer here is “we’d have a really big buffer” — that just delays the problem and worsens latency.
- How would you handle partial failure? Identify which operations can fail mid-way (a write that succeeds locally but fails to persist; a network call that times out without confirmation). Choose between idempotent retry, two-phase commit, log-and-recover, or just-fail-fast. Don’t reach for “transactions” reflexively — pick the tool that matches the problem.
- What is the eviction policy and cleanup strategy? For caches: LRU / LFU / TTL / size-bounded. For queues: drop oldest / drop newest / dead-letter. For background state: TTL + scavenger goroutine. State the worst-case eviction storm.
- What is the consistency model? Strong (linearizable), sequential, causal, eventual, monotonic-read. Most in-memory single-process objects are linearizable trivially; the question becomes interesting once replicated. Be precise about what guarantees you offer.
- What configuration knobs would you expose? Capacity, TTL, retry count, backoff base, concurrency limit, shutdown timeout. State sensible defaults. Critically: state the knobs you would not expose, because over-configuration is its own production smell.
- What is the shutdown / draining behavior? On close() / SIGTERM: stop accepting new work, finish in-flight work up to a deadline, persist or surface anything not finished, release resources. Specify the deadline. Specify what happens when the deadline expires.
- How would you handle a poison-pill input? A request that crashes the worker, exhausts memory, or causes an infinite loop. Bound resource usage per request, isolate the worker, route repeat-offending payloads to a dead-letter queue, and emit a metric. Never silently drop them.
For each lab, the Follow-up Questions section selects 4–6 of these and rehearses an answer. Memorizing one bullet per question is not enough — you need to be able to converse about the choice, naming alternatives and tradeoffs.
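To make the thread-safety and metrics follow-ups concrete, here is a minimal sketch of the coarse-grained-mutex option applied to an LRU cache, with hit / miss / eviction counters. It assumes a single lock is acceptable for the expected contention; the class shape and counter names are illustrative, not a reference solution for Lab 01.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Thread-safe LRU cache.

    Concurrency: `_lock` guards `_data`, `hits`, `misses`, and `evictions`.
    Every public method takes the lock for its whole body (coarse-grained).
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._data = OrderedDict()          # least recently used entry first
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                self.misses += 1
                return default
            self._data.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > self._capacity:
                self._data.popitem(last=False)   # evict least recently used
                self.evictions += 1

    def __len__(self):
        with self._lock:
            return len(self._data)
```

The tradeoff to be ready to discuss: one lock makes the invariants easy to state and rules out lost updates, at the cost of serializing all readers and writers; sharding by key hash is the usual next step when contention shows up in latency histograms.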
Implementation Discipline Expected In This Phase
This is the heaviest phase by code volume. Every lab demands a complete working implementation, not pseudocode and not a sketch. The bar is “could a coworker submit this for code review without me being embarrassed?”. Concretely:
- Idiomatic in the chosen language. Python uses snake_case, dataclasses where natural, with blocks for locks, asyncio where the lab demands async. Java uses camelCase, prefers java.util.concurrent primitives, declares interfaces. Go uses short receiver names, returns errors as last value, prefers channels for fan-out, mutexes for shared state.
- Small functions, one concern per function. A method that does both validation and mutation should be split. The exception is hot-path code where inlining matters; if you inline, leave a one-line comment explaining why.
- Names that describe intent, not type. evict_lru() not e(), pending_jobs not pj, acquire_token_or_block(timeout) not take().
- Separation of concerns. Storage, eviction policy, concurrency primitives, observability hooks, and configuration are all distinct concerns. Most labs in this phase have natural seams between them; find the seams and respect them. A class that mixes “manages state”, “decides policy”, and “emits metrics” in every method is harder to test than three classes that compose.
- Testable design. Every public method has an obvious test. Constructors take their dependencies (the eviction policy, the clock, the metrics emitter) as parameters so tests can inject fakes. Hardcoded time.now() calls inside business logic are a code smell; inject a clock.
- Explicit error handling. Every external call has a defined behavior on failure. Silent try/except: pass is forbidden unless accompanied by a comment explaining why the exception is benign.
- Concurrency invariants documented. If a class is thread-safe, say so in the docstring and name the lock that guards each field. If a class is not thread-safe, say so. The forbidden state is “it might be thread-safe, the author didn’t think about it”.
- No premature abstraction. Two implementations of an interface justify the interface. One implementation does not. Don’t add a Storage interface for the in-memory backing map until you actually have a second backing.
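The "inject a clock" rule is easiest to see in a small example. The sketch below assumes a store with a single fixed TTL and lazy expiry on read; TTLStore and FakeClock are hypothetical names introduced only for illustration.

```python
import time


class TTLStore:
    """Key-value store with a fixed per-entry TTL. Not thread-safe."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock                 # injected so tests can control time
        self._entries = {}                  # key -> (value, expires_at)

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]          # lazy eviction on read
            return default
        return value


class FakeClock:
    """Test double: time moves only when the test says so."""

    def __init__(self, start=0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    store = TTLStore(ttl_seconds=10, clock=clock)
    store.put("session", "abc")
    clock.advance(9)
    print(store.get("session"))             # still live
    clock.advance(1)
    print(store.get("session"))             # expired
```

With the clock as a constructor parameter, the expiry test runs instantly and deterministically; production code simply omits the argument and gets time.monotonic.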
The labs do not enforce a single language across the phase. Pick Python, Java, or Go for each lab based on what feels natural. Most candidates default to Python because the standard library is rich and the syntax is dense; Java is a strong choice when concurrency and java.util.concurrent primitives are at the heart of the problem (thread pool, blocking queue, atomic counters); Go is excellent when the problem is naturally concurrent and channel-shaped (job queue, dispatcher, crawler). For each lab, the Language/Runtime Follow-ups section calls out the right idiomatic choice in each of the major languages.
The 23 Labs
| # | Lab | Core Idea (one line) |
|---|---|---|
| 01 | LRU Cache | O(1) get/put via doubly-linked list + hashmap; the canonical practical-coding warmup |
| 02 | LFU Cache | Frequency-bucketed eviction; tie-breaking by recency; harder than LRU |
| 03 | Rate Limiter | Four algorithms compared: token bucket, leaky bucket, sliding window log, sliding window counter |
| 04 | Task Scheduler | Priority-aware task scheduling with retries, backoff, and a dead-letter queue |
| 05 | Thread Pool | Bounded worker pool with work queue, graceful shutdown, and rejection policy |
| 06 | Durable Job Queue | At-least-once delivery semantics with idempotency keys and ack/nack |
| 07 | Autocomplete | Trie + per-prefix top-K with weighted scores and sub-millisecond response |
| 08 | Log Parser | Streaming log line parser with regex extraction and bounded memory |
| 09 | File Deduplication | Three-stage pipeline: size → quick hash → full hash |
| 10 | Consistent Hashing | Hash ring with virtual nodes, minimal key movement on add/remove |
| 11 | Message Dispatcher | Fan-out to N consumers with fairness, priority, and per-consumer backpressure |
| 12 | Pub/Sub | In-memory topic-based publish/subscribe with wildcard subscriptions |
| 13 | Timer Wheel | Hierarchical timer wheel for O(1) amortized timer scheduling |
| 14 | Key-Value Store | In-memory KV with TTL, snapshot+WAL persistence, and crash recovery |
| 15 | Retry With Backoff | Exponential backoff + decorrelated jitter + max-attempts + retryable-error policy |
| 16 | Circuit Breaker | Three-state machine: closed / open / half-open with sliding-window failure counting |
| 17 | Metrics Collector | Counter / gauge / histogram with bounded memory and atomic updates |
| 18 | Web Crawler | Concurrent crawler with depth limit, politeness (per-host throttle), and dedup |
| 19 | In-Memory Filesystem | ls, mkdir, addContentToFile, readContent over a tree of inodes |
| 20 | Snake Game | State machine + collision detection + score; classic OOD round |
| 21 | Tic-Tac-Toe Streaming | O(1) winner check by maintaining row/col/diagonal counters |
| 22 | Text Editor Buffer | Gap buffer / piece table for cursor-local edits; the canonical editor data structure |
| 23 | SQL-Like Engine | Toy parser + executor for SELECT … FROM … WHERE … JOIN … over in-memory tables |
The order is not arbitrary. Labs 1–6 are the canonical warmups (LRU is among the most frequently asked problems in interviews that use this format). Labs 7–14 stretch into harder data-structure and operational territory. Labs 15–17 are pure operational primitives (retry, circuit breaker, metrics) that show up in service-design rounds. Labs 18–23 are larger, more open-ended OOD-style problems where the interviewer wants to see how you decompose a fuzzy problem into classes.
If you have a 4-week schedule, do six labs per week with a buffer day for the final lab and a mock-interview rehearsal. If you have an 8-week schedule, do three per week and spend the extra time on the follow-ups — that’s where senior interviews are won and lost.
Mastery Checklist
You have completed Phase 8 when you can do the following without prompting:
- Implement LRU cache with thread-safety in <20 minutes from a blank screen, including a unit test suite that exercises eviction order.
- Implement LFU cache with correct tie-breaking in <30 minutes.
- Compare the four rate-limiting algorithms verbally and justify the right pick for a stated load profile in <2 minutes.
- Implement a thread pool with bounded queue, rejection policy, and graceful shutdown in <30 minutes.
- Implement a job queue with at-least-once semantics and explain why exactly-once is impractical in <2 minutes.
- Implement an in-memory KV store with TTL eviction in <25 minutes.
- Implement a circuit breaker with all three states and explain when half-open transitions back to closed in <2 minutes.
- Implement consistent hashing with virtual nodes in <30 minutes and explain the rebalancing cost on add/remove.
- For any of the 23 problems, answer all 13 standard follow-ups crisply (60–90 seconds each) without notes.
- Identify, for any production object you’ve built (in real work or in this phase), the four Golden Signals you’d emit and justify why those four.
- State the consistency model of any data structure you’ve built in one sentence.
- Write a stress test for a concurrent data structure that actually finds bugs (i.e., randomly interleaves operations across threads, asserts invariants after, replays the seed on failure).
- Refactor one of your own LeetCode-style 50-line answers from any earlier phase into a clean, testable, production-shaped class without consulting any reference.
Exit Criteria
You may exit Phase 8 and move on to Phase 9 — Language & Runtime Deep Dive when:
- Lab completion: every one of the 23 labs has been implemented and tested by you, in a language you would actually use at work, with the test suite passing on the first run after a 24-hour gap (no peek-and-debug). The 24-hour gap matters — it tests retention, not short-term memory.
- Follow-up fluency: you can answer the 13 standard follow-ups without prompts for at least 18 of the 23 labs.
- Mock interview: you have done at least 2 mock interviews drawn from this phase’s problem list (Phase 11 — Mock Interview Mastery) with a passing rubric score on both, where “passing” requires hitting both the algorithmic correctness and the production-readiness rubric dimensions.
- Code review readiness: you can take any of your Phase 8 implementations, post it as a hypothetical PR, and write the PR description (motivation, design choices, tradeoffs, test plan) in <10 minutes per implementation.
If any of the four criteria fails, do not move on. Most candidates underestimate the third criterion, the mock interview: they pass the algorithm dimension but fail the production-readiness dimension because they didn’t rehearse the follow-ups out loud. Read COMMUNICATION.md once more, then re-do the mocks. The mocks are not optional; the practical-engineering bar is calibrated against verbalized reasoning, not solo-coded artifacts.
Cross-References
- FRAMEWORK.md — the universal 16-step framework still applies. Practical problems extend step 16 (production implications), not replace steps 1–15.
- CODE_QUALITY.md — the bar is enforced here more strictly than anywhere else in the curriculum.
- phase-03-advanced-data-structures/ — several labs (LRU, LFU, trie) build on data structures introduced there. If you skipped Phase 3, do at least labs 7 and 8 of that phase before starting here.
- phase-04-graphs/ — the consistent-hashing and dispatcher labs share modeling instincts with graph problems.
- phase-09-language-runtime/ — the next phase. Practical engineering interviews and runtime interviews are deeply intertwined; many follow-up answers in Phase 8 cite runtime facts you’ll formalize in Phase 9.
- phase-11-mock-interviews/: mock-08-staff-practical.md is built around this phase’s problem list.
Lab 01 — LRU Cache
Goal
Implement a thread-unsafe and a thread-safe LRUCache with O(1) get and put, using a doubly-linked list keyed by a hashmap. After this lab you should be able to write a clean, tested LRU cache from a blank screen in under 20 minutes, and answer the 13 standard follow-ups for it crisply.
Background Concepts
A cache is a bounded-capacity associative store that evicts entries when capacity is exceeded. The Least Recently Used (LRU) policy evicts whichever entry was accessed (read or written) the furthest in the past. The reason this is the canonical practical-coding warmup is that both O(1) operations require coordinating two data structures: a hashmap that maps keys to nodes (so get is O(1)), and a doubly-linked list ordered by recency (so eviction and recency-update are O(1)). Either structure alone forces an O(N) operation. That two-structure coordination is the engineering insight the interviewer wants to see.
The doubly-linked list uses sentinel head and tail nodes. This avoids null checks at the ends and reduces each function body from a tangle of if-statements to a few unconditional pointer assignments (two to unlink a node, four to link one).
Interview Context
LRU cache is asked at almost every senior coding interview at Big Tech, Stripe, Uber, and Cloudflare. It is LeetCode 146 verbatim, but the bar at the practical-engineering level is far higher than passing the LeetCode test cases: you must produce production-shaped code (encapsulation, naming, error handling, optional thread-safety) and answer follow-ups about concurrency, persistence, sharding, and observability. Failing to articulate any of those follow-ups crisply is a common no-hire signal even when the algorithm is correct.
Problem Statement
Design a class LRUCache(capacity) supporting:
- get(key) -> value or None: return the value for key, marking the entry as most-recently-used. Return None (or a sentinel) if not present.
- put(key, value): insert or update. If the cache exceeds capacity, evict the least-recently-used entry.
Both operations must run in O(1) average time.
Constraints
- 1 ≤ capacity ≤ 10^5
- Keys are hashable; values are arbitrary
- get and put may be called up to 10^7 times in benchmarks
- Thread-safe variant: any number of concurrent callers
Clarifying Questions
- Are keys always hashable, or can they be raw bytes / mutable objects? (Assume hashable.)
- Does put of an existing key count as a “use” for LRU ordering? (Yes, by convention.)
- Is the API allowed to return a sentinel for missing keys, or must it raise? (Both are defensible; pick one and document.)
- Must the cache be thread-safe? (Often “we’ll get to that” — write the single-threaded version first, then add a lock when asked.)
- Eviction callback (notify on evict) needed? (Often a follow-up; design so it can be added without changing the call sites.)
Examples
cache = LRUCache(2)
cache.put(1, "a") # state: [1]
cache.put(2, "b") # state: [2, 1]
cache.get(1) -> "a" # state: [1, 2]
cache.put(3, "c") # evict 2; state: [3, 1]
cache.get(2) -> None
cache.put(1, "z") # update; state: [1, 3]
cache.get(3) -> "c" # state: [3, 1]
Initial Brute Force
A dict plus a list that records the access order. Each get does a linear scan to remove and re-append the key. O(N) per op.
class LRUCacheBrute:
    def __init__(self, cap):
        self.cap = cap
        self.store = {}
        self.order = []  # least-recent at front

    def get(self, k):
        if k not in self.store:
            return None
        self.order.remove(k)
        self.order.append(k)
        return self.store[k]

    def put(self, k, v):
        if k in self.store:
            self.order.remove(k)
        elif len(self.store) >= self.cap:
            old = self.order.pop(0)
            del self.store[old]
        self.store[k] = v
        self.order.append(k)
Brute Force Complexity
O(N) per get / put because list.remove and list.pop(0) are O(N). At N=10^5 with 10^7 calls, this is 10^12 operations — it will not finish.
Optimization Path
Replace the order-tracking list with a doubly-linked list, and make the hashmap map each key to its node. Now removal and append are O(1), so each operation is O(1). The cost is memory: every entry now carries a node object with two extra pointers and a reference to its key, so the per-entry footprint is a constant factor larger than in the brute force. That is the standard space-time tradeoff and is acceptable.
Final Expected Approach
- Node: (key, value, prev, next). Storing the key in the node is essential: when we evict the LRU node, we need its key to remove it from the hashmap.
- Sentinels: head (most recent) and tail (least recent) sentinel nodes that always exist. Real nodes live between them.
- _add_after_head(node): insert immediately after head.
- _remove(node): splice out by relinking neighbors.
- _touch(node): remove + add after head. This is the recency update.
- get(k): hashmap lookup → if hit, _touch(node) and return value; else None.
- put(k, v): if exists, update value and _touch; else create node, _add_after_head, hashmap insert, evict if over capacity (the node before tail).
Data Structures Used
| Structure | Purpose |
|---|---|
| dict | key → node lookup, O(1) |
| Doubly-linked list (with sentinels) | recency ordering, O(1) splice |
| RLock (thread-safe variant) | guards both structures together |
Correctness Argument
The invariants we maintain after every get or put:
- The hashmap and the linked list contain exactly the same set of keys.
- The list size never exceeds capacity.
- The node immediately after head was the most-recently accessed; the node immediately before tail is the LRU candidate.
Each of get, put, _touch, _remove, _add_after_head preserves all three invariants by inspection of the pointer assignments in each splice (two to unlink, four to link). The eviction step in put is the only place that may shrink both structures; it removes exactly one node from both, using the key stored in the node to delete the matching hashmap entry.
Complexity
- get: O(1) average (hashmap lookup + 6 pointer writes: 2 to unlink, 4 to relink)
- put: O(1) average
- Space: O(capacity)
Implementation Requirements
A complete working implementation is required. Below is the canonical Python version with thread-safety and a clean separation between the storage (the linked list + hashmap) and the policy (eviction order). Tests are required.
import threading
from typing import Hashable, Any, Optional


class _Node:
    __slots__ = ("key", "val", "prev", "next")

    def __init__(self, key=None, val=None):
        self.key, self.val = key, val
        self.prev = self.next = None


class LRUCache:
    """Thread-safe O(1) LRU cache.

    Concurrency: a single RLock guards both the hashmap and the linked list.
    The locked region is short (constant pointer work), so contention is low
    until very high concurrency. For higher concurrency, see follow-up #1.
    """

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._cap = capacity
        self._map: dict[Hashable, _Node] = {}
        self._head, self._tail = _Node(), _Node()
        self._head.next = self._tail
        self._tail.prev = self._head
        self._lock = threading.RLock()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            node = self._map.get(key)
            if node is None:
                return None
            self._touch(node)
            return node.val

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            node = self._map.get(key)
            if node is not None:
                node.val = value
                self._touch(node)
                return
            node = _Node(key, value)
            self._map[key] = node
            self._add_after_head(node)
            if len(self._map) > self._cap:
                lru = self._tail.prev
                self._remove(lru)
                del self._map[lru.key]

    def __len__(self) -> int:
        with self._lock:
            return len(self._map)

    def _add_after_head(self, node: _Node) -> None:
        node.prev = self._head
        node.next = self._head.next
        self._head.next.prev = node
        self._head.next = node

    def _remove(self, node: _Node) -> None:
        node.prev.next = node.next
        node.next.prev = node.prev

    def _touch(self, node: _Node) -> None:
        self._remove(node)
        self._add_after_head(node)
Tests
Required: unit tests for the contract, smoke tests for ordering, concurrency tests for the thread-safe variant.
import unittest, threading, random


class TestLRU(unittest.TestCase):
    def test_basic(self):
        c = LRUCache(2)
        c.put(1, "a"); c.put(2, "b")
        self.assertEqual(c.get(1), "a")
        c.put(3, "c")  # evicts 2
        self.assertIsNone(c.get(2))
        self.assertEqual(c.get(3), "c")
        self.assertEqual(c.get(1), "a")

    def test_update_is_a_use(self):
        c = LRUCache(2)
        c.put(1, "a"); c.put(2, "b"); c.put(1, "z")
        c.put(3, "c")  # evicts 2 (1 was just updated)
        self.assertEqual(c.get(1), "z")
        self.assertIsNone(c.get(2))

    def test_capacity_one(self):
        c = LRUCache(1)
        c.put(1, "a"); c.put(2, "b")
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")

    def test_concurrent(self):
        c = LRUCache(100)

        def worker():
            for _ in range(10000):
                k = random.randint(0, 200)
                if random.random() < 0.5:
                    c.put(k, k * 2)
                else:
                    c.get(k)

        threads = [threading.Thread(target=worker) for _ in range(8)]
        for t in threads: t.start()
        for t in threads: t.join()
        self.assertLessEqual(len(c), 100)  # invariant: never over capacity
Follow-up Questions
(1) How would you make it thread-safe? Already shown: a single RLock around the body of every public method. The lock is held only for O(1) work, so contention is bounded. For higher concurrency, shard by hash(key) % N into N independent caches; this is the practical answer for production caches at high QPS. A lock-free LRU is hard and rarely worth it.
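The sharding idea in follow-up (1) can be sketched as below. This is a minimal illustration, not a production cache: ShardedLRU and _Shard are hypothetical names, each shard is an OrderedDict-backed LRU with its own lock, and the per-shard capacity split (ceiling of capacity / shards) is an assumption. Total capacity is therefore approximate, and eviction is LRU within a shard rather than across the whole cache.

```python
import threading
from collections import OrderedDict
from typing import Any, Hashable, Optional


class _Shard:
    """One independently locked LRU shard (hypothetical helper)."""

    def __init__(self, capacity: int):
        self._cap = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._cap:
                self._data.popitem(last=False)  # evict this shard's LRU entry

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)


class ShardedLRU:
    """Routes each key to one of N shards by hash(key) % N (illustrative)."""

    def __init__(self, capacity: int, shards: int = 8):
        if capacity <= 0 or shards <= 0:
            raise ValueError("capacity and shards must be positive")
        per_shard = -(-capacity // shards)  # ceiling division (assumed split)
        self._shards = [_Shard(per_shard) for _ in range(shards)]

    def _shard_for(self, key: Hashable) -> _Shard:
        return self._shards[hash(key) % len(self._shards)]

    def get(self, key: Hashable) -> Optional[Any]:
        return self._shard_for(key).get(key)

    def put(self, key: Hashable, value: Any) -> None:
        self._shard_for(key).put(key, value)

    def __len__(self) -> int:
        return sum(len(s) for s in self._shards)
```

Two threads touching keys in different shards never contend on the same lock, which is where the throughput gain comes from; the price is that a hot shard can evict while a cold shard still has room.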
(2) How would you persist state across restarts? Snapshot the cache to disk on a configurable interval (write the (key, value, recency-rank) triples in LRU order). On restart, replay the file in order. For stricter durability, write a per-put log entry to a write-ahead log; on restart, replay snapshot + log. Note: most caches choose not to persist — losing the cache on restart is usually fine, and persistence adds complexity for small benefit.
(4) How would you observe and monitor it? Emit hit-rate (hits / (hits + misses)) as a gauge; emit eviction count as a counter; emit cache size as a gauge; emit get/put latency as a histogram. Hit rate is the #1 actionable signal. Set an alert on hit rate dropping below the SLO.
(7) How would you handle backpressure? Caches don’t have classical backpressure since there’s no producer queue, but the analog is memory pressure: if the host is short on memory, the cache should shed load. Either (a) a soft size_in_bytes ceiling that triggers eviction beyond capacity, or (b) integrate with a host-level memory pressure signal (cgroup memory accounting on Linux). Decide explicitly which.
(11) What configuration knobs would you expose? capacity (entries), optionally size_bytes (RAM ceiling), optionally eviction_callback. Knobs not to expose: lock granularity, internal data structure choice, snapshot interval (set sensible default and document).
(12) What is the shutdown / draining behavior? The cache holds only in-memory state and runs no background work; on shutdown, optionally write a snapshot while holding the lock, then drop the reference and let GC reclaim the nodes. No draining required; no in-flight work to finish.
Product Extension
LRU caches are the workhorse of CDN edge caches, database query result caches (Redis with allkeys-lru), HTTP reverse proxies, and database buffer pools. Real-world variants: two-level cache (L1 in-process + L2 Redis); size-aware LRU (counts bytes, not entries); adaptive LRU/LFU hybrid (ARC, used by ZFS). The data structure you wrote here is the textbook foundation; the variants tune it for specific workloads.
Language/Runtime Follow-ups
- Python: collections.OrderedDict already implements LRU semantics: move_to_end() for _touch, popitem(last=False) for evict. In an interview, state that you know OrderedDict exists, then implement from scratch because the interviewer wants to see the linked list. Production: prefer OrderedDict or functools.lru_cache (decorator) unless you need eviction callbacks.
- Java: LinkedHashMap with accessOrder=true and an overridden removeEldestEntry is the textbook Java LRU. For thread-safety wrap with Collections.synchronizedMap, or use Caffeine (Guava successor) which is O(1) and concurrent.
- Go: no stdlib LRU; the container/list package gives a linked list, and a map[K]*list.Element gives O(1) lookup. The hashicorp/golang-lru package is the de facto standard in production Go.
- C++: std::list<std::pair<K,V>> plus std::unordered_map<K, list::iterator>. Iterator stability of std::list is the reason this works.
- JS/TS: Map preserves insertion order, so map.delete(k); map.set(k, v) is the recency-update pattern. Eviction is map.delete(map.keys().next().value). This works because Map.prototype.keys() returns keys in insertion order.
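As a point of comparison for the Python bullet above, here is a minimal sketch of an LRU built on collections.OrderedDict. The class name OrderedDictLRU is hypothetical; the sketch is single-threaded and returns None on a miss, matching the contract chosen earlier in this lab.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class OrderedDictLRU:
    """Single-threaded LRU sketch; the dict's end is the most recently used."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._cap = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # plays the role of _touch
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)         # an update counts as a use
        if len(self._data) > self._cap:
            self._data.popitem(last=False)  # evict from the LRU end

    def __len__(self) -> int:
        return len(self._data)
```

It passes the same sequence as the Examples section, which makes it a handy reference oracle when testing a hand-written linked-list version.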
Common Bugs
- Forgetting to update the hashmap when evicting (only removing from the linked list). The next get for the evicted key returns a dangling node. The fix is to read the node’s key before unlinking and use it to delete from the hashmap.
- Storing only the value (not the key) in the node, then having no way to find the hashmap entry to delete on eviction.
- Calling _touch before checking whether the key exists: touches a None.
- In the thread-safe variant, taking the lock in get but not in put, or releasing between the eviction step and the insert step. The whole operation must be atomic.
- Using a non-reentrant Lock and then calling another locked method internally: deadlock. Use RLock if you need to call locked methods from inside a locked method.
Debugging Strategy
If get returns the wrong value: print [(n.key, n.val) for n in walk(self._head)] after each put, where walk is a small helper you write that follows next pointers from the head sentinel to the tail sentinel, and compare to the expected access order. If the cache exceeds capacity: assert len(self._map) <= self._cap after every put; the violation tells you which call broke the invariant. For concurrency bugs (rare under the single-lock design), repeat the concurrent test until a failure repros, for example with pytest --count=1000 (the --count flag comes from the pytest-repeat plugin, not core pytest), then add print(threading.get_ident(), op, key) traces and minimize.
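The checks described above could be packaged as two small helpers. This is a sketch that assumes the field names from the implementation in this lab (_head, _tail, _map, _cap, and nodes with key, val, prev, next); walk and check_lru_invariants are hypothetical names, and the caller is assumed to hold the cache’s lock or to be single-threaded.

```python
def walk(head):
    """Yield real nodes from most to least recent, skipping both sentinels.

    Assumes the tail sentinel is the only node whose next is None.
    """
    node = head.next
    while node.next is not None:
        yield node
        node = node.next


def check_lru_invariants(cache) -> None:
    """Raise AssertionError if the list and the hashmap have drifted apart."""
    nodes = list(walk(cache._head))
    keys = [n.key for n in nodes]
    # Invariant 1: hashmap and list hold exactly the same keys, no duplicates.
    assert len(keys) == len(set(keys)), f"duplicate keys in list: {keys}"
    assert set(keys) == set(cache._map), "hashmap and list disagree on keys"
    # Invariant 2: capacity bound.
    assert len(cache._map) <= cache._cap, "cache exceeds capacity"
    # Each map entry must point at the node that carries its key.
    for key, node in cache._map.items():
        assert node.key == key, f"map entry {key!r} points at the wrong node"
    # Back-links must mirror forward links all the way to the tail sentinel.
    prev = cache._head
    for node in nodes + [cache._tail]:
        assert node.prev is prev, "broken prev pointer"
        prev = node
```

Calling check_lru_invariants(cache) after every operation in a randomized test usually pinpoints the first call that corrupts the structure, rather than the later call where the symptom appears.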
Mastery Criteria
- Implemented LRUCache with sentinel head/tail in <15 minutes from blank screen.
- All four tests pass on first run.
- Articulated invariants (hashmap-list set equality, capacity bound, head/tail recency) without prompting.
- Stated O(1) time and O(capacity) space unprompted.
- Answered follow-ups #1, #4, #7, #11, #12 in <90 seconds each, naming alternatives.
- Refactored a single-threaded version into the thread-safe version in <5 minutes.
Lab 02 — LFU Cache
Goal
Implement an LFUCache with O(1) get and put. The challenge over LRU is the tie-breaking rule: when multiple keys share the minimum frequency, evict the least-recently-used among them. Expect this to take 30 minutes from blank screen on first attempt; aim to bring it down to 20 with practice.
Background Concepts
LFU evicts the least-frequently-used entry. Frequency means “number of get plus put calls referencing that key since insertion”. A naive frequency-counter approach forces an O(N) eviction scan. The O(1) trick is to bucket nodes by frequency: a hashmap from freq → DoublyLinkedList plus a min_freq cursor. On eviction, pop the LRU entry from freq_map[min_freq]. On a hit, move the node from freq_map[f] to freq_map[f+1]; when freq_map[min_freq] empties, advance min_freq.
This is a textbook example of bucket sort applied to a dynamic counter. Each frequency bucket is itself an LRU list, which handles tie-breaking in O(1). Together you get O(1) for both ops at the cost of more code than LRU.
Interview Context
LFU follows LRU as the second-asked cache problem at senior practical interviews. It is LeetCode 460 verbatim. Where LRU is a 20-minute problem, LFU is a 30-to-45-minute problem and tests whether you can design with two coordinated abstractions (the freq map and the per-bucket LRU lists). Most candidates fail by picking a frequency-only design and then hitting the tie-breaking case.
Problem Statement
Design LFUCache(capacity):
- get(key) -> value or None: increment frequency on hit.
- put(key, value): insert/update; on capacity overflow, evict the LFU entry. Tie-break by LRU within the LFU bucket. Insertion sets frequency to 1.
Both O(1) average.
Constraints
- 1 ≤ capacity ≤ 10^5
- 10^7 ops in benchmarks
- Thread-safety: optional follow-up
Clarifying Questions
- Does put of an existing key increment frequency? (Yes by convention; confirm.)
- Tie-breaking: LRU or arbitrary? (LRU is the standard expectation; confirm.)
- What does get return on miss? (None or sentinel; confirm.)
- Should frequencies decay over time (windowed LFU)? (Often a follow-up; default is “no decay”.)
Examples
c = LFUCache(2)
c.put(1, "a") # freq: {1->1}
c.put(2, "b") # freq: {1->1, 2->1}
c.get(1) -> "a" # freq: {1->2, 2->1}
c.put(3, "c") # evict 2 (freq=1, LRU); state {1, 3}
c.get(2) -> None
c.get(3) -> "c" # freq: {1->2, 3->2}
c.put(4, "d") # tie at freq=2; evict 1 (LRU); state {3, 4}
c.get(1) -> None
Initial Brute Force
dict mapping key → (value, freq, last_access_time). On eviction, scan all entries to find the (min freq, min time) pair. O(N) eviction.
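A minimal sketch of that brute force, for comparison with the O(1) design that follows. LFUCacheBrute and the logical tick counter are illustrative choices rather than part of the lab: a monotonically increasing integer stands in for last_access_time so that ties are deterministic.

```python
from typing import Any, Dict, Hashable, List, Optional


class LFUCacheBrute:
    """O(N) eviction: scan for the smallest (freq, last_access) pair."""

    def __init__(self, capacity: int):
        self._cap = capacity
        # key -> [value, freq, last_access]
        self._store: Dict[Hashable, List[Any]] = {}
        self._tick = 0  # logical clock (assumption: replaces wall-clock time)

    def _next_tick(self) -> int:
        self._tick += 1
        return self._tick

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        entry[1] += 1
        entry[2] = self._next_tick()
        return entry[0]

    def put(self, key: Hashable, value: Any) -> None:
        if self._cap <= 0:
            return
        entry = self._store.get(key)
        if entry is not None:
            entry[0] = value
            entry[1] += 1
            entry[2] = self._next_tick()
            return
        if len(self._store) >= self._cap:
            # Linear scan: lowest frequency first, then least recently used.
            victim = min(
                self._store,
                key=lambda k: (self._store[k][1], self._store[k][2]),
            )
            del self._store[victim]
        self._store[key] = [value, 1, self._next_tick()]
```

The min(...) call is the O(N) step that the frequency-bucket design removes.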
Brute Force Complexity
get is O(1), but put with eviction is O(N). With N = 10^5 and 10^7 calls, the worst case is on the order of 10^12 steps: fine for small inputs, but it will hit the time limit (TLE) at large N.
Optimization Path
Replace the linear eviction scan with frequency bucketing. Maintain:
- key_to_node: dict[K, Node] for O(1) lookup
- freq_to_list: dict[int, DoublyLinkedList] mapping frequency to an LRU list of nodes
- min_freq: int tracking the current minimum frequency present
On get(k): look up node, remove from freq_to_list[node.freq], increment node.freq, append to freq_to_list[node.freq]. If the old bucket is now empty and it equaled min_freq, increment min_freq.
On put(k, v): if exists, update value and behave like get. Otherwise, evict from freq_to_list[min_freq] if at capacity (pop the front = LRU), then insert with freq=1 and reset min_freq=1.
The reset of min_freq=1 on insertion is the only step that’s not obvious; without it you’d evict the wrong bucket on the very next put.
Final Expected Approach
Three layers: Node, per-frequency DoublyLinkedList (with sentinels), LFUCache orchestrating the two maps. The per-bucket linked list handles LRU tie-breaking automatically — append on insert, pop from front on evict.
Data Structures Used
| Structure | Purpose |
|---|---|
| dict[K, Node] | O(1) key lookup |
| dict[int, DLList] | O(1) frequency → bucket |
| Doubly-linked list per bucket | LRU within frequency, O(1) splice |
| min_freq: int | O(1) eviction target |
Correctness Argument
Invariants maintained after every operation:
- key_to_node and the union of all freq_to_list[f] contain the same set of keys.
- Every node n lives in freq_to_list[n.freq] and only there.
- min_freq is the smallest f such that freq_to_list[f] is non-empty (or any value when the cache is empty).
- Within each bucket, the front is the LRU and the back is the MRU.
Each method preserves these by case analysis. The subtle case is on get: incrementing node.freq may empty the old bucket. We check if was_min_freq_bucket and now_empty: min_freq += 1. This is correct for two reasons. First, the old bucket held only this node (removing it emptied the bucket), so every other key has frequency ≥ old min_freq + 1. Second, the bumped node itself now sits at exactly old min_freq + 1, so that bucket is non-empty and is the new minimum.
Complexity
- get: O(1)
- put: O(1)
- Space: O(capacity)
Implementation Requirements
from typing import Hashable, Any, Optional


class _Node:
    __slots__ = ("key", "val", "freq", "prev", "next")

    def __init__(self, key=None, val=None, freq=1):
        self.key, self.val, self.freq = key, val, freq
        self.prev = self.next = None


class _DLList:
    """Doubly linked list with sentinels. Front = LRU, back = MRU."""

    def __init__(self):
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head
        self.size = 0

    def append(self, node: _Node) -> None:  # add to back (MRU)
        prev = self.tail.prev
        prev.next = node; node.prev = prev
        node.next = self.tail; self.tail.prev = node
        self.size += 1

    def remove(self, node: _Node) -> None:
        node.prev.next = node.next; node.next.prev = node.prev
        self.size -= 1

    def pop_front(self) -> _Node:  # evict LRU
        node = self.head.next
        self.remove(node)
        return node

    def is_empty(self) -> bool:
        return self.size == 0


class LFUCache:
    def __init__(self, capacity: int):
        # Capacity 0 is accepted as a no-op cache (put returns early and
        # test_capacity_zero relies on it); only negative values are rejected.
        if capacity < 0:
            raise ValueError("capacity must be non-negative")
        self._cap = capacity
        self._key_to_node: dict[Hashable, _Node] = {}
        self._freq_to_list: dict[int, _DLList] = {}
        self._min_freq = 0

    def get(self, key: Hashable) -> Optional[Any]:
        node = self._key_to_node.get(key)
        if node is None:
            return None
        self._bump(node)
        return node.val

    def put(self, key: Hashable, value: Any) -> None:
        if self._cap == 0:
            return
        node = self._key_to_node.get(key)
        if node is not None:
            node.val = value
            self._bump(node)
            return
        if len(self._key_to_node) >= self._cap:
            evicted = self._freq_to_list[self._min_freq].pop_front()
            del self._key_to_node[evicted.key]
        node = _Node(key, value, freq=1)
        self._key_to_node[key] = node
        self._freq_to_list.setdefault(1, _DLList()).append(node)
        self._min_freq = 1  # a fresh key is always the new minimum

    def _bump(self, node: _Node) -> None:
        old_list = self._freq_to_list[node.freq]
        old_list.remove(node)
        if old_list.is_empty() and node.freq == self._min_freq:
            self._min_freq += 1
        node.freq += 1
        self._freq_to_list.setdefault(node.freq, _DLList()).append(node)

    def __len__(self) -> int:
        return len(self._key_to_node)
Tests
import unittest


class TestLFU(unittest.TestCase):
    def test_basic_eviction(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")
        self.assertEqual(c.get(1), "a")  # 1.freq=2, 2.freq=1
        c.put(3, "c")  # evict 2
        self.assertIsNone(c.get(2))
        self.assertEqual(c.get(3), "c")  # 3.freq=2

    def test_tie_break_lru(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")  # both freq=1; 1 is LRU
        c.put(3, "c")  # evict 1 (LRU at freq=1)
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")

    def test_update_increments_freq(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b"); c.put(1, "z")  # 1.freq=2
        c.put(3, "c")  # evict 2 (freq=1)
        self.assertEqual(c.get(1), "z")
        self.assertIsNone(c.get(2))

    def test_capacity_zero(self):
        c = LFUCache(0)
        c.put(1, "a")
        self.assertIsNone(c.get(1))

    def test_min_freq_advance(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")
        c.get(1); c.get(1); c.get(2); c.get(2)  # both freq=3
        c.put(3, "c")  # evict 1 (LRU at freq=3)
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")
Follow-up Questions
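(1) Thread-safe? A single RLock around the bodies of get and put is correct and the standard production answer; _bump is only ever called from inside them, so it runs under the same lock. The locked region is O(1), so contention is bounded. Sharding by hash(key) % N into N independent LFU caches is the higher-throughput choice. Each shard then keeps its own frequency buckets and min_freq, which is fine because frequencies are tracked per key; the tradeoff is that eviction becomes LFU within a shard rather than across the whole cache.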
(4) Observe and monitor? Hit rate (hits / (hits + misses)) as a gauge; eviction count as a counter; frequency distribution as a histogram (10th, 50th, 90th, 99th percentile of frequency at eviction time) — this tells you whether the cache is actually being used “frequency-aware” or whether everything has freq=1. Cache size as a gauge.
(9) Eviction policy / cleanup? This is the eviction policy. The catch: long-tail entries can pile up at high frequency and never get evicted even after they go cold (a value popular yesterday is still freq=1000 today). Solutions: windowed LFU (decay frequencies on a timer), LFU-Aging (halve all frequencies periodically), or TinyLFU (admission filter that uses a count-min sketch). State the limitation and pick a mitigation.
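As a rough illustration of LFU-Aging, the periodic halving step could look like the function below. It operates on a plain key → count dict rather than on this lab’s bucket structure (re-bucketing the nodes after aging is left out), and age_counts is a hypothetical name.

```python
from typing import Dict, Hashable


def age_counts(counts: Dict[Hashable, int]) -> Dict[Hashable, int]:
    """Return a new mapping with every frequency halved, never below 1."""
    return {key: max(1, freq // 2) for key, freq in counts.items()}
```

Run on a timer, repeated halving lets an entry that was hot yesterday drift back toward freq=1, where it becomes eligible for eviction again.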
(10) Consistency model? Linearizable in a single process under the lock — every get/put appears to take effect instantly at some moment between invocation and return. Replicated LFU is harder; most distributed caches degrade to eventual consistency on the cache and rely on an authoritative store underneath.
(11) Configuration knobs? capacity, optionally decay_interval (for windowed LFU), optionally eviction_callback. Don’t expose internal data-structure tuning.
Product Extension
LFU is the right policy when access patterns are stationary (popular items stay popular). It outperforms LRU on workloads with strong popularity skew (web caches, recommendation systems, query result caches), and it resists one-time scans better than LRU: a large scan flushes an LRU, whereas in an LFU the scanned keys arrive at freq=1 and are the first to be evicted. It performs worse than LRU when popularity shifts over time, because cold-but-formerly-frequent entries keep their high counts and crowd out newly popular keys until aging kicks in. TinyLFU (used in Caffeine) combines a count-min admission filter with a small LRU window, getting LFU’s hit rate without the staleness problem.
Language/Runtime Follow-ups
- Python: same approach as shown. collections.Counter for frequencies is tempting but doesn’t give the bucketing we need.
- Java: use LinkedHashSet<K> for the per-frequency buckets (it preserves insertion order) and HashMap<Integer, LinkedHashSet<K>> for the freq map. Caffeine provides production-grade TinyLFU.
- Go: container/list per bucket, map[int]*list.List for buckets. The groupcache library uses a different policy (LRU); for LFU, write it yourself or use dgraph-io/ristretto (TinyLFU).
- C++: std::list<Node> per bucket; std::unordered_map<int, std::list<Node>> for freq map.
- JS/TS: Map<number, Set<K>> for buckets. Set preserves insertion order, so LRU-within-bucket is free.
Common Bugs
- Forgetting to advance min_freq when the min-frequency bucket becomes empty after a _bump. Subsequent eviction picks an empty bucket and crashes.
- Resetting min_freq=1 on insert before the eviction step instead of after, evicting the wrong bucket.
- Tie-breaking by MRU instead of LRU: appending to the front of the bucket instead of the back. The bucket is itself an LRU list; respect the convention.
- Sharing a single _DLList across freq buckets accidentally (e.g., a class-level default). Use setdefault(freq, _DLList()).
- On put of an existing key, decrementing-then-incrementing frequency instead of bumping: produces correct frequency but breaks the bucketing if you forget to move buckets.
Debugging Strategy
Print the bucketing as {freq: [keys]} after every operation. The bug usually shows as a stale entry in min_freq’s bucket or a missing entry in the new bucket. Add a _check_invariants() method that walks the buckets and asserts (a) every node sits in the bucket for its own freq, (b) min_freq correctness, (c) hashmap-bucket set equality, and call it after every operation in tests. Run the test suite with assertions on (that is, without python -O).
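One possible shape for that checker, written as a free function so it can be dropped into a test file. It assumes the field names used in this lab’s implementation (_key_to_node, _freq_to_list, _min_freq, and buckets exposing head, tail, and size); check_lfu_invariants is a hypothetical name.

```python
def check_lfu_invariants(cache) -> None:
    """Raise AssertionError if the LFU bookkeeping is inconsistent."""
    seen = {}
    for freq, bucket in cache._freq_to_list.items():
        node, count = bucket.head.next, 0
        while node is not bucket.tail:
            # (a) every node sits in the bucket matching its own frequency
            assert node.freq == freq, (
                f"key {node.key!r} has freq {node.freq} in bucket {freq}"
            )
            assert node.key not in seen, f"key {node.key!r} is in two buckets"
            seen[node.key] = node
            node, count = node.next, count + 1
        assert count == bucket.size, f"bucket {freq} size field is stale"
    # (c) the hashmap and the buckets hold exactly the same nodes
    assert seen.keys() == cache._key_to_node.keys(), "hashmap and buckets disagree"
    for key, node in cache._key_to_node.items():
        assert seen[key] is node, f"map entry {key!r} points at a different node"
    # (b) min_freq names the smallest non-empty bucket (unconstrained when empty)
    non_empty = [f for f, b in cache._freq_to_list.items() if b.size > 0]
    if non_empty:
        assert cache._min_freq == min(non_empty), "min_freq is stale"
```

Because it only reads the structure, it is safe to call after every get and put in a randomized test; the first failing assertion names the invariant that broke.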
Mastery Criteria
- Implemented LFUCache from blank screen in <30 minutes (target: <25 after second attempt).
- All five tests pass first run.
- Stated tie-breaking-via-per-bucket-LRU explicitly without prompting.
- Articulated the min_freq advancement rule precisely.
- Answered follow-ups #1, #4, #9 (LFU-Aging / TinyLFU), #10, #11 in <90 seconds each.
- Compared LFU vs LRU on scan-heavy and popularity-skewed workloads in <60 seconds.
Lab 03 — Rate Limiter
Goal
Implement four rate-limiting algorithms — token bucket, leaky bucket, sliding window log, sliding window counter — and articulate the tradeoffs between them. After this lab you should be able to pick the right algorithm for a stated workload in under 30 seconds and implement any of the four in under 15 minutes.
Background Concepts
A rate limiter caps the number of requests a key (user, IP, API token) may make over a time window. The four standard algorithms differ in how much history they keep and what kind of bursts they allow:
- Token bucket: a bucket of capacity B refills at rate R per second; each request consumes one token; if no token, reject. Allows bursts up to B.
- Leaky bucket: requests enter a queue of capacity Q that drains at rate R; if the queue is full, reject. Smooths bursts (output rate is constant).
- Sliding window log: keep a list of timestamps over the last W seconds; reject if len(log) ≥ N. Most accurate, most memory.
- Sliding window counter: keep a count for the current and previous fixed window; estimate by linear interpolation. Cheap; mildly inaccurate at boundaries.
The token bucket is by far the most-common production choice (used by AWS, Stripe, GitHub) because it gives sensible burst tolerance with O(1) memory per key. Sliding window log is the choice when you must guarantee strict request-count caps (e.g., quota enforcement against a legal contract). Leaky bucket is used in network shaping. Sliding window counter is the right pick when memory is constrained and approximate is acceptable.
Interview Context
Rate limiter is asked at every senior+ practical round at Stripe, Cloudflare, Uber, and most high-scale API companies. The strong answer compares the four algorithms, picks one, justifies the pick, implements it, and answers the inevitable follow-ups about distributed coordination (multiple servers must share one quota), persistence, and observability. The weak answer implements one variant without acknowledging the others exist.
Problem Statement
Design RateLimiter with allow(key) -> bool. Configurable rate R requests per W seconds. Implement all four algorithms behind a common interface so they can be benchmarked.
Constraints
- Up to 10^6 distinct keys
- Up to 10^5 QPS aggregate
- Sub-millisecond per-call latency
- Configurable rate per key (follow-up)
Clarifying Questions
- Per-key or global limit? (Per-key by convention.)
- Should refused requests be queued or rejected? (Token bucket rejects; leaky bucket queues.)
- Time source: monotonic clock or wall clock? (Always monotonic — wall clock can jump backward.)
- Distributed across N servers, or single-process? (Often a follow-up; default single-process.)
- Burst tolerance — yes or no? (Token bucket allows bursts; sliding window log enforces strict.)
Examples
Rate = 5 req / 1 s.
Token bucket (capacity 5, refill 5/s):
t=0: 5 quick requests → all allow (bucket drains 5→0)
t=0.1: 1 request → reject (bucket=0.5 < 1)
t=1.0: 1 request → allow (bucket refilled to 5; now 4)
Sliding window log (limit 5 over 1 s):
t=0..0.5: 5 requests → all allow
t=0.6: 1 request → reject (6 in last 1 s)
t=1.1: 1 request → allow (the t=0 request slid out)
Initial Brute Force
A dict[key, list[timestamp]] and on every allow, filter out timestamps older than W and check len. This is the sliding window log baseline; it is O(history-size) per call and unbounded memory. Acceptable for low-rate testing; not acceptable for production at 10^5 QPS.
Brute Force Complexity
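Per call: O(N) where N is the request count in the window. Memory: O(N · keys) in the worst case. The live total is bounded by aggregate rate × window: 10^5 QPS over a 1-second window is about 10^5 timestamps, but the same traffic over a 1-hour window is 3.6 × 10^8 timestamps (roughly 3 GB at 8 bytes each, before container overhead), and entries for idle keys are never trimmed unless something touches them.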
Optimization Path
For each algorithm, the optimization target is different:
- Token bucket: store (tokens, last_refill_time) per key. Compute refill on demand: tokens = min(B, tokens + (now - last_refill) * R). O(1) per call, O(1) memory per key.
- Leaky bucket: equivalent to token bucket if reject-on-full; if it queues requests, store the queue.
- Sliding window log: same as brute force, but trim the prefix lazily on each call. Amortized O(1) per call.
- Sliding window counter: store (curr_count, prev_count, curr_window_start). Approximate the rolling count with prev * (1 - elapsed/W) + curr. O(1) per call, O(1) memory.
The token bucket is the dominant choice; the other three are presented for comparison.
Final Expected Approach
Define a RateLimiter interface with allow(key) -> bool. Implement four classes. Each takes rate: float (per second) and capacity (or window). Use time.monotonic(). Make all four thread-safe via per-key fine-grained locks (a dict[key, Lock] lazily created — or just a single global lock, which is simpler and adequate for most workloads).
Data Structures Used
| Algorithm | Per-key state |
|---|---|
| Token bucket | (float tokens, float last_refill_t) |
| Leaky bucket (reject) | (float queue_size, float last_drain_t) |
| Sliding log | deque[float] of timestamps |
| Sliding counter | (int curr, int prev, float window_start) |
Correctness Argument
Token bucket: tokens are produced at rate R continuously and capped at B; consumption is one per allowed request. Equivalent to the differential equation d(tokens)/dt = R - consumption, integrated by the lazy refill formula. Correct provided we never let tokens go below 0 (we check >= 1 before decrementing) or above B (the min(B, ...)).
Sliding window log: the invariant is “at any time, the deque contains exactly the timestamps in [now - W, now]”. We maintain it by trimming the prefix on every call. Then allow is len(deque) < N.
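Sliding window counter: the approximation is estimate = prev_count * (1 - elapsed/W) + curr_count. This is exact when requests in the previous window were uniformly distributed. Otherwise it can err in either direction: it over-counts when the previous window’s requests were clustered early (they have already left the true window) and under-counts when they were clustered late. The error is bounded by the previous window’s count. Acceptable for most production rate limiters.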
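A worked instance of the sliding window counter estimate may help. This is a sketch only; `sliding_counter_estimate` is a hypothetical helper name, and the numbers are made up for illustration.

```python
def sliding_counter_estimate(prev_count, curr_count, elapsed, window):
    """Weighted estimate of requests in the trailing window.

    elapsed is the time since the current fixed window started
    (0 <= elapsed < window).
    """
    weight = 1.0 - elapsed / window  # share of the previous window still in view
    return prev_count * weight + curr_count


# 4 requests in the previous window, 2 so far, 0.25 s into a 1 s window:
# 4 * 0.75 + 2 = 5.0
est = sliding_counter_estimate(4, 2, 0.25, 1.0)
```

If those 4 earlier requests had all arrived in the first 0.1 s of the previous window, the true trailing count would be 2 and the estimate of 5 over-counts. If they had all arrived in its last 0.1 s, the true count would be 6 and the estimate under-counts.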
Complexity
| Algorithm | Time / call | Space / key |
|---|---|---|
| Token bucket | O(1) | O(1) |
| Leaky bucket | O(1) | O(1) |
| Sliding log | O(1) amortized | O(N) |
| Sliding counter | O(1) | O(1) |
Implementation Requirements
import time, threading
from collections import deque
from typing import Hashable


class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self._rate, self._cap = rate, capacity
        self._state: dict[Hashable, list[float]] = {}  # key -> [tokens, last_t]
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            s = self._state.get(key)
            if s is None:
                s = [self._cap, now]  # new keys start with a full bucket
                self._state[key] = s
            tokens, last = s
            # Lazy refill, capped at capacity.
            tokens = min(self._cap, tokens + (now - last) * self._rate)
            if tokens >= 1:
                s[0], s[1] = tokens - 1, now
                return True
            s[0], s[1] = tokens, now
            return False


class SlidingWindowLog:
    def __init__(self, max_in_window: int, window_s: float):
        self._max, self._w = max_in_window, window_s
        self._logs: dict[Hashable, deque[float]] = {}
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            dq = self._logs.setdefault(key, deque())
            cutoff = now - self._w
            # Trim timestamps that have slid out of the window.
            while dq and dq[0] < cutoff:
                dq.popleft()
            if len(dq) >= self._max:
                return False
            dq.append(now)
            return True


class SlidingWindowCounter:
    def __init__(self, max_in_window: int, window_s: float):
        self._max, self._w = max_in_window, window_s
        self._state: dict[Hashable, list] = {}  # key -> [curr, prev, window_start]
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            s = self._state.get(key)
            if s is None:
                s = [0, 0, now]
                self._state[key] = s
            curr, prev, ws = s
            elapsed = now - ws
            if elapsed >= 2 * self._w:
                # Idle for two full windows: both counters are stale.
                curr = prev = 0
                ws, elapsed = now, 0.0
            elif elapsed >= self._w:
                # Roll over: the current window becomes the previous one.
                prev, curr = curr, 0
                ws += self._w
                elapsed -= self._w
            estimate = prev * (1 - elapsed / self._w) + curr
            if estimate >= self._max:
                s[0], s[1], s[2] = curr, prev, ws
                return False
            s[0], s[1], s[2] = curr + 1, prev, ws
            return True


class LeakyBucket:
    """Reject-on-full leaky bucket. Equivalent to token bucket when rejecting."""

    def __init__(self, rate: float, capacity: float):
        self._rate, self._cap = rate, capacity
        self._state: dict[Hashable, list[float]] = {}  # key -> [level, last_t]
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            s = self._state.get(key)
            if s is None:
                s = [0.0, now]
                self._state[key] = s
            level, last = s
            # Drain at the configured rate since the last call.
            level = max(0.0, level - (now - last) * self._rate)
            if level + 1 > self._cap:
                s[0], s[1] = level, now
                return False
            s[0], s[1] = level + 1, now
            return True
Tests
import unittest, time


class TestTokenBucket(unittest.TestCase):
    def test_burst_then_steady(self):
        rl = TokenBucket(rate=5, capacity=5)
        # Burst: 5 quick allows
        for _ in range(5):
            self.assertTrue(rl.allow("k"))
        # 6th: reject (bucket empty)
        self.assertFalse(rl.allow("k"))
        # Wait 0.4s → 2 tokens accumulated
        time.sleep(0.4)
        self.assertTrue(rl.allow("k"))
        self.assertTrue(rl.allow("k"))

    def test_per_key_isolation(self):
        rl = TokenBucket(rate=1, capacity=1)
        self.assertTrue(rl.allow("a"))
        self.assertTrue(rl.allow("b"))  # different key, full bucket
        self.assertFalse(rl.allow("a"))


class TestSlidingLog(unittest.TestCase):
    def test_strict_count(self):
        rl = SlidingWindowLog(max_in_window=3, window_s=1.0)
        for _ in range(3):
            self.assertTrue(rl.allow("k"))
        self.assertFalse(rl.allow("k"))
        time.sleep(1.05)
        self.assertTrue(rl.allow("k"))  # log has slid out
Follow-up Questions
(3) How would you scale to N nodes? This is the key follow-up. Options: (a) sticky routing — route all requests for a key to a fixed node by hash(key) % N; each node enforces locally. Simple, but rebalancing on add/remove is painful. (b) Centralized counter in Redis using INCR + EXPIRE per window. Network round-trip per call — only works at moderate QPS. (c) Approximate distributed: each node enforces R/N locally and accepts that bursts up to R are possible. The pragmatic real-world answer; documents your error budget. (d) Token bucket in Redis with Lua script for atomic refill+decrement — Stripe and GitHub do this in production.
(7) How would you handle backpressure? The whole point of a rate limiter is backpressure on upstream traffic. The question becomes: when the limiter rejects, what does the client see? HTTP 429 with Retry-After header is the standard. Enable cooperative backoff so clients don’t retry-storm. Optionally include X-RateLimit-Remaining and X-RateLimit-Reset headers (GitHub convention).
(9) What’s the eviction policy? Per-key state grows unbounded if you never clean up. Two strategies: (a) lazy expiry: when a key has been silent for >2W, drop or reset its state on the next access. (b) Background scavenger: periodically scan and remove stale entries. Lazy expiry adds no work beyond the access itself, but it never reclaims keys that stop sending traffic, so pair it with an occasional sweep when the key space is large.
(11) What configuration knobs? rate and capacity (or window) per limit class. Optionally per-key overrides (a dict[key, (rate, cap)] for VIP customers). Knobs not to expose: the algorithm choice (pick one and stick).
(4) How would you observe / monitor? Allow rate (counter), reject rate (counter), reject ratio (gauge), per-key reject rate (top-K dashboard for finding hot keys). Bucket-fill / queue-length gauge for diagnosing whether you’re rate-limited because of bursts or steady overload.
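The two eviction strategies from follow-up (9) can be combined in one small store. The sketch below is an assumption-laden illustration: `IdleEvictingState`, its `ttl` parameter, and the injectable `clock` are hypothetical names, and the class is deliberately not thread-safe (a real limiter would call it under its existing lock).

```python
import time


class IdleEvictingState:
    """Per-key state store that drops keys idle for longer than ttl seconds."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._state = {}  # key -> (value, last_access_time)

    def get(self, key, default_factory):
        now = self._clock()
        entry = self._state.get(key)
        # Lazy expiry: a stale entry is treated as absent and rebuilt.
        if entry is None or now - entry[1] > self._ttl:
            value = default_factory()
        else:
            value = entry[0]
        self._state[key] = (value, now)
        return value

    def sweep(self) -> int:
        """Background-scavenger step: remove every stale key; return the count."""
        now = self._clock()
        stale = [k for k, (_, t) in self._state.items() if now - t > self._ttl]
        for k in stale:
            del self._state[k]
        return len(stale)

    def __len__(self) -> int:
        return len(self._state)
```

With a rate limiter, `ttl` would typically be set to about 2W, and `sweep()` would run from a timer thread or every few thousand calls.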
Product Extension
Stripe’s API uses token bucket with per-account capacity. AWS API Gateway uses token bucket per stage. GitHub’s API uses sliding window with hourly windows visible to clients. Twitter (X) uses fixed windows for some endpoints, sliding for others. The choice depends on the contract you offer customers (“up to 5 burst requests” → token bucket; “exactly 5000 per hour” → sliding log or sliding counter).
Language/Runtime Follow-ups
- Python: the implementation above. For high QPS, replace the global lock with a lazily created dict[key, Lock], or shard by hash(key) % N.
- Java: Guava’s RateLimiter is a token bucket with smoothing options. For distributed, Bucket4j is excellent.
- Go: golang.org/x/time/rate is a token bucket (Allow/Wait). For distributed, use Redis with a Lua script.
- C++: no stdlib rate limiter; use std::chrono::steady_clock::now() as the time source. Folly has a token bucket.
- JS/TS: bottleneck (npm) is the canonical client-side choice. Server-side: Redis-backed for distributed.
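The Python note about sharding by hash(key) % N can be sketched as lock striping. This is one possible shape under stated assumptions: `StripedTokenBucket` and `n_stripes` are names invented for this example, and the refill logic simply mirrors the lazy-refill formula used elsewhere in this lab.

```python
import threading
import time
from typing import Hashable


class StripedTokenBucket:
    """Token bucket whose keys are spread over n_stripes independent locks."""

    def __init__(self, rate: float, capacity: float, n_stripes: int = 16):
        self._rate, self._cap = rate, capacity
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._shards = [{} for _ in range(n_stripes)]  # key -> [tokens, last_t]

    def allow(self, key: Hashable) -> bool:
        i = hash(key) % len(self._locks)  # a key always maps to the same stripe
        now = time.monotonic()
        with self._locks[i]:
            s = self._shards[i].setdefault(key, [self._cap, now])
            tokens = min(self._cap, s[0] + (now - s[1]) * self._rate)
            allowed = tokens >= 1
            s[0] = tokens - 1 if allowed else tokens
            s[1] = now
            return allowed
```

Threads working on keys in different stripes no longer contend on one lock, while the memory cost stays at a fixed number of locks rather than one per key.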
Common Bugs
- Using time.time() (wall clock) instead of time.monotonic(): clock skew or NTP adjustments cause negative deltas and free tokens.
- Token bucket: not capping at B, so the bucket grows unbounded over idle periods and the first burst is huge.
- Sliding log: not trimming on every call, only on insert, so memory grows for read-heavy patterns.
- Sliding counter: failing to advance the window pointer when 2W has passed (key idle for long enough that both windows are stale).
- Forgetting per-key isolation: a single shared bucket across all keys.
Debugging Strategy
Log every (key, allow/reject, bucket_state) transition for a single key. Hand-trace against expected behavior. For distributed bugs, capture the Lua script’s input and output and replay against a local Redis. For thundering-herd bugs (many clients see “reset” simultaneously and all retry at once), add jitter on Retry-After (server-side recommends Retry-After: random_in_range(t, 2t)).
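The jittered Retry-After recommendation can be computed directly from token bucket state. A minimal sketch, assuming the bucket level and refill rate are available at rejection time; `retry_after_seconds` and its `rng` parameter are hypothetical names. The result is rounded up to whole seconds because the header's delay form carries an integer.

```python
import math
import random


def retry_after_seconds(tokens: float, rate: float, rng=random.random) -> int:
    """Seconds a rejected client should wait, jittered across [t, 2t).

    tokens is the bucket level at rejection time (< 1) and rate is the
    refill rate in tokens per second.
    """
    t = max(0.0, (1.0 - tokens) / rate)  # time until one whole token exists
    jittered = t * (1.0 + rng())         # spread clients across [t, 2t)
    return max(1, math.ceil(jittered))   # whole seconds, never zero
```

Passing a fixed `rng` in tests keeps the result deterministic; in production the default `random.random` spreads retries out.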
Mastery Criteria
- Implemented all four algorithms in <60 minutes total (15 min each).
- All tests pass first run.
- Compared the four algorithms verbally in <90 seconds, naming a workload where each is the right choice.
- Stated why time.monotonic() is required without prompting.
- Answered follow-ups #3 (distributed), #4, #7, #9, #11 in <90 seconds each.
- Identified that a reject-on-full leaky bucket is mathematically equivalent to a token bucket (the two differ once the leaky bucket queues requests instead of rejecting them).
Lab 04 — Task Scheduler
Goal
Implement a TaskScheduler that accepts tasks with priorities, executes them in priority order via a worker pool, retries on failure with exponential backoff, and routes permanently-failed tasks to a dead-letter queue. After this lab you should be able to design and implement a small in-memory task queue with retry semantics in under 35 minutes.
Background Concepts
A task scheduler is the in-memory cousin of Celery / Sidekiq / RQ. The four moving parts are:
- Priority queue of pending tasks (heap keyed on priority + scheduled-execution-time).
- Worker pool that pops tasks and runs them.
- Retry policy that decides if and when a failed task is re-enqueued (with delayed visibility).
- Dead-letter queue (DLQ) for tasks that have exhausted retries.
The non-trivial design question is how to handle delayed re-enqueue for retries. The clean answer is to use a single priority queue keyed by (ready_at, priority), and have workers wait_until_ready on the head of the queue. This unifies “high-priority now” and “low-priority retry-in-30-seconds” under one data structure.
Interview Context
Task scheduler problems are popular at infrastructure-heavy companies (Uber, Cloudflare, AWS Lambda team, Datadog) because they touch concurrency, priority queues, retry semantics, and DLQ design — the building blocks of every async-job system. The interviewer will probe whether you’ve thought about idempotency, exactly-once vs at-least-once, and observability.
Problem Statement
Design TaskScheduler(n_workers, base_backoff):
- submit(task: Callable, priority: int, max_attempts: int = 3) -> task_id: enqueue a task with given priority. Lower numeric priority = runs first.
- start(): start the worker pool.
- shutdown(timeout): stop accepting new tasks; finish in-flight up to timeout; return.
- dead_letters() -> list[FailedTask]: return tasks that exhausted retries.
Behavior: failures (raised exception) → exponential backoff retry up to max_attempts; permanent failure → DLQ.
Constraints
- Up to 10^4 pending tasks
- Up to 100 workers
- Per-task max execution time: 60s (a configurable per-task timeout is a follow-up)
- Tasks may be of arbitrary type but assumed to be deterministic-ish
Clarifying Questions
- Are tasks idempotent? (We’ll assume yes; idempotency is the user’s responsibility for at-least-once correctness.)
- Priority semantics: lower = higher? (Yes by convention, like a min-heap.)
- What does retry mean — the same task is re-run, or a new attempt object? (Same callable, same args, attempt counter incremented.)
- Should retries preserve original priority? (Yes by convention.)
- Cancellation? (Often a follow-up; default no.)
Examples
sched = TaskScheduler(n_workers=2)  # the retry limit is per task, via max_attempts
sched.start()
sched.submit(lambda: print("a"), priority=1)
sched.submit(lambda: print("b"), priority=0) # runs first
sched.submit(failing_task, priority=2, max_attempts=2)
# 1st attempt fails; retry after backoff
# 2nd attempt fails; → DLQ
sched.shutdown(timeout=5.0)
sched.dead_letters() # [FailedTask(failing_task, attempts=2, last_error=...)]
Initial Brute Force
A single thread polling a list sorted by priority. Run, retry inline. Single worker, no parallelism. O(N log N) per submit.
Brute Force Complexity
Per submit: O(N log N) on the sort. Per dispatch: O(N) on the linear scan. Acceptable only for tens of tasks.
Optimization Path
Replace the sorted list with a heapq. Replace the single thread with a worker pool of n_workers threads. Add a Condition variable so workers block when the queue is empty. Add a delayed-execution facility: instead of time.sleep(backoff) in the worker, push the task back with ready_at = now + backoff and key the heap on (ready_at, priority).
Final Expected Approach
Single heap (ready_at, priority, attempt, task_id, callable). A condition variable not_empty wakes workers when something becomes available. Workers loop: peek heap → if ready_at > now, wait until ready_at (or until notified). Pop, run, on success: done. On failure: increment attempt, if under max_attempts, push back with ready_at = now + backoff(attempt); else push to DLQ list. shutdown sets a flag, broadcasts the condition, joins all workers with a deadline.
Data Structures Used
| Structure | Purpose |
|---|---|
| heapq of tuples | priority + delayed-readiness |
| threading.Condition | wait/notify for empty queue and ready-time |
| list (DLQ) | failed tasks |
| dict[task_id, attempt_count] | retry tracking |
Correctness Argument
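Priority ordering: the heap orders by (ready_at, priority, seq), so workers always pop the entry with the smallest ready_at, and priority breaks ties only among entries with equal ready_at. Priority order among ready tasks therefore holds only if immediately runnable tasks share one ready_at value (for example a constant such as 0.0); if each submit stamps its own clock reading, the heap degenerates to submission order. A retry keeps its later ready_at, so once it ripens it still runs after already-ready fresh tasks regardless of priority. Strict priority among all ready tasks needs a second heap keyed on priority that ripe entries are promoted into.
Retry semantics: failure → push back with attempt+1 and ready_at = now + base * 2^(attempt-1) * jitter, where jitter is uniform in [0.5, 1.5). After max_attempts attempts, push to DLQ. Barring a worker crash, the task is never lost: it is either running, in the heap, or in the DLQ, an invariant maintained by every transition.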
Shutdown: setting _stopping = True and broadcasting wakes every blocked worker. Each worker checks _stopping after the wait and exits if true. The join(timeout) per worker bounds total shutdown time.
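For the priority-ordering argument, a two-heap layout is one way to get strict priority among all ready tasks. The sketch below is single-threaded and takes `now` as a parameter to stay deterministic; `TwoHeapQueue`, `pop_ready`, and `next_wakeup` are hypothetical names, and this is a variation on the single-heap design rather than the lab's reference implementation.

```python
import heapq
import itertools


class TwoHeapQueue:
    """Delayed entries wait in one heap; ripe entries compete on priority."""

    def __init__(self):
        self._delayed = []  # (ready_at, seq, priority, item)
        self._ready = []    # (priority, seq, item)
        self._seq = itertools.count()  # unique, so items are never compared

    def push(self, item, priority: int, ready_at: float = 0.0) -> None:
        heapq.heappush(self._delayed, (ready_at, next(self._seq), priority, item))

    def pop_ready(self, now: float):
        """Return the highest-priority item with ready_at <= now, or None."""
        # Promote everything that has ripened into the priority heap.
        while self._delayed and self._delayed[0][0] <= now:
            _, seq, priority, item = heapq.heappop(self._delayed)
            heapq.heappush(self._ready, (priority, seq, item))
        if not self._ready:
            return None
        return heapq.heappop(self._ready)[2]

    def next_wakeup(self):
        """Earliest ready_at still pending, or None if nothing is delayed."""
        return self._delayed[0][0] if self._delayed else None
```

A worker loop would call `pop_ready(time.monotonic())` under the condition variable and, on `None`, wait until `next_wakeup()` or a notify.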
Complexity
- submit: O(log N)
- Worker dispatch: O(log N) per task
- Memory: O(pending + dlq)
Implementation Requirements
import heapq, threading, time, itertools, random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailedTask:
    task_id: int
    callable_repr: str
    attempts: int
    last_error: str


@dataclass(order=True)
class _HeapEntry:
    # Only these three fields take part in heap ordering.
    ready_at: float
    priority: int
    seq: int
    task_id: int = field(compare=False)
    fn: Callable = field(compare=False)
    attempt: int = field(compare=False)
    max_attempts: int = field(compare=False)


class TaskScheduler:
    def __init__(self, n_workers: int = 4, base_backoff: float = 0.1):
        self._n_workers = n_workers
        self._base = base_backoff
        self._heap: list[_HeapEntry] = []
        self._dlq: list[FailedTask] = []
        self._cond = threading.Condition()
        self._stopping = False
        self._workers: list[threading.Thread] = []
        self._seq = itertools.count()
        self._next_id = itertools.count(1)

    def submit(self, fn: Callable, priority: int = 5, max_attempts: int = 3) -> int:
        if max_attempts <= 0:
            raise ValueError("max_attempts must be positive")
        with self._cond:
            if self._stopping:
                raise RuntimeError("scheduler is shutting down")
            tid = next(self._next_id)
            # ready_at=0.0 means "ready now". Fresh tasks tie on ready_at, so
            # ordering falls through to priority, then seq (FIFO).
            e = _HeapEntry(0.0, priority, next(self._seq),
                           tid, fn, attempt=0, max_attempts=max_attempts)
            heapq.heappush(self._heap, e)
            self._cond.notify()
        return tid

    def start(self) -> None:
        for i in range(self._n_workers):
            t = threading.Thread(target=self._run_worker, name=f"w{i}", daemon=True)
            t.start()
            self._workers.append(t)

    def shutdown(self, timeout: float = 5.0) -> None:
        with self._cond:
            self._stopping = True
            self._cond.notify_all()
        deadline = time.monotonic() + timeout
        for w in self._workers:
            w.join(timeout=max(0.0, deadline - time.monotonic()))

    def dead_letters(self) -> list[FailedTask]:
        with self._cond:
            return list(self._dlq)

    def _run_worker(self) -> None:
        while True:
            with self._cond:
                while not self._heap and not self._stopping:
                    self._cond.wait()
                if self._stopping and not self._heap:
                    return
                head = self._heap[0]
                wait = head.ready_at - time.monotonic()
                if wait > 0:
                    # Head is a delayed retry: sleep until it ripens or a
                    # notify signals that the heap changed, then re-check.
                    self._cond.wait(timeout=wait)
                    continue
                e = heapq.heappop(self._heap)
            # Run the task outside the lock.
            try:
                e.fn()
            except Exception as ex:
                e.attempt += 1
                if e.attempt >= e.max_attempts:
                    with self._cond:
                        self._dlq.append(FailedTask(
                            e.task_id, repr(e.fn), e.attempt,
                            f"{type(ex).__name__}: {ex}"))
                else:
                    backoff = self._base * (2 ** (e.attempt - 1))
                    backoff *= 0.5 + random.random()  # jitter in [0.5, 1.5)
                    e.ready_at = time.monotonic() + backoff
                    e.seq = next(self._seq)
                    with self._cond:
                        heapq.heappush(self._heap, e)
                        self._cond.notify()
Tests
import unittest, time, threading


class TestScheduler(unittest.TestCase):
    def test_priority_order(self):
        order = []
        sched = TaskScheduler(n_workers=1)
        # Submit before start so all three tasks are queued when the single
        # worker begins; otherwise it may run the first submit immediately.
        for p in [3, 1, 2]:
            sched.submit((lambda x=p: order.append(x)), priority=p)
        sched.start()
        time.sleep(0.5)
        sched.shutdown(timeout=2.0)
        self.assertEqual(order, [1, 2, 3])

    def test_retry_then_dlq(self):
        attempts = []

        def always_fail():
            attempts.append(1)
            raise RuntimeError("boom")

        sched = TaskScheduler(n_workers=1, base_backoff=0.01)
        sched.start()
        sched.submit(always_fail, priority=0, max_attempts=3)
        time.sleep(1.0)
        sched.shutdown(timeout=2.0)
        self.assertEqual(len(attempts), 3)
        self.assertEqual(len(sched.dead_letters()), 1)

    def test_concurrent_submit(self):
        results = []
        sched = TaskScheduler(n_workers=4)
        sched.start()

        def push(i):
            for j in range(50):
                sched.submit((lambda x=(i, j): results.append(x)), priority=0)

        threads = [threading.Thread(target=push, args=(i,)) for i in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        time.sleep(0.5)
        sched.shutdown(timeout=2.0)
        self.assertEqual(len(results), 200)
Follow-up Questions
(2) Persist state across restarts? Tasks live in memory and are lost on restart. To persist, choose: (a) write each submit to a WAL; on boot, replay; on completion, append a “done” marker. (b) Snapshot the heap periodically and write a delta log. The DLQ should be persisted regardless — losing failed tasks is the worst outcome because nobody knows why a job didn’t run.
(8) Partial failure? The interesting case: a worker pops a task and crashes mid-execution. The task is now lost (it’s not in the heap and it didn’t complete). Solution: at-least-once via visibility timeout — the heap pops the task to an “in-flight” map with a TTL; if the worker doesn’t ack before TTL, the task returns to the heap. Idempotency keys make this safe. This is the SQS / Cloud Tasks model.
(9) Eviction / cleanup? The DLQ grows unbounded. Either: cap its size and drop oldest, retain a sliding-window of the last N failures, or persist to durable storage and prune from memory after a TTL. Always emit a per-task DLQ event so downstream alerting can fire.
(11) Configuration knobs? n_workers, base_backoff, default max_attempts. Per-task: priority, max_attempts, optionally timeout. Knobs not to expose: jitter strategy (use decorrelated jitter), heap implementation.
(12) Shutdown / draining? Two modes: graceful (stop accepting; wait for in-flight; return) and forceful (stop accepting; abandon in-flight; return immediately). Always offer both. Default to graceful with a deadline.
(13) Poison pill? A task that always crashes the worker (segfault, unhandled OS exception, infinite loop). Run tasks in subprocess isolation (or with a cooperative timeout). Blacklist by hash of (callable, args) after N consecutive crashes.
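Follow-up (11) names decorrelated jitter as the preferred strategy. A minimal sketch of that scheme, assuming the commonly cited formulation in which each delay is drawn uniformly between the base and three times the previous delay, then clamped to a cap; `decorrelated_jitter`, `backoff_schedule`, and the injectable `rng` are hypothetical names.

```python
import random


def decorrelated_jitter(base: float, cap: float, prev_sleep: float,
                        rng=random.uniform) -> float:
    """Next backoff: uniform in [base, 3 * prev_sleep], clamped to cap."""
    return min(cap, rng(base, prev_sleep * 3))


def backoff_schedule(base: float, cap: float, attempts: int,
                     rng=random.uniform) -> list:
    """Successive delays for a task that keeps failing."""
    sleep, out = base, []
    for _ in range(attempts):
        sleep = decorrelated_jitter(base, cap, sleep, rng)
        out.append(sleep)
    return out


schedule = backoff_schedule(base=0.1, cap=5.0, attempts=20)
```

Because each delay depends on the previous random draw rather than on the attempt number, clients that failed at the same instant drift apart quickly instead of retrying in lockstep.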
Product Extension
This is the heart of Celery, Sidekiq, RQ, AWS SQS + Lambda, GCP Cloud Tasks, and Temporal. Real systems add: visibility timeouts (the in-flight TTL), distributed coordination (multiple workers across hosts), durable storage (RDBMS or Redis with persistence), scheduling (cron-like time-based triggers), and workflow orchestration (Temporal). The core is what you wrote here.
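The visibility-timeout idea mentioned above can be sketched as an in-flight map with deadlines. This is a simplified, single-threaded illustration with `now` passed in explicitly; `VisibilityQueue` and its methods are hypothetical names. Real systems also issue a per-delivery receipt so that a late ack from an earlier delivery cannot acknowledge a later one, which this sketch does not model.

```python
import heapq
import itertools


class VisibilityQueue:
    """At-least-once queue: a popped task reappears unless acked in time."""

    def __init__(self, visibility_timeout: float):
        self._vt = visibility_timeout
        self._heap = []       # (priority, seq, task_id)
        self._in_flight = {}  # task_id -> (deadline, priority)
        self._seq = itertools.count()

    def push(self, task_id, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), task_id))

    def pop(self, now: float):
        """Hand out the next task and start its visibility timer."""
        self._requeue_expired(now)
        if not self._heap:
            return None
        priority, _, task_id = heapq.heappop(self._heap)
        self._in_flight[task_id] = (now + self._vt, priority)
        return task_id

    def ack(self, task_id) -> bool:
        """Worker finished; False means the task was not in flight."""
        return self._in_flight.pop(task_id, None) is not None

    def _requeue_expired(self, now: float) -> None:
        expired = [t for t, (d, _) in self._in_flight.items() if d <= now]
        for task_id in expired:
            _, priority = self._in_flight.pop(task_id)
            self.push(task_id, priority)  # redelivery: task must be idempotent
```

A worker that crashes simply never calls `ack`, and the task becomes visible again once its deadline passes.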
Language/Runtime Follow-ups
- Python: GIL means worker threads don’t parallelize CPU work. For CPU-bound tasks, use a ProcessPoolExecutor instead. The implementation above is fine for I/O-bound tasks.
- Java: ScheduledThreadPoolExecutor is the textbook fit: submit with a delay, retries via re-submission. RetryTemplate (Spring Retry) for the policy. DeadLetterPublishingRecoverer (Kafka).
- Go: a single channel of tasks plus N goroutines; for delayed retry, use time.AfterFunc to push back to the channel. Or use golang.org/x/sync/errgroup for the worker pool.
- C++: a std::priority_queue plus condition variable. Tasks as std::function<void()>.
- JS/TS: not concurrent (single event loop), but BullMQ (Redis-backed) is the de-facto Node task queue.
Common Bugs
- Workers spinning when the heap head is in the future: wait for exactly ready_at - now instead of polling in a loop.
- Notifying only one worker on submit (notify) but notify_all on shutdown: fine, but check that the heap-shrink case (the popper sees head.ready_at > now and sleeps) doesn’t miss a wakeup when a higher-priority task is pushed during the sleep.
- Forgetting to update e.seq on re-push: seq is the tie-breaker. With raw tuples, heapq compares element by element, so two entries with equal (ready_at, priority, seq) fall through to comparing the Callable, which raises TypeError. The dataclass above excludes fn from comparison, so it will not raise, but a stale seq still lets a retried task jump ahead of newer entries on ties. Always bump seq.
- Catching Exception but letting BaseException (e.g., KeyboardInterrupt) escape: workers die silently. Catch BaseException with care, or at minimum catch Exception and log unexpected escapes.
- DLQ growing forever: see follow-up #9.
Debugging Strategy
Add a worker trace: every transition (pop, run-start, run-end, retry, dlq) gets a log line with (worker_id, task_id, ts). Replay the log to see the timeline. For “task didn’t run” bugs, walk the heap state at submit time and check that notify was called. For shutdown deadlocks, take a thread dump (Python: faulthandler.dump_traceback_later()) — usually a worker is blocked on wait because notify_all was missed.
Mastery Criteria
- Implemented in <40 minutes; <30 on second attempt.
- All three tests pass.
- Articulated visibility-timeout / at-least-once vs lost-on-crash tradeoff in <90 seconds.
- Answered follow-ups #2, #8, #9, #12, #13 crisply.
- Added at-least-once semantics in <15 minutes when prompted.
- Stated why decorrelated jitter beats fixed jitter in retry backoff.
Lab 05 — Thread Pool
Goal
Implement a bounded ThreadPool with a fixed number of worker threads, a bounded work queue, configurable rejection policy, and graceful shutdown. After this lab you should be able to write a clean ThreadPoolExecutor clone in under 25 minutes and answer the standard concurrency follow-ups.
Background Concepts
A thread pool decouples task submission from task execution by introducing a queue of work items processed by N worker threads. The four design decisions are:
- Pool sizing: fixed-size, dynamic (grow/shrink), or bounded with min/max?
- Queue policy: bounded (block / reject / drop) or unbounded (memory risk)?
- Rejection policy when the queue is full: throw, drop newest, drop oldest, or run-on-caller’s-thread?
- Shutdown semantics: stop accepting and finish queue (shutdown), or stop accepting and abandon queue (shutdown_now)?
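The textbook implementation is fixed-size pool + bounded blocking queue + an explicit rejection policy + graceful shutdown. Java’s ThreadPoolExecutor can be configured exactly this way; its default rejection handler is AbortPolicy (throw), with CallerRunsPolicy as an opt-in alternative. This is the answer the interviewer wants by default.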
Interview Context
Thread pool is a classic concurrency interview question. It tests whether you understand condition variables / blocking queues, can reason about producer-consumer with backpressure, and can structure shutdown so that submit after shutdown is rejected and in-flight tasks complete cleanly. Java candidates are expected to know that ThreadPoolExecutor’s seven-parameter constructor encodes most of these decisions.
Problem Statement
Implement ThreadPool(n_workers, queue_capacity, on_reject):
- submit(fn) -> Future: enqueue. If queue full and pool not shut down, apply on_reject.
- shutdown(wait=True, timeout=None): stop accepting; if wait, drain the queue and join workers.
- shutdown_now() -> list[Callable]: stop accepting; abandon queued tasks and return them.
A Future exposes .result(timeout) to retrieve the task’s return value or raise its exception.
Constraints
- 1 ≤ n_workers ≤ 1000
- 0 ≤ queue_capacity ≤ 10^4 (0 = SynchronousQueue: hand off directly)
- Submission rate up to 10^5 / s
Clarifying Questions
- Is the queue bounded? (Yes by default; “unbounded queue” is a known antipattern that masks production bugs.)
- What happens on full queue? (Reject by default; offer caller-runs as alternative.)
- Should workers be daemons? (Depends on language; in Python yes for graceful interpreter shutdown.)
- Returns a Future? (Yes — async result is the standard contract.)
- Re-entrancy: can a task submit more tasks? (Yes; it must not deadlock on a full queue from inside a worker.)
Examples
pool = ThreadPool(n_workers=2, queue_capacity=5)
fut = pool.submit(lambda: 1 + 1)
fut.result() -> 2
pool.shutdown(wait=True)
pool.submit(lambda: 1) -> raises RuntimeError (pool shut down)
Initial Brute Force
for fn in tasks: threading.Thread(target=fn).start(). No bound, no reuse, no result tracking. Each task pays full thread-creation cost (~1ms on Linux), and the OS can run out of threads at 10^4+.
Brute Force Complexity
Per task: O(thread creation), on the order of 1 ms in Python. Total: O(N · 1 ms). At N=10^5, this is about 100 seconds, which is far too slow. Memory: O(N) thread stacks, each reserving up to 8 MB of virtual address space by default on Linux.
Optimization Path
Pool the threads. Workers block on a blocking queue (they sleep until work arrives rather than busy-waiting). submit enqueues; the queue blocks when full (or rejects). Per-task overhead drops to microseconds (a queue push and pop). Memory is O(n_workers · stack_size + queue_capacity).
Final Expected Approach
A Queue(maxsize=queue_capacity) (Python’s queue.Queue is thread-safe and supports timeouts). N worker threads loop on q.get(), run the task, set its Future, repeat. A unique sentinel object posted N times (once per worker) signals shutdown. submit checks the shut-down flag, then either q.put_nowait (raise on full) or q.put (block on full); on full and not blocking, invoke on_reject.
Data Structures Used
| Structure | Purpose |
|---|---|
| queue.Queue(maxsize=…) | producer-consumer with bounded blocking |
| Future (custom or concurrent.futures.Future) | result + exception delivery |
| Sentinel object posted N times | shutdown signal |
| _shutdown: bool flag | reject post-shutdown submissions |
Correctness Argument
Liveness: when a task is enqueued and at least one worker is idle, that worker will dequeue and run it. Provided by queue.Queue’s internal Condition (notify on put, wait on get).
Safety / no lost tasks: every put either succeeds (task will be dequeued by some worker) or is explicitly rejected. The shutdown protocol enforces that no put succeeds after _shutdown=True. When shutdown(wait=True) returns with no timeout, the queue is empty and all workers have exited (proven by the sentinel pattern: each worker sees exactly one sentinel and exits, so all N workers terminate).
Future correctness: the worker’s try/except block sets either set_result(value) or set_exception(ex). Future.result() blocks on a Condition until one of the two is set. Linearizable.
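The Future contract described above can be seen in isolation with the stdlib class. A small sketch: `run_into_future` is a hypothetical helper standing in for the worker's try/except step, and the two lambdas are arbitrary example tasks.

```python
import threading
from concurrent.futures import Future


def run_into_future(fn, fut: Future) -> None:
    """Worker-side step: run fn and complete fut exactly once."""
    try:
        result = fn()
    except BaseException as ex:
        fut.set_exception(ex)
    else:
        fut.set_result(result)


ok, bad = Future(), Future()
threading.Thread(target=run_into_future, args=(lambda: 1 + 1, ok)).start()
threading.Thread(target=run_into_future, args=(lambda: 1 / 0, bad)).start()

value = ok.result(timeout=2.0)      # blocks until the worker sets the result
error = bad.exception(timeout=2.0)  # returns the exception object without raising
```

Calling `bad.result()` instead would re-raise the ZeroDivisionError in the caller's thread, which is how task failures reach the submitter.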
Complexity
- submit: O(1) amortized
- Worker step: O(1) plus task duration
- Memory: O(n_workers + queue_capacity)
Implementation Requirements
import threading, queue, time
from typing import Callable, Optional
from concurrent.futures import Future

_SENTINEL = object()


class RejectedExecutionError(RuntimeError):
    pass


class ThreadPool:
    def __init__(self, n_workers: int, queue_capacity: int = 1024,
                 on_reject: Optional[Callable] = None):
        if n_workers <= 0:
            raise ValueError("n_workers must be positive")
        # Note: queue.Queue treats maxsize <= 0 as unbounded, not as a hand-off.
        self._q: queue.Queue = queue.Queue(maxsize=queue_capacity)
        self._workers: list[threading.Thread] = []
        self._shutdown = False
        self._lock = threading.Lock()
        self._on_reject = on_reject or self._default_reject
        for i in range(n_workers):
            t = threading.Thread(target=self._run, name=f"pool-w{i}", daemon=True)
            t.start()
            self._workers.append(t)

    @staticmethod
    def _default_reject(fn, *args, **kwargs):
        raise RejectedExecutionError("queue full")

    def submit(self, fn: Callable, *args, **kwargs) -> Future:
        fut: Future = Future()
        with self._lock:
            if self._shutdown:
                raise RejectedExecutionError("pool shut down")
            try:
                # Enqueue under the lock so no task can land behind the
                # shutdown sentinels; put_nowait never blocks.
                self._q.put_nowait((fn, args, kwargs, fut))
                return fut
            except queue.Full:
                pass
        # Queue full: apply the policy outside the lock (it may run the task).
        try:
            fut.set_result(self._on_reject(fn, *args, **kwargs))
        except Exception as ex:
            fut.set_exception(ex)
        return fut

    def shutdown(self, wait: bool = True, timeout: Optional[float] = None) -> None:
        with self._lock:
            if self._shutdown:
                return
            self._shutdown = True
        for _ in self._workers:
            self._q.put(_SENTINEL)
        if wait:
            # One overall deadline rather than a fresh timeout per worker.
            deadline = None if timeout is None else time.monotonic() + timeout
            for w in self._workers:
                remaining = (None if deadline is None
                             else max(0.0, deadline - time.monotonic()))
                w.join(timeout=remaining)

    def shutdown_now(self) -> list[Callable]:
        """Stop accepting; abandon queued tasks; return abandoned callables."""
        with self._lock:
            self._shutdown = True
        abandoned: list[Callable] = []
        try:
            while True:
                item = self._q.get_nowait()
                if item is _SENTINEL:
                    continue
                fn, _, _, fut = item
                fut.set_exception(RejectedExecutionError("shutdown_now"))
                abandoned.append(fn)
        except queue.Empty:
            pass
        for _ in self._workers:
            self._q.put(_SENTINEL)
        return abandoned

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is _SENTINEL:
                return
            fn, args, kwargs, fut = item
            if not fut.set_running_or_notify_cancel():
                continue  # the future was cancelled before it started
            try:
                result = fn(*args, **kwargs)
            except BaseException as ex:
                # Never let a task kill the worker; deliver the error instead.
                fut.set_exception(ex)
            else:
                fut.set_result(result)


# A useful policy: caller-runs (executes inline if queue is full).
# Its return value becomes the Future's result.
def caller_runs(fn, *args, **kwargs):
    return fn(*args, **kwargs)
Tests
import unittest, time, threading


class TestPool(unittest.TestCase):
    def test_basic(self):
        pool = ThreadPool(n_workers=2, queue_capacity=10)
        futs = [pool.submit(lambda x=i: x * 2) for i in range(10)]
        results = [f.result(timeout=2.0) for f in futs]
        self.assertEqual(sorted(results), [0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
        pool.shutdown()

    def test_exception_propagated(self):
        pool = ThreadPool(n_workers=1, queue_capacity=10)
        f = pool.submit(lambda: 1 / 0)
        with self.assertRaises(ZeroDivisionError):
            f.result(timeout=2.0)
        pool.shutdown()

    def test_rejection_when_queue_full(self):
        started, block = threading.Event(), threading.Event()

        def hold():
            started.set()
            block.wait()

        pool = ThreadPool(n_workers=1, queue_capacity=1)
        pool.submit(hold)                  # occupies the worker
        # Wait until the worker has dequeued it, so the queue is empty again.
        self.assertTrue(started.wait(timeout=1.0))
        pool.submit(lambda: None)          # fills the queue
        f = pool.submit(lambda: None)      # rejected
        with self.assertRaises(RejectedExecutionError):
            f.result(timeout=1.0)
        block.set()
        pool.shutdown()

    def test_shutdown_rejects_new(self):
        pool = ThreadPool(n_workers=1)
        pool.shutdown()
        with self.assertRaises(RejectedExecutionError):
            pool.submit(lambda: None)

    def test_concurrent_submit(self):
        pool = ThreadPool(n_workers=8, queue_capacity=200)
        results = []
        lock = threading.Lock()

        def task(x):
            with lock:
                results.append(x)

        futs = [pool.submit(task, i) for i in range(200)]
        for f in futs:
            f.result(timeout=2.0)
        pool.shutdown()
        self.assertEqual(sorted(results), list(range(200)))
Follow-up Questions
(1) Thread-safe? Already designed for concurrency; the Queue handles producer-consumer atomicity. The _shutdown flag is read under a lock to avoid races between submit and shutdown.
(7) Backpressure? The bounded queue is the backpressure. Four policies: (a) block: put blocks the producer (default queue.put), which pushes backpressure to the caller. (b) reject (raise): the caller decides. (c) caller-runs: the caller does the work, which throttles it naturally. (d) drop oldest: for non-critical telemetry-style work. Pick one explicitly per pool.
(12) Shutdown / draining? shutdown(wait=True) drains the queue (graceful). shutdown_now() abandons the queue and returns abandoned tasks (forceful). The graceful path is the production default; expose shutdown_now for SIGTERM after a deadline.
(8) Partial failure? A worker thread that crashes on an uncaught exception leaves the pool with N-1 workers permanently. Solutions: (a) catch BaseException around the task body (shown), (b) supervise — periodically check live worker count and respawn dead workers. The simplest production design catches and logs, never lets the worker die.
(13) Poison pill? A task that runs forever or consumes all memory blocks one worker permanently. Mitigations: per-task timeout (cooperative with a watchdog thread), memory accounting (rare in Python), or run untrusted tasks in subprocesses. Stating this awareness is the bar.
(11) Configuration knobs? n_workers (often CPU_count or 2 * CPU_count for I/O-bound); queue_capacity (rule of thumb: enough to absorb a ~1 second burst); on_reject policy. Knobs not to expose: queue type, worker thread name (auto-generate).
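The four backpressure policies from follow-up (7) can be sketched against a plain bounded queue.Queue. This is a minimal illustration, not the lab's ThreadPool: the function submit_with_policy, the policy names, and the returned labels are hypothetical, and the queue holds bare zero-argument callables.

```python
import queue
from typing import Callable

def submit_with_policy(q: "queue.Queue[Callable[[], None]]",
                       task: Callable[[], None],
                       policy: str = "reject") -> str:
    """Enqueue `task` on a bounded queue, applying `policy` if it is full.

    Returns a short label describing what happened (illustrative only).
    """
    if policy == "block":
        q.put(task)                      # blocks the producer until space frees up
        return "queued"
    try:
        q.put_nowait(task)
        return "queued"
    except queue.Full:
        pass
    if policy == "reject":
        raise RuntimeError("queue full")  # the caller decides what to do next
    if policy == "caller_runs":
        task()                            # the producer does the work itself
        return "ran_inline"
    if policy == "drop_oldest":
        try:
            q.get_nowait()                # discard the oldest queued task
        except queue.Empty:
            pass                          # a worker drained the queue first
        q.put_nowait(task)                # may still raise queue.Full under a race
        return "dropped_oldest"
    raise ValueError(f"unknown policy: {policy}")
```

In an interview, naming the policy as an explicit constructor argument is usually enough; the point of the sketch is that each policy is a few lines once the queue is bounded.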
Product Extension
java.util.concurrent.ThreadPoolExecutor is the textbook reference. Python’s concurrent.futures.ThreadPoolExecutor is the canonical stdlib equivalent, but its work queue is unbounded, so it applies no backpressure; bounding the queue makes your implementation safer under overload. NGINX’s thread pools (used to offload blocking I/O) and most application servers use variants of this pattern.
Language/Runtime Follow-ups
- Python: the GIL serializes CPU-bound work, so ThreadPool is for I/O-bound tasks; use ProcessPoolExecutor for CPU-bound work. concurrent.futures.ThreadPoolExecutor ships with Python, but its work queue is unbounded and not configurable, which is a footgun.
- Java: new ThreadPoolExecutor(core, max, keepAlive, unit, queue, factory, rejectedHandler). Memorize the seven parameters and the four built-in RejectedExecutionHandler policies (Abort, CallerRuns, Discard, DiscardOldest).
- Go: idiomatic Go does not use thread pools because goroutines are cheap. The pattern is a worker pool of N goroutines reading from a channel of work items. The bounded channel is the bounded queue.
- C++: std::thread per worker; std::condition_variable + std::queue for the work queue. Boost.Asio’s thread pool is production-ready.
- JS/TS: single event loop; use worker_threads for CPU work. Libraries: piscina (worker pool for Node).
Common Bugs
- Unbounded queue: Queue() without maxsize masks production overload as memory growth.
- Daemons vs non-daemons: in Python, daemon workers die abruptly on interpreter exit, abandoning in-flight tasks. Non-daemons require explicit shutdown or the program hangs. Pick deliberately.
- Catching Exception but not BaseException lets a task that raises SystemExit (or any other BaseException subclass) kill its worker silently and leave the Future unresolved. Catch BaseException and keep the worker alive.
- submit after shutdown race: check the flag under the lock and put under the same critical section, or accept that a few enqueues may sneak in between check and put (and handle them in the worker by checking the flag before running).
- Forgetting to call set_running_or_notify_cancel() on the Future: cancelled futures still get run.
Debugging Strategy
For deadlocks, take a thread dump: in Python, import faulthandler; faulthandler.dump_traceback_later(5) then trigger the hang. For lost tasks, instrument every put/get/set_result/set_exception with a sequence number and replay. For worker death, log every worker exit with its reason.
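A minimal sketch of the thread-dump technique, using only faulthandler from the standard library. The wrapper run_with_hang_dump and the use of a temporary file as the dump target are assumptions made for illustration; in a real hang investigation you would usually let the dump go to stderr.

```python
import faulthandler
import tempfile
import time

def run_with_hang_dump(fn, timeout_s, dump_file):
    """Run fn(); if it is still running after timeout_s, dump all thread stacks."""
    faulthandler.dump_traceback_later(timeout_s, file=dump_file)
    try:
        return fn()
    finally:
        # Always disarm the watchdog so a later, unrelated slow call is not dumped.
        faulthandler.cancel_dump_traceback_later()

# Simulate a "hang" that outlives the watchdog timeout.
with tempfile.TemporaryFile("w+") as dump_file:
    run_with_hang_dump(lambda: time.sleep(0.3), 0.1, dump_file)
    dump_file.seek(0)
    dump_text = dump_file.read()   # one stack per live thread
```

The dump lists every thread's stack, so a worker blocked in Queue.get() and a producer blocked in Queue.put() show up side by side.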
Mastery Criteria
- Implemented ThreadPool in <30 minutes; tests pass first run.
- Articulated bounded-queue + rejection-policy design without prompting.
- Listed the four standard rejection policies (abort / caller-runs / discard / discard-oldest).
- Answered follow-ups #7, #8, #12, #13 in <90 seconds each.
- Stated when to use ThreadPool vs ProcessPool in Python in <30 seconds.
- Refactored to add per-task timeout in <10 minutes when prompted.
Lab 06 — Durable Job Queue
Goal
Implement a job queue with at-least-once delivery semantics, idempotency keys, and visibility timeouts. After this lab you should be able to articulate why exactly-once delivery is impractical, design at-least-once with idempotency on the consumer, and implement an in-memory queue that simulates SQS-style semantics in under 35 minutes.
Background Concepts
A durable job queue accepts jobs from producers and delivers them to consumers, surviving consumer crashes without losing work. The three classical delivery semantics:
- At-most-once: deliver, forget. Fast but loses jobs on crash. Acceptable for fire-and-forget telemetry.
- At-least-once: deliver, retry until acknowledged. Jobs may be delivered multiple times. Requires consumers to be idempotent.
- Exactly-once: impossible in a distributed system without two-phase commit between queue and consumer. The “exactly-once” branding in real systems (Kafka, Pulsar) means “exactly-once processing semantics given idempotent consumers” — which is at-least-once + idempotency.
The standard primitive that enables at-least-once is the visibility timeout: when a consumer dequeues a job, the queue marks it “in-flight” with a TTL. If the consumer acks before TTL, the job is deleted. If TTL expires (consumer crashed), the job becomes visible again and is redelivered. The consumer must be idempotent because the same job may be processed twice if ack was lost in transit.
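Consumer-side idempotency can be sketched in a few lines. This is an illustration under stated assumptions: the class IdempotentConsumer and its in-memory set of processed keys are hypothetical, and a production consumer would keep that record in durable storage, ideally updated in the same transaction as the side effect.

```python
from typing import Callable

class IdempotentConsumer:
    """Runs a side effect at most once per idempotency key (in-memory sketch)."""

    def __init__(self, side_effect: Callable[[str], None]):
        self._side_effect = side_effect
        self._processed: set[str] = set()   # assumption: durable in production

    def handle(self, idempotency_key: str, payload: str) -> bool:
        """Return True if the work ran, False if this was a duplicate delivery."""
        if idempotency_key in self._processed:
            return False                     # redelivery: skip the work, still ack
        self._side_effect(payload)
        self._processed.add(idempotency_key)
        return True

sent: list[str] = []
consumer = IdempotentConsumer(sent.append)
first = consumer.handle("email-abc", "send-email-123")    # first delivery does the work
second = consumer.handle("email-abc", "send-email-123")   # redelivery after a lost ack
```

Note the remaining gap: a crash between the side effect and recording the key still produces a duplicate, which is why the record and the side effect should commit together when that is possible.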
Interview Context
This is the central problem in any interview at AWS (SQS), GCP (Pub/Sub), Confluent (Kafka), or any infrastructure team. The interviewer wants to hear: “exactly-once is impractical because of the two-generals problem; at-least-once with idempotency keys is the production answer; visibility timeout is how we implement it.” Then they want to see you build a small version that demonstrates understanding.
Problem Statement
Design JobQueue:
- enqueue(payload, idempotency_key=None) -> job_id: push. If idempotency_key is non-None and matches a recent job, deduplicate.
- dequeue(visibility_timeout=30.0) -> Job | None: pop a visible job; mark in-flight with TTL.
- ack(job_id): confirm successful processing; permanently delete.
- nack(job_id, requeue_delay=0): release back; optionally with a delay.
- Background scavenger: jobs whose visibility TTL has expired return to visible state.
Constraints
- Up to 10^5 in-flight jobs
- 10^4 enqueues / second
- Single-process in-memory; persistence is a follow-up
Clarifying Questions
- FIFO or best-effort ordering? (Best-effort is standard SQS; FIFO costs more.)
- Visibility timeout: per-call or queue default? (Per-call, with queue default.)
- Idempotency key TTL — how long do we dedupe? (Typically 5 minutes; configurable.)
- Max retries before DLQ? (Often a follow-up; default unlimited.)
- What happens on nack? (Requeue, optionally with a delay; this is the natural retry path.)
Examples
q = JobQueue()
job_id = q.enqueue("send-email-123", idempotency_key="email-abc")
q.enqueue("send-email-123", idempotency_key="email-abc") # dedup; same job_id
job = q.dequeue(visibility_timeout=10)
# … process …
q.ack(job.job_id) # done
# Crash scenario:
job = q.dequeue(visibility_timeout=10)
# consumer crashes, never acks
# 11 seconds later:
job_again = q.dequeue() # same payload, same job_id, delivery_count == 2
Initial Brute Force
A list of jobs and a single mutex. dequeue pops the head, sets in-flight; ack removes; nack re-prepends. No visibility timeout. Loses jobs on crash.
Brute Force Complexity
O(1) per op for a deque, O(N) if implemented over a list with re-prepend. Fundamentally wrong for at-least-once because there’s no scavenger.
Optimization Path
Add: (a) visible: deque[Job] — jobs ready to be dequeued; (b) in_flight: dict[job_id, (Job, expires_at)] — taken but not acked; (c) idempotency: dict[key, job_id] with TTL for dedup; (d) a background scavenger that moves expired in-flight jobs back to visible. The scavenger can be lazy (check on each dequeue) instead of a dedicated thread.
Final Expected Approach
Use deque for visible, dict for in-flight with (job, expires_at), dict for idempotency cache. Each dequeue first sweeps in_flight for expired entries (move them to the front of visible). All operations under a single lock — the queue is fast, sub-millisecond critical sections.
Data Structures Used
| Structure | Purpose |
|---|---|
| deque[Job] | visible jobs, FIFO best-effort |
| dict[job_id, (Job, expires_at)] | in-flight tracking |
| dict[idempotency_key, job_id] | dedup cache |
| Lock | single-process atomicity |
Correctness Argument
No lost jobs (at-least-once): every job is in exactly one of three states: visible, in_flight, or acked (deleted). Transitions: enqueue → visible; dequeue → in_flight; ack → deleted; nack → visible; scavenger → visible (from in_flight on TTL expiry). No transition discards a job before ack. Therefore, until acked, every job remains in the system and will eventually be redelivered.
Idempotency dedup: if idempotency_key matches a job in either visible or in_flight (or recently acked, within the dedup window), enqueue returns the existing job_id without creating a new job. This makes producer retries safe.
At-least-once, not exactly-once: if a consumer successfully processes the job and crashes before sending ack, the same job is redelivered. The consumer must therefore make its own side effects idempotent, keyed on something stable in the job (e.g., the email service must dedupe by email-abc).
Complexity
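- enqueue: O(1) amortized
- dequeue: O(F) as implemented below, where F is the number of in-flight entries, because the lazy sweep scans the whole in-flight dict. Indexing deadlines in a min-heap brings this down to O(K log F), where K is the number of expired entries swept.
- ack / nack: O(1)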
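If the sweep cost matters at 10^5 in-flight jobs, one option is to index deadlines in a min-heap so that a sweep only touches expired entries. The sketch below is an assumption-laden illustration, not part of the lab's reference code: the class ExpiryIndex and its method names are hypothetical, and it uses lazy deletion (stale heap entries are skipped when popped).

```python
import heapq

class ExpiryIndex:
    """Tracks in-flight job deadlines and pops only the expired ones."""

    def __init__(self) -> None:
        self._deadline: dict[int, float] = {}      # job_id -> current expires_at
        self._heap: list[tuple[float, int]] = []   # (expires_at, job_id); may hold stale entries

    def track(self, job_id: int, expires_at: float) -> None:
        self._deadline[job_id] = expires_at
        heapq.heappush(self._heap, (expires_at, job_id))

    def untrack(self, job_id: int) -> bool:
        """Called on ack/nack; the heap entry stays behind and is skipped later."""
        return self._deadline.pop(job_id, None) is not None

    def pop_expired(self, now: float) -> list[int]:
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, job_id = heapq.heappop(self._heap)
            # Skip entries that were acked or re-tracked with a different deadline.
            if self._deadline.get(job_id) == expires_at:
                del self._deadline[job_id]
                expired.append(job_id)
        return expired

idx = ExpiryIndex()
idx.track(1, 10.0)
idx.track(2, 20.0)
idx.track(3, 5.0)
idx.untrack(3)                       # job 3 was acked before its deadline
expired_at_10 = idx.pop_expired(10.0)
```

The trade-off is extra memory for stale heap entries in exchange for sweeps that cost O(log F) per popped entry instead of a full scan.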
Implementation Requirements
import threading, time, itertools
from collections import deque
from dataclasses import dataclass
from typing import Optional, Any
@dataclass
class Job:
    job_id: int
    payload: Any
    delivery_count: int
    enqueued_at: float
class JobQueue:
    def __init__(self, default_visibility_timeout: float = 30.0,
                 idempotency_ttl: float = 300.0):
        self._visible: deque[Job] = deque()
        self._in_flight: dict[int, tuple[Job, float]] = {}  # id -> (job, expires_at)
        self._idem: dict[str, tuple[int, float]] = {}       # key -> (job_id, expires_at)
        self._idem_sweep_at = 4096                          # cache size that triggers a sweep
        self._default_vt = default_visibility_timeout
        self._idem_ttl = idempotency_ttl
        self._lock = threading.Lock()
        self._next_id = itertools.count(1)
    def enqueue(self, payload: Any, idempotency_key: Optional[str] = None) -> int:
        now = time.monotonic()
        with self._lock:
            self._sweep_idem(now)
            if idempotency_key is not None:
                entry = self._idem.get(idempotency_key)
                # Dedupe only while the key is still inside its TTL window;
                # an expired entry that has not been swept yet must not match.
                if entry is not None and entry[1] > now:
                    return entry[0]
            job_id = next(self._next_id)
            job = Job(job_id, payload, delivery_count=0, enqueued_at=now)
            self._visible.append(job)
            if idempotency_key is not None:
                self._idem[idempotency_key] = (job_id, now + self._idem_ttl)
            return job_id
    def dequeue(self, visibility_timeout: Optional[float] = None) -> Optional[Job]:
        vt = visibility_timeout if visibility_timeout is not None else self._default_vt
        now = time.monotonic()
        with self._lock:
            self._sweep_in_flight(now)
            if not self._visible:
                return None
            job = self._visible.popleft()
            job.delivery_count += 1
            self._in_flight[job.job_id] = (job, now + vt)
            return job
    def ack(self, job_id: int) -> bool:
        with self._lock:
            return self._in_flight.pop(job_id, None) is not None
    def nack(self, job_id: int, requeue_delay: float = 0.0) -> bool:
        now = time.monotonic()
        with self._lock:
            entry = self._in_flight.pop(job_id, None)
            if entry is None:
                return False
            job, _ = entry
            if requeue_delay > 0:
                # For simplicity, treat delay as a delayed visibility:
                # park as in-flight with expires_at = now + delay.
                self._in_flight[job_id] = (job, now + requeue_delay)
            else:
                self._visible.appendleft(job)  # head, so it's seen first
            return True
    def stats(self) -> dict:
        with self._lock:
            return {
                "visible": len(self._visible),
                "in_flight": len(self._in_flight),
                "dedup_keys": len(self._idem),
            }
    def _sweep_in_flight(self, now: float) -> None:
        # Full scan of the in-flight dict: O(in-flight) per dequeue.
        expired = [(jid, j) for jid, (j, t) in self._in_flight.items() if t <= now]
        for jid, j in expired:
            del self._in_flight[jid]
            self._visible.appendleft(j)  # redelivery: front-load
    def _sweep_idem(self, now: float) -> None:
        # Lazy: drop expired keys once the cache passes the threshold, then raise
        # the threshold so a large live set is not rescanned on every enqueue.
        if len(self._idem) > self._idem_sweep_at:
            self._idem = {k: v for k, v in self._idem.items() if v[1] > now}
            self._idem_sweep_at = max(4096, 2 * len(self._idem))
Tests
import unittest, time
class TestJobQueue(unittest.TestCase):
    def test_basic_enqueue_ack(self):
        q = JobQueue()
        jid = q.enqueue("hello")
        job = q.dequeue()
        self.assertEqual(job.job_id, jid)
        self.assertTrue(q.ack(jid))
        self.assertIsNone(q.dequeue())
    def test_idempotency_dedup(self):
        q = JobQueue()
        a = q.enqueue("x", idempotency_key="k1")
        b = q.enqueue("x", idempotency_key="k1")
        self.assertEqual(a, b)
        self.assertEqual(q.stats()["visible"], 1)
    def test_visibility_timeout_redelivery(self):
        q = JobQueue(default_visibility_timeout=0.1)
        jid = q.enqueue("x")
        job1 = q.dequeue()
        time.sleep(0.15)
        job2 = q.dequeue()
        self.assertEqual(job1.job_id, job2.job_id)
        self.assertEqual(job2.delivery_count, 2)
        q.ack(job2.job_id)
    def test_nack_requeue(self):
        q = JobQueue()
        jid = q.enqueue("x")
        job = q.dequeue()
        q.nack(jid)
        job2 = q.dequeue()
        self.assertEqual(job.job_id, job2.job_id)
        self.assertEqual(job2.delivery_count, 2)
Follow-up Questions
(2) Persist state across restarts? Three layers: (a) WAL: every enqueue, ack, nack is appended to a log; on boot, replay. (b) Snapshot: periodic full state dump. (c) Combined: snapshot every N seconds, WAL between snapshots; recovery = latest snapshot + log replay since. SQS uses a replicated multi-AZ store; for an interview, WAL is the right answer.
(8) Partial failure? That’s the entire point of visibility timeout. Consumer crashes mid-processing → TTL expires → job redelivered. The consumer is responsible for idempotency. The queue is responsible for delivering at-least-once.
(9) Eviction / cleanup? Stale in-flight entries (consumer crashed and never acked) are swept on every dequeue. The idempotency cache TTL bounds dedup memory. DLQ (not implemented above) would catch jobs after N redeliveries — a follow-up to add.
(10) Consistency model? Linearizable per-job in a single process; redelivery breaks “exactly-once” but preserves “every job is processed at least once”. Replicated: consensus (Raft) for the metadata, leader-based delivery, replicated log for durability.
(11) Configuration knobs? default_visibility_timeout, idempotency_ttl, max_redeliveries (DLQ trigger), dlq_handler. Knobs not to expose: internal sweep cadence.
(12) Shutdown? On shutdown, refuse new enqueues, move in-flight jobs back to visible (so a restart does not inherit stale TTLs and redeliver late), persist state, exit. The graceful invariant: no in-flight at shutdown time.
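The WAL answer from follow-up (2) can be sketched as append plus replay. This is a minimal illustration under stated assumptions: the helpers wal_append and wal_replay and the JSON record layout are hypothetical, a Python list stands in for the log file, and a real WAL would flush to disk before acknowledging the producer.

```python
import json
from collections import deque

def wal_append(log: list[str], op: str, **fields) -> None:
    """Append one JSON record per state change (a list stands in for a file)."""
    log.append(json.dumps({"op": op, **fields}))

def wal_replay(log: list[str]) -> deque:
    """Rebuild the visible queue: every enqueued job that was never acked."""
    pending: dict[int, object] = {}     # job_id -> payload, in enqueue order
    for line in log:
        rec = json.loads(line)
        if rec["op"] == "enqueue":
            pending[rec["job_id"]] = rec["payload"]
        elif rec["op"] == "ack":
            pending.pop(rec["job_id"], None)
        # dequeue/nack records need no replay action: an unacked job simply
        # becomes visible again, which at-least-once delivery permits.
    return deque(pending.items())

log: list[str] = []
wal_append(log, "enqueue", job_id=1, payload="a")
wal_append(log, "enqueue", job_id=2, payload="b")
wal_append(log, "dequeue", job_id=1)
wal_append(log, "ack", job_id=1)
wal_append(log, "dequeue", job_id=2)    # consumer crashed before acking job 2
recovered = wal_replay(log)
```

Replay treats every unacked job as visible, so a job that was in flight at crash time is delivered again after restart; that is the at-least-once guarantee carried across a restart.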
Product Extension
This is a simplified SQS / Cloud Pub/Sub / Azure Service Bus. Real systems add: replication for durability (across hosts/AZs), partitioning for throughput (multiple shards), DLQ as a separate queue with its own retention, FIFO ordering as an opt-in higher-cost mode, and ordering keys (per-key FIFO with cross-key parallelism — Kafka’s model). Kafka explicitly doesn’t have visibility timeouts; it uses offset-based delivery with consumer-managed checkpoints, which is a different design point.
Language/Runtime Follow-ups
- Python: this implementation. For high-throughput, sharded queues with per-shard locks scale better than the single global lock.
- Java: ArrayBlockingQueue is too simple (no visibility timeout). The right reference is java.util.concurrent.DelayQueue for visibility, plus a ConcurrentHashMap for in-flight tracking. Production: ActiveMQ, RabbitMQ.
- Go: a channel-based implementation works for the visible queue; in-flight is a sync.Map; the sweeper is a goroutine. NATS JetStream is the production-grade Go choice.
- C++: roll your own with std::deque + std::unordered_map + std::mutex. Boost has thread-safe queue templates.
- JS/TS: BullMQ (Redis-backed) is the de-facto Node choice; visibility timeout is implemented via Redis sorted sets.
Common Bugs
- Idempotency cache that never expires — memory leak.
- Sweeping in-flight in a separate thread without coordinating the lock: races with dequeue. Lazy sweep on dequeue (as shown) avoids the extra thread.
- Forgetting to increment delivery_count on redelivery: alerting can’t detect poison-pill jobs (jobs that always crash consumers).
- nack with delay implemented by sleeping: blocks the consumer that called nack instead of just delaying re-visibility.
- Treating idempotency dedup as global / forever: if the dedup window is too long, retries after intentional re-submission are silently dropped.
Debugging Strategy
Print stats() periodically to track visible / in-flight counts. A growing in-flight count without acks → consumers are crashing or hanging. Stuck visible count → no consumers are running. For “duplicate processing” complaints, capture the redelivery-count distribution; high tail = consumers crashing or visibility timeout too short.
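One way to act on a high redelivery tail is a dead-letter queue: stop redelivering a job once its delivery count reaches a limit. The sketch below is a standalone illustration, not a patch to the lab's JobQueue: the function take_next, the max_deliveries parameter, and the trimmed Job dataclass are assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Job:
    job_id: int
    payload: Any
    delivery_count: int = 0

def take_next(visible: deque, dlq: list, max_deliveries: int) -> Optional[Job]:
    """Pop the next deliverable job, diverting exhausted jobs to the DLQ."""
    while visible:
        job = visible.popleft()
        if job.delivery_count >= max_deliveries:
            dlq.append(job)           # likely poison pill: stop redelivering
            continue
        job.delivery_count += 1
        return job
    return None

visible = deque([Job(1, "bad", delivery_count=3), Job(2, "ok")])
dlq: list[Job] = []
delivered = take_next(visible, dlq, max_deliveries=3)
```

Jobs in the DLQ are kept for inspection rather than dropped, so the no-lost-jobs invariant still holds; they have only left the delivery path.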
Mastery Criteria
- Implemented in <35 minutes from blank screen.
- Stated “exactly-once is impractical; at-least-once + idempotency is the answer” without prompting.
- All four tests pass first run.
- Articulated visibility timeout, idempotency keys, and DLQ design in <90 seconds.
- Answered follow-ups #2, #8, #10, #11, #12.
- Compared SQS-style (visibility timeout) vs Kafka-style (offset-based) delivery in <60 seconds.
Lab 07 — Autocomplete
Goal
Implement an autocomplete service that returns the top-K suggestions for any prefix in sub-millisecond time. Use a trie augmented with per-node top-K caches, and support weighted suggestions (popularity-ranked). After this lab you should be able to design and implement the data structure in under 30 minutes and answer follow-ups about scale, freshness, and personalization.
Background Concepts
A trie (prefix tree) maps prefixes to a set of completions in O(prefix length) time. The naive design — at query time, walk to the prefix node and DFS to gather all descendants, then sort by weight — is correct but slow if the prefix matches many descendants. The production trick is to precompute and cache the top-K at each node during insert; query then becomes O(prefix-length + K).
The cache update is the subtle part. When add(word, weight) is called, walk the trie down the word’s path; at each node, merge the new word into the node’s top_k (a sorted list or small heap) and discard anything past K. The data flow is “bottom-up via the path you just walked, but for top-K at every level”.
Interview Context
Autocomplete is a top-10 question at search-heavy companies (Google search, Amazon product search, LinkedIn, Yelp). The bar is: trie + per-node top-K cache + ability to answer follow-ups about distributing the index, refreshing weights, and personalizing.
Problem Statement
Design Autocomplete(K):
- add(word, weight): insert with weight; subsequent adds of the same word update its weight (additive or replace; pick one).
- suggest(prefix) -> list[str]: return top-K words by weight starting with prefix. Sub-millisecond average.
Constraints
- Up to 10^6 words
- 10^5 queries / second
- Average word length 10–30
- K ≤ 10
Clarifying Questions
- Weight semantics: replace or additive? (Pick one; “additive” matches “popularity”.)
- Case sensitivity? (Default case-insensitive; lowercase on insert.)
- Tie-breaking on equal weight? (Lexicographic.)
- Real-time updates required, or build-once? (Both supported; weights mutable.)
- K provided per-query or fixed? (Fixed at construction simplifies caching; per-query is a follow-up.)
Examples
ac = Autocomplete(k=3)
ac.add("apple", 5); ac.add("app", 10); ac.add("apply", 3); ac.add("apricot", 1)
ac.suggest("ap")   # ["app", "apple", "apply"]
ac.suggest("app")  # ["app", "apple", "apply"]
ac.suggest("apr")  # ["apricot"]
ac.suggest("z")    # []
Initial Brute Force
Store words in a dict[word, weight]. On suggest, scan all words, filter by startswith(prefix), sort by weight, return top-K. O(N · prefix-length) per query.
Brute Force Complexity
Per suggest: O(N · L) where N = number of words, L = average length. At N=10^6 and 10^5 QPS, this is at least 10^11 word comparisons / second, which is far too slow.
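The brute force is still worth writing down, because it makes a convenient oracle for randomized tests of the trie. A minimal sketch; the function name brute_force_suggest is hypothetical, and it assumes the same ranking rule as the rest of the lab (weight descending, then lexicographic).

```python
def brute_force_suggest(weights: dict[str, int], prefix: str, k: int) -> list[str]:
    """Reference answer: scan every word, rank by weight desc, then word asc."""
    prefix = prefix.lower()
    matches = [(w, word) for word, w in weights.items() if word.startswith(prefix)]
    matches.sort(key=lambda p: (-p[0], p[1]))
    return [word for _, word in matches[:k]]

weights = {"app": 10, "apple": 5, "apply": 3, "apricot": 1}
top_ap = brute_force_suggest(weights, "ap", 3)
```

Comparing this function's output with the trie's suggest on random words and prefixes catches most cache-maintenance bugs quickly.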
Optimization Path
A trie reduces the descend-to-prefix step to O(L). The remaining work — gathering the top-K descendants — is what the per-node top_k cache eliminates. Insert becomes O(L · K log K) (we update the cache at L nodes, each O(K log K)); query becomes O(L + K).
Final Expected Approach
Trie node has children: dict[char, Node] and top_k: list[(weight, word)]. add walks down the path; at each node, runs a small merge to maintain top-K. suggest walks to the prefix node and returns its top_k.
Data Structures Used
| Structure | Purpose |
|---|---|
| Trie nodes | prefix indexing |
| Per-node top_k: list[(weight, word)] | precomputed answers |
| dict[word, weight] | dedup + weight tracking |
Correctness Argument
Invariant: at any node N, top_k(N) is the top-K (by weight, ties broken lexicographically) of { (weight, word) : word starts with prefix(N) }. After every add(word, weight), we visit exactly the nodes on word’s path. At each visited node, we either (a) update the (word, weight) entry if word is already in top_k, or (b) insert and trim to K. No node off the path’s top_k set could change because word doesn’t extend any non-path prefix.
Edge case: weight updates that decrease a word’s standing. If word was in the top-K and its new weight drops it below a word that is not cached at that node, the cache alone cannot say which word should replace it, so the node’s top-K must be recomputed from a wider candidate set. One approach: store at each node a count_per_word dict and a candidate buffer wider than K (e.g., top-10K when K=10), which is heavy and exact only while the buffer still holds at least K live candidates. The simpler approach: on weight decrease, do a DFS below each affected node to rebuild top_k. Document the choice.
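The DFS rebuild on weight decrease can be sketched as follows. This is an illustration under stated assumptions: the Node class, insert, and rebuild_top_k are hypothetical stand-ins for the lab's trie, and a node counts as a word end when its path text is a key of the weights dict.

```python
class Node:
    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.top_k: list[tuple[int, str]] = []

def insert(root: Node, word: str) -> list[Node]:
    """Create the path for `word`; return the nodes on it, root first."""
    path = [root]
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
        path.append(node)
    return path

def rebuild_top_k(node: Node, prefix: str, weights: dict[str, int], k: int) -> None:
    """Recompute node.top_k from every word in the subtree (iterative DFS)."""
    found: list[tuple[int, str]] = []
    stack = [(node, prefix)]
    while stack:
        cur, text = stack.pop()
        if text in weights:                     # this path spells a stored word
            found.append((weights[text], text))
        for ch, child in cur.children.items():
            stack.append((child, text + ch))
    found.sort(key=lambda p: (-p[0], p[1]))     # weight desc, then lexicographic
    node.top_k = found[:k]

root = Node()
weights = {"a": 5, "ab": 4, "ac": 3}
for w in weights:
    insert(root, w)
weights["a"] = 1                                # weight decrease for "a"
for depth, n in enumerate(insert(root, "a")):   # nodes on the path of "a"
    rebuild_top_k(n, "a"[:depth], weights, k=2)
```

Each rebuild costs time proportional to the subtree size, so a decrease on a short, popular prefix is expensive; that cost is the price of keeping the common path (increases) cheap.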
Complexity
- add: O(L · K log K)
- suggest: O(L + K)
- Space: O(N · L · K) worst case; in practice much less because most nodes don’t have K distinct descendants
Implementation Requirements
class _Node:
    __slots__ = ("children", "top_k")
    def __init__(self):
        self.children: dict[str, _Node] = {}
        # (weight, word) pairs kept sorted by (-weight, word); at most K entries.
        self.top_k: list[tuple[int, str]] = []
class Autocomplete:
    def __init__(self, k: int = 5):
        self._root = _Node()
        self._weights: dict[str, int] = {}
        self._k = k
    def add(self, word: str, weight_delta: int = 1) -> None:
        # Additive weights. A negative delta can leave stale cache entries;
        # see the weight-decrease edge case in the Correctness Argument.
        word = word.lower()
        new_weight = self._weights.get(word, 0) + weight_delta
        self._weights[word] = new_weight
        node = self._root
        nodes_on_path: list[_Node] = [node]
        for ch in word:
            node = node.children.setdefault(ch, _Node())
            nodes_on_path.append(node)
        for n in nodes_on_path:
            self._upsert_top_k(n, word, new_weight)
    def suggest(self, prefix: str) -> list[str]:
        prefix = prefix.lower()
        node = self._root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # top_k is already sorted and trimmed by _upsert_top_k.
        return [w for _, w in node.top_k]
    def _upsert_top_k(self, node: _Node, word: str, weight: int) -> None:
        # Case 1: the word is already cached at this node; update it in place.
        for i, (w, ww) in enumerate(node.top_k):
            if ww == word:
                node.top_k[i] = (weight, word)
                node.top_k.sort(key=lambda p: (-p[0], p[1]))
                return
        # Case 2: new candidate; insert, re-sort, and trim to K.
        node.top_k.append((weight, word))
        node.top_k.sort(key=lambda p: (-p[0], p[1]))
        if len(node.top_k) > self._k:
            del node.top_k[self._k:]
Tests
import unittest
class TestAutocomplete(unittest.TestCase):
    def test_basic(self):
        ac = Autocomplete(k=3)
        for w, n in [("app", 10), ("apple", 5), ("apply", 3), ("apricot", 1)]:
            ac.add(w, n)
        self.assertEqual(ac.suggest("ap"), ["app", "apple", "apply"])
        self.assertEqual(ac.suggest("app"), ["app", "apple", "apply"])
        self.assertEqual(ac.suggest("apr"), ["apricot"])
        self.assertEqual(ac.suggest("z"), [])
    def test_weight_update(self):
        ac = Autocomplete(k=3)
        ac.add("a", 1); ac.add("b", 2); ac.add("c", 3)
        ac.add("a", 10)  # a now weight 11
        # Suggest off the empty prefix
        self.assertEqual(ac.suggest(""), ["a", "c", "b"])
    def test_top_k_truncation(self):
        ac = Autocomplete(k=2)
        for c, w in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
            ac.add(c, w)
        self.assertEqual(ac.suggest(""), ["d", "c"])
    def test_lex_tie(self):
        ac = Autocomplete(k=3)
        ac.add("banana", 5); ac.add("apple", 5); ac.add("cherry", 5)
        self.assertEqual(ac.suggest(""), ["apple", "banana", "cherry"])
Follow-up Questions
(3) Scale to N nodes? Shard by first character (A–Z) → 26 shards, each with its own trie. For more even distribution, shard by hash(prefix[:2]). Each suggestion query routes to one shard. For the empty-prefix query, broadcast and merge — the price you pay for sharding by prefix.
(4) Observe / monitor? Per-prefix-length latency histogram (short prefixes = many candidates, slow); cache hit-rate (proportion of queries hitting precomputed top-K vs needing DFS); query volume per prefix (top hot prefixes). Alert on p99 latency.
(9) Eviction / cleanup? Words may go cold (a celebrity who stopped trending). Strategy: timestamped weight, decay on a schedule (multiply all weights by 0.95 daily), delete when below a threshold. Or use a separate “deletion” path: remove(word) walks the trie, removes from each node’s top_k, and rebuilds top_k from a wider candidate set if the removed word was in it.
(11) Configuration knobs? K, case_sensitivity, weight_decay_factor. Knobs not to expose: trie node layout, sort algorithm.
(8) Partial failure? A query that hits a partially-rebuilt index during an add could see stale top-K. Solutions: (a) atomic per-node update (under a per-node lock), (b) versioned snapshot (queries read a stable version while writes go to a shadow), (c) accept stale results for ~1 second. For autocomplete, eventual consistency is fine.
Product Extension
Google’s autocomplete is far more than a trie: it’s a personalized, context-aware, learning-ranked system. The trie + top-K is the index layer; on top sits a ranker that combines popularity, personalization (your history), context (location, time), and freshness (trending). Production systems also add typo tolerance (Levenshtein-edit-distance fuzzy match within edit-distance ≤ 2) — a much harder problem solved with FSTs or n-gram inverted indexes.
Language/Runtime Follow-ups
- Python: dict-based trie is the simplest; for memory, switch to arrays of 26/128 children once you optimize. The implementation above is fine for 10^6 words; beyond that, consider a DAWG (a DAG that shares common suffix subtrees).
- Java: HashMap<Character, Node> per node, or arrays for ASCII. Apache Commons Collections has PatriciaTrie (a PATRICIA trie).
- Go: map[rune]*Node per node. Works well for this workload; Go’s GC copes reasonably with many small allocations.
- C++: same pattern. For best performance, use std::array<Node*, 26> for lowercase ASCII letters.
- JS/TS: Map<string, Node> per node; no concurrency concerns in single-threaded Node.
Common Bugs
- Inserting and forgetting to update top-K at every node on the path (only updating the leaf). Subsequent prefix queries return empty.
- Sorting top-K by weight only, forgetting lex tie-break. Tests with equal weights become flaky.
- On weight decrease, leaving a stale entry in top-K. Solution: full rebuild of top-K on decrease.
- Case mismatch: insert lowercase, query as-is. Lowercase both.
- Memory: storing the full word in every node’s top-K — at 10^6 words and depth 30, this is 30M strings. Store an integer ID and look up the word in a side dict for memory savings.
Debugging Strategy
For “wrong suggestions” bugs: print node.top_k at the prefix node and verify it matches the expected top-K. For “missing word” bugs: walk down the trie from root, printing node.children at each step, confirm the path exists. For weight bugs: dump self._weights[word] and compare to expected. For performance, profile: most hot paths are dict.setdefault and the in-place sort.
Mastery Criteria
- Implemented trie + per-node top-K in <30 minutes.
- All four tests pass first run.
- Stated trie + top-K caching tradeoff (insert is K-times slower, query is L+K instead of L+all-descendants).
- Answered follow-ups #3 (sharding), #4 (observability), #9 (decay), #8 (eventual consistency) crisply.
- Compared trie vs DAWG vs FST for memory.
- Articulated typo-tolerance design (BK-tree / fuzzy n-grams) at a high level.
Lab 08 — Log Parser
Goal
Implement a streaming log parser that reads log lines (potentially gigabytes), extracts structured fields via regex, aggregates per-field counts, and emits structured output — all under bounded memory. After this lab you should be able to write a clean streaming text-processing class with bounded memory in under 25 minutes.
Background Concepts
Log parsing has two patterns: batch (load file, parse all, output) and streaming (read one line at a time, emit incremental output). The bar at senior interviews is the streaming variant because real production logs are too large to load — multi-gigabyte files where batch processing would OOM.
The two streaming primitives are:
- Line-by-line iteration with a generator (for line in file: in Python). Memory is O(line size), not O(file size).
- Bounded aggregation: when counting unique IPs over a 1 TB log, you cannot keep all distinct IPs in a dict. Bound the aggregation by either (a) sketch (HyperLogLog for distinct counts, count-min for top-K), or (b) “top-K with eviction” using a min-heap of size K.
The regex itself is mundane. The interview signal is the discipline of bounded memory and clean separation between parser, extractor, and aggregator.
Interview Context
Log parsing is a popular question at logging / observability companies (Datadog, Splunk, Honeycomb, Cribl, Elastic) and at any infrastructure company that processes high-volume telemetry. It tests streaming discipline, regex fluency, and bounded-memory awareness. It also exposes weak engineering: a candidate who writes lines = file.readlines() instantly fails the bounded-memory criterion.
Problem Statement
Design LogParser(pattern, top_k=10):
- parse_stream(line_iter) -> Iterator[dict]: yield a dict per line with extracted named fields. Skip malformed lines (count them).
- aggregate(line_iter) -> dict: return per-field top-K aggregates (e.g., top 10 IPs, top 10 paths, top 10 status codes). Bounded memory.
The regex is provided at construction; the parser must use named capture groups.
Constraints
- Input file size: up to 100 GB
- Aggregator memory: ≤ 100 MB
- Target throughput: 50 MB/s on a single core
Clarifying Questions
- Is the log format known? (Yes — caller provides regex with named groups.)
- Malformed lines: skip, error, or quarantine? (Skip + count by default; quarantine optionally.)
- Aggregation: which fields, what kind (count, distinct, top-K)? (Caller specifies.)
- Time-series: are we computing per-time-window aggregates? (Optional; default is whole-stream.)
- Encoding: UTF-8? Binary? (UTF-8 default; binary is a follow-up.)
Examples
pattern = r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d+) (?P<bytes>\d+)'
parser = LogParser(pattern, top_k=3)
for record in parser.parse_stream(open("access.log")):
    print(record)
# {'ip': '1.2.3.4', 'ts': '20/May/2026:12:00:00', 'method': 'GET', 'path': '/x', 'status': '200', 'bytes': '1234'}
agg = parser.aggregate(open("access.log"))
# {'ip': [('1.2.3.4', 1500), ('5.6.7.8', 1200), ...],
# 'path': [('/api', 5000), ('/login', 3000), ...],
# 'status': [('200', 90000), ('404', 4000), ('500', 200)],
# 'malformed': 12,
# 'total': 100000}
Initial Brute Force
open(file).read() then re.finditer. Loads everything; OOMs on 100 GB.
Brute Force Complexity
Memory: O(file size). At 100 GB, instant OOM on a 32 GB machine.
Optimization Path
Stream line-by-line with for line in file:. For aggregation, replace the unbounded dict[ip, count] with: (a) keep all counts during the stream because the unique key cardinality is what matters (often 100k unique IPs is fine), or (b) for very high cardinality, use HyperLogLog (HLL) for distinct counts and count-min sketch + min-heap for top-K. For most workloads at moderate cardinality, the dict is fine; at extreme cardinality, sketches are required.
Final Expected Approach
Compile the regex once. Stream lines via the iterator. For each line, match and yield the groupdict() if matched, else increment malformed count. Aggregator: per configured field, maintain a Counter (which is a dict); at end, most_common(K). For very high cardinality, switch to count-min + heap.
Data Structures Used
| Structure | Purpose |
|---|---|
| Compiled re.Pattern | match each line in O(line length) |
| Counter per field | exact top-K within bounded cardinality |
| Min-heap of (count, key) of size K | bounded top-K when cardinality is unbounded |
| Counters for total, malformed | observability |
Correctness Argument
Streaming: for line in file: reads at most one line buffer at a time. Memory is O(longest line + aggregator state). For 100 GB files with 1 KB lines, memory stays at ~aggregator-state size.
Aggregation: Counter.most_common(K) returns the exact top-K when all keys are tracked. When using a count-min sketch + bounded heap, the result is approximate with bounded error: actual_count ≤ estimate ≤ actual_count + ε · total with probability ≥ 1 − δ. We pick ε, δ to fit memory.
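As a rough illustration of picking ε and δ to fit memory, the usual count-min sizing is width = ⌈e/ε⌉ and depth = ⌈ln(1/δ)⌉. The sketch below applies it; the helper names cms_dimensions and cms_bytes, and the 8-bytes-per-counter figure, are assumptions made for this example (a pure-Python list of ints costs noticeably more per cell).

```python
import math


def cms_dimensions(epsilon: float, delta: float) -> tuple[int, int]:
    """Return (width, depth) for a count-min sketch.

    Standard sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must both lie in (0, 1)")
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1 / delta))
    return width, depth


def cms_bytes(epsilon: float, delta: float, bytes_per_counter: int = 8) -> int:
    """Estimated table size, assuming fixed-width counters (an assumption)."""
    width, depth = cms_dimensions(epsilon, delta)
    return width * depth * bytes_per_counter


if __name__ == "__main__":
    w, d = cms_dimensions(epsilon=1e-4, delta=1e-3)
    print(f"width={w} depth={d} bytes={cms_bytes(1e-4, 1e-3)}")
```

With ε = 10^-4 and δ = 10^-3 this comes to roughly 27k × 7 counters, about 1.5 MB at 8 bytes each. On a 10^9-line stream the over-count can be as large as ε · total = 10^5, so only keys well above that count are ranked reliably.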
Complexity
- Per line: O(L · regex-complexity) for parsing + O(F) for F fields aggregated
- Total: O(N · L · regex)
- Memory: O(unique keys per field) for exact aggregation; O(width × depth) for sketch
Implementation Requirements
import heapq
import random
import re
from collections import Counter
from typing import Iterable, Iterator, Optional


class LogParser:
    def __init__(self, pattern: str, top_k: int = 10,
                 aggregate_fields: Optional[list[str]] = None):
        self._re = re.compile(pattern)   # compile once, reuse for every line
        self._k = top_k
        self._fields = aggregate_fields  # None = aggregate all named groups
        self.malformed = 0               # lines skipped by parse_stream

    def parse_stream(self, lines: Iterable[str]) -> Iterator[dict]:
        for line in lines:
            m = self._re.match(line.rstrip("\n"))
            if m is None:
                self.malformed += 1      # skip malformed lines, but count them
                continue
            yield m.groupdict()

    def aggregate(self, lines: Iterable[str]) -> dict:
        counters: dict[str, Counter] = {}
        total = malformed = 0
        for line in lines:
            total += 1
            m = self._re.match(line.rstrip("\n"))
            if m is None:
                malformed += 1
                continue
            d = m.groupdict()
            fields = self._fields or list(d.keys())
            for f in fields:
                v = d.get(f)
                if v is None:            # optional group that did not participate
                    continue
                counters.setdefault(f, Counter())[v] += 1
        out = {f: c.most_common(self._k) for f, c in counters.items()}
        out["total"] = total
        out["malformed"] = malformed
        return out


# Bounded-memory variant: approximate top-K via count-min sketch + min-heap
class BoundedTopK:
    """Approximate top-K using a count-min sketch + min-heap of size K.
    For high-cardinality streams. Use one instance per field in place of
    the per-field Counter in LogParser.aggregate.
    """
    def __init__(self, k: int, width: int = 2048, depth: int = 5):
        self._k = k
        self._w, self._d = width, depth
        self._table = [[0] * width for _ in range(depth)]
        # One seed per row. hash() of a str is salted per process, which is
        # fine here because the sketch never leaves the process.
        self._seeds = [random.randint(1, 2**31 - 1) for _ in range(depth)]
        self._heap: list[tuple[int, str]] = []   # (count, key); counts may be stale
        self._in_heap: dict[str, int] = {}       # key -> latest estimate

    def add(self, key: str) -> None:
        est = self._increment(key)
        if key in self._in_heap:
            self._in_heap[key] = est             # heap entry is refreshed lazily
            return
        if len(self._heap) < self._k:
            heapq.heappush(self._heap, (est, key))
            self._in_heap[key] = est
            return
        # Refresh stale tops so heap[0] really is the smallest tracked key.
        while self._heap[0][0] != self._in_heap[self._heap[0][1]]:
            stale_key = self._heap[0][1]
            heapq.heapreplace(self._heap, (self._in_heap[stale_key], stale_key))
        if est > self._heap[0][0]:
            _, old_key = heapq.heapreplace(self._heap, (est, key))
            del self._in_heap[old_key]
            self._in_heap[key] = est

    def _increment(self, key: str) -> int:
        ests = []
        for i in range(self._d):
            j = hash((self._seeds[i], key)) % self._w
            self._table[i][j] += 1
            ests.append(self._table[i][j])
        return min(ests)                         # count-min: never under-counts

    def top_k(self) -> list[tuple[str, int]]:
        # Read counts from _in_heap; heap entries can lag behind.
        return sorted(self._in_heap.items(), key=lambda p: -p[1])
Tests
import io
import unittest

LOG_PATTERN = (r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
               r'"(?P<method>\S+) (?P<path>\S+) HTTP/[\d.]+" '
               r'(?P<status>\d+) (?P<bytes>\d+)')
SAMPLE = """1.1.1.1 - - [01/Jan/2026:00:00:00 +0000] "GET /a HTTP/1.1" 200 100
2.2.2.2 - - [01/Jan/2026:00:00:01 +0000] "GET /b HTTP/1.1" 200 200
1.1.1.1 - - [01/Jan/2026:00:00:02 +0000] "POST /a HTTP/1.1" 500 0
malformed log line junk junk junk
1.1.1.1 - - [01/Jan/2026:00:00:03 +0000] "GET /a HTTP/1.1" 200 100
"""


class TestParser(unittest.TestCase):
    def test_parse_stream(self):
        p = LogParser(LOG_PATTERN, top_k=3)
        recs = list(p.parse_stream(io.StringIO(SAMPLE)))
        self.assertEqual(len(recs), 4)
        self.assertEqual(recs[0]["ip"], "1.1.1.1")
        self.assertEqual(recs[0]["status"], "200")

    def test_aggregate(self):
        p = LogParser(LOG_PATTERN, top_k=3, aggregate_fields=["ip", "status"])
        agg = p.aggregate(io.StringIO(SAMPLE))
        self.assertEqual(agg["total"], 5)
        self.assertEqual(agg["malformed"], 1)
        self.assertEqual(agg["ip"][0], ("1.1.1.1", 3))
        self.assertEqual(dict(agg["status"]), {"200": 3, "500": 1})

    def test_streaming_memory(self):
        # Generate a synthetic stream and ensure parse_stream is lazy
        def gen():
            for i in range(10000):
                yield f'1.1.1.1 - - [now] "GET /p{i % 100} HTTP/1.1" 200 100'
        p = LogParser(LOG_PATTERN)
        # consume one record at a time
        it = p.parse_stream(gen())
        first = next(it)
        self.assertEqual(first["path"], "/p0")
Follow-up Questions
(4) Observe / monitor? Throughput (lines/sec), parse error rate (malformed/total), per-field cardinality (gauge), p99 line size (latency surrogate). Alert on parse error rate spiking — usually means upstream changed the format.
(5) Tests? Unit on regex correctness with hand-crafted lines; property-based tests with random-line generators; smoke on a real prod-shaped sample (1 MB); large-input test that asserts memory stays bounded (tracemalloc.get_traced_memory() in Python).
(7) Backpressure? If the consumer of parse_stream is slow, the iterator naturally pauses — Python generators are pull-based. For the producer side (file reads), no backpressure issue. If shipping to a downstream like Kafka, buffer with a bounded queue and drop on full (with a counter).
(11) Configuration knobs? pattern, top_k, aggregate_fields, bounded_memory: bool (toggle exact vs sketch-based). Knobs not to expose: regex compilation cache.
(13) Poison pill? A line that takes O(catastrophic backtracking) on the regex (regex DoS via specific patterns). Mitigation: line length cap (skip lines > N bytes), regex timeout (Python: only available in regex package, not stdlib re), or pre-compile with anchors and avoid .* at the start.
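The line-length cap mentioned for the poison-pill case can be sketched as a small generator placed in front of the parser. The name cap_line_length, the 8192-character default, and the stats dict are assumptions for this example, not part of the lab's API.

```python
from typing import Iterable, Iterator, Optional


def cap_line_length(lines: Iterable[str], max_chars: int = 8192,
                    stats: Optional[dict] = None) -> Iterator[str]:
    """Yield only lines short enough to hand to the regex.

    Oversized lines are skipped and counted in stats["oversized"] so the
    drop is observable rather than silent.
    """
    for line in lines:
        if len(line) > max_chars:
            if stats is not None:
                stats["oversized"] = stats.get("oversized", 0) + 1
            continue
        yield line


if __name__ == "__main__":
    stats: dict = {}
    sample = ["short line\n", "x" * 50_000 + "\n", "another short line\n"]
    for line in cap_line_length(sample, max_chars=1000, stats=stats):
        print(line, end="")
    print(stats)
```

Because it is a generator, it composes with a streaming parser without buffering, for example parser.parse_stream(cap_line_length(f)). Note that for line in file has already read the whole line by the time the length check runs, so this protects the regex rather than the read buffer; a hard byte cap would need f.readline(limit) or chunked reads.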
Product Extension
Production systems use one of: logstash / fluentd (regex-based extraction with field rules), CloudWatch Logs Insights (column-based after extraction), Datadog Logs / Splunk (full pipeline with grok patterns and ingest-time enrichment). The data structure that powers most “top-K-over-stream” dashboards is count-min + heap; HLL powers distinct-count widgets; reservoir sampling powers “show me 100 random matching events”.
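For the "show me 100 random matching events" widget, a minimal sketch of reservoir sampling (Algorithm R) looks like the following. The function name reservoir_sample and the optional rng parameter are choices made for this example; it keeps a uniform sample of k items in O(k) memory without knowing the stream length in advance.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> list[T]:
    """Return up to k items chosen uniformly at random from a stream."""
    rng = rng or random.Random()
    sample: list[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                sample[j] = item         # replace with probability k / (i + 1)
    return sample


if __name__ == "__main__":
    events = (f"event-{i}" for i in range(1_000_000))
    print(reservoir_sample(events, 5, random.Random(42)))
```

Passing a seeded random.Random makes the sample reproducible in tests; in production the default unseeded generator is the more natural choice.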
Language/Runtime Follow-ups
- Python: re is fast enough for most logs but is a backtracking engine, not a DFA, so catastrophic backtracking is a real risk. Use the regex package for timeout support. For raw speed, pyre2 (an RE2 binding) avoids backtracking entirely.
- Java: call Pattern.compile(...) once and reuse it. Matcher is stateful and not thread-safe, so create one per thread. For very high throughput, RE2/J avoids backtracking.
- Go: the regexp package uses RE2 syntax and guarantees linear-time matching, with no catastrophic backtracking. Idiomatic for log parsing.
- C++: std::regex is slow; prefer Boost.Regex or PCRE2 in production.
- JS/TS: V8’s regex engine is backtracking; same DoS concern as Python’s re. Node has no built-in regex timeout.
Common Bugs
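- Loading the file: open(f).readlines() or f.read().split("\n"). Instant OOM on large files.
- Recompiling the regex per line: pure overhead in the hot loop. Python’s re module caches recently compiled patterns, so the penalty there is a cache lookup per call rather than a full compile, but runtimes without a cache (e.g. Java’s Pattern.compile) pay the full cost every time.
- Forgetting to strip \n: a trailing group built on a negated character class (e.g. [^ ]+) can capture the \n and break comparisons.
- Using .* greedily inside the pattern: catastrophic backtracking on long lines.
- Aggregator dict grows unbounded on high-cardinality fields (e.g., user-agent string with version churn). Cap or use sketch.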
Debugging Strategy
For parse failures: print the first 5 malformed lines and inspect the regex against them. For wrong field values: print m.groupdict() of one matching line. For OOM: tracemalloc.start(); ...; print(tracemalloc.get_traced_memory()) at intervals — find the structure that grows. For slowness: cProfile and check whether the hot spot is regex match or dict update.
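The tracemalloc check described above can be wrapped in a small helper. This is a sketch under stated assumptions: peak_bytes, synthetic_lines, count_streaming, and count_materialized are names invented for the example, and the synthetic log line only imitates the access-log shape used in this lab.

```python
import tracemalloc
from typing import Callable, Iterable, Iterator


def peak_bytes(fn: Callable[[], object]) -> int:
    """Run fn and return the peak traced allocation size in bytes."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def synthetic_lines(n: int) -> Iterator[str]:
    """Lazily generate n fake access-log lines."""
    for i in range(n):
        yield f'1.1.1.1 - - [now] "GET /p{i % 100} HTTP/1.1" 200 100\n'


def count_streaming(lines: Iterable[str]) -> int:
    return sum(1 for _ in lines)         # holds one line at a time


def count_materialized(lines: Iterable[str]) -> int:
    return len(list(lines))              # holds every line at once


if __name__ == "__main__":
    n = 50_000
    print("streaming   :", peak_bytes(lambda: count_streaming(synthetic_lines(n))))
    print("materialized:", peak_bytes(lambda: count_materialized(synthetic_lines(n))))
```

The same helper can wrap a real aggregate call in a large-input test; asserting that the peak stays under a fixed budget as n grows is one way to turn the bounded-memory claim into a regression test.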
Mastery Criteria
- Implemented streaming LogParser with bounded aggregation in <25 minutes.
- All three tests pass first run.
- Stated for line in file: lazy iteration without prompting.
- Explained when to switch from Counter to count-min + heap (when unique-key memory exceeds budget).
- Answered follow-ups #4, #7, #11, #13 (regex DoS) crisply.
- Identified backtracking risk in user-supplied regexes.
Lab 09 — File Deduplication
Goal
Find all groups of duplicate files in a directory tree using a three-stage filter: size → quick hash (first/last K bytes) → full content hash. After this lab you should be able to design and implement an efficient file deduper that minimizes I/O on huge directories in under 25 minutes.
Background Concepts
The naive approach — read every file and group by full hash — is correct but wastes I/O on files that are clearly not duplicates (different sizes). The classical optimization is a three-stage cascade:
- Group by size. Two files of different sizes cannot be duplicates. This is a stat call (cheap, no read).
- Group by quick hash. For each size-group with ≥2 files, hash the first and last K bytes (e.g., 4 KB each). Files with different head+tail hashes are not duplicates.
- Group by full hash. For each surviving group with ≥2 files, hash the full content. This is the only stage that does full reads.
This works because in real datasets, most files of the same size differ in their first/last few KB (think .docx files with embedded timestamps, video files with different headers, log files with different first lines). The cascade reduces total I/O by ~10–100×.
Interview Context
This is a popular practical problem at infrastructure / file-storage companies (Dropbox, Box, Google Drive, AWS S3 dedup). It tests I/O awareness, hashing fluency, and ability to design a multi-stage filter pipeline. The interviewer wants to hear “I’d minimize I/O by going size → quick-hash → full-hash” before any code.
Problem Statement
Design find_duplicates(root) -> list[list[Path]]:
- Walk the directory tree under root.
- Return groups (lists) of paths that have identical content. Each group has ≥ 2 paths.
- Use the three-stage cascade.
Constraints
- Up to 10^6 files
- Up to 1 TB total bytes
- Memory: ≤ 1 GB
- Read budget: minimize bytes read (the goal of optimization)
Clarifying Questions
- Symlinks — follow or skip? (Skip by default to avoid loops; configurable.)
- Hidden files? (Include by default; let caller filter.)
- Empty files (size 0) — duplicates of each other? (Yes — include them as a group; or filter; ask.)
- Hash function? (hashlib.blake2b is fast and cryptographic; sha256 is the common cryptographic default; md5 is acceptable only for non-adversarial dedup. Pick the fastest hash that fits the threat model, and a cryptographic one if the result is durable.)
- Concurrency? (Not strictly required; CPU is hashing, I/O is reads, and both parallelize well. State as a follow-up.)
Examples
root/
├── a.txt "hello"
├── b.txt "hello"
├── c.txt "world"
├── d.txt "world"
└── e.txt "different"
find_duplicates("root")
-> [[Path("root/a.txt"), Path("root/b.txt")],
[Path("root/c.txt"), Path("root/d.txt")]]
Initial Brute Force
For every pair of files, compare byte-for-byte. O(N² · L) reads. Catastrophic at 10^6 files.
Brute Force Complexity
O(N²) pairwise comparisons, each O(L). At N=10^6 and L=1 MB, this is 10^18 byte comparisons — never completes.
Optimization Path
Three-stage cascade:
- Group by size: O(N) stat calls. Memory O(N · path).
- Quick-hash within each size group: O(K · |group|) per group; only run on size-groups with ≥2 files.
- Full-hash within each quick-hash group: O(L · |group|).
Total reads: most files are excluded at stage 1 (different sizes); of the rest, most are excluded at stage 2 (quick hash differs). Only true (or very near) duplicates get a full read.
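A back-of-envelope model makes the read savings concrete. The functions below are a sketch, and the survivor counts in the example (200,000 files sharing a size with another file, 20,000 still colliding after the quick hash) are hypothetical inputs chosen for illustration, not measurements.

```python
def naive_bytes_read(n_files: int, avg_file_bytes: int) -> int:
    """Hash-everything baseline: every file is read in full."""
    return n_files * avg_file_bytes


def cascade_bytes_read(n_size_candidates: int, n_full_candidates: int,
                       avg_file_bytes: int, quick_bytes: int = 4096) -> int:
    """Bytes read by the quick-hash and full-hash stages.

    Stage 1 (size grouping) only calls stat, so it reads no file content.
    """
    quick_stage = n_size_candidates * min(2 * quick_bytes, avg_file_bytes)
    full_stage = n_full_candidates * avg_file_bytes
    return quick_stage + full_stage


if __name__ == "__main__":
    MB = 1 << 20
    naive = naive_bytes_read(1_000_000, MB)
    cascade = cascade_bytes_read(200_000, 20_000, MB)
    print(f"naive  : {naive / 1e9:.1f} GB")
    print(f"cascade: {cascade / 1e9:.1f} GB")
    print(f"savings: {naive / cascade:.0f}x")
```

Under these assumed inputs the cascade reads about 22.6 GB against roughly 1 TB for the baseline, a saving in the tens. The full-hash term dominates, so the fraction of files that survive the quick hash is the number to watch.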
Final Expected Approach
Walk the tree once, building dict[size, list[Path]]. For groups with len ≥ 2, build dict[quick_hash, list[Path]]. For surviving groups, build dict[full_hash, list[Path]]. Output groups with len ≥ 2.
Data Structures Used
| Structure | Stage |
|---|---|
| dict[int, list[Path]] | size grouping |
| dict[bytes, list[Path]] | quick-hash grouping |
| dict[bytes, list[Path]] | full-hash grouping |
Correctness Argument
Soundness: every group output has identical full content (verified by full-hash equality, modulo a per-pair collision probability of about 2^-256 for a 256-bit digest such as SHA-256 or the 32-byte BLAKE2b used below, which is negligible).
Completeness: two files A, B with identical content have:
- equal size (so they survive stage 1)
- equal quick-hash (so they survive stage 2)
- equal full-hash (so they end up in the same output group)
Therefore A and B appear in the same output group. The cascade does not miss any duplicate.
Complexity
- Time: O(N) for size grouping; O(K · N_size_dups) for quick-hash; O(L · N_full_dups) for full-hash. K ≪ L, N_full_dups ≪ N.
- Space: O(N) for path bookkeeping.
Implementation Requirements
import hashlib
import os
import stat
from collections import defaultdict
from pathlib import Path

QUICK_BYTES = 4096  # 4 KB head + 4 KB tail


def _quick_hash(path: Path) -> bytes:
    h = hashlib.blake2b(digest_size=16)
    size = path.stat().st_size
    with open(path, "rb") as f:
        h.update(f.read(QUICK_BYTES))
        # Only hash the tail when it does not overlap the head.
        if size > 2 * QUICK_BYTES:
            f.seek(-QUICK_BYTES, os.SEEK_END)
            h.update(f.read(QUICK_BYTES))
    return h.digest()


def _full_hash(path: Path, chunk: int = 1 << 20) -> bytes:
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)          # stream in 1 MB chunks; never load the file
            if not buf:
                break
            h.update(buf)
    return h.digest()


def find_duplicates(root: str | Path,
                    follow_symlinks: bool = False,
                    include_empty: bool = False) -> list[list[Path]]:
    root = Path(root)

    # Stage 1: group by size (stat only, no reads).
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _, files in os.walk(root, followlinks=follow_symlinks):
        for name in files:
            p = Path(dirpath) / name
            try:
                st = p.stat() if follow_symlinks else p.lstat()
            except OSError:              # covers PermissionError and vanished files
                continue
            # On an lstat result, S_ISREG rejects symlinks, FIFOs, sockets, devices.
            # (Path.is_file() follows symlinks, so it cannot be used to skip them.)
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size == 0 and not include_empty:
                continue
            by_size[st.st_size].append(p)
    candidates_after_size = [g for g in by_size.values() if len(g) >= 2]

    # Stage 2: split each size group by quick hash (head + tail).
    by_quick: list[list[Path]] = []
    for group in candidates_after_size:
        sub: dict[bytes, list[Path]] = defaultdict(list)
        for p in group:
            try:
                sub[_quick_hash(p)].append(p)
            except OSError:
                continue
        by_quick.extend(g for g in sub.values() if len(g) >= 2)

    # Stage 3: split each surviving group by full-content hash.
    out: list[list[Path]] = []
    for group in by_quick:
        sub = defaultdict(list)
        for p in group:
            try:
                sub[_full_hash(p)].append(p)
            except OSError:
                continue
        out.extend(g for g in sub.values() if len(g) >= 2)
    return out
Tests
import tempfile
import unittest
from pathlib import Path


class TestDedup(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.TemporaryDirectory()
        self.root = Path(self.tmp.name)

    def tearDown(self):
        self.tmp.cleanup()

    def _w(self, name: str, content: bytes):
        p = self.root / name
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(content)
        return p

    def test_basic(self):
        a = self._w("a", b"hello")
        b = self._w("sub/b", b"hello")
        c = self._w("c", b"world")
        d = self._w("sub/d", b"world")
        e = self._w("e", b"unique")
        groups = find_duplicates(self.root)
        flat = sorted([sorted(map(str, g)) for g in groups])
        self.assertEqual(len(flat), 2)
        self.assertIn(sorted([str(a), str(b)]), flat)
        self.assertIn(sorted([str(c), str(d)]), flat)

    def test_size_excludes_non_duplicates(self):
        self._w("a", b"x" * 100)
        self._w("b", b"x" * 200)  # different size
        self.assertEqual(find_duplicates(self.root), [])

    def test_quick_hash_disambiguates(self):
        # Same size, different head: stage 2 separates them
        self._w("a", b"head1" + b"x" * 8000)
        self._w("b", b"head2" + b"x" * 8000)
        self.assertEqual(find_duplicates(self.root), [])

    def test_large_groups(self):
        for i in range(5):
            self._w(f"copy{i}", b"same content")
        groups = find_duplicates(self.root)
        self.assertEqual(len(groups), 1)
        self.assertEqual(len(groups[0]), 5)
Follow-up Questions
(3) Scale to N nodes? Distributed dedup over a fleet: size grouping is local on each node; for cross-node dedup, broadcast (size, quick_hash, node, path) tuples to a coordinator, group by (size, quick_hash), then have nodes that share a group exchange full hashes. The full read happens locally; only hashes (32 bytes) move over the network.
(4) Observe / monitor? Files scanned (counter), bytes read at each stage (gauge — quantifies the savings of the cascade), groups found (gauge), errors per stage (counter). The “bytes read at full-hash stage / total bytes” ratio is the key savings metric.
(8) Partial failure? A file deleted/replaced mid-scan: the second-stage hash may differ from the first-stage size or hash. Solutions: (a) treat the file as missing on OSError and skip, (b) snapshot the FS (LVM snapshot, ZFS, Btrfs) before scanning. (a) is the practical answer.
(13) Poison-pill input? A multi-TB file (or a sparse file with a 1 PB apparent size) blows up the full-read stage. Mitigation: cap files by size (skip if > max_file_size), or use Merkle-tree chunked hashing where each chunk is independent and partial reads can be cached/resumed.
(11) Configuration knobs? quick_bytes (default 4 KB), chunk_size for full read, max_file_size, follow_symlinks, include_empty. Knobs not to expose: hash algorithm (pick BLAKE2b for speed unless a contract requires SHA-256).
Product Extension
fdupes, rmlint, and jdupes are the canonical Linux tools and use similar size-then-hash cascades. Dropbox’s chunked dedup operates at the 4 MB block level (each file is split into blocks, each block hashed and deduplicated separately), so two files that share an identical, aligned 4 MB block store that block once. ZFS dedup operates at the block level too. The “full-file dedup” you wrote here is the simplest version; production systems often go to chunk-level dedup for higher savings.
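To see how chunk-level dedup differs from the full-file version, here is a minimal sketch with fixed-size chunks. The function name stored_bytes is invented for the example, and it takes in-memory bytes for brevity; a real implementation would stream chunks from disk and persist the chunk index.

```python
import hashlib
from typing import Iterable


def stored_bytes(files: Iterable[bytes], chunk_size: int) -> int:
    """Bytes a chunk store would keep if each distinct chunk is stored once."""
    seen: dict[bytes, int] = {}          # chunk digest -> chunk length
    for data in files:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.blake2b(chunk, digest_size=32).digest()
            seen.setdefault(digest, len(chunk))
    return sum(seen.values())


if __name__ == "__main__":
    # Tiny 4-byte chunks so the effect is visible by eye.
    aligned = [b"AAAABBBB", b"AAAACCCC"]      # share the first chunk
    shifted = [b"AAAABBBB", b"xAAAABBB"]      # same content, off by one byte
    print(stored_bytes(aligned, 4), "of", sum(map(len, aligned)))
    print(stored_bytes(shifted, 4), "of", sum(map(len, shifted)))
```

The second case shows the known weakness of fixed-size chunking: a one-byte shift changes every chunk boundary, so nothing dedups. Content-defined chunking addresses that, at the cost of more complexity.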
Language/Runtime Follow-ups
- Python: os.walk is the standard; pathlib.Path.rglob('*') is more idiomatic but slower. hashlib.blake2b is in the stdlib and is often faster than SHA-256 in software (the gap depends on the CPU and on whether SHA hardware extensions are available).
- Java: Files.walk(Path) is the equivalent; MessageDigest for hashing. BLAKE2 requires BouncyCastle.
- Go: prefer filepath.WalkDir (Go 1.16+) over the older filepath.Walk, which calls Lstat on every entry. crypto/sha256 is fast; golang.org/x/crypto/blake2b for BLAKE2.
- C++: std::filesystem::recursive_directory_iterator. OpenSSL’s EVP_DigestUpdate for hashing.
- JS/TS: fs.promises.readdir(dir, {recursive: true}) (Node 20+). crypto.createHash('blake2b512') for hashing.
Common Bugs
- Treating symlinks as files: infinite loops or duplicate phantom matches. Use lstat and check that the mode is a regular file (stat.S_ISREG); Path.is_file() follows symlinks, so it does not filter them out.
- Forgetting to filter size-1 groups before stage 2 (running quick-hash on isolated files wastes I/O).
- Hashing with md5 and getting bitten by collision in adversarial datasets. Use SHA-256 or BLAKE2b.
- Reading the entire file into memory at full-hash stage instead of streaming. OOM on large files.
- Not handling permission errors: the first OSError halts the entire scan instead of skipping the file.
Debugging Strategy
For “missed duplicates”: verify with cmp (Unix) or byte-by-byte. Print the size/quick-hash/full-hash of both files. Most missed-dup bugs are quick-hash logic (e.g., not seeking to end correctly for files smaller than 2 * QUICK_BYTES).
Mastery Criteria
- Articulated the three-stage cascade in <60 seconds before coding.
- Implemented in <25 minutes; all four tests pass.
- Stated bytes-read savings is the main optimization signal, not wall-clock.
- Answered follow-ups #3, #4, #8, #13 crisply.
- Compared full-file vs chunk-level dedup correctly.
- Identified BLAKE2b as the right hash for performance.
Lab 10 — Consistent Hashing
Goal
Implement a consistent hash ring with virtual nodes that minimizes key remapping when servers are added or removed. After this lab you should be able to design and implement consistent hashing in under 30 minutes and articulate why it beats hash(key) % N.
Background Concepts
The naive sharding scheme hash(key) % N has a catastrophic failure mode: when N changes (a server is added or removed), nearly every key remaps to a different shard. For an in-memory cache fleet, this means the entire cache is invalidated; for a stateful sharded store, this means most data must be physically migrated.
Consistent hashing solves this. Servers are placed on a ring (a circular hash space, e.g., [0, 2^64)). Each key is hashed onto the ring and assigned to the next server clockwise. When a server is added, only keys between its predecessor and itself on the ring are remapped. When a server is removed, only its keys are remapped — to its successor.
Without virtual nodes (vnodes), the ring is unbalanced: a 4-server ring assigns wildly unequal slices. Virtual nodes fix this: each physical server gets V ring positions (e.g., V=200). The ring becomes statistically balanced in O(1/sqrt(V)) deviation.
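The mod-N remapping cost is easy to measure directly. The sketch below counts how many keys change shard when N goes from 3 to 4; stable_hash and mod_n_moved_fraction are names chosen for this example, and MD5 is used only because it is deterministic across processes.

```python
import hashlib
from typing import Sequence


def stable_hash(s: str) -> int:
    """Deterministic 64-bit hash (unlike Python's salted built-in hash())."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


def mod_n_moved_fraction(keys: Sequence[str], n_before: int, n_after: int) -> float:
    """Fraction of keys whose shard changes under hash(key) % N."""
    moved = sum(
        1 for k in keys
        if stable_hash(k) % n_before != stable_hash(k) % n_after
    )
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(10_000)]
    print(f"3 -> 4 servers: {mod_n_moved_fraction(keys, 3, 4):.1%} of keys move")
    print(f"9 -> 10 servers: {mod_n_moved_fraction(keys, 9, 10):.1%} of keys move")
```

A key stays put only when its hash has the same remainder under both moduli, which for N and N+1 happens for about 1/(N+1) of keys. So the fraction that moves approaches 100% as the fleet grows, the opposite of what you want.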
Interview Context
Consistent hashing is the default sharding mechanism for distributed caches (Memcached client libraries, Redis Cluster’s slot variant), distributed databases (DynamoDB, Cassandra), and load balancers (HAProxy, Envoy with ring_hash). It is asked at infrastructure roles at every Big Tech and many high-scale companies.
Problem Statement
Implement ConsistentHashRing(vnodes_per_server):
- add_server(server_id): add a server with V vnodes.
- remove_server(server_id): remove all vnodes for the server.
- get_server(key) -> server_id: return the server responsible for key.
- keys_moved(key, before, after): for analysis, report whether this key remapped.
Constraints
- 1 ≤ servers ≤ 10^4
- 1 ≤ vnodes per server ≤ 1000
- 10^5 lookups / second
- Lookup latency: O(log(N · V))
Clarifying Questions
- Hash function: cryptographic or fast? (Use a fast non-crypto hash: MurmurHash, xxHash. Stable across processes.)
- Vnode count V: hard-coded or configurable? (Configurable, default 100–200.)
- Replication: should get_server return one or multiple distinct servers? (Often a follow-up; primary is one.)
- Hot-spotting awareness: do we know any keys are extremely hot? (Bounded-load consistent hashing is a follow-up.)
Examples
ring = ConsistentHashRing(vnodes_per_server=100)
ring.add_server("s1"); ring.add_server("s2"); ring.add_server("s3")
ring.get_server("user-42") -> "s2"
ring.add_server("s4")
ring.get_server("user-42") -> "s2" or "s4" (only some keys remap)
# ~25% of keys remap on adding a 4th server, not 75% as with mod-N
Initial Brute Force
hash(key) % N. Simple and balanced; catastrophic on N change. Useful for understanding the problem, not as the solution.
Brute Force Complexity
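O(1) per lookup. On a change from N to N+1 servers, a key keeps its shard only if hash % N == hash % (N+1), which holds for about 1/(N+1) of keys; the other N/(N+1) move (75% when going from 3 to 4). That remap cost, not lookup cost, is why we move away from mod-N.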
Optimization Path
Replace mod-N with a sorted ring. Servers map to multiple positions; lookup is binary search. On add/remove, only insert/delete vnode positions; existing positions don’t move.
Final Expected Approach
A sorted list of (hash_value, server_id) tuples kept in ring order. get_server(key): hash the key, binary-search for the smallest ring position ≥ key_hash; wrap around if past the end. add_server: insert V positions. remove_server: remove all V positions.
Data Structures Used
| Structure | Purpose |
|---|---|
| Sorted list of (hash, server) | the ring |
| dict[server_id, list[hash_positions]] | bookkeeping for removal |
| bisect for binary search | O(log(N · V)) lookup |
Correctness Argument
Key locality on resize: when adding server S with vnodes [v_1, …, v_V], the only keys whose owner changes are those whose hash falls in some (predecessor(v_i), v_i] range. The expected fraction of keys affected is V / (total vnodes) ≈ 1/(N+1) — exactly the right number to assign to the new server, and no more.
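Balanced load: with V vnodes per server, each server’s share of the ring is a sum of V roughly independent arcs, so its relative standard deviation shrinks like 1/sqrt(V): about 10% at V=100 and about 3% at V=1000. The most-loaded of N servers sits a few standard deviations above the mean, which is why the balance test below uses a generous tolerance.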
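One way to check the balance claim empirically is to compute each server's exact share of the ring from the vnode positions, with no key sampling involved. This is a sketch: share_spread and the "s{i}#{v}" vnode naming are assumptions for the example, and the exact numbers depend on the hash function.

```python
import hashlib
import statistics

RING_SIZE = 2 ** 64


def _h(s: str) -> int:
    """Deterministic 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


def share_spread(n_servers: int, vnodes: int) -> float:
    """Relative std-dev of ring ownership across servers (0 = perfect balance)."""
    points = sorted(
        (_h(f"s{i}#{v}"), f"s{i}")
        for i in range(n_servers) for v in range(vnodes)
    )
    owned = {f"s{i}": 0 for i in range(n_servers)}
    # A vnode owns the arc (previous position, its position]. The first vnode
    # also owns the wrap-around arc that starts after the last vnode.
    prev = points[-1][0] - RING_SIZE
    for pos, server in points:
        owned[server] += pos - prev
        prev = pos
    shares = [arc / RING_SIZE for arc in owned.values()]
    return statistics.pstdev(shares) / statistics.mean(shares)


if __name__ == "__main__":
    for v in (1, 10, 100, 400):
        print(f"V={v:4d}  relative spread={share_spread(10, v):.3f}")
```

Running it for increasing V should show the spread falling roughly in line with 1/sqrt(V), which is the practical argument for choosing V in the hundreds rather than the tens.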
Complexity
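- get_server: O(log(N · V))
- add_server: V insertions, each O(log(N · V)) to locate the slot. With a Python list, bisect.insort also shifts elements, so each insertion is O(N · V) in the worst case; a balanced tree keeps the total at O(V · log(N · V)).
- remove_server: O(N · V) for the rebuild-and-filter approach used below.
- Space: O(N · V)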
Implementation Requirements
import bisect
import hashlib
from typing import Optional


def _hash(s: str) -> int:
    """Deterministic hash, stable across processes (unlike built-in hash()).
    MD5 is used for its distribution, not for security; xxHash or MurmurHash
    are faster if a third-party dependency is acceptable."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, vnodes_per_server: int = 100):
        self._v = vnodes_per_server
        self._ring: list[tuple[int, str]] = []  # sorted by hash
        self._server_positions: dict[str, list[int]] = {}

    def add_server(self, server_id: str) -> None:
        if server_id in self._server_positions:
            return
        positions = []
        for i in range(self._v):
            h = _hash(f"{server_id}#{i}")
            bisect.insort(self._ring, (h, server_id))
            positions.append(h)
        self._server_positions[server_id] = positions

    def remove_server(self, server_id: str) -> None:
        positions = self._server_positions.pop(server_id, None)
        if positions is None:
            return
        # Rebuild filtered ring (O(N V), acceptable because remove is rare)
        self._ring = [(h, s) for (h, s) in self._ring if s != server_id]

    def get_server(self, key: str) -> Optional[str]:
        if not self._ring:
            return None
        kh = _hash(key)
        # (kh, "") sorts before any (kh, server_id), so this finds the first
        # vnode at or after the key's position.
        idx = bisect.bisect_left(self._ring, (kh, ""))
        if idx == len(self._ring):
            idx = 0  # wrap around
        return self._ring[idx][1]

    def server_count(self) -> int:
        return len(self._server_positions)


# Bounded-load variant for hot-spot mitigation:
class BoundedLoadRing:
    """Consistent hashing with bounded-load: each server's load ≤ avg * (1+ε)."""
    def __init__(self, vnodes_per_server: int = 100, epsilon: float = 0.25):
        self._inner = ConsistentHashRing(vnodes_per_server)
        self._eps = epsilon
        self._load: dict[str, int] = {}   # assignments so far; never decremented here

    def add_server(self, sid: str) -> None:
        self._inner.add_server(sid)
        self._load.setdefault(sid, 0)

    def remove_server(self, sid: str) -> None:
        self._inner.remove_server(sid)
        self._load.pop(sid, None)

    def get_server(self, key: str, total_keys: int) -> Optional[str]:
        n = self._inner.server_count()
        if n == 0:
            return None
        cap = (total_keys / n) * (1 + self._eps)
        # Walk forward from the first candidate until we find one under cap.
        kh = _hash(key)
        ring = self._inner._ring
        idx = bisect.bisect_left(ring, (kh, ""))
        if idx == len(ring):
            idx = 0
        for offset in range(len(ring)):
            _, sid = ring[(idx + offset) % len(ring)]
            if self._load.get(sid, 0) < cap:
                self._load[sid] = self._load.get(sid, 0) + 1
                return sid
        return ring[idx][1]  # all over cap; pick first
Tests
import unittest


class TestRing(unittest.TestCase):
    def test_basic(self):
        r = ConsistentHashRing(vnodes_per_server=10)
        r.add_server("s1"); r.add_server("s2"); r.add_server("s3")
        self.assertIsNotNone(r.get_server("k1"))
        self.assertIn(r.get_server("k1"), {"s1", "s2", "s3"})
        r.remove_server("s2")
        self.assertIn(r.get_server("k1"), {"s1", "s3"})

    def test_minimal_remapping(self):
        r = ConsistentHashRing(vnodes_per_server=200)
        for s in ["s1", "s2", "s3"]:
            r.add_server(s)
        keys = [f"key-{i}" for i in range(10000)]
        before = {k: r.get_server(k) for k in keys}
        r.add_server("s4")
        after = {k: r.get_server(k) for k in keys}
        moved = sum(1 for k in keys if before[k] != after[k])
        # Expected ~25% remapping (from 3 servers to 4).
        # mod-N would have moved ~75%.
        self.assertLess(moved, 3500)
        self.assertGreater(moved, 1500)

    def test_balance(self):
        r = ConsistentHashRing(vnodes_per_server=200)
        for i in range(10):
            r.add_server(f"s{i}")
        keys = [f"k-{i}" for i in range(20000)]
        loads = {}
        for k in keys:
            s = r.get_server(k)
            loads[s] = loads.get(s, 0) + 1
        avg = 2000
        # With 200 vnodes per server, variance should be small.
        for cnt in loads.values():
            self.assertLess(abs(cnt - avg), 350)  # ≤17% deviation

    def test_empty_ring(self):
        r = ConsistentHashRing(vnodes_per_server=10)
        self.assertIsNone(r.get_server("k"))
Follow-up Questions
(3) Scale to N nodes? Already designed for it. Lookup is O(log(N · V)). The bottleneck is add_server: V insertions into a sorted Python list cost O(N · V) each because of element shifting, whereas a balanced tree (red-black tree), B-tree, or skip list brings each insertion down to O(log(N · V)). For replication: get_servers(key, R) returns the next R unique servers clockwise.
(8) Partial failure? A server going down is naturally handled: its vnodes are removed and keys remap to the successors. The challenge is hot spots. Without vnodes, all of a dead server’s load moves to one successor; with vnodes it spreads across many, but hot keys can still overload whichever server inherits them. Bounded-load consistent hashing (Mirrokni, Thorup, and Zadimoghaddam) caps each server at (1+ε) × avg_load, spilling overflow to the next server. Implemented above.
(10) Consistency model? The ring itself is a routing function. The actual stored data has whatever consistency model the underlying store offers (linearizable, eventual, etc.). One subtlety: when a server is added, the data on the predecessor needs to be transferred to the new server before the routing change takes effect, or you serve stale/missing data. Two-phase add: install vnodes-as-readonly → migrate keys → activate.
(11) Configuration knobs? vnodes_per_server (100–500 covers most workloads), hash_function. Not to expose: ring data structure, balance/heuristics.
(4) Observe / monitor? Per-server load (gauge), key remapping events (counter), p99 lookup latency (histogram). Imbalance alert: trigger if any server’s load > 1.5x avg.
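The replication follow-up, returning the next R unique servers clockwise, can be sketched as a walk over the sorted ring that skips vnodes of servers already chosen. build_ring and get_servers are standalone helpers written for this example rather than methods of the class above.

```python
import bisect
import hashlib
from typing import Iterable


def _h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


def build_ring(servers: Iterable[str], vnodes: int = 100) -> list[tuple[int, str]]:
    """Sorted list of (position, server_id), one entry per vnode."""
    return sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))


def get_servers(ring: list[tuple[int, str]], key: str, r: int) -> list[str]:
    """Return up to r distinct servers, walking clockwise from the key."""
    if r <= 0 or not ring:
        return []
    start = bisect.bisect_left(ring, (_h(key), ""))
    chosen: list[str] = []
    for offset in range(len(ring)):
        _, sid = ring[(start + offset) % len(ring)]   # modulo handles wrap-around
        if sid not in chosen:
            chosen.append(sid)
            if len(chosen) == r:
                break
    return chosen


if __name__ == "__main__":
    ring = build_ring(["s1", "s2", "s3", "s4"])
    print(get_servers(ring, "user-42", 3))
```

The first element is the primary that single-server lookup would return, so adding replication does not change primary routing. If R exceeds the number of servers, the walk simply returns every server once.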
Product Extension
DynamoDB uses consistent hashing with explicit ranges. Cassandra historically defaulted to 256 vnodes per node (num_tokens), lowered to 16 in Cassandra 4.0. Memcached’s clients (ketama, libmemcached) use consistent hashing. Envoy’s ring_hash load balancer uses it for sticky-session routing. Discord’s chat sharding originally used hash(channel) % N and famously hit the rebalance problem; they migrated to a fixed-bucket scheme. The point: even big companies get this wrong if they pick mod-N.
Language/Runtime Follow-ups
- Python: bisect is the right tool for sorted-list maintenance. bisect.insort finds the slot in O(log n), but the list insert itself is O(n); for very large rings, sortedcontainers.SortedList (a list-of-sublists structure, not a skip list) gives roughly O(log n) inserts.
- Java: TreeMap<Long, String>; ceilingKey(hash) does the lookup. Idiomatic.
- Go: sort.Search over a []uint64 for the ring. Good locality, fast.
- C++: std::map<uint64_t, std::string> with lower_bound. Or sort a vector and binary-search.
- JS/TS: no sorted-tree in stdlib; use the sorted-array-functions npm package or maintain a sorted array manually.
Common Bugs
- Using hash(key) % len(ring) to pick a vnode index: that is mod-N inside the ring. Use the ring’s actual hash space.
- Forgetting to wrap around: bisect returns len(ring) for a key past the last vnode; you must wrap to index 0.
- Hash collisions on small rings: two servers’ vnodes land on the same hash. Either accept “first inserted wins” (deterministic) or perturb the suffix until unique.
- Removing a server but leaving its vnodes in _server_positions (memory leak; subsequent add_server for the same id silently no-ops because of the if … in … guard).
- Using hash(key) (Python’s built-in): randomized per process for strings. Different processes route the same key to different servers. Use a stable hash like MD5 or MurmurHash.
Debugging Strategy
For “wrong server” complaints: log the (hashed-key, ring-position-found, server). For imbalance: dump load distribution and check vnode count. For “everything remapped” after add: count the remapped key fraction; if > 1/N, vnode count is too low or the hash function is poor.
Mastery Criteria
- Implemented ring + lookup + add/remove in <30 minutes.
- Stated minimal-remapping invariant (≤ 1/N keys move on add) without prompting.
- All four tests pass.
- Articulated why vnodes are needed in <60 seconds.
- Compared mod-N, ring-no-vnodes, ring-with-vnodes, bounded-load on a whiteboard.
- Answered follow-ups #3, #8 (bounded load), #10 (data migration coordination), #11 crisply.
Lab 11 — Message Dispatcher
Goal
Implement a message dispatcher that fans out messages to N consumers with fairness, per-consumer priority levels, and per-consumer backpressure. After this lab you should be able to design a multi-consumer dispatcher that doesn’t starve slow consumers and doesn’t get stalled by slow ones, in under 30 minutes.
Background Concepts
A dispatcher accepts messages from one (or more) producers and routes them to N consumers. Three classical problems:
- Fairness: round-robin across consumers vs weighted by priority. Strict round-robin is unfair if some consumers have higher priority. Weighted-fair-queueing (WFQ) gives each consumer a share proportional to its weight.
- Slow-consumer problem: a slow consumer’s queue fills up. If we share a single queue across all consumers, the slow one stalls everyone. Solution: per-consumer queues with bounded capacity.
- Backpressure: when a consumer’s queue is full, what do we do? Options: (a) block the producer (lossless, but the slowest), (b) drop oldest, (c) drop newest, (d) reject and signal upstream. Default to (b) for telemetry, (a) for orders/payments.
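The non-blocking policies (b), (c), and (d) can be sketched as a single helper. This is an illustration only: `offer` and `QueueFull` are hypothetical names, and the blocking policy (a) is left out because it needs a condition variable.

```python
from collections import deque

class QueueFull(Exception):
    """Hypothetical signal used by the 'reject' policy."""

def offer(q: deque, capacity: int, msg, policy: str) -> int:
    """Apply a non-blocking on-full policy; return how many messages were dropped."""
    if len(q) < capacity:
        q.append(msg)
        return 0
    if policy == "drop_oldest":
        q.popleft()          # evict the head, keep the newest data
        q.append(msg)
        return 1
    if policy == "drop_newest":
        return 1             # the incoming message is discarded
    if policy == "reject":
        raise QueueFull("consumer queue is full")  # caller signals upstream
    raise ValueError(f"unknown policy: {policy}")

q = deque([1, 2])
offer(q, 2, 3, "drop_oldest")   # q is now deque([2, 3])
```

Returning the drop count keeps the per-consumer accounting in the caller, which is where the stats counters live.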
Interview Context
Dispatcher problems show up at message-bus companies (Confluent, Solace), real-time platforms (Twilio, PubNub, Pusher), and any backend with fan-out. The interviewer wants to hear: per-consumer queue, priority-aware scheduling, explicit backpressure policy. Common failure: a single shared queue with consumers competing — works for two consumers, breaks at scale.
Problem Statement
Design Dispatcher:
- `register_consumer(consumer_id, priority=1, queue_capacity=1024, on_full="drop_oldest")`
- `dispatch(message)`: fan out to all registered consumers (broadcast).
- `consume(consumer_id) -> Message | None`: non-blocking dequeue for a consumer.
- `consume_blocking(consumer_id, timeout)`: blocking dequeue.
- `unregister(consumer_id)`
- `stats() -> dict`: per-consumer queue size, drops, throughput.
Constraints
- 1 ≤ consumers ≤ 1000
- 10^5 dispatches / second
- Per-consumer latency: < 1 ms p99 from dispatch to availability
Clarifying Questions
- Broadcast (every consumer gets every message) or partition (each message goes to one)? (Pick one; we’ll do broadcast — the harder problem.)
- Strict ordering across consumers? (Each consumer sees messages in dispatch order; cross-consumer ordering not guaranteed.)
- Priority semantics: priority is a weight, not a strict precedence? (Weight is the standard.)
- Should consumers be threads, or should `consume` be polled by external code? (External polling is simpler.)
Examples
d = Dispatcher()
d.register_consumer("fast", priority=2, queue_capacity=100)
d.register_consumer("slow", priority=1, queue_capacity=10, on_full="drop_oldest")
for i in range(20):
    d.dispatch({"i": i})
# slow consumer dropped 10 oldest; only sees last 10
slow_msgs = []
while True:
    m = d.consume("slow")
    if m is None:
        break
    slow_msgs.append(m)
# slow_msgs has the 10 most recent messages
Initial Brute Force
A single shared deque and a `for c in consumers: c.recv(msg)` loop on dispatch. If a consumer is slow, the dispatch blocks on it. The whole pipeline stalls.
Brute Force Complexity
Per dispatch: O(N · consumer-receive-time). Worst-case latency is dictated by the slowest consumer, so it has no useful bound.
Optimization Path
Per-consumer queue. Dispatch is now O(N · queue-push-time) ≈ O(N) — fast and bounded by the dispatcher’s own work, not by any consumer’s processing. Slow consumers fill their own queue and trigger their own backpressure policy without affecting others.
Final Expected Approach
A dict[consumer_id, ConsumerQueue]. Each ConsumerQueue has a deque, a Lock (or use queue.Queue), a capacity, an on-full policy, and counters. dispatch iterates the dict and pushes to each queue, applying the policy on full. consume dequeues from the named queue. The single shared lock would serialize dispatches; the per-queue lock parallelizes them with the cost that there’s no atomic “dispatch sees a consistent registration” — acceptable trade.
Data Structures Used
| Structure | Purpose |
|---|---|
| `dict[id, ConsumerQueue]` | per-consumer state |
| `deque` per consumer | bounded FIFO |
| `Lock` per consumer | concurrent push/pop safety |
| `Condition` per consumer | blocking consume |
Correctness Argument
No starvation: every consumer has its own queue, so under the drop policies a slow one cannot block dispatch. Producer dispatch latency is then bounded by O(N) (the dict iteration), independent of consumer speed.
Priority weighting: at dispatch time, we don’t apply weight (every consumer gets every message — broadcast). Priority is used in the consume order if we have a single consumer thread that polls all queues in priority order; in that variant, weight determines how many messages we consume from each per round (e.g., 2 from priority=2, 1 from priority=1).
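The weighted-consume variant described above can be sketched as a single polling round. This is an assumption-laden illustration: `weighted_poll` is a hypothetical helper operating on plain deques, not a method of the lab's `Dispatcher`.

```python
from collections import deque

def weighted_poll(queues: dict[str, deque], weights: dict[str, int]) -> list:
    """One polling round: take up to weights[cid] messages from each queue,
    visiting higher-weight consumers first."""
    out = []
    for cid in sorted(queues, key=lambda c: -weights[c]):
        q = queues[cid]
        for _ in range(weights[cid]):
            if not q:
                break  # this queue is drained; move on
            out.append((cid, q.popleft()))
    return out

queues = {"hi": deque(["h1", "h2", "h3"]), "lo": deque(["l1", "l2", "l3"])}
weights = {"hi": 2, "lo": 1}
first_round = weighted_poll(queues, weights)
# first_round == [("hi", "h1"), ("hi", "h2"), ("lo", "l1")]
```

Repeating rounds gives each queue throughput proportional to its weight while never starving the low-weight queue.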
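Backpressure: when a queue is at capacity, the consumer-defined policy fires. Each consumer’s drops are tracked separately. Under either drop policy the dispatcher itself never blocks; the block policy is the one exception, because it deliberately stalls dispatch until that consumer drains.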
Complexity
- `dispatch`: O(N) where N = number of consumers
- `consume`: O(1)
- Space: O(N · capacity)
Implementation Requirements
import threading, time
from collections import deque
from typing import Any, Optional

class _ConsumerQueue:
    def __init__(self, capacity: int, on_full: str = "drop_oldest", priority: int = 1):
        self.capacity = capacity
        self.on_full = on_full  # "drop_oldest" | "drop_newest" | "block"
        self.priority = priority
        self.q: deque = deque()
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.dropped = 0
        self.delivered = 0

class Dispatcher:
    def __init__(self):
        self._consumers: dict[str, _ConsumerQueue] = {}
        self._reg_lock = threading.RLock()

    def register_consumer(self, consumer_id: str, priority: int = 1,
                          queue_capacity: int = 1024,
                          on_full: str = "drop_oldest") -> None:
        with self._reg_lock:
            if consumer_id in self._consumers:
                raise ValueError(f"{consumer_id} already registered")
            self._consumers[consumer_id] = _ConsumerQueue(queue_capacity, on_full, priority)

    def unregister(self, consumer_id: str) -> None:
        with self._reg_lock:
            self._consumers.pop(consumer_id, None)

    def dispatch(self, message: Any) -> None:
        with self._reg_lock:
            consumers = list(self._consumers.values())
        for cq in consumers:
            with cq.lock:
                if len(cq.q) >= cq.capacity:
                    if cq.on_full == "drop_oldest":
                        cq.q.popleft()
                        cq.dropped += 1
                        cq.q.append(message)
                    elif cq.on_full == "drop_newest":
                        cq.dropped += 1
                        continue
                    elif cq.on_full == "block":
                        while len(cq.q) >= cq.capacity:
                            cq.cond.wait()
                        cq.q.append(message)
                else:
                    cq.q.append(message)
                cq.cond.notify()

    def consume(self, consumer_id: str) -> Optional[Any]:
        cq = self._consumers.get(consumer_id)
        if cq is None: return None
        with cq.lock:
            if not cq.q: return None
            m = cq.q.popleft()
            cq.delivered += 1
            cq.cond.notify()  # producer waiting on "block" policy
            return m

    def consume_blocking(self, consumer_id: str, timeout: Optional[float] = None) -> Optional[Any]:
        cq = self._consumers.get(consumer_id)
        if cq is None: return None
        deadline = None if timeout is None else time.monotonic() + timeout
        with cq.lock:
            while not cq.q:
                if deadline is None:
                    cq.cond.wait()
                else:
                    rem = deadline - time.monotonic()
                    if rem <= 0: return None
                    cq.cond.wait(timeout=rem)
            m = cq.q.popleft()
            cq.delivered += 1
            cq.cond.notify()
            return m

    def stats(self) -> dict:
        with self._reg_lock:
            return {
                cid: {
                    "queue_size": len(cq.q),
                    "dropped": cq.dropped,
                    "delivered": cq.delivered,
                    "capacity": cq.capacity,
                    "priority": cq.priority,
                }
                for cid, cq in self._consumers.items()
            }
Tests
import unittest, threading, time

class TestDispatcher(unittest.TestCase):
    def test_broadcast(self):
        d = Dispatcher()
        d.register_consumer("a"); d.register_consumer("b")
        for i in range(5): d.dispatch(i)
        a = [d.consume("a") for _ in range(5)]
        b = [d.consume("b") for _ in range(5)]
        self.assertEqual(a, [0, 1, 2, 3, 4])
        self.assertEqual(b, [0, 1, 2, 3, 4])

    def test_drop_oldest_on_full(self):
        d = Dispatcher()
        d.register_consumer("a", queue_capacity=3, on_full="drop_oldest")
        for i in range(5): d.dispatch(i)
        out = []
        while True:
            m = d.consume("a")
            if m is None: break
            out.append(m)
        self.assertEqual(out, [2, 3, 4])
        self.assertEqual(d.stats()["a"]["dropped"], 2)

    def test_slow_consumer_does_not_block_fast(self):
        d = Dispatcher()
        d.register_consumer("slow", queue_capacity=2, on_full="drop_oldest")
        d.register_consumer("fast", queue_capacity=100)
        for i in range(50):
            d.dispatch(i)
        # fast got everything
        fast = []
        while True:
            m = d.consume("fast")
            if m is None: break
            fast.append(m)
        self.assertEqual(len(fast), 50)
        # slow holds 2 messages (its capacity), the most recent ones
        slow = [d.consume("slow") for _ in range(2)]
        self.assertEqual(slow, [48, 49])

    def test_blocking_consume_wakes_on_dispatch(self):
        d = Dispatcher()
        d.register_consumer("c", queue_capacity=10)
        results = []
        def consumer():
            results.append(d.consume_blocking("c", timeout=2.0))
        t = threading.Thread(target=consumer); t.start()
        time.sleep(0.05)
        d.dispatch("hello")
        t.join(timeout=2.0)
        self.assertEqual(results, ["hello"])
Follow-up Questions
(7) Backpressure? This is the core problem. Three policies built in: drop-oldest (best for telemetry), drop-newest (best for “first message matters”), block (best for ordered streams where loss is unacceptable). The right pick depends on the data semantics; expose it per-consumer.
(3) Scale to N nodes? Distributed dispatch: each consumer “subscribes” via a network connection; the dispatcher fans out over those connections. Bottleneck shifts from queue-push to per-consumer network round-trip. For very high consumer counts, a hierarchical dispatcher (dispatch to N regional dispatchers, each of which dispatches to M local consumers) reduces the per-message broadcast cost.
(4) Observe / monitor? Per-consumer queue depth, drop rate, throughput. The drop-rate heatmap by consumer is the first dashboard you draw. Alert when any consumer has > 1% drop rate.
(8) Partial failure? A consumer that connects, then disappears: the dispatcher must detect (via missed heartbeat or socket close) and unregister; otherwise its queue grows unbounded. Heartbeat / TTL on consumer registration.
(11) Configuration knobs? Per-consumer: priority, queue_capacity, on_full policy. Global: max consumers. Knobs not to expose: the lock granularity (per-consumer is correct).
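The heartbeat / TTL idea from follow-up (8) can be sketched as a small registry that a background task polls. The class and method names here are hypothetical, and the injected `clock` parameter is an assumption added to make the sketch testable.

```python
import time

class HeartbeatRegistry:
    """Tracks last-seen times per consumer (illustrative; not part of the lab's Dispatcher)."""
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, consumer_id: str) -> None:
        self._last_seen[consumer_id] = self._clock()

    def expired(self) -> list[str]:
        now = self._clock()
        return [cid for cid, seen in self._last_seen.items() if now - seen > self._ttl]

    def reap(self, unregister) -> list[str]:
        """Call unregister(cid) for every consumer whose heartbeat is older than the TTL."""
        dead = self.expired()
        for cid in dead:
            self._last_seen.pop(cid, None)
            unregister(cid)
        return dead
```

In use, `reap` would be called periodically with the dispatcher's `unregister` method as the callback, so a vanished consumer's queue is released instead of sitting full forever.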
Product Extension
This is the in-process version of Kafka consumer groups, RabbitMQ exchange-to-queue fan-out, and Redis Pub/Sub. Real systems add: durable per-consumer logs (Kafka’s offset model), dynamic rebalancing as consumers join/leave (Kafka group coordinator), and message acknowledgment (AMQP). The core problem of “don’t let slow consumers stall fast ones” is solved everywhere by per-consumer storage.
Language/Runtime Follow-ups
- Python: `queue.Queue` per consumer is also fine and simpler, with built-in blocking and capacity. The custom `_ConsumerQueue` is shown for control over the on-full policy.
- Java: `LinkedBlockingQueue` per consumer; `RingBuffer` (Disruptor pattern) for high throughput.
- Go: a `chan Message` per consumer of size `capacity` is idiomatic. `select` with `default` for non-blocking. Simple and fast.
- C++: `boost::lockfree::spsc_queue` per consumer for single-producer/single-consumer; otherwise mutex + deque.
- JS/TS: single event loop, so no real “consumer threads”; use EventEmitter with bounded buffers per listener, or an RxJS `Subject` with `bufferCount` operators.
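For the Python note above, a drop-oldest policy can be layered on `queue.Queue` roughly as follows. This is a sketch under the assumption of a single producer per queue; with several producers the evict-then-put pair is not atomic. `put_drop_oldest` is a hypothetical helper name.

```python
import queue

def put_drop_oldest(q: queue.Queue, item) -> int:
    """Enqueue without blocking; evict the oldest item when full. Returns the drop count."""
    drops = 0
    while True:
        try:
            q.put_nowait(item)
            return drops
        except queue.Full:
            try:
                q.get_nowait()  # evict the oldest entry to make room
                drops += 1
            except queue.Empty:
                pass  # a consumer drained the queue first; retry the put

q = queue.Queue(maxsize=2)
total_drops = sum(put_drop_oldest(q, i) for i in range(5))
# The queue now holds the two most recent items, 3 and 4.
```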
Common Bugs
- Single shared lock for the dispatcher: serializes everything; dispatch latency = O(N · consumer-time).
- `block` policy without notifying on `consume`: producer waits forever.
- Forgetting to copy the consumer dict before iterating in `dispatch`: a concurrent unregister mutates the dict mid-iteration.
- Drop-oldest implementation: `popleft` then append succeeds, but if the lock is dropped between, ordering breaks. Both ops under the same lock.
- Counting a “dropped” message as both dropped and delivered (double count) when on `drop_oldest` you replace, not skip.
Debugging Strategy
Print stats() periodically. Slow consumer = high queue_size and rising dropped. Stuck consumer = blocking with no notifies — check producer dispatch isn’t stuck on a different consumer’s block policy. For “missed messages”: log per-message dispatch with consumer enumeration and replay against delivered counts.
Mastery Criteria
- Implemented per-consumer queue + dispatcher in <30 minutes.
- All four tests pass.
- Stated “per-consumer queue isolates slow consumers” before coding.
- Listed three backpressure policies (drop-oldest/newest/block) and their use cases.
- Answered follow-ups #3, #4, #7, #8, #11 crisply.
- Identified that broadcast is the harder variant; partition is simpler.
Lab 12 — In-Memory Pub/Sub
Goal
Implement an in-process publish-subscribe system with topic-based routing, wildcard subscriptions (a.b.*, a.#), and per-subscriber backpressure. After this lab you should be able to write a clean pub/sub broker in under 30 minutes and articulate the topic-matching design tradeoffs.
Background Concepts
Pub/sub differs from a job queue: subscribers don’t compete for messages; each subscriber receives every matching message. Two routing models:
- Topic-based (channel name as a string): `"orders.created"`, `"users.signup"`. Wildcards (`*` = one segment, `#` = many segments) follow AMQP; MQTT spells the single-segment wildcard `+`.
- Content-based: subscribers register a predicate over message content. More flexible, much harder to scale (every message must be evaluated against every subscriber’s predicate).
Topic-based with wildcards is the standard. The implementation challenge is the wildcard matcher: a subscription on "orders.*" should match "orders.created" but not "orders.created.fraud". We can solve this with a topic trie (segment-by-segment) for O(L) per match where L is segment count, or a regex per subscription for O(N) per dispatch where N is subscription count. The trie is the production answer for systems with many subscriptions.
Interview Context
Pub/sub design is asked at messaging companies (Confluent, IBM MQ, Solace, AWS SNS), at real-time platforms (Pusher, Ably, Twilio), and broadly at any senior+ design-coding round. The interview wants both code and the design reasoning around routing, backpressure, and subscription matching.
Problem Statement
Design PubSub:
- `subscribe(topic_pattern, on_message) -> subscription_id`: `topic_pattern` may include `*` (single segment) or `#` (multi-segment, must be last).
- `unsubscribe(subscription_id)`
- `publish(topic, message)`: call `on_message(topic, message)` on every matching subscriber.
- Per-subscriber callback wrapping for backpressure (queue + drop policy).
Constraints
- Up to 10^4 active subscriptions
- Up to 10^5 publishes / second
- Subscription matching: < 100 µs per publish
- Per-subscriber callbacks may be slow; must not block the publisher
Clarifying Questions
- Wildcard syntax: MQTT (`+`/`#`), AMQP (`*`/`#`), or other? (Pick AMQP-style: `*` = one segment, `#` = ≥ 0 segments at end.)
- Synchronous or async callback delivery? (Async with per-subscriber queue is the production answer; simpler synchronous version is acceptable for the basic case.)
- Topic separator: `.` or `/`? (`.` is the AMQP convention; either is fine.)
- Ordering guarantees? (Per-subscriber: messages arrive in publish-order. Across subscribers: not guaranteed.)
- Replay / retain / persistence? (No by default; pure in-memory.)
Examples
ps = PubSub()
sid = ps.subscribe("orders.*", lambda topic, msg: print(f"got {topic}: {msg}"))
ps.publish("orders.created", {"id": 1}) # fires
ps.publish("orders.created.fraud", {"id": 2}) # does NOT fire (* is one segment)
ps.subscribe("orders.#", lambda t, m: print(f"catch-all {t}: {m}"))
ps.publish("orders.created.fraud", {}) # fires (# matches multi)
ps.unsubscribe(sid)
ps.publish("orders.created", {}) # only the # subscription fires
Initial Brute Force
dict[topic_pattern, list[callback]]. On publish, iterate all subscriptions, regex-match each pattern against the topic. O(N · pattern-cost) per publish where N is subscription count.
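A minimal sketch of this brute force, assuming AMQP-style patterns with a `.` separator and synchronous callbacks. `compile_pattern` and `BruteForcePubSub` are illustrative names; the translation to regex is one possible encoding, not the only one.

```python
import re

def compile_pattern(pattern: str, sep: str = ".") -> re.Pattern:
    """Translate a pattern ('*' = one segment, trailing '#' = zero or more) to a regex."""
    esc = re.escape(sep)
    segs = pattern.split(sep)
    if "#" in segs[:-1]:
        raise ValueError("# must be the last segment")
    tail_hash = segs[-1] == "#"
    if tail_hash:
        segs = segs[:-1]
    body = esc.join(f"[^{esc}]+" if s == "*" else re.escape(s) for s in segs)
    if tail_hash:
        # Optional tail: either nothing, or a separator followed by anything.
        body = f"{body}(?:{esc}.*)?" if segs else ".*"
    return re.compile(body)

class BruteForcePubSub:
    def __init__(self):
        self._subs = []  # list of (compiled_pattern, callback)

    def subscribe(self, pattern, callback):
        self._subs.append((compile_pattern(pattern), callback))

    def publish(self, topic, message):
        for rx, cb in self._subs:  # O(N): every subscription is tested
            if rx.fullmatch(topic):
                cb(topic, message)
```

Compiling once at subscribe time removes the compile cost from the publish path, but the per-publish scan over all subscriptions remains.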
Brute Force Complexity
At N=10^4 subscriptions and 10^5 pub/s, this is 10^9 regex matches / sec — too slow. Wildcard regex compile and match dominate.
Optimization Path
A topic trie: each node represents a segment. Children include literal-segment children plus a * and # child for wildcards. Match by walking the trie segment-by-segment, exploring each node’s literal child plus its * child plus (if at end) any # ancestor’s catch-all subscriptions.
Per-publish cost becomes O(L · branching factor) ≈ O(L) for typical trees, where L is the topic depth.
Final Expected Approach
Build a topic trie. Each node has:
- `children: dict[str, Node]`: literal subsegments
- `star: Node | None`: single-segment wildcard
- `hash_subscriptions: list[Sub]`: `#` catch-all (matches everything below this node)
- `subs: list[Sub]`: exact matches at this node
Publishing walks the trie segment-by-segment, at each step checking the literal child and the star child; collect matching subs at terminal nodes. Collect hash_subscriptions along the entire path.
Each subscriber owns a per-subscriber bounded queue; publish enqueues to the queue (non-blocking, drops on full); a worker thread per subscriber drains the queue and calls the callback.
Data Structures Used
| Structure | Purpose |
|---|---|
| Topic trie | O(L) routing |
| `dict[sub_id, Subscription]` | unregister lookup |
| `deque` per subscriber + lock | per-subscriber queue |
| Worker thread per subscriber | invoke callback off the publish path |
Correctness Argument
Routing: a subscription A.B.C matches publish topic A.B.C iff the trie walk reaches the node carrying that subscription with all segments consumed. A * matches any single segment (one node-level wildcard step). A # at a node matches any zero-or-more remaining segments — equivalent to attaching a list of “catch-all” subscribers to that node.
Per-subscriber ordering: each subscriber’s queue is FIFO; the worker drains in FIFO order. Therefore each subscriber sees the messages it receives in publish order (messages dropped on overflow are simply absent).
Publisher non-blocking: publish only enqueues; no subscriber callback runs on the publish thread. Even a callback that takes 1 second doesn’t slow publish.
Complexity
- `subscribe`: O(L) for trie insert
- `publish`: O(L · F · M) where L = topic depth, F = trie branching, M = matching subscribers
- `unsubscribe`: O(L) to walk the trie node and remove
Implementation Requirements
import logging, threading, itertools
from collections import deque
from typing import Callable, Any, Optional

class _Sub:
    __slots__ = ("sub_id", "callback", "queue", "lock", "cond", "capacity", "drops",
                 "alive", "thread")
    def __init__(self, sub_id: int, callback: Callable, capacity: int = 1024):
        self.sub_id = sub_id; self.callback = callback
        self.queue: deque = deque()
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.capacity = capacity
        self.drops = 0
        self.alive = True
        self.thread: Optional[threading.Thread] = None

    def deliver(self, topic: str, msg: Any) -> None:
        with self.lock:
            if len(self.queue) >= self.capacity:
                self.queue.popleft(); self.drops += 1
            self.queue.append((topic, msg))
            self.cond.notify()

    def stop(self) -> None:
        with self.lock:
            self.alive = False
            self.cond.notify()

    def run(self) -> None:
        while True:
            with self.lock:
                while self.alive and not self.queue:
                    self.cond.wait()
                if not self.alive and not self.queue:
                    return
                topic, msg = self.queue.popleft()
            try:
                self.callback(topic, msg)  # runs outside the lock
            except Exception:
                # Log and continue: a bad callback must not kill the worker.
                logging.getLogger(__name__).exception(
                    "subscriber %d callback failed", self.sub_id)

class _Node:
    __slots__ = ("children", "star", "subs", "hash_subs")
    def __init__(self):
        self.children: dict[str, _Node] = {}
        self.star: Optional[_Node] = None
        self.subs: list[_Sub] = []
        self.hash_subs: list[_Sub] = []

class PubSub:
    def __init__(self, separator: str = "."):
        self._sep = separator
        self._root = _Node()
        self._subs: dict[int, tuple[_Sub, list[str]]] = {}  # id -> (sub, pattern segments)
        self._next_id = itertools.count(1)
        self._lock = threading.RLock()

    def subscribe(self, pattern: str, callback: Callable[[str, Any], None],
                  queue_capacity: int = 1024) -> int:
        segments = pattern.split(self._sep)
        sid = next(self._next_id)
        sub = _Sub(sid, callback, queue_capacity)
        with self._lock:
            node = self._root
            for i, seg in enumerate(segments):
                if seg == "#":
                    if i != len(segments) - 1:
                        raise ValueError("# must be the last segment")
                    node.hash_subs.append(sub)
                    break
                if seg == "*":
                    if node.star is None: node.star = _Node()
                    node = node.star
                else:
                    node = node.children.setdefault(seg, _Node())
            else:
                # for/else: no "#" was hit, so this is an exact-depth subscription
                node.subs.append(sub)
            self._subs[sid] = (sub, segments)
        sub.thread = threading.Thread(target=sub.run, daemon=True)
        sub.thread.start()
        return sid

    def unsubscribe(self, sub_id: int) -> None:
        with self._lock:
            entry = self._subs.pop(sub_id, None)
            if entry is None: return
            sub, segments = entry
            self._remove_from_trie(self._root, segments, 0, sub)
        sub.stop()
        # Joining the current thread raises RuntimeError, so skip the join
        # when a callback unsubscribes itself.
        if sub.thread is not threading.current_thread():
            sub.thread.join(timeout=1.0)

    def _remove_from_trie(self, node: _Node, segments: list[str],
                          i: int, sub: _Sub) -> None:
        if i == len(segments):
            try: node.subs.remove(sub)
            except ValueError: pass
            return
        seg = segments[i]
        if seg == "#":
            try: node.hash_subs.remove(sub)
            except ValueError: pass
            return
        nxt = node.star if seg == "*" else node.children.get(seg)
        if nxt is not None:
            self._remove_from_trie(nxt, segments, i + 1, sub)

    def publish(self, topic: str, message: Any) -> None:
        segments = topic.split(self._sep)
        with self._lock:
            self._match(self._root, segments, 0, topic, message)

    def _match(self, node: _Node, segments: list[str], i: int,
               topic: str, msg: Any) -> None:
        # "#" at this node matches everything below, so fire now
        for s in node.hash_subs:
            s.deliver(topic, msg)
        if i == len(segments):
            for s in node.subs:
                s.deliver(topic, msg)
            return
        seg = segments[i]
        child = node.children.get(seg)
        if child is not None:
            self._match(child, segments, i + 1, topic, msg)
        if node.star is not None:
            self._match(node.star, segments, i + 1, topic, msg)
Tests
import unittest, time

class TestPubSub(unittest.TestCase):
    def _collect(self, ps, pattern):
        buf = []
        def cb(t, m): buf.append((t, m))
        sid = ps.subscribe(pattern, cb)
        return sid, buf

    def test_exact_match(self):
        ps = PubSub()
        sid, buf = self._collect(ps, "a.b")
        ps.publish("a.b", 1); ps.publish("a.b.c", 2); ps.publish("a", 3)
        time.sleep(0.05)
        self.assertEqual(buf, [("a.b", 1)])
        ps.unsubscribe(sid)

    def test_star_one_segment(self):
        ps = PubSub()
        sid, buf = self._collect(ps, "a.*")
        ps.publish("a.b", 1); ps.publish("a.c", 2)
        ps.publish("a.b.c", 3); ps.publish("a", 4)
        time.sleep(0.05)
        self.assertEqual(sorted(buf), [("a.b", 1), ("a.c", 2)])
        ps.unsubscribe(sid)

    def test_hash_multi_segment(self):
        ps = PubSub()
        sid, buf = self._collect(ps, "a.#")
        ps.publish("a", 0)
        ps.publish("a.b", 1); ps.publish("a.b.c", 2); ps.publish("x.y", 3)
        time.sleep(0.05)
        # "#" matches zero or more, so a, a.b, a.b.c all match
        self.assertEqual(sorted(buf), [("a", 0), ("a.b", 1), ("a.b.c", 2)])
        ps.unsubscribe(sid)

    def test_multiple_subscribers(self):
        ps = PubSub()
        sid1, b1 = self._collect(ps, "topic")
        sid2, b2 = self._collect(ps, "topic")
        ps.publish("topic", "msg")
        time.sleep(0.05)
        self.assertEqual(b1, [("topic", "msg")])
        self.assertEqual(b2, [("topic", "msg")])
        ps.unsubscribe(sid1); ps.unsubscribe(sid2)
Follow-up Questions
(7) Backpressure? Per-subscriber bounded queue with drop-oldest on full (shown). Alternatives: block the publisher (rejected — one slow subscriber stalls the world), drop-newest (loses recent state — rarely the right answer for pub/sub).
(3) Scale to N nodes? Distributed pub/sub is its own discipline. Models: (a) broker-based (Redis Pub/Sub, NATS): central broker fans out. (b) broker-less mesh (pgossip): peers gossip subscriptions; each publish goes to relevant peers. (c) partitioned log (Kafka): no fan-out; consumers tail logs. The trie matcher works locally in any model; the network layer is the harder design.
(2) Persist state? Pure pub/sub is volatile — late subscribers miss messages. To persist, layer a replay log: every publish appends to a durable log; new subscribers can opt-in to read from offset 0 or “latest”. This is essentially Kafka’s design.
(4) Observe / monitor? Per-subscriber drop count, queue depth, throughput. Subscription count gauge. Publish rate counter. p99 publish-to-deliver latency histogram (for the per-subscriber path).
(11) Configuration knobs? Per-subscription queue capacity, on-full policy. Global: max subscriptions, separator character. Knobs not to expose: trie internal layout.
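The replay log from follow-up (2) can be sketched in memory as follows. `ReplayLog` and its methods are hypothetical names; a durable version would append to disk rather than to a Python list.

```python
class ReplayLog:
    """Append-only in-memory log with offsets (illustrative only)."""
    def __init__(self):
        self._entries: list[tuple[str, object]] = []

    def append(self, topic: str, message) -> int:
        self._entries.append((topic, message))
        return len(self._entries) - 1  # offset of the new entry

    def latest_offset(self) -> int:
        """Offset a subscriber should start from to see only future messages."""
        return len(self._entries)

    def read_from(self, offset: int):
        """Yield (offset, topic, message) for every entry at or after offset."""
        for i in range(offset, len(self._entries)):
            topic, message = self._entries[i]
            yield i, topic, message

log = ReplayLog()
log.append("orders.created", {"id": 1})
log.append("orders.paid", {"id": 1})
history = list(log.read_from(0))   # a late subscriber replays from the start
```

A subscriber that wants only new traffic would record `latest_offset()` at subscribe time and read from there.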
Product Extension
MQTT brokers, AMQP exchanges, Redis Pub/Sub, NATS, ZeroMQ — all use topic-based routing with some wildcard syntax. Cloud Pub/Sub products (AWS SNS, GCP Pub/Sub, Azure Event Grid) add durability, retries, and ordering. The ergonomic difference between MQTT and AMQP wildcards (+/# vs */#) is purely syntactic.
Language/Runtime Follow-ups
- Python: this implementation. The per-subscriber thread approach scales to ~1000 subscribers; beyond that, switch to an event loop (asyncio) with a single dispatcher coroutine.
- Java: `EventBus` (Guava) is the lightweight in-process pub/sub. For wildcards, MQTT clients (Paho) or Kafka.
- Go: channels per subscriber; idiomatic. `nats-server` is the production-grade Go choice.
- C++: Boost.Signals2 is the in-process equivalent; no wildcards.
- JS/TS: Node’s `EventEmitter` is the in-process equivalent; no wildcards. RxJS for reactive streams.
Common Bugs
- Synchronous callback dispatch from `publish`: one slow subscriber stalls everyone. Always use per-subscriber worker threads.
- Trie cleanup on unsubscribe: removing from the leaf but leaving empty intermediate nodes. Memory leak; matters at high churn.
- `#` not at end: validate at subscription time.
- Swallowing exceptions raised by subscriber callbacks (silent failures). Log them.
- Race: subscribing during a publish. The subscription’s callback may or may not see the in-flight message. Document the semantics.
Debugging Strategy
For “missed message”: print the trie state at the matching point and the topic segments. For wildcard surprises, hand-trace the match: which child did we descend into? Did we visit *? Did # fire at the right level? For “callback didn’t run”: check that the worker thread is alive (sub.thread.is_alive()); a callback exception kills the worker if not caught.
Mastery Criteria
- Implemented topic trie + wildcard matching in <30 minutes.
- All four tests pass first run.
- Stated trie-vs-regex tradeoff (trie wins at scale; regex is simpler for few subscriptions).
- Articulated per-subscriber queue isolates slow subscribers.
- Answered follow-ups #2 (replay log), #3, #4, #7, #11.
- Compared topic vs content-based routing and stated when to use each.
Lab 13 — Hierarchical Timer Wheel
Goal
Implement a hierarchical timer wheel that supports O(1) amortized schedule and cancel operations for up to millions of timers. After this lab you should understand why a min-heap is wrong for high-throughput timer workloads, and be able to implement a single-level and a hierarchical timer wheel from blank screen in under 35 minutes.
Background Concepts
Many systems need to schedule callbacks for a future time: TCP retransmit timers, session timeouts, rate-limit reset, cron-style task scheduling. The classical data structure is a min-heap of (fire_time, callback): O(log N) schedule, O(log N) to fire (pop), O(N) to cancel (without indexing).
For the cases where N is small (thousands) and timers are infrequent, a heap is fine. But TCP at high QPS has millions of pending timers, most of which are cancelled before firing (the data arrives or the connection closes). For that workload, the heap is too slow.
The timer wheel (Varghese & Lauck, 1987) achieves O(1) schedule and O(1) cancel by bucketing timers by their fire time. Imagine a circular array of N slots; each slot holds a list of timers that fire when the wheel cursor reaches that slot. Each tick, advance the cursor and fire all timers in the current slot. Schedule is O(1): slot_index = (now + delay) % N. Cancel is O(1): remove from the slot’s list.
The single-level wheel works for delays up to N · tick_resolution. Beyond that, hierarchical wheels: minute wheel + hour wheel + day wheel, like an analog clock. When the minute hand sweeps past 60, advance the hour hand by one and re-bucket the timers in that hour slot into the minute wheel.
Interview Context
Timer wheel is a senior+ system-design-coding question, asked at networking/infrastructure companies (Cloudflare, Cilium, AWS networking, Datadog APM agents). The bar is high: implement at least the single-level wheel correctly; the hierarchical structure is for top candidates.
Problem Statement
Implement TimerWheel:
- `schedule(delay_seconds, callback) -> timer_id`
- `cancel(timer_id) -> bool`
- `tick(now)`: advance the cursor; fire all timers whose deadline passed.
Then extend to HierarchicalTimerWheel: 4 levels (e.g., 256 slots × 4 levels covers 256^4 = 2^32 ticks, about 49.7 days at a 1 ms tick).
Constraints
- Up to 10^7 active timers
- Tick resolution: 1 ms to 100 ms
- Schedule rate: 10^6 / sec
- Cancel rate: 10^6 / sec (most timers are cancelled before firing)
Clarifying Questions
- Tick resolution — fixed at construction, or adaptive? (Fixed.)
- Time source: caller supplies `now` (testable) or wall clock? (Caller supplies; this also lets us simulate.)
- Are callbacks fired on the tick thread, or queued? (On the tick thread, which is simpler. Long callbacks make tick slow; document this.)
- What’s the max delay? (Single-level: `slots * tick`; hierarchical: enormous.)
Examples
wheel = TimerWheel(slots=60, tick_seconds=1.0)  # 1-second tick, 60-second range
fired = []
wheel.schedule(5, lambda: fired.append("5s"))
wheel.schedule(10, lambda: fired.append("10s"))
# Simulate 15 one-second ticks (the reference implementation's tick() takes no argument)
for _ in range(15):
    wheel.tick()
# After 15 ticks, fired == ["5s", "10s"]
Initial Brute Force
A min-heap of (deadline, id, callback). Schedule = O(log N). Tick = O(K log N) for K firings. Cancel = O(N) (or O(log N) with a dict[id, heap_idx] and lazy deletion).
Brute Force Complexity
At 10^6 schedules/sec with N=10^7, each schedule costs about log2(10^7) ≈ 23 comparisons: feasible but at the edge. The fatal weakness is cancel: O(N) without indexing. With lazy deletion, cancelled entries stay in the heap until their deadlines surface, so when most timers are cancelled the heap is mostly garbage and memory balloons.
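The lazy-deletion weakness is easy to see in a small sketch. `HeapTimers` is a hypothetical baseline, not the lab's solution; it tracks cancelled ids in a set and only discards their heap entries when they reach the top.

```python
import heapq
import itertools

class HeapTimers:
    """Min-heap timers with lazy deletion (illustrative baseline)."""
    def __init__(self):
        self._heap = []                 # entries are (deadline, tid, callback)
        self._cancelled = set()
        self._ids = itertools.count(1)

    def schedule(self, deadline: float, callback) -> int:
        tid = next(self._ids)           # unique tid breaks ties before callback is compared
        heapq.heappush(self._heap, (deadline, tid, callback))
        return tid

    def cancel(self, tid: int) -> None:
        self._cancelled.add(tid)        # O(1), but the heap entry stays put

    def tick(self, now: float) -> None:
        while self._heap and self._heap[0][0] <= now:
            _, tid, cb = heapq.heappop(self._heap)
            if tid in self._cancelled:
                self._cancelled.discard(tid)   # garbage finally collected
            else:
                cb()

    def heap_size(self) -> int:
        return len(self._heap)

timers = HeapTimers()
fired = []
ids = [timers.schedule(100.0, lambda i=i: fired.append(i)) for i in range(1000)]
for tid in ids[:-1]:
    timers.cancel(tid)
size_after_cancel = timers.heap_size()   # still 1000: nothing was reclaimed
timers.tick(100.0)
```

Only one of the 1000 timers is live after the cancels, yet the heap keeps all 1000 entries until their deadline arrives.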
Optimization Path
Single-level wheel: slots = circular array of bucket lists. schedule(delay): compute slot = (cursor + delay // tick) % slots, append to that slot’s list. tick: advance cursor, fire and clear slots[cursor]. Cancel: remove the timer from its slot’s list (O(1) if you store back-pointers).
Hierarchical: when the delay is at least slots · tick, place the timer in a higher-level wheel. When the lower wheel completes a full revolution, the next slot of the higher wheel is “cascaded”: its timers are re-bucketed into the lower wheel.
Final Expected Approach
Single-level: doubly-linked lists at each slot for O(1) cancel (timer holds prev/next pointers). Cursor advances on tick; fire all timers in the new slot; clear the slot.
Hierarchical: 4 wheels with slot counts [256, 64, 64, 64] (modeled on the classic Linux kernel timer wheel, which uses a 256-slot first level followed by 64-slot levels). The lowest wheel ticks every 1 ms and carries every 256 ticks: the next level advances by one slot, and we cascade that slot (re-bucket its timers into the lower wheel based on remaining delay).
Data Structures Used
| Structure | Purpose |
|---|---|
| Array of doubly-linked lists | wheel slots |
| `dict[timer_id, Timer]` | O(1) cancel lookup |
| `cursor: int` | current slot |
| Multiple wheels | hierarchical |
Correctness Argument
Single-level firing: a timer scheduled at now + delay is placed in slot (cursor + delay // tick) % slots. tick is called once per tick time unit; after delay // tick calls, the cursor reaches the timer’s slot and fires it. Provided delay < slots * tick, this is exact.
Cancel: O(1) splice in the doubly-linked list, O(1) lookup via dict.
Hierarchical correctness: when the lower wheel completes a revolution (cursor wraps), the next upper-wheel slot is cascaded: each timer in it has its remaining delay computed against the new “now” and is placed in the appropriate lower-wheel slot. Because the cascade happens just before the next revolution, the timer fires at the same wall-clock time it would have in a single-large-wheel implementation.
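The cascade argument can be traced on a deliberately tiny two-level wheel. This is a sketch under simplifying assumptions: 8 slots per level, plain Python lists instead of doubly-linked buckets, integer tick delays, and no cancel. `TwoLevelWheel` is a hypothetical name.

```python
class TwoLevelWheel:
    """Two-level wheel; sizes are small so the cascade is easy to trace."""
    def __init__(self, s0: int = 8, s1: int = 8):
        self.s0, self.s1 = s0, s1
        self.low = [[] for _ in range(s0)]    # one slot per tick
        self.high = [[] for _ in range(s1)]   # one slot per s0 ticks
        self.now = 0

    def _place(self, deadline: int, cb) -> None:
        remaining = deadline - self.now
        if remaining < self.s0:
            self.low[deadline % self.s0].append((deadline, cb))
        elif remaining < self.s0 * self.s1:
            self.high[(deadline // self.s0) % self.s1].append((deadline, cb))
        else:
            raise ValueError("delay exceeds wheel range")

    def schedule(self, ticks: int, cb) -> None:
        if ticks < 1:
            raise ValueError("ticks must be >= 1")
        self._place(self.now + ticks, cb)

    def tick(self) -> None:
        self.now += 1
        if self.now % self.s0 == 0:           # low wheel wrapped: cascade
            idx = (self.now // self.s0) % self.s1
            pending, self.high[idx] = self.high[idx], []
            for deadline, cb in pending:
                self._place(deadline, cb)     # remaining < s0, so it lands in low
        idx = self.now % self.s0
        due, self.low[idx] = self.low[idx], []
        for _, cb in due:
            cb()

w = TwoLevelWheel(8, 8)
fired = []
for label, ticks in (("a", 3), ("b", 10), ("c", 63), ("d", 16)):
    w.schedule(ticks, lambda label=label: fired.append((w.now, label)))
for _ in range(64):
    w.tick()
# fired == [(3, "a"), (10, "b"), (16, "d"), (63, "c")]
```

Timer "d" shows the boundary case: its deadline is a multiple of the low wheel size, so it is cascaded and fired within the same tick.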
Complexity
- `schedule`: O(1)
- `cancel`: O(1)
- `tick`: O(K) for K firings, plus amortized O(1) cascade
- Space: O(slots + active timers)
Implementation Requirements
import itertools
from typing import Callable, Optional

class _Timer:
    __slots__ = ("tid", "deadline_tick", "callback", "prev", "next", "slot")
    def __init__(self, tid, deadline_tick, callback):
        self.tid = tid
        self.deadline_tick = deadline_tick
        self.callback = callback
        self.prev = self.next = None
        self.slot: Optional["_Bucket"] = None  # back-ref to bucket for O(1) cancel

class _Bucket:
    """Doubly-linked list with head/tail sentinels."""
    __slots__ = ("head", "tail")
    def __init__(self):
        self.head = _Timer(None, None, None)  # sentinel
        self.tail = _Timer(None, None, None)
        self.head.next = self.tail; self.tail.prev = self.head
    def append(self, t: _Timer) -> None:
        t.prev = self.tail.prev; t.next = self.tail
        self.tail.prev.next = t; self.tail.prev = t
        t.slot = self
    def remove(self, t: _Timer) -> None:
        t.prev.next = t.next; t.next.prev = t.prev
        t.slot = None; t.prev = t.next = None
    def drain(self) -> list[_Timer]:
        # Detach every timer and reset the bucket to empty.
        out = []
        n = self.head.next
        while n is not self.tail:
            nxt = n.next
            n.prev = n.next = None; n.slot = None
            out.append(n); n = nxt
        self.head.next = self.tail; self.tail.prev = self.head
        return out

class TimerWheel:
    def __init__(self, slots: int = 256, tick_seconds: float = 0.001):
        self._slots = [_Bucket() for _ in range(slots)]
        self._n_slots = slots
        self._tick = tick_seconds
        self._cursor = 0
        self._current_tick = 0
        self._timers: dict[int, _Timer] = {}
        self._next_id = itertools.count(1)
    def schedule(self, delay_seconds: float, callback: Callable) -> int:
        ticks = max(1, int(delay_seconds / self._tick))  # clamp to >= 1 tick
        if ticks >= self._n_slots:
            raise ValueError("delay exceeds wheel range; use HierarchicalTimerWheel")
        deadline = self._current_tick + ticks
        slot = deadline % self._n_slots
        t = _Timer(next(self._next_id), deadline, callback)
        self._slots[slot].append(t)
        self._timers[t.tid] = t
        return t.tid
    def cancel(self, timer_id: int) -> bool:
        t = self._timers.pop(timer_id, None)
        if t is None or t.slot is None:
            return False
        t.slot.remove(t)
        return True
    def tick(self) -> None:
        self._current_tick += 1
        self._cursor = (self._cursor + 1) % self._n_slots
        bucket = self._slots[self._cursor]
        for t in bucket.drain():
            self._timers.pop(t.tid, None)
            try: t.callback()
            except Exception: pass  # one bad callback must not kill the tick; log in production
class HierarchicalTimerWheel:
    """Multi-level wheel; every level has the same slot count for simplicity."""
    def __init__(self, levels: int = 4, slots_per_level: int = 256,
                 tick_seconds: float = 0.001):
        self._levels = [
            [_Bucket() for _ in range(slots_per_level)] for _ in range(levels)
        ]
        self._cursors = [0] * levels
        self._n = slots_per_level
        self._tick = tick_seconds
        self._current_tick = 0
        self._timers: dict[int, _Timer] = {}  # id -> timer (timer holds its bucket)
        self._next_id = itertools.count(1)
    def schedule(self, delay_seconds: float, callback: Callable) -> int:
        ticks = max(1, int(delay_seconds / self._tick))
        t = _Timer(next(self._next_id), self._current_tick + ticks, callback)
        self._place(t)
        self._timers[t.tid] = t
        return t.tid
    def _place(self, t: _Timer) -> None:
        ticks_from_now = t.deadline_tick - self._current_tick
        # Find the lowest level that can hold this delay.
        capacity = self._n
        level = 0
        while ticks_from_now >= capacity and level < len(self._levels) - 1:
            level += 1; capacity *= self._n
        if ticks_from_now >= capacity:
            raise ValueError("delay exceeds hierarchical range")
        # Slot at this level:
        per_slot = capacity // self._n
        slot = (self._cursors[level] + ticks_from_now // per_slot) % self._n
        self._levels[level][slot].append(t)
    def cancel(self, timer_id: int) -> bool:
        t = self._timers.pop(timer_id, None)
        if t is None or t.slot is None:
            return False
        t.slot.remove(t)
        return True
    def tick(self) -> None:
        self._current_tick += 1
        # Advance level 0
        self._cursors[0] = (self._cursors[0] + 1) % self._n
        # Cascade only when the level below wrapped. Do this before firing so a
        # timer whose deadline is exactly now lands in the slot we drain next.
        for lvl in range(1, len(self._levels)):
            if self._cursors[lvl - 1] != 0:
                break
            self._cursors[lvl] = (self._cursors[lvl] + 1) % self._n
            for t in self._levels[lvl][self._cursors[lvl]].drain():
                self._place(t)  # same timer object, so its id stays cancellable
        # Fire level-0 current slot
        for t in self._levels[0][self._cursors[0]].drain():
            self._timers.pop(t.tid, None)
            try: t.callback()
            except Exception: pass  # log in production
Tests
import unittest

class TestTimerWheel(unittest.TestCase):
    def test_basic(self):
        w = TimerWheel(slots=10, tick_seconds=1.0)
        fired = []
        w.schedule(3, lambda: fired.append("a"))
        w.schedule(5, lambda: fired.append("b"))
        for _ in range(3): w.tick()
        self.assertEqual(fired, ["a"])
        for _ in range(2): w.tick()
        self.assertEqual(fired, ["a", "b"])
    def test_cancel(self):
        w = TimerWheel(slots=10, tick_seconds=1.0)
        fired = []
        tid = w.schedule(2, lambda: fired.append("x"))
        self.assertTrue(w.cancel(tid))
        for _ in range(5): w.tick()
        self.assertEqual(fired, [])
    def test_hierarchical_long_delay(self):
        w = HierarchicalTimerWheel(levels=3, slots_per_level=4, tick_seconds=1.0)
        # Range = 4 * 4 * 4 = 64 ticks
        fired = []
        w.schedule(50, lambda: fired.append("late"))
        for _ in range(50): w.tick()
        self.assertEqual(fired, ["late"])
Follow-up Questions
(11) Configuration knobs? slots, tick_seconds, levels (for hierarchical). Tick resolution trades scheduling granularity against CPU cost: 1 ms means 1000 ticks/sec, which is fine; 100 µs is 10000 ticks/sec, which adds CPU. Slots per level: 256 is a good default (a slot index fits in one byte, and the slot array stays small enough to be cache-friendly).
(7) Backpressure? Timer firing is on the tick thread. If callbacks are slow, ticks fall behind real time; subsequent timers fire late. Mitigation: dispatch callbacks to a thread pool from tick. Document the soft real-time semantics (“fires within 1 tick of deadline barring slow callbacks”).
(4) Observe / monitor? Active timer count (gauge), schedule rate (counter), cancel rate (counter), fire rate (counter), tick latency p99 (histogram — should be near zero; spikes mean slow callbacks).
(8) Partial failure? A callback that raises kills the tick if not caught. Always wrap with try/except and log.
(13) Poison pill? A callback that takes 1 second on a 1-ms wheel: all subsequent ticks pile up. Same mitigation as #7: dispatch to thread pool, or set a per-callback timeout.
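The thread-pool mitigation from #7 and #13 can be sketched as follows. This is a minimal sketch: dispatch and _run_safely are hypothetical helper names, and it assumes callbacks are safe to run off the tick thread.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor

log = logging.getLogger("timer_wheel")


def _run_safely(callback) -> None:
    """Run one callback; log failures instead of letting them propagate."""
    try:
        callback()
    except Exception:
        log.exception("timer callback failed")


def dispatch(callbacks, pool: ThreadPoolExecutor) -> list[Future]:
    """Hand each due callback to the pool so the tick thread never blocks."""
    return [pool.submit(_run_safely, cb) for cb in callbacks]


if __name__ == "__main__":
    done = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for f in dispatch([lambda: done.append("a"), lambda: 1 / 0], pool):
            f.result()  # wait only so the demo can print afterwards
    print(done)
```

Inside tick, the firing loop would collect the callbacks from the drained bucket and pass them to dispatch. The price is that callbacks no longer run in deadline order and may overlap, so this changes the documented semantics as well as the latency.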
Product Extension
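The classic Linux kernel timer wheel is hierarchical (256 slots at the lowest level, 64 at each higher level) and backs kernel timers such as setitimer and TCP retransmits. Netty’s HashedWheelTimer is the canonical Java implementation (a single-level wheel; “hashed” means the deadline is hashed to a slot by modulo, and each slot holds a linked-list bucket). A 256/64/64/64 layout covers 2^26 ticks, about 18.6 hours at a 1 ms tick; each additional 64-slot level multiplies that range by 64.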
Language/Runtime Follow-ups
- Python: this implementation. For sub-millisecond resolution, switch to a C extension or use asyncio’s event loop scheduler (which uses a heap, but is fine for low N).
- Java: Netty’s HashedWheelTimer (single-level), with JCTools queues for lock-free hand-off to the tick thread. ScheduledThreadPoolExecutor is heap-based and slower at scale.
- Go: time.AfterFunc uses a heap internally (fine for low N). For high N, github.com/RussellLuo/timingwheel is a clean library.
- C++: libuv and Boost.Asio keep their timers in a heap; for a wheel-based implementation, Folly’s HHWheelTimer is a widely used reference.
- JS/TS: Node’s timer subsystem uses a hash-bucket-by-time-and-context structure: not exactly a wheel, but a similar idea.
Common Bugs
- Forgetting modulo on slot indexing — array out of bounds when delay wraps around.
- Cascade firing on every tick instead of only when wrapping. Catastrophic slowdown.
- Forgetting to drain the bucket before clearing — callbacks lost.
- Holding callbacks in slot lists and in the _timers dict; failing to remove from one when removing from the other.
- Scheduling delay=0: should fire on next tick, not “now”. Clamp to ≥ 1 tick.
Debugging Strategy
Print the wheel state (occupied slots and their counts) after each tick. For “missed firing” bugs, walk the slot indexing and verify the placement formula. For hierarchical cascade bugs, set very small slot counts (4 × 4 × 4) and hand-trace the wrap.
Mastery Criteria
- Implemented single-level wheel in <20 minutes; hierarchical in <35.
- All three tests pass.
- Stated heap vs wheel tradeoff (heap O(log N), wheel O(1); wheel wins at high-N high-cancel).
- Articulated cascade mechanism for hierarchical wheels.
- Answered follow-ups #4, #7, #11, #13.
- Identified that real-world systems (Linux, Netty) use this exact structure.
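For the heap-vs-wheel comparison in the criteria above, it helps to have the heap version in mind. A minimal sketch using heapq with lazy cancellation; the class name HeapTimers and its tick-based interface are assumptions chosen to mirror the wheel's API, not something the lab specifies.

```python
import heapq
import itertools


class HeapTimers:
    """Min-heap timer queue: O(log N) schedule, O(1) lazy cancel."""

    def __init__(self):
        self._heap = []        # (deadline_tick, tid), ordered by deadline
        self._callbacks = {}   # tid -> callback; a missing tid means cancelled
        self._ids = itertools.count(1)
        self._now = 0

    def schedule(self, delay_ticks: int, callback) -> int:
        tid = next(self._ids)
        self._callbacks[tid] = callback
        heapq.heappush(self._heap, (self._now + max(1, delay_ticks), tid))
        return tid

    def cancel(self, tid: int) -> bool:
        # Lazy cancel: the heap entry stays until its deadline comes up.
        return self._callbacks.pop(tid, None) is not None

    def tick(self) -> None:
        self._now += 1
        while self._heap and self._heap[0][0] <= self._now:
            _, tid = heapq.heappop(self._heap)
            cb = self._callbacks.pop(tid, None)
            if cb is not None:  # skip entries that were cancelled
                cb()
```

Cancelled entries stay in the heap until their deadline, so a high-cancel workload (for example, request timeouts that almost never fire) keeps paying O(log N) pushes and pops for dead timers. That is the case where the wheel's O(1) unlink wins.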
Lab 14 — Persistent KV Store
Goal
Implement an in-memory key-value store with TTL, snapshot + write-ahead-log (WAL) persistence, and crash recovery. After this lab you should be able to design and implement a Redis-shaped local store in under 40 minutes and articulate the durability tradeoffs.
Background Concepts
A persistent KV store has two storage paths:
- In-memory state: a dict[key, value] (plus TTL bookkeeping). Hot path: O(1) read/write.
- Durable state: writes go to a WAL (append-only log of mutations); periodically a snapshot captures the current state. On boot, recovery = load latest snapshot + replay WAL since.
The four standard durability levels (each a different fsync policy):
- No persistence: pure in-memory. Lost on crash.
- WAL with no fsync: writes to OS buffer; lost on power-cut, survives process crash.
- WAL fsync per write: durable per write, slow (one syscall per op).
- WAL fsync every N ms: hybrid — bounded data loss in exchange for throughput.
Redis exposes the last three as appendfsync no / always / everysec (the first level corresponds to running with persistence disabled). The interview answer is “explain the spectrum and pick a default that matches the workload”.
Interview Context
This problem hits at infrastructure / database companies and at any senior coding round that wants to test storage fundamentals. The interviewer wants: snapshot + WAL design, fsync tradeoff articulation, working code that survives a simulated crash.
Problem Statement
Implement KVStore:
- put(key, value, ttl_seconds=None): store with optional TTL.
- get(key) -> value | None
- delete(key) -> bool
- snapshot(): write current state to snapshot_path.
- Recovery: on construction with wal_path and snapshot_path, replay snapshot + WAL.
Constraints
- 10^7 keys
- 10^5 ops / second
- Crash recovery within seconds
- Bounded memory (configurable max)
Clarifying Questions
- TTL granularity? (Seconds is fine for most workloads.)
- fsync policy? (Configurable: none / per-write / every-N-ms.)
- Snapshot format: text or binary? (Binary is faster, smaller; pick pickle or msgpack.)
- Concurrent reads during snapshot? (Often a follow-up; default block during snapshot.)
- Single-threaded or concurrent? (Single-threaded simplifies; lock for concurrency.)
Examples
kv = KVStore(wal_path="wal.log", snapshot_path="snap.pkl")
kv.put("user:1", {"name": "Alice"})
kv.put("session:42", "tok", ttl_seconds=60)
kv.get("user:1") -> {"name": "Alice"}
# … crash …
kv2 = KVStore(wal_path="wal.log", snapshot_path="snap.pkl")
kv2.get("user:1") -> {"name": "Alice"} # recovered
kv2.snapshot()
# After snapshot: WAL is rotated (truncated)
Initial Brute Force
dict[key, value]. No persistence. Lost on crash.
Brute Force Complexity
Per op: O(1). Memory: O(N). Durability: zero.
Optimization Path
Add WAL: append (op, key, value, ttl) per mutation. On boot, replay. Add periodic snapshot: serialize full state; truncate WAL. Add TTL: a dict[key, expires_at] and lazy expiry on get.
The cost is: O(WAL append) per write (serialization + file write); O(snapshot size) per snapshot; O(WAL size) per recovery. Throughput depends on fsync policy.
Final Expected Approach
In-memory dict for values + dict for TTL deadlines + a binary log file. Operations: log first, then update memory (“write-ahead”). Snapshot: pickle the in-memory state to a temp file, then atomic rename + truncate WAL. Recovery: load the snapshot, then replay the WAL from the start of the file (the WAL is truncated at each snapshot, so it holds only the mutations made since).
Data Structures Used
| Structure | Purpose |
|---|---|
| dict[K, V] | hot key-value store |
| dict[K, float] | TTL deadlines |
| WAL file (append-only) | durability |
| Snapshot file (pickle) | bounded recovery time |
| Lock | single-threaded mutation under multi-thread access |
Correctness Argument
Durability: every mutation is appended to the WAL before updating the in-memory state. After the WAL append (and fsync, if configured), the mutation is durable. On crash, recovery replays exactly what was logged.
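Atomicity of put: a WAL append is not guaranteed to be atomic across a crash. Appends from a single writer land in order, but a crash can leave a torn final record, so recovery must detect and discard an incomplete tail (a per-entry length and checksum make that detection reliable). Snapshot is atomic via write to tmp; fsync tmp; rename(tmp, snap).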
Recovery correctness: applying snapshot first, then replaying WAL entries in order, reconstructs exactly the pre-crash state. The only loss is mutations that were in OS buffers but unsynced at crash time — bounded by fsync policy.
TTL: lazy expiry on get (check now >= deadline, delete if so). This is correct as long as we don’t return values past their TTL. Stale entries in memory are GC’d on access; a periodic background sweeper (not implemented here; see follow-up #9) would handle expired entries that are never read.
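One way to make torn-tail detection explicit is to frame each WAL record with a length and a CRC32. This is a sketch of one possible framing, not the format the reference implementation uses; the names encode_record and read_records and the 8-byte header layout are assumptions.

```python
import io
import pickle
import struct
import zlib

_HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def encode_record(entry: dict) -> bytes:
    """Serialize one WAL entry as header + pickled payload."""
    payload = pickle.dumps(entry)
    return _HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_records(f):
    """Yield entries until EOF or the first torn or corrupt record."""
    while True:
        header = f.read(_HEADER.size)
        if len(header) < _HEADER.size:
            return  # clean EOF or torn header
        length, crc = _HEADER.unpack(header)
        payload = f.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            return  # torn or corrupt payload: stop replay here
        yield pickle.loads(payload)


if __name__ == "__main__":
    log = encode_record({"op": "put", "k": "a", "v": 1})
    log += encode_record({"op": "put", "k": "b", "v": 2})[:-3]  # simulate a torn tail
    print(list(read_records(io.BytesIO(log))))
```

Stopping at the first bad record, rather than skipping it, preserves the property that recovery replays a prefix of what was logged.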
Complexity
- put: O(1) memory + O(size of the log entry) disk
- get: O(1)
- snapshot: O(N) state size
- Recovery: O(snapshot + WAL since snapshot)
Implementation Requirements
import os, pickle, time, threading
from typing import Any, Optional

class KVStore:
    def __init__(self, wal_path: str = "kv.wal",
                 snapshot_path: str = "kv.snap",
                 fsync: str = "every_sec"):
        self._wal_path = wal_path
        self._snap_path = snapshot_path
        self._fsync = fsync  # "none" | "per_write" | "every_sec"
        self._data: dict = {}
        self._ttl: dict = {}
        self._lock = threading.RLock()
        self._wal_fp = None
        self._last_fsync = time.monotonic()
        self._recover()
        self._wal_fp = open(self._wal_path, "ab", buffering=0)
        if self._fsync == "every_sec":
            self._fsync_thread = threading.Thread(target=self._fsync_loop, daemon=True)
            self._fsync_thread.start()
    def _recover(self) -> None:
        # 1. Load snapshot if present
        if os.path.exists(self._snap_path):
            with open(self._snap_path, "rb") as f:
                self._data, self._ttl = pickle.load(f)
        # 2. Replay WAL since snapshot
        if os.path.exists(self._wal_path):
            with open(self._wal_path, "rb") as f:
                while True:
                    try:
                        entry = pickle.load(f)
                    except (EOFError, pickle.UnpicklingError):
                        break  # end of log, or a torn trailing entry
                    self._apply(entry)
        # Sweep expired
        now = time.time()
        for k in list(self._ttl):
            if self._ttl[k] <= now:
                self._data.pop(k, None); self._ttl.pop(k, None)
    def _apply(self, entry: dict) -> None:
        op = entry["op"]
        if op == "put":
            self._data[entry["k"]] = entry["v"]
            if entry.get("ttl") is not None:
                self._ttl[entry["k"]] = entry["ttl"]
            else:
                self._ttl.pop(entry["k"], None)
        elif op == "del":
            self._data.pop(entry["k"], None)
            self._ttl.pop(entry["k"], None)
    def _wal_write(self, entry: dict) -> None:
        buf = pickle.dumps(entry)
        self._wal_fp.write(buf)
        if self._fsync == "per_write":
            self._wal_fp.flush()
            os.fsync(self._wal_fp.fileno())
    def _fsync_loop(self) -> None:
        while True:
            time.sleep(1.0)
            with self._lock:
                if self._wal_fp:
                    self._wal_fp.flush()
                    os.fsync(self._wal_fp.fileno())
    def put(self, key, value, ttl_seconds: Optional[float] = None) -> None:
        # Absolute deadline, so restarts do not shift it. `is not None` keeps ttl_seconds=0 meaningful.
        deadline = (time.time() + ttl_seconds) if ttl_seconds is not None else None
        with self._lock:
            self._wal_write({"op": "put", "k": key, "v": value, "ttl": deadline})  # log first
            self._data[key] = value
            if deadline is not None:
                self._ttl[key] = deadline
            else:
                self._ttl.pop(key, None)
    def get(self, key) -> Any:
        with self._lock:
            deadline = self._ttl.get(key)
            if deadline is not None and time.time() >= deadline:
                self._wal_write({"op": "del", "k": key})
                self._data.pop(key, None); self._ttl.pop(key, None)
                return None
            return self._data.get(key)
    def delete(self, key) -> bool:
        with self._lock:
            existed = key in self._data
            self._wal_write({"op": "del", "k": key})
            self._data.pop(key, None); self._ttl.pop(key, None)
            return existed
    def snapshot(self) -> None:
        with self._lock:
            tmp = self._snap_path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump((self._data, self._ttl), f)
                f.flush(); os.fsync(f.fileno())
            os.replace(tmp, self._snap_path)  # atomic; also overwrites on Windows
            # Rotate WAL
            self._wal_fp.close()
            open(self._wal_path, "wb").close()
            self._wal_fp = open(self._wal_path, "ab", buffering=0)
    def close(self) -> None:
        with self._lock:
            if self._wal_fp:
                self._wal_fp.flush(); os.fsync(self._wal_fp.fileno())
                self._wal_fp.close(); self._wal_fp = None
Tests
import unittest, tempfile, os, time

class TestKV(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.mkdtemp()
        self.wal = os.path.join(self.tmp, "wal.log")
        self.snap = os.path.join(self.tmp, "snap.pkl")
    def tearDown(self):
        import shutil; shutil.rmtree(self.tmp)
    def test_basic(self):
        kv = KVStore(self.wal, self.snap, fsync="none")
        kv.put("a", 1); kv.put("b", "two")
        self.assertEqual(kv.get("a"), 1)
        self.assertEqual(kv.get("b"), "two")
        kv.delete("a")
        self.assertIsNone(kv.get("a"))
        kv.close()
    def test_ttl(self):
        kv = KVStore(self.wal, self.snap, fsync="none")
        kv.put("k", "v", ttl_seconds=0.1)
        self.assertEqual(kv.get("k"), "v")
        time.sleep(0.15)
        self.assertIsNone(kv.get("k"))
        kv.close()
    def test_recovery_from_wal(self):
        kv = KVStore(self.wal, self.snap, fsync="per_write")
        kv.put("x", "y")
        kv.close()
        # Simulate crash and restart
        kv2 = KVStore(self.wal, self.snap, fsync="none")
        self.assertEqual(kv2.get("x"), "y")
        kv2.close()
    def test_snapshot_rotates_wal(self):
        kv = KVStore(self.wal, self.snap, fsync="none")
        for i in range(100):
            kv.put(f"k{i}", i)
        wal_size_before = os.path.getsize(self.wal)
        kv.snapshot()
        wal_size_after = os.path.getsize(self.wal)
        self.assertGreater(wal_size_before, wal_size_after)
        kv.close()
        # Recover
        kv2 = KVStore(self.wal, self.snap, fsync="none")
        self.assertEqual(kv2.get("k99"), 99)
        kv2.close()
Follow-up Questions
(2) Persist state across restarts? That’s what we built. The three fsync policies and their tradeoffs are the answer-bearing detail: per_write (durable per op, slow); every_sec (≤1 sec data loss, fast; the Redis default when AOF is enabled); none (unsynced writes are lost on power cut, fastest).
(10) Consistency model? Linearizable in a single process under the lock. Across processes (or replicas), this becomes a consensus problem — Raft / Paxos. The KV store is the data plane; consensus is the control plane.
(8) Partial failure? Crash mid-write: the WAL can end with a partial entry, which raises EOFError or pickle.UnpicklingError on recovery; we ignore the trailing junk (caught above). Torn or corrupted records in general are caught by a per-entry checksum (CRC32).
(9) Eviction / cleanup? TTL provides automatic cleanup, but expired keys still in memory consume RAM until accessed. Background sweeper: periodically scan _ttl for expired keys and delete. For unbounded growth, add an LRU/LFU policy on top: when memory > threshold, evict by policy.
(11) Configuration knobs? fsync policy, snapshot_interval, max_memory_bytes, eviction_policy. Knobs not to expose: pickle protocol (use latest).
(12) Shutdown? Graceful: flush WAL, fsync, close file. The close method ensures durability up to the last write.
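The background sweeper from #9 can be sketched as a bounded pass over the TTL map. A minimal sketch: sweep_expired and its budget parameter are hypothetical, and it assumes the caller already holds the store's lock.

```python
def sweep_expired(data: dict, ttl: dict, now: float, budget: int = 1000) -> int:
    """Delete up to `budget` expired keys; return how many were removed.

    The budget bounds how long the caller holds the lock per sweep.
    """
    removed = 0
    for key in list(ttl):  # copy the keys so we can delete while iterating
        if removed >= budget:
            break
        if ttl[key] <= now:
            data.pop(key, None)
            del ttl[key]
            removed += 1
    return removed


if __name__ == "__main__":
    data = {"a": 1, "b": 2, "c": 3}
    ttl = {"a": 10.0, "b": 50.0}
    print(sweep_expired(data, ttl, now=20.0), data, ttl)
```

A daemon thread could call this once per second with now=time.time(). Copying the key list is O(N) per sweep, so a large store would more likely sample random keys or keep deadlines in a heap.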
Product Extension
Redis (RDB = snapshot, AOF = WAL); RocksDB / LevelDB (LSM trees with WAL + memtable + SSTable); Memcached (no persistence — pure cache); etcd / ZooKeeper (snapshot + WAL + Raft for consensus). The pattern you wrote here is the foundation; SSTable + LSM is the next-level optimization for write-heavy + range-query workloads.
Language/Runtime Follow-ups
- Python: pickle is fine for the snapshot format but not version-safe; for production, use msgpack or Protocol Buffers.
- Java: RandomAccessFile for the WAL; Java serialization for snapshot (also fragile; prefer Avro or Protobuf).
- Go: bufio.Writer over os.File; gob for snapshot. BadgerDB and BoltDB are production-grade Go KV stores.
- C++: write your own framing or use Cap’n Proto. RocksDB is the canonical reference (C++ implementation of LSM + WAL).
- JS/TS: rare in Node; use level (LevelDB binding) instead of rolling your own.
Common Bugs
- Updating in-memory state before WAL append: lose the durability guarantee.
- Calling fsync() without flush() first: data still sitting in the user-space buffer never reaches the kernel, so the fsync persists nothing new.
- Snapshot writes to the actual snapshot path before fsync: if the process crashes mid-write, the snapshot is corrupt. Always write-tmp + fsync + rename.
- WAL not rotated on snapshot — recovery replays the entire history every time, even after snapshot.
- TTL stored as duration instead of absolute time — restart shifts deadlines.
Debugging Strategy
For “lost data after restart” bugs: tail the WAL with a pickle reader and check that the missing key was logged. For corrupt-snapshot bugs: check that the temp file and the snapshot live on the same filesystem (os.rename across filesystems fails with an OSError, and copy-based fallbacks such as shutil.move are not atomic).
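A pickle reader for tailing the WAL can be as small as the sketch below. It assumes the WAL is a sequence of back-to-back pickled dicts, as in the implementation above; the function name dump_wal is illustrative.

```python
import pickle


def dump_wal(path: str) -> list:
    """Return every entry that can be decoded from a back-to-back pickle WAL."""
    entries = []
    with open(path, "rb") as f:
        while True:
            try:
                entries.append(pickle.load(f))
            except (EOFError, pickle.UnpicklingError):
                break  # end of file or a torn trailing entry
    return entries
```

Filtering the result for the missing key, for example [e for e in dump_wal("wal.log") if e["k"] == "user:1"], shows whether the write was ever logged or was lost before it reached the file. Only run this on WAL files you trust, since unpickling untrusted data can execute arbitrary code.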
Mastery Criteria
- Implemented KVStore with WAL + snapshot + recovery in <40 minutes.
- All four tests pass.
- Articulated three fsync levels and their tradeoffs without prompting.
- Stated WAL-before-memory as the durability invariant.
- Answered follow-ups #2, #8 (partial-write tolerance), #9, #10 (single vs replicated consistency), #11.
- Compared snapshot+WAL vs LSM tree at a high level.
Lab 15 — Retry With Exponential Backoff and Jitter
Goal
Implement a reusable retry(fn, policy) primitive that retries a callable on failure with exponential backoff plus decorrelated jitter, bounded by max attempts and total deadline, with an explicit retryable-error predicate so non-retryable errors fail fast. After this lab you should be able to write a production-shaped retry helper from a blank screen in <15 minutes and articulate why naive sleep(2 ** attempt) is wrong in <60 seconds.
Background Concepts
A retry primitive has four orthogonal knobs: (a) how many times to retry (max attempts and / or total deadline), (b) how long to wait between attempts (the backoff schedule), (c) which errors are retryable (a predicate), and (d) what to do on final failure (raise, return a sentinel, surface diagnostics). The non-trivial knob is (b). Naive exponential backoff wait = base * 2 ** attempt causes a thundering herd: when a downstream service recovers, every retrying client wakes simultaneously and re-overloads it. The fix is jitter: randomize the wait. The two industry-standard schedules are full jitter (wait = uniform(0, base * 2 ** attempt)) and decorrelated jitter (wait = uniform(base, prev_wait * 3), capped at max_wait). Decorrelated jitter is preferred when retries cluster across many clients because its waits are less correlated across attempts.
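The total deadline matters as much as the attempt count. A 5-attempt schedule with base=1s, cap=30s sleeps four times, so it can block for up to two minutes in the loosest bound (4 × 30 s), and for over a minute (3 + 9 + 27 + 30 = 69 s) even when each wait is limited to 3× the previous one. Either is unacceptable for a request-path retry. Production retry helpers always take a deadline.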
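The worst-case blocking time can be checked with a few lines of arithmetic. A minimal sketch, assuming the decorrelated schedule wait = uniform(base, prev_wait * 3) capped at the maximum delay and no sleep after the final attempt; worst_case_sleep is a hypothetical helper name.

```python
def worst_case_sleep(max_attempts: int, base: float, cap: float) -> tuple[float, float]:
    """Return (loose_bound, decorrelated_bound) on total sleep, in seconds."""
    sleeps = max_attempts - 1       # no sleep after the final attempt
    loose = sleeps * cap            # every sleep pinned at the cap
    total, prev = 0.0, base
    for _ in range(sleeps):
        prev = min(prev * 3, cap)   # largest value uniform(base, prev * 3) can take
        total += prev
    return loose, total


if __name__ == "__main__":
    print(worst_case_sleep(5, 1, 30))
```

For base=1s and cap=30s with five attempts, the loose bound is 4 × 30 = 120 s and the decorrelated worst case is 3 + 9 + 27 + 30 = 69 s. Both are far too long for a request path without a deadline.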
Interview Context
This is a 20-minute warmup at Stripe, Uber, Cloudflare, and any team whose service makes downstream calls. It’s also a frequent follow-up to the rate-limiter and circuit-breaker labs. Candidates who write for i in range(5): try: return fn() except: time.sleep(2 ** i) get partial credit; candidates who name jitter, deadline, retryable-error predicate, and the relationship to the circuit breaker (Lab 16) get a strong signal.
Problem Statement
Implement retry(fn, max_attempts, base_delay, max_delay, deadline_s, is_retryable, jitter='decorrelated') that calls fn() repeatedly until it succeeds or the policy gives up (the reference implementation bundles these parameters into a RetryPolicy object). On non-retryable exceptions, fail immediately. On retryable exceptions, sleep according to the schedule and try again. On exceeding max_attempts or deadline_s, raise the last exception wrapped in a RetryExhausted.
Constraints
- max_attempts ≥ 1 (1 means “no retries”; the function is called at most once).
- base_delay > 0, max_delay ≥ base_delay.
- deadline_s may be None (no deadline) or a positive float (wall-clock seconds from retry() invocation).
- is_retryable: Exception -> bool must be a pure function.
- The implementation must not busy-spin and must respect both the per-attempt cap and the deadline (whichever fires first).
Clarifying Questions
- Is the deadline measured from retry() invocation or from the first failure? (From invocation: simpler reasoning.)
- Should fn() be called at least once even if the deadline is already past at start? (Yes: at least one attempt.)
- Should we sleep after the final attempt? (No: pointless.)
- Does is_retryable apply to the last attempt’s exception, or do we always re-raise the last? (Re-raise the last; non-retryable short-circuits.)
- Synchronous or async? (Both: implement sync first, async variant in follow-ups.)
- Should we surface the attempt count and total elapsed time in the wrapped exception? (Yes: operational visibility.)
Examples
retry(lambda: http_get(url), max_attempts=5, base_delay=0.1, max_delay=10, deadline_s=30,
is_retryable=lambda e: isinstance(e, (TimeoutError, ConnectionError)))
# → returns the response if any attempt succeeds within 30s and 5 tries.
# → raises RetryExhausted("timeout", attempts=5, elapsed=12.3s) on timeout.
# → raises ValueError immediately if fn() raises ValueError (non-retryable).
Initial Brute Force
import time

def retry_naive(fn, max_attempts):
    for i in range(max_attempts):
        try:
            return fn()
        except Exception:
            if i == max_attempts - 1:
                raise
            time.sleep(2 ** i)
This is what most candidates write first. It has four problems: no jitter (thundering herd), no deadline (unbounded wait), no retryable predicate (retries on programming errors), and no cap on the per-attempt delay.
Brute Force Complexity
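Time: dominated by sleeps. The loop sleeps after every failed attempt except the last, so the worst case is 2^0 + 2^1 + … + 2^(max_attempts−2) = 2^(max_attempts−1) − 1 seconds. For max_attempts=10, that’s 511 seconds (about 8.5 minutes). Space: O(1).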
Optimization Path
Add (1) max_delay cap → bounds per-attempt sleep, (2) deadline_s total cap → bounds end-to-end blocking, (3) is_retryable predicate → fast-fails on programmer errors, (4) jitter → spreads herd, (5) structured exception with diagnostics → operational legibility. Each addition is a one-knob change; together they take the primitive from “buggy in production” to “shippable”.
Final Expected Approach
Loop up to max_attempts times. Track wall-clock start. On each attempt, call fn(). On success, return the value. On failure, check is_retryable; if false, re-raise. If we’re at the last attempt or past the deadline, raise RetryExhausted. Otherwise compute the next sleep using the chosen jitter scheme, clip to remaining-deadline so we don’t oversleep, and time.sleep(wait). Log each attempt.
Data Structures Used
- A monotonic clock reference (time.monotonic()) to compute deadlines; wall-clock can jump.
- A small RetryExhausted exception class carrying attempts, elapsed, last_exception.
- An optional Logger for per-attempt diagnostics (don’t print; inject a logger).
Correctness Argument
We make at most max_attempts calls to fn (loop bound). We sleep between attempts but never after the final one (the loop returns or raises before sleeping past the last attempt). We respect deadline_s by computing remaining = deadline - elapsed and clipping the sleep; if remaining ≤ 0 we raise immediately. Non-retryable exceptions short-circuit by re-raising before the sleep. The exception we surface is always the last underlying failure, wrapped with diagnostics.
Complexity
| Aspect | Cost |
|---|---|
| Wall-clock | bounded by min(deadline_s, Σ wait_i) |
| CPU per failed attempt | O(1) plus fn’s own cost |
| Memory | O(1) |
Implementation Requirements
A complete working implementation is required.
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class RetryExhausted(Exception):
    def __init__(self, message: str, attempts: int, elapsed: float, last_exception: BaseException):
        super().__init__(f"{message} (attempts={attempts}, elapsed={elapsed:.2f}s)")
        self.attempts = attempts
        self.elapsed = elapsed
        self.last_exception = last_exception

@dataclass
class RetryPolicy:
    max_attempts: int = 5
    base_delay: float = 0.1
    max_delay: float = 30.0
    deadline_s: Optional[float] = None
    jitter: str = "decorrelated"  # "decorrelated" | "full" | "none"
    is_retryable: Callable[[BaseException], bool] = lambda e: True
    def __post_init__(self):
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        if self.base_delay <= 0 or self.max_delay < self.base_delay:
            raise ValueError("invalid delay bounds")

def _next_wait(policy: RetryPolicy, attempt: int, prev_wait: float) -> float:
    if policy.jitter == "none":
        w = min(policy.base_delay * (2 ** attempt), policy.max_delay)
    elif policy.jitter == "full":
        cap = min(policy.base_delay * (2 ** attempt), policy.max_delay)
        w = random.uniform(0, cap)
    elif policy.jitter == "decorrelated":
        w = min(random.uniform(policy.base_delay, max(prev_wait, policy.base_delay) * 3),
                policy.max_delay)
    else:
        raise ValueError(f"unknown jitter scheme: {policy.jitter}")
    return w

def retry(fn: Callable[[], T], policy: RetryPolicy, *, sleep=time.sleep, clock=time.monotonic) -> T:
    start = clock()
    last_exc: Optional[BaseException] = None
    prev_wait = policy.base_delay
    for attempt in range(policy.max_attempts):
        try:
            return fn()
        except Exception as e:  # not BaseException: never retry KeyboardInterrupt / SystemExit
            last_exc = e
            if not policy.is_retryable(e):
                raise
            if attempt == policy.max_attempts - 1:
                break  # no sleep after the final attempt
            elapsed = clock() - start
            if policy.deadline_s is not None and elapsed >= policy.deadline_s:
                break
            wait = _next_wait(policy, attempt, prev_wait)
            if policy.deadline_s is not None:
                wait = min(wait, max(0.0, policy.deadline_s - elapsed))  # never sleep past the deadline
            if wait > 0:
                sleep(wait)
            prev_wait = wait
    elapsed = clock() - start
    raise RetryExhausted("retry exhausted", attempt + 1, elapsed, last_exc) from last_exc
sleep and clock are dependency-injected so tests do not have to wait real time.
Tests
def test_succeeds_first_try():
    assert retry(lambda: 42, RetryPolicy(max_attempts=3)) == 42

def test_succeeds_after_failures():
    n = {"i": 0}
    def fn():
        n["i"] += 1
        if n["i"] < 3: raise TimeoutError()
        return "ok"
    assert retry(fn, RetryPolicy(max_attempts=5, base_delay=0.001)) == "ok"
    assert n["i"] == 3

def test_non_retryable_short_circuits():
    n = {"i": 0}
    def fn():
        n["i"] += 1
        raise ValueError("bad")
    policy = RetryPolicy(is_retryable=lambda e: not isinstance(e, ValueError))
    try: retry(fn, policy)
    except ValueError: pass
    else: raise AssertionError("expected ValueError")
    assert n["i"] == 1

def test_exhaustion_wraps_exception():
    def fn(): raise TimeoutError("nope")
    try: retry(fn, RetryPolicy(max_attempts=2, base_delay=0.001))
    except RetryExhausted as e:
        assert e.attempts == 2
        assert isinstance(e.last_exception, TimeoutError)
    else: raise AssertionError("expected RetryExhausted")

def test_deadline_respected():
    fake_time = [0.0]
    def fn(): raise TimeoutError()
    sleeps = []
    def fake_sleep(t): sleeps.append(t); fake_time[0] += t
    def fake_clock(): return fake_time[0]
    try:
        retry(fn, RetryPolicy(max_attempts=100, base_delay=1, deadline_s=5, jitter="none"),
              sleep=fake_sleep, clock=fake_clock)
    except RetryExhausted as e:
        assert e.elapsed <= 5.001
    else: raise AssertionError("expected RetryExhausted")
Follow-up Questions
- How would you make it thread-safe? The function is reentrant: no shared state across calls. The injected sleep and clock should themselves be thread-safe (the stdlib ones are). Per-call state (attempt counter, prev_wait) is local. No locks needed.
- How would you observe and monitor it? Emit (a) retry.attempts counter labeled by callsite and outcome (success_first_try, success_after_retry, exhausted, non_retryable), (b) retry.elapsed histogram, (c) retry.attempt_count histogram. Log per-attempt at DEBUG, per-final-failure at WARN.
- How would you handle a poison-pill input? A request that always raises a retryable error wastes the deadline on every retry. Wrap repeated callers behind a circuit breaker (Lab 16); after N consecutive RetryExhausted errors, open the breaker and fail fast for a cooldown period.
- What configuration knobs would you expose? max_attempts, base_delay, max_delay, deadline_s, jitter strategy, is_retryable predicate. Defaults: 5 / 100ms / 30s / None / decorrelated / lambda e: True. Don’t expose internal multipliers (the 3× in decorrelated jitter): they’re stable, and tuning them in production is a smell.
- How would you test it deterministically? Inject sleep and clock; advance fake time inside the fake sleep. Seed random for reproducible jitter. The test for the deadline above uses this pattern.
- What is the relationship to the circuit breaker? A retry without a circuit breaker is dangerous: if the downstream is fully down, every caller retries the full schedule, multiplying load. The right composition is circuit_breaker(retry(fn)): the breaker short-circuits the retry once it has seen enough failures.
Product Extension
Retry primitives are the workhorse of every microservice’s outbound RPC layer. AWS SDK, Google Cloud SDK, and gRPC all ship retry helpers; their default schedules are exponential backoff with some form of jitter, bounded by a deadline or attempt limit. The is_retryable predicate in production is the hardest knob: HTTP 5xx is usually retryable, 4xx usually is not, but 429 is retryable with Retry-After honored. Lift this complexity into the predicate.
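A predicate for the HTTP rules above might look like the following sketch. HttpError is a hypothetical exception type invented for illustration (real HTTP clients expose status codes differently), and server_hint is likewise an assumed helper.

```python
from typing import Optional


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status (illustration only)."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, parsed from Retry-After if present


def is_retryable(e: BaseException) -> bool:
    """Transport failures, 5xx, and 429 are retryable; other 4xx are not."""
    if isinstance(e, (TimeoutError, ConnectionError)):
        return True
    if isinstance(e, HttpError):
        return e.status == 429 or 500 <= e.status < 600
    return False


def server_hint(e: BaseException) -> Optional[float]:
    """Return the server-requested wait for a 429, if one was supplied."""
    if isinstance(e, HttpError) and e.status == 429:
        return e.retry_after
    return None
```

Honoring Retry-After needs a small extension to the retry loop: when server_hint returns a value, use the larger of that value and the computed backoff, still clipped to the remaining deadline.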
Language/Runtime Follow-ups
- Python: as above. For async, swap time.sleep for asyncio.sleep and make retry an async def.
- Java: use Resilience4j or Failsafe in production. Hand-rolled: a RetryPolicy builder, a Callable<T> argument, Thread.sleep (or ScheduledExecutorService.schedule in async).
- Go: a function Retry(ctx context.Context, fn func() error, policy Policy) error. Use time.NewTimer so a ctx.Done() can cancel mid-sleep. Cancellation is the deadline mechanism.
- C++: std::this_thread::sleep_for and std::chrono for the deadline. Pass a stop-token to support cancellation.
- JS/TS: await new Promise(r => setTimeout(r, ms)). The retry function is async. Use AbortSignal for the deadline.
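The Python async variant mentioned above might look like this. It is a minimal sketch: retry_async is a hypothetical name, it uses full jitter, it has no deadline, and it retries every Exception, all of which are simplifications made here.

```python
import asyncio
import random


async def retry_async(fn, *, max_attempts=5, base_delay=0.1, max_delay=30.0,
                      sleep=asyncio.sleep, rng=random.random):
    """Hypothetical async retry: full jitter, no deadline, retries every Exception."""
    for attempt in range(max_attempts):
        try:
            return await fn()
        except Exception:
            # asyncio.CancelledError derives from BaseException (Python 3.8+),
            # so task cancellation is not swallowed here.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            await sleep(rng() * cap)  # full jitter: uniform in [0, cap)


async def _demo():
    calls = {"n": 0}

    async def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")
        return "ok"

    async def no_sleep(_seconds):
        return None  # injected so the demo does not really wait

    return await retry_async(flaky, sleep=no_sleep), calls["n"]


result, attempts = asyncio.run(_demo())
```

The sleep parameter is injected for the same reason as in the synchronous version: tests should never wait on real time.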
Common Bugs
- Sleeping after the final attempt: wastes wall-clock.
- Using time.time() instead of time.monotonic(): wall-clock can jump backwards across NTP corrections, causing negative elapsed and crashes.
- Catching BaseException and swallowing KeyboardInterrupt/SystemExit: never make these retryable. Either narrow the catch or have the predicate exclude them.
- Computing the next wait before the deadline check: you sleep past the deadline. Always check elapsed first.
- Forgetting to clip wait to remaining = deadline - elapsed: a 30s sleep when only 2s of deadline remain.
- Not seeding random deterministically in tests: flaky test failures.
Debugging Strategy
When retries don’t fire: print is_retryable(e) for the actual exception; assert it returns True. When they fire too long: print attempt, wait, and clock() - start per attempt — the bug is almost always a missing deadline check or an uncapped jitter computation. When tests are flaky: confirm sleep is injected (no real sleeps in unit tests) and random.seed(0) at the top of the test.
Mastery Criteria
- Wrote the brute-force naive retry in <2 minutes from cold start.
- Added max_delay, deadline_s, is_retryable, and jitter incrementally, justifying each.
- Wrote both full-jitter and decorrelated-jitter formulas from memory.
- Stated the difference between time.time() and time.monotonic() and which to use here.
- Wrote deterministic tests using injected sleep and clock.
- Articulated the retry+circuit-breaker composition in <60 seconds.
- Solved this from a blank screen in <15 minutes including 5 unit tests.
- Listed the four bugs in the naive for i in range: sleep(2**i) retry without prompting.
Lab 16 — Circuit Breaker
Goal
Implement a thread-safe circuit breaker with three states — CLOSED, OPEN, HALF_OPEN — that protects a downstream call by failing fast once a sliding-window failure rate threshold is crossed, then probes for recovery after a cooldown. After this lab you should be able to draw the state diagram, name every transition, write the implementation in <25 minutes, and answer “what’s the difference between a retry and a circuit breaker” in <30 seconds.
Background Concepts
A circuit breaker is the operational dual of a retry. A retry keeps trying until the downstream is probably up; a circuit breaker stops trying once the downstream is probably down. Without a breaker, every caller retries the full schedule and amplifies the outage. With one, callers fail fast for a cooldown window and only a single probe call is sent during recovery — preventing the retry storm that otherwise prolongs outages.
The three states:
- CLOSED: normal operation; calls go through; failures are counted in a sliding window.
- OPEN: the failure threshold was crossed; all calls are short-circuited with CircuitOpenError for cooldown_s seconds.
- HALF_OPEN: cooldown elapsed; a single probe call is allowed. If it succeeds, transition to CLOSED and reset counters. If it fails, transition back to OPEN and start a fresh cooldown.
Two failure-counting windows are common: count-based (last N calls) and time-based (last T seconds). Time-based is preferred for low-traffic services because a count-based window only ages when new calls arrive, so old failures can stay in it indefinitely. Both are easy to implement on top of a deque: timestamps for the time-based window, recent outcomes for the count-based one.
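The two window styles described above can be sketched side by side. TimeWindow and CountWindow are hypothetical names; the count-based variant here stores call outcomes rather than timestamps, which is one of several reasonable layouts.

```python
from collections import deque


class TimeWindow:
    """Failures within the last window_s seconds; the caller supplies timestamps."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._stamps = deque()

    def record_failure(self, now: float) -> None:
        self._stamps.append(now)

    def failures(self, now: float) -> int:
        cutoff = now - self.window_s
        while self._stamps and self._stamps[0] < cutoff:
            self._stamps.popleft()  # evict failures older than the window
        return len(self._stamps)


class CountWindow:
    """Failures among the last n calls; deque(maxlen=n) drops the oldest outcome."""

    def __init__(self, n: int):
        self._outcomes = deque(maxlen=n)

    def record(self, failed: bool) -> None:
        self._outcomes.append(failed)

    def failures(self) -> int:
        return sum(self._outcomes)  # True counts as 1


tw = TimeWindow(window_s=10)
tw.record_failure(0)
tw.record_failure(4)
at_9 = tw.failures(now=9)    # both failures still inside the window
at_12 = tw.failures(now=12)  # the failure at t=0 has aged out

cw = CountWindow(n=3)
for failed in (True, True, False, False):
    cw.record(failed)
recent = cw.failures()       # only the last three outcomes are kept
```

Note that CountWindow has no notion of time: with no further calls, recent stays the same forever, which is the staleness problem on low-traffic services.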
Interview Context
This is the canonical follow-up to Lab 15 (retry) and a top-15 practical problem at Stripe, Netflix, Uber, and any team with a microservice mesh. The Hystrix library popularized this pattern; its successor Resilience4j is the modern reference. Candidates often hand-roll only the state transitions and miss the half-open single-probe constraint — a clear signal of “knows the diagram, hasn’t operated one in production”.
Problem Statement
Implement CircuitBreaker(failure_threshold, window_s, cooldown_s) with method call(fn) that either calls fn() (and updates the breaker state from the result) or raises CircuitOpenError if the breaker is open. Internally track failure count over the last window_s seconds; transition to OPEN when the count reaches failure_threshold. After cooldown_s in OPEN, the next call enters HALF_OPEN and is the sole probe; success → CLOSED, failure → OPEN again.
Constraints
- Thread-safe; multiple goroutines/threads may call call() concurrently.
- In HALF_OPEN, exactly one probe is in flight. Concurrent callers see CircuitOpenError until the probe completes.
- failure_threshold ≥ 1, window_s > 0, cooldown_s > 0.
- Successful calls in CLOSED decrement (or do not affect) the failure window; choose and document.
Clarifying Questions
- Are timeouts counted as failures? (Default yes; they almost always indicate downstream unhealth.)
- Are application errors (4xx vs 5xx) treated identically? (No. 4xx is the caller’s fault; only 5xx and timeouts should trip. Inject an is_failure(exception) predicate.)
- What are the recovery semantics: strict half-open (single probe) or “let N requests through”? (Single probe by default; named RECOVERY_QUOTA if needed.)
- Do we need per-resource breakers or a global one? (Per-resource is correct: a breaker per downstream identity.)
- Should successes in CLOSED reset the failure count? (Most implementations don’t reset; only the sliding window aging removes failures. Tunable.)
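The per-resource answer above implies some registry that hands out one breaker per downstream identity. A minimal sketch, assuming a hypothetical BreakerRegistry that is generic over whatever breaker type the factory builds:

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

B = TypeVar("B")


class BreakerRegistry(Generic[B]):
    """Hypothetical helper: lazily creates one breaker per downstream identity."""

    def __init__(self, factory: Callable[[], B]):
        self._factory = factory
        self._breakers: Dict[str, B] = {}
        self._lock = threading.Lock()

    def for_resource(self, resource: str) -> B:
        # The lock guards create-if-absent so two threads share one breaker.
        with self._lock:
            breaker = self._breakers.get(resource)
            if breaker is None:
                breaker = self._factory()
                self._breakers[resource] = breaker
            return breaker


# object() stands in for a real breaker so the sketch runs on its own.
registry = BreakerRegistry(factory=object)
payments_a = registry.for_resource("payments")
payments_b = registry.for_resource("payments")
search = registry.for_resource("search")
```

In real use the factory would be something like a lambda that constructs the lab's CircuitBreaker with the desired thresholds.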
Examples
breaker = CircuitBreaker(failure_threshold=5, window_s=10, cooldown_s=30)
breaker.call(lambda: http_get(url)) # raises if downstream raises
# After 5 failures within 10s: state -> OPEN
breaker.call(...) # raises CircuitOpenError immediately for 30s
# After 30s cooldown: next call -> HALF_OPEN probe
# Probe success -> CLOSED, fresh window
# Probe failure -> OPEN, fresh 30s cooldown
Initial Brute Force
class NaiveBreaker:
def __init__(self, threshold, cooldown_s):
self.failures = 0
self.opened_at = None
self.threshold = threshold
self.cooldown_s = cooldown_s
def call(self, fn):
if self.opened_at and time.time() - self.opened_at < self.cooldown_s:
raise CircuitOpenError()
try:
r = fn()
self.failures = 0
return r
except Exception:
self.failures += 1
if self.failures >= self.threshold:
self.opened_at = time.time()
raise
This naive version has six bugs: not thread-safe; counts forever (no window aging); no half-open state (multiple probes after cooldown); resets failures on any success even if breaker just opened; uses wall-clock; treats every exception as a failure.
Brute Force Complexity
call() is O(1). The failure counter has no time bound: with no success in between, failures that are hours apart still add up and trip the breaker, even though such sparse failures should not.
Optimization Path
(1) Add a sliding window — deque of failure timestamps, age out on each call. (2) Add HALF_OPEN state with a probe_in_flight flag. (3) Add a Lock to serialize state transitions. (4) Inject the failure predicate so only real failures count. (5) Switch to monotonic(). (6) Emit metrics on each transition.
Final Expected Approach
State machine guarded by a threading.Lock. On each call(): under the lock, read state. If OPEN and cooldown elapsed → transition to HALF_OPEN and grant the probe to this caller (set probe_in_flight=True). If OPEN and not elapsed → raise. If HALF_OPEN and probe in flight → raise (concurrent callers see open). If CLOSED → proceed. Release the lock, call fn(), reacquire the lock to record the result. On success in HALF_OPEN → transition to CLOSED, clear failures. On failure → record (or transition to OPEN).
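The transitions in the approach above can be written down as a pure lookup table, separate from locking and timing. This is only a sketch of the diagram: the enum S and the event names are labels chosen here, not part of the lab's implementation.

```python
from enum import Enum


class S(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


# (current state, event) -> next state; event names are labels chosen for this sketch
TRANSITIONS = {
    (S.CLOSED, "threshold_reached"): S.OPEN,
    (S.OPEN, "cooldown_elapsed"): S.HALF_OPEN,
    (S.HALF_OPEN, "probe_success"): S.CLOSED,
    (S.HALF_OPEN, "probe_failure"): S.OPEN,
}


def next_state(state: S, event: str) -> S:
    # Any pair that is not in the table leaves the state unchanged.
    return TRANSITIONS.get((state, event), state)


state = S.CLOSED
trace = []
for event in ("threshold_reached", "cooldown_elapsed",
              "probe_failure", "cooldown_elapsed", "probe_success"):
    state = next_state(state, event)
    trace.append(state)
```

Keeping the table pure makes it easy to unit-test every edge of the diagram without threads or clocks.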
Data Structures Used
- deque[float] for failure timestamps in the sliding window.
- threading.Lock for state transitions.
- An enum for State.
- A monotonic clock for all time reads.
Correctness Argument
The state diagram is a closed graph: CLOSED → OPEN → HALF_OPEN → {CLOSED | OPEN}. Every transition is guarded by the lock, so two threads cannot disagree on the current state. The half-open invariant is enforced by probe_in_flight: only the thread that flipped the state from OPEN to HALF_OPEN holds the probe right; all others see CircuitOpenError. The sliding window is monotonically aged on each call, so failures older than window_s are guaranteed evicted before being counted.
Complexity
| Operation | Time | Space |
|---|---|---|
| call (CLOSED, success) | O(1) amortized | O(window) for deque |
| call (OPEN, fast-fail) | O(1) | O(1) |
| call (HALF_OPEN probe) | O(1) plus fn | O(1) |
Window aging is amortized O(1) per call.
Implementation Requirements
import threading
import time
from collections import deque
from enum import Enum
from typing import Callable, Optional, TypeVar
T = TypeVar("T")
class State(Enum):
CLOSED = "CLOSED"
OPEN = "OPEN"
HALF_OPEN = "HALF_OPEN"
class CircuitOpenError(Exception):
pass
class CircuitBreaker:
def __init__(self,
failure_threshold: int = 5,
window_s: float = 10.0,
cooldown_s: float = 30.0,
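                 # Default counts ordinary errors only; KeyboardInterrupt/SystemExit are not downstream failures.
                 is_failure: Callable[[BaseException], bool] = lambda e: isinstance(e, Exception),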
*,
clock=time.monotonic):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be >= 1")
        if window_s <= 0 or cooldown_s <= 0:
            raise ValueError("window_s and cooldown_s must be > 0")
self._threshold = failure_threshold
self._window_s = window_s
self._cooldown_s = cooldown_s
self._is_failure = is_failure
self._clock = clock
self._lock = threading.Lock()
self._state = State.CLOSED
self._failures: deque[float] = deque()
self._opened_at: Optional[float] = None
self._probe_in_flight = False
# observability
self._transitions: list[tuple[float, State, State]] = []
def _age_failures(self, now: float):
cutoff = now - self._window_s
while self._failures and self._failures[0] < cutoff:
self._failures.popleft()
def _transition(self, new: State, now: float):
self._transitions.append((now, self._state, new))
self._state = new
def _try_acquire_probe(self, now: float) -> bool:
"""Called under lock. True if this caller becomes the probe."""
if self._state == State.OPEN and self._opened_at is not None \
and now - self._opened_at >= self._cooldown_s:
self._transition(State.HALF_OPEN, now)
self._probe_in_flight = True
return True
return False
def call(self, fn: Callable[[], T]) -> T:
now = self._clock()
is_probe = False
with self._lock:
if self._state == State.CLOSED:
self._age_failures(now)
elif self._state == State.OPEN:
if not self._try_acquire_probe(now):
raise CircuitOpenError("breaker is OPEN")
is_probe = True
elif self._state == State.HALF_OPEN:
if not self._probe_in_flight:
# rare race: cooldown re-elapsed during a transient state
self._probe_in_flight = True
is_probe = True
else:
raise CircuitOpenError("probe in flight")
# invoke without holding the lock
try:
result = fn()
except BaseException as e:
failed = self._is_failure(e)
with self._lock:
now = self._clock()
if is_probe:
self._probe_in_flight = False
self._transition(State.OPEN, now)
self._opened_at = now
elif failed:
self._failures.append(now)
self._age_failures(now)
if len(self._failures) >= self._threshold and self._state == State.CLOSED:
self._transition(State.OPEN, now)
self._opened_at = now
self._failures.clear()
raise
with self._lock:
if is_probe:
self._probe_in_flight = False
self._transition(State.CLOSED, self._clock())
self._failures.clear()
self._opened_at = None
return result
def state(self) -> State:
with self._lock:
return self._state
Tests
def test_closed_passes_through():
b = CircuitBreaker(failure_threshold=3, window_s=10, cooldown_s=5)
assert b.call(lambda: 42) == 42
assert b.state() == State.CLOSED
def test_opens_after_threshold():
b = CircuitBreaker(failure_threshold=3, window_s=10, cooldown_s=5)
for _ in range(3):
try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
except RuntimeError: pass
assert b.state() == State.OPEN
    try: b.call(lambda: 42)
    except CircuitOpenError: pass
    else: assert False, "expected CircuitOpenError while OPEN"
def test_half_open_success_closes():
fake = [0.0]
b = CircuitBreaker(failure_threshold=2, window_s=10, cooldown_s=5, clock=lambda: fake[0])
for _ in range(2):
try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
except RuntimeError: pass
assert b.state() == State.OPEN
fake[0] = 6
assert b.call(lambda: "ok") == "ok"
assert b.state() == State.CLOSED
def test_half_open_failure_reopens():
fake = [0.0]
b = CircuitBreaker(failure_threshold=2, window_s=10, cooldown_s=5, clock=lambda: fake[0])
for _ in range(2):
try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
except RuntimeError: pass
fake[0] = 6
try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
except RuntimeError: pass
assert b.state() == State.OPEN
def test_concurrent_only_one_probe():
import threading
fake = [0.0]
b = CircuitBreaker(failure_threshold=1, window_s=10, cooldown_s=5, clock=lambda: fake[0])
try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
except RuntimeError: pass
fake[0] = 6
seen_states = []
barrier = threading.Barrier(10)
def worker():
barrier.wait()
try:
b.call(lambda: time.sleep(0.05) or "ok")
seen_states.append("ok")
except CircuitOpenError:
seen_states.append("open")
threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
assert seen_states.count("ok") == 1
assert seen_states.count("open") == 9
Follow-up Questions
- How would you make it thread-safe? A single threading.Lock around state transitions and counter updates is sufficient and is what the implementation does. The fn() call is invoked outside the lock so a slow downstream does not block other callers from seeing CircuitOpenError. The half-open probe race is resolved by probe_in_flight flipping atomically under the lock.
- What metrics would you emit? State-transition counter (labels: from_state, to_state); current state gauge; per-call outcome counter (success, failure, short_circuit, probe_success, probe_failure); failure-window gauge (current count); time-in-state histogram.
- What is the consistency model? Linearizable on the breaker’s state: all state() reads observe transitions in a total order consistent with the lock acquisition order. The probe invariant (“at most one probe in flight at any time”) is strict.
- How would you handle a poison-pill input? A request that always raises a retryable failure will trip the breaker quickly; that’s the breaker’s job. The risk is the opposite: a probe with a poison input perpetually fails the half-open probe and never recovers. Mitigation: pick the probe payload from a known-safe traffic pool (synthetic health-check), or use a periodic health probe instead of in-line traffic.
- What configuration knobs would you expose? failure_threshold, window_s, cooldown_s, is_failure predicate. Don’t expose the half-open probe quota; keep it 1 unless you have a strong reason. Defaults: 5 failures / 10s window / 30s cooldown.
- How would you scale to N nodes? Per-process breakers are local: each instance learns about downstream health independently. This is correct for most use cases (each instance’s view of latency varies) but expensive if downstream collapse is sudden. The next step is a coordinated breaker via a shared registry, but only at very high scale.
Product Extension
Real-world breakers (Hystrix, Resilience4j, Polly) layer on top of this core: bulkheads (concurrent-call limit), rate limiters, fallbacks (return cached value when open), and metric emission to Prometheus / StatsD. The state machine is the same; the bookkeeping around it varies by framework.
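The fallback idea mentioned above (return a cached value when open) can be sketched as a thin wrapper. call_with_fallback is a hypothetical helper, and the CircuitOpenError defined here is only a stand-in for the lab's exception so the snippet runs on its own.

```python
class CircuitOpenError(Exception):
    """Stand-in for the lab's exception so this sketch is self-contained."""


def call_with_fallback(breaker_call, fn, cache, key):
    """Hypothetical wrapper: serve the last good value while the breaker is open."""
    try:
        value = breaker_call(fn)
    except CircuitOpenError:
        if key in cache:
            return cache[key]  # possibly stale, but better than failing
        raise                  # nothing cached: surface the open breaker
    cache[key] = value         # remember the last good value
    return value


def healthy(fn):
    return fn()                # plays the role of breaker.call while CLOSED


def tripped(fn):
    raise CircuitOpenError()   # plays the role of breaker.call while OPEN


cache = {}
first = call_with_fallback(healthy, lambda: "fresh", cache, "profile")
second = call_with_fallback(tripped, lambda: "unused", cache, "profile")
```

Whether a stale value is acceptable is a product decision; for some calls (payments, for example) failing fast is the better fallback.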
Language/Runtime Follow-ups
- Python: as above. For async, replace Lock with asyncio.Lock and make call an async def.
- Java: prefer Resilience4j in production. Hand-rolled: AtomicReference<State>, LongAdder for counters, ScheduledExecutorService for cooldown timeouts.
- Go: a struct guarded by sync.Mutex. The probe flag is a bool. Use time.Now() (monotonic on Go 1.9+).
- C++: std::mutex + std::condition_variable if you want concurrent callers to wait for the probe rather than fail fast (a different policy, called “blocking breaker”).
- JS/TS: in single-threaded Node, no lock is needed: the state-transition logic is naturally atomic between awaits as long as you do not await in the middle of a transition block.
Common Bugs
- Holding the lock while calling fn(): a slow downstream blocks every other caller.
- Forgetting to clear probe_in_flight on probe failure: breaker stays in HALF_OPEN forever, all calls fail.
- Using time.time(): wall-clock skew can make now - opened_at negative and the cooldown effectively infinite.
- Counting non-failure exceptions (KeyboardInterrupt, ValueError from caller side) toward the threshold.
- Resetting failures on every successful call: masks intermittent failures.
- Aging the window only on failure: state() queries report stale counts.
Debugging Strategy
When the breaker won’t open: log the failure count after each call; check is_failure(e) returns True for the actual exception. When it won’t close after recovery: log the state and probe_in_flight flag — almost always the probe-flag-stuck-True bug. When concurrent tests are flaky: add a barrier so all callers race in lockstep, then assert exactly one probe-success and N-1 short-circuits.
Mastery Criteria
- Drew the three-state diagram from memory in <30 seconds.
- Listed every transition trigger (failure-threshold, cooldown-elapsed, probe-success, probe-failure) without prompting.
- Wrote a thread-safe implementation in <25 minutes from a blank screen.
- Wrote a concurrent test that catches the multiple-probe bug.
- Articulated the retry × circuit-breaker composition in <60 seconds.
- Named four metrics you’d emit for a production breaker.
- Explained why fn() must not be called under the lock.
Lab 17 — Metrics Collector (Counter / Gauge / Histogram)
Goal
Implement a thread-safe in-process metrics registry that supports the three canonical metric types — counter, gauge, histogram — with bounded memory, label support, and an export format suitable for Prometheus scraping. After this lab you should be able to write the registry from a blank screen in <25 minutes and articulate the difference between summary and histogram in <60 seconds.
Background Concepts
The four core Prometheus metric types (OpenMetrics defines a few more) are counter (a monotonic, non-decreasing total: requests, errors, bytes), gauge (a current value: queue depth, active connections, memory in use), histogram (a count of observations bucketed by upper bound, used to compute quantiles server-side), and summary (client-side quantiles, harder to aggregate). The first three cover ~95% of production needs. Counters answer “how many?”, gauges answer “how much right now?”, histograms answer “what’s the distribution and the p99?”.
A metric is identified by (name, label_set). The same name with different labels (e.g., http_requests_total{method="GET"} vs http_requests_total{method="POST"}) is a different time series. The number of label combinations is the metric’s cardinality. Unbounded cardinality (e.g., a label per user_id) is the most common production memory leak in metric systems — protect against it.
Histograms are tricky. Naive implementations that store every observation and sort on export explode memory. The Prometheus model: pre-declare a fixed set of bucket upper bounds ([0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10] is the Go client’s default; other clients ship slightly different lists) and increment a counter per bucket. Quantile estimation happens at the query layer using bucket interpolation. Memory is O(buckets) per series, which is cheap.
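Bucket interpolation at the query layer can be sketched as below. This is a simplified illustration in the spirit of Prometheus's histogram_quantile, not a reimplementation of it: estimate_quantile is a hypothetical name, the lower edge of the first bucket is assumed to be 0, and observations are assumed to be spread uniformly inside each bucket.

```python
import math


def estimate_quantile(q, cumulative):
    """cumulative: list of (upper_bound, cumulative_count); the last bound is +Inf."""
    total = cumulative[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0  # assumption: first bucket starts at 0
    for bound, count in cumulative:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate into +Inf; report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# 50 observations <= 0.1, 40 more <= 0.5, 10 more <= 1.0, none above.
demo = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
p50 = estimate_quantile(0.50, demo)  # falls at the top of the first bucket
p95 = estimate_quantile(0.95, demo)  # halfway through the (0.5, 1.0] bucket
```

The estimate can never be more precise than the bucket layout, which is why bucket boundaries should bracket the latency targets you care about.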
Interview Context
A staple practical-engineering question at Datadog, Grafana, Honeycomb, and any observability-aware team (which is most of them now). Also asked at Stripe and Uber as a “we want to see how you reason about cardinality” question. Candidates fail this round when they reach for “store every observation in a list” — that betrays a lack of production exposure.
Problem Statement
Implement MetricsRegistry with:
- counter(name, labels=None).inc(by=1): monotonically incrementing.
- gauge(name, labels=None).set(value) / .inc(by=1) / .dec(by=1).
- histogram(name, labels=None, buckets=...).observe(value).
- registry.snapshot() returns a list of (name, labels, type, value) tuples or the OpenMetrics text format.
- Thread-safe; bounded cardinality (configurable maximum number of label combinations per metric name).
Constraints
- Must support concurrent observations from many threads.
- Histogram bucket increments must be atomic (no torn reads of bucket_count and sum).
- Cardinality cap: when a new label combination would exceed max_labels_per_metric, drop and emit an internal counter (metrics_dropped_total).
- Counter and gauge: O(1) per operation; histogram: O(log buckets) per observation (binary search the bucket).
Clarifying Questions
- Do we need timestamped exposition (Prometheus exposition format)? (Yes, but the timestamp can be implicit; Prometheus assigns the scrape timestamp.)
- Are histogram buckets shared across all label combinations or per combination? (Per combination; different label values may have different distributions.)
- Are we exposing percentiles client-side or letting the server compute them? (Server-side; the histogram type is exactly this.)
- Should counters reset on process restart? (Yes. Prometheus handles this with the rate() function and the reset detection in counter math.)
- What’s the maximum max_labels_per_metric we should default to? (1000 is generous; 100 is conservative. Make it configurable.)
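The reset detection mentioned in the counter answer above can be illustrated with a small function. This is a simplified model of the idea, not Prometheus's actual rate() or increase() (which also extrapolate to window boundaries); the function name and sample values are made up for the example.

```python
def increase(samples):
    """Total increase across scraped counter samples, tolerating resets.

    A drop between consecutive samples is treated as a restart from zero,
    so the whole post-reset value counts as new increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


# The process restarted between the third and fourth scrape.
scrapes = [10, 15, 20, 3, 8]
grew_by = increase(scrapes)  # 5 + 5 + 3 + 5
```

This is why counters may safely start from zero after a restart: the query layer reconstructs the total from the deltas.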
Examples
reg = MetricsRegistry()
reg.counter("http_requests_total", {"method": "GET", "status": "200"}).inc()
reg.gauge("queue_depth", {"queue": "ingest"}).set(42)
reg.histogram("request_latency_s", {"endpoint": "/api"}, buckets=[0.01, 0.1, 1, 10]).observe(0.04)
print(reg.snapshot_openmetrics())
# # HELP http_requests_total
# # TYPE http_requests_total counter
# http_requests_total{method="GET",status="200"} 1
# # TYPE queue_depth gauge
# queue_depth{queue="ingest"} 42
# ...
Initial Brute Force
class NaiveMetrics:
def __init__(self):
self.metrics = {}
def counter(self, name, labels=None):
key = (name, frozenset((labels or {}).items()))
self.metrics.setdefault(key, 0)
self.metrics[key] += 1
This conflates increment and registration, has no thread safety, no histogram (impossible to compute p99 from a counter), no cardinality cap, and uses one global dict so every metric type collides on key shape.
Brute Force Complexity
O(1) per increment under no contention. With concurrent writers, races on dict.__setitem__ and += corrupt counts. Memory unbounded.
Optimization Path
Separate types into separate sub-registries (Counter, Gauge, Histogram) to avoid type-pun bugs. Add a per-metric Lock (or atomic primitive). Use bisect_left on a sorted bucket array to find the histogram bucket. Cap cardinality with a per-metric-name combination counter. Define an exposition format.
Final Expected Approach
A MetricsRegistry holds a dict name → MetricFamily. A MetricFamily stores the metric type, the bucket schedule (for histograms), and a dict labels_tuple → MetricInstance. Each MetricInstance is a small thread-safe object: Counter has an int and a Lock; Gauge has a float and a Lock; Histogram has a list[int] of bucket counts, a float sum, an int count, and a Lock. On increment, hash the label tuple, look up or create the instance (with cardinality check), acquire its Lock, mutate. Snapshot iterates families and instances under their locks and emits an exposition string.
Data Structures Used
- dict[str, MetricFamily] for the registry.
- dict[tuple[tuple[str, str], ...], MetricInstance] per family for label combinations.
- list[float] (sorted) for histogram bucket boundaries.
- list[int] for histogram bucket counts.
- threading.Lock per instance.
Correctness Argument
Counter: monotonic by construction (only inc(by) with by ≥ 0 allowed). Gauge: set is the last writer’s value; inc/dec are atomic under the lock. Histogram: each observation falls into exactly one bucket (the smallest bucket whose upper bound is ≥ observation; the last bucket is +Inf). Sum and count are incremented under the same lock as the bucket count, so a snapshot sees consistent values.
Complexity
| Op | Time |
|---|---|
| counter.inc | O(1) lock-and-increment |
| gauge.set/inc/dec | O(1) |
| histogram.observe | O(log B) for bucket lookup |
| snapshot | O(N · B) where N is total instances |
Space: O(name + label-cardinality · (1 for counter/gauge or B+2 for histogram)).
Implementation Requirements
import threading
from bisect import bisect_left
from typing import Optional
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10)
def _label_key(labels: Optional[dict]) -> tuple:
if not labels:
return ()
return tuple(sorted(labels.items()))
class Counter:
__slots__ = ("_v", "_lock")
def __init__(self):
self._v = 0
self._lock = threading.Lock()
def inc(self, by: float = 1):
if by < 0:
raise ValueError("counter cannot decrease")
with self._lock:
self._v += by
def value(self) -> float:
with self._lock:
return self._v
class Gauge:
__slots__ = ("_v", "_lock")
def __init__(self):
self._v = 0.0
self._lock = threading.Lock()
def set(self, v: float):
with self._lock:
self._v = v
def inc(self, by: float = 1):
with self._lock:
self._v += by
def dec(self, by: float = 1):
with self._lock:
self._v -= by
def value(self) -> float:
with self._lock:
return self._v
class Histogram:
__slots__ = ("_buckets", "_counts", "_sum", "_count", "_lock")
def __init__(self, buckets):
self._buckets = tuple(sorted(buckets))
self._counts = [0] * (len(self._buckets) + 1) # +1 for +Inf
self._sum = 0.0
self._count = 0
self._lock = threading.Lock()
def observe(self, v: float):
        # bisect_left returns the first bucket whose upper bound is >= v;
        # it returns len(self._buckets) when v exceeds every bound (the +Inf slot).
        idx = bisect_left(self._buckets, v)
with self._lock:
self._counts[idx] += 1
self._sum += v
self._count += 1
def snapshot(self) -> dict:
with self._lock:
cumulative = []
running = 0
for i, b in enumerate(self._buckets):
running += self._counts[i]
cumulative.append((b, running))
running += self._counts[-1]
cumulative.append((float("inf"), running))
return {"buckets": cumulative, "sum": self._sum, "count": self._count}
class MetricFamily:
def __init__(self, name: str, kind: str, buckets=None, max_labels: int = 1000):
self.name = name
self.kind = kind # "counter" | "gauge" | "histogram"
self.buckets = buckets
self.instances: dict = {}
self.max_labels = max_labels
self.dropped = 0
self.lock = threading.Lock()
def get(self, labels_key: tuple):
with self.lock:
inst = self.instances.get(labels_key)
if inst is not None:
return inst
if len(self.instances) >= self.max_labels:
self.dropped += 1
return None
if self.kind == "counter":
inst = Counter()
elif self.kind == "gauge":
inst = Gauge()
else:
inst = Histogram(self.buckets or DEFAULT_BUCKETS)
self.instances[labels_key] = inst
return inst
class _NullCounter:
def inc(self, by=1): pass
class MetricsRegistry:
def __init__(self, max_labels_per_metric: int = 1000):
self._families: dict[str, MetricFamily] = {}
self._lock = threading.Lock()
self._max_labels = max_labels_per_metric
def _family(self, name: str, kind: str, buckets=None) -> MetricFamily:
with self._lock:
f = self._families.get(name)
if f is None:
f = MetricFamily(name, kind, buckets, self._max_labels)
self._families[name] = f
elif f.kind != kind:
raise ValueError(f"metric {name} already registered as {f.kind}")
return f
def counter(self, name: str, labels: Optional[dict] = None) -> Counter:
f = self._family(name, "counter")
inst = f.get(_label_key(labels))
return inst if inst is not None else _NullCounter()
def gauge(self, name: str, labels: Optional[dict] = None) -> Gauge:
f = self._family(name, "gauge")
return f.get(_label_key(labels)) or Gauge() # caller-detached fallback
def histogram(self, name: str, labels: Optional[dict] = None, buckets=None) -> Histogram:
f = self._family(name, "histogram", buckets)
return f.get(_label_key(labels)) or Histogram(buckets or DEFAULT_BUCKETS)
def snapshot_openmetrics(self) -> str:
lines = []
with self._lock:
families = list(self._families.values())
for f in families:
lines.append(f"# TYPE {f.name} {f.kind}")
with f.lock:
instances = list(f.instances.items())
for labels_key, inst in instances:
lbl = "{" + ",".join(f'{k}="{v}"' for k, v in labels_key) + "}" if labels_key else ""
if isinstance(inst, (Counter, Gauge)):
lines.append(f"{f.name}{lbl} {inst.value()}")
elif isinstance(inst, Histogram):
snap = inst.snapshot()
for b, cum in snap["buckets"]:
b_str = "+Inf" if b == float("inf") else f"{b}"
bucket_lbl = labels_key + (("le", b_str),)
bl = "{" + ",".join(f'{k}="{v}"' for k, v in bucket_lbl) + "}"
lines.append(f"{f.name}_bucket{bl} {cum}")
lines.append(f"{f.name}_sum{lbl} {snap['sum']}")
lines.append(f"{f.name}_count{lbl} {snap['count']}")
return "\n".join(lines)
Tests
def test_counter_increments():
r = MetricsRegistry()
c = r.counter("hits")
c.inc(); c.inc(2)
assert c.value() == 3
def test_counter_rejects_negative():
r = MetricsRegistry()
try: r.counter("x").inc(-1)
except ValueError: pass
else: assert False
def test_gauge_set_inc_dec():
g = MetricsRegistry().gauge("depth")
g.set(10); g.inc(5); g.dec(3)
assert g.value() == 12
def test_histogram_buckets():
h = MetricsRegistry().histogram("lat", buckets=[0.1, 1, 10])
for v in [0.05, 0.5, 1.5, 100]: h.observe(v)
snap = h.snapshot()
assert snap["count"] == 4
    assert abs(snap["sum"] - 102.05) < 1e-9  # float sum: compare with a tolerance
assert snap["buckets"][0] == (0.1, 1) # ≤ 0.1
assert snap["buckets"][1] == (1, 2) # ≤ 1
assert snap["buckets"][2] == (10, 3) # ≤ 10
assert snap["buckets"][3][1] == 4 # +Inf
def test_concurrent_counter():
import threading
r = MetricsRegistry()
c = r.counter("racy")
def inc():
for _ in range(10_000): c.inc()
threads = [threading.Thread(target=inc) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert c.value() == 80_000
def test_cardinality_cap():
r = MetricsRegistry(max_labels_per_metric=2)
r.counter("uid", {"id": "a"}).inc()
r.counter("uid", {"id": "b"}).inc()
r.counter("uid", {"id": "c"}).inc() # dropped
assert len(r._families["uid"].instances) == 2
assert r._families["uid"].dropped == 1
def test_type_conflict():
r = MetricsRegistry()
r.counter("x")
try: r.gauge("x")
except ValueError: pass
else: assert False
Follow-up Questions
- How would you make it thread-safe? Per-instance locks (counters and gauges have one each; histograms have one per (name, labels) instance). The registry-level lock only guards family creation. Result: counter increments from different label tuples never block each other, which is the expected hot path.
- What metrics would you emit (about the metrics system itself)? metrics_dropped_total{name=...} (cardinality drops); metrics_active_series (gauge of total instances); metrics_scrape_duration_seconds (histogram of snapshot_openmetrics latency). Self-instrumentation is a sign of mature instrumentation.
- What is the eviction policy? None for active series; they live forever. For TTL’d metrics (rare), an external sweeper deletes instances unobserved for N minutes. Design so the sweeper is optional, not on the hot path.
- What configuration knobs would you expose? max_labels_per_metric (cardinality cap), default histogram buckets, the registry’s exposition format. Don’t expose the per-instance lock granularity; it’s an implementation detail.
- How would you handle backpressure? The hot path is lock-bounded; a write that contends waits microseconds. If a histogram’s lock becomes hot, switch to per-bucket atomics or sharded counters (8 shards keyed by thread_id % 8, summed at scrape).
- What’s the difference between summary and histogram? Summary computes quantiles client-side using a streaming quantile algorithm (for example Greenwald-Khanna or a CKMS-style sketch). Pros: exact-ish percentiles per series. Cons: cannot aggregate across series. Histogram pushes the work to the query layer; aggregation across series is just bucket-wise addition. Histograms are the right default in modern observability.
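The sharded-counter idea from the backpressure answer above might be sketched like this. ShardedCounter is a hypothetical name, and the sketch deviates from a literal thread_id % 8 on purpose: raw thread identifiers can cluster on the same remainder, so shards are handed out round-robin and remembered per thread instead.

```python
import itertools
import threading


class ShardedCounter:
    """Hypothetical sharded counter: each thread writes to its own shard."""

    def __init__(self, shards: int = 8):
        self._values = [0] * shards
        self._locks = [threading.Lock() for _ in range(shards)]
        self._next = itertools.count()    # round-robin shard assignment
        self._local = threading.local()   # remembers each thread's shard

    def _shard(self) -> int:
        idx = getattr(self._local, "idx", None)
        if idx is None:
            idx = next(self._next) % len(self._values)
            self._local.idx = idx
        return idx

    def inc(self, by: int = 1) -> None:
        if by < 0:
            raise ValueError("counter cannot decrease")
        i = self._shard()
        with self._locks[i]:              # contention only within one shard
            self._values[i] += by

    def value(self) -> int:
        # Summed at scrape time; not a single atomic snapshot across shards.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._values[i]
        return total


counter = ShardedCounter()


def work():
    for _ in range(1000):
        counter.inc()


threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.value()
```

The trade-off is a slightly more expensive read in exchange for writers that rarely contend with each other.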
Product Extension
This is exactly the data model exposed by the Prometheus client libraries. Real implementations add: gauge track_inprogress (decorator/context manager that increments on enter, decrements on exit), summary type, exemplars (trace IDs attached to histogram observations), native histograms (sparse representation that auto-tunes bucket boundaries).
Language/Runtime Follow-ups
- Python: as above. The prometheus_client library is the production reference. Be aware of GIL implications: counter increments aren’t atomic at the bytecode level for floats, but the explicit lock makes them so.
- Java: use LongAdder for counters (avoids contention by striping updates across cells); DoubleAdder for histogram sums. Micrometer is the production reference.
- Go: sync/atomic provides atomic.Int64 and atomic.Uint64 but no float type, so float counters and gauges store math.Float64bits in an atomic.Uint64 and update it with a compare-and-swap loop. A sync.Mutex per histogram is a fine interview answer. The prometheus/client_golang library is the production reference.
- C++: std::atomic<uint64_t> for counter; std::mutex for histogram. Cardinality maps require careful design; tbb::concurrent_hash_map is one option.
- JS/TS: single-threaded, so no locks are needed in Node. The prom-client package is the production reference.
Common Bugs
- Histogram bucket lookup off-by-one: bisect_left is correct only if buckets are pre-sorted.
- Sharing histogram buckets across label combinations: different distributions have different optimal buckets. Each instance gets its own bucket array.
- Forgetting the +Inf bucket: observations larger than the largest bucket are silently dropped.
- Using int for histogram sum: it truncates fractional observations, and in languages with fixed-width integers a 32-bit sum can overflow within hours on a high-throughput histogram. Use float.
- Type confusion: registering the same name as both counter and gauge corrupts exposition. The registry must reject this.
- Unbounded cardinality: a label per request_id creates a new series per request. The cardinality cap is the safety net.
Debugging Strategy
When totals look wrong: confirm that counter.inc(by) is called with by ≥ 0 and that the update happens under the lock. When percentiles are off: dump the cumulative bucket counts; the math should be cumulative[i] = sum(counts[0..i]). When concurrency tests are flaky: add time.sleep(0) between increments to widen the race window; a test that still passes over many runs is evidence that the locking holds, not proof.
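The cumulative-count check can be reproduced in a few lines. This is a sketch under assumptions: BOUNDS is a hypothetical set of bucket upper bounds, and observe and cumulative are illustrative helpers rather than the lab's histogram class.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical sorted bucket upper bounds; the final +Inf bucket catches everything else.
BOUNDS = [0.1, 0.5, 1.0, float("inf")]


def observe(counts: list[int], value: float) -> None:
    """Increment the first bucket whose upper bound is >= value (le semantics)."""
    counts[bisect_left(BOUNDS, value)] += 1


def cumulative(counts: list[int]) -> list[int]:
    """cumulative[i] = sum(counts[0..i]); the last entry is the total observation count."""
    return list(accumulate(counts))


counts = [0] * len(BOUNDS)
for v in (0.05, 0.1, 0.3, 2.0):
    observe(counts, v)
```

Here an observation of exactly 0.1 lands in the 0.1 bucket, and 2.0 lands in +Inf, so the per-bucket counts are [2, 1, 0, 1] and the cumulative counts are [2, 3, 3, 4].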
Mastery Criteria
- Wrote the three metric types with correct semantics in <25 minutes.
- Articulated histogram vs summary in <60 seconds.
- Stated the cardinality risk without prompting and showed the cap in code.
- Wrote a concurrent counter test that verifies no lost updates.
- Produced a Prometheus-compatible exposition string.
- Listed three metrics-about-metrics to emit (drops, active series, scrape duration).
- Explained why per-instance locks scale better than a global registry lock.
Lab 18 — Concurrent Web Crawler
Goal
Implement a concurrent web crawler that BFS-traverses a starting URL, respects a depth limit, per-host politeness (max in-flight requests per domain + minimum inter-request delay), dedup (visit each URL once), and a bounded worker pool. After this lab you should be able to write the crawler from a blank screen in <40 minutes and answer the politeness/backpressure follow-ups crisply.
Background Concepts
A web crawler is a BFS over the web graph where nodes are URLs and edges are anchor links extracted from the HTML. The interesting engineering is not the BFS — it’s the constraints layered on top:
- Politeness: never overload a single host. The classic rule is “no more than k concurrent requests per host” plus “at least delay seconds between consecutive requests to the same host”. Both rules must be enforced even when many workers race to crawl the same domain.
- Dedup: the web has cycles. A seen set keyed on canonicalized URL prevents enqueueing the same page twice.
- Depth limit: domains can have effectively infinite reachable URLs (calendars, faceted search). Hard-cap depth.
- Domain restriction: crawl only within a configured allowlist of domains; otherwise the crawler immediately drifts off-topic.
- Bounded workers: limit total concurrency to N. Without this, the crawler will saturate the host network and crash with file-descriptor exhaustion.
- Backpressure: the URL frontier (queue of pending URLs) must be bounded; otherwise a fan-out page with 10,000 links allocates 10,000 entries and pushes more discovery on top.
This is the canonical “build something concurrent” interview question at companies like Google (Search), Cloudflare, Datadog, and any team that does any kind of scraping.
Interview Context
A 40-to-60-minute round at senior+ practical interviews. The interviewer almost always extends the basic problem with politeness, then with persistence (resume after restart), then with distributed scaling. Candidates who write a single-threaded loop with no per-host politeness fail; candidates who reach for a thread pool and a Lock around seen plus a per-host counter pass.
Problem Statement
Implement crawl(start_url, *, max_depth, max_workers, per_host_concurrency, per_host_delay_s, allow_domains, http_get, extract_links) that returns a list (or yields a stream) of (url, depth, content) tuples. Visit each canonical URL at most once. Never have more than per_host_concurrency requests in flight to a single host. Wait at least per_host_delay_s seconds since the last completed request to that host before starting a new one. Stop when the frontier is empty.
Constraints
- Thread-safe; many workers race for URLs from the frontier.
- Bounded memory: frontier capped, seen set is the only unbounded structure (acceptable, since it is proportional to the corpus).
- Graceful shutdown on Ctrl-C or external cancellation.
- http_get is injected so tests don’t hit the network.
Clarifying Questions
- URL canonicalization rules? (Lowercase host, drop fragment, sort query params, default port elision.)
- Should robots.txt be honored? (Yes in production; mock it in this lab via an is_allowed predicate.)
- What’s a “host”? (Registered domain: example.com, not www.example.com vs images.example.com. Or just hostname; document the choice.)
- Should depth-0 (the start URL) count toward the depth limit? (No: depth-0 is always crawled.)
- Should we follow redirects? (Yes, but the redirect target counts as the visited URL.)
- Output order: does it need to be deterministic? (No: concurrency makes determinism hard. Document.)
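The canonicalization rules listed in the first question (lowercase host, drop fragment, sort query params, elide default ports) could be sketched as below. canonicalize_strict and DEFAULT_PORTS are hypothetical names, not part of the lab's implementation. The sketch assumes plain http/https URLs without userinfo or IPv6 literals, and it re-encodes the query string, which also normalizes percent-escapes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these two schemes have a default port worth eliding.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_strict(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, elide default ports, sort query params."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Sorting (key, value) pairs makes ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # An empty fragment argument drops "#...".
    return urlunsplit((scheme, netloc, parts.path or "/", query, ""))
```

For example, HTTPS://Example.COM:443/a?b=2&a=1#frag and https://example.com/a?a=1&b=2 map to the same key, so the seen set treats them as one page.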
Examples
results = crawl(
    "https://example.com/",
    max_depth=3,
    max_workers=8,
    per_host_concurrency=2,
    per_host_delay_s=0.5,
    allow_domains={"example.com"},
    http_get=fake_http_get,
    extract_links=fake_extract_links,  # required keyword; a test double, like fake_http_get
)
# returns ~30 (url, depth, body) tuples, never more than 2 in-flight to example.com.
Initial Brute Force
def crawl_naive(url, max_depth):
    # http_get and extract_links are assumed to be module-level helpers.
    seen = {url}
    frontier = [(url, 0)]
    out = []
    while frontier:
        u, d = frontier.pop(0)  # FIFO pop gives BFS order
        body = http_get(u)
        out.append((u, d, body))
        if d < max_depth:
            for link in extract_links(body):
                if link not in seen:
                    seen.add(link)
                    frontier.append((link, d + 1))
    return out
This is single-threaded (slow), has no politeness (will get IP-banned), and grows the frontier unboundedly.
Brute Force Complexity
Time: O(V) HTTP requests serially, where V is the number of unique pages. With 1 s/page and 10k pages that is 10,000 s, roughly 2.8 hours. Space: O(V) for seen and the frontier.
Optimization Path
Add a thread pool of max_workers. Add a Lock-guarded seen set. Add per-host concurrency via a Semaphore keyed on host. Add per-host last-request-time via a dict-of-(timestamp, lock). Cap the frontier with a bounded queue (queue.Queue(maxsize=...)). Add a stop_event for graceful shutdown.
Final Expected Approach
A ThreadPoolExecutor(max_workers) runs max_workers long-lived worker loops. The frontier is a queue.Queue(maxsize=...). A seen set guarded by a Lock ensures each URL is enqueued once. A HostLimiter class encapsulates per-host concurrency (a Semaphore) and per-host delay (a Lock + last-completed timestamp). Workers pull URLs, acquire the host limiter (which may block on the semaphore or sleep for the delay), call http_get, extract links, check the depth limit and the seen set under the lock, and enqueue new URLs.
Data Structures Used
- queue.Queue for the frontier (bounded).
- set[str] for seen (guarded by Lock).
- dict[str, _HostState] for per-host limiters.
- _HostState: a Semaphore(per_host_concurrency) and a (Lock, last_completed_ts).
- ThreadPoolExecutor for workers.
Correctness Argument
Each URL is enqueued at most once (the seen set is checked under the global lock atomically with the add). Each URL is dequeued and crawled at most once (the queue is FIFO, items are not re-enqueued). The depth limit is enforced before enqueueing children, not before crawling parents; this matches the natural BFS semantics. Per-host politeness: a worker holds the host’s semaphore for the duration of the request, so at most per_host_concurrency requests are in flight per host. The inter-request delay is enforced by reading last_completed under the host’s lock and sleeping until last_completed + delay, so each request starts at least delay after the most recent completion that worker observed. With per_host_concurrency = 1 this spaces consecutive requests by at least delay. With higher concurrency, workers that read the same timestamp can start together, so stricter spacing needs a reserved start slot per request.
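One way to space request starts on a host, even when several workers race, is to reserve a start slot under the host's lock before sleeping. The sketch below is a variant under that assumption: SlotLimiter and its _next_start field are hypothetical, it handles a single host, and it spaces request starts rather than measuring from completion, so it is not a drop-in replacement for the lab's HostLimiter.

```python
import threading
import time


class SlotLimiter:
    """Hypothetical single-host limiter that spaces request starts by `delay`."""

    def __init__(self, concurrency: int, delay: float, *,
                 clock=time.monotonic, sleep=time.sleep):
        self._sem = threading.Semaphore(concurrency)
        self._lock = threading.Lock()
        self._delay = delay
        self._next_start = float("-inf")  # earliest time the next request may start
        self._clock = clock
        self._sleep = sleep

    def acquire(self) -> float:
        """Block until a slot is free and return the reserved start time."""
        self._sem.acquire()
        with self._lock:
            now = self._clock()
            start = max(now, self._next_start)
            # Reserve before sleeping so a racing worker is handed a later slot.
            self._next_start = start + self._delay
        if start > now:
            self._sleep(start - now)  # sleep outside the lock
        return start

    def release(self) -> None:
        self._sem.release()
```

Because the clock and sleep functions are injected, a test can pass a fake clock and a no-op sleep and assert on the returned start times without waiting.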
Complexity
| Aspect | Cost |
|---|---|
| HTTP requests | O(V) total, parallel by max_workers |
| seen lookup | O(1) average |
| Per-host serialization | bounded by per_host_concurrency and per_host_delay_s |
| Memory | O(V) for seen plus frontier_capacity for the queue |
Implementation Requirements
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty
from urllib.parse import urlparse, urldefrag

def canonicalize(url: str) -> str:
    # Drop the fragment, lowercase the host, and default an empty path to "/".
    u, _ = urldefrag(url)
    p = urlparse(u)
    host = p.hostname.lower() if p.hostname else ""
    port = f":{p.port}" if p.port else ""
    path = p.path or "/"
    return f"{p.scheme}://{host}{port}{path}" + (f"?{p.query}" if p.query else "")

def host_of(url: str) -> str:
    return (urlparse(url).hostname or "").lower()
class _HostState:
    __slots__ = ("sem", "lock", "last_completed")

    def __init__(self, concurrency: int):
        self.sem = threading.Semaphore(concurrency)
        self.lock = threading.Lock()
        # -inf means "never completed", so the first request never waits,
        # whatever the clock's starting value.
        self.last_completed = float("-inf")

class HostLimiter:
    def __init__(self, per_host_concurrency: int, per_host_delay_s: float, *,
                 clock=time.monotonic, sleep=time.sleep):
        self._concurrency = per_host_concurrency
        self._delay = per_host_delay_s
        self._states: dict[str, _HostState] = {}
        self._guard = threading.Lock()
        self._clock = clock
        self._sleep = sleep

    def _state(self, host: str) -> _HostState:
        # Memoize one state per host; a fresh semaphore per request would not limit anything.
        with self._guard:
            s = self._states.get(host)
            if s is None:
                s = _HostState(self._concurrency)
                self._states[host] = s
            return s

    def acquire(self, host: str):
        s = self._state(host)
        s.sem.acquire()  # caps in-flight requests for this host
        with s.lock:
            wait = s.last_completed + self._delay - self._clock()
        if wait > 0:
            self._sleep(wait)  # sleep outside the lock so release() is never blocked
        return s

    def release(self, s: _HostState):
        with s.lock:
            s.last_completed = self._clock()
        s.sem.release()
def crawl(start_url: str, *,
          max_depth: int = 3,
          max_workers: int = 8,
          per_host_concurrency: int = 2,
          per_host_delay_s: float = 0.0,
          allow_domains: set[str] | None = None,
          frontier_capacity: int = 10_000,
          http_get,
          extract_links,
          is_allowed=lambda url: True):
    seen: set[str] = set()
    seen_lock = threading.Lock()
    frontier: Queue = Queue(maxsize=frontier_capacity)
    # in_flight counts URLs that are queued or being processed. It is bumped
    # before the put, so it reaches zero only when no work is left anywhere.
    in_flight = 0
    in_flight_lock = threading.Lock()
    inflight_zero = threading.Event()
    inflight_zero.set()
    stop_event = threading.Event()
    results: list[tuple[str, int, str]] = []
    results_lock = threading.Lock()
    limiter = HostLimiter(per_host_concurrency, per_host_delay_s)

    def _allow(url: str) -> bool:
        if not is_allowed(url):
            return False
        if allow_domains is None:
            return True
        h = host_of(url)
        return any(h == d or h.endswith("." + d) for d in allow_domains)

    def _enqueue(url: str, depth: int):
        nonlocal in_flight
        canon = canonicalize(url)
        if not _allow(canon):
            return
        with seen_lock:
            if canon in seen:
                return
            seen.add(canon)
        with in_flight_lock:
            in_flight += 1
            inflight_zero.clear()
        # frontier put outside the locks; bounded queue applies backpressure.
        # Workers are also the consumers, so frontier_capacity must be large
        # enough that they cannot all block here at once.
        frontier.put((canon, depth))

    def _worker():
        nonlocal in_flight
        while not stop_event.is_set():
            try:
                url, depth = frontier.get(timeout=0.1)
            except Empty:
                continue  # the main thread sets stop_event once all work is done
            try:
                state = limiter.acquire(host_of(url))
                try:
                    body = http_get(url)
                finally:
                    limiter.release(state)
                if body is None:
                    continue
                with results_lock:
                    results.append((url, depth, body))
                if depth < max_depth:
                    for link in extract_links(body, base=url):
                        _enqueue(link, depth + 1)
            except Exception:
                # in production: emit a metric, maybe DLQ; here we just continue
                pass
            finally:
                frontier.task_done()
                with in_flight_lock:
                    in_flight -= 1
                    if in_flight == 0:
                        inflight_zero.set()

    _enqueue(start_url, 0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_worker) for _ in range(max_workers)]
        try:
            # Poll with a timeout so Ctrl-C is noticed promptly.
            while not inflight_zero.wait(timeout=0.5):
                pass
        except KeyboardInterrupt:
            pass  # fall through to a graceful stop
        finally:
            stop_event.set()
        for f in futures:
            f.result()
    return results
Tests
def make_site(graph: dict[str, list[str]]):
    # The "body" of each fake page is simply its list of outgoing links.
    def http_get(url):
        return graph.get(url)

    def extract_links(body, base):
        return body if isinstance(body, list) else []

    return http_get, extract_links

def test_basic_bfs():
    graph = {
        "https://e.com/a": ["https://e.com/b", "https://e.com/c"],
        "https://e.com/b": ["https://e.com/d"],
        "https://e.com/c": ["https://e.com/d"],
        "https://e.com/d": [],
    }
    http_get, extract_links = make_site(graph)
    out = crawl("https://e.com/a", max_depth=5, max_workers=4,
                per_host_concurrency=4, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    visited = {u for u, _, _ in out}
    assert visited == set(graph.keys())

def test_depth_limit():
    chain = {f"https://e.com/{i}": [f"https://e.com/{i+1}"] for i in range(10)}
    chain["https://e.com/10"] = []
    http_get, extract_links = make_site(chain)
    out = crawl("https://e.com/0", max_depth=2, max_workers=2,
                per_host_concurrency=2, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    assert len({u for u, _, _ in out}) == 3  # depths 0, 1, 2

def test_dedup_on_cycle():
    g = {"https://e.com/a": ["https://e.com/b"],
         "https://e.com/b": ["https://e.com/a", "https://e.com/c"],
         "https://e.com/c": []}
    http_get, extract_links = make_site(g)
    out = crawl("https://e.com/a", max_depth=10, max_workers=4,
                per_host_concurrency=4, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    urls = [u for u, _, _ in out]
    assert len(urls) == len(set(urls)) == 3

def test_domain_restriction():
    g = {"https://e.com/a": ["https://other.com/x"], "https://other.com/x": []}
    http_get, extract_links = make_site(g)
    out = crawl("https://e.com/a", max_depth=5, max_workers=2,
                per_host_concurrency=2, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    urls = {u for u, _, _ in out}
    assert "https://other.com/x" not in urls
def test_per_host_concurrency():
    import threading
    import time
    in_flight = {"max": 0, "now": 0}
    lock = threading.Lock()
    # One start page fans out to 20 leaf pages on the same host.
    leaves = [f"https://e.com/{i}" for i in range(20)]

    def http_get(url):
        with lock:
            in_flight["now"] += 1
            in_flight["max"] = max(in_flight["max"], in_flight["now"])
        time.sleep(0.05)
        with lock:
            in_flight["now"] -= 1
        return leaves if url == "https://e.com/a" else []

    out = crawl("https://e.com/a", max_depth=1, max_workers=10,
                per_host_concurrency=3, allow_domains={"e.com"},
                http_get=http_get, extract_links=lambda body, base: body)
    assert len(out) == 21
    # The assertion shape for any variant of this test: never more than 3 in flight.
    assert in_flight["max"] <= 3
Follow-up Questions
- How would you make it thread-safe? The implementation uses these locks: seen_lock (guards the URL set), in_flight_lock (guards the in-flight counter and frontier-empty signaling), results_lock (guards the output list), and per-host locks inside HostLimiter. The Queue is internally thread-safe. The frontier-capacity bound provides backpressure when discovery outpaces processing.
- How would you persist state across restarts? Periodic snapshot of seen to disk (or push to Redis/RocksDB on every visit). On restart, load seen from disk and re-enqueue any URLs that were queued or in flight but not completed (track them in a separate pending set).
- How would you scale to N nodes? Shard the URL space by hash(host) % N, using a hash that is stable across processes: each node owns a fixed slice. Cross-node enqueues go via a message bus. The seen set is replicated or sharded the same way. Per-host politeness becomes per-(node, host); no cross-node coordination is needed since each host is owned by exactly one node.
- How would you handle backpressure? The bounded frontier (Queue(maxsize=...)) blocks workers that try to enqueue when full. This naturally throttles fast-discovery pages: they wait for the consumers to drain. Drop-on-overflow is wrong for crawlers (you’d lose URLs); blocking is right, provided the capacity is large enough that the workers, which are also the consumers, cannot all block on put at once.
- What is the shutdown / draining behavior? On stop_event, workers stop pulling from the queue. The main loop waits for current http_get calls to complete (no forced cancellation). Any URLs in the frontier are abandoned, but their canonical forms remain in seen, so a restart from the same seen snapshot would skip them; persist the unfinished frontier (the pending set) and re-enqueue it on restart.
- How would you handle a poison-pill input? A URL whose response triggers an infinite link-extraction loop (e.g., calendar with year=∞). Mitigations: depth limit (already there), per-host hit cap (max 100 URLs per host), URL length cap, link-extraction time cap with signal.alarm (Unix only, handled in the main thread) or a sub-thread timeout, and a content-size cap on http_get.
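For the sharding answer, note that Python's built-in hash() of a string is randomized per process, so different nodes would disagree on who owns a host. A minimal sketch of a stable alternative follows; shard_for is a hypothetical helper, and SHA-256 is just one reasonable choice of stable digest.

```python
import hashlib
from urllib.parse import urlsplit


def shard_for(url: str, num_nodes: int) -> int:
    """Map a URL to the index of the node that owns its host."""
    host = (urlsplit(url).hostname or "").lower()
    # A cryptographic digest is overkill for speed but stable across processes and machines.
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

Every URL on the same host lands on the same node, which is what keeps politeness local. Plain modulo sharding reassigns most hosts when the node count changes; consistent hashing is the usual remedy if nodes join and leave often.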
Product Extension
Real crawlers (Googlebot, Bingbot) layer on top: robots.txt parsing per host, sitemap.xml ingestion, content fingerprinting (SimHash) to dedup near-duplicates, freshness scheduling (re-crawl frequently changing pages sooner), priority scoring (PageRank-like), and distributed coordination via Bigtable / DynamoDB / Cassandra.
Language/Runtime Follow-ups
- Python: as above. For very high concurrency switch to asyncio with aiohttp and asyncio.Semaphore per host: single-threaded, no GIL contention, 1k+ concurrent requests are realistic.
- Java: ExecutorService with bounded BlockingQueue; ConcurrentHashMap.newKeySet() for seen; Semaphore per host. Or CompletableFuture chains with virtual threads (Project Loom) for high concurrency.
- Go: a worker-pool of goroutines reading from a buffered channel; sync.Map for seen; per-host chan struct{} of size concurrency as a semaphore. The idiom is exceptionally clean in Go.
- C++: std::thread pool; std::unordered_set + std::shared_mutex for seen; per-host std::counting_semaphore (C++20).
- JS/TS: Node + p-limit per host; single-threaded so no locks. The “global” concurrency is enforced by an outer p-limit.
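The asyncio variant mentioned for Python might look like the sketch below. It is illustrative only: crawl_async is a hypothetical name, fetch is an injected coroutine standing in for an aiohttp call, and the sketch omits the inter-request delay, the domain allowlist, and the frontier bound.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


async def crawl_async(start, *, max_depth, per_host_concurrency, fetch, extract_links):
    """Crawl from `start`, capping in-flight fetches per host with asyncio.Semaphore."""
    seen = {start}
    sems = defaultdict(lambda: asyncio.Semaphore(per_host_concurrency))
    results = []

    async def visit(url, depth):
        host = urlsplit(url).hostname or ""
        async with sems[host]:  # released before children are awaited
            body = await fetch(url)
        if body is None:
            return
        results.append((url, depth, body))
        if depth >= max_depth:
            return
        children = []
        for link in extract_links(body):
            if link not in seen:  # no lock needed: one thread, no await in between
                seen.add(link)
                children.append(visit(link, depth + 1))
        if children:
            await asyncio.gather(*children)

    await visit(start, 0)
    return results
```

The seen check and add need no lock because the event loop runs one coroutine at a time and there is no await between them.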
Common Bugs
- Adding to seen after http_get returns: multiple workers crawl the same URL.
- Holding seen_lock while calling http_get: blocks every other worker.
- Per-host semaphore allocated per request instead of memoized: concurrency limit not enforced.
- Per-host delay measured from request start instead of completion: after a slow response the next request fires immediately, violating politeness.
- Forgetting urldefrag in canonicalization: /page#section1 and /page#section2 count as different URLs.
- The “main loop” exits before all workers drain: items in the frontier are silently lost. Use a counter + condition or Queue.join().
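The Queue.join() option from the last bullet can be sketched in isolation. drain is a hypothetical helper, not the lab's crawl function; it assumes handle(item) returns an iterable of follow-up items and does not raise, and it uses an unbounded queue so put never blocks.

```python
import threading
from queue import Empty, Queue


def drain(items, handle, workers: int = 4):
    """Process `items` and any follow-ups returned by `handle`; return processed items."""
    q: Queue = Queue()
    stop = threading.Event()
    done = []
    done_lock = threading.Lock()

    def worker():
        while not stop.is_set():
            try:
                item = q.get(timeout=0.05)
            except Empty:
                continue
            try:
                for child in handle(item):
                    q.put(child)  # children are queued before task_done()
                with done_lock:
                    done.append(item)
            finally:
                q.task_done()  # exactly once per successful get()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    q.join()  # returns only when every queued item has been marked done
    stop.set()
    for t in threads:
        t.join()
    return done
```

The key ordering is that a worker enqueues children before calling task_done() on the parent, so the unfinished-task count cannot touch zero while work remains.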
Debugging Strategy
When dedup fails: log every seen.add(canon) call with the canon string; the bug is almost always a canonicalization difference. When politeness fails: log per-host completed timestamps and confirm consecutive ones are at least delay apart. When the crawler hangs at end: print in_flight and frontier.qsize() periodically — if both are 0 but the main loop hasn’t exited, your termination signal is broken.
Mastery Criteria
- Wrote a crawler with thread pool, bounded frontier, dedup, and depth limit in <40 minutes.
- Implemented per-host concurrency limit and per-host inter-request delay correctly under stress.
- Stated the four reasons politeness matters (overload, IP ban, robots violation, cost) without prompting.
- Articulated the sharding strategy for scaling to N nodes.
- Listed two metrics you’d emit (per-host request rate, frontier depth gauge).
- Identified the canonicalization bug class (fragment / param order / case).
- Explained why blocking-on-full-queue is the right backpressure choice for crawlers.
Lab 19 — In-Memory Filesystem
Goal
Implement an in-memory filesystem that supports ls, mkdir, addContentToFile, readContentFromFile over a tree of inode-like directory and file nodes. After this lab you should be able to design and implement the filesystem from a blank screen in <30 minutes and answer the standard follow-ups (concurrency, persistence, paths/permissions) crisply.
Background Concepts
A filesystem is a tree where internal nodes are directories and leaves are files. Every Unix-style filesystem reduces to the same three operations: navigate (walk a path to a node), mutate (create / delete / write), and read (list / cat). The interview problem strips out permissions, hard links, symbolic links, journaling, and on-disk layout, leaving the core tree structure + path-walking — which is enough to test naming, encapsulation, and correctness.
The two design choices that matter:
- One node class or two? A Directory has a dict[name, Node] of children; a File has a content string. They share path-walking but diverge in storage. Use either an inheritance hierarchy (Directory and File extend Node) or a single Node with a kind discriminator. Inheritance is cleaner for this problem.
- Path-walk encapsulation. Every operation needs _walk(path) that returns the target node (or raises). Centralize it; do not duplicate the path-split logic in each method.
This problem appears as LC 588 — “Design In-Memory File System” — and is asked at Amazon, Google, and Bloomberg as an OOD warmup.
Interview Context
A 30-to-45-minute round at the senior level. The interviewer wants to see (a) clean class decomposition, (b) correct path-handling (absolute paths, edge cases like / and trailing slash), (c) sensible API, (d) tests. Common failure: stuffing everything into one class with one dict[str, str] keyed on full paths — works for small inputs, fails the “ls a directory” requirement, and looks like LeetCode glue rather than production code.
Problem Statement
Implement FileSystem with:
- ls(path) -> list[str]: if path is a file, return [filename]; if a directory, return sorted list of children.
- mkdir(path) -> None: create the directory and any missing intermediate directories (mkdir -p semantics).
- addContentToFile(path, content) -> None: create the file if missing; append content to it. Intermediate directories are created.
- readContentFromFile(path) -> str: return the file’s content. Raise if not a file.
All paths are absolute, start with /, components separated by /. The root is /.
Constraints
- Path components are non-empty names made of letters, digits, and dots (as in d.txt); no . or .. components and no slashes inside names.
- Directories and files in the same directory must have distinct names.
- addContentToFile on an existing directory should raise.
- ls("/") returns the children of root.
Clarifying Questions
- Is addContentToFile append or overwrite? (LC 588 is append; confirm.)
- Are path components case-sensitive? (Yes, like Unix.)
- What’s the error mode for missing files in readContentFromFile? (Raise; the caller should not silently get an empty string.)
- Can mkdir("/a") succeed if /a already exists as a directory? (Yes, it is idempotent. As a file? No, raise.)
- Does ls("/a/b") where /a/b doesn’t exist raise or return []? (Raise.)
- Concurrency? (Single-threaded by default; thread-safety as a follow-up.)
Examples
fs = FileSystem()
fs.ls("/") # []
fs.mkdir("/a/b/c")
fs.addContentToFile("/a/b/c/d.txt", "hello")
fs.ls("/") # ["a"]
fs.ls("/a/b/c") # ["d.txt"]
fs.readContentFromFile("/a/b/c/d.txt") # "hello"
fs.addContentToFile("/a/b/c/d.txt", " world")
fs.readContentFromFile("/a/b/c/d.txt") # "hello world"
Initial Brute Force
class FlatFS:
    def __init__(self):
        self.store = {}  # full path -> content (None for dirs)

    def mkdir(self, p):
        self.store[p] = None

    def addContentToFile(self, p, c):
        self.store[p] = (self.store.get(p) or "") + c

    def readContentFromFile(self, p):
        return self.store[p]

    def ls(self, p):
        if self.store.get(p) is not None:
            return [p.rsplit("/", 1)[1]]
        prefix = p.rstrip("/") + "/"
        return sorted({k[len(prefix):].split("/")[0] for k in self.store if k.startswith(prefix)})
This works on small inputs but is O(N) per ls (scans every key) and conflates the “file vs directory” type via a None-or-string trick. It also cannot tell an empty directory from a missing path: ls returns [] for both.
Brute Force Complexity
ls: O(N · L) where N is total entries and L is average path length. addContentToFile: O(L) for hashing. Memory: O(total path text + content).
Optimization Path
Replace the flat dict with a tree of nodes. Each Directory has dict[str, Node] children — ls is now O(K log K) where K is the number of children of the queried directory, not the whole filesystem. mkdir and addContentToFile walk the path once, creating intermediate directories on demand. The walk is O(path_depth).
Final Expected Approach
Node is the base class. Directory(Node) has children: dict[str, Node]. File(Node) has chunks: list[str] (append with chunks.append(c); read with "".join(chunks)). The list-of-chunks representation makes append amortized O(1) regardless of total size. _walk(parts, create_dirs=False) takes the already-split path and returns the terminal node, raising or creating intermediates as configured. Each public method is a thin wrapper around _split and _walk.
Data Structures Used
- dict[str, Node] for directory children: O(1) lookup, sorted on demand for ls.
- list[str] for file content chunks: O(1) append.
- A path-split helper that handles / correctly.
Correctness Argument
Every node has exactly one parent (the dict entry that points to it). The root is the only node with no parent. _walk is the single source of truth for path resolution; it raises if it encounters a missing intermediate (create_dirs=False) or auto-creates one (create_dirs=True). addContentToFile resolves the parent directory, then either fetches the existing File (and asserts it’s a file, not a directory) or creates a new File. The “name collision” case (a directory exists where a file is being created) is detected at this exact step.
Complexity
| Operation | Time | Space |
|---|---|---|
| mkdir | O(L) where L = path components | O(L) new nodes |
| addContentToFile | O(L) for walk + O(1) append | O(content) |
| readContentFromFile | O(L + content_size) for join | O(content) |
| ls (directory) | O(K log K) for sort | O(K) |
| ls (file) | O(L) | O(1) |
Implementation Requirements
class Node:
    __slots__ = ()  # without this, subclass instances would still carry a __dict__

class File(Node):
    __slots__ = ("chunks",)

    def __init__(self):
        self.chunks: list[str] = []

    def read(self) -> str:
        return "".join(self.chunks)

    def append(self, content: str):
        if content:
            self.chunks.append(content)

class Directory(Node):
    __slots__ = ("children",)

    def __init__(self):
        self.children: dict[str, Node] = {}

def _split(path: str) -> list[str]:
    # "/a/b".split("/") yields ["", "a", "b"]; drop the empty segments.
    if not path or path[0] != "/":
        raise ValueError(f"path must be absolute: {path!r}")
    return [p for p in path.split("/") if p]
class FileSystem:
    def __init__(self):
        self._root = Directory()

    def _walk(self, parts: list[str], *, create_dirs: bool = False) -> Node:
        # Single source of truth for path resolution.
        node: Node = self._root
        for i, p in enumerate(parts):
            if not isinstance(node, Directory):
                raise NotADirectoryError("/" + "/".join(parts[:i]))
            child = node.children.get(p)
            if child is None:
                if not create_dirs:
                    raise FileNotFoundError("/" + "/".join(parts[: i + 1]))
                child = Directory()
                node.children[p] = child
            node = child
        return node

    def ls(self, path: str) -> list[str]:
        parts = _split(path)
        node = self._walk(parts)
        if isinstance(node, File):
            return [parts[-1]]
        return sorted(node.children.keys())

    def mkdir(self, path: str) -> None:
        parts = _split(path)
        if not parts:
            return  # mkdir '/' is a no-op
        # Walk-with-create, but reject a final segment that exists as a file
        parent = self._walk(parts[:-1], create_dirs=True)
        if not isinstance(parent, Directory):
            raise NotADirectoryError("/" + "/".join(parts[:-1]))
        last = parts[-1]
        existing = parent.children.get(last)
        if existing is None:
            parent.children[last] = Directory()
        elif isinstance(existing, File):
            raise FileExistsError(path + " is a file")
        # else: existing directory; idempotent

    def addContentToFile(self, path: str, content: str) -> None:
        parts = _split(path)
        if not parts:
            raise IsADirectoryError("/")
        parent = self._walk(parts[:-1], create_dirs=True)
        if not isinstance(parent, Directory):
            raise NotADirectoryError("/" + "/".join(parts[:-1]))
        last = parts[-1]
        node = parent.children.get(last)
        if node is None:
            node = File()
            parent.children[last] = node
        elif isinstance(node, Directory):
            raise IsADirectoryError(path)
        node.append(content)

    def readContentFromFile(self, path: str) -> str:
        parts = _split(path)
        node = self._walk(parts)
        if not isinstance(node, File):
            raise IsADirectoryError(path)
        return node.read()
Tests
def test_empty_root():
    fs = FileSystem()
    assert fs.ls("/") == []

def test_mkdir_p():
    fs = FileSystem()
    fs.mkdir("/a/b/c")
    assert fs.ls("/") == ["a"]
    assert fs.ls("/a") == ["b"]
    assert fs.ls("/a/b") == ["c"]
    assert fs.ls("/a/b/c") == []

def test_add_and_read_file():
    fs = FileSystem()
    fs.addContentToFile("/x/y/z.txt", "hello")
    fs.addContentToFile("/x/y/z.txt", " world")
    assert fs.readContentFromFile("/x/y/z.txt") == "hello world"
    assert fs.ls("/x/y") == ["z.txt"]
    assert fs.ls("/x/y/z.txt") == ["z.txt"]

def test_mkdir_idempotent():
    fs = FileSystem()
    fs.mkdir("/a")
    fs.mkdir("/a")
    assert fs.ls("/") == ["a"]

def test_mkdir_over_file_fails():
    fs = FileSystem()
    fs.addContentToFile("/a", "x")
    try:
        fs.mkdir("/a")
    except FileExistsError:
        pass
    else:
        assert False

def test_add_content_to_directory_fails():
    fs = FileSystem()
    fs.mkdir("/a")
    try:
        fs.addContentToFile("/a", "x")
    except IsADirectoryError:
        pass
    else:
        assert False

def test_read_nonexistent():
    fs = FileSystem()
    try:
        fs.readContentFromFile("/nope")
    except FileNotFoundError:
        pass
    else:
        assert False

def test_ls_sorts():
    fs = FileSystem()
    for n in ["zeta", "alpha", "mu"]:
        fs.mkdir(f"/{n}")
    assert fs.ls("/") == ["alpha", "mu", "zeta"]

def test_root_path():
    fs = FileSystem()
    fs.addContentToFile("/a.txt", "x")
    assert fs.ls("/a.txt") == ["a.txt"]
Follow-up Questions
- How would you make it thread-safe? Two options. (a) Single coarse RLock on the whole FileSystem: every operation acquires it. Simple, fine for low write rates. (b) Per-directory lock; acquire locks along the path during a walk. Avoids serializing readers from disjoint subtrees, but care is needed to acquire in path-order to avoid deadlock. For an interview answer, name both and pick (a) unless write contention is the explicit follow-up.
- How would you persist state across restarts? Two layers. (i) Snapshot: serialize the tree (DFS, emit (path, kind, content_or_empty)); on boot, replay. (ii) Write-ahead log: append every mutation as a record (mkdir /a, add /a/b "hello"); periodic checkpoint. Tradeoff: pure snapshot loses recent writes; pure log replays slowly; combine for production.
- What configuration knobs would you expose? max_filesize, max_path_depth, max_filename_length. Don’t expose the lock granularity; it is an implementation detail. Reject paths exceeding the caps with a typed error.
- How would you handle a poison-pill input? A path with millions of components, or a single file with multi-gigabyte content. Cap path depth, cap filename length, cap per-file content size, and surface metric counters for rejected requests.
- How would you test it? Unit tests on each method’s contract. Property-based tests: random sequence of mkdir/addContent operations followed by an ls/read that asserts consistency with a simple oracle (e.g., a flat dict). Concurrency tests: many threads each writing to disjoint subtrees should produce identical state regardless of interleaving.
- What metrics would you emit? Operation counters (per method), latency histograms, total_files, total_directories, bytes_stored gauges, error counters by type.
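The write-ahead-log half of the persistence answer can be sketched as a thin wrapper. Everything here is illustrative: TinyFS is a stand-in with only the two mutating calls, LoggedFS and replay are hypothetical names, the record format (one JSON object per mutation) is an assumption, and an in-memory list stands in for an append-only file.

```python
import json


class TinyFS:
    """Stand-in filesystem exposing just mkdir and addContentToFile."""

    def __init__(self):
        self.dirs: set[str] = set()
        self.files: dict[str, str] = {}

    def mkdir(self, path: str) -> None:
        self.dirs.add(path)

    def addContentToFile(self, path: str, content: str) -> None:
        self.files[path] = self.files.get(path, "") + content


class LoggedFS:
    """Append each mutation to the log before applying it (write-ahead)."""

    def __init__(self, fs, log: list[str]):
        self._fs = fs
        self._log = log

    def mkdir(self, path: str) -> None:
        self._log.append(json.dumps({"op": "mkdir", "path": path}))
        self._fs.mkdir(path)

    def addContentToFile(self, path: str, content: str) -> None:
        self._log.append(json.dumps({"op": "add", "path": path, "content": content}))
        self._fs.addContentToFile(path, content)


def replay(log: list[str], fs) -> None:
    """Rebuild state by re-applying every logged record in order."""
    for line in log:
        rec = json.loads(line)
        if rec["op"] == "mkdir":
            fs.mkdir(rec["path"])
        else:
            fs.addContentToFile(rec["path"], rec["content"])
```

Replaying the full log onto an empty filesystem reproduces the live state; a periodic snapshot would let the log be truncated so that replay stays short.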
Product Extension
Variants in real systems: S3-style flat namespace with / as a virtual delimiter; in-memory FUSE filesystems for tests; Kubernetes ConfigMap/Secret mounting (a tiny in-memory FS exposed to a pod). The data structure is the same; the API surface and persistence vary.
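The S3-style variant keeps a flat set of keys and derives "directories" at listing time. A minimal sketch of that idea follows; list_prefix is a hypothetical helper and does not reproduce any real object-store API.

```python
def list_prefix(keys, prefix: str, delimiter: str = "/"):
    """Return (objects, common_prefixes) found directly under `prefix` in a flat key set."""
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A deeper key: roll it up into a virtual directory.
            prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(prefixes)
```

This is the same O(N) scan as the flat-dict brute force; real object stores make it practical by keeping keys in a sorted index and seeking to the prefix.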
Language/Runtime Follow-ups
- Python: as above. Use __slots__ for File and Directory to cut per-node memory.
- Java: Map<String, Node> (HashMap or TreeMap). For sorted ls, TreeMap is natural and avoids the per-call sort.
- Go: type Node interface { ... } with Directory and File structs. For sorted ls, sort.Strings on the keys.
- C++: std::variant<Directory, File> or a tagged union. std::map<std::string, std::unique_ptr<Node>> for ordered children.
- JS/TS: Map<string, Node> (insertion-ordered; sort on ls). Use a discriminated union for Node.
Common Bugs
- Using path.split("/") without filtering empty strings: it returns ["", "a", "b"] for "/a/b".
- Treating / differently from non-/ paths inconsistently; a single _split helper avoids this.
- Not detecting directory-vs-file at the final path component: addContentToFile("/a") where /a is a directory must raise.
- mkdir overwriting an existing file silently.
- Storing file content as a single growing string: s += content is O(N) per append. Use a list of chunks.
- Returning unsorted ls: the spec usually requires sorted output for determinism.
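The first two bugs share one fix: a single splitting helper. A minimal sketch follows; the name split_path and the choice to reject relative paths are assumptions made for this example.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Return the non-empty components of an absolute path.

    The root "/" yields an empty list, so callers need no special case for it.
    """
    if not path.startswith("/"):
        raise ValueError(f"path must be absolute: {path!r}")
    return [part for part in path.split("/") if part]
```

Because every public method goes through the same helper, "/", "/a/b", and "/a//b/" are all treated consistently.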
Debugging Strategy
When ls is wrong: print the children dict at the resolved node — almost always a wrong-path-walk bug. When append seems to overwrite: check that addContentToFile calls node.append, not node.chunks = [content]. When concurrency tests fail: log every operation in order with a thread ID; the bug is usually a missing lock around parent.children[last] = ....
Mastery Criteria
- Decomposed Node/File/Directory cleanly in <5 minutes.
- Wrote a single _walk helper used by every public method.
- Handled mkdir idempotency and the file-vs-directory collision case correctly.
- Used the list-of-chunks pattern for O(1) append.
- Wrote tests for every error mode (FileNotFoundError, IsADirectoryError, FileExistsError).
- Articulated the snapshot+WAL persistence strategy in <60 seconds.
- Implemented from a blank screen in <30 minutes.
Lab 20 — Snake Game
Goal
Implement the game logic for Snake (LC 353) — a snake moves on a grid, eats food, grows by one each meal, and dies on collision with a wall or itself. Each move(direction) returns the current score or -1 on game-over. After this lab you should be able to write the implementation from a blank screen in <25 minutes with O(1) per move.
Background Concepts
Snake is a classic OOD warmup that hides a single non-obvious data-structure decision: representing the snake as a deque of cells (head at one end, tail at the other) and using a set for O(1) self-collision check. The naive representation, a list scanned linearly every move, is O(N) per move in the snake length N and becomes TLE-prone once the snake is long.
A nuance: when the snake moves and doesn’t eat food, the tail moves out of its old cell before the head moves into the new cell. So the head’s new cell could be the old tail’s cell — that’s not a collision. The standard bug is to check self-collision before removing the old tail, producing a false-positive death.
Interview Context
A 30-minute round at Amazon, Microsoft, and Bloomberg. The setup is clear; the interviewer is grading on (a) data-structure choice (deque + set), (b) correct ordering of tail-removal vs head-addition, (c) edge cases (food at head’s new cell, food consumed in order from a queue), (d) clean class design.
Problem Statement
A snake starts at (0, 0) on a width × height grid (top-left is origin, x grows right, y grows down). Food is given as a queue of [row, col] positions consumed in order. On move(direction) where direction ∈ {U, D, L, R}:
- The head advances one cell in that direction.
- If the new head is out of bounds → game over, return -1.
- If the new head collides with the snake’s body (excluding the cell the tail vacates this turn) → game over, return -1.
- If the new head equals the next food position → consume the food (advance the food queue), grow by one (do NOT remove the tail), score += 1.
- Otherwise → remove the tail.
Return the current score (number of foods eaten).
Constraints
- 1 ≤ width, height ≤ 10^4.
- 0 ≤ food.length ≤ 50.
- Food positions are inside the grid and never on (0, 0).
- move is called up to 10^4 times.
Clarifying Questions
- Is the head or the tail at index 0 of the snake list? (Either works; pick one and document it. The brute force below keeps the head at index 0; the final implementation keeps the head at the right end of the deque.)
- Can the snake move backwards onto itself in one move? (Length 1: yes, that’s just a turn. Length 2: the head lands on the cell the tail vacates this turn, so under the tail-vacates-first rule the move is legal. Length ≥ 3: that’s a self-collision.)
- Is food consumed FIFO from the queue? (Yes.)
- Does the game continue after an illegal move? (No: -1 is terminal; subsequent calls should also return -1 or be undefined. Document.)
- Can two food items occupy the same cell? (Spec says no; assume distinct.)
Examples
g = SnakeGame(width=3, height=2, food=[[1,2],[0,1]])
g.move("R") # head: (0,1) -> 0
g.move("D") # head: (1,1) -> 0
g.move("R") # head: (1,2) eats food[0] -> 1
g.move("U") # head: (0,2) -> 1
g.move("L") # head: (0,1) eats food[1] -> 2
g.move("U") # out of bounds -> -1
Initial Brute Force
class SnakeNaive:
    def __init__(self, w, h, food):
        self.w, self.h = w, h
        self.food = food
        self.snake = [(0, 0)]  # head at index 0, tail at the end

    def move(self, d):
        dr, dc = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}[d]
        r, c = self.snake[0]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < self.h and 0 <= nc < self.w):
            return -1
        if self.food and [nr, nc] == self.food[0]:
            self.food.pop(0)
            self.snake.insert(0, (nr, nc))
        else:
            self.snake.pop()  # tail vacates before the collision check
            if (nr, nc) in self.snake:
                return -1
            self.snake.insert(0, (nr, nc))
        return len(self.snake) - 1
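Three performance issues: (nr, nc) in self.snake is O(N); self.snake.insert(0, ...) is O(N) for a list; self.food.pop(0) is O(F). Two robustness gaps as well: pop(0) mutates the caller’s food list, and there is no terminal state, so a call made after a -1 keeps playing.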
Brute Force Complexity
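move: O(N) per call, where N is the current snake length. Across M moves: O(M · N). With food.length ≤ 50 the snake never exceeds 51 cells, so this passes under the stated constraints; with unbounded food, N = 10^4 and M = 10^4 give 10^8 operations, which is borderline.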
Optimization Path
Replace list with collections.deque (O(1) append/pop both ends). Add a set of body cells for O(1) collision detection. Keep an integer food_idx instead of pop(0)-ing the food list. Now every move is O(1).
Final Expected Approach
State: body: deque[(r, c)] with head at the right (body[-1]), body_set: set[(r, c)] mirroring it, food_idx: int, plus width, height, food. On move: compute new head, check bounds, decide grow-or-shift. If grow: append new head to deque and set; advance food_idx. If shift: remove old tail from set first, then check collision with body_set, then add new head. The order matters — exactly the “tail vacates then head moves” semantics.
Data Structures Used
- collections.deque for the snake body (O(1) head/tail append/pop).
- set[tuple[int, int]] for membership (O(1) collision check).
- int food_idx to avoid mutating the food list.
- A dict[str, tuple[int, int]] for direction deltas.
Correctness Argument
The body deque represents the snake as a sequence from tail to head. The body_set is the membership oracle. Invariant: set(body) == body_set is maintained at every operation. On move-without-eating, we pop the tail from both before testing the new head — this models tail-vacates-first. On move-with-eating, the tail stays, so the snake grows by one. Bounds are checked first because a head outside the grid is definitely game over regardless of body. The score equals food consumed, which equals food_idx.
Complexity
| Operation | Time | Space |
|---|---|---|
| move | O(1) amortized | O(N) for body |
Implementation Requirements
from collections import deque
from typing import List


class SnakeGame:
    DIRS = {
        "U": (-1, 0),
        "D": (1, 0),
        "L": (0, -1),
        "R": (0, 1),
    }

    def __init__(self, width: int, height: int, food: List[List[int]]):
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        self._w = width
        self._h = height
        self._food = [tuple(f) for f in food]  # normalize to tuples once
        self._food_idx = 0
        # tail at the left end, head at the right end
        self._body: deque[tuple[int, int]] = deque([(0, 0)])
        self._body_set: set[tuple[int, int]] = {(0, 0)}
        self._game_over = False

    def move(self, direction: str) -> int:
        if self._game_over:
            return -1
        if direction not in self.DIRS:
            raise ValueError(f"invalid direction: {direction!r}")
        dr, dc = self.DIRS[direction]
        head_r, head_c = self._body[-1]
        nr, nc = head_r + dr, head_c + dc
        # 1. bounds
        if not (0 <= nr < self._h and 0 <= nc < self._w):
            self._game_over = True
            return -1
        # 2. eat-or-shift decision
        new_head = (nr, nc)
        eats = (
            self._food_idx < len(self._food)
            and self._food[self._food_idx] == new_head
        )
        if eats:
            self._food_idx += 1
            # grow: head added; tail stays
            if new_head in self._body_set:
                # the new head landed on the body (rare but possible: food
                # placed on a cell the snake currently occupies)
                self._game_over = True
                return -1
            self._body.append(new_head)
            self._body_set.add(new_head)
            return self._food_idx
        # shift: tail vacates first
        old_tail = self._body.popleft()
        self._body_set.remove(old_tail)
        if new_head in self._body_set:
            self._game_over = True
            return -1
        self._body.append(new_head)
        self._body_set.add(new_head)
        return self._food_idx
Tests
def test_basic_path():
    g = SnakeGame(3, 2, [[1, 2], [0, 1]])
    assert g.move("R") == 0
    assert g.move("D") == 0
    assert g.move("R") == 1
    assert g.move("U") == 1
    assert g.move("L") == 2
    assert g.move("U") == -1

def test_immediate_wall():
    g = SnakeGame(3, 3, [])
    assert g.move("U") == -1
    assert g.move("R") == -1  # idempotent terminal

def test_self_collision_after_growth():
    # Grow to length 5, then turn back into the body.
    g = SnakeGame(4, 4, [[0, 1], [0, 2], [0, 3], [1, 3]])
    assert g.move("R") == 1
    assert g.move("R") == 2
    assert g.move("R") == 3
    assert g.move("D") == 4   # eats (1,3); body: (0,0),(0,1),(0,2),(0,3),(1,3)
    assert g.move("L") == 4   # tail (0,0) vacates; head at (1,2)
    assert g.move("U") == -1  # tail (0,1) vacates, but (0,2) is still body

def test_tail_cell_is_safe():
    # Length-4 snake circling a 2x2 block: the head enters the cell
    # that the tail vacates on the same turn -> not a collision.
    g = SnakeGame(4, 4, [[0, 1], [0, 2], [0, 3]])
    assert g.move("R") == 1
    assert g.move("R") == 2
    assert g.move("R") == 3  # body (tail to head): (0,0),(0,1),(0,2),(0,3)
    assert g.move("D") == 3  # body: (0,1),(0,2),(0,3),(1,3)
    assert g.move("L") == 3  # body: (0,2),(0,3),(1,3),(1,2)
    assert g.move("U") == 3  # head -> (0,2), the cell the tail leaves this turn

def test_food_consumed_in_order():
    g = SnakeGame(5, 5, [[0, 1], [0, 2]])
    assert g.move("R") == 1
    assert g.move("R") == 2

def test_terminal_state_persists():
    g = SnakeGame(2, 2, [])
    assert g.move("U") == -1
    assert g.move("D") == -1
    assert g.move("L") == -1
Follow-up Questions
- How would you test it? Unit tests on each path: bounds, eat, shift, self-collision, tail-cell-safe. Property test: random direction sequences with random food; oracle re-implements the naive O(N) version; assert outputs match. Smoke test: a long random run that doesn’t crash.
- What configuration knobs would you expose? Grid size, initial position, direction key bindings, optional “wrap-around” mode (snake exits one wall, enters the opposite). Don’t expose the data-structure choices.
- How would you handle a poison-pill input? Invalid direction strings → raise. Negative grid dimensions → raise at construction. Food positions outside the grid → raise at construction (the reference implementation above validates dimensions and directions only). After the game ends, calls to move are idempotent (return -1).
- How would you make it thread-safe? Wrap move in a Lock. Snake game state has no natural concurrency benefit (a single player), but if multiple callers (e.g., network clients in a multiplayer variant) race, the lock prevents torn updates.
- What metrics would you emit? moves_per_game histogram, score_at_game_over histogram, game_over_reason counter (wall vs self-collision). Useful to compare difficulty levels.
- How would you scale to N players (multiplayer Snake)? Each player has their own body/body_set. The body_sets must be merged for collision detection: forbidden = set().union(*(p.body_set for p in players)). The food queue is shared. Lock per shared-state structure or use an STM-style atomic transaction per tick.
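The multiplayer merge can be sketched as below. The Player dataclass and the forbidden_cells helper are hypothetical names for this illustration, and the sketch ignores the per-tick tail-vacate ordering across players, which a real server would have to define.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Iterable, Set, Tuple

Cell = Tuple[int, int]


@dataclass
class Player:
    """Hypothetical per-player state: the body deque plus its mirrored set."""
    body: Deque[Cell]
    body_set: Set[Cell] = field(init=False)

    def __post_init__(self) -> None:
        self.body_set = set(self.body)


def forbidden_cells(players: Iterable[Player]) -> Set[Cell]:
    """Union of every player's body cells; a head entering one of them dies."""
    return set().union(*(p.body_set for p in players))


p1 = Player(deque([(0, 0), (0, 1)]))
p2 = Player(deque([(2, 2)]))
blocked = forbidden_cells([p1, p2])
```

Rebuilding the union every tick costs O(total body length). If that shows up in profiles, one alternative is a shared cell-to-owner map updated incrementally as heads advance and tails vacate.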
Product Extension
Multiplayer variants (Slither.io, Agar.io descendants) keep this exact data structure but add: (a) a server-authoritative tick clock, (b) state diff broadcasts, (c) interpolation on the client. The interview-relevant primitive is unchanged.
Language/Runtime Follow-ups
- Python: as above. collections.deque is the right primitive; list.pop(0) is O(N).
- Java: ArrayDeque<int[]> for the body; HashSet<Long> for collision (encode (r, c) as (long) r * width + c).
- Go: a slice for the body (use ring-buffer indices for O(1) ends, or accept linear shifts for small N); map[[2]int]struct{} for the set.
- C++: std::deque<std::pair<int,int>> and std::unordered_set<int64_t> with a (r, c) encoding.
- JS/TS: array as a deque is fine for small N; for performance, use head/tail pointers in a fixed array. Set with a string-key "r,c" for collision.
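The (r, c) integer encoding mentioned for Java and C++ is plain row-major arithmetic. Python does not need it, since tuples are hashable, but a short sketch shows why the encoding is collision-free as long as 0 ≤ c < width. The function names encode and decode are chosen for this example only.

```python
from typing import Tuple


def encode(r: int, c: int, width: int) -> int:
    """Row-major key: distinct for every in-bounds cell because 0 <= c < width."""
    return r * width + c


def decode(key: int, width: int) -> Tuple[int, int]:
    """Inverse of encode: recover (r, c) from the integer key."""
    return divmod(key, width)
```

In languages with fixed-width integers, widen before multiplying (as the Java bullet does with the long cast) so that r * width cannot overflow.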
Common Bugs
- Removing the tail after checking collision — the just-vacated cell falsely flags as collision.
- Using list.insert(0, ...) and list.pop(0): both O(N), defeats the data-structure choice.
- Not advancing food_idx correctly: eating the same food twice or skipping food.
- Comparing food[idx] (a list) to (nr, nc) (a tuple): [0,1] == (0,1) is False in Python. Normalize types at construction.
- Allowing move after game over without returning -1: undefined behavior. Set a _game_over flag and short-circuit.
- Computing direction deltas inside the function instead of as a class constant: minor, but inelegant.
Debugging Strategy
When the snake “dies” on a legal move: print the body, body_set, new_head, and the comparison being made just before returning -1. The bug is almost always the order of tail-vacate vs collision-check. When the score is wrong: print food_idx after each call. When move “succeeds” through a wall: print nr, nc, self._h, self._w — the bounds check is off-by-one.
Mastery Criteria
- Picked deque + set in <30 seconds, justified the choice.
- Stated the tail-vacate-first invariant unprompted.
- Wrote O(1) move from a blank screen in <20 minutes.
- Wrote the tail-cell-is-safe test from memory.
- Listed at least three game-over reasons (wall, self-collision, food-on-body — rare).
- Articulated the multiplayer extension in <60 seconds.
- Solved LC 353 in <25 minutes total with all tests passing.
Lab 21 — Tic-Tac-Toe (Streaming Winner Detection)
Goal
Implement Tic-Tac-Toe (LC 348) where players alternate moves on an N × N board and move(row, col, player) returns 0 (no winner yet) or the player number on a winning move. The naive O(N²) per-move full-scan is unacceptable; achieve O(1) per move by maintaining row, column, and diagonal counters. After this lab you should write the implementation in <15 minutes.
Background Concepts
The non-trivial bit of Tic-Tac-Toe-as-a-data-structure-problem is the per-move winner check. Each cell affects exactly one row, one column, and up to two diagonals. By adding +1 for player 1 and -1 for player 2 to the counters on those axes, a counter that hits +N means player 1 won that axis, and -N means player 2 won. This collapses the O(N²) scan to O(1).
Diagonals: a cell is on the main diagonal iff row == col, and on the anti-diagonal iff row + col == N - 1. The center cell of an odd-N board sits on both.
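The two diagonal predicates are easy to check in isolation. The sketch below uses hypothetical helper names (on_main_diagonal, on_anti_diagonal, counters_touched) purely for illustration.

```python
def on_main_diagonal(row: int, col: int) -> bool:
    """True when the cell lies on the top-left to bottom-right diagonal."""
    return row == col


def on_anti_diagonal(row: int, col: int, n: int) -> bool:
    """True when the cell lies on the top-right to bottom-left diagonal."""
    return row + col == n - 1


def counters_touched(row: int, col: int, n: int) -> int:
    """One row counter, one column counter, plus 0 to 2 diagonal counters."""
    return 2 + int(on_main_diagonal(row, col)) + int(on_anti_diagonal(row, col, n))
```

On a 3 × 3 board the center (1, 1) touches four counters, a corner touches three, and an edge midpoint touches two. On an even-N board no cell is on both diagonals.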
This is the cleanest real example of “exchange a redundant scan for a maintained counter” — a recurring pattern in real code (running averages, sliding maxes, materialized aggregates in databases).
Interview Context
A 20-minute warmup at Amazon, Google, and Microsoft. Often paired with the LRU lab as a phone-screen double-feature. The interviewer wants O(1) per move, a clean class API, and one or two follow-ups about extending to K-in-a-row games in the style of Connect Four or Gomoku (where the winning condition is more complex).
Problem Statement
Implement TicTacToe(n) and move(row, col, player) -> int:
- The board is n × n and starts empty.
- Players alternate (caller manages turn order; you don’t validate it for this version).
- player is 1 or 2.
- Each move places the player’s mark at (row, col). Assume the cell is empty.
- Return the player’s number if this move results in a win (full row, column, main diagonal, or anti-diagonal of that player); otherwise return 0.
- Once a player has won, the game ends; further moves are not part of the spec but should be defensively handled.
Constraints
- 1 ≤ n ≤ 100.
- Target: each call to move is O(1).
- Up to 10^6 moves across the lifetime of an instance.
Clarifying Questions
- Are row and col 0-indexed? (Yes.)
- Is the cell guaranteed empty? (Per LC 348: yes. In practice, validate defensively.)
- Do we need to detect a draw? (Not in LC 348; doable as move_count == n*n.)
- Once a player wins, are further moves undefined? (Yes; either short-circuit to that winner or raise.)
- Can the same player call move twice in a row? (Spec assumes alternating; we do not enforce.)
Examples
g = TicTacToe(3)
g.move(0, 0, 1) # 0 player 1 at (0,0)
g.move(0, 2, 2) # 0 player 2 at (0,2)
g.move(2, 2, 1) # 0
g.move(1, 1, 2) # 0
g.move(2, 0, 1) # 0
g.move(1, 0, 2) # 0
g.move(2, 1, 1) # 1 player 1 wins via row 2: (2,0), (2,1), (2,2); the main diagonal is blocked by player 2 at (1,1)
Initial Brute Force
class TicTacToeNaive:
    def __init__(self, n):
        self.n = n
        self.b = [[0] * n for _ in range(n)]

    def move(self, r, c, p):
        self.b[r][c] = p
        # check row
        if all(self.b[r][j] == p for j in range(self.n)):
            return p
        # check column
        if all(self.b[i][c] == p for i in range(self.n)):
            return p
        # check main diagonal (only if the cell is on it)
        if r == c and all(self.b[i][i] == p for i in range(self.n)):
            return p
        # check anti-diagonal (only if the cell is on it)
        if r + c == self.n - 1 and all(self.b[i][self.n - 1 - i] == p for i in range(self.n)):
            return p
        return 0
This is O(N) per move. For N = 100 and 10^6 moves: 10^8 — slow but passing. The point is structural: it scans the whole row/column/diagonal every time even though the move only changed one cell.
Brute Force Complexity
move: O(N) per call. Total: O(M · N).
Optimization Path
Replace each row/col/diagonal full-scan with a maintained counter. Use +1 for player 1, -1 for player 2; a counter at ±N is a win. The diagonals are special-cased by the row == col and row + col == N - 1 predicates — we only update them when the cell is on the diagonal. Now every check is a single integer comparison.
Final Expected Approach
State: rows[N], cols[N], diag (scalar), anti (scalar). Each is an integer counter. On move(r, c, player): compute delta = +1 if player == 1 else -1. Increment rows[r], cols[c], and conditionally diag and anti. If any of the four updated counters has absolute value N → that player wins.
Data Structures Used
- list[int] of size N for rows.
- list[int] of size N for columns.
- int for the main diagonal counter.
- int for the anti-diagonal counter.
- (Optional) list[list[int]] board for defensive duplicate-move detection.
Correctness Argument
Assume each cell is played at most once. For any axis (row, column, or diagonal), the counter equals the number of player-1 marks minus the number of player-2 marks on that axis. The axis has exactly N cells, so the counter is +N exactly when all N cells hold player 1 and -N exactly when all N hold player 2; any mix or any empty cell gives an absolute value below N. The diagonal counters are only updated for cells on the respective diagonal, so they count only diagonal cells. Checking just the counters touched by the current move is enough because no other counter changed.
Complexity
| Operation | Time | Space |
|---|---|---|
| move | O(1) | O(N) for row/col counters |
Implementation Requirements
class TicTacToe:
    def __init__(self, n: int):
        if n < 1:
            raise ValueError("n must be >= 1")
        self._n = n
        self._rows = [0] * n
        self._cols = [0] * n
        self._diag = 0
        self._anti = 0
        self._winner = 0  # 0 = no winner yet

    def move(self, row: int, col: int, player: int) -> int:
        if self._winner:
            return self._winner
        if not (0 <= row < self._n and 0 <= col < self._n):
            raise IndexError(f"({row}, {col}) out of bounds for n={self._n}")
        if player not in (1, 2):
            raise ValueError(f"player must be 1 or 2, got {player}")
        # +1 for player 1, -1 for player 2; a counter at +/-n is a win
        delta = 1 if player == 1 else -1
        target = self._n if player == 1 else -self._n
        self._rows[row] += delta
        self._cols[col] += delta
        if row == col:
            self._diag += delta
        if row + col == self._n - 1:
            self._anti += delta
        if (self._rows[row] == target
                or self._cols[col] == target
                or self._diag == target
                or self._anti == target):
            self._winner = player
            return player
        return 0
Tests
def test_row_win_player1():
    g = TicTacToe(3)
    assert g.move(0, 0, 1) == 0
    assert g.move(1, 0, 2) == 0
    assert g.move(0, 1, 1) == 0
    assert g.move(1, 1, 2) == 0
    assert g.move(0, 2, 1) == 1

def test_col_win_player2():
    g = TicTacToe(3)
    g.move(0, 0, 1); g.move(0, 1, 2)
    g.move(1, 0, 1); g.move(1, 1, 2)
    g.move(2, 2, 1)
    assert g.move(2, 1, 2) == 2

def test_diagonal_win():
    g = TicTacToe(3)
    g.move(0, 0, 1); g.move(0, 1, 2)
    g.move(1, 1, 1); g.move(0, 2, 2)
    assert g.move(2, 2, 1) == 1

def test_anti_diagonal_win():
    g = TicTacToe(3)
    g.move(0, 2, 1); g.move(0, 0, 2)
    g.move(1, 1, 1); g.move(0, 1, 2)
    assert g.move(2, 0, 1) == 1

def test_no_winner_on_partial():
    g = TicTacToe(3)
    assert g.move(0, 0, 1) == 0
    assert g.move(1, 1, 2) == 0

def test_n_equals_one():
    g = TicTacToe(1)
    assert g.move(0, 0, 1) == 1

def test_invalid_player():
    g = TicTacToe(3)
    try:
        g.move(0, 0, 3)
    except ValueError:
        pass
    else:
        assert False

def test_move_after_winner():
    g = TicTacToe(3)
    for c in range(3):
        g.move(0, c, 1)
    # subsequent moves still report the winner
    assert g.move(1, 1, 2) == 1

def test_large_n_no_win():
    g = TicTacToe(100)
    # Fill 99 cells of row 0 with player 1; should not win.
    for c in range(99):
        assert g.move(0, c, 1) == 0
Follow-up Questions
- How would you test it? Unit tests for each axis (row, col, both diagonals, by both players). Property test: random move sequences; oracle is the naive O(N) scan; assert outputs match. Edge: n=1 (any move wins). Edge: anti-diagonal at the corners only.
- What is the consistency model? Single-threaded, linearizable trivially. If multiple threads race on move, the counters can interleave and a player can falsely fail to win. Wrap with a Lock if concurrent.
- What configuration knobs would you expose? Just n. Resist adding “win condition = K-in-a-row instead of N”; that’s a different problem (Connect Four / Gomoku). If asked, see Connect Four extension below.
- How would you handle a poison-pill input? Out-of-bounds coords (IndexError), invalid player (ValueError), repeated cell (defensive: track board and reject). The current implementation rejects bounds and player; cell-overwrite detection is an explicit follow-up.
- How would you extend to K-in-a-row on an N×N board (Gomoku, Connect Four)? Counters no longer suffice: you need to find any window of K consecutive same-player cells. Two options: (a) on each move, scan the row, column, and both diagonals through the cell looking for K-in-a-row passing through the move (O(K) per move), or (b) maintain run-length encodings per axis (more memory, O(1) per move). For interview-time, (a) is the right answer: clean and O(K), not O(N).
- What metrics would you emit? moves_total, wins_total{player=1|2}, time_to_win histogram. Game-balance metrics for product analytics; otherwise sparse.
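Option (a) for the K-in-a-row extension can be sketched as follows. It assumes a full board matrix is kept (0 for empty) and that the mark has already been written at (row, col); the function name wins_k_in_a_row is hypothetical.

```python
from typing import List


def wins_k_in_a_row(board: List[List[int]], row: int, col: int,
                    player: int, k: int) -> bool:
    """Return True if the mark just placed at (row, col) completes a run of >= k."""
    n_rows, n_cols = len(board), len(board[0])

    def run_length(dr: int, dc: int) -> int:
        """Count consecutive player cells starting one step away from (row, col)."""
        count = 0
        r, c = row + dr, col + dc
        # Stop after k cells: anything longer cannot change the answer.
        while (0 <= r < n_rows and 0 <= c < n_cols
               and board[r][c] == player and count < k):
            count += 1
            r, c = r + dr, c + dc
        return count

    # Four axes: horizontal, vertical, main diagonal, anti-diagonal.
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        if 1 + run_length(dr, dc) + run_length(-dr, -dc) >= k:
            return True
    return False


board = [[0] * 5 for _ in range(5)]
for c in (0, 1, 3):
    board[2][c] = 1
board[2][2] = 1  # the move under test joins the two runs into four in a row
```

Each direction walks at most K cells, so the check is O(K) per move regardless of board size.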
Product Extension
The “maintained counter instead of full scan” pattern shows up everywhere: real-time sports scores (a goal updates a single team total instead of recomputing from a play log), database materialized views (incrementally maintained, not recomputed), Prometheus counters (the process increments a running total at event time, and rate() works from samples of that total rather than from a log of individual events). Tic-Tac-Toe is the simplest possible illustration.
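The same pattern in its smallest non-game form is a running mean: keep a count and a total, and never rescan the samples. The class name RunningMean is an assumption for this sketch.

```python
class RunningMean:
    """Maintains count and total so that the mean is O(1) to read."""

    def __init__(self) -> None:
        self._count = 0
        self._total = 0.0

    def add(self, value: float) -> None:
        """O(1) update at event time; no sample history is stored."""
        self._count += 1
        self._total += value

    @property
    def mean(self) -> float:
        if self._count == 0:
            raise ValueError("no samples yet")
        return self._total / self._count


rm = RunningMean()
for sample in (2.0, 4.0, 6.0):
    rm.add(sample)
```

As with the Tic-Tac-Toe counters, the trade is a little extra state per aggregate in exchange for constant-time reads.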
Language/Runtime Follow-ups
- Python: as above. A list stores an 8-byte reference per element on 64-bit CPython; for very large N, array.array("i", [0]*n) is denser.
- Java: int[] rows, int[] cols, int diag, int anti. No autoboxing in the hot path.
- Go: same; value types throughout, no allocations after construction.
- C++: std::vector<int>. Make move non-virtual; this is hot-path code.
- JS/TS: Int32Array(n) for rows and cols; denser than a regular Array.
Common Bugs
- Off-by-one in the anti-diagonal predicate: row + col == n - 1, not n or n + 1.
- Using +1 for both players (and checking count == N and count == -N): mistake. Use opposite-sign deltas.
- Forgetting to update the diagonal counter when the move is on the diagonal: counter stays stuck.
- Getting the sign of the win threshold wrong. The correct form is target = n if player == 1 else -n; under valid play an equivalent check is abs(counter) == n, because a move can only push a counter toward the mover’s own sign.
- Not guarding against repeated cells: the same (r, c) updated twice can spuriously win for one player or unjustly cancel out.
- Concurrent calls without a lock: counters become inconsistent and the win condition fires on the wrong player.
Debugging Strategy
When wins are missed: print (rows, cols, diag, anti) after each move; trace which counter should have hit ±N. When wins fire spuriously: same trace — usually the diagonal predicate is wrong. When tests pass for player 1 but not player 2: confirm delta = -1 and target = -N for player 2; sign errors are common.
Mastery Criteria
- Stated the O(1)-per-move counter approach in <30 seconds.
- Wrote the diagonal predicates (r == c, r + c == n - 1) without prompting.
- Implemented from a blank screen in <15 minutes with all tests passing.
- Listed the K-in-a-row extension and named the right scan strategy.
- Articulated why this is a “maintained counter” pattern in <30 seconds.
- Wrote tests covering both diagonals and n=1.
Lab 22 — Text Editor Buffer (Gap Buffer / Piece Table)
Goal
Implement a text editor data structure that supports cursor-local insert, delete (backspace), left/right cursor movement, and substring read with O(1) amortized cursor-local edits. The reference implementation is a gap buffer; the follow-up is a piece table. After this lab you should articulate why a flat str or list[char] is wrong and produce a working gap buffer in <30 minutes.
Background Concepts
A naive editor representation, a single string, makes every insert and delete O(N): the character data after the cursor must shift. For a million-character document, a keystroke near the start moves about a million characters. Real editors avoid this with one of three data structures:
- Gap buffer: a single contiguous array with a “gap” of unused slots positioned at the cursor. Insert at cursor = O(1) (write into the gap, shrink it). Move cursor = O(distance moved) — the gap moves with the cursor by shifting characters across it. Used by Emacs.
- Piece table: an immutable original buffer + an append-only “added” buffer + a list of “pieces” describing the visible document as concatenated slices. Insert anywhere = O(1) amortized once the target piece is located (append to the added buffer, splice a piece into the piece list); locating the piece costs O(pieces) with a plain list or O(log pieces) with a tree index. Used by VS Code, Word.
- Rope / balanced tree of strings: O(log N) for all operations, the most general. Used by Xi-editor and several research editors.
The interview almost always wants the gap buffer because it is the simplest correct answer with the right asymptotics for the locality assumption (most edits happen near the cursor). The piece table is the right follow-up answer when the interviewer asks “what if edits aren’t local?” or “what if you want O(1) amortized regardless of cursor position?”.
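For the piece-table follow-up, a minimal insert-only sketch is shown below. It is an illustration under assumptions, not a production design: the names Piece and PieceTable are hypothetical, deletion is omitted, and the piece lookup is a linear scan where a real editor would typically use a tree index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Piece:
    source: str  # "orig" or "add"
    start: int   # offset into the source buffer
    length: int


class PieceTable:
    def __init__(self, original: str = "") -> None:
        self._orig = original            # never modified after construction
        self._add: List[str] = []        # append-only buffer of characters
        self._pieces: List[Piece] = (
            [Piece("orig", 0, len(original))] if original else []
        )

    def __len__(self) -> int:
        return sum(p.length for p in self._pieces)

    def insert(self, pos: int, text: str) -> None:
        """Insert text at document offset pos without moving existing text."""
        if not text:
            return
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        new_piece = Piece("add", len(self._add), len(text))
        self._add.extend(text)
        offset = 0
        for i, piece in enumerate(self._pieces):
            if pos <= offset + piece.length:
                # Split the piece at the insertion point; drop empty halves.
                cut = pos - offset
                left = Piece(piece.source, piece.start, cut)
                right = Piece(piece.source, piece.start + cut, piece.length - cut)
                self._pieces[i:i + 1] = [
                    p for p in (left, new_piece, right) if p.length > 0
                ]
                return
            offset += piece.length
        self._pieces.append(new_piece)  # only reached for an empty table

    def text(self) -> str:
        out = []
        for p in self._pieces:
            buf = self._orig if p.source == "orig" else self._add
            out.append("".join(buf[p.start:p.start + p.length]))
        return "".join(out)


pt = PieceTable("leetcode")
pt.insert(4, "XY")
pt.insert(0, ">")
pt.insert(len(pt), "!")
```

Note that no existing character is ever copied on insert; only piece descriptors are split, which is what makes the cost independent of cursor position once the piece is found.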
Interview Context
A 30-to-45-minute round at Google (Docs), Microsoft (VS Code, Word), JetBrains, and any team that builds editor-like UI. Most candidates default to a list[char] and accept O(N) per insert; that’s a partial answer. Reaching for a gap buffer immediately demonstrates that you’ve thought about real editor performance.
Problem Statement
Implement TextEditor with:
- insert(text: str): insert text at the current cursor position; cursor moves to the end of inserted text.
- delete_left(n: int) -> int: delete up to n characters to the left of the cursor; return the actual number deleted (capped by left content).
- move_left(n: int) -> str: move cursor n positions left (capped at start); return the last 10 characters to the left of the new cursor (or fewer if not available).
- move_right(n: int) -> str: move cursor n positions right (capped at end); return the last 10 characters to the left of the new cursor (or fewer if not available).
- text() -> str: return the full document (debug helper; not on the hot path).
This is the LC 2296 “Design a Text Editor” interface, with the read-back-10 affordance.
Constraints
- Up to 10^4 calls in total.
- Each text argument up to 40 characters; total inserted up to ~10^4 characters.
- insert and delete_left must be amortized O(1) plus O(text length). Move operations are O(distance moved).
Clarifying Questions
- Is the cursor between characters (column-style) or at a character (cell-style)? (Between — like every editor.)
- Does delete_left(n) with n > available delete only what’s available and return that count? (Yes.)
- What does move_left return when the cursor is at the start? (Empty string.)
- Is the buffer Unicode-aware? (For LC 2296: ASCII suffices. For real editors: must handle code points and grapheme clusters; out of scope here.)
- Are inserts at any cursor position guaranteed local (i.e., does the interviewer want gap buffer or piece table)? (Default gap buffer, since most edits are local.)
Examples
ed = TextEditor()
ed.insert("leetcode") # cursor at end; text = "leetcode"
ed.delete_left(4) # 4
ed.text() # "leet"
ed.insert("practice") # text = "leetpractice"
ed.move_right(3) # "etpractice" -> last 10 to left
ed.move_left(8) # "leet"
ed.delete_left(10) # 4
ed.text() # "practice"
Initial Brute Force
class StringEditor:
    def __init__(self): self.s, self.c = "", 0
    def insert(self, text):
        self.s = self.s[:self.c] + text + self.s[self.c:]
        self.c += len(text)
    def delete_left(self, n):
        d = min(n, self.c)
        self.s, self.c = self.s[:self.c - d] + self.s[self.c:], self.c - d
        return d
    def move_left(self, n):
        self.c = max(0, self.c - n)
        return self.s[max(0, self.c - 10):self.c]
    def move_right(self, n):
        self.c = min(len(self.s), self.c + n)
        return self.s[max(0, self.c - 10):self.c]
    def text(self): return self.s
Correct, but every insert and delete is O(N) due to slice + concatenation. At 10^4 ops on a 10^4-char document: 10^8 char-shifts. Borderline-TLE on LC 2296.
Brute Force Complexity
insert: O(N + |text|). delete_left: O(N). move_*: O(1) for the text return (slice). Total worst case: O(N · operations).
Optimization Path
Switch to a gap buffer: a single bytearray (or list[str]) of length capacity, with two indices gap_start and gap_end. Characters before gap_start and at or after gap_end are real content; the range [gap_start, gap_end) is unused. The cursor position is gap_start. Insert at cursor: write into the gap, advance gap_start. Delete left: rewind gap_start (the deleted characters are now in the gap, no copying needed). Move left by k: copy the k characters just before gap_start into the k slots just before gap_end, then subtract k from both indices (the gap moves toward the start). Move right by k: symmetric. Resize when the gap is too small for the next insert: at least double the capacity.
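A small trace makes the gap movement concrete. The sketch below uses a list[str] buffer with None marking unused slots; the helper names move_gap_left and visible are hypothetical and exist only for this illustration.

```python
from typing import List, Optional, Tuple


def move_gap_left(buf: List[Optional[str]], gap_start: int, gap_end: int,
                  k: int) -> Tuple[int, int]:
    """Move the gap k cells left by copying k characters across it."""
    k = min(k, gap_start)
    # Copy right-to-left so overlapping source/destination ranges stay correct.
    for i in range(1, k + 1):
        buf[gap_end - i] = buf[gap_start - i]
    return gap_start - k, gap_end - k


def visible(buf: List[Optional[str]], gap_start: int, gap_end: int) -> str:
    """The document is everything outside [gap_start, gap_end)."""
    return "".join(buf[:gap_start] + buf[gap_end:])


buf: List[Optional[str]] = list("abcd") + [None] * 4  # "abcd" then a 4-slot gap
gap_start, gap_end = 4, 8
gap_start, gap_end = move_gap_left(buf, gap_start, gap_end, 2)  # cursor: ab|cd
before_insert = visible(buf, gap_start, gap_end)
buf[gap_start] = "X"   # insert at cursor: write into the gap
gap_start += 1         # and shrink the gap from the left
after_insert = visible(buf, gap_start, gap_end)
```

Stale characters left inside the gap are harmless because only the two indices define what is visible.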
Final Expected Approach
A bytearray buf of size capacity. Indices gap_start (left edge of gap) and gap_end (right edge, exclusive). Invariants: 0 ≤ gap_start ≤ gap_end ≤ capacity. Document length = capacity - (gap_end - gap_start). Cursor = gap_start. Operations manipulate the indices and copy small ranges of bytes; total work for cursor-local edits is bounded by the edit size, not the document size.
Data Structures Used
- bytearray (or list[str]) for the storage buffer.
- Two int indices gap_start and gap_end.
- A capacity bookkeeping value.
- A _grow helper that doubles capacity when the gap is exhausted.
Correctness Argument
After every operation: the document is buf[:gap_start] + buf[gap_end:capacity] decoded. insert(text): ensure the gap holds len(text) slots (grow if needed); copy text into buf[gap_start:gap_start + len(text)]; advance gap_start by len(text). The document grows by exactly len(text) and the cursor moves to the end of the insertion. delete_left(n): cap n at gap_start (the cursor is gap_start, so at most gap_start characters lie to its left); rewind gap_start by n. The document shrinks by exactly n. move_left(k): cap k at gap_start; copy the k bytes buf[gap_start - k:gap_start] to buf[gap_end - k:gap_end]; subtract k from both indices. The visible document is unchanged, only the gap moved. move_right(k) is the mirror image: cap k at capacity - gap_end, copy buf[gap_end:gap_end + k] to buf[gap_start:gap_start + k], and add k to both indices.
Complexity
| Operation | Time | Space |
|---|---|---|
| insert(t) | O(len(t)) amortized | O(len(t)) |
| delete_left(n) | O(1) | O(1) extra |
| move_left(k) / move_right(k) | O(k) | O(1) |
| text() | O(N) | O(N) |
Implementation Requirements
class TextEditor:
    def __init__(self, initial_capacity: int = 16):
        self._buf = bytearray(initial_capacity)
        self._gap_start = 0
        self._gap_end = initial_capacity

    @property
    def _capacity(self) -> int:
        return len(self._buf)

    @property
    def _length(self) -> int:
        return self._capacity - (self._gap_end - self._gap_start)

    def _grow(self, needed: int):
        new_cap = max(self._capacity * 2, self._capacity + needed)
        new_buf = bytearray(new_cap)
        # left segment unchanged, right segment shifted to end of new buffer
        new_buf[: self._gap_start] = self._buf[: self._gap_start]
        right_size = self._capacity - self._gap_end
        new_buf[new_cap - right_size :] = self._buf[self._gap_end :]
        self._buf = new_buf
        self._gap_end = new_cap - right_size

    def insert(self, text: str):
        # ASCII assumed (one byte per character), as in LC 2296
        b = text.encode("utf-8")
        if self._gap_end - self._gap_start < len(b):
            self._grow(len(b))
        self._buf[self._gap_start : self._gap_start + len(b)] = b
        self._gap_start += len(b)

    def delete_left(self, n: int) -> int:
        d = min(n, self._gap_start)
        self._gap_start -= d
        return d

    def _move_left(self, k: int):
        k = min(k, self._gap_start)
        if k == 0:
            return
        # copy k bytes from before gap to after gap (right side)
        src_end = self._gap_start
        src_start = src_end - k
        dst_end = self._gap_end
        dst_start = dst_end - k
        # work right-to-left to handle overlap
        for i in range(k - 1, -1, -1):
            self._buf[dst_start + i] = self._buf[src_start + i]
        self._gap_start -= k
        self._gap_end -= k

    def _move_right(self, k: int):
        right_avail = self._capacity - self._gap_end
        k = min(k, right_avail)
        if k == 0:
            return
        # copy k bytes from after gap (right side) to before gap (left side)
        for i in range(k):
            self._buf[self._gap_start + i] = self._buf[self._gap_end + i]
        self._gap_start += k
        self._gap_end += k

    def _last_10_left(self) -> str:
        start = max(0, self._gap_start - 10)
        return self._buf[start : self._gap_start].decode("utf-8", errors="replace")

    def move_left(self, k: int) -> str:
        self._move_left(k)
        return self._last_10_left()

    def move_right(self, k: int) -> str:
        self._move_right(k)
        return self._last_10_left()

    def text(self) -> str:
        left = self._buf[: self._gap_start]
        right = self._buf[self._gap_end :]
        return (left + right).decode("utf-8", errors="replace")
Tests
def test_basic_insert_delete():
    ed = TextEditor()
    ed.insert("leetcode")
    assert ed.text() == "leetcode"
    assert ed.delete_left(4) == 4
    assert ed.text() == "leet"
    ed.insert("practice")
    assert ed.text() == "leetpractice"

def test_cursor_movement_returns_last_10():
    ed = TextEditor()
    ed.insert("practice")
    assert ed.move_right(3) == "practice"  # cursor at end already; last 10 left = "practice"
    assert ed.move_left(8) == ""
    assert ed.delete_left(10) == 0
    ed.insert("leet")
    assert ed.text() == "leetpractice"
    assert ed.move_left(2) == "le"

def test_lc_2296_canonical():
    ed = TextEditor()
    ed.insert("leetcode")
    assert ed.delete_left(4) == 4
    ed.insert("practice")
    assert ed.move_right(3) == "etpractice"
    assert ed.move_left(8) == "leet"
    assert ed.delete_left(10) == 4
    assert ed.move_left(2) == ""

def test_grow_buffer():
    ed = TextEditor(initial_capacity=4)
    ed.insert("a" * 100)
    assert ed.text() == "a" * 100

def test_delete_more_than_left():
    ed = TextEditor()
    ed.insert("ab")
    assert ed.delete_left(10) == 2
    assert ed.text() == ""

def test_move_clamps():
    ed = TextEditor()
    ed.insert("hello")
    ed.move_left(100)   # clamped to 0
    ed.move_right(100)  # back to end
    assert ed.text() == "hello"
Follow-up Questions
- What is the relationship to a piece table? A gap buffer is one contiguous buffer with one gap; a piece table is two buffers (original + append-only) and a list of pieces. An insert in a piece table appends to the “added” buffer and splices the piece list: the append is O(1) amortized and no existing text moves, regardless of cursor position, but locating the piece to split costs O(P) with a plain list or O(log P) with a balanced tree, where P is the number of pieces. The downside: random-access reads are O(log P) (binary-search the piece list by cumulative offset) instead of O(1). Use a piece table when edits are non-local; use a gap buffer when most edits cluster.
- How would you make it thread-safe? Wrap public methods with a Lock, or use a single-writer model. Most editors are single-threaded on the editing buffer for exactly this reason; rendering and saving happen on background threads with snapshots.
- How would you persist state across restarts? Every K seconds or after every N keystrokes, write the current text to a temporary file, then atomically rename it. For more granular crash recovery, append every operation to a log; replay on boot.
- What configuration knobs would you expose? initial_capacity, max_document_size. The growth factor (currently 2×) is a sensible default; don’t expose unless you’ve measured.
- How would you handle a poison-pill input? A multi-megabyte single insert(text). Reject text longer than max_insert (e.g., 1 MiB). Total document size capped by max_document_size. Return errors, don’t OOM.
- What metrics would you emit? inserts_total, deletes_total, cursor_moves_total, buffer_grows_total, document_size_bytes gauge, gap_size_bytes gauge. Useful for tracking edit patterns and tuning capacity defaults.
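To make the piece-table comparison concrete, here is a minimal sketch of one under simplifying assumptions: the piece list is a plain Python list (so finding the piece to split is O(P)), the "add" buffer is a str rather than a growable byte buffer, and only insert and text are supported. The names Piece and PieceTable are illustrative, not taken from any particular editor.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str   # "orig" or "add": which buffer the span lives in
    start: int
    length: int


class PieceTable:
    def __init__(self, original: str):
        self._buffers = {"orig": original, "add": ""}
        self._pieces = [Piece("orig", 0, len(original))] if original else []

    def insert(self, pos: int, text: str) -> None:
        """Insert text at document offset pos (clamped to [0, len(document)])."""
        if not text:
            return
        pos = max(0, pos)
        new = Piece("add", len(self._buffers["add"]), len(text))
        self._buffers["add"] += text   # append-only: existing text never moves
        offset = 0
        for i, p in enumerate(self._pieces):
            if pos <= offset + p.length:
                cut = pos - offset
                left = Piece(p.source, p.start, cut)
                right = Piece(p.source, p.start + cut, p.length - cut)
                # Replace the piece with left + new + right, dropping empty halves.
                self._pieces[i:i + 1] = [q for q in (left, new, right) if q.length]
                return
            offset += p.length
        self._pieces.append(new)       # pos is at or past the end

    def text(self) -> str:
        return "".join(self._buffers[p.source][p.start:p.start + p.length]
                       for p in self._pieces)


pt = PieceTable("hello world")
pt.insert(5, ",")
pt.insert(0, ">> ")
pt.insert(100, "!")
```

Note that no insert copies document text: each one only appends to the "add" buffer and edits the piece list, which is why the structure tolerates edits far from the previous one.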
Product Extension
Real editors layer many things on top: undo/redo (each operation pushes an inverse onto a stack), syntax highlighting (incremental tree-sitter passes), multi-cursor (a list of gap-buffer-style cursors), collaborative editing (operational transforms or CRDTs over the same buffer). The buffer is the bottom; the rest is composition.
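The undo idea ("each operation pushes an inverse onto a stack") can be sketched independently of the buffer. The version below assumes a hypothetical UndoableText class that stores a plain str with the cursor fixed at the end; a real editor would apply the same inverses to its gap buffer instead.

```python
class UndoableText:
    """Cursor-at-end text with an undo stack of inverse operations."""

    def __init__(self):
        self._text = ""
        self._undo = []   # zero-argument callables, each reverting one edit

    def insert(self, s: str) -> None:
        self._text += s
        n = len(s)
        # The inverse of inserting s is deleting the same n characters.
        self._undo.append(lambda: self._delete_raw(n))

    def delete_left(self, n: int) -> int:
        n = min(n, len(self._text))
        removed = self._text[len(self._text) - n:]
        self._delete_raw(n)
        # The inverse of a delete is re-inserting exactly what was removed.
        self._undo.append(lambda: self._insert_raw(removed))
        return n

    def undo(self) -> bool:
        """Revert the most recent edit; return False if there is nothing to undo."""
        if not self._undo:
            return False
        self._undo.pop()()
        return True

    def text(self) -> str:
        return self._text

    # Raw helpers bypass the undo stack so that undoing does not record new entries.
    def _insert_raw(self, s: str) -> None:
        self._text += s

    def _delete_raw(self, n: int) -> None:
        self._text = self._text[:len(self._text) - n]


doc = UndoableText()
doc.insert("leet")
doc.insert("code")
doc.delete_left(2)
```

Redo would be a second stack filled by undo and cleared by any fresh edit; it is omitted here to keep the sketch short.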
Language/Runtime Follow-ups
- Python: as above. bytearray is the right primitive; avoid string concatenation inside the hot path.
- Java: char[] plus int gapStart, int gapEnd. StringBuilder is backed by a plain array (char[] before Java 9, byte[] since) but lacks gap-buffer semantics.
- Go: []byte (or []rune for Unicode-aware editors). The growth pattern matches Go’s slice append.
- C++: std::vector<char>. For piece tables, std::vector<Piece> of (buffer_id, offset, length).
- JS/TS: Uint8Array is the dense representation; string concatenation is O(N) and should be avoided.
Common Bugs
- Forgetting to grow when the gap is exhausted — silent overwrite of right-segment data.
- Wrong copy direction when shifting bytes during cursor moves: when the source and destination ranges overlap (the gap is smaller than k), a left-to-right copy in move_left overwrites bytes before they are read. Copy right-to-left when the cursor moves left and left-to-right when it moves right.
- Using a cached buffer reference or length after a grow: always recompute capacity from len(self._buf) post-grow.
- Returning the full text() on every move_* call when the spec only wants the last 10 characters left of the cursor.
- Encoding inconsistencies from mixing str and bytearray. Pick one (here we use UTF-8 bytes; document the choice; reject mid-codepoint splits in real implementations).
- Initializing the buffer too small (e.g., capacity 1): the first several keystrokes each trigger a regrow until doubling catches up. Default capacity 16 avoids that warm-up.
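The copy-direction bug is easiest to believe after seeing it corrupt data. A small sketch, assuming a hypothetical shift_right helper that copies k bytes to a higher, overlapping offset (the situation move_left is in when the gap is smaller than k):

```python
def shift_right(buf: bytearray, src: int, dst: int, k: int, right_to_left: bool) -> None:
    """Copy k bytes from buf[src:src+k] to buf[dst:dst+k], where dst > src."""
    order = range(k - 1, -1, -1) if right_to_left else range(k)
    for i in order:
        buf[dst + i] = buf[src + i]


good = bytearray(b"abcd__")
shift_right(good, src=0, dst=2, k=4, right_to_left=True)

bad = bytearray(b"abcd__")
shift_right(bad, src=0, dst=2, k=4, right_to_left=False)
```

With the right-to-left order the destination holds abcd; with the left-to-right order it holds abab, because the bytes c and d were overwritten before they were read.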
Debugging Strategy
When text() is wrong: print (buf, gap_start, gap_end, capacity) after each operation; the bug is almost always a forgotten index update. When cursor moves leak data: print the bytes copied and the index ranges; right-to-left vs left-to-right copy direction is the most common bug. When grow fails: assert len(buf) == capacity after every operation.
Mastery Criteria
- Stated the gap-buffer invariant in <30 seconds.
- Named when piece table is preferred (non-local edits) without prompting.
- Implemented gap buffer with insert/delete/move/text in <30 minutes from blank screen.
- Wrote a regrow test that exercises the doubling.
- Articulated the overlap-direction bug for cursor moves and named the fix.
- Solved LC 2296 unaided in <40 minutes including all tests.
- Listed three real editors (Emacs, VS Code, Word) and which structure each uses.
Lab 23 — Toy SQL-Like Engine
Goal
Implement a tiny SQL-like engine that can parse and execute SELECT col1, col2 FROM t [JOIN u ON expr] [WHERE expr] [ORDER BY col [DESC]] [LIMIT n] over in-memory tables. The engine has three layers: tokenizer, parser (produces an AST), executor (interprets the AST). After this lab you should be able to scope a 60-minute version of this in <5 minutes and produce a working subset (no joins, no order-by) in <40 minutes.
Background Concepts
A SQL engine — even a toy — is the cleanest interview-friendly example of the frontend / backend / interpreter trilogy that runs every real query engine, compiler, and DSL:
- Lexer / tokenizer: converts a string into a stream of typed tokens (SELECT, identifier name, =, integer 42, etc.). Skips whitespace, recognizes keywords, classifies punctuation.
- Parser: consumes the token stream and produces an AST: Select(columns, from_table, where_expr, joins, order_by, limit). Recursive descent is the right tool for this problem class: top-down, predictable, fits on a whiteboard.
- Executor: walks the AST and produces rows. For WHERE, evaluate the expression against each row. For JOIN, nested-loop the two tables and concatenate matching rows. For ORDER BY, sort by the named column with the DESC flag. For LIMIT, slice the result.
The interviewer is not testing whether you can build a real query optimizer (you can’t, in 60 minutes). They are testing whether you can decompose the problem into the three layers, write each cleanly, and connect them through a typed AST. Candidates who try to do everything inline in one function fail; candidates who name Token, Expr, Select types and split functions per layer pass.
Interview Context
A 60-minute round at Snowflake, Databricks, MongoDB, Neon, PlanetScale, and any database / data-platform company. Often paired with a smaller warmup. The supported subset varies by interviewer — at minimum SELECT cols FROM t WHERE expr is expected; joins and order-by are stretch goals; aggregates (COUNT, SUM) are extras for strong candidates.
Problem Statement
Implement Engine with:
- register_table(name: str, columns: list[str], rows: list[list]): store an in-memory table.
- query(sql: str) -> list[list]: parse and execute the SQL, return rows.
Supported grammar:
SELECT col_list FROM table_name
[ JOIN table_name ON cond_expr ]
[ WHERE cond_expr ]
[ ORDER BY column_ref [ASC | DESC] ]
[ LIMIT integer ]
col_list is * or comma-separated column references (qualified table.col or bare col). cond_expr is a small expression language: literals (int, string), column refs, and the operators = != < <= > >= AND OR NOT.
Constraints
- Identifiers are alphanumeric (and underscore). Keywords are case-insensitive (SELECT == select).
- String literals use single quotes ('foo').
- Tables fit in memory; nested-loop join is acceptable.
- Up to 10^3 rows per table; query must finish in well under a second.
Clarifying Questions
- Are aggregates (COUNT, SUM) required? (No for the base; stretch goal.)
- Are subqueries supported? (No.)
- Are NULLs supported? (No: undefined column = error; missing field = treat as None, with NULL propagation rules elided.)
- Are types coerced? (No: comparing '5' and 5 returns False or raises; document.)
- Is column resolution case-sensitive? (Yes: keywords case-insensitive, identifiers case-sensitive. Document.)
- Are joins inner-only? (Yes: INNER JOIN semantics; no LEFT / RIGHT / FULL for the base.)
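The "returns False or raises; document" answer on type coercion is worth pinning down in code. One possible policy, sketched with a hypothetical compare helper (the policy itself is an assumption, not something the problem statement mandates): equality across types is simply False, while ordering across types is an error.

```python
def compare(op: str, left, right) -> bool:
    """Strict comparison: no coercion between int and str."""
    if type(left) is not type(right):
        if op == "=":
            return False
        if op == "!=":
            return True
        raise TypeError(
            f"cannot apply {op!r} to {type(left).__name__} and {type(right).__name__}")
    if op == "=":
        return left == right
    if op == "!=":
        return left != right
    if op == "<":
        return left < right
    if op == "<=":
        return left <= right
    if op == ">":
        return left > right
    if op == ">=":
        return left >= right
    raise ValueError(f"unknown operator: {op!r}")
```

Stating this policy out loud before coding is usually enough; the interviewer mainly wants to see that the question occurred to you.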
Examples
-- users(id, name, age); orders(id, user_id, total)
SELECT name FROM users WHERE age >= 18
SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.total > 100
SELECT name FROM users ORDER BY age DESC LIMIT 3
Initial Brute Force
Skip the parser and tokenize-execute in a single big regex-soup function. This is what most candidates produce when panicked. It works for two or three test cases and breaks instantly on any extension.
Brute Force Complexity
Roughly O(rows × cols × query-length) and bug-prone.
Optimization Path
Properly separate lexer / parser / executor. Tokenizer scans the string once: O(N). Parser is recursive descent: O(tokens). Executor: O(rows × predicate cost) for WHERE; O(left × right) for nested-loop joins; O(rows log rows) for ORDER BY. Each layer is independently testable.
Final Expected Approach
Three layers connected by typed values:
- tokenize(sql) -> list[Token]: Token = (kind, value), where kind is KEYWORD, IDENT, INT, STRING, OP, PUNC, or EOF.
- Parser(tokens).parse_select() -> Select: recursive descent. Each non-terminal is a method. Select is a dataclass with columns, from_table, joins, where, order_by, limit.
- Engine.execute(select) -> rows: fetch base rows, apply joins, apply where, project columns, order, limit.
Expr is a small algebraic datatype: Literal(value), Column(table_or_None, name), BinOp(op, left, right), UnaryOp(op, operand). Evaluation: eval_expr(expr, row, schema) -> value.
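A minimal sketch of eval_expr(expr, row, schema) for the single-table case may help fix the shape of the recursion. It assumes schema is a dict from column name to row index and ignores the table qualifier on Column; both are simplifications for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Literal:
    value: Any


@dataclass
class Column:
    table: Optional[str]
    name: str


@dataclass
class BinOp:
    op: str
    left: Any
    right: Any


@dataclass
class UnaryOp:
    op: str
    operand: Any


_CMP = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def eval_expr(expr, row: list, schema: dict[str, int]):
    """Evaluate expr against one row; schema maps column name -> index in row."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Column):
        if expr.name not in schema:
            raise NameError(f"unknown column: {expr.name}")
        return row[schema[expr.name]]
    if isinstance(expr, UnaryOp) and expr.op == "NOT":
        return not eval_expr(expr.operand, row, schema)
    if isinstance(expr, BinOp):
        if expr.op == "AND":   # Python's and/or short-circuit the right operand
            return bool(eval_expr(expr.left, row, schema)) and bool(eval_expr(expr.right, row, schema))
        if expr.op == "OR":
            return bool(eval_expr(expr.left, row, schema)) or bool(eval_expr(expr.right, row, schema))
        return _CMP[expr.op](eval_expr(expr.left, row, schema),
                             eval_expr(expr.right, row, schema))
    raise TypeError(f"bad expr: {expr!r}")


schema = {"id": 0, "name": 1, "age": 2}
# age >= 18 AND NOT name = 'bob'
adult_not_bob = BinOp("AND",
                      BinOp(">=", Column(None, "age"), Literal(18)),
                      UnaryOp("NOT", BinOp("=", Column(None, "name"), Literal("bob"))))
```

Calling eval_expr(adult_not_bob, [1, "alice", 30], schema) walks the tree once per row, which is where the per-row predicate cost in the complexity analysis comes from.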
Data Structures Used
- list[tuple] per table (rows).
- dict[str, int] per table (column → index).
- AST: small dataclasses or named tuples.
- Token list: list[Token].
- dict[str, callable] for operator dispatch.
Correctness Argument
Each layer’s correctness is independent of the others. Tokenizer correctness: every input character is consumed exactly once and is either part of exactly one token or skipped as whitespace. Parser correctness: a recursive-descent parser with one token of lookahead accepts exactly the grammar’s language when the grammar is LL(1), that is, when the next token always determines which production applies. The grammar above is LL(1): every optional clause starts with a distinct keyword. Executor correctness: each clause is a transformation on a row stream. JOIN pairs rows and keeps the pairs that satisfy ON; WHERE filters; projection picks columns; ORDER BY sorts; LIMIT truncates. Each transformation preserves the well-typed-row invariant.
Complexity
| Stage | Time |
|---|---|
| Tokenize | O(N) |
| Parse | O(T) where T = tokens |
| WHERE filter | O(R · E), where E = cost of evaluating the predicate on one row |
| Inner JOIN (nested loop) | O(R₁ · R₂ · E) |
| ORDER BY | O(R log R) |
| LIMIT | O(L) |
For larger data: indices, hash joins, query optimizers — out of scope.
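For a sense of what the hash-join upgrade looks like, here is a minimal sketch for inner equi-joins only. The function name hash_join and the use of column indices as keys are assumptions for illustration; it does not handle arbitrary ON expressions, which is one reason the nested loop remains the baseline.

```python
from collections import defaultdict


def hash_join(left: list[list], right: list[list],
              left_key: int, right_key: int) -> list[list]:
    """Inner equi-join: index `right` by key, then probe once per left row."""
    index = defaultdict(list)
    for r in right:                      # build phase: O(R2)
        index[r[right_key]].append(r)
    out = []
    for l in left:                       # probe phase: O(R1 + matches)
        for r in index.get(l[left_key], ()):
            out.append(l + r)
    return out


users = [[1, "alice", 30], [2, "bob", 17], [3, "carol", 22], [4, "dave", 45]]
orders = [[10, 1, 250], [11, 1, 50], [12, 3, 800], [13, 4, 75]]
joined = hash_join(users, orders, left_key=0, right_key=1)
```

The total cost is O(R₁ + R₂ + matches) instead of O(R₁ · R₂), at the price of O(R₂) extra memory for the index.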
Implementation Requirements
import re
from dataclasses import dataclass
from typing import Any, Optional

# ---------------- Tokenizer ----------------
KEYWORDS = {"SELECT", "FROM", "WHERE", "JOIN", "ON",
            "ORDER", "BY", "ASC", "DESC", "LIMIT",
            "AND", "OR", "NOT"}

@dataclass
class Token:
    kind: str
    value: Any

_TOKEN_RE = re.compile(r"""
    \s+                         |   # whitespace
    '([^']*)'                   |   # string literal
    (\d+)                       |   # int
    (==|!=|<=|>=|=|<|>)         |   # ops
    ([A-Za-z_][A-Za-z0-9_]*)    |   # identifier or keyword
    (,|\(|\)|\.|\*)                 # punctuation
""", re.VERBOSE)

def tokenize(sql: str) -> list[Token]:
    tokens: list[Token] = []
    i = 0
    while i < len(sql):
        m = _TOKEN_RE.match(sql, i)
        if not m:
            raise SyntaxError(f"unexpected char at {i}: {sql[i]!r}")
        s, ival, op, ident, punc = m.groups()
        if m.group(0).isspace():
            pass  # whitespace is skipped, never emitted
        elif s is not None:
            tokens.append(Token("STRING", s))
        elif ival is not None:
            tokens.append(Token("INT", int(ival)))
        elif op is not None:
            tokens.append(Token("OP", "=" if op == "==" else op))  # accept == as =
        elif ident is not None:
            up = ident.upper()
            if up in KEYWORDS:
                tokens.append(Token("KEYWORD", up))
            else:
                tokens.append(Token("IDENT", ident))
        elif punc is not None:
            tokens.append(Token("PUNC", punc))
        i = m.end()
    tokens.append(Token("EOF", None))
    return tokens
# ---------------- AST ----------------
@dataclass
class Column:
    table: Optional[str]
    name: str

@dataclass
class Literal:
    value: Any

@dataclass
class BinOp:
    op: str
    left: Any
    right: Any

@dataclass
class UnaryOp:
    op: str
    operand: Any

@dataclass
class Join:
    table: str
    alias: Optional[str]
    on: Any

@dataclass
class Select:
    columns: list               # list[Column] or ["*"]
    from_table: str
    from_alias: Optional[str]
    joins: list                 # list[Join]
    where: Optional[Any]
    order_by: Optional[tuple]   # (Column, "ASC" | "DESC")
    limit: Optional[int]
# ---------------- Parser (recursive descent) ----------------
class Parser:
    def __init__(self, tokens: list[Token]):
        self._t = tokens
        self._i = 0

    def _peek(self) -> Token:
        return self._t[self._i]

    def _eat(self, kind, value=None) -> Token:
        # Consume the next token; it must match kind (and value, if given).
        tok = self._t[self._i]
        if tok.kind != kind or (value is not None and tok.value != value):
            raise SyntaxError(f"expected {kind} {value}, got {tok}")
        self._i += 1
        return tok

    def _accept(self, kind, value=None) -> bool:
        # Consume the next token only if it matches; report whether it did.
        tok = self._t[self._i]
        if tok.kind == kind and (value is None or tok.value == value):
            self._i += 1
            return True
        return False

    def parse_select(self) -> Select:
        self._eat("KEYWORD", "SELECT")
        cols = self._parse_columns()
        self._eat("KEYWORD", "FROM")
        ftable, falias = self._parse_table_alias()
        joins = []
        while self._accept("KEYWORD", "JOIN"):
            jt, ja = self._parse_table_alias()
            self._eat("KEYWORD", "ON")
            joins.append(Join(jt, ja, self._parse_expr()))
        where = self._parse_expr() if self._accept("KEYWORD", "WHERE") else None
        order_by = None
        if self._accept("KEYWORD", "ORDER"):
            self._eat("KEYWORD", "BY")
            ob_col = self._parse_column_ref()
            direction = "ASC"
            if self._accept("KEYWORD", "DESC"):
                direction = "DESC"
            elif self._accept("KEYWORD", "ASC"):
                direction = "ASC"
            order_by = (ob_col, direction)
        limit = None
        if self._accept("KEYWORD", "LIMIT"):
            limit = self._eat("INT").value
        self._eat("EOF")
        return Select(cols, ftable, falias, joins, where, order_by, limit)

    def _parse_columns(self):
        if self._accept("PUNC", "*"):
            return ["*"]
        cols = [self._parse_column_ref()]
        while self._accept("PUNC", ","):
            cols.append(self._parse_column_ref())
        return cols

    def _parse_column_ref(self) -> Column:
        ident = self._eat("IDENT").value
        if self._accept("PUNC", "."):
            name = self._eat("IDENT").value
            return Column(ident, name)
        return Column(None, ident)

    def _parse_table_alias(self) -> tuple[str, Optional[str]]:
        name = self._eat("IDENT").value
        alias = None
        if self._peek().kind == "IDENT":
            alias = self._eat("IDENT").value
        return name, alias

    # Precedence ladder, loosest to tightest: OR < AND < NOT < comparison.
    def _parse_expr(self):
        return self._parse_or()

    def _parse_or(self):
        left = self._parse_and()
        while self._accept("KEYWORD", "OR"):
            left = BinOp("OR", left, self._parse_and())
        return left

    def _parse_and(self):
        left = self._parse_not()
        while self._accept("KEYWORD", "AND"):
            left = BinOp("AND", left, self._parse_not())
        return left

    def _parse_not(self):
        if self._accept("KEYWORD", "NOT"):
            return UnaryOp("NOT", self._parse_not())
        return self._parse_cmp()

    def _parse_cmp(self):
        left = self._parse_atom()
        if self._peek().kind == "OP":
            op = self._eat("OP").value
            return BinOp(op, left, self._parse_atom())
        return left

    def _parse_atom(self):
        tok = self._peek()
        if tok.kind in ("INT", "STRING"):
            self._i += 1
            return Literal(tok.value)
        if tok.kind == "PUNC" and tok.value == "(":
            self._i += 1
            e = self._parse_expr()
            self._eat("PUNC", ")")
            return e
        return self._parse_column_ref()
# ---------------- Executor ----------------
# Comparison dispatch table. Looking the operator up first means only the
# requested comparison runs, so '5' = 5 is simply False instead of raising
# TypeError from an unrelated < evaluated alongside it.
_CMP_OPS = {
    "=": lambda a, b: a == b, "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b, "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b, ">=": lambda a, b: a >= b,
}

class Engine:
    def __init__(self):
        self._tables: dict[str, tuple[list[str], list[list]]] = {}

    def register_table(self, name: str, columns: list[str], rows: list[list]):
        self._tables[name] = (columns, [list(r) for r in rows])

    def query(self, sql: str) -> list[list]:
        ast = Parser(tokenize(sql)).parse_select()
        return self._execute(ast)

    def _execute(self, sel: Select) -> list[list]:
        # 1. base: each entry is (flat_row, env), env maps alias -> (columns, row)
        base_cols, base_rows = self._fetch(sel.from_table)
        alias = sel.from_alias or sel.from_table
        rows = [(r, {alias: (base_cols, r)}) for r in base_rows]
        # 2. joins (nested loop), filtering on ON as pairs are formed
        for j in sel.joins:
            jcols, jrows = self._fetch(j.table)
            j_alias = j.alias or j.table
            new_rows = []
            for left_row, env in rows:
                for jr in jrows:
                    new_env = dict(env)
                    new_env[j_alias] = (jcols, jr)
                    if self._eval(j.on, new_env):
                        new_rows.append((left_row + jr, new_env))
            rows = new_rows
        # 3. where
        if sel.where is not None:
            rows = [(r, env) for r, env in rows if self._eval(sel.where, env)]
        # 4. project
        if sel.columns == ["*"]:
            # copy so callers cannot mutate stored table rows through the result
            projected = [list(r) for r, _ in rows]
        else:
            projected = [[self._eval(c, env) for c in sel.columns] for _, env in rows]
        # 5. order by: the sort key comes from the row environment, not the
        # projected output, so ORDER BY may name a column that was not selected
        if sel.order_by is not None:
            col, direction = sel.order_by
            rows_with_keys = [(self._eval(col, env), p) for (_, env), p in zip(rows, projected)]
            rows_with_keys.sort(key=lambda kp: kp[0], reverse=(direction == "DESC"))
            projected = [p for _, p in rows_with_keys]
        # 6. limit
        if sel.limit is not None:
            projected = projected[: sel.limit]
        return projected

    def _fetch(self, table: str):
        if table not in self._tables:
            raise ValueError(f"unknown table: {table}")
        return self._tables[table]

    def _eval(self, expr, env: dict[str, tuple[list[str], list]]):
        if isinstance(expr, Literal):
            return expr.value
        if isinstance(expr, Column):
            if expr.table is not None:
                if expr.table not in env:
                    raise NameError(f"unknown table or alias: {expr.table}")
                cols, row = env[expr.table]
                if expr.name not in cols:
                    raise NameError(f"unknown column: {expr.table}.{expr.name}")
                return row[cols.index(expr.name)]
            # bare column: first table in the environment that has it wins
            for cols, row in env.values():
                if expr.name in cols:
                    return row[cols.index(expr.name)]
            raise NameError(f"unknown column: {expr.name}")
        if isinstance(expr, UnaryOp) and expr.op == "NOT":
            return not self._eval(expr.operand, env)
        if isinstance(expr, BinOp):
            if expr.op == "AND":  # short-circuits: right side skipped when left is false
                return bool(self._eval(expr.left, env)) and bool(self._eval(expr.right, env))
            if expr.op == "OR":
                return bool(self._eval(expr.left, env)) or bool(self._eval(expr.right, env))
            return _CMP_OPS[expr.op](self._eval(expr.left, env), self._eval(expr.right, env))
        raise TypeError(f"bad expr: {expr}")
Tests
def setup_engine():
    e = Engine()
    e.register_table("users", ["id", "name", "age"], [
        [1, "alice", 30], [2, "bob", 17], [3, "carol", 22], [4, "dave", 45],
    ])
    e.register_table("orders", ["id", "user_id", "total"], [
        [10, 1, 250], [11, 1, 50], [12, 3, 800], [13, 4, 75],
    ])
    return e

def test_basic_select_star():
    e = setup_engine()
    assert e.query("SELECT * FROM users") == [
        [1, "alice", 30], [2, "bob", 17], [3, "carol", 22], [4, "dave", 45]]

def test_where_int_compare():
    e = setup_engine()
    out = e.query("SELECT name FROM users WHERE age >= 18")
    assert sorted(out) == [["alice"], ["carol"], ["dave"]]

def test_string_compare():
    e = setup_engine()
    out = e.query("SELECT name FROM users WHERE name = 'alice'")
    assert out == [["alice"]]

def test_and_or_not():
    e = setup_engine()
    out = e.query("SELECT name FROM users WHERE age > 20 AND NOT name = 'dave'")
    assert sorted(out) == [["alice"], ["carol"]]

def test_join():
    e = setup_engine()
    out = e.query(
        "SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id "
        "WHERE o.total > 100"
    )
    assert sorted(out) == [["alice", 250], ["carol", 800]]

def test_order_by_desc_limit():
    e = setup_engine()
    out = e.query("SELECT name FROM users ORDER BY age DESC LIMIT 2")
    assert out == [["dave"], ["alice"]]

def test_unknown_column_errors():
    e = setup_engine()
    try:
        e.query("SELECT bogus FROM users")
    except NameError:
        pass
    else:
        assert False, "expected NameError"
Follow-up Questions
- How would you test it? Layer-by-layer: tokenizer tests for every keyword/operator/punctuation; parser tests that pretty-print the AST and compare strings; executor tests against fixture tables. Property test: random valid queries with predictable outputs from a Python list-comprehension oracle.
- What is the consistency model? Single-threaded; reads see the snapshot at query start. For concurrent writes, copy-on-write tables or per-table read-write locks.
- What configuration knobs would you expose? Maximum query length, maximum result rows, query timeout. Don’t expose internals (tokenizer regex, parser lookahead).
- How would you handle a poison-pill input? Catastrophic regex (rare with the lexer above), deeply nested expressions (limit recursion depth), enormous joins (R₁ × R₂ row cap before executing). Bound everything.
- How would you scale to N nodes? Beyond toy: shard tables by primary key range or hash; route queries to the owning node; for joins across shards, use distributed hash join. Real systems (Spanner, CockroachDB) layer query planning, distributed execution, and consensus over this same skeleton.
- What metrics would you emit? Per-query: parse latency, execution latency, rows scanned, rows returned. Per-table: row count gauge. Aggregate: queries-per-second counter, error rate counter.
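The "bound everything" advice for poison-pill inputs can be sketched as two pre-flight checks. The function names and default limits below are assumptions chosen for illustration; note that the depth check only counts parentheses, so a long chain of NOT keywords would still need a depth counter inside the parser.

```python
def check_query_limits(sql: str, max_length: int = 10_000,
                       max_paren_depth: int = 32) -> None:
    """Reject oversized or deeply nested queries before tokenizing."""
    if len(sql) > max_length:
        raise ValueError(f"query too long: {len(sql)} > {max_length}")
    depth = 0
    in_string = False
    for ch in sql:
        if ch == "'":
            in_string = not in_string   # parentheses inside string literals do not nest
        elif not in_string:
            if ch == "(":
                depth += 1
                if depth > max_paren_depth:
                    raise ValueError(f"nesting deeper than {max_paren_depth}")
            elif ch == ")":
                depth = max(0, depth - 1)


def check_join_size(left_rows: int, right_rows: int,
                    max_pairs: int = 1_000_000) -> None:
    """Refuse a nested-loop join whose candidate pair count exceeds the cap."""
    if left_rows * right_rows > max_pairs:
        raise ValueError(f"join too large: {left_rows} x {right_rows} pairs")
```

Both checks run in time linear in their input and raise before any expensive work starts, which is the property that matters for a poison pill.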
Product Extension
This is the same skeleton DuckDB, SQLite, Postgres, and every database engine starts with: lex / parse / plan / execute. Real engines add a planner/optimizer between parse and execute that rewrites the AST (push down predicates, choose join order, pick indexes), and a storage layer beneath execute. Aggregate functions, group-by, subqueries, and CTEs are all extensions of the AST + executor pair.
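As one example of extending the executor, a GROUP BY with COUNT or SUM reduces to a single pass with a dictionary of accumulators. The sketch below assumes a hypothetical group_by helper that takes column indices directly; wiring it to the parser (new keywords, a new AST node) is left out.

```python
from collections import defaultdict


def group_by(rows: list[list], key_idx: int, agg: str, val_idx: int) -> list[list]:
    """Rough equivalent of SELECT key, AGG(val) ... GROUP BY key, for COUNT or SUM."""
    if agg not in ("COUNT", "SUM"):
        raise ValueError(f"unsupported aggregate: {agg}")
    acc = defaultdict(int)   # dicts keep insertion order: groups appear first-seen
    for row in rows:
        acc[row[key_idx]] += 1 if agg == "COUNT" else row[val_idx]
    return [[key, value] for key, value in acc.items()]


orders = [[10, 1, 250], [11, 1, 50], [12, 3, 800], [13, 4, 75]]
totals = group_by(orders, key_idx=1, agg="SUM", val_idx=2)
counts = group_by(orders, key_idx=1, agg="COUNT", val_idx=2)
```

Hash aggregation like this is O(R) time and O(G) memory for G groups, and it slots into the pipeline after WHERE and before ORDER BY.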
Language/Runtime Follow-ups
- Python: dataclasses are the right shape for AST. re.VERBOSE lexer is concise.
- Java: Use sealed interfaces (Java 17+) for AST nodes. ANTLR for the parser if available; hand-rolled recursive descent if not.
- Go: Use a type Node interface { node() } and individual struct types implementing it. The parser is a struct with the token list and an index.
- C++: std::variant for AST nodes is clean; visitor pattern via std::visit.
- JS/TS: Discriminated unions for AST. The runtime cost of dynamic dispatch is acceptable for an interview-grade engine.
Common Bugs
- Lexer that consumes whitespace as a token — pollutes the parser. Skip whitespace in the lexer.
- Parser that allows the same column twice in projection but then fails at execution — better to validate at parse time.
- JOIN executor that builds a Cartesian product before filtering — works but quadratic memory before predicate evaluation. Filter as you go (the implementation above does this).
- ORDER BY on a column not in projection — must evaluate against the row environment, not the projected output.
- Operator precedence wrong: NOT a AND b parsed as NOT (a AND b) instead of (NOT a) AND b. The recursive-descent ladder (OR < AND < NOT < CMP) handles this.
- Case-folding identifiers: many SQL engines do (Postgres folds to lowercase); this toy engine doesn’t. Document the choice.
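The ORDER BY bug in the list above comes down to the order of two steps. A stripped-down sketch, assuming rows are plain dicts and a hypothetical select_ordered helper (neither is how the lab's engine stores rows):

```python
def select_ordered(rows: list[dict], project: list[str],
                   order_col: str, desc: bool = False) -> list[list]:
    """Sort the full rows first, then project, so order_col need not be selected."""
    ordered = sorted(rows, key=lambda r: r[order_col], reverse=desc)
    return [[r[c] for c in project] for r in ordered]


users = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 17},
    {"id": 3, "name": "carol", "age": 22},
    {"id": 4, "name": "dave", "age": 45},
]
names_by_age_desc = select_ordered(users, ["name"], "age", desc=True)
```

Projecting first and sorting second would fail here, because the projected rows no longer contain age.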
Debugging Strategy
When parse fails: print the token stream up to the failure point and the parser’s _i index — almost always a missing keyword in the parse method (WHERE vs WHRE). When execution returns wrong rows: log the where evaluation per row with the values it sees. When joins explode: cap row count and emit an error rather than running unbounded.
Mastery Criteria
- Decomposed lex / parse / execute in <2 minutes.
- Wrote the recursive-descent expression parser with correct precedence.
- Implemented WHERE and SELECT cols correctly in <30 minutes.
- Added inner JOIN nested-loop in <10 minutes from the WHERE-only baseline.
- Added ORDER BY and LIMIT in <10 minutes more.
- Articulated where the optimizer would slot in (between parse and execute).
- Listed three real systems (DuckDB, SQLite, Postgres) using this skeleton.
- Wrote tokenizer + parser + executor tests independently per layer.
Phase 9 — Language & Runtime Deep Dive
Target level: Cross-cutting (applies at every level, but the bar rises sharply at senior+)
Expected duration: 1–2 weeks of primary-language reading + 2–4 days of secondary-language skim
Format: No labs. Five comprehensive language READMEs that double as interview-prep references.
Companies this targets: Every company that asks “why does this work?” follow-ups, which is every company at L4+, and many at L3 as well.
Why This Phase Exists
Every other phase trains you to produce code that works. This phase trains you to explain why it works — and equally importantly, to recognize the silent ways it can stop working when an interviewer perturbs an input or asks a follow-up.
At the junior bar, “I called .sort() on the list” is a complete answer. At the senior bar, the interviewer will ask:
- “Is that sort stable?”
- “What’s the worst-case complexity in your language’s standard library?”
- “Does it allocate?”
- “What if the comparator throws?”
- “What if the same key compares differently across calls — what’s the consistency contract?”
- “If two of the elements are mutable and equal-by-hash, what happens to a dict keyed on them?”
A candidate who answers crisply — citing the language’s actual contract, naming the algorithm (Timsort, IntroSort, pdqsort), describing the auxiliary memory, and pointing out the one realistic failure mode — clears the senior bar without breaking a sweat. A candidate who hedges, says “uh, I’d have to check,” or worse, confidently says something wrong — does not.
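As an illustration of what a crisp answer looks like in Python, the sketch below demonstrates two of the contracts asked about: stability (guaranteed by the language for sorted and list.sort, including with reverse=True) and what happens when a comparison raises. The sample data is made up for the example.

```python
records = [("carol", 22), ("alice", 30), ("bob", 22), ("dave", 30)]

# Stable: records with equal keys keep their input order.
by_age = sorted(records, key=lambda r: r[1])

# reverse=True preserves stability too: ties are not flipped.
by_age_desc = sorted(records, key=lambda r: r[1], reverse=True)


def sort_mixed() -> str:
    """A comparison that raises aborts the whole sort."""
    try:
        sorted([3, "a", 1])   # int and str are not orderable against each other
    except TypeError as exc:
        return type(exc).__name__
    return "no error"
```

A matching spoken answer: the sort is stable, worst case O(n log n), sorted allocates a new list while list.sort works in place with temporary merge space, and a raising comparison propagates out of the call.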
The gap between these two candidates is rarely raw algorithmic ability. It’s runtime literacy: knowing your tools at the level your tools deserve.
Junior interviews ask “can you make the language do what you want?”. Senior interviews ask “do you know what the language is doing on your behalf?”. This phase exists because the second question is unbounded — every language has hundreds of subtle behaviors — and you cannot bluff your way through it under stopwatch pressure.
What “Language Depth” Actually Means In Interviews
There are five distinct kinds of language questions an interviewer can probe. Confusing them in your own head is one of the most common ways to under-prepare for this phase.
| Probe type | Example question | What it tests |
|---|---|---|
| Mechanical | “What does += do on a Python list vs a Python tuple?” | Did you actually use the language, or just write code in it? |
| Performance | “Why is ''.join(parts) faster than s += part?” | Do you know the cost model, not just the syntax? |
| Concurrency | “What does volatile guarantee in Java?” | Do you understand the memory model? |
| Failure mode | “What happens if __hash__ and __eq__ disagree?” | Can you predict subtle bugs you’ve never seen? |
| Idiom | “What’s the idiomatic way to read a file line-by-line?” | Would your code look native to a coworker? |
Every language-specific section in the phase below is structured around these five probe types. When you read them, mentally tag each subsection with which probe it answers. By the end you should be able to look at any language question and instantly classify it. Most candidates can answer two of the five probe types but routinely fluff the other three.
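Two of the probes in the table can be answered by running a few lines. The sketch below covers the mechanical probe (+= on a list versus a tuple) and the failure-mode probe (__hash__ disagreeing with __eq__); BadKey is a deliberately broken, hypothetical class.

```python
# Mechanical probe: += mutates a list in place but rebinds a tuple.
a = [1, 2]
a_alias = a
a += [3]        # list.__iadd__ extends the same object, so the alias sees it

t = (1, 2)
t_alias = t
t += (3,)       # tuples are immutable: t now names a new tuple


# Failure-mode probe: equal objects that hash differently.
class BadKey:
    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.v == other.v

    def __hash__(self):
        return id(self)   # breaks the contract: a == b must imply hash(a) == hash(b)


d = {BadKey(1): "x"}
found = BadKey(1) in d    # an equal key, yet the lookup misses
```

The lookup misses because the dict compares hashes before it ever calls __eq__, so an equal key with a different hash is never considered a match.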
The Five Tracks
| Track | Folder | Word count | Use cases |
|---|---|---|---|
| Python | python/README.md | ~10K | The default interview language; ML/data/SRE-leaning roles default here |
| Java | java/README.md | ~10K | FAANG-traditional, finance, Android, large enterprise backends |
| Go | go/README.md | ~7K | Infra, distributed systems, container/cloud-native (K8s, Docker, etcd shops) |
| C++ | cpp/README.md | ~10K | HFT/quant, game engines, embedded, browsers, databases, systems-programming roles |
| JavaScript / TypeScript | javascript-typescript/README.md | ~7K | Web frontend, Node backends, full-stack startup roles |
Each track is self-contained — you should not need to consult external references to answer the interview-relevant questions in that track. The READMEs are dense by design. Read your primary language linearly. Skim a secondary language. Ignore the others until/unless you switch.
How To Use This Phase
If you have one primary language
- Read your primary language README end-to-end. Don’t skip sections — even ones you “already know.” The bar in this phase is being able to answer a follow-up under stopwatch pressure, not having heard of the topic.
- For each section, after reading, close the page and explain the concept out loud (or to a rubber duck) in 60 seconds. If you can’t, re-read.
- Run every code example yourself. Many of them produce output that looks wrong until you internalize why it’s right. Reading the explanation without running the code leaves the wrongness un-felt.
- Cross-link backward to the labs in earlier phases: when you read the dict-internals section, revisit phase-01-foundations/labs/lab-03-hashmap-mastery.md. When you read the GIL section, revisit phase-08-practical-engineering/labs/lab-05-thread-pool.md. The readings are reference; the labs are practice; the combination is mastery.
- Write a one-line flashcard for every interview gotcha (“integer cache -5..256 in Python,” “Integer cache -128..127 in Java,” “loop variable capture pre-Go-1.22”). You will get drilled on at least 3 of these in any senior interview.
If you have a secondary language for breadth
Skim the README. Focus on the Common Interview Gotchas and Memory Model sections. Skip the standard library deep dive — you can look those up. Time budget: 1 evening per secondary language.
If you’re polyglot and want to know all five
You will not actually answer interviews fluently in five languages. Pick one primary, one secondary, and learn the rest as a hobby. Interview fluency requires hours of speaking-the-language-aloud practice that you cannot distribute across five tracks in any reasonable time budget.
What’s Deliberately Not In This Phase
- Build systems. No setuptools / Maven / go.mod / CMake / package.json deep dive. Interviews don’t probe these.
- Framework-specific behavior. No Django, Spring, React, Express. Even at FAANG, framework knowledge is rarely on the rubric for a coding round.
- Tooling. No pdb / gdb / delve / Chrome DevTools tutorials. Phase 10 covers debugging methodology generically.
- Trivia. “Which year was Python 3.0 released?” type questions are out. We focus on what an interviewer asks because they expect you to use the answer in your code.
- Esoterica. Python’s __init_subclass__, Java’s MethodHandle, Go’s unsafe.Pointer, C++’s placement new: these are real, but they’re rarely on a coding interview rubric. If you reach for them in an interview without being asked, you signal over-engineering.
The bias is toward what gets you points in an interview, not what makes you “complete” as a language nerd.
Mastery Checklist
You have completed Phase 9 when, in your primary language:
- You can describe the implementation of the standard hash map / dict, including its collision strategy, load factor, and adversarial-input behavior, in 90 seconds.
- You can describe how the language allocates memory: stack vs heap, GC strategy if any, when objects move, and what triggers a full collection.
- You can name three pitfalls in the language’s mutable-default-argument / late-binding-closure / iterator-invalidation territory and write code that demonstrates each.
- You can write thread-safe code idiomatically, naming which primitive you’d use (mutex / channel / atomic / actor) and why.
- For each of the standard collections (list/dict/set/heap/deque or their equivalents), you can state insert/lookup/delete complexity and one common gotcha.
- You can answer “what does == mean here?” precisely, distinguishing identity, value-equality, equals-with-typed-narrowing, and any platform-specific surprises.
- You can describe the language’s concurrency model (event loop / GIL / OS threads / goroutines / fibers) in one paragraph and name the kind of work each is bad at.
- You can read a 50-line snippet in your secondary language and accurately predict its behavior on adversarial inputs (e.g., empty list, negative index, mutation during iteration).
- You have a one-line answer for every “common interview gotcha” in your primary language’s README and can produce code that demonstrates the gotcha live.
Exit Criteria
You may exit Phase 9 and move on to Phase 10 — Testing, Debugging & Correctness when:
- Primary-language depth. You’ve read the entire primary-language README and can answer 90% of the “common interview gotchas” subsection without re-reading.
- Cross-cutting fluency. When asked a follow-up like “how would you make this thread-safe?” or “what’s the memory cost of this collection?” during a Phase 8 lab review, you reach for primitives and reasoning from this phase, not from a generic OS course.
- Secondary-language familiarity. You can read code in at least one secondary language without grabbing a reference for basics, and you can identify two or three of the secondary language’s distinctive gotchas.
- Mock readiness. You’ve done at least one mock-09-runtime-language where the entire round was follow-up questions with no algorithmic component, and scored “passing” on the rubric.
If your primary-language depth is shallow but your algorithmic skill is strong, the senior interviewer will say “smart but green” — which is a no-hire at L5+. The fix is not more LeetCode. The fix is this phase.
A Note On Language Choice For Interviews
Pick a language. Do not switch mid-interview. Do not switch mid-loop. Do not arrive at an onsite saying “I usually do Python but I’ll do Java today because the problem is more concurrent.”
Default recommendation: Python for breadth (almost every company allows it), Java if you’re targeting traditional FAANG / finance, C++ if you’re targeting HFT or systems, Go if you’re targeting infrastructure-heavy companies (Cloudflare, Datadog, Snowflake’s data-plane teams, container/Kubernetes shops), JS/TS if you’re targeting web-leaning or Node-leaning roles.
The cost of switching languages is roughly 6 months of practice in the new language before you’re fluent at the senior bar. Do not switch on a whim.
Cross-References
- phase-01-foundations/ — the data-structure complexity tables here are the foundation; this phase deepens them with implementation specifics.
- phase-08-practical-engineering/ — every “Language/Runtime Follow-ups” callout in those labs is grounded here.
- phase-10-testing-debugging/ — debugging is largely “knowing what your runtime is doing”; that phase builds on this one.
- phase-11-mock-interviews/mocks/mock-09-runtime-language.md — the mock round dedicated to this phase.
- CODE_QUALITY.md — “use the language’s built-ins idiomatically” is one of the quality dimensions; this phase tells you what idiomatic actually means.
- FRAMEWORK.md — Step 16 (production implications) routinely cites runtime facts from this phase.
The Sub-Tracks
- Python — CPython internals, GIL, memory model, dict/list/set internals, asyncio, common gotchas
- Java — JVM, GC, JMM, collections framework, concurrency, generics erasure, modern Java
- Go — runtime, GMP scheduler, goroutines/channels, slices/maps internals, context, common bugs
- C++ — memory model, smart pointers, move semantics, STL complexity, undefined behavior, modern idioms
- JavaScript / TypeScript — V8, event loop, prototypes, this-binding, async/await, TS type system
Python Runtime Deep Dive
Target audience: candidates interviewing in Python at Big Tech, ML, infra, or any role where the interviewer is allowed to ask “how does it actually work?”
Scope: CPython. PyPy and other implementations are noted only when they materially change interview answers.
Python’s reputation for being “easy” is exactly why senior interviewers grill it hardest. The candidate who can write a clean two-pointer solution in Python and explain why their dict lookup is O(1) amortized but worst-case O(N), why their threading.Thread doesn’t help CPU-bound code, and why [[]] * 3 is a foot-gun, is rare. Be that candidate.
1. CPython Interpreter, Bytecode, Frame Objects
CPython is a stack-based bytecode interpreter. Source code → AST → bytecode → executed by an evaluation loop in C (ceval.c).
What runs your code
- Lexer/Parser → AST.
- Compiler → bytecode (.pyc cached in __pycache__/).
- Interpreter loop (PyEval_EvalFrameEx historically; _PyEval_EvalFrameDefault in current CPython) → fetches one bytecode opcode at a time and dispatches it.
import dis
def add(a, b):
    return a + b
dis.dis(add)
# Output on CPython 3.11/3.12 (opcodes and offsets vary by version):
#   2           0 RESUME                   0
#   3           2 LOAD_FAST                0 (a)
#               4 LOAD_FAST                1 (b)
#               6 BINARY_OP                0 (+)
#              10 RETURN_VALUE
Frame objects
Every function call allocates a frame object on the Python call stack. A frame holds: locals, the value stack, the bytecode instruction pointer, the parent frame.
import sys
def f():
    frame = sys._getframe()
    print(frame.f_code.co_name, frame.f_lineno)
f() # f, <line>
Frames are heap-allocated objects, not C stack frames. This is why Python’s recursion limit is a Python-level integer (sys.setrecursionlimit), not a kernel limit.
Interview framing
“When you call a Python function, what’s the cost?”
Allocate a frame object, push it, populate locals from the argument tuple, execute bytecode, decref the frame. Function calls in Python are expensive — typically 100ns–1µs — which is why for over a list is faster than map(lambda…) for trivial bodies. Knowing this lets you defend choices like inline arithmetic vs operator.add.
2. The GIL — What It Is, What It Protects, When It Releases
The Global Interpreter Lock is a mutex inside the CPython interpreter. Only one thread can execute Python bytecode at a time per process.
What it protects
The GIL exists because CPython’s memory management (refcounts, GC structures, dict internals, etc.) is not thread-safe. Without the GIL, every refcount increment would need an atomic, killing single-threaded performance.
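It protects interpreter state, not your data structures. list.append is effectively atomic by accident (the append itself runs inside one C call while the GIL is held), but counter += 1 is not: it compiles to separate load, add, and store instructions, and nothing guarantees that the three run as one unit.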
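One way to see this is to disassemble the increment. The sketch below uses a hypothetical bump function and only inspects opcode names, because the exact instruction sequence differs between CPython versions:

```python
import dis

counter = 0

def bump():
    """Hypothetical helper: increments a module-level counter once."""
    global counter
    counter += 1

# Collect the opcode names the compiler emitted for bump().
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(ops)

# The read and the write are separate instructions, so the
# increment is a read-modify-write sequence, not one step.
load_at = ops.index("LOAD_GLOBAL")
store_at = ops.index("STORE_GLOBAL")
print(load_at < store_at)
```

The printed list should contain a load, an add, and a store as distinct entries, which is the whole argument for using a lock around shared counters.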
When it releases
- I/O operations (file read/write, socket, time.sleep): the C code drops the GIL while blocking.
- Some C extensions explicitly drop it (NumPy heavy ops, hashlib).
- Every ~5ms (sys.setswitchinterval): the running thread gives up the GIL so a waiting thread can be scheduled.
import threading
counter = 0
def bump():
    global counter
    for _ in range(1_000_000):
        counter += 1 # NOT atomic
threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter) # not guaranteed to be 4_000_000; some CPython versions happen to get it right, which proves nothing
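A minimal sketch of the usual fix, assuming a plain threading.Lock is acceptable for the workload. The thread and iteration counts here are illustrative and kept small so the example runs quickly:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:          # the read-modify-write happens under the lock
            counter += 1

N_THREADS, N_INCREMENTS = 4, 50_000   # illustrative sizes
threads = [threading.Thread(target=bump, args=(N_INCREMENTS,))
           for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000
```

The lock costs throughput, so in an interview it is worth saying that per-thread partial sums combined at the end would avoid most of the contention.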
Implications
- CPU-bound parallelism via threading does not work for pure-Python code in standard CPython. Use multiprocessing or release the GIL via C extensions.
- I/O-bound parallelism via threading works. Each thread releases the GIL while waiting on the network.
- asyncio is an alternative I/O model; it does not bypass the GIL. It doesn’t need to, because there’s only one thread anyway.
Free-threaded Python (3.13+)
PEP 703 introduced an optional no-GIL build (python3.13t). Refcounts become atomic, dict/list grow per-thread fast paths, GC adopts new locking. It is not the default and many C extensions break under it. For interview purposes:
“Python 3.13 ships an experimental free-threaded build that removes the GIL. It’s opt-in, slower for single-threaded code today, and not yet ABI-stable for the ecosystem. Default CPython still has the GIL.”
3. Memory Model — Refcounts + Generational GC
Every Python object has a reference count. When it hits zero, the object is freed immediately.
import sys
a = [1, 2, 3]
sys.getrefcount(a) # 2 — one for `a`, one for the argument to getrefcount
b = a
sys.getrefcount(a) # 3
del b
sys.getrefcount(a) # 2
Why we need a GC on top
Refcounts cannot collect cycles:
a = []
b = []
a.append(b)
b.append(a)
del a, b # Refcount of each is still 1 — they reference each other.
The generational tracing GC in the gc module sweeps for cycles. Three generations (0, 1, 2). Newly created containers go in gen 0. Survivors are promoted. Older generations are collected less often.
import gc
gc.collect() # force a full collection
gc.get_threshold() # (700, 10, 10)
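The cycle behavior can be observed directly. The sketch below uses a hypothetical Node class and a weak reference as a probe (a weak reference watches an object without keeping it alive); automatic collection is disabled only to make the experiment deterministic:

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

gc.disable()                 # keep the automatic collector out of the experiment
a, b = Node(), Node()
a.other, b.other = b, a      # reference cycle
probe = weakref.ref(a)       # observes `a` without keeping it alive

del a, b
alive_after_del = probe() is not None      # True: refcounts never reached zero
gc.collect()
alive_after_collect = probe() is not None  # False: the cycle detector freed it
gc.enable()
print(alive_after_del, alive_after_collect)
```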
__del__ pitfalls
__del__ is a finalizer, not a destructor. Two traps:
- Cycles with __del__ used to be uncollectable before Python 3.4. Now they are collected, but the order is unspecified.
- __del__ may run during interpreter shutdown when module globals are already None.
class Bad:
    def __del__(self):
        print(open) # may be None during shutdown
Use weakref.finalize or context managers (with) instead.
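A minimal sketch of the weakref.finalize alternative. TempResource and the log list are hypothetical names used only for illustration:

```python
import weakref

class TempResource:
    def __init__(self, name, log):
        self.name = name
        # The callback and its arguments must not reference `self`,
        # otherwise the object could never become unreachable.
        self._finalizer = weakref.finalize(self, log.append, f"closed {name}")

log = []
r = TempResource("db", log)
del r            # last reference gone; CPython runs the finalizer right away
print(log)       # ['closed db']
```

Unlike __del__, the callback is registered outside the object, runs at most once, and is also invoked at interpreter exit if the object is still alive.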
weakref
A weakref does not increment the refcount. Useful for caches and observer patterns.
import weakref
class Node: pass
n = Node()
r = weakref.ref(n)
print(r()) # <Node>
del n
print(r()) # None
Interview framing
“How does Python free memory?”
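Refcounting frees most things eagerly; a generational tracing collector cleans up cycles. Compared to Java, memory is released promptly on the common path (no collector pause when an object’s last reference goes away), but every reference copy pays a refcount increment and decrement. Under the GIL those updates are plain, non-atomic writes, yet they still touch memory on nearly every operation, which is part of why Python is slow.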
4. Object Model — __slots__, Descriptors, MRO
Every Python object is, by default, a dict-backed thing: instance attributes live in __dict__. This is why Python objects are 5–10x larger than equivalent C structs.
__slots__
Declare attributes statically and the interpreter skips __dict__:
class Point:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y
# Roughly ~56 bytes per Point with __slots__ vs ~328 bytes without; exact numbers vary by CPython version.
__slots__ cost: no dynamic attribute addition. Subclasses that don’t redeclare __slots__ lose the optimization. Use them for value classes with millions of instances.
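Both costs are easy to demonstrate. In this sketch LoosePoint is a hypothetical subclass that deliberately omits __slots__:

```python
class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

class LoosePoint(Point):     # no __slots__ here, so instances get a __dict__ again
    pass

p = Point(1, 2)
try:
    p.z = 3                  # not declared in __slots__
    rejected = False
except AttributeError:
    rejected = True

print(rejected)                               # True
print(hasattr(p, "__dict__"))                 # False
print(hasattr(LoosePoint(1, 2), "__dict__"))  # True
```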
Descriptors
Properties, classmethods, staticmethods are all built on the descriptor protocol: an attribute access triggers __get__ / __set__ / __delete__ on the class attribute.
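class Lazy:
    """Non-data descriptor: computes once, then caches on the instance."""
    def __init__(self, fn): self.fn = fn
    def __get__(self, obj, cls):
        if obj is None: # accessed on the class itself, not an instance
            return self
        v = self.fn(obj)
        setattr(obj, self.fn.__name__, v) # instance attribute now shadows the descriptor
        return v
class C:
    @Lazy
    def expensive(self):
        return sum(range(10**6))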
MRO and C3
Multiple inheritance resolution uses the C3 linearization algorithm. It guarantees a deterministic order that respects: a class precedes its parents; left-to-right inheritance order.
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass
print(D.__mro__)
# (D, B, C, A, object)
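The linearization matters most when super() is involved, because super() follows the MRO rather than the static parent. A small sketch with a hypothetical who method on the same diamond:

```python
class A:
    def who(self):
        return ["A"]

class B(A):
    def who(self):
        return ["B"] + super().who()

class C(A):
    def who(self):
        return ["C"] + super().who()

class D(B, C):
    def who(self):
        return ["D"] + super().who()

# Inside B.who, super() resolves to C (next in D's MRO), not to A.
print(D().who())                          # ['D', 'B', 'C', 'A']
print([c.__name__ for c in D.__mro__])    # ['D', 'B', 'C', 'A', 'object']
```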
Interview framing
“Why is Python OO so slow?”
Attribute access is dynamic: the interpreter checks the type’s MRO for a data descriptor, then the instance __dict__, then falls back to class attributes along the MRO. __slots__, caching, and attrs/dataclass(slots=True) mitigate. JITs (PyPy) inline these.
5. Iterators, Generators, yield from
An iterator is any object with __iter__ and __next__. A generator is a function with yield; calling it returns an iterator without running the body.
def count_up(n):
    for i in range(n):
        yield i
g = count_up(3)
next(g) # 0
next(g) # 1
Generators suspend frame state on yield. The frame is heap-allocated, kept alive by the generator object, and resumed on next().
yield from
Delegates iteration to a sub-iterator and forwards send, throw, close.
def chain(a, b):
    yield from a
    yield from b
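Beyond forwarding values, yield from also evaluates to the sub-generator's return value, which a plain for loop cannot capture. A short sketch with hypothetical inner and outer generators:

```python
def inner():
    yield 1
    yield 2
    return "inner finished"      # becomes the value of the `yield from` expression

def outer():
    result = yield from inner()
    yield result

print(list(outer()))  # [1, 2, 'inner finished']
```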
Why this matters
Generators are the foundation of asyncio (coroutines were generators before async def), pipelines, and lazy I/O. They allow processing infinite or huge sequences without materializing them.
def lines(path):
    with open(path) as f:
        for line in f: # iterator protocol over file
            yield line.rstrip() # constant memory
# Process 100GB log: O(line) memory.
6. List Internals — Over-allocation, Amortized Append
list is a dynamic array of PyObject*. Capacity grows geometrically.
CPython’s growth pattern (in listobject.c):
new_size = (new_size + (new_size >> 3) + 6) & ~3
That’s roughly 1.125x growth (smaller than C++ vector’s ~1.5–2x).
| Operation | Complexity |
|---|---|
| lst[i] | O(1) |
| lst.append(x) | Amortized O(1), worst O(N) on resize |
| lst.insert(0, x) | O(N) |
| lst.pop() | O(1) |
| lst.pop(0) | O(N); use collections.deque |
| x in lst | O(N) |
| lst.sort() | O(N log N), Timsort, stable |
| lst[a:b] | O(b-a), creates a shallow copy |
lst = []
for i in range(1_000_000):
    lst.append(i) # amortized O(1) each, O(N) total
Pitfalls
grid = [[0] * 3] * 3 # WRONG — three references to one row
grid[0][0] = 1
print(grid) # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
grid = [[0] * 3 for _ in range(3)] # right
Interview framing
“Why does list.append average O(1)?”
Geometric growth: O(N) total work across N appends → O(1) amortized. The classic amortization proof.
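The over-allocation is visible from Python. This sketch counts how often sys.getsizeof changes while appending; it is CPython-specific, and the exact count depends on the version's growth formula:

```python
import sys

lst = []
resizes = 0
last_size = sys.getsizeof(lst)
N = 10_000

for i in range(N):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:        # the backing array was reallocated
        resizes += 1
        last_size = size

print(resizes)   # far fewer than N
```

Most appends write into spare capacity; only the few that trigger a reallocation pay the O(N) copy.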
7. Dict Internals — Open Addressing, Probing, Ordering
dict is a hash table with open addressing. Compact since 3.6 (split into a sparse index array + dense entries array). Insertion-ordered since 3.7 (language guarantee, not just CPython).
Lookup algorithm (simplified)
- Compute hash(key) & (table_size - 1) → slot.
- If slot empty → not found. If key matches → hit.
- Else probe: i = (5*i + 1 + perturb) % size; perturb >>= 5. The “perturb” trick mixes high bits of the hash into early probes, reducing clustering.
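The steps above can be sketched as a toy function. This is a simplified model of the recurrence as written here, not CPython's actual C code; probe_sequence and its parameters are hypothetical names, and it assumes a non-negative hash and a power-of-two table size:

```python
def probe_sequence(h, size, limit):
    """Toy model of the probe order; `size` must be a power of two."""
    mask = size - 1
    perturb = h
    i = h & mask
    slots = [i]
    while len(slots) < limit:
        i = (5 * i + 1 + perturb) & mask   # same as % size for powers of two
        perturb >>= 5
        slots.append(i)
    return slots

# Once perturb reaches 0, the recurrence visits every slot exactly once.
print(probe_sequence(0, 8, 8))    # [0, 1, 6, 7, 4, 5, 2, 3]

# 1 and 33 share the same low bits, so they start in the same slot,
# but the high bits of 33 steer its probes elsewhere.
print(probe_sequence(1, 8, 3))    # [1, 7, 4]
print(probe_sequence(33, 8, 3))   # [1, 7, 5]
```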
Hash randomization
hash(str) and hash(bytes) use a per-process random seed (since 3.3) to mitigate algorithmic-complexity DoS. PYTHONHASHSEED=0 disables it, useful for reproducibility but unsafe in production.
import os
os.environ.get('PYTHONHASHSEED') # None when unset: the seed is random per process
hash("foo") # different across processes
hash(int) is the integer itself (mod a prime). hash(-1) is special-cased to -2 because -1 signals errors in C.
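These integer-hash facts can be checked in a few lines. The modulus is read from sys.hash_info rather than hard-coded, since it depends on the build:

```python
import sys

M = sys.hash_info.modulus     # 2**61 - 1 on typical 64-bit builds
print(hash(1), hash(-1), hash(-2))   # 1 -2 -2
print(hash(M), hash(M + 1))          # 0 1
```

Note that hash(-1) and hash(-2) collide, and so do 0 and M: small integers hash to themselves, but large ones wrap around the prime.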
Worst case
Adversarial keys with colliding hashes degrade to O(N) per operation. Hash randomization defeats the basic attack but custom __hash__ returning a constant still breaks it.
class Bad:
    def __hash__(self): return 0
    def __eq__(self, other): return False # never equal, so every insert collides
d = {}
for i in range(1000):
    d[Bad()] = i # O(N) per insert → O(N²) total
Complexity table
| Operation | Avg | Worst |
|---|---|---|
| d[k] | O(1) | O(N) |
| d[k] = v | O(1) amortized | O(N) |
| del d[k] | O(1) | O(N) |
| k in d | O(1) | O(N) |
| iter | O(N) | O(N) |
Interview framing
“Why are Python dicts ordered?”
Compact dict (3.6 CPython) stored entries in insertion order in a dense array, with a sparse index. The ordering was an implementation detail, then promoted to a language guarantee in 3.7.
8. Set Internals
set and frozenset are open-addressed hash tables, mechanically the same as dict minus the value column. Same complexity table, same adversarial caveats.
s = {1, 2, 3}
s | {4} # union, O(len(self) + len(other))
s & {2, 3, 5} # intersection, O(min(len(self), len(other)))
s - {2} # difference, O(len(self))
Sets are not insertion-ordered. Do not rely on iteration order.
9. String Internals — Interning, Encoding, bytes vs str
Python 3 strings (str) are immutable Unicode code-point sequences. CPython stores them as one of:
- Latin-1 (1 byte/char) when all code points fit.
- UCS-2 (2 bytes/char) up to U+FFFF.
- UCS-4 (4 bytes/char) for the full range.
This is PEP 393 (“flexible string representation”). A string with a single emoji is 4× the bytes per char of a pure-ASCII string of the same length.
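The three width tiers can be measured from Python. This sketch is CPython-specific and uses a hypothetical bytes_per_char helper that compares two lengths of the same character so the fixed object header cancels out:

```python
import sys

def bytes_per_char(ch):
    """Estimate storage per code point by comparing two string lengths."""
    return (sys.getsizeof(ch * 1001) - sys.getsizeof(ch)) // 1000

# ASCII letter, euro sign (U+20AC), emoji (U+1F600)
widths = [bytes_per_char(c) for c in ("a", "\u20ac", "\U0001F600")]
print(widths)  # [1, 2, 4]
```

The width is chosen per string from its widest code point, which is why one emoji quadruples the storage for every character in that string.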
Interning
Short strings that look like identifiers are auto-interned. Equal interned strings share the same object → is works (by accident).
"hello" is "hello" # True (CPython, syntactic literals)
a = "hello world"
b = "hello world"
a is b # CPython: True when both literals are compiled together in one file, often False in the REPL. DO NOT RELY ON THIS
sys.intern(s) forces interning for runtime-built strings; speeds up dict lookups when the same key is used many times.
Concat in loop is O(N²)
s = ""
for c in chars:
    s += c # creates a new string each time
CPython has a special-case optimization that sometimes makes this O(N) (when the refcount of s is 1 and the allocator can extend in place), but it is not guaranteed and disappears under any other reference. Use "".join(chars).
bytes vs str
bytes is an immutable byte sequence. str is a Unicode sequence. They do not implicitly convert in Python 3.
b"abc" + "xyz" # TypeError
b"abc".decode("utf-8") # → "abc"
"abc".encode("utf-8") # → b"abc"
Network/file boundaries are bytes. Application logic is strings. Convert at the boundary, never in the middle.
Interview framing
“Why does s += c in a loop blow up?”
Strings are immutable, so each append allocates a new string and copies. CPython has an opportunistic in-place-extend hack that hides this in toy examples, but it’s fragile. Always "".join.
10. Hashing Protocol and __hash__ Contract
Two objects that compare equal must have the same hash. The reverse is not required.
class Point:
    def __init__(self, x, y): self.x, self.y = x, y
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        return hash((self.x, self.y))
If you define __eq__ and not __hash__, your class becomes unhashable (Python sets __hash__ to None). This is by design — overriding equality without hash is almost always a bug.
Mutable types (list, dict, set) are unhashable by default — their hash would have to change as they mutate, breaking the dict/set invariant.
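A quick sketch of the __eq__-without-__hash__ rule, using a hypothetical Money class:

```python
class Money:
    def __init__(self, cents):
        self.cents = cents
    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents
    # no __hash__ defined, so Python sets Money.__hash__ to None

print(Money.__hash__ is None)   # True
try:
    {Money(100)}                # building a set needs hash()
    hashable = True
except TypeError:
    hashable = False
print(hashable)                 # False
```

If the class is meant to be a dict key, define __hash__ over the same fields that __eq__ compares and keep those fields immutable.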
11. Mutable Default Arguments — The Most Famous Trap
def append_to(x, lst=[]):
    lst.append(x)
    return lst
append_to(1) # [1]
append_to(2) # [1, 2] ← !
Default values are evaluated once, when the def statement runs, and shared across calls. Idiomatic fix:
def append_to(x, lst=None):
    if lst is None: lst = []
    lst.append(x)
    return lst
Every Python interviewer asks this once a year. Get it right and move on.
12. Closures and Late Binding
Free variables in closures are looked up by name at call time, not captured by value at definition.
funcs = [lambda: i for i in range(3)]
[f() for f in funcs] # [2, 2, 2] — not [0, 1, 2]
Fix with default arg (evaluated at def):
funcs = [lambda i=i: i for i in range(3)]
[f() for f in funcs] # [0, 1, 2]
This is the same bug as JavaScript’s var i in a loop. Both languages punish late binding.
13. Concurrency — Threading vs Multiprocessing vs Asyncio
| Model | Parallelism | Use For | Cost |
|---|---|---|---|
| threading | Concurrent (GIL) | I/O-bound | Cheap threads (~MB stack), context switches |
| multiprocessing | Parallel (separate processes) | CPU-bound | Process startup, IPC pickling |
| asyncio | Concurrent (single thread) | High-fanout I/O | No OS threads; cooperative |
| concurrent.futures | Wraps either | Convenient API | (depends on the executor) |
Picking
- 10K simultaneous network connections? asyncio.
- 10 simultaneous network calls? threading (or asyncio).
- Heavy NumPy computation? threading, because NumPy releases the GIL.
- Pure-Python CPU work? multiprocessing, or write the hot loop in C/Cython/Numba.
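For the "10 simultaneous network calls" case, a minimal sketch with concurrent.futures. The fake_call function is a hypothetical stand-in for a real request; it sleeps instead of touching the network:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_call(i):
    """Stand-in for a network request (hypothetical): sleeps, then returns a value."""
    time.sleep(0.05)      # time.sleep releases the GIL, like a blocking socket read
    return i * i

with ThreadPoolExecutor(max_workers=10) as pool:
    # map() returns results in input order, regardless of completion order
    results = list(pool.map(fake_call, range(10)))

print(results)
```

Because the waits overlap, the ten calls take roughly as long as one. Swapping in ProcessPoolExecutor keeps the same API for CPU-bound work, at the cost of pickling arguments and results.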
14. AsyncIO Model — Event Loop, Coroutines, Don’t Block The Loop
asyncio runs an event loop in one thread. Coroutines (async def) hand control back to the loop when an await actually has to wait, and the loop then resumes another coroutine that is ready.
import asyncio
async def fetch(url):
    print(f"start {url}")
    await asyncio.sleep(1) # yields control
    print(f"done {url}")
async def main():
    await asyncio.gather(*[fetch(u) for u in ["a", "b", "c"]])
asyncio.run(main())
# All three start, all three finish ~1s later — concurrent on one thread.
Blocking the loop
If you do CPU work or call a sync blocking I/O function inside a coroutine, the entire loop stalls. Symptom: latency spikes for everyone.
import asyncio, time
import aiohttp, requests # third-party
async def bad():
    time.sleep(1) # blocking: stalls the loop
    requests.get("http://x") # blocking: same problem
async def good():
    await asyncio.sleep(1)
    async with aiohttp.ClientSession() as s:
        await s.get("http://x")
For unavoidable blocking work: await loop.run_in_executor(None, blocking_fn).
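A minimal sketch of that pattern, assuming the default thread-pool executor is good enough. blocking_fn and heartbeat are hypothetical names; the heartbeat exists only to show that the loop keeps scheduling other coroutines while the blocking call runs elsewhere:

```python
import asyncio
import time

def blocking_fn():
    """Hypothetical blocking call, e.g. a synchronous driver or library."""
    time.sleep(0.1)
    return "done"

async def heartbeat(beats):
    for _ in range(5):
        await asyncio.sleep(0.01)
        beats.append(time.monotonic())

async def main():
    loop = asyncio.get_running_loop()
    beats = []
    result, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_fn),   # runs in the default thread pool
        heartbeat(beats),                          # keeps running on the loop meanwhile
    )
    return result, len(beats)

result, beat_count = asyncio.run(main())
print(result, beat_count)   # done 5
```

On Python 3.9+, asyncio.to_thread(blocking_fn) is a shorter spelling of the same idea.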
Interview framing
“What’s the difference between asyncio and threading?”
Threading is preemptive multi-tasking by the OS, threads share memory, GIL serializes. Asyncio is cooperative single-thread; coroutines must await to yield. Both win on I/O. Asyncio scales to more in-flight ops because there’s no per-task OS thread.
15. Multiprocessing — Fork vs Spawn, Pickling
multiprocessing creates separate Python processes. Each has its own GIL → real parallelism.
Start methods
| Method | Default On | Cost | Caveat |
|---|---|---|---|
| fork | Linux and other POSIX (except macOS) through 3.13 | Cheap | Copy-on-write; not safe with threads, locks, or libraries that aren’t fork-safe (e.g. CUDA). |
| spawn | macOS (since 3.8), Windows | Slow (re-imports) | All args must be picklable. |
| forkserver | Linux and other POSIX (except macOS) from 3.14 | Mid | Compromise: a clean server process forks workers on request; args must still be picklable. |
Pickling
Args and return values cross process boundaries via pickle. Lambdas, local functions, and many file/socket objects are not picklable.
from multiprocessing import Pool
def square(x): return x * x # top-level, so picklable
if __name__ == "__main__": # required under spawn/forkserver, which re-import this module
    with Pool(4) as p:
        print(p.map(square, range(10)))
Shared memory
multiprocessing.shared_memory.SharedMemory (Python 3.8+) for zero-copy NumPy/byte sharing. Avoids the pickling round-trip for big arrays.
16. NumPy / Vectorization
NumPy stores numbers in contiguous C arrays of native types, not as PyObject*. Operations dispatch to optimized C/SIMD that releases the GIL.
import numpy as np
a = np.arange(1_000_000)
b = a * 2 + 1 # vectorized, ~ms; pure Python equivalent ~100ms
A for loop over a NumPy array is the worst of both worlds: Python overhead per iteration, no SIMD. If the operation has no NumPy expression, fall back to Numba, Cython, or a C extension.
This is a sneaky interview line: “Implement vector dot product without NumPy” → straightforward Python, then “now optimize” → vectorize, then “now scale” → talk about BLAS underneath NumPy.
17. Common Interview Gotchas
Integer caching
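a = 256; b = 256
a is b # True: CPython caches small ints -5..256
a = int("257"); b = int("257") # built at runtime, outside the cache
a is b # False in CPython. Two equal literals in one compiled unit may still share an object, so never use `is` to compare values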
is vs ==
is is identity (same object). == is equality (__eq__). Use == for values. Use is for None, True, False, and singletons.
Sort stability
sorted() and list.sort() use Timsort — stable. You can sort by multiple keys via successive stable sorts (least significant first).
data = [("a", 2), ("b", 1), ("a", 1)]
data.sort(key=lambda x: x[1])
data.sort(key=lambda x: x[0]) # stable preserves the previous order on ties
dict.get default mutation
d = {}
d.setdefault("k", []).append(1) # appends in place (the [] default is still built on every call)
# vs
d["k"] = d.get("k", []) + [1] # copies the whole list each time: quadratic for many appends
Use collections.defaultdict(list) if you append a lot.
Truthy surprises
bool([]) # False
bool([0]) # True (non-empty list)
bool(0.0) # False
bool("False") # True (non-empty string)
18. Recursion Limits
import sys
sys.getrecursionlimit() # default 1000
sys.setrecursionlimit(10000)
Python frames are heap-allocated but each is non-trivial (~500 bytes). Setting the limit too high crashes the interpreter on stack overflow of the C stack.
Convert deep recursion to iteration with an explicit stack (Phase 1, Lab 8). This isn’t optional in interviews — recursion depth = N in tree problems with skewed inputs is real.
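A minimal sketch of the conversion, assuming the tree is given as a hypothetical adjacency dict from node to list of children. The skewed input is deep enough that a naive recursive version would exceed the default recursion limit:

```python
def depth_iterative(children, root):
    """Maximum depth of a tree, using an explicit stack instead of recursion."""
    best = 0
    stack = [(root, 1)]                  # (node, depth of that node)
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in children.get(node, ()):
            stack.append((child, depth + 1))
    return best

N = 50_000
skewed = {i: [i + 1] for i in range(N - 1)}   # a chain: 0 -> 1 -> ... -> N-1
print(depth_iterative(skewed, 0))             # 50000

balanced = {0: [1, 2], 1: [3, 4]}
print(depth_iterative(balanced, 0))           # 3
```

The explicit stack lives on the heap, so its size is bounded by available memory rather than by the interpreter's recursion limit.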
19. Performance Hot Tips
- Avoid attribute lookup in hot loops: bind to a local first.
    append = result.append # local: fastest opcode
    for x in data:
        append(transform(x))
- Built-ins are C. sum, min, max, any, all, map, sorted are written in C and beat hand-rolled loops.
- Comprehensions beat for + append. They skip the LOAD_ATTR for append.
- functools.lru_cache for memoization: drop-in, fast.
- String formatting: f-strings > % > .format() > concatenation.
- __slots__ for value classes with millions of instances.
- Profile before optimizing. cProfile, pyinstrument, py-spy (sampling, no code changes).
import cProfile
cProfile.run("expensive()", sort="cumulative")
20. Standard Library Essentials
collections
- Counter: multiset; most_common(k).
- deque: O(1) appends/pops at both ends. Use for queues and sliding windows.
- defaultdict: auto-vivifying dict.
- OrderedDict: historically the ordered one; today dict is ordered too (and reversible since 3.8). Reach for OrderedDict mainly for move_to_end, popitem(last=False), and order-sensitive equality.
- namedtuple: lightweight value class. dataclass(frozen=True, slots=True) is the modern alternative.
heapq
Min-heap on a list. No max-heap — negate values.
import heapq
h = []
heapq.heappush(h, 3)
heapq.heappush(h, 1)
heapq.heappop(h) # 1
heapq.nlargest(3, data) # k-largest
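The negation trick in practice, as a short sketch with illustrative values:

```python
import heapq

values = [5, 1, 9, 3, 7]

# Store negated values so the smallest stored item is the largest original.
max_heap = [-v for v in values]
heapq.heapify(max_heap)                   # O(N)

largest = -heapq.heappop(max_heap)        # 9
second = -heapq.heappop(max_heap)         # 7
print(largest, second)
```

For non-numeric priorities, push tuples such as (-priority, item) instead of negating the item itself.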
bisect
Binary search on a sorted list.
import bisect
i = bisect.bisect_left(sorted_arr, x) # insertion point
bisect.insort(sorted_arr, x) # O(N) insert
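A common interview use is counting occurrences in a sorted list with two binary searches. A small sketch with illustrative data:

```python
import bisect

sorted_arr = [1, 2, 2, 2, 5, 8]

lo = bisect.bisect_left(sorted_arr, 2)    # first index where 2 could go: 1
hi = bisect.bisect_right(sorted_arr, 2)   # index just past the last 2: 4
count = hi - lo                           # 3 occurrences, found in O(log N)
print(lo, hi, count)

# A missing value yields an empty range.
missing = bisect.bisect_right(sorted_arr, 4) - bisect.bisect_left(sorted_arr, 4)
print(missing)                            # 0
```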
itertools
- chain, cycle, repeat
- combinations, permutations, product
- accumulate (prefix sums!), groupby, pairwise (3.10+)
from itertools import accumulate, pairwise
list(accumulate([1, 2, 3, 4])) # [1, 3, 6, 10]
list(pairwise([1, 2, 3, 4])) # [(1,2), (2,3), (3,4)]
functools
- lru_cache, cache (3.9+).
- reduce.
- partial.
- cached_property.
What To Memorize Cold
- GIL releases on I/O and at the switch interval; pure-Python CPU work gets no parallel speedup from threads.
- dict is open-addressed, ordered since 3.7, hash-randomized.
- List growth ~1.125×. Append amortized O(1).
- str immutable, three width tiers (1/2/4 byte). Concat in loop ⇒ "".join.
- is vs ==. Integer cache -5..256.
- Mutable default args evaluated once.
- Closure late binding fix: lambda x=x: ….
- Threading for I/O, multiprocessing for CPU, asyncio for fanout.
- __hash__ must agree with __eq__.
- heapq is min-only.
If any of those is fuzzy, re-read this document. Then code something that breaks because of it, on purpose. That’s the lesson that sticks.
JavaScript & TypeScript Runtime Deep Dive
Target audience: candidates interviewing for frontend, full-stack, or Node.js backend roles where the interviewer probes “what does the event loop actually do?”, “explain this”, “why is typeof null === 'object'”, or “how does TypeScript narrow this union?”
Scope: V8 (Chrome / Node) primarily, with mentions of SpiderMonkey (Firefox) and JSC (Safari) where they diverge. TypeScript 5.x.
JS sits in an awkward place: senior interviewers know the language is full of warts and they will use them. Memorizing trivia is necessary but not sufficient. The leverage comes from understanding the engine model and the type system’s structural reasoning.
1. V8 Internals — Ignition + TurboFan
V8 (and similar engines) compile JS through a pipeline:
- Parser → AST.
- Ignition — bytecode interpreter. Fast startup.
- TurboFan — optimizing JIT. Profiles hot code, generates speculative machine code.
- Deoptimization — when speculation breaks (a function suddenly receives a different type), TurboFan bails back to Ignition.
function add(a, b) { return a + b; }
// Called 10000x with (number, number) → TurboFan compiles it to fast int add.
add("foo", "bar"); // Type changed → deopt, recompile or bail out.
Hidden classes (shapes / maps)
V8 tracks the structural “shape” of objects internally — what fields exist in what order. Each property add changes the hidden class. Two objects with the same hidden class share a fast property layout.
function Point(x, y) { this.x = x; this.y = y; }
const a = new Point(1, 2);
const b = new Point(3, 4);
// a and b share a hidden class → fast.
a.z = 5; // a's hidden class diverges from b's → slower.
Inline caches (ICs)
Property access (obj.x) is monitored. If the hidden class is consistent, the IC fast-paths to a direct memory offset. Polymorphic ICs (multiple shapes) are slower; megamorphic (>4) drops to a hash lookup.
Interview takeaway: initialize all properties in the constructor in the same order. Don’t add/delete properties dynamically in hot code.
2. Event Loop — Tasks, Microtasks, RAF
The browser/Node runs one JS thread. Concurrency comes from yielding back to the event loop.
┌─────────────────────────┐
│       Call Stack        │
└────────┬────────────────┘
         │ runs to completion
┌────────▼────────────────┐
│     Microtask Queue     │ ← Promises, queueMicrotask, MutationObserver
└────────┬────────────────┘
         │ drained fully
┌────────▼────────────────┐
│       Task Queue        │ ← setTimeout, setInterval, I/O, UI events
└─────────────────────────┘
Order of execution:
1. Run the current synchronous code to completion.
2. Drain the entire microtask queue.
3. Pick one task from the task queue.
4. Repeat from step 2.
console.log('a');
setTimeout(() => console.log('b'), 0);
Promise.resolve().then(() => console.log('c'));
console.log('d');
// a, d, c, b
Microtasks starve macrotasks if they keep enqueueing themselves:
function loop() { Promise.resolve().then(loop); }
loop(); // freezes the event loop — UI never repaints
requestAnimationFrame
Browser-only. Fires before the next paint, ~60fps. Use for animation; cheaper than setInterval(_, 16) because it’s coalesced with paint.
Node specifics
Node uses libuv. Its event loop has phases (timers, pending callbacks, poll, check, close). process.nextTick runs before microtasks (a Node-only queue with even higher priority than Promise microtasks).
process.nextTick(() => console.log('next'));
Promise.resolve().then(() => console.log('promise'));
// next, promise (in a CommonJS script; at the top level of an ES module the order flips)
3. async / await
async functions return a Promise. await desugars to .then.
async function f() {
  const a = await fetch1();
  const b = await fetch2(a);
  return b;
}
// equivalent to:
function f() {
  return fetch1().then(a => fetch2(a));
}
await x where x is not a thenable wraps it in Promise.resolve(x).
Sequential vs parallel
// Sequential (slow if independent):
const a = await op1();
const b = await op2();
// Parallel (correct when independent):
const [a, b] = await Promise.all([op1(), op2()]);
Default to Promise.all when the operations are independent — common interview ask.
Errors
try/catch works on await. An unhandled rejection in async context surfaces as unhandledrejection (browser) / process.on('unhandledRejection') (Node).
async function f() {
  try {
    await mayReject();
  } catch (e) {
    // handle
  }
}
4. Promise Gotchas
- A promise is not “the running operation”: it represents a value that will exist later. In most APIs the work has already started by the time you hold the Promise.
- Errors thrown inside .then callbacks become rejections of the chain.
- Forgetting return inside .then breaks chaining:
    p.then(x => {
      doSomething(x); // forgot return, so the next .then sees undefined
    }).then(use);
- Promise.all short-circuits on first rejection. Use Promise.allSettled to wait for all and inspect.
- Unhandled rejection is now noisy in Node and the browser. Always attach a .catch or await inside try/catch.
- Promise is not cancelable. The AbortController + AbortSignal pattern handles cancellation explicitly.
const ctrl = new AbortController();
fetch(url, { signal: ctrl.signal });
// later:
ctrl.abort();
5. Memory Model and GC
V8’s heap has generational GC:
- Young generation (Scavenger / semi-space copying): minor GC, very fast (~ms). Most objects die young.
- Old generation (Mark-sweep / mark-compact): major GC. Concurrent marking, parallel sweeping, incremental compaction.
Memory leaks in JS
- Unintentional globals: assigning to a name without let/const (in non-strict mode) creates a global, never collected.
- Closures: capturing large objects in long-lived callbacks.
- Event listeners: not removed when DOM nodes are detached.
- Timers: setInterval callbacks retain captured state until clearInterval is called.
- Detached DOM: references to removed DOM nodes from JS keep them alive.
- Map/Set: keys held strongly. Use WeakMap/WeakSet for “annotations on objects.”
Profiling
Chrome DevTools → Memory → heap snapshot, allocation timeline. Look for retained sizes and detached DOM trees.
6. Object Model — Prototypes
Every object has an internal [[Prototype]] (accessed via Object.getPrototypeOf or __proto__). Property lookup walks the prototype chain.
const a = { x: 1 };
const b = Object.create(a);
b.y = 2;
b.x; // 1 — looked up on a
Object.getPrototypeOf(b) === a; // true
prototype (the property) vs __proto__ (the link)
Foo.prototype is the object that becomes __proto__ of instances created with new Foo().
function Foo() {}
const f = new Foo();
f.__proto__ === Foo.prototype; // true
Foo.prototype.__proto__ === Object.prototype;
class
class is sugar over prototypes. The methods live on Foo.prototype.
class Foo {
greet() { return 'hi'; }
}
typeof Foo.prototype.greet; // 'function'
Object.create(null)
A “dictionary object” with no prototype — no inherited properties from Object.prototype. Useful as a hash map.
7. this Binding
JS binds this at the call site, not at definition. Five rules in priority order:
1. new: this = new instance.
2. call/apply/bind: explicit binding wins.
3. Method call (obj.f()): this = obj.
4. Plain call (f()): this = undefined in strict mode, global object otherwise.
5. Arrow functions: no own this; they inherit it from the surrounding lexical scope.
const obj = { x: 1, f() { return this.x; } };
const g = obj.f;
obj.f(); // 1
g(); // TypeError in strict mode (`this` is undefined); undefined in sloppy mode (`this` is the global object)
class C {
val = 42;
arrow = () => this.val; // bound to instance
method() { return this.val; }
}
const c = new C();
const a = c.arrow;
const m = c.method;
a(); // 42 — arrow captured `this`
m(); // TypeError — method lost `this`
8. Closures, var / let / const, Scope
A closure is a function plus the lexical environment it was created in.
function counter() {
let n = 0;
return () => ++n;
}
const c = counter();
c(); c(); c(); // 1, 2, 3
var (function-scoped, hoisted)
console.log(x); // undefined (hoisted, not initialized)
var x = 1;
let / const (block-scoped, TDZ)
console.log(y); // ReferenceError — TDZ
let y = 1;
The “Temporal Dead Zone” is the period between block entry and the let/const declaration. Accessing the binding in TDZ throws.
Loop var capture
for (var i = 0; i < 3; i++) setTimeout(() => console.log(i), 0);
// 3 3 3 — single `i`, all callbacks share it
for (let i = 0; i < 3; i++) setTimeout(() => console.log(i), 0);
// 0 1 2 — fresh `i` per iteration
This is the classic JS interview question. Use let.
9. Equality
- === strict: same type and value, with NaN !== NaN and +0 === -0.
- == loose: type coercion. Don’t use it except for x == null (matches both null and undefined).
- Object.is(a, b): like ===, but Object.is(NaN, NaN) === true and Object.is(+0, -0) === false.
NaN === NaN; // false
Object.is(NaN, NaN); // true
1 == '1'; // true (coercion)
[] == false; // true (!)
[] == ![]; // true (!)
The == rules are an interview trap. Memorize only the null/undefined exception; use === everywhere else.
10. Top Gotchas
typeof null === 'object'
A historical bug, never fixed for compatibility. Test for null with === null.
parseInt radix
parseInt('010'); // 10 in modern engines (used to be 8)
parseInt('010', 10); // always 10 — pass the radix.
Always pass 10. ESLint’s radix rule can enforce this.
Floating-point
0.1 + 0.2; // 0.30000000000000004
0.1 + 0.2 === 0.3; // false
Use Number.EPSILON tolerance, or bigint for exact integer arithmetic.
Array coercion
[] + []; // ''
[] + {}; // '[object Object]'
{} + []; // 0 (in some contexts — `{}` parsed as block)
Don’t + non-numbers. Use template literals or explicit String(x).
== and falsy
0 == ''; // true
0 == '0'; // true
'' == '0'; // false
This is why === exists.
for...in vs for...of
- for (const k in obj) iterates enumerable string-keyed properties (including inherited ones!).
- for (const v of iterable) iterates the iterable’s values.
const arr = [1, 2, 3];
for (const i in arr) console.log(i); // '0' '1' '2' — strings, indexes
for (const v of arr) console.log(v); // 1 2 3
Don’t use for...in on arrays.
delete on array
delete arr[i] leaves a hole (sparse array), doesn’t shorten. Use splice.
11. Map vs Object
| | Object | Map |
|---|---|---|
| Keys | Strings & symbols | Anything |
| Iteration | Object.keys/entries; integer-like keys first (ascending), then string keys in insertion order | Insertion order |
| Size | Object.keys(o).length (O(N)) | m.size (O(1)) |
| Inheritance pollution | Yes (__proto__, toString…) | No |
| JSON | Yes | No (need conversion) |
Use Map when:
- Keys are dynamic strings (esp. user input).
- You need any-typed keys.
- Insertion order matters.
- You add/remove keys frequently.
Use Object when:
- Keys are known compile-time / config-shaped.
- You’ll JSON-serialize.
- You’re using TypeScript’s structural types.
12. Set, WeakMap, WeakSet
- Set: collection of unique values; insertion order.
- WeakMap: keys are objects, weakly held. If the key is GC’d, the entry disappears. Not iterable. Use for “annotations on objects.”
- WeakSet: set of weakly-held objects. Use for “have I seen this object?” without preventing GC.
const tags = new WeakMap();
function tag(node, value) { tags.set(node, value); }
// when `node` is GC'd, the tag is gone too.
Practical use: caches keyed on DOM nodes, private state on objects, libraries that observe but don’t own.
WeakRef (newer)
new WeakRef(obj) lets you hold a weak reference and dereference it (.deref()) later, getting the object or undefined if collected. Niche — you probably don’t need it.
13. TypeScript — Structural Typing & Generics
TypeScript types are structural. If two types have the same shape, they’re compatible.
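interface Named { name: string }
const person = { name: 'a', age: 30 };
const u: Named = person; // OK: compatible shape, extra props allowed when assigning a variable
// const v: Named = { name: 'a', age: 30 }; // error: excess property check on a fresh object literal
function greet(p: Named) { return p.name; }
greet(person); // OK, no cast needed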
Generics
function identity<T>(x: T): T { return x; }
identity<number>(42);
identity('hi'); // T inferred as string
Constraints
function len<T extends { length: number }>(x: T): number { return x.length; }
Conditional types
type IsString<T> = T extends string ? true : false;
type A = IsString<'hi'>; // true
type B = IsString<42>; // false
Mapped types
type Partial<T> = { [K in keyof T]?: T[K] };
type Readonly<T> = { readonly [K in keyof T]: T[K] };
Utility types
Partial<T>, Required<T>, Pick<T, K>, Omit<T, K>, Record<K, V>, ReturnType<F>, Parameters<F>, Awaited<T>. Memorize the names; they come up.
14. TS Narrowing
The control-flow analyzer narrows union types based on runtime checks.
function f(x: string | number) {
if (typeof x === 'string') {
x.toUpperCase(); // narrowed to string
} else {
x.toFixed(2); // narrowed to number
}
}
Narrowing operators
- typeof → "string", "number", "boolean", "undefined", "object", "function", "symbol", "bigint".
- instanceof → for class instances.
- in operator: if ('foo' in obj).
- Equality: if (x === null), if (x === undefined).
- Discriminated unions:
type Result = { ok: true; value: string } | { ok: false; error: Error };
function f(r: Result) {
  if (r.ok) r.value; // OK
  else r.error; // OK
}
- User-defined type guards:
function isString(x: unknown): x is string { return typeof x === 'string'; }
- Assertion functions:
function assertNumber(x: unknown): asserts x is number {
  if (typeof x !== 'number') throw new Error('not a number');
}
Exhaustiveness with never
type Shape = { kind: 'circle' } | { kind: 'square' };
function area(s: Shape): number {
switch (s.kind) {
case 'circle': return ...;
case 'square': return ...;
default: const _: never = s; throw new Error('unreachable');
}
}
The never assignment fails to compile if a new variant is added — catches missing cases.
15. Performance Tips
- Stable hidden classes: set all properties in the constructor in the same order. Don’t add later.
- Avoid delete on hot objects; it transitions them to dictionary mode.
- Monomorphic functions: call them with the same shapes. Polymorphic = slower.
- Typed arrays for numeric work: Float64Array, Int32Array. Pre-allocated, contiguous, no boxing.
- Avoid arguments in hot code; use ...rest. arguments defeats some optimizations.
- for over forEach in hot loops: slightly faster, no callback overhead. Less true with modern engines but still measurable on tight loops.
- Pre-compile regexes: declare at module scope, not inside functions.
- try/catch in hot functions used to block optimization on old V8 (pre-2017). Modern V8 handles it; not a real concern anymore.
- Profile before optimizing. Chrome DevTools Performance tab; Node --prof and clinic.js.
- Reduce object churn: V8 likes long-lived monomorphic objects.
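// Bad: property order varies, so the two branches produce different shapes:
function pt(flip) { return flip ? { y: 2, x: 1 } : { x: 1, y: 2 }; }
// Better: one constructor, one property order, one shape V8 can specialize:
class Pt { constructor(x, y) { this.x = x; this.y = y; } }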
16. Node vs Browser
| | Node | Browser |
|---|---|---|
| Globals | process, Buffer, __dirname, global | window, document, navigator |
| Modules | CommonJS (require) + ESM | ESM + bundlers |
| I/O | libuv: fs, net, dns, child_process | fetch, Web APIs |
| DOM | None (use jsdom if needed) | Yes |
| Threads | worker_threads, cluster | Worker, SharedArrayBuffer |
libuv thread pool
Node uses a thread pool (default 4) for fs, dns.lookup, crypto.pbkdf2, etc. — anything that can’t be epoll-ed.
UV_THREADPOOL_SIZE=16 node app.js
Network I/O is not on the thread pool; it’s on the event loop using async syscalls.
worker_threads vs cluster
- worker_threads: separate JS thread, separate event loop, can share memory via SharedArrayBuffer. Use for CPU-bound work.
- cluster: multiple processes, no shared memory, IPC via channels. Use for scaling HTTP servers across cores.
process vs globalThis
globalThis (ES2020) is the universal global object — works in browser, Node, workers.
17. What To Memorize Cold
- V8 pipeline: Ignition (bytecode) → TurboFan (optimizing JIT). Hidden classes + ICs drive speed. Property order matters.
- Event loop: sync runs to completion → drain microtasks → one task → repeat. Microtasks include Promises, queueMicrotask.
- Node nextTick > Promise microtask > timers/IO.
- async/await desugars to Promises. Use Promise.all for independent work.
- Promises: not cancelable, errors → rejection, must catch. AbortController for cancellation.
- GC: generational. Leaks: globals, closures, listeners, timers, detached DOM. WeakMap/WeakSet for object-keyed metadata.
- Prototype chain: __proto__ link, prototype property on functions/classes. class is sugar.
- this rules: new > call/apply/bind > method > default. Arrow inherits lexical.
- var hoisted/function-scoped, let/const block-scoped + TDZ. Loop var capture: use let.
- Equality: === always; Object.is for NaN/±0; == only for x == null.
- Top traps: typeof null is "object", parseInt radix, FP math, for...in vs for...of, delete on array.
- Map vs Object: Map for dynamic keys, any types, ordered, fast size. WeakMap for object-keyed weak metadata.
- TS structural typing. Utility types: Partial/Required/Pick/Omit/Record/ReturnType. Conditional + mapped types.
- TS narrowing: typeof, instanceof, in, discriminated unions, type guards, assertion functions, never for exhaustiveness.
- Perf: stable hidden classes, monomorphic call sites, typed arrays for numeric, pre-compile regex, profile.
- Node: libuv thread pool for fs/dns/crypto. worker_threads (CPU) vs cluster (HTTP scale).
JS is forgiving until it isn’t. The interviewer will test the spots where it isn’t. Fluency on the event loop, this, equality, and TypeScript narrowing usually decides senior-level outcomes.
Java Runtime Deep Dive
Target audience: candidates interviewing in Java at FAANG, finance, Android, or any backend role where the interviewer is allowed to probe “what does the JVM do here?”
Scope: HotSpot JVM (OpenJDK 21+ baseline), with notes on JDK 17 LTS where behavior diverges. Other JVMs (OpenJ9, GraalVM native-image) are noted only where they change interview-grade answers.
Java’s verbosity makes it easy to mistake “I write Java daily” for “I know Java.” A senior interviewer will quickly find the gap by asking what Integer i = 200; Integer j = 200; i == j returns, why your HashMap has O(log N) worst-case lookup since Java 8, and what volatile actually guarantees. This guide closes the gap.
1. JVM Architecture
The JVM is a stack-based virtual machine with a tiered execution model.
.java ── javac ──► .class (bytecode)
│
▼
┌─────────────────────┐
│ Class Loader │ (Bootstrap → Platform → App)
└────────┬────────────┘
▼
┌─────────────────────┐
│ Runtime Data Areas │ Heap, Method Area / Metaspace,
│ │ Stacks (per-thread), PC Reg, Native Stack
└────────┬────────────┘
▼
┌─────────────────────┐
│ Execution Engine │ Interpreter ↔ C1 (client) ↔ C2 (server)
│ │ + Tiered Compilation + OSR
└─────────────────────┘
Tiered compilation (default since Java 8): hot methods are compiled by C1 (fast, lower-quality code) and after enough invocations re-compiled by C2 (slower, high-quality, profile-guided). OSR (on-stack replacement) lets a long-running interpreted loop be replaced by JITed code mid-flight.
// Hot loop — JIT will inline, unroll, and vectorize this.
long sum = 0;
for (int i = 0; i < 1_000_000_000; i++) sum += i;
To see what the JIT does:
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Main                       # needs hsdis
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining Main  # PrintInlining is a diagnostic flag
Class loaders
Three default loaders form a delegation chain:
- Bootstrap: loads java.base (rt.jar in old days).
- Platform: loads selected Java SE and JDK modules outside java.base (for example java.sql).
- App: loads your classpath.
Delegation rule: every loader asks its parent first. This is why you cannot shadow java.lang.String with your own.
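The chain can be inspected at runtime. The following is a minimal sketch, assuming a standard HotSpot JVM on JDK 9 or later; the class name LoaderChain and the helper describe are illustrative and not part of any API. The reflection API reports the bootstrap loader as null, which is why the helper maps null to a label.

```java
// Illustrative sketch (LoaderChain is a made-up name): print the built-in loader chain.
final class LoaderChain {
    // The API reports the bootstrap loader as null, so map that to a readable label.
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap" : String.valueOf(loader.getName());
    }

    public static void main(String[] args) {
        // Core classes such as String come from the bootstrap loader.
        System.out.println("String    -> " + describe(String.class.getClassLoader()));
        // Application classes come from the app (system) loader; walk up its parents.
        for (ClassLoader cl = ClassLoader.getSystemClassLoader(); cl != null; cl = cl.getParent()) {
            System.out.println("app chain -> " + describe(cl));
        }
    }
}
```

Because every lookup starts at the top of this chain, a java.lang.String placed on your classpath is never consulted.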
Interview framing
“What happens when I call a Java method?”
The bytecode invokevirtual looks up the method via the receiver’s class vtable, the interpreter executes it, profile data accumulates, and after a threshold the JIT compiles a specialized version. Subsequent calls jump to native code.
2. Memory Regions
| Region | Per-thread? | Holds | GC? |
|---|---|---|---|
| Heap (Young + Old) | No | All Java objects | Yes |
| Metaspace | No | Class metadata, method bytecode | Yes (rare) |
| JVM Stack | Yes | Frames: locals, operand stack, return | No (LIFO) |
| PC Register | Yes | Current bytecode index | No |
| Native Stack | Yes | C stack for JNI / runtime | No |
| Code Cache | No | JIT-compiled native code | Evicted |
Heap structure under generational collectors (G1, Parallel, Serial):
Young Generation: Eden + Survivor 0 + Survivor 1
Old Generation: tenured objects
Allocation goes to Eden (bump-pointer in a thread-local allocation buffer — TLAB — so it’s lock-free). When Eden fills, a minor GC moves live objects to a Survivor space. Objects that survive enough minor GCs are promoted to Old.
// Each call allocates in Eden — extremely fast (just bump a pointer in TLAB).
List<Integer> tmp = new ArrayList<>();
Common JVM flags
-Xms2g -Xmx2g # initial / max heap
-XX:MetaspaceSize=256m
-XX:+UseG1GC # default since 9
-XX:MaxGCPauseMillis=200 # G1 pause target
-XX:+HeapDumpOnOutOfMemoryError
Interview framing
“Where does new ArrayList<>() live?”
In Eden, on the heap. The reference variable lives in the current frame on the JVM stack. Allocation is bump-pointer in a TLAB; collection is generational copying.
3. Garbage Collectors
Java has many GCs. Know these five:
| GC | When | Pause | Trade-off |
|---|---|---|---|
| Serial | Tiny heaps / single-CPU | Stop-the-world | Simplest, smallest |
| Parallel (Throughput) | Batch jobs | STW, multi-thread | Max throughput, ignores pause |
| G1 (default) | General server | Soft target ms-scale | Balances throughput + pause |
| ZGC | Low-latency services | Sub-ms (since 21 generational) | Concurrent, region-based |
| Shenandoah | RH-flavored ZGC analog | Sub-ms | Concurrent, region-based |
G1 in 90 seconds
Heap is split into ~2000 equal-size regions of 1–32 MB. Regions are tagged Eden / Survivor / Old / Humongous. G1 maintains a remembered set per region tracking incoming references so it can collect a subset of regions (“the collection set”) without scanning the whole heap. MaxGCPauseMillis is a soft goal, not a hard bound; G1 picks regions to maximize freed space within that budget.
Young collection: evacuate Eden + Survivor → new Survivor / Old
Mixed collection: young + selected old regions
Concurrent mark: tracks live objects in Old without stopping the app
Full GC: fallback STW; means you've misconfigured
A Full GC in G1 is a sign of trouble — usually too small a heap, humongous-allocation churn, or metaspace pressure.
ZGC / Shenandoah
Both are concurrent and region-based. ZGC uses colored pointers (metadata bits stored in the reference itself); Shenandoah relies on forwarding pointers and load barriers instead. Pause times stay around or below a millisecond, largely independent of heap size (TB-scale heaps tested). Trade-off: higher CPU and slightly lower throughput. Generational ZGC (Java 21) narrows the throughput gap.
Tuning lever, not algorithm
In an interview: say which collector and why, not “I tuned the GC.” If you tuned, name the flag and the metric you watched.
4. Object Model — Headers, Autoboxing, Integer Cache
Every Java object has a header (12 or 16 bytes depending on compressed oops + alignment) before its fields. An int field costs 4 bytes; an Integer reference costs 4 bytes (compressed oops) + the boxed object’s overhead (≈16 bytes).
// Roughly:
// int[1_000_000] ≈ 4 MB
// Integer[1_000_000] ≈ 20 MB (4 MB array + ~16 MB of boxed Integers)
Autoboxing
Java silently converts int ↔ Integer. Each unbox can throw NullPointerException if the wrapper is null.
Integer x = null;
int y = x; // NPE — boxed → primitive deref
Integer cache
Integer.valueOf(i) caches -128..127 (and the cache upper bound is tunable via -XX:AutoBoxCacheMax). This produces the most-asked Java gotcha in history:
Integer a = 100, b = 100;
System.out.println(a == b); // true — cached, same object
Integer c = 200, d = 200;
System.out.println(c == d); // false — new objects, == compares references
System.out.println(c.equals(d)); // true
Always use .equals() for boxed numbers. == on object references checks identity.
Interview framing
“Why does Integer == Integer sometimes work and sometimes not?”
Integer.valueOf caches small values; large values create new objects; == is reference identity. Use .equals (or .intValue() ==).
5. Primitives vs Wrappers
| Primitive | Bits | Wrapper | Default |
|---|---|---|---|
| boolean | 1 (impl-defined) | Boolean | false |
| byte | 8 | Byte | 0 |
| short | 16 | Short | 0 |
| char | 16 | Character | '\u0000' |
| int | 32 | Integer | 0 |
| long | 64 | Long | 0L |
| float | 32 | Float | 0.0f |
| double | 64 | Double | 0.0d |
Generics cannot use primitives → List<int> is illegal. Use List<Integer> (slow, boxed) or specialized libs (IntStream, Eclipse Collections, fastutil) for hot paths.
// Hot loop on primitives — JIT loves this.
long sum = 0;
for (int i : intArray) sum += i;
// Same loop on Integer — boxing in/out, GC pressure.
long sum = 0;
for (Integer i : integerList) sum += i;
Project Valhalla (preview) introduces value classes that erase the wrapper overhead. Not yet shippable; mention only if the interviewer raises it.
Overflow
Integer arithmetic wraps silently:
int x = Integer.MAX_VALUE + 1; // -2147483648 — no exception
Math.addExact(Integer.MAX_VALUE, 1); // throws ArithmeticException
In interviews involving sums, products, or mid = (lo + hi) / 2, always consider overflow and prefer mid = lo + (hi - lo) / 2.
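To make the midpoint advice concrete, here is a minimal sketch of a binary search using the safe form, next to the naive form for comparison. The class and method names (SafeMid, naiveMid, safeMid) are illustrative only.

```java
// Illustrative sketch: binary search with an overflow-safe midpoint.
final class SafeMid {
    // Returns the index of target in the sorted array, or -1 if absent.
    static int search(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // stays within [lo, hi], so it cannot overflow
            if (sorted[mid] < target) lo = mid + 1;
            else if (sorted[mid] > target) hi = mid - 1;
            else return mid;
        }
        return -1;
    }

    // The naive form wraps to a negative number once lo + hi exceeds Integer.MAX_VALUE.
    static int naiveMid(int lo, int hi) { return (lo + hi) / 2; }

    static int safeMid(int lo, int hi) { return lo + (hi - lo) / 2; }

    public static void main(String[] args) {
        System.out.println(search(new int[] {1, 3, 5, 7}, 5));      // 2
        System.out.println(naiveMid(2_000_000_000, 2_100_000_000)); // negative: the sum overflowed
        System.out.println(safeMid(2_000_000_000, 2_100_000_000));  // 2050000000
    }
}
```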
6. Collections Framework
| Interface | Implementations | Note |
|---|---|---|
| List | ArrayList, LinkedList | Use ArrayList by default |
| Set | HashSet, LinkedHashSet, TreeSet | LinkedHash preserves insertion order |
| Map | HashMap, LinkedHashMap, TreeMap, ConcurrentHashMap | TreeMap is a red-black tree |
| Queue / Deque | ArrayDeque, LinkedList, PriorityQueue | ArrayDeque > LinkedList for stacks/queues |
ArrayList
Backed by an Object[]. Growth is 1.5× ((oldCap >> 1) + oldCap). Append amortized O(1). add(0, x) is O(N).
ArrayList<Integer> l = new ArrayList<>(1_000_000); // pre-size to avoid resizes
HashMap
Separate chaining: each bucket is a linked list. Treeification (Java 8+): when a bucket has ≥ 8 entries and table size ≥ 64, the bucket converts to a red-black tree; back to a list at ≤ 6 entries.
// Worst-case lookup pre-Java-8: O(N). Post-Java-8: O(log N).
HashMap<String, Integer> m = new HashMap<>();
Default load factor 0.75, default capacity 16. put triggers resize() (allocate new table, rehash all entries) when size > capacity * loadFactor.
Hash function mixes the user hashCode() with (h ^ (h >>> 16)) to defend against weak hashes.
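A small sketch of why the mixing step matters. It mirrors the idea described above rather than copying JDK source; HashSpread, spread, and index are illustrative names. With a power-of-two table, only the low bits select the bucket, so hash codes that differ only in their high bits would otherwise collide.

```java
// Illustrative sketch of the bit-spreading idea (not a copy of JDK source).
final class HashSpread {
    // XOR the high 16 bits into the low 16 bits so small tables still see high-bit entropy.
    static int spread(int h) { return h ^ (h >>> 16); }

    // Bucket index for a power-of-two table size: keeps only the low bits.
    static int index(int hash, int tableSize) { return (tableSize - 1) & hash; }

    public static void main(String[] args) {
        int a = 0x10000, b = 0x20000; // differ only above bit 15
        System.out.println(index(a, 16) + " " + index(b, 16));                 // 0 0 (collision)
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16)); // 1 2 (separated)
    }
}
```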
// hashCode contract: equal objects → equal hashCodes.
// Bad: forgetting hashCode when overriding equals
@Override public boolean equals(Object o) { ... }
// must override hashCode too
@Override public int hashCode() { return Objects.hash(...); }
LinkedHashMap
HashMap + doubly-linked list across entries. Iteration order = insertion (or access, with accessOrder=true). The 5-line LRU cache:
class LRU<K, V> extends LinkedHashMap<K, V> {
private final int cap;
LRU(int cap) { super(cap, 0.75f, true); this.cap = cap; }
@Override protected boolean removeEldestEntry(Map.Entry<K, V> e) {
return size() > cap;
}
}
TreeMap
Red-black tree → all ops O(log N), supports firstKey, floorKey, ceilingKey, subMap — irreplaceable for ordered queries.
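The ordered queries are easiest to remember from a tiny example. This sketch uses made-up keys and an illustrative class name (TreeMapQueries); the methods called are the standard NavigableMap ones named above.

```java
import java.util.TreeMap;

// Illustrative sketch: the ordered queries that a HashMap cannot answer.
final class TreeMapQueries {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> m = new TreeMap<>();
        m.put(10, "a");
        m.put(20, "b");
        m.put(30, "c");
        return m;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> m = sample();
        System.out.println(m.firstKey());      // 10
        System.out.println(m.floorKey(25));    // 20: greatest key <= 25
        System.out.println(m.ceilingKey(25));  // 30: least key >= 25
        System.out.println(m.floorKey(5));     // null: nothing <= 5
        System.out.println(m.subMap(10, 30));  // {10=a, 20=b}: upper bound is exclusive
    }
}
```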
PriorityQueue
Binary min-heap on an array. add / poll O(log N), peek O(1). Iteration order is not sorted — only the head is.
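Because only the head is ordered, the way to get sorted output is to poll repeatedly rather than iterate. A minimal sketch (HeapOrder and drain are illustrative names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch: poll() yields ascending order; iterating the queue does not promise that.
final class HeapOrder {
    static List<Integer> drain(Collection<Integer> xs) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(xs);
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) out.add(pq.poll()); // each poll removes the current minimum
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(Arrays.asList(5, 1, 4, 2, 3))); // [1, 2, 3, 4, 5]
    }
}
```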
ConcurrentHashMap (Java 8+)
Lock-free reads, fine-grained synchronization on writes (CAS + synchronized per bucket). Replaces Hashtable (legacy; every method takes one table-wide lock) and Collections.synchronizedMap (one big lock).
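The per-key atomic methods (merge, compute, computeIfAbsent) are what make the map safe for read-modify-write without external locking. The sketch below is a hypothetical counter, with illustrative names and thread counts, that relies on merge being atomic per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: several threads increment one key via the atomic merge().
final class ConcurrentCount {
    static ConcurrentHashMap<String, Integer> count(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write for this key
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) throw new IllegalStateException("timed out");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(4, 10_000).get("hits")); // 40000
    }
}
```

A plain get followed by put would lose updates here; the atomic per-key methods are the point of the example.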
7. Concurrency — synchronized, ReentrantLock, Atomics, CAS
synchronized
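Reentrant intrinsic lock. HotSpot implements it with lock bits in the object header and escalates from a lightweight (CAS-based) lock to a heavyweight OS monitor under contention. Biased locking, an older third state, was disabled by default in JDK 15 and later removed.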
synchronized (lock) {
// critical section
}
public synchronized void f() { ... } // same as synchronized(this)
public static synchronized void g() {} // synchronized on the Class object
ReentrantLock
java.util.concurrent.locks.Lock. Explicit lock/unlock, supports tryLock, lockInterruptibly, fairness, multiple condition variables.
Lock lock = new ReentrantLock();
lock.lock();
try { /* CS */ } finally { lock.unlock(); }
Pick ReentrantLock when you need timeouts, fairness, or multiple Conditions. Otherwise, synchronized is shorter and the JIT optimizes it well.
Atomics
AtomicInteger, AtomicLong, AtomicReference use CAS (Compare-And-Swap) on hardware. Lock-free, lower overhead than locks for single-variable updates.
AtomicInteger counter = new AtomicInteger();
counter.incrementAndGet(); // lock-free
counter.compareAndSet(0, 1); // CAS primitive
For high-contention counters, prefer LongAdder — it stripes the counter across cells to reduce CAS contention.
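When the update is not a plain increment, the usual pattern is a CAS retry loop: read, compute, attempt the swap, and retry if another thread got there first. A minimal sketch, assuming a hypothetical “track the maximum” requirement (CasMax and updateMax are illustrative names):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: lock-free "store the maximum seen so far" via a CAS retry loop.
final class CasMax {
    static void updateMax(AtomicInteger max, int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) return;                  // nothing to update
            if (max.compareAndSet(current, candidate)) return; // swap succeeded
            // Swap failed: another thread changed the value, so re-read and retry.
        }
    }

    public static void main(String[] args) {
        AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);
        for (int x : new int[] {3, 9, 4}) updateMax(max, x);
        System.out.println(max.get()); // 9
    }
}
```

The same effect is available through accumulateAndGet(candidate, Math::max); the explicit loop is shown because interviewers often ask for it.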
volatile
Gives a field visibility and ordering guarantees under the JMM: a read sees the most recent write to that field by any thread. No atomicity: volatile int x; x++; is still a race.
volatile boolean shutdown = false; // OK as a flag
8. Java Memory Model — Happens-Before
The JMM defines when a write by one thread is visible to another. Without happens-before, the JIT and CPU may reorder, cache, or simply skip your reads.
Happens-before edges:
- Program order within a thread.
- Monitor lock release ↦ subsequent acquire of the same monitor.
- volatile write ↦ subsequent volatile read of the same variable.
- Thread.start() ↦ first action of the started thread.
- Thread’s last action ↦ Thread.join() return.
- Constructor’s final-field write ↦ any reader of a properly published reference.
- Transitivity: A→B and B→C ⇒ A→C.
// Classic publication bug — without `volatile`, another thread may see
// `instance != null` but read uninitialized fields.
class Singleton {
private static volatile Singleton instance;
public static Singleton get() {
Singleton s = instance;
if (s == null) {
synchronized (Singleton.class) {
s = instance;
if (s == null) instance = s = new Singleton();
}
}
return s;
}
}
Interview framing
“What does volatile give me?”
Visibility (no caching) and ordering (no reorder across the access). Not atomicity. Not mutual exclusion.
9. Executors and Thread Pools
new Thread(...) is a code smell — never spin OS threads ad-hoc.
ExecutorService pool = Executors.newFixedThreadPool(8);
Future<Integer> f = pool.submit(() -> compute());
Integer result = f.get();
pool.shutdown();
Built-in factories (and their traps)
| Factory | Backing queue | Trap |
|---|---|---|
| newFixedThreadPool | unbounded LinkedBlockingQueue | Submitter overload → OOM |
| newCachedThreadPool | SynchronousQueue | Unbounded thread count |
| newSingleThreadExecutor | unbounded queue | Same OOM |
| newScheduledThreadPool | DelayedWorkQueue | OK |
Production pattern: construct ThreadPoolExecutor directly with bounded queue + named thread factory + sensible rejection policy.
ThreadPoolExecutor pool = new ThreadPoolExecutor(
8, 16, 60, TimeUnit.SECONDS,
new ArrayBlockingQueue<>(1000),
namedThreadFactory("worker"),
new ThreadPoolExecutor.CallerRunsPolicy());
ForkJoinPool
Work-stealing pool used by parallelStream and CompletableFuture defaults. Each worker has its own deque; it pushes and pops its own tasks at one end, while idle workers steal from the opposite end of other workers’ deques. Optimized for divide-and-conquer.
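A minimal divide-and-conquer sketch on the common pool. The class name ParallelSum and the cutoff of 1,000 elements are assumptions for illustration; a real threshold should come from measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch: recursive array sum that forks one half and computes the other.
final class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // illustrative cutoff, not a tuned value
    private final int[] data;
    private final int lo, hi; // half-open range [lo, hi)

    ParallelSum(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // small enough: sum sequentially
            long s = 0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = lo + (hi - lo) / 2;
        ParallelSum left = new ParallelSum(data, lo, mid);
        ParallelSum right = new ParallelSum(data, mid, hi);
        left.fork();                         // queue the left half; an idle worker may steal it
        long r = right.compute();            // work on the right half in this thread
        return left.join() + r;
    }

    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] a = new int[10_000];
        for (int i = 0; i < a.length; i++) a[i] = i + 1;
        System.out.println(sum(a)); // 50005000
    }
}
```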
10. CompletableFuture
A composable async-result type. Replaces Future (which has only blocking get).
CompletableFuture<String> f =
CompletableFuture.supplyAsync(() -> fetch())
.thenApply(String::trim)
.thenCompose(s -> CompletableFuture.supplyAsync(() -> enrich(s)))
.exceptionally(ex -> "fallback");
Combinators
| Method | Purpose |
|---|---|
| thenApply | map (sync transform) |
| thenCompose | flatMap (chain another future) |
| thenCombine | zip two futures |
| allOf / anyOf | combine many |
| exceptionally / handle | error handling |
| orTimeout (Java 9+) | bound completion |
Default executor for the *Async variants is ForkJoinPool.commonPool(). Use a dedicated executor for IO-bound work — the common pool’s parallelism is cpus - 1 and you’ll starve compute.
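A short sketch of passing a dedicated executor and zipping two results with thenCombine. The constant suppliers stand in for real IO calls, and the names (CombineDemo, priceWithTax) and the pool size of 2 are illustrative assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: two independent async steps on a dedicated pool, combined when both finish.
final class CombineDemo {
    static int priceWithTax(ExecutorService io) {
        // Passing `io` keeps this work off ForkJoinPool.commonPool().
        CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 100, io);
        CompletableFuture<Integer> tax = CompletableFuture.supplyAsync(() -> 8, io);
        return price.thenCombine(tax, Integer::sum) // runs once both futures complete
                    .exceptionally(ex -> -1)        // fallback if either stage failed
                    .join();                        // join() throws unchecked exceptions, unlike get()
    }

    static int run() {
        ExecutorService io = Executors.newFixedThreadPool(2);
        try {
            return priceWithTax(io);
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // 108
    }
}
```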
11. Generics and Type Erasure
Generics are a compile-time feature. The runtime sees raw types: List<String> becomes List. Type checks insert checkcast instructions.
List<String> a = new ArrayList<>();
List<Integer> b = new ArrayList<>();
a.getClass() == b.getClass(); // true — both ArrayList
Consequences
- Cannot do new T() (no class token): pass Class<T> or Supplier<T>.
- Cannot do new T[n]: use (T[]) Array.newInstance(cls, n).
- Cannot overload by erased signature: void f(List<String>) and void f(List<Integer>) collide.
- Heap pollution: unchecked casts can hide type errors until use.
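The first two workarounds can be sketched as small helpers. ErasureWorkarounds, create, and newArray are illustrative names; Array.newInstance and Supplier are the standard JDK types mentioned above.

```java
import java.lang.reflect.Array;
import java.util.function.Supplier;

// Illustrative sketch: standing in for `new T()` and `new T[n]` under erasure.
final class ErasureWorkarounds {
    // `new T()` is illegal, so the caller supplies a factory.
    static <T> T create(Supplier<T> factory) {
        return factory.get();
    }

    // `new T[n]` is illegal, so the caller supplies the element class as a runtime token.
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(Class<T> cls, int n) {
        return (T[]) Array.newInstance(cls, n);
    }

    public static void main(String[] args) {
        StringBuilder sb = create(StringBuilder::new);
        String[] names = newArray(String.class, 3);
        System.out.println(sb.length() + " " + names.length); // 0 3
    }
}
```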
PECS
Producer Extends, Consumer Super.
void copy(List<? extends Number> src, List<? super Number> dst) { ... }
? extends T lets you read T’s. ? super T lets you write T’s. Memorize this; it’s asked.
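A runnable version of the copy signature above, as a minimal sketch (the class name Pecs is illustrative). The source list only produces Numbers, so it is read from; the destination only consumes them, so it is written to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of PECS: read from `? extends`, write to `? super`.
final class Pecs {
    static void copy(List<? extends Number> src, List<? super Number> dst) {
        for (Number n : src) dst.add(n); // reading yields Number; writing a Number is always allowed
        // src.add(1);         // would not compile: src might be a List<Double>
        // Number x = dst.get(0); // would not compile: dst might be a List<Object>
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Object> sink = new ArrayList<>();
        copy(ints, sink); // List<Integer> is a producer, List<Object> is a consumer
        System.out.println(sink); // [1, 2, 3]
    }
}
```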
12. String Pool, intern(), Encoding
String is immutable. The compiler interns string literals into a pool (used to live in PermGen, now in heap since Java 7).
String a = "hello";
String b = "hello";
a == b; // true — both reference the pooled string
String c = new String("hello");
a == c; // false
a == c.intern(); // true
Use String.intern() rarely — it’s a global side effect with non-trivial cost.
Compact Strings (Java 9+)
A String is a byte[] plus a coder byte: LATIN1 (1 byte/char) or UTF16 (2 bytes/char). A pure-ASCII string halves its memory vs Java 8.
Concatenation
a + b + c compiles, since Java 9, to an invokedynamic calling a small generated method via StringConcatFactory. Fast.
A for loop with s = s + c is O(N²) in elapsed time despite the optimization, because each iteration allocates a new String and copies every character accumulated so far. Use StringBuilder:
StringBuilder sb = new StringBuilder(n);
for (char c : data) sb.append(c);
return sb.toString();
StringBuffer is StringBuilder + synchronization. You almost never want StringBuffer.
13. equals / hashCode Contract
1. Reflexive: x.equals(x) == true
2. Symmetric: x.equals(y) ⇔ y.equals(x)
3. Transitive: x.equals(y) && y.equals(z) ⇒ x.equals(z)
4. Consistent: repeated calls with no mutation return the same result
5. x.equals(null) == false
6. equals ⇒ hashCode equal (NOT the converse)
Break #6 and HashMap silently loses your entries.
record Point(int x, int y) {} // record auto-generates correct equals/hashCode
For non-record classes: Objects.equals and Objects.hash are your friends. IDE-generated implementations are fine; hand-rolled ones are usually wrong on edge cases (null fields, inheritance, NaN doubles).
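For a non-record class, a conventional implementation looks roughly like the sketch below. Money and its fields are hypothetical; the point is that equals and hashCode read the same fields and that a null field is handled.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch: equals and hashCode derived from the same fields.
final class Money {
    private final String currency; // may be null in this sketch
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false; // the class is final, so instanceof is safe here
        Money m = (Money) o;
        return cents == m.cents && Objects.equals(currency, m.currency); // null-safe compare
    }

    @Override
    public int hashCode() {
        return Objects.hash(currency, cents); // same fields as equals
    }

    public static void main(String[] args) {
        Set<Money> seen = new HashSet<>();
        seen.add(new Money("USD", 100));
        System.out.println(seen.contains(new Money("USD", 100))); // true
    }
}
```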
Inheritance trap
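You cannot extend an instantiable class with a new value field and keep equals both symmetric and transitive using an instanceof check. Mark the class final, or prefer composition; comparing with getClass() instead of instanceof restores symmetry, at the cost of Liskov substitutability.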
14. Exception Design
Three families:
- Checked (Exception subclasses, except RuntimeException): must be declared / caught.
- Unchecked (RuntimeException subclasses): programmer errors, callers may ignore.
- Error: JVM problems (OutOfMemoryError, StackOverflowError). Don’t catch.
Modern Java APIs lean unchecked because checked exceptions don’t compose with lambdas / streams.
list.stream().map(this::parse) // parse throws IOException → won't compile
Workarounds: wrap in RuntimeException, or use a checked-exception-friendly stream library.
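The wrapping workaround, sketched with the JDK’s own UncheckedIOException. The parse method here is a hypothetical stand-in for any method that declares IOException; UncheckedWrap and parseAll are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: adapt a checked-exception method for use inside a stream lambda.
final class UncheckedWrap {
    // Hypothetical parser that declares a checked exception.
    static int parse(String s) throws IOException {
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            throw new IOException("bad number: " + s, e);
        }
    }

    static List<Integer> parseAll(List<String> raw) {
        return raw.stream()
                  .map(s -> {
                      try {
                          return parse(s);
                      } catch (IOException e) {
                          throw new UncheckedIOException(e); // keeps the original as the cause
                      }
                  })
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("1", " 2"))); // [1, 2]
    }
}
```

Callers that care can catch UncheckedIOException and call getCause() to recover the original IOException.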
try-with-resources
try (var in = Files.newInputStream(path);
var gz = new GZIPInputStream(in)) {
...
} // both closed in reverse order, even on exception
Resources must implement AutoCloseable. Suppressed exceptions (close throws after the body throws) are kept on the original via addSuppressed.
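Both behaviours, reverse-order closing and suppressed exceptions, can be observed with a toy resource. Everything in this sketch (CloseOrder, Res, the log list) is hypothetical scaffolding for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: resources close in reverse order; close() failures become suppressed.
final class CloseOrder {
    static final class Res implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Res(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
            throw new IllegalStateException("close failed: " + name);
        }
    }

    // Returns the exception that escapes the try block so the caller can inspect it.
    static RuntimeException run(List<String> log) {
        try (Res a = new Res("a", log); Res b = new Res("b", log)) {
            log.add("body");
            throw new RuntimeException("body failed");
        } catch (RuntimeException e) {
            return e; // the body's exception wins; close() failures are attached to it
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RuntimeException e = run(log);
        System.out.println(log);                      // [body, close b, close a]
        System.out.println(e.getMessage());           // body failed
        System.out.println(e.getSuppressed().length); // 2
    }
}
```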
15. Streams
Lazy, pull-based pipelines.
int total = orders.stream()
.filter(o -> o.year() == 2025)
.mapToInt(Order::total)
.sum();
Lifecycle
- Source: collection.stream(), Stream.of(...), Stream.generate(...), Files.lines(...).
- Intermediate ops (lazy): filter, map, flatMap, sorted, distinct, limit, skip.
- Terminal op (eager): forEach, collect, reduce, count, findFirst, toList() (Java 16+).
Pitfalls
- A stream is single-use. Re-collecting fails with IllegalStateException.
- No checked exceptions. Lambdas can’t throw them.
- Stateful intermediate ops (sorted, distinct) buffer the whole stream. Don’t call them on infinite streams.
- parallel() uses ForkJoinPool.commonPool; it is only worth it for CPU-heavy ops on large data with no shared state.
list.stream().parallel().mapToInt(...)... // measure first
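The single-use and laziness points are quick to demonstrate. A minimal sketch with illustrative names (SingleUse, reuseFails, firstSquares):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch: a stream cannot be consumed twice, and infinite sources need a limit.
final class SingleUse {
    static boolean reuseFails() {
        Stream<String> s = Stream.of("a", "b");
        s.count();                 // first terminal op consumes the stream
        try {
            s.count();             // second terminal op on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;           // the stream was already operated upon
        }
    }

    // Laziness makes an infinite source safe as long as a short-circuiting op bounds it.
    static List<Integer> firstSquares(int n) {
        return Stream.iterate(1, i -> i + 1)
                     .map(i -> i * i)
                     .limit(n)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(reuseFails());    // true
        System.out.println(firstSquares(4)); // [1, 4, 9, 16]
    }
}
```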
16. Common Interview Gotchas
== vs .equals()
Always discussed alongside the Integer cache. Use .equals for objects, == only for primitives or true identity checks.
Integer overflow
Integer.MAX_VALUE + 1 // -2147483648
(long)Integer.MAX_VALUE + 1 // 2147483648 — promote first
Math.addExact(a, b) // throws on overflow
Floating-point equality
0.1 + 0.2 == 0.3 // false
Math.abs(a - b) < 1e-9 // OK
Double.compare(a, b) == 0 // handles NaN consistently
String.split regex
"a.b".split(".") returns [] because . is regex “any char.” Use "\\." or Pattern.quote(".").
Modifying a collection during iteration
Throws ConcurrentModificationException — even single-threaded. Use Iterator.remove() or removeIf.
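The two safe forms, next to the failing one, as a minimal sketch. SafeRemoval and its method names are illustrative. The failing loop is only shown on an input where the exception is certain, since ArrayList detects the modification on a best-effort basis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: removing elements while iterating, the wrong way and two right ways.
final class SafeRemoval {
    static List<Integer> dropEvens(List<Integer> xs) {
        List<Integer> copy = new ArrayList<>(xs);
        copy.removeIf(x -> x % 2 == 0);           // preferred: one call, no manual iteration
        return copy;
    }

    static List<Integer> dropEvensWithIterator(List<Integer> xs) {
        List<Integer> copy = new ArrayList<>(xs);
        for (Iterator<Integer> it = copy.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) it.remove();  // removal goes through the iterator
        }
        return copy;
    }

    static boolean naiveLoopThrows(List<Integer> xs) {
        List<Integer> copy = new ArrayList<>(xs);
        try {
            for (Integer x : copy) {
                if (x % 2 == 0) copy.remove(x);   // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(dropEvens(xs));             // [1, 3]
        System.out.println(dropEvensWithIterator(xs)); // [1, 3]
        System.out.println(naiveLoopThrows(xs));       // true
    }
}
```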
Arrays.asList(int[])
Returns a List<int[]> of length 1. Use Arrays.stream(arr).boxed().toList() or IntStream.
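A quick sketch of the trap and the fix (BoxArray, asListSize, and box are illustrative names). Collectors.toList() is used here so the example also compiles on older JDKs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: Arrays.asList on an int[] wraps the array itself, not its elements.
final class BoxArray {
    static int asListSize(int[] arr) {
        return Arrays.asList(arr).size(); // the whole int[] becomes the single element
    }

    static List<Integer> box(int[] arr) {
        return Arrays.stream(arr).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3};
        System.out.println(asListSize(arr)); // 1
        System.out.println(box(arr));        // [1, 2, 3]
    }
}
```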
Switch fall-through
Classic switch falls through. New switch (->) does not, and is exhaustive on sealed types and enums.
String s = switch (day) {
case MON, TUE -> "weekday";
case SAT, SUN -> "weekend";
default -> "?";
};
17. Records, Sealed Classes, Pattern Matching (Java 16–21)
Records
record Point(int x, int y) {
static Point origin() { return new Point(0, 0); }
}
Records are transparent immutable carriers: auto-generated constructor, accessors, equals/hashCode/toString. Implicitly final. Cannot extend, can implement interfaces.
Sealed classes
sealed interface Shape permits Circle, Rect {}
record Circle(double r) implements Shape {}
record Rect(double w, double h) implements Shape {}
Sealed types restrict the set of permitted subclasses. Combined with pattern matching, this gives exhaustive switching:
double area = switch (shape) {
case Circle c -> Math.PI * c.r() * c.r();
case Rect r -> r.w() * r.h();
}; // no default needed — compiler knows the universe
Pattern matching for instanceof
if (obj instanceof String s && s.length() > 3) {
use(s);
}
Modern Java is much terser than Java 8. If your interviewer is on JDK 21, leverage records + sealed + pattern switch — it shows fluency.
18. Project Loom — Virtual Threads (Java 21+)
A virtual thread is a Java-managed lightweight thread that runs on top of a small pool of OS carrier threads. Park-on-blocking-IO is implemented in the JDK.
try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
for (int i = 0; i < 100_000; i++) {
exec.submit(() -> {
try (var sock = new Socket("h", 80)) { ... }
});
}
}
Use cases: thread-per-request servers without thread-pool sizing pain. Not faster for CPU-bound work — same number of CPUs. The win is scalability of blocking IO.
Sharp edges
- Pinning: a virtual thread inside synchronized cannot be unmounted from its carrier. Use ReentrantLock if you’ll block while holding the lock.
- ThreadLocal still works but with millions of virtual threads it’s expensive. Prefer ScopedValue (preview).
- Native code (JNI) also pins the carrier.
Interview framing
“When would you use virtual threads instead of an executor?”
Massively concurrent IO — thousands+ of in-flight blocking calls — where you’d otherwise need async/reactive code. CPU-bound work still wants a fixed pool sized to cores.
19. Performance Hot Tips
- Pre-size collections (new ArrayList<>(n), new HashMap<>(n*4/3+1)) to avoid resize churn.
- Primitive arrays beat boxed lists by 4–10× for tight loops.
- StringBuilder over += in loops. (See §12.)
- Reuse objects in hot paths if they’re large and immutable-ish; pool buffers (ByteBuffer).
- Avoid Stream in micro-loops: the lambda allocations dominate. Streams shine on big pipelines, not 5-element ones.
- Escape analysis lets HotSpot stack-allocate or scalar-replace short-lived objects. You don’t tune this; you write code that doesn’t escape (no leaking this, no storing in fields).
- final doesn’t make code faster (the JIT proves it itself), but it documents intent and is required for some JMM guarantees.
- -XX:+UseLargePages on Linux for big heaps.
- Profile with async-profiler (sampling, low-overhead) or JFR (built-in, low-overhead). Avoid printf-debugging perf.
# async-profiler: 30-second CPU profile (the default event), written as a flame graph. Add -e wall for wall-clock.
asprof -d 30 -f flame.html <pid>
20. JMH — Java Microbenchmark Harness
You cannot benchmark Java with System.nanoTime() around a loop. The JIT will hoist invariants, dead-code-eliminate unused results, and warm up partway through your “measurement.”
JMH handles all of that:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class HashBench {
int[] data = ...;
@Benchmark public int sumLoop() {
int s = 0;
for (int x : data) s += x;
return s;
}
}
Key ideas:
- Warmup iterations before measurement.
- @State holds inputs (avoids constant-folding).
- Return values prevent dead-code elimination (or use Blackhole.consume).
- Forks isolate JIT state across benchmarks.
When an interviewer asks “is X faster than Y?” the senior answer is: “I’d write a JMH benchmark — but my prediction is Y because [allocation / branch / cache reason].” Predict, then measure.
What To Memorize Cold
- JVM = bytecode interpreter + tiered C1/C2 JIT. G1 is default GC. Heap = young (Eden + S0 + S1) + old.
- Integer cache -128..127. Always .equals() for boxed numbers.
- int overflow wraps; use Math.*Exact, prefer lo + (hi-lo)/2.
- HashMap treeifies overflowing buckets to red-black trees → worst-case O(log N) since 8.
- volatile = visibility + ordering, not atomicity.
- JMM happens-before: lock, volatile, start/join, final-after-publication.
- Generics erase. No new T[], no overload by erased signature.
- String is immutable. Compact strings since 9. Use StringBuilder in loops.
- equals and hashCode go together; record does it for you.
- Records are final immutable carriers. Sealed + pattern match = exhaustive switch.
- Virtual threads scale blocking IO; they don’t help CPU.
- JMH for benchmarks. nanoTime lies.
If any of those is shaky, re-read the section, write the smallest program that demonstrates it, and watch it misbehave on purpose.
Go Runtime Deep Dive
Target audience: candidates interviewing in Go for infrastructure, distributed systems, cloud-native (Kubernetes / Docker / etcd ecosystems), or any backend role where the interviewer asks “explain how goroutines actually work.”
Scope: gc (the standard Go compiler) on Go 1.22+. gccgo and TinyGo only mentioned where they change interview-grade answers.
Go’s surface looks small. The runtime is not. The interview gap appears immediately when an interviewer asks “what’s the difference between a goroutine and an OS thread?”, “what does nil != nil even mean?”, or “why does this loop variable do that?”. This guide trains the answers.
1. Runtime Overview — M:N Scheduling
Go runs your code under a runtime linked into every binary. The runtime owns:
- The goroutine scheduler (M:N — many goroutines onto few OS threads).
- The garbage collector (concurrent tri-color mark-sweep).
- The memory allocator (TCMalloc-derived, per-P caches).
- Channel and sync primitives, network poller, timers, profilers.
A “goroutine” is not an OS thread. It’s a small (~2 KB initial stack) cooperatively-scheduled task multiplexed onto a pool of OS threads. The runtime can have thousands of goroutines on a handful of threads.
// A million goroutines is normal.
for i := 0; i < 1_000_000; i++ {
go func() { /* ... */ }()
}
This is feasible because each goroutine starts with ~2 KB of stack (vs ~1 MB for an OS thread default) and the stack grows as needed.
Stack growth
Goroutine stacks were segmented (split) historically and have been contiguous since Go 1.3: when a goroutine outgrows its stack, the runtime allocates a bigger one, copies all frames, and adjusts the pointers that refer into the old stack. Taking the address of a local variable is therefore safe: pointers into the stack are rewritten when it moves, and escape analysis heap-allocates anything that must outlive its frame (see §7).
2. The GMP Scheduler
Three runtime objects:
| | Stands for | What it is |
|---|---|---|
| G | Goroutine | A goroutine: stack + program counter + status |
| M | Machine | An OS thread |
| P | Processor | A logical scheduler context; holds a runnable G queue |
Number of P’s = GOMAXPROCS (default: number of CPUs). Each P has a local runnable queue. M’s bind to a P to execute G’s; an M without a P cannot run Go code.
P0 [G G G G ...] P1 [G G ...] P2 [G G G G G ...]
│ │ │
M0 M1 M2 (OS threads)
Work stealing
When a P’s queue is empty, it steals half from a random other P’s queue. Keeps cores busy without a global lock.
What happens on a blocking syscall
The M making the syscall detaches from its P and blocks. The P picks up another M (creating one if needed) and keeps scheduling. When the syscall returns, the original M tries to reacquire a P; if none is free it parks the G on the global queue.
This is why read(fd, ...) on a regular file blocks an OS thread but does not block your other goroutines — they keep running on other M’s.
Network poller
Network I/O is epoll/kqueue/IOCP under the hood. A goroutine doing conn.Read parks itself, registers with the poller, and another goroutine runs. When the fd is readable, the poller wakes the parked G. No M is consumed while parked. This is why Go scales to 100K+ concurrent network connections trivially.
Preemption
Up to Go 1.13, goroutines yielded only at cooperative points such as function prologues, channel operations, and blocking calls, so a tight CPU loop without function calls could starve others. Since 1.14, asynchronous preemption uses OS signals to interrupt a long-running goroutine at almost any instruction.
// Pre-1.14, this could starve everything else; today it's preempted.
go func() { for {} }()
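The effect of asynchronous preemption can be observed with a small experiment. This is a sketch, assuming Go 1.19+ (for atomic.Bool) and a platform where signal-based preemption is available; SleepBesideSpinner is a hypothetical helper, not a runtime API. It pins the program to one P, starts a call-free busy loop, and measures how long a concurrent sleep really takes.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// SleepBesideSpinner (hypothetical helper) runs a call-free busy loop on a
// single P and reports how many milliseconds a concurrent sleep of ms took.
func SleepBesideSpinner(ms int) int64 {
	prev := runtime.GOMAXPROCS(1) // force both goroutines onto one P
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	go func() {
		for !stop.Load() { // tight loop with no cooperative yield points
		}
	}()

	start := time.Now()
	time.Sleep(time.Duration(ms) * time.Millisecond) // parks; the spinner takes the P
	elapsed := time.Since(start).Milliseconds()
	stop.Store(true) // let the spinner exit
	return elapsed
}

func main() {
	fmt.Println("slept for ms:", SleepBesideSpinner(20))
}
```

With preemption the sleeper wakes up shortly after its deadline; on a pre-1.14 runtime the same program would be expected to hang, because the spinner never gives up the only P.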
Interview framing
“What’s the difference between a goroutine and a thread?”
Goroutine: ~2KB stack, cooperative + signal-preempted, scheduled by Go runtime onto a pool of OS threads. Thread: ~1MB stack, OS-scheduled, costlier context switches. Goroutines are the unit you think about; M’s are an implementation detail.
3. Goroutines vs Threads — Practical Implications
// I/O fanout pattern
results := make(chan Result, len(urls))
for _, u := range urls {
u := u // pre-1.22: required to capture
go func() {
results <- fetch(u)
}()
}
for range urls {
r := <-results
process(r)
}
Costs:
- Goroutine creation: ~1µs.
- Channel ops: ~50–100ns uncontended; an uncontended mutex lock/unlock is usually cheaper (tens of ns).
- Context switch: ~200ns within Go runtime; blocking syscalls add OS thread cost.
Sharp edge: unlike OS threads, goroutines do not expose IDs. This is by design, to discourage thread-local-state patterns, and it breaks naive ports of Java idioms.
4. Channels — Buffered, Unbuffered, select
A channel is a typed bounded queue with built-in synchronization.
| Construct | Behavior |
|---|---|
| make(chan T) | Unbuffered: send and recv must rendezvous. Sender blocks until a receiver is ready. |
| make(chan T, n) | Buffered: sender blocks only when buffer is full. |
| close(ch) | Recvs drain remaining values, then receive zero values. Send to closed → panic. |
ch := make(chan int, 2)
ch <- 1
ch <- 2
close(ch)
for v := range ch { fmt.Println(v) } // 1, 2
select
Multiplexes channels: picks any ready case (random tie-break) and blocks if none is ready. This is what lets you compose producers/consumers, timeouts, and cancellation.
select {
case v := <-in:
use(v)
case out <- val:
// ...
case <-time.After(2 * time.Second):
return errors.New("timeout")
case <-ctx.Done():
return ctx.Err()
}
Nil-channel pattern
A nil channel blocks forever. Setting a case’s channel to nil disables it:
var done chan struct{} = nil
// case <-done: never fires
Useful when iterating over multiple channels and “turning off” one as it completes.
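A minimal sketch of that "turning off" idiom, assuming two input channels that are both eventually closed. Merge is a hypothetical helper name; the only mechanism it relies on is that a receive from a nil channel never becomes ready.

```go
package main

import "fmt"

// Merge (hypothetical helper) drains a and b into one slice. When a channel
// is closed, its variable is set to nil so its select case can never fire again.
func Merge(a, b <-chan int) []int {
	var out []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // closed and drained: disable this case
				continue
			}
			out = append(out, v)
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			out = append(out, v)
		}
	}
	return out
}

func main() {
	a := make(chan int, 2)
	b := make(chan int, 1)
	a <- 1
	a <- 2
	b <- 10
	close(a)
	close(b)
	// Three values arrive; their interleaving depends on select's random choice.
	fmt.Println(len(Merge(a, b)))
}
```

Without the nil assignment, a closed channel would stay permanently ready and the loop would spin on zero values.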
Closing semantics
- Receivers detect close with v, ok := <-ch; ok is false on closed-and-drained.
- Only the sender should close. Closing on the receiver side requires extra coordination because closing a channel that someone else may still send to leads to a panic.
- Don’t close a channel just to “free” it; let the GC handle that.
5. Sync Primitives
| Primitive | Use for |
|---|---|
| sync.Mutex | Mutual exclusion |
| sync.RWMutex | Many readers / few writers (do measure: RW often loses to plain Mutex) |
| sync.Once | Idempotent one-time init |
| sync.WaitGroup | Wait for N goroutines |
| sync.Cond | Condition variable; rarely needed (channels usually clearer) |
| sync/atomic | CAS, atomic add/load/store on int32/int64/pointer |
| sync.Map | Concurrent map, only when read-mostly or when goroutines work on disjoint key sets |
var mu sync.Mutex
mu.Lock()
defer mu.Unlock()
// CS
sync.Map is not always faster
It’s optimized for two specific patterns:
- Stable disjoint key sets per goroutine.
- Mostly reads, rare writes.
For everything else, a regular map[K]V + sync.Mutex (or shards) is faster and clearer.
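The "regular map plus mutex" option can be sketched as below. Counter and CountConcurrently are hypothetical names chosen for illustration; the point is only that every access to the map goes through the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical mutex-guarded map of counters.
type Counter struct {
	mu sync.Mutex
	m  map[string]int
}

func NewCounter() *Counter { return &Counter{m: make(map[string]int)} }

// Inc increments key under the lock.
func (c *Counter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// Get reads key under the same lock; reads need protection too.
func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// CountConcurrently increments one key from n goroutines and returns the total.
func CountConcurrently(n int) int {
	c := NewCounter()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1) // before the goroutine starts
		go func() {
			defer wg.Done()
			c.Inc("hits")
		}()
	}
	wg.Wait()
	return c.Get("hits")
}

func main() {
	fmt.Println(CountConcurrently(1000)) // 1000
}
```

If contention on the single lock shows up in a profile, sharding the map by key hash is the usual next step before reaching for sync.Map.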
WaitGroup
var wg sync.WaitGroup
for _, x := range data {
wg.Add(1)
go func(x Item) {
defer wg.Done()
process(x)
}(x)
}
wg.Wait()
Trap: wg.Add must happen before the goroutine starts running, never inside it.
6. Memory Model
Go has a documented memory model (re-articulated in 2022 for clarity). Key rules:
- A read sees writes that happen before it.
- Within a goroutine: program order.
- Goroutine creation happens before its first instruction.
- A send on a channel happens before the corresponding receive completes.
- Close of a channel happens before a receive that returns the zero value due to close.
- m.Unlock happens before subsequent m.Lock.
- sync/atomic: each atomic op is sequentially consistent; pairs ordered by HB.
// Without sync, this is racy.
var data []int
var ready bool
go func() {
data = makeData()
ready = true // RACE — no HB to the reader
}()
for !ready {} // may loop forever (compiler/CPU can hoist)
use(data)
Fix with channel, mutex, or atomic.
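One possible channel-based fix is sketched here, using the rule that closing a channel happens before a receive that observes the close. makeData is a stand-in for the elided function in the racy snippet, and LoadData is a hypothetical wrapper.

```go
package main

import "fmt"

// makeData is a stand-in for the real producer.
func makeData() []int { return []int{1, 2, 3} }

// LoadData publishes data through a channel close, which gives the reader
// a happens-before edge to the write of data.
func LoadData() []int {
	var data []int
	ready := make(chan struct{})
	go func() {
		data = makeData()
		close(ready) // happens before the receive below returns
	}()
	<-ready // blocks instead of spinning; no data race on data
	return data
}

func main() {
	fmt.Println(LoadData()) // [1 2 3]
}
```

A mutex or an atomic flag would establish the same ordering; the channel version also removes the busy-wait.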
Race detector
Always run tests with -race:
go test -race ./...
It instruments memory accesses, catches actual data races (not just suspicious code). Cheap insurance; one of Go’s killer features.
7. Garbage Collector — Concurrent Tri-color Mark-Sweep
Go’s GC is concurrent, non-moving, tri-color mark-sweep with write barriers.
- Tri-color: white = not yet visited, grey = visited but children not, black = done.
- Write barrier: intercepts pointer writes during mark to maintain invariants while the mutator runs.
- Non-moving: objects don’t relocate. Pointers stay stable. (Trade-off: no compaction, more fragmentation.)
Pause time
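Each cycle has two brief stop-the-world phases (enabling the write barrier at mark start, and mark termination), typically well under a millisecond in total. Marking, including stack scanning, runs concurrently with your program. There is no “young generation”: Go’s GC is non-generational.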
Pacing
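A collection is triggered when the heap has grown by GOGC percent over the live heap left by the previous cycle; with the default GOGC=100 that is roughly when the heap doubles. Lower it for a smaller footprint at the cost of CPU; raise it for fewer collections at the cost of memory.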
GOGC=200 ./app # GC less often
GOGC=off ./app # disable (for benchmarks)
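Soft memory limit
runtime/debug.SetMemoryLimit(n) or the GOMEMLIMIT environment variable (Go 1.19+) sets a soft limit; the GC trades CPU for staying under it. There is no hard limit: the runtime may exceed the soft limit rather than fail an allocation. Useful in containers: set it to about 0.9 * cgroup_limit to reduce the risk of OOM-kills.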
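A minimal sketch of the 90% rule, assuming the container limit is already known to the program (reading it from the cgroup filesystem is out of scope here). ApplySoftLimit and the 512 MiB figure are hypothetical; debug.SetMemoryLimit with a negative argument only queries the current limit.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// ApplySoftLimit (hypothetical helper) sets the soft memory limit to 90% of
// containerBytes and returns the limit that is now in effect.
func ApplySoftLimit(containerBytes int64) int64 {
	limit := containerBytes / 10 * 9
	debug.SetMemoryLimit(limit)     // returns the previous limit; ignored here
	return debug.SetMemoryLimit(-1) // negative input: query without changing
}

func main() {
	const containerBytes = 512 << 20 // assumed 512 MiB cgroup limit
	fmt.Println("soft limit in bytes:", ApplySoftLimit(containerBytes))
}
```

Setting GOMEMLIMIT in the deployment manifest achieves the same thing without code changes, which is often easier to operate.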
Escape analysis
The compiler decides at compile time whether a value can stay on the stack. If a pointer “escapes” the function, the value is heap-allocated.
func f() *int {
x := 1
return &x // escapes — heap allocation
}
go run -gcflags='-m' main.go
# reports: moved to heap: x
Knowing what allocates lets you avoid GC pressure in hot paths. Stack allocation is essentially free; heap allocation costs ~30ns + future GC scan.
8. Slice Internals
A slice is a 3-word struct: (ptr *T, len int, cap int). Slicing does not copy — it’s a view.
a := []int{1, 2, 3, 4, 5}
b := a[1:4] // [2 3 4], cap=4 (from index 1 to end of underlying array)
b[0] = 99
fmt.Println(a) // [1 99 3 4 5] — shared backing array!
append semantics
b = append(b, 10) // if len < cap: in place; else allocate new backing array
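Growth: capacity roughly doubles for small slices and tapers toward ~1.25× for large ones (Go 1.18 lowered the switch-over from 1024 to 256 elements and smoothed the transition). When append reallocates, the new slice’s backing array is independent of any older slice that still points at the old one; when it does not reallocate, the two keep sharing memory: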
a := []int{1, 1, 1, 1} // len 4, cap 4
b := a[:2] // len 2, cap 4
c := append(b, 99) // fits in cap: overwrites a[2]
fmt.Println(a, c) // [1 1 99 1] [1 1 99]
d := append(c, 1, 2, 3) // exceeds cap: reallocates; d disjoint from a
This is the slice aliasing gotcha that loses interviews. The fix is to be explicit:
b := append([]int(nil), source...) // explicit copy
Three-index slice
a[lo:hi:max] caps the new slice’s cap at max - lo. Use it when handing out a slice you don’t want the receiver to extend into your data.
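The difference between a two-index and a three-index slice can be shown side by side. This is an illustrative sketch; Window is a hypothetical helper that simply applies data[lo:hi:hi].

```go
package main

import "fmt"

// Window (hypothetical helper) returns data[lo:hi] with capacity capped at
// hi-lo, so an append by the caller must reallocate instead of writing
// into data[hi:].
func Window(data []int, lo, hi int) []int {
	return data[lo:hi:hi]
}

func main() {
	data := []int{1, 2, 3, 4, 5}

	leaky := data[0:2]    // len 2, cap 5
	_ = append(leaky, 99) // fits in cap: writes into data[2]
	fmt.Println(data)     // [1 2 99 4 5]

	data[2] = 3                // restore
	safe := Window(data, 0, 2) // len 2, cap 2
	grown := append(safe, 99)  // no spare capacity: reallocates
	fmt.Println(data, grown)   // [1 2 3 4 5] [1 2 99]
}
```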
9. Map Internals
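map[K]V is a hash table. Through Go 1.23 it uses bucket chaining (each bucket holds 8 entries, then chains overflow buckets); Go 1.24 switched to a Swiss-table design, which does not change the interview-level behavior below. The hash seed is randomized per map (security + iteration order).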
m := make(map[string]int, 1000) // pre-size to avoid grows
m["a"] = 1
delete(m, "a")
v, ok := m["a"]
Iteration order is randomized
A for k := range m loop may produce a different order each time, even within one run. Don’t depend on it.
for k, v := range m {
// unspecified order
}
Concurrent access
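Plain map is not safe for concurrent read/write. The runtime detects some concurrent misuse on a best-effort basis and aborts with a fatal error that recover cannot catch; the race detector (-race) reports the rest. Use a sync.Mutex / sync.RWMutex or sync.Map (with caveats from §5).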
fatal error: concurrent map writes
nil map
A nil map can be read (returns zero) but not written. A common bug:
var m map[string]int
m["a"] = 1 // PANIC
Use m := map[string]int{} or make(map[string]int).
Complexity
| Op | Avg | Worst |
|---|---|---|
| m[k] | O(1) | O(N) under collisions |
| m[k] = v | O(1) amortized | O(N) on grow |
| delete(m, k) | O(1) | O(N) |
| range m | O(N) | O(N) |
Maps do not shrink: deleting keys, even most of them, does not return bucket memory. Re-create the map if you care.
10. Strings — Bytes vs Runes
A string is an immutable byte slice. No internal length-of-runes — indexing returns bytes.
s := "héllo"
len(s) // 6 — UTF-8 bytes (é is 2)
s[0] // 'h' (a byte)
s[1] // first byte of é, NOT é
Iterate with range to get runes (decoded code points):
for i, r := range s {
// i: byte index, r: rune (int32 code point)
}
To get rune count: utf8.RuneCountInString(s).
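The byte/rune distinction is easy to verify directly. The sketch below uses only len, utf8.RuneCountInString, and a []rune conversion; Measure and RuneAt are hypothetical helpers, and the string is written with an escape so the é is unambiguously the single code point U+00E9.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Measure (hypothetical helper) returns the byte length and rune count of s.
func Measure(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

// RuneAt (hypothetical helper) returns the i-th rune, not byte, as a string.
// The []rune conversion copies the whole string, so it is O(n).
func RuneAt(s string, i int) string {
	return string([]rune(s)[i])
}

func main() {
	s := "h\u00e9llo"         // "héllo"
	fmt.Println(Measure(s))   // 6 5
	fmt.Println(RuneAt(s, 1)) // é
	fmt.Println(s[1])         // 195 (0xC3, the first byte of é)
}
```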
[]byte ↔ string conversion
Both directions copy by default (so the immutability invariant holds).
b := []byte(s) // copy
s2 := string(b) // copy
Hot paths can use unsafe.String / unsafe.Slice (Go 1.20+) for zero-copy, but it’s a footgun — only if you can prove the underlying bytes won’t be mutated.
String concat
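Repeated s += p in a loop allocates and copies a new string every iteration → O(N²) overall. (A single expression such as a + b + c is concatenated in one allocation.) Use strings.Builder: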
var sb strings.Builder
for _, p := range parts {
sb.WriteString(p)
}
return sb.String()
strings.Builder reuses its buffer and avoids the final copy via unsafe.
11. Interfaces — itab, the nil != nil Trap
An interface value is two words: (itab *itab, data *void). The itab holds the dynamic type + method table; data is the concrete value (or pointer to it).
// In package io: type Reader interface { Read(p []byte) (int, error) }
var r io.Reader // itab = nil, data = nil → r == nil
r = (*os.File)(nil) // itab ≠ nil, data = nil → r != nil !!
This is the Go gotcha. Rule: an interface is nil only when both its halves are nil. A typed nil pointer assigned to an interface is not nil.
The footgun in real code:
func mightFail() error {
var e *MyError = nil
if condition() { e = &MyError{...} }
return e // returning a typed-nil pointer -> caller sees != nil
}
Fix:
func mightFail() error {
if condition() { return &MyError{...} }
return nil // explicit nil interface
}
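The trap and its fix can be reproduced in a few lines. This is a self-contained sketch; MyError here is a minimal stand-in with one field, and viaTypedNil, viaNil, and IsNil are hypothetical names.

```go
package main

import "fmt"

// MyError is a minimal stand-in error type.
type MyError struct{ Msg string }

func (e *MyError) Error() string { return e.Msg }

// viaTypedNil returns a nil *MyError wrapped in the error interface.
func viaTypedNil() error {
	var e *MyError // nil pointer
	return e       // interface now holds (type=*MyError, data=nil)
}

// viaNil returns the nil interface value.
func viaNil() error { return nil }

// IsNil reports whether err compares equal to nil, as a caller would check.
func IsNil(err error) bool { return err == nil }

func main() {
	fmt.Println(IsNil(viaTypedNil())) // false
	fmt.Println(IsNil(viaNil()))      // true
}
```

A practical habit that avoids the issue: declare local error variables as error, not as a concrete pointer type.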
Type assertions and type switches
v, ok := x.(string) // safe assertion
switch v := x.(type) {
case int: use(v)
case string: use(v)
default: ...
}
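Asserting to a concrete type is a single comparison of type descriptors, for both empty and non-empty interfaces. Asserting to an interface type needs an itab lookup: the runtime caches itabs and builds a missing one by matching method sets, so it is fast but not free.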
12. Error Handling
Errors are values. Everything else is style.
v, err := operation()
if err != nil {
return fmt.Errorf("operation failed: %w", err)
}
Wrapping
%w (Go 1.13+) wraps an error, building a chain.
errors.Is(err, io.EOF) // walks the chain
var pathErr *os.PathError
errors.As(err, &pathErr) // unwraps to a specific type
Sentinel errors
var ErrNotFound = errors.New("not found")
return ErrNotFound
Compare with errors.Is, not == — wrapping breaks ==.
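A short sketch of how wrapping interacts with ==, errors.Is, and errors.As. QueryError, lookup, and Classify are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

// QueryError is a hypothetical typed error.
type QueryError struct{ Key string }

func (e *QueryError) Error() string { return "query " + e.Key + " failed" }

// lookup returns a typed error for an empty key and a wrapped sentinel otherwise.
func lookup(key string) error {
	if key == "" {
		return &QueryError{Key: key}
	}
	return fmt.Errorf("lookup %q: %w", key, ErrNotFound)
}

// Classify reports how the error from lookup(key) can be matched.
func Classify(key string) string {
	err := lookup(key)
	var qe *QueryError
	switch {
	case errors.As(err, &qe):
		return "query error"
	case errors.Is(err, ErrNotFound):
		if err == ErrNotFound {
			return "unwrapped sentinel"
		}
		return "wrapped sentinel" // == fails here, errors.Is succeeds
	}
	return "other"
}

func main() {
	fmt.Println(Classify("a")) // wrapped sentinel
	fmt.Println(Classify(""))  // query error
}
```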
panic / recover
panic unwinds stack frames running deferred functions. recover (in a deferred func) catches it. Use only for truly unexpected conditions (programmer bugs, “should never happen”). Not for control flow.
defer func() {
if r := recover(); r != nil {
log.Printf("recovered: %v", r)
}
}()
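At a package boundary, a recovered panic is usually converted into an ordinary error through a named result. The sketch below shows one way to do that; SafeDiv is a hypothetical example function.

```go
package main

import "fmt"

// SafeDiv (hypothetical example) converts a runtime panic into an error by
// assigning to the named result inside the deferred function.
func SafeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil // panics when b == 0
}

func main() {
	fmt.Println(SafeDiv(6, 3)) // 2 <nil>
	fmt.Println(SafeDiv(1, 0)) // 0 recovered: runtime error: integer divide by zero
}
```

recover only works when called directly from a deferred function in the panicking goroutine; it cannot catch a panic raised in another goroutine.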
13. defer
defer schedules a call to run when the surrounding function returns.
f, err := os.Open(path)
if err != nil { return err }
defer f.Close()
Cost and gotchas
- Pre-1.14, defer was ~50ns. Since 1.14, “open-coded defers” are inlined for many cases, making them essentially free.
- Args are evaluated at the defer call site, not at execution:
  i := 1
  defer fmt.Println(i) // prints 1
  i = 2
- LIFO ordering: deferred calls run in reverse.
- defer in a loop accumulates. Don’t defer f.Close() inside for over thousands of files; close manually or wrap the body in a function.
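Argument evaluation time, LIFO order, and accumulation in a loop can all be seen in one small function. This is an illustrative sketch; DeferOrder is a hypothetical name.

```go
package main

import "fmt"

// DeferOrder (hypothetical example) records the order in which deferred
// calls run and the argument values they captured.
func DeferOrder() (log []string) {
	for i := 0; i < 3; i++ {
		// fmt.Sprint(i) is evaluated now; the append itself runs at return.
		defer func(s string) { log = append(log, s) }(fmt.Sprint(i))
	}
	return nil // deferred calls run after this and still modify log
}

func main() {
	fmt.Println(DeferOrder()) // [2 1 0]
}
```

All three deferred calls stay pending until the function returns, which is exactly why deferring a Close per file inside a long loop holds every file open.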
14. Context
context.Context propagates deadlines, cancellation, and request-scoped values across API boundaries.
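ctx, cancel := context.WithTimeout(parent, 5*time.Second)
defer cancel()
req, err := http.NewRequestWithContext(ctx, "GET", url, nil) // builds a *http.Request, not a response
if err != nil { return err }
resp, err := http.DefaultClient.Do(req) // the call is aborted when ctx expires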
Rules
- Pass ctx as the first parameter, never store it in a struct field for long-lived state.
- Always call cancel, even on success, to release resources. defer cancel() is the pattern.
- Don’t pass nil ctx; use context.TODO() if you don’t have one yet.
- ctx.Value is for request-scoped data (auth principal, request ID), not for optional config.
- A child context is cancelled when its parent is cancelled.
Detecting cancellation
select {
case <-ctx.Done():
return ctx.Err()
case v := <-work:
return process(v)
}
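A runnable version of the same select, wired to a timeout, might look like the sketch below. Await and TimedOut are hypothetical helpers; the work channel deliberately never delivers so that the deadline path is the one exercised.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Await (hypothetical helper) waits for a value on work or for ctx to be done.
func Await(ctx context.Context, work <-chan int) (int, error) {
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case v := <-work:
		return v, nil
	}
}

// TimedOut runs Await against a channel that never delivers, with a timeout
// of ms milliseconds, and reports whether the deadline error came back.
func TimedOut(ms int) bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel() // always release the timer
	_, err := Await(ctx, make(chan int))
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(TimedOut(10)) // true
}
```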
15. Goroutine Leaks
A goroutine leak happens when a goroutine blocks forever on a channel that never receives, a mutex never released, etc. The runtime never reclaims it. In long-running services, leaks compound.
Common shape
func bad() <-chan int {
out := make(chan int) // unbuffered
go func() {
out <- expensive() // blocks forever if caller drops the chan
}()
return out
}
Fixes:
- Buffer the channel for one value, so the send completes even if nobody ever receives and the goroutine can exit.
- Use select with ctx.Done().
go func() {
select {
case out <- expensive():
case <-ctx.Done():
}
}()
Detecting leaks
- go test with goleak (Uber library) at the end of tests.
- runtime.NumGoroutine() in production: a steadily growing number is a leak.
- pprof goroutine profile: curl http://localhost:6060/debug/pprof/goroutine?debug=2.
16. Testing and Benchmarking
Tests
func TestAdd(t *testing.T) {
if got := Add(1, 2); got != 3 {
t.Errorf("Add(1,2) = %d, want 3", got)
}
}
Table-driven tests
Idiomatic Go — readable, easy to extend.
tests := []struct {
name string
in, want int
}{
{"zero", 0, 0},
{"pos", 1, 2},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := Double(tt.in); got != tt.want {
t.Errorf("got %d want %d", got, tt.want)
}
})
}
Benchmarks
func BenchmarkX(b *testing.B) {
for i := 0; i < b.N; i++ {
X()
}
}
go test -bench=. runs them. b.ReportAllocs() (or the -benchmem flag) adds allocation counts. Always look at allocs/op: Go is compiled ahead of time, so there is no JIT to optimize allocations away at run time, and every allocation adds GC pressure.
go test -bench=. -benchmem
Fuzzing (Go 1.18+)
func FuzzParse(f *testing.F) {
f.Add("hello")
f.Fuzz(func(t *testing.T, s string) {
Parse(s)
})
}
Use for parsers, decoders, anything taking adversarial input.
17. Common Interview Gotchas
Loop variable capture (pre-1.22)
for _, v := range items {
go func() { process(v) }() // pre-1.22: all goroutines see last v
}
Fix pre-1.22: shadow v := v inside the loop. Go 1.22 fixed this — each iteration has its own copy. State which Go version you’re on.
Slice aliasing
See §8.
Map iteration order
Randomized. Don’t rely on it. Don’t rely on it. Don’t rely on it.
Nil interface vs typed-nil pointer
See §11.
== on slices / maps / functions
Compile error. Slices, maps, and funcs can only be compared to nil. Use slices.Equal / maps.Equal (Go 1.21+), reflect.DeepEqual, or write a per-field comparison.
defer in a loop
for _, p := range paths {
f, _ := os.Open(p)
defer f.Close() // Hundreds of open files — close at func return
}
Wrap the body in a function or close explicitly.
Range over a channel
for v := range ch continues until ch is closed. If it is never closed, the receiving goroutine blocks forever, which is a leak (see §15).
Goroutine started with shared mutable state
data := []int{1, 2, 3}
go modify(&data) // race unless guarded
Always guard with mutex or send via channel.
18. Performance Hot Tips
- Pre-size slices and maps: make([]T, 0, n), make(map[K]V, n). Avoid resize churn.
- Avoid heap allocations in hot loops. Use -gcflags='-m' to find escape culprits. Reuse buffers via sync.Pool.
  var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}
  buf := bufPool.Get().(*bytes.Buffer)
  defer func() { buf.Reset(); bufPool.Put(buf) }()
- strings.Builder for concatenation, bytes.Buffer for byte building.
- Prefer fixed-size arrays / structs over slices in tight code when size is known.
- Goroutines aren’t free. Spawning one per CPU-microtask in a 1ns loop is slower than the loop. They shine for IO and large work units.
- Avoid interface{} in hot paths. Boxing primitives heap-allocates and adds an itab indirection per call.
- Profile. go test -cpuprofile, pprof, the runtime tracer (go tool trace).
go test -bench=. -cpuprofile=cpu.out
go tool pprof -http=:8080 cpu.out
- runtime.GC() and debug.SetGCPercent are levers, not solutions. Reduce allocation first.
- sync.Pool is not a general-purpose cache; the runtime may drop its contents across GC cycles. Use it for short-lived reusable buffers.
What To Memorize Cold
- GMP scheduler. Goroutines (~2KB) ≠ OS threads. M:N. P count = GOMAXPROCS.
- Goroutine stacks grow by copy. Network I/O via runtime poller, no M consumed.
- Channels: unbuffered = rendezvous. Send to closed → panic. Nil chan blocks forever.
- Memory model: race detector with -race. Channel send happens-before recv. Mutex unlock HB next lock.
- GC: concurrent tri-color mark-sweep, non-moving, sub-ms pauses. GOGC and SetMemoryLimit.
- Slices = (ptr, len, cap). append may alias or reallocate. Aliasing bugs are common.
- Maps: randomized iteration, not concurrent-safe, not comparable, panic on nil-map write.
- Strings: immutable bytes. Range over string yields runes.
- Interface = (itab, data). Typed-nil pointer in interface ≠ nil interface.
- Loop var capture fixed in Go 1.22.
- defer cheap since 1.14, args eval at scheduling time.
- context first arg, always defer cancel().
- Goroutine leaks via blocked channels: select on ctx.Done().
- Pre-size slices/maps. sync.Pool for buffer reuse. pprof for everything else.
When any of those is hazy, write a 10-line program that tickles it. The race detector and -gcflags='-m' are unusually fast feedback loops compared to other languages.
C++ Runtime Deep Dive
Target audience: candidates interviewing in C++ for HFT/quant, game engines, embedded, browsers, databases, systems-programming, or any role where the interviewer asks “what does this allocate?”, “is that UB?”, or “trace the move.”
Scope: ISO C++17 baseline with C++20/23 features called out. GCC, Clang, and MSVC all behave alike on the spec — vendor-specific behavior is noted only when it changes interview answers.
C++ punishes superficial knowledge harder than any other language on this list. The senior interviewer will set a trap (a dangling reference, a missed move, an iterator invalidation, a UB), and either you see it or you don’t. There is no bluffing through C++. Everything in this guide pays interest.
1. Memory Model — Stack, Heap, RAII
C++ gives you control over object lifetime. Every object you create lives somewhere:
| Storage | Lifetime | Cost | Example |
|---|---|---|---|
| Automatic (stack) | Scope of declaration | Zero | int x; Foo f; |
| Static / thread_local | Program / thread | Zero (init once) | static Foo f; |
| Dynamic (heap) | Until delete/destructor | malloc + bookkeeping | new Foo / make_unique<Foo> |
void f() {
int x = 5; // stack
static int y = 0; // static, init once
auto p = std::make_unique<int>(42); // heap, freed at scope end
}
RAII
Resource Acquisition Is Initialization. Tie the lifetime of a resource (memory, file, lock, socket) to the lifetime of an object on the stack.
{
std::lock_guard<std::mutex> lk(mtx); // acquires
// ... critical section ...
} // destructor releases
RAII is the single most important C++ idea. It makes exceptions safe, makes resource leaks impossible if you stick to it, and is the foundation of all modern C++. Every interview answer that involves “what if it throws?” reduces to “RAII handles it.”
Stack frames
A function call pushes a frame: arguments, return address, locals, callee-saved registers. Frame size is fixed at compile time. Stack overflow on deep recursion or huge stack arrays.
alloca / VLAs
alloca(n) allocates on the stack. C99 VLAs (int arr[n]) are not in C++. Modern code uses std::vector or std::array (compile-time n).
2. Pointers, References, Values
| Form | Nullable | Rebindable | Storage |
|---|---|---|---|
| T | n/a | n/a | by value |
| T& | No | No | reference; aliases another object |
| T* | Yes | Yes | pointer; an address |
| const T& | No | No | read-only view |
| T&& | No | No | rvalue reference (see §4) |
int x = 1;
int& r = x; // r is x — no separate object
int* p = &x;
*p = 2; // x is now 2
r = 3; // x is now 3
p = nullptr; // p reseats; r cannot be reseated
When to use which
- Pass by value: small types (int, Point), or you want a copy / will move from the parameter.
- Pass by const T&: large/expensive types you only read.
- Pass by T&: out-parameters (rare in modern C++; prefer return values).
- Pass by pointer: nullable, or you need a C-API.
Dangling refs
const std::string& bad() {
std::string tmp = "hi";
return tmp; // returns reference to dead local — UB
}
The compiler may warn; the runtime will silently corrupt. Sanitizers (ASan) catch many cases.
3. Smart Pointers — unique_ptr, shared_ptr, weak_ptr
The modern rule: never new/delete directly. Use:
- std::unique_ptr<T>: exclusive ownership, zero overhead vs raw pointer.
- std::shared_ptr<T>: shared ownership, atomic refcount.
- std::weak_ptr<T>: non-owning observer; breaks shared_ptr cycles.
auto u = std::make_unique<Foo>(args...); // unique
auto s = std::make_shared<Foo>(args...); // shared
std::weak_ptr<Foo> w = s; // non-owning
Cost model
unique_ptr<T> is a single pointer. Move-only. Compiler optimizes away the wrapper.
shared_ptr<T> is two pointers (the object, the control block) + an atomic refcount. Copying = atomic increment. Destruction = atomic decrement.
make_shared vs shared_ptr<T>(new T)
make_shared<T> allocates the object and the control block in one block. Cheaper, better cache locality. Drawback: memory isn’t freed until the last weak_ptr dies (because the control block lives in the same allocation).
Cycles
struct Node { std::shared_ptr<Node> next; };
auto a = std::make_shared<Node>();
auto b = std::make_shared<Node>();
a->next = b; b->next = a;
// a and b never freed: when the locals go out of scope each refcount drops from 2 to 1, never to 0
Fix: one direction weak_ptr. Or, redesign — most “cycles” represent ownership confusion.
Custom deleter
auto p = std::unique_ptr<FILE, decltype(&fclose)>(fopen("x", "r"), &fclose);
Useful for C-API resources.
4. Move Semantics, Rvalue References
A moved-from object is in a “valid but unspecified” state. The point of move is to transfer expensive resources (heap allocations, file handles) without copying.
std::string a = "hello";
std::string b = std::move(a); // b owns the buffer; a is empty (typically)
std::move is a cast — it doesn’t move anything; it tells the compiler “treat this as an rvalue, please pick the move overload.”
Rule of 0/3/5
- Rule of 0: design classes so the defaults are correct. Member variables are RAII types (vector, unique_ptr, string). Don’t write any of the special members.
- Rule of 3 (pre-C++11): if you write any of dtor, copy ctor, copy assign, write all three.
- Rule of 5 (C++11+): add move ctor and move assign.
class Buffer {
std::unique_ptr<char[]> data_;
std::size_t size_;
public:
Buffer(std::size_t n)
: data_(std::make_unique<char[]>(n)), size_(n) {}
// Destructor and move operations are generated correctly because members are RAII;
// copying is implicitly deleted (unique_ptr member), so Buffer is move-only.
};
noexcept matters
Move operations should be noexcept. If they aren’t, std::vector can’t use them when reallocating — it falls back to copy, defeating the purpose.
struct S {
std::string name;
S(S&&) noexcept = default; // critical
S& operator=(S&&) noexcept = default;
};
Forwarding references (T&& in templates)
template<class T>
void f(T&& x) { // forwarding ref, NOT rvalue ref
g(std::forward<T>(x)); // preserves value category
}
Reference collapsing: T&& && → T&&, T&& & → T&. This is the mechanism behind perfect forwarding (and std::forward).
5. Copy Elision and RVO
The compiler is allowed (and often required) to elide copy/move when constructing return values.
Foo make() { return Foo{}; } // (N)RVO — direct construction in caller
Foo f = make(); // no copy, no move
C++17 mandated copy elision for prvalues — the move you “see” in source code may not exist as an actual operation.
auto v = std::vector<int>(1'000'000); // no copy of the temporary
Implication: return by value is the right default. A big vector returned by value is constructed in place or, at worst, moved; it is not deep-copied.
6. Templates, SFINAE, Concepts
Templates are compile-time generators. Each instantiation produces a fresh type or function.
template<class T>
T max(T a, T b) { return a < b ? b : a; }
max(1, 2); // T = int
max(1.0, 2.0); // T = double
max(1, 2.0); // error: deduction fails, conflicting types for T (int vs double)
SFINAE — “Substitution Failure Is Not An Error”
Failed substitutions are silently dropped from the overload set, not compile errors:
template<class T>
auto add(T a, T b) -> decltype(a + b) { return a + b; }
Older idiom: std::enable_if_t<...>. Crufty; use concepts instead in C++20:
template<class T>
concept Numeric = std::is_arithmetic_v<T>;
template<Numeric T>
T add(T a, T b) { return a + b; }
Compile-time error blasts
A template error message can be thousands of lines. Modern compilers (gcc 13+, clang 16+) and concepts dramatically reduce this. If you see a 5000-line error in an interview, don’t panic; isolate by typedef-ing intermediate types.
CRTP (Curiously Recurring Template Pattern)
Static polymorphism — virtual without the vtable cost.
template<class Derived>
struct Base { void f() { static_cast<Derived*>(this)->impl(); } };
struct D : Base<D> { void impl() { /* ... */ } };
7. STL Containers — Complexity
| Container | Insert | Erase | Lookup | Iter Invalidation | Memory |
|---|---|---|---|---|---|
| vector | O(1)* end / O(N) middle | O(N) | O(N), O(1) by index | All on grow / from pos | Contiguous |
| array | n/a | n/a | O(1) | None | Contiguous, fixed N |
| deque | O(1) ends, O(N) middle | O(1) ends, O(N) middle | O(1) by index | All iterators on insert; refs survive only for insert at the ends | Block array |
| list | O(1) anywhere (with iter) | O(1) | O(N) | None on insert; affected pos on erase | Doubly linked |
| forward_list | O(1) after iter | O(1) | O(N) | None on insert | Singly linked |
| set/map | O(log N) | O(log N) | O(log N) | None on insert; pos on erase | Red-black tree |
| unordered_set/map | O(1) avg, O(N) worst | O(1) avg | O(1) avg | All on rehash | Buckets + nodes |
vector is the default. Reach for others only with a measured reason.
std::vector<int> v;
v.reserve(1'000'000); // pre-size, avoid grows
for (int i = 0; i < 1'000'000; ++i) v.push_back(i);
unordered_map warnings
Separate-chaining hash table. Each node is heap-allocated → bad cache locality. For perf-critical code, prefer absl::flat_hash_map, tsl::robin_map, or other open-addressing maps. State this in HFT/perf interviews; it’s a known weakness.
std::unordered_map<std::string, int> m;
m.max_load_factor(0.5); // tighter than default 1.0; set it before reserve
m.reserve(N); // enough buckets for N elements without a rehash
8. Iterator Invalidation
The single most common subtle bug in C++.
| Container | Operation | What invalidates |
|---|---|---|
| vector | push_back, insert, reserve triggering grow | All iterators/refs/pointers |
| vector | erase | Iterators/refs at and after pos |
| deque | insert/erase in the middle | All iterators and references |
| deque | insert at either end | All iterators (references survive) |
| deque | erase at either end | Only iterators/refs to the erased element (plus end() when erasing at the back) |
| list / forward_list | insert, push_* | None |
| list / forward_list | erase | Only iterators to erased element |
| unordered_* | rehash (insert that exceeds load factor) | All iterators (refs/pointers survive!) |
| map / set | insert | None |
| map / set | erase | Only iterators to erased |
std::vector<int> v{1,2,3,4,5};
for (auto it = v.begin(); it != v.end(); ++it) {
if (*it == 3) v.push_back(99); // UB — push_back may invalidate `it`
}
// Correct: collect, then mutate; or use erase-remove.
v.erase(std::remove_if(v.begin(), v.end(), pred), v.end());
9. STL Algorithms
<algorithm> and <numeric> provide a rich library. Use them: hand-rolled loops are rarely faster and are usually harder to read and review.
std::sort(v.begin(), v.end()); // IntroSort, O(N log N)
std::stable_sort(v.begin(), v.end()); // O(N log N) with a scratch buffer, O(N log² N) without
std::nth_element(v.begin(), v.begin()+k, v.end()); // O(N) avg, kth-element
std::partial_sort(v.begin(), v.begin()+k, v.end()); // top-k, O(N log K)
std::lower_bound(v.begin(), v.end(), x); // binary search, O(log N)
std::accumulate(v.begin(), v.end(), 0LL); // careful with init type
Sort algorithms
std::sort is introsort: quicksort, switching to heapsort if recursion gets too deep, switching to insertion sort for small ranges. Worst case O(N log N), unstable. std::stable_sort is typically merge sort with allocation; std::sort is usually preferred unless stability matters.
Ranges (C++20)
auto evens = v | std::views::filter([](int x){ return x%2==0; })
| std::views::transform([](int x){ return x*x; });
Lazy, composable. Less verbose than iterator pairs.
10. Concurrency — std::thread, mutex, atomics, memory_order
std::thread t([]{ work(); });
t.join(); // or t.detach() — but rarely
If a std::thread is destroyed while joinable, the program calls terminate. std::jthread (C++20) joins on destruction.
Mutex
std::mutex m;
std::lock_guard<std::mutex> lk(m); // RAII lock
std::scoped_lock (C++17) locks multiple mutexes deadlock-free.
Condition variables
std::condition_variable cv;
std::unique_lock<std::mutex> lk(m);
cv.wait(lk, []{ return ready; }); // releases lk, waits, reacquires
Always use the predicate form to handle spurious wakeups.
std::atomic<T>
std::atomic<int> counter{0};
counter.fetch_add(1, std::memory_order_relaxed);
memory_order
| Order | Guarantees | Use |
|---|---|---|
| relaxed | Atomicity only, no ordering | Stat counters |
| acquire (load) | No subsequent reads/writes can move before | Read of a flag protecting data |
| release (store) | No prior reads/writes can move after | Write that publishes data |
| acq_rel (RMW) | Both | CAS retry loops |
| seq_cst (default) | Sequential consistency, single total order | Default; safest |
// Producer:
data = produce();
ready.store(true, std::memory_order_release);
// Consumer:
while (!ready.load(std::memory_order_acquire)) {}
use(data); // safe — release/acquire pair
memory_order is interview territory at L6+ HFT/system roles. Default to seq_cst until measured.
11. Undefined Behavior (UB)
UB means the spec places no requirements. The compiler may eliminate code, “optimize” infinite loops, or generate code that does anything. Don’t rely on “well, it works on my machine.”
Common UB
- Read of uninitialized memory.
- Out-of-bounds access (v[v.size()] is UB).
- Signed integer overflow (unsigned wraps, signed is UB).
- Use-after-free / double-free.
- Data races (concurrent unsynchronized access to the same memory where at least one access is a write).
- Strict aliasing violations (reading a float through an int* obtained by reinterpreting a float*).
- Null pointer deref — including for member access on a null pointer.
- Lifetime violations — using a moved-from object beyond what’s specified.
- Integer division by zero, INT_MIN / -1.
- Returning reference/pointer to a local.
Why it bites in interviews
The interviewer puts a for (int i = 0; i <= n; ++i) v[i] = ...; on the board and watches whether you flag the OOB. If you don’t, your perceived rigor drops a tier instantly.
Sanitizers
Compile + run tests under:
clang++ -fsanitize=address,undefined -g -O1 main.cpp
clang++ -fsanitize=thread -g -O1 main.cpp # for races
ASan: heap/stack/global OOB, use-after-free, double-free. UBSan: signed overflow, null derefs, alignment. TSan: data races.
State in interviews that you run sanitizers in CI. It signals discipline.
12. Common Interview Gotchas
Virtual destructor
If a class is meant to be derived-from and used polymorphically, the destructor must be virtual. Otherwise delete base_ptr on a derived object is undefined behavior; in practice only the base’s destructor runs and the derived part is never cleaned up.
struct Base { virtual ~Base() = default; };
struct Derived : Base { /* ... */ };
Base* p = new Derived;
delete p; // virtual dtor → Derived's runs
Object slicing
void f(Base b); // by value
Derived d;
f(d); // d sliced — only Base portion copied
Always pass polymorphic types by reference or pointer, never by value.
vector<bool> is not a vector of bool
Specialized as a packed bitset → operator[] returns a proxy, not bool&. Don’t take its address.
std::vector<bool> v(8); // non-empty, so v[0] is a valid access
auto x = v[0]; // std::vector<bool>::reference proxy, not bool&
Use std::vector<char> if you need real bools.
Self-assignment
T& operator=(const T& o) {
if (&o == this) return *this; // guard
// ...
}
Or: copy-and-swap idiom — pass by value (copy happens at call site), swap, return.
Initialization order
Member variables are constructed in declaration order, not member-initializer-list order. Compiler warns when they differ.
static local init
Thread-safe since C++11 (Magic statics). One initialization, even with concurrent first access.
nullptr vs NULL vs 0
Use nullptr. NULL is typically an integer constant (0 or 0L), and a literal 0 doesn’t overload-resolve cleanly: with f(int) and f(char*) in scope, f(0) picks f(int).
Floating-point comparison
Same warning as Java — never == for float/double. Use tolerances or std::nextafter.
Implicit conversions
int → bool, bool → int, double → int. Use explicit for single-arg constructors:
struct Date { explicit Date(int y); };
Date d = 2024; // error — explicit constructor
Date d{2024}; // OK
13. Modern C++ Idioms
- auto for local types — but spell out parameter and return types where they’re API.
- Range-for — for (const auto& x : container).
- Lambdas — capture defaults: [] (none), [&] (by ref), [=] (by value), [this].
- enum class — strongly typed, scoped enums. No implicit int conversion.
- structured bindings — auto [k, v] = *it;.
- if constexpr — compile-time branch in templates.
- std::optional — Maybe<T>. Use for “may not exist.”
- std::variant — tagged union.
- std::string_view — non-owning view of a string. Don’t store across the string’s lifetime.
- std::span — non-owning view of a contiguous range.
- {} init — uniform initialization. Prevents narrowing conversions.
int a{3.14}; // error — narrowing
int a = 3.14; // OK (silent truncation)
Modules (C++20)
Replacement for headers. Faster builds, better isolation. Adoption uneven; compilers still maturing.
Coroutines (C++20)
generator<int> ints() {
for (int i = 0;; ++i) co_yield i;
}
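The C++20 standard library lacks high-level coroutine types (std::generator arrives only in C++23), so the generator above is a type you bring from a library such as boost::asio or roll yourself. Mention only if asked.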
14. Compile-Time vs Runtime
C++ has a powerful compile-time computation toolkit. Use it to push work out of the runtime.
constexpr int factorial(int n) { return n <= 1 ? 1 : n * factorial(n-1); }
static_assert(factorial(5) == 120);
template<class T>
constexpr bool is_pod_v = std::is_trivial_v<T> && std::is_standard_layout_v<T>;
constexpr, consteval (C++20), if constexpr together let you write code that’s branchless and zero-cost when called with constant inputs.
Compile-time hash
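Implement a constexpr string hash and switch on it: the case labels are hashed at compile time, so at runtime only the incoming string is hashed and dispatch is a single integer switch. This is a common HFT trick for dispatching on string commands without a chain of string comparisons. A consteval hash works for the labels only, since it cannot be called on runtime data.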
15. Performance Hot Tips
- Cache friendliness wins. Arrays of structs with sequential access trounce trees of pointers, even when complexity is “the same.” A cache miss that goes all the way to main memory typically costs on the order of 100 or more cycles, time the core could have spent computing.
- Reserve. vector::reserve, unordered_map::reserve. Avoid grow churn.
- Move into containers. v.push_back(std::move(s)); over v.push_back(s);.
- emplace_back over push_back when constructing in place.
- Pass by value + std::move in constructors and setters — modern idiom.
- Avoid std::endl — it flushes. Use '\n'.
- Prefer iteration over recursion for deep structures; the function-call overhead and stack pressure matter.
- Profile before optimizing. perf, VTune, callgrind, sampling profilers. Algorithmic wins dwarf micro-optimizations.
- Compile with -O2 -march=native -flto for production.
- Avoid virtual in hot paths when possible. Devirtualization helps but a known-static dispatch is always cheaper.
- Beware of false sharing — two atomics on the same cache line (typically 64B) bottleneck even when “independent.” Pad with alignas(std::hardware_destructive_interference_size).
struct alignas(64) Counter { std::atomic<long> v{0}; };
16. Tooling — Sanitizers, Compiler-Specific Behavior
Sanitizers (recap from §11)
- ASan — memory errors.
- UBSan — undefined behavior.
- TSan — races.
- MSan (Clang only) — uninitialized reads.
Run them in CI. Production: don’t ship with sanitizers (perf cost), but optionally enable a hardened mode (_FORTIFY_SOURCE=2, -fstack-protector-strong).
Warning flags
g++ -Wall -Wextra -Wpedantic -Werror -Wshadow -Wconversion
Treat warnings as errors. The C++ ecosystem assumes you do.
Standard library debug modes
-D_GLIBCXX_DEBUG (libstdc++) checks bounds, iterator invalidation. Only debug builds — slow.
Vendor-specific behavior
- MSVC has different ABI rules (e.g., NRVO eligibility, exception spec). Don’t depend on inline assembly portability.
- __attribute__((...)) is GCC/Clang. MSVC uses __declspec.
- Endian-ness, padding, alignment are platform-dependent. Don’t memcpy between systems without endian conversion.
17. C++ — What To Memorize Cold
- RAII. RAII. RAII.
- Rule of 0/3/5. Default to Rule of 0.
- unique_ptr cheap, shared_ptr has atomic refcount, weak_ptr breaks cycles.
- Move = transfer of ownership. Moved-from = valid but unspecified. noexcept move ops matter.
- C++17 mandates prvalue copy elision — return by value is fine.
- Iterator invalidation rules per container — memorize the table in §8.
- vector is the default; unordered_map is slow on cache locality.
- Sort is introsort — O(N log N) worst, unstable. stable_sort allocates.
- memory_order: relaxed for counters, acquire/release for publication, seq_cst default.
- UB list: OOB, signed overflow, races, use-after-free, strict aliasing, null deref, uninitialized read. Sanitizers catch most.
- Virtual destructor for polymorphic bases. Object slicing on by-value. vector<bool> is special.
- nullptr, enum class, auto, string_view, optional, variant, structured bindings — modern toolkit.
- Cache locality > algorithmic constants in modern hardware.
- Compile with -O2 -march=native -flto -Wall -Wextra for production. Run sanitizers in CI.
When you’re shaky on any of those, write a 30-line program that demonstrates the issue and run it under ASan + UBSan. C++’s sanitizers are some of the best feedback in any language; use them.
Phase 10 — Testing, Debugging, and Correctness
Target level: Intermediate → Senior
Expected duration: 2–3 weeks (assuming Phases 0–9 are complete)
Weekly cadence: 4–5 lab hours + apply testing discipline to every problem you solve elsewhere
Why This Phase Exists
Most candidates lose offers not because they couldn’t find an algorithm — they lose because their code was almost right and they never noticed. The interviewer asked “are you sure?”, they said “yes”, and then the interviewer ran one edge case and the screen went red.
Testing and debugging is the dimension where senior candidates separate from juniors. A junior writes code and hopes. A senior writes code and proves it works, then runs three deliberate test cases (one normal, one degenerate, one large), and only then claims “done.”
This phase teaches the discipline. It is short because the mechanics are simple. The habit is what takes weeks to internalize, which is why every later problem in your study should explicitly run the checklist here.
Concepts to Master
Test types
- Manual / desk-checked tests — what you trace through on paper during a 45-minute interview
- Smoke tests — 1–2 sanity examples to prove the code runs at all
- Unit tests — per-function correctness; use these heavily in phase-08-practical-engineering labs
- Integration tests — multi-component behavior; relevant when you implement subsystems (cache + invalidator, scheduler + worker)
- Property-based tests — hypothesis-style; assert invariants over random inputs (e.g., “sorted output is a permutation of the input”)
- Brute-force verifier — known-correct slow solution to validate the fast one on small inputs
- Stress testing — random-generation loop that runs the verifier and the fast solution and diffs them; the single best CP debugging tool
- Fuzzing (overview) — feed structured random input; useful for parsers, serializers, anything with a grammar
- Golden tests — record expected output for canonical inputs; mostly used in compiler/transform code
- Mutation testing (overview) — flip operators in your code and check if any test catches the mutation; reveals weak test suites
- Coverage analysis — branch and line coverage; necessary but not sufficient
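The brute-force verifier and stress-testing items above fit together in a few lines. The sketch below is one possible shape, not the harness from Lab 5: it uses maximum subarray sum as a stand-in problem, and the names brute_max_subarray, fast_max_subarray, and stress are hypothetical helpers chosen for illustration.

```python
import random

def brute_max_subarray(nums):
    """Known-correct O(N^2) reference: try every non-empty subarray."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best

def fast_max_subarray(nums):
    """Kadane's algorithm, O(N): the solution under test."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def stress(fast, brute, trials=500, max_n=8, lo=-5, hi=5, seed=0):
    """Return the first random input where fast and brute disagree, else None."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        nums = [rng.randint(lo, hi) for _ in range(rng.randint(1, max_n))]
        if fast(nums) != brute(nums):
            return nums
    return None

counterexample = stress(fast_max_subarray, brute_max_subarray)
print("no mismatch found" if counterexample is None else counterexample)
```

Small sizes and a narrow value range are deliberate: they make duplicates, all-negative arrays, and singletons frequent, and any counterexample that comes back is short enough to trace by hand.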
Complexity & performance
- Complexity testing — measure runtime at N, 2N, 4N; check the doubling ratio matches your claimed Big-O
- Performance profiling — cProfile / py-spy (Python), perf / pprof (Go/C++), async-profiler (Java)
- Memory profiling — tracemalloc / memory_profiler (Python), pprof heap (Go), heap dumps (JVM)
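The doubling-ratio check from the complexity-testing item can be sketched as follows. This is a minimal illustration, assuming a single timing per size is good enough; doubling_ratios and quadratic_pairs are hypothetical names, and real measurements usually need several repetitions to smooth out noise.

```python
import time

def doubling_ratios(fn, make_input, sizes):
    """Time fn once per size and return the ratios between consecutive timings."""
    timings = []
    for n in sizes:
        data = make_input(n)  # build the input outside the timed region
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    return [later / earlier for earlier, later in zip(timings, timings[1:])]

def quadratic_pairs(nums):
    """Deliberately O(N^2): count pairs (i, j) with i < j and nums[i] > nums[j]."""
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] > nums[j]:
                count += 1
    return count

ratios = doubling_ratios(quadratic_pairs, lambda n: list(range(n, 0, -1)), [500, 1000, 2000])
# Ratios near 2 suggest O(N) or O(N log N); ratios near 4 suggest O(N^2).
print(ratios)
```

Each size is double the previous one, so the ratio between consecutive timings is the quantity to compare against the claimed Big-O.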
Concurrency
- Race detection — -race (Go), ThreadSanitizer / TSan via -fsanitize=thread (C++ with Clang or GCC; also usable from Rust)
- Deterministic concurrency testing — schedule injection, controlled interleaving, deterministic random
- Deadlock detection — lock-order graph analysis
Why Testing Matters in Interviews
Interviewers explicitly score “testing and verification” on the rubric. The signal they’re watching for:
| What you do | What it signals |
|---|---|
| Submit and say “done” | Junior — does not verify own work |
| Walk through one example manually | Acceptable — minimum bar |
| Walk through, then deliberately try an edge case | Senior — actively looking for bugs |
| Find your own bug and fix it without prompt | Strong senior signal |
| Identify a class of bugs you might have (“integer overflow when the array is large”) and write a test for that specific risk | Staff signal — anticipating failure modes |
Candidates who do not test lose offers even when their code is correct, because the interviewer cannot tell whether the correctness was deliberate or accidental.
The Universal Test Checklist
Apply this to every problem you solve, in every phase. Most of these take 10 seconds to consider; even rejecting them out loud earns the signal.
Input shape
- Empty input ([], "", 0, None)
- Null input (if the language allows)
- Single element
- Two elements
- Maximum-size input (the constraint upper bound)
- Minimum-size input (often the constraint lower bound)
Input content
- All elements identical (duplicates)
- All elements distinct
- Already sorted (ascending and descending)
- Negative numbers
- Zero
- Mixed signs
- Values at integer boundaries (INT_MAX, INT_MIN, overflow risk in sums/products)
- Floating-point precision (when numeric)
Domain-specific
- Disconnected graph
- Self-loop, multi-edge
- Cycle in a graph that “should” be a tree
- Empty tree / single-node tree / skewed tree
- Linked list with one node, two nodes, with cycle
- Strings with unicode, with whitespace, with case differences
Output ambiguity
- Multiple valid answers (does the interviewer want any, all, or a canonical one?)
- Stable ordering required vs not
- Off-by-one in inclusive vs exclusive bounds
Failure modes
- Invalid input — does your function crash, return a sentinel, or raise?
- Concurrent access (for the practical-engineering labs)
- Timeout case — what happens when N is at the constraint limit?
Required Tests Per Lab (Curriculum-Wide Rule)
From Phase 10 forward, every lab you complete (and every lab from Phases 0–9 you re-solve) must include:
- 3 normal tests — the happy path, what the problem statement examples look like
- 3 edge tests — chosen from the checklist above; pick the three most relevant to this problem
- 1 large-input test — N at the constraint upper bound; verifies you didn’t accidentally write an O(N²) loop you thought was O(N log N)
- 1 randomized test (when a verifier exists) — random input, run brute force and fast solution, assert equal
- 1 invalid-input test (when applicable) — wrong type, malformed, out of range
Document these as test functions, not “I thought about it.” The act of writing them catches bugs.
Common Mistakes
- Testing only the given examples. The examples in the problem statement are almost always the happy path; they rarely exercise edge cases.
- Mental simulation without writing it down. Your brain skips steps. Trace on paper.
- Treating “the code compiles” as “the code works.” Compilation is the lowest bar.
- Not verifying complexity empirically. A claimed O(N) that runs 4× slower at 2N is actually quadratic; linear code should only double.
- Adding tests after the bug. Add the test first, watch it fail, then fix; otherwise you don’t know your test would have caught it.
- Ignoring “obvious” cases. Empty-input bugs are among the most common causes of failed phone screens.
- Not testing concurrency under load. A thread-safety bug at 1 thread is invisible; at 1000 threads on 8 cores, it’s a daily incident.
Debugging Checklist (Apply When Stuck)
- Reproduce. What is the smallest input that fails?
- Read the error. Stack trace, line number, value. Do not skip this.
- State the expected output. If you can’t, you don’t understand the problem.
- Diff expected vs actual. Is it off by one? Off by a factor? Wrong type?
- Binary-search the bug. Print state at midpoint of the algorithm; halve the search space.
- Check invariants. What was supposed to be true at this point? Assert it.
- Question assumptions. “I’m sure this list is sorted” — prove it.
- Read the code aloud. Speech catches what your eye skips.
- Rubber-duck explain. Tell an inanimate object what the code does, line by line.
- Step away for 60 seconds. Genuinely. The number of bugs solved this way is embarrassing.
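The "check invariants" step can be made concrete by asserting the invariant inside the loop while debugging. The sketch below does this for a lower-bound binary search; lower_bound is a hypothetical helper, and the assertions cost O(N) per iteration, so they belong in a debugging session rather than in submitted code.

```python
def lower_bound(nums, target):
    """Return the first index i with nums[i] >= target, or len(nums) if none."""
    lo, hi = 0, len(nums)
    while lo < hi:
        # Invariant: everything left of lo is < target,
        # and everything from hi onward is >= target.
        assert all(x < target for x in nums[:lo])
        assert all(x >= target for x in nums[hi:])
        mid = (lo + hi) // 2
        if nums[mid] < target:
            lo = mid + 1  # nums[mid] is too small, so the answer is right of mid
        else:
            hi = mid      # nums[mid] qualifies, so the answer is mid or left of it
    return lo

print(lower_bound([1, 2, 4, 4, 7], 4))
```

If an assertion fires, the failing iteration is the midpoint of the bug hunt: the state was still valid one step earlier.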
Mastery Checklist
You have completed Phase 10 when you can:
- Generate the universal test checklist for any new problem in under 90 seconds
- Write a brute-force verifier for any problem with N ≤ 20
- Build a randomized stress-testing harness in under 10 minutes for a new problem
- Diagnose a wrong-answer bug in your own code in under 5 minutes using the debugging checklist
- Diagnose a TLE (timeout) bug by measuring the doubling ratio
- State the loop invariant for binary search, Kadane’s algorithm, and a simple DP
- Profile a Python script and identify the top 3 hot functions in under 5 minutes
- Find a race condition in a small Go/Java/C++ program using the language’s race detector
- Recognize when a test is too weak (mutation testing thought experiment)
Exit Criteria
Before moving to Phase 11:
- Complete all 6 labs in this directory with the full test suite written and passing
- Re-solve 3 problems from Phase 2 and 3 problems from Phase 5 applying the universal test checklist; document any bugs caught
- Run the stress-testing harness (Lab 5) on at least one problem you previously thought was correct, and report what you found
- Profile one of your Phase 8 practical-engineering implementations (e.g., LRU cache, rate limiter) and identify at least one inefficiency
Labs
| # | Lab | Focus | Anchor Problem |
|---|---|---|---|
| 1 | lab-01-edge-case-taxonomy.md | Systematic edge case discovery | Array median |
| 2 | lab-02-test-driven-problem-solving.md | Write tests before code | LRU cache |
| 3 | lab-03-debugging-under-pressure.md | Systematic debug protocol | Word Break (planted bug) |
| 4 | lab-04-correctness-proofs.md | Loop invariants & induction | Binary search + Kadane |
| 5 | lab-05-stress-testing-harness.md | Brute-force verifier + random fuzzing | Two-pointer variants |
| 6 | lab-06-performance-profiling.md | Empirical complexity + profiling | LIS implementations |
Connection to Other Phases
- Phase 2/3/4/5 — re-solve a problem from each, applying the universal test checklist
- Phase 7 (Competitive) — Lab 5 (stress testing) is the canonical CP debugging tool; use it on every CF problem you fail
- Phase 8 (Practical Engineering) — concurrency-aware testing is required for every lab; the rate limiter, LRU cache, and thread pool labs all need race-condition tests
- Phase 11 (Mocks) — the testing rubric (dimension 8) is scored on every mock; this phase trains that score
Lab 01 — Edge Case Taxonomy (Find the Median of an Unsorted Array)
Goal
Build a reusable, systematic edge-case taxonomy you can apply to any new problem in under 90 seconds. Use “find the median of an unsorted integer array” as the anchor — a problem that looks trivial but has at least 12 edge cases that a careless candidate will miss. By the end you should be able to enumerate 8+ edge cases for any array problem before writing a single line of code.
Background Concepts
An edge case is an input that is technically legal under the constraints but exercises a degenerate or boundary behavior in your algorithm. They fall into a small number of universal categories:
- Empty / null — what does your function do with [] or None?
- Singleton — one element
- Identical elements — all equal; tests duplicate handling
- Boundary values — INT_MAX, INT_MIN, 0, negatives
- Sorted / reverse-sorted — tests algorithms that assume scrambled input
- Maximum size — N at the constraint upper bound; tests complexity
- Output ambiguity — multiple valid answers; tests the spec
- Arithmetic overflow — sums/products that exceed INT_MAX
The taxonomy is universal. The application is problem-specific.
Interview Context
“Find the median” is asked as a warm-up at Meta, Microsoft, and Bloomberg phone screens. The interviewer is not testing whether you know quickselect. They are testing whether you ask “what do you mean by median for an even-length array — average of the two middles or either one?” before writing code. Candidates who skip this question lose the point even if their code is otherwise correct.
The senior signal is to list out edges aloud before coding: “Empty array — should I return None or throw? Single element — that’s the median. Two elements — average. Even vs odd length — different formulas. Are values bounded so the sum of two won’t overflow?” Five sentences. Then code.
Problem Statement
Given an unsorted array of integers nums, return the median. If nums has odd length, return the middle value after sorting. If nums has even length, return the average of the two middle values.
Constraints
- 0 ≤ |nums| ≤ 10^5
- -10^9 ≤ nums[i] ≤ 10^9
- The return type may be a float (because of averaging)
Clarifying Questions
- Empty input? What should I return — None, NaN, raise an exception?
- Even length: average or either middle? Lower middle, upper middle, or the float average?
- Are duplicates allowed? (Yes; median definition handles them naturally.)
- Floating point precision concerns? If nums[i] is up to 10^9, sum of two is 2×10^9 — fits in 32-bit signed int barely, but using (a + b) / 2.0 in C++ overflows for INT_MAX + INT_MAX. Better: a/2.0 + b/2.0 or a + (b - a)/2.0.
- Modify input allowed? (Affects whether you can sort in place or need to copy.)
Examples
nums = [3, 1, 2] → 2 (odd length)
nums = [3, 1, 2, 4] → 2.5 (even, average of 2 and 3)
nums = [5] → 5 (singleton)
nums = [] → None / raise (clarify with interviewer)
nums = [7, 7, 7, 7] → 7.0 (all duplicates, even length)
nums = [INT_MAX, INT_MAX] → INT_MAX (overflow risk in average)
nums = [-3, -1, -2] → -2 (negatives)
Initial Brute Force
Sort, then index. A handful of lines.
def median(nums):
    s = sorted(nums)  # copies any iterable; raises TypeError on None
    n = len(s)
    if n == 0:
        return None  # chosen convention for empty input
    if n % 2 == 1:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2
Brute Force Complexity
Time O(N log N), space O(N) (or O(1) if you sort in place and the caller allows mutation). For N = 10^5 this is ~1.7×10^6 comparisons — well within any interview time limit.
Optimization Path
The interviewer may now ask: “Can you do better than O(N log N)?” The answer is quickselect, which finds the k-th smallest in expected O(N) using partition-based recursion. Worst case O(N²); use median-of-medians for guaranteed O(N) if pressed.
For the edge-case lab, do not optimize. The point is to enumerate edges before the algorithm matters. Quickselect has more edge cases (recursion depth on degenerate partitions, pivot selection bias) so optimizing without first nailing edges makes the bug surface larger.
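For reference once the edge cases are nailed down, a quickselect-based median might look like the sketch below. It is an illustration rather than the lab's expected answer: quickselect and median_quickselect are hypothetical names, the three-way split trades extra memory for simplicity, and it assumes nums is a list.

```python
import random

def quickselect(nums, k, rng=random):
    """Return the k-th smallest element (0-indexed) without mutating nums."""
    if not 0 <= k < len(nums):
        raise IndexError("k out of range")
    items = list(nums)
    while True:
        pivot = rng.choice(items)  # random pivot gives expected O(N) overall
        less = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(less):
            items = less
        elif k < len(less) + equal_count:
            return pivot  # k lands inside the run of values equal to the pivot
        else:
            k -= len(less) + equal_count
            items = [x for x in items if x > pivot]

def median_quickselect(nums):
    """Median with the same conventions as the sort-based version."""
    n = len(nums)
    if n == 0:
        return None
    if n % 2 == 1:
        return quickselect(nums, n // 2)
    return (quickselect(nums, n // 2 - 1) + quickselect(nums, n // 2)) / 2

print(median_quickselect([3, 1, 2, 4]))
```

Grouping values equal to the pivot keeps all-duplicate inputs from degrading to quadratic time, and the loop form avoids the recursion-depth edge case mentioned above.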
Final Expected Approach
- Validate input. Return None (or raise) on empty.
- Sort a copy (do not mutate caller’s array unless agreed).
- Compute middle index mid = n // 2.
- If odd, return sorted[mid].
- If even, return sorted[mid - 1] + sorted[mid] divided by 2, using a + (b - a) / 2 form to avoid overflow.
Data Structures Used
- A sortable copy of the array. In Python sorted() returns a new list. In Java use Arrays.sort() on a clone; in C++ std::sort on a copy.
- No auxiliary structures.
Correctness Argument
After sorting, for even lengths the value at index n // 2 is the upper middle (0-indexed) and the value at n // 2 - 1 is the lower middle; their average is the median. For odd lengths, n // 2 is exactly the middle. The sort guarantees the ordering invariant required.
Complexity
Time O(N log N) sort + O(1) lookup. Space O(N) for the copy (or O(1) if in-place sort is allowed). Quickselect: expected O(N), worst O(N²); median-of-medians: worst O(N) with larger constant.
Implementation Requirements
- Function signature should accept any iterable convertible to a list.
- Do not mutate the caller’s input.
- Return type: float (even for odd-length inputs, for consistency) or use a tagged return; document which.
- Handle empty input explicitly with the chosen convention.
Tests
Smoke (3 normal)
assert median([3, 1, 2]) == 2
assert median([1, 2, 3, 4]) == 2.5
assert median([5, 2, 8, 1, 9]) == 5
Edge (7 — exceeds the 3 minimum because this lab is about edges)
assert median([]) is None # empty
assert median([42]) == 42 # singleton
assert median([1, 2]) == 1.5 # even, smallest
assert median([7, 7, 7, 7]) == 7 # all duplicates
assert median([-3, -1, -2]) == -2 # all negatives
assert median([10**9, 10**9]) == 10**9 # overflow boundary
assert median([-10**9, 10**9]) == 0 # mixed extremes
Large
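import random
import time
random.seed(0)
big = [random.randint(-10**9, 10**9) for _ in range(10**5)]
start = time.perf_counter()
result = median(big)
elapsed = time.perf_counter() - start
assert isinstance(result, (int, float)) # didn't crash
assert elapsed < 1.0 # didn't take >1s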
Randomized verifier
import random

def brute_median(nums):
    s = sorted(nums)
    n = len(s)
    return s[n//2] if n % 2 else (s[n//2 - 1] + s[n//2]) / 2

for _ in range(1000):
    n = random.randint(1, 50)
    nums = [random.randint(-100, 100) for _ in range(n)]
    assert median(nums) == brute_median(nums)
Invalid input
try:
    median(None)
    assert False, "should have raised"
except TypeError:
    pass
Follow-up Questions
- Streaming median. Find the median as numbers arrive one at a time. → Two heaps (max-heap of lower half, min-heap of upper half). O(log N) per insert, O(1) per query.
- Median of two sorted arrays. Classic LC 4 hard. → Binary search on partition; O(log min(N, M)).
- k-th smallest in unsorted. → Quickselect; O(N) expected.
- Weighted median. Each value has a weight; find the value where cumulative weight crosses half. → Sort + prefix scan; O(N log N).
- Approximate median in one pass with O(1) memory. → Reservoir sampling + recursion, or P² algorithm.
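The two-heap answer to the streaming-median follow-up can be sketched as below. StreamingMedian is a hypothetical class name; the sketch assumes numeric inputs and returns None before any value has arrived, mirroring the empty-input convention used earlier in this lab.

```python
import heapq

class StreamingMedian:
    """Running median using a max-heap for the lower half and a min-heap for the upper half."""

    def __init__(self):
        self._lower = []  # max-heap simulated with negated values
        self._upper = []  # min-heap

    def add(self, x):
        # Route x through lower so that every value in lower <= every value in upper.
        heapq.heappush(self._lower, -x)
        heapq.heappush(self._upper, -heapq.heappop(self._lower))
        # Rebalance so lower holds the same count as upper, or exactly one more.
        if len(self._upper) > len(self._lower):
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    def median(self):
        if not self._lower:
            return None
        if len(self._lower) > len(self._upper):
            return -self._lower[0]
        return (-self._lower[0] + self._upper[0]) / 2

stream = StreamingMedian()
for value in [3, 1, 2, 4]:
    stream.add(value)
print(stream.median())
```

Each add performs a constant number of heap operations, which gives the O(log N) insert and O(1) query stated above.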
Product Extension
- Latency percentiles in distributed monitoring. P50 (median), P99, P99.9. Cannot store all latencies — use t-digest or HdrHistogram for compact mergeable approximations.
- A/B testing. Comparing median user session length between buckets requires bootstrap confidence intervals because medians don’t have closed-form variance.
- Recommendation systems. “Median rating per item” for cold-start ranking.
Language/Runtime Follow-ups
- Python: sorted() is TimSort, O(N log N) worst case; uses additional O(N) memory. list.sort() is in-place. nums[n//2] is O(1) indexing.
- Java: Arrays.sort(int[]) uses dual-pivot quicksort (O(N log N) average, O(N²) worst on adversarial input). Arrays.sort(Object[]) uses TimSort. Auto-boxing Integer adds overhead.
- C++: std::sort is introsort (quicksort + heapsort fallback); worst-case O(N log N). std::nth_element is O(N) average for quickselect. Beware integer overflow in (a + b) / 2 for signed 32-bit; use a + (b - a) / 2.
- Go: sort.Ints is O(N log N) (pattern-defeating quicksort since Go 1.19, an introsort-style hybrid before that). int is 32 or 64 bits depending on the platform, and overflow wraps silently with no checks.
- JavaScript: Array.prototype.sort() defaults to lexicographic string comparison — [10, 9, 2].sort() returns [10, 2, 9]. Always pass a comparator: sort((a, b) => a - b).
Common Bugs
- Empty input crash. s[n // 2] with n == 0 is s[0] on an empty list → IndexError.
- Integer overflow on average. (a + b) / 2 overflows when a + b > INT_MAX. Use a + (b - a) / 2 or use floating-point earlier.
- Integer division for even-length median. In Python 2 / Java, (a + b) / 2 truncates. In Python 3, / is float — but // is integer. Be explicit.
- Mutating caller’s array. Calling nums.sort() inside the function modifies the caller’s list. Use sorted(nums).
- Off-by-one for even length. n // 2 is the upper middle (0-indexed); n // 2 - 1 is the lower. Confusing these gives the wrong answer for [1, 2, 3, 4].
- JavaScript default sort. Returns string-sorted order for numbers.
Debugging Strategy
If your function returns the wrong value:
- Print the sorted array. Is it actually sorted? (Confirms no JavaScript-style default-sort bug.)
- Print n, n // 2, n % 2. Is the index what you expect?
- Check parity branch — did you accidentally swap the odd/even branches?
- For overflow: print s[n//2 - 1] + s[n//2] before dividing; check if it matches the expected sum.
- For mutation bugs: print the input both before and after the call. If it changed, you mutated.
If you TLE on the large test, you wrote O(N²) accidentally (e.g., used insertion sort, or sorted inside a loop).
Mastery Criteria
- Wrote the function correctly on the first try with all edge cases handled
- Listed all 8+ edge cases aloud before writing code (time yourself: under 90 seconds)
- Identified the overflow risk in (a + b) / 2 without prompting
- Wrote the randomized verifier in under 5 minutes
- Can recite the universal edge-case taxonomy (empty / singleton / two / duplicates / sorted / boundary / overflow / mixed) without looking
- Re-applied the taxonomy to one Phase 2 problem and caught at least one edge case you previously missed
Lab 02 — Test-Driven Problem Solving (LRU Cache)
Goal
Write tests before writing the implementation. Use LRU cache (LC 146) as the anchor — a problem where ambiguities in the spec (does put of an existing key count as a “use”? what does capacity 0 mean?) are best surfaced by writing test cases first. By the end you should treat tests as a design tool, not a verification afterthought.
Background Concepts
Test-driven design (TDD) in an interview context is not the dogmatic red-green-refactor cycle. It is the discipline of writing 3–5 example calls and their expected results before writing the implementation, because:
- Writing the expected output forces you to confront spec ambiguities (and ask the interviewer).
- The tests double as documentation of your understanding — if the interviewer disagrees, you discover it before you’ve written 50 lines.
- The tests become your verification suite — you don’t have to invent them after the fact under time pressure.
- The act of choosing tests reveals edge cases you would otherwise miss.
The cost is 2–3 minutes up front. The savings are usually 10+ minutes of debugging later.
Interview Context
LRU cache is one of the most-asked OOD-flavored coding questions at FAANG. Google, Meta, Amazon, and Bloomberg all ask it in some form. The standard expectation is O(1) get and put using a hashmap + doubly linked list.
The senior signal is to enumerate behavioral test cases before touching code: “get on missing key returns -1 (or what?). put of existing key updates value AND marks as recently used? put over capacity evicts the LRU; what if multiple keys are tied? Does get count as a use?” These are the real questions. Candidates who code first and discover these mid-interview look junior.
Problem Statement
Design a data structure that supports:
- LRUCache(int capacity): initialize with a positive capacity
- int get(int key): return the value if present, else -1; reading a key counts as “recently used”
- void put(int key, int value): insert or update; on overflow evict the least recently used; updating an existing key also counts as recently used
All operations must be O(1) average.
Constraints
- 1 ≤ capacity ≤ 3000
- 0 ≤ key, value ≤ 10^4
- Up to 2×10^5 calls
Clarifying Questions (Surface These Before Writing Code)
- Does get mark the key as recently used? (Yes, that is standard.)
- Does put on an existing key mark as recently used? (Yes, that is standard. Confirm.)
- Capacity of 0: is that legal? (Constraints say ≥ 1, but worth confirming the contract.)
- Eviction policy when multiple keys are tied for least recently used: can this happen? (In a strict LRU with sequential ops, no. Every access updates the order, and during the initial fill the oldest insertion is the LRU.)
- Thread safety required? (Almost never in the interview; always ask anyway. If yes, see Phase 8 LRU lab.)
Examples (Written as Tests First)
# Test 1: basic put/get
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
assert cache.get(1) == 1 # returns 1
cache.put(3, 3) # evicts key 2 (LRU)
assert cache.get(2) == -1 # not found
assert cache.get(3) == 3
cache.put(4, 4) # evicts key 1
assert cache.get(1) == -1
assert cache.get(3) == 3
assert cache.get(4) == 4
# Test 2: put on existing key updates value AND recency
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
cache.put(1, 10) # update key 1; now order is 2 (LRU), 1 (MRU)
cache.put(3, 3) # evicts 2, not 1
assert cache.get(2) == -1
assert cache.get(1) == 10 # updated value preserved
# Test 3: get on missing key
cache = LRUCache(1)
assert cache.get(99) == -1 # never inserted
These three tests already locked down 6 design decisions. Now you can write the implementation.
Initial Brute Force
Use a single Python dict and an auxiliary list to track recency order. get: O(1) dict lookup, but moving the key to the end of the list is O(N). put: O(1) dict insert, but repositioning an existing key or evicting from the front of the list is O(N). Total: O(N) per op.
Alternatively, use collections.OrderedDict, which is already a hash map + doubly linked list internally. move_to_end and popitem(last=False) are both O(1), giving a single-class solution in ~15 lines.
Brute Force Complexity
Naive dict + list: O(N) per op, fails on 2×10^5 calls at large capacity → 6×10^8 ops, TLE.
Optimization Path
The standard answer is hashmap + doubly linked list:
- Hashmap: key → node
- Doubly linked list: nodes in MRU-to-LRU order
- get: hashmap lookup → unlink node → relink at head (MRU)
- put: if key exists, update value + move to head; else create node, insert at head; if size > capacity, remove tail (LRU) and delete from hashmap
All operations are O(1) because both the hashmap and the doubly linked list support O(1) insert/delete with a node reference.
Using OrderedDict in Python is equivalent and acceptable in interviews if you explain why it works (because it’s a hashmap + DLL internally). In Java, use LinkedHashMap with accessOrder=true.
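As a rough illustration of the OrderedDict route, the sketch below follows the same get/put contract as the problem statement. The class name OrderedDictLRU and the ValueError on non-positive capacity are assumptions made for this example.

```python
from collections import OrderedDict


class OrderedDictLRU:
    """LRU cache backed by OrderedDict (hypothetical class name)."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            # Assumption: non-positive capacity is rejected rather than silently accepted.
            raise ValueError("capacity must be positive")
        self.cap = capacity
        self.data = OrderedDict()   # oldest (LRU) first, newest (MRU) last

    def get(self, key: int) -> int:
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)         # update refreshes recency
        elif len(self.data) >= self.cap:
            self.data.popitem(last=False)      # evict the LRU entry
        self.data[key] = value


cache = OrderedDictLRU(2)
cache.put(1, 1)
cache.put(2, 2)
cache.put(1, 10)      # update refreshes recency, so key 2 is now LRU
cache.put(3, 3)       # evicts key 2
print(cache.get(2), cache.get(1), cache.get(3))  # -1 10 3
```

If you present this in an interview, be ready to explain that move_to_end and popitem(last=False) are the relink-at-MRU and remove-LRU steps of the explicit design.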
Final Expected Approach
Implement with explicit doubly linked list to demonstrate understanding:
class Node:
    __slots__ = ('key', 'val', 'prev', 'next')

    def __init__(self, key=0, val=0):
        self.key, self.val = key, val
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity: int):
        # The spec promises capacity >= 1; reject anything else explicitly,
        # because evicting from an empty list would unlink a sentinel.
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.cap = capacity
        self.cache = {}
        # sentinel head/tail to avoid edge cases
        self.head = Node()
        self.tail = Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove(node)
        self._add_to_front(node)
        return node.val

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            node = self.cache[key]
            node.val = value
            self._remove(node)
            self._add_to_front(node)
        else:
            if len(self.cache) >= self.cap:
                lru = self.tail.prev
                self._remove(lru)
                del self.cache[lru.key]
            node = Node(key, value)
            self.cache[key] = node
            self._add_to_front(node)
Data Structures Used
- Dict — O(1) key → node lookup
- Doubly linked list with sentinels — O(1) insert at head, O(1) remove from tail or middle
Sentinels eliminate if node is None checks at the boundary, the #1 source of LRU bugs.
Correctness Argument
Invariant 1: cache[key] always points to the node currently in the linked list with that key. Maintained because every insertion adds to both, every removal removes from both.
Invariant 2: The linked list is ordered MRU → LRU from head to tail. Maintained because every access (get or put on existing) moves the node to the front, and every new insert goes to the front.
Invariant 3: len(cache) ≤ capacity. Maintained because put evicts the tail (LRU) before adding when at capacity.
Complexity
get: O(1). put: O(1). Space: O(capacity).
Implementation Requirements
- Use sentinels — no null checks at head/tail
- Use __slots__ on the Node class in Python for memory efficiency
- Keep _remove and _add_to_front as separate helpers; do not inline them, because the duplicated pointer updates are a bug magnet
- Update the hashmap whenever you add a node to or remove a node from the linked list, never one without the other
Tests
Smoke (3 normal)
The three tests in the Examples section above.
Edge
# Capacity 1 — every put evicts
cache = LRUCache(1)
cache.put(1, 1)
cache.put(2, 2)
assert cache.get(1) == -1
assert cache.get(2) == 2
# Get on a key that was evicted
cache = LRUCache(2)
cache.put(1, 1); cache.put(2, 2); cache.put(3, 3)
assert cache.get(1) == -1
# Repeated put on same key never evicts
cache = LRUCache(2)
for v in range(100):
    cache.put(1, v)
assert cache.get(1) == 99
Large
cache = LRUCache(3000)
for i in range(200000):
    cache.put(i % 5000, i)
# Verifies O(1) per op; should complete in <1s
Randomized verifier (brute O(N) LRU using list)
class BruteLRU:
    def __init__(self, cap):
        self.cap = cap
        self.order = []   # keys, LRU first, MRU last
        self.vals = {}

    def get(self, k):
        if k not in self.vals:
            return -1
        self.order.remove(k)
        self.order.append(k)
        return self.vals[k]

    def put(self, k, v):
        if k in self.vals:
            self.order.remove(k)
        elif len(self.vals) >= self.cap:
            evict = self.order.pop(0)
            del self.vals[evict]
        self.order.append(k)
        self.vals[k] = v


import random
random.seed(42)
for trial in range(100):
    cap = random.randint(1, 10)
    fast = LRUCache(cap)
    slow = BruteLRU(cap)
    for _ in range(200):
        if random.random() < 0.5:
            k = random.randint(0, 15)
            assert fast.get(k) == slow.get(k)
        else:
            k = random.randint(0, 15)
            v = random.randint(0, 100)
            fast.put(k, v)
            slow.put(k, v)
Invalid input
# Capacity 0 — undefined by spec; behavior must be documented
try:
    LRUCache(0)
except ValueError:
    pass
# Or: assert never stores anything if zero is allowed
Follow-up Questions
- Make it thread-safe. Per-op mutex is the simple answer; striped locks for higher throughput. (See Phase 8 lab.)
- LFU instead of LRU. Track frequencies + a min-frequency pointer; harder, see Phase 8 lab-02.
- TTL eviction. Add expiration timestamp per entry; lazy or eager eviction tradeoff.
- Distributed LRU. Consistent hashing across nodes; cache coherence is now a hard problem.
- Approximate LRU. Sample K random entries and evict the oldest among them (Redis approach); O(1) eviction without strict ordering.
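For the thread-safety follow-up, the per-op mutex answer can be sketched as below. This is only the coarse-grained version; striped locks are not shown. The class name LockedLRU and the helper _worker are hypothetical, and the cache is built on OrderedDict to keep the example short.

```python
import threading
from collections import OrderedDict


class LockedLRU:
    """Coarse-grained thread-safe LRU: one mutex guards every operation (sketch)."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.cap = capacity
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:            # get mutates recency order, so it needs the lock too
            if key not in self._data:
                return -1
            self._data.move_to_end(key)
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)
            elif len(self._data) >= self.cap:
                self._data.popitem(last=False)
            self._data[key] = value


def _worker(cache, start):
    # Hypothetical load generator: hammer a small key range from several threads.
    for i in range(start, start + 1000):
        cache.put(i % 50, i)
        cache.get(i % 50)


shared = LockedLRU(16)
threads = [threading.Thread(target=_worker, args=(shared, t * 1000)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared._data) <= 16)  # True
```

The point worth narrating is that get is a write from the lock's point of view, which is why a plain reader/writer lock gives little benefit here.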
Product Extension
- CPU caches. Hardware caches typically use pseudo-LRU because tracking exact recency order within each set costs too many state bits and too much update logic.
- CDN edge caches. Strict LRU loses popular content under cache pollution; LFU + admission filter (TinyLFU) is state of the art.
- Database buffer pools. PostgreSQL uses a clock-sweep approximation; MySQL InnoDB uses LRU with a midpoint insertion point to resist scan pollution.
- OS page replacement. Linux uses two clock lists (active + inactive) to approximate LRU at low cost.
Language/Runtime Follow-ups
- Python: OrderedDict is implemented in C and uses a doubly linked list internally; move_to_end and popitem(last=False) are O(1). The __slots__ on Node avoids per-instance __dict__, saving ~50% memory per node.
- Java: LinkedHashMap with the (capacity, 0.75f, true) constructor enables access order; override removeEldestEntry for eviction. ConcurrentHashMap does not provide LRU; you’d need Caffeine or a custom striped lock.
- C++: std::list + std::unordered_map<Key, std::list<std::pair<Key, Value>>::iterator>; iterators to std::list remain valid after other inserts/erases, which is essential for this design.
- Go: No built-in LRU; use container/list + a map. The third-party package hashicorp/golang-lru is the go-to.
- Rust: Borrowing rules make a vanilla doubly linked list hard; use the lru crate, which uses raw pointers internally.
Common Bugs
- Forgot to update recency on get. Tests pass for put but fail when get should “save” a key from eviction.
- Forgot to update recency on put of existing key. The key that was just updated can get evicted on the next put.
- Hashmap and linked list out of sync. Removed from list but not dict, or vice versa. Always update both.
- No sentinels → null pointer at head/tail. Sentinels eliminate 4 different null checks.
- Evicting before checking if key already exists. Update-of-existing should not trigger eviction.
- Off-by-one on capacity comparison. len(cache) >= cap vs len(cache) > cap: the first is correct when you check before inserting, because you’re about to add one more.
Debugging Strategy
If a test fails:
- Print the linked list (head → tail with keys) after each operation. Verify it matches your expected MRU order.
- Print len(cache) after each op. It should equal the number of new-key inserts minus evictions.
- Cross-check: after every op, set(cache.keys()) should equal the set of keys in the linked list. If not, you have a sync bug.
- Run the randomized verifier with a fixed seed; when it fails, print the trace of operations that led to the divergence. With small capacities and key ranges the trace is typically 10–20 ops long and the bug is obvious.
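The cross-checks above can be bundled into one helper that you call after every operation while debugging. The sketch below assumes the attribute names used in this lab's implementation (head, tail, cache, cap); the function name check_lru_invariants is hypothetical, and the demo builds a tiny stand-in structure by hand so the block runs on its own.

```python
from types import SimpleNamespace as NS


def check_lru_invariants(cache):
    """Raise AssertionError if the dict and the linked list disagree.

    Assumes cache.head / cache.tail sentinels, a cache.cache dict, and cache.cap.
    Returns the keys in MRU -> LRU order, which is handy for printing.
    """
    keys_in_list = []
    node = cache.head.next
    while node is not cache.tail:
        assert node.prev.next is node, "broken prev/next link"
        keys_in_list.append(node.key)
        node = node.next
    assert len(keys_in_list) == len(set(keys_in_list)), "duplicate key in list"
    assert set(keys_in_list) == set(cache.cache.keys()), "dict and list out of sync"
    assert len(cache.cache) <= cache.cap, "over capacity"
    for key, n in cache.cache.items():
        assert n.key == key, "dict points at a node with a different key"
    return keys_in_list


# Hand-built two-entry cache (MRU = 2, LRU = 1) standing in for the lab's LRUCache.
head, tail = NS(key=None), NS(key=None)
n2, n1 = NS(key=2, val=20), NS(key=1, val=10)
head.prev, head.next = None, n2
n2.prev, n2.next = head, n1
n1.prev, n1.next = n2, tail
tail.prev, tail.next = n1, None
fake = NS(head=head, tail=tail, cache={1: n1, 2: n2}, cap=2)
print(check_lru_invariants(fake))  # [2, 1]
```

The walk is O(N) per call, so keep it out of the timed path and remove it before declaring the solution done.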
Mastery Criteria
- Wrote the tests before writing the implementation
- Surfaced at least 3 spec ambiguities through the tests
- Implementation worked on the first run (because tests forced the design to be correct)
- Used sentinels in the linked list
- Wrote the randomized verifier and ran it for 100+ trials with no divergence
- Can explain why updating an existing key must mark it as MRU
- Re-applied TDD to one Phase 8 lab and recorded how many design questions it surfaced
Lab 03 — Debugging Under Pressure (Word Break with a Planted Bug)
Goal
Build a systematic debugging protocol you can execute under interview pressure in under 5 minutes. The anchor is Word Break (LC 139) with a planted off-by-one — you’ll be given buggy code and a failing test, and the goal is to find the bug methodically rather than by panic-staring at the screen. By the end you should reach for the protocol automatically when stuck.
Background Concepts
Under pressure, most candidates default to panic debugging: re-read the code 5 times, add random print statements, change one thing, hope. This rarely works in 5 minutes and looks terrible to the interviewer.
Systematic debugging is a 6-step protocol:
- Reproduce — Confirm the failing input. The smallest one that fails.
- Isolate — What is the exact discrepancy? Expected X, got Y.
- Hypothesize — Form a specific hypothesis: “I think the bug is in the inner loop’s bound.”
- Verify — Add one targeted print or assertion that confirms or denies the hypothesis. Not five prints.
- Fix — Make the minimum change that addresses the verified hypothesis.
- Re-test — Run all tests, not just the failing one. Make sure you didn’t break something else.
The discipline is in steps 3 and 4. Hypothesize before you print. Random prints waste time and create noise.
Interview Context
Expect to hit a bug in most medium+ interview problems. How you respond is a major signal:
| Behavior | Signal |
|---|---|
| “It doesn’t work, let me try…” (silent typing for 3 min) | Junior: no protocol |
| “Let me add some prints…” (adds 8 prints, can’t read output) | Junior: random debugging |
| “The expected output is X, I got Y. So the difference is Z. My hypothesis is that the bug is in the loop bound. Let me check by printing i at line 7.” | Senior: narrating the protocol |
| Finds the bug, then says “Let me also test the case I just fixed plus an adjacent case, in case I introduced something.” | Senior+: proactive regression |
Narrating the protocol aloud is itself the signal. The interviewer can hear that you’re a person who has debugged a thousand bugs and has a process.
Problem Statement
You are given the following implementation of Word Break. It is buggy. Find and fix the bug using the debug protocol. You may add prints/asserts but must remove them before declaring the fix complete.
def word_break(s: str, word_dict: list[str]) -> bool:
    """Return True iff s can be segmented into a sequence of words from word_dict."""
    words = set(word_dict)
    n = len(s)
    # dp[i] = True iff s[:i] can be segmented
    dp = [False] * n
    dp[0] = True
    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break
    return dp[n]
Failing test: word_break("leetcode", ["leet", "code"]) should return True. The function raises IndexError.
Constraints
- 1 ≤ |s| ≤ 300
- 1 ≤ |word_dict| ≤ 1000
- All strings lowercase letters
Clarifying Questions
(Not applicable — this lab uses a fixed buggy implementation. The clarifying questions for Word Break itself are: are words reusable? Yes. Are duplicates in word_dict significant? No. Empty s? Returns True conventionally.)
Examples
word_break("leetcode", ["leet", "code"]) → True
word_break("applepenapple", ["apple", "pen"]) → True
word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]) → False
word_break("", ["any"]) → True (empty string is trivially segmentable)
Initial Brute Force
Recursion: for each prefix of s that is in word_dict, recurse on the suffix. Time O(2^N) without memoization.
Brute Force Complexity
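O(2^N) without memoization. With memoization there are N distinct suffix start positions, each trying up to N split points, so O(N²) substring checks; each check slices and hashes up to N characters, so the worst case is O(N³) character operations.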
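A minimal sketch of the memoized recursion, for comparison with the bottom-up version. The function name word_break_memo and the inner helper can_break are hypothetical; functools.lru_cache is used as the memo table.

```python
from functools import lru_cache


def word_break_memo(s, word_dict):
    """Top-down Word Break: can the suffix s[start:] be segmented? (sketch)"""
    words = set(word_dict)
    n = len(s)

    @lru_cache(maxsize=None)
    def can_break(start):
        if start == n:               # empty suffix: trivially segmentable
            return True
        for end in range(start + 1, n + 1):
            # Try every prefix of the suffix that is a dictionary word.
            if s[start:end] in words and can_break(end):
                return True
        return False

    return can_break(0)


print(word_break_memo("leetcode", ["leet", "code"]))                          # True
print(word_break_memo("catsandog", ["cats", "dog", "sand", "and", "cat"]))    # False
```

Recursion depth grows with len(s), which is fine for the 300-character bound here but is one reason the bottom-up form is usually preferred.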
Optimization Path
Bottom-up DP: dp[i] = True iff s[:i] can be segmented. Transition: dp[i] = True iff there exists j < i with dp[j] True and s[j:i] ∈ words. The buggy implementation above is the right idea — just slightly wrong.
Final Expected Approach (Correct Version)
def word_break(s, word_dict):
    words = set(word_dict)
    n = len(s)
    dp = [False] * (n + 1)  # ← THE FIX: size n+1, not n
    dp[0] = True
    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break
    return dp[n]
Applying the Debug Protocol (Walkthrough)
Step 1 — Reproduce
Run word_break("leetcode", ["leet", "code"]). Get IndexError: list assignment index out of range. Confirm with the exact line: the assignment dp[i] = True when i = n.
Step 2 — Isolate
The loop runs i from 1 to n inclusive (range(1, n + 1)). dp has length n. So when i == n, dp[i] is out of bounds.
Step 3 — Hypothesize
The DP array is one element too small. The semantics of dp[i] cover i = 0 (empty prefix) through i = n (full string), which is n + 1 values. The author wrote dp = [False] * n — off by one.
Step 4 — Verify
Add assert len(dp) == n + 1 after allocation. It fails, confirming the hypothesis.
Step 5 — Fix
Change dp = [False] * n to dp = [False] * (n + 1).
Step 6 — Re-test
Run the failing test → True. Run all four examples → all pass. Add an edge test word_break("", []) → returns True (dp[0] is initialized to True, and for n = 0, dp[n] is dp[0]).
Total time if narrated cleanly: under 4 minutes.
Data Structures Used
- Set for O(1) word lookup
- DP array of booleans
Correctness Argument
dp[0] is the base case: the empty prefix is trivially segmentable. For i > 0, dp[i] is True iff some split point j makes both halves valid: dp[j] (the left half is segmentable) and s[j:i] is a word. By induction on i, this is correct.
Complexity
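Time O(N²) substring checks. If you count the cost of slicing and hashing each substring, the worst case is O(N³) character operations, plus O(total dictionary string length) to build the set. Space O(N + W), where W is the total size of the dictionary.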
Implementation Requirements
- DP array size must be n + 1, not n (this is the planted bug)
- Use a set for word lookup, not a list (O(1) vs O(W) per check)
- Break out of inner loop on first success (constant factor, not asymptotic)
Tests
Smoke (3 normal)
assert word_break("leetcode", ["leet", "code"]) is True
assert word_break("applepenapple", ["apple", "pen"]) is True
assert word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]) is False
Edge
assert word_break("", ["any"]) is True # empty string
assert word_break("a", ["a"]) is True # single char match
assert word_break("a", ["b"]) is False # single char no match
assert word_break("aaaa", ["a", "aa"]) is True # overlapping dict
assert word_break("ab", ["a"]) is False # partial match
Large
s = "a" * 300
dict_ = ["a", "aa", "aaa", "aaaa"]
assert word_break(s, dict_) is True
# Should complete in <100ms; O(N^2) = 9e4 ops
Randomized
import random, string
random.seed(0)
for _ in range(100):
    words = [''.join(random.choices(string.ascii_lowercase, k=random.randint(1, 4)))
             for _ in range(5)]
    s = ''.join(random.choice(words) for _ in range(random.randint(1, 10)))
    assert word_break(s, words) is True  # constructed to be segmentable
Invalid input
# Empty dict — empty string still works, non-empty doesn't
assert word_break("", []) is True
assert word_break("a", []) is False
Follow-up Questions
- Return all segmentations. Word Break II (LC 140). DFS + memoization; exponentially many results possible.
- Return one segmentation. Track parent pointers in DP; reconstruct by backtracking.
- Minimum number of words. Modify DP: dp[i] = min over j of dp[j] + 1, taken over j where s[j:i] is a word.
- Streaming input. Word Break on a stream: Aho-Corasick automaton.
- Dictionary changes dynamically. Trie + DP, but rebuilds are expensive on every dict change.
Product Extension
- Spell-check / autocomplete segmentation. “iphoneapp” → “iphone app”. Used in URL/path tokenization.
- Hashtag splitting. Twitter “#machinelearning” → “machine learning”. Same algorithm with a dictionary + frequency weights for tiebreaks.
- DNS subdomain analysis. “thequickbrownfox.com” — fraud detection wants to know if the hostname is composed of dictionary words.
- Chinese/Japanese word segmentation. No spaces between words; same DP with a much larger dictionary.
Language/Runtime Follow-ups
- Python: s[j:i] allocates a new string each call (O(i-j) time and space). For very long strings, prefer indexing into a precomputed structure or using str.startswith against the dictionary entries.
- Java: s.substring(j, i) copies, so it is O(i-j), since Java 7 update 6 (it used to be an O(1) view sharing the parent’s char array; changed to avoid keeping large parent strings alive). Same allocation cost.
- Go: s[j:i] on a string is a view: O(1), no allocation. This avoids the per-substring copy, although map lookups still hash the substring.
- C++: std::string_view (C++17) gives O(1) slicing; s.substr(j, i-j) allocates.
- All: Set membership of strings is O(|substring|) for hashing + O(|substring|) for equality on collision, so it is not strictly O(1). Matters for very long substrings.
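One way to act on the Python note is a forward DP that probes each dictionary word in place with str.startswith, which accepts a start offset and so avoids building s[j:i]. This is a sketch of that variant, not the lab's reference solution; the function name word_break_startswith is hypothetical.

```python
def word_break_startswith(s, word_dict):
    """Forward DP that checks each dictionary word at offset j without slicing s."""
    words = set(word_dict)          # dedupe; iteration order does not matter
    n = len(s)
    dp = [False] * (n + 1)
    dp[0] = True
    for j in range(n):
        if not dp[j]:
            continue                # s[:j] is not segmentable, nothing to extend
        for w in words:
            # startswith(w, j) compares w against s beginning at index j.
            if s.startswith(w, j):
                dp[j + len(w)] = True
    return dp[n]


print(word_break_startswith("applepenapple", ["apple", "pen"]))  # True
```

The trade-off is that the inner loop now runs over the dictionary, so this form pays off when the dictionary is small relative to the string length.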
Common Bugs
- Off-by-one on DP size: the planted bug
- Initializing dp[0] = False: the empty prefix is the base case and must be True
- Forgetting break after success: the result is still correct but performance suffers
- Using a list for word_dict: O(W) per lookup, blows complexity to O(N²·W)
- Inclusive vs exclusive bounds: s[j:i+1] vs s[j:i] is the most common off-by-one when porting between languages
Debugging Strategy
The 6-step systematic protocol, applied to this bug:
- Reproduce minimally. If the bug shows up on a 300-char string, shrink to the smallest failing case (here every input fails, down to the empty string).
- Read the exception fully. IndexError + line number tells you almost everything. Don’t skip the stack trace.
- State the discrepancy precisely. “Expected True, got IndexError on dp[i] = True when i = 8 and len(dp) = 8.”
- Form a specific hypothesis. Not “it’s broken somewhere”; rather, “the array is one too small for the loop range.”
- Verify with one targeted print/assert. assert len(dp) > i immediately before the assignment.
- Regression test. After fixing, run the full suite. In this case, also test empty string explicitly since that’s the boundary.
The 5-minute panic protocol (when truly stuck):
- Stop typing.
- State aloud: “I’m stuck. Let me restate what I know.”
- Restate the input and expected output.
- State what your code does for that input, step by step.
- The bug almost always reveals itself in the gap between “what the code does” and “what should happen.”
Mastery Criteria
- Found the planted bug in under 5 minutes using the protocol
- Narrated each step aloud (or in notes) — Reproduce, Isolate, Hypothesize, Verify, Fix, Re-test
- Added exactly ONE targeted assertion to verify the hypothesis (not 5 prints)
- Ran the full test suite after the fix, not just the failing one
- Wrote the empty-string edge test that would have caught this bug originally
- Can recite the 6-step protocol from memory
- Applied the protocol to one of your own past wrong-answer submissions and timed yourself
Lab 04 — Correctness Proofs (Binary Search & Kadane’s Algorithm)
Goal
Prove the correctness of two short, foundational algorithms — binary search and Kadane’s — using loop invariants and induction. By the end you should be able to state the invariant for any loop you write and use it both to prove correctness and to find bugs before they manifest.
Background Concepts
A loop invariant is a statement that is true:
- Initially (before the loop starts)
- Maintained (if true before an iteration, true after)
- Terminating (when the loop exits, it implies the desired post-condition)
This is induction on iterations. The invariant is what your loop promises about its state. If you can state the invariant out loud while coding, off-by-one bugs disappear because you can check each iteration against the promise.
A monovariant is a quantity that strictly decreases (or increases) each iteration and is bounded — it proves termination. For binary search, the monovariant is the search-range width.
An inductive proof for a recursive function: prove the base case correct; assume the recursive call is correct (induction hypothesis); show the combination is correct.
Interview Context
Interviewers rarely demand a formal proof, but they constantly ask “are you sure this works?” or “why does this work?” The candidates who answer with a precise invariant (“at the top of the loop, lo is the smallest index that could be the answer and hi is one past the largest”) look senior. The candidates who say “uh, I think so” look junior even when the code is correct.
For DP problems, “what’s the state, what’s the transition, and why does the order of iteration give you the correct value when you read it?” is the proof — interviewers explicitly ask this at Meta, Google, and Bloomberg.
Problem Statement
Prove correctness of two algorithms:
Part A: Binary search. Given a sorted array a and a target t, return the index of t if present, else -1.
Part B: Kadane’s algorithm. Given an array of integers (positive, negative, mixed), return the maximum sum of any non-empty contiguous subarray.
For each, you must:
- Write the code
- State the loop invariant precisely
- Prove the invariant holds initially, is maintained, and implies correctness on exit
- Identify the monovariant that proves termination
Constraints
- 1 ≤ |a| ≤ 10^5 for both problems
- Values: -10^9 ≤ a[i] ≤ 10^9 (Kadane: watch overflow on long all-positive subarrays)
Clarifying Questions
(Standard problem statements; the lab is about proof, not problem disambiguation.)
Examples
Binary search:
a = [1, 3, 5, 7, 9, 11], t = 7 → 3
a = [1, 3, 5, 7, 9, 11], t = 4 → -1
a = [], t = 1 → -1
Kadane:
[-2, 1, -3, 4, -1, 2, 1, -5, 4] → 6 (subarray [4, -1, 2, 1])
[-3, -1, -2] → -1 (best single element)
[5] → 5
Initial Brute Force
Binary search: linear scan, O(N).
Kadane: triple loop over (i, j, sum), O(N³). With prefix sums, O(N²).
Brute Force Complexity
Linear: O(N). Triple loop: O(N³). Both correct, both slow.
Optimization Path
Binary search: O(log N) by halving the candidate range each step. Kadane: O(N) by maintaining the best subarray ending at index i and the best seen so far.
Final Expected Approach
Part A — Binary Search (with Proof)
def binary_search(a, t):
    lo, hi = 0, len(a)  # half-open: search range is [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] == t:
            return mid
        elif a[mid] < t:
            lo = mid + 1
        else:
            hi = mid
    return -1
Loop invariant: At the top of every iteration, if t is present in a, then t is at some index in [lo, hi).
Initialization: lo = 0, hi = len(a). If t is present, it’s at some index in [0, len(a)) by definition. Invariant holds.
Maintenance: Assume invariant holds before an iteration. Compute mid.
- If a[mid] == t, return immediately, which is correct.
- If a[mid] < t: because a is sorted, every index ≤ mid has value ≤ a[mid] < t, so t is not at any of those indices. If t is in a, it must be in [mid+1, hi). Setting lo = mid + 1 preserves the invariant.
- If a[mid] > t: symmetric; t is not at any index ≥ mid. Setting hi = mid preserves the invariant.
Termination & post-condition: The monovariant is hi - lo. Whenever lo < hi, mid = lo + (hi - lo) // 2 satisfies lo ≤ mid < hi, so lo = mid + 1 strictly raises lo and hi = mid strictly lowers hi; either way hi - lo strictly decreases. It is bounded below by 0, so the loop terminates. When lo == hi, the search range is empty. By the invariant, if t were present, it would be in an empty range, a contradiction. So t is absent, and returning -1 is correct.
Critical subtlety: mid = lo + (hi - lo) // 2 vs (lo + hi) // 2. The former avoids integer overflow when lo + hi > INT_MAX. In Python this doesn’t matter (arbitrary-precision int), but in Java/C++ it’s a real bug. Famously, the JDK’s Arrays.binarySearch had this bug for about nine years (Bloch, 2006).
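Because Python integers never overflow, the bug is easiest to see by emulating 32-bit arithmetic. The sketch below does that with a hypothetical wrap_int32 helper; it models Java's wraparound semantics (in C++, signed overflow is undefined behavior, so wraparound is only the commonly observed outcome).

```python
def wrap_int32(x):
    """Emulate two's-complement wraparound of a 32-bit signed int."""
    return (x + 2**31) % 2**32 - 2**31


def mid_naive_int32(lo, hi):
    # (lo + hi) / 2 as a 32-bit language would compute it: the sum wraps first.
    total = wrap_int32(lo + hi)
    return int(total / 2)            # truncate toward zero, like Java integer division


def mid_safe_int32(lo, hi):
    # lo + (hi - lo) / 2: hi - lo never exceeds INT_MAX when 0 <= lo <= hi.
    return wrap_int32(lo + (hi - lo) // 2)


lo, hi = 1_500_000_000, 2_000_000_000
print(mid_naive_int32(lo, hi))   # -397483648
print(mid_safe_int32(lo, hi))    # 1750000000
```

A negative midpoint then becomes an out-of-bounds array index, which is how the bug typically surfaces in practice.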
Part B — Kadane’s Algorithm (with Proof)
def kadane(a):
    best_here = best_overall = a[0]
    for i in range(1, len(a)):
        best_here = max(a[i], best_here + a[i])
        best_overall = max(best_overall, best_here)
    return best_overall
Loop invariant: At the top of iteration i (for i = 1, ..., n - 1, with 0-based array indices):
- best_here is the maximum sum of any contiguous subarray ending at index i - 1.
- best_overall is the maximum sum of any contiguous subarray within a[0..i-1] inclusive.
Initialization: Before the loop (i.e., before i = 1), the only subarray of a[0..0] is [a[0]] with sum a[0]. Both best_here and best_overall are set to a[0]. Invariant holds.
Maintenance: Assume the invariant holds at the start of iteration i. Consider all contiguous subarrays ending at index i. Each such subarray is either:
- The singleton [a[i]], with sum a[i], OR
- An extension of a subarray ending at i - 1, with sum (sum of that subarray) + a[i].
The best extension is best_here + a[i] (by the invariant on best_here). So the best subarray ending at i has sum max(a[i], best_here + a[i]), which is exactly the new best_here. Invariant clause 1 maintained.
The best subarray within a[0..i] is either entirely within a[0..i-1] (covered by old best_overall) or ends at i (covered by new best_here). The new best_overall = max(old best_overall, new best_here) is therefore correct. Clause 2 maintained.
Termination & post-condition: The loop runs exactly n - 1 iterations (finite, no monovariant needed). On exit, best_overall is the max contiguous subarray sum within a[0..n-1] = the whole array. Returning it is correct.
Edge case proof: Kadane requires the array a to be non-empty (the problem states this); for an all-negative array, the answer is the maximum single element. The invariant handles this because best_here resets to a[i] whenever best_here + a[i] < a[i], i.e., whenever best_here < 0. This is why the algorithm works for negative-only arrays. A common bug is to initialize best_here and best_overall to 0, which incorrectly returns 0 for all-negative input.
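The two invariant clauses can also be checked mechanically on small inputs. The sketch below asserts both clauses at the top of every iteration against a brute-force definition; the name kadane_checked and its inner helpers are hypothetical, and the checks are far too slow for anything but debugging.

```python
def kadane_checked(a):
    """Kadane's algorithm with both invariant clauses asserted by brute force.

    The checks are polynomial-time brute force per iteration: small inputs only.
    """
    def best_ending_at(k):           # max sum over subarrays a[i..k]
        return max(sum(a[i:k + 1]) for i in range(k + 1))

    def best_within(k):              # max sum over subarrays inside a[0..k]
        return max(best_ending_at(e) for e in range(k + 1))

    best_here = best_overall = a[0]
    for i in range(1, len(a)):
        # Invariant at the top of iteration i.
        assert best_here == best_ending_at(i - 1)
        assert best_overall == best_within(i - 1)
        best_here = max(a[i], best_here + a[i])
        best_overall = max(best_overall, best_here)
    return best_overall


print(kadane_checked([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
print(kadane_checked([-3, -1, -2]))                      # -1
```

Changing the initialization to 0 makes the first assertion fire on the all-negative input, which is a quick way to convince yourself the invariant really depends on starting from a[0].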
Data Structures Used
- Plain arrays
- Two integer variables for Kadane
- Two integer indices for binary search
Correctness Argument
See Part A and Part B above.
Complexity
- Binary search: O(log N) time, O(1) space.
- Kadane: O(N) time, O(1) space.
Implementation Requirements
- Use lo + (hi - lo) // 2 for binary search midpoint
- Use half-open interval [lo, hi) for binary search, which is easier to reason about than closed [lo, hi]
- Initialize Kadane’s best_here and best_overall to a[0], not 0, to handle all-negative arrays
Tests
Smoke
assert binary_search([1, 3, 5, 7, 9], 5) == 2
assert binary_search([1, 3, 5, 7, 9], 4) == -1
assert kadane([-2, 1, -3, 4, -1, 2, 1, -5, 4]) == 6
Edge
# Binary search edges
assert binary_search([], 1) == -1
assert binary_search([5], 5) == 0
assert binary_search([5], 4) == -1
assert binary_search([1, 1, 1, 1], 1) in (0, 1, 2, 3) # any valid index
# Kadane edges
assert kadane([5]) == 5
assert kadane([-3, -1, -2]) == -1
assert kadane([1, 2, 3, 4]) == 10 # all positive
assert kadane([-1, -2, -3, -4]) == -1 # all negative
Large
import random
random.seed(0)
big = sorted(random.sample(range(10**6), 10**5))
assert binary_search(big, big[50000]) == 50000
big2 = [random.randint(-10**6, 10**6) for _ in range(10**5)]
result = kadane(big2)
assert isinstance(result, int)
Randomized verifier
def brute_kadane(a):
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i+1, len(a)+1))

for _ in range(200):
    a = [random.randint(-50, 50) for _ in range(random.randint(1, 30))]
    assert kadane(a) == brute_kadane(a)
Invariant assertions (the proof, in code)
def binary_search_with_assertions(a, t):
    lo, hi = 0, len(a)
    while lo < hi:
        # INVARIANT: if t in a, then t is at some index in [lo, hi).
        # The check is O(N) per iteration, so use it for debugging only.
        assert t not in a or t in a[lo:hi], "invariant violated"
        mid = lo + (hi - lo) // 2
        if a[mid] == t:
            return mid
        elif a[mid] < t:
            lo = mid + 1
        else:
            hi = mid
    return -1
Follow-up Questions
- Find leftmost vs rightmost occurrence of t. Modify binary search; the invariant becomes “the first index whose value is ≥ t is in [lo, hi]” rather than “if t is present, it’s in…”.
- Binary search on real numbers. Replace integer halving with floating-point; the invariant is the same but termination uses a precision threshold, not lo < hi.
- Kadane with at most K negative numbers allowed. State expands to (i, k); DP, O(NK).
- Maximum sum circular subarray. Two passes of Kadane + total-sum trick; the invariant for the circular case is more subtle.
- Maximum product subarray. Maintain both max and min products at each index, because multiplying the most negative product by a negative number can produce the largest.
Product Extension
- Database B-tree page searches — binary search within a page; the invariant analysis directly applies.
- Time-series anomaly detection — Kadane variants find the largest cumulative deviation, used in change-point detection.
- Streaming Kadane — given a stream of metrics, find the worst-degradation window. Same algorithm with O(1) memory.
Language/Runtime Follow-ups
- Python: mid = (lo + hi) // 2 is safe (arbitrary precision); mid = lo + (hi - lo) // 2 is still preferable for portability.
- Java/C++: (lo + hi) / 2 overflows when lo + hi > 2^31 - 1. Use lo + (hi - lo) / 2 or (lo + hi) >>> 1 (Java unsigned shift).
- Kadane overflow: for |a[i]| ≤ 10^9 and N = 10^5, max sum is 10^14, which exceeds 32-bit int. Use long/int64 in Java/Go/C++.
- Floating-point Kadane: accumulation error compounds; use Kahan summation if precision matters.
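Python integers never overflow, so the midpoint bug cannot be reproduced directly. The sketch below simulates 32-bit wraparound by masking; wrap_int32 is a hypothetical helper written for this illustration, and the specific lo and hi values are arbitrary choices that each fit in an int32 while their sum does not.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(x):
    """Simulate two's-complement wraparound of a 32-bit signed int."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x > INT32_MAX else x


lo, hi = 2_000_000_000, 2_100_000_000  # each fits in int32; the sum does not

naive_mid = int(wrap_int32(lo + hi) / 2)   # what (lo + hi) / 2 gives after overflow
safe_mid = lo + (hi - lo) // 2             # never forms lo + hi
shift_mid = ((lo + hi) & 0xFFFFFFFF) >> 1  # mimics Java's (lo + hi) >>> 1

print(naive_mid, safe_mid, shift_mid)  # -97483648 2050000000 2050000000

# Kadane bound from the list above: 10^9 per element times 10^5 elements.
worst_sum = 10**9 * 10**5
print(worst_sum > INT32_MAX)  # True
```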
Common Bugs
- (lo + hi) / 2 overflow in C++/Java
- Closed-interval binary search with lo <= hi is correct but the mid updates are trickier; pick a convention and stick with it
- Kadane initialized to 0: fails on all-negative arrays
- Forgetting best_overall update: returns the best ending at the last position, not overall
- Empty input to Kadane: undefined; problem statement says non-empty, but check the contract
- Binary search infinite loop when lo = mid instead of lo = mid + 1: the monovariant doesn’t decrease
Debugging Strategy
When binary search loops forever or returns wrong index:
- Print (lo, mid, hi) each iteration. If an iteration leaves both lo and hi unchanged (neither moves toward the other), you have an off-by-one.
- Check the boundary condition: does your invariant include or exclude hi?
- For “find leftmost”, the answer is at lo after the loop, not mid.
When Kadane returns 0 on all-negative input:
- Check initialization: should be a[0], not 0.
- Print (best_here, best_overall) at each step; trace by hand against the expected.
Mastery Criteria
- Wrote both algorithms correctly without testing first (proof-first coding)
- Stated the loop invariant for each in one sentence
- Identified the monovariant for binary search termination
- Proved correctness by induction (3 steps: init, maintain, terminate)
- Recognized the (lo + hi) / 2 overflow risk without prompting
- Explained why Kadane’s init must be a[0] and not 0
- Wrote loop invariants as comments in your code for one Phase 5 DP problem
Lab 05 — Stress Testing Harness (Two-Pointer Variants)
Goal
Build a reusable stress-testing harness: a randomized input generator + a known-correct brute-force verifier + a diff loop that finds the smallest failing input. This is the single most valuable debugging tool in competitive programming, and it is shockingly underused in interview prep. After this lab you should reach for the harness automatically whenever your solution passes the given examples but you don’t trust it.
Background Concepts
A stress test has three components:
- Generator (gen): produces random valid inputs, parameterized by size and seed for reproducibility.
- Brute force (brute): a known-correct slow solution. Often O(N²) or O(2^N), valid only for tiny N.
- Fast solution (fast): the optimized solution you’re testing.
The harness loops: generate input → run both → compare. On mismatch, print the input and both outputs and stop. Then shrink the failing input to the smallest case that still fails — this is what makes debugging fast.
Why this works: brute force is correct by construction (it tries everything). Any discrepancy is your bug, not the brute force’s. Random testing covers cases you didn’t think of.
Why people don’t use it: they think writing brute force is “wasted time.” It is not; in interview prep, the brute force is also your starting point for the optimization conversation with the interviewer.
Interview Context
Stress testing rarely happens during a 45-min interview, but the practice habit shows up in your interview performance:
- You instantly know how to write the brute force (which the interviewer always wants you to articulate first).
- You catch bugs in practice that would otherwise be silently learned-wrong, then deployed mid-interview.
- You build pattern recognition for “this kind of two-pointer has off-by-one risk” because you’ve seen the harness flag them.
At competitive companies (Jane Street, Hudson River, Citadel) and at top-tier interviews (Google L6+), interviewers will sometimes ask “how would you verify this is correct beyond running the examples?” The answer is the stress harness.
Problem Statement
Implement and stress-test three two-pointer problems known to have subtle off-by-one bugs:
A. Two Sum II (sorted array). Given a sorted array a and target t, return indices (i, j) with i < j and a[i] + a[j] == t, or (-1, -1) if no such pair exists.
B. Container With Most Water (LC 11). Given heights h, return the maximum of (j - i) * min(h[i], h[j]) over all index pairs i < j.
C. 3Sum (LC 15). Given nums, return all unique triples (a, b, c) with a + b + c == 0. Each triple sorted ascending; output deduplicated.
For each, write the fast solution, write the brute force, and write the stress harness. Run for ≥1000 random trials.
Constraints
- |a| ≤ 1000 for stress; ≤ 10^5 for the real fast solution
- -1000 ≤ a[i] ≤ 1000
Clarifying Questions
(These problems are standard; the lab is about the harness, not disambiguation.)
Examples
two_sum_sorted([1, 3, 4, 5, 7], 9) → (2, 3) # a[2] + a[3] = 4 + 5 = 9
container([1, 8, 6, 2, 5, 4, 8, 3, 7]) → 49 # i=1 (h=8), j=8 (h=7), width=7
three_sum([-1, 0, 1, 2, -1, -4]) → [[-1, -1, 2], [-1, 0, 1]]
Initial Brute Force
Two Sum: O(N²) double loop. Trivially correct. Container: O(N²) double loop, take max. 3Sum: O(N³) triple loop; collect, sort each triple, deduplicate via set of tuples.
Brute Force Complexity
O(N²), O(N²), O(N³). All valid for N ≤ 100 in <1 sec.
Optimization Path
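All three are classic two-pointer problems. 3Sum sorts its input first; Two Sum II receives input that is already sorted; Container With Most Water needs no sorting at all. In each case the pointers move from both ends inward based on a comparison.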
Final Expected Approach
Fast solutions
def two_sum_sorted(a, t):
    i, j = 0, len(a) - 1
    while i < j:
        s = a[i] + a[j]
        if s == t: return (i, j)
        elif s < t: i += 1
        else: j -= 1
    return (-1, -1)
def container(h):
    i, j = 0, len(h) - 1
    best = 0
    while i < j:
        best = max(best, (j - i) * min(h[i], h[j]))
        if h[i] < h[j]: i += 1
        else: j -= 1
    return best
def three_sum(nums):
    nums = sorted(nums)
    n = len(nums)
    res = []
    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i-1]: continue  # skip dup anchor
        j, k = i + 1, n - 1
        while j < k:
            s = nums[i] + nums[j] + nums[k]
            if s == 0:
                res.append([nums[i], nums[j], nums[k]])
                j += 1; k -= 1
                while j < k and nums[j] == nums[j-1]: j += 1  # skip dup j
                while j < k and nums[k] == nums[k+1]: k -= 1  # skip dup k
            elif s < 0: j += 1
            else: k -= 1
    return res
Brute forces
def brute_two_sum(a, t):
    for i in range(len(a)):
        for j in range(i+1, len(a)):
            if a[i] + a[j] == t: return (i, j)
    return (-1, -1)
def brute_container(h):
    best = 0
    for i in range(len(h)):
        for j in range(i+1, len(h)):
            best = max(best, (j - i) * min(h[i], h[j]))
    return best
def brute_3sum(nums):
    n = len(nums)
    found = set()
    for i in range(n):
        for j in range(i+1, n):
            for k in range(j+1, n):
                if nums[i] + nums[j] + nums[k] == 0:
                    found.add(tuple(sorted([nums[i], nums[j], nums[k]])))
    return sorted([list(t) for t in found])
The Stress Harness
import random
def stress(gen, brute, fast, normalize, trials=2000, seed=0):
    random.seed(seed)
    for t in range(trials):
        inp = gen()
        b = normalize(brute(*inp))
        f = normalize(fast(*inp))
        if b != f:
            print(f"FAIL on trial {t}")
            print(f"  input: {inp}")
            print(f"  brute: {b}")
            print(f"  fast: {f}")
            # Shrink: try to find a smaller failing input
            shrunk = shrink_input(inp, brute, fast, normalize)
            print(f"  smallest failing input: {shrunk}")
            return False
    print(f"PASS {trials} trials")
    return True
def shrink_input(inp, brute, fast, normalize):
    """Greedy shrink: drop elements one at a time, keep if still fails."""
    arr, *rest = inp
    current = list(arr)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i+1:]
            if len(candidate) < 2: continue
            try:
                if normalize(brute(candidate, *rest)) != normalize(fast(candidate, *rest)):
                    current = candidate; changed = True; break
            except Exception:
                continue
    return (current, *rest)
# Generators
def gen_two_sum():
    n = random.randint(2, 20)
    a = sorted(random.randint(-30, 30) for _ in range(n))
    t = random.randint(-60, 60)
    return (a, t)
def gen_container():
    n = random.randint(2, 30)
    return ([random.randint(0, 20) for _ in range(n)],)
def gen_3sum():
    n = random.randint(3, 15)
    return ([random.randint(-10, 10) for _ in range(n)],)
# Normalizers (canonicalize output before comparison)
def norm_two_sum(r):
    # Both -1, -1 OR a valid pair; for the pair, the sum is what matters, not index
    if r == (-1, -1): return None
    return "found"  # we only care that one was found; index may differ
    # NOTE: If indices must match exactly, change the brute force to scan in two-pointer order
def norm_container(x): return x  # int, direct compare
def norm_3sum(triples): return sorted([sorted(t) for t in triples])
# Run
stress(gen_two_sum, brute_two_sum, two_sum_sorted, norm_two_sum)
stress(gen_container, brute_container, container, norm_container)
stress(gen_3sum, brute_3sum, three_sum, norm_3sum)
Data Structures Used
- Lists of integers
- Set of tuples for 3Sum deduplication in brute force
- A small library of helper functions (gen, brute, fast, normalize, stress, shrink_input) that you reuse across problems
Correctness Argument
The brute force is correct because it enumerates all valid candidates (O(N^k) for k-sum). Any output from the fast solution that disagrees with the brute is a bug in the fast solution. Random sampling over thousands of trials gives high confidence (though not certainty) that the fast is correct; the smaller the input space (e.g., values in [-10, 10] with N ≤ 15), the more confident.
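When the input space is small enough, random sampling can be complemented by checking every input. The sketch below enumerates all sorted arrays of length 2 to 4 over values in [-2, 2] and every target in [-4, 4], then checks the two-pointer result directly for validity. The names exhaustive_check and pair_exists and the chosen bounds are assumptions made for this illustration; two_sum_sorted is repeated so the block runs on its own.

```python
from itertools import combinations_with_replacement


def two_sum_sorted(a, t):
    i, j = 0, len(a) - 1
    while i < j:
        s = a[i] + a[j]
        if s == t:
            return (i, j)
        elif s < t:
            i += 1
        else:
            j -= 1
    return (-1, -1)


def pair_exists(a, t):
    """Brute-force oracle: is there any i < j with a[i] + a[j] == t?"""
    return any(a[i] + a[j] == t
               for i in range(len(a)) for j in range(i + 1, len(a)))


def exhaustive_check(max_len=4, lo=-2, hi=2):
    checked = 0
    values = range(lo, hi + 1)
    for n in range(2, max_len + 1):
        # Tuples come out in non-decreasing order, so each one is a valid sorted input.
        for a in combinations_with_replacement(values, n):
            for t in range(2 * lo, 2 * hi + 1):
                i, j = two_sum_sorted(a, t)
                if (i, j) == (-1, -1):
                    assert not pair_exists(a, t), (a, t)
                else:
                    assert i < j and a[i] + a[j] == t, (a, t, i, j)
                checked += 1
    return checked


print(exhaustive_check())  # 1080 cases: 120 arrays x 9 targets
```

Unlike the "found" normalizer, this check also verifies that any returned pair actually sums to the target.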
Complexity
Harness adds zero asymptotic cost — the fast solution’s complexity is unchanged. Each trial costs O(brute) which is the bottleneck; with N ≤ 30 it runs ~2000 trials in <2 seconds.
Implementation Requirements
- Use random.seed() for reproducibility: failures must be re-runnable
- Print the failing input before the outputs, so you can copy-paste and re-run
- Always normalize outputs before comparison (canonical sort order, etc.)
- Implement shrinking: a 20-element failure is hard to debug; a 4-element failure is obvious
Tests
The harness itself, as a test
# Sanity: plant a bug in the fast solution and verify the harness catches it
def buggy_two_sum(a, t):
    i, j = 0, len(a) - 1
    while i < j:
        s = a[i] + a[j]
        if s == t: return (i, j)
        elif s < t: i += 1
        else: j -= 1
    return (0, 0)  # BUG: should return (-1, -1)
assert stress(gen_two_sum, brute_two_sum, buggy_two_sum, norm_two_sum, trials=500) is False
Pass-through on the correct solutions
assert stress(gen_two_sum, brute_two_sum, two_sum_sorted, norm_two_sum, trials=2000) is True
assert stress(gen_container, brute_container, container, norm_container, trials=2000) is True
assert stress(gen_3sum, brute_3sum, three_sum, norm_3sum, trials=2000) is True
Edge generators
Add specialized generators that stress edge cases:
def gen_two_sum_edge():
    """Heavy on duplicates and boundary targets."""
    n = random.randint(2, 10)
    a = sorted([random.choice([-1, 0, 1]) for _ in range(n)])
    t = random.choice([-2, 0, 2])
    return (a, t)
Follow-up Questions
- Generator-based testing (Hypothesis library). Python’s hypothesis library generates inputs and shrinks them automatically. Show how to convert the harness into Hypothesis strategies.
- Detecting performance regressions. Add timing to the harness; flag when fast > 10× the previous run on the same seed.
- Coverage-guided fuzzing. Use atheris or similar to mutate inputs that increase code coverage; finds rarer bugs than purely random.
- Concurrent stress testing. Run brute and fast on different threads; useful for testing thread-safe versions.
- What if there’s no brute force? Then write two independent fast solutions (different algorithms) and stress them against each other. Common for geometry problems.
Product Extension
- CI-integrated fuzzing. Google’s OSS-Fuzz runs continuous random testing on open-source projects; finds thousands of bugs annually.
- Property-based testing in production. Stripe, Jane Street, Klarna use property tests to validate financial logic where the brute force is “the spec.”
- Differential testing. Compare two implementations of the same protocol (e.g., two JSON parsers) on random inputs to find spec ambiguities.
Language/Runtime Follow-ups
- Python: hypothesis is the gold standard for property-based testing. The module-level random functions (including random.seed()) share one global generator across all threads; for parallel stress, use independent random.Random instances.
- Java: jqwik or junit-quickcheck for property-based; JMH for performance regression detection.
- Go: Built-in testing/quick and (Go 1.18+) native fuzzing with go test -fuzz.
- C++: rapidcheck (QuickCheck-style); LLVM’s libFuzzer for coverage-guided.
- Rust: proptest and quickcheck crates; cargo fuzz (the cargo-fuzz subcommand) for coverage-guided fuzzing.
Common Bugs
- Non-reproducible failures: forgot random.seed(); can’t re-run the failing case.
- Output comparison fails due to ordering: set vs list, dict iteration order; always normalize.
- Brute force itself is buggy: verify the brute on the given problem examples first.
- Generator produces invalid inputs: e.g., for Two Sum Sorted, the generator must produce a sorted array. Verify with assert all(a[i] <= a[i+1] for i in range(len(a)-1)).
- Shrinker breaks the input invariant: for Two Sum Sorted, dropping an element keeps the array sorted; but for a tree-structured input, dropping a node may break invariants. Write custom shrinkers per problem.
Debugging Strategy
When the harness reports a failure:
- Read the smallest failing input that the shrinker produced. If it’s ≤ 5 elements, trace by hand.
- Run only the fast solution with prints on that small input. Compare to expected.
- The bug is almost always in a boundary condition — empty input, single element, all duplicates, exact-target match.
- If the bug only appears with duplicates, suspect your dedup logic (3Sum is famous for this).
- If the bug only appears with negatives, suspect signed comparisons or abs() misuse.
Mastery Criteria
- Built the stress harness in under 20 minutes for the three target problems
- Caught at least one bug by planting one and verifying the harness flagged it
- Wrote a shrinker that reduces failures to ≤ 10 elements
- Ran 2000+ trials per problem with no failure
- Built a reusable harness module you can drop into any future problem
- Applied the harness to one Phase 2 or Phase 5 problem and either confirmed correctness or found a bug
- Can explain why random testing complements (not replaces) edge-case enumeration
Lab 06 — Performance Profiling (Three LIS Implementations)
Goal
Measure and compare three implementations of Longest Increasing Subsequence: O(N²) DP, O(N log N) patience sort, and a poorly-written “O(N log N)” with hidden O(N) inside the inner loop. Use profiling tools to detect the discrepancy between claimed complexity and actual behavior. By the end you should never again submit a solution thinking “this should be fast enough” without measuring.
Background Concepts
Empirical complexity verification: if a function is O(f(N)), then running it at N and 2N should produce runtimes whose ratio is about f(2N) / f(N). For O(N): 2×. For O(N log N): slightly more than 2× (about 2.1–2.2× for N in the thousands, drifting toward 2× as N grows). For O(N²): 4×. For O(N³): 8×. Measure the ratio; a mismatch with your claimed complexity reveals the bug.
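The predicted ratios can be computed rather than memorized. This is a small sketch; predicted_ratio and the growth table are names invented for the illustration, and N = 1000 is an arbitrary starting size.

```python
import math


def predicted_ratio(f, n):
    """Runtime ratio expected when the input size doubles from n to 2n."""
    return f(2 * n) / f(n)


growth = {
    "O(N)": lambda n: n,
    "O(N log N)": lambda n: n * math.log2(n),
    "O(N^2)": lambda n: n ** 2,
    "O(N^3)": lambda n: n ** 3,
}

for name, f in growth.items():
    print(f"{name:>10}: {predicted_ratio(f, 1000):.2f}x")
```

Only the O(N log N) ratio depends on N: the log factor contributes log(2N) / log(N), which shrinks toward 1 as N grows.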
Profiling tools:
- Python: cProfile for function-level timing; line_profiler for line-by-line; py-spy for sampling without code changes; tracemalloc for memory.
- Java: async-profiler for low-overhead sampling; JFR (Java Flight Recorder); JMH for microbenchmarks; jmap for heap snapshots.
- Go: pprof (CPU and heap); runtime/trace for goroutine scheduling; go test -bench with -benchmem.
- C++: perf (Linux) with flamegraphs; Valgrind/Callgrind for instruction counts; gperftools.
- Node.js: built-in --inspect + Chrome DevTools; clinic.js for higher-level analysis.
Common deceptions:
- A “constant time” operation that’s actually O(N) (e.g., list.insert(0, x) in Python, s += c in a loop in Java).
- Hash collisions in adversarial input turning O(1) lookups into O(N).
- Garbage collector pauses inflating measurements.
- JIT warmup masking the cold-start cost.
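The first deception is easy to observe. The sketch below builds the same reversed list two ways, once with list.insert(0, x) and once with collections.deque.appendleft; the helper names and the sizes are arbitrary choices for illustration, and the timings will vary by machine, so no expected numbers are shown.

```python
import time
from collections import deque


def build_front_list(n):
    out = []
    for x in range(n):
        out.insert(0, x)  # shifts every existing element: O(len(out)) per call
    return out


def build_front_deque(n):
    out = deque()
    for x in range(n):
        out.appendleft(x)  # O(1) per call
    return list(out)


def timed(fn, n):
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start


for n in (5_000, 10_000, 20_000):
    t_list = timed(build_front_list, n)
    t_deque = timed(build_front_deque, n)
    print(f"N={n:>6}  list.insert(0, x): {t_list:.4f}s  deque.appendleft: {t_deque:.4f}s")
```

The list version should grow roughly 4× per doubling while the deque version grows roughly 2×, although at these sizes timer noise can blur the ratios.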
Interview Context
Interviewers ask “what’s the complexity?” on every problem. They sometimes follow up with “are you sure?” — and the right answer is “I claim O(N log N); I can verify by running at doubling sizes if you’d like.” Candidates who can articulate empirical verification (even without running it) signal a different level of rigor.
Practical-engineering interviews (Phase 8) often include “make this faster” or “profile this for me” as a follow-up. You need fluency with at least one language’s profiler.
Problem Statement
Implement three versions of LIS:
Version A (O(N²) DP): dp[i] = length of LIS ending at index i. Transition: dp[i] = max(dp[j] + 1) for all j < i with a[j] < a[i].
Version B (O(N log N) patience sort): Maintain tails[k] = smallest tail value of any increasing subsequence of length k+1. For each element, binary-search-and-replace; if no replacement, append.
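Version C (“fake” O(N log N)): Same as B, but finds the replacement position with a linear scan over tails (O(len(tails)) per element) instead of binary search. Looks like O(N log N) at a glance; the worst case is O(N²), reached when tails grows linearly with N (for example, on increasing input).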
Measure runtime at N = 1000, 2000, 4000, 8000, 16000. Verify the doubling ratios. Profile to find the bottleneck in Version C.
Constraints
- 1 ≤ N ≤ 16000 for measurement
- Values: random integers in [0, 10^6]
Clarifying Questions
(LIS is standard; the lab is about measurement.)
Examples
LIS([10, 9, 2, 5, 3, 7, 101, 18]) == 4 (e.g., [2, 5, 7, 101] or [2, 3, 7, 18])
LIS([0, 1, 0, 3, 2, 3]) == 4
LIS([7, 7, 7, 7]) == 1
Initial Brute Force
Enumerate all 2^N subsequences, check each. O(2^N · N).
Brute Force Complexity
O(2^N · N). Valid only for N ≤ 20. Useful as the verifier in your stress harness from Lab 5.
Optimization Path
DP from O(2^N) → O(N²) → O(N log N).
Final Expected Approach
import bisect
import time
import random
# Version A: O(N^2) DP
def lis_dp(a):
    if not a: return 0
    n = len(a)
    dp = [1] * n
    for i in range(1, n):
        for j in range(i):
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)
# Version B: O(N log N), the real one
def lis_patience(a):
    tails = []
    for x in a:
        idx = bisect.bisect_left(tails, x)
        if idx == len(tails):
            tails.append(x)
        else:
            tails[idx] = x
    return len(tails)
# Version C: "fake" O(N log N), a linear search in disguise
def lis_fake(a):
    tails = []
    for x in a:
        # Linear scan to find first tail >= x: O(len(tails)), not O(log N)!
        idx = None
        for i, t in enumerate(tails):
            if t >= x:
                idx = i; break
        if idx is None:
            tails.append(x)
        else:
            tails[idx] = x
    return len(tails)
Measurement Harness
def benchmark(fn, n, trials=3):
    random.seed(n)
    times = []
    for _ in range(trials):
        a = [random.randint(0, 10**6) for _ in range(n)]
        start = time.perf_counter()
        fn(a)
        times.append(time.perf_counter() - start)
    return min(times)  # min is most reliable; mean is noisy
def doubling_test(fn, name, sizes=(1000, 2000, 4000, 8000, 16000)):
    # sizes is a tuple so the default argument is not a shared mutable list
    prev = None
    print(f"\n{name}")
    print(f"{'N':>8} {'time (s)':>12} {'ratio':>8}")
    for n in sizes:
        t = benchmark(fn, n)
        ratio = f"{t/prev:.2f}x" if prev else "—"
        print(f"{n:>8} {t:>12.4f} {ratio:>8}")
        prev = t
doubling_test(lis_dp, "Version A — O(N^2) DP")
doubling_test(lis_patience, "Version B — O(N log N) patience")
doubling_test(lis_fake, "Version C — fake O(N log N)")
Expected Output (approximate, on modern laptop)
Version A — O(N^2) DP
N time (s) ratio
1000 0.0420 —
2000 0.1680 4.00x
4000 0.6720 4.00x
8000 2.6900 4.00x
16000 10.7600 4.00x
Version B — O(N log N) patience
N time (s) ratio
1000 0.0008 —
2000 0.0017 2.13x
4000 0.0036 2.12x
8000 0.0076 2.11x
16000 0.0160 2.10x
Version C — fake O(N log N)
N time (s) ratio
1000 0.0210 —
2000 0.0840 4.00x ← 4x means O(N^2), not O(N log N)!
4000 0.3360 4.00x
8000 1.3440 4.00x
16000 5.3760 4.00x
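Reading the ratios: if you claimed O(N log N) and see 4× per doubling, your algorithm is actually O(N²). The doubling test is the cheapest, most reliable complexity-verifier in your toolbox. One caveat for Version C: its scan costs O(len(tails)), and on uniformly random input len(tails) grows only like about 2√N, so your measured ratios may land nearer 2.8× (roughly N^1.5) than the clean 4× shown above. Feed it increasing input, where tails reaches length N, to see the full quadratic behavior. Either way the ratio sits well above the roughly 2.1× that O(N log N) predicts.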
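A doubling ratio converts directly into an estimated exponent: if runtime behaves like c · N^k, then t(2N) / t(N) = 2^k, so k = log2(ratio). The sketch below applies that to two pairs of timings taken from the approximate table above; estimated_exponent is a name made up for this illustration.

```python
import math


def estimated_exponent(t_n, t_2n):
    """If time ~ c * N^k, then t(2N) / t(N) = 2^k, so k = log2(ratio)."""
    return math.log2(t_2n / t_n)


# Version A, N = 1000 -> 2000
print(round(estimated_exponent(0.0420, 0.1680), 2))  # 2.0
# Version B, N = 1000 -> 2000
print(round(estimated_exponent(0.0008, 0.0017), 2))  # 1.09
```

An exponent near 1 with a small excess suggests N log N; an exponent near 1.5 or 2 means something super-linear is hiding in the loop.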
Profiling Version C
import cProfile, pstats
random.seed(0)
a = [random.randint(0, 10**6) for _ in range(8000)]
pr = cProfile.Profile()
pr.enable()
lis_fake(a)
pr.disable()
pstats.Stats(pr).sort_stats('cumulative').print_stats(10)
Expected output: the lis_fake function dominates; the inner for i, t in enumerate(tails) is the hot spot. With line_profiler:
$ pip install line_profiler
$ kernprof -l -v script.py
# Add @profile decorator to lis_fake first
Output will show the inner loop accounts for ~95% of the time per call, confirming the linear-search bottleneck.
Data Structures Used
- Plain list (Python): supports bisect for true O(log N) search
- Profiler outputs (text/HTML/flamegraph depending on tool)
Correctness Argument
All three versions produce the same output on random input. Patience sort correctness: tails[k] always stores the smallest possible tail of any increasing subsequence of length k+1 seen so far. When a new element x is not larger than every tail, it replaces the first tail that is >= x; that length stays achievable with a smaller tail, which can only help future extensions. When x is larger than all tails, it extends the longest subsequence to a new length. The final answer is len(tails). Inductive proof omitted; see CLRS / standard references.
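Tracing tails on the first example makes the argument concrete. The helper trace_tails below is a hypothetical instrumented copy of Version B that records a snapshot of tails after each element.

```python
import bisect


def trace_tails(a):
    """Run the patience-sort update and record tails after each element."""
    tails = []
    history = []
    for x in a:
        idx = bisect.bisect_left(tails, x)
        if idx == len(tails):
            tails.append(x)   # x extends the longest subsequence so far
        else:
            tails[idx] = x    # x is a smaller tail for length idx + 1
        history.append(list(tails))
    return history


example = [10, 9, 2, 5, 3, 7, 101, 18]
for x, snapshot in zip(example, trace_tails(example)):
    print(f"{x:>4} -> {snapshot}")
```

The last snapshot is [2, 3, 7, 18], so the length is 4. In general tails is not guaranteed to be an actual subsequence of the input; only its length is meaningful.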
Complexity
- A: O(N²) time, O(N) space.
- B: O(N log N) time, O(N) space.
- C: claimed O(N log N), actually O(N²) due to linear inner search.
Implementation Requirements
- Use time.perf_counter(), not time.time(): higher resolution, monotonic.
- Take the min of multiple trials, not the mean: min rejects GC/scheduler noise.
- Warm up before timing if testing JIT’d languages (Java, JS).
- Use __slots__ and pre-allocated arrays in hot Python paths.
Tests
Smoke
assert lis_dp([10, 9, 2, 5, 3, 7, 101, 18]) == 4
assert lis_patience([10, 9, 2, 5, 3, 7, 101, 18]) == 4
assert lis_fake([10, 9, 2, 5, 3, 7, 101, 18]) == 4
Edge
for fn in (lis_dp, lis_patience, lis_fake):
    assert fn([]) == 0
    assert fn([42]) == 1
    assert fn([1, 2, 3, 4, 5]) == 5  # already sorted
    assert fn([5, 4, 3, 2, 1]) == 1  # reverse sorted
    assert fn([7, 7, 7, 7]) == 1  # all duplicates
Performance assertions
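# Verify Version B's doubling ratio
times = [benchmark(lis_patience, n) for n in (1000, 2000, 4000)]
ratios = [times[1]/times[0], times[2]/times[1]]
for r in ratios:
    assert 1.8 < r < 2.5, f"Version B ratio {r:.2f} not in O(N log N) range"
# Verify Version C is quadratic in its worst case (this assertion *should* pass, proving the bug).
# Increasing input makes tails grow to length N, so every linear scan walks the whole list;
# on random input tails stays near 2*sqrt(N) and the ratio would be closer to 2.8x.
def time_increasing(fn, n):
    a = list(range(n))
    start = time.perf_counter()
    fn(a)
    return time.perf_counter() - start
times = [min(time_increasing(lis_fake, n) for _ in range(3)) for n in (1000, 2000, 4000)]
ratios = [times[1]/times[0], times[2]/times[1]]
for r in ratios:
    assert 3.5 < r < 4.5, f"Version C ratio {r:.2f} not in O(N^2) range; bug may be fixed?"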
Randomized verifier (A == B == C)
random.seed(0)
for _ in range(50):
    a = [random.randint(0, 100) for _ in range(random.randint(1, 50))]
    assert lis_dp(a) == lis_patience(a) == lis_fake(a)
Follow-up Questions
- What if the input is nearly sorted? Version A’s actual runtime degrades less; profile to confirm.
- Memory profile: which version allocates most? Use tracemalloc.start() and tracemalloc.get_traced_memory().
- Reconstruct the LIS, not just its length. Track parent pointers; doesn’t change asymptotic complexity but doubles space.
- LIS in a stream (one pass, can’t store everything). Use patience sort with a fixed buffer; gives approximate answer.
- Parallelize LIS. Hard: DP dependencies are sequential. Pipeline by chunks; merge with care.
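The reconstruction follow-up can be sketched on top of Version B. Alongside each tail value the code keeps the index it came from, and each element records the index of the tail one level below it at insertion time. The name lis_sequence and the two auxiliary arrays are choices made for this illustration, not part of the lab's reference code.

```python
import bisect


def lis_sequence(a):
    """Return one longest strictly increasing subsequence of a."""
    tails = []       # tails[k] = smallest tail among increasing subsequences of length k + 1
    tails_idx = []   # tails_idx[k] = index in a where tails[k] came from
    parent = [-1] * len(a)  # parent[i] = index of the element preceding a[i]
    for i, x in enumerate(a):
        k = bisect.bisect_left(tails, x)
        if k > 0:
            parent[i] = tails_idx[k - 1]
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
    # Walk parent pointers back from the tail of the longest subsequence.
    out = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        out.append(a[i])
        i = parent[i]
    return out[::-1]


print(lis_sequence([10, 9, 2, 5, 3, 7, 101, 18]))  # [2, 3, 7, 18]
```

The time stays O(N log N); the extra cost is the two O(N) index arrays.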
Product Extension
- Code review of “this should be fast” claims — every senior engineer learns to verify before trusting.
- Database query planners — the planner estimates I/O cost; profiling validates the estimate against real query times.
- CDN cache eviction policies — when comparing LRU vs LFU vs SLRU under real traffic, microbenchmarks lie; full profiles win.
- Production hot-path detection — flamegraphs reveal that 80% of CPU is spent in 3% of the code; optimize there.
Language/Runtime Follow-ups
- Python: time.perf_counter() is the right clock. cProfile adds noticeable overhead that grows with the number of function calls; for less intrusive measurement use py-spy (sampling, very low overhead), and use line_profiler when you need per-line attribution. The GIL means CPU profiling is mostly sequential; for asyncio code use aiomonitor or asyncio debug mode.
- Java: JMH (Java Microbenchmark Harness) handles JIT warmup, dead-code elimination, and constant folding correctly; handwritten timing loops in Java are often wrong. Use -prof gc to see allocation cost.
- Go: go test -bench=. -benchmem is the standard. -cpuprofile and -memprofile write pprof files; visualize with go tool pprof.
- C++: perf record + perf report for sampling; perf stat for cache misses and IPC. Use -O2 or -O3 for measurement; debug builds have very different performance.
- Node.js: V8’s --prof flag dumps tick logs; --inspect for Chrome DevTools. Beware TurboFan optimization: code that runs cold for the first 10K iterations is suddenly 10× faster after JIT.
Common Bugs
- Timing the cold start — first call includes import/parse/JIT warmup; throw it out.
- Using time.time(): wall clock; affected by NTP, sleep, system load.
- Mean over trials: one stop-the-world GC pause skews the mean; use min instead.
- Measuring with assertions on: Python -O flag strips asserts; default mode keeps them, slowing hot loops.
- Forgetting random.seed(): runs aren’t reproducible.
- Comparing implementations on different input distributions: random LIS, sorted LIS, and reversed LIS have wildly different runtimes for some algorithms.
Debugging Strategy
When complexity doesn’t match your claim:
- Verify with the doubling test. This is the only way.
- Profile to find the hot function. It’s almost always one inner loop.
- Read the standard library docs for any “built-in” operation you used: list.insert, dict.update, str +=, Vector.add may not be what you think.
- Check for hidden quadratic behavior in concatenation: result = result + small_thing in a loop is the classic Java/Python beginner trap.
- Verify memory with tracemalloc: sometimes the “slow” is actually paging, not CPU.
Mastery Criteria
- Implemented all three LIS versions
- Ran the doubling test and observed the 4× / 2.1× / 4× ratios
- Profiled Version C with cProfile (or equivalent) and identified the linear-search bottleneck
- Wrote performance assertions that would catch a regression
- Can recite the expected doubling ratio for O(N), O(N log N), O(N²), O(N³)
- Applied profiling to one of your own past solutions and identified one inefficiency
- Familiarity with at least one profiler in your primary language (output format, common flags, how to interpret)
Phase 11 — Mock Interview Mastery
Target level: All (mock difficulty scales from beginner through staff/principal/competitive)
Expected duration: 4–8 weeks (depending on your overall track; mocks are continuous)
Weekly cadence: 2 mocks minimum, 3+ if interviews are within 4 weeks
Why This Phase Exists
Phases 0–10 trained you to solve problems. This phase trains you to interview. Those are different skills. You can know every algorithm, pass every Phase 1–9 lab, write proofs from Phase 10 — and still fail a real interview because:
- You panic and forget the obvious under real-time pressure.
- You waste 8 minutes on clarifying questions a senior would resolve in 60 seconds.
- You code correctly but communicate nothing — the interviewer can’t tell if you’re thinking or stuck.
- You optimize prematurely before understanding the problem.
- You miss the follow-ups that separate “competent” from “hireable at level.”
- You finish in 25 minutes and have nothing to say when asked the extension question.
- You write buggy code, then spend the remaining time debugging instead of explaining.
A mock interview is the nearest equivalent to the real event without the stakes. Your job: complete at least 12 mocks (one per level), identify your failure mode per level, drill it until it stops happening.
The candidates who pass the hard rounds are not the ones who know the most algorithms. They are the ones who have rehearsed the performance enough times that the algorithm is almost a side effect of a clean interview.
How to Run a Mock
Alone (self-timed)
- Read the problem statement only. Do not peek at hints, examples, follow-ups.
- Set a timer to the mock’s exact allocated time.
- Open a blank document (Google Doc, plain text, paper). No IDE, no autocomplete, no syntax highlighting. The real interview is in a shared Google Doc or CoderPad with minimal tooling.
- Narrate aloud or write notes continuously. If you go silent for >30 seconds, stop, write down what you’re thinking, then proceed.
- Write pseudocode first. If you have >20 minutes left after pseudocode, translate to real code. If less, stay in pseudocode and be very clear about logic.
- When time expires, STOP. This is a time-management test, not a coding-speed test.
- Self-evaluate against the 14-dimension rubric below. Score honestly. If your “Optimization” claim was O(N log N) but you wrote O(N²), that’s a 1, not a 4.
- Do not look at the official solution until after a second self-mock at the same level. One failure teaches a fact; two failures teach the pattern.
With a partner (realistic)
- Find a peer, ideally one level above you.
- They read the problem statement to you. You ask clarifying questions; they answer in character.
- They watch silently as you solve. They give hints only if you explicitly request one (with a score penalty) or after 10+ minutes of being stuck.
- After the timer expires, they rate you on the 14 dimensions, then you debrief.
- Swap roles next session.
Best of both worlds
- Pramp, interviewing.io, Hello Interview (and similar) match you with strangers. Higher pressure, more realistic.
- Record yourself (audio + screen). Replay 24h later. You will be shocked at what you actually said vs what you remember.
The 14-Dimension Scoring Rubric
Every mock is scored 1–5 on each dimension. Total /70. Passing thresholds vary by mock level (see each mock file).
1. Problem Understanding
- 1: Misread the problem; solved the wrong thing.
- 2: Understood the surface; missed a subtle constraint.
- 3: Understood correctly, restated to interviewer.
- 4: Restated, identified the underlying category (graph, DP, greedy) within first 2 minutes.
- 5: Restated, identified category, and explicitly verified your interpretation with one well-chosen example.
2. Clarifying Questions
- 1: None asked; assumed everything.
- 2: One generic question (“can the input be empty?”).
- 3: 2–3 questions covering input bounds and edge cases.
- 4: 3–5 questions covering bounds, edge cases, output format, ambiguity resolution.
- 5: Surgical questions that probe the exact ambiguities of this problem (e.g., for LRU: “does put-on-existing count as a use?”).
3. Brute Force
- 1: No brute force articulated; jumped to optimization.
- 2: Mentioned brute force in passing.
- 3: Stated brute force with complexity; moved on.
- 4: Stated brute force, complexity, and why it fails the constraint.
- 5: Wrote brute force pseudocode briefly to confirm correctness before optimizing — gives you a verifier.
4. Optimization
- 1: No improvement on brute force.
- 2: Improved by a constant factor.
- 3: Optimal-class solution (e.g., O(N log N) when O(N log N) is optimal).
- 4: Optimal-class solution with the right pattern recognized within first 5 minutes.
- 5: Optimal solution + articulated why the optimization works (the key insight) + considered alternative optimizations and rejected them with reasoning.
5. Correctness
- 1: Solution wrong; doesn’t handle the given examples.
- 2: Handles examples but fails an obvious edge case.
- 3: Handles all standard edge cases.
- 4: Handles edge cases plus 1 non-obvious one (overflow, empty input, all-duplicates).
- 5: Walks through correctness argument using invariant or induction.
6. Complexity Analysis
- 1: Wrong or absent.
- 2: Correct but only stated at the end.
- 3: Correct, articulated during/after coding.
- 4: Correct, plus space complexity, plus identified the bottleneck.
- 5: All of the above + considered amortized analysis or worst-case input that triggers worst-case complexity.
7. Code Quality
- 1: Unreadable; magic numbers; one-letter variables everywhere.
- 2: Works but ugly; copy-paste blocks; unclear naming.
- 3: Readable; reasonable names; small functions.
- 4: Clean structure; helpful comments where non-obvious; good use of standard library.
- 5: Production-quality — clear names, no dead code, idiomatic for the language, would pass a code review.
8. Testing
- 1: Did not test.
- 2: Tested only the given example.
- 3: Tested 1–2 edge cases unprompted.
- 4: Systematic walkthrough of given examples + 2+ deliberately-chosen edges.
- 5: Found and fixed own bug through testing, or explicitly stated which test classes would expose risks.
9. Debugging
- 1: Hit a bug, panicked, never recovered.
- 2: Hit a bug, fixed by trial and error.
- 3: Hit a bug, debugged systematically with prints.
- 4: Hit a bug, hypothesized cause, verified with targeted assertion, fixed.
- 5: Hit a bug; narrated the debug protocol aloud; fixed in under 3 minutes.
10. Communication
- 1: Silent typing.
- 2: Occasional muttering; no clear narrative.
- 3: Explained brute force and optimization out loud.
- 4: Continuous narration of thought process; pauses only to think briefly.
- 5: Narrated, paused at decision points to consider tradeoffs, invited interviewer input at appropriate moments.
11. Handling Follow-ups
- 1: Could not answer follow-ups.
- 2: Answered partially.
- 3: Answered correctly with one prompt.
- 4: Answered correctly without prompt; proposed reasonable extensions.
- 5: Answered, anticipated the follow-up before it was asked, and proposed extensions.
12. Language/Runtime Knowledge
- 1: Made language errors (Python integer-divides where float was needed; Java auto-boxing trap).
- 2: No errors but no runtime awareness.
- 3: Used appropriate language features (Python Counter, Java Map.Entry).
- 4: Articulated runtime cost (Python list.insert(0, ...) is O(N); Java String += is O(N²) in a loop).
- 5: Discussed GC/memory model/concurrency implications when relevant.
13. Tradeoff Reasoning
- 1: Picked one approach with no comparison.
- 2: Mentioned one alternative.
- 3: Compared two alternatives with a stated reason.
- 4: Compared 2–3 alternatives across time/space/code complexity axes.
- 5: Articulated which alternative would be preferred under different constraints (small N vs large N, read-heavy vs write-heavy, latency vs throughput).
14. Production Awareness
- 1: None — solved as an algorithm puzzle.
- 2: Mentioned scaling in passing.
- 3: Articulated 1–2 production concerns (latency, persistence, concurrency).
- 4: Articulated multiple production concerns; explained how implementation would change.
- 5: Discussed monitoring, failure modes, backward compatibility, deployment strategy — staff-level signal.
Passing Thresholds by Mock Level
| Mock | Target average score | Total minimum (/70) |
|---|---|---|
| 01 — Beginner | 2.5 | 35 |
| 02 — Easy LeetCode | 3.0 | 42 |
| 03 — Medium LeetCode | 3.0 | 42 |
| 04 — Hard LeetCode | 3.2 | 45 |
| 05 — Big Tech phone screen | 3.3 | 46 |
| 06 — Big Tech onsite | 3.5 | 49 |
| 07 — Senior engineer | 3.8 | 53 |
| 08 — Staff practical | 4.0 | 56 |
| 09 — Runtime/language | 3.8 | 53 |
| 10 — Infrastructure/backend | 4.0 | 56 |
| 11 — Concurrency | 4.0 | 56 |
| 12 — Competitive style | 3.5 | 49 |
Notes:
- Production-aware dimensions (#13, #14) are weighted higher for mocks 07–10.
- Communication (#10) is the most common reason candidates fail; if your average for #10 is below 3.5, drill it specifically.
- Mock 12 (competitive) deprioritizes #11–#14 (no follow-ups; pure algorithm).
Common Failure Modes by Level
| Level | Most common failure |
|---|---|
| Beginner | Silent coding; no communication |
| Easy LC | Forgot edge cases (empty, single element) |
| Medium LC | Stuck on optimization; couldn’t find the pattern |
| Hard LC | Panicked when first approach didn’t work |
| Phone screen | Spent too long on clarification; ran out of time |
| Onsite | Solved problem 1, gave up on problem 2 |
| Senior | No tradeoff reasoning; “I’d just use X” without comparing |
| Staff | No production awareness; built a perfect algorithm with no monitoring story |
| Runtime/lang | Couldn’t answer GC / memory model / concurrency probe mid-coding |
| Infrastructure | Treated it like a LeetCode problem instead of a system build |
| Concurrency | Race conditions in submitted code |
| Competitive | Failed to reach the algorithmic insight; brute force only |
How to Schedule Mocks
12-Week Accelerated Track
- Weeks 1–4: Mocks 01–04 (one each)
- Weeks 5–8: Mocks 05–08 (one each)
- Weeks 9–12: Mocks 09–12 (one each) + repeats of the ones you failed
6-Month Serious Track
- Months 1–2: foundations; no mocks yet
- Month 3: Mocks 01–03 (2 per week)
- Month 4: Mocks 04–06 (2 per week)
- Month 5: Mocks 07–10 (3 per week)
- Month 6: Mocks 11–12 + heavy re-mock cycle (4 per week)
12-Month Elite Track
- Months 1–6: foundations + light mocks (mock 01–04, 1 per week)
- Months 7–9: Mocks 05–10, 2 per week
- Months 10–12: Mocks 11–12, ICPC contests, 3 mocks per week + 2 contests per week
How to Self-Evaluate Honestly
The single biggest failure mode is grade inflation. To counter:
- Record the session. Listen back. You will hear all the silent gaps and the muttered “uh, let me think” filler.
- Compare to the rubric word-for-word. “I tested edge cases” is not enough for a 4 on Testing unless you actually walked through the given examples systematically and tested 2+ deliberately chosen edge cases.
- Find someone harsher than you to debrief with. Ideally an engineer one level above your target.
- Track scores over time. A flat line means you’re not improving — change something (new problem domain, harder mock, partner).
- The dimension where you score lowest is your training target. Drill it for two weeks, then re-mock.
Mock Index
| # | Mock | Time | Target Role |
|---|---|---|---|
| 1 | Beginner | 30 min | First-time interviewer / intern |
| 2 | Easy LeetCode | 30 min | Intern → SWE-I |
| 3 | Medium LeetCode | 35 min | SWE-I → SWE-II |
| 4 | Hard LeetCode | 60 min | SWE-II → Senior |
| 5 | Big Tech Phone Screen | 45 min | Any FAANG screen |
| 6 | Big Tech Onsite | 60 min (× 2 problems) | FAANG SWE-II / Senior |
| 7 | Senior Engineer | 60 min | Senior SWE |
| 8 | Staff Practical | 75 min | Staff / Principal |
| 9 | Runtime/Language Deep Dive | 45 min | Senior / Staff |
| 10 | Infrastructure/Backend | 75 min | Backend / Platform |
| 11 | Concurrency Heavy | 60 min | Backend / Systems |
| 12 | Competitive Style | 90 min | Quant / Compiler / ICPC |
What “Pass” Means
Passing a mock is necessary but not sufficient for readiness. The full readiness checklist is in READINESS_CHECKLIST.md. The mocks here verify that you can perform, not that you can do so consistently. Aim for 3 consecutive passes of any given mock before considering that level handled.
Mock 01 — Beginner
- Interview type: First-time mock / warm-up
- Target role: Intern, new grad, first-ever interview practice
- Time limit: 30 minutes
- Format: 1 easy problem
- Hints policy: Unlimited hints, with -1 to the score per hint
- Primary goal: Build the habit loop of clarify → brute force → optimize → code → test. Optimization is not the focus.
What This Mock Tests
This mock exists to break the most common beginner failure mode: silent coding. You will be scored more on whether you communicated than on whether your code is optimal. A correct silent solution scores lower than a slightly buggy spoken one.
The scoring rubric is weighted as follows:
| Dimension | Weight |
|---|---|
| Communication (#10) | 3× |
| Clarifying questions (#2) | 2× |
| Testing (#8) | 2× |
| Code quality (#7) | 1× |
| Correctness (#5) | 1× |
| Complexity (#6) | 1× |
| All others | 0.5× |
Pick One Problem (interviewer’s choice; for self-mock, pick at random)
Problem A — Reverse a String
Write a function that reverses a string. The input is given as a list of characters s. Modify s in-place; do not allocate a new list.
Examples:
Input: ['h', 'e', 'l', 'l', 'o']
Output: ['o', 'l', 'l', 'e', 'h']
Input: ['H', 'a', 'n', 'n', 'a', 'h']
Output: ['h', 'a', 'n', 'n', 'a', 'H']
Constraints: 1 ≤ |s| ≤ 10^5. Each character is printable ASCII.
Problem B — Valid Parentheses
Given a string s containing only ()[]{}, determine if it is valid. Valid means: brackets close in the right order, every opener has a matching closer of the same type.
Examples:
"()" → True
"()[]{}" → True
"(]" → False
"([)]" → False
"{[]}" → True
"" → True
Constraints: 0 ≤ |s| ≤ 10^4.
Problem C — Find Maximum in Array
Given an array of integers, return the maximum value. Do not use the language’s built-in max().
Examples:
[3, 1, 4, 1, 5, 9, 2, 6] → 9
[-3, -1, -7] → -1
[42] → 42
Constraints: 1 ≤ |a| ≤ 10^5. -10^9 ≤ a[i] ≤ 10^9.
Expected Communication Style
You should:
- Restate the problem in your own words. (“So I need to reverse this list of characters in place, meaning no new list allocation.”)
- Ask 2+ clarifying questions even if they feel obvious. (“Should I handle empty input? What’s the max length?”)
- State 1+ example trace out loud. (“For [h, e, l, l, o], I’d swap positions 0 and 4, then 1 and 3, leaving 2 alone.”)
- Articulate brute force first. Even for these problems, there’s an obvious approach.
- Code while narrating. “I’ll use two pointers, left and right, swap and move toward center until they meet.”
- Test out loud. Walk through the example, then try empty input, then single element.
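The two-pointer narration above for Problem A can be sketched as follows. This is a minimal illustration, not a reference solution; the function name reverse_in_place is an assumption chosen for this example.

```python
from typing import List


def reverse_in_place(s: List[str]) -> None:
    """Reverse the list of characters s in place (illustrative name)."""
    left, right = 0, len(s) - 1
    # Swap the outermost pair, then move both pointers toward the center.
    while left < right:
        s[left], s[right] = s[right], s[left]
        left += 1
        right -= 1


chars = ['h', 'e', 'l', 'l', 'o']
reverse_in_place(chars)
print(chars)
```

The loop condition left < right means a single-element or empty list is left untouched, which covers the edge cases you are expected to test out loud.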
You should not:
- Type silently for >30 seconds
- Skip clarifying questions because the problem “is obvious”
- Skip testing because the code “looks right”
- Assume the interviewer is following — narrate every decision
Common Failure Modes
- Silent coding. Most common. -3 to communication.
- Skipping clarification. “Empty string?” was not asked → -1.
- No testing. Submitted without walking through. -2 to testing.
- Skipping brute force. Wrote the optimal directly without acknowledging the simpler approach. -1 to brute force.
- Using a built-in. Problem C says no max(); using it is an instant fail of that problem.
Passing Bar
- Total score: 35/70 (average 2.5)
- Communication dimension: 3+ (mandatory)
- Code: works on all given examples
- One unprompted test case beyond the given examples
If you score 35+ but communication is below 3: re-do this mock. The score is misleading; the failure mode isn’t fixed.
Follow-up Questions (Interviewer may ask)
For A:
- What’s the complexity? (O(N) time, O(1) space.)
- What if the string is in a Unicode encoding with multi-byte characters? (Characters no longer map one-to-one to byte indices; reverse by code point, not by byte.)
- What if the string is immutable in your language? (Java strings, Python strings — must allocate.)
For B:
- Complexity? (O(N) time, O(N) worst-case space for the stack.)
- What if you also need to return the position of the first invalid bracket? (Track index when pushing; return index on failed pop.)
- What if the input can have non-bracket characters mixed in? (Skip them or treat as invalid — clarify.)
For C:
- Complexity? (O(N) time, O(1) space.)
- What if the array is empty? (Undefined; throw, return None, or return INT_MIN — clarify.)
- What if it’s a stream and you can’t store it all? (Maintain running max with O(1) state.)
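As one way to approach the Problem B follow-up about reporting a position, the stack can hold each opener together with its index. The sketch below assumes the input contains only bracket characters, and it assumes that leftover openers should be reported by the earliest unmatched one; both choices, and the name first_invalid_index, are illustrative and worth clarifying with the interviewer.

```python
from typing import List, Optional, Tuple

# Maps each closer to the opener it must match.
PAIRS = {')': '(', ']': '[', '}': '{'}


def first_invalid_index(s: str) -> Optional[int]:
    """Return the index of the first offending bracket, or None if s is valid."""
    stack: List[Tuple[str, int]] = []  # (opener, index) pairs awaiting a closer
    for i, ch in enumerate(s):
        if ch in PAIRS:
            # A closer is invalid if nothing is open or the top opener differs.
            if not stack or stack[-1][0] != PAIRS[ch]:
                return i
            stack.pop()
        else:
            # Assumption: every other character is an opener.
            stack.append((ch, i))
    # Assumption: for unclosed openers, report the earliest one.
    return stack[0][1] if stack else None


print(first_invalid_index("([)]"))
```

Returning None for a valid string keeps the original True/False question answerable as first_invalid_index(s) is None.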
Required Tests
For all problems:
- The given examples
- Empty input (where the constraint allows)
- Single-element input
- One additional edge case you choose
For A: a palindrome input (e.g., ['r', 'a', 'c', 'e', 'c', 'a', 'r']).
For B: nested mismatch like "([)]" and just an opener "(" (should be False).
For C: all-negative input and all-identical input.
Required Complexity Explanation
State out loud:
- Time complexity in Big-O
- Space complexity
- Why those bounds are tight (or whether they could be improved)
For these problems, the optimal is also the simplest. Acknowledge that briefly.
Self-Evaluation Template
Copy this into your notes after the mock:
Mock 01 — Beginner
Date: _______
Problem chosen: _______
Time taken: _____ min (limit: 30)
Scores (1–5):
[ ] 1. Problem Understanding
[ ] 2. Clarifying Questions
[ ] 3. Brute Force
[ ] 4. Optimization
[ ] 5. Correctness
[ ] 6. Complexity Analysis
[ ] 7. Code Quality
[ ] 8. Testing
[ ] 9. Debugging (if applicable)
[ ] 10. Communication
[ ] 11. Follow-ups
[ ] 12. Language/Runtime
[ ] 13. Tradeoffs
[ ] 14. Production Awareness
Total: ___/70
What went well:
What went poorly:
Specific bug or moment of confusion:
What to drill before next mock:
What to Do If You Fail
If you scored below 35:
- Identify the dimension with the lowest score.
- Re-do this same mock with a different problem (A/B/C). Focus only on that dimension.
- If communication is the issue, record yourself doing 3 LC easies aloud over the next week. Listen back.
- Do not move to Mock 02 until you pass Mock 01 twice in a row. Foundational habits matter more than progression.
Mock 02 — Easy LeetCode
- Interview type: Standard LC easy
- Target role: Intern, new grad SWE-I, first technical screen
- Time limit: 30 minutes
- Format: 1 easy problem
- Hints policy: One free hint after 10 min of being stuck; additional hints -1 each
- Primary goal: Solve correctly with clean code and adequate testing in under 30 min.
What This Mock Tests
The bar at the easy level: you should solve the problem with the optimal approach, communicate clearly throughout, and verify with at least 2 unprompted tests. Easy LC questions are the most common phone-screen problems at smaller companies and at the start of FAANG screens.
Scoring weights are uniform across dimensions — easy LCs should pass on every axis.
Pick One Problem
Problem A — Two Sum (LC 1)
Given an array of integers nums and an integer target, return the indices of the two numbers that add up to target. You may assume exactly one solution exists. You may not use the same element twice.
Examples:
nums = [2, 7, 11, 15], target = 9 → [0, 1]
nums = [3, 2, 4], target = 6 → [1, 2]
nums = [3, 3], target = 6 → [0, 1]
Constraints: 2 ≤ |nums| ≤ 10^4. -10^9 ≤ nums[i] ≤ 10^9. Exactly one solution.
Problem B — Best Time to Buy and Sell Stock (LC 121)
You are given prices, where prices[i] is the price of a stock on day i. Maximize profit by choosing one day to buy and a later day to sell. If no profit possible, return 0.
Examples:
[7, 1, 5, 3, 6, 4] → 5 (buy day 1 at 1, sell day 4 at 6)
[7, 6, 4, 3, 1] → 0
[1, 2] → 1
Constraints: 1 ≤ |prices| ≤ 10^5. 0 ≤ prices[i] ≤ 10^4.
Problem C — Contains Duplicate (LC 217)
Given an integer array nums, return True if any value appears at least twice.
Examples:
[1, 2, 3, 1] → True
[1, 2, 3, 4] → False
[1, 1, 1, 3, 3, 4, 3, 2, 4, 2] → True
Constraints: 1 ≤ |nums| ≤ 10^5.
Expected Communication Style
- Restate the problem in 1 sentence.
- Ask 2–3 clarifying questions: input bounds, edge cases (empty? duplicates? negative?), output format.
- State brute force with complexity. (“Nested loop, O(N²).”)
- Identify the optimization signal. (“I see lookup-by-value — hashmap.”)
- State optimal complexity before coding. (“O(N) time, O(N) space.”)
- Code with brief narration of each step.
- Walk through given example. Then 1–2 edge cases.
Solution Sketches
A. Two Sum: hashmap value → index; for each element, check if target - x is in the map; if so return; else insert. O(N) time, O(N) space.
B. Stock: maintain running minimum; for each price, compute price - min_so_far; track max. O(N) time, O(1) space.
C. Duplicate: insert into a set; on collision return True. O(N) time, O(N) space. Or: sort + scan adjacent, O(N log N) time, O(1) extra space beyond what the sort itself uses.
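The three sketches above translate fairly directly into code. The versions below are one possible rendering, with illustrative function names; they assume the stated constraints hold (for example, that prices is non-empty and that Two Sum has exactly one solution).

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    seen: Dict[int, int] = {}  # value -> index of an earlier element
    for i, x in enumerate(nums):
        # Check before inserting x, so that [3, 3] with target 6 works.
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i
    raise ValueError("no pair sums to target")  # excluded by the constraints


def max_profit(prices: List[int]) -> int:
    best = 0
    min_so_far = prices[0]  # assumes at least one price
    for price in prices[1:]:
        best = max(best, price - min_so_far)   # sell today at the best earlier buy
        min_so_far = min(min_so_far, price)
    return best


def contains_duplicate(nums: List[int]) -> bool:
    seen = set()
    for x in nums:
        if x in seen:  # collision: x appeared earlier
            return True
        seen.add(x)
    return False


print(two_sum([3, 3], 6), max_profit([7, 1, 5, 3, 6, 4]), contains_duplicate([1, 2, 3, 1]))
```

Because max_profit only ever subtracts a minimum seen earlier in the scan, it cannot pair a sell day with a later buy day.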
Common Failure Modes
- Wrote the O(N²) brute force as the final answer. Acceptable on first attempt, but you must immediately follow with the optimization.
- Forgot to handle duplicates in Two Sum. What if nums = [3, 3]? The naive hashmap solution must check target - x before inserting x.
- Used max(prices) - min(prices) for B. Wrong: the min must come before the max.
- No edge case test. Empty arrays, single element, all duplicates.
Passing Bar
- Total score: 42/70 (average 3.0)
- Optimal complexity reached (O(N) or O(N log N) depending on problem)
- At least 2 unprompted edge case tests
- Continuous narration (no silent stretches >30 sec)
Follow-up Questions
For A:
- What if there are multiple valid pairs? Return all. → Need to handle duplicates carefully; either set of tuples or sort + two-pointer.
- What if the array is sorted? → Two-pointer, O(1) extra space.
- What if the input is a stream? → Hashmap still works; can’t know “future” elements.
- What if duplicates matter? → nums = [3, 3], target = 6 → return [0, 1].
For B:
- Can you buy and sell multiple times? → LC 122; sum all positive day-to-day differences.
- With a cooldown of K days between trades? → DP, state = (day, holding?).
- With a transaction fee? → DP variant.
- What if you can short-sell? → Symmetric; maintain running max.
For C:
- What if memory is constrained? → Bloom filter (approximate); or sort in-place + scan.
- Return the first duplicate, not just whether one exists. → Same approach, return on first collision.
- Find ALL duplicates. → Count map.
- In a stream? → Bloom filter or Count-Min Sketch for approximate.
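Two of the follow-ups above are short enough to sketch: the sorted-input variant of Two Sum and the multiple-transaction variant of the stock problem. Both functions are illustrative, with hypothetical names; two_sum_sorted assumes the input is already sorted in ascending order and returns None when no pair exists.

```python
from typing import List, Optional


def two_sum_sorted(nums: List[int], target: int) -> Optional[List[int]]:
    """Two-pointer search; assumes nums is sorted ascending."""
    left, right = 0, len(nums) - 1
    while left < right:
        total = nums[left] + nums[right]
        if total == target:
            return [left, right]
        if total < target:
            left += 1    # need a larger sum
        else:
            right -= 1   # need a smaller sum
    return None


def max_profit_multi(prices: List[int]) -> int:
    """LC 122 idea: add up every positive day-to-day difference."""
    return sum(max(0, b - a) for a, b in zip(prices, prices[1:]))


print(two_sum_sorted([2, 7, 11, 15], 9), max_profit_multi([7, 1, 5, 3, 6, 4]))
```

The two-pointer version uses O(1) extra space, which is the tradeoff to state when the interviewer tells you the array is sorted.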
Required Tests
For all:
- Given examples
- Empty input or smallest legal input
- Single-element input (if applicable)
- Edge case specific to the problem (negative numbers, all-duplicates, sorted input)
Required Complexity Explanation
State:
- Time complexity (with reasoning)
- Space complexity (with reasoning)
- The bottleneck — which line determines the dominant cost
Self-Evaluation Template
Mock 02 — Easy LC
Date: _______
Problem: _______
Time taken: _____ / 30 min
Scores (1–5) for all 14 dimensions:
___ Total /70
Optimal complexity reached? Y/N
Hints used? Number: ___
Edge cases tested unprompted? Number: ___
Strongest dimension:
Weakest dimension:
Action item for next mock:
What to Do If You Fail
- Score 35–41: Re-do with a different problem from the list. Focus on the lowest-scored dimension.
- Score <35: Step back to Mock 01 for one session. The habit loop isn’t solid.
- Failed to reach optimal complexity: Drill Phase 1 (foundations) hash-map and array labs.
- Took longer than 30 min: Repeat with a stricter timer. 30 min is the actual phone-screen budget.
Mock 03 — Medium LeetCode
- Interview type: Standard LC medium, the modal interview question type
- Target role: SWE-I → SWE-II, generic mid-tier phone screen
- Time limit: 35 minutes
- Format: 1 medium problem
- Hints policy: One free hint at 15 min; additional -2 each
- Primary goal: Pattern recognition under time pressure + clean implementation.
What This Mock Tests
Mediums are the bread and butter of coding interviews. If you can’t pass mediums consistently in 30–35 min, you cannot pass FAANG. The bar is:
- Recognize the pattern (sliding window, hashmap, two pointers, BFS, etc.) within 5 minutes
- State optimal complexity before coding
- Implement correctly with clean code
- Test 3+ cases including 1 non-obvious edge
Scoring for this mock weights all dimensions equally, with slight emphasis on Optimization (#4) and Testing (#8); these are the differentiators at the medium level.
Pick One Problem
Problem A — Longest Substring Without Repeating Characters (LC 3)
Given a string s, find the length of the longest substring with no repeating characters.
Examples:
"abcabcbb" → 3 ("abc")
"bbbbb" → 1 ("b")
"pwwkew" → 3 ("wke")
"" → 0
Constraints: 0 ≤ |s| ≤ 5×10^4. Printable ASCII or English letters/digits/symbols.
Problem B — Group Anagrams (LC 49)
Given an array of strings, group anagrams together. Return any order.
Examples:
["eat","tea","tan","ate","nat","bat"]
→ [["bat"],["nat","tan"],["ate","eat","tea"]]
[""] → [[""]]
["a"] → [["a"]]
Constraints: 1 ≤ |strs| ≤ 10^4. 0 ≤ |strs[i]| ≤ 100. Lowercase English letters.
Problem C — Coin Change (LC 322)
Given coin denominations coins and an amount, return the fewest number of coins needed. If impossible, return -1. Infinite supply of each coin.
Examples:
coins = [1, 2, 5], amount = 11 → 3 (5 + 5 + 1)
coins = [2], amount = 3 → -1
coins = [1], amount = 0 → 0
Constraints: 1 ≤ |coins| ≤ 12. 1 ≤ coins[i] ≤ 2^31 - 1. 0 ≤ amount ≤ 10^4.
Expected Communication Style
- Restate in your own words; explicitly state the input/output types.
- Ask 3–5 clarifying questions: input bounds, edge cases, output ambiguities.
- State brute force with complexity.
- Name the pattern. (“Sliding window over a hashmap” / “Hash group by sorted string” / “Unbounded knapsack DP.”)
- State optimal complexity before coding.
- Code with narration. Pause briefly at decision points (e.g., “I need to evict from the window — let me use a map of char→last_seen_index instead of a set, so I can jump the left pointer”).
- Test 3+ cases, including 1 designed to break common bugs.
Solution Sketches
A. Longest Substring: sliding window with hashmap (char → last index seen). When duplicate enters, advance left to max(left, last_seen[c] + 1). O(N) time, O(min(N, alphabet)) space.
B. Group Anagrams: key by sorted string, OR key by char-count tuple. Hashmap from key → list of strings. O(N · K log K) for sort key, O(N · K) for count key.
C. Coin Change: DP. dp[i] = min coins for amount i, with dp[0] = 0 and dp[i] = min(dp[i - c] + 1) over coins c ≤ i; amounts that stay unreachable map to -1. O(amount × |coins|) time, O(amount) space.
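The sketches above could be written out as follows. These are minimal versions under the stated constraints, with illustrative function names; group_anagrams assumes lowercase a-z input, and coin_change uses amount + 1 as an "unreachable" sentinel, which is a choice made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def length_of_longest_substring(s: str) -> int:
    last_seen: Dict[str, int] = {}  # char -> most recent index
    left = best = 0
    for right, ch in enumerate(s):
        if ch in last_seen:
            # max() keeps left from moving backwards on stale entries.
            left = max(left, last_seen[ch] + 1)
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best


def group_anagrams(strs: List[str]) -> List[List[str]]:
    groups = defaultdict(list)
    for word in strs:
        counts = [0] * 26  # assumes lowercase English letters
        for ch in word:
            counts[ord(ch) - ord('a')] += 1
        groups[tuple(counts)].append(word)  # count tuple as the O(K) key
    return list(groups.values())


def coin_change(coins: List[int], amount: int) -> int:
    unreachable = amount + 1  # larger than any valid coin count
    dp = [0] + [unreachable] * amount
    for i in range(1, amount + 1):
        for c in coins:
            if c <= i:  # guard against a negative index
                dp[i] = min(dp[i], dp[i - c] + 1)
    return dp[amount] if dp[amount] < unreachable else -1


print(length_of_longest_substring("abba"), coin_change([1, 3, 4], 6))
```

The input "abba" is a useful test for the first function: without the max(), the final 'a' would pull left back to index 1 and overcount.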
Common Failure Modes
- For A: restarted the scan after every duplicate, which is O(N²) in the worst case. A set-based window that shrinks from the left is still O(N) amortized (each character is added and removed at most once); the hashmap-with-last-index trick lets left jump in a single step.
- For A: forgot the max(left, ...) when jumping the left pointer. Causes left to move backwards on out-of-window duplicates.
- For B: used sorted string as key but the sort is O(K log K) per word. Acceptable, but count tuple is O(K). Discuss the tradeoff.
- For C: greedy approach (always pick the largest coin). Wrong on coins = [1, 3, 4], amount = 6: greedy gives 4+1+1 = 3 coins; DP gives 3+3 = 2.
- For C: forgot to initialize dp[0] = 0 or to handle amount = 0.
- For C: didn’t check if i - c >= 0 before lookup. Causes IndexError or wrong answer.
Passing Bar
- Total score: 42/70 (average 3.0)
- Optimal complexity reached
- Correct on all given examples + at least 1 self-generated edge case
- Hint usage ≤ 1
- Time: ≤ 35 min
Follow-up Questions
For A:
- Return the substring itself, not just length. → Track start position in addition to max length.
- What if “repeating” means within a window of K positions? → Modify the window logic; same approach.
- Unicode? → Use codepoint, not byte; alphabet might be huge so use hashmap, not array.
For B:
- What if strings can be huge (1M chars)? → Hashing the count tuple becomes the bottleneck; consider streaming Rabin-Karp.
- Anagram detection in a stream? → Maintain rolling count.
- Approximate anagrams (with one letter difference)? → Locality-sensitive hashing.
For C:
- Return the actual coin combination, not just the count. → Track parent pointer in DP; reconstruct.
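- Number of ways to make change. → Different DP: dp[i] = sum of dp[i-c], with coins in the outer loop so that each combination is counted once rather than once per ordering.
- What if coin counts are limited? → Bounded knapsack variant; O(amount × sum(counts)).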
- Why doesn’t greedy work? → Coin systems where greedy works are “canonical” (e.g., USD coins); others (e.g., [1, 3, 4]) require DP.
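Two of the Problem C follow-ups above can be sketched briefly: reconstructing one optimal combination via parent pointers, and counting the number of ways. The function names and the choice to return None for an impossible amount are assumptions made for this illustration.

```python
from typing import List, Optional


def coin_change_combination(coins: List[int], amount: int) -> Optional[List[int]]:
    """Return one minimum-size list of coins summing to amount, or None."""
    unreachable = amount + 1
    dp = [0] + [unreachable] * amount
    parent = [0] * (amount + 1)  # parent[i] = last coin used to reach i optimally
    for i in range(1, amount + 1):
        for c in coins:
            if c <= i and dp[i - c] + 1 < dp[i]:
                dp[i] = dp[i - c] + 1
                parent[i] = c
    if dp[amount] >= unreachable:
        return None
    used = []
    while amount > 0:  # walk the parent pointers back to 0
        used.append(parent[amount])
        amount -= parent[amount]
    return used


def count_ways(coins: List[int], amount: int) -> int:
    """Number of combinations (order ignored) that sum to amount."""
    ways = [1] + [0] * amount
    for c in coins:  # coins outermost, so each combination is counted once
        for i in range(c, amount + 1):
            ways[i] += ways[i - c]
    return ways[amount]


print(coin_change_combination([1, 3, 4], 6), count_ways([1, 2, 5], 5))
```

Swapping the two loops in count_ways would count ordered sequences instead of combinations, which is a common source of wrong answers on this follow-up.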
Required Tests
For all:
- Given examples
- Empty input (where legal)
- Single-element / minimum input
- Input that triggers the pattern’s worst case (e.g., for A: "aaaaa"; for C: amount=0 and impossible amount)
- A randomly chosen non-trivial case you verify by hand
Required Complexity Explanation
State:
- Time complexity, with the bottleneck identified
- Space complexity, including auxiliary data structures
- Whether the bound is tight or improvable
Self-Evaluation Template
Mock 03 — Medium LC
Date: _______
Problem: _______
Time: ___ / 35 min
Hints used: ___
Scores (1–5) for all 14 dimensions:
___ Total /70
Pattern recognized in: _____ minutes
Bug count during coding: ___
Bug count caught by my own tests: ___
Strongest dimension:
Weakest dimension:
Specific drill for next session:
What to Do If You Fail
- Score 35–41: Repeat with a different problem; focus on weak dimension.
- Took > 35 min: You need more medium volume. Solve 20 mediums in the next week against a 35-min timer.
- Couldn’t recognize the pattern: Go back to Phase 2 (patterns) README and re-read the signal table.
- Bug-prone code: Phase 10, Lab 02 (TDD) and Lab 05 (stress testing).
- Communication weak: Record yourself; listen back; identify silent stretches.
- Pass twice in a row before moving to Mock 04.
Mock 04 — Hard LeetCode
- Interview type: LC hard, FAANG onsite-level
- Target role: SWE-II → Senior; FAANG onsite second round
- Time limit: 60 minutes
- Format: 1 hard problem (or 1 medium + 1 hard if you finish hard early)
- Hints policy: One free hint at 20 min; additional -2 each. After 3 hints, the round is “failed” by FAANG standards.
- Primary goal: Reach the optimal algorithm under pressure, implement correctly, handle the hard follow-ups.
What This Mock Tests
Hards are where mid-level engineers separate from senior. The bar:
- Recognize the non-obvious pattern (binary search on answer, segment tree, DP on intervals, etc.) within 10 minutes
- Articulate why the optimal works, with correctness sketch
- Implement under time pressure
- Handle 2+ follow-up extensions
Scoring emphasizes Problem Understanding (#1), Optimization (#4), Correctness (#5), and Tradeoff Reasoning (#13).
Pick One Problem
Problem A — Median of Two Sorted Arrays (LC 4)
Given two sorted arrays nums1 and nums2 of sizes m and n, return the median of the combined sorted array. Must run in O(log(m+n)) time.
Examples:
[1, 3], [2] → 2.0 (merged: [1, 2, 3])
[1, 2], [3, 4] → 2.5 (merged: [1, 2, 3, 4])
[], [1] → 1.0
[1, 2], [] → 1.5
Constraints: 0 ≤ m, n ≤ 1000. m + n ≥ 1. -10^6 ≤ values ≤ 10^6.
Problem B — Trapping Rain Water (LC 42)
Given heights of bars height[i], compute how much water it can trap after raining.
Examples:
[0,1,0,2,1,0,1,3,2,1,2,1] → 6
[4,2,0,3,2,5] → 9
Constraints: 1 ≤ |height| ≤ 2×10^4. 0 ≤ height[i] ≤ 10^5.
Problem C — Merge K Sorted Lists (LC 23)
Given an array of k sorted linked lists, merge them into one sorted list.
Examples:
[[1,4,5], [1,3,4], [2,6]] → [1,1,2,3,4,4,5,6]
[] → []
[[]] → []
Constraints: 0 ≤ k ≤ 10^4. 0 ≤ |lists[i]| ≤ 500. Sum of all lengths ≤ 10^4.
Expected Communication Style
- Restate with input/output types and the explicit complexity constraint (where given).
- Ask precise clarifying questions: O(log) vs O(m+n) required? Negative numbers? Empty arrays? Duplicates?
- State a baseline: the obvious O(m+n) merge for A, the O(N²) per-bar scan for the tallest bar on each side for B, the O(N · K) scan of all list heads for C.
- Identify whether the baseline meets the constraint. If not, derive the harder approach.
- Articulate the key insight before coding. (“For A, I’ll binary search on the partition position in the shorter array such that the left halves of both arrays combined form the lower half of the merged array.”)
- Code carefully — hards have boundary conditions everywhere.
- Test 3+ cases including the boundary (empty array, both arrays size 1, large mismatch).
Solution Sketches
A. Median of Two Sorted: binary search on the partition i in the shorter array. For each i, j = (m + n + 1) // 2 - i. Check if nums1[i-1] ≤ nums2[j] and nums2[j-1] ≤ nums1[i]. If so, median is computed from the boundary 4 values. O(log(min(m, n))). Edge cases: i=0 (no left in nums1) or i=m; same for j.
B. Rain Water: two-pointer. Maintain left_max and right_max. At each step, the side with the smaller max determines how much water can sit at that index. Move that pointer inward. O(N) time, O(1) space. Alternative: precompute left_max and right_max arrays, O(N) time and space.
C. Merge K Lists: min-heap of (value, list_index, node). Pop the smallest, push its next. O(N log K) where N = total nodes. Alternative: divide-and-conquer pairwise merge, same complexity, no heap.
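Sketches A and B above are the ones where boundary handling most often goes wrong, so it may help to see them written out. The code below is one possible rendering with illustrative names; it uses infinities as sentinels for the i = 0, i = m, j = 0, and j = n cases, which is a choice made for this example rather than the only way to handle them.

```python
from typing import List


def find_median_sorted_arrays(nums1: List[int], nums2: List[int]) -> float:
    # Binary search over the shorter array, called a here.
    a, b = (nums1, nums2) if len(nums1) <= len(nums2) else (nums2, nums1)
    m, n = len(a), len(b)
    if m + n == 0:
        raise ValueError("at least one array must be non-empty")
    half = (m + n + 1) // 2  # size of the combined left half
    lo, hi = 0, m
    while lo <= hi:
        i = (lo + hi) // 2   # elements taken from a
        j = half - i         # elements taken from b
        a_left = a[i - 1] if i > 0 else float('-inf')
        a_right = a[i] if i < m else float('inf')
        b_left = b[j - 1] if j > 0 else float('-inf')
        b_right = b[j] if j < n else float('inf')
        if a_left <= b_right and b_left <= a_right:
            if (m + n) % 2:
                return float(max(a_left, b_left))
            return (max(a_left, b_left) + min(a_right, b_right)) / 2
        if a_left > b_right:
            hi = i - 1       # took too many from a
        else:
            lo = i + 1       # took too few from a
    raise ValueError("inputs must be sorted")


def trap(height: List[int]) -> int:
    left, right = 0, len(height) - 1
    left_max = right_max = water = 0
    while left < right:
        # The side with the lower bar is bounded by its own running max.
        if height[left] < height[right]:
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1
    return water


print(find_median_sorted_arrays([1, 2], [3, 4]), trap([4, 2, 0, 3, 2, 5]))
```

Searching the shorter array keeps j within [0, n] for every i in [0, m], which is what makes the sentinel lookups safe.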
Common Failure Modes
- A: Submitted O(m+n) by merging. Works but fails the complexity requirement — interviewer marks as fail at FAANG bar.
- A: Off-by-one in the partition formula. Most common bug.
- A: Didn’t handle empty arrays. Crash on [], [1].
- B: Used DP with O(N) space when O(1) two-pointer works. Acceptable but downgrades the “Optimization” score.
- B: Forgot to handle inputs that trap no water (strictly increasing, strictly decreasing, or a single peak).
- C: Used O(N · K) approach — for each output element, scan all K heads. Too slow for K = 10^4.
- C: Forgot null check on lists[i]; common test failure on [[]].
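For Problem C, the heap-based merge with the null check from the last failure mode might look like the sketch below. ListNode mirrors the usual LeetCode-style node, and from_list and to_list are hypothetical helpers added only so the example can be run; none of these names come from the program itself.

```python
import heapq
from typing import List, Optional


class ListNode:
    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def merge_k_lists(lists: List[Optional[ListNode]]) -> Optional[ListNode]:
    heap = []
    for idx, node in enumerate(lists):
        if node is not None:  # skip empty lists, e.g. the [[]] input
            heapq.heappush(heap, (node.val, idx, node))
    dummy = tail = ListNode()
    while heap:
        # idx breaks ties on equal values, so nodes are never compared.
        _, idx, node = heapq.heappop(heap)
        tail.next = node
        tail = node
        if node.next is not None:
            heapq.heappush(heap, (node.next.val, idx, node.next))
    return dummy.next


def from_list(values: List[int]) -> Optional[ListNode]:
    """Hypothetical helper: build a linked list from a Python list."""
    dummy = tail = ListNode()
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(node: Optional[ListNode]) -> List[int]:
    """Hypothetical helper: flatten a linked list into a Python list."""
    out = []
    while node is not None:
        out.append(node.val)
        node = node.next
    return out


merged = merge_k_lists([from_list(x) for x in ([1, 4, 5], [1, 3, 4], [2, 6])])
print(to_list(merged))
```

At most one node per input list sits in the heap at any time, so the heap never grows beyond K entries and each pop or push costs O(log K).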
Passing Bar
- Total score: 45/70 (average 3.2)
- Optimal complexity reached (or a serious attempt with clear awareness of the gap)
- Correct on given examples + 2 boundary cases
- Hint usage ≤ 1
- Time ≤ 60 min
- Articulated correctness argument (not just “trust me”)
Follow-up Questions
For A:
- Generalize to k-th smallest in two sorted arrays. → Same binary search, partition at k-1 in total. O(log(min(m, n))).
- Median of K sorted streams (K small). → Heap-based; not log-time anymore.
- Median of unsorted data? → Quickselect or median-of-medians, O(N).
- Memory-bound large-K case. → External merge sort; k-way merge with bounded heap.
For B:
- 3D rain water (LC 407). → Heap of boundary cells, BFS inward. Much harder.
- Approximate version with O(1) memory for a stream. → Doesn’t exist in general; needs two-pass for exact.
- Trapping with non-zero ground (irregular shape). → Doesn’t fundamentally change.
For C:
- Streaming K sorted streams, K huge (10^6+). → Tournament tree (O(log K) per element), or distributed merge.
- Lists are not in memory (each is a file). → External k-way merge with bounded buffers.
- K-way merge with timestamps + de-dup. → Same algorithm + dedup pass.
- Latency-sensitive variant: emit elements as soon as possible. → Stream the heap output without buffering.
Required Tests
- All given examples
- One array empty (A; both empty is excluded by m + n ≥ 1), all bars zero (B), all lists empty (C)
- One huge + one tiny array (A) — stresses the binary search edges
- Strictly increasing input (B), strictly decreasing input (B)
- K = 1 (C: single list pass-through), K = 0 (C: empty)
- One adversarial input you design
Required Complexity Explanation
- Time complexity, with reasoning
- Space complexity
- Whether the bound is tight or merely upper
- For A: explicitly justify why O(log) is achievable
Self-Evaluation Template
Mock 04 — Hard LC
Date: _______
Problem: _______
Time: ___ / 60 min
Hints used: ___
Scores (1–5):
___ Total /70
Time to reach optimal idea: _____ min
Time to first correct submission: _____ min
Number of bugs hit and fixed:
Was the correctness argument articulated? Y/N
Were 2+ follow-ups answered? Y/N
Strongest dimension:
Weakest dimension:
Action item for next session:
What to Do If You Fail
- Score 38–44: Repeat with a different problem; you nearly passed.
- Score <38: Step back to mediums; ensure 3+ consecutive passes of Mock 03 before retrying hard.
- Couldn’t reach optimal complexity: Review Phase 2 (binary search), Phase 5 (DP), Phase 4 (graphs) — which patterns did you miss?
- Bug-storm on implementation: Phase 10 Lab 04 (correctness proofs) and Lab 05 (stress testing).
- Failed follow-ups: You knew the algorithm but didn’t know its variants; do 5 related problems before the next attempt.
- Pass twice in a row before moving to Mock 05.
Mock 05 — Big Tech Phone Screen
- Interview type: FAANG-style phone screen (Google, Meta, Amazon, Microsoft, Apple)
- Target role: SWE-II / Senior phone round
- Time limit: 45 minutes total
- Format: ~5 min intro + 35 min coding + 5 min Q&A. One medium-to-hard problem with strong follow-ups.
- Hints policy: Hints cost real points at FAANG: one is acceptable, two is borderline, three fails.
- Primary goal: Show you can work cleanly under FAANG’s exact format.
What This Mock Tests
This mock simulates the actual FAANG phone screen format. The interviewer:
- Greets you (~3 min)
- Asks one 1-min behavioral warm-up (“What are you excited about lately?”)
- Presents the coding problem
- Expects you to clarify, plan, code, test
- Asks 2–3 follow-up extensions in the remaining time
The signal they’re collecting: can this person work on our team without supervision? Specifically — do they understand requirements, optimize without being told to, write reasonable code, and engage with extensions intelligently?
Scoring weights: Problem Understanding, Optimization, Communication, and Follow-ups are all critical; each must score 3 or higher. One weak dimension = no advance.
Pick One Problem
Problem A — Longest Increasing Path in a Matrix (LC 329)
Given an m × n integer matrix, return the length of the longest strictly increasing path. From a cell, you may move up/down/left/right (no diagonals, no wraparound).
Examples:
[[9,9,4],[6,6,8],[2,1,1]] → 4 (path 1→2→6→9)
[[3,4,5],[3,2,6],[2,2,1]] → 4 (path 3→4→5→6)
[[1]] → 1
Constraints: 1 ≤ m, n ≤ 200. 0 ≤ matrix[i][j] ≤ 2^31 - 1.
Problem B — Word Ladder (LC 127)
Given two words beginWord and endWord and a dictionary wordList, find the length of the shortest transformation sequence (each step changes exactly one letter; intermediate words must be in dictionary). Return 0 if no sequence exists.
Examples:
"hit", "cog", ["hot","dot","dog","lot","log","cog"] → 5 (hit→hot→dot→dog→cog)
"hit", "cog", ["hot","dot","dog","lot","log"] → 0 (cog not in dict)
Constraints: 1 ≤ |beginWord| ≤ 10. All words same length, lowercase. 1 ≤ |wordList| ≤ 5000.
Problem C — Number of Islands (LC 200)
Given a 2D grid of '1' (land) and '0' (water), count the number of islands (connected groups of land, 4-directional).
Examples:
[
["1","1","1","1","0"],
["1","1","0","1","0"],
["1","1","0","0","0"],
["0","0","0","0","0"]
] → 1
[
["1","1","0","0","0"],
["1","1","0","0","0"],
["0","0","1","0","0"],
["0","0","0","1","1"]
] → 3
Constraints: 1 ≤ m, n ≤ 300.
Expected Communication Style
- Restate and confirm types. (“Integer matrix; I return the length of the longest strictly increasing path; movement is 4-directional.”)
- Ask 3–5 clarifying questions: matrix size, value range, strictly vs non-strictly increasing, do diagonals count.
- State brute force with complexity. (“DFS from every cell, no memo, exponential worst case.”)
- Identify the optimization signal. (“DFS + memoization since subpath answers don’t change. Or topological sort on the DAG of (i,j) → (i’, j’) where val(i,j) < val(i’, j’).”)
- Justify your choice between alternatives. (“Memo’d DFS is simpler; topo sort is more rigorous and avoids stack depth issues at 200×200.”)
- Code cleanly. Helper functions, no inline magic.
- Walk through the example. Test 2+ edge cases.
- Engage with the follow-ups — these decide the round.
Solution Sketches
A. Longest Increasing Path: memoized DFS. dp[i][j] = longest path starting at (i, j). Recurse to 4 neighbors with strictly greater value; dp[i][j] = 1 + max(dp[neighbor]). O(mn) time and space. The DAG structure ((i,j) → (i', j') iff val < val') guarantees no cycles, so memo is sound.
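The memoized DFS described above can be sketched as follows. This is a minimal illustration rather than a reference solution; the function name `longest_increasing_path` and the dict-based memo are choices made for this example. Note that plain recursion can exceed Python's default recursion limit on adversarial 200×200 inputs, which is one reason the topological-sort variant is worth mentioning in the interview.

```python
from typing import Dict, List, Tuple


def longest_increasing_path(matrix: List[List[int]]) -> int:
    """Length of the longest strictly increasing 4-directional path."""
    if not matrix or not matrix[0]:
        return 0
    rows, cols = len(matrix), len(matrix[0])
    memo: Dict[Tuple[int, int], int] = {}  # (r, c) -> longest path starting there

    def dfs(r: int, c: int) -> int:
        if (r, c) in memo:
            return memo[(r, c)]
        best = 1  # the cell itself
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Strictly greater neighbors only, so the recursion cannot cycle.
            if 0 <= nr < rows and 0 <= nc < cols and matrix[nr][nc] > matrix[r][c]:
                best = max(best, 1 + dfs(nr, nc))
        memo[(r, c)] = best
        return best

    return max(dfs(r, c) for r in range(rows) for c in range(cols))
```

Each cell is computed once and inspects four neighbors, which is where the O(mn) bound comes from.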
B. Word Ladder: BFS over the graph where nodes are words and edges connect words differing by one letter. Use a “wildcard bucket” optimization: for each word, generate patterns like h*t, *ot, ho*; bucket words by pattern; neighbors are words sharing a pattern bucket. O(N · L²) where N = dict size, L = word length.
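A sketch of the wildcard-bucket BFS, assuming lowercase words of equal length as in the constraints. The name `ladder_length` and the choice to empty a bucket once it has been expanded are illustrative decisions for this example, not part of the problem statement.

```python
from collections import defaultdict, deque
from typing import Dict, List


def ladder_length(begin_word: str, end_word: str, word_list: List[str]) -> int:
    """Number of words in the shortest transformation sequence, or 0 if none."""
    words = set(word_list)
    if end_word not in words:
        return 0

    # Bucket every word under each of its one-wildcard patterns, e.g. "h*t".
    buckets: Dict[str, List[str]] = defaultdict(list)
    for word in words | {begin_word}:
        for i in range(len(word)):
            buckets[word[:i] + "*" + word[i + 1:]].append(word)

    queue = deque([(begin_word, 1)])
    visited = {begin_word}
    while queue:
        word, steps = queue.popleft()
        if word == end_word:
            return steps
        for i in range(len(word)):
            pattern = word[:i] + "*" + word[i + 1:]
            for neighbor in buckets[pattern]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, steps + 1))
            # Every word in this bucket is now visited; skip it next time.
            buckets[pattern] = []
    return 0
```

Building the buckets costs O(N · L²) because each of the N · L patterns takes O(L) to construct, which matches the bound quoted above.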
C. Number of Islands: for each unvisited ‘1’, flood fill (BFS or DFS), mark visited, increment count. O(mn) time and space.
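A minimal flood-fill sketch, assuming the grid holds the strings "1" and "0". It uses a separate `seen` matrix instead of overwriting the input, which sidesteps the "modified input without permission" concern at the cost of O(mn) extra space; the function name `num_islands` is illustrative.

```python
from collections import deque
from typing import List


def num_islands(grid: List[List[str]]) -> int:
    """Count 4-directionally connected groups of '1' cells."""
    if not grid or not grid[0]:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "1" or seen[r][c]:
                continue
            # New island: flood-fill everything reachable from (r, c).
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == "1" and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count
```

BFS is used here rather than recursive DFS so that a 300×300 all-land grid does not hit Python's recursion limit.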
Common Failure Modes
- A: brute force without memo. TLE on 50×50.
- A: incorrect strict vs non-strict check. > vs >= flips the answer.
- B: built a graph by comparing every pair of words. O(N² · L), too slow for N = 5000.
- B: didn’t notice endWord may not be in dict. Returns wrong if you assume it is.
- B: BFS without visited tracking. Infinite loop.
- C: modified input grid without permission. Some interviewers care; clarify first.
- All: weak follow-up answers. “I’d just use a database” — too vague; doesn’t show understanding.
Passing Bar
- Total score: 46/70 (average 3.3)
- Optimal complexity reached
- Correctness on given examples + 2 edge cases
- Hint usage ≤ 1
- Time ≤ 45 min
- Two follow-ups answered with substance
Follow-up Questions
For A:
- Return the path itself, not just the length. → Add parent pointer in DP; reconstruct.
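- What if the strictly increasing requirement is relaxed (e.g., non-decreasing steps, no cell reuse)? → No longer a DAG, so memoized DFS is unsound; longest simple path is NP-hard in general (Hamiltonian-flavored).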
- Path with diagonal moves allowed? → 8 neighbors instead of 4; same algorithm.
- Matrix is sparse (mostly 0). → Algorithm doesn’t change asymptotically; data layout (CSR) matters at scale.
- Matrix doesn’t fit in memory. → Chunked processing with overlap; harder boundary handling.
For B:
- Return one valid path. → Track BFS parent; reconstruct.
- Return ALL shortest paths (Word Ladder II, LC 126). → BFS to build the DAG, then DFS to enumerate.
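- Bidirectional BFS for speedup. → Search from both ends, meet in middle. With branching factor b and distance d, each side explores about b^(d/2) nodes instead of b^d, roughly a square-root reduction.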
- Streaming dictionary (words arriving). → Re-bucket on each insert; same algorithm.
For C:
- Count distinct island shapes. → Canonicalize the shape (sort relative cell positions, possibly rotate/reflect); hash.
- Number of islands II (online: cells added one by one). → Union-Find with path compression and union by rank; amortized O(α(N)) per operation.
- Largest island after flipping at most one ‘0’ to ‘1’. → Label each island with size; for each ‘0’, sum sizes of unique neighboring islands + 1.
- 3D islands. → Same algorithm, 6 neighbors instead of 4.
Required Tests
- All given examples
- 1×1 matrix / single-letter input
- All-same-value matrix (A: answer is 1)
- Disconnected components (C: multiple islands)
- Long diagonal-like path (A)
- Dictionary missing the endWord (B)
- Empty grid / empty dictionary edge
Required Complexity Explanation
- Time, with reasoning
- Space, including recursion stack and memoization tables
- Worst-case input that triggers the worst-case complexity
- For A: explain why memo turns O(4^(mn)) into O(mn)
Self-Evaluation Template
Mock 05 — Big Tech Phone Screen
Date: _______
Problem: _______
Time: ___ / 45 min
Hints used: ___
Follow-ups answered well (out of 2 asked): ___
Scores (1–5):
___ Total /70
Did I narrate continuously? Y/N
Did I identify the optimization signal before coding? Y/N
Did I test 2+ unprompted edges? Y/N
What I would change for the real interview:
What to Do If You Fail
- Score 40–45: Re-do with a different problem; pinpoint weak dimension.
- Score <40: You’re not ready for FAANG phone screens. Do 10 more mediums + 3 hards, then retry.
- Optimization gap: Phase 2 patterns + Phase 4 graphs are the most-tested patterns at FAANG.
- Follow-up weakness: This is the #1 thing that distinguishes “hire” from “no-hire” at FAANG phone screens. Treat follow-ups as a primary skill, not an afterthought.
- Pass twice in a row before moving to Mock 06.
Mock 06 — Big Tech Onsite
Interview type: FAANG onsite coding round (single round of the 4–5 onsite rounds)
Target role: FAANG SWE-II / Senior
Time limit: 60 minutes total
Format: ~5 min intro + 50 min coding (TWO problems back-to-back) + 5 min Q&A
Hints policy: One hint per problem acceptable; more is below-bar.
Primary goal: Demonstrate sustained performance across two problems without losing tempo.
What This Mock Tests
FAANG onsites run 4–5 of these rounds per day. Each round expects you to solve 1–2 problems in 50 minutes of coding. This mock packs two problems into 60 minutes deliberately — the time pressure is real.
The signal: are you a consistent solver, not a one-hit-wonder? Can you context-switch from problem 1 to problem 2 without resetting?
Scoring weights: all dimensions matter; Time Management is implicit — running out of time on problem 2 is a hard fail signal.
Format
Pick one easy/medium problem (15–20 min) AND one medium/hard problem (30–40 min) from the list below. The interviewer presents one, then immediately the next once you finish. No break.
Problem Set 1 (warm-up: 15–20 min)
A1 — Valid Anagram (LC 242)
Given two strings, determine if one is an anagram of the other.
"anagram", "nagaram" → True
"rat", "car" → False
Constraints: 1 ≤ |s|, |t| ≤ 5×10^4. Lowercase English letters (or follow up: Unicode).
A2 — Climbing Stairs (LC 70)
You can climb 1 or 2 steps at a time. How many distinct ways to reach step n?
n = 2 → 2 (1+1, 2)
n = 3 → 3 (1+1+1, 1+2, 2+1)
Constraints: 1 ≤ n ≤ 45.
A3 — Move Zeroes (LC 283)
Given nums, move all 0s to the end while maintaining the relative order of non-zero elements. In-place.
[0,1,0,3,12] → [1,3,12,0,0]
Problem Set 2 (main: 30–40 min)
B1 — LRU Cache (LC 146)
Design and implement a Least Recently Used cache with get(key) and put(key, value) both in O(1).
LRUCache cache(2)
cache.put(1, 1)
cache.put(2, 2)
cache.get(1) → 1
cache.put(3, 3) // evicts key 2
cache.get(2) → -1
cache.put(4, 4) // evicts key 1
cache.get(1) → -1
cache.get(3) → 3
cache.get(4) → 4
Constraints: 1 ≤ capacity ≤ 3000. At most 10^5 operations.
B2 — Word Break (LC 139)
Given a string s and a dictionary wordDict, return True if s can be segmented into a sequence of dictionary words.
"leetcode", ["leet", "code"] → True
"applepenapple", ["apple","pen"] → True
"catsandog", ["cats","dog","sand","and","cat"] → False
Constraints: 1 ≤ |s| ≤ 300. 1 ≤ |wordDict| ≤ 1000.
B3 — Course Schedule II (LC 210)
Given numCourses and prerequisites (pairs [a, b] meaning b must be taken before a), return an order in which to take all courses, or [] if impossible.
2, [[1,0]] → [0, 1]
4, [[1,0],[2,0],[3,1],[3,2]] → [0, 1, 2, 3] or [0, 2, 1, 3]
2, [[1,0],[0,1]] → []
Constraints: 1 ≤ numCourses ≤ 2000. 0 ≤ |prerequisites| ≤ 5000.
Expected Communication Style
For each problem:
- Restate
- 2–3 clarifying questions (don’t over-ask on the warm-up; do for the main)
- Brute force + complexity
- Optimal approach + complexity
- Code with narration
- Test 2–3 cases
- Move on to the next problem without dragging
Critical: manage time aggressively. If you blow past 20 min on problem 1, stop and move on. Failing on time is worse than producing partial code on both.
Solution Sketches
A1 Anagram: count chars (Counter or array of 26), compare. O(N + M).
A2 Climbing Stairs: Fibonacci. DP or closed form. O(N) or O(1).
A3 Move Zeroes: two-pointer, write-pointer advances on non-zero; pad with zeros. O(N) time, O(1) space.
B1 LRU Cache: doubly linked list + hashmap. List front = most recent, tail = LRU. get → move node to front; put → if exists move to front and update; if not, insert at front; if over capacity, remove tail and delete from hashmap. O(1) per op.
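A from-scratch sketch of the doubly linked list plus hashmap design, for the case where the interviewer does not accept OrderedDict. It assumes capacity ≥ 1 and integer keys and values as in LC 146; the helper names `_Node`, `_unlink`, and `_push_front` are illustrative. Sentinel head and tail nodes remove the empty-list special cases.

```python
from typing import Dict, Optional


class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key: int = 0, value: int = 0) -> None:
        self.key = key
        self.value = value
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class LRUCache:
    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._nodes: Dict[int, _Node] = {}
        self._head = _Node()  # sentinel: head.next is the most recently used
        self._tail = _Node()  # sentinel: tail.prev is the least recently used
        self._head.next = self._tail
        self._tail.prev = self._head

    def _unlink(self, node: _Node) -> None:
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node: _Node) -> None:
        node.prev = self._head
        node.next = self._head.next
        self._head.next.prev = node
        self._head.next = node

    def get(self, key: int) -> int:
        node = self._nodes.get(key)
        if node is None:
            return -1
        self._unlink(node)
        self._push_front(node)  # mark as most recently used
        return node.value

    def put(self, key: int, value: int) -> None:
        node = self._nodes.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self._nodes) >= self._capacity:
            lru = self._tail.prev
            self._unlink(lru)
            del self._nodes[lru.key]  # the node stores its key for this step
        node = _Node(key, value)
        self._nodes[key] = node
        self._push_front(node)
```

Storing the key inside each node is what makes eviction O(1): the tail node tells you which hashmap entry to delete.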
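B2 Word Break: DP. dp[i] = True if s[:i] can be broken. dp[i] = any(dp[j] and s[j:i] in wordSet) for j < i. That is O(N²) set lookups, each costing up to O(N) to slice and hash, so O(N³) in the worst case; restricting j to the last L positions (L = max word length) brings it down to O(N · L²). Watch the off-by-one in dp = [False] * (n + 1); dp[0] = True, a frequent bug.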
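The DP can be sketched as below. The function name `word_break` is illustrative, and bounding the inner loop by the longest dictionary word is an optional refinement added for this example.

```python
from typing import List


def word_break(s: str, word_dict: List[str]) -> bool:
    """True if s can be segmented into a sequence of dictionary words."""
    words = set(word_dict)
    max_len = max((len(w) for w in words), default=0)
    n = len(s)
    dp = [False] * (n + 1)  # dp[i]: s[:i] can be segmented
    dp[0] = True            # the empty prefix is trivially segmentable
    for i in range(1, n + 1):
        # Only split points within max_len of i can end in a dictionary word.
        for j in range(max(0, i - max_len), i):
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break
    return dp[n]
```

The table has n + 1 entries because dp indexes prefix lengths, not character positions; that is the off-by-one the sketch above warns about.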
B3 Course Schedule II: topological sort via Kahn’s algorithm (BFS on indegree). If output length < numCourses, there’s a cycle → return []. O(V + E).
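A minimal sketch of Kahn's algorithm for this problem. The function name `find_order` is illustrative; the pair convention `[a, b]` meaning "b before a" follows the problem statement.

```python
from collections import deque
from typing import List


def find_order(num_courses: int, prerequisites: List[List[int]]) -> List[int]:
    """Return one valid course order, or [] if the prerequisites contain a cycle."""
    adj: List[List[int]] = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        adj[prereq].append(course)  # edge prereq -> course
        indegree[course] += 1

    queue = deque(c for c in range(num_courses) if indegree[c] == 0)
    order: List[int] = []
    while queue:
        course = queue.popleft()
        order.append(course)
        for nxt in adj[course]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Nodes on a cycle never reach indegree 0, so they never enter the order.
    return order if len(order) == num_courses else []
```

The length check at the end doubles as cycle detection, so no separate visited-state bookkeeping is needed.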
Common Failure Modes
- Spent 30 min on the easy. Total fail; you’ll run out for the main problem.
- LRU implemented with built-in OrderedDict (Python) without explaining the underlying data structure. Some interviewers accept this; many do not. Always offer to implement from scratch.
- Word Break: O(2^N) recursion without memoization. TLE on N=100.
- Course Schedule: DFS-based cycle detection but forgot to track three states (unvisited / in-progress / done). Marks node done while in progress → false negatives.
- Trying to start problem 2 fresh without acknowledging the time check. Senior signal: “We have 35 min left and this is the main problem; let me dive in.”
Passing Bar
- Total score: 49/70 (average 3.5)
- BOTH problems implemented correctly
- Optimal complexity on the main problem
- Time managed: problem 1 ≤ 20 min, problem 2 ≤ 40 min
- Hint usage ≤ 1 total
- Tests run on both
Follow-up Questions (asked between or after problems)
For B1 (LRU):
- Make it thread-safe. → Coarse-grained lock; or read-write lock with care; or lock-free with hazard pointers (advanced).
- LFU instead of LRU. → Two-level structure: hashmap to nodes, hashmap to frequency-buckets, each bucket a doubly-linked list.
- Distributed LRU across multiple servers. → Consistent hashing + per-shard LRU.
- Persist to disk. → Write-behind cache; reconstruct on startup.
For B2 (Word Break):
- Word Break II — return all valid sentences. → Backtracking with memoization on the suffix.
- Words can be arbitrarily long, dict has 10^6 words. → Trie for prefix lookup during DP, O(N²) using trie traversal.
- Streaming version: input arrives one char at a time. → Online DP: update dp[i] as i grows; an Aho-Corasick automaton over the dictionary reports every word ending at position i.
For B3 (Course Schedule):
- Find any cycle and return it. → DFS with parent tracking; on back-edge, walk parents.
- Schedule to minimize number of semesters (parallel courses). → Longest path in DAG = answer; O(V + E).
- Add weighted edges (course duration). → Critical path method.
Required Tests
- All given examples (both problems)
- Empty / single-element input for problem 1
- Capacity 1 for LRU
- Single-word dict and s that exactly equals one word for Word Break
- Course graph with cycle for B3
- Self-loop course (numCourses=1, prereq=[[0,0]]) — should return []
Required Complexity Explanation
For both problems, state time and space + identify which one is the bottleneck under the actual constraints (N=300 for Word Break ⇒ O(N²) is fine; N=10^5 ops on LRU ⇒ O(1) per op is mandatory).
Self-Evaluation Template
Mock 06 — Big Tech Onsite
Date: _______
Problem 1: _______ — Time: ___ min, Score: ___/70
Problem 2: _______ — Time: ___ min, Score: ___/70
Total time: ___ / 60 min
Hints used: ___ (across both)
Combined avg score:
Both problems complete? Y/N
Tests run on both? Y/N
What went well:
What went poorly:
Time-management notes:
Action item:
What to Do If You Fail
- Failed problem 2 due to time: Practice problem 1 against a stricter timer (15 min).
- Failed problem 2 due to difficulty: Mock 04 (hard LC) needs more reps.
- Hint-heavy on both: Foundational pattern recognition gap; return to Phase 2.
- LRU implementation issues: Drill data structure design — Phase 1 lab 04 + 05 (linked lists, stacks/queues).
- Pass twice consecutively before Mock 07.
Mock 07 — Senior Engineer
Interview type: Senior SWE coding + design hybrid
Target role: Senior Software Engineer (L5 Google / E5 Meta / SDE3 Amazon)
Time limit: 60 minutes
Format: ONE problem + system extension + explicit tradeoff discussion
Hints policy: Hints on the algorithm lower your score significantly; hints on the extension are acceptable.
Primary goal: Show senior-level reasoning: not just solving, but choosing among solutions and justifying the choice.
What This Mock Tests
At the senior bar, mere correctness is not enough. The interviewer wants to see:
- You consider multiple approaches and articulate why you chose one
- You understand the production implications of your choices
- You can extend the algorithm into a service-like context
- You can answer “what if the input is 1000× larger” with a concrete plan
Scoring weights: Tradeoff Reasoning (#13), Production Awareness (#14), Optimization (#4), Follow-ups (#11) are all critical. A senior who scores 3 on tradeoffs has signaled “mid-level”; needs 4+.
Pick One Problem
Problem A — Design a Rate Limiter
Build a rate limiter supporting allow(user_id, timestamp) → bool. Each user can make at most N requests per W seconds. Discuss the algorithm, then extend to a multi-server / distributed setting.
Initial constraints: in-process, single-thread. N ≤ 1000 reqs/window. W ≤ 60 sec. 1M users.
Problem B — Top K Frequent Elements (LC 347) + Streaming Extension
Phase 1: Given an array nums and integer k, return the k most frequent elements. O(N log K).
[1,1,1,2,2,3], k=2 → [1, 2]
[1], k=1 → [1]
Phase 2 (extension): the input is a never-ending stream; report top-k continuously with bounded memory. Discuss exact vs approximate tradeoffs.
Problem C — Snapshot Array (LC 1146)
Implement SnapshotArray:
- SnapshotArray(length): initialize with length zeros
- set(index, val): set value at index
- snap() → snap_id: take a snapshot, return its id
- get(index, snap_id): value at index at the time of snap_id
Discuss the algorithm; extend to a versioned key-value store with garbage collection of old snapshots.
Expected Communication Style
- Restate, including the implied requirements (“rate limit must be enforced even if the same user hits multiple instances”).
- Ask senior-grade clarifying questions: read-heavy vs write-heavy? Latency targets? Consistency requirements? Failure modes acceptable?
- Propose 2+ approaches with explicit tradeoffs. (“Token bucket vs sliding window log vs sliding window counter — I’d pick X because…”)
- State the algorithm and complexity.
- Code the chosen approach with senior code quality (clear naming, error handling at boundaries, no premature abstraction).
- Discuss the production extension without prompting.
- Anticipate failure modes — what breaks at 10× scale? 100×?
Solution Sketches
A. Rate Limiter:
- Sliding window log: per user, deque of timestamps. On request, drop entries older than W, check length < N, append. Worst case O(N) per request, amortized O(1) because each timestamp is appended and dropped at most once. Memory: O(users × N).
- Token bucket: per user, (tokens, last_refill). On request, refill tokens += (now - last_refill) × rate with rate = N / W, cap at N, decrement if ≥ 1. O(1) per request. Slightly bursty.
- Sliding window counter: approximate; uses 2 buckets (previous + current window), weighted by overlap. O(1), small memory.
Distributed extension: per-user state in Redis with atomic Lua script; or consistent-hash users to dedicated instances; or central counter with relaxed accuracy.
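Of the three single-process algorithms above, the token bucket is the shortest to write down. The sketch below assumes the caller supplies timestamps in seconds, as in allow(user_id, timestamp); the class name `TokenBucketLimiter`, the choice to start each new user with a full bucket, and the clamp against backward timestamps are assumptions made for this example.

```python
from typing import Dict, Tuple


class TokenBucketLimiter:
    """Allow roughly max_requests per window_seconds per user, O(1) per call."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self._capacity = float(max_requests)
        self._rate = max_requests / window_seconds  # tokens refilled per second
        # user_id -> (tokens, last_refill_timestamp)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str, timestamp: float) -> bool:
        # Assumption: a user seen for the first time starts with a full bucket.
        tokens, last = self._state.get(user_id, (self._capacity, timestamp))
        elapsed = max(0.0, timestamp - last)  # ignore timestamps that go backward
        tokens = min(self._capacity, tokens + elapsed * self._rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._state[user_id] = (tokens, max(last, timestamp))
        return allowed
```

Memory is two floats per user regardless of N, which is the main argument for this variant over the sliding window log at 1M users; the tradeoff is that a full bucket permits a burst of N requests at once.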
B. Top K Frequent:
- Static: Counter + min-heap of size K. O(N log K).
- Streaming exact: impossible with bounded memory in general (any element could become top-K later).
- Streaming approximate: Count-Min Sketch + heap of candidates; or Misra-Gries / SpaceSaving algorithm for ε-approximate. O(1/ε) memory.
C. Snapshot Array:
- Naïve: copy entire array per snapshot. O(length) per snap, O(snaps × length) memory.
- Better: per-index, store list of (snap_id, value) pairs sorted by snap_id. Lookup with binary search. O(log S) per get, O(1) per set, O(total writes) memory.
Versioned KV store extension: persistent data structures (Clojure-style); or copy-on-write trees; or LSM-tree with snapshot isolation.
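The per-index history approach for Snapshot Array can be sketched as follows. Keeping snap ids and values in two parallel lists (so `bisect` can search the ids directly) is an implementation choice for this example, as is seeding every index with an initial (0, 0) entry, which adds O(length) memory on top of O(total writes).

```python
from bisect import bisect_right
from typing import List


class SnapshotArray:
    def __init__(self, length: int) -> None:
        self._snap_id = 0
        # Per index: snap ids at which the value changed, and the values themselves.
        self._ids: List[List[int]] = [[0] for _ in range(length)]
        self._vals: List[List[int]] = [[0] for _ in range(length)]

    def set(self, index: int, val: int) -> None:
        ids, vals = self._ids[index], self._vals[index]
        if ids[-1] == self._snap_id:
            vals[-1] = val  # overwrite within the current, not-yet-snapped version
        else:
            ids.append(self._snap_id)
            vals.append(val)

    def snap(self) -> int:
        self._snap_id += 1
        return self._snap_id - 1

    def get(self, index: int, snap_id: int) -> int:
        # Last recorded write whose snap id is <= the requested snap id.
        pos = bisect_right(self._ids[index], snap_id) - 1
        return self._vals[index][pos]
```

Overwriting in place when the last entry belongs to the current version is what keeps heavy churn on one index from growing its history between snapshots.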
Common Failure Modes
- Implemented the first algorithm that came to mind without discussing alternatives. This is the #1 senior-bar failure.
- Said “I’d just use Redis” without explaining the algorithm Redis would implement. The interviewer wants the algorithm; the database is a deployment choice.
- Top-K streaming: claimed exact algorithm with bounded memory. Impossible in general; signals theoretical weakness.
- Snapshot Array: copied the array per snapshot. Acceptable as brute force; bad as final answer for a senior.
- No tests beyond the given examples.
- Skipped the production extension entirely.
Passing Bar
- Total score: 53/70 (average 3.8)
- Tradeoff Reasoning #13: ≥ 4
- Production Awareness #14: ≥ 4
- Optimal or near-optimal algorithm
- Extension discussed substantively (not just “I’d shard it”)
- Correct, readable code
Follow-up Questions
For A (Rate Limiter):
- Latency budget: < 1ms p99. → In-memory store; Redis is borderline (network RTT). Local cache with eventual sync.
- Multi-region with strict global limit. → Hard; usually relaxed to per-region limit + occasional reconciliation.
- What if Redis goes down? → Fail-open (allow) vs fail-closed (deny); usually fail-open for rate limiters.
- Hot user (one user makes 90% of requests). → Dedicated shard; or local fast path before checking shared state.
For B (Top K):
- Approximate with ε = 0.01. → Count-Min Sketch sized accordingly.
- Top K most-improved over the last hour. → Two-window comparison; bigger memory.
- Trending detection (top K with sudden growth). → Slope/derivative-based; needs time-windowed counts.
- What if K = 1M? → A heap of K entries may exceed the memory budget; external merge or sampling.
For C (Snapshot Array):
- Snapshot every write (versioned KV). → Same structure; consider compaction.
- Memory pressure: drop snapshots older than 1 hour. → Per-index list pruning; tombstones for fully-deleted snaps.
- Snapshot isolation in a multi-writer setting. → Multi-version concurrency control; per-transaction snapshot id.
- Persist snapshots to disk. → Log-structured store; periodically checkpoint.
Required Tests
- Given examples
- Empty / boundary input
- Heavy churn (many writes to same index for C)
- Single user / single key
- Burst of requests at the window boundary (A)
- K = N for B (no filtering)
Required Complexity + Production Discussion
Cover:
- Time per operation, space per user/element
- Latency under typical load vs worst case
- Memory growth and GC implications
- Failure semantics (what happens on partial failure)
- Monitoring metrics you’d add (rate limit reject rate, top-K convergence time, snapshot lookup p99)
Self-Evaluation Template
Mock 07 — Senior Engineer
Date: _______
Problem: _______
Time: ___ / 60 min
Scores (1–5):
___ Total /70
Tradeoff Reasoning (#13): ___
Production Awareness (#14): ___
Did I propose 2+ approaches before coding? Y/N
Did I anticipate scale-up failure modes? Y/N
Action item:
What to Do If You Fail
- Tradeoff or Production score below 4: Read Phase 8 (practical engineering) deeply; rebuild a small system (rate limiter, cache).
- Algorithm score below 3: You haven’t earned the right to do senior interviews yet; back to Mock 04–06.
- Code quality issues: Read CODE_QUALITY.md.
- Pass twice consecutively before Mock 08.
Mock 08 — Staff Practical
Interview type: Staff/Principal engineer practical coding round
Target role: Staff SWE (L6 Google / E6 Meta / Principal Amazon)
Time limit: 75 minutes
Format: Build a working component (not a LeetCode puzzle) with multiple interacting pieces
Hints policy: Hints affect score but rarely fail you outright at staff bar; the bar is judgment, not raw problem-solving.
Primary goal: Demonstrate the ability to build a real thing under time pressure, with monitoring/failure-mode awareness baked in.
What This Mock Tests
Staff interviews shift away from “can you solve this puzzle” toward “can you build something we’d ship.” You’re given a problem statement that resembles a small feature spec. Your job:
- Decompose the problem into modules with clean interfaces
- Choose data structures that match real production constraints
- Implement the core fully + skeletons for the rest
- Discuss monitoring, deployment, failure recovery, evolution
- Justify every choice you make against alternatives
Scoring weights: Code Quality (#7), Tradeoff Reasoning (#13), Production Awareness (#14) are paramount. Pure algorithmic Optimization is less critical — staff problems rarely have a “trick.”
Pick One Build
Build A — In-Memory Rate Limiter Library
Build a usable Python/Java/Go module that provides:
limiter = RateLimiter(max_requests=100, window_seconds=60)
allowed = limiter.allow(user_id="alice") # bool, ~10µs p99
Required:
- Multiple algorithms behind a uniform interface (token bucket + sliding window log + sliding window counter)
- Configurable per-user vs global limits
- Thread-safe
- A stats() method returning rejection rate per user
- A purge() method to evict idle users from memory
- Tests covering correctness, thread safety, and the window-edge case (burst at exactly t=W)
Build B — Bounded LRU Cache with TTL and Stats
Build an LRU cache that also supports per-entry TTL:
cache = LRUCache(capacity=10000, default_ttl_seconds=300)
cache.put(key, value, ttl=None) # uses default
val = cache.get(key) # returns None if missing/expired
cache.delete(key)
cache.stats() # hit rate, eviction rate, expiration rate
Required:
- O(1) get/put
- Lazy + active TTL expiration
- Thread-safe
- Memory cap (eviction policy: LRU among non-expired)
- Tests: correctness, expiration races, concurrent put/get, capacity overflow
Build C — Job Scheduler (cron-like)
Build a scheduler that runs jobs at specified intervals:
scheduler = Scheduler()
scheduler.add(name="cleanup", interval_sec=300, fn=cleanup_fn)
scheduler.add(name="report", cron="0 9 * * MON", fn=report_fn) # nice-to-have
scheduler.start()
scheduler.stop()
scheduler.status() # last run, next run, last error per job
Required:
- Multiple jobs running independently
- Graceful shutdown (don’t kill mid-job)
- Per-job error isolation (one job’s failure doesn’t crash the scheduler)
- Catch-up policy on missed runs (skip vs catch-up; configurable)
- Tests: timing, overlap, panic in job
Expected Communication Style
- Restate with assumptions stated upfront: “I’ll assume single-process, multi-threaded, in-memory; if you want me to extend to distributed, that’s a separate discussion.”
- Propose the module decomposition before writing any code. Whiteboard the public interface, the internal modules, the data flow.
- Identify the 2–3 critical design decisions and discuss alternatives.
- Pick one and code it; favor depth over breadth. Skeleton/stub the rest with comments like # TODO: implement token bucket variant with same interface.
- Discuss monitoring without prompting: which metrics, why, what alerts.
- Discuss failure modes without prompting: thread starvation, memory blowup, race conditions, partial failures.
- Test the critical path. Production-style tests, not just smoke.
Solution Sketches
A. Rate Limiter:
from abc import ABC
from collections import Counter, defaultdict, deque
import threading
import time

class RateLimiter(ABC):
    def allow(self, key: str) -> bool: ...
    def stats(self) -> dict: ...
    def purge(self, idle_seconds: int): ...

class SlidingWindowLogLimiter(RateLimiter):
    def __init__(self, max_requests, window_seconds):
        self._max = max_requests
        self._window = window_seconds
        self._logs = defaultdict(deque)  # key → deque of timestamps
        self._lock = threading.Lock()
        self._rejects = Counter()
        self._accepts = Counter()

    def allow(self, key):
        now = time.monotonic()
        with self._lock:
            log = self._logs[key]
            # Drop timestamps that have aged out of the window.
            while log and log[0] <= now - self._window:
                log.popleft()
            if len(log) < self._max:
                log.append(now)
                self._accepts[key] += 1
                return True
            self._rejects[key] += 1
            return False
Plus token bucket and sliding-window-counter implementations behind the same interface.
B. LRU + TTL: doubly linked list + hashmap; each node stores (key, value, expires_at, prev, next). get: check expiry, evict if expired, return; else move to MRU. put: insert/update, evict LRU if over capacity. Background thread (or lazy on every get/put) sweeps expired entries.
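A compact sketch of the lazy-expiration path, using OrderedDict as the linked-list-plus-hashmap structure. Several things here are simplifications or assumptions for illustration: the class name `TTLLRUCache`, the injectable `clock` parameter (added so tests can control time), plain integer stat counters, eviction of the LRU entry without first preferring expired ones, and the absence of the active background sweep.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    def __init__(self, capacity: int, default_ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._default_ttl = default_ttl_seconds
        self._clock = clock  # assumption: injectable for deterministic tests
        self._entries = OrderedDict()  # key -> (value, expires_at); last = MRU
        self._lock = threading.Lock()
        self.hits = self.misses = self.evictions = self.expirations = 0

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                self.misses += 1
                return None
            value, expires_at = entry
            if self._clock() >= expires_at:
                # Lazy expiration: discovered on access, removed immediately.
                del self._entries[key]
                self.expirations += 1
                self.misses += 1
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return value

    def put(self, key: Hashable, value: Any, ttl: Optional[float] = None) -> None:
        ttl = self._default_ttl if ttl is None else ttl
        with self._lock:
            if key in self._entries:
                del self._entries[key]
            elif len(self._entries) >= self._capacity:
                self._entries.popitem(last=False)  # evict least recently used
                self.evictions += 1
            self._entries[key] = (value, self._clock() + ttl)
```

In the interview, this is the "core implemented fully" part; the bounded active sweep and the expired-first eviction policy are natural candidates for documented stubs.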
C. Job Scheduler: thread pool + priority queue of (next_run_time, job). Main loop: peek queue, sleep until next, run job in pool, re-schedule. Catch exceptions per job; record to status. Graceful shutdown: stop accepting new runs, await running ones with timeout.
Common Failure Modes
- Built the algorithm without the interface. Staff interviews care about the API as much as the implementation.
- No thread safety. Mentioned in the spec; missed → fail.
- No mention of monitoring/observability. Critical staff signal.
- Used global state. Hard to test, hard to reason about.
- Coded all three rate limiter algorithms in 75 min instead of one well + sketches. Depth > breadth.
- TTL implementation does periodic full scan. O(N) sweep per second isn’t acceptable; lazy + bounded active sweep is.
- Scheduler: jobs share state and races corrupt it. Job functions need to be treated as untrusted code.
Passing Bar
- Total score: 56/70 (average 4.0)
- Code Quality #7 ≥ 4
- Tradeoff Reasoning #13 ≥ 4
- Production Awareness #14 ≥ 4
- Working core; documented stubs for the rest
- At least 3 tests covering: correctness, concurrency, edge timing
- Monitoring + failure modes discussed substantively
Follow-up Questions
For A:
- Make it distributed. → Redis with Lua atomic ops; or per-shard local limiter + global reconciliation.
- Hot-user problem. → Sharded sub-limiters per user; or local L1 cache.
- Add quota burst (allow 2× for 5 sec then throttle). → Token bucket with two-tier refill.
For B:
- What’s the GC pressure under high churn? → Allocation per put is the cost; pool nodes if hot path.
- Persist across restart. → Periodic snapshot to disk; replay log on startup.
- Add a probabilistic admission filter (TinyLFU). → Prevent cache pollution from one-hit-wonders.
For C:
- Distribute across N workers. → Leader-elected scheduler that dispatches jobs; or per-shard schedulers.
- Persistent jobs (survive restart). → Persist queue to durable storage.
- Jobs with dependencies. → DAG scheduler; topological execution.
- Job retries with exponential backoff. → Per-job retry state machine.
Required Tests
- Correctness on the basic case
- Thread safety (concurrent calls; assert no double-count, no race-induced overflow)
- Timing edges (window boundary, expiration boundary, scheduling drift)
- Failure: what happens if the underlying clock jumps backward?
- Resource cleanup: after purge / shutdown, no leaked threads or memory
Required Discussion (production)
Cover, at minimum:
- Metrics you’d export (Prometheus-style)
- Alert thresholds you’d set
- Memory cap behavior
- Failure modes and recovery
- Deployment story (config, rollout, rollback)
- Evolution: how would you add a new rate-limiting algorithm? (Should be drop-in.)
Self-Evaluation Template
Mock 08 — Staff Practical
Date: _______
Build: _______
Time: ___ / 75 min
Scores (1–5):
___ Total /70
Critical dimensions:
Code Quality (#7): ___
Tradeoff (#13): ___
Production (#14): ___
Interface designed before coding? Y/N
Monitoring discussed unprompted? Y/N
Failure modes discussed unprompted? Y/N
Thread safety verified by test? Y/N
What I left unfinished (and what I'd do with another hour):
Action item:
What to Do If You Fail
- Production score below 4: Build a real version of one of these systems and run it for a week with metrics. Phase 8 has more.
- Code quality below 4: Have a senior do a written code review of your submission; act on it.
- Tradeoff below 4: For every decision, force yourself to write down at least 2 alternatives and a rejection reason.
- Pass twice consecutively before Mock 09.
Mock 09 — Runtime / Language Deep Dive
Interview type: Mid-coding language/runtime probe (Bloomberg, Stripe, hedge funds, infra-heavy teams)
Target role: Senior / Staff backend or systems
Time limit: 45 minutes
Format: ONE medium problem, interrupted by language/runtime probes during/after coding
Hints policy: Hints on the probes are -1 each; on the algorithm, standard.
Primary goal: Demonstrate that you understand your language at a level deeper than syntax.
What This Mock Tests
Some companies will deliberately interrupt your coding with “what does this line cost?” or “what happens if two threads call this concurrently?” or “where does this object get allocated?” The signal: senior engineers don’t just write code; they know what the runtime does with it.
Scoring weights: Language/Runtime (#12) is doubled. Other dimensions normal.
Pick a language you claim to know well. The probes are language-specific.
Pick One Problem (any language)
Problem A — Implement a Concurrent Counter
Build a thread-safe counter with incr(), decr(), read(). Discuss the tradeoffs of locking vs atomic vs sharded.
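One point on the locking-vs-sharded spectrum can be sketched as below: a counter striped across several locks, with writers choosing a stripe by thread id. The class name `ShardedCounter` and the default of 8 shards are illustrative choices. Note the tradeoff it makes: `read()` sums the stripes one at a time, so under concurrent writes it is not an atomic snapshot.

```python
import threading


class ShardedCounter:
    """Counter striped over several locks to reduce writer contention."""

    def __init__(self, shards: int = 8) -> None:
        self._locks = [threading.Lock() for _ in range(shards)]
        self._counts = [0] * shards

    def _add(self, delta: int) -> None:
        # Threads map to stripes by id, so unrelated writers rarely share a lock.
        i = threading.get_ident() % len(self._locks)
        with self._locks[i]:
            self._counts[i] += delta

    def incr(self) -> None:
        self._add(1)

    def decr(self) -> None:
        self._add(-1)

    def read(self) -> int:
        # Sums stripe by stripe; exact once writers are quiescent,
        # approximate while they are still running.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._counts[i]
        return total
```

In CPython the GIL limits how much striping helps for pure-Python work, which is itself a good talking point for the probes in this mock; the structure carries over directly to runtimes with true parallelism.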
Problem B — Producer-Consumer Queue
Implement a bounded blocking queue: put(item) blocks if full; get() blocks if empty.
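A minimal sketch of the bounded blocking queue using two condition variables that share one lock. The class name `BoundedBlockingQueue` is illustrative; timeouts and shutdown signalling are left out to keep the core visible.

```python
import threading
from collections import deque
from typing import Any, Deque


class BoundedBlockingQueue:
    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items: Deque[Any] = deque()
        lock = threading.Lock()
        # Both conditions guard the same state, so they share one lock.
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, item: Any) -> None:
        with self._not_full:
            # Loop, not 'if': re-check the predicate after every wakeup.
            while len(self._items) >= self._capacity:
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()

    def get(self) -> Any:
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item
```

Separate conditions for "not full" and "not empty" mean a producer's notify only wakes a consumer and vice versa, which avoids the wasted wakeups of a single shared condition.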
Problem C — Implement flatten(nested_list) Lazily
Given an arbitrarily nested list (e.g., [1, [2, [3, 4]], 5, [[6]]]), return an iterator yielding flat elements lazily (constant memory).
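A generator-based sketch of lazy flattening. It keeps an explicit stack of iterators instead of recursing, so extra memory is proportional to the nesting depth rather than to the number of elements (strictly O(depth), not O(1)); treating only `list` instances as nested containers is an assumption for this example.

```python
from typing import Any, Iterable, Iterator, List


def flatten(nested: Iterable[Any]) -> Iterator[Any]:
    """Yield the leaves of an arbitrarily nested list, left to right, lazily."""
    stack: List[Iterator[Any]] = [iter(nested)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()  # this level is exhausted; resume the parent
            continue
        if isinstance(item, list):
            stack.append(iter(item))  # descend without copying the sublist
        else:
            yield item
```

Usage: `list(flatten([1, [2, [3, 4]], 5, [[6]]]))` produces the elements in order, and calling `next()` on the generator pulls only as much of the structure as needed.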
Probes by Language (interviewer fires these mid-coding)
Python
- “What does list.append(x) cost? When does it resize?”
- “How does Python implement dict? What’s the lookup cost in the worst case?”
- “What happens to your concurrent counter under the GIL? Is += atomic?”
- “What’s reference counting? When does the cyclic GC run?”
- “What does with lock: desugar to?”
- “Why is multiprocessing different from threading? When would you use which?”
- “Difference between asyncio.sleep(0) and time.sleep(0)?”
- “What’s __slots__ and when does it matter?”
- “Generators vs iterators vs async iterators: implement flatten as each.”
- “Where does the GIL hurt your code most?”
Java
- “What’s the difference between volatile int and AtomicInteger?”
- “Explain the Java Memory Model: the happens-before relationship.”
- “When does synchronized use biased locking / thin lock / fat lock?”
- “What does String s = a + b; compile to?”
- “How does HashMap resize? What’s the cost?”
- “Difference between G1, ZGC, Shenandoah?”
- “What’s a MethodHandle?”
- “When does escape analysis kick in?”
- “What’s Unsafe, why does it exist?”
- “Implement the queue using ReentrantLock vs synchronized. What differs?”
Go
- “What does a goroutine cost (stack, scheduler)?”
- “Explain GMP — goroutines, M (OS thread), P (processor).”
- “How does select work under the hood?”
- “What’s escape analysis? Show me an example of stack vs heap allocation.”
- “Difference between sync.Mutex and sync.RWMutex?”
- “What’s the cost of channels? When to prefer mutex?”
- “Explain Go’s GC: what generation? What pause time?”
- “What does defer cost?”
- “Implement the bounded queue using channels vs using a mutex. Which and why?”
C++
- “Explain RAII.”
- “What’s the difference between `std::atomic<int>` and a `std::mutex`-protected `int`?”
- “Memory orders `relaxed`, `acquire`, `release`, `acq_rel`, `seq_cst`: when to use which?”
- “What does `std::vector::push_back` cost? Amortized vs worst-case?”
- “Move semantics: when is the move constructor called?”
- “Difference between `std::shared_ptr` and `std::unique_ptr`?”
- “What’s `std::launder`?”
- “Implement the bounded queue using `std::condition_variable`.”
- “What’s the cost of a virtual call?”
Rust
- “Borrow checker rules — one mutable XOR many immutable references.”
- “When do you need `Arc<Mutex<T>>` vs `Rc<RefCell<T>>`?”
- “Explain the `Send` and `Sync` traits.”
- “What does `async fn` desugar to?”
- “Difference between `tokio::spawn` and `tokio::task::spawn_blocking`?”
- “What’s a `Pin<T>`? Why does it exist?”
- “When does lifetime elision apply?”
- “Implement the bounded queue using `tokio::sync::mpsc`.”
Node.js / JavaScript
- “Explain the event loop — phases, microtasks vs macrotasks.”
- “What’s the difference between `process.nextTick` and `setImmediate`?”
- “When is a `Promise` resolved synchronously vs asynchronously?”
- “What’s V8’s hidden class / inline cache?”
- “Why does `obj.x = 1` after `obj.y = 2` behave differently than the reverse order in terms of perf?”
- “What’s a `WeakRef`? When is the value collected?”
- “Implement the queue using `async/await`.”
Expected Communication Style
- Restate problem.
- Ask clarifying questions (including language-specific ones, e.g., “should `read()` return a snapshot or be consistent with concurrent updates?”).
- Code the algorithm.
- Engage with probes as they come. Don’t say “I’ll come back to that.” Pause coding, answer, resume.
- After coding, walk through tests.
- End with a senior-level reflection on what could change with different runtime characteristics (“if we moved this to Go, the channel-based design would replace the lock-based one”).
Common Failure Modes
- Memorized the algorithm but didn’t know how the language implements its data structures.
- “I don’t know” to a basic probe. A senior should know how the language’s main collections perform.
- Implemented the concurrent counter with `int` and `+=` in Python, thinking the GIL makes it safe. The GIL makes individual bytecode instructions atomic, but `x += 1` is a read-modify-write that spans several bytecodes. Guard it with `threading.Lock`; Python’s standard library has no `Atomic*` types.
- In Java, used `++` on a `volatile int` thinking it’s atomic. It’s not: `volatile` ensures visibility but not atomicity.
- In Go, used a channel for a single shared counter. Slower than `sync/atomic.AddInt64`; mismatched tool.
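To make the Python failure mode above concrete, here is a minimal sketch of a lock-protected counter with a small hammer test. The class name `LockedCounter` and the helper `hammer` are illustrative; only `incr()`, `decr()`, and `read()` come from Problem A.

```python
import threading


class LockedCounter:
    """Thread-safe counter: every read-modify-write happens under one lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def incr(self) -> None:
        with self._lock:
            self._value += 1

    def decr(self) -> None:
        with self._lock:
            self._value -= 1

    def read(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: LockedCounter, n: int) -> None:
    for _ in range(n):
        counter.incr()


counter = LockedCounter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.read())  # 8 threads x 10_000 increments
```

A single lock is the simplest correct choice. A sharded counter (one slot per thread, summed on read) trades read cost for less write contention, which is the tradeoff Problem A asks you to discuss.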
Passing Bar
- Total score: 53/70 (average 3.8)
- Language/Runtime (#12): ≥ 4 (mandatory)
- Algorithm: correct and at expected complexity
- Answered ≥ 4 of 5–6 probes substantively
- Code idiomatic for the language
Follow-up Questions (post-coding)
- “Now port your solution to [other language]. What changes?”
- “Profile this in production — what tool, what metrics?”
- “Where would you put the metric instrumentation?”
- “If this is the hot path of a service handling 1M qps, what’s the bottleneck?”
Required Tests
- Algorithm correctness (basic)
- Concurrent stress test (≥ 4 threads/goroutines, hammering)
- Boundary timing (empty queue blocks; full queue blocks; producer wakes consumer)
- Resource cleanup / shutdown semantics
Required Runtime Discussion
State, for your solution:
- What allocations occur per operation (heap vs stack)
- Lock granularity and contention behavior
- GC implications (Python ref count, Java GC pause, Go STW, etc.)
- What the worst-case latency is and why
Self-Evaluation Template
Mock 09 — Runtime / Language
Date: _______
Language: _______
Problem: _______
Time: ___ / 45 min
Scores (1–5):
___ Total /70
Language/Runtime (#12): ___ (need ≥ 4)
Probes asked: ___
Probes answered well: ___
Probes I bombed (write each verbatim, drill before next mock):
1.
2.
3.
Action item:
What to Do If You Fail
- Probe score below 4: Spend the next week with Phase 9 — your language directory. Read the language’s runtime/perf docs cover to cover.
- Couldn’t answer 50% of probes: You don’t know the language as well as you claim. Pick a different language to claim, OR invest 2+ weeks.
- Pass twice consecutively before Mock 10.
Mock 10 — Infrastructure / Backend
- Interview type: Backend / Platform / Infrastructure deep-dive coding
- Target role: Senior / Staff backend, distributed systems, database, storage
- Time limit: 75 minutes
- Format: Build a non-trivial backend component (KV store, log-structured index, sharded cache)
- Hints policy: Acceptable on the algorithm; failures on storage/concurrency fundamentals are red flags.
- Primary goal: Demonstrate that you can build the building blocks of real backend systems, not just consume them.
What This Mock Tests
Companies like Stripe, Snowflake, Databricks, Confluent, Cockroach Labs, and infra teams at FAANG ask coding questions that resemble small slices of their actual products. You’re expected to:
- Understand storage primitives (logs, indexes, B-trees, LSM)
- Reason about durability, ordering, concurrency
- Write code that could be a starting point for a production component
- Discuss the gap between what you built and what production would need
Scoring weights: Production Awareness (#14), Code Quality (#7), Correctness (#5), Tradeoff Reasoning (#13) are all critical.
Pick One Build
Build A — In-Memory KV Store with Snapshot Isolation
```python
kv = KVStore()
tx = kv.begin()   # returns a transaction
tx.put("k", "v")
tx.get("k")       # returns "v" (read your writes)
tx2 = kv.begin()
tx2.get("k")      # returns None (snapshot isolation; tx not committed yet)
tx.commit()
tx2.get("k")      # still None (tx2's snapshot was taken before the commit)
```
Required:
- MVCC (multi-version concurrency control)
- Snapshot reads return a consistent view
- Concurrent writers
- Tests for read-your-writes, isolation between tx, serialization-style conflict detection (optional)
Build B — Log-Structured Index (Mini-LSM)
```python
db = LSMTree(memtable_threshold=1000)
db.put("k1", "v1")
db.put("k2", "v2")
db.get("k1")           # "v1"
db.delete("k1")
db.get("k1")           # None (tombstone)
# After threshold writes, memtable flushes to immutable SSTable
db.range("k1", "k9")   # iterator over keys
```
Required:
- In-memory `memtable` (sorted, e.g., SortedList or skiplist)
- Flush to immutable SSTable when memtable exceeds threshold
- `get` checks memtable, then SSTables in reverse-time order
- Tombstones for deletes
- Range scan that merges across all levels
- Tests covering: writes, reads-after-flush, range correctness, deletes
Build C — Consistent Hash Ring with Replication
```python
ring = HashRing(replication_factor=3, virtual_nodes=128)
ring.add_node("node-A")
ring.add_node("node-B")
ring.add_node("node-C")
nodes = ring.get("user-123")        # 3 nodes responsible
ring.remove_node("node-A")          # 1/3 of keys reassign
nodes_after = ring.get("user-123")
```
Required:
- Virtual nodes for load balancing
- Replication factor enforced
- Adding/removing a node moves only its share of keys
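- A test that verifies only about 1/(N+1) of keys move when a node is added to an N-node ring with sufficient virtual nodes (so a < 5% threshold only makes sense once the ring has roughly 20+ nodes)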
Expected Communication Style
- Restate with assumptions stated upfront.
- Decompose before coding: which modules, what interfaces, what’s the data flow.
- Discuss the storage model:
- For A: how to represent versions per key
- For B: how to lay out SSTables; in this exercise, in-memory simulated
- For C: ring representation; virtual nodes; lookup data structure
- Identify the concurrency model and discuss what fails without it.
- Code the core end-to-end before tackling optimization.
- Discuss production gaps: persistence (we’re in-memory), replication consistency (we’re local), recovery (no WAL), monitoring (none).
- Test the invariants, not just the happy path.
Solution Sketches
A. KV with MVCC:
```python
import itertools
import threading


class KVStore:
    def __init__(self):
        self._data = {}  # key -> list of (commit_version, value), oldest first
        self._next_version = itertools.count()
        self._lock = threading.Lock()

    def begin(self):
        with self._lock:
            v = next(self._next_version)
        return Transaction(self, v)


class Transaction:
    def __init__(self, store, version):
        self._store = store
        self._snapshot_version = version
        self._writes = {}  # local buffer; invisible to others until commit
        self._committed = False

    def get(self, key):
        if key in self._writes:  # read your own writes
            return self._writes[key]
        with self._store._lock:  # copy so a concurrent commit cannot interleave
            versions = list(self._store._data.get(key, []))
        # Newest version committed at or before this snapshot wins.
        for v, val in reversed(versions):
            if v <= self._snapshot_version:
                return val
        return None

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        if self._committed:
            raise RuntimeError("transaction already committed")
        with self._store._lock:
            commit_v = next(self._store._next_version)
            for k, v in self._writes.items():
                self._store._data.setdefault(k, []).append((commit_v, v))
        self._committed = True
```
B. LSM Tree: SortedList for memtable; on flush, freeze into immutable sorted list (the “SSTable”). get walks memtable + SSTables newest-first. Range scan: heap-merge iterators over all levels. Tombstones represented as special sentinel.
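The LSM description above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not a reference implementation: the class name `MiniLSM` is hypothetical, the memtable is a plain `dict` that is sorted once at flush time (instead of a SortedList, to stay dependency-free), the “SSTables” are sorted Python lists, and `range` treats both bounds as inclusive.

```python
import heapq
from bisect import bisect_left

_TOMBSTONE = object()  # sentinel marking a deleted key


class MiniLSM:
    def __init__(self, memtable_threshold=4):
        self._threshold = memtable_threshold
        self._memtable = {}   # key -> value or _TOMBSTONE
        self._sstables = []   # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self._memtable[key] = value
        if len(self._memtable) >= self._threshold:
            # Sort once, at flush time; the flushed table is never mutated again.
            self._sstables.append(sorted(self._memtable.items()))
            self._memtable = {}

    def delete(self, key):
        self.put(key, _TOMBSTONE)

    def get(self, key):
        if key in self._memtable:
            value = self._memtable[key]
            return None if value is _TOMBSTONE else value
        for table in reversed(self._sstables):  # newest first
            i = bisect_left(table, (key,))      # (key,) sorts before (key, value)
            if i < len(table) and table[i][0] == key:
                value = table[i][1]
                return None if value is _TOMBSTONE else value
        return None

    def range(self, lo, hi):
        """Yield (key, value) for lo <= key <= hi; newest version wins."""
        levels = [sorted(self._memtable.items())] + self._sstables[::-1]
        # Tag entries with the level's age so equal keys sort newest-first.
        streams = [
            [(k, age, v) for k, v in level if lo <= k <= hi]
            for age, level in enumerate(levels)
        ]
        last = object()
        for k, _, v in heapq.merge(*streams, key=lambda e: (e[0], e[1])):
            if k == last:
                continue  # an older version of a key already handled
            last = k
            if v is not _TOMBSTONE:
                yield k, v


db = MiniLSM(memtable_threshold=2)
db.put("k1", "v1")
db.put("k2", "v2")   # second write triggers a flush
db.delete("k1")
db.put("k3", "v3")   # triggers another flush; the tombstone now lives in an SSTable
print(db.get("k1"), list(db.range("k1", "k9")))
```

The tombstone has to survive the flush: the older SSTable still holds `("k1", "v1")`, and only the newer tombstone entry hides it.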
C. Consistent Hash: sorted list of (hash(virtual_node_id), node) pairs. Lookup: hash key, binary search for next pair, walk forward to collect R distinct nodes. Add/remove: insert/delete the virtual node entries.
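A minimal sketch of the ring described above, assuming SHA-256 as the hash function and the virtual-node id format `"{node}#{i}"` (both are illustrative choices, not requirements of the exercise). The method names follow the Build C usage example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable across processes, unlike the built-in hash() for str.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, replication_factor=3, virtual_nodes=128):
        self._rf = replication_factor
        self._vnodes = virtual_nodes
        self._ring = []       # sorted list of (hash, node)
        self._nodes = set()

    def add_node(self, node):
        if node in self._nodes:
            return
        self._nodes.add(node)
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._nodes.discard(node)
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get(self, key):
        """Return up to replication_factor distinct nodes, walking clockwise."""
        if not self._ring:
            return []
        wanted = min(self._rf, len(self._nodes))
        start = bisect.bisect_left(self._ring, (_hash(key),))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == wanted:
                    break
        return owners


ring = HashRing(replication_factor=2, virtual_nodes=64)
for name in ("node-A", "node-B", "node-C"):
    ring.add_node(name)
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.get(k)[0] for k in keys}
ring.remove_node("node-A")
after = {k: ring.get(k)[0] for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} primaries moved")
```

Only keys whose primary was the removed node change owner; every other key keeps its primary, which is the property the “moves only its share of keys” requirement asks you to test.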
Common Failure Modes
- A: returned all versions instead of the snapshot-visible one. Snapshot isolation not implemented; just MVCC storage.
- A: forgot to use a local write buffer. Transactions are visible to others before commit.
- B: re-sorting on every read. Should sort on flush; reads are merge.
- B: no tombstone semantics. Deleted key still appears in older SSTable.
- C: hash ring without virtual nodes. Load imbalance — one node gets 60% of keys.
- C: re-hashing entire keyspace on node change. Defeats the purpose of consistent hashing.
- All: no concurrency testing. Build passes single-thread; explodes under load.
Passing Bar
- Total score: 56/70 (average 4.0)
- Working core implementation
- Concurrency correct (or explicit single-threaded contract with rationale)
- Production gaps discussed substantively
- At least 4 tests covering invariants (not just smoke tests)
- Code quality: production-quality
Follow-up Questions
For A:
- Add serializable isolation. → Validation phase: on commit, abort if any key read had a newer version. (Optimistic concurrency.)
- Garbage-collect old versions. → Track oldest active snapshot; vacuum versions older than that.
- Persist to disk. → Write-ahead log per transaction; redo on recovery.
- Distributed version: two-phase commit + Paxos for log replication.
For B:
- Add bloom filters per SSTable. → Skip SSTable scan if key definitely not present.
- Compaction strategy: leveled vs size-tiered. → Tradeoffs in write amplification.
- Crash recovery. → WAL replay before opening memtable.
- Range scan optimization with min/max key per SSTable.
For C:
- Heterogeneous nodes (different capacity). → Virtual nodes proportional to capacity.
- Read consistency across replicas. → Quorum reads (R + W > N).
- Hinted handoff when a replica is down. → Buffer writes for offline replicas, flush on return.
- Anti-entropy / Merkle trees. → Detect divergence between replicas.
Required Tests
- Happy path correctness
- Boundary case (empty store, single key, max capacity)
- Concurrency: ≥ 8 threads, hammer the API for several seconds, verify invariants hold
- Recovery semantics (if implemented)
- For A: snapshot isolation — test that opens tx1, writes via tx2, commits tx2, reads via tx1 → must return the pre-commit value
- For B: write enough to trigger flush; read works across memtable + SSTable
- For C: load distribution test (after adding N nodes, the ratio of max to min keys per node is ≤ 1.3)
Required Production Discussion
- Persistence strategy and crash recovery
- Replication model and consistency tradeoffs
- Monitoring: latency p50/p99, throughput, queue depths, GC pauses
- Failure modes: node loss, network partition, slow disk
- Backpressure: what happens when writes outpace flush
Self-Evaluation Template
Mock 10 — Infrastructure / Backend
Date: _______
Build: _______
Time: ___ / 75 min
Scores (1–5):
___ Total /70
Core working? Y/N
Concurrency tests pass under stress? Y/N
Production gaps discussed unprompted? Y/N (list them: ___)
What I left out (and what it would take):
Action item:
What to Do If You Fail
- Storage primitives unclear: Read “Designing Data-Intensive Applications” (Chapters 3 and 5).
- Concurrency issues: Phase 9 (language/runtime) concurrency sections.
- Code quality: A senior code review of your build.
- Pass twice consecutively before Mock 11.
Mock 11 — Concurrency Heavy
- Interview type: Concurrency / parallelism coding round
- Target role: Backend, systems, infrastructure, embedded, gaming, OS-adjacent
- Time limit: 60 minutes
- Format: ONE problem requiring real concurrency primitives (locks, condvars, channels, atomics)
- Hints policy: A hint on the primitive choice is acceptable; a hint on the race condition is borderline.
- Primary goal: Write code that is provably correct under concurrent access.
What This Mock Tests
Concurrency code is notoriously hard. The bar:
- Identify what needs synchronization (shared mutable state)
- Choose the right primitive (mutex vs RWLock vs channel vs atomic)
- Avoid deadlocks, livelocks, starvation
- Write a test that would catch the bug if present (not just one that passes by luck)
Scoring weights: Correctness (#5), Code Quality (#7), Language/Runtime (#12) are critical.
Pick One Problem
Problem A — Bounded Blocking Queue (with timeout)
```python
q = BoundedQueue(capacity=10)
q.put(item)                  # blocks if full
q.put(item, timeout=5.0)     # returns False on timeout
item = q.get()               # blocks if empty
item = q.get(timeout=5.0)    # returns None on timeout
q.close()                    # subsequent put → exception; get drains then returns None
```
Multiple producers, multiple consumers. Must be FIFO. Must support graceful close.
Problem B — Thread Pool (with shutdown semantics)
```python
pool = ThreadPool(workers=4)
future = pool.submit(fn, arg)
result = future.result(timeout=5.0)   # blocks until done
pool.shutdown(wait=True)              # waits for all queued + running
pool.shutdown(wait=False)             # rejects new, returns immediately
```
Required: bounded work queue, graceful drain, future-based result delivery, exception propagation.
Problem C — Read-Write Lock (Writer-Preferred)
```python
lock = RWLock()
with lock.read():
    # multiple readers may hold the lock concurrently
    ...
with lock.write():
    # exclusive; new readers block while a writer is waiting or active
    ...
```
Required: many readers OR one writer; writers must not starve.
Problem D — Dining Philosophers (Deadlock-Free)
5 philosophers, 5 chopsticks. Each alternates think/eat. Implement so no deadlock, no philosopher starves.
Expected Communication Style
- Restate with the explicit concurrency requirements (“multiple producers, multiple consumers, FIFO, graceful close”).
- Identify shared mutable state. This is the most important step.
- Identify the invariant. (“Queue size never exceeds capacity”; “no two writers active simultaneously”; “no two adjacent philosophers eat simultaneously”.)
- Choose primitives with rationale. (“I need to block on full/empty — that means condvar, not just a lock.”)
- Code the critical section minimally. Hold the lock only across the shared-state mutation; release before any potentially-blocking call.
- Write a concurrency test. Not just “does it work once” — stress with many threads, verify the invariant.
- Discuss failure modes: what happens if a producer dies holding the lock? If close is called during a put?
Solution Sketches
A. Bounded Queue:
```python
import threading
import time
from collections import deque


class QueueClosed(Exception):
    """Raised by put() once the queue has been closed."""


class BoundedQueue:
    def __init__(self, capacity):
        self._capacity = capacity
        self._q = deque()
        self._lock = threading.Lock()
        # Both conditions share one lock so state checks and waits are atomic.
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)
        self._closed = False

    def put(self, item, timeout=None):
        # `is not None`, so timeout=0 means "do not block", not "block forever".
        end = None if timeout is None else time.monotonic() + timeout
        with self._lock:
            while not self._closed and len(self._q) >= self._capacity:
                remaining = None if end is None else end - time.monotonic()
                if remaining is not None and remaining <= 0:
                    return False
                self._not_full.wait(timeout=remaining)
            if self._closed:  # re-checked after every wakeup, including close()
                raise QueueClosed()
            self._q.append(item)
            self._not_empty.notify()
            return True

    def get(self, timeout=None):
        end = None if timeout is None else time.monotonic() + timeout
        with self._lock:
            while not self._q:
                if self._closed:
                    return None  # closed and fully drained
                remaining = None if end is None else end - time.monotonic()
                if remaining is not None and remaining <= 0:
                    return None
                self._not_empty.wait(timeout=remaining)
            item = self._q.popleft()
            self._not_full.notify()
            return item

    def close(self):
        with self._lock:
            self._closed = True
            self._not_full.notify_all()   # wake every blocked producer
            self._not_empty.notify_all()  # wake every blocked consumer
```
B. Thread Pool: N worker threads pulling from a shared bounded queue; each task wraps (fn, args, future). Workers set future’s result/exception. Shutdown sentinels (None) wake workers; wait=True joins all worker threads.
C. RWLock writer-preferred: Track reader_count, writer_active, waiting_writers. Reader acquires only if no writer active AND no waiting writers. Writer blocks while readers active; once it’s the chosen waiter, blocks all new readers.
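A minimal sketch of the writer-preferred RWLock described above, built on a single `threading.Condition`. The field names mirror the description (`reader_count`, `writer_active`, `waiting_writers`); the context-manager API matches the Problem C usage example, and the rest is an illustrative assumption.

```python
import threading
from contextlib import contextmanager


class RWLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._reader_count = 0        # readers currently inside
        self._writer_active = False
        self._waiting_writers = 0

    @contextmanager
    def read(self):
        with self._cond:
            # Writer preference: new readers also yield to *waiting* writers.
            while self._writer_active or self._waiting_writers > 0:
                self._cond.wait()
            self._reader_count += 1
        try:
            yield
        finally:
            with self._cond:
                self._reader_count -= 1
                if self._reader_count == 0:
                    self._cond.notify_all()  # a waiting writer may proceed

    @contextmanager
    def write(self):
        with self._cond:
            self._waiting_writers += 1
            try:
                while self._writer_active or self._reader_count > 0:
                    self._cond.wait()
            finally:
                self._waiting_writers -= 1
            self._writer_active = True
        try:
            yield
        finally:
            with self._cond:
                self._writer_active = False
                self._cond.notify_all()  # wake both readers and writers


lock = RWLock()
state = {"n": 0}


def writer():
    for _ in range(500):
        with lock.write():
            state["n"] += 1


def reader():
    for _ in range(500):
        with lock.read():
            _ = state["n"]


threads = [threading.Thread(target=writer) for _ in range(4)]
threads += [threading.Thread(target=reader) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state["n"])
```

This version is not reentrant: a thread that holds the read lock and then requests the write lock will deadlock, which is worth stating as a contract in the interview.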
D. Philosophers: asymmetric strategy — even-numbered grabs left then right, odd-numbered grabs right then left. Breaks the symmetry that causes circular wait → no deadlock. Alternative: use a hierarchical lock ordering (always grab lower-id chopstick first).
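The hierarchical-ordering alternative mentioned above can be sketched like this. It is a minimal illustration: the names `chopsticks`, `meals`, and `philosopher` are hypothetical, and “eating” is reduced to bumping a counter.

```python
import threading

N = 5
chopsticks = [threading.Lock() for _ in range(N)]
meals = [0] * N


def philosopher(i: int, rounds: int) -> None:
    left, right = i, (i + 1) % N
    # Global order: always take the lower-numbered chopstick first,
    # so a cycle of philosophers each waiting on the next cannot form.
    first, second = min(left, right), max(left, right)
    for _ in range(rounds):
        with chopsticks[first]:
            with chopsticks[second]:
                meals[i] += 1  # "eat"; only philosopher i writes meals[i]


threads = [threading.Thread(target=philosopher, args=(i, 200)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(meals)
```

Lock ordering rules out deadlock. Freedom from starvation additionally depends on how fairly the underlying locks hand off, which `threading.Lock` does not guarantee, so that part belongs in the discussion rather than in the code.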
Common Failure Modes
- Used `if` instead of `while` around `wait()`. Spurious wakeups cause invariant violations.
- Released the lock before notifying, or notified before updating state. Wakes a consumer that finds the queue empty.
- Held the lock during a callback / blocking I/O. Other threads stall.
- Used `notify()` instead of `notify_all()` for close. Only wakes one waiter; others hang forever.
- RWLock without writer preference → writer starvation.
- Philosophers: all grab left → deadlock. Classic.
- Test that runs once and passes. Not a concurrency test. Need stress (1000+ iterations across many threads).
- Used `concurrent.futures.ThreadPoolExecutor` for Problem B without implementing the underlying logic. Some interviewers accept this; many want you to build it.
Passing Bar
- Total score: 56/70 (average 4.0)
- Code correctness verified under stress (≥ 10K operations across ≥ 8 threads)
- Invariant explicitly stated and tested
- No use of high-level concurrency abstractions that hide the primitive (e.g., `queue.Queue` for problem A)
- Failure modes discussed
Follow-up Questions
For A:
- Lock-free version. → Discuss MPMC ring buffer with CAS; show awareness of ABA problem.
- Multi-priority. → Multiple internal queues, one per priority.
- Persist across restart. → Append-only log; replay on startup.
For B:
- Work-stealing pool. → Per-worker deque; idle workers steal from others.
- Dynamic resizing. → Grow workers under load; shrink when idle.
- Cancellable tasks. → Future.cancel() signals; worker checks flag.
For C:
- Fair RWLock (FIFO). → Single waiting queue; reader batching possible but trickier.
- Reader-preferred. → Easier to implement; writer starvation risk.
- Async version with futures. → Same logic; futures replace condvar.
For D:
- Variant where philosophers think for variable time. → Same algorithm.
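- Generalize to N philosophers and K chopsticks. → General resource allocation; reason with resource-allocation graphs and a global acquisition order.
- Distributed version (philosophers on different machines). → No shared locks; needs a message-passing protocol (e.g., Chandy–Misra) or distributed deadlock detection.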
Required Tests
- Single-thread correctness
- Multi-thread stress: many producers + consumers; assert no item lost, no item duplicated, no exception
- Timeout correctness: `put(timeout=0.1)` returns False after 100ms when full
- Close semantics: close during active put/get unblocks all waiters cleanly
- Invariant assertion: assert size ≤ capacity throughout, etc.
Example stress test for A:
```python
import threading


def test_stress():
    q = BoundedQueue(10)
    produced = []
    consumed = []  # list.append is atomic in CPython, so no extra lock here

    def producer(start, count):
        for i in range(start, start + count):
            q.put(i)
            produced.append(i)

    def consumer():
        while True:
            x = q.get()  # no timeout: None can only mean "closed and drained"
            if x is None:
                break
            consumed.append(x)

    producers = [threading.Thread(target=producer, args=(i * 1000, 1000)) for i in range(10)]
    consumers = [threading.Thread(target=consumer) for _ in range(5)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    q.close()
    for t in consumers:
        t.join()
    assert sorted(consumed) == sorted(produced)
    assert len(consumed) == 10_000
```
Required Discussion
- The invariant your code maintains
- The lock ordering (if multiple locks)
- The worst-case latency (lock contention)
- What happens under crash mid-operation
- How you’d debug a deadlock if one were reported (jstack, py-spy dump, gdb)
Self-Evaluation Template
Mock 11 — Concurrency
Date: _______
Problem: _______
Time: ___ / 60 min
Scores (1–5):
___ Total /70
Stress test passed (≥ 10K ops)? Y/N
Invariant explicitly stated? Y/N
Failure modes discussed? Y/N
Any race condition found post-hoc? (List:)
Action item:
What to Do If You Fail
- Race condition in submitted code: This is the #1 reason to repeat — concurrency bugs in production are catastrophic.
- Couldn’t choose the right primitive: Read your language’s concurrency chapter (Phase 9). Understand condvar vs channel vs atomic before next attempt.
- Stress test exposed a bug you didn’t anticipate: Lab 05 (stress harness) applied to concurrent code is your training.
- Pass twice consecutively before Mock 12.
Mock 12 — Competitive Style
- Interview type: Algorithmic puzzle round (Jane Street, Hudson River, Citadel, Two Sigma, ICPC-style firms; some Google L6+ rounds)
- Target role: Quant developer, HFT, compiler/optimization, ICPC alumni, top-tier algorithmic teams
- Time limit: 90 minutes
- Format: ONE hard algorithmic problem (Codeforces Div 2 D / Div 1 B level)
- Hints policy: No free hints. A hint is a hard signal of failure.
- Primary goal: Reach the algorithmic insight under sustained time pressure.
What This Mock Tests
This mock is not about production engineering. It’s about pure algorithmic depth and the ability to think clearly for 90 minutes on a problem with no obvious path.
The kind of problem chosen:
- Has a clever insight that unlocks the optimal complexity
- Brute force is far too slow
- Standard patterns don’t directly apply — you must combine 2–3
- Implementation is non-trivial but not the bottleneck
Scoring weights: Optimization (#4), Correctness (#5), Complexity (#6), Code Quality (#7) are key. Production / tradeoff dimensions are not relevant — these are not asked.
Pick One Problem
(Pick at random for self-mock. With a partner, they choose.)
Problem A — Maximum Subarray with At Most K Replacements
Given an array a of integers and an integer k, you may replace at most k elements with 0. Find the maximum possible sum of any non-empty contiguous subarray of the resulting array.
Constraints: 1 ≤ |a| ≤ 2×10^5. -10^9 ≤ a[i] ≤ 10^9. 0 ≤ k ≤ |a|.
Examples:
a = [-3, 4, -2, 5, -1], k = 1 → 9 (replace -2 with 0; the subarray [4, 0, 5] sums to 9)
(Note: if replacement values were unconstrained, the problem would be trivial: set one element to 10^9. Real variants restrict the operation: removals, replacement with 0, or replacements drawn from a given budget. The interviewer specifies which. For self-mock, use “replace at most k elements with 0”, which makes it a real DP.)
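For the self-mock variant (replace at most k elements with 0), a baseline DP can be sketched as below. This is an O(N·k) extension of Kadane's algorithm, offered as a starting point and as a reference for stress tests; it is not claimed to meet the stated constraints when k is large. The function name is illustrative.

```python
def max_subarray_with_zeroing(a, k):
    """Max sum of a non-empty subarray after replacing at most k elements with 0."""
    k = min(k, len(a))
    neg_inf = float("-inf")
    # best[j]: best sum of a subarray ending at the previous index
    # using at most j replacements.
    best = [neg_inf] * (k + 1)
    answer = neg_inf
    for x in a:
        new = [neg_inf] * (k + 1)
        for j in range(k + 1):
            new[j] = x + max(best[j], 0)  # keep x; extend or start fresh
            if j >= 1:
                new[j] = max(new[j], max(best[j - 1], 0))  # replace x with 0
        best = new
        answer = max(answer, best[k])
    return answer


print(max_subarray_with_zeroing([-3, 4, -2, 5, -1], 1))  # 9
print(max_subarray_with_zeroing([-3, 4, -2, 5, -1], 0))  # 7 (plain Kadane)
```

With k = 0 the recurrence collapses to Kadane's algorithm, which is a quick sanity check to state out loud before coding the general case.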
Problem B — Count Subarrays with Median ≥ X
Given array a (distinct) and threshold X. Count contiguous subarrays whose median is ≥ X. (Median of even-length: take the right-middle.)
Constraints: 1 ≤ |a| ≤ 2×10^5.
Insight: map each element to +1 if ≥ X, else -1. A subarray’s median is ≥ X iff its sum is positive (for odd length) or ≥ 0 (for even length with right-middle). A sum has the same parity as the subarray length, so both conditions collapse to sum ≥ 0. With prefix sums P, this reduces to counting pairs i < j with P[i] ≤ P[j], which a Fenwick tree does in O(N log N).
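The +1/-1 reduction can be sketched as follows. The code counts pairs of prefix sums P[i] ≤ P[j] with i < j using a Fenwick tree over prefix values shifted by N so they are non-negative; the function name is illustrative.

```python
def count_subarrays_median_at_least(a, x):
    """Count subarrays whose median (right-middle for even length) is >= x."""
    n = len(a)
    size = 2 * n + 2              # prefix sums shifted by n lie in [0, 2n]
    tree = [0] * (size + 1)       # 1-indexed Fenwick tree of value counts

    def add(value):
        i = value + 1
        while i <= size:
            tree[i] += 1
            i += i & -i

    def count_at_most(value):
        i, total = value + 1, 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    prefix = n                    # shifted P[0] = 0
    add(prefix)
    result = 0
    for v in a:
        prefix += 1 if v >= x else -1
        result += count_at_most(prefix)  # earlier prefixes with P[i] <= P[j]
        add(prefix)
    return result


print(count_subarrays_median_at_least([3, 1, 4, 2], 3))  # 7
```

For `[3, 1, 4, 2]` with X = 3, the seven qualifying subarrays are `[3]`, `[4]`, `[3, 1]`, `[1, 4]`, `[4, 2]`, `[3, 1, 4]`, and `[3, 1, 4, 2]`.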
Problem C — Minimum Cost to Make Array Strictly Increasing
Given array a, you may increase any element by 1 at cost 1 (cannot decrease). Find minimum total cost to make a strictly increasing.
Constraints: 1 ≤ |a| ≤ 3000. Values fit in int64.
Insight: because elements can only be increased, a left-to-right greedy is optimal: raise each element to the smallest value that exceeds its predecessor. Walk left to right, maintain prev = max(prev + 1, a[i]), and add prev - a[i] to the cost. Equivalently, define b[i] = a[i] - i; then b must be non-decreasing, and the cost is the sum over i of (running max of b up to i) - b[i].
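The left-to-right walk with prev = max(prev + 1, a[i]) fits in a few lines. A minimal sketch, with an illustrative function name:

```python
def min_cost_strictly_increasing(a):
    """Minimum total increments so that a becomes strictly increasing."""
    cost = 0
    prev = a[0]
    for x in a[1:]:
        prev = max(prev + 1, x)  # smallest allowed value at this position
        cost += prev - x         # increments spent on this element
    return cost


print(min_cost_strictly_increasing([1, 1, 1]))  # 3: the array becomes [1, 2, 3]
print(min_cost_strictly_increasing([5, 1]))     # 5: the array becomes [5, 6]
```

The greedy is safe because raising an element more than necessary can only force later elements higher, never lower.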
Problem D — Range Sum with Updates and Range Adds (LC 307 + lazy)
Implement: update(i, x), range_add(l, r, x), query_sum(l, r). All in O(log N).
Constraints: 1 ≤ N ≤ 10^5. ≤ 10^5 operations.
Tool: segment tree with lazy propagation (Phase 3 Lab 02).
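A compact sketch of a lazy segment tree that supports the three operations. The class name and the recursive layout are illustrative choices, and `update(i, x)` is assumed to mean point assignment a[i] = x (as in LC 307), implemented here as a range add of the difference.

```python
class LazySegmentTree:
    def __init__(self, values):
        self.n = len(values)
        self.sum = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)  # pending add for every element under a node
        self._build(1, 0, self.n - 1, values)

    def _build(self, node, lo, hi, values):
        if lo == hi:
            self.sum[node] = values[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, values)
        self._build(2 * node + 1, mid + 1, hi, values)
        self.sum[node] = self.sum[2 * node] + self.sum[2 * node + 1]

    def _apply(self, node, lo, hi, delta):
        self.sum[node] += delta * (hi - lo + 1)
        self.lazy[node] += delta

    def _push(self, node, lo, hi):
        if self.lazy[node]:
            mid = (lo + hi) // 2
            self._apply(2 * node, lo, mid, self.lazy[node])
            self._apply(2 * node + 1, mid + 1, hi, self.lazy[node])
            self.lazy[node] = 0

    def range_add(self, l, r, delta, node=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l:
            return
        if l <= lo and hi <= r:
            self._apply(node, lo, hi, delta)  # defer work to the lazy tag
            return
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        self.range_add(l, r, delta, 2 * node, lo, mid)
        self.range_add(l, r, delta, 2 * node + 1, mid + 1, hi)
        self.sum[node] = self.sum[2 * node] + self.sum[2 * node + 1]

    def query_sum(self, l, r, node=1, lo=0, hi=None):
        if hi is None:
            hi = self.n - 1
        if r < lo or hi < l:
            return 0
        if l <= lo and hi <= r:
            return self.sum[node]
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        return (self.query_sum(l, r, 2 * node, lo, mid)
                + self.query_sum(l, r, 2 * node + 1, mid + 1, hi))

    def update(self, i, x):
        """Point assignment a[i] = x, expressed as a range add of the difference."""
        self.range_add(i, i, x - self.query_sum(i, i))


tree = LazySegmentTree([1, 2, 3, 4, 5])
tree.range_add(1, 3, 10)       # array is now [1, 12, 13, 14, 5]
tree.update(2, 0)              # array is now [1, 12, 0, 14, 5]
print(tree.query_sum(0, 4))    # 32
```

Each operation touches O(log N) nodes because a fully covered node is updated through its lazy tag instead of descending to its leaves.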
Problem E — Maximum XOR of Two Numbers (LC 421)
Given an array a, find max(a[i] XOR a[j]) over all i < j. O(N · 32) time.
Insight: binary trie of all numbers; for each number, greedily traverse to find the maximally-different other number.
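The trie-based greedy can be sketched as follows, assuming non-negative integers below 2^bits and using nested dicts as trie nodes (an illustrative representation; array-backed nodes are faster in practice).

```python
def max_xor_pair(nums, bits=32):
    """Maximum a[i] XOR a[j] over all pairs, via a binary trie."""
    root = {}
    best = 0
    for x in nums:
        # Insert x, most significant bit first.
        node = root
        for b in range(bits - 1, -1, -1):
            node = node.setdefault((x >> b) & 1, {})
        # Greedy query: at each bit prefer the branch with the opposite bit.
        node, current = root, 0
        for b in range(bits - 1, -1, -1):
            bit = (x >> b) & 1
            if (1 - bit) in node:
                current |= 1 << b
                node = node[1 - bit]
            else:
                node = node[bit]  # always exists: x itself was just inserted
        best = max(best, current)
    return best


print(max_xor_pair([3, 10, 5, 25, 2, 8]))  # 28 (5 XOR 25)
```

Inserting before querying means each number is also compared with itself, which contributes 0 and therefore never changes the maximum.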
Expected Communication Style
For competitive mocks, communication is light but precise:
- Restate in one sentence.
- Ask 1–2 surgical clarifying questions (constraints, distinct/duplicate, output format).
- State a brute force with complexity — proves you understand the problem.
- Think aloud about reductions or patterns. (“Median question; +1/-1 transformation; subarray sum; Fenwick.”)
- State the optimal complexity and key insight before coding.
- Code carefully and minimally. Competitive code can sacrifice some readability for brevity; don’t sacrifice correctness.
- Test 1–2 cases including a non-obvious one.
There is no “production discussion” in this format. The interviewer cares about the algorithm and the implementation.
Common Failure Modes
- Couldn’t reach the insight in 60 min. Submitted the brute force. Partial credit at best, and only if the brute force is correct; the optimization dimension still scores low.
- Reached the insight but the implementation has bugs that take 20+ min to fix. Need to drill segment tree / Fenwick / DP from Phase 3.
- Stuck on the wrong approach for 30+ min. Senior signal: pivot quickly when an approach doesn’t pan out. Articulate the pivot.
- Forgot the standard library tool (Python `bisect`, C++ `lower_bound`, etc.). Costs implementation time.
- No tests because “I’m confident.” Competitive code is wrong constantly; verify against brute force.
Passing Bar
- Total score: 49/70 (average 3.5) — lower than other mocks because some dimensions don’t apply
- Optimal complexity reached (or a serious near-optimal attempt with clear gap analysis)
- Correct on given examples + at least one boundary case
- Time ≤ 90 min
- Algorithm articulated with insight
Follow-up Questions
Competitive-style follow-ups are harder algorithm variants:
For A:
- Generalize: replace ≤ k elements with values from a given set. → DP becomes more state-heavy.
- Output the actual subarray and the replacements. → DP with parent pointers.
For B:
- Median strictly > X. → Adjust the +1/-1 mapping for equality.
- K-th smallest in every subarray. → Much harder; persistent data structures or offline processing.
For C:
- Allow decreases at cost 1 too. → Now O(N log N) using slope trick or O(N²) DP.
- Strictly increasing AND in [L, R] for each element. → Constrained version; more careful greedy.
For D:
- Add range assign as well as range add. → Two lazy tags; non-trivial composition.
- Range mode (most frequent element). → Much harder; Mo’s algorithm.
For E:
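- Max XOR of any triple. → No standard near-linear technique; the baseline is O(N²) over pairs plus a trie lookup for the third element.
- Max XOR with values ≤ K (subset). → Offline: sort elements and queries by bound and insert into the trie incrementally; a persistent trie handles index-range variants.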
Required Tests
- All given examples
- Boundary: N = 1, N = max
- Adversarial: all same values, sorted, reverse-sorted
- One stress test against the brute force if time permits (mandatory if you suspect a bug)
Required Complexity Explanation
- Time, with reasoning
- Space, with reasoning
- Bound is tight or improvable?
- For N = 2×10^5 with O(N log N), expected runtime in seconds (typically < 1 sec)
Self-Evaluation Template
Mock 12 — Competitive Style
Date: _______
Problem: _______
Time: ___ / 90 min
Scores (1–5):
___ Total /70 (note: Tradeoff/Production are N/A; weighted out)
Time to insight: _____ min
Time to first correct implementation: _____ min
Bugs found and fixed: ___
Did I pivot from a wrong approach? Y/N (at minute ___)
Action item:
What to Do If You Fail
- Couldn’t reach the insight: This is a long-term gap, not a one-mock fix. Solve 30+ Codeforces Div 2 D problems (or LC hards tagged “competitive”) over 2–4 weeks.
- Reached insight, implementation buggy: Drill Phase 3 (advanced data structures) — your fundamentals leak.
- Bombed time management: Practice with stricter timers (45 min for problems you’ve already seen).
- Pass twice consecutively, on different problems, to consider this level handled.
After All 12 Mocks
When you have passed all 12 mocks twice consecutively each, return to the READINESS_CHECKLIST to verify the overall pipeline. The mocks alone do not certify readiness — they verify performance ability. Real interviews additionally test consistency over many rounds and behavioral signals.
Most candidates do not need to pass all 12. Pass the mocks corresponding to your target role:
- FAANG SWE-II: mocks 01–06
- FAANG Senior: mocks 01–07, plus 09 (language)
- FAANG Staff / Principal: mocks 01–10; skip 12
- Quant / HFT / Compiler: mocks 01–04, 09, 11, 12 (heavy on competitive)
- Backend / Platform (Stripe, Snowflake, Confluent): mocks 01–08, 10, 11
Phase 12 — Grandmaster
Read this before doing anything in this phase.
This phase covers topics that are not required for 99% of interviews, including senior and staff roles at top FAANG companies. The content here is for a narrow set of candidates:
- ICPC World Finals competitors
- Codeforces red / IGM
- Quant developers at Jane Street, Hudson River, Two Sigma, Citadel for the most algorithmic roles
- Compiler engineers at LLVM, GCC, Intel, NVIDIA
- Database engineers at the algorithm-heavy companies (Snowflake, Databricks query optimizer, CockroachDB, TimescaleDB)
- Cryptography / coding theory researchers
- A few specific roles at Google Research, DeepMind, OpenAI infrastructure
If you are not on this list, skip this phase entirely. Time spent here is time not spent on Phase 10 (testing/debugging) and Phase 11 (mocks), which will show up in your interviews.
When to Use This Phase
Use this phase if all are true:
- You have already completed Phases 1–11 and are passing the mocks at your target level.
- You are interviewing for one of the roles listed above.
- The job description explicitly mentions ICPC, competitive programming, max flow, suffix structures, FFT, or “research-grade algorithms.”
- You have at least 3 months before your interview.
If any of these are false, stop. Go back to Phase 10 or 11.
When to Skip This Phase
Skip this phase if any is true:
- You are interviewing for SWE-II / E4 / SDE2 or below.
- You are interviewing for generic senior or staff backend at FAANG (mocks 06–08 cover what’s actually asked).
- You have less than 3 months before your interview.
- You are still failing Phase 11 mocks at your level — those are higher leverage.
- You are an SRE, mobile engineer, frontend engineer, ML engineer (non-research), or data engineer.
The opportunity cost is real. Each lab here takes a week. That week is better spent on Phase 11 mock attempts for the vast majority of candidates.
What’s In This Phase
The labs cover algorithms and data structures that appear on Codeforces / ICPC / IOI and almost nowhere else:
| Lab | Topic | When it appears |
|---|---|---|
| 01 | Max Flow (Dinic) | Quant, compiler, graph-heavy research |
| 02 | Bipartite Matching (Hopcroft-Karp) | Assignment problems; some quant |
| 03 | Heavy-Light Decomposition | ICPC, very rare in industry |
| 04 | Centroid Decomposition | ICPC, rare in industry |
| 05 | Suffix Automaton | String-heavy research, bioinformatics |
| 06 | Advanced DP Optimization (CHT, Knuth, D&C) | Quant, compiler (loop scheduling) |
| 07 | FFT / Polynomial Multiplication | Cryptography, signal processing, some compiler |
| 08 | Advanced Geometry (convex hull, intersections) | Geometric computing, games, CAD |
| 09 | ICPC Contest Simulation | Competitive programming only |
| 10 | Inclusion-Exclusion, Burnside | Combinatorics-heavy research |
Each lab has the standard 23-section format plus an extra “When to Skip This Topic” section right after Interview Context, so you can opt out of individual labs.
How to Use This Phase
- Read this README in full.
- Look at the target job descriptions you’re applying to. Search them for the specific keywords (max flow, suffix automaton, etc.).
- If you find a match, do that specific lab. If not, skip.
- Doing the whole phase end-to-end is rarely the right call. Cherry-pick.
Realistic Expectations
Even if you do this phase, you may never encounter these topics in an interview. The value is:
- Confidence signal — knowing these exist and roughly how they work lets you say “I’m familiar with Dinic’s max flow” if it ever comes up.
- Insight transfer — understanding centroid decomposition deepens your tree intuition for problems you will see.
- Specific roles — if you’re applying to a quant fund’s algo research team, expect this material.
This phase is intentionally not graded against the same passing bar as other phases. It’s read-only intellectual investment for a small group.
What This Phase Is Not
This phase is not:
- A prerequisite for Phase 11.
- Required for any FAANG interview.
- A signal of seniority.
- Going to help you with system design.
If you’re using this phase to procrastinate the harder thing (Phase 10 testing labs, Phase 11 mocks), stop. That’s the actual failure mode, and the only one of consequence.
After This Phase
If you complete the relevant labs, you have what most ICPC mid-rank teams have. You’re prepared for the narrow algorithmic interviews. You’re not more prepared for normal FAANG interviews than someone who did Phase 11 twice.
Return to Phase 11 for mock-12 (competitive style) reps, then to your job search.
Lab 01 — Max Flow (Dinic’s Algorithm)
Goal
Implement Dinic’s algorithm for maximum flow on a directed graph with capacities, achieving O(V²·E) worst case and near-linear in practice. Apply it to a real interview-style problem (Maximum Students Taking Exam, LC 1349) by reducing to max flow / bipartite matching.
Background
Maximum flow is the foundational network flow problem: given a source s and sink t in a directed graph with edge capacities, find the maximum rate at which “flow” can travel from s to t respecting capacities.
Key algorithms:
- Ford-Fulkerson (1956): generic augmenting-path framework. Complexity depends on path choice.
- Edmonds-Karp (1972): BFS for shortest augmenting path. O(V·E²).
- Dinic (1970): level graphs + blocking flows. O(V²·E) general; O(E·√V) for bipartite matching.
- Push-relabel (Goldberg-Tarjan, 1986): faster in practice for dense graphs. O(V³) with FIFO vertex selection; O(V²·√E) with highest-label selection.
Why Dinic dominates in practice: the level graph constraint (only follow edges from level i to level i+1 in BFS layering) prunes the search dramatically. On typical inputs it runs far faster than the O(V²·E) worst case suggests.
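For cross-checking a Dinic implementation on small inputs, the Edmonds-Karp entry in the list above can be sketched in a few lines. This is a minimal sketch, not a reference implementation: it assumes s != t and uses a dense capacity matrix for brevity, and the name edmonds_karp and the (u, v, capacity) edge format are illustrative choices.

```python
from collections import deque


def edmonds_karp(n, edges, s, t):
    """Max flow from s to t (s != t) on n vertices; edges are (u, v, capacity) triples."""
    cap = [[0] * n for _ in range(n)]  # residual capacities
    for u, v, c in edges:
        cap[u][v] += c
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left, so the flow is maximum
        # Find the bottleneck along the path, then push it.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck  # residual (reverse) capacity
            v = parent[v]
        flow += bottleneck
```

The matrix costs O(V²) per BFS, so this is only suitable as a small-input oracle; the adjacency-list Dinic is what you would bring to an interview.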
Interview Context
Max flow is asked in:
- ICPC regionals/world finals (universal)
- Quant developer rounds at funds that care about assignment-style problems
- A handful of Google L6+ research interviews
- Rare appearances at Snowflake / Databricks query optimizer roles (max-flow underpins some join reordering heuristics)
- Never in standard FAANG SWE interviews
If asked, expect to either implement Dinic from scratch OR identify that a problem reduces to max flow and explain the reduction (more common than full implementation).
When to Skip This Topic
Skip if any of these are true:
- You are not interviewing for quant / research / ICPC-adjacent roles
- You have not memorized the basic Ford-Fulkerson framework yet
- You have less than 4 weeks for this phase
The reduction skill (recognizing a problem as max flow) is more valuable than memorizing the implementation. If you have only a few days, study reductions and skip the implementation.
Problem Statement
Maximum Students Taking Exam (LeetCode 1349, Hard).
Given an m × n classroom matrix where each cell is either ‘.’ (good seat) or ‘#’ (broken). Place students such that no student can cheat — a student can cheat off any immediately adjacent student in the same row OR diagonally in front (one row earlier, column ±1). Maximize the number of students seated.
seats = [["#",".","#","#",".","#"],
[".","#","#","#","#","."],
["#",".","#","#",".","#"]]
output = 4
Constraints
- 1 ≤ m, n ≤ 8 (small grid — but max-flow approach generalizes)
- Up to 64 seats
- Wall-clock target: < 100ms
Clarifying Questions
- Can a student cheat off the seat directly in front (same column, previous row)? (No — only diagonal-front and same-row-adjacent.)
- Are broken seats unavailable for sitting? (Yes — ‘#’ cannot hold a student.)
- Is the grid always rectangular? (Yes.)
Examples
Example 1
seats = [[".","#","."],
["#",".","#"],
[".","#","."]]
Conflict graph: the center seat conflicts with all four corners (each corner is diagonally adjacent to it), and the corners do not conflict with each other. Maximum independent set = 4 (the corners).
Example 2
seats = [["."]]
Trivial: 1.
Example 3 (boundary)
seats = [[".",".",".","..."]] # single row, all good
Same-row-adjacency means max alternating = ⌈n/2⌉.
Brute Force
Try every subset of good seats; check no two are in conflict; track max. O(2^k · k²) where k = number of good seats. For 8×8 = 64, infeasible.
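A minimal sketch of this brute force, useful mainly as an oracle for testing the matching-based solution on tiny grids. The function name max_students_brute and the helper conflict are illustrative, not part of the lab's required interface.

```python
from itertools import combinations


def max_students_brute(seats):
    """Exhaustive reference for LC 1349; only practical for a handful of good seats."""
    good = [(r, c) for r, row in enumerate(seats)
            for c, cell in enumerate(row) if cell == "."]

    def conflict(a, b):
        (r1, c1), (r2, c2) = a, b
        # Adjacent columns, and either the same row or adjacent rows (diagonal).
        return abs(c1 - c2) == 1 and abs(r1 - r2) <= 1

    # Try the largest subset sizes first; the first valid one is optimal.
    for k in range(len(good), 0, -1):
        for subset in combinations(good, k):
            if not any(conflict(a, b) for a, b in combinations(subset, 2)):
                return k
    return 0
```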
Brute Force Complexity
- Time: O(2^k · k²) — fails for k > ~20.
- Space: O(k) for current subset.
Optimization Path
Observation 1: this is maximum independent set on a conflict graph, which is NP-hard in general.
Observation 2: but our conflict graph is bipartite! Color seats by column parity (even/odd columns). All conflicts are between an even column and an odd column (same-row-adjacent: differs by 1; diagonal: also differs by 1). So no conflicts within the even-column set or within the odd-column set.
Observation 3: max independent set on a bipartite graph = total vertices − max matching (König’s theorem). So we compute max bipartite matching, which is solvable in polynomial time via max flow.
This is the canonical reduction trick.
Final Expected Approach
- Build the bipartite graph: left = good seats in even columns, right = good seats in odd columns. Edge between two if they conflict.
- Add source s → all left nodes (cap 1), all right nodes → sink t (cap 1), all conflict edges left → right (cap 1).
- Run Dinic to compute max flow = max matching.
- Answer = (total good seats) − (max matching).
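The steps above can be sketched end to end. To stay short and self-contained, this sketch swaps Dinic for a plain augmenting-path matcher (Kuhn's algorithm), which computes the same maximum matching on a graph this small. The names max_students, neighbors, and try_augment are illustrative assumptions.

```python
def max_students(seats):
    """LC 1349 via Konig's theorem: answer = good seats - max bipartite matching."""
    m, n = len(seats), len(seats[0])
    good = [(r, c) for r in range(m) for c in range(n) if seats[r][c] == "."]
    good_set = set(good)
    # Conflicts are symmetric: left, right, and all four diagonals.
    deltas = [(0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

    def neighbors(r, c):
        for dr, dc in deltas:
            if (r + dr, c + dc) in good_set:
                yield (r + dr, c + dc)

    match = {}  # odd-column seat -> even-column seat currently matched to it

    def try_augment(u, seen):
        # Kuhn's step: find an augmenting path starting at even-column seat u.
        for v in neighbors(*u):
            if v in seen:
                continue
            seen.add(v)
            if v not in match or try_augment(match[v], seen):
                match[v] = u
                return True
        return False

    matching = sum(
        1 for seat in good if seat[1] % 2 == 0 and try_augment(seat, set())
    )
    return len(good) - matching
```

Every neighbor of an even-column seat lies in an odd column, so the left/right split never has to be stored explicitly; the column parity check on the starting seat is enough.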
Data Structures
- Adjacency list with edge-index representation (each edge stores to, cap, rev-index for the reverse edge)
- BFS level array
- DFS iterator per node (incremented across calls to skip dead branches)
- Queue for BFS
Correctness Argument
- Bipartite: any conflict involves columns differing by 1, hence different parities.
- König: in bipartite, |min vertex cover| = |max matching|; |max independent set| = |V| − |min vertex cover|. So |MIS| = |V| − |max matching|.
- Dinic correctness: Ford-Fulkerson framework with augmenting paths; terminates when no augmenting path exists; gives optimal flow by max-flow min-cut theorem.
- Reduction: max matching via max flow is exact when all edge capacities are 1 and source/sink edges all have capacity 1.
Complexity
- Dinic on bipartite (unit-capacity) graphs: O(E·√V) — the Hopcroft-Karp bound.
- For LC 1349: V ≤ 64, E ≤ 64 × 4 = 256. Trivial.
Implementation Requirements
from collections import deque

class Dinic:
    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap):
        # Each edge is [to, cap, index of its reverse edge in graph[to]].
        self.graph[u].append([v, cap, len(self.graph[v])])
        self.graph[v].append([u, 0, len(self.graph[u]) - 1])

    def _bfs(self, s, t):
        # Build the level graph over edges with residual capacity.
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, cap, _ in self.graph[u]:
                if cap > 0 and self.level[v] < 0:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] >= 0

    def _dfs(self, u, t, pushed):
        if u == t:
            return pushed
        # self.it[u] persists across calls within a phase, skipping dead edges.
        while self.it[u] < len(self.graph[u]):
            e = self.graph[u][self.it[u]]
            v, cap, rev = e
            if cap > 0 and self.level[v] == self.level[u] + 1:
                d = self._dfs(v, t, min(pushed, cap))
                if d > 0:
                    e[1] -= d
                    self.graph[v][rev][1] += d
                    return d
            self.it[u] += 1
        return 0

    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):
            self.it = [0] * self.n
            while True:
                f = self._dfs(s, t, float('inf'))
                if f == 0:
                    break
                flow += f
        return flow
Then build the bipartite graph and call.
Tests
- LC 1349 given examples
- All ‘#’ grid → 0
- All ‘.’ grid of size 1×n → ⌈n/2⌉
- 8×8 all ‘.’ (max stress) → 32 (fill every other column; same-row adjacency caps each row at 4)
- Single column m×1 all ‘.’ → m (no same-row conflicts within a column)
Follow-up Questions
- Generalize to weighted matching (different students have different “value”; maximize total value). → Min-cost max flow.
- Add a constraint that some seats are mandatory. → Force-include via lower-bound constraints.
- m, n up to 50. → Same algorithm; check timing.
- Stream of conflicts; dynamic max matching. → Active research area.
- Distinct from LC 1349, prove the bipartite reduction is tight.
Product Extension
Real systems that use max flow / bipartite matching:
- Ride-sharing assignment (drivers ↔ requests)
- Ad auction allocation (advertisers ↔ slots)
- Resource scheduling (tasks ↔ machines)
- Compiler register allocation (variables ↔ registers; with constraints)
- DNA sequencing assembly
Language/Runtime Follow-ups
- Python: recursion depth limit; switch to iterative DFS for large V.
- C++: much faster; competitive programmers use this exclusively.
- Go/Java: stack size for recursive DFS may need explicit increase.
Common Bugs
- Forgot the reverse edge. Flow networks require residual graph; no reverse = wrong answer.
- Treating the zero-capacity reverse edge as a bug. It is by design: the cap > 0 check skips it until flow has been pushed along the forward edge.
- BFS level updated multiple times. Use the first level reached only.
- DFS iterator reset every call to _dfs. Should persist within a phase (the self.it[u] trick).
- Bipartite assumption violated: if you add an edge between two left nodes, the reduction breaks. Verify.
- Source/sink indices clash with vertex IDs. Use distinct numbering scheme.
Debugging Strategy
- Print the level graph after each BFS.
- Print augmenting path found in each DFS.
- Verify flow conservation at intermediate nodes after termination.
- Sanity check: max flow ≤ min(total capacity out of s, total capacity into t). With unit capacities this is min(deg(s), deg(t)).
Mastery Criteria
- Implement Dinic from memory in ≤ 25 min in your primary language
- Identify max-flow reductions in problems that don’t mention “flow” or “matching” explicitly
- Explain why LC 1349 reduces to bipartite matching, citing König
- State Hopcroft-Karp’s complexity advantage on bipartite unit-cap graphs
- Estimate runtime for a given V, E
- Implement min-cost max flow if asked (separate algorithm: successive shortest augmenting paths, typically found with SPFA/Bellman-Ford or Dijkstra with potentials)
Lab 02 — Bipartite Matching (Hopcroft-Karp)
Goal
Implement Hopcroft-Karp for maximum bipartite matching, achieving O(E·√V), and understand when it beats general max flow.
Background
Bipartite matching: given a bipartite graph (vertices split into L and R, edges only between L and R), find the largest set of edges with no shared endpoint.
- Naïve augmenting path: O(V·E). For each unmatched left vertex, find an augmenting path via DFS.
- Hopcroft-Karp (1973): find multiple vertex-disjoint shortest augmenting paths per phase. O(E·√V).
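The √V comes from a two-part argument. After √V phases, every remaining augmenting path has length > √V. The augmenting paths still needed to reach a maximum matching are vertex-disjoint, so there can be at most √V of them, and at most √V further phases finish the job.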
Hopcroft-Karp is a special case of Dinic’s algorithm applied to a unit-capacity bipartite flow network. If you have Dinic, you have Hopcroft-Karp.
Interview Context
Bipartite matching shows up in:
- ICPC (constant)
- Assignment problems (jobs ↔ workers)
- Some quant interviews on portfolio matching
- Compiler register coalescing
- Almost never in standard FAANG interviews
Recognizing that a problem is bipartite matching is the high-leverage skill; the algorithm is well-known.
When to Skip This Topic
Skip if any of these are true:
- You’ve already done Lab 01 (Dinic handles bipartite matching as a special case)
- You’re not targeting competitive / quant / assignment-heavy roles
- You have less than 2 weeks for this phase
The reduction skill is what matters. Skip the algorithm if you can recognize the reduction and use Dinic.
Problem Statement
Maximum Bipartite Matching.
Given a bipartite graph with L left vertices, R right vertices, and M edges, find the maximum matching size.
Variant: Job Assignment. N workers, N jobs. Worker i can do a subset of jobs. Assign each worker to at most one job, each job to at most one worker. Maximize assignments.
Constraints
- 1 ≤ L, R ≤ 10^5
- 1 ≤ M ≤ 10^6
- Wall-clock: < 1 sec
Clarifying Questions
- Are the partitions L and R given, or do I need to detect bipartiteness? (Usually given.)
- Are edges weighted? (No — that’s a different problem: Hungarian algorithm or min-cost max flow.)
- Output the matching or just the size? (Both versions are common.)
Examples
L = {1, 2, 3}, R = {a, b, c}
Edges: 1-a, 1-b, 2-b, 3-c
Max matching: {1-a, 2-b, 3-c}, size = 3
L = {1, 2}, R = {a, b}
Edges: 1-a, 2-a
Max matching: {1-a} or {2-a}, size = 1
Brute Force
Try all subsets of edges; check that no vertex appears twice; track max. O(2^M · M).
Better naïve: for each left vertex in order, DFS to find an augmenting path. O(V·E).
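A minimal sketch of the per-vertex DFS approach (often called Kuhn's algorithm), which doubles as a correctness oracle for Hopcroft-Karp on small inputs. The name kuhn_matching and the adjacency format (adj[u] lists right-side neighbors of left vertex u, both sides 0-indexed) are assumptions for illustration.

```python
def kuhn_matching(left_size, right_size, adj):
    """Maximum bipartite matching size; adj[u] lists right vertices adjacent to left u."""
    match_right = [-1] * right_size  # left partner of each right vertex, or -1

    def try_augment(u, visited):
        for v in adj[u]:
            if visited[v]:
                continue
            visited[v] = True
            # v is free, or its current partner can be re-matched elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    # One fresh visited array per left vertex: O(E) per attempt, O(V*E) total.
    return sum(try_augment(u, [False] * right_size) for u in range(left_size))
```

The recursion depth can reach the number of left vertices, so for large inputs in Python an iterative version or a raised recursion limit would be needed.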
Brute Force Complexity
- Subsets: O(2^M)
- Per-vertex DFS: O(V·E). Acceptable for V·E ≤ ~10^7.
Optimization Path
Hopcroft-Karp:
- Step 1 of each phase: BFS from all unmatched left vertices, computing layers in the residual graph.
- Step 2 of each phase: DFS from each unmatched left vertex, finding vertex-disjoint shortest augmenting paths.
- Repeat until no augmenting path exists.
The phase count is O(√V), giving total O(E·√V).
Final Expected Approach
from collections import deque

class HopcroftKarp:
    def __init__(self, left_size, right_size):
        self.L = left_size
        self.R = right_size
        self.adj = [[] for _ in range(left_size)]
        self.NIL = -1

    def add_edge(self, u, v):
        self.adj[u].append(v)

    def _bfs(self):
        # Layer left vertices by alternating-path distance from the free left vertices.
        q = deque()
        self.dist = [float('inf')] * self.L
        for u in range(self.L):
            if self.match_L[u] == self.NIL:
                self.dist[u] = 0
                q.append(u)
        found = False
        while q:
            u = q.popleft()
            for v in self.adj[u]:
                pair = self.match_R[v]
                if pair == self.NIL:
                    found = True  # reached a free right vertex
                elif self.dist[pair] == float('inf'):
                    self.dist[pair] = self.dist[u] + 1
                    q.append(pair)
        return found

    def _dfs(self, u):
        for v in self.adj[u]:
            pair = self.match_R[v]
            if pair == self.NIL or (self.dist[pair] == self.dist[u] + 1 and self._dfs(pair)):
                self.match_L[u] = v
                self.match_R[v] = u
                return True
        self.dist[u] = float('inf')  # dead end: prune for the rest of this phase
        return False

    def max_matching(self):
        self.match_L = [self.NIL] * self.L
        self.match_R = [self.NIL] * self.R
        matching = 0
        while self._bfs():
            for u in range(self.L):
                if self.match_L[u] == self.NIL and self._dfs(u):
                    matching += 1
        return matching
Data Structures
- Adjacency list (left → list of right)
- match_L[u], match_R[v]: current partner or NIL
- dist[u]: BFS layer of left vertex
- Queue for BFS
Correctness Argument
- Augmenting path: path alternating unmatched-matched-unmatched… edges, starting and ending at unmatched vertices. Flipping the edges along the path increases matching size by 1.
- Berge’s theorem: matching is maximum iff no augmenting path exists.
- Hopcroft-Karp: in each phase, finds a maximal set of vertex-disjoint shortest augmenting paths. After √V phases, no short augmenting paths remain; at most √V remaining ones contribute one each.
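The phase bound in the last bullet can be written out as a short counting argument. This is a proof sketch under the standard assumption that each phase strictly increases the shortest augmenting path length; M* denotes any maximum matching and ⊕ the symmetric difference.

```latex
\begin{aligned}
&\text{Let } M \text{ be the matching after } \sqrt{V} \text{ phases and } M^* \text{ a maximum matching.} \\
&M \oplus M^* \text{ contains at least } |M^*| - |M| \text{ vertex-disjoint augmenting paths for } M. \\
&\text{Each has more than } \sqrt{V} \text{ edges, hence more than } \sqrt{V} \text{ vertices, so} \\
&\quad (|M^*| - |M|)\cdot\sqrt{V} < V \;\Longrightarrow\; |M^*| - |M| < \sqrt{V}. \\
&\text{Every phase augments at least once, so fewer than } 2\sqrt{V} \text{ phases run, each costing } O(E).
\end{aligned}
```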
Complexity
- Time: O(E · √V)
- Space: O(V + E)
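For V = 10^5, E = 10^6: E·√V ≈ 10^6 · 10^2.5 = 10^8.5 ≈ 3·10^8 ops in the worst case. That is near the 1-second limit in C++, though the number of phases is usually far below √V in practice.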
Implementation Requirements
- Use BFS to detect all unmatched left vertices and compute layers
- DFS must respect layer constraint (dist[pair] == dist[u] + 1)
- Set dist[u] = infinity on failed DFS to prune subsequent visits
- Repeat until BFS finds no augmenting path
Tests
- Empty graph → 0
- Single edge → 1
- Complete bipartite K_{n,n} → n
- Star (1 left, n right) → 1
- Path 1-a-2-b-3 → 2
Follow-up Questions
- Weighted matching (maximize sum of edge weights, not count). → Hungarian algorithm O(V³) or min-cost max flow.
- Online matching (edges arrive one at a time). → Greedy is 1/2-competitive; ranking is (1 − 1/e)-competitive.
- Stable matching (Gale-Shapley). → Different problem; preferences instead of binary edges.
- Edge-disjoint paths from s to t. → Reduces to max flow with all capacities 1.
Product Extension
- Ad-slot allocation (advertisers ↔ impressions)
- Ride-sharing dispatch (drivers ↔ riders)
- Course allocation (students ↔ classes with capacity)
- Resource scheduling
Language/Runtime Follow-ups
- C++: competitive programmers use a tight 50-line version
- Python: recursion depth and constant factor make this borderline at V = 10^5; use sys.setrecursionlimit or iterative
- Rust: ownership makes the in-place matching arrays a small wrestle
Common Bugs
- Forgot to reset dist[u] = infinity on DFS failure. Re-explores dead ends; slow.
- DFS doesn’t respect the layer constraint. Same as Ford-Fulkerson; loses √V factor.
- match_L and match_R out of sync. Update both atomically.
- NIL value collision with real vertex 0. Use -1 or a sentinel.
Debugging Strategy
- After each phase, print matching size and BFS layer counts
- Verify match_L[u] == v iff match_R[v] == u
- Augmenting path should alternate matched/unmatched
Mastery Criteria
- Implement Hopcroft-Karp in ≤ 30 min from memory
- Explain why √V phases suffice (sketch of proof)
- Identify when bipartite matching applies to a problem stated in domain terms
- State the difference between bipartite matching and Hungarian (weighted)
- Estimate runtime for given V, E
Lab 03 — Heavy-Light Decomposition
Goal
Implement Heavy-Light Decomposition (HLD) for answering path queries on a tree in O(log² N) per query (a separate segment tree per chain improves the constant factor, not the worst-case bound).
Background
HLD partitions tree edges into “heavy” and “light”:
- For each non-leaf vertex, the edge to its child with the largest subtree is heavy.
- All other edges are light.
Property: any root-to-leaf path uses O(log N) light edges (because each light edge halves the subtree size). Hence any tree path can be decomposed into O(log N) heavy chains.
Each heavy chain is contiguous in a DFS order, so we can maintain a segment tree over the DFS array and do O(log N) work per chain → O(log² N) per path query.
The heavy/light edge idea originates in Sleator and Tarjan’s link-cut tree work (1983); HLD as a standalone technique on static trees was popularized through ICPC.
Interview Context
HLD appears in:
- ICPC (frequently in the path-query category — QTREE problem on SPOJ is canonical)
- Almost never in industry interviews
- A handful of database/optimizer roles touch on tree-DP that HLD speeds up
- A very rare appearance in compiler dominator-tree manipulation
When to Skip This Topic
Skip if any of these are true:
- You are not targeting ICPC
- You are not interviewing at a research/algorithms team
- You don’t already understand segment trees deeply (Phase 3 Lab 01–02)
- You have less than 3 weeks for this phase
HLD is implementation-heavy. Getting it right in interview time requires ≥ 30 hours of practice.
Problem Statement
Path Query (QTREE-style).
Given a tree of N vertices, each edge has a weight. Support two operations:
- change(i, w): change edge i’s weight to w
- query(u, v): return the maximum edge weight on the path from u to v
Constraints
- 1 ≤ N ≤ 10^5
- 1 ≤ Q ≤ 10^5
- Edge weights ≤ 10^9
Clarifying Questions
- Is the tree rooted or unrooted? (Pick a root; doesn’t matter.)
- Queries on edges or on vertices? (Edges here; vertex variant is simpler.)
- Multiple components possible? (No — single tree.)
Examples
N = 4
Edges: (1,2,3), (2,3,4), (2,4,5)
query(1, 3): path 1→2→3, max edge = 4
change(1, 10): edge (1,2) now has weight 10
query(1, 4): path 1→2→4, max edge = 10
Brute Force
For each query, find the path via LCA (binary lifting answers an LCA query in O(log N) after O(N log N) preprocessing), then walk the path and check each edge. O(N) per query → O(NQ) total.
For N=Q=10^5, that’s 10^10 — TLE.
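A minimal sketch of the brute-force walk, handy as the oracle when validating HLD on small trees. It skips the explicit LCA and simply lifts the deeper endpoint one step at a time. The arrays parent, depth, and up_weight (weight of the edge from each vertex to its parent, with the edge stored at its deeper endpoint) are assumed to be precomputed; the names are illustrative.

```python
def path_max_brute(parent, depth, up_weight, u, v):
    """Maximum edge weight on the u-v path by walking both endpoints up to their LCA."""
    best = 0  # identity for max, assuming non-negative edge weights
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u
        # u is now the deeper (or equally deep) endpoint: take its parent edge.
        best = max(best, up_weight[u])
        u = parent[u]
    return best
```

A change(i, w) operation is a single write to up_weight at the deeper endpoint of edge i, which makes this a convenient reference for interleaved update and query tests.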
Brute Force Complexity
- Time: O(NQ)
- Space: O(N) for tree + LCA tables
Optimization Path
The path between u and v decomposes as u → LCA → v. With HLD, each leg traverses O(log N) heavy chains; each chain is a contiguous range in our DFS order; we query/update with a segment tree.
Final Expected Approach
- DFS 1 (size/parent/depth): compute subtree size, parent, depth.
- DFS 2 (HLD): for each vertex, identify heavy child (largest subtree). Assign heavy[u]. Walk heavy chains, assigning head[u] and a pos[u] = position in DFS-order array.
- Build segment tree over the DFS array, indexed by pos[u].
- Query(u, v): while head[u] != head[v], take the vertex whose chain head is deeper, query the segment tree for its chain segment, and raise it to its head’s parent. Then query the segment between u and v on the final shared chain.
def query_path(u, v):
    res = 0
    while head[u] != head[v]:
        # Always lift the vertex whose chain head is deeper.
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u
        res = max(res, seg_tree.query(pos[head[u]], pos[u]))
        u = parent[head[u]]
    if u == v:
        return res
    if depth[u] > depth[v]:
        u, v = v, u
    # Edge weights stored at the deeper endpoint; skip u's contribution
    res = max(res, seg_tree.query(pos[u] + 1, pos[v]))
    return res
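The arrays that query_path reads (parent, depth, head, pos) come from the two passes described earlier. A minimal iterative sketch of building them, assuming an unweighted adjacency-list tree rooted at vertex 0; build_hld is an illustrative name, and this layout is one of several valid ways to make each heavy chain contiguous.

```python
def build_hld(adj, root=0):
    """Return (parent, depth, heavy, head, pos) for a tree given as adjacency lists."""
    n = len(adj)
    parent = [-1] * n
    depth = [0] * n
    size = [1] * n
    heavy = [-1] * n  # heavy child of each vertex, -1 for leaves

    # Pass 1: iterative DFS to get parent/depth and a parent-before-child order.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                depth[v] = depth[u] + 1
                stack.append(v)
    # Reverse order is bottom-up, so size[u] is final when u is processed.
    for u in reversed(order):
        p = parent[u]
        if p != -1:
            size[p] += size[u]
            if heavy[p] == -1 or size[u] > size[heavy[p]]:
                heavy[p] = u

    # Pass 2: lay out each heavy chain in consecutive positions.
    head = [0] * n
    pos = [0] * n
    next_pos = 0
    stack = [root]  # chain heads still to be laid out
    while stack:
        h = stack.pop()
        u = h
        while u != -1:
            head[u] = h
            pos[u] = next_pos
            next_pos += 1
            for v in adj[u]:
                if v != parent[u] and v != heavy[u]:
                    stack.append(v)  # a light child starts its own chain
            u = heavy[u]
    return parent, depth, heavy, head, pos
```

Both passes avoid recursion, which sidesteps the stack-depth problems on path-like trees.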
Data Structures
- Tree adjacency
- Arrays: parent, depth, size, heavy, head, pos
- Segment tree over pos-indexed values
- DFS for both passes (iterative for large N to avoid stack overflow)
Correctness Argument
- Light edge bound: if (u, v) is a light edge, size(v) ≤ size(u)/2. So any root-to-leaf path crosses O(log N) light edges.
- Heavy chain decomposition: path u→v splits at LCA; each leg traverses chains separated by light edges; ≤ O(log N) chains.
- Segment tree on chain: chains contiguous in DFS order; standard range query.
- Edge-on-vertex convention: store each edge’s weight at its deeper endpoint, so a vertex query at position i returns the parent-edge weight.
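The light edge bound in the first bullet follows from a two-line size argument, sketched here with size(x) denoting the subtree size of x and h the heavy child of u:

```latex
\begin{aligned}
&\text{Let } h \text{ be the heavy child of } u \text{ and } v \neq h \text{ a light child. Then} \\
&\quad \mathrm{size}(v) \le \mathrm{size}(h), \qquad \mathrm{size}(v) + \mathrm{size}(h) \le \mathrm{size}(u) - 1, \\
&\text{so } 2\,\mathrm{size}(v) \le \mathrm{size}(u) - 1, \text{ i.e. } \mathrm{size}(v) < \mathrm{size}(u)/2. \\
&\text{Descending } k \text{ light edges from the root gives } 1 \le \mathrm{size} < N / 2^{k}, \text{ hence } k < \log_2 N.
\end{aligned}
```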
Complexity
- Preprocessing: O(N)
- Per query/update: O(log² N) — segment tree query O(log N), times O(log N) chains
- Total: O((N + Q) log² N)
Implementation Requirements
- Iterative DFS for N > 10^4 to avoid Python recursion limit (or C++ stack)
- Segment tree must support point update + range max query
- Careful indexing: edge i stored at vertex deeper_endpoint(i)
- Handle u == v case in query (return 0 or identity)
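The point-update, range-max requirement can be met with a compact bottom-up segment tree. A minimal sketch, assuming non-negative values so that 0 works as the identity (matching res = 0 in query_path) and inclusive query bounds; the class name MaxSegTree is illustrative.

```python
class MaxSegTree:
    """Iterative segment tree: point update, inclusive range-max query."""

    def __init__(self, values):
        self.n = len(values)
        # Leaves live at indices n..2n-1; internal nodes at 1..n-1.
        self.tree = [0] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, value):
        i += self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def query(self, lo, hi):
        """Max over positions lo..hi inclusive; returns 0 for an empty range."""
        res = 0
        lo += self.n
        hi += self.n + 1  # switch to a half-open interval
        while lo < hi:
            if lo & 1:
                res = max(res, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                res = max(res, self.tree[hi])
            lo //= 2
            hi //= 2
        return res
```

For the edge-weight variant, values[pos[v]] would hold the weight of the edge from v to its parent, with the root's slot left at 0.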
Tests
- Linear tree (path graph): every edge is on the chain; ≤ 1 chain transition per query
- Balanced binary tree: ≈ log N chain transitions
- Star tree (1 center, N leaves, rooted at the center): only one heavy edge; every other leaf is a single-vertex chain, so a leaf-to-leaf query crosses at most two light edges
- Update + query interleaved
- Query on same vertex (u == v)
- Path including the root
Follow-up Questions
- Sum on path instead of max. → Same structure, segment tree stores sums.
- Update path (range add) instead of point. → Segment tree with lazy propagation; same chain decomposition.
- Subtree query (sum/max in subtree of u). → Even simpler: subtree is contiguous in DFS, single range query.
- LCA only. → Tarjan offline O((N+Q)α(N)) or binary lifting O(log N).
- Dynamic tree (edges added/removed). → Link-cut trees (Sleator-Tarjan); much harder.
Product Extension
- Network routing on hierarchical topologies
- File-system path queries (organizations with deep trees)
- Phylogenetic tree analysis
- Decision tree updates in some ML systems
Language/Runtime Follow-ups
- C++: standard ICPC implementation; ~150 lines
- Python: slow constant factor; use iterative DFS; can pass for N ≤ 10^4 comfortably
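- Java: recursive DFS on a path-like tree with N = 10^5 can overflow the default thread stack; run it in a thread with a larger stack or use iterative DFS. Constant factor OK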
Common Bugs
- Recursive DFS for N = 10^5: stack overflow in many languages.
- Forgot edge-vertex mapping convention: off-by-one when querying final segment.
- Heavy child computed wrong: must be the child with the largest subtree, not the deepest.
- head[u] not propagated through the chain. All vertices on the same heavy chain should share head.
- Segment tree off-by-one between pos[u] and pos[v].
- Update at the root edge. Root has no parent edge; verify boundary handling.
Debugging Strategy
- Print the chain decomposition: list all chains as [head, ..., tail]
- For a path query, log each chain segment queried
- Verify against brute force on N ≤ 20
- Visualize: color heavy edges red, light edges black; should see O(log) light edges on long paths
Mastery Criteria
- Implement HLD + segment tree from scratch in ≤ 90 min
- Explain why O(log N) chains per path
- Handle edge-weight vs vertex-weight variants
- Combine with lazy propagation for range updates
- State complexity precisely: O((N + Q) log² N) total, O(log² N) worst case per path query
Lab 04 — Centroid Decomposition
Goal
Implement centroid decomposition for efficient tree queries — counting / aggregating over all paths in a tree in O(N log N) or O(N log² N).
Background
The centroid of a tree is a vertex whose removal leaves no subtree with more than N/2 vertices. Every tree has a centroid (sometimes two).
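The definition can be checked directly. A minimal iterative sketch that returns every centroid of a tree (one or two vertices), assuming an unweighted adjacency-list tree with at least one vertex; find_centroids is an illustrative name.

```python
def find_centroids(adj):
    """All vertices whose removal leaves no component larger than n // 2."""
    n = len(adj)
    parent = [-1] * n
    order, stack = [], [0]
    while stack:  # iterative DFS from vertex 0, parents before children
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    size = [1] * n
    largest = [0] * n  # largest component left after removing each vertex
    for u in reversed(order):  # bottom-up: size[u] is final here
        largest[u] = max(largest[u], n - size[u])  # the part above u
        p = parent[u]
        if p != -1:
            size[p] += size[u]
            largest[p] = max(largest[p], size[u])
    return [u for u in range(n) if largest[u] <= n // 2]
```

On a path with an even number of vertices this returns the two middle vertices, which is the "sometimes two" case.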
Centroid decomposition: recursively decompose the tree:
- Find centroid; process all paths passing through it
- Remove centroid; recurse on each remaining subtree
Recursion depth: O(log N) (each level halves subtree size). Total work per level: O(N) typically → O(N log N) total.
Originally developed for tree DP and offline path queries. Powerful technique for problems of the form: “count/sum over all pairs (u, v) in a tree with property P on the u-v path.”
Interview Context
Almost exclusively ICPC. Some appearances in:
- Quant algo research on tree models
- Phylogenetic inference (computational biology)
- A handful of compiler dominator-tree analyses
Industry interviews: near-zero.
When to Skip This Topic
Skip if any of these are true:
- You are not training for ICPC or competitive contests
- You haven’t done Lab 03 (HLD) — these are sibling techniques
- You don’t have 2+ weeks for the implementation practice
Centroid decomposition has a high “first implementation” cost. Don’t attempt without serious tree-DP fluency.
Problem Statement
Count Paths in Tree with Length ≤ K.
Given a tree of N vertices, edge weights w_e, and integer K, count the number of unordered pairs (u, v) such that the sum of edge weights on the path from u to v is ≤ K.
Constraints
- 1 ≤ N ≤ 5×10^4
- 1 ≤ K ≤ 10^9
- 1 ≤ w_e ≤ 10^4
Clarifying Questions
- Are weights positive? (Yes — required for the standard algorithm.)
- Count ordered or unordered pairs? (Unordered, exclude self-pairs.)
- Are edge weights integers? (Yes — convenient for sort/binary-search.)
Examples
Tree: 1-2 (w=2), 2-3 (w=1), 2-4 (w=3)
K = 4
Paths and lengths:
(1,2): 2 ✓
(1,3): 3 ✓
(1,4): 5 ✗
(2,3): 1 ✓
(2,4): 3 ✓
(3,4): 4 ✓
Answer: 5
Brute Force
For each unordered pair (u, v), compute path length (LCA + ancestor distances). O(N² log N).
For N = 5×10^4: N² = 2.5×10^9 before the log factor. TLE.
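For validating the centroid solution on small trees, a simpler O(N²) reference does a traversal from every vertex instead of using LCA. A minimal sketch, assuming 0-indexed vertices and (u, v, w) edge triples; count_pairs_brute is an illustrative name.

```python
def count_pairs_brute(n, edges, K):
    """Count unordered pairs (u, v), u != v, with path length <= K, in O(N^2)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    total = 0
    for src in range(n):
        # Iterative DFS carrying the distance from src.
        stack = [(src, -1, 0)]
        while stack:
            u, par, d = stack.pop()
            if u > src and d <= K:  # u > src counts each unordered pair once
                total += 1
            for v, w in adj[u]:
                if v != par:
                    stack.append((v, u, d + w))
    return total
```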
Brute Force Complexity
- Time: O(N² log N) for path length per pair
- Space: O(N) plus LCA tables
Optimization Path
Centroid decomposition shines for “paths through centroid” enumeration:
- A path between u and v either passes through the centroid c or lies entirely in one subtree (after c is removed)
- Paths through c: count by aggregating distances from c to every other vertex
- Paths in subtrees: handled recursively
Per centroid:
- BFS from c, recording dist(c, v) for every v in c’s connected component
- Sort distances per subtree
- Count pairs (u, v) with dist(c, u) + dist(c, v) ≤ K using two pointers
- Subtract pairs where u and v are in the same subtree (their actual path does not pass through c; they are counted later, at a centroid inside that subtree)
Final Expected Approach
def centroid_decompose(root, K):
    # adj[u] is a list of (neighbor, weight) pairs, defined outside this function.
    n = len(adj)
    removed = [False] * n
    size = [0] * n
    total = 0

    def calc_size(u, parent):
        size[u] = 1
        for v, _ in adj[u]:
            if v != parent and not removed[v]:
                calc_size(v, u)
                size[u] += size[v]

    def find_centroid(u, parent, tree_size):
        for v, _ in adj[u]:
            if v != parent and not removed[v] and size[v] > tree_size // 2:
                return find_centroid(v, u, tree_size)
        return u

    def gather_dists(u, parent, d, out):
        out.append(d)
        for v, w in adj[u]:
            if v != parent and not removed[v]:
                gather_dists(v, u, d + w, out)

    def count_pairs(dists, K):
        # Two pointers over sorted distances: pairs (i, j), i < j, with sum <= K.
        dists.sort()
        i, j = 0, len(dists) - 1
        c = 0
        while i < j:
            if dists[i] + dists[j] <= K:
                c += j - i
                i += 1
            else:
                j -= 1
        return c

    def decompose(u):
        nonlocal total
        calc_size(u, -1)
        c = find_centroid(u, -1, size[u])
        all_dists = [0]  # distance from c to itself
        for v, w in adj[c]:
            if not removed[v]:
                sub = []
                gather_dists(v, c, w, sub)
                # subtract pairs within this subtree
                total -= count_pairs(sub[:], K)
                all_dists.extend(sub)
        total += count_pairs(all_dists, K)
        removed[c] = True
        for v, _ in adj[c]:
            if not removed[v]:
                decompose(v)

    decompose(root)
    return total
Data Structures
- Adjacency list (vertex → list of (neighbor, weight))
- removed[v]: marks centroids removed from active tree
- size[v]: subtree size in current decomposition step
- Distance lists per subtree
Correctness Argument
- Centroid existence: every tree has a centroid (induction on tree structure).
- Recursion depth O(log N): removing centroid leaves subtrees of size ≤ N/2.
- Pair counting via subtraction: a path (u, v) is counted exactly once, at the first centroid removed among the vertices of path(u, v) (the shallowest one in the centroid tree). The inclusion-exclusion (add all-vertices count, subtract per-subtree count) ensures each path-through-c is counted once.
- Two pointers for sum ≤ K: standard.
Complexity
- Time: O(N log² N) — O(log N) levels, O(N log N) per level (sort dominates)
- Space: O(N) for tree + O(N) for decomposition state
Implementation Requirements
- Iterative or carefully bounded recursive DFS (Python: 5×10^4 may need increased limit)
- Recompute size[] for each subtree (in the recursive call); critical bug source
- Two-pointer pair counting requires sorted distances
- The inclusion-exclusion trick is the conceptual core; verify on small cases
Tests
- Linear chain (path graph): N(N-1)/2 paths; verify against brute force
- Star tree: each pair sum is at most 2*max_weight
- Balanced binary tree
- N = 1 (no pairs)
- K = 0 with positive weights (only self-pairs; answer = 0)
- Very large K (all pairs counted)
Follow-up Questions
- Count paths with length exactly K. → Use hashmap of distances per subtree; sum complement counts.
- Sum of path lengths (rather than count). → Aggregate sums in addition to counts during two-pointer scan.
- XOR of edge weights instead of sum, equals K. → Replace sort/two-pointer with XOR trie.
- Online (tree mutating). → Much harder; use top trees or Euler-tour trees.
- K-th shortest path. → Different problem; rarely tractable on trees with centroid.
Product Extension
- Phylogenetics: counting pairs of species within evolutionary distance K
- Network distance queries on hierarchical trees
- Distance-based recommendation systems on tree-like ontologies
Language/Runtime Follow-ups
- Python: sort + two-pointer per level; constant factor is the killer. C++ recommended for N ≥ 10^4.
- C++: standard ICPC implementation; ~100 lines.
- Recursive DFS: centroid decomposition depth O(log N), but inner DFS depth O(N) — limit must accommodate.
Common Bugs
- Forgot to recompute size[] for each subtree. Sizes from before removal are stale.
- Centroid finder doesn’t follow the right child. Must descend toward the largest remaining subtree.
- removed[v] check forgotten in DFS: revisits removed centroids.
- Off-by-one in pair counting (counting self-pair). Handle separately.
- Inclusion-exclusion wrong sign. Add all, subtract per-subtree.
- Stack overflow on deep recursion. Convert inner DFS to iterative.
Debugging Strategy
- For small N, compare against brute force at each level
- Log the centroid chosen at each call
- Verify subtree sizes recomputed correctly (print before find_centroid)
- For two-pointer: print sorted distances and the (i, j) cursor trajectory
Mastery Criteria
- Implement centroid decomposition in ≤ 60 min from memory
- Explain the inclusion-exclusion trick for path counting
- Identify problems amenable to centroid decomposition (offline path queries on static tree)
- Distinguish from HLD: HLD is online with edge updates; centroid is offline/path-counting
- State complexity precisely: O(N log² N) typical
Lab 05 — Suffix Automaton
Goal
Build a Suffix Automaton (SAM) for a string, and use it to count the number of distinct substrings in O(N).
Background
A suffix automaton is the minimal DFA that accepts exactly the suffixes of a given string. Discovered by Blumer et al. (1985).
Key facts:
- O(N) states and O(N) transitions for a string of length N, regardless of alphabet size (at most 2N − 1 states and, for N ≥ 3, at most 3N − 4 transitions)
- Each state corresponds to an equivalence class of right-extensions of substrings
- Distinct substring count = sum of len[state] - len[link[state]] over all states (excluding the initial state)
It’s the most powerful string data structure in competitive programming. Sometimes compared with suffix arrays + LCP arrays (which solve many of the same problems with different constants).
Interview Context
- Heavy ICPC / Codeforces presence
- Bioinformatics / genome alignment research
- Rarely in industry; Bloomberg may ask, but usually accepts suffix array
- Cryptography (some sequence-counting problems)
When to Skip This Topic
Skip if any of these are true:
- You’re not targeting ICPC or string-research roles
- You haven’t learned suffix arrays yet (lower-hanging fruit; more interview-relevant)
- You don’t have 2+ weeks to internalize this
SAM is conceptually the deepest topic in this phase. The implementation is short; the understanding is hard. Don’t fake it.
Problem Statement
Count Distinct Substrings.
Given a string s, count the number of distinct non-empty substrings.
Example: s = "abc" → substrings {“a”, “b”, “c”, “ab”, “bc”, “abc”} → 6.
s = "aaa" → substrings {“a”, “aa”, “aaa”} → 3.
Constraints
- 1 ≤ |s| ≤ 10^6
- Lowercase English (or larger alphabet — affects transition storage)
- Wall-clock: < 1 sec
Clarifying Questions
- Empty substring counts? (Usually no.)
- Substrings or distinct substrings? (Distinct; non-distinct is trivial: N(N+1)/2.)
- Alphabet size? (26 for English; affects map-vs-array tradeoff.)
Examples
"abc" → 6 (a, b, c, ab, bc, abc)
"aaa" → 3 (a, aa, aaa)
"abab" → 7 (a, b, ab, ba, aba, bab, abab)
"" → 0
Brute Force
Generate all substrings, insert into a hash set, return size. There are O(N²) substrings, and hashing each one costs time proportional to its length → O(N³) worst case; rolling hashes bring this down to O(N²) expected. For N = 10^6: dead.
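A minimal sketch of the brute force described above, useful mainly as the oracle for stress tests. The function name is illustrative, not part of the lab's reference code.

```python
def count_distinct_substrings_bruteforce(s: str) -> int:
    """Collect every non-empty substring in a set and return the set size."""
    seen = set()
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            seen.add(s[start:end])  # slicing copies, so this is O(length) per substring
    return len(seen)


if __name__ == "__main__":
    for text in ["abc", "aaa", "abab", ""]:
        print(repr(text), count_distinct_substrings_bruteforce(text))
```

Keep it for N up to a few hundred; beyond that the set of substrings itself becomes the bottleneck.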
Brute Force Complexity
- Time: O(N²) to O(N³)
- Space: O(N²) for the set
Optimization Path
Build the suffix automaton:
- Each state represents an equivalence class of substring occurrences
- For each state s (except the initial), the number of distinct substrings ending at this state’s set of right-positions is len[s] - len[link[s]]
- Sum these for all states → distinct substring count
Final Expected Approach
class SuffixAutomaton:
    def __init__(self):
        self.size = 1
        self.last = 0
        self.len = [0]
        self.link = [-1]
        self.next = [{}]

    def extend(self, c):
        cur = self.size
        self.size += 1
        self.len.append(self.len[self.last] + 1)
        self.link.append(-1)
        self.next.append({})
        p = self.last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][c]
            if self.len[p] + 1 == self.len[q]:
                self.link[cur] = q
            else:
                clone = self.size
                self.size += 1
                self.len.append(self.len[p] + 1)
                self.link.append(self.link[q])
                self.next.append(dict(self.next[q]))
                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def count_distinct_substrings(self):
        return sum(self.len[i] - self.len[self.link[i]] for i in range(1, self.size))

# Usage
sam = SuffixAutomaton()
for c in "abab":
    sam.extend(c)
print(sam.count_distinct_substrings())  # 7
Data Structures
- len[]: longest substring represented by each state
- link[]: suffix link (analogous to failure link in Aho-Corasick)
- next[]: transition map per state (dict or array of 26)
Correctness Argument
The SAM construction is non-trivial. The key invariants:
- After processing prefix of length k, the SAM recognizes exactly the suffixes of that prefix
- Each state’s len[s] - len[link[s]] counts the number of distinct substrings whose right-extension class is exactly this state
- Summing across all non-initial states gives total distinct substrings
The cloning step (when len[p] + 1 != len[q]) splits a state to maintain the equivalence class property — without it, the automaton wouldn’t be canonical.
A rigorous proof is in Maxime Crochemore’s textbook. Accept the construction; verify on small cases.
Complexity
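- Construction: O(N) amortized for a constant-size alphabet. With per-state hash maps it is expected O(N); with ordered maps, O(N log |Σ|); with fixed arrays, O(N · |Σ|), because every new state allocates and every clone copies a full array
- Distinct substring count: O(N) after construction
- Space: O(N) with map transitions; O(N · |Σ|) with array transitions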
Implementation Requirements
- Use dict per state for arbitrary alphabets, or [None] * 26 for English
- Allocate state arrays incrementally (or pre-allocate 2N for safety)
- Cloning step is the most error-prone — verify on small cases
- Avoid recursion; SAM is naturally iterative
Tests
- "a" → 1
- "aa" → 2
- "ab" → 3
- "abc" → 6
- "aaa" → 3
- "abab" → 7
- "abcabc" → 15
- Stress: random strings of N=100 against brute force
- Performance: N = 10^6 single character → finishes in < 1 sec
Follow-up Questions
- Find longest common substring of two strings. → Build SAM of one; walk through the other tracking match length.
- Number of occurrences of each substring. → Give every non-clone state a count of 1, then propagate counts up the suffix-link tree in order of decreasing len; a state’s total is the size of its endpos set.
- K-th lexicographically smallest substring. → DFS on SAM with character ordering + count of substrings reachable.
- Substring matching count for many queries. → Walk pattern in SAM; if it ends, answer is the size of its “endpos” set.
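The longest-common-substring follow-up can be sketched as below, under the assumption that a function-style rebuild of the automaton is acceptable so the block stays self-contained. The names build_sam and longest_common_substring are illustrative; on ties the first longest match found in the second string is returned.

```python
def build_sam(s):
    """Build a suffix automaton; returns (length, link, trans) lists indexed by state."""
    length, link, trans = [0], [-1], [{}]
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))  # copy, never share the dict
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return length, link, trans


def longest_common_substring(a, b):
    """Walk b through the automaton of a, tracking the current match length."""
    length, link, trans = build_sam(a)
    state, matched = 0, 0
    best, best_end = 0, 0
    for i, ch in enumerate(b):
        # Shrink the current match along suffix links until ch can be appended.
        while state != 0 and ch not in trans[state]:
            state = link[state]
            matched = length[state]
        if ch in trans[state]:
            state = trans[state][ch]
            matched += 1
        else:
            matched = 0  # we are at the initial state and ch never occurs in a
        if matched > best:
            best, best_end = matched, i + 1
    return b[best_end - best:best_end]


if __name__ == "__main__":
    print(longest_common_substring("xabcdy", "zzabcdq"))
```

The walk is O(|a| + |b|) for a fixed alphabet, because every suffix-link step strictly shortens the current match.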
Product Extension
- Genome assembly: distinct k-mers, longest common substrings across reads
- Plagiarism detection
- Compression (LZ-family algorithms use suffix structures)
- Search engine n-gram indexing (less common; usually suffix array)
Language/Runtime Follow-ups
- C++: use int next[][26] for the alphabet; fast.
- Python: dict transitions; constant factor allows N ≤ 10^5 comfortably.
- Java: TreeMap or HashMap; arrays preferred for fixed alphabet.
Common Bugs
- Forgot to clone: when len[p]+1 != len[q], failing to clone breaks the automaton.
- Wrong update of last: must always be cur, not clone.
- Suffix link of cur set incorrectly: subtle; verify against reference.
- Used same dict reference for clone: must dict(self.next[q]) (copy).
- Off-by-one in distinct substring sum: start from state 1, not 0 (state 0 is the initial state and represents the empty substring).
Debugging Strategy
- Print states with (len, link, transitions) after each extend call
- Visualize the suffix-link tree (parent = link[state])
- Verify against brute force for N ≤ 20
- For the cloning step: log when it triggers and which state is split
Mastery Criteria
- Implement SAM construction in ≤ 45 min from memory (it’s short but error-prone)
- Explain the role of suffix links and the cloning step
- Apply SAM to: distinct substring count, occurrence count, LCS of two strings
- State complexity precisely (O(N) states and O(N) transitions; O(N) construction for a constant-size alphabet; O(N|Σ|) memory with array transitions)
- Distinguish SAM from suffix arrays in problem-applicability
Lab 06 — Advanced DP Optimization
Goal
Apply three classical DP optimization techniques — Convex Hull Trick (CHT), Knuth’s optimization, and Divide & Conquer DP — to reduce a polynomial-time DP from O(N²) or O(N³) to O(N log N) or O(N²).
Background
Many DPs have the form dp[i] = min_j (dp[j] + cost(j, i)) for various cost functions. The naive scan is O(N) per i, giving O(N²) total. Three techniques exploit structure in cost:
- Convex Hull Trick: when cost(j, i) = a[j] * x[i] + b[j] (linear in x[i]), the transitions form lines; the min is the lower envelope, queryable in O(log N) or O(1) amortized per query.
- Knuth’s optimization: for interval DPs whose cost satisfies the quadrangle inequality (cost(a,c) + cost(b,d) ≤ cost(a,d) + cost(b,c) for a ≤ b ≤ c ≤ d) and is monotone on nested intervals; the optimal split points are then monotonic. Reduces O(N³) to O(N²).
- Divide & Conquer DP: when optimal split points are monotonic within a layer (opt[k][i] ≤ opt[k][i+1]). Reduces O(KN²) to O(KN log N).
Interview Context
- Codeforces / ICPC: regular
- Quant: heavy presence in trading-cost optimization, risk allocation
- Compiler: loop scheduling sometimes uses CHT
- Database: query optimizer cost minimization (rare)
Almost never in standard interviews.
When to Skip This Topic
Skip if any of these are true:
- You aren’t already fluent in 1D and 2D DP (Phase 5 prerequisites)
- You’re not targeting ICPC / quant / compiler optimization roles
- You don’t have 2+ weeks to drill multiple variants
These are families of techniques; each requires several practice problems to internalize.
Problem Statement
Three variants, one for each technique:
Variant A — CHT (Convex Hull Trick)
You drive along a road with N houses. House i is at position x[i] (sorted). You can rent a car at house i for cost c[i] + d[i] * (x[j] - x[i]) if you drive from house i to house j > i. Starting at house 1, what’s the minimum cost to reach house N?
dp[j] = min over i < j of (dp[i] + c[i] + d[i] * (x[j] - x[i]))
= min over i < j of (d[i] * x[j] + (dp[i] + c[i] - d[i] * x[i]))
This is linear in x[j], so CHT applies. O(N) when both the slopes d[i] and the queries x[j] are monotonic; O(N log N) otherwise.
Variant B — Knuth’s Optimization
Optimal Binary Search Tree. Given keys with access probabilities, build a BST minimizing expected access cost.
dp[i][j] = min over i ≤ k ≤ j of (dp[i][k-1] + dp[k+1][j]) + sum(p[i..j])
Naive O(N³). Knuth: O(N²) if opt[i][j] is monotonic, which holds when cost is quadrangle-inequality compliant.
Variant C — Divide & Conquer DP
Minimum K Partitions. Partition array a[1..N] into exactly K contiguous segments, minimizing sum of “cost” of each segment, where cost(l, r) satisfies the monotonic-opt property.
dp[k][i] = min over j < i of (dp[k-1][j] + cost(j+1, i))
Naive O(KN²). D&C DP: O(KN log N).
Constraints
- A: 1 ≤ N ≤ 10^5
- B: 1 ≤ N ≤ 5×10^3
- C: 1 ≤ N ≤ 5×10^3, 1 ≤ K ≤ N
Clarifying Questions
A: Are x[i] strictly sorted? Are d[i] non-negative?
B: Are probabilities normalized? Distinct keys?
C: Is cost precomputable in O(1) after O(N²) prep? Quadrangle inequality verified?
Examples
A (CHT)
positions: [0, 5, 10, 20]
c = [0, 3, 2, _]; d = [1, 1, 2, _]
dp[1] = 0
dp[2] = 0 + 0 + 1*(5-0) = 5
dp[3] = min(0+0+1*10, 5+3+1*5) = 10
dp[4] = min(0+0+1*20, 5+3+1*15, 10+2+2*10) = 20 vs 23 vs 32 → 20
B (Knuth) — verify on a small probability vector.
C (D&C DP) — verify on contrived cost.
Brute Force
A: O(N²) DP scan. B: O(N³) standard. C: O(KN²) standard.
Brute Force Complexity
For N = 10^5 in A: O(N²) = 10^10, TLE. For N = 5×10^3 in B (and in C when K is close to N): O(N³) = 1.25×10^11, TLE.
Optimization Path
A (CHT):
Maintain a lower convex hull of lines y = d[i] * x + (dp[i] + c[i] - d[i] * x[i]). For each query x = x[j], find the line with minimum y at that x.
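- If the slopes d[i] arrive in monotonic order and the queries x[j] are monotonic: use a deque-based hull with a moving pointer for O(1) amortized per query → O(N) total.
- If the slopes are monotonic but the queries are arbitrary: binary search on the hull → O(N log N).
- If the slopes arrive in arbitrary order: use a Li Chao tree, O(log N) per insert and query → O(N log N).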
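Variant A does not promise monotonic slopes, so a hedged sketch with a Li Chao tree over the known query points is shown here rather than the deque version. The class and function names are illustrative, c and d are passed with a dummy entry for the last house, and the quadratic scan is included as the side-by-side oracle.

```python
import math


class LiChaoMin:
    """Li Chao tree for min queries, restricted to a fixed set of x-values."""

    def __init__(self, xs):
        self.xs = sorted(set(xs))
        self.tree = [None] * (4 * len(self.xs))  # node -> (slope, intercept)

    @staticmethod
    def _at(line, x):
        return line[0] * x + line[1]

    def insert(self, slope, intercept):
        line = (slope, intercept)
        node, lo, hi = 1, 0, len(self.xs) - 1
        while True:
            if self.tree[node] is None:
                self.tree[node] = line
                return
            mid = (lo + hi) // 2
            # Keep whichever line is lower at the midpoint; carry the other down.
            if self._at(line, self.xs[mid]) < self._at(self.tree[node], self.xs[mid]):
                self.tree[node], line = line, self.tree[node]
            if lo == hi:
                return
            kept = self.tree[node]
            if self._at(line, self.xs[lo]) < self._at(kept, self.xs[lo]):
                node, hi = 2 * node, mid
            elif self._at(line, self.xs[hi]) < self._at(kept, self.xs[hi]):
                node, lo = 2 * node + 1, mid + 1
            else:
                return  # the carried line never wins on this range

    def query(self, x):
        node, lo, hi = 1, 0, len(self.xs) - 1
        best = math.inf
        while True:
            if self.tree[node] is not None:
                best = min(best, self._at(self.tree[node], x))
            if lo == hi:
                return best
            mid = (lo + hi) // 2
            if x <= self.xs[mid]:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid + 1


def min_cost_cht(x, c, d):
    """dp[j] = min over i < j of d[i] * x[j] + (dp[i] + c[i] - d[i] * x[i])."""
    n = len(x)
    hull = LiChaoMin(x)
    dp = [0] * n
    for j in range(n):
        if j > 0:
            dp[j] = hull.query(x[j])
        if j < n - 1:  # the last house never rents out a car
            hull.insert(d[j], dp[j] + c[j] - d[j] * x[j])
    return dp[-1]


def min_cost_bruteforce(x, c, d):
    n = len(x)
    dp = [0] * n
    for j in range(1, n):
        dp[j] = min(dp[i] + c[i] + d[i] * (x[j] - x[i]) for i in range(j))
    return dp[-1]


if __name__ == "__main__":
    print(min_cost_cht([0, 5, 10, 20], [0, 3, 2, 0], [1, 1, 2, 0]))
```

Integer inputs keep every comparison exact in Python; in C++ the same structure needs 64-bit (or wider) arithmetic for slope times x.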
B (Knuth):
Compute dp[i][j] for increasing j - i. For each (i, j), only try splits in [opt[i][j-1], opt[i+1][j]]. Amortized O(N²) instead of O(N³).
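A sketch of Knuth's optimization for the optimal BST recurrence, assuming non-negative weights and half-open intervals [i, j) so that empty subtrees cost 0. The function names are illustrative, and the cubic version is kept alongside as the oracle.

```python
def optimal_bst_naive(p):
    """O(N^3) reference: dp[i][j] is the optimal cost for keys i..j-1."""
    n = len(p)
    prefix = [0] * (n + 1)
    for i, w in enumerate(p):
        prefix[i + 1] = prefix[i] + w
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for size in range(1, n + 1):
        for i in range(n - size + 1):
            j = i + size
            best = min(dp[i][k] + dp[k + 1][j] for k in range(i, j))
            dp[i][j] = best + prefix[j] - prefix[i]
    return dp[0][n]


def optimal_bst_knuth(p):
    """O(N^2): the root for [i, j) is searched only in [opt[i][j-1], opt[i+1][j]]."""
    n = len(p)
    if n == 0:
        return 0
    prefix = [0] * (n + 1)
    for i, w in enumerate(p):
        prefix[i + 1] = prefix[i] + w
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    opt = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        dp[i][i + 1] = p[i]   # a single key is its own root
        opt[i][i + 1] = i
    for size in range(2, n + 1):          # process diagonals by interval length
        for i in range(n - size + 1):
            j = i + size
            best, best_root = None, -1
            for k in range(opt[i][j - 1], opt[i + 1][j] + 1):
                cand = dp[i][k] + dp[k + 1][j]  # left keys i..k-1, right keys k+1..j-1
                if best is None or cand < best:
                    best, best_root = cand, k
            dp[i][j] = best + prefix[j] - prefix[i]
            opt[i][j] = best_root
    return dp[0][n]


if __name__ == "__main__":
    print(optimal_bst_knuth([34, 8, 50]))
```

Integer weights (access counts) are used instead of probabilities so that the comparison with the naive version is exact.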
C (D&C DP):
For layer k, define solve(lo, hi, opt_lo, opt_hi): compute dp[k][lo..hi], knowing the optimal split for each is in [opt_lo, opt_hi]. Compute dp[k][m] for the midpoint m by scanning [opt_lo, opt_hi] and record opt[m], then recurse on solve(lo, m-1, opt_lo, opt[m]) and solve(m+1, hi, opt[m], opt_hi). Each level of recursion is O(N) work; depth is O(log N) → O(N log N) per k, O(KN log N) total.
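The recursion above can be sketched as follows. The lab leaves cost unspecified, so this example assumes cost(l, r) = (sum of a[l..r])² with non-negative a, which satisfies the quadrangle inequality; all function names are illustrative.

```python
import math


def _solve_layer(prev, cur, cost, lo, hi, opt_lo, opt_hi):
    """Fill cur[lo..hi], knowing each optimal split lies in [opt_lo, opt_hi]."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    best, best_j = math.inf, opt_lo
    for j in range(opt_lo, min(mid - 1, opt_hi) + 1):
        cand = prev[j] + cost(j + 1, mid)
        if cand < best:
            best, best_j = cand, j
    cur[mid] = best
    _solve_layer(prev, cur, cost, lo, mid - 1, opt_lo, best_j)
    _solve_layer(prev, cur, cost, mid + 1, hi, best_j, opt_hi)


def min_partition_cost(a, k):
    """Split a into exactly k contiguous non-empty segments, minimizing total cost."""
    n = len(a)
    prefix = [0] * (n + 1)
    for i, v in enumerate(a):
        prefix[i + 1] = prefix[i] + v

    def cost(l, r):  # 1-indexed, inclusive; assumed cost function
        s = prefix[r] - prefix[l - 1]
        return s * s

    prev = [math.inf] * (n + 1)
    prev[0] = 0  # zero elements, zero segments
    for _ in range(k):
        cur = [math.inf] * (n + 1)
        _solve_layer(prev, cur, cost, 1, n, 0, n - 1)
        prev = cur
    return prev[n]


def min_partition_cost_naive(a, k):
    """O(K N^2) reference with the same cost function."""
    n = len(a)
    prefix = [0] * (n + 1)
    for i, v in enumerate(a):
        prefix[i + 1] = prefix[i] + v
    prev = [math.inf] * (n + 1)
    prev[0] = 0
    for _ in range(k):
        cur = [math.inf] * (n + 1)
        for i in range(1, n + 1):
            for j in range(i):
                cand = prev[j] + (prefix[i] - prefix[j]) ** 2
                if cand < cur[i]:
                    cur[i] = cand
        prev = cur
    return prev[n]


if __name__ == "__main__":
    print(min_partition_cost([1, 2, 3, 4], 2))
```

Only two layers are kept in memory at a time; the recursion depth is O(log N), so Python's default recursion limit is not a concern here.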
Final Expected Approach
(See solution sketches inline in the variants above. Full implementations are 80–150 lines each in C++.)
Data Structures
- A (CHT): deque of lines + intersection-checking helper
- B (Knuth): dp[N][N], opt[N][N]
- C (D&C): dp[K][N], recursive solver
Correctness Argument
- CHT: A line y = mx + b is “dominated” if at every x in the query range some other line is at or below it. The lower envelope contains exactly the non-dominated lines, ordered by slope (for a min query, slopes decrease as x increases).
- Knuth: the quadrangle inequality implies monotonicity of opt. Proof in TAOCP Vol 3.
- D&C DP: if opt[i] ≤ opt[i+1] (monotonicity), the midpoint’s optimum bounds the candidate range for both halves, so each recursion level does O(N) work across O(log N) levels.
Complexity
| Variant | Naive | Optimized |
|---|---|---|
| A | O(N²) | O(N) or O(N log N) |
| B | O(N³) | O(N²) |
| C | O(KN²) | O(KN log N) |
Implementation Requirements
- CHT: handle slopes carefully (sorted vs not); avoid division for intersections (use cross-multiplication with care for overflow)
- Knuth: process diagonals in order of length; verify monotonicity in a debug build
- D&C DP: pass index ranges + opt ranges; base case is lo > hi
Tests
- Small N where brute force confirms answer
- Edge: N = 1 (trivial answer)
- All zero costs / probabilities
- Monotonically increasing / decreasing costs
- Stress: random instances at N = 1000 — compare optimized vs brute force
Follow-up Questions
- CHT for max instead of min. → Maintain upper convex hull; symmetric.
- Lines added in arbitrary slope order. → Use Li Chao Tree; O(log N) per insert and query.
- Knuth not applicable (no quadrangle). → Either D&C DP (if opt is monotonic) or SMAWK (O(N) for totally monotone matrices).
- D&C DP combined with CHT. → Possible; “Aliens trick” / Lagrangian relaxation.
Product Extension
- CHT: dynamic programming in online ad bidding optimization, trading strategy optimization
- Knuth: BST construction in language tools (rarely; usually use B-trees or hash maps)
- D&C DP: optimal segmentation in time-series anomaly detection, network topology design
Language/Runtime Follow-ups
- C++: all three implementable with stdlib. CHT often uses __int128 to avoid overflow in intersection checks.
- Python: D&C DP works but with significant constant factor; CHT with Li Chao is feasible.
- Java: BigInteger for safety on overflow-prone intersection checks.
Common Bugs
- CHT: integer overflow in line intersection. Use long doubles or 128-bit.
- CHT: deque pops the wrong end when slopes are descending vs ascending.
- Knuth: didn’t verify the quadrangle inequality. Algorithm gives wrong answer silently.
- Knuth: opt-range boundaries inclusive vs exclusive — off-by-one.
- D&C: passed wrong opt range to recursive calls. Loses the monotonicity benefit.
- D&C: base case (lo > hi) doesn’t return.
Debugging Strategy
- For each technique, write a brute-force version side by side
- Stress test with random small instances and assert equality
- For CHT, print the hull after each insertion
- For Knuth/D&C, log the chosen split points and verify monotonicity
Mastery Criteria
- Recognize when a DP qualifies for each optimization
- Implement CHT in ≤ 40 min
- Implement Knuth in ≤ 30 min
- Implement D&C DP in ≤ 30 min
- State the prerequisite condition for each (linearity, quadrangle, monotonic-opt)
- Estimate runtime for given N, K
Lab 07 — FFT / Polynomial Multiplication
Goal
Implement the Cooley-Tukey FFT to multiply two polynomials in O(N log N), and apply it to convolution-based problems (large-integer multiplication, string matching with wildcards).
Background
The Discrete Fourier Transform (DFT) of a length-N vector evaluates the corresponding polynomial at N roots of unity. Multiplying two polynomials of degree N-1 via naive convolution is O(N²); via DFT, point-wise multiply, inverse-DFT, it’s O(N log N).
Cooley-Tukey (1965): divide and conquer the DFT. The radix-2 version requires N a power of 2.
Number-Theoretic Transform (NTT): FFT over a prime field; avoids floating-point error; common in competitive programming.
Interview Context
- Codeforces / ICPC: regular (NTT version)
- Signal processing roles (DSP, audio, image): expected
- Cryptography research: standard tool
- Quant: large-integer multiplication, time-series convolutions
- Standard FAANG: essentially zero
When to Skip This Topic
Skip if any of these are true:
- You aren’t targeting signal-processing, cryptography, or ICPC roles
- You haven’t implemented divide-and-conquer recursive algorithms confidently
- You’re rusty on complex number arithmetic
This is a “you need it or you don’t” topic. Most interview prep should skip.
Problem Statement
Polynomial Multiplication.
Given two polynomials A(x) = sum a[i] * x^i and B(x) = sum b[i] * x^i, compute their product C(x) = A(x) * B(x).
Equivalently: compute the convolution c[k] = sum_{i+j=k} a[i] * b[j].
Constraints
- Degrees up to 10^5 or 10^6
- Coefficients fit in int32 (double-precision FFT is exact after rounding only while N · max|a| · max|b| stays well below 2^53; beyond that, split the coefficients or use NTT)
- Wall-clock: < 1 sec
Clarifying Questions
- Integer coefficients or real-valued? (Integer for NTT, real for FFT.)
- Exact answer required? (Yes for NTT; FFT introduces floating error.)
- Output as polynomial coefficients or as a value at specific x? (Coefficients.)
Examples
A = [1, 2, 3] (1 + 2x + 3x²)
B = [4, 5] (4 + 5x)
C = [4, 13, 22, 15] (4 + 13x + 22x² + 15x³)
A = [1, 1] (1 + x)
B = [1, 1]
C = [1, 2, 1] (1 + x)² = 1 + 2x + x²
Brute Force
Nested loops: c[i+j] += a[i] * b[j]. O(N²). For N = 10^5: 10^10 ops — TLE.
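A minimal sketch of the nested-loop convolution, kept as the exact oracle for checking any FFT or NTT implementation. The function name is illustrative.

```python
def multiply_naive(a, b):
    """Schoolbook polynomial product: c[i + j] += a[i] * b[j]."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


if __name__ == "__main__":
    print(multiply_naive([1, 2, 3], [4, 5]))
```

Python integers never overflow, so this version stays exact for any coefficient size.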
Brute Force Complexity
- Time: O(N²)
- Space: O(N)
Optimization Path
- Pad both A and B to length 2^k ≥ deg(A) + deg(B) + 1.
- Compute DFT(A) and DFT(B) using Cooley-Tukey.
- Pointwise multiply: F[i] = DFT(A)[i] * DFT(B)[i].
- Compute IDFT(F) to recover convolution C.
Final Expected Approach
import math

def fft(a, invert=False):
    """In-place iterative radix-2 FFT; len(a) must be a power of 2."""
    n = len(a)
    if n == 1: return
    # bit-reverse permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly
    length = 2
    while length <= n:
        angle = 2 * math.pi / length * (-1 if invert else 1)
        wlen = complex(math.cos(angle), math.sin(angle))
        for i in range(0, n, length):
            w = complex(1)
            for k in range(length // 2):
                u = a[i + k]
                v = a[i + k + length // 2] * w
                a[i + k] = u + v
                a[i + k + length // 2] = u - v
                w *= wlen
        length <<= 1
    if invert:
        for i in range(n):
            a[i] /= n

def multiply(a, b):
    # smallest power of 2 that holds all len(a) + len(b) - 1 result coefficients
    result_size = 1
    while result_size < len(a) + len(b):
        result_size <<= 1
    fa = [complex(x) for x in a] + [complex(0)] * (result_size - len(a))
    fb = [complex(x) for x in b] + [complex(0)] * (result_size - len(b))
    fft(fa)
    fft(fb)
    for i in range(result_size):
        fa[i] *= fb[i]
    fft(fa, invert=True)
    return [round(x.real) for x in fa[:len(a) + len(b) - 1]]
For NTT (exact integer convolution), replace complex roots of unity with primitive roots in F_p for a prime p = c * 2^k + 1 (common: 998244353 with primitive root 3).
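A hedged sketch of that NTT variant, using the prime 998244353 and primitive root 3 named above. The function names are illustrative, and results are returned modulo the prime, so they equal the true coefficients only while those stay below 998244353.

```python
MOD = 998244353  # 119 * 2^23 + 1
ROOT = 3         # primitive root modulo MOD


def ntt(a, invert=False):
    """In-place iterative NTT; len(a) must be a power of 2 (at most 2^23)."""
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reverse permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Element of order exactly `length`; invert it for the inverse transform.
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % MOD
                a[start + k] = (u + v) % MOD
                a[start + k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def multiply_mod(a, b):
    """Convolution of a and b with every coefficient reduced modulo MOD."""
    if not a or not b:
        return []
    need = len(a) + len(b) - 1
    size = 1
    while size < need:
        size <<= 1
    fa = [x % MOD for x in a] + [0] * (size - len(a))
    fb = [x % MOD for x in b] + [0] * (size - len(b))
    ntt(fa)
    ntt(fb)
    fa = [x * y % MOD for x, y in zip(fa, fb)]
    ntt(fa, invert=True)
    return fa[:need]


if __name__ == "__main__":
    print(multiply_mod([1, 2, 3], [4, 5]))
```

The structure mirrors the complex FFT line for line; only the root of unity and the arithmetic change, which is why the two are usually memorized together.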
Data Structures
- Array of complex numbers (FFT) or integers mod p (NTT)
- Bit-reverse permutation index
Correctness Argument
- DFT linearity: DFT(A + B) = DFT(A) + DFT(B).
- Convolution theorem: DFT(A * B) = DFT(A) ⊙ DFT(B) (pointwise).
- Inverse: IDFT(DFT(A)) = A.
- Cooley-Tukey: recursive split into even/odd indices; combines via roots of unity.
Complexity
- Time: O(N log N)
- Space: O(N)
Implementation Requirements
- N must be a power of 2; pad with zeros
- Bit-reversal permutation correctly implemented
- Iterative butterflies (recursive is fine for small N but slow for large)
- For floating-point FFT: round to nearest integer at the end; verify error bound
- For NTT: every true result coefficient (up to N × max|a| × max|b|) must be smaller than the prime; otherwise combine several primes with CRT
Tests
- Multiply [1, 1] × [1, 1] = [1, 2, 1]
- Multiply degree-3 polynomials by hand-verified product
- Stress: random N = 10^4 polynomials vs O(N²) brute force; assert equality
- N = 1 (constants only)
- All zeros (result all zeros)
- Performance: N = 2×10^5 in < 1 sec
Follow-up Questions
- Exact integer convolution with large coefficients. → NTT under several primes (typically three) combined via CRT, or FFT with coefficients split into high and low halves.
- String matching with wildcards. → Reduce to convolution; each char becomes a numeric encoding; wildcard = 0; a position matches when the sum of p[i] · t[j] · (p[i] − t[j])² over the aligned window is 0.
- Multi-dimensional FFT (image convolution). → Apply 1D FFT along each axis.
- Fast multiplication of very large integers. → Schönhage-Strassen uses FFT; Fürer’s algorithm is asymptotically faster.
- Subset sum convolution. → Walsh-Hadamard transform; different beast.
Product Extension
- Audio processing (spectrograms, filters)
- Image processing (Gaussian blur, edge detection)
- Cryptography (large-integer multiplication for RSA, ECC)
- Time-series analysis (autocorrelation)
- Big-integer libraries (GMP switches to FFT-based multiplication for sufficiently large operands; the threshold is tuned per platform)
Language/Runtime Follow-ups
- C++: the standard choice; hand-written iterative FFT or NTT.
- Python: pure-Python FFT is slow; for N > 10^4 use numpy.fft. NumPy’s FFT is C-optimized and sometimes acceptable as the “library” answer if the interviewer allows.
- Java: Apache Commons Math has FFT.
- JavaScript: rare in interviews; libraries exist.
Common Bugs
- Bit-reverse permutation wrong: off-by-one in the swap loop.
- Forgot to divide by N in inverse: result is N× too large.
- N not a power of 2: padding error.
- Floating-point error too large: for coefficients near max int32, need careful rounding or NTT.
- NTT primitive root wrong: N must divide p − 1, and the transform root must have order exactly N, e.g. g^((p−1)/N) for a primitive root g.
- Result length wrong: should be len(A) + len(B) - 1, but FFT computed over N ≥ that.
Debugging Strategy
- Verify FFT then IFFT recovers the input (within floating tolerance)
- Multiply by [1] and verify output equals input
- Compare against numpy.fft on small inputs
- For NTT: compute small examples by hand using primitive roots
Mastery Criteria
- Implement Cooley-Tukey FFT in ≤ 45 min
- Implement NTT in ≤ 60 min
- State the convolution theorem
- Identify problems solvable via FFT/NTT (convolution, large-int mult, string matching with errors)
- Reason about numerical precision for FFT vs NTT
- Estimate runtime for given N
Lab 08 — Advanced Geometry
Goal
Implement two foundational computational geometry primitives:
- Convex Hull via Andrew’s monotone chain (O(N log N))
- Segment Intersection (Bentley-Ottmann sweep-line, O((N + K) log N))
Background
Computational geometry interviews are rare but exist at:
- Companies doing CAD, CAM, 3D modeling (Autodesk, Adobe)
- Games (Unity, Epic) — physics, collision
- Robotics (path planning, occupancy grids)
- Maps/GIS (Google Maps, ESRI)
- Some quant for time-series geometric techniques
The implementation has many sharp edges: floating-point comparison, collinear points, degenerate cases. Robust geometry is a deep subfield.
Interview Context
- Codeforces / ICPC: geometry round often included
- Game / CAD / robotics roles: foundational
- Standard FAANG: almost never
When to Skip This Topic
Skip if any of these are true:
- You’re not targeting the specific industries above
- You’re uncomfortable with vector/cross product math
- You don’t have 2+ weeks to handle the edge cases properly
The first implementation of convex hull will have bugs. Plan for several practice attempts.
Problem Statement
Part A — Convex Hull
Given N points in 2D, return the vertices of their convex hull in counterclockwise order, starting from the leftmost point (lowest among ties).
Part B — Count Segment Intersections
Given N line segments, return the number of intersection points among them.
Constraints
- A: 1 ≤ N ≤ 10^5. Coordinates fit in int32.
- B: 1 ≤ N ≤ 10^5. Up to K = O(N²) intersections in pathological cases; for the algorithm to be efficient, K << N².
Clarifying Questions
A:
- Should collinear hull points be included or skipped?
- Are duplicates possible?
B:
- Should overlapping segments count as one intersection or many?
- Touching at endpoints?
Examples
A
Points: [(0,0), (1,1), (2,0), (1,-1)]
Hull: [(0,0), (1,-1), (2,0), (1,1)] (counterclockwise; all four points are hull vertices)
B
Segments: [((0,0),(4,4)), ((0,4),(4,0))]
Intersect at (2,2). Answer: 1.
Brute Force
A: gift-wrapping (Jarvis march) — O(N · H) where H = hull size. Worst case O(N²). B: check every pair — O(N²).
Brute Force Complexity
- A: O(NH) worst-case O(N²)
- B: O(N²)
For N = 10^5: 10^10 — TLE.
Optimization Path
A — Andrew’s Monotone Chain
- Sort points lexicographically by (x, y).
- Build lower hull: iterate left to right; pop the top of the hull while the last two points and the new point fail to make a strict left (counterclockwise) turn.
- Build upper hull: iterate right to left; pop under the same rule.
- Concatenate (excluding shared endpoints).
def cross(O, A, B):
    return (A[0]-O[0]) * (B[1]-O[1]) - (A[1]-O[1]) * (B[0]-O[0])

def convex_hull(points):
    points = sorted(set(points))
    if len(points) <= 1: return points
    lower = []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
B — Bentley-Ottmann Sweep Line
- Event queue: segment endpoints + computed intersections, ordered by x.
- Status: balanced BST of active segments, ordered by y at the sweep line.
- At a “left endpoint” event: insert into status; check intersection with neighbors above/below.
- At a “right endpoint” event: remove from status; check intersection between the new neighbors.
- At an “intersection” event: swap the two segments in status; check intersections with new neighbors.
O((N + K) log N) where K = number of intersections.
Final Expected Approach
Convex hull: as above.
Bentley-Ottmann: full implementation is 200+ lines with all edge cases. For most interviews, even discussing the structure is enough — full code is unlikely to be required.
Data Structures
- A: sorted list, stack-like list for hull construction
- B: priority queue (event queue), balanced BST or sorted list with O(log N) ops (status)
Correctness Argument
A
- Sorting by x (then y) ensures we visit points in a consistent order.
- “Left turn” (cross product > 0) means the chain stays convex in counterclockwise order; popping on cross ≤ 0 ensures we never keep a concave or collinear point.
- Lower and upper hulls cover all hull vertices; concatenation gives full hull in CCW order.
B
- Two segments can only intersect when they are adjacent in the y-order at some x.
- The sweep maintains adjacency; new adjacencies arise at endpoints and intersections.
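- Each intersection is detected no later than the moment its two segments become adjacent just before crossing; a pair can become adjacent more than once, so deduplicate queued intersection events.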
Complexity
- A: O(N log N) for sort + O(N) for chain construction.
- B: O((N + K) log N).
Implementation Requirements
- Use integer arithmetic for cross product when possible (avoids floating-point errors)
- Handle collinear hull points consistently (include or exclude as required)
- For segment intersection: distinguish proper intersection (interior) from touching (endpoint)
- Watch for vertical segments (event ordering edge case)
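The proper-versus-touching distinction can be sketched with an integer orientation test. This is the pairwise check only, not the sweep; the function names are illustrative, segments are assumed non-degenerate (distinct endpoints), and the brute-force counter counts intersecting pairs, which differs from the number of intersection points when several segments meet at one point.

```python
def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def on_segment(a, b, p):
    """True if p lies within the bounding box of segment ab (p assumed collinear with ab)."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def properly_intersect(p1, p2, p3, p4):
    """True only if the segments cross at a point interior to both."""
    d1, d2 = orientation(p3, p4, p1), orientation(p3, p4, p2)
    d3, d4 = orientation(p1, p2, p3), orientation(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def segments_intersect(p1, p2, p3, p4):
    """True if the segments share at least one point (crossing, touching, or overlapping)."""
    d1, d2 = orientation(p3, p4, p1), orientation(p3, p4, p2)
    d3, d4 = orientation(p1, p2, p3), orientation(p1, p2, p4)
    if d1 != d2 and d3 != d4:
        return True
    # Collinear special cases: an endpoint lying on the other segment.
    if d1 == 0 and on_segment(p3, p4, p1):
        return True
    if d2 == 0 and on_segment(p3, p4, p2):
        return True
    if d3 == 0 and on_segment(p1, p2, p3):
        return True
    if d4 == 0 and on_segment(p1, p2, p4):
        return True
    return False


def count_intersecting_pairs(segments):
    """O(N^2) oracle: number of unordered segment pairs that share a point."""
    total = 0
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if segments_intersect(*segments[i], *segments[j]):
                total += 1
    return total


if __name__ == "__main__":
    print(count_intersecting_pairs([((0, 0), (4, 4)), ((0, 4), (4, 0))]))
```

With integer coordinates every test is exact, which is the main reason to prefer this form over computing the intersection point with division.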
Tests
A
- Single point → [point]
- Two points → [both]
- Three collinear points → [endpoints only] (or all three, depending on convention)
- Square (4 corners + 5 interior) → 4 corners
- All points on a circle → all on hull
- Many duplicates
B
- No intersections (parallel lines)
- All intersect at one point (n lines through origin)
- Random N = 100 vs O(N²) brute force
Follow-up Questions
A:
- Convex hull in 3D. → Quickhull or gift wrapping in 3D; O(N²) worst.
- Dynamic convex hull (online insertions). → Overmars-van Leeuwen; O(log² N) per update.
- Compute area of hull. → Shoelace formula.
- Diameter of hull (farthest pair). → Rotating calipers, O(N).
B:
- Report intersections, not just count. → Same structure; collect points.
- Line-line intersection in projective coords. → Avoids special-casing parallels.
- Segments may overlap. → More complex event handling.
- Robust orientation predicate. → Adaptive precision; Shewchuk’s predicates.
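The shoelace follow-up can be sketched in a few lines. The function names are illustrative; the input is assumed to be a simple polygon given in order, such as a hull in counterclockwise order.

```python
def twice_signed_area(polygon):
    """Shoelace sum: positive for counterclockwise order, negative for clockwise."""
    total = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total


def polygon_area(polygon):
    """Unsigned area; halving is deferred so integer inputs stay exact until the end."""
    return abs(twice_signed_area(polygon)) / 2


if __name__ == "__main__":
    print(polygon_area([(0, 0), (1, -1), (2, 0), (1, 1)]))
```

Returning twice the signed area from the helper also gives a cheap orientation check for a hull routine's output.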
Product Extension
- Mapping: route simplification (Ramer-Douglas-Peucker)
- Games: collision detection (broad phase uses sweep)
- CAD: boolean operations on polygons (sweep-based)
- Robotics: configuration space construction
Language/Runtime Follow-ups
- C++: geometry libraries (CGAL) are massive and correct but complex.
- Python: Shapely for production; for interview, implement primitives.
- Java: java.awt.geom.* has some primitives.
Common Bugs
- Floating-point comparison without epsilon: false negatives on coincident points.
- cross < 0 vs cross <= 0: different hull conventions (collinear in or out).
- Forgot to dedupe input points before convex hull.
- Sweep-line: vertical segments treated incorrectly. Add small perturbation or special-case.
- Sweep-line: intersection on the sweep line not detected because status ordering is computed at the wrong x.
Debugging Strategy
- Plot the points and hull (matplotlib or similar)
- For small cases, enumerate hull by hand and verify
- For sweep-line: log every event and the status before/after
Mastery Criteria
- Implement Andrew’s monotone chain in ≤ 30 min from memory
- Implement basic segment-intersection check in ≤ 15 min
- Understand the Bentley-Ottmann structure (even without writing 200 lines)
- Apply rotating calipers for hull diameter
- Recognize when integer arithmetic suffices vs when floating-point is unavoidable
- State complexity precisely
Lab 09 — ICPC Contest Simulation
Goal
Simulate a 5-hour ICPC-style contest: 6–10 problems of varying difficulty, single team, no internet, paper and pen allowed. Practice contest strategy: problem selection, time management, debugging under pressure.
Background
ICPC contests are the gold standard for competitive programming practice:
- 5 hours
- ~10 problems sorted A–J in roughly increasing difficulty (but not strictly)
- Penalty per wrong submission (20 minutes)
- Final ranking by # problems solved, then total time
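For self-scoring a practice contest, the ranking rule above can be sketched as below. The submission format and function name are hypothetical, and this sketch treats every non-AC verdict as a penalized wrong submission; penalties count only for problems that are eventually solved.

```python
def icpc_score(submissions, penalty_minutes=20):
    """Return (problems_solved, total_penalty_time).

    submissions: chronological list of (minute, problem, verdict) tuples,
    where verdict is "AC" for accepted and anything else counts as wrong.
    """
    wrong_before_ac = {}
    solved_time = {}
    for minute, problem, verdict in submissions:
        if problem in solved_time:
            continue  # submissions after the first AC are ignored
        if verdict == "AC":
            solved_time[problem] = minute + penalty_minutes * wrong_before_ac.get(problem, 0)
        else:
            wrong_before_ac[problem] = wrong_before_ac.get(problem, 0) + 1
    return len(solved_time), sum(solved_time.values())


if __name__ == "__main__":
    log = [(12, "A", "AC"), (40, "B", "WA"), (55, "B", "AC"), (200, "C", "WA")]
    print(icpc_score(log))
```

Teams are compared by more problems solved first, then by lower total time, so sorting on (-solved, time) reproduces the scoreboard order.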
This lab is a meta-lab: rather than teach an algorithm, it builds the contest operating system of the candidate.
Interview Context
ICPC training transfers to:
- Quant hiring (Jane Street, Citadel value ICPC experience)
- Google L6+ research interviews (sometimes)
- General algorithmic confidence under time pressure (transferable)
Direct application of contest mode to industry interviews: low. But the training effect is high.
When to Skip This Topic
Skip if any of these are true:
- You’re not interviewing at competitive-friendly firms
- You have less than a month for this phase
- You haven’t done Phase 11 mocks at your target level — those are higher leverage
If you do this lab, do it only after exhausting Phase 11.
Problem Statement
Run a 5-hour contest. Sources of problem sets:
- Codeforces Educational rounds
- Codeforces Div 2 (rounds 600+)
- ICPC regional sets on UVa Online Judge or DOMjudge replays
- AtCoder Beginner Contest (ABC): easier, 100 min
- AtCoder Regular Contest (ARC): medium, 120 min
- Kattis archive
Required mix for a 5-hour set:
- 2 trivial problems (sanity / warm-up; should solve in 15 min each)
- 3 medium problems (1 hour each)
- 2 hard problems (1.5+ hours, often unsolved)
- 1 problem-killer (often unsolved by anyone except top teams)
Constraints
- Time: 5 hours, single uninterrupted block
- No internet (except for the problem statements)
- Programming language of your choice (typically C++ for competitive)
- Penalty: simulate the 20-minute penalty per WA
- No external help / collaboration
Clarifying Questions (to yourself before starting)
- Which 2 problems will I attempt first? (Decision in 5 min of reading.)
- What’s my “hard problem cutoff” — at what point do I move on?
- What’s my time budget for debugging vs writing?
- How will I track time per problem?
Examples (suggested contest sets)
- Beginner: ABC 250 (8 problems, 100 min, but extend to 3 hours for practice)
- Intermediate: Codeforces Round 800, Div 2 (4–5 problems, 2 hours; extend with a Div 1 problem)
- Advanced: Any ICPC regional set on Kattis
Brute Force / Naive Strategy
- Read problems A → J in order
- Attempt in order
- Stuck on B → keep grinding
This is the wrong strategy. Experienced contestants make sure every problem has been read within the first 30 minutes.
Brute Force Complexity
A strictly linear strategy tends to rank poorly. The optimization is meta: contest strategy.
Optimization Path
Phase 1 (0–30 min): Reading and triage
- Read all problems. Categorize each as: trivial (T) / medium (M) / hard (H) / unknown (?)
- Note any problem you immediately know how to solve
- Estimate time-to-solve for each known problem
- Decide which 2 problems to start with (lowest-risk T or known-M)
Phase 2 (30 min – 3 hours): Bulk solving
- Solve the T’s first (lock in points)
- Tackle M’s one at a time
- Time-cap each: 45 min to first attempt; if no insight at 60 min, switch
- Submit only when you’ve tested at least 2 cases (penalty hurts)
- Track submitted vs accepted on a paper grid
Phase 3 (3–4.5 hours): Hard problem attack
- Attempt your best H candidate
- If stuck for 30 min, switch to another M or H
- Don’t sink the last hour on one H if no progress
Phase 4 (last 30 min): Last-chance and verification
- Verify your AC submissions (any bugs you noticed but didn’t fix?)
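- Submit any near-complete solutions (ICPC awards no partial credit, but a rejected attempt on a problem you never solve adds no penalty)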
- Hand-test edge cases on solved problems
Final Expected Approach
Run the contest, then write a post-mortem:
- Problems solved: ___
- Penalty time: ___
- Problems unread: ___ (should be 0)
- Problems where you knew the approach but couldn’t implement: ___
- Problems where you couldn’t find the approach: ___
- Wasted time on (problem X): ___
- Bug that cost you (problem Y): ___
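The first two post-mortem fields can be computed from the paper grid. A minimal sketch, assuming standard ICPC-style scoring: for each solved problem, the penalty is the minute of the accepted submission plus 20 minutes per earlier rejected attempt, and unsolved problems contribute nothing. The `icpc_score` name and the log format are illustrative, and every non-AC verdict is assumed to count as a rejected attempt (some judges exempt compile errors).

```python
def icpc_score(log):
    """log: list of (problem, minute, verdict) tuples in submission order.

    Returns (problems_solved, penalty_minutes) under ICPC-style scoring.
    """
    rejected = {}  # problem -> rejected attempts before its first AC
    solved = {}    # problem -> minute of its first AC
    for problem, minute, verdict in log:
        if problem in solved:
            continue  # submissions after the first AC are ignored
        if verdict == "AC":
            solved[problem] = minute
        else:
            rejected[problem] = rejected.get(problem, 0) + 1
    penalty = sum(minute + 20 * rejected.get(p, 0) for p, minute in solved.items())
    return len(solved), penalty


log = [("A", 12, "AC"), ("B", 40, "WA"), ("B", 55, "AC"), ("C", 200, "WA")]
print(icpc_score(log))  # (2, 87)
```

Problem C shows why a late speculative submission is cheap: its WA adds nothing unless C is eventually solved.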
Data Structures (the contestant uses)
- Templates file (.h for C++ or snippets): segment tree, DSU, FFT, Dijkstra, BFS, mod arithmetic
- Paper grid: problem letter | reading time | first-attempt time | submissions | AC time
- Stack-rank: priority order updated as problems are read
Correctness Argument
Strategy correctness is empirical: track your own performance over 5–10 contests. Adjust based on:
- Where did I spend too long?
- Which problems did I misclassify?
- Which problem types do I consistently miss?
Complexity
Contest itself: 5 hours. Post-mortem: 1 hour. Per-week investment: ~1 full contest + a few targeted upsolves = 8–12 hours.
Implementation Requirements
- Pre-built template file (start with KACTL or your own)
- Local judging setup: compile + run + diff against expected output
- Stress-testing harness (Lab 05 from Phase 10 applies directly)
- A timer / phone alarm for the 5 hours
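The "compile + run + diff" requirement can be sketched as a small local checker. This is an illustration under stated assumptions, not a real judge: `outputs_match` and `run_case` are hypothetical names, and the comparison ignores trailing whitespace and trailing blank lines, which may be looser or stricter than the actual contest judge.

```python
import subprocess


def outputs_match(actual, expected):
    """Compare line by line, ignoring trailing whitespace and trailing blank lines."""
    def norm(text):
        lines = [line.rstrip() for line in text.splitlines()]
        while lines and lines[-1] == "":
            lines.pop()
        return lines
    return norm(actual) == norm(expected)


def run_case(cmd, input_text, expected, timeout_s=5.0):
    """Run cmd with input_text on stdin and return a verdict string."""
    try:
        result = subprocess.run(cmd, input=input_text, capture_output=True,
                                text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "TLE"
    if result.returncode != 0:
        return "RTE"
    return "AC" if outputs_match(result.stdout, expected) else "WA"
```

Typical use would be something like `run_case(["./a.out"], sample_input, sample_output)` on each sample before submitting; the binary name is a placeholder.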
Tests
This is the test. The contest is the test.
Self-evaluation rubric:
- Number of problems solved
- Time per problem
- WA-to-AC ratio
- Quality of stress-tests during contest
Follow-up Questions (post-mortem)
- Which problem could I have solved with 30 more minutes? → Drill that topic.
- Which problem did I solve in time that I should have submitted faster? → Bug-hunt training.
- Which problem did I skip that I should have attempted? → Misclassification — calibrate.
- Did I run out of time or run out of ideas? → Different fixes.
Product Extension
- Long-term: ICPC-style training builds engineering reflexes that transfer to:
- Performance debugging under deadline
- Triage of multiple bugs simultaneously
- Estimation of task duration (notoriously hard for engineers)
Language/Runtime Follow-ups
- C++: dominant in competitive. Compile-time templates pay off.
- Python: acceptable for problems with N ≤ 10^5; risk TLE on tight problems.
- Java: middle ground; some teams use it.
- Rust: rising; some teams use it; standard library less batteries-included than C++.
Common Bugs (contest-specific)
- Submitted without testing. Penalty bug.
- Submitted with debugging prints still in code. WA.
- Forgot to read a problem. Discovered 2 hours later you had a free solve.
- Spent 90 minutes on one problem you couldn’t solve. Sunk-cost trap.
- Submitted brute force expecting partial credit. ICPC has no partial; only full AC.
- Wrong I/O format. Read the spec carefully — especially expected output line endings.
Debugging Strategy (during contest)
- 5-min rule: if not finding a bug in 5 min, write a stress test (Lab 05)
- Random small inputs vs brute force is a contest superpower
- If stuck on WA, re-read the problem statement entirely — often the bug is misunderstanding, not code
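The "random small inputs vs brute force" idea fits in a dozen lines. The sketch below is in the spirit of the Lab 05 harness but is not that harness; `stress`, the generator, and the deliberately buggy maximum-subarray function are all made up for illustration.

```python
import random


def stress(fast, brute, gen, rounds=1000, seed=0):
    """Return the first generated input where fast and brute disagree, else None."""
    rng = random.Random(seed)
    for _ in range(rounds):
        case = gen(rng)
        if fast(*case) != brute(*case):
            return case
    return None


def brute_max_subarray(a):
    """O(n^3) reference: best sum over every non-empty subarray."""
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))


def buggy_max_subarray(a):
    best = cur = 0  # bug: returns 0 when every element is negative
    for x in a:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


def gen_small(rng):
    """Tiny arrays with tiny values, so a failing case is easy to read."""
    return ([rng.randint(-5, 5) for _ in range(rng.randint(1, 5))],)


print(stress(buggy_max_subarray, brute_max_subarray, gen_small))
```

Keeping inputs tiny is the point: the first mismatch is usually small enough to trace by hand.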
Mastery Criteria
- Complete 5 contests; track scores
- Post-mortem each one and act on findings
- Solve at least 3 problems consistently in a 5-hour Div 2 contest
- Solve at least 1 D-level (Codeforces) problem in 2 hours
- Build a personal template file with ≥ 15 commonly-used snippets
- After 10 contests, identify your top 3 weakness topics; drill them
Suggested Contest Schedule (4 weeks)
| Week | Contest | Goal |
|---|---|---|
| 1 | Codeforces Educational Round | Solve A, B, C, attempt D |
| 2 | ABC (extended to 3 hr) | Solve A through F |
| 3 | Codeforces Div 2 | Solve A, B, C, D |
| 4 | ICPC regional replay | Solve 3–5 of 10 |
Post-mortem after each. Drill weakness topics between contests.
Lab 10 — Inclusion-Exclusion and Burnside
Goal
Master two combinatorial counting techniques used across competitive programming and discrete math:
- Inclusion-Exclusion Principle (PIE) for counting objects satisfying / violating multiple conditions
- Burnside’s lemma for counting orbits under group actions (counting distinct configurations modulo symmetry)
Background
Inclusion-Exclusion
For sets A₁, …, Aₙ:
$$|A_1 \cup \cdots \cup A_n| = \sum |A_i| - \sum |A_i \cap A_j| + \sum |A_i \cap A_j \cap A_k| - \cdots$$
In counting form, for “objects with property P_i”:
$$\text{count with none of the properties} = \sum_{S \subseteq \{1..n\}} (-1)^{|S|} \cdot |\text{objects with all properties in } S|$$
Burnside
For a group G acting on a set X, the number of orbits is:
$$|X/G| = \frac{1}{|G|} \sum_{g \in G} |X^g|$$
where X^g is the set of elements fixed by g.
Interview Context
- Codeforces / ICPC: regular (PIE constantly, Burnside in counting problems with symmetry)
- Quant: combinatorial pricing models
- Cryptography / coding theory: standard tools
- Standard interviews: occasional easy PIE problem (e.g., “count integers ≤ N divisible by none of {2, 3, 5}”)
PIE appears more often than people realize; Burnside is rarer.
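The easy interview variant quoted above can be sketched directly from the counting form of PIE. The function name is an assumption; `math.lcm` requires Python 3.9+, and using the lcm (rather than the product) keeps the count correct when the divisors are not pairwise coprime.

```python
from itertools import combinations
from math import lcm  # Python 3.9+


def count_divisible_by_none(n, divisors):
    """Count integers in [1, n] divisible by none of `divisors` (inclusion-exclusion)."""
    total = 0
    for size in range(len(divisors) + 1):
        sign = -1 if size % 2 else 1
        for subset in combinations(divisors, size):
            # lcm() of an empty subset is 1, giving the leading term n
            total += sign * (n // lcm(*subset))
    return total


print(count_divisible_by_none(30, [2, 3, 5]))  # 8
```

For n = 30 the terms are 30 - (15 + 10 + 6) + (5 + 3 + 2) - 1 = 8.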
When to Skip This Topic
Skip if any of these are true:
- You’re not targeting competitive / combinatorics-heavy roles
- You haven’t done basic combinatorics (Phase 1–2 foundations)
- You have less than 1 week for this lab
PIE is high-leverage even outside competitive — learn it. Burnside is optional.
Problem Statement
Part A — Count Coprime to N
Given integers L ≤ R and N, count integers in [L, R] coprime to N.
Part B — Distinct Necklaces
Given k colors and n beads in a circle, count the number of distinct necklaces (two necklaces are equivalent if one is a rotation of the other).
Constraints
- A: 1 ≤ L ≤ R ≤ 10^18, 1 ≤ N ≤ 10^9
- B: 1 ≤ n ≤ 10^6, 1 ≤ k ≤ 10^9
Clarifying Questions
A:
- Coprime means gcd(x, N) = 1?
- Do we include endpoints?
B:
- Are reflections considered equivalent (bracelets) or only rotations (necklaces)?
- Modulo what prime?
Examples
A
L=1, R=10, N=6 → the integers in [1, 10] coprime to 6 are {1, 5, 7}. Answer: 3.
B
n=3, k=2 → 4 distinct necklaces: BBB, BBW, BWW, WWW (BBW and BWB and WBB are all rotations of each other).
Brute Force
A: iterate x in [L, R], check gcd. O(R - L). For R - L = 10^18: TLE.
B: generate all k^n colorings, group by rotation equivalence. O(k^n).
Brute Force Complexity
- A: O(R - L)
- B: O(k^n)
Both fail for given constraints.
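Both brute forces are still worth writing, because they become the reference for stress-testing the fast solutions on small inputs. A minimal sketch with hypothetical function names:

```python
from itertools import product
from math import gcd


def brute_coprime(lo, hi, n):
    """Part A by definition: one gcd check per integer in [lo, hi]."""
    return sum(1 for x in range(lo, hi + 1) if gcd(x, n) == 1)


def brute_necklaces(n, k):
    """Part B by definition: keep one canonical rotation per equivalence class."""
    seen = set()
    for coloring in product(range(k), repeat=n):
        # the lexicographically smallest rotation identifies the class
        seen.add(min(coloring[i:] + coloring[:i] for i in range(n)))
    return len(seen)


print(brute_coprime(1, 10, 6))  # 3
print(brute_necklaces(3, 2))    # 4
print(brute_necklaces(4, 2))    # 6
```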
Optimization Path
A (Inclusion-Exclusion)
Let p₁, …, pₘ be the distinct prime divisors of N. Then:
$$\text{coprime}(1..M) = \sum_{S} (-1)^{|S|} \cdot \lfloor M / \prod_{p \in S} p \rfloor$$
Let f(M) be the count of integers in [1, M] coprime to N. Then the answer is f(R) - f(L-1).
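def prime_divisors(n):  # distinct prime divisors, trial division in O(sqrt(n))
    primes = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes
def count_coprime(M, N):
    # Integers in [1, M] coprime to N: inclusion-exclusion over subsets of primes.
    primes = prime_divisors(N)
    m = len(primes)
    total = 0
    for mask in range(1 << m):
        prod = 1
        bits = 0
        for i in range(m):
            if mask & (1 << i):
                prod *= primes[i]
                bits += 1
        total += ((-1) ** bits) * (M // prod)
    return total
answer = count_coprime(R, N) - count_coprime(L - 1, N)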
Complexity: O(√N) for the factorization plus O(2^m · m) for the subset loop, where m is the number of distinct prime factors of N (m ≤ 9 for N ≤ 10^9).
B (Burnside)
Group G = cyclic group of n rotations. By Burnside:
$$|\text{necklaces}| = \frac{1}{n} \sum_{d=0}^{n-1} k^{\gcd(n, d)}$$
Group by gcd(n, d) = g: the count of d with this gcd is φ(n/g). So:
$$= \frac{1}{n} \sum_{g | n} \varphi(n/g) \cdot k^g$$
Compute φ on the divisors of n. Trial division finds the d(n) divisors in O(√n); each term then needs one fast exponentiation, O(log n). Total: O(√n + d(n) · log n), which is tiny.
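A minimal sketch of the divisor-sum form of Burnside. The modulus is an assumption (the problem statement leaves it open; 10^9 + 7 is used here), and the helper names are illustrative. The Fermat inverse requires the modulus to be prime and not a divisor of n, which holds for n ≤ 10^6.

```python
MOD = 1_000_000_007  # assumed prime modulus; not specified by the problem


def prime_factors(n):
    """Distinct prime factors of n by trial division, O(sqrt(n))."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def phi(m, primes):
    """Euler totient of m, where `primes` contains every prime factor of m."""
    result = m
    for p in primes:
        if m % p == 0:
            result -= result // p
    return result


def count_necklaces(n, k, mod=MOD):
    """(1/n) * sum over divisors g of n of phi(n/g) * k^g, modulo `mod`."""
    primes = prime_factors(n)
    total = 0
    g = 1
    while g * g <= n:
        if n % g == 0:
            for d in {g, n // g}:  # the set visits each divisor exactly once
                total += phi(n // d, primes) * pow(k, d, mod)
        g += 1
    # divide by n via the Fermat inverse n^(mod-2)
    return total % mod * pow(n, mod - 2, mod) % mod


print(count_necklaces(3, 2))  # 4
print(count_necklaces(4, 2))  # 6
```

Factoring n once and reusing its primes for every φ(n/g) avoids a separate trial division per divisor.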
Final Expected Approach
(See solution sketches above.)
Data Structures
- A: list of prime factors of N; bitmask iteration
- B: divisor enumeration; Euler totient function
Correctness Argument
PIE: classical proof by induction, or by a counting argument: an element that lies in exactly t ≥ 1 of the sets is counted C(t,1) - C(t,2) + C(t,3) - … = 1 - (1 - 1)^t = 1 time, and an element in none of the sets is counted 0 times.
Burnside: from orbit-counting theorem; proof in any introductory abstract algebra text.
Complexity
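- A: O(√N) to factor N plus O(2^m · m) for the subsets, where m ≤ 9 is the number of distinct prime factors, so the subset loop is effectively constant
- B: O(√n) to enumerate divisors plus O(d(n) · log n) for the sum, where d(n) is the number of divisors (d(n) ≤ 2√n, and at most 240 for n ≤ 10^6)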
Implementation Requirements
- A: efficient prime factorization (trial division up to √N is fine for N ≤ 10^9)
- B: divisor enumeration via trial division; Euler totient by formula φ(n) = n · ∏(1 - 1/p)
- Modular exponentiation for k^g mod p
- Modular inverse for division by n (use Fermat: n^(p-2) mod p, valid when p is prime and does not divide n)
Tests
A
- L=1, R=10, N=6 → 3
- L=1, R=N → φ(N) (Euler totient)
- L=R=x with x coprime to N → 1
- L=R=x with x not coprime to N → 0
- Very large R for performance check
B
- n=1, k=2 → 2 (B, W)
- n=2, k=2 → 3 (BB, BW, WW)
- n=3, k=2 → 4
- n=4, k=2 → 6
- Compare with brute force for n ≤ 6
Follow-up Questions
For A:
- Count coprime to multiple Ns simultaneously. → PIE on union of all prime sets.
- Count squarefree numbers in [L, R]. → Möbius function = PIE on prime squares.
- Sum of φ(i) for i in [1, N]. → Sieve-like O(N log log N) or O(N^{2/3}) with Mertens.
For B:
- Bracelets (rotations + reflections). → Group = dihedral D_n. Add reflection terms.
- Necklaces with at most c uses of each color (c is a new limit; k is still the number of colors). → Multiset Burnside; harder.
- Polya enumeration with cycle index polynomial. → Generalizes Burnside; gives generating function.
- Distinct binary trees of n nodes. → Catalan numbers; different problem but related counting.
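The bracelet follow-up can be sketched by adding the reflection terms of the dihedral group to the rotation sum. This version assumes small n and uses exact integers instead of a modulus, so it doubles as a checker; the function names are illustrative.

```python
from math import gcd


def count_necklaces_exact(n, k):
    """Rotations only: (1/n) * sum over d of k^gcd(n, d), in exact integers."""
    return sum(k ** gcd(n, d) for d in range(n)) // n


def count_bracelets_exact(n, k):
    """Rotations and reflections: Burnside over the dihedral group of order 2n."""
    rotations = sum(k ** gcd(n, d) for d in range(n))
    if n % 2 == 1:
        # each of the n reflections fixes one bead and pairs up the rest
        reflections = n * k ** ((n + 1) // 2)
    else:
        # n/2 axes pass through two beads, n/2 axes pass through two gaps
        reflections = (n // 2) * (k ** (n // 2 + 1) + k ** (n // 2))
    return (rotations + reflections) // (2 * n)


print(count_necklaces_exact(6, 2))  # 14
print(count_bracelets_exact(6, 2))  # 13
```

For n = 6, k = 2 the two counts differ by one because a single pair of mirror-image necklaces merges into one bracelet.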
Product Extension
- Combinatorial design (DOE, experimental design)
- Code generation (counting equivalence classes of programs)
- Chemistry (counting molecular isomers — Polya’s original motivation)
- Cryptography (group orbits in elliptic curves)
Language/Runtime Follow-ups
- Python: big-int arithmetic for free; ideal for PIE/Burnside with modular arithmetic.
- C++: modular arithmetic with manual care for overflow; faster execution.
- Java: BigInteger for safety; modular arithmetic mature.
Common Bugs
- PIE sign wrong: off-by-one in (-1)^|S|.
- PIE on factored N: counted prime factors with multiplicity. Only distinct primes matter.
- Burnside: forgot to take modular inverse for division by n.
- Euler totient computed via brute force gcd: O(n) per value, way too slow.
- Modular exponentiation overflow: use pow(k, g, MOD) in Python; manual loop with mod in C++.
Debugging Strategy
- Brute force for small parameters; assert PIE/Burnside match
- For PIE: print each subset’s contribution; verify signs alternate
- For Burnside: enumerate orbits manually for n ≤ 4
Mastery Criteria
- Apply PIE to: count divisible/coprime, derangements, surjections, squarefree
- Apply Burnside to: necklaces, bracelets, colored cubes
- Compute Euler totient in O(√n)
- Compute Möbius function (PIE generalized)
- State group-action correctness conditions
- Identify a “this needs symmetry handling” problem and reach for Burnside
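Two of the PIE targets in the first criterion, derangements and surjections, reduce to one-line alternating sums. A minimal sketch with exact integers (`math.comb` requires Python 3.8+); the function names are illustrative.

```python
from math import comb, factorial


def derangements(n):
    """Permutations of n items with no fixed point: sum of (-1)^i * C(n,i) * (n-i)!"""
    return sum((-1) ** i * comb(n, i) * factorial(n - i) for i in range(n + 1))


def surjections(n, k):
    """Onto functions from an n-set to a k-set: sum of (-1)^i * C(k,i) * (k-i)^n"""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))


print(derangements(4))    # 9
print(surjections(4, 2))  # 14
```

In both sums the index i counts how many "bad" conditions are forced to hold (fixed points, or target values that are missed), which is the subset size |S| in the PIE formula.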