Elite Coding Interview Mastery Program

A complete, lab-based, phase-structured training system to take you from beginner/intermediate to a candidate who can succeed at any coding interview — FAANG, infrastructure companies, distributed systems teams, compilers/runtimes, quant/HFT, staff/principal practical coding, and competitive-programming-style interviews.

This is not a roadmap. It is a training system: every phase has concept docs, runtime docs, hands-on labs in a fixed format, mock interviews with rubrics, failure analysis, and spaced repetition.

How To Use This Program

Pick a track in schedules/ — Accelerated (12 weeks), Serious (6 months), or Elite (12 months).
Read the universal framework first — FRAMEWORK.md. Use it on every problem.
Work through phases in order — they have dependencies. Do not skip Phase 0 or Phase 1.
For every problem, fill out REVIEW_TEMPLATE.md and schedule revisits via SPACED_REPETITION.md.
When you fail, run the diagnosis in FAILURE_ANALYSIS.md. Do not skip this.
Mock interview weekly minimum — see phase-11-mock-interviews/.
Graduate only when READINESS_CHECKLIST.md is fully passed.

Global Framework Documents

Document	Purpose
FRAMEWORK.md	The 16-step universal problem-solving framework + 11-step stuck protocol
COMMUNICATION.md	What to say (and not say) during an interview, with sample phrases
CODE_QUALITY.md	Quality bar for interview code, with bad-vs-good comparisons
REVIEW_TEMPLATE.md	Per-problem review template (use after every solve)
FAILURE_ANALYSIS.md	16-category failure taxonomy with diagnosis, fix, drill, re-test
SPACED_REPETITION.md	6-tier review intervals + per-tier protocol
READINESS_CHECKLIST.md	Final binary checklist before going to interviews

Schedules

Track	Audience	Doc
12-Week Accelerated	Urgent prep, interview deadline approaching	schedules/12_WEEK_ACCELERATED.md
6-Month Serious	Strong Big Tech readiness	schedules/6_MONTH_SERIOUS.md
12-Month Elite	Top-tier, senior/staff, competitive-heavy	schedules/12_MONTH_ELITE.md

Phases

#	Phase	Target Level	Folder
0	Interview Execution Baseline	Beginner → Easy	phase-00-execution-baseline/
1	Programming & Data Structure Foundations	Easy → Medium	phase-01-foundations/
2	Standard Coding Interview Patterns	Medium → Medium-Hard	phase-02-patterns/
3	Advanced Data Structures	Medium-Hard → Hard	phase-03-advanced-ds/
4	Graph Mastery	Medium-Hard → Hard	phase-04-graphs/
5	Dynamic Programming (Basic → Extreme)	Medium → Very Hard	phase-05-dp/
6	Greedy, Proofs & Mathematical Thinking	Medium-Hard → Hard	phase-06-greedy/
7	Competitive Programming Acceleration	Hard → CP-Hard	phase-07-competitive/
8	Practical Engineering Coding Interviews	Medium-Hard → Hard	phase-08-practical-engineering/
9	Language & Runtime Deep Dive	All levels (cross-cutting)	phase-09-language-runtime/
10	Testing, Debugging & Correctness	All levels (cross-cutting)	phase-10-testing-debugging/
11	Mock Interview Mastery	Beginner → Grandmaster	phase-11-mock-interviews/
12	Grandmaster / Final Boss	CP-Hard → Grandmaster	phase-12-grandmaster/

Difficulty Ladder

Level	What You Solve	Solving Time	Failure Means	When To Move Up
Beginner	Trivial array/string traversals	5–10 min	Confusion about loops/indexing	After 30 problems with 0 confusion
Easy	LeetCode Easy	10–15 min	Wrong brute force, syntax errors	90% solved <15 min for 50 problems
Medium	LeetCode Medium	25–35 min	Missed pattern, bad complexity	75% solved <35 min for 100 problems
Medium-Hard	Top-100 Mediums, easy Hards	30–45 min	Couldn’t optimize past brute force	70% solved <45 min for 60 problems
Hard	LeetCode Hard	45–75 min	Failed to find any non-trivial approach	50% solved <60 min for 50 problems
Very Hard	Hardest LC Hards, Codeforces 1900–2100	60–120 min	Conceptual gaps in algorithm class	40% solved unaided for 30 problems
CP Hard	Codeforces 2100–2400, AtCoder ARC F	90–180 min	Missing a CP-specific technique	Solving consistently in contest
Grandmaster	CF 2400+, AGC, ICPC WF	Open-ended	Even with hints, can’t make progress	When you stop needing this curriculum

For each level: failure is normal. The value is in the review process and the failure analysis, not the original solve.

When to repeat a level: if your unaided success rate is below the threshold above, do another 30–50 problems at the same level before moving up. Moving up too early calcifies bad habits.

Progress Tracking

Use a single spreadsheet/journal with columns:

Date
Phase / Pattern
Problem name + source
Solved unaided? (Y/N)
Time spent
Got it on first attempt? (Y/N)
Why missed (if missed) — link to FAILURE_ANALYSIS.md category
Next review date — from SPACED_REPETITION.md

Without tracking, the spaced repetition system collapses. Without spaced repetition, knowledge decays faster than you accumulate it.

A Note On Honesty

This curriculum cannot guarantee outcomes. What it guarantees is:

If you complete every phase honestly (failed problems reviewed, mocks done with full effort, no shortcuts), you will recognize > 95% of common interview problem patterns.
You will be able to solve unfamiliar problems — because the framework forces you to derive solutions, not memorize them.
You will fail mock interviews repeatedly and learn from each one. This is the entire point.
You will know your own weaknesses precisely, and which phase to revisit.

Skip the failure analysis, skip the reviews, skip the mocks — and the curriculum becomes a list of topics. Topics don’t get you hired.

📖 Published · commit 406ccc0 · 2026-05-21 16:15 UTC

Universal Problem-Solving Framework

Use this on every problem. It is non-optional. The goal is to make solving deterministic, not heroic.

The 16-Step Framework

1. Restate The Problem

Say it back in your own words. If you can’t restate, you don’t understand. Ambiguous parts surface here.

2. Ask Clarifying Questions

Input type, range, sign, precision
Are inputs sorted? Distinct? Can be empty/null?
Output format. One answer or all answers?
Are duplicates allowed?
Can the input mutate? Is it streamed?
What should happen on invalid input?
Any constraints that weren’t stated?

3. Identify Constraints

Constraints dictate the algorithm. Memorize this table:

N	Acceptable Complexity	Likely Approach
≤ 10	O(N!) or O(2^N · N)	Backtracking, bitmask brute force
≤ 20	O(2^N · N)	Bitmask DP, meet-in-the-middle
≤ 100	O(N^4)	Multi-loop brute, Floyd-Warshall
≤ 500	O(N^3)	Interval DP, matrix chain
≤ 5,000	O(N^2)	2D DP, edit distance
≤ 100,000	O(N log N) or O(N √N)	Sort + scan, segment tree, sqrt decomp
≤ 1,000,000	O(N) or O(N log N)	Linear scan, hashmap, two pointers
≤ 10^8	O(N) or O(log N)	Math closed form, binary search on answer
≤ 10^18	O(log N)	Binary exponentiation, math

4. Work Through Examples

Use the given example. Then build at least two more:

A trivial case (size 1, empty)
An adversarial case (max constraints, all duplicates, all negative, sorted descending)

Work them by hand. Annotate intermediate states. This often reveals the pattern.

5. Identify Brute Force

What is the dumbest correct solution? Write it down (pseudocode). Don’t skip this even if you “see” the optimal — the brute force is your correctness oracle for stress testing.
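The brute force doubles as a test oracle. Below is a minimal stress-test sketch using a hypothetical problem (count pairs i < j with nums[i] + nums[j] == target) and hypothetical function names. It compares the O(N^2) brute force with an O(N) hash-map version on many tiny random inputs and returns the first mismatch.

```python
import random
from collections import Counter


def count_pairs_brute(nums, target):
    # O(N^2) oracle: check every pair i < j directly.
    count = 0
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                count += 1
    return count


def count_pairs_fast(nums, target):
    # O(N) candidate: for each value, count earlier values that complete the pair.
    seen = Counter()
    count = 0
    for n in nums:
        count += seen[target - n]
        seen[n] += 1
    return count


def stress_test(trials=500, seed=0):
    # Tiny inputs with a narrow value range so duplicates and collisions are common.
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        target = rng.randint(-10, 10)
        expected = count_pairs_brute(nums, target)
        actual = count_pairs_fast(nums, target)
        if expected != actual:
            return nums, target, expected, actual  # first failing input, small enough to trace
    return None
```

Keeping inputs tiny (at most 8 elements, values in [-5, 5]) makes duplicates common and keeps any failing case small enough to trace by hand.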

6. Analyze Brute Force Complexity

Time and space, in big-O. If it fits the constraints, you may be done. Often it does for small N.

7. Recognize Patterns

Run through the pattern checklist in phase-02-patterns/:

Sorted or sortable? → two pointers, binary search
Asks for max/min over windows? → sliding window, monotonic deque
“Subarray with property X”? → prefix sum, sliding window
“K-th something”? → heap, quickselect
“Number of ways”? → DP, combinatorics
“Shortest path”? → BFS / Dijkstra / 0-1 BFS
“Connected components / cycles / dependencies”? → union-find / DFS / topo sort
“Decision: can we do X with budget Y?” → binary search on answer
“Optimal sequence with overlapping subproblems”? → DP

8. Derive Optimized Approach

Reduce repeated work. Cache, sort, hash, prune, transform. State the invariant of your approach.

9. Prove Correctness

For greedy: exchange argument or cut property.
For DP: state definition + transition + base cases + evaluation order.
For graphs: cite the algorithm’s correctness theorem and verify preconditions.
For two-pointer/sliding window: the loop invariant.
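For two pointers, the invariant is what justifies each pointer move. A sketch on a hypothetical pair-sum problem over a sorted list, with the invariant written as comments:

```python
def pair_with_sum(sorted_nums, target):
    """Return indices (i, j), i < j, with sorted_nums[i] + sorted_nums[j] == target, or None."""
    lo, hi = 0, len(sorted_nums) - 1
    # Invariant: if any valid pair exists, at least one lies entirely within [lo, hi].
    while lo < hi:
        total = sorted_nums[lo] + sorted_nums[hi]
        if total == target:
            return lo, hi
        if total < target:
            # sorted_nums[lo] plus any value at or below index hi is too small: drop lo.
            lo += 1
        else:
            # sorted_nums[hi] plus any value at or above index lo is too large: drop hi.
            hi -= 1
    return None
```

Each move discards an index that provably cannot belong to an answer, so the invariant holds and the loop ends in O(N).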

10. Write Clean Code

Meaningful names
Single-responsibility functions
No premature abstraction
Avoid mutating function parameters unless intentional
Follow your language’s idioms

11. Test Smoke Cases

Walk the given examples through your code by hand. Don’t run yet. Find bugs before silicon does.

12. Test Edge Cases

Empty / null
Size 1 / size 2
All duplicates
All negative / mixed signs
Sorted ascending / descending
Max constraint values (overflow risk)
Multiple valid answers (specify which one)
Disconnected graph / cycle
Concurrent access (where relevant)
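Most checklist items turn into one-line assertions. A sketch using Kadane’s maximum-subarray algorithm as the illustrative function; raising ValueError on empty input is an assumption you would confirm with the interviewer.

```python
def max_subarray_sum(nums):
    """Largest sum of a non-empty contiguous subarray (Kadane's algorithm)."""
    if not nums:
        raise ValueError("need at least one element")
    best = current = nums[0]  # start from the first element, not 0, so all-negative input works
    for n in nums[1:]:
        current = max(n, current + n)  # extend the running subarray or restart at n
        best = max(best, current)
    return best


# Edge cases from the checklist, written as quick assertions.
assert max_subarray_sum([5]) == 5                                # size 1
assert max_subarray_sum([-3, -1, -2]) == -1                      # all negative
assert max_subarray_sum([2, 2, 2]) == 6                          # all duplicates
assert max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]) == 6    # mixed signs
assert max_subarray_sum([10**18, 10**18]) == 2 * 10**18          # Python ints do not overflow
```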

13. Test Large Cases

Confirm complexity by reasoning, not by running. Will N=10^6 pass in 1 second? In your language’s runtime?
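The reasoning is simple arithmetic. The sketch below assumes rough throughputs of about 10^8 simple operations per second for C++ and about 10^7 for CPython. These are order-of-magnitude rules of thumb, not measurements, and constant factors are ignored.

```python
import math

# Assumed budgets: simple operations per second, order of magnitude only.
OPS_PER_SECOND = {"cpp": 10**8, "python": 10**7}

# Operation-count estimates for a few common complexity classes.
GROWTH = {
    "N": lambda n: n,
    "N log N": lambda n: n * math.log2(n),
    "N^2": lambda n: n * n,
}


def fits(n, complexity, language, seconds=1.0):
    """True if the estimated operation count fits the time budget."""
    ops = GROWTH[complexity](n)
    return ops <= OPS_PER_SECOND[language] * seconds
```

Under those assumptions, N = 10^6 at O(N log N) is about 2 × 10^7 operations: comfortable in C++, over this budget in pure Python.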

14. Explain Complexity

State time and space. State whether the bound is tight. Mention amortized vs worst-case if relevant. State assumptions (hash table O(1) average is an assumption).

15. Handle Follow-Ups

Anticipate. Most interviews follow up with one of:

“What if the input doesn’t fit in memory?” → streaming, external sort, sketches
“What if it’s distributed?” → sharding, consistent hashing
“What if reads >> writes?” → caching, replicas
“What if writes >> reads?” → log-structured, write-back
“What if we need approximate answers?” → Bloom, HLL, count-min
“How would you test this?” → unit + property + stress + concurrency
“How would you debug a production failure?” → logs + metrics + repro

16. Discuss Production Implications

For practical engineering interviews: monitoring, logging, metrics, partial failure, backpressure, retries, idempotency, observability, deployment.

The Stuck Protocol

When you’ve been silent for >2 minutes, or have no progress for 5 minutes, switch into this mode. Do not freeze. Do not flail.

1. Restate What Is Known

Out loud. “I have an array of N integers, I want the longest subarray such that…”

2. Write Brute Force

Even if it’s O(N^4) and you know it won’t pass. Brute force gives you:

A correctness oracle
A starting point to optimize from
Intermediate state to inspect for patterns

3. Inspect Constraints Again

Have you forgotten one? Often the constraint is the hint. N ≤ 20 screams bitmask. K ≤ 10 screams “K is in the state”.

4. Try Smaller Examples

Solve N=1, N=2, N=3 by hand. Patterns emerge. Often the recurrence falls out.

5. Look For Repeated Work

In the brute force, what’s recomputed? That’s your DP state or memoization target.
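A sketch of turning a brute-force recursion into memoization, using coin change (fewest coins to reach an amount) as an illustrative problem. The recursive brute force calls solve on the same remaining amount along many paths; caching on that argument removes the repeated work.

```python
from functools import lru_cache


def min_coins(coins, amount):
    """Fewest coins summing to amount, or -1 if impossible (unlimited coins of each value)."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)
    def solve(remaining):
        # The state is just `remaining`; caching it turns the exponential recursion
        # into O(amount * len(coins)) work.
        if remaining == 0:
            return 0
        best = float("inf")
        for coin in coins:
            if coin <= remaining:
                best = min(best, solve(remaining - coin) + 1)
        return best

    result = solve(amount)
    return -1 if result == float("inf") else result
```

Recursion depth grows with amount, so for large amounts in Python a bottom-up table avoids hitting the default recursion limit.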

6. Look For Monotonicity

Is there a value over which the answer is monotonic? → binary search on answer
Is there a window whose property is monotonic? → sliding window, monotonic stack/deque
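Binary search on the answer needs only a monotonic yes/no check. A sketch on an illustrative problem: the smallest ship capacity that delivers packages, in order, within a given number of days (weights assumed non-empty and positive).

```python
def min_capacity(weights, days):
    """Smallest ship capacity that delivers `weights` in order within `days` days."""

    def feasible(capacity):
        # Greedy check: fill each day until the next package would overflow.
        used_days, load = 1, 0
        for w in weights:
            if load + w > capacity:
                used_days += 1
                load = 0
            load += w
        return used_days <= days

    # Monotonic: if capacity c works, every capacity above c also works.
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid          # mid works; the answer is mid or smaller
        else:
            lo = mid + 1      # mid fails; the answer is strictly larger
    return lo
```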

7. Look For Graph Modeling

Words like “depends on”, “leads to”, “transitions”, “groups”, “components”, “blocked by” all suggest graphs. Try modeling explicitly.

8. Look For DP State

Ask: “What information do I need at position i to decide what’s optimal going forward?” That information is the state.

9. Look For Greedy Invariant

Ask: “If I make the locally best choice, can I prove I never need to undo it?” If yes, greedy. If no, DP.
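A sketch of a greedy choice with a never-undo argument, using interval scheduling as the illustrative problem. Intervals are assumed half-open, so [1, 2) and [2, 3) do not overlap.

```python
def max_non_overlapping(intervals):
    """Maximum number of pairwise non-overlapping [start, end) intervals."""
    count = 0
    last_end = float("-inf")
    # Greedy choice: always keep the interval that finishes earliest.
    # Exchange argument: any optimal schedule can swap its first interval for this one
    # without creating an overlap, so the choice never needs to be undone.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count
```

When no such argument exists, greedy can fail: with coins {1, 3, 4} and amount 6, always taking the largest coin gives 4 + 1 + 1 (three coins) while 3 + 3 uses two, so that problem needs DP.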

10. Ask For A Small Hint Professionally

Sample phrases:

“I’ve considered X and Y but I’m having trouble seeing the structure. Could you nudge me toward the right family of approach?”
“Is the input small enough that exponential is acceptable, or are we targeting polynomial?”

A well-asked hint costs you almost nothing. A 10-minute silence is fatal.

11. Recover And Continue

Take the hint, restate the new constraint or insight, commit out loud to a direction, and resume coding. Don’t apologize repeatedly. Move forward.

A Note On Discipline

This framework feels slow for the first 50 problems. By problem 200, you’ll execute steps 1–9 in 4 minutes flat. By problem 500, the framework runs subconsciously and you’ll only consciously invoke it when stuck.

The goal is not to memorize the framework. It’s to internalize it so deeply that when you read a problem, your brain runs steps 1–9 automatically.

Candidates who skip the framework “to save time” lose interviews to candidates who don’t, because the framework users:

Catch ambiguity before they code
Get the right complexity on the first attempt
Don’t waste minutes coding the wrong thing
Communicate clearly throughout
Know what to test before they finish coding

Code Quality Standards For Interviews

Interview code is judged differently from production code. The bar is:

Correct above all else
Readable — the interviewer should follow it without you explaining every line
Simple — the simplest solution that works, not the cleverest
Defensive only at boundaries — validate inputs once, then trust them
Testable — pure functions and clear data flow

Interview code is not:

Production-ready (no logging, no metrics, no retries unless asked)
Heavily commented (good names beat comments)
Premature-abstraction (no factory factories)
Defensive everywhere (validating inside hot loops is noise)

The Quality Dimensions

Dimension	What “Good” Looks Like
Correctness	Handles every edge case the problem allows
Simplicity	No clever tricks unless required for complexity
Readability	A peer can read it once and understand
Naming	`parent`, `visited`, `frequency` — not `p`, `v`, `f` (except in trivial scopes)
Modularity	Helper functions for distinct logical units
Boundary handling	Empty/null/overflow checked once at the entry
No premature abstraction	One-time logic stays inline
No overengineering	Don’t build a config system for a 30-line problem
No hidden state	Globals/singletons are red flags
Minimal mutation	Prefer immutable returns where natural
No excessive cleverness	One-liners that need a paragraph to explain are anti-signal
Standard library use	Use the language’s built-ins idiomatically
Testability	Logic separated from I/O, deterministic

Bad vs Good Examples

Naming

Bad:

def f(a, b):
    r = []
    for x in a:
        if x > b:
            r.append(x)
    return r

Good:

def values_above(numbers, threshold):
    return [n for n in numbers if n > threshold]

Boundary Handling

Bad (validates in every iteration):

def sum_positive(nums):
    total = 0
    for n in nums:
        if nums is None or len(nums) == 0:  # checked every iteration
            return 0
        if n > 0:
            total += n
    return total

Good (validate once at the boundary):

def sum_positive(nums):
    if not nums:
        return 0
    return sum(n for n in nums if n > 0)

Excessive Cleverness

Bad (one-liner, hard to debug):

def has_duplicate(nums):
    return len(nums) != len({*nums}) if nums else False

Good (clear intent):

def has_duplicate(nums):
    seen = set()
    for n in nums:
        if n in seen:
            return True
        seen.add(n)
    return False

The good version also stops at the first duplicate it finds. The worst case is still O(N), but it is often faster in practice.

Helper Functions

Bad (everything in one 50-line function):

def shortest_path(grid, start, end):
    # 50 lines of BFS, neighbor computation, distance tracking, all inline
    ...

Good (extract neighbor logic):

from collections import deque

def shortest_path(grid, start, end):
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        pos, dist = queue.popleft()
        if pos == end:
            return dist
        for nxt in neighbors(grid, pos):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

def neighbors(grid, pos):
    r, c = pos
    rows, cols = len(grid), len(grid[0])
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
            yield (nr, nc)

Premature Abstraction

Bad (over-engineered for a single use):

class CounterStrategy:
    def count(self, items):
        raise NotImplementedError

class HashCounterStrategy(CounterStrategy):
    def count(self, items):
        d = {}
        for x in items:
            d[x] = d.get(x, 0) + 1
        return d

def majority_element(nums):
    counts = HashCounterStrategy().count(nums)
    return max(counts, key=counts.get)

Good:

from collections import Counter

def majority_element(nums):
    counts = Counter(nums)
    return counts.most_common(1)[0][0]

Mutation

Bad (mutates input):

def normalized(values):
    for i in range(len(values)):
        values[i] = values[i] / max(values)  # also recomputes max each iteration
    return values

Good (no mutation, single max computation):

def normalized(values):
    if not values:
        return []
    peak = max(values)
    return [v / peak for v in values]

Hidden State

Bad (global counter):

_call_count = 0
def fib(n):
    global _call_count
    _call_count += 1
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

Good (no global mutable state; the memo cache is owned by the decorator):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

Language-Idiomatic Code

You should write code that looks like it was written by someone fluent in the language. Examples:

Python

Use list/dict/set comprehensions where natural
Use Counter, defaultdict, deque, heapq, bisect
Use enumerate, zip, unpacking
Avoid C-style for i in range(len(x)) if you only need values

Java

Use enhanced for-loop where possible
Use Map.computeIfAbsent, Map.getOrDefault
Use Optional only where it fits the API; not for short-circuit logic
Prefer ArrayDeque over Stack (legacy)

Go

Prefer slices over arrays
Use for range for both index and value
Return errors explicitly; don’t panic in interview code
Buffered channels only when justified

C++

Use auto where it improves readability
Range-based for-loops
Prefer std::vector and std::unordered_map
Use emplace_back over push_back for non-trivial types
Pass non-trivial inputs by const reference

JavaScript/TypeScript

Use Map/Set (not {}/array hacks) when keys aren’t strings or order matters
const by default, let only when reassignment is needed
Avoid var
TS: prefer unknown over any at boundaries

Comments

Avoid: comments that restate what the code says (// increment i).
Prefer: comments that explain why — the non-obvious tradeoff, the invariant, the reason a less elegant approach was chosen.
Required: a one-line comment above any non-trivial recurrence/invariant. Example: # dp[i][j] = max profit with i transactions ending at day j.

Length

For most coding interview problems:

Easy: 10–25 lines
Medium: 20–50 lines
Hard: 30–80 lines

If your Easy is 80 lines, you’re overengineering. If your Hard is 200 lines, you’ve gone wrong somewhere — re-examine the approach.

Final Self-Review Checklist

Before saying “I’m done”:

☐ Function/variable names are meaningful
☐ No dead code, no commented-out blocks
☐ Boundary checks at the entry, not in the hot loop
☐ Helper functions for distinct logical units
☐ No mutation of input unless intentional and stated
☐ No globals introduced
☐ Standard library used idiomatically
☐ Indentation/formatting consistent
☐ Code I would be willing to send a colleague for review

Interview Communication Rules

Coding interviews are collaborative problem-solving sessions, not exams. Half the signal an interviewer collects is how you communicate. A correct silent solution often scores worse than a slightly imperfect one with strong communication.

Core Principles

Narrate continuously, but not constantly. Your mouth does not need to track your fingers character-by-character. It tracks your intent.
Show your reasoning, not just your conclusion. “I’m choosing a hashmap because we need O(1) lookup by ID” beats silently typing Map<String, Foo>.
Two-way street. Pause for confirmation at decision points. The interviewer wants to be involved.
Hide nothing. If you’re unsure, say so. If you don’t remember an API, say so. Hiding looks worse than admitting.
Forward motion always. Even when stuck, narrate progress: “I’m now going to try a smaller example” is forward motion.

Phase 1 — Opening (first 1–3 minutes)

What to do

Read the problem fully (don’t start coding)
Restate it
Ask clarifying questions
Confirm constraints

Sample phrases

“Let me read this through once first… OK. So if I understand correctly, I need to [restate]. Is that right?”

“Before I start, can I confirm a few things about the input? Specifically: can the array be empty? Can it contain negative numbers? Are there duplicates?”

“What’s the expected size of N here? Around 10^5? OK, so we’re targeting O(N log N) or better.”

“If there are multiple valid outputs, can I return any one, or is there a specific one expected?”

Anti-patterns

Starting to code immediately
Asking 15 questions in a row (drip-feed them as they become relevant)
Asking questions whose answers are obvious from the problem statement (signals you didn’t read carefully)

Phase 2 — Brute Force (1–3 minutes)

What to do

State the dumbest correct solution
Compute its complexity
Confirm whether it’s acceptable

Sample phrases

“Let me start with the brute force just to anchor. I could check every pair, which would be O(N^2). Given N is up to 10^5, that’s 10^10 operations — too slow. So we need something better.”

“The naive solution is O(2^N) — that’s fine for N=20 but won’t scale. Let me see if we can do better.”

“If we sort first, that gives us O(N log N) for the sort, and then… let me think about the scan.”

Anti-patterns

Skipping brute force (“I know the optimal already”)
Computing brute force complexity wrong (always double-check)
Stating brute force without saying whether it suffices

Phase 3 — Optimization (3–10 minutes)

What to do

Think out loud
State observations
Propose ideas and evaluate them
Commit to one direction
Sanity-check before coding

Sample phrases

“I notice the array is sorted, which means I can probably use two pointers…”

“The fact that K is small — only up to 10 — suggests we might want K in our DP state.”

“I see a pattern of repeated subproblems here. Let me define a state: dp[i][j] = …”

“Let me try a small example to verify my recurrence…”

“OK, I think my approach is: sort by start time, then greedily pick. Let me convince myself this works with an exchange argument…”

Anti-patterns

Switching ideas every 30 seconds without exploring any
Coding before stating the approach
Not verifying correctness on a small example

Phase 4 — Coding (10–25 minutes)

What to do

Narrate intent at function/block boundaries
Explain non-obvious choices
Stay quiet during routine syntax
Ask “is the API I’m assuming OK?” if uncertain

Sample phrases

“I’m going to use a heap here; in Python that’s heapq. I’ll push tuples so ties break by ID.”

“For the visited set, I’ll use a hash set rather than a 2D boolean array, since we’re not sure of the bounds.”

“I’m using a sentinel value of -1 to mean ‘not yet computed’ in the memoization.”

“Let me extract this into a helper function — it’ll make the recursion cleaner.”
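The heap phrase above hides a real Python gotcha: heapq compares whole tuples, so when two entries tie on priority the comparison falls through to the next element. A sketch with a hypothetical Task payload type shows why a unique ID in the middle of the tuple matters:

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:  # hypothetical payload type; it defines no ordering
    name: str


def process_in_priority_order(tasks):
    """tasks: list of (priority, task_id, Task). Lower priority value runs first."""
    heap = []
    for priority, task_id, task in tasks:
        # task_id breaks ties, so heapq never compares two Task objects
        # (which would raise TypeError because Task defines no ordering).
        heapq.heappush(heap, (priority, task_id, task))
    order = []
    while heap:
        _, _, task = heapq.heappop(heap)
        order.append(task.name)
    return order
```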

Anti-patterns

Silent typing for 5+ minutes
Constant low-level narration (“now I’m typing a for loop”)
Going down a rabbit hole on language minutiae mid-flow

Phase 5 — Testing (3–5 minutes)

What to do

Walk through the given example
Walk through your own edge cases
Catch and fix bugs before the interviewer points them out

Sample phrases

“Let me trace through example 1. Initial state: … after iteration 1: … after iteration 2: … final answer: 7. Matches the expected.”

“Edge cases: what if the array is empty? My code would… let me check… yes, it returns 0, which is correct.”

“What about all negatives? Hmm, my initialization assumes a non-negative max. Let me fix that — initialize to negative infinity.”

“I think there’s an off-by-one here. Let me re-examine the loop bound.”

Anti-patterns

Skipping testing because “the code looks right”
Testing only the happy path
Defending broken code when a bug is pointed out (just fix it)

Phase 6 — Complexity & Follow-Ups (2–5 minutes)

What to do

State time and space
Mention assumptions
Engage with follow-ups thoughtfully

Sample phrases

“Time complexity is O(N log N), dominated by the sort. Space is O(N) for the auxiliary array, or O(1) extra if we sort in place with a constant-space sort such as heapsort.”

“Average case for the hashmap is O(1) per operation, but worst case is O(N) under adversarial hashing. In Python, str and bytes hashes are randomized per process by default, which makes crafted collisions on string keys much harder.”

“If we needed to scale this to 10 million users, I’d consider… [sharding / external sort / approximate counters / etc.]”

“If reads were much more frequent than writes, we might precompute and cache.”

Anti-patterns

Claiming O(N) when it’s actually O(N log N)
Forgetting space complexity
Defensive answers to follow-ups (“I’d need more info” instead of engaging)

Handling Hints

If the interviewer gives a hint:

“Ah, that helps — so you’re saying we should [restate]. Let me adjust my approach…”

Then commit out loud to the new direction. Don’t second-guess. Take the hint, integrate it, move forward.

If you need a hint:

“I’ve explored [X] and [Y]. I’m having trouble seeing how to avoid the O(N^2) here. Could you nudge me toward the family of approach?”

A well-asked hint costs you almost nothing. Frozen silence is fatal.

Handling Mistakes

When you find your own bug:

“Hmm, let me re-examine this — I think there’s a bug at line N. Yes, the comparison should be < not ≤. Let me fix that.”

When the interviewer finds a bug:

“Oh, you’re right — the empty case isn’t handled. Let me add a guard.”

Never:

Argue
Explain why the bug is “not really a bug”
Get flustered

Always:

Acknowledge briefly
Fix
Move on

Handling Pressure / Freezing

If you blank:

“Give me 30 seconds to think.” (Then actually think — don’t fake it.)

Then run the stuck protocol. Out loud:

“OK, let me back up. What I know is [X]. The brute force was [Y]. The constraint that’s hinting at something is [Z]…”

Talking through the stuck protocol restarts your thinking and shows the interviewer you have a process for being stuck.

Closing Strong

Final 1–2 minutes:

“To summarize: I’m using [data structure] with [algorithm]. Time is O(…), space is O(…). Edge cases handled: empty, single element, duplicates, max constraints. The main risk in production would be [X], which I’d address by [Y].”

A clean summary leaves the interviewer with a tidy mental model of your work — much better than ending mid-test.

Body Language & Tone (Video / In-Person)

Sit up. Look at the interviewer when explaining, at the editor when coding.
Speak at moderate pace. Faster than normal = nervous, slower = padding.
Avoid filler (“um”, “like”) — silence is preferable to filler.
Don’t apologize repeatedly. One apology when you fix a bug is enough.
Show interest in the problem. Curiosity is a positive signal.

Phrases To Avoid

“This is easy” — even if it is, this can read as arrogant
“I’ve seen this before” — be careful; if you go on autopilot you’ll miss the variant
“I don’t know” with no follow-up — replace with “I don’t know X, but I can reason about it via Y”
“That won’t work” without explanation
“Just” used dismissively (“we just need to…”)

What Strong Communication Buys You

Partial credit when your code is incomplete (the interviewer saw your reasoning)
Hints offered earlier (interviewers want to help engaged candidates)
Believability of follow-up answers (a candidate who reasoned clearly throughout is trusted on production tradeoffs)
Hire signal even on hard problems you didn’t fully solve

Failure Analysis System

Every failed problem (or weak performance) maps to one or more failure categories. The category determines the drill that fixes it. Without this taxonomy, you’ll repeat the same mistake forever.

How To Use

After every miss:

Identify the primary failure category (the root cause, not the symptom).
Identify any secondary categories.
Run the listed drill within 24 hours.
Re-test on a similar problem within 1 week.

The 16 Failure Categories

1. Did Not Understand The Problem

Symptom: Solved a different problem from the one the interviewer asked.
Root cause: Skipped restating; didn’t ask clarifying questions.
Fix: Always restate. Always ask 3+ clarifying questions before coding.
Drill: Take 10 LeetCode problems and do only steps 1–4 of FRAMEWORK.md (restate, clarify, identify constraints, examples). Don’t solve them. The drill is reading.
Re-test: Mock interview with an interviewer who deliberately gives an ambiguous problem.

2. Missed Constraints

Symptom: Brute force passed at small N but hit time limit exceeded (TLE) at full N. Or used a 32-bit integer when the sum can exceed 2^31 − 1.
Root cause: Skipped step 3 of the framework.
Fix: Make it a habit: the moment you read a constraint, derive the target complexity out loud.
Drill: Read 20 problems. For each, before reading the editorial, write down the target complexity, the allowed N, and the integer width needed.
Re-test: Solve 5 problems where the constraint is the hint (e.g., N ≤ 20 → bitmask).
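The integer-width check is one multiplication. A sketch with a hypothetical helper name, for the worst-case sum of n values each bounded by max_abs_value in absolute value. Python ints never overflow, so this matters when you code in Java, C++, or Go.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def required_width(n, max_abs_value):
    """Smallest signed integer width that holds a sum of n values with |value| <= max_abs_value."""
    worst_case = n * max_abs_value
    if worst_case <= INT32_MAX:
        return 32
    if worst_case <= INT64_MAX:
        return 64
    return None  # exceeds 64 bits: needs big integers or modular arithmetic
```

For example, 10^5 values up to 10^9 can sum to 10^14, which needs 64 bits.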

3. Could Not Find Brute Force

Symptom: Stared at the problem with no starting point.
Root cause: Tried to find the optimal directly. Dangerous habit.
Fix: Brute force is always possible. Iterate over every subset / pair / arrangement / state. State it even if it’s exponential.
Drill: 10 problems. For each, write only the brute force in pseudocode. Don’t optimize.
Re-test: Mock with a hard problem; the goal is to communicate the brute force in <3 minutes.
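For subset-shaped problems, the brute force is a loop over bitmasks. A sketch on an illustrative subset-sum task, practical only for N up to about 20 per the constraints table in FRAMEWORK.md:

```python
def subsets_with_sum(nums, target):
    """Brute force: every subset (as index lists) whose elements sum to target. O(2^N * N)."""
    n = len(nums)
    result = []
    for mask in range(1 << n):  # each bitmask encodes one subset
        chosen = [i for i in range(n) if (mask >> i) & 1]
        if sum(nums[i] for i in chosen) == target:
            result.append(chosen)
    return result
```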

4. Could Not Optimize

Symptom: Wrote brute force, then froze.
Root cause: No systematic optimization toolkit.
Fix: Run the optimization checklist (steps 7–8 of FRAMEWORK.md): pattern recognition, repeated-work elimination, monotonicity, sortedness exploitation, state compression, math.
Drill: Take 10 brute-force solutions to known problems. Optimize each using the checklist, without looking at the editorial.
Re-test: Solve 5 unfamiliar mediums in <30 min each.

5. Chose Wrong Data Structure

Symptom: Used a list where a set would have given O(1) lookup. Used a heap where a sorted array suffices. Used recursion where a stack would simplify.
Root cause: Didn’t reason about access patterns.
Fix: Before choosing a DS, list the operations you need and their frequency. Then pick the DS whose complexity matrix matches.
Drill: In phase-01-foundations/data-structures/, reread the operations table for each DS. Then re-solve 5 problems, consciously documenting why each DS was chosen.
Re-test: Mock with explicit “why this DS?” follow-ups.

6. Bad Complexity

Symptom: Said O(N) when it was O(N log N). Or claimed worst-case O(1) for operations that are only amortized O(1).
Root cause: Sloppy analysis, no habit of double-checking.
Fix: State complexity and its basis: “O(N log N) because we sort, then do a linear scan”, not just “O(N log N)”.
Drill: Take 20 of your own past solutions. Re-derive the complexity. Compare it to what you said.
Re-test: Mock where the interviewer specifically grills you on complexity.

7. Buggy Implementation

Symptom: Approach was right, but the code had off-by-ones, wrong operators, swapped variables.
Root cause: Coding faster than thinking. Insufficient pre-coding clarity.
Fix: Write pseudocode first. Trace through one example before running anything.
Drill: Standard implementations: write binary search, BFS, DFS, and union-find from scratch 10 times each. Use templates only after you can write them error-free.
Re-test: Solve 5 problems with hand-traced verification before submission.
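As an example of a from-scratch template worth drilling, here is a sketch of union-find with path compression and union by size. The iterative find is a design choice to avoid Python’s recursion limit on long chains.

```python
class UnionFind:
    """Disjoint-set union with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while x != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True
```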

8. Weak Testing

Symptom: Submitted, got “wrong answer on test 7”. Hadn’t tested edge cases.
Root cause: Skipped step 12 of the framework.
Fix: Make the universal edge-case checklist a reflex. Run through it before every submission.
Drill: Take 5 of your “wrong on hidden test” problems. Without looking at the test that failed, generate 10 edge cases for each.
Re-test: Solve 5 problems and catch your own bug before submission. Score = problems where you fixed a bug pre-submission.

9. Poor Communication

Symptom: Mock interviewer said “I had no idea what you were thinking.”
Root cause: Silent coding, or only narrating low-level mechanics.
Fix: Follow COMMUNICATION.md. Narrate intent at decision points.
Drill: Solve 10 problems while recording yourself narrating. Listen back. Are you communicating decisions, or just typing aloud?
Re-test: Mock with explicit communication scoring.

10. Froze Under Pressure

Symptom: Knew the technique in practice. Couldn’t access it in mock. Root cause: Insufficient mock volume. Fix: More mocks. Pressure tolerance is built only by exposure. Drill: 5 mocks in a week with a real human or pressure-simulating tool. Use the stuck protocol explicitly when you freeze. Re-test: Mock with cold problems. Goal: never go silent for >60 seconds.

11. Missed Edge Cases

Symptom: Solution was correct on examples but failed on size-1, empty, or max-constraint inputs. Root cause: Didn’t run the universal checklist. Fix: Build muscle memory for: empty / 1 / 2 / dup / negative / sorted / reversed / max / disconnected / cycle. Drill: Take 10 past solutions. For each, brainstorm 5 edge cases. Verify your code handles them. Re-test: Mock where edge cases are the gating criterion.
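A sketch of turning the checklist into code, in Python: a generator of labeled inputs for an integer-array problem, run against the solution and, on short inputs, against a brute-force reference. The example problem (single-trade maximum profit), the function names, and the constraint defaults are assumptions for illustration.

```python
def edge_case_arrays(max_len=10**5, max_val=10**9):
    """Yield (label, array) pairs for the reflex edge-case checklist.

    max_len and max_val stand in for the problem's stated constraints (assumed).
    Graph-shaped cases (disconnected, cycle) need a separate generator.
    """
    yield "empty", []
    yield "single", [7]
    yield "two", [2, 1]
    yield "all duplicates", [5] * 6
    yield "negatives", [-3, -1, -7, -2]
    yield "sorted", list(range(1, 9))
    yield "reversed", list(range(8, 0, -1))
    yield "extreme values", [max_val, -max_val, max_val]
    yield "max length", [1] * max_len


def max_profit(prices):
    """Example function under test: best single buy-then-sell profit, 0 if none."""
    best, lowest = 0, float("inf")
    for p in prices:
        lowest = min(lowest, p)
        best = max(best, p - lowest)
    return best


def brute_max_profit(prices):
    """O(N^2) reference, used only on short inputs."""
    pairs = [prices[j] - prices[i]
             for i in range(len(prices)) for j in range(i + 1, len(prices))]
    return max(pairs + [0])


for label, arr in edge_case_arrays():
    got = max_profit(arr)
    if len(arr) <= 10:
        assert got == brute_max_profit(arr), label
    print(f"{label}: {got}")
```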

12. Runtime / Language Issue

Symptom: Code was algorithmically right but ran slow / crashed / behaved wrong due to language behavior. Examples: integer overflow in Java, dict iteration order bug, Python recursion limit, C++ undefined behavior. Root cause: Insufficient runtime depth. Fix: phase-09-language-runtime/ for your primary language. Drill: Read your language’s track end-to-end. Solve 5 problems specifically targeting the gotchas (e.g., overflow problems, recursion-depth problems). Re-test: Runtime-deep-dive mock (mock-09).
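For the Python recursion-limit case, a common fix is replacing recursive DFS with an explicit stack. A sketch, assuming an adjacency dict; CPython’s default limit is typically 1000 frames (see `sys.getrecursionlimit()`), so the recursive version fails on a 100,000-node path while the iterative one does not. The function names are illustrative.

```python
def count_reachable_recursive(adj, node, seen=None):
    """Recursive DFS; call depth grows with the longest path explored."""
    if seen is None:  # fresh set per top-level call, avoiding a mutable default
        seen = set()
    seen.add(node)
    for nxt in adj.get(node, ()):
        if nxt not in seen:
            count_reachable_recursive(adj, nxt, seen)
    return len(seen)


def count_reachable(adj, start):
    """Same traversal with an explicit stack: depth is bounded by memory, not frames."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)  # mark on push so each node is pushed at most once
                stack.append(nxt)
    return len(seen)


n = 100_000
path = {i: [i + 1] for i in range(n - 1)}  # 0 -> 1 -> ... -> n-1
print(count_reachable(path, 0))  # 100000
try:
    count_reachable_recursive(path, 0)
except RecursionError:
    print("recursive version hit the recursion limit")
```

Raising the limit with `sys.setrecursionlimit` is possible, but the Python docs warn that a too-high limit can crash the interpreter, so the explicit stack is the safer answer.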

13. Concurrency Issue

Symptom: Race condition, deadlock, lost update, visibility bug. Root cause: Missing concurrency mental model. Fix: phase-09-language-runtime/ (concurrency sections) + phase-08-practical-engineering/ thread-pool / job-queue labs. Drill: Implement thread-pool, rate limiter, and producer-consumer queue from scratch with race-condition tests. Re-test: Concurrency-heavy mock (mock-11).
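A sketch of the producer-consumer drill in Python: a bounded blocking queue built on `threading.Condition`, plus a stress test that checks no item is lost or duplicated. This is illustrative only; in production Python, `queue.Queue(maxsize=...)` already provides this behavior. Class and function names are hypothetical.

```python
import threading
from collections import deque


class BoundedBlockingQueue:
    """FIFO with fixed capacity: put() blocks while full, get() blocks while empty."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        # Two conditions on one lock: producers wait on not_full, consumers on not_empty.
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._capacity:  # re-check after every wakeup
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()  # wake one waiting consumer

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()  # wake one waiting producer
            return item


def stress(num_producers=4, items_each=1000, capacity=8, num_consumers=2):
    """Run producers and consumers concurrently; return every item consumed."""
    q = BoundedBlockingQueue(capacity)
    received, received_lock = [], threading.Lock()
    per_consumer = num_producers * items_each // num_consumers  # assumes it divides evenly

    def producer(pid):
        for i in range(items_each):
            q.put((pid, i))

    def consumer():
        for _ in range(per_consumer):
            item = q.get()
            with received_lock:
                received.append(item)

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(num_producers)]
    threads += [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


items = stress()
print(len(items), len(set(items)))  # 4000 4000: nothing lost, nothing duplicated
```

The `while` loops around `wait()` are the detail interviewers look for: a woken thread must re-check the condition, because another thread may have changed the queue first.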

14. Overfit To Memorized Pattern

Symptom: Forced a known pattern that didn’t fit. E.g., applied sliding window to a problem that needed monotonic deque. Root cause: Pattern matching without understanding. Fix: Re-read pattern docs and focus on when the pattern does NOT apply. Drill: Take 10 problems. For each, list 3 patterns it might be, then eliminate 2 with reasoning. Re-test: Mock with problems specifically chosen to be near-misses of common patterns.

15. Did Not Prove Correctness

Symptom: Submitted a greedy that was wrong. Or a DP whose recurrence was incorrect. Root cause: Skipped step 9 of the framework. Fix: Force yourself to state the invariant or recurrence before coding. Drill: Take 10 greedy problems. For each, state the exchange argument. Take 10 DP problems. For each, state the recurrence + base case + evaluation order. Re-test: Solve 5 problems where the proof is the hard part.
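A sketch of both halves of this drill in Python, using coin change as an assumed example. The DP states its recurrence, base case, and evaluation order in the docstring, and a small exhaustive search compares it against the largest-coin-first greedy to find a counterexample, the same move the checklist asks for with wrong greedies. Function names are illustrative.

```python
def min_coins(coins, amount):
    """Fewest coins summing to amount, or -1 if impossible.

    State:      best[a] = fewest coins that sum to exactly a
    Recurrence: best[a] = 1 + min(best[a - c] for c in coins if c <= a)
    Base case:  best[0] = 0
    Order:      a increases, so every best[a - c] is final before it is read
    """
    INF = float("inf")
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] != INF else -1


def greedy_coins(coins, amount):
    """Always take the largest coin that fits; -1 if it gets stuck."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else -1


def find_counterexample(coins, limit=100):
    """Return the smallest amount where greedy disagrees with the DP, else None."""
    for amount in range(1, limit + 1):
        if greedy_coins(coins, amount) != min_coins(coins, amount):
            return amount
    return None


print(find_counterexample([1, 3, 4]))       # 6: greedy 4+1+1 (3 coins), optimal 3+3 (2)
print(find_counterexample([1, 5, 10, 25]))  # None
```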

16. Could Not Handle Follow-Up

Symptom: Solved the core problem, but interviewer’s follow-up (“how would this scale to 10M users?”) got a vague answer. Root cause: No production / system thinking. Fix: phase-08-practical-engineering/ labs all include the standard follow-up bank. Drill: Take 10 of your past solutions. For each, write a 2-paragraph answer to: “scale to 10M users”, “make distributed”, “handle partial failure”, “add observability”. Re-test: Senior-engineer mock (mock-07) which weighs follow-ups heavily.

Tracking Failures Over Time

Maintain a failures.md (or spreadsheet) with columns:

Date	Problem	Primary Category	Secondary	Drill Done?	Re-test Date	Re-test Result

After 4 weeks, count by category. Your top 3 categories are your personal weakness profile. Spend the next 4 weeks specifically drilling those.

This is how you avoid the “I keep failing at the same thing” trap.
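A sketch of the 4-week count in Python, assuming the log is exported as tab-separated text with the columns listed above. The sample rows are invented for illustration, and `weakness_profile` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

COLUMNS = ["Date", "Problem", "Primary Category", "Secondary",
           "Drill Done?", "Re-test Date", "Re-test Result"]

# Invented rows for illustration only; real rows come from your own failure log.
SAMPLE_ROWS = [
    ["2026-05-01", "problem-a", "Buggy Implementation", "Weak Testing", "Y", "2026-05-08", "pass"],
    ["2026-05-03", "problem-b", "Bad Complexity", "", "Y", "2026-05-10", "pass"],
    ["2026-05-06", "problem-c", "Buggy Implementation", "", "N", "", ""],
    ["2026-05-09", "problem-d", "Weak Testing", "Missed Edge Cases", "Y", "2026-05-16", "fail"],
    ["2026-05-12", "problem-e", "Buggy Implementation", "", "Y", "2026-05-19", "pass"],
    ["2026-05-15", "problem-f", "Weak Testing", "", "N", "", ""],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [COLUMNS] + SAMPLE_ROWS)


def weakness_profile(tsv_text, top=3):
    """Count failures by Primary Category and return the `top` most frequent."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(row["Primary Category"] for row in rows)
    return counts.most_common(top)


print(weakness_profile(SAMPLE_TSV))
# [('Buggy Implementation', 3), ('Weak Testing', 2), ('Bad Complexity', 1)]
```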

Common Compound Failures

Some categories often co-occur. If you see these together, treat the deeper one:

#1 + #11 (didn’t understand + missed edges) → really #1. Fix understanding first.
#3 + #4 (no brute force + couldn’t optimize) → really #3. You can’t optimize what you don’t have.
#7 + #8 (buggy + weak testing) → really #8. Better testing catches bugs.
#5 + #6 (wrong DS + bad complexity) → really #5. Right DS gives right complexity.
#10 + #9 (froze + bad communication) → really #9. Talking unfreezes you.
#14 + #15 (overfit pattern + no proof) → really #15. Proving forces understanding.

When Failures Are Good

Some failures are productive:

First attempt at a new pattern → expected to fail; the failure is what teaches.
Difficulty stretch (jumping a level) → expected to fail 50%+ of the time.
Mock interviews → 30%+ failure rate is normal and healthy.

Failures are bad when:

You repeat the same category 5+ times without improvement.
You stop tracking them.
You don’t run the drill.
You blame the problem (“that was unfair”) instead of analyzing yourself.

Final Readiness Checklist

You are ready for the interviews you’re targeting only when every box below is honestly checked. This is binary, not aspirational.

Algorithmic Solving

Solve LeetCode Easy in <12 minutes (90% success rate over 50 recent problems)
Solve LeetCode Medium in 25–35 minutes (75% success rate over 100 recent problems)
Solve LeetCode Hard in 45–60 minutes (50% success rate over 50 recent problems)
Recognize the pattern within 2 minutes of reading any LeetCode Medium
Derive a non-trivial optimization without seeing it before, on at least 30 unfamiliar problems

Brute Force & Optimization

Can state the brute force in <2 minutes for any unseen problem
Can compute brute force complexity correctly without aid
Can derive optimal complexity from constraints alone
Have written brute-force-comparator tests for at least 10 problems
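A brute-force comparator, sketched in Python: random small inputs, an O(N²) reference, and the candidate solution, stopping at the first disagreement. Maximum subarray sum (Kadane’s algorithm) is an assumed example; the deliberately buggy variant shows the harness catching the classic all-negative mistake. Function names are illustrative.

```python
import random


def max_subarray_brute(nums):
    """O(N^2) reference: best sum over every non-empty contiguous subarray."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best


def max_subarray_kadane(nums):
    """O(N): cur is the best sum of a subarray ending at the current index."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def max_subarray_buggy(nums):
    """Classic bug: clamping at 0 returns 0 when every element is negative."""
    best = cur = 0
    for x in nums:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


def stress_test(candidate, reference, trials=2000, seed=0):
    """Return the first input where candidate != reference, or None if all agree."""
    rng = random.Random(seed)  # fixed seed makes any failure reproducible
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        if candidate(nums) != reference(nums):
            return nums
    return None


print(stress_test(max_subarray_kadane, max_subarray_brute))  # None
print(stress_test(max_subarray_buggy, max_subarray_brute))   # a failing input
```

Keeping inputs tiny (length 1 to 8, values −10 to 10) is deliberate: failures shrink to cases you can trace by hand.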

Correctness & Proofs

Can state DP recurrences with base cases and evaluation order on the spot
Can produce an exchange argument for at least 5 greedy problems
Can produce a counterexample to a wrong greedy in <5 minutes
Have proved correctness for at least 20 different solutions (greedy, DP, graph)

Code Quality

Can write binary search, BFS, DFS, union-find, topological sort, and Dijkstra from scratch error-free in <10 minutes each
Code passes the self-review checklist on every solve
No off-by-one bugs in 10 consecutive binary search problems
No mutable default argument / shared-state bugs in 10 consecutive recursion problems
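For reference when drilling, a compact union-find sketch in Python with path compression and union by size (one common variant; union by rank works equally well). Names are illustrative.

```python
class UnionFind:
    """Disjoint sets over 0..n-1 with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.components = n

    def find(self, x):
        root = x
        while self.parent[root] != root:  # first pass: locate the root
            root = self.parent[root]
        while x != root:                  # second pass: point the whole path at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra               # attach the smaller tree under the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.components -= 1
        return True


uf = UnionFind(5)
uf.union(0, 1)
uf.union(3, 4)
print(uf.components, uf.find(1) == uf.find(0))  # 3 True
```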

Testing

Run the universal edge-case checklist as reflex on every problem
Have written stress-test verifiers for at least 5 problems
Catch your own bugs before submission on 80%+ of problems
Have property-based tested at least 3 implementations

Patterns

All 28 patterns in phase-02-patterns/ — recognize signals in <2 minutes
Have solved at least 5 problems per pattern
Can explain when each pattern does not apply
Can produce a clean template for each pattern from memory

Data Structures

Can explain the internal representation, complexity, and memory behavior of all foundation DS in phase-01-foundations/
Can choose between hashmap / sorted array / heap / BST given access patterns
Can implement segment tree, Fenwick tree, trie, LRU cache from scratch
Understand iterator invalidation, hash collision behavior, and resize cost in your primary language
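Of the four structures above, the Fenwick (binary indexed) tree is the shortest to write from scratch. A Python sketch with 0-based public indices over a 1-based internal array, which is one common convention; the class and method names are illustrative.

```python
class FenwickTree:
    """Point update and prefix-sum query in O(log N); public indices are 0-based."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)  # tree[i] stores the sum of a block ending at i (1-based)

    def add(self, i, delta):
        """Add delta to element i."""
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i            # jump to the next block that also covers element i

    def prefix_sum(self, i):
        """Sum of elements [0, i)."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i            # drop the lowest set bit to reach the previous block
        return total

    def range_sum(self, lo, hi):
        """Sum of elements [lo, hi)."""
        return self.prefix_sum(hi) - self.prefix_sum(lo)


ft = FenwickTree(5)
for i, v in enumerate([3, 1, 4, 1, 5]):
    ft.add(i, v)
print(ft.prefix_sum(3), ft.range_sum(1, 4))  # 8 6
```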

Graph Algorithms

BFS, DFS, Dijkstra, topological sort, union-find — implement from scratch in <10 minutes each
Recognize when 0-1 BFS, Bellman-Ford, or A* is the right choice
Can model a word problem as a graph (nodes + edges + weight) in <3 minutes
Have completed all 9 product-style labs in phase-04-graphs/labs/
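A from-scratch Dijkstra sketch in Python using `heapq` with lazy deletion of stale heap entries. It assumes non-negative edge weights: with negative weights Bellman-Ford is the usual choice, and with 0/1 weights a deque-based 0-1 BFS avoids the heap. The adjacency-dict format is an assumption.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source; adj[u] is a list of (v, weight), weight >= 0."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a smaller distance
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Nodes that are unreachable from `source` simply never appear in the result.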

Dynamic Programming

Can derive state + transition + base case for unseen DP problems
Have completed brute-force → memo → tabulated → space-optimized for at least 15 problems
Recognize 1D / 2D / interval / tree / digit / bitmask / knapsack signals
Can articulate why greedy fails for a given DP problem
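The four-stage progression, sketched in Python on an assumed example problem (maximum sum of non-adjacent elements, often called “house robber”). Writing the state and transition once, as comments, keeps all four versions consistent. Function names are illustrative.

```python
from functools import lru_cache

# State:      best(i) = max sum choosing non-adjacent elements from nums[i:]
# Transition: best(i) = max(best(i + 1), nums[i] + best(i + 2))
# Base case:  best(i) = 0 for i >= len(nums)


def rob_brute(nums):
    def best(i):
        if i >= len(nums):
            return 0
        return max(best(i + 1), nums[i] + best(i + 2))
    return best(0)  # exponential time: the same subproblems are recomputed


def rob_memo(nums):
    @lru_cache(maxsize=None)
    def best(i):
        if i >= len(nums):
            return 0
        return max(best(i + 1), nums[i] + best(i + 2))
    return best(0)  # O(N) time, O(N) cache plus O(N) recursion depth


def rob_tab(nums):
    n = len(nums)
    dp = [0] * (n + 2)              # dp[n] = dp[n + 1] = 0 are the base cases
    for i in range(n - 1, -1, -1):  # right to left: dp[i] reads dp[i + 1] and dp[i + 2]
        dp[i] = max(dp[i + 1], nums[i] + dp[i + 2])
    return dp[0]


def rob_space(nums):
    next1 = next2 = 0               # dp[i + 1] and dp[i + 2]
    for x in reversed(nums):
        next1, next2 = max(next1, x + next2), next1
    return next1                    # O(N) time, O(1) space


print(rob_space([2, 7, 9, 3, 1]))  # 12 (2 + 9 + 1)
```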

Language & Runtime

Read your primary language’s track in phase-09-language-runtime/ end to end
Can explain stack vs heap, scope/lifetime, value vs reference semantics fluently
Can explain GC / ownership behavior of your language
Can explain hash collision and resize behavior of your language’s hashmap
Have used the language’s profiler at least once on real code

Concurrency

Can identify race conditions and deadlocks in code review
Have implemented thread pool, rate limiter, and bounded blocking queue from scratch
Can articulate memory visibility / happens-before in your language
Have used the race detector / equivalent at least once

Practical Engineering

Completed at least 12 of the 23 labs in phase-08-practical-engineering/
Can answer all 13 standard follow-ups (10M users, distributed, concurrency, race testing, metrics, logging, debugging, partial failure, memory leaks, extensibility, backpressure, retries, deduplication) on demand
Can sketch a small system (LRU cache, rate limiter, autocomplete) on a whiteboard in 30 minutes
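For the rate-limiter sketch, a token bucket is one common design (sliding-window counters are another). A minimal single-threaded Python version, assuming an injectable clock so behavior can be tested without sleeping; a concurrent version would need a lock around `allow`. Names and parameters are illustrative.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable for tests; defaults to a monotonic clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill lazily for the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class FakeClock:
    """Manually advanced clock for tests (hypothetical helper)."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
bucket = TokenBucket(capacity=3, rate=1.0, clock=clock)
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
clock.t = 2.0
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

Lazy refill on each call avoids a background timer thread, which is a useful point to raise when the interviewer asks how the design scales.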

Communication

Restate every problem before coding
Ask 3+ clarifying questions per problem
Narrate intent at decision points (not low-level mechanics)
Recover from mistakes without getting flustered
Handle hints gracefully and integrate them
Close every solve with a clean summary

Mock Interview Performance

Pass mock-03 (Medium LeetCode) — 7+/10 average
Pass mock-05 (Big Tech phone) — 7+/10 average
Pass mock-06 (Big Tech onsite) — 7+/10 average
Pass mock-09 (runtime/language) — 7+/10 average
Pass mock-11 (concurrency) — 7+/10 average
(For senior+) Pass mock-07 (senior engineer)
(For staff+) Pass mock-08 (staff practical)

Failure Analysis

Maintain a failure log per FAILURE_ANALYSIS.md
Top 3 personal failure categories identified
At least 2 weeks of focused drilling on top failure category complete
Re-test results show measurable improvement

Spaced Repetition

Active spaced repetition rotation per SPACED_REPETITION.md
At least 50 problems graduated to Tier 6
No overdue reviews more than 1 week old

Recovery & Stuck Protocol

Never go silent for >60 seconds in a mock
Use the stuck protocol explicitly when stuck
Recover from a wrong direction in <3 minutes
Ask for hints professionally without losing composure

Production Awareness

Can extend any solved problem with: scale to 10M, make distributed, handle partial failure, add observability
Can articulate tradeoffs (memory vs latency, consistency vs availability, accuracy vs speed)

Targeted Roles

Beyond the universal checklist, additional criteria by role:

FAANG / Big Tech

All universal items
mock-05 + mock-06 passed twice with different problems

Infrastructure / Backend / Platform

All universal items
mock-08 + mock-10 passed
All Phase 8 labs complete

Distributed Systems

All universal items
mock-08 + mock-10 + mock-11 passed
Phase 8 + Phase 4 (graph) labs complete

Compiler / Runtime

All universal items
mock-09 passed twice
Phase 9 fully complete for primary language
Phase 3 (advanced DS) labs complete

Quant / HFT

All universal items
mock-12 (competitive style) passed
Phase 7 (competitive) topics complete
Phase 5 (DP) extreme topics complete

Senior / Staff / Principal Practical

All universal items
mock-07 + mock-08 passed twice each
All Phase 8 labs complete with full follow-up answers
Code quality bar met consistently

Competitive Programming

All universal items
Phase 7 + Phase 12 complete
mock-12 (competitive style) passed twice
Solving Codeforces Div 2 D regularly

Honesty Test

For every checked box, ask: “Could I do this right now, cold, with no warm-up?”

If the answer is “after I review my notes” — uncheck it. Notes don’t come into the interview.

If the answer is “if I had a good day” — uncheck it. Interview days are sometimes bad days.

Honesty here is the difference between feeling ready and being ready.

Problem Review Template

Use this after every problem — solved or failed. The review is where the learning compounds. Without it, problems are forgotten in 72 hours.

Save each review as a separate file or notebook entry. Recommended: reviews/YYYY-MM-DD-problem-name.md.

Template

# Problem Name

## Source
LeetCode 42 / Codeforces Round 800 Div 2 C / etc.

## Difficulty
Easy / Medium / Medium-Hard / Hard / Very Hard / CP-Hard / Grandmaster

## Pattern(s)
Two pointers / Sliding window / DP-2D / Graph BFS / etc.
(Multiple patterns possible.)

## First Intuition
What was my first instinct on reading the problem?
What pattern did I think it was?
Was that intuition right?

## Brute Force
- Approach (1–2 sentences)
- Complexity: time / space
- Why it doesn't pass (or "passes within constraints")

## Optimal Idea
- Approach (3–5 sentences)
- Key insight (the *one* thing that unlocks the problem)
- Complexity: time / space

## Why I Missed It (if applicable)
The honest answer. Choose from:
- Didn't recognize the pattern
- Recognized the pattern but applied it wrong
- Couldn't derive the recurrence / invariant
- Wrong data structure choice
- Implementation bug
- Misread the problem
- Ran out of time
- Got the optimal but couldn't prove it

## Key Insight
The single sentence that, if I'd known it, would have unlocked the problem in 5 minutes.

## Data Structures Used
- DS 1: why
- DS 2: why
- (etc.)

## Complexity
Time: O(...)
Space: O(...)
Tight bound? Y/N
Amortized? Y/N

## Bugs I Made
- Off-by-one at line X
- Forgot to handle empty case
- Wrong comparison operator (<= vs <)
- Used wrong variable in inner loop
- Modified collection while iterating
- (etc.)

## Edge Cases I Missed
- Empty input
- Single element
- All duplicates
- All negatives
- Max constraint
- Disconnected component
- (etc.)

## Follow-ups Practiced
- "What if the input is streamed?" → answer
- "What if N is 10^9?" → answer
- "What if memory is constrained?" → answer

## Product Extension
How does this map to a real-world system?
e.g., "This LRU cache pattern is exactly what a CDN edge node uses for hot-content eviction."

## Language/Runtime Notes
- Specific stdlib gotcha I hit
- Memory behavior surprise
- Concurrency consideration if relevant

## How I Would Recognize This Again
A pattern signal in plain English:
“When the problem asks for the minimum window covering K elements over a stream, where order matters and elements enter and leave in constant time, it’s a sliding window with a hashmap.”

This is the most important field. Optimize for *recognition*, not memorization.

## Re-solve Schedule
Per [SPACED_REPETITION.md](SPACED_REPETITION.md):
- Same day: ☐ done / not done
- 2 days later: ☐ scheduled for [date]
- 1 week later: ☐ scheduled for [date]
- 2 weeks later: ☐ scheduled for [date]
- 1 month later: ☐ scheduled for [date]
- 3 months later: ☐ scheduled for [date]

## Attempts Log
| Date | Unaided? | Time | Outcome | Notes |
|---|---|---|---|---|
| 2026-05-20 | N | 45 min | Wrong, then hint | Missed monotonic stack pattern |
| 2026-05-22 | Y | 18 min | Correct | Recognized pattern immediately |
| 2026-05-29 | Y | 12 min | Correct | Optimal first attempt |

How To Fill It Out (Do’s and Don’ts)

DO

Be brutally honest. “I gave up after 20 minutes” is more useful than “I solved it but slowly”.
Name the one insight. Forcing yourself to a single sentence forces understanding.
Re-solve from a blank file at scheduled intervals.
Tag heavily. Pattern, difficulty, data structure — these become your search keys.

DON’T

Copy the editorial verbatim. Translate into your own words.
Skip the “Why I Missed It” field when you got it right. Ask what was hard about it, and what might have tripped you up if you’d been less lucky.
Skip the “How I Would Recognize This Again” field. This is where 80% of the value lives.
Move on without scheduling the next re-solve.

Review Aggregation

Every Friday: skim the week’s reviews. Look for patterns:

“I keep missing monotonic stack opportunities” → drill that pattern.
“I keep making off-by-one bugs in binary search” → write a personal binary search template and use it.
“I keep choosing the wrong DP state on tree problems” → revisit phase-05-dp/categories/dp-tree.md.

Every month: aggregate into a personal weakness list. Top 3 weaknesses get dedicated drilling for the next month.

Spaced Repetition System

The brain forgets new information on a curve. Without re-exposure, ~70% of what you learn today is gone in 7 days. Spaced repetition counteracts this with strategically-timed reviews.

This system applies to two things:

Problems you’ve solved (especially failed-then-solved ones)
Concepts (patterns, data structures, algorithms)

The 6-Tier Interval Schedule

Tier	Interval	Action
1	Same day (within 4 hours of first solve)	Re-solve from scratch
2	2 days later	Re-solve from scratch
3	1 week later	Re-solve from scratch
4	2 weeks later	Re-solve from scratch + try a harder variant
5	1 month later	Verbal explanation + complexity + sketch tests
6	3 months later	Verbal explanation only — proves it’s in long-term memory

Graduation criterion: if you complete the Tier 6 review unaided in <120% of your best time, the problem is “owned”; drop it from active rotation. Otherwise, restart at Tier 3.

What “Re-solve” Means

Re-solving is not reading your old solution. It is:

Fresh editor / blank file.
Re-read the problem statement.
Solve from scratch.
Compare to your previous solution after.

If you can’t do it without peeking → demote one tier (tier 4 → tier 3) and continue.
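The promotion, demotion, and graduation rules can be captured in one small function. A Python sketch under stated assumptions: each interval is measured from the previous review, “1 month” and “3 months” are approximated as 30 and 90 days, and `next_review` is a hypothetical helper, not part of any tool.

```python
from datetime import date, timedelta

# Assumption: days from one review to the next tier's review; months approximated.
TIER_INTERVAL_DAYS = {1: 0, 2: 2, 3: 7, 4: 14, 5: 30, 6: 90}


def next_review(tier, passed, review_day, minutes=None, best_minutes=None):
    """Return (new_tier, due_date) after a review; (None, None) means graduated."""
    if tier == 6:
        fast_enough = (minutes is not None and best_minutes is not None
                       and minutes < 1.2 * best_minutes)
        if passed and fast_enough:
            return None, None      # owned: drop from active rotation
        new_tier = 3               # Tier 6 miss: restart at Tier 3
    elif passed:
        new_tier = tier + 1
    else:
        new_tier = max(1, tier - 1)  # needed to peek: demote one tier
    return new_tier, review_day + timedelta(days=TIER_INTERVAL_DAYS[new_tier])


d = date(2026, 5, 20)
print(next_review(1, True, d))   # (2, datetime.date(2026, 5, 22))
print(next_review(4, False, d))  # (3, datetime.date(2026, 5, 27))
print(next_review(6, True, d, minutes=14, best_minutes=12))  # (None, None)
```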

What “Verbal Explanation” Means (Tier 5+)

Without writing code, explain out loud (record yourself if alone):

Restate the problem in your own words.
State the brute force.
State the optimal approach.
State the key insight.
State the complexity (time + space) and why.
State 3 edge cases and how they’re handled.
Sketch 2 follow-up answers.

Listen back. Did you stumble? Demote one tier. Did it flow? Advance to the next tier.

Concept-Level Spaced Repetition

For patterns and algorithms (not problems), use a similar schedule but with different review actions:

Tier	Interval	Action
1	Same day as learning	Solve 2 problems applying it
2	2 days	Solve 1 problem (different difficulty)
3	1 week	Teach the concept verbally (recorded or to a peer)
4	2 weeks	Solve 1 problem in a domain you don’t usually associate with it
5	1 month	Compare/contrast with a related pattern
6	3 months	Solve 1 hard problem cold; if you spot the pattern in <2 minutes, the concept is owned

Logistics: How To Maintain The Schedule

Option 1: Spreadsheet

Sort by Next Review Date. Top of the list = today’s reviews.

Option 2: Anki (or SRS app)

One card per problem.
Front: problem name + difficulty.
Back: pattern + key insight + complexity.
Use the SRS scheduling.

Custom intervals matter: the default Anki intervals are tuned for vocabulary, not problems. Use intervals that mirror the tiers: same day, 2d, 7d, 14d, 30d, 90d.

Option 3: Folder structure (no tooling)

reviews/
  today.md            # editable list of today's review problems
  upcoming/
    2026-05-22.md     # problems due that date
    2026-05-29.md
    ...
  archive/
    YYYY-MM-DD-problem-name.md

Each evening: move tomorrow’s file to today.md.
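The evening step can be scripted. A Python sketch using `pathlib`, assuming the `reviews/` layout above with ISO-dated file names; it overwrites `today.md`, and leaves it untouched when nothing is due tomorrow. The function name is hypothetical.

```python
from datetime import date, timedelta
from pathlib import Path


def rotate_reviews(root, today=None):
    """Evening job: make tomorrow's upcoming/ file the new today.md.

    Returns the moved file's original path, or None if nothing is due tomorrow.
    """
    root = Path(root)
    tomorrow = (today or date.today()) + timedelta(days=1)
    upcoming = root / "upcoming" / f"{tomorrow.isoformat()}.md"  # e.g. 2026-05-22.md
    if not upcoming.exists():
        return None
    upcoming.replace(root / "today.md")  # moves the file, overwriting today.md
    return upcoming


# Usage (run each evening from the directory that contains reviews/):
# rotate_reviews("reviews")
```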

Daily Volume Guidelines

When you have N problems in active rotation, your daily review load looks like:

Active Problems	Daily Reviews	New Problems
0–50	0–5	4–6
50–150	5–12	3–5
150–300	10–20	2–3
300+	15–25	1–2

When daily reviews exceed your capacity:

Graduate aggressively (drop owned problems).
Slow down new problem intake.
Consolidate easy/owned problems into “weekly batch reviews” instead of individual reviews.

What To Do When You Fall Behind

Inevitable. When you have 50+ overdue reviews:

Don’t panic-skip. Don’t mark them all done.
Triage by tier. Tier 1 + 2 are the most fragile — do those first.
Pause tier 5 + 6 reviews for 1 week; they decay slowly.
Reduce new intake to 0 until reviews are caught up.
Audit: what made you fall behind? Too many new problems? Underestimated review time? Adjust intake rate.
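The triage order above can be made mechanical. A Python sketch, assuming each overdue review is a `(problem, tier, due_date)` tuple; it works lowest tiers first (oldest due date first within a tier) and pushes Tier 5 and 6 items back one week. The function name and tuple shape are hypothetical.

```python
from datetime import date, timedelta


def triage(overdue, today):
    """Split overdue reviews into (work_now, deferred).

    work_now: Tier 1-4 items, lowest tier first, oldest due date first within a tier.
    deferred: Tier 5-6 items, re-dated one week from today.
    """
    work_now, deferred = [], []
    for problem, tier, due in overdue:
        if tier >= 5:
            deferred.append((problem, tier, today + timedelta(days=7)))
        else:
            work_now.append((problem, tier, due))
    work_now.sort(key=lambda item: (item[1], item[2]))
    return work_now, deferred


today = date(2026, 6, 1)
overdue = [
    ("p1", 3, date(2026, 5, 20)),
    ("p2", 1, date(2026, 5, 28)),
    ("p3", 6, date(2026, 5, 1)),
    ("p4", 1, date(2026, 5, 25)),
    ("p5", 2, date(2026, 5, 30)),
]
work_now, deferred = triage(overdue, today)
print([p for p, _, _ in work_now])  # ['p4', 'p2', 'p5', 'p1']
```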

Why This Matters

Without spaced repetition, your interview prep is a leaky bucket. You add 10 problems a week, lose 8 to forgetting, net +2.

With spaced repetition, you add 5 problems a week, retain 5, and after 3 months you have 60 deeply-owned problems instead of 30 vaguely-remembered ones.

In an interview:

Vaguely-remembered → “I think I’ve seen this before, but I can’t quite…”
Deeply-owned → “This is a [pattern] problem. The key insight is [X]. I’d solve it with [approach].”

The latter is a hire signal. The former is not.

Integration With The Curriculum

Every problem solved during the curriculum enters Tier 1 automatically.
Every problem in phase-11-mock-interviews/ you fail enters Tier 1 with a failure_category tag.
Every concept in a phase README enters Tier 1 the day you finish the phase.
The Tier 6 graduation criterion is also part of the READINESS_CHECKLIST.md.

12-Week Accelerated Track

Audience: You have a deadline in 8–14 weeks (a known interview, an offer expiring, a layoff window). You can put in 25–35 hours/week.

Tradeoffs you are accepting:

You will skip Phase 7 (competitive programming) almost entirely.
You will skip Phase 12 (grandmaster) entirely.
You will get to “competent at FAANG mediums” not “consistently solves FAANG hards in 45 minutes”.
Concept depth is sacrificed for problem volume in some weeks.

This track is sufficient for: new-grad / SWE2 FAANG, scaleups, most backend/platform roles. This track is NOT sufficient for: staff/principal interviews, quant/HFT, compiler/runtime, distributed systems specialty roles.

Daily Cadence

Weekday: 3–5 hours
- 90 min: new content (read concept doc + work through 1 lab)
- 90 min: problem solving (3–5 problems)
- 30 min: review (spaced repetition queue)
- 30 min: failure analysis if you missed anything
Saturday: 6 hours
- 1 mock interview (90 min including review)
- 4 hours problem solving
- 30 min review
Sunday: 4 hours
- Weakness drilling on top failure category
- Re-solve all this week’s failures
- Plan next week

Total: 25–35 hours/week

Weekly Plan

Week 1 — Foundation Reset

Goal: Stop being random. Internalize the framework.

Read README.md, FRAMEWORK.md, COMMUNICATION.md, CODE_QUALITY.md in full
Complete all 7 labs in phase-00-execution-baseline/
25 LeetCode Easy problems applying the framework rigorously (full restate, brute force, optimize, test)
0 mocks (you’re not ready)

Mastery check: You can solve LeetCode Easy in 12 minutes with the full framework, narrating throughout.

Week 2 — Foundations Of Data Structures

Read all 15 DS docs in phase-01-foundations/data-structures/
30 problems: array, string, hashmap, stack, queue (mostly Easy + a few Medium)
1 mock (Easy level)
Read phase-01-foundations/runtime/ — stack vs heap, scope/lifetime, mutable vs immutable

Mastery check: Operations + complexity table for every DS reproducible from memory.

Week 3 — Linked Lists, Trees, Recursion

Phase 1 labs: linked list reversal, recursion, binary search, basic trees
25 problems (15 Easy + 10 Medium)
Read remaining runtime docs in Phase 1
1 mock (Easy/Medium mix)

Week 4 — Pattern Onboarding (Part 1)

Read patterns: two pointers, sliding window, prefix sums, hashing, sorting+greedy, binary search, monotonic stack
For each pattern: 4–5 problems (Medium-leaning)
1 mock (Medium)

Mastery check: Can recognize each pattern’s signal in <2 minutes for unseen problems.

Week 5 — Pattern Onboarding (Part 2)

Read patterns: monotonic queue, intervals, linked list manipulation, tree DFS/BFS, graph DFS/BFS, topological sort
30 problems across these patterns
1 mock (Medium)

Week 6 — Pattern Onboarding (Part 3) + DP Intro

Read patterns: union find, backtracking, basic DP, 1D DP
Read phase-05-dp/concepts/ — memoization, state definition, transitions, base cases
25 DP problems (Easy → Medium)
1 mock (Medium)

Mastery check: Can write the brute → memo → tabulated → space-optimized progression for any 1D DP.

Week 7 — Trie, Heap, K-way Merge, 2D DP

Read patterns: trie, heap top-K, K-way merge, 2D DP, knapsack, subsequence DP, string DP
25 problems
1 mock (Medium-Hard)

Week 8 — Graphs Deep Dive

Read phase-04-graphs/algorithms/: BFS, DFS, multi-source BFS, 0-1 BFS, Dijkstra, topological sort, cycle detection, MST (Kruskal+Prim), union find
25 graph problems
2 product-style labs from phase-04-graphs/labs/
1 mock (Medium-Hard graph-heavy)

Week 9 — DP Deeper + Greedy

Read DP categories: tree DP, interval DP, bitmask DP (overview), knapsack variants, LIS, edit distance
Read phase-06-greedy/concepts/: greedy choice, exchange argument, invariants
25 problems (mix DP + greedy)
2 greedy labs (with proofs)
1 mock (Medium-Hard mixed)

Week 10 — Practical Engineering Coding

Complete labs from phase-08-practical-engineering/: LRU cache, rate limiter, autocomplete, thread pool, KV store, retry+backoff
Read your primary language’s track in phase-09-language-runtime/
15 problems (mediums + 1–2 hards)
2 mocks (1 Big Tech phone, 1 Big Tech onsite)

Week 11 — Hard Problems + Mock Marathon

20 LeetCode Hard problems
Daily mock interviews (5 mocks this week, mix of phase-11-mock-interviews/ types)
Failure analysis on every mock
Drill top failure category

Week 12 — Polish, Confidence, Rest

3 mocks (cold, varied — no pre-warmup)
Re-solve top 30 problems from spaced repetition queue
Complete READINESS_CHECKLIST.md honestly
1 day of complete rest before the interview

Problem Volume Targets

Week	New Problems	Reviews	Mocks
1	25 (Easy)	0	0
2	30 (E+M)	10	1
3	25 (E+M)	15	1
4	25 (M)	20	1
5	30 (M)	25	1
6	25 (M)	30	1
7	25 (M-H)	30	1
8	25 (M-H)	30	1
9	25 (M-H)	30	1
10	15 (M-H + H)	30	2
11	20 (H)	30	5
12	0	30 (re-solves)	3
Total	~270	~280	18

Review Schedule (per SPACED_REPETITION.md)

Use the abbreviated tier schedule for the accelerated track:

Tier 1: same day
Tier 2: 2 days
Tier 3: 1 week
Tier 4: 2 weeks
(Skip tiers 5–6 — you don’t have time before interviews)

What This Track Cannot Buy You

Deep competitive-programming intuition (Phase 7)
Grandmaster-level pattern recognition (Phase 12)
Long-term retention of all 270 problems (you’ll forget ~30% within 3 months without continued review)
The kind of polish that comes from 6+ months of practice

If you finish Week 12 and your interview is delayed by 8+ weeks, switch to the 6-month track for the remaining time. Use the buffer to backfill Phase 7 basics and harden weak patterns.

6-Month Serious Track

Audience: You’re targeting strong Big Tech readiness. You can put in 15–25 hours/week sustainably for 6 months.

Tradeoffs:

You will cover Phase 7 (competitive) selectively — not full grandmaster prep.
You will get to “consistently solves FAANG hards in 45–60 minutes” with strong follow-ups.
You will have meaningful depth in your primary language’s runtime.
This is the recommended track for most readers.

Sufficient for: all FAANG levels including senior, infrastructure, platform, distributed systems, most backend specialties.

Daily Cadence

Weekday: 2–3 hours
- 60 min: new content (read + 1 lab or 2–3 concept docs)
- 60 min: problem solving (2–4 problems)
- 30 min: review queue
Saturday: 4 hours
- 1 mock interview (90 min)
- 2 hours problem solving
- 30 min: failure analysis + review
Sunday: 2 hours
- Re-solves at Tier 4–5
- Weakness drilling

Total: 18–22 hours/week

Monthly Milestones

Month 1 — Foundations (Weeks 1–4)

Cover: Phase 0 + Phase 1 fully, Phase 9 (your primary language) partially.

Week 1: Phase 0 (all 7 labs), 25 Easy problems
Week 2: Phase 1 DS docs (arrays, strings, hashmaps, stacks, queues, linked lists), Phase 1 runtime (stack vs heap, scope, value/reference, mutability), 30 problems
Week 3: Phase 1 DS continued (heaps, sorting, binary search, recursion, basic trees, basic graphs, basic DP), 30 problems, 2 labs
Week 4: Phase 1 runtime complete (hash collisions, iterator invalidation, GC, memory leaks, deep/shallow copy), Phase 9 your-language (first half), 30 problems, 3 labs

End-of-month: all Phase 1 mastery checks pass. 1 mock at Easy level, scored 7+/10.

Volume: ~115 problems, 10+ labs.

Month 2 — Patterns Mastery (Weeks 5–8)

Cover: Phase 2 fully, Phase 9 your-language fully.

Week 5: two pointers, sliding window, prefix sums, difference arrays, hashing, sorting+greedy. 5 problems per pattern.
Week 6: binary search, binary search on answer, monotonic stack, monotonic queue, intervals, linked list manipulation. 5 per pattern.
Week 7: tree DFS/BFS, graph DFS/BFS, topological sort, union find. 5 per pattern. 3 labs.
Week 8: backtracking, basic DP, 1D DP, 2D DP, knapsack basics, subsequence DP, string DP, trie, heap top-K, K-way merge. 4 per pattern.

End-of-month: mastery check on all 28 patterns. 4 mocks (Easy, Medium, Medium, Medium-Hard).

Volume: ~140 problems, 9 labs.

Month 3 — Graphs + DP Deep (Weeks 9–12)

Cover: Phase 4 fully, Phase 5 fully (except the convex hull trick and Knuth optimization, which are deferred to Month 6).

Week 9: Phase 4 fundamentals (BFS variants, DFS, Dijkstra, Bellman-Ford, Floyd-Warshall, topo). Solve 30 graph problems. 2 graph labs.
Week 10: Phase 4 advanced (SCCs, bridges/articulation, MST, bipartite, max flow basics, graph modeling). Solve 25. 3 graph labs.
Week 11: Phase 5 categories: 1D, 2D, knapsack, LIS, edit distance, palindrome, string DP. Solve 25. 3 DP labs (full brute→memo→tab→space progression).
Week 12: Phase 5 categories: tree DP, interval DP, bitmask DP, digit DP (intro), DP on DAGs, game DP. Solve 25. 3 DP labs.

End-of-month: all of Phase 4 + Phase 5 mastery checks. 4 mocks (Medium, Medium, Medium, Hard).

Volume: ~110 problems, 11 labs.

Month 4 — Greedy, Advanced DS, Practical Engineering (Weeks 13–16)

Week 13: Phase 6 fully (greedy choice, exchange arg, cut property, invariants, monovariants, amortized analysis). 6 greedy labs (with proofs). 20 greedy problems.
Week 14: Phase 3 (segment tree, Fenwick tree, sparse table, KMP, Z, rolling hash, trie variants, bit manipulation, bitmask DP, meet-in-the-middle). 20 problems. 3 advanced DS labs.
Week 15: Phase 8 first half (LRU, LFU, rate limiter, task scheduler, thread pool, job queue, autocomplete, log parser, file dedup). 9 labs with full follow-up answers.
Week 16: Phase 8 second half (consistent hashing, message dispatcher, pubsub, timer wheel, KV store, retry/backoff, circuit breaker, metrics, web crawler, in-memory FS). 10 labs.

End-of-month: Phase 3 + Phase 6 + Phase 8 (most of it) complete. 4 mocks including 1 Big Tech phone screen.

Volume: ~80 problems, 28 labs.

Month 5 — Hard Problems + Concurrency + Senior-Level Skills (Weeks 17–20)

Week 17: Phase 8 final 4 labs + Phase 10 (testing/debugging concept docs). 5 testing/debugging labs. 15 problems.
Week 18: Concurrency deep dive — re-read Phase 9 concurrency sections, solve 5 concurrency-flavored problems, do mock-11 twice. 15 problems.
Week 19: Hard problem week — 25 LeetCode Hards across all patterns. 1 mock per day (5 mocks).
Week 20: Phase 7 selectively (modular arithmetic, sieve, binary exponentiation, combinatorics, sweep line, coordinate compression — skip ICPC-only topics). 20 problems including 8 from Codeforces Div 3. 1 mock.

End-of-month: all of Phase 8, Phase 10, parts of Phase 7. 8 mocks total.

Volume: ~100 problems including ~25 hards.

Month 6 — Polish, Mock Marathon, Production Awareness (Weeks 21–24)

Week 21: Mock-heavy week. 5 mocks (mix mock-05, 06, 07, 09, 11). Failure analysis on each. 15 problems focused on top failure category.
Week 22: Phase 12 selective topics (only those relevant to your role). For backend/platform: max flow modeling and advanced combinatorics (inclusion-exclusion). For systems: nothing additional; focus on Phase 8 polish. 20 problems.
Week 23: Re-solve marathon — re-solve all Tier 5 and Tier 6 problems. Verify READINESS_CHECKLIST.md honestly. 3 mocks.
Week 24: Light week. 2 mocks. 10 problems. Rest. Final readiness check.

End-of-program: READINESS_CHECKLIST.md fully passed.

Volume: ~50 problems, 13 mocks across the month.

Aggregate Volume

Month	New Problems	Labs	Mocks
1	115	10	1
2	140	9	4
3	110	11	4
4	80	28	4
5	100	5	8
6	50	0	13
Total	~595	~63	~34

Review / Spaced Repetition

Full 6-tier schedule from SPACED_REPETITION.md:

Tier 1: same day
Tier 2: 2 days
Tier 3: 1 week
Tier 4: 2 weeks
Tier 5: 1 month
Tier 6: 3 months

By Month 6 you should have ~50–80 Tier 6 graduates.

Mock Schedule

Required mocks across the program (see phase-11-mock-interviews/):

Mock	When	Pass Threshold
mock-02 (Easy)	end of Month 1	7/10
mock-03 (Medium) ×3	Month 2	6/10 → 7/10
mock-04 (Hard) ×2	Month 3	6/10
mock-05 (Big Tech phone) ×3	Months 4–6	7/10
mock-06 (Big Tech onsite) ×3	Months 5–6	7/10
mock-07 (senior) ×2	Months 5–6	6/10
mock-09 (runtime) ×2	Months 5–6	7/10
mock-11 (concurrency) ×2	Month 5	6/10
mock-10 (system-heavy) ×2	Month 6	6/10

Revision Plan

Weekly: Friday evening. Skim the review log and identify the week's failure pattern.
Monthly: Last weekend. Aggregate failures, identify the top 3 categories, and dedicate next month's Sunday drilling to category #1.
Bi-monthly: End of Months 2, 4, 6. Do a "blind mock" with a problem you've never seen, no warmup. Score honestly.

What This Track Buys You

~600 deeply-owned problems
All 28 patterns recognizable in <2 minutes
~35 production-style labs with full follow-up answers
Strong concurrency + runtime depth in your primary language
Genuine readiness for FAANG senior-level coding interviews
30+ mock interviews of practice — pressure tolerance is real

What It Does Not Buy You

Codeforces Div 1 / AtCoder AGC level (you’d need Phase 7 + 12 fully)
Quant/HFT-level math depth (math-heavy; well beyond the Phase 7 math covered in Month 5)
Compiler/runtime team specialty (you’d need Phase 9 fully across multiple languages)

12-Month Elite Track

Audience: Top-tier targets. Senior/staff/principal practical, FAANG L6+, Jane Street / Citadel / Two Sigma, distributed systems specialty teams, compiler/runtime teams, or competitive programmers building toward Codeforces 2100+.

Commitment: 15–20 hours/week sustained for 12 months. Not a sprint.

This track will get you to:

Phase 7 + Phase 12 substantial depth
Codeforces Div 2 reliable, Div 1 attempt-capable
Multi-language runtime fluency in primary + 1 secondary
Production-system thinking comparable to a working senior/staff engineer
The specific gap between “Big Tech ready” and “elite-tier ready”

Annual Phase Map

Months	Phases	Focus
1	Phases 0–1	Foundations + framework internalization
2	Phase 2	All 28 patterns to mastery level
3	Phase 4	Graph algorithms deep
4	Phase 5	DP from basic to extreme
5	Phase 3 + Phase 6	Advanced DS + greedy proofs
6	Phase 8 + Phase 10	Practical engineering + testing/debugging
7	Phase 9 (full)	Language/runtime — 2 languages
8	Phase 7 (first half)	Competitive programming — math, sieve, modular, geometry
9	Phase 7 (second half)	Competitive — sweep line, Mo’s, parallel BS, contests
10	Phase 12 (selective)	Grandmaster topics relevant to your role
11	Mock marathon	Interview-realistic prep
12	Polish + readiness	Final checks, blind mocks, rest

Daily Cadence

Weekday: 2 hours
Saturday: 5 hours (mock + problems + labs)
Sunday: 3 hours (review + drilling)
Total: 17–18 hours/week

You will scale up in Months 8–11 to 22 hours/week temporarily.

Monthly Detail

Month 1 — Phase 0 + Phase 1

Identical to Month 1 of 6-Month Serious, but with extra depth on runtime docs (read all 10) and an additional 30 Easy problems for fluency.

Volume: 130 problems, 12 labs, 1 mock.

Month 2 — Phase 2 (Patterns)

All 28 pattern docs read in detail
8 problems per pattern (vs 5 in Serious track)
Every pattern: write a personal template after solving the 8 problems
5 mocks across the month

Volume: 224 problems, 9 labs, 5 mocks.

Mastery check: Can derive each pattern’s template from memory in <5 minutes.

Month 3 — Phase 4 (Graphs)

All 21 graph algorithms with implementation
All 9 product-style labs
60 graph problems (mix of LeetCode + Codeforces Div 3 graph problems)
4 mocks

Volume: 60 problems, 9 labs, 4 mocks.

Mastery check: Can implement BFS / DFS / Dijkstra / Bellman-Ford / Floyd-Warshall / Kruskal / Prim / Kosaraju / Tarjan from scratch error-free in <12 minutes each.
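As a reference point for the from-scratch check, a minimal Dijkstra sketch using Python's `heapq` with lazy deletion: stale heap entries are skipped instead of being decreased in place. It assumes non-negative edge weights and nodes numbered 0..n−1; the function name and the `(u, v, w)` edge format are illustrative.

```python
import heapq
from collections import defaultdict
from typing import Iterable, List, Tuple


def dijkstra(n: int, edges: Iterable[Tuple[int, int, int]], src: int) -> List[float]:
    """Shortest distances from `src` over directed edges (u, v, w) with w >= 0.

    Nodes are 0..n-1; unreachable nodes keep distance float("inf").
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))

    dist = [float("inf")] * n
    dist[src] = 0
    heap = [(0, src)]  # (distance so far, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter distance
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Usage: LC 743's first example, re-indexed from 1-based to 0-based nodes.
print(dijkstra(4, [(1, 0, 1), (1, 2, 1), (2, 3, 1)], src=1))  # [1, 0, 1, 2]
```

The lazy-deletion check is what keeps this correct without a decrease-key operation, which `heapq` does not provide.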

Month 4 — Phase 5 (Dynamic Programming)

All 22 DP concept + category docs
All 10 DP labs with full progression (brute → memo → tab → space)
80 DP problems
Every problem you solve: explicitly state recurrence + base case + evaluation order before coding
4 mocks (DP-heavy)
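A minimal sketch of the brute → memo → tab → space progression on Climbing Stairs (LC 70, flagship p05), with the state, recurrence, base cases, and evaluation order written down before any code, as the list above asks. Function names are illustrative.

```python
from functools import lru_cache

# State: ways(i) = number of distinct ways to reach step i.
# Recurrence: ways(i) = ways(i - 1) + ways(i - 2).
# Base cases: ways(0) = ways(1) = 1.
# Evaluation order (tabulation): increasing i, since ways(i) needs only smaller indices.


def ways_brute(n: int) -> int:
    """Plain recursion: exponential time."""
    if n <= 1:
        return 1
    return ways_brute(n - 1) + ways_brute(n - 2)


def ways_memo(n: int) -> int:
    """Top-down with a cache: O(n) time, O(n) space (including recursion depth)."""
    @lru_cache(maxsize=None)
    def f(i: int) -> int:
        if i <= 1:
            return 1
        return f(i - 1) + f(i - 2)
    return f(n)


def ways_tab(n: int) -> int:
    """Bottom-up table: O(n) time, O(n) space."""
    dp = [1] * (n + 1)  # covers the base cases dp[0] = dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]


def ways_space(n: int) -> int:
    """Only the last two states are needed: O(n) time, O(1) space."""
    prev, cur = 1, 1  # ways(0), ways(1)
    for _ in range(2, n + 1):
        prev, cur = cur, prev + cur
    return cur


# Usage: all four agree, e.g. n = 5 gives 8.
assert ways_brute(5) == ways_memo(5) == ways_tab(5) == ways_space(5) == 8
```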

Volume: 80 problems, 10 labs, 4 mocks.

Mastery check: Can derive state + transition for unseen DP problems in <8 minutes.

Month 5 — Phase 3 (Advanced DS) + Phase 6 (Greedy/Proofs)

All 24 advanced DS docs
All 9 advanced DS labs
All 7 greedy concept docs + all 6 labs (with proofs)
50 problems mixing advanced DS and greedy
4 mocks

Volume: 50 problems, 15 labs, 4 mocks.

Mastery check: Can implement segment tree (with lazy propagation), Fenwick tree, KMP, rolling hash, trie, treap from scratch.
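Of the structures in this check, the Fenwick tree is the shortest to write from memory. A minimal sketch, assuming integer point updates and sum queries; the class and method names are illustrative. Public indices are 0-based while the internal array is 1-based, which is where most Fenwick bugs come from.

```python
class FenwickTree:
    """Binary indexed tree: point update and prefix sum, both O(log n).

    Public indices are 0-based; the internal array is 1-based (tree[0] unused).
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.tree = [0] * (n + 1)

    def add(self, i: int, delta: int) -> None:
        """Add `delta` to element i."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # climb to the next node whose range covers i

    def prefix_sum(self, i: int) -> int:
        """Sum of elements [0, i)."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit: jump to the previous disjoint range
        return total

    def range_sum(self, lo: int, hi: int) -> int:
        """Sum of elements [lo, hi)."""
        return self.prefix_sum(hi) - self.prefix_sum(lo)


fw = FenwickTree(5)
for i, v in enumerate([1, 2, 3, 4, 5]):
    fw.add(i, v)
print(fw.range_sum(1, 4))  # 9 (2 + 3 + 4)
```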

Month 6 — Phase 8 (Practical Engineering) + Phase 10 (Testing/Debugging)

All 23 practical engineering labs
Every lab includes: working implementation + unit tests + smoke tests + concurrency tests where relevant + answers to all 13 standard follow-ups
All 13 testing/debugging concept docs + all 5 labs
Write property-based tests for 5 of your implementations
30 problems
5 mocks (mix mock-08 staff practical, mock-10 system-heavy, mock-11 concurrency)

Volume: 30 problems, 28 labs, 5 mocks.

Mastery check: Can build any of (LRU, rate limiter, autocomplete, KV store, thread pool) from scratch in 45 minutes with tests.
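The rate limiter in this check can be built several ways; the sketch below assumes a token bucket, which is one common choice rather than one this document prescribes. The clock is injected so tests can control time deterministically, and a lock makes the refill-and-spend step atomic for the concurrency tests Month 6 asks for. `TokenBucket` and its parameters are hypothetical names.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refills at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock             # injectable so tests can fake time
        self.tokens = capacity         # start full
        self.last = clock()
        self._lock = threading.Lock()  # refill + spend must be one atomic step

    def allow(self, cost: float = 1.0) -> bool:
        with self._lock:
            now = self.clock()
            elapsed = now - self.last
            # Refill for the elapsed time, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False


# Usage with a fake clock: 2-token burst, then 1 token per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] = 1.0
print(bucket.allow())  # True
```

Injecting the clock is the design choice that makes this testable: real `time.monotonic` makes assertions about refill timing flaky.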

Month 7 — Phase 9 (Language & Runtime)

Primary language: read entire track end to end. Take written notes. Solve 5 problems per major topic targeting the gotchas.
Secondary language: read 50%+ of the track. Solve 20 problems in this language to build fluency.
30 problems total in primary language (focused on runtime gotchas: overflow, recursion limits, hash adversarial, GC pressure, concurrency)
4 mocks including 2 mock-09 (runtime/language deep dives)

Volume: 50 problems, 0 labs (concept-heavy month), 4 mocks.

Mastery check: Can fluently explain stack vs heap, GC behavior, hashmap internals, concurrency model for both languages.

Month 8 — Phase 7 First Half

Competitive programming acceleration starts here. Move to Codeforces full-time for problem solving (LeetCode for review only).

Topics: fast I/O, modular arithmetic, GCD/LCM, sieve, prime factorization, modular inverse, combinatorics, binary exponentiation, matrix exponentiation, geometry basics, coordinate compression
Codeforces Div 3 → solve 6 contests (virtual or real). Goal: solve 4–5 problems per Div 3.
AtCoder Beginner Contest: solve 4 contests. Goal: solve 5 of A–F.
1 mock per week (varied — competitive-flavored mock-13)
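A minimal sketch of several Month 8 building blocks: binary exponentiation, modular inverse, nCr built from them, and the sieve. It assumes a prime modulus (10^9 + 7 is common in contest problems) because the inverse uses Fermat's little theorem. Python's built-in `pow(a, b, m)` computes the same value as `power`, but it is worth being able to write the loop yourself. Function names are illustrative.

```python
MOD = 10**9 + 7  # prime modulus common in contest problems


def power(base: int, exp: int, mod: int = MOD) -> int:
    """Binary exponentiation: base**exp % mod using O(log exp) multiplications."""
    result = 1 % mod
    base %= mod
    while exp > 0:
        if exp & 1:               # current low bit set: multiply this power in
            result = result * base % mod
        base = base * base % mod  # base^(2^k) -> base^(2^(k+1))
        exp >>= 1
    return result


def mod_inverse(a: int, mod: int = MOD) -> int:
    """Inverse via Fermat's little theorem: a^(p-2) mod p.

    Valid only when `mod` is prime and a is not divisible by it.
    """
    return power(a, mod - 2, mod)


def n_choose_k(n: int, k: int, mod: int = MOD) -> int:
    """C(n, k) mod a prime, assuming n < mod so the denominator is invertible."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num = num * (n - i) % mod
        den = den * (i + 1) % mod
    return num * mod_inverse(den, mod) % mod


def sieve(n: int) -> list:
    """Sieve of Eratosthenes: is_prime[i] for 0 <= i <= n."""
    is_prime = [True] * (n + 1)
    for i in range(min(2, n + 1)):
        is_prime[i] = False       # 0 and 1 are not prime
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return is_prime


print(power(2, 10), n_choose_k(5, 2))                # 1024 10
print([i for i, ok in enumerate(sieve(30)) if ok])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```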

Volume: ~80 problems (Codeforces + AtCoder), 6 labs, 4 mocks.

Mastery check: Solve Codeforces 1500–1700 rated problems consistently within contest time.

Month 9 — Phase 7 Second Half

Topics: sweep line, offline queries, Mo’s algorithm, parallel binary search (overview), randomized stress testing, interactive problems, game theory (Nim, Sprague-Grundy)
Codeforces Div 2 → 6 virtual contests. Goal: solve A–C reliably, attempt D.
AtCoder Regular Contest → 3 contests, attempt A–C.
Implement a stress-testing harness for 5 problems
4 mocks including mock-13 (competitive hard) twice
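A minimal sketch of a reusable stress-testing harness, assuming both solutions take the same positional arguments and return directly comparable values. The names `stress_test`, `gen_case`, and the maximum-subarray example (LC 53) are illustrative. A fixed seed makes any failure reproducible, and small random inputs keep counterexamples short enough to debug by hand.

```python
import random
from typing import Any, Callable, Optional, Tuple


def stress_test(brute: Callable, fast: Callable,
                gen: Callable[[random.Random], Tuple[Any, ...]],
                trials: int = 1000, seed: int = 0) -> Optional[Tuple[Any, ...]]:
    """Compare two solutions on random inputs; return the first mismatching input, or None."""
    rng = random.Random(seed)  # fixed seed so a failure reproduces exactly
    for _ in range(trials):
        case = gen(rng)
        if brute(*case) != fast(*case):
            return case
    return None


# Example problem: maximum subarray sum (LC 53).
def max_subarray_brute(nums):
    return max(sum(nums[i:j]) for i in range(len(nums)) for j in range(i + 1, len(nums) + 1))


def max_subarray_kadane(nums):
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)  # extend the current run or restart at x
        best = max(best, cur)
    return best


def max_subarray_buggy(nums):
    best = cur = 0             # BUG: assumes the answer is never negative
    for x in nums:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def gen_case(rng: random.Random):
    n = rng.randint(1, 8)      # small inputs keep the brute force fast and counterexamples short
    return ([rng.randint(-5, 5) for _ in range(n)],)


print(stress_test(max_subarray_brute, max_subarray_kadane, gen_case))  # None
print(stress_test(max_subarray_brute, max_subarray_buggy, gen_case))   # an all-negative input
```

The buggy variant passes every example with a non-negative answer, which is exactly the kind of bug hand-picked tests miss and random small inputs find.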

Volume: ~70 problems, 4 mocks.

Mastery check: Codeforces Div 2 ABC reliable. D attempted and sometimes solved.

Month 10 — Phase 12 (Selective)

Pick topics by role:

Backend/platform:

Max flow modeling, min-cost max-flow overview
Advanced combinatorics + inclusion-exclusion
Randomized algorithms

Distributed systems:

Persistent data structures
Consistent hashing (in depth)
Lock-free / wait-free data structures (research level)

Compiler/runtime:

Suffix automaton
Heavy-light decomposition
Constraint solving / SAT-like reasoning

Quant/HFT:

FFT/NTT
Computational geometry
Probability DP + expected value
Game theory (in depth)

Pure competitive:

HLD, centroid decomposition, segment tree beats, FFT/NTT, suffix automaton

Pick 5–7 topics. Read deeply. Implement reference code. 3 labs from Phase 12.

Volume: 30 problems (very hard / CP-hard), 3+ labs, 4 mocks.

Mastery check: Can articulate the selected topics and their implementation risks; have working reference implementations.

Month 11 — Mock Marathon

This is the highest-pressure month.

4 mocks per week (16 total)
Mix all 14 mock types — including grandmaster final boss (mock-14) at least twice
Failure analysis on every mock
Every Sunday: aggregate failure pattern, drill the top category Mon–Wed
60 problems (almost entirely focused on weak areas surfaced by mocks)

Volume: 60 problems, 16 mocks.

Month 12 — Polish, Re-solve, Final Readiness

Week 49: Re-solve all Tier 5 + Tier 6 problems. Aim for 80+ Tier 6 graduates.
Week 50: 4 cold blind mocks (no warmup, varied difficulty). Score brutally honestly.
Week 51: Address any remaining gaps. Light schedule (10 hours total).
Week 52: Rest, sleep, exercise. 1 final readiness check via READINESS_CHECKLIST.md. Interview confidently.

Volume: 30 re-solves, 4 mocks, 1 readiness audit.

Aggregate Volume

Month	Problems	Labs	Mocks
1	130	12	1
2	224	9	5
3	60	9	4
4	80	10	4
5	50	15	4
6	30	28	5
7	50	0	4
8	80	6	4
9	70	0	4
10	30	3	4
11	60	0	16
12	30	0	4
Total	~894	~92	~59

Contest Practice (unique to this track)

By end of Month 9, you should have completed:

12+ Codeforces Div 3/4 contests
6+ Codeforces Div 2 contests
6+ AtCoder Beginner Contests
3+ AtCoder Regular Contests

Goal Codeforces rating after 12 months: 1700–1900 (upper Expert, which spans 1600–1899, up to the Candidate Master threshold of 1900).

What Distinguishes This Track

vs 6-month Serious:

~1.5× the problems (~894 vs ~595)
~1.5× the labs (~92 vs ~63), including the Phase 7 / 12 labs the 6-month track does not have
~1.7× the mocks (~59 vs ~34; Month 11 alone has 16)
Real contest experience (vs 0 in 6-month track)
Multi-language runtime depth
Phase 12 awareness (selective implementation)

vs Competitive-only training:

This track does NOT skip practical engineering, system thinking, communication, or runtime depth
A candidate trained purely on competitive programming often struggles in mock-08 (staff practical) and mock-10 (system-heavy) — this track does not have that gap

Honest Limitations

Even after this track:

You will not reach grandmaster (CF 2400+) — that requires sustained contest practice for years
You will not be a domain expert in a specific area (e.g., compilers, distributed consensus) — that requires actual work on those systems
12 months of part-time prep cannot replicate 5 years of full-time engineering experience

What it can do is take you from intermediate to elite-candidate level — enough to interview confidently for very hard roles and have a fighting chance.

6-Month Serious Track — Implementation

This is the execution layer for schedules/6_MONTH_SERIOUS.md. The schedule tells you what to study and when. This folder is the actual substance — curated flagship problems, original company-sourced problems, graduated reading, proof-first solutions, and senior/staff/principal-level signal layers.

What This Is (And Is Not)

This IS:

91 hand-picked flagship problems across 20 active weeks
6 ORIGINAL problems (p75, p77, p79, p85, p87, p90) sourced from real Amazon/AWS, Google, and Meta interview intelligence (not on LeetCode)
A graduated 15-section reading layout per problem: approach → hints → insight → company adversarial → level delta → follow-ups
Python solutions with brute-force comparators and stress tests
Above-FAANG difficulty content targeting senior, staff, and principal interviews

This is NOT:

A duplicate of LeetCode Premium. You should have LC Premium. We never restate problem statements or constraints — those are one click away.
Every problem from the 6-month schedule (~595). The other ~500 are listed as “Problem Bank” in each week’s README — work them on LC after you own the flagship.
A passive read. Each problem has an Attempt Gate — you set a timer and try the problem cold before reading anything else.

Why This Exists

LeetCode editorials give you code. They do not give you:

The exchange argument that proves the greedy works
The invariant that the two-pointer maintains
The misleading example a Google interviewer will use to trap you on this exact problem
The scorecard language the interviewer writes when you do (or do not) ace it
The Level Delta: what a Mid vs Senior vs Staff vs Principal answer looks like for this same problem
The production reality: if this ran at 10M RPS, what would break first?
The Anti-pattern: the wrong-but-tempting approach that catches 70% of candidates, with exact bug location
The original problem Amazon will ask you that has never appeared on LeetCode

This track provides all of those.

How To Use

Read HOW_TO_USE.md — the graduated reading protocol
Read FRAMEWORK.md once if you have not — every “How to Approach” section here references its 16 steps
Start at month-01-foundations/, week 01, problem 01. Do not skip ahead.
For each problem: Attempt Gate first (20-min timer, cold), then graduate through the README sections
Stress-test your solution: python solution.py runs the brute-vs-optimal comparator
Log every solve in your tracking spreadsheet per SPACED_REPETITION.md

Problem Index (All 91)

Month 1 — Foundations (20 problems)

#	Problem	Difficulty	Source	Week
p01	Two Sum	Easy	LC 1	1
p02	Valid Parentheses	Easy	LC 20	1
p03	Best Time to Buy and Sell Stock	Easy	LC 121	1
p04	Merge Sorted Array	Easy	LC 88	1
p05	Climbing Stairs	Easy	LC 70	1
p06	Product of Array Except Self	Medium	LC 238	2
p07	Group Anagrams	Medium	LC 49	2
p08	Rotate Array	Medium	LC 189	2
p09	Longest Substring Without Repeating	Medium	LC 3	2
p10	Valid Anagram	Easy	LC 242	2
p11	Binary Tree Level Order Traversal	Medium	LC 102	3
p12	Kth Largest Element	Medium	LC 215	3
p13	Search in Rotated Sorted Array	Medium	LC 33	3
p14	Maximum Depth of Binary Tree	Easy	LC 104	3
p15	Merge Intervals	Medium	LC 56	3
p16	LRU Cache	Medium	LC 146	4
p17	Number of Islands	Medium	LC 200	4
p18	Coin Change	Medium	LC 322	4
p19	Binary Tree Right Side View	Medium	LC 199	4
p20	Word Search	Medium	LC 79	4

Month 2 — Patterns Mastery (20 problems)

#	Problem	Difficulty	Source	Week
p21	3Sum	Medium	LC 15	5
p22	Trapping Rain Water	Hard	LC 42	5
p23	Subarray Sum Equals K	Medium	LC 560	5
p24	Minimum Size Subarray Sum	Medium	LC 209	5
p25	Maximum Product Subarray	Medium	LC 152	5
p26	Find Minimum in Rotated Sorted Array	Medium	LC 153	6
p27	Daily Temperatures	Medium	LC 739	6
p28	Meeting Rooms II	Medium	LC 253	6
p29	Largest Rectangle in Histogram	Hard	LC 84	6
p30	Jump Game II	Medium	LC 45	6
p31	Course Schedule	Medium	LC 207	7
p32	Pacific Atlantic Water Flow	Medium	LC 417	7
p33	Number of Connected Components	Medium	LC 323	7
p34	Word Ladder	Hard	LC 127	7
p35	Lowest Common Ancestor of BST	Medium	LC 235	7
p36	Combination Sum	Medium	LC 39	8
p37	Unique Paths	Medium	LC 62	8
p38	Implement Trie	Medium	LC 208	8
p39	Find Median from Data Stream	Hard	LC 295	8
p40	Decode Ways	Medium	LC 91	8

Month 3 — Graphs + DP Deep (20 problems)

#	Problem	Difficulty	Source	Week
p41	Network Delay Time	Medium	LC 743	9
p42	Cheapest Flights Within K Stops	Medium	LC 787	9
p43	Shortest Path in Binary Matrix	Medium	LC 1091	9
p44	Word Ladder II	Hard	LC 126	9
p45	Clone Graph	Medium	LC 133	9
p46	Critical Connections in a Network	Hard	LC 1192	10
p47	Alien Dictionary	Hard	LC 269	10
p48	Min Cost to Connect All Points	Medium	LC 1584	10
p49	Is Graph Bipartite?	Medium	LC 785	10
p50	Reconstruct Itinerary	Hard	LC 332	10
p51	Longest Increasing Subsequence	Medium	LC 300	11
p52	Edit Distance	Hard	LC 72	11
p53	Partition Equal Subset Sum	Medium	LC 416	11
p54	Regular Expression Matching	Hard	LC 10	11
p55	Interleaving String	Medium	LC 97	11
p56	Burst Balloons	Hard	LC 312	12
p57	Palindrome Partitioning II	Hard	LC 132	12
p58	Strange Printer	Hard	LC 664	12
p59	Dungeon Game	Hard	LC 174	12
p60	Number of Ways to Wear Different Hats	Hard	LC 1434	12

Month 4 — Greedy, Advanced DS, Practical Engineering (20 problems)

#	Problem	Difficulty	Source	Week
p61	Queue Reconstruction by Height	Medium	LC 406	13
p62	Gas Station	Medium	LC 134	13
p63	Task Scheduler	Medium	LC 621	13
p64	Minimum Number of Arrows	Medium	LC 452	13
p65	Split Array Largest Sum	Hard	LC 410	13
p66	Range Sum Query Mutable	Medium	LC 307	14
p67	Count of Smaller Numbers After Self	Hard	LC 315	14
p68	Implement strStr() (KMP)	Easy	LC 28	14
p69	Repeated DNA Sequences	Medium	LC 187	14
p70	Sliding Window Maximum	Hard	LC 239	14
p71	Design LRU Cache (distributed follow-ups)	Medium	LC 146 + extensions	15
p72	Design Hit Counter	Medium	LC 362	15
p73	Design Autocomplete System	Hard	LC 642	15
p74	LFU Cache	Hard	LC 460	15
p75	ORIGINAL — Consistent Hashing Ring	Hard	Amazon/AWS	15
p76	Design In-Memory File System	Hard	LC 588	16
p77	ORIGINAL — Distributed Rate Limiter	Hard	Meta Infra	16
p78	Text Justification	Hard	LC 68	16
p79	ORIGINAL — Service Mesh Min Deployment	Hard	Google SRE	16
p80	Basic Calculator II	Medium	LC 227	16

Month 5 — Hards + Concurrency + Above-FAANG (10 problems)

#	Problem	Difficulty	Source	Week
p81	Median of Two Sorted Arrays	Hard	LC 4	17
p82	Russian Doll Envelopes	Hard	LC 354	17
p83	First Missing Positive	Hard	LC 41	17
p84	Minimum Window Substring	Hard	LC 76	17
p85	ORIGINAL — Concurrent Web Crawler with Backpressure	Hard	Amazon	18
p86	Design Bounded Blocking Queue	Medium	LC 1188	18
p87	ORIGINAL — S3 Cost-Optimal Eviction	Hard	AWS	18
p88	N-Queens	Hard	LC 51	19
p89	Trapping Rain Water II	Hard	LC 407	19
p90	ORIGINAL — Distributed Job DAG Scheduler	Hard	Google	19

Month 6 — Polish + Mock Marathon

Capstone	Source	Week
p91 — Skyline Problem	LC 218	20
MOCK_MARATHON.md	—	21
RE_SOLVE_GUIDE.md	—	23
FAILURE_PATTERNS.md	—	21–24
FINAL_READINESS.md	—	24

Cross-References to the Rest of the Workspace

FRAMEWORK.md — universal 16-step problem-solving framework
CODE_QUALITY.md — quality bar every solution.py must meet
COMMUNICATION.md — what to say in interviews
FAILURE_ANALYSIS.md — 16-category failure taxonomy
SPACED_REPETITION.md — 6-tier review schedule
READINESS_CHECKLIST.md — final binary checklist
phase-02-patterns/ — concept docs for patterns (3Sum, sliding window, etc.)
phase-04-graphs/ — graph algorithm concept docs
phase-05-dp/ — DP concept docs and labs

Honest Promise

If you complete every problem in this track honestly — Attempt Gate respected, hints used sparingly, follow-ups answered before reading the answers, stress tests passing, every “When to Move On” checklist green — you will be in the top 5% of candidates entering senior, staff, and principal interview loops at Amazon, Google, Meta, Microsoft, Apple, and infrastructure-specialty companies.

If you skip the Attempt Gates, peek at hints early, or skim the Level Delta sections without honestly self-assessing — this is just a list of problems you’ve seen. Seen ≠ owned. Owned is what gets offers.

How To Use This Track — The Graduated Reading Protocol

Every problem folder has three files. They are meant to be read in a specific order with specific gates between them. Reading them out of order destroys the value.

The Three Files

pXX-problem-name/
  README.md       ← Graduated reading: 15 sections, each gated
  solution.py     ← Brute + optimal + stress test (Python, runnable)
  hints.md        ← 5 progressive hints (only opened when stuck)
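The exact contents of each solution.py are not reproduced here; the sketch below is a hypothetical layout in the same spirit, using p03 (Best Time to Buy and Sell Stock, LC 121): a brute force, the optimal, and a stress test that runs when the file is executed with `python solution.py`. All function names are illustrative.

```python
"""Hypothetical solution.py layout: brute force, optimal, and a stress test."""
import random


def max_profit_brute(prices):
    """O(N^2): try every (buy day, sell day) pair with buy < sell."""
    best = 0
    for i in range(len(prices)):
        for j in range(i + 1, len(prices)):
            best = max(best, prices[j] - prices[i])
    return best


def max_profit(prices):
    """O(N), one pass: sell today against the cheapest price seen so far."""
    best = 0
    lowest = float("inf")
    for p in prices:
        lowest = min(lowest, p)
        best = max(best, p - lowest)
    return best


def stress(trials: int = 2000, seed: int = 1) -> None:
    rng = random.Random(seed)  # fixed seed: a failing input reproduces on rerun
    for _ in range(trials):
        prices = [rng.randint(0, 20) for _ in range(rng.randint(0, 10))]
        assert max_profit_brute(prices) == max_profit(prices), prices
    print("stress test passed")


if __name__ == "__main__":
    stress()
```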

The Per-Problem Reading Protocol

Phase 1 — Cold Attempt (mandatory)

Open README.md, read only sections 1–2 (Quick Context + LeetCode Link/Attempt Gate)
Click the LeetCode link, read the problem statement on LC
Set a 20-minute timer. Code on paper or in a scratch file. No IDE autocomplete that helps with the algorithm. No reading further in README.md. No hints.md.
If you solve it: write down your time and approach. Now jump to Phase 3.
If 20 min passes: write down where you got stuck. Move to Phase 2.

Why the gate? You cannot build pattern-recognition by reading approaches. You build it by failing to find approaches, then learning the gap. If you skip the cold attempt, you are training your recognition memory, not your derivation skill. Interviews test derivation.

Phase 2 — Hint Ladder (only if stuck)

Open hints.md. Read one hint. Set another 10-min timer. Try again.

Solved after Hint 1 → write down which insight unlocked it. Move to Phase 3.
Still stuck after 10 min → next hint.
After Hint 5 without solving → you’ve hit the conceptual gap. Open README.md section 4 (“How to Approach”) and read straight through. Then Phase 3.

Rule: never read two hints back to back. Always 10 minutes between hints.

Phase 3 — The Real Learning

Now read README.md sections 3–15 in order. This is where the value is.

Section 3 (Prerequisites): if any link goes to a phase lab you haven’t done, do that lab first
Section 4 (How to Approach): compare to YOUR approach. Where did you diverge? Why?
Section 6 (Deeper Insight): the proof or invariant. You must be able to restate this in your own words before moving on.
Section 7 (Anti-Pattern): “did I almost do this?” — if yes, this is a flag for your weakness log
Section 10 (Company Context): mental note of how the company you’re targeting twists this problem
Section 12 (Level Delta): honestly: which level was your answer? Mid, Senior, Staff, or Principal?
Section 13 (Follow-ups): cover the answer with your hand. Try each follow-up. Then read the answer.

Phase 4 — Code & Test

Open solution.py. Read the brute force. Read the optimal.
Close the file. Re-implement the optimal from scratch in a scratch file.
Diff your re-implementation against the reference. Identify every difference. Some differences are style (fine). Some are bugs (not fine).
Run the stress test: python solution.py. It must pass.
Now run your own implementation against the stress test. It must also pass.

Phase 5 — Move-On Gate

Walk through Section 9 (“When to Move On”) — the binary checklist. Every box must be honestly checked. If even one is no, you stay on this problem (re-do tomorrow). No exceptions.

Log the solve in your tracking spreadsheet:

Date, problem, time-to-solve, hint depth used (0–5), follow-ups answered correctly (count/total), Level Delta self-assessment
Schedule next review per SPACED_REPETITION.md

What Each README.md Section Is For

#	Section	Read When
1	Quick Context	Before cold attempt
2	LeetCode Link + Attempt Gate	Before cold attempt
3	Prerequisite Concepts	After cold attempt; do prereq labs if needed
4	How to Approach	After Hint 5 fails OR after solving
5	Progressive Hints (→ hints.md)	When stuck, one at a time
6	Deeper Insight	Always, after solving
7	Anti-Pattern Analysis	Always — check if you fell into it
8	Skills & Takeaways	Always — note analogous problems
9	When to Move On	Mandatory gate
10	Company Context	Read for companies you’re targeting
11	Interviewer’s Lens	Always — internalize scorecard language
12	Level Delta	Self-assess honestly
13	Follow-ups & Answers	Attempt each cold, then read
14	Full Solution Walkthrough	After re-implementing
15	Beyond the Problem	Always — production reality

Common Mistakes (Do Not Do)

Reading README.md before the Attempt Gate. Destroys the entire training value.
Reading all hints at once. Same problem.
Skipping the “re-implement from scratch” step. Reading code ≠ writing code.
Skipping the Level Delta self-assessment. This is the single highest-signal section.
Skimming Section 10 (Company Context). This is where the differentiated content lives.
Marking “When to Move On” green without honestly checking each box. Calcifies bad habits.
Not stress-testing your own implementation. A solution that passes LC may have a bug a stress test catches.
Skipping originals (p75, p77, p79, p85, p87, p90). These are the highest-value problems in the track. They’re original because they don’t exist on LC — meaning your competition hasn’t seen them either.

Time Budget Per Problem

Difficulty	Cold Attempt	Hints + Re-attempt	Reading + Re-implement	Total
Easy	20 min	10–20 min	30 min	~60–70 min
Medium	25 min	20–40 min	45 min	~90–110 min
Hard	30 min	30–60 min	60 min	~120–150 min
Original (any)	30 min	20–40 min	60–90 min	~110–160 min

Plan accordingly when fitting problems into the weekly schedule.

The One Rule

If you remember nothing else: the Attempt Gate is not optional. Every other shortcut is recoverable. Skipping the cold attempt is not — it permanently degrades the training signal for that problem.

Month 1 — Foundations

Weeks 1–4 · 20 flagship problems · ~115 LC Bank problems · 1 mock

Goals

By end of month, you can:

Execute the 16-step framework on every problem without thinking about it
Solve any LC Easy in <12 min with full edge-case coverage
Solve straightforward Mediums (array/string/hashmap/tree/basic-DP) in <30 min
Explain hashmap collision handling, GC, stack vs heap in your primary language
Pass mock-02 (Easy) at 7+/10

Weekly Map

Week	Theme	Flagship Problems	Phase Reading
1	Execution baseline + first patterns	p01–p05	phase-00, phase-01 §1–3
2	Arrays + strings + hashmaps deep	p06–p10	phase-01 §1–3 runtime, phase-09 your lang first half
3	Heaps, binary search, trees	p11–p15	phase-01 §4–7
4	LRU, DFS/BFS, DP intro, backtracking intro	p16–p20	phase-01 §8–9, phase-09 second half

End-of-Month Gate

All 20 flagship problems: Section 9 (“When to Move On”) checklist green
All Phase 1 mastery checks in phase-01-foundations/README.md pass
mock-02 scored 7+/10
Tracking spreadsheet has 20 entries with Level Delta self-assessment

If any item fails: do NOT enter Month 2. Repeat the weakest week’s drilling.

Why These 20 Problems

These are the 20 most-asked Easy-and-low-Medium problems in 2024–2025 Big Tech interview loops (cross-referenced against company tag data, real-time interview reports, and recruiter intel). They are the floor. If you cannot solve all 20 fluently, you will fail the phone screen: not because they are hard, but because not finishing an Easy within 12 minutes is an instant no-hire signal.

Week 1 — Execution Baseline + First Patterns

Days 1–7 · 5 flagship problems · ~25 LC Bank · 0 mocks

Goals

Internalize the 16-step framework on trivial problems (so you can use it on hard ones)
Work through the Phase 0 labs twice
Acquire HashMap-1-pass, Stack-validation, Greedy-1-pass, Two-pointer-from-end, 1D-DP-intro patterns

Daily Schedule

Day	Reading	Flagship	Bank
Mon	FRAMEWORK.md re-read; phase-00 labs 1–3	p01 Two Sum	3 LC Easies
Tue	phase-00 labs 4–5	p02 Valid Parentheses	3 LC Easies
Wed	phase-00 labs 6–7	p03 Best Time Buy/Sell	3 LC Easies
Thu	phase-01 §1 Arrays	p04 Merge Sorted Array	3 LC Easies
Fri	phase-01 §2 Strings + §6 Heap intro	p05 Climbing Stairs	3 LC Easies
Sat	Re-solve p01–p05 unaided	—	5 LC Easies + REVIEW
Sun	COMMUNICATION.md + spaced repetition logging	—	5 LC Easies

LC Bank (Problems to solve on your own after flagship)

LC 217 (Contains Duplicate), 169 (Majority Element), 268 (Missing Number), 53 (Maximum Subarray), 136 (Single Number), 283 (Move Zeroes), 26 (Remove Duplicates from Sorted Array), 27 (Remove Element), 1 (Two Sum — variant), 9 (Palindrome Number), 14 (Longest Common Prefix), 28 (strStr — naive), 35 (Search Insert Position), 66 (Plus One), 67 (Add Binary), 69 (Sqrt(x) — binary search intro), 88 (Merge Sorted Array — variant), 100 (Same Tree), 101 (Symmetric Tree), 104 (Maximum Depth — preview), 108 (Sorted Array → BST), 112 (Path Sum), 118 (Pascal’s Triangle), 226 (Invert Binary Tree), 543 (Diameter of Binary Tree).

Readiness Gate

All 5 flagship problems Section 9 checklists green
25+ Bank problems solved unaided
Framework Steps 1–9 executed audibly (talk through) on at least 10 problems
No off-by-one errors on 5 consecutive binary-search-flavored problems
Honest self-assessment: Level Delta = Mid or above on at least 3 flagships
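For the off-by-one gate above, one way to remove most boundary bugs is to commit to a single template. A minimal sketch of a half-open lower bound, which is exactly the answer to LC 35 (Search Insert Position), assuming `nums` is sorted ascending; it should agree with the standard library's `bisect.bisect_left`. The function name is illustrative.

```python
import bisect  # only used to cross-check the hand-written version


def lower_bound(nums, target):
    """First index i with nums[i] >= target, or len(nums) if there is none.

    Assumes nums is sorted ascending. Invariant: the answer always lies in [lo, hi].
    """
    lo, hi = 0, len(nums)     # hi = len(nums) is a legal answer
    while lo < hi:
        mid = (lo + hi) // 2  # lo <= mid < hi, so nums[mid] is always a valid index
        if nums[mid] < target:
            lo = mid + 1      # mid and everything left of it are too small
        else:
            hi = mid          # mid might be the answer; keep it in range
    return lo


nums = [1, 3, 5, 6]
print([lower_bound(nums, t) for t in (5, 2, 7, 0)])  # [2, 1, 4, 0]
assert all(lower_bound(nums, t) == bisect.bisect_left(nums, t) for t in range(-1, 9))
```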

p01 — Two Sum

Source: LeetCode 1 · Easy · Topics: Array, Hash Table
Companies (2024–2025 frequency): Amazon (very high), Google (high), Meta (high), Apple (medium), Microsoft (medium), Bloomberg (very high)
Loop position: phone screen warmup, or first 10 min of onsite to calibrate

1. Quick Context

This is the most-asked Easy in Big Tech history. The interviewer is not testing whether you can solve it — they expect you to solve it in <8 minutes. They are testing whether you:

Clarify before coding (duplicates? multiple answers? sorted?)
State the brute force out loud before optimizing
Pick the right optimal (one-pass hash, not two-pass)
Handle the “what about the same element twice” edge case
Communicate cleanly through a problem you’ve obviously seen before

What it looks like it tests: array iteration. What it actually tests: disciplined communication under “I’ve done this a million times” complacency. Senior candidates fail this by going too fast and skipping clarifications. The interviewer is watching for the framework, not the answer.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/two-sum/

STOP. Set a 15-minute timer. Code it cold in a scratch file. Do not read past this section until you have either solved it or the timer expired.

If you’ve solved Two Sum before: do it again anyway, and time yourself. Target: 6 min including narration. If you’re over 8 min, you have a framework/communication gap, not an algorithm gap — and that gap will kill you on harder problems.

3. Prerequisite Concepts

Hash table average O(1) lookup + the assumption that makes it true (phase-01 §3 HashMap Mastery)
“Complement search” pattern: instead of looking for pairs, transform to lookup of (target − x)
In your primary language: what hash collision behavior is, what hash resize cost is — see phase-09

4. How to Approach (FRAMEWORK Steps 1–9 applied)

Step 1 — Restate: “Given an integer array and a target integer, return the indices of two distinct elements that sum to target. Exactly one valid pair exists per problem statement.”

Step 2 — Clarify (ask out loud, do NOT skip even though it’s Easy):

“Can the same element be used twice?” (No — indices must be distinct.)
“If multiple pairs sum to target, which one do I return?” (Problem says exactly one solution — but in real interviews they may relax this; ask.)
“Can the array be empty / size 1?” (Per constraints, N ≥ 2.)
“Are values bounded? Can they be negative?” (Per LC, yes, both negative and positive in int32 range.)
“Is the array sorted?” (No. If it were, you’d use two pointers — explicitly call this out; this is a senior signal.)
“Return indices or values?” (Indices, in any order.)

Step 3 — Constraints: N ≤ 10^4 in the classic statement, so the brute force does about N²/2 ≈ 5×10^7 pair checks. That may pass, but it is borderline (especially in Python) and degrades quickly if N grows. The interviewer expects O(N).

Step 4 — Examples (build your own):

[2,7,11,15], target=9 → [0,1] (the given one)
[3,3], target=6 → [0,1] ← critical: the “same value, different indices” case
[-3,4,3,90], target=0 → [0,2] (negative handling)
[1,2,3,4], target=100 → does not occur per statement, but ask: “if no solution, what do I return?”

Step 5 — Brute Force: Nested loop. For each i, check every j > i. O(N²) time, O(1) space.

Step 6 — Brute Force Complexity: Time O(N²), space O(1). Trivially correct. State this out loud BEFORE optimizing.

Step 7 — Pattern Recognition: “Given an array, find a pair satisfying a property” + “complement is computable” + “order of result doesn’t matter” → HashMap complement search. (Sorted + ordered → two pointers. Unsorted + complement-computable → HashMap.)

Step 8 — Optimize: Walk the array once. For each x at index i, compute complement = target − x. If complement is in the map, return (map[complement], i). Else add x → i to the map. One pass, not two. Two-pass works but signals weaker intuition.

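Step 9 — Prove correctness: Loop invariant: when the loop reaches index j, the map holds every value in nums[0..j−1], each mapped to an index where it occurs (the latest one, for duplicates). Soundness: any index the map returns for target − nums[j] is < j, so it is distinct from j and the pair sums to target. Completeness: for the answer pair (i, j) with i < j, nums[i] was inserted before the loop reaches j, so the lookup at j succeeds (unless an earlier valid pair was already returned). Neither argument needs the answer to be unique.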
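Steps 5 and 8 as code: a minimal sketch. Returning `None` when no pair exists is an assumption standing in for the "if no solution, what do I return?" clarification; the LC statement guarantees a pair. Function names are illustrative. Note the check-then-insert order inside the loop, which is what the Step 9 invariant relies on.

```python
def two_sum_brute(nums, target):
    """Step 5: O(N^2) time, O(1) extra space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return None


def two_sum(nums, target):
    """Step 8: one pass, O(N) expected time, O(N) space."""
    seen = {}  # value -> index; holds only indices strictly before i
    for i, x in enumerate(nums):
        j = seen.get(target - x)
        if j is not None:  # check first...
            return [j, i]
        seen[x] = i        # ...insert second
    return None            # unreachable under the LC guarantee


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum([3, 3], 6))          # [0, 1]
print(two_sum([-3, 4, 3, 90], 0))  # [0, 2]
```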

5. Progressive Hints

If you’re stuck for more than 5 minutes, open hints.md and read one hint only. Set another 5-min timer between hints.

6. Deeper Insight — Why It Works

The transformation: “Find two numbers summing to T” is a 2D search (i, j pairs) → O(N²). By computing T − nums[i] for each i, we reduce to a 1D lookup (“does this value exist?”) which is O(1) amortized with a hash table. The hash table is the data structure that turns 2D pair-search into 1D existence-search. This is the master pattern for entire problem families: 3Sum, 4Sum, Subarray-Sum-K, etc., all reduce 1 dimension via complement-hashing.

The single-pass insight: You don’t need to load the full array into the map first. By the time you encounter the second element of the answer pair, the first one has already been stored. So insert AFTER you check — never before, or [3,3] breaks (you’d find 3 → 0 and return [0,0], two equal indices).

Order matters: check first, insert second. Reversing this is the single most common bug. The map state at step i represents “everything seen STRICTLY BEFORE i” — that’s the invariant.

7. Anti-Pattern Analysis

Wrong-but-tempting #1 — Two-pass with enumerate:

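def two_sum_two_pass(nums, target):
    d = {x: i for i, x in enumerate(nums)}  # value -> LAST index where it occurs
    for i, x in enumerate(nums):
        j = d.get(target - x)
        if j is not None and j != i:  # without this guard, [3,2,4] / 6 returns [0, 0]
            return [i, j]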

This works, but:

(a) It writes O(N) extra memory and then reads it all again.
(b) The j != i guard signals that you didn’t think about why one-pass avoids the issue.
(c) The dict stores only the last index of a duplicated value. For [3,3] that happens to work (i=0 finds j=1). For [3,2,4] with target 6, however, the guard is the only thing stopping you from returning [0,0].

Two-pass is a code smell that says “I memorized the pattern but didn’t internalize the invariant.”

Wrong-but-tempting #2 — Sort then two-pointer:

sorted_nums = sorted(enumerate(nums), key=lambda p: p[1])
# ... two-pointer scan

Works, but it is O(N log N) when O(N) exists. Some candidates do this because two-pointer is their hammer. At Google in particular, choosing O(N log N) when O(N) exists is a negative signal on the algorithmic-complexity rubric.

Wrong-but-tempting #3 — Brute force “to be safe”: The brute force barely fits N ≤ 10⁴ (about 5 × 10⁷ pair checks). If the interviewer raises the constraint to 10⁵ (about 5 × 10⁹ checks), brute force times out. Expect them to raise the constraint to test you, and be ready.

8. Skills & Takeaways

Generalizable pattern: Complement search via HashMap. Whenever you see “find a pair/triple/group summing to, differing by, or otherwise related to a value V”, first ask: can I express the missing piece as a function of what I already have? If yes, store each item you have seen in a HashMap keyed by its value, and for each new item look up the missing piece.

Analogous problems (do these on LC after):

LC 167 — Two Sum II (sorted input — uses two pointers instead; teaches when NOT to use hashmap)
LC 653 — Two Sum IV BST (BFS + set; same complement pattern, different traversal)
LC 1 vs LC 15 (3Sum) — the recursive extension; outer loop fixes one element, reduces to 2Sum
LC 560 — Subarray Sum Equals K (the prefix-sum complement variant — same idea, applied to running sums)
LC 454 — 4Sum II (split into two halves, hash one half, lookup complements of the other — meet-in-the-middle flavor)

When NOT to use this: Sorted input (two pointers is O(1) space, hashmap is O(N)). Streaming input where you can’t afford O(N) memory.

9. When to Move On (binary; must all be YES)

I solved p01 unaided in <8 min including narration on the second attempt
I can state the loop invariant of one-pass Two Sum without looking
I can explain why “check first, insert second” matters, with the [3,3] example
I can name when two-pointer is preferred over hashmap (sorted input, O(1) space requirement)
I implemented this from scratch and my version passes stress_test() in solution.py
I solved LC 167 (Two Sum II) and articulated why the optimal approach changed
I solved LC 560 (Subarray Sum K) and recognized the same complement-hashing pattern

If any unchecked: redo tomorrow before moving to p02.

10. Company Context

Amazon (LP-heavy; coding bar = “are bugs going to ship?”)

The framing: Often given as “given a list of Order objects with a price field, find two orders whose prices sum to the target promo discount.”
Misleading example: They’ll give you [5, 5, 5, 5], target=10. Many candidates lock onto “return the first two indices that work” — but Amazon interviewers want you to ASK whether you should return any valid pair or the first in some defined order. Asking shows Customer Obsession (knowing the requirement).
Adversarial extension: “Now there could be millions of orders streamed one at a time. How does your solution change?” → Stream through a hash set (or a map from price to order ID, if you must return the orders) and return the first matching pair.
What they want to hear: “Let me clarify the requirements before I optimize.” Verbatim phrases like “Let me first state the brute force so we have a baseline” earn rubric points.

Google (algorithmic complexity is a hard rubric line)

The framing: Often the cleanest, just the original problem.
Misleading example: A small-N example where O(N²) clearly fits. Google interviewers do this to see if junior candidates will say “brute force is fine” and stop. The right move: “Brute force fits this size, but I want to do better as a habit — and to handle the extension where N is 10⁶.”
Adversarial extension: “Return all pairs that sum to target.” Now it’s not 1 answer; you must dedupe pairs (3Sum-style logic).
What they want to hear: Explicit complexity for brute and optimal, stated separately. “O(N²) brute → O(N) one-pass hashmap, O(N) extra space.”

Meta (heavy on follow-ups; expect 3–4 in 25 min)

The framing: “Two Sum, then variants in rapid succession.”
Misleading example: They start with the canonical example, then immediately follow up: “Now what if the array is sorted?” If you stay on the hashmap, you score lower than the candidate who pivots to two pointers.
Adversarial extension: “What if duplicates are allowed in input AND we want all unique pairs?” → 2Sum-with-duplicates → introduces dedup via sort or a counted-set.
What they want to hear: Recognition that the optimal algorithm depends on input properties. The phrase “if the input were sorted, I’d use two pointers” wins them over.

Microsoft (clarity + cleanliness over speed)

The framing: Phone screen warmup.
Misleading example: None — Microsoft tends to be straightforward here.
Adversarial extension: “Now make it work for any K-Sum” (K is a parameter).
What they want to hear: Clean function decomposition, clear variable names, edge-case enumeration. Microsoft phone screens reward boring, correct code.

Bloomberg (financial framing — be ready)

The framing: “Given a list of trade prices and a settlement amount, find two trades that exactly settle to the amount.”
Misleading example: Includes negative prices (refunds). Candidates who hard-code i < j ordering may break on negatives if they switch to a sort approach mid-stream.
What they want to hear: Explicit “I’m assuming negative values are allowed; my hash approach handles them naturally.”

11. Interviewer’s Lens

Phase	Strong signal	Weak signal	Scorecard phrase (strong)
Reading problem	Asks 3+ clarifying questions even though it’s Easy	Says “oh I’ve seen this” and dives in	“Disciplined clarifying behavior — would translate to fewer production bugs”
Pre-coding	States brute force, then states optimal, with complexities for both	Jumps to “I’ll use a hashmap” without justifying	“Communicates derivation, not just memorization”
Coding	Names variable `complement`, comments invariant once	Uses `d`, `m`, no comments	“Code is interview-readable; would pass our internal code review bar”
Edge cases	Tests `[3,3]` before submission	Tests only the given example	“Self-catches bugs before code review — strong production instinct”
Post-coding	Articulates time AND space complexity unprompted	Says only “it’s linear”	“Owns full complexity analysis”

The scorecard line that gets you the offer: “Candidate demonstrated framework discipline on a trivial problem, suggesting it will scale to hard ones.”

The scorecard line that loses you: “Skipped clarifying questions; rushed to known answer; did not test [3,3]; missed senior signal opportunity.”

12. Level Delta

Level	What their answer looks like
Mid	One-pass hashmap solution. States O(N) time, O(N) space. Tests given example. ~10 min.
Senior	All of Mid + clarifies 3+ questions upfront + explicitly states brute force first + tests `[3,3]` + mentions “if sorted, I’d use two pointers” + completes in ~7 min.
Staff	All of Senior + articulates loop invariant before coding + names the complement-search pattern by family + connects to LC 15 (3Sum) and LC 560 (Subarray Sum K) as the same family + mentions hash collision worst-case (O(N²)) as a footnote + completes in ~6 min.
Principal	All of Staff + asks “what’s the production context — are these orders, transactions, ad bids?” + identifies that for very large N you’d shard the hashmap or use a Bloom-filter prefilter + mentions GC pressure from large dict allocation as an Amazon/Google production concern + offers the streaming variant unprompted + completes in ~5 min with time to discuss tradeoffs.

Honest self-assessment: Which level was YOUR answer? If “Mid”, you have 4 sections to add to your toolkit. If “Senior” — good baseline; aim for Staff on the next 5 problems.

13. Follow-up Questions & Full Answers

Follow-up 1: “What if the array is sorted?”

Signal sought: Do you recognize that input properties change the optimal algorithm?

Full answer: “If sorted, I’d switch to two pointers: left=0, right=N-1. If sum > target, decrement right; if sum < target, increment left; if equal, return. That is O(N) time and O(1) space, better than the hashmap’s O(N) space. Correctness comes from monotonicity: incrementing left never decreases the sum, and decrementing right never increases it. We never miss the answer because each step discards only an element that cannot belong to any remaining pair. If sum > target, nums[right] plus even the smallest remaining value is too big. If sum < target, nums[left] plus even the largest remaining value is too small.”
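The two-pointer answer can be sketched as follows, assuming a non-decreasing input and 0-based indices. LC 167 itself asks for 1-based indices. The function name is hypothetical.

```python
from typing import List, Optional


def two_sum_sorted(nums: List[int], target: int) -> Optional[List[int]]:
    """Two pointers on a non-decreasing array: O(N) time, O(1) extra space."""
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            return [left, right]  # 0-based; add 1 to each for LC 167
        if s < target:
            left += 1   # nums[left] is too small even with the largest remaining partner
        else:
            right -= 1  # nums[right] is too large even with the smallest remaining partner
    return None
```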

Follow-up 2: “What if there are multiple valid pairs and we want all unique ones?”

Signal sought: Can you handle dedup without explosive complexity?

Full answer: “Two approaches. (a) Sort + two pointers + skip duplicates on both sides: O(N log N) time, O(1) extra space if the sort is in place. (b) Hashmap + a set of normalized (min, max) tuples to dedup the result pairs: O(N) time, O(N) extra. Avoid frozensets here, because frozenset({3, 3}) collapses to {3} and loses the pair. Approach (a) is preferred unless we cannot mutate or sort the input. The key insight for (a): dedup happens by skipping equal adjacent values after sorting, not by post-filtering.”
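Approach (a) might look like the sketch below. It returns value pairs, not indices, which is an assumption about what “all unique pairs” means. It sorts a copy, so in Python it uses O(N) extra space. Sort in place if the input may be mutated.

```python
from typing import List, Tuple


def all_unique_pairs(nums: List[int], target: int) -> List[Tuple[int, int]]:
    """Every distinct value pair (a, b), a <= b, with a + b == target."""
    arr = sorted(nums)  # copy; use nums.sort() if mutating the input is allowed
    pairs = []
    left, right = 0, len(arr) - 1
    while left < right:
        s = arr[left] + arr[right]
        if s < target:
            left += 1
        elif s > target:
            right -= 1
        else:
            pairs.append((arr[left], arr[right]))
            left += 1
            right -= 1
            while left < right and arr[left] == arr[left - 1]:
                left += 1   # skip repeats of the value just used on the left
            while left < right and arr[right] == arr[right + 1]:
                right -= 1  # ... and on the right
    return pairs
```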

Follow-up 3: “Now there are billions of integers streamed one at a time, infinite stream. Detect any sum-pair as fast as possible.”

Signal sought: Streaming / unbounded-input thinking.

Full answer: “Use a HashSet, not a map, because we only need existence. For each incoming x, check whether target − x is in the set. If it is, emit the pair; otherwise add x to the set. That is O(1) expected time per element, but memory grows without bound (O(N) after N elements). To bound memory, use a Bloom filter as a prefilter, trading bounded memory for occasional false alarms. False positives are acceptable because we verify them by querying upstream. If we need bounded memory AND zero false positives, we must accept that we may miss pairs: fundamentally, we cannot remember an arbitrary stream in bounded space. State this tradeoff explicitly.”
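The exact (unbounded-memory) version can be sketched as a generator, so it also works on an infinite iterator. The Bloom-filter variant is not shown. The name stream_pairs is hypothetical.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def stream_pairs(stream: Iterable[int], target: int) -> Iterator[Tuple[int, int]]:
    """Yield (complement, x) whenever an arriving x completes a pair with an earlier element."""
    seen = set()  # existence only: a set, not a map
    for x in stream:
        if target - x in seen:
            yield (target - x, x)
        seen.add(x)  # memory grows with the number of distinct values seen


print(next(stream_pairs(itertools.count(), 7)))  # (3, 4)
```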

Follow-up 4 (Hard): “Distribute across 100 machines. Each holds 1% of the array. Find a pair summing to target.”

Signal sought: Distributed systems thinking, communication cost awareness.

Full answer: “Broadcast the target. Each machine holds an arbitrary 1% of the array, so values first need a deterministic home: machine h(v) mod 100 for some hash h. In a single shuffle round, each machine sends every local x twice. It sends a value record to h(x) mod 100, and a query record (x, machine_id) to h(target − x) mod 100, the home of its complement. Each receiving machine then checks its queries against the values it now owns, requiring two copies when x = target − x. Communication is O(N) total messages, about O(N/100) per machine with a balanced hash. Latency is one shuffle round. Correctness: for every valid pair (x, y) with x + y = target, y lives on machine h(y), and x’s query is sent to h(target − x) = h(y), so the pair is checked there. Caveat: if even the post-shuffle local sets don’t fit in RAM, we partition further or stream from disk with external-sort-style processing.”
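Below is a single-process simulation of that shuffle, useful for checking the routing logic. It makes three assumptions: machines are modeled as list indices, hash(v) % m stands in for the real partitioner, and the name find_pair_sharded is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def find_pair_sharded(shards: List[List[int]], target: int) -> Optional[Tuple[int, int]]:
    """Simulate one shuffle round of distributed Two Sum, then a local join per machine."""
    m = len(shards)

    def home(v: int) -> int:
        return hash(v) % m  # hypothetical partitioner: the machine that owns value v

    values = [Counter() for _ in range(m)]  # value records received by each machine
    queries = [[] for _ in range(m)]        # query records received by each machine

    # Shuffle: every machine sends each local x twice.
    for shard in shards:
        for x in shard:
            values[home(x)][x] += 1               # x travels to its own home...
            queries[home(target - x)].append(x)   # ...and asks its complement's home

    # Local join: machine k now owns every copy of every value v with home(v) == k.
    for k in range(m):
        for x in queries[k]:
            y = target - x
            needed = 2 if y == x else 1  # x + x == target needs two copies of x
            if values[k][y] >= needed:
                return (x, y)
    return None
```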

Follow-up 5 (Senior signal): “How would you test this code?”

Signal sought: Testing discipline, not just unit tests.

Full answer: “Four layers. (1) Unit: given example, [3,3], negatives, two valid pairs choosing one. (2) Edge: minimum N=2, maximum constraint. (3) Property test: random N, random ints, brute-force comparator — assert both find the same pair (modulo order). My solution.py does this via stress_test(). (4) Adversarial fuzz: hash-collision DoS inputs — known pathological inputs that cause O(N²) hashmap behavior. Production code would also include perf regression tests and memory profiling for large N.”

14. Full Solution Walkthrough

See solution.py.

The file has four sections:

two_sum_brute(nums, target) — nested loop, O(N²). This is your correctness oracle.
two_sum(nums, target) — the one-pass hashmap. Note the order: check first, insert after. A comment marks the invariant.
stress_test() — generates 1000 random arrays and runs both functions. It asserts that they agree on whether a pair exists and that every returned pair sums to target. This is the bar: every flagship problem has a stress test.
__main__ — runs the given example, [3,3], negative case, then the stress test.
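A sketch of what the oracle and stress test could look like follows. Random arrays may contain several valid pairs, so the test checks validity rather than exact indices. One assumption: this stress_test takes the function under test as a parameter, whereas the one in solution.py takes no arguments.

```python
import random
from typing import Callable, List, Optional


def two_sum_brute(nums: List[int], target: int) -> Optional[List[int]]:
    """O(N^2) correctness oracle: try every pair i < j."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return None


def stress_test(fast: Callable[[List[int], int], Optional[List[int]]],
                trials: int = 1000, seed: int = 0) -> None:
    """Compare `fast` against the oracle on small random arrays (duplicates and negatives included)."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(rng.randint(2, 8))]
        target = rng.randint(-20, 20)
        expected = two_sum_brute(nums, target)
        got = fast(nums, target)
        if expected is None:
            assert got is None, (nums, target, got)
        else:
            assert got is not None, (nums, target)
            i, j = got
            assert i != j and nums[i] + nums[j] == target, (nums, target, got)
```

Usage: stress_test(two_sum). An insert-before-check version of two_sum should fail it on the first array that contains target/2.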

Decisions justified in the file:

Why seen.get(complement) instead of if complement in seen: return [seen[complement], i]: one hash lookup instead of two. Compare the result with is not None, because index 0 is falsy.
Why we return as a list [a, b] not a tuple: matches LC signature exactly.
Why no input validation: per the framework, we validate at system boundaries — interview code assumes valid input per the problem statement.

15. Beyond the Problem — Production Reality

At 10M RPS:

The hashmap allocation per request becomes a GC pressure point. In production, you’d pool the map or use a primitive-keyed map. In Java, that could be Long2IntOpenHashMap from fastutil or LongIntHashMap from Eclipse Collections; in Python, a dict allocated once and .clear()’d between requests.
For very large N per request, the O(N) memory dominates. Spilling to off-heap or using a compact open-addressing hashmap would matter.

Real system this is the kernel of:

Ad bid matching: given a target CPM, find two bids that sum to the publisher’s floor. Same algorithm, with bid-objects carrying metadata. Real ad exchanges (Google Ad Exchange, Meta Audience Network) do variants of this billions of times per second.
Promo discount calculator: e-commerce platforms match “if customer buys X and Y, the bundle costs the promo target.”
Settlement matching at exchanges: pair buy and sell orders that exactly clear.

Principal-engineer code review comment: “Why is this a one-off function? In our codebase, complement-search-against-hashmap is a building block. Extract find_pair_by_property(items, key_fn, target) so we can reuse for 3Sum, k-Sum, prefix-sum, and the promo-bundle code path. Also: thread safety? If this map is shared across requests, we have a race.”
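One way the extracted helper from that review comment might look is shown below. find_pair_by_property and the Order record are hypothetical, not an existing library API or the original code base. The keys are assumed to be integers that sum to the target.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def find_pair_by_property(items: Sequence[T], key_fn: Callable[[T], int],
                          target: int) -> Optional[Tuple[T, T]]:
    """Return two items at different positions whose keys sum to target, else None."""
    seen: Dict[int, T] = {}  # key -> earliest item with that key, strictly before the current one
    for item in items:
        k = key_fn(item)
        if target - k in seen:  # `in` rather than .get(): stored items may themselves be falsy
            return seen[target - k], item
        seen.setdefault(k, item)  # insert after the check, as in one-pass Two Sum
    return None


@dataclass
class Order:  # illustrative record type
    order_id: str
    price: int


orders = [Order("a", 30), Order("b", 45), Order("c", 70)]
print(find_pair_by_property(orders, lambda o: o.price, 100))
```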

p02 — Valid Parentheses

Source: LeetCode 20 · Easy · Topics: String, Stack
Companies: Amazon (high), Google (medium), Meta (medium), Microsoft (high), Bloomberg (high), Salesforce (very high)
Loop position: phone screen, sometimes paired with p80 (Basic Calculator II) at the onsite

1. Quick Context

The canonical stack problem. It looks trivial; the traps are the early-return logic and the closer-on-empty-stack case. Senior candidates lose points in three ways: writing 15 lines when 8 suffice, forgetting to check that the stack is empty at the end, or not asking whether non-bracket characters can appear.

What it tests: Disciplined use of the right data structure (stack), not loop-and-counter hacks. Anti-signal: counting instead of stacking. A single open-minus-close counter fails on "(]", and separate per-type counters fail on "([)]". If you proposed either and didn’t catch it yourself, you’ve failed the cognitive-trap check.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/valid-parentheses/

STOP. Set a 12-min timer. Solve cold. Do not read on until done or timed out.

3. Prerequisite Concepts

LIFO semantics of a stack; why “matching last-opened” is intrinsically a stack property — phase-01 §5
Mapping closer → opener via a constant dict (O(1) lookup, more idiomatic than chained ifs)

4. How to Approach

Restate: “Given a string of ()[]{} characters, return true iff every opener has a matching closer of the right type, in the right nesting order.”

Clarify:

“Can the string contain other characters?” (LC: no, only brackets — but ASK, real interviews vary.)
“Empty string?” (LC guarantees length ≥ 1, but an empty string is conventionally valid = true. Common gotcha; ask.)
“Maximum length?” (Tells us whether stack overflow matters for a recursive solution; an iterative loop with an explicit stack avoids that anyway.)

Constraints: N up to 10⁴. O(N) expected.

Examples to build:

"()[]{}" → true (sequential pairs)
"([{}])" → true (full nesting)
"(]" → false ← the counter-fail case
"]" → false (close-first; empty-stack case)
"((" → false (unmatched open at end; non-empty-stack-at-end case)
"" → true (empty)

Brute force: Repeatedly scan and remove adjacent (), [], {} pairs until no change. If empty, valid. O(N²).

Pattern: “Match most recent unclosed thing” → stack. Period.

Optimal: One pass. For each char: if it is an opener, push it. If it is a closer, return false on an empty stack; otherwise pop and verify the match. At the end, the stack must be empty.

Proof of correctness: Stack invariant: at any point, the stack contains exactly the unmatched openers, in order of opening. A closer is valid iff it pairs with the top (the most recent unmatched opener). If a closer arrives with an empty stack, it has no match → false. If the stack is non-empty at the end, some openers are unmatched → false.
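A minimal sketch of the optimal solution follows, with the three correctness branches marked. One assumption: non-bracket characters are silently ignored here. LC never sends them, so settle the real behavior with the interviewer.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}  # closer -> opener (constant, data-driven)
OPENERS = set(PAIRS.values())


def is_valid(s: str) -> bool:
    stack = []  # unmatched openers, in order of opening
    for c in s:
        if c in OPENERS:
            stack.append(c)                      # (i) push on open
        elif c in PAIRS:
            # (ii) empty check BEFORE popping: "]" has nothing to match
            if not stack or stack.pop() != PAIRS[c]:
                return False
        # other characters: ignored (an assumption; ask first)
    return not stack                             # (iii) leftover openers, as in "((", are invalid
```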

5. Progressive Hints

If stuck >5 min: hints.md. One at a time.

6. Deeper Insight — Why It Works

Why a stack and not a counter? A single counter (opens − closes) is sufficient for one bracket type, provided you reject as soon as it goes negative, but it cannot detect "(]". The stack is necessary because we need to remember not just that a bracket is open, but which type, in which order. Each push commits the type; each pop verifies it. The stack is the minimum data structure that captures both “how many are open” and “which order/type.”

Why we check empty BEFORE popping: If we blindly pop an empty stack on a closer, we error out (or in some languages, get undefined behavior). The check if not stack: return False handles "]", "})", etc.

Why we check empty at END: The string "((" walks through pushing twice; loop ends; stack non-empty; without the final check, we’d return True. The end-check is the second necessary correctness step.

7. Anti-Pattern Analysis

Wrong-but-tempting #1 — Count-and-compare:

if s.count('(') == s.count(')') and s.count('[') == s.count(']') and ...:
    return True

Fails on "([)]": every per-type count matches, so it returns True even though ) closes [. It also accepts ")(". This is the single most common bug junior candidates ship. If you proposed this, you missed the type-ORDER requirement.

Wrong-but-tempting #2 — Regex / replace loop:

while '()' in s or '[]' in s or '{}' in s:
    s = s.replace('()','').replace('[]','').replace('{}','')
return s == ''

Works but O(N²) and signals you reached for a hammer. Interviewer’s note: “didn’t recognize stack pattern.”

Wrong-but-tempting #3 — Forgetting end-check:

for c in s:
    if c in '([{': stack.append(c)
    elif c in ')]}':
        if not stack or pairs[c] != stack.pop(): return False
return True   # ← BUG: returns True for "((" because stack non-empty but no closer caught it

The fix is return not stack. Forgetting this is the #2 bug here.

8. Skills & Takeaways

Generalizable pattern: “Match most recent unresolved item” → stack. Applies to:

LC 32 — Longest Valid Parentheses (stack of indices, harder cousin)
LC 84 — Largest Rectangle in Histogram (monotonic stack — same family)
LC 739 — Daily Temperatures (monotonic stack — same family)
LC 71 — Simplify Path (each segment as stack frame)
LC 394 — Decode String (stack of (count, prefix) frames)
LC 1249 — Minimum Remove to Make Parentheses Valid (stack of indices)

When NOT to use: Single bracket type, just balance counting → counter is sufficient and uses O(1) space.
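The single-type counter can be sketched as follows. It assumes the input contains only ( and ). Note the early rejection when depth goes negative; without it, ")(" would pass.

```python
def is_balanced_single(s: str) -> bool:
    """Balance check for a string of '(' and ')' only, in O(1) space."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:  # a ')' with nothing open: reject immediately
            return False
    return depth == 0
```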

9. When to Move On

Solved unaided in <8 min on second attempt
Tested "(]", "]", "((", "" without prompting
Can articulate why a counter fails for multiple bracket types
Implemented with constant closer → opener map (not chained ifs)
Stress test in solution.py passes
Solved LC 1249 and recognized the same pattern with index tracking

10. Company Context

Amazon

The framing: Often paired with “and write the matched pairs as (open_idx, close_idx)” — extends to index-tracking variant.
Misleading example: They give "{[()]}" and let you confirm true. Then sneak in "{[(])}" — looks balanced character-counts-wise; trips candidates who didn’t catch the type-order failure.
Extension: “Now allow <> too.” → tests whether your code generalizes (constant dict makes it 1-line; chained ifs make it 4 new branches).

Salesforce (this is THEIR favorite)

The framing: Often appears in onsite, not phone screen — they care about the cleanliness more than the algorithm.
What they want to hear: “I’ll use a stack and a dictionary mapping closers to openers.” That sentence alone is a green check.
Adversarial extension: “What if [ could match ) (mixed mode)?” → tests whether your code is data-driven enough to change one line of config.

Microsoft

The framing: Phone screen warmup, sometimes followed by p80 (Basic Calculator).
What they want to hear: Single-pass justification, end-of-loop empty-check stated out loud.

Bloomberg

The framing: Often as part of a JSON/parser problem — “validate this serialized message structure”.
Extension: “Parse the bracketed expression into an AST.”

11. Interviewer’s Lens

Phase	Strong	Weak	Scorecard (strong)
Reading	Asks “other characters?” “empty string?” “max length?”	Dives in	“Clarifies the contract before coding”
Pre-coding	Names the stack pattern explicitly	Says “I’ll iterate and track”	“Recognizes the canonical pattern”
Coding	Uses constant dict `{')':'(', ']':'[', '}':'{'}`	Chains 6+ if-branches	“Writes data-driven, extensible code”
Edge cases	Tests `"(]"`, `"]"`, `""` proactively	Tests only `"()"`	“Anticipates failure modes”
Finish	States complexity and stack invariant	Says “done”	“Owns the analysis”

12. Level Delta

Level	Answer
Mid	Stack solution, works, ~8 min. Tests only the given example.
Senior	+ clarifies upfront + tests `"(]"` + uses constant dict + end-empty-check articulated.
Staff	+ names “most-recent-unresolved” pattern family + connects to monotonic stack problems + addresses “what if char set were configurable” before being asked.
Principal	+ asks production context (“are we validating Markdown? JSON? code?”) + identifies that real validators need position tracking for error messages + offers extension to track unmatched positions for IDE-style red squiggles + mentions that for adversarial input we’d cap stack depth to prevent DoS.

13. Follow-up Questions & Full Answers

Q1: “Return the index of the first invalid character instead of true/false.”

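Signal: Can you adapt without rewriting? Answer: Track the current index i, and return i when a pop mismatches or a closer hits an empty stack. If the stack is non-empty at the end, report an unmatched opener. To do that, push (char, index) tuples instead of bare chars: stack[0] is the earliest unmatched opener and stack[-1] the most recent, so pick whichever the spec calls “first.” It is still one pass, O(N), with no extra time cost.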
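A sketch of that adaptation follows. Two conventions are assumptions: -1 means “valid”, and a leftover opener is reported by its earliest position (stack[0]).

```python
def first_invalid_index(s: str) -> int:
    """Index of the first offending character, or -1 if s is valid."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []  # (opener, index) tuples, so leftovers can be reported by position
    for i, c in enumerate(s):
        if c in "([{":
            stack.append((c, i))
        elif c in pairs:
            if not stack or stack.pop()[0] != pairs[c]:
                return i  # closer with nothing open, or wrong type
    return stack[0][1] if stack else -1  # earliest opener that was never closed
```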

Q2: “Mixed bracket modes: also accept `[)` and `(]` as valid in this protocol.”

Signal: Data-driven code awareness. Answer: Replace {')':'(', ']':'[', '}':'{'} with {')':'([', ']':'[(', '}':'{'} and change the match check from stack.pop() == pairs[c] to stack.pop() in pairs[c]. Zero algorithm change — only the config dict moves. Demonstrates extensibility.
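To show that only the table changes, here is a sketch with the mapping passed in as data. STRICT, MIXED, and is_valid_with are illustrative names, and MIXED encodes the hypothetical protocol from the question.

```python
from typing import Mapping

STRICT = {")": "(", "]": "[", "}": "{"}
MIXED = {")": "([", "]": "[(", "}": "{"}  # hypothetical protocol: ( and [ interchangeable


def is_valid_with(s: str, pairs: Mapping[str, str]) -> bool:
    """Same single-pass algorithm; only the closer -> allowed-openers table varies."""
    openers = set("".join(pairs.values()))
    stack = []
    for c in s:
        if c in openers:
            stack.append(c)
        elif c in pairs:
            if not stack or stack.pop() not in pairs[c]:  # membership instead of equality
                return False
    return not stack
```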

Q3 (Hard): “Parse and evaluate the matched expression as a Lisp-like AST.”

Signal: Stack-of-frames extension. Answer: Each frame is the list of tokens accumulated inside one open bracket. On open: push the current frame and start a new one. On close: pop the outer frame and append the finished inner frame to it. It is the same stack discipline, except each entry is now a list, not a char. This is the bridge from p02 to LC 394 (Decode String) and to p80 (Basic Calculator).
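A sketch of the parsing half follows; evaluation is left out. The tokenizer is a simplifying assumption: it only pads parentheses and splits on whitespace.

```python
from typing import List, Union

Node = Union[str, List["Node"]]


def tokenize(s: str) -> List[str]:
    """Pad parentheses with spaces, then split on whitespace (simplified, hypothetical tokenizer)."""
    return s.replace("(", " ( ").replace(")", " ) ").split()


def parse_nested(tokens: List[str]) -> List[Node]:
    """Stack of frames: each frame is the list being built for one open group."""
    stack: List[List[Node]] = []
    current: List[Node] = []
    for tok in tokens:
        if tok == "(":
            stack.append(current)  # save the outer frame
            current = []           # start a fresh inner frame
        elif tok == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            inner, current = current, stack.pop()
            current.append(inner)  # the finished inner frame becomes one element of the outer
        else:
            current.append(tok)
    if stack:
        raise ValueError("unmatched '('")
    return current


print(parse_nested(tokenize("(+ 1 (* 2 3))")))  # [['+', '1', ['*', '2', '3']]]
```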

Q4: “Now there are millions of strings per second — same algorithm, but optimize for throughput.”

Signal: Production thinking. Answer: Three wins. (a) Replace the Python list-as-stack with a fixed-size preallocated array plus an integer top pointer, which eliminates allocation. (b) Replace the dict lookup with a 256-entry table mapping char codes to (type, is_opener), which removes hashing and most branches from the hot loop. (c) For a truly hot path, write a SIMD-friendly state machine in C, but only after profiling shows this is the bottleneck.

Q5: “How would you test it?”

Signal: Testing discipline. Answer: Four layers.
(1) Unit: every failure mode (wrong type "(]", closer on an empty stack "]", leftover opener "(("), plus valid and empty inputs.
(2) Property test: brute (replace-loop) vs. optimal on random bracket strings; they must agree. My solution.py does this.
(3) Adversarial: deeply nested input (depth = 10⁵), to confirm the solution is iterative and cannot overflow the call stack.
(4) Fuzz: random non-bracket characters, to exercise whatever the “ask about other characters” clarification decided.

14. Full Solution Walkthrough

See solution.py.

is_valid_brute: repeated replace loop. Correct but O(N²). Used as oracle.
is_valid: stack + closer→opener dict. Three correctness branches: (i) push on open, (ii) check-then-pop on close (empty + mismatch both → False), (iii) final stack-empty check.
stress_test: generates random bracket strings (balanced and unbalanced), asserts brute and optimal agree.
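A sketch of such a stress test follows. Balanced strings are generated by inserting matched pairs at random positions, which always keeps a string valid. As before, passing the function under test as a parameter is an assumption, not the signature in solution.py.

```python
import random
from typing import Callable


def is_valid_brute(s: str) -> bool:
    """O(N^2) oracle: delete adjacent matched pairs until nothing changes."""
    while "()" in s or "[]" in s or "{}" in s:
        s = s.replace("()", "").replace("[]", "").replace("{}", "")
    return s == ""


def random_balanced(rng: random.Random, pairs: int) -> str:
    """Insert matched pairs at random positions; each insertion preserves validity."""
    s = ""
    for _ in range(pairs):
        p = rng.randint(0, len(s))
        s = s[:p] + rng.choice(["()", "[]", "{}"]) + s[p:]
    return s


def stress_test(fast: Callable[[str], bool], trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for t in range(trials):
        if t % 2 == 0:
            s = random_balanced(rng, rng.randint(0, 5))  # valid by construction
        else:
            s = "".join(rng.choice("()[]{}") for _ in range(rng.randint(0, 10)))  # mostly invalid
        assert fast(s) == is_valid_brute(s), s
```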

15. Beyond the Problem

Real systems this is the kernel of:

JSON / XML / YAML parsers — every structural parser uses this exact stack discipline. Errors in production parsers (e.g., “Unexpected end of input”) are the end-empty-check firing.
IDE bracket matching — your editor’s “matching bracket” highlight runs this algorithm on every keystroke, scoped to a window around the cursor.
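Code linters and compilers: an unbalanced bracket surfaces as a parse error with a line and column, which is this algorithm plus position tracking.
Binary formats with nesting (e.g., BSON, MessagePack) are length-prefixed rather than bracket-delimited, but their decoders still track nested documents and arrays with a stack (or recursion).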

Principal-engineer code review comment: “We have three places in the codebase that re-implement bracket matching (JSON parser, query parser, markdown renderer). They’ve each drifted to handle their edge cases differently. Extract a generic match_pairs(input, pair_table, on_unmatched) and centralize. The bug we hit last quarter was the JSON parser missing the end-check; the markdown one had the check but the wrong error message.”

p03 — Best Time to Buy and Sell Stock

Source: LeetCode 121 · Easy · Topics: Array, DP, Greedy
Companies: Amazon (very high), Bloomberg (very high), Facebook (high), Apple (medium), Uber (medium)
Loop position: phone screen, or first warmup before a harder DP onsite

1. Quick Context

This is the “single-transaction maximum profit” problem and the entry door to a 6-problem ladder (LC 121, 122, 123, 188, 309, 714) that culminates in the hardest stock problems on LC. Mastering p03 properly — by recognizing it as “max forward-difference with constraint i < j” and NOT brute force — unlocks the whole family.

What it looks like it tests: array iteration. What it actually tests: Whether you see the invariant transformation: instead of trying all (buy, sell) pairs, track the minimum buy-price seen-so-far and compute profit-if-sold-today. This is a one-pass O(N), O(1) algorithm; the brute force is O(N²).

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/best-time-to-buy-and-sell-stock/

12-min timer. Cold attempt. No reading on.

3. Prerequisite Concepts

“Running min/max” pattern — phase-02 §3 Prefix Sums (same family — running aggregate)
Greedy correctness: why local-min + global-max-of-(today - min) is globally optimal

4. How to Approach

Restate: Given prices indexed by day, pick a buy-day i and a sell-day j > i to maximize prices[j] - prices[i]. If no profit possible, return 0. One transaction only.

Clarify:

“Can I sell on the same day I buy?” (No — strict j > i.)
“If prices monotonically decrease, return 0 or negative?” (Per LC: 0 — no transaction is valid.)
“Multiple transactions allowed?” (No, this is LC 121. LC 122 is the multi-transaction version. ASK to confirm.)
“Length bounds?” (LC: up to 10⁵ → O(N²) will TLE.)

Examples:

[7,1,5,3,6,4] → 5 (buy at 1, sell at 6)
[7,6,4,3,1] → 0 (no profit possible)
[1] → 0 (no transaction possible — single price)
[2,4,1] → 2 (buy at 2, sell at 4, NOT buy at 1; can’t sell after — common trap)

Brute force: All (i, j) pairs with j > i; track max diff. O(N²) time, O(1) space.

Pattern recognition: “Max value of a[j] - a[i] with i < j” → equivalent to “at each j, what’s the min of a[0..j-1]?” → running-min + per-element subtract → O(N).

Optimal:

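def max_profit(prices):
    min_so_far = float('inf')  # cheapest price on any day strictly before today
    best = 0
    for price in prices:
        best = max(best, price - min_so_far)  # profit if we sell today
        min_so_far = min(min_so_far, price)   # today becomes a candidate buy day
    return best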

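Order matters: Compute best BEFORE updating min_so_far, so that min_so_far covers only days strictly before today. On LC 121 the reverse order is harmless, since buying and selling on the same day earns price − price = 0. The habit still matters in the state-machine variants, where reading a state you already updated today can violate a constraint such as the cooldown in LC 309.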

Correctness proof (greedy): Take any optimal pair (i*, j*) with j* > i*. When the loop reaches j*, min_so_far ≤ prices[i*], because i* ≤ j*−1 and min_so_far covers prices[0..j*−1]. So prices[j*] − min_so_far ≥ prices[j*] − prices[i*] = optimal. Conversely, every candidate the loop computes is either 0 or the profit of some valid pair (i < j), so best never exceeds the optimum. Hence best equals it.
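The proof can also be checked empirically. Below is a sketch of an O(N²) oracle and a stress test; as in the other sketches, taking the function under test as a parameter is an assumption.

```python
import random
from typing import Callable, List


def max_profit_brute(prices: List[int]) -> int:
    """O(N^2) oracle: best prices[j] - prices[i] over all i < j, floored at 0."""
    best = 0
    for i in range(len(prices)):
        for j in range(i + 1, len(prices)):
            best = max(best, prices[j] - prices[i])
    return best


def stress_test(fast: Callable[[List[int]], int], trials: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        prices = [rng.randint(0, 20) for _ in range(rng.randint(1, 10))]
        assert fast(prices) == max_profit_brute(prices), prices
```

Usage: stress_test(max_profit), with the one-pass function from the Optimal section.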

5. Progressive Hints

hints.md — one at a time, 5-min timer.

6. Deeper Insight — Why It Works

The transformation: A 2D search “find (i, j) max difference” becomes 1D “at each j, what’s the best historical min?” by recognizing that for any fixed j, the optimal i is argmin(prices[0..j-1]). We don’t need to remember which i — just its value. This is the same compression that powers Kadane’s algorithm (maximum subarray): instead of trying all subarrays, track the best one ending here.

Why O(1) space: The running-min subsumes all history we need. We never look back; we only look at the current price vs. the cheapest ever.

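Connection to Kadane’s algorithm: Compute diffs[i] = prices[i+1] − prices[i]. Because the diffs telescope (diffs[i] + ... + diffs[j−1] = prices[j] − prices[i]), LC 121 becomes “max sum subarray over diffs, with the empty subarray (profit 0) allowed,” which is Kadane’s algorithm. This equivalence is a Staff-level observation.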
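A sketch of the Kadane formulation follows. The diffs are computed on the fly instead of materialized, and the running window is reset to empty whenever its sum would turn negative.

```python
from typing import List


def max_profit_kadane(prices: List[int]) -> int:
    """LC 121 as maximum subarray over daily diffs; the empty subarray (profit 0) is allowed."""
    best = 0
    ending_here = 0  # best sum of diffs over a window ending today (possibly empty)
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        ending_here = max(0, ending_here + diff)  # extend the window, or restart it empty
        best = max(best, ending_here)
    return best
```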

7. Anti-Pattern Analysis

Wrong #1 — Two pointers from both ends: Some try left = 0, right = N-1, shrink to find max. Doesn’t work: the optimal pair isn’t necessarily at the extremes.

Wrong #2 — Sort: Sorting destroys the time-order constraint. The buy-day must come before the sell-day in original order.

Wrong #3 — Update min before max:

for p in prices:
    min_so_far = min(min_so_far, p)   # ← wrong order
    best = max(best, p - min_so_far)

On LC 121 this gives the same answer, because a same-day buy and sell earns 0, which never beats best. It is still a habit bug: in the state-machine variants, reading a state you already updated today can break a constraint such as the LC 309 cooldown.

Wrong #4 — Greedy “buy every local min”: Confuses LC 121 (single transaction) with LC 122 (multiple). Read the prompt.

8. Skills & Takeaways

Pattern: running-min/max + per-element decision. Direct applications:

LC 122 — Buy/Sell II (multiple tx → sum positive diffs greedily)
LC 123 — Buy/Sell III (at most 2 tx → DP over states)
LC 188 — Buy/Sell IV (at most k tx → generalized DP)
LC 309 — with cooldown (state machine DP)
LC 714 — with fee (state machine DP)
LC 53 — Maximum Subarray (Kadane — same family via diff transform)
LC 152 — Maximum Product Subarray (track both running min and max, because a negative factor swaps them)

9. When to Move On

Solved unaided <8 min, O(N), O(1)
Tested decreasing input, single-element, two-element
Articulated the “running min + per-step profit” transformation
Connected to Kadane’s algorithm via the diff trick
Stress test passes
Solved LC 122, LC 53; saw the family resemblance

10. Company Context

Amazon

The framing: “You’re shown daily stock prices for a company. What’s the most an investor could have made with one buy and one sell?”
Misleading example: They give [2, 4, 1] to bait the “buy at 1” mistake. The trap: 1 is the LAST day, no sell possible after.
Adversarial extension: “Now they can do multiple transactions” (→ LC 122) immediately followed by “now there’s a $1 fee per transaction” (→ LC 714). Tests whether you generalize cleanly.

Bloomberg (terminal company — they LIVE on time-series)

The framing: Often pure LC 121, sometimes with timestamps instead of indices (testing whether you take the order from the timestamps rather than from list position).
What they want: Recognition that this is a time-series invariant problem. The phrase “I’ll track the running minimum” is a green check.
Extension: “What if prices stream in?” → same algorithm; the running-min works incrementally.

Uber

Frame: “Surge pricing history — best moment to launch a promotional ride.”
Extension: “What if some days are weekends and weekend buys are forbidden?” → tests masking, not algorithm.
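The masking extension only changes which days may enter the running min. In the sketch below, can_buy is a hypothetical per-day flag list aligned with prices, and selling is assumed to be allowed on any day.

```python
from typing import List


def max_profit_masked(prices: List[int], can_buy: List[bool]) -> int:
    """LC 121 where buying is forbidden on some days; selling is allowed on any day."""
    min_buy = float("inf")  # cheapest price on an ALLOWED buy day strictly before today
    best = 0
    for price, allowed in zip(prices, can_buy):
        best = max(best, price - min_buy)
        if allowed:
            min_buy = min(min_buy, price)  # only permitted days enter the running min
    return best
```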

11. Interviewer’s Lens

Phase	Strong	Weak	Scorecard
Reading	Confirms “single transaction” + “j > i strict”	Assumes multi-tx	“Verifies the contract”
Pre-coding	States O(N²) brute, then O(N) optimal with proof sketch	Jumps to “iterate and track”	“Derives, doesn’t memorize”
Coding	Updates `best` before `min_so_far`	Wrong order, gets lucky on LC 121	“Subtle correctness awareness”
Edge	Tests decreasing array, single price	Tests only sample	“Anticipates degenerate inputs”
Finish	Connects to Kadane / LC 122 family	Says “done”	“Sees the pattern family”

12. Level Delta

Level	Answer
Mid	One-pass running-min, ~10 min. O(N), O(1). Correct.
Senior	+ clarifies single vs multi tx + tests decreasing array + states correctness invariant.
Staff	+ names the “max subarray of diffs” equivalence (Kadane) + offers to immediately extend to LC 122.
Principal	+ asks production context (algo trading? backtest? UI dashboard?) + notes that real backtests need transaction fees, slippage, position size — and that “max profit” alone is a wrong metric (drawdown, Sharpe matter) + mentions that on real streams you’d window the running-min for non-stationarity.

13. Follow-up Questions & Full Answers

Q1: “Now allow unlimited transactions.” → LC 122.

Answer: Sum every positive consecutive diff: sum(max(0, prices[i+1] - prices[i]) for i in range(N-1)). Proof: each maximal increasing run, from a local min to the next local max, contributes (max - min), and the sum of positive diffs equals the sum of these contributions; no set of non-overlapping buy-sell pairs can collect more than the total upward movement. O(N) one pass, O(1) space.

Q2: “At most k transactions.” → LC 188.

Answer: DP. State: dp[t][i] = max profit using ≤ t transactions through day i. Transition: dp[t][i] = max(dp[t][i-1], max over j<i of (dp[t-1][j-1] + prices[i] - prices[j])), treating dp[t-1][j-1] as 0 when j = 0. Naively O(k·N²); optimize the inner max to O(1) by tracking max(dp[t-1][j-1] - prices[j]) as we scan, giving O(k·N). When k ≥ N/2, the problem degenerates to unlimited transactions (LC 122); handle this case separately.
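A minimal Python sketch of the O(k·N) version of this DP, assuming prices is a plain list of ints. The function name max_profit_k and the two rolling rows dp_prev/dp_cur are illustrative choices, not taken from solution.py; best_buy carries the running max(dp[t-1][j-1] - prices[j]) so the inner max costs O(1).

```python
def max_profit_k(k: int, prices: list[int]) -> int:
    """At most k buy-sell transactions (LC 188). O(k*N) time, O(N) space."""
    n = len(prices)
    if n < 2 or k == 0:
        return 0
    if k >= n // 2:
        # Enough transactions to take every upswing: same answer as LC 122.
        return sum(max(0, prices[i + 1] - prices[i]) for i in range(n - 1))
    dp_prev = [0] * n                      # row t-1 (starts as t = 0: no trades)
    for _ in range(k):
        dp_cur = [0] * n                   # row t; dp_cur[0] = 0 (no sell on day 0)
        best_buy = -prices[0]              # j = 0, with dp[t-1][-1] taken as 0
        for i in range(1, n):
            # Either skip day i, or sell on day i after the best earlier buy j < i.
            dp_cur[i] = max(dp_cur[i - 1], prices[i] + best_buy)
            # Make j = i available for later days: dp[t-1][i-1] - prices[i].
            best_buy = max(best_buy, dp_prev[i - 1] - prices[i])
        dp_prev = dp_cur
    return dp_prev[-1]
```

Keeping only two rows drops the table from O(k·N) to O(N) space without changing the recurrence.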

Q3: “What about transaction fee f per buy-sell pair?” → LC 714.

Answer: State machine. hold[i] = max profit ending day i holding a stock; cash[i] = max profit ending day i not holding. hold[i] = max(hold[i-1], cash[i-1] - prices[i]). cash[i] = max(cash[i-1], hold[i-1] + prices[i] - f). Pay the fee when selling. O(N), O(1) with two scalars.
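The two-state recurrence above, sketched with two scalars (max_profit_with_fee is a hypothetical name). The tuple assignment makes both updates read the previous day's hold and cash, matching the hold[i-1]/cash[i-1] indices in the recurrences.

```python
def max_profit_with_fee(prices: list[int], fee: int) -> int:
    """Unlimited transactions, fee paid once per sell (LC 714). O(N), O(1)."""
    if not prices:
        return 0
    hold = -prices[0]  # best profit so far while holding a share
    cash = 0           # best profit so far while holding nothing
    for price in prices[1:]:
        # Right-hand side uses yesterday's hold and cash for both updates.
        hold, cash = max(hold, cash - price), max(cash, hold + price - fee)
    return cash
```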

Q4: “Streaming prices, infinite stream. Output the running best-possible-profit-so-far.”

Answer: Same algorithm, incremental. Maintain min_so_far and best. On each tick, update both. The answer is best at all times. O(1) per tick.
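A sketch of the incremental version as a small class; StreamingMaxProfit and on_tick are hypothetical names, and ticks are assumed to arrive in time order. The order inside on_tick matters: profit is computed against earlier ticks before the current price is allowed to become the new minimum.

```python
class StreamingMaxProfit:
    """Best single buy-then-sell profit over all ticks seen so far. O(1) per tick."""

    def __init__(self) -> None:
        self.min_so_far: float = float("inf")  # cheapest price seen so far
        self.best: float = 0.0                 # 0 means "no profitable trade yet"

    def on_tick(self, price: float) -> float:
        # Sell at this tick against the cheapest earlier tick...
        self.best = max(self.best, price - self.min_so_far)
        # ...and only then let this tick become a buy candidate for later ticks.
        self.min_so_far = min(self.min_so_far, price)
        return self.best
```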

Q5: “How do you test it?”

Answer: (1) Edge: empty (or N=1) → 0, decreasing → 0, increasing → last - first. (2) Property: brute-force O(N²) vs optimal on random arrays. (3) Adversarial: arrays where the buy is on day 0 vs day N-2 — tests whether running-min update timing is right.

14. Full Solution Walkthrough

See solution.py.

Three solutions for didactic value:

max_profit_brute: all pairs, O(N²). Oracle.
max_profit: running-min one-pass, O(N), O(1).
max_profit_kadane: Kadane on diffs, to demonstrate equivalence.

All three should agree under stress_test.
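For readers without solution.py at hand, here is a self-contained sketch of what those three functions and the stress test could look like. The names follow the list above, but the bodies are a reconstruction, not the file's contents; the random sizes and value ranges are arbitrary and kept small so the O(N²) oracle stays fast.

```python
import random


def max_profit_brute(prices: list[int]) -> int:
    """Oracle: try every buy day i and sell day j > i. O(N^2)."""
    best = 0
    for i in range(len(prices)):
        for j in range(i + 1, len(prices)):
            best = max(best, prices[j] - prices[i])
    return best


def max_profit(prices: list[int]) -> int:
    """Running minimum, one pass. O(N) time, O(1) space."""
    min_so_far = float("inf")
    best = 0
    for price in prices:
        best = max(best, price - min_so_far)  # sell today vs the best earlier buy
        min_so_far = min(min_so_far, price)   # only then consider buying today
    return best


def max_profit_kadane(prices: list[int]) -> int:
    """Kadane on consecutive diffs: best sum of a (possibly empty) run of diffs."""
    best = cur = 0
    for i in range(1, len(prices)):
        cur = max(0, cur + prices[i] - prices[i - 1])  # extend the run or restart
        best = max(best, cur)
    return best


def stress_test(trials: int = 2000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        prices = [rng.randint(0, 50) for _ in range(rng.randint(0, 12))]
        expected = max_profit_brute(prices)
        assert max_profit(prices) == expected, prices
        assert max_profit_kadane(prices) == expected, prices
```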

15. Beyond the Problem

Real systems this is the kernel of:

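Backtest engines (Zipline, Backtrader): a “perfect foresight” upper bound can be used to score strategies. No strategy limited to one round trip can beat the single-transaction max profit on a sequence, so it serves as the benchmark for that class (for unlimited trades without fees, the analogous bound is the LC 122 sum of positive diffs).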
A/B test analysis: “what’s the largest sustained delta we observed?” — same running-min/max.
Latency/throughput dashboards: “what’s the largest drop in throughput?” — running-max + per-point delta.

Principal-engineer code review comment: “This algorithm assumes prices is a value-typed array. In our pipeline, prices are timestamped events with possible gaps. Either resample to fixed-interval before feeding, or change the algorithm to work on (timestamp, price) tuples and handle missing intervals. Also: what’s the contract when prices contains NaN (market closed)? Define it explicitly.”

p04 — Merge Sorted Array

Source: LeetCode 88 · Easy · Topics: Array, Two Pointers, Sorting
Companies: Facebook (high), Bloomberg (high), Microsoft (high), Adobe (medium), Apple (medium)
Loop position: phone screen or onsite warmup; often paired with a follow-up to merge k arrays (LC 23, Merge k Sorted Lists)

1. Quick Context

A deceptively simple problem with a sharp trick: you must merge in place in nums1 (which has extra trailing zeros). The naive approach (merge from the left) requires shifting → O(N²). The optimal approach (merge from the right, writing the largest elements into the back) is O(M+N), O(1) extra.

What it tests: in-place pointer manipulation and the recognition that writing backwards avoids the overwrite problem. The trap: candidates instinctively merge from the front and end up either shifting elements on every insertion (quadratic time) or copying nums1’s valid prefix into an extra O(M) buffer, which interviewers usually read as missing the point of the pre-allocated padding.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/merge-sorted-array/

12-min timer. Cold attempt. The “merge from the back” insight should be the first thing you reach for — if not, that’s a strong signal you need this rep.

3. Prerequisite Concepts

Two-pointer technique — phase-02 §1
“In-place transformation” pattern — when extra buffer is forbidden

4. How to Approach

Restate: nums1 has length m + n. Its first m entries are valid, sorted; the last n are placeholder zeros. nums2 has n valid sorted entries. Merge so that nums1 holds all m + n values sorted. Modify nums1 in place.

Clarify:

“Can I use O(N) extra space?” (Usually no — the whole point is in-place. ASK.)
“Are duplicates allowed?” (Yes; merge is stable.)
“Can m or n be 0?” (Yes. Both edge cases.)
“Are negative numbers allowed?” (Yes, per constraints.)

Examples:

nums1=[1,2,3,0,0,0], m=3, nums2=[2,5,6], n=3 → [1,2,2,3,5,6]
nums1=[1], m=1, nums2=[], n=0 → [1] (n=0 edge)
nums1=[0], m=0, nums2=[1], n=1 → [1] (m=0 edge)
nums1=[4,5,6,0,0,0], m=3, nums2=[1,2,3], n=3 → [1,2,3,4,5,6] (all of nums2 < all of nums1 — tests handling when nums2 isn’t exhausted)

Brute force: Copy nums2 into the tail of nums1, then sort. O((M+N) log (M+N)). Trivially correct but wastes the “already sorted” property.

Pattern: Two sorted sequences + in-place + extra room at the END → merge from the back, largest first.

Optimal:

def merge(nums1, m, nums2, n):
    i = m - 1            # pointer into nums1's valid part
    j = n - 1            # pointer into nums2
    write = m + n - 1    # pointer into nums1's tail (where to write next)
    while j >= 0:        # only need to keep going while nums2 has elements
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[write] = nums1[i]
            i -= 1
        else:
            nums1[write] = nums2[j]
            j -= 1
        write -= 1

Why we loop on j >= 0, not i >= 0: If nums2 is exhausted, the remaining nums1[0..i] is already in its final position (since we wrote bigger elements to the right). If nums1’s valid prefix is exhausted (i < 0), we must still copy remaining nums2 elements. The loop condition reflects which case requires action.

Correctness: At each step we write the larger of the two unprocessed maxima into nums1[write], the next free slot from the right. Because write = i + j + 1 throughout (see the invariant in §6) and j ≥ 0 inside the loop, write > i: the slot is always to the right of nums1’s unread prefix nums1[0..i], so it is never one we still need to read from. No overwrite.

5. Progressive Hints

hints.md. One at a time.

6. Deeper Insight — Why It Works

The reverse-merge insight: Merging from the front into nums1 would overwrite valid nums1 data before reading it (because the write pointer would catch up to the read pointer for nums1). Merging from the back writes into slots that are guaranteed unused (either they’re trailing zeros, or they’re slots holding values already moved).

The invariant: At any point, nums1[write+1 .. m+n-1] contains the largest (m+n-1 - write) elements of the final merged result, in sorted order. nums1[0..i] and nums2[0..j] are the unprocessed prefixes.

Why i+j+1 == write always: Initially i+j+1 = (m-1) + (n-1) + 1 = m+n-1 = write. Each iteration decrements write by 1 and decrements exactly one of i or j by 1, so the relation holds. Inside the loop j ≥ 0, so write = i+j+1 ≥ i+1: the write slot always lies strictly to the right of nums1’s unread prefix nums1[0..i], and it holds either a placeholder zero or a value that has already been moved. This is the formal proof of “no overwrite.”
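To make the proof concrete, here is a hypothetical instrumented variant (merge_checked is not part of the original solution) that asserts the write = i + j + 1 relation and the resulting write > i bound on every iteration; otherwise it behaves exactly like the reverse-merge in section 4.

```python
def merge_checked(nums1: list[int], m: int, nums2: list[int], n: int) -> None:
    """Reverse merge that asserts the no-overwrite invariant on every step."""
    i, j, write = m - 1, n - 1, m + n - 1
    while j >= 0:
        assert write == i + j + 1   # the relation proved above
        assert write > i            # so nums1's unread prefix is never clobbered
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[write] = nums1[i]
            i -= 1
        else:
            nums1[write] = nums2[j]
            j -= 1
        write -= 1
```

Running a stress test against this version turns the paper proof into a runtime check; the asserts can be dropped once it passes.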

7. Anti-Pattern Analysis

Wrong #1 — Front merge with shift:

i, j = 0, 0
while j < n:
    if i < m and nums1[i] <= nums2[j]:
        i += 1
    else:
        # shift nums1[i..m-1] right to make room
        nums1.insert(i, nums2[j])  # O(M+N): shifts every later element right
        nums1.pop()                # O(1): drops one trailing placeholder zero
        i += 1
        m += 1
        j += 1

Correct but O(N·(M+N)): each of the n insertions shifts up to M+N elements, so it is quadratic in the worst case. Senior interviewers see this and ask “can you do better?”; if you can’t pivot to the reverse-merge, you’ve revealed a gap.

Wrong #2 — Copy nums1 valid prefix to an aux buffer:

aux = nums1[:m]
# then merge aux and nums2 into nums1 from the front

Correct and O(M+N) time but O(M) extra space. Violates the in-place spirit — passes LC but loses interview points.

Wrong #3 — Sort after concat:

nums1[m:] = nums2
nums1.sort()

O((M+N) log (M+N)). Passes LC. But: “you ignored the ‘sorted input’ property — what was the point of this problem?”

Wrong #4 — Forget the i >= 0 guard in the comparison:

while j >= 0:
    if nums1[i] > nums2[j]:   # IndexError when i goes negative

The guard if i >= 0 and nums1[i] > nums2[j] matters when nums1’s valid prefix is exhausted before nums2.

8. Skills & Takeaways

Pattern: write-backwards-to-avoid-overwrite. Direct applications:

LC 26 / 27 / 80 — Remove Duplicates / Remove Element (write-forward variant)
LC 283 — Move Zeroes
LC 977 — Squares of a Sorted Array (write from the back: the largest remaining square is always at one of the two ends)
LC 167 — Two Sum II (two-pointer from both ends; sibling family)

The “merge from back” trick generalizes: Any in-place merge where one container has trailing room becomes O(N) by writing backward. Used in some qsort partition tricks and in compaction passes of generational garbage collectors.
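The same write-backwards idea applied to LC 977, as a sketch (sorted_squares is an illustrative name). Unlike LC 88 it fills a separate output list, but the reasoning is identical: the largest remaining square sits at one of the two ends of the sorted input, so the output is filled from its last slot toward the front.

```python
def sorted_squares(nums: list[int]) -> list[int]:
    """LC 977: squares of a sorted array, filled from the back. O(N)."""
    lo, hi = 0, len(nums) - 1
    out = [0] * len(nums)
    for write in range(len(nums) - 1, -1, -1):
        # Compare the two ends; the larger absolute value gives the larger square.
        if abs(nums[lo]) > abs(nums[hi]):
            out[write] = nums[lo] * nums[lo]
            lo += 1
        else:
            out[write] = nums[hi] * nums[hi]
            hi -= 1
    return out
```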

9. When to Move On

Solved unaided <10 min using reverse-merge
Tested m=0, n=0, all-of-nums2-smaller, equal elements
Articulated why front-merge fails (overwrite) and back-merge succeeds (slot guaranteed free)
Stress test passes
Solved LC 977; saw the same write-backwards idea

10. Company Context

Facebook / Meta

Frame: Often appears as a warmup before LC 23 (Merge k Sorted Lists). The interviewer expects you to do p04 in 5 min, then they ask “now generalize to k arrays.”
What they want: Reverse-merge offered without prompting. The phrase “I’ll merge from the back to avoid shifting” is a green check.
Trap: They give you nums1 with extra space already allocated. Candidates who treat nums1 as size-m and try to grow it don’t see the “trailing zeros” hint.

Bloomberg

Frame: Often as merge of two sorted price-tick streams (“merge into a single time-ordered view”).
Extension: “Now they may have duplicate timestamps — preserve order from stream A first.” Tests stable-merge awareness.

Microsoft

Frame: Phone screen warmup. They watch your pointer manipulation cleanliness.
What they want: Correct, no off-by-one, comment on the loop condition. Boring + correct = pass.

Adobe

Frame: Often given with the explicit constraint “no extra memory”. Tests whether you internalize the in-place requirement.

11. Interviewer’s Lens

Phase	Strong	Weak	Scorecard
Reading	Notices the trailing zeros are intentional padding	Ignores zeros, treats nums1 as size-m	“Reads spec carefully”
Pre-coding	States “merge from back to avoid overwrite”	Plans front-merge with shifting	“Recognizes the optimal pattern”
Coding	Three pointers (i, j, write); correct guards	Off-by-one, IndexError	“Disciplined pointer code”
Edge	Tests m=0, n=0, all-A-bigger	Tests only sample	“Tests boundary conditions”
Finish	Articulates the `i+j+1 == write` invariant	Says “done”	“Proves correctness”

12. Level Delta

Level	Answer
Mid	Reverse-merge, O(M+N), works. ~10 min.
Senior	+ clarifies extra-space constraint + tests m=0/n=0 + articulates why reverse direction avoids overwrite.
Staff	+ states the `i+j+1 == write` invariant formally + offers to extend to merge-k-lists with a heap + notes that stable merge order matters in some applications.
Principal	+ asks production context (database compaction? log merge?) + notes that tape-based external merge sorts were built on this same two-way merge primitive + notes that sequential access matters for cache efficiency even in RAM, and reverse-merge is still sequential (just running from the end), so it stays cache-friendly.

13. Follow-up Questions & Full Answers

Q1: “Now merge k sorted arrays into one.” → LC 23 family

Answer: Min-heap of (value, array_id, element_id). Pop smallest, append to output, push the next from the same array. O(N log k) where N is total elements. For very large k, consider tournament tree (same complexity, lower constant).
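A minimal sketch of the heap approach for Python lists (merge_k_sorted is a hypothetical name). Heap entries are (value, array_id, element_id) tuples as described above; the ids also break ties, so the heap only ever compares ints.

```python
import heapq


def merge_k_sorted(arrays: list[list[int]]) -> list[int]:
    """Merge k sorted lists with a min-heap. O(N log k) time, O(k) heap space."""
    # Seed the heap with the first element of every non-empty array.
    heap = [(arr[0], a_id, 0) for a_id, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    out: list[int] = []
    while heap:
        value, a_id, e_id = heapq.heappop(heap)
        out.append(value)
        if e_id + 1 < len(arrays[a_id]):
            # Refill from the same array the popped value came from.
            heapq.heappush(heap, (arrays[a_id][e_id + 1], a_id, e_id + 1))
    return out
```

Outside an interview, Python's standard library already provides heapq.merge, a lazy k-way merge over sorted iterables.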

Q2: “What if nums1 doesn’t have extra room at the end — same length as valid data?”

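Answer: Then there is no free tail to write into, and the M+N merged values cannot fit in nums1 alone, so the reverse-merge trick no longer applies. The practical answer is a standard out-of-place merge into a new M+N buffer. If the two runs are stored adjacently (one array of length M+N), copying the smaller run out needs only O(min(M,N)) extra space. Linear-time merging with O(1) extra space does exist (Kronrod’s algorithm; Huang and Langston’s practical in-place merge), but it is intricate and rarely expected in an interview.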

Q3: “What if both arrays are huge and stored on disk?”

Answer: External merge sort’s merge phase. Read both arrays in buffered chunks; merge into output buffer; flush when full. Classical pattern. The key complexity unit becomes I/O, not comparisons.
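The merge phase only needs to see each input as a sorted stream, so the buffering can hide behind Python iterators (for example, a buffered file object yielding parsed records). The sketch below assumes both inputs are already sorted by key; merge_two_streams and its key parameter are illustrative names. Ties go to stream a, which is also the stable behavior the Bloomberg duplicate-timestamp extension asks for.

```python
from typing import Any, Callable, Iterable, Iterator

_END = object()  # sentinel marking an exhausted stream


def merge_two_streams(
    a: Iterable[Any],
    b: Iterable[Any],
    key: Callable[[Any], Any] = lambda item: item,
) -> Iterator[Any]:
    """Lazily merge two sorted iterables; on equal keys, items from `a` come first."""
    it_a, it_b = iter(a), iter(b)
    x, y = next(it_a, _END), next(it_b, _END)
    while x is not _END and y is not _END:
        if key(y) < key(x):      # strict comparison: ties go to stream a (stable)
            yield y
            y = next(it_b, _END)
        else:
            yield x
            x = next(it_a, _END)
    # At most one stream still has items; drain it without further comparisons.
    if x is not _END:
        yield x
        yield from it_a
    if y is not _END:
        yield y
        yield from it_b
```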

Q4: “Merge two sorted linked lists in place.” → LC 21

Answer: Two-pointer with rewiring. Maintain a dummy head + a tail pointer. At each step, point tail.next to the smaller of a.val, b.val, advance that pointer. After the loop, append whatever remains. O(M+N), O(1) extra.
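A sketch of the dummy-head splice for LC 21. ListNode mirrors the usual LeetCode node shape; from_list and to_list are hypothetical helpers included only to make the example runnable.

```python
from typing import Optional


class ListNode:
    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None) -> None:
        self.val = val
        self.next = next


def merge_two_lists(a: Optional[ListNode], b: Optional[ListNode]) -> Optional[ListNode]:
    """LC 21: splice two sorted lists by rewiring next pointers. O(M+N), O(1)."""
    dummy = ListNode()        # placeholder head, so the first splice needs no special case
    tail = dummy
    while a and b:
        if a.val <= b.val:    # <= keeps equal values from `a` first
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a if a else b  # append whichever list still has nodes
    return dummy.next


def from_list(values: list[int]) -> Optional[ListNode]:
    head: Optional[ListNode] = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(node: Optional[ListNode]) -> list[int]:
    out: list[int] = []
    while node:
        out.append(node.val)
        node = node.next
    return out
```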

Q5: “How do you test?”

Answer: (1) Edge: m=0, n=0, all-A-smaller, all-A-larger, equal elements, single-element each. (2) Property: brute (sort after concat) vs optimal on random sorted inputs, must agree element-for-element. (3) Adversarial: very different sizes (m=1, n=10⁵) — confirms loop conditions handle imbalance.

14. Full Solution Walkthrough

See solution.py.

merge_brute(nums1, m, nums2, n): append nums2 then sort. Oracle.
merge(nums1, m, nums2, n): reverse-merge with three pointers. Mutates nums1 in place.
stress_test: random sorted arrays of random sizes; brute vs optimal must produce identical nums1.
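A self-contained sketch of the three pieces listed above. The names match the walkthrough, but the bodies (and the random size and value ranges in stress_test) are assumptions rather than the actual solution.py. Each trial builds two identical padded copies of nums1 so the in-place results can be compared directly.

```python
import random


def merge_brute(nums1: list[int], m: int, nums2: list[int], n: int) -> None:
    """Oracle: overwrite the padding with nums2, then sort. O((M+N) log(M+N))."""
    nums1[m:] = nums2[:n]
    nums1.sort()


def merge(nums1: list[int], m: int, nums2: list[int], n: int) -> None:
    """Reverse merge with three pointers; mutates nums1 in place. O(M+N), O(1)."""
    i, j, write = m - 1, n - 1, m + n - 1
    while j >= 0:
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[write] = nums1[i]
            i -= 1
        else:
            nums1[write] = nums2[j]
            j -= 1
        write -= 1


def stress_test(trials: int = 2000, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        m, n = rng.randint(0, 8), rng.randint(0, 8)
        a = sorted(rng.randint(-10, 10) for _ in range(m))
        b = sorted(rng.randint(-10, 10) for _ in range(n))
        got, want = a + [0] * n, a + [0] * n  # two independent padded copies
        merge(got, m, b, n)
        merge_brute(want, m, b, n)
        assert got == want, (a, b)
```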

15. Beyond the Problem

Real systems this is the kernel of:

Database compaction: LSM trees (LevelDB, RocksDB, Cassandra) periodically merge sorted SSTables. The merge phase is the same merge of sorted runs (usually k-way rather than two-way, and written to a new file rather than in place), scaled to disk-resident data with buffered I/O.
Log merging: distributed tracing systems (Jaeger, Zipkin) merge per-shard time-sorted spans into a global view.
External sort: when data exceeds RAM, sort runs in memory, write to disk, then merge runs — same algorithm.
Generational garbage collectors: compaction passes merge live objects into a destination region; the “write backwards” idea appears in some collectors to avoid overwriting unmoved objects.

Principal-engineer code review: “If this is being called in a tight loop, the algorithm is fine but the API is wasteful — caller must allocate nums1 with extra capacity. Consider a builder pattern that allocates the output buffer once and reuses it, especially if M and N vary. Also: the function silently assumes nums1 has length m+n; add a precondition or a contract test, or someone will pass an under-sized array and you’ll get a confusing IndexError instead of a clear contract violation.”

p05 — Climbing Stairs

Source: LeetCode 70 · Easy · Topics: Math, DP, Memoization
Companies: Adobe (high), Amazon (medium), Apple (medium)
Loop position: the canonical “intro to DP”; almost always a warmup before a real DP question

1. Quick Context

The “Fibonacci in disguise” problem. It’s the cleanest possible 1D DP and exists to teach you the recipe: (1) define state, (2) write recurrence, (3) identify base cases, (4) decide order, (5) compress space. If you can’t articulate those 5 steps on p05, you can’t on p51 (House Robber), p55 (Jump Game), p60 (Longest Increasing Subsequence), or anywhere else.

What it looks like it tests: ability to recognize Fibonacci. What it actually tests: the DP recipe. Interviewers use this as a calibration — if you write recursion without memoization (O(2^N)), or memoization without realizing iteration is simpler, you’ve revealed your DP fluency level.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/climbing-stairs/

10-min timer. Cold attempt. If you reach for recursion-without-memo, that’s the signal you need this rep most.

3. Prerequisite Concepts

Recursion + memoization basics — phase-01 §8
“State + recurrence + base case” — the universal DP recipe
Space compression: when only the last k states are needed, store only those

4. How to Approach

Restate: From step 0, reach step n, taking either 1 or 2 steps at a time. How many distinct ways?

Clarify:

“Distinct sequences of moves, or distinct step sets?” (Sequences — 1+2 and 2+1 are different. ASK; some problems differ.)
“n ≥ 1?” (Per LC: 1 ≤ n ≤ 45.)
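“Why n ≤ 45?” (n = 45 is the largest input whose answer, 1,836,311,903, fits in a signed 32-bit int; n = 46 gives 2,971,215,073, which overflows in some languages. No issue in Python, but worth a verbal acknowledgment.)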

Examples:

n=1 → 1 (just [1])
n=2 → 2 ([1,1], [2])
n=3 → 3 ([1,1,1], [1,2], [2,1])
n=4 → 5 (Fibonacci pattern starts to emerge: 1, 2, 3, 5)
n=5 → 8

Brute force (recursive): f(n) = f(n-1) + f(n-2), base f(1)=1, f(2)=2. O(2^N) time without memo — exponential.

Pattern recognition: The recurrence is Fibonacci. Anywhere you see “ways to reach state N built from prior states” with overlapping subproblems → DP.

Optimal: Iterative DP, two scalars (prev1, prev2). O(N) time, O(1) space.

if n <= 2: return n
prev2, prev1 = 1, 2     # f(1), f(2)
for i in range(3, n+1):
    prev2, prev1 = prev1, prev1 + prev2
return prev1

Recurrence justification: To reach step n, the last move was either a 1-step (from step n-1) or a 2-step (from step n-2). These are disjoint sets of paths, so total = f(n-1) + f(n-2).

5. Progressive Hints

hints.md. One at a time.

6. Deeper Insight — Why It Works

The DP recipe applied:

State: f(n) = number of distinct ways to reach step n.
Recurrence: f(n) = f(n-1) + f(n-2) (last move was 1 or 2).
Base cases: f(1) = 1, f(2) = 2. (Why two base cases? Recurrence reaches back two steps; we need both grounded.)
Order: Bottom-up (iteratively i = 3 .. n) because f(n) depends only on smaller i.
Space compression: Only the last two values are needed, so two scalars suffice — O(1) space.

Why two base cases, not one? The recurrence f(n) = f(n-1) + f(n-2) requires f(n-2) to be defined when n=3, so we must specify both f(1) and f(2). Trying to derive f(2) from f(0) + f(1) requires defining f(0) = 1 (the empty path) — a valid alternative formulation, but the natural problem space starts at step 1.

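Closed form (Staff signal): Fibonacci has Binet’s closed form F(n) = (φ^n - ψ^n)/√5 with φ = (1+√5)/2 and ψ = (1-√5)/2. Because |ψ^n/√5| < 1/2, F(n) is just φ^n/√5 rounded to the nearest integer, and here f(n) = F(n+1). For interview purposes, mention it exists but don’t compute it: floating-point loses precision for n > ~70, and the problem caps at 45 anyway.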

Matrix exponentiation (Staff signal): Fibonacci can be computed in O(log N) via [[1,1],[1,0]]^n. Overkill for n ≤ 45; useful when n is large or when the recurrence generalizes (e.g., f(n) = a·f(n-1) + b·f(n-2)).

7. Anti-Pattern Analysis

Wrong #1 — Plain recursion, no memo:

def climb(n):
    if n <= 2: return n
    return climb(n-1) + climb(n-2)

Exponential: the call tree grows like φ^N ≈ 1.618^N, usually quoted loosely as O(2^N). At n=45 that is about 2.3 billion calls. TLE for sure. Interviewer’s note: “didn’t recognize overlapping subproblems.”

Wrong #2 — Memo with dict overhead:

from functools import lru_cache

@lru_cache(maxsize=None)
def climb(n):
    if n <= 2: return n
    return climb(n-1) + climb(n-2)

Works in O(N) but uses O(N) call stack and dict memory. Iterative is strictly better — no recursion limit risk, O(1) space. Mention lru_cache exists but show why iterative wins.

Wrong #3 — DP table O(N) space:

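def climb_table(n):
    if n <= 2: return n          # guard: for n=1, dp has length 2 and dp[2] would raise IndexError
    dp = [0]*(n+1)
    dp[1], dp[2] = 1, 2
    for i in range(3, n+1): dp[i] = dp[i-1] + dp[i-2]
    return dp[n]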

Correct, O(N) time, O(N) space. Misses the O(1) compression. Senior signal: notice you only need the last 2 values, drop the array.

Wrong #4 — Off-by-one on base cases: Forgetting n==1 returns 1, not 2; or forgetting to handle n==2 directly and letting the loop run. Tests with n=1, 2, 3 catch these.

8. Skills & Takeaways

The DP recipe — burn it in:

State definition
Recurrence with justification
Base cases (one per step the recurrence reaches back)
Computation order
Space compression (only keep what’s referenced)

Generalizations — variants you’ll see:

LC 746 — Min Cost Climbing Stairs (same structure, minimize cost instead of count)
LC 198 — House Robber (same structure with “can’t be adjacent” constraint)
LC 213 — House Robber II (circular variant)
LC 91 — Decode Ways (Fibonacci with validity guards)
LC 1137 — N-th Tribonacci (extend recurrence to 3 prior terms)
LC 509 — Fibonacci Number (literal Fibonacci)

When you can take 1, 2, or 3 steps: f(n) = f(n-1) + f(n-2) + f(n-3) (Tribonacci). When you can take 1..k steps: f(n) = sum f(n-i) for i=1..k → can be O(N) with a sliding window.

9. When to Move On

Solved iteratively, O(N) time, O(1) space, <8 min
Tested n=1, n=2, n=3
Articulated all 5 DP recipe steps without prompting
Connected to House Robber (LC 198) as the same pattern with max in place of sum
Stress test passes
Mentioned closed-form / matrix exp exist (Staff signal)

10. Company Context

Adobe (this is their go-to DP warmup)

Frame: Often given verbatim.
What they want: Iterative O(1) space, articulated DP recipe. Recursion without memo is a near-instant fail.
Extension: “What if you can take 1, 2, or 3 steps?” → Tribonacci. Tests generalization.

Amazon

Frame: Sometimes as a paving / tiling problem (“ways to tile a 1×N corridor with 1×1 and 1×2 tiles”) — same Fibonacci underneath.
Extension: “Now there’s a step you can’t land on (broken stair).” → DP with constraint: f(broken) = 0, recurrence unchanged otherwise.

Apple

Frame: Often paired with LC 198 (House Robber) — they want to see if you generalize.

11. Interviewer’s Lens

Phase	Strong	Weak	Scorecard
Reading	Asks “sequences vs sets” + clarifies n bounds	Dives in	“Verifies the contract”
Pre-coding	Names the recurrence with justification	Says “I’ll recurse”	“Articulates DP structure”
Coding	Iterative + 2 scalars	Recursive without memo	“Optimal space upfront”
Edge	n=1, n=2 covered	Misses n=1	“Catches base-case bugs”
Finish	Mentions DP recipe steps + extensions	Says “done”	“Frameworks generalize”

12. Level Delta

Level	Answer
Mid	Iterative O(N), O(1). Works. ~8 min.
Senior	+ states the 5-step DP recipe explicitly + tests n=1,2,3 + offers to extend to k-step.
Staff	+ mentions closed-form (Binet) and matrix exponentiation as O(log N) alternatives + connects to House Robber as same family with different reduction operator (sum → max).
Principal	+ asks production context (“counting paths in an event graph? combinatorics for a recommender?”) + identifies that for very large n we need modular arithmetic (problem caps prevent overflow but real systems don’t) + suggests matrix exponentiation under modulo for n in the millions.

13. Follow-up Questions & Full Answers

Q1: “What if you can take 1, 2, or 3 steps?”

Answer: Tribonacci. f(n) = f(n-1) + f(n-2) + f(n-3), base cases f(1)=1, f(2)=2, f(3)=4. O(N) time, O(1) space with three scalars.

Q2: “What if you can take 1..k steps?”

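Answer: f(n) = sum f(n-i) for i in 1..k, with f(0) = 1 and f(negative) = 0. Naive O(N·k). Optimize via sliding window: f(n-1) already sums f(n-2)..f(n-1-k), so f(n) = f(n-1) + (f(n-1) - f(n-1-k)) = 2·f(n-1) - f(n-1-k) (consecutive windows overlap in k-1 terms). O(N) time, O(k) space for the window.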
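A sketch of the 1..k generalization (climb_k is a hypothetical name). Instead of applying 2·f(n-1) - f(n-1-k) literally, it keeps the last k values in a deque together with their running sum, which is the same sliding-window identity and needs no special handling for the first k steps. It uses the f(0) = 1 convention.

```python
from collections import deque


def climb_k(n: int, k: int) -> int:
    """Ways to reach step n taking between 1 and k steps at a time (k >= 1)."""
    window = deque([1])  # the last (up to) k values of f, starting with f(0) = 1
    window_sum = 1       # running sum of the values in the window
    for _ in range(n):
        f_next = window_sum                # f(i) = f(i-1) + ... + f(i-k)
        window.append(f_next)
        window_sum += f_next
        if len(window) > k:
            window_sum -= window.popleft()  # the oldest value leaves the window
    return window[-1]
```

With k = 2 this reproduces climbing stairs; with k = 3 it gives the Tribonacci variant from Q1.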

Q3: “Now some steps are broken — you can’t land there.”

Answer: f(broken[i]) = 0. Recurrence unchanged. Still O(N), O(1) — just a guard inside the loop.
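A sketch with the broken steps passed as a set (climb_with_broken is a hypothetical name). Seeding with f(-1) = 0 and f(0) = 1 lets the same loop handle a broken step 1 or 2 without separate base cases.

```python
def climb_with_broken(n: int, broken: set[int]) -> int:
    """Ways to reach step n with 1- or 2-steps, never landing on a broken step."""
    prev2, prev1 = 0, 1   # f(-1) = 0, f(0) = 1 (the empty path)
    for step in range(1, n + 1):
        cur = 0 if step in broken else prev1 + prev2  # guard; recurrence unchanged
        prev2, prev1 = prev1, cur
    return prev1
```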

Q4: “Same problem but n can be up to 10^18.”

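Answer: Matrix exponentiation. With the convention f(0) = 1 (the empty path), [f(n), f(n-1)]^T = M^(n-1) · [f(1), f(0)]^T where M = [[1,1],[1,0]]. Compute M^(n-1) via binary exponentiation in O(log N) matrix multiplications, each O(1) for a 2×2 matrix. Total O(log N). The exact answer for n near 10^18 has on the order of 10^17 digits, so the question must ask for it modulo some number; reduce mod that number after every multiplication.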
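A sketch of the O(log N) approach. The modulus 1,000,000,007 is an assumption (a common contest choice); the real question would specify one. climb_matrix and mat_mult are illustrative names, and the final line applies M^(n-1) to the vector [f(1), f(0)] = [1, 1].

```python
MOD = 1_000_000_007  # assumption: a typical modulus; the real prompt would name one

Matrix = list[list[int]]


def mat_mult(a: Matrix, b: Matrix, mod: int) -> Matrix:
    """Product of two 2x2 matrices, reduced mod `mod`."""
    return [
        [(a[0][0] * b[0][0] + a[0][1] * b[1][0]) % mod,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % mod],
        [(a[1][0] * b[0][0] + a[1][1] * b[1][0]) % mod,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % mod],
    ]


def climb_matrix(n: int, mod: int = MOD) -> int:
    """f(n) mod `mod` in O(log n) 2x2 multiplications, for n >= 0."""
    result: Matrix = [[1, 0], [0, 1]]  # identity = M^0
    base: Matrix = [[1, 1], [1, 0]]    # M
    e = n - 1
    while e > 0:                       # binary exponentiation: result = M^(n-1)
        if e & 1:
            result = mat_mult(result, base, mod)
        base = mat_mult(base, base, mod)
        e >>= 1
    # [f(n), f(n-1)] = M^(n-1) · [1, 1]; the top row sum is f(n).
    return (result[0][0] + result[0][1]) % mod
```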

Q5: “How do you test it?”

Answer: (1) Small cases against hand-computed Fibonacci values (1, 2, 3, 5, 8, 13). (2) Property: brute recursive (with memoization) vs iterative agree on n in 1..30. (3) Stress on the boundary n=1 (off-by-one trap) and n=45 (max constraint).

14. Full Solution Walkthrough

See solution.py.

climb_brute(n): plain recursion, no memo. Used as oracle for small n only (would TLE for large).
climb(n): iterative two-scalar version. O(N) time, O(1) space.
climb_memo(n): top-down memoization, included for comparison. O(N) time, O(N) space.
stress_test: brute vs optimal vs memo on n ∈ [1, 25], all three agree.
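A self-contained sketch of the three versions plus the stress test, reconstructed from the descriptions above (the names follow the list, but this is not the actual solution.py). climb_memo uses functools.lru_cache; climb_brute is only called up to n = 25, where its exponential cost is still small.

```python
from functools import lru_cache


def climb_brute(n: int) -> int:
    """Plain recursion, exponential time. Oracle for small n only."""
    if n <= 2:
        return n
    return climb_brute(n - 1) + climb_brute(n - 2)


def climb(n: int) -> int:
    """Iterative two-scalar DP. O(N) time, O(1) space."""
    if n <= 2:
        return n
    prev2, prev1 = 1, 2  # f(1), f(2)
    for _ in range(3, n + 1):
        prev2, prev1 = prev1, prev1 + prev2
    return prev1


@lru_cache(maxsize=None)
def climb_memo(n: int) -> int:
    """Top-down memoization. O(N) time, O(N) cache plus recursion depth."""
    if n <= 2:
        return n
    return climb_memo(n - 1) + climb_memo(n - 2)


def stress_test() -> None:
    for n in range(1, 26):
        assert climb_brute(n) == climb(n) == climb_memo(n), n
```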

15. Beyond the Problem

Real systems this is the kernel of:

Path-counting in DAGs: number of distinct routes through a graph. The 1D recurrence generalizes to a DP evaluated in topological order over any DAG.
State-machine reachability: number of distinct token sequences leading to a state in a parser.
Combinatorial enumeration in recommender systems: counting compatible item sequences under constraints.
Probabilistic state aggregation: when transition probabilities are uniform and every path has the same length, counts and probabilities differ only by a normalizing constant.
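The DAG generalization, sketched with Kahn's topological order (count_dag_paths and its edge-list input format are assumptions). Each node's count is final by the time it leaves the queue, because all of its predecessors were processed first; climbing stairs is the special case with edges i → i+1 and i → i+2. The input is assumed to be acyclic; nodes on a cycle would never be dequeued.

```python
from collections import deque


def count_dag_paths(n_nodes: int, edges: list[tuple[int, int]], src: int, dst: int) -> int:
    """Count distinct src -> dst paths in a DAG via DP in topological order."""
    adj: list[list[int]] = [[] for _ in range(n_nodes)]
    indeg = [0] * n_nodes
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    ways = [0] * n_nodes
    ways[src] = 1
    queue = deque(u for u in range(n_nodes) if indeg[u] == 0)
    while queue:
        u = queue.popleft()           # all predecessors of u are already done
        for v in adj[u]:
            ways[v] += ways[u]        # every path reaching u extends to v
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return ways[dst]
```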

Principal-engineer code review: “This function is correct but its name climb_stairs is so specific that it won’t be reused. The underlying primitive — counting paths under additive recurrence — appears in three other places in our codebase, each reimplemented. Extract count_paths(n, allowed_step_sizes) and use it everywhere. Also: cache the result if called repeatedly with the same n — but for n ≤ 45 the compute cost is negligible, so probably not worth the cache complexity.”
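One possible shape for the extracted primitive the reviewer asks for. The names count_paths and allowed_step_sizes come from the review comment; the body is a sketch assuming every step size is a positive integer. It runs in O(N·S) for S allowed sizes, trading the two-scalar trick for generality.

```python
def count_paths(n: int, allowed_step_sizes: tuple[int, ...] = (1, 2)) -> int:
    """Ways to reach n from 0 using any sequence of the allowed (positive) step sizes."""
    ways = [0] * (n + 1)
    ways[0] = 1  # one way to stay at 0: the empty path
    for i in range(1, n + 1):
        # The last step landed on i from i - s for some allowed size s.
        ways[i] = sum(ways[i - s] for s in allowed_step_sizes if s <= i)
    return ways[n]
```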

Phase 0 — Interview Execution Baseline

Target level: Beginner → Easy
Expected duration: 1 week (12-week track) / 1–2 weeks (6-month) / 2 weeks (12-month)
Weekly cadence: 7 labs + 25 Easy problems applying the framework rigorously

Why This Phase Exists

Most candidates fail interviews not because they lack algorithms knowledge, but because of execution failures: they jump to coding before understanding the problem, panic when stuck, miss obvious edge cases, or communicate so poorly that the interviewer can’t tell whether they actually solved it.

Phase 0 fixes execution. It does not teach algorithms. It teaches you how to use what you already know.

Concepts To Master

1. How Coding Interviews Are Evaluated

Big Tech coding interviews typically use a multi-dimensional rubric. The dimensions are roughly:

Problem understanding — did you grasp what was asked?
Approach quality — did you find a reasonable solution?
Optimality — did you reach the optimal complexity?
Implementation correctness — does your code actually work?
Code quality — would your code pass code review?
Testing — did you verify your solution?
Communication — could the interviewer follow your thinking?
Tradeoff awareness — do you understand what you chose and why?

Each is scored ~independently. A “hire” decision typically requires strong scores on most dimensions, not perfect on one. Code that works but is uncommunicated is often a no-hire.

2. How To Communicate While Solving

See ../COMMUNICATION.md in full. The summary: narrate your intent, not your typing. Pause at decision points. Ask before assuming.

3. How To Ask Clarifying Questions

Good questions:

“Can the input be empty?”
“Are duplicates allowed?”
“What’s the expected size of N — is 10^5 reasonable?”
“Does the order of output matter?”
“What should happen on invalid input?”
“If multiple valid answers exist, can I return any one?”

Bad questions:

Re-asking what’s already in the problem statement (signals poor reading)
“What’s the optimal complexity?” (this is your job to derive)
15 questions in a row (drip them as relevant)

4. How To Derive Constraints

Constraints dictate the algorithm. Memorize this table:

N	Acceptable Complexity	Likely Approach
≤ 10	O(N!) or O(2^N · N)	Backtracking, full enumeration
≤ 20	O(2^N · N)	Bitmask DP; meet-in-the-middle extends subset search to N ≤ 40
≤ 100	O(N^4)	Multi-loop brute, Floyd-Warshall
≤ 500	O(N^3)	Interval DP, matrix chain
≤ 5,000	O(N^2)	2D DP, edit distance
≤ 100,000	O(N log N)	Sort + scan, heap, segment tree
≤ 1,000,000	O(N)	Linear scan, hashmap, two pointers
≤ 10^8	O(log N)	Binary search, math closed form
≤ 10^18	O(log N)	Binary exponentiation, math

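Rule of thumb: in modern judges, 10^8 simple operations per second is safe for compiled languages such as C++; 10^9 is risky. CPython is roughly an order of magnitude slower, so budget closer to 10^7 simple operations per second when solving in Python.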

5. How To Create Examples

The given examples are minimum-coverage. You should construct:

Trivial: size 0, size 1.
Boundary: all duplicates, all negatives, all sorted, all reversed.
Adversarial: max-constraint values, edge of integer range.
Multi-answer: if multiple valid outputs exist, pick a specific one.

Working through your own examples often reveals the pattern faster than reading the problem 5 more times.

6. How To Identify Edge Cases

Universal checklist (run through this every problem):

Empty input
Null input (where applicable)
Single element
Two elements
Duplicates
Negative numbers
Maximum constraint values (overflow risk)
Sorted input
Reversed input
Disconnected graph
Cycle in graph
Multiple valid answers
All identical values
Off-by-one at boundaries

7. How To Start With Brute Force

The brute force is mandatory. Even when you “see” the optimal:

It anchors correctness — you have a working oracle.
It’s a starting point to optimize from.
It’s a fallback if optimization fails.
It demonstrates you understand the problem at all.

State the brute force in pseudocode within the first 3 minutes.
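The “working oracle” pays off most when the comparison is automated. A minimal random-testing harness, as a sketch: stress and its parameters are hypothetical names, inputs are assumed to be small int arrays, and Contains Duplicate (from the Phase 0 problem list) serves as the example pair.

```python
import random
from typing import Callable


def stress(
    brute: Callable[[list[int]], object],
    fast: Callable[[list[int]], object],
    trials: int = 1000,
    max_len: int = 8,
    lo: int = -10,
    hi: int = 10,
    seed: int = 0,
) -> None:
    """Compare a fast solution against a brute-force oracle on small random arrays."""
    rng = random.Random(seed)  # fixed seed: failures are reproducible
    for _ in range(trials):
        arr = [rng.randint(lo, hi) for _ in range(rng.randint(0, max_len))]
        # Pass copies so an in-place mutation in one solution can't affect the other.
        want, got = brute(list(arr)), fast(list(arr))
        assert got == want, f"mismatch on {arr}: brute={want!r}, fast={got!r}"


def contains_duplicate_brute(nums: list[int]) -> bool:
    """O(N^2): compare every pair."""
    return any(nums[i] == nums[j] for i in range(len(nums)) for j in range(i + 1, len(nums)))


def contains_duplicate(nums: list[int]) -> bool:
    """O(N): a set drops duplicates, so the sizes differ iff one exists."""
    return len(set(nums)) != len(nums)


if __name__ == "__main__":
    stress(contains_duplicate_brute, contains_duplicate)
```

Small inputs are deliberate: a mismatch on an array of length 3 is far easier to debug by hand than one on length 10^5.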

8. How To Optimize

Optimization checklist:

Pattern recognition — does this match a known pattern? See Phase 2.
Repeated work — what does the brute force recompute? That’s your DP / memo target.
Sortedness — would sorting help? Two pointers, binary search, sweep line.
Monotonicity — is the answer monotonic in some parameter? Binary search on answer.
State compression — can the state space be made smaller? Bitmask, prefix sum.
Math — closed form, modular arithmetic, combinatorics.
Data structure — would a heap / BST / segment tree change the complexity?

9. How To Prove Correctness

Greedy: exchange argument — show that any optimal solution can be modified to use the greedy choice without losing optimality.
DP: state definition, transition, base cases, evaluation order.
Two-pointer / sliding window: loop invariant.
Graph algorithms: cite the algorithm’s correctness theorem and verify preconditions hold.
Math: induction or direct calculation.

10. How To Explain Complexity

State time and space. Mention:

Whether the bound is tight.
Amortized vs worst case (e.g., dynamic array push is O(1) amortized, O(N) worst case).
Assumptions (hash table O(1) average requires non-adversarial input).
Constants when they matter (e.g., a bitset packs 64 flags per machine word, so word-at-a-time operations can run up to ~64× faster than the same work over a bool array).

11. How To Write Clean Code Under Time Pressure

See ../CODE_QUALITY.md. Quick rules:

Meaningful names.
Helper functions for distinct units.
Validate at the boundary, not in hot loops.
No globals.
Standard library used idiomatically.

12. How To Test Manually

Walk through given example by hand. Record intermediate state.
Walk through one trivial case (empty / size 1).
Walk through one adversarial case.
Find at least one bug before submitting.

13. How To Recover When Stuck

Use the stuck protocol. Restate, brute force, examine constraints, try smaller examples, look for repeated work, look for monotonicity, look for graph modeling, ask for a hint.

14. How To Use Hints Without Failing The Interview

A hint is not a failure. Frozen silence is. Sample phrasing:

“I’ve explored two pointers and sliding window, but I’m having trouble seeing how to handle duplicates without O(N^2). Could you nudge me toward the right family of approach?”

Receive the hint, restate it, commit out loud, resume.

Why These Concepts Matter In Interviews

Algorithms are necessary but not sufficient. Of all rejected candidates with strong algorithm knowledge, the most common failure modes are:

“Could not communicate clearly” (60%+)
“Did not test their code” (40%+)
“Could not articulate complexity” (30%+)
“Got stuck and couldn’t recover” (30%+)
“Misunderstood the problem” (20%+)

(Percentages are approximate, based on common interviewer feedback patterns.)

Phase 0 fixes all of these.

Required Problem Categories

Phase 0 problems are not algorithmically hard. The challenge is execution. Use only Easy problems:

Two Sum
Reverse String
Valid Palindrome
Maximum Subarray (Kadane’s)
Best Time To Buy And Sell Stock
Single Number
Merge Two Sorted Lists
Linked List Cycle
Binary Tree Inorder Traversal
Symmetric Tree
Maximum Depth Of Binary Tree
Climbing Stairs
Move Zeroes
Contains Duplicate
Intersection Of Two Arrays
Reverse Linked List
Valid Parentheses
Implement Stack Using Queues
First Bad Version
Squares Of A Sorted Array

Solve each one applying the full framework. The point is not the answer; it’s the execution discipline.

Recommended Resources

This curriculum’s FRAMEWORK.md, COMMUNICATION.md, CODE_QUALITY.md
LeetCode Easy tier (filter by Easy)
“Cracking the Coding Interview” (Gayle Laakmann McDowell) — chapters 1, 2, 6, 7 (the soft-skills chapters; skip the rest until later)

Hands-On Labs

Complete in order. Each lab uses the strict lab format from the curriculum spec.

Common Mistakes In Phase 0

Skipping Phase 0 thinking “I already know this stuff” — execution skills are different from knowledge
Solving Easy problems silently — the whole point is communication practice
Skipping the brute force because the optimal is obvious
Skipping the testing step because the code “looks right”
Not timing yourself — you don’t know your real solving speed until you measure
Treating clarifying questions as a checklist — they should feel natural and motivated, not robotic

Mastery Checklist

Solve any LeetCode Easy in 12 minutes including brute force, optimization, testing, complexity
Restate every problem in your own words without re-reading the prompt
Ask 3+ relevant clarifying questions on every problem
Always state the brute force first (even if 10 seconds long)
Walk through at least one example by hand before submitting
Explain complexity correctly on every problem
Find at least one bug pre-submission on at least 30% of problems (this is good!)
Never go silent for >60 seconds when working a problem
Recover from being stuck using the stuck protocol within 3 minutes

Exit Criteria

Move to Phase 1 only when:

Mastery checklist 100% checked
Completed all 7 labs
Solved 25 Easy problems applying the full framework
Recorded yourself solving 3 problems and reviewed the playback for communication quality
Run a self-mock with one Easy problem you’ve never seen — pass with full framework execution

If any item fails, repeat Phase 0 with another 25 problems. Do not move forward with a weak baseline.

Lab 01 — Problem Statement Reading

Goal

Train the discipline of reading a problem statement deliberately — extracting structure, constraints, examples, and ambiguity — before any solving begins. Most candidate failures begin in the first 60 seconds when the candidate reads superficially and locks in a wrong mental model.

Background Concepts

Active reading vs passive reading
Structural parsing of a problem (input shape, output shape, constraints, examples, follow-ups)
Identifying ambiguity vs underspecification
Restating in your own words as a comprehension test

Interview Context

In a real interview, the prompt is often delivered verbally with a brief written version. You have ~3 minutes to load the entire problem into working memory before you start engaging. If you misunderstand something here, every subsequent step is wasted. Strong candidates always restate the problem out loud and ask 2–4 clarifying questions before touching anything.

Problem Statement

Given the problem statement below, in 5 minutes:

Read it twice.
Restate it in your own words.
List all constraints (explicit + implicit).
List 3 ambiguities you’d ask the interviewer about.
Construct 3 examples beyond the one given.

The problem (use as a fixed text for this lab):

“Given a list of meeting time intervals, determine if a single person could attend all meetings.”

Example: [[0,30],[5,10],[15,20]] → false.

Constraints

The problem deliberately omits constraint specification. That’s the point.

Clarifying Questions (you should generate)

Examples of good questions to surface from the prompt above:

Are intervals inclusive on both ends, or [start, end)?
Can intervals be zero-duration ([5,5])?
Is the input pre-sorted?
Are negative times possible?
What are the realistic bounds on N and on the time values?
Can two meetings sharing an endpoint be attended (e.g. one ends at 10, next starts at 10)?
Are there any null or invalid intervals to validate?

Examples (you should generate)

Beyond the given:

[] → true (empty)
[[1,5]] → true (single)
[[1,5],[5,10]] → endpoint-sharing case (depends on inclusivity)
[[1,5],[2,3]] → fully nested overlap
[[10,20],[1,5]] → unsorted, non-overlapping
[[1,1],[1,1]] → zero-duration duplicates

Initial Brute Force

Compare every pair: O(N^2). For each pair, check if intervals overlap (max(a.start, b.start) < min(a.end, b.end) for half-open).

Brute Force Complexity

Time O(N^2), space O(1).

Optimization Path

Sort by start time, then scan and check intervals[i].start >= intervals[i-1].end for every i ≥ 1. Sorting raises two questions that lead straight back to the clarifying questions: how to treat ties on start, and whether meetings that share an endpoint conflict. Once those are answered, sort + scan is the optimal approach.

Final Expected Approach

Sort + linear scan. O(N log N) time, O(1) extra space (or O(N) if sorting requires a copy).
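The lab itself asks for a written deliverable, not code, so treat the following only as something to check your pseudocode against afterwards. It is a minimal Python sketch assuming half-open [start, end) intervals given as [start, end] lists; the function names are hypothetical.

```python
def can_attend_brute(intervals):
    """O(N^2): check every pair for half-open [start, end) overlap."""
    n = len(intervals)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = intervals[i], intervals[j]
            if max(a[0], b[0]) < min(a[1], b[1]):  # the two meetings overlap
                return False
    return True


def can_attend_sorted(intervals):
    """O(N log N): sort a copy by start time, then compare neighbours."""
    ordered = sorted(intervals, key=lambda iv: iv[0])  # copy; input is not mutated
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] < prev[1]:  # next meeting starts before the previous one ends
            return False
    return True
```

The two checks can disagree on a zero-duration meeting nested inside another one, e.g. [[1,10],[5,5]]: the pairwise test treats [5,5) as empty and reports no conflict, while the scan flags it. That is exactly why the zero-duration clarifying question matters.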

Data Structures Used

The interval array (input)
Sort comparator on start time

Correctness Argument

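After sorting by start (assuming half-open intervals of positive length), some pair of meetings overlaps ↔ some adjacent pair overlaps. Proof: if intervals i < j overlap, then intervals[i+1].start ≤ intervals[j].start < intervals[i].end, so the adjacent pair (i, i+1) already fails the check. Conversely, if every adjacent pair satisfies intervals[k+1].start ≥ intervals[k].end, then for any i < j, intervals[j].start ≥ intervals[i+1].start ≥ intervals[i].end, so no pair overlaps. Checking adjacent pairs is therefore enough.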

Complexity

Time: O(N log N) (sort dominates)
Space: O(1) auxiliary if in-place sort, else O(N)

Implementation Requirements

For this lab, do NOT implement. Instead produce a written deliverable (described below).

Tests

Not applicable for this lab — written exercise only.

Follow-up Questions

“What if instead of yes/no, we wanted the number of conflicts?”
“What if we had to schedule N people across M rooms?” (becomes Meeting Rooms II)
“What if the meetings stream in one at a time and we want online detection?”

Product Extension

This is “calendar conflict detection” — a real product feature. Ask yourself: what would Google Calendar do for 10,000 events? (Hint: indexed by day → small per-day scan, or interval tree for general queries.)

Language/Runtime Follow-ups

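In Python, sort is Timsort: O(N log N) and stable. The key function is called once per element, so key=lambda x: x[0] costs N Python-level calls; operator.itemgetter(0) is implemented in C and is usually slightly faster.
In Java, Arrays.sort uses dual-pivot quicksort for primitives and Timsort for objects. Sorting objects (e.g. int[][] or Integer[] with a comparator) pays for a comparator call per comparison, and Integer[] also pays for boxing.
In Go, sort.Slice relies on a reflection-based swapper and is usually somewhat slower than sort.Sort on a typed slice; since Go 1.21, the generic slices.SortFunc is typically both faster and more convenient.
In C++, std::sort is introsort; the comparator must impose a strict weak ordering, otherwise behavior is undefined.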

Common Bugs

Using <= instead of < (or vice versa) due to misreading inclusivity
Mutating input when interviewer expects pure function
Off-by-one on boundary cases ([1,5] and [5,9])
Not handling empty input

Debugging Strategy

If a test fails:

Print pairs where the algorithm decided “overlap” or “no overlap”
Walk through the smallest failing case by hand
Check inclusivity assumption — single source of most bugs here

Deliverable For This Lab

In your notebook (or a markdown file beside this lab), write:

Restatement. A 1–2 sentence paraphrase of the problem in your own words.
Clarifying questions list. 6+ questions, prioritized.
Implicit constraints list. What did the prompt fail to specify? (Inclusivity, sortedness, N bounds, etc.)
Examples list. 8+ examples covering: trivial, boundary, adversarial, multi-answer.
Brute force pseudocode. O(N^2) approach.
Optimization sketch. Just one paragraph.
Self-critique. Where in your reading did you make assumptions that the prompt didn’t justify?

Mastery Criteria

You complete this lab to mastery when:

You restated the problem without re-reading the prompt
You produced 6+ clarifying questions, none of which were answered by the prompt
You found 3+ implicit constraints
You produced 8+ examples spanning all category types
You can articulate which of your assumptions were wrong (everyone makes some — the skill is noticing)

Repeat this lab with 3 different problem statements (pick any from LeetCode Easy) before declaring mastery.

Lab 02 — Constraints Extraction

Goal

Train the discipline of converting written constraints into algorithmic targets. Given 1 ≤ N ≤ 10^5, you should immediately think “O(N log N) is the budget, O(N^2) will TLE”.

Background Concepts

The constraint-to-complexity mapping (FRAMEWORK.md)
Operations-per-second budget on competitive judges (~10^8 simple ops/sec safe)
Implicit constraints (memory, integer overflow, recursion depth)

Interview Context

In live interviews, the interviewer often omits explicit constraints, expecting you to ask. Constraints discipline the algorithm choice. Candidates who skip this step often write an O(N^2) algorithm that the interviewer was hoping they’d avoid.

Problem Statement

For each of the 10 problem prompts below, derive:

The complexity budget
At least one algorithm family that fits
At least one approach that does not fit and why

Prompts:

1 ≤ N ≤ 20 — count subsets satisfying property X
1 ≤ N ≤ 200 — shortest path in weighted graph with up to N^2 edges
1 ≤ N ≤ 10^4, queries ≤ 10^5 — range sum
1 ≤ N ≤ 10^5 — find duplicate
1 ≤ N ≤ 5 × 10^5 — kth smallest in array
1 ≤ N ≤ 10^6, integers up to 10^9 — count of pairs with sum ≤ K
N ≤ 10^9, queries Q ≤ 10^5 — count of integers in [1,N] divisible by K
T test cases, T ≤ 10^4, each with N ≤ 10^3 — pairwise XOR maximum
1 ≤ a, b ≤ 10^18 — compute a^b mod p
Stream of up to 10^7 elements — top-K running

Constraints

The point of this lab is constraints. Treat each prompt as separate.

Clarifying Questions

For each prompt, list:

What’s the operation budget?
Is the time limit explicit (e.g., 1 sec, 2 sec)?
Is there a memory limit (e.g., 256 MB)?
Are values within int32 / int64 range?

Examples

Worked example for prompt #4 (N ≤ 10^5, find duplicate):

Budget: ~10^7–10^8 ops → O(N), O(N log N), O(N · log_max) all fit
Fits: hashset O(N), sort+scan O(N log N), Floyd’s cycle detection if the N values all lie in [1, N-1] (the “Find the Duplicate Number” setup)
Doesn’t fit: O(N^2) brute force (10^10 ops)

Initial Brute Force

Not applicable — meta-lab.

Brute Force Complexity

N/A.

Optimization Path

The optimization path here is constraint-to-algorithm mapping. Memorize the table from FRAMEWORK.md.

Final Expected Approach

For each prompt, write your final answer as:

Prompt #K:
  Budget: O(...)
  Fitting algorithm family: ...
  Disqualified approach: ... because budget / memory / overflow / etc.

Data Structures Used

The point is to map N range → DS choice:

N ≤ 20: bitmask, recursion (no DS needed)
N ≤ 200: 2D arrays (Floyd, etc.)
N ≤ 10^4: O(N^2) DP arrays, simple sort
N ≤ 10^5: hashmap, heap, sorted set, segment tree, Fenwick
N ≤ 10^6: array + linear scan, two pointers, hash; O(N) preferred, though a single O(N log N) sort usually still fits
N ≤ 10^9: math, binary search on answer, segmented sieve

Correctness Argument

The argument here is budget: justify why your chosen algorithm fits within roughly 10^8 ops/sec × time limit. Be explicit: N · log2 N = 10^5 · 17 ≈ 1.7 × 10^6, comfortably within a 1-second limit.
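The budget arithmetic can be sketched as a tiny helper. This is an illustrative assumption rather than part of FRAMEWORK.md: the function names, the complexity labels, and the default of 10^8 simple ops/sec are placeholders you can adjust (for example, lower ops_per_second for Python).

```python
import math

# Rough rule of thumb for simple operations per second in C++ (an assumption).
DEFAULT_OPS_PER_SECOND = 10**8

# Hypothetical labels mapped to an estimated operation count for input size x.
COMPLEXITY = {
    "N": lambda x: x,
    "N log N": lambda x: x * math.log2(x),
    "N^2": lambda x: x * x,
    "N^3": lambda x: x ** 3,
    "2^N": lambda x: 2 ** x,
}


def estimated_ops(n, label):
    """Estimated operation count for input size n under the given complexity."""
    return COMPLEXITY[label](n)


def fits_budget(n, label, seconds=1.0, ops_per_second=DEFAULT_OPS_PER_SECOND):
    """True if the estimate stays within ops_per_second * seconds."""
    return estimated_ops(n, label) <= ops_per_second * seconds
```

For example, fits_budget(10**5, "N log N") is True while fits_budget(10**5, "N^2") is False, and fits_budget(200, "N^3") is True (8 × 10^6 operations for Floyd-Warshall at N = 200). The helper does not know about T test cases or Q queries; multiply those in yourself.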

Complexity

For each prompt, you produce both:

The budget
The justification per the table

Implementation Requirements

Written deliverable. No code.

Tests

Not applicable for this lab.

Follow-up Questions

For prompt #6 (10^6 elements, pairs with sum ≤ K):

“What if values can be negative?” → sort + two pointers still works, since it relies only on order, not sign; but pair sums now span [-2 × 10^9, 2 × 10^9], which overflows int32
“What if we want to enumerate the pairs, not just count?” → output limit changes everything
“What if the array streams in?” → online algorithm needed
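For prompt #6, a minimal sketch of the counting version, assuming the whole array fits in memory (the function name is hypothetical). Sorting first lets one two-pointer pass count every valid partner of b[lo] at once.

```python
def count_pairs_at_most(a, k):
    """Count index pairs i < j with a[i] + a[j] <= k via sort + two pointers."""
    b = sorted(a)
    lo, hi = 0, len(b) - 1
    count = 0
    while lo < hi:
        if b[lo] + b[hi] <= k:
            # b[lo] pairs with every element in b[lo+1 .. hi], all small enough.
            count += hi - lo
            lo += 1
        else:
            # b[hi] is too large even with the smallest remaining partner.
            hi -= 1
    return count
```

Python integers do not overflow; in C++ or Java, keep both the pair sum and the count in 64-bit types, since N = 10^6 allows about 5 × 10^11 pairs.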

For prompt #9 (a, b ≤ 10^18):

“What if p is not prime?” → can’t use Fermat’s little theorem inverse
“What if we need a^b exactly (no mod)?” → impossible for these sizes
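For prompt #9, a minimal binary-exponentiation sketch (the name mod_pow is hypothetical; Python’s built-in pow(a, b, p) computes the same thing). It needs O(log b) multiplications, about 60 squarings for b ≤ 10^18.

```python
def mod_pow(a, b, p):
    """Compute a^b mod p by binary exponentiation (square-and-multiply)."""
    result = 1 % p  # handles p == 1, where every value is 0
    a %= p
    while b > 0:
        if b & 1:  # current lowest bit of the exponent is set
            result = result * a % p
        a = a * a % p  # square the base for the next bit
        b >>= 1
    return result
```

In C++, if p can be near 10^18, the product of two residues overflows 64 bits; a common workaround is multiplying in unsigned __int128, a GCC/Clang extension.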

Product Extension

In production, “constraint” often means “expected QPS × max payload size × peak time”. A request handler that’s O(payload_size^2) is fine for size 10 but catastrophic for size 10^4. Same intuition as competitive judges, just different vocabulary.

Language/Runtime Follow-ups

Python: constant factor ~30–100× slower than C++, so scale the budget down to roughly 10^6–3 × 10^6 simple ops/sec. N = 10^4 with O(N^2) (10^8 ops) is fine in C++ but will likely time out in pure Python.
Java: ~2–4× slower than C++. Beware autoboxing in hot loops.
Go: ~2× slower than C++. Map operations have higher constants than unordered_map.
C++: baseline. For fast I/O, call ios_base::sync_with_stdio(false) and cin.tie(nullptr) before reading input.
JavaScript: ~3–10× slower than C++. Avoid object lookups in hot loops; prefer typed arrays.

Common Bugs

Forgetting that Q queries multiply your total work (10^5 queries × 10^5 per-query = 10^10!)
Forgetting T test cases (e.g., T = 10^4 with O(N^2) per test, N = 10^3 → 10^10)
Underestimating constants in Python/JS
Forgetting recursion depth limits (Python default 1000)
Forgetting integer overflow at int32 boundary (~2 × 10^9)

Debugging Strategy

When code TLEs:

Recompute total operations: outer loops × inner work × test cases × queries
Check the constant factor (string concat in a loop is a classic disaster)
Profile to find the actual hotspot (often I/O, not algorithm)

Deliverable

For each of the 10 prompts above, write the structured answer block. Compare yours to the table in FRAMEWORK.md.

Mastery Criteria

Correctly mapped 10/10 prompts to budget within 30 seconds each
Identified at least one disqualifying approach for each (per query / total ops)
Recognized the multiplier effect of T test cases / Q queries
Identified at least 2 prompts where Python/JS would need extra care

Lab 03 — Brute Force First

Goal

Internalize the discipline of always producing a brute force before optimizing — even when the optimal is obvious. The brute force is your correctness oracle, your fallback, and your communication anchor.

Background Concepts

Brute force as a baseline: it must be correct, even if slow
Brute force as oracle: cross-check optimal output with brute on random small inputs
Brute force as recovery: when stuck, you have something to deliver

Interview Context

Many candidates hear a problem, immediately recognize the optimal pattern, and start coding. The interviewer can’t tell whether they understand the problem or just memorized the answer. State the brute force first, even if it takes 30 seconds. Then, “I see this can be optimized to O(N log N) using … — would you like me to go straight to that, or step through a middle approach?”

Problem Statement

For each of the 5 problems below, in 10 minutes total:

State the brute force in pseudocode
State its complexity
State the optimization path (one or two sentences)

Problems:

Two Sum: find indices i, j such that a[i] + a[j] == target
Maximum Subarray: maximum sum contiguous subarray
Longest Substring Without Repeating Characters
Trapping Rain Water: given heights, total water trapped
Median Of Two Sorted Arrays

Constraints

For each problem, assume N ≤ 10^5 for sizing.

Clarifying Questions

For each problem, ask: “Are we returning the value or the actual subarray/indices?” — this affects implementation but not the brute-force complexity.

Examples

For Two Sum, brute is for i: for j > i: if a[i] + a[j] == target: return [i,j]. Test on [2,7,11,15], target=9 → [0,1].
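The pseudocode above translates almost line for line into Python. This sketch assumes you return the first pair found and None when no pair exists; the lab does not specify the no-answer case, so that choice is an assumption to confirm with the interviewer.

```python
def two_sum_brute(a, target):
    """O(N^2): try every pair i < j and return the first match, else None."""
    for i in range(len(a)):
        for j in range(i + 1, len(a)):  # j > i: never reuse the same element
            if a[i] + a[j] == target:
                return [i, j]
    return None
```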

Initial Brute Force

Per problem:

Two Sum: O(N^2) double loop
Maximum Subarray: O(N^3) — three nested loops (i, j, sum) — or O(N^2) with running sum
Longest Substring No Repeat: O(N^3) if you check every pair (i, j) by building a set (O(N) per pair); O(N^2) if you grow j from each i with an incremental set and stop at the first repeat
Trapping Rain Water: O(N^2) — for each index, find max-left and max-right by scanning
Median Of Two Sorted Arrays: O(N+M) — merge then take median; even simpler O((N+M) log(N+M)) — concatenate + sort
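For problems 2 and 3, a minimal sketch of the O(N^2) brute forces described above (function names are hypothetical). Maximum Subarray assumes a non-empty input; on an empty list this version returns -inf.

```python
def max_subarray_brute(a):
    """O(N^2): for each start i, extend j and keep a running sum."""
    best = float("-inf")  # not 0, so all-negative input is handled
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]
            best = max(best, running)
    return best


def longest_unique_substring_brute(s):
    """O(N^2): for each start i, grow j until the first repeated character."""
    best = 0
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            if s[j] in seen:
                break
            seen.add(s[j])
        best = max(best, len(seen))  # len(seen) is the length of s[i:j]
    return best
```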

Brute Force Complexity

As listed above. Be explicit about which O(N^k) you mean.

Optimization Path

Two Sum: Hashmap of seen values → O(N).
Maximum Subarray: Kadane’s running sum, reset to 0 if negative → O(N).
Longest Substring No Repeat: Sliding window with hashset → O(N).
Trapping Rain Water: Two-pointer with running max-left, max-right → O(N) time, O(1) space; or precompute prefix/suffix max arrays → O(N) time, O(N) space.
Median Of Two Sorted Arrays: Binary search on partition → O(log min(N,M)).
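The lab asks you to write only brute forces, so this is reference material for later: a sketch of the two-pointer Trapping Rain Water approach, paired with the O(N^2) brute force as an oracle (names are hypothetical). The two-pointer version always advances the side with the lower bar: that side’s water level is capped by its own running max, because the opposite side is known to contain a bar at least that tall.

```python
def trap_brute(heights):
    """O(N^2) oracle: water at i = min(max to the left, max to the right) - heights[i]."""
    total = 0
    for i in range(len(heights)):
        level = min(max(heights[: i + 1]), max(heights[i:]))
        total += level - heights[i]
    return total


def trap_two_pointer(heights):
    """O(N) time, O(1) extra space: move the pointer on the lower side."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water
```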

Final Expected Approach

In the deliverable, write:

Problem K:
  Brute force pseudocode: ...
  Brute complexity: O(...)
  Optimization sketch: <pattern> → O(...)
  Why the optimization works (1 sentence)

Data Structures Used

For each problem, identify the diff between brute force DS and optimized DS — that diff is usually where the optimization lives.

Correctness Argument

For brute force, correctness is by exhaustive enumeration, usually trivial. For the optimized version, the correctness lives in why exhaustive enumeration is unnecessary (e.g., “if (i, j) with i < j is the answer, then when we reach j, target - a[j] = a[i] is already in the seen set”).

Complexity

Stated per problem above.

Implementation Requirements

For this lab, write the brute force only for problems 1, 2, 3 in your preferred language. Run on the given examples. Do not write the optimal version. The exercise is to make the brute force a habit.

Tests

For each implemented brute force:

Smoke: 1 given example
Edge: empty / single
Random: 10 random small inputs (N ≤ 20), check no crash, plausible output

Follow-up Questions

For Two Sum: what if there can be multiple valid pairs? Return all? First found?
For Trapping Rain Water: what about 2D version (Trapping Rain Water II)? — different algorithm (heap-based BFS).
For Median: what if K-th smallest instead of median? — same binary-search-on-partition idea.

Product Extension

In production code review, the brute force is often the only defensible code if you can’t justify the optimal. Reviewers prefer “obviously correct slow code” over “supposedly fast code with a subtle bug” — until the optimization is properly tested.

Language/Runtime Follow-ups

Python: an O(N^2) brute force should still complete for N ≤ 10^3 in <1s. Use it as oracle.
Java: beware of Integer boxing in HashMap<Integer, Integer> for Two Sum: a measurable slowdown.
C++: an O(N^2) brute force for N ≤ 5 · 10^3 finishes in well under 1s, useful for stress testing.

Common Bugs

Off-by-one: for j = i+1 vs for j = i (Two Sum requires j > i, since the same element may not be used twice)
Maximum Subarray: starting max_so_far = 0 instead of -infinity fails on all-negative arrays
Sliding Window No-Repeat: not removing characters from the set when shrinking the window
Trapping Rain Water: using < vs <= when comparing left and right pointers

Debugging Strategy

When the optimal disagrees with the brute on a small input, trust the brute: it is simple enough to verify by inspection, so the bug is almost always in your optimized version.

This is the stress testing pattern:

Write brute (slow, correct)
Write optimal (fast, suspicious)
Generate random small inputs
Compare outputs; on mismatch, print the input and inspect
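The stress-testing loop can be sketched in Python as follows, using Maximum Subarray because its output is a single number that is easy to compare. The function names, the value range, and the trial counts are arbitrary choices for illustration; the Kadane version assumes non-empty input, so the generator never produces an empty list.

```python
import random


def max_subarray_brute(a):
    """Slow but obviously correct: every start i, extended with a running sum."""
    best = float("-inf")
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]
            best = max(best, running)
    return best


def max_subarray_kadane(a):
    """Fast candidate: Kadane's algorithm seeded with a[0] (non-empty input)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)  # extend the current run or restart at x
        best = max(best, cur)
    return best


def stress_test(fast, slow, trials=500, max_len=8, seed=0):
    """Compare fast vs slow on random small inputs; return the first mismatch or None."""
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        a = [rng.randint(-10, 10) for _ in range(rng.randint(1, max_len))]
        if fast(a) != slow(a):
            return a  # the failing input to trace by hand
    return None
```

stress_test(max_subarray_kadane, max_subarray_brute) should return None; a mismatch returns the exact list to trace by hand.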

Deliverable

In a notebook:

For each of the 5 problems, the structured block (brute pseudocode, complexity, optimization sketch).
For problems 1, 2, 3: real code for the brute force, with smoke + edge + random tests.
A 3-sentence reflection: which problem was hardest to resist writing the optimal first?

Mastery Criteria

Wrote a brute force for all 5 in <2 minutes each (verbally or on paper)
Stated the brute force complexity correctly
Stated the optimization path in one sentence
Resisted the urge to write the optimal first
Used the brute force as oracle in at least one stress test

Lab 04 — Optimization Pathway

Goal

Train the explicit transition from brute force to optimal: identify what the brute force wastes (repeated work, missed monotonicity, overlooked sortedness), then close that gap with a specific technique.

Background Concepts

The optimization checklist (canonical):

Repeated work → memoize / DP
Sortedness → two pointers, binary search, sweep line
Monotonicity → binary search on answer
Local + global structure → sliding window, prefix sums
Pattern match to known DS → heap, monotonic stack/queue, segment tree, trie, union find
State compression → bitmask, hash of state
Math closed form → combinatorics, modular arithmetic, geometry
Graph modeling → BFS/DFS/Dijkstra/topo even on non-graph problems
Randomization / hashing → reservoir sampling, rolling hash
Approximation / amortization → when exact O(N) is hard, amortize

Interview Context

The interviewer wants to see your thinking process during optimization, not just the answer. Narrate: “The brute is O(N^2) because we recompute the prefix sum on every iteration. If we precompute prefix sums once, each query is O(1) and total is O(N).”

Problem Statement

For each of the 7 brute force descriptions below, identify which optimization checklist item applies, and produce a one-paragraph optimized approach.

Problems:

Brute: for each i, sum a[i..j] for all j; O(N^2). Goal: answer multiple range-sum queries.
Brute: for each pair (i, j), check if a[i] + a[j] = target; O(N^2). Goal: find pair sum.
Brute: generate all subsets, count those with sum K; O(2^N). N up to 30. Goal: count.
Brute: for each query (l, r), find min in a[l..r]; O(N) per query, N queries → O(N^2). Goal: range min queries on static array.
Brute: simulate game state recursively; many overlapping subproblems; O(2^depth). Goal: optimal game value.
Brute: for each i, find next j > i with a[j] > a[i]; O(N^2). Goal: next-greater-element array.
Brute: try each candidate K in increasing order and check the property in O(N) → O(N · K). The property is monotonic in K (once it holds, it keeps holding) and the answer lies in a known range [L, R]. Goal: find the minimum K satisfying the property.

Constraints

Assume N ≤ 10^5 for #1, #2, #4, #6, #7. N ≤ 30 for #3, allowing O(2^(N/2)) meet-in-the-middle. N is the state-space size for #5.

Clarifying Questions

“Are queries online or offline?” — offline allows different algorithms (Mo’s, offline BIT)
“Is the array static or updated?” — static allows sparse table, dynamic needs Fenwick / segment tree

Examples

#1: prefix sums → query in O(1). #7: monotonic predicate → binary search the answer in O(log range).
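For #1, a minimal prefix-sum sketch (function names are hypothetical) using the standard library’s itertools.accumulate; queries rely on the prefix[r + 1] - prefix[l] identity.

```python
from itertools import accumulate


def build_prefix(a):
    """prefix[i] = a[0] + ... + a[i-1], with prefix[0] = 0 (length N + 1)."""
    return [0] + list(accumulate(a))


def range_sum(prefix, l, r):
    """Sum of a[l..r] inclusive in O(1)."""
    return prefix[r + 1] - prefix[l]
```

Building the table is O(N) once; every query afterwards is a single subtraction.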

Initial Brute Force

As stated above per problem.

Brute Force Complexity

As stated.

Optimization Path

For each problem, write the checklist item that applies and the optimized approach:

Local + global structure: prefix sums → O(N) preprocess, O(1) query
Pattern → known DS (hashmap): store seen values, look up target - a[j] → O(N)
State compression: subset-sum DP O(N · K), or meet-in-the-middle O(2^(N/2)) → fits N=30
Pattern → known DS: sparse table O(N log N) preprocess, O(1) query (static); segment tree if dynamic
Repeated work: memoization → O(unique states)
Pattern → monotonic stack: O(N)
Monotonicity: binary search on answer → O(log range × verify)
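For #6, a sketch of the monotonic-stack approach, assuming the output is the next greater value and -1 where none exists (some variants return indices instead). The stack holds indices whose answer is still unknown; their values are non-increasing from bottom to top.

```python
def next_greater(a):
    """For each i, the first a[j] > a[i] with j > i, or -1 if none exists."""
    result = [-1] * len(a)
    stack = []  # indices still waiting for a strictly greater element
    for j, x in enumerate(a):
        # Every strictly smaller element on the stack has found its answer: x.
        while stack and a[stack[-1]] < x:
            result[stack.pop()] = x
        stack.append(j)
    return result
```

Each index is pushed and popped at most once, which is where the O(N) bound comes from.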

Final Expected Approach

For your deliverable: each of the 7 problems gets:

Checklist item identified
Optimized algorithm
Final complexity
One-line reasoning

Data Structures Used

Prefix sum array
Hashmap
DP table / memo dict
Sparse table (immutable RMQ) / segment tree
Monotonic stack
(Binary search needs no DS)

Correctness Argument

For each optimization, the correctness argument is “the brute force result is preserved because X”:

Prefix sums: sum(l..r) = prefix[r+1] - prefix[l] — algebraic identity
Hashmap pair sum: every pair (i, j) with i < j is examined exactly once when we process j and look up target - a[j]
Subset-sum DP: state (i, sum) covers all subsets of a[0..i] with given sum
Sparse table: range min over [l, r] of length L is min(table[k][l], table[k][r - 2^k + 1]) where k = floor(log2(L)); the two blocks may overlap, which is fine because min is idempotent
Memoization: same input → same output → cached
Monotonic stack (left-to-right scan): pop while the top is smaller than the current element; the current element is the next greater for every element it pops, and whatever remains on the stack at the end has none
Binary search on answer: predicate is monotonic by problem assumption
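For #4 on a static array, a minimal sparse-table sketch (names are hypothetical). Row k stores minima of blocks of length 2^k, and a query covers [l, r] with two possibly overlapping blocks. Here (length).bit_length() - 1 computes floor(log2(length)) without a precomputed log table.

```python
def build_sparse_table(a):
    """table[k][i] = min(a[i : i + 2**k]); O(N log N) time and space."""
    n = len(a)
    table = [list(a)]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        # A block of length 2^k is two adjacent blocks of length 2^(k-1).
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)])
        k += 1
    return table


def range_min(table, l, r):
    """Min of a[l..r] inclusive in O(1) using two overlapping power-of-two blocks."""
    k = (r - l + 1).bit_length() - 1  # floor(log2(length))
    return min(table[k][l], table[k][r - (1 << k) + 1])
```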

Complexity

Stated per problem.

Implementation Requirements

Implement #1, #2, #6 in your preferred language. Verify against brute force on random small inputs.
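For #2, a sketch of the one-pass hashmap version, assuming you return index pairs and None when no pair exists (a hypothetical convention). Looking up target - a[j] before inserting a[j] is what keeps an element from pairing with itself.

```python
def two_sum_hashmap(a, target):
    """O(N): for each j, look up target - a[j] among values seen at indices < j."""
    seen = {}  # value -> index of an earlier occurrence
    for j, x in enumerate(a):
        i = seen.get(target - x)
        if i is not None:
            return [i, j]
        seen[x] = j  # insert after the lookup, never before
    return None
```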

Tests

Smoke: given example
Edge: empty, single
Random: 50 random tests against brute force; on mismatch, dump input
Large: stress test at constraint maximum, time it

Follow-up Questions

For #1: what if the array is updated? → Fenwick tree
For #2: what if k-sum (k > 2)? → recurse to (k-1)-sum with target adjustment, or sort + multi-pointer
For #4: what about range max? Range gcd? Range sum? → which are idempotent (sparse table) vs which need segment tree
For #6: what about previous greater? Next smaller? → mirror the stack
For #7: what about minimum fractional answer? → binary search on real numbers with precision
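For #7 and its real-valued follow-up, a generic sketch of binary search on the answer. It assumes the predicate is monotonic (False for small values, then True from some point on) and that predicate(hi) is True; the function names and the fixed 100-iteration count for the real-valued case are illustrative choices, not requirements.

```python
def min_satisfying(lo, hi, predicate):
    """Smallest integer x in [lo, hi] with predicate(x) True."""
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(mid):
            hi = mid  # mid works; the answer is mid or smaller
        else:
            lo = mid + 1  # mid fails; the answer is strictly larger
    return lo


def min_satisfying_real(lo, hi, predicate, iterations=100):
    """Real-valued variant: bisect a fixed number of times instead of testing lo < hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if predicate(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, min_satisfying(0, 50, lambda x: x * x >= 50) returns 8. A fixed iteration count sidesteps the floating-point pitfalls of comparing lo and hi directly.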

Product Extension

The optimization checklist is a real code-review tool. When reviewing a colleague’s PR with O(N^2) in a hot path, run through this checklist mentally. 80% of N^2 → N transitions are one of: prefix sum, hashmap, sort + two-pointer, monotonic stack.

Language/Runtime Follow-ups

Python: prefix-sum approach gets a 5–10× speedup if you use NumPy np.cumsum instead of Python list.
Java: monotonic stack with Deque<Integer> (autoboxing) is slower than a plain int[] with manual top index.
C++: std::lower_bound / upper_bound give log-N binary search on sorted vectors with no manual implementation.
JavaScript: Map is generally faster than plain object for hashmap when keys are non-string.

Common Bugs

Prefix sum: off-by-one in prefix[r+1] - prefix[l]
Two-pointer: forgetting to advance one pointer
DP: wrong base case or wrong evaluation order
Sparse table: log table off-by-one
Monotonic stack: comparing > vs >= for the next-greater-or-equal variant
Binary search on answer: monotonicity direction reversed

Debugging Strategy

For each optimization, do stress testing: write brute, write optimal, run on 100 random small inputs, compare outputs. This catches off-by-one bugs that survive the given tests.

Deliverable

The 7-problem structured analysis
Real implementation + tests for #1, #2, #6
A reflection paragraph: for which problem was the optimization checklist most useful? Was there a problem where you would have gone in the wrong direction without it?

Mastery Criteria

Correctly identified the optimization checklist item for all 7
Implemented #1, #2, #6 with passing tests including stress
Can explain why each optimization preserves correctness, not just that it’s faster
Found at least 1 bug via stress testing (good — that’s the point)

Lab 05 — Manual Testing

Goal

Train the discipline of manually walking through your code before claiming it’s done — and finding bugs before the interviewer does.

Background Concepts

Trace tables: writing variable state row-by-row through a loop
Edge cases as a reflex
The “rubber duck” verbalization while tracing
Pre-submission checklist

Interview Context

Strong candidates always test their code by walking through at least one example by hand, vocalizing variable state. The interviewer learns whether you can verify your own work — a critical engineering signal.

Weak candidates write the code and say “I think this works” without verification, then submit and the interviewer finds the bug. Even when the candidate would have caught the bug if they’d traced.

Problem Statement

You are given 4 small functions below, each with at least one bug. For each:

Construct a trace table for one input
Identify the bug
State a fix
Construct a test case that exposes the bug

Function 1 — is_palindrome(s: str) -> bool:

def is_palindrome(s):
    i, j = 0, len(s)
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True

Function 2 — binary_search(a: list[int], target: int) -> int:

def binary_search(a, target):
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid
        else:
            hi = mid
    return -1

Function 3 — reverse_linked_list(head):

def reverse_linked_list(head):
    prev = None
    curr = head
    while curr.next is not None:
        nxt = curr.next
        curr.next = prev
        prev = curr
        curr = nxt
    return prev

Function 4 — max_subarray(a: list[int]) -> int:

def max_subarray(a):
    best = 0
    cur = 0
    for x in a:
        cur += x
        if cur < 0:
            cur = 0
        if cur > best:
            best = cur
    return best

Constraints

Standard.

Clarifying Questions

For palindrome: case-sensitive? Treat as is.
For binary search: input is sorted ascending, no duplicates.
For reverse linked list: input may be empty.
For max subarray: input may be all negative.

Examples

For each function, work a non-trivial input by trace table.

Initial Brute Force

N/A — debugging lab.

Brute Force Complexity

N/A.

Optimization Path

N/A.

Final Expected Approach

For each function, identify:

The bug
The minimal fix
The triggering input

Reference answers (DON’T peek until you’ve tried):

is_palindrome: j = len(s) should be j = len(s) - 1 (alternatively, keep j = len(s) and compare s[i] with s[j - 1]). As written, the first comparison reads s[len(s)] and raises IndexError. Trigger: any non-empty string.
binary_search: lo = mid should be lo = mid + 1. Once the window shrinks to one element (hi = lo + 1), mid equals lo, so lo = mid makes no progress and the loop never ends. Trace a=[1], target=2: lo=0, hi=1, mid=0, a[0]=1 < 2, lo stays 0 → infinite loop. Trigger: any absent target greater than a[0], e.g. a target greater than the maximum element.
reverse_linked_list: loop condition while curr.next is not None skips reversing the last node. Should be while curr is not None. Also crashes on empty input (head = None).
max_subarray: initializing best = 0 fails on all-negative input, returning 0 instead of the largest (least negative) element. Fix: start with best = -infinity (or a[0]) and compare cur against best before resetting cur to 0. Changing the initial value alone is not enough: with the original order, cur is clamped to 0 before the comparison, so best still becomes 0.
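After you have found the bugs yourself, the corrected functions can be written as follows. The ListNode class is an assumption, since the lab does not define one; each fix is marked with a comment. max_subarray also moves the best update ahead of the reset, because with the original order cur is clamped to 0 before the comparison.

```python
class ListNode:
    """Minimal singly linked list node (assumed; not defined by the lab)."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def is_palindrome(s):
    i, j = 0, len(s) - 1  # fix: j starts at the last valid index
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True


def binary_search(a, target):
    lo, hi = 0, len(a)  # search window is a[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1  # fix: exclude mid so the window always shrinks
        else:
            hi = mid
    return -1


def reverse_linked_list(head):
    prev = None
    curr = head
    while curr is not None:  # fix: reverses the last node and handles head = None
        nxt = curr.next
        curr.next = prev
        prev = curr
        curr = nxt
    return prev


def max_subarray(a):
    best = float("-inf")  # fix: all-negative input returns its largest element
    cur = 0
    for x in a:
        cur += x
        if cur > best:  # fix: update best before cur can be reset to 0
            best = cur
        if cur < 0:
            cur = 0
    return best
```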

Data Structures Used

Trace table: a small ASCII grid of variable values per loop iteration

Correctness Argument

A function is correct iff for every valid input it produces the right output. Manual tracing on edge cases is a cheap, reliable way to falsify “I think it works”.

Complexity

N/A.

Implementation Requirements

For each function:

Build a trace table on paper
Identify the bug
Write the fix
Write at least 4 unit tests covering the bug and adjacent cases

Tests

For function 1:

assert is_palindrome("") == True
assert is_palindrome("a") == True
assert is_palindrome("aa") == True
assert is_palindrome("ab") == False
assert is_palindrome("racecar") == True

For function 2:

assert binary_search([], 5) == -1
assert binary_search([5], 5) == 0
assert binary_search([1,2,3], 0) == -1
assert binary_search([1,2,3], 4) == -1   # the trigger
assert binary_search([1,3,5,7], 5) == 2

(Build similar for #3, #4.)

Follow-up Questions

“How would you find this bug in production?” — logs, test failure, customer report
“How would you prevent this class of bug?” — property-based testing, fuzz testing
“What’s your debugging strategy when you can’t reproduce locally?” — see phase-10-testing-debugging/

Product Extension

These four bugs are real bug archetypes:

Off-by-one (#1, #2)
Loop termination (#2, #3)
Initialization (#4)

In code review, scan specifically for: array bounds, loop conditions, accumulator initial values, null inputs.

Language/Runtime Follow-ups

Python: index s[len(s)] raises IndexError immediately — defensive crash
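C/C++: out-of-bounds access on a raw array or vector is undefined behavior; it may silently read garbage and continue. Be more defensive
Java: throws (StringIndexOutOfBoundsException from String.charAt, ArrayIndexOutOfBoundsException for arrays), like Python
Go: panics with an index-out-of-range runtime error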

Common Bugs

The four bugs themselves are the common bugs:

Off-by-one in array bounds
Off-by-one in binary search update
Loop terminates one iteration early
Wrong initial accumulator value

Debugging Strategy

Trace table for the smallest failing input
Print state every iteration (the printf approach)
Verbalize the loop invariant — does the code uphold it?
Step through with a debugger (last resort, not first — debugger usage is slower than tracing for small problems)

Deliverable

Trace tables for all 4 functions (one per function)
Bug identification + fix + test suite for each
A “trace table template” you’ll reuse going forward

Mastery Criteria

Found all 4 bugs via tracing (without peeking at answers)
Wrote correct fixes
Wrote tests that exposed each bug
Can construct a trace table for an unseen function in <2 minutes
Adopted “always trace one example before submitting” as a permanent habit

Lab 06 — Communication

Goal

Train explicit, structured communication during a coding interview: narrate intent, signpost transitions, ask before assuming, summarize at decision points.

Background Concepts

The 6-phase communication structure (see COMMUNICATION.md)
“Signposting” — short phrases that tell the listener which phase you’re in
“Thinking out loud” without rambling
Pause-points and decision-points

Interview Context

In a real interview, silence is your enemy. After ~30 seconds of silence, the interviewer becomes uncertain about your progress. After 90 seconds, they may interrupt — disrupting your thinking. Skilled candidates fill the time with structured narration that costs them little extra cognitive load.

Problem Statement

Record yourself solving 3 problems out loud, applying explicit communication scaffolding. Then transcribe and grade.

Recommended problems (Easy to keep cognitive load low):

Two Sum
Valid Parentheses
Linked List Cycle

For each, follow this script structure:

Phase 1 — Restatement (60–90 sec)

“Let me restate to make sure I have it: …”
Ask 2–3 clarifying questions explicitly

Phase 2 — Examples (60 sec)

“Let me work through an example by hand: …”
Construct one trivial and one adversarial example

Phase 3 — Brute Force (60–90 sec)

“The simplest approach would be …”
“That’s O(…) — clearly not optimal because …”

Phase 4 — Optimization Discussion (90 sec)

“I notice [some property of the input]; that suggests …”
Explicitly mention the checklist item you matched

Phase 5 — Coding (5–10 min)

Narrate intent at each block: “Now I’ll initialize the hashmap, then iterate…”
Pause and verify after each block

Phase 6 — Testing & Closing (2–3 min)

“Let me trace through the given example…”
“Edge cases I should check: …”
“Final complexity: …”

Constraints

Time-box yourself: 12 minutes total per problem. If you’re going over, cut narration first, not coding.

Clarifying Questions

For Two Sum:

“Can the input be empty?”
“Can there be duplicates?”
“Is exactly one valid pair guaranteed?”
“Can I assume integers, or could they be floats?”

For Valid Parentheses:

“Are only () [] {} allowed, or other characters?”
“Does an empty string count as valid?”

For Linked List Cycle:

“Should I detect the cycle’s start, or just whether it exists?”
“Can I modify the list?”

Examples

For each problem, narrate one example by hand before coding.

Initial Brute Force

State and narrate.

Brute Force Complexity

State out loud.

Optimization Path

Narrate the transition: “The brute is O(N^2) because we compare every pair. If I use a hashmap, I can check in O(1) whether target - a[i] is already seen, giving O(N).”

Final Expected Approach

State the final approach as a single sentence before coding.

Data Structures Used

State the data structure explicitly: “I’ll use a hashmap from value → index.”

Correctness Argument

State the loop invariant: “After processing index i, the map contains every (value, index) pair from a[0..i].”
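A sketch of the Two Sum approach narrated above. It assumes the function returns a pair of indices, or `None` when no pair exists (the classic problem guarantees exactly one pair). It checks for the complement before inserting the current value, so when index j is examined the map holds only values from earlier indices (for a repeated value, its latest index), and an element can never pair with itself.

```python
def two_sum(nums, target):
    """Return indices (i, j), i < j, with nums[i] + nums[j] == target, or None."""
    seen = {}                         # value -> index of an earlier occurrence
    for j, x in enumerate(nums):
        complement = target - x
        if complement in seen:        # O(1) average lookup
            return seen[complement], j
        seen[x] = j                   # insert after the check: x cannot pair with itself
    return None
```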

Complexity

State at end: time + space + amortized vs worst case.

Implementation Requirements

You must produce:

Three audio recordings (or written transcripts) of yourself solving the 3 problems with the script structure
Working code for all three (the bar is correct + clean, not optimal at first try)

Tests

For each problem, state aloud:

“Smoke test: given example”
“Edge: empty input”
“Edge: single element / smallest non-trivial”
“Adversarial: all duplicates / all negatives / max constraint”

Then trace through at least one example.

Follow-up Questions

After your solution, narrate:

“If we wanted all pairs instead of one, we’d …”
“If the input were a stream, we’d …”
“If memory were tight, we could …”

Product Extension

These same communication patterns scale to design interviews, on-call discussions, and code reviews. The skill is permanent.

Language/Runtime Follow-ups

When narrating, mention language-specific notes:

“In Python I’m using dict, which is O(1) average per operation; with adversarial collisions a single operation degrades to O(N).”
“I’m using enumerate for index + value to keep the code clean.”

Common Bugs (in communication)

Going silent for >60 seconds. Even saying “I’m trying to figure out the right invariant for the inner loop” is enough.
Narrating typing instead of intent. “Now I’m typing for i in range(n)…” — bad. “Now I’ll iterate forward to maintain the prefix…” — good.
Asking too many questions at once. Spread them out; ask each one when its answer starts to matter.
Restating the prompt verbatim. Paraphrase, demonstrate comprehension.
Refusing hints when stuck. Frozen silence is far worse than asking for a nudge.
Forgetting to summarize at the end. “So the final approach is X with O(N) time and O(N) space; tested on the given example and edge cases.”

Debugging Strategy

After recording, listen back and grade yourself on:

Did I signpost each phase?
Did I ever go silent for >60 seconds?
Did I narrate intent or just typing?
Did I summarize at decision points?
Did I sound confident or apologetic?
Was my code-talk congruent with my code? (i.e., did I narrate one thing while coding another?)

Deliverable

3 recordings or transcripts (12 minutes each)
Self-grading rubric per recording (the 6 questions above + score 0–10 per phase)
A list of 3 phrases / patterns you’ll reuse next time, and 3 you’ll avoid

Mastery Criteria

Recorded all 3 problems
Self-graded honestly (most candidates score 4–6/10 on first attempt — that’s expected)
Identified at least 3 specific things to improve
On a re-recording of one problem, scored at least 2 points higher
No silent gaps > 30 seconds in your final recording
Narration tracked code 90%+ of the time (no divergence)

Lab 07 — Stuck Recovery

Goal

Train the explicit “stuck protocol” so that when you don’t see the optimal in an interview, you don’t freeze — you systematically work through a checklist that has a high chance of unblocking you within 3 minutes.

Background Concepts

The 11-step stuck protocol (see FRAMEWORK.md)
The difference between productive struggle and unproductive freeze
How to ask for a hint without losing the interview

Interview Context

Every candidate gets stuck. The signal in your favor is how you respond. A candidate who gets stuck and visibly applies a recovery protocol sends a strong signal. A candidate who gets stuck and goes silent sends a weak signal, whether or not they eventually recover.

Problem Statement

Three problems below are deliberately chosen to be above what you can solve cold without a known pattern. For each:

Apply the 11-step stuck protocol explicitly, narrating each step
Time-cap each step at 60 seconds
If you’ve used 8+ steps without progress, ask for a hint (in this lab, “ask for a hint” means: read the hint section at the bottom)
Continue until solved
Document which step generated the breakthrough

Problems (medium-difficulty, but you don’t have the pattern yet):

Container With Most Water: given heights, find two indices i, j such that min(h[i], h[j]) * (j - i) is maximum.
Longest Palindromic Substring: find the longest palindromic substring of a string.
Search In Rotated Sorted Array: given a sorted array rotated at an unknown pivot, find target in O(log N).

(If you already know all 3 cold, replace with 3 unfamiliar mediums.)

Constraints

Standard. Assume N ≤ 10^5.

Clarifying Questions

For each problem, list 3 you’d ask the interviewer.

Examples

Construct 3 examples for each.

Initial Brute Force

For each problem, write the brute force first: O(N^2) over all pairs for #1, O(N^3) for #2 (check every substring), and an O(N) linear scan for #3.

Brute Force Complexity

State.

Optimization Path — Apply The Stuck Protocol

The 11 steps in order:

Restate the problem. Out loud, in your own words.
Write the brute force. It is correct, even if slow.
Examine the constraints. What does N suggest?
Construct smaller examples. Solve N=2, N=3 by hand.
Look for repeated work. What does the brute force recompute?
Look for sortedness or monotonicity. Could sorting help? Is something monotonic?
Look for symmetry / pattern from a known DS family. Two-pointer? Sliding window? Heap?
Try graph modeling. Some non-graph problems become BFS/DFS when you reframe.
Try a math approach. Closed form? Modular arithmetic? Combinatorial identity?
State the simplest approach you have. Even if not optimal — get it down.
Ask for a nudge. Phrase: “I’ve explored A, B, and C. I think the answer might involve D-family of approaches but I’m not seeing it cleanly. Could you nudge me?”

Final Expected Approach

For each problem, the recovery protocol leads to:

Container With Most Water: Two pointers from both ends, moving the smaller one inward. Step 7 (two-pointer pattern) gets you there. Loop invariant: every pair that could beat the best area recorded so far lies entirely within the current [L, R].
Longest Palindromic Substring: Either expand-around-center (step 4 — by hand on N=3,4,5 you spot the center idea) or DP (step 5 — repeated work on overlapping substrings). Manacher’s is optimal but step 4/5 is enough.
Search In Rotated Sorted Array: Modified binary search. Step 6 — “is something monotonic?” — at least one half is sorted. Use that to decide which half to recurse into.
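A minimal Python sketch of the two-pointer solution for Container With Most Water, with the O(N^2) brute force alongside so the random cross-check from the Tests section is easy to run. Function names are illustrative.

```python
def max_area(heights):
    """Two pointers from both ends; always move the shorter side inward. O(N)."""
    lo, hi = 0, len(heights) - 1
    best = 0
    while lo < hi:
        best = max(best, min(heights[lo], heights[hi]) * (hi - lo))
        # Every other pair that uses the shorter bar is narrower and still capped
        # by that bar's height, so the shorter bar can be discarded.
        if heights[lo] < heights[hi]:
            lo += 1
        else:
            hi -= 1
    return best


def max_area_brute(heights):
    """O(N^2) reference: try every pair i < j."""
    n = len(heights)
    return max((min(heights[i], heights[j]) * (j - i)
                for i in range(n) for j in range(i + 1, n)), default=0)
```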

Data Structures Used

Two pointers (#1)
Center expansion / DP table (#2)
Modified binary search (#3)

Correctness Argument

For each, the post-recovery solution has a clear correctness argument:

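Moving the smaller pointer is safe: any other pair that uses the smaller bar is narrower and still capped by that bar's height, so it cannot beat the current area, and the smaller bar can be discarded.
Every palindrome has a unique center (one of the 2N - 1 character or gap positions), and expanding from a center until the ends mismatch finds the longest palindrome with that center, so trying every center finds the overall longest.
With distinct values, at least one half around mid is sorted, and a range check on that half decides where target can be: if a[lo..mid] is sorted, target is there iff a[lo] ≤ target < a[mid] (symmetrically, a[mid] < target ≤ a[hi] for a sorted right half); otherwise recurse on the other half.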
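A sketch of the modified binary search, assuming distinct values (with duplicates, `nums[lo] <= nums[mid]` no longer proves the left half is sorted). The name `search_rotated` is illustrative.

```python
def search_rotated(nums, target):
    """Binary search in a rotated sorted array of distinct values; -1 if absent."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:                 # left half nums[lo..mid] is sorted
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1                      # target must be in the sorted left half
            else:
                lo = mid + 1
        else:                                     # right half nums[mid..hi] is sorted
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1                      # target must be in the sorted right half
            else:
                hi = mid - 1
    return -1
```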

Complexity

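Container With Most Water: O(N) time, O(1) space
Longest Palindromic Substring: O(N^2) time, O(1) extra space (center expansion); Manacher’s is O(N)
Search In Rotated Sorted Array: O(log N) time, O(1) space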
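A sketch of expand-around-center for Longest Palindromic Substring. Each position is tried as an odd center (a character) and an even center (the gap after it); Manacher’s algorithm is not shown. The helper name `expand` is illustrative.

```python
def longest_palindrome(s):
    """Expand around each of the 2N - 1 centers; O(N^2) time, O(1) extra space."""
    if not s:
        return ""
    best_lo, best_hi = 0, 0                 # inclusive bounds of the best palindrome so far

    def expand(lo, hi):
        # Grow outward while the ends match; return the last matching bounds.
        while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
            lo -= 1
            hi += 1
        return lo + 1, hi - 1

    for center in range(len(s)):
        for lo, hi in (expand(center, center),       # odd length: center is a character
                       expand(center, center + 1)):  # even length: center is a gap
            if hi - lo > best_hi - best_lo:
                best_lo, best_hi = lo, hi
    return s[best_lo:best_hi + 1]
```

The all-same-character adversarial test is the worst case here: every center expands to the string's edge.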

Implementation Requirements

For each problem, after applying the protocol, implement it and run tests. The deliverable is the protocol log + working code.

Tests

Smoke: given example
Edge: N=0, N=1, N=2
Adversarial: all-equal heights (#1), all-same-char string (#2), no rotation (#3 with sorted input)
Random: 50 small random inputs vs brute

Follow-up Questions

“Why didn’t you see the optimal immediately?” — be honest: “I hadn’t internalized the two-pointer pattern yet” or “I tried sweeping linearly and missed the symmetry.”
“What pattern would you check next time?” — internalize the answer.
“When would you fail to recognize this in an interview?” — when nervous, or when the problem is wrapped in unfamiliar context.

Product Extension

In production, “stuck on a bug” follows the same shape: restate symptom, find a minimal reproduction, narrow the search, hypothesize cause, verify by experiment, ask a teammate. The protocol generalizes.

Language/Runtime Follow-ups

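Python: for #2, O(N^2) center expansion is too slow in the worst case (e.g., an all-same-character string) at N = 10^5, and Python’s constant factors make it worse; consider Manacher’s for full credit
Java: before Java 7u6, String.substring shared the original backing array (O(1)); since 7u6 it copies (O(length)). Beware version differences
C++: s.substr() allocates and copies; in hot loops compare by index or use std::string_view (C++17) instead of building substrings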

Common Bugs

Stuck protocol going too fast — you skip step 4 (small examples) and miss the breakthrough that lives there
Stuck protocol going too slow — you spend 5 minutes on step 5 instead of moving on
Not asking for a hint when 11 steps are exhausted (rare, but happens — pride costs the interview)
Asking for a hint too early (before step 6) — looks weak

Debugging Strategy

If your protocol runs took >15 minutes per problem, you spent too long per step. Cap individual steps at 60 seconds; if no progress, move on.

If you finished without a hint, congratulations — but verify that you didn’t already know the pattern, in which case use a different problem.

Deliverable

Per-problem stuck protocol log: for each of the 3 problems, a written record of every step you tried, how long you spent on it, what you tried, and whether it advanced you.
The breakthrough step annotated.
Working implementation + tests for each problem.
Reflection: which 2 steps are weakest for you? (e.g., many candidates skip step 4 — small examples — because it feels too elementary, but it’s often the highest-yield step.)

Mastery Criteria

Solved all 3 problems with explicit protocol use
Spent ≤ 15 min per problem (including coding + tests)
Identified the breakthrough step accurately
Used at most 1 hint across the 3 problems
Identified your two weakest protocol steps
Re-attempted with the weak steps as starting points and noticed faster progress

Hints (peek only after attempting)

Container With Most Water hint: “Think about what happens at the two extreme ends — and which pointer moving inward could possibly improve the answer.”

Longest Palindrome hint: “Every palindrome has a center. How many possible centers are there?”

Search In Rotated Sorted Array hint: “Even though the full array is rotated, at any midpoint at least one of the two halves must be sorted. Can you tell which one?”

Phase 1 — Programming & Data Structure Foundations

Target level: Easy → low-Medium
Expected duration: 2 weeks (12-week track) / 4 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~5 labs/week + 30–50 problems applying every data structure under the framework

Why This Phase Exists

Phase 0 fixed your execution: how to read, communicate, derive constraints, brute force, optimize, test, and recover. Phase 1 fixes your vocabulary: the data structures and language-runtime concepts that roughly 95% of any coding interview rests on.

If you cannot, on demand, state the amortized cost of a dynamic-array push, the worst-case behavior of a hash map under adversarial keys, why s += c in a Python loop is O(N²), what happens to your for-loop when you mutate the collection it iterates, and the difference between deep and shallow copy in your language — you do not have a foundation. You have a list of words.

This phase makes the foundation real. Every concept comes with: internal representation, complexity table, memory behavior, language-specific gotchas (Python, Java, Go, C++, JS/TS), interview traps, common bugs, and testing strategy.

What You Will Be Able To Do After This Phase

Pick the correct data structure for any Easy/Medium problem in under 60 seconds.
State the worst-case, average-case, and amortized complexity of every operation on every fundamental DS.
Predict your code’s memory and cache behavior, not just its asymptotic time.
Write idiomatic code in your interview language without falling into language-specific traps.
Recognize when a problem is really a hash map / heap / monotonic stack problem in disguise.
Implement, from scratch, every data structure listed below — without notes — in under 20 minutes each.

Concepts To Master

You must master every item below before moving to Phase 2. Pattern problems in Phase 2 assume fluency with these primitives.

Data Structures

Arrays — operations, complexity, memory layout, cache behavior, dynamic resizing amortization, gotchas per language
Strings — immutability, encoding pitfalls, concat-in-loop blowup, substring complexity
Hash Maps — hashing, collisions, load factor, adversarial inputs, ordered vs unordered, custom hash for tuples
Hash Sets — operations, set algebra, when to use vs map
Linked Lists — singly/doubly, sentinels, tail pointer, common manipulation patterns
Stacks — array-backed vs linked, monotonic stack preview
Queues — deque, ring buffer, priority queue preview
Heaps — binary heap, sift up/down, complexity, top-K pattern
Sorted arrays / sorted sets — binary search, bisect, sorted-set complexity per language
Trees — binary, BST, traversals iterative + recursive, balanced vs unbalanced
Tries — operations, space cost, alphabet size considerations
Graphs — adjacency list, matrix, edge list, when each
Disjoint Set Union — path compression + union by rank
Bitsets / Bit Manipulation — set/clear/popcount, common idioms
Counters / Multisets — Counter, HashMap+counter pattern, multiset alternatives

Runtime Concepts

Stack vs heap memory
Scope and lifetime
Value vs reference semantics
Mutable vs immutable
Hash collisions and adversarial keys
Iterator invalidation
Garbage collection basics (refcount vs tracing)
Memory leaks (especially in GC’d languages)
Deep vs shallow copy
Recursion depth and stack overflow

Why These Concepts Matter In Interviews

Most “I knew the algorithm but couldn’t get it to pass” failures aren’t algorithmic. They are foundation failures:

“My hash map is slow” → adversarial collision pattern in the input.
“My recursion crashed” → no idea Python’s default recursion limit is 1000.
“I edited the list while iterating and got weird output” → iterator invalidation.
“My BFS queue is slow” → using list.pop(0) instead of deque.popleft.
“Java said ConcurrentModificationException” → the JDK’s fail-fast iterator policy.
“Go map iteration ordered differently every run” → the runtime deliberately randomizes iteration order so programs cannot depend on it.
“C++ vector reference invalidated after push_back” → reallocation moved the storage.
“JS object stringified my int keys to strings” → plain-object keys are always strings (or Symbols); use Map.

Every one of these is on the rubric somewhere as “implementation correctness” or “language fluency.” Phase 1 closes them.

Inline Data Structure Reference

The remainder of this README is a data-structure and runtime reference manual. Read it linearly the first time. Skim it as a reference any time you forget a complexity, language API, or gotcha.

1. Arrays

Internal Representation

A contiguous block of memory holding fixed-size elements indexed by offset. Static arrays have fixed size; dynamic arrays (Python list, Java ArrayList, Go slice, C++ vector, JS Array) wrap a static array and grow it geometrically.

Operations and Complexity

Operation	Static	Dynamic (avg)	Dynamic (worst)
Index read/write `a[i]`	O(1)	O(1)	O(1)
Append `push_back`	n/a	O(1) amortized	O(N) (resize)
Prepend / insert middle	n/a	O(N)	O(N)
Pop end	n/a	O(1)	O(1)
Pop front	n/a	O(N)	O(N)
Search (unsorted)	O(N)	O(N)	O(N)
Search (sorted, binary search)	O(log N)	O(log N)	O(log N)
Length	O(1)	O(1)	O(1)

Memory Layout & Cache Behavior

Contiguous memory means the CPU prefetcher can stream elements predictively, giving arrays the best cache locality of any data structure. A linear scan over an int[] is typically 5–20× faster than the same scan over a linked list of the same length, even though both are O(N). When constants matter (HFT, hot loops), this difference dominates.

Dynamic Resizing Amortization

Geometric growth (typically 2× or 1.5×) gives O(1) amortized append: doubling means total work to grow from 1 to N is 1 + 2 + 4 + … + N = 2N - 1, amortized to O(1) per push. Linear growth (+1) gives O(N) amortized — never use it.
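A small simulation, not a model of any particular runtime, that counts element copies during N appends under the two growth policies above. The helper names are hypothetical, and the array is assumed to start with capacity 1.

```python
def copies_to_append(n, grow):
    """Count element copies made by n appends to a simulated dynamic array.

    grow(capacity) returns the new capacity once the array is full.
    """
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size              # reallocation copies every existing element
            capacity = grow(capacity)
        size += 1
    return copies


def doubling(capacity):
    return 2 * capacity


def plus_one(capacity):
    return capacity + 1
```

For N = 1024, doubling copies 1 + 2 + … + 512 = 1023 elements in total (below 2N, so O(1) per push), while +1 growth copies 1 + 2 + … + 1023 = 523,776 (O(N) per push).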

Language-Specific Gotchas

Python: list overallocates with growth factor ~1.125; arr.pop(0) is O(N); use collections.deque for queue. Lists are heterogeneous (each slot is a PyObject*), defeating cache locality. Use array.array or NumPy for primitive-typed storage.
Java: ArrayList is Object[] — boxing for primitives. Use int[] for hot paths. ArrayList.remove(0) is O(N). Arrays.asList(arr) returns a fixed-size view, not a real ArrayList.
Go: Slices are a 3-tuple (ptr, len, cap). append may or may not reallocate — appending to one slice can silently mutate another sharing storage. Always check capacity. s = s[:0] reuses storage; s = nil releases it.
C++: std::vector<T> reallocates on push_back past capacity, invalidating all iterators and references. Reserve up front when you know the size. vector<bool> is a packed bitset, not bool[] — its operator[] returns a proxy.
JS/TS: Arrays are dense or sparse; sparse arrays (a[1000000] = 1) have terrible perf. arr.shift() is O(N). Holes (new Array(10)) are skipped by forEach but visited (as undefined) by an indexed for loop.

Common Interview Traps

“Insert in the middle” — be sure they don’t actually want a different DS.
“In-place” — explicitly disallows the cheap copy-to-new-array solution.
Off-by-one at boundaries: [l, r) vs [l, r].

2. Strings

Internal Representation

An array of code units: bytes for C char[], UTF-16 code units in Java/JS, and in CPython 3.3+ a per-string fixed width (PEP 393 stores each string as latin-1, UCS-2, or UCS-4 depending on its widest character). In most languages, strings are immutable, so every “modification” allocates a new string.

Operations and Complexity

Operation	Complexity
Index `s[i]`	O(1) for fixed-width encoding, O(i) for UTF-8 by codepoint
Length	O(1) (cached)
Concat `s + t`	O(|s| + |t|) (new allocation)
Substring `s[l:r]`	O(r − l) (copy) in most langs, O(1) (view) in Go and Java pre-7u6
Equality	O(min(|s|, |t|)) worst case; many implementations compare lengths first, so unequal lengths are O(1)
Find substring (naive)	O(NM)
Find substring (KMP / Z)	O(N + M)

Immutability

Java/Python/JS strings are immutable. So this loop:

s = ""
for c in chars:
    s += c   # O(N) per iteration → O(N²) total

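This bug shows up in real interviews and gets candidates dinged for not knowing language internals. (CPython can sometimes extend the string in place when nothing else references it, but PEP 8 says not to rely on that optimization.) Use "".join(chars) (Python), StringBuilder (Java), an array plus arr.join('') (JS), strings.Builder (Go).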

Encoding Pitfalls

“String length” in characters vs bytes vs grapheme clusters can all differ. "é" may be 1 codepoint or 2 (e + combining accent) → 2 codepoints, 1 grapheme.
Python len("😀") == 1; Java "😀".length() == 2 (UTF-16 surrogate pair).
JS "😀".length === 2 for the same reason; iterate with for…of to get codepoints.
Always clarify: “Are inputs ASCII?” — if no, ask whether the unit of “character” is byte, codepoint, or grapheme.
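A short Python demonstration of the length differences above, using only the standard library: `unicodedata.normalize("NFC", ...)` composes `e` plus a combining accent into one code point, and encoding to UTF-16 shows the two code units Java and JS report for an emoji. Counting grapheme clusters needs a third-party library and is not shown.

```python
import unicodedata

composed = "\u00e9"          # 'é' as one code point (U+00E9)
decomposed = "e\u0301"       # 'e' followed by U+0301 COMBINING ACUTE ACCENT
emoji = "\U0001F600"         # 😀, outside the Basic Multilingual Plane

print(len(composed), len(decomposed))                        # 1 2 (code points)
print(composed == decomposed)                                # False: different code point sequences
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(len(emoji))                                            # 1 code point in Python
print(len(emoji.encode("utf-16-le")) // 2)                   # 2 UTF-16 code units (Java/JS length)
print(len(emoji.encode("utf-8")))                            # 4 bytes in UTF-8
```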

Substring Complexity

Substring extraction copies the underlying chars in most modern languages. Go strings are immutable byte slices and substring is O(1) view (but converting to/from []byte is O(N)).

Language-Specific Gotchas

Python: strings are immutable; use lists of chars and "".join() for building. s[::-1] is idiomatic reverse.
Java: String s += c → quadratic. Use StringBuilder. String.intern() exists but rarely needed in interviews.
Go: string is immutable bytes; []byte(s) and string(b) each allocate. Range over a string yields (i, rune), not (i, byte).
C++: std::string is mutable; SSO (small string optimization) keeps short strings on the stack. s.c_str() is null-terminated.
JS/TS: strings are UTF-16 code units; emoji and non-BMP chars are 2 units long.

3. Hash Maps

Internal Representation

A bucket array, indexed by hash(key) % capacity. Collisions resolved by either separate chaining (linked list / tree per bucket — Java since 8 promotes long chains to red-black trees) or open addressing (Python, Ruby — probe sequence in the same array).

Operations and Complexity

Operation	Average	Worst
Insert	O(1)	O(N) (all collide)
Lookup	O(1)	O(N)
Delete	O(1)	O(N)
Iterate	O(N + capacity)	O(N + capacity)

Load Factor

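The ratio size / capacity (stored entries per bucket slot). When the load factor exceeds a threshold (0.75 is Java HashMap’s default; CPython dicts grow at about 2/3), the table grows, typically doubling, and rehashes every entry. That single insert costs O(N), but inserts remain amortized O(1).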
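A teaching sketch of separate chaining with a load-factor-triggered resize. The class name, the initial capacity of 8, and the 0.75 threshold are illustrative choices, not any runtime's actual implementation.

```python
class ChainedHashMap:
    """Separate chaining; doubles and rehashes when size / capacity > max_load."""

    def __init__(self, capacity=8, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)       # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self.load_factor() > self._max_load:
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self, new_capacity):
        old = self._buckets
        self._buckets = [[] for _ in range(new_capacity)]
        for bucket in old:                     # O(N) rehash of every entry
            for k, v in bucket:
                self._bucket(k).append((k, v))

    def load_factor(self):
        return self._size / len(self._buckets)   # size / capacity

    def __len__(self):
        return self._size
```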

Adversarial Inputs

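A static, public hash function lets an attacker craft N keys that all hash to the same bucket → O(N²) total insertion time. Real-world history: Crosby and Wallach described this attack in 2003, and a widely publicized 2011 disclosure demonstrated practical denial-of-service attacks against web stacks in PHP, Java, Python, Ruby, and ASP.NET. Modern languages defend against it (Python’s randomized string hashing, controlled by PYTHONHASHSEED; Go’s per-map random hash seeds; Java 8’s tree fallback for long chains).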

In an interview, if the problem says “the input may be adversarial,” do not rely on hash maps for worst-case bounds — use sorting + binary search or a balanced BST.

Ordered vs Unordered

Insertion-ordered: Python 3.7+ dict, Java LinkedHashMap, JS Map.
Sorted (by key): Java TreeMap, C++ std::map, Python sortedcontainers.SortedDict.
Hash, no order guarantee: C++ std::unordered_map, Java HashMap (no iteration-order guarantee in any version), Go map (iteration order intentionally randomized).

If the problem requires ordered iteration, do not use a plain hash map.

Custom Hash for Tuples / Composite Keys

Python: tuples of hashable items are hashable for free.
Java: must implement both equals and hashCode for any custom key class. Forgetting one is a top-10 interview bug.
Go: map keys must be “comparable types” — structs of comparables work, slices/maps don’t.
C++: must specialize std::hash<T> or pass a custom hasher to unordered_map.
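JS: plain-object keys are coerced to strings; use Map for non-string keys. Map compares object keys by reference, so encode composite keys as strings (e.g. `${r},${c}`) when you need value equality.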
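Three ways to get a working composite key in Python, sketched with illustrative names. In Python 3, defining `__eq__` without `__hash__` makes instances unhashable, so the Python version of the Java equals/hashCode pitfall fails loudly instead of silently.

```python
from dataclasses import dataclass

# Option 1: tuples of hashable items are hashable out of the box.
grid_cost = {(0, 0): 5, (1, 2): 7}


# Option 2: a frozen dataclass generates matching __eq__ and __hash__
# and blocks mutation, which would otherwise strand the key in the wrong bucket.
@dataclass(frozen=True)
class Cell:
    row: int
    col: int


# Option 3: hand-written equality and hash that must agree with each other.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))   # equal objects must hash equally
```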

Common Interview Traps

Mutating a key after insertion (its hash changes; the map can’t find it).
Iterating while mutating → ConcurrentModificationException (Java) / RuntimeError (Python, when a dict or set changes size during iteration).
Assuming O(1) without acknowledging worst case.
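A sketch of the Python case: deleting from a dict while looping over it raises `RuntimeError` on the next iteration step, while looping over a snapshot of the keys is safe. Function names are hypothetical.

```python
def remove_zero_counts_unsafe(counts):
    """Wrong: mutates the dict it is iterating over."""
    for key in counts:
        if counts[key] == 0:
            del counts[key]      # the next loop step raises RuntimeError
    return counts


def remove_zero_counts(counts):
    """Right: iterate over a snapshot of the keys, then delete freely."""
    for key in list(counts):
        if counts[key] == 0:
            del counts[key]
    return counts
```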

4. Hash Sets

Operations and Complexity

Same as hash map (a hash set is conceptually a hash map with null values).

Operation	Average	Worst
add / contains / remove	O(1)	O(N)
union	O(|A| + |B|)
intersection	O(min(|A|, |B|))
difference	O(|A|)

Set Algebra

Union: A ∪ B — Python A | B, Java A.addAll(B), C++ manual.
Intersection: A ∩ B — iterate the smaller, lookup in the larger.
Difference: A \ B — iterate A, skip if in B.
Symmetric difference: A △ B — (A ∪ B) \ (A ∩ B).
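A sketch of intersection by iterating the smaller set, with symmetric difference spelled out from the formula above. Python’s built-in `a & b` and `a ^ b` do the same job; these hypothetical helpers exist only to make the cost reasoning visible.

```python
def intersect(a, b):
    """Iterate the smaller set, look up in the larger: O(min(|a|, |b|)) average."""
    if len(a) > len(b):
        a, b = b, a
    return {x for x in a if x in b}


def symmetric_difference(a, b):
    """(A union B) minus (A intersect B)."""
    return (a | b) - intersect(a, b)
```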

Set vs Map: When To Choose

Use a set when you only need presence; use a map when you need associated value (count, index, parent, etc.). Many “use a set” problems become trivially easier with a map (e.g., Two Sum needs a value → index map, not a set).

Language-Specific Gotchas

Python: set and frozenset; tuples are hashable, lists are not.
Java: HashSet, LinkedHashSet, TreeSet.
Go: no built-in set — use map[T]struct{} (zero-byte value).
C++: std::unordered_set, std::set (sorted).
JS/TS: Set preserves insertion order; objects are reference-equal.

5. Linked Lists

Singly vs Doubly

Singly: each node has next. Reverse, cycle detection, two-pointer dance.
Doubly: each node has next and prev. Required for O(1) erase given an iterator (LRU cache pattern).

Sentinels (Dummy Nodes)

A dummy head node simplifies edge cases: “what if the list is empty?” “what if we delete the first node?” become non-special. Always use a dummy when you have to return a head pointer that may change.

dummy = ListNode(0)
dummy.next = head
prev = dummy
# … operations on prev.next …
return dummy.next
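The template above, made concrete as a hypothetical `remove_value` that deletes every node holding a given value. Because of the dummy, deleting the original head needs no special case. The `from_list` and `to_list` helpers exist only for building and checking lists.

```python
class ListNode:
    __slots__ = ("val", "next")

    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def remove_value(head, target):
    """Delete every node whose val == target; returns the (possibly new) head."""
    dummy = ListNode(0, head)
    prev = dummy
    while prev.next:
        if prev.next.val == target:
            prev.next = prev.next.next    # unlink; prev stays put to check the new next
        else:
            prev = prev.next
    return dummy.next


def from_list(values):
    dummy = tail = ListNode()
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_list(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```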

Tail Pointer

Maintaining a tail pointer makes append O(1) (otherwise O(N)). Always check whether your manipulation invalidates the tail: deleting the last node, for example, must move tail back to the new last node.

Common Manipulation Patterns

Reverse iteratively: three pointers prev, curr, next.
Reverse recursively: reverse(head.next) then head.next.next = head; head.next = None.
Find middle: slow/fast pointers; fast moves 2× slow.
Detect cycle: Floyd’s tortoise and hare.
Merge two sorted lists: dummy head + zip pattern.
Remove Nth from end: lead pointer ahead by N.
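Sketches of three of the patterns above (iterative reverse, slow/fast middle, Floyd cycle detection), assuming a minimal `ListNode` class; `build` and `to_values` are illustrative helpers.

```python
class ListNode:
    __slots__ = ("val", "next")

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def reverse(head):
    """Iterative reversal with three pointers: prev, curr, nxt."""
    prev, curr = None, head
    while curr:
        nxt = curr.next       # save the rest of the list
        curr.next = prev      # flip this node's pointer
        prev, curr = curr, nxt
    return prev


def middle(head):
    """Slow moves 1 step, fast moves 2; for even length returns the second middle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow


def has_cycle(head):
    """Floyd's tortoise and hare: the pointers meet iff the list has a cycle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


def build(values):
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_values(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```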

Operations and Complexity

Operation	Singly	Doubly
Index	O(N)	O(N)
Insert at known node	O(1)	O(1)
Delete at known node	O(N)*	O(1)
Search	O(N)	O(N)

*Singly: O(N) because you need the previous node; O(1) if you have the previous pointer.

Language-Specific Gotchas

Python: define class ListNode: __slots__ = ('val', 'next') for memory; default uses a __dict__.
Java: LinkedList<T> exists but is rarely the right choice; ArrayDeque beats it on most operations.
Go: standard library has container/list (doubly linked; element values are stored as any, so reading one needs a type assertion, even after generics arrived in Go 1.18).
C++: std::list (doubly), std::forward_list (singly).
JS: typically write class Node { constructor(v) { this.val = v; this.next = null; } }.

6. Stacks

Implementations

Array-backed (dynamic array, push/pop end) or linked (push/pop head). Array-backed is faster in practice due to cache locality.

Operations and Complexity

Operation	Complexity
push	O(1) amortized
pop	O(1)
peek (top)	O(1)

Monotonic Stack Preview

A monotonic stack maintains elements in increasing or decreasing order. Used for next-greater / next-smaller / largest rectangle in histogram. Preview only — full pattern in Phase 2.

for x in arr:
    while stack and stack[-1] < x:
        # process element being popped: x is its next-greater
        stack.pop()
    stack.append(x)
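A complete version of the preview loop, assuming the goal is the next strictly greater value for each position (-1 when none exists). It stores indices rather than values so each popped element knows where to write its answer; each index is pushed and popped at most once, so the pass is O(N).

```python
def next_greater(arr):
    """For each index, the next value to the right that is strictly greater, else -1."""
    result = [-1] * len(arr)
    stack = []                        # indices whose values are non-increasing bottom to top
    for i, x in enumerate(arr):
        while stack and arr[stack[-1]] < x:
            result[stack.pop()] = x   # x is the next-greater of the popped index
        stack.append(i)
    return result
```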

Language-Specific Gotchas

Python: use list directly with append/pop. Don’t use queue.LifoQueue (locks).
Java: prefer ArrayDeque over the legacy Stack (synchronized, slow).
Go: slice with s = append(s, x) and s[len(s)-1] / s = s[:len(s)-1].
C++: std::stack<T> adapter on std::deque; or just use vector.
JS: array push/pop.

7. Queues

Variants

Plain queue (FIFO): enqueue rear, dequeue front.
Deque (double-ended): push/pop both ends in O(1).
Ring buffer / circular buffer: fixed-capacity deque on a static array.
Priority queue: see Heaps.

Operations and Complexity

Operation	Linked	Array deque	Ring buffer
enqueue rear	O(1)	O(1) amortized	O(1) (if not full)
dequeue front	O(1)	O(1)	O(1)
peek front	O(1)	O(1)	O(1)
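A minimal ring-buffer sketch on a fixed-size Python list. Raising `OverflowError` when full is one design choice among several; some ring buffers overwrite the oldest element instead. The class name is illustrative.

```python
class RingBuffer:
    """Fixed-capacity FIFO queue on a static list; every operation is O(1)."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._head = 0          # index of the front element
        self._size = 0

    def __len__(self):
        return self._size

    def enqueue(self, x):
        if self._size == len(self._data):
            raise OverflowError("ring buffer is full")
        tail = (self._head + self._size) % len(self._data)   # wrap around
        self._data[tail] = x
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("dequeue from empty ring buffer")
        x = self._data[self._head]
        self._data[self._head] = None          # drop the reference so it can be collected
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return x

    def peek(self):
        if self._size == 0:
            raise IndexError("peek at empty ring buffer")
        return self._data[self._head]
```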

Language-Specific Gotchas

Python: collections.deque for queue; never use list.pop(0) (O(N)).
Java: ArrayDeque<T> for both stack and queue. LinkedList works but slower. Avoid java.util.Queue<T> q = new LinkedList<>(); for hot paths.
Go: no built-in deque; container/list exists. Most CP code uses a slice as a queue with q[head:] (lazy popfront).
C++: std::deque (O(1) push/pop at both ends plus O(1) random access) or the std::queue adapter.
JS: array push/shift works but shift is O(N); use a custom ring buffer for large queues.

8. Heaps

Binary Heap

A complete binary tree where each parent ≤ children (min-heap) or ≥ (max-heap). Stored in an array: parent of i is (i-1)/2 with integer (floor) division, children are 2i+1 and 2i+2.
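A from-scratch sketch of the array layout above: `push` sifts up, `pop` moves the last element to the root and sifts down, and `heapify` builds bottom-up in O(N). In real Python code `heapq` provides these operations; the class here is illustrative.

```python
class MinHeap:
    """Array-backed binary min-heap: parent(i) = (i - 1) // 2, children 2i + 1, 2i + 2."""

    def __init__(self):
        self.a = []

    @classmethod
    def heapify(cls, items):
        """Bottom-up build: sift down every internal node, O(N) total."""
        h = cls()
        h.a = list(items)
        for i in range(len(h.a) // 2 - 1, -1, -1):
            h._sift_down(i)
        return h

    def push(self, x):
        a = self.a
        a.append(x)
        i = len(a) - 1
        while i > 0:                          # sift up while smaller than the parent
            parent = (i - 1) // 2
            if a[parent] <= a[i]:
                break
            a[parent], a[i] = a[i], a[parent]
            i = parent

    def pop(self):
        a = self.a
        if not a:
            raise IndexError("pop from empty heap")
        top = a[0]
        last = a.pop()
        if a:
            a[0] = last                       # move the last leaf to the root
            self._sift_down(0)
        return top

    def _sift_down(self, i):
        a, n = self.a, len(self.a)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and a[child] < a[smallest]:
                    smallest = child
            if smallest == i:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest

    def __len__(self):
        return len(self.a)
```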

Operations and Complexity

Operation	Complexity
push (sift up)	O(log N)
pop top (sift down)	O(log N)
peek top	O(1)
heapify (build from N elements)	O(N)
decrease-key	O(log N) (if you know the index)
arbitrary delete	O(log N) (if you know the index), else O(N)

Top-K Pattern (Preview)

“Top K largest” → min-heap of size K. Push every element; if size > K, pop. Final heap = top K.
“Stream median” → max-heap (lower half) + min-heap (upper half), balanced.
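Sketches of both patterns with the standard `heapq` module. The lower-half max-heap stores negated values, following the push -x convention for Python described below; names are illustrative.

```python
import heapq


def top_k_largest(nums, k):
    """Keep a min-heap of the K largest seen so far; O(N log K)."""
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)        # evict the smallest of the K + 1
    return sorted(heap, reverse=True)


class RunningMedian:
    """Max-heap for the lower half (negated) + min-heap for the upper half."""

    def __init__(self):
        self.lower = []   # negated values; -self.lower[0] is the largest lower value
        self.upper = []   # min-heap of the upper half

    def add(self, x):
        heapq.heappush(self.lower, -x)
        # Move the largest lower value up so every lower value <= every upper value.
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        # Rebalance so len(lower) == len(upper) or len(upper) + 1.
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def median(self):
        if not self.lower:
            raise ValueError("median of empty stream")
        if len(self.lower) > len(self.upper):
            return -self.lower[0]
        return (-self.lower[0] + self.upper[0]) / 2
```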

Language-Specific Gotchas

Python: heapq is a min-heap of any orderable; for max-heap, push -x. Tuples break ties lexicographically — careful with non-orderable secondary keys.
Java: PriorityQueue<T> is a min-heap by default; pass Comparator.reverseOrder() for max. peek() and poll() return null on an empty queue (element() and remove() throw instead).
Go: container/heap requires you to implement the heap.Interface. Significant boilerplate.
C++: std::priority_queue<T> is a max-heap by default. Use std::priority_queue<T, vector<T>, greater<T>> for min-heap. Or use make_heap + push_heap + pop_heap on a vector.
JS: no built-in; write your own or use a library.

9. Sorted Arrays / Sorted Sets

Binary Search

On a sorted array, find a target or its insertion point in O(log N). Three canonical variants:

Lower bound: smallest index where a[i] >= target.
Upper bound: smallest index where a[i] > target.
Exact match: take the lower bound i, then check i < len(a) and a[i] == target.
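Half-open `[lo, hi)` sketches of the three variants. They should agree with Python’s `bisect.bisect_left` and `bisect.bisect_right`; interviews usually expect you to write them by hand.

```python
def lower_bound(a, target):
    """Smallest i with a[i] >= target, or len(a) if none."""
    lo, hi = 0, len(a)             # answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1           # a[mid] is too small; answer is right of mid
        else:
            hi = mid               # a[mid] qualifies; answer is mid or left of it
    return lo


def upper_bound(a, target):
    """Smallest i with a[i] > target, or len(a) if none."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def contains(a, target):
    """Exact match: lower bound plus an in-range equality check."""
    i = lower_bound(a, target)
    return i < len(a) and a[i] == target
```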

Operations on Sorted Set / Multiset

Operation	Complexity
Insert	O(log N)
Erase	O(log N)
Find / lower_bound / upper_bound	O(log N)
Iterate in order	O(N)
Min / max	O(1) (or O(log N))
Kth element	O(log N) (with order statistics tree) or O(K) iteration

Language-Specific Gotchas

Python: bisect.bisect_left / bisect_right for sorted lists. sortedcontainers.SortedList for an ordered multiset (O(log N) insert).
Java: TreeSet<T> / TreeMap<K,V>; floor, ceiling, higher, lower are essential APIs to know.
Go: no standard sorted set — must implement or use a third-party library.
C++: std::set / std::multiset (red-black tree); lower_bound / upper_bound member functions.
JS: no standard sorted set; use a sorted array with binary search or write a treap.

10. Trees

Binary Tree Definitions

Binary tree: each node has ≤ 2 children.
BST: left subtree < node < right subtree (in-order traversal yields sorted sequence).
Balanced (AVL, RB): height O(log N) guaranteed.
Unbalanced: worst case O(N) (degenerate to linked list).

Traversals (Recursive)

def inorder(n):
    if not n: return
    inorder(n.left)
    visit(n)
    inorder(n.right)

Pre-order: visit, left, right. Post-order: left, right, visit. Level-order: BFS with a queue.

Traversals (Iterative)

In-order with stack: push left chain, pop and visit, then go right.
Pre-order with stack: push root, pop, visit, push right then left.
Post-order with stack: trickier — use a marker or a 2-stack trick.
Morris traversal: O(1) extra space using threaded pointers; advanced.
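The stack-based in-order recipe above ("push left chain, pop and visit, then go right") can be sketched as follows; the Node class is a hypothetical minimal definition.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder_iterative(root):
    """In-order traversal with an explicit stack: no recursion-depth limit."""
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:     # push the whole left chain
            stack.append(node)
            node = node.left
        node = stack.pop()          # leftmost unvisited node
        out.append(node.val)        # visit
        node = node.right           # then traverse its right subtree
    return out

# A 10^4-deep right-leaning chain: a recursive version would exceed CPython's
# default recursion limit of 1000, while this one never recurses.
chain = None
for v in range(10_000, 0, -1):
    chain = Node(v, right=chain)
deep = inorder_iterative(chain)
```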

Operations and Complexity (Balanced)

Operation	Balanced	Unbalanced
Insert / delete / search	O(log N)	O(N)
Min / max	O(log N)	O(N)
In-order traversal	O(N)	O(N)

Language-Specific Gotchas

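Python: the default recursion limit is 1000 (raise it with sys.setrecursionlimit); CPython has no tail-call elimination, so deep or degenerate trees call for iterative traversal.
Java: TreeMap is red-black; recursion depth is bounded by the thread stack (on the order of 10K frames by default, tunable with -Xss).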
Go: no standard balanced BST.
C++: std::map / std::set are red-black; iterators traverse in order.
JS: no standard balanced BST; recursion depth is engine-dependent.

11. Tries

Internal Representation

Each node has up to alphabet-size children (e.g., 26 for lowercase letters, 256 for bytes, far more for Unicode), plus an end-of-word flag.

Operations and Complexity

Operation	Complexity
Insert word of length L	O(L)
Search word of length L	O(L)
Prefix search	O(L)
Space	O(total chars × alphabet size) with array children; O(total chars) with hash-map children

Alphabet Size Considerations

Fixed array per node: O(σ) memory per node, fast O(1) child lookup. Hash map per node: O(actual children) memory, slightly slower lookup. For 26 letters use an array; for full Unicode use a hash map.
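A minimal Python trie using the hash-map-per-node representation; the class and method names are illustrative, not from any library. Each operation walks at most L nodes, matching the O(L) bounds in the table.

```python
class TrieNode:
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = {}     # char -> TrieNode: memory proportional to actual children
        self.is_end = False    # end-of-word flag

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):            # O(L)
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = node.children[ch] = TrieNode()
            node = nxt
        node.is_end = True

    def _walk(self, s):                # follow s from the root; None if the path breaks
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):            # O(L): is this exact word stored?
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):     # O(L): does any stored word have this prefix?
        return self._walk(prefix) is not None

t = Trie()
for w in ["app", "apple", "bat"]:
    t.insert(w)
```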

Language-Specific Gotchas

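Python: dict of dicts (each node maps char → child; mark end-of-word with a sentinel key). A recursive defaultdict (T = lambda: defaultdict(T)) auto-creates nodes at every depth; a plain defaultdict(dict) only does so for one level.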
Java: Map<Character, TrieNode> or TrieNode[26].
Go: struct with [26]*TrieNode.
C++: struct with TrieNode* children[26]. Manual memory management or unique_ptr.
JS: plain objects or Map.

12. Graphs

Representations

Representation	Space	Edge query	Iterate neighbors
Adjacency list	O(V + E)	O(deg(v))	O(deg(v))
Adjacency matrix	O(V²)	O(1)	O(V)
Edge list	O(E)	O(E)	O(E)

When To Use Each

Adjacency list: sparse graphs (E ≪ V²) — almost always the right answer for interviews.
Adjacency matrix: dense graphs (E ≈ V²), Floyd-Warshall, when V is small (≤ 500).
Edge list: Kruskal’s MST, when you need to sort edges, when graph is given as edges and you don’t need neighbor queries.

Common Forms in Interviews

List<List<Integer>> adjacency.
Map<String, List<String>> for non-integer node IDs.
Implicit graph (grid: neighbors are (±1, 0) and (0, ±1)).
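A Python sketch of the common forms: an adjacency list built from an edge list (works for integer or string IDs), an implicit grid graph whose neighbors are generated on the fly, and a BFS that iterates neighbors in O(deg(v)). Function names are hypothetical.

```python
from collections import defaultdict, deque

def build_adj(edges, directed=False):
    """Adjacency list from an edge list; node IDs can be any hashable (ints, strings)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

def grid_neighbors(r, c, rows, cols):
    """Implicit graph: 4-directional neighbors of (r, c) that lie inside the grid."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc

def bfs_dist(adj, src):
    """Unweighted shortest distances from src, iterating each adjacency list once."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```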

Language-Specific Gotchas

Python: defaultdict(list) is ideal for adj[u].append(v).
Java: List<List<Integer>> with explicit ArrayList initialization in a loop; primitive int adjacency lists need third-party (Eclipse Collections, fastutil).
Go: [][]int slice-of-slices.
C++: vector<vector<int>>; for performance, use vector<int> with offsets (CSR format).
JS: array of arrays or Map<string, string[]>.

13. Disjoint Set Union (Union-Find)

Operations

find(x): which component is x in?
union(x, y): merge components of x and y.

Optimizations

Path compression: during find, set parent of every visited node to the root.
Union by rank/size: attach the shorter tree under the taller.
Together: O(α(N)) amortized per op (inverse Ackermann: α(N) ≤ 4 for any realistic N, so practically constant).

Naive vs Optimized

Variant	find	union
Naive	O(N)	O(N)
Path compression only	O(log N) amortized	O(log N) amortized
Path compression + union by rank	O(α(N)) amortized	O(α(N)) amortized

Reference Implementation (Python)

parent = list(range(N))
rank = [0] * N

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving: a one-pass form of path compression
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry: return False
    if rank[rx] < rank[ry]: rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]: rank[rx] += 1
    return True

Language-Specific Gotchas

Python: without union by rank/size, a recursive find can recurse O(N) deep and exceed the default recursion limit of 1000; the iterative form avoids the problem regardless.
Java: prefer int[] parent over Map<Integer, Integer> for primitive perf.
Go: straightforward with []int.
C++: vector<int> parent and rank.
JS: typed array Int32Array for speed.
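A typical interview use is counting connected components. This self-contained sketch uses union by size rather than the rank array above (either gives the same bounds) and the same path-halving find; the function name is hypothetical.

```python
def count_components(n, edges):
    """Number of connected components among nodes 0..n-1, using union-find."""
    parent = list(range(n))
    size = [1] * n                           # union by size

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru              # attach the smaller tree under the larger
            parent[rv] = ru
            size[ru] += size[rv]
            components -= 1                  # each successful union merges two components
    return components
```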

14. Bitsets / Bit Manipulation

Common Idioms

Operation	Idiom
Set bit `i`	`x | (1 << i)`
Clear bit `i`	`x & ~(1 << i)`
Toggle bit `i`	`x ^ (1 << i)`
Test bit `i`	`(x >> i) & 1`
Lowest set bit	`x & -x`
Pop lowest set bit	`x & (x - 1)`
Popcount	language builtin (`__builtin_popcount`, `Integer.bitCount`, `bin(x).count('1')`)
Iterate subsets of mask	`s = mask; while s > 0: …; s = (s - 1) & mask`
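Three of the idioms from the table, sketched in Python with hypothetical helper names: submask enumeration, isolating the lowest set bit, and popping it. Python ints are arbitrary precision, so x & -x works directly for non-negative x.

```python
def submasks(mask):
    """All non-empty submasks of mask, in decreasing order."""
    out = []
    s = mask
    while s > 0:
        out.append(s)
        s = (s - 1) & mask     # next smaller submask of mask
    return out                 # note: the empty submask 0 is not included

def set_bit_indices(x):
    """Indices of set bits, found by isolating and popping the lowest one each time."""
    idx = []
    while x:
        lowest = x & -x                       # isolate the lowest set bit
        idx.append(lowest.bit_length() - 1)   # its position
        x &= x - 1                            # pop the lowest set bit
    return idx
```

For a mask with b set bits, submasks yields 2^b - 1 values; summed over all masks of n bits the idiom costs O(3^n), which is why it appears in bitmask DP.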

Bitsets

A packed array of bits: 8× denser than a byte-per-element bool[], and word-parallel operations (AND, OR, XOR, popcount) handle 64 bits per machine word, up to 64× fewer operations. Use when N is up to ~10⁵ and you need fast set operations.

Language-Specific Gotchas

Python: ints are arbitrary precision; no overflow but no SIMD either. bin(x).count('1') works but int.bit_count() (3.10+) is faster.
Java: int is 32-bit, long is 64-bit. Negative numbers: >> is arithmetic, >>> is logical.
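Go: integer types don't mix, so explicit conversions are required (int(x), uint32(x)); untyped constants adapt to context. math/bits provides OnesCount, TrailingZeros, and related helpers.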
C++: std::bitset<N> for compile-time-known N; vector<bool> is a specialization (proxy reference).
JS: bitwise ops coerce to 32-bit signed int — beware truncation. Use BigInt for 64-bit ops.

15. Counters / Multisets

Counter Pattern

A hash map from key to count. Used for: frequency analysis, anagram detection, sliding window distinct-element count.
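As an illustration of the sliding-window use, here is a Python sketch (hypothetical function name) that reports the number of distinct values in every window of length w. Deleting keys whose count drops to zero keeps len(counts) equal to the distinct count.

```python
from collections import Counter

def distinct_per_window(nums, w):
    """Number of distinct values in each length-w window, in O(N) expected time."""
    counts = Counter()
    result = []
    for i, x in enumerate(nums):
        counts[x] += 1                 # x enters the window
        if i >= w:
            old = nums[i - w]          # nums[i - w] leaves the window
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]        # drop zero entries so len(counts) stays accurate
        if i >= w - 1:
            result.append(len(counts))
    return result
```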

Operations

Operation	Complexity
Increment count[k]	O(1)
Decrement / remove if zero	O(1)
Total count	O(distinct keys)
Sorted by count	O(K log K) for K distinct keys

Multiset

A counter is essentially a multiset (allows duplicates, remembers count). For an ordered multiset (need min/max/kth in order), use a TreeMap+count or sortedcontainers.SortedList.

Language-Specific Gotchas

Python: collections.Counter: Counter(s), most_common(k), arithmetic operators (c1 - c2 keeps only positive counts, dropping zeros and negatives).
Java: HashMap<K, Integer> with getOrDefault(k, 0) + 1 and merge(k, 1, Integer::sum). There is no built-in Counter.
Go: map[K]int. Manual increment.
C++: std::unordered_map<K, int>; ++m[k] works because default-constructed int is 0.
JS: Map (preserves insertion order) or plain object (string keys only).

Inline Runtime Concepts Reference

These concepts cut across all data structures. They are interview rubric line items in their own right.

1. Stack vs Heap Memory

The call stack holds function frames: locals, args, return addresses. Fast, fixed-size (typically 1–8 MB in interview environments). Allocations are pointer-bump and bound to the frame’s lifetime.

The heap is dynamically managed memory — new, malloc, Python objects, JVM objects. Slower allocation, can be GC’d or manually freed. Survives beyond the function that created it.

Why It Matters In Interviews

“Why does my recursion overflow at depth 10⁵?” → call stack ~1 MB / ~64 bytes per frame ≈ 16K frames before crash.
“Why is my linked list slower than my array?” → heap-allocated nodes scattered in memory, no cache locality.

Per-Language Gotchas

Python: every value, including small ints (which CPython caches), is a heap-allocated object. Locals are name bindings to those objects, not stack-allocated values.
Java: primitives in locals are stack-allocated; objects always heap. Escape analysis can sometimes elide a heap alloc.
Go: escape analysis decides stack vs heap; returning &x, or capturing x in a closure that outlives the function, moves x to the heap.
C++: explicit (int x; is stack, new int is heap). Stack allocation is much faster.
JS: all objects are heap; primitives may be stack-internal but the language hides it.

2. Scope and Lifetime

Scope = where a name is visible. Lifetime = how long the value lives.

These can differ! In a closure, a variable’s scope ends with the function but its lifetime extends as long as the closure references it.

Per-Language Gotchas

Python: late binding in closures — [lambda: i for i in range(3)] all return 2, not 0/1/2.
Java: local variables captured by lambdas must be effectively final.
Go: loop variable capture changed in Go 1.22 — pre-1.22, for i := range … { go func() { use(i) }() } captures the same i.
C++: dangling reference if you return &local. Lifetime extension via const& is a niche rule.
JS: var is function-scoped, let/const are block-scoped. Hoisting trap.
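The Python late-binding trap and its usual fix, as a short sketch (function names are hypothetical): a default argument is evaluated when the lambda is defined, so each lambda keeps its own value of i.

```python
def make_callbacks_buggy():
    # Each lambda looks up i when called, after the loop has finished (i == 2).
    return [lambda: i for i in range(3)]

def make_callbacks_fixed():
    # The default value is captured at definition time, one per lambda.
    return [lambda i=i: i for i in range(3)]

print([f() for f in make_callbacks_buggy()])   # [2, 2, 2]
print([f() for f in make_callbacks_fixed()])   # [0, 1, 2]
```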

3. Value vs Reference Semantics

Does assigning or passing a variable copy the value or share a reference?

Language	Primitives	Objects/Arrays
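Python	object reference (immutable, so behaves like a value)	object reference passed by value
Java	by value	reference passed by value
Go	by value (including structs and arrays)	slice/map/chan headers copied by value but share underlying storage
C++	by value (default)	by value by default; explicit & or * for reference semantics
JS	by value	reference passed by value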

Trap

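“I passed my array to the function, modified it inside, and the caller saw the change!” Yes: the callee received a reference to the same array object (Python list, Java int[], JS array), so mutations through it are visible to the caller. Rebinding the parameter to a new array would not be.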

“I passed my int to the function, modified it inside, but the caller didn’t see the change!” — yes, because the int is by value.
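Both traps, plus the rebinding case that explains "reference passed by value", in one Python sketch; the helper names are hypothetical.

```python
def append_item(lst):
    lst.append(99)        # mutates the shared list object: the caller sees it

def rebind(lst):
    lst = [0]             # rebinds the local name only: the caller is unaffected

def bump(n):
    n += 1                # ints are immutable; this binds a new int to the local name
    return n

data = [1, 2]
append_item(data)
rebind(data)
count = 5
bump(count)
print(data, count)        # [1, 2, 99] 5
```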

4. Mutable vs Immutable

Immutable values cannot be modified after creation; “modification” returns a new value.

Language	Immutable types
Python	`str`, `int`, `tuple`, `frozenset`, `bytes`
Java	`String`, all primitive wrappers, `LocalDate`, etc.
Go	`string`
C++	`const`-qualified
JS	primitives (`string`, `number`, `boolean`, `undefined`, `null`, `bigint`, `symbol`)

Trap

Using a mutable object as a hash map key, then mutating it → key lost.
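“Why is s += c slow?” Strings are immutable, so each iteration may copy the whole string, O(N²) in total. (CPython sometimes appends in place when nothing else references the string, but don't rely on it.)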

5. Hash Collisions and Adversarial Keys

A hash map’s O(1) average requires a “good” hash function and “uniform” inputs. An adversary who knows the hash function can craft keys that all collide into one bucket; each operation then degrades to O(N), so N insertions cost O(N²).

Defenses

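Random seed: Python randomizes str/bytes hashes per process (controlled by PYTHONHASHSEED) and Go seeds map hashing randomly, so an attacker can't predict bucket placement. Python int hashes are not randomized, so integer keys remain attackable.
Tree fallback: since Java 8, HashMap converts long collision chains into red-black trees, capping the worst case near O(log N) per operation (most effective when keys are Comparable).
Keyed cryptographic-strength hash (e.g. SipHash with a secret key): overkill for most interview problems, but resistant as long as the key stays secret.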

Interview Note

If the problem explicitly says “adversarial” or “competitive” inputs, do not rely on hash maps. Use a sorted structure (TreeMap, sortedcontainers, std::map).

6. Iterator Invalidation

Modifying a collection while iterating can break the iterator.

Behaviors

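Python: adding or removing dict keys during iteration raises RuntimeError (dictionary changed size during iteration); updating values of existing keys is fine. Lists don't raise, but silently skip or revisit elements.
Java: structural modification during iteration throws ConcurrentModificationException (fail-fast iterator); remove through Iterator.remove() instead.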
Go: map iteration order is randomized; modifying the map mid-iteration is technically allowed but the new keys may or may not be visited.
C++: vector::push_back invalidates all iterators if it reallocates. unordered_map::insert invalidates all on rehash.
JS: Map and Set iteration sees later insertions; deletions are honored.

Safe Pattern

Collect mutations into a list during iteration; apply after the loop.
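The safe pattern in Python, sketched with a hypothetical helper: collect the keys first, delete after the loop. The demo also shows the RuntimeError that the unsafe loop triggers.

```python
def drop_zero_counts(counts):
    """Remove keys whose value is 0 without changing the dict's size mid-iteration."""
    to_delete = [k for k, v in counts.items() if v == 0]   # collect during iteration
    for k in to_delete:                                      # apply after the loop
        del counts[k]

inventory = {"apple": 0, "pear": 3, "plum": 0}
drop_zero_counts(inventory)          # inventory == {"pear": 3}

demo = {"a": 1, "b": 2}
try:
    for k in demo:
        del demo[k]                  # unsafe: changes the dict's size mid-iteration
    raised = False
except RuntimeError:
    raised = True                    # "dictionary changed size during iteration"
```

Rebuilding instead (counts = {k: v for k, v in counts.items() if v}) is equally safe when no other code holds a reference to the original dict.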

7. Garbage Collection Basics

Two main strategies:

Reference counting (refcount): each object has a count of references to it; when 0, freed. Fast but cannot collect cycles. CPython uses refcount + cycle collector.
Tracing GC (mark-and-sweep, generational): periodically traces from roots; unreachable objects freed. Java, Go, JS, C# use variants.

Why It Matters

Refcount: an object is freed as soon as its last reference disappears (e.g. del x on the only reference), so destruction is deterministic.
Tracing: deallocation happens “later” — pauses (“stop-the-world”) historically, mostly amortized in modern collectors.

Per-Language Gotchas

Python: cycles between objects with __del__ finalizers may never collect (pre-3.4).
Java: “GC pause” is a real concern in latency-sensitive interviews; mention G1, ZGC.
Go: GC is concurrent and non-generational; stop-the-world pauses are typically sub-millisecond.
C++: no GC — must delete what you new. RAII (smart pointers) automates this.
JS: V8 uses generational GC.

8. Memory Leaks (in GC’d Languages)

A “leak” in a GC’d language means the program no longer needs an object, but the object is still reachable from a GC root, so it is never freed.

Common Sources

Listeners not removed: the event source keeps a reference to every registered listener until it is explicitly removed.
Caches without eviction: map grows monotonically.
Closure capture: closure holds reference to large enclosing object.
Static fields: anything referenced from a static field lives as long as its class.
ThreadLocals not cleared in pooled threads: classic JVM leak.
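The usual fix for an unbounded cache is a hard size cap with eviction. A minimal Python sketch of an LRU cache on collections.OrderedDict (the class name is hypothetical; for memoizing a function, functools.lru_cache(maxsize=...) does the same job):

```python
from collections import OrderedDict

class LRUCache:
    """A cache with a hard size cap, so it cannot grow without bound."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()          # keys kept in recency order

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes most recently used
cache.put("c", 3)     # over capacity: evicts "b"
```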

Per-Language Gotchas

Python: circular refs with finalizers; module-level state.
Java: ThreadLocal + thread pool; classloader leaks (PermGen / Metaspace).
Go: goroutine leak (a goroutine blocked forever on a channel send or receive that will never complete).
C++: real memory leak via missing delete.
JS: detached DOM nodes, timers not cleared.

9. Deep vs Shallow Copy

Shallow copy: new container, but elements are shared references.
Deep copy: recursively copies elements too.

When It Matters

Backtracking: if you store snapshots of a mutable list, you need a deep (or at least one-level) copy, otherwise all snapshots reflect the latest state.

result = []
path = []
def backtrack(...):
    if done: result.append(path[:])  # MUST copy; otherwise all entries are the same list
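A complete, runnable version of the snapshot pattern, using subset generation as the example problem (the function name is illustrative). If result.append(path) were used instead, every entry would be the same list object, which ends up empty.

```python
def subsets(nums):
    """All subsets of nums via backtracking; each recorded snapshot copies path."""
    result, path = [], []

    def backtrack(start):
        result.append(path[:])        # copy: path keeps mutating after this call
        for i in range(start, len(nums)):
            path.append(nums[i])      # choose
            backtrack(i + 1)          # explore
            path.pop()                # un-choose

    backtrack(0)
    return result

print(subsets([1, 2]))   # [[], [1], [1, 2], [2]]
```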

Per-Language Gotchas

Python: list(x) / x[:] shallow; copy.deepcopy(x) deep.
Java: clone() is shallow by default; deep requires explicit traversal.
Go: copy(dst, src) shallow; for deep, write your own.
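C++: the implicit copy constructor copies raw pointers shallowly (both objects point at the same memory); STL containers copy their elements (value semantics); unique_ptr is move-only, and copying a shared_ptr shares the same pointee.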
JS: spread [...arr] and {...obj} are shallow; deep via structuredClone(x) (modern) or JSON round-trip (lossy).

10. Recursion Depth and Stack Overflow

Each recursive call adds a frame to the call stack. The stack has a fixed size; exceeding it crashes (or in Python, raises RecursionError).

Default Limits

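Python: sys.getrecursionlimit() defaults to 1000. Raise it with sys.setrecursionlimit(10**6) plus threading.stack_size(...), and run the recursive code in a new threading.Thread, because the stack size applies only to threads created after the call.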
Java: ~10K frames typical; tune with -Xss.
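Go: goroutine stacks start small (2 KB) and grow on demand up to 1 GB on 64-bit platforms.
C++: the main thread's stack is typically 8 MB on Linux (ulimit -s) and 1 MB on Windows; for other threads, set the size at creation (e.g. pthread_attr_setstacksize).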
JS: ~10K frames typical, engine-dependent.

Mitigation

Convert to iteration with an explicit stack (DFS, in-order traversal).
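Tail-call optimization is not available in most popular languages: Python, Java, and Go don't do it, and JS specifies proper tail calls (ES2015) but only Safari's JavaScriptCore ships them. C++ compilers may apply it as an optimization without guaranteeing it; Scheme requires it.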
For trees, balance matters: a degenerate (chain-shaped) tree blows recursion at depth N, not log N.
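Converting recursion to iteration usually means pushing (node, state) pairs onto an explicit list. A sketch for tree height, with a hypothetical Node class, run on a degenerate 10^4-node chain that would exceed CPython's default recursion limit if handled recursively:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def max_depth(root):
    """Tree height via DFS with an explicit stack of (node, depth) pairs."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best

# Degenerate chain: depth equals N, so recursion depth would be N, not log N.
chain = None
for v in range(10_000):
    chain = Node(v, left=chain)
```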

Lab 01 — Array Fundamentals: Rotate Array In Place

Goal

Master in-place array rotation. The deliverable shows you understand pointer arithmetic, the O(N) reversal trick, dynamic-array memory layout, and the edge cases that trip up many candidates on this exact problem.

Background Concepts

Arrays as contiguous memory; index arithmetic mod N; in-place vs auxiliary-space transformations; the “three reversals” identity: rotate_right(arr, k) == reverse(reverse(arr[:N-k]) + reverse(arr[N-k:])). Review the Arrays section of the Phase 1 README and the value-vs-reference rules in section 3 of the runtime concepts.

Interview Context

This is the canonical “looks easy, traps everyone” Easy/low-Medium problem. Real interviews from Microsoft, Amazon, Meta, Apple, Google. The interviewer is watching for: do you do the auxiliary-array brute force first? Do you spot the O(N) in-place trick? Do you handle k > N? Do you survive k == 0 and N == 1?

Problem Statement

Given an integer array nums and a non-negative integer k, rotate the array to the right by k steps in place. After the rotation, element originally at index i ends up at index (i + k) % N.

Constraints

1 ≤ N ≤ 10^5
-2^31 ≤ nums[i] ≤ 2^31 - 1
0 ≤ k ≤ 10^9
Must run in O(N) time and O(1) extra space.

Clarifying Questions

Can k be greater than N? (Yes — must reduce mod N.)
Can k be 0? (Yes — should be a no-op, no array mutation.)
Can the array be empty / size 1? (Per the constraints, N ≥ 1. Confirm.)
Right rotation, not left? (Confirm direction; getting it backward is a top-3 bug here.)
Must it be in place, or is auxiliary memory allowed? (In place — that’s the spirit of the problem.)

Examples

Input	k	Output	Notes
`[1,2,3,4,5,6,7]`	3	`[5,6,7,1,2,3,4]`	Standard case
`[1,2]`	3	`[2,1]`	k > N: effective k = 1
`[1,2,3]`	0	`[1,2,3]`	No-op
`[1]`	100	`[1]`	Trivial size
`[1,2,3]`	3	`[1,2,3]`	k == N: no-op

Initial Brute Force

Allocate out[N]; for each i, set out[(i + k) % N] = nums[i]; copy out back into nums.

def rotate_brute(nums, k):
    n = len(nums)
    k %= n
    out = [0] * n
    for i in range(n):
        out[(i + k) % n] = nums[i]
    nums[:] = out

Brute Force Complexity

Time: O(N). Space: O(N) auxiliary. Fails the in-place constraint despite optimal time.

Optimization Path

We need O(1) extra space. Two well-known approaches:

Cyclic replacement: start at index 0, jump to (0 + k) % N, place the displaced element, continue. Visits each index exactly once. Tricky when gcd(N, k) > 1 (multiple disjoint cycles). Correctness needs a counter for elements moved.
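Three reversals: reverse the whole array, reverse the first k, reverse the last N-k. This works because reversing the whole array moves the last k elements to the front and the first N-k to the back, each block in reversed order; the two partial reversals then restore the order within each block. Easier to write correctly.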

Pick the three-reversal approach for the interview unless the interviewer explicitly asks for cyclic replacement.
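If the interviewer does ask for cyclic replacement, a sketch follows. Instead of computing gcd(N, k) explicitly, it counts moved elements and starts a new cycle at the next index whenever one closes; the gcd(N, k) cycles start at indices 0, 1, …, gcd(N, k) - 1.

```python
def rotate_cyclic(nums, k):
    """Right-rotate nums by k in place by following cycles i -> (i + k) % n."""
    n = len(nums)
    k %= n
    if k == 0:
        return
    moved = 0
    start = 0
    while moved < n:                  # counter: stop once every element is placed
        cur, carry = start, nums[start]
        while True:
            nxt = (cur + k) % n
            nums[nxt], carry = carry, nums[nxt]   # drop carried value, pick up displaced one
            cur = nxt
            moved += 1
            if cur == start:          # this cycle is closed
                break
        start += 1                    # next cycle begins at the next index

arr = [1, 2, 3, 4, 5, 6]
rotate_cyclic(arr, 2)       # gcd(6, 2) = 2 disjoint cycles
arr7 = [1, 2, 3, 4, 5, 6, 7]
rotate_cyclic(arr7, 10)     # k reduces to 3; a single cycle of length 7
```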

Final Expected Approach

Three-reversal in place.

def reverse(nums, l, r):
    while l < r:
        nums[l], nums[r] = nums[r], nums[l]
        l += 1
        r -= 1

def rotate(nums, k):
    n = len(nums)
    k %= n              # normalize
    reverse(nums, 0, n - 1)
    reverse(nums, 0, k - 1)
    reverse(nums, k, n - 1)

Data Structures Used

A single mutable array. No auxiliary structures.

Correctness Argument

Loop invariant for reverse(l, r), with l0 and r0 the initial bounds: at each iteration, positions in [l0, l) and (r, r0] already hold their final reversed values, because each swap places one mirrored pair. Termination at l ≥ r leaves the closed range fully reversed. The three-reversal identity can be checked on a small example: [A B C D E] k=2 → reverse all → [E D C B A] → reverse first 2 → [D E C B A] → reverse last 3 → [D E A B C], which is the original rotated right by 2.

Complexity

Time: O(N) (three passes, each touching at most N elements). Space: O(1) (in-place swaps; no allocation).

Implementation Requirements

Helper reverse(nums, l, r) with explicit bounds (closed interval).
Always do k %= n first, before any loop or reversal.
Handle k == 0 and k == n: both reduce to no-op via k %= n.
No allocation outside the input array (verify by reading your code).
Clean variable names (l, r, n, k are interview-acceptable).

Tests

Smoke: the canonical [1,2,3,4,5,6,7] with k=3.
Unit: k=0 (no-op), k=N (no-op), k=1 (single right shift).
Edge: N=1, N=2 with k=1, large k=10^9 mod N.
Large: N=10^5, k=N//2; assert in-place (capture id(nums) in Python).
Random: generate random arrays and ks; check against brute force as oracle.
Invalid: negative k (per constraints not allowed; if interviewer extends, decide left rotation semantics).
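The "Random" test above can be sketched as a fixed-seed harness that checks the three-reversal rotate against the brute force as an oracle. Because the check compares the caller's own list, it would also catch the nums = … rebinding bug. random_check is a hypothetical name.

```python
import random

def rotate_brute(nums, k):
    """Oracle: O(N)-space rotation that is easy to trust."""
    n = len(nums)
    k %= n
    out = [0] * n
    for i in range(n):
        out[(i + k) % n] = nums[i]
    nums[:] = out

def reverse(nums, l, r):
    while l < r:
        nums[l], nums[r] = nums[r], nums[l]
        l += 1
        r -= 1

def rotate(nums, k):
    n = len(nums)
    k %= n
    reverse(nums, 0, n - 1)
    reverse(nums, 0, k - 1)
    reverse(nums, k, n - 1)

def random_check(trials=1000, seed=0):
    """Compare rotate against the oracle on random inputs, including huge k."""
    rng = random.Random(seed)              # fixed seed so failures are reproducible
    for _ in range(trials):
        n = rng.randint(1, 20)
        nums = [rng.randint(-5, 5) for _ in range(n)]
        k = rng.randint(0, 10**9)
        expected = nums[:]
        rotate_brute(expected, k)
        rotate(nums, k)                    # must mutate the caller's list in place
        assert nums == expected, (nums, k, expected)
    return True
```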

Follow-up Questions

“Can you do it without the modulo?” (Only awkwardly: with k up to 10^9 you still need to reduce k somehow, e.g. by repeated subtraction, so k %= n is the clean answer.)
“What if the array is given as a linked list?” (Different problem — find length, find pivot, splice.)
“What if k can be negative (left rotation)?” (Convert via k = ((k % n) + n) % n.)
“Solve using a single reverse loop without a helper.” (Inline the swaps three times.)
“Implement with cyclic replacement instead.” (Demonstrate the gcd cycle counter trick.)

Product Extension

A circular buffer for a metrics dashboard storing the last N seconds of samples. Rotation isn’t done on append — instead, a head index advances mod N. The “three reversals” trick is what you do when the buffer must be flattened to a linear export. Discuss tradeoffs: head-index buffer is O(1) per append but harder to debug; rotation on read is O(N) but storage is always linear.

Language/Runtime Follow-ups

Python: nums[:] = nums[-k:] + nums[:-k] is one-line and Pythonic but allocates O(N). Acceptable to mention but interviewer may rule it “not in place.” Pure swap version uses no allocation.
Java: int[] (primitive) avoids boxing. Don’t reach for Collections.rotate; understand it.
Go: slices share their backing array, so swapping with nums[i], nums[j] = nums[j], nums[i] mutates the caller's data; nums[:] creates a new slice header, not a copy.
C++: std::reverse(nums.begin() + l, nums.begin() + r + 1); use the standard.
JS: in-place using [a, b] = [b, a] swap or temp = a. arr.reverse() is in-place.

Common Bugs

Forgetting k %= n: when k > n, reverse(nums, 0, k - 1) indexes past the end of the array (IndexError in Python).
Off-by-one in reverse(l, r) — using r vs r - 1 as the bound; using < vs ≤.
Reversing wrong segments — confusing first-k with last-k. Right rotation: [reverse first k of reversed array] then [reverse last n-k].
Allocating in disguise — nums = nums[-k:] + nums[:-k] rebinds the local name and does not mutate the caller’s array (in Python). Use nums[:] = ….
Left vs right confusion — re-read the problem statement once before submitting.

Debugging Strategy

Print the array after each of the three reversals; compare to a hand-traced [1,2,3,4,5,6,7] k=3 walk-through.
If output is wrong by a constant shift, suspect an off-by-one in segment bounds.
If output looks reflected ([3,2,1,7,6,5,4] instead of [5,6,7,1,2,3,4]), the whole-array reversal was skipped: reversing only the first k and the last n-k produces exactly this.

Mastery Criteria

Wrote the three-reversal solution in under 4 minutes, no bugs, in-place verified.
Traced through k > N, k == 0, N == 1 without prompting.
Stated the loop invariant for reverse aloud.
Named the cyclic-replacement alternative and acknowledged its gcd complication.
Identified and avoided the Python nums = nums[-k:] + nums[:-k] allocation trap.

Lab 02 — String Mechanics: Reverse Words In A String

Goal

Master string immutability, builder patterns, encoding gotchas, and the cost of naive concatenation. The deliverable: reverse the order of words in a sentence efficiently in your interview language, demonstrating you understand why the “obvious” solution can be O(N²) in some languages and O(N) in others.

Background Concepts

String immutability and the resulting cost of s += c loops; StringBuilder / strings.Builder / "".join() patterns; substring complexity; whitespace tokenization; Unicode pitfalls. Review the Strings section of the Phase 1 README and item 4 in the runtime concepts (mutable vs immutable).

Interview Context

A staple of Microsoft, Amazon, and Bloomberg phone screens. The trap is candidates who write result = ""; for w in reversed(words): result += w + " " and don’t realize they just shipped O(N²) code in Java or Python. Strong candidates state the immutability fact aloud and choose a builder pattern.

Problem Statement

Given a string s representing a sentence, return a new string with the order of words reversed. A “word” is a maximal run of non-space characters. Multiple spaces between words and leading/trailing spaces must be collapsed to single spaces; the output has no leading/trailing space.

Constraints

1 ≤ |s| ≤ 10^4
s contains printable ASCII characters and spaces.
s contains at least one word.

Clarifying Questions

Is the input ASCII or arbitrary Unicode? (Affects iteration model; ASCII is the default unless stated.)
Should multiple internal spaces be preserved or collapsed? (Standard problem says collapse; confirm.)
Trim leading/trailing whitespace? (Yes — output has none.)
Punctuation: is "hello," one word? (Per problem: a “word” is non-space-separated; "hello," is one word.)
Can I allocate a new string, or must I work in place? (For Python/Java/JS — strings are immutable, so a new string is unavoidable. For C++ std::string, in-place is feasible.)

Examples

Input	Output
`"the sky is blue"`	`"blue is sky the"`
`" hello world "`	`"world hello"`
`"a good example"`	`"example good a"`
`"single"`	`"single"`
`" "`	invalid per constraints

Initial Brute Force

Split on whitespace, reverse the list, join with single spaces.

def reverse_words(s):
    return " ".join(reversed(s.split()))

s.split() (no arg) collapses runs of whitespace and trims, which is exactly the spec. This is one line in Python — but the interviewer wants you to explain what it does.

Brute Force Complexity

Time: O(N) — split is one linear pass, reversed is O(k) where k is word count, join is one linear pass. Space: O(N) for the list of words and the output. This is already optimal asymptotically.

Optimization Path

For interviews where the one-liner is “too easy,” the interviewer escalates: “Do it without split/join; use only character-level operations.” Or: “Reverse in place in a char[] with O(1) extra memory.”

The classic in-place trick on a mutable buffer: reverse the entire buffer, then reverse each word, then collapse internal whitespace. This is the same three-reversal identity from Lab 01, applied to characters.
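A Python sketch of the in-place version, with a list of characters standing in for char[] (the function name is hypothetical). It folds the whitespace collapse into the per-word pass: after the full reversal, words are compacted leftward and each one is reversed back as it lands. The write index never passes the read index, so no unread character is overwritten.

```python
def reverse_words_inplace(buf):
    """Reverse word order in a mutable list of chars using O(1) extra space.

    Returns the new logical length; the result occupies buf[:length].
    """
    def reverse(l, r):                       # reverse buf[l..r] inclusive
        while l < r:
            buf[l], buf[r] = buf[r], buf[l]
            l += 1
            r -= 1

    n = len(buf)
    reverse(0, n - 1)                        # step 1: reverse the whole buffer
    write = read = 0
    while read < n:
        if buf[read] == ' ':                 # step 3 (folded in): skip space runs
            read += 1
            continue
        if write > 0:
            buf[write] = ' '                 # exactly one separator between words
            write += 1
        start = write
        while read < n and buf[read] != ' ':
            buf[write] = buf[read]           # copy the word leftward (write <= read)
            write += 1
            read += 1
        reverse(start, write - 1)            # step 2: restore this word's letter order
    return write

buf = list("  the sky  is blue ")
length = reverse_words_inplace(buf)          # "".join(buf[:length]) == "blue is sky the"
```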

Final Expected Approach

State the one-liner first. Then offer the manual two-pointer approach for languages with mutable strings or as an “I-understand-the-internals” demonstration.

def reverse_words(s):
    # tokenize without builtins
    words = []
    i = 0
    n = len(s)
    while i < n:
        while i < n and s[i] == ' ':
            i += 1
        j = i
        while j < n and s[j] != ' ':
            j += 1
        if j > i:
            words.append(s[i:j])
        i = j
    # reverse and join via builder
    out = []
    for w in reversed(words):
        out.append(w)
    return ' '.join(out)

In Java, replace the final join with StringBuilder. In Go, with strings.Builder.

Data Structures Used

A list of word strings (or substrings); a builder for the output. No advanced structures.

Correctness Argument

Tokenization invariant: at each outer iteration, i points at the start of unscanned input; the inner loops skip whitespace and capture a word. Each character is examined O(1) times, so tokenization is O(N). Reversed iteration over words produces them in opposite order; joining with ' ' produces single-space separation; no leading/trailing space because we never push empty words and we don’t terminate with a separator (Python join handles this).

Complexity

Time: O(N) — single tokenization pass plus single output assembly pass. Space: O(N) — output and word list.

Implementation Requirements

Use an explicit builder (StringBuilder, strings.Builder, [].join, ''.join).
Never use += to build the output in a loop in Java/Python/JS.
Don’t rely on regex unless the interviewer is explicitly fine with it (s.split(/\s+/) works but is overkill).
Verify trimming works on " word ".
Verify multiple internal spaces collapse on "a   b".

Tests

Smoke: "the sky is blue" → "blue is sky the".
Unit: leading/trailing spaces; multiple internal spaces; single-word input.
Edge: all-spaces input (per constraints invalid; handle gracefully if extended); single character; punctuation as part of word.
Large: N = 10^4 input with 10^3 words; assert no quadratic behavior (time it).
Random: randomly generate space-and-letter strings; cross-check against " ".join(reversed(s.split())).
Invalid: non-ASCII in extended versions (define behavior per language’s iteration model).

Follow-up Questions

“Reverse character order within each word as well.” (Reverse each word in place after splitting.)
“Reverse in O(1) extra space on a mutable char[].” (The three-reversal trick.)
“Handle Unicode where ‘word’ is grapheme-cluster bounded.” (Need an ICU library or equivalent.)
“Preserve original whitespace runs.” (Don’t collapse; keep separator tokens.)
“What if the string is huge and streamed?” (Process word-by-word from a buffered reader.)

Product Extension

A search-engine query normalizer. Inputs from users have inconsistent whitespace, varying word order. Reverse word order is a feature for “did-you-mean” inversion testing. In production: keep the original for display, normalize for indexing, and accept that the cost of String immutability in Java means hot paths use StringBuilder or even byte arrays directly.

Language/Runtime Follow-ups

Python: s.split() (no args) is the magical normalizer. "".join(...) is a single allocation; never use += on str in a loop.
Java: String.split("\\s+") returns an array, but call trim() first; otherwise leading whitespace yields an empty first token. Output via StringBuilder. String.join(" ", parts) is the modern one-liner.
Go: strings.Fields(s) splits on any whitespace and trims; strings.Join(parts, " ") rebuilds. Both are O(N).
C++: std::stringstream for tokenizing (operator>> skips whitespace runs). std::string is mutable, so building the output with += is amortized O(length appended) and O(N) overall.
JS/TS: s.trim().split(/\s+/).reverse().join(' '). Beware: the empty-string case "".split(/\s+/) returns [""], not [].
Unicode subtlety: in Java/JS, length counts UTF-16 code units; '😀' is length 2. Doesn’t matter here unless emoji-as-word.

Common Bugs

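Off-by-one on whitespace: leaving a trailing space when building word + " " manually instead of using join.
s += c in a loop: O(N²) in Java, and potentially O(N²) in Python and JS (CPython and V8 sometimes optimize it, but you can't rely on that). Catastrophic on large input.
Splitting by ' ' instead of by whitespace: "a  b".split(' ') returns ["a", "", "b"] in Java, JS, and Python; you get empty tokens.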
Trim missed — leading whitespace becomes a leading empty token.
Mutating the input — in some languages strings are immutable so this is a type error; in C++/Go-byte-slice it’s a semantic bug.

Debugging Strategy

Print the tokenized list. If it has empty strings, your splitter doesn’t collapse.
If output has trailing space, your join builds it manually rather than via the standard library.
Time on a 10^4 input. If it’s > 1 second, you have a quadratic concat hidden somewhere.

Mastery Criteria

Wrote the one-liner in 30 seconds and explained why each piece is needed.
Wrote the manual tokenizer in under 5 minutes.
Stated aloud “in this language, strings are immutable, so I will use a builder.”
Identified the difference between split(' ') and split('\\s+')/split() (no-arg).
Acknowledged the in-place char[] three-reversal alternative without writing it (or wrote it on follow-up).
Tested with " a b " and confirmed clean output.

Lab 03 — Hashmap Mastery: Group Anagrams

Goal

Master hash-map design with composite keys, adversarial input awareness, and the equality/hashcode contract. The deliverable groups N strings into anagram buckets in O(N · L) time and articulates exactly why your hash key works and what an adversary could do to break it.

Background Concepts

Hash function design; key equality contract; adversarial inputs and load factor; ordered-vs-unordered map choice; counter pattern. Review the Hash Maps and Hash Sets sections of the Phase 1 README, plus runtime concept 5 (hash collisions).

Interview Context

Group Anagrams is interview-evergreen: appears at Meta, Google, Amazon, Microsoft. The interview signal is whether you reach for the right key. Naive candidates compare every pair (O(N² · L)). Decent candidates sort each string into a key (O(N · L log L)). Strong candidates use a counter tuple key (O(N · L)). Elite candidates discuss adversarial hash flooding and language-specific custom-hash mechanics.

Problem Statement

Given an array strs of N lowercase-ASCII strings, group the strings that are anagrams of each other. Return the groups as a list of lists. Within and across groups, any order is acceptable.

Constraints

1 ≤ N ≤ 10^4
0 ≤ |s_i| ≤ 100
s_i consists of lowercase English letters.
Total characters: Σ |s_i| ≤ 10^6.

Clarifying Questions

Lowercase only? (Per constraints — confirm.)
Do empty strings group together? (Yes — the empty string is an anagram of itself.)
Is output order significant within or across groups? (Standard problem: no.)
Are duplicates in the input allowed? (Yes — ["aa","aa"] is one group of size 2.)
Memory constraints? (Should fit comfortably; mention you’ll discuss tradeoffs.)

Examples

Input	Output (any order)
`["eat","tea","tan","ate","nat","bat"]`	`[["eat","tea","ate"], ["tan","nat"], ["bat"]]`
`[""]`	`[[""]]`
`["a"]`	`[["a"]]`
`["abc","cab","bca","xyz"]`	`[["abc","cab","bca"], ["xyz"]]`

Initial Brute Force

For each pair (i, j), check if strs[j] is an anagram of strs[i] (e.g., by sorting both). Use a seen[] array. O(N² · L log L).

def group_anagrams_brute(strs):
    seen = [False] * len(strs)
    groups = []
    for i, s in enumerate(strs):
        if seen[i]: continue
        g = [s]
        seen[i] = True
        ks = sorted(s)
        for j in range(i + 1, len(strs)):
            if not seen[j] and sorted(strs[j]) == ks:
                g.append(strs[j])
                seen[j] = True
        groups.append(g)
    return groups

Brute Force Complexity

Time: O(N² · L log L) due to repeated sorting. Space: O(N) for the seen array plus O(L) per sort. Fails for N = 10^4: about 5 × 10^7 pairs, each sorting strings of up to 100 characters.

Optimization Path

The key insight: anagrams have the same multiset of characters. We need a hashable key derived from this multiset. Two canonical forms:

Sorted string as key: "".join(sorted(s)) → "aet" for "eat", "tea", "ate" (sorted(s) alone returns a list, which is not hashable). Cost per key: O(L log L). Total: O(N · L log L).
Count tuple as key: a length-26 tuple of counts. Cost per key: O(L). Total: O(N · L). Optimal for large L.

Pick the count tuple unless L is tiny.
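For comparison, a minimal sketch of the sorted-string variant. The function name group_anagrams_sorted is ours, not part of the problem. The join step matters: it turns the list returned by sorted() into a hashable string.

```python
from collections import defaultdict

def group_anagrams_sorted(strs):
    """Group anagrams using each word's sorted characters as the key."""
    buckets = defaultdict(list)
    for s in strs:
        key = "".join(sorted(s))   # sorted() returns a list; join makes it hashable
        buckets[key].append(s)
    return list(buckets.values())
```

Because Python dicts preserve insertion order, groups come out in order of first appearance, e.g. [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]] for the canonical example.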

Final Expected Approach

Bucket strings by count tuple in a hash map.

from collections import defaultdict

def group_anagrams(strs):
    buckets = defaultdict(list)
    for s in strs:
        counts = [0] * 26
        for c in s:
            counts[ord(c) - ord('a')] += 1
        buckets[tuple(counts)].append(s)
    return list(buckets.values())

Data Structures Used

A hash map keyed by a 26-tuple of int counts; values are lists of strings.
A constant-size 26-int array for tallying (per word).

Correctness Argument

Two strings are anagrams iff they have identical character multisets iff their count vectors are equal. Equal count vectors hash to the same bucket and compare equal under tuple equality, so anagrams land in the same bucket. Different vectors compare unequal under tuple equality, so non-anagrams don’t share a bucket (modulo accidental hash collisions, which the equality check resolves correctly — that’s the equality/hashcode contract at work).

Complexity

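Time: O(Σ |s_i|) for tallying plus O(26 · N) for building and hashing the tuple keys, i.e., O(N · (L + 26)), linear in the input size for a fixed alphabet. Space: O(26 · N) for the keys in the worst case (every word in its own group) plus O(N) for the bucket lists, which hold references to the input strings.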

Implementation Requirements

Use a hashable key — tuples in Python, String (built from the count array) in Java, a [26]int array (or a stringified key) in Go, std::array<int, 26> in C++ (with a custom hash for unordered_map).
Prefer the count key over the sorted string when L is large; the sorted-string key is asymptotically slower but still acceptable in an interview.
Use defaultdict(list) or equivalent (computeIfAbsent in Java) to avoid manual “if not in map” branching.
Return values as a list-of-lists, not the dict itself.

Tests

Smoke: the canonical 6-string example above.
Unit: singletons, all-anagrams (["abc","bca","cab"]), no-anagrams (["a","b","c"]).
Edge: empty strings, single-char strings, duplicates (["aa","aa"]).
Large: N = 10⁴, L = 100, mix of group sizes; assert sub-second.
Random: generate random words; verify bucketing matches a reference (e.g., the sorted-string variant).
Invalid: uppercase or non-ASCII (per constraints disallowed; if extended, normalize first).
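The "Random" test above can be sketched as a small cross-check harness. This is one possible setup, assuming a tiny alphabet and short words so that anagram collisions actually occur; the helper names are illustrative.

```python
import random
from collections import defaultdict

def group_by_counts(strs):
    # Count-tuple key: O(L + 26) per word.
    buckets = defaultdict(list)
    for s in strs:
        counts = [0] * 26
        for c in s:
            counts[ord(c) - ord('a')] += 1
        buckets[tuple(counts)].append(s)
    return list(buckets.values())

def group_by_sorted(strs):
    # Reference implementation: sorted-string key.
    buckets = defaultdict(list)
    for s in strs:
        buckets["".join(sorted(s))].append(s)
    return list(buckets.values())

def canonical(groups):
    # Make the comparison order-insensitive: sort inside groups, then sort the groups.
    return sorted(sorted(g) for g in groups)

def random_check(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 50)
        # Alphabet "abc" and lengths 0..4 force many anagram groups and empty strings.
        words = ["".join(rng.choice("abc") for _ in range(rng.randint(0, 4)))
                 for _ in range(n)]
        assert canonical(group_by_counts(words)) == canonical(group_by_sorted(words)), words
    return True
```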

Follow-up Questions

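“What if strings can be Unicode?” → key on a character→count map instead of a fixed array (in Python, frozenset(Counter(s).items())); hashing is more expensive. Or use the sorted string as the key; sorting by code point is enough once strings are normalized (e.g., to NFC).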
“What if the input is streamed?” → emit groups lazily as you find duplicates, but you can’t finalize a group until input ends.
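“What if memory is tight (you can’t store N count arrays)?” → note that the map retains one key per group, not per word. The sorted-string key needs one temporary allocation per word and can be smaller than a 26-int tuple for short words. Alternatively, key on an order-independent (commutative) hash of the character multiset, such as a sum of per-character random values, and resolve collisions with a secondary equality check; an ordinary rolling hash is order-dependent and will not match anagrams.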
“Adversarial input — can the interviewer construct N strings whose count tuples all hash to the same bucket?” → yes for predictable-hash languages; mitigation is randomized hash seeds or a TreeMap fallback.
“Implement without a hash map.” → sort all strings by their sorted-string key, then group consecutive equal keys. O(N · L log L) to build the keys plus O(N log N · L) for the sort, since each key comparison can cost O(L).
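A minimal sketch of the sort-and-group follow-up, assuming the sorted-string key. Groups come out ordered by key rather than by first appearance, which the problem allows.

```python
def group_anagrams_no_hash(strs):
    """Group anagrams by sorting (key, word) pairs instead of hashing."""
    keyed = sorted(("".join(sorted(s)), s) for s in strs)
    groups = []
    prev_key = None
    for key, s in keyed:
        if groups and key == prev_key:
            groups[-1].append(s)       # same key as the previous pair: same group
        else:
            groups.append([s])         # new key: start a new group
            prev_key = key
    return groups
```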

Product Extension

A duplicate-document detector. Each document is hashed by a content fingerprint (e.g., sorted shingles); documents with the same fingerprint are grouped. The same data-structure pattern (hash by canonical form) underlies large-scale dedup at file-storage and email systems. Discuss false positives (two different docs with the same fingerprint), the role of secondary equality check, and the tradeoff between fingerprint cost and accuracy.
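A toy sketch of the same "hash by canonical form, then verify" pattern for documents. Everything here is a hypothetical illustration: the canonical form (sorted set of lowercase 2-word shingles), SHA-256 as the fingerprint, and the function names are assumptions, not a description of any particular production system.

```python
import hashlib
from collections import defaultdict

def canonical_shingles(text, k=2):
    # Hypothetical canonical form: sorted set of lowercase k-word shingles.
    words = text.lower().split()
    count = max(1, len(words) - k + 1)
    return tuple(sorted({" ".join(words[i:i + k]) for i in range(count)}))

def fingerprint(canon):
    return hashlib.sha256("\n".join(canon).encode("utf-8")).hexdigest()

def group_duplicates(docs):
    buckets = defaultdict(list)          # fingerprint -> [(canonical form, doc)]
    for doc in docs:
        canon = canonical_shingles(doc)
        buckets[fingerprint(canon)].append((canon, doc))
    groups = []
    for entries in buckets.values():
        # Secondary equality check: split a bucket whose canonical forms differ,
        # which guards against fingerprint collisions.
        by_canon = defaultdict(list)
        for canon, doc in entries:
            by_canon[canon].append(doc)
        groups.extend(by_canon.values())
    return groups
```

The design mirrors the anagram solution: the fingerprint plays the role of the hash, and the canonical-form comparison plays the role of equals.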

Language/Runtime Follow-ups

Python: tuple(counts) is hashable; a Counter is not, but frozenset(c.items()) is. defaultdict(list) is the idiom.
Java: build a String key, or wrap the int[] in a custom class whose hashCode/equals delegate to Arrays.hashCode/Arrays.equals. The classic interview shortcut is to convert the count array to a string like "1#0#0#…#1". Beware HashMap<int[], List<String>>: int[] does not override hashCode/equals, so keys use object identity. This is one of the most common Java bugs on this problem.
Go: map keys must be comparable. [26]int is comparable; []int is not. Use the array.
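C++: std::array<int, 26> works directly as a std::map key (it defines operator<); for std::unordered_map, pass a hash functor (e.g., boost::hash or your own) as the third template argument. Or stringify.
JS/TS: Map keys can be any value, but arrays and objects are compared by reference. Use a string key such as counts.join(',') in a Map<string, string[]>.
Adversarial keys: Java’s String.hashCode is well known and deterministic, which enables hash flooding. Since Java 8, HashMap mitigates this by converting a bucket into a balanced tree once it grows past 8 entries (provided the table has at least 64 slots), so lookups in a flooded bucket cost O(log n) for Comparable keys such as String.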
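For the Unicode follow-up, a minimal Python sketch that keys on a frozen Counter. The function name is illustrative, and the sketch assumes inputs are already normalized (for example with unicodedata.normalize("NFC", s)); without normalization, precomposed and decomposed forms of the same character count as different characters.

```python
from collections import Counter, defaultdict

def group_anagrams_unicode(strs):
    """Group anagrams over arbitrary code points using a Counter-derived key."""
    buckets = defaultdict(list)
    for s in strs:
        # Counter itself is unhashable; freeze its (char, count) pairs instead.
        key = frozenset(Counter(s).items())
        buckets[key].append(s)
    return list(buckets.values())
```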

Common Bugs

Java int[] as map key — uses object identity, not value equality, so every word gets its own entry. Fix: stringify, or use a wrapper whose hashCode/equals call Arrays.hashCode/Arrays.equals.
Mutating the count array between map ops — if you reuse one buffer and mutate, your inserted keys all alias the same buffer. Allocate fresh per word.
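Bad index from ord(c) - ord('a') — non-lowercase input produces a negative or out-of-range index; in Python a negative index silently wraps around to the end of the list instead of raising.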
Empty-string handling — count array is all zeros; should still bucket correctly. Verify.
Returning dict.values() directly in Python — works but the type is dict_values, not list. Wrap with list(...).

Debugging Strategy

Print the keys of the resulting map. If an input known to contain anagrams yields N keys for N strings, your key derivation is wrong (likely identity-based).
For Java: for a hand-crafted anagram pair, assert both myKey.hashCode() == myOtherKey.hashCode() and myKey.equals(myOtherKey).
Time on N=10⁴: should run in well under a second.

Mastery Criteria

Selected the count-tuple approach within 60 seconds, explaining why over the sorted-string approach.
Stated the equality/hashcode contract and how it affects key choice in Java.
Identified the int[] reference-equality trap (or its language equivalent) before coding.
Articulated the adversarial-input concern and the language’s defense.
Wrote a clean implementation in under 8 minutes.
Tested with the empty-string and all-duplicates edge cases.

Lab 04 — Linked List Pointers: Reverse Linked List

Goal

Master pointer manipulation under aliasing, the classic three-pointer iterative reverse, the recursive variant with stack-frame analysis, and the dummy-node technique. The deliverable: reverse a singly linked list iteratively and recursively, articulating exactly which references move when, and identifying the recursion-depth risk on long lists.

Background Concepts

Pointers as references; aliasing; null sentinels; recursion stack frames; tail-call elimination (or absence thereof). Review the Linked Lists section of the Phase 1 README, plus runtime concepts 3 (value vs reference) and 10 (recursion depth).

Interview Context

Reverse Linked List is the warm-up question at every FAANG. The signal isn’t whether you can do it — most candidates can — it’s whether you can do it cleanly, in two ways, on the whiteboard, while talking through pointer movement. It’s also the “are you ready for harder linked-list problems” gate.

Problem Statement

Given the head of a singly linked list 1 → 2 → 3 → 4 → 5 → null, return the head of the reversed list 5 → 4 → 3 → 2 → 1 → null. The original list nodes are reused (no new node allocation).

Constraints

0 ≤ N ≤ 5000 (LeetCode classic) — but realistic interview lists may have N up to 10⁵; recursion depth matters.
-5000 ≤ node.val ≤ 5000 (irrelevant for traversal logic).

Clarifying Questions

Is the list singly or doubly linked? (Singly — affects whether we need to update prev pointers.)
Is head ever null? (Yes — return null. Top edge case.)
Single-node list? (Return the same node; its next is already null.)
Should the reversal be in place (reuse nodes) or allocate new nodes? (In place is the standard; allocating new nodes is a different problem.)
Should I also support reversing a segment [m, n]? (That’s a follow-up — see “Reverse Linked List II”.)

Examples

Input	Output
`1 → 2 → 3 → 4 → 5`	`5 → 4 → 3 → 2 → 1`
`1 → 2`	`2 → 1`
`1`	`1`
`null`	`null`

Initial Brute Force

Walk the list, push values onto a stack, walk again and reassign values from the stack.

def reverse_brute(head):
    vals = []
    cur = head
    while cur:
        vals.append(cur.val)
        cur = cur.next
    cur = head
    while cur:
        cur.val = vals.pop()
        cur = cur.next
    return head

Brute Force Complexity

Time: O(N). Space: O(N) auxiliary. Two passes. Doesn’t reverse pointers — only mutates values, which violates the spirit (and breaks if val is immutable, e.g., final field).

Optimization Path

We want O(1) extra space by manipulating pointers directly. Two canonical approaches:

Iterative three-pointer: prev, cur, next. Walk forward, flip cur.next to prev, advance.
Recursive: reverse the tail, then attach the head behind it. Beautiful but O(N) stack.

Iterative is preferred for production (no stack-overflow risk). Recursive is preferred for explaining the idea. Strong candidates write both.

Final Expected Approach

Iterative, three pointers.

def reverse_list(head):
    prev = None
    cur = head
    while cur:
        nxt = cur.next   # save the rest of the list
        cur.next = prev  # flip
        prev = cur       # advance prev
        cur = nxt        # advance cur
    return prev          # prev is the new head

Recursive form:

def reverse_list_rec(head):
    if head is None or head.next is None:
        return head
    new_head = reverse_list_rec(head.next)
    head.next.next = head
    head.next = None
    return new_head

Data Structures Used

The input list itself; three local pointers. No new allocation.

Correctness Argument

Iterative invariant: before each iteration, the sub-list ending at prev is fully reversed and cur points at the head of the not-yet-reversed remainder. The body of the loop preserves the invariant: we save cur.next, flip cur.next to point at the reversed prefix, then advance prev and cur by one. When cur is None, the entire input has been processed and prev is the head of the reversed list.

Recursive correctness: by induction on length. Base: a list of length 0 or 1 is its own reverse. Inductive step: assume reverse_list_rec(head.next) correctly reverses the tail and returns its new head. The call never touches head itself, so head.next still points at the original second node, which is now the last node of the reversed tail. Setting head.next.next = head appends head after it; setting head.next = None terminates the list. The result is the reversed tail followed by head, which is the reverse of the whole list.

Complexity

Iterative: O(N) time, O(1) space. Recursive: O(N) time, O(N) space due to the call stack.

Implementation Requirements

Three named pointers: prev, cur, nxt (or next — but watch out for shadowing built-ins in some languages).
Initialize prev = null so the original head becomes the new tail. Top bug: initializing it to head instead makes the first node point at itself.
Save cur.next before overwriting it. Forgetting to save loses the rest of the list.
Return prev, not cur (which is null at termination).
For recursion: handle the base case (head is None or head.next is None) first.

Tests

Smoke: 1 → 2 → 3 → 4 → 5.
Unit: length 1, length 2, length 3.
Edge: null head; list with all-equal values; list with cycle (should not be passed in — but if defensive, detect with Floyd’s).
Large: N = 10⁵; if recursive, expect RecursionError in Python (unless you raise sys.setrecursionlimit) and StackOverflowError in Java at default stack sizes.
Random: build random lists, reverse, reverse again, assert equality with the original.
Invalid: ensure the original head’s next is null after reversal (it’s now the tail).
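The "Random" and post-condition tests above can be sketched as follows. ListNode, from_list, and to_list are illustrative helpers (the constructor parameter is named next_node to avoid shadowing the built-in next); to_list refuses to walk more nodes than expected, which doubles as a cycle check.

```python
import random

class ListNode:
    def __init__(self, val=0, next_node=None):
        self.val = val
        self.next = next_node

def from_list(vals):
    dummy = ListNode()
    tail = dummy
    for v in vals:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next

def to_list(head, limit):
    # Walk at most `limit` nodes; a longer walk means a cycle or a wiring bug.
    out = []
    while head is not None:
        if len(out) == limit:
            raise AssertionError("more nodes than expected (cycle?)")
        out.append(head.val)
        head = head.next
    return out

def reverse_list(head):
    prev, cur = None, head
    while cur:
        nxt = cur.next
        cur.next = prev
        prev, cur = cur, nxt
    return prev

def reverse_twice_check(trials=200, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        vals = [rng.randint(-5000, 5000) for _ in range(rng.randint(0, 30))]
        head = from_list(vals)
        rev = reverse_list(head)
        assert to_list(rev, len(vals)) == vals[::-1]
        if vals:
            assert head.next is None          # original head is now the tail
        assert to_list(reverse_list(rev), len(vals)) == vals
    return True
```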

Follow-up Questions

“Reverse a sublist [m, n].” → “Reverse Linked List II” — needs a dummy node and careful pointer wiring.
“Reverse in groups of K.” → “Reverse Nodes in K-Group” — apply the iterative reverse on each chunk.
“Reverse a doubly linked list.” → swap prev/next per node.
“Detect and handle a cycle before reversing.” → Floyd’s tortoise and hare.
“Iterative without saving next (write it as a swap).” → trickier; usually a teaching exercise.
“Why is iterative preferred in production?” → no stack-overflow risk on long lists.
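For the sublist follow-up, a minimal sketch of one common dummy-node approach, assuming 1-indexed positions with 1 ≤ m ≤ n ≤ length. The names reverse_between, build, and values are illustrative.

```python
class ListNode:
    def __init__(self, val=0, next_node=None):
        self.val = val
        self.next = next_node

def reverse_between(head, m, n):
    """Reverse positions m..n (1-indexed, inclusive) in place."""
    dummy = ListNode(0, head)            # dummy absorbs the m == 1 case
    before = dummy
    for _ in range(m - 1):               # stop on the node just before position m
        before = before.next
    seg_tail = before.next               # first segment node; becomes its tail
    prev, cur = None, seg_tail
    for _ in range(n - m + 1):           # three-pointer reverse on the segment only
        nxt = cur.next
        cur.next = prev
        prev, cur = cur, nxt
    before.next = prev                   # prev: new head of the segment
    seg_tail.next = cur                  # cur: first node after the segment
    return dummy.next

def build(vals):
    head = None
    for v in reversed(vals):
        head = ListNode(v, head)
    return head

def values(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```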

Product Extension

A document undo/redo stack implemented as a linked list. To replay actions in reverse temporal order, you reverse the list. In-place reversal is preferred because the list nodes carry references to large objects (action payloads) and reallocation would be expensive. The null-handling and dummy-node patterns transfer directly to LRU-cache implementations and free-list management.

Language/Runtime Follow-ups

Python: no tail-call elimination; sys.setrecursionlimit(N+100) for deep lists. Default recursion limit is 1000.
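Java: the default thread stack is typically 512 KB to 1 MB (tunable with -Xss), which holds thousands to tens of thousands of frames depending on frame size; expect StackOverflowError for the recursive version on N = 10⁵+.
Go: goroutine stacks start small (2 KB in current Go releases) and grow automatically. Recursion is safe for moderate N. Pointers are explicit (*ListNode).
C++: the main-thread stack is usually 1–8 MB, so whether recursion overflows depends on frame size and N. Use -fsanitize=address to catch use-after-free if you mis-rewire.
JS/TS: V8 (Node.js, Chrome) does not perform tail-call optimization, and support in other engines is inconsistent. Iterative is the only safe choice for large N.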
Pointer aliasing: mutating cur.next while another reference (e.g., head) still points to the same node is exactly the operation we want — but only because we intentionally preserve the old next in nxt first.
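A small Python sketch of the recursion-depth risk: it builds a list slightly longer than the interpreter's current recursion limit, shows that the recursive version raises RecursionError, and shows that the iterative version handles the same length. The helper names are illustrative.

```python
import sys

class ListNode:
    def __init__(self, val=0, next_node=None):
        self.val = val
        self.next = next_node

def build(n):
    head = None
    for v in range(n, 0, -1):            # builds 1 -> 2 -> ... -> n
        head = ListNode(v, head)
    return head

def reverse_list(head):
    prev, cur = None, head
    while cur:
        nxt = cur.next
        cur.next = prev
        prev, cur = cur, nxt
    return prev

def reverse_list_rec(head):
    if head is None or head.next is None:
        return head
    new_head = reverse_list_rec(head.next)
    head.next.next = head
    head.next = None
    return new_head

def recursion_demo():
    n = sys.getrecursionlimit() + 100    # deeper than the interpreter allows
    try:
        reverse_list_rec(build(n))
        recursive_ok = True
    except RecursionError:
        recursive_ok = False
    head = reverse_list(build(n))        # iterative: constant stack depth
    return recursive_ok, head.val
```

The returned pair is (False, n): the recursive call fails, and the iterative result starts at the old tail.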

Common Bugs

Losing the rest of the list — overwriting cur.next before saving it. Symptom: the result has a single element (the original head). Fix: always save first.
Forgetting to set the original head’s next to null — in recursive form, omitting head.next = None leaves the original head pointing at its original successor, which now points back at it: a two-node cycle at the tail of the result.
Returning head instead of prev — returns the now-tail of the reversed list. Always return prev.
Initializing prev to head instead of null — first iteration creates a self-loop.
Using next as a variable name in Python — shadows the built-in next() function. Harmless here but tags you as junior.

Debugging Strategy

Hand-trace on a 3-node list. Draw arrows. After each iteration, write down where prev, cur, nxt point.
After running, walk the result and assert it terminates at null within N steps (cycle check).
If output is shortened (only 1 element), you lost the rest — debug the save step.
If output reverses but the last element points back to the previous, you forgot the head.next = None (recursive only).
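The cycle check above can be sketched with Floyd's tortoise and hare. The block includes a deliberately buggy recursive reverse (it omits head.next = None) so the cycle it creates can be observed; all names are illustrative.

```python
class ListNode:
    def __init__(self, val=0, next_node=None):
        self.val = val
        self.next = next_node

def build(vals):
    head = None
    for v in reversed(vals):
        head = ListNode(v, head)
    return head

def reverse_rec_buggy(head):
    # Deliberately buggy: omits head.next = None.
    if head is None or head.next is None:
        return head
    new_head = reverse_rec_buggy(head.next)
    head.next.next = head
    return new_head

def has_cycle(head):
    # Floyd: slow moves 1 step, fast moves 2; they meet only if there is a cycle.
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```

On 1 → 2 → 3, the buggy version returns 3 → 2 → 1 → 2 → 1 → …, and has_cycle reports it.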

Mastery Criteria

Wrote the iterative version cleanly in under 90 seconds.
Wrote the recursive version on demand and explained the inductive correctness argument.
Identified the null head and length-1 edge cases without prompting.
Stated why iterative is safer for production.
Drew the three-pointer dance on a whiteboard (or in comments) for one full iteration.
Acknowledged that recursive depth = N and called out the stack risk.

Lab 05 — Stack & Queue Applications: Valid Parentheses + Min Stack

Goal

Master the stack as a structural matching tool, the dual-stack technique for augmented operations, and the queue/deque distinction. The deliverable: validate balanced bracket strings in linear time, then extend to a Min Stack supporting O(1) push, pop, top, getMin.

Background Concepts

LIFO discipline; stack invariants; dual-stack trick for tracking auxiliary state; deque vs queue. Review the Stacks and Queues sections of the Phase 1 README, plus runtime concept 1 (stack vs heap memory).

Interview Context

Valid Parentheses is the warm-up; Min Stack is the follow-up. Together they probe whether you grasp the stack as a general tool (not just a recursion bookkeeping device). Asked at Amazon, Google, Bloomberg, Microsoft. The signal: do you generalize from “match ()” to “match (){}[]”? Do you reach for the dual-stack trick on Min Stack instead of O(N) getMin?

Problem Statement

Part A (Valid Parentheses): Given a string s of bracket characters from (){}[], return true iff every opener is matched with the correct closer in the correct order.

Part B (Min Stack): Design a stack that supports push(x), pop(), top(), and getMin() all in O(1).

Constraints

A: 1 ≤ |s| ≤ 10^4; s contains only the six bracket characters.
B: pop, top, getMin are not called on an empty stack; up to 3 · 10^4 operations.

Clarifying Questions

A: Are non-bracket characters possible? (Per constraints, no — but if extended, ignore them.)
A: Is the empty string valid? (Conventionally yes — vacuous truth.)
B: Are integer values bounded? (Affects whether int suffices.)
B: Is getMin of an empty stack defined? (Per constraints, never called on empty.)
B: Should top and pop be separate, or is pop returning the value acceptable? (LeetCode classic: separate. Match the spec.)

Examples

Input	Output
`"()"`	`true`
`"()[]{}"`	`true`
`"(]"`	`false`
`"([)]"`	`false` (interleaved, not nested)
`"{[]}"`	`true`
`""`	`true`

B: Sequence push(-2), push(0), push(-3), getMin() → -3, pop(), top() → 0, getMin() → -2.

Initial Brute Force

A: Repeatedly scan for adjacent matching pairs (), [], {} and remove them. If string empties, valid; else invalid.

def valid_brute(s):
    while True:
        new = s.replace("()", "").replace("[]", "").replace("{}", "")
        if new == s: break
        s = new
    return s == ""

B: Push x to a normal stack. getMin walks the entire stack each call.

Brute Force Complexity

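A: O(N²) worst case: each pass costs O(N), and on deeply nested input such as "(((())))" a pass removes only the innermost pair, so up to N/2 passes are needed. B: getMin is O(N) per call, total O(N²) for N operations.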

Optimization Path

A: Single pass with a stack: push openers, on closer pop and verify match. B: Maintain a parallel min-stack so each push records the current minimum. On pop, also pop from min-stack. getMin returns the top of min-stack.

Final Expected Approach

A — Valid Parentheses:

def is_valid(s):
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for c in s:
        if c in '([{':
            stack.append(c)
        else:
            if not stack or stack.pop() != pairs[c]:
                return False
    return not stack

B — Min Stack:

class MinStack:
    def __init__(self):
        self.s = []      # main stack
        self.m = []      # min stack: m[i] = min of s[0..i]

    def push(self, x):
        self.s.append(x)
        self.m.append(x if not self.m else min(x, self.m[-1]))

    def pop(self):
        self.s.pop()
        self.m.pop()

    def top(self):
        return self.s[-1]

    def getMin(self):
        return self.m[-1]

Data Structures Used

A: A single stack of opener characters.
B: Two parallel stacks, both supporting O(1) push/pop.

Correctness Argument

A: Loop invariant — at each iteration, the stack contains the unclosed openers of the prefix of s consumed so far, in order. A closer is valid iff it matches the most-recent opener (LIFO). At end, an empty stack means all openers were closed in order.

B: Invariant — m[i] is min(s[0..i]). When we push x, the new minimum is min(x, current_min) — pure local computation. When we pop, both stacks shrink; the new top of m is correct because it was correct before the corresponding push. Hence getMin is m[-1], O(1).

Complexity

A: O(N) time, O(N) space (worst case, all openers). B: O(1) time per operation, O(N) total space.

Implementation Requirements

Use a single map closer → opener to avoid six-way if/else.
For Min Stack, use two stacks (or one stack of pairs); never recompute min by scanning.
Don’t pre-validate s (e.g., for invalid characters) unless the problem demands.
Handle the empty stack case before popping in is_valid.

Tests

A smoke: "()[]{}" valid; "(]" invalid.
A unit: unbalanced opener-only "(("; closer-first ")"; nested correctly "{[]}"; interleaved "([)]".
A edge: empty string; single character.
A large: 5 · 10³ openers followed by 5 · 10³ closers (the maximum length under the constraints); should still run in milliseconds.
B smoke: the canonical sequence above.
B edge: push the same value twice, pop, ensure min is still correct (this is the “duplicate-min trap” — naive single-stack solutions fail here).
Random: generate random op sequences; cross-check against a “min via scan” reference.
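The random cross-check for Min Stack can be sketched like this. It drives the two-stack class and a plain list with the same random operations and compares against a scan-based minimum; the narrow value range is an assumption chosen to force duplicate minimums.

```python
import random

class MinStack:
    def __init__(self):
        self.s = []      # main stack
        self.m = []      # m[i] = min of s[0..i]

    def push(self, x):
        self.s.append(x)
        self.m.append(x if not self.m else min(x, self.m[-1]))

    def pop(self):
        self.s.pop()
        self.m.pop()

    def top(self):
        return self.s[-1]

    def getMin(self):
        return self.m[-1]

def min_stack_check(ops=5000, seed=2):
    rng = random.Random(seed)
    ms, ref = MinStack(), []
    for _ in range(ops):
        if ref and rng.random() < 0.4:
            ms.pop()
            ref.pop()
        else:
            x = rng.randint(-3, 3)           # narrow range forces duplicate minimums
            ms.push(x)
            ref.append(x)
        if ref:
            assert ms.top() == ref[-1]
            assert ms.getMin() == min(ref)   # O(N) scan reference
    return True
```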

Follow-up Questions

A: “What if s may contain other characters (letters, digits)?” → ignore them or treat as “skip.”
A: “Return the index of the first invalid bracket.” → modify the loop to return i instead of False.
A: “Generate all valid bracket strings of length 2N.” → that’s “Generate Parentheses” (Lab 08).
B: “Use only one stack.” → store (value, current_min) as pairs.
B: “Use only constant extra space (no parallel stack).” → encoding trick: store 2x - currentMin when x < currentMin, then decode on pop. Watch for overflow.
B: “Add getMax.” → add a third parallel stack.
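A minimal sketch of the encoding trick for the constant-extra-space follow-up. The class name is illustrative. When x is a new minimum, the stack stores 2x − min, which is always smaller than x, so any stored value below the current minimum marks an encoded entry; popping it recovers the previous minimum as 2·min − stored. Python integers do not overflow; in Java or C++ the stored values need a wider type such as long.

```python
class MinStackEncoded:
    """Min stack using one list plus one scalar (encoded-minimum trick)."""
    def __init__(self):
        self.s = []
        self.cur_min = None

    def push(self, x):
        if not self.s:
            self.s.append(x)
            self.cur_min = x
        elif x < self.cur_min:
            self.s.append(2 * x - self.cur_min)  # encoded value, always < x
            self.cur_min = x
        else:
            self.s.append(x)

    def pop(self):
        y = self.s.pop()
        if y < self.cur_min:                     # encoded entry: restore previous min
            self.cur_min = 2 * self.cur_min - y
        if not self.s:
            self.cur_min = None

    def top(self):
        y = self.s[-1]
        return self.cur_min if y < self.cur_min else y  # encoded top is the min itself

    def getMin(self):
        return self.cur_min
```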

Product Extension

A real-time expression evaluator for a spreadsheet engine. As users type formulas, you validate parenthesization on every keystroke (must be O(N) for snappy UX) and maintain a “min/max running aggregate” for selected cells (Min Stack pattern). The two-stack technique generalizes to maintaining any associative aggregate over a stack-shaped sliding context.
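A sketch of the generalization, assuming the aggregate is supplied as a two-argument combine function. The class name AggStack is ours. Each entry stores its value together with the running aggregate of everything at or below it, so the aggregate is read in O(1) and pop needs no recomputation.

```python
class AggStack:
    """Stack that reports a running aggregate of all elements in O(1)."""
    def __init__(self, combine):
        self.combine = combine
        self.items = []                  # (value, aggregate of items[0..i])

    def push(self, x):
        agg = x if not self.items else self.combine(self.items[-1][1], x)
        self.items.append((x, agg))

    def pop(self):
        return self.items.pop()[0]

    def aggregate(self):
        return self.items[-1][1]
```

AggStack(min) reproduces Min Stack, AggStack(max) gives getMax, and a sum function gives a running total.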

Language/Runtime Follow-ups

Python: list is a fast stack via append / pop(). Don’t use list.pop(0) (O(N)).
Java: prefer ArrayDeque over Stack (the latter is synchronized, slower, and inherits from Vector). Deque<Integer> with push / pop / peek.
Go: slices as stacks: s = append(s, x) and x, s = s[len(s)-1], s[:len(s)-1].
C++: std::stack (LIFO adapter) or just std::vector. std::stack’s pop returns void; use top then pop.
JS/TS: Array.push / Array.pop are O(1) amortized. The same array is fine.
Memory: stacks here grow on the heap (the data structure), even though the conceptual abstraction is named “stack.” Don’t confuse with the call stack.

Common Bugs

A — popping an empty stack — Python raises IndexError. Check if not stack first.
A — accepting "((" — forgetting the final if stack check. The string ends with openers still on the stack.
A — wrong pair table — the map must go closer → opener: {')': '(', ']': '[', '}': '{'}. Inverting it (opener → closer) is an easy typo.
B — naive getMin — scanning the stack is O(N), violating the contract.
B — duplicate-min handling — if you maintain “the min” as a single field and pop the value equal to it without secondary tracking, the min is wrong after pop. Two-stack design avoids this.
B — pop on empty stack — per constraints not called, but if defensive, raise.

Debugging Strategy

A: print the stack and the current char at each step; trace "([)]".
B: print both stacks after each op; verify m[i] == min(s[0..i]).
For the duplicate-min trap, manually trace push(-1), push(-1), pop(), getMin() → must still be -1.
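The trace can be reproduced against a space-saving variant of the two-stack design that records a value on m only when it is a new minimum. This variant and its push_on_equal flag are illustrative assumptions: with a strict comparison the duplicate minimum is lost, while recording equal values too keeps getMin correct.

```python
class ConditionalMinStack:
    """Two stacks, but m only records new minimums (illustrative variant)."""
    def __init__(self, push_on_equal):
        self.s = []
        self.m = []
        self.push_on_equal = push_on_equal   # False reproduces the duplicate-min bug

    def push(self, x):
        self.s.append(x)
        if not self.m or x < self.m[-1] or (self.push_on_equal and x == self.m[-1]):
            self.m.append(x)

    def pop(self):
        x = self.s.pop()
        if x == self.m[-1]:                  # popped value was a recorded minimum
            self.m.pop()

    def getMin(self):
        return self.m[-1] if self.m else None   # None exposes the lost minimum
```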

Mastery Criteria

Wrote is_valid cleanly in under 4 minutes; under 5 lines of logic.
Recognized the closer-table pattern over a six-way conditional.
Designed Min Stack with two stacks; explained why one stack with a single min field fails on duplicates.
Sketched the “encoded delta” optimization without needing it.
Handled the empty-stack defensive checks.
Selected ArrayDeque over Stack in Java without prompting.

Lab 06 — Heap Priority: Kth Largest In A Stream

Goal

Master the binary heap as the canonical streaming top-K device, the min-heap-of-size-K trick, and the cost model for push/pop. The deliverable: an online data structure that, after initialization from the starting array, returns the Kth largest element in O(log K) per add.

Background Concepts

Binary heap as an array; sift-up / sift-down; min-heap vs max-heap; heapify is O(N); push/pop are O(log N). Review the Heaps section of the Phase 1 README and runtime concept 1 (stack vs heap memory — note the distinction between the call stack and the heap data structure).

Interview Context

Asked at Amazon, Apple, Bloomberg, and any role touching streaming systems. The signal: do you reach for a heap immediately when “online K-th largest” is mentioned? Do you choose a min-heap of size K (not a sorted list, not a max-heap)? Do you state the O(log K) per add?

Problem Statement

Design a KthLargest(k, nums) class. The constructor receives the integer k and an initial array nums. The method add(val) inserts val into the stream and returns the Kth largest element among all elements seen so far.

Constraints

1 ≤ k ≤ 10^4
0 ≤ |nums| ≤ 10^4
-10^4 ≤ val, nums[i] ≤ 10^4
At most 10^4 calls to add.
Guaranteed: at the time of any add return, there are at least k elements seen.

Clarifying Questions

Is k fixed for the lifetime of the object? (Yes — set once.)
Are duplicates allowed? (Yes — add(5) twice keeps both.)
What if fewer than k elements have been seen? (Per constraints, won’t happen at return time. Confirm.)
Is “Kth largest” 1-indexed? (Yes — K=1 is the maximum.)
Streaming: do we ever remove elements? (No — additions only.)

Examples

KthLargest(3, [4, 5, 8, 2])
add(3) → 4    // sorted desc: 8, 5, 4, 3, 2 → 3rd is 4
add(5) → 5    // 8, 5, 5, 4, 3, 2 → 3rd is 5
add(10) → 5   // 10, 8, 5, 5, 4, 3, 2 → 3rd is 5
add(9) → 8    // 10, 9, 8, 5, 5, 4, 3, 2 → 3rd is 8
add(4) → 8    // 10, 9, 8, 5, 5, 4, 4, 3, 2 → 3rd is 8

Initial Brute Force

Maintain a sorted list. On add, insert in sorted order (O(N)) and read index N-k.

import bisect

class KthLargestBrute:
    def __init__(self, k, nums):
        self.k = k
        self.arr = sorted(nums)

    def add(self, val):
        bisect.insort(self.arr, val)   # binary search for the slot, then O(N) shift
        return self.arr[-self.k]       # Kth largest = Kth from the end

Brute Force Complexity

bisect.insort is O(log N) for the search but O(N) for the insertion (elements after the slot shift right). Total O(N) per add. For 10⁴ adds on top of 10⁴ initial elements, that is up to about 2 · 10⁸ element moves in the worst case. Borderline.

Optimization Path

We don’t need to track all elements. Only the top K. A min-heap of size K keeps the K largest seen, with the smallest of them at the top — that’s the Kth largest.

On add: push, then if size > K, pop (the smallest, which is no longer in the top K).
Return heap[0].

Final Expected Approach

import heapq

class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.heap = []
        for x in nums:
            self.add(x)

    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)
        return self.heap[0]

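For the constructor, you can do somewhat better: heapify the first k elements (O(k)), then for each remaining element, replace the top (heapq.heapreplace) if the element beats it, else skip it. The worst case is still O(N log K), but skipped elements cost O(1) each. The simple version above is acceptable and has the same worst-case bound.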
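A minimal sketch of that constructor, using heapq.heapify and heapq.heapreplace from the standard library. The class name KthLargestFast is illustrative; add follows the same replace-if-larger rule so the heap never exceeds k elements.

```python
import heapq

class KthLargestFast:
    def __init__(self, k, nums):
        self.k = k
        self.heap = list(nums[:k])
        heapq.heapify(self.heap)                 # O(k)
        for x in nums[k:]:
            if x > self.heap[0]:                 # beats the current Kth largest
                heapq.heapreplace(self.heap, x)  # pop the top, push x: O(log k)

    def add(self, val):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, val)       # still filling up to k
        elif val > self.heap[0]:
            heapq.heapreplace(self.heap, val)
        return self.heap[0]
```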

Data Structures Used

A binary min-heap. Underlying storage: a dynamic array. Capacity: K.

Correctness Argument

Invariant: self.heap contains the K largest values seen so far (when ≥ K have been seen), and self.heap[0] is the minimum of those — i.e., the Kth largest overall.

Before the call, the heap holds the K largest values seen so far. Every member of the new top K is either val or one of those K, so the new top K lies inside heap ∪ {val}. After heappush(val) the heap holds exactly these K+1 candidates; popping the smallest leaves the K largest overall, and self.heap[0], their minimum, is the Kth largest overall.

Complexity

Constructor: O(N log K) using the per-element approach; O(K + (N − K) log K) worst case using bottom-up heapify on the first K, then heapreplace for each remaining element that beats the top.
add: O(log K).
Space: O(K).

Implementation Requirements

Use the language’s built-in min-heap; don’t roll your own unless asked.
Bound the heap size to K explicitly; if you don’t, you’ve built a sorted set, not the optimization.
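Check the default orientation: Python’s heapq and Java’s PriorityQueue are min-heaps (what this problem needs), while C++’s std::priority_queue is a max-heap by default and needs std::greater. If you need the opposite orientation, negate values or pass a comparator.
Don’t rebuild or copy the heap on every add; mutate it in place.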

Tests

Smoke: the canonical example above.
Unit: K=1 (always returns the max); K equal to the number of elements seen so far (returns the overall min).
Edge: empty nums and a stream that brings size up to K; duplicate values; negative values.
Large: 10⁴ adds of random ints with K=100; check by timing that adds stay fast (timing cannot prove O(log K), but it catches an accidental O(N) add).
Random: maintain a brute-force sorted-list reference; assert equality of returned value on each call.
Invalid: add before reaching K elements (per constraints not happening; if defensive, raise or buffer).
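The "Random" test can be sketched as below, reusing the KthLargest class from the final approach and a sorted-list oracle. It assumes the starting array has at least k − 1 elements, so that every add returns with at least k elements seen, as the constraints guarantee.

```python
import heapq
import random

class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.heap = []
        for x in nums:
            self.add(x)

    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)
        return self.heap[0]

def kth_check(trials=100, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        k = rng.randint(1, 10)
        nums = [rng.randint(-20, 20) for _ in range(rng.randint(k - 1, 30))]
        kl = KthLargest(k, nums)
        seen = list(nums)
        for _ in range(30):
            v = rng.randint(-20, 20)
            seen.append(v)
            assert kl.add(v) == sorted(seen)[-k]   # brute-force oracle
    return True
```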

Follow-up Questions

“What if the stream supports remove(val)?” → switch to a balanced BST or two heaps with lazy deletion.
“Maintain the K smallest.” → max-heap of size K (mirror).
“K-th most frequent element in a stream.” → counter + heap with re-inserts on count change.
“Top K trending hashtags over a sliding 1-hour window.” → heap + circular buffer + lazy deletion of stale entries.
“Implement the min-heap from scratch.” → array-backed, sift-up on push, sift-down on pop, parent at (i-1)//2, children at 2i+1, 2i+2.
“Why O(N) heapify rather than N pushes?” → bottom-up sift-down sums to O(N); pushes sum to O(N log N).
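For the "implement from scratch" and "why O(N) heapify" follow-ups, a minimal array-backed sketch. The class name MinHeap and its method names are illustrative. The constructor heapifies bottom-up by sifting down every internal node, last one first.

```python
class MinHeap:
    """Array-backed binary min-heap: parent (i-1)//2, children 2i+1 and 2i+2."""
    def __init__(self, items=()):
        self.a = list(items)
        # Bottom-up heapify: sift down internal nodes from the last one to the root.
        for i in range(len(self.a) // 2 - 1, -1, -1):
            self._sift_down(i)

    def __len__(self):
        return len(self.a)

    def peek(self):
        return self.a[0]

    def push(self, x):
        self.a.append(x)
        self._sift_up(len(self.a) - 1)

    def pop(self):
        a = self.a
        last = a.pop()
        if not a:
            return last                      # heap had a single element
        top, a[0] = a[0], last               # move the last leaf to the root
        self._sift_down(0)
        return top

    def _sift_up(self, i):
        a = self.a
        while i > 0:
            parent = (i - 1) // 2
            if a[i] >= a[parent]:
                break
            a[i], a[parent] = a[parent], a[i]
            i = parent

    def _sift_down(self, i):
        a = self.a
        n = len(a)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and a[child] < a[smallest]:
                    smallest = child
            if smallest == i:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest
```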

Product Extension

A leaderboard service that streams game scores and surfaces the top 100. Memory budget per shard is tight; the min-heap-of-size-K pattern is the standard approach. Combine with sharding (each shard maintains its own top-100; the aggregator maintains a heap of heap-tops). The same pattern powers “top-N alerts,” “p99 latency tracking,” and “trending content” feeds.
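A toy sketch of the sharded design, assuming each shard keeps a bounded min-heap and the aggregator performs a k-way merge over the shards' sorted results with a heap holding each shard's current best. The function names are illustrative; heapq is min-only, so the merge heap stores negated scores.

```python
import heapq

def shard_top_k(scores, k):
    """Per-shard top K with a min-heap bounded to k elements."""
    heap = []
    for s in scores:
        if len(heap) < k:
            heapq.heappush(heap, s)
        elif s > heap[0]:
            heapq.heapreplace(heap, s)
    return heap

def merge_top_k(shard_results, k):
    """Aggregator: k-way merge using a heap of each shard's current best."""
    lists = [sorted(r, reverse=True) for r in shard_results]
    heap = [(-lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        if j + 1 < len(lists[i]):            # advance within shard i
            heapq.heappush(heap, (-lists[i][j + 1], i, j + 1))
    return out
```

This is correct because the global top K is always contained in the union of the per-shard top Ks.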

Language/Runtime Follow-ups

Python: heapq is min-heap only. For max-heap behavior, push -x and negate on pop. heapq.heapify(list) is O(N) in place.
Java: PriorityQueue defaults to min-heap. PriorityQueue<Integer> pq = new PriorityQueue<>();. Reversed: new PriorityQueue<>(Comparator.reverseOrder()). pq.poll() and pq.peek() are O(log N) and O(1).
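Go: there is no ready-made priority queue type; implement heap.Interface (sort.Interface’s Len, Less, Swap plus Push and Pop) on your own type, then call heap.Init, heap.Push, and heap.Pop from the container/heap package. Verbose.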
C++: std::priority_queue<int> is a max-heap by default. Use std::priority_queue<int, std::vector<int>, std::greater<int>> for a min-heap.
JS/TS: no built-in heap. Must implement or pull a library. This is a not-uncommon interview surprise.
Memory model: the data-structure heap lives in the process heap (not the call stack). Sizes up to 10⁴ are trivial.
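The Python negation idiom, applied to the "maintain the K smallest" mirror follow-up. This is a minimal sketch; the function name is illustrative.

```python
import heapq

def k_smallest_stream(values, k):
    """Keep the k smallest values with a size-k max-heap (stored negated in heapq)."""
    heap = []                            # holds -x, so -heap[0] is the largest kept value
    for x in values:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif x < -heap[0]:               # smaller than the largest kept value
            heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)
```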

Common Bugs

Maintaining a max-heap — works for finding the max, but you’d need to pop K elements (and push them back) on every call. Wrong tool.
Forgetting to bound size to K — heap grows to N; per-add cost becomes O(log N) instead of O(log K) (small impact for small N, but conceptually wrong and uses more memory).
Returning heap[-1] — Python’s heap[-1] is not necessarily the largest; only heap[0] has a guaranteed value (the min). The other positions satisfy only the heap property, not sorted order.
Off-by-one on K — K=1 should track the maximum; if you accidentally maintain K-1 elements, you’re answering the wrong query.
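Java comparator overflow — writing the reversed comparator as (a, b) -> b - a overflows when the operands have large magnitudes and opposite signs (e.g., Integer.MIN_VALUE and 1). Use Integer.compare(b, a) or Comparator.reverseOrder().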

Debugging Strategy

After each add, print the heap. Should be ≤ K elements with the K-th-largest at index 0.
Cross-check against sorted(all_seen)[-K].
For perf: time 10⁴ adds; should be milliseconds.

Mastery Criteria

Selected the min-heap-of-size-K pattern within 30 seconds of hearing “Kth largest streaming.”
Stated the loop invariant aloud.
Wrote the implementation in under 5 minutes.
Identified the K=1 and K=N degenerate cases.
Knew the language idiom: heapq Python, PriorityQueue Java, priority_queue<…, greater<…>> C++.
Mentioned the (a, b) -> b - a overflow trap in Java.
Sketched the O(N) bottom-up heapify alternative for the constructor.

Lab 07 — Binary Search Fundamentals

Goal

Master the half-open invariant [lo, hi), the overflow-safe midpoint, the lower-bound / upper-bound generalizations, and the discipline that makes binary search bug-free. The deliverable: implement Search Insert Position cleanly and explain why your loop terminates.

Background Concepts

Sorted arrays; monotone predicates; loop invariants; integer overflow on (lo + hi) / 2. Review the Sorted Arrays / Sorted Sets section of the Phase 1 README.

Interview Context

Binary search is asked at every FAANG and is one of the most common sources of “I solved it but had off-by-one bugs” complaints. The signal isn’t whether you can find an exact match — it’s whether you can correctly answer “first index where predicate flips from false to true” in one of three loop variants without bugs. Lower-bound and upper-bound are the general tools.

Problem Statement

Given a sorted array nums of distinct integers and a target target, return the index where target is found, or the index where it would be inserted to keep nums sorted.

This is exactly lower_bound (first index i such that nums[i] >= target).

Constraints

1 ≤ |nums| ≤ 10^4
-10^4 ≤ nums[i], target ≤ 10^4
nums is sorted ascending and contains no duplicates.

Clarifying Questions

Are duplicates possible? (Per constraints, no — but the lower-bound formulation handles them: returns leftmost.)
Can nums be empty? (Per constraints no, but the implementation handles it via lo = hi = 0.)
Should we return len(nums) if target exceeds all elements? (Yes — it inserts at the end.)
Is the result expected to be the first match or any match? (For Search Insert, lower-bound semantics: leftmost.)
Recursive or iterative? (Iterative is preferred — no stack growth.)

Examples

`nums`	`target`	Output
`[1, 3, 5, 6]`	5	2
`[1, 3, 5, 6]`	2	1
`[1, 3, 5, 6]`	7	4
`[1, 3, 5, 6]`	0	0
`[1]`	1	0
`[]`	5	0

Initial Brute Force

Linear scan: walk the array, return the first index i where nums[i] >= target. If none, return len(nums).

def insert_pos_brute(nums, target):
    for i, x in enumerate(nums):
        if x >= target:
            return i
    return len(nums)

Brute Force Complexity

O(N) time, O(1) space. For 10⁴ elements with 10⁴ queries, 10⁸ ops — borderline. Misses the entire point of “sorted.”

Optimization Path

Sorted + monotone predicate = binary search. The predicate is nums[i] >= target, monotone false → true as i increases. We want the first true index.

Three loop styles compete:

Closed [lo, hi]: while lo <= hi, terminate at lo > hi.
Half-open [lo, hi): while lo < hi, terminate at lo == hi. Recommended.
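Closed with post-loop check [lo, hi]: while lo < hi, terminate at lo == hi, then check whether nums[lo] actually satisfies the predicate (needed when the answer may not exist).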

Half-open is the cleanest because the answer-pointer lo always satisfies the invariant “all indices < lo have predicate false; all indices >= hi have predicate true.” When lo == hi, that’s the boundary.
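For contrast, here is a sketch of the closed-interval style, written for the same lower-bound query (the name search_insert_closed is illustrative). Every boundary shifts by one: hi starts at len(nums) - 1, the loop runs while lo <= hi, and the true branch sets hi = mid - 1.

```python
def search_insert_closed(nums, target):
    """Closed-interval [lo, hi] lower bound, shown only for comparison."""
    lo, hi = 0, len(nums) - 1        # closed [lo, hi]; empty once lo > hi
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] < target:
            lo = mid + 1             # answer is strictly right of mid
        else:
            hi = mid - 1             # mid is a candidate; keep searching left of it
    return lo                        # lo == hi + 1: first index with nums[i] >= target
```

Both styles return the same index; the half-open form avoids the -1 adjustments, which is where most off-by-one bugs hide.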

Final Expected Approach

def search_insert(nums, target):
    lo, hi = 0, len(nums)            # half-open [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2    # overflow-safe
        if nums[mid] < target:
            lo = mid + 1             # predicate false → exclude mid
        else:
            hi = mid                 # predicate true → keep mid as candidate
    return lo                        # lo == hi; first true index

Data Structures Used

The input array. Three integer indices: lo, hi, mid.

Correctness Argument

Invariant: at every iteration, the answer (the smallest index i such that nums[i] >= target, or len(nums) if none) lies in [lo, hi] (closed interval over the half-open search range). Equivalently: nums[lo-1] < target (or lo == 0) and nums[hi] >= target (or hi == len(nums)).

Body: if nums[mid] < target, predicate at mid is false, so the answer is in [mid+1, hi]. We set lo = mid+1. Otherwise predicate at mid is true; the answer is in [lo, mid]. We set hi = mid. Both branches strictly shrink the range.

Termination: each iteration shrinks hi - lo by at least 1 (since mid is in [lo, hi-1]). Loop exits when lo == hi. Invariant gives us: lo is the answer.

Complexity

O(log N) time. O(1) space.

Implementation Requirements

Use lo + (hi - lo) // 2, never (lo + hi) // 2 — integer overflow in Java/C++/Go for large ints.
Half-open [lo, hi) with hi = len(nums) initial.
Loop condition lo < hi.
Update lo = mid + 1 (exclude mid); hi = mid (include mid as candidate).
Return lo.
Don’t write a three-way if/elif/else (e.g., a separate == branch); lower-bound needs only two branches.

Tests

Smoke: the table above.
Unit: target equals an existing element; target less than all; target greater than all; single-element array (target equal, less, greater).
Edge: empty array → return 0.
Large: N = 10⁵, sorted; binary search with random targets. Each query should take well under a millisecond.
Random: generate sorted random arrays; cross-check against linear scan.
Invalid: array not sorted (undefined behavior; if defensive, assert).
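A sketch of the Random test above: a seeded harness that compares search_insert against the linear scan and against bisect.bisect_left (the harness name random_cross_check and its parameters are assumptions). Array sizes start at 0, so the empty-array edge case is covered too.

```python
import bisect
import random


def insert_pos_brute(nums, target):
    for i, x in enumerate(nums):
        if x >= target:
            return i
    return len(nums)


def search_insert(nums, target):
    lo, hi = 0, len(nums)                  # half-open [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def random_cross_check(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 20)                         # includes the empty array
        nums = sorted(rng.sample(range(-50, 51), n))   # distinct, ascending
        target = rng.randint(-55, 55)                  # also below / above every element
        expected = insert_pos_brute(nums, target)
        assert search_insert(nums, target) == expected, (nums, target)
        assert bisect.bisect_left(nums, target) == expected
    return True
```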

Follow-up Questions

“Find the upper bound (first index where nums[i] > target).” → change the test to nums[mid] <= target; or use bisect.bisect_right.
“Search in a rotated sorted array.” → modify the comparison: identify which half is sorted.
“Search for a peak element.” → binary search on the slope: compare nums[mid] with nums[mid+1] and move toward the higher side.
“First bad version (the predicate is the only oracle).” → same exact loop with is_bad(mid) as the predicate.
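“Search a 2D matrix.” → if each row continues the previous one (LeetCode 74), binary-search the flattened index; if rows and columns are only sorted independently (LeetCode 240), walk a staircase from the top-right corner in O(M + N).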
“Why does lo + (hi - lo) // 2 matter in Python?” → it doesn’t (Python ints are unbounded), but it’s the universal idiom.
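Two of the follow-ups sketched in Python, assuming distinct values for the rotated array (LeetCode 33) and no equal neighbours for the peak search (LeetCode 162, with out-of-bounds treated as -∞). The function names are illustrative. The rotated search uses a closed interval because it looks for an exact match; the peak search reuses the lower-bound-style skeleton (while lo < hi, lo = mid + 1 / hi = mid).

```python
def search_rotated(nums, target):
    """Index of target in a rotated sorted array of distinct ints, or -1."""
    lo, hi = 0, len(nums) - 1              # closed interval: exact-match search
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:           # left half [lo, mid] is sorted
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:                               # right half [mid, hi] is sorted
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1


def find_peak(nums):
    """Index of some peak, assuming nums is non-empty and nums[i] != nums[i + 1]."""
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if nums[mid] < nums[mid + 1]:
            lo = mid + 1                    # rising slope: a peak lies to the right
        else:
            hi = mid                        # falling slope: mid or something left of it
    return lo
```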

Product Extension

A timestamp-indexed log store. Find the first log line at or after a given timestamp: that’s lower_bound. The same primitive powers range queries (lower_bound(start) to upper_bound(end)) and is the basis for B-tree leaf-node lookups. Library functions like bisect, lower_bound, Arrays.binarySearch already implement this; a senior engineer reaches for them, not for a hand-rolled loop.
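A sketch of the log-store range query, assuming the store keeps a sorted timestamps list with a parallel lines list (both hypothetical). bisect_left(start) and bisect_right(end) bound the half-open slice of matching entries, so duplicate timestamps at either edge are included.

```python
import bisect


def logs_in_range(timestamps, lines, start, end):
    """Log lines with start <= timestamp <= end.

    Assumes timestamps is sorted ascending and lines[i] belongs to timestamps[i].
    """
    lo = bisect.bisect_left(timestamps, start)   # first index with ts >= start
    hi = bisect.bisect_right(timestamps, end)    # first index with ts > end
    return lines[lo:hi]
```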

Language/Runtime Follow-ups

Python: bisect.bisect_left(nums, target) is the library answer. Returns exactly the lower-bound index.
Java: Arrays.binarySearch(arr, target) returns either a match index or -(insertion_point) - 1. Decode with result < 0 ? -result - 1 : result. With duplicates it may return any matching index, so it is not a true lower_bound. Note the bit-shift idiom (lo + hi) >>> 1 (unsigned right shift), which avoids overflow for non-negative lo and hi.
Go: sort.SearchInts(arr, target) returns the lower-bound directly.
C++: std::lower_bound(v.begin(), v.end(), target) - v.begin(). Returns an iterator; subtract begin() for the index.
JS/TS: no library. Must implement.
Overflow: Java/C++ ints are 32-bit by default. (lo + hi) can overflow when both ~2³⁰. Use the safe form.

Common Bugs

(lo + hi) // 2 overflow in Java/C++ (32-bit int), and in Go when using int32 (Go's int is 64-bit on 64-bit platforms). Use lo + (hi - lo) // 2, or >>> 1 in Java.
Wrong update direction — lo = mid (instead of mid + 1) on the false branch. Causes infinite loop when lo + 1 == hi.
Closed-interval while lo <= hi with half-open updates — mixing the two styles. Pick one and stick to it.
Returning mid instead of lo — mid is wherever the loop happens to stop, not the answer.
Off-by-one on the initial hi — hi = len(nums) - 1 for closed; hi = len(nums) for half-open.
Forgetting the empty-array case — half-open form handles it naturally (lo = hi = 0); closed form needs an explicit check.

Debugging Strategy

Print lo, hi, mid, and nums[mid] each iteration. The range should strictly shrink.
If you hit an infinite loop, you almost certainly have lo = mid (not mid + 1) on the false branch.
For random testing, compare against bisect.bisect_left as the reference.

Mastery Criteria

Wrote the half-open form from memory in under 2 minutes, no off-by-ones.
Stated the invariant aloud: “all indices < lo are false; all indices ≥ hi are true.”
Identified the overflow trap and used the safe midpoint.
Recognized that Search Insert Position is lower_bound.
Knew the library function in Python, Java, Go, C++.
Solved a follow-up (rotated sorted array OR upper-bound) in under 10 minutes by reusing the same skeleton.

Lab 08 — Recursion & Stack: Generate Parentheses

Goal

Master backtracking with partial-state validity, the recursion tree as a mental model, the bound on recursion depth, and the Catalan-number cost analysis. The deliverable: enumerate all well-formed parenthesizations of N pairs and explain why the count is C_n (the n-th Catalan number).

Background Concepts

Recursion as a tree of choices; partial-state pruning vs full-state validation; recursion depth = call-stack frames; iterative backtracking as an explicit-stack alternative. Review runtime concept 10 (recursion depth) in the Phase 1 README and the Stacks section.

Interview Context

Generate Parentheses is asked at Google, Microsoft, Meta. The signal: do you generate only valid prefixes (prune early) instead of generating all 2^(2n) strings and filtering? Do you know the Catalan-number complexity? Can you also produce an iterative version using an explicit stack?

Problem Statement

Given an integer n, return all combinations of well-formed parentheses using exactly n pairs of ( and ).

Constraints

1 ≤ n ≤ 8 — the count grows as C_n = (2n)! / ((n+1)! n!). C_8 = 1430.
Output order is not specified; any valid enumeration is acceptable.

Clarifying Questions

Should output be sorted? (Usually no — but lexicographic falls out naturally if we always try ( before ).)
Is duplication possible? (No — each generated string is unique by construction.)
Should we return a list or stream the results? (List is canonical; streaming/yield is a follow-up.)
Empty case n = 0? (Per constraints n ≥ 1. If allowed: return [""].)
Are the parentheses always ( and )? (Yes for the canonical problem; brackets and braces is a generalization.)

Examples

`n`	Output
1	`["()"]`
2	`["(())", "()()"]`
3	`["((()))", "(()())", "(())()", "()(())", "()()()"]`
4	14 strings

Initial Brute Force

Generate all 2^(2n) strings of length 2n over {(, )}. Filter by validity (use the stack from Lab 05). Return the survivors.

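def gen_brute(n):
    def is_valid(s):
        # Running-balance check: a stand-in for Lab 05's stack routine
        # (equivalent when there is only one bracket type).
        bal = 0
        for ch in s:
            bal += 1 if ch == "(" else -1
            if bal < 0:          # a ) closed before any matching (
                return False
        return bal == 0

    out = []
    def rec(s):
        if len(s) == 2*n:
            if is_valid(s):
                out.append(s)
            return
        rec(s + "(")
        rec(s + ")")
    rec("")
    return out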

Brute Force Complexity

2^(2n) = 4^n strings; each takes O(n) to validate. Total O(n · 4^n). For n = 8 that is 4^8 ≈ 6.6 × 10⁴ strings, roughly 10⁶ character checks, which is fast; for n = 16 it is about 4.3 × 10⁹ strings, far too many.

Optimization Path

Prune as we build. Track (open, close) counts; the rules are:

We may add ( if open < n.
We may add ) if close < open (otherwise we’d close before opening).

Every leaf of this pruned tree is a valid string; no validation needed. The number of leaves is exactly C_n.

Final Expected Approach

def generate_parentheses(n):
    out = []
    def backtrack(s, opens, closes):
        if len(s) == 2 * n:
            out.append(s)
            return
        if opens < n:
            backtrack(s + "(", opens + 1, closes)
        if closes < opens:
            backtrack(s + ")", opens, closes + 1)
    backtrack("", 0, 0)
    return out

Data Structures Used

The recursion call stack (depth = 2n).
An accumulator string built up by concatenation (or, for efficiency, a list of chars joined at the leaf).
An output list of strings.

Correctness Argument

Soundness: every leaf has length 2n, opens == n, closes == n (else we wouldn’t reach length 2n under the pruning rules). At every prefix, closes ≤ opens (we only added ) when closes < opens). Therefore every leaf is balanced.

Completeness: every prefix of a valid parenthesization satisfies both rules (opens ≤ n and closes ≤ opens), since that is exactly the characterization of valid prefixes. By induction on length, each such prefix is reached: its parent prefix is reached, and the character that extends it is allowed by the rules. Therefore every valid string appears as a leaf.

Uniqueness: at each node we make distinct choices (( vs )), so two leaves cannot have the same string.

Complexity

Number of leaves: C_n = (2n)! / ((n+1)! n!) ≈ 4^n / (n^(3/2) · √π).
Cost per leaf: O(n) to copy the final string.
Total time: O(n · C_n) = O(4^n / √n).
Space: output is O(n · C_n). Recursion stack is O(n).

Implementation Requirements

Two counters: opens, closes. Don’t track the full prefix’s validity — the counters are sufficient.
Termination at len(s) == 2 * n, not when both counters hit n (equivalent, but the length check is clearer).
Pass s immutably (string concat) for clarity, or mutate a list and append/pop for performance — but for n ≤ 8 the difference is negligible.
Don’t generate then filter. The whole point is to not visit invalid branches.

Tests

Smoke: n = 3 → 5 strings.
Unit: n = 1 → ["()"]; n = 2 → 2 strings.
Edge: n = 0 (if allowed) → [""].
Property: count of returned strings equals C_n (compute reference Catalan number).
Property: every string in the output is valid (run Lab 05’s is_valid).
Property: all strings are distinct (len(out) == len(set(out))).
Large: n = 8 returns 1430 strings in milliseconds.
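The property tests above can be run as a small harness. is_balanced is a running-balance stand-in for Lab 05's stack check (equivalent for a single bracket type), and math.comb (Python 3.8+) supplies the reference Catalan number; the helper names are illustrative.

```python
from math import comb


def generate_parentheses(n):
    out = []

    def backtrack(s, opens, closes):
        if len(s) == 2 * n:
            out.append(s)
            return
        if opens < n:
            backtrack(s + "(", opens + 1, closes)
        if closes < opens:
            backtrack(s + ")", opens, closes + 1)

    backtrack("", 0, 0)
    return out


def is_balanced(s):
    """Single-bracket-type validity via a running balance."""
    bal = 0
    for ch in s:
        bal += 1 if ch == "(" else -1
        if bal < 0:
            return False
    return bal == 0


def check_properties(max_n=8):
    for n in range(0, max_n + 1):
        out = generate_parentheses(n)
        assert len(out) == comb(2 * n, n) // (n + 1)   # count is the n-th Catalan number
        assert len(out) == len(set(out))               # all distinct
        assert all(is_balanced(s) for s in out)        # all valid
    return True
```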

Follow-up Questions

“Generate iteratively using an explicit stack.” → push partial states (s, opens, closes); pop and expand.
“Return only the count, not the strings.” → that’s just C_n; closed form: comb(2n, n) // (n + 1).
“Brackets, braces, and parens (multi-type).” → harder; two counters are no longer enough, because each closer must match the most recent unmatched opener. Track a stack of unmatched openers instead.
“Stream results lazily (generator/yield).” → in Python, yield from each leaf; saves memory.
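“Memoize.” → the prefix-based formulation has no overlapping subproblems, because every state (s, opens, closes) is unique. Memoizing on counts alone works only if each entry stores the list of valid completions, and the output size still dominates, so the asymptotics do not improve. For counting only, memoizing on counts is a real win.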
“Why is the count C_n?” → bijections with Dyck paths of length 2n, full binary trees with n + 1 leaves, etc.
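Sketches of the explicit-stack and streaming follow-ups: an iterative version that stores partial states (s, opens, closes), and a generator that yields each leaf lazily. Both are illustrative shapes, not the only ones; pushing ) before ( makes the iterative version emit strings in the same order as the recursive one.

```python
def generate_iterative(n):
    """Explicit-stack version: each stack entry is a partial state."""
    out = []
    stack = [("", 0, 0)]
    while stack:
        s, opens, closes = stack.pop()
        if len(s) == 2 * n:
            out.append(s)
            continue
        # Push ")" first so the "(" branch is popped (explored) first.
        if closes < opens:
            stack.append((s + ")", opens, closes + 1))
        if opens < n:
            stack.append((s + "(", opens + 1, closes))
    return out


def generate_lazy(n, s="", opens=0, closes=0):
    """Generator version: yields each string at its leaf instead of building a list."""
    if len(s) == 2 * n:
        yield s
        return
    if opens < n:
        yield from generate_lazy(n, s + "(", opens + 1, closes)
    if closes < opens:
        yield from generate_lazy(n, s + ")", opens, closes + 1)
```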

Product Extension

A SQL/expression-grammar generator for fuzz testing. Generating syntactically valid parenthesized expressions is a backtracking-with-pruning problem; arbitrary depth-bounded grammars use the same technique. Grammar-based program generators used to fuzz compilers typically follow this same pattern.

Language/Runtime Follow-ups

Python: strings are immutable, so each s + "(" allocates. For larger n, build with a list and "".join(...) at the leaf.
Java: use StringBuilder and delete/setLength at backtrack — the canonical mutable-builder pattern. Pass the builder by reference; remember to undo each append on return.
Go: strings are immutable; use a []byte slice, appending on the way down and reslicing (buf = buf[:len(buf)-1]) to undo. A strings.Builder has no truncation method, so it does not fit the undo step.
C++: use std::string mutated in place with push_back / pop_back. Pass by reference.
JS/TS: strings are immutable; concat is fine for small n. For larger, use an array.
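Recursion depth: 2n. For n ≤ 8, depth ≤ 16, which is trivial. At n = 1000 (academic) the depth of 2000 already exceeds Python's default recursion limit of 1000; Java, C++, and Go handle it with default stack sizes.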
Tail-call optimization: absent in Python and JS; this code isn’t tail-recursive anyway because there are two recursive branches.
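The mutable-builder pattern described for Java and C++ looks like this in Python: one shared list of characters, append before recursing, pop after returning, and "".join(path) only at the leaf. A minimal sketch (the name generate_mutable is illustrative).

```python
def generate_mutable(n):
    """Backtracking over one shared list of chars: append, recurse, pop to undo."""
    out, path = [], []

    def backtrack(opens, closes):
        if len(path) == 2 * n:
            out.append("".join(path))   # O(n) copy only at the leaf
            return
        if opens < n:
            path.append("(")
            backtrack(opens + 1, closes)
            path.pop()                  # undo before trying the other branch
        if closes < opens:
            path.append(")")
            backtrack(opens, closes + 1)
            path.pop()

    backtrack(0, 0)
    return out
```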

Common Bugs

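Adding ) without checking closes < opens — produces prefixes such as ) or ()) that can never become valid, so invalid strings reach the output.
Adding ( without checking opens < n — overshoots: with the len(s) == 2 * n check, unbalanced strings such as (((( (for n = 2) reach the output; with an opens == n and closes == n check, those branches keep recursing until the stack overflows.
Mixing termination checks — opens == n and closes == n is equivalent to len(s) == 2 * n under the pruning rules, so it is not wrong, but the length check is easier to reason about.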
Backtracking with mutation but not undoing — append (, recurse, forget to pop before recursing again. Adds spurious chars.
Catalan miscount — saying complexity is O(2^(2n)) instead of O(4^n / √n) is a forgivable but suboptimal answer.

Debugging Strategy

Print the recursion tree: indent by len(s) and show (s, opens, closes).
Run for n = 2 and verify the output is exactly ["(())", "()()"].
Count outputs and compare to comb(2n, n) // (n + 1).

Mastery Criteria

Identified backtracking-with-pruning as the right tool within 60 seconds.
Wrote the two pruning rules without help.
Stated complexity as O(n · C_n) ≈ O(4^n / √n).
Acknowledged the Catalan-number connection.
Wrote the iterative-with-explicit-stack version on demand.
Selected the appropriate language idiom (StringBuilder, []byte, etc.) and remembered to undo mutations on backtrack.

Lab 09 — Tree Traversal Fundamentals

Goal

Master the three depth-first traversals (preorder, inorder, postorder) in both recursive and iterative forms, plus level-order (BFS). Understand the explicit-stack simulation of recursion, the postorder trick, and Morris traversal as the O(1)-space follow-up. The deliverable: implement iterative inorder cleanly and explain the stack invariant.

Background Concepts

Binary trees; DFS vs BFS; recursion as implicit stack; explicit stack as iterative replacement; visited flags. Review the Trees section and the Stacks section of the Phase 1 README.

Interview Context

Tree traversal is a Day-1 interview topic. Recursive forms are trivial; the interesting signal is iterative inorder (the canonical “implement recursion with a stack” question). Postorder iteratively is harder still — and Morris traversal (O(1) space) shows up in senior interviews.

Problem Statement

Given the root of a binary tree, return the inorder traversal as a list. Implement iteratively (no recursion).

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

Constraints

Number of nodes in [0, 10^4].
-100 ≤ Node.val ≤ 100.
The tree is not necessarily balanced.

Clarifying Questions

What’s the node definition? (As above — confirm with interviewer.)
Empty tree allowed? (Yes — return [].)
Duplicate values? (Allowed; doesn’t affect traversal.)
Are we limited on stack space? (For 10^4 nodes in a degenerate (linked-list) tree, the recursive version exceeds Python's default recursion limit of 1000. Iterative is required.)
Must we use O(1) extra space? (If yes — Morris traversal. Otherwise the explicit stack is fine.)

Examples

Tree	Inorder
`[1, null, 2, 3]` (LeetCode array form)	`[1, 3, 2]`
`[]`	`[]`
`[1]`	`[1]`
BST `[2, 1, 3]`	`[1, 2, 3]` (sorted!)

Initial Brute Force

Recursive — visit left, root, right.

def inorder_recursive(root):
    out = []
    def rec(node):
        if not node: return
        rec(node.left)
        out.append(node.val)
        rec(node.right)
    rec(root)
    return out

Brute Force Complexity

O(N) time, O(H) space (recursion stack, where H = tree height). H = N for a degenerate tree, log N for balanced.

Optimization Path

The iteration cost is the same — O(N) — but recursion uses the system stack which has a fixed limit (~1000 frames in Python by default). For N = 10⁴ in a worst-case skewed tree, recursion crashes. We replace with an explicit stack.

The pattern: “go left as far as possible, pushing each node along the way; when we can’t go further left, pop, visit, then move to right child and repeat.”

Final Expected Approach

def inorder_iterative(root):
    out, stack, node = [], [], root
    while node or stack:
        while node:                  # walk left, pushing
            stack.append(node)
            node = node.left
        node = stack.pop()           # leftmost unvisited
        out.append(node.val)         # visit
        node = node.right            # explore right subtree
    return out

For preorder: visit each node as soon as it is popped, then push its right child before its left so the left subtree is processed first.

def preorder_iterative(root):
    if not root: return []
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node.val)
        if node.right: stack.append(node.right)  # right first → left popped first
        if node.left:  stack.append(node.left)
    return out

For postorder: hardest. Two clean approaches:

Modified preorder + reverse: do preorder but visit right before left; reverse the result.
Visited-flag trick: push (node, visited) tuples; on first pop, push back as visited and push children; on second pop, emit.

def postorder_iterative(root):
    if not root: return []
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node.val)
        if node.left:  stack.append(node.left)
        if node.right: stack.append(node.right)
    return out[::-1]
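The visited-flag strategy, sketched in Python (the function name is illustrative). Each node is pushed twice: the first pop schedules its children and re-pushes it as visited; the second pop emits it, by which point both subtrees have already been emitted.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def postorder_visited_flag(root):
    """Iterative postorder: each node is emitted on its second pop."""
    out = []
    stack = [(root, False)] if root else []
    while stack:
        node, visited = stack.pop()
        if visited:
            out.append(node.val)                # both subtrees already emitted
            continue
        stack.append((node, True))              # revisit after the children
        if node.right:
            stack.append((node.right, False))
        if node.left:
            stack.append((node.left, False))    # pushed last, so processed first
    return out
```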

Data Structures Used

An explicit stack (list/deque/Stack/std::stack).
A pointer/cursor node.
An output list.
For Morris: only the output list and tree pointers themselves.

Correctness Argument

Inorder invariant: at the start of each outer-loop iteration, the stack contains the chain of ancestors (along left-edges) of the next-to-visit node, all of whose left subtrees are pending. The inner while node walk pushes new ancestors. After popping, we’ve finished its left subtree (it was either null or fully consumed in earlier iterations); we visit it; then move to its right subtree, where the same invariant resumes.

Termination: every node is pushed exactly once and popped exactly once (visited once). N pushes + N pops = O(N) work. The loop ends when node is null and the stack is empty, which happens only after the rightmost node has been visited and its (empty) right subtree explored.

Complexity

O(N) time, O(H) auxiliary space (the stack holds at most one chain of ancestors).

For a skewed tree, H = N. For balanced, H = log N.

Implementation Requirements

Initialize node = root, stack = [].
Outer loop: while node or stack.
Inner loop walks left and pushes.
After inner loop: pop, append val, set node = popped.right.
Don’t push None. Don’t visit on the way down.
For preorder, push right before left (LIFO order).
For postorder, the “modified preorder + reverse” form is the cleanest one-pass solution.

Tests

Smoke: the example table.
Unit: single node; empty tree; left-skewed (degenerate); right-skewed; balanced BST.
Property: inorder of a BST is sorted ascending.
Property: preorder + inorder uniquely determine a binary tree when all values are distinct (round-trip test if you implement the reconstruction).
Edge: root with only-left child; root with only-right child.
Large: N = 10⁴ skewed tree — must not stack-overflow.
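A sketch of the round-trip test: rebuild the tree from its preorder and inorder lists, then check that both traversals reproduce the inputs. It assumes distinct values, and build_tree is recursive (depth = tree height), so it suits test-sized trees rather than 10⁴-node skewed ones. Function names are illustrative.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def build_tree(preorder, inorder):
    """Rebuild a binary tree from preorder + inorder lists (values must be distinct)."""
    index = {v: i for i, v in enumerate(inorder)}   # value -> position in inorder
    pre_iter = iter(preorder)

    def build(lo, hi):
        # Builds the subtree whose values occupy inorder[lo:hi].
        if lo >= hi:
            return None
        root = TreeNode(next(pre_iter))   # next preorder value is this subtree's root
        mid = index[root.val]
        root.left = build(lo, mid)        # left first, matching preorder order
        root.right = build(mid + 1, hi)
        return root

    return build(0, len(inorder))


def preorder_iterative(root):
    out, stack = [], ([root] if root else [])
    while stack:
        node = stack.pop()
        out.append(node.val)
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return out


def inorder_iterative(root):
    out, stack, node = [], [], root
    while node or stack:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.val)
        node = node.right
    return out
```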

Follow-up Questions

“Implement Morris traversal (O(1) space).” → temporarily rewire the tree using “threaded” pointers from inorder predecessors back to successors; restore on the way through.
“Level-order traversal.” → BFS with a queue.
“Zigzag level-order.” → BFS with alternating direction; reverse every other level.
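“Reconstruct a tree from its preorder + inorder.” → preorder[0] is the root; split inorder at the root's position (a value-to-index hashmap makes the lookup O(1)) and recurse; requires distinct values.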
“Boundary traversal of a tree.” → combination of left boundary, leaves left-to-right, right boundary reversed.
“Verify a BST.” → inorder traversal must be strictly ascending; or carry (min, max) bounds recursively.
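A minimal sketch of Morris inorder traversal for the O(1)-space follow-up (the function name is illustrative). For each node with a left subtree, it finds the inorder predecessor, threads the predecessor's right pointer back to the node on the first visit, and removes that thread on the second visit, so the tree is restored by the time the traversal ends.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def morris_inorder(root):
    """Inorder traversal with O(1) extra space via temporary threads."""
    out = []
    node = root
    while node:
        if node.left is None:
            out.append(node.val)
            node = node.right               # may follow a thread back up
        else:
            pred = node.left                # find the inorder predecessor
            while pred.right and pred.right is not node:
                pred = pred.right
            if pred.right is None:
                pred.right = node           # first visit: create the thread
                node = node.left
            else:
                pred.right = None           # second visit: remove the thread
                out.append(node.val)
                node = node.right
    return out
```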

Product Extension

A directory tree being indexed: BFS gives breadth-first crawl (sibling priority); preorder DFS visits parent before children (renderers); postorder visits children before parent (size accumulation, deletion). A code formatter walks the AST in postorder so children’s formatted text is available when the parent emits its own. A serializer uses preorder. The choice of traversal is a design decision tied to dependency direction.

Language/Runtime Follow-ups

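Python: a list works as a stack via append/pop (both amortized O(1)); collections.deque is the tool for queues and is not needed for stacks. Default recursion limit is 1000; sys.setrecursionlimit(10**5) raises it for huge trees, but very deep recursion can still overflow the C stack and crash the interpreter.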
Java: Deque<TreeNode> stack = new ArrayDeque<>(). Avoid java.util.Stack (legacy, synchronized). For BFS, use Deque<TreeNode> q = new ArrayDeque<>().
Go: slices as stacks: stack = append(stack, n), pop with stack = stack[:len(stack)-1]. The standard library has no stack type; generics (Go 1.18+) did not add one.
C++: std::stack<TreeNode*> and std::queue<TreeNode*>.
JS/TS: array push/pop. For BFS, Array.shift() is O(N); use a real deque or an index pointer to avoid quadratic blowup on large trees.
Recursion depth: Python ~1000 default; Java ~10⁴ on default -Xss; Go grows stacks dynamically. For 10⁴ skewed trees, iterative is mandatory in Python.

Common Bugs

Pushing None onto the stack — bloats the stack and requires defensive pops.
Visiting on the way down for inorder — that’s preorder.
In preorder iterative, pushing left before right — produces root, right, left order (a mirrored preorder), not preorder.
For postorder, forgetting to reverse in the modified-preorder approach.
BFS using list.pop(0) in Python — every pop shifts the remaining queue, costing O(queue length); quadratic on wide trees. Use collections.deque and popleft().
Inner loop not consuming left children — only pushing root; you never reach the leftmost node.
Mutation in Morris traversal — forgetting to restore the threaded pointer; leaves the tree corrupted.
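Level-order and zigzag traversal sketched with collections.deque, which avoids the list.pop(0) trap. Snapshotting len(q) at the start of each level separates the levels without sentinels; the function names are illustrative.

```python
from collections import deque


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def level_order(root):
    """BFS by level; deque.popleft() is O(1), unlike list.pop(0)."""
    if not root:
        return []
    levels, q = [], deque([root])
    while q:
        level = []
        for _ in range(len(q)):          # exactly the nodes of the current level
            node = q.popleft()
            level.append(node.val)
            if node.left:
                q.append(node.left)
            if node.right:
                q.append(node.right)
        levels.append(level)
    return levels


def zigzag_level_order(root):
    """Same BFS; reverse every other level."""
    return [lvl if i % 2 == 0 else lvl[::-1] for i, lvl in enumerate(level_order(root))]
```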

Debugging Strategy

Print stack contents and node at the top of each outer-loop iteration.
For inorder, the first visit should be the leftmost node.
For a known BST, the output of inorder must be sorted; if not, the loop is wrong.
For large skewed inputs, iterative must finish without RecursionError.

Mastery Criteria

Wrote iterative inorder from memory in under 3 minutes.
Stated the stack invariant (chain of ancestors along left edges).
Wrote preorder iteratively and explained the right-before-left push order.
Articulated two postorder strategies (reverse-preorder vs visited-flag).
Knew that BFS uses a queue; can explain why list.pop(0) is wrong in Python.
Could sketch Morris traversal at a high level: rewire to predecessors, restore on the way through.
Recognized that inorder of a BST is sorted (and used it to verify a BST).

Phase 2 — Standard Coding Interview Patterns

Target level: Medium → Medium-Hard
Expected duration: 4 weeks (12-week track) / 4 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~7 patterns introduced per week + 50–80 problems applying them under the framework

Why This Phase Is The Keystone

Phase 0 fixed your execution. Phase 1 fixed your vocabulary. Phase 2 fixes the only thing standing between you and a 95% Medium solve rate: pattern recognition.

Here is the empirical claim, and it is the entire reason this phase exists:

Any unseen LeetCode Medium becomes a 5-minute problem if you immediately recognize the pattern. The recognition takes ~30 seconds. The remaining 4–5 minutes are mechanical: instantiate the template, adapt to the problem’s specifics, write clean code, test.

Candidates who fail Mediums almost never fail because the pattern was hard. They fail because they did not recognize the pattern, so they tried to derive the algorithm from first principles in 25 minutes — a task the original algorithm researcher needed weeks for. Pattern recognition is not memorization; it is the compiled, searchable index of the entire algorithmic literature, indexed by problem-statement signal.

This is the difference between a candidate who looks at “find longest substring with at most K distinct characters” and thinks “sliding window with a frequency map, variable-size, shrink while violation, O(N)” in 20 seconds — and one who thinks “hmm, maybe two pointers? or sort? or…” and starts coding the wrong thing.

The 28 patterns below cover >90% of the problems asked at Big Tech, infrastructure companies, quant firms, and systems-engineering interviews. They are not all the patterns in existence — Phases 3–7 add advanced data structures, hard graphs, DP families, and competitive-programming techniques. But these 28 are the ones that, once internalized, transform Medium-level problems from “puzzles to solve” into “templates to instantiate”.

What You Will Be Able To Do After This Phase

For any Medium problem, recognize the dominant pattern in <2 minutes of reading the problem statement.
For each of the 28 patterns, write the canonical template from memory in <5 minutes.
Distinguish between superficially-similar patterns (e.g., binary search on index vs binary search on answer) by their signal, not their syntax.
Combine two patterns when one alone is insufficient (e.g., monotonic deque inside a sliding window; trie inside a backtracking DFS).
Diagnose, when a pattern almost fits but not quite, exactly which generalization is needed (e.g., “this is sliding window but the window is variable and we need the max — we need a monotonic deque, not just a counter”).
Communicate the pattern out loud at the moment of recognition: “This is X because of signal Y; the template is Z; expected complexity is W; the canonical pitfall is P.”

How To Read This Phase

This README is a reference manual for all 28 patterns, plus a recognition cheat sheet, plus a mastery checklist. Each pattern entry has a fixed structure:

Signal recognition — the words/structure in the problem statement that should fire this pattern within 2 minutes of reading
Canonical template — pseudocode you should be able to write from memory
Complexity — time and space, with the constants that matter
Common variants — the family tree (e.g., sliding window has fixed-size, variable-size, count-based variants)
Classic problems — 4–8 LeetCode problems where this pattern is the intended solution
Common bugs — the specific failure modes seen on this pattern in interviews

Read it linearly the first time. Refer back to specific patterns as you work the labs. After all labs, re-read the cheat-sheet table at the bottom — it should now read as obvious.

A Word On The 28 Patterns As A System

The patterns are not 28 unrelated tricks. They form a small number of meta-strategies:

Linear scans with state (1, 2, 3, 4, 5, 9, 10, 11) — one pass, maintain a structure
Reduce-to-sorted (6, 7, 11) — sort first, then exploit order
Decision-on-monotonic-axis (8) — binary search where the axis is the answer itself
Local-update primitives on linear/tree/graph topology (12, 13, 14, 15, 16, 17, 18) — propagate information along edges/pointers
Enumerate with pruning (19) — exhaustive search with backtracking
Memoize over a state space (20, 21, 22, 23, 24, 25) — cache answers to a DAG of subproblems
Specialized structures for prefix/order queries (26, 27, 28) — trie, heap, K-way merge

Recognizing the meta-strategy first, then drilling down to the specific pattern, is often faster than trying to match all 28 patterns linearly.

Inline Pattern Reference

1. Two Pointers (opposite ends + same direction)

Signal Recognition (<2 min)

The input is sorted (or can be sorted cheaply) and the problem asks for a pair/triplet with a property.
The problem says “in-place” and you are scanning an array.
The answer is symmetric: it depends on values from both ends shrinking inward.
“Find pair such that a + b = target” with sorted input.
“Remove duplicates in place” / “Move zeros”.

Canonical Template (Opposite Ends)

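l, r = 0, len(a) - 1
while l < r:
    s = a[l] + a[r]
    if s == target:
        # record the pair, then move both pointers inward
        l += 1; r -= 1
    elif s < target:
        l += 1      # sum too small: only moving l right can increase it
    else:
        r -= 1      # sum too large: only moving r left can decrease it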

Canonical Template (Same Direction / Read-Write Pointers)

write = 0
for read in range(len(a)):
    if keep(a[read]):
        a[write] = a[read]
        write += 1
return write  # new length
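Two concrete instances of the read-write template, sketched for LeetCode 26 and 283 (the function names are illustrative). Only the keep condition and the post-processing change.

```python
def remove_duplicates(a):
    """LeetCode 26 style: compact sorted a in place; return the new length."""
    write = 0
    for read in range(len(a)):
        if write == 0 or a[read] != a[write - 1]:   # keep: differs from last kept value
            a[write] = a[read]
            write += 1
    return write


def move_zeroes(a):
    """LeetCode 283 style: keep non-zeros in order, then zero-fill the tail."""
    write = 0
    for read in range(len(a)):
        if a[read] != 0:
            a[write] = a[read]
            write += 1
    for i in range(write, len(a)):
        a[i] = 0
```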

Complexity

Time O(N) (each pointer moves monotonically — total moves bounded by N). Space O(1).

Common Variants

Opposite ends: Two Sum on sorted, 3Sum, container with most water, valid palindrome.
Same direction (slow/fast): remove duplicates, move zeros, partitioning around a pivot.
Two arrays merge: merge two sorted arrays / lists.
Cycle detection (Floyd): linked-list two-pointer where fast moves 2× slow.
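A sketch of the Floyd variant, assuming a minimal ListNode class (hypothetical, mirroring the usual LeetCode shape). Both pointers start at the head; if there is a cycle, the fast pointer gains one node per step and must eventually land on the slow one.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def has_cycle(head):
    """Floyd's tortoise and hare: fast moves two steps per one step of slow."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True        # fast caught up with slow inside a cycle
    return False               # fast reached the end: no cycle
```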

Classic Problems

LeetCode 1 — Two Sum (variant: sorted input becomes two pointers)
LeetCode 15 — 3Sum
LeetCode 11 — Container With Most Water
LeetCode 26 — Remove Duplicates from Sorted Array
LeetCode 75 — Sort Colors (Dutch national flag)
LeetCode 125 — Valid Palindrome
LeetCode 167 — Two Sum II Sorted
LeetCode 283 — Move Zeroes

Common Bugs

Forgetting to advance both pointers when a match is recorded → infinite loop.
Off-by-one in while l < r vs l <= r (depends on whether single element is meaningful).
Skipping duplicates: forgetting the inner while l < r and a[l] == a[l+1]: l += 1 after a recorded match (3Sum).
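A sketch of 3Sum (LeetCode 15) showing where the duplicate-skipping loops go: skip equal anchors before starting the two-pointer scan, and skip equal neighbours only after recording a match. The target sum is fixed at 0, as in the LeetCode problem; the function name is illustrative.

```python
def three_sum(nums):
    """All unique triplets summing to 0: sort, fix i, two-pointer the rest."""
    nums = sorted(nums)
    n, out = len(nums), []
    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue                          # skip duplicate anchors
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s == 0:
                out.append([nums[i], nums[l], nums[r]])
                while l < r and nums[l] == nums[l + 1]:
                    l += 1                    # skip duplicate middles
                while l < r and nums[r] == nums[r - 1]:
                    r -= 1                    # skip duplicate ends
                l += 1
                r -= 1                        # advance both after a match
            elif s < 0:
                l += 1
            else:
                r -= 1
    return out
```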

2. Sliding Window (fixed size + variable size)

Signal Recognition (<2 min)

“Longest / shortest / count of subarrays / substrings with property X.”
“Maximum sum of K consecutive elements.”
“Subarray containing all of …” / “smallest substring that contains all chars of T.”
The brute force is O(N²) over all subarrays. The property is monotone as the window grows or shrinks.

Canonical Template (Variable Size, Shrink-While-Violation)

l = 0
state = init()
best = 0
for r in range(len(a)):
    state = add(state, a[r])
    while violates(state):
        state = remove(state, a[l])
        l += 1
    best = max(best, r - l + 1)

Canonical Template (Fixed Size K)

state = init()
for i in range(K): state = add(state, a[i])
best = report(state)
for r in range(K, len(a)):
    state = add(state, a[r])
    state = remove(state, a[r - K])
    best = update(best, report(state))
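The two templates instantiated on concrete problems, as a sketch: the variable-size template for the longest substring with at most K distinct characters (LeetCode 340), and the fixed-size template for the maximum sum of K consecutive elements. The function names are illustrative, and max_sum_k assumes 1 ≤ k ≤ len(a).

```python
from collections import defaultdict


def longest_at_most_k_distinct(s, k):
    """Variable-size window: grow right, shrink left while more than k distinct chars."""
    freq = defaultdict(int)
    l = best = 0
    for r, ch in enumerate(s):
        freq[ch] += 1                       # add(state, a[r])
        while len(freq) > k:                # violates(state)
            left = s[l]
            freq[left] -= 1                 # remove(state, a[l])
            if freq[left] == 0:
                del freq[left]              # keep len(freq) == number of distinct chars
            l += 1
        best = max(best, r - l + 1)         # window [l, r] is valid here
    return best


def max_sum_k(a, k):
    """Fixed-size window: maximum sum over all windows of exactly k elements."""
    window = sum(a[:k])
    best = window
    for r in range(k, len(a)):
        window += a[r] - a[r - k]           # add a[r], remove a[r - k]
        best = max(best, window)
    return best
```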

Complexity

Time O(N) — each element enters and leaves the window at most once. Space O(window state size).

Common Variants

Fixed-size: maximum sum, average, min/max via deque.
Variable-size with constraint to shrink under: at most K distinct, sum ≤ S, no repeats.
Variable-size with constraint to grow until satisfied: smallest window containing all of T (then shrink while still satisfying).
Count of “good” windows = count of “good” right endpoints: when every sub-window of a valid window is also valid, each r contributes all starts in [l, r], so add count += r - l + 1 after each step.
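A sketch of the counting variant and the “exactly K” trick for LeetCode 992 (function names illustrative). The at-most counter adds r - l + 1 per right endpoint; exactly K is then the difference of two at-most counts.

```python
from collections import defaultdict


def count_at_most_k_distinct(a, k):
    """Number of subarrays with at most k distinct values."""
    freq = defaultdict(int)
    l = count = 0
    for r, x in enumerate(a):
        freq[x] += 1
        while len(freq) > k:
            freq[a[l]] -= 1
            if freq[a[l]] == 0:
                del freq[a[l]]
            l += 1
        count += r - l + 1        # every start in [l, r] gives a valid window ending at r
    return count


def count_exactly_k_distinct(a, k):
    """Exactly K = atMost(K) - atMost(K - 1)."""
    return count_at_most_k_distinct(a, k) - count_at_most_k_distinct(a, k - 1)
```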

Classic Problems

LeetCode 3 — Longest Substring Without Repeating Characters
LeetCode 76 — Minimum Window Substring
LeetCode 209 — Minimum Size Subarray Sum
LeetCode 340 — Longest Substring with At Most K Distinct Characters
LeetCode 424 — Longest Repeating Character Replacement
LeetCode 567 — Permutation in String
LeetCode 992 — Subarrays with K Different Integers (the “exactly K = atMost(K) − atMost(K-1)” trick)
LeetCode 1004 — Max Consecutive Ones III

Common Bugs

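For longest-window problems, updating the answer inside the shrink loop instead of after it records invalid windows. (For shortest-window problems such as LC 76 and LC 209, which shrink while the window is still valid, the update belongs inside the loop.)
Using if instead of while when shrinking: one added element may require removing several elements from the left before the window is valid again.
Counting “exactly K” as atMost(K) instead of atMost(K) − atMost(K-1).
Forgetting to decrement the frequency map on shrink; and for “at most K distinct”, forgetting to delete keys whose count reaches 0, so len(freq) over-counts the distinct values.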
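Here is a sketch of the “exactly K = atMost(K) − atMost(K-1)” trick for LeetCode 992. It uses the shrink-while-violation template and the count += r - l + 1 counting rule. The helper names are hypothetical.

```python
from collections import defaultdict


def at_most_k_distinct(a, k):
    """Number of subarrays of a with at most k distinct values."""
    freq = defaultdict(int)
    l = 0
    count = 0
    for r, x in enumerate(a):
        freq[x] += 1
        while len(freq) > k:  # while, not if: may need several removals
            freq[a[l]] -= 1
            if freq[a[l]] == 0:
                del freq[a[l]]  # keep len(freq) == number of distinct values
            l += 1
        count += r - l + 1  # every start in [l, r] is valid for this r
    return count


def exactly_k_distinct(a, k):
    return at_most_k_distinct(a, k) - at_most_k_distinct(a, k - 1)
```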

3. Prefix Sums (1D + 2D)

Signal Recognition (<2 min)

“Sum/count over a range [l, r]” asked for many queries (e.g., Q up to 10^5), where re-summing each range in O(N) is too slow.
“Subarray with sum equal to K” (prefix sum + hashmap of seen prefix sums).
“Number of subarrays with sum divisible by K” (prefix sums mod K).
2D: “matrix region sum” / “rectangle of ones”.

Canonical Template (1D)

prefix = [0] * (n + 1)
for i in range(n):
    prefix[i + 1] = prefix[i] + a[i]
# range sum a[l..r]:  prefix[r + 1] - prefix[l]

Canonical Template (2D)

P = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(n):
    for j in range(m):
        P[i+1][j+1] = a[i][j] + P[i][j+1] + P[i+1][j] - P[i][j]
# region (r1,c1)..(r2,c2):
# P[r2+1][c2+1] - P[r1][c2+1] - P[r2+1][c1] + P[r1][c1]

Complexity

Build O(N) (1D) or O(NM) (2D). Each query O(1). Space O(N) or O(NM).

Common Variants

Subarray-sum-equals-K with hashmap {prefix_sum: count}.
XOR prefix for “subarray XOR equals K” — same trick, different operator (any group operator works).
Mod K prefix for “subarray sum divisible by K” — bucket prefixes by their value mod K.
Count of negative numbers in a sorted matrix via row prefix.
2D rectangle sum, 2D max-sum submatrix.

Classic Problems

LeetCode 303 — Range Sum Query Immutable
LeetCode 304 — Range Sum Query 2D Immutable
LeetCode 560 — Subarray Sum Equals K
LeetCode 525 — Contiguous Array
LeetCode 974 — Subarray Sums Divisible by K
LeetCode 1248 — Count Number of Nice Subarrays
LeetCode 1314 — Matrix Block Sum

Common Bugs

Off-by-one: prefix is size N+1, indexed 0..N. Range [l,r] is prefix[r+1] - prefix[l]. Get this wrong once and it’s wrong forever.
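2D inclusion-exclusion sign flip: the overlap P[r1][c1] is subtracted twice, so it must be added back once.
Initializing the hashmap: {0: 1} is needed for “subarrays starting at index 0” in subarray-sum-equals-K.
Integer overflow (C++/Java): prefix sums at N=10^5 with values up to 10^9 reach 10^14, beyond 32-bit. Use 64-bit (long / long long). Python ints do not overflow.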
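A minimal sketch of subarray-sum-equals-K (LeetCode 560) that seeds the map with {0: 1}. The name subarray_sum is illustrative.

```python
from collections import defaultdict


def subarray_sum(nums, k):
    seen = defaultdict(int)
    seen[0] = 1  # empty prefix: lets subarrays starting at index 0 count
    prefix = 0
    count = 0
    for x in nums:
        prefix += x
        count += seen[prefix - k]  # earlier prefixes p with prefix - p == k
        seen[prefix] += 1
    return count
```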

4. Difference Arrays (range update O(1))

Signal Recognition (<2 min)

“Apply many range updates (l, r, +v) then query the final array.”
“How many flights on day X” given (start, end, count) triples.
“Maximum overlap of intervals.”
The brute force would be O(N · Q); a difference array makes it O(N + Q).

Canonical Template

diff = [0] * (n + 1)
for (l, r, v) in updates:
    diff[l] += v
    diff[r + 1] -= v
a = [0] * n
cur = 0
for i in range(n):
    cur += diff[i]
    a[i] = cur

Complexity

O(N + Q) total. Space O(N).

Common Variants

Booking-system style: count of overlapping intervals at each point.
2D difference (imos method): stamp rectangles, prefix-sum twice.
Sweep line equivalence: events at l and r+1 are exactly the events of a sweep; difference array is the “discretized sweep”.
Range add + point query interleaved online (queries arriving between updates): a Fenwick tree/BIT over the difference array handles this (Phase 3).

Classic Problems

LeetCode 1109 — Corporate Flight Bookings
LeetCode 1854 — Maximum Population Year
LeetCode 2381 — Shifting Letters II
LeetCode 370 — Range Addition
LeetCode 731 — My Calendar II
LeetCode 2536 — Increment Submatrices by One (2D diff)

Common Bugs

Forgetting the diff[r + 1] cancellation → the increment leaks into every index after r, not just [l, r].
Using [0] * n instead of [0] * (n + 1) — causes index OOB on diff[r + 1].
For 2D: forgetting the inclusion-exclusion of all four corners.
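For the 2D case, here is a sketch in the style of LeetCode 2536. It assumes an n × n grid and queries given as inclusive [r1, c1, r2, c2]. It stamps all four corners, then prefix-sums along rows and then along columns. The function name is illustrative.

```python
def increment_submatrices(n, queries):
    diff = [[0] * (n + 1) for _ in range(n + 1)]
    for r1, c1, r2, c2 in queries:
        diff[r1][c1] += 1          # start of the rectangle
        diff[r1][c2 + 1] -= 1      # cancel to the right
        diff[r2 + 1][c1] -= 1      # cancel below
        diff[r2 + 1][c2 + 1] += 1  # add back the doubly-cancelled corner
    for i in range(n + 1):         # prefix sum along each row
        for j in range(1, n + 1):
            diff[i][j] += diff[i][j - 1]
    for i in range(1, n + 1):      # then along each column
        for j in range(n + 1):
            diff[i][j] += diff[i - 1][j]
    return [row[:n] for row in diff[:n]]
```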

5. Hashing Patterns (frequency / complement / grouping)

Signal Recognition (<2 min)

“Find target - x” for some target → complement in a hashmap.
“Most/least frequent X” → frequency map, often paired with heap/sort.
“Group by canonical form” (anagrams, isomorphic strings) → grouping map keyed by canonical form.
“Has any element appeared twice within K positions?” → sliding window of size K with a hashset.

Canonical Templates

# complement
seen = {}
for i, x in enumerate(a):
    if (target - x) in seen:
        return [seen[target - x], i]
    seen[x] = i

# frequency
from collections import Counter
freq = Counter(a)
top = freq.most_common(K)

# grouping by canonical form
from collections import defaultdict
groups = defaultdict(list)
for s in strs:
    groups[canonical(s)].append(s)

Complexity

O(N) average (hash). Space O(N) worst case. Adversarial inputs may degrade to O(N²) — see Phase 1 §3.

Common Variants

Two-Sum (complement).
Group anagrams (grouping by char-count tuple).
Longest consecutive sequence (set-membership test for x-1 to find sequence starts).
Subarray sum = K (prefix-sum + complement — see pattern 3).
Word ladder (intermediate patterns such as h*t as bucket keys).

Classic Problems

LeetCode 1 — Two Sum
LeetCode 49 — Group Anagrams
LeetCode 128 — Longest Consecutive Sequence
LeetCode 217 — Contains Duplicate
LeetCode 219 — Contains Duplicate II
LeetCode 347 — Top K Frequent Elements
LeetCode 451 — Sort Characters by Frequency

Common Bugs

Java int[] as a key — uses object identity, not value equality. (See Phase 1 lab 03.)
Inserting into seen before the lookup, when the problem needs distinct indices.
Using an ordered map when an unordered one suffices (e.g., Java TreeMap instead of HashMap) → an extra log N factor.
Reusing a mutable buffer as a key — all keys alias to the latest buffer.
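This sketch of group anagrams (LeetCode 49) avoids the mutable-key bugs above by converting the count buffer to a tuple. It assumes lowercase a-z input.

```python
from collections import defaultdict


def group_anagrams(strs):
    groups = defaultdict(list)
    for s in strs:
        count = [0] * 26                  # assumes lowercase a-z only
        for ch in s:
            count[ord(ch) - ord("a")] += 1
        groups[tuple(count)].append(s)    # immutable, value-equal key
    return list(groups.values())
```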

6. Sorting + Greedy (sort to enable greedy)

Signal Recognition (<2 min)

“Maximum number of non-overlapping …” → sort by end, take earliest end.
“Minimum number of meeting rooms / arrows / platforms” → sort by start; sweep.
“Schedule jobs to maximize profit / minimize lateness.”
“Pair items optimally” → sort one or both, pair by index.
The brute force is “try all pairings/orderings” (factorial); after an O(N log N) sort, a single O(N) pass suffices.

Canonical Template

a.sort(key=lambda x: x[1])  # sort by end
chosen = []
last_end = float("-inf")
for (s, e) in a:
    if s >= last_end:  # >= treats touching intervals as non-overlapping
        chosen.append((s, e))
        last_end = e
return len(chosen)

Complexity

Time O(N log N) for the sort, O(N) for the sweep. Space O(1) beyond the sort buffer if you only count; storing the chosen intervals, as above, is O(N).

Common Variants

Activity selection — sort by end, take earliest end.
Minimum platforms / arrows — sort by start (or by end for arrows).
Pairing: sort and pair by index (e.g., “minimum pair-sum to fit a target”).
Two arrays joined — sort both, two-pointer merge.
Custom comparator — sort by a derived value (profit/time, deadline-then-profit, etc.) requires proving the exchange argument.

Classic Problems

LeetCode 56 — Merge Intervals
LeetCode 252 — Meeting Rooms (and 253 — Meeting Rooms II)
LeetCode 435 — Non-overlapping Intervals
LeetCode 452 — Minimum Number of Arrows to Burst Balloons
LeetCode 502 — IPO (sort by capital, pq by profit)
LeetCode 630 — Course Schedule III
LeetCode 881 — Boats to Save People

Common Bugs

Sorting by the wrong key (start vs end). Activity selection by start is wrong.
Forgetting to prove the exchange argument before committing to greedy. (See Phase 6.)
For “non-overlap” problems: confusing s >= last_end (touching allowed) vs s > last_end (strict).
For comparators: return a - b overflows in Java (and C++) when values span the full int range; use Integer.compare(a, b).
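Here is a sketch of the sort-by-end greedy applied to LeetCode 452 (Minimum Number of Arrows). Touching balloons share an arrow, so the test is strict (start > arrow_x). Names are illustrative.

```python
def find_min_arrows(points):
    if not points:
        return 0
    points = sorted(points, key=lambda p: p[1])  # sort by end
    arrows = 1
    arrow_x = points[0][1]  # shoot at the earliest end
    for start, end in points[1:]:
        if start > arrow_x:  # strictly past the arrow: needs a new one
            arrows += 1
            arrow_x = end
    return arrows
```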

7. Binary Search On Index (sorted array)

Signal Recognition (<2 min)

The input is sorted (or has a sorted property like a rotated sorted array).
The task is “find X” / “find first / last X” / “find insertion point”.
N is large (10^5+) and either O(log N) is required or there are many queries, so an O(N) scan per query is too slow.

Canonical Template (lower_bound)

def lower_bound(a, target):
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo  # first index with a[i] >= target

Complexity

Time O(log N) per query. Space O(1).

Common Variants

lower_bound, upper_bound, exact-match.
Rotated sorted array — pick the half that is sorted, decide which half contains the target.
Search in 2D matrix — flatten coordinates, binary search the 1D index, or descend from top-right.
Find peak — local-property binary search (no global sort required).
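You can build first/last position (LeetCode 34) from two lower_bound calls with the half-open template above. This sketch assumes integer values, so that upper_bound(t) == lower_bound(t + 1).

```python
def lower_bound(a, target):
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo  # first index with a[i] >= target


def search_range(a, target):
    first = lower_bound(a, target)
    if first == len(a) or a[first] != target:
        return [-1, -1]  # target absent
    last = lower_bound(a, target + 1) - 1  # integer input assumed
    return [first, last]
```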

Classic Problems

LeetCode 33 — Search in Rotated Sorted Array
LeetCode 34 — Find First and Last Position
LeetCode 35 — Search Insert Position
LeetCode 74 — Search a 2D Matrix
LeetCode 153 — Find Minimum in Rotated Sorted Array
LeetCode 162 — Find Peak Element
LeetCode 240 — Search a 2D Matrix II (descend from top-right; not binary search per se)

Common Bugs

(lo + hi) / 2 can overflow int in C++/Java; use lo + (hi - lo) / 2. (Python ints do not overflow, so (lo + hi) // 2 is safe there.)
Wrong loop condition (< vs <=) interacting with wrong update (mid vs mid + 1 vs mid - 1) — pick a single canonical form (we use half-open [lo, hi) here) and stick with it.
Off-by-one when reconstructing the actual index after finding the bound.
For rotated arrays, forgetting that duplicates break the binary search invariant.

8. Binary Search On Answer (parametric / monotonic predicate)

Signal Recognition (<2 min)

The problem asks for the minimum X such that property P(X) holds (or maximum X such that ¬P).
P is monotonic in X (if P(X) holds, P(X+1) also holds — or vice versa).
Direct construction is hard, but verifying a candidate answer in O(N) or O(N log N) is easy.
Constraints: answer’s range is enormous (10^9, 10^18), but verification per candidate is cheap.
Keywords: “smallest capacity / speed / time”, “largest minimum”, “split into K parts minimize max sum”.

Canonical Template

def feasible(x): ...  # True iff x works; monotone: feasible(x) implies feasible(x + 1)

lo, hi = LOW, HIGH  # HIGH must be feasible; LOW may be infeasible
while lo < hi:
    mid = (lo + hi) // 2
    if feasible(mid):
        hi = mid
    else:
        lo = mid + 1
return lo  # smallest feasible value

Complexity

Time O(log(range) · cost_of_feasible). Space O(1) beyond feasible.

Common Variants

Min-max / max-min (split array into K parts to minimize the maximum part sum).
Capacity / rate (capacity to ship within D days; Koko eating bananas).
Time (earliest day to finish; latest day before failure).
K-th smallest in matrix / multiplication table (binary search the value, count “≤ value” entries).
Floating-point binary search: replace lo < hi with hi - lo > eps (or run a fixed ~100 iterations), and update with lo = mid / hi = mid, since mid + 1 is meaningless on reals.

Classic Problems

LeetCode 410 — Split Array Largest Sum
LeetCode 875 — Koko Eating Bananas
LeetCode 1011 — Capacity To Ship Packages Within D Days
LeetCode 1283 — Find Smallest Divisor Given a Threshold
LeetCode 1482 — Minimum Number of Days to Make m Bouquets
LeetCode 668 — Kth Smallest Number in Multiplication Table
LeetCode 1539 — Kth Missing Positive Number

Common Bugs

Wrong direction of monotonicity — verify by hand on small cases before committing.
Wrong search bounds: lo too high skips the answer; hi too low makes the loop converge to hi, which is not feasible. The template cannot detect this, so make sure HIGH is feasible.
feasible has a subtle off-by-one — write and test feasible independently before plugging it into the binary search.
Returning lo - 1 or hi + 1 accidentally: this lo < hi template ends with lo == hi and returns lo, period.
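Here is the template applied to Koko Eating Bananas (LeetCode 875), as a sketch. feasible(k) counts ceil(p / k) hours per pile. HIGH = max(piles) is feasible under that problem's guarantee that len(piles) <= h.

```python
def min_eating_speed(piles, h):
    def feasible(k):
        # hours needed at speed k: ceil(p / k) per pile, via integer math
        return sum((p + k - 1) // k for p in piles) <= h

    lo, hi = 1, max(piles)  # max(piles) finishes each pile in one hour
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo  # smallest feasible speed
```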

9. Monotonic Stack (next-greater / histogram / span)

Signal Recognition (<2 min)

“Next/previous greater/smaller element” on each index.
“Largest rectangle in histogram” / “max area of submatrix of 1’s” (uses histogram per row).
“Daily temperatures” / “stock span” / “trapping rainwater” (an O(N) variant).
The brute force is “for each i, scan right (or left) until …” — O(N²); the monotonic stack collapses it to O(N).

Canonical Template (Next Greater)

n = len(a)
result = [-1] * n
stack = []  # indices still waiting for a next greater; their values are non-increasing
for i in range(n):
    while stack and a[stack[-1]] < a[i]:
        result[stack.pop()] = a[i]
    stack.append(i)

Complexity

Time O(N) — each index pushed and popped at most once. Space O(N) for the stack.

Common Variants

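Next/previous, greater/smaller (4 combinations): change the comparator for greater vs smaller; for previous-X, read the stack top after popping instead of assigning on pop (or scan right to left).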
Histogram problems: maintain stack of indices with strictly increasing heights; on pop, the popped index sees the current as its right boundary and the new top as its left.
Sum of subarray minimums — for each element, count subarrays where it is the min.
Trapping rainwater — stack of decreasing heights; each pop produces a “trapped” volume.
Sliding window max — uses a monotonic deque (pattern 10), not stack.

Classic Problems

LeetCode 84 — Largest Rectangle in Histogram
LeetCode 85 — Maximal Rectangle (histogram per row)
LeetCode 42 — Trapping Rain Water (stack variant)
LeetCode 496 — Next Greater Element I
LeetCode 503 — Next Greater Element II (circular)
LeetCode 739 — Daily Temperatures
LeetCode 901 — Online Stock Span
LeetCode 907 — Sum of Subarray Minimums

Common Bugs

Comparator: < vs <= matters when there are duplicates and the problem wants “strictly greater” vs “greater-or-equal”. Pick the variant that gives unique boundary assignment.
Forgetting to drain the stack at the end (for problems where unprocessed elements have no next-greater).
Histogram: forgetting the sentinel 0 appended at the end; without it, bars still on the stack when the scan ends are never evaluated.
Storing values vs indices — almost always store indices, derive values when needed.
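Here is a sketch of Largest Rectangle in Histogram (LeetCode 84). It uses the sentinel 0 and the index-only stack described above. The name largest_rectangle is illustrative.

```python
def largest_rectangle(heights):
    h = heights + [0]  # sentinel 0 forces every bar to be popped at the end
    stack = []         # indices; h[stack] is strictly increasing
    best = 0
    for i, x in enumerate(h):
        while stack and h[stack[-1]] >= x:
            height = h[stack.pop()]
            left = stack[-1] if stack else -1  # new top is the left boundary
            best = max(best, height * (i - left - 1))  # current i is the right boundary
        stack.append(i)
    return best
```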

10. Monotonic Queue (sliding window max/min in O(N))

Signal Recognition (<2 min)

“Maximum / minimum of every window of size K” (or variable size) in O(N).
DP transitions of the form dp[i] = max(dp[j] + ...) for j in some window — the deque maintains the candidate js.
Constrained Subsequence Sum, Jump Game VI.

Canonical Template (Sliding Window Max)

from collections import deque
dq = deque()  # holds indices, a[dq] strictly decreasing
result = []
for i, x in enumerate(a):
    while dq and a[dq[-1]] <= x:
        dq.pop()
    dq.append(i)
    if dq[0] <= i - K:
        dq.popleft()
    if i >= K - 1:
        result.append(a[dq[0]])

Complexity

Time O(N). Space O(K) for the deque.

Common Variants

Sliding-window min (flip comparator).
DP optimization: when dp[i] = f(max{dp[j] : j ∈ window}), the deque maintains the max efficiently.
Shortest subarray with sum at least K (LC 862) — combine prefix sums with a monotonic deque on prefix-sum values.
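Here is a sketch of LC 862 that combines prefix sums with a monotonic deque of prefix indices (names illustrative). Unlike the sliding-window-max deque, this one keeps prefix values increasing and pops from the front whenever it finds a valid window.

```python
from collections import deque


def shortest_subarray(nums, k):
    n = len(nums)
    prefix = [0] * (n + 1)
    for i, x in enumerate(nums):
        prefix[i + 1] = prefix[i] + x
    best = n + 1
    dq = deque()  # indices into prefix; prefix values strictly increasing
    for j in range(n + 1):
        # front: j is the earliest end that works for start dq[0];
        # a later j would only be longer, so retire that start
        while dq and prefix[j] - prefix[dq[0]] >= k:
            best = min(best, j - dq.popleft())
        # back: an older start with prefix >= prefix[j] is never better than j
        while dq and prefix[dq[-1]] >= prefix[j]:
            dq.pop()
        dq.append(j)
    return best if best <= n else -1
```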

Classic Problems

LeetCode 239 — Sliding Window Maximum
LeetCode 862 — Shortest Subarray with Sum at Least K
LeetCode 918 — Maximum Sum Circular Subarray
LeetCode 1425 — Constrained Subsequence Sum
LeetCode 1696 — Jump Game VI

Common Bugs

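Storing values, not indices — lose the ability to evict by window position.
<= vs < for the back-eviction: with index-based front eviction (as above) either is correct, and < just keeps duplicates longer. If you store values and evict the front by comparing values, you must use < so duplicates are kept; otherwise a later duplicate gets evicted early.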
Forgetting to evict the front when its index is out of window.
Reporting before the window is full (i >= K - 1).

11. Intervals (sort by start, merge / sweep)

Signal Recognition (<2 min)

“Merge overlapping intervals”, “insert interval”, “remove minimum to make non-overlapping”.
“Meeting rooms” / “minimum platforms” / “maximum concurrent events”.
“Employee free time” / “interval intersection”.

Canonical Template (Merge)

intervals.sort(key=lambda x: x[0])
merged = []
for s, e in intervals:
    if merged and merged[-1][1] >= s:
        merged[-1][1] = max(merged[-1][1], e)
    else:
        merged.append([s, e])

Canonical Template (Sweep Line)

events = []
for s, e in intervals:
    events.append((s, +1))
    events.append((e, -1))  # or (e + 1, -1) for closed intervals on integers
events.sort()  # at equal times, (t, -1) sorts before (t, +1): ends are processed first
cur = peak = 0
for _, delta in events:
    cur += delta
    peak = max(peak, cur)

Complexity

Sort O(N log N), sweep O(N). Space O(N) for events.

Common Variants

Merge (sort by start, fold).
Sweep (events at endpoints, count concurrent).
Heap-of-end-times (for “minimum platforms / rooms”).
Interval trees / balanced BSTs (Phase 3) for online updates.
Tie-breaking: end events before start events (or vice versa) depending on whether endpoint contact counts as overlap.
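Here is the heap-of-end-times variant for Meeting Rooms II (LeetCode 253), as a sketch. It treats a meeting ending at time t as freeing its room for one starting at t.

```python
import heapq


def min_meeting_rooms(intervals):
    intervals = sorted(intervals, key=lambda x: x[0])  # sort by start
    ends = []  # min-heap of end times of rooms currently in use
    for s, e in intervals:
        if ends and ends[0] <= s:  # earliest-ending room is free (touching allowed)
            heapq.heapreplace(ends, e)  # reuse it: pop its end, push the new end
        else:
            heapq.heappush(ends, e)     # open a new room
    return len(ends)
```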

Classic Problems

LeetCode 56 — Merge Intervals
LeetCode 57 — Insert Interval
LeetCode 252 — Meeting Rooms
LeetCode 253 — Meeting Rooms II
LeetCode 435 — Non-overlapping Intervals
LeetCode 759 — Employee Free Time
LeetCode 986 — Interval List Intersections
LeetCode 1851 — Minimum Interval to Include Each Query

Common Bugs

Sorting by end when start was needed (or vice versa).
Tie-breaking events at the same time wrong — touching intervals counted as overlap (or not) depending on the problem.
Mutating the input list while iterating (Java ConcurrentModificationException).

12. Linked List Manipulation (reverse / detect cycle / merge)

Signal Recognition (<2 min)

The data structure given is ListNode.
Tasks: reverse, reverse in groups, detect cycle, find middle, merge sorted, partition, deep copy.
Often combined with a dummy head for return-pointer simplification.

Canonical Templates

# reverse
prev, curr = None, head
while curr:
    nxt = curr.next
    curr.next = prev
    prev, curr = curr, nxt
return prev

# detect cycle (Floyd)
slow = fast = head
while fast and fast.next:
    slow = slow.next
    fast = fast.next.next
    if slow is fast: return True
return False

Complexity

All operations O(N) time, O(1) space (recursion variants are O(N) stack).

Common Variants

Reverse, reverse in K-group, reverse between m..n.
Floyd’s cycle detection + finding cycle start (mathematical trick: reset slow to head, advance both at speed 1).
Find middle with slow/fast (handle even/odd length).
Merge two sorted with dummy.
Deep copy with random pointer — interleave clones, then split.
LRU cache (Phase 3) is doubly-linked list + hashmap.

Classic Problems

LeetCode 21 — Merge Two Sorted Lists
LeetCode 25 — Reverse Nodes in k-Group
LeetCode 138 — Copy List with Random Pointer
LeetCode 141 — Linked List Cycle
LeetCode 142 — Linked List Cycle II
LeetCode 206 — Reverse Linked List
LeetCode 234 — Palindrome Linked List
LeetCode 876 — Middle of the Linked List

Common Bugs

Not using a dummy head when the head can change → special-case branches everywhere.
Reverse: losing the next pointer because of assignment order.
Cycle detection: incorrect math for finding the cycle start.
Returning the wrong end (curr is null after the loop; prev is the new head).
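Here is a sketch of Floyd's cycle-start step (LeetCode 142). After slow and fast meet, reset one pointer to head and advance both one step at a time; they meet at the cycle's entry. The minimal ListNode class here is an assumption.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def detect_cycle_start(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            # phase 2: distance head -> start equals meet -> start (mod cycle length)
            slow = head
            while slow is not fast:
                slow = slow.next
                fast = fast.next
            return slow
    return None  # fast reached the end: no cycle
```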

13. Tree DFS (preorder / inorder / postorder)

Signal Recognition (<2 min)

The input is a tree (binary, n-ary, or just a graph that happens to be a tree).
The answer is computed by combining results from subtrees (postorder) or by augmenting a top-down state (preorder).
BST in-order traversal yields sorted values.
“Validate”, “diameter”, “lowest common ancestor”, “serialize / deserialize”, “path sum”.

Canonical Template (Postorder)

def dfs(node):
    if not node: return base
    L = dfs(node.left)
    R = dfs(node.right)
    return combine(node.val, L, R)

Complexity

Time O(N) — each node visited once. Space O(H) recursion (H = height; up to N for skewed trees).

Common Variants

Inorder for BSTs (yields sorted; use for “kth smallest” / “validate BST”).
Preorder when state flows top-down (e.g., max value on path).
Postorder when answer combines subtree results (e.g., diameter, LCA).
Iterative with explicit stack: required when recursion depth could overflow (a skewed tree with N=10^5 is far past Python's default recursion limit of 1000).
Morris traversal for O(1) extra space (Phase 3).
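Here is Diameter of Binary Tree (LeetCode 543) as a postorder sketch. Each call returns the longest path starting at the node (its height), while the path through the node updates a running best. The minimal TreeNode class is assumed.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def diameter(root):
    best = 0

    def height(node):  # longest downward path starting at node, counted in nodes
        nonlocal best
        if node is None:
            return 0
        L = height(node.left)
        R = height(node.right)
        best = max(best, L + R)  # path THROUGH node, counted in edges
        return 1 + max(L, R)     # path STARTING at node, for the parent

    height(root)
    return best
```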

Classic Problems

LeetCode 94 — Binary Tree Inorder Traversal
LeetCode 98 — Validate Binary Search Tree
LeetCode 104 — Maximum Depth of Binary Tree
LeetCode 124 — Binary Tree Maximum Path Sum
LeetCode 230 — Kth Smallest Element in a BST
LeetCode 236 — Lowest Common Ancestor of a Binary Tree
LeetCode 297 — Serialize and Deserialize Binary Tree
LeetCode 543 — Diameter of Binary Tree

Common Bugs

“Validate BST” by checking only left.val < node.val < right.val (local check) — must pass (min, max) bounds top-down.
Confusing “max path through node” with “max path starting at node” — the diameter trick.
Stack overflow for deep trees in Python (default limit 1000) — sys.setrecursionlimit or go iterative.
Mutating shared path list without backtracking → wrong “all paths” output.
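Here is a sketch of the (min, max) bounds fix for “Validate BST” (LeetCode 98, which requires strict ordering). The TreeNode class is a minimal assumption.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def is_valid_bst(root):
    def check(node, lo, hi):
        if node is None:
            return True
        if not (lo < node.val < hi):  # every ancestor's bound, not just the parent's
            return False
        return check(node.left, lo, node.val) and check(node.right, node.val, hi)

    return check(root, float("-inf"), float("inf"))
```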

14. Tree BFS (level order / right side view)

Signal Recognition (<2 min)

“Level order / level by level / per-depth” output.
“Right (or left) side view” — last node at each level.
“Minimum depth” — first leaf encountered in BFS.
“Connect next pointers per level” (LC 116/117).
“Vertical / diagonal traversal” — same machinery with different keying.

Canonical Template

from collections import deque
q = deque([root]) if root else deque()  # guard the null root
levels = []
while q:
    size = len(q)
    cur = []
    for _ in range(size):
        node = q.popleft()
        cur.append(node.val)
        if node.left: q.append(node.left)
        if node.right: q.append(node.right)
    levels.append(cur)

Complexity

Time O(N), space O(W) (W = max width; up to N/2 for balanced).

Common Variants

Plain level order, with or without per-level grouping.
Zigzag (alternate appending direction).
Right-/left-side view (last/first per level).
Minimum depth (first leaf — cuts BFS short on the goal).
Bottom-up (collect all levels then reverse).
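Here is Right Side View (LeetCode 199) as a sketch on the level-order template. The last node dequeued in each level is the one visible from the right. TreeNode is a minimal assumed class.

```python
from collections import deque


class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def right_side_view(root):
    if root is None:
        return []
    view = []
    q = deque([root])
    while q:
        size = len(q)  # freeze the level size before children are appended
        for i in range(size):
            node = q.popleft()
            if i == size - 1:  # last node of this level
                view.append(node.val)
            if node.left:
                q.append(node.left)
            if node.right:
                q.append(node.right)
    return view
```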

Classic Problems

LeetCode 102 — Binary Tree Level Order Traversal
LeetCode 103 — Binary Tree Zigzag Level Order Traversal
LeetCode 107 — Binary Tree Level Order Traversal II
LeetCode 111 — Minimum Depth of Binary Tree
LeetCode 116 — Populating Next Right Pointers in Each Node
LeetCode 199 — Binary Tree Right Side View
LeetCode 314 — Binary Tree Vertical Order Traversal

Common Bugs

Using list.pop(0) (Python) → O(N²). Use deque.
Forgetting to capture size = len(q) before the inner loop — q grows during the loop and you’d over-iterate.
Returning the level structure backwards (or forwards) accidentally.
Null root not handled.

15. Graph DFS (cycle / connected components / topo via DFS)

Signal Recognition (<2 min)

The structure is a graph (general, not necessarily tree).
Tasks: count connected components, detect cycle, topologically order, find bridges/articulation points (Phase 4).
Recursion is fine (or you simulate it with an explicit stack).

Canonical Template (Connected Components)

visited = [False] * n
def dfs(u):
    visited[u] = True
    for v in adj[u]:
        if not visited[v]: dfs(v)

components = 0
for u in range(n):
    if not visited[u]:
        dfs(u); components += 1

Canonical Template (Cycle Detection in Directed Graph)

WHITE, GRAY, BLACK = 0, 1, 2
color = [WHITE] * n
def has_cycle(u):
    color[u] = GRAY
    for v in adj[u]:
        if color[v] == GRAY: return True
        if color[v] == WHITE and has_cycle(v): return True
    color[u] = BLACK
    return False

Complexity

Time O(V + E). Space O(V) for the recursion + visited arrays.

Common Variants

Connected components (undirected).
Strongly connected components (Tarjan / Kosaraju — Phase 4).
Cycle detection: undirected uses a parent-check; directed uses tri-color (WHITE/GRAY/BLACK).
Topological sort via DFS — postorder of a DAG, reversed.
Number of islands / flood fill on grid graphs.

Classic Problems

LeetCode 200 — Number of Islands
LeetCode 207 — Course Schedule (cycle detection in directed graph)
LeetCode 261 — Graph Valid Tree
LeetCode 323 — Number of Connected Components
LeetCode 695 — Max Area of Island
LeetCode 1192 — Critical Connections (Tarjan bridges)

Common Bugs

Forgetting to mark visited before recursing → infinite recursion.
For undirected cycle detection, treating “going back to parent” as a cycle.
Stack overflow for deep recursion in Python on N=10^5 — convert to iterative.
For grid problems, going out of bounds because the cell is read before the bounds check; check bounds first. (In Python, a negative index silently wraps around instead of raising.)
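Here is an iterative Number of Islands sketch (LeetCode 200). It avoids Python's recursion limit, checks bounds before reading a cell, and marks cells when they are pushed. It assumes the grid is a list of strings (or lists) of "0"/"1" characters.

```python
def num_islands(grid):
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "1" or seen[r][c]:
                continue
            count += 1  # new component found; flood it with an explicit stack
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                i, j = stack.pop()
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    # bounds first: grid[-1] would silently wrap in Python
                    if 0 <= ni < rows and 0 <= nj < cols:
                        if grid[ni][nj] == "1" and not seen[ni][nj]:
                            seen[ni][nj] = True  # mark on push, not on pop
                            stack.append((ni, nj))
    return count
```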

16. Graph BFS (shortest unweighted / multi-source / 0-1)

Signal Recognition (<2 min)

Shortest path on an unweighted graph (or weights ∈ {0, 1}).
“Minimum number of steps / moves / transformations.”
Multi-source: “shortest distance from any of these sources” (rotting oranges, walls and gates).
0-1 BFS: use a deque, push 0-weight to front, 1-weight to back.

Canonical Template

from collections import deque
dist = [-1] * n
q = deque([src]); dist[src] = 0
while q:
    u = q.popleft()
    for v in adj[u]:
        if dist[v] == -1:
            dist[v] = dist[u] + 1
            q.append(v)

Complexity

Time O(V + E). Space O(V).

Common Variants

Standard BFS for unweighted shortest path.
Multi-source BFS — push all sources with distance 0, then run.
0-1 BFS with deque for graphs with weights ∈ {0, 1}.
Bidirectional BFS for shortest path between fixed source and target — both halves explore O(b^(d/2)) instead of O(b^d).
Word-ladder pattern — neighbors via wildcard buckets, not adjacency list.
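Here is a 0-1 BFS sketch for LeetCode 2290, where entering an obstacle cell costs 1 and an empty cell costs 0. Zero-weight moves go to the front of the deque and one-weight moves go to the back. A cell may be relaxed more than once, so the code compares dist values instead of using a visited flag.

```python
from collections import deque


def min_obstacles(grid):
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    dist[0][0] = 0
    dq = deque([(0, 0)])
    while dq:
        r, c = dq.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                w = grid[nr][nc]  # cost of entering the neighbor: 0 or 1
                if dist[r][c] + w < dist[nr][nc]:
                    dist[nr][nc] = dist[r][c] + w
                    if w == 0:
                        dq.appendleft((nr, nc))  # 0-weight edge: front
                    else:
                        dq.append((nr, nc))      # 1-weight edge: back
    return dist[rows - 1][cols - 1]
```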

Classic Problems

LeetCode 127 — Word Ladder
LeetCode 286 — Walls and Gates (multi-source)
LeetCode 542 — 01 Matrix (multi-source)
LeetCode 752 — Open the Lock
LeetCode 994 — Rotting Oranges (multi-source)
LeetCode 1162 — As Far from Land as Possible
LeetCode 2290 — Minimum Obstacle Removal to Reach Corner (0-1 BFS)

Common Bugs

Marking visited at dequeue time (lets duplicates pile up) instead of at enqueue time.
Using plain BFS on graphs with non-uniform positive weights: wrong, since fewest edges is not least total weight; use Dijkstra (Phase 4).
Forgetting that “minimum depth of binary tree” is naturally BFS: DFS must visit every leaf, while BFS can stop at the first leaf it dequeues.

17. Topological Sort (Kahn’s / DFS-based)

Signal Recognition (<2 min)

“Order tasks given dependencies” / “course prerequisites” / “build order” / “alien dictionary”.
Detecting whether a directed graph has a cycle (if no topological order exists, the graph has a cycle).
DP on DAG (some tasks need a topological order to evaluate).

Canonical Template (Kahn’s BFS)

from collections import deque

indeg = [0] * n
for u in range(n):
    for v in adj[u]:
        indeg[v] += 1
q = deque([u for u in range(n) if indeg[u] == 0])
order = []
while q:
    u = q.popleft()
    order.append(u)
    for v in adj[u]:
        indeg[v] -= 1
        if indeg[v] == 0: q.append(v)
return order if len(order) == n else []  # else: cycle

Canonical Template (DFS Postorder)

class CycleError(Exception): pass

order = []
WHITE, GRAY, BLACK = 0, 1, 2
color = [WHITE] * n
def dfs(u):
    color[u] = GRAY
    for v in adj[u]:
        if color[v] == GRAY: raise CycleError
        if color[v] == WHITE: dfs(v)
    color[u] = BLACK
    order.append(u)
for u in range(n):
    if color[u] == WHITE: dfs(u)
order.reverse()

Complexity

Time O(V + E). Space O(V + E).

Common Variants

Kahn’s — gives a valid order; cycle detection by len(order) != n.
DFS postorder reversed: alternate algorithm; it also gives a valid order, though not necessarily the same one Kahn's produces.
Lexicographically smallest topological order — use a min-heap instead of FIFO queue (Kahn’s).
All possible topological orderings: backtracking over the current zero-indegree choices.
DP on DAG following topological order.
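Here is the lexicographically smallest topological order as a sketch: Kahn's algorithm with heapq in place of the FIFO queue. Nodes are assumed to be 0..n-1, and an edge (u, v) means u must come before v. An empty list signals a cycle, mirroring the template above.

```python
import heapq


def smallest_topo_order(n, edges):
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    heap = [u for u in range(n) if indeg[u] == 0]
    heapq.heapify(heap)  # min-heap: always emit the smallest ready node
    order = []
    while heap:
        u = heapq.heappop(heap)
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                heapq.heappush(heap, v)
    return order if len(order) == n else []  # else: cycle
```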

Classic Problems

LeetCode 207 — Course Schedule
LeetCode 210 — Course Schedule II
LeetCode 269 — Alien Dictionary
LeetCode 310 — Minimum Height Trees (leaf peeling, a Kahn-style relative)
LeetCode 329 — Longest Increasing Path in a Matrix (DP on DAG)
LeetCode 444 — Sequence Reconstruction
LeetCode 1136 — Parallel Courses
LeetCode 2115 — Find All Possible Recipes

Common Bugs

Building the graph with reversed edges (prerequisites vs unlocks).
Not detecting cycles (returning a partial order silently).
For lexicographic smallest, using a regular queue instead of a heap.
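Indegree mismatch when the same edge appears twice in the input: if adjacency dedupes (e.g., a set) but indegree counts every occurrence, the node never reaches 0 (common in Alien Dictionary).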

18. Union-Find (connectivity / Kruskal preview)

Signal Recognition (<2 min)

“Are X and Y connected after these operations?” (online connectivity).
“Number of connected components” with dynamic union operations (DFS once-and-done is sufficient for static).
Kruskal’s MST (Phase 4) — sort edges, union components.
Equation problems (LC 399 — Evaluate Division — weighted union-find).

Canonical Template

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving (one-pass path compression)
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb: return False
        if self.rank[ra] < self.rank[rb]: ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]: self.rank[ra] += 1
        return True

Complexity

Per operation: amortized O(α(N)) ≈ O(1) with both path compression and union by rank/size. Union by rank/size alone: O(log N) worst case. Path compression alone: O(log N) amortized. Neither: O(N) worst case.

Common Variants

Vanilla connectivity.
With size — track component sizes for “find largest component”.
Weighted — each edge has a multiplier (LC 399 — Evaluate Division).
With rollback (Phase 3) — for offline / divide-and-conquer queries.
Kruskal MST — sort edges by weight, union the endpoints if they’re in different components.

Classic Problems

LeetCode 200 — Number of Islands (DSU alternative)
LeetCode 261 — Graph Valid Tree
LeetCode 305 — Number of Islands II (online)
LeetCode 399 — Evaluate Division
LeetCode 547 — Number of Provinces
LeetCode 684 — Redundant Connection
LeetCode 721 — Accounts Merge
LeetCode 1319 — Number of Operations to Make Network Connected

Common Bugs

Forgetting both path compression and union by rank/size: chains form, find becomes O(N), and adversarial chain inputs TLE.
find recursion that deepens the stack (use iterative or two-pass).
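Rank update: increment the new root's rank only when the two ranks were equal. Incrementing on every union makes rank stop reflecting tree height, so union by rank no longer guarantees shallow trees.
Testing connectivity with parent[a] == parent[b] instead of find(a) == find(b): a parent pointer is not necessarily the root until compression has run.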
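Here is a sketch of the DSU with union by size and a two-pass iterative find, applied to Redundant Connection (LeetCode 684, nodes 1..n). The first edge whose union returns False closes a cycle. Class and function names are illustrative.

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:  # pass 1: locate the root
            root = self.parent[root]
        while self.parent[x] != root:     # pass 2: point every node on the path at it
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]
        return True


def find_redundant_connection(edges):
    dsu = DSU(len(edges) + 1)  # nodes are 1..n, and n == len(edges) here
    for u, v in edges:
        if not dsu.union(u, v):
            return [u, v]
    return []
```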

19. Backtracking (subsets / permutations / combinations / N-queens)

Signal Recognition (<2 min)

“Find all subsets / permutations / combinations / arrangements satisfying X.”
“Place K items respecting constraints” (N-queens, Sudoku).
The brute force is exponential, and you can’t shave it polynomially — but you can prune aggressively.

Canonical Template

def backtrack(state, choices):
    if is_solution(state):
        record(state); return
    for choice in choices:
        if not valid(state, choice): continue
        apply(state, choice)
        backtrack(state, next_choices(choices, choice))
        undo(state, choice)

Complexity

Subsets: O(N · 2^N).
Permutations: O(N · N!).
N-queens: O(N!) worst case, dramatically pruned in practice.
Space: O(depth) for recursion + O(state size).

Common Variants

Subsets (include/exclude each element).
Permutations (choose unused; track used set or swap-in-place).
Combinations (start index to avoid reordering duplicates).
Partition into subsets (assign each element to a bucket; prune by sorting descending and skipping any bucket whose current sum equals one already tried at this step).
Constraint satisfaction (N-queens, Sudoku) — prune with row/column/box bitmasks.
String backtracking (palindrome partitioning, restore IP addresses, generate parentheses).
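For the constraint-satisfaction variant, here is a bitmask N-Queens counter (LeetCode 52 style) as a sketch. cols, diag1 and diag2 mark the columns attacked in the current row, and free & -free extracts the lowest free column.

```python
def total_n_queens(n):
    full = (1 << n) - 1  # one bit per column

    def place(cols, diag1, diag2):
        if cols == full:  # every column used: n queens placed
            return 1
        count = 0
        free = full & ~(cols | diag1 | diag2)  # columns safe in this row
        while free:
            bit = free & -free  # lowest free column
            free ^= bit
            # diagonals shift by one column per row going down
            count += place(cols | bit, ((diag1 | bit) << 1) & full, (diag2 | bit) >> 1)
        return count

    return place(0, 0, 0)
```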

Classic Problems

LeetCode 17 — Letter Combinations of a Phone Number
LeetCode 22 — Generate Parentheses
LeetCode 39 — Combination Sum
LeetCode 46 — Permutations
LeetCode 51 — N-Queens
LeetCode 78 — Subsets
LeetCode 79 — Word Search
LeetCode 90 — Subsets II (with duplicates)
LeetCode 131 — Palindrome Partitioning
LeetCode 212 — Word Search II (with trie)

Common Bugs

Forgetting to undo the choice before returning → state corruption.
Recording state by reference, not by copy → all results alias the final state.
Duplicate handling: forgetting if i > start and a[i] == a[i-1]: continue (for sorted input with duplicates).
For grid backtracking, forgetting to mark visited or not unmarking on return.
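Here is Subsets II (LeetCode 90) as a sketch that guards against the bugs above. It records a copy of the path, undoes each choice, and skips equal values at the same depth after sorting.

```python
def subsets_with_dup(nums):
    nums = sorted(nums)  # duplicates must be adjacent for the skip rule
    result = []
    path = []

    def backtrack(start):
        result.append(path[:])  # record a copy, not the live list
        for i in range(start, len(nums)):
            if i > start and nums[i] == nums[i - 1]:
                continue  # same value already tried at this depth
            path.append(nums[i])
            backtrack(i + 1)
            path.pop()  # undo the choice

    backtrack(0)
    return result
```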

20. Basic DP (memoization vs tabulation)

Signal Recognition (<2 min)

“Number of ways to …” / “Min/max … over choices.”
Recursive structure with overlapping subproblems: the same sub-question is asked multiple times.
Optimal substructure: the optimal answer combines optimal answers to subproblems.
The brute is exponential; with memo the state space is polynomial.

Canonical Template (Top-Down Memoization)

from functools import lru_cache
@lru_cache(maxsize=None)
def solve(state):
    if is_base(state): return base_value(state)
    return combine(solve(next_state(state, c)) for c in choices(state))

Canonical Template (Bottom-Up Tabulation)

dp = init_table()
for state in topological_order_of_states():
    dp[state] = combine(dp[prev] for prev in predecessors(state))
return dp[final_state]
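As a side-by-side illustration of the two templates, here is Coin Change (LC 322, minimum coins) written both ways. It is a sketch with illustrative function names. One caveat to keep in mind: the memoized version recurses up to amount / min(coins) deep, so CPython's default recursion limit (1000) can break it on large amounts, while the tabulated version has no such limit.

```python
from functools import lru_cache


def coin_change_memo(coins: list[int], amount: int) -> int:
    INF = float("inf")

    @lru_cache(maxsize=None)
    def solve(rem: int) -> float:
        if rem == 0:  # base case: zero coins make zero
            return 0
        best = INF
        for c in coins:
            if c <= rem:
                best = min(best, solve(rem - c) + 1)
        return best

    ans = solve(amount)
    return -1 if ans == INF else int(ans)


def coin_change_tab(coins: list[int], amount: int) -> int:
    INF = float("inf")
    dp = [0] + [INF] * amount  # dp[a] = min coins for amount a
    for a in range(1, amount + 1):  # every predecessor a - c < a is already final
        for c in coins:
            if c <= a:
                dp[a] = min(dp[a], dp[a - c] + 1)
    return -1 if dp[amount] == INF else int(dp[amount])
```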

Complexity

Time = (# states) × (work per state). Space = (# states), often reducible to the last one or two layers of the table (rolling array).

Common Variants (covered separately below)

1D DP (pattern 21).
2D DP (pattern 22).
Knapsack (pattern 23).
Subsequence DP (pattern 24).
String DP (pattern 25).
Bitmask / interval / digit / probability / tree (Phase 5).

Classic Problems

LeetCode 70 — Climbing Stairs
LeetCode 198 — House Robber
LeetCode 322 — Coin Change
LeetCode 416 — Partition Equal Subset Sum

Common Bugs

Wrong state definition: too coarse to capture everything the answer depends on, or too fine to fit in memory.
Wrong base case (off-by-one in the empty / single-element base).
Wrong evaluation order in tabulation — predecessors computed after dependents.
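Memo key collisions: two different states mapped to the same key (e.g., packing (i, j) as i * W + j with the wrong W, or leaving out of the key a parameter the result depends on).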

21. 1D DP (climbing stairs / house robber / decode ways)

Signal Recognition (<2 min)

The state is a single index: “the answer at position i depends on positions ≤ i”.
Transitions look at the last 1–3 positions.
The answer is at dp[n] or dp[n-1].

Canonical Template

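dp = [0] * (n + 1)
dp[0] = base_empty   # answer for the empty prefix
dp[1] = base_first   # answer for the first element; starting at i = 2 avoids dp[-1] wrapping to dp[n]
for i in range(2, n + 1):
    dp[i] = combine(dp[i - 1], dp[i - 2], a[i - 1])
return dp[n]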

Complexity

Time O(N), space O(N) — usually compressible to O(1).
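To show the O(1) compression, here is a minimal House Robber (LC 198) sketch with an illustrative function name. Only dp[i-1] and dp[i-2] survive. The simultaneous tuple assignment sidesteps the "read dp[i-2] after overwriting dp[i-1]" bug listed below.

```python
def rob(nums: list[int]) -> int:
    prev2, prev1 = 0, 0  # dp[i-2], dp[i-1]; the empty prefix is worth 0
    for x in nums:
        # dp[i] = max(skip house i, take house i + dp[i-2]); both reads use old values
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1
```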

Common Variants

Climbing stairs (Fibonacci-shaped).
House robber / robber II (linear / circular).
Decode ways (transitions depend on a 2-digit window).
Maximum subarray (Kadane’s).
Min cost climbing stairs.

Classic Problems

LeetCode 53 — Maximum Subarray (Kadane’s)
LeetCode 70 — Climbing Stairs
LeetCode 91 — Decode Ways
LeetCode 121 — Best Time to Buy and Sell Stock
LeetCode 198 — House Robber
LeetCode 213 — House Robber II
LeetCode 264 — Ugly Number II
LeetCode 746 — Min Cost Climbing Stairs

Common Bugs

Off-by-one in the dp size (n vs n+1).
Wrong base for empty input.
Decode ways: forgetting that 0 is not a valid single-digit decoding.
Compressing to O(1) but reading dp[i-2] after dp[i-1] is overwritten.
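For the Decode Ways zero trap, here is a minimal sketch of LC 91 over prefixes (function name illustrative). A '0' never decodes alone, and a two-digit window counts only when it lies in "10".."26". Because both strings have exactly two digits, lexicographic comparison matches numeric comparison, which also rejects windows like "06".

```python
def num_decodings(s: str) -> int:
    n = len(s)
    dp = [0] * (n + 1)  # dp[i] = number of ways to decode s[:i]
    dp[0] = 1  # one way to decode the empty prefix
    for i in range(1, n + 1):
        if s[i - 1] != "0":  # '0' is not a valid single-digit code
            dp[i] += dp[i - 1]
        if i >= 2 and "10" <= s[i - 2:i] <= "26":  # valid two-digit code
            dp[i] += dp[i - 2]
    return dp[n]
```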

22. 2D DP (grid / unique paths / LCS preview)

Signal Recognition (<2 min)

The state is a pair (i, j) — typically (row, col) or (index_in_A, index_in_B).
Transitions look at neighboring cells: dp[i-1][j], dp[i][j-1], dp[i-1][j-1].
Common shapes: grid path counting/min sum, LCS, edit distance, palindrome substring.

Canonical Template

dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
    for j in range(1, m + 1):
        dp[i][j] = combine(dp[i-1][j], dp[i][j-1], dp[i-1][j-1], a[i-1], b[j-1])
return dp[n][m]

Complexity

Time O(NM), space O(NM) — often compressible to O(min(N, M)) by keeping two rows.
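The one-row compression can be sketched on Minimum Path Sum (LC 64); the function name is illustrative. Before dp[j] is overwritten it still holds the value from the row above, while dp[j - 1] already holds this row's value. The +inf padding acts as the identity for min.

```python
def min_path_sum(grid: list[list[int]]) -> int:
    cols = len(grid[0])
    INF = float("inf")
    dp = [INF] * cols  # previous row; +inf means "unreachable"
    dp[0] = 0  # virtual cell above the start
    for row in grid:
        for j, cost in enumerate(row):
            left = dp[j - 1] if j > 0 else INF  # already updated for this row
            dp[j] = min(dp[j], left) + cost  # dp[j] is still the row above here
    return dp[-1]
```

Space drops from O(NM) to O(M), where M is the number of columns.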

Common Variants

Grid DP: unique paths, min path sum, max path sum.
Two-string DP: LCS, edit distance, regex matching, distinct subsequences (covered in 24/25).
Matrix DP: maximal square, dungeon game.
Backwards-traversal (start from (n, m)) when transitions need future state.

Classic Problems

LeetCode 62 — Unique Paths
LeetCode 64 — Minimum Path Sum
LeetCode 72 — Edit Distance
LeetCode 174 — Dungeon Game
LeetCode 221 — Maximal Square
LeetCode 1143 — Longest Common Subsequence

Common Bugs

Initializing the first row/column (or the padding) wrong: it must be the identity of the combine operator (0 for sums and counts, +∞ for min, −∞ for max).
Allocating [[0] * m] * n in Python — all rows alias the same list (top-3 Python DP bug).
1D-compression bug: reading the new value when the old one was needed (or vice versa).
For grids with obstacles: forgetting to force dp = 0 at obstacle cells (no path can pass through an obstacle).
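Three of these bugs show up in this minimal sketch of Unique Paths II (LC 63); the function name is illustrative. It allocates rows with a comprehension rather than list multiplication, pads with 0 (the additive identity), and zeroes obstacle cells.

```python
def unique_paths_with_obstacles(grid: list[list[int]]) -> int:
    rows, cols = len(grid), len(grid[0])
    # each row is a distinct list; [[0] * (cols + 1)] * (rows + 1) would alias them
    dp = [[0] * (cols + 1) for _ in range(rows + 1)]
    dp[0][1] = 1  # seed so that an open start cell gets exactly one path
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if grid[i - 1][j - 1] == 1:
                dp[i][j] = 0  # obstacle: no path passes through here
            else:
                dp[i][j] = dp[i - 1][j] + dp[i][j - 1]
    return dp[rows][cols]
```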

23. Knapsack (0/1 + unbounded)

Signal Recognition (<2 min)

“Pick a subset of items to maximize value subject to a capacity constraint” (0/1 knapsack — each item once).
“Pick items with repetition allowed” (unbounded knapsack — coin change min coins, integer break).
“Number of ways to make sum K from given items” (counting variant).

Canonical Template (0/1 Knapsack — Compressed 1D)

dp = [0] * (W + 1)
for v, w in items:
    for c in range(W, w - 1, -1):  # reverse to avoid re-using item
        dp[c] = max(dp[c], dp[c - w] + v)

Canonical Template (Unbounded Knapsack)

dp = [0] * (W + 1)
for c in range(1, W + 1):  # forward, so each item can be reused
    for v, w in items:
        if c >= w: dp[c] = max(dp[c], dp[c - w] + v)

Complexity

Time O(N · W). Space O(W) (compressed) or O(N · W) (uncompressed).

Common Variants

0/1 vs unbounded vs bounded (limited copies of each item).
Min coins, count-of-ways, can-we-make-this-sum.
Subset sum (knapsack with value = weight).
Partition equal subset sum (subset sum to total/2).
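The subset-sum and partition variants are the 0/1 template with booleans. Here is a minimal sketch of Partition Equal Subset Sum (LC 416); the function name is illustrative.

```python
def can_partition(nums: list[int]) -> bool:
    total = sum(nums)
    if total % 2:  # an odd total cannot split into two equal halves
        return False
    target = total // 2
    dp = [False] * (target + 1)  # dp[c]: some subset of items seen so far sums to c
    dp[0] = True
    for w in nums:
        for c in range(target, w - 1, -1):  # reverse so each item is used at most once
            dp[c] = dp[c] or dp[c - w]
    return dp[target]
```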

Classic Problems

LeetCode 322 — Coin Change (unbounded, min)
LeetCode 416 — Partition Equal Subset Sum (0/1, decision)
LeetCode 474 — Ones and Zeroes (0/1 with two capacities)
LeetCode 494 — Target Sum (count-of-ways)
LeetCode 518 — Coin Change II (unbounded, count)
LeetCode 879 — Profitable Schemes
LeetCode 1049 — Last Stone Weight II

Common Bugs

0/1 with forward inner loop → double-counts items.
Unbounded with reverse inner loop → behaves like 0/1.
For “count of ways”: if order does not matter (combinations), loop items outer and capacity inner (LC 518); if order matters (sequences), loop capacity outer and items inner (LC 377).
Forgetting that dp[0] = 1 for count-of-ways, dp[0] = 0 for max-value.
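The loop-order distinction is easiest to see by running both versions on the same kind of input. Below is a minimal sketch with illustrative function names. Swapping the loops turns a count of combinations (LC 518) into a count of ordered sequences (LC 377).

```python
def count_combinations(coins: list[int], amount: int) -> int:
    # LC 518, order-insensitive: items outer, capacity inner
    dp = [0] * (amount + 1)
    dp[0] = 1  # one way to make 0: choose nothing
    for coin in coins:
        for c in range(coin, amount + 1):  # forward: unbounded reuse
            dp[c] += dp[c - coin]
    return dp[amount]


def count_sequences(nums: list[int], target: int) -> int:
    # LC 377, order-sensitive: capacity outer, items inner
    dp = [0] * (target + 1)
    dp[0] = 1
    for c in range(1, target + 1):
        for x in nums:
            if x <= c:
                dp[c] += dp[c - x]
    return dp[target]
```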

24. Subsequence DP (LIS / LCS / edit distance)

Signal Recognition (<2 min)

“Longest increasing / common / non-decreasing subsequence.”
“Edit distance / minimum operations to transform A to B.”
“Distinct subsequences / supersequences.”
“Longest palindromic subsequence” (it’s LCS of s and s[::-1]).

Canonical Template (LIS, O(N log N))

import bisect
tails = []
for x in a:
    i = bisect.bisect_left(tails, x)  # bisect_right for non-decreasing
    if i == len(tails): tails.append(x)
    else: tails[i] = x
return len(tails)

Canonical Template (LCS, 2D DP; edit distance uses the same table with a different recurrence)

dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
    for j in range(1, m + 1):
        if a[i-1] == b[j-1]:
            dp[i][j] = dp[i-1][j-1] + 1   # LCS
        else:
            dp[i][j] = max(dp[i-1][j], dp[i][j-1])
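Edit distance (LC 72, Levenshtein) fills the same (n + 1) × (m + 1) table with a different recurrence and non-zero base row/column. Here is a minimal sketch; the function name is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i characters of a
    for j in range(m + 1):
        dp[0][j] = j  # insert all j characters of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match: no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],      # delete a[i-1]
                                   dp[i][j - 1],      # insert b[j-1]
                                   dp[i - 1][j - 1])  # replace
    return dp[n][m]
```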

Complexity

LIS: O(N log N) (patience-sort) or O(N²) (DP). LCS / edit distance: O(NM).

Common Variants

LIS, longest non-decreasing, count of LIS, reconstruction.
LCS, shortest common supersequence (N + M − LCS).
Edit distance (Levenshtein), with weighted operations.
Distinct subsequences (count occurrences of T as subsequence of S).
Longest palindromic subsequence (= LCS of s and reversed s).

Classic Problems

LeetCode 72 — Edit Distance
LeetCode 115 — Distinct Subsequences
LeetCode 300 — Longest Increasing Subsequence
LeetCode 354 — Russian Doll Envelopes (LIS in 2D)
LeetCode 516 — Longest Palindromic Subsequence
LeetCode 583 — Delete Operation for Two Strings
LeetCode 673 — Number of LIS
LeetCode 1143 — Longest Common Subsequence

Common Bugs

LIS with bisect_left vs bisect_right controls strict vs non-strict — pick the wrong one and ties are mishandled.
Edit distance: forgetting that the base row and column are not zeros but dp[i][0] = i (i deletions) and dp[0][j] = j (j insertions).
Reconstruction: walking the dp table backward is easy to get off by one.

25. String DP (palindrome / partitioning)

Signal Recognition (<2 min)

“Longest palindromic substring / subsequence.”
“Minimum cuts to partition into palindromes.”
“Word break / segment string into dictionary words.”
“Regex / wildcard matching.”

Canonical Template (Palindrome Substring DP)

n = len(s)
dp = [[False] * n for _ in range(n)]
for i in range(n): dp[i][i] = True
for length in range(2, n + 1):
    for i in range(n - length + 1):
        j = i + length - 1
        if s[i] == s[j] and (length == 2 or dp[i+1][j-1]):
            dp[i][j] = True

Complexity

Most string-DP problems: O(N²) time, O(N²) space (often compressible to O(N)). Manacher (Phase 3) gives O(N) for longest-palindrome.
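For longest palindromic substring, expand-around-center reaches the same O(N²) time with O(1) extra space. Here is a minimal sketch; the function name is illustrative. Each index serves as the center of an odd-length and an even-length candidate.

```python
def longest_palindrome(s: str) -> str:
    if not s:
        return ""
    best_lo, best_hi = 0, 0  # inclusive bounds of the best palindrome so far
    for center in range(len(s)):
        for lo, hi in ((center, center), (center, center + 1)):  # odd, even length
            while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
                lo -= 1
                hi += 1
            # the loop overshoots by one on each side
            if (hi - 1) - (lo + 1) > best_hi - best_lo:
                best_lo, best_hi = lo + 1, hi - 1
    return s[best_lo:best_hi + 1]
```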

Common Variants

Longest palindromic substring (DP, expand-around-center, or Manacher).
Longest palindromic subsequence (LCS-based).
Minimum cuts (palindrome partitioning II).
Word break (boolean DP) and word break II (recover all decompositions).
Regex / wildcard matching (?, *, .).
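Word break (LC 139) is a prefix DP in which the last word's start j is the transition. Here is a minimal sketch; the function name is illustrative. It also bounds j by the longest dictionary word, which is an optional pruning.

```python
def word_break(s: str, word_dict: list[str]) -> bool:
    words = set(word_dict)
    max_len = max(map(len, words), default=0)
    dp = [False] * (len(s) + 1)  # dp[i]: s[:i] can be segmented
    dp[0] = True
    for i in range(1, len(s) + 1):
        for j in range(max(0, i - max_len), i):  # last word would be s[j:i]
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break
    return dp[len(s)]
```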

Classic Problems

LeetCode 5 — Longest Palindromic Substring
LeetCode 10 — Regular Expression Matching
LeetCode 44 — Wildcard Matching
LeetCode 132 — Palindrome Partitioning II
LeetCode 139 — Word Break
LeetCode 140 — Word Break II
LeetCode 516 — Longest Palindromic Subsequence
LeetCode 647 — Palindromic Substrings

Common Bugs

Iteration order: filling dp[i][j] requires dp[i+1][j-1] to be filled already, so iterate by length (or i descending with j ascending), not i ascending then j ascending.
Word break: building all decompositions naively is O(2^N) — memoize, but be aware total output can still be exponential.
Wildcard * matching empty vs many — both transitions needed.
Off-by-one when j = i + length - 1.

26. Trie (prefix queries / autocomplete preview)

Signal Recognition (<2 min)

“Many strings, prefix queries” — does any word start with X? Count words starting with X?
Autocomplete / spell check.
Word search II (LC 212) — combine trie with backtracking on a grid.
Maximum XOR pair (LC 421) — bit-level trie.
Replace words / dictionary lookup.

Canonical Template

class TrieNode:
    __slots__ = ('children', 'end')
    def __init__(self):
        self.children = {}
        self.end = False

class Trie:
    def __init__(self): self.root = TrieNode()
    def insert(self, word):
        node = self.root
        for c in word:
            node = node.children.setdefault(c, TrieNode())
        node.end = True
    def search(self, word):
        node = self._walk(word)
        return node is not None and node.end
    def starts_with(self, prefix):
        return self._walk(prefix) is not None
    def _walk(self, s):
        node = self.root
        for c in s:
            if c not in node.children: return None
            node = node.children[c]
        return node

Complexity

Insert / search / prefix: O(L) per operation, where L is the length of the word or prefix. Space O(N · L) worst case for N words of length L (no shared prefixes).

Common Variants

Character trie (by char).
Bit trie (by bit) — for XOR / Hamming-distance problems.
Compressed (radix) trie — Phase 3.
Trie + DFS for “all words on a board” (LC 212) — early-prune by failing nodes.
Suffix trie / suffix automaton — Phase 3 / Phase 12.
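The bit-trie variant can be sketched on Maximum XOR of Two Numbers (LC 421). The function name and the nested-dict node representation are illustrative choices, and the code assumes non-negative integers below 2**bits. At each level it greedily walks toward the opposite bit whenever that child exists.

```python
def find_maximum_xor(nums: list[int], bits: int = 31) -> int:
    # Bit trie over fixed-width binary representations, high bit first.
    root: dict = {}
    for x in nums:
        node = root
        for b in range(bits - 1, -1, -1):
            node = node.setdefault((x >> b) & 1, {})
    best = 0
    for x in nums:
        node, cur = root, 0
        for b in range(bits - 1, -1, -1):
            bit = (x >> b) & 1
            want = 1 - bit  # opposite bit sets this XOR position to 1
            if want in node:
                cur |= 1 << b
                node = node[want]
            else:
                node = node[bit]  # always exists: x itself was inserted
        best = max(best, cur)
    return best
```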

Classic Problems

LeetCode 208 — Implement Trie (Prefix Tree)
LeetCode 211 — Design Add and Search Words Data Structure
LeetCode 212 — Word Search II
LeetCode 336 — Palindrome Pairs
LeetCode 421 — Maximum XOR of Two Numbers in an Array (bit trie)
LeetCode 648 — Replace Words
LeetCode 677 — Map Sum Pairs
LeetCode 1268 — Search Suggestions System

Common Bugs

Forgetting the end flag (or whatever marks a complete word) — search("app") returns true when only "apple" was inserted.
Using a 26-element array vs a hashmap — array is faster but only for fixed alphabets.
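Assuming node.children iterates in sorted order: a Python dict iterates in insertion order, so sort the keys (or use a 26-slot array) when results must come out lexicographically (e.g., LC 1268).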
For LC 212, not pruning nodes after they’ve been used (still works correctly but wastes time).

27. Heap Top-K (k-largest / k-frequent / k-closest)

Signal Recognition (<2 min)

“Find the K largest / smallest / most frequent / closest.”
Online/streaming K-th element.
Median maintenance (two heaps).
“Merge K sorted streams” (pattern 28 — see below).

Canonical Template (Top-K with Min-Heap of Size K)

import heapq
heap = []
for x in stream:
    if len(heap) < K:
        heapq.heappush(heap, x)
    elif x > heap[0]:
        heapq.heapreplace(heap, x)
return heap  # K largest, unsorted

Complexity

Time O(N log K). Space O(K). Compare to full sort O(N log N) — beats it when K << N.

Common Variants

Top K largest / smallest.
Top K frequent: bucket sort by frequency gives O(N), since every frequency is between 1 and N (LC 347).
K closest points to origin — heap of K by distance.
Median from data stream — two heaps (max-heap of low half, min-heap of high half).
K-th smallest in matrix / K-th smallest in BST — heap or controlled traversal.
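The bucket-sort alternative for top K frequent can be sketched as follows (LC 347, function name illustrative). It relies on the fact that no frequency exceeds len(nums), so one bucket per possible frequency covers everything.

```python
from collections import Counter


def top_k_frequent(nums: list[int], k: int) -> list[int]:
    freq = Counter(nums)
    # buckets[f] holds the values that occur exactly f times
    buckets: list[list[int]] = [[] for _ in range(len(nums) + 1)]
    for value, f in freq.items():
        buckets[f].append(value)
    out: list[int] = []
    for f in range(len(nums), 0, -1):  # highest frequency first
        for value in buckets[f]:
            out.append(value)
            if len(out) == k:
                return out
    return out
```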

Classic Problems

LeetCode 215 — Kth Largest Element in an Array
LeetCode 295 — Find Median from Data Stream
LeetCode 347 — Top K Frequent Elements
LeetCode 451 — Sort Characters by Frequency
LeetCode 692 — Top K Frequent Words
LeetCode 703 — Kth Largest Element in a Stream
LeetCode 973 — K Closest Points to Origin
LeetCode 1046 — Last Stone Weight

Common Bugs

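Reaching for a max-heap for top-K-largest: heapifying all N and popping K times costs O(N + K log N) time and O(N) space. The bounded min-heap of size K (evict the smallest) costs O(N log K) time and O(K) space, and it works on streams.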
Java PriorityQueue is min-heap by default; use Comparator.reverseOrder() for max.
Python heapq is min-heap only; negate values for max-heap.
For “top K frequent words” with tie-breaking (alphabetical) — comparator gets tricky in Java/Python.
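The negation trick and the two-heap median both appear in this minimal sketch of Find Median from Data Stream (LC 295). The class and method names are snake_case stand-ins for LC's MedianFinder / addNum / findMedian. The lower half lives in heapq as negated values, which makes it act as a max-heap.

```python
import heapq


class MedianFinder:
    def __init__(self) -> None:
        self.low: list[int] = []   # max-heap of the lower half, stored negated
        self.high: list[int] = []  # min-heap of the upper half

    def add_num(self, num: int) -> None:
        # route through low so every low value stays <= every high value
        heapq.heappush(self.low, -num)
        heapq.heappush(self.high, -heapq.heappop(self.low))
        # rebalance: low holds the extra element when the count is odd
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def find_median(self) -> float:
        if len(self.low) > len(self.high):
            return float(-self.low[0])
        return (-self.low[0] + self.high[0]) / 2
```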

28. K-Way Merge (merge K lists / smallest range covering K lists)

Signal Recognition (<2 min)

K sorted lists/streams; merge them into one sorted output.
“Find the smallest range that contains at least one element from each of K lists.”
“Find the K-th smallest in K sorted lists / matrix.”
External-merge / external-sort flavor.

Canonical Template (Merge K Sorted Lists)

import heapq
heap = []
for i, lst in enumerate(lists):
    if lst: heapq.heappush(heap, (lst[0].val, i, lst[0]))
dummy = ListNode(0); tail = dummy
while heap:
    val, i, node = heapq.heappop(heap)
    tail.next = node; tail = tail.next
    if node.next:
        heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next

Complexity

Time O(N log K) where N is the total number of elements. Space O(K) for the heap.
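The template above assumes a ListNode class. For plain Python lists, a self-contained sketch of the same idea follows (function name illustrative). Heap entries are (value, list index, element index), so ties are broken by integers and the heap never holds more than K entries. The standard library's heapq.merge offers a lazy alternative.

```python
import heapq


def merge_k_sorted(arrays: list[list[int]]) -> list[int]:
    # (value, array index, element index): the indices break ties between equal values
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    out: list[int] = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(arrays[i]):  # push the next element of the same array
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return out
```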

Common Variants

Merge K sorted lists / arrays / streams.
Smallest range covering at least one from each list — heap holds one element per list, track current max; pop the min and advance the popped list.
K-th smallest in sorted matrix — heap of (value, row, col); pop, push next-in-row (or use binary search on answer instead).
Find smallest pair sums (LC 373) — heap from two sorted lists.
Skyline problem (LC 218) — sweep over events with a heap of active heights.
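The "smallest range" variant can be sketched as follows (LC 632, function name illustrative, assuming every list is non-empty as the problem guarantees). The heap holds one element per list, and the window maximum is a running value rather than a heap rescan. The search stops as soon as the popped list runs out.

```python
import heapq


def smallest_range(lists: list[list[int]]) -> list[int]:
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists)]  # one entry per list
    heapq.heapify(heap)
    cur_max = max(lst[0] for lst in lists)  # running max of the current window
    best = [heap[0][0], cur_max]
    while True:
        lo, i, j = heapq.heappop(heap)  # current window is [lo, cur_max]
        if cur_max - lo < best[1] - best[0]:
            best = [lo, cur_max]
        if j + 1 == len(lists[i]):  # list i exhausted: no later window can cover it
            return best
        nxt = lists[i][j + 1]
        cur_max = max(cur_max, nxt)  # values only grow, so no rescan is needed
        heapq.heappush(heap, (nxt, i, j + 1))
```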

Classic Problems

LeetCode 23 — Merge K Sorted Lists
LeetCode 218 — The Skyline Problem
LeetCode 264 — Ugly Number II
LeetCode 313 — Super Ugly Number
LeetCode 373 — Find K Pairs with Smallest Sums
LeetCode 378 — Kth Smallest Element in a Sorted Matrix
LeetCode 632 — Smallest Range Covering Elements from K Lists
LeetCode 1675 — Minimize Deviation in Array

Common Bugs

Heap items must include a tiebreaker — comparing ListNode directly raises TypeError in Python.
Forgetting to push the next element in the same list after popping.
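For “smallest range”: re-scanning the heap for the current max (O(K) per step) instead of maintaining it incrementally; values only increase, so a running max is enough.
Mishandling an exhausted list: in a merge, simply stop pushing from it; in “smallest range”, stop the whole search, because no later window can include an element of that list.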

Pattern Recognition Cheat Sheet (Signal → Pattern)

This is the table you should be able to traverse, top-to-bottom, in <60 seconds for any new Medium.

Signal in problem statement	Likely pattern(s)	First template to try
Sorted input + pair/triplet sum	Two pointers (1)	opposite-ends two-pointer
In-place removal / partition	Two pointers (1)	read/write pointer
Subarray with property over contiguous elements	Sliding window (2) or prefix sum (3)	shrink-while-violation
Max/min of every window K	Monotonic queue (10)	deque indices
Subarray sum equals K / divisible K	Prefix sum + hash (3, 5)	prefix + complement map
Many range updates then final state	Difference array (4)	diff + prefix
“Find pair summing to target”	Hash complement (5)	seen[target − x]
“Group by canonical form”	Hashing — grouping (5)	dict[canonical] → list
“Maximum non-overlapping …”	Sort + greedy (6, 11)	sort by end, sweep
“Number of meeting rooms”	Intervals — sweep (11)	events, +1/−1
Sorted, find element / first ≥ X	Binary search on index (7)	lower_bound
“Min capacity / time / speed s.t. P”	Binary search on answer (8)	binary search + feasible()
“Next greater / span / histogram”	Monotonic stack (9)	monotonic stack (decreasing for next greater, increasing for histogram)
Linked-list reverse / cycle / merge	Linked-list patterns (12)	dummy + 3-pointer
Tree value combined from subtrees	Tree DFS postorder (13)	recursive combine
Tree level-by-level	Tree BFS (14)	queue, capture size
Graph “connected components”	Graph DFS (15)	visited + DFS
Shortest path on unweighted graph	Graph BFS (16)	distances + queue
Shortest path with weights ∈ {0,1}	0-1 BFS (16)	deque, push-front 0
“Order tasks given deps”	Topological sort (17)	Kahn’s BFS
“Connectivity with online unions”	Union-find (18)	DSU with path compression
“Kruskal MST / spanning tree”	Union-find (18)	sort edges + DSU
“All subsets / permutations”	Backtracking (19)	recurse + undo
“Constraint satisfaction (N-queens)”	Backtracking (19)	bitmask state
“Count ways / min / max with overlapping subproblems”	DP (20)	memoize state
“Single-index recurrence”	1D DP (21)	dp[i] from dp[i-1..i-3]
“Two-index recurrence”	2D DP (22)	dp[i][j] from neighbors
“Pick subset under capacity”	0/1 knapsack (23)	reverse inner loop
“Pick with repetition”	Unbounded knapsack (23)	forward inner loop
“LIS / LCS / edit distance”	Subsequence DP (24)	2D dp or patience sort
“Longest palindromic *” / “min cuts”	String DP (25)	palindrome dp + outer loop
“Many strings, prefix queries”	Trie (26)	trie + insert/search
“K largest/smallest/closest”	Heap top-K (27)	min-heap of size K
“Merge K sorted …”	K-way merge (28)	heap of one-per-list

When two patterns plausibly fit, try both signals on a small example. Often one fits cleanly and the other forces awkward special cases.

Mastery Checklist For This Phase

You are ready to leave Phase 2 when, for every one of the 28 patterns:

You can recognize the signal in <2 minutes on a fresh Medium.
You can write the canonical template from memory in <5 minutes, without lookup.
You can articulate the time/space complexity precisely, including amortized vs worst case.
You can name 4+ classic LeetCode problems where this pattern is the intended solution.
You can list at least 2 common bugs that the pattern is famous for.
You have solved at least 5 problems applying this pattern, with at least 2 reviewed via REVIEW_TEMPLATE.md.

And, more generally:

You can produce the cheat-sheet table above from scratch (or close to it) on a whiteboard.
You can name, given a signal, the most likely pattern plus the second-most-likely (because tricky problems disguise themselves).
You can combine two patterns when one alone is insufficient (e.g., monotonic deque inside sliding window for LC 862; trie inside backtracking for LC 212).
You have run mock interviews on Mediums and your time-to-recognize is reliably under 2 minutes.

Required Problem Volume

The patterns are not learned from this README. They are learned from solving lots of problems per pattern and reviewing each via REVIEW_TEMPLATE.md, then revisiting via SPACED_REPETITION.md.

Recommended minimums for Phase 2 completion (per pattern):

5–8 Medium problems that explicitly use the pattern as their intended solution.
2 mock-interview Mediums where the pattern is not hinted (you must recognize it).
1 problem at the boundary where two patterns plausibly apply — pick one, justify, solve.

Total over the phase: ~150–200 Mediums. This is the cost. The benefit is that, after Phase 2, almost every Medium becomes a 5-minute problem.

Exit Criteria

You may move to Phase 3 — Advanced Data Structures when:

You score ≥ 75% on a 10-problem random-Medium mock (35 min each, no hints, no lookup), with the pattern recognition and template write-up completed in the first 5 minutes of each problem.
You can pass the READINESS_CHECKLIST.md entries for “pattern recognition” without lookup.
You have completed all 9 labs in labs/ with the format-required deliverables.
You have at least 40 problems in your spaced-repetition queue from this phase, with first reviews passed.

If your unaided solve rate on random Mediums is below 75%, do not advance. Spend another 1–2 weeks at this level, focusing on the patterns where you missed. The patterns calcify here. Calcify wrong patterns and Phase 3+ becomes a fight against your own intuition.

How This Phase Connects To The Rest

Phase 0 / Phase 1 are prerequisites. You cannot recognize patterns if you cannot read the problem and you do not know the data structures.
Phase 3 (Advanced DS) generalizes patterns 9, 10, 18, 26, 27, 28 with segment trees, BITs, balanced BSTs, suffix arrays.
Phase 4 (Graphs) generalizes patterns 15, 16, 17, 18 with Dijkstra, Bellman-Ford, SCC, max flow.
Phase 5 (DP) generalizes patterns 20–25 with bitmask, interval, tree, digit, probability DP.
Phase 6 (Greedy) formalizes the proof techniques behind pattern 6 (sort + greedy).
Phase 9 (Language/Runtime) drills the language-specific footguns called out in each pattern’s “Common Bugs”.

You will return to this README dozens of times over the rest of the curriculum — each return faster than the last, until eventually the patterns are no longer something you look up but something you simply see.

Lab 01 — Two Pointers: 3Sum (Deep Duplicate Handling)

Goal

Master the opposite-ends two-pointer pattern on a sorted array, with the specific discipline required for correct duplicate suppression. Deliverable solves 3Sum in O(N²) time, O(1) extra space, returning all unique triplets — and you can articulate, line by line, why each duplicate-skip is needed and what bug it prevents.

Background Concepts

Sorting as a precondition for two-pointer; loop invariants on (l, r); duplicate suppression at three loci (the outer i, the inner l, the inner r); the relationship between 3Sum and 2Sum-on-sorted; why hash-set deduplication is the wrong tool here. Review the Two Pointers and Hashing patterns.

Interview Context

3Sum is one of the most frequently asked problems in Big Tech onsite loops; it appears at Meta, Amazon, Bloomberg, and Apple. The signal is whether you can articulate the duplicate-handling logic before you code, not retroactively. Naive candidates generate triplets first and deduplicate afterwards via a hash set (set(tuple(sorted(triplet)))), which works but signals weak invariant reasoning. Strong candidates handle duplicates inside the two-pointer loop with three skip-clauses and explain each one’s purpose. Elite candidates also address the trade-off vs the hash-based 3Sum (when the input is unsorted and you can’t sort).

Problem Statement

Given an integer array nums, return all unique triplets [nums[i], nums[j], nums[k]] such that i, j, k are distinct indices and nums[i] + nums[j] + nums[k] == 0. The result must not contain duplicate triplets.

Constraints

3 ≤ N ≤ 3000
-10^5 ≤ nums[i] ≤ 10^5
Output is a list of triplets in any order; each triplet’s internal order doesn’t matter.
Triplets are deduplicated by value, not by index: [-1, 0, 1] from indices (0, 2, 4) and from (1, 3, 5) count once.

Clarifying Questions

Should [-1, 0, 1] and [0, 1, -1] count as different triplets? (No — value-set duplicates.)
Are duplicates in the input array allowed? (Yes — many test cases hinge on this.)
Can the input be empty / size < 3? (Per constraints, N ≥ 3, but mention you’d guard.)
Can values exceed 32-bit range when summed? (Per constraints, max |sum| ≤ 3·10^5, safe in 32-bit, but you should mention overflow as a habit.)
Is there a strict no-extra-space requirement, or is the output allocation OK?

Examples

Input	Output
`[-1,0,1,2,-1,-4]`	`[[-1,-1,2],[-1,0,1]]`
`[0,0,0,0]`	`[[0,0,0]]`
`[1,2,-2,-1]`	`[]`
`[]`	`[]`
`[0]`	`[]`

Initial Brute Force

Three nested loops; check nums[i] + nums[j] + nums[k] == 0; dedup with a hash set of sorted triplets.

def three_sum_brute(nums):
    out = set()
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if nums[i] + nums[j] + nums[k] == 0:
                    out.add(tuple(sorted((nums[i], nums[j], nums[k]))))
    return [list(t) for t in out]

Brute Force Complexity

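Time O(N³). Space O(N²) worst case for out (the number of unique triplets). At N = 3000 the innermost body runs C(3000, 3) ≈ 4.5×10^9 times (N³ ≈ 2.7×10^10 is the loose bound): too slow for typical time limits even in C++, and orders of magnitude worse in CPython.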

Optimization Path

The sub-problem after fixing nums[i] is 2Sum on the remaining sorted array with target -nums[i]. 2Sum-on-sorted is O(N) via opposite-ends two-pointer. Total: O(N) outer × O(N) inner = O(N²).

Why sort? Sorting enables the O(N) two-pointer 2Sum with O(1) extra space and places equal values next to each other. Without sorting, 2Sum is still O(N) via a hash set, but combining hash-2Sum with the outer triplet enumeration makes duplicate handling much trickier.

Final Expected Approach

def three_sum(nums):
    nums.sort()
    n = len(nums)
    out = []
    for i in range(n - 2):
        if nums[i] > 0: break                              # early exit
        if i > 0 and nums[i] == nums[i - 1]: continue      # skip i-duplicate
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                out.append([nums[i], nums[l], nums[r]])
                l += 1; r -= 1
                while l < r and nums[l] == nums[l - 1]: l += 1   # skip l-duplicate
                while l < r and nums[r] == nums[r + 1]: r -= 1   # skip r-duplicate
    return out

Data Structures Used

The input array, sorted in place.
An output list of triplets.
Three integer indices (i, l, r).

No hash structures; no auxiliary lists beyond output.

Correctness Argument

After sorting, fix i. The remaining array nums[i+1..n-1] is sorted, and we run the canonical opposite-ends 2Sum: when the sum is too small we advance l, when too large we retreat r, when equal we record and advance both. The standard 2Sum-on-sorted invariant (every solution pair with an index outside the current window [l, r] has already been recorded by value) guarantees that every value pair (nums[l], nums[r]) with l < r and nums[l] + nums[r] == target is found; the skip-clauses below ensure each one is recorded only once.

For duplicates: the three skip-clauses ensure that each distinct triplet by value is recorded exactly once.

Outer skip (if i > 0 and nums[i] == nums[i-1]: continue): the previous outer iteration already enumerated all triplets whose first element equals nums[i]. Without this, [-1,-1,0,1,2] would record [-1,0,1] twice (once for each -1 as the outer pivot).
Left skip (while l < r and nums[l] == nums[l-1]: l += 1, after recording): if the value just matched at l repeats to its right, each copy would pair with the same right value to give the same triplet. Skip them.
Right skip (while l < r and nums[r] == nums[r+1]: r -= 1): the symmetric argument for the right pointer.

The early exit if nums[i] > 0: break is correct because once the smallest element of the triplet is positive, no triplet sums to zero (the array is sorted).

Complexity

Time O(N²): O(N log N) sort + O(N²) total work in the nested two-pointer (for each i, l and r move monotonically toward each other over a window of size O(N), and the outer i runs N times). Space O(1) extra, excluding the output and the sort’s own auxiliary space (an O(log N) stack for introsort or dual-pivot quicksort, up to O(N) for Python’s Timsort).

Implementation Requirements

Sort in place (do not allocate a sorted copy).
Skip duplicates inside the loop, not via a hash set on output.
Use the early-exit if nums[i] > 0: break (small but real speedup for typical inputs).
Move both pointers on a match, then run the skip loops — not before, or you’ll never advance off the matched element.
Return a List<List<Integer>> / list[list[int]] — not a set, not tuples.

Tests

Smoke: [-1,0,1,2,-1,-4] → [[-1,-1,2],[-1,0,1]].
Unit: all-zeros ([0,0,0,0] → [[0,0,0]]); no triplets ([1,2,3] → []); minimum size ([0,0,0] → [[0,0,0]]).
Edge: size 0 / 1 / 2 → []; all-negative; all-positive (sum > 0 from index 0 → break immediately, return []).
Large: N = 3000, values in [-10^5, 10^5]; assert sub-second; verify count against the brute force on a 100-element prefix.
Random: generate 50 random inputs of size ≤ 200; compare against the brute force solution as oracle.
Adversarial: [0]*3000 (all zeros — exactly one triplet [0,0,0]); long run of duplicates of a single value.
Invalid: non-integer / null input — out of scope per constraints, but mention you’d validate at the boundary.
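The random-oracle test above can be sketched as a small differential harness. The helper names normalize and differential_test are illustrative, and the two solvers are copied from this lab so the block runs on its own. A narrow value range forces many duplicates, which is exactly where the skip-clauses can go wrong.

```python
import random


def three_sum_brute(nums):
    out = set()
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if nums[i] + nums[j] + nums[k] == 0:
                    out.add(tuple(sorted((nums[i], nums[j], nums[k]))))
    return [list(t) for t in out]


def three_sum(nums):
    nums.sort()
    n = len(nums)
    out = []
    for i in range(n - 2):
        if nums[i] > 0: break
        if i > 0 and nums[i] == nums[i - 1]: continue
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                out.append([nums[i], nums[l], nums[r]])
                l += 1; r -= 1
                while l < r and nums[l] == nums[l - 1]: l += 1
                while l < r and nums[r] == nums[r + 1]: r -= 1
    return out


def normalize(triplets):
    # order of triplets, and order inside each triplet, do not matter
    return sorted(sorted(t) for t in triplets)


def differential_test(trials=50, max_n=40, lo=-10, hi=10, seed=0):
    rng = random.Random(seed)  # fixed seed: failures are reproducible
    for _ in range(trials):
        nums = [rng.randint(lo, hi) for _ in range(rng.randint(0, max_n))]
        expected = normalize(three_sum_brute(nums))
        actual = normalize(three_sum(list(nums)))  # copy: three_sum sorts in place
        assert actual == expected, (nums, actual, expected)
    return True
```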

Follow-up Questions

“What about kSum?” → recurse: kSum(nums, target, k) calls (k-1)Sum(remaining, target - nums[i]), base case is 2Sum-sorted. Time O(N^(k-1)).
“What if the array is unsorted and you cannot sort it?” → 2Sum-with-hash inside an outer enumeration; duplicate handling becomes per-output-set deduplication, more memory.
“What if values repeat heavily (e.g., 99% zeros)?” → the worst case stays O(N²), but the outer skip makes each repeated pivot O(1), so the total work is O(N · D) for D distinct values: near-linear when D is small, as on that adversarial input.
“Can you do better than O(N²)?” → not by a polynomial factor, as far as anyone knows: the 3SUM conjecture states that no O(N^(2−ε)) algorithm exists for any ε > 0 (only polylogarithmic speedups are known). It is a conjecture, not a proven lower bound. Many computational-geometry problems are 3SUM-hard, so a truly subquadratic 3Sum would make them subquadratic too.
“What about returning closest triplet to a target?” → 3SumClosest variant; same skeleton, track the best |s - target| seen.
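The kSum recursion from the first follow-up can be sketched as follows. The function name k_sum is illustrative, and the code assumes k ≥ 2. Each level fixes a pivot with the same outer duplicate skip as 3Sum, and the base case is the skip-aware 2Sum-on-sorted.

```python
def k_sum(nums: list[int], target: int, k: int) -> list[list[int]]:
    nums = sorted(nums)

    def solve(start: int, target: int, k: int) -> list[list[int]]:
        out: list[list[int]] = []
        if k == 2:  # base case: opposite-ends 2Sum on nums[start:]
            l, r = start, len(nums) - 1
            while l < r:
                s = nums[l] + nums[r]
                if s < target:
                    l += 1
                elif s > target:
                    r -= 1
                else:
                    out.append([nums[l], nums[r]])
                    l += 1; r -= 1
                    while l < r and nums[l] == nums[l - 1]: l += 1
                    while l < r and nums[r] == nums[r + 1]: r -= 1
            return out
        for i in range(start, len(nums) - k + 1):  # leave room for k - 1 more
            if i > start and nums[i] == nums[i - 1]:
                continue  # this pivot value was already expanded at this level
            for rest in solve(i + 1, target - nums[i], k - 1):
                out.append([nums[i]] + rest)
        return out

    return solve(0, target, k)
```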

Product Extension

Detecting fraud rings of size 3 in a transaction graph, where the signed amounts of three transactions must net to zero (cancel out) and transactions are distinct by ID. The same skeleton works: fix one transaction, then two-pointer the rest by signed amount. The wrinkle is that “duplicate” must be defined carefully. Transactions are distinct by ID even when amounts are equal, so the value-based duplicate skipping is replaced by a per-amount enumeration that emits every distinct ID combination summing to zero.

Language/Runtime Follow-ups

Python: nums.sort() is in place. Use out.append([nums[i], nums[l], nums[r]]), not out.append((...)), when a list of lists is required (LC accepts either, but the contract is list).
Java: sort with Arrays.sort(nums) — note this is dual-pivot quicksort for primitives, average O(N log N). For Integer[] it’s TimSort. Use List<List<Integer>> and add Arrays.asList(a, b, c) per triplet — Arrays.asList returns a fixed-size list, which is fine because we never mutate it.
Go: sort.Ints(nums) sorts in place; the rest is plain index arithmetic on the slice.
C++: std::sort(nums.begin(), nums.end()). Use std::vector<std::vector<int>> for output. Beware: integer addition nums[i] + nums[l] + nums[r] can overflow 32-bit if value bounds were larger; with the given constraints it’s safe, but as a habit, use long long.
JS/TS: Array.prototype.sort() defaults to lexicographic (string) comparison, a top-3 JS interview bug. Use nums.sort((a, b) => a - b). This comparator is safe: JS numbers are doubles, so a - b cannot overflow the way Java’s (a, b) -> a - b does on int.
Performance: the O(N²) two-pointer phase dominates, and the O(N log N) sort is negligible by comparison (cheaper still on already-sorted input for adaptive sorts such as Timsort).

Common Bugs

Forgetting the nums[i] == nums[i-1] skip on the outer loop: produces duplicate output triplets (e.g., [-1,0,1] twice on [-1,-1,0,1,2]).
Forgetting to advance pointers on a match: without l += 1; r -= 1, the while l < r loop re-finds the same pair forever. The advance must also come before the look-behind skip loops. If the skips run first, nums[l] == nums[l-1] compares the matched element with its left neighbor instead of the value just recorded, and nums[r + 1] can index past the end of the array.
Using <= instead of < in while l < r: l == r would use the same index for both pointers, which is invalid (i, j, k must be distinct indices).
JS lexicographic sort — [-1, -1, 2, -4, 0, 1] after default sort is [-1, -1, -4, 0, 1, 2] (string-sorted). Always pass a numeric comparator.
Ignoring 32-bit overflow in C++/Java (or with Go’s int32) when constraints allow large values. With LC 3Sum’s constraints it doesn’t bite, but the habit costs nothing and saves you on related problems (4Sum, kSum-with-target).
Hash-set deduplication on output — works, but signals you didn’t internalize the invariant. Time still O(N²) but space O(N²) instead of O(1).
Sorting twice by accident (once explicitly, once implicit in a downstream API) — innocuous but signals carelessness.

Debugging Strategy

Run on [0,0,0,0] first — should return [[0,0,0]]. If you get [[0,0,0],[0,0,0]], your outer-skip is broken. If you get [], your inner loop never matches (likely l < r typo).
Run on [-2,0,0,2,2]. Expected: [[-2,0,2]]. If you get [[-2,0,2],[-2,0,2]], your inner-r skip is broken.
Diff against the brute force on 50 random inputs of size 50. The answers should match as sets (order of triplets and inner order doesn’t matter; sort each triplet and the list of triplets to compare).
Time a 3000-element random input. Should run in well under a second in Python; under 50ms in C++ / Java / Go.
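The randomized diff from the list above can be sketched in Python as follows. three_sum is a fresh two-pointer implementation written for this sketch (it assumes target 0, as in LC 15), and three_sum_brute, canon, and diff_test are hypothetical helper names; the value range [-10, 10] is an arbitrary choice that forces many duplicates, which is exactly what exercises the skip clauses.

```python
import random
from itertools import combinations


def three_sum(nums):
    """Sort + two pointers with the three skip clauses (target 0)."""
    nums = sorted(nums)
    out = []
    n = len(nums)
    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue  # outer skip: this anchor value was already tried
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                out.append([nums[i], nums[l], nums[r]])
                l += 1
                r -= 1  # advance first, then skip duplicates
                while l < r and nums[l] == nums[l - 1]:
                    l += 1  # inner skip on the left
                while l < r and nums[r] == nums[r + 1]:
                    r -= 1  # inner skip on the right
    return out


def three_sum_brute(nums):
    """O(N^3) reference: every index triple, deduplicated as sorted tuples."""
    return {tuple(sorted(t)) for t in combinations(nums, 3) if sum(t) == 0}


def canon(triplets):
    """Sort each triplet and the list of triplets so order does not matter."""
    return sorted(tuple(sorted(t)) for t in triplets)


def diff_test(trials=50, size=50, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-10, 10) for _ in range(size)]
        fast = canon(three_sum(nums))
        slow = sorted(three_sum_brute(nums))
        assert fast == slow, (nums, fast, slow)
        assert len(fast) == len(set(fast)), "duplicate triplet emitted"
    return True
```

The brute force deduplicates through a set on purpose: it is the slow-but-obviously-correct oracle, while the fast version must produce unique triplets on its own.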

Mastery Criteria

Recognized the signal “sorted, find triplet summing to target” as 3Sum within 30 seconds of reading.
Articulated the three skip-clauses before writing them.
Wrote a correct implementation in under 8 minutes, no lookup.
Passed the all-zeros, single-duplicate, and large-N tests on the first attempt.
Stated the conditional Ω(N²) lower bound when asked “can you do better?”.
Identified the JS lexicographic-sort trap (or its language equivalent) without prompting.
Generalized verbally to kSum and to 3SumClosest.

Lab 02 — Sliding Window: Longest Substring With At Most K Distinct Characters

Goal

Master the variable-size sliding window with a frequency map and a “shrink while violation” invariant. Deliverable solves the problem in O(N) time, O(K) extra space, with the loop invariant articulated explicitly: at the end of every iteration, s[l..r] contains at most K distinct characters.

Background Concepts

The shrink-while-violation skeleton; using a count of distinct items vs a full hashmap traversal; the difference between “at most K” and “exactly K” (the atMost(K) - atMost(K-1) trick); when fixed-size and variable-size windows apply. Review pattern Sliding Window.

Interview Context

This problem (LeetCode 340) is a Google / Meta favorite, and a near-direct ancestor of LeetCode 76 (Minimum Window Substring), 159 (At Most Two Distinct), 992 (Subarrays With K Different Integers), and 424 (Longest Repeating Character Replacement). The interview signal is whether you maintain the invariant cleanly: a single while violation: loop that shrinks until valid, then unconditional answer-update. Weak candidates write nested if-statements with off-by-one errors. Strong candidates write the canonical 5-line shrink-and-update.

Problem Statement

Given a string s and an integer K, return the length of the longest substring of s that contains at most K distinct characters.

Constraints

1 ≤ |s| ≤ 5 × 10^4
0 ≤ K ≤ |s|
s consists of arbitrary characters (Unicode in some variants; ASCII in the standard variant).

Clarifying Questions

Is the alphabet ASCII or Unicode? (If Unicode, hashmap; if ASCII-lowercase, an int[26].)
What if K == 0? (Empty substring is valid; answer is 0.)
What if K >= number of distinct chars in s? (Answer is len(s).)
Is the answer the length or the substring itself? (LC asks length; mention you can record (start, length) for the substring.)
Can s be empty? (Per constraints, |s| ≥ 1, but the function should return 0 on empty input.)

Examples

Input	Output	Note
`s="eceba", K=2`	3	“ece”
`s="aa", K=1`	2	“aa”
`s="abcabc", K=2`	2	“ab”, “bc”, or “ca”; every length-3 window has 3 distinct
`s="a", K=0`	0	empty
`s="abcdef", K=10`	6	full string

Initial Brute Force

Enumerate all substrings, check distinct-count, track max.

def longest_brute(s, K):
    best = 0
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) <= K:
                best = max(best, j - i + 1)
            else:
                break
    return best

The inner loop breaks at the first violation, which prunes a lot when distinct characters are dense, but the worst case is still O(N²): when K is at least the number of distinct characters in s, no break ever fires.

Brute Force Complexity

Time O(N²) in the worst case, since set add and size checks are O(1) on average. At N = 5×10^4, N² ≈ 2.5×10^9, too slow in any language.

Optimization Path

Observation: as r advances, the set of distinct characters in s[l..r] is monotone non-decreasing; as l advances (keeping r fixed), it is monotone non-increasing. So we can use a two-pointer / sliding window: advance r, and while the window has more than K distinct chars, advance l. Each character enters and leaves the window at most once, total O(N).

Use a frequency map freq[c] keyed by character; distinct is the count of characters with freq[c] > 0. Increment distinct when a key first reaches 1; decrement when a key drops to 0.

Final Expected Approach

def longest_at_most_k_distinct(s, K):
    if K == 0: return 0
    freq = {}
    l = 0
    best = 0
    for r, c in enumerate(s):
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > K:
            freq[s[l]] -= 1
            if freq[s[l]] == 0:
                del freq[s[l]]
            l += 1
        best = max(best, r - l + 1)
    return best

We use len(freq) directly as the distinct count, since we delete keys that hit zero.

Data Structures Used

A hashmap freq: char → int of size at most K+1 during the violation, ≤ K otherwise.
Two integer pointers l, r.
A running maximum best.

Correctness Argument

Invariant: at the start of each iteration of the outer loop and after the inner shrink, freq contains exactly the characters of s[l..r] with their counts, and len(freq) ≤ K.

Base: before iteration 0, l = 0, freq = {}, len(freq) = 0 ≤ K. ✓

Step: we add s[r] to freq. If this brings len(freq) > K, we shrink: decrement freq[s[l]], delete on zero, advance l. The shrink loop terminates with l ≤ r: each step removes one character from the window, and once only s[r] remains, len(freq) = 1 ≤ K (K ≥ 1 after the K == 0 guard), so at most r - l shrink steps are ever needed.

Optimality (max): for each r, the smallest l such that the window has ≤ K distinct is recorded; this gives the longest valid window ending at r. Taking the max over all r gives the global longest.

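Why while, not if: adding s[r] raises the distinct count by at most 1, but removing s[l] does not necessarily lower it. With s = "aab" and K = 1, adding b gives {a, b}; dropping one a leaves "ab", still 2 distinct. So a single if-step can leave the window invalid and break the invariant above. (A “non-shrinking window” variant built on if still returns the right length here, but it needs a separate argument and does not carry over to LC 76, where you shrink while the window stays satisfied.) Always default to while; pay the (zero) cost of generality.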
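To make the if-versus-while difference concrete, here is a small Python sketch that records (l, r, distinct) after each outer iteration. The function name window_states and the use_while flag are illustrative, not part of the lab's solution; use_while=False emulates an if by allowing at most one left step per r.

```python
def window_states(s, K, use_while=True):
    """Record (l, r, distinct) after each outer iteration of the sliding window."""
    freq, l, states = {}, 0, []
    for r, c in enumerate(s):
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > K:
            freq[s[l]] -= 1
            if freq[s[l]] == 0:
                del freq[s[l]]
            l += 1
            if not use_while:
                break  # emulate `if`: at most one left step per r
        states.append((l, r, len(freq)))
    return states


# With while, the window is valid after every iteration.
print(window_states("aab", 1))                   # [(0, 0, 1), (0, 1, 1), (2, 2, 1)]
# With a single if-step, "ab" (2 distinct) survives: the invariant is broken.
print(window_states("aab", 1, use_while=False))  # [(0, 0, 1), (0, 1, 1), (1, 2, 2)]
```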

Complexity

Time O(N): each character is added once, removed at most once. Each hash op is O(1) average. Space O(min(N, K+1)) for the frequency map.

Implementation Requirements

Use len(freq) (or maintain a distinct counter) — do not iterate the map to count.
Delete keys that hit zero, or your len(freq) will be wrong.
Update best after the shrink, not inside it. (Inside the shrink, the window is invalid.)
For ASCII-lowercase, an int[26] plus a separate distinct counter is faster than a hashmap (no hashing constant).
Guard K == 0 (or K < 0) at the top.
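A minimal Python sketch of the int[26]-plus-counter variant from the list above, assuming s contains only the lowercase letters 'a' to 'z' (the function name is illustrative). The same layout maps directly onto an int[26] in Java, Go, or C++.

```python
def longest_at_most_k_lowercase(s, K):
    """Array-based sliding window; assumes s contains only 'a'..'z'."""
    if K <= 0:
        return 0
    counts = [0] * 26      # counts[i] = occurrences of chr(ord('a') + i) in s[l..r]
    distinct = 0           # number of i with counts[i] > 0
    l = 0
    best = 0
    for r, ch in enumerate(s):
        i = ord(ch) - ord('a')
        if counts[i] == 0:
            distinct += 1  # ch is new to the window
        counts[i] += 1
        while distinct > K:
            j = ord(s[l]) - ord('a')
            counts[j] -= 1
            if counts[j] == 0:
                distinct -= 1  # s[l] has left the window entirely
            l += 1
        best = max(best, r - l + 1)  # update only after the shrink completes
    return best
```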

Tests

Smoke: ("eceba", 2) → 3, ("aa", 1) → 2.
Unit: K = 0 → 0; K ≥ |distinct(s)| → |s|; single-character string.
Edge: s = "" → 0; K = 1, s = "abcdef" → 1; K = |s| → |s|.
Large: |s| = 5 × 10^4, alphabet 26, K = 3; should run in well under 100ms.
Random: generate 100 random strings of length ≤ 200, alphabet of varying size, varying K; cross-check against the brute force.
Adversarial: all-same-character string ("aaaa...", K=1 → N); all-distinct string ("abcdef...", any 1 ≤ K ≤ N → K).
Unicode follow-up: make sure your Java/JS code iterates by codepoint, not by char, if the spec extends to Unicode.
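The random cross-check can be sketched in Python as below. It repeats the brute force and the hashmap solution from this lab so the block runs on its own; the alphabet sizes, length cap, range of K, and seed are arbitrary choices.

```python
import random


def longest_brute(s, K):
    best = 0
    for i in range(len(s)):
        seen = set()
        for j in range(i, len(s)):
            seen.add(s[j])
            if len(seen) > K:
                break  # every longer substring from i is also invalid
            best = max(best, j - i + 1)
    return best


def longest_at_most_k_distinct(s, K):
    if K == 0:
        return 0
    freq, l, best = {}, 0, 0
    for r, c in enumerate(s):
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > K:
            freq[s[l]] -= 1
            if freq[s[l]] == 0:
                del freq[s[l]]
            l += 1
        best = max(best, r - l + 1)
    return best


def cross_check(trials=100, max_len=200, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        alphabet = "abcdefghijklmnopqrstuvwxyz"[: rng.randint(1, 26)]
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        K = rng.randint(0, len(alphabet) + 1)  # covers K = 0 and K above distinct(s)
        assert longest_at_most_k_distinct(s, K) == longest_brute(s, K), (s, K)
    return True
```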

Follow-up Questions

“Now solve exactly K distinct.” → atMost(K) - atMost(K-1) for the count variant; for the longest-with-exactly-K, run the same sliding window but only record best when len(freq) == K (not ≤ K).
“Now solve LC 76 (Minimum Window Substring).” → same skeleton, but the violation is “window does not yet contain all required chars”; we grow until satisfied, then shrink while still satisfied, recording the minimum each time we’re satisfied.
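“What if the string is streamed and only r advances?” → the plain two-pointer has to read s[l] while shrinking, so it must buffer the whole window. To keep only O(K) state you need an order-preserving structure: map each character to its last-seen index, ordered by recency, and on a (K+1)-th distinct character evict the least recently seen one and set l to its last index + 1.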
“What if K can change with each query?” → preprocess differently; this problem is not amenable to a single offline sliding window for many K values.
“What if the input is very large, alphabet huge, but s only contains a few distinct chars?” → no change; the hashmap is bounded by min(K+1, |distinct(s)|).
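For the counting version of the exactly-K follow-up (LC 992 style), a Python sketch: count_at_most_k counts substrings instead of measuring the longest one, because every start in [l, r] gives a valid substring ending at r. Function names are illustrative; the code accepts any sequence of hashable items, so it also runs on integer lists.

```python
def count_at_most_k(seq, K):
    """Number of contiguous substrings/subarrays with at most K distinct items."""
    if K <= 0:
        return 0
    freq, l, total = {}, 0, 0
    for r, c in enumerate(seq):
        freq[c] = freq.get(c, 0) + 1
        while len(freq) > K:
            freq[seq[l]] -= 1
            if freq[seq[l]] == 0:
                del freq[seq[l]]
            l += 1
        total += r - l + 1  # starts l, l+1, ..., r all give valid windows ending at r
    return total


def count_exactly_k(seq, K):
    # Windows with exactly K distinct = (at most K) minus (at most K - 1).
    return count_at_most_k(seq, K) - count_at_most_k(seq, K - 1)
```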

Product Extension

A real-time content-moderation system tracks the longest run of messages in a chat where at most K distinct emojis are used (an indicator of spam-bot activity, which tends to use a small bag of emojis on repeat). The sliding window updates per message in O(1) amortized. The same skeleton applies to “longest session window with at most K distinct user-agents” for fraud detection, and “longest range of cells with at most K distinct values” for spreadsheet anomaly detection.
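If messages arrive one at a time and you cannot keep the whole window, one way to get O(K) state is to remember only each item's last-seen index, ordered by recency. The sketch below uses Python's collections.OrderedDict for that (move_to_end and popitem(last=False) are standard methods); the function name and the idea of feeding it a stream of emojis are illustrative assumptions, not part of the lab.

```python
from collections import OrderedDict


def longest_at_most_k_stream(stream, K):
    """Longest window with at most K distinct items, keeping only O(K) state."""
    if K <= 0:
        return 0
    last = OrderedDict()  # item -> last index seen, least recently seen first
    l = 0                 # start of the current window
    best = 0
    for r, c in enumerate(stream):
        if c in last:
            last.move_to_end(c)  # c becomes the most recently seen item
        last[c] = r
        if len(last) > K:
            # Evict the item whose last occurrence is earliest; the window
            # must start just after that occurrence.
            _, old_idx = last.popitem(last=False)
            l = old_idx + 1
        best = max(best, r - l + 1)
    return best


print(longest_at_most_k_stream(iter(["🙂", "🙂", "🔥", "🙂", "🎉"]), 2))  # 4
```

Each arrival does O(1) dictionary work, and the map never holds more than K + 1 entries.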

Language/Runtime Follow-ups

Python: dict overhead is real; for ASCII-lowercase, [0]*26 plus a distinct int avoids hashing and is usually faster. There is no need for s = list(s); string indexing is already O(1).
Java: HashMap<Character, Integer> boxes keys and values. For ASCII, use int[128] and track distinct manually. chars() method exists but s.charAt(i) is fine.
Go: map[byte]int for ASCII; map[rune]int for Unicode. range s on a string yields (byte_index, rune) — this is a top-3 Go string trap: for i, c := range s does not give you i as character index for multi-byte chars.
C++: std::unordered_map<char, int> works; for ASCII, std::array<int, 128> is faster.
JS/TS: Map<string, number> works. Iterating for (const c of s) yields codepoints, not UTF-16 code units — important if Unicode is in scope. Otherwise, s[i] works for ASCII.
Unicode caveat: “distinct character” might mean codepoint, grapheme cluster, or UTF-16 code unit — clarify with the interviewer.

Common Bugs

Updating best inside the shrink loop — records invalid windows, returns wrong answers when the only valid window length is small.
Forgetting to delete zero-count keys — len(freq) becomes stale, breaks the violation check. Equivalent bug: maintaining a separate distinct counter and forgetting to decrement when a count hits zero.
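Using if instead of while: one left step may not remove a distinct character ("aab", K = 1), so the window can stay invalid. It only returns the right length here through the non-shrinking-window argument, which breaks for LC 76 and friends. Build the habit of while.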
Off-by-one in r - l + 1 — r is inclusive, l is inclusive; window length is r - l + 1.
Java boxing in HashMap<Character, Integer>: every get and put boxes, which is typically several times slower than the primitive int[] approach for ASCII.
Go byte-indexing bug: iterating with for i := 0; i < len(s); i++ and indexing as s[i] gives bytes, not runes; for UTF-8 data this counts the bytes of one multi-byte character as separate characters.
JS UTF-16 surrogate pair bug: s.length for "😀abc" is 5 (a 2-unit surrogate pair plus 3 ASCII), and s[0], s[1] are the surrogate halves, not the emoji.

Debugging Strategy

Trace ("eceba", 2) by hand. The window evolves e → ec → ece (best = 3); then b enters and eceb has 3 distinct, so shrink by dropping e (ceb, still 3 distinct) and then c (eb, 2 distinct); then a enters and eba has 3 distinct, so drop e (ba). The answer stays 3. If your trace doesn't match, your shrink logic is wrong.
Diff against the brute force on 50 random inputs.
Print (l, r, len(freq), best) per iteration; len(freq) should never exceed K after the shrink completes.

Mastery Criteria

Recognized “longest substring with property” as sliding window in <30 seconds.
Wrote the canonical shrink-while-violation template with the answer-update outside the shrink, in <5 minutes.
Articulated the loop invariant and the termination of the shrink loop.
Generalized to “exactly K distinct” via the atMost(K) - atMost(K-1) trick.
Identified the language-specific Unicode / boxing trap.
Solved LC 76, LC 424, LC 992 within a week, observing the same skeleton.

Lab 03 — Prefix Sums: Subarray Sum Equals K

Goal

Master the prefix-sum + hashmap-of-complements pattern. Deliverable solves LeetCode 560 in O(N) time, O(N) space, and you can articulate why a hashmap of prefix sums (not of values) is the right abstraction, why {0: 1} is the required base case, and why this generalizes to subarray-XOR-equals-K, subarray-sum-divisible-by-K, and friends.

Background Concepts

Prefix sum identity: sum(a[l..r]) = prefix[r+1] - prefix[l]. Reformulating “find subarrays with sum K” as “find pairs of prefix sums differing by K”. Hashmap as a complement-finder. Generalization to any group operation (XOR, mod, addition over any abelian group). Review pattern Prefix Sums and Hashing.

Interview Context

This is a rite-of-passage Medium: appears at Meta, Google, Amazon, Stripe. Naive candidates write O(N²) double loops over all subarrays. Decent candidates write a prefix-sum array then double loop over endpoints — still O(N²). Strong candidates collapse to O(N) with a hashmap-of-prefix-counts. Elite candidates immediately generalize to LC 974 (subarray sums divisible by K) and LC 525 (contiguous array — recast as prefix-balance equals zero) without prompting.

Problem Statement

Given an integer array nums and an integer K, return the total number of contiguous subarrays whose sum equals K.

Constraints

1 ≤ N ≤ 2 × 10^4
-1000 ≤ nums[i] ≤ 1000
-10^7 ≤ K ≤ 10^7
The array can contain negative numbers (this matters — sliding window does not apply).

Clarifying Questions

Are values negative or non-negative? (Per constraints — both. This is the crucial clarification: with non-negatives, sliding window works in O(N); with negatives, you must use prefix sums.)
Are zeros allowed? (Yes, and they create multiple subarrays of the same sum; the count must reflect this.)
Empty subarrays — count them? (No; subarrays have ≥ 1 element. But the prefix-sum technique uses an “empty prefix” of value 0, hence {0: 1} initialization.)
Is K always reachable? (No assumption.)
Can K = 0? (Yes — counts subarrays summing to 0, including those that are entirely zero.)

Examples

Input	Output	Note
`nums=[1,1,1], K=2`	2	`[1,1]` at indices `(0,1)` and `(1,2)`
`nums=[1,2,3], K=3`	2	`[3]` and `[1,2]`
`nums=[1,-1,0], K=0`	3	`[1,-1]`, `[0]`, `[1,-1,0]`
`nums=[0,0,0], K=0`	6	every contiguous subarray
`nums=[100], K=100`	1	trivial

Initial Brute Force

Two nested loops over (l, r), sum nums[l..r], count.

def subarray_sum_brute(nums, K):
    count = 0
    for l in range(len(nums)):
        s = 0
        for r in range(l, len(nums)):
            s += nums[r]
            if s == K: count += 1
    return count

Brute Force Complexity

Time O(N²). Space O(1). At N = 2×10⁴, that is N²/2 ≈ 2×10⁸ inner iterations: borderline; passes in C++ but TLEs in Python.

Optimization Path

The key reformulation: a subarray nums[l..r] sums to K iff prefix[r+1] - prefix[l] == K, i.e., prefix[l] == prefix[r+1] - K.

So as we walk r from 0 to N-1, computing the running prefix sum, we ask: “How many earlier prefix sums equal prefix - K?” This is a hashmap lookup. Each step is O(1). Total O(N).

The base case is subtle and important: before processing any element, we have prefix sum 0, “seen once”. This accounts for subarrays starting at index 0 (where the missing earlier prefix is the empty prefix of value 0).

Final Expected Approach

def subarray_sum(nums, K):
    count = 0
    prefix = 0
    seen = {0: 1}                       # empty prefix
    for x in nums:
        prefix += x
        count += seen.get(prefix - K, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return count

Crucial ordering: lookup before insert. Otherwise the case K == 0 over-counts: every position would match itself.

Data Structures Used

A hashmap seen: int → int mapping each prefix-sum value to the number of times it has occurred.
A running prefix integer.
A running count integer.

Correctness Argument

Let p_i = sum(nums[0..i-1]) (so p_0 = 0). A subarray nums[l..r] sums to K iff p_{r+1} - p_l = K iff p_l = p_{r+1} - K.

For each r, the number of valid l ∈ [0, r] is |{l : p_l == p_{r+1} - K}|. As we iterate, seen after processing index r contains exactly {p_0, p_1, ..., p_{r+1}} with multiplicities. Looking up seen[prefix - K] before inserting prefix gives |{l ∈ [0, r] : p_l == p_{r+1} - K}| — the count of valid l for the current r.

Summing over all r gives the total count of valid subarrays. The {0: 1} initialization handles the case l == 0 (where p_0 == 0 is consulted).

Complexity

Time O(N) average (hashmap ops). Space O(N) for the hashmap (worst case: all distinct prefix sums). Worst-case time degrades to O(N²) under hash collisions on adversarial input — see Phase 1 §3.

Implementation Requirements

Initialize seen = {0: 1} before the loop. Forgetting this causes off-by-one on subarrays starting at index 0.
Lookup before insert. This isn’t just style — for K == 0, swapping the order miscounts trivially.
Use 64-bit accumulator if values × N could overflow 32-bit (here, 2×10^4 × 1000 = 2×10^7, safe; but build the habit).
Don’t precompute the prefix-sum array if you don’t need to — a single running int suffices.

Tests

Smoke: ([1,1,1], 2) → 2.
Unit: K = 0 cases; K larger than total sum (returns 0); single element matching K (returns 1); single element not matching K (returns 0).
Edge: nums = [0]*N, K = 0 → N*(N+1)/2. This validates that zeros are correctly counted.
Adversarial: alternating positives and negatives summing to K many times. Construct [1,-1,1,-1,...,1,-1] (N even) with K=0: the prefix sums alternate 0, 1, 0, 1, …, so the count is C(N/2 + 1, 2) + C(N/2, 2) = (N/2)². Cross-check against the brute force as well.
Large: N = 2×10⁴, random values; assert ms-level. Compare to brute force on a 1000-element prefix.
Random: 100 random inputs of size ≤ 200; cross-check against brute.
Negative K: include K negative; must work because prefix sums and prefix - K use the same arithmetic.
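The closed-form and random tests above can be wired together as a small Python harness. Names, seeds, and value ranges are illustrative, and the solution and brute force are repeated so the block is self-contained. The alternating case uses the closed form (N/2)² for [1,-1] repeated N/2 times, which follows from its prefix sums alternating 0, 1, 0, 1, ….

```python
import random


def subarray_sum(nums, K):
    count, prefix, seen = 0, 0, {0: 1}        # {0: 1} is the empty prefix
    for x in nums:
        prefix += x
        count += seen.get(prefix - K, 0)      # lookup before insert
        seen[prefix] = seen.get(prefix, 0) + 1
    return count


def subarray_sum_brute(nums, K):
    count = 0
    for l in range(len(nums)):
        s = 0
        for r in range(l, len(nums)):
            s += nums[r]
            if s == K:
                count += 1
    return count


def run_checks(seed=2):
    # Closed forms: all zeros gives N(N+1)/2; alternating [1,-1] gives (N/2)^2.
    for n in (1, 2, 5, 10):
        assert subarray_sum([0] * n, 0) == n * (n + 1) // 2
    for half in (1, 2, 5):
        assert subarray_sum([1, -1] * half, 0) == half * half
    rng = random.Random(seed)
    for _ in range(100):
        nums = [rng.randint(-5, 5) for _ in range(rng.randint(1, 200))]
        K = rng.randint(-10, 10)  # includes negative K
        assert subarray_sum(nums, K) == subarray_sum_brute(nums, K), (nums, K)
    return True
```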

Follow-up Questions

“Subarray sum divisible by K?” → key by prefix % K (taking care to normalize to non-negative for languages where % can return a negative value: ((prefix % K) + K) % K). Count pairs of equal residues.
“Subarray with XOR equal to K?” → exact same skeleton, replace + with ^. The group operation just needs an identity and an inverse.
“Longest subarray with sum K?” → store the first occurrence of each prefix-sum value; when you see prefix - K, the length is r+1 - first[prefix - K].
“Contiguous array (LC 525) — equal 0s and 1s?” → re-encode 0s as -1; the problem becomes “longest subarray with sum 0”.
“If memory is tight (cannot store hashmap)?” → if values are non-negative, sliding window in O(1) extra; otherwise no known better.
“What if the array is updated?” → Fenwick tree for prefix sums; each update O(log N), each query O(log N) (Phase 3).
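Two of these follow-ups sketched in Python (function names are illustrative). For LC 974, Python's % already returns a non-negative residue when K > 0, so the ((x % K) + K) % K normalization is only needed in C++/Java/Go. For the longest-subarray variant, first stores the earliest prefix index for each prefix value, with the empty prefix at index 0.

```python
def count_divisible_by_k(nums, K):
    """Count subarrays whose sum is divisible by K (assumes K > 0)."""
    count, prefix = 0, 0
    seen = {0: 1}                   # empty prefix has residue 0
    for x in nums:
        prefix += x
        r = prefix % K              # non-negative in Python for K > 0
        count += seen.get(r, 0)     # equal residues => difference divisible by K
        seen[r] = seen.get(r, 0) + 1
    return count


def longest_sum_k(nums, K):
    """Length of the longest subarray with sum exactly K, or 0 if none."""
    first = {0: 0}                  # prefix value -> earliest prefix index
    prefix, best = 0, 0
    for i, x in enumerate(nums):
        prefix += x                 # prefix is now p_{i+1}
        j = first.get(prefix - K)
        if j is not None:
            best = max(best, i + 1 - j)
        first.setdefault(prefix, i + 1)  # keep only the first occurrence
    return best


# LC 525 re-encoding: 0 -> -1, then "longest subarray with sum 0".
print(longest_sum_k([1 if b else -1 for b in [0, 1, 0]], 0))  # 2
```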

Product Extension

Anomaly detection in cumulative metric streams. Imagine network ingress where a “subarray sum equals K” query asks: “how many time windows had exactly K bytes of traffic?” — useful for detecting periodicity or replay patterns. The same prefix-hashmap pattern, run online, detects these in O(1) per event with O(window-size) memory. Extends to log-aggregation systems (CloudWatch, Datadog) where dashboards scan millions of events and need O(N) algorithms.

Language/Runtime Follow-ups

Python: defaultdict(int) is cleaner than dict.get(...). Performance is similar.
Java: HashMap<Long, Integer> for safety. Boxing! Each getOrDefault(prefix, 0) + 1 followed by put(prefix, ...) does up to 4 boxes; pre-boxing or merge() with Integer::sum reduces it.
Go: map[int]int; the zero value of int is 0 so seen[prefix - K] returns 0 if absent — no need for ok check on lookups (but you do need to insert seen[0] = 1 initially).
C++: std::unordered_map<long long, int>. Beware: unordered_map<int, int>::operator[] inserts on read of a missing key, growing the map by one for every miss. Use find() for lookup-only.
JS/TS: prefer Map<number, number>; Object keys are coerced to strings. For very large prefix sums (> 2^53), use BigInt or strings as keys.
Adversarial hashing: crafted inputs producing many distinct prefix sums that collide can degrade to O(N²) in languages with deterministic hashes. Java's HashMap upgrades long chains to red-black trees; Go seeds its map hash per map; Python randomizes hashes only for str and bytes, so int keys stay predictable; C++'s std::hash does not randomize by default.

Common Bugs

Missing {0: 1} initialization — off-by-one for any subarray starting at index 0. Concretely: ([3], 3) returns 0 instead of 1.
Insert before lookup — for K == 0, every position matches itself once, returning N spurious counts.
Using dict[key] instead of dict.get(key, 0) in Python — KeyError on first miss.
Java autoboxing penalty: a silent, often severalfold slowdown; use primitive maps (e.g., Eclipse Collections IntIntMap) or HashMap.merge.
C++ unordered_map::operator[] insertion-on-read: every missed lookup inserts a zero entry, bloating the map and triggering extra rehashes; use find() for read-only access.
Negative-modulo bug in the LC 974 follow-up — -3 % 5 is -3 in C++/Java/Go but 2 in Python. Normalize: ((x % K) + K) % K.
Recomputing prefix - K twice — minor, but (prefix - K) should be a single local for readability.

Debugging Strategy

Trace ([1,1,1], 2) by hand. Expected seen evolution: {0:1} → {0:1,1:1} → {0:1,1:1,2:1} → {0:1,1:1,2:1,3:1}. At each step, lookup prefix - K: at step 2, prefix=2, lookup 0: count += 1. At step 3, prefix=3, lookup 1: count += 1. Total 2.
For K = 0 issues: trace ([0,0,0], 0). seen evolves {0:1} → (lookup 0: count+=1) → {0:2} → (lookup 0: count+=2) → {0:3} → (lookup 0: count+=3). Total 6 = 3*(3+1)/2. ✓
Diff against brute force on 100 random small inputs.
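The hand traces above can be reproduced with an instrumented copy of the solution. This is a debugging sketch only; trace_subarray_sum is a hypothetical name, and it snapshots seen after every insert, which costs O(N) per step.

```python
def trace_subarray_sum(nums, K):
    """Return (count, steps); each step is (x, prefix, hits, snapshot_of_seen)."""
    count, prefix, seen = 0, 0, {0: 1}
    steps = []
    for x in nums:
        prefix += x
        hits = seen.get(prefix - K, 0)   # lookup before insert
        count += hits
        seen[prefix] = seen.get(prefix, 0) + 1
        steps.append((x, prefix, hits, dict(seen)))
    return count, steps


count, steps = trace_subarray_sum([1, 1, 1], 2)
for step in steps:
    print(step)
print("count =", count)
```

For ([1,1,1], 2) the hits column should read 0, 1, 1; for ([0,0,0], 0) it should read 1, 2, 3. Any other sequence points at the initialization or the lookup/insert order.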

Mastery Criteria

Recognized “subarray with sum X” + negatives as prefix-sum + hashmap in <30 seconds.
Articulated the {0: 1} base case before coding.
Lookup-before-insert ordering correct on first try.
Generalized to LC 974 (mod) and LC 525 (re-encoding) without lookup.
Identified C++ operator[] insertion-on-read trap.
Solved on the first try with no off-by-one bugs in 5 random small tests.

Lab 04 — Binary Search On Answer: Capacity To Ship Packages Within D Days

Goal

Master the parametric/binary-search-on-answer pattern: identify the monotonic predicate, prove its monotonicity, write a correct feasible(x) independently, then drive a half-open binary search over the answer space. Deliverable solves LC 1011 in O(N · log(sum)) time, O(1) space, and you can articulate the bounds, the predicate’s monotonicity proof, and the canonical [lo, hi) template’s termination.

Background Concepts

Decision problem vs optimization problem; parametric search; monotone predicates; the half-open [lo, hi) invariant feasible(lo-1) = false, feasible(hi) = true (treating the initial sentinel hi as feasible); greedy verification of feasibility. Review pattern Binary Search On Answer and the constraint→complexity table.

Interview Context

Binary search on answer is the single highest-value pattern for distinguishing strong from elite candidates in 30-minute Mediums. Most candidates can do binary search on a sorted index. Few can recognize that “min capacity such that we can ship in D days” is the same binary search, just over a different domain. Once you internalize this, an entire family of problems collapses: Koko bananas, split-array-largest-sum, smallest divisor, minimum days for bouquets, magnetic force balls, kth-smallest-in-multiplication-table. The pattern is identical; only feasible changes.

Problem Statement

A conveyor belt has packages with weights weights[i]. Each day you load packages in order and ship them on a boat with weight capacity C. Once loaded, the boat ships and starts again the next day. Return the minimum capacity C such that all packages ship within D days.

Constraints

1 ≤ D ≤ |weights| ≤ 5 × 10^4
1 ≤ weights[i] ≤ 500

Clarifying Questions

Must packages be loaded in given order? (Yes — this is the crux. If we could reorder, it becomes a bin-packing problem, which is NP-hard.)
Can a package’s weight exceed the daily capacity? (No — capacity must be at least max(weights), else that package never ships.)
Can the boat be partially filled and still ship that day? (Yes — when adding the next package would exceed C, you ship the current load and start fresh.)
Is D strict (must use all D days) or upper-bound (≤ D days)? (Upper-bound: any number of days ≤ D is acceptable.)
Are weights integers, and if so, is the answer always an integer? (Yes and yes.)

Examples

Input	Output
`weights=[1,2,3,4,5,6,7,8,9,10], D=5`	15
`weights=[3,2,2,4,1,4], D=3`	6
`weights=[1,2,3,1,1], D=4`	3
`weights=[10,50,100,100,50,100,100,100], D=5`	160

For the first: C=15 ships as [1,2,3,4,5] (15) | [6,7] (13) | [8] (8) | [9] (9) | [10] (10), which is 5 days. C=14 splits into [1,2,3,4] | [5,6] | [7] | [8] | [9] | [10] and needs 6 days.

Initial Brute Force

Try every capacity from max(weights) to sum(weights); for each, simulate and check if D days suffice; return the first that works.

def ship_capacity_brute(weights, D):
    for C in range(max(weights), sum(weights) + 1):
        if feasible(weights, D, C): return C

def feasible(weights, D, C):
    days, load = 1, 0
    for w in weights:
        if load + w > C:
            days += 1; load = 0
        load += w
    return days <= D

Brute Force Complexity

Time O(N · range), where range = sum - max ≤ N · 500 ≈ 2.5×10^7. In the worst case that is about 5×10^4 × 2.5×10^7 ≈ 1.25×10^12 steps. TLEs.

Optimization Path

Key observation: feasibility is monotonic in capacity. If C works (ships in ≤ D days), any C' > C also works (the greedy simulation can only do better — it never has to start a new day earlier). Equivalently, if C doesn’t work, no C' < C works.

So the set {C : feasible(C) == true} is an upward-closed half-line [C*, ∞). We binary search for C*.

Bounds:

Low: max(weights) — any capacity below this can’t even ship the heaviest package alone.
High: sum(weights) — capacity ≥ total ships everything in 1 day, trivially ≤ D days.

Final Expected Approach

def ship_capacity(weights, D):
    def feasible(C):
        days, load = 1, 0
        for w in weights:
            if load + w > C:
                days += 1; load = w
            else:
                load += w
            if days > D: return False
        return True

    lo, hi = max(weights), sum(weights) + 1   # half-open [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

Note hi = sum + 1 (half-open) and the canonical lo < hi loop condition.

Data Structures Used

None beyond the input array and three integers (lo, hi, mid) plus two more inside feasible.

Correctness Argument

Monotonicity: Let feasible(C) be true. For C' > C, the greedy simulation with capacity C' packs at least as much per day as it would with C (because the “if load + w > C” branch fires at most as often). So days-needed-with-C' ≤ days-needed-with-C ≤ D. Hence feasible(C') is true. The set of feasible C is upward-closed.

Greedy feasibility: the greedy “ship today as much as fits” is optimal for min days given fixed C, by an exchange argument. Take any schedule that fits in D days and, day by day, move the first package of the next day onto the current day whenever it still fits under C. Each move keeps every day within C and never adds a day, and the process ends at the greedy schedule, so greedy needs no more days than any valid schedule. Hence feasible(C) correctly reports whether some schedule fits in D days.

Binary search invariant (half-open [lo, hi)): feasible(lo - 1) = false (or lo - 1 < max(weights)) and feasible(hi) either true or hi = sum + 1 (definitely feasible). The loop preserves this. On termination lo == hi, and feasible(lo) = true, feasible(lo - 1) = false. So lo is the minimum feasible C.

Termination: mid = ⌊(lo + hi) / 2⌋ satisfies lo ≤ mid < hi whenever lo < hi, so both branches (hi = mid and lo = mid + 1) strictly shrink [lo, hi), each to at most about half its size. The loop runs O(log(hi - lo)) times.

Complexity

Time O(N · log(sum - max)). With N = 5×10^4 and sum bounded by 2.5×10^7, that's ~5×10^4 × 25 ≈ 1.25×10^6 inner-loop steps: milliseconds in compiled languages and well under a second in CPython. Space O(1) additional.

Implementation Requirements

Write feasible first, test it independently on the examples, then wrap it in binary search. Many bugs are in feasible, not the search.
Use hi = sum + 1 half-open, or hi = sum and <=; pick one convention and stick to it. The half-open [lo, hi) template returns lo exactly.
Early-return from feasible once days > D — saves time on small C.
Don’t forget that on the “doesn’t fit” branch, the new day starts with the current package as its load, not zero (or else the next package may also overflow).

Tests

Smoke: the four examples above.
Unit: D = N (each package its own day) → answer is max(weights). D = 1 → answer is sum(weights).
Edge: all-equal weights, [5,5,5,5], D=2 → answer 10. Single weight, [42], D=1 → 42.
Independence test for feasible: for the first example, verify feasible(15)=true, feasible(14)=false, feasible(55)=true, feasible(10)=false.
Large: N = 5×10⁴, weights random in [1, 500], D = 1000; assert sub-100ms. Cross-check against brute on a 100-element prefix.
Adversarial: sorted ascending, sorted descending, all-max (weights all 500). The greedy stays optimal for every fixed order; only the answer itself changes with the order, because reordering is forbidden.
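A Python harness for the independence test and the brute cross-check. It also asserts that feasible flips from False to True exactly once over [max, sum], which is the monotonicity the binary search relies on. The random sizes and weight ranges are arbitrary and deliberately small so the O(N · range) brute stays fast.

```python
import random


def feasible(weights, D, C):
    """Greedy: can all packages ship, in order, within D days at capacity C?"""
    days, load = 1, 0
    for w in weights:
        if load + w > C:
            days += 1
            load = w              # the new day starts with the current package
        else:
            load += w
        if days > D:
            return False
    return True


def ship_capacity(weights, D):
    lo, hi = max(weights), sum(weights) + 1   # half-open [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(weights, D, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def ship_capacity_brute(weights, D):
    for C in range(max(weights), sum(weights) + 1):
        if feasible(weights, D, C):
            return C


def check(seed=3):
    w = list(range(1, 11))
    # Independence test for feasible on the first example.
    assert feasible(w, 5, 15) and not feasible(w, 5, 14)
    assert feasible(w, 5, 55) and not feasible(w, 5, 10)
    rng = random.Random(seed)
    for _ in range(100):
        weights = [rng.randint(1, 10) for _ in range(rng.randint(1, 30))]
        D = rng.randint(1, len(weights))
        flags = [feasible(weights, D, C) for C in range(max(weights), sum(weights) + 1)]
        assert flags == sorted(flags)   # monotone: False...False True...True
        assert ship_capacity(weights, D) == ship_capacity_brute(weights, D)
    return True
```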

Follow-up Questions

“What if package order is flexible?” → bin packing, NP-hard. Approximation: First-Fit-Decreasing uses at most 11/9 · OPT + 6/9 bins.
“What if D is very large (D ≥ N)?” → answer is max(weights); you can early-return.
“What if you must use exactly D days?” → still binary-searchable (monotonicity holds for “≤ D days”; for “exactly D”, parametrize differently — but in practice you almost always want ≤ D).
“What about Koko Eating Bananas (LC 875)?” → identical pattern; feasible(speed) = sum(ceil(p / speed) for p in piles) ≤ H.
“Split array largest sum (LC 410)?” → identical pattern; feasible(maxSum) = (greedy partition into pieces of sum ≤ maxSum, count ≤ K).
“Floating-point answer?” → loop until hi - lo < eps, return lo. Watch for non-termination if eps is below float precision.
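Since only feasible changes across this family, one way to make that concrete is a single hypothetical first_true(lo, hi, pred) helper implementing the half-open template, reused below for LC 875 and LC 410. It assumes pred is monotone (False...False True...True) on [lo, hi) and returns hi if no value in the range satisfies it.

```python
def first_true(lo, hi, pred):
    """Smallest x in [lo, hi) with pred(x) True; hi if there is none."""
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def min_eating_speed(piles, H):
    """LC 875: smallest integer speed that finishes all piles within H hours."""
    def feasible(speed):
        # -(-p // speed) is the integer ceiling of p / speed
        return sum(-(-p // speed) for p in piles) <= H
    return first_true(1, max(piles) + 1, feasible)


def split_array(nums, k):
    """LC 410: minimize the largest piece sum when splitting into at most k pieces."""
    def feasible(cap):
        pieces, running = 1, 0
        for x in nums:
            if running + x > cap:
                pieces += 1
                running = x       # the new piece starts with x
            else:
                running += x
        return pieces <= k
    return first_true(max(nums), sum(nums) + 1, feasible)
```

The bounds follow the same reasoning as the shipping problem: the lower bound is the smallest value that could possibly work, and the upper bound is one past a value that certainly works.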

Product Extension

Capacity planning for a build pipeline: given a stream of CI jobs with known durations and a deadline D, find the minimum machine-count or the minimum machine-capacity that meets the deadline. Same pattern: monotonic in capacity, binary-search on answer with greedy feasibility. Generalizes to load-balancer auto-scaling: minimum number of pods s.t. p99 latency stays below SLA, given a workload trace. (The feasibility check becomes a simulator instead of a one-line greedy, but the structure is identical.)

Language/Runtime Follow-ups

Python: (lo + hi) // 2 is fine — Python ints are arbitrary precision. No overflow.
Java: int mid = lo + (hi - lo) / 2; to avoid 32-bit overflow when lo + hi exceeds Integer.MAX_VALUE. (For this problem’s constraints, lo + hi won’t overflow, but build the habit.)
Go: same overflow caveat as Java; use lo + (hi - lo) / 2.
C++: same; use lo + (hi - lo) / 2. int may also be too small for sum(weights) if constraints expand — use long long.
JS/TS: numbers are 64-bit floats, integer-precise up to 2^53. No overflow risk for this problem. (lo + hi) >>> 1 (zero-fill right shift) is the common idiom in JS binary searches, but it is only correct while lo + hi < 2^32; for larger values use Math.floor((lo + hi) / 2).
Edge: for floating-point binary search (not this problem), terminate by iteration count (e.g., 100 rounds) rather than hi - lo < eps to dodge non-termination near float precision.

Common Bugs

Wrong bounds. lo = 1 instead of lo = max(weights): the feasible above never checks w > C, so for small C it can return true even though the heaviest package never fits (weights [1, 10], D = 2 passes at C = 1), and the search returns that bogus capacity. Start at max(weights), or make feasible return false when w > C. On the other side, hi = sum is actually safe with this template because sum is always feasible; the real bug is an hi that might be infeasible, since the loop can return hi without ever testing it.
Inverted predicate direction. Searching for “max C such that infeasible” instead of “min C such that feasible” — flips the bounds and breaks the invariant.
feasible bug: forgetting to start the new day with the current package. Setting load = 0 instead of load = w after exceeding capacity corrupts the count for runs of large weights.
feasible bug: starting days = 0 instead of days = 1. The first day exists before any package ships.
Off-by-one in the binary search template. Mixing < with <= and mid with mid - 1 and mid + 1 is the most common interview bug. Memorize one template (we use the half-open one) and never deviate.
C++/Java/Go integer overflow on (lo + hi) / 2 for very large constraints.
Calling feasible with the wrong arg type (e.g., float when int expected) in dynamically-typed languages — silent rounding.
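The lower-bound trap from the first bullet, reproduced in Python (function names are illustrative). feasible_unguarded is the lab's feasible, which never checks w > C; feasible_guarded adds that check; min_capacity is the half-open search with a caller-chosen lo.

```python
def feasible_unguarded(weights, D, C):
    days, load = 1, 0
    for w in weights:
        if load + w > C:
            days += 1
            load = w              # may already exceed C when w > C
        else:
            load += w
        if days > D:
            return False
    return True


def feasible_guarded(weights, D, C):
    if max(weights) > C:
        return False              # some package can never be loaded
    return feasible_unguarded(weights, D, C)


def min_capacity(weights, D, lo, feasible):
    hi = sum(weights) + 1         # half-open [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(weights, D, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_capacity([1, 10], 2, 1, feasible_unguarded))   # 1  (wrong: 10 never fits)
print(min_capacity([1, 10], 2, 1, feasible_guarded))     # 10
print(min_capacity([1, 10], 2, 10, feasible_unguarded))  # 10
```

Either fix is sufficient on its own; using both (the tight lower bound and the guard) costs nothing and documents the assumption.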

Debugging Strategy

Test feasible independently on the given examples for several values of C. The interactive trace feasible(15)=true, feasible(14)=false, … is your ground truth.
Add an iteration counter to the binary search; cap it at 100. If the cap fires, your bounds or update direction is wrong.
Print (lo, mid, hi, feasible(mid)) per iteration; the lo, hi interval should shrink monotonically to a singleton.
For overflow suspicions, replace ints with long/long long/bigint and rerun.
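The checks above can be combined into one instrumented search. This is a sketch assuming the LC 1011 setup: a list `weights` shipped in order within `days` days. `feasible` and `ship_within_days_traced` are illustrative names. The loop is one common form of the lo < hi template and may differ cosmetically from the template used earlier in this lab:

```python
def feasible(weights, days, cap):
    """True if all packages ship, in order, within `days` days at capacity `cap`."""
    if max(weights) > cap:             # the heaviest package must fit on its own
        return False
    used, load = 1, 0                  # the first day exists before any package ships
    for w in weights:
        if load + w > cap:             # w does not fit today: start a new day holding w
            used += 1
            load = w
        else:
            load += w
    return used <= days


def ship_within_days_traced(weights, days, max_iters=100):
    lo, hi = max(weights), sum(weights)   # answer lies in [lo, hi]; hi is always feasible
    iters = 0
    while lo < hi:
        iters += 1
        if iters > max_iters:              # cap fired: bounds or update direction are wrong
            raise RuntimeError(f"no convergence: lo={lo}, hi={hi}")
        mid = lo + (hi - lo) // 2
        ok = feasible(weights, days, mid)
        print(lo, mid, hi, ok)             # [lo, hi] must shrink every round
        if ok:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

On weights = [1..10] with days = 5, the printed rows should agree with the hand trace feasible(15) = True, feasible(14) = False.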

Mastery Criteria

Identified “minimum X such that property P” + monotone P as binary-search-on-answer in <60 seconds.
Stated the monotonicity argument in plain English before coding.
Wrote feasible first, tested it independently, then wrapped it in binary search.
Used a single canonical binary-search template (half-open) without confusing < vs <=.
Generalized verbally to LC 875 (Koko), LC 410 (Split Array Largest Sum), LC 1283 (Smallest Divisor) without prompting.
Identified the language-specific overflow / template trap.
Solved a similar new problem from this family in <10 minutes within a week of completing this lab.

Lab 05 — Monotonic Stack: Largest Rectangle In Histogram

Goal

Master the monotonic-stack pattern on its hardest canonical problem. Deliverable solves LC 84 in O(N) time, O(N) space, and you can articulate why each bar is pushed and popped at most once, why a sentinel 0 at the end is required (or how to handle the leftover stack), and how this generalizes to maximal-rectangle-of-1s in a 2D grid.

Background Concepts

Monotonic stack invariant; index-vs-value storage in stacks; sentinel technique for clean termination; amortized analysis (each index pushed and popped once); the rectangle’s “left boundary = new top after pop” / “right boundary = current index” trick. Review pattern Monotonic Stack.

Interview Context

LC 84 is labeled Hard on LeetCode and is one of the most commonly asked Hards. It appears at Google, Meta, and quant firms. The interview signal is whether you can derive the algorithm from a smaller cousin (LC 496, Next Greater Element). Naive candidates write O(N²) “for each bar, expand left and right”. Decent candidates derive a left-bounds array and a right-bounds array via two monotonic stack passes. Strong candidates do it in one pass with the popped-bar’s-rectangle trick. Elite candidates immediately observe that LC 85 (Maximal Rectangle) is just LC 84 applied per row of a derived heights array.

Problem Statement

Given an array heights representing histogram bar heights of equal width 1, return the area of the largest rectangle that fits within the histogram.

Constraints

1 ≤ N ≤ 10^5
0 ≤ heights[i] ≤ 10^4

Clarifying Questions

Are heights non-negative? (Per constraints, yes.)
Can heights be zero? (Yes — and zero-height bars effectively reset the candidates, since no rectangle can include them.)
Are bars unit-width? (Yes — width 1 each, so the rectangle’s width is just the number of consecutive bars that all have at least the rectangle’s height.)
Multiple equal-area rectangles — return any area, or specify? (Just the area; LC asks for the max.)
Does the answer fit in a 32-bit int? (max_height × N = 10^4 × 10^5 = 10^9, which fits in a signed 32-bit int with only about 2× headroom. Use 64-bit for safety in C++/Java.)

Examples

Input	Output
`[2,1,5,6,2,3]`	10 (height 5 over indices 2..3, whose heights are 5 and 6: width 2, area 10)
`[2,4]`	4
`[2,1,2]`	3 (rectangle of height 1 spanning all 3 bars)
`[6,7,5,2,4,5,9,3]`	16
`[0]`	0

Initial Brute Force

For each bar i, expand left and right while heights stay ≥ heights[i]; rectangle area is heights[i] × (right - left + 1).

def largest_rect_brute(heights):
    n = len(heights)
    best = 0
    for i in range(n):
        l = r = i
        while l > 0 and heights[l - 1] >= heights[i]: l -= 1
        while r < n - 1 and heights[r + 1] >= heights[i]: r += 1
        best = max(best, heights[i] * (r - l + 1))
    return best

Brute Force Complexity

Time O(N²) worst case (all-equal heights). Space O(1). At N=10⁵, ~10^10 ops — TLEs everywhere.

Optimization Path

Observation: for each bar i, consider the widest rectangle of height heights[i] that contains i. It spans every bar strictly between the previous strictly smaller bar and the next strictly smaller bar, so width = (next-smaller-to-the-right) − (previous-smaller-to-the-left) − 1. If we know “previous smaller index” pl[i] and “next smaller index” pr[i] for every bar, area is heights[i] * (pr[i] - pl[i] - 1), computed in O(N) total.

Both pl and pr are computable by a single monotonic-stack pass each. Even better: a single pass with a stack-of-indices in strictly increasing height order. When we encounter a smaller bar, we pop the stack; for each popped index j, the current index i is its pr[j] and the new top of the stack (after the pop) is its pl[j]. Compute j’s rectangle area on the spot.
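The two-pass version described above can be sketched as follows (`largest_rect_two_pass` is an illustrative name). Each pass pops bars at least as tall as the current one, so the surviving stack top is the nearest strictly smaller bar on that side:

```python
def largest_rect_two_pass(heights):
    n = len(heights)
    pl = [-1] * n        # previous strictly smaller index, or -1 if none
    pr = [n] * n         # next strictly smaller index, or n if none
    stack = []
    for i in range(n):                        # left-to-right pass fills pl
        while stack and heights[stack[-1]] >= heights[i]:
            stack.pop()
        pl[i] = stack[-1] if stack else -1
        stack.append(i)
    stack = []
    for i in range(n - 1, -1, -1):            # right-to-left pass fills pr
        while stack and heights[stack[-1]] >= heights[i]:
            stack.pop()
        pr[i] = stack[-1] if stack else n
        stack.append(i)
    # bars strictly between pl[i] and pr[i] are all >= heights[i]
    return max(heights[i] * (pr[i] - pl[i] - 1) for i in range(n))
```

This uses two extra O(N) arrays. The single-pass version fuses both passes by reading pl and pr off the stack at pop time.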

Final Expected Approach

def largest_rectangle_area(heights):
    stack = []  # indices; their heights are non-decreasing bottom to top
    best = 0
    n = len(heights)
    for i in range(n + 1):
        cur = 0 if i == n else heights[i]
        while stack and heights[stack[-1]] > cur:
            top = stack.pop()
            left = stack[-1] if stack else -1
            width = i - left - 1
            best = max(best, heights[top] * width)
        stack.append(i)
    return best

The trick: iterate to n + 1 with a sentinel cur = 0. Because 0 is less than every positive height, this pops every remaining positive-height bar, so we don’t need a separate post-loop drain. Zero-height bars may stay on the stack, which is harmless because their area is 0.

Data Structures Used

A stack of indices into heights. The heights at those indices are non-decreasing from stack-bottom to stack-top; equal heights can sit next to each other because the pop test is strict.
A running best integer.

Correctness Argument

Stack invariant: at every point, heights[stack[0]] ≤ heights[stack[1]] ≤ ... ≤ heights[stack[-1]].

Maintenance: before pushing i, we pop all indices j with heights[j] > heights[i]. After popping, every remaining entry has height ≤ heights[i], so pushing i preserves the invariant. (Popping on >= instead would keep the stack strictly increasing; both variants give the correct answer for this problem, as the follow-up on >= explains.)

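Per-popped-bar rectangle is correct: suppose we pop top at index i, and let left be the new stack top (or -1). Every bar strictly between top and i was pushed after top without popping it, so its height is ≥ heights[top]. Meanwhile heights[i] < heights[top], so i is the next strictly smaller bar, pr[top]. Every bar strictly between left and top was already popped before top was pushed, each by a later, shorter bar, and following that chain shows each is > heights[top]. By the invariant, heights[left] ≤ heights[top]. So the rectangle of height heights[top] over left + 1 .. i - 1 is valid and cannot extend right. If heights[left] == heights[top], this width is too short, but left is popped immediately afterward by the same i with a wider span. In the optimal rectangle, the leftmost bar of minimum height has a strictly shorter bar (or none) at its left boundary, so its pop computes exactly the optimal width. Width is i - left - 1.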

Sentinel correctness: at i = n we use cur = 0, which is less than every positive height. This pops every remaining positive-height index and computes each one’s rectangle with pr = n. Any zero-height indices left behind contribute area 0.

Amortized O(N): each index pushed once, popped at most once. Inner while loop’s total iterations across the outer loop sum to ≤ N.

Complexity

Time O(N) amortized. Space O(N) for the stack worst case (strictly increasing input).

Implementation Requirements

Store indices, not heights, in the stack — you need indices to compute width.
Use the sentinel trick (i in range(n + 1) with cur = 0 at i = n) for clean code, OR drain the stack after the main loop with pr = n. Pick one; the sentinel is preferred.
Use strict > in the pop condition. For “largest rectangle”, > and >= give the same answer (rectangles with equal heights are accounted for by their leftmost bar), but >= over-pops and confuses the bookkeeping in cousin problems.
Use 64-bit arithmetic in Java/C++ for the area. 10^4 × 10^5 = 10^9 fits in 32 bits, but with only about 2× headroom below Integer.MAX_VALUE ≈ 2.1×10^9, and the habit matters once constraints grow.

Tests

Smoke: [2,1,5,6,2,3] → 10.
Unit: [1] → 1, [1,1,1,1] → 4, [1,2,3,4,5] → 9 (rectangle of height 3 over indices 2..4), [5,4,3,2,1] → 9.
Edge: [0] → 0, [0,0,0] → 0, [N copies of 1] → N, [10000] → 10000.
Adversarial: strictly increasing — every bar stays on the stack until the sentinel; tests the drain. Strictly decreasing — every bar popped immediately; tests the per-bar bookkeeping.
Large: N = 10⁵, random heights; assert <100ms. Cross-check against brute on a 1000-prefix.
All same: [7,7,7,7,7,7,7,7] → 56.
Random: 100 random inputs of size ≤ 200 against brute.
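A minimal sketch of the random cross-check. It repeats this lab's brute force and single-pass solutions so the snippet runs on its own; `cross_check` and its parameters are illustrative. A small height range produces many equal heights, which exercises the tie handling in the strict pop condition:

```python
import random


def largest_rect_brute(heights):
    n, best = len(heights), 0
    for i in range(n):
        l = r = i
        while l > 0 and heights[l - 1] >= heights[i]:
            l -= 1
        while r < n - 1 and heights[r + 1] >= heights[i]:
            r += 1
        best = max(best, heights[i] * (r - l + 1))
    return best


def largest_rectangle_area(heights):
    stack, best, n = [], 0, len(heights)
    for i in range(n + 1):
        cur = 0 if i == n else heights[i]     # sentinel 0 drains the stack
        while stack and heights[stack[-1]] > cur:
            top = stack.pop()
            left = stack[-1] if stack else -1
            best = max(best, heights[top] * (i - left - 1))
        stack.append(i)
    return best


def cross_check(trials=100, max_n=200, max_h=20, seed=0):
    rng = random.Random(seed)                 # fixed seed: failures are reproducible
    for _ in range(trials):
        n = rng.randint(1, max_n)
        heights = [rng.randint(0, max_h) for _ in range(n)]
        assert largest_rectangle_area(heights) == largest_rect_brute(heights), heights
    return True
```

The failing input is attached to the assertion message, so it can be pasted straight into a hand trace.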

Follow-up Questions

“Maximal rectangle of 1s in a 2D matrix (LC 85)?” → for each row r, build heights[c] = (heights[c] + 1 if mat[r][c] == 1 else 0). Then run LC 84 on each row’s heights. Total O(R·C).
“Largest rectangle of equal value?” → modify the predicate; same skeleton.
“Number of submatrices with all 1s (LC 1504)?” → build the same per-row heights; a monotonic stack then counts, for each cell, the all-1 submatrices whose bottom-right corner is that cell.
“What if heights can be updated?” → segment tree with merge function; Phase 3 territory.
“What if N is so large the stack doesn’t fit?” → the stack is bounded by N; if N doesn’t fit in memory, you have a bigger problem. (Answer: stream-based algorithms with reduced memory exist for some restricted versions.)
“Why does >= give the same answer here?” → with >=, a bar popped by an equal-height bar gets a truncated right boundary. That equal bar, however, inherits the same left boundary and later computes the full-width rectangle of the same height, so the maximum is unchanged. Subtle, worth a sentence in the interview.
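For the LC 85 follow-up, a sketch that assumes the matrix is a list of rows whose cells are the characters '0'/'1' (as on LeetCode) or the ints 0/1. `maximal_rectangle` and `_largest_in_histogram` are illustrative names; the latter repeats this lab's single-pass stack:

```python
def maximal_rectangle(matrix):
    if not matrix or not matrix[0]:
        return 0
    cols = len(matrix[0])
    heights = [0] * cols
    best = 0
    for row in matrix:
        for c in range(cols):
            # run of consecutive 1s ending at this row in column c
            heights[c] = heights[c] + 1 if row[c] in (1, "1") else 0
        best = max(best, _largest_in_histogram(heights))
    return best


def _largest_in_histogram(heights):
    stack, best, n = [], 0, len(heights)
    for i in range(n + 1):
        cur = 0 if i == n else heights[i]
        while stack and heights[stack[-1]] > cur:
            top = stack.pop()
            left = stack[-1] if stack else -1
            best = max(best, heights[top] * (i - left - 1))
        stack.append(i)
    return best
```

Each row costs O(C) to update and O(C) for the histogram pass, giving the O(R·C) total stated above.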

Product Extension

In ad-hoc analytics, the “largest rectangle” maps to “the longest time window during which all of K monitored systems exceeded a threshold” — useful for SLA breach detection. Each system’s per-time-bin status forms a histogram; the largest rectangle is the worst sustained breach. The same single-pass stack algorithm processes a stream of metric snapshots in O(1) amortized per snapshot.

Language/Runtime Follow-ups

Python: native list as a stack — append / pop are O(1) amortized. Skip collections.deque here; the stack-only access pattern doesn’t benefit from a deque.
Java: prefer ArrayDeque<Integer> over Stack (the latter is a synchronized legacy class with overhead). Or use int[] with a manual top index — fastest for hot loops.
Go: slice-as-stack — stack = append(stack, i), top := stack[len(stack)-1]; stack = stack[:len(stack)-1]. Beware: capacity may grow geometrically and not shrink — fine here since N is bounded.
C++: std::vector<int> is fastest. std::stack<int> adds an unnecessary wrapper. Reserve capacity with stack.reserve(n).
JS/TS: native Array.prototype.push/pop — O(1) amortized. Not as fast as a typed Int32Array for hot loops.
Hot-loop: in Java, int[] + a top int typically outperforms ArrayDeque<Integer> noticeably, because it avoids boxing every index.

Common Bugs

Off-by-one width: i - left - 1 vs i - left. The popped bar’s rectangle starts at left + 1 and ends at i - 1 (both inclusive), width = i - left - 1.
Forgetting the sentinel / drain. Without the sentinel 0, indices remaining on the stack are never processed. Their rectangles extend to n - 1 with pr = n.
Using >= instead of > (or vice versa). For LC 84, both happen to produce the right answer, but cousin problems break — pick the variant that gives a unique boundary.
Storing heights instead of indices. You then can’t compute width.
Integer overflow in C++/Java for max-bar × max-N. Use 64-bit.
Recursive simulation instead of iterative — Python’s default recursion limit is 1000, breaks at N > 1000.
Java boxing in Stack<Integer> or ArrayDeque<Integer> — silent slowdown. Use int[] with manual top.

Debugging Strategy

Trace [2,1,5,6,2,3]. Stack evolution: push 0; at i=1 (height 1), pop 0 (height 2, left=-1, width=1, area=2); push 1; push 2; push 3; at i=4 (height 2), pop 3 (height 6, left=2, width=1, area=6); pop 2 (height 5, left=1, width=2, area=10); push 4; push 5; sentinel pops everything. Best = 10. ✓
Trace [1,1,1,1]. With >, the stack just accumulates [0,1,2,3]; sentinel pops 3 (area 1), pops 2 (area 2), pops 1 (area 3), pops 0 (area 4). Best = 4. ✓
Cross-check 50 random inputs of size 50 against brute force.
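To compare against the hand trace, here is a sketch of the same single-pass algorithm instrumented to record each pop as (top, left, width, area). `trace_pops` is an illustrative name:

```python
def trace_pops(heights):
    """Single-pass largest-rectangle stack that records every pop instead of just the max."""
    stack, events, n = [], [], len(heights)
    for i in range(n + 1):
        cur = 0 if i == n else heights[i]      # sentinel at i == n
        while stack and heights[stack[-1]] > cur:
            top = stack.pop()
            left = stack[-1] if stack else -1
            width = i - left - 1
            events.append((top, left, width, heights[top] * width))
        stack.append(i)
    return events


events = trace_pops([2, 1, 5, 6, 2, 3])
best = max(area for *_, area in events)
```

For [2,1,5,6,2,3], the first three recorded pops match the trace above; the sentinel produces the last three.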

Mastery Criteria

Recognized “largest rectangle in histogram” as a monotonic-stack problem within 60 seconds.
Stated the monotonic-stack invariant (non-decreasing heights, bottom to top) before coding.
Used the sentinel-0 trick on first attempt (or correctly drained post-loop).
Wrote i - left - 1 correctly, no off-by-one.
Generalized to LC 85 (max rectangle of 1s) without prompting.
Articulated the amortized O(N) bound (each index pushed and popped once).
Solved a cousin problem (LC 42 trapping rain water with stack, or LC 901 stock span) in <10 minutes within a week.

Lab 06 — Intervals: Meeting Rooms II (Heap Of Ends + Sweep-Line Alternate)

Goal

Master the two canonical interval algorithms, min-heap of end times and event-based sweep line, applied to the same problem (LC 253). Deliverable solves it both ways in O(N log N) time and O(N) space. You can articulate the tie-breaking rule, and why one approach is more intuitive while the other generalizes more cleanly to “max concurrent X” problems.

Background Concepts

Sorting by start time as the canonical interval-prep; min-heap of end times tracking active intervals; sweep line as event stream (time, ±1) with stable tie-breaking; the “end before start” tie-break for closed-on-start, open-on-end intervals. Review pattern Intervals and Heap.

Interview Context

Interval problems appear at Meta, Google, and Amazon — and Meeting Rooms II in particular is a top-15 most-asked Medium. The interview signal is whether you can recognize that “min number of rooms” = “max concurrent meetings”, and then compute concurrency either via heap-of-ends or sweep. Strong candidates do one approach correctly; elite candidates do both, articulate the trade-off, and handle the open/closed interval tie-break correctly without prompting.

Problem Statement

Given an array of meeting time intervals intervals[i] = [start_i, end_i], return the minimum number of meeting rooms required.

Constraints

1 ≤ N ≤ 10^4
0 ≤ start_i < end_i ≤ 10^6
Each interval is half-open [start, end): a meeting ending at time t and one starting at time t can share a room.

Clarifying Questions

Are intervals half-open or closed? (Crucial: [1,3] and [3,5] — same room or not? Per LC 253, half-open: same room. This dictates the tie-break.)
Can two meetings start at the same time? (Yes — they need separate rooms.)
Are intervals sorted? (No assumption.)
Is start < end strict? (Per constraints, yes; no zero-duration meetings.)
Are room IDs significant, or just the count? (Just the count.)

Examples

Input	Output
`[[0,30],[5,10],[15,20]]`	2
`[[7,10],[2,4]]`	1
`[[1,5],[5,10],[10,15]]`	1 (chained, half-open)
`[[1,5],[2,5],[5,10]]`	2 (the first two overlap on [2,5); [5,10] reuses a freed room)
`[[1,2]]`	1

Initial Brute Force

For each distinct endpoint time t, count the intervals active at t under the half-open rule. The answer is the maximum count.

def min_rooms_brute(intervals):
    times = sorted(set(t for s, e in intervals for t in (s, e)))
    best = 0
    for t in times:
        count = sum(1 for s, e in intervals if s <= t < e)  # half-open
        best = max(best, count)
    return best

Brute Force Complexity

Time O(N²). Space O(N). At N=10⁴, ~10⁸ ops — borderline; passes in C++ but TLEs in Python.

Optimization Path A — Heap of End Times

Sort intervals by start. Maintain a min-heap of end times for currently active meetings. For each new interval (start, end):

If the heap’s smallest end ≤ start, that room frees up — pop it.
Push the new end.

The number of rooms needed is the largest heap size ever reached. Each iteration pops at most one end and always pushes one, so the heap size never decreases, and the final size is that maximum.

import heapq

def min_rooms_heap(intervals):
    intervals.sort(key=lambda x: x[0])
    heap = []
    for s, e in intervals:
        if heap and heap[0] <= s:
            heapq.heappop(heap)        # reuse a room
        heapq.heappush(heap, e)
    return len(heap)

Optimization Path B — Sweep Line

Convert intervals to events (start, +1), (end, -1). Sort: by time, with end events before start events on ties (because [1,5] and [5,10] share a room). Sweep, tracking max concurrent.

def min_rooms_sweep(intervals):
    events = []
    for s, e in intervals:
        events.append((s, +1))
        events.append((e, -1))
    events.sort()                       # equal times: (t, -1) sorts before (t, +1)
    cur = best = 0
    for _, delta in events:
        cur += delta
        best = max(best, cur)
    return best

The tie-break is automatic: (t, -1) < (t, +1) because -1 < 1 lexicographically. End events fire before start events at the same t, so a freed room is reused.

Final Expected Approach

Either A or B is acceptable; mention both in the interview.

Data Structures Used

Heap approach: the input array (sorted) + a min-heap of end times.
Sweep approach: an events array of size 2N + a single integer counter.

Correctness Argument (Heap)

Invariant: after processing the first i intervals in sorted-by-start order, the heap holds exactly one entry per room opened so far. Each entry is the end time of the last meeting assigned to that room.

When processing (s, e):

Any heap top ≤ s corresponds to a room whose meeting has ended; it’s reusable. Pop one.
Push e for the new meeting.

Optimality: the heap grows only when heap[0] > s. At that point, every open room’s latest meeting started at or before s (intervals are sorted by start) and ends after s. Those meetings and the new one are all active at time s, so at least len(heap) + 1 rooms are genuinely needed at that moment. The final len(heap) is therefore both achievable and a lower bound: it equals the maximum number of concurrent meetings.

(Note: we only pop one room even if multiple have ended, but that’s fine — each subsequent meeting will pop its own.)

Correctness Argument (Sweep)

A sweep at time t maintains cur = number of intervals whose start ≤ t < end, half-open. The max of cur over all t is the max concurrency, which is the min rooms needed. The tie-break “end before start at the same time” implements the half-open convention: an end at t decrements cur before a start at t increments it, so cur correctly reflects “intervals active at exactly t” with the half-open semantics.

Complexity

Both: time O(N log N). Sorting dominates: after the sort, the heap pass is O(N log N) and the sweep pass is O(N). Space O(N).

Implementation Requirements

Sort by start ascending for the heap approach. The order among equal starts doesn’t matter there, because the pop test heap[0] ≤ s is correct whichever tied interval is processed first.
For the sweep approach, the natural tuple sort (time, delta) with delta ∈ {-1, +1} already gives the right tie-break; don’t reverse the comparator.
Use a min-heap (Python heapq, Java PriorityQueue default, C++ priority_queue<int, vector<int>, greater<int>>).
Sorting start times and end times into two separate arrays is a third valid approach (two pointers, equivalent to the sweep). Keep it distinct from the heap approach: once starts and ends are decoupled, you can still count rooms, but you no longer know which meeting ends when.
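A sketch of the two-pointer variant (`min_rooms_two_pointers` is an illustrative name). Releasing ends with <= implements the same half-open rule as the sweep’s end-before-start tie-break:

```python
def min_rooms_two_pointers(intervals):
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    rooms = best = 0
    j = 0                                  # index of the next end time not yet released
    for s in starts:
        # free every room whose meeting ended at or before s (half-open intervals)
        while j < len(ends) and ends[j] <= s:
            rooms -= 1
            j += 1
        rooms += 1                         # the meeting starting at s takes a room
        best = max(best, rooms)
    return best
```

Every released end belongs to a meeting that started strictly before s, so rooms never goes negative.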

Tests

Smoke: the five examples above.
Unit: all-disjoint intervals → 1; all-identical intervals → N; chain [1,2],[2,3],[3,4] → 1.
Edge: N = 1 → 1; intervals all start at 0 → N (all overlap).
Tie-break: [[5,10],[10,15]] → 1 (half-open). If your test returns 2, your sort comparator is wrong.
Adversarial: tournament-bracket — [[1,4],[2,5],[7,9],[1,5]] (mix); validate against brute on small inputs.
Large: N = 10⁴, random intervals; both heap and sweep should run sub-50ms.

Follow-up Questions

“Return the actual schedule (which interval goes in which room)?” → augment heap entries to (end, room_id); reuse the popped room_id for the new interval.
“Real-time scheduling: intervals arrive in order, decide room on the fly?” → still works; no offline assumption needed for the heap version (sweep needs all events upfront).
“Closed intervals [s, e] instead of half-open?” → flip the tie-break: start before end at the same time, i.e., sort (t, +1) before (t, -1). Or in heap version, change heap[0] <= s to heap[0] < s.
“Maximum number of overlapping intervals (e.g., LC 732, My Calendar III)?” → identical pattern.
“Insert / delete intervals dynamically?” → balanced BST keyed on start; O(log N) per op (Phase 3).
“Generalize to weighted intervals?” → DP on intervals (interval scheduling maximization, LC 1235), Phase 3.
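For the first follow-up, a sketch that carries room ids in the heap, assuming half-open intervals as in this lab. `assign_rooms` is an illustrative name, and room ids are simply 0, 1, 2, ... in opening order:

```python
import heapq


def assign_rooms(intervals):
    """Return (room_count, assignment), where assignment[i] is the room id of intervals[i]."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    heap = []                              # (end_time, room_id) of each room's latest meeting
    assignment = [-1] * len(intervals)
    next_room = 0
    for i in order:
        s, e = intervals[i]
        if heap and heap[0][0] <= s:       # earliest-ending room is already free
            _, room = heapq.heappop(heap)
        else:                              # every room is busy at time s: open a new one
            room = next_room
            next_room += 1
        assignment[i] = room
        heapq.heappush(heap, (e, room))
    return next_room, assignment
```

Sorting indices rather than intervals keeps the assignment aligned with the caller’s original order.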

Product Extension

Resource allocation in CI/CD: minimum number of build agents required to handle a queue of (start, duration) jobs without delay, given they’re all queued ahead of time. Or: minimum servers needed to handle a known load profile (each request has a known service window). The same heap-of-ends pattern, run on a stream, computes peak concurrency in real time.

Language/Runtime Follow-ups

Python: heapq is min-only; for max-heap, negate. heapq.heappop(h) and heapq.heappush(h, x) are O(log N). For sweep, events.sort() on a list of tuples works directly thanks to lexicographic tuple ordering.
Java: PriorityQueue<Integer> defaults to min-heap. Boxing tax on primitives — for hot loops, use IntPriorityQueue from a primitive collections lib. Sort intervals via Arrays.sort(intervals, (a, b) -> a[0] - b[0]) — beware overflow (use Integer.compare(a[0], b[0])).
Go: container/heap requires implementing the heap.Interface (5 methods). Verbose but flexible.
C++: std::priority_queue<int, std::vector<int>, std::greater<int>> for min-heap (default is max). Sort with std::sort.
JS/TS: no built-in heap. Implement one (~30 lines) or use a library. For sweep, events.sort((a, b) => a[0] - b[0] || a[1] - b[1]). This comparator orders on both fields, so sort stability cannot affect the result. (Array.prototype.sort is only guaranteed stable since ES2019, which matters for comparators that leave ties.)
Edge: Java’s PriorityQueue.peek() is O(1); poll() is O(log N). C++’s top() is O(1); pop() is O(log N) but doesn’t return the value — call top() first.

Common Bugs

Wrong tie-break in sweep. For half-open intervals, end events must fire before start events at the same time. The natural (time, ±1) tuple sort gives this for free (since -1 < +1); reversing the comparator breaks it.
Wrong heap orientation. Java’s PriorityQueue is a min-heap by default, while C++’s priority_queue is a max-heap by default, and it’s easy to forget which is which.
Sorting by end instead of start in the heap approach — gives wrong room counts.
Popping heap on heap[0] < s instead of <= — for half-open [s, e), <= is correct (a meeting ending at s has ended, the room is free for one starting at s).
Forgetting to push the new end after popping — heap loses an entry, undercount.
Comparator overflow in Java: a[0] - b[0] overflows when values are huge; use Integer.compare.
Sweeping events without separating same-time events — fragile; always tie-break explicitly even if the data “happens” to not have ties.

Debugging Strategy

Trace [[0,30],[5,10],[15,20]] with the heap approach: sort doesn’t change order. Process [0,30]: heap = [30]. Process [5,10]: 30 > 5, no pop; push 10; heap = [10, 30]. Process [15,20]: 10 ≤ 15, pop; push 20; heap = [20, 30]. Final size = 2. ✓
For half-open tie-break: trace [[5,10],[10,15]]. Heap: [10]; second interval, 10 <= 10 → pop, push 15; heap = [15]. Final size 1. ✓ If you used <, you’d get 2.
For sweep: events [(5,+1),(10,-1),(10,+1),(15,-1)] sorted: [(5,+1),(10,-1),(10,+1),(15,-1)]. Sweep: cur becomes 1, 0, 1, 0; max 1. ✓
Validate against brute on 30 random inputs of size ≤ 50.
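A sketch of that validation, with the brute force, heap, and sweep solutions repeated so it runs standalone (`cross_check` is illustrative). The heap version uses sorted() so the test input is not mutated between calls. A small time range makes shared endpoints common, so the half-open tie-break is actually exercised:

```python
import heapq
import random


def min_rooms_brute(intervals):
    times = sorted({t for s, e in intervals for t in (s, e)})
    return max(sum(1 for s, e in intervals if s <= t < e) for t in times)


def min_rooms_heap(intervals):
    heap = []
    for s, e in sorted(intervals):            # sorted() leaves the caller's list untouched
        if heap and heap[0] <= s:
            heapq.heappop(heap)               # reuse the earliest-freed room
        heapq.heappush(heap, e)
    return len(heap)


def min_rooms_sweep(intervals):
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    cur = best = 0
    for _, delta in events:                   # (t, -1) precedes (t, +1): ends fire first
        cur += delta
        best = max(best, cur)
    return best


def cross_check(trials=30, max_n=50, max_t=20, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        intervals = []
        for _ in range(rng.randint(1, max_n)):
            s = rng.randint(0, max_t - 1)
            intervals.append([s, rng.randint(s + 1, max_t)])   # start < end
        expected = min_rooms_brute(intervals)
        assert min_rooms_heap(intervals) == expected, intervals
        assert min_rooms_sweep(intervals) == expected, intervals
    return True
```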

Mastery Criteria

Recognized “min rooms” / “max concurrent intervals” within 30 seconds.
Wrote both heap and sweep solutions within 10 minutes total.
Articulated the half-open tie-break before coding.
Handled the language-specific heap default (min vs max) without bugs.
Identified the connection to max-overlapping-intervals problems (e.g., LC 732, My Calendar III) and LC 1094 (Car Pooling).
Solved LC 56 (Merge Intervals) and LC 57 (Insert Interval) within a week, observing the same sort-by-start skeleton.

Lab 07 — Topological Sort: Course Schedule II (Kahn’s Vs DFS)

Goal

Master both topological sort algorithms — Kahn’s BFS and DFS-postorder — applied to LC 210. Deliverable produces a valid course order in O(V + E) time, O(V + E) space, with cycle detection wired in. You can articulate when each algorithm is preferable, the standard cycle-detection check, and how this generalizes to dependency resolution and build systems.

Background Concepts

DAG topological order; indegree array as a Kahn’s prerequisite; DFS three-color marking (white/gray/black) for cycle detection; postorder reverse for DFS-topo; the equivalence of “topo order exists” and “graph is a DAG”; existence of cycle = len(order) != V for Kahn’s. Review pattern Topological Sort and Graph Foundations.

Interview Context

Topological sort appears at Meta (build dependencies), Amazon (course scheduling, package ordering), Google (Spanner schema migrations). The interview signal is whether you naturally pick Kahn’s for explicit ordering (where the BFS structure makes the algorithm self-documenting) and DFS for cycle detection or recursive structure (where the call stack mirrors the recursion). Weak candidates only know one. Strong candidates code Kahn’s and explain the DFS variant. Elite candidates discuss stable ordering (preserving input order on ties via priority queue) and comment on parallelizability.

Problem Statement

You must take numCourses courses, numbered 0..numCourses-1. Some courses have prerequisites: prerequisites[i] = [a, b] means you must take b before a. Return any valid ordering of courses to finish all of them, or an empty array if impossible (cycle).

Constraints

1 ≤ numCourses ≤ 2000
0 ≤ |prerequisites| ≤ 5000
All [a, b] pairs unique; a != b.

Clarifying Questions

Are duplicate prerequisite pairs possible? (Per constraints, no — but worth confirming, as it affects indegree counting.)
Are self-loops [a, a] possible? (Per constraints, a != b — no self-loops.)
If multiple valid orderings exist, return which one? (Any. But mention you can return the lex-smallest with a min-heap-based Kahn’s.)
Output empty array on cycle, or null/exception? (LC says empty.)
Are course IDs always 0..n-1 contiguous? (Yes per LC; otherwise you’d need to map.)

Examples

numCourses	prerequisites	Output
2	`[[1,0]]`	`[0,1]`
4	`[[1,0],[2,0],[3,1],[3,2]]`	`[0,1,2,3]` or `[0,2,1,3]`
2	`[[1,0],[0,1]]`	`[]` (cycle)
1	`[]`	`[0]`
3	`[[0,1],[1,2],[2,0]]`	`[]` (cycle)

Initial Brute Force

Repeatedly find a course with no remaining prerequisites; remove it and its outgoing edges; repeat. If at some point no such course exists but courses remain, there’s a cycle.

def find_order_brute(n, prereqs):
    deps = [set() for _ in range(n)]
    for a, b in prereqs:
        deps[a].add(b)
    order = []
    for _ in range(n):
        for i in range(n):
            if deps[i] is not None and not deps[i]:
                order.append(i)
                deps[i] = None
                for j in range(n):
                    if deps[j] is not None: deps[j].discard(i)
                break
        else:
            return []
    return order

Brute Force Complexity

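Time O(V² + E). There are up to V rounds, and each scans at most V courses and then discards the chosen course from V dependency sets (O(1) average per discard). At V = 2000 that is about 4×10⁶ set operations, so it still runs here. But the repeated full scans are wasted work that grows quadratically, and Kahn’s algorithm eliminates them.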

Optimization Path

Kahn’s algorithm replaces “find a no-prereq course” + “remove it” with O(1) amortized operations:

Compute indegree[v] = number of incoming edges.
Initialize a queue with all v having indegree[v] == 0.
Repeat: pop u from queue, append to order, decrement indegree of each successor v; if indegree[v] becomes 0, push v.
If len(order) == V: that’s the topological order. Else: cycle.

Each edge processed once. Each vertex enqueued/dequeued once. Total O(V + E).

DFS variant: run DFS from each unvisited node; on finish (postorder), prepend the node to the order. Detect cycles via the gray-vertex check.

Final Expected Approach (Kahn’s)

from collections import deque

def find_order(n, prereqs):
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in prereqs:
        adj[b].append(a)        # edge b -> a (b is prereq of a)
        indeg[a] += 1
    q = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else []

Final Expected Approach (DFS)

def find_order_dfs(n, prereqs):
    adj = [[] for _ in range(n)]
    for a, b in prereqs:
        adj[b].append(a)
    color = [0] * n               # 0=white, 1=gray (on stack), 2=black (done)
    order = []

    def dfs(u):
        if color[u] == 1: return False    # back edge -> cycle
        if color[u] == 2: return True
        color[u] = 1
        for v in adj[u]:
            if not dfs(v): return False
        color[u] = 2
        order.append(u)
        return True

    for u in range(n):
        if color[u] == 0 and not dfs(u):
            return []
    return order[::-1]

Data Structures Used

Kahn’s: adjacency list, indegree array, queue, output list.
DFS: adjacency list, color array, output list, implicit recursion stack.

Correctness Argument (Kahn’s)

Invariant: at any point, order contains a valid prefix of some topological order, and the queue contains exactly the unprocessed vertices with no remaining unsatisfied prerequisites (i.e., remaining indegree zero in the subgraph of unprocessed vertices).

Maintenance: when we pop u, all its prereqs are already in order (since indeg[u] == 0 in the residual graph means all its prereqs have been processed). Adding u extends a valid topo-prefix. For each successor v, decrementing indeg[v] reflects that u is now “satisfied”; if v’s residual indegree hits 0, all its prereqs are satisfied, so it’s eligible.

Cycle detection: if len(order) < n, some vertices were never enqueued, meaning their residual indegree never reached 0 — they’re inside a strongly connected component with a cycle (or downstream of one).

Correctness Argument (DFS)

The classical theorem: a directed graph is a DAG iff DFS encounters no back edges. We mark vertices gray when entering DFS, black on finish. Encountering a gray vertex via an outgoing edge is a back edge → cycle. Postorder reversed gives a valid topological order: when u finishes, all reachable-from-u vertices have already finished and are earlier in order; reversing puts them after u.

Complexity

Both: O(V + E) time, O(V + E) space (adjacency + queue/recursion-stack).

Implementation Requirements

Build the adjacency list with edge direction prereq → course (so we can decrement indeg[course] when a prereq is processed). The reverse direction also works but flips the topo-order interpretation; pick one convention and stay consistent.
Use deque (Python) / ArrayDeque (Java) / container/list (Go) for the BFS queue, not a Python list.pop(0) which is O(N).
DFS recursion: watch Python’s default recursion limit (1000) for large N. Either iterative DFS or sys.setrecursionlimit for V > ~900.
Cycle check after the loop (Kahn’s) or during (DFS gray check).

Tests

Smoke: (2, [[1,0]]) → [0,1].
Unit: no prereqs ((3, []) → any permutation of [0,1,2]); single chain ((4, [[1,0],[2,1],[3,2]]) → [0,1,2,3]).
Cycle: (2, [[1,0],[0,1]]) → []; longer cycle (3, [[0,1],[1,2],[2,0]]) → [].
DAG with multiple roots: (4, [[2,0],[2,1],[3,2]]) → either [0,1,2,3] or [1,0,2,3]. Validate by checking the output is a permutation and all prereqs respected.
Edge: N=1, no prereqs → [0].
Large: N=2000, E=5000, random DAG; assert sub-100ms.

Validator helper (write this!):

def is_valid_topo(order, n, prereqs):
    if len(order) != n or set(order) != set(range(n)): return False
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[b] < pos[a] for a, b in prereqs)
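A sketch of a random-DAG stress test around that validator. It repeats this lab’s `find_order` (Kahn’s) and `is_valid_topo`; `random_dag` and `stress` are illustrative helpers. The generator hides a random permutation and only adds edges consistent with it, so the graph is guaranteed acyclic. Reversing one of its edges then guarantees a cycle:

```python
import random
from collections import deque


def find_order(n, prereqs):
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in prereqs:
        adj[b].append(a)                   # edge b -> a
        indeg[a] += 1
    q = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else []


def is_valid_topo(order, n, prereqs):
    if len(order) != n or set(order) != set(range(n)):
        return False
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[b] < pos[a] for a, b in prereqs)


def random_dag(n, m, rng):
    """Up to m unique edges [a, b], each with b earlier than a in a hidden permutation."""
    if n < 2:
        return []
    perm = list(range(n))
    rng.shuffle(perm)
    edges = set()
    while len(edges) < min(m, n * (n - 1) // 2):
        i, j = sorted(rng.sample(range(n), 2))   # positions i < j in perm
        edges.add((perm[j], perm[i]))            # perm[i] must come before perm[j]
    return [list(e) for e in edges]


def stress(trials=50, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 30)
        prereqs = random_dag(n, rng.randint(0, 3 * n), rng)
        assert is_valid_topo(find_order(n, prereqs), n, prereqs)
        if prereqs:                              # adding the reverse of an edge closes a 2-cycle
            a, b = prereqs[0]
            assert find_order(n, prereqs + [[b, a]]) == []
    return True
```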

Follow-up Questions

“Return the lex-smallest valid order.” → replace the queue with a min-heap. Time O(V log V + E): each vertex is pushed and popped once, and each edge is still processed once.
“Detect all vertices in cycles, not just whether any exist.” → SCC decomposition (Tarjan’s / Kosaraju’s), Phase 3.
“Topological sort under updates (edges added/removed)?” → online topological order maintenance, hard problem; offline batched updates with reordering.
“Parallel topological sort?” → at each round, all indegree-0 vertices can run in parallel; this is the natural parallelization for build systems (Bazel, Buck).
“Schedule with time costs per task: minimize total wall time?” → critical-path method; longest path in DAG, computable in O(V + E) after topo-sort.
“If V is huge (10^9) and the graph is implicit?” → streaming variant; need indegree of each vertex computed via input stream.
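The critical-path follow-up can be sketched by running Kahn's algorithm and relaxing each edge with a longest-path update. This is a minimal sketch under assumptions that are not part of the original problem: a hypothetical durations list (durations[i] is task i's run time), unlimited parallel workers, and prerequisite pairs in this lab's [a, b] = "b before a" convention. The name min_total_time is illustrative.

```python
from collections import deque


def min_total_time(n, prereqs, durations):
    """Earliest time all tasks can finish, assuming unlimited parallel workers.

    prereqs uses this lab's convention: [a, b] means b must finish before a starts.
    Returns -1 if the dependency graph contains a cycle.
    """
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for a, b in prereqs:
        adj[b].append(a)              # edge b -> a
        indeg[a] += 1

    # finish[v] = durations[v] + latest finish among v's prerequisites.
    finish = list(durations)
    q = deque(v for v in range(n) if indeg[v] == 0)
    processed = 0
    while q:
        u = q.popleft()               # all of u's prereqs are done, so finish[u] is final
        processed += 1
        for v in adj[u]:
            # Longest-path relaxation: v cannot start before u finishes.
            finish[v] = max(finish[v], finish[u] + durations[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)

    if processed < n:
        return -1                     # cycle: no valid schedule exists
    return max(finish) if n else 0
```

For example, with tasks 0 and 1 both required before task 2 and durations [3, 2, 5], task 2 starts at time 3 and everything finishes at 8.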

Product Extension

This pattern is dependency resolution. Build systems (Bazel, Make, Maven, npm/yarn lockfile resolution), database migration runners, terraform depends_on, container orchestration (Kubernetes init-containers), spreadsheet recalc, even React’s effect-dependency ordering. Course Schedule II is the toy version of “given declared dependencies, output a valid execution order” — and the DFS variant is what most build systems use, because they want to detect cycles early with a clear error path showing the offending cycle.

Language/Runtime Follow-ups

Python: collections.deque for BFS queue; sys.setrecursionlimit(10**5) if doing DFS on large input. List-as-queue with pop(0) is O(N) — never use it.
Java: ArrayDeque<Integer> is the canonical queue. Boxing tax for Queue<Integer> — for hot loops, use int[] ring buffer.
Go: no built-in queue; use a slice with q = q[1:] for O(1) dequeue, or container/list (heavier). Re-slicing does not free the dequeued prefix: the backing array stays reachable until an append reallocates it, so if memory matters, periodically copy the live portion into a fresh slice.
C++: std::queue<int> for Kahn’s; std::vector<int> + recursive DFS (iterative if V > ~10⁵ to avoid stack overflow with default 8MB stack).
JS/TS: Array.prototype.shift() is O(N) — use index-based queue (let head = 0; q[head++]) for O(1) amortized.
Stack overflow: any DFS-topo on V > recursion-limit needs iterative implementation. The iterative version uses an explicit stack of (vertex, iterator) pairs.
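A minimal sketch of that iterative DFS, assuming this lab's [a, b] = "b before a" edge convention. Each stack frame holds a vertex and a Python iterator over its remaining neighbors, so a frame resumes exactly where it left off; the three-color check detects back edges the same way the recursive version does. The function name is illustrative.

```python
def topo_sort_iterative(n, prereqs):
    """Three-color DFS topological sort with an explicit stack.

    prereqs[i] = [a, b] means b must come before a (edge b -> a).
    Returns a valid order, or [] if the graph has a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    adj = [[] for _ in range(n)]
    for a, b in prereqs:
        adj[b].append(a)

    color = [WHITE] * n
    postorder = []
    for start in range(n):
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        # Each frame pairs a vertex with an iterator over its unvisited neighbors.
        stack = [(start, iter(adj[start]))]
        while stack:
            u, it = stack[-1]
            advanced = False
            for v in it:                  # resumes where this frame left off
                if color[v] == GRAY:      # back edge: v is on the current path
                    return []
                if color[v] == WHITE:
                    color[v] = GRAY
                    stack.append((v, iter(adj[v])))
                    advanced = True
                    break
            if not advanced:              # every neighbor handled: u finishes
                stack.pop()
                color[u] = BLACK
                postorder.append(u)
    postorder.reverse()                   # reversed postorder is a topological order
    return postorder
```

Because nothing recurses, a 5,000-vertex chain runs without touching sys.setrecursionlimit.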

Common Bugs

Edge direction confusion. [a, b] means “b before a”, so the edge is b → a. Reversing it inverts the topological order and breaks the indegree computation.
Forgetting cycle detection. Returning order even when len(order) < n produces a partial order that misses some courses.
Using list.pop(0) in Python or ArrayList.remove(0) in Java as a queue; both are O(N). (Queue.poll() on a LinkedList<Integer> is O(1) and fine.)
Python recursion limit in DFS on V > 1000: a RecursionError that small tests never trigger. Set the limit explicitly or use iterative DFS.
DFS gray check missing. Without distinguishing gray (on-stack) from black (finished), you can’t detect back edges; you’d accept cyclic graphs with the wrong order.
Java boxing penalty in Queue<Integer> — ~2× slowdown vs int[] ring buffer.
Adjacency list as Map<Integer, List<Integer>> when courses are 0..n-1 — wastes time on hashing; use List<List<Integer>> indexed by ID.

Debugging Strategy

Trace (4, [[1,0],[2,0],[3,1],[3,2]]) Kahn’s: indeg = [0,1,1,2]. Queue: [0]. Pop 0, decrement indeg of 1 and 2: indeg = [0,0,0,2]. Push 1, 2. Pop 1, decrement indeg of 3: [0,0,0,1]. Pop 2, decrement: [0,0,0,0]. Push 3. Pop 3. Order = [0,1,2,3]. ✓
Run the validator helper on every output during development.
For cycle issues: build a small cycle by hand, ensure your code returns [], not a partial order.

Mastery Criteria

Recognized “valid order respecting prereqs” as topological sort within 30 seconds.
Wrote Kahn’s correctly within 8 minutes, with cycle detection.
Wrote DFS variant within 8 more minutes, with three-color cycle detection.
Articulated both correctness arguments without prompting.
Identified the language-specific recursion-limit / boxing trap.
Generalized to LC 207 (Course Schedule, just yes/no), LC 269 (Alien Dictionary, infer edges), LC 1136 (Parallel Courses, layer-by-layer Kahn’s) within a week.

Lab 08 — Backtracking: Word Search II With Trie Pruning

Goal

Master the backtracking-with-pruning pattern in its highest-yield form: a grid DFS guided by a trie, with in-place visit-marking and post-recurse restoration. Deliverable solves LC 212 in O(M·N·4·3^(L-1) + W·L) time, where M·N is grid size, L is max word length, W is number of words. You can articulate why a trie cuts the brute O(W·M·N·4·3^(L-1)) by a factor of W, and why pruning dead branches of the trie is the key practical optimization.

Background Concepts

DFS on a 2D grid; backtracking with explicit make/undo of state; trie as a multi-pattern matcher; pruning via “if no children, abandon”; visit-marking via in-place mutation (board[r][c] = '#') vs an explicit visited set. Review pattern Backtracking and Trie.

Interview Context

Word Search II is an Amazon / Apple / Bloomberg favorite and one of the hardest commonly asked grid problems (LeetCode rates it Hard). The interview signal is recognizing that running LC 79 (Word Search single-word) once per word is catastrophically slow for many words, and that a trie collapses the W independent searches into a single grid traversal. Strong candidates code the trie + DFS in 25 minutes. Elite candidates also implement trie pruning (deleting fully-found subtrees) to avoid revisiting.

Problem Statement

Given an M x N grid of characters and a list of words, return all words from the list that exist in the grid. A word can be constructed from letters of sequentially adjacent cells (horizontal/vertical), each cell used at most once per word.

Constraints

1 ≤ M, N ≤ 12
1 ≤ |words| ≤ 3 × 10^4
1 ≤ |word| ≤ 10
board[i][j] and words[i][j] are lowercase English letters.
All words are distinct.

Clarifying Questions

Each word individually uses each cell ≤ once, but can different words reuse the same cells? (Yes — independent searches per word.)
Diagonal adjacency? (No — only 4-connected.)
Are words guaranteed distinct? (Per constraints, yes.)
Are duplicates in the result allowed? (No — return distinct words found.)
Lower-case only? (Per constraints, yes — alphabet of size 26 simplifies trie nodes to fixed arrays.)

Examples

board = [["o","a","a","n"],
         ["e","t","a","e"],
         ["i","h","k","r"],
         ["i","f","l","v"]]
words = ["oath","pea","eat","rain"]
output = ["eat","oath"]

board = [["a","b"],["c","d"]]
words = ["abcb"]
output = []   (c is only diagonal to b, and the final b would reuse a cell)

Initial Brute Force

For each word, run LC 79 (single-word search) on the grid.

def find_words_brute(board, words):
    return [w for w in words if exists_in_grid(board, w)]

def exists_in_grid(board, word):
    M, N = len(board), len(board[0])
    def dfs(r, c, i):
        if i == len(word): return True
        if not (0 <= r < M and 0 <= c < N) or board[r][c] != word[i]: return False
        ch, board[r][c] = board[r][c], '#'
        ok = dfs(r+1,c,i+1) or dfs(r-1,c,i+1) or dfs(r,c+1,i+1) or dfs(r,c-1,i+1)
        board[r][c] = ch
        return ok
    return any(dfs(r, c, 0) for r in range(M) for c in range(N))

Brute Force Complexity

Per word: O(M·N · 4·3^(L-1)): from each starting cell the DFS has at most 4 branches on the first step and at most 3 afterward (the cell it came from is already marked). Across W words: O(W · M · N · 4 · 3^(L-1)). With W = 3×10⁴, L = 10, M·N = 144: 3×10⁴ × 144 × 4 × 3⁹ ≈ 3.4×10^11 steps in the worst case. TLEs.

Optimization Path

Insight: all W single-word searches share grid traversal. If we have a trie of all words, a single DFS over the grid can simultaneously match all words, advancing through trie nodes as we step. At each grid cell (r, c), instead of asking “does this cell match word[i]”, we ask “does the current trie node have a child for board[r][c]”. If yes, descend. If the current trie node has the word field set, that word has been found — record and clear (to dedupe).

Pruning: after backtracking, if the trie node we just descended into has no children left (all its words have been found and we cleared them), prune it from the parent. This avoids revisiting empty subtrees on later starting cells.

Final Expected Approach

def find_words(board, words):
    # 1) Build trie
    trie = {}
    for w in words:
        node = trie
        for c in w:
            node = node.setdefault(c, {})
        node['$'] = w                  # marker: word ends here

    M, N = len(board), len(board[0])
    found = []

    def dfs(r, c, parent):
        ch = board[r][c]
        node = parent.get(ch)
        if not node: return
        if '$' in node:
            found.append(node.pop('$'))    # dedup: clear marker
        board[r][c] = '#'
        for dr, dc in ((1,0),(-1,0),(0,1),(0,-1)):
            nr, nc = r+dr, c+dc
            if 0 <= nr < M and 0 <= nc < N and board[nr][nc] != '#':
                dfs(nr, nc, node)
        board[r][c] = ch
        if not node:                       # prune dead branch
            parent.pop(ch)

    for r in range(M):
        for c in range(N):
            dfs(r, c, trie)

    return found

Data Structures Used

Trie: nested dict (Python). In Java/C++, an explicit TrieNode class with TrieNode[26] children array.
Grid: mutated in place ('#' marker for visited).
Output list: distinct words found.

Correctness Argument

Trie invariant: the trie initially encodes all words; each end-of-word marker '$' carries its word (markers can sit on internal nodes, e.g. "oat" inside "oath"). The DFS descends one trie level per grid step; arriving at a node with '$' means we've matched the complete word from the start cell.

Backtracking correctness: in-place marking with '#' and explicit restoration in the post-recurse line guarantee that on entry to any DFS call, the grid reflects only ancestor cells as visited. The restore is symmetric to the mark; no leaks.

Dedup via pop('$'): clearing the marker on first find ensures each word is reported exactly once even if multiple grid paths spell it.

Pruning correctness: pruning a child after recursion only removes a subtree that has no remaining words to find (no '$' markers anywhere below). Future searches that would have entered this subtree gain nothing from doing so, so pruning is safe and accelerates the algorithm.

Complexity

Time O(M·N · 4·3^(L-1) + W·L): the grid traversal is bounded by the number of trie-guided paths of length ≤ L from each start cell, which does not depend on W, plus O(W·L) to build the trie. Pruning does not improve this worst-case bound, but it shrinks the explored space sharply once found words deplete the trie. Space: O(W·L) for the trie + O(L) for recursion.

The W factor is gone because all words share the traversal.

Implementation Requirements

One DFS per starting cell, not W DFSes per starting cell. The trie unifies them.
Restore the cell (board[r][c] = ch) on every code path. The cleanest pattern: mark before the recursive calls, restore after — never restore inside conditionals.
Prune by removing the child entry when its subtree empties. This is a 5%-50% speedup depending on input.
Dedup by pop('$') on found, not by if word not in found: found.append(word) (the latter is O(W) per check).
For Python, use a nested dict; explicit TrieNode classes are slower due to attribute lookup.

Tests

Smoke: the LC example above.
Unit: single-cell grid, single-letter word; word identical to one row.
No matches: words with letters not in the grid → [].
All matches: every word findable.
Diagonal trap: word that exists only along a diagonal — should NOT be found.
Reuse trap: board=[["a","a"]], words=["aaa"] → [] (cell reuse forbidden).
Stress: M=12, N=12, W=3×10⁴, L=10, random; assert <1s in optimized languages, <5s in Python.
Adversarial: words with long common prefix (e.g., 1000 variants of "prefix....") — exercises the trie’s prefix-sharing benefit.

Follow-up Questions

“What if words can use diagonal adjacency too?” → 8 directions; everything else identical.
“What if the same cell can be reused?” → no marking needed. Paths are then no longer limited by grid size, but the trie still caps the depth at L, so the search terminates; the branching factor becomes 4 per step, giving O(M·N·4^L) in the worst case.
“What if you want all paths spelling each word, not just whether it exists?” → don’t dedup; collect on every match.
“Memory blow-up for huge dictionaries?” → use a compressed trie (radix tree) or DAWG.
“Distributed: shard grid across machines?” → grids small enough to not matter; for huge grids, partition with overlap of size L-1.
“Online: words added/removed dynamically?” → trie supports insert/delete in O(L); the search needs no change.

Product Extension

This pattern underlies multi-pattern string matching in DLP (data-loss prevention) — scanning documents for any of a list of forbidden phrases — and in IDE autocomplete-on-context (which words from the dictionary can be formed by adjacent identifiers in scope?). It’s also used in spell-checkers that operate on keyboard layouts (find dictionary words spellable by adjacent keys). The Aho-Corasick algorithm is the streaming generalization (multi-pattern matching in O(text + total-pattern-length + matches)).

Language/Runtime Follow-ups

Python: nested dict is the fastest trie representation in Python; class-based is slower due to attribute lookup. dict.setdefault(c, {}) is the canonical insert; dict.pop(key, None) is safe pop with default.
Java: explicit TrieNode { TrieNode[] next = new TrieNode[26]; String word; } is fastest. HashMap<Character, TrieNode> is slower (autoboxing + hash).
Go: struct with [26]*TrieNode array. Map-based is slower.
C++: struct with TrieNode* next[26] = {nullptr}; and string* word. Avoid std::map<char, TrieNode>.
JS/TS: plain object as a hashmap is fine, but for hot loops, a Map or fixed-array index works.
Recursion depth: with L = 10, Python stays well within the default recursion limit of 1000. For longer words, raise it with sys.setrecursionlimit.
Mutation safety: in-place grid mutation with '#' is fast but has thread-safety implications; if the function must be re-entrant, use an explicit visited set per call.
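If the search must be re-entrant, one option is to replace the '#' marking with a per-call visited set, as in this sketch. It mirrors find_words above but never writes to board; the name find_words_reentrant is hypothetical, and the set lookups cost some speed compared with in-place mutation.

```python
def find_words_reentrant(board, words):
    """Trie + DFS word search that never mutates `board`.

    The trie, visited set, and result list are all local to this call,
    so concurrent calls that share the same board cannot interfere.
    """
    trie = {}
    for w in words:
        node = trie
        for c in w:
            node = node.setdefault(c, {})
        node['$'] = w                     # marker: word ends here

    M, N = len(board), len(board[0])
    found = []
    visited = set()                       # cells on the current DFS path

    def dfs(r, c, parent):
        ch = board[r][c]
        node = parent.get(ch)
        if not node:
            return
        if '$' in node:
            found.append(node.pop('$'))   # dedup: clear marker
        visited.add((r, c))               # mark
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < M and 0 <= nc < N and (nr, nc) not in visited:
                dfs(nr, nc, node)
        visited.remove((r, c))            # restore, symmetric to the mark
        if not node:
            parent.pop(ch)                # prune dead branch

    for r in range(M):
        for c in range(N):
            dfs(r, c, trie)
    return found
```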

Common Bugs

Forgetting to restore the cell. Causes false negatives later: cells stay marked '#' permanently, blocking unrelated words.
Restoring inside if/return paths. Always restore at the end, after all branches have explored. Easiest: structure as mark; for each direction: recurse; restore;.
Visited check on the neighbor, not on the current cell. You should refuse to step into a '#' cell, but the current cell mark happens after entering it.
Adding the same word multiple times. Without pop('$'), a word findable by 5 paths gets reported 5 times.
Restarting the search from the trie root for each word (effectively the non-trie approach) instead of descending one shared trie path per DFS step, which re-enumerates prefixes and words you've already found.
Boxing in Java’s Map<Character, TrieNode> — silent ~3× slowdown vs TrieNode[26].
Using word in found (O(W)) instead of pop('$') (O(1)) for dedup.
Pruning incorrectly: popping the trie node’s '$' marker but leaving its empty dict in the parent — then visited subtrees are revisited as empty traversals. Always check if not node: parent.pop(ch) after recursion.

Debugging Strategy

Build a tiny trie by hand for ["oath", "oat"] — verify the structure with a print.
Trace one DFS from cell (0,0) for the smoke example. The cell o matches root’s 'o' child; descend; mark; try neighbors; etc. Verify oath is found exactly once.
Test the prune with a single word: after finding "oath", the trie should be empty. Subsequent starts at any cell return immediately.
Cross-check against the brute-force LC 79 per word for the smoke and stress tests on M=N=4, W=10.
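A small sketch of the first debugging step: build the trie for ["oath", "oat"] with the same setdefault loop used in find_words and print it. The helper name build_trie is hypothetical. The nested dict should equal {'o': {'a': {'t': {'$': 'oat', 'h': {'$': 'oath'}}}}}, which also shows that the '$' marker for "oat" sits on an internal node rather than a leaf.

```python
import pprint


def build_trie(words):
    # Same nested-dict construction as find_words: '$' marks the end of a word.
    trie = {}
    for w in words:
        node = trie
        for c in w:
            node = node.setdefault(c, {})
        node['$'] = w
    return trie


trie = build_trie(["oath", "oat"])
pprint.pprint(trie)
```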

Mastery Criteria

Recognized “many words in a grid” as trie + DFS within 60 seconds.
Built the trie in ≤ 8 lines with setdefault.
Wrote the DFS with mark; recurse; restore symmetry on first attempt.
Implemented dedup via pop('$') and pruning via if not node: parent.pop(ch).
Articulated the W factor savings vs running LC 79 W times.
Identified the language-specific trie-representation trade-off.
Solved LC 79 (single-word) in <8 minutes within a week.
Solved LC 1268 (Search Suggestions System — autocomplete) within two weeks.

Lab 09 — Heap For Top-K: Top K Frequent Elements (Heap Vs Bucket Sort)

Goal

Master the two canonical “top K” algorithms — min-heap of size K for streaming/general cases and bucket sort by frequency for bounded-frequency cases. Deliverable solves LC 347 with both, articulates the time-space trade-off, and recognizes which language-specific gotcha (Python heapq is min-only; Java PriorityQueue boxes; C++ defaults to max-heap) applies.

Background Concepts

Min-heap of size K as the canonical “running top K” structure: pop when size exceeds K, ensuring O(N log K). Bucket sort by frequency (not value) when frequencies are bounded by N. The duality “top K frequent” / “K largest” / “K smallest” via heap-direction inversion. QuickSelect as the O(N) average alternative. Review pattern Heap Top K and Heap Foundations.

Interview Context

Top-K problems are interview gold — they appear at every Big Tech, often as the warmup or the second problem. The interview signal is whether you can match the right structure to the input shape: heap when N is huge or streaming, bucket sort when frequencies are bounded (which they always are in this problem since the max frequency is N). Strong candidates write the heap solution. Elite candidates write the heap solution, then also mention the O(N) bucket-sort alternative and articulate when each is preferred.

Problem Statement

Given an integer array nums and an integer K, return the K most frequent elements. The answer can be returned in any order.

Constraints

1 ≤ |nums| ≤ 10^5
-10^4 ≤ nums[i] ≤ 10^4
K is in the range [1, |distinct values in nums|].
The answer is unique (no ties at the K-th position that would create ambiguity).

Clarifying Questions

Tie-breaking — what if two values share the K-th frequency rank? (Per constraints, the answer is unique. But if it weren’t, ask: arbitrary, or some specified rule like smallest value first?)
Output order — sorted by frequency, by value, or any? (LC: any.)
Are floats / strings possible, or strictly ints? (Per constraints, ints — but the algorithm extends trivially to any hashable type.)
Streaming or offline? (Offline, but extending to streaming — windowed top-K — is a follow-up.)
Memory-constrained? (No special constraint, but the algorithm should be O(N) space because the frequency map is unavoidable.)

Examples

Input	Output
`nums=[1,1,1,2,2,3], K=2`	`[1,2]`
`nums=[1], K=1`	`[1]`
`nums=[4,1,-1,2,-1,2,3], K=2`	`[-1,2]`

Initial Brute Force

Count frequencies; sort by frequency; take top K.

from collections import Counter

def top_k_brute(nums, K):
    cnt = Counter(nums)
    return [x for x, _ in sorted(cnt.items(), key=lambda kv: -kv[1])[:K]]

Brute Force Complexity

Time O(N + D log D) where D is the number of distinct values. Space O(D). At N=10⁵, D≤2×10⁴+1, this is fast — but it sorts more than needed (full O(D log D) when we only want top K).

Optimization Path A — Min-Heap of Size K

Build a frequency map. Walk the (value, freq) pairs maintaining a min-heap of size K keyed by freq. For each pair, push; if heap size > K, pop. The K survivors are the top K.

Why min-heap? We want to drop the smallest frequency when the heap overflows; min-heap.peek() gives the smallest in O(1).

Time: O(N) frequency count + O(D log K) heap. For D >> K, this is much faster than O(D log D).

Optimization Path B — Bucket Sort by Frequency

Frequencies are integers in [1, N]. Create buckets bucket[f] = list of values with freq == f. Walk f from N down to 1, collecting values into the result until we have K.

Time: O(N). Space: O(N). The cleanest O(N) solution.

Final Expected Approach (Heap)

import heapq
from collections import Counter

def top_k_frequent_heap(nums, K):
    cnt = Counter(nums)
    heap = []   # (freq, value), min-heap
    for v, f in cnt.items():
        heapq.heappush(heap, (f, v))
        if len(heap) > K:
            heapq.heappop(heap)
    return [v for _, v in heap]

Final Expected Approach (Bucket Sort)

from collections import Counter

def top_k_frequent_bucket(nums, K):
    cnt = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    for v, f in cnt.items():
        buckets[f].append(v)
    out = []
    for f in range(len(nums), 0, -1):
        for v in buckets[f]:
            out.append(v)
            if len(out) == K: return out
    return out

Data Structures Used

Heap approach: Counter (hashmap) + min-heap of size K.
Bucket approach: Counter + a list-of-lists buckets indexed by frequency.

Correctness Argument (Heap)

Invariant: after processing the first i distinct values, the heap contains the min(K, i) most frequent among them. Push-then-pop-if-over-K handles both cases for the next value: if the heap had fewer than K items, the push simply grows it; otherwise the pop removes the smallest of the K + 1 candidates, which is either the new value (leaving the heap as it was) or the previous minimum (so the new value replaces it).

Final step: after processing all D values, heap = top K. ✓

Correctness Argument (Bucket)

Frequencies range in [1, N], so buckets indexed [0..N] capture all. Walking f from high to low and collecting until K values found gives exactly the K most-frequent (with arbitrary tie-breaking within a bucket, acceptable per constraints).

Complexity

Approach	Time	Space
Brute (full sort)	O(N + D log D)	O(D)
Heap	O(N + D log K)	O(N + K)
Bucket	O(N)	O(N)
QuickSelect	O(N) average, O(N²) worst	O(D)

For D ≈ N and K ≪ D, both are effectively linear: heap is O(N + D log K) with a small log K factor, while bucket is strictly O(N) with a smaller hidden constant. For streaming, heap is the natural choice.
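For the QuickSelect row in the table, here is a minimal sketch over the (value, freq) pairs. It assumes a random pivot and a three-way (Dutch-national-flag) partition on frequency, so long runs of equal frequencies do not degrade it. It moves the K highest-frequency pairs to the tail of the list; the function name is illustrative.

```python
import random
from collections import Counter


def top_k_frequent_quickselect(nums, K):
    """Top-K frequent via QuickSelect on frequency.

    Average O(N), worst case O(N^2) with unlucky pivots; O(D) extra space.
    """
    items = list(Counter(nums).items())   # (value, freq) pairs
    # Goal: items[target:] holds the K largest frequencies, target = D - K.
    target = len(items) - K
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)][1]
        # Three-way partition of items[lo..hi]: freq < pivot | == pivot | > pivot
        lt, i, gt = lo, lo, hi
        while i <= gt:
            f = items[i][1]
            if f < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif f > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        if target < lt:
            hi = lt - 1                   # target is in the "< pivot" band
        elif target > gt:
            lo = gt + 1                   # target is in the "> pivot" band
        else:
            break                         # target sits in the "== pivot" band
    return [v for v, _ in items[target:]]
```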

Implementation Requirements

Use a min-heap, even though we want the largest K — popping the min when overflowing keeps the largest K in the heap.
For Python’s heapq, push tuples (freq, value) — the comparison is lexicographic, so freq dominates.
Java’s PriorityQueue is a min-heap by default. No comparator inversion needed for “top-K largest” via the min-heap-of-size-K trick. Push int[]{freq, value} with Comparator.comparingInt(a -> a[0]).
C++’s priority_queue is a max-heap by default. For min-heap: priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>>.
For bucket sort, size buckets N + 1 (frequencies 1..N, plus index 0 unused).

Tests

Smoke: ([1,1,1,2,2,3], 2) → [1,2] (in any order).
Unit: K = 1 → most frequent only; K = D → all distinct values.
All distinct values: [1,2,3,4,5], K=5 → all five values. (With K=3, every value ties at frequency 1, so the answer is not unique and the input violates the constraints.)
All same: [7]*100, K=1 → [7].
Negative values: [-1,-1,1,1,2], K=2 → [-1,1]. The tie at frequency 2 is harmless because both tied values are in the answer; K=1 would be ambiguous and is therefore not a valid input.
Large: N = 10⁵, K = 10, random values; both approaches sub-50ms.
Adversarial: all-distinct → bucket sort touches all of bucket[1]; heap pushes D items.

Follow-up Questions

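“Streaming top-K: elements arrive one at a time, query top K at any moment.” → keep the full frequency map plus an ordered structure keyed by (freq, value), such as a balanced BST; each arrival removes the old (freq, value) entry and inserts (freq + 1, value) in O(log D), and the top K are read from the high end. For approximate answers in bounded memory, use Misra-Gries / Space-Saving sketches.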
“Top K from multiple sorted streams (LC 23 generalized)?” → merge-K-sorted via min-heap.
“K closest points to origin (LC 973)?” → max-heap of size K keyed by distance, push and pop-if-over-K.
“Kth largest element (LC 215)?” → min-heap of size K → root is K-th largest. Or QuickSelect for O(N) average.
“Sliding window top-K?” → monotonic deque (LC 239 for K=1) or balanced BST (Phase 3).
“Memory-constrained: top K largest of a billion items?” → heap-of-K only: O(K) memory, O(N log K) time.
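For the approximate branch of the streaming follow-up, here is a minimal sketch of the Misra-Gries summary with at most k - 1 counters. Its guarantee: any value occurring more than n/k times in a stream of n items is a key of the result, and each stored count undercounts the true frequency by at most n/k. Exact counts still need a second pass over the surviving candidates. The function name is illustrative.

```python
def misra_gries(stream, k):
    """Misra-Gries heavy-hitters summary using at most k - 1 counters.

    Every item with true frequency > n/k (n = stream length) is guaranteed
    to be a key of the result; stored counts may undercount by at most n/k.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement every counter (this also "cancels" x)
            # and drop counters that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With k = 3 and a stream of 50 copies of 1 followed by 50 distinct other values, 1 survives with a stored count of 25, an undercount of 25 that stays within the n/k ≈ 33 bound.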

Product Extension

Recommendation systems, trending topics, top-N queries on dashboards, “most frequent error code in last 5 minutes” log monitors: all are top-K problems. Real-world systems use approximate algorithms (Count-Min Sketch + heap) for frequency estimation at internet scale, but the exact algorithm is what an interview is testing. The Misra-Gries summary (heavy hitters) is the streaming, memory-bounded, approximate counterpart of the size-K structure.

Language/Runtime Follow-ups

Python: heapq is min-only. To use it as max-heap, push -x (or (-freq, value) and decode at the end). Counter(nums).most_common(K) is a one-liner shortcut — mention it in the interview as the “Pythonic” answer, but be ready to write the algorithm by hand.
Java: PriorityQueue<int[]> with Comparator.comparingInt(a -> a[0]) is the canonical pattern. Boxing tax if you use PriorityQueue<Integer>. Sorting via Arrays.sort(arr, comparator) works on Integer[] but not int[].
Go: container/heap requires implementing 5 methods (Len, Less, Swap, Push, Pop). Verbose. For Top-K problems, a simple sort.Slice(items, less) followed by slicing the top K is often clearer.
C++: priority_queue<int> is a max-heap by default. For min-heap: priority_queue<int, vector<int>, greater<int>>. The top() is O(1), pop() is O(log N) and returns void — you must call top() first.
JS/TS: no built-in heap. Implement (~30 LOC) or use a library. For interview, Array.sort with a comparator and slicing is often acceptable for offline cases; mention you’d need a heap for streaming.
Heap tuple comparison: Python compares tuples lexicographically; if freq ties, comparison falls back to value, which raises TypeError for non-comparable types. Wrap in a class with __lt__ defined on freq only, or insert a unique tie-breaker such as an insertion counter: (freq, i, value).
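A minimal sketch of the tie-breaker fix: an insertion counter from itertools.count() sits between freq and value, so heapq never has to compare two values. The Job class is a hypothetical stand-in for any type without an ordering.

```python
import heapq
import itertools


class Job:
    """Hypothetical value type with no ordering (comparing two Jobs raises TypeError)."""

    def __init__(self, name):
        self.name = name


def top_k_by_freq(pairs, K):
    """pairs: iterable of (value, freq). Keeps the K values with the highest freq.

    The unique counter breaks freq ties, so tuple comparison never reaches value.
    """
    tiebreak = itertools.count()
    heap = []
    for value, f in pairs:
        heapq.heappush(heap, (f, next(tiebreak), value))
        if len(heap) > K:
            heapq.heappop(heap)
    return [value for _, _, value in heap]
```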

Common Bugs

Using a max-heap of size D instead of a min-heap of size K — wastes time and space; O(D log D) instead of O(D log K).
Java boxing in PriorityQueue<Integer> — silent ~2-3× slowdown.
Python heapq confusion: forgetting it’s min-only; writing heappush(heap, -x) for max-heap and forgetting to negate when reading. Use a tuple with negated key cleanly: heappush(heap, (-freq, value)).
C++ default direction wrong: priority_queue<int> is max-heap; using it for “min-heap of size K” gives wrong results.
Heap size check after push: if len(heap) > K: pop is correct. If you swap to if len(heap) >= K: pop you’ll never have K items.
Bucket-sort allocation cost: [[] for _ in range(N + 1)] is O(N), but in C++ you can vector<vector<int>> buckets(N + 1); — same idea, just be aware of the per-bucket overhead.
Unstable comparison on ties in heap tuples — for (freq, value), tied freqs compare values. If values are non-comparable (custom objects), this errors. Add a unique tie-breaker (e.g., an insertion counter) or wrap.

Debugging Strategy

Trace ([1,1,1,2,2,3], 2). Counter: {1:3, 2:2, 3:1}. Heap evolution: push (3,1) → heap [(3,1)], size 1 ≤ 2 ✓; push (2,2) → [(2,2),(3,1)], size 2 ≤ 2 ✓; push (1,3) → [(1,3),(3,1),(2,2)], size 3 > 2 → pop (1,3); heap = [(2,2),(3,1)]. Result: values [2, 1]. ✓
Bucket trace: counter same; N = 6, so buckets has indices 0..6: [[], [3], [2], [1], [], [], []]. Walk f=6,5,4 (empty); f=3 → take 1; f=2 → take 2; size=2, return [1, 2]. ✓
Cross-check the two approaches on 50 random inputs (compare as sets).
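A sketch of that cross-check. It repeats both functions from above so it runs on its own. Random inputs often tie at the K-th frequency (which the real constraints rule out), so instead of comparing values it compares the multiset of frequencies of the returned values against the K largest frequencies; that multiset is the same for every valid answer. The helper name cross_check and the input ranges are arbitrary choices.

```python
import heapq
import random
from collections import Counter


def top_k_frequent_heap(nums, K):
    cnt = Counter(nums)
    heap = []   # (freq, value), min-heap
    for v, f in cnt.items():
        heapq.heappush(heap, (f, v))
        if len(heap) > K:
            heapq.heappop(heap)
    return [v for _, v in heap]


def top_k_frequent_bucket(nums, K):
    cnt = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    for v, f in cnt.items():
        buckets[f].append(v)
    out = []
    for f in range(len(nums), 0, -1):
        for v in buckets[f]:
            out.append(v)
            if len(out) == K:
                return out
    return out


def cross_check(trials=50, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-5, 5) for _ in range(rng.randint(1, 40))]
        cnt = Counter(nums)
        K = rng.randint(1, len(cnt))
        # Ground truth: the K largest frequencies (unique even when values tie).
        expected = sorted(cnt.values(), reverse=True)[:K]
        for result in (top_k_frequent_heap(nums, K), top_k_frequent_bucket(nums, K)):
            assert len(set(result)) == K
            assert sorted((cnt[v] for v in result), reverse=True) == expected
    return True
```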

Mastery Criteria

Recognized “Top K” pattern within 30 seconds.
Wrote the min-heap-of-size-K solution within 6 minutes.
Mentioned bucket sort as the O(N) alternative when frequencies are bounded.
Identified the language-specific heap-direction trap (Python min-only, Java min-default, C++ max-default).
Solved LC 215 (Kth Largest), LC 973 (K Closest), LC 23 (Merge K Sorted Lists) within a week — same pattern, different keys.
Discussed QuickSelect as an O(N)-average alternative when asked.

Phase 3 — Advanced Data Structures

Target level: Medium → Hard
Expected duration: 3 weeks (12-week track) / 3 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~8 advanced structures per week + 30–60 problems applying them under the framework

Why Advanced Data Structures Unlock Hards

Phase 2 gave you 28 patterns that solve the vast majority of Mediums. The patterns work because each one carries an O(N) or O(N log N) algorithm in its template — you recognize the signal, instantiate the template, and the runtime falls out for free.

Hards are different. The signal still fires: you still recognize “this is sliding window with a tricky max”, “this is DP with a state transition”, “this is shortest path with a constraint”. But the vanilla template's complexity is one factor too high. A sliding-window max over a stream of N updates becomes O(N²) with a sorted list. A DP over N=20 items whose state includes the set of visited items is O(N!) if you enumerate visit orders; bitmask compression brings it down to O(2^N · N²). A range-sum problem with both updates and queries blows past prefix sums. A string match against a pattern of length M in a text of length N is O(N·M) with naive comparison; that's on the order of 10^10 ops at N=10^5, M=10^5.

The advanced data structures in this phase are the augmented engines that bring Hard problems back into reach. Each one removes a large factor from a naive approach, typically turning O(N) work per operation into O(log N), O(√N), or amortized near-constant time. They are not “tricks”. They are well-defined, well-proven structures with known invariants, well-understood failure modes, and known operating ranges. The skill is not to invent them; it is to recognize when the vanilla template is one factor too slow, identify which augmented structure plugs the hole, and instantiate it correctly under interview pressure.

There are roughly three families:

Range query / range update structures — segment tree, Fenwick tree, sparse table, sqrt decomposition. These turn O(N) per range query into O(log N) or O(√N), and (with augmentation) handle range updates the same way. They show up whenever the problem has a sequence and you need both updates and aggregates over arbitrary subranges in the same workload.
String-matching / hashing / suffix structures — KMP, Z, Manacher, rolling hash, suffix array, suffix automaton, Aho-Corasick, tries. These bring per-character work down from O(M) (full pattern recompare) to O(1) amortized, enabling O(N+M) or O(N log N) algorithms over strings. They show up whenever the problem mentions “substring”, “match”, “occurrence”, “palindrome”, or “common”.
State-compression and amortization — bitmask DP, meet-in-the-middle, DSU with α(N), bit manipulation idioms, Bloom filters, skip lists, LRU/LFU caches. These exploit problem-specific structural facts (small N, splittable input, near-constant amortized work, probabilistic acceptance) to clear constraints that naive DP/search cannot.

You will not memorize 24 implementations cold. You will understand the invariants well enough to derive each implementation in 5–15 minutes under pressure, and instantly recognize which one is needed from the problem signal.

After this phase you can solve unmistakably-Hard LeetCode problems on first attempt: range queries with updates, palindromic counts in linear time, multi-pattern matching, exact-cover by bitmask DP, subset-sum at N=40 by meet-in-the-middle, equation-solving by weighted DSU, dynamic LRU caches. You also become visibly stronger in mock interviews because you no longer flinch at “what if the input is updated?”, “what if N is 40?”, or “what if there are 10^5 patterns to match?”.

What You Will Be Able To Do After This Phase

For any range-query Hard, identify within 60 seconds whether vanilla prefix sums suffice, or whether Fenwick / segment tree / sparse table / sqrt decomposition is required, and why.
Implement a segment tree (point update, range query) from memory in <12 minutes.
Add lazy propagation when the problem demands range updates, and articulate the lazy-tag-push invariant.
Recognize a string-match Hard and pick the right tool: KMP (single pattern), Z (border + offsets), Manacher (palindromes), rolling hash (probabilistic equality, multi-substring), Aho-Corasick (multi-pattern), suffix array/automaton (overview-level for “longest common substring”, “distinct substrings”).
Build a trie augmented with counts / deletion / prefix-sum cache for word-search and autocomplete-class problems.
Recognize bitmask DP from N ≤ 20 constraint, formulate the state, and implement the transition without bugs.
Recognize meet-in-the-middle from N ≤ 40 (split into 20+20) and code the two-half merge.
Implement DSU with path compression + union by rank, and state the α(N) amortized bound it achieves.
Use bit manipulation idioms (popcount, lowbit, isolate trailing one, parity tricks) without thinking.

How To Read This Phase

Read the inline reference below in two passes. Pass 1: linear, end to end, to assemble a mental map of which structure plugs which hole. Pass 2: as you work through the labs, refer back to the structure entries to clarify invariants and pitfalls. Each entry has a fixed shape:

When to use — the problem signal that should fire this structure within 2 minutes of reading.
Complexity — build, query, update, space.
Implementation pitfalls — the bugs that consume the most interview minutes.
Classic problems — 3–6 representative problems where the structure is the intended solution.

Where labs cover the structure hands-on, the entry references the lab. Where the structure is overview-only (rare in interviews but expected of strong candidates), the entry says so explicitly.

Inline Advanced Data Structure Reference

1. Segment Tree (point update + range query)

When To Use

Sequence of N elements, with both point updates and range aggregates (sum / min / max / gcd / xor) on the same workload.
The aggregate is associative — i.e., it can be combined from two disjoint segments.
Q queries and Q updates interleaved: naive O(N) per query is too slow at N · Q scale, and prefix sums (O(1) per query but O(N) per update) break down once updates are frequent.

Complexity

Build O(N). Point update O(log N). Range query O(log N). Space O(N): the tree array is conventionally sized 4N so that any nearly balanced binary tree on N leaves fits.

Key Implementation Pitfalls

Off-by-one on the recursive boundaries — query(node, nl, nr, ql, qr): total miss is qr < nl or ql > nr (closed intervals); total cover is ql <= nl and nr <= qr. Mixing open and closed intervals is the #1 cause of broken segment trees.
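Sizing the tree array — for the recursive layout (root 1, children 2i and 2i + 1), 4N is always safe and 2 · next_power_of_two(N) is tight; sizing it at 2N indexes out of bounds (a segfault in C/C++) for non-power-of-two N. The iterative layout with leaves at [n, 2n) needs exactly 2n.
Combining two subtree results — for sum, add; for min/max, take the extreme; for gcd, take gcd(a, b). Define a single combine(a, b) so you can swap aggregates without rewriting the body.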
Recursive vs iterative — iterative segment tree with n rounded to power of 2, leaves at [n, 2n), is shorter and faster. Pick one style and stick to it.
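A minimal sketch of the iterative style, assuming sum as the default aggregate. The class name SegTree and its combine/identity parameters are illustrative, not a standard API. The query takes a closed [l, r] to match the recursive convention and converts to half-open internally; separate left and right accumulators keep the combine order correct even for non-commutative aggregates.

```python
from typing import Callable, List


class SegTree:
    """Iterative segment tree: leaves at [n, 2n), node i has children 2i, 2i + 1."""

    def __init__(self, data: List[int],
                 combine: Callable[[int, int], int] = lambda a, b: a + b,
                 identity: int = 0) -> None:
        self.combine, self.identity = combine, identity
        self.n = 1
        while self.n < len(data):            # round n up to a power of two
            self.n *= 2
        self.tree = [identity] * (2 * self.n)
        self.tree[self.n:self.n + len(data)] = data
        for i in range(self.n - 1, 0, -1):    # build parents bottom-up, O(N)
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, pos: int, value: int) -> None:
        """Set a[pos] = value, then recompute every ancestor. O(log N)."""
        i = pos + self.n
        self.tree[i] = value
        i //= 2
        while i:
            self.tree[i] = self.combine(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def query(self, l: int, r: int) -> int:
        """Aggregate over the closed interval [l, r]. O(log N)."""
        left_acc, right_acc = self.identity, self.identity
        l += self.n
        r += self.n + 1                        # switch to half-open [l, r)
        while l < r:
            if l & 1:
                left_acc = self.combine(left_acc, self.tree[l])
                l += 1
            if r & 1:
                r -= 1
                right_acc = self.combine(self.tree[r], right_acc)
            l //= 2
            r //= 2
        return self.combine(left_acc, right_acc)


st = SegTree([5, 2, 7, 1, 9])
print(st.query(1, 3))   # 10
st.update(2, 0)
print(st.query(1, 3))   # 3
```

Swapping to range-min is one constructor call, e.g. SegTree(data, min, float("inf")).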

Classic Problems

LeetCode 307 — Range Sum Query Mutable
LeetCode 308 — Range Sum Query 2D Mutable (with row segment trees)
LeetCode 1157 — Online Majority Element In Subarray (segment tree of candidates + frequency check)
LeetCode 715 — Range Module (segment tree of intervals, or coordinate-compressed)

Hands-on: see Lab 01.

2. Segment Tree With Lazy Propagation (range update + range query)

When To Use

Same as #1 but updates affect ranges, not single points: “add v to all elements in [l, r]”, “set all elements in [l, r] to v”, “flip all elements in [l, r]”.
The update operation has a clean composition rule: applying update u₁ then u₂ to a node is equivalent to a single combined update.

Complexity

All ops O(log N). Space O(N): a 4N value array plus a 4N lazy-tag array.

Key Implementation Pitfalls

Pushdown order — before recursing into children, push the parent’s lazy tag down. After recursing, recompute the parent’s value from the children. Forgetting either side breaks the structure silently — queries return stale values.
Composing lazy tags — for “set” and “add” mixed, “set” must override the pending “add”. Define a clear apply(child_lazy, parent_lazy) rule and write it down before coding.
Tag identity / no-op — every lazy slot needs an “identity” value (e.g., 0 for add, sentinel for set) that means “no pending op”. Don’t conflate “identity” with a legal value.
Range updates and empty intersection — return immediately when [ql, qr] misses the node, apply the tag only on full-cover nodes, and push down then recurse only on partial overlap.
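A minimal recursive sketch for range-add + range-sum, assuming a non-empty input array and closed intervals. The names LazySegTree, _apply, and _push are illustrative. The tag identity is 0 (“no pending add”), add tags compose by addition, and the code follows the push-before-recurse, pull-after-recurse order described above.

```python
from typing import List


class LazySegTree:
    """Range add + range sum on closed intervals; assumes len(data) >= 1."""

    def __init__(self, data: List[int]) -> None:
        self.n = len(data)
        self.sum = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)        # pending "add v to every element"; 0 = no-op
        self._build(1, 0, self.n - 1, data)

    def _build(self, node: int, nl: int, nr: int, data: List[int]) -> None:
        if nl == nr:
            self.sum[node] = data[nl]
            return
        mid = (nl + nr) // 2
        self._build(2 * node, nl, mid, data)
        self._build(2 * node + 1, mid + 1, nr, data)
        self.sum[node] = self.sum[2 * node] + self.sum[2 * node + 1]

    def _apply(self, node: int, nl: int, nr: int, v: int) -> None:
        # Adding v to all of [nl, nr] raises the sum by v * length; tags compose by +.
        self.sum[node] += v * (nr - nl + 1)
        self.lazy[node] += v

    def _push(self, node: int, nl: int, nr: int) -> None:
        # Hand the parent's pending tag to both children before recursing.
        if self.lazy[node]:
            mid = (nl + nr) // 2
            self._apply(2 * node, nl, mid, self.lazy[node])
            self._apply(2 * node + 1, mid + 1, nr, self.lazy[node])
            self.lazy[node] = 0

    def add(self, l: int, r: int, v: int, node: int = 1, nl: int = 0, nr: int = None) -> None:
        if nr is None:
            nr = self.n - 1
        if r < nl or nr < l:                  # empty intersection
            return
        if l <= nl and nr <= r:               # full cover: tag and stop
            self._apply(node, nl, nr, v)
            return
        self._push(node, nl, nr)              # partial overlap: push, recurse, pull
        mid = (nl + nr) // 2
        self.add(l, r, v, 2 * node, nl, mid)
        self.add(l, r, v, 2 * node + 1, mid + 1, nr)
        self.sum[node] = self.sum[2 * node] + self.sum[2 * node + 1]

    def query(self, l: int, r: int, node: int = 1, nl: int = 0, nr: int = None) -> int:
        if nr is None:
            nr = self.n - 1
        if r < nl or nr < l:
            return 0
        if l <= nl and nr <= r:
            return self.sum[node]
        self._push(node, nl, nr)
        mid = (nl + nr) // 2
        return (self.query(l, r, 2 * node, nl, mid)
                + self.query(l, r, 2 * node + 1, mid + 1, nr))


t = LazySegTree([1, 2, 3, 4, 5])
t.add(1, 3, 10)
print(t.query(0, 4))   # 45
print(t.query(2, 2))   # 13
```

Mixing “set” with “add” needs a two-field tag and a composition rule where a set wipes any pending add; that extension is left out here.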

Classic Problems

LeetCode 732 — My Calendar III (count of overlapping intervals — coord-compressed lazy seg tree)
LeetCode 2569 — Handling Sum Queries After Update (range flip + range sum, lazy + bitmask)
Codeforces “EDU: Segment Tree” sections

Hands-on: see Lab 02.

3. Fenwick Tree / Binary Indexed Tree (prefix sums with updates)

When To Use

Workload is point updates + prefix queries (or range queries via two prefix queries).
The aggregate is invertible (sum, xor) — Fenwick can’t express min/max naturally because they’re not invertible.
You want a smaller, faster, simpler structure than a segment tree: lower constants and shorter code.

Complexity

Build O(N) (or O(N log N) trivially). Point update O(log N). Prefix query O(log N). Range query (for sum/xor) = query(r) - query(l - 1). Space O(N).

Key Implementation Pitfalls

1-indexed — Fenwick trees are conventionally 1-indexed. Calling update(0, v) infinite-loops because 0 & -0 == 0. Use update(i + 1, v) if your data is 0-indexed.
i += i & -i (update) vs i -= i & -i (query) — the directions are not symmetric. Memorize: update goes up, query goes down.
Range-update + point-query is a different Fenwick variant: add v at l, subtract v at r + 1, and read a single element as a prefix sum. Not the same code path.
Range-update + range-query needs two Fenwick trees (the BIT² trick). Out of interview scope unless the problem explicitly demands it.
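A minimal 1-indexed sketch (class and method names are illustrative), followed by the coordinate-compressed use from LC 315. The update loop climbs with i += i & -i, the query loop descends with i -= i & -i, and callers shift 0-indexed data by one.

```python
from typing import List


class Fenwick:
    """1-indexed Fenwick tree: point add + prefix sum."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.tree = [0] * (n + 1)              # tree[0] is unused

    def add(self, i: int, delta: int) -> None:
        """a[i] += delta for 1 <= i <= n; walks UP by adding the lowbit."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i: int) -> int:
        """Sum of a[1..i]; walks DOWN by removing the lowbit."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def range_sum(self, l: int, r: int) -> int:
        return self.prefix(r) - self.prefix(l - 1)


def count_smaller(nums: List[int]) -> List[int]:
    """LC 315 shape: for each element, count strictly smaller elements to its right."""
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(nums)))}   # compress to 1..K
    bit = Fenwick(len(ranks))
    out = []
    for v in reversed(nums):
        out.append(bit.prefix(ranks[v] - 1))   # smaller values already seen (to the right)
        bit.add(ranks[v], 1)
    return out[::-1]


print(count_smaller([5, 2, 6, 1]))   # [2, 1, 1, 0]
```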

Classic Problems

LeetCode 307 — Range Sum Query Mutable (canonical)
LeetCode 315 — Count of Smaller Numbers After Self (canonical Fenwick on coord-compressed values)
LeetCode 327 — Count of Range Sum
LeetCode 493 — Reverse Pairs

Hands-on: see Lab 03.

4. 2D Fenwick Tree (matrix prefix sums with updates)

When To Use

Matrix problems with both point updates and rectangle-sum queries.
Static matrix prefix sums answer rectangle queries in O(1), but each update forces an O(N · M) rebuild; a 2D Fenwick brings both operations to O(log N · log M).
Coordinate-compressed if the matrix is sparse (most cells empty).

Complexity

Update O(log N · log M). Rectangle query O(log N · log M). Space O(N · M).

Key Implementation Pitfalls

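Nested i & -i — the outer loop walks the row index and the inner loop walks the column index; the column walk restarts from the original column on every row step. Otherwise the two walks are independent.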
Inclusion-exclusion for rectangle sum: Q(r2, c2) - Q(r1-1, c2) - Q(r2, c1-1) + Q(r1-1, c1-1). Forgetting the + corner is the canonical bug.
Sparse matrices — if N · M = 10^10 you cannot allocate the array. Use a dict of dict or coordinate compression along each axis.
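A minimal dense sketch, assuming 1-indexed coordinates and a matrix small enough to allocate (no compression). Fenwick2D and its method names are illustrative. The rect method is the inclusion-exclusion formula above, including the “+ corner” term.

```python
from typing import List


class Fenwick2D:
    """1-indexed 2D Fenwick tree: point add + rectangle sum."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        self.tree: List[List[int]] = [[0] * (cols + 1) for _ in range(rows + 1)]

    def add(self, r: int, c: int, delta: int) -> None:
        i = r
        while i <= self.rows:
            j = c                               # restart the column walk on every row step
            while j <= self.cols:
                self.tree[i][j] += delta
                j += j & -j
            i += i & -i

    def prefix(self, r: int, c: int) -> int:
        """Sum of the rectangle from (1, 1) to (r, c)."""
        total, i = 0, r
        while i > 0:
            j = c
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total

    def rect(self, r1: int, c1: int, r2: int, c2: int) -> int:
        # Inclusion-exclusion; the final "+" adds back the corner removed twice.
        return (self.prefix(r2, c2) - self.prefix(r1 - 1, c2)
                - self.prefix(r2, c1 - 1) + self.prefix(r1 - 1, c1 - 1))


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fw = Fenwick2D(3, 3)
for r, row in enumerate(grid, 1):
    for c, v in enumerate(row, 1):
        fw.add(r, c, v)
print(fw.rect(2, 2, 3, 3))   # 28
```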

Classic Problems

LeetCode 308 — Range Sum Query 2D Mutable (canonical)
LeetCode 327 — Count of Range Sum (often reduced to 2D via coords)
LeetCode 1505 — Minimum Possible Integer After at Most K Adjacent Swaps On Digits (a 1D Fenwick over positions; a good warm-up before going 2D)

5. Sparse Table (immutable RMQ)

When To Use

The array is static (no updates) and you need idempotent range queries (min, max, gcd, “is there a 1 in this range”).
O(1) query is required (segment tree’s O(log N) is too slow).
Builds in O(N log N) — fine for one-time use.

Complexity

Build O(N log N). Query O(1) for idempotent ops, O(log N) for non-idempotent ops (sum). Space O(N log N).

Key Implementation Pitfalls

Idempotent op required for O(1) — the trick is query(l, r) = combine(table[k][l], table[k][r - 2^k + 1]) where k = floor(log2(r - l + 1)). The two halves overlap; this only works if combining the same element twice equals once. Sum is not idempotent (counts twice). Min, max, gcd, “or”, “and” are.
floor(log2(len)) precomputation — build a log_floor[] table of size N + 1 (log_floor[x] = log_floor[x // 2] + 1) for true O(1) queries. Calling a floating-point log per query is slower and can be off by one from rounding.
Memory — at N = 10^6, N log N ≈ 2 × 10^7 ints = 80 MB. Plan for it.
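A minimal range-min sketch, including the log_floor[] precompute. The function names are illustrative; swapping min for max or math.gcd works because all three are idempotent, whereas sum does not.

```python
from typing import List, Tuple


def build_sparse_table(a: List[int]) -> Tuple[List[List[int]], List[int]]:
    """table[k][i] = min(a[i .. i + 2^k - 1]); log_floor[x] = floor(log2(x))."""
    n = len(a)
    log_floor = [0] * (n + 1)
    for x in range(2, n + 1):
        log_floor[x] = log_floor[x // 2] + 1
    table = [a[:]]
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        # Each level-k window is two overlapping level-(k-1) windows.
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)])
        k += 1
    return table, log_floor


def range_min(table: List[List[int]], log_floor: List[int], l: int, r: int) -> int:
    """O(1) min over closed [l, r]: two power-of-two windows that may overlap."""
    k = log_floor[r - l + 1]
    return min(table[k][l], table[k][r - (1 << k) + 1])


tbl, lg = build_sparse_table([5, 2, 4, 7, 1, 3, 6])
print(range_min(tbl, lg, 3, 6))   # 1
```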

Classic Problems

LeetCode 1851 — Minimum Interval to Include Each Query (one approach uses sparse table + offline sort)
Codeforces RMQ classics

Hands-on: see Lab 04.

6. Sqrt Decomposition (block-based queries)

When To Use

Workload mixes point/range updates and range queries, but the operation is hard to fit into segment tree (e.g., “sum of distinct values in a range”, “k-th smallest in a range with offline queries”).
You want a much simpler implementation than a segment tree, accepting an O(√N) factor.
Mo’s algorithm (offline range queries, total cost O((N+Q) · √N)) is a sqrt-decomp specialization.

Complexity

Build O(N). Query/update O(√N) per op. Space O(N).

Key Implementation Pitfalls

Block size choice — block size B = ⌈√N⌉ minimizes total cost Q · (N/B + B). Slightly larger B (e.g., 1.5√N) sometimes wins for cache reasons.
Edge of block — scan the left partial block (l to the end of its block) and the right partial block (start of its block to r) element by element; add the full middle blocks via their stored block totals.
Mo’s algorithm — sort queries by (block of l, r) (with even/odd block hack to halve constants), then move pointers. Don’t confuse “block of l” sorting with sorting by l.
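A minimal sketch of block-based point assign + range sum, plus the Mo ordering key. SqrtSum and mo_order are illustrative names; the block size is ⌈√N⌉ as above, and the even/odd trick reverses the r order on odd blocks.

```python
import math
from typing import List, Tuple


class SqrtSum:
    """Point assign + range sum with blocks of size B = ceil(sqrt(N))."""

    def __init__(self, a: List[int]) -> None:
        n = len(a)
        self.a = a[:]
        self.B = math.isqrt(max(n - 1, 0)) + 1          # ceil(sqrt(n)) for n >= 1
        self.blocks = [0] * ((n + self.B - 1) // self.B)
        for i, v in enumerate(self.a):
            self.blocks[i // self.B] += v

    def assign(self, i: int, v: int) -> None:
        self.blocks[i // self.B] += v - self.a[i]       # O(1): patch the block total
        self.a[i] = v

    def query(self, l: int, r: int) -> int:
        """Sum over closed [l, r] in O(sqrt N)."""
        bl, br = l // self.B, r // self.B
        if bl == br:                                    # same block: just scan
            return sum(self.a[l:r + 1])
        return (sum(self.a[l:(bl + 1) * self.B])        # left partial block
                + sum(self.blocks[bl + 1:br])           # full middle blocks
                + sum(self.a[br * self.B:r + 1]))       # right partial block


def mo_order(queries: List[Tuple[int, int]], B: int) -> List[int]:
    """Mo's ordering: sort by block of l (not by l), alternating r direction per block."""
    def key(q: int) -> Tuple[int, int]:
        l, r = queries[q]
        block = l // B
        return (block, r if block % 2 == 0 else -r)
    return sorted(range(len(queries)), key=key)


s = SqrtSum(list(range(1, 11)))
print(s.query(2, 8))   # 42
```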

Classic Problems

“Range Distinct Count” (offline) — Mo’s classic.
LeetCode 850 — Rectangle Area II (sweep + sqrt-block coord compression)
Codeforces “EDU: Sqrt Decomposition”

7. Persistent Segment Tree (overview)

When To Use

You need to query prior versions of an array — “sum over [l, r] as of version v”, “k-th smallest in [l, r] (offline, treat as 2D)”.
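Classic application: “find k-th smallest in subarray [l, r]”: build one version per array prefix over the coordinate-compressed value domain, then walk versions r and l − 1 in parallel, descending by the difference of their counts.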
Rare in standard interviews; expected for Grandmaster / CP-style.

Complexity

Update O(log N) creating a new version (path-copy). Query on any version O(log N). Space O(N + Q log N) over Q versions.

Key Implementation Pitfalls

Reference each version’s root — store an array roots[v]; never mutate an old node.
Memory blow-up — N + Q log N at N = 10^5, Q = 10^5 ≈ 1.7M nodes × ~24 bytes = 40 MB. Pre-allocate node pool.
Garbage collection — in GC’d languages, hold the root references to keep nodes alive; in C++, use a node pool + manual indices.

Classic Problems

“K-th smallest in range” (offline-via-persistent-seg-tree).
“Count distinct in range” (with persistent seg tree of last-occurrence positions).

This is overview-only in this phase; you should know it exists, what it solves, and the rough cost. Implementing it is a Phase 7 exercise.

8. Treap / Implicit Treap (overview)

When To Use

Balanced BST with expected O(log N) operations via randomized priorities — simpler than red-black or AVL.
Implicit treap keys by position in an array, supporting O(log N) array splice / split / merge / range reverse / range sum — classic for “rope” data structures.
Order-statistics tree (find k-th, count less than) when sorted-container library doesn’t expose it.

Complexity

Insert / delete / split / merge / range-op all O(log N) expected.

Key Implementation Pitfalls

Heap property on random priorities — re-bubble after insertion / deletion. Without correct rotations, you lose the log bound.
Lazy reverse / lazy add on implicit treap mirrors lazy segment tree — push tag before recursing.
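Expected vs worst case — priorities are random and independent of the keys, so no input order can force the bad case. The O(log N) bound is in expectation; an O(N)-depth tree is possible but vanishingly unlikely. That is the entire point.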

Classic Problems

“Array splice” with O(log N) per op — implicit treap canonical.
Order-statistics queries on an indexed multiset.

Overview only.

9. Splay Tree (overview, when used)

When To Use

Self-adjusting BST — recently accessed nodes move toward the root.
Useful when the access pattern has temporal locality (LRU-like): O(log N) amortized but O(1) for hot keys.
Real-world use: splay trees have been used for symbol tables and memory-map bookkeeping, where recently used entries are likely to be used again.

Complexity

All ops O(log N) amortized; individual ops up to O(N) worst case. The amortization argument uses a potential function.

Key Implementation Pitfalls

Splay step — zig (one rotation), zig-zig (same-side double), zig-zag (opposite-side double). The choice depends on grandparent direction; getting it wrong destroys amortization.
Splay on every access — including a failed search, where you splay the last node reached. Skipping this breaks the amortized bound.

Classic Problems

Rare in competitive interviews; common in systems-engineering follow-ups about LRU implementations.

Overview only.

10. KMP (Knuth–Morris–Pratt failure function)

When To Use

Single pattern P matched against a text T.
You need either all occurrences of P in T, or just the first, in O(N + M).
Or you need the longest border (longest proper prefix = suffix) of a string — that’s the failure function itself.

Complexity

Failure-function build O(M). Match O(N + M). Space O(M).

Key Implementation Pitfalls

Failure function recursion — fail[i] is the length of the longest proper prefix of P[0..i] that’s also a suffix. The recurrence starts from j = fail[i - 1] and falls back via j = fail[j - 1] until P[j] == P[i] or j == 0.
0 vs 1 indexed — pick one and stick with it. Most resources are 0-indexed; many CP templates are 1-indexed.
Forgetting to reset j to 0 between independent matches — the matcher state is per-text, not global.
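A minimal 0-indexed sketch of the failure function and the matcher; the function names are illustrative. After a full match, the matcher falls back through fail[] instead of resetting to 0, so overlapping occurrences are still found, and j lives inside the call, so each text starts fresh.

```python
from typing import List


def failure_function(p: str) -> List[int]:
    """fail[i] = length of the longest proper prefix of p[0..i] that is also a suffix."""
    fail = [0] * len(p)
    j = 0
    for i in range(1, len(p)):
        while j > 0 and p[i] != p[j]:
            j = fail[j - 1]                    # fall back to the next-shorter border
        if p[i] == p[j]:
            j += 1
        fail[i] = j
    return fail


def kmp_find_all(text: str, p: str) -> List[int]:
    """Start indices of every (possibly overlapping) occurrence of a non-empty p."""
    fail, j, hits = failure_function(p), 0, []   # j is per-call matcher state
    for i, ch in enumerate(text):
        while j > 0 and ch != p[j]:
            j = fail[j - 1]
        if ch == p[j]:
            j += 1
        if j == len(p):
            hits.append(i - len(p) + 1)
            j = fail[j - 1]                    # keep going to catch overlaps
    return hits


print(failure_function("abab"))            # [0, 0, 1, 2]
print(kmp_find_all("abababa", "aba"))      # [0, 2, 4]
```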

Classic Problems

LeetCode 28 — Find the Index of the First Occurrence in a String (canonical strstr)
LeetCode 459 — Repeated Substring Pattern (one-shot via failure function)
LeetCode 214 — Shortest Palindrome (failure function on s + '#' + reverse(s))
LeetCode 1392 — Longest Happy Prefix

Hands-on: see Lab 05.

11. Z Algorithm

When To Use

Compute, for each position i in a string S, z[i]: the length of the longest substring starting at i that is also a prefix of S.
Substring matching — concatenate P + '#' + T and look for z[i] = M in the T half.
Pattern problems where “longest prefix matching at offset” is the natural query.

Complexity

Build O(N) using a sliding [l, r] window of the rightmost reaching match. Match O(N + M).

Key Implementation Pitfalls

Maintaining the [l, r] Z-box — when i ≤ r, copy from z[i - l] capped at r - i + 1, then extend; otherwise start fresh from i. Off-by-one on the cap is the canonical bug.
Sentinel character — must not appear in either P or T. Use \0 or a fresh symbol; in Python, converting both strings to lists of code points and using -1 as the sentinel rules out any collision.
Z and KMP overlap — they solve the same problems with different invariants. Picking one and being fluent is better than knowing both shallowly.
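A minimal sketch of the Z-box algorithm and P + sentinel + T matching. The names are illustrative, z[0] is set to len(S) by convention here, and the matcher uses the code-point-list trick with -1 as the sentinel.

```python
from typing import List, Sequence


def z_function(s: Sequence) -> List[int]:
    """z[i] = length of the longest substring at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    l = r = 0                                  # current Z-box is s[l..r], inclusive
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])    # reuse the mirrored value, capped at the box
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


def z_find_all(text: str, p: str) -> List[int]:
    """Start indices of every (possibly overlapping) occurrence of p in text."""
    m = len(p)
    combined = [ord(c) for c in p] + [-1] + [ord(c) for c in text]   # -1 never collides
    z = z_function(combined)
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] >= m]


print(z_function("aabxaab"))            # [7, 1, 0, 0, 3, 1, 0]
print(z_find_all("abababa", "aba"))     # [0, 2, 4]
```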

Classic Problems

Same as KMP — LC 28, 214, 459, 1392.
“Number of (possibly overlapping) occurrences of P in T” — count positions in the T region with z[i] >= M.

12. Manacher’s Algorithm (longest palindrome in O(N))

When To Use

“Longest palindromic substring” or “count of palindromic substrings”.
Naive expand-around-center is O(N²); Manacher’s is O(N).
The trick is mirroring around the rightmost-reaching palindrome center.

Complexity

O(N). Space O(N).

Key Implementation Pitfalls

Even-length palindromes — Manacher’s classic trick is to insert # between every pair of chars (and at ends): "abba" → "#a#b#b#a#". Now every palindrome (odd or even original) is odd-length in the transformed string.
P[i] vs original-string radius — P[i] after the transform is the radius in the transformed string. The original palindrome has length P[i].
Maintaining the rightmost-reaching center C and right boundary R = C + P[C] — if i < R, start from P[i] = min(R - i, P[2C - i]), else from 0; then try to expand in both cases. Off-by-one on R - i (vs R - i + 1) is the canonical bug.
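A minimal sketch of the '#'-interleaved variant. Function names are illustrative. In the transformed string t, the radius p[i] equals the original palindrome length, and the original start index is (i - p[i]) // 2. Separators only ever get compared with separators, so a '#' inside s is harmless. Counting uses the fact that each center with radius p[i] contributes (p[i] + 1) // 2 original palindromes.

```python
from typing import List


def manacher_radii(s: str) -> List[int]:
    """p[i] = palindrome radius around t[i] (excluding t[i]) in t = '#a#b#...#'."""
    t = "#" + "#".join(s) + "#"                # "abba" -> "#a#b#b#a#"
    n = len(t)
    p = [0] * n
    center = right = 0                         # right = center + p[center]
    for i in range(n):
        if i < right:
            p[i] = min(right - i, p[2 * center - i])   # mirror value, capped at the boundary
        # Expand in every case; the mirror value is only a lower bound.
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        if i + p[i] > right:
            center, right = i, i + p[i]
    return p


def longest_palindrome(s: str) -> str:
    if not s:
        return ""
    p = manacher_radii(s)
    best = max(range(len(p)), key=p.__getitem__)
    start = (best - p[best]) // 2              # map back to an index in s
    return s[start:start + p[best]]


def count_palindromes(s: str) -> int:
    return sum((r + 1) // 2 for r in manacher_radii(s))


print(longest_palindrome("babad"))   # bab
print(count_palindromes("aaa"))      # 6
```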

Classic Problems

LeetCode 5 — Longest Palindromic Substring (canonical)
LeetCode 647 — Palindromic Substrings (count)
LeetCode 1960 — Maximum Product of Two Palindromic Substrings

13. Rolling Hash (Rabin–Karp + double hashing)

When To Use

Compare a sliding-window substring against a pattern in O(1) per shift (after O(M) preprocessing).
Find duplicate substrings of length L in O(N) instead of O(N²) — the canonical “longest duplicate substring” via binary search on L + hashing.
Test two substrings of S for equality in O(1) after precomputing prefix hashes.

Complexity

Preprocess O(N). Per-comparison O(1) hash plus O(M) verify (in adversarial settings; often skipped). For “longest duplicate substring”: O(N log N).

Key Implementation Pitfalls

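Hash collisions — with a single hash mod ~10^9, the birthday paradox gives ~50% collision probability after only ~4 × 10^4 distinct strings and near-certainty (~99%) over 10^5. Always double-hash in interview answers, or single-hash + explicit verify on match.
Modular arithmetic — pick a base at least the alphabet size (anywhere from ~30 up to ~10^9) and a large prime modulus around 10^9. Extracting a substring hash as h[r] - h[l] · base^(r - l) with precomputed powers needs no modular inverse; if you use the negative-power (inverse) form instead, Python 3.8+ computes the inverse with pow(base, -1, mod). In Java/C++, use long (64-bit) so base * value cannot overflow before the reduction.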
Anti-hash adversarial inputs — Codeforces has problems specifically constructed to break common base/mod choices. Use random base from [26, mod-1] per run.
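A minimal double-hashing sketch with a random base per run, plus the LC 1044 binary-search-on-length pattern. DoubleHash and longest_dup_substring are illustrative names, and the two moduli are common prime choices, not requirements. Hash matches are verified with a direct slice comparison, so a collision cannot produce a wrong answer.

```python
import random
from typing import Dict, Optional, Tuple

MODS = (1_000_000_007, 998_244_353)


class DoubleHash:
    """Prefix hashes under two (base, mod) pairs; bases drawn at random per run."""

    def __init__(self, s: str) -> None:
        n = len(s)
        self.bases = tuple(random.randrange(256, m - 1) for m in MODS)
        self.h = [[0] * (n + 1) for _ in MODS]
        self.pw = [[1] * (n + 1) for _ in MODS]
        for k, mod in enumerate(MODS):
            base, h, pw = self.bases[k], self.h[k], self.pw[k]
            for i, ch in enumerate(s):
                h[i + 1] = (h[i] * base + ord(ch)) % mod
                pw[i + 1] = pw[i] * base % mod

    def get(self, l: int, r: int) -> Tuple[int, int]:
        """Hash pair of s[l:r] (half-open): h[r] - h[l] * base^(r - l), no inverse needed."""
        return tuple((self.h[k][r] - self.h[k][l] * self.pw[k][r - l]) % mod
                     for k, mod in enumerate(MODS))


def longest_dup_substring(s: str) -> str:
    """Binary search on length L; a duplicate of length L implies one of length L - 1."""
    hs = DoubleHash(s)

    def find(length: int) -> Optional[int]:
        seen: Dict[Tuple[int, int], int] = {}
        for i in range(len(s) - length + 1):
            key = hs.get(i, i + length)
            j = seen.get(key)
            if j is not None and s[j:j + length] == s[i:i + length]:   # explicit verify
                return i
            seen.setdefault(key, i)
        return None

    lo, hi, best = 1, len(s) - 1, ""
    while lo <= hi:
        mid = (lo + hi) // 2
        i = find(mid)
        if i is None:
            hi = mid - 1
        else:
            best, lo = s[i:i + mid], mid + 1
    return best


print(longest_dup_substring("banana"))   # ana
```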

Classic Problems

LeetCode 187 — Repeated DNA Sequences (canonical small-window hashing)
LeetCode 1044 — Longest Duplicate Substring (binary search on length + rolling hash)
LeetCode 28 — Find the Index of the First Occurrence (Rabin–Karp variant)
LeetCode 1392 — Longest Happy Prefix

Hands-on: see Lab 06.

14. Suffix Array (overview, applications)

When To Use

All suffixes of S sorted lexicographically — enables binary search for any pattern in O(M log N).
“Longest common substring of two strings” via combined suffix array + LCP.
“Number of distinct substrings of S” = N(N+1)/2 − Σ LCP[i].

Complexity

Build O(N log² N) with prefix doubling + comparison sort, O(N log N) with prefix doubling + radix sort, or O(N) with DC3 / SA-IS (much harder). LCP via Kasai’s algorithm O(N). Per-pattern search O(M log N).

Key Implementation Pitfalls

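Doubling-the-rank — sort suffixes by their first 1, 2, 4, … characters, using the previous round’s ranks as (rank[i], rank[i + k]) pairs; stop once all ranks are distinct. Each round is one sort: O(N) with radix sort, O(N log N) with a comparison sort.
LCP array — Kasai’s algorithm: walk suffixes in original-index order, carrying a counter h for the current LCP. h drops by at most 1 from suffix i to suffix i + 1, so the total work is linear. Subtle but O(N).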
Sentinel — append a unique smaller-than-all-others character to avoid prefix-of-another-suffix issues.
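A minimal O(N log² N) sketch (prefix doubling with Python’s comparison sort) plus Kasai, used to check the distinct-substring formula N(N+1)/2 − Σ LCP[i]. The function names are illustrative. Instead of appending a sentinel character, the code treats “past the end” as rank -1, which plays the same smaller-than-everything role.

```python
from typing import List


def suffix_array(s: str) -> List[int]:
    """Prefix doubling with a comparison sort: O(N log^2 N)."""
    n = len(s)
    if n == 0:
        return []
    sa = list(range(n))
    rank = [ord(c) for c in s]
    k = 1
    while True:
        def key(i: int, rank=rank, k=k):
            # Ranks of the first k and next k chars; -1 = past the end (sorts first).
            return (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for idx in range(1, n):
            new_rank[sa[idx]] = new_rank[sa[idx - 1]] + (key(sa[idx]) != key(sa[idx - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:              # all ranks distinct: fully sorted
            return sa
        k *= 2


def lcp_kasai(s: str, sa: List[int]) -> List[int]:
    """lcp[i] = LCP of suffixes sa[i - 1] and sa[i]; lcp[0] = 0. O(N)."""
    n = len(s)
    rank = [0] * n
    for i, p in enumerate(sa):
        rank[p] = i
    lcp, h = [0] * n, 0
    for i in range(n):                         # original-index order
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]                    # previous suffix in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h:
            h -= 1                             # drops by at most 1 for suffix i + 1
    return lcp


def count_distinct_substrings(s: str) -> int:
    n = len(s)
    return n * (n + 1) // 2 - sum(lcp_kasai(s, suffix_array(s)))


print(suffix_array("banana"))                # [5, 3, 1, 0, 4, 2]
print(count_distinct_substrings("banana"))   # 15
```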

Classic Problems

“Longest common substring of two strings” (SA + LCP + range minimum).
“Number of distinct substrings”.
“K-th lexicographically smallest substring”.

Overview only — labs don’t drill suffix arrays directly; rolling hash + Z/KMP cover most interview cases.

15. Suffix Automaton (overview, applications)

When To Use

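Smallest DFA accepting all substrings of S, built in O(N) for a fixed alphabet. Comparable in power to a suffix tree (its suffix-link tree is the suffix tree of the reversed string) and often simpler to code.
“Number of distinct substrings” (one DP over the automaton), “longest common substring” (an O(M) scan of the second string), “count occurrences of a pattern” (O(M) per query once endpos sizes are counted).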

Complexity

Build O(N · σ) where σ is alphabet size. Space O(N · σ).

Key Implementation Pitfalls

link (suffix-link) pointers — the equivalence-class tree of states. Subtle to derive; templates exist.
cnt augmentation — counting occurrences requires DP on the suffix-link tree.
Online construction — add one character at a time, extending from last, the state that represents the whole string read so far.

Classic Problems

“Distinct substrings count”.
“Longest common substring of K strings”.
Codeforces / SPOJ string problems.

Overview only. Rare in interviews; expected at Grandmaster.

16. Trie Variants (compressed, with counts, with deletion)

When To Use

Prefix queries: “does any inserted word start with prefix P?”, “all words with prefix P”.
Multi-word search (LC 212 — Word Search II): compile dictionary into a trie, DFS the grid against the trie for O((R · C) · 4^L) instead of per-word DFS.
Autocomplete with frequency ranking — trie augmented with word counts and best-K-at-each-node.

Complexity

Insert O(L). Search prefix O(L). Space O(Σ L · σ) for a plain array-based trie; O(Σ L) for a hash-based one (one map entry per edge).

Key Implementation Pitfalls

Array-of-26 vs hash-of-char — array-of-26 is faster (no hashing); hash is more memory-efficient on sparse tries. Pick one based on problem constraints.
End-of-word marker — use a separate is_end boolean, not a sentinel char that could collide with input.
Compressed (radix) trie — chains of single-child nodes are merged into a single edge labeled with the substring. Saves memory at the cost of more complex insertion (split an edge mid-substring).
Deletion — typically just clear is_end and prune empty subtrees. Don’t free shared nodes.
Counts at each node — increment on insert, decrement on delete; useful for “count words with prefix P” in O(L).
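A minimal hash-of-char sketch with a separate is_end flag, per-node prefix counts, and pruning deletion. The class and method names are illustrative. To keep counts exact, it assumes words are a set (re-inserting an existing word is a no-op).

```python
from typing import Dict, Optional


class TrieNode:
    __slots__ = ("children", "is_end", "prefix_count")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False            # separate end-of-word flag, no sentinel char
        self.prefix_count = 0          # stored words passing through this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def _walk(self, s: str) -> Optional[TrieNode]:
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_end

    def count_prefix(self, prefix: str) -> int:
        node = self._walk(prefix)
        return node.prefix_count if node else 0

    def insert(self, word: str) -> None:
        if self.search(word):          # set semantics keep prefix counts exact
            return
        node = self.root
        node.prefix_count += 1
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            node.prefix_count += 1
        node.is_end = True

    def delete(self, word: str) -> bool:
        if not self.search(word):
            return False
        node = self.root
        node.prefix_count -= 1
        for ch in word:
            child = node.children[ch]
            child.prefix_count -= 1
            if child.prefix_count == 0:    # no other word uses this subtree: prune it
                del node.children[ch]
                return True
            node = child
        node.is_end = False                # shared path: just clear the marker
        return True


t = Trie()
for w in ("apple", "app", "apt"):
    t.insert(w)
print(t.count_prefix("ap"))   # 3
```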

Classic Problems

LeetCode 208 — Implement Trie (canonical)
LeetCode 211 — Design Add and Search Words (with . wildcard — DFS)
LeetCode 212 — Word Search II (canonical trie-on-grid)
LeetCode 421 — Maximum XOR of Two Numbers in an Array (binary trie)
LeetCode 642 — Design Search Autocomplete System

Hands-on: see Lab 07.

17. Aho–Corasick (multi-pattern matching)

When To Use

Match a set of K patterns simultaneously against a text T — “find all dictionary words that occur in T”.
Naive: O(N · ΣM) — too slow for K = 10^4 patterns.
Aho–Corasick: O(N + ΣM + #matches) — linear in everything.

Complexity

Build O(ΣM · σ). Match O(N + #matches). Space O(ΣM · σ).

Key Implementation Pitfalls

Failure links — analog of KMP’s failure function on a trie. Built via BFS over the trie.
Output (dict-suffix) links — each node’s dict-suffix link points to the nearest node on its failure chain that ends a pattern; following these links at each text position reports every pattern ending there in O(1) per match. Without them, you’d miss patterns that are suffixes of other patterns.
Node pool size — total nodes ≤ ΣM + 1. Pre-allocate.
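A minimal sketch using dict transitions with failure links walked at match time (the KMP-style variant), rather than the full σ-wide goto automaton whose build cost is quoted above. The function name and return shape are illustrative, and patterns are assumed non-empty.

```python
from collections import deque
from typing import Dict, List, Tuple


def aho_corasick(patterns: List[str], text: str) -> List[Tuple[int, int]]:
    """Return (start_index, pattern_index) for every occurrence of every pattern."""
    goto: List[Dict[str, int]] = [{}]          # trie edges; node 0 is the root
    fail: List[int] = [0]                      # longest proper suffix that is a trie node
    out: List[List[int]] = [[]]                # patterns ending exactly at this node
    dict_link: List[int] = [-1]                # nearest failure-chain node ending a pattern
    for idx, p in enumerate(patterns):
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                dict_link.append(-1)
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(idx)

    # BFS, so each failure target (always shallower) is finished before it is used.
    queue = deque(goto[0].values())            # depth-1 nodes keep fail = root
    while queue:
        u = queue.popleft()
        for ch, v in goto[u].items():
            f = fail[u]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[v] = goto[f].get(ch, 0)
            dict_link[v] = fail[v] if out[fail[v]] else dict_link[fail[v]]
            queue.append(v)

    matches: List[Tuple[int, int]] = []
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        hit = node if out[node] else dict_link[node]
        while hit != -1:                       # every pattern ending at position i
            for idx in out[hit]:
                matches.append((i - len(patterns[idx]) + 1, idx))
            hit = dict_link[hit]
    return matches


print(aho_corasick(["he", "she", "his", "hers"], "ushers"))   # [(1, 1), (2, 0), (2, 3)]
```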

Classic Problems

LeetCode 1032 — Stream of Characters (canonical: reverse trie, or Aho–Corasick)
“Find all dictionary words occurring in document”.

18. Bloom Filter (probabilistic membership)

When To Use

“Is X in the set?” with a tolerated false-positive rate, zero false negatives.
Memory tight (you can’t store the full set), or you want a fast pre-filter before a slow exact check (e.g., disk lookup).
Streaming dedup with a fixed false-positive budget.

Complexity

Insert / query O(K) where K is the number of hash functions. Space O(M) bits, where M is chosen so that the false-positive rate after N insertions, approximately (1 - e^(-KN/M))^K, meets the target.

Key Implementation Pitfalls

K (#hash functions) and M (#bits) — given target FPR p and capacity n, the optimum is M = -n ln(p) / (ln 2)², K = (M/n) ln 2.
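No deletion — a standard Bloom filter can’t delete, because clearing a bit could erase another element that shares it. A counting Bloom filter supports deletion by replacing each bit with a small counter (insert increments K counters, delete decrements them).
False positives grow with set size — the FPR rises once you insert past the design capacity n. Even at capacity, a 1% filter is 1% per negative query, so 10^6 negative lookups still yield ~10^4 false positives. Rebuild (or chain a larger filter) when growing.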
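A minimal sketch that sizes M and K from the formulas above. The class name is illustrative. Deriving the K positions as h1 + i · h2 from a single SHA-256 digest is one common choice, not the only one.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    """Standard Bloom filter sized from capacity n and target false-positive rate p."""

    def __init__(self, n: int, p: float) -> None:
        self.m = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))   # bits
        self.k = max(1, round(self.m / n * math.log(2)))                  # hash count
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1     # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all((self.bits[pos >> 3] >> (pos & 7)) & 1 for pos in self._positions(item))


bf = BloomFilter(1000, 0.01)
print(bf.m, bf.k)   # 9586 7
```

At n = 1000 and p = 1%, this works out to roughly 9.6 bits per element and 7 hash functions.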

Classic Problems

System-design follow-ups (“how do you check whether a URL has been crawled in the last 30 days?”).
LeetCode-adjacent: not common as a graded problem, but expected in design interviews.

19. Skip List (overview)

When To Use

Randomized alternative to balanced BST — O(log N) expected, simpler to implement than AVL/RB.
Used in practice in Redis sorted sets, LevelDB MemTable.
Order-statistics, range queries on a sorted set, with concurrent-modification flexibility.

Complexity

Insert / delete / search O(log N) expected. Space O(N) expected (geometric level distribution).

Key Implementation Pitfalls

Level distribution — each new node’s level is geometric with p = 1/2 (or 1/4 in production). Cap at log N.
Update array on insert/delete — track the predecessor at each level; splice carefully.
Concurrent skip list — much simpler than concurrent BST; standard library impls in Java (ConcurrentSkipListMap).

Classic Problems

LeetCode 1206 — Design Skiplist (canonical implementation problem)
System-design discussions of Redis ZSET / LevelDB.

Overview only; the implementation problem (LC 1206) is good practice but rare.

20. LRU / LFU Implementation Deep Dive

When To Use

Cache eviction problems with O(1) get/put requirement.
LRU: hash map + doubly-linked list. Touched node moves to head; evict tail.
LFU: hash map of (key → node) + hash map of (freq → doubly-linked list). On hit, move node to next-freq list; on evict, drop tail of min-freq list.

Complexity

LRU: O(1) get and put. Space O(capacity). LFU: O(1) get and put. Space O(capacity). Maintaining min_freq is the subtle bookkeeping bit.

Key Implementation Pitfalls

LRU: doubly-linked list with sentinel head/tail — eliminates null checks. Always add at head, evict from tail.
LRU: hash map points to nodes, not keys — so you can splice the node in O(1) without searching.
LFU: min_freq invariant — on a hit, if the node’s old frequency equals min_freq and that frequency list is now empty, increment min_freq; on inserting a new key, reset min_freq to 1.
LFU: list-per-frequency — implement as a doubly-linked list of nodes; ordering within a freq is LRU.
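A minimal LRU sketch in the LC 146 shape, assuming capacity ≥ 1. The class and helper names are illustrative. The map stores nodes, not values, so touching an entry is an O(1) unlink plus push-front, and the sentinel head and tail remove every null check.

```python
from typing import Dict, Optional


class _Node:
    __slots__ = ("key", "val", "prev", "next")

    def __init__(self, key: int = 0, val: int = 0) -> None:
        self.key, self.val = key, val
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class LRUCache:
    """Hash map + doubly linked list; most recent next to head, evict at tail."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.map: Dict[int, _Node] = {}        # key -> node, so splicing is O(1)
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node: _Node) -> None:
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node: _Node) -> None:
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: int) -> int:
        node = self.map.get(key)
        if node is None:
            return -1
        self._unlink(node)
        self._push_front(node)                 # touched: move to head
        return node.val

    def put(self, key: int, value: int) -> None:
        node = self.map.get(key)
        if node is not None:
            node.val = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) == self.capacity:     # evict the least recent, at the tail
            lru = self.tail.prev
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)
```

LFU reuses the same node and list helpers, keeping one such list per frequency plus the min_freq bookkeeping described above.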

Classic Problems

LeetCode 146 — LRU Cache (canonical)
LeetCode 460 — LFU Cache
LeetCode 432 — All O(1) Data Structure (frequency buckets)

Both are bread-and-butter for systems-engineering interviews.

21. Disjoint Set Union (DSU) with Path Compression and Union by Rank — Proof of α(N) Amortization

When To Use

Online connectivity (#1 trigger).
Kruskal’s MST.
Equation problems (weighted DSU — LC 399 Evaluate Division).
Offline divide-and-conquer queries with rollback (advanced).

Complexity

Each op amortized inverse-Ackermann α(N) — for all practical N (up to 2^65536), α(N) ≤ 4. Effectively constant.

Proof Sketch (Tarjan)

Without compression or rank: worst-case chain → O(N) per op.
Path compression alone: each find shortens the path. Amortized O(log N) per op.
Union by rank (or size) alone: depth bounded by O(log N). Per-op O(log N) worst case.
Both together: per-op amortized O(α(N)). Tarjan’s potential function counts “blocks” of nodes by rank and shows the total cost over M ops is O(M · α(N)). The proof uses Ackermann’s hierarchy A(k, n) and α(N) is its inverse.
For an interview: state “with both heuristics, amortized O(α(N)), where α grows so slowly it’s ≤ 4 for any N you’ll see in practice; you treat it as O(1)”. Cite Tarjan 1975.

Key Implementation Pitfalls

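Recursive find can exceed Python’s default recursion limit (1000) whenever trees get deep, e.g., unions without rank on adversarial input at N = 10^5. Use an iterative two-pass find: walk up to the root, then walk again compressing.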
Path-halving variant (parent[x] = parent[parent[x]] per step) — simpler, asymptotically equivalent, often faster than full compression in practice.
Union by rank vs union by size — both work. Rank is the height upper bound of the tree (compression doesn’t decrease rank); size is the count of nodes in the tree. Pick one.
Forgetting to update rank when ranks are equal — break the tie and increment the survivor’s rank.
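A minimal sketch of both variants. DSU uses union by rank with an iterative two-pass find. WeightedDSU targets LC 399-style ratios: weight[x] stores value(x) / value(parent[x]), and compression multiplies the weights along the path. For brevity, the weighted version skips union by rank. All class and method names are illustrative.

```python
from typing import Dict, Hashable, List


class DSU:
    """Union by rank + iterative two-pass path compression."""

    def __init__(self, n: int) -> None:
        self.parent: List[int] = list(range(n))
        self.rank = [0] * n
        self.components = n

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:       # pass 1: locate the root
            root = self.parent[root]
        while self.parent[x] != root:          # pass 2: repoint the path at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra                     # lower-rank root goes under higher
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1                  # tie: the survivor's rank grows
        self.components -= 1
        return True


class WeightedDSU:
    """weight[x] = value(x) / value(parent[x]); answers value(a) / value(b) queries."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.weight: Dict[Hashable, float] = {}

    def find(self, x: Hashable) -> Hashable:
        if x not in self.parent:
            self.parent[x], self.weight[x] = x, 1.0
        path = []
        while self.parent[x] != x:
            path.append(x)
            x = self.parent[x]
        root = x
        for node in reversed(path):            # nearest-to-root first
            parent = self.parent[node]
            if parent != root:                 # parent already points at root
                self.weight[node] *= self.weight[parent]
            self.parent[node] = root
        return root

    def union(self, a: Hashable, b: Hashable, ratio: float) -> None:
        """Record value(a) / value(b) = ratio."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
            self.weight[ra] = ratio * self.weight[b] / self.weight[a]

    def ratio(self, a: Hashable, b: Hashable) -> float:
        if a not in self.parent or b not in self.parent:
            return -1.0
        if self.find(a) != self.find(b):
            return -1.0
        return self.weight[a] / self.weight[b]


w = WeightedDSU()
w.union("a", "b", 2.0)
w.union("b", "c", 3.0)
print(w.ratio("a", "c"))   # 6.0
```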

Classic Problems

See pattern 18 in Phase 2 README.
LeetCode 200 — Number of Islands (DSU alternative)
LeetCode 305 — Number of Islands II (online DSU canonical)
LeetCode 547 — Number of Provinces
LeetCode 684 — Redundant Connection
LeetCode 721 — Accounts Merge
LeetCode 952 — Largest Component Size by Common Factor
LeetCode 399 — Evaluate Division (weighted DSU)

Phase 2 covered DSU mechanically. Phase 3’s contribution is the proof of α(N) and the weighted / rollback variants.

22. Bit Manipulation Idioms (popcount, lowbit, isolate trailing one, parity)

When To Use

Bitmask DP (pattern 23) requires fluency in these primitives.
Subset enumeration, parity tricks, fast set operations.
Hot-loop optimization where each int represents a tiny set (≤ 64 elements).

Idioms

Popcount: __builtin_popcount(x) (GCC/Clang C/C++; std::popcount in C++20), Integer.bitCount(x) (Java), bin(x).count('1') (Python — slow), x.bit_count() (Python 3.10+, fast).
Lowbit / lowest set bit: x & -x gives the value of the lowest set bit. x & (x - 1) clears it.
Isolate trailing ones: x & ~(x + 1). Set the lowest zero bit: x | (x + 1).
Iterate subsets of mask: s = mask; while s: ... ; s = (s - 1) & mask enumerates each non-empty subset of mask exactly once, in decreasing order; handle the empty set separately.
Iterate set bits: while x: lb = x & -x; ... ; x ^= lb. Each step does O(1) work, total O(popcount).
Parity: bin(x).count('1') & 1 or — faster, for a 32-bit x — XOR-fold: x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1; return x & 1.
Power of two test: x > 0 and (x & (x - 1)) == 0.
Swap without temp: a ^= b; b ^= a; a ^= b — academic; never use in production.
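The idioms above, collected as small Python helpers for drilling. All the names are illustrative. Python evaluates & before ==, but the explicit parentheses keep the intent obvious when you port the code to C or Java.

```python
from typing import Iterator


def lowbit(x: int) -> int:
    return x & -x                        # value of the lowest set bit


def clear_lowbit(x: int) -> int:
    return x & (x - 1)


def trailing_ones(x: int) -> int:
    return x & ~(x + 1)                  # keep only the run of trailing 1s


def is_power_of_two(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def parity32(x: int) -> int:
    """XOR-fold a 32-bit value down to bit 0."""
    for shift in (16, 8, 4, 2, 1):
        x ^= x >> shift
    return x & 1


def set_bit_indices(x: int) -> Iterator[int]:
    while x:
        lb = x & -x
        yield lb.bit_length() - 1        # index of the lowest set bit
        x ^= lb


def submasks(mask: int) -> Iterator[int]:
    """Non-empty submasks of mask, in decreasing order."""
    s = mask
    while s:
        yield s
        s = (s - 1) & mask


print(list(submasks(0b101)))          # [5, 4, 1]
print(list(set_bit_indices(0b10110))) # [1, 2, 4]
```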

Classic Problems

LeetCode 191 — Number of 1 Bits (popcount).
LeetCode 338 — Counting Bits (DP using dp[x] = dp[x >> 1] + (x & 1)).
LeetCode 461 — Hamming Distance (popcount of XOR).
LeetCode 78 — Subsets via bitmask iteration.

Mastery here is a prerequisite for Pattern 23.

23. Bitmask DP Foundation

When To Use

N ≤ ~20 and the problem asks for an optimum over subsets or assignments of N items.
Examples: traveling salesman (TSP), assignment problem, “shortest path visiting all nodes”, “minimum cost to cover all groups”.
The state is a bitmask of “which items are used / visited / completed”.

Canonical Forms

Permutation DP: dp[mask][i] = min over j in mask\{i}: dp[mask \ {i}][j] + cost(j, i). Result: min over i: dp[full_mask][i] (or back-to-start for TSP cycle).
Subset cover DP: dp[mask] = min over partition of mask into subset s and rest: cost(s) + dp[mask \ s].
Assignment DP: dp[mask] = min cost to assign people 0..popcount(mask)-1 to the jobs in mask.

Complexity

Permutation DP: O(2^N · N²) time, O(2^N · N) space. N=20 → 4 × 10^8 — borderline. Subset DP: O(3^N) time (enumerating subset of subset). N=15 → 14M — comfortable.

Key Implementation Pitfalls

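Subset-of-subset enumeration uses s = mask; while s: ... ; s = (s - 1) & mask. The & mask keeps every s inside mask, and the loop skips the empty subset, so handle s = 0 separately if the transition needs it.
Initial mask — for permutation DP, set dp[1 << i][i] = 0 for every allowed starting item i (only the start city for a fixed-start TSP cycle); iterate masks in increasing order so every mask’s proper subsets are finished first.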
Reconstruction — to recover the order, store predecessor (mask, i) → (prev_mask, prev_i) and walk back.
N too large — N > 20 is not bitmask DP territory. Reach for meet-in-the-middle (#24) or heuristics.
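A minimal sketch of two canonical forms. tsp is the permutation DP, written in “push” form (extend each state forward), which is equivalent to the min-over-predecessors recurrence. min_sessions is a subset-cover DP in the shape of LC 1986 (Minimum Number of Work Sessions); it uses O(3^N) submask enumeration. Both function names are illustrative.

```python
from typing import List

INF = float("inf")


def tsp(dist: List[List[int]]) -> float:
    """Min-cost cycle that starts and ends at city 0: O(2^N * N^2)."""
    n = len(dist)
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0                               # visited {0}, standing at city 0
    for mask in range(1 << n):                 # increasing order: subsets come first
        for i in range(n):
            cur = dp[mask][i]
            if cur == INF or not (mask >> i) & 1:
                continue
            for j in range(n):
                if (mask >> j) & 1:
                    continue                   # j already visited
                nxt = mask | (1 << j)
                if cur + dist[i][j] < dp[nxt][j]:
                    dp[nxt][j] = cur + dist[i][j]
    return min(dp[full][i] + dist[i][0] for i in range(n))


def min_sessions(tasks: List[int], cap: int) -> float:
    """Fewest groups, each with sum <= cap, covering all tasks: O(3^N)."""
    n = len(tasks)
    total = [0] * (1 << n)
    for mask in range(1, 1 << n):
        low = mask & -mask                     # reuse the sum without the lowest item
        total[mask] = total[mask ^ low] + tasks[low.bit_length() - 1]
    dp = [0] + [INF] * ((1 << n) - 1)
    for mask in range(1, 1 << n):
        s = mask
        while s:                               # every non-empty submask of mask
            if total[s] <= cap and dp[mask ^ s] + 1 < dp[mask]:
                dp[mask] = dp[mask ^ s] + 1
            s = (s - 1) & mask
    return dp[-1]


d = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
print(tsp(d))                              # 80
print(min_sessions([3, 1, 3, 1, 1], 8))    # 2
```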

Classic Problems

LeetCode 847 — Shortest Path Visiting All Nodes (canonical bitmask + BFS)
LeetCode 1125 — Smallest Sufficient Team (subset cover DP)
LeetCode 943 — Find the Shortest Superstring (TSP-like, DP over permutations)
LeetCode 698 — Partition to K Equal Sum Subsets (subset assignment)
LeetCode 1494 — Parallel Courses II
LeetCode 1879 — Minimum XOR Sum of Two Arrays (assignment DP)

Hands-on: see Lab 08.

24. Meet-in-the-Middle (split, sort, two-pointer)

When To Use

N ≤ 40 (or up to 50), the problem asks for “subset with property X”, and 2^N is too large but 2^(N/2) is fine.
Examples: “subset sum closest to T at N=40”, “count subsets with XOR equal to K”, “split items into two groups minimizing difference”.

Canonical Template

left, right = a[:len(a) // 2], a[len(a) // 2:]
def subset_sums(xs):                     # all 2^len(xs) subset sums, built by doubling
    sums = [0]
    for x in xs:
        sums += [s + x for s in sums]
    return sums
sums_left, sums_right = subset_sums(left), sorted(subset_sums(right))   # 2^(N/2) each
# for each L in sums_left, bisect sums_right for the R closest to T - L.

Complexity

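Time: enumerating each half’s subset sums is O(2^(N/2)) with the doubling build (O(N · 2^(N/2)) if you sum each mask separately); sorting is O(2^(N/2) · log 2^(N/2)) = O(N · 2^(N/2)); the merge is O(2^(N/2)) with two pointers or O(N · 2^(N/2)) with binary search. Overall O(N · 2^(N/2)): at N=40 that is about 2^20 × 20 ≈ 2 × 10^7 operations. Space O(2^(N/2)) for the two subset-sum lists.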

Key Implementation Pitfalls

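Enumerate subsets correctly — either loop for mask in range(1 << k) and sum the elements whose bits are set in mask (iterating set bits), or recurse include/exclude; the doubling build (each element doubles the list of sums) is shortest.
Two-pointer or binary search — once both halves are sorted, sweep with two pointers (one ascending through the left list, one descending through the right) to minimize the gap or count targets.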
Memory — at N=40, half-mask space is 2^20 = 1M entries × 8 bytes = 8 MB. Comfortable, but watch out at N=44.
Counting (not just existence) — careful binary-search for lo, hi bounds; use bisect_left and bisect_right.
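A minimal end-to-end sketch for LC 1755 (Closest Subsequence Sum). The function names are illustrative. Only the right half is sorted. For each left sum, bisect_left finds the first right sum ≥ goal − L, and the element just before it is the other candidate.

```python
from bisect import bisect_left
from typing import List


def subset_sums(xs: List[int]) -> List[int]:
    """All 2^len(xs) subset sums (with multiplicity), built by doubling."""
    sums = [0]
    for x in xs:
        sums += [s + x for s in sums]
    return sums


def min_abs_difference(nums: List[int], goal: int) -> int:
    """Min |subset_sum - goal| over all subsets, N up to ~40."""
    half = len(nums) // 2
    left = subset_sums(nums[:half])
    right = sorted(subset_sums(nums[half:]))
    best = abs(goal)                               # the empty subset
    for s in left:
        k = bisect_left(right, goal - s)           # first right sum >= goal - s
        if k < len(right):
            best = min(best, abs(s + right[k] - goal))
        if k > 0:
            best = min(best, abs(s + right[k - 1] - goal))
        if best == 0:
            break
    return best


print(min_abs_difference([7, -9, 15, -2], -5))   # 1
```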

Classic Problems

LeetCode 1755 — Closest Subsequence Sum (canonical N=40)
LeetCode 956 — Tallest Billboard (subset DP alternative; meet-in-the-middle viable)
LeetCode 805 — Split Array With Same Average (meet-in-the-middle)
“Subset sum at N=40” — competitive-programming staple.

Hands-on: see Lab 09.

Recognition Cheat Sheet

Problem Signal	Structure
Range query + point update, sum/min/max	Segment tree (#1) or Fenwick (#3, if invertible)
Range query + range update	Lazy segment tree (#2)
Static range min/max with O(1) queries	Sparse table (#5)
Range distinct count, hard-to-segment-tree aggregate	Sqrt decomposition / Mo’s (#6)
Single pattern in text	KMP (#10) or Z (#11)
Longest palindrome / count palindromes	Manacher (#12)
Many-substring equality / longest duplicate	Rolling hash (#13)
Multi-pattern dictionary in text	Aho–Corasick (#17)
Prefix queries, autocomplete, word-on-grid	Trie variants (#16)
Probabilistic membership	Bloom filter (#18)
Cache with O(1) get/put	LRU / LFU (#20)
Connectivity / equation graphs	DSU (#21)
N ≤ 20, subset / assignment optimum	Bitmask DP (#23)
N ≤ 40, subset existence / closest sum	Meet-in-the-middle (#24)
Bit-level state mechanics	Bit idioms (#22)

Mastery Checklist

You have completed Phase 3 when you can, on demand and from memory:

Implement a segment tree (point update, range sum/min/max) in <12 minutes, with no off-by-ones, on the first attempt.
Add lazy propagation for range-add + range-sum in <20 minutes, articulating the push-down invariant.
Implement a Fenwick tree (1-indexed, prefix-sum + point update) in <8 minutes.
State why Fenwick can’t do range-min naturally and which segment-tree augmentation handles it.
Build a sparse table for static RMQ in <10 minutes, including the log_floor[] precompute.
Choose between segment tree, Fenwick, sparse table, and sqrt decomposition based on the workload (read-only vs mixed; aggregate type) in <30 seconds.
Compute KMP’s failure function on a string of length 20 by hand, no errors.
Implement KMP match in <12 minutes.
Implement Manacher’s longest palindrome in <20 minutes (this one is hard; that’s expected).
Implement double-hashing rolling hash in <15 minutes; explain why single hash is insufficient.
Implement a trie (insert, search, startsWith) in <8 minutes.
Implement Aho–Corasick at the conceptual level (failure + dict-suffix links) and state its complexity.
State the Bloom filter sizing formulas: for target false-positive rate p and capacity n, M = −n · ln(p) / (ln 2)² bits and K = (M / n) · ln 2 hash functions.
Implement LRU cache (146) in <10 minutes; LFU (460) in <25 minutes.
Implement DSU with path compression + union by rank in <8 minutes; state the α(N) bound and cite Tarjan.
Use x & -x, x & (x - 1), subset-of-mask enumeration without thinking.
Recognize bitmask-DP from N ≤ 20 and write the transition in <10 minutes for an unfamiliar problem.
Recognize meet-in-the-middle from N = 40 and write both halves + merge in <20 minutes.
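A minimal Python sketch of the bit idioms named above, for drilling. The function names (lowbit, clear_lowest, submasks, popcount) are hypothetical helpers, not part of any library. Enumerating the submasks of every mask over n bits this way costs O(3ⁿ) in total, which is the usual budget for subset-over-subset DP.

```python
def lowbit(x: int) -> int:
    """Lowest set bit of x (x > 0); e.g. 12 = 0b1100 -> 4."""
    return x & -x


def clear_lowest(x: int) -> int:
    """x with its lowest set bit cleared; e.g. 12 -> 8."""
    return x & (x - 1)


def submasks(mask: int):
    """Yield every submask of mask, from mask itself down to 0."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            break
        sub = (sub - 1) & mask  # next smaller submask of mask


def popcount(x: int) -> int:
    """Number of set bits, clearing one per iteration."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count
```

For example, list(submasks(0b101)) yields [5, 4, 1, 0]: the mask itself first and 0 last.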

If any of these takes >2× the budget, drill it again — that structure is your weakest link. Hards rarely fail because all your structures are weak; they fail because one of them is, and that’s the one this Hard happened to need.

Exit Criteria

You may proceed to Phase 4 — Graph Mastery only when:

All 9 labs are complete, with the deliverable code written, tested, and reviewed via the REVIEW_TEMPLATE.
Mastery checklist is fully ticked.
30+ Hard problems solved across the structures above (10 segment-tree-class, 5 string-algo, 5 trie/AC, 5 DSU/bitmask, 5 free choice).
Mock interview at Phase 3 level: you receive a Hard segment-tree problem, a Hard string problem, and a Medium-Hard bitmask DP problem in a 90-minute window. Solve at least 2 of the 3 cleanly.
No structure is “the one I always get wrong” — drill it until it isn’t.

If any of these fails, do not proceed. Phase 4 builds on the assumption that DSU, segment trees, and bitmask DP are reflexes. If they are not, Phase 4’s harder graph problems will compound the gap.

Labs

#	Lab	Structure	Canonical Problem
01	Segment tree (range query)	Point update + range sum/min/max	LC 307
02	Segment tree with lazy propagation	Range update + range query	Range-add + range-sum
03	Fenwick tree (BIT)	Coord-compressed Fenwick	LC 315
04	Sparse table for RMQ	Static O(1) RMQ	Range-min array
05	KMP string matching	Failure function + match	LC 28 / 459
06	Rolling hash	Double hashing	LC 187 / 1044
07	Trie applications	Trie with `is_end` + DFS-on-trie	LC 208 / 212
08	Bitmask DP	Permutation DP over subsets	LC 847
09	Meet-in-the-middle	Split-sort-merge	LC 1755

Common Failures At This Phase

These are the failure modes that consume the most candidate time at Phase-3 level. Tag them when they occur using FAILURE_ANALYSIS.md.

Segment tree off-by-ones — closed-vs-open intervals mixed mid-recursion. Fix: always closed [l, r], never mix with [l, r).
Fenwick tree 0-index trap — update(0, …) infinite-loops. Fix: shift to 1-index at the boundary.
KMP failure function off-by-one — j = fail[j - 1] vs j = fail[j]. Fix: derive on a 5-char example.
Rolling hash single-mod collisions — pass random unit tests, fail adversarial. Fix: double-hash always.
Bitmask DP transition direction — pull-style (dp[mask] computed from dp[mask & ~bit]) vs push-style (dp[mask | bit] updated from dp[mask]). Both work; mixing them mid-implementation breaks. Fix: pick one before coding.
DSU recursive find stack overflow at N=10^5 in Python. Fix: iterative two-pass.
Lazy segment tree forgetting to push before recursing into a child. Fix: write push_down(node) as the first line of any non-leaf recursion.
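The “iterative two-pass” DSU fix can look like the sketch below, assuming a plain 0-indexed DSU; the class and method names are illustrative. The first pass climbs to the root with a loop, the second re-points every node on the path at the root, so path compression happens without any Python recursion.

```python
class DSU:
    """Disjoint-set union with union by rank and iterative path compression."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Pass 1: climb to the root with a loop (no recursion, no stack limit).
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Pass 2: re-point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same set
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the lower-rank root under the higher-rank one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Union by rank alone keeps trees shallow; the iterative find matters most when a long parent chain exists before compression runs (for example, unions done without rank).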

Cross-References

FRAMEWORK.md — apply on every Hard.
CODE_QUALITY.md — Hards do not get graded leniency; clean code still required.
COMMUNICATION.md — out loud at the recognition step, the structure name and complexity must be explicit. “I’ll use a segment tree with lazy propagation; build O(N), query and update O(log N), space O(4N).”
SPACED_REPETITION.md — segment tree and KMP should be on a 7-day cycle for the first month after this phase. Bitmask and meet-in-the-middle on 14-day.
Phase 4 — Graphs — DSU shows up immediately; review #21 the day before starting Phase 4.
Phase 5 — DP — bitmask DP is the bridge. Without #23 fluency, Phase 5’s “DP on graphs / DAG / interval” labs will hurt.
Phase 7 — Competitive — persistent seg tree, suffix array/automaton, splay/treap deepen here.

Lab 01 — Segment Tree (Point Update + Range Query)

Goal

Implement a segment tree from scratch that supports point updates and range-sum / range-min / range-max queries on an array of N integers. Build in O(N), query and update in O(log N) each. Internalize the recursion structure so you can re-derive any aggregate variant on the fly. After this lab you should be able to write a working segment tree from blank slate in under 12 minutes with zero off-by-ones.

Background Concepts

A segment tree represents an array as a near-balanced binary tree where each internal node stores the aggregate (sum / min / max / gcd / …) of a contiguous range. Leaves correspond to individual array elements; internal nodes correspond to the union of their children’s ranges. The tree has depth O(log N), so any range [l, r] decomposes into at most 2 · log₂(N) disjoint subtree-ranges. That is the entire complexity argument: each query and update walks O(log N) nodes.

The tree is conventionally stored in a flat array of size 4N (worst-case nearly-balanced binary tree on N leaves) with the root at index 1, left child at 2 · i, right child at 2 · i + 1. This avoids pointer overhead and is cache-friendly.

The aggregate must be associative so that subtree results can be combined. Sum, min, max, gcd, xor, “and”, “or”, and matrix multiplication all qualify. Median, mode, and “k-th smallest” do not combine cleanly and need different structures.

Interview Context

Range queries with updates appear in 3–5% of FAANG-tier Hard pools, but they appear more often on the bar-raiser round. Recognizing that prefix sums (O(1) query, O(N) update) won’t survive the workload — that you need O(log N) for both — is the reflex this lab builds. Companies that screen with segment trees: Meta (frequent), Google (occasional), Amazon (rare), Stripe / HFT shops (very frequent — order book, sliding aggregates). Bombing this is a no-hire signal at L5+.

Problem Statement

Implement a class NumArray initialized with an integer array nums. Support:

update(i, val): set nums[i] = val.
sumRange(left, right): return the sum of nums[left..right] inclusive.

Both must be O(log N). After implementing the sum variant, refactor so that swapping combine = + for combine = min or combine = max requires changing only the combine function and its identity element.

Constraints

1 ≤ N ≤ 3 × 10⁴
−100 ≤ nums[i] ≤ 100
0 ≤ i, left, right < N
Up to 3 × 10⁴ calls to update and sumRange combined.

Clarifying Questions

Are queries inclusive on both ends? (Yes — [left, right].)
Is nums mutable in place, or owned by NumArray? (Owned; copy on construction.)
Are the values guaranteed to fit in int32? (Yes; sum across N elements at value ±100 fits comfortably.)
Is update an assignment or delta? (Assignment — set, not add.)

Examples

NumArray([1, 3, 5])
sumRange(0, 2) → 9
update(1, 2)        // array becomes [1, 2, 5]
sumRange(0, 2) → 8
update(0, 10)       // array becomes [10, 2, 5]
sumRange(1, 2) → 7

Initial Brute Force

Store nums as a plain list. update(i, v): nums[i] = v (O(1)). sumRange(l, r): sum(nums[l:r+1]) (O(N)). Updates are fast; queries are linear.

Brute Force Complexity

Update O(1). Query O(N). Total over Q queries + U updates: O(N · Q + U). At N = Q = 3 × 10⁴: 9 × 10⁸ ops — TLE.

Optimization Path

Two natural alternatives.

Prefix sums. prefix[i] = nums[0] + ... + nums[i-1]. Query is prefix[r+1] - prefix[l] in O(1). But update(i, v) requires recomputing prefix[i+1..N] in O(N). Wrong tradeoff for this workload.

Sqrt decomposition. Block size √N; per-block sums. Update O(1), query O(√N). Total O(N + Q · √N), which at N = Q = 3 × 10⁴ is ~5 × 10⁶ ops. Passes but is suboptimal and crusty.
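For comparison, a sqrt-decomposition version of NumArray might look like this sketch (the class name SqrtSum is hypothetical; a block size of ⌊√N⌋ is assumed). An update patches one block total; a query sums a partial head, whole blocks, and a partial tail.

```python
import math


class SqrtSum:
    """Point assignment + range sum via sqrt decomposition (comparison sketch)."""

    def __init__(self, nums):
        self.a = list(nums)
        self.b = max(1, math.isqrt(len(self.a)))  # block size ~ sqrt(N)
        self.blocks = [0] * ((len(self.a) + self.b - 1) // self.b)
        for i, v in enumerate(self.a):
            self.blocks[i // self.b] += v

    def update(self, i, val):
        self.blocks[i // self.b] += val - self.a[i]  # O(1): patch one block total
        self.a[i] = val

    def sumRange(self, l, r):
        total = 0
        while l <= r and l % self.b != 0:   # partial head block
            total += self.a[l]
            l += 1
        while l + self.b - 1 <= r:          # whole blocks, O(sqrt N) of them
            total += self.blocks[l // self.b]
            l += self.b
        while l <= r:                       # partial tail block
            total += self.a[l]
            l += 1
        return total
```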

Segment tree. Build O(N), update O(log N), query O(log N). Total O(N + Q log N) ≈ 3 × 10⁴ + 3 × 10⁴ · 15 ≈ 5 × 10⁵ ops. Clean fit.

Final Expected Approach

Recursive segment tree on a flat array of size 4N.

Build build(node, nl, nr): if nl == nr, leaf = arr[nl]; else recurse left and right, set tree[node] = tree[2·node] + tree[2·node + 1].
Update update(node, nl, nr, idx, val): recurse into the child whose range contains idx; on return, recompute parent.
Query query(node, nl, nr, ql, qr): total miss → identity (0 for sum, +∞ for min); total cover → return tree[node]; partial → recurse both children and combine.

Public API wraps with node = 1, nl = 0, nr = N - 1.

Data Structures Used

A single integer array tree[] of size 4N (sum aggregates).
Optional integer n storing the original length.

Correctness Argument

Build establishes the invariant tree[node] = combine over [nl, nr] by induction on subtree size. Update: along the recursion, only nodes whose range contains idx are touched; each is recomputed from its (now-correct) children, so the invariant is preserved. Query decomposes [ql, qr] into a disjoint union of subtree ranges; the result is the combine of those. At each depth, at most two visited nodes partially overlap [ql, qr] (the ones straddling ql and qr), and only those recurse, so at most four nodes are visited per level. With O(log N) levels, the total work is O(log N), and the decomposition has at most about 2 log N pieces.
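The O(log N) query bound rests on a per-level fact: at each depth at most two visited nodes partially overlap [ql, qr], so at most four nodes are visited per level. A small sketch can check this empirically by counting visits over every range of a size-N tree. count_visits is a hypothetical helper that mirrors the miss / cover / partial branches of _query without storing values; the bound checked is 4 · (depth + 1), where depth = ⌈log₂ N⌉ is the height of the midpoint-split tree.

```python
import math


def count_visits(n: int, ql: int, qr: int) -> int:
    """Number of nodes a recursive range query on [ql, qr] visits (size-n tree)."""
    visits = 0

    def go(nl: int, nr: int) -> None:
        nonlocal visits
        visits += 1
        if qr < nl or ql > nr:        # miss
            return
        if ql <= nl and nr <= qr:     # full cover
            return
        mid = (nl + nr) // 2          # partial overlap: visit both children
        go(nl, mid)
        go(mid + 1, nr)

    go(0, n - 1)
    return visits


n = 100
depth = math.ceil(math.log2(n))  # height of the midpoint-split tree
worst = max(count_visits(n, l, r) for l in range(n) for r in range(l, n))
print(f"worst visits = {worst}, bound = {4 * (depth + 1)}")
```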

Complexity

Operation	Time	Space
Build	O(N)	O(4N)
Update	O(log N)	O(log N) recursion
Query	O(log N)	O(log N) recursion

Implementation Requirements

class NumArray:
    def __init__(self, nums):
        self.n = len(nums)
        self.tree = [0] * (4 * self.n)
        self._build(1, 0, self.n - 1, nums)

    def _build(self, node, nl, nr, a):
        if nl == nr:
            self.tree[node] = a[nl]; return
        mid = (nl + nr) // 2
        self._build(2*node, nl, mid, a)
        self._build(2*node + 1, mid + 1, nr, a)
        self.tree[node] = self.tree[2*node] + self.tree[2*node + 1]

    def update(self, i, val):
        self._update(1, 0, self.n - 1, i, val)

    def _update(self, node, nl, nr, idx, val):
        if nl == nr:
            self.tree[node] = val; return
        mid = (nl + nr) // 2
        if idx <= mid: self._update(2*node, nl, mid, idx, val)
        else: self._update(2*node + 1, mid + 1, nr, idx, val)
        self.tree[node] = self.tree[2*node] + self.tree[2*node + 1]

    def sumRange(self, l, r):
        return self._query(1, 0, self.n - 1, l, r)

    def _query(self, node, nl, nr, ql, qr):
        if qr < nl or ql > nr: return 0           # miss
        if ql <= nl and nr <= qr: return self.tree[node]   # cover
        mid = (nl + nr) // 2
        return self._query(2*node, nl, mid, ql, qr) + self._query(2*node + 1, mid + 1, nr, ql, qr)

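To switch to min, change the identity to +∞ and the combine from + to min, in build, update, and query alike; the leaf assignment (a[nl]) is unchanged. Factoring the combine into one helper makes that a single-point change.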
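One way to make the sum → min → max swap a single-point change is to pass the combine and its identity in, as in this sketch. SegTree, its query method, and the constructor parameters are illustrative names, not the lab’s required API; math.inf stands in for +∞.

```python
import math
import operator
from typing import Callable, List


class SegTree:
    """Recursive segment tree; the aggregate is fixed by (combine, identity)."""

    def __init__(self, nums: List[int], combine: Callable, identity):
        self.n = len(nums)
        self.combine = combine
        self.identity = identity
        self.tree = [identity] * (4 * self.n)
        self._build(1, 0, self.n - 1, nums)

    def _build(self, node, nl, nr, a):
        if nl == nr:
            self.tree[node] = a[nl]  # leaf assignment is the same for every aggregate
            return
        mid = (nl + nr) // 2
        self._build(2 * node, nl, mid, a)
        self._build(2 * node + 1, mid + 1, nr, a)
        self.tree[node] = self.combine(self.tree[2 * node], self.tree[2 * node + 1])

    def update(self, i, val):
        self._update(1, 0, self.n - 1, i, val)

    def _update(self, node, nl, nr, idx, val):
        if nl == nr:
            self.tree[node] = val
            return
        mid = (nl + nr) // 2
        if idx <= mid:
            self._update(2 * node, nl, mid, idx, val)
        else:
            self._update(2 * node + 1, mid + 1, nr, idx, val)
        self.tree[node] = self.combine(self.tree[2 * node], self.tree[2 * node + 1])

    def query(self, l, r):
        return self._query(1, 0, self.n - 1, l, r)

    def _query(self, node, nl, nr, ql, qr):
        if qr < nl or ql > nr:
            return self.identity      # miss: neutral element
        if ql <= nl and nr <= qr:
            return self.tree[node]    # full cover
        mid = (nl + nr) // 2
        return self.combine(self._query(2 * node, nl, mid, ql, qr),
                            self._query(2 * node + 1, mid + 1, nr, ql, qr))


# Swapping aggregates touches only the constructor arguments:
sum_tree = SegTree([1, 3, 5], operator.add, 0)
min_tree = SegTree([3, 1, 4, 1, 5, 9, 2, 6], min, math.inf)
max_tree = SegTree([3, 1, 4, 1, 5, 9, 2, 6], max, -math.inf)
```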

Tests

N=1: update(0, 5); sumRange(0, 0) == 5.
All zeros: every range query returns 0.
All same value: sumRange(l, r) == val * (r - l + 1).
After update(i, v): sumRange(i, i) == v; sum across full range matches direct sum of array.
Random fuzz: 1000 ops alternating updates and queries against a brute-force list.
Min variant: build [3, 1, 4, 1, 5, 9, 2, 6], query(2, 5) == 1; after update(3, 10), query(2, 5) == 4.

Follow-up Questions

“Now I want range updates.” → Lab 02 (lazy propagation).
“Now I want O(1) queries on a static array.” → sparse table (Lab 04).
“Now the array is 2D.” → segment tree of segment trees, O(log² N) per op.
“Make it iterative.” → power-of-two-padded leaves at indices [N, 2N); update walks i // 2 upward, query walks l, r toward the middle.
“How would you support count of values ≥ K in [l, r]?” → merge sort tree (segment tree with sorted lists) or wavelet tree.
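A sketch of the iterative (bottom-up) variant for the sum aggregate, with leaves padded to a power of two and stored at indices [size, 2·size). The class name IterSumTree is hypothetical. The public API takes a closed range [l, r] and converts it to half-open internally, in exactly one place, which is where iterative off-by-ones usually hide.

```python
class IterSumTree:
    """Bottom-up segment tree for range sum; leaves live at [size, 2 * size)."""

    def __init__(self, nums):
        self.n = len(nums)
        self.size = 1 << (self.n - 1).bit_length()  # round N up to a power of two
        self.tree = [0] * (2 * self.size)
        self.tree[self.size:self.size + self.n] = nums
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, val):
        i += self.size
        self.tree[i] = val
        i //= 2
        while i >= 1:              # walk i // 2 upward, recomputing parents
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sumRange(self, l, r):
        # Closed [l, r] outside, half-open [l, r) inside: the only conversion point.
        l += self.size
        r += self.size + 1
        total = 0
        while l < r:
            if l & 1:              # l is a right child: take it, then step right
                total += self.tree[l]
                l += 1
            if r & 1:              # r is a right child: step left, then take it
                r -= 1
                total += self.tree[r]
            l //= 2
            r //= 2
        return total
```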

Product Extension

Real-time analytics dashboards: a stream of N metrics with both edits and arbitrary-range aggregates (e.g., “total revenue from days 17–24” while a correction is being applied to day 19). The naive list is fine until the workload has both fast updates and fast arbitrary-range queries on the same data — then a segment tree over the time axis, keyed by index, is what powers the underlying store.

Language/Runtime Follow-ups

Python: recursion depth is only about log₂ N ≈ 15, so the recursion limit is never the problem; the per-call constant is. For larger N or tight time limits, convert to the iterative version. array.array('i', ...) saves memory over a plain list but is usually no faster for element access in CPython.
Java: use int[] tree = new int[4 * n]; the 4n allocation is critical because computing next_power_of_two(n) * 2 is a fencepost-error magnet. Method dispatch has a real cost — inline the recursion if hot.
Go: no recursion-limit issue; keep the slice. Use int (sized to platform) unless the problem demands int64.
C++: the canonical implementation. vector<long long> tree(4 * n). Inline the body; mark methods inline. For competitive problems use the iterative version (template by Adrian Panaete or Codeforces “EDU”).
JS/TS: typed arrays — new Int32Array(4 * n) — outperform plain arrays. Recursion depth at N=3×10⁴ is fine in V8.

Common Bugs

Mixing closed-interval [ql, qr] with half-open [ql, qr) between query and recursion. Always pick closed and stay consistent.
Sizing tree as 2N instead of 4N: works for power-of-two N, but indexes out of bounds otherwise (undefined behavior in C++, IndexError in Python).
Forgetting to recompute tree[node] after recursing in update. The leaf updates correctly but the parent stays stale.
Identity wrong for the aggregate: 0 for sum, but +∞ (float('inf') / Long.MAX_VALUE) for min and −∞ for max. Returning 0 from a min query missing-range gives wrong answers silently.
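Splitting children inconsistently, e.g., [nl, mid] and [mid + 1, nr] in build but [nl, mid - 1] and [mid, nr] in query. Pick one split convention and use it in build, update, and query.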
Iterative version: forgetting to round N up to a power of two before placing leaves.

Debugging Strategy

When tests fail, drop into a tiny instance (N=4, indices 0..3) and print(tree) after each op. Verify by hand: tree[1] should equal the sum over [0,3], tree[2] over [0,1], tree[3] over [2,3]. If those don’t hold post-build, your build recursion is broken; fix that before touching update or query. Then add a trace (or assert) in the cover, miss, and partial branches that prints (node, nl, nr, ql, qr) to spot which sub-call returns the wrong total.

Mastery Criteria

Recognized the segment-tree signal in <60 seconds from a “range query + point update” problem statement.
Wrote build/update/query on a blank screen in <12 minutes with no off-by-ones, first try.
Refactored sum → min → max in <2 minutes by changing identity + combine only.
Stated complexity (build O(N), update/query O(log N), space O(4N)) without prompting.
Solved LC 307 in <15 minutes from cold start.
Solved one cousin problem (LC 308 or LC 1157) in <30 minutes from cold start.

Lab 02 — Segment Tree With Lazy Propagation

Goal

Extend Lab 01’s segment tree to support range updates in O(log N) using lazy propagation. Implement range-add + range-sum and articulate the push-down invariant so cleanly that you can re-derive lazy on a different aggregate (range-set + range-sum, range-flip + range-count) under interview pressure.

Background Concepts

Lazy propagation defers work. When an update covers a whole subtree, instead of recursing into all O(2^depth) descendants, you stamp a single lazy tag on that subtree’s root and update its aggregate in O(1). The descendants stay stale until something forces a deeper visit; at that point you push_down the tag — apply it to the children’s aggregates and merge into their lazy slots — and clear the parent’s tag.

This works whenever:

The update operation has an O(1) batch form: applying “add v to all of [nl, nr]” to tree[node] is tree[node] += v * (nr - nl + 1).
The lazy tags compose: a pending “add 3” followed by a new “add 5” composes to “add 8”. Without composition, you cannot stack tags; you must push first.
There is an identity lazy value (e.g., 0 for add) meaning “nothing pending”.

For mixed update types (“add” and “set” both), composition needs an explicit rule: a new “set” wipes any pending “add”; a new “add” composes with a pending “set” by changing the set value.

Interview Context

Asked at: companies with high-frequency-trading or analytics flavor (Stripe, Two Sigma, Jane Street), and Meta in bar-raiser slots. Most interview problems that need this dress up as “support add v to a range and report sum of a range” or as a count-of-overlapping-intervals problem like LC 732. Failing to know this structure caps you at Mediums; recognizing it and implementing it correctly is a green-light at L5+.

Problem Statement

Implement a class RangeArray over n integers (initially zero) supporting:

add(l, r, v): add v to every index in [l, r].
sumRange(l, r): return the sum of arr[l..r].

Both O(log n).

Constraints

1 ≤ n ≤ 10⁵
1 ≤ Q ≤ 10⁵ ops total.
−10⁴ ≤ v ≤ 10⁴.
Sums fit in 64-bit (max |sum| ≈ 10⁵ · 10⁵ · 10⁴ = 10¹⁴).

Clarifying Questions

Endpoints inclusive? (Yes.)
Is add cumulative or assignment? (Cumulative — additive.)
Should arr be mutable in place at the leaves? (Conceptually yes; in practice the segment tree owns it.)
0-indexed or 1-indexed externally? (0-indexed.)

Examples

RangeArray(5)
add(0, 2, 3)         // [3, 3, 3, 0, 0]
sumRange(0, 4) → 9
add(1, 3, 2)         // [3, 5, 5, 2, 0]
sumRange(2, 4) → 7
sumRange(0, 0) → 3

Initial Brute Force

Plain list. add(l, r, v) → for i in range(l, r+1): arr[i] += v (O(N)). sumRange → sum(arr[l:r+1]) (O(N)). Combined per-op O(N).

Brute Force Complexity

O(N) per op. Total O(N · Q) = 10¹⁰ at the limits. TLE by 4 orders of magnitude.

Optimization Path

Difference array? diff[l] += v; diff[r+1] -= v is O(1) per add, but you can only query the final prefix sum after all updates — not interleaved with sum queries. Doesn’t survive mixed workload.

Two Fenwick trees (BIT-RU + BIT-PQ)? Yes, this works for range-add + range-sum specifically — the BIT² trick. Slightly faster constants than segment tree, but only handles invertible aggregates. Segment tree generalizes to range-set, range-min-after-add, range-affine, etc.
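A sketch of the two-Fenwick range-add + range-sum trick, assuming 0-indexed external calls as in RangeArray. The class name RangeAddRangeSum is hypothetical. It relies on the identity prefix(i) = (Σ B1[1..i]) · i − Σ B2[1..i], where adding v on [l, r] (1-indexed) puts +v at l and −v at r + 1 in B1, and +v·(l − 1) at l and −v·r at r + 1 in B2.

```python
class RangeAddRangeSum:
    """Range add + range sum with two Fenwick trees (1-indexed internally)."""

    def __init__(self, n):
        self.n = n
        self.b1 = [0] * (n + 1)
        self.b2 = [0] * (n + 1)

    def _update(self, tree, i, delta):
        while i <= self.n:          # index n + 1 is simply skipped: never queried
            tree[i] += delta
            i += i & -i

    def _prefix_tree(self, tree, i):
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    def _prefix_sum(self, i):
        # Sum of arr[1..i], 1-indexed.
        return self._prefix_tree(self.b1, i) * i - self._prefix_tree(self.b2, i)

    def add(self, l, r, v):
        l, r = l + 1, r + 1         # convert to 1-indexed
        self._update(self.b1, l, v)
        self._update(self.b1, r + 1, -v)
        self._update(self.b2, l, v * (l - 1))
        self._update(self.b2, r + 1, -v * r)

    def sumRange(self, l, r):
        return self._prefix_sum(r + 1) - self._prefix_sum(l)
```

This only works because the aggregate (sum) is invertible; range-set or range-min-after-add still need the lazy segment tree.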

Lazy segment tree is the canonical answer.

Final Expected Approach

Augment Lab 01’s tree with a parallel lazy[] array of size 4N, all initialized to 0 (the identity for add).

push_down(node, nl, nr): if lazy[node] != 0, apply it to both children’s aggregates (tree[child] += lazy[node] * child_len) and compose it into lazy[child] += lazy[node]. Then clear lazy[node] = 0. Called at the start of any non-leaf update or query that recurses into children.
update(node, nl, nr, ql, qr, v):
- If qr < nl or ql > nr: return (miss).
- If ql <= nl and nr <= qr: stamp tree[node] += v * (nr - nl + 1); lazy[node] += v; return.
- Else: push_down; recurse both children; tree[node] = tree[left] + tree[right].
query(node, nl, nr, ql, qr): identical structure, with push_down before recursing.

Data Structures Used

tree[] — sum aggregates, size 4N, int64.
lazy[] — pending add tags, size 4N, int64.

Correctness Argument

Invariant: for every node, tree[node] equals the correct aggregate over [nl, nr] except for updates still held in the lazy tags of its strict ancestors. In particular, tree[node] already includes lazy[node]; each child’s tree[child] is stale by exactly lazy[node] * child_len until the tag is pushed.

push_down repairs the children: it adds the missing contribution to their aggregates and composes the tag into theirs (so their own descendants will, later, be repaired similarly). It then clears lazy[node]. After push_down, tree[node] is unchanged and the children are now correct, so descendants of children may be stale only by the children’s own pending tags.

update either (a) misses, doing nothing, (b) totally covers, applying the O(1) batch update directly to tree[node] and stamping the tag, or (c) partially overlaps, which requires push_down before recursing so the children are correct, then recomputes tree[node] from now-current children.

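query is symmetric: a miss returns 0, a full cover returns tree[node] (correct by the invariant, since every partially overlapping ancestor on the path has already pushed down), and a partial overlap calls push_down before recursing.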

Complexity

Operation	Time	Space
Build	O(N)	O(4N) tree + O(4N) lazy
Range update	O(log N)	O(log N) recursion
Range query	O(log N)	O(log N) recursion

Implementation Requirements

class RangeArray:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (4 * n)
        self.lazy = [0] * (4 * n)

    def _push_down(self, node, nl, nr):
        if self.lazy[node]:
            mid = (nl + nr) // 2
            left, right = 2*node, 2*node + 1
            self.tree[left] += self.lazy[node] * (mid - nl + 1)
            self.lazy[left] += self.lazy[node]
            self.tree[right] += self.lazy[node] * (nr - mid)
            self.lazy[right] += self.lazy[node]
            self.lazy[node] = 0

    def add(self, l, r, v):
        self._add(1, 0, self.n - 1, l, r, v)

    def _add(self, node, nl, nr, ql, qr, v):
        if qr < nl or ql > nr: return
        if ql <= nl and nr <= qr:
            self.tree[node] += v * (nr - nl + 1)
            self.lazy[node] += v
            return
        self._push_down(node, nl, nr)
        mid = (nl + nr) // 2
        self._add(2*node, nl, mid, ql, qr, v)
        self._add(2*node + 1, mid + 1, nr, ql, qr, v)
        self.tree[node] = self.tree[2*node] + self.tree[2*node + 1]

    def sumRange(self, l, r):
        return self._sum(1, 0, self.n - 1, l, r)

    def _sum(self, node, nl, nr, ql, qr):
        if qr < nl or ql > nr: return 0
        if ql <= nl and nr <= qr: return self.tree[node]
        self._push_down(node, nl, nr)
        mid = (nl + nr) // 2
        return self._sum(2*node, nl, mid, ql, qr) + self._sum(2*node + 1, mid + 1, nr, ql, qr)

Tests

N=1, single index: add(0, 0, 5); sumRange(0, 0) == 5.
All-zero: any sumRange before any add returns 0.
Disjoint adds: add(0, 2, 1), add(3, 5, 2); sumRange(0, 5) == 3 + 6 = 9.
Overlapping adds: add(0, 4, 1), add(2, 4, 1); sumRange(2, 4) == 2 + 2 + 2 = 6.
Stress: 10⁴ random adds + queries against a brute-force list.
Stack of pending tags: add(0, n-1, 1) 100 times; sumRange(i, i) == 100 for all i.

Follow-up Questions

“Now add becomes set (assignment, not delta).” → identity = sentinel None; on push_down, if lazy[parent] is not None, overwrite each child’s aggregate with value · child_len and replace the child’s lazy with value.
“Both add and set operations.” → two lazy slots. Composition: a new “set” wipes pending “add”; a new “add” applied while a “set” is pending modifies the set value.
“Range-flip on a binary array, with range-count-of-ones.” → tree[node] = count of 1s; flip → tree[node] = (nr-nl+1) - tree[node]; lazy is a boolean toggle.
“Range-affine: replace a[i] with b · a[i] + c.” → lazy holds (b, c); composition: (b₂, c₂) ∘ (b₁, c₁) = (b₂ b₁, b₂ c₁ + c₂).
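The affine composition rule can be checked in isolation. In this sketch a tag (b, c) means x ↦ b·x + c; compose(newer, older) returns the single tag equivalent to applying older first, then newer, and apply_to_sum is the O(1) batch form a node needs (Σ(b·x + c) = b·Σx + c·len). The function names are hypothetical; (1, 0) is the identity tag.

```python
from typing import Tuple

Affine = Tuple[int, int]  # (b, c) represents x -> b * x + c

IDENTITY: Affine = (1, 0)


def apply(tag: Affine, x: int) -> int:
    b, c = tag
    return b * x + c


def compose(newer: Affine, older: Affine) -> Affine:
    """Single tag equivalent to applying `older` first, then `newer`."""
    b2, c2 = newer
    b1, c1 = older
    return (b2 * b1, b2 * c1 + c2)


def apply_to_sum(tag: Affine, total: int, length: int) -> int:
    """Batch form for a node: sum of (b*x + c) over `length` elements."""
    b, c = tag
    return b * total + c * length
```

Note the order: compose is not commutative, which is exactly the “write the composition rule down before coding” warning in Common Bugs.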

Product Extension

A live spreadsheet with array formulas — =ARRAYFORMULA(A1:A1000 + 5) — is exactly range-add. With range-set you get fill-down. With range-affine you get scaling formulas. The backend has to support thousands of these per second per spreadsheet; lazy segment trees are one viable engine.

Language/Runtime Follow-ups

Python: allocate the two 4 × 10⁵-element lists once up front and never resize them. Recursion depth at N = 10⁵ is about 17, so the recursion limit is not a concern.
Java: long[] tree, lazy to avoid sum overflow at the limits. Synchronization-free for single-thread; LongAdder is unrelated.
Go: same template; []int64 slices.
C++: the canonical use case. vector<long long>. Compile with -O2; benchmark on N=10⁶.
JS/TS: BigInt64Array is heavy; if values fit in Number’s 53-bit safe range, use Float64Array despite being float (the IEEE 754 representation is exact for ±2⁵³ integers).

Common Bugs

Forgetting push_down before recursing on partial cover. The leaf updates correctly but its sibling subtree returns stale aggregates on later queries. Manifests as queries that depend on update order.
Push down on full-cover branch — wasted work but not wrong; only push on partial overlap.
Identity confusion: 0 is identity for add but a legal value for set. Use None or a sentinel for set-style lazy.
Composition direction: when stamping a new tag onto a parent that already has a tag, write the rule down before coding. For add it’s commutative; for set it isn’t.
int overflow in Java/C++ — a single full-range add already reaches 10⁵ · 10⁴ = 10⁹ (close to the int32 limit of about 2.1 × 10⁹), and 10⁵ such adds push sums to ~10¹⁴. Use 64-bit.
Calling push_down on a leaf — guard if nl != nr.

Debugging Strategy

Add an assert_consistent() helper that walks the tree and verifies, for every internal node, tree[node] == tree[left] + tree[right] + lazy[node] * (nr - nl + 1). This is the right invariant for this implementation: tree[node] already includes its own pending tag, while the children do not yet. A complementary check: after each op, force a full push_down from root to leaves (or, equivalently, add each leaf’s stored value to the tags pending on its ancestors) and compare against the brute-force array. If they diverge, you have a push-down-order bug.
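A sketch of both debug helpers, written against raw tree and lazy arrays laid out as in RangeArray (root at 1, children at 2·node and 2·node + 1). For that implementation, tree[node] == tree[left] + tree[right] + lazy[node] · (nr − nl + 1) holds at every internal node: a full-cover stamp adds v · len to tree[node] and v to lazy[node] without touching the children, and push_down moves exactly lazy[node] · child_len into the children before zeroing lazy[node]. leaf_values reconstructs the logical array without mutating anything. The function names are hypothetical.

```python
def assert_consistent(tree, lazy, n):
    """Check tree[node] == tree[left] + tree[right] + lazy[node] * len everywhere."""
    def walk(node, nl, nr):
        if nl == nr:
            return
        mid = (nl + nr) // 2
        left, right = 2 * node, 2 * node + 1
        expected = tree[left] + tree[right] + lazy[node] * (nr - nl + 1)
        assert tree[node] == expected, (node, nl, nr, tree[node], expected)
        walk(left, nl, mid)
        walk(right, mid + 1, nr)
    walk(1, 0, n - 1)


def leaf_values(tree, lazy, n):
    """Logical array: each leaf's stored value plus tags pending on its strict ancestors."""
    out = [0] * n

    def walk(node, nl, nr, pending):
        if nl == nr:
            out[nl] = tree[node] + pending
            return
        mid = (nl + nr) // 2
        walk(2 * node, nl, mid, pending + lazy[node])
        walk(2 * node + 1, mid + 1, nr, pending + lazy[node])

    walk(1, 0, n - 1, 0)
    return out
```

In a stress test, call assert_consistent(ra.tree, ra.lazy, ra.n) after every op and compare leaf_values(ra.tree, ra.lazy, ra.n) against the brute-force list.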

Mastery Criteria

Stated the push-down invariant in one sentence.
Wrote range-add + range-sum lazy seg tree from scratch in <20 minutes, first try.
Adapted the same template to range-set + range-sum in <10 additional minutes.
Solved LC 732 (My Calendar III) using a coord-compressed lazy seg tree.
Stated when not to use lazy (single-point updates → no benefit; non-composing operations → impossible).
Pinpointed the canonical bug (missing push_down) within 5 minutes of seeing a failing test.

Lab 03 — Fenwick Tree (Binary Indexed Tree)

Goal

Implement a Fenwick tree (BIT) and use it to solve LeetCode 315 — Count of Smaller Numbers After Self. Internalize the bit-tricks (i & -i) and the 1-indexed convention so well that you can write a Fenwick tree in under 8 minutes from a blank page.

Background Concepts

A Fenwick tree is a clever encoding of prefix sums in O(N) space supporting prefix_sum(i) and point_update(i, delta) each in O(log N). The key insight: index i in 1-indexed form is associated with a “responsibility range” of size i & -i (the lowest set bit of i), namely [i - (i & -i) + 1, i]. Index 12 = 1100₂ has responsibility for the 4 values at positions 9..12. Index 8 = 1000₂ for 1..8. A query walks down (i -= i & -i), collecting non-overlapping responsibility ranges that tile exactly [1, i]; an update walks up (i += i & -i), visiting exactly the buckets whose range contains i.
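A small sketch that makes the responsibility ranges visible (the helper names are hypothetical): query_path lists the ranges a prefix query collects while walking down, and update_path lists the indices an update touches while walking up.

```python
def responsibility(i: int) -> tuple:
    """1-indexed range [lo, hi] summarized by tree[i] in a Fenwick tree."""
    return (i - (i & -i) + 1, i)


def query_path(i: int) -> list:
    """Ranges collected by prefix(i), walking down with i -= i & -i."""
    path = []
    while i > 0:
        path.append(responsibility(i))
        i -= i & -i
    return path


def update_path(i: int, n: int) -> list:
    """Indices touched by update(i, ...), walking up with i += i & -i."""
    path = []
    while i <= n:
        path.append(i)
        i += i & -i
    return path
```

For instance, query_path(13) gives [(13, 13), (9, 12), (1, 8)], which together tile [1, 13], and update_path(5, 16) gives [5, 6, 8, 16].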

The structure is invertible-only: it stores prefix sums and you derive range_sum(l, r) = prefix(r) - prefix(l - 1). This is fine for sum, xor, count, and “frequency-prefix” aggregates. It does not generalize to min/max because subtraction doesn’t undo a min.

For LC 315, the trick is coordinate compression + Fenwick of frequencies. Process the array right-to-left; for each nums[i], query “how many values strictly less than nums[i] have I seen so far?” by computing prefix(rank(nums[i]) - 1) on the frequency Fenwick; then increment update(rank(nums[i]), 1).

Interview Context

Fenwick trees are asked roughly as often as segment trees but the audience skews more competitive-programming. Stripe, Jane Street, Two Sigma, Bloomberg quant — all reach for them. The signal is “count inversions / count-of-X-after-Y / range-sum-with-updates and the aggregate is invertible”. Faster constants than segment tree, ~5x fewer lines of code; if both work, prefer Fenwick.

Problem Statement

Given an integer array nums, return an array counts where counts[i] is the number of elements to the right of nums[i] that are strictly smaller than nums[i].

Constraints

1 ≤ N ≤ 10⁵
−10⁴ ≤ nums[i] ≤ 10⁴

Clarifying Questions

Strictly smaller, or ≤? (Strictly smaller.)
Return order: same as input order? (Yes — counts[i] aligns with nums[i].)
Are duplicates allowed? (Yes — they don’t count toward “smaller”.)

Examples

nums = [5, 2, 6, 1]   →   counts = [2, 1, 1, 0]
   5: indices 1,3 (vals 2,1) are smaller → 2
   2: index 3 (val 1) is smaller → 1
   6: index 3 (val 1) is smaller → 1
   1: nothing to the right is smaller → 0

Initial Brute Force

For each i, scan j > i and count nums[j] < nums[i]. O(N²).

Brute Force Complexity

O(N²) time. At N=10⁵: 10¹⁰ ops. TLE by 4 orders of magnitude.

Optimization Path

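Merge sort with inversion counting. During the merge step, each time you emit an element from the left half, add to its count the number of right-half elements already emitted; those are exactly the right-half elements strictly smaller than it, provided ties are broken by emitting the left element first. Track original indices so counts land on the right positions. O(N log N) time, O(N) space. Works, idiomatic.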
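A sketch of the merge-sort alternative for LC 315, sorting indices rather than values so each count lands on its original position. count_smaller_merge is a hypothetical name. The merge emits right-half elements strictly smaller than the current left element first, so ties go left-first and equal values are never counted as smaller.

```python
def count_smaller_merge(nums):
    n = len(nums)
    counts = [0] * n
    idx = list(range(n))  # indices, to be sorted by nums value

    def sort(lo, hi):  # sorts idx[lo:hi] by value
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        merged = []
        j = mid  # next right-half position to emit
        for i in range(lo, mid):
            # Emit right elements strictly smaller than the current left element.
            while j < hi and nums[idx[j]] < nums[idx[i]]:
                merged.append(idx[j])
                j += 1
            counts[idx[i]] += j - mid  # all emitted right elements are smaller
            merged.append(idx[i])
        merged.extend(idx[j:hi])
        idx[lo:hi] = merged

    sort(0, n)
    return counts
```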

Fenwick of frequencies after coordinate compression. Equally O(N log N), simpler to extend (e.g., to “count of values in [a, b] after self”).

For this lab, Fenwick is the assigned approach because it generalizes farther.

Final Expected Approach

Coordinate compression: build sorted_unique = sorted(set(nums)); map each value v to rank = bisect_left(sorted_unique, v) + 1 (1-indexed for Fenwick).
Right-to-left sweep: for each i from n-1 down to 0:
- r = rank[nums[i]]
- counts[i] = bit.prefix(r - 1) — count of strictly smaller previously-seen.
- bit.update(r, 1).
Return counts.

Data Structures Used

BIT(size) — Fenwick tree of size = number of distinct values.
rank map — value → 1-indexed compressed rank.
counts[] — output array.

Correctness Argument

Before processing index i, the BIT contains exactly the multiset of ranks for nums[i+1..n-1] (the elements to the right of i, since we go right-to-left). bit.prefix(r - 1) returns the count of those whose rank is < r — i.e., strictly smaller than nums[i]. Coordinate compression preserves order, so “rank smaller” iff “value smaller”. The update bit.update(r, 1) then registers nums[i] for the next iteration. By induction the invariant “BIT == multiset of ranks of strictly-right-of-current” is preserved.

Complexity

Operation	Time	Space
Coordinate compression	O(N log N)	O(N)
Right-to-left sweep	O(N log N)	O(N) Fenwick
Total	O(N log N)	O(N)

Implementation Requirements

class BIT:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)   # 1-indexed

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s


def countSmaller(nums):
    sorted_unique = sorted(set(nums))
    # rank[v] == bisect_left(sorted_unique, v) + 1, i.e., the 1-indexed compressed rank.
    rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
    bit = BIT(len(sorted_unique))
    counts = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        r = rank[nums[i]]
        counts[i] = bit.prefix(r - 1)
        bit.update(r, 1)
    return counts

Tests

[5, 2, 6, 1] → [2, 1, 1, 0].
[1, 2, 3, 4] → [0, 0, 0, 0] (already sorted).
[4, 3, 2, 1] → [3, 2, 1, 0] (reverse sorted — every pair is an inversion).
All same: [5, 5, 5] → [0, 0, 0].
Negatives: [-1, -1, 0, -2] → [1, 1, 1, 0].
Single element: [7] → [0].
Stress: 10⁴ random arrays of size 1000 against the O(N²) brute force.

Follow-up Questions

“Now count strictly larger after self.” → mirror: prefix from rank+1 to n = bit.prefix(n) - bit.prefix(rank).
“Count of values in [a, b] after self.” → bit.prefix(rank[b]) - bit.prefix(rank[a] - 1).
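“Reverse pairs (LC 493)”: nums[i] > 2 * nums[j] for i < j. Adapt the rank/query: count BIT entries v with 2 · v < nums[i]. Find the cutoff rank by comparing 2 · v against nums[i] (e.g., binary search over the sorted unique values) rather than dividing, which avoids floor-vs-truncation mistakes with negatives.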
“Sum of values smaller after self instead of count.” → BIT stores value sums, not counts; update(rank, nums[i]) instead of update(rank, 1).
“Now updates are interleaved with queries on the original problem.” → Fenwick tree of frequencies still works because both ops are O(log N).

Product Extension

A leaderboard service that streams game scores and reports “your rank percentile” as scores arrive. Fenwick of frequencies indexed by score bucket; as a new score arrives, query prefix to know how many scored less, divide by total. Works at millions-of-events-per-second with log-bucket cost per event.

Language/Runtime Follow-ups

Python: integer ops are arbitrary-precision but slow; the BIT loop is hot. PyPy or C-extension if N=10⁶. array.array('q') over list only marginally helps.
Java: int[] for the tree at this N. Math.floorMod not needed (values are positive ranks). Watch for long if you store sums.
Go: idiomatic — tree []int. No surprises.
C++: canonical CP template. vector<int> bit(n + 1, 0). The i & -i lowbit relies on two’s complement, which all modern compilers guarantee for signed int.
JS/TS: Int32Array(n + 1) outperforms regular arrays. The bitwise & and unary - on numbers cast through 32-bit signed int, which works for N ≤ 2³¹.

Common Bugs

0-indexing the BIT. update(0, …) never advances because 0 & -0 = 0, so the loop spins forever; prefix(0) returns 0 immediately, silently dropping index 0. Always 1-index.
Reversed walk directions. Update must go up the responsibility tree (i += i & -i) so future prefix walks see it; query must walk down (i -= i & -i), collecting predecessor ranges.
Forgetting to add 1 when going from bisect_left rank to BIT index.
Compressing nums but using the original value when querying.
tree array size n not n + 1 for 1-indexed.
Using Fenwick for min/max — it doesn’t work because subtraction is not the inverse of min.

Debugging Strategy

For a length-8 input, print tree[1..8] after each update. Recall: tree[i] stores the sum over [i - lowbit(i) + 1, i]. So tree[8] should equal the sum over [1, 8], tree[6] the sum over [5, 6], tree[7] just position 7, etc. Verify by hand for a 3-update sequence.
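To make that hand check mechanical, this short sketch prints each cell's responsibility range and cross-checks a Fenwick tree built by point updates against brute-force sums. The helper names (lowbit, responsibility, check_tree) are illustrative, not part of the lab's API.

```python
from typing import List, Tuple


def lowbit(i: int) -> int:
    return i & -i


def responsibility(i: int) -> Tuple[int, int]:
    """1-indexed range [lo, hi] whose sum tree[i] stores."""
    return (i - lowbit(i) + 1, i)


def check_tree(arr: List[int]) -> List[int]:
    """Build a Fenwick tree by point updates, then verify every cell by brute force."""
    n = len(arr)
    tree = [0] * (n + 1)
    for pos, value in enumerate(arr, start=1):
        i = pos
        while i <= n:                   # update walks up
            tree[i] += value
            i += lowbit(i)
    for i in range(1, n + 1):
        lo, hi = responsibility(i)
        assert tree[i] == sum(arr[lo - 1:hi]), (i, lo, hi)
    return tree


for i in range(1, 9):
    print(i, responsibility(i))         # e.g. 6 -> (5, 6), 8 -> (1, 8)
check_tree([3, 1, 4, 1, 5, 9, 2, 6])
```

If check_tree trips its assertion, the printed (i, lo, hi) tells you exactly which cell disagrees.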

If the inversion-count is off by 1 at every position, you almost certainly forgot the +1 in rank shifting (1-indexed vs 0-indexed). If it’s off by a lot, your update is going down instead of up, or your prefix is going up instead of down.

Mastery Criteria

Wrote update and prefix with i & -i correctly on first try.
Used 1-indexed throughout without bugs.
Solved LC 315 in <15 minutes from blank slate.
Solved one cousin (LC 327, LC 493) in <30 minutes.
Articulated why Fenwick can’t do range-min.
Stated when Fenwick beats segment tree (smaller code, smaller constants — pick Fenwick when the aggregate is invertible).
Estimated memory at N=10⁶ (~4 MB for int) without prompting.

Lab 04 — Sparse Table for Range Minimum Queries

Goal

Implement a sparse table supporting O(1) range-min queries on a static array, after O(N log N) preprocessing. Internalize the “two overlapping intervals of the largest power-of-two length” trick. After this lab you should be able to write a sparse table from blank in under 10 minutes and instantly choose between sparse table and segment tree based on whether updates are required.

Background Concepts

A sparse table is a preprocessing structure for range queries on immutable arrays where the aggregate is idempotent: combining the same element twice gives the same result. Min, max, gcd, bitwise OR, bitwise AND, and “is-there-a-1-in-this-range” are idempotent. Sum is not: elements in the overlap of two windows would be counted twice.

Construction. st[k][i] = min of arr[i .. i + 2^k - 1]. Build by:

st[0][i] = arr[i] for all i.
st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]) — the range of length 2^k splits cleanly into two halves of length 2^(k-1).

Query. Given [l, r], let k = floor(log2(r - l + 1)). Then answer min(st[k][l], st[k][r - 2^k + 1]). The two intervals each have length 2^k; they cover [l, l + 2^k - 1] and [r - 2^k + 1, r]. Their union is exactly [l, r]: k is the largest integer with 2^k ≤ r - l + 1, so 2^(k+1) > r - l + 1, which rearranges to l + 2^k - 1 ≥ r - 2^k + 1, i.e. the first interval ends at or after the point where the second begins. They overlap in at least one cell, but for an idempotent op that is harmless.

For O(1) per query you also need a precomputed log_floor[len] table — calling math.log2 each query has too much overhead and floating-point trouble.
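A small sketch that builds the integer log_floor table and exhaustively checks the covering argument for every [l, r] on a small size. The function names are illustrative; in Python, x.bit_length() - 1 is another exact way to get floor(log2(x)), and the check confirms the two agree.

```python
from typing import List, Tuple


def build_log_floor(n: int) -> List[int]:
    """log_floor[x] = floor(log2(x)) for 1 <= x <= n, using integers only."""
    log_floor = [0] * (n + 1)
    for x in range(2, n + 1):
        log_floor[x] = log_floor[x // 2] + 1
    return log_floor


def query_windows(l: int, r: int, log_floor: List[int]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """The two length-2^k windows that a sparse-table query combines for [l, r]."""
    k = log_floor[r - l + 1]
    return (l, l + (1 << k) - 1), (r - (1 << k) + 1, r)


# Exhaustive check on a small size: the integer table agrees with bit_length,
# and the two windows stay inside [l, r] and share at least one cell.
N = 64
LOG = build_log_floor(N)
for x in range(1, N + 1):
    assert LOG[x] == x.bit_length() - 1
for l in range(N):
    for r in range(l, N):
        (a, b), (c, d) = query_windows(l, r, LOG)
        assert a == l and d == r        # outer ends are exactly l and r
        assert b <= r and c >= l        # neither window leaves [l, r]
        assert c <= b                   # they overlap, so no index is skipped
```

For query(2, 5) both windows are (2, 5), matching the single-cell case traced in the debugging notes.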

Interview Context

Sparse tables show up in problems with a read-only array and many range-min/max queries. The signal: “static array, Q queries with Q ≫ N”. Common cousin: range-LCA via Euler tour + sparse table over the depth array (Phase 4 territory, but rooted here). Asked at: Google occasionally, CP-flavored shops frequently. Rejecting an O(log N) segment tree in favor of a sparse table when O(1) queries matter (e.g., 10⁷ queries on a 10⁵ array) is a senior-level signal.

Problem Statement

Given a static integer array arr of length N, build a structure that answers query(l, r) = min of arr[l..r] in O(1) per query.

Constraints

1 ≤ N ≤ 10⁵
1 ≤ Q ≤ 10⁷ queries
0 ≤ l ≤ r < N
−10⁹ ≤ arr[i] ≤ 10⁹

Clarifying Questions

Is the array static? (Yes — that’s the entire premise.)
Inclusive endpoints? (Yes.)
Min, or min and max? (Just min for this lab; max is identical with min → max.)

Examples

arr = [3, 1, 4, 1, 5, 9, 2, 6]
query(0, 7) → 1
query(2, 5) → 1
query(4, 7) → 2
query(3, 3) → 1

Initial Brute Force

min(arr[l:r+1]) per query — O(N) per call. Total O(N · Q).

Brute Force Complexity

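At N=10⁵, Q=10⁷: up to 10¹² ops when queries span most of the array. Against a budget of roughly 10⁸ simple operations per second, that is TLE by about 4 orders of magnitude.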

Optimization Path

Segment tree gives O(log N) per query, O(N) per build. Total O(Q log N) = ~2 × 10⁸ at the limits: hopeless in pure Python, fine in C++.

Sparse table gives O(1) per query, O(N log N) per build. Total O(N log N + Q) = ~2 × 10⁶ + 10⁷ = 1.2 × 10⁷ ops. Comfortable in compiled languages; in pure Python, 10⁷ query calls still cost several seconds of interpreter overhead.

The deciding factor: updates. Sparse table is read-only. If the array mutates between queries, sparse table is wrong; segment tree is required. The interviewer asking “what if I want updates?” is a real follow-up — answer: “Switch to a segment tree; sparse table doesn’t support point updates without an O(N log N) full rebuild.”

Final Expected Approach

Precompute log_floor[1..N] via log_floor[i] = log_floor[i // 2] + 1, base case log_floor[1] = 0.
Allocate st as a 2D array of size (K + 1) × N where K = log_floor[N].
st[0][i] = arr[i] for all i.
For k = 1 .. K: for i = 0 .. N - 2^k: st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]).
query(l, r): k = log_floor[r - l + 1]; return min(st[k][l], st[k][r - 2^k + 1]).

Data Structures Used

2D array st[K+1][N], where K = floor(log2(N)).
1D array log_floor[N+1].

Correctness Argument

By induction on k: st[0][i] = arr[i] (length-1 range, trivially correct). Given st[k-1][·] correct: st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]) covers [i, i + 2^(k-1) - 1] ∪ [i + 2^(k-1), i + 2^k - 1] = [i, i + 2^k - 1]. The min of a union equals the min of the minima of its parts.

For query, k = floor(log2(r - l + 1)) ⇒ 2^k ≤ len ≤ 2^(k+1) - 1 ⇒ 2^k ≥ len/2 ⇒ l + 2^k > r - 2^k, so the two intervals [l, l + 2^k - 1] and [r - 2^k + 1, r] overlap (or meet exactly), and their union is [l, r]. Min over the union equals min of the two.

Complexity

Operation	Time	Space
Build	O(N log N)	O(N log N)
Query	O(1)	—

At N=10⁵: K = 16, so there are K + 1 = 17 rows and ≈ 1.7 × 10⁶ table cells. At 8 bytes each, that is ~14 MB.

Implementation Requirements

class SparseTableMin:
    def __init__(self, arr):
        n = len(arr)
        # log[x] = floor(log2(x)), integers only (no math.log2).
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        K = self.log[n]
        # Row k holds minima of windows of length 2^k; K + 1 rows in total.
        self.st = [list(arr)] + [[0] * n for _ in range(K)]
        for k in range(1, K + 1):
            half = 1 << (k - 1)
            # Only windows that fit: i + 2^k - 1 <= n - 1.
            for i in range(n - (1 << k) + 1):
                self.st[k][i] = min(self.st[k-1][i], self.st[k-1][i + half])

    def query(self, l, r):
        # Two overlapping windows of length 2^k cover [l, r].
        k = self.log[r - l + 1]
        return min(self.st[k][l], self.st[k][r - (1 << k) + 1])

Tests

N=1: query(0, 0) == arr[0].
All same: query(l, r) == arr[0] for all valid (l, r).
Sorted ascending: query(l, r) == arr[l].
Sorted descending: query(l, r) == arr[r].
Random: 10⁴ queries on a length-1000 random array vs brute force.
Edge: query(0, n-1) should equal min(arr).
Power-of-two length and non-power-of-two length both must pass.

Follow-up Questions

“Now also support range-max.” → second sparse table or pack (min, max) into each cell.
“Now updates are required.” → switch to segment tree; sparse table cannot support O(log N) updates without rebuild.
“Now I want range-sum.” → sum is not idempotent, so two overlapping windows double-count. You can still answer in O(log N) by splitting [l, r] into disjoint power-of-two blocks, one per set bit of the length, but at that point a segment tree is simpler.
“Range LCA.” → reduce to range-min on Euler tour depth array + sparse table over depths. Lab in Phase 4.
“Reduce memory at the cost of implementation complexity.” → Fischer–Heun (or the ±1-RMQ construction of Bender and Farach-Colton) achieves O(N) preprocessing + O(1) query, but is conceptually heavy.
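A sketch of the range-sum follow-up: because sum is not idempotent, the query walks disjoint blocks, one per set bit of r - l + 1, instead of combining two overlapping windows. The class name is hypothetical; for a truly static array a plain prefix-sum array answers the same queries in O(1), so this only illustrates the decomposition.

```python
from typing import List


class SparseTableSum:
    """Static range sums from a sparse table, using disjoint power-of-two blocks."""

    def __init__(self, arr: List[int]) -> None:
        n = len(arr)
        self.levels = max(n, 1).bit_length()     # rows 0..levels-1; 2^(levels-1) <= max(n, 1)
        self.st = [list(arr)]                    # st[k][i] = sum of arr[i .. i + 2^k - 1]
        for k in range(1, self.levels):
            prev, half = self.st[k - 1], 1 << (k - 1)
            self.st.append([prev[i] + prev[i + half] for i in range(n - (1 << k) + 1)])

    def query(self, l: int, r: int) -> int:
        """Sum of arr[l..r] inclusive in O(log N): one block per set bit of the length."""
        total, i, length = 0, l, r - l + 1
        for k in range(self.levels - 1, -1, -1):
            if (length >> k) & 1:
                total += self.st[k][i]           # block covers [i, i + 2^k - 1]
                i += 1 << k                      # next block starts right after it
        return total
```

For a length-7 query the blocks have sizes 4, 2, 1, laid end to end, so no element is counted twice.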

Product Extension

Static analytics dashboards (pre-aggregated, refreshed nightly) over time-series metrics: “min latency in this 5-minute window over the last 24 hours, sliding”. Build a sparse table over the time series at end-of-day; each dashboard window query is then two table lookups. The “static” condition holds because the data is read-only between rebuilds.

Language/Runtime Follow-ups

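Python: list-of-lists adds an extra indirection per lookup; flattening to one list with manual indexing (row k at offset k · n) can be noticeably faster, but measure on your interpreter. PyPy if benchmarking.
Java: int[][] st = new int[K+1][n]. The JIT can hoist loop invariants. At N=10⁶, K = 19, so the table holds 20 × 10⁶ ints (~80 MB); check the heap limit before allocating.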
Go: [][]int is fine; make([]int, n) inside a loop is idiomatic.
C++: vector<vector<int>> st(K + 1, vector<int>(n)). With -O2 this is the canonical fast implementation.
JS/TS: Int32Array per row beats Array for numeric ops. JS doesn’t have integer log2 — Math.log2 is float and slow; use (31 - Math.clz32(x)) for 32-bit ints.
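The flattening tip for Python might look like this: row k of the table lives at offset k · n in one list, and an inline comparison replaces the built-in min. It is a sketch under the assumption that fewer indirections and function calls help on your interpreter; the class name is illustrative, and you should profile it against SparseTableMin before relying on any speedup.

```python
class FlatSparseTableMin:
    """Range-min sparse table stored row-major in one flat list (row k starts at k * n)."""

    def __init__(self, arr):
        n = len(arr)
        self.n = n
        self.log = [0] * (n + 1)
        for x in range(2, n + 1):
            self.log[x] = self.log[x // 2] + 1
        levels = self.log[n] + 1 if n else 1
        self.flat = list(arr) + [0] * (n * (levels - 1))
        for k in range(1, levels):
            half, row, prev = 1 << (k - 1), k * n, (k - 1) * n
            for i in range(n - (1 << k) + 1):
                a, b = self.flat[prev + i], self.flat[prev + i + half]
                self.flat[row + i] = a if a < b else b   # inline min, no call overhead

    def query(self, l, r):
        k = self.log[r - l + 1]
        base = k * self.n
        a, b = self.flat[base + l], self.flat[base + r - (1 << k) + 1]
        return a if a < b else b
```

The same layout carries over directly to the Java, Go, and C++ versions if you want one contiguous allocation instead of K + 1 rows.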

Common Bugs

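Computing a floating-point log per query. It is slow, and rounding can land just below an integer (in CPython, math.log(1000, 10) returns 2.9999999999999996), so the floor drops a level. Always use the precomputed log_floor[] (in Python, (r - l + 1).bit_length() - 1 is also exact).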
Building st[k][i] for i + 2^k - 1 ≥ N — out-of-bounds. Loop must end at n - 2^k.
Sizing st with too few rows: K = log_floor[N], but allocating K rows misses the K-th. Allocate K+1.
Using sparse table for sum and then puzzling over wrong answers — sum is not idempotent.
Forgetting that the array must be static. If a query is interleaved with mutation, the structure silently returns stale answers.
Off-by-one in query: r - (1 << k) + 1 vs r - (1 << k). The interval is [r - 2^k + 1, r] of length 2^k — verify by hand on a tiny case.

Debugging Strategy

Print st[k] for small N=8 and verify by hand: st[0] is the array, st[1][i] = min(arr[i], arr[i+1]), st[2][i] = min(arr[i..i+3]), st[3][0] = min(arr[0..7]). If those don’t hold, your build loop is wrong. If queries fail but build is correct, suspect log_floor and/or query indexing — trace the formula on query(2, 5) where len=4, k=2, indices=2 and 5-4+1=2, both pointing at the same precomputed cell.

Mastery Criteria

Stated the idempotence requirement and gave 3 ops that satisfy it and 1 that doesn’t.
Wrote sparse table from scratch in <10 minutes.
Wrote log_floor table without using math.log2.
Chose sparse table over segment tree for a read-only workload by stating the constants (1 vs log N per query).
Identified the failure mode “what if updates are needed” and named segment tree as the replacement.
Solved one classic RMQ problem and one problem reducible to RMQ.

Lab 05 — KMP String Matching

Goal

Implement Knuth–Morris–Pratt (KMP): build the failure function (longest proper prefix-suffix) of a pattern in O(M), then match the pattern against a text in O(N + M). Apply it to LeetCode 28 (strStr) and LeetCode 459 (Repeated Substring Pattern). After this lab, you should be able to derive fail[] on a 10-character string by hand and write the matcher in <12 minutes.

Background Concepts

The naive substring search compares the pattern against every position in the text: O(N · M) worst case. KMP exploits the fact that when a mismatch occurs at pattern position j, the prefix P[0..j-1] did match. So we already know the last j characters of the text. From that we compute “what is the longest proper prefix of P that is also a suffix of P[0..j-1]?” — call that length fail[j-1]. Then we resume matching at pattern position fail[j-1] without backtracking the text pointer.

The failure function (also called “longest proper prefix-suffix” or LPS):

fail[i] = length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i].
“Proper” = strictly shorter than i + 1.
fail[0] = 0 always.

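Build in O(M) using a two-pointer recurrence: start from j = fail[i-1]; if P[j] == P[i], fail[i] = j + 1; otherwise fall back j = fail[j-1] and retry. Stop when P[j] == P[i] (then fail[i] = j + 1) or when j = 0 and P[0] != P[i] (then fail[i] = 0).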

The matcher: walk text pointer i forward; pattern pointer j advances on match, falls back to fail[j-1] on mismatch (without resetting i).

Interview Context

KMP is the bedrock single-pattern string algorithm. It comes up at FAANG companies, quant shops, and search-infra teams. The give-away signal: “find pattern in text” with N, M up to 10⁵, where naive matching needs billions of comparisons in the worst case. Most candidates know strStr exists; few can derive fail[] correctly under pressure. Doing it cleanly is a strong signal at L4+.

Problem Statement (LC 28)

Given two strings haystack and needle, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Constraints

1 ≤ haystack.length, needle.length ≤ 10⁴ (LC 28); generalize to 10⁵.
Lowercase English letters in LC 28; the algorithm itself works for any alphabet.

Clarifying Questions

First occurrence (leftmost), or any? (Leftmost.)
Return 0 for empty needle? (Yes — convention.)
Case-sensitive? (Yes by default.)

Examples

strStr("sadbutsad", "sad") → 0
strStr("leetcode", "leeto") → -1
strStr("hello", "ll") → 2
strStr("aabaaabaaac", "aabaaac") → 4

Initial Brute Force

Two nested loops: for each starting index i ∈ [0, N - M], compare text[i..i+M-1] against pattern; return i on full match.

Brute Force Complexity

O((N - M + 1) · M) ≈ O(N · M). Worst case at text = "aaa..a", pattern = "aaa..b": every starting position fails only at the last character. The product peaks at M ≈ N/2, about N²/4 = 2.5 × 10⁹ comparisons for N = 10⁵. TLE.

Optimization Path

KMP O(N + M). Z-algorithm O(N + M): equally good, different invariants. Rabin–Karp O(N + M) expected with hashing; needs verification when hashes match. Suffix automaton: O(N) to build over the text, then O(M) per pattern query; overkill for one pattern.

KMP is the canonical answer because it (a) is exact (no probabilistic concerns), (b) generalizes to “longest border” / “shortest period” follow-ups, (c) is preferred by interviewers as a known-quantity algorithm.

Final Expected Approach

Build fail[] of length M for the pattern.
Walk a single pointer i over the text and j over the pattern.
- If text[i] == pat[j]: advance both; if j == M, report match at i - M.
- If mismatch and j > 0: j = fail[j - 1] (don’t advance i).
- If mismatch and j == 0: i += 1.

Data Structures Used

fail[] — int array of length M.
Two pointers, i and j.

Correctness Argument

Failure function invariant: at the end of the build loop’s iteration on i, fail[i] equals the length of the longest proper prefix of P[0..i] matching its suffix.

Proof sketch: assume fail[0..i-1] is correct. Set j = fail[i-1] — the longest proper border of P[0..i-1]. Try to extend: if P[j] == P[i], the border extends by one to j + 1, and there is no longer border (any longer would give a longer border at i-1). If not, fall back to the next-shorter border via j = fail[j-1]; repeat.

Match invariant: when the matcher is at text position i and pattern position j, the last j characters of text up to i-1 equal P[0..j-1]. On mismatch, falling back to fail[j-1] finds the longest proper prefix of P that is a suffix of P[0..j-1], which is also a suffix of the text so far — preserving the invariant without rescanning text.

Linearity: in the matcher, the variable i + (i - j) strictly increases on every iteration. Since i ≤ N and j ≥ 0, the loop runs ≤ 2N times. Same trick for the build: i + (i - j) ≤ 2M.
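The potential argument can be checked empirically. This sketch restructures the matcher into the single while loop described under Final Expected Approach, so each iteration is one step that raises 2i − j by at least 1, and it counts those iterations while collecting all matches. build_failure is repeated from the lab's code so the block runs on its own; kmp_with_step_count is an illustrative name.

```python
def build_failure(pat):
    fail = [0] * len(pat)
    j = 0
    for i in range(1, len(pat)):
        while j > 0 and pat[j] != pat[i]:
            j = fail[j - 1]
        if pat[j] == pat[i]:
            j += 1
        fail[i] = j
    return fail


def kmp_with_step_count(text, pat):
    """Return (start indices of all matches, loop iterations). pat must be non-empty."""
    fail = build_failure(pat)
    n, m = len(text), len(pat)
    i = j = steps = 0
    matches = []
    while i < n:
        steps += 1                      # every iteration raises 2*i - j by at least 1
        if text[i] == pat[j]:
            i += 1
            j += 1
            if j == m:
                matches.append(i - m)
                j = fail[j - 1]         # keep scanning for overlapping matches
        elif j > 0:
            j = fail[j - 1]             # i stays put, j strictly decreases
        else:
            i += 1
    return matches, steps
```

Since 2i − j starts at 0 and never exceeds 2N, steps can never exceed 2 * len(text), however adversarial the input.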

Complexity

Operation	Time	Space
Failure function build	O(M)	O(M)
Match	O(N + M)	O(M)

Implementation Requirements

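def build_failure(pat):
    m = len(pat)
    fail = [0] * m
    j = 0                               # length of the current border
    for i in range(1, m):
        # Fall back through shorter borders until pat[j] can extend one.
        while j > 0 and pat[j] != pat[i]:
            j = fail[j - 1]
        if pat[j] == pat[i]:
            j += 1
        fail[i] = j
    return fail

def strStr(text, pat):
    if not pat: return 0
    n, m = len(text), len(pat)
    if m > n: return -1
    fail = build_failure(pat)
    j = 0                               # number of pattern chars currently matched
    for i in range(n):
        # On mismatch, reuse the longest border instead of moving i back.
        while j > 0 and text[i] != pat[j]:
            j = fail[j - 1]
        if text[i] == pat[j]:
            j += 1
            if j == m:
                return i - m + 1        # the match ends at i
    return -1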

Tests

Empty pattern → 0.
Pattern not in text → −1.
Pattern at the start: strStr("abcd", "ab") == 0.
Pattern at the end: strStr("abcd", "cd") == 2.
Pattern equals text: strStr("hello", "hello") == 0.
Repeated chars: strStr("aaaa", "aa") == 0.
Worst-case backtrack: strStr("aaaaab", "aaab") == 2.
Verify fail for "aabaaab" → [0, 1, 0, 1, 2, 2, 3].
Stress: random texts/patterns, compared against text.find(pat).

Follow-up Questions

“Find all occurrences.” → on match, instead of returning, record i - m + 1 and continue with j = fail[j - 1].
“Repeated substring pattern (LC 459).” → s is composed of k ≥ 2 repetitions of a substring iff n % (n - fail[n-1]) == 0 and fail[n-1] > 0.
“Shortest palindrome (LC 214).” → run KMP on s + '#' + reverse(s); the answer prefix length is fail[-1].
“Multi-pattern matching.” → Aho–Corasick (Phase 3 #17) generalizes KMP to a trie of patterns.
“Shortest period” of S = n - fail[n-1]; longest border = fail[n-1]. They are dual.
“Z algorithm — implement it instead.” → different invariant, same asymptotics; pick whichever is fluent.
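Sketches of three of these follow-ups, reusing the lab's build_failure (repeated here so the block runs standalone). shortest_palindrome assumes '#' never occurs in s, which holds for LC 214's lowercase-letter inputs; all function names are illustrative.

```python
def build_failure(pat):
    fail = [0] * len(pat)
    j = 0
    for i in range(1, len(pat)):
        while j > 0 and pat[j] != pat[i]:
            j = fail[j - 1]
        if pat[j] == pat[i]:
            j += 1
        fail[i] = j
    return fail


def find_all(text, pat):
    """Start indices of every (possibly overlapping) occurrence of a non-empty pat."""
    fail, out, j = build_failure(pat), [], 0
    for i, c in enumerate(text):
        while j > 0 and c != pat[j]:
            j = fail[j - 1]
        if c == pat[j]:
            j += 1
            if j == len(pat):
                out.append(i - len(pat) + 1)
                j = fail[j - 1]         # continue from the border, not from 0
    return out


def repeated_substring_pattern(s):
    """LC 459: True iff s is k >= 2 copies of some substring."""
    n = len(s)
    border = build_failure(s)[-1] if s else 0
    return border > 0 and n % (n - border) == 0


def shortest_palindrome(s):
    """LC 214: prepend the fewest characters to make s a palindrome."""
    if not s:
        return s
    keep = build_failure(s + "#" + s[::-1])[-1]   # longest palindromic prefix of s
    return s[keep:][::-1] + s
```

The '#' separator stops the border of s + '#' + reverse(s) from running past the end of s, so fail[-1] can never exceed len(s).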

Product Extension

Search-engine snippet generation: for each query term, find the first match in each candidate document. Multi-pattern at scale uses Aho–Corasick; a single-pattern scan within one document can still use KMP for its guaranteed linear worst case. Anti-virus signature scanning of binaries is the same problem, multi-pattern, with patterns numbered in the millions: Aho–Corasick territory, with the KMP failure function as its building block.

Language/Runtime Follow-ups

Python: built-in str.find is implemented in C (CPython's “fastsearch”, a Boyer–Moore/Horspool-style hybrid that since 3.10 switches to Crochemore–Perrin Two-Way for longer needles) and is usually faster than Python-level KMP. KMP wins when you want all occurrences or the failure function for other purposes.
Java: String.indexOf is a naive scan. KMP wins for adversarial inputs. Use char[] over String.charAt for the inner loop.
Go: strings.Index combines a brute-force scan (SIMD-assisted on some platforms for short needles) with a Rabin–Karp fallback. KMP is useful when you want an explicit fail[].
C++: string::find is naive in libstdc++. KMP from scratch is the canonical CP move. std::vector<int> for fail.
JS/TS: String.prototype.indexOf is engine-dependent; V8 chooses among a linear scan, Boyer–Moore–Horspool, and Boyer–Moore based on pattern length. KMP is needed when you implement custom matchers.

Common Bugs

Setting fail[0] = 1 instead of 0. The “proper” prefix excludes the full string.
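In the build, writing the fallback as if j > 0 and pat[j] != pat[i]: j = fail[j-1] instead of a while loop. One fallback step is not always enough: for "aaab" the if version produces fail = [0, 1, 2, 1] instead of [0, 1, 2, 0].
In a “find all occurrences” loop, resetting j = 0 after a match instead of setting j = fail[m - 1]; this silently skips overlapping occurrences (e.g., the match of "aa" at index 1 in "aaa").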
Using pat[i] != pat[j] vs pat[i] != pat[j-1] — pick a consistent indexing for the LPS (length, not index) and don’t mix.
Off-by-one when reporting the match index: i - m + 1 (0-indexed start) vs i - m.
For LC 459, forgetting the fail[n-1] > 0 guard — without it, a non-repeating string passes the divisibility check trivially.

Debugging Strategy

Compute fail[] for a 7-char pattern by hand and compare. For "aabaaab": fail = [0, 1, 0, 1, 2, 2, 3]. Walk the recurrence: at i=4, j = fail[3] = 1, and pat[1] = ‘a’ equals pat[4] = ‘a’, so fail[4] = 2. At i=5, j = fail[4] = 2, and pat[2] = ‘b’ differs from pat[5] = ‘a’, so j = fail[1] = 1; now pat[1] = ‘a’ equals pat[5], so fail[5] = 2. If your code disagrees, instrument it with prints.

For the matcher, trace (i, j) per iteration on text="aabaaabaaac", pat="aabaaac". After an early mismatch at (i=6, j=6) (text=‘b’ vs pat=‘c’), j=fail[5]=2, so we resume at text 6 vs pat 2.

Mastery Criteria

Computed fail[] for an unfamiliar 8-character pattern by hand in <2 minutes.
Wrote build_failure and strStr in <12 minutes total, no off-by-ones.
Solved LC 28, LC 459, and LC 1392 from cold start.
Stated longest-border vs shortest-period duality.
Identified KMP as the single-pattern engine; identified Aho–Corasick as the multi-pattern generalization.
Stated the linear-time argument (potential function i + (i - j)).

Lab 06 — Rolling Hash (Rabin–Karp + Double Hashing)

Goal

Implement a polynomial rolling hash with two independent (base, mod) pairs. Use it to (a) find repeated substrings of a fixed length (LC 187 — Repeated DNA Sequences) and (b) find the longest duplicate substring via binary search on length (LC 1044 — Longest Duplicate Substring). Internalize the modular arithmetic well enough to avoid collisions on adversarial inputs.

Background Concepts

A polynomial rolling hash treats a string as a base-b number mod p:

H(S) = (S[0] · b^(L-1) + S[1] · b^(L-2) + ... + S[L-1]) mod p

The “rolling” part: when you slide the window from S[i..i+L-1] to S[i+1..i+L], you update in O(1):

H_new = ((H_old - S[i] · b^(L-1)) · b + S[i+L]) mod p

For substring equality from prefix hashes, precompute pref[i] = (S[0] · b^(i-1) + ... + S[i-1]) mod p. Then H(S[l..r]) = (pref[r+1] - pref[l] · b^(r - l + 1)) mod p.
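A small sketch checking both formulas against direct evaluation, assuming base 131 and modulus 10⁹ + 7 purely for illustration (the lab recommends random bases and double hashing). pref and pw follow the prefix formula above; roll is the sliding update, and all names are illustrative.

```python
from typing import List, Tuple

B, P = 131, 1_000_000_007               # illustrative base and prime modulus


def direct_hash(t: str) -> int:
    """H(t) = t[0]*B^(L-1) + ... + t[L-1]  (mod P), evaluated by Horner's rule."""
    h = 0
    for c in t:
        h = (h * B + ord(c)) % P
    return h


def prefix_tables(s: str) -> Tuple[List[int], List[int]]:
    """pref[i] = H(s[0:i]) and pw[i] = B^i mod P."""
    pref, pw = [0] * (len(s) + 1), [1] * (len(s) + 1)
    for i, c in enumerate(s):
        pref[i + 1] = (pref[i] * B + ord(c)) % P
        pw[i + 1] = pw[i] * B % P
    return pref, pw


def substring_hash(pref: List[int], pw: List[int], l: int, r: int) -> int:
    """H(s[l..r]) for inclusive r, in O(1): pref[r+1] - pref[l] * B^(r-l+1)."""
    return (pref[r + 1] - pref[l] * pw[r - l + 1]) % P


def roll(h: int, out_char: str, in_char: str, pow_L_minus_1: int) -> int:
    """Slide a window one step: drop out_char (weight B^(L-1)), shift, append in_char."""
    return ((h - ord(out_char) * pow_L_minus_1) * B + ord(in_char)) % P
```

Python's % always returns a non-negative result here, so no extra "+ P" is needed; in Java, C++, or JS you would add P before the final reduction.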

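Single hash collision risk. Among k strings hashed mod p, the expected number of colliding pairs is ~k²/(2p) (the birthday paradox); while that number is small it also approximates the collision probability. At p ~ 10⁹ and k = 10⁵ the expectation is about 5, so a collision is almost certain (k = 10⁴ already gives about a 5% chance). Adversarial inputs constructed to collide on a fixed (b, p) make single hashing unsafe.

Double hashing uses two independent (b₁, p₁) and (b₂, p₂); a given pair of distinct strings collides on both with probability ~1/(p₁ · p₂), and across k strings the chance of any such collision is ~k²/(2 p₁ p₂) ≈ 5 × 10⁻⁹ for k = 10⁵ and p₁ ≈ p₂ ≈ 10⁹. Effectively safe for interviews. Code meant to survive anti-hash tests picks random bases per run.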
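These estimates are easy to reproduce. This sketch assumes hash values behave like independent uniform draws, and it treats double hashing as one hash with modulus p₁ · p₂; the function names are illustrative.

```python
import math


def expected_colliding_pairs(k: int, modulus: int) -> float:
    """Birthday estimate: about k^2 / (2 * modulus) colliding pairs among k random hashes."""
    return k * k / (2 * modulus)


def collision_probability(k: int, modulus: int) -> float:
    """Poisson approximation: P(at least one collision) ~ 1 - exp(-k^2 / (2 * modulus))."""
    return 1 - math.exp(-expected_colliding_pairs(k, modulus))


for k in (10**4, 10**5):
    single = collision_probability(k, 10**9 + 7)
    double = collision_probability(k, (10**9 + 7) * (10**9 + 9))
    print(f"k={k}: single {single:.3g}, double {double:.3g}")
```

At k = 10⁵ the single-hash probability comes out above 0.99, while the double-hash figure is on the order of 10⁻⁹.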

Interview Context

Rolling hash shows up whenever the brute-force algorithm involves “compare every substring of length L to every other” — a quadratic-in-N algorithm. The reduction is “convert string equality to integer equality, get O(1) per comparison”. Asked at: Google (frequent — duplicate detection, plagiarism), search-infra teams, biotech companies (DNA sequence problems), Stripe.

The signal: “many substring equality checks”, “longest repeated”, “find duplicates of length L”.

Problem Statement A (LC 187)

Find all 10-letter sequences that occur more than once in a DNA string s over {A, C, G, T}. Return them in any order.

Problem Statement B (LC 1044)

Given a string s, find the longest duplicated substring (any substring of length ≥ 1 that appears at least twice — overlaps allowed). Return any longest. If no duplicates, return "".

Constraints

LC 187: 1 ≤ |s| ≤ 10⁵; alphabet ACGT.
LC 1044: 2 ≤ |s| ≤ 3 × 10⁴; lowercase ASCII.

Clarifying Questions

Overlaps allowed? (Yes — both problems.)
Single longest, or all? (LC 1044: any one longest.)
Case sensitivity / encoding? (As given by problem.)
May we use suffix array / suffix automaton? (Yes, but the lab assignment is rolling hash.)

Examples

LC 187: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" → ["AAAAACCCCC", "CCCCCAAAAA"].

LC 1044: s = "banana" → "ana".

Initial Brute Force

LC 187: hashtable of every length-10 substring. O(N · L) time, O(N · L) memory — already passes.

LC 1044: for each L from N-1 down to 1, collect all length-L substrings and check for duplicates. Slicing and hashing each substring costs O(L), so one length costs O(N · L) and all lengths O(N³), on the order of 10¹³ at N = 3 × 10⁴. TLE.

Brute Force Complexity

LC 187: O(N · L) acceptable. LC 1044: O(N²) at minimum, O(N³) naively. TLE.

Optimization Path

LC 187 with rolling hash: O(N) expected for the single scan. Useful when L is large; for L = 10 the brute force is fine, but rolling hash demonstrates the technique.

LC 1044 with rolling hash + binary search on L: the outer loop binary-searches L in [1, N-1]; the inner loop hashes every length-L substring and looks up duplicates in a dict. O(N log N) expected in total, with double-hashing or hash+verify.

Final Expected Approach

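BASE = 131                      # illustrative; randomize per run against anti-hash tests
MOD = (1 << 61) - 1             # Mersenne prime modulus

def has_duplicate_of_length(s, L):
    # Returns the start index of a later occurrence of a repeated length-L substring, or -1.
    h = 0
    for i in range(L):
        h = (h * BASE + ord(s[i])) % MOD
    pow_L = pow(BASE, L - 1, MOD)   # weight of the leftmost character in the window
    seen = {h: [0]}                 # hash -> start indices of windows with that hash
    for i in range(1, len(s) - L + 1):
        h = ((h - ord(s[i-1]) * pow_L) * BASE + ord(s[i + L - 1])) % MOD
        starts = seen.setdefault(h, [])
        # Verify against every earlier window with this hash, so a collision can
        # neither produce a false positive nor hide a later true duplicate.
        for j in starts:
            if s[j:j+L] == s[i:i+L]:
                return i
        starts.append(i)
    return -1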

# Binary search L in [1, N-1].

For LC 187, just collect a multiset of length-10 hashes; output any whose count > 1 (and verify).
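For the 4-letter DNA alphabet there is an even simpler variant: pack 2 bits per base, so a length-10 window fits exactly in 20 bits. This is an injective encoding rather than a polynomial hash, so no collisions are possible and no verification is needed; it only works because the alphabet is tiny. The function name and the L parameter are illustrative.

```python
from typing import List


def find_repeated_dna_sequences(s: str, L: int = 10) -> List[str]:
    """LC 187 with an exact rolling code: 2 bits per base, so a window fits in 2*L bits."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    mask = (1 << (2 * L)) - 1              # keep only the last L bases
    seen, reported, out = set(), set(), []
    h = 0
    for i, ch in enumerate(s):
        h = ((h << 2) | code[ch]) & mask   # shift in the new base, drop the oldest
        if i >= L - 1:                     # window s[i-L+1 .. i] is complete
            if h in seen and h not in reported:
                reported.add(h)
                out.append(s[i - L + 1:i + 1])
            seen.add(h)
    return out
```

Each window is reported once, the first time it is seen for the second time.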

Data Structures Used

Two prime moduli (the implementation below uses the Mersenne primes 2³¹ − 1 and 2⁶¹ − 1) and two random bases.
Dict from hash (or hash-pair) → starting index.
Precomputed power array pow_b[i].

Correctness Argument

If two strings have the same content, their polynomial hash is equal: exact, no probability. The other direction (same hash → same content) is not guaranteed; this is why we verify on collision (or use double-hashing, where a given pair of distinct strings collides with probability around 10⁻¹⁸).

For LC 1044, monotonicity: if a duplicate of length L exists, every L’ ≤ L has a duplicate (a prefix of one of the duplicate occurrences). So the answer set {L : duplicate exists} is a prefix [1, L*], and binary search finds L* in O(log N) calls of has_duplicate_of_length.

Complexity

Operation	Time	Space
Single-length scan	O(N) expected	O(N)
LC 187	O(N · L)	O(N · L)
LC 1044 (binary search)	O(N log N) expected	O(N)

Verifying on collision adds an O(L) hit per match; with double-hashing it’s negligible.

Implementation Requirements

import random

class RollingHash:
    def __init__(self, s):
        self.n = len(s)
        # Use two (base, mod) pairs.
        self.MOD1, self.MOD2 = (1 << 31) - 1, (1 << 61) - 1
        self.B1 = random.randint(27, self.MOD1 - 1)
        self.B2 = random.randint(27, self.MOD2 - 1)
        self.h1 = [0] * (self.n + 1)
        self.h2 = [0] * (self.n + 1)
        self.p1 = [1] * (self.n + 1)
        self.p2 = [1] * (self.n + 1)
        for i, c in enumerate(s):
            v = ord(c)
            self.h1[i+1] = (self.h1[i] * self.B1 + v) % self.MOD1
            self.h2[i+1] = (self.h2[i] * self.B2 + v) % self.MOD2
            self.p1[i+1] = (self.p1[i] * self.B1) % self.MOD1
            self.p2[i+1] = (self.p2[i] * self.B2) % self.MOD2

    def hash_pair(self, l, r):  # [l, r)
        a = (self.h1[r] - self.h1[l] * self.p1[r - l]) % self.MOD1
        b = (self.h2[r] - self.h2[l] * self.p2[r - l]) % self.MOD2
        return (a, b)


def longestDupSubstring(s):
    n = len(s)
    rh = RollingHash(s)
    def find(L):
        seen = {}
        for i in range(n - L + 1):
            h = rh.hash_pair(i, i + L)
            if h in seen: return i
            seen[h] = i
        return -1
    lo, hi, best_start, best_len = 1, n - 1, 0, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        i = find(mid)
        if i != -1:
            best_start, best_len = i, mid
            lo = mid + 1
        else:
            hi = mid - 1
    return s[best_start:best_start + best_len]

Tests

LC 187 sample → ["AAAAACCCCC", "CCCCCAAAAA"].
LC 1044 "banana" → "ana".
LC 1044 "abcd" → "".
LC 1044 "aaaaaa" → "aaaaa" (overlapping duplicates).
LC 1044 "abab" → "ab".
Single-hash adversary: paste a Codeforces anti-hash test for (b=31, p=10^9+7); verify your double-hash survives.
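Stress: 1000 random strings of length 100; compare the length of the LC 1044 result to a brute-force answer (any longest duplicate is accepted, so check the length and that the returned substring occurs at least twice).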

Follow-up Questions

“What if I demand zero false positives?” → either verify on every match (O(L) extra) or use suffix array / suffix automaton (deterministic O(N log N) or O(N)).
“Multiple texts share patterns.” → hash all, group by hash-pair, verify within group.
“Stream the text.” → keep a rolling hash; emit matches online; constant memory beyond the dictionary.
“Treat substrings that are rotations of each other as equal.” → hash a canonical rotation (the minimal rotation, via Booth’s algorithm) or all rotations.
“Avoid Python big-int slowdown.” → use (1 << 61) - 1 (Mersenne prime) and bitwise reduction; or numpy.

Product Extension

Plagiarism detection over a corpus: shingle each document into k-grams, hash each shingle (a rolling hash makes this O(1) per shingle), store (doc_id, hash) tuples; near-duplicate documents share many shingles. MinHash + LSH (locality-sensitive hashing) is the usual production technique built on top of such shingle hashes. Image-sharing services apply the same hash-then-compare idea to images using perceptual hashes.

Language/Runtime Follow-ups

Python: integers are bignums; prefer mods that fit in 63 bits to keep ops fast. Avoid pow(b, k, m) in hot loops — precompute pow tables.
Java: use long for mod ~10⁹ to keep b · h + c from overflowing int. For mod 2⁶¹-1 you need Math.floorMod and 128-bit (Math.multiplyHigh in JDK 9+).
Go: uint64 with mod 2⁶¹-1: get the full 128-bit product with bits.Mul64 (math/bits), then reduce with shifts and a mask instead of a division. Idiomatic CP technique.
C++: __int128 for mod 2⁶¹-1 multiplications. Otherwise unsigned long long.
JS/TS: Number only safe for integers ≤ 2⁵³. Use BigInt (slow) or pick mod ≤ 2²⁶ with two-pair hashing to fit safely.

Common Bugs

Single-hash collision — passes random tests, fails LC 1044 hidden cases. Double-hash always.
Negative mod in languages with truncated division (Java, C, JS): (a - b) % MOD can be negative; add MOD and re-mod.
Power table off-by-one: p[i] = b^i. hash[l..r) uses p[r - l], not p[r - l - 1].
Reusing the same (b, p) across runs on adversarial inputs — randomize per run.
Forgetting verify on single-hash → wrong answer.
Boundary in rolling update: drop the leftmost char first (subtract s[i-1] · pow_L, where pow_L = b^(L-1)), then multiply by b and add the rightmost.

Debugging Strategy

Construct two strings of the same length with known equality and known inequality; assert your hash_pair(0, L) gives equal pairs iff the strings are equal. For LC 1044, when the binary search returns the wrong length, instrument find(L) to dump (i, hash, prior_index) on each match candidate. If a reported match has differing substrings, your verify step is missing or broken; if verify keeps rejecting candidates, single hashing is too weak for this input, so switch to double hashing.

Mastery Criteria

Implemented double-hash rolling hash from scratch in <15 minutes.
Stated the collision probability for single vs double hash.
Solved LC 187 in <10 minutes; LC 1044 in <30.
Recognized binary-search-on-length as the standard “longest duplicate” reduction.
Used random base per run.
Stated the alternative (suffix array) and when to prefer it.

Lab 07 — Trie Applications (Implement Trie + Word Search II)

Goal

Build a trie supporting insert, search, startsWith (LC 208), then use a trie to solve LC 212 — Word Search II — by pruning a DFS over a 2D grid against a trie of dictionary words. After this lab, the trie should be a reflex for any “many strings, prefix-shared, batch-query” problem.

Background Concepts

A trie (prefix tree) is a tree where each edge is labeled with one character and each path from the root spells a string. A node may carry an is_end flag meaning “a word terminates here”. Two strings sharing a prefix share the prefix path in the trie, giving O(L) per insert / search regardless of how many words exist. The advantage over a hash set shows up in prefix queries (startsWith, autocomplete, pruning a search by prefix), which a hash set cannot answer without checking every word.

Children can be stored as:

Array of 26 (or 256) — fastest dispatch, fixed alphabet, slightly memory-heavy. Best for grid-DFS hot loops.
Hash map char → node — flexible alphabet, slower constant. Good for arbitrary Unicode or large alphabets.
Compressed (radix tree / Patricia) — collapse single-child chains; smaller memory, harder to implement.
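A minimal LC 208 sketch using the array-of-26 layout, assuming lowercase ASCII input as the constraints state. Class and method names follow the LeetCode interface; the _walk helper is illustrative.

```python
from typing import List, Optional


class TrieNode:
    __slots__ = ("children", "is_end")

    def __init__(self) -> None:
        self.children: List[Optional["TrieNode"]] = [None] * 26
        self.is_end = False


class Trie:
    """LC 208 with array-of-26 children; assumes lowercase ASCII input."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            idx = ord(ch) - ord("a")
            if node.children[idx] is None:
                node.children[idx] = TrieNode()
            node = node.children[idx]
        node.is_end = True

    def _walk(self, s: str) -> Optional[TrieNode]:
        """Follow s from the root; None if the path falls off the trie."""
        node = self.root
        for ch in s:
            node = node.children[ord(ch) - ord("a")]
            if node is None:
                return None
        return node

    def search(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_end

    def startsWith(self, prefix: str) -> bool:
        return self._walk(prefix) is not None
```

search and startsWith share one walk; they differ only in whether the final node must carry is_end.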

For LC 212, the killer move is: instead of running a separate backtracking search over the grid for each word (W independent searches), build a trie of all words and run one DFS that explores the grid while walking the trie in parallel, terminating any branch whose current grid letter has no trie-child. This converts the cost from “many independent searches” into “one search with multi-end”.

Interview Context

Tries come up frequently at FAANG companies; Meta, Google, and Amazon all have multiple variants in their pools. The pattern is “many strings share prefixes; query, autocomplete, or grid-search”. Recognizing the trie-pruned DFS reduction for LC 212 is a strong-hire signal: it is a small code change that is often orders of magnitude faster in practice.

Cousins: autocomplete (LC 642), longest word in dictionary (LC 720), word break (LC 139 — sometimes trie-friendly), maximum xor of two numbers (LC 421 — bit-trie).

Problem Statement A (LC 208)

Implement Trie with insert(word), search(word) (exact match), startsWith(prefix) (any word with this prefix).

Problem Statement B (LC 212)

Given an m × n grid of letters and a list words, return all words that can be formed by a sequence of adjacent (4-directionally connected) cells in the grid, where each cell is used at most once per word.

Constraints

LC 208: ≤ 3 × 10⁴ ops; lowercase ASCII; word length ≤ 2000.
LC 212: 1 ≤ m, n ≤ 12; 1 ≤ |words| ≤ 3 × 10⁴; each word length ≤ 10; lowercase ASCII.

Clarifying Questions

Are duplicate words possible in the dictionary? (LC 212: assume no, dedupe defensively.)
Can the same cell be reused across different words? (Yes — only “once per word”.)
Is alphabet exactly lowercase ASCII? (Yes — use array of 26.)

Examples

LC 212:

board = [['o','a','a','n'],
         ['e','t','a','e'],
         ['i','h','k','r'],
         ['i','f','l','v']]
words = ["oath", "pea", "eat", "rain"]
→ ["oath", "eat"]

Initial Brute Force

For each word, run a DFS from every starting cell that matches the word’s first letter; backtrack on dead ends.

Per-word DFS: O(m · n · 4^L). Total: O(W · m · n · 4^L). With W = 3 × 10⁴, m · n = 144, L = 10: 3 × 10⁴ · 144 · 4¹⁰ ≈ 4.5 × 10¹² operations in the worst case. TLE.
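The brute force is still worth having as a stress-test reference. A minimal sketch, running LC 79-style backtracking once per word (`exist` and `find_words_brute` are illustrative names):

```python
def exist(board, word):
    """True if word can be traced with 4-directional moves, each cell used at most once."""
    m, n = len(board), len(board[0])

    def dfs(i, j, k):
        # k is the index of the letter of word that must match at (i, j)
        if board[i][j] != word[k]:
            return False
        if k == len(word) - 1:
            return True
        saved = board[i][j]
        board[i][j] = "#"  # mark the cell as on the current path
        found = False
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < m and 0 <= nj < n and dfs(ni, nj, k + 1):
                found = True
                break
        board[i][j] = saved  # restore before returning
        return found

    return any(dfs(i, j, 0) for i in range(m) for j in range(n))


def find_words_brute(board, words):
    # One independent backtracking search per word.
    return [w for w in words if exist(board, w)]
```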

Brute Force Complexity

O(W · m · n · 4^L). At the limits: ≈ 4.5 × 10¹², TLE by roughly four to five orders of magnitude against a ~10⁸ ops/sec budget.

Optimization Path

Trie-pruned DFS. Build a trie of all words once: O(total chars) ≈ 3 × 10⁵. Then run a single DFS from each cell, walking the trie in parallel; whenever a trie child for the current letter is missing, prune. Whenever the DFS reaches a trie node that stores a word, record that word.

Total: O(m · n · 4 · 3^(L-1)) in the worst case, independent of W, and the effective branching is far smaller in practice because most paths are pruned within 2-3 characters. The practical speedup over per-word search is typically orders of magnitude.

Optimization on top: when a word is recorded, clear it from its terminal node and, while backtracking, delete trie nodes that no longer hold a word or any children. The trie keeps shrinking, so later searches fail earlier.
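A sketch of that pruning, assuming each node keeps a hypothetical `live` counter of non-empty children so the "is this node dead?" check is O(1) instead of a scan of 26 slots:

```python
class Node:
    __slots__ = ("children", "word", "live")

    def __init__(self):
        self.children = [None] * 26
        self.word = None  # full word stored at its terminal node
        self.live = 0     # number of non-None children


def find_words_pruned(board, words):
    root = Node()
    for w in words:
        node = root
        for c in w:
            i = ord(c) - ord("a")
            if node.children[i] is None:
                node.children[i] = Node()
                node.live += 1
            node = node.children[i]
        node.word = w

    m, n = len(board), len(board[0])
    res = []

    def dfs(i, j, parent):
        c = board[i][j]
        if c == "#":  # cell already on the current path
            return
        idx = ord(c) - ord("a")
        node = parent.children[idx]
        if node is None:
            return
        if node.word is not None:
            res.append(node.word)
            node.word = None  # dedupe
        board[i][j] = "#"
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < m and 0 <= nj < n:
                dfs(ni, nj, node)
        board[i][j] = c
        # A node with no word and no children can never yield a match again: unlink it.
        if node.word is None and node.live == 0:
            parent.children[idx] = None
            parent.live -= 1

    for i in range(m):
        for j in range(n):
            dfs(i, j, root)
    return res
```

The counter costs one integer per node but keeps each prune decision constant-time.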

Final Expected Approach

Build trie: each node has children[26], word: Optional[str] (set on insert at the terminal).
For each cell (i, j), DFS:
- Read c = board[i][j]; if c == '#' (cell already on the current path), return.
- If node.children[ord(c) - ord('a')] is None, return.
- Descend: node = node.children[...].
- If node.word: append to results, set node.word = None to dedupe.
- Mark board[i][j] = '#' to prevent reuse.
- Recurse 4 directions.
- Restore board[i][j] = c.
Return results.

Data Structures Used

TrieNode { children: List[Optional[TrieNode]], word: Optional[str] }.
2D grid (mutable for the visited-mark trick).
Result list.

Correctness Argument

The trie-DFS enumerates exactly the set of (path-in-grid, path-in-trie) pairs where each step matches the current grid letter to a trie child. A word is reported iff the DFS reaches a trie node with word != None along a non-self-intersecting grid path — by construction this is exactly the set of words present in the grid. Setting node.word = None after recording deduplicates without affecting other paths (the children remain reachable for other words sharing this prefix).

board[i][j] = '#' ensures non-reuse: a cell already on the current path reads '#', and the c == '#' check at the top of dfs returns immediately (no trie node has a ‘#’ child anyway, since the alphabet is lowercase).

Complexity

Operation	Time	Space
Trie build	O(total characters in words)	O(total characters)
DFS from one starting cell (worst)	O(4 · 3^(L-1))	O(L) recursion
Total	O(total characters + m · n · 4 · 3^(L-1))	O(W · L) trie + O(L) recursion

The 4 · 3^(L-1) bound (not 4^L) holds because the first step has up to 4 neighbors, but every later step has at most 3: the cell you came from is marked '#'.

Implementation Requirements

class TrieNode:
    __slots__ = ('children', 'word')
    def __init__(self):
        self.children = [None] * 26
        self.word = None

def findWords(board, words):
    root = TrieNode()
    for w in words:
        node = root
        for c in w:
            i = ord(c) - ord('a')
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.word = w

    m, n = len(board), len(board[0])
    res = []

    def dfs(i, j, parent):
        c = board[i][j]
        if c == '#': return
        idx = ord(c) - ord('a')
        node = parent.children[idx]
        if node is None: return
        if node.word is not None:
            res.append(node.word)
            node.word = None
        board[i][j] = '#'
        for di, dj in ((-1,0),(1,0),(0,-1),(0,1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < m and 0 <= nj < n:
                dfs(ni, nj, node)
        board[i][j] = c
        # Optional pruning: if node has no children and no word, parent.children[idx] = None.

    for i in range(m):
        for j in range(n):
            dfs(i, j, root)
    return res

Tests

LC 212 sample → ["oath", "eat"].
Empty word list → []. (LC guarantees m, n ≥ 1; guard len(board[0]) if you also want to test an empty grid.)
Word equals a single cell: board=[['a']], words=["a"] → ["a"].
Word longer than m · n letters: can never be found, so it is not returned.
Duplicate paths to same word: appears once thanks to node.word = None.
Words sharing prefixes: words=["oath", "oat"] — both should be found if both are present.
Stress: random grids of size 12×12, random word list of 100 words length 5; verify against per-word DFS brute force.

Follow-up Questions

“How would you support deletion?” → reference-count children or do recursive cleanup; tricky due to shared prefixes.
“Autocomplete with frequency.” → store count per word at the terminal; on prefix lookup, walk the subtree and pick top-k by count (or maintain a per-node max-heap).
“What if the alphabet is Unicode?” → switch from children[26] to a dict. Per-node memory grows but dispatch is O(1) hash.
“Compress the trie.” → radix / Patricia trie collapses single-child chains; helpful when you have million-word dictionaries (e.g., DNS).
“Bit trie for max-XOR (LC 421).” → trie of binary representations, depth 31 for non-negative 32-bit ints (bits 30 down to 0); greedy descent picks the opposite bit when possible.
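For the LC 421 follow-up, a minimal sketch using nested dicts as trie nodes (a shortcut for brevity, not the array-of-26 layout above), assuming non-negative inputs below 2³¹:

```python
def find_maximum_xor(nums):
    """Binary trie over bits 30..0; greedy descent toward the opposite bit."""
    BITS = 31
    root = {}
    best = 0
    for x in nums:
        # Insert x, most significant bit first.
        node = root
        for b in range(BITS - 1, -1, -1):
            node = node.setdefault((x >> b) & 1, {})
        # Query: prefer the opposite bit at each level to set that XOR bit to 1.
        node = root
        cur = 0
        for b in range(BITS - 1, -1, -1):
            bit = (x >> b) & 1
            if (1 - bit) in node:
                cur |= 1 << b
                node = node[1 - bit]
            else:
                node = node[bit]  # always exists: x itself was just inserted
        best = max(best, cur)
    return best
```

Inserting x before querying guarantees the descent never falls off the trie; pairing x with itself only contributes 0.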

Product Extension

Search-as-you-type / autocomplete: as a user types each character, walk the trie down one node and emit the top-k completions stored at that node. Production search engines (ES, Solr) build inverted indices, but for “small dictionary, fast prefix lookup” use cases (CLI command completion, query suggestion within an admin tool), an in-memory trie is the right call. Radix tries are also a classic structure for longest-prefix matching, e.g., in IP routing tables.

Language/Runtime Follow-ups

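Python: __slots__ on TrieNode removes the per-instance __dict__, which noticeably cuts memory per node. For LC 212, a list of 26 children indexed by ord(c) - ord('a') avoids hashing and is a reasonable default at this size; a dict can be smaller for very sparse nodes, so measure if memory is tight.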
Java: TrieNode { TrieNode[] children = new TrieNode[26]; String word; }. JIT inlines the array dispatch.
Go: type TrieNode struct { children [26]*TrieNode; word string }. Value-typed array of pointers is cache-friendlier than a slice.
C++: struct TrieNode { array<TrieNode*, 26> children{}; string word; };. Use a deque<TrieNode> arena to avoid new per node.
JS/TS: plain object {children: Array(26), word: null} works; for large tries Map with char keys uses less memory than 26-array per node.

Common Bugs

Forgetting to mark board[i][j] = '#' before recursing — same cell reused, wrong matches reported.
Forgetting to restore board[i][j] = c after recursion — corrupts the board for subsequent DFS calls.
Recording the word multiple times because the same prefix is reached via different paths. Fix: node.word = None after recording.
Storing only an is_end flag at the terminal: works if you carry the path string through the DFS, but building that string is wasteful. Better: store the word itself at the terminal trie node.
Initializing children = [None] (length 1) instead of [None] * 26: raises IndexError at the first insert of any letter other than ‘a’.
Walking the trie before reading the grid letter — off-by-one; you should read the grid letter, look up parent.children[c], then descend.

Debugging Strategy

Trace the DFS on the LC 212 sample by hand: starting at (0, 0) = ‘o’, the root has child ‘o’, so descend; of the neighbors (1, 0) = ‘e’ and (0, 1) = ‘a’, only ‘a’ is a child of the “o” node, so the DFS continues with prefix “oa”; and so on. If your output is missing words, print node.word right after each trie lookup in dfs and verify that words are stored at the correct terminal nodes. If your output has spurious words, the visited mark or the terminal-word placement is wrong.

Mastery Criteria

Wrote Trie class with insert/search/startsWith in <8 minutes.
Solved LC 212 in <30 minutes from cold start.
Stated the speedup over per-word DFS and the complexity of the combined DFS.
Used node.word storage (not is_end + carry-string) for clean dedup.
Picked array-of-26 over hashmap for performance and justified it.
Solved LC 421 (bit-trie max XOR) using the same structure.

Lab 08 — Bitmask Dynamic Programming

Goal

Solve LC 847 — Shortest Path Visiting All Nodes — using bitmask DP over (visited-set, current-node) state. Internalize the recipe: when N ≤ 20-ish and the state involves “subset of which items have been used / visited / assigned”, a bitmask is the state and the transition is a bit OR.

Background Concepts

A bitmask is an integer interpreted as a set: bit k is 1 iff element k is in the set. Set operations:

Union: a | b. Intersection: a & b. Difference: a & ~b. Symmetric difference: a ^ b.
Add element k: a | (1 << k). Remove: a & ~(1 << k). Test: (a >> k) & 1 or a & (1 << k).
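Iterate all non-empty subsets of mask: s = mask; while s > 0: ...; s = (s - 1) & mask. This visits the 2^popcount(mask) − 1 non-empty subsets in decreasing order; handle the empty subset separately if you need it.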
Iterate set bits: while mask: k = (mask & -mask).bit_length() - 1; mask &= mask - 1.
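The idioms above, exercised on small masks (`subsets_of` and `set_bits` are illustrative helper names):

```python
def subsets_of(mask):
    """All non-empty submasks of mask, in decreasing order."""
    out = []
    s = mask
    while s > 0:
        out.append(s)
        s = (s - 1) & mask
    return out


def set_bits(mask):
    """Indices of set bits, lowest first."""
    out = []
    while mask:
        low = mask & -mask               # lowest set bit as a power of two
        out.append(low.bit_length() - 1)
        mask &= mask - 1                 # clear the lowest set bit
    return out


a = 0b1011                       # {0, 1, 3}
a |= 1 << 2                      # add 2    -> 0b1111
a &= ~(1 << 0)                   # remove 0 -> 0b1110
print(bin(a), (a >> 3) & 1)      # 0b1110 1
print(set_bits(a))               # [1, 2, 3]
print(subsets_of(0b101))         # [5, 4, 1]
```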

Bitmask DP stores dp[mask][...] indexed by the subset. Useful when N ≤ ~20 (so 2^N ≤ 10⁶) and the state must remember “exactly which subset has been processed”. It generalizes:

TSP-like: dp[mask][i] = min cost path that visited exactly mask, ending at i.
Subset-cover: dp[mask] = min cost to cover the items in mask, summed over the chosen groups.
Assignment problem: dp[mask] = min cost to assign the first popcount(mask) people to the jobs in mask.

For LC 847, we want shortest walk (edges may repeat) visiting all nodes. The state (mask, i) captures “I’ve visited the set mask of nodes (at least once) and I’m currently at node i”. Transitions: from (mask, i), move to any neighbor j, new state (mask | (1 << j), j), cost +1. We want the smallest distance to any state (full_mask, *). BFS suffices since edge cost is 1.

Interview Context

Bitmask DP is a small problem family, but recognizing it fast is a strong-hire signal. The trigger: N ≤ 20 with a constraint involving subsets. Asked at: Google occasionally, Stripe / Two Sigma, Meta in bar-raiser slot. The common trap is skipping the feasibility check: 2^N · N is fine at N = 15 (~5 × 10⁵) but not at N = 25 (~8 × 10⁸).

Problem Statement (LC 847)

Given an undirected, connected graph of N nodes labeled 0..N-1 as adjacency lists, return the length of the shortest path that visits every node. You may start and stop at any node, and you may revisit nodes and edges.

Constraints

1 ≤ N ≤ 12
0 ≤ graph[i].length < N
Graph is connected.

Clarifying Questions

Length = number of edges (not nodes)? (Yes — number of edges traversed.)
Are self-loops possible? (No.)
May the path start and end at different nodes? (Yes.)
Is the graph guaranteed connected? (Yes — answer always finite.)

Examples

graph = [[1, 2, 3], [0], [0], [0]]
       (star with center 0, leaves 1, 2, 3)
shortest path visiting all = 4 (e.g., 1 → 0 → 2 → 0 → 3)

graph = [[1], [0, 2, 4], [1, 3, 4], [2], [1, 2]]
shortest = 4

Initial Brute Force

DFS / backtracking from every starting node, exploring all walks up to some bounded length. Without memoization, walks can be exponential in length even for small graphs. A timeout and a hand-tuned bound make this brittle.

Brute Force Complexity

Unbounded (or exponential with bound). Practically TLE for any non-trivial test.

Optimization Path

The state space is (mask, current_node) with mask ∈ [0, 2^N) and current_node ∈ [0, N). Total states: N · 2^N. For N=12: 12 · 4096 = 49152. Each state has ≤ N-1 outgoing transitions; total edges: N² · 2^N ≈ 6 × 10⁵. Trivially feasible.

Since edge weights are 1, BFS over the state graph from all starting states {(1 << i, i) : i ∈ [0, N)} (all “I’ve visited just myself” states) gives the shortest distance to each state. Stop when we dequeue a state with mask = (1 << N) - 1.

Final Expected Approach

full = (1 << N) - 1.
Initialize a queue with all (mask=1<<i, node=i) for i ∈ [0, N).
seen[(mask, node)] initialized for those starts.
BFS: pop (mask, u); for each neighbor v, new_mask = mask | (1 << v); if (new_mask, v) unseen, enqueue with dist+1.
First time a state with mask == full is dequeued, return its distance.

Data Structures Used

deque for BFS frontier.
2D seen of shape [2^N][N] (boolean) or a set.
Distance tracked alongside state in the queue (dist+1 per step).

Correctness Argument

The state graph is a directed graph on N · 2^N states; an edge (mask, u) → (mask | (1 << v), v) exists iff v is a graph neighbor of u. A walk in the original graph that visits all N nodes corresponds to a path in the state graph from some (1<<i, i) to some (full, j). Edge costs are 1 (one edge traversed per state-graph edge). Therefore shortest walk = shortest path in state graph from the start set to any final state, computed by multi-source BFS.

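BFS visits each state at most once and stops the first time it dequeues a state with mask == full. Because all edge weights are equal, BFS dequeues states in nondecreasing order of distance, so that first full state has the minimum distance.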

Complexity

Quantity	Value
States	N · 2^N
Transitions	up to N² · 2^N
Time	O(N² · 2^N)
Space	O(N · 2^N)

At N=12: ~6 × 10⁵ ops. Fast.

Implementation Requirements

from collections import deque

def shortestPathLength(graph):
    n = len(graph)
    if n == 1: return 0
    full = (1 << n) - 1
    # State: (mask, node). The BFS distance travels with each state in the queue.
    seen = [[False] * n for _ in range(1 << n)]
    q = deque()
    for i in range(n):
        seen[1 << i][i] = True
        q.append((1 << i, i, 0))
    while q:
        mask, u, d = q.popleft()
        if mask == full:
            return d
        for v in graph[u]:
            new_mask = mask | (1 << v)
            if not seen[new_mask][v]:
                seen[new_mask][v] = True
                q.append((new_mask, v, d + 1))
    return -1  # unreachable for connected graphs

Tests

N=1: return 0.
N=2 with one edge: return 1.
Star example: 4.
Linear chain 0-1-2-3-4: shortest visiting all = 4.
Complete graph K_5: shortest = 4 (any Hamiltonian path).
Disconnected (constraint says connected, but defensive): return -1 / handle.
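Stress: random connected graphs with N = 8..12, cross-checked against a Held-Karp O(N² · 2^N) DP over all-pairs shortest-path distances (the metric closure). A Hamiltonian-path DP on the raw graph is the wrong reference, because the walk may revisit nodes.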
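One way to build that reference, sketched under the assumption that unit-weight shortest-path distances are the right metric: Floyd-Warshall first, then an open-path Held-Karp DP in which any node may start. Revisits are accounted for because moving between two nodes costs their shortest distance:

```python
def shortest_walk_held_karp(graph):
    """Reference for LC 847: cheapest Hamiltonian path in the metric closure."""
    n = len(graph)
    INF = float("inf")
    # All-pairs shortest path lengths with unit edge weights.
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u in range(n):
        for v in graph[u]:
            dist[u][v] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    full = (1 << n) - 1
    # dp[mask][i] = cheapest path that visits exactly the nodes in mask and ends at i.
    dp = [[INF] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = 0  # any node may be the start
    for mask in range(1 << n):  # increasing order: supersets come later
        for i in range(n):
            if dp[mask][i] == INF:
                continue
            for j in range(n):
                if (mask >> j) & 1:
                    continue
                nxt = mask | (1 << j)
                cand = dp[mask][i] + dist[i][j]
                if cand < dp[nxt][j]:
                    dp[nxt][j] = cand
    return min(dp[full])
```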

Follow-up Questions

“What if edges have weights?” → Dijkstra instead of BFS; same state graph.
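“What if I must start and end at node 0 (TSP)?” → Held-Karp: dp[mask][i] = min cost of a path that starts at 0, visits exactly mask, and ends at i; base dp[1][0] = 0; transition dp[mask | (1 << j)][j] = min(dp[mask | (1 << j)][j], dp[mask][i] + w(i, j)) for j not in mask. Answer: min over i of dp[full][i] + w(i, 0).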
“What if N=20?” → 20 · 2^20 = 2 × 10⁷ states, still ok. At N=25 we hit 8 × 10⁸ — likely TLE. The constraint cap on bitmask DP is N ~ 22.
“Subset-cover variant (LC 1125, Smallest Sufficient Team).” → dp[mask] = min team size to cover skill-mask mask; transition: for each person p with skill-mask pm, dp[mask | pm] = min(dp[mask | pm], dp[mask] + 1).
“Assignment problem in bitmask DP.” → dp[mask] = min cost to assign popcount(mask) people to the jobs in mask; transition over which job person popcount(mask) takes.
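A sketch of the assignment DP, assuming an n × n cost matrix where cost[p][j] is the cost of giving job j to person p (the function name is hypothetical):

```python
def min_assignment_cost(cost):
    """dp[mask] = min cost to assign persons 0..popcount(mask)-1 to exactly the jobs in mask."""
    n = len(cost)
    INF = float("inf")
    dp = [INF] * (1 << n)
    dp[0] = 0
    for mask in range(1 << n):
        if dp[mask] == INF:
            continue
        p = bin(mask).count("1")  # the next person to assign
        if p == n:
            continue
        for j in range(n):
            if not (mask >> j) & 1:  # job j still free
                nxt = mask | (1 << j)
                dp[nxt] = min(dp[nxt], dp[mask] + cost[p][j])
    return dp[(1 << n) - 1]
```

Because person popcount(mask) is always the next one assigned, each mask has a single meaning, giving O(N · 2^N) time instead of enumerating N! permutations.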

Product Extension

Vehicle routing / drone delivery with ≤ 20 stops: bitmask DP precomputes optimal tours offline. Assignment problems such as LC 1066 (Campus Bikes II: assign workers to bikes minimizing total Manhattan distance): the assignment-DP variant of bitmask DP runs in O(N · 2^N), beating the O(N · N!) permutation brute force at N = 15 by more than seven orders of magnitude.

Language/Runtime Follow-ups

Python: BFS with deque is fine at N = 12 but interpreter-bound; use PyPy if benchmarking. For N > 14, pre-flatten the seen array into a bytearray or switch to numpy.
Java: boolean[][] seen = new boolean[1 << n][n]. ArrayDeque<int[]> allocates an array object per state; for performance, pack (mask, node, dist) into a single long.
Go: idiomatic [][]bool. Use slice queue with head/tail indices to avoid alloc churn.
C++: vector<vector<bool>> seen(1 << n, vector<bool>(n)). Pack state into int (mask * n + node) and use vector<bool> of size n * (1 << n) for cache.
JS/TS: Uint8Array(n * (1 << n)) for seen; bitwise ops are 32-bit signed — fine for n ≤ 30.
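A sketch of the flattened-state idea in Python: (mask, node) is packed into mask * n + node, seen is a bytearray, and the BFS runs level by level so the distance is a loop counter rather than part of each queue entry. Whether it beats the deque version depends on the interpreter, so benchmark before relying on it:

```python
def shortest_path_length_flat(graph):
    """LC 847 BFS over packed states with a flat bytearray for seen."""
    n = len(graph)
    if n == 1:
        return 0
    full = (1 << n) - 1
    seen = bytearray(n << n)  # n * 2^n flags, all zero
    frontier = [(1 << i) * n + i for i in range(n)]  # every single-node start
    for s in frontier:
        seen[s] = 1
    d = 0
    while frontier:
        nxt = []
        for s in frontier:
            mask, u = divmod(s, n)  # unpack: s = mask * n + u with u < n
            if mask == full:
                return d
            for v in graph[u]:
                key = (mask | (1 << v)) * n + v
                if not seen[key]:
                    seen[key] = 1
                    nxt.append(key)
        frontier = nxt
        d += 1
    return -1  # unreachable for connected graphs
```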

Common Bugs

Forgetting that the path may revisit nodes — implementing as Hamiltonian path (mask exactly indicates visited once) is wrong for LC 847; use the right transition mask | (1 << v) (idempotent on already-visited nodes).
Initializing only one start state instead of all N — gives wrong answers because the optimal path may not start at node 0.
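Trusting the first full-mask state without the FIFO guarantee: the first full-mask state dequeued is minimal only because the queue processes states in nondecreasing distance order. Accidentally swapping the deque for a stack (DFS) silently breaks this.
Writing mask & 1 << v == 0 without parentheses: in C, C++, Java, and JS, == binds tighter than &, so write (mask & (1 << v)) == 0; Java also rejects an int where a boolean is expected. Python parses a & b == 0 as (a & b) == 0, but explicit parentheses or (mask >> v) & 1 read more clearly.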
Allocating seen = [[False] * n] * (1 << n) (shared row reference) — Python beginner trap.
Off-by-one on full = (1 << n) - 1 vs (1 << n).

Debugging Strategy

For the N=4 star, hand-simulate with 4-bit binary masks (bit k = node k). Starting from the center: (0001, 0) at d=0 → (0011, 1) at d=1 → (0011, 0) at d=2 → (0111, 2) at d=3 → (0111, 0) at d=4 → (1111, 3) at d=5, which is five edges. Starting from a leaf does better: (0010, 1) at d=0 → (0011, 0) at d=1 → (0111, 2) at d=2 → (0111, 0) at d=3 → (1111, 3) at d=4, matching the expected answer 4 (walk 1 → 0 → 2 → 0 → 3). The optimal walk may start anywhere, which is why BFS must be seeded with every start state.

If your code returns 5 on this input, you most likely seeded BFS only at node 0 instead of at every (1 << i, i).

Mastery Criteria

Recognized “bitmask DP” within 60 seconds when N ≤ 20 and state involves subsets.
Wrote shortestPathLength from scratch in <20 minutes.
Stated state space size and confirmed feasibility for the given N.
Solved one cousin (LC 1125, LC 943, LC 691) from cold start.
Used proper bit operations (no string-based mask handling).
Articulated when bitmask DP fails (N > 22 → 2^N too large).

Lab 09 — Meet in the Middle

Goal

Solve LC 1755 (Closest Subsequence Sum) via meet-in-the-middle: split the array into two halves, enumerate 2^(N/2) subset sums in each half, sort one half, and use binary search / two-pointer to find the pair-sum closest to the goal. Internalize the technique as the standard recipe whenever N is in the awkward zone 30 ≤ N ≤ 40: too large for 2^N enumeration, and with values too wide for a pseudo-polynomial DP indexed by sum.

Background Concepts

Meet in the middle (MITM) trades memory for time by halving the exponent: instead of enumerating all 2^N subsets in one go (infeasible at N = 40), enumerate the 2^(N/2) subsets of each half (about 10⁶ each at N = 40), store their sums, then combine.

The combination step depends on the problem:

Closest sum to goal (this lab): sort one half’s sums; for each sum L of the left half, binary-search the right half’s sums for goal − L.
Count of subset-pairs with sum ≤ K: sort both halves; two-pointer.
Find any subset summing to S: hashmap of one half’s sums; for each sum L, check if S − L is in the map.
k-th smallest subset sum: more elaborate — heap-merge two sorted lists.
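For the counting combine, a sketch of the two-pointer sweep (the function name is hypothetical; the count includes the empty subset):

```python
def count_subsets_at_most(nums, k):
    """Number of subsets (including empty) with sum <= k, via MITM + two pointers."""
    def all_sums(arr):
        sums = [0]
        for x in arr:
            sums += [s + x for s in sums]  # every old subset, with and without x
        return sums

    half = len(nums) // 2
    left = sorted(all_sums(nums[:half]))
    right = sorted(all_sums(nums[half:]))
    count = 0
    j = len(right) - 1
    # As a grows, the largest admissible right[j] can only move left.
    for a in left:
        while j >= 0 and a + right[j] > k:
            j -= 1
        count += j + 1  # right[0..j] all pair with a
    return count
```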

The asymptotics: O(2^(N/2) · N/2) to enumerate, and O(2^(N/2) · log(2^(N/2))) = O(2^(N/2) · N/2) to sort and combine. At N = 40: 2^20 ≈ 10⁶, total ~2 × 10⁷. Feasible.

Interview Context

MITM is a niche but high-impact technique. Asked at Google (occasional), CP-flavored shops (frequent), and any problem set with N in [30, 45]. The signal: “N is around 30-40, brute force is 2^N, no polynomial DP visible because the state involves arbitrary subset sums”. Recognizing it converts a hopeless problem into a 30-minute solve. Not recognizing it caps you at “I would brute force but it TLEs” — a soft no-hire.

Problem Statement (LC 1755)

Given an integer array nums and integer goal, return the minimum absolute difference |sum(sub) − goal| over all subsequences (subsets) of nums, including the empty subsequence (sum 0).

Constraints

1 ≤ |nums| ≤ 40
−10⁷ ≤ nums[i] ≤ 10⁷
−10⁹ ≤ goal ≤ 10⁹

Clarifying Questions

Subsequence = subset (unordered)? (Yes — LC’s “subsequence” here is order-independent because we only care about sum.)
Empty subsequence allowed (sum 0)? (Yes per LC 1755.)
Do sums fit in a 32-bit int? (Max |sum| = 40 · 10⁷ = 4 × 10⁸, and |sum − goal| ≤ 1.4 × 10⁹ < 2³¹ − 1, so yes. Use 64-bit defensively.)

Examples

nums = [5, -7, 3, 5], goal = 6 → 0   (the whole array: 5 - 7 + 3 + 5 = 6)
nums = [7, -9, 15, -2], goal = -5 → 1  (the subset {7, -9, -2} sums to -4, and |-4 - (-5)| = 1)
nums = [1, 2, 3], goal = -7 → 7  (closest is empty sum 0)

Initial Brute Force

Enumerate all 2^N subsets, compute each sum, track minimum |sum - goal|.
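The brute force remains useful as a reference for stress tests at small N; a minimal sketch:

```python
def min_abs_difference_brute(nums, goal):
    """O(2^N * N) reference: try every subset, including the empty one."""
    n = len(nums)
    best = abs(goal)  # empty subset
    for mask in range(1, 1 << n):
        s = 0
        for i in range(n):
            if (mask >> i) & 1:
                s += nums[i]
        best = min(best, abs(s - goal))
    return best
```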

Brute Force Complexity

O(2^N · N). At N = 40: ~4 × 10¹³, roughly five to six orders of magnitude over a ~10⁸ ops/sec budget. TLE.

Optimization Path

DP by sum? Sums range over [-4 × 10⁸, 4 × 10⁸], far too wide for a value-indexed (pseudo-polynomial) DP. So DP is out.

O(2^(N/2)) enumeration: 2^20 ≈ 10⁶ per half. Feasible in time, requires the combine step.

For closest-sum-to-goal, sort one half’s sums; for each L in the other half, binary-search goal - L; check the two candidates around the insertion point. Total: O(2^(N/2) · N/2 + 2^(N/2) · log(2^(N/2))) = O(2^(N/2) · N).

Final Expected Approach

Split nums into halves A (first N/2) and B (last N - N/2).
Enumerate all subset sums of A: list sumsA of size 2^|A|.
Enumerate all subset sums of B: list sumsB of size 2^|B|.
Sort sumsB.
Initialize best = |goal| (the empty subset). Subsets drawn entirely from A need no special case: 0 is in sumsB, and the next step pairs each a with it.
For each a in sumsA: binary-search sumsB for goal - a; check sumsB[idx] and sumsB[idx-1] (the two closest); update best.
Return best.

Data Structures Used

Two flat lists of subset sums.
bisect (Python) / Arrays.binarySearch (Java) / sort.Search (Go) / lower_bound (C++).

Correctness Argument

Every subset of nums decomposes uniquely as (left ∪ right) where left ⊆ A and right ⊆ B. So sum(subset) = a + b for some a ∈ sumsA, b ∈ sumsB. We want min_{a, b} |a + b - goal| = min_a min_b |b - (goal - a)|. For a fixed a, the inner min over b is solved by binary search in sorted sumsB: the closest element is at the insertion point or one position to its left. Iterating over all a ∈ sumsA covers all subsets.

Including 0 in both sumsA and sumsB covers the empty-side cases.

Complexity

Quantity	Value
Enumerate sums	O(N · 2^(N/2))
Sort	O(2^(N/2) · log(2^(N/2))) = O(N · 2^(N/2))
Binary search loop	O(2^(N/2) · log(2^(N/2))) = O(N · 2^(N/2))
Total time	O(N · 2^(N/2))
Space	O(2^(N/2))

At N = 40: ~4 × 10⁷ ops. Roughly: well under 1 sec in C++, a few seconds in CPython.

Implementation Requirements

from bisect import bisect_left

def minAbsDifference(nums, goal):
    def all_sums(arr):
        sums = [0]
        for x in arr:
            sums = sums + [s + x for s in sums]
        return sums

    n = len(nums)
    A = nums[: n // 2]
    B = nums[n // 2 :]
    sumsA = all_sums(A)
    sumsB = sorted(all_sums(B))

    best = abs(goal)  # corresponds to the empty subset
    for a in sumsA:
        target = goal - a
        idx = bisect_left(sumsB, target)
        if idx < len(sumsB):
            best = min(best, abs(a + sumsB[idx] - goal))
        if idx > 0:
            best = min(best, abs(a + sumsB[idx - 1] - goal))
        if best == 0:
            return 0
    return best

Tests

N=1, [5], goal=5 → 0.
N=1, [5], goal=0 → 0 (empty subset).
LC 1755 sample: [5, -7, 3, 5], goal=6 → 0.
LC 1755 sample 2: [7, -9, 15, -2], goal=-5 → 1.
LC 1755 sample 3: [1, 2, 3], goal=-7 → 7.
All zeros: any goal → |goal|.
Repeated max value: [10^7] * 40, goal = 0 → 0 (empty subset).
Random arrays with N ≤ 20: cross-check against the 2^N brute force. Random N = 40 arrays: check the running time.

Follow-up Questions

“Now I need to count subsets with sum exactly S.” → enumerate sums of both halves; for each sum a in A, count occurrences of S - a in B (bucket by value or use a Counter).
“Now I need subsets with sum in [L, R].” → sort B; for each a, binary-search the count of B-elements in [L - a, R - a] using two bisect calls.
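“What if the array has 50 elements?” → 2^25 ≈ 3 × 10⁷ sums per half, which is borderline. Memory at 8 bytes per sum is 256 MB per half. Need to drop to bitset or stream.
“Subset-product instead of sum?” → enumerate products instead; the combine is analogous, but zeros and negative values need separate handling, because products do not order or invert the way sums do.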
“k-th smallest subset sum across all subsets.” → k-way merge using a min-heap from sorted subset-sum lists per half.
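Sketches of the first two follow-ups, using the same doubling enumeration as the implementation (function names are illustrative; both counts include the empty subset):

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def subset_sums(arr):
    sums = [0]
    for x in arr:
        sums += [s + x for s in sums]
    return sums


def count_sum_exactly(nums, target):
    """Number of subsets whose sum equals target."""
    half = len(nums) // 2
    right_counts = Counter(subset_sums(nums[half:]))  # missing keys count as 0
    return sum(right_counts[target - a] for a in subset_sums(nums[:half]))


def count_sum_in_range(nums, lo, hi):
    """Number of subsets with lo <= sum <= hi."""
    half = len(nums) // 2
    right = sorted(subset_sums(nums[half:]))
    total = 0
    for a in subset_sums(nums[:half]):
        # b must satisfy lo - a <= b <= hi - a
        total += bisect_right(right, hi - a) - bisect_left(right, lo - a)
    return total
```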

Product Extension

Knapsack-style (subset-sum) cryptosystems such as Merkle–Hellman, and certain integer programming problems with ~40 binary variables: MITM is the generic attack / solver. Portfolio optimization with a small basket of asset switches; molecular conformation enumeration. Whenever you have a “binary vector with cost”, N ≈ 40, and no obvious polynomial structure: MITM is the move.

Language/Runtime Follow-ups

Python: list-comprehension enumeration as shown is clean. For tighter constants, use numpy to build all subset sums via repeated s = np.concatenate((s, s + x)).
Java: long[] sumsA, sumsB. Arrays.sort is O(n log n) and Arrays.binarySearch is O(log n) per call; on a miss it returns -(insertion point) - 1, so decode the insertion point before checking neighbors. Watch heap pressure at 2 × 10⁶ longs ≈ 16 MB.
Go: sort.Slice(sumsB, ...); sort.SearchInts for binary search.
C++: vector<long long> of size 2^20 = 8 MB each. std::sort, std::lower_bound. Idiomatic.
JS/TS: Number is safe to ±2⁵³; sums of ±4 × 10⁸ are tiny. Use plain Array + Array.prototype.sort((a, b) => a - b).

Common Bugs

Splitting halves as [:n//2] and [n//2:] but accidentally using n // 2 + 1 in one of the slices: an element is either dropped or placed in both halves, so subsets are missed or double-counted.
Forgetting to include 0 (empty subset) in either half — fix by initializing sums = [0].
Sorting only one half but binary-searching as if both are sorted, or vice versa.
Initial best = float('inf') is fine, but initial best = abs(goal) is more honest about the empty-subset case.
After binary search, only checking sumsB[idx] and missing sumsB[idx-1] (the next-smaller) — the closest element can be on either side.
Using a set instead of sorted list — kills the binary search.

Debugging Strategy

For small N=4, enumerate all 16 subset sums by hand and verify the MITM result. Print sumsA, sumsB (sorted) and walk one binary search by hand. If the result is consistently too large by some |x|, you forgot to include 0; if too small, you’re double-counting (e.g., overlapping halves).

Mastery Criteria

Recognized N=30..45 + arbitrary subset-sum constraint as the MITM trigger within 90 seconds.
Wrote MITM closest-sum from scratch in <25 minutes.
Stated time complexity O(N · 2^(N/2)) and space O(2^(N/2)) without prompting.
Solved LC 1755 in <30 minutes from cold start.
Solved one cousin (LC 956 — Tallest Billboard, with a twist; or “find subset with sum closest to half”) from cold start.
Articulated MITM’s failure point (N ≥ 50 → memory and time both blow up).

Phase 4 — Graph Mastery

Target level: Medium → Hard
Expected duration: 3 weeks (12-week track) / 3 weeks (6-month track) / 4 weeks (12-month track)
Weekly cadence: ~7 algorithms per week + 40–70 problems applying them under the framework

Why Graphs Are The Most-Tested Algorithm Family In Senior Interviews

Phase 2 taught you 28 patterns that solve most Mediums. Phase 3 taught you the augmented data structures that make Hards tractable. Phase 4 teaches the single algorithmic family that shows up more often than any other in senior, staff, and infrastructure interviews: graphs.

Here is the empirical claim, and it is the entire reason this phase exists as its own three-to-four-week unit:

Roughly one in three of all Medium-Hard and Hard interview problems at top-tier companies is a graph problem in disguise. Of those, at least half are not labeled “graph” — they are labeled “string”, “scheduler”, “permission system”, “currency conversion”, “build pipeline”, or “bus route”. The first job is recognizing that the problem is a graph problem. The second is picking the right algorithm. The third is implementing it without bugs.

Why graphs dominate senior interviews specifically:

Graphs are the universal modeling language. Almost every relational or topological structure in a real production system — service dependencies, ACL inheritance, package builds, request routing, social networks, fraud rings, knowledge bases, scheduling DAGs, currency markets — is a graph. A senior engineer is expected to see the graph in a problem that doesn’t mention one.
Graphs combine almost every Phase 1–3 building block. A real graph problem will fold in a hash map (adjacency list), a queue (BFS), a stack (DFS), a heap (Dijkstra), a DSU (Kruskal / cycle detection), a topological sort (dependency resolution), and sometimes a segment tree (Euler tour + RMQ for LCA). The interviewer gets to test ten primitives in a single 35-minute round.
Graphs have a clean correctness story. Each algorithm here is a named result with a known proof, known preconditions, and a known complexity. There is no “I think this works because…” — there is “this is BFS, BFS gives shortest paths in unweighted graphs, the precondition is unit-weight edges, the proof is on layer numbers.” Senior interviewers want to hear that proof come out unprompted.
The graph algorithm space is large but finite. Roughly 20 algorithms cover everything you’ll see at L4–L6. Past that, max-flow, min-cost-max-flow, and matching variants cover staff-and-above. There is a definite ceiling — but it’s higher than candidates expect, and it’s where senior interviewers live.

Candidates who stall on graph rounds almost always fail at recognition or modeling, not at the algorithm itself. They fail because:

They didn’t recognize “alien dictionary” as a topological sort over inferred constraints.
They didn’t see “minimum cost to connect all points” as Kruskal/Prim on the complete metric graph.
They reached for Dijkstra on a graph with negative weights and produced wrong answers.
They tried BFS on a 0-1 weighted graph instead of 0-1 BFS or Dijkstra.
They forgot to coordinate-compress an implicit graph and exploded the state space.

This phase is structured to make those failures impossible. You will internalize the signal for each of 21 algorithms, the modeling reflex for implicit graphs, and the algorithm-selection decision tree that maps a problem statement to a single correct technique within 90 seconds.

After this phase, you can solve unmistakably-Hard graph problems on first attempt: alien dictionary in 20 minutes, network delay time in 10, cheapest flights in K stops in 25, accounts merge in 20, bus routes in 30, min cost to connect all points in 15. You also become visibly stronger in mock interviews because you immediately reach for adjacency lists, write from collections import deque before you write any logic, and articulate which algorithm you’re running and why.

What You Will Be Able To Do After This Phase

Recognize that a problem is a graph problem in <2 minutes of reading, even when neither “graph”, “node”, nor “edge” appears in the statement.
Choose between BFS / DFS / Dijkstra / Bellman-Ford / Floyd-Warshall / 0-1 BFS / topological sort / DSU / MST in <60 seconds based on the problem’s edge weights, query type, and size.
Implement a clean adjacency-list representation in <2 minutes for any graph variant (directed, undirected, weighted, multi-edge, self-loop, implicit grid).
Implement BFS, DFS (recursive + iterative), Dijkstra (eager and lazy), and topological sort (Kahn + DFS) from memory in <8 minutes each.
Detect cycles in directed and undirected graphs by both DFS-coloring and DSU.
Run Kruskal and Prim end-to-end with a DSU you write by hand.
Identify when a Hard problem reduces to bipartite matching or max-flow at the modeling level (you do not need to memorize Dinic).
Articulate the correctness theorem for every algorithm you use (“Dijkstra is correct because the heap always extracts the next-closest unsettled node, and that node’s tentative distance is its true shortest distance under non-negative weights”).
Recognize negative-cycle problems and reach for Bellman-Ford / SPFA correctly.
Construct the implicit graph for grid problems, word ladders, state-space search, and bus-route problems without ever materializing all edges.

How To Read This Phase

Read this README in two passes. Pass 1: linear, end to end, building a mental map of which algorithm plugs which signal. Pass 2: as you work the labs, refer back to specific algorithm entries to clarify invariants and pitfalls.

Each algorithm entry has a fixed shape:

When To Use — the problem signal that should fire this algorithm in <2 minutes.
Complexity — time and space, with the assumptions that matter.
Correctness Sketch — one paragraph that you should be able to recite under interviewer pressure.
Common Pitfalls — the bugs that consume the most interview minutes.
Classic Problems — 3–6 representative LeetCode problems where the algorithm is the intended solution.

The phase ends with a Graph-Modeling Cheat Sheet (how to recognize a graph problem in disguise), an Implicit-Graph Catalog (grid / word-ladder / state-space), a Mastery Checklist, and Exit Criteria.

Inline Graph Algorithm Reference

1. Graph Representation

When To Use

Every graph problem starts here. The choice between adjacency list, adjacency matrix, edge list, and implicit graph is the first decision you make, and it shapes every subsequent algorithm’s complexity.

Adjacency list — the default. adj[u] is a list of (neighbor, weight) pairs. Use a dict of lists (Python), Map<Integer, List<int[]>> (Java), [][]int or map[int][]edge (Go), vector<vector<pair<int,int>>> (C++).
Adjacency matrix — M[u][v] is the edge weight (or 0 / ∞ for absence). Use only when (a) V ≤ 500 so the O(V²) memory fits, (b) you do many (u, v) edge-existence queries, or (c) you’re running Floyd-Warshall.
Edge list — a flat list of (u, v, w) triples. Use only when the algorithm is edge-centric: Kruskal, Bellman-Ford.
Implicit graph — never materialize the edges. The neighbors of a state are computed on demand. Used for grids, word ladders, sliding puzzles, state-space search.

Complexity

Representation	Space	Edge query	Iterate neighbors
Adjacency list	O(V + E)	O(deg(u))	O(deg(u))
Adjacency matrix	O(V²)	O(1)	O(V)
Edge list	O(E)	O(E)	O(E)
Implicit	O(state)	O(neighbor-fn)	O(neighbor-fn)

Correctness Sketch

The representation is a faithful encoding of the graph; the algorithm’s correctness is independent of representation as long as iteration over neighbors is exhaustive and edge weights are preserved. Use the representation that minimizes the algorithm’s dominant cost.

Common Pitfalls

Undirected edges added once instead of twice. adj[u].append(v) without adj[v].append(u) silently breaks every traversal that relies on bidirectionality.
Multi-edges silently lost when using a set instead of a list for neighbors. If the problem permits multi-edges, use lists; if not, decide explicitly.
Self-loops can appear in problems that don’t seem to allow them (e.g., topological sorts of “course depends on itself”). Handle defensively.
Indexing on string keys — convert string node IDs to ints once, up front. Hash lookups inside hot loops cost real time.
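A minimal Python sketch of these conventions, assuming edges arrive as (u, v, w) triples with string node IDs; the function name build_adjacency is hypothetical. It maps IDs to ints once, keeps neighbors in lists (so multi-edges survive), and adds both directions for undirected graphs.

```python
def build_adjacency(edges, directed=False):
    """Map string IDs to 0..V-1 once, then build adj[u] = [(v, w), ...]."""
    index = {}  # string ID -> int, assigned on first sight
    for u, v, _ in edges:
        for name in (u, v):
            if name not in index:
                index[name] = len(index)
    adj = [[] for _ in range(len(index))]
    for u, v, w in edges:
        a, b = index[u], index[v]
        adj[a].append((b, w))          # lists (not sets) keep multi-edges
        if not directed:
            adj[b].append((a, w))      # undirected: add the reverse direction too
    return index, adj
```

After this one-time conversion, every hot loop works on plain int indices.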

Classic Problems

LeetCode 261 — Graph Valid Tree (tests representation + cycle detection).
LeetCode 332 — Reconstruct Itinerary (multi-edges matter; use a heap-of-destinations).
LeetCode 547 — Number of Provinces (matrix vs list tradeoff).

2. BFS — Breadth-First Search (Unweighted Shortest Path)

When To Use

Find shortest path in number of edges in an unweighted (or unit-weight) graph.
Layer-by-layer traversal: “all nodes at distance ≤ k”, “minimum number of moves”, “level-order traversal”.
The problem says “minimum / shortest / fewest” and the edge weights are all equal.
Implicit graph variants: shortest word ladder, shortest path in a maze, fewest knight moves on a chessboard.

Complexity

Time O(V + E). Space O(V) for the queue and visited set.

Correctness Sketch

BFS discovers nodes in non-decreasing order of distance from the source, and a node’s distance is fixed the moment it is first discovered (enqueued). Suppose v is discovered from a node u at distance d - 1, so it gets distance d. If some path reached v in d' < d edges, that path’s second-to-last node would have distance ≤ d' - 1 < d - 1, so it would have been dequeued before u and would have discovered v first, at distance ≤ d'. That contradiction shows d is the true shortest distance.

Common Pitfalls

Marking visited on dequeue, not on enqueue. If you mark on dequeue, the same node can be enqueued by every neighbor before it’s processed once — exploding the queue to O(E) size and degrading performance.
Confusing queue length with distance. len(queue) is not the current distance; either store (node, dist) tuples (or a dist array), or process the queue in level batches via for _ in range(len(queue)).
Not separating the visited check from the enqueue. if v not in visited: visited.add(v); queue.append(v) is the canonical idiom.
Forgetting to handle the source itself. The source’s distance is 0; it should be marked visited at start.
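A minimal sketch of the canonical idiom, assuming adj is a list of neighbor lists over nodes 0..V-1 (the function names are hypothetical). The first function stores distances in an array that doubles as the visited set; the second shows the level-batch style.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest distances from source; -1 marks unreachable nodes."""
    dist = [-1] * len(adj)
    dist[source] = 0                 # the source is visited before the loop starts
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:        # mark on enqueue, so each node enters the queue once
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def bfs_levels(adj, source):
    """Same traversal, processed one layer at a time."""
    seen = {source}
    frontier = deque([source])
    levels = []
    while frontier:
        level = []
        for _ in range(len(frontier)):   # exactly the nodes of the current layer
            u = frontier.popleft()
            level.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        levels.append(level)
    return levels
```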

Classic Problems

LeetCode 102 — Binary Tree Level Order Traversal.
LeetCode 127 — Word Ladder (canonical BFS on implicit graph). See Lab 01.
LeetCode 200 — Number of Islands (BFS variant on grid).
LeetCode 433 — Minimum Genetic Mutation.
LeetCode 1091 — Shortest Path in Binary Matrix.

3. DFS — Depth-First Search (Recursive + Iterative; Pre/Post Numbering)

When To Use

Connected-component enumeration, cycle detection, topological sort, tree traversal, articulation-point detection.
Backtracking-style problems where you exhaustively explore a state space.
When path matters more than distance — DFS finds some path, not necessarily the shortest.
When the graph has small branching but deep paths.

Complexity

Time O(V + E). Space O(V) for the recursion stack (or explicit stack).

Correctness Sketch

DFS examines each edge a constant number of times: once per direction in an undirected graph (so twice per edge) and exactly once in a directed graph. Pre-order numbering captures discovery time; post-order captures finish time. The discovery/finish interval structure underpins SCC, articulation-point, and bridge algorithms (Tarjan’s lowlink uses pre-order numbers as ranks).

Common Pitfalls

Stack overflow at V = 10^5 in Python. Default recursion limit is 1000. Either sys.setrecursionlimit(2 * 10**5) or convert to an explicit stack.
Iterative DFS state. When converting to a stack, you need to track where in the neighbor iteration you are — a tuple of (node, iterator) works; a tuple of (node, neighbor_index) is faster.
Pre vs post processing confusion. “Print on entry” is pre-order; “print on completion” is post-order; topological sort uses reverse post-order.
Visited semantics differ from BFS. For cycle detection in directed graphs, you need three states: white (unvisited), gray (on the current DFS path), black (fully explored). A single boolean visited is insufficient.
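An iterative DFS sketch using the (node, neighbor_index) frame described above, assuming adj is a list of neighbor lists; dfs_orders is a hypothetical name. It records pre-order on discovery and post-order on finish, with no recursion limit to worry about.

```python
def dfs_orders(adj, root):
    """Iterative DFS returning (preorder, postorder) without recursion."""
    visited = [False] * len(adj)
    pre, post = [], []
    visited[root] = True
    pre.append(root)
    stack = [(root, 0)]                  # (node, index of next neighbor to try)
    while stack:
        u, i = stack[-1]
        if i < len(adj[u]):
            stack[-1] = (u, i + 1)       # advance this frame's cursor
            v = adj[u][i]
            if not visited[v]:
                visited[v] = True
                pre.append(v)            # discovery: pre-order
                stack.append((v, 0))
        else:
            stack.pop()
            post.append(u)               # finish: post-order
    return pre, post
```

On a DAG, reversing the returned post-order yields a topological order.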

Classic Problems

LeetCode 200 — Number of Islands. See Lab 02.
LeetCode 695 — Max Area of Island.
LeetCode 207 — Course Schedule (cycle detection via DFS coloring).
LeetCode 332 — Reconstruct Itinerary (Hierholzer’s = post-order DFS).
LeetCode 130 — Surrounded Regions.

4. Multi-Source BFS

When To Use

“From any of these K starting points, what’s the shortest distance to every other node?” — common in grids.
Equivalent to adding a virtual super-source connected to all K starts with weight 0 and running single-source BFS. But you don’t materialize the super-source: you just enqueue all K starts at distance 0 simultaneously.
Examples: “rotting oranges” (every rotten orange is a source), “walls and gates” (every gate is a source), “01 matrix distance from nearest zero” (every zero is a source).

Complexity

Time O(V + E). Same as single-source BFS — the K starts add O(K) but are absorbed into the V + E term.

Correctness Sketch

The super-source argument: imagine a virtual node S₀ with a zero-weight edge to every start. The distance from S₀ to any real node v is min over i of dist(start_i, v). Initializing the queue with all starts at distance 0 is exactly the state a search from S₀ reaches after expanding S₀, so we get the same layer-by-layer behavior and the same correctness proof without materializing S₀.

Common Pitfalls

Initializing one start at a time in a loop and running single-source BFS K times. That’s O(K · (V + E)), not O(V + E).
Forgetting to mark all starts visited up front. If you only mark the first as visited, the others are treated as unvisited targets and get re-enqueued at distance > 0.
Mixing cell types in problems where both sources and targets are special (e.g., “rotting oranges” has rotten = source, fresh = target, empty = skip). Classify every cell in a single pass before BFS.
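A minimal multi-source BFS sketch in the “01 Matrix” (LC 542) framing, assuming a rectangular grid of 0/1 ints; the function name is hypothetical. All sources are seeded and marked before the loop starts.

```python
from collections import deque

def distance_to_nearest_zero(grid):
    """Multi-source BFS: every 0 cell is a source at distance 0."""
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):                # one classification pass; seed ALL sources
        for c in range(cols):
            if grid[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```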

Classic Problems

LeetCode 994 — Rotting Oranges. See Lab 03.
LeetCode 286 — Walls and Gates.
LeetCode 542 — 01 Matrix.
LeetCode 1162 — As Far From Land As Possible.
LeetCode 815 — Bus Routes (multi-source on the bus-line graph). See Lab 09.

5. 0-1 BFS

When To Use

Edge weights are in {0, 1}, or more generally {0, c} for a single positive constant c (divide by c, run 0-1 BFS, then multiply back). The 0-weight edge is the “free” transition.
The graph mixes “free” transitions (0-weight) and “step” transitions (1-weight). Examples: grid with portals, terrain with roads (free) and trails (cost 1).
Dijkstra would also work but has a log V overhead. 0-1 BFS is O(V + E) — strictly faster.

Complexity

Time O(V + E). Space O(V).

Correctness Sketch

Use a deque. When relaxing an edge of weight 0, push the neighbor to the front; when relaxing weight 1, push to the back. The deque thus holds nodes in non-decreasing order of tentative distance, with at most two distinct distance values present at any moment. The first time a node is popped, its distance is final — same correctness argument as Dijkstra, with the deque playing the role of a 2-bucket priority queue.

Common Pitfalls

Pushing weight-1 edges to the front is the canonical bug — it breaks the monotone-distance invariant.
Re-processing nodes because you didn’t check if d > dist[u]: continue after popping. This is a Dijkstra-style guard.
Generalizing to two or more distinct positive weights (e.g., {0, 1, 2}) doesn’t work directly; you need either a multi-bucket BFS (Dial’s algorithm) or actual Dijkstra.
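A minimal 0-1 BFS sketch, assuming adj[u] is a list of (v, w) pairs with w in {0, 1}; the function name is hypothetical. It keeps the Dijkstra-style staleness guard, since a node can be pushed more than once.

```python
from collections import deque

def zero_one_bfs(adj, source):
    """Shortest distances with 0/1 edge weights; inf marks unreachable nodes."""
    INF = float("inf")
    dist = [INF] * len(adj)
    dist[source] = 0
    dq = deque([(0, source)])
    while dq:
        d, u = dq.popleft()
        if d > dist[u]:                  # stale entry: u was improved after this push
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                if w == 0:
                    dq.appendleft((nd, v))   # free edge: same distance, goes to the front
                else:
                    dq.append((nd, v))       # unit edge: next distance, goes to the back
    return dist
```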

Classic Problems

LeetCode 1368 — Minimum Cost to Make at Least One Valid Path in a Grid (canonical 0-1 BFS).
LeetCode 2290 — Minimum Obstacle Removal to Reach Corner.
“Shortest path with at most K edges of cost 1, others free” — folklore.

6. Dijkstra (Lazy + Eager Variants; Non-Negative Weights)

When To Use

Single-source shortest path in a graph with non-negative edge weights.
The default for any “shortest / cheapest / minimum cost” path problem with weighted edges. If weights can be negative, use Bellman-Ford instead.
Variants: “shortest path with at most K edges” (relax with a (dist, edges_used) state), “second shortest path” (two distance arrays), “shortest path on multi-criteria” (state expansion).

Complexity

Lazy (binary heap): O((V + E) log V). The heap holds up to E entries because we don’t decrease-key — we re-insert and skip stale entries on pop.
Eager (binary heap with decrease-key): O((V + E) log V) — same asymptotic, smaller constant, but decrease-key requires an indexed heap.
Fibonacci heap: O(E + V log V) — theoretical, never used in interviews.
Space O(V) for the dist array + O(E) for the heap (lazy).

Correctness Sketch

Maintain a tentative distance dist[v] for every node, initialized to ∞ except the source (0). Repeatedly extract the unsettled node u with smallest dist[u] (the heap gives this in O(log V)). At the moment of extraction, dist[u] is final: any other path to u must go through some other unsettled node w with dist[w] ≥ dist[u], and since edges are non-negative, the total path length to u via w is ≥ dist[u]. Relax all outgoing edges from u and push updated neighbors. Termination: every node is extracted at most once.

Common Pitfalls

Using Dijkstra on negative-weight edges. It produces wrong answers — the relaxation invariant fails. Use Bellman-Ford.
Lazy variant: forgetting the staleness check. if d > dist[u]: continue after popping. Without it, you re-scan u’s neighbors once per stale heap entry, and the running time can degrade far beyond O((V + E) log V).
Pushing (dist[u], u) instead of (new_dist, neighbor) when relaxing — the heap orders on the first tuple element, so put dist first.
Initializing dist[source] = 0 but not pushing the source to the heap. The first pop must be the source.
Forgetting to handle disconnected components. dist[v] = ∞ is the answer for unreachable v; check before printing.
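A minimal lazy-Dijkstra sketch using Python’s heapq, assuming adj[u] is a list of (v, w) pairs with non-negative w; the function name is hypothetical. Unreachable nodes keep distance infinity.

```python
import heapq

def dijkstra(adj, source):
    """Lazy Dijkstra: re-insert on improvement, skip stale entries on pop."""
    INF = float("inf")
    dist = [INF] * len(adj)
    dist[source] = 0
    heap = [(0, source)]                 # (distance, node): distance first so the heap orders on it
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                  # staleness check
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))   # push (new_dist, neighbor), not (dist[u], u)
    return dist
```

For LC 743, the answer is max(dist) if it is finite, else -1.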

Classic Problems

LeetCode 743 — Network Delay Time. See Lab 04.
LeetCode 1631 — Path With Minimum Effort.
LeetCode 778 — Swim in Rising Water.
LeetCode 787 — Cheapest Flights Within K Stops (modified Dijkstra with edge-budget).
LeetCode 1514 — Path with Maximum Probability (Dijkstra on max-multiplicative).

7. Bellman-Ford (Negative Weights, Negative-Cycle Detection)

When To Use

Shortest path with negative edge weights but no negative cycle reachable from source.
Negative-cycle detection itself: if a V-th iteration relaxes any edge, a negative cycle is reachable from the source.
“Shortest path with at most K edges” — Bellman-Ford’s iteration index is the edge budget. This is the canonical reframing of LeetCode 787.
All-pairs shortest paths with negative edge weights via Johnson’s algorithm (Bellman-Ford for reweighting, then Dijkstra from every node); overview only.

Complexity

Time O(V · E). Space O(V).

Correctness Sketch

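After i rounds of relaxing all E edges, dist[v] is at most the length of the shortest path from the source to v that uses at most i edges. It can be even smaller, because relaxations within one round can chain; if you need exactly “at most i edges”, relax from a copy of the previous round’s array. By induction: the claim holds trivially for i = 0. In round i, take a shortest path to v with at most i edges whose last edge is (u, v); after round i - 1, dist[u] is at most the length of the path’s prefix, so relaxing (u, v) in round i brings dist[v] down to at most the full path length. Since shortest paths in a graph with no negative cycle have ≤ V - 1 edges, V - 1 rounds suffice. A V-th round that still relaxes an edge proves a negative cycle.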

Common Pitfalls

Iterating until no edge is relaxed (early termination) is a common variant — but you still need V - 1 rounds in the worst case for correctness, and the V-th round for cycle detection.
Using a dict for distances instead of an array indexed by int — slow on hot iteration loops.
Confusing “at most K edges” with “at most K stops”. Read the problem carefully: K stops in LC 787 means K + 1 edges.
Negative-cycle reachability. A negative cycle that is unreachable from the source does not affect the source’s shortest paths. Run from the source, not globally.
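A sketch of both uses described above, with hypothetical function names and edges given as (u, v, w) triples. The first function is the edge-budget framing of LC 787 and relaxes from a snapshot of the previous round, so round i really means “at most i edges”. The second runs V - 1 rounds plus one check round; starting unreachable nodes at infinity means only cycles reachable from the source are reported.

```python
def cheapest_within_k_stops(n, flights, src, dst, k):
    """Bellman-Ford with an edge budget: at most k stops = at most k + 1 edges."""
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(k + 1):               # round i allows paths of at most i edges
        prev = dist[:]                   # relax only from the previous round's snapshot
        for u, v, w in flights:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return -1 if dist[dst] == INF else dist[dst]

def has_negative_cycle_from(n, edges, src):
    """V - 1 relaxation rounds, then one extra round to detect a reachable negative cycle."""
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:    # inf + w stays inf, so unreachable nodes never relax
                dist[v] = dist[u] + w
    return any(dist[u] + w < dist[v] for u, v, w in edges)
```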

Classic Problems

LeetCode 787 — Cheapest Flights Within K Stops (canonical). See Lab 05.
“Detect arbitrage in currency markets” (negative cycle in -log(rate) graph).
“Minimum steps to make k operations” with negative-cost shortcuts.

8. SPFA (Shortest Path Faster Algorithm)

When To Use

Bellman-Ford with a queue-based optimization that avoids re-relaxing edges whose source u hasn’t been updated since the last visit.
In practice it is often much faster than vanilla Bellman-Ford on sparse graphs with random structure, though the speedup depends heavily on the input.
Caveat: worst-case is still O(V · E). On adversarial inputs (e.g., specially constructed grid-like graphs), SPFA hits that worst case and, with its queue overhead, can be slower than Bellman-Ford. Codeforces problems are sometimes designed to break SPFA. Use it for negative-weight graphs in interviews only when you’ve stated the caveat.

Complexity

Empirically often around O(k · E) for a small constant k on random graphs, but no such bound holds in general. Worst-case O(V · E). Space O(V) for the queue + an in_queue flag.

Correctness Sketch

A node u is enqueued whenever its dist[u] improves. On dequeue, relax all outgoing edges. The in_queue flag prevents duplicate enqueues. Termination follows from the fact that each dist[u] decreases monotonically and is bounded below — for a graph with no negative cycle, the total number of decreases is ≤ V · E.

Common Pitfalls

Forgetting the in_queue flag. Without it, the queue can grow to O(E) size and SPFA degrades.
Negative-cycle detection in SPFA requires tracking the number of times each node is relaxed; if a node is relaxed ≥ V times, there’s a negative cycle.
Adversarial inputs. State the caveat to the interviewer; don’t claim asymptotic improvement.
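A minimal SPFA sketch, assuming adj[u] is a list of (v, w) pairs; spfa is a hypothetical name. For negative-cycle detection it tracks the number of edges on each node’s current best path (a common alternative to counting relaxations): a best path with V or more edges must repeat a vertex, which can only happen when a negative cycle is reachable.

```python
from collections import deque

def spfa(adj, source):
    """Queue-based Bellman-Ford. Returns (dist, has_negative_cycle)."""
    n = len(adj)
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    in_queue = [False] * n
    edge_count = [0] * n                 # edges on the current best path to each node
    queue = deque([source])
    in_queue[source] = True
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                edge_count[v] = edge_count[u] + 1
                if edge_count[v] >= n:   # a simple path has at most n - 1 edges
                    return dist, True
                if not in_queue[v]:      # the in_queue flag prevents duplicate entries
                    in_queue[v] = True
                    queue.append(v)
    return dist, False
```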

Classic Problems

Same as Bellman-Ford. Choose Bellman-Ford for the K-edge-budget framing (LC 787); SPFA only when raw single-source negative shortest path is the goal and average performance matters.

9. Floyd-Warshall (All-Pairs Shortest Paths)

When To Use

All-pairs shortest paths on a small graph: V ≤ ~500 (V³ ≈ 1.25 × 10^8 inner-loop steps, roughly a second in a compiled language and much slower in pure Python).
Negative weights are fine (no negative cycle assumed).
Transitive closure: replace min/+ with OR/AND to compute reachability in O(V³).
Density independence: the algorithm is V³ regardless of E. On dense graphs (E ~ V²) it is the simplest practical all-pairs choice; on sparse graphs (E ~ V), running Dijkstra from every node costs O(V · E log V) ≈ O(V² log V), which wins when V is large (but requires non-negative weights).

Complexity

Time O(V³). Space O(V²) for the distance matrix.

Correctness Sketch

Define dp[k][i][j] = shortest path from i to j using only intermediate vertices in {1, ..., k}. The recurrence is dp[k][i][j] = min(dp[k-1][i][j], dp[k-1][i][k] + dp[k-1][k][j]): either don’t use k, or use it as a midpoint. The 2D in-place version dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]) is safe because the entries dist[i][k] and dist[k][j] do not change during iteration k: improving them would mean routing through k twice, and with dist[k][k] = 0 (no negative cycles) that never helps.

Common Pitfalls

Loop order is k, then i, then j — and not i, j, k. The latter computes garbage.
Negative-cycle detection is dist[i][i] < 0 after the algorithm completes.
Initializing diagonals — dist[i][i] = 0; missing edges are ∞ (use a large finite number like 10^9 to avoid overflow when summing).
Path reconstruction requires a parent matrix: easy to add, but it doubles the memory.
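A minimal Floyd-Warshall sketch, assuming directed (u, v, w) edges over nodes 0..n-1; the function name is hypothetical. Python’s float("inf") sidesteps the overflow issue; in fixed-width languages, use the large finite sentinel mentioned above.

```python
def floyd_warshall(n, edges):
    """All-pairs shortest paths. Returns (dist, has_negative_cycle)."""
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0                   # diagonal starts at 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the cheapest of any parallel edges
    for k in range(n):                   # k MUST be the outermost loop
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == INF:               # no path i -> k, nothing to improve
                continue
            di = dist[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    has_negative_cycle = any(dist[i][i] < 0 for i in range(n))
    return dist, has_negative_cycle
```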

Classic Problems

LeetCode 1334 — Find the City With the Smallest Number of Neighbors at a Threshold Distance (canonical V ≤ 100 Floyd-Warshall).
LeetCode 743 — Network Delay Time (single-source, but Floyd-Warshall on V ≤ 100 still passes).
“Transitive closure of a DAG” via OR/AND Floyd-Warshall.

10. Topological Sort (Kahn + DFS-Based; Cycle-Detection Equivalence)

When To Use

DAG ordering where edge u → v means “u must come before v”.
“Course schedule”, “task dependencies”, “build order”, “alien dictionary” (after extracting constraint edges).
Used as a precondition check: if topological sort fails (cycle present), the problem has no valid ordering.

Complexity

Time O(V + E). Space O(V).

Correctness Sketch — Kahn’s

Repeatedly remove a node with in-degree 0, append it to the order, and decrement its neighbors’ in-degrees. If the order has length V at the end, the graph is a DAG and the order is valid. If not, every remaining node lies on a cycle or is reachable from one, so no valid order exists. Correctness: a node with in-degree 0 has no predecessors, so it can safely come first. After removal, the remaining graph is still a DAG (removing vertices can’t create cycles).

Correctness Sketch — DFS-Based

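Run DFS from each unvisited node; on finishing a node (post-order), prepend it to the result (equivalently, reverse the post-order). This works because for every edge u → v in a DAG, v finishes before u. If DFS discovers u first, v is still white and reachable from u, so by the white-path lemma v becomes a descendant of u and finishes first. If DFS discovers v first, v finishes before u is ever discovered, because a DAG has no path from v back to u. Either way u is prepended after v, so u comes before v in the final order.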

Common Pitfalls

Edge direction confusion. “X depends on Y” usually means edge Y → X (Y must come first). Read the problem carefully.
Detecting cycles via Kahn’s — count nodes processed; if < V, cycle exists.
Multiple valid orders — both Kahn’s and DFS-based produce some valid order, not a unique one. If the problem demands lexicographically smallest, use Kahn’s with a min-heap instead of a queue.
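A sketch of Kahn’s algorithm with the min-heap tweak for the lexicographically smallest order, assuming edges are (u, v) pairs meaning u comes before v; the function name is hypothetical. It returns None when fewer than V nodes are processed.

```python
import heapq

def topo_sort_kahn(n, edges):
    """Kahn's algorithm with a min-heap. Returns an order, or None if a cycle exists."""
    adj = [[] for _ in range(n)]
    indegree = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1
    ready = [u for u in range(n) if indegree[u] == 0]
    heapq.heapify(ready)                 # a plain deque gives *some* valid order instead
    order = []
    while ready:
        u = heapq.heappop(ready)
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                heapq.heappush(ready, v)
    return order if len(order) == n else None   # fewer than n processed => cycle
```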

Classic Problems

LeetCode 207 — Course Schedule.
LeetCode 210 — Course Schedule II.
LeetCode 269 — Alien Dictionary. See Lab 06.
LeetCode 1136 — Parallel Courses.
LeetCode 2115 — Find All Possible Recipes from Given Supplies.

11. Cycle Detection In Directed Graphs (DFS Color States)

When To Use

“Does this directed graph have a cycle?” — in dependency / scheduling / DAG-validation problems.
More fine-grained than Kahn’s: tells you which edge closes a cycle.

Complexity

Time O(V + E). Space O(V) for the color array.

Correctness Sketch

Maintain three colors: white (unvisited), gray (on the current DFS path), black (fully explored). On entering a node, mark it gray; on finishing, mark it black. If during DFS we encounter a gray neighbor, that’s a back-edge and proves a cycle. White → recurse. Black → already processed; not part of current path; safe to ignore. Correctness: a back-edge is exactly an edge from a descendant to an ancestor in the DFS tree, which closes a cycle. Forward and cross edges don’t.

Common Pitfalls

Using a single boolean visited. A visited node could be a back-edge target (cycle) or a forward/cross-edge target (no cycle). Two-state visited can’t distinguish. Three colors are required.
Resetting gray to white on finish. This does not invent cycles, but every finished node gets re-explored from each later path that reaches it, which can make the search exponential. Mark black on finish and leave it black.
Applying this to undirected graphs. There, the edge back to the DFS parent always looks like an edge to a gray node, so every edge reports a cycle. This algorithm is for directed graphs; for undirected, see #12.
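A minimal three-color DFS sketch, written iteratively with (node, neighbor_index) frames so deep graphs don’t hit Python’s recursion limit; the function name and return format are hypothetical. When no back-edge exists, the reverse post-order it collects is a topological order.

```python
WHITE, GRAY, BLACK = 0, 1, 2

def find_cycle_or_topo(n, adj):
    """Returns ("cycle", (u, v)) for a back-edge u -> v, else ("order", reverse post-order)."""
    color = [WHITE] * n
    post = []
    for root in range(n):
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, 0)]
        while stack:
            u, i = stack[-1]
            if i < len(adj[u]):
                stack[-1] = (u, i + 1)
                v = adj[u][i]
                if color[v] == GRAY:         # v is on the current path: back-edge
                    return "cycle", (u, v)
                if color[v] == WHITE:
                    color[v] = GRAY
                    stack.append((v, 0))
                # BLACK: fully explored and off the current path, safe to ignore
            else:
                stack.pop()
                color[u] = BLACK             # finished nodes stay black
                post.append(u)
    return "order", post[::-1]
```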

Classic Problems

LeetCode 207 — Course Schedule (DFS-color variant).
LeetCode 802 — Find Eventual Safe States (reverse-direction + cycle-detection).
LeetCode 1059 — All Paths from Source Lead to Destination.

12. Cycle Detection In Undirected Graphs (DFS / Union-Find)

When To Use

“Is this undirected graph a tree (acyclic + connected)?”, “redundant edge”, “valid tree”.
Two equivalent approaches: DFS with parent tracking, or DSU.

Complexity

DFS: O(V + E). DSU: O(E · α(V)).

Correctness Sketch — DFS

In an undirected graph, every edge is bidirectional in the adjacency list. When DFS visits v from u, the edge (v, u) shows up in v’s adjacency. Skip the parent: for w in adj[v]: if w != parent: dfs(w, v). If a non-parent neighbor is already visited, that’s a cycle-closing edge.

Correctness Sketch — DSU

Process edges in any order. For each edge (u, v): if find(u) == find(v), the edge would close a cycle (both endpoints already in same component); else union(u, v). After processing all edges, the graph is acyclic iff no cycle was reported.

Common Pitfalls

DFS: forgetting to skip the parent. Without if w != parent, every edge looks like a back-edge.
DFS: parallel edges — if the graph has multi-edges between u and v, the second edge looks like a back-edge through a non-parent neighbor. Track edge IDs, not just parent node.
DSU: ignoring connectedness. “Valid tree” requires both acyclic and connected (V - 1 edges + DSU returning a single component).
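A minimal DSU sketch with union by size and iterative path halving, plus a valid-tree check in the LC 261 framing; the class and function names are hypothetical. With exactly V - 1 edges and no cycle the graph is necessarily connected, so the final component check is redundant but makes the intent explicit.

```python
class DSU:
    """Union-find with path halving and union by size (iterative find, no recursion)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.components = n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra             # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]
        self.components -= 1
        return True

def valid_tree(n, edges):
    """Acyclic AND connected."""
    if len(edges) != n - 1:
        return False
    dsu = DSU(n)
    for u, v in edges:
        if not dsu.union(u, v):          # endpoints already connected: edge closes a cycle
            return False
    return dsu.components == 1
```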

Classic Problems

LeetCode 261 — Graph Valid Tree.
LeetCode 684 — Redundant Connection (DSU canonical).
LeetCode 685 — Redundant Connection II (directed variant — harder).

13. Strongly Connected Components (Kosaraju + Tarjan)

When To Use

Decompose a directed graph into maximal sets where every pair (u, v) has paths in both directions.
Reduces a directed graph to a DAG of SCCs (the condensation).
Used in 2-SAT, transitive-closure compression, and “find all nodes that can reach all others”.

Complexity

Both Kosaraju and Tarjan: O(V + E). Space O(V).

Correctness Sketch — Kosaraju

DFS on the original graph, pushing nodes to a stack in post-order.
Build the reverse graph.
Pop nodes from the stack; for each unvisited node, DFS on the reverse graph — that DFS visits exactly one SCC.

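The post-order ordering ensures the first node popped (the last to finish) lies in a “source” SCC of the condensation. A source SCC has no incoming edges from other SCCs in the original graph, so in the reversed graph it has no outgoing edges, and reverse-DFS from it stays inside its own SCC. The same argument applies to each later pop once earlier SCCs are marked.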

Correctness Sketch — Tarjan

Single DFS maintaining a stack of “currently active” nodes plus disc[u] (discovery time) and low[u] (lowest discovery time reachable via the subtree + at most one back-edge). When DFS finishes a node u and low[u] == disc[u], pop the stack down to and including u — those popped nodes form an SCC. Tarjan does it in one pass; Kosaraju in two passes but with simpler bookkeeping.

Common Pitfalls

Kosaraju: forgetting to reverse all edges in the second graph. Use a separate adjacency list.
Tarjan: confusing low with disc — low[u] = min(low[u], low[v]) for tree-edge children; low[u] = min(low[u], disc[v]) for back-edge neighbors that are still on the stack. Both updates are needed.
Tarjan: handling cross-edges to other SCCs. A neighbor that has already been assigned to an SCC (visited but no longer on the stack) belongs to a different SCC; do not update low[u] from it.
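A minimal iterative Kosaraju sketch (chosen here over Tarjan for its simpler bookkeeping), assuming adj is a list of neighbor lists; the function name is hypothetical. Pass 1 records finish order; pass 2 runs DFS on a separately built reverse graph in reverse finish order.

```python
def kosaraju_scc(n, adj):
    """Returns a list of SCCs, each a list of nodes."""
    # Pass 1: record nodes in finish (post-)order on the original graph.
    visited = [False] * n
    order = []
    for root in range(n):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, 0)]
        while stack:
            u, i = stack[-1]
            if i < len(adj[u]):
                stack[-1] = (u, i + 1)
                v = adj[u][i]
                if not visited[v]:
                    visited[v] = True
                    stack.append((v, 0))
            else:
                stack.pop()
                order.append(u)
    # Reverse graph as its own adjacency list.
    radj = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)
    # Pass 2: latest finisher first; each reverse search collects exactly one SCC.
    assigned = [False] * n
    sccs = []
    for root in reversed(order):
        if assigned[root]:
            continue
        assigned[root] = True
        component, stack = [], [root]
        while stack:
            u = stack.pop()
            component.append(u)
            for v in radj[u]:
                if not assigned[v]:
                    assigned[v] = True
                    stack.append(v)
        sccs.append(component)
    return sccs
```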

Classic Problems

LeetCode 1192 — Critical Connections (Tarjan for bridges; SCC-adjacent).
“2-SAT solvability” via SCCs in the implication graph.
“Number of source SCCs in the condensation” — Codeforces classic.

14. Bridges And Articulation Points (Tarjan’s Lowlink)

When To Use

A bridge is an edge whose removal disconnects the graph.
An articulation point is a vertex whose removal disconnects the graph.
Used in “critical connections” problems and network-resilience analysis.

Complexity

O(V + E). Single DFS, one pass.

Correctness Sketch

Compute disc[u] and low[u] as in Tarjan’s SCC. For an edge (u, v) where v is a tree child: low[v] > disc[u] ⇒ (u, v) is a bridge (no back-edge from v’s subtree reaches anything at-or-above u, so removing (u, v) disconnects). For articulation points: u is an articulation point if (a) u is the DFS root and has ≥ 2 tree children, or (b) u is not the root and has a tree child v with low[v] ≥ disc[u].

Common Pitfalls

Using ≥ vs > — bridges use low[v] > disc[u]; articulation points use low[v] ≥ disc[u]. The off-by-one between them is critical.
DFS root special case for articulation points — must count tree children; with one tree child, removing the root doesn’t disconnect.
Multi-edges — a multi-edge between u and v is not a bridge (the parallel edge keeps the graph connected). Treat parallel edges by edge-ID, not endpoint pair.
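A recursive lowlink sketch for undirected graphs that finds bridges and articulation points in one DFS, assuming edges are (u, v) pairs; the function name is hypothetical. It skips the parent by edge ID, so parallel edges are handled, and raises Python’s recursion limit (very deep graphs may still need an iterative rewrite).

```python
import sys

def bridges_and_articulation_points(n, edges):
    """Returns (bridge edge indices, sorted articulation points)."""
    sys.setrecursionlimit(max(sys.getrecursionlimit(), 2 * n + 100))
    adj = [[] for _ in range(n)]
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    disc = [-1] * n
    low = [0] * n
    timer = [0]
    bridges, cut_vertices = [], set()

    def dfs(u, parent_edge):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v, eid in adj[u]:
            if eid == parent_edge:       # skip the edge we arrived on, not every edge to the parent
                continue
            if disc[v] == -1:            # tree edge
                children += 1
                dfs(v, eid)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:     # strict inequality: bridge
                    bridges.append(eid)
                if parent_edge != -1 and low[v] >= disc[u]:   # non-strict: articulation point
                    cut_vertices.add(u)
            else:                        # back edge
                low[u] = min(low[u], disc[v])
        if parent_edge == -1 and children >= 2:   # DFS root special case
            cut_vertices.add(u)

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return bridges, sorted(cut_vertices)
```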

Classic Problems

LeetCode 1192 — Critical Connections in a Network (canonical bridges).
“Find all articulation points in a network” — Codeforces classic.

15. Minimum Spanning Tree — Kruskal (With DSU)

When To Use

Connect all V nodes with minimum total edge weight; the result uses exactly V - 1 edges.
Edge-centric algorithm: sort edges, add greedily, skip those that close a cycle (DSU detects).
Best on sparse graphs (E ~ V), where the O(E log E) sort is cheap.

Complexity

O(E log E) for sort + O(E · α(V)) for DSU = O(E log E). Space O(V).

Correctness Sketch (Cut Property)

The minimum-weight edge crossing any cut of the graph belongs to some MST. Kruskal repeatedly takes the next-cheapest edge; if it doesn’t close a cycle (DSU find(u) != find(v)), it’s the cheapest edge crossing the cut between its DSU components. By the cut property, it’s in some MST. Adding it preserves the invariant that the chosen edges are a subset of some MST. After V - 1 edges, the chosen set is a spanning tree.

Common Pitfalls

Forgetting to sort edges. Kruskal without sorting is just a random spanning tree.
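DSU bugs. Phase 3’s α(N) DSU (path compression plus union by rank or size) is required. Without union by rank/size, trees can degenerate into chains, and a recursive find then blows Python’s stack at V = 10^5; use iterative two-pass compression or path-halving.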
Disconnected graph. If after processing all edges DSU has > 1 component, no spanning tree exists. Return failure or compute a minimum spanning forest.
Tie-breaking on equal weights — any tie-breaking rule works; they all produce some MST.
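A minimal Kruskal sketch with an inline DSU (iterative path halving, union by size), assuming edges are (w, u, v) triples so that sorting orders by weight; the function name is hypothetical. It returns None when fewer than V - 1 edges are accepted, i.e., the graph is disconnected.

```python
def kruskal_mst(n, edges):
    """Total MST weight, or None if the graph is disconnected."""
    parent = list(range(n))
    size = [1] * n

    def find(x):                         # iterative find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0, 0
    for w, u, v in sorted(edges):        # cheapest edge first
        ru, rv = find(u), find(v)
        if ru == rv:                     # would close a cycle
            continue
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                  # union by size
        size[ru] += size[rv]
        total += w
        used += 1
        if used == n - 1:
            break
    return total if used == n - 1 else None
```

For LC 1584, build all O(V²) Manhattan-distance edges first, as in the usage below.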

Classic Problems

LeetCode 1584 — Min Cost to Connect All Points. See Lab 08.
LeetCode 1135 — Connecting Cities With Minimum Cost.
LeetCode 1489 — Find Critical and Pseudo-Critical Edges in MST.

16. Minimum Spanning Tree — Prim (With Priority Queue)

When To Use

Same problem as Kruskal — minimum total edge weight to connect all V.
Vertex-centric algorithm: grow the tree one vertex at a time, always adding the minimum-weight edge from the tree to a non-tree vertex.
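Best on dense graphs. The heap-based version is O(E log V), no better than Kruskal, but an O(V²) array-based Prim (scan for the cheapest non-tree vertex each step, no heap) beats sorting all E ~ V² edges.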

Complexity

With binary heap: O((V + E) log V). Array-based (for dense graphs): O(V²). With Fibonacci heap: O(E + V log V) (theoretical). Space O(V + E).

Correctness Sketch (Cut Property, Variant)

At every step, the partial tree T defines a cut (T vs not-T). The minimum-weight edge crossing this cut is added next. By the cut property, it belongs to some MST. The invariant — chosen edges ⊆ some MST — is preserved. After V - 1 additions, the tree spans all of V.

Common Pitfalls

Lazy vs eager Prim. Lazy: push every (weight, neighbor) to the heap, skip duplicates on pop. Eager: maintain a “best known weight to enter T” per non-tree vertex and use decrease-key. Lazy is simpler and asymptotically equivalent.
Disconnected graph — same as Kruskal; the heap empties before V - 1 edges are added.
Picking starting vertex — any vertex works for connected graphs.
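A minimal lazy-Prim sketch using heapq, assuming an undirected adj[u] list of (v, w) pairs; the function name is hypothetical. Stale heap entries for vertices already in the tree are skipped on pop.

```python
import heapq

def prim_mst(n, adj, start=0):
    """Total MST weight, or None if the graph is disconnected."""
    in_tree = [False] * n
    heap = [(0, start)]                  # (weight of the edge into the tree, vertex)
    total, added = 0, 0
    while heap and added < n:
        w, u = heapq.heappop(heap)
        if in_tree[u]:                   # duplicate entry for a vertex already in T
            continue
        in_tree[u] = True
        total += w
        added += 1
        for v, wv in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (wv, v))
    return total if added == n else None   # heap emptied early => disconnected
```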

Classic Problems

LeetCode 1584 — Min Cost to Connect All Points (also solvable via Prim).
“Maximum spanning tree” via negated weights.

17. Bipartite Check (BFS/DFS 2-Coloring)

When To Use

“Can we partition the vertices into two groups such that every edge crosses between groups?”
Equivalent to: graph has no odd cycle.
Used as a precondition for bipartite matching, and in problems like “is this set of dislike-pairs separable into two camps?”

Complexity

O(V + E). Space O(V).

Correctness Sketch

BFS/DFS, giving each newly visited node the opposite color (0 or 1) from the node that discovered it. If an edge ever joins two nodes that already share a color, the graph has an odd cycle and is not bipartite. Correctness: in BFS, every edge joins nodes in the same layer or in adjacent layers. Adjacent-layer edges agree with the alternating coloring; a same-layer edge closes an odd cycle (two equal-length paths back to a common ancestor plus that edge), and conversely any odd cycle forces at least one same-layer edge.

Common Pitfalls

Disconnected graph. Run BFS from every unvisited node; the bipartiteness of each component is independent.
Initializing colors as -1 (uncolored), 0, 1. A boolean visited is insufficient; you need the actual color.
Reading group sizes straight off colors 0 and 1. The 2-coloring is unique only up to swapping the two colors in each connected component, so for a “minimum group size” question, choose the smaller side independently per component.
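A minimal BFS 2-coloring sketch that restarts from every uncolored node, assuming adj is a list of neighbor lists; the function name is hypothetical. It returns the coloring, or None on the first same-color edge.

```python
from collections import deque

def two_color(n, adj):
    """Color list of 0/1 per node, or None if an odd cycle exists."""
    color = [-1] * n                     # -1 = uncolored
    for s in range(n):                   # every component is checked independently
        if color[s] != -1:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:   # same color on both ends: odd cycle
                    return None
    return color
```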

Classic Problems

LeetCode 785 — Is Graph Bipartite?
LeetCode 886 — Possible Bipartition.
“2-coloring as a sanity check before bipartite matching.”

18. Bipartite Matching (Hungarian / Hopcroft-Karp Overview)

When To Use

Maximum cardinality matching in a bipartite graph: pair up as many left-side nodes with right-side nodes as possible, using each at most once.
Job assignment, “find as many distinct words to slots as possible”, “schedule maximum tasks to workers”.
Simple augmenting paths (Kuhn’s algorithm, often loosely called “Hungarian” in interview prep): O(V · E), one DFS/BFS augmenting-path search per left vertex.
Hopcroft-Karp: O(E · √V) — strictly better; the algorithm of choice for large bipartite matching.

Complexity

Hungarian: O(V · E). Hopcroft-Karp: O(E · √V). Space O(V + E).

Correctness Sketch (Berge’s Theorem and Augmenting Paths)

Berge’s theorem: a matching M is maximum iff there is no augmenting path (a path alternating unmatched / matched edges, starting and ending at unmatched vertices). The Hungarian algorithm repeatedly finds augmenting paths via BFS/DFS and augments. Hopcroft-Karp accelerates by finding all shortest augmenting paths in a single BFS phase, then augmenting all of them in one DFS phase.

Common Pitfalls

Confusing maximum matching with maximum-weight matching. The latter needs min-cost max-flow or the weighted Hungarian (Kuhn-Munkres) algorithm, and is a harder problem.
Implementing matching from scratch in interviews is rare. State the algorithm by name, reduce to it, and note the complexity. Senior interviewers accept “this is bipartite matching, O(E√V) via Hopcroft-Karp” without code.
Modeling. The hard part is recognizing the bipartite structure. “Are there two disjoint sets where edges only cross sets?”
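For completeness, a minimal augmenting-path matching sketch (Kuhn’s algorithm, the simple O(V · E) method, not Hopcroft-Karp), assuming adj[u] lists the right-side neighbors of left vertex u; the function name is hypothetical. Each call tries to free a right vertex by re-matching its current partner along an alternating path.

```python
def max_bipartite_matching(n_left, n_right, adj):
    """Size of a maximum matching between left 0..n_left-1 and right 0..n_right-1."""
    match_right = [-1] * n_right         # match_right[r] = left vertex matched to r, or -1

    def try_augment(u, seen):
        for r in adj[u]:
            if seen[r]:
                continue
            seen[r] = True
            # r is free, or its current partner can be re-matched elsewhere
            if match_right[r] == -1 or try_augment(match_right[r], seen):
                match_right[r] = u
                return True
        return False

    size = 0
    for u in range(n_left):
        if try_augment(u, [False] * n_right):
            size += 1
    return size
```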

Classic Problems

LeetCode 1947 — Maximum Compatibility Score Sum (small N: bitmask DP. Large N: bipartite matching + weights).
“Maximum number of distinct words to slots” — folklore.
LeetCode 1349 — Maximum Students Taking Exam (bipartite + bitmask).

Overview-level only. Implementation drills are Phase 7 / 12.

19. Max Flow (Ford-Fulkerson / Edmonds-Karp / Dinic — Overview + When To Use)

When To Use

“Maximum amount of flow from source S to sink T in a capacitated network.”
Reductions: bipartite matching → max flow. Edge-disjoint paths → max flow. Min cut → max flow.
Algorithms:
- Ford-Fulkerson (DFS-based augmenting): O(E · max-flow) — pseudo-polynomial; can loop forever on irrational capacities.
- Edmonds-Karp (BFS-based augmenting): O(V · E²). Polynomial.
- Dinic’s: O(V² · E) — uses BFS levels + DFS-blocking-flow. Practical for V, E up to 10^4–10^5.

Complexity

See above. Space O(V + E) for the residual graph.

Correctness Sketch (Max-Flow Min-Cut Theorem)

The maximum flow equals the minimum cut capacity. An augmenting path in the residual graph proves the current flow is not maximum; absence of any augmenting path proves it is. Dinic’s enhancement: BFS to compute layered graph, DFS to push blocking flow, repeat until no augmenting path. Each phase strictly increases the BFS distance from S to T, bounded by V phases.

Common Pitfalls

Implementation in 35-minute interviews is rare. State the algorithm by name, model the problem, and let the interviewer guide depth.
Forgetting reverse edges in the residual graph. Every forward edge u → v of capacity c needs a reverse edge v → u of capacity 0. Pushing flow f along the forward edge subtracts f from its residual and adds f to the reverse residual. Without reverse edges, augmenting paths can’t “undo” earlier bad choices, and the computed max-flow can be too small.
Modeling errors. “Each node has capacity” requires node-splitting. Split v into v_in and v_out, joined by an edge v_in → v_out whose capacity is the node capacity. Incoming edges then enter v_in, and outgoing edges leave v_out.
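Here is a sketch of the node-splitting transformation. It assumes vertices 0..n − 1, directed edges given as (u, v, capacity) triples, and node_cap[v] giving each vertex’s capacity. The helper name split_node_capacities is hypothetical. It only rewrites the graph. Feed the result to any max-flow routine, typically from s_out = 2·s + 1 to t_in = 2·t, so that the endpoints’ own capacities don’t limit the flow.

```python
def split_node_capacities(n, edges, node_cap):
    """Rewrite a node-capacitated digraph as an edge-capacitated one.

    Vertex v becomes v_in = 2 * v and v_out = 2 * v + 1, joined by the edge
    v_in -> v_out with capacity node_cap[v]. Each original edge u -> v with
    capacity c becomes u_out -> v_in with capacity c.
    Returns (new_vertex_count, new_edges) with edges as (u, v, capacity).
    """
    new_edges = [(2 * v, 2 * v + 1, node_cap[v]) for v in range(n)]
    new_edges += [(2 * u + 1, 2 * v, c) for u, v, c in edges]
    return 2 * n, new_edges
```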

Classic Problems

“Maximum bipartite matching via max-flow.”
“Edge-disjoint paths from S to T.”
LeetCode-style: very rare. Common in Google L5+ system rounds and competitive programming.

Overview-level. Implementation in Phase 12.

20. Min-Cut / Max-Flow Duality (Problem Modeling)

When To Use

“Minimum cost to disconnect S from T” → min-cut problem.
“Minimum number of edges to remove to disconnect” → min-cut on unit-capacity edges.
“Image segmentation as binary labeling” → min-cut on a constructed graph.
“Project selection” (some projects depend on others; pick a subset to maximize profit) → min-cut.

Complexity

Same as max-flow (compute the cut from the residual graph after max-flow terminates).

Correctness Sketch (Max-Flow Min-Cut Theorem)

In any flow network, max-flow value = min-cut capacity. After running max-flow, the min cut consists of the edges from {nodes reachable from S in residual graph} to {nodes not reachable}. Their original capacities sum to the max-flow value.
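Here is a sketch of extracting the min cut after max-flow. It assumes a dense capacity matrix cap[u][v], where 0 means no edge. The function name min_cut is hypothetical, and it runs its own small Edmonds-Karp loop so that it is self-contained. The final, failing BFS marks exactly the vertices reachable from S in the residual graph. The cut is every original edge leaving that set.

```python
from collections import deque


def min_cut(cap, s, t):
    """Return (max_flow_value, cut_edges) for a capacity matrix cap[u][v]."""
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities; res[v][u] acts as the reverse edge
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:  # full BFS, so parent also records residual reachability
            u = q.popleft()
            for v in range(n):
                if res[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break  # no augmenting path: parent[v] != -1 marks the S side of the cut
        push, v = float('inf'), t
        while v != s:
            push = min(push, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= push
            res[v][u] += push
            v = u
        flow += push
    reachable = {v for v in range(n) if parent[v] != -1}
    cut = [(u, v) for u in reachable for v in range(n)
           if v not in reachable and cap[u][v] > 0]
    return flow, cut
```

Take the 4-vertex network cap[0][1] = 3, cap[0][2] = 2, cap[1][3] = 2, cap[2][3] = 3. It reports a flow of 4 and the cut {(0, 2), (1, 3)}, whose original capacities also sum to 4.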

Common Pitfalls

Recognizing the model. This is the hardest part. “Project selection” doesn’t look like a flow problem; recognizing the bipartite encoding is the senior-level skill.
Edge orientation. Min-cut on undirected graphs: each undirected edge becomes two directed edges, each with capacity c.

Classic Problems

“Project Selection Problem” — folklore.
“Image Segmentation via Min-Cut” — vision systems.
“Minimum number of edges to disconnect” — Menger’s theorem.

21. Eulerian Path / Circuit (Hierholzer’s Algorithm)

When To Use

“Visit every edge exactly once” — Eulerian path/circuit.
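An undirected graph has an Eulerian circuit iff every vertex has even degree and all vertices with nonzero degree are connected. Under the same connectivity condition, it has an Eulerian path iff exactly 0 or 2 vertices have odd degree (with 2, the path runs between them).
A directed graph has an Eulerian circuit iff every vertex has in-degree = out-degree and all vertices with nonzero degree are connected when directions are ignored. Under the same connectivity condition, it has an Eulerian path iff one of two things holds. Either the circuit condition holds, or exactly one vertex has out-degree − in-degree = 1 (start), exactly one has in-degree − out-degree = 1 (end), and every other vertex is balanced.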
Hierholzer’s algorithm finds the path/circuit in O(E).
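Here is a sketch of the directed degree test. It assumes edges are given as (u, v) pairs. The function name euler_path_start is hypothetical. It checks only the degree conditions. Connectivity still has to be verified separately, for example by confirming that the path Hierholzer’s produces uses every edge.

```python
from collections import defaultdict


def euler_path_start(edges):
    """Start vertex for a directed Eulerian path/circuit by degree counts, or None."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    start, starts, ends = None, 0, 0
    for x in set(out_deg) | set(in_deg):
        diff = out_deg[x] - in_deg[x]
        if diff == 1:
            starts, start = starts + 1, x   # candidate path start
        elif diff == -1:
            ends += 1                       # candidate path end
        elif diff != 0:
            return None                     # imbalance of 2 or more: impossible
    if starts == 0 and ends == 0:
        return edges[0][0] if edges else None  # circuit: start at any edge's tail
    if starts == 1 and ends == 1:
        return start
    return None
```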

Complexity

O(V + E).

Correctness Sketch (Hierholzer’s)

DFS from the start vertex, consuming edges as you traverse them (remove each from the adjacency list). When stuck at a vertex with no unused outgoing edges, append it to the path. Then backtrack and continue from earlier vertices that still have unused outgoing edges. The reverse of the recorded sequence is an Eulerian path. Correctness: each edge is consumed exactly once, and the post-order finishing sequence naturally constructs the path in reverse.
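Here is an iterative sketch of Hierholzer’s on a directed graph. It assumes edges are (u, v) pairs and that start already satisfies the degree conditions. The function name hierholzer is hypothetical. Sorting each adjacency list in reverse and popping from the end consumes the smallest remaining edge first, which is the lexicographic rule LC 332 needs. The final length check returns None when some edges were never reached.

```python
from collections import defaultdict


def hierholzer(edges, start):
    """Eulerian path from `start` over directed (u, v) edges, or None if edges are left over."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    for u in adj:
        adj[u].sort(reverse=True)  # pop() from the end yields the smallest neighbor
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj[u]:
            stack.append(adj[u].pop())  # consume edge u -> v exactly once
        else:
            path.append(stack.pop())    # dead end: u is finished (post-order)
    path.reverse()
    return path if len(path) == len(edges) + 1 else None
```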

Common Pitfalls

Multi-edges and self-loops are common in Eulerian problems. Use a multiset or a list of edges with a “consumed” flag.
Check connectivity over edges, not vertices. Isolated vertices with degree 0 are fine, since they don’t break Eulerian-ness. Two separate components that each contain edges, however, rule out an Eulerian path.
Lexicographically smallest Eulerian path (LC 332) — sort each adjacency list and use a multiset / heap; pop the smallest unused edge first.

Classic Problems

LeetCode 332 — Reconstruct Itinerary (canonical Hierholzer’s).
LeetCode 753 — Cracking the Safe (Eulerian path on de Bruijn graph).

Graph-Modeling Cheat Sheet — How To Recognize A Graph Problem In Disguise

The hardest skill in this phase is modeling: recognizing that a problem is a graph problem when nothing in the statement says “graph”. Here is a battery of signals.

Signal in problem statement	Graph interpretation	Likely algorithm
“Depends on” / “must come before” / “prerequisite”	DAG, edge `pre → post`	Topological sort
“Connected” / “linked” / “merged” / “in same group”	Undirected, components	DFS / BFS / DSU
“Shortest” / “fewest steps” / “minimum moves” with unit cost	Unweighted graph	BFS
“Cheapest” / “minimum cost” with non-negative weights	Weighted graph	Dijkstra
“Cheapest with negative discounts”	Weighted graph with neg edges	Bellman-Ford
“Minimum cost to connect all”	Spanning tree	Kruskal / Prim
“Cycle” / “loop” / “redundant”	Graph + cycle test	DFS coloring / DSU
“Two groups” / “partition” / “no two together”	Bipartite	2-coloring
“Pair up X with Y”	Bipartite matching	Hungarian / Hopcroft-Karp
“Maximum throughput” / “bottleneck” / “max disjoint paths”	Flow network	Max-flow
“Minimum to disconnect” / “critical edges”	Min-cut / bridges	Max-flow / Tarjan
“Visit all edges once”	Eulerian	Hierholzer’s
“Visit all vertices once with min cost”	Hamiltonian / TSP	Bitmask DP (Phase 3)
“Currency conversion” / “exchange rate”	Weighted directed; cycles = arbitrage	Bellman-Ford
“ACL inheritance” / “permission propagation”	Reachability	DFS / BFS / transitive closure
“Build pipeline” / “task DAG”	Topological + critical path	Topo sort + DP
“Friend of friend” / “social network”	Undirected	BFS / DSU
“Word transformation” / “step-by-step transform”	Implicit graph on states	BFS
“Sliding puzzle” / “8-puzzle” / “Rubik’s cube”	Implicit state graph	BFS / IDA*
“Routes between cities” / “flight network”	Directed weighted	Dijkstra
“Spread / infection / contamination over time”	Multi-source unweighted	Multi-source BFS

Common Implicit Graphs

These are the four canonical implicit-graph patterns. You should be able to spot all four within 30 seconds of reading the problem.

1. Grid Graphs

Each cell (r, c) is a node; edges go to the 4 (or 8) neighbors that satisfy bounds and the cell-type constraint. Never materialize the edge list (up to 4 · R · C edges); compute neighbors on demand.

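DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
def neighbors(grid, r, c):
    # Yield in-bounds 4-neighbors of (r, c) that are not walls ('#').
    R, C = len(grid), len(grid[0])
    for dr, dc in DIRS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] != '#':
            yield (nr, nc)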

Variants: 8-connected, weighted edges (cost = cell value), constraint-aware (can only enter from certain directions), gravity-based.

2. Word-Ladder Graphs

Each word is a node; an edge connects two words that differ in exactly one character. With N words of length L, comparing all pairs costs O(N² · L). The trick is to generate, for each word, its L “wildcarded” patterns (position i replaced by '*') and to group words by pattern in a dict-of-list. A word’s neighbors are the other words in its L buckets, reachable after O(L²) work to build the patterns.

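from collections import defaultdict
def build_buckets(words):
    buckets = defaultdict(list)   # pattern like 'h*t' -> words matching it
    for w in words:
        for i in range(len(w)):
            buckets[w[:i] + '*' + w[i+1:]].append(w)
    return buckets
def neighbors(buckets, w):
    for i in range(len(w)):
        for other in buckets.get(w[:i] + '*' + w[i+1:], ()):
            if other != w:        # w sits in every one of its own buckets
                yield other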

3. State-Space Search

Each “state” of the system is a node; transitions are edges. The state encodes the full configuration: e.g., a tuple of (position, keys_collected) for “shortest path with key-collection”.

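DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
def neighbors(grid, state):
    # state = ((r, c), keys); assumes '#' marks walls and 'a'..'f' mark keys
    (r, c), keys = state
    for dr, dc in DIRS:
        nr, nc = r + dr, c + dc
        if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc] == '#':
            continue
        ch = grid[nr][nc]
        if 'a' <= ch <= 'f':   # cell holds key K: set bit K in the mask
            yield ((nr, nc), keys | (1 << (ord(ch) - ord('a'))))
        else:
            yield ((nr, nc), keys)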
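Here is a sketch of BFS over (position, keys) states for a “collect every key” search. It assumes the LeetCode 864 grid encoding: '@' start, '#' wall, 'a' to 'f' keys, 'A' to 'F' locks. The function name shortest_path_all_keys is hypothetical. The visited set is keyed by the full state, so the same cell may be revisited once per distinct key set.

```python
from collections import deque


def shortest_path_all_keys(grid):
    """BFS over states (row, col, key_mask); returns fewest steps or -1."""
    R, C = len(grid), len(grid[0])
    start, all_keys = None, 0
    for r in range(R):
        for c in range(C):
            ch = grid[r][c]
            if ch == '@':
                start = (r, c, 0)
            elif 'a' <= ch <= 'f':
                all_keys |= 1 << (ord(ch) - ord('a'))
    seen = {start}
    q = deque([(start, 0)])
    while q:
        (r, c, keys), dist = q.popleft()
        if keys == all_keys:
            return dist
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < R and 0 <= nc < C) or grid[nr][nc] == '#':
                continue
            ch = grid[nr][nc]
            if 'A' <= ch <= 'F' and not (keys & (1 << (ord(ch) - ord('A')))):
                continue  # locked door and we don't hold its key yet
            nkeys = keys
            if 'a' <= ch <= 'f':
                nkeys |= 1 << (ord(ch) - ord('a'))  # pick up the key
            state = (nr, nc, nkeys)
            if state not in seen:
                seen.add(state)  # visited-on-enqueue, keyed by the full state
                q.append((state, dist + 1))
    return -1
```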

4. Bipartite “Token / Container” Graphs

Two sets of nodes, e.g., users and groups, stops and buses, students and courses. An edge connects a token to each container it belongs to. Run BFS over this graph, seeded with every container that holds the start token (multi-source). It gives the minimum number of containers needed to get from token A to token B. This is the canonical “bus routes” framing in LeetCode 815, where stops are tokens and buses are containers.

See Lab 09 for the bus-routes modeling exercise.
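Here is a sketch of the LeetCode 815 BFS under this framing, with buses as containers and stops as tokens. The function name num_buses_to_destination is hypothetical. The queue is seeded with every bus that serves the source stop (multi-source), and each stop’s buses are expanded only once.

```python
from collections import defaultdict, deque


def num_buses_to_destination(routes, source, target):
    """Fewest buses to ride from stop `source` to stop `target`, or -1."""
    if source == target:
        return 0
    buses_at = defaultdict(list)  # stop -> buses that serve it
    for bus, stops in enumerate(routes):
        for s in stops:
            buses_at[s].append(bus)
    seen_bus = set(buses_at[source])
    q = deque((bus, 1) for bus in buses_at[source])  # every bus at the source, 1 ride
    seen_stop = {source}
    while q:
        bus, rides = q.popleft()
        for stop in routes[bus]:
            if stop == target:
                return rides
            if stop in seen_stop:
                continue
            seen_stop.add(stop)  # transfer point: expand its buses only once
            for nxt in buses_at[stop]:
                if nxt not in seen_bus:
                    seen_bus.add(nxt)
                    q.append((nxt, rides + 1))
    return -1
```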

Mastery Checklist

Before exiting this phase, verify all of these:

You can build an adjacency list from an edge list (directed and undirected) in <2 minutes, in your primary language.
You can write BFS from memory in <5 minutes, including correct visited-on-enqueue semantics.
You can write DFS recursively and iteratively (with explicit stack) from memory in <8 minutes total.
You can write Dijkstra from memory in <8 minutes, lazy variant, including the staleness-skip line.
You can write Kahn’s topological sort from memory in <6 minutes.
You can write DSU with path compression and union by rank from memory in <5 minutes.
You can write Kruskal’s MST from memory in <10 minutes (DSU + sort).
You can recognize “this is a graph problem” within 2 minutes of reading any of the 30 classic graph problems on this list.
You can correctly choose between BFS / Dijkstra / Bellman-Ford / 0-1 BFS based on edge weights.
You can model the bus-routes problem (LC 815) as a graph in <5 minutes, articulating the bipartite structure.
You can model the alien-dictionary problem (LC 269) as a topological sort in <5 minutes, articulating the constraint-extraction step.
You can articulate the cut property and why it makes Kruskal correct, in <30 seconds.
You can articulate why Dijkstra fails on negative weights, in <30 seconds.
You can articulate the white-path lemma and its connection to topological sort via reverse post-order.

Exit Criteria

You may move to Phase 5 (Dynamic Programming) when all of the following are true:

You have completed all nine labs in this phase, with the lab’s mastery criteria checked off for each.
You have solved at least 40 unaided graph problems from LeetCode (mix of Medium, Medium-Hard, Hard) and reviewed each via REVIEW_TEMPLATE.md.
Your unaided success rate on Medium-Hard graph problems is ≥ 70%.
In a mock interview (phase-11-mock-interviews/), you correctly identify the algorithm family within 2 minutes for at least 8 of 10 graph problems.
You can write Dijkstra, BFS, DFS, Kahn’s topological sort, and DSU + Kruskal — five algorithms — from a blank slate in under 45 minutes total.

If any of these fails, do another 15–25 graph problems before moving on. Skipping this gate calcifies bad habits that compound in Phase 5 (where DP-on-graphs and DAG-DP build directly on this material).

Labs

Hands-on practice. Each lab follows the strict 22-section format from Phase 0.

Lab 01 — BFS Shortest Path (Word Ladder)

Goal

Implement an unweighted shortest-path search on an implicit graph where the nodes are dictionary words and the edges connect any two words that differ in exactly one character. After this lab you should be able to recognize a word-ladder / state-transformation problem in <60 seconds, build the wildcard-bucket adjacency in <3 minutes, and write the BFS body from a blank screen in <5 minutes with correct visited-on-enqueue semantics.

Background Concepts

BFS on an unweighted graph visits nodes in non-decreasing order of distance from the source: the source first (distance 0), then its neighbors (distance 1), then their neighbors (distance 2), and so on. The first time a node is dequeued, its distance is final. This lab uses the wildcard-bucket trick that makes word-ladder graphs tractable. Rather than checking all O(N²) word pairs for adjacency, build a dict mapping each L-character “wildcarded” pattern (e.g., h*t, *ot, ho*) to the list of words matching that pattern. Two distinct words are adjacent iff they share at least one bucket.

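The buckets hold O(N · L) entries in total. Building them is O(N · L²), since each of a word’s L patterns is a new string of length L. Enumerating a word’s neighbors costs O(L²) to build its patterns plus the combined size of its L buckets. Clearing each bucket after its first scan (every word in it is then visited) means each bucket entry is examined at most once. The total BFS cost is therefore O(N · L²) instead of O(N² · L).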

Interview Context

Word Ladder (LC 127) is a top-50 interview problem at Meta and Amazon; both companies have asked it within the past year. Variants appear at Google (“minimum genetic mutation”, LC 433) and Bloomberg. It tests three things at once: (1) recognizing the implicit graph, (2) building it efficiently, (3) running clean BFS. Candidates who compare every pair of words (for each pair: if they differ by one letter, connect them) risk timing out at N = 5000. Bombing this problem on a phone screen is a serious negative signal at L4+.

Problem Statement

Given a beginWord, an endWord, and a list wordList, return the length of the shortest transformation sequence from beginWord to endWord such that:

Only one letter changes per step.
Every word after beginWord, including endWord, must be in wordList.
beginWord does not need to be in wordList.

Return 0 if no such sequence exists.

Constraints

1 ≤ beginWord.length ≤ 10
All words have the same length, L.
1 ≤ wordList.length ≤ 5000
All words consist of lowercase English letters.
beginWord != endWord.
All words in wordList are unique.

Clarifying Questions

Is beginWord required to differ from endWord? (Yes — guaranteed.)
Does the answer count beginWord and endWord? (Yes — sequence length includes both endpoints.)
If endWord is not in wordList, is the answer 0? (Yes — by the rules.)
Are case-sensitive comparisons required? (No — all lowercase.)
Can beginWord itself appear in wordList? (Yes; treat normally.)

Examples

beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
→ 5  (hit → hot → dot → dog → cog)

beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
→ 0  (cog not in wordList)

Initial Brute Force

Materialize the graph: for each pair of words, check whether they differ by one letter (O(L) per pair). This builds adjacency in O(N² · L). Then run BFS in O(V + E), where E can reach O(N²) in the worst case.

Brute Force Complexity

Time O(N² · L) for graph construction, O(N²) for BFS. Total O(N² · L). Space O(N²). At N = 5000, L = 10, that is 2.5 × 10^8 ops: borderline TLE in Python, and a pass in C++ with little headroom.

Optimization Path

The bottleneck is graph construction. Wildcard buckets eliminate it: for each word, generate L wildcards and append to a bucket dict. Construction is O(N · L²). At N = 5000, L = 10, that’s 5 × 10^5 ops, roughly 500× fewer than the all-pairs build. Neighbor enumeration is also faster: only words sharing a bucket are candidates, which prunes dramatically vs scanning all N words.

Bidirectional BFS is a further optimization: search from both ends and meet in the middle. Each frontier only has to reach about half the depth, which often shrinks the explored set dramatically. It adds complexity, though, and is overkill at N = 5000.
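Here is a sketch of bidirectional BFS for Word Ladder, assuming lowercase words of equal length. The function name ladder_length_bidirectional is hypothetical. For brevity it finds neighbors by trying all 26 letters at each position against a set, rather than using wildcard buckets. Expanding the smaller frontier each round is what keeps the search narrow.

```python
from string import ascii_lowercase


def ladder_length_bidirectional(begin, end, word_list):
    """Number of words in the shortest ladder from begin to end, or 0."""
    words = set(word_list)
    if end not in words:
        return 0
    front, back = {begin}, {end}
    visited = {begin, end}
    length = 1  # words on a path whose two ends are in front and back, minus one
    while front and back:
        if len(front) > len(back):
            front, back = back, front  # always expand the smaller side
        next_front = set()
        for word in front:
            for i in range(len(word)):
                for ch in ascii_lowercase:
                    cand = word[:i] + ch + word[i + 1:]
                    if cand in back:
                        return length + 1  # frontiers meet: path is complete
                    if cand in words and cand not in visited:
                        visited.add(cand)
                        next_front.add(cand)
        front = next_front
        length += 1
    return 0
```

Each round grows one frontier by one layer, so length goes up by one per round no matter which side was expanded.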

Final Expected Approach

If endWord not in wordList, return 0.
Build buckets: a dict mapping each wildcard pattern to the list of words matching it.
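BFS from beginWord, storing with each word the length of the sequence so far (1 for beginWord, i.e. edge distance + 1). On dequeue, generate the word’s L wildcards; for each, look up the bucket and enqueue any unvisited word with length + 1.
On dequeuing endWord, return its stored length (edge distance + 1, because the answer counts both endpoints).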
If queue exhausts, return 0.

Data Structures Used

dict[str, list[str]]: wildcard buckets.
set[str]: visited words.
collections.deque: BFS queue, holding (word, sequence_length) tuples.

Correctness Argument

BFS on an unweighted graph: when endWord is first dequeued, its distance is the minimum number of edges from beginWord. Each edge represents a one-letter change between dictionary words, so that distance is the minimum number of one-letter changes, and the sequence length is distance + 1. Visited-on-enqueue ensures each word enters the queue at most once. Combined with clearing each bucket after its first scan, this keeps total work at O(N · L²).

Complexity

Operation	Time	Space
Build buckets	O(N · L²)	O(N · L²) (N · L pattern keys of length L)
BFS	O(N · L²)	O(N)
Total	O(N · L²)	O(N · L²)

Implementation Requirements

from collections import defaultdict, deque

def ladderLength(beginWord, endWord, wordList):
    word_set = set(wordList)
    if endWord not in word_set:
        return 0
    L = len(beginWord)
    buckets = defaultdict(list)
    for w in word_set:
        for i in range(L):
            buckets[w[:i] + '*' + w[i+1:]].append(w)
    visited = {beginWord}
    queue = deque([(beginWord, 1)])
    while queue:
        word, d = queue.popleft()
        if word == endWord:
            return d
        for i in range(L):
            pat = word[:i] + '*' + word[i+1:]
            for nb in buckets[pat]:
                if nb not in visited:
                    visited.add(nb)
                    queue.append((nb, d + 1))
            buckets[pat] = []  # clear: every word here is now visited, so rescanning is wasted work
    return 0

Tests

Standard: hit → cog with full path → 5.
Missing endWord: return 0.
beginWord == endWord: technically violates constraints, but should return 1 if asked.
Single-step: hit → hot with wordList=[“hot”] → 2.
No path: disconnected words → 0.
Long L: words of length 10, N = 5000 (load test).
All same-length: invariant must hold; assert in code.

Follow-up Questions

“Return all shortest paths, not just length.” → BFS to identify the layer of endWord, then DFS backward through parent pointers stored at each layer.
“What if word lengths differ across the list?” → Edges are now insert/delete/substitute; problem reduces to edit-distance graph. Out of scope here.
“What if N = 10^6?” → Bidirectional BFS halves the depth each frontier must reach; trie-based neighbor finding can replace bucket dicts.
“Stream of new words being added live.” → Maintain buckets incrementally; BFS becomes a per-query operation.
“What if changing a letter has a cost?” → Now weighted; switch to Dijkstra.

Product Extension

Spell-correctors, fuzzy-matching APIs, and DNA-mutation analyzers all use similar implicit-graph BFS. Google’s “did you mean” suggestion historically used Levenshtein-distance graphs over its query log; word-ladder BFS is the toy version of that.

Language/Runtime Follow-ups

Python: defaultdict(list) and collections.deque are essential. String slicing is O(L); w[:i] + '*' + w[i+1:] allocates a new string per call. Building all patterns copies about N · L² = 5 × 10^5 characters at N = 5000, L = 10, which is cheap. BFS rebuilds the same patterns on dequeue, so precomputing each word’s patterns once avoids the repeat work.
Java: use HashMap<String, List<String>> and ArrayDeque<String>. StringBuilder for pattern construction is faster than string concat. Use int distance via a parallel map or wrap in a custom record.
Go: map[string][]string and a slice-based queue (q = q[1:] is O(1)). Avoid buffered channels as queues, since a full channel blocks the sender. Strings are immutable, so building each pattern is O(L) regardless.
C++: unordered_map<string, vector<string>> and queue<pair<string,int>>. Preallocate to avoid rehashing. Use string_view if possible to avoid copies; or (word_index, distance) to avoid string keys entirely.
JS/TS: Map and an array used as a queue (shift() is O(N) — instead, use a deque library or two-stack approach). Strings are immutable; pattern construction allocates.

Common Bugs

Forgetting to check endWord in word_set up front — wastes work if missing.
Visited check on dequeue, not on enqueue: the same word is enqueued many times (once per incoming edge), inflating queue size and run time.
Returning the BFS distance instead of distance + 1 (or vice versa) — off by one.
Including beginWord in word_set and then re-visiting it on a wildcard match. This is easy to hit if you don’t initialize visited = {beginWord} first.
Using different placeholder characters when building and when looking up buckets (e.g., '_' in one place and '*' in the other), so every lookup hits an empty bucket.
Keeping wordList as a list: membership tests become O(N), and duplicates (if the uniqueness constraint is relaxed) are processed twice. Convert it to a set.

Debugging Strategy

Print the buckets for a tiny example (["hot", "dot", "dog"], L=3) and verify each wildcard pattern maps to the expected words. Print the BFS queue state after each layer. If the BFS terminates too early, trace which word was dequeued at the failure point and which neighbors weren’t generated. If too slow, profile with cProfile (Python) — likely you’re not visited-marking on enqueue.

Mastery Criteria

Recognized the implicit-graph signal (one-character-difference adjacency) within 60 seconds.
Wrote the bucket construction from blank screen in <3 minutes, no off-by-ones.
Wrote correct BFS with visited-on-enqueue in <5 minutes from cold start.
Stated O(N · L²) complexity unprompted.
Solved LC 127 in <15 minutes from cold start.
Solved LC 433 (Minimum Genetic Mutation, a near-clone) in <10 minutes.
Articulated the wildcard-bucket vs all-pairs tradeoff in <30 seconds when asked.

Lab 02 — DFS Connected Components (Number of Islands)

Goal

Implement DFS on a 2D grid to enumerate connected components, both recursively and iteratively. After this lab you should be able to write numIslands from a blank slate in <8 minutes, convert recursive DFS to iterative DFS in <3 minutes, and extend the template to any grid-component problem (max area, perimeter, surrounded regions) by changing 5 lines or fewer.

Background Concepts

A grid graph treats each cell (r, c) as a node and edges as adjacencies between 4-connected (or 8-connected) neighboring cells of compatible type. A connected component is a maximal set of cells reachable from each other. Counting components reduces to: scan all cells; whenever an unvisited “land” cell is found, increment the count and DFS-mark its entire component as visited.

DFS is naturally recursive: enter a cell, mark visited, recurse to each valid neighbor. The recursion depth equals the depth of the DFS tree, which can approach the component size. For an R × C grid the worst case is R · C; at 300 × 300 that’s 9 × 10^4, far beyond Python’s default recursion limit of 1000. The iterative version uses an explicit stack and avoids this entirely.

Interview Context

Number of Islands (LC 200) is one of the most frequently asked grid-DFS problems. It has appeared at virtually every FAANG company at least once, and at Amazon, Meta, and Google in the last year. It’s a stock phone-screen problem at L3-L4 and a warm-up at L5+. Bombing it is a no-hire signal at any senior level. The interviewer expects you to know it cold, and the value-add comes from how you handle the follow-ups: max area, surrounded regions, online updates (LC 305), grid as adjacency matrix (LC 547).

Problem Statement

Given an m × n 2D binary grid where '1' represents land and '0' represents water, return the number of islands. An island is a maximal group of land cells connected 4-directionally (horizontally or vertically).

Constraints

1 ≤ m, n ≤ 300
grid[i][j] is '0' or '1'.
The grid is surrounded implicitly by water on all sides.

Clarifying Questions

Does diagonal adjacency count (8-connected), or only horizontal/vertical (4-connected)? (4-connected only.)
Are the cell values strings '0'/'1' or ints? (Strings, per LeetCode.)
Can the input be modified in place? (Usually yes — saves O(R·C) visited memory.)
Are very tall thin grids possible? (Yes: 1 × 300 and 300 × 1 are allowed by the constraints.)
Is the count required to fit in int32? (Yes; max islands ≈ 4.5 × 10^4.)

Examples

grid = [["1","1","0"],
        ["1","0","0"],
        ["0","0","1"]]
→ 2

grid = [["1","1","1"],
        ["1","1","1"],
        ["0","0","0"]]
→ 1

Initial Brute Force

For each cell, if it’s land and unvisited, increment count and recursively flood-fill. There is no significantly worse “naive” — DFS is the natural approach.

Brute Force Complexity

O(R · C) — every cell is visited exactly once. Space O(R · C) for the recursion stack worst case.

Optimization Path

There’s no asymptotic improvement; only constant-factor and stack-depth improvements:

In-place marking (mutate '1' → '0') — saves O(R · C) memory.
Iterative DFS via stack — avoids Python recursion-limit blow-up at R · C ≥ 10^4.
DSU as alternative — slightly slower in practice (α factor) but composable for online problems (LC 305).
BFS variant — same asymptotic, different constant; sometimes preferred in Python because deque-pop has lower per-call cost than function calls.

Final Expected Approach

Iterative DFS with in-place marking:

Iterate over every cell.
If grid[r][c] == '1': increment count, push (r, c) to stack.
While stack non-empty: pop (r, c); if already water, skip; mark '0'; push 4 neighbors that are land.

Recursive DFS is acceptable for small grids; mention recursion-limit and in-place marking on follow-up.
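Here is a recursive sketch for comparison with the iterative version, assuming the same '0'/'1' string grid. The function name num_islands_recursive is hypothetical. Raising the recursion limit only lifts Python’s depth check; it does not enlarge the interpreter’s C stack. Very deep flood fills can therefore still crash, which is why the iterative form is the safer default.

```python
import sys


def num_islands_recursive(grid):
    """Count 4-connected islands of '1' cells; mutates grid ('1' -> '0')."""
    if not grid:
        return 0
    R, C = len(grid), len(grid[0])
    # Depth can reach R * C; lift CPython's default limit (about 1000) accordingly.
    sys.setrecursionlimit(max(sys.getrecursionlimit(), R * C + 100))

    def sink(r, c):
        if not (0 <= r < R and 0 <= c < C) or grid[r][c] != '1':
            return
        grid[r][c] = '0'  # mark before recursing so no cell is entered twice
        sink(r - 1, c)
        sink(r + 1, c)
        sink(r, c - 1)
        sink(r, c + 1)

    count = 0
    for r in range(R):
        for c in range(C):
            if grid[r][c] == '1':
                count += 1
                sink(r, c)
    return count
```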

Data Structures Used

The input grid itself as the visited bookmark (in-place mutation).
An explicit stack (list, in Python) for iterative DFS.

Correctness Argument

Component-counting via DFS is correct because: (1) DFS from an unvisited land cell visits exactly the cells in its component (closure under adjacency); (2) marking visited prevents re-counting; (3) the outer loop ensures every cell is examined; (4) we increment count only on the first cell of each component. Termination follows from finite grid size.

Complexity

Operation	Time	Space
Whole algorithm	O(R · C)	O(R · C) recursion or O(min(R,C)) for BFS

Implementation Requirements

def numIslands(grid):
    if not grid:
        return 0
    R, C = len(grid), len(grid[0])
    DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    count = 0
    for r0 in range(R):
        for c0 in range(C):
            if grid[r0][c0] != '1':
                continue
            count += 1
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                if grid[r][c] != '1':
                    continue
                grid[r][c] = '0'
                for dr, dc in DIRS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == '1':
                        stack.append((nr, nc))
    return count

Tests

All water: [["0"]] → 0.
All land: [["1"]*5 for _ in range(5)] → 1.
Diagonal 1s only: 4-connected → many islands.
Single column: [["1"],["0"],["1"],["0"],["1"]] → 3.
Single row: same logic.
Striped pattern: alternating all-1 and all-0 rows, starting with 1 → ⌈R/2⌉ components.
Large: 300 × 300 random — must complete in <1s.

Follow-up Questions

“Max area of an island.” (LC 695) → DFS returns area; track max.
“Number of distinct islands” (LC 694) → record canonical-form path of each DFS; dedupe by hash.
“Surrounded regions” (LC 130) → DFS from border, mark; flip rest.
“Add land online and report island count after each addition” (LC 305) → DSU.
“Are two grids the same modulo rotation?” — open-ended modeling; involves shape signatures.
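Here is a sketch of the LC 695 extension. It reuses the iterative-DFS shape and adds an area counter. It assumes integer cells (1 = land, 0 = water), as in LeetCode 695. The function name max_area_of_island is hypothetical. Cells are marked on push here, so each land cell is counted exactly once.

```python
def max_area_of_island(grid):
    """Largest 4-connected island area; mutates grid (1 -> 0) as the visited mark."""
    R, C = len(grid), len(grid[0])
    best = 0
    for r0 in range(R):
        for c0 in range(C):
            if grid[r0][c0] != 1:
                continue
            grid[r0][c0] = 0          # mark on push so each cell is counted once
            stack, area = [(r0, c0)], 0
            while stack:
                r, c = stack.pop()
                area += 1
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == 1:
                        grid[nr][nc] = 0
                        stack.append((nr, nc))
            best = max(best, area)
    return best
```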

Product Extension

Connected-component analysis underpins image segmentation (region labeling in computer vision), ACL group expansion in identity systems, and graph-clustering for fraud-ring detection. The grid is just a constrained adjacency; the structure generalizes to any sparse adjacency.

Language/Runtime Follow-ups

Python: recursion limit at 1000 means recursive DFS fails at large grids. Use iterative or sys.setrecursionlimit(10**6). List-as-stack is fast; tuple keys in any auxiliary structures are fine.
Java: use char[][] (LeetCode’s signature) or int[][]. Use Deque<int[]> (ArrayDeque) for the stack, with int[]{r, c} instead of Pair for performance. Don’t use Stack (legacy synchronized class).
Go: [][]byte or [][]int32. Slice-as-stack: s = s[:len(s)-1]. Two-int struct (type cell struct{ r, c int }) avoids slice allocation per push.
C++: vector<vector<char>> or vector<string>. For the stack, use stack<pair<int,int>> with emplace, or vector<pair<int,int>> with emplace_back / pop_back, to avoid copies.
JS/TS: arrays as stacks (push / pop are O(1)). For 2D grids prefer grid[r][c] over flat-array indexing for clarity; the speed diff is negligible at N = 9 × 10^4.

Common Bugs

Bounds check missing on neighbor (negative or out-of-grid index).
Marking visited only on pop without re-checking. A cell can be pushed several times via different neighbors, so either mark on push or skip already-marked cells when popped (as the implementation above does).
Recursive DFS without sys.setrecursionlimit blowing the stack on 300 × 300 dense islands.
Mutating the input when the caller didn’t allow it — clarify in interviews.
Treating '1' as integer 1 (or vice versa) — equality check fails silently.
Off-by-one on R and C (using <= R instead of < R).
Forgetting one of the four directions (typo in DIRS).

Debugging Strategy

For a 3 × 3 grid, print the grid after each DFS call to verify in-place marking. Add a print((r, c)) on each pop and verify the 4 neighbors are correctly considered. If count is too high, you’re probably re-counting a visited component (visited check missing). If too low, your DFS is exiting early — check the bounds and the equality on '1' vs 1.

Mastery Criteria

Recognized “count connected groups in a grid” as DFS/BFS in <30 seconds.
Wrote both recursive and iterative DFS from cold start in <8 minutes total.
Handled the recursion-limit trap correctly when asked about 10^4 × 10^4 grids.
Stated O(R · C) complexity unprompted.
Solved LC 200 in <10 minutes from cold start.
Solved LC 695 (Max Area) in <12 minutes by extending the template.
Solved LC 130 (Surrounded Regions) in <20 minutes by inverting the search.
Articulated when DSU is preferable to DFS (online updates, no spatial constraint).

Lab 03 — Multi-Source BFS (Rotting Oranges)

Goal

Implement multi-source BFS on a grid to compute the minimum time for a process to spread from multiple simultaneous starting points. After this lab you should be able to recognize the multi-source BFS signal (any “infection / spread / nearest-source” problem) in <60 seconds, initialize the queue correctly with all sources at distance 0, and write the layer-by-layer time-tracking logic without off-by-ones.

Background Concepts

Multi-source BFS is single-source BFS with a virtual super-source connected to all real sources by zero-weight edges. We don’t materialize the super-source; we just enqueue all real sources at distance 0 simultaneously. The BFS then proceeds layer by layer, and each cell’s distance is min over all sources of (path length to that source). Critically, this is O(V + E) — same as single-source BFS — not O(K · (V + E)) for K sources.
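Here is a sketch of the super-source idea as a reusable distance map: the “nearest source” variant, as in LeetCode 542. It assumes every cell is passable and that sources are the cells equal to source. The function name nearest_source_distance is hypothetical. Every source enters the queue at distance 0 in one pass, so the cost stays O(R · C) regardless of how many sources there are.

```python
from collections import deque


def nearest_source_distance(grid, source=0):
    """Distance from every cell to its nearest cell equal to `source` (-1 if none)."""
    R, C = len(grid), len(grid[0])
    dist = [[-1] * C for _ in range(R)]  # -1 = not reached yet
    q = deque()
    for r in range(R):
        for c in range(C):
            if grid[r][c] == source:
                dist[r][c] = 0           # all sources start together at distance 0
                q.append((r, c))
    while q:
        r, c = q.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < R and 0 <= nc < C and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist
```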

The “rotting oranges” problem asks: given a grid where some cells contain rotten fruit (sources) and some contain fresh fruit (targets), how many time steps until all fresh fruit rots? Each minute, every rotten orange infects its 4-connected fresh neighbors. The answer is the maximum BFS distance among fresh oranges, or -1 if any fresh orange is unreachable.

Interview Context

Rotting Oranges (LC 994) appears at Amazon, Meta, and Microsoft phone screens regularly. The trap is candidates running single-source BFS K times — one per rotten cell — which is O(K · R · C) and blows up at K = R · C / 2. Multi-source BFS is the senior signal here; recognizing it within the first 90 seconds and stating it explicitly differentiates a strong L4 from a struggling one.

Problem Statement

Given an m × n grid where each cell is:

0: empty,
1: fresh orange,
2: rotten orange,

each minute every rotten orange rots its 4-connected fresh neighbors. Return the minimum minutes until no fresh orange remains, or -1 if some fresh orange can never rot.

Constraints

1 ≤ m, n ≤ 10
grid[i][j] ∈ {0, 1, 2}
(Note: LC 994 uses a small grid; the algorithm is linear in R · C, so it scales to far larger grids.)

Clarifying Questions

If the grid has no fresh oranges initially, what’s the answer? (0 — already done.)
What if some fresh orange can never be reached? (Return -1, even if every other orange rots.)
Are ties between sources broken consistently? (Doesn’t matter — we want minimum distance, which is unambiguous.)
Can a rotten orange “re-rot” a previously-rotted cell? (No — once rotten, stays rotten.)

Examples

grid = [[2,1,1],
        [1,1,0],
        [0,1,1]]
→ 4

grid = [[2,1,1],
        [0,1,1],
        [1,0,1]]
→ -1  (bottom-left is unreachable)

grid = [[0,2]]
→ 0

Initial Brute Force

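Simulate minute by minute. At each step, collect every fresh orange that touches a rotten one, then rot them all at once; repeat until nothing changes. Each step is O(R · C), and every productive step rots at least one orange, so there are at most R · C steps. A winding corridor of fresh oranges can approach that, giving O((R · C)²) in the worst case. At 10 × 10 this is trivially fast, but it doesn’t scale.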
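Here is a sketch of that minute-by-minute simulation, assuming the 0/1/2 grid encoding from the problem statement. The function name oranges_rotting_simulation is hypothetical. It is too slow to present as the final answer, but it makes a handy brute-force oracle for randomized testing of the BFS version.

```python
DIRS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def oranges_rotting_simulation(grid):
    """Brute force: rot every fresh orange touching a rotten one, once per minute."""
    g = [row[:] for row in grid]  # work on a copy; the caller's grid is untouched
    R, C = len(g), len(g[0])

    def touches_rotten(r, c):
        return any(0 <= r + dr < R and 0 <= c + dc < C and g[r + dr][c + dc] == 2
                   for dr, dc in DIRS)

    minutes = 0
    while True:
        # Collect first, then apply, so rot spreads exactly one step per minute.
        to_rot = [(r, c) for r in range(R) for c in range(C)
                  if g[r][c] == 1 and touches_rotten(r, c)]
        if not to_rot:
            break
        for r, c in to_rot:
            g[r][c] = 2
        minutes += 1
    return -1 if any(v == 1 for row in g for v in row) else minutes
```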

Brute Force Complexity

O(R · C · max-distance). At 10 × 10, ~10^4 ops. Passes LC bounds easily but is “embarrassing” — interviewer wants the BFS framing.

Optimization Path

Multi-source BFS gives the optimal O(R · C):

Scan grid: count fresh oranges; enqueue every rotten orange at distance 0.
BFS layer by layer; on rotting a fresh cell, decrement the fresh count and enqueue at next distance.
After BFS, if fresh count > 0, return -1; else return the maximum distance reached.

BFS returns the same answer as the simulation but processes each cell once. It also presents better in interviews, because it’s the canonical multi-source signal.

Final Expected Approach

fresh_count = number of cells equal to 1
queue = deque of (r, c, 0) for every cell equal to 2
time = 0
while queue:
    r, c, t = queue.popleft()
    time = max(time, t)
    for each 4-neighbor (nr, nc):
        if in bounds and grid[nr][nc] == 1:
            grid[nr][nc] = 2
            fresh_count -= 1
            queue.append((nr, nc, t + 1))
return -1 if fresh_count > 0 else time

Data Structures Used

collections.deque of (row, col, time) tuples.
The input grid as the visited marker (mutate fresh → rotten on rot).
An integer fresh_count.

Correctness Argument

The super-source argument: imagine a virtual node S₀ with an edge to every initially rotten cell. BFS from S₀ visits cells in non-decreasing distance order, and each cell's BFS distance from S₀ is 1 + (shortest path length from the nearest initially rotten cell). Seeding the queue with all rotten cells at distance 0 is that same BFS with the first layer already expanded and the constant 1 subtracted, so every fresh cell receives the minute at which it rots. The “minutes until no fresh remains” is therefore the maximum BFS distance among initially fresh cells, i.e., the time the last cell rots. Unreachable fresh cells are detected by the fresh_count > 0 check after the BFS.

Complexity

Operation	Time	Space
Initial scan	O(R · C)	O(R · C) for queue worst case
BFS	O(R · C)	(already counted)
Total	O(R · C)	O(R · C)

Implementation Requirements

from collections import deque

def orangesRotting(grid):
    R, C = len(grid), len(grid[0])
    queue = deque()
    fresh = 0
    for r in range(R):
        for c in range(C):
            if grid[r][c] == 2:
                queue.append((r, c, 0))
            elif grid[r][c] == 1:
                fresh += 1
    if fresh == 0:
        return 0
    DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    time = 0
    while queue:
        r, c, t = queue.popleft()
        time = t
        for dr, dc in DIRS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == 1:
                grid[nr][nc] = 2
                fresh -= 1
                queue.append((nr, nc, t + 1))
    return -1 if fresh > 0 else time

Tests

All rotten: 0.
All fresh, no rotten: -1.
Mixed with one isolated fresh: -1.
Single rotten in corner, all fresh: distance = R + C - 2.
Empty grid (all zeros): 0.
1×1 grid with [[0]]: 0; [[1]]: -1; [[2]]: 0.
Stress: 10×10 random; verify against simulation brute force.

Follow-up Questions

“What if rotting takes a different amount of time per cell?” → Edge weights vary; use Dijkstra.
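“What if there are obstacle cells?” → Empty cells (0) already act as obstacles, because only cells equal to 1 are ever enqueued; a distinct wall value needs no extra code.
“What if rot spreads only in some directions (e.g., only right and down)?” → Replace DIRS with the allowed subset.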
“Walls and Gates (LC 286): from each empty room, find distance to nearest gate.” → Same multi-source BFS pattern; gates are the sources.
“01 Matrix (LC 542): for each cell, distance to nearest 0.” → Sources are the 0s; targets are the 1s.
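A minimal sketch of the 01 Matrix adaptation, assuming the LC 542 guarantee that the matrix contains at least one 0. The function name update_matrix is illustrative. Every 0 is enqueued at distance 0, and the dist grid doubles as the visited marker (-1 means not yet reached).

```python
from collections import deque

def update_matrix(mat):
    R, C = len(mat), len(mat[0])
    dist = [[-1] * C for _ in range(R)]  # -1 = not reached yet
    queue = deque()
    for r in range(R):
        for c in range(C):
            if mat[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))     # every 0 is a source at distance 0
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < R and 0 <= nc < C and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1  # first visit is the shortest distance
                queue.append((nr, nc))
    return dist
```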

Product Extension

Spreadable processes — disease propagation in epidemiology, fire spread in simulations, viral content propagation in social graphs, “blast radius” of a deployment failure across a service mesh — all map to multi-source BFS. The minute-by-minute simulation in production tracking systems is exactly this algorithm.

Language/Runtime Follow-ups

Python: collections.deque is the canonical queue. Tuples for (r, c, t). Avoid list.pop(0) — that’s O(N).
Java: ArrayDeque<int[]> with int[]{r, c, t}. Use pollFirst / offerLast.
Go: a slice as the queue (q = q[1:] is O(1), though the backing array isn't reclaimed until append reallocates) or container/list. A [3]int array or a small struct works as the element type.
C++: queue<tuple<int,int,int>>. emplace for efficiency. auto [r, c, t] = q.front() in C++17.
JS/TS: Array.prototype.shift() is O(N) in general; use a head-index pointer over an array (or a ring buffer) as the queue.

Common Bugs

Counting time as time + 1 after the BFS terminates — the last layer’s t is already correct.
Forgetting to enqueue all rotten cells before starting — single-source BFS, missing K-1 sources.
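Initializing time to -1 and dropping the fresh == 0 short-circuit: an all-empty grid then returns -1 instead of 0.
Decrementing fresh_count at pop time instead of when the cell is marked rotten; if the marking is also deferred, a cell can be enqueued and decremented twice.
BFS correctly terminating but reporting t from the wrong layer (e.g., the last enqueued, not last popped).
Writing grid[nr][nc] = 2 before the bounds check: an IndexError, or, with negative indices in Python, a silent write to the opposite edge of the grid.
Returning a nonzero answer when the grid starts with no fresh oranges: the answer is 0 whether or not rotten oranges exist, so short-circuit on fresh == 0 (or make sure time starts at 0).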

Debugging Strategy

Trace a 3 × 3 grid by hand: print the queue contents and grid after each pop. Verify time increments by exactly 1 between layers. If fresh > 0 at the end, identify the unreachable cell — confirm it has no path to any source by visual inspection. If time is off by 1, you’re probably tracking time = t + 1 on enqueue instead of t on pop.
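One way to make layer boundaries explicit while tracing is to drain the queue one layer per minute instead of storing t in each tuple. This is a sketch under that assumption: oranges_rotting_layered and its trace flag are hypothetical names, and the function copies the grid so tracing doesn't mutate the caller's input.

```python
from collections import deque

def oranges_rotting_layered(grid, trace=False):
    grid = [row[:] for row in grid]  # leave the caller's grid untouched
    R, C = len(grid), len(grid[0])
    queue = deque((r, c) for r in range(R) for c in range(C) if grid[r][c] == 2)
    fresh = sum(row.count(1) for row in grid)
    minutes = 0
    while queue and fresh > 0:
        for _ in range(len(queue)):  # drain exactly one layer = one minute
            r, c = queue.popleft()
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == 1:
                    grid[nr][nc] = 2
                    fresh -= 1
                    queue.append((nr, nc))
        minutes += 1
        if trace:
            print(f"after minute {minutes}: fresh={fresh} grid={grid}")
    return -1 if fresh > 0 else minutes
```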

Mastery Criteria

Recognized “spread from multiple sources” as multi-source BFS in <60 seconds.
Initialized the queue with all sources at distance 0 unprompted.
Wrote correct BFS with time tracking from cold start in <8 minutes.
Stated O(R · C) complexity unprompted; explained why running K single-source BFSs (O(K · R · C)) is unnecessary.
Solved LC 994 in <12 minutes from cold start.
Solved LC 286 (Walls and Gates) in <12 minutes by extending the template.
Solved LC 542 (01 Matrix) in <12 minutes.
Articulated the super-source equivalence in <30 seconds.

Lab 04 — Dijkstra (Network Delay Time)

Goal

Implement Dijkstra’s algorithm (lazy variant, binary heap) for single-source shortest path on a non-negative-weighted directed graph. After this lab you should be able to write Dijkstra from a blank screen in <8 minutes, including the staleness-skip line; recognize the non-negative-weight signal in <30 seconds; and adapt the template to “shortest path with constraints” (e.g., max K edges) by extending the state.

Background Concepts

Dijkstra’s algorithm computes the shortest path from a source s to every other node in a graph with non-negative edge weights. The core invariant: when a node u is extracted from the priority queue (heap), its tentative distance dist[u] is final. The proof relies on non-negativity: take any other path to u and let w be its first node not yet extracted. The prefix up to w is at least dist[w] (w’s predecessor was extracted and relaxed its edge to w), dist[w] ≥ dist[u] because u left the heap first, and the rest of the path has non-negative length, so the whole path is at least dist[u].

Two variants:

Lazy: push every relaxation (new_dist, neighbor) to the heap; on pop, skip if new_dist > dist[neighbor] (stale entry). Heap holds up to E entries. Simpler.
Eager: maintain a decrease-key indexed heap; each node appears at most once. Faster constants, more code.

In interviews, lazy is the default. State that explicitly.

Interview Context

Dijkstra appears in the top 5 graph algorithms tested at FAANG. Network Delay Time (LC 743) is the canonical version, asked at Google, Amazon, and Bloomberg. Cheapest Flights (LC 787) is the constrained variant. The senior signal is: state “this is Dijkstra, weights are non-negative, lazy variant with binary heap, O((V+E) log V)” within the first two minutes, before writing code. Candidates who don’t articulate this and just dive in lose points even if the code works.

Problem Statement

You are given a network of n nodes labeled 1..n and a list times of directed edges, where times[i] = (u, v, w) means a signal takes w time to travel from u to v. A signal is sent from node k. Return the minimum time for all nodes to receive the signal, or -1 if some node never receives it.

Constraints

1 ≤ n ≤ 100, 1 ≤ |times| ≤ 6000
1 ≤ u, v, k ≤ n; u ≠ v
0 ≤ w ≤ 100
Edges are directed; possibly multi-edges.

Clarifying Questions

Are weights non-negative? (Yes — Dijkstra applies.)
Are nodes 1-indexed? (Yes — adjust array sizes accordingly.)
Are duplicate edges possible? (Yes; only the cheapest edge between a given (u, v) matters, and relaxation picks it automatically.)
Is the graph guaranteed connected? (No — return -1 if unreachable.)
What’s the answer if the graph has only the source? (0 — already received.)

Examples

times = [[2,1,1],[2,3,1],[3,4,1]], n=4, k=2
→ 2  (signal reaches 1, 3 at time 1; 4 at time 2)

times = [[1,2,1]], n=2, k=1
→ 1

times = [[1,2,1]], n=2, k=2
→ -1

Initial Brute Force

Bellman-Ford: V - 1 rounds of relaxing all E edges. Time O(V · E). At V=100, E=6000: 6 × 10^5 ops. It passes easily and is correct, but it is not the answer the interviewer expects when they ask for the best complexity.

Alternative brute force: plain BFS that treats every edge as one step. It gives wrong answers on weighted graphs unless all weights are equal.

Brute Force Complexity

Bellman-Ford: O(V · E) = 6 × 10^5. BFS-as-Dijkstra: incorrect for varying weights.

Optimization Path

Dijkstra with binary heap: O((V + E) log V) = ~5 × 10^4 with these constraints. The right answer.

For very dense graphs (E ~ V²), simple Dijkstra without a heap (just scan for the min unsettled node) is O(V²) and can be faster. Floyd-Warshall is O(V³) all-pairs — overkill for single-source, but valid here at V=100.
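A sketch of the heap-free O(V²) variant on this problem's 1-indexed input, assuming an adjacency matrix is acceptable (V = 100 here). network_delay_dense is a hypothetical name; each round settles the unsettled node with the smallest tentative distance by a linear scan.

```python
def network_delay_dense(times, n, k):
    """O(V^2) Dijkstra with an adjacency matrix; nodes are 1..n."""
    INF = float('inf')
    cost = [[INF] * (n + 1) for _ in range(n + 1)]
    for u, v, w in times:
        cost[u][v] = min(cost[u][v], w)  # keep only the cheapest parallel edge
    dist = [INF] * (n + 1)
    dist[k] = 0
    settled = [False] * (n + 1)
    for _ in range(n):
        # Linear scan for the unsettled node with the smallest tentative distance.
        u = -1
        for x in range(1, n + 1):
            if not settled[x] and (u == -1 or dist[x] < dist[u]):
                u = x
        if dist[u] == INF:
            break  # every remaining node is unreachable
        settled[u] = True
        for v in range(1, n + 1):
            if cost[u][v] < INF and dist[u] + cost[u][v] < dist[v]:
                dist[v] = dist[u] + cost[u][v]
    ans = max(dist[1:])
    return -1 if ans == INF else ans
```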

Final Expected Approach

Build adjacency list from edge list. adj[u] is a list of (weight, neighbor) pairs.
dist[i] = ∞ for all i except dist[k] = 0.
Push (0, k) to a min-heap.
While heap non-empty: pop (d, u); if d > dist[u], skip (stale); else relax all edges u → v: if d + w < dist[v], update and push.
After the loop, max(dist[1..n]) is the answer; if any is ∞, return -1.

Data Structures Used

Adjacency list: dict[int, list[(int, int)]] or list[list[(int, int)]] indexed 1..n.
Distance array: list[int] of size n+1, init to inf.
Priority queue: Python’s heapq (min-heap).

Correctness Argument

Loop invariant: when (d, u) is popped from the heap with d == dist[u], that distance is final. Proof: take any other path to u and let w be the first node on it not yet finalized (one exists, because u itself has not been finalized). Relaxation from w’s finalized predecessor gives dist[w] ≤ the length of the path’s prefix up to w, and heap order gives dist[w] ≥ dist[u]. The remaining tail is non-negative, so the path’s total length is at least dist[w] ≥ dist[u]. So dist[u] is optimal.

Termination: each node is finalized at most once (the staleness skip ensures repeated pops don’t re-relax). There are at most V finalizations and at most E heap pushes, and each heap operation costs O(log E), which is O(log V) when E ≤ V², giving O((V + E) log V).

Complexity

Operation	Time	Space
Build adjacency	O(V + E)	O(V + E)
Dijkstra	O((V + E) log V)	O(V + E) heap
Total	O((V + E) log V)	O(V + E)

Implementation Requirements

import heapq
from collections import defaultdict

def networkDelayTime(times, n, k):
    adj = defaultdict(list)
    for u, v, w in times:
        adj[u].append((w, v))
    INF = float('inf')
    dist = [INF] * (n + 1)
    dist[k] = 0
    heap = [(0, k)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for w, v in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    ans = max(dist[1:n+1])
    return -1 if ans == INF else ans

Tests

Standard: the first example (2 → 1, 2 → 3 → 4) → 2.
Disconnected: source can’t reach a node → -1.
Single node, source = only node → 0.
Duplicate edges: pick min weight on relaxation.
Self-loop (the problem disallows it, but handle it defensively): with w ≥ 0 it never improves dist, so the answer is unaffected.
Stress: V=100, E=6000 random, compare against Bellman-Ford reference.
Adversarial: dense graph forcing many heap pushes.

Follow-up Questions

“Now find shortest path with at most K edges.” (LC 787) → Bellman-Ford OR Dijkstra with state (node, edges_used). See Lab 05.
“Now weights can be negative.” → Dijkstra is wrong; use Bellman-Ford.
“Find path with maximum probability” (LC 1514) → Dijkstra with max-heap and product (or -log weights and standard Dijkstra).
“All-pairs shortest path.” → Run Dijkstra from every node, O(V · (V+E) log V), or Floyd-Warshall O(V³).
“Graph evolves online.” → Recompute on each query, or use dynamic shortest-path structures (advanced).
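For the maximum-probability follow-up, a minimal sketch assuming the LC 1514 input shape (0-indexed nodes, undirected edges, a parallel succ_prob list, return 0 when unreachable). max_probability is a hypothetical name. Python’s heapq is a min-heap, so probabilities are negated; multiplying by a factor in [0, 1] never increases a path’s value, which plays the role of non-negative weights and keeps the settle-on-pop argument valid.

```python
import heapq
from collections import defaultdict

def max_probability(n, edges, succ_prob, start, end):
    adj = defaultdict(list)
    for (a, b), p in zip(edges, succ_prob):
        adj[a].append((b, p))
        adj[b].append((a, p))      # undirected edges
    best = [0.0] * n
    best[start] = 1.0
    heap = [(-1.0, start)]         # negated probability turns the min-heap into a max-heap
    while heap:
        neg_p, u = heapq.heappop(heap)
        p = -neg_p
        if p < best[u]:
            continue               # stale entry
        if u == end:
            return p               # settled: no later pop can beat it
        for v, q in adj[u]:
            if p * q > best[v]:
                best[v] = p * q
                heapq.heappush(heap, (-best[v], v))
    return 0.0
```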

Product Extension

Network monitoring tools use Dijkstra-like algorithms for path-cost estimation. Routing protocols like OSPF use Dijkstra (link-state routing) to compute shortest paths in IP networks. CDN edge selection, traffic engineering, and request routing in microservices meshes all rely on shortest-path computations parameterized by latency, cost, or QoS.

Language/Runtime Follow-ups

Python: heapq is a min-heap; for max-heap, negate weights or use tuples (-w, ...). Tuple comparison is element-by-element — (d, u) compares by d first.
Java: PriorityQueue<int[]> with a comparator on the distance index, or PriorityQueue<long[]> if distances can overflow int. Avoid javafx.util.Pair: JavaFX is no longer bundled with the JDK since Java 11.
Go: container/heap requires implementing the heap.Interface. Tedious; many candidates inline a slice-based heap. For small N, even O(V²) Dijkstra without heap is fine.
C++: priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> for min-heap. Or negate weights with default max-heap.
JS/TS: no built-in heap. Use a library (heap-js) or hand-roll. For small N, an O(V²) scan is acceptable.

Common Bugs

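Forgetting the staleness check if d > dist[u]: continue. The heap can hold several entries for the same node after repeated improvements, and without the skip every stale pop re-scans u’s edges. Answers stay correct (a stale distance can never improve a neighbor), but the extra work can blow up on dense graphs.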
Pushing (u, dist[u]) instead of (dist[u], u) — heap orders on first element, so distance must come first.
1-indexed vs 0-indexed off-by-one. The problem is 1-indexed; size arrays at n + 1.
Initial dist not set to ∞ — using 0 makes the source’s neighbors look “already optimal”.
Pushing the source to the heap but forgetting to set dist[source] = 0.
Returning min(dist), which is just the source’s 0, instead of max(dist), the time at which the last node receives the signal.
Negative weights — Dijkstra silently produces wrong answers; you won’t see this fail unless you stress-test.

Debugging Strategy

Print the heap contents and dist array after each pop. Verify the popped distance equals dist[u] for non-stale entries. For wrong answers, trace a specific node v where dist[v] is wrong: identify the predecessor u that should have relaxed it; check that (d_u + w_uv) is computed correctly. For complexity blowup, check that the staleness skip is present and triggers.

Mastery Criteria

Recognized “shortest path with non-negative weights” as Dijkstra in <30 seconds.
Wrote Dijkstra from blank screen with the staleness check in <8 minutes.
Stated O((V + E) log V) complexity unprompted.
Articulated why Dijkstra fails on negative weights in <30 seconds.
Solved LC 743 in <15 minutes from cold start.
Solved LC 1631 (Path With Minimum Effort) in <20 minutes by redefining path cost as the maximum edge weight along the path.
Solved LC 778 (Swim in Rising Water) in <20 minutes.
Stated lazy vs eager difference in <30 seconds.

Lab 05 — Bellman-Ford (Cheapest Flights Within K Stops)

Goal

Implement Bellman-Ford for shortest path on a graph that may contain negative weights, and exploit its iteration structure to solve the canonical “shortest path with at most K edges” problem. After this lab you should be able to recognize the K-edge-budget signal in <60 seconds, write Bellman-Ford from a blank slate in <10 minutes, and articulate the negative-cycle detection extension cleanly.

Background Concepts

Bellman-Ford runs V - 1 rounds; in each round, relax all E edges. After round i, dist[v] is at most the length of the shortest path from source to v using at most i edges, and exactly that length if each round reads only the previous round’s values (the prev/curr version used in this lab). This invariant is the key insight for “K-edge-budget” problems: run the algorithm for K rounds (or K + 1, depending on edge-vs-stop semantics) and read off the answer. A V-th round that still relaxes any edge proves a negative cycle reachable from the source.

The complexity is O(V · E). It tolerates negative weights (unlike Dijkstra) but is slower on graphs without them. The “shortest path with at most K edges” framing is the most common interview reason to use Bellman-Ford.

Interview Context

Cheapest Flights Within K Stops (LC 787) is asked at Amazon, Bloomberg, and Adobe. The trap is candidates reaching for Dijkstra and being unable to bound edge count. The strong answer: “this is Bellman-Ford with K + 1 iterations, where K stops means K + 1 edges, complexity O(K · E).” That single sentence wins the round. Negative-cycle detection (e.g., currency arbitrage) is rarer at L4-L5 but standard at staff and on competitive programming exercises.

Problem Statement

You are given n cities and a list flights[i] = (from, to, price) of directed flights. Return the cheapest price from src to dst using at most K stops (i.e., K intermediate cities, K + 1 edges). Return -1 if no such route exists.

Constraints

1 ≤ n ≤ 100; 0 ≤ |flights| ≤ n · (n − 1) / 2
1 ≤ price ≤ 10^4
0 ≤ src, dst, K < n; src ≠ dst

Clarifying Questions

“K stops” — does this mean K intermediate cities (so K + 1 edges) or K edges total? (Per LC 787 statement: K stops = K intermediate = K + 1 edges.)
Are prices positive? (Yes; no negative-cycle concerns here.)
Are duplicate flights allowed? (Possible; pick min on relaxation.)
Should we count src and dst as “stops”? (No — those are endpoints.)
Is K = 0 allowed (direct flight only)? (Yes — 0 ≤ K.)

Examples

n=3, flights=[[0,1,100],[1,2,100],[0,2,500]], src=0, dst=2, K=1
→ 200  (0 → 1 → 2 uses 1 stop)

n=3, flights=[[0,1,100],[1,2,100],[0,2,500]], src=0, dst=2, K=0
→ 500  (only direct allowed)

Initial Brute Force

DFS from src exploring all paths up to K + 1 edges; track minimum cost. Time exponential — O(V^(K+1)) — at K = 99, V = 100: infeasible.

Brute Force Complexity

O(V^(K+1)) — TLE for K > ~10.

Optimization Path

Bellman-Ford with K + 1 iterations: O((K + 1) · E). At K = 99, E ~ 5000: 5 × 10^5 ops — fast. The key technique: keep two distance arrays, prev and curr. In each iteration, compute curr[v] = min(prev[u] + w_uv) over all edges. Using prev (last round’s snapshot) prevents using more than one edge per round.

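Modified Dijkstra over states (node, edges_used) also works: push (cost, node, edges_used) to a heap and expand a state only while edges_used < K + 1. Because a costlier route with fewer edges can still win later, a node can't be finalized on its first pop; prune a popped state only when an earlier pop reached the same node with no more edges. Both approaches are correct; Bellman-Ford is simpler here because the heap prunes poorly under an edge budget.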
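A sketch of the modified-Dijkstra alternative, handy as the cross-check in the stress test. Assumptions: find_cheapest_price_dijkstra is a hypothetical name, the state stores edges used (K stops allow K + 1 edges), and a popped state is skipped when an earlier pop reached the same node at no higher cost with no more edges, since it is then dominated.

```python
import heapq
from collections import defaultdict

def find_cheapest_price_dijkstra(n, flights, src, dst, k):
    """Dijkstra over (node, edges_used) states; k stops allow k + 1 edges."""
    adj = defaultdict(list)
    for u, v, w in flights:
        adj[u].append((v, w))
    max_edges = k + 1
    fewest_edges = [float('inf')] * n  # fewest edges among states already popped at each node
    heap = [(0, src, 0)]               # (cost so far, node, edges used)
    while heap:
        cost, u, used = heapq.heappop(heap)
        if u == dst:
            return cost                # pops come in cost order: cheapest within budget
        if used >= fewest_edges[u]:
            continue                   # dominated: an earlier pop was no costlier, no more edges
        fewest_edges[u] = used
        if used == max_edges:
            continue                   # edge budget exhausted; cannot extend
        for v, w in adj[u]:
            heapq.heappush(heap, (cost + w, v, used + 1))
    return -1
```

The pruning is on edge count, not a visited flag: in the graph 0 → 1 → 2 → 3 (weights 1) plus 0 → 2 (weight 5), K = 1 forces the costlier 0 → 2 state to be expanded after the cheaper 2-edge state has already reached node 2.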

Final Expected Approach

prev[i] = ∞ for all i; prev[src] = 0
for round in 1 .. K + 1:
    curr = prev.copy()
    for (u, v, w) in flights:
        if prev[u] + w < curr[v]:
            curr[v] = prev[u] + w
    prev = curr
return prev[dst] if prev[dst] != ∞ else -1

The curr = prev.copy() ensures each round uses only edges from the previous round’s distances — bounding the path to at most one new edge per round.

Data Structures Used

Two arrays of size n: prev and curr.
The flight list as the edge list (no need to build adjacency).

Correctness Argument

Invariant: after round i, prev[v] = shortest path from src to v using at most i edges. Proof by induction. Base (i = 0): only src has prev = 0, all others ∞. Inductive step: any shortest i-edge path is either an (i-1)-edge path (already in prev[v]) or extends some (i-1)-edge path to u with edge (u, v); the inner loop catches the latter via prev[u] + w_uv → curr[v]. After K + 1 rounds, prev[dst] is the shortest at-most-K+1-edge path.

Why two arrays: without the copy, in-place updates could chain multiple edges in one round, breaking the at-most-i-edges invariant.

Complexity

Operation	Time	Space
Whole algorithm	O((K + 1) · E)	O(V)

Implementation Requirements

def findCheapestPrice(n, flights, src, dst, K):
    INF = float('inf')
    prev = [INF] * n
    prev[src] = 0
    for _ in range(K + 1):
        curr = prev[:]
        for u, v, w in flights:
            if prev[u] + w < curr[v]:
                curr[v] = prev[u] + w
        prev = curr
    return prev[dst] if prev[dst] != INF else -1

Tests

Standard: 0 → 1 → 2 with K=1 → 200; K=0 → 500.
No path: disconnected → -1.
Direct flight only: K=0, no direct → -1.
K very large (≥ V - 1): equivalent to unrestricted Bellman-Ford.
Tie: two K-stop paths with same cost — return the cost.
Stress: V=100 dense, K=99 random — verify against modified Dijkstra.

Follow-up Questions

“Negative weights? Negative cycles?” → Run V rounds (not K); if round V relaxes any edge, negative cycle reachable from src exists.
“All-pairs shortest path with negative weights, no cycles.” → Johnson’s algorithm: Bellman-Ford to reweight, then Dijkstra from each source.
“Currency arbitrage detection.” → Build graph with weight = -log(rate); negative cycle = profitable arbitrage.
“K is up to 10^9.” → Matrix exponentiation on the (min, +) semiring; O(V³ log K).
“Online updates: new flights added live.” → Difficult. Restart Bellman-Ford on each query; or maintain incrementally with limited optimizations.
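A minimal sketch of the V-round negative-cycle check and the arbitrage reduction. negative_cycle_reachable and has_arbitrage are hypothetical names; rates is assumed to be a square matrix where rates[i][j] > 0 means one unit of currency i buys rates[i][j] units of j (0 means no market). A virtual source with zero-weight edges to every currency lets one run find a cycle anywhere, and the eps tolerance is an assumption to absorb floating-point rounding in the logs.

```python
import math

def negative_cycle_reachable(n, edges, src, eps=0.0):
    """Bellman-Ford: True if a negative cycle is reachable from src. edges = [(u, v, w)]."""
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0.0
    for _ in range(n - 1):          # V - 1 rounds suffice without a negative cycle
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False            # converged early: no reachable negative cycle
    # V-th round: any further improvement proves a reachable negative cycle.
    return any(dist[u] + w < dist[v] - eps for u, v, w in edges)

def has_arbitrage(rates, eps=1e-9):
    n = len(rates)
    # Weight -log(rate): a cycle whose rates multiply to > 1 has negative total weight.
    edges = [(i, j, -math.log(rates[i][j]))
             for i in range(n) for j in range(n)
             if i != j and rates[i][j] > 0]
    edges += [(n, i, 0.0) for i in range(n)]  # virtual source n reaches every currency
    return negative_cycle_reachable(n + 1, edges, n, eps)
```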

Product Extension

Travel-search engines (Kayak, Google Flights, Hipmunk) treat flight networks as graphs and apply variants of K-stop shortest path. The “fewest connections” filter is exactly K-stop. The “cheapest with up to 2 stops” is K=2 Bellman-Ford. Currency-arbitrage bots run Bellman-Ford continuously on FX-rate graphs to detect profit cycles in milliseconds.

Language/Runtime Follow-ups

Python: prev[:] is O(V) and acceptable. array.array('d', ...) stores floats compactly, but it won’t make a pure-Python relaxation loop noticeably faster.
Java: int[] prev = new int[n]; Arrays.fill(prev, Integer.MAX_VALUE); Watch overflow when summing — use long if weights × edges can overflow int.
Go: prev := make([]int, n) with manual init to a large constant; copy(curr, prev) for the snapshot. math.MaxInt32 (from package math) works as the sentinel if sums are guarded by the unreachable check.
C++: vector<int> prev(n, INT_MAX); Use long long if weights are large. std::copy for the snapshot.
JS/TS: Array.from({length: n}, () => Infinity) and prev.slice() for copy.

Common Bugs

In-place updates without the prev/curr split — chains multiple edges per round, gives wrong answers.
Running K rounds instead of K + 1 — off by one (K stops = K + 1 edges).
Treating “K stops” as K edges — read carefully; LC 787 means K + 1 edges.
Forgetting to copy prev to curr at the start of each round — uses stale curr from previous iteration.
Returning prev[dst] without the unreachable check; INF leaks into output.
Integer overflow on prev[u] + w when prev[u] is set to INT_MAX — guard with the unreachable check first.
Confusing “K = 0 means direct only” with “K = 0 means src only”.

Debugging Strategy

For a small case, print prev at the end of each round. Verify that round 1 has dist[v] = w(src → v) for direct neighbors only; round 2 has 2-edge paths, etc. If the answer is wrong, compare with a brute-force enumeration of paths up to K + 1 edges. For negative-cycle problems, verify that round V actually relaxes an edge by tracking a “changed” flag.

Mastery Criteria

Recognized “shortest path with at most K edges” as Bellman-Ford in <60 seconds.
Articulated the iteration-as-edge-budget invariant unprompted.
Wrote Bellman-Ford from blank screen with prev/curr split in <10 minutes.
Stated O((K + 1) · E) complexity unprompted.
Solved LC 787 in <20 minutes from cold start.
Articulated negative-cycle detection in <30 seconds when asked.
Articulated when Dijkstra is preferable (no edge budget, non-negative weights) in <30 seconds.

Lab 06 — Topological Sort (Alien Dictionary)

Goal

Build a topological sort over an inferred constraint graph — a problem whose graph is not given but must be extracted from the input. After this lab you should be able to recognize “ordering with constraints” problems in <60 seconds, write Kahn’s algorithm from a blank screen in <8 minutes, and identify and handle the three degenerate cases of alien-dictionary parsing (prefix violation, no constraint differs, cyclic dependency).

Background Concepts

A topological sort orders the vertices of a DAG such that every directed edge u → v has u before v. Kahn’s algorithm repeatedly removes a node of in-degree 0, decrementing its neighbors’ in-degrees. If the final order has length V, the graph is a DAG and the order is valid. Otherwise, a cycle exists.
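A generic sketch of Kahn’s algorithm on integer-labeled nodes (the shape LC 207 uses), assuming edges arrive as (u, v) pairs meaning u must come before v. kahn_topo_order is a hypothetical name; it returns None when a cycle leaves some nodes with positive in-degree.

```python
from collections import deque

def kahn_topo_order(num_nodes, edges):
    """Return a topological order of 0..num_nodes-1, or None if there is a cycle."""
    adj = [[] for _ in range(num_nodes)]
    in_deg = [0] * num_nodes
    for u, v in edges:            # edge u -> v: u must come before v
        adj[u].append(v)
        in_deg[v] += 1
    queue = deque(u for u in range(num_nodes) if in_deg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:          # "remove" u: each successor loses one incoming edge
            in_deg[v] -= 1
            if in_deg[v] == 0:
                queue.append(v)
    # Nodes on a cycle never reach in-degree 0, so the order comes up short.
    return order if len(order) == num_nodes else None
```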

For alien dictionary, the input is a list of words known to be sorted lexicographically in some unknown alien alphabet. Each adjacent pair of words gives at most one ordering constraint between two characters: the first position where they differ tells you a < b for those characters. Build the constraint graph, run topological sort, output an order. Handle three degenerate inputs:

No character differs over the common length and the second word is a proper prefix of the first (e.g., ["abc", "ab"]): invalid; return "".
No character differs over the common length and the first word is a prefix of, or equal to, the second (e.g., ["ab", "abc"]): valid; no constraint is added.
Cycle in the constraint graph — invalid; return "".

Interview Context

Alien Dictionary (LC 269) is one of the most-asked Hard graph problems at Meta, Google, and Airbnb. Premium-only on LeetCode but widely leaked. It tests: (1) recognizing topological sort, (2) constraint extraction from non-graph input, (3) handling all three edge cases. Candidates who solve only the happy path lose major points. The senior signal: enumerate the three failure modes upfront, before writing code.

Problem Statement

There is a new alien language using English letters. The order of letters is unknown. Given a list of words sorted lexicographically by the alien language’s rules, return any valid letter ordering. If no valid ordering exists, return "". If multiple are valid, return any.

Constraints

1 ≤ |words| ≤ 100
1 ≤ |words[i]| ≤ 100
All words consist of lowercase English letters.

Clarifying Questions

Is the answer unique? (No — any valid topological order.)
Does the answer include letters not appearing in any word? (No — only letters that appear.)
Are duplicate words possible? (Possible; treat normally — they yield no constraint.)
What does “lexicographically sorted” mean for words of different length? (Standard prefix rule; if A is a prefix of B, then A < B; if B is a prefix of A, the input is invalid.)
What if words is a single word? (Output any permutation of its unique letters.)

Examples

words = ["wrt","wrf","er","ett","rftt"]
→ "wertf"  (one valid ordering)

words = ["z","x"]
→ "zx"

words = ["z","x","z"]
→ ""  (z and x must precede each other — cycle)

words = ["abc","ab"]
→ ""  (prefix violation — "abc" can't come before "ab")

Initial Brute Force

Enumerate all permutations of the alphabet of size ≤ 26; for each, verify that the input words are in that lexicographic order. O(26!) — infeasible.

Brute Force Complexity

O(26! · ΣL) — astronomical.

Optimization Path

Kahn’s algorithm runs in O(V + E), where V = number of distinct letters (≤ 26) and E = number of distinct constraints (≤ |words| - 1, at most one per adjacent pair). Constraint extraction is O(Σ |words[i]|).

The total is O(V + E + Σ L) — bounded by Σ L since V ≤ 26. Trivially fast.

Final Expected Approach

Initialize in_degree[c] = 0 for every letter that appears anywhere.
Build adjacency: for each adjacent pair (w1, w2) in words:
- Walk both words in parallel; at the first index i where they differ, add edge w1[i] → w2[i] (if not already present); break.
- If no differing index found and len(w1) > len(w2): prefix violation — return "".
Run Kahn’s: queue all letters with in_degree[c] == 0; pop, append to order, decrement neighbors.
If the order length equals the number of distinct letters, return the order; else cycle → return "".

Data Structures Used

defaultdict(set) for adjacency (set prevents duplicate edges).
dict[char, int] for in_degree.
collections.deque for the Kahn queue.
list[char] for the result.

Correctness Argument

Each adjacent pair contributes at most one constraint — the first differing character. This is sound: if the words are correctly sorted, the relative order of w1[i] and w2[i] (at the first differing index) must be w1[i] < w2[i] in the alien alphabet. No constraint can be inferred from differences after the first; those are consistent with but not implied by the sortedness.

Topological sort over these constraints produces an ordering where every constraint a < b is satisfied (a precedes b in the output). If a cycle exists, the constraints are unsatisfiable and the input is impossible. The prefix-violation case is the only constraint-extraction-time invalid input.

Complexity

Operation	Time	Space
Constraint extraction	O(Σ L)	O(unique edges) ≤ O(26²)
Kahn’s	O(V + E)	O(V + E)
Total	O(Σ L)	O(unique letters + edges)

Implementation Requirements

from collections import defaultdict, deque

def alienOrder(words):
    adj = defaultdict(set)
    in_deg = {c: 0 for w in words for c in w}
    for i in range(len(words) - 1):
        w1, w2 = words[i], words[i+1]
        found = False
        for j in range(min(len(w1), len(w2))):
            if w1[j] != w2[j]:
                if w2[j] not in adj[w1[j]]:
                    adj[w1[j]].add(w2[j])
                    in_deg[w2[j]] += 1
                found = True
                break
        if not found and len(w1) > len(w2):
            return ""
    queue = deque([c for c, d in in_deg.items() if d == 0])
    order = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for nb in adj[c]:
            in_deg[nb] -= 1
            if in_deg[nb] == 0:
                queue.append(nb)
    return "".join(order) if len(order) == len(in_deg) else ""

Tests

Standard: ["wrt","wrf","er","ett","rftt"] → constraints t<f, w<e, r<t, e<r, whose only topological order is "wertf".
Single word: ["abc"] → some permutation of {a,b,c}.
Prefix violation: ["abc","ab"] → "".
Tie/equal words: ["a","a"] → "a".
Cycle: ["z","x","z"] → "".
All same length, all letters used: ["aa","ab","cb"] → constraints a<b, a<c; "abc" and "acb" are both valid.
Single letter: ["z"] → "z".
Long words, tiny alphabet: stress for in-degree correctness.

Follow-up Questions

“Find the lexicographically smallest valid order (in standard a-z order).” → Kahn’s with a min-heap instead of a queue.
“Find all valid orders.” → Backtracking over Kahn’s choices; exponential.
“Verify a given ordering.” → For each adjacent word pair, scan for the first differing char and check ordering. O(Σ L).
“Online: words arrive one by one.” → Maintain the adjacency incrementally; rerun Kahn’s lazily on query.
“What if the input has typos (wrongly-ordered pairs)?” → Return any consistent ordering, or report the conflict edge.
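For the lexicographically smallest order, a sketch that swaps the FIFO queue for heapq so the smallest available letter is always emitted next. It assumes the constraints have already been extracted into (a, b) pairs meaning a < b; smallest_alien_order is a hypothetical name.

```python
import heapq
from collections import defaultdict

def smallest_alien_order(letters, constraints):
    adj = defaultdict(set)
    in_deg = {c: 0 for c in letters}
    for a, b in constraints:
        if b not in adj[a]:          # set-based adjacency avoids double-counting
            adj[a].add(b)
            in_deg[b] += 1
    heap = [c for c, d in in_deg.items() if d == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        c = heapq.heappop(heap)      # smallest letter with no remaining predecessors
        order.append(c)
        for nb in adj[c]:
            in_deg[nb] -= 1
            if in_deg[nb] == 0:
                heapq.heappush(heap, nb)
    return "".join(order) if len(order) == len(in_deg) else ""
```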

Product Extension

Build systems (Bazel, Make, Gradle) compute build orders via topological sort over the dependency DAG; cycle detection is a critical correctness property. Database query planners use topo sort over join-graph dependencies. Distributed task schedulers (Airflow, Argo) execute DAGs of jobs in topological order.

Language/Runtime Follow-ups

Python: defaultdict(set) and collections.deque are essential. dict.items() iteration is fine.
Java: Map<Character, Set<Character>> and Map<Character, Integer> for in-degree; ArrayDeque<Character> for queue. Use int[26] for in-degree if alphabet is fixed.
Go: map[byte]map[byte]bool for adjacency; map[byte]int for in-degree; slice as queue. Or [26]int for in-degree as alphabet is fixed.
C++: unordered_map<char, unordered_set<char>>; array<int, 26> for in-degree; queue<char>.
JS/TS: Map<string, Set<string>> and Map<string, number>; array-as-queue with care (shift is O(N)).

Common Bugs

Adding duplicate edges to in-degree — use a set for adjacency, check membership before incrementing.
Missing the prefix violation check — ["abc","ab"] returns "abc" if you don’t handle this.
Building in-degree only for letters that have outgoing edges, missing letters that only appear as targets.
Initializing in_degree only for the first word’s letters — letters appearing only in later words get missed.
Comparing len(order) == 26 instead of == len(in_deg) (only used letters count).
Using a list instead of a set for adjacency, then double-incrementing in-degree.
Returning the order in the wrong direction (Kahn’s gives the right direction; DFS post-order needs to be reversed).
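The DFS alternative, sketched on integer nodes: three colors detect a back edge (a cycle), and the reversed post-order is a valid topological order. topo_order_dfs is a hypothetical name; the recursion is fine for 26 letters but would need an explicit stack for very deep graphs.

```python
def topo_order_dfs(num_nodes, edges):
    """Reversed DFS post-order; returns None if a back edge (cycle) is found."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on the current DFS path, finished
    color = [WHITE] * num_nodes
    post = []

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:
                return False          # back edge to the current path: cycle
            if color[v] == WHITE and not visit(v):
                return False
        color[u] = BLACK
        post.append(u)                # u finishes after everything reachable from it
        return True

    for u in range(num_nodes):
        if color[u] == WHITE and not visit(u):
            return None
    return post[::-1]                 # reversed post-order puts u before its successors
```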

Debugging Strategy

Print the adjacency and in-degree maps after constraint extraction. Verify each constraint is justified by tracing back to the input pair. For cycles, print the in-degree map at the point Kahn’s stalls — the remaining-positive in-degrees identify nodes in the cycle. For prefix-violation false negatives, print the pair (w1, w2) at each iteration to confirm the check fires.

Mastery Criteria

Recognized “ordering with constraints” as topological sort in <60 seconds.
Wrote constraint extraction from word pairs from cold start in <5 minutes.
Wrote Kahn’s algorithm from blank screen in <6 minutes.
Enumerated the three degenerate cases (prefix violation, valid prefix with no constraint, cycle) before coding.
Solved LC 269 in <25 minutes from cold start.
Solved LC 207 (Course Schedule) in <8 minutes by extracting the constraint structure from cold.
Articulated the white-path lemma / DFS-post-order alternative in <60 seconds.

Lab 07 — Union-Find Applications (Accounts Merge)

Goal

Implement a disjoint-set union (DSU) with path compression and union by rank, then apply it to a real merge problem where the “elements” are emails and the “groups” are accounts. After this lab you should be able to write DSU from a blank screen in <6 minutes, recognize the merge-by-shared-attribute signal in <60 seconds, and articulate when DSU beats DFS for connectivity (online updates, no spatial structure, simple connectivity-only queries).

Background Concepts

A disjoint-set union (DSU, aka union-find) maintains a partition of N elements under two operations:

find(x): return the representative (“root”) of x’s set.
union(x, y): merge the sets containing x and y.

With path compression (find rewrites every visited node to point directly at the root) and union by rank/size (always attach the smaller tree under the larger), both operations run in O(α(N)) amortized, where α is the inverse Ackermann function — effectively constant for any practical N.
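Here is a minimal sketch of the two optimizations together. It assumes integer elements 0..N-1 and uses union by size; the rank variant behaves the same way. The class name DSUBySize is hypothetical. The find is iterative and two-pass, so Python's recursion limit never applies.

```python
class DSUBySize:
    """Union-find sketch over 0..n-1: union by size + iterative two-pass path compression."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n  # size[r] is meaningful only while r is a root

    def find(self, x):
        # Pass 1: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Pass 2: re-point every node on the path directly at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in the same set
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx  # attach the smaller tree under the larger
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True
```

Returning a boolean from union lets callers count successful merges, which is useful for component counting and Kruskal.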

DSU is the natural choice when you receive a stream of “merge x and y” operations and need to answer “are x and y in the same group” — and you don’t care about paths between them, only connectivity.

Interview Context

Accounts Merge (LC 721) is a frequently asked Medium at Amazon and Google. Number of Provinces (LC 547) is the easier sibling at Meta. The trap: candidates default to BFS/DFS on the implicit graph (emails as nodes, “shared email between two accounts” as edges). That works, but the code is messier than DSU. The senior signal is recognizing the partition structure and reaching for DSU within 90 seconds.

Problem Statement

Given a list of accounts, where accounts[i] = [name, email1, email2, ...], two accounts belong to the same person if they share any common email (names alone are not enough — multiple people can share a name). Merge accounts: return a list where each element is [name, ...sorted unique emails], accounts in any order.

Constraints

1 ≤ |accounts| ≤ 1000
2 ≤ |accounts[i]| ≤ 10 (one name + 1..9 emails)
1 ≤ |email| ≤ 30
Emails are lowercase, contain @.

Clarifying Questions

Are emails case-sensitive? (Per LC: lowercase already.)
Two accounts with the same name but no shared email — are they merged? (No — names don’t merge.)
Should the output emails be sorted within each account? (Yes — alphabetically.)
Order of accounts in output? (Any order is accepted.)
Total emails: ≤ 1000 × 9 = 9000 — DSU on emails is fine.

Examples

accounts = [
  ["John","johnsmith@mail.com","john_newyork@mail.com"],
  ["John","johnsmith@mail.com","john00@mail.com"],
  ["Mary","mary@mail.com"],
  ["John","johnnybravo@mail.com"]
]
→ [["John","john00@mail.com","john_newyork@mail.com","johnsmith@mail.com"],
   ["Mary","mary@mail.com"],
   ["John","johnnybravo@mail.com"]]

Initial Brute Force

Build implicit graph: each email is a node; for each account, connect all its emails to the first email of that account; run DFS/BFS to enumerate connected components; emit each component with the corresponding name. Works, but DSU is cleaner.
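For comparison, here is a sketch of this brute force. The function name accounts_merge_dfs is hypothetical. It assumes star edges from each account's first email and uses an iterative DFS so deep components cannot hit the recursion limit. Note that it must build and store the adjacency explicitly.

```python
from collections import defaultdict


def accounts_merge_dfs(accounts):
    """Brute-force sketch for LC 721: explicit email graph + iterative DFS."""
    graph = defaultdict(set)
    email_to_name = {}
    for account in accounts:
        name, first = account[0], account[1]
        for email in account[1:]:
            graph[first].add(email)  # star edges: first email <-> every other email
            graph[email].add(first)
            email_to_name[email] = name
    seen = set()
    result = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            email = stack.pop()
            component.append(email)
            for nxt in graph[email]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append([email_to_name[start]] + sorted(component))
    return result
```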

Brute Force Complexity

O(Σ |emails| · α) with DSU; O(Σ |emails|) with DFS — both linear in total email count. The DFS version requires building an adjacency list explicitly, which DSU skips.

Optimization Path

DSU directly:

Treat each unique email as a DSU element.
For each account, union all its emails to the first email.
After all unions, group emails by find(email) root.
For each group, attach the name (looked up via any email in the group → its account → the account’s name).
Sort emails within each group; output.

This is the cleanest expression. No explicit graph construction needed.

Final Expected Approach

from collections import defaultdict

def accounts_merge(accounts):
    parent = {}

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])  # path compression
        return parent[x]

    def union(x, y):
        parent[find(x)] = find(y)  # rank omitted in this sketch

    email_to_name = {}
    for account in accounts:
        name = account[0]
        for email in account[1:]:
            if email not in parent:
                parent[email] = email  # register before any union touches it
            email_to_name[email] = name
            union(account[1], email)

    groups = defaultdict(list)
    for email in parent:
        groups[find(email)].append(email)

    return [[email_to_name[group[0]]] + sorted(group) for group in groups.values()]

Data Structures Used

dict[str, str] for parent (DSU).
dict[str, str] for email_to_name.
defaultdict(list) for grouping by root.

Correctness Argument

DSU correctness: Initially every element is its own set. Each union merges two sets, and find returns a canonical representative, so find(x) == find(y) iff x and y were transitively unioned. Path compression and union by rank/size reshape the trees but never change which root an element reaches. They therefore preserve this invariant while amortizing each op to O(α(N)).

Reduction correctness: Two emails belong to the same person iff there is a chain of accounts where consecutive accounts share an email. The unions on each account’s emails form precisely these chains; the resulting partition matches the equivalence-class definition.

Output correctness: Each component’s name is unambiguous because (a) every account contributing to the component has the same name as the others in that component (otherwise they’d be different people, and the input is well-formed by problem statement), and (b) any email in the group recovers the name via email_to_name.

Complexity

Operation	Time	Space
Building DSU	O(Σ E · α) where E = total emails	O(Σ E)
Grouping + sort	O(Σ E log E) for sorting within each group	O(Σ E)
Total	O(Σ E log E)	O(Σ E)

Implementation Requirements

from collections import defaultdict

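class DSU:
    """Union-find over hashable keys with path compression and union by rank."""
    def __init__(self):
        self.parent = {}
        self.rank = {}
    def find(self, x):
        # Recursive path compression; union by rank keeps tree depth O(log N).
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return
        # Attach the lower-rank tree under the higher-rank one.
        if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1
    def add(self, x):
        # Register a singleton set; a no-op for an email seen before.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0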

def accountsMerge(accounts):
    dsu = DSU()
    email_to_name = {}
    for account in accounts:
        name = account[0]
        first = account[1]
        for email in account[1:]:
            dsu.add(email)
            email_to_name[email] = name
            dsu.union(first, email)
    groups = defaultdict(list)
    for email in dsu.parent:
        groups[dsu.find(email)].append(email)
    return [[email_to_name[g[0]]] + sorted(g) for g in groups.values()]

Tests

Standard: the 4-account example → 1 merged account + 2 separate.
All accounts disjoint → each emerges separately.
All accounts share one email → all merge into one.
Single account → unchanged.
Same name, different emails → separate accounts.
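Empty emails (problem disallows but defend): in the loop-only version, account[1:] is empty, so there are no unions and no groups. The account silently disappears from the output. The class-based version reads account[1] before its loop and raises IndexError instead. Verify which behavior you want.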

Follow-up Questions

“Number of Provinces (LC 547): given an N × N adjacency matrix, count groups.” → DSU over N nodes; union if M[i][j] == 1; count distinct roots.
“Online: accounts arrive in a stream.” → DSU handles this natively; just keep adding and unioning.
“What if names matter (same name + shared email merges, different name doesn’t)?” → Keep DSU but check name-compatibility before union; conflict means error or skip.
“What if you need to remove an account?” → DSU doesn’t support remove. Use Link-Cut Trees or rebuild from scratch.
“What if path compression isn’t allowed (read-only find)?” → Use union by rank only; O(log N) per op instead of α.
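The first follow-up fits the same template. Below is a minimal sketch for LC 547 under the stated matrix input. The function name find_circle_num is hypothetical. Rank is omitted for brevity, and path halving alone keeps small inputs fast.

```python
def find_circle_num(is_connected):
    """Sketch for LC 547: DSU over n cities, then count distinct roots."""
    n = len(is_connected)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):  # the matrix is symmetric: scan the upper triangle
            if is_connected[i][j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return len({find(i) for i in range(n)})
```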

Product Extension

Identity resolution at LinkedIn, Salesforce, and ad networks merges user records by shared email/phone using DSU. Image segmentation libraries (OpenCV’s connectedComponents) use DSU under the hood. Distributed-system membership protocols use DSU-like merges to track partition healing. Kruskal’s MST (Lab 08) uses DSU as its core data structure.

Language/Runtime Follow-ups

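Python: recursive find is safe with union by rank, because depth stays ≤ log₂ N. Without rank, a long chain can exceed the default recursion limit of 1000. An iterative two-pass find (find root, then compress) avoids the issue entirely.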
Java: int[] parent for integer keys is significantly faster than HashMap<Integer, Integer>. Use iterative find.
Go: parent := make(map[string]string) for string keys, or []int for integer indices.
C++: vector<int> parent(N); iota(parent.begin(), parent.end(), 0); is the clean pattern. Iterative find.
JS/TS: Map<string, string> is fine; use iterative find to avoid call-stack issues at large N.

Common Bugs

Recursion limit in Python’s find: without union by rank, worst-case chains can exceed the default recursion limit (1000) and raise RecursionError. Use an iterative find or increase the limit.
Forgetting path compression — find becomes O(N), not α. Functionality correct but TLE.
Union without rank — same TLE risk on adversarial inputs.
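Grouping by parent[x] instead of find(x): parent[x] is the root only for roots and already-compressed nodes, so stale pointers split one group into several. Always group by find(x). For a pure “is root” test, parent[x] == x and find(x) == x agree; the first just skips the traversal.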
Forgetting to add an email to parent before union (union calls find which dereferences parent[email]) — KeyError.
Mapping email_to_name per-account but overwriting — last write wins; usually fine here, but be deliberate.
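Collecting output emails from the raw account lists instead of from the DSU’s keys: duplicate emails (within or across accounts) then leak into the result. Iterating over the DSU’s parent keys, as above, deduplicates automatically.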

Debugging Strategy

Print parent and rank after each union. For wrong groupings, trace which email failed to union with which other and which account broke the chain. For TLE, verify both path compression and union by rank are present; profile to confirm find dominates.

Mastery Criteria

Recognized “merge by shared attribute” as DSU in <60 seconds.
Wrote DSU with path compression and union by rank from blank screen in <6 minutes.
Stated O(α) amortized complexity unprompted.
Articulated when DFS is a valid alternative (offline, no online updates) and when DSU is mandatory (online stream of merges).
Solved LC 547 (Number of Provinces) in <8 minutes.
Solved LC 721 (Accounts Merge) in <20 minutes from cold start.
Solved LC 305 (Number of Islands II) in <20 minutes — the canonical online-DSU problem.
Articulated path compression’s effect on amortization in <60 seconds.

Lab 08 — MST via Kruskal (Min Cost to Connect All Points)

Goal

Build a minimum spanning tree (MST) on a complete graph derived from N points using Kruskal’s algorithm. After this lab you should be able to recognize the MST signal in <60 seconds, write Kruskal from a blank screen (sort + DSU + early-exit) in <8 minutes, and reason about when Kruskal beats Prim (sparse graphs, edge list already given) and vice versa (dense graphs, adjacency matrix).

Background Concepts

A spanning tree of a connected graph G is a subgraph that includes all V vertices and exactly V - 1 edges with no cycles. The MST is the spanning tree with minimum total edge weight. Two canonical algorithms:

Kruskal’s: Sort all edges by weight ascending. Iterate; for each edge, union the endpoints if they’re in different components (use DSU); add to MST. Stop after V - 1 edges. Time O(E log E).
Prim’s: Start from any vertex; maintain a min-heap of crossing edges; repeatedly extract the lightest edge to a new vertex. Time O(E log V) with a binary heap; O(E + V log V) with a Fibonacci heap.

For “connect all points” with edge weights = pairwise Manhattan distance, the graph is complete: E = V·(V-1)/2 ≈ V²/2. At V = 1000, E ≈ 5 × 10^5 edges. Either algorithm works. Kruskal with DSU is the cleanest expression because the full edge list is trivial to generate.
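For contrast with Kruskal, here is a sketch of the binary-heap Prim variant with lazy deletion. It assumes the complete Manhattan graph of LC 1584, so candidate edges are pushed on demand instead of stored upfront. The function name min_cost_prim_heap is hypothetical. The heap can still hold O(V²) stale entries, so the bound stays O(V² log V).

```python
import heapq


def min_cost_prim_heap(points):
    """Lazy heap-based Prim sketch for LC 1584 (edges generated on the fly)."""
    n = len(points)
    in_tree = [False] * n
    heap = [(0, 0)]  # (cost to attach, vertex); start from vertex 0 at cost 0
    total = added = 0
    while added < n:
        cost, u = heapq.heappop(heap)
        if in_tree[u]:
            continue  # stale entry: u was already attached more cheaply
        in_tree[u] = True
        total += cost
        added += 1
        ux, uy = points[u]
        for v in range(n):
            if not in_tree[v]:
                vx, vy = points[v]
                heapq.heappush(heap, (abs(ux - vx) + abs(uy - vy), v))
    return total
```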

Interview Context

Min Cost to Connect All Points (LC 1584) is a Medium asked at Amazon, Bloomberg, and Salesforce. It’s a clean MST signal: “minimum total cost to make everything connected.” The senior signal is naming the problem ("this is MST on a complete graph") within 60 seconds, then choosing Kruskal vs Prim consciously based on density. Strong candidates also state the Cut Property as the correctness foundation.

Problem Statement

Given an array points where points[i] = [xi, yi] represents a point in 2D, the cost of connecting two points is the Manhattan distance between them: |xi - xj| + |yi - yj|. Return the minimum cost to connect all points, where any two points are connected if there is a path between them.

Constraints

1 ≤ |points| ≤ 1000
−10^6 ≤ xi, yi ≤ 10^6
All points are distinct.

Clarifying Questions

Manhattan, Euclidean, or other metric? (Manhattan, per problem.)
Are diagonal connections counted? (Implicitly yes — we connect any two points directly.)
Are points distinct? (Yes, per constraint.)
Single point — cost? (0, no edges needed.)
Should the answer fit in 32-bit? (Max cost ≈ 999 · 4 × 10^6 ≈ 4 × 10^9 — use 64-bit just in case, though Python int is unbounded.)

Examples

points = [[0,0],[2,2],[3,10],[5,2],[7,0]]
→ 20

points = [[3,12],[-2,5],[-4,1]]
→ 18

points = [[0,0]]
→ 0

Initial Brute Force

Enumerate all spanning trees and pick the one with minimum total weight. There are exponentially many. Infeasible.

A second “brute force” is Prim’s via array-scan (no heap): O(V²). At V = 1000: 10^6 ops — passes easily and is simpler than the heap version. This is actually a competitive option for this problem.

Brute Force Complexity

Spanning-tree enumeration: O(V^(V-2)) by Cayley’s formula. Infeasible. Array-scan Prim: O(V²). At V = 1000: 10^6 ops, well within budget.

Optimization Path

For dense graphs (E ~ V²), array-scan Prim is O(V²) — wins over Kruskal’s O(E log E) = O(V² log V). For sparse graphs, Kruskal or heap-Prim wins.

For LC 1584 specifically (V = 1000, dense), all three pass:

Kruskal: O(V² log V²) = O(V² log V) = ~10^7 ops; passes in 1-2s.
Heap-Prim: O(V² log V); same.
Array-scan Prim: O(V²); fastest.

In interviews, Kruskal is the “safer” choice because the code is mechanical: edges → sort → DSU → loop. Show you can choose array-scan Prim when asked about dense graphs.
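The dense-graph option can be sketched as follows. This is array-scan Prim, assuming the LC 1584 input format. It keeps one dist array of cheapest known attachment costs and needs no heap and no edge list, giving O(V²) time and O(V) space. The function name min_cost_prim_array is hypothetical.

```python
def min_cost_prim_array(points):
    """O(V^2) array-scan Prim sketch for dense graphs: no heap, no edge list."""
    n = len(points)
    INF = float("inf")
    dist = [INF] * n  # dist[v] = cheapest known edge from the tree to v
    dist[0] = 0
    in_tree = [False] * n
    total = 0
    for _ in range(n):
        # Linear scan for the closest vertex not yet in the tree: O(V).
        u = -1
        for v in range(n):
            if not in_tree[v] and (u == -1 or dist[v] < dist[u]):
                u = v
        in_tree[u] = True
        total += dist[u]
        ux, uy = points[u]
        # Relax: edges from u may be cheaper ways to reach the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = abs(ux - points[v][0]) + abs(uy - points[v][1])
                if d < dist[v]:
                    dist[v] = d
    return total
```

Each of the V iterations does two O(V) scans, which is where the O(V²) bound comes from. Nothing here depends on log factors, which is why it wins on complete graphs.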

Final Expected Approach (Kruskal)

Generate all V·(V-1)/2 edges with weight = Manhattan distance.
Sort by weight ascending.
Initialize DSU with V components.
Iterate edges; if endpoints differ, union and add weight to total; count edges added.
Stop when edges added == V - 1 (early exit).

Data Structures Used

List of edges as (weight, u, v) tuples.
DSU as in Lab 07 (path compression + union by rank).
Integer accumulator for total cost.

Correctness Argument

Cut Property: For any cut (partition of vertices into two non-empty sets), the minimum-weight edge crossing the cut belongs to some MST. Kruskal greedily picks the lightest edge that doesn’t create a cycle (i.e., the lightest edge crossing some cut between two components); by the Cut Property, this edge is safe — there is an MST containing it. Repeating this V - 1 times produces an MST.

No-cycle invariant: DSU’s find ensures we add an edge only when its endpoints are in different components. Since adding an edge between same-component endpoints creates a cycle, this is exactly the cycle-prevention check.

Termination: Each union reduces component count by 1; after V - 1 unions, the graph is connected. We stop early.

Complexity

Operation	Time	Space
Edge generation	O(V²)	O(V²)
Sort	O(V² log V)	(in-place possible)
DSU loop	O(V² · α)	O(V)
Total	O(V² log V)	O(V²)

Implementation Requirements

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression by halving
            x = self.parent[x]
        return x
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry: return False
        if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1
        return True

def minCostConnectPoints(points):
    n = len(points)
    edges = []
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            edges.append((abs(xi - xj) + abs(yi - yj), i, j))
    edges.sort()
    dsu = DSU(n)
    total, count = 0, 0
    for w, u, v in edges:
        if dsu.union(u, v):
            total += w
            count += 1
            if count == n - 1: break
    return total

Tests

Standard: 5 points → 20.
Single point: 0.
Two points: their Manhattan distance.
Collinear points (all on x-axis): MST weight = max(x) - min(x).
Stress: V = 1000 random — verify Kruskal and Prim agree.
Adversarial: points on a grid, giving many edges of equal weight. The total must not depend on which tied edge the sort places first, since all MSTs share the same weight.

Follow-up Questions

“Connecting Cities With Minimum Cost (LC 1135)” → Same MST template. Even if multiple MSTs exist they all have the same total cost, so return it. Return -1 if fewer than V - 1 unions succeed (the graph is disconnected).
“Optimize Water Distribution in a Village (LC 1168)” → Add a virtual node 0 connected to each house with the well-cost; run MST on V + 1 nodes.
“Critical Connections (bridges)” → Different problem (Tarjan’s bridge algorithm), see Phase README.
“Maximum Spanning Tree.” → Sort descending; same algorithm.
“Online edge insertion: maintain MST.” → Link-Cut Trees; advanced.
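The virtual-node trick from the LC 1168 follow-up is easiest to see in code. Below is a minimal Kruskal sketch assuming LC’s input format: houses numbered 1..n, wells[i - 1] is the well cost at house i, and pipes are [house1, house2, cost] triples. Node 0 stands for “build a well here”. The function name min_cost_to_supply_water is hypothetical.

```python
def min_cost_to_supply_water(n, wells, pipes):
    """Kruskal sketch for LC 1168 with virtual node 0 meaning 'build a well'."""
    # A well at house i becomes an edge (0, i) weighted by its cost.
    edges = [(cost, 0, house) for house, cost in enumerate(wells, start=1)]
    edges += [(cost, h1, h2) for h1, h2, cost in pipes]
    edges.sort()
    parent = list(range(n + 1))  # n houses plus virtual node 0

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = used = 0
    for cost, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += cost
            used += 1
            if used == n:  # n + 1 nodes need exactly n tree edges
                break
    return total
```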

Product Extension

Network design (laying fiber, planning power grids), clustering algorithms (single-linkage clustering = MST followed by cut-largest-edges), image segmentation, and approximation algorithms for TSP all use MST as a primitive. AWS / Azure data-center backbone planning uses MST variants weighted by latency × cost.

Language/Runtime Follow-ups

Python: edges.sort() on tuples sorts lexicographically — (weight, u, v) works. heapq is overkill since we need all edges sorted upfront, not on-demand.
Java: Arrays.sort(int[][]) with a comparator on the weight column. Use int[] triples for cache locality.
Go: sort.Slice(edges, func(i, j int) bool { return edges[i].w < edges[j].w }).
C++: vector<tuple<int,int,int>> with default < ordering; sort(edges.begin(), edges.end()).
JS/TS: edges.sort((a, b) => a[0] - b[0]). Avoid a[0] < b[0] returning a boolean (subtle bug).

Common Bugs

Forgetting the early exit when edges added = V - 1 — works but processes more edges than needed.
Generating duplicate edges (i.e., (i, j) and (j, i)) — wastes time but doesn’t break correctness.
Off-by-one in V - 1 — counting edges incorrectly leads to incomplete tree.
Comparator returns boolean in JS — use subtraction.
Integer overflow on edge weights — Manhattan distance bounded by 4 × 10^6; sum bounded by ~4 × 10^9, fits in 64-bit. In Python no issue; in Java use long.
DSU’s union returns nothing vs returns success boolean — pick a consistent API.
Returning the count of edges instead of total weight — silent.

Debugging Strategy

For small inputs, print the sorted edge list and the DSU state after each union. Verify the total edges added is exactly V - 1. If the result is too large, check that you’re not adding edges within the same component (the cycle check might be skipped). Compare against a Prim’s reference for stress tests.

Mastery Criteria

Recognized “minimum cost to connect” as MST in <60 seconds.
Wrote Kruskal from blank screen in <8 minutes.
Chose Kruskal vs Prim consciously based on density when asked.
Stated the Cut Property as the correctness foundation in <30 seconds.
Solved LC 1584 in <20 minutes from cold start.
Solved LC 1135 (Connecting Cities) in <12 minutes by extending the template.
Solved LC 1168 (Water Distribution) in <15 minutes with the virtual-node trick.
Stated the V² Prim option for dense graphs in <30 seconds.

Lab 09 — Graph Modeling (Bus Routes)

Goal

Practice the modeling skill that separates competent graph candidates from L5+ candidates: given a problem with no obvious graph, invent the right node and edge definition. After this lab you should be able to enumerate at least two valid graph models for a given problem, choose the one with the smallest state space, and justify the choice in <90 seconds.

Background Concepts

The hardest interview graph questions don’t say “graph.” Examples:

Bus Routes (LC 815): “fewest buses to take” — model: nodes = bus routes (not stops!); edges = “two routes share a stop”; multi-source BFS from all routes containing the source stop.
Word Ladder (LC 127): nodes = words; edges = one-character difference. (See Lab 01.)
Open the Lock (LC 752): nodes = 4-digit states; edges = single-digit ±1; BFS.
Sliding Puzzle (LC 773): nodes = board states (encode as string); edges = legal swap; BFS.
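To make the implicit-state modeling concrete, here is a sketch of Open the Lock (LC 752). Nodes are 4-digit strings starting from "0000", and edges turn one wheel by ±1 with wraparound. The function name open_lock is hypothetical. Neighbors are computed on the fly rather than stored, which is the “implicit” end of the implicit-vs-explicit choice described above.

```python
from collections import deque


def open_lock(deadends, target):
    """BFS sketch for LC 752: nodes are 4-digit strings, edges turn one wheel by +/-1."""
    dead = set(deadends)
    if "0000" in dead:
        return -1
    seen = {"0000"}
    queue = deque([("0000", 0)])
    while queue:
        state, moves = queue.popleft()
        if state == target:
            return moves
        for i in range(4):
            digit = int(state[i])
            for delta in (1, -1):
                # % 10 wraps 9 -> 0 and 0 -> 9 (Python's % is non-negative here).
                nxt = state[:i] + str((digit + delta) % 10) + state[i + 1:]
                if nxt not in seen and nxt not in dead:
                    seen.add(nxt)
                    queue.append((nxt, moves + 1))
    return -1
```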

The modeling decision space is:

What is a node? — Often the natural object (stop, word, board state) is wrong; a higher-order or quotient object yields a smaller state space.
What is an edge? — Direct adjacency (one move), or “shared resource” (two routes share a stop).
Is it weighted? — If yes → Dijkstra. If unweighted → BFS.
Sources/targets? — Single, multiple, or “any-of-set” (multi-source BFS).
Implicit vs explicit? — Build the adjacency upfront or compute on the fly.

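The Bus Routes trap: candidates model nodes = stops. If edges join only consecutive stops on a route, BFS distance counts stops traveled, not buses taken, so it answers the wrong question. If edges join every pair of stops on the same route, the distance is right, but each route of length L contributes O(L²) edges and the graph explodes. Modeling nodes = routes gives the number of routes taken directly, with no edge explosion. Depending on how you initialize the start layer, that number is either the answer itself or off by one from it.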

Interview Context

Bus Routes (LC 815) is asked at Google, Meta, and Amazon at L5+. The question itself is mid-level once modeled correctly; the difficulty is the modeling. Interviewers probe modeling explicitly: “tell me how you’d represent this as a graph” — this is the test. If you flounder for 5 minutes, you’re done. The senior signal: state both modelings (stops vs routes), explain why the route-model gives the right BFS distance, then code.

Problem Statement

You are given an array routes where routes[i] is a bus route that the ith bus repeats forever. For example, routes[0] = [1, 5, 7] means the 0th bus travels in the sequence 1 → 5 → 7 → 1 → 5 → 7 → … forever. You start at source and want to reach target. You can travel between stops by buses only. Return the fewest number of buses required, or -1 if impossible.

Constraints

1 ≤ |routes| ≤ 500
1 ≤ |routes[i]| ≤ 10^5
Σ |routes[i]| ≤ 10^5
0 ≤ routes[i][j] < 10^6
0 ≤ source, target < 10^6

Clarifying Questions

Do source == target cases return 0? (Yes — no bus needed.)
Is source guaranteed to be on some route? (Not necessarily — return -1 if not.)
Are routes circular as stated? (Yes — but that doesn’t matter for “buses taken”; what matters is which stops a route covers.)
Are stop numbers unique within a route? (Per LC, yes — but treat defensively.)
Can two routes share a stop? (Yes — that’s exactly how transfers happen.)

Examples

routes = [[1,2,7],[3,6,7]], source = 1, target = 6
→ 2  (take bus 0 from 1 to 7, then bus 1 from 7 to 6)

routes = [[7,12],[4,5,15],[6],[15,19],[9,12,13]], source = 15, target = 12
→ -1

Initial Brute Force

BFS over stops as nodes and edges between any two stops sharing a route. Build adjacency: for each route, add all (stop, stop’) pairs as edges. At Σ |routes| = 10^5, with a 10^5-stop route, that’s 5 × 10^9 edges — TLE / OOM.

Brute Force Complexity

O((Σ L)²) edges in the worst case. Infeasible at Σ L = 10^5.

Optimization Path

Switch the modeling: nodes = routes, edges = “two routes share a stop.” Build an index from each stop to the routes that contain it, in O(Σ L). Every pair of routes sharing a stop is an edge, but we never enumerate these pairs explicitly. Instead, when BFS visits route r, it expands to every other route that shares any of r’s stops, looked up via the index.

To avoid revisiting, mark routes (not stops) as visited. Also mark stops as visited (after expanding all routes through that stop) to avoid the O(L²) blowup of re-expanding a popular stop.

Final BFS distance = number of routes used. Answer is the BFS layer at which we find any route containing target.

Final Expected Approach

If source == target, return 0.
Build stop_to_routes: dict of stop → set of route indices.
BFS over routes. Initialize queue with all routes containing source, distance = 1.
For each popped (route, d), scan all stops in routes[route]. If any is target, return d.
Mark each newly seen stop visited (skip stops already visited). For each such stop, enqueue every unvisited route containing it at distance d + 1, marking the route visited as it is enqueued.
If queue empties without finding target, return -1.

Data Structures Used

defaultdict(set) for stop_to_routes.
collections.deque for BFS queue.
set for visited routes and visited stops.

Correctness Argument

Distance interpretation: Initializing the queue with routes containing source at distance 1 means: distance d = “number of routes taken so far, including the current one.” When a route at distance d covers target, the answer is d.

No double-counting: Marking stops visited prevents the O(L²) blowup; marking routes visited prevents re-expansion. Both are needed: a stop is visited once we’ve added all its routes; a route is visited once we’ve expanded its stops.

BFS optimality: Standard BFS argument on the route-graph — first time a route is popped, its distance is minimum. Therefore the first time a route containing target is popped, the answer is its distance.

Complexity

Operation	Time	Space
Index build	O(Σ L)	O(Σ L)
BFS	O(Σ L + R²) where R = routes count	O(Σ L)
Total	O(Σ L + R²)	O(Σ L)

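(The R² term covers the variant that materializes explicit route-to-route edges, where in the worst case every pair of routes shares a stop. With the visited-stops marking used here, each route is popped once and each stop’s route list is scanned at most once. The BFS itself is therefore O(Σ L), and O(Σ L + R²) is a safe bound for both variants.)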

Implementation Requirements

from collections import defaultdict, deque

def numBusesToDestination(routes, source, target):
    if source == target:
        return 0
    stop_to_routes = defaultdict(set)
    for i, r in enumerate(routes):
        for s in r:
            stop_to_routes[s].add(i)
    if source not in stop_to_routes:
        return -1
    visited_routes = set()
    visited_stops = {source}
    queue = deque()
    for r in stop_to_routes[source]:
        queue.append((r, 1))
        visited_routes.add(r)
    while queue:
        route, d = queue.popleft()
        for stop in routes[route]:
            if stop == target:
                return d
            if stop in visited_stops:
                continue
            visited_stops.add(stop)
            for nr in stop_to_routes[stop]:
                if nr not in visited_routes:
                    visited_routes.add(nr)
                    queue.append((nr, d + 1))
    return -1

Tests

Standard: [[1,2,7],[3,6,7]], src=1, tgt=6 → 2.
src == tgt: → 0.
src not on any route: → -1.
Single route covering both: → 1.
Disconnected routes: → -1.
Long route (10^5 stops, 1 route): src and tgt on it → 1; tgt not on it → -1.
Many routes sharing one hub: BFS expansion through hub.

Follow-up Questions

“What if buses have different costs?” → Dijkstra over routes with edge weight = cost.
“What’s the actual sequence of buses taken?” → Maintain parent pointers in BFS; reconstruct.
“What if you can walk between adjacent stops?” → Add walking edges; might or might not change the model.
“Multi-source: any of K starting stops to any of M targets.” → Multi-source BFS on the route-graph.
“Online: routes added/removed live.” → Recompute on each query, or use dynamic-graph techniques.
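The parent-pointer follow-up can be sketched on the same route graph. The function name bus_sequence is hypothetical, and so is the convention of returning route indices in riding order ([] when source == target, None when unreachable). The parent dict doubles as the visited-routes set.

```python
from collections import defaultdict, deque


def bus_sequence(routes, source, target):
    """Route-graph BFS sketch that returns which routes (by index) to ride, in order.

    Returns [] when source == target and None when target is unreachable.
    """
    if source == target:
        return []
    stop_to_routes = defaultdict(set)
    for i, route in enumerate(routes):
        for stop in route:
            stop_to_routes[stop].add(i)
    # parent[r] = the route we transferred from to board r (None for first buses).
    parent = {r: None for r in stop_to_routes[source]}
    queue = deque(parent)
    visited_stops = {source}
    while queue:
        route = queue.popleft()
        if target in routes[route]:
            path = []
            while route is not None:  # walk parent pointers back to a first bus
                path.append(route)
                route = parent[route]
            return path[::-1]
        for stop in routes[route]:
            if stop in visited_stops:
                continue
            visited_stops.add(stop)
            for nr in stop_to_routes[stop]:
                if nr not in parent:
                    parent[nr] = route
                    queue.append(nr)
    return None
```

The length of the returned list equals the answer to the original problem, so the reconstruction adds O(answer) work on top of the BFS.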

Product Extension

Real transit-routing systems (Google Maps, Citymapper, Apple Maps) model transit as a multi-modal graph: nodes are stops, walking edges connect nearby stops, transit edges represent “ride a route between two of its stops.” The route-as-node model used here is a simplification useful for “fewest transfers” queries. Production systems combine with time-dependent shortest-path (CSA, RAPTOR) for actual journey planning.

Language/Runtime Follow-ups

Python: defaultdict(set) and deque. The inner loop scans routes[route] once per route popped — bound this by visiting stops only once.
Java: HashMap<Integer, Set<Integer>> for the index; ArrayDeque<int[]> for (route, distance).
Go: map[int]map[int]bool or map[int][]int. Slice-as-queue.
C++: unordered_map<int, vector<int>>. queue<pair<int,int>>. Reserve capacity if Σ L is known.
JS/TS: Map<number, Set<number>> and array-as-queue (use a deque polyfill if N large).

Common Bugs

Modeling stops as nodes — leads to O((Σ L)²) edge count and TLE.
Returning d instead of d - 1 (or vice versa) — make sure the distance semantics match: d = number of buses taken.
Not marking the source stop as visited — re-expanding routes through it.
Not marking routes as visited: the same route can be enqueued once per shared stop and rescanned each time, multiplying the work.
Returning -1 when source == target instead of 0.
Forgetting that source not in stop_to_routes is a -1 case.
Stack overflow is not a concern here: BFS is iterative, so this failure mode applies only to recursive DFS solutions.

Debugging Strategy

For a 3-route example, print the stop_to_routes index. Walk through BFS by hand: queue contents, visited sets after each pop. Verify d increases exactly once per layer of routes. If TLE, profile to confirm visited-stop marking is preventing repeated route expansion. If wrong answer, check whether you’re returning d or d - 1.

Mastery Criteria

Recognized “fewest buses” as a graph problem in <30 seconds.
Enumerated both stop-as-node and route-as-node models in <90 seconds.
Articulated why route-as-node yields the correct distance unprompted.
Wrote BFS with both visited-routes and visited-stops sets in <12 minutes.
Stated O(Σ L + R²) complexity unprompted.
Solved LC 815 in <30 minutes from cold start.
Solved LC 752 (Open the Lock) in <20 minutes by encoding state as a string.
Solved LC 773 (Sliding Puzzle) in <30 minutes by encoding the 2D board as a string state.
Produced a correct model for a new “no obvious graph” problem in <3 minutes.

Phase 5 — Dynamic Programming (Basic → Extreme)

Target level: Medium → Very Hard
Expected duration: 4 weeks (12-week track) / 5 weeks (6-month track) / 6 weeks (12-month track)
Weekly cadence: ~6 DP topics per week + 30–60 problems applying them under the framework

Why Dynamic Programming Is The Single Hardest Pattern Family In Coding Interviews

Phase 4 taught you that one-in-three Medium-Hard interview problems is a graph problem in disguise. The other big share is dynamic programming. DP shows up in roughly one in four Medium-Hard rounds at top-tier companies, and the share rises further in staff/principal and quant interviews where exact-counting and optimization questions dominate. More importantly, DP is the topic where the gap between candidates who have a framework and those who don’t is widest. A candidate without a DP framework freezes on dp[i][j] = ?; a candidate with one writes the recurrence in 90 seconds and spends the remaining time on edge cases.

The empirical claim that drives this entire phase:

The hard part of DP is not the code. The hard part is deriving the state. Almost every wrong DP solution is wrong because the state is wrong — too small to capture all the information needed, or so large that the table doesn’t fit in memory or time. Once the state is right, the transition writes itself, the base cases follow from the state, and the code is mechanical.

DP is also the topic where candidates most often memorize problems instead of internalizing the technique, and it is the topic where memorization fails most spectacularly. There are perhaps 60 named DP problems on LeetCode that everyone has seen; an interviewer who wants to filter out the memorizers asks the 61st. The solution, then, is not to drill 60 problems — it is to drill the derivation process until you can derive any DP from the recursive formulation.

This phase is built around one teaching device that we will use on every single problem from start to finish: the brute → memo → tabulated → space-optimized progression. Every problem you solve in this phase will be solved four times in succession:

Brute force — usually exponential recursion that explores every choice.
Memoized — the same recursion plus a cache, top-down DP, O(states × transition) time.
Tabulated — bottom-up loop in topological order on the state DAG, O(states × transition) time, no recursion stack.
Space-optimized — a rolling-array transformation that keeps only the previous one or two layers and reduces space from O(N · M) to O(M) or O(1).
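To make the four stages concrete, here is a sketch of the full progression on edit distance (LC 72), one of the canonical problems this phase targets. The function names are hypothetical. The state (i, j) is taken to mean the minimum number of edits to turn a[:i] into b[:j]. Every stage keeps the same recurrence and changes only how states are evaluated and stored.

```python
from functools import lru_cache


def edit_distance_brute(a, b):
    """Stage 1: plain recursion over prefix lengths (i, j). Exponential time."""
    def go(i, j):
        if i == 0:
            return j  # insert the remaining j characters of b
        if j == 0:
            return i  # delete the remaining i characters of a
        if a[i - 1] == b[j - 1]:
            return go(i - 1, j - 1)
        # delete, insert, replace
        return 1 + min(go(i - 1, j), go(i, j - 1), go(i - 1, j - 1))
    return go(len(a), len(b))


def edit_distance_memo(a, b):
    """Stage 2: the same recursion plus a cache. O(N * M) time and space."""
    @lru_cache(maxsize=None)
    def go(i, j):
        if i == 0:
            return j
        if j == 0:
            return i
        if a[i - 1] == b[j - 1]:
            return go(i - 1, j - 1)
        return 1 + min(go(i - 1, j), go(i, j - 1), go(i - 1, j - 1))
    return go(len(a), len(b))


def edit_distance_table(a, b):
    """Stage 3: bottom-up table filled in increasing (i, j) order. No recursion."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # base cases are the recursion's return statements
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def edit_distance_rolling(a, b):
    """Stage 4: keep only the previous row. O(M) space."""
    n, m = len(a), len(b)
    prev = list(range(m + 1))  # row 0: dp[0][j] = j
    for i in range(1, n + 1):
        cur = [i] + [0] * m  # cur[0] = dp[i][0] = i
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

Stage 4 is valid because row i reads only row i - 1 (prev) and its own already-filled left neighbor (cur[j - 1]). That dependency check is the general test for whether a rolling array applies.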

By the end of this phase, you will execute this progression unconsciously. When an interviewer asks “can you reduce the space?”, you will already have written the tabulated version with the rolling array in mind. When the interviewer asks “what’s the recurrence?”, you will already have derived it from the brute-force recursion in the first 90 seconds of the problem. The four-stage progression is the single most valuable interview-time discipline taught in this entire curriculum, because it converts an open-ended “design a DP” question into a deterministic four-step recipe.

After this phase, you can solve canonical Hard DP problems on first attempt: edit distance in 25 minutes with full progression, longest increasing subsequence at both O(N²) and O(N log N), partition equal subset sum, coin change (count and minimum), burst balloons (interval DP), house robber III (tree DP), and shortest path visiting all nodes (bitmask DP). You will also become visibly stronger in mock interviews because you will reach for dp[i][j] notation on the whiteboard within 90 seconds and articulate the state definition out loud before writing any code.

What You Will Be Able To Do After This Phase

Recognize that a problem is a DP problem within 2 minutes, even when the words “DP” or “memoize” never appear in the statement.
Derive the state from a recursive brute force in <3 minutes by identifying the parameters that change across recursive calls.
Write the transition as a closed-form max/min/sum over a small set of choices, with clear correctness justification.
Identify base cases as the recursive function’s return statements at the smallest input.
Write the tabulated version by inverting the recursion into a loop with the right evaluation order on the state DAG.
Apply the rolling-array trick to reduce O(N · M) space to O(M) or O(1) when the recurrence depends only on the previous row(s).
Distinguish 0/1 knapsack from unbounded knapsack by a single change in the inner-loop direction.
Recognize when a “subset” or “partition” problem reduces to subset sum with target total / 2.
Implement LIS at both O(N²) (canonical DP) and O(N log N) (patience sorting + binary search) and explain the equivalence.
Implement edit distance with all four variants (brute, memo, tabulated, O(M)-space) in <25 minutes.
Implement tree DP via post-order recursion, returning multi-tuple state (e.g., “best with root included” and “best with root excluded”).
Implement interval DP with the canonical loop structure: for length, then for left, with right = left + length - 1.
Implement bitmask DP with state (mask, last) for TSP-style problems and (mask) alone for set-cover-style problems.
Articulate the correctness argument for every DP you write: state definition, transition justification, evaluation order, base case.
Spot the standard DP bugs unprompted: wrong base case, wrong evaluation order, off-by-one in indices, missed edge case at empty input.

How To Read This Phase

Read this README in two passes. Pass 1: linear, end-to-end, building a mental map of which DP variant solves which problem signal. Do this in one sitting. Pass 2: as you work the labs, refer back to specific topic entries to clarify state-design choices and pitfalls.

Each topic entry has a fixed shape:

When To Use — the problem signal that should fire this DP variant in <2 minutes.
State Design — what the state is, why these are the right parameters, why no fewer suffice.
Transition — the recurrence in closed form.
Complexity — time and space, and what space optimization is possible.
Common Pitfalls — the bugs that consume the most interview minutes for this DP variant specifically.
Classic Problems — 3–5 representative LeetCode problems where this DP is the intended solution.

The phase ends with a DP-Recognition Cheat Sheet (problem signals → DP variant), a Common-Bug Catalog, a Mastery Checklist, and Exit Criteria.

The DP Framework

Before any topic, internalize this framework. Use it on every DP problem.

1. State Definition

The state is the smallest set of parameters that uniquely determines the answer to a subproblem. Write it down explicitly:

dp[i] = the answer to the subproblem ending at / using up to / for prefix-of-length / etc., parameter i.

Sentences that begin “let dp[i] be” are the most valuable two seconds of the entire problem. If you can’t finish the sentence, you don’t have a state — you have a vague hope.

The state must be sufficient (encodes everything the future needs to know) and necessary (every parameter actually changes the answer). A common bug is a state that’s sufficient but not necessary — e.g., tracking both index and remaining-budget when budget is determined by index. Another is necessary but not sufficient — e.g., tracking only the index when the choice depends on what was picked earlier.

A useful test: two equal states must produce equal answers. If two different histories arrive at the same state but have different optimal continuations, the state is missing a dimension.

2. Transition Function

The transition expresses dp[state] as a function of dp[smaller_state] for one or more smaller states. It is the recurrence. For optimization problems, it is min or max over a small set of choices; for counting problems, it is a sum.

The transition has three parts:

Choices — the discrete set of moves at this state (include item or skip; pick this character or that one; rob this house or skip).
Cost / value — the contribution of each choice (the item’s weight, the operation’s cost, the gain from picking).
Aggregation — min / max / sum / OR over the choices.

Write the transition as:

dp[state] = aggregate over choice c in C(state): contribution(c) + dp[state - effect(c)]

Always keep C(state) finite and small — typically O(1), O(K), or O(N). Transitions that aren’t O(small) usually indicate a missing state dimension.

3. Base Cases

The base cases are the values of dp at the smallest (recursion-stopping) states. They are not optional; a missing or wrong base case is the single most common DP bug.

Identify base cases by writing the recursion first and looking at the return-statements:

def f(i):
    if i == 0:
        return 0  # ← THIS is the base case
    return f(i - 1) + i  # recursive case (illustrative: f(i) = 1 + 2 + ... + i)

For 2D DP, the base cases are typically the entire first row and first column — set them explicitly before the main loop. For 3D DP and beyond, they’re a hyper-plane of dimension one less than the state.

A subtle base case bug: two different recursive paths reach the same base case but expect different return values. Usually this means the state is wrong (missing a dimension) and the base case has to “remember” which path it came from — impossible.

4. Evaluation Order

DP states form a DAG: state A points to state B iff dp[B] appears in the recurrence for dp[A]. To compute dp[A] we must have already computed dp[B], so the evaluation order is a reverse topological order of this DAG (every dependency is filled before the state that reads it).

For 1D DP indexed by i, the order is usually i = 0, 1, 2, …, N (increasing) when the transition looks “back” (dp[i] reads dp[i-1]), or i = N, N-1, …, 0 (decreasing) when it looks “forward” (dp[i] reads dp[i+1]). Either formulation works; pick one and use it consistently.

For 2D DP indexed by (i, j), the order is usually row-major (for i: for j:) or column-major. The right one is the one that fills dp[i-1][j] and dp[i][j-1] before dp[i][j].

For interval DP indexed by (left, right), the order is by interval length ascending: for length in 1..N: for left in 0..N-length: right = left + length - 1. This guarantees all sub-intervals are filled before the enclosing interval.

For tree DP, the order is post-order DFS: children are filled before the parent.

For bitmask DP, the order is by popcount ascending or by mask value ascending (a proper sub-mask of m is numerically smaller than m).

5. Space Optimization (The Rolling-Array Technique)

If dp[i][j] depends only on dp[i-1][*] and dp[i][j-1], then once row i is computed we don’t need any earlier row. Keep only the current and previous row — O(N · M) becomes O(M).

If dp[i][j] depends only on dp[i-1][j] and dp[i-1][j-1] (no same-row dependency), you can collapse to a single row using right-to-left iteration in j — O(M) space.

If dp[i][j] depends on more rows (e.g., dp[i-2]), keep that many rows.

The rolling-array transformation is mechanical once the tabulated version is written. The interviewer often asks for it: “can you reduce the space?” Practice the transformation on every lab so it becomes reflex.

6. The Brute → Memo → Tabulated → Space-Optimized Progression

Apply this on every problem in this phase. It is the mandatory teaching device of this phase.

Brute force: exponential recursion that tries every choice. Often O(2^N) or O(N!). Don’t skip writing this — it is the direct source of your state and transition.
Memoized: same recursion + a cache (@lru_cache in Python, a HashMap in Java, an array in C++). Time becomes O(states × transition); space includes the recursion stack.
Tabulated: replace recursion with a loop in topological order on the state DAG. Same time complexity, but no recursion overhead, and the loop structure makes the dependency pattern explicit.
Space-optimized: roll the table down to O(M) or O(1) by exploiting the recurrence’s locality.

Each stage is a small, mechanical change from the previous one: brute → memo is “add a cache”, memo → tabulated is “invert the call graph”, tabulated → space-optimized is “drop dimensions you don’t reuse”. This is deterministic engineering, not invention.
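As a minimal sketch of the four stages, here is House Robber (the dp[i] = max(dp[i-1], dp[i-2] + house[i-1]) recurrence from the 1D DP entry) written four times. The function names are illustrative, not taken from the labs; the point is how little changes between consecutive stages.

```python
from functools import lru_cache


def rob_brute(houses: list[int]) -> int:
    """Stage 1: try both choices (rob or skip) at every house. O(2^N) time."""
    def go(i: int) -> int:
        # go(i) = best loot from the first i houses
        if i <= 0:
            return 0
        return max(go(i - 1), go(i - 2) + houses[i - 1])
    return go(len(houses))


def rob_memo(houses: list[int]) -> int:
    """Stage 2: the same recursion plus a cache. O(N) states x O(1) transition."""
    @lru_cache(maxsize=None)
    def go(i: int) -> int:
        if i <= 0:
            return 0
        return max(go(i - 1), go(i - 2) + houses[i - 1])
    return go(len(houses))


def rob_tab(houses: list[int]) -> int:
    """Stage 3: invert the recursion into a loop over i ascending."""
    n = len(houses)
    dp = [0] * (n + 1)  # dp[0] = 0: the empty prefix
    for i in range(1, n + 1):
        prev2 = dp[i - 2] if i >= 2 else 0
        dp[i] = max(dp[i - 1], prev2 + houses[i - 1])
    return dp[n]


def rob_rolling(houses: list[int]) -> int:
    """Stage 4: dp[i] reads only dp[i-1] and dp[i-2], so keep two scalars."""
    prev2, prev1 = 0, 0  # dp[i-2], dp[i-1]
    for value in houses:
        prev2, prev1 = prev1, max(prev1, prev2 + value)
    return prev1
```

Comparing all four on small random inputs is a cheap way to check the later stages against the brute force before trusting them.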

Inline DP Topic Reference

1. Memoization Vs Tabulation Tradeoffs

When To Use

Both compute the same thing — they’re two evaluation orders on the same state DAG. Choose deliberately.

Memoization (top-down): when the reachable set of states is much smaller than the full state space (sparse DP). Examples: regex matching where many (i, j) pairs are never visited. Also when the transition is easier to express recursively than as a forward loop.
Tabulation (bottom-up): when the state space is dense (most states are visited), when you need to reduce space via rolling arrays (which require explicit loop structure), or when recursion depth would exceed the stack limit (e.g., N = 10^5 in Python, whose default recursion limit is 1000).

Common Pitfalls

Memoization in Python overflows the default recursion limit (1000) at N ≈ 1000. Either raise it with sys.setrecursionlimit(10**6) (very deep recursion can still overflow the interpreter’s underlying C stack) or convert to tabulation.
Memoization with mutable arguments (e.g., lru_cache on a function taking a list) — Python lru_cache requires hashable arguments; pass tuples or indices, not lists.
Tabulation with the wrong loop order silently produces garbage — see Section 5.

Classic Problems

LC 322 — Coin Change. Memoized recursion is natural; tabulated is faster in tight loops. See Lab 04.
LC 10 — Regular Expression Matching. Memoization is cleaner here. See String DP below.

2. State Design Principles

When To Use

Every DP problem starts here. Follow this discipline:

Identify what changes across recursive calls. Those parameters are candidate state dimensions.
Drop any parameter that is determined by the others.
Keep any parameter whose value affects the optimal continuation.
Verify: two states with identical parameters must have identical optimal values.

State Design Patterns

Prefix DP: dp[i] = answer for the first i elements (or for the prefix ending at index i). Used in LIS (the ending-at-i form), house robber, decode ways.
Two-pointer / interval DP: dp[i][j] = answer for elements in [i, j]. Used in matrix chain, burst balloons, palindromic subsequence.
Knapsack-style: dp[i][w] = best using first i items with budget w. Used in 0/1 knapsack, partition, coin change.
Two-string DP: dp[i][j] = answer for prefix-i of A and prefix-j of B. Used in LCS, edit distance, regex matching.
Tree DP: dp[v] = answer for subtree rooted at v. Often a tuple of values (e.g., “rob” / “skip”).
Bitmask DP: dp[mask] or dp[mask][last] = answer over the subset specified by mask.
Game DP: dp[state] = best score the current player can guarantee.

Common Pitfalls

Adding a parameter that doesn’t affect the answer — wastes time and space. E.g., tracking step_count when the recurrence already encodes it via the index.
Missing a parameter that does — produces wrong answers because two materially different histories collapse to the same dp cell.
Encoding choices in the state instead of the transition. The state is “where are we now”; the transition decides “what to do next”. Keep them separate.

3. Transition Function Design

When To Use

Once the state is defined, the transition is constrained: it must express dp[state] in terms of strictly smaller states.

Design Steps

List all choices available at this state (include / skip / pick which / move where).
For each choice, identify the contribution and the resulting smaller state.
Aggregate: min for shortest/cheapest, max for longest/most-valuable, + for counting.

Common Pitfalls

Forgetting a choice — usually “don’t take the item” or “skip this position”. Often the trivial choice that the recurrence still depends on.
Double-counting — particularly in counting problems where two distinct paths to the same state are aggregated naively. Often signals a missing dimension.
Off-by-one in the resulting smaller state: in two-string DP, dp[i-1][j-1] vs dp[i][j-1] is the difference between “consume the current character of both strings” and “consume only B’s current character”.

4. Base Case Identification

When To Use

After defining state and transition, the recursion bottoms out at some smallest state. The base case is what dp returns there.

Identifying Base Cases

For prefix DP dp[i]: the base case is dp[0], the empty prefix. Its value is the natural identity: 0 for sums, 1 for counts (exactly one way to choose nothing), and -∞ (max problems) or +∞ (min problems) for unreachable states.
For interval DP dp[i][j]: the base case is dp[i][i] (length-1 interval) — its value depends on the problem.
For two-string DP dp[i][j]: the base cases are dp[0][j] = j and dp[i][0] = i for edit distance, or dp[0][j] = dp[i][0] = 0 for LCS.

Common Pitfalls

Wrong identity for counting problems — the empty prefix has count 1 (one way to make nothing), not 0.
Wrong identity for min / max — initialize to +∞ / -∞, not 0. Initializing to 0 silently makes “do nothing” look optimal.
Forgetting to set base cases on the boundary of a 2D table, which leaves them at the language’s default (0 in Java arrays; undefined in a JS Array(n); uninitialized garbage for local arrays in C).

5. Evaluation Order (Topological Order On The State DAG)

When To Use

Every DP. The evaluation order must be consistent with the dependency structure.

Determining The Order

Treat states as nodes; draw an edge from A to B iff the recurrence for A reads dp[B]. The order to evaluate is reverse topological (children before parents).

For most DPs the order is obvious: increasing i, increasing j, increasing interval length, post-order over the tree, increasing popcount of mask. When in doubt, fix a small example, write out the dependency arrows, and read off the order.

Common Pitfalls

2D DP iterated in the wrong order silently computes garbage. The classic bug: the recurrence reads a cell the loops have not reached yet, e.g., dp[i+1][j-1] (palindrome DP) filled with i ascending. For reads of dp[i-1][j] and dp[i][j-1], both row-major (for i: for j:) and column-major (for j: for i:) are safe; for reads of dp[i+1][...], iterate i descending or by interval length.
Interval DP iterated in (left, right) order instead of (length, left) — fails because you compute dp[0][N-1] before dp[1][N-1].
Bitmask DP iterated in an arbitrary order instead of mask value ascending: fails whenever a mask is computed before one of the sub-masks it reads.

6. Space Optimization (Rolling Array Technique)

When To Use

Whenever dp[i][...] depends on only the previous one or two i values, you can keep just those rows.

Mechanical Transformation

Replace dp[i][j] with dp_curr[j] and dp[i-1][j] with dp_prev[j].
After each i, swap or copy.
If dp[i][j] doesn’t depend on dp[i][k] for k < j (no same-row dependency), collapse further to a single 1D dp[j]. Iterate j carefully: if the recurrence reads dp[i-1][j-1], iterate j from right-to-left so you read the old value before overwriting.

Common Pitfalls

Iterating left-to-right when right-to-left is needed — overwrites the value you’ll need next. This is the canonical 0/1 knapsack vs unbounded knapsack distinction:
- 0/1 knapsack: iterate weight right-to-left to use the previous-row’s dp[w-wi].
- Unbounded knapsack: iterate weight left-to-right to use the current row’s dp[w-wi] (because items can be reused).
Forgetting to reset the rolling array between outer iterations — old values bleed through.
Optimizing space prematurely, before the tabulated version is correct. Always verify tabulated against memoized on small inputs first.

7. 1D DP

When To Use

The state is a single integer index — dp[i]. Examples: climbing stairs, house robber, decode ways, max subarray (Kadane).

State Design

dp[i] = answer for the prefix ending at index i, OR for the first i elements. Pick one convention and stick with it (the lab uses “answer for first i elements” consistently).

Transition

dp[i] = f(dp[i-1], dp[i-2], ..., dp[i-k]) for some small k. Examples:

House robber: dp[i] = max(dp[i-1], dp[i-2] + house[i-1]).
Climbing stairs: dp[i] = dp[i-1] + dp[i-2] (Fibonacci).
Decode ways: dp[i] = (dp[i-1] if s[i-1] is valid 1-digit) + (dp[i-2] if s[i-2:i] is valid 2-digit).
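A minimal tabulated sketch of the decode-ways transition, assuming the LC 91 mapping ('1' → A, …, '26' → Z) and the “first i characters” convention. The function name is illustrative.

```python
def num_decodings(s: str) -> int:
    """Count decodings of a digit string where '1'..'26' map to letters.

    dp[i] = number of ways to decode the first i characters; dp[0] = 1 (empty prefix).
    """
    n = len(s)
    dp = [0] * (n + 1)
    dp[0] = 1
    for i in range(1, n + 1):
        # One-digit step: s[i-1] must be '1'..'9'.
        if s[i - 1] != "0":
            dp[i] += dp[i - 1]
        # Two-digit step: s[i-2:i] must be '10'..'26' ("05" parses to 5 and is rejected).
        if i >= 2 and 10 <= int(s[i - 2:i]) <= 26:
            dp[i] += dp[i - 2]
    return dp[n]
```

Because dp[i] reads only dp[i-1] and dp[i-2], the list can be replaced by two scalars exactly as in House Robber.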

Complexity

Time O(N). Space O(N) tabulated, O(1) space-optimized, since dp[i] reads only O(1) previous values.

Common Pitfalls

Off-by-one between dp[i] and house[i] — confusion between “first i houses” (uses house[i-1] as the latest) and “ending at index i” (uses house[i]). Pick one and never mix.
Forgetting the empty case — dp[0] for “first 0 elements” must be the identity.

Classic Problems

LC 70 — Climbing Stairs. See Lab 01.
LC 198 — House Robber.
LC 91 — Decode Ways.
LC 53 — Maximum Subarray (Kadane).
LC 746 — Min Cost Climbing Stairs.

8. 2D DP

When To Use

The state is a pair of integers — dp[i][j]. Examples: unique paths on a grid, minimum path sum, longest common subsequence, edit distance.

State Design

For grid problems, dp[i][j] = answer for getting to cell (i, j). For two-string problems, dp[i][j] = answer for prefix-i of one string and prefix-j of the other.

Transition

For grid: dp[i][j] = dp[i-1][j] + dp[i][j-1] (count of paths) or min(dp[i-1][j], dp[i][j-1]) + grid[i][j] (min path sum).

Complexity

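Time O(N · M). Space O(N · M) tabulated, or O(M) with two rolling rows. The grid recurrences above also collapse to a single 1D row: iterate j left-to-right, so dp[j] still holds the row-above value when read while dp[j-1] already holds the current row’s value.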
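Here is a minimal sketch of the single-row version for LC 64 (Minimum Path Sum), assuming moves only right or down. The +∞ sentinel is one way to give the boundary cells the special handling they need without separate first-row and first-column loops; the function name is illustrative.

```python
def min_path_sum(grid: list[list[int]]) -> int:
    """Minimum path sum from top-left to bottom-right, moving only right or down.

    One row: before the update, dp[j] holds dp[i-1][j] (cell above);
    dp[j-1] has already been updated to dp[i][j-1] (cell to the left).
    """
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    dp = [INF] * cols
    dp[0] = 0  # virtual "row -1" entry so that cell (0, 0) starts from 0
    for i in range(rows):
        for j in range(cols):
            from_above = dp[j]
            from_left = dp[j - 1] if j > 0 else INF
            dp[j] = min(from_above, from_left) + grid[i][j]
    return dp[-1]
```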

Common Pitfalls

Initializing first row and first column wrong for grid path problems — these are not always 0 or 1; they may carry obstacles or grid values.
Adding grid[i][j] to all transitions including the boundary — the boundary needs special handling.

Classic Problems

LC 62 — Unique Paths.
LC 63 — Unique Paths II (with obstacles). See Lab 02.
LC 64 — Minimum Path Sum.
LC 120 — Triangle.

9. 0/1 Knapsack

When To Use

A set of N items each with weight w_i and value v_i; capacity W; maximize value subject to total weight ≤ W. Each item used at most once. Recognized by discrete choices over a budget.

State Design

dp[i][w] = max value using first i items with capacity w.

Transition

dp[i][w] = max(dp[i-1][w], dp[i-1][w - w_i] + v_i) if w >= w_i, else dp[i-1][w]. The two cases: skip item i, or take it.

Complexity

Time O(N · W). Space O(W) with right-to-left collapse: iterate w from W down to w_i.
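A minimal sketch of the collapsed 1D table, assuming non-negative integer weights; the function name is illustrative. The only detail that makes it 0/1 rather than unbounded is the descending inner loop.

```python
def knapsack_01(weights: list[int], values: list[int], capacity: int) -> int:
    """Max total value with total weight <= capacity, each item used at most once."""
    dp = [0] * (capacity + 1)  # dp[w] = best value with capacity w over items seen so far
    for wt, val in zip(weights, values):
        # Right-to-left: dp[w - wt] still holds the previous row's value,
        # so this item cannot be counted twice.
        for w in range(capacity, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + val)
    return dp[capacity]
```

With a single item of weight 2 and value 3 and capacity 6, this returns 3; the same loop run left-to-right would return 9, i.e., the unbounded answer.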

Common Pitfalls

Iterating w left-to-right in the 1D-collapsed version — turns 0/1 knapsack into unbounded knapsack, allowing the same item to be picked multiple times.
Ignoring the size of W: the table has W + 1 columns, so at W = 10^9 it doesn’t fit; switch to meet-in-the-middle or branch-and-bound (out of scope here).

Classic Problems

LC 416 — Partition Equal Subset Sum (0/1 knapsack reformulation: target = total / 2). See Lab 03.
LC 494 — Target Sum.
LC 474 — Ones and Zeroes (2D knapsack).

10. Unbounded Knapsack

When To Use

Same as 0/1 knapsack but each item can be used any number of times. Recognized by “unlimited supply” / “any number of coins” / “items can be reused”.

State Design

dp[w] = best value with capacity w, considering all items as candidates at every step.

Transition

dp[w] = max(dp[w], dp[w - w_i] + v_i) for every item i such that w >= w_i.

Complexity

Time O(N · W). Space O(W). Iterate w left-to-right.

Common Pitfalls

Iterating wrong direction — same as 0/1 knapsack but inverted. Left-to-right makes items reusable; right-to-left makes them one-use.
Confusing “min number of items” with “max value” — in coin change (min coins), initialize to +∞, transition is dp[w] = min(dp[w], dp[w - c] + 1).
Counting orderings vs combinations: for “number of ways to make change as combinations”, the outer loop is over coins and inner over sums; for “number of ordered sequences”, swap them. The two produce different counts.
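The sketch below shows the three coin-change variants side by side: min coins with a +∞ initialization, and the two counting problems that differ only in which loop is outermost. Function names are illustrative; coins are assumed to be positive integers.

```python
def coin_change_min(coins: list[int], amount: int) -> int:
    """Fewest coins summing to amount (unlimited supply), or -1 if impossible."""
    INF = float("inf")
    dp = [0] + [INF] * amount  # dp[0] = 0 coins; everything else starts unreachable
    for w in range(1, amount + 1):
        for c in coins:
            if c <= w:
                dp[w] = min(dp[w], dp[w - c] + 1)
    return dp[amount] if dp[amount] != INF else -1


def count_combinations(coins: list[int], amount: int) -> int:
    """Number of multisets of coins summing to amount (order ignored)."""
    ways = [1] + [0] * amount  # one way to make 0: pick nothing
    for c in coins:                      # outer loop over coins ...
        for w in range(c, amount + 1):   # ... inner loop left-to-right (reuse allowed)
            ways[w] += ways[w - c]
    return ways[amount]


def count_sequences(coins: list[int], amount: int) -> int:
    """Number of ordered sequences of coins summing to amount."""
    ways = [1] + [0] * amount
    for w in range(1, amount + 1):       # outer loop over sums ...
        for c in coins:                  # ... inner loop over coins
            if c <= w:
                ways[w] += ways[w - c]
    return ways[amount]
```

For coins [1, 2] and amount 3, count_combinations gives 2 ({1,1,1} and {1,2}) while count_sequences gives 3 (1+1+1, 1+2, 2+1).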

Classic Problems

LC 322 — Coin Change (min coins). See Lab 04.
LC 518 — Coin Change II (count combinations).
LC 279 — Perfect Squares.
LC 139 — Word Break (unbounded with “items” = dictionary words).

11. Subset Sum / Partition Equal Subset Sum

When To Use

“Can we pick a subset summing to T?” Recognized in: partition problems, target-sum problems, equal-sum-subsets.

Reformulation

Subset sum is 0/1 knapsack with v_i = w_i and target W = T. Use dp[w] = bool (reachable or not) instead of “max value”, and aggregate with OR instead of max.

Complexity

Time O(N · T). Space O(T) as a bool array; a bitset cuts this to O(T / 64) machine words and the time to O(N · T / 64).

Common Pitfalls

Skipping the cheap feasibility checks: for “partition equal subset sum”, T = total / 2, so if total is odd, return false immediately. Also confirm that T is small enough for an O(T) table.
Using max instead of OR for boolean aggregation.
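A minimal sketch of LC 416 both ways: a bool array aggregated with OR, and a bitset variant. Python has no fixed-width bitset, so the second function uses an arbitrary-precision int as a stand-in (an assumption of this sketch; in C++ std::bitset plays this role). Function names are illustrative.

```python
def can_partition(nums: list[int]) -> bool:
    """Can nums be split into two subsets with equal sums? (0/1 subset sum to total/2)"""
    total = sum(nums)
    if total % 2:
        return False  # odd total can never split evenly
    target = total // 2
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for x in nums:
        # Right-to-left so each number is used at most once (0/1 knapsack).
        for s in range(target, x - 1, -1):
            reachable[s] = reachable[s] or reachable[s - x]
    return reachable[target]


def can_partition_bitset(nums: list[int]) -> bool:
    """Same check; bit s of `bits` is set iff some subset sums to s."""
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    bits = 1  # only sum 0 reachable initially
    for x in nums:
        bits |= bits << x  # shifting adds x to every reachable sum at once
    return bool((bits >> target) & 1)
```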

Classic Problems

LC 416 — Partition Equal Subset Sum. See Lab 03.
LC 698 — Partition to K Equal Sum Subsets (harder; bitmask DP).

12. LIS — Longest Increasing Subsequence

When To Use

“Longest subsequence with property P” where P is monotonic (increasing, non-decreasing, or some order relation).

State Design (O(N²) DP)

dp[i] = length of LIS ending at index i and using arr[i] as the last element.

Transition

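dp[i] = 1 + max(dp[j] for j < i if arr[j] < arr[i]), taking the max as 0 when no such j exists (so dp[i] = 1). The answer is the maximum of dp[i] over all i, not dp[N-1], because the LIS can end anywhere.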

Complexity

O(N²) time, O(N) space.
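A minimal sketch of the O(N²) version, using the 0-indexed “ending at index i” state; the function name is illustrative.

```python
def lis_quadratic(arr: list[int]) -> int:
    """Length of the longest strictly increasing subsequence, O(N^2)."""
    n = len(arr)
    if n == 0:
        return 0
    dp = [1] * n  # dp[i] = LIS length ending at i; arr[i] alone has length 1
    for i in range(n):
        for j in range(i):
            if arr[j] < arr[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)  # the LIS can end at any index
```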

Patience Sort / O(N log N) Variant

Maintain tails[k] = smallest possible tail of any increasing subsequence of length k+1. For each arr[i], find the leftmost tails[k] >= arr[i] via binary search and replace it with arr[i] (or append if arr[i] > all). The length of tails at the end is the LIS length.

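This is patience sorting: each card goes on the leftmost pile whose top card is ≥ it (or starts a new pile if none exists), so tails[k] is the top of pile k, each pile’s cards are non-increasing from bottom to top, and the number of piles is the LIS length.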

Complexity

O(N log N) time, O(N) space.

Common Pitfalls

Confusing “LIS length” with “LIS itself” — tails is not the LIS; reconstructing the actual sequence requires storing predecessors during scan.
Strict vs non-strict — for non-decreasing, use bisect_right instead of bisect_left.
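A minimal sketch of the O(N log N) version using the standard-library bisect module; the strict flag switches between bisect_left (strictly increasing) and bisect_right (non-decreasing), as described above. The function name is illustrative, and it returns only the length.

```python
from bisect import bisect_left, bisect_right


def lis_length(arr: list[int], strict: bool = True) -> int:
    """LIS length via patience sorting in O(N log N).

    tails[k] = smallest tail of any increasing subsequence of length k + 1.
    """
    tails: list[int] = []
    find = bisect_left if strict else bisect_right
    for x in arr:
        k = find(tails, x)  # leftmost pile whose top is >= x (strict) or > x (non-strict)
        if k == len(tails):
            tails.append(x)  # x extends the longest subsequence found so far
        else:
            tails[k] = x     # x becomes a smaller tail for length k + 1
    return len(tails)
```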

Classic Problems

LC 300 — Longest Increasing Subsequence. See Lab 05.
LC 354 — Russian Doll Envelopes (sort + LIS).
LC 673 — Number of Longest Increasing Subsequences.

13. LCS / Edit Distance Family

When To Use

Two strings, asking for similarity, alignment, or transformation cost. Includes longest common subsequence, edit distance (Levenshtein), longest common substring (different state!), and shortest common supersequence.

State Design

dp[i][j] = answer for prefix-i of A and prefix-j of B.

Transitions

LCS: dp[i][j] = dp[i-1][j-1] + 1 if A[i-1] == B[j-1], else max(dp[i-1][j], dp[i][j-1]).
Edit distance (Levenshtein): if match, dp[i][j] = dp[i-1][j-1]; else 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1]) for replace / delete / insert.
Longest common substring (different!): if match, dp[i][j] = dp[i-1][j-1] + 1; else dp[i][j] = 0. Answer is max(dp[i][j]) over all (i, j). The “else = 0” is what makes it substring vs subsequence.

Complexity

Time O(N · M). Space O(N · M) tabulated, O(M) with two rolling rows, O(M) with one row + a single saved diagonal value.
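A minimal sketch of the one-row-plus-diagonal edit distance. The variable diag carries dp[i-1][j-1], which the single row would otherwise have overwritten; the function name is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance in O(len(b)) space: one row plus a saved diagonal."""
    m = len(b)
    row = list(range(m + 1))  # row[j] = dp[0][j] = j (insert j characters)
    for i in range(1, len(a) + 1):
        diag = row[0]   # dp[i-1][0]
        row[0] = i      # dp[i][0] = i (delete i characters)
        for j in range(1, m + 1):
            above = row[j]  # dp[i-1][j], about to be overwritten
            if a[i - 1] == b[j - 1]:
                row[j] = diag
            else:
                # diag: replace, above: delete from a, row[j-1]: insert into a
                row[j] = 1 + min(diag, above, row[j - 1])
            diag = above  # becomes dp[i-1][j-1] for the next j
    return row[m]
```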

Common Pitfalls

Confusing subsequence and substring — they have different recurrences. Subsequence allows skipping; substring requires contiguity.
Edit distance with non-unit costs (insert/delete/replace each have a custom cost) — works the same with custom weights instead of +1.
Reconstructing the alignment requires backtracking through dp choices; store back-pointers or reconstruct from values.

Classic Problems

LC 1143 — Longest Common Subsequence.
LC 72 — Edit Distance. See Lab 06.
LC 583 — Delete Operation for Two Strings.
LC 712 — Minimum ASCII Delete Sum.
LC 718 — Maximum Length of Repeated Subarray (LCS variant; substring).

14. Palindrome DP

When To Use

Anything about palindromic substrings or subsequences: count, longest, partition into palindromes, minimum cuts.

Variant 1: Longest Palindromic Subsequence

dp[i][j] = length of longest palindromic subsequence in s[i..j].

dp[i][j] = dp[i+1][j-1] + 2          if s[i] == s[j]
         = max(dp[i+1][j], dp[i][j-1])  otherwise

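Base cases: dp[i][i] = 1, and dp[i][j] = 0 for the empty interval i > j (read by the length-2 case when s[i] == s[i+1]). Answer: dp[0][N-1]. Evaluation order: by interval length ascending.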
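A minimal sketch of the longest-palindromic-subsequence fill in length-ascending order; the function name is illustrative.

```python
def longest_palindromic_subsequence(s: str) -> int:
    n = len(s)
    if n == 0:
        return 0
    # dp[i][j] = LPS length in s[i..j]; cells with i > j (empty interval) stay 0.
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 1  # a single character is a palindrome
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                dp[i][j] = dp[i + 1][j - 1] + 2
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]
```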

Variant 2: Longest Palindromic Substring

is_pal[i][j] = boolean. is_pal[i][j] = (s[i] == s[j]) and (j - i < 2 or is_pal[i+1][j-1]). Track max length and start during fill.

(Manacher’s algorithm gives O(N) for this; see Phase 3.)

Variant 3: Palindrome Partitioning Min Cuts

cuts[i] = min cuts to partition s[0..i] into palindromes.

cuts[i] = 0                                   if s[0..i] is itself a palindrome
        = min(cuts[j-1] + 1) for all 1 ≤ j ≤ i with s[j..i] palindrome

Precompute is_pal[i][j] first (O(N²)), then run the cut DP (O(N²)). Total O(N²).
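A minimal sketch of the two-step approach for LC 132: fill the whole is_pal table first, then run the cut DP over prefixes. The function name is illustrative.

```python
def min_cut(s: str) -> int:
    """Fewest cuts so that every piece of s is a palindrome (LC 132)."""
    n = len(s)
    # Step 1: is_pal[j][i] for every substring s[j..i], filled before the cut DP reads it.
    is_pal = [[False] * n for _ in range(n)]
    for j in range(n - 1, -1, -1):       # j descending so is_pal[j+1][i-1] is ready
        for i in range(j, n):
            if s[j] == s[i] and (i - j < 2 or is_pal[j + 1][i - 1]):
                is_pal[j][i] = True
    # Step 2: cuts[i] = min cuts for the prefix s[0..i].
    cuts = [0] * n
    for i in range(n):
        if is_pal[0][i]:
            continue  # cuts[i] stays 0
        # Non-empty for i >= 1: j = i always qualifies (single character).
        cuts[i] = min(cuts[j - 1] + 1 for j in range(1, i + 1) if is_pal[j][i])
    return cuts[n - 1] if n else 0
```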

Common Pitfalls

Running the cut DP before is_pal is fully computed: the cut DP reads is_pal[j][i], so precompute the whole table first.
Wrong evaluation order in dp[i][j] — must fill smaller intervals first; iterate by length ascending.

Classic Problems

LC 516 — Longest Palindromic Subsequence. See Lab 07.
LC 5 — Longest Palindromic Substring.
LC 132 — Palindrome Partitioning II. See Lab 07.
LC 647 — Palindromic Substrings.

15. String DP

When To Use

Pattern matching with wildcards or operators: regex, glob/wildcard, interleaving, distinct subsequences. The state is two indices (one per string).

Variant: Regex Matching (LC 10)

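dp[i][j] = does p[:j] (the first j pattern characters) match s[:i] (the first i characters of s)? Base case: dp[0][0] = True. The helper matches(c, pc) means pc == '.' or pc == c.

if p[j-1] == '*':
    dp[i][j] = (dp[i][j-2]                                             # use zero copies of p[j-2]
                or (i > 0 and matches(s[i-1], p[j-2]) and dp[i-1][j]))  # consume one more char of s
elif i > 0 and matches(s[i-1], p[j-1]):
    dp[i][j] = dp[i-1][j-1]
else:
    dp[i][j] = False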
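A self-contained tabulated version of this recurrence might look like the sketch below. It assumes the pattern is well-formed (every '*' follows a character, as LC 10 guarantees); the function name is illustrative. Note that the i = 0 row must be filled too, so that patterns such as "a*b*" match the empty string.

```python
def is_match(s: str, p: str) -> bool:
    """LC 10 regex matching: '.' matches any char, 'x*' matches zero or more 'x'.

    dp[i][j] = True iff p[:j] matches s[:i].
    """
    def matches(c: str, pc: str) -> bool:
        return pc == "." or pc == c

    n, m = len(s), len(p)
    dp = [[False] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = True  # empty pattern matches empty string
    for i in range(n + 1):          # i = 0 handles patterns like "a*b*" vs ""
        for j in range(1, m + 1):
            if p[j - 1] == "*":
                dp[i][j] = dp[i][j - 2] or (
                    i > 0 and matches(s[i - 1], p[j - 2]) and dp[i - 1][j]
                )
            elif i > 0 and matches(s[i - 1], p[j - 1]):
                dp[i][j] = dp[i - 1][j - 1]
    return dp[n][m]
```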

Variant: Wildcard Matching (LC 44)

Similar but * matches any sequence: dp[i][j] = dp[i-1][j] or dp[i][j-1] when p[j-1] == '*'.

Variant: Interleaving Strings (LC 97)

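dp[i][j] = can s3[:i+j] be formed by interleaving s1[:i] and s2[:j]? Transition: the last character came from s1 if s1[i-1] == s3[i+j-1] and dp[i-1][j]; from s2 symmetrically (s2[j-1] == s3[i+j-1] and dp[i][j-1]); OR the two. Return false immediately if len(s1) + len(s2) != len(s3).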
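A minimal sketch of the interleaving table with the length check up front; the function name is illustrative.

```python
def is_interleave(s1: str, s2: str, s3: str) -> bool:
    """dp[i][j] = True iff s3[:i+j] is an interleaving of s1[:i] and s2[:j]."""
    n, m = len(s1), len(s2)
    if n + m != len(s3):
        return False
    dp = [[False] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = True
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and s1[i - 1] == s3[i + j - 1] and dp[i - 1][j]:
                dp[i][j] = True  # last character came from s1
            if j > 0 and s2[j - 1] == s3[i + j - 1] and dp[i][j - 1]:
                dp[i][j] = True  # last character came from s2
    return dp[n][m]
```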

Common Pitfalls

Off-by-one between pattern index and dp index — almost universal source of regex DP bugs.
* semantics differ between regex and glob; read the problem carefully.

Classic Problems

LC 10 — Regular Expression Matching.
LC 44 — Wildcard Matching.
LC 97 — Interleaving String.
LC 115 — Distinct Subsequences.

16. Tree DP

When To Use

The structure is a tree (rooted or rootable); the answer at a node depends on its subtree. Examples: house robber III, max path sum, longest path / diameter.

State Design

dp[v] = answer for the subtree rooted at v. Often a tuple: (best_with_v_chosen, best_without_v_chosen). Tuples are essential when the parent’s decision depends on whether the child was used.

Evaluation Order

Post-order DFS — fill children before the parent.

Transition

Aggregate over children. For house robber III: rob[v] = val[v] + sum(skip[c] for c in children); skip[v] = sum(max(rob[c], skip[c]) for c in children).
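A minimal post-order sketch of House Robber III that returns the (rob, skip) tuple per node. The TreeNode class is an assumed LeetCode-style binary node, and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:  # hypothetical LeetCode-style node, assumed for this sketch
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def rob_tree(root: Optional[TreeNode]) -> int:
    """Max loot with no two directly linked houses robbed (LC 337)."""
    def dfs(node: Optional[TreeNode]) -> tuple[int, int]:
        # Returns (rob, skip): best for this subtree with node robbed / not robbed.
        if node is None:
            return 0, 0  # identity for an empty subtree
        left_rob, left_skip = dfs(node.left)
        right_rob, right_skip = dfs(node.right)
        rob = node.val + left_skip + right_skip  # children must be skipped
        skip = max(left_rob, left_skip) + max(right_rob, right_skip)
        return rob, skip
    return max(dfs(root))
```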

Complexity

Time O(V). Space O(H) for the recursion stack, where H is the tree height (O(V) in the worst case of a path-shaped tree).

Common Pitfalls

Stack overflow at deep trees in Python (default limit 1000) — sys.setrecursionlimit(2 * 10**5) or convert to iterative post-order.
Mishandling N-ary vs binary children — N-ary requires summing over a dynamic list; binary is hard-coded (left, right).
Forgetting to handle null children — return identity values (0 or -∞).

Classic Problems

LC 337 — House Robber III. See Lab 08.
LC 124 — Binary Tree Maximum Path Sum.
LC 543 — Diameter of Binary Tree (variant).
LC 968 — Binary Tree Cameras (multi-state tree DP).

17. Interval DP

When To Use

The state is (left, right) — an interval — and the transition picks a “split point” k in [left, right]. Examples: matrix chain multiplication, burst balloons, palindrome partitioning, optimal BST, stone game.

State Design

dp[i][j] = answer for interval [i, j]. Often the meaningful question is “what is the last operation on this interval”, which forces a choice of split point k.

Transition

dp[i][j] = aggregate over k in [i..j]: dp[i][k-1] + dp[k+1][j] + cost(i, j, k).

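The cost(i, j, k) typically depends on the interval’s boundaries, not just on k. In burst balloons, for example, when k is the last balloon burst in [i, j], its neighbors at that moment are nums[i-1] and nums[j+1], so the cost is nums[i-1] · nums[k] · nums[j+1].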

Evaluation Order

By interval length ascending: for length in 1..N: for left in 0..N-length: right = left + length - 1.

Complexity

Time O(N³) in general (O(N²) intervals × O(N) split points). Space O(N²).
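A minimal sketch of Burst Balloons with the “last balloon burst” split, assuming the usual padding of the array with a virtual 1 at each end so that nums[i-1] and nums[j+1] always exist. The function name is illustrative.

```python
def max_coins(nums: list[int]) -> int:
    """LC 312 Burst Balloons via interval DP, choosing the LAST balloon burst in each interval."""
    vals = [1] + nums + [1]  # pad with virtual 1s so boundaries always exist
    n = len(nums)
    # dp[i][j] = best coins from bursting every balloon in vals[i..j] (1-indexed, inclusive);
    # dp[i][j] = 0 when i > j.
    dp = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = 0
            for k in range(i, j + 1):  # k = last balloon burst in [i, j]
                coins = vals[i - 1] * vals[k] * vals[j + 1]
                best = max(best, dp[i][k - 1] + coins + dp[k + 1][j])
            dp[i][j] = best
    return dp[1][n]
```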

Common Pitfalls

Iterating (i, j) in the wrong order — must fill smaller intervals first. Length-ascending is the canonical order.
Choosing the wrong “thing” to split on — e.g., for burst balloons, the right state is “last balloon to burst in [i, j]” rather than “first balloon”.
Confusing the boundaries — the cost in burst balloons uses nums[i-1] and nums[j+1] as multipliers because those are the surviving neighbors at the moment the last balloon in [i, j] is burst.

Classic Problems

LC 312 — Burst Balloons. See Lab 09.
LC 1547 — Minimum Cost to Cut a Stick (matrix-chain-like).
LC 87 — Scramble String.
LC 132 — Palindrome Partitioning II. See Lab 07.

18. Bitmask DP

When To Use

Small-N (typically N ≤ 20) problems where the state must remember which subset of items has been used. Examples: TSP, assignment problem, set cover, “shortest path visiting all nodes”.

State Design

dp[mask] = best value over subsets specified by mask. Or dp[mask][last] = best path ending at node last and visiting exactly the nodes in mask (TSP-style).

Transition

For TSP: dp[mask | (1 << v)][v] = min(dp[mask | (1 << v)][v], dp[mask][u] + dist(u, v)) for all u in mask and v not in mask.

Evaluation Order

By mask value ascending, which guarantees dp[submask] is filled before dp[mask] whenever submask ⊂ mask (a proper sub-mask is numerically smaller). Ordering by popcount(mask) ascending also works.

Complexity

Time O(2^N · N²) for TSP-style. Space O(2^N · N): at N = 20 this is 20 × 2^20 ≈ 21M cells, about 84 MB as 32-bit ints, which fits in memory.
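A minimal sketch of the (mask, last) state as Held-Karp TSP over a distance matrix, assuming the tour starts and ends at node 0. LC 847 itself is unweighted and is commonly solved with BFS over (mask, node) instead; this sketch shows the general DP form. The function name is illustrative.

```python
def tsp_min_tour(dist: list[list[int]]) -> int:
    """Held-Karp: shortest tour starting and ending at node 0, visiting every node once.

    dp[mask][v] = min cost of a path that starts at 0, visits exactly the nodes in mask,
    and ends at v. O(2^N * N^2) time, O(2^N * N) space.
    """
    n = len(dist)
    INF = float("inf")
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # the start mask must already contain the source node 0
    for mask in range(1 << n):  # ascending: every predecessor mask is smaller
        for u in range(n):
            cost = dp[mask][u]
            if cost == INF or not (mask >> u) & 1:
                continue
            for v in range(n):
                if (mask >> v) & 1:
                    continue  # v already visited
                nxt = mask | (1 << v)
                if cost + dist[u][v] < dp[nxt][v]:
                    dp[nxt][v] = cost + dist[u][v]
    return min(dp[full][v] + dist[v][0] for v in range(n))
```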

Common Pitfalls

Iterating bitmasks in the wrong order — by-mask-value ascending is the safe default.
Off-by-one on 1 << v vs 1 << (v-1) depending on 0- or 1-indexed nodes.
Forgetting that the initial mask must already include the source node (e.g., dp[1 << s][s] = 0).
Underestimating memory — at N=22, 2^N × N = 92M cells; at N=24, 400M+. Bitmask DP is strictly small-N.

Classic Problems

LC 847 — Shortest Path Visiting All Nodes. See Lab 10.
LC 943 — Find the Shortest Superstring.
LC 1349 — Maximum Students Taking Exam (bitmask over rows).
LC 1125 — Smallest Sufficient Team.

19. Digit DP (Overview)

When To Use

“Count numbers in [L, R] with property P” where P is digit-defined (sum of digits, no consecutive equal, contains a digit, etc.). The state is (position, tight, accumulator…).

State Design

dp[pos][tight][...accumulated state] where tight is a flag indicating whether the prefix so far equals the upper bound’s prefix (so the next digit is bounded).

Transition

For each digit d in 0..(9 if not tight else upper_bound[pos]), recurse to pos + 1 with tight' = tight and d == upper_bound[pos], updating the accumulator.

Complexity

Time O(D × 2 × digit_range × accumulator_size), typically tractable for D = 18 (decimal) and small accumulator.

Common Pitfalls

Off-by-one between [L, R] and [0, R] — answer is count(R) - count(L-1).
Leading zeros — track a “started” flag; otherwise “001” and “1” are conflated.
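Mishandling memoization of tight=True states. Each tight=True state is reached by exactly one prefix per upper bound, so memoizing it gains nothing, and it becomes a bug if one memo table is shared across count(R) and count(L-1). The common convention is to memoize only the tight=False branch, or to rebuild the memo for each bound.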
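To illustrate the (pos, tight, accumulator) state, here is a minimal sketch that counts integers in [0, n] whose decimal digit sum equals a target; the function names and the digit-sum property are hypothetical choices for the example. Leading zeros are harmless here because a zero adds nothing to the sum, so no “started” flag is needed, and the memo is rebuilt on every call, so count(R) and count(L-1) never share a table.

```python
from functools import lru_cache


def count_digit_sum(n: int, target: int) -> int:
    """Count integers x with 0 <= x <= n whose decimal digit sum equals target."""
    if n < 0:
        return 0
    digits = [int(ch) for ch in str(n)]

    @lru_cache(maxsize=None)  # fresh cache per call: never shared across bounds
    def go(pos: int, tight: bool, acc: int) -> int:
        if acc > target:
            return 0  # digit sums only grow, so prune
        if pos == len(digits):
            return 1 if acc == target else 0
        limit = digits[pos] if tight else 9
        total = 0
        for d in range(limit + 1):
            # stay tight only while we keep matching the bound's digit
            total += go(pos + 1, tight and d == limit, acc + d)
        return total

    return go(0, True, 0)


def count_in_range(lo: int, hi: int, target: int) -> int:
    """Count integers in [lo, hi] with digit sum == target, as count(R) - count(L-1)."""
    return count_digit_sum(hi, target) - count_digit_sum(lo - 1, target)
```

For example, count_digit_sum(20, 2) is 3 (the numbers 2, 11, 20).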

Classic Problems

LC 233 — Number of Digit One.
LC 902 — Numbers At Most N Given Digit Set.
LC 1012 — Numbers With Repeated Digits.

Overview-only in this phase; depth in Phase 7 (Competitive Programming).

20. DP On DAG

When To Use

The graph is acyclic; you want longest / shortest / count of paths. The DAG itself defines the topological order; the DP runs along it.

State Design

dp[v] = answer for paths ending at v (or starting from v).

Transition

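For longest path: dp[v] = max(dp[u] + w(u, v) for u in predecessors(v)), with base case dp[v] = 0 for sources; if a path may start at any node, take max(0, ...) at every v. Run in topological order on the DAG so every dp[u] is final before dp[v] reads it.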
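A minimal sketch of longest-path DP on a DAG, assuming a hypothetical edge-list input of (u, v, w) triples over nodes 0..n-1 and that a path may start at any node. It uses Kahn's algorithm both to obtain a topological order and to confirm DAG-ness (popping fewer than n nodes means a cycle), then pushes dp[u] + w forward along each edge, which is equivalent to the pull form over predecessors.

```python
from collections import deque
from typing import List, Tuple


def longest_path_dag(n: int, edges: List[Tuple[int, int, int]]) -> int:
    """Longest weighted path in a DAG on nodes 0..n-1; a path may start anywhere.
    Raises ValueError if the graph contains a cycle."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Kahn's algorithm: repeatedly pop nodes with no remaining predecessors
    order = []
    queue = deque(v for v in range(n) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) < n:
        raise ValueError("graph has a cycle; longest-path DP does not apply")

    # dp[v] = longest path ending at v; 0 is the empty path that starts at v
    dp = [0] * n
    for u in order:  # dp[u] is final before its out-edges are relaxed
        for v, w in adj[u]:
            dp[v] = max(dp[v], dp[u] + w)
    return max(dp, default=0)
```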

Complexity

Time O(V + E). Space O(V).

Common Pitfalls

Running on a graph that has cycles — the recurrence diverges or memoization loops. Confirm DAG-ness with topological sort first.
Confusing “longest path” (NP-hard in general graphs) with “longest path in a DAG” (polynomial) — always say “in a DAG” out loud.

Classic Problems

LC 329 — Longest Increasing Path in a Matrix (implicit DAG).
LC 1857 — Largest Color Value in a Directed Graph.
“Longest path in a DAG” — folklore.

21. Game DP (Minimax / Nim / Stone Game)

When To Use

Two-player zero-sum perfect-information game; ask whether the first player wins, or by what margin. Examples: stone game, Nim, predict-the-winner.

State Design

dp[state] = the best score margin (current player's total minus the opponent's total) that the player to move can guarantee, assuming both play optimally. Often dp[i][j], with i and j the two ends of a contested range.

Transition

The current player picks the choice that maximizes their own margin. The opponent then moves from the resulting state, also optimally, so the value at the resulting state is the opponent's margin, not the current player's. Hence:

dp[i][j] = max(stones[i] - dp[i+1][j], stones[j] - dp[i][j-1])

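The -dp[...] flips perspective: the opponent's optimal margin becomes a deduction from the current player's view. Base case: dp[i][i] = stones[i] (one stone left, take it). The first player wins or ties iff dp[0][n-1] ≥ 0.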
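A minimal tabulated sketch of the interval recurrence above for a Predict-the-Winner-style game (LC 486), where each player takes a stone from either end; the function name is hypothetical. It fills dp by interval length ascending, so dp[i+1][j] and dp[i][j-1] are ready before dp[i][j].

```python
from typing import List


def game_margin(stones: List[int]) -> int:
    """Best margin (own total minus opponent's) the first player can guarantee
    when players alternately take a stone from either end of `stones`."""
    n = len(stones)
    if n == 0:
        return 0
    # dp[i][j] = margin for the player to move on stones[i..j]
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = stones[i]  # one stone left: take it
    for length in range(2, n + 1):  # shorter intervals first
        for i in range(n - length + 1):
            j = i + length - 1
            # take an end, then subtract the opponent's best margin on the rest
            dp[i][j] = max(stones[i] - dp[i + 1][j], stones[j] - dp[i][j - 1])
    return dp[0][n - 1]
```

game_margin([1, 5, 233, 7]) is 222, so the first player wins; game_margin([1, 5, 2]) is -2, so the first player loses.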

Common Pitfalls

Forgetting the perspective flip — +dp[...] instead of -dp[...]. Produces nonsensical “both players cooperate” answers.
Confusing “current player wins” with “first player wins” — dp[state] is from the perspective of whoever moves at this state, which may not be the original first player after several moves.

Classic Problems

LC 486 — Predict the Winner.
LC 877 — Stone Game.
LC 1140 — Stone Game II.
LC 464 — Can I Win (game DP + bitmask).

22. Probability And Expected Value DP

When To Use

Random walks, expected number of steps, probability of reaching a state. Examples: knight probability, dice problems, Markov chains in disguise.

State Design

dp[state] = probability of being in state after the random process, OR expected value of some random variable from state.

Transition

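For probability (pushed forward one step at a time): dp[next] = sum(P(s -> next) × dp[s]) over all predecessors s. For expected value (with stopping): E[state] = expected_immediate + sum(P(state -> next) × E[next]) for non-terminal states, with E[terminal] = 0. This is a plain DP only when the transitions are acyclic (for example, layered by step count or by a strictly increasing sum); cyclic transitions turn it into a system of linear equations.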
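Two minimal sketches, one per recurrence above. knight_probability pushes probability mass forward one move at a time in the LC 688 setting; expected_rolls pulls expected values backward from terminal states for one hypothetical reading of the dice folklore problem (roll a fair die until the running sum is at least N). Both function names are illustrative.

```python
KNIGHT_MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]


def knight_probability(n: int, k: int, row: int, col: int) -> float:
    """Probability a knight starting at (row, col) is still on an n x n board
    after k uniformly random moves (mass that leaves the board is dropped)."""
    dp = [[0.0] * n for _ in range(n)]
    dp[row][col] = 1.0  # all probability mass starts on the start square
    for _ in range(k):
        nxt = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                p = dp[r][c]
                if p == 0.0:
                    continue
                for dr, dc in KNIGHT_MOVES:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < n and 0 <= nc < n:
                        nxt[nr][nc] += p / 8  # push mass forward along each move
        dp = nxt
    return sum(map(sum, dp))


def expected_rolls(target: int, faces: int = 6) -> float:
    """Expected number of fair-die rolls until the running sum is >= target."""
    # E[s] = expected further rolls from running sum s; E[s] = 0 once s >= target
    E = [0.0] * (target + faces)
    for s in range(target - 1, -1, -1):  # larger sums first: E[s] reads E[s+1..s+faces]
        E[s] = 1 + sum(E[s + d] for d in range(1, faces + 1)) / faces
    return E[0]
```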

Complexity

Same as the underlying state-space DP.

Common Pitfalls

Conflating probability DP and expected-value DP — they have different recurrences; pick the right one for the question.
Numerical stability — many small probabilities multiplied; use log or rational arithmetic when extreme.
Infinite expected steps — if there’s a non-zero probability of never reaching the terminal, the expected value is infinite; check reachability first.

Classic Problems

LC 688 — Knight Probability in Chessboard.
LC 837 — New 21 Game.
“Expected number of dice rolls to reach sum N” — folklore.

DP-Recognition Cheat Sheet

The hardest skill in this phase is recognizing that a problem is DP. Here is a battery of signals.

Signal in problem statement	Likely DP variant
“Count number of ways”	Counting DP — sum over choices
“Maximum / minimum cost” with sequential choices	Optimization DP
“Pick subset with property P” / “partition”	Subset / knapsack
“Longest / shortest subsequence”	LIS / LCS family
“Edit / transform A into B”	Edit distance family
“Each item used at most once”	0/1 knapsack
“Each item can be reused”	Unbounded knapsack
“Substring / subarray / contiguous”	1D DP (often Kadane-like)
“Subsequence (non-contiguous)”	LCS / LIS family
“Palindromic”	Interval DP, expand-around-center, or LCS(s, rev(s))
“Match a pattern with `*` / `.`”	Regex / wildcard DP
“Tree” + “subtree answer aggregates”	Tree DP, post-order
“N ≤ 20” + “visit all” / “subset”	Bitmask DP
“N ≤ 100” + “split into intervals” / “merge intervals”	Interval DP, length-ascending
“Two-player game, both optimal”	Game DP, perspective-flip
“Probability” / “expected” + “random walk / dice”	Probability/EV DP
“Number of digits ≤ 18, range [L, R]”	Digit DP
“Acyclic graph + longest/count paths”	DP on DAG
“Climbing / hopping with steps {a, b, c}”	1D DP, Fibonacci-like
“Decide YES/NO with budget K”	Reachability DP, often boolean knapsack

Common DP Bugs

A taxonomy of the bugs that show up most often in submitted DP solutions.

Wrong base case. dp[0] initialized to 0 when it should be 1 for counting, or unreachable cells initialized to 0 when they should be +∞ for a min. Check by running the tabulated version against the brute force (or memoized) version on N=0, 1, 2.
Wrong evaluation order. 2D DP iterated in (j, i) order when the recurrence reads dp[i-1][j]. Interval DP iterated in (left, right) instead of (length, left). Bitmask DP iterated in arbitrary mask order.
Off-by-one between value-array index and DP index. If dp[i] is “first i elements”, the latest element is arr[i-1], not arr[i]. If dp[i] is “ending at index i”, the latest element is arr[i]. Pick one and never mix.
Missing a choice in the transition. The “skip” / “do nothing” choice is the most-often-forgotten. Without it, you over-constrain the answer.
Wrong direction in 1D-collapsed knapsack. Left-to-right (unbounded) vs right-to-left (0/1). Silently flipping turns one problem into the other.
Counting orderings instead of combinations. In coin change variants, the loop nesting (coins outer vs sums outer) determines combinations vs permutations.
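Not handling unreachable states. If INF is the largest representable integer (e.g., INT_MAX), then dp[w - c] + 1 overflows to a negative number in fixed-width integer types, and min() happily selects it. Use an INF with headroom (for example 10^9 in 32-bit code, well above any real answer) and guard explicitly with if dp[w-c] == INF: continue. In Python, float('inf') + 1 is still inf, so this overflow cannot happen there.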
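Recursion stack overflow in Python once recursion depth exceeds the default limit (1000). Convert to iterative, or call sys.setrecursionlimit(10**6); a raised limit can still crash the interpreter if the underlying C stack runs out, so iterative is the safer fix.
Memoizing on mutable arguments. lru_cache requires hashable arguments; convert lists to tuples, sets to frozensets, and dicts to a frozenset or sorted tuple of their items.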
Wrong perspective flip in game DP. +dp[...] instead of -dp[...]. Both players appear to cooperate in your model.
Including or excluding the boundary of the table inconsistently. Off-by-one in iterators, inclusive/exclusive bounds.
Time / space estimate ignoring constants. “O(N · M) at N = M = 10^4” is 10^8 — TLE in Python, fine in C++. State the constant honestly.
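The loop-nesting bug above is easiest to see on coin change. A minimal sketch with two hypothetical helpers: coin_combinations puts coins in the outer loop (the LC 518 count, each multiset once), and coin_permutations puts sums in the outer loop (the LC 377 count, each ordered sequence once).

```python
from typing import List


def coin_combinations(coins: List[int], amount: int) -> int:
    """Number of multisets of coins summing to amount (order ignored)."""
    dp = [0] * (amount + 1)
    dp[0] = 1  # one way to make 0: use nothing
    for c in coins:  # coins outer: each coin is introduced once, in a fixed order
        for s in range(c, amount + 1):
            dp[s] += dp[s - c]
    return dp[amount]


def coin_permutations(coins: List[int], amount: int) -> int:
    """Number of ordered sequences of coins summing to amount."""
    dp = [0] * (amount + 1)
    dp[0] = 1
    for s in range(1, amount + 1):  # sums outer: any coin may be the last one
        for c in coins:
            if c <= s:
                dp[s] += dp[s - c]
    return dp[amount]
```

With coins [1, 2, 3] and amount 4, the first returns 4 (1+1+1+1, 1+1+2, 2+2, 1+3) and the second returns 7.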

Mastery Checklist

Before exiting this phase, verify all of these:

You can derive a state from a recursive brute force in <3 minutes for any DP problem.
You can write the recurrence (transition) in <2 minutes once the state is fixed.
You execute the brute → memo → tabulated → space-optimized progression on every DP problem in this phase, without skipping stages.
You can write tabulated 1D DP (house robber, climbing stairs) in <5 minutes from a blank screen.
You can write tabulated 2D DP (unique paths, edit distance) in <8 minutes from a blank screen.
You can space-optimize 2D DP to O(M) on demand, including the right-to-left collapse trick for 0/1 knapsack.
You can implement LIS at O(N²) and at O(N log N) in <15 minutes total.
You can implement edit distance with full progression in <25 minutes.
You can implement house robber III (tree DP) with the (rob, skip) tuple pattern in <15 minutes.
You can implement burst balloons (interval DP) with the length-ascending iteration in <25 minutes.
You can implement TSP-style bitmask DP (dp[mask][last]) in <30 minutes.
You can articulate why iterating for j: for i: in 2D DP can produce garbage — i.e., the topological-order argument — in <30 seconds.
You can articulate why 0/1 knapsack iterates w right-to-left and unbounded iterates left-to-right — in <30 seconds.
You can articulate the perspective-flip in game DP — in <30 seconds.

Exit Criteria

You may move to Phase 6 (Greedy and Mathematical Thinking) when all of the following are true:

You have completed all ten labs in this phase, with each lab’s mastery criteria checked off.
You have solved at least 50 unaided DP problems from LeetCode (mix of Medium, Medium-Hard, Hard) and reviewed each via REVIEW_TEMPLATE.md.
Your unaided success rate on Medium-Hard DP problems is ≥ 65%.
In a mock interview (phase-11-mock-interviews/), you correctly identify the DP variant within 2 minutes for at least 7 of 10 DP problems and produce the recurrence within 4 minutes for at least 6 of 10.
You execute the brute → memo → tabulated → space-optimized progression on every DP problem in mocks, even when the interviewer doesn’t ask for all four stages — this is the single discipline of this phase, and skipping it is a phase-failure.

If any of these fails, do another 20–30 DP problems before moving on. Skipping this gate calcifies bad habits that destroy you in Phase 7 (competitive programming) where DP shows up at every turn.

Labs

Hands-on practice. Each lab follows the strict 22-section format and demonstrates the four-stage progression in detail.


Lab 01 — 1D DP Foundations (House Robber)

Goal

Implement House Robber (LC 198) four times — brute recursion, memoized, tabulated, and space-optimized — to internalize the brute → memo → tabulated → space-optimized progression that this entire phase is built around. After this lab you should be able to recognize a 1D DP problem in <60 seconds, derive the state and recurrence in <90 seconds, and produce the O(1)-space final solution from a blank screen in under 5 minutes.

Background Concepts

A 1D DP has state dp[i] indexed by a single integer: the prefix length, the position, or the day. The recurrence reads only O(1) previous values, which is what makes the rolling (O(1)-space) trick work. House Robber is the canonical example because it has exactly two choices per state (rob this house or skip it), each of which determines the next state cleanly. The prefix formulation dp[i] = max(dp[i-1], dp[i-2] + nums[i-1]) reads two previous values; the tabulated version is a direct loop; the space-optimized version keeps two scalars.

The four-stage progression is the discipline of this lab. Don’t skip stages. The interviewer at staff level routinely asks “show me the recursive version first” specifically to test whether you can derive the recurrence from a brute force. Candidates who memorized the iterative solution but never wrote the recursion fail this question.

Interview Context

House Robber is a top-30 phone-screen DP problem at Amazon, Google, Microsoft, and Meta. Its variants — House Robber II (circular), House Robber III (tree, see Lab 08) — extend it. Bombing this problem on a phone screen is a near-instant rejection at L4. The reason: it has the simplest possible state (a single integer) and the simplest possible recurrence (two-way choice). If you can’t do this one, you can’t do any DP.

Problem Statement

You are a robber planning to rob houses arranged in a line. Each house has a non-negative integer amount of cash, given by nums[i]. You cannot rob two adjacent houses (the alarm system links them). Return the maximum amount of cash you can rob.

Constraints

1 ≤ nums.length ≤ 100
0 ≤ nums[i] ≤ 400

Clarifying Questions

Are amounts non-negative? (Yes — given.)
Can nums be empty? (No, length ≥ 1 by constraint, but always confirm.)
Are houses arranged in a line or a circle? (Line for LC 198; LC 213 is the circular variant.)
Can two adjacent houses both be skipped? (Yes — skipping is always allowed.)
Must we rob at least one house? (No. Robbing nothing is allowed, and since amounts are non-negative the optimum is always ≥ 0.)

Examples

nums = [1, 2, 3, 1]            → 4   (rob houses 0 and 2: 1 + 3)
nums = [2, 7, 9, 3, 1]         → 12  (rob houses 0, 2, 4: 2 + 9 + 1)
nums = [2, 1, 1, 2]            → 4   (rob houses 0 and 3: 2 + 2)
nums = [5]                     → 5
nums = [0, 0, 0]               → 0

Initial Brute Force

At each house, two choices: rob it (and skip the next) or skip it. Recursively try both:

def rob_brute(nums):
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)

Brute Force Complexity

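Each call branches into 2 recursive calls, so the call tree has O(2^N) nodes; more tightly, Θ(φ^N) with φ ≈ 1.618, since f(i) calls f(i+1) and f(i+2) exactly like Fibonacci. At N=100 even φ^100 ≈ 7.9 × 10^20 calls is far beyond any time limit. Space is O(N) for the recursion stack.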

Optimization Path

The brute force is exponential because the same f(i) is recomputed exponentially many times. There are only N+2 distinct values of i (0 through N+1), so memoization collapses the work to O(N). From there, tabulation removes the recursion overhead. Finally, since the recurrence reads only dp[i-1] and dp[i-2], we keep two scalars instead of the full array: O(1) space.

Each stage strictly improves on the previous: brute → memo (cache; from O(2^N) to O(N) time), memo → tabulated (loop instead of recursion; same complexity, no stack overhead), tabulated → space-optimized (drop the array; from O(N) to O(1) space).

Final Expected Approach

Define dp[i] = maximum cash robbed from the first i houses. Recurrence:

dp[0] = 0                                         (no houses to rob)
dp[1] = nums[0]                                   (one house — rob it)
dp[i] = max(dp[i-1],          # skip house i-1
            dp[i-2] + nums[i-1])  # rob house i-1

Answer: dp[N]. Since the recurrence reads only dp[i-1] and dp[i-2], keep two scalars: prev2 and prev1.

Data Structures Used

A 1D array dp of size N+1 (tabulated).
Two scalars prev2, prev1 (space-optimized).
For brute / memo: function call stack and a memoization dict / lru_cache.

Correctness Argument

By induction on i. Base: dp[0] = 0 (correct — no houses). dp[1] = nums[0] (correct — one house, rob it). Inductive step: at step i, the optimal robbery either does or does not rob house i-1. If it does, the remaining is the optimal over the first i-2 houses (since we can’t rob i-1’s neighbors), giving dp[i-2] + nums[i-1]. If it does not, the remaining is the optimal over the first i-1 houses, giving dp[i-1]. Taking the max covers both cases — this exhausts the choice space, so the recurrence is correct.

Complexity

Stage	Time	Space
Brute force	O(2^N)	O(N) (stack)
Memoized	O(N)	O(N) (cache + stack)
Tabulated	O(N)	O(N)
Space-optimized	O(N)	O(1)

Implementation Requirements

All four stages are required.

# ---- Stage 1: Brute force ----
def rob_brute(nums):
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def rob_memo(nums):
    @lru_cache(None)
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)

# ---- Stage 3: Tabulated ----
def rob_tab(nums):
    n = len(nums)
    if n == 0: return 0
    dp = [0] * (n + 1)
    dp[1] = nums[0]
    for i in range(2, n + 1):
        dp[i] = max(dp[i-1], dp[i-2] + nums[i-1])
    return dp[n]

# ---- Stage 4: Space-optimized ----
def rob(nums):
    prev2, prev1 = 0, 0
    for x in nums:
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1

Tests

[] → 0 (defensive, even if constraint disallows).
[5] → 5.
[5, 1] → 5.
[1, 2, 3, 1] → 4.
[2, 7, 9, 3, 1] → 12.
[0, 0, 0, 0] → 0.
[400, 400, 400] → 800 (rob ends).
All four implementations should produce identical results — write a randomized stress comparator on nums of length 1..15 and check rob_brute == rob == rob_tab == rob_memo.
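A minimal sketch of that randomized stress comparator. It re-declares the four stages so the block runs on its own; stress_test, the trial count, the maximum length, and the seed are hypothetical choices.

```python
import random
from functools import lru_cache


def rob_brute(nums):
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)


def rob_memo(nums):
    @lru_cache(maxsize=None)
    def f(i):
        if i >= len(nums):
            return 0
        return max(f(i + 1), nums[i] + f(i + 2))
    return f(0)


def rob_tab(nums):
    n = len(nums)
    if n == 0:
        return 0
    dp = [0] * (n + 1)
    dp[1] = nums[0]
    for i in range(2, n + 1):
        dp[i] = max(dp[i - 1], dp[i - 2] + nums[i - 1])
    return dp[n]


def rob(nums):
    prev2, prev1 = 0, 0
    for x in nums:
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1


def stress_test(trials=500, max_len=15, seed=0):
    """Compare all four stages on random inputs; fail on the first mismatch."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        nums = [rng.randint(0, 400) for _ in range(rng.randint(1, max_len))]
        expected = rob_brute(nums)
        results = (rob_memo(nums), rob_tab(nums), rob(nums))
        assert results == (expected,) * 3, f"mismatch on {nums}: {expected} vs {results}"
    return True
```

Using the brute force as the oracle matters: it shares no base case or indexing with the other three stages, so a bug in one of them cannot hide behind an identical bug in the reference.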

Follow-up Questions

“What if houses are in a circle?” (LC 213, House Robber II) → Run the line algorithm twice, once excluding house 0 and once excluding house N-1, and take the max. Handle N = 1 separately (return nums[0]), since excluding either end would leave nothing.
“What if houses are nodes of a binary tree?” (LC 337 — House Robber III) → Tree DP with (rob, skip) tuple per node. See Lab 08.
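“Reconstruct which houses were robbed.” → Track back-pointers in the tabulated version, or walk dp backwards from i = N: if dp[i] == dp[i-1], house i-1 was skipped (step to i-1); otherwise it was robbed (step to i-2).
“What if any two robbed houses must be more than k positions apart (k = 1 is the original)?” → dp[i] = max(dp[i-1], dp[max(0, i-k-1)] + nums[i-1]).
“What if amounts can be negative?” → Same recurrence, but the base case must allow skipping: dp[1] = max(0, nums[0]) rather than nums[0]. The space-optimized version already handles this, because it starts from two zeros and the max then drops any negative house.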
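A minimal sketch of the circular and reconstruction follow-ups, assuming the LC 198/213 input format; rob_line, rob_circular, and rob_with_houses are hypothetical names. The reconstruction walks the tabulated dp backwards: a value equal to dp[i-1] means house i-1 can be skipped, otherwise it was robbed.

```python
def rob_line(nums):
    """Space-optimized House Robber on a line (LC 198)."""
    prev2, prev1 = 0, 0
    for x in nums:
        prev2, prev1 = prev1, max(prev1, prev2 + x)
    return prev1


def rob_circular(nums):
    """House Robber II (LC 213): the first and last houses are adjacent."""
    if len(nums) == 1:
        return nums[0]  # excluding either end would leave nothing
    # in any valid plan, house 0 or house N-1 is left unrobbed
    return max(rob_line(nums[1:]), rob_line(nums[:-1]))


def rob_with_houses(nums):
    """Return (best total, sorted indices of robbed houses) for the line version."""
    n = len(nums)
    dp = [0] * (n + 1)
    if n:
        dp[1] = nums[0]
    for i in range(2, n + 1):
        dp[i] = max(dp[i - 1], dp[i - 2] + nums[i - 1])
    robbed, i = [], n
    while i >= 1:
        if dp[i] == dp[i - 1]:
            i -= 1  # skipping house i-1 already achieves the optimum
        else:
            robbed.append(i - 1)  # house i-1 must have been robbed
            i -= 2  # its left neighbor is then skipped
    return dp[n], robbed[::-1]
```

rob_with_houses([2, 7, 9, 3, 1]) returns (12, [0, 2, 4]).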

Product Extension

Variations of this problem appear in real systems: scheduling non-conflicting jobs (interval scheduling with profit), selecting non-overlapping ad slots, and assigning tasks to time-slots with cooldown. The 1D DP framework generalizes when “no two adjacent” becomes “no two within window K” or “must wait at least T”.

Language/Runtime Follow-ups

Python: lru_cache makes memoization a one-line addition. At N>1000, default recursion limit overflows — bump with sys.setrecursionlimit(10**6) or use the tabulated version. The space-optimized version is idiomatic and fast.
Java: use int[] dp for tabulated; Arrays.fill(dp, -1) + recursion for memoization. The default thread stack is typically 512 KB to 1 MB depending on platform, so recursion on the order of 10^4 frames can throw StackOverflowError.
Go: tabulated is idiomatic; Go has no lru_cache so memoization needs a manual map[int]int or []int.
C++: tabulated with vector<int>; memoization with a vector<int> memo(N, -1) and a recursive helper.
JS/TS: same idiom as Python but no lru_cache — use Map for memoization.

Common Bugs

Returning dp[N-1] instead of dp[N] (or vice versa) — depends on whether dp[i] indexes “first i houses” or “ending at i”. Pick one convention and stick to it.
Initializing dp[0] = nums[0] and dp[1] = max(nums[0], nums[1]) — works, but only if you handle N=1 separately. The cleaner convention is dp[0]=0, dp[1]=nums[0].
Off-by-one in dp[i-2] + nums[i-1] vs dp[i-2] + nums[i-2]: depends on the index convention. Verify on [1, 2] (expected 2; the nums[i-2] version returns 1).
Forgetting that nums can be empty — guard with if not nums: return 0 even though constraints say N ≥ 1.
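Space-optimized version: updating prev2 and prev1 in the wrong order. Python's tuple assignment evaluates the whole right-hand side first, so prev2, prev1 = prev1, max(prev1, prev2 + x) is correct. In Java/C++, writing prev2 = prev1; prev1 = max(prev1, prev2 + x); reads the already-overwritten prev2; compute the new value into a temp first.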

Debugging Strategy

When the answer is wrong by a small amount: print the entire dp array for nums = [2, 7, 9, 3, 1] (expected dp = [0, 2, 7, 11, 11, 12]). If it differs, trace the iteration step where dp first deviates and inspect the recurrence at that index. When the answer is wildly wrong (negative, or much smaller): suspect index off-by-one or an if condition that’s flipped. When TLE: confirm you’re not running the brute force.

Mastery Criteria

Recognized House Robber as a 1D DP problem within 60 seconds.
Wrote the brute recursive formulation in <2 minutes from cold start.
Added @lru_cache to produce the memoized version in <30 seconds.
Wrote the tabulated version in <3 minutes from blank screen, passing all five example cases first try.
Wrote the space-optimized version in <2 minutes after the tabulated.
Stated O(N) time and O(1) space unprompted.
Articulated the inductive correctness argument in <30 seconds.
Solved LC 198 unaided in <8 minutes total (all four stages).
Solved LC 213 (House Robber II) unaided in <12 minutes by running the line algorithm twice.

Lab 02 — 2D DP (Unique Paths with Obstacles)

Goal

Solve Unique Paths II (LC 63) with the full brute → memo → tabulated → space-optimized progression. Internalize the canonical 2D DP loop structure (for i: for j:) and the rolling-row trick that reduces O(m · n) space to O(n). After this lab you should be able to write any grid-DP problem from a blank screen in <8 minutes and apply the rolling-row collapse on demand.

Background Concepts

A 2D DP has state dp[i][j] indexed by two integers. For grid problems, (i, j) is a cell, and the recurrence aggregates over the (at most) two predecessors (i-1, j) and (i, j-1). Because each row depends only on the previous row and on cells already computed in the current row, the table can be rolled into a single 1D array of length n: O(n) memory instead of O(m · n), with identical answers.

The grid-DP family is the cleanest 2D DP family because the dependency graph is trivially layered (row by row). It is the right place to learn the rolling-row mechanic before applying it to harder 2D DPs (knapsack, edit distance, LCS).

Interview Context

Unique Paths II is a top-50 Medium DP problem at Microsoft, Amazon, and Bloomberg. The non-obstacle variant (LC 62) shows up constantly at L3 phone screens. The obstacle variant adds a wrinkle: cells with grid[i][j] == 1 are blocked and contribute 0. Candidates who reach for the closed-form combinatorial answer C(m+n-2, m-1) get stuck the moment obstacles appear; DP is the general approach. Showing all four stages (brute, memo, tabulated, O(n)-space) signals senior fluency.

Problem Statement

Given an m × n grid obstacleGrid where each cell is either 0 (open) or 1 (obstacle), count the number of distinct paths from (0, 0) to (m-1, n-1). Movement is restricted to right or down by one cell. If the start or end is blocked, the answer is 0.

Constraints

1 ≤ m, n ≤ 100
obstacleGrid[i][j] is 0 or 1.
The result is guaranteed to fit in a 32-bit signed integer.

Clarifying Questions

Can the start or end be an obstacle? (Yes — answer is 0 if so.)
Are diagonal moves allowed? (No — only right and down.)
Are paths considered distinct if they share intermediate cells? (Yes — only the sequence of moves matters.)
Modular arithmetic required? (No — fits in int32.)
Is m=1, n=1 valid? (Yes; answer is 1 if open, 0 if blocked.)

Examples

[[0,0,0],
 [0,1,0],
 [0,0,0]]                   → 2

[[0,1],
 [0,0]]                     → 1

[[1]]                       → 0   (start blocked)

[[0]]                       → 1

Initial Brute Force

At each open cell, try moving right or down recursively:

def paths_brute(grid):
    m, n = len(grid), len(grid[0])
    def f(i, j):
        if i >= m or j >= n or grid[i][j] == 1:
            return 0
        if i == m - 1 and j == n - 1:
            return 1
        return f(i + 1, j) + f(i, j + 1)
    return f(0, 0)

Brute Force Complexity

Each call branches into two; depth is m + n - 2. Worst-case calls: 2^(m+n-2). At m=n=100, that’s 2^198 ≈ 4 × 10^59 — TLE. Space is O(m+n) for the recursion stack.

Optimization Path

The brute force recomputes f(i, j) exponentially. There are only m × n distinct (i, j) pairs, so memoization collapses time to O(m · n). Tabulation replaces recursion with a row-major loop. Since dp[i][j] reads only dp[i-1][j] and dp[i][j-1], the previous row plus the in-progress row are sufficient, so collapse to a single 1D array iterated left-to-right. There is no same-row conflict: when we reach column j, dp[j-1] has already been updated to the current row (the left neighbor we need), while dp[j] still holds the previous row's value (the cell above) until we overwrite it.

Final Expected Approach

Define dp[i][j] = number of paths from (0, 0) to (i, j). Recurrence:

dp[0][0] = 1 if grid[0][0] == 0 else 0
dp[i][j] = 0                                if grid[i][j] == 1
         = dp[i-1][j] + dp[i][j-1]          otherwise (treat out-of-bounds as 0)

Roll to 1D: dp[j] += dp[j-1] for each row, with dp[j] = 0 if blocked.

Data Structures Used

2D array dp of size m × n (tabulated).
1D array dp of size n (rolled).
For brute / memo: recursion stack + lru_cache.

Correctness Argument

Every path to (i, j) arrives via the cell above or the cell to the left, so the number of paths to (i, j) is the sum of the paths to those two predecessors, where an out-of-bounds or blocked predecessor contributes 0. The sum is exact because the two sets of paths are disjoint (their last moves differ) and together exhaust all paths. Blocked cells get 0 directly. Base: dp[0][0] = 1 if open, else 0. Induction over the row-major order, which is a topological order of the grid DAG, proves correctness for all cells.

Complexity

Stage	Time	Space
Brute force	O(2^(m+n))	O(m+n)
Memoized	O(m · n)	O(m · n)
Tabulated	O(m · n)	O(m · n)
Space-optimized	O(m · n)	O(n)

Implementation Requirements

All four stages.

# ---- Stage 1: Brute force ----
def paths_brute(grid):
    m, n = len(grid), len(grid[0])
    def f(i, j):
        if i >= m or j >= n or grid[i][j] == 1:
            return 0
        if i == m - 1 and j == n - 1:
            return 1
        return f(i + 1, j) + f(i, j + 1)
    return f(0, 0)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def paths_memo(grid):
    m, n = len(grid), len(grid[0])
    @lru_cache(None)
    def f(i, j):
        if i >= m or j >= n or grid[i][j] == 1: return 0
        if i == m - 1 and j == n - 1: return 1
        return f(i + 1, j) + f(i, j + 1)
    return f(0, 0)

# ---- Stage 3: Tabulated ----
def paths_tab(grid):
    m, n = len(grid), len(grid[0])
    if grid[0][0] == 1 or grid[m-1][n-1] == 1: return 0
    dp = [[0] * n for _ in range(m)]
    dp[0][0] = 1
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:
                dp[i][j] = 0
                continue
            if i > 0: dp[i][j] += dp[i-1][j]
            if j > 0: dp[i][j] += dp[i][j-1]
    return dp[m-1][n-1]

# ---- Stage 4: Space-optimized (1D rolled) ----
def uniquePathsWithObstacles(grid):
    m, n = len(grid), len(grid[0])
    if grid[0][0] == 1: return 0
    dp = [0] * n
    dp[0] = 1
    for i in range(m):
        for j in range(n):
            if grid[i][j] == 1:
                dp[j] = 0
            elif j > 0:
                dp[j] += dp[j-1]
    return dp[n-1]

Tests

[[0,0,0],[0,1,0],[0,0,0]] → 2.
[[0,1],[0,0]] → 1.
[[1]] → 0.
[[0]] → 1.
[[0,0],[1,1],[0,0]] → 0 (no path past the blocking row).
[[0,0,0,0,0]] → 1.
[[0],[0],[0]] → 1.
m=n=100 with random 10% obstacles — performance test.

Follow-up Questions

“What if there are diagonal moves?” → dp[i][j] += dp[i-1][j-1] as a third predecessor.
“Each cell has a cost; minimize total path cost.” → Min-path-sum (LC 64); min instead of +.
“K obstacles can be removed.” → 3D DP dp[i][j][k] = paths to (i, j) having removed k obstacles.
“Reconstruct one valid path.” → Backtrack through dp from the target; at each cell pick a predecessor with non-zero contribution.
“Grid is enormous (m=n=10^9) but obstacles are sparse.” → Combinatorial answer (C(m+n-2, n-1)) minus inclusion-exclusion over obstacles. Out of scope here.
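For the min-path-sum follow-up (LC 64), a minimal sketch of the same rolled 1D array with min in place of +; it assumes a non-empty grid of non-negative costs, and the function name is hypothetical. As in the counting version, dp[j] holds the row above until it is overwritten, and dp[j-1] already holds the current row.

```python
from typing import List


def min_path_sum(grid: List[List[int]]) -> int:
    """Minimum total cost of a right/down path from top-left to bottom-right."""
    m, n = len(grid), len(grid[0])
    INF = float("inf")
    dp = [INF] * n  # no "row above" row 0, except
    dp[0] = 0       # a virtual zero-cost cell above the start
    for i in range(m):
        for j in range(n):
            left = dp[j - 1] if j > 0 else INF  # already the current row
            dp[j] = grid[i][j] + min(dp[j], left)  # dp[j] is still the row above
    return dp[n - 1]
```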

Product Extension

Routing on a city grid with road closures, robot path planning with obstacles, dependency-graph traversal with disabled edges. Grid DP is a special case of DP on a DAG: the grid is simply a DAG whose row-major order is a topological order, so the same recurrence pattern works on any DAG.

Language/Runtime Follow-ups

Python: [[0]*n for _ in range(m)] allocates correctly; [[0]*n]*m shares row references — a classic bug. Use a comprehension.
Java: int[][] dp = new int[m][n]; zero-initializes by default.
Go: pre-allocate the slice-of-slices explicitly.
C++: vector<vector<int>> dp(m, vector<int>(n, 0));.
JS/TS: Array.from({length: m}, () => new Array(n).fill(0)) to avoid the shared-reference trap.

Common Bugs

Shared row references in Python: dp = [[0]*n]*m makes all rows alias the same list. Use a comprehension.
Forgetting to check grid[0][0] == 1: if the start is blocked, the answer is 0, but dp[0][0] = 1 would propagate non-zero counts through the grid.
Using if i > 0 and j > 0 instead of two separate ifs: cells in row 0 and column 0 then receive nothing from their single valid predecessor and stay 0.
Iterating columns outer, rows inner: works for the 2D table, since dp[i][j] only reads upward and leftward, but breaks the row-rolled 1D version (you would have to roll by column instead).
Rolled 1D version: forgetting to set dp[j] = 0 on obstacle — old non-zero value persists from the previous row.

Debugging Strategy

Print the full dp table for the 3×3 obstacle example. Expected:

1 1 1
1 0 1
1 1 2

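If yours diverges in row 0 or column 0, suspect the boundary handling (for example, the combined if i > 0 and j > 0 bug). If it first diverges at row 1, the obstacle row, suspect the obstacle handling, including a missing dp[j] = 0 reset in the rolled version. For the rolled-1D version, print dp after each row.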

Mastery Criteria

Recognized this as a 2D grid DP within 60 seconds.
Wrote the brute recursion in <2 minutes.
Wrote the tabulated 2D version in <5 minutes from blank screen.
Performed the rolling-row collapse to 1D in <2 minutes from the tabulated version.
Stated O(m·n) time and O(n) space for the final solution unprompted.
Articulated why the rolled-1D version iterates left-to-right (dp[j-1] must already hold the current row's value, while dp[j] still holds the row above).
Solved LC 62 (no obstacles) in <5 minutes.
Solved LC 63 unaided in <12 minutes total.
Solved LC 64 (min path sum) in <8 minutes by changing + to min.

Lab 03 — 0/1 Knapsack (Partition Equal Subset Sum)

Goal

Solve Partition Equal Subset Sum (LC 416) by reducing it to 0/1 knapsack. Internalize the right-to-left iteration that makes the 1D-collapsed knapsack correct, and articulate why left-to-right iteration would silently turn it into unbounded knapsack. After this lab you should recognize any subset-sum / partition / target-sum / select-with-budget problem as 0/1 knapsack within 90 seconds.

Background Concepts

0/1 knapsack: N items each with weight w_i and value v_i; pick a subset with total weight ≤ W maximizing total value. The 2D DP has state dp[i][w] = max value using first i items with capacity w. The 1D-collapsed version uses dp[w] and iterates w from W down to w_i — the right-to-left iteration is what prevents an item from being reused within the same outer iteration.

Subset sum is 0/1 knapsack with v_i = w_i and a boolean dp instead of an integer-valued one. Partition equal subset sum reduces to “is there a subset summing to total / 2?”; if total is odd, return false immediately.

Interview Context

Partition Equal Subset Sum is a top-25 Medium DP problem at Amazon and Microsoft. The 0/1 knapsack pattern shows up in disguise constantly: target sum (LC 494), ones and zeroes (LC 474), last stone weight II (LC 1049), tallest billboard (LC 956). Recognizing the reduction is half the battle. The other half is the right-to-left iteration trick — getting that wrong is one of the most common DP bugs across the entire interview corpus.

Problem Statement

Given a non-empty array nums of positive integers, determine whether it can be partitioned into two subsets with equal sums.

Constraints

1 ≤ nums.length ≤ 200
1 ≤ nums[i] ≤ 100

So the maximum total sum is 200 × 100 = 20,000, and the target is at most 10,000. The 2D DP has 200 × 10001 = 2 × 10^6 cells — comfortable.

Clarifying Questions

Are elements positive? (Yes — given.)
Must the partition use all elements? (Yes — that’s what “partition” means.)
Is the empty subset allowed on either side? (In principle yes, but since every nums[i] ≥ 1, an empty side sums to 0 while the other side is positive, so it never gives a valid partition.)
Are duplicates allowed? (Yes — they’re treated as separate items.)
Return value: bool (true / false).

Examples

[1, 5, 11, 5]            → true   (1+5+5 == 11)
[1, 2, 3, 5]             → false  (total=11, odd)
[1, 2, 5]                → false  (total=8, target=4, no subset sums to 4)
[2, 2, 1, 1]             → true   (2+1 == 2+1)
[100]                    → false  (total=100, target=50, no subset)

Initial Brute Force

For each element, recurse on “include it” and “skip it”:

def can_partition_brute(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    def f(i, remain):
        if remain == 0: return True
        if i == len(nums) or remain < 0: return False
        return f(i + 1, remain - nums[i]) or f(i + 1, remain)
    return f(0, target)

Brute Force Complexity

O(2^N) time, O(N) stack. At N=200, 2^200 — completely infeasible.

Optimization Path

There are only N × (target + 1) distinct (i, remain) pairs, so memoization gives O(N · target) time and space. Tabulation replaces recursion with a 2D loop. Since dp[i][w] only reads dp[i-1][...], roll to 1D dp[w] — but iterate w right-to-left so that each item is considered at most once per outer iteration.

The right-to-left direction is the defining trick of 0/1 knapsack. If we iterate left-to-right, then dp[w - w_i] may have already been updated to include item i from the current outer iteration; we’d then re-include item i, turning the algorithm into unbounded knapsack.
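A minimal sketch that makes the direction bug visible; both function names are hypothetical. The only difference is the inner loop direction, and on nums = [3], target = 6 the left-to-right version reuses the single 3 twice.

```python
from typing import List


def subset_sum_01(nums: List[int], target: int) -> bool:
    """0/1 subset sum: each item used at most once."""
    dp = [False] * (target + 1)
    dp[0] = True  # the empty subset sums to 0
    for x in nums:
        # right-to-left: dp[w - x] (with w - x < w) still holds the pre-x value
        for w in range(target, x - 1, -1):
            dp[w] = dp[w] or dp[w - x]
    return dp[target]


def subset_sum_left_to_right(nums: List[int], target: int) -> bool:
    """Same loop run left-to-right: silently becomes unbounded (items reusable)."""
    dp = [False] * (target + 1)
    dp[0] = True
    for x in nums:
        # left-to-right: dp[w - x] may already include x from this same pass
        for w in range(x, target + 1):
            dp[w] = dp[w] or dp[w - x]
    return dp[target]
```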

Final Expected Approach

Reduce to subset sum: target = total / 2 (or return false if total is odd).

dp[w] = True  if some subset sums to exactly w, considering items processed so far.
dp[0] = True (empty subset sums to 0).
For each num x in nums:
    for w in range(target, x - 1, -1):
        dp[w] = dp[w] or dp[w - x]
Answer: dp[target]

Data Structures Used

2D dp[N+1][target+1] boolean array (tabulated).
1D dp[target+1] boolean array (space-optimized).
For brute / memo: recursion + lru_cache.

Correctness Argument

Inductive on items processed. dp[w] = True iff some subset of items processed so far sums to w. Base: dp[0] = True (empty subset). Inductive step: when we process item x, the new dp[w] is True iff (a) it was True before (subset not using x sums to w), OR (b) dp[w - x] was True before processing x (subset summing to w - x plus item x). The right-to-left iteration ensures we read the previous dp[w - x], not the in-iteration one. Termination: we want dp[target] after all items are processed.
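Cases (a) and (b) can be read backwards over the 2D table to recover a witness subset, which answers the "return one valid subset" follow-up. A minimal sketch, assuming the Stage 3 table; the name partition_subset is hypothetical. At row i, if w was already reachable without item i-1 we skip it; otherwise case (b) must have produced dp[i][w], so the item belongs to the subset.

```python
def partition_subset(nums):
    """Return one subset of nums summing to sum(nums) / 2, or None if none exists."""
    total = sum(nums)
    if total % 2:
        return None
    target = total // 2
    n = len(nums)
    # dp[i][w]: some subset of the first i items sums to w (same table as Stage 3)
    dp = [[False] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = True
    for i in range(1, n + 1):
        x = nums[i - 1]
        for w in range(1, target + 1):
            dp[i][w] = dp[i - 1][w] or (w >= x and dp[i - 1][w - x])
    if not dp[n][target]:
        return None
    subset, w = [], target
    for i in range(n, 0, -1):
        if not dp[i - 1][w]:  # case (a) fails, so case (b) produced dp[i][w]
            subset.append(nums[i - 1])
            w -= nums[i - 1]
    return subset[::-1]


print(partition_subset([1, 5, 11, 5]))  # [11]
```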

Complexity

Stage	Time	Space
Brute force	O(2^N)	O(N)
Memoized	O(N · target)	O(N · target)
Tabulated	O(N · target)	O(N · target)
Space-optimized	O(N · target)	O(target)

For LC 416: N≤200, target≤10000, so ~2×10^6 ops — fast.

Implementation Requirements

All four stages.

# ---- Stage 1: Brute force ----
def can_partition_brute(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    def f(i, remain):
        if remain == 0: return True
        if i == len(nums) or remain < 0: return False
        return f(i + 1, remain - nums[i]) or f(i + 1, remain)
    return f(0, target)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def can_partition_memo(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    @lru_cache(None)
    def f(i, remain):
        if remain == 0: return True
        if i == len(nums) or remain < 0: return False
        return f(i + 1, remain - nums[i]) or f(i + 1, remain)
    return f(0, target)

# ---- Stage 3: Tabulated 2D ----
def can_partition_tab(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    n = len(nums)
    dp = [[False] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = True
    for i in range(1, n + 1):
        for w in range(1, target + 1):
            dp[i][w] = dp[i-1][w]
            if w >= nums[i-1]:
                dp[i][w] = dp[i][w] or dp[i-1][w - nums[i-1]]
    return dp[n][target]

# ---- Stage 4: Space-optimized 1D ----
def canPartition(nums):
    total = sum(nums)
    if total % 2: return False
    target = total // 2
    dp = [False] * (target + 1)
    dp[0] = True
    for x in nums:
        for w in range(target, x - 1, -1):  # RIGHT-TO-LEFT
            dp[w] = dp[w] or dp[w - x]
    return dp[target]

Tests

[1, 5, 11, 5] → True.
[1, 2, 3, 5] → False (odd total).
[1, 2, 5] → False (no valid subset).
[2, 2, 1, 1] → True.
[100] → False.
[1, 1] → True.
N=200, all nums[i]=1 → True (target=100; pick 100 of them).
All four implementations should produce identical bool results — randomized comparator on N≤15.

Follow-up Questions

“Return one valid subset, not just yes/no.” → Track parent pointers in the 2D DP; reconstruct by walking backwards through (i, w).
“Partition into K equal subsets.” (LC 698) → 0/1-knapsack-style DP becomes intractable; use bitmask DP or backtracking with pruning.
“Target sum: how many ways to assign +/- to each number to total exactly T?” (LC 494) → Reduce to subset sum: count subsets summing to (total + T) / 2.
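“Minimum subset sum difference.” (LC 1049, Last Stone Weight II, reduces to this) → Find the largest reachable s ≤ total/2; the answer is total - 2s.
“What if nums[i] can be huge (up to 10^9)?” → The O(N · target) table no longer fits. Dividing every value by their GCD helps only when they share a common factor; for small N (roughly 40 or fewer), meet-in-the-middle over the subset sums of each half works. In general partition is NP-complete, and the DP above is only pseudo-polynomial.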
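The LC 494 and LC 1049 reductions in the list above can be sketched as follows, assuming those problems' input conventions (non-negative integers; the LC 494 target may be negative). Both function names are hypothetical. Each reuses the right-to-left 0/1 loop; the counting version replaces `or` with `+=`.

```python
def find_target_sum_ways(nums, T):
    """Count +/- sign assignments to nums whose signed sum equals T."""
    total = sum(nums)
    # P - N = T and P + N = total, so the '+' subset must sum to P = (total + T) / 2.
    if abs(T) > total or (total + T) % 2:
        return 0
    target = (total + T) // 2
    dp = [0] * (target + 1)
    dp[0] = 1                                # one way to pick the empty subset
    for x in nums:
        for w in range(target, x - 1, -1):  # right-to-left: each number used once
            dp[w] += dp[w - x]
    return dp[target]


def min_subset_difference(nums):
    """Smallest |sum(A) - sum(B)| over all splits of nums into A and B."""
    total = sum(nums)
    half = total // 2
    dp = [False] * (half + 1)
    dp[0] = True
    for x in nums:
        for w in range(half, x - 1, -1):
            dp[w] = dp[w] or dp[w - x]
    s = max(w for w in range(half + 1) if dp[w])  # largest reachable s <= total / 2
    return total - 2 * s


print(find_target_sum_ways([1, 1, 1, 1, 1], 3))     # 5
print(min_subset_difference([2, 7, 4, 1, 8, 1]))   # 1
```

A zero in nums doubles the count, since +0 and -0 are different assignments; the loop handles this because range(target, -1, -1) includes w = 0.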

Product Extension

Resource allocation (split a budget across two teams equally), load balancing (split a workload across two workers), and “is there a subset with this exact total?” appear in billing systems, accounting reconciliation, and cluster-resource schedulers.

Language/Runtime Follow-ups

Python: dp = [False] * (target + 1) is fine; the inner loop’s range(target, x - 1, -1) is the canonical right-to-left form.
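Java: boolean[] dp = new boolean[target + 1]; defaults to false. java.util.BitSet has no shift operation, so the word-parallel trick needs a java.math.BigInteger mask (dp = dp.or(dp.shiftLeft(x))) or a hand-rolled shift over a long[]; either does each row update in roughly O(target / 64) word operations.
Go: make([]bool, target+1) and a manual reversed loop.
C++: vector<bool> is bit-packed; std::bitset<10001> is faster but fixed-size, and dp |= dp << x performs the whole row update in O(target / 64).
JS/TS: new Uint8Array(target + 1) is zero-initialized, which avoids the trap where new Array(n) holds empty slots that read as undefined.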
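Python has no fixed-size bitset, but its arbitrary-precision integers support the same shift-and-OR update. A minimal sketch (the name can_partition_bits is hypothetical): bit w of bits is set iff some subset of the items processed so far sums to w.

```python
def can_partition_bits(nums):
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    bits = 1  # only bit 0 set: the empty subset sums to 0
    for x in nums:
        # bits << x is computed from the old value, so each item is added at most once
        bits |= bits << x
    return bool((bits >> target) & 1)


print(can_partition_bits([1, 5, 11, 5]))  # True
```

The integer grows to about total bits, and CPython processes it in machine-word-sized chunks, so each item costs O(total) bits of work rather than O(total) Python-level loop iterations.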

Common Bugs

Iterating w left-to-right in the 1D version — turns 0/1 into unbounded; spurious True answers.
Forgetting the odd-total short circuit: total // 2 silently rounds down, so [1, 2] (total 3, target 1) would wrongly return True.
Using dp[w] = dp[w-x] instead of dp[w] or dp[w-x] — wipes out previously-set True values.
Off-by-one in range(target, x - 1, -1) — should include w == x (since dp[x] = dp[x] or dp[0] = True for any x ≤ target).
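Seeding the base case incorrectly: the 1D version needs dp[0] = True once, before any item is processed; the 2D version needs dp[i][0] = True in every row, because its inner loop starts at w = 1 and would otherwise never let a later item start a subset on its own.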

Debugging Strategy

For [1, 5, 11, 5]: after processing [1], dp = [T, T, F, F, F, F, F, F, F, F, F, F] (indexes 0..11). After [1, 5]: dp[6] = T (1+5). After [1, 5, 11]: dp[11] = T. Print dp after each item; if dp[target] becomes True earlier than expected, you’re allowing item-reuse (left-to-right bug).

Mastery Criteria

Recognized partition-equal-subset as 0/1 knapsack within 90 seconds.
Wrote the reduction to subset sum (target = total / 2) before any code.
Wrote the brute recursion in <2 minutes.
Wrote the 2D tabulated version in <5 minutes.
Performed the 1D collapse with right-to-left iteration in <2 minutes.
Articulated why left-to-right would be wrong (item reuse → unbounded knapsack) in <30 seconds.
Stated O(N · target) time and O(target) space unprompted.
Solved LC 416 unaided in <12 minutes (all four stages).
Solved LC 494 (Target Sum) in <12 minutes via the reduction.

Lab 04 — Unbounded Knapsack (Coin Change)

Goal

Solve Coin Change (LC 322 — minimum coins) and Coin Change II (LC 518 — count combinations) with the full four-stage progression. Internalize the left-to-right iteration that makes 1D-collapsed unbounded knapsack correct, and the loop-nesting trick that distinguishes counting combinations from counting permutations. After this lab you can solve any “unlimited supply” knapsack in <10 minutes from cold start.

Background Concepts

Unbounded knapsack: items can be reused any number of times. The 1D-collapsed DP iterates w left-to-right, the opposite of 0/1 knapsack. That single direction-change is the entire mechanical difference. The semantic difference: when we read dp[w - c] in the left-to-right pass, it has already been updated this round to include coin c zero or more times — so this round’s update can stack another c on top, achieving “use c multiple times”.

Coin Change has two flavors. LC 322 asks for the minimum number of coins to reach amount; the recurrence is dp[w] = min(dp[w], dp[w - c] + 1), initialized to +∞ with dp[0] = 0. LC 518 asks for the count of combinations summing to amount; the recurrence is dp[w] += dp[w - c], initialized to dp[0] = 1. The combinations-vs-permutations trap: with coins outer, sums inner, you count combinations (each combination of coins is counted once regardless of order); with sums outer, coins inner, you count permutations (different orderings of the same coins count separately).
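A side-by-side sketch of the two nestings (both function names are hypothetical). The permutations variant is the recurrence behind LC 377 (Combination Sum IV), which counts ordered sequences.

```python
def count_combinations(coins, amount):
    """Coins outer: each multiset of coins is counted once (LC 518)."""
    dp = [0] * (amount + 1)
    dp[0] = 1
    for c in coins:
        for w in range(c, amount + 1):  # left-to-right: coin c may repeat
            dp[w] += dp[w - c]
    return dp[amount]


def count_permutations(coins, amount):
    """Sums outer: each ordering of the same coins is counted separately."""
    dp = [0] * (amount + 1)
    dp[0] = 1
    for w in range(1, amount + 1):
        for c in coins:                 # any coin may be the last one placed
            if c <= w:
                dp[w] += dp[w - c]
    return dp[amount]


print(count_combinations([1, 2], 3))  # 2: {1,1,1}, {1,2}
print(count_permutations([1, 2], 3))  # 3: 1+1+1, 1+2, 2+1
```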

Interview Context

Coin Change (LC 322) is a top-15 phone-screen DP problem at every major company. Coin Change II is asked roughly half as often but tests the deeper combinations-vs-permutations distinction. Bombing LC 322 at L4+ is a near-instant rejection. Senior interviewers often follow up with LC 518 specifically to test whether you understand why the loop nesting matters — the hand-wavy candidate is filtered by this question.

Problem Statement

LC 322 (minimum coins): Given coins of distinct denominations and an integer amount, return the fewest number of coins needed to make up amount. Return -1 if unreachable. Each coin denomination has unlimited supply.

LC 518 (count combinations): Given coins and amount, return the number of distinct combinations that sum to amount.

Constraints

1 ≤ coins.length ≤ 12 (LC 322) / 300 (LC 518).
1 ≤ coins[i] ≤ 2^31 − 1 (LC 322) / 5000 (LC 518).
0 ≤ amount ≤ 10^4 (LC 322) / 5000 (LC 518).
LC 518: answer fits in a signed 32-bit integer.

Clarifying Questions

Are coins distinct? (Yes — given.)
Can each coin be used multiple times? (Yes — unlimited supply; this is what makes it unbounded.)
Is amount=0 valid? (Yes; minimum coins = 0; combinations = 1 — the empty combination.)
LC 518: is [1, 2] different from [2, 1]? (No — combinations only, not permutations.)
Can a coin exceed amount? (Yes; it is simply unusable for that amount.)

Examples

LC 322:
coins=[1,2,5], amount=11   → 3   (5+5+1)
coins=[2], amount=3        → -1  (unreachable)
coins=[1], amount=0        → 0
coins=[1,2,5], amount=100  → 20  (twenty 5-coins)

LC 518:
coins=[1,2,5], amount=5    → 4   ([5], [2,2,1], [2,1,1,1], [1×5])
coins=[2], amount=3        → 0
coins=[10], amount=10      → 1

Initial Brute Force (LC 322)

At each step, try every coin:

def coinChange_brute(coins, amount):
    def f(remain):
        if remain == 0: return 0
        if remain < 0: return float('inf')
        best = float('inf')
        for c in coins:
            best = min(best, f(remain - c) + 1)
        return best
    ans = f(amount)
    return ans if ans != float('inf') else -1

Brute Force Complexity

Each call branches into len(coins) recursive calls; depth amount / min(coins). Worst case O(K^(amount)) — exponential. At K=12, amount=10^4, completely infeasible.

Optimization Path

There are only amount + 1 distinct values of remain, so memoization gives O(K · amount) time. Tabulation replaces recursion with a loop. The 1D version uses left-to-right iteration so each coin can be reused.

For LC 518 (counting), the loop nesting matters: coins outer counts combinations, sums outer counts permutations. LC 518 wants combinations.

Final Expected Approach

LC 322 (minimum):

dp[w] = min coins to make w. dp[0] = 0; dp[w > 0] = INF.
For each w in 1..amount:
    For each c in coins where c <= w:
        dp[w] = min(dp[w], dp[w - c] + 1)
Answer: dp[amount] if dp[amount] != INF else -1.

LC 518 (count combinations):

dp[w] = number of combinations summing to w. dp[0] = 1; rest 0.
For each c in coins:                           # COINS OUTER
    For each w in c..amount:                   # SUMS INNER, left-to-right
        dp[w] += dp[w - c]
Answer: dp[amount].

Data Structures Used

1D dp[amount+1] integer array.
For brute / memo: recursion + lru_cache.

Correctness Argument

LC 322: dp[w] = min coins to reach w, by induction on w. Base: dp[0] = 0. Inductive step: any optimal solution for w ends with some coin c, leaving w - c to be solved optimally — dp[w - c] + 1. Take the minimum over all coins. Unreachable states stay at INF, propagating correctly under min.
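Since an optimal solution for w ends with some coin c, recording the c that achieved the minimum lets us walk back from amount and recover one optimal multiset of coins. A minimal sketch; coin_change_with_coins and its choice array are hypothetical names.

```python
def coin_change_with_coins(coins, amount):
    """Return one list of coins of minimum length summing to amount, or None."""
    INF = amount + 1
    dp = [0] + [INF] * amount
    choice = [0] * (amount + 1)  # choice[w]: last coin of an optimal solution for w
    for w in range(1, amount + 1):
        for c in coins:
            if c <= w and dp[w - c] + 1 < dp[w]:
                dp[w] = dp[w - c] + 1
                choice[w] = c
    if dp[amount] == INF:
        return None
    out, w = [], amount
    while w > 0:  # peel off the recorded last coin until nothing remains
        out.append(choice[w])
        w -= choice[w]
    return out


print(coin_change_with_coins([1, 2, 5], 11))  # [1, 5, 5]
```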

LC 518: by the outer-coins loop, after processing coins c_1, ..., c_k, dp[w] counts combinations of those coins summing to w. Inductive step: when we process c_{k+1} with the inner left-to-right loop, the update dp[w] += dp[w - c_{k+1}] adds combinations that use at least one c_{k+1}. Because the inner loop is left-to-right, dp[w - c_{k+1}] already includes solutions using c_{k+1} zero or more times — so this update accounts for using c_{k+1} exactly 1, 2, 3, … times in turn. Each combination is counted exactly once because every combination has a latest coin index, and only the iteration on that coin index counts it. Outer-coins prevents reordering: [1, 2] and [2, 1] are not separately counted.

Complexity

Stage	Time	Space
Brute force	O(K^amount)	O(amount)
Memoized	O(K · amount)	O(amount)
Tabulated	O(K · amount)	O(amount)
Space-optimized	O(K · amount) (same as tabulated; already 1D)	O(amount)

Implementation Requirements

All four stages for LC 322; tabulated only for LC 518.

# ==== LC 322: Coin Change (minimum coins) ====

# ---- Stage 1: Brute force ----
def coinChange_brute(coins, amount):
    def f(remain):
        if remain == 0: return 0
        if remain < 0: return float('inf')
        return min((f(remain - c) + 1 for c in coins), default=float('inf'))
    ans = f(amount)
    return ans if ans != float('inf') else -1

# ---- Stage 2: Memoized ----
from functools import lru_cache
def coinChange_memo(coins, amount):
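    # Recursion depth can reach amount (e.g., coins=[1]); CPython's default limit of
    # 1000 frames is too low for amount = 10^4, so prefer the tabulated version there.
    @lru_cache(None)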
    def f(remain):
        if remain == 0: return 0
        if remain < 0: return float('inf')
        return min((f(remain - c) + 1 for c in coins), default=float('inf'))
    ans = f(amount)
    return ans if ans != float('inf') else -1

# ---- Stage 3+4: Tabulated 1D (already optimal space) ----
def coinChange(coins, amount):
    INF = amount + 1
    dp = [INF] * (amount + 1)
    dp[0] = 0
    for w in range(1, amount + 1):                  # SUMS OUTER, COINS INNER for min variant
        for c in coins:
            if c <= w:
                dp[w] = min(dp[w], dp[w - c] + 1)
    return dp[amount] if dp[amount] != INF else -1

# ==== LC 518: Coin Change II (count combinations) ====

def change(amount, coins):
    dp = [0] * (amount + 1)
    dp[0] = 1
    for c in coins:                                  # COINS OUTER
        for w in range(c, amount + 1):               # SUMS INNER, LEFT-TO-RIGHT
            dp[w] += dp[w - c]
    return dp[amount]

Tests

LC 322: coins=[1,2,5], amount=11 → 3.
LC 322: coins=[2], amount=3 → -1.
LC 322: coins=[1], amount=0 → 0.
LC 322: coins=[186, 419, 83, 408], amount=6249 → 20.
LC 518: coins=[1,2,5], amount=5 → 4.
LC 518: coins=[2], amount=3 → 0.
LC 518: coins=[10], amount=10 → 1.
LC 518: amount=0 → 1 (empty combination).
Compare: For LC 518 with sums-outer (the wrong way), coins=[1,2], amount=3 gives 3 (1+1+1, 1+2, 2+1) instead of 2 (1+1+1, 1+2).

Follow-up Questions

“Reconstruct one valid combination.” → Track which coin produced each dp[w]; backtrack from dp[amount].
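“What if coins are large (up to 10^9)?” → Large coin values alone are harmless: the table has amount + 1 entries, and coins with c > w are simply skipped. The real problem is a huge amount; then the table no longer fits, BFS over reachable remainders visits only reachable sums but is still O(amount × K) in the worst case, and beyond that you need coin-set-specific structure or number theory.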
“Constraint: at most K coins total.” → Add a dimension: dp[w][k] = min/count using ≤ k coins.
“All combinations summing to exactly amount, not just count.” → Backtracking; output is exponential in worst case.
“What if some coins have limited supply?” → Bounded knapsack; binary-decompose each coin’s count and reduce to 0/1.
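A minimal sketch of the binary decomposition for a minimum-coins variant with limited supply (the function name and the counts parameter are hypothetical). A count k is split into bundles of 1, 2, 4, ... copies plus a remainder; every number of copies from 0 to k is a sum of distinct bundles, so a right-to-left 0/1 pass over the bundles respects the limit.

```python
def min_coins_bounded(coins, counts, amount):
    """Fewest coins summing to amount when coins[i] may be used at most counts[i] times."""
    INF = amount + 1                  # any valid answer uses at most amount coins
    items = []                        # (value, coin_count) bundles for 0/1 knapsack
    for c, k in zip(coins, counts):
        size = 1
        while k > 0:
            take = min(size, k)       # bundle sizes 1, 2, 4, ..., then the remainder
            items.append((take * c, take))
            k -= take
            size *= 2
    dp = [0] + [INF] * amount
    for value, cnt in items:
        for w in range(amount, value - 1, -1):  # right-to-left: each bundle used once
            if dp[w - value] + cnt < dp[w]:
                dp[w] = dp[w - value] + cnt
    return dp[amount] if dp[amount] != INF else -1


print(min_coins_bounded([1, 5], [3, 1], 8))  # 4 (5 + 1 + 1 + 1)
```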

Product Extension

Cash-register optimization (which bills/coins to dispense for change), packet-payload composition (combining MTU-aware fragments), and currency-change problems in financial systems — all reduce to coin change variants.

Language/Runtime Follow-ups

Python: INF = amount + 1 works as a sentinel because any valid answer uses at most amount coins (every coin is worth at least 1), and it avoids float('inf') arithmetic.
Java: int[] dp = new int[amount+1]; Arrays.fill(dp, amount+1); dp[0]=0;.
Go: pre-fill via loop; no Arrays.fill shortcut.
C++: vector<int> dp(amount+1, amount+1); dp[0]=0;.
JS/TS: new Array(amount+1).fill(amount+1) then dp[0]=0.

Common Bugs

LC 322: either loop nesting is correct for the minimum variant, since a min does not care about order. The trap is copying LC 322's sums-outer nesting into LC 518, which then counts permutations.
LC 518: iterating sums outer, coins inner — counts permutations ([1,2] and [2,1] separately) instead of combinations.
Forgetting dp[0] = 1 in LC 518 — every count becomes 0.
Using float('inf') + 1 arithmetic in Python — works (inf + 1 == inf), but slower and obscures intent. Prefer INF = amount + 1.
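Forgetting the c <= w guard: dp[w - c] then uses a negative index, which in Python silently wraps to the end of dp instead of raising, so the bug yields wrong answers rather than a crash.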
Off-by-one in range(1, amount + 1) — must reach amount inclusive.

Debugging Strategy

For LC 322 with coins=[1,2,5], amount=11: after the loop, dp = [0,1,1,2,2,1,2,2,3,3,2,3]. Print dp and check dp[11] = 3. For LC 518 with coins=[1,2,5], amount=5: after processing coin 1, dp = [1,1,1,1,1,1]. After coin 2: dp = [1,1,2,2,3,3]. After coin 5: dp = [1,1,2,2,3,4]. Walking through this manually catches loop-nesting bugs.

Mastery Criteria

Recognized “unlimited coins” as unbounded knapsack within 60 seconds.
Wrote LC 322 brute recursion in <2 minutes.
Wrote LC 322 tabulated in <5 minutes.
Articulated why unbounded uses left-to-right iteration in <30 seconds.
Wrote LC 518 with the correct outer-coins loop in <5 minutes.
Articulated why coins-outer counts combinations and sums-outer counts permutations in <60 seconds.
Stated O(K · amount) time and O(amount) space.
Solved LC 322 unaided in <10 minutes (full progression).
Solved LC 518 unaided in <10 minutes.

Lab 05 — LIS (Longest Increasing Subsequence)

Goal

Solve LC 300 with two distinct algorithms: the canonical O(N²) DP and the patience-sort + binary-search O(N log N) variant. Internalize why both produce the same answer despite very different mechanics. After this lab you can produce both solutions from a blank screen in <15 minutes total and explain the equivalence on a whiteboard.

Background Concepts

The LIS problem is the canonical example of a problem with two equally-valid algorithmic angles. The O(N²) DP defines dp[i] = length of LIS ending at index i; the O(N log N) algorithm maintains an array tails where tails[k] = smallest tail of any increasing subsequence of length k+1. Both produce the same length; the binary-search version is faster but harder to prove correct.

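Patience sorting: imagine dealing the cards in order onto piles, placing each card on the leftmost pile whose top is ≥ the new card, or starting a new pile if none exists. Each pile is therefore non-increasing from bottom to top, so its top is its smallest card. The number of piles equals the LIS length: an increasing subsequence can take at most one card from each pile, and following back-pointers from a card on the last pile yields an increasing subsequence with one card per pile (a constructive instance of Mirsky's theorem, the dual of Dilworth's theorem). The tails array tracks the top of each pile.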

Interview Context

LIS is a top-20 Medium DP problem and shows up at Google, Bloomberg, and Microsoft regularly. The follow-up “can you do better than O(N²)?” is asked specifically to test whether you know patience sorting. Candidates who know only O(N²) are shipped to L4; candidates who can derive O(N log N) from scratch (or articulate it cleanly) are L5+ material. LIS is also the building block for LC 354 (Russian Doll Envelopes) and LC 673 (Number of LIS).

Problem Statement

Given an integer array nums, return the length of the longest strictly increasing subsequence.

Constraints

1 ≤ nums.length ≤ 2500 (canonical LeetCode constraint)
−10^4 ≤ nums[i] ≤ 10^4

Clarifying Questions

Strictly increasing or non-decreasing? (Strictly — nums[i] < nums[j].)
Subsequence or subarray? (Subsequence — non-contiguous selections allowed.)
Return the length or the actual sequence? (Length only, per problem.)
Are duplicates handled? (Yes; strict means duplicates can’t both be in the LIS.)
Is the empty subsequence allowed (length 0)? (Yes, but since nums.length ≥ 1, the answer is ≥ 1.)

Examples

[10, 9, 2, 5, 3, 7, 101, 18]    → 4   ([2, 3, 7, 101] or [2, 5, 7, 101])
[0, 1, 0, 3, 2, 3]              → 4   ([0, 1, 2, 3])
[7, 7, 7, 7]                    → 1
[1]                             → 1
[5, 4, 3, 2, 1]                 → 1

Initial Brute Force

For each index, recursively decide include or skip, tracking the previous chosen element to enforce strict-increasing:

def lengthOfLIS_brute(nums):
    def f(i, prev):
        if i == len(nums):
            return 0
        skip = f(i + 1, prev)
        take = 0
        if prev == -1 or nums[i] > nums[prev]:
            take = 1 + f(i + 1, i)
        return max(skip, take)
    return f(0, -1)

Brute Force Complexity

O(2^N) — each step has two choices.

Optimization Path

The state is (i, prev) where prev is the last chosen index (or -1 for none). There are O(N²) such states, so memoization gives O(N²) time and space. Tabulation: define dp[i] = length of LIS ending exactly at index i; recurrence reads only smaller j < i, so we don’t even need the prev dimension — dp[i] = 1 + max(dp[j] for j < i if nums[j] < nums[i]). Final answer is max(dp).

For O(N log N): maintain tails such that tails[k] is the smallest tail of any increasing subsequence (IS) of length k+1. For each nums[i], binary-search for the first element in tails ≥ nums[i]; if found, replace it; if not (nums[i] exceeds all), append.

Final Expected Approach

O(N²) DP: prefix DP indexed by ending position; recurrence iterates over all earlier indices.

O(N log N) patience sort: maintain tails as an increasing array; bisect_left(tails, nums[i]) gives the position to replace; if equal to len(tails), append.

The equivalence: each tails[k] corresponds to “the smallest endpoint of a length-(k+1) IS we’ve seen”. When we process nums[i], replacing tails[k] with nums[i] represents “we’ve found a length-(k+1) IS with smaller tail” — which can only help future extensions. The length of tails at the end is the LIS length.

Data Structures Used

1D dp of size N (O(N²) version).
1D tails array (O(N log N) version), Python’s bisect module.

Correctness Argument

O(N²): by induction. dp[i] = 1 + max(dp[j] : j < i, nums[j] < nums[i]), or 1 if no such j exists. Base: dp[0] = 1. Inductive step: an increasing subsequence ending at i is either nums[i] alone (length 1) or extends one ending at some j < i with nums[j] < nums[i], giving dp[j] + 1. The max over all valid j is the optimum. The answer is max_i dp[i].

O(N log N) (Patience sort): invariant — tails[k] is the smallest possible tail of any IS of length k+1 over nums[0..i]. When processing nums[i]: binary-search for the leftmost position k with tails[k] >= nums[i]. If k = len(tails), append (we’ve extended the longest IS by one). Otherwise, replace tails[k] with nums[i] (we’ve found a length-(k+1) IS with smaller tail; future extensions are now easier). The invariant is preserved at every step. The length of tails is the LIS length.
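tails alone does not store the subsequence, but the invariant extends naturally: remember which index holds each tails[k], and give every element a parent pointer to the tail it extended. A minimal sketch (lis_sequence, tails_idx, and parent are hypothetical names) that returns one LIS:

```python
from bisect import bisect_left


def lis_sequence(nums):
    """Return one longest strictly increasing subsequence of nums."""
    tails = []              # tails[k]: smallest tail value of any IS of length k+1
    tails_idx = []          # tails_idx[k]: index in nums of that tail
    parent = [-1] * len(nums)
    for i, x in enumerate(nums):
        k = bisect_left(tails, x)
        if k > 0:
            parent[i] = tails_idx[k - 1]  # x extends the best length-k IS
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
    out = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        out.append(nums[i])
        i = parent[i]
    return out[::-1]


print(lis_sequence([10, 9, 2, 5, 3, 7, 101, 18]))  # [2, 3, 7, 18]
```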

Complexity

Algorithm	Time	Space
Brute force	O(2^N)	O(N)
Memoized	O(N²)	O(N²)
Tabulated O(N²) DP	O(N²)	O(N)
Patience sort	O(N log N)	O(N)

Implementation Requirements

All four stages.

# ---- Stage 1: Brute force ----
def lengthOfLIS_brute(nums):
    def f(i, prev):
        if i == len(nums):
            return 0
        skip = f(i + 1, prev)
        take = 0
        if prev == -1 or nums[i] > nums[prev]:
            take = 1 + f(i + 1, i)
        return max(skip, take)
    return f(0, -1)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def lengthOfLIS_memo(nums):
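    # Recursion depth reaches N (2500 at the LeetCode limit), past CPython's default
    # recursion limit of 1000, and the cache can hold ~N² entries: use for small N only.
    @lru_cache(None)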
    def f(i, prev):
        if i == len(nums): return 0
        skip = f(i + 1, prev)
        take = 0
        if prev == -1 or nums[i] > nums[prev]:
            take = 1 + f(i + 1, i)
        return max(skip, take)
    return f(0, -1)

# ---- Stage 3: Tabulated O(N^2) ----
def lengthOfLIS_tab(nums):
    n = len(nums)
    if n == 0: return 0
    dp = [1] * n
    for i in range(1, n):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)

# ---- Stage 4: Patience sort O(N log N) ----
from bisect import bisect_left
def lengthOfLIS(nums):
    tails = []
    for x in nums:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

Tests

[10, 9, 2, 5, 3, 7, 101, 18] → 4.
[0, 1, 0, 3, 2, 3] → 4.
[7, 7, 7, 7] → 1.
[1] → 1.
[1, 2, 3, 4, 5] → 5 (already sorted).
[5, 4, 3, 2, 1] → 1 (decreasing).
N=2500 random — performance test for both algorithms.
Cross-check: random N≤15, the four implementations should agree.
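A minimal sketch of the randomized cross-check for the two polynomial algorithms (function names are hypothetical; the seed, trial count, and value range are arbitrary choices). A small value range forces duplicates, which is where bisect_left vs bisect_right mistakes show up.

```python
import random
from bisect import bisect_left


def lis_quadratic(nums):
    dp = [1] * len(nums)
    for i in range(len(nums)):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)


def lis_patience(nums):
    tails = []
    for x in nums:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def cross_check(trials=500, max_n=15, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, max_n)
        nums = [rng.randint(-5, 5) for _ in range(n)]  # small range forces duplicates
        a, b = lis_quadratic(nums), lis_patience(nums)
        assert a == b, (nums, a, b)
    return True


print(cross_check())  # True
```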

Follow-up Questions

“Return the actual LIS, not just the length.” → Track parent pointers in the O(N²) DP, or in O(N log N) keep alongside tails an array tails_idx of indices into nums and parent links.
“Number of distinct LIS’s of maximum length.” (LC 673) → Augment dp[i] with cnt[i] = number of LIS’s ending at i.
“Longest non-decreasing subsequence.” → bisect_right instead of bisect_left.
“2D version: stack envelopes (LC 354).” → Sort by width ascending and height descending (to break ties); run LIS on heights.
“Longest bitonic subsequence.” → Compute LIS forward and LIS backward; combine at each split point.
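A minimal sketch of the LC 354 reduction (the name max_envelopes is hypothetical). Sorting equal widths by descending height means two envelopes of the same width can never both appear in a strictly increasing run of heights, so the plain LIS on heights respects the strict width condition.

```python
from bisect import bisect_left


def max_envelopes(envelopes):
    # width ascending; equal widths by height descending
    envelopes = sorted(envelopes, key=lambda e: (e[0], -e[1]))
    tails = []
    for _, h in envelopes:
        k = bisect_left(tails, h)  # strict LIS on heights
        if k == len(tails):
            tails.append(h)
        else:
            tails[k] = h
    return len(tails)


print(max_envelopes([[5, 4], [6, 4], [6, 7], [2, 3]]))  # 3
```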

Product Extension

LIS underlies version-history compression, longest-monotonic-trend analysis in time-series (e.g., longest streak of growing daily users), and dependency-resolution heuristics. The O(N log N) algorithm is what production code uses when N is large.

Language/Runtime Follow-ups

Python: bisect_left is in the standard library and uses C-level binary search — extremely fast.
Java: Arrays.binarySearch(tails, 0, size, x) returns negative for not-found; convert to insertion point with -(ret + 1).
Go: sort.SearchInts(tails, x) for bisect_left equivalent.
C++: lower_bound(tails.begin(), tails.end(), x) for bisect_left; upper_bound for bisect_right.
JS/TS: no built-in binary search; implement it manually or use lodash's _.sortedIndex (a third-party library) as a bisect_left equivalent.

Common Bugs

bisect_right vs bisect_left — strict-increasing uses bisect_left; non-decreasing uses bisect_right. Off-by-one in this choice silently gives the wrong LIS variant.
Treating tails as the actual LIS — it isn’t; it’s just the smallest-tails-by-length array. Reconstructing the LIS requires extra bookkeeping.
O(N²) DP: starting dp[i] = 0 instead of 1 — every element is itself an LIS of length 1.
Returning dp[N-1] instead of max(dp) — the LIS may end anywhere, not necessarily at the last index.
Forgetting the prev == -1 check before reading nums[prev]: in Python, nums[-1] silently reads the last element instead of failing. When tabulating over (i, prev), shift prev by one so the “nothing chosen yet” state maps to a valid array index.

Debugging Strategy

For [10, 9, 2, 5, 3, 7, 101, 18]: trace tails after each element: [10] → [9] → [2] → [2,5] → [2,3] → [2,3,7] → [2,3,7,101] → [2,3,7,18]. Length 4 is the LIS length. If your trace diverges, you’ve made a bisect mistake. For the O(N²) DP, print dp after the loop: [1,1,1,2,2,3,4,4].

Mastery Criteria

Recognized “longest increasing subsequence” as LIS within 30 seconds.
Wrote the brute recursion in <2 minutes.
Wrote the O(N²) DP from blank screen in <4 minutes.
Wrote the O(N log N) patience-sort version in <5 minutes.
Articulated the patience-sort invariant (“tails[k] is the smallest tail of length-(k+1) IS”) in <30 seconds.
Stated O(N log N) time complexity and explained why binary-search is correct here.
Solved LC 300 unaided in <12 minutes (both algorithms).
Solved LC 354 (Russian Doll Envelopes) by reduction to LIS in <15 minutes.
Articulated bisect_left vs bisect_right for strict vs non-strict in <30 seconds.

Lab 06 — LCS / Edit Distance

Goal

Solve Edit Distance (LC 72 — Levenshtein) with the full four-stage progression. Internalize the canonical two-string DP dp[i][j] indexed by prefix lengths, and the three-way min over insert / delete / replace. After this lab you can write any LCS-family DP from a blank screen in <12 minutes and apply the rolling-row collapse to O(M) space.

Background Concepts

Edit distance — sometimes called Levenshtein distance — is the minimum number of single-character edits (insert, delete, replace) needed to transform string A into string B. The state dp[i][j] indexes prefix-i of A and prefix-j of B; the recurrence has one branch per edit operation plus a free pass on character match.

LCS (longest common subsequence) and edit distance are the foundational two-string DPs. They share the index convention (dp[i][j] = answer for prefix-i of A, prefix-j of B), the boundary row and column (dp[i][0] and dp[0][j]; edit distance sets them to i and j, LCS to 0), and the rolling-row space optimization (O(N · M) → O(M)). Mastering edit distance gives you LCS for free.
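A minimal rolled-row sketch of LC 1143 (the name lcs_length is hypothetical) makes the kinship concrete: the loops and the row swap are the same as in edit distance; only the boundaries (0), the match rule (diagonal + 1), and the mismatch rule (max instead of 1 + min) change.

```python
def lcs_length(a, b):
    if len(a) < len(b):
        a, b = b, a                # keep the rolled row as short as possible
    prev = [0] * (len(b) + 1)      # LCS boundaries are 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1           # extend the common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop a char from one side
        prev = curr
    return prev[-1]


print(lcs_length("abcde", "ace"))  # 3
```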

Interview Context

Edit Distance is a top-15 Hard-tagged DP problem at Google, Microsoft, and Amazon. It shows up in coding rounds at staff level routinely, often paired with a follow-up “now reconstruct the alignment”. LCS (LC 1143) is the gentler Medium variant and tests the same skill. Candidates who can derive both recurrences from scratch and articulate the four edit operations precisely demonstrate fluency that translates to nearly every two-string DP problem in the corpus (regex match, distinct subsequences, interleaving strings, longest common substring).

Problem Statement

Given two strings word1 and word2, return the minimum number of operations required to convert word1 into word2. Allowed operations: insert a character, delete a character, replace a character (each cost 1).

Constraints

0 ≤ word1.length, word2.length ≤ 500
word1 and word2 consist of lowercase English letters.

Clarifying Questions

Are insert/delete/replace each cost 1? (Yes — Levenshtein.)
Are there any other operations (transpose, e.g.)? (No — Damerau-Levenshtein adds transpose; not in scope.)
Are characters lowercase only? (Yes — given.)
Are empty strings valid inputs? (Yes; if one string is empty, the answer is the length of the other.)
Return the count or the alignment? (Count — alignment is a follow-up.)

Examples

word1="horse",       word2="ros"          → 3   (horse→rorse→rose→ros)
word1="intention",   word2="execution"    → 5
word1="",            word2="abc"          → 3   (insert 3)
word1="abc",         word2=""             → 3   (delete 3)
word1="abc",         word2="abc"          → 0

Initial Brute Force

def edit_brute(w1, w2):
    def f(i, j):
        if i == 0: return j      # insert remaining w2
        if j == 0: return i      # delete remaining w1
        if w1[i-1] == w2[j-1]:
            return f(i-1, j-1)   # match: no edit
        return 1 + min(
            f(i-1, j),           # delete w1[i-1]
            f(i, j-1),           # insert w2[j-1]
            f(i-1, j-1),         # replace
        )
    return f(len(w1), len(w2))

Brute Force Complexity

Each non-base call branches into 3; recursion depth N + M. Worst case O(3^(N+M)). At N=M=500, completely infeasible.

Optimization Path

There are (N+1)(M+1) distinct (i, j) pairs — memoization gives O(N · M) time. Tabulation replaces recursion with a row-major loop. Since dp[i][j] depends only on dp[i-1][j-1], dp[i-1][j], dp[i][j-1], the previous row plus the in-progress row are enough — collapse to two 1D arrays of size M+1. With careful use of a saved diagonal, you can collapse to a single 1D array.
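A minimal sketch of the single-array collapse (the name min_distance_one_row is hypothetical). Just before dp[j] is overwritten it still holds dp[i-1][j]; saving it in a variable lets the next column read it as its diagonal dp[i-1][j-1].

```python
def min_distance_one_row(w1, w2):
    if len(w1) < len(w2):
        w1, w2 = w2, w1          # distance is symmetric; keep the row short
    m = len(w2)
    dp = list(range(m + 1))      # row 0: dp[j] = j inserts
    for i in range(1, len(w1) + 1):
        diag = dp[0]             # dp[i-1][0], saved before overwrite
        dp[0] = i                # dp[i][0] = i deletes
        for j in range(1, m + 1):
            above = dp[j]        # dp[i-1][j], about to be overwritten
            if w1[i - 1] == w2[j - 1]:
                dp[j] = diag
            else:
                dp[j] = 1 + min(above, dp[j - 1], diag)  # the three edit branches
            diag = above         # becomes dp[i-1][j-1] for the next column
    return dp[m]


print(min_distance_one_row("horse", "ros"))  # 3
```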

Final Expected Approach

dp[i][j] = edit distance between word1[:i] and word2[:j].
dp[0][j] = j     (insert j chars to get word2[:j] from empty word1[:0])
dp[i][0] = i     (delete i chars from word1[:i] to get empty word2[:0])
dp[i][j] = dp[i-1][j-1]                                    if word1[i-1] == word2[j-1]
         = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1])   otherwise

The three operations correspond to:

dp[i-1][j-1] + 1 — replace word1[i-1] with word2[j-1].
dp[i-1][j] + 1 — delete word1[i-1].
dp[i][j-1] + 1 — insert word2[j-1].

Data Structures Used

2D dp[(N+1) x (M+1)] array (tabulated).
Two 1D prev, curr arrays of size M+1 (rolled).

Correctness Argument

By induction on (i, j) in row-major order. Base cases: dp[0][j] = j (clearly j inserts), dp[i][0] = i (clearly i deletes). Inductive step: an optimal alignment of word1[:i] with word2[:j] ends with one of: (a) match — word1[i-1] == word2[j-1] aligned; cost is dp[i-1][j-1]. (b) replace — pair word1[i-1] with word2[j-1]; cost dp[i-1][j-1] + 1. (c) delete — word1[i-1] deleted, word1[:i-1] aligned with word2[:j]; cost dp[i-1][j] + 1. (d) insert — word2[j-1] inserted, word1[:i] aligned with word2[:j-1]; cost dp[i][j-1] + 1. The min of these is the optimum. (a) and (b) are mutually exclusive based on character equality.
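Reading these four cases backwards from (n, m) recovers an optimal edit script, which answers the alignment follow-up. A minimal sketch (the name edit_script and the tuple format of the operations are hypothetical): at each cell, step to any predecessor whose value plus the cost of its operation equals dp[i][j].

```python
def edit_script(w1, w2):
    n, m = len(w1), len(w2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if w1[i - 1] == w2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and w1[i - 1] == w2[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            ops.append(("match", w1[i - 1]))                 # case (a)
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("replace", w1[i - 1], w2[j - 1]))    # case (b)
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", w1[i - 1]))                # case (c)
            i -= 1
        else:
            ops.append(("insert", w2[j - 1]))                # case (d)
            j -= 1
    return dp[n][m], ops[::-1]


print(edit_script("horse", "ros"))
```

Every operation except a delete emits the next character of word2, so replaying the non-delete operations in order reproduces word2 exactly.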

Complexity

Stage	Time	Space
Brute force	O(3^(N+M))	O(N+M)
Memoized	O(N · M)	O(N · M)
Tabulated	O(N · M)	O(N · M)
Space-optimized	O(N · M)	O(min(N, M))

Implementation Requirements

All four stages.

# ---- Stage 1: Brute force ----
def edit_brute(w1, w2):
    def f(i, j):
        if i == 0: return j
        if j == 0: return i
        if w1[i-1] == w2[j-1]: return f(i-1, j-1)
        return 1 + min(f(i-1, j), f(i, j-1), f(i-1, j-1))
    return f(len(w1), len(w2))

# ---- Stage 2: Memoized ----
from functools import lru_cache
def edit_memo(w1, w2):
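    # Recursion depth can reach N + M (up to 1000 here), which can exceed CPython's
    # default recursion limit; raise it with sys.setrecursionlimit or use the tabulated version.
    @lru_cache(None)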
    def f(i, j):
        if i == 0: return j
        if j == 0: return i
        if w1[i-1] == w2[j-1]: return f(i-1, j-1)
        return 1 + min(f(i-1, j), f(i, j-1), f(i-1, j-1))
    return f(len(w1), len(w2))

# ---- Stage 3: Tabulated 2D ----
def edit_tab(w1, w2):
    n, m = len(w1), len(w2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1): dp[0][j] = j
    for i in range(n + 1): dp[i][0] = i
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if w1[i-1] == w2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    return dp[n][m]

# ---- Stage 4: Space-optimized O(M) ----
def minDistance(w1, w2):
    n, m = len(w1), len(w2)
    if n < m:
        w1, w2, n, m = w2, w1, m, n  # ensure m is the smaller dim
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            if w1[i-1] == w2[j-1]:
                curr[j] = prev[j-1]
            else:
                curr[j] = 1 + min(prev[j], curr[j-1], prev[j-1])
        prev = curr
    return prev[m]

Tests

("horse", "ros") → 3.
("intention", "execution") → 5.
("", "abc") → 3.
("abc", "") → 3.
("abc", "abc") → 0.
("a", "b") → 1.
("a"*500, "b"*500) → 500 — performance test.
All four implementations equivalent on random N≤8 inputs.

Follow-up Questions

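“Reconstruct the alignment (sequence of operations).” → Backtrack from dp[n][m]. At each (i, j), find the case (match, replace, delete or insert) whose predecessor cell (dp[i-1][j-1], dp[i-1][j] or dp[i][j-1]) reproduces dp[i][j], emit that operation, and stop at (0, 0).
“Custom costs for insert / delete / replace.” → Replace each +1 with that operation's cost, and seed the boundaries with cumulative costs: dp[0][j] = j · insert_cost and dp[i][0] = i · delete_cost. With unequal insert and delete costs, also drop the string swap from the space-optimized version, because the two operations are no longer interchangeable.
“Levenshtein with transpositions.” → Add a transposition branch dp[i-2][j-2] + 1 when i ≥ 2, j ≥ 2, word1[i-1] == word2[j-2] and word1[i-2] == word2[j-1]. This computes the optimal string alignment (restricted Damerau-Levenshtein) distance, which never edits a transposed pair again. The unrestricted Damerau-Levenshtein distance needs extra bookkeeping of where each character last appeared.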
“Longest common subsequence.” (LC 1143) → Same shape; recurrence: dp[i][j] = dp[i-1][j-1] + 1 if match, else max(dp[i-1][j], dp[i][j-1]).
“Minimum ASCII delete sum.” (LC 712) → Variant where deletes cost ASCII value of the deleted char.
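A minimal sketch of two of the follow-ups above: backtracking an edit script out of the full table, and the restricted (optimal string alignment) transposition variant. The function names and the "keep / replace / delete / insert" labels are hypothetical. The backtrack needs the full 2D table; the rolled rows have already discarded the information it walks through.

```python
def edit_table(w1: str, w2: str) -> list[list[int]]:
    """Full Levenshtein table: dp[i][j] = distance between w1[:i] and w2[:j]."""
    n, m = len(w1), len(w2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if w1[i - 1] == w2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp


def edit_script(w1: str, w2: str) -> list[str]:
    """Walk back from (n, m), each step choosing a case that reproduces dp[i][j]."""
    dp = edit_table(w1, w2)
    i, j = len(w1), len(w2)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and w1[i - 1] == w2[j - 1]:
            ops.append(f"keep {w1[i - 1]}")                      # match: dp[i][j] == dp[i-1][j-1]
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"replace {w1[i - 1]} -> {w2[j - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {w1[i - 1]}")
            i -= 1
        else:
            ops.append(f"insert {w2[j - 1]}")                    # remaining case: dp[i][j-1] + 1
            j -= 1
    return ops[::-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus adjacent transposition, restricted form (optimal string alignment)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)   # swap two neighbors
    return dp[n][m]
```

For example, edit_script("horse", "ros") yields "replace h -> r", "keep o", "delete r", "keep s", "delete e", which is three edits. osa_distance("ab", "ba") is 1, where plain Levenshtein gives 2. osa_distance("ca", "abc") is 3 even though unrestricted Damerau-Levenshtein gives 2 (ca → ac → abc); that gap is exactly the restriction.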

Product Extension

Edit distance is the engine behind diff/patch tools, spell correctors, fuzzy search (“did you mean”), DNA-sequence alignment (Needleman-Wunsch is a generalization with custom scoring matrices), and code-review side-by-side comparison. When the alignment itself is needed and the inputs are large, Hirschberg’s algorithm reconstructs it in linear space (O(min(N, M)) working memory, still O(N · M) time). It does this by combining the rolled-row DP with divide and conquer.

Language/Runtime Follow-ups

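Python: lru_cache works on (i, j) since both are ints. Recursion depth reaches N + M, so inputs near the upper bound may need sys.setrecursionlimit. The space-optimized version swaps the strings so the rolled rows have length min(N, M) + 1.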
Java: int[][] dp with explicit boundary fill. For O(M) space, two int[m+1] arrays.
Go: 2D slice; make([][]int, n+1) then per-row make([]int, m+1).
C++: vector<vector<int>> 2D; or two vector<int>(m+1) for O(M).
JS/TS: 2D array via Array.from({length: n+1}, () => new Array(m+1).fill(0)).

Common Bugs

Off-by-one between string index and DP index — word1[i-1] not word1[i]. The convention “prefix-i” means index i is exclusive in the prefix, so the latest char is word1[i-1].
Forgetting the boundary dp[0][j] = j, dp[i][0] = i — defaults to 0 and produces nonsense answers.
Using max instead of min in the recurrence — wrong direction.
Including dp[i-1][j-1] + 1 in the match branch — match has no edit cost; should be dp[i-1][j-1] exactly.
Space-optimized version: forgetting curr[0] = i — the new row’s column 0 is i (deleting i chars to match empty prefix), not 0.
Assuming insert and delete are always interchangeable. Inserting a character into word1 is equivalent to deleting it from word2, so with unit costs the distance is symmetric (distance(a, b) == distance(b, a)) and the Stage 4 swap is safe. Once insert and delete costs differ, both the symmetry and the swap break.

Debugging Strategy

For ("horse", "ros"), the full table is:

    ""  r   o   s
""   0  1   2   3
h    1  1   2   3
o    2  2   1   2
r    3  2   2   2
s    4  3   3   2
e    5  4   4   3

Print and check. If the boundary row/column is wrong, the entire table is. For the rolled version, print prev after each row.
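One way to check the rolled version is to have the rolling loop yield each finished row and compare it with the table above. The generator name rolled_rows is hypothetical. It uses the same two-array roll as Stage 4 but skips the swap, so the rows line up with word1 as in the printed table.

```python
from typing import Iterator


def rolled_rows(w1: str, w2: str) -> Iterator[list[int]]:
    """Yield dp row i (as a copy) for i = 0..len(w1), keeping only two rows in memory."""
    m = len(w2)
    prev = list(range(m + 1))
    yield prev[:]
    for i in range(1, len(w1) + 1):
        curr = [i] + [0] * m              # column 0: delete i characters
        for j in range(1, m + 1):
            if w1[i - 1] == w2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
        yield curr[:]
        prev = curr
```

Running `for row in rolled_rows("horse", "ros"): print(row)` should print the six rows of the table above, one per line.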

Mastery Criteria

Recognized “min operations to transform” as edit distance within 60 seconds.
Wrote the brute recursion with all four cases (match / replace / delete / insert) in <3 minutes.
Wrote the 2D tabulated version in <5 minutes.
Performed the rolling-row collapse to two 1D arrays in <3 minutes.
Stated O(N · M) time and O(M) space.
Articulated which DP cell corresponds to which edit operation.
Solved LC 72 unaided in <20 minutes (full progression).
Solved LC 1143 (LCS) in <8 minutes by changing the recurrence.
Solved LC 583 (Delete Operation for Two Strings) in <8 minutes (LCS + arithmetic).

Lab 07 — Palindrome DP

Goal

Solve Longest Palindromic Subsequence (LC 516) and Palindrome Partitioning II (LC 132 — minimum cuts) with the four-stage progression. Internalize the length-ascending iteration that makes interval DP correct, and the is_pal[i][j] precompute that powers most palindrome problems. After this lab you can solve any palindrome-DP variant in <12 minutes.

Background Concepts

Palindrome DP problems split into two families:

Subsequence palindromes (LC 516): dp[i][j] = length of longest palindromic subsequence of s[i..j]. Recurrence: if s[i] == s[j], dp[i][j] = 2 + dp[i+1][j-1]; else dp[i][j] = max(dp[i+1][j], dp[i][j-1]).
Substring palindromes (LC 132, LC 5, LC 647): precompute is_pal[i][j] = (s[i..j] is a palindrome) via interval DP, then layer the partitioning / counting DP on top.

The shared mechanic: iterate by length ascending, since dp[i][j] depends on intervals strictly shorter. This is the defining feature of interval DP (deeper exploration in Lab 09).

Interview Context

Longest Palindromic Subsequence is a top-30 Medium DP problem; Palindrome Partitioning II is a Hard variant asked at Google and Microsoft. Between them, the is_pal precompute and the LPS table unlock a family of LeetCode problems:

- Substring problems use is_pal: 5, 131, 132 and 647, plus 1278 via a cost table of the same shape.
- Subsequence problems use the LPS table: 516, 1312, 1771, …

Candidates who can derive the length-ascending loop and articulate the substring-vs-subsequence distinction signal solid interval-DP fluency.

Problem Statement

LC 516 (LPS subsequence): Given a string s, return the length of the longest palindromic subsequence.

LC 132 (min cuts): Given a string s, return the minimum number of cuts needed to partition s into palindromic substrings.

Constraints

1 ≤ s.length ≤ 1000 (LC 516) / 2000 (LC 132).
s is lowercase English.

Clarifying Questions

Subsequence or substring? (Subsequence for LC 516; substring for LC 132 partitioning.)
Is a single character a palindrome? (Yes — length 1.)
Is the empty string a palindrome? (Conventionally yes.)
LC 132: must each part be non-empty? (Yes.)
LC 132: 0 cuts means the entire string is a palindrome.

Examples

LC 516:
"bbbab"               → 4   ("bbbb")
"cbbd"                → 2   ("bb")
"a"                   → 1

LC 132:
"aab"                 → 1   ("aa" | "b")
"a"                   → 0
"ab"                  → 1
"aabb"                → 1   ("aa" | "bb")
"abcbm"               → 2
"abc"                 → 2

Initial Brute Force (LC 516)

def lps_brute(s):
    def f(i, j):
        if i > j: return 0
        if i == j: return 1
        if s[i] == s[j]: return 2 + f(i+1, j-1)
        return max(f(i+1, j), f(i, j-1))
    return f(0, len(s) - 1)

Brute Force Complexity

O(2^N) worst case — every mismatch branches. At N=1000, infeasible.

Optimization Path

(i, j) has only O(N²) distinct values, so memoization is O(N²) time and space. Tabulation: iterate length = 1..N, then i = 0..N-length, then derive j = i + length - 1. The length-ascending order ensures all shorter intervals are computed first.

LC 132 strategy: precompute is_pal[i][j] in O(N²). Then cuts[i] = min cuts for s[:i+1]: cuts[i] = 0 if s[:i+1] is itself a palindrome; else cuts[i] = min(cuts[j-1] + 1 : 1 ≤ j ≤ i, s[j..i] palindrome).

Final Expected Approach

LC 516:

dp[i][j] = LPS of s[i..j]
dp[i][i] = 1; dp[i][j] = 0 for i > j
For length 2..N:
    For i in 0..N-length:
        j = i + length - 1
        if s[i] == s[j]:
            dp[i][j] = (2 if length == 2 else 2 + dp[i+1][j-1])
        else:
            dp[i][j] = max(dp[i+1][j], dp[i][j-1])
Answer: dp[0][N-1]

LC 132:

1. Compute is_pal[i][j] (O(N^2)).
2. cuts[i] = min cuts to partition s[0..i].
   cuts[i] = 0 if is_pal[0][i].
   else cuts[i] = min(cuts[j-1] + 1 : 1 <= j <= i, is_pal[j][i]).
Answer: cuts[N-1].

Data Structures Used

2D dp[N][N] (LC 516).
2D is_pal[N][N] boolean + 1D cuts[N] (LC 132).

Correctness Argument

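LC 516: by induction on length. Base: length 1 → 1. Inductive: an LPS of s[i..j] either uses both endpoints (they must be equal, contributing 2 + LPS of s[i+1..j-1]) or skips at least one endpoint (LPS of s[i+1..j] or s[i..j-1]). The max covers all cases.

When s[i] == s[j], taking both endpoints is never worse. An optimal subsequence that uses neither endpoint can be extended by 2. One that uses only s[i] can swap its matching last character for s[j] without changing length, and symmetrically for s[j]. That is why the recurrence keeps only the 2 + dp[i+1][j-1] branch.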

LC 132: if s[0..i] is itself a palindrome, no cut is needed. Otherwise every valid partition has a last piece s[j..i] (1 ≤ j ≤ i) that is a palindrome, preceded by a partition of s[0..j-1] costing cuts[j-1], plus one cut between them. Minimizing over all valid j exhausts all partitions.

Complexity

Problem	Time	Space
LC 516 brute	O(2^N)	O(N)
LC 516 memo	O(N²)	O(N²)
LC 516 tab	O(N²)	O(N²)
LC 516 space-opt	O(N²)	O(N)
LC 132	O(N²)	O(N²)

Implementation Requirements

# ==== LC 516: Longest Palindromic Subsequence ====

# ---- Stage 1: Brute force ----
def lps_brute(s):
    def f(i, j):
        if i > j: return 0
        if i == j: return 1
        if s[i] == s[j]: return 2 + f(i+1, j-1)
        return max(f(i+1, j), f(i, j-1))
    return f(0, len(s) - 1)

# ---- Stage 2: Memoized ----
from functools import lru_cache
def lps_memo(s):
    @lru_cache(None)
    def f(i, j):
        if i > j: return 0
        if i == j: return 1
        if s[i] == s[j]: return 2 + f(i+1, j-1)
        return max(f(i+1, j), f(i, j-1))
    return f(0, len(s) - 1)

# ---- Stage 3: Tabulated 2D ----
def lps_tab(s):
    n = len(s)
    dp = [[0] * n for _ in range(n)]
    for i in range(n): dp[i][i] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                dp[i][j] = 2 if length == 2 else 2 + dp[i+1][j-1]
            else:
                dp[i][j] = max(dp[i+1][j], dp[i][j-1])
    return dp[0][n-1]

# ---- Stage 4: Space-optimized 1D ----
def longestPalindromeSubseq(s):
    n = len(s)
    dp = [0] * n
    for i in range(n - 1, -1, -1):
        new = [0] * n
        new[i] = 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                new[j] = 2 + (dp[j-1] if j-1 >= i+1 else 0)
            else:
                new[j] = max(dp[j], new[j-1])
        dp = new
    return dp[n-1]

# ==== LC 132: Palindrome Partitioning II ====

def minCut(s):
    n = len(s)
    # Step 1: precompute is_pal in O(N^2)
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n): is_pal[i][i] = True
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j] and (length == 2 or is_pal[i+1][j-1]):
                is_pal[i][j] = True
    # Step 2: cuts DP
    cuts = [0] * n
    for i in range(n):
        if is_pal[0][i]:
            cuts[i] = 0
            continue
        cuts[i] = i  # worst case: cut after every character
        for j in range(1, i + 1):
            if is_pal[j][i]:
                cuts[i] = min(cuts[i], cuts[j-1] + 1)
    return cuts[n-1]

Tests

LC 516: "bbbab" → 4. "cbbd" → 2. "a" → 1. "ac" → 1. "aaaa" → 4.
LC 132: "aab" → 1. "a" → 0. "ab" → 1. "aabb" → 1. "abcbm" → 2. "abacaba" → 0 (already a palindrome). "abacabaca" → 2, for example "a" | "bacab" | "aca"; no split into two palindromes exists.
Cross-implementation check on random N≤10.
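The expectations above, and the random cross-check, can be run against deliberately naive oracles before trusting the DP. This is a minimal sketch. The helper names (lps_oracle, min_cut_oracle, cross_check) are hypothetical, and the oracles are only meant for the N ≤ 10 range. lps_oracle enumerates index subsets. min_cut_oracle checks palindromes by string reversal rather than with an is_pal table, so it shares no logic with minCut.

```python
import random
from functools import lru_cache
from itertools import combinations


def lps_oracle(s: str) -> int:
    """Longest palindromic subsequence by trying index subsets, largest first (tiny N only)."""
    for size in range(len(s), 0, -1):
        for idx in combinations(range(len(s)), size):
            t = "".join(s[k] for k in idx)
            if t == t[::-1]:
                return size
    return 0


def min_cut_oracle(s: str) -> int:
    """Fewest cuts for a palindrome partition of a non-empty s: try every palindromic first piece."""
    @lru_cache(maxsize=None)
    def pieces(start: int) -> int:
        if start == len(s):
            return 0
        best = len(s) - start                     # one piece per character always works
        for end in range(start + 1, len(s) + 1):
            piece = s[start:end]
            if piece == piece[::-1]:              # direct reversal check, no is_pal table
                best = min(best, 1 + pieces(end))
        return best
    return pieces(0) - 1                          # k pieces need k - 1 cuts


def cross_check(lps_fn, cut_fn, trials: int = 300, max_len: int = 10, seed: int = 0) -> None:
    """Compare candidate implementations with the oracles on random short strings."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice("abc") for _ in range(rng.randint(1, max_len)))
        assert lps_fn(s) == lps_oracle(s), s
        assert cut_fn(s) == min_cut_oracle(s), s
```

For example, cross_check(lps_tab, minCut) runs the lab's implementations against the oracles, and a failing assertion reports the offending string.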

Follow-up Questions

“Count palindromic substrings.” (LC 647) → Sum is_pal[i][j] over all (i, j) with i ≤ j.
“Longest palindromic substring.” (LC 5) → Return the length / actual string of the largest (j - i + 1) with is_pal[i][j].
“Minimum insertions to make palindrome.” (LC 1312) → len(s) - LPS(s).
“All palindrome partitions.” (LC 131) → Backtracking; output exponentially many.
“Palindromes with one allowed mismatch.” → Add a dimension dp[i][j][k] where k ∈ {0, 1}.
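The first two follow-ups fall straight out of the is_pal table. This is a minimal sketch with hypothetical function names. It fills the table with i descending and j ascending instead of by length. That order is equally valid, because is_pal[i+1][j-1] is always ready when (i, j) is reached.

```python
def palindrome_table(s: str) -> list[list[bool]]:
    """is_pal[i][j] is True when s[i..j] (inclusive) is a palindrome."""
    n = len(s)
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):           # i descending, j ascending: inner cell already filled
        for j in range(i, n):
            # lengths 1 and 2 only need the end characters to match
            is_pal[i][j] = s[i] == s[j] and (j - i < 2 or is_pal[i + 1][j - 1])
    return is_pal


def count_substrings(s: str) -> int:
    """LC 647: every True cell (always with i <= j) is one palindromic substring."""
    return sum(sum(row) for row in palindrome_table(s))


def longest_palindrome(s: str) -> str:
    """LC 5: the widest True cell; among equal widths the leftmost wins."""
    if not s:
        return ""
    is_pal = palindrome_table(s)
    best_i, best_j = 0, 0
    for i in range(len(s)):
        for j in range(i, len(s)):
            if is_pal[i][j] and j - i > best_j - best_i:
                best_i, best_j = i, j
    return s[best_i:best_j + 1]
```

Both functions run in O(N²) time and space, matching the cost of the precompute itself.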

Product Extension

Palindrome detection appears in DNA-sequence analysis (palindromic motifs are biologically meaningful: restriction sites, hairpins), text-search systems, and as a non-trivial benchmark for compiler optimization. The is_pal precompute is also useful in interview problems that don’t strictly need DP (just O(N²) precomputation).

Language/Runtime Follow-ups

Python: 2D arrays via list comprehensions; lru_cache for memoization.
Java: boolean[][] isPal = new boolean[n][n]; defaults to false. int[][] dp = new int[n][n];.
Go: pre-allocate slice-of-slices; can fuse is_pal and cuts computation in a single function.
C++: vector<vector<bool>> is_pal(n, vector<bool>(n, false));.
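JS/TS: as Python, but watch for the shared-reference trap. new Array(n).fill(new Array(n).fill(false)) makes every row the same array, so build the rows with Array.from({length: n}, () => new Array(n).fill(false)).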

Common Bugs

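Iterating i ascending with j ascending (plain row-major) for LPS fails because dp[i][j] depends on dp[i+1][j-1] and dp[i+1][j], which haven’t been computed yet. Iterate by length ascending, or equivalently i descending with j ascending (the order the 1D version uses).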
Forgetting dp[i][i] = 1 — base case for single-char palindromes.
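Edge case length == 2: the inner range is dp[i+1][i], an empty interval. In the zero-initialized LPS table that cell is 0, which happens to be correct. In is_pal, however, it is False even though the empty string is a palindrome, so keep the length == 2 special case there (or treat i+1 > j-1 as True).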
LC 132: forgetting the is_pal[0][i] short-circuit — gives wrong answer for already-palindromic input.
LC 132: cuts initialization — initialize to i (worst case: cut after every character of s[0..i]).
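Confusing subsequence and substring: LC 516 wants a subsequence, but many candidates accidentally solve the substring version (LC 5), a different problem with a different answer. For "bbbab" the LPS is 4, while the longest palindromic substring ("bbb") has length 3.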

Debugging Strategy

For LC 516 "bbbab": trace the table by length. Length 1: diagonal all 1. Length 2: dp[0][1]=2 (bb), dp[1][2]=2, dp[2][3]=1, dp[3][4]=1. Length 3: dp[0][2]=3 (bbb), dp[1][3]=2, dp[2][4]=3 (bab). Length 4: dp[0][3]=3, dp[1][4]=3. Length 5: dp[0][4]=4 (bbbb). For LC 132 "aab": is_pal = [[T,T,F],[F,T,F],[F,F,T]]; cuts = [0, 0, 1].
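The length-by-length trace can be produced mechanically instead of by hand. This is a minimal sketch. The name lps_trace is hypothetical. It returns (i, j, dp[i][j]) triples grouped by interval length, in the same order as the tabulated loop.

```python
def lps_trace(s: str) -> dict[int, list[tuple[int, int, int]]]:
    """Fill the LPS table by length, recording (i, j, dp[i][j]) for each length."""
    n = len(s)
    dp = [[0] * n for _ in range(n)]
    trace: dict[int, list[tuple[int, int, int]]] = {1: []}
    for i in range(n):
        dp[i][i] = 1
        trace[1].append((i, i, 1))
    for length in range(2, n + 1):
        trace[length] = []
        for i in range(n - length + 1):
            j = i + length - 1
            if s[i] == s[j]:
                # for length 2, dp[i+1][i] is an empty interval and stays 0
                dp[i][j] = 2 + dp[i + 1][j - 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])
            trace[length].append((i, j, dp[i][j]))
    return trace


if __name__ == "__main__":
    for length, cells in lps_trace("bbbab").items():
        print(length, cells)
```

Diffing this output against your own hand trace shows the first length at which the two disagree.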

Mastery Criteria

Recognized “longest palindromic subsequence” as interval DP within 60 seconds.
Articulated the length-ascending iteration order in <30 seconds.
Wrote LC 516 brute recursion in <2 minutes.
Wrote LC 516 tabulated in <5 minutes.
Wrote is_pal precompute correctly in <4 minutes.
Wrote LC 132 cuts DP using is_pal in <5 minutes.
Stated O(N²) time and space.
Solved LC 516 unaided in <12 minutes (full progression).
Solved LC 132 unaided in <15 minutes.
Solved LC 5 (longest palindromic substring) in <8 minutes via is_pal.

Lab 08 — Tree DP (House Robber III)

Goal

Solve House Robber III (LC 337) with post-order DFS returning (rob, skip) per node. Internalize the post-order DP pattern where each node returns a tuple of “best with this node included” and “best with this node excluded”. After this lab you recognize tree DP within 60 seconds and can write any post-order tuple-DP from blank screen in <8 minutes.

A note on the four-stage progression: tree DP doesn’t have a clean tabulated form (there’s no natural row-major order for a tree), and “space optimization” is implicit (each post-order call returns a constant-size tuple, so the working memory is O(1) per node). Stages we can show: brute recursion, memoized recursion, post-order DFS with tuple return (the canonical form), and an iterative post-order using an explicit stack. The tuple-return version is what you write in interviews.

Background Concepts

Tree DP: state is per node; recurrence aggregates children’s states. The natural evaluation order is post-order — process all descendants before the node itself. Most tree DPs return a tuple (or struct) per node carrying the answers for “this node included” vs “this node excluded” (or whatever the binary split is). The parent combines children’s tuples in O(1) per child, giving O(N) total.

House Robber III is the canonical example. Each node v returns (rob_v, skip_v):

rob_v = val[v] + sum(skip_c for c in children(v)) — rob v, must skip every child.
skip_v = sum(max(rob_c, skip_c) for c in children(v)) — skip v, each child is independently best.

Final answer: max(rob_root, skip_root).
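The recurrence above is written over children(v), so it applies unchanged to nodes with any number of children. This is a minimal sketch for a general tree. The Node dataclass and the name rob_general are hypothetical; LC 337 itself uses a binary TreeNode.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical general-tree node (not LC 337's binary TreeNode)."""
    val: int
    children: list["Node"] = field(default_factory=list)


def rob_general(root: "Node | None") -> int:
    def f(v: Node) -> tuple[int, int]:
        rob_v, skip_v = v.val, 0
        for c in v.children:
            rc, sc = f(c)
            rob_v += sc              # robbing v forces every child to be skipped
            skip_v += max(rc, sc)    # skipping v leaves each child free
        return rob_v, skip_v
    return max(f(root)) if root is not None else 0
```

The parent still does O(1) work per child, so the total stays O(N) whatever the branching factor.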

Interview Context

LC 337 is a top-30 Medium DP problem at Amazon and Microsoft. The post-order tuple pattern recurs in LC 124 (Binary Tree Maximum Path Sum), LC 543 (Diameter of Binary Tree), LC 968 (Binary Tree Cameras) and LC 1372 (Longest ZigZag Path), so mastering it here generalizes broadly. Senior interviewers specifically test whether you write the tuple-return version (clean, O(N)) or the memoized-but-redundant version that calls f on both children and on up to four grandchildren.

Problem Statement

A binary tree where each node holds an integer amount of money. The thief cannot rob two directly-linked houses (parent–child). Return the maximum amount the thief can rob without alerting the police.

Constraints

1 ≤ tree size ≤ 10^4
0 ≤ node.val ≤ 10^4

Clarifying Questions

Is the tree binary or general? (Binary, per LC 337.)
Are values non-negative? (Yes — given.)
What does the tree representation look like? (Standard TreeNode with left, right.)
Can the tree be empty? (LC 337 guarantees at least one node, but handle it anyway: return 0.)
Does “linked” mean parent–child only or also siblings? (Parent–child only.)

Examples

        3
       / \
      2   3
       \   \
        3   1                → 7   (rob 3 + 3 + 1)

        3
       / \
      4   5
     / \   \
    1   3   1                → 9   (rob 4 + 5)

Initial Brute Force

For each subtree rooted at v: try rob-v (must skip children, recurse on grandchildren) or skip-v (recurse on children).

def rob_brute(root):
    def f(v):
        if v is None: return 0
        # Skip v
        skip = f(v.left) + f(v.right)
        # Rob v: must skip both children
        rob = v.val
        if v.left:  rob += f(v.left.left)  + f(v.left.right)
        if v.right: rob += f(v.right.left) + f(v.right.right)
        return max(rob, skip)
    return f(root)

Brute Force Complexity

Each call recurses on children and on grandchildren — the same node is visited multiple times via different paths. Worst case O(N · 2^depth). Memoization on the node identity collapses to O(N), but cleaner is to return the tuple in a single post-order traversal.

Optimization Path

The brute force calls f on both children (skip branch) and on up to four grandchildren (rob branch). Every node is therefore reached from its parent and again from its grandparent, and the repeated calls compound with depth. Memoizing on the node (a dict keyed by id(node)) computes each subtree once, although each result is still looked up from two places. The cleaner fix is to return both values in one tuple per node: a single post-order pass with no memoization needed.

Final Expected Approach

Post-order DFS returning (rob, skip):

def f(v):
    if v is None: return (0, 0)
    lr, ls = f(v.left)
    rr, rs = f(v.right)
    rob_v  = v.val + ls + rs
    skip_v = max(lr, ls) + max(rr, rs)
    return (rob_v, skip_v)

answer = max(f(root))

Time O(N) (each node visited once). Space O(H) for the call stack, where H is the tree height: O(N) worst case for a skewed tree, O(log N) for a balanced one.

Data Structures Used

Binary tree (input).
Recursion stack.
For brute / memo: optional dict keyed by node identity.

Correctness Argument

By structural induction on the tree. Base: empty tree → (0, 0). Inductive: assume f(v.left) and f(v.right) correctly return (rob, skip) for those subtrees. Then:

rob_v = rob v and skip both children. Since the children must be skipped (parent-child constraint), the contribution from each child subtree is skip_child. Plus v.val.
skip_v = skip v, each child subtree independently maximized: max(rob_child, skip_child).

The max of the two is the overall optimum, but we return both (not the max) because the parent of v needs skip_v distinct from rob_v. The final answer at the root is max(rob_root, skip_root).

Complexity

Stage	Time	Space
Brute force	O(N · 2^H) worst	O(H)
Memoized (node-keyed)	O(N)	O(N)
Tuple-return post-order	O(N)	O(H)
Iterative post-order	O(N)	O(N) (explicit stack)

Implementation Requirements

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

# ---- Stage 1: Brute force (recompute via grandchildren) ----
def rob_brute(root):
    def f(v):
        if v is None: return 0
        skip = f(v.left) + f(v.right)
        rob = v.val
        if v.left:  rob += f(v.left.left) + f(v.left.right)
        if v.right: rob += f(v.right.left) + f(v.right.right)
        return max(rob, skip)
    return f(root)

# ---- Stage 2: Memoized on node identity ----
def rob_memo(root):
    memo = {}
    def f(v):
        if v is None: return 0
        if id(v) in memo: return memo[id(v)]
        skip = f(v.left) + f(v.right)
        rob = v.val
        if v.left:  rob += f(v.left.left) + f(v.left.right)
        if v.right: rob += f(v.right.left) + f(v.right.right)
        memo[id(v)] = max(rob, skip)
        return memo[id(v)]
    return f(root)

# ---- Stage 3: Tuple-return post-order (canonical) ----
def rob(root):
    def f(v):
        if v is None: return (0, 0)
        lr, ls = f(v.left)
        rr, rs = f(v.right)
        rob_v  = v.val + ls + rs
        skip_v = max(lr, ls) + max(rr, rs)
        return (rob_v, skip_v)
    return max(f(root))

# ---- Stage 4: Iterative post-order with explicit stack ----
def rob_iter(root):
    if root is None: return 0
    stack, order = [root], []
    while stack:
        v = stack.pop()
        order.append(v)
        if v.left:  stack.append(v.left)
        if v.right: stack.append(v.right)
    # order is now reverse post-order; iterate reversed for true post-order
    state = {}  # id(v) -> (rob, skip)
    for v in reversed(order):
        lr, ls = state.get(id(v.left),  (0, 0)) if v.left  else (0, 0)
        rr, rs = state.get(id(v.right), (0, 0)) if v.right else (0, 0)
        state[id(v)] = (v.val + ls + rs, max(lr, ls) + max(rr, rs))
    return max(state[id(root)])

Tests

Build trees from level-order input:

[3,2,3,null,3,null,1] → 7.
[3,4,5,1,3,null,1] → 9.
[1] → 1.
[] → 0 (empty tree).
Linear left-skewed chain 1→2→3→4→5 → 9 (rob 1, 3, 5).
Random N=1000 tree — performance test.
Cross-check all four implementations on random trees of size N≤8.
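The tests above assume a builder for LeetCode-style level-order input, which the lab does not spell out. This is a minimal sketch. The name build_tree is hypothetical, None marks a missing child, and the canonical tuple-return rob is repeated so the block runs on its own.

```python
from collections import deque


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def build_tree(values):
    """Build a binary tree from level order; None means 'no child here'."""
    if not values or values[0] is None:
        return None
    root = TreeNode(values[0])
    queue = deque([root])
    k = 1
    while queue and k < len(values):
        node = queue.popleft()
        if k < len(values) and values[k] is not None:
            node.left = TreeNode(values[k])
            queue.append(node.left)
        k += 1
        if k < len(values) and values[k] is not None:
            node.right = TreeNode(values[k])
            queue.append(node.right)
        k += 1
    return root


def rob(root):
    def f(v):
        if v is None:
            return (0, 0)
        lr, ls = f(v.left)
        rr, rs = f(v.right)
        return (v.val + ls + rs, max(lr, ls) + max(rr, rs))
    return max(f(root))
```

In this encoding, the left-skewed chain 1→2→3→4→5 is [1, 2, None, 3, None, 4, None, 5].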

Follow-up Questions

“Reconstruct which nodes were robbed.” → Augment the tuple to also return the set of robbed nodes (or backtrack from the root after the post-order pass).
“K-ary tree.” → Same idea; sum over all children.
“Constraint relaxes to: cannot rob two nodes within distance K.” → Augment state to track distance to last robbed node; state grows by factor K.
“Negative values allowed.” → Same recurrence; max correctly handles.
“Maximum path sum of an arbitrary path (LC 124).” → Different but related. The post-order returns “best one-sided downward path starting at this node”, and at every node you update a global best with node + left + right, clamping negative branches to 0.
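For the first follow-up, a minimal sketch of the "backtrack from the root after the post-order pass" option. The name rob_with_choice is hypothetical. Ties are broken toward robbing, and robbed values are reported in pre-order. A node whose parent was robbed must be skipped; otherwise it takes whichever of (rob, skip) is larger, which is exactly how skip_v and rob_v were built.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def rob_with_choice(root):
    """Return (best total, values of the robbed nodes in pre-order)."""
    table = {}  # id(node) -> (rob, skip), filled by the post-order pass

    def f(v):
        if v is None:
            return (0, 0)
        lr, ls = f(v.left)
        rr, rs = f(v.right)
        table[id(v)] = (v.val + ls + rs, max(lr, ls) + max(rr, rs))
        return table[id(v)]

    best = max(f(root))
    robbed = []

    def pick(v, parent_robbed):
        if v is None:
            return
        rob_v, skip_v = table[id(v)]
        take = not parent_robbed and rob_v >= skip_v   # a robbed parent forces a skip
        if take:
            robbed.append(v.val)
        pick(v.left, take)
        pick(v.right, take)

    pick(root, False)
    return best, robbed
```

The top-down pass makes the same local decision the recurrence assumed at each node, so the robbed values always sum to the returned best.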

Product Extension

Tree DP underlies code-review priority computation in commit-trees, expression-tree evaluation in compilers, hierarchical resource allocation (org charts, file systems), and game-tree value computation. The post-order-with-tuple pattern is a workhorse.

Language/Runtime Follow-ups

Python: tuple unpacking is idiomatic. Default recursion limit (1000) overflows on deep skewed trees; bump with sys.setrecursionlimit.
Java: return an int[2] (or a small class or record) holding {rob, skip}. Recursion depth is O(H). Whether a 10^4-deep skewed tree fits in the default thread stack depends on the JVM and platform, so keep the iterative version as a fallback.
Go: return (int, int) directly via multiple-return.
C++: pair<int,int> returned by value.
JS/TS: return [rob, skip] array.

Common Bugs

Returning only max(rob_v, skip_v) instead of the tuple — the parent then can’t distinguish the two cases and the recurrence is wrong.
Computing rob_v as v.val + skip_v — wrong, because skip_v already includes max(rob_child, skip_child) not just skip_child.
Forgetting v is None base case — null pointer / AttributeError.
Confusing which children’s value to sum: rob_v sums ls + rs (skip both children); skip_v sums max(lr, ls) + max(rr, rs).
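Iterative version: misunderstanding the pre-order-then-reverse trick. Popping root, then right, then left, and reversing gives a left-right-root post-order. For this DP, any order that finishes both children before their parent is enough, so the push order does not affect correctness. What does break it is reading a child's state before that child has been processed.
Assuming @lru_cache cannot take a node. A plain Python class without __eq__ hashes by identity, so caching f(v) directly on the node works. It fails with “unhashable type” only if TreeNode defines __eq__ without __hash__; in that case, key a dict on id(v).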

Debugging Strategy

For [3,2,3,null,3,null,1]:

Leaves: the 3 under node 2 gives (3, 0); the 1 under the right-hand 3 gives (1, 0).
Mid-left node (val 2, child 3): f(2-mid) = (2 + 0, 3) = (2, 3).
Mid-right (val 3, right child 1): f(3-right) = (3 + 0, 1) = (3, 1).
Root (val 3): f(root) = (3 + 3 + 1, max(2,3) + max(3,1)) = (7, 6). Answer: 7. ✓

If your tuple values diverge, print (rob, skip) per node in post-order and locate the first inconsistency.

Mastery Criteria

Recognized this as tree DP within 60 seconds.
Articulated the (rob, skip) tuple invariant in <30 seconds.
Wrote the tuple-return post-order in <5 minutes from blank screen.
Stated O(N) time and O(H) space.
Articulated why the tuple is necessary (parent needs both rob_child and skip_child) in <30 seconds.
Solved LC 337 unaided in <8 minutes.
Solved LC 124 (Binary Tree Max Path Sum) in <12 minutes using the same post-order pattern.
Solved LC 543 (Diameter) in <8 minutes.
Identified the brute-force redundancy (descend twice via children + grandchildren) without prompting.

Lab 09 — Interval DP (Burst Balloons)

Goal

Solve Burst Balloons (LC 312) with the four-stage progression. Internalize the “think backwards” trick: instead of asking “which balloon to burst first?” (which fragments the array), ask “which balloon is burst last in the interval (i, j)?” — that balloon’s left and right neighbors are guaranteed to be nums[i] and nums[j] (the surviving boundary). After this lab you can identify and solve interval DP problems in <15 minutes.

Background Concepts

Interval DP: state dp[i][j] is the answer over the subarray (or substring) from index i to j. Recurrence iterates over a “split point” k in (i, j) and combines two sub-intervals. Defining feature: iterate by interval length ascending, so all shorter intervals are computed before they’re needed.
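For contrast with Burst Balloons, here is a minimal sketch of the textbook split-point recurrence, Matrix Chain Multiplication. Assumptions: matrices are 1-indexed, matrix i has shape p[i-1] x p[i], and the function name is hypothetical. Here k splits [i, j] into two inclusive halves. Burst Balloons instead uses exclusive boundaries, which is what turns its k into a "last" choice rather than a split.

```python
def matrix_chain_cost(p: list[int]) -> int:
    """Minimum scalar multiplications to compute A1 x ... x An, where Ai is p[i-1] x p[i]."""
    n = len(p) - 1                                   # number of matrices
    if n < 1:
        return 0
    dp = [[0] * (n + 1) for _ in range(n + 1)]       # dp[i][j]: matrices i..j inclusive
    for length in range(2, n + 1):                   # interval length ascending
        for i in range(1, n - length + 2):
            j = i + length - 1
            dp[i][j] = min(
                dp[i][k] + dp[k + 1][j] + p[i - 1] * p[k] * p[j]   # (Ai..Ak)(Ak+1..Aj)
                for k in range(i, j)
            )
    return dp[1][n]
```

For p = [10, 30, 5, 60], the best order is (A1 A2) A3 at 1500 + 3000 = 4500 multiplications, against 27000 for A1 (A2 A3).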

Burst Balloons is famous because the naive “first to burst” formulation creates non-contiguous subproblems. The “last to burst” reformulation (think backwards) restores contiguity: in interval (i, j), if k is the last balloon to burst, then by the time we burst it, all of (i, k) and (k, j) have already been burst, and k’s left and right neighbors are exactly nums[i] and nums[j] (the original boundary).

Interview Context

Burst Balloons is a top-Hard interval DP problem at Google and Microsoft. It is the problem that teaches the “think backwards” trick. Candidates who fail to recognize the contiguity issue with the forward formulation (and then naively try memoization on subsets — which is 2^N states) get stuck. Senior interviewers love this problem precisely because the reformulation is non-obvious and tests insight, not memorization.

Other interval DP problems: Matrix Chain Multiplication (the textbook example; LC 1547 Minimum Cost to Cut a Stick has the same split-point shape), Stone Game (LC 877), Strange Printer (LC 664) and Remove Boxes (LC 546).

Problem Statement

You are given n balloons indexed 0 to n−1, each with a number nums[i] painted on it. You are asked to burst all the balloons. If you burst balloon i, you get nums[i-1] * nums[i] * nums[i+1] coins (use 1 if neighbor is out of bounds). After bursting, the neighbors become adjacent. Return the maximum coins you can collect.

Constraints

N == nums.length
1 ≤ N ≤ 500
0 ≤ nums[i] ≤ 100

Clarifying Questions

Are values non-negative? (Yes — given.)
After bursting, do neighbors become adjacent? (Yes — that’s the rule.)
What if a neighbor is out of bounds (edge balloon)? (Treat as 1.)
Must we burst all balloons? (Yes — the question asks max coins from bursting all.)
Are zero values allowed? (Yes — bursting a zero-balloon gives 0 coins.)

Examples

nums = [3, 1, 5, 8]    → 167
  Burst order: 1, 5, 3, 8.
  3 * 1 * 5 = 15;  3 * 5 * 8 = 120;  1 * 3 * 8 = 24;  1 * 8 * 1 = 8;   total 167.

nums = [1, 5]          → 10   (burst 5 → 1*5*1=5; burst 1 → 1*1*1=1; total 6.
                                Better: burst 1 first → 1*1*5=5; burst 5 → 1*5*1=5; total 10.)

nums = [9]             → 9
nums = [1]             → 1

Initial Brute Force

For each balloon, try bursting it first; recurse on the two halves. Note: this naive form has a fundamental contiguity issue — after bursting k first, the left and right halves’ boundary values change depending on which balloons remain, so the subproblems aren’t independent. We can still write it but it requires passing the current array (or the active set) down.

def burst_brute(nums):
    arr = [1] + nums + [1]
    def f(active):
        if not active: return 0
        best = 0
        for idx in range(len(active)):
            k = active[idx]
            left  = active[idx-1] if idx > 0 else 0
            right = active[idx+1] if idx+1 < len(active) else len(arr)-1
            gain = arr[left] * arr[k] * arr[right]
            best = max(best, gain + f(active[:idx] + active[idx+1:]))
        return best
    return f(list(range(1, len(arr) - 1)))

Brute Force Complexity

O(N!) — every permutation of bursts. At N=500, completely infeasible.

Optimization Path

The “first to burst” formulation cannot be memoized cleanly on (i, j) because the subproblems depend on what’s outside (i, j). Reframe: ask “in the final interval (i, j), which balloon k is burst last?”. By the time k is burst, all balloons in (i, k) and (k, j) have been burst — independently of each other. k’s neighbors at that moment are nums[i] and nums[j] (the surviving boundary). The recurrence becomes:

dp[i][j] = max over k in (i, j) of:
           dp[i][k] + nums[i] * nums[k] * nums[j] + dp[k][j]

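Pad nums with 1 at both ends (the padded array is called arr below) so that out-of-bounds neighbors are handled uniformly; i, k and j in the recurrence index this padded array. Iterate by interval length.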

Final Expected Approach

1. arr = [1] + nums + [1]  (length N+2)
2. dp[i][j] = max coins from bursting all balloons strictly between i and j (boundaries i, j untouched)
3. dp[i][i+1] = 0 (no balloons between i and i+1)
4. For length = j - i in 2..N+1:
       For i in 0..N+1-length:
           j = i + length
           dp[i][j] = max over k in (i, j) of dp[i][k] + arr[i]*arr[k]*arr[j] + dp[k][j]
5. Answer: dp[0][N+1]

Data Structures Used

2D dp[N+2][N+2].
Padded array arr of size N+2.

Correctness Argument

Claim: dp[i][j] = max coins from bursting all balloons in (i, j) (exclusive), assuming balloons at indices i and j are still present. Proof by induction on length.

Base: length 1 → (i, i+1) has no balloons strictly between → 0.

Inductive step: any optimal bursting order has a last balloon k in (i, j). When k is burst, all other balloons in (i, k) and (k, j) have already been burst, and k’s neighbors are arr[i] and arr[j] (because k is the last to go). The two subintervals (i, k) and (k, j) are independent — neither affects the other since they’re separated by k itself, which is alive until the end. So:

dp[i][k] = max coins from bursting (i, k) (boundaries i, k alive).
dp[k][j] = max coins from bursting (k, j) (boundaries k, j alive).
arr[i] * arr[k] * arr[j] = coins from bursting k last with neighbors i, j.

Sum and maximize over k. The induction works because both subintervals are strictly shorter than (i, j).

Complexity

Stage	Time	Space
Brute force	O(N!)	O(N)
Memoized	O(N³)	O(N²)
Tabulated	O(N³)	O(N²)
Space-optimized	O(N³)	O(N²) (no further reduction: dp[i][j] reads dp[i][k] and dp[k][j] for every k in (i, j))

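At N=500, N³ = 1.25 × 10^8, but the loops only visit triples i < k < j, about N³/6 ≈ 2.1 × 10^7 inner iterations. That is comfortable in compiled languages and noticeably slower in pure Python.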

Implementation Requirements

# ---- Stage 1: Brute force ----
def burst_brute(nums):
    arr = [1] + nums + [1]
    def f(active):
        if not active: return 0
        best = 0
        for idx in range(len(active)):
            k = active[idx]
            left  = active[idx-1] if idx > 0 else 0
            right = active[idx+1] if idx+1 < len(active) else len(arr) - 1
            gain = arr[left] * arr[k] * arr[right]
            best = max(best, gain + f(active[:idx] + active[idx+1:]))
        return best
    return f(list(range(1, len(arr) - 1)))

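# ---- Stage 2: Memoized (think-backwards reformulation) ----
# Recursion goes about N levels deep, and each level also adds a generator frame,
# so for N in the hundreds call sys.setrecursionlimit first.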
from functools import lru_cache
def burst_memo(nums):
    arr = [1] + nums + [1]
    @lru_cache(None)
    def f(i, j):
        if j - i < 2: return 0
        return max(
            f(i, k) + arr[i] * arr[k] * arr[j] + f(k, j)
            for k in range(i + 1, j)
        )
    return f(0, len(arr) - 1)

# ---- Stage 3: Tabulated 2D ----
def maxCoins(nums):
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n):
        for i in range(n - length):
            j = i + length
            best = 0
            for k in range(i + 1, j):
                cand = dp[i][k] + arr[i] * arr[k] * arr[j] + dp[k][j]
                if cand > best: best = cand
            dp[i][j] = best
    return dp[0][n-1]

# ---- Stage 4: No further space optimization (full 2D needed); presented as a tighter inner loop ----
def maxCoins_tight(nums):
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n):
        for i in range(n - length):
            j = i + length
            ai, aj = arr[i], arr[j]
            best = 0
            for k in range(i + 1, j):
                cand = dp[i][k] + ai * arr[k] * aj + dp[k][j]
                if cand > best: best = cand
            dp[i][j] = best
    return dp[0][n-1]

Tests

[3, 1, 5, 8] → 167.
[1, 5] → 10.
[9] → 9.
[1] → 1.
[] → 0.
[1, 1, 1] → 3.
N=100 random — performance smoke test.
Cross-check brute vs memo on N≤7.
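The last two test bullets can be automated with a small randomized harness. This is a minimal sketch using only the standard library; `burst_brute_small`, `burst_dp`, and `cross_check_burst` are hypothetical compact restatements of Stages 1 and 3 so the block runs on its own.

```python
import random

# Illustrative sketch; all function names are hypothetical.
def burst_brute_small(nums):
    # Exhaustive search: try every remaining balloon as the next one to burst.
    arr = [1] + nums + [1]
    def f(active):
        best = 0
        for idx, k in enumerate(active):
            left = active[idx - 1] if idx > 0 else 0
            right = active[idx + 1] if idx + 1 < len(active) else len(arr) - 1
            rest = active[:idx] + active[idx + 1:]
            best = max(best, arr[left] * arr[k] * arr[right] + f(rest))
        return best
    return f(list(range(1, len(arr) - 1)))

def burst_dp(nums):
    # Interval DP over gap length j - i, as in Stage 3.
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n):
        for i in range(n - length):
            j = i + length
            dp[i][j] = max(dp[i][k] + arr[i] * arr[k] * arr[j] + dp[k][j]
                           for k in range(i + 1, j))
    return dp[0][n - 1]

def cross_check_burst(trials=100, max_n=7, seed=0):
    rng = random.Random(seed)          # fixed seed keeps failures reproducible
    for _ in range(trials):
        nums = [rng.randint(0, 9) for _ in range(rng.randint(0, max_n))]
        assert burst_brute_small(nums) == burst_dp(nums), nums
    return True
```

On a mismatch the assertion message prints the offending input, which is usually small enough to trace by hand.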

Follow-up Questions

“Find the burst order.” → Track the argmax k in each dp[i][j]; recursively reconstruct.
“Each balloon has a different gain function (not multiplicative).” → Same DP shape, as long as the gain of bursting k depends only on k and its two final neighbors i and j, i.e., some g(i, k, j).
“Can we burst at most M balloons?” → Add a 3rd dimension dp[i][j][m].
“Stones Game family (LC 877).” → Interval DP with two players; dp[i][j] = max-score-difference.
“Matrix Chain Multiplication.” → Same shape: dp[i][j] = min over k of dp[i][k] + dp[k+1][j] + cost(i,k,j).
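The “find the burst order” follow-up can be sketched by storing the argmax k per interval and replaying it recursively. A minimal sketch; `max_coins_with_order` and `replay` are hypothetical names, and the returned order uses indices into the original nums.

```python
# Illustrative sketch; function names are hypothetical.
def max_coins_with_order(nums):
    """Return (best_score, burst_order) with burst_order as indices into nums."""
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    last = [[-1] * n for _ in range(n)]      # last[i][j] = argmax k for (i, j)
    for length in range(2, n):
        for i in range(n - length):
            j = i + length
            for k in range(i + 1, j):
                cand = dp[i][k] + arr[i] * arr[k] * arr[j] + dp[k][j]
                if last[i][j] == -1 or cand > dp[i][j]:
                    dp[i][j], last[i][j] = cand, k
    order = []
    def build(i, j):
        # k is burst after everything in (i, k) and (k, j) is gone
        if j - i < 2:
            return
        k = last[i][j]
        build(i, k)
        build(k, j)
        order.append(k - 1)                  # undo the padding shift
    build(0, n - 1)
    return dp[0][n - 1], order

def replay(nums, order):
    # Burst in the given order and total the coins, to verify a reconstruction.
    alive = list(range(len(nums)))
    total = 0
    for idx in order:
        p = alive.index(idx)
        left = nums[alive[p - 1]] if p > 0 else 1
        right = nums[alive[p + 1]] if p + 1 < len(alive) else 1
        total += left * nums[idx] * right
        alive.pop(p)
    return total
```

Because the subintervals are independent, finishing (i, k) entirely before (k, j) is one valid order among several; replay confirms it scores the optimum.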

Product Extension

Interval DP underlies optimal binary search tree construction, optimal parenthesization for matrix chains in linear algebra libraries, and pricing problems in algorithmic finance (“when to buy/sell a contract that opens an interval”). The “think backwards / last to act” reframing recurs in algorithmic game theory and contract design.

Language/Runtime Follow-ups

Python: at N=500 the innermost loop body runs about N³/6 ≈ 2 × 10^7 times; CPython may still TLE. PyPy, or hoisting arr[i] and arr[j] out of the k-loop as in Stage 4, helps.
Java/Go/C++: no concern at this size.
Memoization in Python: lru_cache(None) is fine; works with (i, j) int tuples.
Iterative version: the explicit triple-nested loop avoids the recursion and lru_cache overhead of the memoized version.

Common Bugs

Trying “first to burst” recurrence and memoizing on (i, j) — incorrect because the subproblems’ boundaries change as outer balloons are burst.
Forgetting to pad with 1 at both ends — out-of-bounds neighbors then need special-casing in every loop iteration.
Iterating row-major (i ascending outer, j ascending inner): dp[i][j] needs dp[k][j] for every i < k < j, and those later rows have not been filled yet. Iterate by interval length (or run i descending with j ascending).
Off-by-one in range(i + 1, j) — k must be strictly between i and j. Easy to write range(i, j) or range(i + 1, j + 1) and break the recurrence.
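Initializing best = -1 (or -inf): harmless in the tabulated loop, where the k-range is never empty, but it leaks a negative value into dp if an off-by-one ever makes the range empty. Every gain is non-negative, so 0 is the safe initial value.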

Debugging Strategy

For [3, 1, 5, 8] (padded to [1, 3, 1, 5, 8, 1]):

j - i = 1 (the length variable in Stage 3; no balloons strictly between): all dp = 0.
j - i = 2: dp[0][2] = arr[0]*arr[1]*arr[2] = 1*3*1 = 3. dp[1][3] = 3*1*5 = 15. dp[2][4] = 1*5*8 = 40. dp[3][5] = 5*8*1 = 40.
j - i = 3: e.g., dp[0][3] = max over k=1,2 of dp[0][k] + 1*arr[k]*5 + dp[k][3] = max(0 + 1*3*5 + 15, 3 + 1*1*5 + 0) = max(30, 8) = 30.
Continue up to dp[0][5] = 167.

Print dp row by row and verify against the trace.
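A minimal tracer for that check, assuming the Stage 3 fill order; `dump_dp` is a hypothetical helper that prints each row and returns the table so individual cells can be asserted against the trace.

```python
# Illustrative sketch; dump_dp is a hypothetical helper name.
def dump_dp(nums):
    arr = [1] + nums + [1]
    n = len(arr)
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n):              # length = j - i, as in Stage 3
        for i in range(n - length):
            j = i + length
            dp[i][j] = max(dp[i][k] + arr[i] * arr[k] * arr[j] + dp[k][j]
                           for k in range(i + 1, j))
    for i, row in enumerate(dp):
        print(f"dp[{i}]:", " ".join(f"{x:4d}" for x in row))
    return dp
```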

Mastery Criteria

Recognized the contiguity issue with “first to burst” within 90 seconds.
Articulated the “think backwards / last to burst” reformulation in <60 seconds.
Wrote the corrected recurrence on a whiteboard in <2 minutes.
Wrote the tabulated O(N³) solution in <8 minutes from blank screen.
Padded nums with sentinels correctly without prompting.
Iterated by interval length without prompting.
Stated O(N³) time and O(N²) space.
Solved LC 312 unaided in <20 minutes.
Solved Matrix Chain Multiplication in <12 minutes via the same template.

Lab 10 — Bitmask DP (Shortest Path Visiting All Nodes)

Goal

Solve Shortest Path Visiting All Nodes (LC 847) with both BFS over (node, mask) states and an iterative DP variant. Internalize bitmask state encoding: when N is small (≤ 20), the subset S ⊆ {0, …, N-1} fits in a single integer mask and a 1D array of size 2^N indexes all subsets. After this lab you can handle any “visit all / cover all / select subset” problem with N ≤ 20 in <15 minutes.

Background Concepts

Bitmask DP encodes a subset as an integer’s bits. For N=12, there are 2^12 = 4096 subsets — small enough that dp[mask][i] (mask × last-visited-node) has 4096 × 12 ≈ 50K states. Each state’s transition is O(N), giving O(N² · 2^N) total — feasible for N ≤ 16, manageable for N ≤ 20 with care.

Common bitmask DP patterns:

Visit-all / TSP-like: dp[mask][i] = min cost to visit nodes in mask ending at i. Final: min over i of dp[(1<<N)-1][i] (+ return-to-start cost for TSP).
Subset-cover: dp[mask] = best score selecting items whose indicator is mask.
Assignment problems: assign N people to N tasks with min total cost — dp[mask] where mask = set of tasks already assigned, with popcount(mask) people processed so far.
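The assignment pattern in the last bullet can be sketched as follows, assuming a square cost matrix cost[p][t]; `min_assignment_cost` is a hypothetical name. Person p = popcount(mask) is the next to be assigned, so the mask alone fixes the state.

```python
# Illustrative sketch; min_assignment_cost is a hypothetical name.
def min_assignment_cost(cost):
    """cost[p][t] = cost of giving task t to person p; returns the min total."""
    n = len(cost)
    INF = float('inf')
    dp = [INF] * (1 << n)          # dp[mask] = min cost with tasks in mask taken
    dp[0] = 0
    for mask in range(1 << n):     # numeric order: every submask comes first
        if dp[mask] == INF:
            continue
        p = bin(mask).count('1')   # people 0..p-1 already have tasks
        if p == n:
            continue
        for t in range(n):
            if not (mask & (1 << t)):
                nxt = mask | (1 << t)
                dp[nxt] = min(dp[nxt], dp[mask] + cost[p][t])
    return dp[(1 << n) - 1]
```

This is O(N · 2^N), versus N! for trying every assignment.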

LC 847 is unusual: it’s a shortest unweighted path problem (BFS), not a min-cost (Dijkstra) problem. So BFS over (node, mask) is the natural approach. We can also frame it as DP, but BFS is cleaner here.

Interview Context

Shortest Path Visiting All Nodes (LC 847) is a top-Hard graph + bitmask problem at Google and Meta. The bitmask-state technique recurs in: LC 943 (Find the Shortest Superstring), LC 1125 (Smallest Sufficient Team), LC 1349 (Maximum Students Taking Exam), LC 526 (Beautiful Arrangement). The trick of recognizing N ≤ 20 → bitmask is itself an interview heuristic: any time N is suspiciously small, consider bitmask.

Problem Statement

You have an undirected, connected graph of n nodes labeled from 0 to n − 1. graph[i] is the list of neighbors of node i. Return the length of the shortest path that visits every node. You may start and stop at any node, may revisit nodes, and may reuse edges.

Constraints

1 ≤ n ≤ 12
graph.length == n
0 ≤ graph[i].length < n
The graph is connected.

Clarifying Questions

Edges are undirected? (Yes — given.)
Edges weighted or unweighted? (Unweighted — count edges traversed.)
Can the same node be visited multiple times? (Yes.)
Can we start anywhere? (Yes — the answer minimizes over all starting nodes.)
Must the graph be connected? (Yes — given. Otherwise the answer is infeasible.)

Examples

graph = [[1,2,3],[0],[0],[0]]      → 4   (visit order: 1→0→2→0→3 or 2→0→1→0→3 etc.)
graph = [[1],[0,2,4],[1,3,4],[2],[1,2]]   → 4

graph = [[1],[0]]                  → 1   (just edge 0–1)
graph = [[]]                       → 0   (single node, already visited)

Initial Brute Force

Try every permutation of the nodes as the order of first visits; sum the shortest-path lengths between consecutive nodes (precomputed via BFS). N! permutations × O(N) work per permutation. At N=12, 12! ≈ 4.8 × 10^8 permutations, about 5.7 × 10^9 operations in total: far too slow.

from itertools import permutations
from collections import deque

def shortestPathLength_brute(graph):
    n = len(graph)
    if n == 1: return 0
    # Precompute pairwise shortest path lengths via BFS
    dist = [[float('inf')] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] == float('inf'):
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    best = float('inf')
    for perm in permutations(range(n)):
        cost = sum(dist[perm[i]][perm[i+1]] for i in range(n - 1))
        best = min(best, cost)
    return best

Brute Force Complexity

O(N! · N) time. At N=12, infeasible.

Optimization Path

Observation: at any point, the relevant state is (current_node, set_of_visited_nodes). There are N · 2^N such states. Two ways to solve:

BFS over states (canonical for unweighted): each state is a node in a meta-graph of (node, mask); transitions to (neighbor, mask | (1 << neighbor)). BFS gives shortest-path lengths to all states; the answer is the smallest distance to any state with mask = (1 << N) - 1. Time O(N · 2^N · degree) ≈ O(N² · 2^N).
Iterative DP (Held-Karp style for TSP — but TSP minimizes weighted paths; for unweighted with revisits the BFS is more natural).

We present BFS as the primary solution (canonical for LC 847) with the iterative-DP variant as a follow-up.

Final Expected Approach

1. Initialize a queue with all (node, 1 << node) states (one per starting node), distance 0.
2. BFS:
   - Pop (u, mask). If mask == ALL = (1<<N)-1, return current distance.
   - For each neighbor v of u:
       new_mask = mask | (1 << v)
       if (v, new_mask) not yet visited:
           mark, enqueue, distance + 1.

Data Structures Used

deque for BFS.
2D visited[node][mask] boolean (or a set of (node, mask) tuples).
For brute force: pairwise dist table (BFS-precomputed) and itertools.permutations.

Correctness Argument

The state graph has nodes (u, mask) and edges (u, mask) → (v, mask | (1 << v)) for every graph neighbor v of u. A path in the original graph that visits all nodes corresponds 1-to-1 to a path in the state graph from some starting state (s, 1 << s) to a “complete” state (t, ALL) for some t. Since the state-graph edges are unweighted, BFS finds the shortest such path. Multi-source BFS over all starting states minimizes over all start nodes simultaneously. Termination: the state graph has N · 2^N nodes; BFS visits each at most once.

Complexity

Stage	Time	Space
Brute force (perm + BFS dist)	O(N! · N + N · (N + E))	O(N²)
BFS over (node, mask)	O(N² · 2^N)	O(N · 2^N)
DP (popcount-ascending fill)	O(N² · 2^N)	O(N · 2^N)

At N=12: 12² · 4096 = 590K ops — fast.

Implementation Requirements

from collections import deque
from itertools import permutations

# ---- Stage 1: Brute force ----
def shortestPathLength_brute(graph):
    n = len(graph)
    if n == 1: return 0
    dist = [[float('inf')] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] == float('inf'):
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    best = float('inf')
    for perm in permutations(range(n)):
        cost = sum(dist[perm[i]][perm[i+1]] for i in range(n - 1))
        if cost < best: best = cost
    return best

# ---- Stage 2: Memoized DFS (top-down Held-Karp) over (node, mask) ----
# A DFS that moves only along raw graph edges and skips already-visited
# neighbors is wrong here: on the star [[1,2,3],[0],[0],[0]] every route must
# pass back through node 0, so that version returns inf. Moving between nodes
# via BFS-precomputed shortest-path distances fixes this: each step adds one
# new node to mask, so the recursion is acyclic and at most N deep.
from functools import lru_cache
def shortestPathLength_memo(graph):
    n = len(graph)
    ALL = (1 << n) - 1
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] == INF:
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    @lru_cache(None)
    def f(u, mask):
        # min edges still needed to reach every node outside mask, starting at u
        if mask == ALL: return 0
        return min(dist[u][v] + f(v, mask | (1 << v))
                   for v in range(n) if not (mask & (1 << v)))
    # Try every starting node
    return min(f(s, 1 << s) for s in range(n))

# ---- Stage 3: BFS over (node, mask) — canonical solution ----
def shortestPathLength(graph):
    n = len(graph)
    if n == 1: return 0
    ALL = (1 << n) - 1
    visited = set()
    q = deque()
    for s in range(n):
        state = (s, 1 << s)
        visited.add(state)
        q.append((s, 1 << s, 0))
    while q:
        u, mask, dist = q.popleft()
        if mask == ALL: return dist
        for v in graph[u]:
            new_mask = mask | (1 << v)
            state = (v, new_mask)
            if state not in visited:
                visited.add(state)
                q.append((v, new_mask, dist + 1))
    return -1  # unreachable; should not happen for connected graphs

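# ---- Stage 4: Iterative DP (bottom-up Held-Karp), masks filled in popcount order ----
def shortestPathLength_dp(graph):
    n = len(graph)
    if n == 1: return 0
    ALL = (1 << n) - 1
    INF = float('inf')
    # All-pairs shortest path lengths via one BFS per source
    dist = [[INF] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[src][v] == INF:
                    dist[src][v] = dist[src][u] + 1
                    q.append(v)
    # dp[mask][u] = min total distance of an ordering of the nodes in mask that
    # ends at u (consecutive stops joined by shortest paths), any start allowed
    dp = [[INF] * n for _ in range(1 << n)]
    for s in range(n):
        dp[1 << s][s] = 0
    # Each transition adds one bit, so processing masks by ascending popcount
    # finalizes every mask before it is read (plain numeric order also works)
    for mask in sorted(range(1, 1 << n), key=lambda m: bin(m).count('1')):
        for u in range(n):
            if dp[mask][u] == INF: continue
            for v in range(n):
                if mask & (1 << v): continue
                new_mask = mask | (1 << v)
                cand = dp[mask][u] + dist[u][v]
                if cand < dp[new_mask][v]:
                    dp[new_mask][v] = cand
    return min(dp[ALL])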

Tests

[[1,2,3],[0],[0],[0]] → 4.
[[1],[0,2,4],[1,3,4],[2],[1,2]] → 4.
[[1],[0]] → 1.
[[]] → 0 (single node).
N=12 with sparse and dense connectivity — performance smoke.
Cross-check brute vs BFS on N≤6 random connected graphs.
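The random cross-check can be sketched as below, assuming a graph generator that builds a random spanning tree (guaranteeing connectivity) plus optional extra edges. All three functions are hypothetical compact restatements of Stage 1 and Stage 3 so the block is self-contained.

```python
import random
from collections import deque
from itertools import permutations

# Illustrative sketch; all function names are hypothetical.
def random_connected_graph(n, rng, extra_edge_prob=0.3):
    edges = set()
    for v in range(1, n):                  # random spanning tree
        edges.add((rng.randrange(v), v))
    for u in range(n):                     # optional extra edges
        for v in range(u + 1, n):
            if rng.random() < extra_edge_prob:
                edges.add((u, v))
    graph = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)
    return graph

def brute_path_length(graph):
    n = len(graph)
    dist = [[float('inf')] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if dist[s][v] == float('inf'):
                    dist[s][v] = dist[s][u] + 1
                    q.append(v)
    return min(sum(dist[p[i]][p[i + 1]] for i in range(n - 1))
               for p in permutations(range(n)))

def bfs_path_length(graph):
    n = len(graph)
    full = (1 << n) - 1
    seen = {(s, 1 << s) for s in range(n)}
    q = deque((s, 1 << s, 0) for s in range(n))
    while q:
        u, mask, d = q.popleft()
        if mask == full:
            return d
        for v in graph[u]:
            state = (v, mask | (1 << v))
            if state not in seen:
                seen.add(state)
                q.append((v, state[1], d + 1))
    return -1

def cross_check_walk(trials=300, max_n=6, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        g = random_connected_graph(rng.randint(1, max_n), rng)
        assert brute_path_length(g) == bfs_path_length(g), g
    return True
```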

Follow-up Questions

“What if edges are weighted?” → Replace BFS with Dijkstra (priority queue) over the same (node, mask) states: O(2^N · (N + E) · log(N · 2^N)), since the state graph has N · 2^N nodes and 2^N · 2E arcs.
“Must return to start (TSP closed tour).” (Held-Karp) → Compute dp[mask][u] = min cost from start to u visiting mask; final answer min over u of dp[ALL][u] + dist[u][start].
“Can revisit, weighted, must visit all.” → Floyd-Warshall preprocess to get all-pairs shortest paths, then Held-Karp on the dense complete graph induced by those distances.
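“N up to 20: does this still fit?” → 2^20 ≈ 10^6, so N²·2^N ≈ 4×10^8. Borderline; need bit tricks and tight inner loops, often C++/Java only.
“Hamiltonian path (visit each node exactly once).” → Same (node, mask) states, but only move along graph edges to unvisited nodes (v not in mask); a Hamiltonian path exists iff some state (u, ALL) is reachable. NP-hard in general, but the O(N² · 2^N) DP handles N ≤ 20.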
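The weighted follow-up can be sketched as Dijkstra over (node, mask) states. This assumes non-negative weights and an adjacency format adj[u] = list of (v, w) pairs; `shortest_walk_weighted` is a hypothetical name.

```python
import heapq

# Illustrative sketch; name and adjacency format are assumptions.
def shortest_walk_weighted(n, adj):
    """Min total weight of a walk visiting every node, starting and ending anywhere."""
    full = (1 << n) - 1
    INF = float('inf')
    dist = {}
    heap = []
    for s in range(n):                       # multi-source start, as in the BFS
        dist[(s, 1 << s)] = 0
        heap.append((0, s, 1 << s))
    heapq.heapify(heap)
    while heap:
        d, u, mask = heapq.heappop(heap)
        if d > dist.get((u, mask), INF):
            continue                         # stale heap entry
        if mask == full:
            return d                         # first complete state popped is optimal
        for v, w in adj[u]:
            new_mask = mask | (1 << v)
            nd = d + w
            if nd < dist.get((v, new_mask), INF):
                dist[(v, new_mask)] = nd
                heapq.heappush(heap, (nd, v, new_mask))
    return -1                                # unreachable for a connected graph
```

With every weight set to 1 this returns the same answers as the BFS.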

Product Extension

The bitmask DP / Held-Karp algorithm is the gold-standard exact solution for small TSP-like problems. Real applications: drone delivery routing for ≤ 20 stops, layout optimization in chip design, scheduling N jobs on a single machine with sequence-dependent setup times, optimal-question-ordering in adaptive testing.

Language/Runtime Follow-ups

Python: bit operations (|, &, <<) on ints are arbitrary-precision; (1 << N) - 1 works for any N. Use bin(mask).count('1') for popcount, or mask.bit_count() in Python 3.10+.
Java: Integer.bitCount(mask). Use int for N ≤ 31, long for N ≤ 63.
Go: bits.OnesCount(uint(mask)) from math/bits.
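C++: __builtin_popcount(mask) (GCC/Clang), or std::popcount on an unsigned value in C++20 (<bit>). Compiles to a single POPCNT instruction when the target supports it (e.g., -mpopcnt or a suitable -march).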
JS/TS: bitwise ops are 32-bit signed; for N > 30 use BigInt (slower).

Common Bugs

Forgetting to multi-source initialize the BFS — a single starting node misses better starts.
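Skipping a neighbor just because it is already in mask (new_mask == mask). Routes often must transit through visited nodes (on the star [[1,2,3],[0],[0],[0]] every route returns through node 0), so such moves are legal; skip a move only when the state (v, new_mask) itself has already been visited. Keying visited on (v, mask) handles this automatically.
Precedence slips around shifts: in C/C++/Java/JS/Python, << binds tighter than | and &, so mask | 1 << v happens to mean mask | (1 << v), but 1 << n - 1 means 1 << (n - 1), not (1 << n) - 1. In C/C++ and JS, == also binds tighter than &, so mask & 1 << v == 0 parses as mask & ((1 << v) == 0); in Java it fails to compile. Always parenthesize: (mask & (1 << v)) == 0.
Checking for the goal only when enqueueing neighbors: the start states are never checked, so N = 1 (where (0, 1) is already complete) breaks unless special-cased. Checking at dequeue time covers it; an enqueue-time check is otherwise valid in unweighted BFS.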
set of tuples performance: in Python, a 2D bool array is faster than a set for the visited check at high N. Use [[False]*N for _ in range(1<<N)].

Debugging Strategy

For [[1,2,3],[0],[0],[0]] (N=4; masks written with node 0 as the rightmost bit): BFS starts from (0,0001), (1,0010), (2,0100), (3,1000) at distance 0. One shortest chain: (1,0010) → (0,0011) at distance 1 → (2,0111) at distance 2 → (0,0111) at distance 3 → (3,1111) at distance 4, which is complete, so BFS returns 4 when it dequeues that state. (States such as (1,0011) are first reached at distance 1 from (0,0001), so they are not enqueued again later.) Print (u, bin(mask), dist) per dequeue and locate where your trace diverges.
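A minimal instrumented BFS for this trace; `shortest_path_length_traced` is a hypothetical name. It prints each dequeued state with the mask zero-padded to N bits (node 0 rightmost) and also returns the log so it can be asserted against.

```python
from collections import deque

# Illustrative sketch; the function name is hypothetical.
def shortest_path_length_traced(graph):
    n = len(graph)
    full = (1 << n) - 1
    seen = {(s, 1 << s) for s in range(n)}
    q = deque((s, 1 << s, 0) for s in range(n))
    log = []
    while q:
        u, mask, d = q.popleft()
        entry = (u, format(mask, f'0{n}b'), d)   # node 0 is the rightmost bit
        log.append(entry)
        print(*entry)
        if mask == full:
            return d, log
        for v in graph[u]:
            state = (v, mask | (1 << v))
            if state not in seen:
                seen.add(state)
                q.append((v, state[1], d + 1))
    return -1, log
```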

Mastery Criteria

Recognized “N ≤ 12, visit all” as bitmask DP within 60 seconds.
Stated the state space (node, mask) and its size N · 2^N in <30 seconds.
Wrote the multi-source BFS in <8 minutes from blank screen.
Articulated why multi-source initialization is correct (start anywhere).
Stated O(N² · 2^N) time complexity.
Solved LC 847 unaided in <20 minutes.
Articulated the difference between BFS-over-states (unweighted) and Held-Karp (weighted/TSP) in <60 seconds.
Wrote mask.bit_count() / __builtin_popcount correctly without prompting.
Solved LC 526 (Beautiful Arrangement) in <12 minutes using dp[mask] over assignments.

Phase 6 — Greedy, Proofs & Mathematical Thinking

Target level: Medium → Hard
Expected duration: 1.5 weeks (12-week track) / 2 weeks (6-month track) / 2.5 weeks (12-month track)
Weekly cadence: ~7 greedy concepts plus 6 labs plus 25–40 problems applying them under the framework

Why Greedy Is The Single Most Dangerous Pattern Family In Coding Interviews

Greedy is the topic where the largest number of candidates fail confidently. Unlike dynamic programming, where the failure mode is “I cannot derive the recurrence” — a visible failure that the interviewer can help with — greedy’s failure mode is “I have a plausible algorithm, I have run it on the given example, it works, and I am wrong.” The candidate writes a clean function, the test cases pass, the complexity is excellent, and the algorithm is silently incorrect on the third hidden test. By the time the interviewer reveals the counterexample, the candidate has consumed 25 minutes building a wrong solution and has 10 minutes left to either patch it (impossible without re-deriving) or restart with DP (also impossible).

The empirical claim that drives this entire phase:

The hard part of greedy is not the algorithm. The hard part is the proof of correctness. Almost every wrong greedy solution is wrong because the candidate convinced themselves “this seems to work” without an exchange argument or invariant — and an interviewer who suspects the candidate is guessing will deliberately construct a counterexample. A candidate who can produce an exchange argument out loud, before the interviewer asks, is signalling “I know what I’m doing”; a candidate who can’t is signalling “I memorized this”.

Greedy is also the topic where the gap between a good engineer and a great one is widest in interview signal. Most candidates can write sort + scan + counter. Very few can articulate why the sort criterion is correct. The ability to say, in 60 seconds, “Suppose for contradiction the optimal solution uses a different first choice; I can exchange it with my greedy choice without making the solution worse, therefore my greedy is also optimal” — that one paragraph is the entire difference between an L4 hire and an L5 hire on greedy questions.

This phase is built around one teaching device that we will use on every single problem from start to finish: the proof comes before the code. Every lab in this phase requires you to write the correctness argument — exchange argument, invariant, or monovariant — before you write the implementation. The implementation is mechanical once the proof is solid; the proof is the whole skill.

After this phase, you can recognize when a problem is amenable to greedy in <2 minutes, produce an exchange argument out loud in <90 seconds, write the algorithm in <5 minutes, and — crucially — identify when a problem looks greedy but is not, falling back to DP from Phase 5 without panic.

What You Will Be Able To Do After This Phase

Recognize greedy candidates in <2 minutes by spotting the greedy choice property signal: “the locally optimal choice cannot hurt the global optimum.”
Distinguish greedy-applicable problems from DP-required problems on first read, using the Greedy-vs-DP flowchart.
Produce an exchange argument for any greedy you propose, in the canonical four-step form (assume optimal differs, locate first divergence, exchange, prove no-worse).
Cite the cut property for MST correctness and explain why both Kruskal and Prim are correct under it.
Use loop invariants to scaffold proofs of greedy algorithms whose correctness is not obvious from a single exchange.
Use monovariants (strictly-decreasing or strictly-increasing quantities) to prove termination and correctness of iterative greedy algorithms.
Apply amortized analysis (potential method, accounting method, aggregate analysis) to bound the cost of greedy data-structure operations.
Identify counterexamples to plausible-looking greedy heuristics, including the canonical “0/1 knapsack ≠ fractional knapsack” trap.
Implement and prove correct: interval scheduling, jump game II, task scheduler, gas station, Huffman coding, and the greedy-vs-DP comparison on coin change.
Articulate the failure modes of greedy unprompted: missing counterexample, “feels right” without proof, confusing local optimum with global optimum.

How To Read This Phase

Read this README in two passes. Pass 1: linear, end-to-end, building the mental discipline that “I will not ship a greedy solution without an exchange argument.” Do this in one sitting. Pass 2: as you work the labs, refer back to specific concept entries when stuck on a proof.

Each concept entry has a fixed shape:

Precise Definition — what the concept means, mathematically.
When Applicable — the problem signal that should fire this concept.
Worked Example — the concept applied to a canonical problem, end-to-end.
Common Misuse — the concrete failure mode this concept guards against.

The phase ends with a Greedy-vs-DP flowchart, a Common Greedy Bugs catalog, a Mastery Checklist, and Exit Criteria.

Inline Concept Reference

1. Greedy Choice Property

Precise Definition

A problem has the greedy choice property if there exists an ordering of the input (a greedy rule) such that, after making the locally optimal choice (the “greedy choice”) at each step, the result is a globally optimal solution. Formally: if c_1, c_2, …, c_i are the first i choices made by the greedy rule, then for every i some globally optimal solution to the original problem extends c_1, c_2, …, c_i. Equivalently, after the greedy choice the remaining problem is a smaller instance of the same problem, and combining the greedy choice with any optimal solution to the residual problem yields an optimal solution to the original.

Greedy still needs optimal substructure (which DP also requires); the greedy choice property is the additional claim that a single locally optimal choice, rather than a search over choices, suffices at each step.

When Applicable

The greedy choice property holds when:

The problem can be solved by a sequence of irreversible decisions.
At each step, there is a most attractive choice by some natural metric (earliest deadline, smallest weight, largest ratio, latest start time).
An exchange argument or cut property can be invoked to prove that the most-attractive choice is never the wrong one.

The greedy choice property does not hold when:

The right choice at step i depends on choices made at step i+1, i+2, … (i.e., you must “look ahead” to decide). This is the DP regime.
There are multiple incomparable “locally optimal” candidates and the wrong one creates suboptimal residual problems.

Worked Example: Activity Selection

Given n activities with start and end times, pick the maximum number of non-overlapping activities.

The greedy choice: pick the activity with the earliest end time among those still compatible with the previous picks. This satisfies the greedy choice property because: any optimal solution either contains the earliest-ending activity, or — if it doesn’t — we can swap its first picked activity for the earliest-ending one without overlap and without changing the count, producing a new optimal solution that does. (See Lab 01 — Interval Scheduling for the full exchange argument.)
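The earliest-end rule translates directly into sort + scan. A minimal sketch, assuming activities are (start, end) pairs and that an activity may start exactly when the previous one ends; `max_non_overlapping` is a hypothetical name.

```python
# Illustrative sketch; name and the touching-intervals convention are assumptions.
def max_non_overlapping(activities):
    """activities: list of (start, end). Returns the chosen activities."""
    chosen = []
    last_end = float('-inf')
    for start, end in sorted(activities, key=lambda a: a[1]):   # earliest end first
        if start >= last_end:          # compatible with everything chosen so far
            chosen.append((start, end))
            last_end = end
    return chosen
```

The sort dominates: O(n log n) time.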

Common Misuse

The most common error is to assume the greedy choice property without proving it, on the basis that “earliest deadline first feels intuitive”. Counter-examples are everywhere; for example, “earliest start time” also feels intuitive but is wrong (consider one activity from 1 to 100 versus dozens of short activities from 2 to 3, 3 to 4, …). The discipline: every greedy claim must be paired with a proof.
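The counterexample is easy to check mechanically. This sketch runs the same sort-and-scan greedy under two sort keys on one long activity plus two dozen short ones; `greedy_by` and the concrete times are illustrative assumptions.

```python
# Illustrative sketch; greedy_by and the activity times are hypothetical.
def greedy_by(activities, key):
    # Generic "sort by key, take if compatible" greedy; only the key differs.
    chosen, last_end = [], float('-inf')
    for start, end in sorted(activities, key=key):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen

acts = [(1, 100)] + [(t, t + 1) for t in range(2, 26)]   # 1 long + 24 short
by_start = greedy_by(acts, key=lambda a: a[0])   # takes (1, 100), then nothing fits
by_end = greedy_by(acts, key=lambda a: a[1])     # takes all 24 short activities
print(len(by_start), len(by_end))                # prints: 1 24
```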

2. The Exchange Argument — The Canonical Greedy Proof Technique

Precise Definition

An exchange argument proves that a greedy solution G is optimal by showing that any other solution O can be transformed into G without increasing cost (or decreasing value) via a sequence of exchanges. Each exchange replaces an element of O with the corresponding element of G and proves the swap is non-worsening. After all exchanges, O has been transformed into G, so G is at least as good as O. Since O was arbitrary, G is at least as good as any solution — i.e., G is optimal.

Step-By-Step Recipe

The recipe is rigid. Memorize it. Use it on every greedy proof.

Let G be the greedy solution. Define it precisely (e.g., “the activities chosen by earliest-end-time-first”).
Let O be an arbitrary optimal solution. Assume O ≠ G (otherwise we’re done).
Locate the first index where they differ. Let i be the smallest index such that O[i] ≠ G[i]. By construction, G[0..i-1] = O[0..i-1].
Perform the exchange. Replace O[i] with G[i], producing a new solution O'.
Prove O' is feasible (still satisfies all constraints).
Prove O' is no worse than O (its objective value is equal, or better in the direction being optimized).
Repeat. O' agrees with G on more positions than O did. Iterate; after finitely many exchanges, O has been transformed into G. Therefore G is optimal.

Worked Example: Activity Selection

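Greedy G = activities sorted by end time, picked greedily. Suppose O ≠ G is optimal, with both listed in order of end time. Let i be the first divergence. By the greedy rule, G[i] has the earliest end time among activities compatible with G[0..i-1] = O[0..i-1]; O[i] is one such activity, so G[i].end ≤ O[i].end. Replace O[i] with G[i]: feasibility holds because every activity in O[i+1..] starts no earlier than O[i].end ≥ G[i].end, so none of them overlaps G[i]. The objective (count of activities) is unchanged. Repeat until O agrees with G on every position G has. Finally, O cannot be longer than G: if it were, O[|G|] would start after the last greedy pick ends and be compatible with all of G, so the greedy would have picked it. Hence |G| = |O| and G is optimal. QED.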

Common Misuse

Skipping step 5 (feasibility). Many “exchanges” produce an infeasible solution, invalidating the proof.
Skipping step 6 (no-worse-than). The exchange must be non-worsening, not merely “different”.
Stopping after one exchange. A single exchange shows G[0] = O'[0]; you must iterate.
Picking the wrong index to exchange. Exchanging at the last difference rather than the first often fails because residual structure differs.
Treating the exchange as a swap of types rather than concrete elements. Exchange a specific element of O with a specific element of G, not “exchange the early-ending one with the late-ending one”.

3. The Cut Property (MST Correctness)

Precise Definition

For a connected, weighted, undirected graph G = (V, E), the cut property states: for any cut (S, V\S) (a partition of vertices into two non-empty sets), the minimum-weight edge crossing the cut belongs to some minimum spanning tree of G. (If the minimum is unique, it belongs to every MST; if tied, at least one MST contains it.)

When Applicable

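The cut property is the correctness theorem for greedy MST algorithms: Kruskal’s, Prim’s, Borůvka’s. It also generalizes to matroid theory: on an independence system, the greedy algorithm finds an optimal-weight independent set for every choice of weights iff the system is a matroid (the Rado–Edmonds theorem). The edge sets of spanning forests form a matroid, which is why greedy works for MST.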

Worked Example: Kruskal’s Algorithm

Kruskal sorts edges ascending by weight and adds each edge that doesn’t create a cycle. Correctness uses the cut property in its “safe edge” form: if the edge set A chosen so far lies inside some MST and no edge of A crosses a cut, then adding a minimum-weight edge crossing that cut keeps A inside some MST. When Kruskal adds edge (u, v), union-find reports that u and v are in different components. Let S be u’s current component; v lies in V\S, so (u, v) crosses the cut (S, V\S), and no edge of A crosses it. Every lighter edge has already been scanned; a lighter edge crossing this cut joined two different components when it was scanned (components only merge), so it would have been added and S would not still be a component. So (u, v) is a minimum-weight crossing edge, and adding it keeps A inside some MST. By induction, Kruskal’s edge set is always a subset of some MST. At the end it is a spanning tree (the graph is connected) contained in an MST, so it is an MST.

(Phase 4’s MST labs cover this in algorithmic detail; this phase’s job is the proof.)
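For reference while reading the proof, a compact Kruskal with union-find, annotated at the point where the cut property is invoked. A sketch, assuming edges given as (w, u, v) tuples on nodes 0..n-1; `kruskal` here is an illustrative function, not a library API.

```python
# Illustrative sketch; the edge format (w, u, v) is an assumption.
def kruskal(n, edges):
    """Returns (total_weight, chosen_edges); a spanning forest if disconnected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):           # ascending weight
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                         # same component: would close a cycle
        # Cut: (u's component, everything else). Any lighter crossing edge would
        # already have been taken, so (u, v) is a minimum crossing edge: safe.
        parent[ru] = rv
        total += w
        chosen.append((u, v, w))
    return total, chosen
```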

Common Misuse

Applying the cut property to directed graphs. It only applies to undirected MST.
Assuming the MST is unique. Tie-breaking matters; multiple MSTs may exist.
Forgetting connectedness. On a disconnected graph, you compute a minimum spanning forest, not an MST.

4. Loop Invariants (Proof Scaffolding)

Precise Definition

A loop invariant is a property P(state) that holds before the loop, after every iteration, and after the loop terminates. To prove a loop is correct, show:

Initialization — P holds before the first iteration (i.e., on the initial state).
Maintenance — if P holds at the start of iteration k, then P holds at the end of iteration k.
Termination — when the loop exits, P (combined with the exit condition) implies the desired postcondition.

Loop invariants are the workhorse of proving greedy algorithms whose correctness isn’t a single one-shot exchange argument — they’re scaffolding for “this thing stays true throughout the run”.

Worked Example: Gas Station (LC 134)

Greedy: scan stations once, maintain tank = 0 and start = 0. At station i, tank += gas[i] - cost[i]. If tank < 0, set start = i + 1 and reset tank = 0. Final answer: start if total gas ≥ total cost, else -1.

Loop invariant: at the end of iteration i, tank equals the net fuel gas[k] - cost[k] summed over k in [start, i] (0 if start was just reset to i + 1), every partial sum from start up to i is non-negative, and no station in [0, start) is a valid starting point.

(Full proof in Lab 04 — Gas Station.)
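The one-pass greedy described above, as a minimal sketch; `can_complete_circuit` is an illustrative name. The comments mark where each part of the invariant is maintained.

```python
# Illustrative sketch; the function name is hypothetical.
def can_complete_circuit(gas, cost):
    total = 0   # net fuel over the whole loop: decides feasibility
    tank = 0    # net fuel since the current candidate start
    start = 0
    for i in range(len(gas)):
        net = gas[i] - cost[i]
        total += net
        tank += net
        if tank < 0:
            # No station in [start, i] can be a valid start: restart after i.
            start = i + 1
            tank = 0
    return start if total >= 0 else -1
```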

Common Misuse

Inventing an invariant after the fact that conveniently equals “the answer is correct”. The invariant must be precisely statable independent of the conclusion.
Failing to prove maintenance — usually because the invariant is too weak (doesn’t survive one iteration) or too strong (false at initialization).
Skipping termination — the invariant might hold every iteration but the loop might not terminate at all, or might terminate in a state that doesn’t imply the postcondition.

5. Monovariants (Termination Arguments)

Precise Definition

A monovariant is a quantity that strictly increases (or strictly decreases) with every iteration of an algorithm and is bounded above (or below) by a known value. The existence of a monovariant proves termination: a strictly decreasing integer-valued quantity bounded below by 0 can decrease at most as many times as its initial value, so the loop runs at most that many iterations.

In greedy proofs, monovariants are also used to prove progress: each iteration makes irreversible progress toward the goal, so we never need to undo a choice.

Worked Example: Jump Game II (LC 45)

Greedy with two pointers: current_end and farthest. Scan i from 0 to n - 2 (never jump from the last index); for each index, update farthest = max(farthest, i + nums[i]). When i == current_end, jump: jumps += 1, current_end = farthest.

Monovariant: the scan index i strictly increases and is bounded by n - 2, so the loop terminates, and each jump happens at a distinct i. For progress: whenever i == current_end < n - 1, reachability of the last index forces farthest ≥ i + 1 > current_end, so current_end strictly increases at every jump and never moves backward. farthest itself is only non-decreasing; it is bookkeeping, not the termination measure. Together these guarantee we never need to backtrack.

(Full proof in Lab 02 — Jump Game II.)
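The two-pointer greedy as a minimal sketch, assuming (as LC 45 guarantees) that the last index is reachable; `jump` is an illustrative name.

```python
# Illustrative sketch; assumes the last index is reachable.
def jump(nums):
    """Min jumps from index 0 to index len(nums) - 1."""
    jumps = 0
    current_end = 0   # farthest index reachable with `jumps` jumps
    farthest = 0      # farthest index reachable with `jumps + 1` jumps
    for i in range(len(nums) - 1):        # never jump from the last index
        farthest = max(farthest, i + nums[i])
        if i == current_end:
            jumps += 1
            current_end = farthest        # strictly larger, since farthest >= i + 1
    return jumps
```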

Common Misuse

Confusing “increasing” with “non-decreasing”. A non-decreasing monovariant doesn’t prove termination — the algorithm might loop indefinitely with the quantity stuck. Use strictly increasing/decreasing.
Using a real-valued monovariant without an explicit lower bound on the rate of change. Real values can decrease toward an infimum without ever reaching it (Zeno’s paradox in algorithm form).
Treating monovariant as a correctness proof when it’s only a termination proof. Termination + invariant gives correctness; one of them alone does not.

6. Amortized Analysis

Amortized analysis bounds the cost of a sequence of operations by an average per-operation cost, even when individual operations may be expensive. It is essential for proving the cost of greedy data-structure operations — union-find with path compression, dynamic arrays, splay trees — and shows up in Phase 3 and Phase 7. We cover the three classical methods here.

6a. Aggregate Analysis

Bound the total cost of n operations by some function T(n), then divide: the amortized cost per operation is T(n) / n. The method is simple to apply. The hard part is deriving a tight T(n).

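Example. A dynamic array (Python list, Java ArrayList) grows its capacity geometrically when full. The real growth factors differ (about 1.5× for ArrayList, smaller for CPython’s list), so assume exact doubling from capacity 1. n push operations cost n for the actual writes plus the copies made at each resize. When n is a power of two, the copies total 1 + 2 + 4 + … + n/2 = n − 1 < n, so the total is below 2n. For general n, the copies total less than 2n, so the total is below 3n. Either way, the amortized cost per push is O(1).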
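A direct count is a useful sanity check on the aggregate bound. This sketch assumes a cost model of 1 per element written or copied, an initial capacity of 1, and exact doubling. push_costs is a hypothetical helper name.

```python
def push_costs(n):
    """Actual cost of each of n pushes into a doubling array (1 per write, 1 per element copied)."""
    size, capacity, costs = 0, 1, []
    for _ in range(n):
        cost = 1                       # write the new element
        if size == capacity:           # full: allocate twice the space, copy everything over
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs

if __name__ == "__main__":
    for n in (8, 9, 1024, 1025):
        total = sum(push_costs(n))
        print(f"n={n:5d}  total={total:5d}  per push={total / n:.3f}")
```

The per-push figure stays below 3 for every n. It is just under 2 when n is a power of two, and close to 3 right after a doubling (n = 1025 totals 3072).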

6b. The Accounting Method

Charge each operation a fixed amount (the “amortized cost”) which may exceed its actual cost. The excess is stored as “credit” on data-structure elements. Expensive operations pay using accumulated credits, never going into debt. If you can maintain “credits ≥ 0” as an invariant, the amortized cost is a valid upper bound.

Example. Dynamic array push: charge 3 per push. The write costs 1, and the remaining 2 are stored as credit on the just-written element. When a full array of capacity m doubles, the m/2 elements pushed since the previous doubling each hold 2 credits. One credit pays for copying that element. The other pays for copying one of the m/2 older elements, whose credits were spent at the previous doubling. Credits never go negative, so the amortized cost per push is O(1).
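The credit invariant can be simulated. This sketch keeps all credit in a single bank balance instead of on individual elements. That is a simplification, since the bound only needs the balance to stay non-negative. It uses the same assumed cost model: 1 per write, 1 per copied element, doubling from capacity 1. min_bank_balance is a hypothetical name.

```python
def min_bank_balance(n, charge=3):
    """Accounting method for a doubling array (capacity starts at 1), tracked as one bank total.

    Each push deposits `charge`; actual work (1 per write, 1 per copied element) is paid
    from the bank. Returns the lowest balance seen after any push.
    """
    size, capacity, bank = 0, 1, 0
    lowest = float("inf")
    for _ in range(n):
        bank += charge                 # flat amortized charge for this push
        if size == capacity:           # doubling: pay 1 credit per element copied
            bank -= size
            capacity *= 2
        bank -= 1                      # pay for the write itself
        size += 1
        lowest = min(lowest, bank)
    return lowest
```

With charge=3, the balance never drops below 2. With charge=2, it goes negative at the fifth push, so 2 is not a valid amortized charge under this cost model.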

6c. The Potential Method

Define a potential function Φ(D) over data-structure states D, with Φ(D₀) = 0 initially and Φ(D) ≥ 0 always. The amortized cost of an operation is actual_cost + ΔΦ. Total amortized cost over n operations is Σ actual_cost + Φ(D_n) - Φ(D₀) ≥ Σ actual_cost, so it’s a valid upper bound.

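Example. Dynamic array: let Φ = 2 · size − capacity. Immediately after a doubling, and before the new element is written, size = capacity / 2, so Φ = 0. Since the array is never less than half full once allocated, Φ ≥ 0 throughout. A push that doesn’t trigger doubling has actual cost 1 and ΔΦ = +2, for an amortized cost of 3. A doubling push on a full array of size m has actual cost m + 1 (copy m elements, write one). Φ goes from m to 2(m + 1) − 2m = 2, so ΔΦ = 2 − m and the amortized cost is 3. The amortized cost per push is therefore a constant O(1).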
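The potential-method arithmetic can be checked push by push. This sketch assumes the array starts empty with capacity 0 (so Φ(D₀) = 0) and that the first push allocates capacity 1. Every later resize doubles. potential_trace is a hypothetical name.

```python
def potential_trace(n):
    """Per-push (actual, amortized) costs with potential 2*size - capacity.

    Assumes the array starts empty with capacity 0 (potential 0) and the first push
    allocates capacity 1; every later growth doubles.
    """
    size, capacity, phi = 0, 0, 0
    trace = []
    for _ in range(n):
        actual = 1                             # the write
        if size == capacity:                   # full (or never allocated): grow and copy
            actual += size
            capacity = max(1, 2 * capacity)
        size += 1
        new_phi = 2 * size - capacity
        assert new_phi >= 0                    # the method requires a non-negative potential
        trace.append((actual, actual + new_phi - phi))   # amortized = actual + change in potential
        phi = new_phi
    return trace
```

Every push after the first has an amortized cost of exactly 3. The first costs 2 because Φ rises only from 0 to 1 on that push.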

Common Misuse

Conflating amortized with average-case. Amortized is a worst-case bound on a sequence of operations; average-case is over a probability distribution on inputs. They are not the same.
Applying amortized bounds to a single operation. A single resize is O(n); only the average over a sequence is O(1).
Claiming credits go negative in the accounting method — invalidates the bound. Always verify the invariant.
Using a potential function that can become negative — invalidates the bound; Φ ≥ 0 is required.

7. When Greedy FAILS (Counterexamples)

The skill of greedy is not just knowing when it works but recognizing when it doesn’t before you commit. Memorize these traps.

7a. 0/1 Knapsack ≠ Fractional Knapsack

Fractional knapsack (you can take any fraction of an item): greedy by value-to-weight ratio is optimal. Sort items by v_i / w_i descending; take items in order, taking a fraction of the last item if needed.

0/1 knapsack (each item is take-or-skip): greedy by ratio is not optimal. Counterexample:

Item	Weight	Value	Ratio
A	1	1.5	1.5
B	2	2	1.0
C	2	2	1.0

Capacity = 4. Greedy by ratio takes A (capacity left 3), then B (capacity left 1). C no longer fits, so the total is 3.5. Optimum is B + C = 4.

The trap: ratio-greedy is correct under fractional flexibility, fails under integrality. The 0/1 version requires DP — see Phase 5 Lab 03.
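A small script makes the fractional/0-1 gap concrete on the table’s items. It is a sketch that assumes integer weights, which the DP table needs. The function names are ours.

```python
ITEMS = [(1, 1.5), (2, 2), (2, 2)]   # (weight, value) for A, B, C in the table

def by_ratio(items):
    # Highest value-to-weight ratio first; Python's sort stays stable with reverse=True.
    return sorted(items, key=lambda it: it[1] / it[0], reverse=True)

def fractional_greedy(items, capacity):
    """Fractional knapsack: take by value/weight ratio, splitting the last item if needed."""
    total = 0.0
    for w, v in by_ratio(items):
        take = min(w, capacity)
        total += v * take / w
        capacity -= take
        if capacity == 0:
            break
    return total

def ratio_greedy_01(items, capacity):
    """The same ratio order applied to 0/1 items: whole item or nothing."""
    total = 0.0
    for w, v in by_ratio(items):
        if w <= capacity:
            total += v
            capacity -= w
    return total

def knapsack_01_dp(items, capacity):
    """Standard 0/1 knapsack DP over integer weight budgets."""
    best = [0.0] * (capacity + 1)             # best[c] = max value within weight budget c
    for w, v in items:
        for c in range(capacity, w - 1, -1):  # descending c: each item used at most once
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

At capacity 4, the fractional greedy gets 4.5, ratio greedy on 0/1 gets 3.5, and the DP finds 4 (B + C). At capacity 3, by contrast, both 0/1 methods return 3.5, because B + C weighs 4 and does not fit.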

7b. Coin Change With Arbitrary Denominations

For US coins {1, 5, 10, 25}, greedy (largest first) is optimal. For arbitrary denominations like {1, 3, 4} with target 6, greedy gives 4 + 1 + 1 = 3 coins; optimum is 3 + 3 = 2 coins. Greedy fails. Fall back to DP — see Lab 06 — Greedy vs DP for the canonical counterexample analysis.
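A side-by-side check of the two approaches follows. greedy_coins and dp_coins are hypothetical names, and the DP is the standard unbounded min-coins recurrence.

```python
def greedy_coins(coins, target):
    """Largest-coin-first; returns the coin count, or None if it gets stuck."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += target // c
        target %= c
    return count if target == 0 else None

def dp_coins(coins, target):
    """Minimum number of coins (unlimited supply) via bottom-up DP; None if impossible."""
    INF = float("inf")
    best = [0] + [INF] * target              # best[t] = fewest coins summing to t
    for t in range(1, target + 1):
        for c in coins:
            if c <= t and best[t - c] + 1 < best[t]:
                best[t] = best[t - c] + 1
    return None if best[target] == INF else best[target]
```

greedy_coins([1, 3, 4], 6) returns 3, while dp_coins([1, 3, 4], 6) returns 2. For {1, 5, 10, 25}, the two agree on every target from 1 to 299.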

7c. Scheduling With Weights

Interval scheduling (unweighted): greedy by earliest end time is optimal. Weighted interval scheduling (each interval has a weight, maximize total weight of non-overlapping picks): greedy fails. DP with binary-search predecessor pointer is correct, O(N log N).
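The following is a sketch of that DP. It assumes intervals are given as (start, end, weight) tuples and that touching endpoints are allowed. max_weight_schedule is a hypothetical name. On [(0, 3, 1), (1, 10, 10), (4, 6, 1)], earliest-end greedy takes the two weight-1 intervals for a total of 2, while the DP picks the single weight-10 interval.

```python
import bisect

def max_weight_schedule(intervals):
    """Max total weight of pairwise non-overlapping (start, end, weight) intervals."""
    jobs = sorted(intervals, key=lambda job: job[1])      # by end time
    ends = [end for _, end, _ in jobs]
    dp = [0] * (len(jobs) + 1)         # dp[i] = best total using only the first i jobs
    for i, (start, _, weight) in enumerate(jobs, start=1):
        # p = how many of the first i - 1 jobs end at or before this job's start
        p = bisect.bisect_right(ends, start, 0, i - 1)
        dp[i] = max(dp[i - 1],         # skip job i
                    dp[p] + weight)    # take job i plus the best compatible prefix
    return dp[-1]
```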

7d. “Greedy By Farthest Reach” In Reachability

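Some problems look like jump game but are not. For example, on a weighted graph, greedily extending the farthest reach does not compute earliest arrival times. That requires Dijkstra’s algorithm, whose greedy choice is made safe by a priority queue ordered by tentative distance.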

Common Misuse

Trusting “looks like jump game” reasoning. Always verify the greedy choice property formally. The signal is “I have an exchange argument” not “the example worked”.
Failing the cross-test. Try N=2 and N=3 by hand. Try a hostile counterexample. Try the input where all elements are equal, all are distinct, all are decreasing.
Forgetting the DP fallback. Many problems are “greedy if X, DP otherwise.” Know which side you are on before coding.

Greedy-Vs-DP Decision Flowchart

When a problem looks optimization-flavored, this flowchart determines whether to attempt greedy or jump straight to DP.

START → Is the problem an optimization (max / min) or counting?
        │
        ├── No (search / decision / construction) → Greedy candidate; check exchange argument.
        │
        └── Yes
            │
            ├── Can I sort the input by a single criterion (deadline, weight, ratio)
            │   such that processing in that order has the greedy choice property?
            │   │
            │   ├── Yes — and I can write a 4-step exchange argument in <90s →
            │   │   GREEDY. Code in <5 minutes.
            │   │
            │   └── No / not sure →
            │       Does the optimal answer at step i depend on choices at i+1, i+2, …?
            │       │
            │       ├── Yes → DP. State = (position, accumulated state needed for future).
            │       │   See Phase 5.
            │       │
            │       └── No / unclear → Try greedy with a small N=3, N=4 stress-test against
            │                          brute force. If matches → write proof attempt.
            │                          If diverges → DP.
            │
            └── Try the canonical counterexamples for this problem class:
                - Fractional vs 0/1 (knapsack)
                - Sorted vs arbitrary (coin change)
                - Unweighted vs weighted (scheduling)
                If any counterexample defeats your greedy → DP.

The flowchart’s discipline: never commit to greedy without an exchange argument. The 90-second time-box for the proof attempt is exactly the safety check that prevents the failure mode of confidently submitting wrong greedy code.

Common Greedy Bugs

A taxonomy. Each one shows up in at least 30% of submitted greedy solutions in mock interviews.

Claiming greedy without proof. “I’ll sort by end time and pick” with no exchange argument. The interviewer asks “why?” and the candidate either restates the algorithm or stalls. Fix: practice the 4-step exchange recipe so it comes out automatically when you propose the algorithm.
Wrong sort key. Sorting by start time when end time is correct (interval scheduling); sorting by weight when ratio is correct (fractional knapsack). Fix: before coding, do an exchange argument with each candidate sort key on a hand-crafted small example. The wrong key fails the exchange step quickly.
Ignoring counterexamples. “My algorithm passes the examples, ship it.” Fix: always run the algorithm against brute force on N=2, N=3, N=4 random inputs before committing.
Assuming local optimum = global optimum without justification. “At each step pick the smallest” works for some problems and fails for others. Fix: the greedy choice property is not free; it must be proven for every problem.
Confusing fractional and integer regimes. Ratio-greedy on 0/1 knapsack; “take half of this” interpretation in problems where items are atomic. Fix: read the integrality constraint before coding.
Off-by-one in “earliest end time” when ties exist. Ties must be broken consistently — typically by start time ascending — and the proof must handle ties. Fix: state the tie-breaker explicitly and verify the exchange argument handles it.
Greedy with backtracking masquerading as greedy. Some “greedy” solutions secretly revise earlier choices. Remove-k-digits (LC 402) keeps a monotonic stack and pops on conflict. Candy distribution (LC 135) raises its first-pass assignments during a second, right-to-left pass. The pure greedy claim doesn’t apply here, because the algorithm is greedy plus an undo or repair step. Fix: if your algorithm has an if conflict: undo, it’s not pure greedy, and the proof must cover the undo logic.
Mixing up the proof with the algorithm. “Earliest deadline first works because earliest deadline first is the best choice” is circular. Fix: the proof must reference an exchange or invariant or cut, not restate the algorithm.
Not handling the empty / size-1 case in scan-based greedy. Fix: explicit guard at the start.
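Using a heap when sorting suffices (or vice versa). Sorting works when one fixed order, known up front, drives every greedy choice. A heap is needed when the best candidate changes as you go. This happens when inputs stream in (scheduling with deadlines arriving online), or when each choice creates a new candidate that must be ranked against the rest (merge K sorted lists, Huffman coding). Fix: ask “is the input streamed or batched, and does each choice create new candidates?” up front.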

Mastery Checklist

Before exiting this phase, verify all of these:

You can recognize a greedy candidate within 2 minutes by spotting the greedy choice property signal.
You can produce a 4-step exchange argument for any greedy claim within 90 seconds, out loud, without writing code first.
You can cite the cut property and apply it to prove Kruskal’s / Prim’s correctness from scratch.
You can articulate the difference between an invariant, a monovariant, and an exchange argument — and pick the right tool for a given proof.
You can perform aggregate, accounting, and potential-method amortized analyses on a dynamic-array example, in <5 minutes total.
You can name and articulate three canonical greedy counterexamples (0/1 vs fractional knapsack; coin change {1,3,4}; weighted interval scheduling).
You can implement interval scheduling, jump game II, task scheduler, gas station, and Huffman coding with full proofs in <90 minutes total.
You can articulate the greedy-vs-DP decision in <30 seconds for any optimization problem.
You catch yourself before committing to a greedy without an exchange argument — every time.

Exit Criteria

You may move to Phase 7 (Competitive Programming Acceleration) when all of the following are true:

You have completed all six labs in this phase, with each lab’s mastery criteria checked off.
You have solved at least 25 unaided greedy problems from LeetCode (mix of Medium and Hard) and reviewed each via REVIEW_TEMPLATE.md. On at least 20 of them, you wrote the exchange argument or invariant in the review before peeking at any solution.
Your unaided success rate on Medium-Hard greedy problems is ≥ 65%.
In a mock interview (phase-11-mock-interviews/), you correctly identify greedy-applicability within 2 minutes for at least 7 of 10 greedy-flavored problems and produce an exchange argument within 90 seconds for at least 6 of 10. You correctly reject greedy in favor of DP on at least 2 of the 10, citing a counterexample.
You have never in this phase shipped a greedy solution without a written proof. This is the single discipline of Phase 6, and skipping it is a phase-failure.

If any of these fails, do another 15–20 greedy problems before moving on. Skipping this gate produces engineers who pattern-match “looks like greedy” and ship wrong code under deadline pressure — exactly the failure mode that gets candidates rejected at staff level.

Labs

Hands-on practice. Each lab follows the strict 22-section format. Every lab’s Correctness Argument section contains an explicit exchange argument or invariant + monovariant proof. This is the whole point of Phase 6.

Lab 01 — Interval Scheduling (Activity Selection) — canonical exchange argument
Lab 02 — Jump Game II — greedy reach + monovariant
Lab 03 — Task Scheduler With Cooldown — greedy + math formula
Lab 04 — Gas Station — greedy + invariant proof
Lab 05 — Huffman Coding — greedy via heap + exchange-argument optimality
Lab 06 — Greedy Vs DP (Coin Change Counterexample) — when greedy fails and DP is required


Lab 01 — Interval Scheduling (Activity Selection)

Goal

Master the canonical greedy problem: maximum non-overlapping interval selection. Internalize the earliest-end-time-first greedy and produce its exchange argument out loud, without help, in under 90 seconds.

Background

Interval scheduling is the prototype greedy problem in every algorithms textbook because it has the cleanest exchange argument and the largest gap between “intuitive but wrong” choices (earliest start, shortest duration, fewest conflicts) and the one correct one (earliest end). Mastering this lab is mastering the discipline of proof-before-code for the rest of Phase 6. The same exchange-argument template recurs in LC 452 (minimum arrows to burst balloons), LC 1353 (maximum events attended), and dozens of variants.
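Each of the three intuitive-but-wrong orders has a small counterexample. This sketch runs each wrong rule against earliest-end-first. The helper names are ours, and half-open semantics are assumed. The fewest-conflicts rule recomputes conflict counts among the remaining intervals after every pick.

```python
def overlaps(a, b):
    """Half-open semantics: intervals that only touch at an endpoint do not overlap."""
    return a[0] < b[1] and b[0] < a[1]

def greedy_by_key(intervals, key):
    """Scan in `key` order, accepting each interval compatible with every accepted one."""
    chosen = []
    for iv in sorted(intervals, key=key):
        if not any(overlaps(iv, c) for c in chosen):
            chosen.append(iv)
    return len(chosen)

def greedy_fewest_conflicts(intervals):
    """Repeatedly accept the remaining interval that overlaps the fewest other remaining ones."""
    remaining = list(range(len(intervals)))   # indices keep duplicate intervals distinct
    count = 0
    while remaining:
        def conflicts(i):
            return sum(1 for j in remaining if j != i and overlaps(intervals[i], intervals[j]))
        best = min(remaining, key=conflicts)
        count += 1
        remaining = [j for j in remaining
                     if j != best and not overlaps(intervals[j], intervals[best])]
    return count

def earliest_end(iv):
    return (iv[1], iv[0])

START_TRAP = [[0, 10], [1, 2], [3, 4]]        # earliest start grabs [0, 10]
LENGTH_TRAP = [[0, 5], [4, 7], [6, 11]]       # shortest picks [4, 7], which blocks both others
CONFLICT_TRAP = (
    [[0, 4], [5, 9], [10, 14], [15, 19]]      # the optimal four
    + [[8, 11]]                               # only 2 conflicts, yet it blocks [5, 9] and [10, 14]
    + [[3, 6]] * 3 + [[13, 16]] * 3           # padding that inflates everyone else's conflict count
)

if __name__ == "__main__":
    print(greedy_by_key(START_TRAP, key=lambda iv: iv[0]), greedy_by_key(START_TRAP, key=earliest_end))          # 1 2
    print(greedy_by_key(LENGTH_TRAP, key=lambda iv: iv[1] - iv[0]), greedy_by_key(LENGTH_TRAP, key=earliest_end))  # 1 2
    print(greedy_fewest_conflicts(CONFLICT_TRAP), greedy_by_key(CONFLICT_TRAP, key=earliest_end))              # 3 4
```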

Interview Context

A staff-level interviewer at FAANG-tier companies will not accept “I’ll sort by end time” without a justification. The signal they’re testing for is: can this candidate distinguish between intuition and proof? The exchange argument, delivered cleanly in 60–90 seconds before code, is the strongest possible signal. Conversely, candidates who code first and then mumble “earliest end is correct because… well… it just is” almost always fail the question even when their code passes the tests.

Problem Statement

Given n activities, each with a start time s_i and an end time e_i, select the maximum-cardinality subset of activities that are pairwise non-overlapping (an activity ending at time t does not overlap with one starting at time t — boundary touching is allowed). Return the size of that subset, or the subset itself if requested.

LeetCode reference: LC 435 — Non-overlapping Intervals (return the count of intervals to remove to make the rest non-overlapping; equivalent to n − maxNonOverlapping).

Constraints

1 ≤ n ≤ 10^5
−5 · 10^4 ≤ s_i < e_i ≤ 5 · 10^4
Boundary contact (an interval ending at t and another starting at t) does NOT count as overlap.
Time complexity must be O(n log n); space O(1) extra (sort in place).

Clarifying Questions

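“Are intervals open, closed, or half-open?” LC 435 treats intervals that only touch (e.g., [1,2] and [2,3]) as non-overlapping, which matches half-open [s, e). Confirm this with the interviewer.
“Is an interval where s == e (zero-duration) allowed?” Not under the constraints above (s_i < e_i). If a variant allows it, clarify how a single-point interval should interact with its neighbors.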
“Should I return the maximum non-overlapping count or the minimum to remove?” — LC 435 asks the latter; the answer is n − maxNonOverlapping.
“Can the intervals be unsorted? Are they ever pre-sorted?” — assume unsorted unless stated; sort is part of the algorithm.
“Are there ties on end time?” — yes; tie-break by start time ascending so the algorithm is deterministic and the proof handles ties cleanly.

Examples

[[1,2], [2,3], [3,4], [1,3]] → max non-overlapping = 3 (pick [1,2], [2,3], [3,4]); remove count = 1.
[[1,2], [1,2], [1,2]] → max non-overlapping = 1; remove count = 2.
[[1,2], [2,3]] → max non-overlapping = 2; remove count = 0 (boundary contact does not overlap).
[] → 0.

Initial Brute Force

Try every subset; for each, check pairwise non-overlap; return the size of the largest valid subset.

from itertools import combinations

def max_non_overlap_brute(intervals):
    """Exponential oracle: size of the largest pairwise non-overlapping subset."""
    best = 0
    for r in range(len(intervals) + 1):
        for subset in combinations(intervals, r):
            # Two intervals are compatible iff one ends no later than the other starts
            # (touching allowed). Checking every pair also handles equal start times.
            ok = all(a[1] <= b[0] or b[1] <= a[0]
                     for a, b in combinations(subset, 2))
            if ok:
                best = max(best, r)
    return best

Brute Force Complexity

Time O(2^n · n^2) — every subset checked pairwise. Space O(n) for combinations. Infeasible at n = 25. Useful only as a stress-test oracle for the greedy on n ≤ 12.

Optimization Path

Brute force (above) — establishes correctness baseline.
DP — sort by end time; dp[i] = max non-overlapping using intervals 0..i. Transition: dp[i] = max(dp[i-1], 1 + dp[predecessor(i)]) where predecessor(i) is the latest j < i with e_j ≤ s_i. Time O(n log n) with binary search. Space O(n). This is weighted interval scheduling’s structure when we drop weights.
Greedy — sort by end time, scan once, pick whenever compatible. O(n log n) time, O(1) extra space. Optimality from the exchange argument below.

The DP step is worth deriving even though greedy beats it, because the moment intervals carry weights, greedy fails and DP is the only correct approach. Keep DP in your back pocket.

Final Expected Approach

Sort intervals by end time ascending (tie-break by start ascending). Maintain last_end = -∞. For each interval [s, e] in sorted order, if s ≥ last_end, accept it (count += 1, last_end = e); else skip. Return count.

For LC 435 the answer is n − count.
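The following is a sketch that transcribes the approach directly. The function names are ours (max_non_overlap matches the name the tests below call), and the input list is sorted in place.

```python
def max_non_overlap(intervals):
    """Maximum number of pairwise non-overlapping intervals (touching endpoints allowed)."""
    intervals.sort(key=lambda iv: (iv[1], iv[0]))   # end time, ties broken by start time
    count = 0
    last_end = float("-inf")
    for start, end in intervals:
        if start >= last_end:      # compatible with everything accepted so far
            count += 1
            last_end = end
    return count

def erase_overlap_intervals(intervals):
    """LC 435: minimum number of intervals to remove so the rest are non-overlapping."""
    return len(intervals) - max_non_overlap(intervals)
```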

Data Structures Used

A sorted list / array of intervals.
One scalar last_end.
One counter count.

No heap, no DSU, no DP table. The algorithm is sort-and-scan.

Correctness Argument

This is the section the rest of Phase 6 hangs on. We use the canonical 4-step exchange argument.

Setup. Let G = g_1, g_2, …, g_k be the greedy solution: intervals sorted by end time, picked greedily. Let O = o_1, o_2, …, o_m be any optimal solution, also written in sorted-by-end-time order (we can always re-sort an optimal solution; non-overlap is preserved). Assume for contradiction m > k (O is strictly better than G); we will derive a contradiction by showing we can transform O into G step by step without decreasing its size — implying m ≤ k.

Step 1 — Locate the first divergence. Let i be the smallest index where g_i ≠ o_i. By construction, g_1 = o_1, …, g_{i-1} = o_{i-1}.

Step 2 — Compare end times at the divergence. Greedy picks the interval with the earliest end time among those compatible with g_1, …, g_{i-1} — equivalently, those compatible with o_1, …, o_{i-1}. So g_i.end ≤ o_i.end (with equality possible if there is a tie and o_i happens to be the tied alternative).

Step 3 — Exchange. Replace o_i with g_i in O, producing O' = o_1, …, o_{i-1}, g_i, o_{i+1}, …, o_m. We must verify (a) feasibility and (b) size.

Feasibility. g_i is compatible with o_1, …, o_{i-1} = g_1, …, g_{i-1}, because greedy accepted it after them. g_i is also compatible with o_{i+1}, …, o_m, because each of them starts at or after o_i.end ≥ g_i.end. So O' is feasible.
Size. |O'| = |O| (we replaced one interval with one interval).

Step 4 — Iterate. O' agrees with G on positions 1..i. Repeat the argument on O' versus G to find the next divergence. After at most min(k, m) exchanges, the resulting solution agrees with G on the first min(k, m) positions.

Conclude. If m > k, after k exchanges the transformed O looks like g_1, …, g_k, o_{k+1}, …, o_m. But greedy stopped at g_k, which means there was no interval compatible with g_1, …, g_k. Therefore o_{k+1} cannot exist — contradiction. So m ≤ k, i.e., G is at least as large as any feasible solution. G is optimal. QED.

Complexity

Time O(n log n) for the sort, O(n) for the scan; total O(n log n).
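Space O(1) extra beyond the input and the sort itself. The sort’s auxiliary space depends on the implementation. It is O(log n) of stack for an in-place introsort such as C++ std::sort, and up to O(n) for Timsort, which Python’s list.sort and Java’s object-array Arrays.sort use.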

Implementation Requirements

Sort key: (end, start) — explicit tie-break.
Boundary semantics: s_next ≥ e_prev is “compatible” (touching allowed). The interviewer’s answer to the open/closed clarifying question determines this. Default to half-open.
Handle empty input (n = 0 → return 0).
Single pass, no nested loop.

Tests

def test_interval_scheduling():
    # canonical
    assert max_non_overlap([[1,2],[2,3],[3,4],[1,3]]) == 3
    # all overlapping
    assert max_non_overlap([[1,2],[1,2],[1,2]]) == 1
    # no overlap
    assert max_non_overlap([[1,2],[2,3]]) == 2
    # empty
    assert max_non_overlap([]) == 0
    # single
    assert max_non_overlap([[5,10]]) == 1
    # tie on end time
    assert max_non_overlap([[1,3],[2,3],[3,4]]) == 2  # one of {[1,3],[2,3]} + [3,4]
    # negative coords
    assert max_non_overlap([[-5,0],[0,5],[5,10]]) == 3
    # nested
    assert max_non_overlap([[1,10],[2,3],[4,5],[6,7]]) == 3

Stress test: generate random n ≤ 12 and compare greedy to brute force on 1000 trials.
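The following is a sketch of a generic harness for that stress test. random_intervals and stress are hypothetical helpers. Coordinates are drawn from a small range (an assumption) so that ties and touching endpoints show up often. Each solver receives its own copy of the input, because the greedy may sort in place.

```python
import random

def random_intervals(rng, n, lo=-10, hi=10):
    """n random intervals [s, e) with lo <= s < e <= hi; a small range forces ties and touches."""
    intervals = []
    for _ in range(n):
        s = rng.randint(lo, hi - 1)
        e = rng.randint(s + 1, hi)
        intervals.append([s, e])
    return intervals

def stress(fast, slow, trials=1000, max_n=12, seed=0):
    """Return the first random input on which fast and slow disagree, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        intervals = random_intervals(rng, rng.randint(0, max_n))
        # Each solver gets its own copy: the greedy sorts in place.
        if fast([iv[:] for iv in intervals]) != slow([iv[:] for iv in intervals]):
            return intervals
    return None
```

With this lab's greedy and brute-force functions in scope, stress(max_non_overlap, max_non_overlap_brute) returns None if no disagreement turns up in 1000 trials. Any returned list is a concrete repro to debug.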

Follow-up Questions

Weighted version (each interval has weight w_i; maximize total weight of chosen non-overlapping). Greedy fails. Solution: DP with binary search predecessor pointer, O(n log n).
Online / streaming version (intervals arrive one at a time, decide accept/reject immediately, no recall). Different problem class — competitive ratio analysis.
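K-machine extension (k parallel resources; each interval scheduled on any one of them; maximize total scheduled). Process intervals greedily by end time. Assign each interval to the machine whose last_end is the latest one still ≤ s (best fit), which needs an ordered multiset of the k last_end values. Assigning to the earliest-free machine with a min-heap is not optimal.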
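The following is a sketch of the best-fit rule, assuming touching endpoints are allowed. max_scheduled_k_machines is a hypothetical name. A plain sorted list stands in for an ordered multiset, which costs O(k) per update; std::multiset or a balanced BST gives O(log k). On [[0,2],[10,12],[1,14]] with k = 2, the earliest-free rule puts [10,12] on the idle machine and then has nowhere to run [1,14], so it schedules only 2. Best fit schedules all 3.

```python
import bisect

def max_scheduled_k_machines(intervals, k):
    """Most intervals schedulable on k machines (touching allowed), best-fit greedy by end time."""
    free_at = [float("-inf")] * k          # sorted last end time of each machine
    count = 0
    for start, end in sorted(intervals, key=lambda iv: (iv[1], iv[0])):
        idx = bisect.bisect_right(free_at, start) - 1   # latest last_end that is <= start
        if idx >= 0:
            free_at.pop(idx)               # the best-fit machine takes this interval
            bisect.insort(free_at, end)    # and stays busy until `end`
            count += 1
    return count
```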

Product Extension

Calendar systems (Google Calendar’s “find a meeting time” feature, AWS spot-instance scheduling, ad slot allocation): the unweighted version is the prototype, and weighted variants are exactly what production schedulers solve.

Language / Runtime Follow-ups

Python: intervals.sort(key=lambda x: (x[1], x[0])) sorts in place (sorted(...) returns a new list). Tuple comparison handles the tie-break for free.
Java: Arrays.sort(intervals, (a, b) -> a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(a[0], b[0])). Use Integer.compare to avoid integer-overflow on a[1] - b[1].
Go: sort.Slice(intervals, func(i, j int) bool { if intervals[i][1] != intervals[j][1] { return intervals[i][1] < intervals[j][1] }; return intervals[i][0] < intervals[j][0] }).
C++: std::sort with a lambda; prefer std::tie(a[1], a[0]) < std::tie(b[1], b[0]) for clarity.
JS/TS: intervals.sort((a, b) => a[1] - b[1] || a[0] - b[0]). When the end difference is 0, the || falls through to the start comparison.

Common Bugs

Sorting by start time instead of end time (intuitive, wrong).
Wrong tie-break (e.g., descending start) breaking determinism in the proof.
Using > instead of ≥ for compatibility, rejecting touching intervals.
Forgetting to update last_end after accepting an interval.
Returning the count when the question asked for the removal count (LC 435), or vice versa.

Debugging Strategy

If your greedy disagrees with brute force on some n ≤ 12 input:

Print both solutions side by side.
Find the first interval where they differ.
Manually run the exchange argument step on that point — does the swap preserve feasibility? If not, your sort key or tie-break is wrong.
If the exchange preserves feasibility but your code didn’t pick g_i, your scan logic has an off-by-one in the compatibility check.

Mastery Criteria

You can write the algorithm in <5 minutes from a blank screen, including the tie-break in the sort key.
You can deliver the 4-step exchange argument out loud in <90 seconds, without notes.
You can extend to LC 452 (minimum arrows to burst balloons) by recognizing it as the same problem with renaming, in <2 minutes.
You can articulate why the weighted version requires DP, in <30 seconds.
You correctly reject the wrong sort keys (earliest start, shortest duration, fewest conflicts) by giving a concrete counterexample for each, in <2 minutes total.


Lab 02 — Jump Game II

Goal

Internalize the greedy reach pattern: a single forward scan with two pointers (current_end, farthest) producing the minimum number of jumps. Prove correctness via a loop invariant + monovariant pair.

Background

Jump Game II (LC 45) is the canonical “greedy with monovariant” problem. It looks like BFS — and indeed, the greedy is BFS in disguise — but the BFS is implemented in O(n) time and O(1) space because the levels are contiguous index ranges. The proof is a two-part argument: an invariant (“at any point, the farthest position reachable in j jumps equals current_end after the j-th jump”) plus a monovariant (farthest non-decreasing, current_end strictly increasing per jump).

Interview Context

This problem (or its variants LC 1306, LC 1326, and LC 1024) is a staple of FAANG-tier interviews. The weak solution is “BFS with a queue, mark visited”, which takes O(n · max(nums)) time (up to O(n²)) and O(n) space. The right solution looks too simple to be correct unless you have the proof, which is why the interviewer asks for it.

Problem Statement

Given a non-negative integer array nums where nums[i] is the maximum jump length from index i, return the minimum number of jumps to reach the last index, starting from index 0. Assume the last index is always reachable.

Constraints

1 ≤ n ≤ 10^4
0 ≤ nums[i] ≤ 1000
Last index is always reachable (no -1 case).

Clarifying Questions

“Is 0 a valid value at intermediate positions, and does it mean we get stuck?” — yes; but the problem guarantees reachability, so we won’t actually get stuck.
“Can n == 1? Then the answer is 0 (already at the end).” — yes, edge case to handle.
“Do I return the count of jumps or the path?” — count.
“Are negative jumps allowed?” — no; non-negative only.

Examples

[2,3,1,1,4] → 2 (0 → 1 → 4).
[2,3,0,1,4] → 2.
[1] → 0 (already at end).
[1,1,1,1] → 3.
[5,1,1,1,1] → 1 (one jump from 0 to 4).

Initial Brute Force

Recursive DP with memoization: dp[i] = min jumps from index i to n-1.

from functools import lru_cache

def jumps_brute(nums):
    n = len(nums)
    @lru_cache(maxsize=None)
    def f(i):
        if i >= n - 1: return 0
        if nums[i] == 0: return float('inf')
        return 1 + min(f(i + j) for j in range(1, nums[i] + 1) if i + j < n)
    return f(0)

Brute Force Complexity

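Time O(n · max(nums)), because each cell is tried against up to max(nums) next cells. At n = 10^4 and max(nums) = 1000, that’s 10^7, which is borderline. Space O(n) for memo + recursion. Python’s default recursion limit (1000) makes the recursive version usable only as a small-n oracle. The greedy is O(n) time and O(1) space.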

Optimization Path

Recursive DP (above) — clear correctness.
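Iterative DP — set dp[n-1] = 0, then fill right to left with dp[i] = 1 + min(dp[i+j]) for 1 ≤ j ≤ nums[i], since each dp[i] depends only on indices to its right. Same complexity, no recursion.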
BFS — view jumps as edges; level number = answer. O(n²) worst case if naively implemented (revisiting); O(n) if you track the boundary. The boundary tracking is exactly the greedy.
Greedy with two pointers — O(n) time, O(1) space. The BFS layers are contiguous ranges [L, R]; we process the range and compute the next range’s right endpoint as max(i + nums[i]) for i ∈ [L, R].
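The following is a sketch of the iterative DP, filled right to left. jumps_dp is a hypothetical name. It returns float('inf') if the last index is unreachable, a case the problem rules out.

```python
def jumps_dp(nums):
    """Iterative DP for LC 45: best[i] = min jumps from index i to n - 1, filled right to left."""
    n = len(nums)
    INF = float("inf")
    best = [INF] * n
    best[n - 1] = 0
    for i in range(n - 2, -1, -1):
        # Targets i+1 .. min(n-1, i+nums[i]) lie to the right, so they are already final.
        for j in range(i + 1, min(n - 1, i + nums[i]) + 1):
            best[i] = min(best[i], 1 + best[j])
    return best[0]
```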

Final Expected Approach

def jump(nums):
    n = len(nums)
    if n <= 1: return 0
    jumps = 0
    current_end = 0
    farthest = 0
    for i in range(n - 1):              # don't iterate past n-1
        farthest = max(farthest, i + nums[i])
        if i == current_end:
            jumps += 1
            current_end = farthest
            if current_end >= n - 1:
                break
    return jumps

Two pointers: current_end is the right boundary of the BFS layer we’re currently processing; farthest is the right boundary of the next BFS layer being assembled.

Data Structures Used

Three integers: jumps, current_end, farthest. No arrays, no queue, no recursion. The simplicity is the point.

Correctness Argument

We prove: at termination, jumps equals the minimum number of jumps to reach index n - 1.

Setup. Define level(j) = the set of indices reachable from 0 in exactly j jumps and not in fewer. Recursively, level(0) = {0} and level(j+1) = {i + k : i ∈ level(j), 1 ≤ k ≤ nums[i], i + k ≤ n − 1} − ⋃_{j' ≤ j} level(j'). By induction on j, each level(j+1) is a contiguous range of indices [L_{j+1}, R_{j+1}] with L_{j+1} = R_j + 1. (Proof: levels 0..j cover exactly [0, R_j]. So the indices reachable in at most j + 1 jumps form [0, min(n − 1, max_{i ≤ R_j} (i + nums[i]))], which is a contiguous prefix because each range [i, i + nums[i]] contains i. Removing the earlier levels, which are the prefix [0, R_j], leaves a contiguous range.)

Loop invariant. At the top of iteration i (where i ≤ current_end):

(1) current_end = R_{jumps}, the right boundary of the layer reached in jumps jumps so far.
(2) farthest = max_{k < i} (k + nums[k]), taken as 0 when i = 0. This is the farthest index reachable with one more jump from any index scanned so far.

Initialization. Before the loop: jumps = 0, current_end = 0, farthest = 0. Layer 0 is {0} with R_0 = 0 ✓.

Maintenance. At iteration i:

We update farthest = max(farthest, i + nums[i]). If i ≤ current_end, this maintains invariant (2).
If i == current_end, we’ve finished processing the current layer. We “jump”: jumps += 1, current_end = farthest. By invariant (2), farthest = R_{jumps_old + 1} = right boundary of the next layer. So invariant (1) is restored with jumps_new = jumps_old + 1.

Monovariant. farthest is non-decreasing across the loop (each iteration takes a max). At each “jump” event, current_end strictly increases (otherwise the last index isn’t reachable, contradicting the problem’s guarantee). The loop runs n - 1 iterations and the number of “jump” events is bounded above by n - 1, so the algorithm terminates and produces a finite jumps value.

Termination + correctness. The loop terminates after n - 1 iterations (or earlier if current_end ≥ n - 1). At that point, current_end ≥ n - 1 (because the last index is reachable; if it weren’t, we’d have current_end < n - 1 and farthest = current_end — no progress — but the problem guarantees reachability, so farthest > current_end whenever current_end < n - 1). Therefore jumps is exactly the layer index of n - 1 in the BFS — i.e., the minimum jump count. QED.

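The two key proof devices are the invariant that current_end tracks layer boundaries, and the monovariant that current_end strictly increases at every jump (farthest, which feeds it, is non-decreasing). Together they give correctness. The bounded, strictly increasing quantities (i per iteration, current_end per jump) give termination. farthest being non-decreasing alone would not.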

Complexity

Time O(n) — single forward scan.
Space O(1) — three integers.

Implementation Requirements

Loop bound i < n - 1, not i < n. Without the early break, letting i reach n - 1 triggers one extra jump when i == current_end == n - 1. The bound keeps the code correct even if the break is removed.
if i == current_end: — the trigger for layer transition. The check happens after farthest is updated.
if current_end >= n - 1: break — early exit.
Handle n == 1 as a special case (answer 0) before the loop.

Tests

def test_jump():
    assert jump([2,3,1,1,4]) == 2
    assert jump([2,3,0,1,4]) == 2
    assert jump([1]) == 0
    assert jump([1,1,1,1]) == 3
    assert jump([5,1,1,1,1]) == 1
    assert jump([1,2]) == 1
    assert jump([0]) == 0  # n=1, no jumps needed
    # large jump from start
    assert jump([100, 1, 1, 1, 1, 1, 1]) == 1

Stress-test versus the recursive DP for n ≤ 20.

Follow-up Questions

LC 55 (Jump Game I) — can we reach the end? farthest ≥ i invariant suffices.
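LC 1306 (Jump Game III): from index i you may jump to exactly i + arr[i] or i − arr[i]. Not greedy; BFS.
LC 1340 (Jump Game V): jumps of up to d steps, only over and onto strictly lower values, maximizing the number of indices visited. DP territory.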
Min jumps with cost — DP, not greedy. The cost asymmetry breaks the layer-contiguity argument.
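The following is a sketch of the LC 55 simplification of the same scan. can_jump is a hypothetical name. It maintains the invariant that every index up to farthest is reachable, and it fails as soon as the scan passes farthest.

```python
def can_jump(nums):
    """LC 55: can index n - 1 be reached from index 0?"""
    farthest = 0
    for i, step in enumerate(nums):
        if i > farthest:               # index i (and everything after it) is unreachable
            return False
        farthest = max(farthest, i + step)
        if farthest >= len(nums) - 1:
            return True
    return True
```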

Product Extension

Network packet routing: minimum hops to reach a destination when each node has a maximum forward-reach. Game pathfinding when movement primitives have variable range.

Language / Runtime Follow-ups

Python: as shown above. Beware the off-by-one on range(n - 1).
Java: int n = nums.length; if (n <= 1) return 0; int jumps = 0, end = 0, far = 0; for (int i = 0; i < n - 1; i++) { far = Math.max(far, i + nums[i]); if (i == end) { jumps++; end = far; if (end >= n - 1) break; } }.
Go: identical structure. An explicit if i+nums[i] > far { far = i + nums[i] } avoids math.Max, which works on float64. Go 1.21+ also provides a built-in max for integers.
C++: same; use std::max.
JS/TS: same; Math.max(...). Watch for nums.length re-evaluation cost in tight loops on engines that don’t hoist.

Common Bugs

Iterating for i in range(n) without the early break, which counts a spurious extra jump when i reaches n - 1.
Checking i == current_end before folding i + nums[i] into farthest, so the next layer boundary misses index i’s reach.
Forgetting the n == 1 early return.
Using BFS with a queue when the contiguous-range observation makes it O(1) space.
Confusing farthest (assembling next layer) with current_end (current layer’s right edge) — labelling them consistently is essential.

Debugging Strategy

Print (i, current_end, farthest, jumps) at each iteration on a small input. The state should evolve predictably: farthest rises, current_end jumps to farthest exactly when i catches up.
If your output is one too many: check the loop bound (n - 1 not n).
If your output is one too few: check that you increment jumps at the layer boundary, not after the last iteration.

Mastery Criteria

You can write the algorithm in <5 minutes from blank.
You can articulate the BFS-layer interpretation in <30 seconds.
You can state the loop invariant precisely and run through initialization → maintenance → termination in <2 minutes.
You can name the monovariant and explain why it implies termination, in <30 seconds.
You can extend to LC 55 (Jump Game I) in <3 minutes by simplifying the same template.

Lab 03 — Task Scheduler With Cooldown

Goal

Master the frequency-greedy pattern: schedule a stream of tasks with a per-type cooldown, minimizing total CPU cycles. Derive the closed-form formula (maxFreq - 1) * (n + 1) + countMax, prove its optimality with a matching lower bound and construction, and recognize when the formula alone undercounts (when the task count exceeds it, so the answer is len(tasks) and no idle slot is needed).

Background

LC 621 is the canonical “greedy + counting formula” problem. The optimality argument is short but subtle: the most-frequent task type forces a skeleton of frames n + 1 slots apart, no schedule can beat that skeleton, and every other task either fills an idle cell in it or stretches it without creating idles. This produces the formula. The skill being tested is recognition of the pattern, derivation of the formula from first principles, and the discipline to also show the alternative max-heap simulation that handles the same problem operationally.

Interview Context

Asked at Amazon, Google, Meta. The interviewer is testing whether you can derive a formula or whether you reach for a heap reflexively. Both solutions are accepted; the formula version with proof is the stronger signal. Watch for the follow-up: “What if a new task type can arrive mid-execution?” — that breaks the formula and forces the heap.

Problem Statement

Given an array of tasks (uppercase letters) and an integer n (cooldown), schedule the tasks so that the same task type is separated by at least n other slots (the slots can be idle if necessary). Return the minimum number of CPU cycles to finish all tasks.

LeetCode reference: LC 621 — Task Scheduler.

Constraints

1 ≤ |tasks| ≤ 10^4
tasks[i] is uppercase English letter (so at most 26 distinct types).
0 ≤ n ≤ 100.

Clarifying Questions

“Does the schedule have to be returned, or just the cycle count?” — count only.
“Can multiple tasks of different types execute in the same cycle?” — no, one task per cycle (or one idle).
“If n == 0, can same-type tasks run back to back?” — yes; answer is just len(tasks).
“Are tasks pre-sorted or arrive in any order?” — order doesn’t matter; only the frequency vector matters.
“Can a single task type dominate the input?” — yes; the formula’s idle slots absorb the gaps that its cooldown forces.

Examples

tasks = ["A","A","A","B","B","B"], n = 2 → 8. Schedule: A B _ A B _ A B.
tasks = ["A","A","A","B","B","B"], n = 0 → 6.
tasks = ["A","A","A","A","A","A","B","C","D","E","F","G"], n = 2 → 16. Formula: (6-1)*(2+1) + 1 = 16.
tasks = ["A","B","C","D","E","A","B","C","D","E"], n = 4 → 10. Formula gives (2-1)*5 + 5 = 10; actual = 10. Tight.
tasks = ["A","B","C","D","E","F"], n = 100 → 6. Formula gives (1-1)*101 + 6 = 6 since maxFreq = 1.

Initial Brute Force

Simulate. At each cycle, pick a task type whose cooldown has elapsed and whose remaining count is > 0; if none is available, idle. Repeat until all tasks are done. Any tiebreaker yields a valid schedule, but the minimum requires picking the available type with the most remaining instances.
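Here is a sketch of that simulation used as an oracle. The formula is restated so the comparison runs standalone. The simulation picks the available type with the most remaining instances, scans every type each cycle, and stops once all counts reach zero. least_interval_simulate and stress are hypothetical names. The random inputs are kept small (≤ 20 tasks, 5 types, n ≤ 4).

```python
import random
from collections import Counter


def least_interval_simulate(tasks, n):
    # Brute-force oracle: one loop iteration per CPU cycle.
    remaining = Counter(tasks)
    ready_at = {t: 0 for t in remaining}  # earliest cycle each type may run again
    cycle = 0
    while remaining:
        available = [t for t in remaining if ready_at[t] <= cycle]
        if available:
            # Any pick is a valid schedule; most-remaining-first gives the minimum.
            t = max(available, key=lambda x: remaining[x])
            remaining[t] -= 1
            if remaining[t] == 0:
                del remaining[t]
            ready_at[t] = cycle + n + 1
        cycle += 1  # an idle cycle when nothing was available
    return cycle


def least_interval(tasks, n):
    # The closed-form formula, restated so the comparison is self-contained.
    cnt = Counter(tasks)
    max_freq = max(cnt.values())
    count_max = sum(1 for v in cnt.values() if v == max_freq)
    return max(len(tasks), (max_freq - 1) * (n + 1) + count_max)


def stress(trials=1000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        tasks = [rng.choice("ABCDE") for _ in range(rng.randint(1, 20))]
        n = rng.randint(0, 4)
        assert least_interval_simulate(tasks, n) == least_interval(tasks, n), (tasks, n)
    return True
```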

Brute Force Complexity

O(26 * T) where T is the answer, since each cycle scans at most 26 task types. T ≤ |tasks| * (n + 1), so the worst case is O(|tasks| * (n + 1)). Acceptable, but slower than the formula.

Optimization Path

Brute simulation — easy to write, slow.
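Max-heap simulation — keep the ready task types in a max-heap keyed on remaining count. Each cycle, pop the top type and run one instance; if instances remain, park the type in a FIFO cooling queue tagged with the cycle it becomes ready again (current cycle + n + 1). At the start of every cycle, move types whose ready cycle has arrived back into the heap; if the heap is still empty, the cycle is idle.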
Closed-form formula — derive from the structure of an optimal schedule. O(|tasks|) time, O(1) space.
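Here is a per-cycle sketch of the max-heap simulation, under one reasonable reading of the step above. Ready types sit in a max-heap keyed on remaining count, negated because heapq is a min-heap. A type that just ran waits in a FIFO cooling queue until cycle + n + 1. Because every type waits the same n + 1 cycles, the queue stays sorted by ready time. The function returns the schedule itself with idle slots marked, so it also covers the “print the actual schedule” follow-up. schedule_with_heap and the idle marker are hypothetical.

```python
import heapq
from collections import Counter, deque


def schedule_with_heap(tasks, n, idle="idle"):
    # Max-heap of (-remaining, type) for types that may run right now.
    heap = [(-c, t) for t, c in Counter(tasks).items()]
    heapq.heapify(heap)
    cooling = deque()  # (cycle when ready again, -remaining, type), sorted by ready cycle
    schedule = []
    cycle = 0
    while heap or cooling:
        # Types whose cooldown has expired rejoin the heap.
        while cooling and cooling[0][0] <= cycle:
            _, neg, t = cooling.popleft()
            heapq.heappush(heap, (neg, t))
        if heap:
            neg, t = heapq.heappop(heap)
            schedule.append(t)
            if neg + 1 < 0:  # instances of t remain after this one
                cooling.append((cycle + n + 1, neg + 1, t))
        else:
            schedule.append(idle)
        cycle += 1
    return schedule
```

The answer to LC 621 is len(schedule_with_heap(tasks, n)).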

Final Expected Approach

from collections import Counter

def least_interval(tasks, n):
    cnt = Counter(tasks)
    max_freq = max(cnt.values())
    count_max = sum(1 for v in cnt.values() if v == max_freq)
    return max(len(tasks), (max_freq - 1) * (n + 1) + count_max)

The max(len(tasks), …) handles the case where the formula gives less than total tasks — i.e., when there are so many distinct task types that we never need to idle.

Data Structures Used

A frequency counter (26-entry array or Counter).
Two integers: max_freq, count_max.

Correctness Argument

We prove the formula T = max(|tasks|, (max_freq - 1)(n + 1) + count_max) is optimal.

Setup. Let M = max_freq and K = count_max (number of types tied for the maximum frequency).

Lower bound (no schedule can do better than T). Number the cycles from 1. Consecutive instances of one type must be at least n + 1 cycles apart, so a type with M instances whose first instance runs at cycle f cannot finish before cycle f + (M - 1)(n + 1). The K tied types have their first instances in K distinct cycles, so at least one of them starts at some cycle f ≥ K. That type alone forces T ≥ K + (M - 1)(n + 1), i.e. T ≥ (M - 1)(n + 1) + K.

Also trivially T ≥ |tasks| (every task runs in its own cycle).

So T ≥ max(|tasks|, (M - 1)(n + 1) + K).

Upper bound (the formula’s value is achievable). We construct a schedule of length exactly (M - 1)(n + 1) + K (or |tasks| if larger) and show it is feasible.

Case 1: (M - 1)(n + 1) + K ≥ |tasks|. Lay out M − 1 complete frames of n + 1 slots each, followed by a final frame of K slots. In every frame, slots 0, …, K − 1 hold the K tied types in a fixed order (in this case K ≤ n + 1 whenever M ≥ 2). Consecutive instances of each tied type are therefore exactly n + 1 cycles apart. Every other type has at most M − 1 instances. Fill the remaining cells of the first M − 1 frames column by column: slot K of every frame, then slot K + 1 of every frame, and so on. Place the other types' instances one type at a time in decreasing order of frequency, and leave unused cells idle. Three facts follow. A type never lands twice in one frame. Its instances in the same column of neighbouring frames are exactly n + 1 apart. And because types are taken in decreasing frequency, a type that wraps from one column into the next always skips at least one frame, which keeps those instances at least n + 1 apart too. The case condition is exactly (M − 1)(n + 1 − K) ≥ |tasks| − MK: there are enough free cells for every other instance. So the schedule has length (M − 1)(n + 1) + K.
Case 2: (M - 1)(n + 1) + K < |tasks|. Deal the instances exactly as in Case 1. Once the columns inside the n + 1 slots run out, keep going into new columns, letting the first M − 1 frames grow past n + 1 cells. Frame lengths then differ by at most one, and each frame has at least n + 1 cells. The distance argument from Case 1 only needs every frame to have at least n + 1 cells, so the schedule stays valid. No cell is idle, so the length is exactly |tasks|.

In both cases, the constructed schedule’s length matches the lower bound. Optimal. QED.

The lower bound is the crucial step: it converts “the formula is one possible schedule’s length” into “no schedule is shorter.” Without it, you have only an existence claim.
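To make the construction concrete, here is a sketch that builds a frame schedule directly. The max-frequency types open each of the M frames. The other instances are dealt column by column over the first M − 1 frames in decreasing-frequency order, and only frames shorter than n + 1 are padded with idle slots. schedule_from_frames and respects_cooldown are hypothetical helpers. Symbols are assumed sortable (uppercase letters here); this only fixes the order within a frame.

```python
from collections import Counter


def schedule_from_frames(tasks, n, idle="idle"):
    # M frames; the K max-frequency types open every frame in a fixed order.
    cnt = Counter(tasks)
    m = max(cnt.values())
    top = sorted(t for t, c in cnt.items() if c == m)
    frames = [list(top) for _ in range(m)]
    # Other instances go column by column over frames 0..m-2, in decreasing frequency.
    others = sorted((t for t, c in cnt.items() if c < m), key=lambda t: (-cnt[t], t))
    slot = 0
    for t in others:
        for _ in range(cnt[t]):
            frames[slot % (m - 1)].append(t)  # m >= 2 here, since some count is below m
            slot += 1
    schedule = []
    for i, frame in enumerate(frames):
        schedule.extend(frame)
        if i < m - 1:  # pad short frames; the final frame is never padded
            schedule.extend([idle] * max(0, n + 1 - len(frame)))
    return schedule


def respects_cooldown(schedule, n, idle="idle"):
    # True if equal tasks are always at least n + 1 cycles apart.
    last = {}
    for i, t in enumerate(schedule):
        if t == idle:
            continue
        if t in last and i - last[t] < n + 1:
            return False
        last[t] = i
    return True
```

On ["A","A","B","B","C","C","D"] with n = 1, no frame needs padding, and the result has length 7 = |tasks|. This is the Case 2 situation.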

Complexity

Time O(|tasks|) — single pass to compute frequencies.
Space O(1) — at most 26 distinct task types, frequency dictionary is constant-bounded.

Implementation Requirements

Use Counter or a 26-entry array.
Compute max_freq and count_max in one pass.
Return max(len(tasks), (max_freq - 1) * (n + 1) + count_max) — do not skip the max.

Tests

def test_least_interval():
    assert least_interval(["A","A","A","B","B","B"], 2) == 8
    assert least_interval(["A","A","A","B","B","B"], 0) == 6
    assert least_interval(["A","A","A","A","A","A","B","C","D","E","F","G"], 2) == 16
    assert least_interval(["A","B","C","D","E","A","B","C","D","E"], 4) == 10
    assert least_interval(["A","B","C","D","E","F"], 100) == 6
    assert least_interval(["A"], 5) == 1
    assert least_interval(["A","A"], 0) == 2
    assert least_interval(["A","A","A","A"], 3) == 13  # (4-1)*4 + 1 = 13

Follow-up Questions

Streaming tasks. New tasks arrive mid-execution; formula no longer applies because frequencies change. Use the max-heap simulation.
Different cooldowns per task type. Heap with per-type cooldown tracker.
Print the actual schedule. Heap simulation produces a schedule; the formula does not directly give one (you’d reconstruct from the proof).
What if n can be huge (n = 10^9)? Same formula with constant-time arithmetic, but (M - 1)(n + 1) can reach about 10^13, so use 64-bit integers outside Python.

Product Extension

OS task scheduling with cooldown (e.g., a process that touched a hot resource must wait n cycles before re-touching). API rate limiting at the user level. Workout-program scheduling with muscle-group recovery windows.

Language / Runtime Follow-ups

Python: Counter(tasks) is O(|tasks|).
Java: int[26] array indexed by c - 'A'. Faster than HashMap<Character, Integer> for the constant-alphabet case.
Go: [26]int works the same.
C++: std::array<int, 26>; std::max_element for max_freq.
JS/TS: Map<string, number> or 26-entry array; the latter is faster.

Common Bugs

Forgetting max(len(tasks), formula) — fails on inputs with many distinct task types.
Using count_max = 0 when there’s only one max-frequency type (should be 1).
Off-by-one in the formula: (M - 1) * (n + 1) not M * (n + 1).
Heap simulation: forgetting to push the cooled-down task back, or pushing it back at the wrong cycle.

Debugging Strategy

For each test case, hand-compute M, K, and the formula. If your hand computation matches the expected answer but the code does not, the bug is mechanical (counting, indexing, or the return expression).
If formula doesn’t match expected, you have a conceptual error: either M is wrong (multi-counting) or K is wrong (counting non-tied types) or the max(|tasks|, …) clamp is missing.
Run brute simulation as a stress oracle for |tasks| ≤ 20.

Mastery Criteria

You can derive the formula from first principles in <3 minutes.
You can deliver the optimality argument (lower bound plus frame construction) out loud in <2 minutes.
You can write the formula-based solution in <3 minutes.
You can write the heap-based simulation in <10 minutes when asked for the streaming variant.
You can articulate why the max(|tasks|, formula) clamp is necessary and which case it covers, in <60 seconds.

Lab 04 — Gas Station

Goal

Master the single-pass invariant greedy: O(n) time, O(1) space, with a non-trivial correctness invariant proving why we can skip ahead instead of retrying every starting station.

Background

LC 134 is the canonical “one-pass with reset” greedy. The naive approach is O(n²): for each candidate starting station, simulate the trip. The greedy collapses this to O(n) via the invariant: if the running tank goes negative at station k starting from station s, then no station in [s, k] can be a valid starting point. Once you see this, the algorithm shrinks to a few lines and the proof is the entire test of skill.

Interview Context

Asked at Google, Bloomberg, Amazon. The candidate who codes O(n²) first then asks “can we do better?” is fine. The candidate who jumps to O(n) without the invariant proof is in danger — interviewers test by asking “why is start = k + 1 correct?” and a candidate without the invariant answers “uh, intuition.” That answer fails staff-level interviews.

Problem Statement

There are n gas stations on a circular route. Station i has gas[i] units of gas; traveling from station i to station i + 1 costs cost[i] units. Starting with an empty tank at some station, find the unique starting station that allows you to complete the full circle, or return -1 if impossible.

LeetCode reference: LC 134 — Gas Station.

Constraints

1 ≤ n ≤ 10^5
0 ≤ gas[i], cost[i] ≤ 10^4
The solution is unique if it exists.
Time O(n), space O(1).

Clarifying Questions

“Is the route guaranteed circular?” — yes; from station n - 1 you go to 0.
“Can the answer be ambiguous (multiple valid starts)?” — no, the problem guarantees uniqueness when a solution exists.
“Can gas[i] or cost[i] be negative?” — no, both non-negative.
“Should I return the index or the boolean feasibility?” — index, or -1.

Examples

gas = [1,2,3,4,5], cost = [3,4,5,1,2] → 3 (start at index 3: tank 0 + 4 - 1 = 3, 3 + 5 - 2 = 6, 6 + 1 - 3 = 4, 4 + 2 - 4 = 2, 2 + 3 - 5 = 0).
gas = [2,3,4], cost = [3,4,3] → -1 (total gas = 9, total cost = 10, infeasible).
gas = [5], cost = [4] → 0.
gas = [3,1,1], cost = [1,2,2] → 0.

Initial Brute Force

For each candidate start s, simulate the full trip; return the first s that succeeds.

def can_complete_brute(gas, cost):
    n = len(gas)
    for s in range(n):
        tank = 0
        for k in range(n):
            i = (s + k) % n
            tank += gas[i] - cost[i]
            if tank < 0:
                break
        else:
            return s
    return -1

Brute Force Complexity

Time O(n²), space O(1). At n = 10^5, n² = 10^10 steps, which is too slow.

Optimization Path

Brute — O(n²).
Total-feasibility check — if sum(gas) < sum(cost), no solution exists; return -1 immediately. Reduces wasted work but still O(n²) worst case.
One-pass with reset — O(n). The invariant below is the key.

Final Expected Approach

def can_complete_circuit(gas, cost):
    if sum(gas) < sum(cost):
        return -1
    start = 0
    tank = 0
    for i in range(len(gas)):
        tank += gas[i] - cost[i]
        if tank < 0:
            start = i + 1
            tank = 0
    return start

Data Structures Used

Two integers: start, tank. Plus the inputs.

Correctness Argument

We prove two things: (1) if sum(gas) ≥ sum(cost), the algorithm returns a valid starting index; (2) if sum(gas) < sum(cost), no solution exists.

Part 2 is trivial. Across one full lap, the tank changes by exactly sum(gas) - sum(cost). If this is negative, the tank cannot remain non-negative throughout any lap from any start — so no solution.

Part 1 — the key invariant.

Invariant (key claim): suppose we run the algorithm starting from index start = s and the running tank first goes negative at index k (so the partial sum tank after processing index k is < 0, but it was ≥ 0 after processing every index in [s, k - 1]). Then no index in [s, k] can be a valid starting point.

Proof of the key claim. Let T(a, b) = sum(gas[a..b]) - sum(cost[a..b]) be the net fuel from a to b. By assumption, T(s, k - 1) ≥ 0 (we made it past k - 1) and T(s, k) < 0 (we failed at k).

Consider any candidate start s' ∈ [s, k]. To complete the lap from s', every partial sum T(s', i) must be ≥ 0 for i from s' through s' + n - 1 (mod n). Since s' ≤ k, that includes T(s', k) ≥ 0.

But T(s', k) = T(s, k) - T(s, s' - 1) ≤ T(s, k) < 0, because T(s, s' - 1) ≥ 0: we made it past every index in [s, s' - 1]. For s' = s the range is empty and the sum is 0. So starting from s', the tank goes negative at index k, and s' is not a valid start.

Therefore, after a failure at k, we can safely skip all of [s, k] and resume the search from k + 1. QED for the key claim.

Wrapping up. Write P(i) = T(0, i) for the prefix sums, with P(-1) = 0, and T_total = P(n - 1) = sum(gas) - sum(cost). While the current candidate is s, the tank after index i equals P(i) - P(s - 1), so a reset at i happens exactly when P(i) < P(s - 1). Each reset point is therefore a new strict minimum of P, and after the last reset P never falls below that minimum again. So the final start satisfies P(start - 1) ≤ P(j) for every j in [-1, n - 1]. A reset at i = n - 1 would give P(n - 1) < 0, contradicting sum(gas) ≥ sum(cost), so start ≤ n - 1.

Now drive one lap from start. For i in [start, n - 1], the tank is P(i) - P(start - 1) ≥ 0. For the wrap-around part, i in [0, start - 1], the tank is (P(n - 1) - P(start - 1)) + P(i) = T_total + (P(i) - P(start - 1)) ≥ 0, because T_total ≥ 0 and P(start - 1) is the minimum prefix sum. So start completes the lap. This argument alone shows existence; the problem's uniqueness guarantee only tells us that start is the answer.
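The same conclusion can be checked numerically. The reset scan always returns one past the first index where the running prefix sum of gas[i] - cost[i] reaches its minimum (0 when no prefix sum is negative). Here is a sketch comparing the two on random inputs. start_by_prefix_min, scan_start, and agree are hypothetical names.

```python
import random


def start_by_prefix_min(gas, cost):
    # One past the first index where the prefix sum of gas[i] - cost[i] is minimal.
    if sum(gas) < sum(cost):
        return -1
    prefix = lowest = 0  # the empty prefix P(-1) = 0 is a candidate minimum
    start = 0
    for i, (g, c) in enumerate(zip(gas, cost)):
        prefix += g - c
        if prefix < lowest:  # strict, so the first occurrence of the minimum wins
            lowest, start = prefix, i + 1
    return start


def scan_start(gas, cost):
    # The lab's reset scan, restated for the comparison.
    if sum(gas) < sum(cost):
        return -1
    start = tank = 0
    for i in range(len(gas)):
        tank += gas[i] - cost[i]
        if tank < 0:
            start, tank = i + 1, 0
    return start


def agree(trials=2000, seed=5):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 30)
        gas = [rng.randint(0, 6) for _ in range(n)]
        cost = [rng.randint(0, 6) for _ in range(n)]
        assert start_by_prefix_min(gas, cost) == scan_start(gas, cost), (gas, cost)
    return True
```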

Complexity

Time O(n): the two sums and the loop are one pass each.
Space O(1).

Implementation Requirements

The feasibility check is required, not optional. Without sum(gas) < sum(cost) → -1 (as a pre-pass, or as the running total in the one-pass variant below), an infeasible input still produces some start. That start can even be n, which is out of range.
Reset tank = 0 (not tank = gas[i+1] - cost[i+1]) when starting fresh.
start = i + 1 after failure at i.
The variant where you maintain both running and total in a single pass is also acceptable:

def can_complete_circuit_one_pass(gas, cost):
    total = tank = start = 0
    for i in range(len(gas)):
        diff = gas[i] - cost[i]
        total += diff
        tank += diff
        if tank < 0:
            start = i + 1
            tank = 0
    return start if total >= 0 else -1

Tests

def test_gas_station():
    assert can_complete_circuit([1,2,3,4,5], [3,4,5,1,2]) == 3
    assert can_complete_circuit([2,3,4], [3,4,3]) == -1
    assert can_complete_circuit([5], [4]) == 0
    assert can_complete_circuit([3,1,1], [1,2,2]) == 0
    assert can_complete_circuit([5,1,2,3,4], [4,4,1,5,1]) == 4
    # exact match (zero margin): starts 1 and 2 both complete the lap, so this input
    # breaks LC's uniqueness promise; the scan returns 1
    assert can_complete_circuit([1,2,3], [3,2,1]) == 1
    # all zeros
    assert can_complete_circuit([0,0,0], [0,0,0]) == 0
    # cannot start
    assert can_complete_circuit([1,1,1], [2,2,2]) == -1

Stress-test versus brute force for n ≤ 50.
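Here is a sketch of that stress test. Random inputs often have several valid starts, because LC 134's uniqueness promise does not hold for them. The test therefore checks that the returned start actually completes a lap, rather than comparing it against the brute force's first hit. completes_lap and stress are hypothetical names, and can_complete_circuit is the one-pass variant restated.

```python
import random


def can_complete_circuit(gas, cost):
    # One-pass scan with reset (restated so this block runs on its own).
    total = tank = start = 0
    for i in range(len(gas)):
        diff = gas[i] - cost[i]
        total += diff
        tank += diff
        if tank < 0:
            start, tank = i + 1, 0
    return start if total >= 0 else -1


def completes_lap(gas, cost, s):
    # Simulate one full lap from s; True if the tank never goes negative.
    n, tank = len(gas), 0
    for k in range(n):
        i = (s + k) % n
        tank += gas[i] - cost[i]
        if tank < 0:
            return False
    return True


def stress(trials=1000, max_n=50, seed=7):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, max_n)
        gas = [rng.randint(0, 5) for _ in range(n)]
        cost = [rng.randint(0, 5) for _ in range(n)]
        valid = [s for s in range(n) if completes_lap(gas, cost, s)]
        got = can_complete_circuit(gas, cost)
        if valid:  # any valid start is acceptable on random inputs
            assert got in valid, (gas, cost, got)
        else:
            assert got == -1, (gas, cost, got)
    return True
```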

Follow-up Questions

Find the index where you must idle if the route is infeasible. Slightly different problem; same scan structure.
Multiple cars on the same circuit. Independent problems per car.
Variable tank capacity. Now state is two-dimensional; greedy may fail; revert to DP / simulation.
Two-direction route. Run the greedy in both directions; combine.

Product Extension

Battery-powered EV routing with charging stations of variable wattage and costs. Drone delivery routes with refuel points. Spacecraft trajectory planning with gravity-assist maneuvers (highly idealized).

Language / Runtime Follow-ups

Python: as shown.
Java: identical structure; int total = 0, tank = 0, start = 0;.
Go: total, tank, start := 0, 0, 0.
C++: int total = 0, tank = 0, start = 0;. Watch overflow if gas[i] and cost[i] are at the upper end and n = 10^5: 10^4 * 10^5 = 10^9, within int32 range, but borderline; use long long to be safe.
JS/TS: let total = 0, tank = 0, start = 0;. JS numbers are 64-bit floats, no overflow worry at this scale.

Common Bugs

Resetting tank = gas[i] - cost[i] instead of tank = 0 after failure: that carries the failed station’s deficit into the new candidate, which starts at i + 1 and never uses station i.
Setting start = i instead of start = i + 1 after failure.
Forgetting the total < 0 → -1 check, returning a bogus index.
Iterating in the wrong direction or two passes when one suffices.

Debugging Strategy

Print (i, gas[i] - cost[i], tank, start) at each step. The trajectory should show: tank rises and falls, and on each fall below 0 the start jumps to i + 1.
If your output is off by one (returns start - 1 or start + 1), check the assignment in the failure branch.
If you return a start but the route is actually infeasible, you missed the total < 0 gate.

Mastery Criteria

You can write the algorithm in <4 minutes from blank.
You can state the key invariant (“if tank goes negative at k from start s, no station in [s, k] can be a valid start”) in <30 seconds.
You can prove the invariant (using the partial-sum decomposition T(s', k) = T(s, k) - T(s, s' - 1)) in <2 minutes, out loud.
You can articulate why total < 0 → -1 is sufficient and necessary, in <30 seconds.
You can produce the brute-force baseline as a stress-test oracle in <3 minutes when asked.

Lab 05 — Huffman Coding

Goal

Implement Huffman coding from scratch using a min-heap, and prove its optimality via the canonical exchange argument: in some optimal prefix-free code tree, the two least-frequent symbols are siblings at maximum depth.

Background

Huffman coding is the apex example of “greedy via min-heap” and the most-cited example of greedy optimality in CS curricula. The proof has two non-trivial steps. The first is a swap-to-deepest lemma: the deepest pair of sibling leaves can be assumed to hold the two least-frequent symbols. The second is an induction on the merged tree: the greedy is optimal on n - 1 symbols, and combining the two smallest preserves optimality. Mastering this proof teaches a more sophisticated form of exchange argument than the linear-scan greedy of Labs 1–4.

Interview Context

Huffman is asked occasionally at top-tier interviews — Google, Apple, AWS — usually as an open-ended “design a compression algorithm” or as a follow-up to a lab on heap usage. More commonly, the technique (greedy via min-heap, with optimality proof) appears in adjacent problems: LC 1167 — Minimum Cost to Connect Sticks, LC 23 — Merge K Sorted Lists, and the rope-merging problem. Mastery of Huffman = mastery of the entire family.

Problem Statement

Given a frequency map freq: Symbol -> int over n distinct symbols (n ≥ 2), construct a prefix-free binary code such that the expected code length Σ freq[s] * len(code[s]) is minimized. Return either the code map or the encoding tree.

For interview formulation, often phrased as: “Given n ropes of given lengths, you can merge two ropes at a cost equal to the sum of their lengths. Find the minimum total cost to merge all ropes into one.” (Equivalent to Huffman; rope lengths = frequencies.)

LeetCode reference: LC 1167 — Minimum Cost to Connect Sticks (the rope formulation).

Constraints

2 ≤ n ≤ 10^4
1 ≤ freq[s] ≤ 10^4
Time O(n log n); space O(n).
Tie-break on equal frequencies: any order is acceptable; the optimal cost is invariant.

Clarifying Questions

“Should I return the codes or just the cost?” — usually cost (rope formulation); for full Huffman, return the tree or the codes.
“Are frequencies guaranteed positive?” — yes (zero-frequency symbols don’t need codes).
“Are there always at least 2 symbols?” — assume yes; with 1 symbol, prefix-free coding is trivially “0” (or empty, depending on definition).
“Is n = 0 a valid input?” — typically no.
“Should the codes be canonical?” — usually no; any optimal-length code is acceptable.

Examples

Frequencies: {a: 5, b: 9, c: 12, d: 13, e: 16, f: 45}. Codes (one valid set): f: 0, c: 100, d: 101, a: 1100, b: 1101, e: 111. Total cost: 5*4 + 9*4 + 12*3 + 13*3 + 16*3 + 45*1 = 20 + 36 + 36 + 39 + 48 + 45 = 224.
Ropes [2, 4, 3]: merge 2+3=5 (cost 5), then 5+4=9 (cost 9), total 14.
Ropes [1, 8, 3, 5]: merge 1+3=4, then 4+5=9, then 9+8=17. Total = 4+9+17 = 30, the minimum.

Initial Brute Force

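Try every binary-tree topology over the leaves, with every assignment of symbols to leaves (equivalently, every order of pairwise merges). Compute each tree's weighted external path length and return the minimum-cost tree. The count grows super-exponentially, so this is infeasible past about n = 10.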

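from itertools import combinations

def huffman_brute(freqs):
    # Exponential: try every pair at every step (every tree arises from some merge order).
    if len(freqs) <= 1:
        return 0
    return min(a + b + huffman_brute([f for k, f in enumerate(freqs) if k not in (i, j)] + [a + b])
               for (i, a), (j, b) in combinations(enumerate(freqs), 2))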

Brute Force Complexity

Exponential. The tree shapes alone number C_{n-1} (Catalan numbers: C_{10} = 16,796, C_{15} ≈ 9.7M), and assigning symbols to leaves multiplies that further. Useful only as a stress-test oracle for n ≤ 8.

Optimization Path

Brute — exhaustive trees.
DP on intervals — possible if leaves are ordered (matrix-chain style), but Huffman’s leaves are unordered, so this doesn’t apply.
Greedy with min-heap — O(n log n) time, O(n) space. Optimality from the exchange argument below.

Final Expected Approach

import heapq

def huffman_cost(freqs):
    heap = list(freqs)            # frequencies only, for cost-only variant
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        s = a + b
        total += s
        heapq.heappush(heap, s)
    return total

def huffman_codes(freqs):
    # freqs: list of (symbol, freq) tuples
    heap = [[f, [[s, ""]]] for s, f in freqs]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1]:
            pair[1] = '0' + pair[1]
        for pair in hi[1]:
            pair[1] = '1' + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], lo[1] + hi[1]])
    return dict(heap[0][1])

Data Structures Used

A min-heap of (frequency, optional payload) pairs.
The implicit binary tree formed by the merge sequence.

Correctness Argument

We prove the greedy is optimal by induction on n (the number of symbols), using two lemmas.

Lemma 1 (Swap-to-deepest). In some optimal prefix code tree T*, the two least-frequent symbols x and y are siblings at the maximum depth of any leaf.

Proof. Take any optimal tree T'. It is full: an internal node with a single child could be spliced out, shortening every code below it. So a deepest leaf has a sibling that is also a leaf. Call this pair a and b, labelled so that freq[a] ≤ freq[b], and label x and y so that freq[x] ≤ freq[y]. If {a, b} = {x, y}, done. Otherwise, since x and y are the two least-frequent symbols, freq[x] ≤ freq[a] and freq[y] ≤ freq[b]. Swap x with a (place the symbol x at a’s leaf and vice versa). The cost change is:

Δ = freq[x] * depth(a) + freq[a] * depth(x) - freq[x] * depth(x) - freq[a] * depth(a) = (freq[a] - freq[x]) * (depth(x) - depth(a))

Since freq[a] ≥ freq[x] (x has the smallest frequency) and depth(a) ≥ depth(x) (a is at maximum depth), Δ ≤ 0. Because T' is optimal, Δ cannot be negative, so Δ = 0 and the swapped tree is also optimal. Repeat with y and b, using freq[y] ≤ freq[b]; if x or y already occupies one of a, b, skip the corresponding swap. We’ve moved x, y to the maximum-depth sibling pair without increasing cost. QED for Lemma 1.

Lemma 2 (Greedy preserves optimality on the residual). Let x, y be the two least-frequent symbols. Construct freq' by replacing x and y with a single super-symbol z of frequency freq[x] + freq[y]. Then any optimal tree T* for freq' extends to an optimal tree for freq by replacing z’s leaf with an internal node whose children are leaves for x and y.

Proof. Let T_extended be the extension of T* (replace z-leaf with internal node + x, y children). The cost satisfies:

cost(T_extended) = cost(T*) + freq[x] + freq[y]

(The + freq[x] + freq[y] comes from x, y being one level deeper than z was.)

Conversely, any tree T for freq where x, y are siblings (which by Lemma 1 we can assume WLOG) collapses to a tree T_collapsed for freq' by merging x, y into z:

cost(T_collapsed) = cost(T) - freq[x] - freq[y]

So cost(T) = cost(T_collapsed) + freq[x] + freq[y] ≥ cost(T*) + freq[x] + freq[y] = cost(T_extended). Therefore T_extended is at least as good as any tree where x, y are siblings; combined with Lemma 1, T_extended is optimal for freq. QED for Lemma 2.

Inductive proof of greedy optimality. Base case n = 2: only one tree possible, greedy gives it. Inductive step: greedy merges the two least-frequent symbols x, y first, recurses on the residual of size n - 1, and by inductive hypothesis the recursive call produces an optimal tree for freq'. By Lemma 2, the extended tree is optimal for freq. QED.

The two lemmas together are the full exchange-argument proof. Lemma 1 is the swap step; Lemma 2 is the induction step.

Complexity

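Time O(n log n) for the heap work: n - 1 merge operations, each with two pops and one push at O(log n) apiece. On each merge, the list-based huffman_codes also re-prefixes every code in both merged subtrees, which adds work proportional to code lengths. Building an explicit tree and walking it once avoids that.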
Space O(n) — heap and tree.

Implementation Requirements

Use a min-heap. Python’s heapq is a min-heap, and Java’s PriorityQueue is min-first under natural ordering unless you pass a comparator.
Tie-breakers: when frequencies are equal, the heap may pick either; correctness is unaffected. For deterministic output, add a secondary index.
For the cost-only variant, the symbol payload can be omitted.
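Here is a sketch of the deterministic variant. A counter from itertools.count breaks frequency ties, so payloads are never compared. The codes are read off in one walk of the finished tree rather than re-prefixed on every merge. huffman_codes_tree is a hypothetical name. It assumes symbols are not tuples, because tuples represent internal nodes here. It gives a lone symbol the code "0", one of the conventions mentioned under Clarifying Questions. On the six-symbol example it reproduces the code set listed under Examples.

```python
import heapq
from itertools import count


def huffman_codes_tree(freqs):
    # freqs: list of (symbol, freq) pairs; symbols are assumed not to be tuples,
    # because a tuple (left, right) is how this sketch represents an internal node.
    tiebreak = count()
    heap = [(f, next(tiebreak), s) for s, f in freqs]  # a leaf is the bare symbol
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # The unique tiebreak value means two nodes are never compared directly.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:  # one walk of the finished tree, extending codes top-down
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix or "0"  # a lone symbol still gets a 1-bit code
    return codes
```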

Tests

def test_huffman_cost():
    assert huffman_cost([2, 3, 4]) == 14  # 2+3=5, 5+4=9
    assert huffman_cost([1, 8, 3, 5]) == 30
    assert huffman_cost([5]) == 0  # n=1 edge: merge cost is 0
    # uniform
    assert huffman_cost([1, 1, 1, 1]) == 8  # 1+1=2, 1+1=2, 2+2=4. Total=2+2+4=8.
    # large skew
    assert huffman_cost([1, 1, 1000]) == 1004  # 1+1=2, then 2+1000=1002; total 2+1002=1004

Follow-up Questions

Adaptive Huffman — frequencies are unknown a priori; encoder and decoder maintain a tree that updates as symbols arrive. Used in older compression standards.
Canonical Huffman — codes are normalized so only code lengths need to be transmitted, not the tree structure. Used in DEFLATE / zlib.
Length-limited Huffman (max code length L) — the package-merge algorithm, more complex than vanilla Huffman.
Arithmetic coding — beats Huffman when symbol probabilities are not powers of 1/2, since Huffman must spend a whole number of bits per symbol; not greedy.

Product Extension

Used in: gzip / DEFLATE (with length-limited variant), HTTP/2 HPACK header compression, JPEG entropy coding stage, MP3 audio coding. Whenever a known-frequency distribution must be losslessly compressed with a prefix-free code, Huffman or its variants are the workhorse.

Language / Runtime Follow-ups

Python: heapq for the heap. Watch out: heapq is min-heap; for tie-breaking, use (freq, counter, payload) to avoid comparing payloads.
Java: PriorityQueue<Node> with Comparator.comparingInt(n -> n.freq).
Go: implement heap.Interface (Len, Less, Swap, Push, Pop) on a slice of nodes and drive it with container/heap; the standard library has no generic typed heap.
C++: std::priority_queue<Node, std::vector<Node>, std::greater<Node>>. std::greater calls operator>, so define operator> on Node to compare by frequency (or pass a custom comparator).
JS/TS: no built-in heap; either bring a library (@datastructures-js/priority-queue) or hand-roll a binary heap.

Common Bugs

Mixing up min-heap and max-heap: with a max-heap, you’d merge the two largest — the answer is wrong by a lot.
Pushing the merged node back with the wrong frequency (e.g., max(a, b) instead of a + b).
For the codes variant: appending bits instead of prepending them. Codes are built bottom-up, so each merge must prepend its bit (the last bit added belongs to the root’s edge). Which child gets ‘0’ and which gets ‘1’ does not matter.
Heap of size 1 at the start (single symbol): the while len(heap) > 1 loop is correct; cost is 0.

Debugging Strategy

Hand-trace a small example (e.g., [1, 1, 1, 1]) and verify each merge step.
Compare cost output against the brute force for n ≤ 6.
For codes: check the code map, not just the cost. The code set must be prefix-free, every internal node of the tree must have exactly two children, and Σ freq[s] * len(code[s]) must equal huffman_cost on the same frequencies.
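Here is a sketch of those checks. It assumes a plain list of frequencies for the cost and a symbol → frequency dict for a code map. brute_cost tries every merge order; this is exponential, so keep n ≤ 6. is_prefix_free relies on the fact that after sorting, a codeword that prefixes any other also prefixes its immediate successor. weighted_length computes Σ freq[s] * len(code[s]). All names are hypothetical, and huffman_cost is restated so the block runs alone.

```python
import heapq
import random
from itertools import combinations


def huffman_cost(freqs):
    # Greedy: repeatedly merge the two smallest weights.
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def brute_cost(freqs):
    # Exhaustive oracle: every pair at every step (every tree arises from some merge order).
    if len(freqs) <= 1:
        return 0
    return min(a + b + brute_cost([f for k, f in enumerate(freqs) if k not in (i, j)] + [a + b])
               for (i, a), (j, b) in combinations(enumerate(freqs), 2))


def is_prefix_free(codes):
    # In sorted order, a codeword that prefixes any other also prefixes its successor.
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def weighted_length(freq, codes):
    # Sum of freq[s] * len(codes[s]); equals huffman_cost for a correct Huffman code map.
    return sum(f * len(codes[s]) for s, f in freq.items())


def stress(trials=200, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        freqs = [rng.randint(1, 20) for _ in range(rng.randint(1, 6))]
        assert huffman_cost(freqs) == brute_cost(freqs), freqs
    return True
```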

Mastery Criteria

You can implement Huffman cost-only in <8 minutes from blank.
You can implement Huffman with full code map in <15 minutes.
You can deliver Lemma 1 (swap-to-deepest) in <2 minutes, out loud.
You can deliver Lemma 2 (induction on residual) in <2 minutes, out loud.
You can recognize LC 1167 / connect-sticks as a Huffman variant in <30 seconds.
You can articulate when Huffman is not optimal (e.g., when the alphabet allows non-binary codes, or when arithmetic coding is admissible).

Lab 06 — Greedy Vs DP (Coin Change Counterexample)

Goal

Internalize the failure mode of greedy by walking through the canonical counterexample: coin change with denominations [1, 3, 4] and target 6. Greedy gives 4 + 1 + 1 = 3 coins; DP gives 3 + 3 = 2 coins. Make this the test you run on every “looks like greedy” problem before committing.

Background

Many candidates correctly solve coin change with US denominations [1, 5, 10, 25] greedily, then assume greedy works for any denomination set. It does not. The failure on [1, 3, 4] target 6 is a standard counterexample in algorithms textbooks (Cormen, Kleinberg-Tardos, Erickson) precisely because it cleanly demonstrates that “greedy felt right” is not a proof. The lesson generalizes: greedy works only when the problem has a provable greedy-choice property, established by an exchange argument, a matroid structure, or an invariant like those in Labs 1–5. Such a property is rarely obvious from the problem statement and almost never holds for arbitrary inputs.

Interview Context

This lab is the meta lab of Phase 6. Its purpose is not to drill a new algorithm but to drill the discipline of testing greedy hypotheses against counterexamples before coding. Interviewers love to ask coin-change variants specifically because they expose candidates who pattern-match without proof. A candidate who says “I’ll go greedy by largest denomination” is asked “what about [1, 3, 4] target 6?” and either (a) recovers gracefully and switches to DP, or (b) doubles down and ships wrong code. (b) ends the interview.

Problem Statement

Given an array of distinct positive coin denominations coins and a non-negative integer amount, return the minimum number of coins needed to sum to amount, or -1 if impossible. You have an unlimited supply of each denomination.

LeetCode reference: LC 322 — Coin Change.

Constraints

1 ≤ |coins| ≤ 12
1 ≤ coins[i] ≤ 2^31 - 1
0 ≤ amount ≤ 10^4
Coins are distinct. 1 may or may not be in the set; if not, some amounts are unreachable.

Clarifying Questions

“Are coins guaranteed sorted?” — typically no; sort if needed.
“Is 1 always present?” — no; the input coins = [3, 5] and amount = 4 is unsolvable, return -1.
“Can amount = 0?” — yes; answer is 0.
“Is the order of coins in the answer significant?” — no, just the count.
“Should I count the coins or list them?” — count.

Examples

coins = [1, 3, 4], amount = 6 → 2 (3 + 3).
coins = [1, 5, 10, 25], amount = 30 → 2 (25 + 5); greedy works.
coins = [1, 5, 10, 25], amount = 41 → 4 (25 + 10 + 5 + 1); greedy works.
coins = [2], amount = 3 → -1.
coins = [1], amount = 0 → 0.
coins = [186, 419, 83, 408], amount = 6249 → 20 (random hostile case).

The Greedy Hypothesis (And Why It Fails)

The natural greedy: sort denominations descending, take the largest that fits, recurse on the remainder.

def coin_change_greedy_WRONG(coins, amount):
    coins = sorted(coins, reverse=True)
    count = 0
    for c in coins:
        while amount >= c:
            amount -= c
            count += 1
    return count if amount == 0 else -1

Run this on coins = [1, 3, 4], amount = 6:

Pick 4: amount = 2, count = 1.
Pick 3? No, 2 < 3. Skip.
Pick 1: twice. amount = 0, count = 3.

Result: 3 coins. Optimum: 3 + 3, which is 2 coins. Greedy is wrong.

Why? The greedy choice property does not hold. Taking the largest coin (4) first leaves a residual amount of 2. With coins [1, 3, 4], the only way to make 2 is 1 + 1, so every solution that starts with 4 uses at least 1 + 2 = 3 coins. Starting with 3 instead leaves a residual of 3, which one more 3 covers, for 2 coins total. The local optimum (largest coin that fits) is not part of any global optimum.

The exchange-argument failure at this concrete level:

Greedy picks coin 4 first. Optimal picks 3 first.
Try to swap one of the optimal’s 3s for greedy’s 4. The remaining coin (the other 3) would bring the total to 4 + 3 = 7 ≠ 6. Repairing it means making the residual 6 - 4 = 2, which no single coin can do (only 1 is ≤ 2). That takes 1 + 1, and the total becomes 3 coins. The swap either breaks feasibility or breaks minimality.
The exchange argument fails. Therefore the greedy is not provably optimal, and indeed it isn’t.

DP Fallback (The Correct Algorithm)

def coin_change_dp(coins, amount):
    INF = float('inf')
    dp = [0] + [INF] * amount
    for w in range(1, amount + 1):
        for c in coins:
            if c <= w and dp[w - c] + 1 < dp[w]:
                dp[w] = dp[w - c] + 1
    return dp[amount] if dp[amount] != INF else -1

This is unbounded knapsack — see Phase 5 Lab 04 — Unbounded Knapsack (Coin Change) for the full derivation.

Complexity: O(amount * |coins|) time, O(amount) space.

When Does Greedy WORK On Coin Change?

A coin system is called canonical when largest-first greedy is optimal for every amount. Whether a system is canonical depends on its specific denominations. Some canonical systems:

[1, c, c², c³, …] (powers of a fixed base) — always canonical.
[1, 5, 10, 25, 50, 100] (US currency) — canonical.
[1, 2, 5, 10, 20, 50] (euro) — canonical.

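Checking canonicity: consider a system that contains 1. If greedy is ever suboptimal for it, then greedy is already suboptimal for some amount below c_1 + c_2, the sum of the two largest denominations (Kozen and Zaks). So you can decide canonicity by computing the DP optimum for every amount below that bound and comparing it with greedy. This takes O((c_1 + c_2) · |coins|) time, which is feasible for small denominations. For [1, 3, 4], the bound is 7, and 6 is the failing amount.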
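A sketch of a canonicity check follows. It assumes 1 is among the denominations and relies on the Kozen–Zaks bound: if greedy is ever suboptimal for a system containing 1, it is already suboptimal for some amount below the sum of the two largest denominations. The function name is hypothetical.

```python
def is_canonical(coins):
    """True if largest-first greedy is optimal for every amount.

    Assumes 1 is a denomination (so every amount is reachable) and uses the
    Kozen-Zaks bound: any counterexample lies below the sum of the two largest coins.
    """
    coins = sorted(set(coins))
    assert coins[0] == 1, "this check assumes coin 1 is present"
    if len(coins) < 3:
        return True  # {1} and {1, c} are always canonical
    limit = coins[-1] + coins[-2]
    opt = [0] * limit  # opt[w] = true minimum number of coins for amount w
    for w in range(1, limit):
        opt[w] = 1 + min(opt[w - c] for c in coins if c <= w)
    desc = coins[::-1]
    for w in range(1, limit):
        rem, greedy = w, 0
        for c in desc:  # largest-first greedy
            greedy += rem // c
            rem %= c
        if greedy != opt[w]:
            return False
    return True
```

With this check, [1, 3, 4] is rejected at amount 6, while [1, 5, 10, 25] passes.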

For interview purposes: never assume canonicity unless the problem explicitly states the denominations are canonical (e.g., “US currency” with the standard set). Default to DP.

A Glance At Matroid Theory (Why Some Greedy Problems Work)

A matroid M = (E, I) is a pair where E is a finite set of elements and I ⊆ 2^E is a family of “independent sets” satisfying:

Non-empty: ∅ ∈ I.
Hereditary: if A ∈ I and B ⊆ A, then B ∈ I.
Exchange property: if A, B ∈ I and |A| < |B|, then there exists b ∈ B \ A such that A ∪ {b} ∈ I.

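Theorem (Edmonds–Rado): let I be a non-empty, hereditary family of subsets of a finite set E. The greedy algorithm scans elements by decreasing weight and keeps each one that preserves independence. It returns a maximum-weight independent set for every choice of non-negative weights iff (E, I) is a matroid.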

Examples of matroids: the cycle-free edge sets of a graph (graphic matroid → Kruskal works), linearly independent subsets of vectors (linear matroid), and all subsets of size at most k (uniform matroid).
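Kruskal's algorithm is the matroid greedy applied to the graphic matroid. The sketch below assumes edges are hypothetical (weight, u, v) tuples over vertices 0..n-1. Union-find serves as the independence (acyclicity) test. Sorting ascending yields a minimum spanning tree, the minimization form of the same greedy.

```python
def kruskal_mst_weight(n, edges):
    """Greedy on the graphic matroid: scan edges by weight, keep an edge iff the
    kept set stays acyclic. Returns the MST weight, or None if the graph is disconnected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, used = 0, 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding (u, v) keeps the edge set a forest (independent)
            parent[ru] = rv
            total += w
            used += 1
    return total if used == n - 1 else None
```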

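Coin change is not a matroid problem. There is no natural matroid structure under which “fewest coins to reach amount” is a max-weight independent set, which is why greedy doesn’t work for arbitrary denominations. Interval scheduling is a useful contrast: the sets of pairwise-compatible activities do not form a matroid. One long interval [0, 10) conflicts with both of the compatible intervals [0, 4) and [6, 10), which violates the exchange property. Yet earliest-end-time greedy is still optimal, proven directly by an exchange argument. Matroid structure is sufficient for greedy, not necessary.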

You don’t need to memorize matroid theory for interviews. You do need to know the empirical signal: if greedy doesn’t pass the counterexample stress test, fall back to DP without panic.

Decision Recipe (The Whole Point Of This Lab)

For any optimization problem that “looks greedy”:

Hypothesize a greedy choice (e.g., largest first, smallest first, by ratio).
Run it on a hand-crafted small input of size 4–6 with adversarial denominations / weights.
Compare to brute force (recursive enumeration of all choices).
If greedy ≠ brute force on any input → fall back to DP, no further deliberation.
If greedy = brute force on all stress tests → try to prove via exchange argument.
- If exchange argument works → ship greedy.
- If exchange argument fails or is unclear → fall back to DP. Better safe than sorry under interview time pressure.

The discipline: greedy is opt-in, requires positive proof. DP is the default for optimization problems unless greedy is clearly justified.
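A minimal sketch of the stress-test step for coin change follows, with the DP as the exact oracle. The function names and the random ranges are hypothetical choices. Both solvers are redefined so the block runs on its own.

```python
import random


def coin_change_greedy(coins, amount):
    """Largest-first greedy (the hypothesis under test)."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else -1


def coin_change_dp(coins, amount):
    """Exact minimum via unbounded-knapsack DP (the oracle)."""
    INF = float('inf')
    dp = [0] + [INF] * amount
    for w in range(1, amount + 1):
        for c in coins:
            if c <= w and dp[w - c] + 1 < dp[w]:
                dp[w] = dp[w - c] + 1
    return dp[amount] if dp[amount] != INF else -1


def find_greedy_counterexample(trials=2000, seed=0):
    """Random small coin systems (always containing 1) and small amounts.
    Returns the first (coins, amount) where greedy disagrees with DP, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        coins = sorted({1} | set(rng.sample(range(2, 11), rng.randint(1, 3))))
        amount = rng.randint(0, 30)
        if coin_change_greedy(coins, amount) != coin_change_dp(coins, amount):
            return coins, amount
    return None
```

If this returns a counterexample, the recipe says to switch to DP immediately. If it returns None, move on to the exchange-argument step.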

Tests

def test_coin_change_dp():
    assert coin_change_dp([1, 3, 4], 6) == 2
    assert coin_change_dp([1, 2, 5], 11) == 3
    assert coin_change_dp([2], 3) == -1
    assert coin_change_dp([1], 0) == 0
    assert coin_change_dp([1, 5, 10, 25], 30) == 2
    assert coin_change_dp([1, 5, 10, 25], 41) == 4
    assert coin_change_dp([186, 419, 83, 408], 6249) == 20

def test_greedy_fails_on_counterexample():
    """Document the failure for posterity."""
    assert coin_change_greedy_WRONG([1, 3, 4], 6) == 3  # WRONG; correct is 2
    assert coin_change_dp([1, 3, 4], 6) == 2            # Right answer

def test_greedy_works_on_canonical():
    assert coin_change_greedy_WRONG([1, 5, 10, 25], 30) == 2
    assert coin_change_greedy_WRONG([1, 5, 10, 25], 41) == 4

Correctness Argument (For DP)

DP correctness follows from optimal substructure: dp[w] = 1 + min(dp[w - c] : c ∈ coins, c ≤ w). Each dp[w] is computed from strictly smaller subproblems, so the table fills in O(amount * |coins|) time. The minimum is over all first-coin choices, exhaustively — so we never miss the optimal first choice (in contrast to greedy, which commits to one). See Phase 5 Lab 04 for the full proof.

Common Bugs (In The DP)

Initializing dp[0] = INF instead of 0. (dp[0] = 0 because zero amount needs zero coins.)
Swapping the loop order. For min coins, either order (for w: for c, or for c: for w) is correct. For counting, the order matters: for c: for w counts combinations (LC 518, number of ways to make change), while for w: for c counts ordered sequences (LC 377). Using the amount-outer order on LC 518 overcounts.
Returning dp[amount] without checking INF returns inf (or a large sentinel) instead of -1.

Common Bugs (In The Greedy, If You Do Try It)

Assuming canonicity. Always test against DP on hostile cases first.
Forgetting to return -1 when amount is not zero at the end.
Special-casing inputs that contain 1: every amount is then feasible, but keep the final amount == 0 check anyway so the same code stays correct when 1 is absent.

Mastery Criteria

You can deliver the [1, 3, 4] target 6 counterexample by heart, in <30 seconds, without notes.
You can articulate why greedy on coin change works for [1, 5, 10, 25] but fails for [1, 3, 4] — the canonicity property — in <60 seconds.
You can write the DP solution in <5 minutes from blank.
You can name three other classic problems where greedy fails but DP works (0/1 knapsack, weighted interval scheduling, longest path in a DAG).
When proposing a greedy solution to any problem in mock interviews, you stress-test it against brute force on small adversarial inputs before writing production code.


Phase 7 — Competitive Programming Acceleration

Target level: Hard → Codeforces Div 2 D (rating ~1900–2100)
Expected duration: 2 months (12-month Elite track) / 4 weeks selective topics (6-month Serious track) / skipped or read-only (12-week Accelerated track)
Weekly cadence: ~10 competitive topics + 6 labs + 2 contests/week + 30–60 problems applying them under the framework

A Direct Note On ROI Before You Spend Two Months Here

This phase has the lowest direct ROI per hour for FAANG SWE2 / L4 prep of any phase in this curriculum. If your goal is a Google L4, Meta E4, Amazon SDE2, or similar — you can skip this phase entirely and lose nothing. Phases 01 → 06 plus Phase 8 (practical engineering) cover essentially every problem you will see in those interviews. Modular inverse will not appear in your loop. Convex hull will not appear in your loop. Mo’s algorithm will not appear in your loop. The opportunity cost of two months on competitive programming is two months you could have spent on system design, behavioral prep, or sleep.

This phase has the highest direct ROI per hour for: HFT/quant interviews (Jane Street, HRT, Citadel, Two Sigma, Optiver, IMC, Jump), compiler/runtime/database internals teams (Google’s compiler infra, Microsoft’s CLR, Oracle’s HotSpot, ClickHouse, Snowflake’s query engine), distributed systems coding rounds at the senior+ level where contest-style problems are deliberately used as filters, ICPC-flavored test rounds at startups founded by ex-CP champions, and any interview where the explicit goal is to filter out everyone except the top ~5% of candidates by raw algorithmic horsepower. In those loops, the topics in this phase are not optional decoration — they are the test. A candidate who cannot derive a modular inverse, write binary exponentiation, or sweep events along a coordinate cannot pass an Optiver onsite no matter how good their system design is.

So: decide your target before you start this phase, and do not feel guilty about skipping it if the ROI calculation says skip. The rest of this README assumes you’ve decided to do the work.

What “Competitive Programming Acceleration” Actually Means

Competitive programming is not just “harder LeetCode”. It is a different sport with a different culture, different problem-solving rhythm, and different correctness bar. The differences that matter for interview prep:

Constraints are everything. A LeetCode Hard might say 1 ≤ N ≤ 10^5 and accept any O(N log N) solution. A Codeforces problem will say 1 ≤ N ≤ 5·10^5, T ≤ 10^4 testcases, sum of N ≤ 5·10^5, 2 second time limit, and your O(N log^2 N) solution will TLE while O(N log N) will pass with 200ms to spare. Reading constraints first — before the problem statement — is the single biggest skill jump from LeetCode to CP.
Problems are short. A typical CF Div 2 problem is 3–8 sentences plus 2 example testcases. Information density per word is 5–10× LeetCode. Skim-then-deep-read is wrong; deep-read on first pass is correct.
Brute force is a starting point, not an ending point. Submitting brute force on a contest problem to “lock in partial credit” is a LeetCode habit. On CF you submit only when you believe you have the intended complexity, because rejected submissions are penalized. They cost 50 points each in standard-scored rounds, or 10 penalty minutes each in ICPC-style rounds (Div 3, Div 4, Educational).
Stress testing is a normal part of the workflow. Top CP grandmasters run brute-vs-candidate stress tests against random inputs during a contest, on every problem they’re not 100% certain about. This is the muscle Lab 06 builds.
Editorials are a separate skill. After a contest, reading editorials productively (extracting transferable techniques, not just patching your specific solution) is half the learning. Most candidates read an editorial and take away nothing because they read it as a solution rather than as a textbook.

The competitive programming skill set translates to interview signal in three ways: (1) speed — you become physically faster at typing and debugging, which buys time for harder questions; (2) vocabulary — when an interviewer says “this is a sweep line problem” or “use binary search on the answer”, you have a direct reference rather than re-deriving from scratch; (3) pattern coverage — the long tail of “weird trick” problems that interviewers reach for to filter senior candidates is exactly the long tail of CP techniques.

What You Will Be Able To Do After This Phase

Read a Codeforces Div 4 / Div 3 problem in <2 minutes, decide brute-vs-intended in <1 minute, and submit Div 4 A–F or Div 3 A–E within contest time.
Reach Div 2 C consistently and attempt Div 2 D in ~50% of contests.
Read AtCoder Beginner Contest problems A–F and solve A–E reliably; reach F in ~50% of contests.
Reach AtCoder Regular Contest A–C, with C being the contest-finisher you usually upsolve afterward rather than solve in-contest.
Compute nCr mod p for p prime and n up to 10^7, with precomputed factorials and modular inverses, in <5 minutes from blank.
Implement binary exponentiation (a^b mod m) in <2 minutes and recognize when matrix exponentiation reduces a depth-K linear recurrence from O(N · K) to O(K^3 log N).
Implement the Sieve of Eratosthenes (basic and linear), the smallest-prime-factor sieve, and trial-division factorization, knowing when each is appropriate.
Implement modular inverse via Fermat’s Little Theorem (when modulus is prime) and via extended Euclidean (when it isn’t), and know which to reach for.
Implement Andrew’s monotone chain convex hull in <15 minutes and explain why cross product replaces division for orientation.
Implement a sweep line for the skyline problem and 1D rectangle union; recognize the “sort events, scan, maintain active set” pattern under disguise.
Implement coordinate compression as a one-line preprocessing step and combine it with Fenwick tree to count inversions in O(N log N).
Implement Mo’s algorithm with the canonical block-sqrt sorting comparator and explain its O((N + Q) √N) complexity.
Compute Sprague-Grundy numbers for impartial games and reduce composite games via XOR.
Write a stress-testing harness — brute, candidate, random generator, comparator — and use it to find a planted bug in <5 minutes.
Solve interactive CP problems (binary search a hidden value, query a hidden function) using line-buffered I/O and explicit flush discipline.
Configure fast I/O in your language of choice — cin/cout desync in C++, bufio.NewReader + bufio.NewWriter in Go, sys.stdin.readline + sys.stdout.write in Python, BufferedReader + PrintWriter in Java — without thinking about it.

How To Read This Phase

Read this README once, linearly, end-to-end. Do not try to memorize it. The 19 inline topic sections are reference material — internalized when you actually use them on contest problems, not by re-reading. The 9 progression sections are playbooks — they tell you which contests to enter and what the success bar is.

After the linear pass, do this in order:

Set up your CP toolchain — install your language compiler, configure fast I/O templates, get accounts on Codeforces and AtCoder.
Work Lab 01 through Lab 06 in order. The labs are designed so each one builds a primitive you reuse in the next.
Start the contest progression — Div 4 first, then Div 3, then Div 2. Do not skip Div 4 thinking it’s “too easy”; the goal there is speed, not difficulty.
After every contest, spend at least 2× the contest time on upsolving (problems you didn’t solve in-contest, with the editorial open). Upsolving is where the learning happens.

Each topic entry has a fixed shape:

Definition — what the technique is.
When Used — the problem signal that fires this technique.
Complexity — the canonical time/space.
Classic Problems — 2–4 representative LC / CF / AtCoder problems.
Pitfalls — the bugs that consume the most contest minutes for this technique.

The phase ends with a Mastery Checklist, Exit Criteria, and links to all six labs.

CP Problem-Solving Methodology — The Five-Step Loop

The single most teachable skill in competitive programming is the read → constraints → brute → submit → stress loop. Apply it to every problem.

Read fast. First read takes ~60 seconds. Goal: identify the problem class (graph? DP? math? sweep? game?) and the input/output format. Don’t try to solve yet. If you don’t understand on first read, re-read — but do not start sketching code.
Look at constraints before optimizing. This is the single biggest behavioral difference between CP and LeetCode habits. N ≤ 20 suggests bitmask DP (about 2^N · N). N ≤ 40 suggests meet-in-the-middle (about 2^(N/2) per half). N ≤ 5000 suggests O(N²). N ≤ 2·10^5 suggests O(N log N). N ≤ 10^9 says you don’t iterate N at all: math, binary search on the answer, or a closed form. The constraint usually dictates the algorithm. Read it first; do not write a single line of code without it.
Brute-force first, in your head. Even if brute force won’t pass, the brute force gives you (a) a correctness oracle for stress testing, (b) a starting point for optimization, (c) a 100% reliable answer to “do I understand the problem?”. If you can’t write the brute force, you don’t understand the problem yet — re-read the statement.
Submit as soon as you are confident, and not before. Do not submit a partial / “maybe correct” solution to lock in points; CF/AtCoder penalize wrong submissions. If your code passes the sample inputs, that is necessary but not sufficient. Sample inputs are usually easy cases; passing them is the floor, not the ceiling. Stress-test before submitting on any problem you’re <90% confident on.
Stress test if uncertain. Lab 06 builds this muscle. The pattern has four parts: brute (obviously correct, often exponential), candidate (your fast solution), random generator (small inputs, N ≤ 10), and a comparator that runs both and stops on the first mismatch. Run it for 1000 random tests in 30 seconds. If it doesn’t fail in 1000 trials, it probably won’t fail on the judge.

The loop applies recursively. If you’re stuck in step 3 (can’t write brute force), drop to “what’s the absolute simplest version of this problem?” — usually a smaller N, a special case, or a related problem. Solve that first. That’s almost always how the intended solution is derived.

Inline Topic Reference

Math

1. Modular Arithmetic

Definition

Arithmetic over the residue ring Z/pZ (typically p = 10^9 + 7 or p = 998244353). Addition, subtraction, multiplication, and exponentiation are all defined modulo p. Division is not defined directly — see modular inverse.

When Used

Whenever the answer is “huge” — count of arrangements, count of paths, sum over all subsets — and the problem says output mod 10^9 + 7. This is the most common modifier on counting problems in CP.

Complexity

Addition / subtraction / multiplication are O(1). Watch for overflow: in C++, (a * b) % p overflows int when p ≈ 10^9, so cast to long long first. In C++ and Java, % takes the sign of the dividend (e.g., -7 % 3 == -1), so a difference such as (a - b) % p can be negative. Use ((x % p) + p) % p after subtraction. In Python, integers are arbitrary precision, so overflow doesn’t happen, and % with a positive modulus is never negative. Performance suffers on big intermediates, though, so keep numbers under p aggressively.
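A minimal Python sketch of the helpers follows; the names are hypothetical. It also emulates the C++/Java sign behavior of %, via an illustrative `c_style_rem`, to show why the ((x % p) + p) % p idiom is needed there and not in Python.

```python
MOD = 10**9 + 7


def mod_add(a, b, m=MOD):
    return (a + b) % m


def mod_sub(a, b, m=MOD):
    # Python's % with a positive modulus is never negative, so no "+ m" fix is needed here.
    return (a - b) % m


def mod_mul(a, b, m=MOD):
    # Python ints never overflow; in C++ this product needs a 64-bit type.
    return a * b % m


def c_style_rem(a, m):
    """Emulates C++/Java %: the remainder takes the sign of the dividend a."""
    r = abs(a) % abs(m)
    return -r if a < 0 else r


def normalize(x, m=MOD):
    """The ((x % p) + p) % p idiom from C++/Java, written with C-style %."""
    return c_style_rem(c_style_rem(x, m) + m, m)


def factorial_mod(n, m=MOD):
    """n! mod m, reducing after every multiplication to keep numbers small."""
    result = 1
    for i in range(2, n + 1):
        result = result * i % m
    return result
```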

Classic Problems

CF 1342E (Placing Rooks): counting arrangements modulo 998244353.
AtCoder ABC 174 F (Range Set Query): distinct-element range queries. No modulus is involved, so it is off-topic here; it is included only for query-handling practice.
LC 920 (Number of Music Playlists) — DP with mod.

Pitfalls

Forgetting to mod after every multiplication; the value silently overflows and silently corrupts answers.
Negative numbers after subtraction in C++/Java; always ((x % p) + p) % p.
Using % on double (always wrong; mod is integer-only).

See Lab 01 — Modular Arithmetic.

2. Modular Inverse

Definition

The modular inverse of a mod p is the unique x in [0, p) such that a · x ≡ 1 (mod p), when it exists. Existence requires gcd(a, p) = 1. When p is prime, every a not divisible by p has an inverse.

Two computation methods:

Fermat’s Little Theorem (FLT): if p is prime, a^(p-1) ≡ 1 (mod p), so a^(p-2) ≡ a^(-1) (mod p). Use binary exponentiation in O(log p).
Extended Euclidean Algorithm (extgcd): find x, y such that a·x + p·y = gcd(a, p). If gcd = 1, then x mod p is the inverse. O(log min(a, p)).
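A minimal Python sketch of both methods follows; the function names are hypothetical. Python's three-argument pow performs the binary exponentiation for the Fermat route. Python 3.8+ also accepts pow(a, -1, m) directly, which is handy for cross-checking.

```python
def ext_gcd(a, b):
    """Iterative extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inv_extgcd(a, m):
    """Inverse of a modulo any m >= 2 (prime or not); raises if gcd(a, m) != 1."""
    g, x, _ = ext_gcd(a % m, m)
    if g != 1:
        raise ValueError("no inverse: gcd(a, m) != 1")
    return x % m


def inv_fermat(a, p):
    """Inverse of a modulo a prime p via a^(p-2) mod p; a must not be divisible by p."""
    if a % p == 0:
        raise ValueError("a is divisible by p, no inverse")
    return pow(a, p - 2, p)
```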

When Used

Division by a mod p (replace n / a with n · inv(a)).
Computing nCr mod p from precomputed factorials: nCr = fact[n] · inv(fact[r]) · inv(fact[n-r]).
Probability problems where the answer is a fraction P/Q reported modulo a prime p; the answer is P · Q^(-1) mod p.

Complexity

O(log p) per inverse via either method. For batched inverses of n values, there’s a clever O(n) algorithm using the running product trick — useful when precomputing inverse factorials.
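One common form of that trick is sketched below in Python. It assumes the modulus p is prime and that no value is divisible by p. It takes prefix products, inverts the total once, then walks backward, peeling off one factor at a time. The name is hypothetical.

```python
def batch_inverses(values, p):
    """Inverses of all values modulo prime p using a single modular exponentiation.
    O(n) multiplications plus one O(log p) pow."""
    n = len(values)
    prefix = [1] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % p
    inv_running = pow(prefix[n], p - 2, p)  # 1 / (v_0 * ... * v_{n-1})
    result = [0] * n
    for i in range(n - 1, -1, -1):
        result[i] = inv_running * prefix[i] % p    # (v_0 ... v_{i-1}) / (v_0 ... v_i) = 1 / v_i
        inv_running = inv_running * values[i] % p  # now 1 / (v_0 ... v_{i-1})
    return result
```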

Classic Problems

CF 1462E2 (Close Tuples, hard version): nCr mod p heavy.
AtCoder ABC 178 F (Contrast) — combinatorics with mod.
CF 1342E — modular inverse for counting.

Pitfalls

Using FLT when p is composite — incorrect, must use extgcd.
Forgetting that inv(0) is undefined; guard before calling.
Using FLT when the modulus is prime but you accidentally pass p - 1 instead of p - 2.

3. Binary Exponentiation (Fast Power)

Definition

Compute a^b (or a^b mod m) in O(log b) time by exploiting the binary representation of b. The recurrence: a^b = (a^(b/2))^2 if b is even, a · a^(b-1) if b is odd.

long long power(long long a, long long b, long long m) {
    long long res = 1 % m;
    a %= m;
    while (b > 0) {
        if (b & 1) res = res * a % m;
        a = a * a % m;
        b >>= 1;
    }
    return res;
}

When Used

Anywhere you’d otherwise loop b times multiplying a. With b up to 10^18, naive looping is impossible; binary exponentiation is mandatory. Also the implementation engine for FLT-based modular inverse and matrix exponentiation.

Complexity

O(log b) multiplications. Each multiplication is O(1) for integers but O(K^3) for K×K matrices (giving O(K^3 log b) for matrix exponentiation).

Classic Problems

CF 630I, 630J — direct power computation.
LC 50 (Pow(x, n)) — the canonical binary exponentiation problem.
AtCoder ABC 178 D — DP with mod, uses fast power for inverses.

Pitfalls

Negative exponents (LC 50): handle as 1 / power(x, -n) and watch for INT_MIN (negating overflows).
Base case b = 0: the answer is 1 mod m, which is 0 when m = 1. Start with res = 1 % m, not res = 1.

See Lab 02 — Binary Exponentiation.

4. Matrix Exponentiation

Definition

For a linear recurrence f(n) = c_1 · f(n-1) + c_2 · f(n-2) + ... + c_k · f(n-k), the state vector [f(n), f(n-1), ..., f(n-k+1)] is obtained from the state vector at step n-1 by multiplying by a fixed k×k companion matrix M. Therefore the state n steps after the initial state is M^n · initial_state, and M^n is computed by binary exponentiation in O(k^3 log n) time.
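For Fibonacci, the companion matrix is [[1, 1], [1, 0]], and its n-th power is [[F(n+1), F(n)], [F(n), F(n-1)]]. The pure-Python sketch below uses hypothetical helper names. It is fine for k = 2 but too slow for large k.

```python
MOD = 10**9 + 7


def mat_mul(A, B, mod=MOD):
    """Product of square matrices A and B (lists of lists), entries reduced mod `mod`."""
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) % mod for j in range(k)]
            for i in range(k)]


def mat_pow(M, e, mod=MOD):
    """M^e by binary exponentiation, starting from the identity matrix."""
    k = len(M)
    result = [[int(i == j) for j in range(k)] for i in range(k)]
    while e > 0:
        if e & 1:
            result = mat_mul(result, M, mod)
        M = mat_mul(M, M, mod)
        e >>= 1
    return result


def fib(n, mod=MOD):
    """F(n) mod `mod` with F(0) = 0, F(1) = 1; works for n up to 10^18."""
    return mat_pow([[1, 1], [1, 0]], n, mod)[0][1]
```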

When Used

Linear recurrences where n is up to 10^18 and k (the recurrence depth) is small (typically k ≤ 60). The textbook example is Fibonacci modulo a prime for n = 10^18. Also: counting walks of length n in a graph (M = adjacency matrix), counting paths in a DFA, certain combinatorial DPs over a small fixed state space.

Complexity

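O(k^3 log n) time, O(k^2) space. For k = 2 (Fibonacci), each 2×2 product costs 8 modular multiplications, and binary exponentiation does at most 2 products per bit. So n = 10^18 (about 60 bits) needs fewer than 1,000 modular multiplications.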

Classic Problems

Fibonacci mod p, n = 10^18 — the canonical introduction.
CF 392C (Yet Another Number Sequence) — matrix exponentiation with polynomial coefficients.
AtCoder DP Contest R (Walk) — counting walks of length K in a graph.

Pitfalls

Index off-by-one in the state vector (forgetting that the last entry is f(n-k+1), not f(n-k)).
Forgetting to mod every matrix multiplication entry.
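Using nested Python lists instead of NumPy for matrices: pure Python is too slow for K ≈ 50 and log n ≈ 60. If you switch to NumPy int64, note that with a modulus near 10^9 each product reaches ~10^18, so summing K of them inside a matrix product overflows int64. Either split operands into smaller halves before multiplying or stay with Python integers.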

See Lab 02 — Matrix Exponentiation for Fibonacci.

5. Sieve of Eratosthenes (and Linear Sieve)

Definition

Build a boolean array is_prime[0..N] in O(N log log N) time by, for each prime p ≤ √N, marking all multiples of p (starting from p²) as composite. The linear sieve variant produces the smallest prime factor (SPF) for every integer up to N in exactly O(N) time using the invariant “every composite is sieved once, by its smallest prime factor”.
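A minimal Python sketch of both sieves follows; the function names are hypothetical. The linear sieve breaks out of its inner loop as soon as the current prime exceeds spf[i]. That check is what guarantees each composite is written exactly once, by its smallest prime factor.

```python
def sieve(n):
    """is_prime[0..n] by the Sieve of Eratosthenes, O(n log log n)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = False
    if n >= 1:
        is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):  # smaller multiples were already marked
                is_prime[multiple] = False
        p += 1
    return is_prime


def linear_sieve(n):
    """Smallest prime factor spf[0..n] (0 for 0 and 1) and the list of primes, O(n)."""
    spf = [0] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if spf[i] == 0:  # no smaller prime marked i, so i is prime
            spf[i] = i
            primes.append(i)
        for p in primes:
            if p > spf[i] or i * p > n:
                break
            spf[i * p] = p  # p is the smallest prime factor of i * p
    return spf, primes
```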

When Used

Counting primes up to N for N ≤ 10^7 (Sieve of Eratosthenes is faster than trial division).
Generating all primes up to N for prime-related problems.
Building a smallest-prime-factor table for fast factorization (see Topic 6).
Euler’s totient phi(n) for all n ≤ N in O(N log log N).

Complexity

Sieve of Eratosthenes: O(N log log N), space O(N) (or N/8 with a bitset). Linear sieve: O(N), space O(N) for the SPF table.

Classic Problems

LC 204 (Count Primes) — sieve introduction.
CF 17A (Noldbach problem): count primes up to n that can be written as the sum of two neighboring primes plus 1.
Project Euler 10 (sum of primes below 2M) — sieve of size 2·10^6.

Pitfalls

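Running the outer loop to N instead of √N. This is asymptotically harmless, because for p > √N the inner loop starting at p² never runs. With 32-bit ints, however, p * p overflows once p > 46340 and corrupts the start index. Loop only while p * p ≤ N.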
Starting the inner loop at 2p instead of p² (correct but slower; p², p²+p, p²+2p, ... is the optimal start).
Using vector<bool> in C++ is fine; bool[] is also fine. unordered_set<int> is not fine — too slow.

See Lab 03 — Sieve and Factorization.

6. Prime Factorization

Definition

Decompose n into its prime factors. Two main techniques:

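Trial division. For p = 2, 3, 4, 5, … while p · p ≤ n: while p divides n, divide it out. A composite p never divides n at that point, because its prime factors were already removed. Whatever remains above 1 at the end is itself prime. O(√n) per number.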
Smallest-prime-factor (SPF) sieve. Precompute spf[i] = smallest prime dividing i, for all i ≤ N. Then factor any i ≤ N in O(log i) by repeatedly replacing i with i / spf[i]. O(N log log N) preprocessing; O(log i) per query.
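Both techniques are sketched below in Python; the function names are hypothetical. Note the final `if n > 1` step in trial division, which the Pitfalls list below warns about.

```python
def factorize_trial(n):
    """Prime factors of n >= 1 with multiplicity, by trial division in O(sqrt(n))."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:  # the leftover is itself prime
        factors.append(n)
    return factors


def build_spf(limit):
    """spf[i] = smallest prime factor of i, for 2 <= i <= limit."""
    spf = list(range(limit + 1))
    i = 2
    while i * i <= limit:
        if spf[i] == i:  # i is prime
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:
                    spf[j] = i
        i += 1
    return spf


def factorize_spf(n, spf):
    """Prime factors of 1 <= n < len(spf) in O(log n), using the precomputed table."""
    factors = []
    while n > 1:
        factors.append(spf[n])
        n //= spf[n]
    return factors
```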

When Used

Trial division when factoring a single large n (up to 10^14 is feasible).
SPF sieve when factoring many numbers in a range [1, N] for N ≤ 10^7.
For n up to 10^18, trial division is too slow; use Pollard’s rho (out of scope for this phase, in Phase 12).

Complexity

Trial division: O(√n). SPF sieve: O(log n) per query after O(N log log N) preprocessing.

Classic Problems

CF 1325E — factor and sum of exponents.
LC 263 (Ugly Number) — recursive division by small primes.
AtCoder ABC 169 D — factor and count exponents.

Pitfalls

Forgetting the final step: after the loop, if n > 1, append n as the last prime. Easy to miss; it corrupts every factorization where the original n has a prime factor larger than its square root (e.g., 14 = 2 · 7).
Trial-dividing past √n; once p > √n, n is either 1 or itself prime.
Mixing up “number of distinct primes” with “number of prime factors with multiplicity” — these are very different (e.g., 12 = 2²·3 has 2 distinct, 3 with multiplicity).

See Lab 03 — Sieve and Factorization.

7. Combinatorics (nCr mod p)

Definition

Compute binomial coefficients modulo a prime. For repeated queries, precompute fact[i] = i! mod p and inv_fact[i] = (i!)^(-1) mod p for i up to N. Then nCr = fact[n] · inv_fact[r] · inv_fact[n-r] mod p in O(1) per query.

For n very large (up to 10^18) and p small (p ≤ 10^5), use Lucas’s theorem: C(n, r) mod p = ∏ C(n_i, r_i) mod p, where n_i, r_i are the base-p digits of n, r. The inner C(n_i, r_i) are computed directly because n_i, r_i < p.
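A minimal sketch of the factorial tables and a Lucas wrapper follows, assuming a prime modulus; the function names are hypothetical. The inverse factorials are filled backward from a single Fermat inverse, using 1/(i-1)! = i · 1/i!.

```python
MOD = 10**9 + 7


def build_factorials(n, p=MOD):
    """fact[i] = i! mod p and inv_fact[i] = (i!)^(-1) mod p for 0 <= i <= n (requires n < p)."""
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % p
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], p - 2, p)  # one Fermat inverse
    for i in range(n, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % p
    return fact, inv_fact


def ncr(n, r, fact, inv_fact, p=MOD):
    """C(n, r) mod p in O(1) using the precomputed tables; 0 when r is out of range."""
    if r < 0 or r > n:
        return 0
    return fact[n] * inv_fact[r] % p * inv_fact[n - r] % p


def ncr_lucas(n, r, p):
    """C(n, r) mod a small prime p via Lucas's theorem, one base-p digit at a time.
    Tables are rebuilt per call for brevity; precompute them once in real use."""
    fact, inv_fact = build_factorials(p - 1, p)
    result = 1
    while n or r:
        ni, ri = n % p, r % p
        if ri > ni:  # C(ni, ri) = 0 makes the whole product 0
            return 0
        result = result * ncr(ni, ri, fact, inv_fact, p) % p
        n //= p
        r //= p
    return result
```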

When Used

Counting paths in a grid (C(m+n, m)).
Stars-and-bars: distribute n identical items into k bins → C(n+k-1, k-1).
Inclusion-exclusion sums.
Probability with combinatorial denominators.
Lucas: when n is huge (up to 10^18, e.g., grid problems with huge dimensions) but the prime modulus p is small.

Complexity

Preprocess O(N). Each nCr query O(1). Lucas’s theorem: O(p) preprocessing of factorials below p, then O(log_p n) per query.

Classic Problems

CF 1342E — uses nCr.
LC 62 (Unique Paths) — direct C(m+n-2, m-1).
AtCoder ABC 167 E (Colorful Blocks): counting with nCr and modular powers.

Pitfalls

Forgetting to precompute inv_fact separately; computing each query as fact[n] / (fact[r] · fact[n-r]) and trying to use integer division mod p (this is wrong; you need modular inverse).
Off-by-one in fact[] array (forgetting fact[0] = 1).
For Lucas, forgetting that any r_i > n_i gives C(n_i, r_i) = 0, so the whole product is 0.

See Lab 01 — Modular Arithmetic.

8. GCD, LCM, Extended Euclidean

Definition

gcd(a, b) is the greatest common divisor of a, b. Computed by Euclidean: gcd(a, b) = gcd(b, a mod b), base case gcd(a, 0) = a.
lcm(a, b) = a · b / gcd(a, b). Compute as a / gcd(a, b) · b to avoid overflow on intermediate a · b.
Extended Euclidean algorithm finds, alongside gcd(a, b), integers x, y such that a·x + b·y = gcd(a, b). This is the engine for modular inverse when the modulus isn’t prime.
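A minimal Python sketch of these three operations follows, plus a linear Diophantine solver built on them; the function names are hypothetical. The lcm divides before multiplying, which is the ordering a fixed-width C++/Java version needs.

```python
def gcd(a, b):
    """Euclid: gcd(a, b) = gcd(b, a mod b); the result is always non-negative."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def lcm(a, b):
    """a / gcd * b ordering avoids the a * b intermediate; lcm with 0 is 0."""
    if a == 0 or b == 0:
        return 0
    return abs(a) // gcd(a, b) * abs(b)


def ext_gcd(a, b):
    """Recursive extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    # b*x + (a mod b)*y == g  and  a mod b == a - (a // b) * b
    return g, y, x - (a // b) * y


def solve_diophantine(a, b, c):
    """One integer solution (x, y) of a*x + b*y == c for a, b > 0, or None if gcd(a, b) does not divide c."""
    g, x, y = ext_gcd(a, b)
    if c % g != 0:
        return None
    k = c // g
    return x * k, y * k
```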

When Used

Reducing fractions.
Solving linear Diophantine equations a·x + b·y = c (solution exists iff gcd(a, b) | c).
Modular inverse via extgcd when the modulus is composite.
Cycle-length problems where the answer involves an LCM.

Complexity

O(log min(a, b)) for both gcd and extgcd.

Classic Problems

CF 822A (I'm bored with life): gcd of two factorials, a gcd warm-up.
LC 1071 (Greatest Common Divisor of Strings) — repurposed gcd.
AtCoder ABC 162 E (Sum of gcd of Tuples (Hard)): gcd in a counting problem.

Pitfalls

lcm(a, b) = a * b / gcd(a, b) overflows when a, b are around 10^9. Reorder: lcm = a / gcd * b.
gcd(0, 0) is conventionally 0 (C++ std::gcd and GCC’s __gcd both return 0), so lcm = a / gcd(a, b) * b divides by zero when a = b = 0. Guard this case.
Negative a, b: gcd should always be non-negative; some implementations return signs. Use abs().

Geometry

9. Coordinate Geometry Basics (Cross Product, Orientation)

Definition

For two 2D vectors u = (ux, uy) and v = (vx, vy), the cross product is the scalar ux·vy − uy·vx. Its sign tells you the relative orientation of the vectors: positive = counter-clockwise turn, negative = clockwise, zero = collinear. The orientation of three points A, B, C is the sign of the cross product of B − A and C − A; this is the most-used primitive in computational geometry.

When Used

Determining whether three points form a left turn, right turn, or are collinear (convex hull, polygon orientation).
Determining whether a point is on, left of, or right of a line.
Computing twice the signed area of a triangle (the cross product is twice the signed area).
Computing twice the signed area of a polygon (shoelace formula = sum of cross products).

Complexity

O(1) per cross product / orientation test.

Classic Problems

LC 587 (Erect the Fence) — convex hull, uses orientation.
CF 70D (Dynamic Convex Hull) — uses orientation heavily.
AtCoder ABC 207 D — geometry with cross products.

Pitfalls

Using floating-point for cross product when integer arithmetic would suffice — introduces rounding errors that cause “almost collinear” misclassifications. Use long long (or arbitrary precision) when coordinates are integers.
Confusing CCW (counter-clockwise) with CW (clockwise) sign convention.
Overflow in cross product: with coordinates up to 10^9, differences reach 2·10^9, each product reaches 4·10^18, and the cross product can reach 8·10^18. That still fits in long long (max ≈ 9.2·10^18) but not in int.

10. Convex Hull (Andrew’s Monotone Chain)

Definition

Given a set of 2D points, the convex hull is the smallest convex polygon containing all of them. Andrew’s monotone chain algorithm sorts points by (x, y), then builds the lower hull left-to-right and the upper hull right-to-left. In both passes it uses the cross-product orientation test to pop the last hull point whenever it would make a clockwise (right) turn, or a collinear one for the strict hull. Because the upper pass walks right-to-left, the same test works for both halves, and the result comes out counter-clockwise.

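#include <bits/stdc++.h>
using namespace std;
using P = pair<long long, long long>;  // (x, y); pair's operator< sorts by x, then y

// > 0 if o -> a -> b turns counter-clockwise, < 0 if clockwise, 0 if collinear
long long cross(const P &o, const P &a, const P &b) {
    return (a.first - o.first) * (b.second - o.second) - (a.second - o.second) * (b.first - o.first);
}

vector<P> convex_hull(vector<P> points) {  // strict hull, counter-clockwise
    sort(points.begin(), points.end());
    points.erase(unique(points.begin(), points.end()), points.end());
    if (points.size() < 3) return points;
    vector<P> hull;
    // lower hull
    for (auto &p : points) {
        while (hull.size() >= 2 && cross(hull[hull.size()-2], hull.back(), p) <= 0)
            hull.pop_back();
        hull.push_back(p);
    }
    // upper hull
    size_t lower_size = hull.size() + 1;
    for (int i = (int)points.size() - 2; i >= 0; --i) {
        while (hull.size() >= lower_size && cross(hull[hull.size()-2], hull.back(), points[i]) <= 0)
            hull.pop_back();
        hull.push_back(points[i]);
    }
    hull.pop_back();  // last point is the start, duplicated
    return hull;
}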

When Used

Smallest enclosing polygon problems.
Diameter of a point set (rotating calipers on the hull).
Pre-step for various 2D optimization problems (convex layers, dynamic hulls).

Complexity

O(N log N) — dominated by the sort. The two hull-building passes are O(N) amortized.

Classic Problems

LC 587 (Erect the Fence) — direct convex hull.
CSES Convex Hull (Geometry section) — direct.

Pitfalls

<= 0 vs < 0 in the orientation test: <= 0 removes collinear points from the hull (giving the strict hull); < 0 keeps them (giving the inclusive hull). LC 587 wants the inclusive hull (use < 0, and deduplicate the result: when all points are collinear, the upper pass re-adds the interior points); most CP problems want the strict hull (use <= 0).
Forgetting to remove the duplicated last point.
Sorting by x alone, without a tie-break on y: points that share an x must also be ordered by y, or the two passes can visit them in the wrong order. Lexicographic (x, y) order is the right comparator.

11. Closest Pair of Points (Divide & Conquer Overview)

Definition

Given N points in 2D, find the pair with the smallest Euclidean distance. The naive O(N²) algorithm compares every pair. The classical O(N log N) algorithm sorts by x, recursively solves the left and right halves, takes the minimum distance d of the two halves, then merges by inspecting only points within horizontal distance d of the dividing line, and among those only y-neighbors within distance d. The merge step is O(N) because each strip point only needs to be compared against ~6 nearest y-neighbors, provided the points are kept sorted by y by merging the two halves' y-orders (as in merge sort); re-sorting the strip at every level gives O(N log² N) instead.
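A minimal Python sketch of the divide-and-conquer algorithm, assuming at least two points with integer coordinates; `closest_pair_sq` is an illustrative name. It returns the squared distance, so every comparison stays exact. It also merges the halves' y-sorted lists instead of re-sorting the strip, which is what keeps each level linear.

```python
def closest_pair_sq(points):
    """Smallest squared Euclidean distance among >= 2 integer points, O(N log N)."""
    pts = sorted(points)  # by x, then y

    def rec(lo, hi):
        # Returns (best squared distance within pts[lo:hi], pts[lo:hi] sorted by y).
        if hi - lo <= 3:
            best = float("inf")
            for i in range(lo, hi):
                for j in range(i + 1, hi):
                    dx = pts[i][0] - pts[j][0]
                    dy = pts[i][1] - pts[j][1]
                    best = min(best, dx * dx + dy * dy)
            return best, sorted(pts[lo:hi], key=lambda p: p[1])
        mid = (lo + hi) // 2
        mid_x = pts[mid][0]
        d_left, left = rec(lo, mid)
        d_right, right = rec(mid, hi)
        best = min(d_left, d_right)
        # Merge the two y-sorted halves in O(n) so this level stays linear.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i][1] <= right[j][1]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        # Strip: points whose horizontal distance to the dividing line is below sqrt(best).
        strip = [p for p in merged if (p[0] - mid_x) ** 2 < best]
        for a in range(len(strip)):
            for b in range(a + 1, len(strip)):
                dy = strip[b][1] - strip[a][1]
                if dy * dy >= best:
                    break  # later strip points are even farther away in y
                dx = strip[b][0] - strip[a][0]
                best = min(best, dx * dx + dy * dy)
        return best, merged

    return rec(0, len(pts))[0]
```

Take the square root only once, at output time, if the problem asks for the actual distance.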

When Used

Direct closest-pair problems.
Any problem where you need a guarantee on minimum spacing (geometric clustering, collision detection).

Complexity

O(N log N) time, O(N) space. The recursion T(N) = 2 T(N/2) + O(N) resolves to O(N log N).

Classic Problems

Codeforces educational round problems labeled “closest pair”.
UVa 10245 (The Closest Pair Problem) — direct.

Pitfalls

For most interview problems, N is small enough (≤ 5000) that O(N²) brute force passes, and writing the divide-and-conquer version is not worth the complexity.
Floating-point distance comparison: compare squared distances (integers, exact) instead of square-rooted distances (floats, lossy).

Sweep & Queries

12. Sweep Line

Definition

A sweep line algorithm imagines a vertical (or horizontal) line sweeping across the plane (or 1D number line) and processing events in the order the sweep encounters them. At each event, you update an “active set” — typically a balanced BST or a multiset — and answer queries based on the current state. The key insight is that between events, the active set is constant, so you only need to process at events.
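A minimal Python sketch of the simplest case, the total length covered by a union of intervals, assuming half-open integer intervals [l, r); `union_length` is an illustrative name. The "active set" collapses to a counter here, and the answer only grows between consecutive event positions, where the active set is constant.

```python
def union_length(intervals):
    """Total length covered by half-open intervals [l, r), via a 1D event sweep."""
    events = []
    for l, r in intervals:
        if l < r:
            events.append((l, +1))  # interval opens
            events.append((r, -1))  # interval closes
    events.sort()  # by position; at equal positions closes (-1) sort before opens (+1)
    total = 0
    active = 0  # number of intervals covering the current sweep position
    prev_x = None
    for x, delta in events:
        if active > 0:
            total += x - prev_x  # coverage was constant on [prev_x, x)
        active += delta
        prev_x = x
    return total
```

For example, `union_length([(1, 3), (2, 5), (7, 8)])` covers [1, 5) and [7, 8), a total length of 5.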

When Used

1D interval union (total covered length).
2D rectangle union area (sweep y-coordinate; active set = x-intervals).
Segment intersection problems (Bentley-Ottmann).
Skyline problem (LC 218).
Closest pair (alternate formulation).

Complexity

Typically O((N + E) log N) where E is the number of events; for rectangle union, E = O(N), giving O(N log N).

Classic Problems

LC 218 (The Skyline Problem) — canonical.
LC 850 (Rectangle Area II) — 2D rectangle union.
AtCoder ABC 188 D — 1D event-sweep counting.

Pitfalls

Tie-breaking on event time: when multiple events occur at the same x, process all opens before closes (or vice versa, problem-dependent). Wrong order → off-by-one in the active set.
Using set<int> for the active set when you need to handle duplicate values; switch to multiset<int>.
Updating the answer from the active set in the middle of a batch of same-x events; update it only after all events at the current x have been processed.

See Lab 04 — Sweep Line for Skyline.

13. Coordinate Compression

Definition

Replace large/sparse coordinate values with their ranks in the sorted set of distinct coordinates. If your data has values [10^9, 5, 10^7, 5, 1], compression maps them to ranks [3, 1, 2, 1, 0]. The transformed problem has the same structure but coordinates fit in [0, N), enabling array-indexed data structures (Fenwick tree, segment tree, bucket sort).

sorted_unique = sorted(set(values))
rank = {v: i for i, v in enumerate(sorted_unique)}
compressed = [rank[v] for v in values]

When Used

Counting inversions with a Fenwick tree (values up to 10^9 → compress to [0, N)).
2D rectangle union via sweep line + segment tree on y-coordinates.
DP with a state indexed by a coordinate that’s too large to enumerate.
Almost any problem with value ≤ 10^9 where you would otherwise need a hashmap of size N.
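A minimal sketch of the first use case, assuming a Python list of comparable values; `count_inversions` is an illustrative name. It compresses values to ranks as in the snippet above, then scans left to right and asks a Fenwick tree how many earlier ranks are strictly larger than the current one.

```python
def count_inversions(values):
    """Number of pairs i < j with values[i] > values[j]: compression + Fenwick tree."""
    ranks = {v: i for i, v in enumerate(sorted(set(values)))}  # compress to [0, K)
    k = len(ranks)
    tree = [0] * (k + 1)  # 1-indexed Fenwick tree over ranks

    def add(i):  # record one occurrence of rank i
        i += 1
        while i <= k:
            tree[i] += 1
            i += i & -i

    def count_le(i):  # how many recorded ranks are <= i
        i += 1
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    inversions = 0
    for seen, v in enumerate(values):  # seen = number of elements recorded so far
        r = ranks[v]
        inversions += seen - count_le(r)  # earlier elements strictly greater than v
        add(r)
    return inversions
```

On the example values [10^9, 5, 10^7, 5, 1] this counts 8 inversions while the Fenwick tree only needs 4 slots.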

Complexity

O(N log N) for sorting and deduplication; O(N) after that to relabel.

Classic Problems

LC 315 (Count of Smaller Numbers After Self) — Fenwick + compression.
LC 493 (Reverse Pairs) — Fenwick + compression.
AtCoder ABC 174 F — distinct elements queries.

Pitfalls

Forgetting to deduplicate before ranking. Depending on how ranks are assigned, equal values either get different ranks (breaking equality comparisons) or the ranks have gaps and no longer fit in [0, number of distinct values).
Using compression when not needed (values already in a small range) — adds complexity for no benefit.

See Lab 05 — Coordinate Compression for Inversions.

14. Mo’s Algorithm

Definition

An offline algorithm for answering Q range queries on an array in O((N + Q) √N) total time when (a) you can move the answer from [l, r] to [l-1, r], [l+1, r], [l, r-1], [l, r+1] in O(1) (an O(log N) move multiplies the total by log N); and (b) the queries can be reordered. The trick: sort queries by (l / B, r) where B = √N. Within a block, r only increases, so r moves O(N) per block × √N blocks = O(N √N) in total. Within a block, l moves at most B = √N per query, so l movement is O(Q √N) in total (plus O(N) for crossing block boundaries).
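A minimal Python sketch of Mo's algorithm for "number of distinct values in [l, r]" (the SPOJ DQUERY style of query), assuming 0-indexed inclusive queries; `mo_distinct` and its `add`/`remove` helpers are illustrative names. It shows the structure (block sort, O(1) add/remove, four pointer loops); at N = Q = 10^5 a compiled language is the realistic choice.

```python
import math


def mo_distinct(arr, queries):
    """Offline answers to 'how many distinct values in arr[l..r]' (0-indexed, inclusive)."""
    n = len(arr)
    block = max(1, math.isqrt(n))  # B ~ sqrt(N)
    order = sorted(range(len(queries)),
                   key=lambda i: (queries[i][0] // block, queries[i][1]))
    cnt = {}  # value -> occurrences in the current window
    distinct = 0
    cur_l, cur_r = 0, -1  # current window [cur_l, cur_r], empty at start
    ans = [0] * len(queries)

    def add(i):
        nonlocal distinct
        cnt[arr[i]] = cnt.get(arr[i], 0) + 1
        if cnt[arr[i]] == 1:
            distinct += 1

    def remove(i):
        nonlocal distinct
        cnt[arr[i]] -= 1
        if cnt[arr[i]] == 0:
            distinct -= 1

    for qi in order:
        l, r = queries[qi]
        while cur_r < r:  # grow before shrinking so the window never inverts
            cur_r += 1
            add(cur_r)
        while cur_l > l:
            cur_l -= 1
            add(cur_l)
        while cur_r > r:
            remove(cur_r)
            cur_r -= 1
        while cur_l < l:
            remove(cur_l)
            cur_l += 1
        ans[qi] = distinct
    return ans
```

Only `add` and `remove` change from problem to problem; the sorting and the four pointer loops are the reusable template.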

When Used

“Number of distinct values in [l, r]” queries.
“Sum of f(count(v)) for distinct v in [l, r]” queries.
Mode of a range (with auxiliary frequency-of-frequency structure).
Many problems labeled “offline range queries with no updates” on Codeforces.

Complexity

O((N + Q) √N). With N = Q = 10^5 (√N ≈ 316), that's roughly 6·10^7 pointer moves, which passes a 2-second limit comfortably in a compiled language when add/remove is O(1).

Classic Problems

CF 86D (Powerful array) — sum of cnt² · v over distinct v.
SPOJ DQUERY — distinct values in a range.
CF 220B — count of values equal to their frequency.

Pitfalls

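B ≈ √N balances the two pointer costs: r sweeps up to N once per block (N/B blocks, O(N²/B) total) while l moves up to B per query (O(Q·B) total). Too small a B multiplies the r sweeps and TLEs; too large a B inflates the l moves. When Q is far from N, B ≈ N/√Q balances the two terms better.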
Mo’s algorithm doesn’t handle online queries — queries must be batched and reordered.
The “add/remove element in O(1)” requirement is strict; an O(log N) add/remove makes the total O((N + Q) √N · log N), which usually TLEs.

15. Offline Binary Search / Parallel Binary Search

Definition

When you have Q independent binary-search queries, each of which would naively take O(log V · F), where F is the cost of one evaluation pass (for example, replaying the first t operations of a stream), parallel binary search runs all Q binary searches in lockstep. In each of the O(log V) rounds, bucket the queries by their current midpoint, run one evaluation pass of cost F, and when the pass reaches a query's midpoint, test that query and halve its interval. Total cost: O(log V · (F + Q)) instead of O(Q · log V · F).
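A minimal Python sketch under stated assumptions: the "stream" is a list of range-add operations (l, r, v) with v ≥ 0 on an array of n zeros, so values only grow and each predicate is monotone. Each query (pos, need) asks for the shortest prefix of operations after which position pos is at least need. The function and parameter names are hypothetical. Each round replays all operations once through a Fenwick tree, which is the one-pass-per-round structure described above.

```python
def parallel_binary_search(n, ops, queries):
    """For each query (pos, need), return the smallest k in 1..len(ops) such that
    applying ops[0..k-1] (each (l, r, v): add v >= 0 to positions l..r, inclusive)
    makes position pos >= need, or -1 if no prefix does."""
    m, q = len(ops), len(queries)
    lo = [1] * q  # the answer lies in [lo, hi]; hi == m + 1 encodes "never"
    hi = [m + 1] * q

    def fen_add(bit, i, v):  # Fenwick over a difference array: range add, point query
        i += 1
        while i < len(bit):
            bit[i] += v
            i += i & -i

    def fen_sum(bit, i):
        i += 1
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & -i
        return s

    while True:
        buckets = [[] for _ in range(m + 1)]  # queries grouped by current midpoint
        pending = False
        for j in range(q):
            if lo[j] < hi[j]:
                buckets[(lo[j] + hi[j]) // 2].append(j)
                pending = True
        if not pending:
            break
        bit = [0] * (n + 2)
        for k in range(1, m + 1):  # one replay of the whole stream per round
            l, r, v = ops[k - 1]
            fen_add(bit, l, v)
            fen_add(bit, r + 1, -v)
            for j in buckets[k]:  # test every query whose midpoint is k
                pos, need = queries[j]
                if fen_sum(bit, pos) >= need:
                    hi[j] = k
                else:
                    lo[j] = k + 1
    return [lo[j] if lo[j] <= m else -1 for j in range(q)]
```

Each round costs O((m + q) log n), and there are O(log m) rounds, matching the O(log V · (F + Q)) shape with F = O(m log n).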

When Used

“For each query, find the smallest t such that some property P_query(t) holds”, where P is monotone in t and evaluating P is expensive (e.g., requires processing the first t operations of a stream).
Problems where each query is a binary search over time/index/threshold and the function F(t) changes globally with t.

Complexity

O(log V · (F + Q)). For V = N, F = O(N), Q = N: O(N log N) total.

Classic Problems

AtCoder AGC 002 D (Stamp Rally) — parallel binary search over a DSU rebuilt edge by edge each round.
POI Meteors — the canonical introduction.

Pitfalls

Forgetting that this technique requires queries to be independent (one query’s answer doesn’t depend on another).
Conceptually heavier than Mo’s algorithm; for interview prep, knowing the technique exists and recognizing its signal is more important than implementing it from blank.

Game Theory & Misc

16. Sprague-Grundy / Nim

Definition

In an impartial two-player game (both players have the same available moves at every position), each position has a Grundy number g(pos) defined recursively: g(pos) = mex { g(next) : next ∈ moves(pos) }, where mex is the minimum excluded value (smallest non-negative integer not in the set). Under normal play (the player who cannot move loses), a position has Grundy number 0 iff it's losing for the player to move. The Sprague-Grundy theorem: the Grundy number of a sum of independent games is the XOR of their individual Grundy numbers; in Nim, a pile of size n has Grundy number n.
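A small Python sketch of computing Grundy numbers by memoized mex, for a hypothetical subtraction game in which a move removes 1, 3, or 4 stones from one pile; the move set and function names are illustrative. Tabulating the first values (0, 1, 0, 1, 2, 3, 2, then repeating with period 7) is the "compute small n and spot periodicity" trick.

```python
from functools import lru_cache

MOVES = (1, 3, 4)  # hypothetical subtraction game: remove 1, 3, or 4 stones


def mex(values):
    """Smallest non-negative integer not in values."""
    s = set(values)
    g = 0
    while g in s:
        g += 1
    return g


@lru_cache(maxsize=None)
def grundy(n):
    """Grundy number of one pile of n stones (normal play).
    Recursion depth grows with n; build an iterative table for large n."""
    return mex(grundy(n - m) for m in MOVES if m <= n)


def first_player_wins(piles):
    """Sum of independent piles: XOR the Grundy numbers; nonzero means the mover wins."""
    x = 0
    for p in piles:
        x ^= grundy(p)
    return x != 0
```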

When Used

Any “two-player game, both move optimally, who wins?” problem with no chance / no hidden information / both players have the same moves.
Decomposing complex games into sums of simpler subgames.
Standard Nim (multiple piles, take any number from any pile, last to move wins): the answer is “first player wins iff XOR of pile sizes ≠ 0”.

Complexity

Computing Grundy via memoization: O(states · branching). For games with state space up to 10^6, this is feasible.

Classic Problems

Standard Nim, multi-pile — XOR of pile sizes.
CF 603C (Lieges of Legendre) — Grundy numbers with a pattern found by tabulating small cases.
AtCoder Educational DP Contest K (Stones) — win/lose game DP (not pure Grundy, but the same recurrence idea).

Pitfalls

The theorem only applies to impartial games. Partisan games (chess, where pieces are colored) don’t satisfy Sprague-Grundy.
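The XOR rule holds under the normal convention (last to move wins); under the misère convention (last to move loses) it fails in general. Even misère Nim needs a special case: if every pile has size ≤ 1, the first player wins iff the number of nonempty piles is even; otherwise the ordinary XOR rule applies.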
Large branching factor + large state space → memoization table doesn’t fit. Look for closed-form patterns by computing Grundy for small n and spotting periodicity.

17. Randomized Algorithms / Stress Testing

Definition

Two related concepts:

Randomized algorithms: algorithms that use random choices to achieve good expected complexity (randomized quicksort, treap, hash-based string matching). Las Vegas algorithms are always correct and only their running time is random; Monte Carlo algorithms have bounded running time but may return a wrong answer with small probability.
Stress testing (the bigger interview-prep topic): writing a small brute-force solver, your candidate optimal solver, a random input generator, and a comparator that runs both on every random input until they disagree. This is how CP grandmasters find bugs in their own solutions.
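A minimal in-process stress harness in Python. As a stand-in problem it uses maximum subarray sum; `brute`, `candidate`, `gen`, and `stress` are hypothetical names. The generator uses small N, a small value range, and allows duplicates and negatives. Each test derives its own seed from a per-run base seed, so runs explore new inputs and any mismatch can be replayed exactly.

```python
import random


def brute(a):
    """Reference solution: try every subarray, O(N^2)."""
    best = a[0]
    for i in range(len(a)):
        s = 0
        for j in range(i, len(a)):
            s += a[j]
            best = max(best, s)
    return best


def candidate(a):
    """Solution under test: Kadane's algorithm, O(N)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def gen(rng):
    """Adversarial small inputs: N in 1..8, values in -5..5, duplicates allowed."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(1, 8))]


def stress(fast, slow, make_input, iterations=1000, base_seed=None):
    """Return (seed, input, expected, got) for the first mismatch, or None."""
    if base_seed is None:
        base_seed = random.randrange(10**9)  # different inputs on every run
    for i in range(iterations):
        seed = base_seed + i
        rng = random.Random(seed)  # per-test seed: a failure can be replayed exactly
        a = make_input(rng)
        want, got = slow(a), fast(a)
        if want != got:
            return seed, a, want, got
    return None
```

Usage: `stress(candidate, brute, gen)` returns None when they agree; swapping in a buggy candidate returns the seed and the smallest-looking failing input to debug by hand.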

When Used

Stress testing: on every problem you’re not 100% confident about, before submitting.
Randomized algorithms: when a deterministic guarantee isn’t required (probabilistic data structures: Bloom filter, count-min sketch, treap, randomized convex hull).

Complexity

Stress testing is overhead-only — if both your brute and candidate are fast enough on small inputs, stress testing is essentially free. 1000 random tests on N = 10 finishes in <1 second.

Classic Problems

This is a meta-skill, not a problem class. See Lab 06 — Stress Testing.

Pitfalls

Random generator that doesn’t cover edge cases (e.g., always generating distinct elements when the bug is in duplicate handling). Generate adversarially: small N, small value range, allow duplicates.
Comparator that compares output as raw strings without normalizing whitespace: a trailing space or newline is reported as a mismatch even when the answers agree (false positives).
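Mismanaging the RNG seed: a fixed seed re-tests the same inputs on every run, and an unrecorded seed makes a failing case hard to reproduce. Vary the seed across runs and per test, and print it (or the failing input) on a mismatch.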

See Lab 06 — Stress Testing.

18. Interactive Problems (CP-Style)

Definition

The problem statement defines an interactive protocol: you ask the judge queries (e.g., “is element i greater than element j?”), the judge answers, and after at most K queries you must report the answer. The judge runs as a subprocess and communicates via stdin/stdout. The technique is usually a binary search, ternary search, or adaptive query strategy bounded to O(log N) queries.

When Used

“Find a hidden value in [1, N] in at most ⌈log₂ N⌉ queries” — straight binary search.
“Find the minimum of a unimodal function” — ternary search.
Adversarial / interactive game-tree problems.

Complexity

O(log N) queries for binary/ternary search; the algorithm is otherwise mostly bookkeeping.

Classic Problems

CF 1207E (XOR Guessing) — two batched queries that reveal the high and low bits separately.
CF 1486C1/C2 (Guessing the Greatest) — interactive binary search.
CF 1486D (Max Median) — not interactive, but the same decision-problem-as-binary-search idea.

Pitfalls

Forgetting to flush stdout after every query. This is the single most common interactive bug. In C++: cout << ... << endl; (or cout.flush();); in Python: print(...); sys.stdout.flush() or print(..., flush=True). In Go: bufio.NewWriter with an explicit Flush() after each query. If you don't flush, the judge sees nothing, your program waits for input that never comes, and you get TLE (Codeforces reports it as "Idleness limit exceeded").
Mixing cin/cout with scanf/printf — buffering interleaves badly.
Reading the judge’s response on the wrong line because of an off-by-one in the query loop.
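A minimal Python sketch of an interactive binary search. The protocol is hypothetical: the program prints "? x", the judge replies ">=" if the hidden value is at least x and "<" otherwise, and the answer is reported as "! v". The query function is a parameter so the search can be tested against a local mock judge. In a real submission you would read n, call find_hidden(n), and print the answer with flush=True.

```python
import sys


def stdio_ask(x):
    """Send one query in a hypothetical '? x' format and read the judge's one-line reply."""
    print(f"? {x}", flush=True)  # flush every query, or the judge never sees it
    return sys.stdin.readline().strip()


def find_hidden(n, ask=stdio_ask):
    """Find the hidden value in [1, n], assuming ask(x) returns '>=' if hidden >= x
    and '<' otherwise. Uses at most ceil(log2 n) queries."""
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi + 1) // 2  # upper middle, so lo = mid always makes progress
        if ask(mid) == ">=":
            lo = mid
        else:
            hi = mid - 1
    return lo
```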

19. Fast I/O

Definition

The default I/O mechanisms in most languages are line-buffered, locale-aware, and format-aware — which makes them slow. For CP, where you might read 10^6 integers in a 1-second time limit, fast I/O is mandatory. The technique varies by language:

C++: ios_base::sync_with_stdio(false); cin.tie(nullptr); — the first call decouples cin/cout from C stdio, the second stops cin from flushing cout before every read. Speed-up: ~5×. scanf/printf used directly are a comparably fast alternative.
Java: BufferedReader + StreamTokenizer for input; PrintWriter (with explicit flush()) for output. Scanner is too slow for CP — never use it.
Python: sys.stdin.readline instead of input(); sys.stdout.write instead of print for hot loops. For massive input: data = sys.stdin.buffer.read().split() and parse from there. PyPy3 is 5–10× faster than CPython for raw computation; use it whenever available.
Go: bufio.NewReader(os.Stdin) and bufio.NewWriter(os.Stdout); always defer writer.Flush(). fmt.Scan is slow; use a custom token-by-token reader.
JavaScript / TypeScript (Node.js): process.stdin raw read, parse all input at once. Generally the slowest mainstream CP language; not recommended for N ≥ 5·10^5.
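A minimal Python sketch of the read-everything, write-once pattern, for a hypothetical task (first token n, then n integers; print each one doubled). The parsing and formatting live in a pure `solve(raw)` function so they can be tested without stdin; `main()` is what a submission would call as its entry point.

```python
import sys


def solve(raw):
    """Hypothetical task: first token n, then n integers; output each doubled, one per line."""
    data = raw.split()  # splits the raw bytes on any whitespace
    n = int(data[0])
    nums = map(int, data[1:1 + n])  # int() accepts bytes tokens directly
    return "\n".join(str(2 * x) for x in nums) + "\n"


def main():
    raw = sys.stdin.buffer.read()  # one read for the whole input
    sys.stdout.write(solve(raw))  # one write for the whole output
```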

When Used

Always, on every CP problem with large input. Cost-of-not-using-fast-I/O: 5–10× slowdown, the difference between AC and TLE.
Less critical on LeetCode-style problems where input is already parsed for you.

Complexity

I/O is O(input_size) either way; fast I/O reduces the constant by ~5–10×.

Classic Problems

Any problem with N ≥ 10^6 integers as input. The constraint itself is a hint: “you need fast I/O”.

Pitfalls

Mixing buffered and unbuffered I/O in the same program. In C++, after sync_with_stdio(false), do not mix cin with scanf or cout with printf. The buffers are independent and output appears out of order.
Forgetting to Flush() in Go. Whatever is still in the buffer when the program exits is lost, which for small outputs is all of it.
Java Scanner. Don’t.
Python print in a loop of 10^6 iterations. Each call pays formatting and write overhead (and flushes every line when stdout is a terminal), which is lethal for CP. Build the output with '\n'.join(...) and make one final sys.stdout.write.

Progression Playbooks (How To Practice Each Contest Track)

The 19 topics above are the vocabulary. The progression playbooks below are the training plans — which contests to enter, what the success bar is, and what to do after.

1. Codeforces Div 4 Progression

Target rating: unrated → 1200. Goal: solve A–F reliably, in <90 min, cleanly. Contest cadence: every Div 4, ~2/month. Why Div 4: the floor of CF; problems are LC-Easy to LC-Medium difficulty but with CF-style constraints. The skill being trained is speed — you should never get stuck on a Div 4 problem; if you do, the bottleneck is reading/typing speed, not algorithm knowledge.

How to practice: enter every Div 4 live. Aim for A–E in <60 min, F in <90 min total. After contest, upsolve any problems you missed, with editorial open. Track your fastest A–E solve times in a spreadsheet — they should drop ~30% over your first 5 Div 4 contests.

Exit criterion: consistently solve all six problems A–F in <100 min.

2. Codeforces Div 3 Progression

Target rating: 1200 → 1500. Goal: solve A–E reliably, attempt F. Contest cadence: every Div 3, ~2/month. Skill trained: pattern recognition. Div 3 problems mostly use the patterns from this curriculum (sweep, two pointers, basic DP, basic graph), but the framing is less explicit than LeetCode. You’ll see “given an array, do f(...)” with no LC tag telling you “this is binary search on the answer”.

How to practice: enter every Div 3 live. Aim for A–D in <60 min, E in <120 min. Upsolve E and F after contest. Read the editorial carefully — even if you solved it, see if there’s a slicker approach.

Exit criterion: solve A–E within contest time in 5 of 6 consecutive contests.

3. Codeforces Div 2 Progression

Target rating: 1500 → 1900. Goal: solve A–C reliably, attempt D in 50% of contests. Contest cadence: every Div 2, ~3/month. Skill trained: problem-solving creativity. Div 2 D is where “knowing the technique” stops being enough — you must combine techniques. Div 2 D might require a sweep + DP, or a binary search + greedy, or a Fenwick tree + coordinate compression.

How to practice: enter every Div 2 live. Don’t worry about D in your first 10 contests. Aim for A, B, C clean. After contest, upsolve D with the editorial open; the goal is to learn techniques, not to spend 4 hours stuck. After 10 contests, start attempting D in-contest.

Exit criterion: solve A–C in 8/10 contests; D in 4/10 contests.

4. AtCoder Beginner Contest Progression

Target rating: unrated → 1400 (AtCoder). Goal: solve A–F. Contest cadence: every Saturday/Sunday, ~4/month. Why ABC: AtCoder’s problem statements are remarkably clean (often a single math problem stated tersely), and the difficulty curve A–F is smoother than CF Div 2/3. ABC F is roughly equivalent to CF Div 2 D but with cleaner statements.

How to practice: enter every ABC live. Aim for A–E in <60 min, F in <100 min. ABC F is famous for requiring exactly one “aha” insight per problem; if you don’t see it, move on and upsolve afterward — don’t grind in-contest.

Exit criterion: solve A–E in 9/10 contests; F in 5/10 contests.

5. AtCoder Regular Contest Progression

Target rating: 1400 → 1800 (AtCoder). Goal: solve A–C; attempt D. Contest cadence: ~1/month. Why ARC: ARC is harder than ABC and tests deeper CP techniques — segment tree beats, advanced combinatorics, harder geometry. ARC C is approximately CF Div 1 B / Div 2 E.

How to practice: enter every ARC live. Don’t expect to finish A–C in your first 5 attempts. Upsolve C and D after every contest. ARC is the contest where editorial reading delivers the most learning per minute, because the problems are designed around a specific technique that the editorial will name.

Exit criterion: solve A–B in 8/10 contests; C in 3/10 contests.

6. Stress Testing Methodology

Skill trained: finding bugs in your own solutions before the judge does. Cadence: every problem you’re <90% sure about. Tools: brute solver, candidate solver, random generator, comparator script. See Lab 06 for the full implementation.

How to practice: during a contest, when your candidate passes the samples but you have any doubt, write the stress test. It takes ~3 minutes; it saves the penalty of a wrong submission (50 points in standard Codeforces rounds, 10 penalty minutes in ICPC-style rounds such as Div 3 and Educational), plus the 30 minutes of re-solving after a WA. After contest, on every problem you got wrong, write a stress test that finds your bug. This builds the habit until it's automatic.

Exit criterion: in 3 consecutive contests, no wrong submissions caused by bugs that would have been caught by stress testing.

7. Reading Editorials Productively

Skill trained: extracting transferable techniques rather than specific solutions. Cadence: after every contest, every problem. Why it matters: the difference between a 1500-rated and a 1900-rated coder is mostly that the 1900 has read 5× more editorials and retained them.

How to practice:

Read the editorial before re-implementing your wrong solution. Do not patch your code; rewrite from blank using the editorial’s approach.
Identify the technique name the editorial uses. “This is binary search on the answer.” “This is a two-pointer sliding window.” Add it to your personal technique catalog (a markdown file with one line per technique → one problem where you saw it).
If the editorial is terse (AtCoder editorials are famously curt), look for community write-ups on Codeforces blogs.
Re-implement from blank, in your own style.

Exit criterion: for every solved problem in your CP journal, you can name the technique and cite one other problem that uses it.

8. Implementation Speed Drills

Skill trained: getting from problem statement to AC as fast as possible. Cadence: weekly, 30 min. Why it matters: in a 2-hour Div 2, you have ~25 min per problem on average. If you spend 10 min on syntax errors, you've lost 40% of your budget.

How to practice: pick 5 problems you’ve already solved. Type them again, from blank, against a stopwatch. Your second attempt should be 3–5× faster than your first. Do this on the canonical primitives — Sieve, modular inverse, binary exponentiation, Fenwick tree, sweep line — until you can write each from blank in <5 minutes without referring to notes.

Exit criterion: all 6 lab implementations in this phase, written from blank in <15 minutes each.

9. Contest-Time Strategy (Problem Ordering, When To Skip, When To Stress Test)

Skill trained: allocating limited contest time. Cadence: every contest.

The contest-time playbook:

First 10 minutes: read all problems briefly. Mark each as “trivial / hard / ?”. Solve the trivial ones first to build momentum.
Next 30 minutes: solve the medium ones. Allocate ~15 min per problem.
When stuck for 15 min on one problem: skip. Move to the next. Come back later with a fresh perspective. The cost of grinding stuck is paid in opportunity cost on the next problem.
When passing samples but uncertain: write a stress test. 3 minutes invested, 50-point penalty avoided.
Last 30 minutes: decide between (a) attempting a hard problem you haven’t started, or (b) re-checking your earlier solutions. (b) is usually higher ROI unless the hard problem is worth a lot.
Never give up before time expires. Even if every problem is solved or skipped, re-read the hardest unsolved problem one more time — sometimes the third reading triggers an insight the first two didn’t.

Common strategy mistakes:

Grinding A–B for too long when they should take <15 min total. If A is taking >20 min, you’re misreading; re-read.
Submitting before stress-testing on a problem you’re <90% sure about. The penalty hurts more than the time investment.
Skipping the editorial post-contest because “I’ll do it later”. Later never comes.

Mastery Checklist

Tick when each item is true unprompted — i.e., you’d reach for it without consulting notes.

Read constraints first on every problem; can articulate why N ≤ 18 says bitmask, N ≤ 5000 says O(N²), etc.
Modular inverse via FLT in <2 minutes from blank; can switch to extgcd when modulus is composite.
Binary exponentiation for integers in <2 minutes from blank.
Matrix exponentiation for Fibonacci in <10 minutes from blank, including the matrix multiplication primitive.
Sieve of Eratosthenes (basic) in <3 minutes; SPF sieve in <5 minutes; trial-division factorization in <2 minutes.
nCr mod p with precomputed factorials in <5 minutes from blank.
gcd/lcm/extgcd in <3 minutes from blank.
Cross product orientation test on demand; can identify CCW/CW/collinear by sign.
Convex hull (Andrew’s monotone chain) in <15 minutes from blank.
Sweep line for skyline in <20 minutes from blank.
Coordinate compression as a one-line preprocessing.
Mo’s algorithm template (block sort + add/remove handlers) in <20 minutes from blank.
Sprague-Grundy on demand for impartial games up to small state space.
Stress-testing harness (brute, candidate, generator, comparator) in <10 minutes from blank.
Interactive-problem template with explicit flush after every query.
Fast I/O configured by reflex in your primary language.
Read an editorial productively: name the technique, find one other problem using it.
Codeforces Div 3 A–E in 5/6 contests.
Codeforces Div 2 A–C in 8/10 contests.
AtCoder ABC A–F in 5/10 contests.

Exit Criteria

You graduate Phase 7 when all of the following hold:

You have entered ≥10 Codeforces contests live (any division) and ≥10 AtCoder Beginner Contests live.
You have a Codeforces rating of ≥1500 OR you have solved Div 2 D problems from ≥3 different contests, live or while upsolving.
All 6 labs in this phase are completed with all mastery criteria ticked.
You can explain — out loud, in <60 seconds — what each of the 19 inline topics is, when to reach for it, and what its complexity is.
You have a personal CP journal with ≥30 entries, each one linking the problem name to the named technique used.
You have a stress-testing harness saved as a snippet in your editor and have used it to find a bug in your own code at least 5 times.

If any of these is missing — especially the contests and the journal — you have not exited this phase. Add 2 weeks and re-check.

Labs

#	Lab	What It Builds
1	Modular Arithmetic	`nCr mod p` with factorial precomputation + modular inverse
2	Binary Exponentiation	`a^b mod p` and matrix exponentiation for Fibonacci
3	Sieve and Factorization	Count primes, sum of primes, SPF table for fast factorization
4	Sweep Line	The Skyline Problem (LC 218) via canonical sweep
5	Coordinate Compression	Counting inversions via Fenwick tree + compression
6	Stress Testing	Brute + candidate + random generator + comparator harness


Lab 01 — Modular Arithmetic: `nCr mod p` With Precomputed Factorials

Goal

Master modular arithmetic and modular inverse by building a nCr mod p engine that answers any (n, r) query in O(1) after O(N) preprocessing, for n up to 10^7 and p = 10^9 + 7. By the end of this lab you can write the engine from blank in under 5 minutes.

Background

Counting problems modulo a prime are the single most common framing in competitive programming. The output line “print the answer modulo 10^9 + 7” appears on roughly 30% of CF Div 2/3 problems with combinatorial flavor. Behind that line is a fixed engine: precompute fact[i] = i! and inv_fact[i] = (i!)^(-1) modulo p, and C(n, r) = fact[n] · inv_fact[r] · inv_fact[n-r] mod p. Once you have the engine, dozens of problems collapse to “set up the formula, plug in, output”. The same machinery underlies probability problems where the answer is a/b mod p (output a · b^(-1) mod p).

Interview Context

Quant/HFT interviews use modular-counting problems as direct filters: “how many distinct length-k increasing sequences over [1, n], modulo 10^9 + 7?” If you can’t write the formula and the inverse-factorial trick fluently, you’ve failed in 10 minutes. FAANG L4 interviews almost never ask this directly, but a good L5+ candidate signals fluency by reaching for inv_fact[r] without explanation when a counting problem’s answer needs to be reduced. The signal interviewers want is “this candidate has CP background”; that signal is delivered by writing modular inverse without flinching.

Problem Statement

Given Q queries, each (n_i, r_i) with 0 ≤ r_i ≤ n_i ≤ N_max, output C(n_i, r_i) mod p for p = 10^9 + 7. The engine must support Q ≥ 10^6 queries in <2 seconds total.

LeetCode reference: LC 62 (Unique Paths) asks for C(m+n-2, m-1) directly (no mod needed). LC 920 (Number of Music Playlists) is a DP that uses modular arithmetic but not nCr directly. The pure CP framing appears on Codeforces (e.g., CF 1342E) and as CSES Binomial Coefficients.

Constraints

1 ≤ N_max ≤ 10^7 (table size).
1 ≤ Q ≤ 10^6 (queries).
0 ≤ r ≤ n ≤ N_max (well-formed query).
p = 10^9 + 7 (prime).
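Time limit: 2 seconds. Memory limit: 256 MB. The two factorial tables fit: 2 · 10^7 entries · 8 bytes ≈ 160 MB. Every value is < p < 2^31, so 4-byte storage halves that, provided you widen to 64 bits before multiplying.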

Clarifying Questions

“Is p always prime?” — yes; the FLT-based inverse is only guaranteed when p is prime. If p is composite, fall back to an extgcd-based inverse and watch for non-invertible elements.
“Are n and r always non-negative?” — yes; if r < 0 or r > n, return 0 by convention.
“Do queries arrive online or can they be batched?” — for this lab, online (one at a time after preprocessing), O(1) each.
“Is N_max known in advance?” — yes; we precompute up to N_max.
“Should I support Lucas’s theorem for n larger than N_max?” — out of scope for this lab; see follow-up.

Examples

C(5, 2) mod p = 10.
C(10, 5) mod p = 252.
C(0, 0) mod p = 1.
C(1000000, 500000) mod (10^9+7) = a large nonzero value (engine must handle it).
C(5, 7) mod p = 0 (r > n is outside the well-formed range; the engine returns 0 by convention instead of failing).

Initial Brute Force

For each query, compute C(n, r) = n! / (r! · (n-r)!) using arbitrary-precision arithmetic (e.g., Python’s int), then mod p.

from math import factorial

def nCr_brute(n, r, p):
    if r < 0 or r > n:
        return 0
    return (factorial(n) // (factorial(r) * factorial(n - r))) % p

Brute Force Complexity

Time: far worse than linear per query. n! has Θ(n log n) bits, so naively computing it by repeated big-integer multiplication costs on the order of n² log n bit operations. For n = 10^7 and Q = 10^6, this is infeasible by many orders of magnitude. Useful only as a stress oracle on n ≤ 30.

Optimization Path

Brute force (above) — correctness baseline only.
Precompute factorials, compute inverses on each query. The fact[i] table is built in O(N); each query computes inv(fact[r]) · inv(fact[n-r]) · fact[n], with inv() taking O(log p) via FLT. Total O(N + Q log p). Better, but for Q = 10^6 that is two ~30-step exponentiations per query (log₂ p ≈ 30), on the order of 10^8 modular multiplications: it passes, but it is wasteful.
Precompute inverse factorials, query in O(1). After fact[], compute inv_fact[N] = power(fact[N], p-2, p) once, then inv_fact[i] = inv_fact[i+1] · (i+1) mod p going backward. Total O(N + Q). The optimal solution.

The going-backward trick is the key insight: (i!)^(-1) = ((i+1)!)^(-1) · (i+1), because i! · (i+1) = (i+1)!. So one expensive power call plus a backward sweep gives all inverse factorials in O(N).

Final Expected Approach

Precompute fact[0..N] with fact[0] = 1, fact[i] = fact[i-1] · i mod p.
Compute inv_fact[N] = power(fact[N], p - 2, p) via binary exponentiation (FLT).
Compute inv_fact[i] = inv_fact[i+1] · (i+1) mod p for i from N-1 down to 0.
Each query: if r < 0 or r > n: return 0; else return fact[n] · inv_fact[r] · inv_fact[n-r] mod p.
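A minimal Python sketch of the four steps above. The class name NCrEngine and its nCr method match the interface used in the Tests section; the constructor signature and attribute names are assumptions. It uses Python's built-in three-argument pow for the single FLT call and requires p to be a prime larger than N.

```python
class NCrEngine:
    """C(n, r) mod p in O(1) per query after O(N + log p) preprocessing.
    Assumes p is prime and p > N, so every i! with i <= N is invertible mod p."""

    def __init__(self, N, p=10**9 + 7):
        self.N, self.p = N, p
        fact = [1] * (N + 1)
        for i in range(1, N + 1):  # fact[i] = i! mod p
            fact[i] = fact[i - 1] * i % p
        inv_fact = [1] * (N + 1)
        inv_fact[N] = pow(fact[N], p - 2, p)  # the single FLT call: a^(p-2) = a^(-1) mod p
        for i in range(N - 1, -1, -1):  # backward sweep: (i!)^(-1) = ((i+1)!)^(-1) * (i+1)
            inv_fact[i] = inv_fact[i + 1] * (i + 1) % p
        self.fact, self.inv_fact = fact, inv_fact

    def nCr(self, n, r):
        """Requires n <= N; out-of-range r returns 0 by convention."""
        if r < 0 or r > n:
            return 0
        return self.fact[n] * self.inv_fact[r] % self.p * self.inv_fact[n - r] % self.p
```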

Data Structures Used

Two flat arrays of long long (or int64): fact[0..N] and inv_fact[0..N].
One scalar p.
A power(a, b, m) helper (binary exponentiation).

No heaps, no maps, no trees. The whole thing is two arrays and a closed-form formula.

Correctness Argument

FLT proof of inverse. Fermat's Little Theorem states: for prime p and gcd(a, p) = 1, a^(p-1) ≡ 1 (mod p). Multiplying both sides by a^(-1): a^(p-2) ≡ a^(-1) (mod p). So power(a, p-2, p) is the modular inverse of a whenever a ≢ 0 (mod p). For a = fact[i] with i < p, i! is a product of the integers 1..i, none of which is divisible by the prime p, so p ∤ i! and gcd(fact[i], p) = 1; the inverse exists. This is why p must exceed N_max (here p ≈ 10^9 > 10^7).

Backward inverse-factorial recurrence. We have inv_fact[i+1] = ((i+1)!)^(-1) and want inv_fact[i] = (i!)^(-1). Since (i+1)! = i! · (i+1), taking inverses: ((i+1)!)^(-1) = (i!)^(-1) · (i+1)^(-1), equivalently (i!)^(-1) = ((i+1)!)^(-1) · (i+1). So inv_fact[i] = inv_fact[i+1] · (i+1) mod p. Base case at i = N is given by the explicit FLT call.

nCr formula correctness. C(n, r) = n! / (r! · (n-r)!). In Z/pZ, division by x is multiplication by x^(-1). So C(n, r) ≡ fact[n] · inv_fact[r] · inv_fact[n-r] (mod p). ✓

Complexity

Preprocess: O(N + log p) time (one power call, two linear sweeps), O(N) space.
Each query: O(1) time, O(1) space.
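Total for N = 10^7 and Q = 10^6: about 2·10^7 modular multiplications for the two sweeps plus 2·10^6 for the queries, well under 2 seconds in C++/Java/Go. In Python, use PyPy: the two sweeps are sequential loops that do not vectorize easily.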

Implementation Requirements

Use 64-bit integers: long long (C++), int64 (Go), long (Java); Python's int is arbitrary precision. Cast operands before multiplying to avoid overflow: (long long)fact[n] * inv_fact[r] % p.
power helper handles b = 0 returning 1 and m = 1 returning 0.
Guard r < 0 || r > n returning 0.
Mod once after every multiplication, not at the end.
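A Python sketch of a power helper that meets the two edge-case requirements; the name power(a, b, m) follows the Data Structures Used section. Python's built-in pow(a, b, m) already does this, so the hand-rolled loop mainly mirrors what you would write in C++ or Java.

```python
def power(a, b, m):
    """a^b mod m by binary exponentiation: O(log b) multiplications, reducing after each."""
    result = 1 % m  # b = 0 gives 1, except m = 1 gives 0
    a %= m
    while b > 0:
        if b & 1:  # current low bit of b is set: fold this power of a into the result
            result = result * a % m
        a = a * a % m  # square for the next bit
        b >>= 1
    return result
```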

Tests

def test_nCr():
    p = 10**9 + 7
    eng = NCrEngine(N=20, p=p)
    assert eng.nCr(5, 2) == 10
    assert eng.nCr(10, 5) == 252
    assert eng.nCr(0, 0) == 1
    assert eng.nCr(1, 0) == 1
    assert eng.nCr(1, 1) == 1
    assert eng.nCr(5, 7) == 0       # r > n
    assert eng.nCr(5, -1) == 0      # r < 0
    # Stress vs brute on small N:
    import random
    for _ in range(1000):
        n = random.randint(0, 20)
        r = random.randint(-2, 22)
        assert eng.nCr(n, r) == nCr_brute(n, r, p)

Edge cases: n = 0, r = 0, r = n, r > n, r < 0, n = N_max.

Follow-up Questions

“What if p is not prime?” — FLT fails. Use extgcd-based inverse, but be aware that not every a is invertible (only those with gcd(a, p) = 1).
“What if n is up to 10^18 but p is small (≤ 10^5)?” — Lucas’s theorem: write n, r in base p, multiply C(n_i, r_i) mod p digit-wise.
“What if you need C(n, r) mod 4?” — neither FLT nor straightforward Lucas applies; use Kummer’s theorem or direct computation.
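Lucas's theorem is short to implement. A sketch assuming p is prime, with hypothetical names ncr_small and ncr_lucas. Each digit pair is handled by the multiplicative formula for clarity; for many queries with a fixed small p you would instead reuse a fact / inv_fact table of size p. math.comb (Python 3.8+) is used only as a checker.

```python
import math


def ncr_small(n, r, p):
    """C(n, r) mod prime p for 0 <= n < p, via the multiplicative formula."""
    if r < 0 or r > n:
        return 0
    num = den = 1
    for i in range(r):
        num = num * (n - i) % p
        den = den * (i + 1) % p      # r < p, so den is never 0 mod p
    return num * pow(den, p - 2, p) % p


def ncr_lucas(n, r, p):
    """C(n, r) mod prime p via Lucas: multiply C(n_i, r_i) over base-p digits."""
    if r < 0 or r > n:
        return 0
    result = 1
    while n > 0 or r > 0:
        n_i, r_i = n % p, r % p
        if r_i > n_i:
            return 0                  # a digit of r exceeds n's digit: C(n_i, r_i) = 0
        result = result * ncr_small(n_i, r_i, p) % p
        n //= p
        r //= p
    return result


# Cross-check against exact binomials for a small prime.
for n in range(60):
    for r in range(n + 1):
        assert ncr_lucas(n, r, 7) == math.comb(n, r) % 7
```

The loop runs once per base-p digit, so n up to 10^18 costs only about log_p(n) digit binomials.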

Product Extension

Probability/statistics services (e.g., AdTech bid pacing, fraud-risk scoring) compute combinatorial denominators on the fly. The factorial-precomputation engine is the production primitive: build it once at service startup, then answer each query in O(1) for the lifetime of the server. The same machinery appears in CryptoLib’s prime-modulus arithmetic.

Language / Runtime Follow-ups

Python: integers are arbitrary precision, so overflow isn’t a concern, but each multiplication costs far more than in C++ because interpreter overhead dominates. Use pow(a, b, m) (built-in O(log b) modular exponentiation). Precompute as numpy.int64 arrays only if N ≤ 10^6; otherwise plain lists.
Java: use long everywhere; cast to (long) before multiplying. BigInteger.modPow works but is 10× slower than a hand-rolled loop.
Go: int64 everywhere; math/big if you need extra safety. Hand-roll the loop for performance.
C++: long long (int64_t), (long long)a * b % p. The product of two residues below p ≈ 10^9 is below ~10^18, which fits in long long (< 2^63 ≈ 9.2·10^18). unsigned long long also works; either way, reduce % p after every multiplication.
JS/TS: Number is double-precision and loses integer precision above 2^53. Use BigInt, but it’s ~10× slower; avoid for hot loops larger than 10^6 ops.

Common Bugs

Forgetting to mod after every multiplication; silent overflow.
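Using a ** b % m (e.g., fact[N] ** (p - 2) % p in Python) or any other non-modular exponentiation: it builds the full power first, which for p ≈ 10^9 has billions of digits, before taking the mod. Use three-argument pow(a, b, m) (Python) or your own modular loop.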
Computing inv_fact[i] directly via power(fact[i], p-2, p) for every i. Correct but O(N log p); the backward sweep is O(N).
Off-by-one: inv_fact has N+1 entries (inv_fact[0..N]); allocate accordingly.
Returning 1 for C(n, -1) or C(n, n+1) instead of 0. Always guard.

Debugging Strategy

If nCr(n, r) disagrees with brute on small cases:

Print fact[0..n] and confirm fact[i] = i! for i ≤ 10.
Print inv_fact[0..n] and confirm fact[i] · inv_fact[i] ≡ 1 (mod p) for each i.
If the inverse check fails: make sure the FLT call is power(fact[N], p-2, p), not p-1, and that the backward sweep multiplies by (i+1), not i.
If the inverse check passes but nCr is wrong: the formula is fact[n] · inv_fact[r] · inv_fact[n-r]. Check you’re not using fact[n-r] (without inv_).

Mastery Criteria

Write the engine (factorial table + inverse-factorial backward sweep + query) from blank in <5 minutes.
Articulate why power(fact[N], p-2, p) gives inv_fact[N] in one sentence (FLT).
Articulate why inv_fact[i] = inv_fact[i+1] · (i+1) in one sentence (telescope).
Recognize “answer mod prime” + “counting / paths / arrangements” as the trigger for this engine within 60 seconds of reading a problem.
Switch to extgcd-based inverse if asked “what if p is not prime?”
State Lucas’s theorem on demand and explain when it’s needed.


Lab 02 — Binary Exponentiation and Matrix Exponentiation for Fibonacci

Goal

Master O(log b) exponentiation in two settings: integer fast power (a^b mod p) and matrix fast power (Fibonacci F(n) mod p for n up to 10^18). By the end, you can write integer fast power in <2 minutes and matrix Fibonacci in <10 minutes from blank.

Background

Binary exponentiation is the engine behind almost every “compute a^b for huge b” subroutine in CP and cryptography. The same divide-and-conquer pattern — a^b = (a^(b/2))^2 if b even, else a · a^(b-1) — generalizes from integers to any associative operation: matrix multiplication (linear recurrences), polynomial multiplication (signal processing), function composition (rotation by an angle, repeated f(f(f(...)))). Internalize the O(log b) skeleton once and you get all these for free.

Matrix exponentiation is the canonical extension. Fibonacci, defined F(n) = F(n-1) + F(n-2), has the matrix form [F(n+1), F(n)]^T = M · [F(n), F(n-1)]^T where M = [[1, 1], [1, 0]]. Therefore [F(n+1), F(n)]^T = M^n · [F(1), F(0)]^T = M^n · [1, 0]^T. Computing M^n takes O(log n) matrix multiplications (about log₂ n squarings plus at most as many extra products), each costing 2^3 = 8 scalar multiplications. Total: O(log n), at most about 120 matrix products for n = 10^18.

Interview Context

Quant interviews use exactly this pair. “Compute 2^(10^18) mod (10^9 + 7)” is a 2-minute warm-up. “Compute F(10^18) mod (10^9 + 7)” is the follow-up that filters out candidates who only know the iterative O(N) Fibonacci. A candidate who reaches for matrix exponentiation reflexively when seeing n ≤ 10^18 and a linear recurrence is signalling 1900+ CF rating, which is a strong positive signal at HFT firms.

Problem Statement

Part 1. Implement power(a, b, m) returning a^b mod m for 0 ≤ a < m, 0 ≤ b ≤ 10^18, 1 ≤ m ≤ 10^9 + 7.

Part 2. Implement fib(n, p) returning F(n) mod p for 0 ≤ n ≤ 10^18, p = 10^9 + 7, where F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2).

LeetCode reference: LC 50 (Pow(x, n)) — Part 1 in real-number form. LC 1137 (N-th Tribonacci) — Part 2 with three-term recurrence (analogous matrix form).

Constraints

Part 1: b up to 10^18, so naive O(b) looping is impossible.
Part 2: n up to 10^18, so naive O(n) Fibonacci is impossible.
Time limit: 2 seconds.
Memory: O(1) for Part 1, O(1) for Part 2 (matrices are 2×2).

Clarifying Questions

“Negative b for integer power?” — for modular a^b, defined only when gcd(a, m) = 1, as (a^(-1))^|b|. For real a^b (LC 50), return 1 / a^|b| and watch for INT_MIN overflow on -b.
“What’s 0^0?” — by convention 1.
“Fibonacci indexing — is F(1) = 1 or F(2) = 1?” — confirm; standard CP convention is F(0) = 0, F(1) = 1.
“Matrix exponentiation modulus — same p everywhere?” — yes.
“Required output format for matrices?” — only the scalar Fibonacci value, but you might be asked to return the full state vector.

Examples

power(2, 10, 1000) = 24 (2^10 = 1024).
power(3, 0, 7) = 1.
power(5, 1000000000000000000, 10^9 + 7) = some specific value (must compute).
fib(0, p) = 0, fib(1, p) = 1, fib(10, p) = 55.
fib(10^18, 10^9 + 7) = a specific value the engine must produce.

Initial Brute Force

Part 1: result = 1; for i in 1..b: result = result * a % m. O(b). Part 2: iterative two-variable Fibonacci. O(n).

def power_brute(a, b, m):
    result = 1 % m
    for _ in range(b):
        result = result * a % m
    return result

def fib_brute(n, p):
    if n == 0: return 0
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, (a + b) % p
    return b

Brute Force Complexity

Part 1: O(b) multiplications; at b = 10^18, finishing within the 2-second limit would need about 5·10^17 multiplications per second, which is impossible. Part 2: same. Useful only as oracles on n ≤ 30.

Optimization Path

Part 1.

Naive O(b) (above).
Recursive O(log b): power(a, b) = power(a, b/2)² if b even, else a · power(a, b-1).
Iterative O(log b): process bits of b low-to-high, square a each iteration, multiply into result when the current bit of b is 1. Preferred for stack safety.

Part 2.

Naive O(n) (above).
Fast doubling: F(2k) = F(k) · (2 · F(k+1) − F(k)), F(2k+1) = F(k)² + F(k+1)². O(log n) recursion. Beautiful but error-prone (the subtraction can go negative under mod).
Matrix exponentiation: build M = [[1,1],[1,0]], compute M^n via integer-fast-power lifted to matrices, extract M^n[0][1] as F(n). O(log n) matrix products of 8 scalar multiplications each, roughly 1,000 scalar multiplications for n = 10^18. The general-purpose technique that works for any linear recurrence.
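Fast doubling is compact enough to sketch. The version below, with hypothetical helper names fib_pair and fib_fast_doubling, returns the pair (F(n), F(n+1)) mod p from the pair for n // 2. Python's % always returns a non-negative result, so (2 · F(k+1) − F(k)) % p is safe here; in C++ or Java you would add p before reducing.

```python
def fib_pair(n, p):
    """Return (F(n) mod p, F(n+1) mod p) by fast doubling."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n >> 1, p)          # a = F(k), b = F(k+1), with k = n // 2
    c = a * ((2 * b - a) % p) % p       # F(2k)   = F(k) * (2F(k+1) - F(k))
    d = (a * a + b * b) % p             # F(2k+1) = F(k)^2 + F(k+1)^2
    if n & 1:
        return (d, (c + d) % p)         # n = 2k+1: (F(2k+1), F(2k+2))
    return (c, d)                       # n = 2k:   (F(2k), F(2k+1))


def fib_fast_doubling(n, p):
    return fib_pair(n, p)[0]
```

Recursion depth is about log₂ n, so n = 10^18 needs only ~60 levels.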

Final Expected Approach

Part 1 (iterative):

long long power(long long a, long long b, long long m) {
    long long res = 1 % m;
    a %= m;
    while (b > 0) {
        if (b & 1) res = res * a % m;
        a = a * a % m;
        b >>= 1;
    }
    return res;
}

Part 2 (matrix):

#include <vector>
using namespace std;

typedef vector<vector<long long>> Mat;
const long long P = 1e9 + 7;

Mat matmul(const Mat &A, const Mat &B) {
    int n = A.size();
    Mat C(n, vector<long long>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            if (A[i][k])
                for (int j = 0; j < n; ++j)
                    C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % P;
    return C;
}

Mat matpow(Mat M, long long e) {
    int n = M.size();
    Mat res(n, vector<long long>(n, 0));
    for (int i = 0; i < n; ++i) res[i][i] = 1;  // identity
    while (e > 0) {
        if (e & 1) res = matmul(res, M);
        M = matmul(M, M);
        e >>= 1;
    }
    return res;
}

long long fib(long long n) {
    if (n == 0) return 0;
    Mat M = {{1, 1}, {1, 0}};
    Mat R = matpow(M, n);
    return R[0][1];
}

Data Structures Used

Part 1: scalars only.
Part 2: 2×2 matrices as vector<vector<long long>> or array<array<long long, 2>, 2>.

Correctness Argument

Part 1 (binary exponentiation invariant). Let b = sum b_i 2^i be the binary representation of the original exponent. After k iterations, the variable a equals a_initial^(2^k) mod m, and res equals the product of a_initial^(2^i) mod m for all i < k with b_i = 1. After all iterations, res = a_initial^b mod m. The invariant proves correctness; termination is b → 0 after floor(log₂ b) + 1 iterations.
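The invariant can be checked mechanically. This sketch (power_checked is a hypothetical name, not part of the lab's API) runs the iterative loop from Part 1 and asserts, at the top of iteration k, that a ≡ a_initial^(2^k) and res ≡ a_initial^(b mod 2^k) (mod m). The second condition is the same statement as "res is the product of a_initial^(2^i) over the set bits i < k". Python's built-in three-argument pow is used only as a reference oracle.

```python
def power_checked(a, b, m):
    """Iterative binary exponentiation that asserts its loop invariant each step."""
    a0, b0 = a % m, b
    res, a = 1 % m, a % m
    k = 0
    while b > 0:
        # Invariant at the top of iteration k.
        assert a == pow(a0, 2 ** k, m)
        assert res == pow(a0, b0 % (2 ** k), m)
        if b & 1:
            res = res * a % m            # fold in a0^(2^k) for set bit k
        a = a * a % m                    # a becomes a0^(2^(k+1))
        b >>= 1
        k += 1
    return res
```

When the loop exits, 2^k > b_initial, so b_initial mod 2^k = b_initial and the invariant becomes the result.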

Part 2 (matrix recurrence). Define the column vector v(k) = [F(k+1), F(k)]^T. Then M · v(k) = [[1,1],[1,0]] · [F(k+1), F(k)]^T = [F(k+1) + F(k), F(k+1)]^T = [F(k+2), F(k+1)]^T = v(k+1). By induction, v(n) = M^n · v(0) = M^n · [1, 0]^T, so F(n) = v(n)[1] = (M^n · [1, 0]^T)[1] = M^n[1][0]. Equivalently, M^n[0][1] = F(n) (by symmetry of the Fibonacci matrix). Matrix multiplication is associative, so binary exponentiation lifts directly: same invariant, just with matrices.

Complexity

Part 1: O(log b) time, O(1) space.
Part 2: O(K^3 log n) time for K × K matrices (K = 2 for Fibonacci: up to 2 · log₂ n ≈ 120 products of 8 scalar multiplications, about 960 multiplications for n = 10^18), O(K^2) space.

Implementation Requirements

All multiplications mod p immediately.
Identity matrix initialization for matpow.
Handle n = 0. M^0 is the identity, whose [0][1] entry is 0 = F(0), so matpow already gets this right; the explicit early return just makes the base case obvious to a reader.
Use long long (or int64); intermediate products reach about (10^9)² ≈ 10^18, which fits in long long (max ≈ 9.2·10^18) but not in a 32-bit int.

Tests

def test_power():
    assert power(2, 10, 1000) == 24
    assert power(3, 0, 7) == 1
    assert power(0, 0, 7) == 1
    assert power(0, 5, 7) == 0
    assert power(2, 63, 10**18) == 2**63 % 10**18

def test_fib():
    p = 10**9 + 7
    expected = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    for i, e in enumerate(expected):
        assert fib(i, p) == e
    # Stress vs brute up to N = 30
    for n in range(31):
        assert fib(n, p) == fib_brute(n, p)
    # Spot-check large
    assert fib(10**18, p) == 209783453   # known value, mod 1e9+7

Follow-up Questions

“Compute F(n) for general linear recurrence f(n) = c1 f(n-1) + ... + ck f(n-k)?” — same matrix exponentiation, with a k×k companion matrix.
“Compute the number of paths of length L between two nodes in a graph?” — (adj_matrix)^L, indexed by start/end. Same O(V^3 log L) engine.
“Compute Tribonacci T(n) in O(log n)?” — 3×3 companion matrix, otherwise identical.
“Avoid O(K^3) per multiplication for huge K?” — research topics: Kitamasa’s algorithm reduces to O(K^2 log N); FFT-based polynomial multiplication reduces further. Out of scope here.
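The companion-matrix generalization fits in a few dozen lines. The names mat_mul, mat_pow, and linear_recurrence are hypothetical; coefficients are given as [c1, ..., ck] and initial terms as [f(0), ..., f(k-1)]. The state vector is s(t) = [f(t+k-1), ..., f(t)]^T, the first row of the companion matrix holds the coefficients, and the subdiagonal of 1s shifts the window. So s(t+1) = M · s(t), and f(n) is the top entry of M^(n-k+1) · s(0).

```python
def mat_mul(A, B, p):
    """(A · B) mod p for list-of-lists matrices with compatible shapes."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = A[i][k]
            if aik:
                for j in range(cols):
                    C[i][j] = (C[i][j] + aik * B[k][j]) % p
    return C


def mat_pow(M, e, p):
    """M^e mod p by binary exponentiation, starting from the identity."""
    k = len(M)
    R = [[int(i == j) for j in range(k)] for i in range(k)]
    while e > 0:
        if e & 1:
            R = mat_mul(R, M, p)
        M = mat_mul(M, M, p)
        e >>= 1
    return R


def linear_recurrence(coeffs, init, n, p):
    """f(n) mod p for f(n) = c1 f(n-1) + ... + ck f(n-k), init = [f(0), ..., f(k-1)]."""
    k = len(coeffs)
    if n < k:
        return init[n] % p
    # Companion matrix: coefficient row on top, 1s on the subdiagonal.
    M = [[0] * k for _ in range(k)]
    M[0] = [c % p for c in coeffs]
    for i in range(1, k):
        M[i][i - 1] = 1
    # s(0) = [f(k-1), ..., f(0)]^T as a column vector.
    state = [[init[k - 1 - i] % p] for i in range(k)]
    top = mat_mul(mat_pow(M, n - k + 1, p), state, p)
    return top[0][0]
```

With coeffs = [1, 1, 1] and init = [0, 1, 1] this computes the Tribonacci numbers of LC 1137; with [1, 1] and [0, 1] it reduces to the Fibonacci engine.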

Product Extension

Cryptography (RSA decryption/signing: compute m^d mod n with 2048-bit d and n in milliseconds; same algorithm, with bignums). Computer graphics (rotation matrix R^n for repeated rotations). Markov chain steady-state approximation (P^n for stochastic matrix P, large n). Network reachability (adj^L for paths of length L).

Language / Runtime Follow-ups

Python: built-in pow(a, b, m) is O(log b) and ~10× faster than a hand-rolled Python loop. For matrix power: hand-roll the multiplication; numpy is overkill at K = 2 (FFI overhead exceeds compute).
Java: BigInteger.modPow exists but is 10× slower than a hand-rolled long loop. Use the hand-rolled version unless values exceed long.
Go: math/big.Int.Exp(a, b, m) is correct but slow; hand-roll for hot paths. Matrix: use 2D [][]int64 arrays, not math/big.
C++: use __int128 for intermediate products if m exceeds ~3·10^9, where the product of two residues can exceed 2^63 − 1. For standard m = 10^9 + 7, plain long long suffices.
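JS/TS: BigInt is correct but slow. Number products are exact only below 2^53, so a * b % m with Number is safe only for m below about 9.4·10^7 (√(2^53)); for m = 10^9 + 7, split the multiplication into smaller pieces or use BigInt. Matrix: same caveat.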

Common Bugs

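Initializing res = 1 instead of res = 1 % m (returns 1 instead of 0 when m = 1 and b = 0).
Squaring a before checking the bit instead of after: bit i then multiplies in a^(2^(i+1)) instead of a^(2^i), so the answer is wrong. (The final squaring after the top bit is wasted work, but harmless.)
Matrix multiplication order: powers of the same M commute, so res · M vs M · res does not matter inside matpow, but matmul is non-commutative in general; be deliberate about left/right when applying the result to a state vector or combining different matrices.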
Using int instead of long long for matrix entries; (10^9)² > 2^31.
Recursive power only threatens the stack if it peels off one factor at a time without halving; the halving version recurses at most about 2 · log₂ b ≈ 120 levels for b = 10^18, which is fine.

Debugging Strategy

If power(a, b, m) is wrong:

Test on small cases (b ≤ 5) where you can verify by hand.
Print binary representation of b and confirm the bits are processed.
If correct on small b but wrong on large: overflow. Check that a * a % m uses long long, not int.

If fib(n, p) is wrong:

Verify M^1 = M, M^2 = [[2,1],[1,1]], M^3 = [[3,2],[2,1]]. Each entry is a Fibonacci number.
Verify fib(0..10) matches the canonical sequence.
If small n works but large doesn’t: same overflow check.

Mastery Criteria

Write power(a, b, m) from blank in <2 minutes.
Write the Fibonacci matrix-exponentiation engine from blank in <10 minutes.
Articulate the binary-exponentiation invariant in one sentence.
Articulate the matrix-Fibonacci recurrence (v(n+1) = M · v(n)) in 30 seconds.
Generalize to any linear recurrence: given the recurrence, write the companion matrix in <2 minutes.
Recognize “n up to 10^18 + linear recurrence” as the trigger for matrix exponentiation in <60 seconds.


Lab 03 — Sieve of Eratosthenes and Smallest-Prime-Factor Factorization

Goal

Master prime enumeration and integer factorization at competitive scale: count primes up to N = 5 · 10^6 in under 100ms, and factorize Q = 10^5 integers ≤ N in O(log n) each via a precomputed smallest-prime-factor (SPF) table. By the end, you can write the linear sieve from blank in <5 minutes.

Background

Many CP problems reduce to “is x prime?”, “what are the prime factors of x?”, or “how many primes ≤ N?”. For N up to a few million, a sieve answers all three in O(N log log N) (Eratosthenes) or O(N) (Euler/linear sieve), and the SPF byproduct lets you factorize any x ≤ N in O(log x) time. For larger N (up to 10^9), you need Miller-Rabin primality tests and Pollard’s rho for factorization (out of scope here).

The sieve is the single most reused primitive across number-theoretic CP problems. Get this fluent and you save 5–10 minutes per problem.

Interview Context

LC 204 (Count Primes) is the canonical screen at FAANG mid-level. Quant interviews ramp it up: “factorize each of 10^5 numbers up to 10^7”. Both questions test the same primitive, but at different scales — the scale forces the candidate to pick the right data structure (bitset vs vector<bool> vs SPF table). A candidate who reaches for SPF without prompting signals CP fluency.

Problem Statement

Three sub-problems on the same engine:

Count primes ≤ N. Based on LC 204 (which counts primes strictly less than n, so adjust by one). N ≤ 5 · 10^6.
Sum of primes ≤ N. Variant: instead of count, return sum mod 10^9 + 7.
Factorize Q numbers ≤ N. For each x_i, output its multiset of prime factors. Q ≤ 10^5, x_i ≤ N.

Constraints

N ≤ 5 · 10^6 for sub-problems 1 and 2.
N ≤ 10^7, Q ≤ 10^5 for sub-problem 3.
Time limit: 1 second.
Memory limit: 256 MB. SPF as int array uses 4 · 10^7 = 40 MB; fine.

Clarifying Questions

“Are 0 and 1 prime?” — neither. Standard convention.
“Factorization output: ordered or multiset?” — multiset, e.g., 12 → [2, 2, 3].
“Are inputs guaranteed ≤ N?” — yes for this lab. Otherwise switch to trial division O(√x) or Pollard’s rho.
“Need primes only or all factors?” — primes only here (canonical). Divisors are a separate problem.
“Single-threaded?” — yes. Sieves don’t parallelize trivially without contention.

Examples

Sub-problem 1: count_primes(10) = 4 (2, 3, 5, 7). count_primes(2) = 1. count_primes(1) = 0.
Sub-problem 2: sum_primes(10) = 17.
Sub-problem 3: factorize(12) = [2, 2, 3]. factorize(1) = []. factorize(7) = [7]. factorize(60) = [2, 2, 3, 5].

Initial Brute Force

For sub-problem 1: for each x in [2, N], run trial division up to √x.

def is_prime(x):
    if x < 2: return False
    for d in range(2, int(x**0.5) + 1):
        if x % d == 0: return False
    return True

def count_primes_brute(N):
    return sum(1 for x in range(2, N + 1) if is_prime(x))   # primes <= N, matching count_primes

For sub-problem 3: trial division of each x_i.

def factorize_brute(x):
    out = []
    d = 2
    while d * d <= x:
        while x % d == 0:
            out.append(d); x //= d
        d += 1
    if x > 1: out.append(x)
    return out

Brute Force Complexity

Sub-problem 1 trial division: O(N √N). For N = 5·10^6, that’s ≈ 10^10 ops — impossible in 1 second.

Sub-problem 3 trial division per query: O(√x). For Q = 10^5 and x = 10^7, that’s ~3·10^8 ops — borderline. Sieve-based factorization is O(log x) ≈ 24 ops, ~2.4·10^6 total — comfortably faster.

Optimization Path

Trial division (above).
Sieve of Eratosthenes for sub-problems 1, 2: mark composites; remaining are primes. O(N log log N).
Linear (Euler) sieve + SPF table: each composite is crossed off exactly once via its smallest prime factor; the SPF byproduct enables O(log x) factorization. O(N) preprocessing.

For interviews, Eratosthenes is fine; for CP the linear sieve is the default once you’ve drilled it.

Final Expected Approach

Sieve of Eratosthenes (sub-problems 1, 2):

#include <vector>
using namespace std;

vector<bool> sieve(int N) {
    vector<bool> is_prime(N + 1, true);
    is_prime[0] = is_prime[1] = false;
    for (int i = 2; (long long)i * i <= N; ++i)
        if (is_prime[i])
            for (int j = i * i; j <= N; j += i)
                is_prime[j] = false;
    return is_prime;
}

Linear sieve with SPF table (sub-problem 3):

vector<int> spf;       // smallest prime factor
vector<int> primes;

void linear_sieve(int N) {
    spf.assign(N + 1, 0);
    primes.clear();
    for (int i = 2; i <= N; ++i) {
        if (spf[i] == 0) { spf[i] = i; primes.push_back(i); }
        for (int p : primes) {
            if ((long long)p * i > N || p > spf[i]) break;
            spf[p * i] = p;
        }
    }
}

vector<int> factorize(int x) {
    vector<int> out;
    while (x > 1) { out.push_back(spf[x]); x /= spf[x]; }
    return out;
}

Data Structures Used

vector<bool> (bit-packed in C++) or a bitset for the sieve mark array.
vector<int> for the SPF table (one int per index up to N).
vector<int> for the prime list.

A bitset, like C++'s vector<bool>, stores one bit per flag: 8× denser than a byte-per-flag array such as vector<char> or bool[]. For very large N (5 · 10^7) it matters.

Correctness Argument

Eratosthenes correctness. A prime is never marked: every index marked during iteration i is a multiple j ≥ i² of i, so it has the proper divisor i. Conversely, every composite x ≤ N has a smallest prime factor q with q² ≤ x ≤ N. When the outer loop reaches i = q, is_prime[q] is still true (primes are never marked), so the inner loop marks q², q² + q, q² + 2q, ..., which includes x because x is a multiple of q and x ≥ q². Starting at i² loses nothing, since any composite multiple of i below i² has a smaller prime factor and was already marked. So after the loop, is_prime[x] is true exactly for the primes x ≤ N.

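Linear sieve correctness. Each composite n is marked exactly once, when i = n / spf(n) and the inner loop reaches p = spf(n). That pair is always reached: every prime factor of i is also a prime factor of n, so spf(n) ≤ spf[i] ≤ i (hence spf(n) is already in the prime list), and p · i = n ≤ N, so neither break fires before p = spf(n). The break on p > spf[i] ensures every assignment uses p ≤ spf[i], so p · i has smallest prime factor p (because p ≤ spf[i] ≤ every prime factor of i), and spf[p * i] = p is correct. Each composite has a unique (i, p) pair, hence is visited exactly once → O(N) total work.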

Factorization correctness. While x > 1, spf[x] is x’s smallest prime factor, so emit it and divide. Every iteration strictly decreases x by a factor of at least 2, so loop runs ≤ log₂ x times.
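The exactly-once claim is easy to verify empirically. This Python sketch mirrors the C++ linear sieve and counts how many times each spf entry is written; linear_sieve_counted and the two-argument factorize are hypothetical variants of the C++ functions, with spf passed explicitly instead of kept global.

```python
def linear_sieve_counted(n):
    """Linear sieve returning (spf, primes, writes); writes[x] counts assignments to spf[x]."""
    spf = [0] * (n + 1)
    writes = [0] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if spf[i] == 0:                  # i was never marked, so it is prime
            spf[i] = i
            writes[i] += 1
            primes.append(i)
        for p in primes:
            if p * i > n or p > spf[i]:  # same two break conditions as the C++ code
                break
            spf[p * i] = p
            writes[p * i] += 1
    return spf, primes, writes


def factorize(x, spf):
    """Prime factors of x (with multiplicity) by repeatedly dividing out spf[x]."""
    out = []
    while x > 1:
        out.append(spf[x])
        x //= spf[x]
    return out


spf, primes, writes = linear_sieve_counted(10_000)
assert all(writes[x] == 1 for x in range(2, 10_001))   # every x >= 2 written exactly once
```

Deleting the p > spf[i] condition makes the writes assertion fail, which is a quick way to see what the guard buys.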

Complexity

Eratosthenes: O(N log log N) time, O(N / 8) space with bitset.
Linear sieve + SPF: O(N) time, O(N) space.
Factorize each query: O(log x).
Total for sub-problem 3: O(N + Q log x).

Implementation Requirements

Use (long long)i * i to compute i² to avoid overflow at large N.
Sieve inner loop starts at i², not 2i — multiples below i² are already marked.
For sum-of-primes mod p, mod the running sum after each addition.
SPF allocation: N + 1 entries (for index N).

Tests

def test_count_primes():
    assert count_primes(10) == 4
    assert count_primes(2) == 1
    assert count_primes(1) == 0
    assert count_primes(100) == 25
    # Stress vs brute up to N = 1000
    for n in range(2, 1001):
        assert count_primes(n) == count_primes_brute(n)

def test_factorize():
    linear_sieve(1000)
    assert factorize(12) == [2, 2, 3]
    assert factorize(7) == [7]
    assert factorize(1) == []
    assert factorize(60) == [2, 2, 3, 5]
    # Stress vs brute
    for x in range(1, 1001):
        assert sorted(factorize(x)) == sorted(factorize_brute(x))

Edge cases: N = 0, N = 1, N = 2, prime N, prime power N.

Follow-up Questions

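“What if N = 10^9?” — a full sieve is impractical within 1 second and 256 MB (a byte array alone is 1 GB). Use Miller-Rabin for primality and Pollard’s rho for factorization of individual numbers, or a segmented sieve if you need every prime in a range.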
“What if N = 10^11 and you need only the count?” — Meissel-Lehmer prime counting (roughly O(N^(2/3)) in its Lagarias-Miller-Odlyzko form). Out of scope here.
“How would you parallelize the sieve?” — segmented sieve: split [2, N] into blocks of size √N, sieve each block independently after computing primes ≤ √N. Each block fits in cache; threads work on disjoint blocks.
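A single-threaded segmented sieve, sketched under the assumption that only the list of primes ≤ n is needed (segmented_sieve is a hypothetical name). Each block of about √n entries is sieved using only the base primes ≤ √n, which is what makes blocks independent; a parallel version would hand each (lo, hi) block to a separate worker.

```python
import math


def segmented_sieve(n):
    """Return all primes <= n, sieving blocks of size ~sqrt(n) one at a time."""
    if n < 2:
        return []
    root = math.isqrt(n)
    # 1. Base primes <= sqrt(n) with a small ordinary sieve.
    base = bytearray([1]) * (root + 1)
    base_primes = []
    for i in range(2, root + 1):
        if base[i]:
            base_primes.append(i)
            for j in range(i * i, root + 1, i):
                base[j] = 0
    # 2. Sieve [2, n] block by block; each block only reads base_primes.
    block = max(root, 1)
    primes = []
    for lo in range(2, n + 1, block):
        hi = min(lo + block - 1, n)
        seg = bytearray([1]) * (hi - lo + 1)
        for q in base_primes:
            # First multiple of q in the block that is also >= q*q, so q itself stays unmarked.
            start = max(q * q, (lo + q - 1) // q * q)
            for j in range(start, hi + 1, q):
                seg[j - lo] = 0
        primes.extend(lo + k for k, flag in enumerate(seg) if flag)
    return primes
```

Working memory is O(√n) per block plus the output list, rather than O(n) for the whole mark array.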

Product Extension

Cryptography service: precompute small primes for trial division as a Miller-Rabin pre-filter. Number-theoretic libraries (GMP, FLINT) cache a small-prime table at startup. RSA key generation does trial division up to ~10^5 before falling back to Miller-Rabin, yielding ~10× speedup on rejecting composites.

Language / Runtime Follow-ups

Python: a plain list of bools is slow and memory-hungry (an 8-byte slot per entry); use bytearray (1 byte per flag) and mark multiples with slice assignment. For N = 5·10^6, a bytearray sieve runs in ~0.3 s; PyPy / Cython get to ~50 ms.
Java: BitSet is more memory-efficient than boolean[] and ~equally fast.
Go: plain []bool is fine; built-in bitset doesn’t exist (write your own with []uint64).
C++: vector<bool> is bit-packed by default; a bitset<N + 1> works when N is a compile-time constant. Declare it globally, since a large bitset on the stack can overflow it.
JS/TS: Uint8Array for the sieve. Number integer math is exact below 2^53.
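A bytearray sieve for sub-problems 1 and 2, as a sketch: count_primes and sum_primes are named to match the lab's examples, sieve_bytearray is a hypothetical helper, and p defaults to 10^9 + 7. The slice assignment clears every multiple of i from i² onward in one C-level operation, avoiding a per-element Python loop.

```python
def sieve_bytearray(n):
    """Eratosthenes on a bytearray: is_prime[x] == 1 iff x is prime, for 0 <= x <= n."""
    is_prime = bytearray([1]) * (n + 1)
    for x in range(min(2, n + 1)):
        is_prime[x] = 0                  # 0 and 1 are not prime
    i = 2
    while i * i <= n:
        if is_prime[i]:
            # Extended-slice assignment needs a replacement of exactly matching length.
            is_prime[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
        i += 1
    return is_prime


def count_primes(n):
    """Number of primes <= n."""
    return sum(sieve_bytearray(n))


def sum_primes(n, p=10**9 + 7):
    """Sum of primes <= n, reduced mod p after each addition."""
    is_prime = sieve_bytearray(n)
    total = 0
    for x in range(2, n + 1):
        if is_prime[x]:
            total = (total + x) % p
    return total
```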

Common Bugs

Sieve inner loop starting at 2i instead of i² — correct but slower.
Marking is_prime[0] and is_prime[1] as true.
Off-by-one: allocating N entries instead of N + 1.
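In the linear sieve, dropping the p > spf[i] break: each composite is then written once per qualifying prime factor, so the exactly-once argument and the O(N) bound no longer hold; the work degrades to roughly Eratosthenes’ O(N log log N).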
Computing i * i as int and overflowing for i > ~46340.
For factorization output, dividing x by spf[x] but forgetting to also push spf[x].

Debugging Strategy

If sieve is wrong:

Print primes ≤ 30 from your sieve. Should be [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]. If 1 is included or 9 is missed, check init / inner loop start.

If factorize is wrong:

Print spf[2..30]. Should be [2,3,2,5,2,7,2,3,2,11,2,13,2,3,2,17,...] (smallest prime factor of each index).
If SPF correct but factorization wrong: the loop should emit spf[x] and divide; common bug is doing one or the other.

Mastery Criteria

Write Sieve of Eratosthenes from blank in <3 minutes.
Write linear sieve with SPF from blank in <5 minutes.
Articulate why each composite is marked exactly once in linear sieve.
Use SPF to factorize an integer in <30 seconds.
Recognize N ≤ 10^7 + factor-related question as the sieve trigger within 30 seconds.
State the alternative when N > 10^9 (Miller-Rabin + Pollard’s rho).


Lab 04 — Sweep Line: The Skyline Problem

Goal

Master the sweep-line paradigm by solving the canonical Skyline Problem (LC 218). Process N rectangles by sweeping left-to-right over event points, maintaining a set of active heights, and emitting key points when the maximum height changes. By the end, you can write the full algorithm from blank in <15 minutes.

Background

Sweep line is a meta-technique: convert geometric or interval problems on N objects into a sequence of 2N events (one when an object enters, one when it leaves), sort the events, then process them in order while maintaining a dynamic data structure (multiset, segment tree, BIT) that summarizes the currently active set. The cost shifts from “compare every pair” O(N²) to “sort + log-time updates” O(N log N).

The skyline problem is the canonical sweep-line interview question because it forces every component: event extraction, event sorting (with non-obvious tie-breaking), a dynamic max query, and careful output deduplication. Internalize this and you can derive sweep-line variants for rectangle area unions, interval intersection counting, point-in-rectangle queries, and convex hull (Andrew’s monotone chain).

Interview Context

LC 218 (Hard) — by far the most-asked Hard at FAANG mid-level interviews when the panel wants to test sweep line. The bar is very high: a passing solution must be O(N log N), must handle ties correctly, must deduplicate output, and must not emit phantom key points. A candidate who hand-waves through tie-breaking will be rejected even with otherwise-correct code. Quant interviews use rectangle-union-area, which is the same engine with an integral instead of a max.

Problem Statement

Given N buildings, each described by [left_i, right_i, height_i], return the skyline outline as a list of key points [x, y], where y is the height of the skyline at x and key points appear only when the height changes. The last key point has y = 0 (where the rightmost building ends).

LeetCode reference: LC 218 (The Skyline Problem).

Constraints

1 ≤ N ≤ 10^4 (LC), realistically up to 10^5 for harder variants.
0 ≤ left_i < right_i ≤ 2^31 − 1.
1 ≤ height_i ≤ 2^31 − 1.
Output is sorted by x. No two consecutive key points have the same y.

Clarifying Questions

“Are buildings axis-aligned and non-rotated?” — yes (skyline assumption).
“Can buildings overlap?” — yes, freely.
“Is the ground at y = 0?” — yes; the skyline ends with [x_max, 0].
“Output format: key points where height changes, or every event point?” — only changes; consecutive duplicates are bugs.
“Tie-breaking for events at the same x?” — opens before closes (the building exists at that exact x, contributing its height).

Examples

Input: [[2, 9, 10], [3, 7, 15], [5, 12, 12], [15, 20, 10], [19, 24, 8]].
Output: [[2, 10], [3, 15], [7, 12], [12, 0], [15, 10], [20, 8], [24, 0]].

Walking through: at x = 2, height jumps from 0 to 10 → emit [2, 10]. At x = 3, height jumps to 15 → emit [3, 15]. At x = 7, building of height 15 ends, max drops to 12 → emit [7, 12]. At x = 12, last “left-cluster” building ends → emit [12, 0]. Then the right cluster starts at 15 ([15, 10]), 19 doesn’t change max (still 10 since 8 < 10), 20 ends 10-building → emit [20, 8], 24 ends 8-building → emit [24, 0].

Initial Brute Force

For each x from 0 to x_max, compute the max height over all buildings covering x. Emit [x, h] whenever h changes.

def skyline_brute(buildings):
    x_max = max(b[1] for b in buildings)
    out = []
    prev_h = 0
    for x in range(x_max + 1):
        h = max((b[2] for b in buildings if b[0] <= x < b[1]), default=0)
        if h != prev_h:
            out.append([x, h])
            prev_h = h
    return out

Brute Force Complexity

O(x_max · N). For coordinates up to 2^31, infeasible; even x_max = 10^9 is far too slow. Useful only as an oracle on x_max ≤ 100.

Optimization Path

Brute (above).
Event sweep with sorted multiset. Generate 2N events: (left, -h, OPEN) and (right, h, CLOSE). Sort. Sweep, maintaining a multiset of active heights. Emit [x, max_active] when max changes. O(N log N).
Event sweep with a max-heap and lazy deletion. Same idea, but the heap doesn’t natively support deletion; instead, store (height, end_time) pairs and pop stale entries lazily. Sometimes faster constant factor than multiset. O(N log N).
Divide and conquer. Split buildings into two halves, solve recursively, merge skylines (similar to merge sort). O(N log N).

The multiset approach is the cleanest in C++/Java; Python defaults to the heap-with-lazy-deletion style.
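The divide-and-conquer alternative from the optimization path can be sketched as follows (skyline_dc and merge_skylines are hypothetical names). The merge walks both key-point lists in x order like merge sort, tracks the latest height seen on each side, and emits max(h_left, h_right) only when it changes; when both lists have a key point at the same x, both are consumed together so that no duplicate x is emitted.

```python
def merge_skylines(left, right):
    """Merge two skylines (lists of [x, h] key points sorted by x)."""
    i = j = 0
    h_left = h_right = 0                 # current height contributed by each side
    out = []
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i][0] < right[j][0]):
            x, h_left = left[i]
            i += 1
        elif i == len(left) or right[j][0] < left[i][0]:
            x, h_right = right[j]
            j += 1
        else:                            # same x on both sides: consume both together
            x, h_left = left[i]
            h_right = right[j][1]
            i += 1
            j += 1
        h = max(h_left, h_right)
        if not out or out[-1][1] != h:   # emit only when the merged height changes
            out.append([x, h])
    return out


def skyline_dc(buildings):
    """Skyline by divide and conquer: split, solve halves, merge. O(N log N)."""
    if not buildings:
        return []
    if len(buildings) == 1:
        L, R, H = buildings[0]
        return [[L, H], [R, 0]]
    mid = len(buildings) // 2
    return merge_skylines(skyline_dc(buildings[:mid]), skyline_dc(buildings[mid:]))
```

No heap or multiset is needed, which makes this a reasonable fallback in languages without a convenient ordered multiset.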

Final Expected Approach

Heap with lazy deletion (Python idiom):

import heapq

def get_skyline(buildings):
    events = []
    for L, R, H in buildings:
        events.append((L, -H, R))     # opening: negative height for max-heap via min-heap
        events.append((R, 0, 0))      # closing sentinel: process at this x
    events.sort()

    result = []
    heap = [(0, float('inf'))]        # (negative height, end_time); ground is height 0 forever
    i = 0
    n = len(events)
    while i < n:
        x = events[i][0]
        # Process all events at this x: add openings.
        while i < n and events[i][0] == x:
            L, neg_H, R = events[i]
            if neg_H < 0:             # an opening
                heapq.heappush(heap, (neg_H, R))
            i += 1
        # Lazy-pop expired buildings.
        while heap[0][1] <= x:
            heapq.heappop(heap)
        cur_max = -heap[0][0]
        if not result or result[-1][1] != cur_max:
            result.append([x, cur_max])
    return result

Data Structures Used

A list of events, sorted.
A max-heap (or multiset / sorted set) of active heights with their end times.
A result list with deduplication on consecutive y.

Correctness Argument

Sweep correctness. Define H(x) = the maximum height over all buildings covering x (0 if none). The function H changes value only at event coordinates (the boundaries of buildings). So if we sample H at every event coordinate (in sorted order), we capture every change. The result is the unique sequence of changes in H, which is the skyline.

Tie-breaking at equal x. Multiple events can share an x: an opening, a closing, or both. We process all events at this x together: first add all openings (their buildings exist at this x, contributing their height), then lazy-pop closings (those buildings no longer cover any x ≥ this x). After all events at x are processed, the heap’s top reflects the active set on [x, x'), where x' is the next event coordinate, and we read cur_max. Emit [x, cur_max] if it differs from the previous emission. This handles “two buildings of different heights both starting at the same x” (emit the taller), “one building closing at the same x another opens” (emit the new max), and “two buildings of equal height with overlapping ranges” (no change at the boundary).

Lazy deletion correctness. We never pop a building from the heap before the sweep reaches its end time. The heap’s top might be stale; we pop while heap[0].end ≤ x. Once we stop, the top is the current maximum. Since each building is pushed once and popped at most once, total heap work is O(N log N).

Complexity

O(N log N) time (sort + heap operations).
O(N) space (events and heap).
Output size up to O(N) key points.

Implementation Requirements

Sort events with the right tie-breaking. With (x, -H, R) for openings and (x, 0, 0) for closings, plain tuple ordering puts openings before closings at the same x (an opening’s second field is negative, a closing’s is 0), and taller openings before shorter ones.
Process all events at the same x together before emitting.
Deduplicate consecutive key points with equal y.
Initialize the heap with the ground sentinel (0, ∞) so it’s never empty.

Tests

def test_skyline():
    bs = [[2,9,10],[3,7,15],[5,12,12],[15,20,10],[19,24,8]]
    expected = [[2,10],[3,15],[7,12],[12,0],[15,10],[20,8],[24,0]]
    assert get_skyline(bs) == expected

    # Single building
    assert get_skyline([[1, 2, 1]]) == [[1, 1], [2, 0]]

    # Two same-x openings, different heights
    assert get_skyline([[0, 5, 3], [0, 4, 5]]) == [[0, 5], [4, 3], [5, 0]]

    # Stress vs brute on small inputs
    import random
    for _ in range(200):
        N = random.randint(1, 6)
        bs = []
        for _ in range(N):
            L = random.randint(0, 10)
            R = random.randint(L + 1, L + 5)
            H = random.randint(1, 5)
            bs.append([L, R, H])
        assert get_skyline(bs) == skyline_brute(bs)

Follow-up Questions

“Total area covered (rectangle union)?” — same sweep, but accumulate Δx · max_height between consecutive events. O(N log N).
“Number of overlapping buildings at each x?” — same events, but track the count of active buildings, not max. O(N log N).
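“Online version where buildings stream in?” — if the x-coordinates are known up front, use a segment tree over compressed coordinates with a range “raise to at least H” update per building and a point (or range-max) query for the current height. O(N log N) total.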
“K-th tallest building visible at x?” — segment tree with “K-th order statistic” support, or a balanced BST.
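For the rectangle-union follow-up, here is a sketch that reuses the (L, -H, R) / (R, 0, 0) event encoding and accumulates Δx · max_height between consecutive event coordinates. It assumes every rectangle is a ground-anchored building (bottom edge at y = 0), which is what makes this formula valid; arbitrary rectangles need a different sweep, typically with a segment tree over y. The function name skyline_area is illustrative.

```python
import heapq

def skyline_area(buildings):
    """Area under the skyline, i.e. the union of ground-anchored [L, R, H] rectangles."""
    events = sorted([(l, -h, r) for l, r, h in buildings] +
                    [(r, 0, 0) for _, r, _ in buildings])
    heap = [(0, float("inf"))]  # (-height, end); ground sentinel
    area = 0
    prev_x = None
    i = 0
    while i < len(events):
        x = events[i][0]
        # The heap top was cleaned at prev_x, so it is the height on [prev_x, x).
        if prev_x is not None:
            area += (x - prev_x) * -heap[0][0]
        while i < len(events) and events[i][0] == x:
            _, neg_h, r = events[i]
            if neg_h:
                heapq.heappush(heap, (neg_h, r))
            i += 1
        while heap[0][1] <= x:  # lazy deletion, as in the skyline
            heapq.heappop(heap)
        prev_x = x
    return area
```

On the LC example buildings this gives 10 + 60 + 60 + 0 + 50 + 32 = 212.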

Product Extension

Logging / event analytics: “concurrent active sessions over time” is the same engine with count instead of max. Cloud autoscaling decision: “what’s the peak demand in this 5-minute window?” Same engine with sum instead of max. Calendar conflict detection: pairs of overlapping events found by sweep + active-set membership. Real-time bidding (RTB): impression eligibility windows with priority-tier counts.
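The “concurrent active sessions” case, sketched in Python under the assumption that sessions are half-open [start, end) intervals (the function name peak_concurrency is hypothetical). It keeps a running count instead of a max-heap. Because it reads the count after every individual event rather than after a whole group of events at the same x, it processes closings before openings at equal timestamps; otherwise a session ending at t and one starting at t would be counted as overlapping.

```python
def peak_concurrency(sessions):
    """Maximum number of simultaneously active [start, end) sessions."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # session opens
        events.append((end, -1))    # session closes
    # At equal timestamps, (t, -1) sorts before (t, 1): close before open.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```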

Language / Runtime Follow-ups

Python: heapq is min-heap, so use negative heights for max. SortedList from sortedcontainers is O(log N) insert / delete / max — closest to C++ multiset.
Java: TreeMap<Integer, Integer> mapping height to count, with lastKey() for max. Or PriorityQueue with lazy deletion.
Go: container/heap with custom Less; alternatively a sort.IntSlice you maintain manually.
C++: multiset<int> with *rbegin() for max, erase(find(...)) for single-instance removal. Or priority_queue + lazy deletion.

Common Bugs

Tie-breaking: closing processed before opening at the same x causes phantom drops.
Forgetting the ground sentinel (0, ∞) causes empty-heap crashes when all buildings expire.
Failing to deduplicate consecutive key points: emitting [5, 10], [7, 10], [9, 10] instead of [5, 10].
Removing both copies when two buildings have the same height by using multiset.erase(value) (which erases all). Use erase(find(value)).
Confusing <= vs < on the lazy-pop condition; off-by-one drops a building one event too early or late.
Sorting events by x only and breaking ties arbitrarily — leads to wrong output on dense inputs.

Debugging Strategy

If output has phantom key points or wrong heights:

Print events after sorting; verify openings appear before closings at the same x.
Print heap contents after processing each event; verify the top is the true max active height.
Run on the smallest failing case and compare against brute on x_max ≤ 30.

If output is missing key points:

Check the deduplication condition; an off-by-one might filter out real changes.
Verify all events at the same x are processed before emitting.

Mastery Criteria

Write the full skyline algorithm from blank in <15 minutes.
Articulate the tie-breaking rule and why it’s needed in 30 seconds.
Adapt the sweep to rectangle-union-area in <5 minutes.
Recognize “intervals + boundary events + dynamic property” as the sweep-line trigger in <60 seconds.
State the O(N log N) correctness argument in one sentence.


Lab 05 — Coordinate Compression and Fenwick Tree: Count of Smaller Numbers After Self

Goal

Master coordinate compression as a preprocessing step, then use a Fenwick tree (BIT) over the compressed indices to count, for each element of an array, how many elements to its right are strictly smaller. Solve LC 315 in O(N log N). By the end, you can compress + scan + BIT-update from blank in <10 minutes.

Background

Many problems on integer arrays don’t depend on the values themselves, only on their relative order. When values can be huge (10^9) but N is moderate (10^5), allocating an array indexed by value is impossible. Coordinate compression replaces each value v by its rank in sort(unique(values)), mapping the value space down to [0, U), where U ≤ N is the number of distinct values. This lets us index a BIT or segment tree by rank instead of value, trading O(value_range) space for O(N).
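A minimal Python sketch of the compression step, with bisect_left playing the role of C++ lower_bound (the helper name compress is illustrative):

```python
from bisect import bisect_left

def compress(values):
    """Map each value to its rank in sorted(set(values)); ranks lie in [0, U)."""
    distinct = sorted(set(values))  # sort + unique; U = len(distinct) <= len(values)
    ranks = [bisect_left(distinct, v) for v in values]  # lower_bound per value
    return ranks, len(distinct)
```

For example, compress([10**9, -5, 10**9, 7]) maps -5, 7, 10^9 to ranks 0, 1, 2, so a BIT of size 3 suffices regardless of the value range.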

The pairing of “compress, then BIT over ranks, then scan one direction” is one of the most reused patterns in CP: count inversions, count smaller-on-right, count pairs with sum in a range, K-th smallest in a sliding window — all instances of the same engine.

Interview Context

LC 315 (Hard) — Count of Smaller Numbers After Self. A classic FAANG senior-level question; also asked at quant firms for “count inversions” or “count pairs (i, j) with i < j and a_i > a_j”. Brute is O(N²) and obvious. The expected O(N log N) solution requires the candidate to either:

Compress + BIT (this lab), or
Modified merge sort (counts inversions during the merge step).

Both are valid; BIT is more general and extends to more variants. Senior candidates are expected to know both.

Problem Statement

Given an integer array nums of length N, return an array counts where counts[i] is the number of indices j > i such that nums[j] < nums[i].

LeetCode reference: LC 315 (Count of Smaller Numbers After Self).

Constraints

1 ≤ N ≤ 10^5.
−10^4 ≤ nums[i] ≤ 10^4 (LC); generalize to −10^9 ≤ nums[i] ≤ 10^9.
Time limit: 1 second.

Clarifying Questions

“Strictly smaller, or ≤?” — strictly smaller (per LC). For ≤, change query upper bound.
“Are duplicates allowed?” — yes. They must not count themselves as “smaller”.
“Is modifying the input allowed?” — generally yes, though it isn’t needed: the compression step can sort a copy.
“Return order?” — same order as input (counts[i] aligned with nums[i]).

Examples

nums = [5, 2, 6, 1] → counts = [2, 1, 1, 0]. For 5: indices 1 and 3 have values < 5. For 2: only index 3. For 6: only index 3. For 1: nothing to the right.
nums = [-1] → [0].
nums = [-1, -1] → [0, 0]. Equal values don’t count.

Initial Brute Force

def count_smaller_brute(nums):
    n = len(nums)
    out = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if nums[j] < nums[i]:
                out[i] += 1
    return out

Brute Force Complexity

O(N²). For N = 10^5, that’s 10^10 ops — infeasible.

Optimization Path

Brute (above).
Modified merge sort. Merge-sort the indices by value; during each merge, when a left-half element is placed, add to its counter the number of right-half elements already placed (all of them are strictly smaller and lie to its right). O(N log N).
Coordinate compression + Fenwick tree. Compress values to ranks [0, N). Scan right-to-left. For each element, query BIT prefix sum on [0, rank(v) − 1] (= count of strictly smaller values already seen on the right), then update BIT at rank(v). O(N log N).

The BIT approach is more flexible: it handles “smaller”, “equal”, “in range”, “K-th smallest” with the same engine. The merge sort approach is more efficient in constant factor for pure inversion counting.
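A Python sketch of the merge-sort alternative (the function name count_smaller_merge is illustrative). It merge-sorts indices by value; whenever a left-half index is emitted during a merge, it is credited with the number of right-half indices already emitted, since those are exactly the strictly smaller values to its right. Using strict < when pulling from the right half keeps equal values from being counted.

```python
def count_smaller_merge(nums):
    """For each i, count j > i with nums[j] < nums[i], via merge sort on indices."""
    counts = [0] * len(nums)

    def sort(idx):
        # Returns idx sorted by nums value; updates counts as a side effect.
        if len(idx) <= 1:
            return idx
        mid = len(idx) // 2
        left, right = sort(idx[:mid]), sort(idx[mid:])
        merged = []
        j = 0  # right-half elements placed so far
        for i in left:
            # Emit every right-half element strictly smaller than nums[i].
            while j < len(right) and nums[right[j]] < nums[i]:
                merged.append(right[j])
                j += 1
            counts[i] += j  # all j of them are smaller and to the right of i
            merged.append(i)
        merged.extend(right[j:])
        return merged

    sort(list(range(len(nums))))
    return counts
```

The counter j only moves forward because the left half is already sorted, so each merge is linear.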

Final Expected Approach

Coordinate compression + Fenwick tree, scan right-to-left.

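#include <algorithm>
#include <vector>
using namespace std;

// Fenwick tree: 0-indexed API, 1-indexed internally.
struct BIT {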
    vector<int> t;
    BIT(int n) : t(n + 1, 0) {}
    void update(int i, int v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
    int query(int i) { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
};

vector<int> count_smaller(vector<int>& nums) {
    int n = nums.size();
    vector<int> sorted_nums(nums.begin(), nums.end());
    sort(sorted_nums.begin(), sorted_nums.end());
    sorted_nums.erase(unique(sorted_nums.begin(), sorted_nums.end()), sorted_nums.end());

    auto rank_of = [&](int v) {
        return (int)(lower_bound(sorted_nums.begin(), sorted_nums.end(), v) - sorted_nums.begin());
    };

    BIT bit(sorted_nums.size());
    vector<int> result(n);
    for (int i = n - 1; i >= 0; --i) {
        int r = rank_of(nums[i]);
        result[i] = (r > 0) ? bit.query(r - 1) : 0;
        bit.update(r, 1);
    }
    return result;
}
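The tests below are written in Python, so here is a line-for-line Python port of the C++ solution above: same compress, right-to-left scan, query, and update, with bisect_left standing in for lower_bound. It is a sketch for running the tests, not a different algorithm.

```python
from bisect import bisect_left

class BIT:
    """Fenwick tree over positions 0..n-1 (1-indexed internally)."""

    def __init__(self, n):
        self.t = [0] * (n + 1)

    def update(self, i, v):
        i += 1
        while i < len(self.t):
            self.t[i] += v
            i += i & -i

    def query(self, i):
        """Sum over positions [0, i]."""
        i += 1
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def count_smaller(nums):
    sorted_nums = sorted(set(nums))  # sort + unique
    bit = BIT(len(sorted_nums))
    result = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):  # scan right-to-left
        r = bisect_left(sorted_nums, nums[i])  # lower_bound rank
        result[i] = bit.query(r - 1) if r > 0 else 0
        bit.update(r, 1)
    return result
```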

Data Structures Used

A sorted, deduplicated copy of the input for rank lookup (O(N log N) sort).
A Fenwick tree of size unique_count for prefix-sum updates and queries.
A result array of size N.

Correctness Argument

Compression preserves order. lower_bound returns the first index whose value is ≥ v; for any v in the original array, this is the unique rank. So rank(a) < rank(b) ↔ a < b, and rank(a) == rank(b) ↔ a == b.

Right-to-left scan invariant. When processing index i, the BIT contains exactly the multiset of ranks for indices [i+1, n−1] (each updated by +1). The query bit.query(rank(nums[i]) − 1) returns the count of those ranks strictly less than rank(nums[i]), which equals the count of nums[j] < nums[i] for j > i. After the query, we insert rank(nums[i]) so it’s visible to the next (lower) i.

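Edge case rank == 0. If nums[i] is the minimum, no value is strictly smaller, so result[i] = 0. The code guards r > 0 explicitly. With this particular BIT, query(-1) would also return 0 (++i turns the index into 0, so the loop never runs), but relying on that is fragile across BIT variants.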

Complexity

Sort + unique: O(N log N).
For each of N elements: one rank lookup (O(log N)), one BIT query (O(log N)), one BIT update (O(log N)).
Total: O(N log N) time, O(N) space.

Implementation Requirements

Use 1-indexed BIT internally (++i on entry); 0-indexed externally.
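After sort, deduplicate before binary search. Without it, lower_bound still returns the first occurrence, so ranks stay correct but have gaps, and the BIT is sized by N instead of the number of distinct values.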
Scan right-to-left; left-to-right would count smaller-on-left, a different problem.
Handle n = 0 (return empty array) and n = 1 (return [0]).

Tests

def test_count_smaller():
    assert count_smaller([5, 2, 6, 1]) == [2, 1, 1, 0]
    assert count_smaller([-1]) == [0]
    assert count_smaller([-1, -1]) == [0, 0]
    assert count_smaller([]) == []
    assert count_smaller([1, 2, 3, 4]) == [0, 0, 0, 0]   # already sorted asc
    assert count_smaller([4, 3, 2, 1]) == [3, 2, 1, 0]   # sorted desc
    assert count_smaller([1, 1, 1]) == [0, 0, 0]         # all equal

    # Stress vs brute
    import random
    for _ in range(200):
        n = random.randint(0, 50)
        nums = [random.randint(-100, 100) for _ in range(n)]
        assert count_smaller(nums) == count_smaller_brute(nums)

Follow-up Questions

“Count strictly greater on right?” — change query to bit.query(maxRank) − bit.query(rank(v)).
“Count of equal values on right?” — bit.query(rank(v)) − bit.query(rank(v) − 1).
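“Count of values in [lo, hi] on right?” — bit.query(hi_rank) − bit.query(lo_rank − 1), where lo_rank = lower_bound(lo) and hi_rank = upper_bound(hi) − 1 over the compressed values. lo and hi need not appear in the array, so rank(lo) and rank(hi) may not exist.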
“Total inversions in array?” — sum the result array.
“Online streaming version?” — compression needs every value up front, so replace the compressed BIT with an order-statistics tree (a balanced BST with subtree sizes, or the pbds tree in C++) that accepts new values as they arrive.
“Why not segment tree?” — works equally well; a BIT uses N + 1 slots versus roughly 2N to 4N for a typical segment tree, has a smaller constant factor, and is shorter to code. Use a segment tree if you need range max or range update.
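A Python sketch of the [lo, hi] follow-up (the function name count_in_range_on_right is hypothetical). Because lo and hi may not appear in the array, it maps them to ranks with bisect_left(lo) and bisect_right(hi) − 1 rather than looking up rank(lo) and rank(hi) directly. The Fenwick tree is inlined to keep the block self-contained.

```python
from bisect import bisect_left, bisect_right

def count_in_range_on_right(nums, lo, hi):
    """For each i, count j > i with lo <= nums[j] <= hi."""
    sorted_nums = sorted(set(nums))
    t = [0] * (len(sorted_nums) + 1)  # Fenwick tree, 1-indexed internally

    def update(i):  # add 1 at rank i (0-indexed)
        i += 1
        while i < len(t):
            t[i] += 1
            i += i & -i

    def query(i):  # number of inserted ranks in [0, i]
        i += 1
        s = 0
        while i > 0:
            s += t[i]
            i -= i & -i
        return s

    lo_rank = bisect_left(sorted_nums, lo)       # first rank with value >= lo
    hi_rank = bisect_right(sorted_nums, hi) - 1  # last rank with value <= hi
    out = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        if lo_rank <= hi_rank:  # empty range otherwise
            out[i] = query(hi_rank) - query(lo_rank - 1)
        update(bisect_left(sorted_nums, nums[i]))
    return out
```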

Product Extension

Recommendation systems: “for each user’s recently watched item, how many of their next 10 watches were lower-rated?” — same problem on rating arrays. Quant trading: rank-based features (“how many of the next 100 ticks are below this tick’s price?”) computed in batch via this engine. Search ranking: “for each query, count the number of subsequent queries with shorter session length” — feature engineering pipelines.

Language / Runtime Follow-ups

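Python: bisect.bisect_left for rank lookup; BIT as a plain list. sortedcontainers.SortedList is a 1-line alternative (sl.bisect_left(v); sl.add(v)) but is generally slower than a list-based BIT in pure Python.
Java: Arrays.binarySearch on the deduplicated array for rank lookup (on arrays with duplicates it does not guarantee which matching index is returned), int[] BIT.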
Go: sort.SearchInts, []int BIT.
C++: lower_bound + vector<int> BIT. The pbds order-statistics tree (tree<> from __gnu_pbds) gives find_by_order and order_of_key directly, but is slower than BIT.

Common Bugs

Forgetting to deduplicate after sort: lower_bound still works, but BIT size becomes N even if values are mostly equal — wasted space, not incorrect.
Using upper_bound instead of lower_bound for rank: gives wrong answer for duplicates.
Scanning left-to-right instead of right-to-left: solves a different problem.
1-indexed vs 0-indexed off-by-one in the BIT.
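Querying rank − 1 without checking rank > 0: depending on the BIT's indexing convention, query(-1) may silently return 0 (as in the implementation above), read out of bounds, or otherwise misbehave. Guard it explicitly.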
Comparing with ≤ instead of < (depends on problem statement).

Debugging Strategy

If output is wrong:

Print compressed ranks alongside original values; check ordering preserved.
Print BIT state after each insertion (small input). Verify prefix sums equal “count of values ≤ rank”.
Run on [5, 2, 6, 1] and trace right-to-left: at i=3, BIT empty, result=0. Insert rank(1). At i=2, query rank(6)−1 → count of values with rank ≤ rank(6)−1 already in BIT → expect 1.
Compare against brute on N ≤ 30 random inputs.

Mastery Criteria

Write coordinate compression (sort + unique + lower_bound) from blank in <2 minutes.
Write Fenwick tree (update + query) from memory in <3 minutes.
Articulate the right-to-left scan invariant in one sentence.
Adapt the engine to “count in range [lo, hi]” or “total inversions” in <5 minutes.
Recognize “values up to 10^9, N up to 10^5, count-by-rank query” as the compression+BIT trigger in <60 seconds.


Lab 06 — Stress Testing: Finding Bugs You Can’t Spot by Reading

Goal

Build a stress-testing harness that pits a brute-force oracle against a candidate solution on randomly generated small inputs and automatically catches mismatches. Use it to find two intentionally injected bugs in a candidate solution that reading the code wouldn’t reveal. By the end, stress testing is your default tool when a CP solution passes the samples but fails hidden tests, and you can build a fresh harness for any problem in <10 minutes.

Background

In CP and high-stakes interviews, you frequently hit the same wall: “my code passes the samples but gets WA (wrong answer) on hidden tests.” Reading harder doesn’t help, because your mental model of the algorithm is exactly what’s wrong. Stress testing breaks this by replacing your brain with the machine: write a slow-but-obviously-correct brute, write a fast random input generator, and let the computer compare outputs on millions of small cases. The first mismatch is a tiny counterexample you can debug by hand.

Top competitive programmers (red-rated on Codeforces) use stress testing routinely. It is heavily underused by interview candidates, and it is often the fastest way to debug a “looks right but doesn’t work” solution.

Interview Context

Interview problems rarely leave time to write a stress test, but the meta-skill (converting “I’m stuck” into a structured experiment) is exactly what staff-level interviews probe. In quant interviews, “How would you validate this code?” is a standard question; “I’d write a brute oracle and stress-test against random inputs” is a strong answer. In CP, stress testing becomes a standard tool once you are solving Div 2/Div 1 problems.

This lab is the meta-lab for the entire phase: build the tooling that will save you in every other lab.

Problem Statement

Given a candidate solution and a brute-force oracle for some problem, build a harness that:

Generates random inputs of small size (so brute is fast).
Runs both solutions.
Compares outputs and prints / dies on the first mismatch.
Uses a seeded RNG so failures are reproducible.

We’ll use prefix sum range queries as the test problem. Sub-problems:

Brute: for each query (l, r), sum a[l..r] directly. O(N) per query.
Candidate: precompute prefix[i] = a[0] + ... + a[i-1], answer each query as prefix[r+1] - prefix[l]. O(1) per query.

We will deliberately introduce two bugs in the candidate and use the harness to find each.

Constraints

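For stressing: N ≤ 50, values in [-10, 10], Q ≤ 50 queries. At these sizes the brute costs at most N · Q = 2500 additions per test, so thousands of tests per second are realistic even in Python, and far more in C++. The sample generator below uses even smaller bounds (N ≤ 10, values in [-5, 5], Q ≤ 5).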
The candidate should pass when correct, fail clearly when buggy.

Clarifying Questions

“What’s the comparison rule for outputs?” — exact equality (lists of ints).
“How small should random inputs be?” — small enough that brute finishes in microseconds per test, big enough to expose edge cases. Rule of thumb: smallest size where the candidate’s structure is non-trivial.
“Is determinism required?” — yes; seed the RNG so the same failing test re-runs identically.
“Output format on mismatch?” — input that triggered, both outputs, the seed.

Examples

A passing harness run prints nothing (or a PASSED line). A failing run prints the first counterexample:

MISMATCH at iteration 47 (seed=12345):
  input:     ([3, -2, 1, 4, -1], [(0, 4), (1, 3), (2, 2)])
  brute:     [5, 3, 1]
  candidate: [5, 3, 0]

Initial Brute Force

The brute oracle is the brute force here:

def prefix_sum_brute(a, queries):
    return [sum(a[l:r+1]) for l, r in queries]

It’s O(N · Q), slow but unambiguous.

Brute Force Complexity

O(N · Q) per test case. For N = Q = 50 that is at most ~2500 additions per test, so even 100,000 stress iterations are cheap enough to run routinely.

Optimization Path

The harness itself doesn’t optimize. The thing being tested (candidate solution) does. The harness’s job is to detect when the optimization is incorrect.

Final Expected Approach

The candidate (intentionally with bugs to discover):

def prefix_sum_candidate(a, queries):
    n = len(a)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + a[i]
    # BUG 1: should be prefix[r+1] - prefix[l], not prefix[r] - prefix[l]
    return [prefix[r] - prefix[l] for l, r in queries]

The harness:

import random

def stress(brute, candidate, gen, n_iters=10000, seed=0):
    rng = random.Random(seed)
    for it in range(n_iters):
        inp = gen(rng)
        b_out = brute(*inp)
        c_out = candidate(*inp)
        if b_out != c_out:
            print(f"MISMATCH at iteration {it} (seed={seed}):")
            print(f"  input:     {inp}")
            print(f"  brute:     {b_out}")
            print(f"  candidate: {c_out}")
            return False
    print(f"PASSED {n_iters} iterations.")
    return True

def gen_prefix_sum(rng):
    n = rng.randint(1, 10)
    a = [rng.randint(-5, 5) for _ in range(n)]
    q = rng.randint(1, 5)
    queries = []
    for _ in range(q):
        l = rng.randint(0, n - 1)
        r = rng.randint(l, n - 1)
        queries.append((l, r))
    return (a, queries)

stress(prefix_sum_brute, prefix_sum_candidate, gen_prefix_sum, n_iters=1000, seed=42)

The harness will fire and report the first failing input within a few iterations. Fix prefix[r] to prefix[r+1]. Re-run.

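Now introduce BUG 2 (subtle): keep prefix[0] = 0 but compute prefix[i+1] = prefix[i] + a[i+1] (an off-by-one in the recurrence). Stress finds it again: with the loop still running over range(n), the last iteration reads a[n] and raises IndexError on the very first input, so the plain harness above surfaces it as an uncaught exception rather than a printed mismatch.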

After both fixes, re-running with n_iters=10000 prints “PASSED 10000 iterations.”, and you know the candidate is (probably) correct.

Data Structures Used

A seeded RNG (random.Random(seed) in Python; mt19937 in C++).
A generator function returning a random valid input.
A brute oracle.
A candidate solution.
The harness loop.

Correctness Argument

Why this works. If the brute oracle is correct (small enough that we can verify by hand), and the candidate disagrees, then the candidate is wrong on that input. We have a counterexample. Conversely, if the candidate matches the brute on n_iters = 10⁵ random small inputs, it’s probably correct — the chance that a non-trivial bug survives all of them is small for most input distributions, but not zero. Add adversarial inputs (all same, all max, all min, edge sizes 0, 1) explicitly to the generator to harden coverage.

Why determinism matters. When the harness fires, you want to re-run with the same seed to reproduce; without a seed, the bug might evaporate next run.

Why small inputs. The smaller the input, the faster the brute runs (more iterations) and the easier the counterexample is to debug by hand. A common practice is to start the first pass with N ≤ 5 and grow N only if nothing fires.

Complexity

Per stress iteration: brute is O(N · Q); candidate is O(N + Q); comparison is O(Q). Harness overhead (mostly input generation) is small. For N = Q = 10 and n_iters = 10⁵, total work is on the order of 10⁷ elementary operations: a few seconds in CPython, well under a second in C++.

Implementation Requirements

Seed the RNG explicitly. Print the seed on failure.
Generator must produce valid inputs (respects all problem constraints — non-empty arrays, valid index ranges, etc.).
On mismatch, print the minimal failing input. (Optional refinement: shrink the input by retrying with smaller sizes once you’ve found a failure.)
Cover edge cases: empty array, single element, all-same values, max-size inputs.
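Following the last two requirements, here is a hypothetical variant of gen_prefix_sum (gen_prefix_sum_with_corners is an invented name) that keeps every input valid (non-empty array, 0 ≤ l ≤ r < n) while deliberately biasing toward tiny sizes and corner-case shapes: all-equal, all-max, all-min, sorted, and reverse-sorted arrays, plus an always-present full-range query. The empty array is left out because the prefix-sum problem has no valid query on it; the value bounds [-5, 5] mirror the original generator.

```python
import random

def gen_prefix_sum_with_corners(rng):
    """Mostly random inputs, with corner-case shapes mixed in on purpose."""
    n = rng.choice([1, 1, 2, rng.randint(1, 10)])  # bias toward tiny sizes
    shape = rng.choice(["random", "all_same", "all_max", "all_min", "sorted", "reversed"])
    if shape == "all_same":
        a = [rng.randint(-5, 5)] * n
    elif shape == "all_max":
        a = [5] * n
    elif shape == "all_min":
        a = [-5] * n
    else:
        a = [rng.randint(-5, 5) for _ in range(n)]
        if shape == "sorted":
            a.sort()
        elif shape == "reversed":
            a.sort(reverse=True)
    queries = []
    for _ in range(rng.randint(1, 5)):
        l = rng.randint(0, n - 1)
        r = rng.randint(l, n - 1)
        queries.append((l, r))
    queries.append((0, n - 1))  # always exercise the full range
    return (a, queries)
```

It plugs into the same harness: stress(prefix_sum_brute, candidate, gen_prefix_sum_with_corners, seed=42).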

Tests

The harness is itself code; it should be tested.

def test_harness_finds_planted_bug():
    def buggy(a, queries):
        return [sum(a[l:r]) for l, r in queries]   # off-by-one: should be a[l:r+1]
    # The harness should fire (return False) on a buggy candidate.
    result = stress(prefix_sum_brute, buggy, gen_prefix_sum, n_iters=1000, seed=1)
    assert result is False

def test_harness_passes_correct_candidate():
    def correct(a, queries):
        n = len(a)
        prefix = [0] * (n + 1)
        for i in range(n):
            prefix[i + 1] = prefix[i] + a[i]
        return [prefix[r + 1] - prefix[l] for l, r in queries]
    result = stress(prefix_sum_brute, correct, gen_prefix_sum, n_iters=1000, seed=1)
    assert result is True

Follow-up Questions

“What if brute is also buggy?” — write brute as straightforwardly as possible (read the problem statement and implement it word-for-word). Skip optimizations. If both brute and candidate agree on a bug, you have no oracle and stress testing fails. Mitigation: cross-check brute against the problem’s sample I/O before stressing.
“How to shrink a counterexample?” — once a failing input is found, repeatedly remove array elements / queries / values; if it still fails, keep the smaller input. Greedy; good enough.
“Stress testing for graph problems?” — generator emits random small graphs (N ≤ 6 vertices, random edges). Brute is BFS / DFS over all paths.
“What if the answer isn’t unique?” — write a checker instead of a comparator: validate the candidate’s output as one of many valid answers (e.g., for “any valid topological order”).
“Multi-threaded stress?” — easy with a process pool (in Python, prefer processes to threads, since the GIL serializes pure-Python work); give each worker its own seed offset.
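A greedy shrinker for the prefix-sum harness, following the “remove a piece, keep it if it still fails” recipe. The name shrink and the fails(a, queries) predicate are assumptions for this sketch; it drops whole queries first, then array elements (discarding queries that fall out of range), and stops when no single removal still fails. The result is small, not guaranteed minimal.

```python
def shrink(a, queries, fails):
    """Greedily shrink a failing (a, queries) input while fails(a, queries) stays True."""
    changed = True
    while changed:
        changed = False
        # Pass 1: try dropping one query at a time.
        for k in range(len(queries)):
            q2 = queries[:k] + queries[k + 1:]
            if q2 and fails(a, q2):
                queries, changed = q2, True
                break
        if changed:
            continue
        # Pass 2: try dropping one array element; keep only still-valid queries.
        for k in range(len(a)):
            a2 = a[:k] + a[k + 1:]
            q2 = [(l, r) for l, r in queries if r < len(a2)]
            if a2 and q2 and fails(a2, q2):
                a, queries, changed = a2, q2, True
                break
    return a, queries
```

Typical usage: shrink(a, queries, lambda a, q: prefix_sum_brute(a, q) != candidate(a, q)). Each accepted step strictly reduces the input size, so the loop terminates.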

Product Extension

Property-based testing in production: tools like Hypothesis (Python), QuickCheck (Haskell), and proptest (Rust) generate random inputs and check invariants; same idea, different framing. Fuzz testing for security: AFL and libFuzzer feed random or mutated bytes to a binary and check for crashes. Differential testing across implementations: compare a new compiler against an old one on random programs (Csmith generates random C programs for compiler testing; SQLancer applies a similar random-testing idea to database engines). The harness pattern transfers directly.

Language / Runtime Follow-ups

Python: random.Random(seed) — mt19937 under the hood. pytest + hypothesis for property-based testing in production code.
C++: std::mt19937 rng(seed); std::uniform_int_distribution<int> dist(lo, hi); never use rand() (low statistical quality, and RAND_MAX may be as small as 32767, as it is on MSVC).
Java: java.util.Random(seed) (LCG, low-quality but reproducible) or SplittableRandom (better statistical quality).
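Go: rand.New(rand.NewSource(seed)). A *rand.Rand built this way is not safe for concurrent use, so give each goroutine its own. The package-level math/rand functions are concurrency-safe but, since Go 1.20, randomly seeded, so they are not reproducible by default.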
JS/TS: seedable RNG requires a library (seedrandom); built-in Math.random() is not seedable.

Common Bugs

Forgetting to seed the RNG → non-reproducible failures.
Brute oracle has the same bug as the candidate → false negative (stress passes a buggy solution).
Generator produces invalid inputs (e.g., negative array sizes) → both brute and candidate crash → not a useful comparison.
Generator’s distribution is too narrow → never hits edge cases (all-equal, sorted, reverse-sorted, single element).
Output comparison uses == on floats with rounding errors → spurious mismatches; use tolerance.
Harness exits silently on first iteration if generator throws → wrap in try/except and report.
Letting N grow too large → brute is too slow → fewer iterations → less coverage.
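The harness above lets any exception from the generator or a solution escape (BUG 2, for instance, raises IndexError if its loop still runs over range(n)). Here is a hedged variant, stress_safe (a hypothetical name), that turns an exception into a reported failure, naming the stage that raised and printing the input, so a crash is debugged the same way as a mismatch.

```python
import random

def stress_safe(brute, candidate, gen, n_iters=10000, seed=0):
    """Variant of stress() that reports exceptions instead of propagating them."""
    rng = random.Random(seed)
    for it in range(n_iters):
        stage, inp = "generator", None
        try:
            inp = gen(rng)
            stage = "brute"
            b_out = brute(*inp)
            stage = "candidate"
            c_out = candidate(*inp)
        except Exception as exc:  # report which stage raised, then stop
            print(f"{stage.upper()} RAISED at iteration {it} (seed={seed}): {exc!r}")
            print(f"  input:     {inp}")
            return False
        if b_out != c_out:
            print(f"MISMATCH at iteration {it} (seed={seed}):")
            print(f"  input:     {inp}")
            print(f"  brute:     {b_out}")
            print(f"  candidate: {c_out}")
            return False
    print(f"PASSED {n_iters} iterations.")
    return True
```

A brute that raises is reported as BRUTE RAISED, which usually means the generator produced an invalid input or the oracle itself is wrong.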

Debugging Strategy

When stress fires:

Print the failing input. Run just that input through both brute and candidate.
If brute disagrees with your hand-computed answer, brute is buggy. Fix brute first.
If candidate disagrees with brute (and brute matches your hand-computed answer): trace candidate’s execution on the failing input by hand or in a debugger. The bug is local to a few lines.
Once fixed, re-run stress with the same seed; if it passes, increment seed and run again.

When stress passes but the real submission still WA:

Generator might not cover the failing case. Inspect the failing test’s input distribution (size, value range, special structure) and update the generator.
Add explicit corner cases: empty input, single element, max-size input, all duplicates, all distinct, sorted asc, sorted desc.
Push N higher; some bugs surface only at scale.

Mastery Criteria

Build a stress harness from blank for an array problem in <10 minutes.
Find a planted off-by-one bug in <30 seconds of harness runtime.
Articulate why the brute oracle must be obviously-correct in one sentence.
Generate adversarial corner cases (empty, single, all-equal) deliberately, not only random.
Use the same harness pattern across array, graph, and string problems.
When a real submission WAs, default to “stress test” instead of “read the code again”.


Phase 8 — Practical Engineering Coding Interviews

Target level: Medium-Hard → Hard (senior+ practical interview track)
Expected duration: 4 weeks
Weekly cadence: 5–6 labs/week, with each lab requiring a complete working implementation, tests, and rehearsed answers to follow-ups
Companies this targets: Big Tech L5+ (Google L5/L6, Meta E5/E6, Amazon SDE-III/Principal, Microsoft Sr/Principal), Stripe, Uber, Airbnb, Cloudflare, Datadog, Snowflake, Databricks, infrastructure-heavy startups

Why This Phase Exists

Phase 2 through Phase 7 trained you to recognize patterns and produce optimal algorithms under a stopwatch. That training is necessary and remains the gating function for the first 30 minutes of most rounds. But there is a second, distinct kind of coding interview that you will face starting at the senior level (and at every level at companies like Stripe, Airbnb, and Uber where the engineering bar is calibrated against production code rather than against contest performance).

That second kind of interview is the practical engineering coding round. You are asked to “build an LRU cache”, “build a rate limiter”, “build a thread pool”, “build a job queue”, “build a small in-memory filesystem”. The problem is not algorithmically extreme — most of these have textbook solutions you could find in a CS curriculum. What the interviewer is testing is whether your code looks like production code:

Are your data structures encapsulated behind a clean API?
Are mutations and reads separated cleanly?
Are concurrency invariants explicit, or did you sprinkle locks “just in case”?
Do you handle partial failure, shutdown, and resource cleanup?
Did you write tests that actually exercise the contract — including concurrency tests where relevant?
Can you answer the inevitable follow-ups about scaling, observability, and operational concerns?

Candidates from a pure LeetCode background routinely fail this round. They produce a one-function LRUCache that passes the LC test cases, then freeze when the interviewer asks “how would you make this thread-safe?” or “how would you observe this in production?” or “what would you do if a put could fail mid-operation?” The interviewer’s note reads: “Strong on the algorithm, weak on engineering. No-hire for senior.”

The bar at senior+ practical interviews is not “did you write code that produces the right answer”. The bar is “did you write code that I would be willing to deploy”. Those are different bars, and this phase trains the second one explicitly.

What Makes Practical Problems Different From LeetCode

Dimension	LeetCode-style	Practical engineering
Optimization target	Big-O time, sometimes space	API surface, testability, operational fitness, correctness under concurrency
Code length	20–40 lines	100–400 lines (a class with several methods + tests)
State	Local to a function	Owned by a long-lived object with invariants across calls
Concurrency	Almost never tested	Almost always at least raised as a follow-up
Failure modes	“Wrong answer on test 47”	Partial failure, restart, poison input, backpressure, shutdown
Tests	Provided by the judge	You write them
Follow-ups	Variant problems with tweaked constraints	Operational reality questions (“scale to N nodes”, “persist across restarts”)
Bar for excellence	Optimal complexity	Production readability + correctness + answers all follow-ups crisply

A LeetCode answer that nails the algorithm but ships a 60-line wall-of-code with single-letter variables and no separation of concerns will get a no-hire at the senior bar even when the algorithm is correct. Conversely, a practical answer that is a little slower than optimal but is cleanly structured, well-tested, and accompanied by sharp follow-up answers will get a strong hire.

You will not “see” this difference until you’ve practiced enough practical labs to internalize what “clean” looks like at the senior bar. That internalization is the entire point of this phase.

The 13 Standard Follow-Ups

Every problem in this phase will be followed by a subset of these thirteen questions. They are not problem-specific — they are senior-bar questions that recur across the industry. Memorize the question list. Then, for each lab in this phase, rehearse the answer for the 4–6 follow-ups that are most natural for that problem. By the end of Phase 8 you should be able to give a 60-to-90-second answer to any of these for any data structure or service-shaped object you’ve built.

How would you make it thread-safe? Identify the critical sections, choose between coarse-grained mutex / fine-grained locks / lock-free / CAS / sharded locks, justify the choice, name the failure modes the choice avoids (deadlock, lost update, torn read), and state the contention behavior under load.
How would you persist state across restarts? Pick between full snapshot, log-structured append (write-ahead log), and snapshot+log; address durability (fsync), atomicity (rename or checksum), and recovery (replay log on boot). State the time-to-recover and the worst-case data loss window.
How would you scale to N nodes? Decide between sharding (partition the keyspace), replication (read scaling), and routing (consistent hashing + virtual nodes). Address rebalancing, hotspotting, and cross-node operations. Don’t reach for “distribute everything” — most practical objects scale by sharding.
How would you observe and monitor it? Name the four signals (latency, traffic, errors, saturation — Google’s Golden Signals) and state which metric you’d emit for each. Specify whether you’d export histograms (latency), counters (events), or gauges (queue depth). Describe the dashboard you’d build.
How would you test it? Three layers minimum: unit tests on each method’s contract; integration / smoke tests on end-to-end flows; concurrency / stress tests where multiple goroutines or threads exercise the object. Mention property-based testing where invariants are clean.
What metrics would you emit? Per-operation counters (puts, gets, hits, misses); per-operation latency histograms; queue / cache size gauges; failure-class counters (eviction, timeout, retry, poison). Reject the temptation to emit everything — emit what you’d actually look at on a 3 AM page.
How would you handle backpressure? Decide between blocking the producer, dropping the request, returning an error, or buffering with a bounded queue and rejection policy. State which one you chose and why. The wrong answer here is “we’d have a really big buffer” — that just delays the problem and worsens latency.
How would you handle partial failure? Identify which operations can fail mid-way (a write that succeeds locally but fails to persist; a network call that times out without confirmation). Choose between idempotent retry, two-phase commit, log-and-recover, or just-fail-fast. Don’t reach for “transactions” reflexively — pick the tool that matches the problem.
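What is the eviction policy and cleanup strategy? For caches: LRU / LFU / TTL / size-bounded. For queues: drop oldest / drop newest / dead-letter. For background state: TTL plus a background scavenger (a goroutine or thread). State the worst-case eviction storm: how many entries one pass can remove at once (for example, when a batch of entries written together all expire in the same second) and what that pass costs.
What is the consistency model? Strong (linearizable), sequential, causal, eventual, monotonic-read. A correctly locked, in-memory, single-process object is trivially linearizable, because each operation takes effect atomically while the lock is held; the question becomes interesting once the object is replicated. Be precise about what guarantees you offer.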
What configuration knobs would you expose? Capacity, TTL, retry count, backoff base, concurrency limit, shutdown timeout. State sensible defaults. Critically: state the knobs you would not expose, because over-configuration is its own production smell.
What is the shutdown / draining behavior? On close() / SIGTERM: stop accepting new work, finish in-flight work up to a deadline, persist or surface anything not finished, release resources. Specify the deadline. Specify what happens when the deadline expires.
How would you handle a poison-pill input? A request that crashes the worker, exhausts memory, or causes an infinite loop. Bound resource usage per request, isolate the worker, route repeat-offending payloads to a dead-letter queue, and emit a metric. Never silently drop them.
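The backpressure and poison-pill answers can be made concrete in a few lines. The sketch below is a minimal, single-threaded illustration under stated assumptions: BoundedWorkQueue, submit, drain, and the reject-on-full policy are hypothetical names and choices, and it only catches poison pills that raise exceptions (bounding memory or CPU per request needs process-level isolation, which is not shown).

```python
from collections import deque
from typing import Any, Callable, Deque, List


class BoundedWorkQueue:
    """Hypothetical work queue: bounded, reject-on-full, with a dead-letter list.

    Not thread-safe; a concurrent version would guard _items with a lock or
    build on queue.Queue(maxsize=...).
    """

    def __init__(self, capacity: int, max_attempts: int = 3):
        if capacity <= 0 or max_attempts <= 0:
            raise ValueError("capacity and max_attempts must be positive")
        self._items: Deque[Any] = deque()
        self._capacity = capacity
        self._max_attempts = max_attempts
        self.dead_letter: List[Any] = []
        self.rejected = 0        # counter: submissions refused by backpressure
        self.dead_lettered = 0   # counter: items that exhausted every attempt

    def submit(self, item: Any) -> bool:
        """Backpressure by rejection: return False rather than grow without bound."""
        if len(self._items) >= self._capacity:
            self.rejected += 1
            return False
        self._items.append(item)
        return True

    def drain(self, handler: Callable[[Any], None]) -> int:
        """Run handler on every queued item; return how many succeeded."""
        succeeded = 0
        while self._items:
            item = self._items.popleft()
            for _ in range(self._max_attempts):
                try:
                    handler(item)
                except Exception:
                    continue  # this attempt failed; fall through to the next one
                succeeded += 1
                break
            else:
                # Every attempt failed: route to dead-letter and count it, never drop silently.
                self.dead_letter.append(item)
                self.dead_lettered += 1
        return succeeded
```

Rejecting at submit time pushes the decision back to the producer, which is the point: the caller learns immediately that the system is saturated instead of discovering it later as latency.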

For each lab, the Follow-up Questions section selects 4–6 of these and rehearses an answer. Memorizing one bullet per question is not enough — you need to be able to converse about the choice, naming alternatives and tradeoffs.

Implementation Discipline Expected In This Phase

This is the heaviest phase by code volume. Every lab demands a complete working implementation, not pseudocode and not a sketch. The bar is “could I send this to a coworker for code review without being embarrassed?” Concretely:

Idiomatic in the chosen language. Python uses snake_case, dataclasses where natural, with-statements for locks, and asyncio where the lab demands async. Java uses camelCase, prefers java.util.concurrent primitives, and declares interfaces. Go uses short receiver names, returns errors as the last return value, prefers channels for fan-out, and uses mutexes for shared state.
Small functions, one concern per function. A method that does both validation and mutation should be split. The exception is hot-path code where inlining matters; if you inline, leave a one-line comment explaining why.
Names that describe intent, not type. evict_lru() not e(), pending_jobs not pj, acquire_token_or_block(timeout) not take().
Separation of concerns. Storage, eviction policy, concurrency primitives, observability hooks, and configuration are all distinct concerns. Most labs in this phase have natural seams between them — find the seams and respect them. A class that mixes “manages state”, “decides policy”, and “emits metrics” in every method is harder to test than three classes that compose.
Testable design. Every public method has an obvious test. Constructors take their dependencies (the eviction policy, the clock, the metrics emitter) as parameters so tests can inject fakes. Hardcoded clock reads such as time.time() or datetime.now() inside business logic are a code smell; inject a clock instead.
Explicit error handling. Every external call has a defined behavior on failure. Silent try/except: pass is forbidden unless accompanied by a comment explaining why the exception is benign.
Concurrency invariants documented. If a class is thread-safe, say so in the docstring and name the lock that guards each field. If a class is not thread-safe, say so. The forbidden state is “it might be thread-safe, the author didn’t think about it”.
No premature abstraction. Two implementations of an interface justify the interface. One implementation does not. Don’t add a Storage interface for the in-memory backing map until you actually have a second backing.
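To make the clock-injection and lock-documentation rules concrete, here is a small sketch. TtlMap and FakeClock are hypothetical names; the point is that time.monotonic is only a default, so a test can pass a fake clock and advance time without sleeping, and the docstring states which lock guards which field.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TtlMap:
    """Hypothetical map whose entries expire ttl seconds after they are written.

    Thread-safe: self._lock guards self._data (every read and write of it).
    The clock is a constructor parameter so tests never need to sleep.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        if ttl <= 0:
            raise ValueError("ttl must be positive")
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        # key -> (value, expires_at); guarded by self._lock
        self._data: Dict[Hashable, Tuple[Any, float]] = {}

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if self._clock() >= expires_at:
                del self._data[key]  # lazy expiry: an expired entry is removed on read
                return None
            return value


class FakeClock:
    """Test double: time moves only when the test calls advance()."""

    def __init__(self, start: float = 0.0):
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds
```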

The labs do not enforce a single language across the phase. Pick Python, Java, or Go for each lab based on what feels natural. Most candidates default to Python because the standard library is rich and the syntax is concise; Java is a strong choice when concurrency and java.util.concurrent primitives are at the heart of the problem (thread pool, blocking queue, atomic counters); Go is excellent when the problem is naturally concurrent and channel-shaped (job queue, dispatcher, crawler). For each lab, the Language/Runtime Follow-ups section calls out the idiomatic choice in each of the major languages.

The 23 Labs

#	Lab	Core Idea (one line)
01	LRU Cache	O(1) get/put via doubly-linked list + hashmap; the canonical practical-coding warmup
02	LFU Cache	Frequency-bucketed eviction; tie-breaking by recency; harder than LRU
03	Rate Limiter	Four algorithms compared: token bucket, leaky bucket, sliding window log, sliding window counter
04	Task Scheduler	Priority-aware task scheduling with retries, backoff, and a dead-letter queue
05	Thread Pool	Bounded worker pool with work queue, graceful shutdown, and rejection policy
06	Durable Job Queue	At-least-once delivery semantics with idempotency keys and ack/nack
07	Autocomplete	Trie + per-prefix top-K with weighted scores and sub-millisecond response
08	Log Parser	Streaming log line parser with regex extraction and bounded memory
09	File Deduplication	Three-stage pipeline: size → quick hash → full hash
10	Consistent Hashing	Hash ring with virtual nodes, minimal key movement on add/remove
11	Message Dispatcher	Fan-out to N consumers with fairness, priority, and per-consumer backpressure
12	Pub/Sub	In-memory topic-based publish/subscribe with wildcard subscriptions
13	Timer Wheel	Hierarchical timer wheel for O(1) amortized timer scheduling
14	Key-Value Store	In-memory KV with TTL, snapshot+WAL persistence, and crash recovery
15	Retry With Backoff	Exponential backoff + decorrelated jitter + max-attempts + retryable-error policy
16	Circuit Breaker	Three-state machine: closed / open / half-open with sliding-window failure counting
17	Metrics Collector	Counter / gauge / histogram with bounded memory and atomic updates
18	Web Crawler	Concurrent crawler with depth limit, politeness (per-host throttle), and dedup
19	In-Memory Filesystem	`ls`, `mkdir`, `addContentToFile`, `readContent` over a tree of inodes
20	Snake Game	State machine + collision detection + score; classic OOD round
21	Tic-Tac-Toe Streaming	O(1) winner check by maintaining row/col/diagonal counters
22	Text Editor Buffer	Gap buffer / piece table for cursor-local edits; the canonical editor data structure
23	SQL-Like Engine	Toy parser + executor for `SELECT … FROM … WHERE … JOIN …` over in-memory tables

The order is not arbitrary. Labs 1–6 are the canonical warmups (LRU comes up in nearly every senior interview that uses this format). Labs 7–14 stretch into harder data-structure and operational territory. Labs 15–17 are pure operational primitives (retry, circuit breaker, metrics) that show up in service-design rounds. Labs 18–23 are larger, more open-ended OOD-style problems where the interviewer wants to see how you decompose a fuzzy problem into classes.

If you have a 4-week schedule, do six labs per week; the final week has only five, which leaves a buffer day for a mock-interview rehearsal. If you have an 8-week schedule, do three per week and spend the extra time on the follow-ups, which is where senior interviews are won and lost.

Mastery Checklist

You have completed Phase 8 when you can do the following without prompting:

Implement LRU cache with thread-safety in <20 minutes from a blank screen, including a unit test suite that exercises eviction order.
Implement LFU cache with correct tie-breaking in <30 minutes.
Compare the four rate-limiting algorithms verbally and justify the right pick for a stated load profile in <2 minutes.
Implement a thread pool with bounded queue, rejection policy, and graceful shutdown in <30 minutes.
Implement a job queue with at-least-once semantics and explain why exactly-once is impractical in <2 minutes.
Implement an in-memory KV store with TTL eviction in <25 minutes.
Implement a circuit breaker with all three states and explain when half-open transitions back to closed in <2 minutes.
Implement consistent hashing with virtual nodes in <30 minutes and explain the rebalancing cost on add/remove.
For any of the 23 problems, answer all 13 standard follow-ups crisply (60–90 seconds each) without notes.
Identify, for any production object you’ve built (in real work or in this phase), the four Golden Signals you’d emit and justify why those four.
State the consistency model of any data structure you’ve built in one sentence.
Write a stress test for a concurrent data structure that actually finds bugs (i.e., randomly interleaves operations across threads, asserts invariants after, replays the seed on failure).
Refactor one of your own LeetCode-style 50-line answers from any earlier phase into a clean, testable, production-shaped class without consulting any reference.

Exit Criteria

You may exit Phase 8 and move on to Phase 9 — Language & Runtime Deep Dive when:

Lab completion: every one of the 23 labs has been implemented and tested by you, in a language you would actually use at work, and then re-implemented from a blank screen after a 24-hour gap with the test suite passing on the first run (no peek-and-debug). The 24-hour gap matters because it tests retention, not short-term memory.
Follow-up fluency: you can answer the 13 standard follow-ups without prompts for at least 18 of the 23 labs.
Mock interview: you have done at least 2 mock interviews drawn from this phase’s problem list (Phase 11 — Mock Interview Mastery) with a passing rubric score on both, where “passing” requires hitting both the algorithmic correctness and the production-readiness rubric dimensions.
Code review readiness: you can take any of your Phase 8 implementations, post it as a hypothetical PR, and write the PR description (motivation, design choices, tradeoffs, test plan) in <10 minutes per implementation.

If any of the four criteria fails, do not move on. Most candidates underestimate the mock-interview criterion: they pass the algorithm dimension but fail the production-readiness dimension because they didn’t rehearse the follow-ups out loud. Read COMMUNICATION.md once more, then redo the mocks. The mocks are not optional; the practical-engineering bar is calibrated against verbalized reasoning, not solo-coded artifacts.

Cross-References

FRAMEWORK.md — the universal 16-step framework still applies. Practical problems extend step 16 (production implications), not replace steps 1–15.
CODE_QUALITY.md — the bar is enforced here more strictly than anywhere else in the curriculum.
phase-03-advanced-data-structures/ — several labs (LRU, LFU, trie) build on data structures introduced there. If you skipped Phase 3, do at least labs 7 and 8 of that phase before starting here.
phase-04-graphs/ — the consistent-hashing and dispatcher labs share modeling instincts with graph problems.
phase-09-language-runtime/ — the next phase. Practical engineering interviews and runtime interviews are deeply intertwined; many follow-up answers in Phase 8 cite runtime facts you’ll formalize in Phase 9.
phase-11-mock-interviews/ — mock-08-staff-practical.md is built around this phase’s problem list.

Lab 01 — LRU Cache

Goal

Implement a single-threaded and a thread-safe LRUCache with O(1) get and put, using a hashmap from keys to the nodes of a doubly-linked list. After this lab you should be able to write a clean, tested LRU cache from a blank screen in under 20 minutes, and answer the 13 standard follow-ups for it crisply.

Background Concepts

A cache is a bounded-capacity associative store that evicts entries when capacity is exceeded. The Least Recently Used (LRU) policy evicts whichever entry was accessed (read or written) the furthest in the past. The reason this is the canonical practical-coding warmup is that both O(1) operations require coordinating two data structures: a hashmap that maps keys to nodes (so get is O(1)), and a doubly-linked list ordered by recency (so eviction and recency-update are O(1)). Either structure alone forces an O(N) operation. That two-structure coordination is the engineering insight the interviewer wants to see.

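The doubly-linked list uses sentinel head and tail nodes that always exist. Every real node therefore has a non-null neighbor on both sides, so no splice needs an end-of-list check, and each helper shrinks from a tangle of if-statements to a few unconditional pointer assignments (four to insert a node, two to unlink one).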

Interview Context

LRU cache is asked at almost every senior coding interview at Big Tech, Stripe, Uber, and Cloudflare. It is LeetCode 146 verbatim, but the bar at the practical-engineering level is far higher than passing the LeetCode test cases: you must produce production-shaped code (encapsulation, naming, error handling, optional thread-safety) and answer follow-ups about concurrency, persistence, sharding, and observability. Failing to articulate any of those follow-ups crisply is a common no-hire signal even when the algorithm is correct.

Problem Statement

Design a class LRUCache(capacity) supporting:

get(key) -> value or None — return the value for key, marking the entry as most-recently-used. Return None (or sentinel) if not present.
put(key, value) — insert or update. If the cache exceeds capacity, evict the least-recently-used entry.

Both operations must run in O(1) average time.

Constraints

1 ≤ capacity ≤ 10^5
Keys are hashable; values are arbitrary
get and put may be called up to 10^7 times in benchmarks
Thread-safe variant: any number of concurrent callers

Clarifying Questions

Are keys always hashable, or can they be raw bytes / mutable objects? (Assume hashable.)
Does put of an existing key count as a “use” for LRU ordering? (Yes — by convention.)
Is the API allowed to return a sentinel for missing keys, or must it raise? (Both are defensible; pick one and document.)
Must the cache be thread-safe? (Often “we’ll get to that” — write the single-threaded version first, then add a lock when asked.)
Eviction callback (notify on evict) needed? (Often a follow-up; design so it can be added without changing the call sites.)

Examples

cache = LRUCache(2)            # state lists keys from most to least recently used
cache.put(1, "a")              # state: [1]
cache.put(2, "b")              # state: [2, 1]
cache.get(1)        -> "a"     # state: [1, 2]
cache.put(3, "c")              # evict 2; state: [3, 1]
cache.get(2)        -> None
cache.put(1, "z")              # update; state: [1, 3]
cache.get(3)        -> "c"     # state: [3, 1]

Initial Brute Force

A dict plus a list that records the access order. Each get and put moves the key to the back of the list with a linear list.remove, and each eviction pops the front with list.pop(0). O(N) per op.

class LRUCacheBrute:
    def __init__(self, cap):
        self.cap = cap
        self.store = {}
        self.order = []   # least-recent at front

    def get(self, k):
        if k not in self.store: return None
        self.order.remove(k); self.order.append(k)
        return self.store[k]

    def put(self, k, v):
        if k in self.store:
            self.order.remove(k)
        elif len(self.store) >= self.cap:
            old = self.order.pop(0)
            del self.store[old]
        self.store[k] = v; self.order.append(k)

Brute Force Complexity

O(N) per get / put because list.remove and list.pop(0) are O(N). At N=10^5 with 10^7 calls, that is on the order of 10^12 operations, far too slow to finish in any reasonable time.

Optimization Path

Replace the order-tracking list with a doubly-linked list, and have the hashmap map each key to its list node. Now lookup, unlinking, and re-appending are all O(1), so each operation is O(1). Every entry now carries a node object (key, value, prev, next), which costs a constant factor more memory than the brute force; that is the standard space-time tradeoff and is acceptable.

Final Expected Approach

Node: (key, value, prev, next). Storing the key in the node is essential — when we evict the LRU node, we need its key to remove it from the hashmap.
Sentinels: head (most recent) and tail (least recent) sentinel nodes that always exist. Real nodes live between them.
_add_after_head(node): insert immediately after head.
_remove(node): splice out by relinking neighbors.
_touch(node): remove + add after head. This is the recency update.
get(k): hashmap lookup → if hit, _touch(node) and return value; else None.
put(k, v): if exists, update value and _touch; else create node, _add_after_head, hashmap insert, evict if over capacity (the node before tail).

Data Structures Used

Structure	Purpose
`dict`	key → node lookup, O(1)
Doubly-linked list (with sentinels)	recency ordering, O(1) splice
`RLock` (thread-safe variant)	guards both structures together

Correctness Argument

The invariants we maintain after every get or put:

The hashmap and the linked list contain exactly the same set of keys.
The list size never exceeds capacity.
The node immediately after head was the most-recently accessed; the node immediately before tail is the LRU candidate.

Each of get, put, _touch, _remove, _add_after_head preserves all three invariants by inspection of the four pointer assignments per splice. The eviction step in put is the only place that may shrink both structures; it removes exactly one node from both, by reading its key from the node before deleting from the hashmap.

Complexity

get: O(1) average (hashmap lookup + 4 pointer writes)
put: O(1) average
Space: O(capacity)

Implementation Requirements

A complete working implementation is required. Below is a reference Python version with thread-safety. The private helpers (_add_after_head, _remove, _touch) own the linked-list mechanics, so get and put read as pure policy: look up, touch, insert, evict. Tests are required.

import threading
from typing import Hashable, Any, Optional

class _Node:
    __slots__ = ("key", "val", "prev", "next")
    def __init__(self, key=None, val=None):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCache:
    """Thread-safe O(1) LRU cache.

    Concurrency: a single RLock guards both the hashmap and the linked list.
    The locked region is short (constant pointer work), so contention is low
    until very high concurrency. For higher concurrency, see follow-up #1.
    """

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._cap = capacity
        self._map: dict[Hashable, _Node] = {}
        self._head, self._tail = _Node(), _Node()
        self._head.next = self._tail
        self._tail.prev = self._head
        self._lock = threading.RLock()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            node = self._map.get(key)
            if node is None:
                return None
            self._touch(node)
            return node.val

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            node = self._map.get(key)
            if node is not None:
                node.val = value
                self._touch(node)
                return
            node = _Node(key, value)
            self._map[key] = node
            self._add_after_head(node)
            if len(self._map) > self._cap:
                lru = self._tail.prev
                self._remove(lru)
                del self._map[lru.key]

    def __len__(self) -> int:
        with self._lock:
            return len(self._map)

    def _add_after_head(self, node: _Node) -> None:
        node.prev = self._head
        node.next = self._head.next
        self._head.next.prev = node
        self._head.next = node

    def _remove(self, node: _Node) -> None:
        node.prev.next = node.next
        node.next.prev = node.prev

    def _touch(self, node: _Node) -> None:
        self._remove(node)
        self._add_after_head(node)

Tests

Required: unit tests for the contract, smoke tests for ordering, concurrency tests for the thread-safe variant.

import unittest, threading, random

class TestLRU(unittest.TestCase):
    def test_basic(self):
        c = LRUCache(2)
        c.put(1, "a"); c.put(2, "b")
        self.assertEqual(c.get(1), "a")
        c.put(3, "c")                          # evicts 2
        self.assertIsNone(c.get(2))
        self.assertEqual(c.get(3), "c")
        self.assertEqual(c.get(1), "a")

    def test_update_is_a_use(self):
        c = LRUCache(2)
        c.put(1, "a"); c.put(2, "b"); c.put(1, "z")
        c.put(3, "c")                          # evicts 2 (1 was just updated)
        self.assertEqual(c.get(1), "z")
        self.assertIsNone(c.get(2))

    def test_capacity_one(self):
        c = LRUCache(1)
        c.put(1, "a"); c.put(2, "b")
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")

    def test_concurrent(self):
        c = LRUCache(100)
        def worker():
            for _ in range(10000):
                k = random.randint(0, 200)
                if random.random() < 0.5:
                    c.put(k, k * 2)
                else:
                    c.get(k)
        threads = [threading.Thread(target=worker) for _ in range(8)]
        for t in threads: t.start()
        for t in threads: t.join()
        self.assertLessEqual(len(c), 100)      # invariant: never over capacity

Follow-up Questions

(1) How would you make it thread-safe? Already shown: a single RLock around the body of every public method. The lock is held only for O(1) work, so contention is bounded. For higher concurrency, shard by hash(key) % N into N independent caches; this is the practical answer for production caches at high QPS. A lock-free LRU is hard and rarely worth it.
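A minimal sketch of the sharding answer, assuming each shard is an independent OrderedDict-based LRU with its own lock (ShardedLRU and _LockedLRU are hypothetical names). Contention drops because callers touching different shards never take the same lock; the price, noted in the docstring, is that recency is tracked only per shard.

```python
import threading
from collections import OrderedDict
from typing import Any, Hashable, List, Optional


class _LockedLRU:
    """One shard: an OrderedDict LRU guarded by its own lock."""

    def __init__(self, capacity: int):
        self._cap = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()  # front = LRU
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark most recently used
            return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._cap:
                self._data.popitem(last=False)  # evict this shard's LRU entry

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)


class ShardedLRU:
    """Hypothetical sharded LRU: keys on different shards never share a lock.

    Tradeoffs: recency is tracked per shard, so eviction is only approximately
    LRU globally, and each shard holds capacity // shards entries.
    """

    def __init__(self, capacity: int, shards: int = 16):
        if shards <= 0 or capacity < shards:
            raise ValueError("need capacity >= shards > 0")
        self._shards: List[_LockedLRU] = [
            _LockedLRU(capacity // shards) for _ in range(shards)
        ]

    def _shard_for(self, key: Hashable) -> _LockedLRU:
        return self._shards[hash(key) % len(self._shards)]

    def get(self, key: Hashable) -> Optional[Any]:
        return self._shard_for(key).get(key)

    def put(self, key: Hashable, value: Any) -> None:
        self._shard_for(key).put(key, value)

    def __len__(self) -> int:
        return sum(len(shard) for shard in self._shards)
```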

(2) How would you persist state across restarts? Snapshot the cache to disk on a configurable interval (write the (key, value, recency-rank) triples in LRU order). On restart, replay the file in order. For stricter durability, write a per-put log entry to a write-ahead log; on restart, replay snapshot + log. Note: most caches choose not to persist — losing the cache on restart is usually fine, and persistence adds complexity for small benefit.
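A sketch of snapshot-based persistence, assuming an OrderedDict-backed cache and pickle for serialization (SnapshottingLRU, dumps, loads, snapshot, and restore are hypothetical names). It simplifies the (key, value, recency-rank) triples to an ordered list of pairs, where list position is the rank, and writes through a temporary file plus os.replace so a crash mid-write does not corrupt the previous snapshot. A write-ahead log is not shown.

```python
import os
import pickle
from collections import OrderedDict
from typing import Any, Hashable, Optional


class SnapshottingLRU:
    """Hypothetical OrderedDict LRU that can snapshot itself and be rebuilt.

    Not thread-safe; a threaded version would hold its lock while calling dumps().
    """

    def __init__(self, capacity: int):
        self._cap = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()  # front = LRU, back = MRU

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._cap:
            self._data.popitem(last=False)

    def dumps(self) -> bytes:
        # (key, value) pairs in LRU -> MRU order; list position encodes the recency rank.
        return pickle.dumps(list(self._data.items()))

    @classmethod
    def loads(cls, blob: bytes, capacity: int) -> "SnapshottingLRU":
        cache = cls(capacity)
        # pickle can execute code on load: only read snapshots this service wrote.
        for key, value in pickle.loads(blob):
            cache.put(key, value)  # replaying LRU -> MRU rebuilds the same order
        return cache

    def snapshot(self, path: str) -> None:
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(self.dumps())
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX: readers see old or new, never half

    @classmethod
    def restore(cls, path: str, capacity: int) -> "SnapshottingLRU":
        with open(path, "rb") as f:
            return cls.loads(f.read(), capacity)
```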

(4) How would you observe and monitor it? Emit hits, misses, and evictions as counters; emit cache size as a gauge; emit get/put latency as a histogram. Derive hit rate (hits / (hits + misses)) from the counters in the dashboard, so rates from many instances aggregate correctly. Hit rate is the #1 actionable signal. Set an alert on hit rate dropping below the SLO.
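A sketch of the instrumentation, assuming plain integer counters in place of a real metrics library (InstrumentedLRU and its counters dict are hypothetical). Here hit rate is derived on read from the hit and miss counters, which also makes it easy to export as a gauge; latency histograms would come from whatever metrics client the service already uses and are left out. This class is not thread-safe.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional


class InstrumentedLRU:
    """Hypothetical OrderedDict LRU that records the events behind its metrics.

    hits, misses, and evictions only ever increase (counters); size() is a
    point-in-time reading (gauge). Not thread-safe.
    """

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._cap = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.counters: Dict[str, int] = {"hits": 0, "misses": 0, "evictions": 0}

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            self.counters["misses"] += 1
            return None
        self.counters["hits"] += 1
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._cap:
            self._data.popitem(last=False)
            self.counters["evictions"] += 1

    def size(self) -> int:
        return len(self._data)

    def hit_rate(self) -> float:
        # Derived on read from the two counters; 0.0 before the first lookup.
        lookups = self.counters["hits"] + self.counters["misses"]
        return self.counters["hits"] / lookups if lookups else 0.0
```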

(7) How would you handle backpressure? Caches don’t have classical backpressure since there’s no producer queue, but the analog is memory pressure: if the host is short on memory, the cache should shed load. Either (a) a soft size_in_bytes ceiling that triggers eviction beyond capacity, or (b) integrate with a host-level memory pressure signal (cgroup memory accounting on Linux). Decide explicitly which.
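Option (a), a soft byte ceiling, could look like the sketch below. It assumes a caller-supplied sizeof estimator (ByteBoundedLRU, max_bytes, and bytes_used are hypothetical names) and rejects any single value larger than the whole budget, which is one defensible choice among several. The default sys.getsizeof is shallow, so it undercounts containers.

```python
import sys
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class ByteBoundedLRU:
    """Hypothetical LRU bounded by entry count and by estimated bytes."""

    def __init__(self, capacity: int, max_bytes: int,
                 sizeof: Callable[[Any], int] = sys.getsizeof):
        if capacity <= 0 or max_bytes <= 0:
            raise ValueError("capacity and max_bytes must be positive")
        self._cap = capacity
        self._max_bytes = max_bytes
        self._sizeof = sizeof
        # key -> (value, estimated size); front = LRU
        self._data: "OrderedDict[Hashable, Tuple[Any, int]]" = OrderedDict()
        self.bytes_used = 0

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key][0]

    def put(self, key: Hashable, value: Any) -> None:
        size = self._sizeof(value)
        if size > self._max_bytes:
            raise ValueError("value is larger than the whole cache")
        if key in self._data:
            _, old_size = self._data.pop(key)
            self.bytes_used -= old_size
        self._data[key] = (value, size)
        self.bytes_used += size
        # Evict from the LRU end until both bounds hold; the new entry always survives.
        while len(self._data) > self._cap or self.bytes_used > self._max_bytes:
            _, (_, evicted_size) = self._data.popitem(last=False)
            self.bytes_used -= evicted_size
```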

(11) What configuration knobs would you expose? capacity (entries), optionally size_bytes (RAM ceiling), optionally eviction_callback. Knobs not to expose: lock granularity, internal data structure choice, and snapshot interval (set a sensible default and document it).

(12) What is the shutdown / draining behavior? The cache holds only derived, in-memory state that callers can always rebuild from the source of truth. On shutdown, optionally write a snapshot (taken under the lock so it is consistent), then let GC reclaim the memory. No draining is required because there is no in-flight background work to finish.

Product Extension

LRU caches are the workhorse of CDN edge caches, database query result caches (Redis configured with maxmemory-policy allkeys-lru, which approximates LRU by sampling), HTTP reverse proxies, and database buffer pools. Real-world variants: two-level cache (L1 in-process + L2 Redis); size-aware LRU (counts bytes, not entries); adaptive LRU/LFU hybrid (ARC, used by ZFS). The data structure you wrote here is the textbook foundation; the variants tune it for specific workloads.

Language/Runtime Follow-ups

Python: collections.OrderedDict already provides the LRU mechanics: move_to_end() plays the role of _touch, and popitem(last=False) evicts the oldest entry. In an interview, state that you know OrderedDict exists, then implement from scratch because the interviewer wants to see the linked list. Production: prefer an OrderedDict-based class, or functools.lru_cache when you are memoizing a function, unless you need eviction callbacks.
Java: LinkedHashMap constructed with accessOrder=true, plus an overridden removeEldestEntry, is the textbook Java LRU. For thread-safety wrap it with Collections.synchronizedMap, or use Caffeine (the successor to Guava's cache), which is highly concurrent and uses a W-TinyLFU eviction policy rather than strict LRU.
Go: no stdlib LRU; the container/list package gives a linked list, and a map[K]*list.Element gives O(1) lookup. The hashicorp/golang-lru package is the de facto standard in production Go.
C++: std::list<std::pair<K,V>> plus std::unordered_map<K, list::iterator>. Iterator stability of std::list is the reason this works.
JS/TS: Map preserves insertion order, so map.delete(k); map.set(k, v) is the recency-update pattern. Eviction is map.delete(map.keys().next().value). This works because Map.prototype.keys() returns keys in insertion order.
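For the Python bullet, an OrderedDict-based version of the same contract is short enough to write from memory; the class name OrderedDictLRU is illustrative. The functools.lru_cache decorator at the end shows the other standard-library option, which memoizes a function rather than serving as a general key/value cache.

```python
from collections import OrderedDict
from functools import lru_cache
from typing import Any, Hashable, Optional


class OrderedDictLRU:
    """LRU on top of OrderedDict: front = least recently used, back = most."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._cap = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # plays the role of _touch
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)  # an update counts as a use
        if len(self._data) > self._cap:
            self._data.popitem(last=False)  # evict the least recently used entry


@lru_cache(maxsize=128)  # memoizes a pure function; not a general key/value store
def slow_square(n: int) -> int:
    return n * n
```

Running the lab's example sequence against OrderedDictLRU gives the same results as the hand-written linked-list version.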

Common Bugs

Forgetting to delete from the hashmap when evicting (only unlinking from the linked list). The hashmap grows past capacity, and the next get for the evicted key returns a stale value from a node the list no longer contains. The fix is to read the node’s key before unlinking and use it to delete from the hashmap.
Storing only the value (not the key) in the node, then having no way to find the hashmap entry to delete on eviction.
Calling _touch before checking whether the key exists: on a miss the lookup returns None, and _touch(None) raises AttributeError.
In the thread-safe variant, taking the lock in get but not in put, or releasing between the eviction step and the insert step. The whole operation must be atomic.
Using a non-reentrant Lock and then calling another locked method internally — deadlock. Use RLock if you need to call locked methods from inside a locked method.

Debugging Strategy

If get returns the wrong value: after each put, print [(n.key, n.val) for n in walk(self._head)] (walk being a small helper that follows next pointers from head to tail) and compare against the expected access order. If the cache exceeds capacity: assert len(self._map) <= self._cap after every put; the first failing assertion tells you which call broke the invariant. For concurrency bugs (rare under the single-lock design), rerun the concurrent test many times (for example with the pytest-repeat plugin's --count=1000) until a failure reproduces, then add print(threading.get_ident(), op, key) traces and minimize.
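The debugging steps rely on a walk helper and an invariant checker; a sketch of both follows, together with a seeded stress run in the spirit of the phase's stress-test criterion. To keep the block runnable on its own it repeats a compact version of the lab's LRUCache; walk, check_invariants, and stress are hypothetical helper names. The seed makes each thread's operation sequence reproducible, but the OS still decides the interleaving, so a failing seed raises the odds of a repro rather than guaranteeing one.

```python
import random
import threading
from typing import Iterator


class _Node:
    __slots__ = ("key", "val", "prev", "next")

    def __init__(self, key=None, val=None):
        self.key, self.val = key, val
        self.prev = self.next = None


class LRUCache:
    """Compact copy of the lab's sentinel-list LRU so this block runs on its own."""

    def __init__(self, capacity: int):
        self._cap, self._map = capacity, {}
        self._head, self._tail = _Node(), _Node()
        self._head.next, self._tail.prev = self._tail, self._head
        self._lock = threading.RLock()

    def get(self, key):
        with self._lock:
            node = self._map.get(key)
            if node is None:
                return None
            self._remove(node)
            self._add_after_head(node)
            return node.val

    def put(self, key, value):
        with self._lock:
            node = self._map.get(key)
            if node is not None:
                node.val = value
                self._remove(node)
                self._add_after_head(node)
                return
            node = self._map[key] = _Node(key, value)
            self._add_after_head(node)
            if len(self._map) > self._cap:
                lru = self._tail.prev
                self._remove(lru)
                del self._map[lru.key]

    def _add_after_head(self, node):
        node.prev, node.next = self._head, self._head.next
        self._head.next.prev = node
        self._head.next = node

    def _remove(self, node):
        node.prev.next, node.next.prev = node.next, node.prev


def walk(head: _Node) -> Iterator[_Node]:
    """Yield real nodes from most to least recent; the tail sentinel has next=None."""
    node = head.next
    while node is not None and node.next is not None:
        yield node
        node = node.next


def check_invariants(cache: LRUCache) -> None:
    with cache._lock:
        nodes = list(walk(cache._head))
        keys = [n.key for n in nodes]
        assert len(keys) == len(set(keys)), "a key appears twice in the list"
        assert set(keys) == set(cache._map), "hashmap and list disagree"
        assert len(keys) <= cache._cap, "over capacity"
        for n in nodes:
            assert n.prev.next is n and n.next.prev is n, f"broken links at key {n.key}"


def stress(seed: int, threads: int = 8, ops: int = 5000, capacity: int = 50) -> None:
    cache = LRUCache(capacity)

    def worker(worker_id: int) -> None:
        # Per-thread RNG: each thread's op sequence is reproducible from the seed.
        rng = random.Random(seed * 1000 + worker_id)
        for _ in range(ops):
            key = rng.randint(0, 2 * capacity)
            if rng.random() < 0.5:
                cache.put(key, key)
            else:
                cache.get(key)

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    try:
        check_invariants(cache)
    except AssertionError as exc:
        raise AssertionError(f"invariant broken (replay with seed={seed}): {exc}") from exc
```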

Mastery Criteria

Implemented LRUCache with sentinel head/tail in <15 minutes from blank screen.
All four tests pass on first run.
Articulated invariants (hashmap-list set equality, capacity bound, head/tail recency) without prompting.
Stated O(1) time and O(capacity) space unprompted.
Answered follow-ups #1, #4, #7, #11, #12 in <90 seconds each, naming alternatives.
Refactored a single-threaded version into the thread-safe version in <5 minutes.

Lab 02 — LFU Cache

Goal

Implement an LFUCache with O(1) get and put. The challenge over LRU is the tie-breaking rule: when multiple keys share the minimum frequency, evict the least-recently-used among them. Expect this to take 30 minutes from a blank screen on a first attempt; aim to bring it down to 20 with practice.

Background Concepts

LFU evicts the least-frequently-used entry. Frequency means the number of get and put calls that have referenced the key since it was inserted, counting the inserting put, so a fresh key starts at 1. A naive frequency-counter approach forces an O(N) eviction scan. The O(1) trick is to bucket nodes by frequency: a hashmap from freq → DoublyLinkedList plus a min_freq cursor. On eviction, pop the LRU entry from freq_to_list[min_freq]. On a hit, move the node from freq_to_list[f] to freq_to_list[f+1]; when freq_to_list[min_freq] empties, advance min_freq.

This is a textbook example of bucket sort applied to a dynamic counter. Each frequency bucket is itself an LRU list, which handles tie-breaking in O(1). Together you get O(1) for both ops at the cost of more code than LRU.

Interview Context

LFU follows LRU as the second-asked cache problem at senior practical interviews. It is LeetCode 460 verbatim. Where LRU is a 20-minute problem, LFU is a 30-to-45-minute problem and tests whether you can design with two coordinated abstractions (the freq map and the per-bucket LRU lists). Most candidates fail by picking a frequency-only design and then hitting the tie-breaking case.

Problem Statement

Design LFUCache(capacity):

get(key) -> value or None — increment frequency on hit.
put(key, value) — insert/update; on capacity overflow, evict the LFU entry. Tie-break by LRU within the LFU bucket. Insertion sets frequency to 1.

Both O(1) average.

Constraints

1 ≤ capacity ≤ 10^5
10^7 ops in benchmarks
Thread-safety: optional follow-up

Clarifying Questions

Does put of an existing key increment frequency? (Yes by convention; confirm.)
Tie-breaking: LRU or arbitrary? (LRU is the standard expectation; confirm.)
What does get return on miss? (None or sentinel; confirm.)
Should frequencies decay over time (windowed LFU)? (Often a follow-up; default is “no decay”.)

Examples

c = LFUCache(2)
c.put(1, "a")            # freq: {1->1}
c.put(2, "b")            # freq: {1->1, 2->1}
c.get(1)        -> "a"   # freq: {1->2, 2->1}
c.put(3, "c")            # evict 2 (freq=1, LRU); state {1, 3}
c.get(2)        -> None
c.get(3)        -> "c"   # freq: {1->2, 3->2}
c.put(4, "d")            # tie at freq=2; evict 1 (LRU); state {3, 4}
c.get(1)        -> None

Initial Brute Force

dict mapping key → (value, freq, last_access_time). On eviction, scan all entries to find the (min freq, min time) pair. O(N) eviction.
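The brute force can be sketched as follows. Instead of a wall-clock last_access_time it assumes a monotonically increasing logical counter, so two accesses in the same clock tick can never tie; LFUCacheBrute is a hypothetical name. The O(N) cost is the min() scan on eviction.

```python
from typing import Any, Dict, Hashable, Optional, Tuple


class LFUCacheBrute:
    """O(1) get, O(N) evicting put. Not thread-safe."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._cap = capacity
        # key -> (value, freq, last_access); last_access is a logical clock tick
        self._store: Dict[Hashable, Tuple[Any, int, int]] = {}
        self._tick = 0

    def _next_tick(self) -> int:
        self._tick += 1
        return self._tick

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, freq, _ = entry
        self._store[key] = (value, freq + 1, self._next_tick())
        return value

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._store:
            _, freq, _ = self._store[key]
            self._store[key] = (value, freq + 1, self._next_tick())
            return
        if len(self._store) >= self._cap:
            # O(N) scan: smallest (freq, last_access) = least frequent, then least recent.
            victim = min(self._store, key=lambda k: self._store[k][1:])
            del self._store[victim]
        self._store[key] = (value, 1, self._next_tick())
```

Replaying the example sequence above against this class evicts 2 and then 1, matching the expected states.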

Brute Force Complexity

get is O(1), but put with eviction is O(N). That is fine for small capacities, but at capacity 10^5 with up to 10^7 evicting puts it approaches 10^12 operations and will TLE.

Optimization Path

Replace the linear eviction scan with frequency bucketing. Maintain:

key_to_node: dict[K, Node] for O(1) lookup
freq_to_list: dict[int, DoublyLinkedList] mapping frequency to an LRU list of nodes
min_freq: int tracking the current minimum frequency present

On get(k): look up node, remove from freq_to_list[node.freq], increment node.freq, append to freq_to_list[node.freq]. If the old bucket is now empty and it equaled min_freq, increment min_freq.

On put(k, v): if exists, update value and behave like get. Otherwise, evict from freq_to_list[min_freq] if at capacity (pop the front = LRU), then insert with freq=1 and reset min_freq=1.

The reset of min_freq=1 on insertion is the only step that’s not obvious; without it you’d evict the wrong bucket on the very next put.

Final Expected Approach

Three layers: Node, per-frequency DoublyLinkedList (with sentinels), LFUCache orchestrating the two maps. The per-bucket linked list handles LRU tie-breaking automatically — append on insert, pop from front on evict.

Data Structures Used

Structure	Purpose
`dict[K, Node]`	O(1) key lookup
`dict[int, DLList]`	O(1) frequency → bucket
Doubly-linked list per bucket	LRU within frequency, O(1) splice
`min_freq: int`	O(1) eviction target

Correctness Argument

Invariants maintained after every operation:

key_to_node and the union of all freq_to_list[f] contain the same set of keys.
Every node n lives in freq_to_list[n.freq] and only there.
min_freq is the smallest f such that freq_to_list[f] is non-empty (or any value when the cache is empty).
Within each bucket, the front is the LRU and the back is the MRU.

Each method preserves these by case analysis. The subtle case is in _bump: moving the node out of its old bucket may empty that bucket. We check if old_bucket_now_empty and old_freq == min_freq: min_freq += 1. This is correct because the old bucket contained this node before the move and is empty after it, so this node was the only key at the old minimum. After the increment the node sits at old min_freq + 1, and every other key already had frequency ≥ old min_freq + 1, so the new minimum is exactly old min_freq + 1.

Complexity

get: O(1)
put: O(1)
Space: O(capacity)

Implementation Requirements

from typing import Hashable, Any, Optional

class _Node:
    __slots__ = ("key", "val", "freq", "prev", "next")
    def __init__(self, key=None, val=None, freq=1):
        self.key, self.val, self.freq = key, val, freq
        self.prev = self.next = None

class _DLList:
    """Doubly linked list with sentinels. Front = LRU, back = MRU."""
    def __init__(self):
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head
        self.size = 0

    def append(self, node: _Node) -> None:           # add to back (MRU)
        prev = self.tail.prev
        prev.next = node; node.prev = prev
        node.next = self.tail; self.tail.prev = node
        self.size += 1

    def remove(self, node: _Node) -> None:
        node.prev.next = node.next; node.next.prev = node.prev
        self.size -= 1

    def pop_front(self) -> _Node:                    # evict LRU
        node = self.head.next
        self.remove(node)
        return node

    def is_empty(self) -> bool:
        return self.size == 0


class LFUCache:
    def __init__(self, capacity: int):
        if capacity < 0:
            raise ValueError("capacity must be non-negative")   # 0 is allowed: stores nothing
        self._cap = capacity
        self._key_to_node: dict[Hashable, _Node] = {}
        self._freq_to_list: dict[int, _DLList] = {}
        self._min_freq = 0

    def get(self, key: Hashable) -> Optional[Any]:
        node = self._key_to_node.get(key)
        if node is None:
            return None
        self._bump(node)
        return node.val

    def put(self, key: Hashable, value: Any) -> None:
        if self._cap == 0:
            return
        node = self._key_to_node.get(key)
        if node is not None:                     # an update counts as a use
            node.val = value
            self._bump(node)
            return
        if len(self._key_to_node) >= self._cap:  # evict the LRU node of the min-freq bucket
            bucket = self._freq_to_list[self._min_freq]
            evicted = bucket.pop_front()
            if bucket.is_empty():
                del self._freq_to_list[self._min_freq]
            del self._key_to_node[evicted.key]
        node = _Node(key, value, freq=1)
        self._key_to_node[key] = node
        self._freq_to_list.setdefault(1, _DLList()).append(node)
        self._min_freq = 1                       # a new key always has the lowest frequency

    def _bump(self, node: _Node) -> None:
        """Move node from bucket f to bucket f + 1, advancing min_freq if bucket f empties."""
        old_list = self._freq_to_list[node.freq]
        old_list.remove(node)
        if old_list.is_empty():
            del self._freq_to_list[node.freq]    # drop empty buckets so space stays O(capacity)
            if node.freq == self._min_freq:
                self._min_freq += 1
        node.freq += 1
        self._freq_to_list.setdefault(node.freq, _DLList()).append(node)

    def __len__(self) -> int:
        return len(self._key_to_node)

Tests

import unittest

class TestLFU(unittest.TestCase):
    def test_basic_eviction(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")
        self.assertEqual(c.get(1), "a")          # 1.freq=2, 2.freq=1
        c.put(3, "c")                            # evict 2
        self.assertIsNone(c.get(2))
        self.assertEqual(c.get(3), "c")          # 3.freq=2

    def test_tie_break_lru(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")             # both freq=1; 1 is LRU
        c.put(3, "c")                            # evict 1 (LRU at freq=1)
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")

    def test_update_increments_freq(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b"); c.put(1, "z")  # 1.freq=2
        c.put(3, "c")                            # evict 2 (freq=1)
        self.assertEqual(c.get(1), "z")
        self.assertIsNone(c.get(2))

    def test_capacity_zero(self):
        c = LFUCache(0)
        c.put(1, "a")
        self.assertIsNone(c.get(1))

    def test_min_freq_advance(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")
        c.get(1); c.get(1); c.get(2); c.get(2)   # both freq=3
        c.put(3, "c")                            # evict 1 (LRU at freq=3)
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")

Follow-up Questions

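(1) Thread-safe? A single lock around get and put is correct and the standard production answer; an RLock is needed only if _bump also acquires the lock, because get and put call it while already holding the lock. The locked region is O(1), so contention is bounded. Sharding by hash(key) % N into N independent LFU caches, each with its own lock and capacity/N, is the higher-throughput choice. Each shard evicts its own least-frequent key, so eviction only approximates global LFU; with a well-mixed hash that approximation is usually acceptable.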

(4) Observe and monitor? Hit rate (hits / (hits + misses)) as a gauge; eviction count as a counter; the frequency of each evicted entry as a histogram (10th, 50th, 90th, 99th percentiles). That histogram tells you whether frequency is actually steering eviction or whether nearly everything is evicted at freq=1, in which case the frequency bookkeeping buys little over LRU. Cache size as a gauge.

(9) Eviction policy / cleanup? This is the eviction policy. The catch: formerly hot entries keep their high frequency and never get evicted even after they go cold (a value popular yesterday still has freq=1000 today). Mitigations: windowed LFU (count only accesses within a recent window), LFU-Aging (halve all frequencies periodically), or TinyLFU (an admission filter backed by a count-min sketch whose counters are periodically halved). State the limitation and pick a mitigation.
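A minimal sketch of the aging idea, loosely following TinyLFU's frequency filter: a count-min sketch whose counters are all halved after every reset_after additions, plus an admission check. The class name, the defaults, and the use of Python's built-in hash over (row, key) in place of independent hash functions are assumptions for illustration; production implementations pack counters far more compactly.

```python
class AgingCountMinSketch:
    """Count-min frequency estimator whose counters are halved every `reset_after` additions.

    Illustrative sketch of TinyLFU-style aging; names and defaults are assumptions.
    """

    def __init__(self, width: int = 1024, depth: int = 4, reset_after: int = 10_000):
        self._width = width
        self._rows = [[0] * width for _ in range(depth)]
        self._reset_after = reset_after
        self._additions = 0

    def _cells(self, key):
        # One column per row; hash((row, key)) stands in for independent hash functions.
        return [(r, hash((r, key)) % self._width) for r in range(len(self._rows))]

    def add(self, key) -> None:
        for r, c in self._cells(key):
            self._rows[r][c] += 1
        self._additions += 1
        if self._additions >= self._reset_after:
            self._halve()

    def estimate(self, key) -> int:
        # Collisions only inflate counters, so the minimum over rows is the tightest estimate.
        return min(self._rows[r][c] for r, c in self._cells(key))

    def admit(self, candidate, victim) -> bool:
        # Admission check: replace the victim only if the newcomer looks more popular.
        return self.estimate(candidate) > self.estimate(victim)

    def _halve(self) -> None:
        # Aging: every counter loses half its weight, so stale popularity fades.
        for row in self._rows:
            for i, v in enumerate(row):
                row[i] = v >> 1
        self._additions = 0
```

On a miss with a full cache, an LFU could call admit(new_key, victim_key) and skip the insert when it returns False, so one-off keys do not displace established ones.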

(10) Consistency model? Linearizable in a single process under the lock — every get/put appears to take effect instantly at some moment between invocation and return. Replicated LFU is harder; most distributed caches degrade to eventual consistency on the cache and rely on an authoritative store underneath.

(11) Configuration knobs? capacity, optionally decay_interval (for windowed LFU), optionally eviction_callback. Don’t expose internal data-structure tuning.

Product Extension

LFU is the right policy when access patterns are stationary (popular items stay popular). It outperforms LRU on workloads with strong popularity skew (web caches, recommendation systems, query result caches). It also resists scans better than LRU: a large one-time scan flushes an LRU cache, while in LFU the scanned entries enter at freq=1 and mostly evict each other, leaving hot entries in place. It performs worse than LRU when popularity shifts: formerly hot entries keep their high counts and linger until aging kicks in, and new items can be evicted before they accumulate frequency. TinyLFU (used in Caffeine) combines a count-min admission filter with a small LRU window, getting LFU’s hit rate without the staleness problem.

Language/Runtime Follow-ups

Python: same approach as shown. collections.Counter for frequencies is tempting but doesn’t give the bucketing we need.
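Java: HashMap<K, V> for values, HashMap<K, Integer> for frequencies, and HashMap<Integer, LinkedHashSet<K>> for the per-frequency buckets (LinkedHashSet keeps insertion order, so its first element is the LRU key). Caffeine provides production-grade TinyLFU.
Go: container/list per bucket, map[int]*list.List for buckets, plus a map from key to *list.Element. The groupcache library uses a different policy (LRU); for LFU, write it yourself or use dgraph-io/ristretto (TinyLFU).
C++: std::list<Node> per bucket; std::unordered_map<int, std::list<Node>> for the freq map and std::unordered_map<Key, std::list<Node>::iterator> for key lookup (list iterators stay valid across splice).
JS/TS: Map<number, Set<K>> for buckets; Set preserves insertion order, so LRU-within-bucket is free (the first value from set.values() is the LRU key).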
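An alternative Python sketch, not the approach implemented above: it replaces the hand-written _DLList with one collections.OrderedDict per frequency, whose insertion order gives LRU-within-bucket and whose popitem(last=False) pops the LRU entry. The class name is illustrative; the min_freq rules are the same as in LFUCache.

```python
from collections import OrderedDict, defaultdict
from typing import Any, Hashable, Optional


class OrderedDictLFU:
    """LFU cache with one OrderedDict per frequency; dict order gives LRU-within-bucket."""

    def __init__(self, capacity: int):
        if capacity < 0:
            raise ValueError("capacity must be non-negative")
        self._cap = capacity
        self._vals: dict[Hashable, Any] = {}
        self._freq: dict[Hashable, int] = {}
        self._buckets: dict[int, OrderedDict] = defaultdict(OrderedDict)
        self._min_freq = 0

    def _bump(self, key: Hashable) -> None:
        f = self._freq[key]
        bucket = self._buckets[f]
        del bucket[key]
        if not bucket:                           # bucket f emptied
            del self._buckets[f]
            if self._min_freq == f:
                self._min_freq = f + 1           # the key we just moved now holds the minimum
        self._freq[key] = f + 1
        self._buckets[f + 1][key] = None         # appended at the MRU end

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._vals:
            return None
        self._bump(key)
        return self._vals[key]

    def put(self, key: Hashable, value: Any) -> None:
        if self._cap == 0:
            return
        if key in self._vals:
            self._vals[key] = value
            self._bump(key)
            return
        if len(self._vals) >= self._cap:
            bucket = self._buckets[self._min_freq]
            victim, _ = bucket.popitem(last=False)   # LRU end of the min-frequency bucket
            if not bucket:
                del self._buckets[self._min_freq]
            del self._vals[victim], self._freq[victim]
        self._vals[key], self._freq[key] = value, 1
        self._buckets[1][key] = None
        self._min_freq = 1
```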

Common Bugs

Forgetting to advance min_freq when the min_freq bucket becomes empty after a _bump. A later eviction then reads an empty (or missing) bucket and crashes.
Resetting min_freq=1 on insert before the eviction step instead of after, evicting the wrong bucket.
Tie-breaking by MRU instead of LRU: appending to the front of the bucket instead of the back. Each bucket is itself an LRU list; respect the convention.
Sharing a single _DLList across freq buckets accidentally (e.g., a class-level default). Use setdefault(freq, _DLList()).
On put of an existing key, changing node.freq by hand instead of calling _bump: the count ends up right, but the node stays in its old bucket and the bucket invariant breaks.

Debugging Strategy

Print the bucketing as {freq: [keys]} after every operation. The bug usually shows up as a stale entry in min_freq’s bucket or a missing entry in the new bucket. Add a _check_invariants() method, called as assert self._check_invariants(), that walks the buckets and verifies (a) every node sits in the bucket matching its freq, (b) min_freq is the smallest non-empty bucket, and (c) the hashmap and the buckets hold the same set of keys. Run the test suite with assertions enabled (not under python -O).
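A sketch of the invariant check, written against a plain snapshot rather than the _DLList internals so it stands alone: key_to_freq maps each cached key to its frequency, and buckets maps each frequency to its keys in LRU-to-MRU order. The function name and snapshot shape are hypothetical; to use it with LFUCache, build the snapshot by walking each _DLList from head.next to tail.

```python
from typing import Hashable


def check_lfu_invariants(key_to_freq: dict[Hashable, int],
                         buckets: dict[int, list[Hashable]],
                         min_freq: int) -> bool:
    """Raise AssertionError on the first broken LFU invariant; return True otherwise."""
    bucket_of: dict[Hashable, int] = {}
    for freq, keys in buckets.items():
        for key in keys:
            # Each key lives in exactly one bucket.
            assert key not in bucket_of, f"key {key!r} in buckets {bucket_of[key]} and {freq}"
            bucket_of[key] = freq
    # (c) hashmap and buckets hold the same keys.
    assert bucket_of.keys() == key_to_freq.keys(), "hashmap and buckets disagree on key set"
    # (a) every key sits in the bucket matching its frequency.
    for key, freq in bucket_of.items():
        assert key_to_freq[key] == freq, f"key {key!r} has freq {key_to_freq[key]}, bucket {freq}"
    # (b) min_freq is the smallest non-empty bucket (unconstrained when the cache is empty).
    non_empty = [freq for freq, keys in buckets.items() if keys]
    if non_empty:
        assert min_freq == min(non_empty), f"min_freq={min_freq}, expected {min(non_empty)}"
    return True
```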

Mastery Criteria

Implemented LFUCache from blank screen in <30 minutes (target: <25 after second attempt).
All five tests pass first run.
Stated tie-breaking-via-per-bucket-LRU explicitly without prompting.
Articulated the min_freq advancement rule precisely.
Answered follow-ups #1, #4, #9 (LFU-Aging / TinyLFU), #10, #11 in <90 seconds each.
Compared LFU vs LRU on scan-heavy and popularity-skewed workloads in <60 seconds.

Lab 03 — Rate Limiter

Goal

Implement four rate-limiting algorithms — token bucket, leaky bucket, sliding window log, sliding window counter — and articulate the tradeoffs between them. After this lab you should be able to pick the right algorithm for a stated workload in under 30 seconds and implement any of the four in under 15 minutes.

Background Concepts

A rate limiter caps the number of requests a key (user, IP, API token) may make over a time window. The four standard algorithms differ in how much history they keep and what kind of bursts they allow:

Token bucket: a bucket of capacity B refills at rate R per second; each request consumes one token; if no token, reject. Allows bursts up to B.
Leaky bucket: requests enter a queue of capacity Q that drains at rate R; if the queue is full, reject. Smooths bursts (output rate is constant).
Sliding window log: keep a list of the timestamps of admitted requests over the last W seconds; reject if len(log) ≥ N, where N is the per-window limit. Most accurate, most memory.
Sliding window counter: keep a count for the current and previous fixed window; estimate by linear interpolation. Cheap; mildly inaccurate at boundaries.

The token bucket is the most common production choice (used by AWS, Stripe, GitHub) because it gives sensible burst tolerance with O(1) memory per key. Sliding window log is the choice when you must guarantee strict request-count caps (e.g., quota enforcement against a legal contract). Leaky bucket is used in network traffic shaping. Sliding window counter is the right pick when memory is constrained and an approximate count is acceptable.

Interview Context

Rate limiter is a frequent question in senior+ practical rounds at Stripe, Cloudflare, Uber, and most high-scale API companies. The strong answer compares the four algorithms, picks one, justifies the pick, implements it, and answers the inevitable follow-ups about distributed coordination (multiple servers must share one quota), persistence, and observability. The weak answer implements one variant without acknowledging the others exist.

Problem Statement

Design RateLimiter with allow(key) -> bool. Configurable rate R requests per W seconds. Implement all four algorithms behind a common interface so they can be benchmarked.

Constraints

Up to 10^6 distinct keys
Up to 10^5 QPS aggregate
Sub-millisecond per-call latency
Configurable rate per key (follow-up)

Clarifying Questions

Per-key or global limit? (Per-key by convention.)
Should refused requests be queued or rejected? (Token bucket rejects; leaky bucket queues.)
Time source: monotonic clock or wall clock? (Always monotonic — wall clock can jump backward.)
Distributed across N servers, or single-process? (Often a follow-up; default single-process.)
Burst tolerance — yes or no? (Token bucket allows bursts; sliding window log enforces strict.)

Examples

Rate = 5 req / 1 s.

Token bucket (capacity 5, refill 5/s):
  t=0:   5 quick requests   → all allow (bucket drains 5→0)
  t=0.1: 1 request          → reject (bucket=0.5 < 1)
  t=1.0: 1 request          → allow (bucket refilled to 5; now 4)

Sliding window log (limit 5 over 1 s):
  t=0..0.5: 5 requests      → all allow
  t=0.6: 1 request          → reject (5 already in the last 1 s)
  t=1.1: 1 request          → allow (the t=0 request slid out)

Initial Brute Force

Keep a dict[key, list[timestamp]]; on every allow, filter out timestamps older than W and compare len against the limit. This is the sliding window log baseline: O(history size) per call, because the filter rebuilds the list, and memory that grows with request volume. Acceptable for low-rate testing; not acceptable for production at 10^5 QPS.

Brute Force Complexity

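Per call: O(N), where N is the number of timestamps kept for the key. Memory: O(N · keys) in the worst case. The live total is bounded by aggregate QPS × W: about 10^5 timestamps for a 1-second window, but 3.6 × 10^8 for a one-hour window at 10^5 QPS, nearly 3 GB at 8 bytes per timestamp and several times that as Python float objects.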

Optimization Path

For each algorithm, the optimization target is different:

Token bucket: store (tokens, last_refill_time) per key. Compute refill on demand: tokens = min(B, tokens + (now - last_refill) * R). O(1) per call, O(1) memory per key.
Leaky bucket: equivalent to token bucket if reject-on-full; if queue requests, store the queue.
Sliding window log: same as brute force, but keep a deque and trim expired timestamps from its left end on each call. O(1) amortized per call, since each timestamp is appended and popped at most once.
Sliding window counter: store (curr_count, prev_count, curr_window_start). Approximate the rolling count with prev * (1 - elapsed/W) + curr. O(1) per call, O(1) memory.

The token bucket is the dominant choice; the other three are presented for comparison.

Final Expected Approach

Define a RateLimiter interface with allow(key) -> bool and implement four classes. TokenBucket and LeakyBucket take rate (per second) and capacity; the sliding-window classes take max_in_window and window_s. Use time.monotonic(). Make all four thread-safe, either with per-key locks (a lazily created dict[key, Lock]) or with a single lock per limiter, which is simpler and adequate for most workloads; the code below uses a single lock.
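The four classes share no declared base class; a typing.Protocol is one way to state the common interface so they can be benchmarked interchangeably. A minimal sketch, assuming a hypothetical RateLimiter protocol and benchmark helper (neither name comes from a library):

```python
import time
from typing import Hashable, Protocol, runtime_checkable


@runtime_checkable
class RateLimiter(Protocol):
    """Common interface: each limiter class exposes allow(key) -> bool."""

    def allow(self, key: Hashable) -> bool: ...


def benchmark(limiter: RateLimiter, n_calls: int, n_keys: int) -> tuple[int, float]:
    """Call allow() n_calls times round-robin over n_keys keys; return (admitted, seconds)."""
    keys = [f"k{i}" for i in range(n_keys)]
    start = time.perf_counter()
    admitted = sum(limiter.allow(keys[i % n_keys]) for i in range(n_calls))
    return admitted, time.perf_counter() - start
```

Structural typing means TokenBucket, LeakyBucket, SlidingWindowLog, and SlidingWindowCounter all satisfy RateLimiter without inheriting from it.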

Data Structures Used

Algorithm	Per-key state
Token bucket	`(float tokens, float last_refill_t)`
Leaky bucket (reject)	`(float queue_size, float last_drain_t)`
Sliding log	`deque[float]` of timestamps
Sliding counter	`(int curr, int prev, float window_start)`

Correctness Argument

Token bucket: tokens are produced continuously at rate R and capped at B; each allowed request consumes one. This is the differential equation d(tokens)/dt = R − (consumption rate) with a ceiling at B, integrated lazily by the refill formula: between calls nothing is consumed, so tokens grow linearly until they hit B. It is correct provided tokens never go below 0 (we check >= 1 before decrementing) or above B (the min(B, ...)).

Sliding window log: the invariant is “after trimming, the deque contains exactly the timestamps of admitted requests in [now - W, now]”. We maintain it by trimming the prefix on every call, admitted or not. Then allow is len(deque) < N.

Sliding window counter: the approximation is estimate = prev_count * (1 - elapsed/W) + curr_count. It is exact when the previous window’s requests were spread uniformly across it; otherwise it can over- or under-count, by at most prev_count (only the previous window’s share is estimated). Acceptable for most production rate limiters.
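A worked instance of the estimate formula, with illustrative numbers (a limit of 10 per 1-second window); the helper name is hypothetical:

```python
def sliding_counter_estimate(prev_count: int, curr_count: int,
                             elapsed: float, window: float) -> float:
    """Weight the previous window by the fraction of it still inside the rolling window."""
    return prev_count * (1 - elapsed / window) + curr_count


# Previous window saw 8 requests, current window has 3 so far, and we are 0.25 s into it.
estimate = sliding_counter_estimate(8, 3, 0.25, 1.0)   # 8 * 0.75 + 3 = 9.0
allowed = estimate < 10                                # admit, then curr becomes 4
```

Later in the same window the previous window's weight shrinks: at 0.75 s the same counts give 8 * 0.25 + 3 = 5.0.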

Complexity

Algorithm	Time / call	Space / key
Token bucket	O(1)	O(1)
Leaky bucket	O(1)	O(1)
Sliding log	O(1) amortized	O(N)
Sliding counter	O(1)	O(1)

Implementation Requirements

import time, threading
from collections import deque
from typing import Hashable

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self._rate, self._cap = rate, capacity
        self._state: dict[Hashable, list[float]] = {}    # key -> [tokens, last_t]
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            s = self._state.get(key)
            if s is None:
                s = [self._cap, now]; self._state[key] = s
            tokens, last = s
            tokens = min(self._cap, tokens + (now - last) * self._rate)
            if tokens >= 1:
                s[0] = tokens - 1; s[1] = now
                return True
            s[0] = tokens; s[1] = now
            return False


class SlidingWindowLog:
    def __init__(self, max_in_window: int, window_s: float):
        self._max, self._w = max_in_window, window_s
        self._logs: dict[Hashable, deque[float]] = {}
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            dq = self._logs.setdefault(key, deque())
            cutoff = now - self._w
            while dq and dq[0] < cutoff:
                dq.popleft()
            if len(dq) >= self._max:
                return False
            dq.append(now)
            return True


class SlidingWindowCounter:
    def __init__(self, max_in_window: int, window_s: float):
        self._max, self._w = max_in_window, window_s
        self._state: dict[Hashable, list] = {}    # [curr, prev, window_start]
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            s = self._state.get(key)
            if s is None:
                s = [0, 0, now]; self._state[key] = s
            curr, prev, ws = s
            elapsed = now - ws
            if elapsed >= 2 * self._w:
                curr = prev = 0; ws = now; elapsed = 0.0   # both windows stale: start fresh
            elif elapsed >= self._w:
                prev, curr = curr, 0; ws += self._w
                elapsed -= self._w
            estimate = prev * (1 - elapsed / self._w) + curr
            if estimate >= self._max:
                s[0], s[1], s[2] = curr, prev, ws
                return False
            s[0], s[1], s[2] = curr + 1, prev, ws
            return True


class LeakyBucket:
    """Reject-on-full leaky bucket (a meter, no queue); equivalent to a token bucket with the same rate and capacity."""
    def __init__(self, rate: float, capacity: float):
        self._rate, self._cap = rate, capacity
        self._state: dict[Hashable, list[float]] = {}
        self._lock = threading.Lock()

    def allow(self, key: Hashable) -> bool:
        now = time.monotonic()
        with self._lock:
            s = self._state.get(key)
            if s is None:
                s = [0.0, now]; self._state[key] = s
            level, last = s
            level = max(0.0, level - (now - last) * self._rate)
            if level + 1 > self._cap:
                s[0], s[1] = level, now
                return False
            s[0], s[1] = level + 1, now
            return True

Tests

import unittest, time

class TestTokenBucket(unittest.TestCase):
    def test_burst_then_steady(self):
        rl = TokenBucket(rate=5, capacity=5)
        # Burst: 5 quick allows
        for _ in range(5):
            self.assertTrue(rl.allow("k"))
        # 6th: reject (bucket empty)
        self.assertFalse(rl.allow("k"))
        # Wait 0.4s → 2 tokens accumulated
        time.sleep(0.4)
        self.assertTrue(rl.allow("k"))
        self.assertTrue(rl.allow("k"))

    def test_per_key_isolation(self):
        rl = TokenBucket(rate=1, capacity=1)
        self.assertTrue(rl.allow("a"))
        self.assertTrue(rl.allow("b"))      # different key, full bucket
        self.assertFalse(rl.allow("a"))


class TestSlidingLog(unittest.TestCase):
    def test_strict_count(self):
        rl = SlidingWindowLog(max_in_window=3, window_s=1.0)
        for _ in range(3):
            self.assertTrue(rl.allow("k"))
        self.assertFalse(rl.allow("k"))
        time.sleep(1.05)
        self.assertTrue(rl.allow("k"))      # log has slid out
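The tests above depend on time.sleep, which makes them slow and timing-sensitive. One common remedy is to inject the clock. A minimal sketch, assuming a hypothetical ClockedTokenBucket that mirrors TokenBucket's refill logic but takes a clock callable (and omits the lock, for brevity):

```python
import time
from typing import Callable, Hashable


class ClockedTokenBucket:
    """Same refill logic as TokenBucket, but the clock is a parameter (single-threaded sketch)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate, self._cap, self._clock = rate, capacity, clock
        self._state: dict[Hashable, list[float]] = {}   # key -> [tokens, last_t]

    def allow(self, key: Hashable) -> bool:
        now = self._clock()
        tokens, last = self._state.get(key, [self._cap, now])
        tokens = min(self._cap, tokens + (now - last) * self._rate)   # lazy refill, capped at B
        admitted = tokens >= 1
        self._state[key] = [tokens - 1 if admitted else tokens, now]
        return admitted
```

A test can then replay the token bucket example above with a fake clock such as now = [0.0] and clock=lambda: now[0], advancing now[0] instead of sleeping.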

Follow-up Questions

(3) How would you scale to N nodes? This is the key follow-up. Options: (a) sticky routing: route all requests for a key to a fixed node by hash(key) % N; each node enforces locally. Simple, but rebalancing on add/remove is painful (consistent hashing limits the churn). (b) Centralized counter in Redis using INCR + EXPIRE per window. One network round-trip per call, so it only works at moderate QPS and cannot meet a sub-millisecond budget. (c) Approximate distributed: each node enforces R/N locally. The cluster never admits more than R in aggregate, but a key whose traffic lands unevenly across nodes is throttled below R. This is the pragmatic real-world answer; document your error budget. (d) Token bucket in Redis with a Lua script for atomic refill+decrement; Stripe and GitHub do this in production.

(7) How would you handle backpressure? The whole point of a rate limiter is backpressure on upstream traffic. The question becomes: when the limiter rejects, what does the client see? HTTP 429 with Retry-After header is the standard. Enable cooperative backoff so clients don’t retry-storm. Optionally include X-RateLimit-Remaining and X-RateLimit-Reset headers (GitHub convention).
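For a token bucket, the Retry-After value can be derived from the rejected key's state: the missing fraction of a token divided by the refill rate. A minimal sketch, assuming a hypothetical helper that rounds up to whole seconds (the HTTP delay-seconds form of Retry-After is an integer) and optionally spreads retries over [t, 2t]:

```python
import math
import random


def retry_after_seconds(tokens: float, rate: float, jitter: bool = True) -> int:
    """Whole seconds until a token-bucket key has one token again."""
    wait = max(0.0, 1.0 - tokens) / rate          # time to refill the missing fraction of a token
    if jitter:
        wait = random.uniform(wait, 2 * wait)      # spread retries over [t, 2t]
    return max(1, math.ceil(wait))                 # never tell a rejected client to retry in 0 s
```

With rate 5/s and 0.5 tokens left, the true wait is 0.1 s, which rounds up to Retry-After: 1.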

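(9) What’s the eviction policy? Per-key state grows unbounded if you never clean up. Once a key has been idle longer than the time its state takes to return to fresh (about W for the sliding log, 2W for the sliding counter, capacity / rate for the token bucket), its state can be dropped with no change in behavior. Two strategies: (a) lazy expiry: keep keys in last-access order and, on each call, drop idle entries from the cold end. Checking only the accessed key is not enough, since keys that never return would never be reclaimed. (b) Background scavenger: periodically scan and remove stale entries. The lazy approach is preferred; its overhead is amortized O(1) per call and it needs no extra thread.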
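A minimal sketch of lazy expiry from the cold end, assuming a hypothetical IdleEvictingState store that the limiter would use in place of its plain dict. An OrderedDict kept in last-access order means only the oldest entries can be idle, so each call checks a bounded number of them:

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class IdleEvictingState:
    """Per-key limiter state kept in last-access order so idle keys can be dropped cheaply."""

    def __init__(self, idle_ttl: float, clock: Callable[[], float] = time.monotonic,
                 max_evictions_per_call: int = 8):
        self._ttl = idle_ttl
        self._clock = clock
        self._max_evict = max_evictions_per_call
        self._entries: OrderedDict = OrderedDict()   # key -> [last_access, state], coldest first

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        now = self._clock()
        self._evict_idle(now)
        entry = self._entries.get(key)
        if entry is None:
            entry = self._entries[key] = [now, factory()]   # new keys go to the warm end
        else:
            entry[0] = now
            self._entries.move_to_end(key)                  # now the most recently used
        return entry[1]

    def _evict_idle(self, now: float) -> None:
        # Only the coldest entries can be idle, so stop at the first fresh one.
        for _ in range(self._max_evict):
            if not self._entries:
                return
            oldest_key = next(iter(self._entries))
            if now - self._entries[oldest_key][0] <= self._ttl:
                return
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```

Capping evictions per call bounds worst-case latency; since each entry is evicted at most once, the cost stays amortized O(1).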

(11) What configuration knobs? rate and capacity (or window) per limit class. Optionally per-key overrides (a dict[key, (rate, cap)] for VIP customers). Knob not to expose: the algorithm choice (pick one and stick with it).

(4) How would you observe / monitor? Allow rate (counter), reject rate (counter), reject ratio (gauge), per-key reject rate (top-K dashboard for finding hot keys). Bucket-fill / queue-length gauge for diagnosing whether you’re rate-limited because of bursts or steady overload.

Product Extension

Stripe’s API uses token bucket with per-account capacity. AWS API Gateway uses token bucket per stage. GitHub’s REST API enforces hourly quotas and exposes the reset time to clients in the X-RateLimit-Reset header. Twitter (X) uses fixed windows for some endpoints, sliding for others. The choice depends on the contract you offer customers (“up to 5 burst requests” → token bucket; “at most 5000 in any hour” → sliding log, or sliding counter if approximate enforcement is acceptable).

Language/Runtime Follow-ups

Python: the implementation above. For high QPS, replace the single lock with lazily created per-key locks (a dict[key, Lock], itself guarded when inserting), or shard the state by hash(key) % N with one lock per shard.
Java: Guava’s RateLimiter is a token bucket with smoothing options (one limiter per instance, not keyed). For distributed, Bucket4j is excellent.
Go: golang.org/x/time/rate is a token bucket (Allow / Wait). For distributed, use Redis with a Lua script.
C++: no standard-library rate limiter; build on std::chrono::steady_clock::now(). Folly has a token bucket.
JS/TS: bottleneck (npm) is a common client-side choice. Server-side: Redis-backed for distributed.

Common Bugs

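Using time.time() (wall clock) instead of time.monotonic(): NTP adjustments or manual clock changes can move it backward (negative deltas drain tokens) or forward (free tokens).
Token bucket: not capping at B. The bucket grows unbounded over idle periods, so the first burst is huge.
Sliding log: trimming only when a request is admitted instead of on every call. Rejected calls then count stale timestamps, and because nothing is admitted nothing is ever trimmed, so the key stays locked out.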
Sliding counter: failing to advance the window pointer when 2W has passed (key idle for long enough that both windows are stale).
Forgetting per-key isolation — a single shared bucket across all keys.

Debugging Strategy

Log every (key, allow/reject, bucket_state) transition for a single key and hand-trace it against expected behavior. For distributed bugs, capture the Lua script’s input and output and replay it against a local Redis. For thundering-herd bugs (many clients see the same reset time and all retry at once), have the server add jitter to Retry-After, e.g. a value drawn uniformly from [t, 2t], where t is the true wait.

Mastery Criteria

Implemented all four algorithms in <60 minutes total (15 min each).
All tests pass first run.
Compared the four algorithms verbally in <90 seconds, naming a workload where each is the right choice.
Stated why time.monotonic() is required without prompting.
Answered follow-ups #3 (distributed), #4, #7, #9, #11 in <90 seconds each.
Identified that a reject-on-full leaky bucket is mathematically equivalent to a token bucket with the same rate and capacity (they differ when the leaky bucket queues requests instead of rejecting them).

Lab 04 — Task Scheduler

Goal

Implement a TaskScheduler that accepts tasks with priorities, executes them in priority order via a worker pool, retries on failure with exponential backoff, and routes permanently-failed tasks to a dead-letter queue. After this lab you should be able to design and implement a small in-memory task queue with retry semantics in under 35 minutes.

Background Concepts

A task scheduler is the in-memory cousin of Celery / Sidekiq / RQ. The four moving parts are:

Priority queue of pending tasks (heap keyed on priority + scheduled-execution-time).
Worker pool that pops tasks and runs them.
Retry policy that decides if and when a failed task is re-enqueued (with delayed visibility).
Dead-letter queue (DLQ) for tasks that have exhausted retries.

The non-trivial design question is how to handle delayed re-enqueue for retries. A single heap keyed by ready_at looks tempting, but it orders tasks by readiness time rather than priority, so a low-priority task submitted a moment earlier runs before a high-priority one. The clean answer is two heaps: a ready heap keyed by (priority, seq) for tasks that can run now, and a delayed heap keyed by ready_at for retries waiting out their backoff. Workers promote due entries from the delayed heap into the ready heap, pop by priority, and sleep until the earliest ready_at when nothing is runnable. This handles “high-priority now” and “low-priority retry-in-30-seconds” without either one blocking the other.

Interview Context

Task scheduler problems are popular at infrastructure-heavy companies (Uber, Cloudflare, AWS Lambda team, Datadog) because they touch concurrency, priority queues, retry semantics, and DLQ design, the building blocks of every async-job system. The interviewer will probe whether you’ve thought about idempotency, exactly-once vs at-least-once, and observability.

Problem Statement

Design TaskScheduler(n_workers, base_backoff):

submit(task: Callable, priority: int = 5, max_attempts: int = 3) -> task_id: enqueue a task with the given priority. Lower numeric priority runs first.
start(): start the worker pool.
shutdown(timeout): stop accepting new tasks; finish queued and in-flight work, waiting at most timeout seconds; return.
dead_letters() -> list[FailedTask]: return tasks that exhausted retries.

Behavior: a failure (raised exception) → exponential backoff retry, up to max_attempts attempts in total; a task that fails its last attempt → DLQ.

Constraints

Up to 10^4 pending tasks
Up to 100 workers
Per-task max execution time: 60s (a configurable per-task timeout is a follow-up)
Tasks may be of arbitrary type but assumed to be deterministic-ish

Clarifying Questions

Are tasks idempotent? (We’ll assume yes; idempotency is the user’s responsibility for at-least-once correctness.)
Priority semantics: lower = higher? (Yes by convention, like a min-heap.)
What does retry mean: the same task re-run, or a new attempt object? (Same callable, same args, attempt counter incremented.)
Should retries preserve original priority? (Yes by convention.)
Cancellation? (Often a follow-up; default no.)

Examples

sched = TaskScheduler(n_workers=2)
sched.submit(lambda: print("a"), priority=1)
sched.submit(lambda: print("b"), priority=0)   # dispatched first (lower number wins)
sched.submit(failing_task, priority=2, max_attempts=2)
                                                # 1st attempt fails; retry after backoff
                                                # 2nd attempt fails; → DLQ
sched.start()                                   # start after submitting so priority decides dispatch order
sched.shutdown(timeout=5.0)
sched.dead_letters()  # [FailedTask(task_id=3, callable_repr=..., attempts=2, last_error=...)]

Initial Brute Force

A single thread polling a list sorted by priority. Run, retry inline. Single worker, no parallelism. O(N log N) per submit.

Brute Force Complexity

Per submit: O(N log N) on the sort. Per dispatch: O(N) on the linear scan. Acceptable only for tens of tasks.

Optimization Path

Replace the sorted list with a heapq. Replace the single thread with a worker pool of n_workers threads. Add a Condition variable so workers block when nothing is runnable. Add delayed execution: instead of time.sleep(backoff) in the worker (which ties the worker up for the whole backoff), push the failed task onto a second heap keyed on ready_at = now + backoff, and promote it into the priority heap once it is due.

Final Expected Approach

Two heaps: ready, keyed by (priority, seq), and delayed, keyed by (ready_at, seq); seq is a global counter that gives FIFO order among equal priorities. One condition variable wakes workers when something is submitted or scheduled. Workers loop: promote every delayed entry with ready_at <= now into ready; if ready is non-empty, pop and run; otherwise wait until the earliest ready_at (or until notified). On success: done. On failure: increment attempt; if attempt < max_attempts, push onto delayed with ready_at = now + backoff(attempt); else append to the DLQ. shutdown sets a flag, broadcasts the condition, and joins all workers against one deadline; workers drain remaining tasks (including scheduled retries) before exiting.

Data Structures Used

Structure	Purpose
`heapq` of `(priority, seq, task)`	ready tasks, by priority then FIFO
`heapq` of `(ready_at, seq, task)`	retries waiting out their backoff
`threading.Condition`	wait/notify for submissions and ready-time
`list` (DLQ)	failed tasks
`attempt` field on each task	retry tracking

Correctness Argument

Priority ordering: a task is either due (in ready) or not yet due (in delayed). Each dispatch first promotes every due task, then pops the minimum (priority, seq) from ready, so the dispatched task is the highest-priority due task, with ties broken FIFO. Because seq is unique, tuple comparison never falls through to comparing task objects.

Retry semantics: a failure increments attempt; if attempt < max_attempts, the task goes onto delayed with ready_at = now + base · 2^(attempt−1) · U(0.5, 1.5) (exponential backoff with multiplicative jitter); otherwise it goes to the DLQ. While the process is alive the task is never lost: it is running, in one of the heaps, or in the DLQ, and every transition moves it directly from one of these states to another.

Shutdown: setting _stopping = True and broadcasting wakes every blocked worker. A worker exits once _stopping is set and both heaps are empty, so queued work and scheduled retries are drained first, and submit rejects new tasks once _stopping is set. The per-worker join(timeout) against a shared deadline bounds how long shutdown blocks; workers still draining past the deadline are daemon threads and do not keep the process alive.

Complexity

submit: O(log N)
Worker dispatch: O(log N) amortized per task (each retry is promoted from delayed to ready once)
Memory: O(pending + dlq)

Implementation Requirements

import heapq, threading, time, itertools, random
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FailedTask:
    task_id: int
    callable_repr: str
    attempts: int
    last_error: str

@dataclass
class _Task:
    task_id: int
    fn: Callable[[], object]
    priority: int
    max_attempts: int
    attempt: int = 0                                  # failed attempts so far


class TaskScheduler:
    def __init__(self, n_workers: int = 4, base_backoff: float = 0.1):
        self._n_workers = n_workers
        self._base = base_backoff
        self._ready: list[tuple[int, int, _Task]] = []      # (priority, seq, task): runnable now
        self._delayed: list[tuple[float, int, _Task]] = []  # (ready_at, seq, task): in backoff
        self._dlq: list[FailedTask] = []
        self._cond = threading.Condition()
        self._stopping = False
        self._workers: list[threading.Thread] = []
        self._seq = itertools.count()        # FIFO tie-break; tasks themselves are never compared
        self._next_id = itertools.count(1)

    def submit(self, fn: Callable[[], object], priority: int = 5, max_attempts: int = 3) -> int:
        if max_attempts <= 0:
            raise ValueError("max_attempts must be positive")
        with self._cond:
            if self._stopping:
                raise RuntimeError("scheduler is shutting down")
            task = _Task(next(self._next_id), fn, priority, max_attempts)
            heapq.heappush(self._ready, (priority, next(self._seq), task))
            self._cond.notify()
        return task.task_id

    def start(self) -> None:
        for i in range(self._n_workers):
            t = threading.Thread(target=self._run_worker, name=f"w{i}", daemon=True)
            t.start()
            self._workers.append(t)

    def shutdown(self, timeout: float = 5.0) -> None:
        with self._cond:
            self._stopping = True
            self._cond.notify_all()
        deadline = time.monotonic() + timeout
        for w in self._workers:
            w.join(timeout=max(0.0, deadline - time.monotonic()))

    def dead_letters(self) -> list[FailedTask]:
        with self._cond:
            return list(self._dlq)

    def _next_task(self) -> Optional[_Task]:
        """Block until a task is runnable; return None once stopping with nothing left."""
        with self._cond:
            while True:
                now = time.monotonic()
                # Promote retries whose backoff has elapsed into the priority heap.
                while self._delayed and self._delayed[0][0] <= now:
                    _, seq, task = heapq.heappop(self._delayed)
                    heapq.heappush(self._ready, (task.priority, seq, task))
                if self._ready:
                    task = heapq.heappop(self._ready)[2]
                    if self._ready:
                        self._cond.notify()           # more runnable work: wake another worker
                    return task
                if self._stopping and not self._delayed:
                    return None
                # Sleep until notified, or until the earliest retry becomes due.
                timeout = self._delayed[0][0] - now if self._delayed else None
                self._cond.wait(timeout=timeout)

    def _run_worker(self) -> None:
        while (task := self._next_task()) is not None:
            try:
                task.fn()
            except Exception as ex:
                self._on_failure(task, ex)

    def _on_failure(self, task: _Task, ex: Exception) -> None:
        task.attempt += 1
        with self._cond:
            if task.attempt >= task.max_attempts:
                self._dlq.append(FailedTask(task.task_id, repr(task.fn), task.attempt,
                                            f"{type(ex).__name__}: {ex}"))
                return
            backoff = self._base * (2 ** (task.attempt - 1))
            backoff *= 0.5 + random.random()          # multiplicative jitter in [0.5, 1.5)
            heapq.heappush(self._delayed, (time.monotonic() + backoff, next(self._seq), task))
            self._cond.notify()

Tests

import unittest, time, threading

class TestScheduler(unittest.TestCase):
    def test_priority_order(self):
        order = []
        sched = TaskScheduler(n_workers=1)
        for p in [3, 1, 2]:
            sched.submit((lambda x=p: order.append(x)), priority=p)
        sched.start()                       # start after submitting so dispatch order is deterministic
        time.sleep(0.5)
        sched.shutdown(timeout=2.0)
        self.assertEqual(order, [1, 2, 3])

    def test_retry_then_dlq(self):
        attempts = []
        def always_fail():
            attempts.append(1)
            raise RuntimeError("boom")
        sched = TaskScheduler(n_workers=1, base_backoff=0.01)
        sched.start()
        sched.submit(always_fail, priority=0, max_attempts=3)
        time.sleep(1.0)
        sched.shutdown(timeout=2.0)
        self.assertEqual(len(attempts), 3)
        self.assertEqual(len(sched.dead_letters()), 1)

    def test_concurrent_submit(self):
        results = []
        sched = TaskScheduler(n_workers=4)
        sched.start()
        def push(i):
            for j in range(50):
                sched.submit((lambda x=(i, j): results.append(x)), priority=0)
        threads = [threading.Thread(target=push, args=(i,)) for i in range(4)]
        for t in threads: t.start()
        for t in threads: t.join()
        time.sleep(0.5)
        sched.shutdown(timeout=2.0)
        self.assertEqual(len(results), 200)

Follow-up Questions

(2) Persist state across restarts? Tasks live in memory and are lost on restart. To persist, choose: (a) write each submit to a WAL; on boot, replay; on completion, append a “done” marker. (b) Snapshot the heap periodically and write a delta log. The DLQ should be persisted regardless — losing failed tasks is the worst outcome because nobody knows why a job didn’t run.

(8) Partial failure? The interesting case: a worker pops a task and crashes mid-execution. The task is now lost (it’s not in the heap and it didn’t complete). Solution: at-least-once via visibility timeout — the heap pops the task to an “in-flight” map with a TTL; if the worker doesn’t ack before TTL, the task returns to the heap. Idempotency keys make this safe. This is the SQS / Cloud Tasks model.

(9) Eviction / cleanup? The DLQ grows unbounded. Options: cap it as a sliding window of the last N failures (dropping the oldest), or persist to durable storage and prune from memory after a TTL. Always emit a per-task DLQ event so downstream alerting can fire.

(11) Configuration knobs? n_workers, base_backoff, default max_attempts. Per-task: priority, max_attempts, optionally timeout. Knobs not to expose: jitter strategy (use decorrelated jitter), heap implementation.
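The knob list recommends decorrelated jitter. Below is a minimal sketch of that backoff schedule; decorrelated_jitter and its parameters are hypothetical names, and the formula (each delay drawn uniformly from [base, 3 × previous delay], then clamped to cap) is the one popularized by the AWS Architecture Blog post on exponential backoff and jitter. The usual argument for it over fixed jitter: a small random offset on a deterministic exponential schedule keeps clients that failed together retrying in near lockstep, while drawing each delay from a wide range that depends on the previous one spreads them out.

```python
import random
from itertools import islice
from typing import Iterator, Optional


def decorrelated_jitter(base: float, cap: float,
                        rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield an endless sequence of retry delays in seconds.

    Each delay is drawn uniformly from [base, 3 * previous_delay] and clamped to cap,
    so delays are randomized but still grow roughly exponentially.
    """
    if not 0 < base <= cap:
        raise ValueError("require 0 < base <= cap")
    rng = rng or random.Random()
    delay = base
    while True:
        delay = min(cap, rng.uniform(base, delay * 3))
        yield delay


# Usage: the first five delays for a task retried with base_backoff=0.1, cap=5.0.
delays = list(islice(decorrelated_jitter(0.1, 5.0, random.Random(42)), 5))
```

A worker would take next(gen) from a per-task generator each time it re-pushes a failed task, instead of computing base_backoff * 2**attempt.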

(12) Shutdown / draining? Two modes: graceful (stop accepting; wait for in-flight; return) and forceful (stop accepting; abandon in-flight; return immediately). Always offer both. Default to graceful with a deadline.

(13) Poison pill? A task that always crashes the worker (segfault, unhandled OS exception, infinite loop). Run tasks in subprocess isolation (or with a cooperative timeout). Blacklist by hash of (callable, args) after N consecutive crashes.
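A sketch of the crash-counting half of (13), assuming the tracker lives in the supervising process that observes worker or subprocess crashes (a segfaulting task cannot record its own crash). PoisonTracker is hypothetical, and it keys on the (callable, args) tuple rather than a raw hash value; a persistent or cross-process version would need a stable digest, since Python’s hash() of a function is not stable across processes.

```python
from typing import Any, Callable, Dict, Hashable, Tuple

TaskKey = Tuple[Callable[..., Any], Tuple[Hashable, ...]]


class PoisonTracker:
    """Quarantine a (callable, args) pair after N consecutive crashes."""

    def __init__(self, max_consecutive_crashes: int = 3) -> None:
        self._limit = max_consecutive_crashes
        # Keyed by the (fn, args) tuple itself: dict lookup uses its hash plus an
        # equality check, so two tasks whose hashes merely collide stay distinct.
        # Assumes fn and every element of args are hashable.
        self._crashes: Dict[TaskKey, int] = {}

    def is_blacklisted(self, fn: Callable[..., Any], args: Tuple[Hashable, ...]) -> bool:
        return self._crashes.get((fn, args), 0) >= self._limit

    def record_crash(self, fn: Callable[..., Any], args: Tuple[Hashable, ...]) -> None:
        key = (fn, args)
        self._crashes[key] = self._crashes.get(key, 0) + 1

    def record_success(self, fn: Callable[..., Any], args: Tuple[Hashable, ...]) -> None:
        # "Consecutive": any clean run resets the counter.
        self._crashes.pop((fn, args), None)
```

The supervisor checks is_blacklisted before dispatching and routes blacklisted tasks straight to the DLQ.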

Product Extension

This is the heart of Celery, Sidekiq, RQ, AWS SQS + Lambda, GCP Cloud Tasks, and Temporal. Real systems add: visibility timeouts (the in-flight TTL), distributed coordination (multiple workers across hosts), durable storage (RDBMS or Redis with persistence), scheduling (cron-like time-based triggers), and workflow orchestration (Temporal). The core is what you wrote here.

Language/Runtime Follow-ups

Python: GIL means worker threads don’t parallelize CPU work. For CPU-bound tasks, use a ProcessPoolExecutor instead. The implementation above is fine for I/O-bound tasks.
Java: ScheduledThreadPoolExecutor is the textbook fit: submit with a delay, retry via re-submission. RetryTemplate (Spring Retry) for the policy. DeadLetterPublishingRecoverer (Spring for Apache Kafka) for dead-lettering.
Go: a single channel of tasks plus N goroutines; for delayed retry, use time.AfterFunc to push back to the channel. Or use golang.org/x/sync/errgroup for the worker pool.
C++: a std::priority_queue plus condition variable. Tasks as std::function<void()>.
JS/TS: not concurrent (single event loop), but BullMQ (Redis-backed) is the de-facto Node task queue.

Common Bugs

Workers spinning when the heap head is in the future: wait on the condition for exactly ready_at - now instead of poll-looping.
Using notify on submit and notify_all on shutdown is fine, but a worker that saw head.ready_at > now and is waiting with a timeout must re-check the heap every time it wakes, so a task pushed during that wait with an earlier ready_at is not left waiting behind the old deadline.
Forgetting to bump e.seq on re-push. heapq compares entries field by field, and a unique seq is the tie-breaker that stops the comparison from ever reaching the un-comparable Callable. A fresh seq on every push also keeps ordering FIFO among entries with equal (ready_at, priority); reusing a stale seq lets a retried task jump ahead of tasks submitted after it. Always bump seq.
Catching Exception but letting other BaseException subclasses escape (e.g., SystemExit from a task that calls sys.exit()): the worker thread dies silently. Catch BaseException with care, or at minimum catch Exception and log unexpected escapes.
DLQ growing forever — see follow-up #9.
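The first two bugs are easiest to avoid with a wait loop that always recomputes its sleep from the current heap head. A minimal sketch under stated assumptions: entries are (ready_at, seq, item) tuples, DelayedHeap is a hypothetical stand-in for the scheduler’s heap plus condition variable, and push() uses notify() as the text describes.

```python
import heapq
import threading
import time
from typing import Any, List, Optional, Tuple


class DelayedHeap:
    """Min-heap of (ready_at, seq, item); pop() blocks until the head is ready."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._cond = threading.Condition()
        self._seq = 0  # unique tie-breaker: heapq never has to compare items

    def push(self, item: Any, delay: float = 0.0) -> None:
        with self._cond:
            self._seq += 1
            heapq.heappush(self._heap, (time.monotonic() + delay, self._seq, item))
            # Wake a waiter so it recomputes its sleep from the (possibly new) head.
            self._cond.notify()

    def pop(self, timeout: Optional[float] = None) -> Optional[Any]:
        """Return the next ready item, or None if timeout elapses first."""
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while True:
                now = time.monotonic()
                if self._heap and self._heap[0][0] <= now:
                    return heapq.heappop(self._heap)[2]
                # Sleep exactly until the head is ready (no fixed poll interval),
                # capped by the caller's deadline; a push() wakes us early.
                wait_for = self._heap[0][0] - now if self._heap else None
                if deadline is not None:
                    remaining = deadline - now
                    if remaining <= 0:
                        return None
                    wait_for = remaining if wait_for is None else min(wait_for, remaining)
                self._cond.wait(timeout=wait_for)
```

Because every wake-up (notify, timeout, or spurious) loops back to re-read the head, a single notify() is enough: whichever waiter wakes will see the new, earlier task.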

Debugging Strategy

Add a worker trace: every transition (pop, run-start, run-end, retry, dlq) gets a log line with (worker_id, task_id, ts). Replay the log to see the timeline. For “task didn’t run” bugs, walk the heap state at submit time and check that notify was called. For shutdown deadlocks, take a thread dump (Python: faulthandler.dump_traceback_later(timeout) to dump all threads after a delay, or faulthandler.dump_traceback() on demand); usually a worker is blocked on wait because notify_all was missed.

Mastery Criteria

Implemented in <40 minutes; <30 on second attempt.
All three tests pass.
Articulated visibility-timeout / at-least-once vs lost-on-crash tradeoff in <90 seconds.
Answered follow-ups #2, #8, #9, #12, #13 crisply.
Added at-least-once semantics in <15 minutes when prompted.
Stated why decorrelated jitter beats fixed jitter in retry backoff.

Lab 05 — Thread Pool

Goal

Implement a bounded ThreadPool with a fixed number of worker threads, a bounded work queue, configurable rejection policy, and graceful shutdown. After this lab you should be able to write a clean ThreadPoolExecutor clone in under 25 minutes and answer the standard concurrency follow-ups.

Background Concepts

A thread pool decouples task submission from task execution by introducing a queue of work items processed by N worker threads. The four design decisions are:

Pool sizing: fixed-size, dynamic (grow/shrink), or bounded with min/max?
Queue policy: bounded (block / reject / drop) or unbounded (memory risk)?
Rejection policy when the queue is full: throw, drop newest, drop oldest, or run-on-caller’s-thread?
Shutdown semantics: stop accepting and finish queue (shutdown), or stop accepting and abandon queue (shutdown_now)?

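The textbook design, and the one Java’s ThreadPoolExecutor lets you configure directly, is fixed-size pool + bounded blocking queue + an explicit rejection policy + graceful shutdown. Java’s default handler is AbortPolicy (throw), with CallerRunsPolicy as the common alternative that throttles producers. This is the answer the interviewer wants by default.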

Interview Context

Thread pool is a classic concurrency interview question. It tests whether you understand condition variables / blocking queues, can reason about producer-consumer with backpressure, and can structure shutdown so that submit after shutdown is rejected and in-flight tasks complete cleanly. Java candidates are expected to know that ThreadPoolExecutor’s seven-parameter constructor encodes most of these decisions.

Problem Statement

Implement ThreadPool(n_workers, queue_capacity, on_reject):

submit(fn) -> Future — enqueue. If queue full and pool not shut down, apply on_reject.
shutdown(wait=True, timeout=None) — stop accepting; if wait, drain the queue and join workers.
shutdown_now() -> list[Callable] — stop accepting; abandon queued tasks and return them.

A Future exposes .result(timeout) to retrieve the task’s return value or raise its exception.

Constraints

1 ≤ n_workers ≤ 1000
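0 ≤ queue_capacity ≤ 10^4 (0 = SynchronousQueue: hand off directly; note that Python’s queue.Queue treats maxsize=0 as unbounded, not as a hand-off, so the 0 case needs a dedicated implementation)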
Submission rate up to 10^5 / s

Clarifying Questions

Is the queue bounded? (Yes by default; “unbounded queue” is a known antipattern that masks production bugs.)
What happens on full queue? (Reject by default; offer caller-runs as alternative.)
Should workers be daemons? (Depends on language. In Python, daemon workers keep a forgotten shutdown() from hanging interpreter exit, at the cost of abandoning in-flight tasks at exit.)
Returns a Future? (Yes — async result is the standard contract.)
Re-entrancy: can a task submit more tasks? (Yes — must not deadlock on a full queue from inside a worker.)

Examples

pool = ThreadPool(n_workers=2, queue_capacity=5)
fut = pool.submit(lambda: 1 + 1)
fut.result()           -> 2
pool.shutdown(wait=True)
pool.submit(lambda: 1) -> raises RuntimeError (pool shut down)

Initial Brute Force

for fn in tasks: threading.Thread(target=fn).start(). No bound, no reuse, no result tracking. Each task pays the full thread-creation cost (tens of microseconds up to ~1 ms, depending on platform and runtime), and the OS can run out of threads at 10^4+ concurrent tasks.

Brute Force Complexity

Per task: O(thread creation), up to ≈ 1 ms in Python. Total: O(N · 1 ms); at N=10^5 this can reach 100 seconds, far too slow. Memory: O(N) thread stacks, each reserving up to 8 MB of virtual address space under a common Linux default.

Optimization Path

Pool the threads. Workers block on a shared queue. submit enqueues; when the queue is full it either blocks or rejects. Per-task overhead drops to microseconds (a queue push and pop). Memory is O(n_workers · stack_size + queue_capacity).

Final Expected Approach

A Queue(maxsize=queue_capacity) (Python’s queue.Queue is thread-safe and supports timeouts). N worker threads loop on q.get(), run the task, set its Future, repeat. A sentinel object posted N times, one per worker, signals shutdown. submit checks the shut-down flag, then either q.put_nowait (raise on full) or q.put (block on full); on full and not blocking, invoke on_reject.

Data Structures Used

Structure	Purpose
`queue.Queue(maxsize=…)`	producer-consumer with bounded blocking
`Future` (custom or `concurrent.futures.Future`)	result + exception delivery
Sentinel object `_SENTINEL` posted N times	shutdown signal
`_shutdown: bool` flag	reject post-shutdown submissions

Correctness Argument

Liveness: when a task is enqueued and at least one worker is idle, that worker will dequeue and run it. Provided by queue.Queue’s internal Condition (notify on put, wait on get).

Safety / no lost tasks: every put either succeeds (task will be dequeued by some worker) or is explicitly rejected. The shutdown protocol must ensure that no put succeeds after _shutdown=True, which requires checking the flag and enqueuing under the same lock; otherwise a task can land behind the sentinels and never run. When shutdown(wait=True) returns with timeout=None, the queue is empty and all workers have exited (proven by the sentinel pattern: each worker consumes exactly one sentinel and exits, so all N workers terminate).

Future correctness: the worker’s try/except block sets either set_result(value) or set_exception(ex). Future.result() blocks on a Condition until one of the two is set. Linearizable.

Complexity

submit: O(1) amortized
Worker step: O(1) plus task duration
Memory: O(n_workers + queue_capacity)

Implementation Requirements

import threading, queue
from typing import Callable, Any, Optional
from concurrent.futures import Future

_SENTINEL = object()

class RejectedExecutionError(RuntimeError):
    pass

class ThreadPool:
    def __init__(self, n_workers: int, queue_capacity: int = 1024,
                 on_reject: Optional[Callable] = None):
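        if n_workers <= 0:
            raise ValueError("n_workers must be positive")
        if queue_capacity <= 0:
            # queue.Queue treats maxsize <= 0 as *unbounded*, not as a direct hand-off.
            raise ValueError("queue_capacity must be >= 1")
        self._q: queue.Queue = queue.Queue(maxsize=queue_capacity)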
        self._workers: list[threading.Thread] = []
        self._shutdown = False
        self._lock = threading.Lock()
        self._on_reject = on_reject or self._default_reject
        for i in range(n_workers):
            t = threading.Thread(target=self._run, name=f"pool-w{i}", daemon=True)
            t.start()
            self._workers.append(t)

    @staticmethod
    def _default_reject(fn, *args, **kwargs):
        raise RejectedExecutionError("queue full")

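    def submit(self, fn: Callable, *args, **kwargs) -> Future:
        fut: Future = Future()
        with self._lock:
            # Check the flag and enqueue in one critical section, so no task can
            # land behind the shutdown sentinels and be silently lost.
            if self._shutdown:
                raise RejectedExecutionError("pool shut down")
            try:
                self._q.put_nowait((fn, args, kwargs, fut))
                return fut
            except queue.Full:
                pass
        # Queue full: apply the rejection policy outside the lock (caller-runs may be slow).
        try:
            result = self._on_reject(fn, *args, **kwargs)
        except Exception as ex:
            fut.set_exception(ex)
        else:
            # A policy that returns normally (e.g. caller-runs) resolves the Future
            # with its return value.
            fut.set_result(result)
        return fut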

    def shutdown(self, wait: bool = True, timeout: Optional[float] = None) -> None:
        with self._lock:
            if self._shutdown:
                return
            self._shutdown = True
        for _ in self._workers:
            self._q.put(_SENTINEL)
        if wait:
            for w in self._workers:
                w.join(timeout=timeout)

    def shutdown_now(self) -> list[Callable]:
        """Stop accepting; abandon queued tasks; return abandoned callables."""
        with self._lock:
            self._shutdown = True
        abandoned: list[Callable] = []
        try:
            while True:
                item = self._q.get_nowait()
                if item is _SENTINEL: continue
                fn, _, _, fut = item
                fut.set_exception(RejectedExecutionError("shutdown_now"))
                abandoned.append(fn)
        except queue.Empty:
            pass
        for _ in self._workers:
            self._q.put(_SENTINEL)
        return abandoned

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is _SENTINEL:
                return
            fn, args, kwargs, fut = item
            if not fut.set_running_or_notify_cancel():
                continue
            try:
                result = fn(*args, **kwargs)
            except BaseException as ex:
                fut.set_exception(ex)
            else:
                fut.set_result(result)


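# A useful policy: caller-runs (executes inline on the submitting thread when the
# queue is full, which naturally throttles the producer). Returns the task's result.
def caller_runs(fn, *args, **kwargs):
    return fn(*args, **kwargs)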

Tests

import unittest, time, threading

class TestPool(unittest.TestCase):
    def test_basic(self):
        pool = ThreadPool(n_workers=2, queue_capacity=10)
        futs = [pool.submit(lambda x=i: x * 2) for i in range(10)]
        results = [f.result(timeout=2.0) for f in futs]
        self.assertEqual(sorted(results), [0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
        pool.shutdown()

    def test_exception_propagated(self):
        pool = ThreadPool(n_workers=1, queue_capacity=10)
        f = pool.submit(lambda: 1 / 0)
        with self.assertRaises(ZeroDivisionError):
            f.result(timeout=2.0)
        pool.shutdown()

    def test_rejection_when_queue_full(self):
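        block = threading.Event()
        started = threading.Event()
        pool = ThreadPool(n_workers=1, queue_capacity=1)
        # Occupy the only worker, and wait until it has actually dequeued the task
        # so the next submit deterministically fills the queue.
        pool.submit(lambda: (started.set(), block.wait()))
        self.assertTrue(started.wait(timeout=1.0))
        pool.submit(lambda: None)           # fills the queue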
        f = pool.submit(lambda: None)       # rejected
        with self.assertRaises(RejectedExecutionError):
            f.result(timeout=1.0)
        block.set()
        pool.shutdown()

    def test_shutdown_rejects_new(self):
        pool = ThreadPool(n_workers=1)
        pool.shutdown()
        with self.assertRaises(RejectedExecutionError):
            pool.submit(lambda: None)

    def test_concurrent_submit(self):
        pool = ThreadPool(n_workers=8, queue_capacity=200)
        results = []
        lock = threading.Lock()
        def task(x):
            with lock: results.append(x)
        futs = [pool.submit(task, i) for i in range(200)]
        for f in futs: f.result(timeout=2.0)
        pool.shutdown()
        self.assertEqual(sorted(results), list(range(200)))

Follow-up Questions

(1) Thread-safe? Already designed for concurrency; the Queue handles producer-consumer atomicity. The _shutdown flag is read under a lock to avoid races between submit and shutdown.

(7) Backpressure? The bounded queue is the backpressure. Four policies: (a) put blocks the producer (default queue.put) — pushes backpressure to the caller. (b) reject (raise) — caller decides. (c) caller-runs — caller does the work; throttles naturally. (d) drop oldest — for non-critical telemetry-style work. Pick one explicitly per pool.

(12) Shutdown / draining? shutdown(wait=True) drains the queue (graceful). shutdown_now() abandons the queue and returns abandoned tasks (forceful). The graceful path is the production default; expose shutdown_now for SIGTERM after a deadline.

(8) Partial failure? A worker thread that crashes on an uncaught exception leaves the pool with N-1 workers permanently. Solutions: (a) catch BaseException around the task body (shown), (b) supervise — periodically check live worker count and respawn dead workers. The simplest production design catches and logs, never lets the worker die.

(13) Poison pill? A task that runs forever or consumes all memory blocks one worker permanently. Mitigations: per-task timeout (cooperative with a watchdog thread), memory accounting (rare in Python), or run untrusted tasks in subprocesses. Stating this awareness is the bar.
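One way to sketch the cooperative timeout from (13): wrap the task so a threading.Timer sets a cancellation event, and have the task poll it. with_cooperative_timeout is a hypothetical helper; it cannot stop a task that ignores the event, which is exactly why untrusted work belongs in a subprocess.

```python
import threading
from typing import Any, Callable


def with_cooperative_timeout(fn: Callable[[threading.Event], Any],
                             timeout: float) -> Callable[[], Any]:
    """Wrap fn(cancel_event) into a zero-argument callable suitable for pool.submit.

    A threading.Timer sets cancel_event after `timeout` seconds; fn is expected to
    check the event and return early. Nothing forcibly stops fn.
    """
    def wrapped() -> Any:
        cancel = threading.Event()
        watchdog = threading.Timer(timeout, cancel.set)
        watchdog.daemon = True
        watchdog.start()
        try:
            result = fn(cancel)
        finally:
            watchdog.cancel()  # no-op if the timer already fired
        if cancel.is_set():
            # A task that finishes just as the timer fires is also reported as timed out.
            raise TimeoutError(f"task exceeded {timeout}s")
        return result
    return wrapped
```

The wrapped callable can be passed straight to pool.submit; its Future then raises TimeoutError. One Timer thread per task is fine for a sketch; a single watchdog thread holding a heap of deadlines scales better.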

(11) Configuration knobs? n_workers (often CPU_count or 2 * CPU_count for I/O-bound); queue_capacity (rule of thumb: enough to absorb a ~1 second burst); on_reject policy. Knobs not to expose: queue type, worker thread name (auto-generate).
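Follow-ups (7) and (11) ask you to choose the full-queue behavior explicitly. This hypothetical offer() helper sketches the standard options over a plain queue.Queue: block, abort (the default on_reject above), caller-runs, discard, and discard-oldest (Java’s four handlers plus blocking). The string policy names and return labels are illustrative assumptions, not a library API.

```python
import queue
from typing import Any, Callable, Optional


class RejectedError(RuntimeError):
    """Raised by the 'abort' policy."""


def offer(q: queue.Queue, item: Any, policy: str,
          run_inline: Optional[Callable[[Any], None]] = None,
          block_timeout: Optional[float] = None) -> str:
    """Try to enqueue item on a bounded queue; apply `policy` if it is full.

    Returns a label describing the outcome (illustration only).
    """
    try:
        q.put_nowait(item)
        return "enqueued"
    except queue.Full:
        pass
    if policy == "block":
        # Backpressure onto the producer; raises queue.Full if block_timeout expires.
        q.put(item, timeout=block_timeout)
        return "enqueued"
    if policy == "abort":
        raise RejectedError("queue full")
    if policy == "caller_runs":
        if run_inline is None:
            raise ValueError("caller_runs needs run_inline")
        run_inline(item)  # the producer does the work itself, which slows it down
        return "ran_inline"
    if policy == "discard":
        return "discarded"  # drop the newest item silently
    if policy == "discard_oldest":
        try:
            q.get_nowait()  # evict the oldest queued item (its owner is never told)
        except queue.Empty:
            pass
        q.put_nowait(item)  # can still raise queue.Full if another producer raced us
        return "replaced_oldest"
    raise ValueError(f"unknown policy: {policy!r}")
```

In a real pool, the evicted item’s Future should be failed rather than dropped, or its caller waits forever.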

Product Extension

java.util.concurrent.ThreadPoolExecutor is the textbook reference. Python’s concurrent.futures.ThreadPoolExecutor is the canonical stdlib equivalent, but its internal work queue is unbounded and the constructor offers no way to cap it, so your bounded version is safer under overload. AWS Lambda’s worker runtime, NGINX worker processes, and most application servers use variants of this pattern.

Language/Runtime Follow-ups

Python: GIL serializes CPU work; ThreadPool is for I/O-bound tasks. Use ProcessPoolExecutor for CPU-bound. concurrent.futures.ThreadPoolExecutor ships with Python, but its work queue is unbounded and cannot be capped through the constructor, which is a footgun.
Java: new ThreadPoolExecutor(core, max, keepAlive, unit, queue, factory, rejectedHandler). Memorize the seven parameters and the four built-in RejectedExecutionHandler policies (Abort, CallerRuns, Discard, DiscardOldest).
Go: idiomatic Go does not use thread pools — goroutines are cheap. The pattern is a worker-pool of N goroutines reading from a channel of work items. The bounded channel is the bounded queue.
C++: std::thread per worker; std::condition_variable + std::queue for the work queue. Boost.Asio’s thread pool is production-ready.
JS/TS: single event loop; use worker_threads for CPU work. Libraries: piscina (worker pool for Node).

Common Bugs

Unbounded queue: Queue() without maxsize — masks production overload as memory growth.
Daemons vs non-daemons: in Python, daemon workers die abruptly on interpreter exit, abandoning in-flight tasks. Non-daemons require explicit shutdown or the program hangs. Pick deliberately.
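Catching Exception but not BaseException lets SystemExit (e.g., a task calling sys.exit()) end the worker thread silently; KeyboardInterrupt is delivered to the main thread, not to workers. Catch BaseException and keep the worker alive.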
submit after shutdown race: check the flag under the lock and put under the same critical section, or accept that a few enqueues may sneak in between check and put (and handle them in the worker by checking the flag before running).
Forgetting to set_running_or_notify_cancel() on the Future — cancelled futures still get run.

Debugging Strategy

For deadlocks, take a thread dump: in Python, import faulthandler; faulthandler.dump_traceback_later(5) then trigger the hang. For lost tasks, instrument every put/get/set_result/set_exception with a sequence number and replay. For worker death, log every worker exit with its reason.

Mastery Criteria

Implemented ThreadPool in <30 minutes; tests pass first run.
Articulated bounded-queue + rejection-policy design without prompting.
Listed the four standard rejection policies (abort / caller-runs / discard / discard-oldest).
Answered follow-ups #7, #8, #12, #13 in <90 seconds each.
Stated when to use ThreadPool vs ProcessPool in Python in <30 seconds.
Refactored to add per-task timeout in <10 minutes when prompted.

Lab 06 — Durable Job Queue

Goal

Implement a job queue with at-least-once delivery semantics, idempotency keys, and visibility timeouts. After this lab you should be able to articulate why exactly-once delivery is impractical, design at-least-once with idempotency on the consumer, and implement an in-memory queue that simulates SQS-style semantics in under 35 minutes.

Background Concepts

A durable job queue accepts jobs from producers and delivers them to consumers, surviving consumer crashes without losing work. The three classical delivery semantics:

At-most-once: deliver, forget. Fast but loses jobs on crash. Acceptable for fire-and-forget telemetry.
At-least-once: deliver, retry until acknowledged. Jobs may be delivered multiple times. Requires consumers to be idempotent.
Exactly-once: impossible in a distributed system without two-phase commit between queue and consumer. The “exactly-once” branding in real systems (Kafka, Pulsar) means exactly-once processing inside the broker’s boundary, built from idempotent writes and transactions; any side effect outside that boundary (sending an email, charging a card) is still at-least-once + idempotency.

The standard primitive that enables at-least-once is the visibility timeout: when a consumer dequeues a job, the queue marks it “in-flight” with a TTL. If the consumer acks before TTL, the job is deleted. If TTL expires (consumer crashed), the job becomes visible again and is redelivered. The consumer must be idempotent because the same job may be processed twice if ack was lost in transit.

Interview Context

This is the central problem in any interview at AWS (SQS), GCP (Pub/Sub), Confluent (Kafka), or any infrastructure team. The interviewer wants to hear: “exactly-once is impractical because of the two-generals problem; at-least-once with idempotency keys is the production answer; visibility timeout is how we implement it.” Then they want to see you build a small version that demonstrates understanding.

Problem Statement

Design JobQueue:

enqueue(payload, idempotency_key=None) -> job_id — push. If idempotency_key is non-None and matches a recent job, deduplicate.
dequeue(visibility_timeout=30.0) -> Job | None — pop a visible job; mark in-flight with TTL.
ack(job_id) — confirm successful processing; permanently delete.
nack(job_id, requeue_delay=0) — release back; optionally with a delay.
Background scavenger: jobs whose visibility TTL has expired return to visible state.

Constraints

Up to 10^5 in-flight jobs
10^4 enqueues / second
Single-process in-memory; persistence is a follow-up

Clarifying Questions

FIFO or best-effort ordering? (Best-effort is standard SQS; FIFO costs more.)
Visibility timeout: per-call or queue default? (Per-call, with queue default.)
Idempotency key TTL — how long do we dedupe? (Typically 5 minutes; configurable.)
Max retries before DLQ? (Often a follow-up; default unlimited.)
What happens on nack? (Requeue, optionally with a delay; this is the natural retry path.)

Examples

q = JobQueue()
job_id = q.enqueue("send-email-123", idempotency_key="email-abc")
q.enqueue("send-email-123", idempotency_key="email-abc")  # dedup; same job_id

job = q.dequeue(visibility_timeout=10)
# … process …
q.ack(job.job_id)  # done

# Crash scenario:
job = q.dequeue(visibility_timeout=10)
# consumer crashes, never acks
# 11 seconds later:
job_again = q.dequeue()    # same payload, same job_id, delivery_count=2

Initial Brute Force

A list of jobs and a single mutex. dequeue pops the head, sets in-flight; ack removes; nack re-prepends. No visibility timeout. Loses jobs on crash.

Brute Force Complexity

O(1) per op for a deque, O(N) if implemented over a list with re-prepend. Fundamentally wrong for at-least-once because there’s no scavenger.

Optimization Path

Add: (a) visible: deque[Job] — jobs ready to be dequeued; (b) in_flight: dict[job_id, (Job, expires_at)] — taken but not acked; (c) idempotency: dict[key, job_id] with TTL for dedup; (d) a background scavenger that moves expired in-flight jobs back to visible. The scavenger can be lazy (check on each dequeue) instead of a dedicated thread.

Final Expected Approach

Use deque for visible, dict for in-flight with (job, expires_at), dict for idempotency cache. Each dequeue first sweeps in_flight for expired entries (move them to the front of visible). All operations under a single lock — the queue is fast, sub-millisecond critical sections.

Data Structures Used

Structure	Purpose
`deque[Job]`	visible jobs, FIFO best-effort
`dict[job_id, (Job, expires_at)]`	in-flight tracking
`dict[idempotency_key, job_id]`	dedup cache
`Lock`	single-process atomicity

Correctness Argument

No lost jobs (at-least-once): every job is in exactly one of three states: visible, in_flight, or acked (deleted). Transitions: enqueue → visible; dequeue → in_flight; ack → deleted; nack → visible; scavenger → visible (from in_flight on TTL expiry). No transition discards a job before ack. Therefore, until acked, every job remains in the system and will eventually be redelivered.

Idempotency dedup: if idempotency_key matches a job in either visible or in_flight (or recently acked, within the dedup window), enqueue returns the existing job_id without creating a new job. This makes producer retries safe.

At-least-once, not exactly-once: a consumer that successfully processes the job and crashes before sending ack will see the same job redelivered. The consumer must therefore make its side effects idempotent (e.g., the email service dedupes sends by the key email-abc).

Complexity

enqueue: O(1)
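dequeue: O(F) as written, where F is the number of in-flight entries, because the lazy sweep scans all of them; indexing in-flight jobs in a min-heap keyed by expires_at reduces the sweep to O(K log F) for K expired entries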
ack / nack: O(1)
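A sketch of that heap-based sweep, assuming job IDs are ints and expiry times come from time.monotonic(). ExpiryIndex is a hypothetical helper: set() records a deadline, remove() on ack is O(1) because stale heap entries are skipped lazily, and pop_expired() touches only entries that have actually expired (plus stale ones).

```python
import heapq
from typing import Dict, List, Tuple


class ExpiryIndex:
    """Tracks job_id -> expires_at; returns expired ids via a min-heap with lazy deletion."""

    def __init__(self) -> None:
        self._deadline: Dict[int, float] = {}     # authoritative expiry per in-flight job
        self._heap: List[Tuple[float, int]] = []  # may also hold stale entries

    def set(self, job_id: int, expires_at: float) -> None:
        self._deadline[job_id] = expires_at
        heapq.heappush(self._heap, (expires_at, job_id))

    def remove(self, job_id: int) -> None:
        # O(1): leave the heap entry in place; pop_expired() skips it later.
        self._deadline.pop(job_id, None)

    def pop_expired(self, now: float) -> List[int]:
        expired: List[int] = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, job_id = heapq.heappop(self._heap)
            # Skip stale entries: acked/nacked jobs, or jobs whose deadline was reset.
            if self._deadline.get(job_id) == expires_at:
                del self._deadline[job_id]
                expired.append(job_id)
        return expired

    def __len__(self) -> int:
        return len(self._deadline)
```

Each heap entry is popped at most once, so the sweep is amortized O(log F) per job; the trade-off is that stale entries occupy memory until their old deadline passes.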

Implementation Requirements

import threading, time, itertools
from collections import deque
from dataclasses import dataclass
from typing import Optional, Any

@dataclass
class Job:
    job_id: int
    payload: Any
    delivery_count: int
    enqueued_at: float

class JobQueue:
    def __init__(self, default_visibility_timeout: float = 30.0,
                 idempotency_ttl: float = 300.0):
        self._visible: deque[Job] = deque()
        self._in_flight: dict[int, tuple[Job, float]] = {}     # id -> (job, expires_at)
        self._idem: dict[str, tuple[int, float]] = {}          # key -> (job_id, expires_at)
        self._default_vt = default_visibility_timeout
        self._idem_ttl = idempotency_ttl
        self._lock = threading.Lock()
        self._next_id = itertools.count(1)

    def enqueue(self, payload: Any, idempotency_key: Optional[str] = None) -> int:
        now = time.monotonic()
        with self._lock:
            self._sweep_idem(now)
            if idempotency_key is not None:
                existing = self._idem.get(idempotency_key)
                # Honor the dedup window: an expired entry no longer deduplicates.
                if existing is not None and existing[1] > now:
                    return existing[0]
            job_id = next(self._next_id)
            job = Job(job_id, payload, delivery_count=0, enqueued_at=now)
            self._visible.append(job)
            if idempotency_key is not None:
                self._idem[idempotency_key] = (job_id, now + self._idem_ttl)
            return job_id

    def dequeue(self, visibility_timeout: Optional[float] = None) -> Optional[Job]:
        vt = visibility_timeout if visibility_timeout is not None else self._default_vt
        now = time.monotonic()
        with self._lock:
            self._sweep_in_flight(now)
            if not self._visible:
                return None
            job = self._visible.popleft()
            job.delivery_count += 1
            self._in_flight[job.job_id] = (job, now + vt)
            return job

    def ack(self, job_id: int) -> bool:
        with self._lock:
            return self._in_flight.pop(job_id, None) is not None

    def nack(self, job_id: int, requeue_delay: float = 0.0) -> bool:
        now = time.monotonic()
        with self._lock:
            entry = self._in_flight.pop(job_id, None)
            if entry is None:
                return False
            job, _ = entry
            if requeue_delay > 0:
                # For simplicity, treat delay as a delayed visibility:
                # park as in-flight with expires_at = now + delay.
                self._in_flight[job_id] = (job, now + requeue_delay)
            else:
                self._visible.appendleft(job)   # head, so it's seen first
            return True

    def stats(self) -> dict:
        with self._lock:
            return {
                "visible": len(self._visible),
                "in_flight": len(self._in_flight),
                "dedup_keys": len(self._idem),
            }

    def _sweep_in_flight(self, now: float) -> None:
        # Scans every in-flight entry, so each dequeue costs O(len(self._in_flight)).
        expired = [(jid, j) for jid, (j, t) in self._in_flight.items() if t <= now]
        for jid, j in expired:
            del self._in_flight[jid]
            self._visible.appendleft(j)        # redelivery: front-load

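    _idem_sweep_at = 4096   # class-level default; each instance raises its own copy

    def _sweep_idem(self, now: float) -> None:
        # Rebuild only once the map has doubled since the last rebuild, so the
        # O(N) pass costs amortized O(1) per enqueue even when most keys are live.
        if len(self._idem) > self._idem_sweep_at:
            self._idem = {k: v for k, v in self._idem.items() if v[1] > now}
            self._idem_sweep_at = max(4096, 2 * len(self._idem))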

Tests

import unittest, time

class TestJobQueue(unittest.TestCase):
    def test_basic_enqueue_ack(self):
        q = JobQueue()
        jid = q.enqueue("hello")
        job = q.dequeue()
        self.assertEqual(job.job_id, jid)
        self.assertTrue(q.ack(jid))
        self.assertIsNone(q.dequeue())

    def test_idempotency_dedup(self):
        q = JobQueue()
        a = q.enqueue("x", idempotency_key="k1")
        b = q.enqueue("x", idempotency_key="k1")
        self.assertEqual(a, b)
        self.assertEqual(q.stats()["visible"], 1)

    def test_visibility_timeout_redelivery(self):
        q = JobQueue(default_visibility_timeout=0.1)
        jid = q.enqueue("x")
        job1 = q.dequeue()
        time.sleep(0.15)
        job2 = q.dequeue()
        self.assertEqual(job1.job_id, job2.job_id)
        self.assertEqual(job2.delivery_count, 2)
        q.ack(job2.job_id)

    def test_nack_requeue(self):
        q = JobQueue()
        jid = q.enqueue("x")
        job = q.dequeue()
        q.nack(jid)
        job2 = q.dequeue()
        self.assertEqual(job.job_id, job2.job_id)
        self.assertEqual(job2.delivery_count, 2)

Follow-up Questions

(2) Persist state across restarts? Three layers: (a) WAL: every enqueue, ack, nack is appended to a log; on boot, replay. (b) Snapshot: periodic full state dump. (c) Combined: snapshot every N seconds, WAL between snapshots; recovery = latest snapshot + log replay since. SQS uses a replicated multi-AZ store; for an interview, WAL is the right answer.
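A minimal WAL sketch for this queue, assuming JSON-lines records and JSON-serializable payloads. JobWAL and replay are hypothetical names. nack is not logged because it doesn’t change which jobs are still owed; a fuller version would also log idempotency keys (so dedup survives a restart) and compact the log against snapshots.

```python
import io
import json
import os
from typing import IO, Any, Dict


class JobWAL:
    """Append-only log of queue mutations, one JSON object per line."""

    def __init__(self, f: IO[str], fsync: bool = False) -> None:
        self._f = f
        self._fsync = fsync

    def _append(self, record: Dict[str, Any]) -> None:
        self._f.write(json.dumps(record) + "\n")
        self._f.flush()
        if self._fsync:
            # Survive power loss, at the cost of a disk round-trip per write.
            os.fsync(self._f.fileno())

    def log_enqueue(self, job_id: int, payload: Any) -> None:
        self._append({"op": "enqueue", "id": job_id, "payload": payload})

    def log_ack(self, job_id: int) -> None:
        self._append({"op": "ack", "id": job_id})


def replay(f: IO[str]) -> Dict[int, Any]:
    """Rebuild the un-acked jobs (job_id -> payload) from the log.

    In-flight state is deliberately not restored: every un-acked job comes back
    visible and is redelivered, which is exactly at-least-once.
    """
    pending: Dict[int, Any] = {}
    for line in f:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            break  # torn final write from a crash: ignore it and everything after
        if rec["op"] == "enqueue":
            pending[rec["id"]] = rec["payload"]
        elif rec["op"] == "ack":
            pending.pop(rec["id"], None)
    return pending


# Usage with an in-memory "file"; a real queue would open an append-mode file.
buf = io.StringIO()
wal = JobWAL(buf)
wal.log_enqueue(1, "send-email-123")
wal.log_enqueue(2, "resize-image-9")
wal.log_ack(1)
buf.seek(0)
recovered = replay(buf)  # only job 2 is still pending
```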

(8) Partial failure? That’s the entire point of visibility timeout. Consumer crashes mid-processing → TTL expires → job redelivered. The consumer is responsible for idempotency. The queue is responsible for delivering at-least-once.

(9) Eviction / cleanup? Stale in-flight entries (consumer crashed and never acked) are swept on every dequeue. The idempotency cache TTL bounds dedup memory. DLQ (not implemented above) would catch jobs after N redeliveries — a follow-up to add.
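A DLQ hook can sit in front of dequeue. This sketch assumes a max_deliveries limit (the max_redeliveries knob in (11)) and uses a stripped-down, hypothetical Job dataclass and next_deliverable function rather than the JobQueue class above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List, Optional


@dataclass
class Job:
    job_id: int
    payload: Any
    delivery_count: int = 0


def next_deliverable(visible: Deque[Job], dlq: List[Job],
                     max_deliveries: int) -> Optional[Job]:
    """Pop the next job to deliver, diverting exhausted jobs to the DLQ.

    A job that has already been delivered max_deliveries times is moved to `dlq`
    instead of being handed to a consumer again.
    """
    while visible:
        job = visible.popleft()
        if job.delivery_count >= max_deliveries:
            dlq.append(job)  # likely a poison pill: stop redelivering, alert on it
            continue
        job.delivery_count += 1
        return job
    return None
```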

(10) Consistency model? Linearizable per-job in a single process; redelivery breaks “exactly-once” but preserves “every job is processed at least once”. Replicated: consensus (Raft) for the metadata, leader-based delivery, replicated log for durability.

(11) Configuration knobs? default_visibility_timeout, idempotency_ttl, max_redeliveries (DLQ trigger), dlq_handler. Knobs not to expose: internal sweep cadence.

(12) Shutdown? On shutdown, refuse new enqueues, move in-flight jobs back to visible (their time.monotonic() expiry timestamps are meaningless after a restart, so they must be made deliverable again), persist state, exit. The graceful invariant: no in-flight at shutdown time.

Product Extension

This is a simplified SQS / Cloud Pub/Sub / Azure Service Bus. Real systems add: replication for durability (across hosts/AZs), partitioning for throughput (multiple shards), DLQ as a separate queue with its own retention, FIFO ordering as an opt-in higher-cost mode, and ordering keys (per-key FIFO with cross-key parallelism — Kafka’s model). Kafka explicitly doesn’t have visibility timeouts; it uses offset-based delivery with consumer-managed checkpoints, which is a different design point.

Language/Runtime Follow-ups

Python: this implementation. For high-throughput, sharded queues with per-shard locks scale better than the single global lock.
Java: ArrayBlockingQueue is too simple (no visibility timeout). The right reference is java.util.concurrent.DelayQueue for visibility, plus a ConcurrentHashMap for in-flight tracking. Production: ActiveMQ, RabbitMQ.
Go: a channel-based implementation works for visible queue; in-flight is a sync.Map; sweeper is a goroutine. NATS JetStream is the production-grade Go choice.
C++: roll-your-own with std::deque + std::unordered_map + std::mutex. Boost has thread-safe queue templates.
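JS/TS: BullMQ (Redis-backed) is the de-facto Node choice; its analogue of a visibility timeout is a per-job lock that the worker keeps renewing, with stalled-job detection returning jobs whose lock expired to the wait list. Delayed jobs are kept in a Redis sorted set.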

Common Bugs

Idempotency cache that never expires — memory leak.
Sweeping in-flight in a separate thread without coordinating the lock — races with dequeue. Lazy sweep on dequeue (as shown) avoids the extra thread.
Forgetting to increment delivery_count on redelivery — alerting can’t detect poison-pill jobs (jobs that always crash consumers).
nack with delay implemented by sleeping — blocks the consumer that called nack instead of just delaying re-visibility.
Treating idempotency dedup as global / forever — if the dedup window is too long, retries after intentional re-submission are silently dropped.

Debugging Strategy

Print stats() periodically to track visible / in-flight counts. A growing in-flight count without acks → consumers are crashing or hanging. Stuck visible count → no consumers are running. For “duplicate processing” complaints, capture the redelivery-count distribution; high tail = consumers crashing or visibility timeout too short.

Mastery Criteria

Implemented in <40 minutes from blank screen.
Stated “exactly-once is impractical; at-least-once + idempotency is the answer” without prompting.
All four tests pass first run.
Articulated visibility timeout, idempotency keys, and DLQ design in <90 seconds.
Answered follow-ups #2, #8, #10, #11, #12.
Compared SQS-style (visibility timeout) vs Kafka-style (offset-based) delivery in <60 seconds.

Lab 07 — Autocomplete

Goal

Implement an autocomplete service that returns the top-K suggestions for any prefix in sub-millisecond time. Use a trie augmented with per-node top-K caches, and support weighted suggestions (popularity-ranked). After this lab you should be able to design and implement the data structure in under 30 minutes and answer follow-ups about scale, freshness, and personalization.

Background Concepts

A trie (prefix tree) maps prefixes to a set of completions in O(prefix length) time. The naive design — at query time, walk to the prefix node and DFS to gather all descendants, then sort by weight — is correct but slow if the prefix matches many descendants. The production trick is to precompute and cache the top-K at each node during insert; query then becomes O(prefix-length + K).

The cache update is the subtle part. When add(word, weight) is called, walk the trie down the word’s path; at each node, merge the new word into the node’s top_k (a sorted list or small heap) and discard anything past K. Only the nodes on that path need updating: they are exactly the prefixes of word, so no other node’s set of completions changes.

Interview Context

Autocomplete is a top-10 question at search-heavy companies (Google search, Amazon product search, LinkedIn, Yelp). The bar is: trie + per-node top-K cache + ability to answer follow-ups about distributing the index, refreshing weights, and personalizing.

Problem Statement

Design Autocomplete(K):

add(word, weight) — insert with weight; subsequent adds of the same word update its weight (additive or replace, your choice — pick one).
suggest(prefix) -> list[str] — return top-K words by weight starting with prefix. Sub-millisecond average.

Constraints

Up to 10^6 words
10^5 queries / second
Average word length 10–30
K ≤ 10

Clarifying Questions

Weight semantics: replace or additive? (Pick one; “additive” matches “popularity”.)
Case sensitivity? (Default case-insensitive; lowercase on insert.)
Tie-breaking on equal weight? (Lexicographic.)
Real-time updates required, or build-once? (Both supported; weights mutable.)
K provided per-query or fixed? (Fixed at construction simplifies caching; per-query is a follow-up.)

Examples

ac = Autocomplete(k=3)
ac.add("apple", 5); ac.add("app", 10); ac.add("apply", 3); ac.add("apricot", 1)
ac.suggest("ap")    -> ["app", "apple", "apply"]
ac.suggest("app")   -> ["app", "apple", "apply"]
ac.suggest("apr")   -> ["apricot"]
ac.suggest("z")     -> []

Initial Brute Force

Store words in a dict[word, weight]. On suggest, scan all words, filter by startswith(prefix), sort the matches by weight, and return the top K. That is O(N · prefix-length) to filter plus O(M log M) to sort the M matches, per query.
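A sketch of that baseline, assuming the dict[word, weight] layout described above; `brute_suggest` is a hypothetical helper name. It uses `heapq.nsmallest` instead of a full sort, and it doubles as a test oracle for the trie version, since both should return the same lists.

```python
import heapq

def brute_suggest(weights: dict[str, int], prefix: str, k: int) -> list[str]:
    """Scan every word: O(N · len(prefix)) to filter, O(M log k) to pick the top k of M matches."""
    matches = ((w, word) for word, w in weights.items() if word.startswith(prefix))
    # Highest weight first; ties broken lexicographically.
    best = heapq.nsmallest(k, matches, key=lambda p: (-p[0], p[1]))
    return [word for _, word in best]

weights = {"apple": 5, "app": 10, "apply": 3, "apricot": 1}
print(brute_suggest(weights, "ap", 3))   # ['app', 'apple', 'apply']
```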

Brute Force Complexity

Per suggest: O(N · L) where N = number of words, L = average length. At N=10^6 and 10^5 QPS, this is 10^11 operations / second — far too slow.

Optimization Path

A trie reduces the descend-to-prefix step to O(L). The remaining work — gathering the top-K descendants — is what the per-node top_k cache eliminates. Insert becomes O(L · K log K) (we update the cache at L nodes, each O(K log K)); query becomes O(L + K).

Final Expected Approach

Trie node has children: dict[char, Node] and top_k: list[(weight, word)]. add walks down the path; at each node, runs a small merge to maintain top-K. suggest walks to the prefix node and returns its top_k.

Data Structures Used

Structure	Purpose
Trie nodes	prefix indexing
Per-node `top_k: list[(weight, word)]`	precomputed answers
`dict[word, weight]`	dedup + weight tracking

Correctness Argument

Invariant: at any node N, top_k(N) is the top-K (by weight, ties broken lexicographically) of { (weight, word) : word starts with prefix(N) }. After every add(word, weight), we visit exactly the nodes on word’s path. At each visited node, we either (a) update the (word, weight) entry if word is already in top_k, or (b) insert and trim to K. No node off the path can change, because word does not start with that node’s prefix. The argument relies on weights only growing (additive, non-negative deltas): a word whose weight rises can only move up, and every other word’s weight is unchanged.

Edge case: weight updates that decrease a word’s standing. If word was in a node’s top-K and its new weight kicks it out, its replacement is a word that was never cached at that node, so the node’s top-K must be recomputed from a wider candidate set. The heavier approach keeps a buffer several times wider than K at each node, so most decreases are absorbed without touching the subtree; the buffer itself must still be refilled by a rebuild when decreases push too many of its entries out. The simpler approach: on weight decrease, do a DFS to rebuild top_k at every node on the word’s path. Document the choice.
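The simpler rebuild can be sketched as follows. It is a variant of the DFS described above: rather than walking each node’s whole subtree, it recomputes every node on the word’s path from that node’s own word plus its children’s cached top-K, deepest node first. That is sufficient because any word in a subtree’s top-K is also in the top-K of the child subtree that contains it, and children off the path are unaffected by the change. `Node.word` (a marker for where a word ends) and `set_weight` are assumptions of this sketch, not part of the lab’s `_Node`/`Autocomplete` code. Cost is roughly O(L · fan-out · K) per update instead of a subtree-sized DFS.

```python
import heapq
from typing import Optional

class Node:
    __slots__ = ("children", "top_k", "word")

    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.top_k: list[tuple[int, str]] = []   # at most k entries, sorted by (-weight, word)
        self.word: Optional[str] = None          # assumed field: set on the node where a word ends

def set_weight(root: Node, weights: dict[str, int], word: str, weight: int, k: int) -> None:
    """Set word's weight (up or down) and rebuild the caches on its path, deepest first."""
    weights[word] = weight
    path = [root]
    for ch in word:
        path.append(path[-1].children.setdefault(ch, Node()))
    path[-1].word = word
    for node in reversed(path):
        # A subtree's top-k lies within the union of its own word and each child's top-k;
        # on-path children were already rebuilt in earlier iterations.
        candidates = [(weights[node.word], node.word)] if node.word is not None else []
        for child in node.children.values():
            candidates.extend(child.top_k)
        node.top_k = heapq.nsmallest(k, candidates, key=lambda p: (-p[0], p[1]))

root, weights = Node(), {}
for w, n in [("app", 10), ("apple", 5), ("apply", 3), ("apricot", 1)]:
    set_weight(root, weights, w, n, k=3)
set_weight(root, weights, "app", 0, k=3)   # decrease: "app" drops out of the root's top 3
print([w for _, w in root.top_k])          # ['apple', 'apply', 'apricot']
```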

Complexity

add: O(L · K log K)
suggest: O(L + K)
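Space: O(N · L) cache entries worst case, since a word can appear only in the top_k lists of its own L + 1 prefix nodes (and each list holds at most K); in practice much less, because near the root a word is cached only if it ranks in the top K there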

Implementation Requirements

class _Node:
    __slots__ = ("children", "top_k")
    def __init__(self):
        self.children: dict[str, _Node] = {}
        # At most K (weight, word) pairs, kept sorted by (-weight, word).
        self.top_k: list[tuple[int, str]] = []

class Autocomplete:
    def __init__(self, k: int = 5):
        self._root = _Node()
        self._weights: dict[str, int] = {}   # word -> accumulated weight
        self._k = k

    def add(self, word: str, weight_delta: int = 1) -> None:
        # Additive weights. The incremental cache update is only correct while
        # weights never decrease, so reject negative deltas (see Correctness Argument).
        if weight_delta < 0:
            raise ValueError("weight decreases require a top-K rebuild")
        word = word.lower()
        new_weight = self._weights.get(word, 0) + weight_delta
        self._weights[word] = new_weight
        node = self._root
        nodes_on_path: list[_Node] = [node]      # root = empty prefix
        for ch in word:
            node = node.children.setdefault(ch, _Node())
            nodes_on_path.append(node)
        for n in nodes_on_path:
            self._upsert_top_k(n, word, new_weight)

    def suggest(self, prefix: str) -> list[str]:
        prefix = prefix.lower()
        node = self._root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # top_k is already sorted by (-weight, word) and trimmed to K.
        return [w for _, w in node.top_k]

    def _upsert_top_k(self, node: _Node, word: str, weight: int) -> None:
        for i, (_, existing) in enumerate(node.top_k):
            if existing == word:
                node.top_k[i] = (weight, word)   # already cached: refresh its weight
                break
        else:
            node.top_k.append((weight, word))    # new candidate for this prefix
        node.top_k.sort(key=lambda p: (-p[0], p[1]))
        del node.top_k[self._k:]                 # keep only the top K

Tests

import unittest

class TestAutocomplete(unittest.TestCase):
    def test_basic(self):
        ac = Autocomplete(k=3)
        for w, n in [("app", 10), ("apple", 5), ("apply", 3), ("apricot", 1)]:
            ac.add(w, n)
        self.assertEqual(ac.suggest("ap"), ["app", "apple", "apply"])
        self.assertEqual(ac.suggest("app"), ["app", "apple", "apply"])
        self.assertEqual(ac.suggest("apr"), ["apricot"])
        self.assertEqual(ac.suggest("z"), [])

    def test_weight_update(self):
        ac = Autocomplete(k=3)
        ac.add("a", 1); ac.add("b", 2); ac.add("c", 3)
        ac.add("a", 10)                        # a now weight 11
        # Suggest off the empty prefix
        self.assertEqual(ac.suggest(""), ["a", "c", "b"])

    def test_top_k_truncation(self):
        ac = Autocomplete(k=2)
        for c, w in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
            ac.add(c, w)
        self.assertEqual(ac.suggest(""), ["d", "c"])

    def test_lex_tie(self):
        ac = Autocomplete(k=3)
        ac.add("banana", 5); ac.add("apple", 5); ac.add("cherry", 5)
        self.assertEqual(ac.suggest(""), ["apple", "banana", "cherry"])

Follow-up Questions

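(3) Scale to N nodes? Shard by first character (A–Z) → 26 shards, each with its own trie; expect skew, since far more English words start with 's' than with 'x'. For more even distribution, shard by a stable hash of prefix[:2]. A query whose prefix has at least two characters then routes to exactly one shard; shorter prefixes, including the empty prefix, must fan out to every shard and merge the partial top-K lists. That fan-out is the price you pay for sharding by prefix.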
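A sketch of the routing and merge steps, assuming each word is stored on shard crc32(word[:2]) % n. The hash is `zlib.crc32` rather than Python’s built-in `hash()`, because `hash()` of a `str` is salted per process and different routers would disagree. `route` and `merge_top_k` are hypothetical names.

```python
import heapq
import zlib

def route(prefix: str, n_shards: int) -> list[int]:
    """Shard ids a query must visit when words are sharded on their first two characters."""
    if len(prefix) >= 2:
        # zlib.crc32 is stable across processes; str hash() is salted per process.
        return [zlib.crc32(prefix[:2].encode("utf-8")) % n_shards]
    return list(range(n_shards))   # 0- or 1-char prefixes can match words on any shard

def merge_top_k(shard_results: list[list[tuple[int, str]]], k: int) -> list[str]:
    """Merge per-shard (weight, word) top-K lists into the global top-K."""
    entries = (e for result in shard_results for e in result)
    best = heapq.nsmallest(k, entries, key=lambda p: (-p[0], p[1]))
    return [word for _, word in best]
```

Merging only the shards’ top-K lists is enough: a word in the global top K must be in the top K of the shard that holds it.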

(4) Observe / monitor? Per-prefix-length latency histogram (short prefixes = many candidates, slow); cache hit-rate (proportion of queries hitting precomputed top-K vs needing DFS); query volume per prefix (top hot prefixes). Alert on p99 latency.

(9) Eviction / cleanup? Words may go cold (a celebrity who stopped trending). Strategy: timestamped weight, decay on a schedule (multiply all weights by 0.95 daily), delete when below a threshold. Or use a separate “deletion” path: remove(word) walks the trie, removes from each node’s top_k, and rebuilds top_k from a wider candidate set if the removed word was in it.
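The scheduled-decay arithmetic, as a sketch with the 0.95 daily factor from above and an assumed eviction threshold of 1.0: an idle word with weight w is evicted after the smallest d such that w · 0.95^d < threshold. Because every weight is multiplied by the same factor, relative order is preserved; the cached (weight, word) entries in each node’s top_k must still be scaled by the same factor, or later additions will be compared against undecayed cached weights. `decay` and `days_until_evicted` are hypothetical helpers.

```python
import math

def decay(weights: dict[str, float], factor: float = 0.95, threshold: float = 1.0) -> list[str]:
    """One scheduled decay pass; returns the words that fell below the threshold."""
    evicted = []
    for word in list(weights):          # copy the keys: we delete while iterating
        weights[word] *= factor
        if weights[word] < threshold:
            del weights[word]
            evicted.append(word)        # each of these still needs remove(word) on the trie
    return evicted

def days_until_evicted(weight: float, factor: float = 0.95, threshold: float = 1.0) -> int:
    """Smallest d with weight * factor**d < threshold, for an idle word (0 < factor < 1)."""
    return math.floor(math.log(threshold / weight) / math.log(factor)) + 1

print(days_until_evicted(100.0))   # 90: 100 * 0.95**90 ≈ 0.99 < 1, but 100 * 0.95**89 ≈ 1.04
```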

(11) Configuration knobs? K, case_sensitivity, weight_decay_factor. Knobs not to expose: trie node layout, sort algorithm.

(8) Partial failure? A query that hits a partially-rebuilt index during an add could see stale top-K. Solutions: (a) atomic per-node update (under a per-node lock), (b) versioned snapshot (queries read a stable version while writes go to a shadow), (c) accept stale results for ~1 second. For autocomplete, eventual consistency is fine.

Product Extension

Google’s autocomplete is far more than a trie: it’s a personalized, context-aware system with learned ranking. The trie + top-K is the index layer; on top sits a ranker that combines popularity, personalization (your history), context (location, time), and freshness (trending). Production systems also add typo tolerance (fuzzy matching within Levenshtein distance ≤ 2), a much harder problem commonly approached with Levenshtein automata over FSTs, n-gram inverted indexes, or BK-trees.

Language/Runtime Follow-ups

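Python: a dict-based trie is the simplest; for memory, switch to arrays of 26/128 children once you optimize. The implementation above is fine for 10^6 words; beyond that, consider a DAWG (a DAG that shares common suffix subtrees), noting that a shared node sits under many different prefixes and so cannot hold a single per-node top-K cache.
Java: HashMap<Character, Node> per node, or arrays for ASCII. Apache Commons Collections has PatriciaTrie (org.apache.commons.collections4.trie).
Go: map[rune]*Node per node works, but 10^6 words means millions of small, pointer-holding heap objects that the GC must scan; storing nodes in one slice and linking children by int32 index reduces that work.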
C++: same pattern. For best performance, use std::array<Node*, 26> for ASCII.
JS/TS: Map<string, Node> per node; no concurrency concerns in single-threaded Node.

Common Bugs

Inserting and forgetting to update top-K at every node on the path (only updating the leaf). Subsequent prefix queries return empty.
Sorting top-K by weight only, forgetting the lex tie-break. Results for equal weights then depend on insertion order, so tie tests pass or fail by accident.
On weight decrease, leaving a stale entry in top-K. Solution: full rebuild of top-K on decrease.
Case mismatch: insert lowercase, query as-is. Lowercase both.
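Memory: storing the full word in every node’s top-K. At 10^6 words and depth 30 that is up to 3·10^7 cache entries; where each entry holds its own string copy (e.g. std::string by value in C++), that is 3·10^7 strings. Store an integer ID and look up the word in a side table for memory savings.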

Debugging Strategy

For “wrong suggestions” bugs: print node.top_k at the prefix node and verify it matches the expected top-K. For “missing word” bugs: walk down the trie from root, printing node.children at each step, confirm the path exists. For weight bugs: dump self._weights[word] and compare to expected. For performance, profile: most hot paths are dict.setdefault and the in-place sort.

Mastery Criteria

Implemented trie + per-node top-K in <30 minutes.
All four tests pass first run.
Stated the trie + top-K caching tradeoff (insert is roughly K log K times slower; query is O(L + K) instead of O(L + all descendants)).
Answered follow-ups #3 (sharding), #4 (observability), #9 (decay), #8 (eventual consistency) crisply.
Compared trie vs DAWG vs FST for memory.
Articulated typo-tolerance design (BK-tree / fuzzy n-grams) at a high level.

Lab 08 — Log Parser

Goal

Implement a streaming log parser that reads log lines (potentially gigabytes), extracts structured fields via regex, aggregates per-field counts, and emits structured output — all under bounded memory. After this lab you should be able to write a clean streaming text-processing class with bounded memory in under 25 minutes.

Background Concepts

Log parsing has two patterns: batch (load file, parse all, output) and streaming (read one line at a time, emit incremental output). The bar at senior interviews is the streaming variant because real production logs are too large to load — multi-gigabyte files where batch processing would OOM.

The two streaming primitives are:

Line-by-line iteration with a generator (for line in file: in Python). Memory is O(line size), not O(file size).
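Bounded aggregation: when counting unique IPs over a 1 TB log, you cannot keep all distinct IPs in a dict. Bound the aggregation with (a) sketches (HyperLogLog for distinct counts, count-min for per-key frequency estimates) and (b) “top-K with eviction”: a min-heap of size K that keeps only the K heaviest keys seen so far. The heap needs a count for every arriving key, so in practice it is fed by the count-min estimates.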

The regex itself is mundane. The interview signal is the discipline of bounded memory and clean separation between parser, extractor, and aggregator.

Interview Context

Log parsing is a popular question at logging / observability companies (Datadog, Splunk, Honeycomb, Cribl, Elastic) and at any infrastructure company that processes high-volume telemetry. It tests streaming discipline, regex fluency, and bounded-memory awareness. It also exposes weak engineering: a candidate who writes lines = file.readlines() instantly fails the bounded-memory criterion.

Problem Statement

Design LogParser(pattern, top_k=10):

parse_stream(line_iter) -> Iterator[dict] — yield a dict per line with extracted named fields. Skip malformed lines (count them).
aggregate(line_iter) -> dict — return per-field top-K aggregates (e.g., top 10 IPs, top 10 paths, top 10 status codes). Bounded memory.

The regex is provided at construction; the parser must use named capture groups.

Constraints

Input file size: up to 100 GB
Aggregator memory: ≤ 100 MB
Target throughput: 50 MB/s on a single core

Clarifying Questions

Is the log format known? (Yes — caller provides regex with named groups.)
Malformed lines: skip, error, or quarantine? (Skip + count by default; quarantine optionally.)
Aggregation: which fields, what kind (count, distinct, top-K)? (Caller specifies.)
Time-series: are we computing per-time-window aggregates? (Optional; default is whole-stream.)
Encoding: UTF-8? Binary? (UTF-8 default; binary is a follow-up.)

Examples

pattern = r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d+) (?P<bytes>\d+)'
parser = LogParser(pattern, top_k=3, aggregate_fields=["ip", "path", "status"])

with open("access.log") as f:
    for record in parser.parse_stream(f):
        print(record)
# {'ip': '1.2.3.4', 'ts': '20/May/2026:12:00:00', 'method': 'GET', 'path': '/x', 'status': '200', 'bytes': '1234'}

with open("access.log") as f:
    agg = parser.aggregate(f)
# {'ip': [('1.2.3.4', 1500), ('5.6.7.8', 1200), ...],
#  'path': [('/api', 5000), ('/login', 3000), ...],
#  'status': [('200', 90000), ('404', 4000), ('500', 200)],
#  'malformed': 12,
#  'total': 100000}

Initial Brute Force

open(file).read() then re.finditer. Loads everything; OOMs on 100 GB.

Brute Force Complexity

Memory: O(file size). At 100 GB, instant OOM on a 32 GB machine.

Optimization Path

Stream line-by-line with for line in file:. For aggregation, the unbounded dict[ip, count] grows with the number of unique keys, not with the file size, so the choice depends on cardinality: (a) at moderate cardinality (100k unique IPs is usually fine), keep exact counts in the dict for the whole stream; (b) at very high cardinality, replace it with HyperLogLog (HLL) for distinct counts and a count-min sketch + min-heap for top-K.

Final Expected Approach

Compile the regex once. Stream lines via the iterator. For each line, match and yield the groupdict() if matched, else increment malformed count. Aggregator: per configured field, maintain a Counter (which is a dict); at end, most_common(K). For very high cardinality, switch to count-min + heap.

Data Structures Used

Structure	Purpose
Compiled `re.Pattern`	match each line in O(line length)
`Counter` per field	exact top-K within bounded cardinality
Min-heap of (count, key) of size K	bounded top-K when cardinality is unbounded
Counters for `total`, `malformed`	observability

Correctness Argument

Streaming: for line in file: reads at most one line buffer at a time. Memory is O(longest line + aggregator state). For 100 GB files with 1 KB lines, memory stays at ~aggregator-state size.

Aggregation: Counter.most_common(K) returns the exact top-K when all keys are tracked. When using a count-min sketch + bounded heap, the result is approximate with bounded error: actual_count ≤ estimate ≤ actual_count + ε · total with probability ≥ 1 − δ. We pick ε, δ to fit memory.
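How ε and δ translate into table size, using the standard count-min sizing width = ⌈e/ε⌉ and depth = ⌈ln(1/δ)⌉; `count_min_dimensions` is a hypothetical helper name. For reference, BoundedTopK’s defaults (width 2048, depth 5) correspond to ε ≈ e/2048 ≈ 0.0013 and δ ≈ e^-5 ≈ 0.0067.

```python
import math

def count_min_dimensions(eps: float, delta: float) -> tuple[int, int]:
    """Standard count-min sizing for: estimate <= actual + eps * total, w.p. >= 1 - delta."""
    width = math.ceil(math.e / eps)
    depth = math.ceil(math.log(1 / delta))
    return width, depth

width, depth = count_min_dimensions(eps=0.001, delta=0.01)
print(width, depth, width * depth)   # 2719 5 13595
```

So a per-field sketch whose estimates overshoot by at most 0.1% of the stream total, with 99% probability, needs 13,595 counters, independent of how many distinct keys appear.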

Complexity

Per line: O(L · regex-complexity) for parsing + O(F) for F fields aggregated
Total: O(N · L · regex)
Memory: O(unique keys per field) for exact aggregation; O(width × depth) for sketch

Implementation Requirements

import re
from collections import Counter
from typing import Iterator, Iterable, Optional

class LogParser:
    def __init__(self, pattern: str, top_k: int = 10,
                 aggregate_fields: Optional[list[str]] = None):
        self._re = re.compile(pattern)           # compile once, reuse for every line
        self._k = top_k
        self._fields = aggregate_fields          # None = aggregate all named groups
        self.malformed = 0                       # lines skipped by the last parse_stream

    def parse_stream(self, lines: Iterable[str]) -> Iterator[dict]:
        self.malformed = 0
        for line in lines:                       # lazy: one line held at a time
            line = line.rstrip("\r\n")
            m = self._re.match(line)
            if m is None:
                self.malformed += 1              # skip, but count it
                continue
            yield m.groupdict()

    def aggregate(self, lines: Iterable[str]) -> dict:
        counters: dict[str, Counter] = {}
        total = malformed = 0
        for line in lines:
            line = line.rstrip("\r\n")
            total += 1
            m = self._re.match(line)
            if m is None:
                malformed += 1
                continue
            d = m.groupdict()
            fields = self._fields or list(d.keys())
            for f in fields:
                v = d.get(f)
                if v is None:                    # optional group that did not participate
                    continue
                counters.setdefault(f, Counter())[v] += 1
        out = {f: c.most_common(self._k) for f, c in counters.items()}
        out["total"] = total
        out["malformed"] = malformed
        return out


# Bounded-memory variant: approximate top-K via count-min sketch + size-K min-heap
import heapq
import random

class BoundedTopK:
    """Approximate top-K using a count-min sketch + min-heap of size K.

    For high-cardinality streams: use one instance per field in place of the
    per-field Counter in LogParser.aggregate. Memory is O(width * depth + K).
    """
    def __init__(self, k: int, width: int = 2048, depth: int = 5):
        self._k = k
        self._w, self._d = width, depth
        self._table = [[0] * width for _ in range(depth)]
        # One random seed per row so each row hashes keys differently.
        self._seeds = [random.randint(1, 2**31 - 1) for _ in range(depth)]
        self._heap: list[tuple[int, str]] = []   # min-heap of (estimate, key)
        self._in_heap: dict[str, int] = {}       # key -> latest estimate

    def add(self, key: str) -> None:
        est = self._increment(key)
        if key in self._in_heap:
            # Refresh the key's count and rebuild the heap (O(K), K is small).
            # Leaving the heap entry stale would let a heavy key be evicted later.
            self._in_heap[key] = est
            self._heap = [(c, k) for k, c in self._in_heap.items()]
            heapq.heapify(self._heap)
            return
        if len(self._heap) < self._k:
            heapq.heappush(self._heap, (est, key))
            self._in_heap[key] = est
            return
        if est > self._heap[0][0]:
            _, evicted = heapq.heappushpop(self._heap, (est, key))
            del self._in_heap[evicted]
            self._in_heap[key] = est

    def _increment(self, key: str) -> int:
        """Bump one counter per row; the estimate is the minimum across rows."""
        ests = []
        for i in range(self._d):
            j = hash((self._seeds[i], key)) % self._w
            self._table[i][j] += 1
            ests.append(self._table[i][j])
        return min(ests)

    def top_k(self) -> list[tuple[str, int]]:
        return sorted(((k, c) for c, k in self._heap), key=lambda p: -p[1])

Tests

import unittest, io

LOG_PATTERN = (r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
               r'"(?P<method>\S+) (?P<path>\S+) HTTP/[\d.]+" '
               r'(?P<status>\d+) (?P<bytes>\d+)')

SAMPLE = """1.1.1.1 - - [01/Jan/2026:00:00:00 +0000] "GET /a HTTP/1.1" 200 100
2.2.2.2 - - [01/Jan/2026:00:00:01 +0000] "GET /b HTTP/1.1" 200 200
1.1.1.1 - - [01/Jan/2026:00:00:02 +0000] "POST /a HTTP/1.1" 500 0
malformed log line junk junk junk
1.1.1.1 - - [01/Jan/2026:00:00:03 +0000] "GET /a HTTP/1.1" 200 100
"""

class TestParser(unittest.TestCase):
    def test_parse_stream(self):
        p = LogParser(LOG_PATTERN, top_k=3)
        recs = list(p.parse_stream(io.StringIO(SAMPLE)))
        self.assertEqual(len(recs), 4)
        self.assertEqual(recs[0]["ip"], "1.1.1.1")
        self.assertEqual(recs[0]["status"], "200")

    def test_aggregate(self):
        p = LogParser(LOG_PATTERN, top_k=3, aggregate_fields=["ip", "status"])
        agg = p.aggregate(io.StringIO(SAMPLE))
        self.assertEqual(agg["total"], 5)
        self.assertEqual(agg["malformed"], 1)
        self.assertEqual(agg["ip"][0], ("1.1.1.1", 3))
        self.assertEqual(dict(agg["status"]), {"200": 3, "500": 1})

    def test_streaming_memory(self):
        # Generate a synthetic stream and ensure parse_stream is lazy
        def gen():
            for i in range(10000):
                yield f'1.1.1.1 - - [now] "GET /p{i % 100} HTTP/1.1" 200 100'
        p = LogParser(LOG_PATTERN)
        # consume one record at a time
        it = p.parse_stream(gen())
        first = next(it)
        self.assertEqual(first["path"], "/p0")

Follow-up Questions

(4) Observe / monitor? Throughput (lines/sec), parse error rate (malformed/total), per-field cardinality (gauge), p99 line size (latency surrogate). Alert on parse error rate spiking — usually means upstream changed the format.

(5) Tests? Unit on regex correctness with hand-crafted lines; property-based tests with random-line generators; smoke on a real prod-shaped sample (1 MB); large-input test that asserts memory stays bounded (tracemalloc.get_traced_memory() in Python).
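The bounded-memory test can be written around tracemalloc’s peak counter. A sketch: `peak_bytes` (a hypothetical helper) measures the peak traced allocation while a callable runs, and the usage contrasts lazily consuming a generator of lines with materializing it as a list, the same contrast as iterating a file versus calling readlines(). Absolute numbers vary by Python version, so assert on ratios or generous ceilings rather than exact byte counts.

```python
import tracemalloc
from typing import Callable, Iterator

def peak_bytes(fn: Callable[[], object]) -> int:
    """Peak bytes allocated (as traced by tracemalloc) while fn runs."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def lines() -> Iterator[str]:
    for i in range(100_000):
        yield f'1.1.1.1 - - [now] "GET /p{i} HTTP/1.1" 200 100'

lazy = peak_bytes(lambda: sum(1 for _ in lines()))   # one line alive at a time
eager = peak_bytes(lambda: len(list(lines())))       # every line alive at once
```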

(7) Backpressure? If the consumer of parse_stream is slow, the iterator naturally pauses — Python generators are pull-based. For the producer side (file reads), no backpressure issue. If shipping to a downstream like Kafka, buffer with a bounded queue and drop on full (with a counter).
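The bounded-queue-with-drop hand-off can be sketched with the standard library’s thread-safe `queue.Queue`, so a separate sender thread can drain it. `DroppingBuffer` is a hypothetical name; the dropped counter is what you would export as a metric.

```python
import queue
from typing import Any

class DroppingBuffer:
    """Hypothetical bounded hand-off: the parser never blocks on a slow downstream."""

    def __init__(self, maxsize: int) -> None:
        self._q: "queue.Queue[Any]" = queue.Queue(maxsize=maxsize)
        self.dropped = 0                      # export as a counter metric

    def offer(self, record: Any) -> bool:
        try:
            self._q.put_nowait(record)        # raises queue.Full instead of blocking
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self) -> list[Any]:
        """Take everything currently buffered, e.g. for one batched send downstream."""
        out = []
        while True:
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                return out
```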

(11) Configuration knobs? pattern, top_k, aggregate_fields, bounded_memory: bool (toggle exact vs sketch-based). Knobs not to expose: regex compilation cache.

(13) Poison pill? A line that triggers catastrophic (exponential-time) backtracking in the regex (regex DoS via specific patterns). Mitigation: a line length cap (skip lines > N bytes), a regex timeout (in Python only the third-party regex package offers one, via a timeout argument; stdlib re does not), or anchored patterns that avoid .* at the start.
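The line-length cap can be a small generator placed in front of the parser, e.g. parser.parse_stream(capped(f, stats)). A sketch under stated assumptions: `capped` is a hypothetical helper, 8192 is an arbitrary default, and in text mode len() counts characters, not bytes. A cap only limits the damage of patterns whose backtracking is polynomial in line length; for exponential patterns the real fix is a linear-time engine or a timeout.

```python
from typing import Iterable, Iterator

def capped(lines: Iterable[str], stats: dict[str, int], max_len: int = 8192) -> Iterator[str]:
    """Drop over-long lines before they reach the regex, counting them in stats."""
    for line in lines:
        if len(line) > max_len:
            stats["oversize"] = stats.get("oversize", 0) + 1
            continue
        yield line
```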

Product Extension

Production systems use one of: logstash / fluentd (regex-based extraction with field rules), CloudWatch Logs Insights (column-based after extraction), Datadog Logs / Splunk (full pipeline with grok patterns and ingest-time enrichment). The data structure that powers most “top-K-over-stream” dashboards is count-min + heap; HLL powers distinct-count widgets; reservoir sampling powers “show me 100 random matching events”.
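Reservoir sampling (Algorithm R) keeps a uniform random sample of k events from a stream of unknown length in O(k) memory: the item at 0-based index i replaces a random slot with probability k/(i+1). A minimal sketch; `reservoir_sample` is a hypothetical name.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(items: Iterable[T], k: int, rng: Optional[random.Random] = None) -> list[T]:
    """Algorithm R: a uniform random sample of k items from a stream, in O(k) memory."""
    if rng is None:
        rng = random.Random()
    sample: list[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # uniform over 0..i inclusive
            if j < k:                    # happens with probability k / (i + 1)
                sample[j] = item
    return sample
```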

Language/Runtime Follow-ups

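Python: re is fast enough for most logs but is a backtracking engine (it does not compile to a DFA), so catastrophic backtracking is a real risk. Use the third-party regex package for timeout support, or an RE2 binding (such as pyre2 or google-re2), which guarantees linear-time matching because it never backtracks.
Java: Pattern.compile(...) once and reuse the Pattern, which is thread-safe; a Matcher is stateful and not thread-safe, so create one per thread (or reset() it per line). For untrusted patterns or inputs, RE2/J avoids backtracking.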
Go: regexp package is RE2-based — guaranteed linear time, no catastrophic backtracking. Idiomatic for log parsing.
C++: std::regex is slow; prefer Boost.Regex or PCRE2 in production.
JS/TS: V8’s regex is backtracking; same DoS concern as Python’s re. Node has no built-in regex timeout.

Common Bugs

Loading the file: open(f).readlines() or f.read().split("\n") — instant OOM on large files.
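Recompiling the regex per line: in Java (Pattern.compile) and most other languages this is a large, avoidable slowdown. Python’s re caches recently compiled patterns, so there the cost is a cache lookup per call, but compiling once is still the clean habit.
Forgetting to strip the line ending: a last named group built from a negated class (for example [^,]+) captures the trailing \n (or \r\n) and breaks comparisons.
Using several greedy .* groups (or nested quantifiers) in one pattern: heavy or catastrophic backtracking on long lines that almost match.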
Aggregator dict grows unbounded on high-cardinality fields (e.g., user-agent string with version churn). Cap or use sketch.

Debugging Strategy

For parse failures: print the first 5 malformed lines and inspect the regex against them. For wrong field values: print m.groupdict() of one matching line. For OOM: tracemalloc.start(); ...; print(tracemalloc.get_traced_memory()) at intervals — find the structure that grows. For slowness: cProfile and check whether the hot spot is regex match or dict update.

Mastery Criteria

Implemented streaming LogParser with bounded aggregation in <25 minutes.
All three tests pass first run.
Stated for line in file: lazy iteration without prompting.
Explained when to switch from Counter to count-min + heap (when unique-key memory exceeds budget).
Answered follow-ups #4, #7, #11, #13 (regex DoS) crisply.
Identified backtracking risk in user-supplied regexes.

Lab 09 — File Deduplication

Goal

Find all groups of duplicate files in a directory tree using a three-stage filter: size → quick hash (first/last K bytes) → full content hash. After this lab you should be able to design and implement an efficient file deduper that minimizes I/O on huge directories in under 25 minutes.

Background Concepts

The naive approach — read every file and group by full hash — is correct but wastes I/O on files that are clearly not duplicates (different sizes). The classical optimization is a three-stage cascade:

Group by size. Two files of different sizes cannot be duplicates. This is a stat call (cheap, no read).
Group by quick hash. For each size-group with ≥2 files, hash the first and last K bytes (e.g., 4 KB each). Files with different head+tail hashes are not duplicates.
Group by full hash. For each surviving group with ≥2 files, hash the full content. This is the only stage that does full reads.

This works because in real datasets, most files of the same size differ in their first/last few KB (think .docx files with embedded timestamps, video files with different headers, log files with different first lines). The cascade reduces total I/O by ~10–100×.

Interview Context

This is a popular practical problem at infrastructure / file-storage companies (Dropbox, Box, Google Drive, AWS S3 dedup). It tests I/O awareness, hashing fluency, and ability to design a multi-stage filter pipeline. The interviewer wants to hear “I’d minimize I/O by going size → quick-hash → full-hash” before any code.

Problem Statement

Design find_duplicates(root) -> list[list[Path]]:

Walk the directory tree under root.
Return groups (lists) of paths that have identical content. Each group has ≥ 2 paths.
Use the three-stage cascade.

Constraints

Up to 10^6 files
Up to 1 TB total bytes
Memory: ≤ 1 GB
Read budget: minimize bytes read (the goal of optimization)

Clarifying Questions

Symlinks — follow or skip? (Skip by default to avoid loops; configurable.)
Hidden files? (Include by default; let caller filter.)
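Empty files (size 0): duplicates of each other? (Technically yes, but they are rarely interesting; the implementation below skips them unless include_empty=True. Ask.)
Hash function? (hashlib.blake2b is a cryptographic hash that is typically fast in software; sha256 is the conventional default; md5 is broken against adversaries but fine for non-adversarial dedup. A fast non-cryptographic hash is acceptable for the quick-hash stage, since it only filters; use a cryptographic hash for the full-hash stage if the result is durable or inputs may be adversarial.)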
Concurrency? (Not strictly required; CPU is hashing, I/O is reads — both parallelize well. State as a follow-up.)

Examples

root/
├── a.txt    "hello"
├── b.txt    "hello"
├── c.txt    "world"
├── d.txt    "world"
└── e.txt    "different"

find_duplicates("root")
-> [[Path("root/a.txt"), Path("root/b.txt")],
    [Path("root/c.txt"), Path("root/d.txt")]]

Initial Brute Force

For every pair of files, compare byte-for-byte. O(N² · L) reads. Catastrophic at 10^6 files.

Brute Force Complexity

O(N²) pairwise comparisons, each O(L). At N=10^6 and L=1 MB, this is 10^18 byte comparisons — never completes.

Optimization Path

Three-stage cascade:

Group by size: O(N) stat calls. Memory O(N · path).
Quick-hash within each size group: O(K · |group|) per group; only run on size-groups with ≥2 files.
Full-hash within each quick-hash group: O(L · |group|).

Total reads: most files are excluded at stage 1 (different sizes); of the rest, most are excluded at stage 2 (quick hash differs). Only true (or very near) duplicates get a full read.

Final Expected Approach

Walk the tree once, building dict[size, list[Path]]. For groups with len ≥ 2, build dict[quick_hash, list[Path]]. For surviving groups, build dict[full_hash, list[Path]]. Output groups with len ≥ 2.

Data Structures Used

Structure	Stage
`dict[int, list[Path]]`	size grouping
`dict[bytes, list[Path]]`	quick-hash grouping
`dict[bytes, list[Path]]`	full-hash grouping

Correctness Argument

Soundness: every group output has identical full content (verified by full-hash equality, modulo the collision probability of a 256-bit digest, about 2^-256 per pair for the BLAKE2b-256 used below or for SHA-256: negligible).
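To put a number on “negligible”: by the union (birthday) bound over all pairs, with N = 10^6 files (the constraint above) and a 256-bit digest:

```latex
P(\text{any false match}) \le \binom{N}{2} \cdot 2^{-256}
  \approx \frac{(10^{6})^{2}}{2} \cdot 2^{-256}
  \approx \frac{5 \times 10^{11}}{1.16 \times 10^{77}}
  \approx 4.3 \times 10^{-66}
```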

Completeness: two files A, B with identical content have:

equal size (so they survive stage 1)
equal quick-hash (so they survive stage 2)
equal full-hash (so they end up in the same output group)

Therefore A and B appear in the same output group. The cascade does not miss any duplicate.

Complexity

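Time: O(N) for size grouping; O(K · N_size_dups) for quick-hash, where N_size_dups is the number of files that share their size with at least one other file; O(L · N_full_dups) for full-hash, where N_full_dups is the number of files that survive the quick-hash stage and L is the file length. K ≪ L, N_full_dups ≪ N.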
Space: O(N) for path bookkeeping.

Implementation Requirements

import hashlib
import os
import stat
from collections import defaultdict
from pathlib import Path

QUICK_BYTES = 4096   # 4 KB head + 4 KB tail

def _quick_hash(path: Path, size: int) -> bytes:
    """Hash head and tail; files of at most 2 * QUICK_BYTES hash only their head."""
    h = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as f:
        h.update(f.read(QUICK_BYTES))
        if size > 2 * QUICK_BYTES:
            f.seek(-QUICK_BYTES, os.SEEK_END)
            h.update(f.read(QUICK_BYTES))
    return h.digest()

def _full_hash(path: Path, chunk: int = 1 << 20) -> bytes:
    """Hash the whole file in 1 MiB chunks so memory stays bounded."""
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        while buf := f.read(chunk):
            h.update(buf)
    return h.digest()

def find_duplicates(root: str | Path,
                    follow_symlinks: bool = False,
                    include_empty: bool = False) -> list[list[Path]]:
    root = Path(root)

    # Stage 1: group by size (metadata only, no reads).
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _, files in os.walk(root, followlinks=follow_symlinks):
        for name in files:
            p = Path(dirpath) / name
            try:
                # lstat() describes a symlink itself, so S_ISREG below skips
                # links unless follow_symlinks is set.
                st = p.stat() if follow_symlinks else p.lstat()
            except OSError:                      # also covers PermissionError
                continue
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size == 0 and not include_empty:
                continue
            by_size[st.st_size].append(p)

    # Stage 2: quick hash, only for size groups with >= 2 files.
    by_quick: list[list[Path]] = []
    for size, group in by_size.items():
        if len(group) < 2:
            continue
        sub: dict[bytes, list[Path]] = defaultdict(list)
        for p in group:
            try:
                sub[_quick_hash(p, size)].append(p)
            except OSError:
                continue
        by_quick.extend(g for g in sub.values() if len(g) >= 2)

    # Stage 3: full hash, only for groups that survived stage 2.
    out: list[list[Path]] = []
    for group in by_quick:
        sub = defaultdict(list)
        for p in group:
            try:
                sub[_full_hash(p)].append(p)
            except OSError:
                continue
        out.extend(g for g in sub.values() if len(g) >= 2)

    return out

Tests

import unittest, tempfile, os
from pathlib import Path

class TestDedup(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.TemporaryDirectory()
        self.root = Path(self.tmp.name)

    def tearDown(self):
        self.tmp.cleanup()

    def _w(self, name: str, content: bytes):
        p = self.root / name
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(content)
        return p

    def test_basic(self):
        a = self._w("a", b"hello")
        b = self._w("sub/b", b"hello")
        c = self._w("c", b"world")
        d = self._w("sub/d", b"world")
        e = self._w("e", b"unique")
        groups = find_duplicates(self.root)
        flat = sorted([sorted(map(str, g)) for g in groups])
        self.assertEqual(len(flat), 2)
        self.assertIn(sorted([str(a), str(b)]), flat)
        self.assertIn(sorted([str(c), str(d)]), flat)

    def test_size_excludes_non_duplicates(self):
        self._w("a", b"x" * 100)
        self._w("b", b"x" * 200)             # different size
        self.assertEqual(find_duplicates(self.root), [])

    def test_quick_hash_disambiguates(self):
        # Same size, different head: stage 2 separates them
        self._w("a", b"head1" + b"x" * 8000)
        self._w("b", b"head2" + b"x" * 8000)
        self.assertEqual(find_duplicates(self.root), [])

    def test_large_groups(self):
        for i in range(5):
            self._w(f"copy{i}", b"same content")
        groups = find_duplicates(self.root)
        self.assertEqual(len(groups), 1)
        self.assertEqual(len(groups[0]), 5)

Follow-up Questions

(3) Scale to N nodes? Distributed dedup over a fleet: size grouping is local on each node. For cross-node dedup, each node sends (size, quick_hash, node, path) tuples to a coordinator, which groups them by (size, quick_hash); nodes that share a group then exchange full hashes. The full read happens locally; only hashes (32 bytes) move over the network.

(4) Observe / monitor? Files scanned (counter), bytes read at each stage (gauge — quantifies the savings of the cascade), groups found (gauge), errors per stage (counter). The “bytes read at full-hash stage / total bytes” ratio is the key savings metric.

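(8) Partial failure? A file can be deleted, truncated, or rewritten between stages, so its full hash may describe content that no longer matches the size or quick hash recorded earlier. Solutions: (a) skip the file on OSError and re-stat each member of a final group before reporting it, dropping any whose size changed; (b) snapshot the filesystem (LVM, ZFS, or Btrfs snapshot) before scanning. (a) is the practical answer.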

(13) Poison-pill input? A multi-TB file (or a sparse file with a 1 PB apparent size) blows up the full-read stage. Mitigation: cap files by size (skip if > max_file_size), or use Merkle-tree chunked hashing where each chunk is independent and partial reads can be cached/resumed.

(11) Configuration knobs? quick_bytes (default 4 KB), chunk_size for full read, max_file_size, follow_symlinks, include_empty. Knobs not to expose: hash algorithm (pick BLAKE2b for speed unless a contract requires SHA-256).

Product Extension

fdupes, rmlint, and jdupes are the canonical Linux tools, and they use variants of this cascade (size first, then partial and full hashes, sometimes a final byte-by-byte comparison). Dropbox’s dedup operates on 4 MB blocks: each file is split into blocks, and each block is hashed and stored once, so two files that share an identical, block-aligned 4 MB region store it only once. ZFS dedup also works at the block level. The full-file dedup you wrote here is the simplest version; production storage systems often move to block- or chunk-level dedup for higher savings.
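To make the full-file vs block-level contrast concrete, here is a minimal fixed-size chunk store. It is a sketch under stated assumptions: chunk_dedup and restore are hypothetical helpers, the tiny chunk_size is only for illustration (real systems use blocks in the KB to MB range), and BLAKE2b stands in for whatever content hash a production store uses.

```python
import hashlib

def chunk_dedup(files: dict[str, bytes],
                chunk_size: int = 4) -> tuple[dict[bytes, bytes], dict[str, list[bytes]]]:
    """Hypothetical fixed-size chunk store: each unique chunk is stored once.

    Returns (store, manifests): store maps chunk digest -> chunk bytes, and each
    manifest lists one file's chunk digests in order.
    """
    store: dict[bytes, bytes] = {}
    manifests: dict[str, list[bytes]] = {}
    for name, data in files.items():
        ids: list[bytes] = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            cid = hashlib.blake2b(chunk, digest_size=16).digest()
            store.setdefault(cid, chunk)      # a repeated chunk costs only a digest
            ids.append(cid)
        manifests[name] = ids
    return store, manifests

def restore(store: dict[bytes, bytes], manifest: list[bytes]) -> bytes:
    # Reassemble a file by concatenating its chunks in manifest order.
    return b"".join(store[cid] for cid in manifest)

store, manifests = chunk_dedup({"a": b"AAAABBBBCCCC", "b": b"AAAABBBBDDDD"})
print(len(store))   # 4 unique chunks instead of 6
```

Fixed-size chunking only catches block-aligned sharing: inserting one byte at the front of "b" shifts every boundary and nothing dedups. Content-defined chunking, where boundaries are chosen by a rolling hash over the data, is the usual fix.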

Language/Runtime Follow-ups

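Python: os.walk is the standard (it is built on os.scandir); pathlib.Path.rglob('*') is more idiomatic but slower. hashlib.blake2b is in the stdlib and is often faster than SHA-256 in software, but CPUs with SHA extensions can close or reverse that gap, so benchmark on your hardware.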
Java: Files.walk(Path) is the equivalent; MessageDigest for hashing. BLAKE2 requires BouncyCastle.
Go: filepath.WalkDir (more efficient than the older filepath.Walk, which calls Lstat on every entry). crypto/sha256 is fast; golang.org/x/crypto/blake2b provides BLAKE2.
C++: std::filesystem::recursive_directory_iterator. OpenSSL’s EVP_DigestUpdate for hashing.
JS/TS: fs.promises.readdir(dir, {recursive: true}) (Node 20+). crypto.createHash('blake2b512') for hashing.

Common Bugs

Treating symlinks as regular files: infinite loops or phantom duplicate matches. Path.is_file() follows links, so check is_symlink() (or use lstat) before the regular-file check.
Forgetting to filter size-1 groups before stage 2 (running quick-hash on isolated files wastes I/O).
Hashing with MD5 and getting bitten by a collision in adversarial datasets. Use SHA-256 or BLAKE2b.
Reading the entire file into memory at full-hash stage instead of streaming. OOM on large files.
Not handling permission errors — first OSError halts the entire scan instead of skipping the file.

Debugging Strategy

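For “missed duplicates”: verify with cmp (Unix) or a byte-by-byte comparison, and print the size, quick hash, and full hash of both files. Most missed-duplicate bugs are in the quick-hash logic, e.g., mishandling files smaller than 2 * QUICK_BYTES: seeking -QUICK_BYTES from the end of a file shorter than QUICK_BYTES raises OSError, which the except clause turns into a silently skipped file.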
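A small-file-safe quick hash can clamp the tail read instead of seeking a fixed distance back from the end. This is a sketch, not the lab's _quick_hash: quick_hash_safe and its quick_bytes default of 4096 are assumptions, and mixing the file size into the digest is an optional extra.

```python
import hashlib
import os
from pathlib import Path

def quick_hash_safe(path: Path, quick_bytes: int = 4096) -> bytes:
    """Hypothetical stage-2 hash: BLAKE2b over size + head + tail, safe for any size."""
    h = hashlib.blake2b(digest_size=16)
    size = os.path.getsize(path)
    h.update(size.to_bytes(8, "little"))       # bind the size into the digest
    with open(path, "rb") as f:
        h.update(f.read(quick_bytes))           # head; shorter for small files
        if size > quick_bytes:
            # Start the tail no earlier than the end of the head, so small files
            # never seek before byte 0 and no byte is hashed twice.
            f.seek(max(quick_bytes, size - quick_bytes))
            h.update(f.read(quick_bytes))
    return h.digest()
```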

Mastery Criteria

Articulated the three-stage cascade in <60 seconds before coding.
Implemented in <25 minutes; all four tests pass.
Stated that bytes read, not wall-clock time, is the main optimization signal.
Answered follow-ups #3, #4, #8, #13 crisply.
Compared full-file vs chunk-level dedup correctly.
Identified BLAKE2b as the right hash for performance.

Lab 10 — Consistent Hashing

Goal

Implement a consistent hash ring with virtual nodes that minimizes key remapping when servers are added or removed. After this lab you should be able to design and implement consistent hashing in under 30 minutes and articulate why it beats hash(key) % N.

Background Concepts

The naive sharding scheme hash(key) % N has a catastrophic failure mode: when N changes (a server is added or removed), nearly every key remaps to a different shard. For an in-memory cache fleet, this means the entire cache is invalidated; for a stateful sharded store, this means most data must be physically migrated.
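A quick way to see the mod-N failure is to hash a fixed key set with a stable hash and count how many keys change shard when N grows. stable_hash and modn_moved_fraction are hypothetical helpers; MD5 is used only because it is stable across processes (Python's built-in hash() is not). For uniform hashes a key stays put only when h mod N equals h mod (N+1), so going from 3 to 4 shards should move roughly 75% of keys, and going from 10 to 11 roughly 91%.

```python
import hashlib

def stable_hash(s: str) -> int:
    # First 8 bytes of MD5: identical in every process, unlike built-in hash().
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def modn_moved_fraction(keys: list[str], n_before: int, n_after: int) -> float:
    """Fraction of keys whose shard changes when hash % n_before becomes hash % n_after."""
    moved = sum(1 for k in keys
                if stable_hash(k) % n_before != stable_hash(k) % n_after)
    return moved / len(keys)

keys = [f"key-{i}" for i in range(10_000)]
print(round(modn_moved_fraction(keys, 3, 4), 2))
print(round(modn_moved_fraction(keys, 10, 11), 2))
```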

Consistent hashing solves this. Servers are placed on a ring (a circular hash space, e.g., [0, 2^64)). Each key is hashed onto the ring and assigned to the next server clockwise. When a server is added, only keys between its predecessor and itself on the ring are remapped. When a server is removed, only its keys are remapped — to its successor.

Without virtual nodes (vnodes), the ring is unbalanced: with one position per server, a 4-server ring assigns wildly unequal slices. Virtual nodes fix this: each physical server gets V ring positions (e.g., V=200), and the relative deviation of each server's share shrinks roughly as 1/sqrt(V).
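The effect of V is easy to measure. load_ratio is a hypothetical helper that builds a ring with the lab's "server#i" vnode naming and an MD5-based position, routes a batch of keys, and reports the busiest server's load divided by the average. Expect the ratio to fall toward 1 as V grows; the exact values depend on the key set.

```python
import bisect
import hashlib
from collections import Counter

def _h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def load_ratio(servers: list[str], vnodes: int, keys: list[str]) -> float:
    """Busiest server's key count divided by the mean, for `vnodes` positions per server."""
    ring = sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
    counts: Counter = Counter()
    for k in keys:
        idx = bisect.bisect_left(ring, (_h(k), "")) % len(ring)   # wrap past the last vnode
        counts[ring[idx][1]] += 1
    return max(counts.values()) / (len(keys) / len(servers))

servers = ["s1", "s2", "s3", "s4"]
keys = [f"key-{i}" for i in range(20_000)]
for v in (1, 10, 200):
    print(v, round(load_ratio(servers, v, keys), 2))
```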

Interview Context

Consistent hashing is the default sharding mechanism for distributed caches (Memcached client libraries, Redis Cluster’s slot variant), distributed databases (DynamoDB, Cassandra), and load balancers (HAProxy, Envoy with ring_hash). It is asked at infrastructure roles at every Big Tech and many high-scale companies.

Problem Statement

Implement ConsistentHashRing(vnodes_per_server):

add_server(server_id) — add a server with V vnodes.
remove_server(server_id) — remove all vnodes for the server.
get_server(key) -> server_id — return the server responsible for key.
keys_moved(key, before, after) — for analysis: did this key remap?

Constraints

1 ≤ servers ≤ 10^4
1 ≤ vnodes per server ≤ 1000
10^5 lookups / second
Lookup latency: O(log(N · V))

Clarifying Questions

Hash function: cryptographic or fast? (Use a fast non-crypto hash: MurmurHash, xxHash. Stable across processes.)
Vnode count V: hard-coded or configurable? (Configurable, default 100–200.)
Replication: should get_server return one or multiple distinct servers? (Often a follow-up; primary is one.)
Hot-spotting awareness: do we know any keys are extremely hot? (Bounded-load consistent hashing is a follow-up.)

Examples

ring = ConsistentHashRing(vnodes_per_server=100)
ring.add_server("s1"); ring.add_server("s2"); ring.add_server("s3")
ring.get_server("user-42")     # e.g. "s2" (depends on the hash)
ring.add_server("s4")
ring.get_server("user-42")     # still "s2", or now "s4" (a key can only move to the new server)
# ~25% of keys remap on adding a 4th server, not 75% as with mod-N

Initial Brute Force

hash(key) % N. Simple and balanced; catastrophic on N change. Useful for understanding the problem, not as the solution.

Brute Force Complexity

O(1) per lookup, but growing from N to N+1 servers remaps about N/(N+1) of all keys, i.e., O(keys) data movement. Lookup cost is not the problem; remapping is, and that is what we are moving away from.

Optimization Path

Replace mod-N with a sorted ring. Servers map to multiple positions; lookup is binary search. On add/remove, only insert/delete vnode positions; existing positions don’t move.

Final Expected Approach

A sorted list of (hash_value, server_id) tuples kept in ring order. get_server(key): hash the key, binary-search for the smallest ring position ≥ key_hash; wrap around if past the end. add_server: insert V positions. remove_server: remove all V positions.

Data Structures Used

Structure	Purpose
Sorted list of `(hash, server)`	the ring
`dict[server_id, list[hash_positions]]`	bookkeeping for removal
`bisect` for binary search	O(log(N · V)) lookup

Correctness Argument

Key locality on resize: when adding server S with vnodes [v_1, …, v_V], the only keys whose owner changes are those whose hash falls in some (predecessor(v_i), v_i] range. The expected fraction of keys affected is V / (total vnodes) ≈ 1/(N+1) — exactly the right number to assign to the new server, and no more.

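Balanced load: with V vnodes per server, each server's share of the key space has a relative standard deviation of roughly 1/sqrt(V), and the busiest of N servers deviates further, growing slowly (about sqrt(log N)) with N. As a rough guide for N=10, expect the busiest server around 15% above average at V=100 and around 5% at V=1000.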

Complexity

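get_server: O(log(N · V))
add_server: V calls to bisect.insort, each O(log(N · V)) to find the slot but O(N · V) to shift the Python list, so O(N · V²) element moves in total (a fast memmove in practice); a balanced tree or skip list brings this to O(V log(N · V))
remove_server: O(N · V) to rebuild the filtered list
Space: O(N · V)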

Implementation Requirements

import bisect, hashlib
from typing import Optional

def _hash(s: str) -> int:
    """Stable 64-bit ring position. MD5 is used because it is in the stdlib and
    identical across processes, not for speed or security."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, vnodes_per_server: int = 100):
        self._v = vnodes_per_server
        self._ring: list[tuple[int, str]] = []   # sorted by hash
        self._server_positions: dict[str, list[int]] = {}

    def add_server(self, server_id: str) -> None:
        if server_id in self._server_positions:
            return
        positions = []
        for i in range(self._v):
            h = _hash(f"{server_id}#{i}")
            bisect.insort(self._ring, (h, server_id))
            positions.append(h)
        self._server_positions[server_id] = positions

    def remove_server(self, server_id: str) -> None:
        positions = self._server_positions.pop(server_id, None)
        if positions is None: return
        # Rebuild filtered ring (O(N V) — acceptable; remove is rare)
        self._ring = [(h, s) for (h, s) in self._ring if s != server_id]

    def get_server(self, key: str) -> Optional[str]:
        if not self._ring:
            return None
        kh = _hash(key)
        idx = bisect.bisect_left(self._ring, (kh, ""))
        if idx == len(self._ring):
            idx = 0                               # wrap around
        return self._ring[idx][1]

    def server_count(self) -> int:
        return len(self._server_positions)


# Bounded-load variant for hot-spot mitigation:
class BoundedLoadRing:
    """Consistent hashing with bounded loads: each server's load ≤ avg * (1+ε).

    Loads only grow here; a real system decrements them when keys move or expire."""
    def __init__(self, vnodes_per_server: int = 100, epsilon: float = 0.25):
        self._inner = ConsistentHashRing(vnodes_per_server)
        self._eps = epsilon
        self._load: dict[str, int] = {}

    def add_server(self, sid: str) -> None:
        self._inner.add_server(sid); self._load.setdefault(sid, 0)

    def remove_server(self, sid: str) -> None:
        self._inner.remove_server(sid); self._load.pop(sid, None)

    def get_server(self, key: str, total_keys: int) -> Optional[str]:
        n = self._inner.server_count()
        if n == 0: return None
        cap = (total_keys / n) * (1 + self._eps)
        # Walk forward from the first candidate until we find one under cap.
        kh = _hash(key)
        ring = self._inner._ring
        idx = bisect.bisect_left(ring, (kh, ""))
        if idx == len(ring): idx = 0
        for offset in range(len(ring)):
            _, sid = ring[(idx + offset) % len(ring)]
            if self._load.get(sid, 0) < cap:
                self._load[sid] = self._load.get(sid, 0) + 1
                return sid
        sid = ring[idx][1]                        # all over cap: fall back to the plain owner
        self._load[sid] = self._load.get(sid, 0) + 1   # still count the assignment
        return sid

Tests

import unittest, random, statistics

class TestRing(unittest.TestCase):
    def test_basic(self):
        r = ConsistentHashRing(vnodes_per_server=10)
        r.add_server("s1"); r.add_server("s2"); r.add_server("s3")
        self.assertIsNotNone(r.get_server("k1"))
        self.assertIn(r.get_server("k1"), {"s1", "s2", "s3"})
        r.remove_server("s2")
        self.assertIn(r.get_server("k1"), {"s1", "s3"})

    def test_minimal_remapping(self):
        r = ConsistentHashRing(vnodes_per_server=200)
        for s in ["s1", "s2", "s3"]: r.add_server(s)
        keys = [f"key-{i}" for i in range(10000)]
        before = {k: r.get_server(k) for k in keys}
        r.add_server("s4")
        after = {k: r.get_server(k) for k in keys}
        moved = sum(1 for k in keys if before[k] != after[k])
        # Expected ~25% remapping (from 3 servers to 4).
        # mod-N would have moved ~75%.
        self.assertLess(moved, 3500)
        self.assertGreater(moved, 1500)

    def test_balance(self):
        r = ConsistentHashRing(vnodes_per_server=200)
        for i in range(10): r.add_server(f"s{i}")
        keys = [f"k-{i}" for i in range(20000)]
        loads = {}
        for k in keys:
            s = r.get_server(k)
            loads[s] = loads.get(s, 0) + 1
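        avg = len(keys) / 10
        self.assertEqual(len(loads), 10)            # every server received keys
        # Each server's share has a relative std dev of roughly 1/sqrt(200) ≈ 7%;
        # a 25% band keeps this fixed-hash test from being brittle.
        for cnt in loads.values():
            self.assertLess(abs(cnt - avg), 0.25 * avg)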

    def test_empty_ring(self):
        r = ConsistentHashRing(vnodes_per_server=10)
        self.assertIsNone(r.get_server("k"))

Follow-up Questions

(3) Scale to N nodes? Lookup is O(log(N · V)), so routing scales. The bottleneck is add_server on a Python list: each insort shifts O(N · V) elements. A balanced tree (red-black tree, B-tree) or a skip list makes each insertion O(log(N · V)), so adding a server costs O(V log(N · V)). For replication: get_servers(key, R) returns the next R distinct servers clockwise.

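(8) Partial failure? A server going down is handled naturally: its vnodes are removed and each vnode's keys remap to that vnode's clockwise successor. Without vnodes, all of the dead server's load lands on a single successor; with vnodes it spreads across many servers, but a few very hot keys can still overload whichever server they hit. Bounded-load consistent hashing (Mirrokni, Thorup, and Zadimoghaddam) caps each server at (1+ε) × avg_load and spills overflow clockwise to the next server under the cap; BoundedLoadRing above is a simplified version.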

(10) Consistency model? The ring itself is only a routing function; stored data has whatever consistency model the underlying store offers (linearizable, eventual, etc.). One subtlety: when a server is added, the keys in each new vnode's range (pred, v] currently live on v's clockwise successor, their previous owner. They must be copied to the new server before routing switches, or reads hit missing data. Two-phase add: install the new vnodes as read-only → migrate keys → activate.

(11) Configuration knobs? vnodes_per_server (100–500 covers most workloads), hash_function. Not to expose: ring data structure, balance/heuristics.

(4) Observe / monitor? Per-server load (gauge), key remapping events (counter), p99 lookup latency (histogram). Imbalance alert: trigger if any server’s load > 1.5x avg.

Product Extension

DynamoDB uses consistent hashing with explicit ranges. Cassandra historically defaulted to 256 vnodes per node (num_tokens); Cassandra 4.0 lowered the default to 16. Memcached’s clients (ketama, libmemcached) use consistent hashing. Envoy’s ring_hash load balancer uses it for sticky-session routing. Discord’s chat sharding originally used hash(channel) % N and famously hit the rebalance problem; they migrated to a fixed-bucket scheme. The point: even big companies get this wrong if they pick mod-N.

Language/Runtime Follow-ups

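Python: bisect is the right tool for sorted-list maintenance, but insort is O(n) per insert because it shifts list elements. For very large rings, use sortedcontainers.SortedList (a third-party package built on a list of sorted sublists, not a skip list), whose inserts are roughly logarithmic.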
Java: TreeMap<Long, String> — ceilingKey(hash) does the lookup. Idiomatic.
Go: sort.Search over a []uint64 for the ring. Good locality, fast.
C++: std::map<uint64_t, std::string> with lower_bound. Or sort a vector and binary-search.
JS/TS: no sorted-tree in stdlib; use the sorted-array-functions npm package or maintain a sorted array manually.

Common Bugs

Using hash(key) % len(ring) to pick a vnode index — that is mod-N inside the ring. Use the ring’s actual hash space.
Forgetting to wrap around — bisect returns len(ring) for a key past the last vnode; you must wrap to index 0.
Hash collisions on small rings — two servers’ vnodes land on the same hash. Either accept “first inserted wins” (deterministic) or perturb the suffix until unique.
Removing a server but leaving its vnodes in _server_positions (memory leak; subsequent add_server for the same id silently no-ops because of the if … in … guard).
Using hash(key) (Python’s built-in) — randomized per-process. Different processes route the same key to different servers. Use a stable hash like MD5 or MurmurHash.

Debugging Strategy

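For “wrong server” complaints: log (hashed key, ring position found, server). For imbalance: dump the load distribution and check the vnode count. For “everything remapped” after an add: measure the remapped fraction, which should be about 1/(N+1). If nearly all keys moved, look for mod-N routing or an unstable hash such as Python's built-in hash(); a moderate excess usually means too few vnodes or a poor hash function.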

Mastery Criteria

Implemented ring + lookup + add/remove in <30 minutes.
Stated the minimal-remapping invariant (about 1/(N+1) of keys move when growing from N to N+1 servers) without prompting.
All four tests pass.
Articulated why vnodes are needed in <60 seconds.
Compared mod-N, ring-no-vnodes, ring-with-vnodes, bounded-load on a whiteboard.
Answered follow-ups #3, #8 (bounded load), #10 (data migration coordination), #11 crisply.

Lab 11 — Message Dispatcher

Goal

Implement a message dispatcher that fans out messages to N consumers with fairness, per-consumer priority levels, and per-consumer backpressure. After this lab you should be able to design, in under 30 minutes, a multi-consumer dispatcher that neither starves low-priority consumers nor lets slow consumers stall fast ones.

Background Concepts

A dispatcher accepts messages from one (or more) producers and routes them to N consumers. Three classical problems:

Fairness: round-robin across consumers vs weighted by priority. Strict round-robin is unfair if some consumers have higher priority. Weighted-fair-queueing (WFQ) gives each consumer a share proportional to its weight.
Slow-consumer problem: a slow consumer’s queue fills up. If we share a single queue across all consumers, the slow one stalls everyone. Solution: per-consumer queues with bounded capacity.
Backpressure: when a consumer’s queue is full, what do we do? Options: (a) block the producer (lossless, but the slowest consumer sets the pace), (b) drop oldest, (c) drop newest, (d) reject and signal upstream. Default to (b) for telemetry and (a) for orders/payments.

Interview Context

Dispatcher problems show up at message-bus companies (Confluent, Solace), real-time platforms (Twilio, PubNub, Pusher), and any backend with fan-out. The interviewer wants to hear: per-consumer queue, priority-aware scheduling, explicit backpressure policy. Common failure: a single shared queue with consumers competing — works for two consumers, breaks at scale.

Problem Statement

Design Dispatcher:

register_consumer(consumer_id, priority=1, queue_capacity=1024, on_full="drop_oldest")
dispatch(message) — fan out to all registered consumers (broadcast).
consume(consumer_id) -> Message | None — non-blocking dequeue for a consumer.
consume_blocking(consumer_id, timeout) — blocking dequeue.
unregister(consumer_id)
stats() -> dict — per-consumer queue size, drops, throughput.

Constraints

1 ≤ consumers ≤ 1000
10^5 dispatches / second
Per-consumer latency: < 1 ms p99 from dispatch to availability

Clarifying Questions

Broadcast (every consumer gets every message) or partition (each message goes to one)? (Pick one; we’ll do broadcast — the harder problem.)
Strict ordering across consumers? (Each consumer sees messages in dispatch order; cross-consumer ordering not guaranteed.)
Priority semantics: priority is a weight, not a strict precedence? (Weight is the standard.)
Should consumers be threads, or should consume be polled by external code? (External polling — simpler.)

Examples

d = Dispatcher()
d.register_consumer("fast", priority=2, queue_capacity=100)
d.register_consumer("slow", priority=1, queue_capacity=10, on_full="drop_oldest")

for i in range(20):
    d.dispatch({"i": i})

# slow consumer dropped 10 oldest; only sees last 10
slow_msgs = []
while True:
    m = d.consume("slow")
    if m is None: break
    slow_msgs.append(m)
# slow_msgs has the 10 most recent messages

Initial Brute Force

A single shared deque, or synchronous delivery with a for c in consumers: c.recv(msg) loop inside dispatch. If any consumer is slow, dispatch blocks on it and the whole pipeline stalls.

Brute Force Complexity

Per dispatch: O(N · consumer-receive-time). Worst-case latency is bounded by the slowest consumer.

Optimization Path

Per-consumer queue. Dispatch is now O(N · queue-push-time) ≈ O(N) — fast and bounded by the dispatcher’s own work, not by any consumer’s processing. Slow consumers fill their own queue and trigger their own backpressure policy without affecting others.

Final Expected Approach

A dict[consumer_id, ConsumerQueue]. Each ConsumerQueue has a deque, a Lock (or use queue.Queue), a capacity, an on-full policy, and counters. dispatch snapshots the dict, then pushes to each queue, applying the policy when it is full; consume dequeues from the named queue. A single dispatcher-wide lock would serialize every push and pop; per-queue locks let consumers drain independently, at the cost that a dispatch racing with register/unregister may or may not include that consumer. That is an acceptable trade.

Data Structures Used

Structure	Purpose
`dict[id, ConsumerQueue]`	per-consumer state
`deque` per consumer	bounded FIFO
`Lock` per consumer	concurrent push/pop safety
`Condition` per consumer	blocking consume

Correctness Argument

No starvation: every consumer has its own queue, so a slow consumer cannot block dispatch unless it opted into the block policy. Otherwise producer dispatch latency is bounded by O(N) (the dict iteration), independent of consumer speed.

Priority weighting: at dispatch time, we don’t apply weight (every consumer gets every message — broadcast). Priority is used in the consume order if we have a single consumer thread that polls all queues in priority order; in that variant, weight determines how many messages we consume from each per round (e.g., 2 from priority=2, 1 from priority=1).
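The single-poller variant can be sketched as weighted rounds over the per-consumer queues. weighted_drain is a hypothetical, single-threaded helper over plain deques with no locks; inside the Dispatcher each pop would go through that consumer's lock.

```python
from collections import deque
from typing import Any, Iterator

def weighted_drain(queues: dict[str, deque],
                   weights: dict[str, int]) -> Iterator[tuple[str, Any]]:
    """Yield (consumer_id, message) in rounds; each round takes up to
    weights[consumer_id] messages from that queue (weights below 1 count as 1)."""
    while any(queues.values()):                  # stop once every queue is empty
        for cid, q in queues.items():
            for _ in range(max(1, weights.get(cid, 1))):
                if not q:
                    break
                yield cid, q.popleft()

qs = {"hi": deque([1, 2, 3, 4]), "lo": deque(["x", "y"])}
print(list(weighted_drain(qs, {"hi": 2, "lo": 1})))
# [('hi', 1), ('hi', 2), ('lo', 'x'), ('hi', 3), ('hi', 4), ('lo', 'y')]
```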

Backpressure: when a queue is at capacity, that consumer’s policy fires. Each consumer’s drops are tracked separately; the dispatcher never blocks except on a consumer registered with on_full="block", where blocking is the requested behavior.

Complexity

dispatch: O(N) where N = number of consumers
consume: O(1)
Space: O(N · capacity)

Implementation Requirements

import threading, time
from collections import deque
from typing import Any, Optional

_POLICIES = ("drop_oldest", "drop_newest", "block")

class _ConsumerQueue:
    def __init__(self, capacity: int, on_full: str = "drop_oldest", priority: int = 1):
        self.capacity = capacity
        self.on_full = on_full        # "drop_oldest" | "drop_newest" | "block"
        self.priority = priority
        self.q: deque = deque()
        self.lock = threading.Lock()
        # Two conditions on one lock, as queue.Queue does: consumers wait for
        # "not empty", blocked producers wait for "not full". With a single shared
        # condition, notify() can wake the wrong kind of waiter and strand the other.
        self.not_empty = threading.Condition(self.lock)
        self.not_full = threading.Condition(self.lock)
        self.dropped = 0
        self.delivered = 0


class Dispatcher:
    def __init__(self):
        self._consumers: dict[str, _ConsumerQueue] = {}
        self._reg_lock = threading.RLock()

    def register_consumer(self, consumer_id: str, priority: int = 1,
                          queue_capacity: int = 1024,
                          on_full: str = "drop_oldest") -> None:
        if on_full not in _POLICIES:
            raise ValueError(f"unknown on_full policy: {on_full!r}")
        if queue_capacity < 1:
            raise ValueError("queue_capacity must be >= 1")
        with self._reg_lock:
            if consumer_id in self._consumers:
                raise ValueError(f"{consumer_id} already registered")
            self._consumers[consumer_id] = _ConsumerQueue(queue_capacity, on_full, priority)

    def unregister(self, consumer_id: str) -> None:
        with self._reg_lock:
            self._consumers.pop(consumer_id, None)

    def dispatch(self, message: Any) -> None:
        with self._reg_lock:
            consumers = list(self._consumers.values())   # snapshot: safe against unregister
        for cq in consumers:
            with cq.lock:
                if len(cq.q) >= cq.capacity:
                    if cq.on_full == "drop_oldest":
                        cq.q.popleft()                   # evict and append under one lock
                        cq.dropped += 1
                    elif cq.on_full == "drop_newest":
                        cq.dropped += 1
                        continue                         # nothing enqueued, nothing to signal
                    else:                                # "block": wait for a consume to free a slot
                        while len(cq.q) >= cq.capacity:
                            cq.not_full.wait()
                cq.q.append(message)
                cq.not_empty.notify()                    # wake one blocked consumer

    def consume(self, consumer_id: str) -> Optional[Any]:
        cq = self._consumers.get(consumer_id)
        if cq is None: return None
        with cq.lock:
            if not cq.q: return None
            m = cq.q.popleft()
            cq.delivered += 1
            cq.not_full.notify()             # wake a producer blocked by the "block" policy
            return m

    def consume_blocking(self, consumer_id: str, timeout: Optional[float] = None) -> Optional[Any]:
        cq = self._consumers.get(consumer_id)
        if cq is None: return None
        deadline = None if timeout is None else time.monotonic() + timeout
        with cq.lock:
            while not cq.q:
                if deadline is None:
                    cq.not_empty.wait()
                else:
                    rem = deadline - time.monotonic()
                    if rem <= 0: return None
                    cq.not_empty.wait(timeout=rem)
            m = cq.q.popleft()
            cq.delivered += 1
            cq.not_full.notify()
            return m

    def stats(self) -> dict:
        with self._reg_lock:
            return {
                cid: {
                    "queue_size": len(cq.q),
                    "dropped": cq.dropped,
                    "delivered": cq.delivered,
                    "capacity": cq.capacity,
                    "priority": cq.priority,
                }
                for cid, cq in self._consumers.items()
            }

Tests

import unittest, threading, time

class TestDispatcher(unittest.TestCase):
    def test_broadcast(self):
        d = Dispatcher()
        d.register_consumer("a"); d.register_consumer("b")
        for i in range(5): d.dispatch(i)
        a = [d.consume("a") for _ in range(5)]
        b = [d.consume("b") for _ in range(5)]
        self.assertEqual(a, [0, 1, 2, 3, 4])
        self.assertEqual(b, [0, 1, 2, 3, 4])

    def test_drop_oldest_on_full(self):
        d = Dispatcher()
        d.register_consumer("a", queue_capacity=3, on_full="drop_oldest")
        for i in range(5): d.dispatch(i)
        out = []
        while True:
            m = d.consume("a")
            if m is None: break
            out.append(m)
        self.assertEqual(out, [2, 3, 4])
        self.assertEqual(d.stats()["a"]["dropped"], 2)

    def test_slow_consumer_does_not_block_fast(self):
        d = Dispatcher()
        d.register_consumer("slow", queue_capacity=2, on_full="drop_oldest")
        d.register_consumer("fast", queue_capacity=100)
        for i in range(50):
            d.dispatch(i)
        # fast got everything
        fast = []
        while True:
            m = d.consume("fast")
            if m is None: break
            fast.append(m)
        self.assertEqual(len(fast), 50)
        # slow has 2 (capacity) — most-recent
        slow = [d.consume("slow") for _ in range(2)]
        self.assertEqual(slow, [48, 49])

    def test_blocking_consume_wakes_on_dispatch(self):
        d = Dispatcher()
        d.register_consumer("c", queue_capacity=10)
        results = []
        def consumer():
            results.append(d.consume_blocking("c", timeout=2.0))
        t = threading.Thread(target=consumer); t.start()
        time.sleep(0.05)
        d.dispatch("hello")
        t.join(timeout=2.0)
        self.assertEqual(results, ["hello"])

Follow-up Questions

(7) Backpressure? This is the core problem. Three policies built in: drop-oldest (best for telemetry), drop-newest (best for “first message matters”), block (best for ordered streams where loss is unacceptable). The right pick depends on the data semantics; expose it per-consumer.

(3) Scale to N nodes? Distributed dispatch: each consumer “subscribes” via a network connection; the dispatcher fans out over those connections. Bottleneck shifts from queue-push to per-consumer network round-trip. For very high consumer counts, a hierarchical dispatcher (dispatch to N regional dispatchers, each of which dispatches to M local consumers) reduces the per-message broadcast cost.

(4) Observe / monitor? Per-consumer queue depth, drop rate, throughput. The drop-rate heatmap by consumer is the first dashboard you draw. Alert when any consumer has > 1% drop rate.

(8) Partial failure? A consumer that connects, then disappears: the dispatcher must detect it (missed heartbeat or socket close) and unregister it. Otherwise its queue sits at capacity and its drop counter climbs forever, or, under the block policy, every dispatch stalls on it. Use a heartbeat / TTL on consumer registration.
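A TTL registry is enough to detect consumers that vanished without closing their connection. HeartbeatRegistry is a hypothetical, single-threaded sketch with an injectable clock; a real dispatcher would call reap() periodically and pass each returned id to unregister().

```python
import time
from typing import Callable

class HeartbeatRegistry:
    """Hypothetical TTL tracker: a consumer whose last heartbeat is older than
    `ttl` seconds is reported once by reap() and then forgotten."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock                        # injectable for tests
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, consumer_id: str) -> None:
        self.last_seen[consumer_id] = self.clock()

    def reap(self) -> list[str]:
        now = self.clock()
        stale = [cid for cid, t in self.last_seen.items() if now - t > self.ttl]
        for cid in stale:
            del self.last_seen[cid]               # report each expiry only once
        return stale
```

Unregistering does not by itself wake a producer already blocked on that consumer's queue under the block policy; a complete implementation would also signal any such waiter.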

(11) Configuration knobs? Per-consumer: priority, queue_capacity, on_full policy. Global: max consumers. Knobs not to expose: the lock granularity (per-consumer is correct).

Product Extension

This is the in-process version of Kafka consumer groups, RabbitMQ exchange-to-queue fan-out, and Redis Pub/Sub. Real systems add: durable per-consumer logs (Kafka’s offset model), dynamic rebalancing as consumers join/leave (Kafka group coordinator), and message acknowledgment (AMQP). The core problem of “don’t let slow consumers stall fast ones” is solved everywhere by per-consumer storage.

Language/Runtime Follow-ups

Python: queue.Queue per consumer is also fine and simpler — built-in blocking and capacity. Custom _ConsumerQueue shown for control over the on-full policy.
Java: LinkedBlockingQueue per consumer; RingBuffer (Disruptor pattern) for high-throughput.
Go: a chan Message per consumer, buffered to capacity, is idiomatic; select with a default case gives a non-blocking send or receive. Simple and fast.
C++: boost::lockfree::spsc_queue per consumer for single-producer/single-consumer; otherwise mutex + deque.
JS/TS: single event loop, so no real “consumer threads”; use EventEmitter with bounded buffers per listener. RxJS Subject with bufferCount operators.
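With queue.Queue, drop-oldest has to be assembled from put_nowait and get_nowait. put_drop_oldest is a hypothetical helper. The eviction and the put are two separate operations, so with several producers another thread can refill the freed slot in between; the helper therefore retries, and this gap is why a custom queue that keeps both steps under one lock gives tighter guarantees.

```python
import queue
from typing import Any

def put_drop_oldest(q: queue.Queue, item: Any) -> bool:
    """Enqueue without blocking, evicting the oldest item while the queue is full.
    Returns True if at least one item was dropped."""
    dropped = False
    while True:
        try:
            q.put_nowait(item)
            return dropped
        except queue.Full:
            try:
                q.get_nowait()        # evict the oldest item
                q.task_done()         # keep q.join() accounting correct for the evicted item
                dropped = True
            except queue.Empty:
                pass                  # a consumer drained it meanwhile; retry the put

q = queue.Queue(maxsize=3)
for i in range(5):
    put_drop_oldest(q, i)
print([q.get_nowait() for _ in range(3)])   # [2, 3, 4]
```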

Common Bugs

Single shared lock for the dispatcher: serializes everything; dispatch latency = O(N · consumer-time).
block policy without notifying on consume: producer waits forever.
Forgetting to copy the consumer dict before iterating in dispatch — concurrent unregister mutates the dict mid-iteration.
Drop-oldest implementation: popleft and append must run under the same lock; releasing it between them lets another producer interleave, breaking ordering or overshooting capacity.
Miscounting under drop_oldest: the evicted message counts as dropped and the new message is enqueued, not dropped. Counting the new one as dropped, or the evicted one as delivered, skews the stats.

Debugging Strategy

Print stats() periodically. Slow consumer = high queue_size and rising dropped. Stuck consumer = blocking with no notifies — check producer dispatch isn’t stuck on a different consumer’s block policy. For “missed messages”: log per-message dispatch with consumer enumeration and replay against delivered counts.

Mastery Criteria

Implemented per-consumer queue + dispatcher in <30 minutes.
All four tests pass.
Stated “per-consumer queue isolates slow consumers” before coding.
Listed three backpressure policies (drop-oldest/newest/block) and their use cases.
Answered follow-ups #3, #4, #7, #8, #11 crisply.
Identified that broadcast is the harder variant; partition is simpler.

Lab 12 — In-Memory Pub/Sub

Goal

Implement an in-process publish-subscribe system with topic-based routing, wildcard subscriptions (a.b.*, a.#), and per-subscriber backpressure. After this lab you should be able to write a clean pub/sub broker in under 30 minutes and articulate the topic-matching design tradeoffs.

Background Concepts

Pub/sub differs from a job queue: subscribers don’t compete for messages; each subscriber receives every matching message. Two routing models:

Topic-based (channel name as a string): "orders.created", "users.signup". Wildcards (* = exactly one segment, # = zero or more segments) follow the AMQP topic-exchange convention; MQTT uses + and # for the same roles.
Content-based: subscribers register a predicate over message content. More flexible, much harder to scale (every message must be evaluated against every subscriber’s predicate).

Topic-based with wildcards is the standard. The implementation challenge is the wildcard matcher: a subscription on "orders.*" should match "orders.created" but not "orders.created.fraud". Two options: a topic trie walked segment-by-segment, costing roughly O(L) per publish for typical subscription sets (L = number of segments in the topic), or one regex per subscription, costing O(N) matches per publish (N = subscription count). The trie is the production answer for systems with many subscriptions.

Interview Context

Pub/sub design is asked at messaging companies (Confluent, IBM MQ, Solace, AWS SNS), at real-time platforms (Pusher, Ably, Twilio), and broadly at any senior+ design-coding round. The interview wants both code and the design reasoning around routing, backpressure, and subscription matching.

Problem Statement

Design PubSub:

subscribe(topic_pattern, on_message) -> subscription_id — topic_pattern may include * (single segment) or # (multi-segment, must be last).
unsubscribe(subscription_id)
publish(topic, message) — call on_message(topic, message) on every matching subscriber.
Per-subscriber callback wrapping for backpressure (queue + drop policy).

Constraints

Up to 10^4 active subscriptions
Up to 10^5 publishes / second
Subscription matching: < 100 µs per publish
Per-subscriber callbacks may be slow; must not block the publisher

Clarifying Questions

Wildcard syntax — MQTT (+/#), AMQP (*/#), or other? (Pick AMQP-style: * = one segment, # = ≥ 0 segments at end.)
Synchronous or async callback delivery? (Async with per-subscriber queue is the production answer; simpler synchronous version is acceptable for the basic case.)
Topic separator: . or /? (. is the AMQP convention; either is fine.)
Ordering guarantees? (Per-subscriber: messages arrive in publish-order. Across subscribers: not guaranteed.)
Replay / retain / persistence? (No by default; pure in-memory.)

Examples

ps = PubSub()
sid = ps.subscribe("orders.*", lambda topic, msg: print(f"got {topic}: {msg}"))
ps.publish("orders.created", {"id": 1})         # fires
ps.publish("orders.created.fraud", {"id": 2})   # does NOT fire (* is one segment)

ps.subscribe("orders.#", lambda t, m: print("catch-all", t, m))
ps.publish("orders.created.fraud", {})          # fires (# matches multi)

ps.unsubscribe(sid)
ps.publish("orders.created", {})                # only the # subscription fires

Initial Brute Force

dict[topic_pattern, list[callback]]. On publish, iterate all subscriptions, regex-match each pattern against the topic. O(N · pattern-cost) per publish where N is subscription count.
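A sketch of this brute force, assuming AMQP-style patterns where * matches exactly one segment and a trailing # matches zero or more. compile_pattern and BruteForcePubSub are hypothetical names; each pattern is compiled once at subscribe time, so the remaining cost is the O(N) fullmatch loop in publish.

```python
import re
from typing import Any, Callable


def compile_pattern(pattern: str, sep: str = ".") -> "re.Pattern[str]":
    """AMQP-style: '*' = exactly one segment, trailing '#' = zero or more segments."""
    segs = pattern.split(sep)
    if "#" in segs[:-1]:
        raise ValueError("# must be the last segment")
    trailing_hash = segs[-1] == "#"
    if trailing_hash:
        segs = segs[:-1]
    esc = re.escape(sep)
    # '*' becomes "any run of non-separator characters"; literals are escaped
    body = esc.join(f"[^{esc}]*" if s == "*" else re.escape(s) for s in segs)
    if trailing_hash:
        # '#' alone matches everything; otherwise allow an optional "<sep>anything" tail
        body = ".*" if not segs else body + f"(?:{esc}.*)?"
    return re.compile(body)


class BruteForcePubSub:
    def __init__(self) -> None:
        self._subs: list = []   # (compiled pattern, callback) pairs

    def subscribe(self, pattern: str, callback: Callable[[str, Any], None]) -> None:
        self._subs.append((compile_pattern(pattern), callback))   # compile once

    def publish(self, topic: str, message: Any) -> None:
        for rx, cb in self._subs:        # O(N) full matches per publish
            if rx.fullmatch(topic):
                cb(topic, message)
```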

Brute Force Complexity

At N=10^4 subscriptions and 10^5 pub/s, this is 10^9 regex matches / sec, far too slow. Even with every pattern compiled once at subscribe time, the per-publish matching loop dominates.

Optimization Path

A topic trie: each node represents a segment. A node has literal-segment children, a * child for the single-segment wildcard, and a list of # (catch-all) subscriptions. Match by walking the trie segment-by-segment: at each node, follow both the literal child and the * child, and fire the # subscriptions of every node the walk visits.

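Each level branches into at most two children (the literal match and *), so the worst case visits O(2^L) nodes when * patterns appear at every depth. For typical subscription sets the walk stays close to O(L), plus the cost of delivering to the matching subscribers, where L is the topic depth.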

Final Expected Approach

Build a topic trie. Each node has:

children: dict[str, Node] — literal subsegments
star: Node | None — single-segment wildcard
hash_subscriptions: list[Sub] — # catch-all (matches everything below this node)
subs: list[Sub] — exact matches at this node

Publishing walks the trie segment-by-segment, at each step checking the literal child and the star child; collect matching subs at terminal nodes. Collect hash_subscriptions along the entire path.

Each subscriber owns a per-subscriber bounded queue; publish enqueues to the queue (non-blocking, drops on full); a worker thread per subscriber drains the queue and calls the callback.

Data Structures Used

Structure	Purpose
Topic trie	O(L) routing
`dict[sub_id, Subscription]`	unregister lookup
`deque` per subscriber + lock	per-subscriber queue
Worker thread per subscriber	invoke callback off the publish path

Correctness Argument

Routing: a subscription A.B.C matches publish topic A.B.C iff the trie walk reaches the node carrying that subscription with all segments consumed. A * matches any single segment (one node-level wildcard step). A # at a node matches any zero-or-more remaining segments — equivalent to attaching a list of “catch-all” subscribers to that node.

Per-subscriber ordering: each subscriber’s queue is FIFO; the worker drains in FIFO order. Therefore subscriber sees messages in publish order.

Publisher non-blocking: publish only enqueues; no subscriber callback runs on the publish thread. Even a callback that takes 1 second doesn’t slow publish.

Complexity

subscribe: O(L) for trie insert
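publish: O(V + M) where V = trie nodes visited (about L for typical tries, at most O(2^L) when * branches at every level) and M = matching subscribers
unsubscribe: O(L + k) to walk to the node and remove the subscription from its list of k entries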

Implementation Requirements

import itertools, logging, threading
from collections import deque
from typing import Callable, Any, Optional

class _Sub:
    __slots__ = ("sub_id", "callback", "queue", "lock", "cond", "capacity", "drops",
                 "alive", "thread")
    def __init__(self, sub_id: int, callback: Callable, capacity: int = 1024):
        self.sub_id = sub_id; self.callback = callback
        self.queue: deque = deque()
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.capacity = capacity
        self.drops = 0
        self.alive = True
        self.thread: Optional[threading.Thread] = None

    def deliver(self, topic: str, msg: Any) -> None:
        with self.lock:
            if len(self.queue) >= self.capacity:
                self.queue.popleft(); self.drops += 1
            self.queue.append((topic, msg))
            self.cond.notify()

    def stop(self) -> None:
        with self.lock:
            self.alive = False
            self.cond.notify()

    def run(self) -> None:
        while True:
            with self.lock:
                while self.alive and not self.queue:
                    self.cond.wait()
                if not self.alive and not self.queue:
                    return
                topic, msg = self.queue.popleft()
            try:
                self.callback(topic, msg)
            except Exception:
                # keep the worker alive on a bad callback, but never fail silently
                logging.exception("subscriber %d callback failed on %r", self.sub_id, topic)


class _Node:
    __slots__ = ("children", "star", "subs", "hash_subs")
    def __init__(self):
        self.children: dict[str, _Node] = {}
        self.star: Optional[_Node] = None
        self.subs: list[_Sub] = []
        self.hash_subs: list[_Sub] = []


class PubSub:
    def __init__(self, separator: str = "."):
        self._sep = separator
        self._root = _Node()
        self._subs: dict[int, tuple[_Sub, list[str]]] = {}  # id -> (sub, pattern segments)
        self._next_id = itertools.count(1)
        self._lock = threading.RLock()

    def subscribe(self, pattern: str, callback: Callable[[str, Any], None],
                  queue_capacity: int = 1024) -> int:
        segments = pattern.split(self._sep)
        sid = next(self._next_id)
        sub = _Sub(sid, callback, queue_capacity)
        with self._lock:
            node = self._root
            for i, seg in enumerate(segments):
                if seg == "#":
                    if i != len(segments) - 1:
                        raise ValueError("# must be the last segment")
                    node.hash_subs.append(sub)
                    break
                if seg == "*":
                    if node.star is None: node.star = _Node()
                    node = node.star
                else:
                    node = node.children.setdefault(seg, _Node())
            else:
                node.subs.append(sub)
            self._subs[sid] = (sub, segments)
        sub.thread = threading.Thread(target=sub.run, daemon=True)
        sub.thread.start()
        return sid

    def unsubscribe(self, sub_id: int) -> None:
        with self._lock:
            entry = self._subs.pop(sub_id, None)
            if entry is None: return
            sub, segments = entry
            self._remove_from_trie(self._root, segments, 0, sub)
        sub.stop()
        sub.thread.join(timeout=1.0)

    def _remove_from_trie(self, node: _Node, segments: list[str],
                          i: int, sub: _Sub) -> None:
        if i == len(segments):
            try: node.subs.remove(sub)
            except ValueError: pass
            return
        seg = segments[i]
        if seg == "#":
            try: node.hash_subs.remove(sub)
            except ValueError: pass
            return
        nxt = node.star if seg == "*" else node.children.get(seg)
        if nxt is not None:
            self._remove_from_trie(nxt, segments, i + 1, sub)

    def publish(self, topic: str, message: Any) -> None:
        segments = topic.split(self._sep)
        with self._lock:
            self._match(self._root, segments, 0, topic, message)

    def _match(self, node: _Node, segments: list[str], i: int,
               topic: str, msg: Any) -> None:
        # # at this node matches everything below — fire now
        for s in node.hash_subs:
            s.deliver(topic, msg)
        if i == len(segments):
            for s in node.subs:
                s.deliver(topic, msg)
            return
        seg = segments[i]
        child = node.children.get(seg)
        if child is not None:
            self._match(child, segments, i + 1, topic, msg)
        if node.star is not None:
            self._match(node.star, segments, i + 1, topic, msg)

Tests

import unittest, time

class TestPubSub(unittest.TestCase):
    def _collect(self, ps, pattern):
        buf = []
        def cb(t, m): buf.append((t, m))
        sid = ps.subscribe(pattern, cb)
        return sid, buf

    def test_exact_match(self):
        ps = PubSub()
        sid, buf = self._collect(ps, "a.b")
        ps.publish("a.b", 1); ps.publish("a.b.c", 2); ps.publish("a", 3)
        time.sleep(0.05)
        self.assertEqual(buf, [("a.b", 1)])
        ps.unsubscribe(sid)

    def test_star_one_segment(self):
        ps = PubSub()
        sid, buf = self._collect(ps, "a.*")
        ps.publish("a.b", 1); ps.publish("a.c", 2)
        ps.publish("a.b.c", 3); ps.publish("a", 4)
        time.sleep(0.05)
        self.assertEqual(sorted(buf), [("a.b", 1), ("a.c", 2)])
        ps.unsubscribe(sid)

    def test_hash_multi_segment(self):
        ps = PubSub()
        sid, buf = self._collect(ps, "a.#")
        ps.publish("a", 0)
        ps.publish("a.b", 1); ps.publish("a.b.c", 2); ps.publish("x.y", 3)
        time.sleep(0.05)
        # # matches zero or more, so a, a.b, a.b.c all match
        self.assertEqual(sorted(buf), [("a", 0), ("a.b", 1), ("a.b.c", 2)])
        ps.unsubscribe(sid)

    def test_multiple_subscribers(self):
        ps = PubSub()
        sid1, b1 = self._collect(ps, "topic")
        sid2, b2 = self._collect(ps, "topic")
        ps.publish("topic", "msg")
        time.sleep(0.05)
        self.assertEqual(b1, [("topic", "msg")])
        self.assertEqual(b2, [("topic", "msg")])
        ps.unsubscribe(sid1); ps.unsubscribe(sid2)

Follow-up Questions

(7) Backpressure? Per-subscriber bounded queue with drop-oldest on full (shown). Alternatives: block the publisher (rejected — one slow subscriber stalls the world), drop-newest (loses recent state — rarely the right answer for pub/sub).

(3) Scale to N nodes? Distributed pub/sub is its own discipline. Models: (a) broker-based (Redis Pub/Sub, NATS): a central broker fans out. (b) broker-less mesh (peer-to-peer gossip): peers gossip subscriptions, and each publish goes only to relevant peers. (c) partitioned log (Kafka): no fan-out; consumers tail logs. The trie matcher works locally in any model; the network layer is the harder design.

(2) Persist state? Pure pub/sub is volatile — late subscribers miss messages. To persist, layer a replay log: every publish appends to a durable log; new subscribers can opt-in to read from offset 0 or “latest”. This is essentially Kafka’s design.
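A minimal in-memory sketch of the replay-log idea, with hypothetical names (ReplayLog, from_offset): every publish is appended before fan-out, and a subscriber passing from_offset=0 first replays history, while None means “latest” (live messages only). Topic matching and real durability are left out; in a threaded version, replay and registration must happen under one lock or a message published in between is missed.

```python
from typing import Any, Callable, Optional

Handler = Callable[[int, str, Any], None]   # (offset, topic, message)


class ReplayLog:
    """In-memory stand-in for a durable append-only log (hypothetical sketch)."""

    def __init__(self) -> None:
        self._entries: list = []    # (topic, message); list index == offset
        self._subs: list = []       # live handlers

    def publish(self, topic: str, message: Any) -> int:
        offset = len(self._entries)
        self._entries.append((topic, message))   # append first, then fan out
        for handler in self._subs:
            handler(offset, topic, message)
        return offset

    def subscribe(self, handler: Handler, from_offset: Optional[int] = None) -> None:
        """from_offset=None means "latest" (live only); 0 replays the whole log first."""
        if from_offset is not None:
            for off in range(from_offset, len(self._entries)):
                topic, message = self._entries[off]
                handler(off, topic, message)
        self._subs.append(handler)
```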

(4) Observe / monitor? Per-subscriber drop count, queue depth, throughput. Subscription count gauge. Publish rate counter. p99 publish-to-deliver latency histogram (for the per-subscriber path).

(11) Configuration knobs? Per-subscription queue capacity, on-full policy. Global: max subscriptions, separator character. Knobs not to expose: trie internal layout.

Product Extension

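MQTT brokers, AMQP exchanges, Redis Pub/Sub, NATS and ZeroMQ all use topic-based routing with some wildcard or prefix-matching syntax. Cloud Pub/Sub products (AWS SNS, GCP Pub/Sub, Azure Event Grid) add durability, retries, and ordering. MQTT and AMQP wildcards differ mostly in syntax (MQTT: / separator, + for one level, # for the rest; AMQP: . separator, * and #), with one semantic difference: MQTT requires # to be the last level, while AMQP allows # anywhere in a binding pattern.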

Language/Runtime Follow-ups

Python: this implementation. The per-subscriber thread approach scales to ~1000 subscribers; beyond, switch to an event loop (asyncio) with a single dispatcher coroutine.
Java: EventBus (Guava) is the lightweight in-process pub/sub. For wildcards, MQTT clients (Paho) or Kafka.
Go: channels per subscriber; idiomatic. nats-server is the production-grade Go choice.
C++: Boost.Signals2 is the in-process equivalent; no wildcards.
JS/TS: Node’s EventEmitter is the in-process equivalent; no wildcards. RxJS for reactive streams.
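A sketch of the asyncio direction mentioned for Python, assuming handlers are coroutines; AsyncFanout and drain are hypothetical names. It keeps the per-subscriber bounded queue with drop-oldest, but replaces each OS thread with a lightweight task on one event loop, so publish never blocks.

```python
import asyncio
from typing import Any, Awaitable, Callable

AsyncHandler = Callable[[str, Any], Awaitable[None]]


class AsyncFanout:
    """Per-subscriber bounded asyncio.Queue, drained by one task per subscriber."""

    def __init__(self, capacity: int = 1024) -> None:
        self._capacity = capacity
        self._queues: dict = {}   # sub_id -> asyncio.Queue
        self._tasks: dict = {}    # sub_id -> consumer task
        self._next_id = 0
        self.dropped = 0

    def subscribe(self, handler: AsyncHandler) -> int:
        sid = self._next_id
        self._next_id += 1
        q: asyncio.Queue = asyncio.Queue(maxsize=self._capacity)

        async def consume() -> None:
            while True:
                topic, msg = await q.get()
                try:
                    await handler(topic, msg)
                except Exception:
                    pass          # keep the consumer alive; log in real code
                finally:
                    q.task_done()

        self._queues[sid] = q
        self._tasks[sid] = asyncio.get_running_loop().create_task(consume())
        return sid

    def publish(self, topic: str, msg: Any) -> None:
        # Runs on the event loop thread, so full() then put_nowait() cannot race.
        for q in self._queues.values():
            if q.full():              # drop-oldest; never block the publisher
                q.get_nowait()
                q.task_done()
                self.dropped += 1
            q.put_nowait((topic, msg))

    async def drain(self) -> None:
        """Wait until every queued message has been handled."""
        await asyncio.gather(*(q.join() for q in self._queues.values()))
```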

Common Bugs

Synchronous callback dispatch from publish: one slow subscriber stalls everyone. Always decouple delivery with per-subscriber queues drained by worker threads (or event-loop tasks).
Trie cleanup on unsubscribe: removing from the leaf but leaving empty intermediate nodes. Memory leak; matters at high churn.
# not at end: validate at subscription time.
Silently swallowing exceptions from subscriber callbacks: catch them so the worker survives, but log them.
Race: subscribing during a publish — the subscription’s callback may or may not see the in-flight message. Document the semantics.
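The PubSub implementation above removes the subscription but never unlinks emptied nodes. A sketch of pruning removal, using a stripped-down Node and hypothetical insert/remove helpers: remove returns whether the node it was called on is now empty, so each parent can unlink that child on the way back up.

```python
from typing import Any, Optional


class Node:
    __slots__ = ("children", "star", "subs", "hash_subs")

    def __init__(self) -> None:
        self.children: dict = {}              # literal segment -> Node
        self.star: Optional["Node"] = None    # '*' child
        self.subs: list = []                  # exact-match subscriptions
        self.hash_subs: list = []             # '#' subscriptions

    def is_empty(self) -> bool:
        return not (self.children or self.star is not None or self.subs or self.hash_subs)


def insert(root: Node, segments: list, sub: Any) -> None:
    node = root
    for seg in segments:
        if seg == "#":                        # assumes '#' was validated as last
            node.hash_subs.append(sub)
            return
        if seg == "*":
            if node.star is None:
                node.star = Node()
            node = node.star
        else:
            node = node.children.setdefault(seg, Node())
    node.subs.append(sub)


def remove(node: Node, segments: list, i: int, sub: Any) -> bool:
    """Remove sub below node; return True if node is now empty so the parent can unlink it."""
    if i == len(segments):
        if sub in node.subs:
            node.subs.remove(sub)
    elif segments[i] == "#":
        if sub in node.hash_subs:
            node.hash_subs.remove(sub)
    else:
        seg = segments[i]
        child = node.star if seg == "*" else node.children.get(seg)
        if child is not None and remove(child, segments, i + 1, sub):
            if seg == "*":
                node.star = None              # prune empty '*' child
            else:
                del node.children[seg]        # prune empty literal child
    return node.is_empty()
```

In the lab's PubSub, the same pruning would run inside unsubscribe while holding self._lock, so a concurrent publish never walks a half-pruned path.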

Debugging Strategy

For “missed message”: print the trie state at the matching point and the topic segments. For wildcard surprises, hand-trace the match: which child did we descend into? Did we visit *? Did # fire at the right level? For “callback didn’t run”: check that the worker thread is alive (sub.thread.is_alive()); a callback exception kills the worker if not caught.

Mastery Criteria

Implemented topic trie + wildcard matching in <30 minutes.
All four tests pass first run.
Stated trie-vs-regex tradeoff (trie wins at scale; regex is simpler for few subscriptions).
Articulated per-subscriber queue isolates slow subscribers.
Answered follow-ups #2 (replay log), #3, #4, #7, #11.
Compared topic vs content-based routing and stated when to use each.

Lab 13 — Hierarchical Timer Wheel

Goal

Implement a hierarchical timer wheel that supports O(1) amortized schedule and cancel operations for up to millions of timers. After this lab you should understand why a min-heap is wrong for high-throughput timer workloads, and be able to implement a single-level and a hierarchical timer wheel from blank screen in under 35 minutes.

Background Concepts

Many systems need to schedule callbacks for a future time: TCP retransmit timers, session timeouts, rate-limit reset, cron-style task scheduling. The classical data structure is a min-heap of (fire_time, callback): O(log N) schedule, O(log N) to fire (pop), O(N) to cancel (without indexing).

For the cases where N is small (thousands) and timers are infrequent, a heap is fine. But TCP at high QPS has millions of pending timers, most of which are cancelled before firing (the data arrives or the connection closes). For that workload, the heap is too slow.

The timer wheel (Varghese & Lauck, 1987) achieves O(1) schedule and O(1) cancel by bucketing timers by their fire time. Imagine a circular array of N slots; each slot holds a list of timers that fire when the wheel cursor reaches that slot. Each tick, advance the cursor and fire all timers in the current slot. Schedule is O(1): slot_index = (cursor + delay_ticks) % N, where delay_ticks = delay / tick. Cancel is O(1): unlink the timer from its slot's list (doubly linked, so no search is needed).
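The slot formula in a few lines, with a hypothetical slot_for helper and a plain list-of-lists wheel: a timer 5 ticks ahead of cursor 6 on an 8-slot wheel wraps around to slot 3 and fires on the 5th tick.

```python
def slot_for(cursor: int, delay_ticks: int, n_slots: int) -> int:
    """Slot for a timer delay_ticks ahead of the cursor (single-level range only)."""
    if not 0 < delay_ticks < n_slots:
        raise ValueError("delay must be in [1, n_slots) ticks")
    return (cursor + delay_ticks) % n_slots   # modulo handles wrap-around


def ticks_until_fire(cursor: int, delay_ticks: int, n_slots: int) -> int:
    wheel = [[] for _ in range(n_slots)]
    wheel[slot_for(cursor, delay_ticks, n_slots)].append("timer")
    for step in range(1, n_slots):
        cursor = (cursor + 1) % n_slots       # one tick
        if wheel[cursor]:                     # fire everything in the new slot
            wheel[cursor].clear()
            return step
    raise AssertionError("timer never fired")
```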

The single-level wheel works for delays up to N · tick_resolution. Beyond that, hierarchical wheels: minute wheel + hour wheel + day wheel, like an analog clock. When the minute hand sweeps past 60, advance the hour hand by one and re-bucket the timers in that hour slot into the minute wheel.

Interview Context

Timer wheel is a senior+ system-design-coding question, asked at networking/infrastructure companies (Cloudflare, Cilium, AWS networking, Datadog APM agents). The bar is high: implement at least the single-level wheel correctly; the hierarchical structure is for top candidates.

Problem Statement

Implement TimerWheel:

schedule(delay_seconds, callback) -> timer_id
cancel(timer_id) -> bool
tick(): advance the cursor by one tick and fire every timer whose deadline has been reached; the caller drives time by calling it.

Then extend to HierarchicalTimerWheel, e.g. 4 levels of 256 slots: 256^4 = 2^32 ticks, about 49.7 days at a 1 ms tick.

Constraints

Up to 10^7 active timers
Tick resolution: 1 ms to 100 ms
Schedule rate: 10^6 / sec
Cancel rate: 10^6 / sec (most timers are cancelled before firing)

Clarifying Questions

Tick resolution — fixed at construction, or adaptive? (Fixed.)
Time source: caller-driven or wall clock? (Caller-driven: the caller invokes tick() once per tick interval, which keeps tests deterministic and lets us simulate time.)
Are callbacks fired on the tick thread, or queued? (On the tick thread, which is simpler; slow callbacks make tick slow, so document it.)
What’s the max delay? (Single-level: just under slots × tick; hierarchical: just under slots^levels × tick.)

Examples

wheel = TimerWheel(slots=60, tick_seconds=1.0)   # 1-second tick, 60-second range
fired = []
wheel.schedule(5, lambda: fired.append("5s"))
wheel.schedule(10, lambda: fired.append("10s"))

# Simulate 15 one-second ticks
for _ in range(15):
    wheel.tick()
# After 15 ticks, fired == ["5s", "10s"]

Initial Brute Force

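A min-heap of (deadline, id, callback). Schedule = O(log N). Tick = O(K log N) for K firings. Cancel = O(N) to find and remove the entry; O(1) with lazy deletion (mark it cancelled and skip it when it pops); or O(log N) with an indexed heap that tracks each entry's position.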

Brute Force Complexity

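At 10^6 schedules/sec with N=10^7, each schedule costs about log2(10^7) ≈ 23 comparisons: feasible, but at the edge. The fatal weakness is cancel: O(N) without indexing. Lazy deletion makes cancel cheap, but a cancelled entry stays in the heap until its deadline pops; since most timers are cancelled long before their deadline, memory fills with dead entries and every heap operation pays for them.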
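A sketch of the heap baseline with lazy deletion, using ticks instead of seconds for simplicity (HeapTimer is a hypothetical name). cancel is O(1) because it only forgets the callback; the heap entry survives until its deadline pops, which is exactly the memory problem described above.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple


class HeapTimer:
    """Brute-force baseline: min-heap of (deadline_tick, id) with lazy cancellation."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int]] = []
        self._live: Dict[int, Callable[[], None]] = {}   # only timers not yet cancelled/fired
        self._ids = itertools.count(1)
        self._now = 0

    def schedule(self, delay_ticks: int, callback: Callable[[], None]) -> int:
        tid = next(self._ids)
        heapq.heappush(self._heap, (self._now + max(1, delay_ticks), tid))   # O(log N)
        self._live[tid] = callback
        return tid

    def cancel(self, tid: int) -> bool:
        # O(1): drop the callback only; the heap entry stays until its deadline pops
        return self._live.pop(tid, None) is not None

    def tick(self) -> None:
        self._now += 1
        while self._heap and self._heap[0][0] <= self._now:
            _, tid = heapq.heappop(self._heap)                               # O(log N)
            callback = self._live.pop(tid, None)
            if callback is not None:           # None means it was cancelled
                callback()

    def heap_size(self) -> int:
        return len(self._heap)
```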

Optimization Path

Single-level wheel: slots = circular array of bucket lists. schedule(delay): compute slot = (cursor + delay // tick) % slots, append to that slot’s list. tick: advance cursor, fire and clear slots[cursor]. Cancel: remove the timer from its slot’s list (O(1) if you store back-pointers).

Hierarchical: when delay > slots, place in higher-level wheel. When the lower wheel completes a full revolution, the next slot of the higher wheel is “cascaded” — its timers are re-bucketed into the lower wheel.

Final Expected Approach

Single-level: doubly-linked lists at each slot for O(1) cancel (timer holds prev/next pointers). Cursor advances on tick; fire all timers in the new slot; clear the slot.

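Hierarchical: stack several wheels whose slot counts multiply to the total range. The classic Linux kernel wheel (before the 4.8 rework) used five levels of [256, 64, 64, 64, 64] slots, i.e. 2^32 ticks; the implementation below uses the same slot count at every level for simplicity. The lowest wheel advances one slot per tick (e.g. every 1 ms); each time it wraps, the next level advances by one slot, and that slot is cascaded: its timers are re-bucketed into lower wheels based on their remaining delay.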

Data Structures Used

Structure	Purpose
Array of doubly-linked lists	wheel slots
`dict[timer_id, Timer]`	O(1) cancel lookup
`cursor: int`	current slot
Multiple wheels	hierarchical

Correctness Argument

Single-level firing: a timer scheduled at now + delay is placed in slot (cursor + delay // tick) % slots. tick is called once per tick time unit; after delay // tick calls, the cursor reaches the timer’s slot and fires it. Provided delay < slots * tick, this is exact.

Cancel: O(1) splice in the doubly-linked list, O(1) lookup via dict.

Hierarchical correctness: when the lower wheel completes a revolution (cursor wraps), the next upper-wheel slot is cascaded: each timer in it has its remaining delay computed against the new “now” and is placed in the appropriate lower-wheel slot. Because the cascade happens just before the next revolution, the timer fires at the same wall-clock time it would have in a single-large-wheel implementation.

Complexity

schedule: O(1)
cancel: O(1)
tick: O(K) for K firings, plus amortized O(1) cascade
Space: O(slots + active timers)

Implementation Requirements

import itertools
from typing import Callable, Optional

class _Timer:
    __slots__ = ("tid", "deadline_tick", "callback", "prev", "next", "slot")
    def __init__(self, tid, deadline_tick, callback):
        self.tid = tid
        self.deadline_tick = deadline_tick
        self.callback = callback
        self.prev = self.next = None
        self.slot: Optional["_Bucket"] = None  # back-ref to the owning bucket for O(1) cancel


class _Bucket:
    __slots__ = ("head", "tail")
    def __init__(self):
        self.head = _Timer(None, None, None)   # sentinel
        self.tail = _Timer(None, None, None)
        self.head.next = self.tail; self.tail.prev = self.head

    def append(self, t: _Timer) -> None:
        t.prev = self.tail.prev; t.next = self.tail
        self.tail.prev.next = t; self.tail.prev = t
        t.slot = self

    def remove(self, t: _Timer) -> None:
        t.prev.next = t.next; t.next.prev = t.prev
        t.slot = None; t.prev = t.next = None

    def drain(self) -> list[_Timer]:
        out = []
        n = self.head.next
        while n is not self.tail:
            nxt = n.next
            n.prev = n.next = None; n.slot = None
            out.append(n); n = nxt
        self.head.next = self.tail; self.tail.prev = self.head
        return out


class TimerWheel:
    def __init__(self, slots: int = 256, tick_seconds: float = 0.001):
        self._slots = [_Bucket() for _ in range(slots)]
        self._n_slots = slots
        self._tick = tick_seconds
        self._cursor = 0
        self._current_tick = 0
        self._timers: dict[int, _Timer] = {}
        self._next_id = itertools.count(1)

    def schedule(self, delay_seconds: float, callback: Callable) -> int:
        # round(), not int(): float division can land just below a whole tick count
        ticks = max(1, round(delay_seconds / self._tick))
        if ticks >= self._n_slots:
            raise ValueError("delay exceeds wheel range; use HierarchicalTimerWheel")
        deadline = self._current_tick + ticks
        slot = deadline % self._n_slots
        t = _Timer(next(self._next_id), deadline, callback)
        self._slots[slot].append(t)
        self._timers[t.tid] = t
        return t.tid

    def cancel(self, timer_id: int) -> bool:
        t = self._timers.pop(timer_id, None)
        if t is None or t.slot is None:
            return False
        t.slot.remove(t)
        return True

    def tick(self) -> None:
        self._current_tick += 1
        self._cursor = (self._cursor + 1) % self._n_slots
        bucket = self._slots[self._cursor]
        for t in bucket.drain():
            self._timers.pop(t.tid, None)
            try: t.callback()
            except Exception: pass


class HierarchicalTimerWheel:
    """`levels` wheels of `slots_per_level` slots; a slot at level k spans slots_per_level**k ticks."""
    def __init__(self, levels: int = 4, slots_per_level: int = 256,
                 tick_seconds: float = 0.001):
        self._levels = [
            [_Bucket() for _ in range(slots_per_level)] for _ in range(levels)
        ]
        self._cursors = [0] * levels
        self._n = slots_per_level
        self._tick = tick_seconds
        self._current_tick = 0
        self._timers: dict[int, _Timer] = {}   # id -> timer; t.slot is its current bucket
        self._next_id = itertools.count(1)

    def schedule(self, delay_seconds: float, callback: Callable) -> int:
        ticks = max(1, round(delay_seconds / self._tick))
        t = _Timer(next(self._next_id), self._current_tick + ticks, callback)
        self._place(t)                 # raises before the timer is registered
        self._timers[t.tid] = t
        return t.tid

    def _place(self, t: _Timer) -> None:
        ticks_from_now = t.deadline_tick - self._current_tick
        # Find the lowest level that can hold this delay.
        capacity = self._n
        level = 0
        while ticks_from_now >= capacity and level < len(self._levels) - 1:
            level += 1; capacity *= self._n
        if ticks_from_now >= capacity:
            raise ValueError("delay exceeds hierarchical range")
        # Slot at this level (one slot covers per_slot ticks):
        per_slot = capacity // self._n
        slot = (self._cursors[level] + ticks_from_now // per_slot) % self._n
        # Re-use the same _Timer object on cascade so its id stays cancellable.
        self._levels[level][slot].append(t)

    def cancel(self, timer_id: int) -> bool:
        t = self._timers.pop(timer_id, None)
        if t is None or t.slot is None:
            return False
        t.slot.remove(t)
        return True

    def tick(self) -> None:
        self._current_tick += 1
        # Advance level 0
        self._cursors[0] = (self._cursors[0] + 1) % self._n
        # Cascade BEFORE firing: a timer re-placed with 0 ticks left lands in
        # the level-0 slot fired below, not one full revolution later.
        for lvl in range(1, len(self._levels)):
            if self._cursors[lvl - 1] != 0:
                break                      # the lower wheel did not wrap
            self._cursors[lvl] = (self._cursors[lvl] + 1) % self._n
            for t in self._levels[lvl][self._cursors[lvl]].drain():
                self._place(t)
        # Fire level-0 current slot
        for t in self._levels[0][self._cursors[0]].drain():
            self._timers.pop(t.tid, None)
            try: t.callback()
            except Exception: pass         # don't let one bad callback stop the wheel

Tests

import unittest

class TestTimerWheel(unittest.TestCase):
    def test_basic(self):
        w = TimerWheel(slots=10, tick_seconds=1.0)
        fired = []
        w.schedule(3, lambda: fired.append("a"))
        w.schedule(5, lambda: fired.append("b"))
        for _ in range(3): w.tick()
        self.assertEqual(fired, ["a"])
        for _ in range(2): w.tick()
        self.assertEqual(fired, ["a", "b"])

    def test_cancel(self):
        w = TimerWheel(slots=10, tick_seconds=1.0)
        fired = []
        tid = w.schedule(2, lambda: fired.append("x"))
        self.assertTrue(w.cancel(tid))
        for _ in range(5): w.tick()
        self.assertEqual(fired, [])

    def test_hierarchical_long_delay(self):
        w = HierarchicalTimerWheel(levels=3, slots_per_level=4, tick_seconds=1.0)
        # Range = 4 * 4 * 4 = 64 ticks
        fired = []
        w.schedule(50, lambda: fired.append("late"))
        for _ in range(50): w.tick()
        self.assertEqual(fired, ["late"])

Follow-up Questions

(11) Configuration knobs? slots, tick_seconds, levels (for hierarchical). Tick resolution trades scheduling granularity against CPU cost: 1 ms means 1000 ticks/sec, which is fine; 100 µs means 10,000 ticks/sec, which adds noticeable CPU. Slots per level: 256 is a good default (a slot index is exactly one byte of the expiry, and the bucket array stays small enough to be cache-resident).

(7) Backpressure? Timer firing is on the tick thread. If callbacks are slow, ticks fall behind real time; subsequent timers fire late. Mitigation: dispatch callbacks to a thread pool from tick. Document the soft real-time semantics (“fires within 1 tick of deadline barring slow callbacks”).
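One way to apply the thread-pool mitigation, sketched with the standard library's ThreadPoolExecutor; offloaded is a hypothetical helper. The tick thread only pays for a submit, at the cost of callbacks running concurrently and possibly out of deadline order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def offloaded(pool: ThreadPoolExecutor, callback: Callable[[], None]) -> Callable[[], None]:
    """Wrap a timer callback so the tick thread only pays for a submit()."""
    def fire() -> None:
        pool.submit(callback)   # runs later on a pool thread
    return fire
```

With the lab's wheel this would look like wheel.schedule(0.5, offloaded(pool, handler)). The Future returned by submit is discarded here, so an exception raised by the callback is silently lost; add a done-callback that logs if you need error reporting.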

(4) Observe / monitor? Active timer count (gauge), schedule rate (counter), cancel rate (counter), fire rate (counter), tick latency p99 (histogram — should be near zero; spikes mean slow callbacks).

(8) Partial failure? A callback that raises kills the tick if not caught. Always wrap with try/except and log.

(13) Poison pill? A callback that takes 1 second on a 1-ms wheel: all subsequent ticks pile up. Same mitigation as #7: dispatch to thread pool, or set a per-callback timeout.

Product Extension

The classic Linux kernel timer wheel (before 4.8) was a 5-level hierarchical wheel with 256/64/64/64/64 slots, covering 2^32 jiffies (about 49.7 days at a 1 ms tick); it drove kernel timers such as TCP retransmits, and Linux 4.8 replaced it with a non-cascading design. Netty’s HashedWheelTimer is the canonical Java implementation: a single-level wheel where “hashed” means each deadline is hashed (mod the slot count) into a slot holding a list of timers.

Language/Runtime Follow-ups

Python: this implementation. Pure Python struggles at sub-millisecond ticks; move the wheel into a C extension there. For low timer counts, asyncio’s loop.call_later is fine (its scheduler is a heap).
Java: Netty’s HashedWheelTimer (single-level) and JCTools for lock-free variants. ScheduledThreadPoolExecutor is heap-based and slower at scale.
Go: time.AfterFunc uses a heap internally (fine for low-N). For high-N, github.com/RussellLuo/timingwheel is a clean library.
C++: Folly’s HHWheelTimer is a hashed hierarchical wheel. Note that libuv and Boost.Asio timers are heap-based, not wheels.
JS/TS: Node groups timers into linked lists keyed by duration and keeps those lists in a priority queue ordered by next expiry. Not a wheel, but similarly cheap to insert when many timers share a duration.

Common Bugs

Forgetting modulo on slot indexing — array out of bounds when delay wraps around.
Cascade firing on every tick instead of only when wrapping. Catastrophic slowdown.
Forgetting to drain the bucket before clearing — callbacks lost.
Keeping timers in both the slot lists and the _timers dict but removing from only one: stale dict entries leak memory, and stale slot entries fire after cancel.
Scheduling delay=0: should fire on next tick, not “now”. Clamp to ≥ 1 tick.

Debugging Strategy

Print the wheel state (occupied slots and their counts) after each tick. For “missed firing” bugs, walk the slot indexing and verify the placement formula. For hierarchical cascade bugs, set very small slot counts (4 × 4 × 4) and hand-trace the wrap.

Mastery Criteria

Implemented single-level wheel in <20 minutes; hierarchical in <35.
All three tests pass.
Stated heap vs wheel tradeoff (heap O(log N), wheel O(1); wheel wins at high-N high-cancel).
Articulated cascade mechanism for hierarchical wheels.
Answered follow-ups #4, #7, #11, #13.
Identified that real-world systems (Linux, Netty) use this exact structure.

Lab 14 — Persistent KV Store

Goal

Implement an in-memory key-value store with TTL, snapshot + write-ahead-log (WAL) persistence, and crash recovery. After this lab you should be able to design and implement a Redis-shaped local store in under 40 minutes and articulate the durability tradeoffs.

Background Concepts

A persistent KV store has two storage paths:

In-memory state: a dict[key, value] (plus TTL bookkeeping). Hot path: O(1) read/write.
Durable state: writes go to a WAL (append-only log of mutations); periodically a snapshot captures the current state. On boot, recovery = load latest snapshot + replay WAL since.

The four standard durability levels (the last three differ only in fsync policy):

No persistence: pure in-memory. Lost on crash.
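WAL with no fsync: writes reach the OS page cache; survives a process crash once write() has returned (user-space buffers must be flushed first), but recent writes can be lost on power loss or a kernel crash.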
WAL fsync per write: durable per write, slow (one syscall per op).
WAL fsync every N ms: hybrid — bounded data loss in exchange for throughput.

Redis exposes the last three as appendfsync no, always, and everysec respectively (pure in-memory means both AOF and RDB snapshots are disabled). The interview answer is: explain the spectrum and pick a default that matches the workload.
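A sketch of the three WAL fsync policies using Redis's names; WalWriter is a hypothetical class. One simplification versus Redis: everysec here only fsyncs on the next append after a second has passed (Redis uses a background thread), so a quiet log can stay unsynced until close().

```python
import os
import time


class WalWriter:
    """Append-only log with a Redis-style fsync policy: "no", "always" or "everysec"."""

    def __init__(self, path: str, policy: str = "everysec") -> None:
        if policy not in ("no", "always", "everysec"):
            raise ValueError(f"unknown policy {policy!r}")
        self._policy = policy
        self._f = open(path, "ab")
        self._last_sync = time.monotonic()

    def append(self, record: bytes) -> None:
        self._f.write(record)
        self._f.flush()                         # user-space buffer -> OS page cache
        if self._policy == "always":
            os.fsync(self._f.fileno())          # on disk before append() returns
        elif self._policy == "everysec":
            now = time.monotonic()
            if now - self._last_sync >= 1.0:    # at most one fsync per second while writes arrive
                os.fsync(self._f.fileno())
                self._last_sync = now
        # "no": the OS decides when to write back

    def close(self) -> None:
        self._f.flush()
        if self._policy != "no":
            os.fsync(self._f.fileno())
        self._f.close()
```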

Interview Context

This problem hits at infrastructure / database companies and at any senior coding round that wants to test storage fundamentals. The interviewer wants: snapshot + WAL design, fsync tradeoff articulation, working code that survives a simulated crash.

Problem Statement

Implement KVStore:

put(key, value, ttl_seconds=None) — store with optional TTL.
get(key) -> value | None
delete(key) -> bool
snapshot(): write the current state to snapshot_path, then truncate the WAL.
Recovery: on construction with wal_path and snapshot_path, replay snapshot + WAL.

Constraints

10^7 keys
10^5 ops / second
Crash recovery within seconds
Bounded memory (configurable max)

Clarifying Questions

TTL granularity? (Seconds is fine for most workloads.)
fsync policy? (Configurable: none / per-write / every-N-ms.)
Snapshot format: text or binary? (Binary is faster, smaller; pick pickle or msgpack.)
Concurrent reads during snapshot? (Often a follow-up; by default, block all operations for the duration of the snapshot.)
Single-threaded or concurrent? (Single-threaded simplifies; lock for concurrency.)

Examples

kv = KVStore(wal_path="wal.log", snapshot_path="snap.pkl")
kv.put("user:1", {"name": "Alice"})
kv.put("session:42", "tok", ttl_seconds=60)
kv.get("user:1")             -> {"name": "Alice"}
# … crash …
kv2 = KVStore(wal_path="wal.log", snapshot_path="snap.pkl")
kv2.get("user:1")            -> {"name": "Alice"}    # recovered
kv2.snapshot()
# After snapshot: WAL is rotated (truncated)

Initial Brute Force

dict[key, value]. No persistence. Lost on crash.

Brute Force Complexity

Per op: O(1). Memory: O(N). Durability: zero.

Optimization Path

Add WAL: append (op, key, value, ttl) per mutation. On boot, replay. Add periodic snapshot: serialize full state; truncate WAL. Add TTL: a dict[key, expires_at] and lazy expiry on get.

The cost is: O(WAL append) per write (serialization + file write); O(snapshot size) per snapshot; O(WAL size) per recovery. Throughput depends on fsync policy.

Final Expected Approach

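In-memory dict for values + dict for TTL deadlines + an append-only binary log file. Operations: log first, then update memory (“write-ahead”). Snapshot: pickle the in-memory state to a temp file, fsync it, atomically rename it over the old snapshot, then truncate the WAL. Recovery: load the snapshot, then replay the WAL from the beginning (it holds only mutations since the last snapshot), tracking the byte offset of the last intact entry so a torn tail can be cut off.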

Data Structures Used

Structure	Purpose
`dict[K, V]`	hot key-value store
`dict[K, float]`	TTL deadlines
WAL file (append-only)	durability
Snapshot file (pickle)	bounded recovery time
`RLock`	serializes mutations and the background fsync under multi-threaded access

Correctness Argument

Durability: every mutation is appended to the WAL before updating the in-memory state. After the WAL append (and fsync, if configured), the mutation is durable. On crash, recovery replays exactly what was logged.

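Atomicity of put: each entry is appended with a single unbuffered write(), but POSIX guarantees atomicity only for pipe writes of up to PIPE_BUF bytes; on a regular file a crash can still leave a torn entry at the tail, so recovery must stop at the first entry that fails to decode. Snapshot is atomic via write to tmp; fsync tmp; rename(tmp, snap). On POSIX, also fsync the containing directory so the rename itself survives a power loss.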
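A minimal sketch of the write-tmp, fsync, rename sequence, assuming a pickle-encoded snapshot; the function name `atomic_write_pickle` is hypothetical. It uses `os.replace`, which is atomic on POSIX and, unlike `os.rename`, also overwrites an existing target on Windows, and on POSIX it fsyncs the parent directory so the rename itself is durable.

```python
import os
import pickle


def atomic_write_pickle(path: str, obj) -> None:
    """Write obj to path so a reader sees either the old file or the new one, never a torn mix."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"                       # same directory => same filesystem => atomic rename
    with open(tmp, "wb") as f:
        pickle.dump(obj, f)
        f.flush()                             # Python buffer -> OS
        os.fsync(f.fileno())                  # OS page cache -> disk
    os.replace(tmp, path)                     # atomic on POSIX; overwrites the old snapshot
    if os.name == "posix":                    # make the directory entry (the rename) durable too
        dfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
```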

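Recovery correctness: applying the snapshot first, then replaying WAL entries in order, reconstructs the pre-crash state. If a crash lands between the snapshot rename and the WAL truncation, replay re-applies entries the snapshot already contains; because every entry is a full overwrite or a delete, applying them again yields the same state. The only loss is mutations that were in OS buffers but unsynced at crash time, bounded by the fsync policy.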
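A sketch of crash-tolerant replay, assuming the WAL is a plain concatenation of pickled entries as in this lab; `replay_wal` and its `apply` callback are hypothetical names. It records the file offset after each intact entry and truncates the file there, so a torn tail from a crash cannot hide entries appended after the restart.

```python
import os
import pickle
from typing import Callable


def replay_wal(path: str, apply: Callable[[object], None]) -> int:
    """Apply every intact entry in the WAL, cut off any torn tail, return the count applied."""
    if not os.path.exists(path):
        return 0
    applied = 0
    with open(path, "r+b") as f:
        good = 0                              # offset just past the last intact entry
        while True:
            try:
                entry = pickle.load(f)
            except (EOFError, pickle.UnpicklingError):
                break                         # clean end of file, or a write cut short by a crash
            apply(entry)
            applied += 1
            good = f.tell()
        f.truncate(good)                      # later appends must not land behind the garbage
    return applied
```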

TTL: lazy expiry on get (check now >= deadline, delete if so). This is correct as long as we never return a value past its TTL. Expired entries are removed when accessed; entries that are never read again need a periodic background sweeper (the implementation below only sweeps once, at recovery; see follow-up 9).

Complexity

put: O(1) memory + a disk append proportional to the entry's serialized size
get: O(1)
snapshot: O(N) state size
Recovery: O(snapshot + WAL since snapshot)

Implementation Requirements

import os, pickle, time, threading
from typing import Any, Optional

class KVStore:
    def __init__(self, wal_path: str = "kv.wal",
                 snapshot_path: str = "kv.snap",
                 fsync: str = "every_sec"):
        self._wal_path = wal_path
        self._snap_path = snapshot_path
        self._fsync = fsync     # "none" | "per_write" | "every_sec"
        self._data: dict = {}
        self._ttl: dict = {}
        self._lock = threading.RLock()
        self._wal_fp = None
        self._last_fsync = time.monotonic()
        self._recover()
        self._wal_fp = open(self._wal_path, "ab", buffering=0)
        if self._fsync == "every_sec":
            self._fsync_thread = threading.Thread(target=self._fsync_loop, daemon=True)
            self._fsync_thread.start()

    def _recover(self) -> None:
        # 1. Load snapshot if present
        if os.path.exists(self._snap_path):
            with open(self._snap_path, "rb") as f:
                self._data, self._ttl = pickle.load(f)
        # 2. Replay WAL since snapshot, remembering where the last intact entry ends
        if os.path.exists(self._wal_path):
            with open(self._wal_path, "r+b") as f:
                good = 0
                while True:
                    try:
                        entry = pickle.load(f)
                    except (EOFError, pickle.UnpicklingError):
                        break           # clean EOF, or a torn entry from a crash mid-append
                    self._apply(entry)
                    good = f.tell()
                f.truncate(good)        # drop the torn tail so new appends remain replayable
        # 3. Sweep keys that expired while the process was down
        now = time.time()
        for k in list(self._ttl):
            if self._ttl[k] <= now:
                self._data.pop(k, None); self._ttl.pop(k, None)

    def _apply(self, entry: dict) -> None:
        op = entry["op"]
        if op == "put":
            self._data[entry["k"]] = entry["v"]
            if entry.get("ttl") is not None:
                self._ttl[entry["k"]] = entry["ttl"]
            else:
                self._ttl.pop(entry["k"], None)
        elif op == "del":
            self._data.pop(entry["k"], None)
            self._ttl.pop(entry["k"], None)

    def _wal_write(self, entry: dict) -> None:
        buf = pickle.dumps(entry)
        self._wal_fp.write(buf)
        if self._fsync == "per_write":
            self._wal_fp.flush()
            os.fsync(self._wal_fp.fileno())

    def _fsync_loop(self) -> None:
        while True:
            time.sleep(1.0)
            with self._lock:
                if self._wal_fp is None:   # store was closed: stop the background thread
                    return
                self._wal_fp.flush()
                os.fsync(self._wal_fp.fileno())

    def put(self, key, value, ttl_seconds: Optional[float] = None) -> None:
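        # Absolute wall-clock deadline so it survives a restart; "is not None" keeps ttl_seconds=0 meaningful
        deadline = (time.time() + ttl_seconds) if ttl_seconds is not None else None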
        with self._lock:
            self._wal_write({"op": "put", "k": key, "v": value, "ttl": deadline})
            self._data[key] = value
            if deadline is not None:
                self._ttl[key] = deadline
            else:
                self._ttl.pop(key, None)

    def get(self, key) -> Any:
        with self._lock:
            deadline = self._ttl.get(key)
            if deadline is not None and time.time() >= deadline:
                self._wal_write({"op": "del", "k": key})
                self._data.pop(key, None); self._ttl.pop(key, None)
                return None
            return self._data.get(key)

    def delete(self, key) -> bool:
        with self._lock:
            existed = key in self._data
            self._wal_write({"op": "del", "k": key})
            self._data.pop(key, None); self._ttl.pop(key, None)
            return existed

    def snapshot(self) -> None:
        with self._lock:
            tmp = self._snap_path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump((self._data, self._ttl), f)
                f.flush(); os.fsync(f.fileno())
            os.replace(tmp, self._snap_path)   # atomic on POSIX; unlike rename, also overwrites on Windows
            # Rotate WAL
            self._wal_fp.close()
            open(self._wal_path, "wb").close()
            self._wal_fp = open(self._wal_path, "ab", buffering=0)

    def close(self) -> None:
        with self._lock:
            if self._wal_fp:
                self._wal_fp.flush(); os.fsync(self._wal_fp.fileno())
                self._wal_fp.close(); self._wal_fp = None

Tests

import unittest, tempfile, os, time

class TestKV(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.mkdtemp()
        self.wal = os.path.join(self.tmp, "wal.log")
        self.snap = os.path.join(self.tmp, "snap.pkl")

    def tearDown(self):
        import shutil; shutil.rmtree(self.tmp)

    def test_basic(self):
        kv = KVStore(self.wal, self.snap, fsync="none")
        kv.put("a", 1); kv.put("b", "two")
        self.assertEqual(kv.get("a"), 1)
        self.assertEqual(kv.get("b"), "two")
        kv.delete("a")
        self.assertIsNone(kv.get("a"))
        kv.close()

    def test_ttl(self):
        kv = KVStore(self.wal, self.snap, fsync="none")
        kv.put("k", "v", ttl_seconds=0.1)
        self.assertEqual(kv.get("k"), "v")
        time.sleep(0.15)
        self.assertIsNone(kv.get("k"))
        kv.close()

    def test_recovery_from_wal(self):
        kv = KVStore(self.wal, self.snap, fsync="per_write")
        kv.put("x", "y")
        kv.close()
        # Restart: a fresh instance must rebuild state from the WAL alone
        kv2 = KVStore(self.wal, self.snap, fsync="none")
        self.assertEqual(kv2.get("x"), "y")
        kv2.close()

    def test_snapshot_rotates_wal(self):
        kv = KVStore(self.wal, self.snap, fsync="none")
        for i in range(100):
            kv.put(f"k{i}", i)
        wal_size_before = os.path.getsize(self.wal)
        kv.snapshot()
        wal_size_after = os.path.getsize(self.wal)
        self.assertGreater(wal_size_before, wal_size_after)
        kv.close()
        # Recover
        kv2 = KVStore(self.wal, self.snap, fsync="none")
        self.assertEqual(kv2.get("k99"), 99)
        kv2.close()

Follow-up Questions

(2) Persist state across restarts? That’s what we built. The fsync levels and their tradeoffs are the answer-bearing detail: per_write (durable per op, slow); every_sec (up to about one second of writes lost, fast; Redis's default appendfsync setting); none (survives a process crash, but writes the OS has not yet flushed are lost on power loss; fastest).

(10) Consistency model? Linearizable in a single process under the lock. Across processes (or replicas), this becomes a consensus problem — Raft / Paxos. The KV store is the data plane; consensus is the control plane.

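(8) Partial failure? A crash mid-write leaves a partial entry at the end of the WAL file (the file is opened with buffering=0, so nothing is held in a Python-level buffer). On recovery, pickle.load raises EOFError or pickle.UnpicklingError at that entry and replay stops there. The torn tail must also be truncated before new appends; otherwise every entry written after the restart sits behind the junk and is skipped on the next recovery. A corrupted entry that still happens to unpickle is caught only by a per-entry checksum (CRC32).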
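One way to add the per-entry checksum is a length-prefixed frame. This sketch assumes a header of two little-endian unsigned 32-bit integers (payload length, CRC32 of the payload), a layout chosen for illustration rather than taken from any particular store; `encode_frame` and `decode_frames` are hypothetical names. Decoding stops at the first frame that is short or fails its checksum, which is where a real recovery would truncate.

```python
import struct
import zlib
from typing import Iterator

_HEADER = struct.Struct("<II")   # (payload length, CRC32 of payload), little-endian


def encode_frame(payload: bytes) -> bytes:
    return _HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_frames(buf: bytes) -> Iterator[bytes]:
    """Yield intact payloads in order; stop at the first short or corrupt frame."""
    off = 0
    while off + _HEADER.size <= len(buf):
        length, crc = _HEADER.unpack_from(buf, off)
        start = off + _HEADER.size
        end = start + length
        if end > len(buf):
            return                    # torn tail: header written, payload cut short
        payload = buf[start:end]
        if zlib.crc32(payload) != crc:
            return                    # corruption: stop replay here
        yield payload
        off = end
```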

(9) Eviction / cleanup? TTL provides automatic cleanup, but expired keys still in memory consume RAM until accessed. Background sweeper: periodically scan _ttl for expired keys and delete them. To enforce the memory bound, add an LRU/LFU policy on top: when memory exceeds the threshold, evict by policy.
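A sketch of both cleanup mechanisms on an in-memory map, assuming a key-count bound (`max_keys`) as a stand-in for a byte-based memory limit; `BoundedTTLCache` and `sweep(budget)` are hypothetical names. LRU order comes from `OrderedDict.move_to_end` and `popitem(last=False)`. The sweep bounds deletions per call but still scans TTL entries until it finds enough; a production store would bound the scan itself (Redis, for instance, samples keys that carry a TTL).

```python
import time
from collections import OrderedDict
from itertools import islice
from typing import Any, Optional


class BoundedTTLCache:
    """In-memory map with TTL deadlines, a bounded sweep, and LRU eviction past max_keys."""

    def __init__(self, max_keys: int, clock=time.time):
        self._data: "OrderedDict[Any, Any]" = OrderedDict()   # iteration order = recency, oldest first
        self._ttl: dict = {}                                  # key -> absolute deadline
        self._max_keys = max_keys
        self._clock = clock                                   # injectable for deterministic tests

    def put(self, key, value, ttl_seconds: Optional[float] = None) -> None:
        self._data[key] = value
        self._data.move_to_end(key)                   # newest = most recently used
        if ttl_seconds is not None:
            self._ttl[key] = self._clock() + ttl_seconds
        else:
            self._ttl.pop(key, None)
        while len(self._data) > self._max_keys:       # enforce the memory bound
            old, _ = self._data.popitem(last=False)   # evict the least recently used key
            self._ttl.pop(old, None)

    def get(self, key):
        deadline = self._ttl.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)                 # lazy expiry on access
            self._ttl.pop(key, None)
            return None
        if key in self._data:
            self._data.move_to_end(key)               # a read counts as a use
        return self._data.get(key)

    def sweep(self, budget: int = 100) -> int:
        """Delete up to `budget` expired keys; return how many were removed."""
        now = self._clock()
        expired = list(islice((k for k, d in self._ttl.items() if d <= now), budget))
        for k in expired:
            self._data.pop(k, None)
            self._ttl.pop(k, None)
        return len(expired)
```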

(11) Configuration knobs? fsync policy, snapshot_interval, max_memory_bytes, eviction_policy. Knobs not to expose: pickle protocol (use latest).

(12) Shutdown? Graceful: flush WAL, fsync, close file. The close method ensures durability up to the last write.

Product Extension

Redis (RDB = snapshot, AOF = WAL); RocksDB / LevelDB (LSM trees with WAL + memtable + SSTables); Memcached (no persistence: pure cache); etcd (snapshot + WAL + Raft for consensus) and ZooKeeper (snapshots + transaction log + the Zab atomic-broadcast protocol). The pattern you wrote here is the foundation; SSTables + LSM trees are the next-level optimization for write-heavy and range-query workloads.

Language/Runtime Follow-ups

Python: pickle is fine for the snapshot format in a lab, but it is not version-safe and unpickling untrusted data can execute arbitrary code; for production, use msgpack or Protocol Buffers.
Java: RandomAccessFile for the WAL; Java serialization for snapshot (also fragile — prefer Avro or Protobuf).
Go: bufio.Writer over os.File; gob for snapshot. BadgerDB and BoltDB are production-grade Go KV stores.
C++: write your own framing or use Cap’n Proto. RocksDB is the canonical reference (C++ implementation of LSM + WAL).
JS/TS: rare in Node; use level (LevelDB binding) instead of rolling your own.

Common Bugs

Updating in-memory state before WAL append: lose the durability guarantee.
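fsync per write that misses data: calling os.fsync() without flush() first (bytes still in Python's user-space buffer are not covered), or fsyncing a different fd than the one written to.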
Snapshot writes to the actual snapshot path before fsync — if crash mid-write, snapshot is corrupt. Always write-tmp + fsync + rename.
WAL not rotated on snapshot — recovery replays the entire history every time, even after snapshot.
TTL stored as duration instead of absolute time — restart shifts deadlines.

Debugging Strategy

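For “lost data after restart” bugs: read the WAL back with pickle.load in a loop and check that the missing key was logged. For corrupt-snapshot bugs: check that the temp file lives in the same directory (hence on the same filesystem) as the snapshot. os.rename across filesystems fails with OSError (EXDEV), and copy-based fallbacks such as shutil.move are not atomic.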

Mastery Criteria

Implemented KVStore with WAL + snapshot + recovery in <40 minutes.
All four tests pass.
Articulated three fsync levels and their tradeoffs without prompting.
Stated WAL-before-memory as the durability invariant.
Answered follow-ups #2, #8 (partial-write tolerance), #9, #10 (single vs replicated consistency), #11.
Compared snapshot+WAL vs LSM tree at a high level.

Lab 15 — Retry With Exponential Backoff and Jitter

Goal

Implement a reusable retry(fn, policy) primitive that retries a callable on failure with exponential backoff plus decorrelated jitter, bounded by max attempts and total deadline, with an explicit retryable-error predicate so non-retryable errors fail fast. After this lab you should be able to write a production-shaped retry helper from a blank screen in <15 minutes and articulate why naive sleep(2 ** attempt) is wrong in <60 seconds.

Background Concepts

A retry primitive has four orthogonal knobs: (a) how many times to retry (max attempts and / or total deadline), (b) how long to wait between attempts (the backoff schedule), (c) which errors are retryable (a predicate), and (d) what to do on final failure (raise, return a sentinel, surface diagnostics). The non-trivial knob is (b).

Naive exponential backoff, wait = base * 2 ** attempt, causes a thundering herd: when a downstream service recovers, every client that failed at the same moment wakes at the same moment and re-overloads it. The fix is jitter: randomize the wait. The two industry-standard schedules are full jitter (wait = uniform(0, min(max_delay, base * 2 ** attempt))) and decorrelated jitter (wait = min(max_delay, uniform(base, prev_wait * 3))). Both break up the herd; this lab defaults to decorrelated jitter, which derives each wait from the previous wait rather than from the attempt index.
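The three schedules written out as plain functions, as a sketch for checking the formulas; the function names and the explicit `random.Random` parameter (for reproducibility) are assumptions. The `__main__` demo counts distinct wake-up times for 1,000 clients that all failed at the same instant: without jitter every client picks the same wait, which is the herd.

```python
import random


def no_jitter(attempt: int, base: float, cap: float) -> float:
    return min(cap, base * 2 ** attempt)


def full_jitter(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    # Uniform over [0, capped exponential]: spreads clients across the whole interval.
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def decorrelated_jitter(prev: float, base: float, cap: float, rng: random.Random) -> float:
    # Grows from the previous wait instead of the attempt index, then caps.
    return min(cap, rng.uniform(base, max(prev, base) * 3))


if __name__ == "__main__":
    rng = random.Random(0)
    # 1,000 clients failed at the same instant and now pick their wait for attempt=3.
    plain = {no_jitter(3, 0.1, 30.0) for _ in range(1000)}
    jittered = {full_jitter(3, 0.1, 30.0, rng) for _ in range(1000)}
    print(len(plain), "distinct wake-up time(s) without jitter")
    print(len(jittered), "distinct wake-up times with full jitter")
```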

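The total deadline matters as much as the attempt count. A 5-attempt schedule with base=1s and max_delay=30s has four sleeps; with decorrelated jitter they can total 3 + 9 + 27 + 30 = 69 s, before counting the time each failed call itself takes. That is unacceptable for a request-path retry, so production retry helpers should always take a deadline.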
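A small calculator for the worst-case total sleep, as a sketch for checking bounds such as the 5-attempt, base=1s, max_delay=30s schedule; `worst_case_sleep` is a hypothetical helper. It takes the upper end of each random draw and excludes the time spent inside `fn` itself.

```python
def worst_case_sleep(max_attempts: int, base: float, cap: float, scheme: str) -> float:
    """Upper bound on total sleep time across a retry schedule (fn's run time excluded).

    There is one sleep between consecutive attempts, so max_attempts - 1 sleeps.
    """
    total, prev = 0.0, base
    for attempt in range(max_attempts - 1):
        if scheme in ("none", "full"):
            w = min(cap, base * 2 ** attempt)   # full jitter's upper bound equals the no-jitter wait
        elif scheme == "decorrelated":
            w = min(cap, prev * 3)               # largest value uniform(base, prev * 3) can return
            prev = w
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        total += w
    return total
```

For the 5-attempt schedule with base=1s and a 30s cap, this gives 15 s without jitter (1 + 2 + 4 + 8) and 69 s for decorrelated jitter.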

Interview Context

This is a 20-minute warmup at Stripe, Uber, Cloudflare, and any team whose service makes downstream calls. It’s also a frequent follow-up to the rate-limiter and circuit-breaker labs. Candidates who write for i in range(5): try: return fn() except: time.sleep(2 ** i) get partial credit; candidates who name jitter, deadline, retryable-error predicate, and the relationship to the circuit breaker (Lab 16) get a strong signal.

Problem Statement

Implement retry(fn, policy), where policy carries max_attempts, base_delay, max_delay, deadline_s, is_retryable, and jitter (default 'decorrelated'), that calls fn() repeatedly until it succeeds or the policy gives up. On non-retryable exceptions, fail immediately. On retryable exceptions, sleep according to the schedule and try again. On exceeding max_attempts or deadline_s, raise the last exception wrapped in a RetryExhausted.

Constraints

max_attempts ≥ 1 (1 means “no retries”; the function is called exactly once).
base_delay > 0, max_delay ≥ base_delay.
deadline_s may be None (no deadline) or a positive float (wall-clock seconds from retry() invocation).
is_retryable: Exception -> bool must be a pure function.
The implementation must not busy-spin and must respect both the per-attempt cap and the deadline (whichever fires first).

Clarifying Questions

Is the deadline measured from retry() invocation or from the first failure? (From invocation — simpler reasoning.)
Should fn() be called at least once even if the deadline is already past at start? (Yes — at least one attempt.)
Should we sleep after the final attempt? (No — pointless.)
Does is_retryable apply to the last attempt’s exception too? (Yes, to every failure: a non-retryable exception is re-raised unchanged at any attempt; a retryable one on the final attempt is wrapped in RetryExhausted.)
Synchronous or async? (Both — implement sync first, async variant in follow-ups.)
Should we surface the attempt count and total elapsed time in the wrapped exception? (Yes — operational visibility.)

Examples

retry(lambda: http_get(url),
      RetryPolicy(max_attempts=5, base_delay=0.1, max_delay=10, deadline_s=30,
                  is_retryable=lambda e: isinstance(e, (TimeoutError, ConnectionError))))
# → returns the response if any attempt succeeds within 30s and 5 tries.
# → raises RetryExhausted (e.g. attempts=5, elapsed=12.30s) if every attempt fails with a retryable error.
# → raises ValueError immediately if fn() raises ValueError (non-retryable).

Initial Brute Force

def retry_naive(fn, max_attempts):
    for i in range(max_attempts):
        try:
            return fn()
        except Exception:
            if i == max_attempts - 1:
                raise
            time.sleep(2 ** i)

This is what most candidates write first. It has four bugs: no jitter (thundering herd), no deadline (unbounded wait), no retryable predicate (it retries programming errors such as TypeError), and no max_delay cap.

Brute Force Complexity

Time: dominated by sleeps. There is no sleep after the final attempt, so the worst case is 2^0 + 2^1 + … + 2^(max_attempts−2) = 2^(max_attempts−1) − 1 seconds. For max_attempts=10, that is 511 s, about 8.5 minutes. Space: O(1).

Optimization Path

Add (1) max_delay cap → bounds per-attempt sleep, (2) deadline_s total cap → bounds end-to-end blocking, (3) is_retryable predicate → fast-fails on programmer errors, (4) jitter → spreads herd, (5) structured exception with diagnostics → operational legibility. Each addition is a one-knob change; together they take the primitive from “buggy in production” to “shippable”.

Final Expected Approach

Loop up to max_attempts times. Record the monotonic start time. On each attempt, call fn(). On success, return the value. On failure, check is_retryable; if false, re-raise. If we’re at the last attempt or past the deadline, raise RetryExhausted. Otherwise compute the next sleep using the chosen jitter scheme, clip it to the remaining deadline so we don’t oversleep, and sleep. In production, log each attempt (the implementation below omits logging for brevity).

Data Structures Used

A monotonic clock reference (time.monotonic()) to compute deadlines — wall-clock can jump.
A small RetryExhausted exception class carrying attempts, elapsed, last_exception.
An optional Logger for per-attempt diagnostics (don’t print; inject a logger).

Correctness Argument

We make at most max_attempts calls to fn (loop bound). We sleep between attempts but never after the final one (the loop breaks before sleeping on the last attempt). We respect deadline_s by computing remaining = deadline_s - elapsed and clipping the sleep to it; if elapsed has already reached deadline_s, we stop without sleeping. Non-retryable exceptions short-circuit by re-raising before the sleep. On exhaustion, the exception we surface wraps the last underlying failure with diagnostics; a non-retryable exception propagates unchanged.

Complexity

Aspect	Cost
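Wall-clock	sleeping bounded by `min(deadline_s, Σ wait_i)`, plus `fn`’s own run time (the final call can overrun the deadline by one call’s duration)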
CPU per failed attempt	O(1) plus `fn`’s own cost
Memory	O(1)

Implementation Requirements

A complete working implementation is required.

import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryExhausted(Exception):
    def __init__(self, message: str, attempts: int, elapsed: float, last_exception: BaseException):
        super().__init__(f"{message} (attempts={attempts}, elapsed={elapsed:.2f}s)")
        self.attempts = attempts
        self.elapsed = elapsed
        self.last_exception = last_exception


@dataclass
class RetryPolicy:
    max_attempts: int = 5
    base_delay: float = 0.1
    max_delay: float = 30.0
    deadline_s: Optional[float] = None
    jitter: str = "decorrelated"            # "decorrelated" | "full" | "none"
    is_retryable: Callable[[BaseException], bool] = lambda e: True

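    def __post_init__(self):
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        if self.base_delay <= 0 or self.max_delay < self.base_delay:
            raise ValueError("invalid delay bounds")
        # Fail at construction time instead of in the middle of a retry loop.
        if self.jitter not in ("decorrelated", "full", "none"):
            raise ValueError(f"unknown jitter scheme: {self.jitter}")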


def _next_wait(policy: RetryPolicy, attempt: int, prev_wait: float) -> float:
    if policy.jitter == "none":
        w = min(policy.base_delay * (2 ** attempt), policy.max_delay)
    elif policy.jitter == "full":
        cap = min(policy.base_delay * (2 ** attempt), policy.max_delay)
        w = random.uniform(0, cap)
    elif policy.jitter == "decorrelated":
        w = min(random.uniform(policy.base_delay, max(prev_wait, policy.base_delay) * 3),
                policy.max_delay)
    else:
        raise ValueError(f"unknown jitter scheme: {policy.jitter}")
    return w


def retry(fn: Callable[[], T], policy: RetryPolicy, *, sleep=time.sleep, clock=time.monotonic) -> T:
    start = clock()
    last_exc: Optional[BaseException] = None
    prev_wait = policy.base_delay
    for attempt in range(policy.max_attempts):
        try:
            return fn()
        except Exception as e:   # KeyboardInterrupt / SystemExit propagate and are never retried
            last_exc = e
            if not policy.is_retryable(e):
                raise
            if attempt == policy.max_attempts - 1:
                break
            elapsed = clock() - start
            if policy.deadline_s is not None and elapsed >= policy.deadline_s:
                break
            wait = _next_wait(policy, attempt, prev_wait)
            if policy.deadline_s is not None:
                wait = min(wait, max(0.0, policy.deadline_s - elapsed))
            if wait > 0:
                sleep(wait)
            prev_wait = wait
    elapsed = clock() - start
    raise RetryExhausted("retry exhausted", attempt + 1, elapsed, last_exc) from last_exc

sleep and clock are dependency-injected so tests do not have to wait real time.

Tests

def test_succeeds_first_try():
    assert retry(lambda: 42, RetryPolicy(max_attempts=3)) == 42

def test_succeeds_after_failures():
    n = {"i": 0}
    def fn():
        n["i"] += 1
        if n["i"] < 3: raise TimeoutError()
        return "ok"
    assert retry(fn, RetryPolicy(max_attempts=5, base_delay=0.001)) == "ok"
    assert n["i"] == 3

def test_non_retryable_short_circuits():
    n = {"i": 0}
    def fn():
        n["i"] += 1
        raise ValueError("bad")
    policy = RetryPolicy(is_retryable=lambda e: not isinstance(e, ValueError))
    try: retry(fn, policy)
    except ValueError: pass
    assert n["i"] == 1

def test_exhaustion_wraps_exception():
    def fn(): raise TimeoutError("nope")
    try:
        retry(fn, RetryPolicy(max_attempts=2, base_delay=0.001))
    except RetryExhausted as e:
        assert e.attempts == 2
        assert isinstance(e.last_exception, TimeoutError)
    else:
        raise AssertionError("expected RetryExhausted")

def test_deadline_respected():
    fake_time = [0.0]
    def fn(): raise TimeoutError()
    sleeps = []
    def fake_sleep(t): sleeps.append(t); fake_time[0] += t
    def fake_clock(): return fake_time[0]
    try:
        retry(fn, RetryPolicy(max_attempts=100, base_delay=1, deadline_s=5, jitter="none"),
              sleep=fake_sleep, clock=fake_clock)
    except RetryExhausted as e:
        assert e.elapsed <= 5.001
    else:
        raise AssertionError("expected RetryExhausted")

Follow-up Questions

How would you make it thread-safe? The function is reentrant — no shared state across calls. The injected sleep and clock should themselves be thread-safe (the stdlib ones are). Per-call state (attempt counter, prev_wait) is local. No locks needed.
How would you observe and monitor it? Emit (a) retry.attempts counter labeled by callsite and outcome (success_first_try, success_after_retry, exhausted, non_retryable), (b) retry.elapsed histogram, (c) retry.attempt_count histogram. Log per-attempt at DEBUG, per-final-failure at WARN.
How would you handle a poison-pill input? A request that always raises a retryable error wastes the deadline on every retry. Wrap repeated callers behind a circuit breaker (Lab 16); after N consecutive RetryExhausteds, open the breaker and fail fast for a cooldown period.
What configuration knobs would you expose? max_attempts, base_delay, max_delay, deadline_s, jitter strategy, is_retryable predicate. Defaults: 5 / 100ms / 30s / None / decorrelated / lambda e: True. Don’t expose internal multipliers (the 3× in decorrelated jitter) — they’re stable and tuning them in production is a smell.
How would you test it deterministically? Inject sleep and clock; advance fake time inside the fake sleep. Seed random for reproducible jitter. The test for the deadline above uses this pattern.
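What is the relationship to the circuit breaker? A retry without a circuit breaker is dangerous: if the downstream is fully down, every caller retries the full schedule, multiplying load. Two compositions are common. circuit_breaker(retry(fn)) counts each exhausted retry sequence as one breaker failure, and the breaker short-circuits further retries once it opens. retry(lambda: breaker.call(fn)) lets the breaker see every attempt; mark CircuitOpenError as non-retryable so an open breaker ends the retry loop immediately (Resilience4j's default ordering places Retry outside CircuitBreaker).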

Product Extension

Retry primitives are the workhorse of every microservice’s outbound RPC layer. AWS SDK, Google Cloud SDK, and gRPC all ship retry helpers built on capped exponential backoff with jitter; gRPC's retry policy, for example, draws each wait uniformly from zero up to min(initialBackoff × backoffMultiplier^(n−1), maxBackoff), which is full jitter, and all attempts share the call's deadline. The is_retryable predicate in production is the hardest knob: HTTP 5xx is usually retryable, 4xx usually is not, but 429 is retryable with Retry-After honored. Lift this complexity into the predicate.
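A sketch of lifting the HTTP rules into the predicate. `HTTPStatusError` is a hypothetical exception type standing in for whatever your HTTP client raises, and the retryable status set is a common choice rather than a standard. `retry_after_seconds` handles only the delta-seconds form of `Retry-After`; the header may also carry an HTTP date, which this sketch ignores, and header names are matched case-sensitively here for brevity.

```python
from typing import Mapping, Optional


class HTTPStatusError(Exception):
    """Hypothetical error type carrying a response status and headers."""

    def __init__(self, status: int, headers: Optional[Mapping[str, str]] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.headers = dict(headers or {})


# Assumed policy: request timeout, rate limiting, and transient server-side failures.
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}


def is_retryable_http(e: Exception) -> bool:
    if isinstance(e, (TimeoutError, ConnectionError)):
        return True                      # transport-level failures are usually transient
    if isinstance(e, HTTPStatusError):
        return e.status in RETRYABLE_STATUSES
    return False                         # unknown errors (likely bugs) fail fast


def retry_after_seconds(e: Exception) -> Optional[float]:
    """Server-requested delay from Retry-After, delta-seconds form only; None otherwise."""
    if not isinstance(e, HTTPStatusError):
        return None
    raw = e.headers.get("Retry-After")
    if raw is None or not raw.strip().isdigit():
        return None                      # absent, or the HTTP-date form (not handled here)
    return float(raw.strip())
```

A retry loop would use `retry_after_seconds` as a floor for the next wait, still clipped to the remaining deadline.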

Language/Runtime Follow-ups

Python: as above. For async, swap time.sleep for asyncio.sleep, make retry an async def, and await fn().
Java: use Resilience4j or Failsafe in production. Hand-rolled: a RetryPolicy builder, a Callable<T> argument, Thread.sleep (or ScheduledExecutorService.schedule in async).
Go: a function Retry(ctx context.Context, fn func() error, policy Policy) error. Use time.NewTimer so a ctx.Done() can cancel mid-sleep. Cancellation is the deadline mechanism.
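C++: std::chrono::steady_clock for the deadline. std::this_thread::sleep_for cannot be interrupted, so to support cancellation, wait on a std::condition_variable_any with a std::stop_token (C++20) and a timeout instead.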
JS/TS: await new Promise(r => setTimeout(r, ms)). The retry function is async. Use AbortSignal for the deadline.
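A sketch of the Python async variant, assuming decorrelated jitter and keyword arguments instead of the `RetryPolicy` object so the block stands alone; `retry_async` is a hypothetical name, and on exhaustion it re-raises the last exception rather than wrapping it in `RetryExhausted`. It measures time with the event loop's monotonic `loop.time()`, and because `asyncio.CancelledError` derives from `BaseException` (Python 3.8+), catching `Exception` never retries a cancellation.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_async(fn: Callable[[], Awaitable[T]], *, max_attempts: int = 5,
                      base_delay: float = 0.1, max_delay: float = 30.0,
                      deadline_s: Optional[float] = None,
                      is_retryable: Callable[[Exception], bool] = lambda e: True,
                      rng: Optional[random.Random] = None) -> T:
    if rng is None:
        rng = random.Random()
    loop = asyncio.get_running_loop()
    start = loop.time()                        # the event loop's monotonic clock
    prev = base_delay
    for attempt in range(max_attempts):
        try:
            return await fn()
        except Exception as e:                 # CancelledError is a BaseException: never caught here
            if not is_retryable(e) or attempt == max_attempts - 1:
                raise                          # non-retryable, or no attempts left
            elapsed = loop.time() - start
            if deadline_s is not None and elapsed >= deadline_s:
                raise                          # deadline spent: surface the last failure
            wait = min(max_delay, rng.uniform(base_delay, max(prev, base_delay) * 3))
            if deadline_s is not None:
                wait = min(wait, deadline_s - elapsed)   # never sleep past the deadline
            prev = wait
            await asyncio.sleep(wait)          # yields to the loop; cancellable mid-sleep
    raise AssertionError("unreachable")        # the loop always returns or raises
```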

Common Bugs

Sleeping after the final attempt — wastes wall-clock.
Using time.time() instead of time.monotonic(): the wall clock can jump, including backwards when NTP steps it, producing negative or inflated elapsed times; a negative computed wait passed to time.sleep() raises ValueError.
Catching BaseException and swallowing KeyboardInterrupt / SystemExit — never make these retryable. Either narrow the catch or have the predicate exclude them.
Computing the next wait before the deadline check — you sleep past the deadline. Always check elapsed first.
Forgetting to clip wait to remaining = deadline - elapsed — a 30s sleep when only 2s of deadline remain.
Not seeding random deterministically in tests — flaky test failures.

Debugging Strategy

When retries don’t fire: print is_retryable(e) for the actual exception; assert it returns True. When they fire too long: print attempt, wait, and clock() - start per attempt — the bug is almost always a missing deadline check or an uncapped jitter computation. When tests are flaky: confirm sleep is injected (no real sleeps in unit tests) and random.seed(0) at the top of the test.

Mastery Criteria

Wrote the brute force naive retry in <2 minutes from cold start.
Added max_delay, deadline_s, is_retryable, and jitter incrementally, justifying each.
Wrote both full-jitter and decorrelated-jitter formulas from memory.
Stated the difference between time.time() and time.monotonic() and which to use here.
Wrote deterministic tests using injected sleep and clock.
Articulated the retry+circuit-breaker composition in <60 seconds.
Solved this from a blank screen in <15 minutes including 5 unit tests.
Listed the four bugs in the naive for i in range: sleep(2**i) retry without prompting.

Lab 16 — Circuit Breaker

Goal

Implement a thread-safe circuit breaker with three states (CLOSED, OPEN, HALF_OPEN) that protects a downstream call by failing fast once the number of failures in a sliding time window reaches a threshold, then probes for recovery after a cooldown. After this lab you should be able to draw the state diagram, name every transition, write the implementation in <25 minutes, and answer “what’s the difference between a retry and a circuit breaker” in <30 seconds.

Background Concepts

A circuit breaker is the operational dual of a retry. A retry keeps trying until the downstream is probably up; a circuit breaker stops trying once the downstream is probably down. Without a breaker, every caller retries the full schedule and amplifies the outage. With one, callers fail fast for a cooldown window and only a single probe call is sent during recovery — preventing the retry storm that otherwise prolongs outages.

The three states:

CLOSED — normal operation; calls go through; failures are counted in a sliding window.
OPEN — the failure threshold was crossed; all calls are short-circuited with CircuitOpenError for cooldown_s seconds.
HALF_OPEN — cooldown elapsed; a single probe call is allowed. If it succeeds, transition to CLOSED and reset counters. If it fails, transition back to OPEN and start a fresh cooldown.
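Writing the diagram as data makes every transition explicit and checkable. This sketch mirrors the `State` enum used in this lab; the event names and the `next_state` helper are hypothetical labels for the triggers described above.

```python
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


# Every legal edge of the diagram, keyed by (current state, triggering event).
TRANSITIONS = {
    (State.CLOSED, "failure_threshold_reached"): State.OPEN,
    (State.OPEN, "cooldown_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "probe_succeeded"): State.CLOSED,
    (State.HALF_OPEN, "probe_failed"): State.OPEN,
}


def next_state(state: State, event: str) -> State:
    """Return the target state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None
```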

Two failure-counting windows are common: count-based (last N calls) and time-based (last T seconds). Time-based is preferred for low-traffic services because a count-based window only moves when calls arrive, so old failures can linger in it indefinitely. Both fit naturally on a deque: failure timestamps for the time-based window, the last N outcomes for the count-based one.
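A sketch of both windows on `collections.deque`; the class names are hypothetical. The time-based window stores failure timestamps and ages them out on read; the count-based window stores the last N outcomes with `maxlen=N` and keeps a running failure count, so both `count()` calls are amortized O(1). The count-based window changes only when a call is recorded, which is exactly why it can stay stale under low traffic.

```python
import time
from collections import deque


class TimeWindow:
    """Number of failures recorded in the last window_s seconds."""

    def __init__(self, window_s: float, clock=time.monotonic):
        self._window_s = window_s
        self._clock = clock
        self._failures: deque = deque()           # failure timestamps, oldest on the left

    def record_failure(self) -> None:
        self._failures.append(self._clock())

    def count(self) -> int:
        cutoff = self._clock() - self._window_s
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()              # each timestamp is popped at most once
        return len(self._failures)


class CountWindow:
    """Number of failures among the last n recorded calls (successes included)."""

    def __init__(self, n: int):
        self._outcomes: deque = deque(maxlen=n)   # True = failure; the oldest drops off automatically
        self._failures = 0

    def record(self, failed: bool) -> None:
        if len(self._outcomes) == self._outcomes.maxlen and self._outcomes[0]:
            self._failures -= 1                   # the outcome about to be evicted was a failure
        self._outcomes.append(failed)
        self._failures += int(failed)

    def count(self) -> int:
        return self._failures
```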

Interview Context

This is the canonical follow-up to Lab 15 (retry) and a top-15 practical problem at Stripe, Netflix, Uber, and any team with a microservice mesh. The Hystrix library popularized this pattern; its successor Resilience4j is the modern reference. Candidates often hand-roll only the state transitions and miss the half-open single-probe constraint — a clear signal of “knows the diagram, hasn’t operated one in production”.

Problem Statement

Implement CircuitBreaker(failure_threshold, window_s, cooldown_s) with method call(fn) that either calls fn() (and updates the breaker state from the result) or raises CircuitOpenError if the breaker is open. Internally track failure count over the last window_s seconds; transition to OPEN when the count reaches failure_threshold. After cooldown_s in OPEN, the next call enters HALF_OPEN and is the sole probe; success → CLOSED, failure → OPEN again.

Constraints

Thread-safe; multiple threads may call call() concurrently.
In HALF_OPEN, exactly one probe is in flight. Concurrent callers see CircuitOpenError until the probe completes.
failure_threshold ≥ 1, window_s > 0, cooldown_s > 0.
Successful calls in CLOSED decrement (or do not affect) the failure window — choose and document.

Clarifying Questions

Are timeouts counted as failures? (Default yes — they almost always indicate downstream unhealth.)
Are application errors (4xx vs 5xx) treated identically? (No — 4xx is the caller’s fault; only 5xx and timeouts should trip. Inject a is_failure(exception) predicate.)
What’s the recovery semantics: strict half-open (single probe) or “let N requests through”? (Single probe by default; if more are needed, make the number of half-open trial calls a configurable quota, e.g. RECOVERY_QUOTA.)
Do we need per-resource breakers or a global one? (Per-resource is correct — a breaker per downstream identity.)
Should successes in CLOSED reset the failure count? (Most implementations don’t reset; only the sliding window aging removes failures. Tunable.)

Examples

breaker = CircuitBreaker(failure_threshold=5, window_s=10, cooldown_s=30)
breaker.call(lambda: http_get(url))   # raises if downstream raises
# After 5 failures within 10s: state -> OPEN
breaker.call(...)                      # raises CircuitOpenError immediately for 30s
# After 30s cooldown: next call -> HALF_OPEN probe
# Probe success -> CLOSED, fresh window
# Probe failure -> OPEN, fresh 30s cooldown

Initial Brute Force

class NaiveBreaker:
    def __init__(self, threshold, cooldown_s):
        self.failures = 0
        self.opened_at = None
        self.threshold = threshold
        self.cooldown_s = cooldown_s
    def call(self, fn):
        if self.opened_at and time.time() - self.opened_at < self.cooldown_s:
            raise CircuitOpenError()
        try:
            r = fn()
            self.failures = 0
            return r
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()
            raise

This naive version has six bugs: not thread-safe; counts forever (no window aging); no half-open state (multiple probes after cooldown); resets failures on any success even if breaker just opened; uses wall-clock; treats every exception as a failure.

Brute Force Complexity

call() is O(1), but the failure count has no time dimension: five consecutive failures spread over a day trip the breaker exactly like five within one second.

Optimization Path

(1) Add a sliding window — deque of failure timestamps, age out on each call. (2) Add HALF_OPEN state with a probe_in_flight flag. (3) Add a Lock to serialize state transitions. (4) Inject the failure predicate so only real failures count. (5) Switch to monotonic(). (6) Emit metrics on each transition.

Final Expected Approach

State machine guarded by a threading.Lock. On each call(): under the lock, read state. If OPEN and cooldown elapsed → transition to HALF_OPEN and grant the probe to this caller (set probe_in_flight=True). If OPEN and not elapsed → raise. If HALF_OPEN and probe in flight → raise (concurrent callers see open). If CLOSED → proceed. Release the lock, call fn(), reacquire the lock to record the result. On success in HALF_OPEN → transition to CLOSED, clear failures. On failure → record (or transition to OPEN).

Data Structures Used

deque[float] for failure timestamps in the sliding window.
threading.Lock for state transitions.
An enum for State.
A monotonic clock for all time reads.

Correctness Argument

The state diagram is a closed graph: CLOSED → OPEN → HALF_OPEN → {CLOSED | OPEN}. Every transition is guarded by the lock, so two threads cannot disagree on the current state. The half-open invariant is enforced by probe_in_flight: only the thread that flipped the state from OPEN to HALF_OPEN holds the probe right; all others see CircuitOpenError. The sliding window is monotonically aged on each call, so failures older than window_s are guaranteed evicted before being counted.

Complexity

Operation	Time	Space
`call` (CLOSED, success)	O(1) amortized	O(window) for deque
`call` (OPEN, fast-fail)	O(1)	O(1)
`call` (HALF_OPEN probe)	O(1) plus `fn`	O(1)

Window aging is amortized O(1) per call.

Implementation Requirements

import threading
import time
from collections import deque
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self,
                 failure_threshold: int = 5,
                 window_s: float = 10.0,
                 cooldown_s: float = 30.0,
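                 # count only Exception subclasses; KeyboardInterrupt, SystemExit
                 # and similar BaseExceptions never trip the breaker
                 is_failure: Callable[[BaseException], bool] = lambda e: isinstance(e, Exception),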
                 *,
                 clock=time.monotonic):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be >= 1")
        self._threshold = failure_threshold
        self._window_s = window_s
        self._cooldown_s = cooldown_s
        self._is_failure = is_failure
        self._clock = clock
        self._lock = threading.Lock()
        self._state = State.CLOSED
        self._failures: deque[float] = deque()
        self._opened_at: Optional[float] = None
        self._probe_in_flight = False
        # observability
        self._transitions: list[tuple[float, State, State]] = []

    def _age_failures(self, now: float):
        cutoff = now - self._window_s
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()

    def _transition(self, new: State, now: float):
        self._transitions.append((now, self._state, new))
        self._state = new

    def _try_acquire_probe(self, now: float) -> bool:
        """Called under lock. True if this caller becomes the probe."""
        if self._state == State.OPEN and self._opened_at is not None \
                and now - self._opened_at >= self._cooldown_s:
            self._transition(State.HALF_OPEN, now)
            self._probe_in_flight = True
            return True
        return False

    def call(self, fn: Callable[[], T]) -> T:
        now = self._clock()
        is_probe = False
        with self._lock:
            if self._state == State.CLOSED:
                self._age_failures(now)
            elif self._state == State.OPEN:
                if not self._try_acquire_probe(now):
                    raise CircuitOpenError("breaker is OPEN")
                is_probe = True
            elif self._state == State.HALF_OPEN:
                if not self._probe_in_flight:
                    # defensive only: HALF_OPEN is always entered with the probe flag set
                    self._probe_in_flight = True
                    is_probe = True
                else:
                    raise CircuitOpenError("probe in flight")
        # invoke without holding the lock
        try:
            result = fn()
        except BaseException as e:
            failed = self._is_failure(e)
            with self._lock:
                now = self._clock()
                if is_probe:
                    self._probe_in_flight = False
                    self._transition(State.OPEN, now)
                    self._opened_at = now
                elif failed:
                    self._failures.append(now)
                    self._age_failures(now)
                    if len(self._failures) >= self._threshold and self._state == State.CLOSED:
                        self._transition(State.OPEN, now)
                        self._opened_at = now
                        self._failures.clear()
            raise
        with self._lock:
            if is_probe:
                self._probe_in_flight = False
                self._transition(State.CLOSED, self._clock())
                self._failures.clear()
                self._opened_at = None
        return result

    def state(self) -> State:
        with self._lock:
            return self._state

Tests

def test_closed_passes_through():
    b = CircuitBreaker(failure_threshold=3, window_s=10, cooldown_s=5)
    assert b.call(lambda: 42) == 42
    assert b.state() == State.CLOSED

def test_opens_after_threshold():
    b = CircuitBreaker(failure_threshold=3, window_s=10, cooldown_s=5)
    for _ in range(3):
        try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
        except RuntimeError: pass
    assert b.state() == State.OPEN
    try: b.call(lambda: 42)
    except CircuitOpenError: pass
    else: assert False, "expected CircuitOpenError while OPEN"

def test_half_open_success_closes():
    fake = [0.0]
    b = CircuitBreaker(failure_threshold=2, window_s=10, cooldown_s=5, clock=lambda: fake[0])
    for _ in range(2):
        try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
        except RuntimeError: pass
    assert b.state() == State.OPEN
    fake[0] = 6
    assert b.call(lambda: "ok") == "ok"
    assert b.state() == State.CLOSED

def test_half_open_failure_reopens():
    fake = [0.0]
    b = CircuitBreaker(failure_threshold=2, window_s=10, cooldown_s=5, clock=lambda: fake[0])
    for _ in range(2):
        try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
        except RuntimeError: pass
    fake[0] = 6
    try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
    except RuntimeError: pass
    assert b.state() == State.OPEN

def test_concurrent_only_one_probe():
    import threading
    fake = [0.0]
    b = CircuitBreaker(failure_threshold=1, window_s=10, cooldown_s=5, clock=lambda: fake[0])
    try: b.call(lambda: (_ for _ in ()).throw(RuntimeError()))
    except RuntimeError: pass
    fake[0] = 6
    seen_states = []
    barrier = threading.Barrier(10)
    def worker():
        barrier.wait()
        try:
            b.call(lambda: time.sleep(0.05) or "ok")
            seen_states.append("ok")
        except CircuitOpenError:
            seen_states.append("open")
    threads = [threading.Thread(target=worker) for _ in range(10)]
    for t in threads: t.start()
    for t in threads: t.join()
    assert seen_states.count("ok") == 1
    assert seen_states.count("open") == 9

Follow-up Questions

How would you make it thread-safe? A single threading.Lock around state transitions and counter updates is sufficient and is what the implementation does. The fn() call is invoked outside the lock so a slow downstream does not block other callers from seeing CircuitOpenError. The half-open probe race is resolved by probe_in_flight flipping atomically under the lock.
What metrics would you emit? State-transition counter (labels: from_state, to_state); current state gauge; per-call outcome counter (success, failure, short_circuit, probe_success, probe_failure); failure-window gauge (current count); time-in-state histogram.
What is the consistency model? Linearizable on the breaker’s state — all state() reads observe transitions in a total order consistent with the lock acquisition order. The probe invariant (“at most one probe in flight at any time”) is strict.
How would you handle a poison-pill input? A request that always raises a retryable failure will trip the breaker quickly — that’s the breaker’s job. The risk is the opposite: a probe with a poison input perpetually fails the half-open probe and never recovers. Mitigation: pick the probe payload from a known-safe traffic pool (synthetic health-check), or use a periodic health probe instead of in-line traffic.
What configuration knobs would you expose? failure_threshold, window_s, cooldown_s, is_failure predicate. Don’t expose the half-open probe quota — keep it 1 unless you have a strong reason. Defaults: 5 failures / 10s window / 30s cooldown.
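How would you scale to N nodes? Per-process breakers are local: each instance learns about downstream health independently. This is correct for most use cases (each instance’s view of latency varies), but it is costly when the downstream collapses suddenly, because every instance must observe its own failure_threshold failures before tripping, so the fleet sends at least N × failure_threshold doomed requests first. The next step is a coordinated breaker via a shared registry, but only at very high scale.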

Product Extension

Real-world breakers (Hystrix, Resilience4j, Polly) layer on top of this core: bulkheads (concurrent-call limit), rate limiters, fallbacks (return cached value when open), and metric emission to Prometheus / StatsD. The state machine is the same; the bookkeeping around it varies by framework.

Language/Runtime Follow-ups

Python: as above. For async, replace Lock with asyncio.Lock and make call an async def.
Java: prefer Resilience4j in production. Hand-rolled: AtomicReference<State>, LongAdder for counters, ScheduledExecutorService for cooldown timeouts.
Go: a struct guarded by sync.Mutex. The probe flag is a bool. Use time.Now() (monotonic on Go 1.9+).
C++: std::mutex + std::condition_variable if you want concurrent callers to wait for the probe rather than fail fast (a different policy, called “blocking breaker”).
JS/TS: in single-threaded Node, no lock is needed — the state-transition logic is naturally atomic across awaits as long as you do not await in the middle of a transition block.
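A minimal asyncio sketch of the Python bullet above, assuming a hypothetical AsyncCircuitBreaker whose call(fn) takes a zero-argument coroutine function. It keeps the same state machine but drops the separate probe flag: HALF_OPEN is only ever entered by the caller granted the probe, so being in HALF_OPEN already means a probe is in flight. The asyncio.Lock is held only for bookkeeping, never across await fn(). Only Exception subclasses count as failures, so cancellation (asyncio.CancelledError is a BaseException since Python 3.8) never trips the breaker, while a cancelled probe still reopens it so the probe slot cannot get stuck.

```python
import asyncio
import time
from collections import deque
from enum import Enum
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CircuitOpenError(Exception):
    pass


class AsyncCircuitBreaker:
    """Hypothetical asyncio variant: asyncio.Lock instead of threading.Lock."""

    def __init__(self, failure_threshold: int = 5, window_s: float = 10.0,
                 cooldown_s: float = 30.0, *, clock=time.monotonic):
        self._threshold = failure_threshold
        self._window_s = window_s
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._lock = asyncio.Lock()
        self.state = State.CLOSED
        self._failures: deque = deque()
        self._opened_at = 0.0

    def _open(self, now: float) -> None:
        self.state = State.OPEN
        self._opened_at = now
        self._failures.clear()

    async def call(self, fn: Callable[[], Awaitable[T]]) -> T:
        is_probe = False
        async with self._lock:
            now = self._clock()
            if self.state is State.OPEN:
                if now - self._opened_at < self._cooldown_s:
                    raise CircuitOpenError("breaker is OPEN")
                self.state = State.HALF_OPEN  # this caller is the single probe
                is_probe = True
            elif self.state is State.HALF_OPEN:
                raise CircuitOpenError("probe in flight")
        try:
            result = await fn()  # awaited without holding the lock
        except BaseException as e:
            async with self._lock:
                now = self._clock()
                if is_probe:
                    # Any probe outcome other than success (including
                    # cancellation) reopens with a fresh cooldown.
                    self._open(now)
                elif isinstance(e, Exception) and self.state is State.CLOSED:
                    self._failures.append(now)
                    while self._failures and self._failures[0] < now - self._window_s:
                        self._failures.popleft()  # age out the sliding window
                    if len(self._failures) >= self._threshold:
                        self._open(now)
            raise
        if is_probe:
            async with self._lock:
                self.state = State.CLOSED
                self._failures.clear()
        return result
```

As in the threaded version, concurrent callers that arrive while the probe is in flight fail fast with CircuitOpenError instead of queueing behind it.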

Common Bugs

Holding the lock while calling fn() — a slow downstream blocks every other caller.
Forgetting to clear probe_in_flight on probe failure — breaker stays in HALF_OPEN forever, all calls fail.
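Using time.time() — wall-clock adjustments (NTP steps, manual changes) distort now - opened_at: a backward jump makes it negative and stretches the cooldown by the size of the jump, while a forward jump can end the cooldown early. Use time.monotonic().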
Counting non-failure exceptions (KeyboardInterrupt, ValueError from caller side) toward the threshold.
Resetting failures on every successful call — masks intermittent failures.
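Aging the window only on failure — tripping still works (the window is aged just before each threshold check), but any failure-count gauge or debug read taken between failures reports a stale count.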

Debugging Strategy

When the breaker won’t open: log the failure count after each call; check is_failure(e) returns True for the actual exception. When it won’t close after recovery: log the state and probe_in_flight flag — almost always the probe-flag-stuck-True bug. When concurrent tests are flaky: add a barrier so all callers race in lockstep, then assert exactly one probe-success and N-1 short-circuits.

Mastery Criteria

Drew the three-state diagram from memory in <30 seconds.
Listed every transition trigger (failure-threshold, cooldown-elapsed, probe-success, probe-failure) without prompting.
Wrote a thread-safe implementation in <30 minutes from a blank screen.
Wrote a concurrent test that catches the multiple-probe bug.
Articulated the retry × circuit-breaker composition in <60 seconds.
Named four metrics you’d emit for a production breaker.
Explained why fn() must not be called under the lock.

Lab 17 — Metrics Collector (Counter / Gauge / Histogram)

Goal

Implement a thread-safe in-process metrics registry that supports the three canonical metric types — counter, gauge, histogram — with bounded memory, label support, and an export format suitable for Prometheus scraping. After this lab you should be able to write the registry from a blank screen in <25 minutes and articulate the difference between summary and histogram in <60 seconds.

Background Concepts

The four metric types in the Prometheus / OpenMetrics ecosystem are counter (monotonic non-decreasing total — requests, errors, bytes), gauge (a current value — queue depth, active connections, memory in use), histogram (a count of observations bucketed by upper bound, used to compute quantiles server-side), and summary (client-side quantiles, harder to aggregate). The first three cover ~95% of production needs. Counters answer “how many?”, gauges answer “how much right now?”, histograms answer “what’s the distribution and the p99?”.

A metric is identified by (name, label_set). The same name with different labels (e.g., http_requests_total{method="GET"} vs http_requests_total{method="POST"}) is a different time series. The number of label combinations is the metric’s cardinality. Unbounded cardinality (e.g., a label per user_id) is the most common production memory leak in metric systems — protect against it.

Histograms are tricky. Naive implementations that store every observation and sort on export explode memory. The Prometheus model: pre-declare a fixed set of bucket upper bounds ([0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10] is the Go client library’s default; other client libraries ship similar but not identical defaults) and increment a counter per bucket. Quantile estimation happens at the query layer by interpolating within buckets. Memory is O(buckets) per series, which is cheap.
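To make “quantile estimation at the query layer” concrete, here is a simplified sketch of linear interpolation inside a bucket, in the spirit of PromQL’s histogram_quantile. The function name estimate_quantile and its edge-case choices (0 as the lower edge of the first bucket, returning the largest finite bound when the rank lands in +Inf) are assumptions of this sketch, not a reimplementation of Prometheus. It takes cumulative (upper_bound, count) pairs, the same shape the Histogram.snapshot() method in this lab returns.

```python
import math


def estimate_quantile(q: float, cumulative: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs,
    sorted by bound and ending with (math.inf, total)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = cumulative[-1][1]
    if total == 0:
        return math.nan
    rank = q * total                  # observations at or below the answer
    prev_bound, prev_count = 0.0, 0   # assume the first bucket starts at 0
    for bound, count in cumulative:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in +Inf: the best we can report is the largest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation: assume observations are spread evenly in the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound
```

Worked example: with cumulative counts [(0.1, 10), (1.0, 60), (10.0, 100), (inf, 100)], the median has rank 50, which falls in the (0.1, 1.0] bucket, so the estimate is 0.1 + (1.0 − 0.1) × (50 − 10) / 50 = 0.82. The accuracy depends entirely on how well the bucket boundaries fit the real distribution.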

Interview Context

A staple practical-engineering question at Datadog, Grafana, Honeycomb, and any observability-aware team (which is most of them now). Also asked at Stripe and Uber as a “we want to see how you reason about cardinality” question. Candidates fail this round when they reach for “store every observation in a list” — that betrays a lack of production exposure.

Problem Statement

Implement MetricsRegistry with:

counter(name, labels=None).inc(by=1) — monotonically incrementing.
gauge(name, labels=None).set(value) / .inc(by=1) / .dec(by=1).
histogram(name, labels=None, buckets=...).observe(value).
registry.snapshot() returns a list of (name, labels, type, value) tuples or the OpenMetrics text format.
Thread-safe; bounded cardinality (configurable maximum number of label combinations per metric name).

Constraints

Must support concurrent observations from many threads.
Histogram bucket increments must be atomic (no torn reads of bucket_count and sum).
Cardinality cap: when a new label combination would exceed max_labels_per_metric, drop it and increment an internal counter (metrics_dropped_total).
Counter and gauge: O(1) per operation; histogram: O(log buckets) per observation (binary search the bucket).

Clarifying Questions

Do we need timestamped exposition (Prometheus exposition format)? (Yes, but the timestamp can be implicit — Prometheus assigns the scrape timestamp.)
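Are histogram buckets shared across all label combinations or per combination? (Bucket counts are per combination, since different label values have different distributions. The bucket boundaries are normally shared across the whole family so that series can be aggregated bucket-wise.)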
Are we exposing percentiles client-side or letting the server compute them? (Server-side — the histogram type is exactly this.)
Should counters reset on process restart? (Yes — Prometheus handles this with the rate() function and the reset detection in counter math.)
What’s the maximum max_labels_per_metric we should default to? (1000 is generous; 100 is conservative. Make it configurable.)

Examples

reg = MetricsRegistry()
reg.counter("http_requests_total", {"method": "GET", "status": "200"}).inc()
reg.gauge("queue_depth", {"queue": "ingest"}).set(42)
reg.histogram("request_latency_s", {"endpoint": "/api"}, buckets=[0.01, 0.1, 1, 10]).observe(0.04)
print(reg.snapshot_openmetrics())
# # TYPE http_requests_total counter
# http_requests_total{method="GET",status="200"} 1
# # TYPE queue_depth gauge
# queue_depth{queue="ingest"} 42
# ...

Initial Brute Force

class NaiveMetrics:
    def __init__(self):
        self.metrics = {}
    def counter(self, name, labels=None):
        key = (name, frozenset((labels or {}).items()))
        self.metrics.setdefault(key, 0)
        self.metrics[key] += 1

This conflates increment and registration, has no thread safety, no histogram (impossible to compute p99 from a counter), no cardinality cap, and uses one global dict so every metric type collides on key shape.

Brute Force Complexity

O(1) per increment with a single writer. With concurrent writers, the read-modify-write in self.metrics[key] += 1 can interleave and lose updates. Memory is unbounded.

Optimization Path

Separate types into separate sub-registries (Counter, Gauge, Histogram) to avoid type-pun bugs. Add a per-metric Lock (or atomic primitive). Use bisect_left on a sorted bucket array to find the histogram bucket. Cap cardinality with a per-metric-name combination counter. Define an exposition format.

Final Expected Approach

A MetricsRegistry holds a dict name → MetricFamily. A MetricFamily stores the metric type, the bucket schedule (for histograms), and a dict labels_tuple → MetricInstance. Each MetricInstance is a small thread-safe object: Counter has an int and a Lock; Gauge has a float and a Lock; Histogram has a list[int] of bucket counts, a float sum, an int count, and a Lock. On increment, hash the label tuple, look up or create the instance (with cardinality check), acquire its Lock, mutate. Snapshot iterates families and instances under their locks and emits an exposition string.

Data Structures Used

dict[str, MetricFamily] for the registry.
dict[tuple[tuple[str, str], ...], MetricInstance] per family for label combinations.
list[float] (sorted) for histogram bucket boundaries.
list[int] for histogram bucket counts.
threading.Lock per instance.

Correctness Argument

Counter: monotonic by construction (only inc(by) with by ≥ 0 allowed). Gauge: set is the last writer’s value; inc/dec are atomic under the lock. Histogram: each observation falls into exactly one bucket (the smallest bucket whose upper bound is ≥ observation; the last bucket is +Inf). Sum and count are incremented under the same lock as the bucket count, so a snapshot sees consistent values.

Complexity

Op	Time
`counter.inc`	O(1) lock-and-increment
`gauge.set/inc/dec`	O(1)
`histogram.observe`	O(log B) for bucket lookup
`snapshot`	O(N · B) where N is total instances

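Space: O(F + S · B), where F is the number of metric families, S is the total number of label combinations (series), and B is the number of histogram buckets. Each histogram instance stores B + 1 bucket counts plus a sum and a count; a counter or gauge stores a single value.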

Implementation Requirements

import threading
from bisect import bisect_left
from typing import Optional

DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10)


def _label_key(labels: Optional[dict]) -> tuple:
    if not labels:
        return ()
    return tuple(sorted(labels.items()))


class Counter:
    __slots__ = ("_v", "_lock")
    def __init__(self):
        self._v = 0
        self._lock = threading.Lock()
    def inc(self, by: float = 1):
        if by < 0:
            raise ValueError("counter cannot decrease")
        with self._lock:
            self._v += by
    def value(self) -> float:
        with self._lock:
            return self._v


class Gauge:
    __slots__ = ("_v", "_lock")
    def __init__(self):
        self._v = 0.0
        self._lock = threading.Lock()
    def set(self, v: float):
        with self._lock:
            self._v = v
    def inc(self, by: float = 1):
        with self._lock:
            self._v += by
    def dec(self, by: float = 1):
        with self._lock:
            self._v -= by
    def value(self) -> float:
        with self._lock:
            return self._v


class Histogram:
    __slots__ = ("_buckets", "_counts", "_sum", "_count", "_lock")
    def __init__(self, buckets):
        self._buckets = tuple(sorted(buckets))
        self._counts = [0] * (len(self._buckets) + 1)  # +1 for +Inf
        self._sum = 0.0
        self._count = 0
        self._lock = threading.Lock()
    def observe(self, v: float):
        # bisect_left returns the first bound >= v, which is exactly the "le"
        # bucket; len(self._buckets) (no bound is >= v) is the +Inf slot.
        idx = bisect_left(self._buckets, v)
        with self._lock:
            self._counts[idx] += 1
            self._sum += v
            self._count += 1
    def snapshot(self) -> dict:
        with self._lock:
            cumulative = []
            running = 0
            for i, b in enumerate(self._buckets):
                running += self._counts[i]
                cumulative.append((b, running))
            running += self._counts[-1]
            cumulative.append((float("inf"), running))
            return {"buckets": cumulative, "sum": self._sum, "count": self._count}


class MetricFamily:
    def __init__(self, name: str, kind: str, buckets=None, max_labels: int = 1000):
        self.name = name
        self.kind = kind  # "counter" | "gauge" | "histogram"
        self.buckets = buckets
        self.instances: dict = {}
        self.max_labels = max_labels
        self.dropped = 0
        self.lock = threading.Lock()
    def get(self, labels_key: tuple):
        with self.lock:
            inst = self.instances.get(labels_key)
            if inst is not None:
                return inst
            if len(self.instances) >= self.max_labels:
                self.dropped += 1
                return None
            if self.kind == "counter":
                inst = Counter()
            elif self.kind == "gauge":
                inst = Gauge()
            else:
                inst = Histogram(self.buckets or DEFAULT_BUCKETS)
            self.instances[labels_key] = inst
            return inst


class _NullCounter:
    def inc(self, by=1): pass


class MetricsRegistry:
    def __init__(self, max_labels_per_metric: int = 1000):
        self._families: dict[str, MetricFamily] = {}
        self._lock = threading.Lock()
        self._max_labels = max_labels_per_metric

    def _family(self, name: str, kind: str, buckets=None) -> MetricFamily:
        with self._lock:
            f = self._families.get(name)
            if f is None:
                f = MetricFamily(name, kind, buckets, self._max_labels)
                self._families[name] = f
            elif f.kind != kind:
                raise ValueError(f"metric {name} already registered as {f.kind}")
            return f

    def counter(self, name: str, labels: Optional[dict] = None) -> Counter:
        f = self._family(name, "counter")
        inst = f.get(_label_key(labels))
        return inst if inst is not None else _NullCounter()

    def gauge(self, name: str, labels: Optional[dict] = None) -> Gauge:
        f = self._family(name, "gauge")
        return f.get(_label_key(labels)) or Gauge()  # caller-detached fallback

    def histogram(self, name: str, labels: Optional[dict] = None, buckets=None) -> Histogram:
        f = self._family(name, "histogram", buckets)
        return f.get(_label_key(labels)) or Histogram(buckets or DEFAULT_BUCKETS)

    def snapshot_openmetrics(self) -> str:
        lines = []
        with self._lock:
            families = list(self._families.values())
        for f in families:
            lines.append(f"# TYPE {f.name} {f.kind}")
            with f.lock:
                instances = list(f.instances.items())
            for labels_key, inst in instances:
                lbl = "{" + ",".join(f'{k}="{v}"' for k, v in labels_key) + "}" if labels_key else ""
                if isinstance(inst, (Counter, Gauge)):
                    lines.append(f"{f.name}{lbl} {inst.value()}")
                elif isinstance(inst, Histogram):
                    snap = inst.snapshot()
                    for b, cum in snap["buckets"]:
                        b_str = "+Inf" if b == float("inf") else f"{b}"
                        bucket_lbl = labels_key + (("le", b_str),)
                        bl = "{" + ",".join(f'{k}="{v}"' for k, v in bucket_lbl) + "}"
                        lines.append(f"{f.name}_bucket{bl} {cum}")
                    lines.append(f"{f.name}_sum{lbl} {snap['sum']}")
                    lines.append(f"{f.name}_count{lbl} {snap['count']}")
        return "\n".join(lines)
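One gap worth closing for Prometheus compatibility: snapshot_openmetrics interpolates label values verbatim, so a value containing a double quote, a backslash, or a newline would break the exposition line. The text exposition format requires escaping those three characters as `\\`, `\"`, and `\n`. A small helper (the names escape_label_value and format_labels are hypothetical) that could replace the two inline label-joining expressions:

```python
def escape_label_value(value: str) -> str:
    # Escape backslashes first so the escapes added afterwards are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_labels(labels_key: tuple) -> str:
    """Render a sorted ((name, value), ...) tuple as {name="value",...}."""
    if not labels_key:
        return ""
    body = ",".join(f'{k}="{escape_label_value(str(v))}"' for k, v in labels_key)
    return "{" + body + "}"
```

Label names need no escaping here because valid names are restricted to letters, digits, and underscores; a stricter registry would also validate names on registration.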

Tests

def test_counter_increments():
    r = MetricsRegistry()
    c = r.counter("hits")
    c.inc(); c.inc(2)
    assert c.value() == 3

def test_counter_rejects_negative():
    r = MetricsRegistry()
    try: r.counter("x").inc(-1)
    except ValueError: pass
    else: assert False

def test_gauge_set_inc_dec():
    g = MetricsRegistry().gauge("depth")
    g.set(10); g.inc(5); g.dec(3)
    assert g.value() == 12

def test_histogram_buckets():
    h = MetricsRegistry().histogram("lat", buckets=[0.1, 1, 10])
    for v in [0.05, 0.5, 1.5, 100]: h.observe(v)
    snap = h.snapshot()
    assert snap["count"] == 4
    assert snap["sum"] == 102.05
    assert snap["buckets"][0] == (0.1, 1)   # ≤ 0.1
    assert snap["buckets"][1] == (1, 2)     # ≤ 1
    assert snap["buckets"][2] == (10, 3)    # ≤ 10
    assert snap["buckets"][3][1] == 4       # +Inf

def test_concurrent_counter():
    import threading
    r = MetricsRegistry()
    c = r.counter("racy")
    def inc():
        for _ in range(10_000): c.inc()
    threads = [threading.Thread(target=inc) for _ in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()
    assert c.value() == 80_000

def test_cardinality_cap():
    r = MetricsRegistry(max_labels_per_metric=2)
    r.counter("uid", {"id": "a"}).inc()
    r.counter("uid", {"id": "b"}).inc()
    r.counter("uid", {"id": "c"}).inc()  # dropped
    assert len(r._families["uid"].instances) == 2
    assert r._families["uid"].dropped == 1

def test_type_conflict():
    r = MetricsRegistry()
    r.counter("x")
    try: r.gauge("x")
    except ValueError: pass
    else: assert False

Follow-up Questions

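How would you make it thread-safe? Per-instance locks: every Counter, Gauge, and Histogram instance (one per (name, labels) combination) has its own lock. The registry lock and each family’s lock only guard lookup and creation, but both are taken on every counter()/gauge()/histogram() call, so hot paths should look the instance up once and keep the handle. With cached handles, increments on different label tuples never block each other, which is the expected hot path.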
What metrics would you emit (about the metrics system itself)? metrics_dropped_total{name=...} (cardinality drops); metrics_active_series (gauge of total instances); metrics_scrape_duration_seconds (histogram of snapshot_openmetrics latency). Self-instrumentation is a sign of mature instrumentation.
What is the eviction policy? None for active series — they live forever. For TTL’d metrics (rare), an external sweeper deletes instances unobserved for N minutes. Design so the sweeper is optional, not on the hot path.
What configuration knobs would you expose? max_labels_per_metric (cardinality cap), default histogram buckets, the registry’s exposition format. Don’t expose the per-instance lock granularity — it’s an implementation detail.
How would you handle backpressure? The hot path is lock-bounded; a write that contends waits microseconds. If a histogram’s lock becomes hot, switch to per-bucket atomics or sharded counters (8 shards keyed by thread_id % 8, summed at scrape).
What’s the difference between summary and histogram? A summary computes quantiles client-side with a streaming quantile estimator (Greenwald-Khanna and CKMS are common choices). Pros: accurate per-series percentiles. Cons: quantiles cannot be meaningfully aggregated across series (averaging per-instance p99s does not give the fleet p99). A histogram pushes the work to the query layer, and aggregation across series is just bucket-wise addition. Histograms are the right default in modern observability.
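The backpressure answer above mentions sharded counters. A minimal sketch, assuming a hypothetical ShardedCounter with the same inc/value interface as Counter: each thread is assigned a shard round-robin on first use rather than by thread_id % 8, because raw thread identifiers can be aligned addresses whose low bits are all equal, which would pile every thread onto one shard. value() sums the shards at scrape time. Under CPython’s GIL this mainly illustrates the pattern; the payoff comes in runtimes with truly parallel increments (Java’s LongAdder applies the same striping idea).

```python
import threading


class ShardedCounter:
    """Hypothetical counter split into independently locked shards."""

    def __init__(self, shards: int = 8):
        self._values = [0] * shards
        self._locks = [threading.Lock() for _ in range(shards)]
        self._local = threading.local()      # remembers each thread's shard
        self._assign_lock = threading.Lock()
        self._next_shard = 0

    def _shard(self) -> int:
        idx = getattr(self._local, "idx", None)
        if idx is None:
            # First call from this thread: hand out shards round-robin.
            with self._assign_lock:
                idx = self._next_shard % len(self._values)
                self._next_shard += 1
            self._local.idx = idx
        return idx

    def inc(self, by: float = 1) -> None:
        if by < 0:
            raise ValueError("counter cannot decrease")
        i = self._shard()
        with self._locks[i]:                 # contend only with threads on this shard
            self._values[i] += by

    def value(self) -> float:
        # Summed shard by shard: not an atomic cross-shard snapshot, which is
        # acceptable for a monotonic counter that is scraped periodically.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._values[i]
        return total
```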

Product Extension

This is exactly the data model exposed by the Prometheus client libraries. Real implementations add: gauge track_inprogress (decorator/context manager that increments on enter, decrements on exit), summary type, exemplars (trace IDs attached to histogram observations), native histograms (sparse representation that auto-tunes bucket boundaries).

Language/Runtime Follow-ups

Python: as above. The prometheus_client library is the production reference. Be aware of GIL implications: self._v += by is a read-modify-write spanning several bytecodes, so it is not atomic even under the GIL (for ints as well as floats); the explicit lock makes it atomic.
Java: use LongAdder for counters (avoids contention via per-thread cells); DoubleAdder for histogram sums. Micrometer is the production reference.
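Go: counters and gauges as lock-free atomics: atomic.Int64 for integer counts and, for floats, an atomic.Uint64 holding math.Float64bits updated with a compare-and-swap loop (sync/atomic has no Float64 type). A histogram can start with a sync.Mutex per metric; the prometheus/client_golang library is the production reference and keeps its observation path lock-free with atomics.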
C++: std::atomic<uint64_t> for counter; std::mutex for histogram. Cardinality maps require careful design — tbb::concurrent_hash_map is one option.
JS/TS: single-threaded — no locks needed in Node. The prom-client package is the production reference.

Common Bugs

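Histogram bucket lookup off-by-one — buckets are inclusive upper bounds (le), so use bisect_left, which puts a value equal to a bound into that bound’s bucket; bisect_right would push it one bucket too far. Either way, the bounds must be sorted first.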
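Sharing one bucket-count array across label combinations — each (name, labels) instance needs its own counts, or the series blend together. Keep the bucket boundaries identical across a family, though: bucket-wise aggregation across series depends on it.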
Forgetting the +Inf bucket — observations larger than the largest bucket are silently dropped.
Using an integer for the histogram sum — fractional observations get truncated, and in fixed-width languages a narrow integer sum can overflow under high throughput. Use a float (double).
Type confusion — registering the same name as both counter and gauge corrupts exposition. The registry must reject this.
Unbounded cardinality — a label per request_id creates a new series per request. The cardinality cap is the safety net.
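A quick check of the bucket-lookup semantics behind the first bullet, using the bounds from the histogram test above. Prometheus buckets are inclusive upper bounds (le), so a value equal to a bound must land in that bound’s bucket; BOUNDS and bucket_index are illustrative names.

```python
from bisect import bisect_left, bisect_right

BOUNDS = (0.1, 1, 10)  # sorted inclusive upper bounds; index 3 is the +Inf slot


def bucket_index(v: float) -> int:
    # First bound >= v, so a value equal to a bound lands in that bound's bucket.
    return bisect_left(BOUNDS, v)


print(bucket_index(1))          # 1: the "le=1" bucket
print(bisect_right(BOUNDS, 1))  # 2: one slot too far, the "le=10" bucket
print(bucket_index(100))        # 3: past every bound, the +Inf slot
```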

Debugging Strategy

When totals look wrong: confirm that counter.inc(by) is only called with by ≥ 0 and that the update happens under the lock. When percentiles are off: dump the cumulative bucket counts; they should satisfy cumulative[i] = sum(counts[0..i]). When concurrency tests are flaky: temporarily add time.sleep(0) between reading and writing the value to widen the race window so lost updates show up reliably. A passing run after the fix is good evidence, not proof, that the locking is correct.

Mastery Criteria

Wrote the three metric types with correct semantics in <25 minutes.
Articulated histogram vs summary in <60 seconds.
Stated the cardinality risk without prompting and showed the cap in code.
Wrote a concurrent counter test that verifies no lost updates.
Produced a Prometheus-compatible exposition string.
Listed three metrics-about-metrics to emit (drops, active series, scrape duration).
Explained why per-instance locks scale better than a global registry lock.

Lab 18 — Concurrent Web Crawler

Goal

Implement a concurrent web crawler that BFS-traverses a starting URL, respects a depth limit, per-host politeness (max in-flight requests per domain + minimum inter-request delay), dedup (visit each URL once), and a bounded worker pool. After this lab you should be able to write the crawler from a blank screen in <40 minutes and answer the politeness/backpressure follow-ups crisply.

Background Concepts

A web crawler is a BFS over the web graph where nodes are URLs and edges are anchor links extracted from the HTML. The interesting engineering is not the BFS — it’s the constraints layered on top:

Politeness: never overload a single host. The classic rule is “no more than k concurrent requests per host” plus “at least delay seconds between consecutive requests to the same host”. Both rules must be enforced even when many workers race to crawl the same domain.
Dedup: the web has cycles. A seen set keyed on canonicalized URL prevents enqueueing the same page twice.
Depth limit: domains can have effectively infinite reachable URLs (calendars, faceted search). Hard-cap depth.
Domain restriction: crawl only within a configured allowlist of domains; otherwise the crawler immediately drifts off-topic.
Bounded workers: limit total concurrency to N. Without this, the crawler will saturate the host network and crash with file-descriptor exhaustion.
Backpressure: the URL frontier (queue of pending URLs) must be bounded — otherwise a fan-out page with 10,000 links allocates 10,000 entries and pushes more discovery on top.
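The two politeness rules can be enforced together with one condition variable. A minimal sketch, assuming a hypothetical HostPoliteness gate whose acquire(host) blocks until a request may start and whose release(host) is called once the request finishes. Following the problem statement, the delay is measured from the last completed request to that host.

```python
import threading
import time
from collections import defaultdict


class HostPoliteness:
    """Hypothetical per-host gate: at most `concurrency` requests in flight per
    host, and at least `delay_s` seconds between the last completed request to
    a host and the start of the next one."""

    def __init__(self, concurrency: int, delay_s: float, clock=time.monotonic):
        self._concurrency = concurrency
        self._delay_s = delay_s
        self._clock = clock
        self._cond = threading.Condition()
        self._in_flight = defaultdict(int)   # host -> requests currently running
        self._last_done: dict = {}           # host -> clock() at last completion

    def acquire(self, host: str) -> None:
        """Block until this caller may start a request to `host`."""
        with self._cond:
            while True:
                if self._in_flight[host] < self._concurrency:
                    last = self._last_done.get(host)
                    wait = 0.0 if last is None else last + self._delay_s - self._clock()
                    if wait <= 0:
                        self._in_flight[host] += 1
                        return
                    self._cond.wait(timeout=wait)  # re-check once the delay may have passed
                else:
                    self._cond.wait()              # woken by release()

    def release(self, host: str) -> None:
        """Call exactly once per acquire(), after the request finishes."""
        with self._cond:
            self._in_flight[host] -= 1
            self._last_done[host] = self._clock()
            self._cond.notify_all()
```

A worker would call acquire(host) before http_get and release(host) in a finally block. The catch is head-of-line blocking: a worker parked here cannot fetch URLs for other hosts, so a common refinement is to keep per-host ready queues and only hand a worker a URL whose host is already eligible.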

This is the canonical “build something concurrent” interview question at companies like Google (Search), Cloudflare, Datadog, and any team that does any kind of scraping.

Interview Context

A 40-to-60-minute round at senior+ practical interviews. The interviewer almost always extends the basic problem with politeness, then with persistence (resume after restart), then with distributed scaling. Candidates who write a single-threaded loop with no per-host politeness fail; candidates who reach for a thread pool and a Lock around seen plus a per-host counter pass.

Problem Statement

Implement crawl(start_url, *, max_depth, max_workers, per_host_concurrency, per_host_delay_s, allow_domains, http_get, extract_links) that returns a list (or yields a stream) of (url, depth, content) tuples. Visit each canonical URL at most once. Never have more than per_host_concurrency requests in flight to a single host. Wait at least per_host_delay_s seconds after the last completed request to a host before starting a new one there. Stop when the frontier is empty and no worker is still processing a page.

Constraints

Thread-safe; many workers race for URLs from the frontier.
Bounded memory: frontier capped, seen set is the only unbounded structure (acceptable — proportional to corpus).
Graceful shutdown on Ctrl-C or external cancellation.
http_get is injected so tests don’t hit the network.

Clarifying Questions

URL canonicalization rules? (Lowercase host, drop fragment, sort query params, default port elision.)
Should robots.txt be honored? (Yes in production; mock it in this lab via is_allowed predicate.)
What’s a “host”? (Either the registered domain, where www.example.com and images.example.com share one politeness budget, or the exact hostname; document the choice. The implementation below uses the exact hostname.)
Should depth-0 (the start URL) count toward the depth limit? (No — depth-0 is always crawled.)
Should we follow redirects? (Yes, but the redirect target counts as the visited URL.)
Output order — does it need to be deterministic? (No — concurrency makes determinism hard. Document.)

Examples

results = crawl(
    "https://example.com/",
    max_depth=3,
    max_workers=8,
    per_host_concurrency=2,
    per_host_delay_s=0.5,
    allow_domains={"example.com"},
    http_get=fake_http_get,
    extract_links=fake_extract_links,
)
# returns ~30 (url, depth, body) tuples, never more than 2 in-flight to example.com.

Initial Brute Force

from collections import deque

def crawl_naive(url, max_depth, http_get, extract_links):
    seen = {url}
    frontier = deque([(url, 0)])        # FIFO queue gives BFS order; popleft is O(1)
    out = []
    while frontier:
        u, d = frontier.popleft()
        body = http_get(u)
        out.append((u, d, body))
        if d < max_depth:
            for link in extract_links(body, base=u):
                if link not in seen:
                    seen.add(link)
                    frontier.append((link, d + 1))
    return out

This is single-threaded (slow), has no politeness (and risks an IP ban), and grows the frontier without bound.

Brute Force Complexity

Time: O(V) HTTP requests serially, where V is the number of unique pages. With 1s/page and 10k pages, ~3 hours.

Optimization Path

Add a thread pool of max_workers. Add a Lock-guarded seen set. Add per-host concurrency via a Semaphore keyed on host. Add a per-host last-request time via a dict of (timestamp, lock) per host. Cap the frontier with a bounded queue.Queue(maxsize=...). Add a stop_event for graceful shutdown.

Final Expected Approach

A ThreadPoolExecutor(max_workers) runs the worker loop. The frontier is a queue.Queue(maxsize=...). A seen set guarded by a Lock ensures each URL is enqueued once. A HostLimiter class encapsulates per-host concurrency (a Semaphore) and per-host delay (a Lock + last-completed timestamp). Workers pull URLs, acquire the host limiter (which may block on the semaphore or sleep for the delay), call http_get, extract links, check the depth limit, test-and-set the seen set under its lock, and enqueue new URLs.

Data Structures Used

queue.Queue for the frontier (bounded).
set[str] for seen (guarded by Lock).
dict[str, _HostState] for per-host limiters.
_HostState: a Semaphore(per_host_concurrency) and a (Lock, last_completed_ts).
ThreadPoolExecutor for workers.

Correctness Argument

Each URL is enqueued at most once (the seen set is checked and updated atomically under seen_lock). Each URL is dequeued and crawled at most once (queue items are never re-enqueued). The depth limit is enforced before enqueueing children, not before crawling parents, which matches the natural BFS semantics. Per-host politeness: a worker holds the host’s semaphore for the whole request, so at most per_host_concurrency requests to a host are in flight; before starting, it computes the remaining wait from last_completed under the host’s lock and sleeps for it. With per_host_concurrency = 1 the semaphore serializes requests, so consecutive requests to the same host are at least delay apart even with N workers. With higher concurrency, requests overlap by design and the delay is relative to the latest completion observed when the worker checked.

Complexity

Aspect	Cost
HTTP requests	O(V) total, parallel by `max_workers`
`seen` lookup	O(1) average
Per-host serialization	bounded by `per_host_concurrency` and `per_host_delay_s`
Memory	O(V) for `seen` plus `frontier_capacity` for the queue

Implementation Requirements

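import threading
import time
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty
from urllib.parse import urlparse, urldefrag, parse_qsl, urlencode

_DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    u, _ = urldefrag(url)                           # drop "#fragment"
    p = urlparse(u)
    scheme = p.scheme.lower()
    host = p.hostname.lower() if p.hostname else ""
    # keep an explicit port only if it is not the scheme's default
    port = f":{p.port}" if p.port and p.port != _DEFAULT_PORTS.get(scheme) else ""
    path = p.path or "/"
    # sort query params so ?b=2&a=1 and ?a=1&b=2 canonicalize identically
    query = urlencode(sorted(parse_qsl(p.query, keep_blank_values=True)))
    return f"{scheme}://{host}{port}{path}" + (f"?{query}" if query else "")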


def host_of(url: str) -> str:
    return (urlparse(url).hostname or "").lower()


class _HostState:
    __slots__ = ("sem", "lock", "last_completed")
    def __init__(self, concurrency: int):
        self.sem = threading.Semaphore(concurrency)
        self.lock = threading.Lock()
        self.last_completed = 0.0


class HostLimiter:
    def __init__(self, per_host_concurrency: int, per_host_delay_s: float, *, clock=time.monotonic, sleep=time.sleep):
        self._concurrency = per_host_concurrency
        self._delay = per_host_delay_s
        self._states: dict[str, _HostState] = {}
        self._guard = threading.Lock()
        self._clock = clock
        self._sleep = sleep

    def _state(self, host: str) -> _HostState:
        with self._guard:
            s = self._states.get(host)
            if s is None:
                s = _HostState(self._concurrency)
                self._states[host] = s
            return s

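    def acquire(self, host: str) -> _HostState:
        s = self._state(host)
        s.sem.acquire()                  # caps in-flight requests to this host
        while True:
            with s.lock:
                wait = s.last_completed + self._delay - self._clock()
            if wait <= 0:
                return s
            # sleep outside the lock, then re-check: another request to this
            # host may have completed while we slept
            self._sleep(wait)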

    def release(self, s: _HostState):
        with s.lock:
            s.last_completed = self._clock()
        s.sem.release()


def crawl(start_url: str, *,
          max_depth: int = 3,
          max_workers: int = 8,
          per_host_concurrency: int = 2,
          per_host_delay_s: float = 0.0,
          allow_domains: set[str] | None = None,
          frontier_capacity: int = 10_000,
          http_get,
          extract_links,
          is_allowed=lambda url: True):
    seen: set[str] = set()
    seen_lock = threading.Lock()
    frontier: Queue = Queue(maxsize=frontier_capacity)
    # in_flight counts URLs that are queued OR being processed. It is incremented
    # before put() and decremented only after the URL's children are enqueued,
    # so it cannot reach 0 while any work remains.
    in_flight = 0
    in_flight_lock = threading.Lock()
    inflight_zero = threading.Event()
    inflight_zero.set()
    stop_event = threading.Event()
    results: list[tuple[str, int, str]] = []
    results_lock = threading.Lock()
    limiter = HostLimiter(per_host_concurrency, per_host_delay_s)

    def _allow(url: str) -> bool:
        if not is_allowed(url):
            return False
        if allow_domains is None:
            return True
        h = host_of(url)
        return any(h == d or h.endswith("." + d) for d in allow_domains)

    def _enqueue(url: str, depth: int):
        nonlocal in_flight
        try:
            canon = canonicalize(url)
        except ValueError:          # malformed URL (e.g. invalid port): skip it
            return
        if not _allow(canon):
            return
        with seen_lock:             # check-and-add must be atomic
            if canon in seen:
                return
            seen.add(canon)
        with in_flight_lock:
            in_flight += 1
            inflight_zero.clear()
        # put outside the locks; a full bounded queue blocks here (backpressure)
        frontier.put((canon, depth))

    def _finish():
        nonlocal in_flight
        with in_flight_lock:
            in_flight -= 1
            if in_flight == 0:
                inflight_zero.set()

    def _worker():
        while not stop_event.is_set():
            try:
                url, depth = frontier.get(timeout=0.1)
            except Empty:
                continue            # the main thread decides when the crawl is over
            try:
                state = limiter.acquire(host_of(url))
                try:
                    body = http_get(url)
                finally:
                    limiter.release(state)
                if body is None:
                    continue        # fetch failed; the finally below still runs
                with results_lock:
                    results.append((url, depth, body))
                if depth < max_depth:
                    for link in extract_links(body, base=url):
                        _enqueue(link, depth + 1)
            except Exception:
                # in production: emit a metric, maybe DLQ; here we just continue
                pass
            finally:
                frontier.task_done()
                _finish()           # only after all children were enqueued

    _enqueue(start_url, 0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_worker) for _ in range(max_workers)]
        try:
            # poll with a timeout so Ctrl-C is delivered promptly
            while not inflight_zero.wait(timeout=0.5):
                pass
        except KeyboardInterrupt:
            pass                    # fall through to a graceful shutdown
        finally:
            stop_event.set()        # workers exit at their next loop check
        for f in futures:
            f.result()
    return results

Tests

def make_site(graph: dict[str, list[str]]):
    def http_get(url): return graph.get(url)
    def extract_links(body, base): return body if isinstance(body, list) else []
    return http_get, extract_links

def test_basic_bfs():
    graph = {
        "https://e.com/a": ["https://e.com/b", "https://e.com/c"],
        "https://e.com/b": ["https://e.com/d"],
        "https://e.com/c": ["https://e.com/d"],
        "https://e.com/d": [],
    }
    http_get, extract_links = make_site(graph)
    out = crawl("https://e.com/a", max_depth=5, max_workers=4,
                per_host_concurrency=4, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    visited = {u for u, _, _ in out}
    assert visited == set(graph.keys())

def test_depth_limit():
    chain = {f"https://e.com/{i}": [f"https://e.com/{i+1}"] for i in range(10)}
    chain["https://e.com/10"] = []
    http_get, extract_links = make_site(chain)
    out = crawl("https://e.com/0", max_depth=2, max_workers=2,
                per_host_concurrency=2, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    assert len({u for u, _, _ in out}) == 3  # depths 0, 1, 2

def test_dedup_on_cycle():
    g = {"https://e.com/a": ["https://e.com/b"],
         "https://e.com/b": ["https://e.com/a", "https://e.com/c"],
         "https://e.com/c": []}
    http_get, extract_links = make_site(g)
    out = crawl("https://e.com/a", max_depth=10, max_workers=4,
                per_host_concurrency=4, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    urls = [u for u, _, _ in out]
    assert len(urls) == len(set(urls)) == 3

def test_domain_restriction():
    g = {"https://e.com/a": ["https://other.com/x"], "https://other.com/x": []}
    http_get, extract_links = make_site(g)
    out = crawl("https://e.com/a", max_depth=5, max_workers=2,
                per_host_concurrency=2, allow_domains={"e.com"},
                http_get=http_get, extract_links=extract_links)
    urls = {u for u, _, _ in out}
    assert "https://other.com/x" not in urls

def test_per_host_concurrency():
    # A hub page links to 20 leaves on the same host, so many workers race for e.com.
    leaves = [f"https://e.com/{i}" for i in range(20)]
    graph = {"https://e.com/hub": leaves, **{u: [] for u in leaves}}
    stats = {"max": 0, "now": 0}
    lock = threading.Lock()
    def http_get(url):
        with lock:
            stats["now"] += 1
            stats["max"] = max(stats["max"], stats["now"])
        time.sleep(0.05)            # keep the request "open" so fetches overlap
        with lock:
            stats["now"] -= 1
        return graph.get(url)
    out = crawl("https://e.com/hub", max_depth=1, max_workers=10,
                per_host_concurrency=3, allow_domains={"e.com"},
                http_get=http_get, extract_links=lambda body, base: body)
    assert len(out) == 21
    assert stats["max"] <= 3

Follow-up Questions

How would you make it thread-safe? The implementation uses seen_lock (guards the URL set), in_flight_lock (guards the in-flight counter and the completion signal), results_lock (guards the output list), and, inside HostLimiter, one registry lock plus a semaphore and a lock per host. The Queue is internally thread-safe, and the frontier-capacity bound provides backpressure when discovery outpaces processing.
How would you persist state across restarts? Persist both seen and the pending work: take periodic snapshots to disk (or write to Redis/RocksDB on every visit), and track URLs that were enqueued but not yet completed in a separate pending set. On restart, load seen, then re-enqueue everything in pending; seen alone is not enough, because URLs in it are never enqueued again.
How would you scale to N nodes? Shard the URL space by hash(host) % N so each node owns a fixed slice. Cross-node enqueues go via a message bus. The seen set is replicated or sharded the same way. Per-host politeness becomes per-(node, host), and no cross-node coordination is needed since each host is owned by exactly one node.
How would you handle backpressure? The bounded frontier (Queue(maxsize=...)) blocks workers that try to enqueue when it is full, which throttles fast-discovery pages until consumers drain the queue. Dropping on overflow is wrong for crawlers (you’d lose URLs), so blocking is the better default, with one caveat: here the workers are both producers and consumers, so if every worker blocks on put at the same time, nobody drains the queue and the crawl deadlocks. Size the frontier generously, or spill overflow to disk, to rule that out.
What is the shutdown / draining behavior? Once stop_event is set, workers stop pulling from the queue, and the main thread waits for in-progress http_get calls to finish (there is no forced cancellation). URLs still in the frontier are abandoned, and because their canonical forms are already in seen, a restart from a seen snapshot alone would skip them forever; persist the frontier (or the pending set) alongside seen so a restart can re-enqueue them.
How would you handle a poison-pill input? A URL whose response triggers an unbounded chain of new links (e.g., a calendar with year=∞). Mitigations: the depth limit (already there), a per-host hit cap (e.g., at most 100 URLs per host), a URL length cap, a content-size cap on http_get, and a time cap on link extraction. Note that signal.alarm only fires in the main thread (and only on Unix) and Python threads cannot be killed, so a hard extraction timeout usually means parsing in a separate process.
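The hash(host) % N sharding rule needs one practical adjustment in Python: the built-in hash() of a str is randomized per process (PYTHONHASHSEED), so two nodes would disagree about who owns a host. A minimal sketch with a stable digest instead (owner_node is a hypothetical helper, not part of the lab):

```python
import hashlib
from urllib.parse import urlparse


def owner_node(url: str, num_nodes: int) -> int:
    """Return the index of the node that owns this URL's host (sketch).

    Uses a stable SHA-1 digest rather than Python's built-in hash(), which is
    salted per process for str and would map the same host to different
    nodes on different machines.
    """
    host = (urlparse(url).hostname or "").lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

All URLs on one host land on the same node, which is what lets per-host politeness stay local. Plain modulo reassigns most hosts whenever N changes; consistent hashing is the usual way to limit that movement.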

Product Extension

Real crawlers (Googlebot, Bingbot) layer on top: robots.txt parsing per host, sitemap.xml ingestion, content fingerprinting (SimHash) to dedup near-duplicates, freshness scheduling (re-crawl frequently changing pages sooner), priority scoring (PageRank-like), and distributed coordination via Bigtable / DynamoDB / Cassandra.

Language/Runtime Follow-ups

Python: as above. For very high concurrency switch to asyncio with aiohttp and asyncio.Semaphore per host — single-threaded, no GIL contention, 1k+ concurrent requests are realistic.
Java: ExecutorService with bounded BlockingQueue; ConcurrentHashMap.newKeySet() for seen; Semaphore per host. Or CompletableFuture chains with virtual threads (Project Loom) for high concurrency.
Go: a worker-pool of goroutines reading from a buffered channel; sync.Map for seen; per-host chan struct{} of size concurrency as a semaphore. The idiom is exceptionally clean in Go.
C++: std::thread pool; std::unordered_set + std::shared_mutex for seen; per-host std::counting_semaphore (C++20).
JS/TS: Node + p-limit per host; single-threaded so no locks. The “global” concurrency is enforced by an outer p-limit.
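The asyncio route for Python can be sketched without a network dependency. This is a minimal sketch under stated assumptions: fetch is an injected coroutine standing in for an aiohttp call, URLs are compared verbatim (no canonicalization, delay, or domain filter), and crawl_async is a hypothetical name. It also spawns one task per discovered URL, so unlike the threaded version it has no bounded frontier.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse


async def crawl_async(start_url, *, fetch, extract_links, max_depth=3,
                      max_workers=8, per_host_concurrency=2):
    """Sketch: asyncio BFS with a global cap and a per-host cap, no locks.

    `fetch` is an injected coroutine (url -> body or None); in production it
    would wrap an HTTP client such as aiohttp.
    """
    global_sem = asyncio.Semaphore(max_workers)          # total in-flight cap
    host_sems = defaultdict(lambda: asyncio.Semaphore(per_host_concurrency))
    seen = {start_url}
    results = []

    async def visit(url, depth):
        host = urlparse(url).hostname or ""
        # Take the per-host slot first, so a task waiting on a busy host
        # does not sit on a global slot.
        async with host_sems[host], global_sem:
            body = await fetch(url)
        if body is None:
            return
        results.append((url, depth, body))
        if depth >= max_depth:
            return
        children = []
        for link in extract_links(body):
            # No await between the check and the add, so this is atomic
            # on a single-threaded event loop.
            if link not in seen:
                seen.add(link)
                children.append(link)
        await asyncio.gather(*(visit(c, depth + 1) for c in children))

    await visit(start_url, 0)
    return results
```

The per-host semaphores replace HostLimiter's concurrency half; a politeness delay would be an extra asyncio.sleep before the fetch.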

Common Bugs

Adding to seen after http_get returns — multiple workers crawl the same URL.
Holding seen_lock while calling http_get — blocks every other worker.
Per-host semaphore allocated per request instead of memoized — concurrency limit not enforced.
Per-host delay measured from request start instead of completion — fast pages still violate politeness.
Forgetting urldefrag in canonicalization: /page#section1 and /page#section2 count as different URLs.
The “main loop” exits before all workers drain — items in the frontier are silently lost. Use a counter + condition or Queue.join().
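The Queue.join() fix from the last bug can be sketched on an in-memory graph (the bfs_with_join helper and the children callback are illustrative, not part of the lab’s API). Its correctness hinges on one ordering rule: a worker puts an item’s children before calling task_done() on the item, so the queue’s unfinished-task count cannot reach zero while work remains.

```python
import queue
import threading


def bfs_with_join(start, children, num_workers=4):
    """Parallel graph traversal that terminates via Queue.join()."""
    q = queue.Queue()
    seen = {start}
    seen_lock = threading.Lock()
    visited = []                    # list.append is thread-safe in CPython
    q.put(start)

    def worker():
        while True:
            item = q.get()
            if item is None:        # sentinel: shut down
                q.task_done()
                return
            try:
                visited.append(item)
                for c in children(item):
                    with seen_lock:
                        if c in seen:
                            continue
                        seen.add(c)
                    q.put(c)        # enqueue children first...
            finally:
                q.task_done()       # ...then mark the parent done

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    q.join()                        # returns once every put item is task_done()
    for _ in threads:
        q.put(None)                 # one sentinel per worker
    for t in threads:
        t.join()
    return visited
```

Calling task_done() on the parent before putting its children would reopen the race: the count could briefly hit zero and join() would return early.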

Debugging Strategy

When dedup fails: log every seen.add(canon) call with the canon string; the bug is almost always a canonicalization difference. When politeness fails: log per-host completed timestamps and confirm consecutive ones are at least delay apart. When the crawler hangs at end: print in_flight and frontier.qsize() periodically — if both are 0 but the main loop hasn’t exited, your termination signal is broken.
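The politeness check can be automated rather than eyeballed. A minimal sketch, assuming the crawl log can be turned into hypothetical (host, completed_at) pairs; it is only meaningful with per_host_concurrency = 1, because with higher concurrency overlapping requests may legitimately complete close together.

```python
from collections import defaultdict


def politeness_violations(events, delay_s):
    """Return (host, gap) pairs where two consecutive completions to the same
    host were closer than delay_s seconds. An empty list means the check passed."""
    by_host = defaultdict(list)
    for host, completed_at in events:
        by_host[host].append(completed_at)
    bad = []
    for host, stamps in by_host.items():
        stamps.sort()
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev < delay_s:
                bad.append((host, cur - prev))
    return bad
```

Feeding it the timestamps recorded by HostLimiter.release (via an injected clock) turns the debugging step into an assertion that can run in a stress test.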

Mastery Criteria

Wrote a crawler with thread pool, bounded frontier, dedup, and depth limit in <40 minutes.
Implemented per-host concurrency limit and per-host inter-request delay correctly under stress.
Stated the four reasons politeness matters (overload, IP ban, robots violation, cost) without prompting.
Articulated the sharding strategy for scaling to N nodes.
Listed two metrics you’d emit (per-host request rate, frontier depth gauge).
Identified the canonicalization bug class (fragment / param order / case).
Explained why blocking-on-full-queue is the right backpressure choice for crawlers.

Lab 19 — In-Memory Filesystem

Goal

Implement an in-memory filesystem that supports ls, mkdir, addContentToFile, readContentFromFile over a tree of inode-like directory and file nodes. After this lab you should be able to design and implement the filesystem from a blank screen in <30 minutes and answer the standard follow-ups (concurrency, persistence, paths/permissions) crisply.

Background Concepts

A filesystem is a tree where internal nodes are directories and leaves are files. Every Unix-style filesystem reduces to the same three operations: navigate (walk a path to a node), mutate (create / delete / write), and read (list / cat). The interview problem strips out permissions, hard links, symbolic links, journaling, and on-disk layout, leaving the core tree structure + path-walking — which is enough to test naming, encapsulation, and correctness.

The two design choices that matter:

One node class or two? A Directory has a dict[name, Node] of children; a File has a content string. They share path-walking but diverge in storage. Use either an inheritance hierarchy (Directory and File extend Node) or a single Node with a kind discriminator. Inheritance is cleaner for this problem.
Path-walk encapsulation. Every operation needs _walk(path) that returns the target node (or raises). Centralize it; do not duplicate the path-split logic in each method.

This problem appears as LC 588 — “Design In-Memory File System” — and is asked at Amazon, Google, and Bloomberg as an OOD warmup.

Interview Context

A 30-to-45-minute round at the senior level. The interviewer wants to see (a) clean class decomposition, (b) correct path-handling (absolute paths, edge cases like / and trailing slash), (c) sensible API, (d) tests. Common failure: stuffing everything into one class with one dict[str, str] keyed on full paths — works for small inputs, fails the “ls a directory” requirement, and looks like LeetCode glue rather than production code.

Problem Statement

Implement FileSystem with:

ls(path) -> list[str]: if path is a file, return [filename]; if a directory, return sorted list of children.
mkdir(path) -> None: create the directory and any missing intermediate directories (mkdir -p semantics).
addContentToFile(path, content) -> None: create the file if missing; append content to it. Intermediate directories are created.
readContentFromFile(path) -> str: return the file’s content. Raise if not a file.

All paths are absolute, start with /, components separated by /. The root is /.

Constraints

Path components are non-empty names that contain no slash and are never . or ..; names such as d.txt are allowed.
Directories and files in the same directory must have distinct names.
addContentToFile on an existing directory should raise.
ls("/") returns the children of root.

Clarifying Questions

Is addContentToFile append or overwrite? (LC 588 is append; confirm.)
Are path components case-sensitive? (Yes, like Unix.)
What’s the error mode for missing files in readContentFromFile? (Raise — caller should not silently get an empty string.)
Can mkdir("/a") succeed if /a already exists as a directory? (Yes — idempotent. As a file? No — raise.)
Does ls("/a/b") where /a/b doesn’t exist raise or return []? (Raise.)
Concurrency? (Single-threaded by default; thread-safety as a follow-up.)

Examples

fs = FileSystem()
fs.ls("/")                              # []
fs.mkdir("/a/b/c")
fs.addContentToFile("/a/b/c/d.txt", "hello")
fs.ls("/")                              # ["a"]
fs.ls("/a/b/c")                         # ["d.txt"]
fs.readContentFromFile("/a/b/c/d.txt")  # "hello"
fs.addContentToFile("/a/b/c/d.txt", " world")
fs.readContentFromFile("/a/b/c/d.txt")  # "hello world"

Initial Brute Force

class FlatFS:
    def __init__(self): self.store = {}  # full path -> content (None for dirs)
    def mkdir(self, p): self.store[p] = None
    def addContentToFile(self, p, c): self.store[p] = (self.store.get(p) or "") + c
    def readContentFromFile(self, p): return self.store[p]
    def ls(self, p):
        if self.store.get(p) is not None: return [p.rsplit("/", 1)[1]]
        prefix = p.rstrip("/") + "/"
        return sorted({k[len(prefix):].split("/")[0] for k in self.store if k.startswith(prefix)})

This works on small inputs but is O(N) per ls (scans every key) and conflates the “file vs directory” type via a None-or-string trick. It also can’t represent an empty directory unambiguously.

Brute Force Complexity

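ls: O(N · L) where N is total entries and L is average path length. addContentToFile: O(L) to hash the path plus O(S) to copy the existing content, where S is the current file size, because the string concatenation builds a new string. Memory: O(total path text + content).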

Optimization Path

Replace the flat dict with a tree of nodes. Each Directory has dict[str, Node] children — ls is now O(K log K) where K is the number of children of the queried directory, not the whole filesystem. mkdir and addContentToFile walk the path once, creating intermediate directories on demand. The walk is O(path_depth).

Final Expected Approach

Node is the base class. Directory(Node) has children: dict[str, Node]. File(Node) stores its content as chunks: list[str] (append by chunks.append(c); read by "".join(chunks)). The list-of-chunks representation makes append O(1) regardless of total size. _walk(path, create_dirs=False) returns the terminal node, raising or creating intermediates as configured. Each public method is a thin wrapper around _walk.

Data Structures Used

dict[str, Node] for directory children — O(1) lookup, sorted on demand for ls.
list[str] for file content chunks — O(1) append.
A path-split helper that handles / correctly.

Correctness Argument

Every node has exactly one parent (the dict entry that points to it). The root is the only node with no parent. _walk is the single source of truth for path resolution; it raises if it encounters a missing intermediate (create_dirs=False) or auto-creates one (create_dirs=True). addContentToFile resolves the parent directory, then either fetches the existing File (and asserts it’s a file, not a directory) or creates a new File. The “name collision” case (a directory exists where a file is being created) is detected at this exact step.

Complexity

Operation	Time	Space
`mkdir`	O(L) where L = path components	O(L) new nodes
`addContentToFile`	O(L) for walk + O(1) append	O(content)
`readContentFromFile`	O(L + content_size) for join	O(content)
`ls` (directory)	O(L) walk + O(K log K) sort	O(K)
`ls` (file)	O(L)	O(1)

Implementation Requirements

class Node:
    __slots__ = ()  # needed so the subclasses' __slots__ actually suppress __dict__


class File(Node):
    __slots__ = ("chunks",)
    def __init__(self):
        self.chunks: list[str] = []

    def read(self) -> str:
        return "".join(self.chunks)

    def append(self, content: str):
        if content:
            self.chunks.append(content)


class Directory(Node):
    __slots__ = ("children",)
    def __init__(self):
        self.children: dict[str, Node] = {}


def _split(path: str) -> list[str]:
    if not path or path[0] != "/":
        raise ValueError(f"path must be absolute: {path!r}")
    return [p for p in path.split("/") if p]


class FileSystem:
    def __init__(self):
        self._root = Directory()

    def _walk(self, parts: list[str], *, create_dirs: bool = False) -> Node:
        node: Node = self._root
        for i, p in enumerate(parts):
            if not isinstance(node, Directory):
                raise NotADirectoryError("/" + "/".join(parts[:i]))
            child = node.children.get(p)
            if child is None:
                if not create_dirs:
                    raise FileNotFoundError("/" + "/".join(parts[: i + 1]))
                child = Directory()
                node.children[p] = child
            node = child
        return node

    def ls(self, path: str) -> list[str]:
        parts = _split(path)
        node = self._walk(parts)
        if isinstance(node, File):
            return [parts[-1]]
        return sorted(node.children.keys())

    def mkdir(self, path: str) -> None:
        parts = _split(path)
        if not parts:
            return  # mkdir '/' is a no-op
        # Walk-with-create, but reject a final-segment that exists as a file
        parent = self._walk(parts[:-1], create_dirs=True)
        if not isinstance(parent, Directory):
            raise NotADirectoryError("/" + "/".join(parts[:-1]))
        last = parts[-1]
        existing = parent.children.get(last)
        if existing is None:
            parent.children[last] = Directory()
        elif isinstance(existing, File):
            raise FileExistsError(path + " is a file")
        # else: existing directory; idempotent

    def addContentToFile(self, path: str, content: str) -> None:
        parts = _split(path)
        if not parts:
            raise IsADirectoryError("/")
        parent = self._walk(parts[:-1], create_dirs=True)
        if not isinstance(parent, Directory):
            raise NotADirectoryError("/" + "/".join(parts[:-1]))
        last = parts[-1]
        node = parent.children.get(last)
        if node is None:
            node = File()
            parent.children[last] = node
        elif isinstance(node, Directory):
            raise IsADirectoryError(path)
        node.append(content)

    def readContentFromFile(self, path: str) -> str:
        parts = _split(path)
        node = self._walk(parts)
        if not isinstance(node, File):
            raise IsADirectoryError(path)
        return node.read()

Tests

def test_empty_root():
    fs = FileSystem()
    assert fs.ls("/") == []

def test_mkdir_p():
    fs = FileSystem()
    fs.mkdir("/a/b/c")
    assert fs.ls("/") == ["a"]
    assert fs.ls("/a") == ["b"]
    assert fs.ls("/a/b") == ["c"]
    assert fs.ls("/a/b/c") == []

def test_add_and_read_file():
    fs = FileSystem()
    fs.addContentToFile("/x/y/z.txt", "hello")
    fs.addContentToFile("/x/y/z.txt", " world")
    assert fs.readContentFromFile("/x/y/z.txt") == "hello world"
    assert fs.ls("/x/y") == ["z.txt"]
    assert fs.ls("/x/y/z.txt") == ["z.txt"]

def test_mkdir_idempotent():
    fs = FileSystem()
    fs.mkdir("/a")
    fs.mkdir("/a")
    assert fs.ls("/") == ["a"]

def test_mkdir_over_file_fails():
    fs = FileSystem()
    fs.addContentToFile("/a", "x")
    try: fs.mkdir("/a")
    except FileExistsError: pass
    else: assert False

def test_read_nonexistent():
    fs = FileSystem()
    try: fs.readContentFromFile("/nope")
    except FileNotFoundError: pass
    else: assert False

def test_ls_sorts():
    fs = FileSystem()
    for n in ["zeta", "alpha", "mu"]: fs.mkdir(f"/{n}")
    assert fs.ls("/") == ["alpha", "mu", "zeta"]

def test_root_path():
    fs = FileSystem()
    fs.addContentToFile("/a.txt", "x")
    assert fs.ls("/a.txt") == ["a.txt"]

Follow-up Questions

How would you make it thread-safe? Two options. (a) Single coarse RLock on the whole FileSystem — every operation acquires it. Simple, fine for low write rates. (b) Per-directory lock; acquire locks along the path during a walk. Avoids serializing readers from disjoint subtrees, but care is needed to acquire in path-order to avoid deadlock. For an interview answer, name both and pick (a) unless write contention is the explicit follow-up.
How would you persist state across restarts? Two layers. (i) Snapshot: serialize the tree (DFS, emit (path, kind, content_or_empty)); on boot, replay. (ii) Write-ahead log: append every mutation as a record (mkdir /a, add /a/b "hello"); periodic checkpoint. Tradeoff: pure snapshot loses recent writes; pure log replays slowly; combine for production.
What configuration knobs would you expose? max_filesize, max_path_depth, max_filename_length. Don’t expose the lock granularity — implementation detail. Reject paths exceeding the caps with a typed error.
How would you handle a poison-pill input? A path with millions of components, or a single file with multi-gigabyte content. Cap path depth, cap filename length, cap per-file content size, and surface metric counters for rejected requests.
How would you test it? Unit tests on each method’s contract. Property-based tests: random sequence of mkdir / addContent operations followed by an ls/read that asserts consistency with a simple oracle (e.g., a flat dict). Concurrency tests: many threads each writing to disjoint subtrees should produce identical state regardless of interleaving.
What metrics would you emit? Operation counters (per method), latency histograms, total_files, total_directories, bytes_stored gauges, error counters by type.
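The snapshot+WAL answer is easier to defend with a concrete record format. A minimal sketch, assuming JSON-lines records and a hypothetical WriteAheadLog class (neither comes from the lab): log each mutation before applying it, and on boot replay the records in order against a fresh FileSystem, or any object with mkdir and addContentToFile.

```python
import json


class WriteAheadLog:
    """Hypothetical sketch: one JSON record per mutation, replayed on boot."""

    def __init__(self, stream):
        self._stream = stream       # any writable text file-like object

    def record(self, op: str, path: str, content: str = "") -> None:
        line = json.dumps({"op": op, "path": path, "content": content})
        self._stream.write(line + "\n")
        self._stream.flush()        # production code would also os.fsync the file

    @staticmethod
    def replay(stream, fs) -> int:
        """Re-apply every record to fs; returns the number of records applied."""
        applied = 0
        for line in stream:
            if not line.strip():
                continue
            rec = json.loads(line)
            if rec["op"] == "mkdir":
                fs.mkdir(rec["path"])
            elif rec["op"] == "add":
                fs.addContentToFile(rec["path"], rec["content"])
            else:
                raise ValueError(f"unknown op {rec['op']!r}")
            applied += 1
        return applied
```

A checkpoint then amounts to writing a snapshot and truncating the log, so boot cost is one snapshot load plus a short replay.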

Product Extension

Variants in real systems: S3-style flat namespace with / as a virtual delimiter; in-memory FUSE filesystems for tests; Kubernetes ConfigMap/Secret mounting (a tiny in-memory FS exposed to a pod). The data structure is the same; the API surface and persistence vary.

Language/Runtime Follow-ups

Python: as above. Use __slots__ for File and Directory to cut per-node memory; the base Node must declare __slots__ = () as well, or every instance still carries a __dict__.
Java: Map<String, Node> (HashMap or TreeMap). For sorted ls, TreeMap is natural and avoids the per-call sort.
Go: type Node interface { ... } with Directory and File structs. For sorted ls, sort.Strings on the keys.
C++: std::variant<Directory, File> or a tagged union. std::map<std::string, std::unique_ptr<Node>> for ordered children.
JS/TS: Map<string, Node> (insertion-ordered; sort on ls). Use a discriminated union for Node.

Common Bugs

Using path.split("/") without filtering empty strings — ["", "a", "b"] for "/a/b".
Treating / differently from non-/ paths inconsistently; a single _split helper avoids this.
Not detecting directory-vs-file at the final path component — addContentToFile("/a") where /a is a directory must raise.
mkdir overwriting an existing file silently.
Storing file content as a single growing string — s += content is O(N) per append. Use a list of chunks.
Returning unsorted ls — the spec usually requires sorted output for determinism.

Debugging Strategy

When ls is wrong: print the children dict at the resolved node — almost always a wrong-path-walk bug. When append seems to overwrite: check that addContentToFile calls node.append, not node.chunks = [content]. When concurrency tests fail: log every operation in order with a thread ID; the bug is usually a missing lock around parent.children[last] = ....
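For the concurrency-debugging tip, a hypothetical traced decorator (not part of the lab’s FileSystem) can record every call with its thread name in one shared list; apply it to ls, mkdir, addContentToFile, and readContentFromFile while reproducing a failure. In CPython, list.append on a shared list is thread-safe, so the log keeps a single global order of call entries without an extra lock.

```python
import functools
import threading


def traced(log: list):
    """Decorator factory: append (thread_name, method_name, args) to log
    before each call of the decorated method."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(self, *args, **kwargs):
            log.append((threading.current_thread().name, fn.__name__, args))
            return fn(self, *args, **kwargs)
        return inner
    return wrap
```

Entries are recorded when a call starts, not when it finishes, which is usually what you want when hunting for two writers racing on parent.children.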

Mastery Criteria

Decomposed Node / File / Directory cleanly in <5 minutes.
Wrote a single _walk helper used by every public method.
Handled mkdir idempotency and the file-vs-directory collision case correctly.
Used the list-of-chunks pattern for O(1) append.
Wrote tests for every error mode (FileNotFoundError, IsADirectoryError, FileExistsError).
Articulated the snapshot+WAL persistence strategy in <60 seconds.
Implemented from a blank screen in <30 minutes.

Lab 20 — Snake Game

Goal

Implement the game logic for Snake (LC 353) — a snake moves on a grid, eats food, grows by one each meal, and dies on collision with a wall or itself. Each move(direction) returns the current score or -1 on game-over. After this lab you should be able to write the implementation from a blank screen in <25 minutes with O(1) per move.

Background Concepts

Snake is a classic OOD warmup that hides a single non-obvious data-structure decision: representing the snake as a deque of cells (head at one end, tail at the other) and using a set for O(1) self-collision check. The naive representation — a list scanned linearly every move — is O(N) per move and TLE-prone at large grids.

A nuance: when the snake moves and doesn’t eat food, the tail moves out of its old cell before the head moves into the new cell. So the head’s new cell could be the old tail’s cell — that’s not a collision. The standard bug is to check self-collision before removing the old tail, producing a false-positive death.
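The ordering bug is easiest to see on a length-2 snake that reverses into its own tail. The sketch below assumes the body is a deque with the head at the right and ignores food. step_wrong and step_right are hypothetical helper names used only for this comparison.

```python
from collections import deque
from typing import Deque, Set, Tuple

Cell = Tuple[int, int]


def step_wrong(body: Deque[Cell], occupied: Set[Cell], new_head: Cell) -> bool:
    """Buggy order: test collision while the tail still occupies its cell."""
    if new_head in occupied:
        return False                      # false positive when new_head is the tail
    occupied.discard(body.popleft())
    body.append(new_head)
    occupied.add(new_head)
    return True


def step_right(body: Deque[Cell], occupied: Set[Cell], new_head: Cell) -> bool:
    """Correct order: the tail vacates first, then the new head is tested."""
    occupied.discard(body.popleft())
    if new_head in occupied:
        return False
    body.append(new_head)
    occupied.add(new_head)
    return True


# Length-2 snake: tail (0,0), head (0,1); moving left targets the tail's cell.
for step in (step_wrong, step_right):
    body = deque([(0, 0), (0, 1)])
    print(step.__name__, step(body, set(body), (0, 0)))
# step_wrong False
# step_right True
```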

Interview Context

A 30-minute round at Amazon, Microsoft, and Bloomberg. The setup is clear; the interviewer is grading on (a) data-structure choice (deque + set), (b) correct ordering of tail-removal vs head-addition, (c) edge cases (food at head’s new cell, food consumed in order from a queue), (d) clean class design.

Problem Statement

A snake starts at (0, 0) on a grid with height rows and width columns (top-left is the origin, the row index grows downward, the column index grows rightward). Food is given as a queue of [row, col] positions consumed in order. On move(direction) where direction ∈ {U, D, L, R}:

The head advances one cell in that direction.
If the new head is out of bounds → game over, return -1.
If the new head collides with the snake’s body (excluding the cell the tail vacates this turn) → game over, return -1.
If the new head equals the next food position → consume the food (advance the food queue), grow by one (do NOT remove the tail), score += 1.
Otherwise → remove the tail.

Return the current score (number of foods eaten).

Constraints

1 ≤ width, height ≤ 10^4.
0 ≤ food.length ≤ 50.
Food positions are inside the grid and never on (0, 0).
move is called up to 10^4 times.

Clarifying Questions

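Which end of the snake list holds the head? (Either works; pick one and document it. The naive version below keeps the head at index 0; the deque version keeps it at the right end, body[-1].)
Can the snake reverse direction in one move? (At length 1 or 2, yes: the only cell behind the head is the tail, which vacates this turn. At length ≥ 3 the head hits the neck, which is a self-collision.)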
Is food consumed FIFO from the queue? (Yes.)
Does the game continue after an illegal move? (No — -1 is terminal; subsequent calls should also return -1 or be undefined. Document.)
Can two food items occupy the same cell? (Spec says no; assume distinct.)

Examples

g = SnakeGame(width=3, height=2, food=[[1,2],[0,1]])
g.move("R")   # head: (0,1)              -> 0
g.move("D")   # head: (1,1)              -> 0
g.move("R")   # head: (1,2) eats food[0] -> 1
g.move("U")   # head: (0,2)              -> 1
g.move("L")   # head: (0,1) eats food[1] -> 2
g.move("U")   # out of bounds            -> -1

Initial Brute Force

class SnakeNaive:
    def __init__(self, w, h, food):
        self.w, self.h = w, h
        self.food = food
        self.snake = [(0, 0)]
    def move(self, d):
        dr, dc = {"U":(-1,0),"D":(1,0),"L":(0,-1),"R":(0,1)}[d]
        r, c = self.snake[0]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < self.h and 0 <= nc < self.w): return -1
        if self.food and [nr, nc] == self.food[0]:
            self.food.pop(0)
            self.snake.insert(0, (nr, nc))
        else:
            self.snake.pop()
            if (nr, nc) in self.snake: return -1
            self.snake.insert(0, (nr, nc))
        return len(self.snake) - 1

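There are three performance problems: (nr, nc) in self.snake is O(N); self.snake.insert(0, ...) is O(N) on a list; self.food.pop(0) is O(F). There are also two correctness gaps: game over is not sticky (a move after -1 keeps playing), and self.food aliases the caller’s list, so pop(0) mutates the input.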

Brute Force Complexity

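move: O(N) per call, where N is the snake length. Across M moves: O(M · N). The snake grows only by eating, so N ≤ 1 + food.length. Under the stated limits (≤ 50 food, ≤ 10^4 moves) that is roughly 5 · 10^5 steps, which passes. The O(1) version matters once food is unbounded and N can reach thousands of cells: at N = 10^4 and M = 10^4, the naive version does about 10^8 steps, which is borderline.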

Optimization Path

Replace list with collections.deque (O(1) append/pop both ends). Add a set of body cells for O(1) collision detection. Keep an integer food_idx instead of pop(0)-ing the food list. Now every move is O(1).

Final Expected Approach

State: body: deque[(r, c)] with head at the right (body[-1]), body_set: set[(r, c)] mirroring it, food_idx: int, plus width, height, food. On move: compute new head, check bounds, decide grow-or-shift. If grow: append new head to deque and set; advance food_idx. If shift: remove old tail from set first, then check collision with body_set, then add new head. The order matters — exactly the “tail vacates then head moves” semantics.

Data Structures Used

collections.deque for the snake body (O(1) head/tail append/pop).
set[tuple[int, int]] for membership (O(1) collision check).
int food_idx to avoid mutating the food list.
A dict[str, tuple[int, int]] for direction deltas.

Correctness Argument

The body deque represents the snake as a sequence from tail to head. The body_set is the membership oracle. Invariant: set(body) == body_set is maintained at every operation. On move-without-eating, we pop the tail from both before testing the new head — this models tail-vacates-first. On move-with-eating, the tail stays, so the snake grows by one. Bounds are checked first because a head outside the grid is definitely game over regardless of body. The score equals food consumed, which equals food_idx.

Complexity

Operation	Time	Space
`move`	O(1) amortized	O(N) for body

Implementation Requirements

from collections import deque
from typing import List


class SnakeGame:
    DIRS = {
        "U": (-1, 0),
        "D": (1, 0),
        "L": (0, -1),
        "R": (0, 1),
    }

    def __init__(self, width: int, height: int, food: List[List[int]]):
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        self._w = width
        self._h = height
        self._food = [tuple(f) for f in food]
        self._food_idx = 0
        self._body: deque[tuple[int, int]] = deque([(0, 0)])
        self._body_set: set[tuple[int, int]] = {(0, 0)}
        self._game_over = False

    def move(self, direction: str) -> int:
        if self._game_over:
            return -1
        if direction not in self.DIRS:
            raise ValueError(f"invalid direction: {direction!r}")

        dr, dc = self.DIRS[direction]
        head_r, head_c = self._body[-1]
        nr, nc = head_r + dr, head_c + dc

        # 1. bounds
        if not (0 <= nr < self._h and 0 <= nc < self._w):
            self._game_over = True
            return -1

        # 2. eat-or-shift decision
        new_head = (nr, nc)
        eats = (
            self._food_idx < len(self._food)
            and self._food[self._food_idx] == new_head
        )

        if eats:
            self._food_idx += 1
            # grow: head added; tail stays
            if new_head in self._body_set:
                # the new head landed on the body: LC 353 guarantees food never
                # appears on the snake, but other inputs may place it there
                self._game_over = True
                return -1
            self._body.append(new_head)
            self._body_set.add(new_head)
            return self._food_idx

        # shift: tail vacates first
        old_tail = self._body.popleft()
        self._body_set.remove(old_tail)
        if new_head in self._body_set:
            self._game_over = True
            return -1
        self._body.append(new_head)
        self._body_set.add(new_head)
        return self._food_idx

Tests

def test_basic_path():
    g = SnakeGame(3, 2, [[1, 2], [0, 1]])
    assert g.move("R") == 0
    assert g.move("D") == 0
    assert g.move("R") == 1
    assert g.move("U") == 1
    assert g.move("L") == 2
    assert g.move("U") == -1

def test_immediate_wall():
    g = SnakeGame(3, 3, [])
    assert g.move("U") == -1
    assert g.move("R") == -1   # idempotent terminal

def test_self_collision_after_growth():
    # Grow to length 5, then curl back into the body.
    g = SnakeGame(4, 4, [[0, 1], [0, 2], [0, 3], [1, 3]])
    assert g.move("R") == 1
    assert g.move("R") == 2
    assert g.move("R") == 3
    assert g.move("D") == 4   # head (1,3); body (0,0)..(0,3),(1,3)
    assert g.move("L") == 4   # head (1,2); tail (0,0) vacated
    assert g.move("U") == -1  # (0,2) is still body (the tail is (0,1)): collision

def test_tail_cell_is_safe():
    # The head may enter the cell its tail vacates on the same move.
    g = SnakeGame(3, 3, [[0, 1]])
    assert g.move("R") == 1        # eat at (0,1); body (0,0),(0,1)
    assert g.move("L") == 1        # head -> (0,0), the cell the tail just vacated
    # Length 4 chasing its tail around a 2x2 square.
    g = SnakeGame(4, 4, [[0, 1], [0, 2], [0, 3]])
    for d in "RRR":
        g.move(d)                  # eat three times: body (0,0)..(0,3)
    assert g.move("D") == 3        # head (1,3)
    assert g.move("L") == 3        # head (1,2); tail is now (0,2)
    assert g.move("U") == 3        # head enters (0,2) as the tail leaves it

def test_food_consumed_in_order():
    g = SnakeGame(5, 5, [[0, 1], [0, 2]])
    assert g.move("R") == 1
    assert g.move("R") == 2

def test_terminal_state_persists():
    g = SnakeGame(2, 2, [])
    assert g.move("U") == -1
    assert g.move("D") == -1
    assert g.move("L") == -1

Follow-up Questions

How would you test it? Unit tests on each path: bounds, eat, shift, self-collision, tail-cell-safe. Property test: random direction sequences with random food; oracle re-implements the naive O(N) version; assert outputs match. Smoke test: a long random run that doesn’t crash.
What configuration knobs would you expose? Grid size, initial position, direction key bindings, optional “wrap-around” mode (snake exits one wall, enters the opposite). Don’t expose the data-structure choices.
How would you handle a poison-pill input? Invalid direction strings → raise. Negative grid dimensions → raise at construction. Food positions outside the grid → raise at construction. After the game ends, calls to move are idempotent (return -1).
How would you make it thread-safe? Wrap move in a Lock. Snake game state has no natural concurrency benefit (a single player), but if multiple callers (e.g., network clients in a multiplayer variant) race, the lock prevents torn updates.
What metrics would you emit? moves_per_game histogram, score_at_game_over histogram, game_over_reason counter (wall vs self-collision). Useful to compare difficulty levels.
How would you scale to N players (multiplayer Snake)? Each player keeps their own body/body_set. Collision detection uses the union of all bodies, for example forbidden = self.body_set.union(*(o.body_set for o in others)). The food queue is shared. Use a lock per shared-state structure, or an STM-style atomic transaction per tick.
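Here is a sketch of the property-test oracle described in the first follow-up. It has two parts: a deliberately simple O(N)-per-move reference model, and a seeded random case generator. naive_scores and random_case are hypothetical names. The intended use is to feed the same (width, height, food, moves) to SnakeGame and assert that the two score lists match. The seed makes any failure reproducible.

```python
import random
from typing import List, Tuple

DELTAS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def naive_scores(width: int, height: int, food: List[List[int]],
                 moves: str) -> List[int]:
    """Reference model: list-based body, head at index 0, sticky game over."""
    body = [(0, 0)]
    food_idx, over, out = 0, False, []
    for d in moves:
        if over:
            out.append(-1)
            continue
        dr, dc = DELTAS[d]
        head = (body[0][0] + dr, body[0][1] + dc)
        eats = food_idx < len(food) and tuple(food[food_idx]) == head
        # The tail vacates its cell only when the snake does not eat.
        rest = body if eats else body[:-1]
        if not (0 <= head[0] < height and 0 <= head[1] < width) or head in rest:
            over = True
            out.append(-1)
            continue
        body = [head] + rest
        if eats:
            food_idx += 1
        out.append(food_idx)
    return out


def random_case(seed: int) -> Tuple[int, int, List[List[int]], str]:
    """Random small grid, distinct food cells (never (0, 0)), and move string."""
    rng = random.Random(seed)
    w, h = rng.randint(1, 6), rng.randint(1, 6)
    cells = [[r, c] for r in range(h) for c in range(w) if (r, c) != (0, 0)]
    food = rng.sample(cells, k=min(len(cells), rng.randint(0, 5)))
    moves = "".join(rng.choice("UDLR") for _ in range(rng.randint(1, 30)))
    return w, h, food, moves
```

Small grids keep collisions and wall hits frequent, and those paths are exactly the ones the O(1) version is most likely to get wrong.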

Product Extension

Multiplayer variants (Slither.io, Agar.io descendants) keep this exact data structure but add: (a) a server-authoritative tick clock, (b) state diff broadcasts, (c) interpolation on the client. The interview-relevant primitive is unchanged.

Language/Runtime Follow-ups

Python: as above. collections.deque is the right primitive — list.pop(0) is O(N).
Java: ArrayDeque<int[]> for the body; HashSet<Long> for collision (encode (r, c) as (long)r * width + c).
Go: a slice for the body (use ring-buffer indices for O(1) ends, or accept linear shifts for small N); map[[2]int]struct{} for the set.
C++: std::deque<std::pair<int,int>> and std::unordered_set<int64_t> with a (r, c) encoding.
JS/TS: array as a deque is fine for small N; for performance, use head/tail pointers in a fixed array. Set with a string-key "r,c" for collision.
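The Go and JS/TS notes suggest replacing a growable deque with head/tail indices over a fixed array. The following Python sketch shows that idea for illustration only; real Python code should simply use collections.deque. RingDeque is a hypothetical name. It assumes the capacity is fixed at construction, for example width * height, which is safe because the snake's cells are distinct and it can never be longer than the grid.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class RingDeque(Generic[T]):
    """Fixed-capacity deque: push_back, pop_front and back are all O(1)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[T]] = [None] * capacity
        self._front = 0   # index of the oldest element (the snake's tail)
        self._size = 0

    def __len__(self) -> int:
        return self._size

    def push_back(self, item: T) -> None:
        if self._size == len(self._slots):
            raise OverflowError("ring buffer full")
        self._slots[(self._front + self._size) % len(self._slots)] = item
        self._size += 1

    def pop_front(self) -> T:
        if self._size == 0:
            raise IndexError("pop from empty ring buffer")
        item = self._slots[self._front]
        self._slots[self._front] = None          # drop the reference
        self._front = (self._front + 1) % len(self._slots)
        self._size -= 1
        return item  # type: ignore[return-value]

    def back(self) -> T:
        """Newest element (the snake's head)."""
        if self._size == 0:
            raise IndexError("back of empty ring buffer")
        idx = (self._front + self._size - 1) % len(self._slots)
        return self._slots[idx]  # type: ignore[return-value]
```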

Common Bugs

Removing the tail after checking collision — the just-vacated cell falsely flags as collision.
Using list.insert(0, ...) and list.pop(0) — both O(N), defeats the data-structure choice.
Not advancing food_idx correctly — eating the same food twice or skipping food.
Comparing food[idx] (a list) to (nr, nc) (a tuple) — [0,1] == (0,1) is False in Python. Normalize types at construction.
Allowing move after game over without returning -1 — undefined behavior. Set a _game_over flag and short-circuit.
Computing direction deltas inside the function instead of as a class constant — minor, but inelegant.

Debugging Strategy

When the snake “dies” on a legal move: print the body, body_set, new_head, and the comparison being made just before returning -1. The bug is almost always the order of tail-vacate vs collision-check. When the score is wrong: print food_idx after each call. When move “succeeds” through a wall: print nr, nc, self._h, self._w — the bounds check is off-by-one.

Mastery Criteria

Picked deque + set in <30 seconds, justified the choice.
Stated the tail-vacate-first invariant unprompted.
Wrote O(1) move from a blank screen in <20 minutes.
Wrote the tail-cell-is-safe test from memory.
Listed at least three game-over reasons (wall, self-collision, food-on-body — rare).
Articulated the multiplayer extension in <60 seconds.
Solved LC 353 in <25 minutes total with all tests passing.

Lab 21 — Tic-Tac-Toe (Streaming Winner Detection)

Goal

Implement Tic-Tac-Toe (LC 348), where players alternate moves on an N × N board and move(row, col, player) returns 0 (no winner yet) or the player number on a winning move. Rescanning the whole board after every move is O(N²), and even scanning only the lines through the move is O(N). Achieve O(1) per move by maintaining row, column, and diagonal counters. After this lab you should be able to write the implementation in <15 minutes.

Background Concepts

The non-trivial part of Tic-Tac-Toe as a data-structure problem is the per-move winner check. Each cell lies on exactly one row, exactly one column, and zero, one, or two diagonals. Add +1 to those counters for a player-1 move and -1 for a player-2 move. A counter that reaches +N means player 1 owns that whole line; -N means player 2 does. This collapses the O(N²) scan to O(1).

Diagonals: the main diagonal is the set of cells with row == col; the anti-diagonal is the set with row + col == N - 1. The center cell of an odd-N board sits on both.
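The sketch below uses the two predicates (row == col and row + col == N - 1) to list which counters a single move updates. counters_touched is a hypothetical helper. It is useful as a quick self-check before writing the class.

```python
from typing import List


def counters_touched(r: int, c: int, n: int) -> List[str]:
    """Names of the counters a move at (r, c) on an n x n board updates."""
    touched = [f"rows[{r}]", f"cols[{c}]"]
    if r == c:
        touched.append("diag")
    if r + c == n - 1:
        touched.append("anti")
    return touched


print(counters_touched(1, 1, 3))  # ['rows[1]', 'cols[1]', 'diag', 'anti']
print(counters_touched(0, 2, 3))  # ['rows[0]', 'cols[2]', 'anti']
print(counters_touched(0, 1, 3))  # ['rows[0]', 'cols[1]']
```

On an even board no cell satisfies both predicates: r == c together with r + c == n - 1 would require 2r = n - 1, and n - 1 is odd.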

This is the cleanest real example of “exchange a redundant scan for a maintained counter” — a recurring pattern in real code (running averages, sliding maxes, materialized aggregates in databases).
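Here is the same trade in one of its most common everyday forms: a sliding-window sum kept up to date in O(1) per sample, instead of re-summing the window on every sample. WindowSum is a hypothetical example class, not part of this lab.

```python
from collections import deque
from typing import Deque


class WindowSum:
    """Mean of the last `size` values, maintained incrementally."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("size must be positive")
        self._size = size
        self._window: Deque[float] = deque()
        self._total = 0.0

    def add(self, x: float) -> float:
        """Add x, evict the oldest value if the window overflows, return the mean."""
        self._window.append(x)
        self._total += x                        # O(1) update instead of sum(window)
        if len(self._window) > self._size:
            self._total -= self._window.popleft()
        return self._total / len(self._window)


w = WindowSum(3)
print([w.add(x) for x in [1, 2, 3, 4]])  # [1.0, 1.5, 2.0, 3.0]
```

With floating-point inputs the running total can drift over very long streams. Periodically recomputing it from the window is a common safeguard.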

Interview Context

A 20-minute warmup at Amazon, Google, and Microsoft, often paired with the LRU lab as a phone-screen double feature. The interviewer wants O(1) per move and a clean class API, and usually asks one or two follow-ups about extending to K-in-a-row games such as Connect Four or Gomoku, where the winning condition is more complex.

Problem Statement

Implement TicTacToe(n) and move(row, col, player) -> int:

The board is n × n and starts empty.
Players alternate (caller manages turn order; you don’t validate it for this version).
player is 1 or 2.
Each move places the player’s mark at (row, col). Assume the cell is empty.
Return the player’s number if this move results in a win (full row, column, main diagonal, or anti-diagonal of that player); otherwise return 0.
Once a player has won, the game ends; further moves are not part of the spec but should be defensively handled.

Constraints

1 ≤ n ≤ 100.
Target: O(1) per call to move.
At most n² calls to move (every call fills a distinct empty cell), so at most 10^4 moves per instance.

Clarifying Questions

Are row and col 0-indexed? (Yes.)
Is the cell guaranteed empty? (Per LC 348: yes. In practice, validate defensively.)
Do we need to detect a draw? (Not in LC 348; doable as move_count == n*n.)
Once a player wins, are further moves undefined? (Yes; either short-circuit to that winner or raise.)
Can the same player call move twice in a row? (Spec assumes alternating; we do not enforce.)

Examples

g = TicTacToe(3)
g.move(0, 0, 1)  # 0  player 1 at (0,0)
g.move(0, 2, 2)  # 0  player 2 at (0,2)
g.move(2, 2, 1)  # 0
g.move(1, 1, 2)  # 0
g.move(2, 0, 1)  # 0
g.move(1, 0, 2)  # 0
g.move(2, 1, 1)  # 1  player 1 wins via row 2: (2,0), (2,1), (2,2)

Initial Brute Force

class TicTacToeNaive:
    def __init__(self, n):
        self.n = n
        self.b = [[0] * n for _ in range(n)]
    def move(self, r, c, p):
        self.b[r][c] = p
        # row r
        if all(self.b[r][j] == p for j in range(self.n)): return p
        # column c
        if all(self.b[i][c] == p for i in range(self.n)): return p
        # main diagonal, only if the move lies on it
        if r == c and all(self.b[i][i] == p for i in range(self.n)): return p
        # anti-diagonal, only if the move lies on it
        if r + c == self.n - 1 and all(self.b[i][self.n - 1 - i] == p for i in range(self.n)): return p
        return 0

This is O(N) per move. With N = 100 and at most N² = 10^4 moves, that is about 10^6 cell reads, which passes easily. The point is structural: it rescans the whole row, column, and diagonals every time, even though the move changed only one cell.

Brute Force Complexity

move: O(N) per call. Total: O(M · N).

Optimization Path

Replace each row/col/diagonal full-scan with a maintained counter. Use +1 for player 1, -1 for player 2; a counter at ±N is a win. The diagonals are special-cased by the row == col and row + col == N - 1 predicates — we only update them when the cell is on the diagonal. Now every check is a single integer comparison.

Final Expected Approach

State: rows[N], cols[N], diag (scalar), and anti (scalar), each an integer counter. On move(r, c, player), compute delta = +1 if player == 1 else -1. Add delta to rows[r] and cols[c], and also to diag and anti when the cell lies on them. If any updated counter reaches ±N (+N for player 1, -N for player 2), that player wins.

Data Structures Used

list[int] of size N for rows.
list[int] of size N for columns.
int for the main diagonal counter.
int for the anti-diagonal counter.
(Optional) list[list[int]] board for defensive duplicate-move detection.

Correctness Argument

A row counter equals (number of player-1 cells in the row) minus (number of player-2 cells in the row). The row has N cells and each is played at most once, so the counter reaches +N only when all N cells belong to player 1, and -N only when all belong to player 2. The same argument covers columns and both diagonals. Each diagonal counter is updated only for cells that satisfy its predicate, so it counts exactly that diagonal's cells. The argument depends on every cell being played at most once, which is why repeated cells must be rejected or ruled out by the spec.

Complexity

Operation	Time	Space
`move`	O(1)	O(N) for row/col counters

Implementation Requirements

class TicTacToe:
    def __init__(self, n: int):
        if n < 1:
            raise ValueError("n must be >= 1")
        self._n = n
        self._rows = [0] * n
        self._cols = [0] * n
        self._diag = 0
        self._anti = 0
        self._winner = 0  # 0 = no winner yet

    def move(self, row: int, col: int, player: int) -> int:
        if self._winner:
            return self._winner
        if not (0 <= row < self._n and 0 <= col < self._n):
            raise IndexError(f"({row}, {col}) out of bounds for n={self._n}")
        if player not in (1, 2):
            raise ValueError(f"player must be 1 or 2, got {player}")

        delta = 1 if player == 1 else -1
        target = self._n if player == 1 else -self._n

        self._rows[row] += delta
        self._cols[col] += delta
        if row == col:
            self._diag += delta
        if row + col == self._n - 1:
            self._anti += delta

        if (self._rows[row] == target
                or self._cols[col] == target
                or self._diag == target
                or self._anti == target):
            self._winner = player
            return player
        return 0

Tests

def test_row_win_player1():
    g = TicTacToe(3)
    assert g.move(0, 0, 1) == 0
    assert g.move(1, 0, 2) == 0
    assert g.move(0, 1, 1) == 0
    assert g.move(1, 1, 2) == 0
    assert g.move(0, 2, 1) == 1

def test_col_win_player2():
    g = TicTacToe(3)
    g.move(0, 0, 1); g.move(0, 1, 2)
    g.move(1, 0, 1); g.move(1, 1, 2)
    g.move(2, 2, 1)
    assert g.move(2, 1, 2) == 2

def test_diagonal_win():
    g = TicTacToe(3)
    g.move(0, 0, 1); g.move(0, 1, 2)
    g.move(1, 1, 1); g.move(0, 2, 2)
    assert g.move(2, 2, 1) == 1

def test_anti_diagonal_win():
    g = TicTacToe(3)
    g.move(0, 2, 1); g.move(0, 0, 2)
    g.move(1, 1, 1); g.move(0, 1, 2)
    assert g.move(2, 0, 1) == 1

def test_no_winner_on_partial():
    g = TicTacToe(3)
    assert g.move(0, 0, 1) == 0
    assert g.move(1, 1, 2) == 0

def test_n_equals_one():
    g = TicTacToe(1)
    assert g.move(0, 0, 1) == 1

def test_invalid_player():
    g = TicTacToe(3)
    try: g.move(0, 0, 3)
    except ValueError: pass
    else: assert False

def test_move_after_winner():
    g = TicTacToe(3)
    for c in range(3): g.move(0, c, 1)
    # subsequent moves still report the winner
    assert g.move(1, 1, 2) == 1

def test_large_n_no_win():
    g = TicTacToe(100)
    # Fill 99 of player 1's row 0 — should not win.
    for c in range(99):
        assert g.move(0, c, 1) == 0

Follow-up Questions

How would you test it? Unit tests for each axis (row, column, both diagonals) for both players. Property test: random move sequences, with the naive O(N) scan as the oracle; assert outputs match. Edge: n = 1 (any move wins). Edge: for n ≥ 2, a corner cell lies on exactly one diagonal, while the center of an odd board lies on both.
What is the consistency model? Single-threaded, so trivially linearizable. If multiple threads race on move, the unsynchronized += updates can be lost, so a completed line may never reach ±N and a real win goes unreported. Wrap move in a Lock if calls can be concurrent.
What configuration knobs would you expose? Just n. Resist adding “win condition = K-in-a-row instead of N” — that’s a different problem (Connect Four / Gomoku). If asked, see Connect Four extension below.
How would you handle a poison-pill input? Out-of-bounds coords (IndexError), invalid player (ValueError), repeated cell (defensive: track board and reject). The current implementation rejects bounds and player; cell-overwrite detection is an explicit follow-up.
How would you extend to K-in-a-row on an N×N board (Gomoku, Connect Four)? Counters no longer suffice: you need to find any window of K consecutive same-player cells. There are two options. (a) On each move, walk outward from the cell in both directions along its row, column, and both diagonals, counting consecutive same-player cells. That is at most K - 1 steps each way, so O(K) per move. (b) Store run lengths at run endpoints per axis and merge neighboring runs on each move (more memory, O(1) per move). In an interview, (a) is the right answer: it is clean, and O(K) rather than O(N).
What metrics would you emit? moves_total, wins_total{player=1|2}, time_to_win histogram. Game-balance metrics for product analytics; otherwise sparse.
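Below is a sketch of option (a) from the K-in-a-row follow-up. After a mark is placed, the helper walks outward from (r, c) along each of the four axes and counts consecutive same-player cells. wins_k_in_a_row is a hypothetical helper. It assumes the board is a list of lists with 0 for empty. It reads the row and column counts from the board itself, so it also covers non-square boards such as Connect Four's 6 × 7.

```python
from typing import List

AXES = ((0, 1), (1, 0), (1, 1), (1, -1))  # row, column, main diagonal, anti-diagonal


def wins_k_in_a_row(board: List[List[int]], r: int, c: int, k: int) -> bool:
    """True if the mark at (r, c) is part of k or more consecutive equal marks.

    Each axis walks at most k - 1 cells in total, so the check is O(k) per move.
    """
    player = board[r][c]
    if player == 0:
        return False
    rows, cols = len(board), len(board[0])
    for dr, dc in AXES:
        run = 1  # the cell just played
        for sign in (1, -1):
            rr, cc = r + sign * dr, c + sign * dc
            while (run < k and 0 <= rr < rows and 0 <= cc < cols
                   and board[rr][cc] == player):
                run += 1
                rr += sign * dr
                cc += sign * dc
        if run >= k:
            return True
    return False
```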

Product Extension

The “maintained counter instead of full scan” pattern shows up everywhere: real-time sports scores (a goal updates a single team total instead of recomputing from a play log), database materialized views (incrementally maintained, not recomputed), and Prometheus counters (the application increments a running total rather than recounting events, so rate() needs only the samples inside its window). Tic-Tac-Toe is the simplest possible illustration.

Language/Runtime Follow-ups

Python: as above. A list stores an 8-byte pointer per element on 64-bit CPython; for very large N, array.array("i", [0] * n) packs C ints (typically 4 bytes each) and is denser.
Java: int[] rows, int[] cols, int diag, int anti. No autoboxing in the hot path.
Go: same — value types throughout, no allocations after construction.
C++: std::vector<int>. Make move non-virtual; this is hot-path code.
JS/TS: Int32Array(n) for rows and cols — denser than a regular Array.

Common Bugs

Off-by-one in the anti-diagonal predicate: row + col == n - 1, not n or n + 1.
Using +1 for both players: a line filled with mixed marks then also reaches N and reports a false win. Use opposite-sign deltas.
Forgetting to update the diagonal counter when the move is on the diagonal — counter stays stuck.
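Getting the sign of the win threshold wrong, for example comparing player 2’s counters to +n. The code above uses target = n if player == 1 else -n. An equivalent check is abs(counter) == n, because a counter this move just updated can reach ±n only with the mover’s sign.
Not guarding against repeated cells: playing the same (r, c) twice can produce a spurious win for one player or cancel out an earlier mark.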
Concurrent calls without a lock: counters become inconsistent and the win condition fires on the wrong player.
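One way to close the repeated-cell gap: keep an n × n board alongside the counters, and reject a move onto an occupied cell before touching any counter. This is a sketch, not the lab's reference class. GuardedTicTacToe and its choice to raise ValueError on an occupied cell are assumptions.

```python
from typing import List


class GuardedTicTacToe:
    """O(1)-per-move Tic-Tac-Toe that also rejects moves onto occupied cells."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be >= 1")
        self._n = n
        self._board: List[List[int]] = [[0] * n for _ in range(n)]  # 0 = empty
        self._rows = [0] * n
        self._cols = [0] * n
        self._diag = 0
        self._anti = 0
        self._winner = 0

    def move(self, row: int, col: int, player: int) -> int:
        if self._winner:
            return self._winner
        if not (0 <= row < self._n and 0 <= col < self._n):
            raise IndexError(f"({row}, {col}) out of bounds for n={self._n}")
        if player not in (1, 2):
            raise ValueError(f"player must be 1 or 2, got {player}")
        # Validate before mutating anything, so a rejected move leaves no trace.
        if self._board[row][col]:
            raise ValueError(f"cell ({row}, {col}) is already taken")
        self._board[row][col] = player

        delta = 1 if player == 1 else -1
        self._rows[row] += delta
        self._cols[col] += delta
        if row == col:
            self._diag += delta
        if row + col == self._n - 1:
            self._anti += delta

        # A counter at +/-n that this move did not touch would already have
        # ended the game, so checking all four with abs() is safe here.
        if any(abs(v) == self._n for v in
               (self._rows[row], self._cols[col], self._diag, self._anti)):
            self._winner = player
            return player
        return 0
```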

Debugging Strategy

When wins are missed: print (rows, cols, diag, anti) after each move; trace which counter should have hit ±N. When wins fire spuriously: same trace — usually the diagonal predicate is wrong. When tests pass for player 1 but not player 2: confirm delta = -1 and target = -N for player 2; sign errors are common.

Mastery Criteria

Stated the O(1)-per-move counter approach in <30 seconds.
Wrote the diagonal predicates (r == c, r + c == n - 1) without prompting.
Implemented from a blank screen in <15 minutes with all tests passing.
Listed the K-in-a-row extension and named the right scan strategy.
Articulated why this is a “maintained counter” pattern in <30 seconds.
Wrote tests covering both diagonals and n=1.

Lab 22 — Text Editor Buffer (Gap Buffer / Piece Table)

Goal

Implement a text editor data structure that supports cursor-local insert, delete (backspace), left/right cursor movement, and substring read with O(1) amortized cursor-local edits. The reference implementation is a gap buffer; the follow-up is a piece table. After this lab you should articulate why a flat str or list[char] is wrong and produce a working gap buffer in <30 minutes.

Background Concepts

A naive editor representation, a single string, makes every insert and delete O(N): the character data after the cursor must shift. In a million-character document, one keystroke near the start moves up to a million characters. Real editors avoid this with one of three data structures:

Gap buffer: a single contiguous array with a “gap” of unused slots positioned at the cursor. Insert at cursor = O(1) (write into the gap, shrink it). Move cursor = O(distance moved) — the gap moves with the cursor by shifting characters across it. Used by Emacs.
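Piece table: an immutable original buffer + an append-only “added” buffer + a list of “pieces” describing the visible document as concatenated slices. Insert anywhere = append the text to the added buffer, then split one piece and splice in a new one. The copy cost is O(|text|); locating the piece is O(#pieces) with a plain list, or O(log #pieces) with a balanced tree. Used by Word, and by VS Code in a tree-backed form (a “piece tree”).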
Rope / balanced tree of strings: O(log N) for all operations, the most general. Used by Xi-editor and several research editors.
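To make the piece-table description concrete, here is a minimal sketch under simplifying assumptions. Pieces live in a plain Python list, so locating the piece for an offset is O(#pieces) rather than the O(log #pieces) of a tree-backed version. Only insert and full-text read are shown. PieceTable and its methods are hypothetical names.

```python
from typing import List, Tuple

# A piece is (source, start, length): source 0 = original buffer, 1 = add buffer.
Piece = Tuple[int, int, int]


class PieceTable:
    """Minimal piece table: insert-at-offset and full-text read only."""

    def __init__(self, original: str = "") -> None:
        # bufs[0] is never modified; bufs[1] only ever grows at the end.
        # (str concatenation keeps the sketch short; a real editor would use
        # a growable buffer for the add side.)
        self._bufs: List[str] = [original, ""]
        self._pieces: List[Piece] = [(0, 0, len(original))] if original else []
        self._length = len(original)

    def __len__(self) -> int:
        return self._length

    def insert(self, pos: int, text: str) -> None:
        """Insert text so that it starts at document offset pos."""
        if not 0 <= pos <= self._length:
            raise IndexError(f"position {pos} out of range 0..{self._length}")
        if not text:
            return
        new_piece: Piece = (1, len(self._bufs[1]), len(text))
        self._bufs[1] += text
        self._length += len(text)
        offset = 0
        for i, (src, start, length) in enumerate(self._pieces):
            if pos <= offset + length:
                # Split piece i at `cut` and splice the new piece in between,
                # dropping any zero-length halves.
                cut = pos - offset
                left: Piece = (src, start, cut)
                right: Piece = (src, start + cut, length - cut)
                self._pieces[i:i + 1] = [p for p in (left, new_piece, right) if p[2] > 0]
                return
            offset += length
        self._pieces.append(new_piece)  # reached only when the document was empty

    def text(self) -> str:
        return "".join(self._bufs[src][start:start + length]
                       for src, start, length in self._pieces)
```

The original text is never copied or modified: every edit only appends to the add buffer and rewrites a few piece tuples, which is what makes undo and large-file loading cheap in practice.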

The interview almost always wants the gap buffer because it is the simplest correct answer with the right asymptotics under the locality assumption (most edits happen near the cursor). The piece table is the right follow-up when the interviewer asks “what if edits aren’t local?” or “what if a cursor jump across the document must not cost O(distance)?”.

Interview Context

A 30-to-45-minute round at Google (Docs), Microsoft (VS Code, Word), JetBrains, and any team that builds editor-like UI. Most candidates default to a list[char] and accept O(N) per insert; that’s a partial answer. Reaching for a gap buffer immediately demonstrates that you’ve thought about real editor performance.

Problem Statement

Implement TextEditor with:

insert(text: str): insert text at the current cursor position; cursor moves to the end of inserted text.
delete_left(n: int) -> int: delete up to n characters to the left of the cursor; return the actual number deleted (capped by left content).
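move_left(n: int) -> str: move the cursor n positions left (capped at the start); return the last 10 characters to the left of the new cursor (fewer if not available).
move_right(n: int) -> str: move the cursor n positions right (capped at the end); return the same read-back, the last 10 characters to the left of the new cursor.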
text() -> str: return the full document (debug helper; not on the hot path).

This is the LC 2296 “Design a Text Editor” interface (addText, deleteText, cursorLeft, cursorRight) under snake_case names, including its read-back-10 affordance.

Constraints

Up to 10^4 calls in total.
Each text argument up to 40 characters; total inserted up to ~10^4 characters.
insert must run in amortized O(1) plus O(|text|); delete_left must run in O(1). Move operations are O(distance moved).

Clarifying Questions

Is the cursor between characters (column-style) or at a character (cell-style)? (Between — like every editor.)
Does delete_left(n) with n > available delete only what’s available and return that count? (Yes.)
What does move_left return when the cursor is at the start? (Empty string.)
Is the buffer Unicode-aware? (For LC 2296: ASCII suffices. For real editors: must handle code points and grapheme clusters; out of scope here.)
Are inserts at any cursor position guaranteed local (i.e., does the interviewer want gap buffer or piece table)? (Default gap buffer — most edits are local.)

Examples

ed = TextEditor()
ed.insert("leetcode")     # cursor at end; text = "leetcode"
ed.delete_left(4)         # 4
ed.text()                 # "leet"
ed.insert("practice")     # text = "leetpractice"
ed.move_right(3)          # "etpractice" -> last 10 to left
ed.move_left(8)           # "leet"
ed.delete_left(10)        # 4
ed.text()                 # "practice"

Initial Brute Force

class StringEditor:
    def __init__(self): self.s = ""; self.c = 0
    def insert(self, text): self.s = self.s[:self.c] + text + self.s[self.c:]; self.c += len(text)
    def delete_left(self, n):
        d = min(n, self.c); self.s = self.s[:self.c - d] + self.s[self.c:]; self.c -= d; return d
    def move_left(self, n): self.c = max(0, self.c - n); return self.s[max(0, self.c - 10):self.c]
    def move_right(self, n): self.c = min(len(self.s), self.c + n); return self.s[max(0, self.c - 10):self.c]
    def text(self): return self.s

Correct, but every insert and delete is O(N) due to slice + concatenation. At 10^4 ops on a 10^4-char document: 10^8 char-shifts. Borderline-TLE on LC 2296.

Brute Force Complexity

insert: O(N + |text|). delete_left: O(N). move_*: O(1) for the text return (slice). Total worst case: O(N · operations).

Optimization Path

Switch to a gap buffer: a single bytearray (or list[str]) of length capacity, with two indices gap_start and gap_end. Characters before gap_start and from gap_end onward are real content; the range [gap_start, gap_end) is unused. The cursor position is gap_start. Insert at cursor: write into the gap and advance gap_start. Delete left: decrement gap_start (the deleted characters simply become part of the gap; no copying is needed). Move left by k: copy the k characters just before gap_start into the k slots just before gap_end, then decrement both indices by k (the gap moves toward the start). Move right by k: symmetric. Grow when an insert needs more room than the gap has, by doubling the capacity.

Final Expected Approach

A bytearray buf of size capacity. Indices gap_start (left edge of gap) and gap_end (right edge, exclusive). Invariants: 0 ≤ gap_start ≤ gap_end ≤ capacity. Document length = capacity - (gap_end - gap_start). Cursor = gap_start. Operations manipulate the indices and copy small ranges of bytes; total work for cursor-local edits is bounded by the edit size, not the document size.

Data Structures Used

bytearray (or list[str]) for the storage buffer.
Two int indices gap_start and gap_end.
A capacity bookkeeping value.
A _grow helper that doubles capacity when the gap is exhausted.

Correctness Argument

After every operation, the document is buf[:gap_start] + buf[gap_end:capacity], decoded. insert(text): ensure the gap holds len(text) slots (grow if needed); copy text into buf[gap_start:gap_start + len(text)]; advance gap_start by len(text). The document grows by exactly len(text) and the cursor moves to the end of the insertion. delete_left(n): cap n at gap_start (the cursor is at gap_start, so exactly gap_start characters lie to its left); decrement gap_start by n. The document shrinks by exactly n. move_left(k): cap k at gap_start; copy buf[gap_start - k:gap_start] to buf[gap_end - k:gap_end]; subtract k from both indices. The visible document is unchanged; only the gap moved. move_right(k) is the mirror image, capped at capacity - gap_end.
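The per-byte loop in _move_left iterates right-to-left so that an overlapping copy does not overwrite bytes it has not yet read. In Python the same move can be a single slice assignment. The right-hand slice of a bytearray is evaluated into a new object before the assignment happens, so overlap needs no special handling. The sketch below uses shift_gap_left, a hypothetical free function over the same three fields.

```python
from typing import Tuple


def shift_gap_left(buf: bytearray, gap_start: int, gap_end: int,
                   k: int) -> Tuple[int, int]:
    """Move the gap k bytes toward the start; return (gap_start, gap_end).

    The right-hand slice is copied out before it is written back, so the
    overlapping case (gap smaller than k) is handled without a manual loop.
    """
    k = min(k, gap_start)
    buf[gap_end - k:gap_end] = buf[gap_start - k:gap_start]
    return gap_start - k, gap_end - k


buf = bytearray(b"abc_d")          # document "abcd", 1-byte gap at index 3
gs, ge = shift_gap_left(buf, 3, 4, 3)
print(gs, ge, bytes(buf[:gs] + buf[ge:]))  # 0 1 b'abcd'
```

Both forms are O(k). The slice version moves the copy into C, which matters when k is large.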

Complexity

Operation	Time	Space
`insert(t)`	O(|t|) amortized	O(|t|) amortized
`delete_left(n)`	O(1)	O(1) extra
`move_left(k)` / `move_right(k)`	O(k)	O(1)
`text()`	O(N)	O(N)

Implementation Requirements

class TextEditor:
    def __init__(self, initial_capacity: int = 16):
        self._buf = bytearray(initial_capacity)
        self._gap_start = 0
        self._gap_end = initial_capacity

    @property
    def _capacity(self) -> int:
        return len(self._buf)

    @property
    def _length(self) -> int:
        return self._capacity - (self._gap_end - self._gap_start)

    def _grow(self, needed: int):
        new_cap = max(self._capacity * 2, self._capacity + needed)
        new_buf = bytearray(new_cap)
        # left segment unchanged, right segment shifted to end of new buffer
        new_buf[: self._gap_start] = self._buf[: self._gap_start]
        right_size = self._capacity - self._gap_end
        new_buf[new_cap - right_size :] = self._buf[self._gap_end :]
        self._buf = new_buf
        self._gap_end = new_cap - right_size

    def insert(self, text: str):
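        # Assumes ASCII text (as in LC 2296): one byte per character, so byte
        # offsets in the buffer equal character offsets in the document.
        b = text.encode("utf-8")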
        if self._gap_end - self._gap_start < len(b):
            self._grow(len(b))
        self._buf[self._gap_start : self._gap_start + len(b)] = b
        self._gap_start += len(b)

    def delete_left(self, n: int) -> int:
        d = min(n, self._gap_start)
        self._gap_start -= d
        return d

    def _move_left(self, k: int):
        k = min(k, self._gap_start)
        if k == 0:
            return
        # copy k bytes from before gap to after gap (right side)
        src_end = self._gap_start
        src_start = src_end - k
        dst_end = self._gap_end
        dst_start = dst_end - k
        # work right-to-left to handle overlap
        for i in range(k - 1, -1, -1):
            self._buf[dst_start + i] = self._buf[src_start + i]
        self._gap_start -= k
        self._gap_end -= k

    def _move_right(self, k: int):
        right_avail = self._capacity - self._gap_end
        k = min(k, right_avail)
        if k == 0:
            return
        # copy k bytes from after gap (right side) to before gap (left side)
        for i in range(k):
            self._buf[self._gap_start + i] = self._buf[self._gap_end + i]
        self._gap_start += k
        self._gap_end += k

    def _last_10_left(self) -> str:
        start = max(0, self._gap_start - 10)
        return self._buf[start : self._gap_start].decode("utf-8", errors="replace")

    def move_left(self, k: int) -> str:
        self._move_left(k)
        return self._last_10_left()

    def move_right(self, k: int) -> str:
        self._move_right(k)
        return self._last_10_left()

    def text(self) -> str:
        left = self._buf[: self._gap_start]
        right = self._buf[self._gap_end :]
        return (left + right).decode("utf-8", errors="replace")

Tests

def test_basic_insert_delete():
    ed = TextEditor()
    ed.insert("leetcode")
    assert ed.text() == "leetcode"
    assert ed.delete_left(4) == 4
    assert ed.text() == "leet"
    ed.insert("practice")
    assert ed.text() == "leetpractice"

def test_cursor_movement_returns_last_10():
    ed = TextEditor()
    ed.insert("practice")
    assert ed.move_right(3) == "practice"   # cursor at end already; last 10 left = "practice"
    assert ed.move_left(8) == ""
    assert ed.delete_left(10) == 0
    ed.insert("leet")
    assert ed.text() == "leetpractice"
    assert ed.move_left(2) == "le"

def test_lc_2296_canonical():
    ed = TextEditor()
    ed.insert("leetcode")
    assert ed.delete_left(4) == 4
    ed.insert("practice")
    assert ed.move_right(3) == "etpractice"
    assert ed.move_left(8) == "leet"
    assert ed.delete_left(10) == 4
    assert ed.move_left(2) == ""

def test_grow_buffer():
    ed = TextEditor(initial_capacity=4)
    ed.insert("a" * 100)
    assert ed.text() == "a" * 100

def test_delete_more_than_left():
    ed = TextEditor()
    ed.insert("ab")
    assert ed.delete_left(10) == 2
    assert ed.text() == ""

def test_move_clamps():
    ed = TextEditor()
    ed.insert("hello")
    ed.move_left(100)        # clamped to 0
    ed.move_right(100)       # back to end
    assert ed.text() == "hello"

Follow-up Questions

What is the relationship to a piece table? A gap buffer is one contiguous buffer with one gap; a piece table keeps two buffers (the read-only original and an append-only “added” buffer) plus a sequence of pieces pointing into them. Inserting at the cursor appends the text to the “added” buffer and splices one or two new pieces into the sequence, so no existing text moves no matter where the cursor is. The splice costs O(P) with a plain list of P pieces, or O(log P) with a balanced tree of pieces (the approach of VS Code's piece tree). Random-access reads likewise cost O(log P) with a tree. Use a piece table when edits are non-local; use a gap buffer when most edits cluster near the cursor.
How would you make it thread-safe? Wrap public methods with a Lock (or use a single-writer model — most editors are single-threaded on the editing buffer for exactly this reason; rendering and saving happen on background threads with snapshots).
How would you persist state across restarts? Every K seconds, or after every N keystrokes, write the current text to a temporary file, then atomically rename it over the old one. For more granular crash recovery, append every operation to a log and replay it on boot.
What configuration knobs would you expose? initial_capacity, max_document_size. The growth factor (currently 2×) is a sensible default; don’t expose unless you’ve measured.
How would you handle a poison-pill input, such as a single multi-megabyte insert(text)? Reject text longer than max_insert (e.g., 1 MiB) and cap the total document at max_document_size. Return errors; don’t OOM.
What metrics would you emit? inserts_total, deletes_total, cursor_moves_total, buffer_grows_total, document_size_bytes gauge, gap_size_bytes gauge. Useful for tracking edit patterns and tuning capacity defaults.
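For comparison with the gap buffer, a minimal piece-table sketch. It assumes a plain list of pieces, so each splice costs O(P); VS Code-style implementations keep the pieces in a balanced tree instead. The Piece and PieceTable names and the insert(pos, text) / delete(pos, n) signatures are illustrative, not from any library.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str   # "orig" (read-only original text) or "add" (append-only buffer)
    start: int    # offset into that buffer
    length: int


class PieceTable:
    """Piece-table sketch: pieces live in a plain list, so edits are O(P)."""

    def __init__(self, text: str = "") -> None:
        self._orig = text
        self._add: list[str] = []                  # append-only "added" buffer
        self._pieces = [Piece("orig", 0, len(text))] if text else []

    def __len__(self) -> int:
        return sum(p.length for p in self._pieces)

    def insert(self, pos: int, text: str) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        if not text:
            return
        new = Piece("add", len(self._add), len(text))
        self._add.extend(text)                     # existing text is never moved
        offset = 0
        for i, p in enumerate(self._pieces):
            if pos <= offset + p.length:           # pos is inside (or at the end of) piece i
                split = pos - offset
                left = Piece(p.source, p.start, split)
                right = Piece(p.source, p.start + split, p.length - split)
                self._pieces[i:i + 1] = [q for q in (left, new, right) if q.length > 0]
                return
            offset += p.length
        self._pieces.append(new)                   # only reached when the table is empty

    def delete(self, pos: int, n: int) -> None:
        if pos < 0 or n < 0:
            raise ValueError("pos and n must be non-negative")
        end = pos + n
        kept = []
        offset = 0
        for p in self._pieces:
            p_end = offset + p.length
            if offset < pos:                       # part of p before the deleted range
                kept.append(Piece(p.source, p.start, min(p_end, pos) - offset))
            if p_end > end:                        # part of p after the deleted range
                skip = max(end, offset) - offset
                kept.append(Piece(p.source, p.start + skip, p.length - skip))
            offset = p_end
        self._pieces = kept

    def text(self) -> str:
        add = "".join(self._add)
        return "".join(
            (self._orig if p.source == "orig" else add)[p.start:p.start + p.length]
            for p in self._pieces
        )
```

Note that delete never touches either buffer; it only drops or trims pieces, which is why piece tables make undo cheap.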

Product Extension

Real editors layer many things on top: undo/redo (each operation pushes an inverse onto a stack), syntax highlighting (incremental tree-sitter passes), multi-cursor (a list of gap-buffer-style cursors), collaborative editing (operational transforms or CRDTs over the same buffer). The buffer is the bottom; the rest is composition.
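A minimal sketch of inverse-operation undo/redo. To stay self-contained it uses a plain Python str in place of the gap buffer; UndoableText and its methods are hypothetical names. Each edit returns its own inverse, which goes on the undo stack; undo applies that inverse and pushes the inverse of the inverse onto the redo stack.

```python
class UndoableText:
    """Hypothetical editor: a plain str stands in for the gap buffer."""

    def __init__(self) -> None:
        self.text = ""
        self._undo: list[tuple] = []   # inverse ops: ("insert", pos, s) or ("delete", pos, n)
        self._redo: list[tuple] = []

    def _apply(self, op: tuple) -> tuple:
        """Apply op to self.text and return the op that reverses it."""
        kind, pos, arg = op
        if kind == "insert":
            self.text = self.text[:pos] + arg + self.text[pos:]
            return ("delete", pos, len(arg))
        removed = self.text[pos:pos + arg]
        self.text = self.text[:pos] + self.text[pos + arg:]
        return ("insert", pos, removed)

    def insert(self, pos: int, s: str) -> None:
        self._undo.append(self._apply(("insert", pos, s)))
        self._redo.clear()             # a new edit invalidates the redo history

    def delete(self, pos: int, n: int) -> None:
        self._undo.append(self._apply(("delete", pos, n)))
        self._redo.clear()

    def undo(self) -> None:
        if self._undo:
            self._redo.append(self._apply(self._undo.pop()))

    def redo(self) -> None:
        if self._redo:
            self._undo.append(self._apply(self._redo.pop()))
```

Clearing the redo stack on every fresh edit is the usual rule: once history branches, the old redo path no longer applies.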

Language/Runtime Follow-ups

Python: as above. bytearray is the right primitive; avoid string concatenation inside the hot path.
Java: char[] plus int gapStart, int gapEnd. StringBuilder is internally a char[] but lacks gap-buffer semantics.
Go: []byte (or []rune for Unicode-aware editors). The growth pattern resembles Go’s slice append, which roughly doubles small slices and grows large ones by a smaller factor.
C++: std::vector<char>. For piece tables, std::vector<Piece> of (buffer_id, offset, length).
JS/TS: Uint8Array is the dense representation; string concatenation is O(N) and should be avoided.

Common Bugs

Forgetting to grow when the gap is exhausted — silent overwrite of right-segment data.
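Copying in the wrong direction when shifting bytes during cursor moves. On move_left, bytes move to higher indices; if the gap is narrower than k, the source and destination ranges overlap, and a left-to-right copy overwrites source bytes before reading them. Copy right-to-left (as memmove does) on move_left, and left-to-right on move_right.
Caching capacity in a separate field and forgetting to update it (or gap_end) after a grow. Derive capacity from len(self._buf), as the implementation does, so it can never go stale.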
Returning the full text() on every move_* call when the spec only wants the last 10 characters left of cursor.
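Encoding inconsistencies from mixing str and bytearray. Pick one representation and document it. This implementation stores UTF-8 bytes, so delete_left, move_left, and move_right count bytes, which equals characters only for ASCII text; a production editor must also refuse to split a multi-byte code point.
Initializing the buffer too small (e.g., capacity 1) causes a burst of regrows during the first keystrokes. Doubling keeps the total cost amortized O(1) per byte either way, but a default capacity of 16 avoids the early churn. The truly expensive mistake is growing by a fixed increment instead of a factor.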
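The copy-direction bug is easy to reproduce in isolation. Below, buf holds the document "abcd" followed by a 2-byte gap, and move_left(4) must copy 4 bytes two positions to the right, so source and destination overlap. copy_forward and copy_backward are illustrative helpers, not part of TextEditor.

```python
def copy_forward(buf: bytearray, src: int, dst: int, k: int) -> None:
    """Copy buf[src:src+k] to buf[dst:dst+k] left to right (safe when dst < src)."""
    for i in range(k):
        buf[dst + i] = buf[src + i]


def copy_backward(buf: bytearray, src: int, dst: int, k: int) -> None:
    """Copy buf[src:src+k] to buf[dst:dst+k] right to left (safe when dst > src)."""
    for i in range(k - 1, -1, -1):
        buf[dst + i] = buf[src + i]


# Document "abcd", gap of 2 at the end: gap_start=4, gap_end=6.
# move_left(4) copies buf[0:4] to buf[2:6], then sets gap_start=0, gap_end=2.
buggy = bytearray(b"abcd__")
copy_forward(buggy, 0, 2, 4)
fixed = bytearray(b"abcd__")
copy_backward(fixed, 0, 2, 4)
print(bytes(buggy[2:]), bytes(fixed[2:]))   # b'abab' b'abcd'
```

Python slice assignment (buf[2:6] = buf[0:4]) builds the right-hand side before writing, so it is overlap-safe; the explicit loops matter whenever you hand-roll the copy, and in C the same trap appears as memcpy versus memmove.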

Debugging Strategy

When text() is wrong: print (buf, gap_start, gap_end, capacity) after each operation; the bug is almost always a forgotten index update. When cursor moves corrupt data: print the bytes copied and the index ranges; copy direction (right-to-left vs left-to-right) is the most common culprit. When grow fails: assert 0 <= gap_start <= gap_end <= len(buf) after every operation, and check that text() is unchanged across the grow.
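A small debugging aid for the print-after-every-operation advice: render the buffer with the gap drawn explicitly, checking the index invariant first. render_gap is a hypothetical helper; in a session you might call it as render_gap(ed._buf, ed._gap_start, ed._gap_end) after each operation.

```python
def render_gap(buf: bytearray, gap_start: int, gap_end: int) -> str:
    """Show a gap buffer as left[____]right after checking the index invariant."""
    if not 0 <= gap_start <= gap_end <= len(buf):
        raise ValueError(
            f"broken invariant: gap_start={gap_start} gap_end={gap_end} len={len(buf)}"
        )
    left = buf[:gap_start].decode("utf-8", errors="replace")
    right = buf[gap_end:].decode("utf-8", errors="replace")
    return f"{left}[{'_' * (gap_end - gap_start)}]{right}"
```

The stale bytes inside the gap are never printed, which is exactly what text() should agree with.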

Mastery Criteria

Stated the gap-buffer invariant in <30 seconds.
Named when piece table is preferred (non-local edits) without prompting.
Implemented gap buffer with insert/delete/move/text in <30 minutes from blank screen.
Wrote a regrow test that exercises the doubling.
Articulated the overlap-direction bug for cursor moves and named the fix.
Solved LC 2296 unaided in <40 minutes including all tests.
Listed three real editors (Emacs, VS Code, Word) and which structure each uses.

Lab 23 — Toy SQL-Like Engine

Goal

Implement a tiny SQL-like engine that can parse and execute SELECT col1, col2 FROM t [JOIN u ON expr] [WHERE expr] [ORDER BY col [ASC | DESC]] [LIMIT n] over in-memory tables. The engine has three layers: tokenizer, parser (produces an AST), and executor (interprets the AST). After this lab you should be able to scope a 60-minute version of this in <5 minutes and produce a working subset (no joins, no ORDER BY) in <40 minutes.

Background Concepts

A SQL engine, even a toy, is the cleanest interview-friendly example of the lexer / parser / executor pipeline that underlies real query engines, compilers, and DSLs:

Lexer / tokenizer: converts a string into a stream of typed tokens (SELECT, identifier name, =, integer 42, etc.). Skips whitespace, recognizes keywords, classifies punctuation.
Parser: consumes the token stream and produces an AST: Select(columns, from_table, where_expr, joins, order_by, limit). Recursive descent is the right tool for this problem class — top-down, predictable, fits on a whiteboard.
Executor: walks the AST and produces rows. For WHERE, evaluate the expression against each row. For JOIN, nested-loop the two tables and concatenate matching rows. For ORDER BY, sort by the named column with DESC flag. For LIMIT, slice the result.

The interviewer is not testing whether you can build a real query optimizer (you can’t, in 60 minutes). They are testing whether you can decompose the problem into the three layers, write each cleanly, and connect them through a typed AST. Candidates who try to do everything inline in one function fail; candidates who name Token, Expr, Select types and split functions per layer pass.

Interview Context

A 60-minute round at Snowflake, Databricks, MongoDB, Neon, PlanetScale, and any database / data-platform company. Often paired with a smaller warmup. The supported subset varies by interviewer — at minimum SELECT cols FROM t WHERE expr is expected; joins and order-by are stretch goals; aggregates (COUNT, SUM) are extras for strong candidates.

Problem Statement

Implement Engine with:

register_table(name: str, columns: list[str], rows: list[list]): store an in-memory table.
query(sql: str) -> list[list]: parse and execute the SQL, return rows.

Supported grammar:

SELECT  col_list  FROM  table_name  [alias]
        { JOIN  table_name  [alias]  ON  cond_expr }
        [ WHERE  cond_expr ]
        [ ORDER BY  column_ref  [ASC | DESC] ]
        [ LIMIT  integer ]

col_list is * or comma-separated column references (qualified table.col or bare col). { ... } means zero or more repetitions; [ ... ] means optional. cond_expr is a small expression language: literals (int, string), column refs, parentheses for grouping, and the operators = != < <= > >= AND OR NOT.

Constraints

Identifiers are alphanumeric (and underscore). Keywords are case-insensitive (SELECT == select).
String literals use single quotes ('foo').
Tables fit in memory; nested-loop join is acceptable.
Up to 10^3 rows per table; query must finish in well under a second.

Clarifying Questions

Are aggregates (COUNT, SUM) required? (No for the base; stretch goal.)
Are subqueries supported? (No.)
Are NULLs supported? (No. Referencing an undefined column raises an error; a missing field would be treated as None, with SQL NULL-propagation rules omitted.)
Are types coerced? (No. In Python, '5' = 5 evaluates to False, while '5' < 5 raises TypeError; document this.)
Is column resolution case-sensitive? (Yes — keywords case-insensitive, identifiers case-sensitive. Document.)
Are joins inner-only? (Yes — INNER JOIN semantics; no LEFT/RIGHT/FULL for the base.)
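Because _eval hands comparisons straight to Python operators, the engine inherits Python's mixed-type rules. A quick check in plain Python (no engine required); compare_mixed is just an illustrative wrapper:

```python
def compare_mixed() -> tuple[bool, bool, str]:
    """How Python compares a str value with an int value."""
    s, n = "5", 5
    equal = s == n            # cross-type equality is allowed and simply False
    not_equal = s != n        # ...so != is True
    try:
        _ = s < n             # ordering str against int is undefined in Python 3
        ordering = "no error"
    except TypeError:
        ordering = "TypeError"
    return equal, not_equal, ordering
```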

Examples

-- users(id, name, age); orders(id, user_id, total)
SELECT name FROM users WHERE age >= 18
SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.total > 100
SELECT name FROM users ORDER BY age DESC LIMIT 3

Initial Brute Force

Skip the parser and tokenize-execute in a single big regex-soup function. This is what most candidates produce when panicked. It works for two or three test cases and breaks instantly on any extension.

Brute Force Complexity

Roughly O(rows × cols × query-length) and bug-prone.

Optimization Path

Properly separate lexer / parser / executor. Tokenizer scans the string once: O(N). Parser is recursive descent: O(tokens). Executor: O(rows × predicate cost) for WHERE; O(left × right) for nested-loop joins; O(rows log rows) for ORDER BY. Each layer is independently testable.

Final Expected Approach

Three layers connected by typed values:

tokenize(sql) -> list[Token]: Token = (kind, value), where kind is KEYWORD, IDENT, INT, STRING, OP, PUNC, or EOF.
Parser(tokens).parse_select() -> Select: recursive descent, one method per non-terminal. Select is a dataclass with columns, from_table, from_alias, joins, where, order_by, limit.
Engine._execute(select) -> rows: fetch base rows, apply joins, apply WHERE, project columns, order, limit.

Expr is a small algebraic datatype: Literal(value), Column(table_or_None, name), BinOp(op, left, right), UnaryOp(op, operand). Evaluation: _eval(expr, env) -> value, where env maps each in-scope table alias to its (columns, row) pair.

Data Structures Used

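list[list] per table (rows).
list[str] per table (column names; lookup is a linear cols.index, which is fine for a handful of columns; a dict[str, int] from column to index makes it O(1)).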
AST: small dataclasses or named tuples.
Token list: list[Token].
dict[str, callable] for operator dispatch.

Correctness Argument

Each layer’s correctness is independent of the others. Tokenizer: every input character is consumed exactly once as part of exactly one lexeme; whitespace lexemes are skipped, every other lexeme becomes exactly one token, and an unmatched character raises SyntaxError. Parser: the grammar is LL(1), because at every decision point a single token of lookahead (a keyword, an OP, a punctuation mark, or the token kind) selects the production, so a recursive-descent parser with one-token lookahead accepts exactly the strings the grammar generates. Executor: each clause is a transformation on a stream of (row, env) pairs. JOIN extends each pair with every matching right-hand row; WHERE filters; projection picks columns; ORDER BY sorts; LIMIT truncates. Each transformation preserves the invariant that env maps every in-scope alias to that alias’s column list and current row.

Complexity

Stage	Time
Tokenize	O(N), N = query length
Parse	O(T), T = number of tokens
WHERE filter	O(R · |E|), E = WHERE expression
Inner JOIN (nested loop)	O(R₁ · R₂ · |E|), E = ON expression
ORDER BY	O(R log R)
LIMIT	O(L)

For larger data: indices, hash joins, query optimizers — out of scope.
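Hash joins are out of scope for the 60-minute version, but they are a common follow-up once the nested loop works. A minimal sketch of an inner equi-join, assuming the ON condition is a single equality between one column on each side; hash_join and its index parameters are hypothetical, not part of Engine.

```python
from collections import defaultdict


def hash_join(left_rows: list[list], right_rows: list[list],
              left_key: int, right_key: int) -> list[list]:
    """Inner equi-join on left_row[left_key] == right_row[right_key]."""
    buckets = defaultdict(list)
    for rrow in right_rows:                     # build phase: O(R2)
        buckets[rrow[right_key]].append(rrow)
    out = []
    for lrow in left_rows:                      # probe phase: O(R1 + matches)
        for rrow in buckets.get(lrow[left_key], ()):
            out.append(lrow + rrow)             # same row shape as left_row + jr
    return out
```

Building on the smaller input and probing with the larger one keeps the hash table small; the cost drops from O(R₁ · R₂) to O(R₁ + R₂ + output).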

Implementation Requirements

import re
from dataclasses import dataclass
from typing import Any, Optional


# ---------------- Tokenizer ----------------

KEYWORDS = {"SELECT", "FROM", "WHERE", "JOIN", "ON",
            "ORDER", "BY", "ASC", "DESC", "LIMIT",
            "AND", "OR", "NOT"}

@dataclass
class Token:
    kind: str
    value: Any

_TOKEN_RE = re.compile(r"""
    \s+ |                       # whitespace
    '([^']*)' |                 # string literal
    (\d+) |                     # int
    (==|!=|<=|>=|=|<|>) |       # ops
    ([A-Za-z_][A-Za-z0-9_]*) |  # identifier or keyword
    (,|\(|\)|\.|\*)             # punctuation
""", re.VERBOSE)


def tokenize(sql: str) -> list[Token]:
    tokens: list[Token] = []
    i = 0
    while i < len(sql):
        m = _TOKEN_RE.match(sql, i)
        if not m:
            raise SyntaxError(f"unexpected char at {i}: {sql[i]!r}")
        s, ival, op, ident, punc = m.groups()
        if m.group(0).isspace():
            pass
        elif s is not None:
            tokens.append(Token("STRING", s))
        elif ival is not None:
            tokens.append(Token("INT", int(ival)))
        elif op is not None:
            tokens.append(Token("OP", "=" if op == "==" else op))
        elif ident is not None:
            up = ident.upper()
            if up in KEYWORDS:
                tokens.append(Token("KEYWORD", up))
            else:
                tokens.append(Token("IDENT", ident))
        elif punc is not None:
            tokens.append(Token("PUNC", punc))
        i = m.end()
    tokens.append(Token("EOF", None))
    return tokens


# ---------------- AST ----------------

@dataclass
class Column:
    table: Optional[str]
    name: str

@dataclass
class Literal:
    value: Any

@dataclass
class BinOp:
    op: str
    left: Any
    right: Any

@dataclass
class UnaryOp:
    op: str
    operand: Any

@dataclass
class Join:
    table: str
    alias: Optional[str]
    on: Any

@dataclass
class Select:
    columns: list                # list[Column] or ["*"]
    from_table: str
    from_alias: Optional[str]
    joins: list                  # list[Join]
    where: Optional[Any]
    order_by: Optional[tuple]    # (Column, "ASC" | "DESC")
    limit: Optional[int]


# ---------------- Parser (recursive descent) ----------------

class Parser:
    def __init__(self, tokens: list[Token]):
        self._t = tokens
        self._i = 0

    def _peek(self) -> Token: return self._t[self._i]
    def _eat(self, kind, value=None) -> Token:
        tok = self._t[self._i]
        if tok.kind != kind or (value is not None and tok.value != value):
            raise SyntaxError(f"expected {kind} {value}, got {tok}")
        self._i += 1
        return tok
    def _accept(self, kind, value=None) -> bool:
        tok = self._t[self._i]
        if tok.kind == kind and (value is None or tok.value == value):
            self._i += 1
            return True
        return False

    def parse_select(self) -> Select:
        self._eat("KEYWORD", "SELECT")
        cols = self._parse_columns()
        self._eat("KEYWORD", "FROM")
        ftable, falias = self._parse_table_alias()
        joins = []
        while self._accept("KEYWORD", "JOIN"):
            jt, ja = self._parse_table_alias()
            self._eat("KEYWORD", "ON")
            joins.append(Join(jt, ja, self._parse_expr()))
        where = self._parse_expr() if self._accept("KEYWORD", "WHERE") else None
        order_by = None
        if self._accept("KEYWORD", "ORDER"):
            self._eat("KEYWORD", "BY")
            ob_col = self._parse_column_ref()
            direction = "ASC"
            if self._accept("KEYWORD", "DESC"): direction = "DESC"
            elif self._accept("KEYWORD", "ASC"): direction = "ASC"
            order_by = (ob_col, direction)
        limit = None
        if self._accept("KEYWORD", "LIMIT"):
            limit = self._eat("INT").value
        self._eat("EOF")
        return Select(cols, ftable, falias, joins, where, order_by, limit)

    def _parse_columns(self):
        if self._accept("PUNC", "*"):
            return ["*"]
        cols = [self._parse_column_ref()]
        while self._accept("PUNC", ","):
            cols.append(self._parse_column_ref())
        return cols

    def _parse_column_ref(self) -> Column:
        ident = self._eat("IDENT").value
        if self._accept("PUNC", "."):
            name = self._eat("IDENT").value
            return Column(ident, name)
        return Column(None, ident)

    def _parse_table_alias(self) -> tuple[str, Optional[str]]:
        name = self._eat("IDENT").value
        alias = None
        if self._peek().kind == "IDENT":
            alias = self._eat("IDENT").value
        return name, alias

    def _parse_expr(self):
        return self._parse_or()
    def _parse_or(self):
        left = self._parse_and()
        while self._accept("KEYWORD", "OR"):
            left = BinOp("OR", left, self._parse_and())
        return left
    def _parse_and(self):
        left = self._parse_not()
        while self._accept("KEYWORD", "AND"):
            left = BinOp("AND", left, self._parse_not())
        return left
    def _parse_not(self):
        if self._accept("KEYWORD", "NOT"):
            return UnaryOp("NOT", self._parse_not())
        return self._parse_cmp()
    def _parse_cmp(self):
        left = self._parse_atom()
        if self._peek().kind == "OP":
            op = self._eat("OP").value
            return BinOp(op, left, self._parse_atom())
        return left
    def _parse_atom(self):
        tok = self._peek()
        if tok.kind == "INT": self._i += 1; return Literal(tok.value)
        if tok.kind == "STRING": self._i += 1; return Literal(tok.value)
        if tok.kind == "PUNC" and tok.value == "(":
            self._i += 1
            e = self._parse_expr()
            self._eat("PUNC", ")")
            return e
        return self._parse_column_ref()


# ---------------- Executor ----------------

class Engine:
    def __init__(self):
        self._tables: dict[str, tuple[list[str], list[list]]] = {}

    def register_table(self, name: str, columns: list[str], rows: list[list]):
        self._tables[name] = (columns, [list(r) for r in rows])

    def query(self, sql: str) -> list[list]:
        ast = Parser(tokenize(sql)).parse_select()
        return self._execute(ast)

    def _execute(self, sel: Select) -> list[list]:
        # 1. base
        base_cols, base_rows = self._fetch(sel.from_table)
        alias = sel.from_alias or sel.from_table
        rows = [(r, {alias: (base_cols, r)}) for r in base_rows]

        # 2. joins (nested loop)
        for j in sel.joins:
            jcols, jrows = self._fetch(j.table)
            j_alias = j.alias or j.table
            new_rows = []
            for left_row, env in rows:
                for jr in jrows:
                    new_env = dict(env)
                    new_env[j_alias] = (jcols, jr)
                    if self._eval(j.on, new_env):
                        new_rows.append((left_row + jr, new_env))
            rows = new_rows

        # 3. where
        if sel.where is not None:
            rows = [(r, env) for r, env in rows if self._eval(sel.where, env)]

        # 4. project
        if sel.columns == ["*"]:
            projected = [r for r, _ in rows]
        else:
            projected = [[self._eval(c, env) for c in sel.columns] for _, env in rows]

        # 5. order by
        if sel.order_by is not None:
            col, direction = sel.order_by
            # we sort over the *original* rows (with env), then re-project. Simpler: sort
            # the projected rows along with their key.
            rows_with_keys = [(self._eval(col, env), p) for (_, env), p in zip(rows, projected)]
            rows_with_keys.sort(key=lambda kp: kp[0], reverse=(direction == "DESC"))
            projected = [p for _, p in rows_with_keys]

        # 6. limit
        if sel.limit is not None:
            projected = projected[: sel.limit]
        return projected

    def _fetch(self, table: str):
        if table not in self._tables:
            raise ValueError(f"unknown table: {table}")
        return self._tables[table]

    def _eval(self, expr, env: dict[str, tuple[list[str], list]]):
        if isinstance(expr, Literal):
            return expr.value
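        if isinstance(expr, Column):
            if expr.table is not None:
                if expr.table not in env:
                    raise NameError(f"unknown table or alias: {expr.table}")
                cols, row = env[expr.table]
                if expr.name not in cols:
                    raise NameError(f"unknown column: {expr.table}.{expr.name}")
                return row[cols.index(expr.name)]
            # Bare column: it must resolve to exactly one table in scope.
            matches = [row[cols.index(expr.name)]
                       for cols, row in env.values() if expr.name in cols]
            if len(matches) > 1:
                raise NameError(f"ambiguous column: {expr.name}")
            if not matches:
                raise NameError(f"unknown column: {expr.name}")
            return matches[0]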
        if isinstance(expr, UnaryOp) and expr.op == "NOT":
            return not self._eval(expr.operand, env)
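        if isinstance(expr, BinOp):
            # AND / OR short-circuit: the right side is evaluated only when needed.
            if expr.op == "AND":
                return bool(self._eval(expr.left, env)) and bool(self._eval(expr.right, env))
            if expr.op == "OR":
                return bool(self._eval(expr.left, env)) or bool(self._eval(expr.right, env))
            lhs = self._eval(expr.left, env)
            rhs = self._eval(expr.right, env)
            # Dispatch to one callable so only the requested comparison runs
            # (evaluating every comparison eagerly makes 'x' = 5 raise on '<').
            compare = {
                "=": lambda: lhs == rhs, "!=": lambda: lhs != rhs,
                "<": lambda: lhs < rhs, "<=": lambda: lhs <= rhs,
                ">": lambda: lhs > rhs, ">=": lambda: lhs >= rhs,
            }
            return compare[expr.op]()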
        raise TypeError(f"bad expr: {expr}")

Tests

def setup_engine():
    e = Engine()
    e.register_table("users", ["id", "name", "age"], [
        [1, "alice", 30], [2, "bob", 17], [3, "carol", 22], [4, "dave", 45],
    ])
    e.register_table("orders", ["id", "user_id", "total"], [
        [10, 1, 250], [11, 1, 50], [12, 3, 800], [13, 4, 75],
    ])
    return e

def test_basic_select_star():
    e = setup_engine()
    assert e.query("SELECT * FROM users") == [
        [1, "alice", 30], [2, "bob", 17], [3, "carol", 22], [4, "dave", 45]]

def test_where_int_compare():
    e = setup_engine()
    out = e.query("SELECT name FROM users WHERE age >= 18")
    assert sorted(out) == [["alice"], ["carol"], ["dave"]]

def test_string_compare():
    e = setup_engine()
    out = e.query("SELECT name FROM users WHERE name = 'alice'")
    assert out == [["alice"]]

def test_and_or_not():
    e = setup_engine()
    out = e.query("SELECT name FROM users WHERE age > 20 AND NOT name = 'dave'")
    assert sorted(out) == [["alice"], ["carol"]]

def test_join():
    e = setup_engine()
    out = e.query(
        "SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id "
        "WHERE o.total > 100"
    )
    assert sorted(out) == [["alice", 250], ["carol", 800]]

def test_order_by_desc_limit():
    e = setup_engine()
    out = e.query("SELECT name FROM users ORDER BY age DESC LIMIT 2")
    assert out == [["dave"], ["alice"]]

def test_unknown_column_errors():
    e = setup_engine()
    try: e.query("SELECT bogus FROM users")
    except NameError: pass
    else: assert False

Follow-up Questions

How would you test it? Layer-by-layer: tokenizer tests for every keyword/operator/punctuation; parser tests that pretty-print the AST and compare strings; executor tests against fixture tables. Property test: random valid queries with predictable outputs from a Python list-comprehension oracle.
What is the consistency model? Single-threaded; reads see the snapshot at query start. For concurrent writes, copy-on-write tables or per-table read-write locks.
What configuration knobs would you expose? Maximum query length, maximum result rows, query timeout. Don’t expose internals (tokenizer regex, parser lookahead).
How would you handle a poison-pill input? Catastrophic regex backtracking (unlikely with the lexer above, whose alternatives contain no nested quantifiers), deeply nested expressions (cap the recursion depth), enormous joins (cap R₁ × R₂ before executing). Bound everything.
How would you scale to N nodes? Beyond toy: shard tables by primary key range or hash; route queries to the owning node; for joins across shards, use distributed hash join. Real systems (Spanner, CockroachDB) layer query planning, distributed execution, and consensus over this same skeleton.
What metrics would you emit? Per-query: parse latency, execution latency, rows scanned, rows returned. Per-table: row count gauge. Aggregate: queries-per-second counter, error rate counter.
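A sketch of the list-comprehension oracle from the testing answer. It only generates single-predicate WHERE queries over a copy of the setup_engine() users fixture; oracle and random_cases are hypothetical helper names.

```python
import operator
import random

# Copy of setup_engine()'s users table: (id, name, age).
USERS = [[1, "alice", 30], [2, "bob", 17], [3, "carol", 22], [4, "dave", 45]]

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def oracle(op: str, k: int) -> list[list]:
    """Expected rows for SELECT name FROM users WHERE age <op> k."""
    return [[name] for _id, name, age in USERS if OPS[op](age, k)]


def random_cases(n: int, seed: int = 0) -> list[tuple[str, list[list]]]:
    """Generate (sql, expected_rows) pairs; the seed makes failures reproducible."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        op = rng.choice(list(OPS))
        k = rng.randint(0, 50)
        cases.append((f"SELECT name FROM users WHERE age {op} {k}", oracle(op, k)))
    return cases
```

Against a real Engine loaded with the same table, the check is a loop of assert engine.query(sql) == expected over random_cases(200). WHERE-only queries preserve table order, so no sorting is needed.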

Product Extension

This is the same skeleton DuckDB, SQLite, Postgres, and every database engine starts with: lex / parse / plan / execute. Real engines add a planner/optimizer between parse and execute that rewrites the AST (push down predicates, choose join order, pick indexes), and a storage layer beneath execute. Aggregate functions, group-by, subqueries, and CTEs are all extensions of the AST + executor pair.

Language/Runtime Follow-ups

Python: dataclasses are the right shape for AST. re.VERBOSE lexer is concise.
Java: Use sealed interfaces (Java 17+) for AST nodes. ANTLR for the parser if available; hand-rolled recursive descent if not.
Go: Use a type Node interface { node() } and individual struct types implementing it. The parser is a struct with the token list and an index.
C++: std::variant for AST nodes is clean; visitor pattern via std::visit.
JS/TS: Discriminated unions for AST. The runtime cost of dynamic dispatch is acceptable for an interview-grade engine.

Common Bugs

Lexer that consumes whitespace as a token — pollutes the parser. Skip whitespace in the lexer.
Parser that allows the same column twice in projection but then fails at execution — better to validate at parse time.
JOIN executor that builds a Cartesian product before filtering — works but quadratic memory before predicate evaluation. Filter as you go (the implementation above does this).
ORDER BY on a column not in projection — must evaluate against the row environment, not the projected output.
Operator precedence wrong — NOT a AND b parsed as NOT (a AND b) instead of (NOT a) AND b. The recursive-descent ladder (OR < AND < NOT < CMP) handles this.
Case-folding identifiers — many SQL engines do (Postgres folds to lowercase); this toy engine doesn’t. Document the choice.

Debugging Strategy

When parsing fails: print the token stream up to the failure point along with the parser’s _i index. The cause is usually a misspelled or missing keyword; for example, WHRE tokenizes as an IDENT, so _parse_table_alias reads it as a table alias and the parser then fails on the next token. When execution returns wrong rows: log the WHERE evaluation per row with the values it sees. When joins explode: cap the row count and emit an error rather than running unbounded.

Mastery Criteria

Decomposed lex / parse / execute in <2 minutes.
Wrote the recursive-descent expression parser with correct precedence.
Implemented WHERE and SELECT cols correctly in <30 minutes.
Added inner JOIN nested-loop in <10 minutes from the WHERE-only baseline.
Added ORDER BY and LIMIT in <10 minutes more.
Articulated where the optimizer would slot in (between parse and execute).
Listed three real systems (DuckDB, SQLite, Postgres) using this skeleton.
Wrote tokenizer + parser + executor tests independently per layer.

Phase 9 — Language & Runtime Deep Dive

Target level: Cross-cutting (applies at every level, but the bar rises sharply at senior+)
Expected duration: 1–2 weeks of primary-language reading + 2–4 days of secondary-language skim
Format: No labs. Five comprehensive language READMEs that double as interview-prep references.
Companies this targets: Every company that asks “why does this work?” follow-ups, which is every company at L4+ and many at L3 as well.

Why This Phase Exists

Every other phase trains you to produce code that works. This phase trains you to explain why it works — and equally importantly, to recognize the silent ways it can stop working when an interviewer perturbs an input or asks a follow-up.

At the junior bar, “I called .sort() on the list” is a complete answer. At the senior bar, the interviewer will ask:

“Is that sort stable?”
“What’s the worst-case complexity in your language’s standard library?”
“Does it allocate?”
“What if the comparator throws?”
“What if the same key compares differently across calls — what’s the consistency contract?”
“If two of the elements are mutable and equal-by-hash, what happens to a dict keyed on them?”
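Two of these questions can be answered with a few lines of Python. Python’s sort is guaranteed stable, and a key that is mutated after insertion into a dict becomes unreachable, because the dict filed it under its old hash. MutableKey is a deliberately broken, hypothetical class.

```python
def stable_sort_demo() -> list[tuple[str, int]]:
    """sorted() is stable: records with equal keys keep their input order."""
    people = [("bob", 30), ("alice", 25), ("carol", 30), ("dave", 25)]
    return sorted(people, key=lambda person: person[1])


class MutableKey:
    """Hashes by a mutable field: legal Python, but mutating it breaks dict lookups."""

    def __init__(self, v: int) -> None:
        self.v = v

    def __eq__(self, other: object) -> bool:
        return isinstance(other, MutableKey) and self.v == other.v

    def __hash__(self) -> int:
        return hash(self.v)


def mutated_key_demo() -> tuple[bool, bool, int]:
    k = MutableKey(1)
    d = {k: "value"}
    k.v = 2                                   # entry is still filed under hash(1)
    found_by_new_value = MutableKey(2) in d   # wrong bucket: hash(2) != hash(1)
    found_by_old_value = MutableKey(1) in d   # right bucket, but the stored key now has v == 2
    return found_by_new_value, found_by_old_value, len(d)
```

The entry still counts toward len(d) but can no longer be found by any key, which is why dict keys should be immutable.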

A candidate who answers crisply — citing the language’s actual contract, naming the algorithm (Timsort, IntroSort, pdqsort), describing the auxiliary memory, and pointing out the one realistic failure mode — clears the senior bar without breaking a sweat. A candidate who hedges, says “uh, I’d have to check,” or worse, confidently says something wrong — does not.

The gap between these two candidates is rarely raw algorithmic ability. It’s runtime literacy: knowing your tools at the level your tools deserve.

Junior interviews ask “can you make the language do what you want?” Senior interviews ask “do you know what the language is doing on your behalf?” This phase exists because the second question is unbounded (every language has hundreds of subtle behaviors), and you cannot bluff your way through it under stopwatch pressure.

What “Language Depth” Actually Means In Interviews

There are five distinct kinds of language questions an interviewer can probe. Confusing them in your own head is one of the most common ways to under-prepare for this phase.

Probe type	Example question	What it tests
Mechanical	“What does `+=` do on a Python list vs a Python tuple?”	Did you actually use the language, or just write code in it?
Performance	“Why is `''.join(parts)` faster than `s += part`?”	Do you know the cost model, not just the syntax?
Concurrency	“What does `volatile` guarantee in Java?”	Do you understand the memory model?
Failure mode	“What happens if `__hash__` and `__eq__` disagree?”	Can you predict subtle bugs you’ve never seen?
Idiom	“What’s the idiomatic way to read a file line-by-line?”	Would your code look native to a coworker?
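Running the Mechanical and Failure-mode rows yourself makes the answers stick. A sketch in Python; BadKey is a deliberately broken, hypothetical class.

```python
def plus_equals_demo() -> tuple[list, tuple, tuple]:
    lst = [1]
    lst_alias = lst
    lst += [2]          # list.__iadd__ extends in place: the alias sees [1, 2]
    tup = (1,)
    tup_alias = tup
    tup += (2,)         # tuples are immutable: += builds a new tuple and rebinds tup
    return lst_alias, tup_alias, tup


def tuple_holding_list_demo() -> tuple[bool, tuple]:
    t = ([1],)
    try:
        t[0] += [2]     # the list is extended first, then tuple item assignment raises
    except TypeError:
        return True, t
    return False, t


class BadKey:
    """__eq__ compares by value, but __hash__ is identity-based: the contract is broken."""

    def __init__(self, v: int) -> None:
        self.v = v

    def __eq__(self, other: object) -> bool:
        return isinstance(other, BadKey) and self.v == other.v

    def __hash__(self) -> int:
        return id(self)  # BUG: equal but distinct objects get different hashes


def bad_hash_demo() -> tuple[bool, bool]:
    d = {BadKey(1): "x"}
    stored = next(iter(d))
    return BadKey(1) == stored, BadKey(1) in d
```

The tuple case surprises most people: list.__iadd__ extends the list in place, and only then does the tuple’s item assignment raise TypeError, so the mutation survives the exception.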

Every language-specific section in this phase is structured around these five probe types. As you read, tag each subsection with the probe it answers. By the end you should be able to classify any language question instantly. Most candidates can answer two of the five probe types but routinely fluff the other three.

The Five Tracks

Track	Folder	Word count	Use cases
Python	python/README.md	~10K	The default interview language; ML/data/SRE-leaning roles default here
Java	java/README.md	~10K	FAANG-traditional, finance, Android, large enterprise backends
Go	go/README.md	~7K	Infra, distributed systems, container/cloud-native (K8s, Docker, etcd shops)
C++	cpp/README.md	~10K	HFT/quant, game engines, embedded, browsers, databases, systems-programming roles
JavaScript / TypeScript	javascript-typescript/README.md	~7K	Web frontend, Node backends, full-stack startup roles

Each track is self-contained — you should not need to consult external references to answer the interview-relevant questions in that track. The READMEs are dense by design. Read your primary language linearly. Skim a secondary language. Ignore the others until/unless you switch.

How To Use This Phase

If you have one primary language

Read your primary language README end-to-end. Don’t skip sections — even ones you “already know.” The bar in this phase is being able to answer a follow-up under stopwatch pressure, not having heard of the topic.
For each section, after reading, close the page and explain the concept out loud (or to a rubber duck) in 60 seconds. If you can’t, re-read.
Run every code example yourself. Many of them produce output that looks wrong until you internalize why it’s right. Reading the explanation without running the code leaves the wrongness un-felt.
Cross-link backward to the labs in earlier phases: when you read the dict-internals section, revisit phase-01-foundations/labs/lab-03-hashmap-mastery.md. When you read the GIL section, revisit phase-08-practical-engineering/labs/lab-05-thread-pool.md. The readings are reference; the labs are practice; the combination is mastery.
Write a one-line flashcard for every interview gotcha (“integer cache -5..256 in Python,” “Integer cache -128..127 in Java,” “loop variable capture pre-Go-1.22”). You will get drilled on at least 3 of these in any senior interview.

If you have a secondary language for breadth

Skim the README. Focus on the Common Interview Gotchas and Memory Model sections. Skip the standard library deep dive — you can look those up. Time budget: 1 evening per secondary language.

If you’re polyglot and want to know all five

You will not actually answer interviews fluently in five languages. Pick one primary, one secondary, and learn the rest as a hobby. Interview fluency requires hours of speaking-the-language-aloud practice that you cannot distribute across five tracks in any reasonable time budget.

What’s Deliberately Not In This Phase

Build systems. No setuptools / Maven / go.mod / CMake / package.json deep dive. Interviews don’t probe these.
Framework-specific behavior. No Django, Spring, React, Express. Even at FAANG, framework knowledge is rarely on the rubric for a coding round.
Tooling. No pdb / gdb / delve / Chrome DevTools tutorials. Phase 10 covers debugging methodology generically.
Trivia. “Which year was Python 3.0 released?” type questions are out. We focus on what an interviewer asks because they expect you to use the answer in your code.
Esoterica. Python’s __init_subclass__, Java’s MethodHandle, Go’s unsafe.Pointer, C++’s placement new — these are real but they’re rarely on a coding interview rubric. If you reach for them in an interview without being asked, you signal over-engineering.

The bias is toward what gets you points in an interview, not what makes you “complete” as a language nerd.

Mastery Checklist

You have completed Phase 9 when, in your primary language:

You can describe the implementation of the standard hash map / dict, including its collision strategy, load factor, and adversarial-input behavior, in 90 seconds.
You can describe how the language allocates memory: stack vs heap, GC strategy if any, when objects move, and what triggers a full collection.
You can name three pitfalls in the language’s mutable-default-argument / late-binding-closure / iterator-invalidation territory and write code that demonstrates each.
You can write thread-safe code idiomatically, naming which primitive you’d use (mutex / channel / atomic / actor) and why.
For each of the standard collections (list/dict/set/heap/deque or their equivalents), you can state insert/lookup/delete complexity and one common gotcha.
You can answer “what does == mean here?” precisely, distinguishing identity, value-equality, equals-with-typed-narrowing, and any platform-specific surprises.
You can describe the language’s concurrency model (event loop / GIL / OS threads / goroutines / fibers) in one paragraph and name the kind of work each is bad at.
You can read a 50-line snippet in your secondary language and accurately predict its behavior on adversarial inputs (e.g., empty list, negative index, mutation during iteration).
You have a one-line answer for every “common interview gotcha” in your primary language’s README and can produce code that demonstrates the gotcha live.

Exit Criteria

You may exit Phase 9 and move on to Phase 10 — Testing, Debugging & Correctness when:

Primary-language depth. You’ve read the entire primary-language README and can answer 90% of the “common interview gotchas” subsection without re-reading.
Cross-cutting fluency. When asked a follow-up like “how would you make this thread-safe?” or “what’s the memory cost of this collection?” during a Phase 8 lab review, you reach for primitives and reasoning from this phase, not from a generic OS course.
Secondary-language familiarity. You can read code in at least one secondary language without grabbing a reference for basics, and you can identify two or three of the secondary language’s distinctive gotchas.
Mock readiness. You’ve done at least one mock-09-runtime-language where the entire round was follow-up questions with no algorithmic component, and scored “passing” on the rubric.

If your primary-language depth is shallow but your algorithmic skill is strong, the senior interviewer will say “smart but green” — which is a no-hire at L5+. The fix is not more LeetCode. The fix is this phase.

A Note On Language Choice For Interviews

Pick a language. Do not switch mid-interview. Do not switch mid-loop. Do not arrive at an onsite saying “I usually do Python but I’ll do Java today because the problem is more concurrent.”

Default recommendation: Python for breadth (almost every company allows it), Java if you’re targeting traditional FAANG / finance, C++ if you’re targeting HFT or systems, Go if you’re targeting infrastructure-heavy companies (Cloudflare, Datadog, Snowflake’s data-plane teams, container/Kubernetes shops), JS/TS if you’re targeting web-leaning or Node-leaning roles.

The cost of switching languages is roughly 6 months of practice in the new language before you’re fluent at the senior bar. Do not switch on a whim.

Cross-References

phase-01-foundations/ — the data-structure complexity tables here are the foundation; this phase deepens them with implementation specifics.
phase-08-practical-engineering/ — every “Language/Runtime Follow-ups” callout in those labs is grounded here.
phase-10-testing-debugging/ — debugging is largely “knowing what your runtime is doing”; that phase builds on this one.
phase-11-mock-interviews/mocks/mock-09-runtime-language.md — the mock round dedicated to this phase.
CODE_QUALITY.md — “use the language’s built-ins idiomatically” is one of the quality dimensions; this phase tells you what idiomatic actually means.
FRAMEWORK.md — Step 16 (production implications) routinely cites runtime facts from this phase.

The Sub-Tracks

Python — CPython internals, GIL, memory model, dict/list/set internals, asyncio, common gotchas
Java — JVM, GC, JMM, collections framework, concurrency, generics erasure, modern Java
Go — runtime, GMP scheduler, goroutines/channels, slices/maps internals, context, common bugs
C++ — memory model, smart pointers, move semantics, STL complexity, undefined behavior, modern idioms
JavaScript / TypeScript — V8, event loop, prototypes, this-binding, async/await, TS type system

Python Runtime Deep Dive

Target audience: candidates interviewing in Python at Big Tech, ML, infra, or any role where the interviewer is allowed to ask “how does it actually work?”

Scope: CPython. PyPy and other implementations are noted only when they materially change interview answers.

Python’s reputation for being “easy” is exactly why senior interviewers grill it hardest. The candidate who can write a clean two-pointer solution in Python and explain why their dict lookup is O(1) on average but O(N) in the worst case, why their threading.Thread doesn’t help CPU-bound code, and why [[]] * 3 is a foot-gun, is rare. Be that candidate.

1. CPython Interpreter, Bytecode, Frame Objects

CPython is a stack-based bytecode interpreter. Source code → AST → bytecode → executed by an evaluation loop in C (ceval.c).

What runs your code

Lexer/Parser → AST.
Compiler → bytecode (.pyc cached in __pycache__/).
Interpreter loop (_PyEval_EvalFrameDefault in ceval.c; PyEval_EvalFrameEx is the older public entry point) → fetches one bytecode instruction at a time and dispatches it.

import dis

def add(a, b):
    return a + b

dis.dis(add)
# Output below is from CPython 3.11; opcodes, offsets, and line numbers vary by version.
#  2           0 RESUME                   0
#  3           2 LOAD_FAST                0 (a)
#              4 LOAD_FAST                1 (b)
#              6 BINARY_OP                0 (+)
#             10 RETURN_VALUE

Frame objects

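Every function call pushes a frame holding the locals, the value stack, the current instruction pointer, and a link to the caller’s frame. Since 3.11, CPython keeps these as lightweight internal frames on a per-thread stack and creates a full frame object only when something asks for one (tracebacks, sys._getframe, debuggers).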

import sys

def f():
    frame = sys._getframe()
    print(frame.f_code.co_name, frame.f_lineno)

f()  # f, <line>

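Frames are interpreter-managed objects, not C stack frames, so the recursion limit (sys.setrecursionlimit) is an interpreter-level counter rather than an OS limit. It exists as a guard: before 3.11 every Python call also recursed in the C eval loop, and calls that pass through C code still do, so unbounded recursion could overflow the C stack.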
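To make the “link to the caller’s frame” concrete, here is a small sketch that walks the f_back chain from inside a nested call. It relies on sys._getframe, which is CPython-specific; the three function names are purely illustrative.

```python
import sys

def outer():
    return middle()

def middle():
    return inner()

def inner():
    # Walk from the currently executing frame back through its callers.
    names = []
    frame = sys._getframe()          # CPython-specific: the current frame object
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back         # the caller's frame, or None at the bottom
    return names

chain = outer()
print(chain[:3])  # ['inner', 'middle', 'outer']
```

Whatever sits below outer (for example `<module>`) depends on how the script is run.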

Interview framing

“When you call a Python function, what’s the cost?”

Push a frame, bind the arguments to locals, execute the bytecode, then release the frame. That fixed overhead is large compared with simple arithmetic, on the order of tens to hundreds of nanoseconds per call depending on version and hardware. This is why a comprehension with an inline expression beats map(lambda …) for trivial bodies: the lambda pays a full call per element. Knowing this lets you defend choices like inline arithmetic vs operator.add.

2. The GIL — What It Is, What It Protects, When It Releases

The Global Interpreter Lock is a mutex inside the CPython interpreter. Only one thread can execute Python bytecode at a time per process.

What it protects

The GIL exists because CPython’s memory management (refcounts, GC structures, dict internals, etc.) is not thread-safe. Without the GIL, every refcount increment would need an atomic, killing single-threaded performance.

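It protects interpreter state, not your data structures. list.append happens to be atomic (the whole append runs inside one C function call, and the GIL is not released partway through), but counter += 1 is not: it compiles to separate load, add, and store bytecodes, and another thread can run between them.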

When it releases

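I/O operations (file read/write, sockets, time.sleep): the underlying C code drops the GIL while blocking.
Some C extensions explicitly drop it around long-running work (NumPy heavy ops, hashlib on large inputs).
Switch interval: by default every 5 ms (sys.getswitchinterval() returns 0.005; change it with sys.setswitchinterval), the running thread is asked to drop the GIL so a waiting thread can run.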

import threading, time

counter = 0
def bump():
    global counter
    for _ in range(1_000_000):
        counter += 1  # NOT atomic

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
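print(counter)  # may be less than 4_000_000: updates can be lost (some CPython versions happen to hide the race in this exact loop, but it is still a race)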
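The standard fix is to turn the read-modify-write into a critical section. A minimal sketch with threading.Lock (the lock name and the smaller iteration count are illustrative choices):

```python
import threading

counter = 0
counter_lock = threading.Lock()

def bump(n: int) -> None:
    global counter
    for _ in range(n):
        with counter_lock:       # load, add, and store all happen while holding the lock
            counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: no updates are lost
```

Taking a lock per increment is slow; when you only need the final total, letting each thread count locally and summing at the end avoids the contention entirely.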

Implications

CPU-bound parallelism via threading is impossible in standard CPython. Use multiprocessing or release the GIL via C extensions.
I/O-bound parallelism via threading works. Each thread releases the GIL while waiting on the network.
asyncio is an alternative I/O model; it does not bypass the GIL — it doesn’t need to, because there’s only one thread anyway.

Free-threaded Python (3.13+)

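PEP 703 introduced an optional free-threaded build (the python3.13t executable). Reference counting switches to biased reference counting (cheap non-atomic updates by the owning thread, atomic updates from other threads), built-in containers such as dict and list use per-object locks, and the cycle GC stops the world while it runs. It is not the default, and C extensions must be rebuilt for it; importing one that has not declared free-threading support re-enables the GIL. For interview purposes: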

“Python 3.13 ships an experimental free-threaded build that removes the GIL. It’s opt-in, slower for single-threaded code today, and not yet ABI-stable for the ecosystem. Default CPython still has the GIL.”
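If you need to check which situation you are in at runtime, a small sketch: `sysconfig.get_config_var("Py_GIL_DISABLED")` reports whether the interpreter was built free-threaded, and `sys._is_gil_enabled()` (a private helper that only exists on 3.13+) reports whether the GIL is actually active right now. The wrapper function name is hypothetical.

```python
import sys
import sysconfig

def gil_status() -> tuple[bool, bool]:
    """Return (built_free_threaded, gil_currently_enabled)."""
    # 1 on free-threaded ("t") builds; 0 or None on regular builds.
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # Private API, 3.13+ only; older interpreters always run with the GIL.
    is_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = is_enabled() if is_enabled is not None else True
    return built_free_threaded, gil_enabled

print(gil_status())  # (False, True) on a default CPython build
```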

3. Memory Model — Refcounts + Generational GC

Every Python object has a reference count. When it hits zero, the object is freed immediately.

import sys
a = [1, 2, 3]
sys.getrefcount(a)  # 2 — one for `a`, one for the argument to getrefcount
b = a
sys.getrefcount(a)  # 3
del b
sys.getrefcount(a)  # 2

Why we need a GC on top

Refcounts cannot collect cycles:

a = []
b = []
a.append(b)
b.append(a)
del a, b  # Refcount of each is still 1 — they reference each other.

The generational tracing GC in the gc module sweeps for cycles. Three generations (0, 1, 2). Newly created containers go in gen 0. Survivors are promoted. Older generations are collected less often.

import gc
gc.collect()  # force a full collection
gc.get_threshold()  # (700, 10, 10)

`__del__` pitfalls

__del__ is a finalizer, not a destructor. Two traps:

Before Python 3.4 (PEP 442), cycles containing objects with __del__ were uncollectable and ended up in gc.garbage. Now they are collected, but finalizer order within the cycle is unspecified.
__del__ may run during interpreter shutdown when module globals are already None.

class Bad:
    def __del__(self):
        print(open)  # may be None during shutdown

Use weakref.finalize or context managers (with) instead.
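A minimal weakref.finalize sketch, assuming a hypothetical resource class and cleanup callback. The key rule: the callback and its arguments must not reference the object itself, or the object can never become unreachable.

```python
import weakref

class TempResource:
    """Hypothetical object that needs cleanup once nothing references it."""

events = []

def cleanup(name: str) -> None:
    # Receives only a name, never the object, so it does not keep the object alive.
    events.append(f"cleaned up {name}")

res = TempResource()
finalizer = weakref.finalize(res, cleanup, "scratch-file")
print(finalizer.alive)  # True
del res                 # CPython: refcount hits zero, so the finalizer runs now
print(events)           # ['cleaned up scratch-file']
print(finalizer.alive)  # False
```

Unlike __del__, a finalizer runs at most once, and by default it also runs at interpreter exit if the object is still alive.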

`weakref`

A weakref does not increment the refcount. Useful for caches and observer patterns.

import weakref

class Node: pass
n = Node()
r = weakref.ref(n)
print(r())  # <Node>
del n
print(r())  # None

Interview framing

“How does Python free memory?”

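Refcounting frees most objects as soon as the last reference disappears; a generational tracing collector cleans up cycles. Compared to Java, memory is reclaimed promptly and deterministically, with no tracing pause on the common path, but every reference copy pays a refcount increment and decrement. Under the GIL these are plain (non-atomic) writes, yet they add memory traffic to nearly every operation, which is part of why Python is slow.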

4. Object Model — `__slots__`, Descriptors, MRO

Instances of ordinary user-defined classes keep their attributes in a per-instance __dict__ by default. This is a large part of why Python objects take several times more memory than equivalent C structs.

`__slots__`

Declare attributes statically and the interpreter skips __dict__:

class Point:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y

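# With __slots__, x and y live in fixed slots and there is no per-instance __dict__,
# so each Point is typically several times smaller; exact byte counts vary by CPython version.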

__slots__ cost: no dynamic attribute addition. Subclasses that don’t redeclare __slots__ lose the optimization. Use them for value classes with millions of instances.
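A short sketch of both costs: a slotted instance rejects undeclared attributes, and a subclass that omits __slots__ quietly gets a __dict__ back. Point3D is a hypothetical subclass for illustration.

```python
class Point:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

class Point3D(Point):    # no __slots__ here, so instances get a __dict__ again
    pass

p = Point(1, 2)
try:
    p.z = 3              # "z" is not a declared slot
    rejected = False
except AttributeError:
    rejected = True

q = Point3D(1, 2)
q.z = 3                  # allowed: the subclass re-introduced __dict__

print(rejected)                 # True
print(hasattr(p, "__dict__"))   # False
print(hasattr(q, "__dict__"))   # True
```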

Descriptors

Properties, classmethods, staticmethods are all built on the descriptor protocol: an attribute access triggers __get__ / __set__ / __delete__ on the class attribute.

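class Lazy:
    """Non-data descriptor: computes once, then caches the result on the instance."""
    def __init__(self, fn): self.fn = fn
    def __get__(self, obj, cls):
        if obj is None:                     # accessed on the class (C.expensive): return the descriptor
            return self
        v = self.fn(obj)
        setattr(obj, self.fn.__name__, v)   # the instance __dict__ now shadows the descriptor
        return v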

class C:
    @Lazy
    def expensive(self):
        return sum(range(10**6))

MRO and C3

Multiple inheritance resolution uses the C3 linearization algorithm. It guarantees a deterministic order that respects: a class precedes its parents; left-to-right inheritance order.

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print(D.__mro__)
# (D, B, C, A, object)

Interview framing

“Why is Python OO so slow?”

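Attribute access is a dynamic, multi-step lookup: search the type’s MRO for a data descriptor, then check the instance __dict__, then fall back to class attributes. CPython 3.11+ speeds up the common cases with specializing, inline-cached bytecodes (PEP 659); __slots__, caching hot attributes in locals, and attrs/dataclass(slots=True) help further, and JITs such as PyPy inline these lookups.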
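The lookup order is easiest to see with two toy descriptors (the class names are illustrative). A descriptor with only __get__ (non-data) loses to the instance __dict__, which is exactly what lets a caching descriptor like Lazy replace itself; a descriptor that also defines __set__ (data, like property) wins over the instance __dict__.

```python
class NonData:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"

class Data:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class C:
    nd = NonData()
    d = Data()

c = C()
# Write straight into the instance dict, bypassing any __set__.
c.__dict__["nd"] = "from instance dict"
c.__dict__["d"] = "from instance dict"

print(c.nd)  # from instance dict    (instance dict beats a non-data descriptor)
print(c.d)   # from data descriptor  (a data descriptor beats the instance dict)
```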

5. Iterators, Generators, `yield from`

An iterator is any object with __iter__ and __next__. A generator is a function with yield; calling it returns an iterator without running the body.

def count_up(n):
    for i in range(n):
        yield i

g = count_up(3)
next(g)  # 0
next(g)  # 1

Generators suspend frame state on yield. The frame is heap-allocated, kept alive by the generator object, and resumed on next().

`yield from`

Delegates iteration to a sub-iterator, forwards send(), throw(), and close() to it, and evaluates to the sub-generator’s return value.

def chain(a, b):
    yield from a
    yield from b
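The forwarding is what distinguishes yield from from a plain for loop. In this sketch (averager and delegate are hypothetical names), values passed to send() on the outer generator reach the inner one directly, and the inner generator’s return value becomes the value of the yield from expression.

```python
def averager():
    """Sub-generator: receives numbers via send(), returns their mean when sent None."""
    total, count = 0.0, 0
    while True:
        value = yield
        if value is None:
            return total / count if count else 0.0
        total += value
        count += 1

def delegate(results):
    # yield from forwards send() into averager and captures its return value.
    result = yield from averager()
    results.append(result)

results = []
g = delegate(results)
next(g)                 # prime: run to the first yield inside averager
for x in (10, 20, 30):
    g.send(x)           # delivered straight through to averager
try:
    g.send(None)        # averager returns; delegate records the result and finishes
except StopIteration:
    pass
print(results)          # [20.0]
```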

Why this matters

Generators are the foundation of asyncio (coroutines were generators before async def), pipelines, and lazy I/O. They allow processing infinite or huge sequences without materializing them.

def lines(path):
    with open(path) as f:
        for line in f:           # iterator protocol over file
            yield line.rstrip()  # constant memory

# Process 100GB log: O(line) memory.

6. List Internals — Over-allocation, Amortized Append

list is a dynamic array of PyObject*. Capacity grows geometrically.

CPython’s growth pattern (in listobject.c):

new_size = (new_size + (new_size >> 3) + 6) & ~3

That’s roughly 1.125x growth (smaller than C++ vector’s ~1.5–2x).

Operation	Complexity
`lst[i]`	O(1)
`lst.append(x)`	Amortized O(1), worst O(N) on resize
`lst.insert(0, x)`	O(N)
`lst.pop()`	O(1)
`lst.pop(0)`	O(N) — use `collections.deque`
`x in lst`	O(N)
`lst.sort()`	O(N log N), Timsort, stable
`lst[a:b]`	O(b-a), creates a copy

lst = []
for i in range(1_000_000):
    lst.append(i)  # amortized O(1) each, O(N) total

Pitfalls

grid = [[0] * 3] * 3   # WRONG — three references to one row
grid[0][0] = 1
print(grid)            # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

grid = [[0] * 3 for _ in range(3)]  # right

Interview framing

“Why does list.append average O(1)?”

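Each resize copies every element, but because capacity grows by a constant factor, the copy costs form a geometric series that sums to O(N) across N appends. Spread over N operations, that is O(1) amortized per append: the classic amortization argument.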
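You can watch the over-allocation directly: sys.getsizeof on a list includes its spare capacity, so the reported size only changes when a resize happens. A sketch (the exact number of resizes is version-dependent, so only its order of magnitude matters here):

```python
import sys

lst = []
sizes = [sys.getsizeof(lst)]
for i in range(10_000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != sizes[-1]:      # size jumps only when the backing array is reallocated
        sizes.append(size)

resizes = len(sizes) - 1
print(resizes)                 # a few dozen resizes for 10,000 appends
print(sizes[-1] / sizes[-2])   # close to 1.125 once the list is large
```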

7. Dict Internals — Open Addressing, Probing, Ordering

dict is a hash table with open addressing. Compact since 3.6 (split into a sparse index array + dense entries array). Insertion-ordered since 3.7 (language guarantee, not just CPython).

Lookup algorithm (simplified)

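Compute i = hash(key) & (table_size - 1) → first slot, and set perturb = hash(key).
If the slot is empty → not found. If the stored entry has the same hash and an identical or equal key → hit.
Else probe: perturb >>= 5; i = (5*i + 1 + perturb) & (table_size - 1). The “perturb” term mixes the high bits of the hash into the early probes, reducing clustering; once perturb reaches 0, the recurrence i = 5*i + 1 visits every slot of the power-of-two table.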
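As a sketch, here is that probe recurrence as it appears in CPython’s dictobject.c (perturb is shifted before each new index is computed). This is a simplified model: it ignores the compact entries array and assumes a 64-bit build, which is why the hash is masked to an unsigned 64-bit value.

```python
from itertools import islice

def probe_sequence(h: int, table_size: int):
    """Yield slot indices in CPython-style dict probe order (simplified model).

    table_size must be a power of two.
    """
    mask = table_size - 1
    perturb = h & (2**64 - 1)   # emulate the unsigned size_t perturb on a 64-bit build
    i = perturb & mask          # first slot: the low bits of the hash
    while True:
        yield i
        perturb >>= 5           # mix in the next 5 high bits
        i = (5 * i + perturb + 1) & mask

# Same low bits (same first slot), different high bits: the paths diverge at once.
a = list(islice(probe_sequence(35, 8), 4))
b = list(islice(probe_sequence(3, 8), 4))
print(a)  # [3, 1, 6, 7]
print(b)  # [3, 0, 1, 6]
```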

Hash randomization

hash(str) and hash(bytes) use a per-process random seed (since 3.3) to mitigate algorithmic-complexity DoS. PYTHONHASHSEED=0 disables it, useful for reproducibility but unsafe in production.

import os
os.environ.get('PYTHONHASHSEED')  # usually None (unset) → str/bytes hashes use a random per-process seed
hash("foo")  # different across processes

hash(int) is the integer itself (mod a prime). hash(-1) is special-cased to -2 because -1 signals errors in C.
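A quick check of the integer rules. The modulus comes from sys.hash_info.modulus, which is 2**61 - 1 on typical 64-bit builds.

```python
import sys

P = sys.hash_info.modulus   # 2**61 - 1 on typical 64-bit builds
print(hash(42))             # 42: small non-negative ints hash to themselves
print(hash(P))              # 0: hashes are reduced modulo P
print(hash(P + 5))          # 5
print(hash(-1), hash(-2))   # -2 -2: -1 is reserved as the C-level error signal
```

Int hashes are not randomized, so these results are identical in every process.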

Worst case

Adversarial keys with colliding hashes degrade to O(N) per operation. Hash randomization defeats the basic attack but custom __hash__ returning a constant still breaks it.

class Bad:
    def __hash__(self): return 0
    def __eq__(self, other): return False  # never equal — every insert collides

d = {}
for i in range(1000):
    d[Bad()] = i  # O(N) per insert → O(N²) total
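To see the quadratic blow-up instead of taking it on faith, count __eq__ calls. CountingKey is a hypothetical instrumented class. With distinct hashes, CPython compares stored hashes first and never reaches __eq__; with a constant hash, each insert compares against every earlier key.

```python
class CountingKey:
    """Hypothetical key class that counts how often __eq__ is called."""
    eq_calls = 0

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        return 0 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value

def eq_calls_for_inserts(n, constant_hash):
    CountingKey.eq_calls = 0
    d = {}
    for i in range(n):
        d[CountingKey(i, constant_hash)] = i
    return CountingKey.eq_calls

n = 200
print(eq_calls_for_inserts(n, constant_hash=False))  # 0: hash mismatch short-circuits __eq__
print(eq_calls_for_inserts(n, constant_hash=True))   # about n*(n-1)/2 comparisons
```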

Complexity table

Operation	Avg	Worst
`d[k]`	O(1)	O(N)
`d[k] = v`	O(1) amortized	O(N)
`del d[k]`	O(1)	O(N)
`k in d`	O(1)	O(N)
iter	O(N)	O(N)

Interview framing

“Why are Python dicts ordered?”

Compact dict (3.6 CPython) stored entries in insertion order in a dense array, with a sparse index. The ordering was an implementation detail, then promoted to a language guarantee in 3.7.

8. Set Internals

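set and frozenset are open-addressed hash tables with the same average and worst-case complexity as dict and the same adversarial caveats. They are implemented separately, though (setobject.c): there is no compact entries array, and probing checks a few adjacent slots before jumping elsewhere.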

s = {1, 2, 3}
s | {4}        # union, O(len(self) + len(other))
s & {2, 3, 5}  # intersection, O(min(...))
s - {2}        # difference, O(len(self))

Sets are not insertion-ordered. Do not rely on iteration order.

9. String Internals — Interning, Encoding, `bytes` vs `str`

Python 3 strings (str) are immutable Unicode code-point sequences. CPython stores them as one of:

Latin-1 (1 byte/char) when all code points fit.
UCS-2 (2 bytes/char) up to U+FFFF.
UCS-4 (4 bytes/char) for the full range.

This is PEP 393 (“flexible string representation”). A string with a single emoji is 4× the bytes per char of a pure-ASCII string of the same length.
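A sketch that measures the three width tiers with sys.getsizeof. Each string has 1,000 code points; only the widest code point in the string decides its storage width. Exact byte counts vary by version, so compare the ratios rather than the numbers.

```python
import sys

n = 1000
ascii_s = "a" * n                 # all code points <= U+007F: 1 byte each
latin1_s = "é" * n                # U+00E9 still fits in 1 byte
ucs2_s = "ā" * n                  # U+0101 needs 2 bytes per code point
ucs4_s = "a" * (n - 1) + "😀"     # one astral code point widens the whole string

for name, s in [("ascii", ascii_s), ("latin1", latin1_s),
                ("ucs2", ucs2_s), ("ucs4", ucs4_s)]:
    print(name, len(s), sys.getsizeof(s))
```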

Interning

Short strings that look like identifiers are auto-interned. Equal interned strings share the same object → is works (by accident).

"hello" is "hello"  # True (CPython, syntactic literals)
a = "hello world"
b = "hello world"
a is b  # CPython 3.x: often True, but DO NOT RELY ON THIS

sys.intern(s) forces interning for runtime-built strings; speeds up dict lookups when the same key is used many times.

Concat in loop is O(N²)

s = ""
for c in chars:
    s += c   # creates a new string each time

CPython has a special-case optimization that sometimes makes this O(N) (when the refcount of s is 1 and the allocator can extend in place), but it is not guaranteed and disappears under any other reference. Use "".join(chars).

`bytes` vs `str`

bytes is an immutable byte sequence. str is a Unicode sequence. They do not implicitly convert in Python 3.

b"abc" + "xyz"  # TypeError
b"abc".decode("utf-8")  # → "abc"
"abc".encode("utf-8")   # → b"abc"

Network/file boundaries are bytes. Application logic is strings. Convert at the boundary, never in the middle.
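One reason to convert only at the boundary is that lengths and indices mean different things on each side. A small sketch:

```python
text = "naïve café"                 # 10 code points
data = text.encode("utf-8")         # boundary: str -> bytes
print(len(text), len(data))         # 10 12: ï and é take 2 bytes each in UTF-8
print(data[:3])                     # b'na\xc3': slicing bytes can split a character
print(data.decode("utf-8") == text) # True: decode at the other boundary
```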

Interview framing

“Why does s += c in a loop blow up?”

Strings are immutable, so each append allocates a new string and copies. CPython has an opportunistic in-place-extend hack that hides this in toy examples, but it’s fragile. Always "".join.

10. Hashing Protocol and `__hash__` Contract

Two objects that compare equal must have the same hash. The reverse is not required.

class Point:
    def __init__(self, x, y): self.x, self.y = x, y
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        return hash((self.x, self.y))

If you define __eq__ and not __hash__, your class becomes unhashable (Python sets __hash__ to None). This is by design — overriding equality without hash is almost always a bug.

Mutable types (list, dict, set) are unhashable by default — their hash would have to change as they mutate, breaking the dict/set invariant.
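Both rules are easy to demonstrate. In this sketch, Point defines only __eq__ and becomes unhashable; MutablePoint (a hypothetical subclass) restores a hash derived from mutable fields, and mutating it after insertion strands the object in the set.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
    # __hash__ deliberately omitted: Python sets Point.__hash__ to None

error_message = ""
try:
    {Point(1, 2)}
except TypeError as e:
    error_message = str(e)
print(error_message)              # unhashable type: 'Point'

class MutablePoint(Point):
    def __hash__(self):
        return hash((self.x, self.y))   # hash depends on mutable state

p = MutablePoint(1, 2)
s = {p}
p.x = 10                          # mutate after insertion: the stored hash is now stale
print(p in s)                     # False: lookup uses the new hash
print(any(q is p for q in s))     # True: the object is still in the set
```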

11. Mutable Default Arguments — The Most Famous Trap

def append_to(x, lst=[]):
    lst.append(x)
    return lst

append_to(1)  # [1]
append_to(2)  # [1, 2]   ← !

Default values are evaluated once, when the def statement runs, and shared across calls. Idiomatic fix:

def append_to(x, lst=None):
    if lst is None: lst = []
    lst.append(x)
    return lst
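The “evaluated once” rule is visible on the function object itself: defaults are stored in __defaults__ when def runs. This sketch deliberately reuses the buggy version.

```python
def append_to(x, lst=[]):   # deliberately buggy, to inspect the shared default
    lst.append(x)
    return lst

print(append_to.__defaults__)   # ([],): the one list created when `def` ran
append_to(1)
append_to(2)
print(append_to.__defaults__)   # ([1, 2],): every call mutated that same object
```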

Every Python interviewer asks this once a year. Get it right and move on.

12. Closures and Late Binding

Free variables in closures are looked up by name at call time, not captured by value at definition.

funcs = [lambda: i for i in range(3)]
[f() for f in funcs]  # [2, 2, 2] — not [0, 1, 2]

Fix with default arg (evaluated at def):

funcs = [lambda i=i: i for i in range(3)]
[f() for f in funcs]  # [0, 1, 2]

This is the same bug as JavaScript’s var i in a loop. Both languages punish late binding.
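Under the hood, the buggy closures share a single cell object for the loop variable. A factory function is a common alternative fix to the default-argument trick, because each call creates a fresh variable. The function names here are illustrative.

```python
def make_funcs_buggy():
    funcs = []
    for i in range(3):
        funcs.append(lambda: i)        # every lambda closes over the same variable `i`
    return funcs

def make_getter(i):
    return lambda: i                   # each call of make_getter binds a fresh `i`

buggy = make_funcs_buggy()
fixed = [make_getter(i) for i in range(3)]

print([f() for f in buggy])            # [2, 2, 2]
print([f() for f in fixed])            # [0, 1, 2]
# The buggy closures share one cell object:
print(buggy[0].__closure__[0] is buggy[1].__closure__[0])  # True
```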

13. Concurrency — Threading vs Multiprocessing vs Asyncio

Model	Parallelism	Use For	Cost
`threading`	Concurrent (GIL)	I/O-bound	One OS thread per task (each reserves its own stack), context switches
`multiprocessing`	Parallel (separate processes)	CPU-bound	Process startup, IPC pickling
`asyncio`	Concurrent (single thread)	High-fanout I/O	No OS threads; cooperative
`concurrent.futures`	Wraps threads or processes (ThreadPoolExecutor / ProcessPoolExecutor)	Convenient API	Same as the pool it wraps

Picking

10K simultaneous network connections? asyncio.
10 simultaneous network calls? threading (or asyncio).
Heavy NumPy computation? threading — NumPy releases the GIL.
Pure-Python CPU work? multiprocessing or write the hot loop in C/Cython/Numba.
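For the “10 simultaneous network calls” case, a minimal sketch using concurrent.futures.ThreadPoolExecutor. Here time.sleep stands in for a blocking network call (an assumption for illustration); like real socket I/O, it releases the GIL, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id: int) -> int:
    time.sleep(0.3)        # stand-in for a blocking network call; releases the GIL
    return task_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_io, range(5)))
elapsed = time.perf_counter() - start

print(results)             # [0, 1, 2, 3, 4]: map preserves input order
print(f"{elapsed:.2f}s")   # roughly 0.3s, versus about 1.5s one after another
```

Swapping in ProcessPoolExecutor gives the same API for CPU-bound work, with the pickling and start-method caveats covered below.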

14. AsyncIO Model — Event Loop, Coroutines, Don’t Block The Loop

asyncio runs an event loop in one thread. A coroutine (async def) gives up control only at an await that actually suspends, and the loop then resumes another coroutine that is ready.

import asyncio

async def fetch(url):
    print(f"start {url}")
    await asyncio.sleep(1)        # yields control
    print(f"done {url}")

async def main():
    await asyncio.gather(*[fetch(u) for u in ["a", "b", "c"]])

asyncio.run(main())
# All three start, all three finish ~1s later — concurrent on one thread.

Blocking the loop

If you do CPU work or call a sync blocking I/O function inside a coroutine, the entire loop stalls. Symptom: latency spikes for everyone.

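import asyncio, time
import aiohttp, requests   # third-party packages

async def bad():
    time.sleep(1)              # blocking: stalls the loop
    requests.get("http://x")   # blocking: same problem

async def good():
    await asyncio.sleep(1)     # yields to the loop while waiting
    async with aiohttp.ClientSession() as s:
        async with s.get("http://x") as resp:   # releases the connection when done
            await resp.text()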

For unavoidable blocking work, push it to a thread pool: await loop.run_in_executor(None, blocking_fn), or on 3.9+ the shorter await asyncio.to_thread(blocking_fn).
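A sketch of offloading with asyncio.to_thread (3.9+). blocking_io is a hypothetical stand-in for a synchronous library call, and the heartbeat coroutine exists only to show that the loop keeps running while the blocking call executes in a worker thread. If main called blocking_io directly instead, the heartbeat would stall for the full half second.

```python
import asyncio
import time

def blocking_io(seconds: float) -> str:
    time.sleep(seconds)          # stand-in for a synchronous library call
    return "done"

async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def main() -> tuple[str, int]:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    result = await asyncio.to_thread(blocking_io, 0.5)  # the loop stays free meanwhile
    stop.set()
    await hb
    return result, len(ticks)

result, n_ticks = asyncio.run(main())
print(result, n_ticks)  # done, and roughly 10 heartbeat ticks
```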

Interview framing

“What’s the difference between asyncio and threading?”

Threading is preemptive: the OS schedules threads, which share memory, and the GIL serializes bytecode execution. Asyncio is cooperative and single-threaded; coroutines must await to yield. Both win on I/O. Asyncio scales to more in-flight operations because there is no per-task OS thread.

15. Multiprocessing — Fork vs Spawn, Pickling

multiprocessing creates separate Python processes. Each has its own GIL → real parallelism.

Start methods

Method	Default On	Cost	Caveat
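`fork`	Linux and other POSIX before 3.14	Cheap	Copy-on-write; not safe with threads, locks, or libraries that aren’t fork-safe (e.g. CUDA).
`spawn`	macOS (since 3.8), Windows	Slow (fresh interpreter re-imports the main module)	All args must be picklable; the main module needs an if __name__ == "__main__" guard.
`forkserver`	Linux and other POSIX since 3.14	Mid	A server process started early forks clean workers; args must be picklable.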

Pickling

Args and return values cross process boundaries via pickle. Lambdas, local functions, and many file/socket objects are not picklable.

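from multiprocessing import Pool

def square(x): return x * x   # top-level: picklable by reference

if __name__ == "__main__":    # required under spawn/forkserver, which re-import this module
    with Pool(4) as p:
        print(p.map(square, range(10)))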
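Functions are pickled by reference (module plus qualified name), so anything that cannot be looked up by name fails. A sketch with a hypothetical can_pickle helper; functools.partial over an importable callable is a common lambda replacement for worker arguments.

```python
import functools
import operator
import pickle

def can_pickle(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

double_lambda = lambda x: 2 * x                      # "<lambda>" cannot be looked up by name
double_partial = functools.partial(operator.mul, 2)  # references importable callables

print(can_pickle(double_lambda))    # False
print(can_pickle(double_partial))   # True
print(pickle.loads(pickle.dumps(double_partial))(21))  # 42
```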

Shared memory

multiprocessing.shared_memory.SharedMemory (Python 3.8+) for zero-copy NumPy/byte sharing. Avoids the pickling round-trip for big arrays.

16. NumPy / Vectorization

NumPy stores numbers in contiguous C arrays of native machine types, not as PyObject* pointers. Operations dispatch to optimized C loops (often SIMD), and many of them release the GIL while they run.

import numpy as np
a = np.arange(1_000_000)
b = a * 2 + 1     # vectorized, ~ms; pure Python equivalent ~100ms

A for loop over a NumPy array is the worst of both worlds: Python overhead per iteration, no SIMD. If the operation has no NumPy expression, fall back to Numba, Cython, or a C extension.

This is a sneaky interview line: “Implement vector dot product without NumPy” → straightforward Python, then “now optimize” → vectorize, then “now scale” → talk about BLAS underneath NumPy.

17. Common Interview Gotchas

Integer caching

a = 256; b = 256
a is b   # True — CPython caches -5..256
a = 257; b = 257
a is b   # unreliable: True or False depending on how the code was compiled; never use `is` to compare values

`is` vs `==`

is is identity (same object). == is equality (__eq__). Use == for values. Use is for None, True, False, and singletons.
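Constant folding can make literal comparisons misleading, so this CPython-specific sketch builds the ints at runtime, where the small-int cache is the only thing that can make two of them the same object.

```python
# CPython-specific: ints built at runtime bypass compile-time constant sharing.
small_a, small_b = int("256"), int("256")
big_a, big_b = int("257"), int("257")

print(small_a is small_b)   # True: -5..256 come from the small-int cache
print(big_a is big_b)       # False: two separate int objects
print(big_a == big_b)       # True: == compares values, which is what you want
```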

Sort stability

sorted() and list.sort() use Timsort — stable. You can sort by multiple keys via successive stable sorts (least significant first).

data = [("a", 2), ("b", 1), ("a", 1)]
data.sort(key=lambda x: x[1])
data.sort(key=lambda x: x[0])  # stable: ties keep the previous order → [('a', 1), ('a', 2), ('b', 1)]

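`dict.setdefault` vs `dict.get` for accumulation

d = {}
d.setdefault("k", []).append(1)  # appends in place (the default [] is still built on every call)
# vs
d["k"] = d.get("k", []) + [1]    # copies the whole list each time → quadratic for many appends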

Use collections.defaultdict(list) if you append a lot.
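The usual grouping idiom, as a small sketch with illustrative data:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

groups = defaultdict(list)       # a missing key gets a fresh [] on first access
for w in words:
    groups[w[0]].append(w)       # in-place append: no copying, no throwaway lists

print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```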

Truthy surprises

bool([])      # False
bool([0])     # True  (non-empty list)
bool(0.0)     # False
bool("False") # True  (non-empty string)

18. Recursion Limits

import sys
sys.getrecursionlimit()  # default 1000
sys.setrecursionlimit(10000)

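Raising the limit does not give you more stack. Before 3.11 each Python call also recursed in the C eval loop, and calls through C code still do, so setting the limit too high can crash the interpreter with a C stack overflow (a segfault, not a RecursionError).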

Convert deep recursion to iteration with an explicit stack (Phase 1, Lab 8). This isn’t optional in interviews — recursion depth = N in tree problems with skewed inputs is real.
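A sketch of the conversion for tree height, using a minimal hypothetical Node class. The explicit stack holds (node, depth) pairs, so a completely skewed tree 100,000 levels deep is handled easily, whereas the naive recursive version raises RecursionError at the default limit of 1000.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def height(root: Optional[Node]) -> int:
    """Maximum depth computed with an explicit stack instead of recursion."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best

# Completely skewed tree: every node has only a right child.
root = None
for _ in range(100_000):
    root = Node(right=root)

print(height(root))  # 100000
```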

19. Performance Hot Tips

Avoid attribute lookup in hot loops: bind to a local first.

append = result.append   # local — fastest opcode
for x in data:
    append(transform(x))

Built-ins are C. sum, min, max, any, all, map, sorted — written in C, beat hand-rolled loops.
Comprehensions beat for + append. They use the dedicated LIST_APPEND bytecode instead of looking up and calling the append method on every iteration.
functools.lru_cache for memoization — drop-in, fast.
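String formatting: f-strings are usually the fastest and most readable option; prefer them over % and .format(), and use "".join when building a string in a loop.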
__slots__ for value classes with millions of instances.
Profile before optimizing. cProfile, pyinstrument, py-spy (sampling, no code changes).

import cProfile
cProfile.run("expensive()", sort="cumulative")
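You can confirm the comprehension claim with the dis module. Before 3.12 a comprehension compiles to a nested code object, so this sketch’s hypothetical opnames helper recurses into code constants to work on any recent version.

```python
import dis
import types

def opnames(code: types.CodeType) -> set[str]:
    """All opcode names in a code object, including nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names

def with_loop(data):
    out = []
    for x in data:
        out.append(x * 2)          # attribute lookup plus a method call per element
    return out

def with_comprehension(data):
    return [x * 2 for x in data]   # compiles to LIST_APPEND

print("LIST_APPEND" in opnames(with_comprehension.__code__))  # True
print("LIST_APPEND" in opnames(with_loop.__code__))           # False
```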

20. Standard Library Essentials

`collections`

Counter — multiset; most_common(k).
deque — O(1) appends/pops at both ends. Use for queues and sliding windows.
defaultdict — auto-vivifying dict.
OrderedDict — historically ordered; today dict is too (including reversed() since 3.8). Reach for OrderedDict when you need move_to_end, popitem(last=False), or order-sensitive equality.
namedtuple — lightweight value class. dataclass(frozen=True, slots=True) is the modern alternative.
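The classic place where OrderedDict still earns its keep is an LRU cache. A minimal sketch (the class and method names are illustrative, not a standard API):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: keys ordered from least to most recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)         # mark as most recently used, O(1)
        return self.data[key]

    def put(self, key, value) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used, O(1)

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" becomes most recent
cache.put("c", 3)        # evicts "b"
print(list(cache.data))  # ['a', 'c']
```

For memoizing pure functions, functools.lru_cache is the drop-in choice; a hand-rolled class like this is what interviewers usually ask for.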

`heapq`

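Min-heap on a list. No max-heap functions before Python 3.14 (which adds heappush_max and related functions); the portable trick is to negate values.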

import heapq
h = []
heapq.heappush(h, 3)
heapq.heappush(h, 1)
heapq.heappop(h)  # 1
heapq.nlargest(3, data)  # k-largest
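Two interview staples, sketched with illustrative data: top-k with a size-k min-heap (O(N log k) time, O(k) space), and a max-heap by negation.

```python
import heapq

def top_k(nums, k):
    """k largest values via a size-k min-heap."""
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:                # heap[0] is the smallest of the current top k
            heapq.heapreplace(heap, x)   # pop the smallest and push x in one step
    return sorted(heap, reverse=True)

# Max-heap by negation: push -x, then negate on the way out.
max_heap = []
for x in [3, 1, 4, 1, 5]:
    heapq.heappush(max_heap, -x)
largest = -heapq.heappop(max_heap)

print(top_k([3, 1, 4, 1, 5, 9, 2, 6], 3))  # [9, 6, 5]
print(largest)                              # 5
```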

`bisect`

Binary search on a sorted list.

import bisect
i = bisect.bisect_left(sorted_arr, x)  # insertion point
bisect.insort(sorted_arr, x)            # O(N) insert
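A common use of the left/right pair is counting values in a closed range in O(log N). The helper name is illustrative.

```python
import bisect

def count_in_range(sorted_arr, lo, hi):
    """Number of values v with lo <= v <= hi."""
    left = bisect.bisect_left(sorted_arr, lo)    # first index with value >= lo
    right = bisect.bisect_right(sorted_arr, hi)  # first index with value > hi
    return right - left

arr = [1, 2, 2, 2, 5, 7, 9]
print(count_in_range(arr, 2, 5))   # 4: the three 2s and the 5
print(count_in_range(arr, 3, 4))   # 0
```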

`itertools`

chain, cycle, repeat
combinations, permutations, product
accumulate (prefix sums!), groupby, pairwise (3.10+)

from itertools import accumulate, pairwise
list(accumulate([1, 2, 3, 4]))   # [1, 3, 6, 10]
list(pairwise([1, 2, 3, 4]))     # [(1,2), (2,3), (3,4)]

`functools`

lru_cache, cache (3.9+).
reduce.
partial.
cached_property.
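A quick memoization sketch. cache_info() confirms that each subproblem is computed exactly once.

```python
from functools import lru_cache

@lru_cache(maxsize=None)        # unbounded; same behavior as functools.cache on 3.9+
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))                  # 832040
print(fib.cache_info())         # misses=31: each n in 0..30 is computed once
```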

What To Memorize Cold

GIL releases on I/O and at switch interval; doesn’t release for pure-Python CPU.
dict is open-addressed, ordered since 3.7, hash-randomized.
List growth ~1.125×. Append amortized O(1).
str immutable, three width tiers (1/2/4 byte). Concat in loop ⇒ "".join.
is vs ==. Integer cache -5..256.
Mutable default args evaluated once.
Closure late binding fix: lambda x=x: ….
Threading for I/O, multiprocessing for CPU, asyncio for fanout.
__hash__ must agree with __eq__.
heapq is min-only.

If any of those is fuzzy, re-read this document. Then code something that breaks because of it, on purpose. That’s the lesson that sticks.

JavaScript & TypeScript Runtime Deep Dive

Target audience: candidates interviewing for frontend, full-stack, or Node.js backend roles where the interviewer probes “what does the event loop actually do?”, “explain this”, “why is typeof null === 'object'”, or “how does TypeScript narrow this union?”

Scope: V8 (Chrome / Node) primarily, with mentions of SpiderMonkey (Firefox) and JSC (Safari) where they diverge. TypeScript 5.x.

JS sits in an awkward place: senior interviewers know the language is full of warts and they will use them. Memorizing trivia is necessary but not sufficient. The leverage comes from understanding the engine model and the type system’s structural reasoning.

1. V8 Internals — Ignition + TurboFan

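V8 (and similar engines) compile JS through a tiered pipeline. Simplified, since current V8 also has the Sparkplug baseline compiler and the Maglev mid-tier optimizer between Ignition and TurboFan: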

Parser → AST.
Ignition — bytecode interpreter. Fast startup.
TurboFan — optimizing JIT. Profiles hot code, generates speculative machine code.
Deoptimization — when speculation breaks (a function suddenly receives a different type), TurboFan bails back to Ignition.

function add(a, b) { return a + b; }
// Called 10000x with (number, number) → TurboFan compiles it to fast int add.
add("foo", "bar");   // Type changed → deopt, recompile or bail out.

Hidden classes (shapes / maps)

V8 tracks the structural “shape” of objects internally — what fields exist in what order. Each property add changes the hidden class. Two objects with the same hidden class share a fast property layout.

function Point(x, y) { this.x = x; this.y = y; }
const a = new Point(1, 2);
const b = new Point(3, 4);
// a and b share a hidden class → fast.

a.z = 5;            // a's hidden class diverges from b's → slower.

Inline caches (ICs)

Property access (obj.x) is monitored. If the hidden class is consistent, the IC fast-paths to a direct memory offset. Polymorphic ICs (multiple shapes) are slower; megamorphic (>4) drops to a hash lookup.

Interview takeaway: initialize all properties in the constructor in the same order. Don’t add/delete properties dynamically in hot code.

2. Event Loop — Tasks, Microtasks, RAF

The browser/Node runs your JS on one main thread (Web Workers and worker_threads aside). Concurrency comes from yielding back to the event loop.

┌─────────────────────────┐
│      Call Stack         │
└────────┬────────────────┘
         │ runs to completion
┌────────▼────────────────┐
│   Microtask Queue       │ ← Promises, queueMicrotask, MutationObserver
└────────┬────────────────┘
         │ drained fully
┌────────▼────────────────┐
│      Task Queue         │ ← setTimeout, setInterval, I/O, UI events
└─────────────────────────┘

Order of execution:

Run the current synchronous code to completion.
Drain the entire microtask queue.
Pick one task from the task queue and run it to completion.
Repeat from step 2 (browsers may also render between tasks).

console.log('a');
setTimeout(() => console.log('b'), 0);
Promise.resolve().then(() => console.log('c'));
console.log('d');
// a, d, c, b
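To check the ordering rules above, here is a small sketch that records callbacks into an array instead of logging them. It also queues a microtask from inside a timer callback, which shows that microtasks drain after every task and not only after the initial script. It assumes a standard host (browser, or Node 11+) where `setTimeout` and `queueMicrotask` are available; the `order` array is purely illustrative.

```typescript
// Records the order in which the event loop runs each callback.
const order: string[] = [];

order.push('sync 1');
setTimeout(() => {
  order.push('task 1');
  // A microtask queued inside a task runs before the next task starts.
  queueMicrotask(() => order.push('microtask from task 1'));
}, 0);
setTimeout(() => order.push('task 2'), 0);
Promise.resolve().then(() => order.push('microtask 1'));
queueMicrotask(() => order.push('microtask 2'));
order.push('sync 2');

// Inspect the result once both 0 ms timers have fired.
setTimeout(() => console.log(order.join(' -> ')), 20);
// sync 1 -> sync 2 -> microtask 1 -> microtask 2 -> task 1 -> microtask from task 1 -> task 2
```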

Microtasks starve macrotasks if they keep enqueueing themselves:

function loop() { Promise.resolve().then(loop); }
loop();           // freezes the event loop — UI never repaints

`requestAnimationFrame`

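Browser-only. Fires once before the next repaint, typically in step with the display refresh rate (often 60 Hz), and is paused in hidden tabs. Use it for animation; it’s cheaper than setInterval(_, 16) because it’s coalesced with paint.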

Node specifics

Node uses libuv. Its event loop has phases (timers, pending callbacks, poll, check, close). process.nextTick runs before microtasks (a Node-only queue with even higher priority than Promise microtasks).

process.nextTick(() => console.log('next'));
Promise.resolve().then(() => console.log('promise'));
// next, promise

3. async / await

async functions return a Promise. await desugars to .then.

async function f() {
    const a = await fetch1();
    const b = await fetch2(a);
    return b;
}
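// roughly equivalent to (except that a synchronous throw from fetch1 becomes a rejection only in the async version):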
function f() {
    return fetch1().then(a => fetch2(a));
}

await x where x is not a thenable wraps it in Promise.resolve(x).

Sequential vs parallel

// Sequential (slow if independent):
const a = await op1();
const b = await op2();

// Parallel (correct when independent):
const [a, b] = await Promise.all([op1(), op2()]);

Default to Promise.all when the operations are independent — common interview ask.
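One way to see the difference without relying on wall-clock timing is to count how many operations are in flight at once. In this sketch `op` is a hypothetical stand-in for any async call (it just waits `ms` milliseconds); the sequential version never has more than one running, while `Promise.all` starts both before waiting.

```typescript
// Hypothetical async operation that tracks how many calls are running concurrently.
let inFlight = 0;
let maxInFlight = 0;

function op(ms: number): Promise<number> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight--;
      resolve(ms);
    }, ms);
  });
}

async function sequential(): Promise<number> {
  maxInFlight = 0;
  await op(20); // op(30) is not even started until op(20) settles
  await op(30);
  return maxInFlight; // 1
}

async function parallel(): Promise<number> {
  maxInFlight = 0;
  await Promise.all([op(20), op(30)]); // both operations start before we wait
  return maxInFlight; // 2
}
```

Total latency follows the same pattern: roughly 20 + 30 ms for the sequential version versus roughly max(20, 30) ms for the parallel one.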

Errors

try/catch works on await. An unhandled rejection in async context surfaces as unhandledrejection (browser) / process.on('unhandledRejection') (Node).

async function f() {
    try {
        await mayReject();
    } catch (e) {
        // handle
    }
}

4. Promise Gotchas

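A promise is not “the running operation”; it represents a value that will exist later. By the time you hold a promise, the work has usually already started: it begins when you call the function that returns it (a Promise executor runs synchronously inside the constructor), not when you attach .then or await it.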
Errors thrown inside .then callbacks become rejections of the chain.

Forgetting return inside .then breaks chaining:

p.then(x => {
    doSomething(x);     // forgot return — next .then sees undefined
}).then(use);

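Promise.all rejects as soon as any input rejects, but the other operations keep running; their results are simply discarded. Use Promise.allSettled to wait for all of them and inspect each outcome.
Unhandled rejections are loud: since Node 15 they terminate the process by default, and browsers report them in the console. Always attach a .catch or await inside try/catch.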
Promise is not cancelable. AbortController + AbortSignal pattern handles cancellation explicitly.

const ctrl = new AbortController();
fetch(url, { signal: ctrl.signal });
// later:
ctrl.abort();
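The same signal can make your own async helpers cancellable. A minimal sketch, assuming only the standard `AbortSignal` surface (`aborted` plus the `abort` event); `sleep` is a hypothetical helper, not a built-in. It rejects immediately on abort and clears its timer so nothing keeps running in the background.

```typescript
// A cancellable delay: resolves after `ms`, or rejects as soon as `signal` aborts.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error('aborted'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop the pending timer
      reject(new Error('aborted'));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort); // no longer needed
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}
```

Calling `sleep(1000, ctrl.signal)` and then `ctrl.abort()` rejects the promise right away instead of after a second.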

5. Memory Model and GC

V8’s heap has generational GC:

Young generation (Scavenger / semi-space copying): minor GC, very fast (~ms). Most objects die young.
Old generation (mark-sweep / mark-compact): major GC, with incremental and concurrent marking, concurrent sweeping, and parallel compaction.

Memory leaks in JS

Unintentional globals: assigning to an undeclared name (no let/const/var) in non-strict mode creates a property on the global object that is never collected. Strict mode throws a ReferenceError instead.
Closures — capturing large objects in long-lived callbacks.
Event listeners — not removed when DOM nodes are detached.
Timers: an uncleared setInterval keeps its callback, and everything the callback captures, alive until clearInterval is called.
Detached DOM — references to removed DOM nodes from JS keep them alive.
Map / Set — keys held strongly. Use WeakMap / WeakSet for “annotations on objects.”

Profiling

Chrome DevTools → Memory → heap snapshot, allocation timeline. Look for retained sizes and detached DOM trees.

6. Object Model — Prototypes

Every object has an internal [[Prototype]] (accessed via Object.getPrototypeOf or __proto__). Property lookup walks the prototype chain.

const a = { x: 1 };
const b = Object.create(a);
b.y = 2;
b.x;                    // 1 — looked up on a
Object.getPrototypeOf(b) === a;   // true

`prototype` (the property) vs `__proto__` (the link)

Foo.prototype is the object that becomes __proto__ of instances created with new Foo().

function Foo() {}
const f = new Foo();
f.__proto__ === Foo.prototype;       // true
Foo.prototype.__proto__ === Object.prototype;

`class`

class is sugar over prototypes. The methods live on Foo.prototype.

class Foo {
    greet() { return 'hi'; }
}
typeof Foo.prototype.greet;          // 'function'

`Object.create(null)`

A “dictionary object” with no prototype — no inherited properties from Object.prototype. Useful as a hash map.
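A short sketch of why this matters for user-supplied keys. With a plain `{}`, lookups can hit inherited members such as `toString`, and the key `"__proto__"` goes through the inherited setter; a null-prototype object treats every key as an ordinary own property.

```typescript
// Plain object literal: inherits from Object.prototype.
const plain: Record<string, number> = {};
// Prototype-less dictionary: holds only the keys you put in it.
const dict: Record<string, number> = Object.create(null);

console.log('toString' in plain); // true: inherited from Object.prototype
console.log('toString' in dict);  // false: no prototype chain to walk

// A user-supplied key like "__proto__" is just an ordinary own key here.
dict['__proto__'] = 1;
console.log(Object.keys(dict));   // ['__proto__']
```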

7. `this` Binding

JS binds this at the call site, not at definition. Four call-site rules in priority order, plus one exception:

new: this = new instance.
call / apply / bind: explicit binding wins.
Method call (obj.f()): this = obj.
Plain call (f()): this = undefined in strict mode, global object otherwise.
Arrow functions (the exception): no own this; they inherit it from the surrounding lexical scope, and call/apply/bind cannot change it (arrows can’t be called with new at all).

const obj = { x: 1, f() { return this.x; } };
const g = obj.f;
obj.f();          // 1
g();              // sloppy mode: undefined (this = globalThis); strict mode: TypeError (this is undefined)

class C {
    val = 42;
    arrow = () => this.val;        // bound to instance
    method()    { return this.val; }
}
const c = new C();
const a = c.arrow;
const m = c.method;
a();              // 42 — arrow captured `this`
m();              // TypeError — method lost `this`

8. Closures, `var` / `let` / `const`, Scope

A closure is a function plus the lexical environment it was created in.

function counter() {
    let n = 0;
    return () => ++n;
}
const c = counter();
c(); c(); c();        // 1, 2, 3

`var` (function-scoped, hoisted)

console.log(x);    // undefined (hoisted, not initialized)
var x = 1;

`let` / `const` (block-scoped, TDZ)

console.log(y);    // ReferenceError — TDZ
let y = 1;

The “Temporal Dead Zone” is the period between block entry and the let/const declaration. Accessing the binding in TDZ throws.

Loop var capture

for (var i = 0; i < 3; i++) setTimeout(() => console.log(i), 0);
// 3 3 3 — single `i`, all callbacks share it

for (let i = 0; i < 3; i++) setTimeout(() => console.log(i), 0);
// 0 1 2 — fresh `i` per iteration

This is the classic JS interview question. Use let.

9. Equality

=== strict — same type and value, with NaN !== NaN and +0 === -0.
== loose — type coercion. Don’t use it except for x == null (matches both null and undefined).
Object.is(a, b) — like === but Object.is(NaN, NaN) === true and Object.is(+0, -0) === false.

NaN === NaN;             // false
Object.is(NaN, NaN);     // true
1 == '1';                // true (coercion)
[] == false;             // true (!)
[] == ![];               // true (!)

The == rules are an interview trap. Memorize only the null/undefined exception; use === everywhere else.

10. Top Gotchas

`typeof null === 'object'`

A historical bug, never fixed for compatibility. Test for null with === null.

`parseInt` radix

parseInt('010');         // 10 in modern engines (used to be 8)
parseInt('010', 10);     // always 10 — pass the radix.

Always pass the radix (almost always 10). ESLint’s radix rule can enforce this.

Floating-point

0.1 + 0.2;           // 0.30000000000000004
0.1 + 0.2 === 0.3;   // false

Compare with a tolerance scaled to the operands’ magnitude (a bare Number.EPSILON only works for values near 1), or use BigInt for exact integer arithmetic.
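A minimal sketch of a relative-tolerance comparison. `approxEqual` is a hypothetical helper, and the default of 4 × `Number.EPSILON` is an arbitrary choice for illustration. Scaling by the larger magnitude is what makes it work for big values, where an absolute epsilon check would fail.

```typescript
// Relative-tolerance comparison: EPSILON is scaled by the operands' magnitude.
function approxEqual(a: number, b: number, relTol = 4 * Number.EPSILON): boolean {
  if (a === b) return true;                      // exact matches, including equal infinities
  if (!isFinite(a) || !isFinite(b)) return false; // NaN, or infinity vs a finite value
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}

approxEqual(0.1 + 0.2, 0.3); // true
approxEqual(1e16 + 2, 1e16); // true: |a - b| is 2, far above EPSILON but tiny relative to 1e16
approxEqual(1, 1.001);       // false
```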

Array coercion

[] + [];             // ''
[] + {};             // '[object Object]'
{} + [];             // 0 at statement start ({} parses as an empty block, leaving +[]); '[object Object]' in expression position

Don’t apply + to non-number operands; build strings with template literals or an explicit String(x).

`==` and falsy

0 == '';             // true
0 == '0';            // true
'' == '0';           // false

This is why === exists.

`for...in` vs `for...of`

for (const k in obj) iterates enumerable string-keyed properties (includes inherited!).
for (const v of iterable) iterates the iterable’s values.

const arr = [1, 2, 3];
for (const i in arr)  console.log(i);   // '0' '1' '2' — strings, indexes
for (const v of arr)  console.log(v);   // 1 2 3

Don’t use for...in on arrays.

`delete` on array

delete arr[i] leaves a hole (sparse array), doesn’t shorten. Use splice.
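A side-by-side sketch of the two operations on the same input:

```typescript
const withDelete = [1, 2, 3];
delete withDelete[1];            // leaves a hole at index 1
console.log(withDelete.length);  // 3: length is unchanged
console.log(1 in withDelete);    // false: index 1 no longer exists

const withSplice = [1, 2, 3];
withSplice.splice(1, 1);         // removes 1 element at index 1 and shifts the rest down
console.log(withSplice);         // [1, 3]
console.log(withSplice.length);  // 2
```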

11. `Map` vs `Object`

	`Object`	`Map`
Keys	Strings & symbols	Anything
Iteration	`Object.keys/entries`: integer-like keys ascending, then string keys in insertion order	Insertion order
Size	`Object.keys(o).length` (O(N))	`m.size` (O(1))
Inheritance pollution	Yes (`__proto__`, `toString`…)	No
JSON	Yes	No (need conversion)

Use Map when:

Keys are dynamic strings (esp. user input).
You need any-typed keys.
Insertion order matters.
You add/remove keys frequently.

Use Object when:

Keys are known at compile time / config-shaped.
You’ll JSON-serialize.
You’re using TypeScript’s structural types.
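The classic interview use of Map is a frequency count. A minimal sketch (`wordCounts` is a hypothetical name): every key is safe, even `"__proto__"`, `size` is O(1), and iteration follows first-insertion order.

```typescript
// Counts word frequencies with a Map.
function wordCounts(words: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const w of words) {
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  return counts;
}

const counts = wordCounts(['b', 'a', '__proto__', 'b']);
console.log(counts.size);                    // 3
console.log(Array.from(counts.entries()));   // [['b', 2], ['a', 1], ['__proto__', 1]]
```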

12. `Set`, `WeakMap`, `WeakSet`

Set — collection of unique values; insertion order.
WeakMap — keys are objects, weakly held. If the key is GC’d, the entry disappears. Not iterable. Use for “annotations on objects.”
WeakSet — set of weakly-held objects. Use for “have I seen this object?” without preventing GC.

const tags = new WeakMap();
function tag(node, value) { tags.set(node, value); }
// when `node` is GC'd, the tag is gone too.

Practical use: caches keyed on DOM nodes, private state on objects, libraries that observe but don’t own.
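For the cache use case, one pattern is to memoize a per-object computation in a WeakMap. A minimal sketch; `memoizeByObject` and the rectangle data are hypothetical. The cache never keeps an argument alive: once nothing else references the object, its entry becomes collectable too.

```typescript
// Memoizes compute(key) per object identity without preventing GC of the key.
function memoizeByObject<K extends object, V>(compute: (key: K) => V): (key: K) => V {
  const cache = new WeakMap<K, V>();
  return (key: K): V => {
    if (cache.has(key)) return cache.get(key) as V;
    const value = compute(key);
    cache.set(key, value);
    return value;
  };
}

let calls = 0;
const area = memoizeByObject((r: { w: number; h: number }) => {
  calls++;
  return r.w * r.h;
});
const rect = { w: 3, h: 4 };
area(rect); // 12, computed
area(rect); // 12, served from the WeakMap (calls is still 1)
```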

`WeakRef` (newer)

new WeakRef(obj) lets you hold a weak reference and dereference it (.deref()) later, getting the object or undefined if collected. Niche — you probably don’t need it.

13. TypeScript — Structural Typing & Generics

TypeScript types are structural. If two types have the same shape, they’re compatible.

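interface Named { name: string }
const person = { name: 'a', age: 30 };
const u: Named = person;                      // OK: structurally, person has `name`
// const v: Named = { name: 'a', age: 30 };   // error: excess-property check on a fresh object literal
function greet(p: Named) { return p.name; }
greet(person);                                // OK, no `as any` needed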

Generics

function identity<T>(x: T): T { return x; }
identity<number>(42);
identity('hi');                  // T inferred as string

Constraints

function len<T extends { length: number }>(x: T): number { return x.length; }

Conditional types

type IsString<T> = T extends string ? true : false;
type A = IsString<'hi'>;          // true
type B = IsString<42>;            // false

Mapped types

type MyPartial<T> = { [K in keyof T]?: T[K] };            // how the built-in Partial is defined
type MyReadonly<T> = { readonly [K in keyof T]: T[K] };   // same for Readonly

Utility types

Partial<T>, Required<T>, Pick<T, K>, Omit<T, K>, Record<K, V>, ReturnType<F>, Parameters<F>, Awaited<T>. Memorize the names; they come up.

14. TS Narrowing

The control-flow analyzer narrows union types based on runtime checks.

function f(x: string | number) {
    if (typeof x === 'string') {
        x.toUpperCase();           // narrowed to string
    } else {
        x.toFixed(2);              // narrowed to number
    }
}

Narrowing operators

typeof → "string", "number", "boolean", "undefined", "object", "function", "symbol", "bigint".
instanceof → for class instances.
in operator: if ('foo' in obj).
Equality: if (x === null), if (x === undefined).

Discriminated unions:

type Result = { ok: true; value: string } | { ok: false; error: Error };
function f(r: Result) {
    if (r.ok) r.value;       // OK
    else r.error;            // OK
}

User-defined type guards:

function isString(x: unknown): x is string {
    return typeof x === 'string';
}

Assertion functions:

function assertNumber(x: unknown): asserts x is number {
    if (typeof x !== 'number') throw new Error('not a number');
}
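Both kinds of guard pay off at call sites. In this sketch (the data is made up), `Array.prototype.filter` has an overload that accepts a type predicate, so the result is typed `string[]`. After an assertion function returns, the checked variable stays narrowed for the rest of the scope.

```typescript
function isString(x: unknown): x is string {
  return typeof x === 'string';
}

function assertNumber(x: unknown): asserts x is number {
  if (typeof x !== 'number') throw new Error('not a number');
}

const mixed: unknown[] = ['a', 1, 'b', null];
// filter's type-predicate overload narrows the result to string[].
const strings: string[] = mixed.filter(isString);

const raw: unknown = JSON.parse('42');
assertNumber(raw);
const doubled = raw * 2; // OK: `raw` is number after the assertion
```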

Exhaustiveness with `never`

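type Shape = { kind: 'circle'; radius: number } | { kind: 'square'; side: number };
function area(s: Shape): number {
    switch (s.kind) {
        case 'circle': return Math.PI * s.radius ** 2;
        case 'square': return s.side ** 2;
        default: {
            const unreachable: never = s;   // fails to compile if a variant is unhandled
            throw new Error(`unreachable: ${JSON.stringify(unreachable)}`);
        }
    }
}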

The never assignment fails to compile if a new variant is added — catches missing cases.

15. Performance Tips

Stable hidden classes — set all properties in the constructor in the same order. Don’t add later.
Avoid delete on hot objects — it transitions to dictionary mode.
Monomorphic functions — call them with the same shapes. Polymorphic = slower.
Typed arrays for numeric work — Float64Array, Int32Array. Pre-allocated, contiguous, no boxing.
Avoid arguments in hot code; use ...rest. arguments defeats some optimizations.
for over forEach in hot loops — slightly faster, no callback overhead. Less true with modern engines but still measurable on tight loops.
Pre-compile regexes — declare at module scope, not inside functions.
try/catch in hot functions: old V8 (Crankshaft, before 2017) refused to optimize functions containing it. TurboFan optimizes them fine; no longer a concern.
Profile before optimizing. Chrome DevTools Performance tab; Node --prof and clinic.js.
Reduce object churn — V8 likes long-lived monomorphic objects.

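// Bad: property order depends on input, so instances end up with different hidden classes:
function pt(flip) { const o = {}; if (flip) { o.y = 2; o.x = 1; } else { o.x = 1; o.y = 2; } return o; }

// Better: every instance gets x then y in the constructor, so all share one shape:
class Pt { constructor(x, y) { this.x = x; this.y = y; } }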

16. Node vs Browser

	Node	Browser
Globals	`process`, `Buffer`, `__dirname`, `global`	`window`, `document`, `navigator`
Modules	CommonJS (`require`) + ESM	ESM + bundlers
I/O	libuv: fs, net, dns, child_process	`fetch`, Web APIs
DOM	None (use `jsdom` if needed)	Yes
Threads	`worker_threads`, `cluster`	`Worker`, `SharedArrayBuffer`

libuv thread pool

Node uses a thread pool (default 4) for fs, dns.lookup, crypto.pbkdf2, etc. — anything that can’t be epoll-ed.

UV_THREADPOOL_SIZE=16 node app.js

Network I/O is not on the thread pool; it’s on the event loop using async syscalls.

`worker_threads` vs `cluster`

worker_threads — separate JS thread, separate event loop, can share memory via SharedArrayBuffer. Use for CPU-bound work.
cluster — multiple processes, no shared memory, IPC via channels. Use for scaling HTTP servers across cores.

`global` / `window` vs `globalThis`

globalThis (ES2020) is the universal global object: it is global in Node, window on the browser main thread, and self in workers, so portable code should use globalThis.

17. What To Memorize Cold

V8 pipeline: Ignition (bytecode) → TurboFan (optimizing JIT). Hidden classes + ICs drive speed. Property order matters.
Event loop: sync runs to completion → drain microtasks → one task → repeat. Microtasks include Promises, queueMicrotask.
Node nextTick > Promise microtask > timers/IO.
async/await desugars to Promises. Use Promise.all for independent work.
Promises: not cancelable, errors → rejection, must catch. AbortController for cancellation.
GC: generational. Leaks: globals, closures, listeners, timers, detached DOM. WeakMap/WeakSet for object-keyed metadata.
Prototype chain: __proto__ link, prototype property on functions/classes. class is sugar.
this rules: new > call/apply/bind > method > default. Arrow inherits lexical.
var hoisted/function-scoped, let/const block-scoped + TDZ. Loop var capture: use let.
Equality: === always; Object.is for NaN/±0; == only for x == null.
Top traps: typeof null is "object", parseInt radix, FP math, for...in vs for...of, delete on array.
Map vs Object: Map for dynamic keys, any types, ordered, fast size. WeakMap for object-keyed weak metadata.
TS structural typing. Utility types: Partial/Required/Pick/Omit/Record/ReturnType. Conditional + mapped types.
TS narrowing: typeof, instanceof, in, discriminated unions, type guards, assertion functions, never for exhaustiveness.
Perf: stable hidden classes, monomorphic call sites, typed arrays for numeric, pre-compile regex, profile.
Node: libuv thread pool for fs/dns/crypto. worker_threads (CPU) vs cluster (HTTP scale).

JS is forgiving until it isn’t. The interviewer will test the spots where it isn’t. Fluency on the event loop, this, equality, and TypeScript narrowing usually decides senior-level outcomes.

Java Runtime Deep Dive

Target audience: candidates interviewing in Java at FAANG, finance, Android, or any backend role where the interviewer is allowed to probe “what does the JVM do here?”

Scope: HotSpot JVM (OpenJDK 21+ baseline), with notes on JDK 17 LTS where behavior diverges. Other JVMs (OpenJ9, GraalVM native-image) are noted only where they change interview-grade answers.

Java’s verbosity makes it easy to mistake “I write Java daily” for “I know Java.” A senior interviewer will quickly find the gap by asking what Integer i = 200; Integer j = 200; i == j returns, why your HashMap has O(log N) worst-case lookup since Java 8, and what volatile actually guarantees. This guide closes the gap.

1. JVM Architecture

The JVM is a stack-based virtual machine with a tiered execution model.

.java ── javac ──► .class (bytecode)
                       │
                       ▼
              ┌─────────────────────┐
              │    Class Loader     │  (Bootstrap → Platform → App)
              └────────┬────────────┘
                       ▼
              ┌─────────────────────┐
              │  Runtime Data Areas │  Heap, Method Area / Metaspace,
              │                     │  Stacks (per-thread), PC Reg, Native Stack
              └────────┬────────────┘
                       ▼
              ┌─────────────────────┐
              │  Execution Engine   │  Interpreter ↔ C1 (client) ↔ C2 (server)
              │                     │  + Tiered Compilation + OSR
              └─────────────────────┘

Tiered compilation (default since Java 8): hot methods are compiled by C1 (fast, lower-quality code) and after enough invocations re-compiled by C2 (slower, high-quality, profile-guided). OSR (on-stack replacement) lets a long-running interpreted loop be replaced by JITed code mid-flight.

// Hot loop: C2 will typically unroll this and may vectorize it.
long sum = 0;
for (int i = 0; i < 1_000_000_000; i++) sum += i;

To see what the JIT does:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Main      # needs hsdis
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining Main   # PrintInlining is a diagnostic flag

Class loaders

Three default loaders form a delegation chain:

Bootstrap — loads java.base (rt.jar in old days).
Platform: loads platform modules not defined to the bootstrap loader (e.g. java.sql).
App — loads your classpath.

Delegation rule: every loader asks its parent first. This is why you cannot shadow java.lang.String with your own.

Interview framing

“What happens when I call a Java method?”

The bytecode invokevirtual looks up the method via the receiver’s class vtable, the interpreter executes it, profile data accumulates, and after a threshold the JIT compiles a specialized version. Subsequent calls jump to native code.

2. Memory Regions

Region	Per-thread?	Holds	GC?
Heap (Young + Old)	No	All Java objects	Yes
Metaspace	No	Class metadata, method bytecode	Yes (rare)
JVM Stack	Yes	Frames: locals, operand stack, return	No (LIFO)
PC Register	Yes	Current bytecode index	No
Native Stack	Yes	C stack for JNI / runtime	No
Code Cache	No	JIT-compiled native code	Evicted

Heap structure under generational collectors (G1, Parallel, Serial):

Young Generation: Eden + Survivor 0 + Survivor 1
Old Generation:   tenured objects

Allocation goes to Eden (bump-pointer in a thread-local allocation buffer — TLAB — so it’s lock-free). When Eden fills, a minor GC moves live objects to a Survivor space. Objects that survive enough minor GCs are promoted to Old.

// Each call allocates in Eden — extremely fast (just bump a pointer in TLAB).
List<Integer> tmp = new ArrayList<>();

Common JVM flags

-Xms2g -Xmx2g                  # initial / max heap
-XX:MetaspaceSize=256m         # metaspace size that triggers the first metaspace GC (cap with -XX:MaxMetaspaceSize)
-XX:+UseG1GC                   # default since 9
-XX:MaxGCPauseMillis=200       # G1 pause target
-XX:+HeapDumpOnOutOfMemoryError

Interview framing

“Where does new ArrayList<>() live?”

In Eden, on the heap. The reference variable lives in the current frame on the JVM stack. Allocation is bump-pointer in a TLAB; collection is generational copying.

3. Garbage Collectors

Java has many GCs. Know these five:

GC	When	Pause	Trade-off
Serial	Tiny heaps / single-CPU	Stop-the-world	Simplest, smallest
Parallel (Throughput)	Batch jobs	STW, multi-thread	Max throughput, ignores pause
G1 (default)	General server	Soft target ms-scale	Balances throughput + pause
ZGC	Low-latency services	Sub-ms	Concurrent, region-based; generational since Java 21
Shenandoah	Low-latency services (Red Hat’s ZGC analog)	Sub-ms	Concurrent, region-based

G1 in 90 seconds

Heap is split into ~2000 equal-size regions of 1–32 MB. Regions are tagged Eden / Survivor / Old / Humongous. G1 maintains a remembered set per region tracking incoming references so it can collect a subset of regions (“the collection set”) without scanning the whole heap. Pauses are bounded by MaxGCPauseMillis; G1 picks regions to maximize freed space within the budget.

Young collection:  evacuate Eden + Survivor → new Survivor / Old
Mixed collection:  young + selected old regions
Concurrent mark:   tracks live objects in Old without stopping the app
Full GC:           fallback STW; means you've misconfigured

A Full GC in G1 is a sign of trouble — usually too small a heap, humongous-allocation churn, or metaspace pressure.

ZGC / Shenandoah

Both are concurrent and region-based. ZGC keeps GC state in high bits of each reference (colored pointers); Shenandoah relies on load-reference barriers and forwarding pointers. Pause times stay sub-ms regardless of heap size (TB-scale heaps tested). Trade-off: higher CPU use and somewhat lower throughput; Generational ZGC (Java 21) narrows the throughput gap.

Tuning lever, not algorithm

In an interview: say which collector and why, not “I tuned the GC.” If you tuned, name the flag and the metric you watched.

4. Object Model — Headers, Autoboxing, Integer Cache

Every Java object has a header before its fields (12 bytes with compressed class pointers, 16 without, on 64-bit HotSpot), and the total is padded to a multiple of 8 bytes. An int field costs 4 bytes; an Integer reference costs 4 bytes (compressed oops) plus the boxed object itself (≈16 bytes).

// Roughly:
// int[1_000_000]     ≈  4 MB
// Integer[1_000_000] ≈ 20 MB (4 MB array + ~16 MB of boxed Integers)

Autoboxing

Java silently converts int ↔ Integer. Each unbox can throw NullPointerException if the wrapper is null.

Integer x = null;
int y = x;  // NPE — boxed → primitive deref

Integer cache

Integer.valueOf(i) caches -128..127 (and the cache upper bound is tunable via -XX:AutoBoxCacheMax). This produces the most-asked Java gotcha in history:

Integer a = 100, b = 100;
System.out.println(a == b);   // true — cached, same object
Integer c = 200, d = 200;
System.out.println(c == d);   // false — new objects, == compares references
System.out.println(c.equals(d)); // true

Always use .equals() for boxed numbers. == on object references checks identity.

Interview framing

“Why does Integer == Integer sometimes work and sometimes not?”

Integer.valueOf caches small values; large values create new objects; == is reference identity. Use .equals (or .intValue() ==).
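A self-contained sketch of the gotcha, assuming the default cache range (-128..127, i.e. no `-XX:AutoBoxCacheMax`). Autoboxing an `int` calls `Integer.valueOf`, so two boxes of a small value are the same cached object, while two boxes of 200 are distinct objects.

```java
// Shows why == on boxed Integers depends on the valueOf cache.
class IntegerCacheDemo {
    // Boxes the same int twice and reports whether both boxes are the same object.
    static boolean sameReference(int value) {
        Integer a = value;   // autoboxing: Integer.valueOf(value)
        Integer b = value;
        return a == b;       // reference identity, not numeric equality
    }

    static boolean sameValue(int value) {
        Integer a = value;
        Integer b = value;
        return a.equals(b);  // numeric equality
    }

    public static void main(String[] args) {
        System.out.println(sameReference(100)); // true  (cached)
        System.out.println(sameReference(200)); // false (two distinct objects)
        System.out.println(sameValue(200));     // true
    }
}
```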

5. Primitives vs Wrappers

Primitive	Bits	Wrapper	Default
`boolean`	1 (impl-defined)	`Boolean`	`false`
`byte`	8	`Byte`	`0`
`short`	16	`Short`	`0`
`char`	16	`Character`	`'\u0000'`
`int`	32	`Integer`	`0`
`long`	64	`Long`	`0L`
`float`	32	`Float`	`0.0f`
`double`	64	`Double`	`0.0d`

Generics cannot use primitives → List<int> is illegal. Use List<Integer> (slow, boxed), primitive streams (IntStream), or specialized libraries (Eclipse Collections, fastutil) for hot paths.

// Hot loop on primitives — JIT loves this.
long sum = 0;
for (int i : intArray) sum += i;

// Same loop on Integer: unboxing on every read, plus GC pressure from the boxes.
long boxedSum = 0;
for (Integer i : integerList) boxedSum += i;

Project Valhalla (preview) introduces value classes that erase the wrapper overhead. Not yet shippable; mention only if the interviewer raises it.

Overflow

Integer arithmetic wraps silently:

int x = Integer.MAX_VALUE + 1;  // -2147483648 — no exception
Math.addExact(Integer.MAX_VALUE, 1);  // throws ArithmeticException

In interviews involving sums, products, or mid = (lo + hi) / 2, always consider overflow and prefer mid = lo + (hi - lo) / 2.
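Here is the overflow-safe midpoint in context: a lower-bound binary search over a half-open window `[lo, hi)`. `BinarySearch.lowerBound` is a hypothetical name for this sketch. With `lo = 2_000_000_000` and `hi = 2_100_000_000`, `(lo + hi) / 2` overflows to a negative number, while `lo + (hi - lo) / 2` gives 2_050_000_000.

```java
// Lower-bound binary search on a sorted int array.
class BinarySearch {
    // Returns the first index i with a[i] >= target, or a.length if there is none.
    static int lowerBound(int[] a, int target) {
        int lo = 0, hi = a.length;          // search window is [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;   // overflow-safe midpoint
            if (a[mid] < target) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```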

6. Collections Framework

Interface	Implementations	Note
`List`	`ArrayList`, `LinkedList`	Use ArrayList by default
`Set`	`HashSet`, `LinkedHashSet`, `TreeSet`	LinkedHashSet preserves insertion order
`Map`	`HashMap`, `LinkedHashMap`, `TreeMap`, `ConcurrentHashMap`	TreeMap is a red-black tree
`Queue` / `Deque`	`ArrayDeque`, `LinkedList`, `PriorityQueue`	ArrayDeque > LinkedList for stacks/queues

`ArrayList`

Backed by an Object[]. Growth is 1.5× ((oldCap >> 1) + oldCap). Append amortized O(1). add(0, x) is O(N).

ArrayList<Integer> l = new ArrayList<>(1_000_000); // pre-size to avoid resizes

`HashMap`

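Separate chaining: each bucket holds a linked list of entries. Treeification (Java 8+): when an entry is added to a bucket that already holds 8 entries and the table has at least 64 buckets, the bucket converts to a red-black tree (a smaller table resizes instead); it converts back to a list at ≤ 6 entries.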

// Worst-case lookup pre-Java-8: O(N). Post-Java-8: O(log N) when the colliding keys are Comparable.
HashMap<String, Integer> m = new HashMap<>();

Default load factor 0.75, default capacity 16. put triggers resize() (allocate new table, rehash all entries) when size > capacity * loadFactor.

Hash function mixes the user hashCode() with (h ^ (h >>> 16)) to defend against weak hashes.

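import java.util.Objects;
// hashCode contract: equal objects → equal hashCodes. Overriding equals alone breaks HashMap.
final class Key {
    private final String name; private final int id;
    Key(String name, int id) { this.name = name; this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof Key k && id == k.id && Objects.equals(name, k.name);
    }
    @Override public int hashCode() { return Objects.hash(name, id); } // must agree with equals
}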

`LinkedHashMap`

HashMap + doubly-linked list across entries. Iteration order = insertion (or access, with accessOrder=true). The 5-line LRU cache:

class LRU<K, V> extends LinkedHashMap<K, V> {
    private final int cap;
    LRU(int cap) { super(cap, 0.75f, true); this.cap = cap; }
    @Override protected boolean removeEldestEntry(Map.Entry<K, V> e) {
        return size() > cap;
    }
}

`TreeMap`

Red-black tree → all ops O(log N), supports firstKey, floorKey, ceilingKey, subMap — irreplaceable for ordered queries.
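A typical ordered query is range lookup: map the start of each band to a label, then use `floorEntry` to find the band a value falls in. The grading bands below are made-up data for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Range lookup with TreeMap.floorEntry: the greatest key <= the query.
class Grades {
    private static final TreeMap<Integer, String> BANDS = new TreeMap<>(Map.of(
            0, "F", 60, "D", 70, "C", 80, "B", 90, "A"));

    static String grade(int score) {
        Map.Entry<Integer, String> e = BANDS.floorEntry(score); // O(log N)
        return e == null ? "invalid" : e.getValue();            // null when score < 0
    }
}
```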

`PriorityQueue`

Binary min-heap on an array. add / poll O(log N), peek O(1). Iteration order is not sorted — only the head is.
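The standard interview pattern is top-k in O(N log k). Keep a min-heap of at most k elements; its head is the smallest of the current top k, so evict it whenever the heap grows past k. `TopK.largest` is a hypothetical name. The final explicit sort is needed because iterating the heap does not give sorted order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Top-k largest values using a bounded min-heap.
class TopK {
    static List<Integer> largest(int[] nums, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // natural order = min-heap
        for (int x : nums) {
            heap.offer(x);
            if (heap.size() > k) heap.poll();                // drop the smallest
        }
        List<Integer> result = new ArrayList<>(heap);        // heap iteration is unsorted
        result.sort(Collections.reverseOrder());             // largest first
        return result;
    }
}
```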

`ConcurrentHashMap` (Java 8+)

Lock-free reads, fine-grained synchronization on writes (CAS into an empty bucket, otherwise synchronized on the bucket’s first node). Replaces Hashtable (legacy, one lock for the whole table) and Collections.synchronizedMap (also one big lock).

7. Concurrency — `synchronized`, `ReentrantLock`, Atomics, CAS

`synchronized`

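Reentrant intrinsic (monitor) lock. HotSpot implements it through the object header’s mark word: an uncontended lock stays lightweight, and contention inflates it to a full monitor. Biased locking was disabled by default in JDK 15 and removed in JDK 18.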

synchronized (lock) {
    // critical section
}

public synchronized void f() { ... }   // same as synchronized(this)
public static synchronized void g() {} // synchronized on the Class object

`ReentrantLock`

java.util.concurrent.locks.Lock. Explicit lock/unlock, supports tryLock, lockInterruptibly, fairness, multiple condition variables.

Lock lock = new ReentrantLock();
lock.lock();
try { /* CS */ } finally { lock.unlock(); }

Pick ReentrantLock when you need timeouts, fairness, or multiple Conditions. Otherwise, synchronized is shorter and the JIT optimizes it well.

Atomics

AtomicInteger, AtomicLong, AtomicReference use CAS (Compare-And-Swap) on hardware. Lock-free, lower overhead than locks for single-variable updates.

AtomicInteger counter = new AtomicInteger();
counter.incrementAndGet();   // lock-free
counter.compareAndSet(0, 1); // CAS primitive

For high-contention counters, prefer LongAdder — it stripes the counter across cells to reduce CAS contention.
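A small experiment makes the difference concrete. Several threads bump three counters: a plain `int` (racy read-modify-write), an `AtomicInteger` (CAS), and a `LongAdder` (striped cells, summed on read). The class name, thread count, and iteration count are arbitrary choices for this sketch. The atomic and the adder always reach the exact total; the plain field usually falls short, but that is not guaranteed on any given run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

class Counters {
    static int plain = 0;                                  // unsynchronized: updates can be lost
    static final AtomicInteger atomic = new AtomicInteger();
    static final LongAdder adder = new LongAdder();

    static void run(int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    plain++;                  // read-modify-write race
                    atomic.incrementAndGet(); // lock-free CAS loop
                    adder.increment();        // low-contention striped update
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);   // wait for all increments
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();           // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        run(8, 100_000);
        System.out.println("plain  = " + plain);          // often less than 800000
        System.out.println("atomic = " + atomic.get());   // 800000
        System.out.println("adder  = " + adder.sum());    // 800000
    }
}
```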

`volatile`

Marks a field for the JMM: every read sees the most recent write from any thread, and individual reads and writes are atomic (even for long and double). Compound actions are not: volatile int x; x++; is still a race.

volatile boolean shutdown = false;  // OK as a flag

8. Java Memory Model — Happens-Before

The JMM defines when a write by one thread is visible to another. Without happens-before, the JIT and CPU may reorder, cache, or simply skip your reads.

Happens-before edges:

Program order within a thread.
Monitor lock release ↦ subsequent acquire of the same monitor.
volatile write ↦ subsequent volatile read of the same variable.
Thread.start() ↦ first action of the started thread.
Thread’s last action ↦ Thread.join() return.
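Constructor’s final-field writes ↦ any thread that later obtains a reference to the object, provided this didn’t escape during construction (formally a freeze guarantee rather than a plain happens-before edge).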
Transitivity: A→B and B→C ⇒ A→C.

// Classic publication bug — without `volatile`, another thread may see
// `instance != null` but read uninitialized fields.
class Singleton {
    private static volatile Singleton instance;
    public static Singleton get() {
        Singleton s = instance;
        if (s == null) {
            synchronized (Singleton.class) {
                s = instance;
                if (s == null) instance = s = new Singleton();
            }
        }
        return s;
    }
}

Interview framing

“What does volatile give me?”

Visibility (no caching) and ordering (no reorder across the access). Not atomicity. Not mutual exclusion.

9. Executors and Thread Pools

new Thread(...) is a code smell — never spin OS threads ad-hoc.

ExecutorService pool = Executors.newFixedThreadPool(8);
Future<Integer> f = pool.submit(() -> compute());
Integer result = f.get();
pool.shutdown();

Built-in factories (and their traps)

Factory	Backing queue	Trap
`newFixedThreadPool`	unbounded `LinkedBlockingQueue`	Submitter overload → OOM
`newCachedThreadPool`	`SynchronousQueue`	Unbounded thread count
`newSingleThreadExecutor`	unbounded queue	Same OOM
`newScheduledThreadPool`	`DelayedWorkQueue`	OK

Production pattern: construct ThreadPoolExecutor directly with bounded queue + named thread factory + sensible rejection policy.

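import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger seq = new AtomicInteger();
ThreadFactory named = r -> new Thread(r, "worker-" + seq.incrementAndGet());  // named threads
ThreadPoolExecutor pool = new ThreadPoolExecutor(
    8, 16, 60, TimeUnit.SECONDS,                 // core, max, keep-alive for extra threads
    new ArrayBlockingQueue<>(1000),              // bounded queue
    named,
    new ThreadPoolExecutor.CallerRunsPolicy());  // when full, the submitter runs the task (backpressure)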

`ForkJoinPool`

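Work-stealing pool used by parallel streams and by CompletableFuture’s *Async methods when no executor is given. Each worker pushes and pops tasks at one end of its own deque; idle workers steal from the opposite end of other workers’ deques, taking the oldest (usually largest) tasks. Optimized for divide-and-conquer.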
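The divide-and-conquer shape that ForkJoinPool is built for looks like this sketch of a parallel array sum. `ParallelSum` and the `THRESHOLD` value are illustrative choices. Below the threshold it sums sequentially; above it, it `fork()`s the left half (pushing it onto this worker’s deque, where an idle worker can steal it) and computes the right half on the current thread.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer sum of data[from, to) on a ForkJoinPool.
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // arbitrary cut-off for illustration
    private final long[] data;
    private final int from, to;

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {                 // small enough: sum sequentially
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = from + (to - from) / 2;
        ParallelSum left = new ParallelSum(data, from, mid);
        left.fork();                                  // schedule the left half asynchronously
        long right = new ParallelSum(data, mid, to).compute(); // do the right half here
        return left.join() + right;                   // wait for the forked half
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }
}
```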

10. CompletableFuture

A composable async-result type. It implements Future but adds non-blocking composition; a plain Future offers only a blocking get.

CompletableFuture<String> f =
    CompletableFuture.supplyAsync(() -> fetch())
        .thenApply(String::trim)
        .thenCompose(s -> CompletableFuture.supplyAsync(() -> enrich(s)))
        .exceptionally(ex -> "fallback");

Combinators

Method	Purpose
`thenApply`	map (sync transform)
`thenCompose`	flatMap (chain another future)
`thenCombine`	zip two futures
`allOf` / `anyOf`	combine many
`exceptionally` / `handle`	error handling
`orTimeout` (Java 9+)	bound completion

Default executor for the *Async variants is ForkJoinPool.commonPool(). Use a dedicated executor for IO-bound work — the common pool’s parallelism is cpus - 1 and you’ll starve compute.
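A sketch of that advice: two independent blocking lookups run on a dedicated pool passed to `supplyAsync`, then get zipped with `thenCombine`. `fetchUser` and `fetchOrders` are hypothetical stand-ins for blocking I/O (they just sleep). The pool is created per call only to keep the sketch self-contained; a real service would share one long-lived I/O executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncLookup {
    static String fetchUser(int id) { sleep(50); return "user" + id; }   // hypothetical blocking call
    static int fetchOrders(int id) { sleep(50); return id * 10; }        // hypothetical blocking call

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    static String summary(int id) {
        ExecutorService io = Executors.newFixedThreadPool(4); // sized for blocking work, not CPU count
        try {
            CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> fetchUser(id), io);
            CompletableFuture<Integer> orders = CompletableFuture.supplyAsync(() -> fetchOrders(id), io);
            return user.thenCombine(orders, (u, n) -> u + " has " + n + " orders")
                       .exceptionally(ex -> "lookup failed: " + ex.getMessage())
                       .join();                                // block only at the edge
        } finally {
            io.shutdown();
        }
    }
}
```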

11. Generics and Type Erasure

Generics are a compile-time feature. The runtime sees raw types: List<String> becomes List. Type checks insert checkcast instructions.

List<String> a = new ArrayList<>();
List<Integer> b = new ArrayList<>();
a.getClass() == b.getClass();  // true — both ArrayList

Consequences

Cannot do new T() (no class token) — pass Class<T> or Supplier<T>.
Cannot do new T[n] — use (T[]) Array.newInstance(cls, n).
Cannot overload by erased signature: void f(List<String>) and void f(List<Integer>) collide.
Heap pollution: unchecked casts can hide type errors until use.

PECS

Producer Extends, Consumer Super.

void copy(List<? extends Number> src, List<? super Number> dst) { ... }

? extends T lets you read T’s. ? super T lets you write T’s. Memorize this; it’s asked.
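In practice, PECS lets one generic method serve many type pairs. The `copy` signature below mirrors the JDK’s `Collections.copy(List<? super T> dest, List<? extends T> src)`, but it appends instead of overwriting. `Pecs` is a hypothetical class for this sketch.

```java
import java.util.List;

class Pecs {
    // src only produces T's (extends); dst only consumes them (super).
    static <T> void copy(List<? extends T> src, List<? super T> dst) {
        for (T item : src) dst.add(item); // read T from src, write T into dst
    }

    static double sum(List<? extends Number> nums) {
        double total = 0;
        for (Number n : nums) total += n.doubleValue(); // reading as Number is safe
        // nums.add(1);  // would not compile: the exact element type is unknown
        return total;
    }
}
```

With this signature, `copy(ints, numbers)` and `copy(ints, objects)` both compile for a `List<Integer>` source.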

12. String Pool, `intern()`, Encoding

String is immutable. String literals are interned in a JVM-wide string pool (which lived in PermGen until Java 7 moved it to the heap).

String a = "hello";
String b = "hello";
a == b;          // true — both reference the pooled string
String c = new String("hello");
a == c;          // false
a == c.intern(); // true

Use String.intern() rarely — it’s a global side effect with non-trivial cost.

Compact Strings (Java 9+)

A String is a byte[] plus a coder byte: LATIN1 (1 byte/char) or UTF16 (2 bytes/char). A pure-ASCII string halves its memory vs Java 8.

Concatenation

a + b + c compiles, since Java 9, to an invokedynamic calling a small generated method via StringConcatFactory. Fast.

A for loop with s = s + c is still O(N²) despite that optimization, because every iteration copies the entire accumulated string into a new String. Use StringBuilder:

StringBuilder sb = new StringBuilder(n);
for (char c : data) sb.append(c);
return sb.toString();

StringBuffer is StringBuilder + synchronization. You almost never want StringBuffer.

13. `equals` / `hashCode` Contract

1. Reflexive:    x.equals(x) == true
2. Symmetric:    x.equals(y) ⇔ y.equals(x)
3. Transitive:   x.equals(y) && y.equals(z) ⇒ x.equals(z)
4. Consistent:   repeated calls with no mutation return the same result
5. x.equals(null) == false
6. equals  ⇒  hashCode equal      (NOT the converse)

Break #6 and HashMap silently loses your entries.

record Point(int x, int y) {}  // record auto-generates correct equals/hashCode

For non-record classes: Objects.equals and Objects.hash are your friends. IDE-generated implementations are fine; hand-rolled ones are usually wrong on edge cases (null fields, inheritance, NaN doubles).

Inheritance trap

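You cannot extend an instantiable class, add a value field, and keep equals symmetric and transitive. Mark the class final, or prefer composition (wrap the base object instead of subclassing it). A getClass() check instead of instanceof restores symmetry but breaks Liskov substitution.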

14. Exception Design

Three families:

Checked (Exception subclasses, except RuntimeException) — must be declared / caught.
Unchecked (RuntimeException subclasses) — programmer errors, callers may ignore.
Error — JVM problems (OutOfMemoryError, StackOverflowError). Don’t catch.

Modern Java APIs lean unchecked because checked exceptions don’t compose with lambdas / streams.

list.stream().map(this::parse)  // parse throws IOException → won't compile

Workarounds: catch inside the lambda and rethrow as an unchecked exception (e.g. UncheckedIOException), or use a checked-exception-friendly stream library.

`try-with-resources`

try (var in = Files.newInputStream(path);
     var gz = new GZIPInputStream(in)) {
    ...
} // both closed in reverse order, even on exception

Resources must implement AutoCloseable. If the body throws and close() then throws as well, the close() exception is attached to the original via addSuppressed and can be read back with getSuppressed().

15. Streams

Lazy, pull-based pipelines.

int total = orders.stream()
    .filter(o -> o.year() == 2025)
    .mapToInt(Order::total)
    .sum();

Lifecycle

Source — collection.stream(), Stream.of(...), Stream.generate(...), Files.lines(...).
Intermediate ops (lazy) — filter, map, flatMap, sorted, distinct, limit, skip.
Terminal op (eager) — forEach, collect, reduce, count, findFirst, toList() (Java 16+).

Pitfalls

A stream is single-use. Re-collecting fails with IllegalStateException.
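No checked exceptions: the standard functional interfaces don’t declare any, so lambdas passed to map, filter, etc. can’t throw them.
Stateful intermediate ops cost memory: sorted buffers the whole stream before emitting anything, and distinct remembers every element it has seen. Never call sorted on an infinite stream.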
parallel() uses ForkJoinPool.commonPool — only worth it for CPU-heavy ops on large data with no shared state.

list.stream().parallel().mapToInt(...)... // measure first

16. Common Interview Gotchas

`==` vs `.equals()`

Always discussed alongside the Integer cache. Use .equals for objects, == only for primitives or true identity checks.

Integer overflow

Integer.MAX_VALUE + 1            // -2147483648
(long)Integer.MAX_VALUE + 1      // 2147483648 — promote first
Math.addExact(a, b)              // throws on overflow

Floating-point equality

0.1 + 0.2 == 0.3                 // false
Math.abs(a - b) < 1e-9           // OK
Double.compare(a, b) == 0        // handles NaN consistently

`String.split` regex

"a.b".split(".") returns [] because . is regex “any char.” Use "\\." or Pattern.quote(".").

Modifying a collection during iteration

Throws ConcurrentModificationException — even single-threaded. Use Iterator.remove() or removeIf.

`Arrays.asList(int[])`

Returns a List<int[]> of length 1. Use Arrays.stream(arr).boxed().toList() or IntStream.

Switch fall-through

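Classic switch falls through. Arrow-form switch (->) does not, and a switch expression must be exhaustive: over enums and sealed types the compiler checks coverage itself, so listing every constant or subtype makes default unnecessary.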

String s = switch (day) {
    case MON, TUE -> "weekday";
    case SAT, SUN -> "weekend";
    default -> "?";
};

17. Records, Sealed Classes, Pattern Matching (Java 16–21)

Records

record Point(int x, int y) {
    static Point origin() { return new Point(0, 0); }
}

Records are transparent, shallowly immutable carriers: auto-generated canonical constructor, accessors, equals/hashCode/toString. Implicitly final. They cannot extend a class (they implicitly extend java.lang.Record) but can implement interfaces.

Sealed classes

sealed interface Shape permits Circle, Rect {}
record Circle(double r) implements Shape {}
record Rect(double w, double h) implements Shape {}

Sealed types restrict the set of permitted subclasses. Combined with pattern matching, this gives exhaustive switching:

double area = switch (shape) {
    case Circle c -> Math.PI * c.r() * c.r();
    case Rect r   -> r.w() * r.h();
}; // no default needed — compiler knows the universe

Pattern matching for `instanceof`

if (obj instanceof String s && s.length() > 3) {
    use(s);
}

Modern Java is much terser than Java 8. If your interviewer is on JDK 21, leverage records + sealed + pattern switch — it shows fluency.

18. Project Loom — Virtual Threads (Java 21+)

A virtual thread is a JVM-managed lightweight thread that runs on top of a small pool of OS carrier threads. When it blocks on I/O, the JDK unmounts it from its carrier, which is then free to run other virtual threads.

try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
    for (int i = 0; i < 100_000; i++) {
        exec.submit(() -> {
            try (var sock = new Socket("h", 80)) { ... }
        });
    }
}

Use cases: thread-per-request servers without thread-pool sizing pain. Not faster for CPU-bound work — same number of CPUs. The win is scalability of blocking IO.

Sharp edges

Pinning: in JDK 21–23, a virtual thread that blocks inside synchronized cannot be unmounted from its carrier. Use ReentrantLock if you’ll block while holding the lock. (JDK 24 removed this limitation, JEP 491.)
ThreadLocal still works, but with millions of virtual threads the per-thread copies get expensive. Prefer ScopedValue (a preview API in JDK 21).
Native code (JNI) also pins the carrier.

Interview framing

“When would you use virtual threads instead of an executor?”

Massively concurrent IO — thousands+ of in-flight blocking calls — where you’d otherwise need async/reactive code. CPU-bound work still wants a fixed pool sized to cores.

19. Performance Hot Tips

Pre-size collections (new ArrayList<>(n), new HashMap<>(n*4/3+1)) to avoid resize churn.
Primitive arrays typically beat boxed lists by several times in tight loops (no unboxing, no pointer chasing, better cache locality).
StringBuilder over += in loops. (See §12.)
Reuse large objects in hot paths where it is safe; pool expensive buffers (e.g. direct ByteBuffers).
Avoid Stream in micro-loops: pipeline setup and lambda dispatch dominate. Streams shine on big pipelines, not 5-element ones.
Escape analysis lets HotSpot scalar-replace short-lived objects, eliminating the allocation entirely. You don’t tune this; you write code whose objects don’t escape (no leaking this, no storing in fields).
final on locals and methods doesn’t make code faster (the JIT works that out itself), but it documents intent, and final fields carry initialization-safety guarantees under the JMM.
-XX:+UseLargePages on Linux for big heaps.
Profile with async-profiler (sampling, low-overhead) or JFR (built-in, low-overhead). Avoid printf-debugging perf.

# async-profiler: 30-second CPU profile, written as an HTML flame graph.
asprof -d 30 -f flame.html <pid>

20. JMH — Java Microbenchmark Harness

You cannot benchmark Java with System.nanoTime() around a loop. The JIT will hoist invariants, dead-code-eliminate unused results, and warm up partway through your “measurement.”

JMH handles all of that:

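import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class SumBench {
    int[] data;

    @Setup
    public void setup() {
        data = new Random(42).ints(1_000).toArray();  // fixed-seed input, built outside the timed region
    }

    @Benchmark
    public int sumLoop() {
        int s = 0;
        for (int x : data) s += x;
        return s;  // returning the result prevents dead-code elimination
    }
}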

Key ideas:

Warmup iterations before measurement.
@State holds inputs (avoids constant-folding).
Return values prevent dead-code elimination (or use Blackhole.consume).
Forks isolate JIT state across benchmarks.

When an interviewer asks “is X faster than Y?” the senior answer is: “I’d write a JMH benchmark — but my prediction is Y because [allocation / branch / cache reason].” Predict, then measure.

What To Memorize Cold

JVM = bytecode interpreter + tiered C1/C2 JIT. G1 is default GC. Heap = young (Eden + S0 + S1) + old.
Integer cache -128..127. Always .equals() for boxed numbers.
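int overflow wraps; use Math.*Exact, and compute midpoints as lo + (hi - lo) / 2 rather than (lo + hi) / 2.
Since Java 8, HashMap treeifies overflowing buckets into red-black trees, so worst-case lookup is O(log N) (best when keys are Comparable).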
volatile = visibility + ordering, not atomicity.
JMM happens-before: unlock → later lock, volatile write → later read, Thread.start/join, plus final-field guarantees for properly constructed objects.
Generics erase. No new T[], no overload by erased signature.
String is immutable. Compact strings since 9. Use StringBuilder in loops.
equals and hashCode go together; record does it for you.
Records are final immutable carriers. Sealed + pattern match = exhaustive switch.
Virtual threads scale blocking IO; they don’t help CPU.
JMH for benchmarks. nanoTime lies.

If any of those is shaky, re-read the section, write the smallest program that demonstrates it, and watch it misbehave on purpose.

Go Runtime Deep Dive

Target audience: candidates interviewing in Go for infrastructure, distributed systems, cloud-native (Kubernetes / Docker / etcd ecosystems), or any backend role where the interviewer asks “explain how goroutines actually work.”

Scope: gc (the standard Go compiler) on Go 1.22+. gccgo and TinyGo are mentioned only where they change interview-grade answers.

Go’s surface looks small. The runtime is not. The interview gap appears immediately when an interviewer asks “what’s the difference between a goroutine and an OS thread?”, “what does nil != nil even mean?”, or “why does this loop variable do that?”. This guide trains the answers.

1. Runtime Overview — M:N Scheduling

Go runs your code under a runtime linked into every binary. The runtime owns:

The goroutine scheduler (M:N — many goroutines onto few OS threads).
The garbage collector (concurrent tri-color mark-sweep).
The memory allocator (TCMalloc-derived, per-P caches).
Channel and sync primitives, network poller, timers, profilers.

A “goroutine” is not an OS thread. It’s a small (~2 KB initial stack) task scheduled by the Go runtime and multiplexed onto a pool of OS threads. The runtime routinely runs hundreds of thousands of goroutines on a handful of threads.

// A million goroutines is feasible: ~2 KB of initial stack each, roughly 2 GB in total.
for i := 0; i < 1_000_000; i++ {
    go func() { /* ... */ }()
}

This is feasible because each goroutine starts with ~2 KB of stack (vs the 1–8 MB typically reserved for an OS thread’s stack, depending on platform) and the stack grows only as needed.

Stack growth

Goroutine stacks were segmented (split) before Go 1.3; since 1.3 they are contiguous and grow by copying: when a goroutine outgrows its stack, the runtime allocates a larger one, copies all frames over, and adjusts every pointer into the old stack. Escape analysis guarantees no heap object points into a goroutine stack, so that adjustment is complete, which is why taking the address of a stack-allocated variable stays safe even when the stack moves.

2. The GMP Scheduler

Three runtime objects:

Letter	Stands for	What it is
G	Goroutine	A goroutine: stack + program counter + status
M	Machine	An OS thread
P	Processor	A logical scheduler context; holds a runnable G queue

Number of P’s = GOMAXPROCS (default: number of CPUs). Each P has a local runnable queue. M’s bind to a P to execute G’s; an M without a P cannot run Go code.

    P0 [G G G G ...]    P1 [G G ...]   P2 [G G G G G ...]
     │                   │                │
     M0                  M1               M2     (OS threads)

Steal work

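When a P’s local queue is empty, it checks the global run queue and the network poller, then tries to steal half of the runnable G’s from another randomly chosen P. This keeps cores busy without a single global lock on the fast path.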

What happens on a blocking syscall

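The M making the syscall blocks in the kernel, and its P is handed off to another M (created if needed) that keeps scheduling: immediately for calls known to block, or once the runtime’s background monitor (sysmon) notices the syscall running long. When the syscall returns, the original M tries to reacquire a P; if none is free, it puts the G on the global run queue and the M goes idle.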

This is why read(fd, ...) on a regular file blocks an OS thread but does not block your other goroutines — they keep running on other M’s.

Network poller

Network I/O is epoll/kqueue/IOCP under the hood. A goroutine doing conn.Read parks itself, registers with the poller, and another goroutine runs. When the fd is readable, the poller wakes the parked G. No M is consumed while parked. This is why Go handles 100K+ concurrent network connections with plain blocking-style code.

Preemption

Before Go 1.14, goroutines could be preempted only at function prologues (and when they blocked), so a tight CPU loop without function calls could starve others. Since 1.14, asynchronous preemption sends the thread a signal (SIGURG on Unix) and stops the goroutine at almost any instruction.

// Pre-1.14, this could starve everything else; today it's preempted.
go func() { for {} }()
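A minimal sketch to observe async preemption, assuming Go 1.14+ with default settings (running with GODEBUG=asyncpreemptoff=1 would make it hang). It pins the scheduler to a single P, starts a goroutine that spins without blocking, and checks that main still gets the P back. The function name spinDoesNotStarve is illustrative.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinDoesNotStarve runs a busy loop on the only P and returns once main has
// been scheduled again. Without preemption of the spinner it would never return.
func spinDoesNotStarve() bool {
	prev := runtime.GOMAXPROCS(1) // one P: the spinner and main must share it
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	go func() {
		for !stop.Load() { // busy loop with no blocking operations
		}
	}()

	time.Sleep(20 * time.Millisecond) // main parks; the spinner takes the P
	stop.Store(true)                  // reached only because the spinner was preempted
	return true
}

func main() {
	fmt.Println("main rescheduled:", spinDoesNotStarve())
}
```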

Interview framing

“What’s the difference between a goroutine and a thread?”

Goroutine: ~2KB stack, cooperative + signal-preempted, scheduled by Go runtime onto a pool of OS threads. Thread: ~1MB stack, OS-scheduled, costlier context switches. Goroutines are the unit you think about; M’s are an implementation detail.

3. Goroutines vs Threads — Practical Implications

// I/O fanout pattern
results := make(chan Result, len(urls))
for _, u := range urls {
    u := u                   // pre-1.22: required to capture
    go func() {
        results <- fetch(u)
    }()
}
for range urls {
    r := <-results
    process(r)
}

Costs (rough orders of magnitude on modern hardware; measure on yours):

Goroutine creation: ~1µs.
Channel ops: ~50–100ns uncontended; mutexes similar.
Context switch: ~200ns within Go runtime; blocking syscalls add OS thread cost.

Sharp edge: unlike OS threads, goroutines expose no public ID. This is deliberate: it discourages thread-local state. It also breaks naïve ports of Java ThreadLocal idioms; pass a context.Context or explicit parameters instead.

4. Channels — Buffered, Unbuffered, `select`

A channel is a typed bounded queue with built-in synchronization.

Construct	Behavior
`make(chan T)`	Unbuffered: send and recv must rendezvous. Sender blocks until a receiver is ready.
`make(chan T, n)`	Buffered: sender blocks only when buffer is full.
`close(ch)`	Receivers drain remaining values, then get zero values. Send to closed → panic; closing a nil or already-closed channel also panics.

ch := make(chan int, 2)
ch <- 1
ch <- 2
close(ch)
for v := range ch { fmt.Println(v) }   // 1, 2

`select`

Multiplexes channels: blocks until at least one case can proceed, then picks one of the ready cases uniformly at random. It is the building block for fan-in, timeouts, and cancellation.

select {
case v := <-in:
    use(v)
case out <- val:
    // ...
case <-time.After(2 * time.Second):
    return errors.New("timeout")
case <-ctx.Done():
    return ctx.Err()
}

Nil-channel pattern

A nil channel blocks forever. Setting a case’s channel to nil disables it:

var done chan struct{} = nil
// case <-done: never fires

Useful when iterating over multiple channels and “turning off” one as it completes.

Closing semantics

Receivers detect close with v, ok := <-ch; ok is false on closed-and-drained.
Only the sender should close. Closing from the receiver side needs extra coordination, because any later send on the closed channel panics.
Don’t close a channel just to “free” it; let the GC handle that.

5. Sync Primitives

Primitive	Use for
`sync.Mutex`	Mutual exclusion
`sync.RWMutex`	Many readers / few writers (measure: RW often loses to plain `Mutex`)
`sync.Once`	Idempotent one-time init
`sync.WaitGroup`	Wait for N goroutines
`sync.Cond`	Condition variable; rarely needed (channels usually clearer)
`sync/atomic`	CAS, atomic add/load/store on int32/int64/pointer (typed `atomic.Int64` etc. since Go 1.19)
`sync.Map`	Concurrent map, only for write-once/read-many keys or disjoint per-goroutine key sets

var mu sync.Mutex
mu.Lock()
defer mu.Unlock()
// CS

`sync.Map` is not always faster

It’s optimized for two specific patterns:

Keys written once and then read many times (e.g. caches that only grow).
Goroutines reading and writing disjoint sets of keys.

For everything else, a regular map[K]V + sync.Mutex (or shards) is faster and clearer.

`WaitGroup`

var wg sync.WaitGroup
for _, x := range data {
    wg.Add(1)
    go func(x Item) {
        defer wg.Done()
        process(x)
    }(x)
}
wg.Wait()

Trap: call wg.Add in the parent before the go statement, never inside the goroutine. Otherwise wg.Wait can run before Add and return early.

6. Memory Model

Go has a documented memory model (re-articulated in 2022 for clarity). Key rules:

In a race-free program, a read sees the most recent write that happens before it.
Within a goroutine: program order.
Goroutine creation happens before its first instruction.
A send on a channel happens before the corresponding receive completes.
Close of a channel happens before a receive that returns the zero value due to close.
m.Unlock happens before subsequent m.Lock.
sync/atomic: each atomic op is sequentially consistent; pairs ordered by HB.

// Without sync, this is racy.
var data []int
var ready bool

go func() {
    data = makeData()
    ready = true        // RACE — no HB to the reader
}()

for !ready {}           // may loop forever (compiler/CPU can hoist)
use(data)

Fix with channel, mutex, or atomic.
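A minimal sketch of the channel fix, assuming makeData is a stand-in for real work (its body here is hypothetical). Closing a channel happens before a receive that observes the close, so the reader is guaranteed to see data, and go run -race reports nothing:

```go
package main

import "fmt"

// makeData is a placeholder for the real producer.
func makeData() []int { return []int{1, 2, 3} }

// loadAsync publishes data through a channel close instead of a bool flag.
func loadAsync() []int {
	var data []int
	ready := make(chan struct{})
	go func() {
		data = makeData()
		close(ready) // publish: happens before the receive below completes
	}()
	<-ready // blocks instead of busy-looping; no hoisting possible
	return data
}

func main() {
	fmt.Println(loadAsync()) // [1 2 3]
}
```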

Race detector

Always run tests with -race:

go test -race ./...

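It instruments memory accesses and reports data races that actually occur during the run (not just suspicious code), so it only catches races on paths your tests execute. It slows programs several-fold, which is fine for tests. One of Go’s killer features.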

7. Garbage Collector — Concurrent Tri-color Mark-Sweep

Go’s GC is concurrent, non-moving, tri-color mark-sweep with write barriers.

Tri-color: white = not yet reached (garbage if still white when marking ends), grey = reached but its pointers not yet scanned, black = reached and fully scanned.
Write barrier: intercepts pointer writes during mark to maintain invariants while the mutator runs.
Non-moving: objects don’t relocate. Pointers stay stable. (Trade-off: no compaction, more fragmentation.)

Pause time

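Two brief stop-the-world phases (enabling the write barrier, and mark termination) are typically sub-millisecond. Goroutine stacks are scanned one goroutine at a time, and heap marking runs concurrently with your program. No “young generation”: Go’s GC is non-generational.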

Pacing

A cycle starts when the heap has grown by GOGC percent over the live heap left by the previous cycle (GOGC=100, the default, means when it doubles). Lower GOGC for a smaller footprint at the cost of CPU; raise it for less GC work at the cost of memory.

GOGC=200 ./app   # GC less often
GOGC=off ./app   # disable (for benchmarks)

Soft / hard memory limits

runtime/debug.SetMemoryLimit(n) (Go 1.19+, also settable via the GOMEMLIMIT environment variable) sets a soft limit; the GC trades CPU for staying under it. Useful in containers: set it to about 0.9 × the cgroup limit to avoid OOM-kills.

Escape analysis

The compiler decides at compile time whether a value can stay on the stack. If a pointer “escapes” the function, the value is heap-allocated.

func f() *int {
    x := 1
    return &x      // escapes — heap allocation
}

go run -gcflags='-m' main.go
# reports: moved to heap: x

Knowing what allocates lets you avoid GC pressure in hot paths. Stack allocation is essentially free; a small heap allocation costs on the order of tens of nanoseconds, plus later GC work.

8. Slice Internals

A slice is a 3-word struct: (ptr *T, len int, cap int). Slicing does not copy — it’s a view.

a := []int{1, 2, 3, 4, 5}
b := a[1:4]      // [2 3 4], cap=4 (from index 1 to end of underlying array)
b[0] = 99
fmt.Println(a)   // [1 99 3 4 5] — shared backing array!

`append` semantics

b = append(b, 10)   // if len < cap: in place; else allocate new backing array

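Growth: since Go 1.18, capacity doubles up to 256 elements, then the factor tapers smoothly toward 1.25× (before 1.18 the cutoff was 1024); the result is also rounded up to an allocator size class. After a reallocation, the new backing array is independent of any older slice that still points at the old one.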

a := []int{1, 1, 1, 1}    // len 4, cap 4
b := a[:2]
c := append(b, 99)        // overwrites a[2]
fmt.Println(a, c)         // [1 1 99 1] [1 1 99]
d := append(c, 1, 2, 3)   // reallocates; d disjoint from a

This is the slice aliasing gotcha that loses interviews. The fix is to be explicit:

b := append([]int(nil), source...)   // explicit copy

Three-index slice

a[lo:hi:max] caps the new slice’s cap at max - lo. Use it when handing out a slice you don’t want the receiver to extend into your data.
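A runnable contrast using only the standard library (function names are illustrative): the same append either writes into the caller’s array or, with a full slice expression, is forced to reallocate.

```go
package main

import "fmt"

// appendShared appends through a two-index slice: cap(b) is 4, so append
// writes into a's backing array.
func appendShared() (a, c []int) {
	a = []int{1, 1, 1, 1}
	b := a[:2] // len 2, cap 4
	c = append(b, 99)
	return a, c
}

// appendCapped uses a full slice expression: cap(b) is 2, so append must
// allocate a new backing array and a is left alone.
func appendCapped() (a, c []int) {
	a = []int{1, 1, 1, 1}
	b := a[:2:2] // len 2, cap 2 - 0 = 2
	c = append(b, 99)
	return a, c
}

func main() {
	fmt.Println(appendShared()) // [1 1 99 1] [1 1 99]
	fmt.Println(appendCapped()) // [1 1 1 1] [1 1 99]
}
```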

9. Map Internals

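map[K]V is a hash table. Through Go 1.23 it used bucket chaining (each bucket holds 8 entries, then chains overflow buckets); Go 1.24 replaced the implementation with a Swiss-table design. Either way, the hash seed is randomized per map (defending against hash-flooding attacks), and iteration order is randomized as well.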

m := make(map[string]int, 1000)  // pre-size to avoid grows
m["a"] = 1
delete(m, "a")
v, ok := m["a"]

Iteration order is randomized

Iteration order is unspecified and deliberately randomized, so two for k := range m loops can produce different orders even within one run. Don’t depend on it.

for k, v := range m {
    // unspecified order
}

Concurrent access

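A plain map is safe for concurrent reads but not for concurrent reads and writes. The runtime detects some conflicting accesses on a best-effort basis and aborts with an unrecoverable fatal error (not a panic you can recover); the race detector (-race) reports the rest. Use a mutex (sync.Mutex / sync.RWMutex) or sync.Map (with caveats from §5).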

fatal error: concurrent map writes

`nil` map

A nil map can be read (returns zero) but not written. A common bug:

var m map[string]int
m["a"] = 1   // PANIC

Use m := map[string]int{} or make(map[string]int).

Complexity

Op	Avg	Worst
`m[k]`	O(1)	O(N) under collisions
`m[k] = v`	O(1) amortized	O(N) under collisions
`delete(m, k)`	O(1)	O(N) under collisions
`range m`	O(N)	O(N)

Maps never shrink: deleting most keys does not release the bucket memory. Copy the survivors into a fresh map if you care.
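A minimal sketch of the re-create step; compact is a hypothetical helper, generic over key and value types. Once nothing references the old map, its oversized storage can be garbage-collected.

```go
package main

import "fmt"

// compact copies the surviving entries into a fresh map sized for them.
func compact[K comparable, V any](m map[K]V) map[K]V {
	out := make(map[K]V, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func main() {
	m := make(map[int]int)
	for i := 0; i < 100_000; i++ {
		m[i] = i
	}
	for i := 10; i < 100_000; i++ {
		delete(m, i) // len drops to 10, but the storage stays allocated
	}
	m = compact(m) // the old map is now unreachable
	fmt.Println(len(m)) // 10
}
```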

10. Strings — Bytes vs Runes

A string is an immutable sequence of bytes (conventionally UTF-8). It stores no rune count; indexing returns bytes.

s := "héllo"
len(s)     // 6 — UTF-8 bytes (é is 2)
s[0]       // 'h' (a byte)
s[1]       // first byte of é, NOT é

Iterate with range to get runes (decoded code points):

for i, r := range s {
    // i: byte index, r: rune (int32 code point)
}

To get rune count: utf8.RuneCountInString(s).

`[]byte` ↔ `string` conversion

Both directions copy by default (so the immutability invariant holds).

b := []byte(s)      // copy
s2 := string(b)     // copy

Hot paths can use unsafe.String / unsafe.StringData (Go 1.20+, paired with unsafe.Slice for string → []byte) for zero-copy conversion, but it’s a footgun: only do it if you can prove the underlying bytes won’t be mutated.

String concat

A single expression a + b + c is one allocation, but s += p in a loop allocates and copies the whole string on every iteration: O(N²) total. Use strings.Builder:

var sb strings.Builder
for _, p := range parts {
    sb.WriteString(p)
}
return sb.String()

strings.Builder grows a single []byte, and String() returns it without a final copy (it uses unsafe internally).
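When the total length is cheap to compute, Grow can reserve it up front so the builder allocates exactly once. A minimal sketch; joinAll is an illustrative name, and strings.Join already does this pre-sizing for the simple case:

```go
package main

import (
	"fmt"
	"strings"
)

// joinAll concatenates parts with a single up-front allocation.
func joinAll(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var sb strings.Builder
	sb.Grow(n) // reserve the exact total; WriteString never reallocates
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	fmt.Println(joinAll([]string{"go", "ph", "er"})) // gopher
}
```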

11. Interfaces — `itab`, the `nil != nil` Trap

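A non-empty interface value is two words: (tab *itab, data unsafe.Pointer). The itab holds the dynamic type + method table; data holds the concrete value if it is pointer-shaped, otherwise a pointer to a copy. An empty interface (any) stores a plain type pointer instead of an itab.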

// io.Reader is declared as: type Reader interface { Read(p []byte) (n int, err error) }

var r io.Reader      // itab = nil, data = nil  → r == nil
r = (*os.File)(nil)  // itab ≠ nil, data = nil  → r != nil  !!

This is the Go gotcha. Rule: an interface is nil only when both its halves are nil. A typed nil pointer assigned to an interface is not nil.

The footgun in real code:

func mightFail() error {
    var e *MyError = nil
    if condition() { e = &MyError{...} }
    return e   // returning a typed-nil pointer -> caller sees != nil
}

Fix:

func mightFail() error {
    if condition() { return &MyError{...} }
    return nil   // explicit nil interface
}
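A runnable version of the trap and the fix. MyError stands in for the type used in mightFail; its msg field and the condition-as-parameter shape are assumptions made so the example compiles.

```go
package main

import "fmt"

// MyError is a minimal error type with a pointer receiver.
type MyError struct{ msg string }

func (e *MyError) Error() string { return e.msg }

// buggy returns a typed nil pointer when fail is false.
func buggy(fail bool) error {
	var e *MyError
	if fail {
		e = &MyError{"boom"}
	}
	return e // (type=*MyError, value=nil) is NOT a nil interface
}

// fixed returns an untyped nil interface on success.
func fixed(fail bool) error {
	if fail {
		return &MyError{"boom"}
	}
	return nil
}

func main() {
	fmt.Println(buggy(false) != nil) // true
	fmt.Println(fixed(false) != nil) // false
}
```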

Type assertions and type switches

v, ok := x.(string)         // safe assertion
switch v := x.(type) {
case int:    use(v)
case string: use(v)
default:     ...
}

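Asserting to a concrete type is a single type-pointer comparison, for empty and non-empty interfaces alike. Asserting to an interface type needs an itab, which the runtime looks up in a global cache (building it once by matching method sets on a miss): still fast, but not free.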

12. Error Handling

Errors are values. Everything else is style.

v, err := operation()
if err != nil {
    return fmt.Errorf("operation failed: %w", err)
}

Wrapping

%w (Go 1.13+) wraps an error, building a chain.

errors.Is(err, io.EOF)            // walks the chain
var pathErr *os.PathError
errors.As(err, &pathErr)          // unwraps to a specific type

Sentinel errors

var ErrNotFound = errors.New("not found")
return ErrNotFound

Compare with errors.Is, not == — wrapping breaks ==.

`panic` / `recover`

panic unwinds the stack, running deferred functions as it goes. recover, called directly inside a deferred function, stops the unwinding and returns the panic value. Use panic only for truly unexpected conditions (programmer bugs, “should never happen”), not for control flow.

defer func() {
    if r := recover(); r != nil {
        log.Printf("recovered: %v", r)
    }
}()
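A common use of that pattern is converting a panic into an error at an API boundary. A minimal sketch (safeDiv is an illustrative name): the deferred closure assigns to the named result err, which the caller then receives.

```go
package main

import "fmt"

// safeDiv turns a runtime panic (integer division by zero) into an error.
func safeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r) // overwrite the named result
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(safeDiv(7, 2)) // 3 <nil>
	fmt.Println(safeDiv(1, 0)) // 0 recovered: runtime error: integer divide by zero
}
```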

13. `defer`

defer schedules a call to run when the surrounding function returns.

f, err := os.Open(path)
if err != nil { return err }
defer f.Close()

Cost and gotchas

Pre-1.14, defer was ~50ns. Since 1.14, “open-coded defers” are inlined for many cases — essentially free.
Args are evaluated at the defer call site, not at execution:
```
i := 1
defer fmt.Println(i)   // prints 1
i = 2
```
LIFO ordering — deferred calls run in reverse.
defer in a loop accumulates. Don’t defer f.Close() inside for over thousands of files; close manually or wrap the body in a function.

14. Context

context.Context propagates deadlines, cancellation, and request-scoped values across API boundaries.

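ctx, cancel := context.WithTimeout(parent, 5*time.Second)
defer cancel()

req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil { return err }
resp, err := http.DefaultClient.Do(req)   // fails with an error if ctx expires first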

Rules

Pass ctx as the first parameter, never store it in a struct field for long-lived state.
Always call cancel — even on success — to release resources. defer cancel() is the pattern.
Don’t pass nil ctx; use context.TODO() if you don’t have one yet.
ctx.Value is for request-scoped data (auth principal, request ID), not for optional config.
A child context is cancelled when its parent is cancelled.

Detecting cancellation

select {
case <-ctx.Done():
    return ctx.Err()
case v := <-work:
    return process(v)
}

15. Goroutine Leaks

A goroutine leak happens when a goroutine blocks forever: a send nobody receives, a receive from a channel nobody sends on or closes, a mutex never released, etc. The runtime never reclaims it. In long-running services, leaks compound.

Common shape

func bad() <-chan int {
    out := make(chan int)         // unbuffered
    go func() {
        out <- expensive()        // blocks forever if caller drops the chan
    }()
    return out
}

Fixes:

Give the channel a buffer of one, so the send completes even if nobody ever receives; the unread value is garbage-collected along with the channel.
Use select with ctx.Done().

go func() {
    select {
    case out <- expensive():
    case <-ctx.Done():
    }
}()

Detecting leaks

goleak (go.uber.org/goleak): call goleak.VerifyNone(t) at the end of a test, or goleak.VerifyTestMain(m) in TestMain.
runtime.NumGoroutine() in production — a steadily growing number is a leak.
pprof goroutine profile (requires importing net/http/pprof and serving HTTP): curl http://localhost:6060/debug/pprof/goroutine?debug=2.

16. Testing and Benchmarking

Tests

func TestAdd(t *testing.T) {
    if got := Add(1, 2); got != 3 {
        t.Errorf("Add(1,2) = %d, want 3", got)
    }
}

Table-driven tests

Idiomatic Go — readable, easy to extend.

tests := []struct {
    name    string
    in, want int
}{
    {"zero", 0, 0},
    {"pos",  1, 2},
}
for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
        if got := Double(tt.in); got != tt.want {
            t.Errorf("got %d want %d", got, tt.want)
        }
    })
}

Benchmarks

func BenchmarkX(b *testing.B) {
    for i := 0; i < b.N; i++ {
        X()
    }
}

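go test -bench=. runs them (alongside the tests; add -run='^$' to skip tests). b.ReportAllocs(), or the -benchmem flag, adds alloc counts. Always look at allocs/op: Go has no JIT to optimize allocations away at run time, so every allocation directly drives GC pressure.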

go test -bench=. -benchmem

Fuzzing (Go 1.18+)

func FuzzParse(f *testing.F) {
    f.Add("hello")
    f.Fuzz(func(t *testing.T, s string) {
        Parse(s)
    })
}

Use for parsers, decoders, anything taking adversarial input.

17. Common Interview Gotchas

Loop variable capture (pre-1.22)

for _, v := range items {
    go func() { process(v) }()    // pre-1.22: all goroutines see last v
}

Fix pre-1.22: shadow with v := v inside the loop. Go 1.22 gives each iteration its own variable, for modules whose go.mod declares go 1.22 or later. State which Go version you’re on.

Slice aliasing

See §8.

Map iteration order

Randomized. Don’t rely on it. Don’t rely on it. Don’t rely on it.

Nil interface vs typed-nil pointer

See §11.

`==` on slices / maps / functions

Compile error, except comparison against nil: slices, maps, and funcs aren’t comparable. Use slices.Equal / maps.Equal (Go 1.21+), reflect.DeepEqual, or a per-field comparison.

`defer` in a loop

for _, p := range paths {
    f, _ := os.Open(p)
    defer f.Close()             // every file stays open until the function returns
}

Wrap the body in a function or close explicitly.

Range over a channel

for v := range ch runs until ch is closed. If nobody ever closes it, the loop blocks forever after the last value and leaks its goroutine.

Goroutine started with shared mutable state

data := []int{1, 2, 3}
go modify(&data)              // race unless guarded

Always guard with mutex or send via channel.

18. Performance Hot Tips

Pre-size slices and maps: make([]T, 0, n), make(map[K]V, n). Avoid resize churn.

Avoid heap allocations in hot loops. Use -gcflags='-m' to find escape culprits. Reuse buffers via sync.Pool.

var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}
buf := bufPool.Get().(*bytes.Buffer)
defer func() { buf.Reset(); bufPool.Put(buf) }()

strings.Builder for concatenation, bytes.Buffer for byte building.
Prefer fixed-size arrays / structs over slices in tight code when size is known.
Goroutines aren’t free. Spawning one per CPU-microtask in a 1ns loop is slower than the loop. They shine for IO and large work units.
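Avoid interface{} in hot paths. Boxing a non-pointer value usually heap-allocates (the runtime avoids it only for some small values), and method calls through an interface are indirect and hard to inline.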
Profile. go test -cpuprofile, pprof, the runtime tracer (go tool trace).

go test -bench=. -cpuprofile=cpu.out
go tool pprof -http=:8080 cpu.out

runtime.GC() and debug.SetGCPercent are levers, not solutions. Reduce allocation first.
sync.Pool is not a general-purpose cache; the runtime may discard pooled objects at any GC (since Go 1.13 they survive at most one extra cycle in a victim cache). Use it for short-lived reusable buffers.

What To Memorize Cold

GMP scheduler. Goroutines (~2KB) ≠ OS threads. M:N. P count = GOMAXPROCS.
Goroutine stacks grow by copy. Network I/O via runtime poller, no M consumed.
Channels: unbuffered = rendezvous. Send to closed → panic. Nil chan blocks forever.
Memory model: race detector with -race. Channel send happens-before recv. Mutex unlock HB next lock.
GC: concurrent tri-color mark-sweep, non-moving, sub-ms pauses. GOGC and SetMemoryLimit.
Slices = (ptr, len, cap). append may alias or reallocate. Aliasing bugs are common.
Maps: randomized iteration, not concurrent-safe, not comparable, panic on nil-map write.
Strings: immutable bytes. Range over string yields runes.
Interface = (itab, data). Typed-nil pointer in interface ≠ nil interface.
Loop var capture fixed in Go 1.22.
defer cheap since 1.14, args eval at scheduling time.
context first arg, always defer cancel().
Goroutine leaks via blocked channels — select on ctx.Done().
Pre-size slices/maps. sync.Pool for buffer reuse. pprof for everything else.

When any of those is hazy, write a 10-line program that tickles it. The race detector and -gcflags='-m' are unusually fast feedback loops compared to other languages.

C++ Runtime Deep Dive

Target audience: candidates interviewing in C++ for HFT/quant, game engines, embedded, browsers, databases, systems-programming, or any role where the interviewer asks “what does this allocate?”, “is that UB?”, or “trace the move.”

Scope: ISO C++17 baseline with C++20/23 features called out. GCC, Clang, and MSVC all follow the standard here; vendor-specific behavior is noted only when it changes interview answers.

C++ punishes superficial knowledge harder than any other language on this list. The senior interviewer will set a trap (a dangling reference, a missed move, an iterator invalidation, a UB), and either you see it or you don’t. There is no bluffing through C++. Everything in this guide pays interest.

1. Memory Model — Stack, Heap, RAII

C++ gives you control over object lifetime. Every object you create lives somewhere:

Storage	Lifetime	Cost	Example
Automatic (stack)	Scope of declaration	Zero	`int x;` `Foo f;`
Static / thread_local	Program / thread	Zero (init once)	`static Foo f;`
Dynamic (heap)	Until `delete`/destructor	malloc + bookkeeping	`new Foo` / `make_unique<Foo>`

void f() {
    int x = 5;                            // stack
    static int y = 0;                     // static, init once
    auto p = std::make_unique<int>(42);   // heap, freed at scope end
}

RAII

Resource Acquisition Is Initialization. Tie the lifetime of a resource (memory, file, lock, socket) to the lifetime of an object on the stack.

{
    std::lock_guard<std::mutex> lk(mtx);   // acquires
    // ... critical section ...
}                                          // destructor releases

RAII is the single most important C++ idea. It makes exceptions safe, makes resource leaks impossible if you stick to it, and is the foundation of all modern C++. Every interview answer that involves “what if it throws?” reduces to “RAII handles it.”

Stack frames

A function call pushes a frame: return address, locals, saved registers, and any arguments not passed in registers. Frame size is normally fixed at compile time. Deep recursion or huge stack arrays overflow the stack.

`alloca` / VLAs

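alloca(n) allocates on the stack; it is a non-standard extension, not part of ISO C or C++. C99 VLAs (int arr[n]) are not in standard C++ either (GCC and Clang accept them as an extension). Modern code uses std::vector, or std::array when n is a compile-time constant.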

2. Pointers, References, Values

Form	Nullable	Rebindable	Storage
`T`	n/a	n/a	by value
`T&`	No	No	reference; aliases another object
`T*`	Yes	Yes	pointer; an address
`const T&`	No	No	read-only view
`T&&`	No	No	rvalue reference (see §4)

int x = 1;
int& r = x;     // r is x — no separate object
int* p = &x;
*p = 2;         // x is now 2
r = 3;          // x is now 3
p = nullptr;    // p reseats; r cannot be reseated

When to use which

Pass by value: small types (int, Point), or you want a copy / will move from the parameter.
Pass by const T&: large/expensive types you only read.
Pass by T&: out-parameters (rare in modern C++; prefer return values).
Pass by pointer: nullable, or you need a C-API.
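The four passing styles above, sketched with small illustrative functions (the names are hypothetical; only the parameter types matter):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// By value: cheap type (or you want your own copy to move from).
int square(int x) { return x * x; }

// By const&: large type that is only read.
std::size_t total_length(const std::vector<std::string>& words) {
    std::size_t n = 0;
    for (const auto& w : words) n += w.size();
    return n;
}

// By T&: out-parameter (prefer returning a value when you can).
void append_suffix(std::string& s) { s += "_done"; }

// By pointer: the argument may legitimately be absent.
int value_or(const int* p, int fallback) { return p ? *p : fallback; }
```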

Dangling refs

const std::string& bad() {
    std::string tmp = "hi";
    return tmp;             // returns reference to dead local — UB
}

The compiler usually warns; at runtime the behavior is undefined and often silently wrong. Sanitizers (ASan) catch many cases.
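A sketch of the usual fixes: return the local by value, or return a reference only to something that outlives the call. Config and get_name are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <string>

// Fix 1: return by value. The local is moved or elided, never left dangling.
std::string good() {
    std::string tmp = "hi";
    return tmp;
}

// Fix 2: a reference is fine when the referent outlives the call,
// e.g. a member of an object the caller still owns.
struct Config {
    std::string name = "prod";
    const std::string& get_name() const { return name; }
};
```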

3. Smart Pointers — `unique_ptr`, `shared_ptr`, `weak_ptr`

The modern rule: never new/delete directly. Use:

std::unique_ptr<T> — exclusive ownership, zero overhead vs raw pointer.
std::shared_ptr<T> — shared ownership, atomic refcount.
std::weak_ptr<T> — non-owning observer; breaks shared_ptr cycles.

auto u = std::make_unique<Foo>(args...);   // unique
auto s = std::make_shared<Foo>(args...);   // shared
std::weak_ptr<Foo> w = s;                  // non-owning

Cost model

unique_ptr<T> is a single pointer. Move-only. Compiler optimizes away the wrapper.

shared_ptr<T> is two pointers: one to the object and one to a control block that holds the atomic strong and weak counts. Copying = atomic increment. Destruction = atomic decrement.

`make_shared` vs `shared_ptr<T>(new T)`

make_shared<T> allocates the object and the control block in a single allocation: cheaper, with better cache locality. Drawback: the object is destroyed when the last shared_ptr dies, but its memory isn’t freed until the last weak_ptr dies, because it shares the allocation with the control block.

Cycles

struct Node { std::shared_ptr<Node> next; };
auto a = std::make_shared<Node>();
auto b = std::make_shared<Node>();
a->next = b; b->next = a;
// once a and b go out of scope, each count drops to 1 (never 0): both Nodes leak

Fix: one direction weak_ptr. Or, redesign — most “cycles” represent ownership confusion.
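A minimal sketch of the weak_ptr fix, assuming a two-node list where the forward link owns and the back link only observes. The helper name no_leak_with_weak_back_link is hypothetical; it uses weak_ptr::expired() to check that both nodes were actually destroyed.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;   // owning link
    std::weak_ptr<Node> prev;     // non-owning: does not bump the strong count
};

// Returns true if both nodes are destroyed once their owners go away.
bool no_leak_with_weak_back_link() {
    std::weak_ptr<Node> wa, wb;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;              // b's strong count: 2
        b->prev = a;              // a's strong count stays 1
        wa = a;
        wb = b;
    }   // b: 2 -> 1, then a: 1 -> 0; destroying a releases a->next, so b: 1 -> 0
    return wa.expired() && wb.expired();
}
```

To walk backwards, call prev.lock(), which yields an empty shared_ptr once the previous node is gone.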

Custom deleter

auto p = std::unique_ptr<FILE, decltype(&fclose)>(fopen("x", "r"), &fclose);

Useful for C-API resources.

4. Move Semantics, Rvalue References

A moved-from object is in a “valid but unspecified” state. The point of move is to transfer expensive resources (heap allocations, file handles) without copying.

std::string a = "hello";
std::string b = std::move(a);   // b owns the buffer; a is empty (typically)

std::move is a cast — it doesn’t move anything; it tells the compiler “treat this as an rvalue, please pick the move overload.”

Rule of 0/3/5

Rule of 0: design classes so the defaults are correct. Member variables are RAII types (vector, unique_ptr, string). Don’t write any of the special members.
Rule of 3 (pre-C++11): if you write any of dtor, copy ctor, copy assign, write all three.
Rule of 5 (C++11+): add move ctor and move assign.

class Buffer {
    std::unique_ptr<char[]> data_;
    std::size_t size_;
public:
    explicit Buffer(std::size_t n)
        : data_(std::make_unique<char[]>(n)), size_(n) {}
    // Rule of 0: move ops are generated; copy ops are implicitly deleted because
    // unique_ptr is move-only. Write copy ops only if you need deep copies.
};

`noexcept` matters

Move operations should be noexcept. When std::vector reallocates, it relocates elements with std::move_if_noexcept: if the move constructor might throw and the type is copyable, it copies instead, defeating the purpose.

struct S {
    std::string name;
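    S() = default;                          // a user-declared ctor suppresses the implicit default ctor
    S(S&&) noexcept = default;              // explicit noexcept: essential for hand-written moves
    S& operator=(S&&) noexcept = default;   // declaring move ops also deletes the implicit copy ops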
};
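To see the vector fallback in action, here is a sketch with two hypothetical instrumented types that count their copies; the only difference is whether the move constructor is noexcept. During reallocation, vector moves the first type and copies the second.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instrumented types: each counts how often it gets copied.
static int copy_count = 0;

struct NoexceptMove {
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copy_count; }
    NoexceptMove(NoexceptMove&&) noexcept {}
};

struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copy_count; }
    ThrowingMove(ThrowingMove&&) {}          // may throw, as far as vector knows
};

// Grows a vector without reserve() and reports how many copies reallocation made.
template <class T>
int copies_during_growth(int n) {
    copy_count = 0;
    std::vector<T> v;
    for (int i = 0; i < n; ++i) v.emplace_back();
    return copy_count;
}
```

copies_during_growth<NoexceptMove>(100) reports zero copies; the ThrowingMove version copies every existing element at each reallocation.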

Forwarding references (`T&&` in templates)

template<class T>
void f(T&& x) {                       // forwarding ref, NOT rvalue ref
    g(std::forward<T>(x));            // preserves value category
}

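Reference collapsing: T& &, T& &&, and T&& & all collapse to T&; only T&& && collapses to T&&. Together with the deduction rule that makes T = U& for an lvalue argument, this is the mechanism behind perfect forwarding (and std::forward).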
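A small sketch that makes the value category visible: two overloads of a hypothetical g report what they received, f forwards, and f_no_forward shows that a named parameter is always an lvalue.

```cpp
#include <cassert>
#include <string>
#include <utility>

std::string g(const std::string&) { return "lvalue"; }
std::string g(std::string&&)      { return "rvalue"; }

// T deduces to std::string& for lvalue arguments, std::string for rvalues.
template <class T>
std::string f(T&& x) { return g(std::forward<T>(x)); }

// Without std::forward, x is a named variable and therefore an lvalue.
template <class T>
std::string f_no_forward(T&& x) { return g(x); }
```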

5. Copy Elision and RVO

The compiler is allowed (and often required) to elide copy/move when constructing return values.

Foo make() { return Foo{}; }             // RVO: guaranteed since C++17, built directly in the caller
Foo f = make();                          // no copy, no move

C++17 mandated copy elision for prvalues — the move you “see” in source code may not exist as an actual operation.

auto v = std::vector<int>(1'000'000);    // no copy of the temporary

Implication: return by value is the right default. Returning a local either elides the copy (NRVO) or, if elision doesn’t happen, implicitly moves it; a big vector is not deep-copied.

6. Templates, SFINAE, Concepts

Templates are compile-time generators. Each instantiation produces a fresh type or function.

template<class T>
T max(T a, T b) { return a < b ? b : a; }

max(1, 2);          // T = int
max(1.0, 2.0);      // T = double
max(1, 2.0);        // error: conflicting deductions for T (int vs double)

SFINAE — “Substitution Failure Is Not An Error”

Failed substitutions are silently dropped from the overload set, not compile errors:

template<class T>
auto add(T a, T b) -> decltype(a + b) { return a + b; }

Older idiom: std::enable_if_t<...>. Crufty; use concepts instead in C++20:

template<class T>
concept Numeric = std::is_arithmetic_v<T>;

template<Numeric T>
T add(T a, T b) { return a + b; }

Compile-time error blasts

A template error message can be thousands of lines. Modern compilers (gcc 13+, clang 16+) and concepts dramatically reduce this. If you see a 5000-line error in an interview, don’t panic; isolate by typedef-ing intermediate types.

CRTP (Curiously Recurring Template Pattern)

Static polymorphism — virtual without the vtable cost.

template<class Derived>
struct Base { void f() { static_cast<Derived*>(this)->impl(); } };

struct D : Base<D> { void impl() { /* ... */ } };

7. STL Containers — Complexity

Container	Insert	Erase	Lookup	Iter Invalidation	Memory
`vector`	O(1)* end / O(N) middle	O(N)	O(N), O(1) by index	All on grow / from pos	Contiguous
`array`	n/a	n/a	O(1)	None	Contiguous, fixed N
`deque`	O(1) ends, O(N) middle	O(1) ends, O(N) middle	O(1) by index	Ends: iterators only; middle: iterators and refs	Block array
`list`	O(1) anywhere (with iter)	O(1)	O(N)	None on insert; affected pos on erase	Doubly linked
`forward_list`	O(1) after iter	O(1)	O(N)	None on insert	Singly linked
`set`/`map`	O(log N)	O(log N)	O(log N)	None on insert; pos on erase	Red-black tree
`unordered_set`/`map`	O(1) avg, O(N) worst	O(1) avg	O(1) avg	All on rehash	Buckets + nodes

vector is the default. Reach for others only with a measured reason.

std::vector<int> v;
v.reserve(1'000'000);          // pre-size, avoid grows
for (int i = 0; i < 1'000'000; ++i) v.push_back(i);

`unordered_map` warnings

Separate-chaining hash table. Each node is heap-allocated, which means bad cache locality. For perf-critical code, prefer absl::flat_hash_map, tsl::robin_map, or other open-addressing maps. State this in HFT/perf interviews; it’s a known weakness.

std::unordered_map<std::string, int> m;
m.max_load_factor(0.5);          // set first: tighter than the default 1.0
m.reserve(N);                    // enough buckets for N elements at that load factor

8. Iterator Invalidation

The single most common subtle bug in C++.

Container	Operation	What invalidates
`vector`	`push_back`, `insert`, `reserve` triggering grow	All iterators/refs/pointers
`vector`	`erase`	Iterators/refs at and after pos
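`deque`	insert at either end	All iterators (refs/pointers survive)
`deque`	erase at either end	Only iterators/refs to erased elements (plus end() when erasing at the back)
`deque`	insert/erase in the middle	All iterators and refs/pointers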
`list` / `forward_list`	`insert`, `push_*`	None
`list` / `forward_list`	`erase`	Only iterators to erased element
`unordered_*`	rehash (insert that exceeds load factor)	All iterators (refs/pointers survive!)
`map` / `set`	`insert`	None
`map` / `set`	`erase`	Only iterators to erased

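std::vector<int> v{1,2,3,4,5};
for (auto it = v.begin(); it != v.end(); ++it) {
    if (*it == 3) v.push_back(99);   // UB: push_back may reallocate and invalidate `it`
}

// Correct: index over the original size (indices survive reallocation).
for (std::size_t i = 0, n = v.size(); i < n; ++i) {
    if (v[i] == 3) v.push_back(99);
}
// To remove elements, use erase-remove (C++20: std::erase_if(v, pred)).
v.erase(std::remove_if(v.begin(), v.end(), [](int x) { return x == 99; }), v.end());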

9. STL Algorithms

<algorithm> and <numeric> provide a rich library. Use them: hand-rolled loops are rarely faster and usually harder to read.

std::sort(v.begin(), v.end());                            // IntroSort, O(N log N)
std::stable_sort(v.begin(), v.end());                     // O(N log N) with a buffer, O(N log² N) without
std::nth_element(v.begin(), v.begin()+k, v.end());        // O(N) avg, kth-element
std::partial_sort(v.begin(), v.begin()+k, v.end());       // top-k, O(N log K)
std::lower_bound(v.begin(), v.end(), x);                  // binary search, O(log N)
std::accumulate(v.begin(), v.end(), 0LL);                 // careful with init type

Sort algorithms

std::sort is typically implemented as introsort: quicksort, switching to heapsort when recursion gets too deep and to insertion sort for small ranges. Since C++11 the standard guarantees O(N log N) worst case; it is unstable. std::stable_sort is typically a merge sort that allocates a buffer; prefer std::sort unless stability matters.

Ranges (C++20)

auto evens = v | std::views::filter([](int x){ return x%2==0; })
               | std::views::transform([](int x){ return x*x; });

Lazy, composable. Less verbose than iterator pairs.

10. Concurrency — `std::thread`, `mutex`, atomics, memory_order

std::thread t([]{ work(); });
t.join();                           // or t.detach() — but rarely

If a std::thread is destroyed while still joinable, std::terminate is called. std::jthread (C++20) requests stop and joins in its destructor.

Mutex

std::mutex m;
std::lock_guard<std::mutex> lk(m);   // RAII lock

std::scoped_lock (C++17) locks multiple mutexes deadlock-free.

Condition variables

std::condition_variable cv;
std::unique_lock<std::mutex> lk(m);
cv.wait(lk, [&]{ return ready; });   // releases lk, waits, reacquires; loops until ready

Always use the predicate form to handle spurious wakeups.
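A complete producer/consumer sketch of the pattern above. The function name wait_for_value and the value 42 are illustrative. Note that both ready and the payload are protected by the same mutex, and the producer unlocks before notifying.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

int wait_for_value() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;   // protected by m
    int value = 0;        // protected by m

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lk(m);
            value = 42;
            ready = true;
        }                 // unlock first so the woken waiter can take the lock
        cv.notify_one();
    });

    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return ready; });   // predicate re-checked after every wakeup
    int result = value;
    lk.unlock();
    producer.join();
    return result;
}
```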

`std::atomic<T>`

std::atomic<int> counter{0};
counter.fetch_add(1, std::memory_order_relaxed);

`memory_order`

Order	Guarantees	Use
`relaxed`	Atomicity only, no ordering	Stat counters
`acquire` (load)	No subsequent reads/writes can move before	Read of a flag protecting data
`release` (store)	No prior reads/writes can move after	Write that publishes data
`acq_rel` (RMW)	Both	CAS retry loops
`seq_cst` (default)	Sequential consistency, single total order	Default; safest

// Producer:
data = produce();
ready.store(true, std::memory_order_release);

// Consumer:
while (!ready.load(std::memory_order_acquire)) {}
use(data);     // safe — release/acquire pair

memory_order is interview territory for L6+ and HFT/systems roles. Default to seq_cst until measurements show the cost matters.
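The producer/consumer fragment above, turned into a runnable sketch (publish_and_consume is a hypothetical name). The payload is a plain int; only the flag is atomic, and the release/acquire pair is what makes the plain read safe.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int publish_and_consume() {
    int data = 0;                        // plain, non-atomic payload
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        data = 42;                                       // (1) write payload
        ready.store(true, std::memory_order_release);    // (2) publish
    });

    while (!ready.load(std::memory_order_acquire)) {     // (3) wait for publication
        std::this_thread::yield();
    }
    int seen = data;                                     // (4) sees 42: (1) happens-before (4)
    producer.join();
    return seen;
}
```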

11. Undefined Behavior (UB)

UB means the spec places no requirements. The compiler may eliminate code, “optimize” infinite loops, or generate code that does anything. Don’t rely on “well, it works on my machine.”

Common UB

Read of uninitialized memory.
Out-of-bounds access (v[v.size()] is UB).
Signed integer overflow (unsigned wraps, signed is UB).
Use-after-free / double-free.
Race conditions (concurrent unsynchronized access to mutable data).
Strict aliasing violations (reinterpreting a float* as int*).
Null pointer deref — including for member access on a null pointer.
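Lifetime violations — using an object after its lifetime has ended. A moved-from standard-library object is still alive, but calling an operation with preconditions on it (e.g., front() without checking empty()) can be UB.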
Integer division by zero, INT_MIN / -1.
Returning reference/pointer to a local.
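Because signed overflow is UB, the check has to happen before the addition, not after. A portable sketch (checked_add is a hypothetical helper; GCC and Clang also offer __builtin_add_overflow for the same job):

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Returns a + b, or nullopt if the result would not fit in int.
std::optional<int> checked_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return std::nullopt;   // would exceed INT_MAX
    if (b < 0 && a < INT_MIN - b) return std::nullopt;   // would go below INT_MIN
    return a + b;                                        // safe: cannot overflow here
}
```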

Why it bites in interviews

The interviewer puts a for (int i = 0; i <= n; ++i) v[i] = ...; on the board and watches whether you flag the OOB. If you don’t, your perceived rigor drops a tier instantly.

Sanitizers

Compile + run tests under:

clang++ -fsanitize=address,undefined -g -O1 main.cpp
clang++ -fsanitize=thread     -g -O1 main.cpp   # for races

ASan: heap/stack/global OOB, use-after-free, double-free. UBSan: signed overflow, null derefs, alignment. TSan: data races.

State in interviews that you run sanitizers in CI. It signals discipline.

12. Common Interview Gotchas

Virtual destructor

If a class is meant to be derived from and used polymorphically, its destructor must be virtual; otherwise delete base_ptr on a derived object is undefined behavior (in practice, usually only the base’s destructor runs).

struct Base { virtual ~Base() = default; };
struct Derived : Base { /* ... */ };
Base* p = new Derived;
delete p;   // virtual dtor → Derived's runs

Object slicing

void f(Base b);              // by value
Derived d;
f(d);                        // d sliced — only Base portion copied

Always pass polymorphic types by reference or pointer, never by value.

`vector<bool>` is not a vector of bool

Specialized as a packed bitset → operator[] returns a proxy, not bool&. Don’t take its address.

std::vector<bool> v;
auto x = v[0];               // proxy reference, not bool&

Use std::vector<char> if you need real bools.

Self-assignment

T& operator=(const T& o) {
    if (&o == this) return *this;     // guard
    // ...
}

Or: copy-and-swap idiom — pass by value (copy happens at call site), swap, return.
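A sketch of copy-and-swap on a hypothetical hand-written IntBuffer. In real code std::vector<int> would make this Rule of 0; the manual version exists only to show the idiom. Because the parameter is already a copy, self-assignment is harmless and a throwing copy leaves *this untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class IntBuffer {
    int* data_ = nullptr;
    std::size_t size_ = 0;
public:
    explicit IntBuffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    IntBuffer(const IntBuffer& o) : data_(new int[o.size_]), size_(o.size_) {
        std::copy(o.data_, o.data_ + o.size_, data_);
    }
    IntBuffer(IntBuffer&& o) noexcept
        : data_(std::exchange(o.data_, nullptr)), size_(std::exchange(o.size_, 0)) {}
    ~IntBuffer() { delete[] data_; }

    friend void swap(IntBuffer& a, IntBuffer& b) noexcept {
        std::swap(a.data_, b.data_);
        std::swap(a.size_, b.size_);
    }

    // One operator= serves both copy and move: the argument is copied or
    // moved into `o` at the call site, then swapped in; the old state dies with `o`.
    IntBuffer& operator=(IntBuffer o) noexcept {
        swap(*this, o);
        return *this;
    }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }
};
```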

Initialization order

Member variables are constructed in declaration order, not member-initializer-list order. GCC and Clang warn (-Wreorder) when the two differ.

`static` local init

Thread-safe since C++11 (Magic statics). One initialization, even with concurrent first access.

`nullptr` vs `NULL` vs `0`

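Use nullptr. NULL is an integer null pointer constant (typically 0 or 0L), so f(NULL) can pick an integer overload over a pointer one, or be ambiguous. nullptr has its own type, std::nullptr_t, and never converts to an integer.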
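A two-overload sketch (which is a hypothetical function). A literal 0 picks the int overload; nullptr can only pick the pointer overload. What which(NULL) does depends on how the implementation defines NULL, which is exactly the problem.

```cpp
#include <cassert>
#include <string>

std::string which(int)         { return "int"; }
std::string which(const char*) { return "pointer"; }
```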

Floating-point comparison

Same warning as Java — never == for float/double. Use tolerances or std::nextafter.
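A tolerance-based comparison sketch. approx_equal and its default tolerances are assumptions for illustration: the absolute term handles values near zero, the relative term scales with magnitude. Tune both to the numbers in your problem.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

bool approx_equal(double a, double b, double rel_tol = 1e-9, double abs_tol = 1e-12) {
    double diff = std::fabs(a - b);
    return diff <= abs_tol || diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

For example, 0.1 + 0.2 == 0.3 is false in IEEE doubles, while approx_equal(0.1 + 0.2, 0.3) is true.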

Implicit conversions

int → bool, bool → int, double → int. Use explicit for single-arg constructors:

struct Date { explicit Date(int y); };
Date d1 = 2024;       // error: copy-initialization can't use an explicit constructor
Date d2{2024};        // OK: direct-initialization

13. Modern C++ Idioms

auto for local types — but spell out parameter and return types where they’re API.
Range-for — for (const auto& x : container).
Lambdas — captures: [] (none), [&] (all by reference), [=] (all by value), [this] (the enclosing object).
enum class — strongly typed, scoped enums. No implicit int conversion.
structured bindings — auto [k, v] = *it;.
if constexpr — compile-time branch in templates.
std::optional — Maybe<T>. Use for “may not exist.”
std::variant — tagged union.
std::string_view — non-owning view of a string. Never let it outlive the string it views.
std::span — non-owning view of a contiguous range.
{} init — uniform initialization. Prevents narrowing conversions.

int a{3.14};       // error: narrowing conversion from double
int b = 3.14;      // OK, but silently truncated to 3

Modules (C++20)

Replacement for headers. Faster builds, better isolation. Adoption uneven; compilers still maturing.

Coroutines (C++20)

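std::generator<int> ints() {               // std::generator needs C++23 (<generator>)
    for (int i = 0;; ++i) co_yield i;
}

C++20 provides only the coroutine machinery, not high-level types. std::generator arrives in C++23; for async work you bring a library such as Boost.Asio or write your own promise types. Mention only if asked.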

14. Compile-Time vs Runtime

C++ has a powerful compile-time computation toolkit. Use it to push work out of the runtime.

constexpr int factorial(int n) { return n <= 1 ? 1 : n * factorial(n-1); }
static_assert(factorial(5) == 120);

template<class T>
constexpr bool is_pod_v = std::is_trivial_v<T> && std::is_standard_layout_v<T>;

constexpr, consteval (C++20), and if constexpr together let you move computation to compile time: when a call is evaluated as a constant expression (always, for consteval), the result is baked into the binary at zero runtime cost.

Compile-time hash

Hash the case labels at compile time with a constexpr (or consteval) string hash, hash the incoming command once at runtime, and switch on the result. This common HFT trick gives O(1) string dispatch where the labels cost nothing at runtime.
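A sketch of the trick, assuming FNV-1a (64-bit) as the hash and a hypothetical BUY/SELL/CANCEL command set. The hash is constexpr rather than consteval because the same function must also hash the runtime input. The final string compare guards against an unknown input that happens to collide with a label.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// FNV-1a, 64-bit. Unsigned arithmetic wraps, so there is no overflow UB.
constexpr std::uint64_t fnv1a(std::string_view s) {
    std::uint64_t h = 14695981039346656037ull;   // FNV offset basis
    for (char c : s) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;                   // FNV prime
    }
    return h;
}

// Labels are hashed by the compiler (two colliding labels would be a
// duplicate-case compile error); the input is hashed once at runtime.
int dispatch(std::string_view cmd) {
    switch (fnv1a(cmd)) {
        case fnv1a("BUY"):    return cmd == "BUY"    ? 1 : -1;
        case fnv1a("SELL"):   return cmd == "SELL"   ? 2 : -1;
        case fnv1a("CANCEL"): return cmd == "CANCEL" ? 3 : -1;
        default:              return -1;
    }
}
```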

15. Performance Hot Tips

Cache friendliness wins. Arrays of structs with sequential access trounce trees of pointers, even when complexity is “the same.” A miss to main memory costs on the order of 100 ns, i.e. hundreds of cycles in which the CPU could have executed hundreds of instructions.
Reserve. vector::reserve, unordered_map::reserve. Avoid grow churn.
Move into containers. v.push_back(std::move(s)); over v.push_back(s);.
emplace_back over push_back when constructing in place.
Pass by value + std::move in constructors and setters — modern idiom.
Avoid std::endl — it flushes. Use '\n'.
Prefer iteration over recursion for deep structures; the function-call overhead and stack pressure matter.
Profile before optimizing. perf, VTune, callgrind, sampling profilers. Algorithmic wins dwarf micro-optimizations.
Compile with -O2 -march=native -flto for production (use -march=native only when you deploy to the same CPU family you build on).
Avoid virtual in hot paths when possible. Devirtualization helps but a known-static dispatch is always cheaper.
Beware of false sharing — two atomics on the same cache line (typically 64B) bottleneck even when “independent.” Pad with alignas(std::hardware_destructive_interference_size).

struct alignas(64) Counter { std::atomic<long> v{0}; };

16. Tooling — Sanitizers, Compiler-Specific Behavior

Sanitizers (recap from §11)

ASan — memory errors.
UBSan — undefined behavior.
TSan — races.
MSan (Clang only) — uninitialized reads.

Run them in CI. In production, don’t ship with sanitizers (perf cost), but consider a hardened build: -D_FORTIFY_SOURCE=2 (which needs optimization enabled) plus -fstack-protector-strong.

Warning flags

g++ -Wall -Wextra -Wpedantic -Werror -Wshadow -Wconversion

Treat warnings as errors; most serious C++ codebases do.

Standard library debug modes

-D_GLIBCXX_DEBUG (libstdc++) checks bounds and iterator invalidation. Use it only in debug builds: it is slow and changes the ABI of standard containers.

Vendor-specific behavior

MSVC differs in ABI and in optimization behavior (e.g., when NRVO applies). It also has no inline assembly on x64, so don’t rely on inline-assembly portability.
__attribute__((...)) is GCC/Clang. MSVC uses __declspec.
Endianness, padding, and alignment are platform-dependent. Don’t memcpy raw structs to the wire or disk and read them back on another system; serialize fields explicitly with a defined byte order.

17. C++ — What To Memorize Cold

RAII. RAII. RAII.
Rule of 0/3/5. Default to Rule of 0.
unique_ptr cheap, shared_ptr has atomic refcount, weak_ptr breaks cycles.
Move = transfer of ownership. Moved-from = valid but unspecified. noexcept move ops matter.
C++17 mandates prvalue copy elision — return by value is fine.
Iterator invalidation rules per container — memorize the table in §8.
vector is the default; unordered_map has poor cache locality.
Sort is introsort — O(N log N) worst, unstable. stable_sort allocates.
memory_order: relaxed for counters, acquire/release for publication, seq_cst default.
UB list: OOB, signed overflow, races, use-after-free, strict aliasing, null deref, uninitialized read. Sanitizers catch most.
Virtual destructor for polymorphic bases. Object slicing on by-value. vector<bool> is special.
nullptr, enum class, auto, string_view, optional, variant, structured bindings — modern toolkit.
Cache locality > algorithmic constants in modern hardware.
Compile with -O2 -march=native -flto -Wall -Wextra for production. Run sanitizers in CI.

When you’re shaky on any of those, write a 30-line program that demonstrates the issue and run it under ASan + UBSan. C++’s sanitizers are some of the best feedback in any language; use them.

Phase 10 — Testing, Debugging, and Correctness

Target level: Intermediate → Senior
Expected duration: 2–3 weeks (assuming Phases 0–9 are complete)
Weekly cadence: 4–5 lab hours + apply testing discipline to every problem you solve elsewhere

Why This Phase Exists

Most candidates lose offers not because they couldn’t find an algorithm — they lose because their code was almost right and they never noticed. The interviewer asked “are you sure?”, they said “yes”, and then the interviewer ran one edge case and the screen went red.

Testing and debugging is the dimension where senior candidates separate from juniors. A junior writes code and hopes. A senior writes code and proves it works, then runs three deliberate test cases (one normal, one degenerate, one large), and only then claims “done.”

This phase teaches the discipline. It is short because the mechanics are simple. The habit is what takes weeks to internalize, which is why every later problem in your study should explicitly run the checklist here.

Concepts to Master

Test types

Manual / desk-checked tests — what you trace through on paper during a 45-minute interview
Smoke tests — 1–2 sanity examples to prove the code runs at all
Unit tests — per-function correctness; use these heavily in phase-08-practical-engineering labs
Integration tests — multi-component behavior; relevant when you implement subsystems (cache + invalidator, scheduler + worker)
Property-based tests — hypothesis-style; assert invariants over random inputs (e.g., “sorted output is a permutation of the input”)
Brute-force verifier — known-correct slow solution to validate the fast one on small inputs
Stress testing — random-generation loop that runs the verifier and the fast solution and diffs them; the single best CP debugging tool
Fuzzing (overview) — feed structured random input; useful for parsers, serializers, anything with a grammar
Golden tests — record expected output for canonical inputs; mostly used in compiler/transform code
Mutation testing (overview) — flip operators in your code and check if any test catches the mutation; reveals weak test suites
Coverage analysis — branch and line coverage; necessary but not sufficient
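A minimal C++ stress harness for the idea above, assuming maximum subarray sum as the problem under test: Kadane’s algorithm is the fast solution and an all-subarrays loop is the verifier. The problem choice, function names, input sizes, and seed are illustrative. Small random inputs and a fixed seed keep any counterexample short and reproducible.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <random>
#include <vector>

// Fast solution under test: Kadane's algorithm, O(N). Assumes non-empty input.
long long max_subarray_fast(const std::vector<int>& a) {
    long long best = a[0], cur = a[0];
    for (std::size_t i = 1; i < a.size(); ++i) {
        cur = std::max<long long>(a[i], cur + a[i]);
        best = std::max(best, cur);
    }
    return best;
}

// Brute-force verifier: try every subarray, O(N^2). Slow but obviously correct.
long long max_subarray_brute(const std::vector<int>& a) {
    long long best = LLONG_MIN;
    for (std::size_t i = 0; i < a.size(); ++i) {
        long long sum = 0;
        for (std::size_t j = i; j < a.size(); ++j) {
            sum += a[j];
            best = std::max(best, sum);
        }
    }
    return best;
}

// Returns the first input where the two disagree, or an empty vector.
std::vector<int> stress(int trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> len(1, 8), val(-10, 10);
    for (int t = 0; t < trials; ++t) {
        std::vector<int> a(len(rng));
        for (int& x : a) x = val(rng);
        if (max_subarray_fast(a) != max_subarray_brute(a)) return a;  // counterexample
    }
    return {};
}
```

When stress returns a non-empty vector, that input becomes the first test case in the debugging checklist.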

Complexity & performance

Complexity testing — measure runtime at N, 2N, 4N; check the doubling ratio matches your claimed Big-O
Performance profiling — cProfile/py-spy (Python), perf/pprof (Go/C++), async-profiler (Java)
Memory profiling — tracemalloc/memory_profiler (Python), pprof heap (Go), heap dumps (JVM)
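A sketch of the doubling-ratio check. If T(N) ≈ c·N^k, then T(2N)/T(N) ≈ 2^k, so log2 of the ratio estimates k. time_seconds and sort_random are hypothetical helpers; a single timing is noisy, so in real use repeat each measurement and take the median.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Ratio ~2 suggests O(N) or O(N log N); ~4 suggests O(N^2); ~8 suggests O(N^3).
double estimated_exponent(double t_n, double t_2n) {
    return std::log2(t_2n / t_n);
}

// Wall-clock seconds for one call of work(n).
double time_seconds(const std::function<void(int)>& work, int n) {
    auto start = std::chrono::steady_clock::now();
    work(n);
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    return d.count();
}

// Example workload: sort n pseudo-random ints (ratio slightly above 2).
void sort_random(int n) {
    std::vector<int> v(n);
    std::mt19937 rng(42);
    for (int& x : v) x = static_cast<int>(rng() % 1000000);
    std::sort(v.begin(), v.end());
}
```

Typical use: time sort_random at N, 2N, and 4N and check that both successive ratios give an exponent close to the one you claimed.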

Concurrency

Race detection — -race (Go); ThreadSanitizer (TSan, -fsanitize=thread in Clang and GCC) for C++ and Rust
Deterministic concurrency testing — schedule injection, controlled interleaving, deterministic random
Deadlock detection — lock-order graph analysis

Why Testing Matters in Interviews

Interviewers explicitly score “testing and verification” on the rubric. The signal they’re watching for:

What you do	What it signals
Submit and say “done”	Junior — does not verify own work
Walk through one example manually	Acceptable — minimum bar
Walk through, then deliberately try an edge case	Senior — actively looking for bugs
Find your own bug and fix it without prompt	Strong senior signal
Identify a class of bugs you might have (“integer overflow when the array is large”) and write a test for that specific risk	Staff signal — anticipating failure modes

Candidates who do not test lose offers even when their code is correct, because the interviewer cannot tell whether the correctness was deliberate or accidental.

The Universal Test Checklist

Apply this to every problem you solve, in every phase. Most of these take 10 seconds to consider; even rejecting them out loud earns the signal.

Input shape

Empty input ([], "", 0, None)
Null input (if the language allows)
Single element
Two elements
Maximum-size input (the constraint upper bound)
Minimum-size input (often the constraint lower bound)

Input content

All elements identical (duplicates)
All elements distinct
Already sorted (ascending and descending)
Negative numbers
Zero
Mixed signs
Values at integer boundaries (INT_MAX, INT_MIN, overflow risk in sums/products)
Floating-point precision (when numeric)

Domain-specific

Disconnected graph
Self-loop, multi-edge
Cycle in a graph that “should” be a tree
Empty tree / single-node tree / skewed tree
Linked list with one node, two nodes, with cycle
Strings with unicode, with whitespace, with case differences

Output ambiguity

Multiple valid answers (does the interviewer want any, all, or a canonical one?)
Stable ordering required vs not
Off-by-one in inclusive vs exclusive bounds

Failure modes

Invalid input — does your function crash, return a sentinel, or raise?
Concurrent access (for the practical-engineering labs)
Timeout case — what happens when N is at the constraint limit?

Required Tests Per Lab (Curriculum-Wide Rule)

From Phase 10 forward, every lab you complete (and every lab from Phases 0–9 you re-solve) must include:

3 normal tests — the happy path, what the problem statement examples look like
3 edge tests — chosen from the checklist above; pick the three most relevant to this problem
1 large-input test — N at the constraint upper bound; verifies you didn’t accidentally write an O(N²) loop you thought was O(N log N)
1 randomized test (when a verifier exists) — random input, run brute force and fast solution, assert equal
1 invalid-input test (when applicable) — wrong type, malformed, out of range

Document these as test functions, not “I thought about it.” The act of writing them catches bugs.

Common Mistakes

Testing only the given examples. The examples in the problem statement are almost always the happy path; they never exercise edge cases.
Mental simulation without writing it down. Your brain skips steps. Trace on paper.
Treating “the code compiles” as “the code works.” Compilation is the lowest bar.
Not verifying complexity empirically. If a claimed O(N) solution takes about 4× as long when N doubles, it is actually quadratic.
Adding tests after the bug. Add the test first, watch it fail, then fix; otherwise you don’t know your test would have caught it.
Ignoring “obvious” cases. Empty-input bugs are among the most common causes of failed phone screens.
Not testing concurrency under load. A thread-safety bug at 1 thread is invisible; at 1000 threads on 8 cores, it’s a daily incident.

Debugging Checklist (Apply When Stuck)

Reproduce. What is the smallest input that fails?
Read the error. Stack trace, line number, value. Do not skip this.
State the expected output. If you can’t, you don’t understand the problem.
Diff expected vs actual. Is it off by one? Off by a factor? Wrong type?
Binary-search the bug. Print state at midpoint of the algorithm; halve the search space.
Check invariants. What was supposed to be true at this point? Assert it.
Question assumptions. “I’m sure this list is sorted” — prove it.
Read the code aloud. Speech catches what your eye skips.
Rubber-duck explain. Tell an inanimate object what the code does, line by line.
Step away for 60 seconds. Genuinely. The number of bugs solved this way is embarrassing.

Mastery Checklist

You have completed Phase 10 when you can:

Generate the universal test checklist for any new problem in under 90 seconds
Write a brute-force verifier for any problem with N ≤ 20
Build a randomized stress-testing harness in under 10 minutes for a new problem
Diagnose a wrong-answer bug in your own code in under 5 minutes using the debugging checklist
Diagnose a TLE (timeout) bug by measuring the doubling ratio
State the loop invariant for binary search, Kadane’s algorithm, and a simple DP
Profile a Python script and identify the top 3 hot functions in under 5 minutes
Find a race condition in a small Go/Java/C++ program using the language’s race detector
Recognize when a test is too weak (mutation testing thought experiment)

Exit Criteria

Before moving to Phase 11:

Complete all 6 labs in this directory with the full test suite written and passing
Re-solve 3 problems from Phase 2 and 3 problems from Phase 5 applying the universal test checklist; document any bugs caught
Run the stress-testing harness (Lab 5) on at least one problem you previously thought was correct, and report what you found
Profile one of your Phase 8 practical-engineering implementations (e.g., LRU cache, rate limiter) and identify at least one inefficiency

Labs

#	Lab	Focus	Anchor Problem
1	lab-01-edge-case-taxonomy.md	Systematic edge case discovery	Array median
2	lab-02-test-driven-problem-solving.md	Write tests before code	LRU cache
3	lab-03-debugging-under-pressure.md	Systematic debug protocol	Word Break (planted bug)
4	lab-04-correctness-proofs.md	Loop invariants & induction	Binary search + Kadane
5	lab-05-stress-testing-harness.md	Brute-force verifier + random fuzzing	Two-pointer variants
6	lab-06-performance-profiling.md	Empirical complexity + profiling	LIS implementations

Connection to Other Phases

Phase 2/3/4/5 — re-solve a problem from each, applying the universal test checklist
Phase 7 (Competitive) — Lab 5 (stress testing) is the canonical CP debugging tool; use it on every CF problem you fail
Phase 8 (Practical Engineering) — concurrency-aware testing is required for every lab; the rate limiter, LRU cache, and thread pool labs all need race-condition tests
Phase 11 (Mocks) — the testing rubric (dimension 8) is scored on every mock; this phase trains that score

Lab 01 — Edge Case Taxonomy (Find the Median of an Unsorted Array)

Goal

Build a reusable, systematic edge-case taxonomy you can apply to any new problem in under 90 seconds. Use “find the median of an unsorted integer array” as the anchor — a problem that looks trivial but has at least 12 edge cases that a careless candidate will miss. By the end you should be able to enumerate 8+ edge cases for any array problem before writing a single line of code.

Background Concepts

An edge case is an input that is technically legal under the constraints but exercises a degenerate or boundary behavior in your algorithm. They fall into a small number of universal categories:

Empty / null — what does your function do with [] or None?
Singleton — one element
Identical elements — all equal; tests duplicate handling
Boundary values — INT_MAX, INT_MIN, 0, negatives
Sorted / reverse-sorted — tests algorithms that assume scrambled input
Maximum size — N at the constraint upper bound; tests complexity
Output ambiguity — multiple valid answers; tests the spec
Arithmetic overflow — sums/products that exceed INT_MAX

The taxonomy is universal. The application is problem-specific.

Interview Context

“Find the median” is asked as a warm-up at Meta, Microsoft, and Bloomberg phone screens. The interviewer is not testing whether you know quickselect. They are testing whether you ask “what do you mean by median for an even-length array — average of the two middles or either one?” before writing code. Candidates who skip this question lose the point even if their code is otherwise correct.

The senior signal is to list out edges aloud before coding: “Empty array — should I return None or throw? Single element — that’s the median. Two elements — average. Even vs odd length — different formulas. Are values bounded so the sum of two won’t overflow?” Five sentences. Then code.

Problem Statement

Given an unsorted array of integers nums, return the median. If nums has odd length, return the middle value after sorting. If nums has even length, return the average of the two middle values.

Constraints

0 ≤ |nums| ≤ 10^5
-10^9 ≤ nums[i] ≤ 10^9
The return type may be a float (because of averaging)

Clarifying Questions

Empty input? What should I return — None, NaN, raise an exception?
Even length: average or either middle? Lower middle, upper middle, or the float average?
Are duplicates allowed? (Yes; median definition handles them naturally.)
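Overflow in the average? If nums[i] is up to 10^9, the sum of two is 2×10^9, which barely fits in a 32-bit signed int (INT_MAX ≈ 2.147×10^9). But in C++, (a + b) / 2.0 computes a + b in int first, so it overflows for inputs like INT_MAX + INT_MAX. Safer: widen before adding ((long long)a + b) / 2.0, or use a/2.0 + b/2.0. a + (b - a)/2.0 also works here, because b - a ≤ 2×10^9 when a ≤ b.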
Modify input allowed? (Affects whether you can sort in place or need to copy.)
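The overflow question above, sketched in C++ (average_of_two is a hypothetical helper). Widening one operand to long long makes the addition happen in 64 bits, which holds INT_MAX + INT_MAX exactly; the division then happens in double.

```cpp
#include <cassert>
#include <climits>

double average_of_two(int a, int b) {
    return (static_cast<long long>(a) + b) / 2.0;
}

// The tempting version, for contrast: a + b is computed in int first, so
// INT_MAX + INT_MAX overflows (undefined behavior) before the division.
// double average_unsafe(int a, int b) { return (a + b) / 2.0; }
```

Python ints are arbitrary-precision, so the Python solution below cannot overflow; this question matters when you code in C++ or Java.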

Examples

nums = [3, 1, 2]            → 2          (odd length)
nums = [3, 1, 2, 4]         → 2.5        (even, average of 2 and 3)
nums = [5]                  → 5          (singleton)
nums = []                   → None / raise (clarify with interviewer)
nums = [7, 7, 7, 7]         → 7.0        (all duplicates, even length)
nums = [INT_MAX, INT_MAX]   → INT_MAX    (overflow risk in average)
nums = [-3, -1, -2]         → -2         (negatives)

Initial Brute Force

Sort, then index; the core logic is two lines.

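def median(nums):
    s = sorted(nums)   # sorted copy (caller's input untouched); raises TypeError for None
    n = len(s)
    if n == 0:
        return None    # chosen convention for empty input
    if n % 2 == 1:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2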

Brute Force Complexity

Time O(N log N), space O(N) (or O(1) if you sort in place and the caller allows mutation). For N = 10^5 this is ~1.7×10^6 comparisons — well within any interview time limit.

Optimization Path

The interviewer may now ask: “Can you do better than O(N log N)?” The answer is quickselect, which finds the k-th smallest in expected O(N) using partition-based recursion. Worst case O(N²); use median-of-medians for guaranteed O(N) if pressed.
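If the interviewer does push for it, here is a quickselect sketch (in C++, for contrast with the Python above). It assumes a random pivot with Lomuto partitioning; the fixed seed is only for reproducibility. Lomuto degrades to O(N²) when there are many duplicates; a three-way partition fixes that. quickselect and median_quickselect are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Returns the k-th smallest (0-indexed). Takes `a` by value: caller's data untouched.
int quickselect(std::vector<int> a, std::size_t k) {
    assert(k < a.size());
    std::mt19937 rng(12345);
    std::size_t lo = 0, hi = a.size() - 1;        // invariant: answer index k is in [lo, hi]
    while (lo < hi) {
        std::uniform_int_distribution<std::size_t> pick(lo, hi);
        std::swap(a[pick(rng)], a[hi]);           // random pivot, moved to a[hi]
        int pivot = a[hi];
        std::size_t store = lo;
        for (std::size_t i = lo; i < hi; ++i)
            if (a[i] < pivot) std::swap(a[i], a[store++]);
        std::swap(a[store], a[hi]);               // pivot now at its sorted position
        if (store == k) return a[store];
        if (store < k) lo = store + 1;
        else hi = store - 1;                      // safe: store > k >= lo
    }
    return a[lo];
}

double median_quickselect(const std::vector<int>& nums) {
    assert(!nums.empty());
    std::size_t n = nums.size();
    if (n % 2 == 1) return quickselect(nums, n / 2);
    return (static_cast<long long>(quickselect(nums, n / 2 - 1)) + quickselect(nums, n / 2)) / 2.0;
}
```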

For the edge-case lab, do not optimize. The point is to enumerate edges before the algorithm matters. Quickselect has more edge cases (recursion depth on degenerate partitions, pivot selection bias) so optimizing without first nailing edges makes the bug surface larger.

Final Expected Approach

Validate input. Return None (or raise) on empty.
Sort a copy (do not mutate caller’s array unless agreed).
Compute middle index mid = n // 2.
If odd, return sorted[mid].
If even, return sorted[mid - 1] + sorted[mid] divided by 2, using a + (b - a) / 2 form to avoid overflow.

Data Structures Used

A sortable copy of the array. In Python sorted() returns a new list. In Java use Arrays.sort() on a clone; in C++ std::sort on a copy.
No auxiliary structures.

Correctness Argument

After sorting, for odd n the value at index n // 2 (0-indexed) is exactly the middle: n // 2 elements lie on each side. For even n, indices n // 2 - 1 and n // 2 are the lower and upper middles, and their average is the median. The sort guarantees the ordering invariant this relies on.

Complexity

Time O(N log N) sort + O(1) lookup. Space O(N) for the copy (or O(1) if in-place sort is allowed). Quickselect: expected O(N), worst O(N²); median-of-medians: worst O(N) with larger constant.

Implementation Requirements

Function signature should accept any iterable convertible to a list.
Do not mutate the caller’s input.
Return type: float (even for odd-length inputs, for consistency) or use a tagged return; document which.
Handle empty input explicitly with the chosen convention.

Tests

Smoke (3 normal)

assert median([3, 1, 2]) == 2
assert median([1, 2, 3, 4]) == 2.5
assert median([5, 2, 8, 1, 9]) == 5

Edge (7, more than the 3 minimum because this lab is about edges)

assert median([]) is None              # empty
assert median([42]) == 42              # singleton
assert median([1, 2]) == 1.5           # even, smallest
assert median([7, 7, 7, 7]) == 7       # all duplicates
assert median([-3, -1, -2]) == -2      # all negatives
assert median([10**9, 10**9]) == 10**9 # overflow boundary
assert median([-10**9, 10**9]) == 0    # mixed extremes

Large

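import random
import time
random.seed(0)
big = [random.randint(-10**9, 10**9) for _ in range(10**5)]
start = time.perf_counter()
result = median(big)
elapsed = time.perf_counter() - start
assert isinstance(result, (int, float))   # didn't crash
assert elapsed < 1.0, f"too slow: {elapsed:.2f}s"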

Randomized verifier

import random
import statistics

def brute_median(nums):
    # Independent oracle from the standard library, so a shared bug cannot hide
    return statistics.median(nums)

for _ in range(1000):
    n = random.randint(1, 50)
    nums = [random.randint(-100, 100) for _ in range(n)]
    assert median(nums) == brute_median(nums), nums

Invalid input

try:
    median(None)
    assert False, "should have raised"
except TypeError:
    pass

Follow-up Questions

Streaming median. Find the median as numbers arrive one at a time. → Two heaps (max-heap of lower half, min-heap of upper half). O(log N) per insert, O(1) per query.
Median of two sorted arrays. Classic LC 4 (Hard). → Binary search on the partition; O(log min(N, M)).
k-th smallest in unsorted. → Quickselect; O(N) expected.
Weighted median. Each value has a weight; find the value where cumulative weight crosses half. → Sort + prefix scan; O(N log N).
Approximate median in one pass with bounded memory. → Reservoir-sample a fixed-size subset and take its median (O(k) memory for a sample of size k), or use the P² algorithm (O(1) memory).
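The two-heap idea from the streaming-median follow-up can be sketched as below. This is one common way to write it, assuming Python's heapq (a min-heap only), so the lower half is stored negated to act as a max-heap; StreamingMedian is a hypothetical class name.

```python
import heapq

class StreamingMedian:
    """Two-heap running median: O(log N) add, O(1) median."""

    def __init__(self):
        self.low = []   # max-heap of the smaller half, stored as negated values
        self.high = []  # min-heap of the larger half

    def add(self, x):
        # Route x through the low heap so the largest of the small half moves up.
        heapq.heappush(self.low, -x)
        heapq.heappush(self.high, -heapq.heappop(self.low))
        # Rebalance so len(low) == len(high) or len(low) == len(high) + 1.
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        if not self.low:
            return None                      # same empty-input convention as median()
        if len(self.low) > len(self.high):
            return -self.low[0]              # odd count: top of the lower half
        return (-self.low[0] + self.high[0]) / 2
```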

Product Extension

Latency percentiles in distributed monitoring. P50 (median), P99, P99.9. Cannot store all latencies — use t-digest or HdrHistogram for compact mergeable approximations.
A/B testing. Comparing median user session length between buckets usually relies on bootstrap confidence intervals, because the sample median's variance has no simple closed form (it depends on the unknown density at the median).
Recommendation systems. “Median rating per item” for cold-start ranking.

Language/Runtime Follow-ups

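Python: sorted() uses TimSort, O(N log N) worst case, and returns a new list (O(N) extra memory). list.sort() sorts in place, though its merges can still use up to N/2 temporary slots. nums[n//2] is O(1) indexing.
Java: Arrays.sort(int[]) uses dual-pivot quicksort; older JDKs could degrade to O(N²) on adversarial input, while JDK 14+ documents O(N log N) on all inputs. Arrays.sort(Object[]) uses TimSort. Sorting boxed Integer values adds auto-boxing overhead.
C++: std::sort is typically introsort (quicksort with a heapsort fallback); since C++11 the standard requires O(N log N) comparisons in the worst case. std::nth_element runs in O(N) on average (a quickselect variant). Beware signed overflow in (a + b) / 2 for 32-bit int: widen to int64_t or double before adding. a + (b - a) / 2 is safe for non-negative values such as indices, but b - a can itself overflow when a and b have opposite signs.
Go: sort.Ints uses pdqsort since Go 1.19 (introsort-style before that), O(N log N) worst case. Integer arithmetic wraps silently with no overflow check; int is 32 bits on 32-bit platforms, so use int64 when sums can exceed about 2.1×10^9.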
JavaScript: Array.prototype.sort() defaults to lexicographic string comparison — [10, 9, 2].sort() returns [10, 2, 9]. Always pass a comparator: sort((a, b) => a - b).

Common Bugs

Empty input crash. s[n // 2] with n == 0 is s[0] on an empty list → IndexError.
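Integer overflow on average. In fixed-width languages, a + b overflows when it exceeds INT_MAX (or drops below INT_MIN). Widen to a 64-bit type or convert to floating point before adding; a + (b - a) / 2 only helps when a and b have the same sign.
Integer division for even-length median. In Python 2, and in Java with int operands, (a + b) / 2 truncates. In Python 3, / is true division while // is floor division. Be explicit.
Mutating caller’s array. Calling nums.sort() inside the function reorders the caller’s list in place. Use sorted(nums) to work on a copy.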
Off-by-one for even length. n // 2 is the upper middle (0-indexed); n // 2 - 1 is the lower. Confusing these gives the wrong answer for [1, 2, 3, 4].
JavaScript default sort. Returns string-sorted order for numbers.

Debugging Strategy

If your function returns the wrong value:

Print the sorted array. Is it actually sorted? (Confirms no JavaScript-style default-sort bug.)
Print n, n // 2, n % 2. Is the index what you expect?
Check parity branch — did you accidentally swap the odd/even branches?
For overflow: print s[n//2 - 1] + s[n//2] before dividing; check if it matches the expected sum.
For mutation bugs: print the input both before and after the call. If it changed, you mutated.

If you TLE on the large test, you wrote O(N²) accidentally (e.g., used insertion sort, or sorted inside a loop).

Mastery Criteria

Wrote the function correctly on the first try with all edge cases handled
Listed all 8+ edge cases aloud before writing code (time yourself: under 90 seconds)
Identified the overflow risk in (a + b) / 2 without prompting
Wrote the randomized verifier in under 5 minutes
Can recite the universal edge-case taxonomy (empty / singleton / two / duplicates / sorted / boundary / overflow / mixed) without looking
Re-applied the taxonomy to one Phase 2 problem and caught at least one edge case you previously missed

Lab 02 — Test-Driven Problem Solving (LRU Cache)

Goal

Write tests before writing the implementation. Use LRU cache (LC 146) as the anchor — a problem where ambiguities in the spec (does put of an existing key count as a “use”? what does capacity 0 mean?) are best surfaced by writing test cases first. By the end you should treat tests as a design tool, not a verification afterthought.

Background Concepts

Test-driven development (TDD) in an interview context is not the dogmatic red-green-refactor cycle. It is the discipline of writing 3–5 example calls and their expected results before writing the implementation, because:

Writing the expected output forces you to confront spec ambiguities (and ask the interviewer).
The tests double as documentation of your understanding — if the interviewer disagrees, you discover it before you’ve written 50 lines.
The tests become your verification suite — you don’t have to invent them after the fact under time pressure.
The act of choosing tests reveals edge cases you would otherwise miss.

The cost is 2–3 minutes up front. The savings are usually 10+ minutes of debugging later.

Interview Context

LRU cache is one of the most frequently asked OOD-flavored coding questions at FAANG. Google, Meta, Amazon, and Bloomberg all ask it in some form. The standard expectation is O(1) get and put using a hashmap + doubly linked list.

The senior signal is to enumerate behavioral test cases before touching code: “get on missing key returns -1 (or what?). put of existing key updates value AND marks as recently used? put over capacity evicts the LRU; what if multiple keys are tied? Does get count as a use?” These are the real questions. Candidates who code first and discover these mid-interview look junior.

Problem Statement

Design a data structure that supports:

LRUCache(int capacity) — initialize with positive capacity
int get(int key) — return value if present, else -1; using a key counts as “recently used”
void put(int key, int value) — insert or update; on overflow evict the least recently used; updating an existing key also counts as recently used

All operations must be O(1) average.

Constraints

1 ≤ capacity ≤ 3000
0 ≤ key, value ≤ 10^4
Up to 2×10^5 calls

Clarifying Questions (Surface These Before Writing Code)

Does get mark the key as recently used? (Yes — standard.)
Does put on an existing key mark as recently used? (Yes — standard. Confirm.)
Capacity of 0 — is that legal? (Constraints say ≥ 1, but worth confirming the contract.)
Eviction policy when multiple keys are tied for least recently used: can this happen? (Not in a strict LRU. Operations are sequential, so every key has a distinct last-use time and the LRU key is always unique; during the initial fill it is simply the oldest insertion.)
Thread safety required? (Almost never in the interview; always ask anyway. If yes, see Phase 8 LRU lab.)

Examples (Written as Tests First)

# Test 1: basic put/get
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
assert cache.get(1) == 1       # returns 1
cache.put(3, 3)                # evicts key 2 (LRU)
assert cache.get(2) == -1      # not found
assert cache.get(3) == 3
cache.put(4, 4)                # evicts key 1
assert cache.get(1) == -1
assert cache.get(3) == 3
assert cache.get(4) == 4

# Test 2: put on existing key updates value AND recency
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
cache.put(1, 10)               # update key 1; now order is 2 (LRU), 1 (MRU)
cache.put(3, 3)                # evicts 2, not 1
assert cache.get(2) == -1
assert cache.get(1) == 10      # updated value preserved

# Test 3: get on missing key
cache = LRUCache(1)
assert cache.get(99) == -1     # never inserted

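These three tests already lock down six design decisions: get returns the stored value, get refreshes recency, overflow evicts the LRU key, put on an existing key replaces the value, put on an existing key refreshes recency, and a missing key returns -1. Now you can write the implementation.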

Initial Brute Force

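Use a single Python dict plus an auxiliary list that tracks recency order. get: O(1) dict lookup, but moving the key to the end of the list is O(N) (list.remove scans and shifts). put: O(1) dict insert, but evicting from the front with list.pop(0) is O(N) because every remaining element shifts. Total: O(N) per op.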

Alternatively, use collections.OrderedDict which is already a hash + doubly linked list internally. move_to_end and popitem(last=False) are both O(1). Single-class solution in ~15 lines.

Brute Force Complexity

Naive dict + list: O(N) per op. With capacity 3000 and 2×10^5 calls, that is up to 2×10^5 × 3000 = 6×10^8 elementary operations: TLE.

Optimization Path

The standard answer is hashmap + doubly linked list:

Hashmap: key → node
Doubly linked list: nodes in MRU-to-LRU order
get: hashmap lookup → unlink node → relink at head (MRU)
put: if key exists, update value + move to head; else create node, insert at head; if size > capacity, remove tail (LRU) and delete from hashmap

All operations are O(1) because both the hashmap and the doubly linked list support O(1) insert/delete with a node reference.

Using OrderedDict in Python is equivalent and acceptable in interviews if you explain why it works (because it’s a hashmap + DLL internally). In Java, use LinkedHashMap with accessOrder=true.
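For comparison, an OrderedDict version might look like this minimal sketch (LRUCacheOD is a hypothetical name chosen to avoid clashing with the explicit class below). It keeps entries ordered from LRU at the front to MRU at the back, using only the documented move_to_end and popitem(last=False) methods.

```python
from collections import OrderedDict

class LRUCacheOD:
    """LRU cache on OrderedDict: front = least recently used, back = most recent."""

    def __init__(self, capacity: int):
        self.cap = capacity
        self.data = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.data:
            return -1
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)      # updating counts as a use
        self.data[key] = value              # reassigning keeps the key's position
        if len(self.data) > self.cap:
            self.data.popitem(last=False)   # evict least recently used (front)
```

Because it evicts after inserting, this version compares with > rather than the >= used before insertion in the linked-list version; both keep at most capacity entries.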

Final Expected Approach

Implement with explicit doubly linked list to demonstrate understanding:

class Node:
    __slots__ = ('key', 'val', 'prev', 'next')
    def __init__(self, key=0, val=0):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")  # spec: 1 ≤ capacity
        self.cap = capacity
        self.cache = {}
        # sentinel head/tail to avoid edge cases
        self.head = Node()
        self.tail = Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove(node)
        self._add_to_front(node)
        return node.val

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            node = self.cache[key]
            node.val = value
            self._remove(node)
            self._add_to_front(node)
        else:
            if len(self.cache) >= self.cap:
                lru = self.tail.prev
                self._remove(lru)
                del self.cache[lru.key]
            node = Node(key, value)
            self.cache[key] = node
            self._add_to_front(node)

Data Structures Used

Dict — O(1) key → node lookup
Doubly linked list with sentinels — O(1) insert at head, O(1) remove from tail or middle

Sentinels eliminate if node is None checks at the boundaries, a common source of LRU bugs.

Correctness Argument

Invariant 1: cache[key] always points to the node currently in the linked list with that key. Maintained because every insertion adds to both, every removal removes from both.

Invariant 2: The linked list is ordered MRU → LRU from head to tail. Maintained because every access (get or put on existing) moves the node to the front, and every new insert goes to the front.

Invariant 3: len(cache) ≤ capacity. Maintained because put evicts the tail (LRU) before adding when at capacity.
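Invariants 1 and 3 can be checked mechanically after every operation while debugging. A minimal sketch, assuming nodes expose .key, .prev and .next as in the Node class above; check_lru_invariants is a hypothetical helper that also verifies the back-links. Invariant 2 (ordering) is left out because it needs a reference order to compare against, which the randomized verifier supplies.

```python
def check_lru_invariants(head, tail, index, capacity):
    """Assert list/dict agreement (Invariant 1) and the size bound (Invariant 3).

    head and tail are the sentinel nodes; index is the key -> node dict.
    Intended for debugging: call it after each get/put on small tests.
    """
    seen = set()
    node = head.next
    while node is not tail:
        assert node.prev.next is node, "broken back-link"
        assert node.key not in seen, f"key {node.key} appears twice in the list"
        assert index.get(node.key) is node, f"dict and list disagree on key {node.key}"
        seen.add(node.key)
        node = node.next
    assert tail.prev.next is tail, "tail sentinel back-link broken"
    assert seen == set(index), "dict holds keys that are not in the list"
    assert len(index) <= capacity, "over capacity"
    return True
```

With the class above, a call looks like check_lru_invariants(cache.head, cache.tail, cache.cache, cache.cap).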

Complexity

get: O(1). put: O(1). Space: O(capacity).

Implementation Requirements

Use sentinels — no null checks at head/tail
Use __slots__ on the Node class in Python for memory efficiency
Keep _remove and _add_to_front as separate helpers — do not inline; the duplication is a bug magnet
Update the hashmap whenever you touch the linked list, never one without the other

Tests

Smoke (3 normal)

The three tests in the Examples section above.

Edge

# Capacity 1 — every put evicts
cache = LRUCache(1)
cache.put(1, 1)
cache.put(2, 2)
assert cache.get(1) == -1
assert cache.get(2) == 2

# Get on a key that was evicted
cache = LRUCache(2)
cache.put(1, 1); cache.put(2, 2); cache.put(3, 3)
assert cache.get(1) == -1

# Repeated put on same key never evicts
cache = LRUCache(2)
for v in range(100):
    cache.put(1, v)
assert cache.get(1) == 99

Large

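cache = LRUCache(3000)
for i in range(200000):
    cache.put(i % 5000, i)
assert cache.get(4999) == 199999   # most recent put survives
assert cache.get(2000) == 197000   # oldest key among the last 3000 distinct puts
assert cache.get(0) == -1          # last put at i = 195000, long since evicted
# Throughput check: with O(1) ops this typically finishes in under a second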

Randomized verifier (brute O(N) LRU using list)

class BruteLRU:
    def __init__(self, cap):
        self.cap = cap
        self.order = []
        self.vals = {}
    def get(self, k):
        if k not in self.vals: return -1
        self.order.remove(k); self.order.append(k)
        return self.vals[k]
    def put(self, k, v):
        if k in self.vals:
            self.order.remove(k)
        elif len(self.vals) >= self.cap:
            evict = self.order.pop(0)
            del self.vals[evict]
        self.order.append(k); self.vals[k] = v

import random
random.seed(42)
for trial in range(100):
    cap = random.randint(1, 10)
    fast = LRUCache(cap); slow = BruteLRU(cap)
    for _ in range(200):
        if random.random() < 0.5:
            k = random.randint(0, 15)
            assert fast.get(k) == slow.get(k)
        else:
            k = random.randint(0, 15); v = random.randint(0, 100)
            fast.put(k, v); slow.put(k, v)

Invalid input

# Capacity 0 — undefined by spec; behavior must be documented
try:
    LRUCache(0)
except ValueError:
    pass
# Or: assert never stores anything if zero is allowed

Follow-up Questions

Make it thread-safe. Per-op mutex is the simple answer; striped locks for higher throughput. (See Phase 8 lab.)
LFU instead of LRU. Track frequencies + a min-frequency pointer; harder, see Phase 8 lab-02.
TTL eviction. Add expiration timestamp per entry; lazy or eager eviction tradeoff.
Distributed LRU. Consistent hashing across nodes; cache coherence is now a hard problem.
Approximate LRU. Sample K random entries and evict the oldest among them (Redis approach); O(1) eviction without strict ordering.
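The per-op mutex answer to the thread-safety follow-up can be sketched as follows, assuming a single threading.Lock around an OrderedDict (ThreadSafeLRU is a hypothetical name). Note that get() needs the lock too: a hit moves the key to the MRU end, so it is a write, not a read.

```python
import threading
from collections import OrderedDict

class ThreadSafeLRU:
    """Coarse-grained thread-safe LRU: one lock guards every operation."""

    def __init__(self, capacity: int):
        self.cap = capacity
        self.data = OrderedDict()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:                      # a hit reorders, so reads lock too
            if key not in self.data:
                return -1
            self.data.move_to_end(key)
            return self.data[key]

    def put(self, key, value):
        with self.lock:
            if key in self.data:
                self.data.move_to_end(key)
            self.data[key] = value
            if len(self.data) > self.cap:
                self.data.popitem(last=False)
```

A single lock serializes all traffic; striping (several independent caches chosen by key hash) trades strict global LRU order for throughput.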

Product Extension

CPU caches. Hardware caches typically use pseudo-LRU because tracking exact LRU order in a highly associative set costs too many state bits and too much update logic.
CDN edge caches. Strict LRU loses popular content under cache pollution; LFU + admission filter (TinyLFU) is state of the art.
Database buffer pools. PostgreSQL uses a clock-sweep approximation; MySQL InnoDB uses LRU with a midpoint insertion point to resist scan pollution.
OS page replacement. Linux uses two clock lists (active + inactive) to approximate LRU at low cost.

Language/Runtime Follow-ups

Python: OrderedDict is implemented in C and uses a doubly linked list internally; move_to_end and popitem(last=False) are O(1). The __slots__ on Node avoids a per-instance __dict__, which substantially reduces per-node memory.
Java: LinkedHashMap with (capacity, 0.75f, true) constructor enables access order; override removeEldestEntry for eviction. ConcurrentHashMap does not provide LRU; you’d need Caffeine or a custom striped lock.
C++: std::list<std::pair<Key, Value>> + std::unordered_map<Key, std::list<std::pair<Key, Value>>::iterator>; std::list iterators remain valid after other inserts, erases, and splices, which is essential for this design.
Go: No built-in LRU; use container/list + a map. The third-party hashicorp/golang-lru package is the usual go-to.
Rust: Borrowing rules make a vanilla doubly linked list awkward; the lru crate handles this with raw pointers internally.

Common Bugs

Forgot to update recency on get. Tests pass for put but fail when get should “save” a key from eviction.
Forgot to update recency on put of existing key. The key that was just updated gets evicted on the next put.
Hashmap and linked list out of sync. Removed from list but not dict, or vice versa. Always update both.
No sentinels → null pointer at head/tail. Sentinels eliminate 4 different null checks.
Evicting before checking if key already exists. Update-of-existing should not trigger eviction.
Off-by-one on capacity comparison. len(cache) >= cap vs len(cache) > cap — first is correct because you’re about to add one more.

Debugging Strategy

If a test fails:

Print the linked list (head → tail with keys) after each operation. Verify it matches your expected MRU order.
Print len(cache) after each op. It should match the number of inserts minus evictions.
Cross-check: after every op, set(cache.keys()) should equal the set of keys in the linked list. If not, you have a sync bug.
Run the randomized verifier with a small seed; when it fails, print the trace of operations that led to the divergence — it will be 10–20 ops long and obvious.

Mastery Criteria

Wrote the tests before writing the implementation
Surfaced at least 3 spec ambiguities through the tests
Implementation worked on the first run (because tests forced the design to be correct)
Used sentinels in the linked list
Wrote the randomized verifier and ran it for 100+ trials with no divergence
Can explain why updating an existing key must mark it as MRU
Re-applied TDD to one Phase 8 lab and recorded how many design questions it surfaced

Lab 03 — Debugging Under Pressure (Word Break with a Planted Bug)

Goal

Build a systematic debugging protocol you can execute under interview pressure in under 5 minutes. The anchor is Word Break (LC 139) with a planted off-by-one — you’ll be given buggy code and a failing test, and the goal is to find the bug methodically rather than by panic-staring at the screen. By the end you should reach for the protocol automatically when stuck.

Background Concepts

Under pressure, most candidates default to panic debugging: re-read the code 5 times, add random print statements, change one thing, hope. This rarely works in 5 minutes and looks terrible to the interviewer.

Systematic debugging is a 6-step protocol:

Reproduce — Confirm the failing input. The smallest one that fails.
Isolate — What is the exact discrepancy? Expected X, got Y.
Hypothesize — Form a specific hypothesis: “I think the bug is in the inner loop’s bound.”
Verify — Add one targeted print or assertion that confirms or denies the hypothesis. Not five prints.
Fix — Make the minimum change that addresses the verified hypothesis.
Re-test — Run all tests, not just the failing one. Make sure you didn’t break something else.

The discipline is in steps 3 and 4. Hypothesize before you print. Random prints waste time and create noise.

Interview Context

You will hit a bug in most medium-or-harder interview problems. How you respond is a major signal:

Behavior	Signal
“It doesn’t work, let me try…” (silent typing for 3 min)	Junior — no protocol
“Let me add some prints…” (adds 8 prints, can’t read output)	Junior — random debugging
“The expected output is X, I got Y. So the difference is Z. My hypothesis is that the bug is in the loop bound — let me check by printing `i` at line 7.”	Senior — narrating the protocol
Finds the bug, then says “Let me also test the case I just fixed plus an adjacent case, in case I introduced something.”	Senior+ — proactive regression

Narrating the protocol aloud is itself the signal. The interviewer can hear that you’re a person who has debugged a thousand bugs and has a process.

Problem Statement

You are given the following implementation of Word Break. It is buggy. Find and fix the bug using the debug protocol. You may add prints/asserts but must remove them before declaring the fix complete.

def word_break(s: str, word_dict: list[str]) -> bool:
    """Return True iff s can be segmented into a sequence of words from word_dict."""
    words = set(word_dict)
    n = len(s)
    # dp[i] = True iff s[:i] can be segmented
    dp = [False] * n
    dp[0] = True
    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break
    return dp[n]

Failing test: word_break("leetcode", ["leet", "code"]) should return True. The function raises IndexError.

Constraints

1 ≤ |s| ≤ 300
1 ≤ |word_dict| ≤ 1000
All strings lowercase letters

Clarifying Questions

(Not applicable — this lab uses a fixed buggy implementation. The clarifying questions for Word Break itself are: are words reusable? Yes. Are duplicates in word_dict significant? No. Empty s? Returns True conventionally.)

Examples

word_break("leetcode", ["leet", "code"])     → True
word_break("applepenapple", ["apple", "pen"]) → True
word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]) → False
word_break("", ["any"])                       → True (empty string is trivially segmentable)

Initial Brute Force

Recursion: for each prefix of s that is in word_dict, recurse on the suffix. Time O(2^N) without memoization.

Brute Force Complexity

O(2^N) without memoization. With memoization there are N suffix start positions, each trying up to N split points: O(N²) set lookups, or O(N³) if you count the O(N) cost to slice and hash each substring.
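The memoized top-down version can be sketched like this. It is one possible formulation, assuming functools.lru_cache as the memo table plus an optional pruning step (skip prefixes longer than the longest dictionary word) that the approach above does not require; word_break_memo is a hypothetical name. Recursion depth is at most |s| + 1, which fits CPython's default limit for |s| ≤ 300.

```python
from functools import lru_cache

def word_break_memo(s: str, word_dict: list[str]) -> bool:
    """Top-down Word Break: can(i) is True iff the suffix s[i:] is segmentable."""
    words = set(word_dict)
    max_len = max(map(len, words), default=0)  # no word can match a longer prefix

    @lru_cache(maxsize=None)
    def can(i: int) -> bool:
        if i == len(s):
            return True                        # empty suffix: base case
        for j in range(i + 1, min(len(s), i + max_len) + 1):
            if s[i:j] in words and can(j):
                return True
        return False

    return can(0)
```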

Optimization Path

Bottom-up DP: dp[i] = True iff s[:i] can be segmented. Transition: dp[i] = True iff there exists j < i with dp[j] True and s[j:i] ∈ words. The buggy implementation above is the right idea — just slightly wrong.

Final Expected Approach (Correct Version)

def word_break(s, word_dict):
    words = set(word_dict)
    n = len(s)
    dp = [False] * (n + 1)   # ← THE FIX: size n+1, not n
    dp[0] = True
    for i in range(1, n + 1):
        for j in range(i):
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break
    return dp[n]

Applying the Debug Protocol (Walkthrough)

Step 1 — Reproduce

Run word_break("leetcode", ["leet", "code"]). Get IndexError: list assignment index out of range. Confirm with the exact line: the assignment dp[i] = True when i = n.

Step 2 — Isolate

The loop runs i from 1 to n inclusive (range(1, n + 1)). dp has length n. So when i == n, dp[i] is out of bounds.

Step 3 — Hypothesize

The DP array is one element too small. The semantics of dp[i] cover i = 0 (empty prefix) through i = n (full string), which is n + 1 values. The author wrote dp = [False] * n — off by one.

Step 4 — Verify

Add assert len(dp) == n + 1 after allocation. It fails, confirming the hypothesis.

Step 5 — Fix

Change dp = [False] * n to dp = [False] * (n + 1).

Step 6 — Re-test

Run the failing test → True. Run all four examples → all pass. Add an edge test word_break("", []) → True (with n = 0, dp[n] is dp[0], which is initialized to True).

Total time if narrated cleanly: under 4 minutes.

Data Structures Used

Set for O(1) word lookup
DP array of booleans

Correctness Argument

dp[0] is the base case: the empty prefix is trivially segmentable. For i > 0, dp[i] is True iff some split point j makes both halves valid: dp[j] (the left half is segmentable) and s[j:i] is a word. By induction on i, this is correct.

Complexity

Time O(N²) (i, j) pairs, each doing one set lookup; counting the O(N) cost to slice and hash each substring it is O(N³), plus O(total dictionary length) to build the set. Space O(N) for dp plus O(total dictionary length) for the set.

Implementation Requirements

DP array size must be n + 1, not n — this is the planted bug
Use a set for word lookup, not a list (O(1) vs O(W) per check)
Break out of inner loop on first success (constant factor, not asymptotic)

Tests

Smoke (3 normal)

assert word_break("leetcode", ["leet", "code"]) is True
assert word_break("applepenapple", ["apple", "pen"]) is True
assert word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]) is False

Edge

assert word_break("", ["any"]) is True              # empty string
assert word_break("a", ["a"]) is True               # single char match
assert word_break("a", ["b"]) is False              # single char no match
assert word_break("aaaa", ["a", "aa"]) is True      # overlapping dict
assert word_break("ab", ["a"]) is False             # partial match

Large

s = "a" * 300
dict_ = ["a", "aa", "aaa", "aaaa"]
assert word_break(s, dict_) is True
# Should complete in <100ms; O(N^2) = 9e4 ops

Randomized

import random, string
random.seed(0)
for _ in range(100):
    words = [''.join(random.choices(string.ascii_lowercase, k=random.randint(1, 4)))
             for _ in range(5)]
    s = ''.join(random.choice(words) for _ in range(random.randint(1, 10)))
    assert word_break(s, words) is True  # constructed to be segmentable

Invalid input

# Empty dict — empty string still works, non-empty doesn't
assert word_break("", []) is True
assert word_break("a", []) is False

Follow-up Questions

Return all segmentations. Word Break II (LC 140). DFS + memoization; exponentially many results possible.
Return one segmentation. Track parent pointers in DP; reconstruct by backtracking.
Minimum number of words. Modify DP: dp[i] = min over j with s[j:i] ∈ words of dp[j] + 1, where dp[j] is ∞ if s[:j] cannot be segmented.
Streaming input. Word Break on a stream — Aho-Corasick automaton.
Dictionary changes dynamically. Store the words in a trie so inserts and deletes are cheap, but the DP over s still has to be rerun after each change.
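The "return one segmentation" follow-up needs only one extra array on top of the corrected DP. A minimal sketch, assuming the same bottom-up loop and storing, for each reachable i, the split point j that first worked; word_break_one is a hypothetical name.

```python
def word_break_one(s: str, word_dict: list[str]):
    """Return one segmentation of s as a list of words, or None if impossible."""
    words = set(word_dict)
    n = len(s)
    ok = [False] * (n + 1)    # ok[i]: s[:i] can be segmented (size n + 1!)
    parent = [-1] * (n + 1)   # parent[i]: a j with ok[j] True and s[j:i] a word
    ok[0] = True
    for i in range(1, n + 1):
        for j in range(i):
            if ok[j] and s[j:i] in words:
                ok[i] = True
                parent[i] = j
                break
    if not ok[n]:
        return None
    out, i = [], n
    while i > 0:              # follow parent pointers back to position 0
        out.append(s[parent[i]:i])
        i = parent[i]
    return out[::-1]
```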

Product Extension

Spell-check / autocomplete segmentation. “iphoneapp” → “iphone app”. Used in URL/path tokenization.
Hashtag splitting. Twitter “#machinelearning” → “machine learning”. Same algorithm with a dictionary + frequency weights for tiebreaks.
DNS subdomain analysis. “thequickbrownfox.com” — fraud detection wants to know if the hostname is composed of dictionary words.
Chinese/Japanese word segmentation. No spaces between words; same DP with a much larger dictionary.

Language/Runtime Follow-ups

Python: s[j:i] allocates a new string each call (O(i-j) time and space). For very long strings, prefer indexing into a precomputed structure or using str.startswith against the dictionary entries.
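Java: s.substring(j, i) copies, O(i-j), since Java 7u6. It used to return an O(1) view sharing the parent’s char array; that changed because small substrings could pin large parent strings in memory. Same allocation cost as Python.
Go: s[j:i] on a string is a view: O(1), no allocation. That removes the slicing cost, though hashing the substring for a map lookup is still O(i-j).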
C++: std::string_view (C++17) gives O(1) slicing; s.substr(j, i-j) allocates.
All: Set membership of strings is O(|substring|) for hashing + O(|substring|) for equality on collision — not strictly O(1). Matters for very long substrings.
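The str.startswith suggestion for Python can be sketched as a forward DP: from every reachable position i, try each dictionary word in place with s.startswith(w, i), which compares characters without building a substring. This is one possible variant, assuming the dictionary is small enough that O(N × total dictionary length) character work beats slicing; word_break_startswith is a hypothetical name.

```python
def word_break_startswith(s: str, word_dict: list[str]) -> bool:
    """Forward DP that matches words with str.startswith(word, i) instead of slicing."""
    words = set(word_dict)             # deduplicate; iteration order does not matter
    n = len(s)
    reach = [False] * (n + 1)          # reach[i]: s[:i] is segmentable
    reach[0] = True
    for i in range(n):
        if not reach[i]:
            continue
        for w in words:
            if s.startswith(w, i):     # True only if w fits entirely at position i
                reach[i + len(w)] = True
    return reach[n]
```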

Common Bugs

Off-by-one on DP size — the planted bug
Initializing dp[0] = False — empty prefix is the base case, must be True
Forgetting break after success: the result is still correct, but performance suffers
Using list for word_dict — O(W) per lookup, blows complexity to O(N²·W)
Inclusive vs exclusive bounds: s[j:i+1] vs s[j:i] is a common off-by-one when porting between languages

Debugging Strategy

The 6-step systematic protocol:

Reproduce minimally. If the bug shows up on a 300-char string, shrink to the smallest failing case (here every input fails, down to the empty string, where dp[0] = True already raises on an empty list).
Read the exception fully. IndexError + line number tells you almost everything. Don’t skip the stack trace.
State the discrepancy precisely. “Expected True, got IndexError on dp[i] = True when i = 8 and len(dp) = 8.”
Form a specific hypothesis. Not “it’s broken somewhere”; rather, “the array is one too small for the loop range.”
Verify with one targeted print/assert. assert len(dp) > i immediately before the assignment.
Regression test. After fixing, run the full suite. In this case, also test empty string explicitly since that’s the boundary.

The 5-minute panic protocol (when truly stuck):

Stop typing.
State aloud: “I’m stuck. Let me restate what I know.”
Restate the input and expected output.
State what your code does for that input, step by step.
The bug almost always reveals itself in the gap between “what the code does” and “what should happen.”

Mastery Criteria

Found the planted bug in under 5 minutes using the protocol
Narrated each step aloud (or in notes) — Reproduce, Isolate, Hypothesize, Verify, Fix, Re-test
Added exactly ONE targeted assertion to verify the hypothesis (not 5 prints)
Ran the full test suite after the fix, not just the failing one
Wrote the empty-string edge test that would have caught this bug originally
Can recite the 6-step protocol from memory
Applied the protocol to one of your own past wrong-answer submissions and timed yourself

Lab 04 — Correctness Proofs (Binary Search & Kadane’s Algorithm)

Goal

Prove the correctness of two short, foundational algorithms — binary search and Kadane’s — using loop invariants and induction. By the end you should be able to state the invariant for any loop you write and use it both to prove correctness and to find bugs before they manifest.

Background Concepts

A loop invariant is a statement that is true:

Initially (before the loop starts)
Maintained (if true before an iteration, true after)
Terminating (when the loop exits, it implies the desired post-condition)

This is induction on iterations. The invariant is what your loop promises about its state. If you can state the invariant out loud while coding, off-by-one bugs disappear because you can check each iteration against the promise.

A monovariant is a quantity that strictly decreases (or increases) each iteration and is bounded — it proves termination. For binary search, the monovariant is the search-range width.

An inductive proof for a recursive function: prove the base case correct; assume the recursive call is correct (induction hypothesis); show the combination is correct.

Interview Context

Interviewers rarely demand a formal proof, but they constantly ask “are you sure this works?” or “why does this work?” The candidates who answer with a precise invariant (“at the top of the loop, lo is the smallest index that could be the answer and hi is one past the largest”) look senior. The candidates who say “uh, I think so” look junior even when the code is correct.

For DP problems, “what’s the state, what’s the transition, and why does the order of iteration give you the correct value when you read it?” is the proof — interviewers explicitly ask this at Meta, Google, and Bloomberg.

Problem Statement

Prove correctness of two algorithms:

Part A: Binary search. Given a sorted array a and a target t, return the index of t if present, else -1.

Part B: Kadane’s algorithm. Given an array of integers (positive, negative, mixed), return the maximum sum of any non-empty contiguous subarray.

For each, you must:

Write the code
State the loop invariant precisely
Prove the invariant holds initially, is maintained, and implies correctness on exit
Identify the monovariant that proves termination

Constraints

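1 ≤ |a| ≤ 10^5 for Kadane (a non-empty subarray must exist); binary search also accepts the empty array, so 0 ≤ |a| ≤ 10^5 there
Values: -10^9 ≤ a[i] ≤ 10^9 (Kadane: sums can reach 10^5 × 10^9 = 10^14, so use a 64-bit accumulator in Java/C++)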

Clarifying Questions

(Standard problem statements; the lab is about proof, not problem disambiguation.)

Examples

Binary search:

a = [1, 3, 5, 7, 9, 11], t = 7  → 3
a = [1, 3, 5, 7, 9, 11], t = 4  → -1
a = [],                  t = 1  → -1

Kadane:

[-2, 1, -3, 4, -1, 2, 1, -5, 4]   → 6  (subarray [4, -1, 2, 1])
[-3, -1, -2]                       → -1 (best single element)
[5]                                → 5

Initial Brute Force

Binary search: linear scan, O(N). Kadane: try every (start, end) pair and sum the elements in between, O(N³); with prefix sums each range sum is O(1), for O(N²) total.

Brute Force Complexity

Linear: O(N). Triple loop: O(N³). Both correct, both slow.
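The O(N²) prefix-sum brute force is worth keeping as a test oracle for Kadane. A minimal sketch, assuming a non-empty integer list; max_subarray_brute is a hypothetical name.

```python
from itertools import accumulate

def max_subarray_brute(a):
    """O(N^2) reference: try every (i, j) pair using prefix sums."""
    prefix = [0, *accumulate(a)]       # prefix[k] = sum(a[:k])
    best = a[0]                        # a non-empty subarray is required
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            best = max(best, prefix[j] - prefix[i])   # sum of a[i:j]
    return best
```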

Optimization Path

Binary search: O(log N) by halving the candidate range each step. Kadane: O(N) by maintaining the best subarray ending at index i and the best seen so far.

Final Expected Approach

Part A — Binary Search (with Proof)

def binary_search(a, t):
    lo, hi = 0, len(a)              # half-open: search range is [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] == t:
            return mid
        elif a[mid] < t:
            lo = mid + 1
        else:
            hi = mid
    return -1

Loop invariant: At the top of every iteration, if t is present in a, then t is at some index in [lo, hi).

Initialization: lo = 0, hi = len(a). If t is present, it’s at some index in [0, len(a)) by definition. Invariant holds.

Maintenance: Assume invariant holds before an iteration. Compute mid.

If a[mid] == t, return immediately — correct.
If a[mid] < t: because a is sorted, every index ≤ mid has value ≤ a[mid] < t, so t is not at any of those indices. If t is in a, it must be in [mid+1, hi). Setting lo = mid + 1 preserves the invariant.
If a[mid] > t: symmetric; every index ≥ mid has value ≥ a[mid] > t, so if t is in a, it must be in [lo, mid). Setting hi = mid preserves the invariant.

Termination & post-condition: The monovariant is hi - lo. While lo < hi, mid = lo + (hi - lo) // 2 satisfies lo ≤ mid < hi, so the update lo' = mid + 1 > lo (or hi' = mid < hi) makes hi - lo strictly smaller. Both updates also keep lo ≤ hi, so hi - lo is bounded below by 0 and the loop terminates. On exit lo == hi and the search range is empty. By the invariant, if t were present, it would be in an empty range, a contradiction. So t is absent, and returning -1 is correct.

Critical subtlety, mid = lo + (hi - lo) // 2 vs (lo + hi) // 2: the former avoids integer overflow when lo + hi > INT_MAX. In Python this doesn’t matter (integers have arbitrary precision), but in Java/C++ it’s a real bug. A famous example: the JDK’s Arrays.binarySearch carried this bug for about 9 years (Bloch, 2006).
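Python integers never overflow, so the bug cannot be reproduced directly. The sketch below simulates Java’s 32-bit wrapping int arithmetic instead (as_int32, java_div, mid_naive, and mid_safe are hypothetical helpers for illustration). In C++, signed overflow is undefined behavior rather than a guaranteed wrap, so this models Java only.

```python
def as_int32(x):
    """Wrap a Python int into the signed 32-bit range, as Java int arithmetic does."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def java_div(a, b):
    """Java integer division truncates toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def mid_naive(lo, hi):
    return java_div(as_int32(lo + hi), 2)                 # (lo + hi) / 2

def mid_safe(lo, hi):
    return as_int32(lo + java_div(as_int32(hi - lo), 2))  # lo + (hi - lo) / 2

lo, hi = 1_500_000_000, 2_000_000_000   # both valid int values, but lo + hi > 2^31 - 1
print(mid_naive(lo, hi))   # -397483648: a negative, out-of-bounds index
print(mid_safe(lo, hi))    # 1750000000
```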

Part B — Kadane’s Algorithm (with Proof)

def kadane(a):
    best_here = best_overall = a[0]
    for i in range(1, len(a)):
        best_here = max(a[i], best_here + a[i])
        best_overall = max(best_overall, best_here)
    return best_overall

Loop invariant: At the top of the iteration with loop index i (i runs from 1 to n - 1, where n = len(a)):

best_here is the maximum sum of any contiguous subarray ending at index i - 1.
best_overall is the maximum sum of any contiguous subarray within a[0..i-1] inclusive.

Initialization: Before the loop (i.e., before i = 1), the only subarray of a[0..0] is [a[0]] with sum a[0]. Both best_here and best_overall are set to a[0]. Invariant holds.

Maintenance: Assume the invariant holds at the start of iteration i. Consider all contiguous subarrays ending at index i. Each such subarray is either:

The singleton [a[i]], with sum a[i], OR
An extension of a subarray ending at i - 1, with sum (sum of that subarray) + a[i].

The best extension is best_here + a[i] (by the invariant on best_here). So the best subarray ending at i has sum max(a[i], best_here + a[i]), which is exactly the new best_here. Invariant clause 1 maintained.

The best subarray within a[0..i] is either entirely within a[0..i-1] (covered by old best_overall) or ends at i (covered by new best_here). The new best_overall = max(old best_overall, new best_here) is therefore correct. Clause 2 maintained.

Termination & post-condition: The loop runs exactly n - 1 iterations (a for loop over a fixed range, so no monovariant is needed). On exit, the invariant holds with i = n, so best_overall is the max contiguous subarray sum within a[0..n-1], i.e., the whole array. Returning it is correct.

Edge case proof: Kadane requires non-empty input (the problem guarantees this); for an all-negative array, the answer is the maximum single element. The invariant handles this because best_here resets to a[i] whenever best_here + a[i] < a[i], i.e., whenever best_here < 0. This is why the algorithm works for negative-only arrays. A common bug is to initialize best_here and best_overall to 0, which implicitly allows the empty subarray and incorrectly returns 0 for all-negative input.
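To mirror the invariant-checking idea used for binary search, here is a debug-only sketch (kadane_checked and kadane_zero_init_buggy are illustrative names) that recomputes both invariant clauses by brute force at the top of every iteration, plus the zero-initialized variant for contrast. The checks cost O(N³) per iteration, so use them only on tiny inputs.

```python
def kadane_checked(a):
    """Kadane's algorithm with both loop-invariant clauses asserted (debug only)."""
    def best_ending_at(j):            # max sum of a subarray ending exactly at index j
        return max(sum(a[s:j + 1]) for s in range(j + 1))

    def best_within(j):               # max sum of a subarray inside a[0..j]
        return max(best_ending_at(e) for e in range(j + 1))

    best_here = best_overall = a[0]
    for i in range(1, len(a)):
        assert best_here == best_ending_at(i - 1), "clause 1 violated"
        assert best_overall == best_within(i - 1), "clause 2 violated"
        best_here = max(a[i], best_here + a[i])
        best_overall = max(best_overall, best_here)
    assert best_overall == best_within(len(a) - 1), "post-condition violated"
    return best_overall

def kadane_zero_init_buggy(a):
    """BUG, kept for contrast: starting at 0 silently allows the empty subarray."""
    best_here = best_overall = 0
    for x in a:
        best_here = max(x, best_here + x)
        best_overall = max(best_overall, best_here)
    return best_overall
```

Both agree whenever the answer is positive; only the all-negative case exposes the bug (kadane_zero_init_buggy([-3, -1, -2]) returns 0 instead of -1).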

Data Structures Used

Plain arrays
Two integer variables for Kadane
Two integer indices for binary search

Correctness Argument

See Part A and Part B above.

Complexity

Binary search: O(log N) time, O(1) space.
Kadane: O(N) time, O(1) space.

Implementation Requirements

Use lo + (hi - lo) // 2 for binary search midpoint
Use half-open interval [lo, hi) for binary search — easier to reason about than closed [lo, hi]
Initialize Kadane’s best_here and best_overall to a[0], not 0, to handle all-negative arrays

Tests

Smoke

assert binary_search([1, 3, 5, 7, 9], 5) == 2
assert binary_search([1, 3, 5, 7, 9], 4) == -1
assert kadane([-2, 1, -3, 4, -1, 2, 1, -5, 4]) == 6

Edge

# Binary search edges
assert binary_search([], 1) == -1
assert binary_search([5], 5) == 0
assert binary_search([5], 4) == -1
assert binary_search([1, 1, 1, 1], 1) in (0, 1, 2, 3)  # any valid index

# Kadane edges
assert kadane([5]) == 5
assert kadane([-3, -1, -2]) == -1
assert kadane([1, 2, 3, 4]) == 10        # all positive
assert kadane([-1, -2, -3, -4]) == -1    # all negative

Large

import random
random.seed(0)
big = sorted(random.sample(range(10**6), 10**5))
assert binary_search(big, big[50000]) == 50000

big2 = [random.randint(-10**6, 10**6) for _ in range(10**5)]
result = kadane(big2)
assert isinstance(result, int)

Randomized verifier

def brute_kadane(a):
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i+1, len(a)+1))

for _ in range(200):
    a = [random.randint(-50, 50) for _ in range(random.randint(1, 30))]
    assert kadane(a) == brute_kadane(a)

Invariant assertions (the proof, in code)

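def binary_search_with_assertions(a, t):
    # Debug only: each check scans a, so this runs in O(N log N) overall.
    lo, hi = 0, len(a)
    while lo < hi:
        # INVARIANT: if t in a, then t at some index in [lo, hi)
        assert t not in a or t in a[lo:hi], "invariant violated"
        # What the updates actually maintain (it implies the invariant):
        # everything before lo is < t, everything from hi on is > t
        assert all(x < t for x in a[:lo]), "invariant violated (value >= t before lo)"
        assert all(x > t for x in a[hi:]), "invariant violated (value <= t at or after hi)"
        mid = lo + (hi - lo) // 2
        if a[mid] == t: return mid
        elif a[mid] < t: lo = mid + 1
        else: hi = mid
    return -1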

Follow-up Questions

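Find leftmost vs rightmost occurrence of t. Modify binary search so it never returns early; for leftmost, the invariant becomes “every index < lo holds a value < t and every index ≥ hi holds a value ≥ t”, so the answer is lo when the loop ends, rather than “if t is present, it’s in [lo, hi)”.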
Binary search on real numbers. Replace integer halving with floating-point; the invariant is the same but termination uses a precision threshold, not lo < hi.
Kadane with at most K negative numbers allowed. State expands to (i, k); DP, O(NK).
Maximum sum circular subarray. Two passes of Kadane + total-sum trick; the invariant for the circular case is more subtle.
Maximum product subarray. Maintain both max and min products at each index, because multiplying the most negative product by a negative number can produce the new maximum.
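A sketch of the leftmost/rightmost variant for sorted input, using the half-open convention from Part A (lower_bound, upper_bound, leftmost, and rightmost are illustrative names; the first two return the same values as Python’s bisect.bisect_left and bisect.bisect_right). Neither loop returns early: equal elements push hi left in lower_bound and lo right in upper_bound.

```python
def lower_bound(a, t):
    """First index i with a[i] >= t, or len(a) if there is none."""
    lo, hi = 0, len(a)
    # INVARIANT: a[i] < t for every i < lo, and a[i] >= t for every i >= hi,
    # so the answer always lies in the closed range [lo, hi].
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] < t:
            lo = mid + 1
        else:
            hi = mid
    return lo  # lo == hi here

def upper_bound(a, t):
    """First index i with a[i] > t, or len(a) if there is none."""
    lo, hi = 0, len(a)
    # INVARIANT: a[i] <= t for every i < lo, and a[i] > t for every i >= hi.
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] <= t:
            lo = mid + 1
        else:
            hi = mid
    return lo

def leftmost(a, t):
    i = lower_bound(a, t)
    return i if i < len(a) and a[i] == t else -1

def rightmost(a, t):
    j = upper_bound(a, t) - 1
    return j if j >= 0 and a[j] == t else -1
```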

Product Extension

Database B-tree page searches — binary search within a page; the invariant analysis directly applies.
Time-series anomaly detection — Kadane variants find the largest cumulative deviation, used in change-point detection.
Streaming Kadane — given a stream of metrics, find the worst-degradation window. Same algorithm with O(1) memory.

Language/Runtime Follow-ups

Python: mid = (lo + hi) // 2 is safe (arbitrary precision); mid = lo + (hi - lo) // 2 is still preferable for portability.
Java/C++: (lo + hi) / 2 overflows when lo + hi > 2^31 - 1. Use lo + (hi - lo) / 2 or (lo + hi) >>> 1 (Java unsigned shift).
Kadane overflow: for |a[i]| ≤ 10^9 and N = 10^5, max sum is 10^14 — exceeds 32-bit int. Use long/int64 in Java/Go/C++.
Floating-point Kadane: accumulation error compounds; use Kahan summation if precision matters.

Common Bugs

(lo + hi) // 2 overflow in C++/Java
Closed-interval binary search with lo <= hi is correct but the mid updates are trickier; pick a convention and stick with it
Kadane initialized to 0 — fails on all-negative arrays
Forgetting best_overall update — returns the best ending at the last position, not overall
Empty input to Kadane — undefined; problem statement says non-empty, but check the contract
Binary search infinite loop when lo = mid instead of lo = mid + 1 — the monovariant doesn’t decrease (once hi - lo == 1, mid == lo, so lo = mid changes nothing)

Debugging Strategy

When binary search loops forever or returns wrong index:

Print (lo, mid, hi) each iteration. If hi - lo does not strictly shrink on every iteration, one of your updates has an off-by-one.
Check the boundary condition: does your invariant include or exclude hi?
For “find leftmost”, the answer is at lo after the loop, not mid.
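A sketch of that debugging loop (traced_binary_search and its buggy_lo_update flag are hypothetical, for illustration): it prints the state each iteration and raises as soon as hi - lo fails to shrink, which turns the lo = mid infinite loop into an immediate, explained failure.

```python
def traced_binary_search(a, t, buggy_lo_update=False, verbose=True):
    """Half-open binary search that checks the monovariant hi - lo every iteration."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if verbose:
            print(f"lo={lo} mid={mid} hi={hi}")
        width = hi - lo
        if a[mid] == t:
            return mid
        elif a[mid] < t:
            lo = mid if buggy_lo_update else mid + 1   # the flag plants the classic bug
        else:
            hi = mid
        if hi - lo >= width:
            raise AssertionError(f"monovariant did not shrink: lo={lo}, hi={hi}")
    return -1
```

For example, traced_binary_search([1, 3], 4, buggy_lo_update=True) prints two iterations and then raises, because at lo=1, hi=2 the buggy update leaves lo unchanged.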

When Kadane returns 0 on all-negative input:

Check initialization — should be a[0], not 0.
Print (best_here, best_overall) at each step; trace by hand against the expected.

Mastery Criteria

Wrote both algorithms correctly without testing first (proof-first coding)
Stated the loop invariant for each in one sentence
Identified the monovariant for binary search termination
Proved correctness by induction (3 steps: init, maintain, terminate)
Recognized the (lo + hi) / 2 overflow risk without prompting
Explained why Kadane’s init must be a[0] and not 0
Wrote loop invariants as comments in your code for one Phase 5 DP problem

Lab 05 — Stress Testing Harness (Two-Pointer Variants)

Goal

Build a reusable stress-testing harness: a randomized input generator + a known-correct brute-force verifier + a diff loop that finds the smallest failing input. This is the single most valuable debugging tool in competitive programming, and it is shockingly underused in interview prep. After this lab you should reach for the harness automatically whenever your solution passes the given examples but you don’t trust it.

Background Concepts

A stress test has three components:

Generator (gen) — produces random valid inputs, parameterized by size and seed for reproducibility.
Brute force (brute) — a known-correct slow solution. Often O(N²) or O(2^N), valid only for tiny N.
Fast solution (fast) — the optimized solution you’re testing.

The harness loops: generate input → run both → compare. On mismatch, print the input and both outputs and stop. Then shrink the failing input to the smallest case that still fails — this is what makes debugging fast.

Why this works: the brute force is simple enough to be correct by inspection (it tries everything), so any discrepancy is almost always your bug, not the brute force’s. Random testing covers cases you didn’t think of.

Why people don’t use it: they think writing brute force is “wasted time.” It is not; in interview prep, the brute force is also your starting point for the optimization conversation with the interviewer.

Interview Context

Stress testing rarely happens during a 45-min interview, but the practice habit shows up in your interview performance:

You instantly know how to write the brute force (which the interviewer always wants you to articulate first).
You catch bugs in practice that you would otherwise internalize as correct and then reproduce mid-interview.
You build pattern recognition for “this kind of two-pointer has off-by-one risk” because you’ve seen the harness flag them.

At trading firms (Jane Street, Hudson River Trading, Citadel) and at top-tier interviews (Google L6+), interviewers will sometimes ask “how would you verify this is correct beyond running the examples?” The answer is the stress harness.

Problem Statement

Implement and stress-test three two-pointer problems known to have subtle off-by-one bugs:

A. Two Sum II (sorted array). Given a sorted array a and target t, return indices (i, j) with i < j and a[i] + a[j] == t, or (-1, -1) if no such pair exists.

B. Container With Most Water (LC 11). Given heights h, return the maximum of (j - i) * min(h[i], h[j]) over all index pairs i < j.

C. 3Sum (LC 15). Given nums, return all unique triples of values (a, b, c), taken from three distinct indices, with a + b + c == 0. Each triple sorted ascending; output deduplicated.

For each, write the fast solution, write the brute force, and write the stress harness. Run for ≥1000 random trials.

Constraints

|a| ≤ 30 for stress trials (the generators below keep inputs this small so the brute force stays fast); ≤ 10^5 for the real fast solution
-1000 ≤ a[i] ≤ 1000

Clarifying Questions

(These problems are standard; the lab is about the harness, not disambiguation.)

Examples

two_sum_sorted([1, 3, 4, 5, 7], 9)  → (2, 3)   # 4 + 5 = 9
container([1, 8, 6, 2, 5, 4, 8, 3, 7]) → 49     # i=1 (h=8), j=8 (h=7): width 7 × height 7
three_sum([-1, 0, 1, 2, -1, -4]) → [[-1, -1, 2], [-1, 0, 1]]

Initial Brute Force

Two Sum: O(N²) double loop. Trivially correct. Container: O(N²) double loop, take max. 3Sum: O(N³) triple loop; collect, sort each triple, deduplicate via set of tuples.

Brute Force Complexity

O(N²), O(N²), O(N³). Each runs in well under a second for N ≤ 100.

Optimization Path

All three are classic two-pointer problems. With the array sorted (given for Two Sum II, done explicitly for 3Sum; Container needs no sort), the pointers start at both ends and move inward based on a comparison.

Final Expected Approach

Fast solutions

def two_sum_sorted(a, t):
    i, j = 0, len(a) - 1
    while i < j:
        s = a[i] + a[j]
        if s == t: return (i, j)
        elif s < t: i += 1
        else: j -= 1
    return (-1, -1)

def container(h):
    i, j = 0, len(h) - 1
    best = 0
    while i < j:
        best = max(best, (j - i) * min(h[i], h[j]))
        if h[i] < h[j]: i += 1
        else: j -= 1
    return best

def three_sum(nums):
    nums = sorted(nums)
    n = len(nums)
    res = []
    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i-1]: continue   # skip dup anchor
        j, k = i + 1, n - 1
        while j < k:
            s = nums[i] + nums[j] + nums[k]
            if s == 0:
                res.append([nums[i], nums[j], nums[k]])
                j += 1; k -= 1
                while j < k and nums[j] == nums[j-1]: j += 1   # skip dup j
                while j < k and nums[k] == nums[k+1]: k -= 1   # skip dup k
            elif s < 0: j += 1
            else: k -= 1
    return res

Brute forces

def brute_two_sum(a, t):
    for i in range(len(a)):
        for j in range(i+1, len(a)):
            if a[i] + a[j] == t: return (i, j)
    return (-1, -1)

def brute_container(h):
    best = 0
    for i in range(len(h)):
        for j in range(i+1, len(h)):
            best = max(best, (j - i) * min(h[i], h[j]))
    return best

def brute_3sum(nums):
    n = len(nums)
    found = set()
    for i in range(n):
        for j in range(i+1, n):
            for k in range(j+1, n):
                if nums[i] + nums[j] + nums[k] == 0:
                    found.add(tuple(sorted([nums[i], nums[j], nums[k]])))
    return sorted([list(t) for t in found])

The Stress Harness

import random

def stress(gen, brute, fast, normalize, trials=2000, seed=0):
    random.seed(seed)
    for t in range(trials):
        inp = gen()
        b = normalize(brute(*inp))
        f = normalize(fast(*inp))
        if b != f:
            print(f"FAIL on trial {t}")
            print(f"  input: {inp}")
            print(f"  brute: {b}")
            print(f"  fast:  {f}")
            # Shrink: try to find a smaller failing input
            shrunk = shrink_input(inp, brute, fast, normalize)
            print(f"  smallest failing input: {shrunk}")
            return False
    print(f"PASS {trials} trials")
    return True

def shrink_input(inp, brute, fast, normalize):
    """Greedy shrink — drop elements one at a time, keep if still fails."""
    arr, *rest = inp
    current = list(arr)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i+1:]
            if len(candidate) < 2: continue
            try:
                if normalize(brute(candidate, *rest)) != normalize(fast(candidate, *rest)):
                    current = candidate; changed = True; break
            except Exception:
                continue
    return (current, *rest)

# Generators
def gen_two_sum():
    n = random.randint(2, 20)
    a = sorted(random.randint(-30, 30) for _ in range(n))
    t = random.randint(-60, 60)
    return (a, t)

def gen_container():
    n = random.randint(2, 30)
    return ([random.randint(0, 20) for _ in range(n)],)

def gen_3sum():
    n = random.randint(3, 15)
    return ([random.randint(-10, 10) for _ in range(n)],)

# Normalizers (canonicalize output before comparison)
def norm_two_sum(r):
    # Compares only "found" vs. "not found": any valid pair is acceptable, so indices may differ.
    # Caveat: this does not check that the returned pair sums to t (the normalizer never sees the input).
    # NOTE: If indices must match exactly, change the brute force to scan in two-pointer order
    if r == (-1, -1): return None
    return "found"

def norm_container(x): return x   # int, direct compare
def norm_3sum(triples): return sorted([sorted(t) for t in triples])

# Run
stress(gen_two_sum, brute_two_sum, two_sum_sorted, norm_two_sum)
stress(gen_container, brute_container, container, norm_container)
stress(gen_3sum, brute_3sum, three_sum, norm_3sum)
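The norm_two_sum normalizer only compares found vs. not found, so a fast solution that returned a wrong pair would still pass. One way to close that gap, sketched here with hypothetical helpers check_two_sum and stress_with_checker (not part of the harness above), is to validate each fast answer directly against its input. The checker repeats an O(N²) existence scan, so it is still a test-only cost.

```python
import random

def check_two_sum(a, t, r):
    """True iff r is a correct answer for sorted a and target t."""
    exists = any(a[i] + a[j] == t
                 for i in range(len(a)) for j in range(i + 1, len(a)))
    if r == (-1, -1):
        return not exists               # "not found" is correct only if no pair exists
    i, j = r
    return 0 <= i < j < len(a) and a[i] + a[j] == t

def stress_with_checker(gen, fast, check, trials=2000, seed=0):
    random.seed(seed)                   # reproducible, like stress()
    for trial in range(trials):
        inp = gen()
        out = fast(*inp)
        if not check(*inp, out):
            print(f"FAIL on trial {trial}: input={inp} output={out}")
            return False
    print(f"PASS {trials} trials")
    return True
```

With the harness above loaded, stress_with_checker(gen_two_sum, two_sum_sorted, check_two_sum) reuses the same generator, and a fast solution that returns a wrong pair now fails instead of normalizing to "found".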

Data Structures Used

Lists of integers
Set of tuples for 3Sum deduplication in brute force
A small library of helper functions (gen, brute, fast, normalize, stress, shrink_input) that you reuse across problems

Correctness Argument

The brute force is correct because it enumerates all candidates (O(N^k) for k-sum). Any output from the fast solution that disagrees with the brute is a bug in the fast solution. Random sampling over thousands of trials gives high confidence (though not certainty) that the fast solution is correct; the smaller the input space (e.g., values in [-10, 10] with N ≤ 15), the larger the fraction of it the trials cover, and the more confident you can be.

Complexity

The harness is test-only, so it does not change the fast solution’s complexity. Each trial costs O(brute), which is the bottleneck; with N ≤ 30 it runs ~2000 trials in under 2 seconds.

Implementation Requirements

Use random.seed() for reproducibility — failures must be re-runnable
Print the failing input before the outputs, so you can copy-paste and re-run
Always normalize outputs before comparison (canonical sort order, etc.)
Implement shrinking — a 20-element failure is hard to debug; a 4-element failure is obvious

Tests

The harness itself, as a test

# Sanity: plant a bug in the fast solution and verify the harness catches it
def buggy_two_sum(a, t):
    i, j = 0, len(a) - 1
    while i < j:
        s = a[i] + a[j]
        if s == t: return (i, j)
        elif s < t: i += 1
        else: j -= 1
    return (0, 0)   # BUG: should return (-1, -1)

assert stress(gen_two_sum, brute_two_sum, buggy_two_sum, norm_two_sum, trials=500) is False

Pass-through on the correct solutions

assert stress(gen_two_sum, brute_two_sum, two_sum_sorted, norm_two_sum, trials=2000) is True
assert stress(gen_container, brute_container, container, norm_container, trials=2000) is True
assert stress(gen_3sum, brute_3sum, three_sum, norm_3sum, trials=2000) is True

Edge generators

Add specialized generators that stress edge cases:

def gen_two_sum_edge():
    """Heavy on duplicates and boundary targets."""
    n = random.randint(2, 10)
    a = sorted([random.choice([-1, 0, 1]) for _ in range(n)])
    t = random.choice([-2, 0, 2])
    return (a, t)

Follow-up Questions

Generator-based testing (Hypothesis library). Python’s hypothesis library generates inputs and shrinks them automatically. Show how to convert the harness into Hypothesis strategies.
Detecting performance regressions. Add timing to the harness; flag when fast > 10× the previous run on the same seed.
Coverage-guided fuzzing. Use atheris or similar to mutate inputs that increase code coverage; finds rarer bugs than purely random.
Concurrent stress testing. Run brute and fast on different threads; useful for testing thread-safe versions.
What if there’s no brute force? Then write two independent fast solutions (different algorithms) and stress them against each other. Common for geometry problems.
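A sketch of that differential setup, using a deliberately simple stand-in problem (does a sorted array contain a pair summing to t?) rather than geometry: two independent algorithms, a two-pointer scan and a hash-set scan, run on the same random inputs, and the first disagreement is reported. The names differential, has_pair_two_pointer, has_pair_hash, and gen_sorted are illustrative. Each run uses its own random.Random instance, so it is reproducible without touching the global generator.

```python
import random

def has_pair_two_pointer(a, t):
    i, j = 0, len(a) - 1
    while i < j:
        s = a[i] + a[j]
        if s == t:
            return True
        if s < t:
            i += 1
        else:
            j -= 1
    return False

def has_pair_hash(a, t):
    seen = set()
    for x in a:
        if t - x in seen:       # some earlier element completes the pair
            return True
        seen.add(x)
    return False

def gen_sorted(rng):
    a = sorted(rng.randint(-30, 30) for _ in range(rng.randint(0, 20)))
    return (a, rng.randint(-60, 60))

def differential(gen, impl_a, impl_b, trials=2000, seed=0):
    rng = random.Random(seed)   # private generator, independent of the global random state
    for trial in range(trials):
        inp = gen(rng)
        ra, rb = impl_a(*inp), impl_b(*inp)
        if ra != rb:
            return trial, inp, ra, rb    # first disagreement
    return None                          # all trials agreed
```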

Product Extension

CI-integrated fuzzing. Google’s OSS-Fuzz runs continuous coverage-guided fuzzing on open-source projects; finds thousands of bugs annually.
Property-based testing in production. Stripe, Jane Street, Klarna use property tests to validate financial logic where the brute force is “the spec.”
Differential testing. Compare two implementations of the same protocol (e.g., two JSON parsers) on random inputs to find spec ambiguities.

Language/Runtime Follow-ups

Python: hypothesis is the gold standard for property-based testing. The module-level random.seed() seeds a single generator shared by all threads, so for parallel stress give each worker its own random.Random(seed) instance.
Java: jqwik or junit-quickcheck for property-based; JMH for performance regression detection.
Go: Built-in testing/quick and (Go 1.18+) native fuzzing with go test -fuzz.
C++: rapidcheck (QuickCheck-style); LLVM’s libFuzzer for coverage-guided.
Rust: proptest and quickcheck crates; cargo fuzz (the cargo-fuzz subcommand, built on libFuzzer) for coverage-guided fuzzing.

Common Bugs

Non-reproducible failures — forgot random.seed(); can’t re-run the failing case.
Output comparison fails due to ordering — set vs list, dict iteration order; always normalize.
Brute force itself is buggy — verify the brute on the given problem examples first.
Generator produces invalid inputs — e.g., for Two Sum Sorted, the generator must produce a sorted array. Verify with assert all(a[i] <= a[i+1] for i in range(len(a)-1)).
Shrinker breaks the input invariant — for Two Sum Sorted, dropping an element keeps the array sorted; but for a tree-structured input, dropping a node may break invariants. Write a custom shrinker per problem.

Debugging Strategy

When the harness reports a failure:

Read the smallest failing input that the shrinker produced. If it’s ≤ 5 elements, trace by hand.
Run only the fast solution with prints on that small input. Compare to expected.
The bug is almost always in a boundary condition — empty input, single element, all duplicates, exact-target match.
If the bug only appears with duplicates, suspect your dedup logic (3Sum is famous for this).
If the bug only appears with negatives, suspect signed comparisons or abs() misuse.

Mastery Criteria

Built the stress harness in under 20 minutes for the three target problems
Caught at least one bug by planting one and verifying the harness flagged it
Wrote a shrinker that reduces failures to ≤ 10 elements
Ran 2000+ trials per problem with no failure
Built a reusable harness module you can drop into any future problem
Applied the harness to one Phase 2 or Phase 5 problem and either confirmed correctness or found a bug
Can explain why random testing complements (not replaces) edge-case enumeration

Lab 06 — Performance Profiling (Three LIS Implementations)

Goal

Measure and compare three implementations of Longest Increasing Subsequence: O(N²) DP, O(N log N) patience sort, and a poorly-written “O(N log N)” with a hidden linear scan inside its main loop. Use profiling tools to detect the discrepancy between claimed complexity and actual behavior. By the end you should never again submit a solution thinking “this should be fast enough” without measuring.

Background Concepts

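Empirical complexity verification: if a function’s running time grows like f(N) (a tight bound, with N large enough that lower-order terms are negligible), then running it at N and 2N should produce runtimes whose ratio is about f(2N) / f(N). For O(N): 2×. For O(N log N): slightly above 2× (about 2.1× to 2.2× for N between thousands and millions). For O(N²): 4×. For O(N³): 8×. Measure the ratio; a mismatch reveals the bug.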
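The predicted ratios are easy to compute rather than memorize. A small sketch (expected_ratio and MODELS are illustrative names; N^1.5 is included only as an example of a growth rate between N log N and N²):

```python
import math

def expected_ratio(f, n):
    """Predicted T(2n) / T(n) if running time grows like f (lower-order terms ignored)."""
    return f(2 * n) / f(n)

MODELS = {
    "N": lambda n: n,
    "N log N": lambda n: n * math.log(n),
    "N^1.5": lambda n: n ** 1.5,
    "N^2": lambda n: n * n,
    "N^3": lambda n: n ** 3,
}

for name, f in MODELS.items():
    print(f"{name:>8}: {expected_ratio(f, 1_000):.2f}x at N=1e3, "
          f"{expected_ratio(f, 1_000_000):.2f}x at N=1e6")
```

The N log N row prints 2.20x at N = 1000 and 2.10x at N = 10^6, which is why a genuine O(N log N) doubling ratio sits only slightly above 2×.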

Profiling tools:

Python: cProfile for function-level timing; line_profiler for line-by-line; py-spy for sampling without code changes; tracemalloc for memory.
Java: async-profiler for low-overhead sampling; JFR (Java Flight Recorder); JMH for microbenchmarks; jmap for heap snapshots.
Go: pprof (CPU and heap); runtime/trace for goroutine scheduling; go test -bench with -benchmem.
C++: perf (Linux) with flamegraphs; Valgrind/Callgrind for instruction counts; gperftools.
Node.js: built-in --inspect + Chrome DevTools; clinic.js for higher-level analysis.

Common deceptions:

A “constant time” operation that’s actually O(N) (e.g., list.insert(0, x) in Python, s += c in a loop in Java).
Hash collisions in adversarial input turning O(1) lookups into O(N).
Garbage collector pauses inflating measurements.
JIT warmup masking the cold-start cost.
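A minimal sketch of the first deception, assuming CPython (build_front_list, build_front_deque, and best_time are illustrative names): list.insert(0, x) shifts every existing element, while collections.deque.appendleft is O(1), so the same loop is quadratic with one container and linear with the other. The doubling ratio, not the absolute time, is what gives it away.

```python
import time
from collections import deque

def build_front_list(n):
    out = []
    for i in range(n):
        out.insert(0, i)        # shifts every existing element right: O(len(out)) per call
    return out

def build_front_deque(n):
    out = deque()
    for i in range(n):
        out.appendleft(i)       # O(1) per call
    return out

def best_time(fn, n, trials=3):
    """Minimum elapsed time over several runs (min rejects scheduler/GC noise)."""
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        fn(n)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    for fn in (build_front_list, build_front_deque):
        ratio = best_time(fn, 20_000) / best_time(fn, 10_000)
        print(f"{fn.__name__}: {ratio:.2f}x per doubling")
```

On CPython the list version’s ratio should approach 4× per doubling while the deque’s stays near 2×; exact numbers depend on the machine.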

Interview Context

Interviewers ask “what’s the complexity?” on every problem. They sometimes follow up with “are you sure?” — and the right answer is “I claim O(N log N); I can verify by running at doubling sizes if you’d like.” Candidates who can articulate empirical verification (even without running it) signal a different level of rigor.

Practical-engineering interviews (Phase 8) often include “make this faster” or “profile this for me” as a follow-up. You need fluency with at least one language’s profiler.

Problem Statement

Implement three versions of LIS:

Version A (O(N²) DP): dp[i] = length of LIS ending at index i. Transition: dp[i] = 1 + max(dp[j]) over all j < i with a[j] < a[i], or 1 if there is no such j.

Version B (O(N log N) patience sort): Maintain tails[k] = smallest tail value of any increasing subsequence of length k+1. For each element x, binary-search for the first tail ≥ x and replace it; if there is none, append x.

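Version C (“fake” O(N log N)): Same as B, but finds the replacement position with a linear scan over tails instead of binary search. It looks like O(N log N) at a glance, but each step costs O(len(tails)), so the total is O(N·L), where L is the LIS length: O(N²) in the worst case (for example, increasing input, where L = N).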

Measure runtime at N = 1000, 2000, 4000, 8000, 16000. Verify the doubling ratios. Profile to find the bottleneck in Version C.

Constraints

1 ≤ N ≤ 16000 for measurement
Values: random integers in [0, 10^6]

Clarifying Questions

(LIS is standard; the lab is about measurement.)

Examples

LIS([10, 9, 2, 5, 3, 7, 101, 18]) == 4   (e.g., [2, 5, 7, 101] or [2, 3, 7, 18])
LIS([0, 1, 0, 3, 2, 3]) == 4
LIS([7, 7, 7, 7]) == 1

Initial Brute Force

Enumerate all 2^N subsequences, check each. O(2^N · N).

Brute Force Complexity

O(2^N · N). Valid only for N ≤ 20. Useful as the verifier in your stress harness from Lab 5.

Optimization Path

DP from O(2^N) → O(N²) → O(N log N).

Final Expected Approach

import bisect
import time
import random

# Version A: O(N^2) DP
def lis_dp(a):
    if not a: return 0
    n = len(a)
    dp = [1] * n
    for i in range(1, n):
        for j in range(i):
            if a[j] < a[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)

# Version B: O(N log N) — real
def lis_patience(a):
    tails = []
    for x in a:
        idx = bisect.bisect_left(tails, x)
        if idx == len(tails):
            tails.append(x)
        else:
            tails[idx] = x
    return len(tails)

# Version C: "fake" O(N log N) — uses linear search disguised
def lis_fake(a):
    tails = []
    for x in a:
        # Linear scan to find first tail >= x — O(N)!
        idx = None
        for i, t in enumerate(tails):
            if t >= x:
                idx = i; break
        if idx is None:
            tails.append(x)
        else:
            tails[idx] = x
    return len(tails)
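To see what Version B’s tails list actually holds, here is a small tracing sketch (patience_trace is an illustrative helper that records tails after each element). Note that tails is not, in general, an increasing subsequence of the input; only its length is guaranteed to equal the LIS length.

```python
import bisect

def patience_trace(a):
    """Snapshot of Version B's tails list after each element of a."""
    tails, history = [], []
    for x in a:
        idx = bisect.bisect_left(tails, x)   # first tail >= x
        if idx == len(tails):
            tails.append(x)                  # x extends the longest subsequence so far
        else:
            tails[idx] = x                   # x becomes a smaller tail for length idx + 1
        history.append(list(tails))
    return history

for step in patience_trace([10, 9, 2, 5, 3, 7, 101, 18]):
    print(step)
```

For [10, 9, 2, 5, 3, 7, 101, 18] the final tails is [2, 3, 7, 18]. For [3, 4, 1] the final tails is [1, 4], which is not a subsequence of the input even though its length, 2, is correct.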

Measurement Harness

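def benchmark(fn, n, trials=3, make_input=None):
    random.seed(n)
    times = []
    for _ in range(trials):
        # Uniform random input by default; make_input(n) lets you time a chosen distribution
        a = make_input(n) if make_input else [random.randint(0, 10**6) for _ in range(n)]
        start = time.perf_counter()
        fn(a)
        times.append(time.perf_counter() - start)
    return min(times)  # min is most reliable; mean is noisy

def doubling_test(fn, name, sizes=(1000, 2000, 4000, 8000, 16000), make_input=None):
    prev = None
    print(f"\n{name}")
    print(f"{'N':>8} {'time (s)':>12} {'ratio':>8}")
    for n in sizes:
        t = benchmark(fn, n, make_input=make_input)
        ratio = f"{t/prev:.2f}x" if prev else "—"
        print(f"{n:>8} {t:>12.4f} {ratio:>8}")
        prev = t

doubling_test(lis_dp,       "Version A — O(N^2) DP")
doubling_test(lis_patience, "Version B — O(N log N) patience")
# Version C is timed on increasing input, its worst case: every element extends tails,
# so each linear scan walks the whole list. (On random input tails stays near 2*sqrt(N).)
doubling_test(lis_fake,     "Version C — fake O(N log N)", make_input=lambda n: list(range(n)))

Expected Output (approximate, on a modern laptop; Version C on increasing input)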

Version A — O(N^2) DP
       N    time (s)    ratio
    1000     0.0420       —
    2000     0.1680    4.00x
    4000     0.6720    4.00x
    8000     2.6900    4.00x
   16000    10.7600    4.00x

Version B — O(N log N) patience
       N    time (s)    ratio
    1000     0.0008       —
    2000     0.0017    2.13x
    4000     0.0036    2.12x
    8000     0.0076    2.11x
   16000     0.0160    2.10x

Version C — fake O(N log N)
       N    time (s)    ratio
    1000     0.0210       —
    2000     0.0840    4.00x   ← 4x means O(N^2), not O(N log N)!
    4000     0.3360    4.00x
    8000     1.3440    4.00x
   16000     5.3760    4.00x

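Reading the ratios: if you claimed O(N log N) and see 4× per doubling, your algorithm is actually O(N²). Version C’s true cost is O(N·L), where L is the LIS length: on increasing input L = N, giving 4×, while on uniformly random input L grows like 2√N, so the ratio is roughly 2^1.5 ≈ 2.8×. That is still clearly above the ~2.1× of a genuine O(N log N), so the doubling test flags it either way. The doubling test is the cheapest, most reliable complexity-verifier in your toolbox.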

Profiling Version C

import cProfile, pstats
random.seed(0)
a = [random.randint(0, 10**6) for _ in range(8000)]

pr = cProfile.Profile()
pr.enable()
lis_fake(a)
pr.disable()
pstats.Stats(pr).sort_stats('cumulative').print_stats(10)

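Expected output: cProfile reports time per function, so nearly all of it shows up as time spent inside lis_fake itself; it cannot point at a specific line. To pinpoint the inner for i, t in enumerate(tails) loop as the hot spot, use line_profiler: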

$ pip install line_profiler
# Add the @profile decorator to lis_fake first, then:
$ kernprof -l -v script.py

The line-by-line report should attribute most of lis_fake’s time to the inner scan loop, confirming the linear-search bottleneck.

Data Structures Used

Plain list (Python) — supports bisect for true O(log N) search
Profiler outputs (text/HTML/flamegraph depending on tool)

Correctness Argument

All three versions produce the same output on random input. Patience sort correctness: tails[k] always stores the smallest possible tail of any increasing subsequence of length k+1 among the elements seen so far. When a new element x is at most some existing tail, it replaces the first tail ≥ x, lowering that tail and improving future extensibility; when it’s larger than all tails, it extends to a new length. The final length is len(tails). Inductive proof omitted; see CLRS / standard references.

Complexity

A: O(N²) time, O(N) space.
B: O(N log N) time, O(N) space.
C: claimed O(N log N), actually O(N·L), where L is the LIS length (O(N²) in the worst case), due to the linear inner search.

Implementation Requirements

Use time.perf_counter(), not time.time() — higher resolution, monotonic.
Take the min of multiple trials, not the mean — min rejects GC/scheduler noise.
Warm up before timing if testing JIT’d languages (Java, JS).
Use __slots__ and pre-allocated arrays in hot Python paths.

Tests

Smoke

assert lis_dp([10, 9, 2, 5, 3, 7, 101, 18]) == 4
assert lis_patience([10, 9, 2, 5, 3, 7, 101, 18]) == 4
assert lis_fake([10, 9, 2, 5, 3, 7, 101, 18]) == 4

Edge

for fn in (lis_dp, lis_patience, lis_fake):
    assert fn([]) == 0
    assert fn([42]) == 1
    assert fn([1, 2, 3, 4, 5]) == 5         # already sorted
    assert fn([5, 4, 3, 2, 1]) == 1         # reverse sorted
    assert fn([7, 7, 7, 7]) == 1            # all duplicates

Performance assertions

# Verify Version B's doubling ratio
times = [benchmark(lis_patience, n) for n in (1000, 2000, 4000)]
ratios = [times[1]/times[0], times[2]/times[1]]
for r in ratios:
    assert 1.8 < r < 2.5, f"Version B ratio {r:.2f} not in O(N log N) range"

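# Verify Version C is quadratic in its worst case (this assertion *should* pass, proving the bug).
# Increasing input makes tails grow to N, so every linear scan walks all of tails; on random
# input tails only reaches about 2*sqrt(N) and the ratio is roughly 2.8x, outside this band.
def time_on_increasing(fn, n, trials=3):
    a = list(range(n))
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        fn(a)
        best = min(best, time.perf_counter() - start)
    return best

times = [time_on_increasing(lis_fake, n) for n in (1000, 2000, 4000)]
ratios = [times[1]/times[0], times[2]/times[1]]
for r in ratios:
    assert 3.5 < r < 4.5, f"Version C ratio {r:.2f} not in O(N^2) range — bug may be fixed?"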

Randomized verifier (A == B == C)

random.seed(0)
for _ in range(50):
    a = [random.randint(0, 100) for _ in range(random.randint(1, 50))]
    assert lis_dp(a) == lis_patience(a) == lis_fake(a)

Follow-up Questions

What if the input is nearly sorted? Version A does Θ(N²) work on every input, but Version C slows sharply because the LIS, and with it tails, grows toward N; measure to confirm.
Memory profile: which version allocates most? Use tracemalloc.start() and tracemalloc.get_traced_memory().
Reconstruct the LIS, not just its length. Track parent pointers; this doesn’t change the asymptotic time, but it adds O(N) space.
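LIS in a stream (one pass, can’t store everything). Patience sort already runs in one pass and keeps only tails (O(L) memory) for an exact length; only when even L values don’t fit do you need a fixed buffer, which gives an approximate answer.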
Parallelize LIS. Hard — DP dependencies are sequential. Pipeline by chunks; merge with care.
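A sketch of the reconstruction follow-up (lis_with_sequence is an illustrative name): alongside Version B’s tails it records, for each element, the index of its predecessor, then walks those parent pointers back from the last tail. Time stays O(N log N); the extra lists add O(N) space.

```python
import bisect

def lis_with_sequence(a):
    """Patience sort plus parent pointers; returns one longest strictly increasing subsequence."""
    tail_vals = []           # tail_vals[k]: smallest tail value of an increasing subsequence of length k+1
    tail_idx = []            # tail_idx[k]: index in a of that tail
    parent = [-1] * len(a)   # parent[i]: index of the previous element in the subsequence ending at i
    for i, x in enumerate(a):
        k = bisect.bisect_left(tail_vals, x)
        if k > 0:
            parent[i] = tail_idx[k - 1]   # extend the best subsequence of length k
        if k == len(tail_vals):
            tail_vals.append(x)
            tail_idx.append(i)
        else:
            tail_vals[k] = x
            tail_idx[k] = i
    # Walk parent pointers back from the tail of the longest subsequence.
    seq = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        seq.append(a[i])
        i = parent[i]
    return seq[::-1]
```

For [10, 9, 2, 5, 3, 7, 101, 18] this returns [2, 3, 7, 18]; other inputs may have several valid answers, and this sketch returns one of them.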

Product Extension

Code review of “this should be fast” claims — every senior engineer learns to verify before trusting.
Database query planners — the planner estimates I/O cost; profiling validates the estimate against real query times.
CDN cache eviction policies — when comparing LRU vs LFU vs SLRU under real traffic, microbenchmarks lie; full profiles win.
Production hot-path detection — flamegraphs reveal that 80% of CPU is spent in 3% of the code; optimize there.

Language/Runtime Follow-ups

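Python: time.perf_counter() is the right clock. cProfile adds noticeable overhead, especially to call-heavy code; line_profiler also adds substantial overhead but attributes time to individual lines, and py-spy samples from a separate process with low overhead and no code changes. Because of the GIL, only one thread executes Python bytecode at a time, so CPU profiles of threaded code are mostly sequential; for asyncio code use aiomonitor or asyncio debug mode.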
Java: JMH (Java Microbenchmark Harness) handles JIT warmup, dead-code elimination, and constant folding correctly — handwritten timing loops in Java are often wrong. Use -prof gc to see allocation cost.
Go: go test -bench=. -benchmem is the standard. -cpuprofile and -memprofile write pprof files; visualize with go tool pprof.
C++: perf record + perf report for sampling; perf stat for cache misses and IPC. Use -O2 or -O3 for measurement; debug builds have very different performance.
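Node.js: node --prof writes V8 tick logs (turn them into a report with node --prof-process), and --inspect attaches Chrome DevTools. Beware TurboFan: V8 optimizes a function only after it has run enough times, so early iterations can be much slower than steady state. Warm up before timing.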

Common Bugs

Timing the cold start — first call includes import/parse/JIT warmup; throw it out.
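Using time.time(): it reads the system clock, which can jump when NTP adjusts it, and its resolution may be coarse. Use time.perf_counter(), which is monotonic and high-resolution.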
Mean over trials — one stop-the-world GC pause skews the mean; use min instead.
Measuring with assertions on — Python -O flag strips asserts; default mode keeps them, slowing hot loops.
Forgetting random.seed() — runs aren’t reproducible.
Comparing implementations on different input distributions — random LIS, sorted LIS, and reversed LIS have wildly different runtimes for some algorithms.
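A small harness avoids most of the bugs above. This is a sketch, not the lab's benchmark() helper. The name time_min_of_trials is hypothetical, and the harness assumes the function under test takes a single list argument. It discards a warm-up call, times with time.perf_counter(), keeps the minimum over trials, and builds its input from a seeded generator. The standard library's timeit.repeat is a ready-made alternative.

```python
import random
import time
from typing import Callable, List


def time_min_of_trials(fn: Callable[[List[int]], object], data: List[int],
                       trials: int = 5) -> float:
    """Best-of-`trials` elapsed time, in seconds, of fn(copy of data)."""
    fn(list(data))                     # warm-up call, discarded (caches, lazy setup)
    best = float("inf")
    for _ in range(trials):
        arg = list(data)               # fresh copy, made outside the timed region
        start = time.perf_counter()    # monotonic, high-resolution clock
        fn(arg)
        best = min(best, time.perf_counter() - start)  # min ignores GC/OS spikes
    return best


rng = random.Random(0)                 # seeded, so every run times the same input
data = [rng.randint(0, 10**6) for _ in range(10_000)]
print(f"sorted: {time_min_of_trials(sorted, data):.6f} s")
```

To compare implementations fairly, pass every implementation the same seeded data.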

Debugging Strategy

When complexity doesn’t match your claim:

Verify with the doubling test. It is the most direct empirical check of a complexity claim.
Profile to find the hot function. It’s almost always one inner loop.
Read the standard library docs for any “built-in” operation you used — list.insert, dict.update, str +=, Vector.add may not be what you think.
Check for hidden quadratic behavior in concatenation: result = result + small_thing in a loop is the classic Java/Python beginner trap.
Verify memory with tracemalloc — sometimes the “slow” is actually paging, not CPU.
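Here is a sketch of the arithmetic behind the doubling test. It assumes timings were taken at N, 2N, 4N and so on. The log2 of each successive ratio estimates the exponent k in N^k. The helper also lists the ratio each growth class predicts. Both function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def expected_doubling_ratios(n: int) -> Dict[str, float]:
    """Predicted T(2n)/T(n) for each growth class."""
    return {
        "O(N)": 2.0,
        "O(N log N)": 2.0 * math.log(2 * n) / math.log(n),  # slightly above 2
        "O(N^2)": 4.0,
        "O(N^3)": 8.0,
    }


def estimated_exponents(times: Sequence[float]) -> List[float]:
    """Given times at n, 2n, 4n, ..., log2 of each ratio approximates k in N^k."""
    return [math.log2(later / earlier) for earlier, later in zip(times, times[1:])]


for name, ratio in expected_doubling_ratios(1000).items():
    print(f"{name:>10}: {ratio:.2f}")
```

For N = 1000 the O(N log N) prediction is 2 · log(2000) / log(1000) ≈ 2.20. That is why its ratio sits only slightly above the linear case's 2.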

Mastery Criteria

Implemented all three LIS versions
Ran the doubling test and observed the 4× / 2.1× / 4× ratios
Profiled Version C with cProfile (or equivalent) and identified the linear-search bottleneck
Wrote performance assertions that would catch a regression
Can recite the expected doubling ratio for O(N), O(N log N), O(N²), O(N³)
Applied profiling to one of your own past solutions and identified one inefficiency
Familiarity with at least one profiler in your primary language (output format, common flags, how to interpret)

Phase 11 — Mock Interview Mastery

Target level: All (mock difficulty scales from beginner through staff/principal/competitive)
Expected duration: 4–8 weeks (depending on your overall track; mocks are continuous)
Weekly cadence: 2 mocks minimum, 3+ if interviews are within 4 weeks

Why This Phase Exists

Phases 0–10 trained you to solve problems. This phase trains you to interview. Those are different skills. You can know every algorithm, pass every Phase 1–9 lab, write proofs from Phase 10 — and still fail a real interview because:

You panic and forget the obvious under real-time pressure.
You waste 8 minutes on clarifying questions a senior would resolve in 60 seconds.
You code correctly but communicate nothing — the interviewer can’t tell if you’re thinking or stuck.
You optimize prematurely before understanding the problem.
You miss the follow-ups that separate “competent” from “hireable at level.”
You finish in 25 minutes and have nothing to say when asked the extension question.
You write buggy code, then spend the remaining time debugging instead of explaining.

A mock interview is the nearest equivalent to the real event without the stakes. Your job: complete at least 12 mocks (one per level), identify your failure mode per level, drill it until it stops happening.

The candidates who pass the hard rounds are not the ones who know the most algorithms. They are the ones who have rehearsed the performance enough times that the algorithm is almost a side effect of a clean interview.

How to Run a Mock

Alone (self-timed)

Read the problem statement only. Do not peek at hints, examples, follow-ups.
Set a timer to the mock’s exact allocated time.
Open a blank document (Google Doc, plain text, paper). No IDE, no autocomplete, no syntax highlighting. Many real interviews use a shared doc or a CoderPad-style editor with minimal tooling.
Narrate aloud or write notes continuously. If you go silent for >30 seconds, stop, write down what you’re thinking, then proceed.
Write pseudocode first. If you have >20 minutes left after pseudocode, translate to real code. If less, stay in pseudocode and be very clear about logic.
When time expires, STOP. This is a time-management test, not a coding-speed test.
Self-evaluate against the 14-dimension rubric below. Score honestly. If your “Optimization” claim was O(N log N) but you wrote O(N²), that’s a 1, not a 4.
Do not look at the official solution until after a second self-mock at the same level. One failure teaches a fact; two failures teach the pattern.

With a partner (realistic)

Find a peer, ideally one level above you.
They read the problem statement to you. You ask clarifying questions; they answer in character.
They watch silently as you solve. They give hints only if you explicitly request one (with a score penalty) or after 10+ minutes of being stuck.
After the timer expires, they rate you on the 14 dimensions, then you debrief.
Swap roles next session.

Best of both worlds

Pramp, interviewing.io, Hello Interview (and similar) match you with strangers. Higher pressure, more realistic.
Record yourself (audio + screen). Replay 24h later. You will be shocked at what you actually said vs what you remember.

The 14-Dimension Scoring Rubric

Every mock is scored 1–5 on each dimension. Total /70. Passing thresholds vary by mock level (see each mock file).

1. Problem Understanding

1: Misread the problem; solved the wrong thing.
2: Understood the surface; missed a subtle constraint.
3: Understood correctly, restated to interviewer.
4: Restated, identified the underlying category (graph, DP, greedy) within first 2 minutes.
5: Restated, identified category, and explicitly verified your interpretation with one well-chosen example.

2. Clarifying Questions

1: None asked; assumed everything.
2: One generic question (“can the input be empty?”).
3: 2–3 questions covering input bounds and edge cases.
4: 3–5 questions covering bounds, edge cases, output format, ambiguity resolution.
5: Surgical questions that probe the exact ambiguities of this problem (e.g., for LRU: “does put-on-existing count as a use?”).

3. Brute Force

1: No brute force articulated; jumped to optimization.
2: Mentioned brute force in passing.
3: Stated brute force with complexity; moved on.
4: Stated brute force, complexity, and why it fails the constraint.
5: Wrote brute force pseudocode briefly to confirm correctness before optimizing — gives you a verifier.

4. Optimization

1: No improvement on brute force.
2: Improved by a constant factor.
3: Optimal-class solution (e.g., O(N log N) when O(N log N) is optimal).
4: Optimal-class solution with the right pattern recognized within first 5 minutes.
5: Optimal solution + articulated why the optimization works (the key insight) + considered alternative optimizations and rejected them with reasoning.

5. Correctness

1: Solution wrong; doesn’t handle the given examples.
2: Handles examples but fails an obvious edge case.
3: Handles all standard edge cases.
4: Handles edge cases plus 1 non-obvious one (overflow, empty input, all-duplicates).
5: Walks through correctness argument using invariant or induction.

6. Complexity Analysis

1: Wrong or absent.
2: Correct but only stated at the end.
3: Correct, articulated before or during coding.
4: Correct, plus space complexity, plus identified the bottleneck.
5: All of the above + considered amortized analysis or worst-case input that triggers worst-case complexity.

7. Code Quality

1: Unreadable; magic numbers; one-letter variables everywhere.
2: Works but ugly; copy-paste blocks; unclear naming.
3: Readable; reasonable names; small functions.
4: Clean structure; helpful comments where non-obvious; good use of standard library.
5: Production-quality — clear names, no dead code, idiomatic for the language, would pass a code review.

8. Testing

1: Did not test.
2: Tested only the given example.
3: Tested 1–2 edge cases unprompted.
4: Systematic walkthrough of given examples + 2+ deliberately-chosen edges.
5: Found and fixed own bug through testing, or explicitly stated which test classes would expose risks.

9. Debugging

1: Hit a bug, panicked, never recovered.
2: Hit a bug, fixed by trial and error.
3: Hit a bug, debugged systematically with prints.
4: Hit a bug, hypothesized cause, verified with targeted assertion, fixed.
5: Hit a bug; narrated the debug protocol aloud; fixed in under 3 minutes.

10. Communication

1: Silent typing.
2: Occasional muttering; no clear narrative.
3: Explained brute force and optimization out loud.
4: Continuous narration of thought process; pauses only to think briefly.
5: Narrated, paused at decision points to consider tradeoffs, invited interviewer input at appropriate moments.

11. Handling Follow-ups

1: Could not answer follow-ups.
2: Answered partially.
3: Answered correctly with one prompt.
4: Answered correctly without prompt; proposed reasonable extensions.
5: Answered, anticipated the follow-up before it was asked, and proposed extensions.

12. Language/Runtime Knowledge

1: Made language errors (Python integer-divides where float was needed; Java auto-boxing trap).
2: No errors but no runtime awareness.
3: Used appropriate language features (Python Counter, Java Map.Entry).
4: Articulated runtime cost (Python list.insert(0, ...) is O(N); Java String += is O(N²) in a loop).
5: Discussed GC/memory model/concurrency implications when relevant.

13. Tradeoff Reasoning

1: Picked one approach with no comparison.
2: Mentioned one alternative.
3: Compared two alternatives with a stated reason.
4: Compared 2–3 alternatives across time/space/code complexity axes.
5: Articulated which alternative would be preferred under different constraints (small N vs large N, read-heavy vs write-heavy, latency vs throughput).

14. Production Awareness

1: None — solved as an algorithm puzzle.
2: Mentioned scaling in passing.
3: Articulated 1–2 production concerns (latency, persistence, concurrency).
4: Articulated multiple production concerns; explained how implementation would change.
5: Discussed monitoring, failure modes, backward compatibility, deployment strategy — staff-level signal.

Passing Thresholds by Mock Level

Mock	Target average score	Total minimum (/70)
01 — Beginner	2.5	35
02 — Easy LeetCode	3.0	42
03 — Medium LeetCode	3.0	42
04 — Hard LeetCode	3.2	45
05 — Big Tech phone screen	3.3	46
06 — Big Tech onsite	3.5	49
07 — Senior engineer	3.8	53
08 — Staff practical	4.0	56
09 — Runtime/language	3.8	53
10 — Infrastructure/backend	4.0	56
11 — Concurrency	4.0	56
12 — Competitive style	3.5	49

Notes:

Production-aware dimensions (#13, #14) are weighted higher for mocks 07–10.
Communication (#10) is the most common reason candidates fail; if your average for #10 is below 3.5, drill it specifically.
Mock 12 (competitive) deprioritizes #11–#14 (no follow-ups; pure algorithm).

Common Failure Modes by Level

Level	Most common failure
Beginner	Silent coding; no communication
Easy LC	Forgot edge cases (empty, single element)
Medium LC	Stuck on optimization; couldn’t find the pattern
Hard LC	Panicked when first approach didn’t work
Phone screen	Spent too long on clarification; ran out of time
Onsite	Solved problem 1, gave up on problem 2
Senior	No tradeoff reasoning; “I’d just use X” without comparing
Staff	No production awareness; built a perfect algorithm with no monitoring story
Runtime/lang	Couldn’t answer GC / memory model / concurrency probe mid-coding
Infrastructure	Treated it like a LeetCode problem instead of a system build
Concurrency	Race conditions in submitted code
Competitive	Failed to reach the algorithmic insight; brute force only

How to Schedule Mocks

12-Week Accelerated Track

Weeks 1–4: Mocks 01–04 (one each)
Weeks 5–8: Mocks 05–08 (one each)
Weeks 9–12: Mocks 09–12 (one each) + repeats of the ones you failed

6-Month Serious Track

Months 1–2: foundations; no mocks yet
Month 3: Mocks 01–03 (2 per week)
Month 4: Mocks 04–06 (2 per week)
Month 5: Mocks 07–10 (3 per week)
Month 6: Mocks 11–12 + heavy re-mock cycle (4 per week)

12-Month Elite Track

Months 1–6: foundations + light mocks (Mocks 01–04, 1 per week)
Months 7–9: Mocks 05–10, 2 per week
Months 10–12: Mocks 11–12, ICPC contests, 3 mocks per week + 2 contests per week

How to Self-Evaluate Honestly

The single biggest failure mode is grade inflation. To counter:

Record the session. Listen back. You will hear all the silent gaps and the muttered “uh, let me think” filler.
Compare to the rubric word-for-word. “I tested edge cases” is not enough for a 4 on Testing unless you walked through the given examples and tested 2+ deliberately chosen edge cases unprompted.
Find someone harsher than you to debrief with. Ideally an engineer one level above your target.
Track scores over time. A flat line means you’re not improving — change something (new problem domain, harder mock, partner).
The dimension where you score lowest is your training target. Drill it for two weeks, then re-mock.

Mock Index

#	Mock	Time	Target Role
1	Beginner	30 min	First-time candidate / intern
2	Easy LeetCode	30 min	Intern → SWE-I
3	Medium LeetCode	35 min	SWE-I → SWE-II
4	Hard LeetCode	60 min	SWE-II → Senior
5	Big Tech Phone Screen	45 min	Any FAANG screen
6	Big Tech Onsite	60 min (× 2 problems)	FAANG SWE-II / Senior
7	Senior Engineer	60 min	Senior SWE
8	Staff Practical	75 min	Staff / Principal
9	Runtime/Language Deep Dive	45 min	Senior / Staff
10	Infrastructure/Backend	75 min	Backend / Platform
11	Concurrency Heavy	60 min	Backend / Systems
12	Competitive Style	90 min	Quant / Compiler / ICPC

What “Pass” Means

Passing a mock is necessary but not sufficient for readiness. The full readiness checklist is in READINESS_CHECKLIST.md. The mocks here verify that you can perform, not that you can do so consistently. Aim for 3 consecutive passes of any given mock before considering that level handled.

Mock 01 — Beginner

Interview type: First-time mock / warm-up
Target role: Intern, new grad, first-ever interview practice
Time limit: 30 minutes
Format: 1 easy problem
Hints policy: Unlimited hints with -1 to score per hint
Primary goal: Build the habit loop of clarify → brute force → optimize → code → test. Optimization is not the focus.

What This Mock Tests

This mock exists to break the most common beginner failure mode: silent coding. You will be scored more on whether you communicated than on whether your code is optimal. A correct silent solution scores lower than a slightly buggy spoken one.

The scoring rubric weights as follows:

Dimension	Weight
Communication (#10)	3×
Clarifying questions (#2)	2×
Testing (#8)	2×
Code quality (#7)	1×
Correctness (#5)	1×
Complexity (#6)	1×
All others	0.5×

Pick One Problem (interviewer’s choice; for self-mock, pick at random)

Problem A — Reverse a String

Write a function that reverses a string. The input is given as a list of characters s. Modify s in-place; do not allocate a new list.

Examples:

Input:  ['h', 'e', 'l', 'l', 'o']
Output: ['o', 'l', 'l', 'e', 'h']

Input:  ['H', 'a', 'n', 'n', 'a', 'h']
Output: ['h', 'a', 'n', 'n', 'a', 'H']

Constraints: 1 ≤ |s| ≤ 10^5. Each character is printable ASCII.

Problem B — Valid Parentheses

Given a string s containing only ()[]{}, determine if it is valid. Valid means: brackets close in the right order, every opener has a matching closer of the same type.

Examples:

"()"      → True
"()[]{}"  → True
"(]"      → False
"([)]"    → False
"{[]}"    → True
""        → True

Constraints: 0 ≤ |s| ≤ 10^4.

Problem C — Find Maximum in Array

Given an array of integers, return the maximum value. Do not use the language’s built-in max().

Examples:

[3, 1, 4, 1, 5, 9, 2, 6] → 9
[-3, -1, -7]              → -1
[42]                      → 42

Constraints: 1 ≤ |a| ≤ 10^5. -10^9 ≤ a[i] ≤ 10^9.

Expected Communication Style

You should:

Restate the problem in your own words. (“So I need to reverse this list of characters in place, meaning no new list allocation.”)
Ask 2+ clarifying questions even if they feel obvious. (“Should I handle empty input? What’s the max length?”)
State 1+ example trace out loud. (“For [h, e, l, l, o], I’d swap positions 0 and 4, then 1 and 3, leaving 2 alone.”)
Articulate brute force first. Even for these problems — there’s an obvious approach.
Code while narrating. “I’ll use two pointers, left and right, swap and move toward center until they meet.”
Test out loud. Walk through the example, then try empty input, then single element.

You should not:

Type silently for >30 seconds
Skip clarifying questions because the problem “is obvious”
Skip testing because the code “looks right”
Assume the interviewer is following — narrate every decision

Common Failure Modes

Silent coding. Most common. -3 to communication.
Skipping clarification. “Empty string?” was not asked → -1.
No testing. Submitted without walking through. -2 to testing.
Skipping brute force. Wrote the optimal directly without acknowledging the simpler approach. -1 to brute force.
Using a built-in. Problem C says no max() — using it is an instant fail of that problem.

Passing Bar

Total score: 35/70 (average 2.5)
Communication dimension: 3+ (mandatory)
Code: works on all given examples
One unprompted test case beyond the given examples

If you score 35+ but communication is below 3: re-do this mock. The score is misleading; the failure mode isn’t fixed.

Follow-up Questions (Interviewer may ask)

For A:

What’s the complexity? (O(N) time, O(1) space.)
What if the string is in a Unicode encoding with multi-byte characters? (Reversing raw bytes scrambles multi-byte characters; reverse by codepoint, or by grapheme cluster if combining marks must stay attached.)
What if the string is immutable in your language? (Java strings, Python strings — must allocate.)
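The two-pointer swap from the example trace, as a minimal sketch (the function name is illustrative). It runs in O(N) time with O(1) extra space. For an immutable Python str, the allocation mentioned above is unavoidable, as the last line shows.

```python
from typing import List


def reverse_in_place(s: List[str]) -> None:
    """Reverse a list of characters in place: O(N) time, O(1) extra space."""
    left, right = 0, len(s) - 1
    while left < right:                    # stop when the pointers meet or cross
        s[left], s[right] = s[right], s[left]
        left += 1
        right -= 1


chars = ['h', 'e', 'l', 'l', 'o']
reverse_in_place(chars)
print(chars)          # ['o', 'l', 'l', 'e', 'h']

# Immutable str: a new object must be allocated.
print("hello"[::-1])  # olleh
```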

For B:

Complexity? (O(N) time, O(N) worst-case space for the stack.)
What if you also need to return the position of the first invalid bracket? (Track index when pushing; return index on failed pop.)
What if the input can have non-bracket characters mixed in? (Skip them or treat as invalid — clarify.)
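The sketch below covers the base problem and the first-invalid-position follow-up together. It pushes (bracket, index) pairs, so a failed pop can report the closer's index. When only openers are left over, it reports the earliest unmatched one; confirm that interpretation with the interviewer. Non-bracket characters are skipped here, one of the two options above. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

PAIRS = {")": "(", "]": "[", "}": "{"}  # closer -> matching opener


def first_invalid_index(s: str) -> Optional[int]:
    """None if s is valid, else the index of the first bracket that breaks validity."""
    stack: List[Tuple[str, int]] = []  # (opener, index where it appeared)
    for i, ch in enumerate(s):
        if ch in "([{":
            stack.append((ch, i))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                return i              # closer with no matching opener on top
            stack.pop()
        # any other character is skipped (a choice to clarify)
    return stack[0][1] if stack else None  # earliest unmatched opener, if any


def is_valid(s: str) -> bool:
    return first_invalid_index(s) is None
```

Time is O(N). Worst-case stack space is O(N), for example on a string made only of openers.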

For C:

Complexity? (O(N) time, O(1) space.)
What if the array is empty? (Undefined; throw, return None, or return INT_MIN — clarify.)
What if it’s a stream and you can’t store it all? (Maintain running max with O(1) state.)
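The streaming follow-up needs only O(1) state. This minimal sketch assumes the empty case returns None, one of the options above. RunningMax and find_max are illustrative names. The code avoids the built-in max(), per Problem C's rule.

```python
from typing import Iterable, Optional


class RunningMax:
    """O(1)-state maximum over a stream of integers."""

    def __init__(self) -> None:
        self.best: Optional[int] = None  # None until the first value arrives

    def add(self, x: int) -> None:
        if self.best is None or x > self.best:
            self.best = x


def find_max(values: Iterable[int]) -> Optional[int]:
    tracker = RunningMax()
    for v in values:
        tracker.add(v)
    return tracker.best


print(find_max([3, 1, 4, 1, 5, 9, 2, 6]))  # 9
```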

Required Tests

For all problems:

The given examples
Empty input (where the constraint allows)
Single-element input
One additional edge case you choose

For A: a palindrome input (e.g., ['r', 'a', 'c', 'e', 'c', 'a', 'r']). For B: nested mismatch like "([)]" and just an opener "(" (should be False). For C: all-negative input and all-identical input.

Required Complexity Explanation

State out loud:

Time complexity in Big-O
Space complexity
Why those bounds are tight (or whether they could be improved)

For these problems, the optimal is also the simplest. Acknowledge that briefly.

Self-Evaluation Template

Copy this into your notes after the mock:

Mock 01 — Beginner
Date: _______
Problem chosen: _______
Time taken: _____ min (limit: 30)

Scores (1–5):
[ ] 1. Problem Understanding
[ ] 2. Clarifying Questions
[ ] 3. Brute Force
[ ] 4. Optimization
[ ] 5. Correctness
[ ] 6. Complexity Analysis
[ ] 7. Code Quality
[ ] 8. Testing
[ ] 9. Debugging (if applicable)
[ ] 10. Communication
[ ] 11. Follow-ups
[ ] 12. Language/Runtime
[ ] 13. Tradeoffs
[ ] 14. Production Awareness

Total: ___/70

What went well:

What went poorly:

Specific bug or moment of confusion:

What to drill before next mock:

What to Do If You Fail

If you scored below 35:

Identify the dimension with the lowest score.
Re-do this same mock with a different problem (A/B/C). Focus only on that dimension.
If communication is the issue, record yourself doing 3 LC easies aloud over the next week. Listen back.
Do not move to Mock 02 until you pass Mock 01 twice in a row. Foundational habits matter more than progression.

Mock 02 — Easy LeetCode

Interview type: Standard LC easy
Target role: Intern, new grad SWE-I, first technical screen
Time limit: 30 minutes
Format: 1 easy problem
Hints policy: One free hint after 10 min of being stuck; additional hints -1 each
Primary goal: Solve correctly with clean code and adequate testing in under 30 min.

What This Mock Tests

The bar at the easy level: you should solve the problem with the optimal approach, communicate clearly throughout, and verify with at least 2 unprompted tests. Easy LC questions are the most common phone-screen problems at smaller companies and at the start of FAANG screens.

Scoring weights are uniform across dimensions — easy LCs should pass on every axis.

Pick One Problem

Problem A — Two Sum (LC 1)

Given an array of integers nums and an integer target, return the indices of the two numbers that add up to target. You may assume exactly one solution exists. You may not use the same element twice.

Examples:

nums = [2, 7, 11, 15], target = 9   → [0, 1]
nums = [3, 2, 4],      target = 6   → [1, 2]
nums = [3, 3],         target = 6   → [0, 1]

Constraints: 2 ≤ |nums| ≤ 10^4. -10^9 ≤ nums[i] ≤ 10^9. Exactly one solution.

Problem B — Best Time to Buy and Sell Stock (LC 121)

You are given prices, where prices[i] is the price of a stock on day i. Maximize profit by choosing one day to buy and a later day to sell. If no profit possible, return 0.

Examples:

[7, 1, 5, 3, 6, 4]  → 5  (buy day 1 at 1, sell day 4 at 6)
[7, 6, 4, 3, 1]     → 0
[1, 2]              → 1

Constraints: 1 ≤ |prices| ≤ 10^5. 0 ≤ prices[i] ≤ 10^4.

Problem C — Contains Duplicate (LC 217)

Given an integer array nums, return True if any value appears at least twice.

Examples:

[1, 2, 3, 1]                  → True
[1, 2, 3, 4]                  → False
[1, 1, 1, 3, 3, 4, 3, 2, 4, 2] → True

Constraints: 1 ≤ |nums| ≤ 10^5.

Expected Communication Style

Restate the problem in 1 sentence.
Ask 2–3 clarifying questions: input bounds, edge cases (empty? duplicates? negative?), output format.
State brute force with complexity. (“Nested loop, O(N²).”)
Identify the optimization signal. (“I see lookup-by-value — hashmap.”)
State optimal complexity before coding. (“O(N) time, O(N) space.”)
Code with brief narration of each step.
Walk through given example. Then 1–2 edge cases.

Solution Sketches

A. Two Sum: hashmap value → index; for each element, check if target - x is in the map; if so return; else insert. O(N) time, O(N) space.
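The Two Sum approach in Python, as a minimal sketch (the function name is assumed). Looking up target - x before inserting x is what makes nums = [3, 3] work. The explicit `is not None` check matters because a stored index can be 0.

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    seen: Dict[int, int] = {}        # value -> index of an earlier element
    for i, x in enumerate(nums):
        j = seen.get(target - x)
        if j is not None:            # check BEFORE inserting x
            return [j, i]
        seen[x] = i
    return []                        # unreachable when exactly one solution exists


print(two_sum([2, 7, 11, 15], 9))   # [0, 1]
```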

B. Stock: maintain running minimum; for each price, compute price - min_so_far; track max. O(N) time, O(1) space.
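The running-minimum idea as a minimal sketch (the name is assumed). It relies on the constraint |prices| ≥ 1. When best is updated, min_so_far covers only earlier days, so every candidate profit sells after it buys.

```python
from typing import List


def max_profit(prices: List[int]) -> int:
    min_so_far = prices[0]           # cheapest buy seen so far
    best = 0                         # no trade at all yields 0
    for p in prices[1:]:
        best = max(best, p - min_so_far)
        min_so_far = min(min_so_far, p)
    return best


print(max_profit([7, 1, 5, 3, 6, 4]))  # 5
```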

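C. Duplicate: insert into a set; on collision return True. O(N) time, O(N) space. Or: sort + scan adjacent, O(N log N) time. This gives O(1) extra space only if the sort itself is in-place (e.g., heapsort; Python’s list.sort() may use O(N) temporary space), and it mutates the input.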
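Both variants as minimal sketches (names assumed). The sorted variant calls list.sort(), which mutates the caller's list, a tradeoff worth saying out loud in the interview.

```python
from typing import List


def contains_duplicate(nums: List[int]) -> bool:
    seen = set()
    for x in nums:
        if x in seen:                # collision: x appeared before
            return True
        seen.add(x)
    return False


def contains_duplicate_sorted(nums: List[int]) -> bool:
    nums.sort()                      # mutates the input; duplicates become adjacent
    return any(nums[i] == nums[i + 1] for i in range(len(nums) - 1))
```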

Common Failure Modes

Wrote the O(N²) brute force as the final answer. Acceptable on first attempt, but you must immediately follow with the optimization.
Forgot to handle duplicates in Two Sum. What if nums = [3, 3]? The naive hashmap solution must check target - x before inserting x.
Used max(prices) - min(prices) for B. Wrong — the min must come before the max.
No edge case test. Empty arrays, single element, all duplicates.

Passing Bar

Total score: 42/70 (average 3.0)
Optimal complexity reached (O(N) or O(N log N) depending on problem)
At least 2 unprompted edge case tests
Continuous narration (no silent stretches >30 sec)

Follow-up Questions

For A:

What if there are multiple valid pairs? Return all. → Need to handle duplicates carefully; either set of tuples or sort + two-pointer.
What if the array is sorted? → Two-pointer, O(1) extra space.
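What if the input is a stream? → The one-pass hashmap already works: check each arriving element against the values seen so far, so a pair is reported as soon as its second element arrives.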
What if duplicates matter? → nums = [3, 3], target = 6 → return [0, 1].
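For the sorted-input follow-up, here is a two-pointer sketch (the name is assumed). Moving left can only raise the sum and moving right can only lower it, so each step discards one candidate. The loop is O(N) time with O(1) extra space, and it returns indices into the sorted array.

```python
from typing import List


def two_sum_sorted(nums: List[int], target: int) -> List[int]:
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            return [left, right]
        if s < target:
            left += 1                # need a larger sum
        else:
            right -= 1               # need a smaller sum
    return []
```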

For B:

Can you buy and sell multiple times? → LC 122; sum all positive day-to-day differences.
With a cooldown of K days between trades? → DP, state = (day, holding?).
With a transaction fee? → DP variant.
What if you can short-sell? → Symmetric; maintain running max.
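The LC 122 answer above fits in one line, shown here as a sketch (the name is assumed). Each positive day-to-day rise is captured by buying the day before and selling that day.

```python
from typing import List


def max_profit_unlimited(prices: List[int]) -> int:
    # Sum every positive consecutive difference.
    return sum(max(0, later - earlier) for earlier, later in zip(prices, prices[1:]))
```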

For C:

What if memory is constrained? → Bloom filter (approximate); or sort in-place + scan.
Return the first duplicate, not just whether one exists. → Same approach, return on first collision.
Find ALL duplicates. → Count map.
In a stream? → Bloom filter or Count-Min Sketch for approximate.

Required Tests

For all:

Given examples
Empty input or smallest legal input
Single-element input (if applicable)
Edge case specific to the problem (negative numbers, all-duplicates, sorted input)

Required Complexity Explanation

State:

Time complexity (with reasoning)
Space complexity (with reasoning)
The bottleneck — which line determines the dominant cost

Self-Evaluation Template

Mock 02 — Easy LC
Date: _______
Problem: _______
Time taken: _____ / 30 min

Scores (1–5) for all 14 dimensions:
___ Total /70

Optimal complexity reached? Y/N
Hints used? Number: ___
Edge cases tested unprompted? Number: ___

Strongest dimension:
Weakest dimension:
Action item for next mock:

What to Do If You Fail

Score 35–41: Re-do with a different problem from the list. Focus on the lowest-scored dimension.
Score <35: Step back to Mock 01 for one session. The habit loop isn’t solid.
Failed to reach optimal complexity: Drill Phase 1 (foundations) hash-map and array labs.
Took longer than 30 min: Repeat with a stricter timer. 30 min is a realistic budget for a single easy problem in a phone screen.

Mock 03 — Medium LeetCode

Interview type: Standard LC medium (the modal interview question type)
Target role: SWE-I → SWE-II, generic mid-tier phone screen
Time limit: 35 minutes
Format: 1 medium problem
Hints policy: One free hint at 15 min; additional hints -2 each
Primary goal: Pattern recognition under time pressure + clean implementation.

What This Mock Tests

Mediums are the bread and butter of coding interviews. If you can’t pass mediums consistently in 30–35 min, you are unlikely to pass FAANG loops. The bar is:

Recognize the pattern (sliding window, hashmap, two pointers, BFS, etc.) within 5 minutes
State optimal complexity before coding
Implement correctly with clean code
Test 3+ cases including 1 non-obvious edge

Scoring weights all dimensions roughly equally, with slight extra emphasis on Optimization (#4) and Testing (#8); these are the differentiators at the medium level.

Pick One Problem

Problem A — Longest Substring Without Repeating Characters (LC 3)

Given a string s, find the length of the longest substring with no repeating characters.

Examples:

"abcabcbb"  → 3   ("abc")
"bbbbb"     → 1   ("b")
"pwwkew"    → 3   ("wke")
""          → 0

Constraints: 0 ≤ |s| ≤ 5×10^4. Printable ASCII or English letters/digits/symbols.

Problem B — Group Anagrams (LC 49)

Given an array of strings, group anagrams together. Return any order.

Examples:

["eat","tea","tan","ate","nat","bat"]
→ [["bat"],["nat","tan"],["ate","eat","tea"]]

[""]    → [[""]]
["a"]   → [["a"]]

Constraints: 1 ≤ |strs| ≤ 10^4. 0 ≤ |strs[i]| ≤ 100. Lowercase English letters.

Problem C — Coin Change (LC 322)

Given coin denominations coins and an amount, return the fewest number of coins needed. If impossible, return -1. Infinite supply of each coin.

Examples:

coins = [1, 2, 5], amount = 11   → 3   (5 + 5 + 1)
coins = [2],       amount = 3    → -1
coins = [1],       amount = 0    → 0

Constraints: 1 ≤ |coins| ≤ 12. 1 ≤ coins[i] ≤ 2^31 - 1. 0 ≤ amount ≤ 10^4.

Expected Communication Style

Restate in your own words; explicitly state the input/output types.
Ask 3–5 clarifying questions: input bounds, edge cases, output ambiguities.
State brute force with complexity.
Name the pattern. (“Sliding window over a hashmap” / “Hash group by sorted string” / “Unbounded knapsack DP.”)
State optimal complexity before coding.
Code with narration. Pause briefly at decision points (e.g., “I need to evict from the window — let me use a map of char→last_seen_index instead of a set, so I can jump the left pointer”).
Test 3+ cases, including 1 designed to break common bugs.

Solution Sketches

A. Longest Substring: sliding window with hashmap (char → last index seen). When duplicate enters, advance left to max(left, last_seen[c] + 1). O(N) time, O(min(N, alphabet)) space.
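The sliding-window approach in Python (the name is assumed). The max() on the left pointer keeps "abba" correct: at the final 'a', the earlier 'a' already lies outside the window, so left must not move back.

```python
from typing import Dict


def length_of_longest_substring(s: str) -> int:
    last_seen: Dict[str, int] = {}   # char -> most recent index
    left = 0                         # window is s[left:right+1], no repeats
    best = 0
    for right, ch in enumerate(s):
        if ch in last_seen:
            left = max(left, last_seen[ch] + 1)  # never move left backwards
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best
```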

B. Group Anagrams: key by sorted string, OR key by char-count tuple. Hashmap from key → list of strings. With N strings of maximum length K: O(N · K log K) for the sort key, O(N · K) for the count key.
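A count-key sketch (the name is assumed). It relies on the stated constraint that strings contain only lowercase English letters; another alphabet would need a dict or a larger array. The count list becomes a tuple because lists are not hashable.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_anagrams(strs: List[str]) -> List[List[str]]:
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for word in strs:
        counts = [0] * 26                    # assumes 'a'..'z' only
        for ch in word:
            counts[ord(ch) - ord("a")] += 1
        groups[tuple(counts)].append(word)   # tuple key: hashable
    return list(groups.values())
```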

C. Coin Change: DP. dp[i] = min coins for amount i. dp[i] = min(dp[i - c] + 1) for c in coins. O(amount × |coins|) time.
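A bottom-up sketch of the DP (the name is assumed). It uses amount + 1 as the "unreachable" sentinel: with a minimum coin value of 1, no reachable amount needs more than amount coins. It also checks c <= i before indexing dp[i - c]. Space is O(amount).

```python
from typing import List


def coin_change(coins: List[int], amount: int) -> int:
    unreachable = amount + 1                 # more coins than any answer can use
    dp = [0] + [unreachable] * amount        # dp[0] = 0: zero coins make 0
    for i in range(1, amount + 1):
        for c in coins:
            if c <= i and dp[i - c] + 1 < dp[i]:  # guard i - c >= 0
                dp[i] = dp[i - c] + 1
    return dp[amount] if dp[amount] != unreachable else -1
```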

Common Failure Modes

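For A: restarted the scan from every start index, which is quadratic in the worst case. A set-based window that shrinks from the left is already O(N) amortized, since each character enters and leaves the window at most once; the char → last-index map just lets left jump in one step.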
For A: forgot the max(left, ...) when jumping the left pointer. Causes left to move backwards on out-of-window duplicates.
For B: used sorted string as key but the sort is O(K log K) per word. Acceptable, but count tuple is O(K). Discuss the tradeoff.
For C: greedy approach (always pick the largest coin). Wrong on coins = [1, 3, 4], amount = 6: greedy gives 4+1+1 = 3 coins; DP gives 3+3 = 2.
For C: forgot to initialize dp[0] = 0 or to handle amount = 0.
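For C: didn’t check i - c >= 0 before the lookup. In Java or C++ this is an out-of-bounds error; in Python, negative indices wrap around to the end of dp and silently produce a wrong answer.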

Passing Bar

Total score: 42/70 (average 3.0)
Optimal complexity reached
Correct on all given examples + at least 1 self-generated edge case
Hint usage ≤ 1
Time: ≤ 35 min

Follow-up Questions

For A:

Return the substring itself, not just length. → Track start position in addition to max length.
What if “repeating” means within a window of K positions? → Modify the window logic; same approach.
Unicode? → Use codepoint, not byte; alphabet might be huge so use hashmap, not array.

For B:

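What if strings can be huge (1M chars)? → Prefer the count key: it costs O(K) per string, and the 26-entry tuple stays constant-size regardless of string length. The sorted-string key costs O(K log K) and is as long as the string itself.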
Anagram detection in a stream? → Maintain rolling count.
Approximate anagrams (with one letter difference)? → Locality-sensitive hashing.

For C:

Return the actual coin combination, not just the count. → Track parent pointer in DP; reconstruct.
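Number of ways to make change. → Different DP: dp[0] = 1; for each coin c (outer loop), for i from c to amount, dp[i] += dp[i - c]. With the coin loop outside, each combination is counted once; with the amount loop outside, ordered sequences are counted instead.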
What if coin counts are limited? → Bounded knapsack variant; O(amount × sum(counts)).
Why doesn’t greedy work? → Coin systems where greedy works are “canonical” (e.g., USD coins); others (e.g., [1, 3, 4]) require DP.
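For the number-of-ways follow-up, loop order decides what gets counted. Below is a sketch with two hypothetical functions. Putting coins in the outer loop counts combinations, which is how LC 518 defines the answer. Putting amounts in the outer loop counts ordered sequences, so 1+2 and 2+1 are both counted.

```python
from typing import List


def count_combinations(coins: List[int], amount: int) -> int:
    ways = [1] + [0] * amount        # one way to make 0: use no coins
    for c in coins:                  # coin loop OUTSIDE: each multiset counted once
        for i in range(c, amount + 1):
            ways[i] += ways[i - c]
    return ways[amount]


def count_sequences(coins: List[int], amount: int) -> int:
    ways = [1] + [0] * amount
    for i in range(1, amount + 1):   # amount loop OUTSIDE: order matters
        for c in coins:
            if c <= i:
                ways[i] += ways[i - c]
    return ways[amount]


print(count_combinations([1, 2, 5], 5), count_sequences([1, 2, 5], 5))  # 4 9
```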

Required Tests

For all:

Given examples
Empty input (where legal)
Single-element / minimum input
Input that stresses the pattern (e.g., for A: "aaaaa", plus "abba" to catch a left pointer that moves backwards; for C: amount = 0 and an unreachable amount)
A randomly chosen non-trivial case you verify by hand

Required Complexity Explanation

State:

Time complexity, with the bottleneck identified
Space complexity, including auxiliary data structures
Whether the bound is tight or improvable

Self-Evaluation Template

Mock 03 — Medium LC
Date: _______
Problem: _______
Time: ___ / 35 min
Hints used: ___

Scores (1–5) for all 14 dimensions:
___ Total /70

Pattern recognized in: _____ minutes
Bug count during coding: ___
Bug count caught by my own tests: ___

Strongest dimension:
Weakest dimension:
Specific drill for next session:

What to Do If You Fail

Score 35–41: Repeat with a different problem; focus on weak dimension.
Took > 35 min: You need more medium volume. Solve 20 mediums in the next week against a 35-min timer.
Couldn’t recognize the pattern: Go back to Phase 2 (patterns) README and re-read the signal table.
Bug-prone code: Phase 10, Lab 02 (TDD) and Lab 05 (stress testing).
Communication weak: Record yourself; listen back; identify silent stretches.
Pass twice in a row before moving to Mock 04.

Mock 04 — Hard LeetCode

Interview type: LC hard, FAANG onsite-level
Target role: SWE-II → Senior; FAANG onsite second round
Time limit: 60 minutes
Format: 1 hard problem (plus a medium if you finish the hard early)
Hints policy: One free hint at 20 min; additional hints -2 each. After 3 hints, the round counts as “failed” by FAANG standards.
Primary goal: Reach the optimal algorithm under pressure, implement correctly, handle the hard follow-ups.

What This Mock Tests

Hard problems are where mid-level engineers separate from senior ones. The bar:

Recognize the non-obvious pattern (binary search on answer, segment tree, DP on intervals, etc.) within 10 minutes
Articulate why the optimal works, with correctness sketch
Implement under time pressure
Handle 2+ follow-up extensions

Scoring emphasizes Problem Understanding (#1), Optimization (#4), Correctness (#5), and Tradeoff Reasoning (#13).

Pick One Problem

Problem A — Median of Two Sorted Arrays (LC 4)

Given two sorted arrays nums1 and nums2 of sizes m and n, return the median of the combined sorted array. Must run in O(log(m+n)) time.

Examples:

[1, 3], [2]         → 2.0    (merged: [1, 2, 3])
[1, 2], [3, 4]      → 2.5    (merged: [1, 2, 3, 4])
[],     [1]         → 1.0
[1, 2], []          → 1.5

Constraints: 0 ≤ m, n ≤ 1000. m + n ≥ 1. -10^6 ≤ values ≤ 10^6.

Problem B — Trapping Rain Water (LC 42)

Given non-negative bar heights height[i] (each bar has width 1), compute how much water the bars trap after raining.

Examples:

[0,1,0,2,1,0,1,3,2,1,2,1] → 6
[4,2,0,3,2,5]              → 9

Constraints: 1 ≤ |height| ≤ 2×10^4. 0 ≤ height[i] ≤ 10^5.

Problem C — Merge K Sorted Lists (LC 23)

Given an array of k sorted linked lists, merge them into one sorted list.

Examples:

[[1,4,5], [1,3,4], [2,6]] → [1,1,2,3,4,4,5,6]
[]                         → []
[[]]                       → []

Constraints: 0 ≤ k ≤ 10^4. 0 ≤ |lists[i]| ≤ 500. Sum of all lengths ≤ 10^4.

Expected Communication Style

Restate with input/output types and the explicit complexity constraint (where given).
Ask precise clarifying questions: O(log) vs O(m+n) required? Negative numbers? Empty arrays? Duplicates?
State a baseline: the obvious O(m+n) merge for A, an O(N²) scan for B (for each bar, find the tallest bar on each side), and collect-then-sort in O(N log N) for C.
Identify whether the baseline meets the constraint. If not, derive the harder approach.
Articulate the key insight before coding. (“For A, I’ll binary search on the partition position in the shorter array such that the left halves of both arrays combined form the lower half of the merged array.”)
Code carefully — hards have boundary conditions everywhere.
Test 3+ cases including the boundary (empty array, both arrays size 1, large mismatch).

Solution Sketches

A. Median of Two Sorted: binary search on the partition i in the shorter array. For each i, j = (m + n + 1) // 2 - i. Check if nums1[i-1] ≤ nums2[j] and nums2[j-1] ≤ nums1[i]. If so, median is computed from the boundary 4 values. O(log(min(m, n))). Edge cases: i=0 (no left in nums1) or i=m; same for j.
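A minimal Python sketch of the partition search described above. The function name is illustrative, and ±infinity sentinels stand in for the i = 0, i = m, j = 0, and j = n edge cases so the comparison logic stays uniform.

```python
from typing import List


def find_median_sorted_arrays(nums1: List[int], nums2: List[int]) -> float:
    # Binary search over the shorter array: O(log(min(m, n))).
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    half = (m + n + 1) // 2          # size of the combined left half
    lo, hi = 0, m                    # i = how many elements come from nums1
    NEG, POS = float("-inf"), float("inf")
    while lo <= hi:
        i = (lo + hi) // 2
        j = half - i                 # the rest of the left half comes from nums2
        left1 = nums1[i - 1] if i > 0 else NEG
        right1 = nums1[i] if i < m else POS
        left2 = nums2[j - 1] if j > 0 else NEG
        right2 = nums2[j] if j < n else POS
        if left1 <= right2 and left2 <= right1:
            # Valid partition: the median comes from the 4 boundary values.
            if (m + n) % 2 == 1:
                return float(max(left1, left2))
            return (max(left1, left2) + min(right1, right2)) / 2
        if left1 > right2:
            hi = i - 1               # took too many from nums1
        else:
            lo = i + 1               # took too few from nums1
    raise ValueError("inputs must be sorted and not both empty")
```

Swapping so nums1 is the shorter array also guarantees 0 ≤ j ≤ n for every i the search tries.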

B. Rain Water: two-pointer. Maintain left_max and right_max. At each step, the side with the smaller max determines how much water can sit at that index. Move that pointer inward. O(N) time, O(1) space. Alternative: precompute left_max and right_max arrays, O(N) time and space.
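A minimal sketch of the O(1)-space two-pointer idea. This common variant compares the current bar heights rather than the running maxima. When height[left] < height[right], a bar at least as tall exists on the right, so left_max alone bounds the water at left. The function name is illustrative.

```python
from typing import List


def trap(height: List[int]) -> int:
    left, right = 0, len(height) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if height[left] < height[right]:
            # The right side is at least this tall, so left_max is the binding wall.
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1
    return water
```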

C. Merge K Lists: min-heap of (value, list_index, node). Pop the smallest, push its next. O(N log K) where N = total nodes. Alternative: divide-and-conquer pairwise merge, same complexity, no heap.
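A minimal sketch of the heap approach. ListNode mirrors LeetCode's usual definition, which is an assumption here, and from_list / to_list are hypothetical test helpers. The list index in each heap tuple breaks ties on equal values, so Python never has to compare two ListNode objects.

```python
import heapq
from typing import List, Optional


class ListNode:
    def __init__(self, val: int = 0, next: "Optional[ListNode]" = None):
        self.val = val
        self.next = next


def merge_k_lists(lists: List[Optional[ListNode]]) -> Optional[ListNode]:
    heap = []
    for i, node in enumerate(lists):
        if node:                          # skip empty lists (the [[]] case)
            heapq.heappush(heap, (node.val, i, node))
    dummy = tail = ListNode()
    while heap:
        _, i, node = heapq.heappop(heap)  # smallest head across all lists
        tail.next = node
        tail = node
        if node.next:                     # replace it with its successor
            heapq.heappush(heap, (node.next.val, i, node.next))
    return dummy.next


def from_list(values: List[int]) -> Optional[ListNode]:
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(node: Optional[ListNode]) -> List[int]:
    out = []
    while node:
        out.append(node.val)
        node = node.next
    return out
```

The heap never holds more than K entries, which gives the O(N log K) bound.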

Common Failure Modes

A: Submitted O(m+n) by merging. Works but fails the complexity requirement — interviewer marks as fail at FAANG bar.
A: Off-by-one in the partition formula. Most common bug.
A: Didn’t handle empty arrays. Crash on [], [1].
B: Used DP with O(N) space when O(1) two-pointer works. Acceptable but downgrades the “Optimization” score.
B: Mishandled inputs that trap no water at all (strictly increasing or strictly decreasing heights).
C: Used O(N · K) approach — for each output element, scan all K heads. Too slow for K = 10^4.
C: Forgot null check on lists[i] — common test failure on [[]].

Passing Bar

Total score: 45/70 (average 3.2)
Optimal complexity reached (or a serious attempt with clear awareness of the gap)
Correct on given examples + 2 boundary cases
Hint usage ≤ 1
Time ≤ 60 min
Articulated correctness argument (not just “trust me”)

Follow-up Questions

For A:

Generalize to k-th smallest in two sorted arrays. → Same binary search, but partition so the two left parts hold k elements in total; the answer is max(nums1[i-1], nums2[j-1]). O(log(min(m, n))).
Median of K sorted streams (K small). → Heap-based; not log-time anymore.
Median of unsorted data? → Quickselect (O(N) expected) or median-of-medians (O(N) worst case).
Memory-bound large-K case. → External merge sort; k-way merge with bounded heap.

For B:

3D rain water (LC 407). → Heap of boundary cells, BFS inward. Much harder.
Approximate version with O(1) memory for a stream. → Doesn’t exist in general; needs two-pass for exact.
Trapping with non-zero ground (irregular shape). → Doesn’t fundamentally change.

For C:

Streaming K sorted streams, K huge (10^6+). → Tournament tree (O(log K) per element), or distributed merge.
Lists are not in memory (each is a file). → External k-way merge with bounded buffers.
K-way merge with timestamps + de-dup. → Same algorithm + dedup pass.
Latency-sensitive variant: emit elements as soon as possible. → Stream the heap output without buffering.

Required Tests

All given examples
One array empty (A; both empty is ruled out by m + n ≥ 1), all bars zero (B), all lists empty (C)
One huge + one tiny array (A) — stresses the binary search edges
Strictly increasing input (B), strictly decreasing input (B)
K = 1 (C: single list pass-through), K = 0 (C: empty)
One adversarial input you design

Required Complexity Explanation

Time complexity, with reasoning
Space complexity
Whether the bound is tight or merely upper
For A: explicitly justify why O(log) is achievable

Self-Evaluation Template

Mock 04 — Hard LC
Date: _______
Problem: _______
Time: ___ / 60 min
Hints used: ___

Scores (1–5):
___ Total /70

Time to reach optimal idea: _____ min
Time to first correct submission: _____ min
Number of bugs hit and fixed:

Was the correctness argument articulated? Y/N
Were 2+ follow-ups answered? Y/N

Strongest dimension:
Weakest dimension:
Action item for next session:

What to Do If You Fail

Score 38–44: Repeat with a different problem; you nearly passed.
Score <38: Step back to mediums; ensure 3+ consecutive passes of Mock 03 before retrying hard.
Couldn’t reach optimal complexity: Review Phase 2 (binary search), Phase 5 (DP), Phase 4 (graphs) — which patterns did you miss?
Bug-storm on implementation: Phase 10 Lab 04 (correctness proofs) and Lab 05 (stress testing).
Failed follow-ups: You knew the algorithm but didn’t know its variants; do 5 related problems before the next attempt.
Pass twice in a row before moving to Mock 05.

Mock 05 — Big Tech Phone Screen

Interview type: FAANG-style phone screen (Google, Meta, Amazon, Microsoft, Apple)
Target role: SWE-II / Senior phone round
Time limit: 45 minutes total
Format: ~5 min intro + 35 min coding + 5 min Q&A. ONE medium-to-hard problem with strong follow-ups.
Hints policy: Hints cost real points at FAANG. One is acceptable, two is borderline, three fails.
Primary goal: Show you can work cleanly under FAANG’s exact format.

What This Mock Tests

This mock simulates the actual FAANG phone screen format. The interviewer:

Greets you (~3 min)
Asks one 1-min behavioral warm-up (“What are you excited about lately?”)
Presents the coding problem
Expects you to clarify, plan, code, test
Asks 2–3 follow-up extensions in the remaining time

The signal they’re collecting: can this person work on our team without supervision? Specifically — do they understand requirements, optimize without being told to, write reasonable code, and engage with extensions intelligently?

Scoring weights: Problem Understanding, Optimization, Communication, and Follow-ups are all critical (each needs 3+). One weak dimension means no advance.

Pick One Problem

Problem A — Longest Increasing Path in a Matrix (LC 329)

Given an m × n integer matrix, return the length of the longest strictly increasing path. From a cell, you may move up/down/left/right (no diagonals, no wraparound).

Examples:

[[9,9,4],[6,6,8],[2,1,1]]   → 4   (path 1→2→6→9)
[[3,4,5],[3,2,6],[2,2,1]]   → 4   (path 3→4→5→6)
[[1]]                         → 1

Constraints: 1 ≤ m, n ≤ 200. 0 ≤ matrix[i][j] ≤ 2^31 - 1.

Problem B — Word Ladder (LC 127)

Given two words beginWord and endWord and a dictionary wordList, find the length of the shortest transformation sequence, counting both beginWord and endWord (each step changes exactly one letter; intermediate words must be in the dictionary). Return 0 if no sequence exists.

Examples:

"hit", "cog", ["hot","dot","dog","lot","log","cog"]   → 5  (hit→hot→dot→dog→cog)
"hit", "cog", ["hot","dot","dog","lot","log"]          → 0  (cog not in dict)

Constraints: 1 ≤ |beginWord| ≤ 10. All words same length, lowercase. 1 ≤ |wordList| ≤ 5000.

Problem C — Number of Islands (LC 200)

Given a 2D grid of '1' (land) and '0' (water), count the number of islands (connected groups of land, 4-directional).

Examples:

[
 ["1","1","1","1","0"],
 ["1","1","0","1","0"],
 ["1","1","0","0","0"],
 ["0","0","0","0","0"]
] → 1

[
 ["1","1","0","0","0"],
 ["1","1","0","0","0"],
 ["0","0","1","0","0"],
 ["0","0","0","1","1"]
] → 3

Constraints: 1 ≤ m, n ≤ 300.

Expected Communication Style

Restate and confirm types. (“Integer matrix; I return the length of the longest strictly increasing path; movement is 4-directional.”)
Ask 3–5 clarifying questions: matrix size, value range, strictly vs. non-strictly increasing, and whether diagonals count.
State brute force with complexity. (“DFS from every cell, no memo, exponential worst case.”)
Identify the optimization signal. (“DFS + memoization since subpath answers don’t change. Or topological sort on the DAG of (i,j) → (i’, j’) where val(i,j) < val(i’, j’).”)
Justify your choice between alternatives. (“Memo’d DFS is simpler; topo sort (Kahn’s algorithm) is iterative, so it avoids recursion-depth limits at 200×200, where a path can be up to 40,000 cells long.”)
Code cleanly. Helper functions, no inline magic.
Walk through the example. Test 2+ edge cases.
Engage with the follow-ups — these decide the round.

Solution Sketches

A. Longest Increasing Path: memoized DFS. dp[i][j] = longest path starting at (i, j). Recurse to 4 neighbors with strictly greater value; dp[i][j] = 1 + max(dp[neighbor]). O(mn) time and space. The DAG structure ((i,j) → (i', j') iff val < val') guarantees no cycles, so memo is sound.
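A minimal sketch of the memoized DFS, using functools.lru_cache as the memo table. The function name is illustrative. Python's default recursion limit (about 1000 frames) is far below a worst-case 40,000-cell path on a 200×200 grid, so full-size inputs would need sys.setrecursionlimit raised or the topological-sort alternative.

```python
from functools import lru_cache
from typing import List


def longest_increasing_path(matrix: List[List[int]]) -> int:
    if not matrix or not matrix[0]:
        return 0
    rows, cols = len(matrix), len(matrix[0])

    @lru_cache(maxsize=None)
    def longest_from(r: int, c: int) -> int:
        # dp[r][c]: length of the longest strictly increasing path starting at (r, c).
        best = 1
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Strict ">" keeps the graph acyclic, so memoization is sound.
            if 0 <= nr < rows and 0 <= nc < cols and matrix[nr][nc] > matrix[r][c]:
                best = max(best, 1 + longest_from(nr, nc))
        return best

    return max(longest_from(r, c) for r in range(rows) for c in range(cols))
```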

B. Word Ladder: BFS over the graph where nodes are words and edges connect words differing by one letter. Use a “wildcard bucket” optimization: for each word, generate patterns like h*t, *ot, ho*; bucket words by pattern; neighbors are words sharing a pattern bucket. O(N · L²) where N = dict size, L = word length.
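A minimal sketch of BFS with wildcard buckets. The function name is illustrative. Clearing a bucket after its first expansion is a small optimization: every word in that bucket has already been enqueued, so rescanning it would add nothing.

```python
from collections import defaultdict, deque
from typing import List


def ladder_length(begin_word: str, end_word: str, word_list: List[str]) -> int:
    words = set(word_list)
    if end_word not in words:             # endWord must be in the dictionary
        return 0
    length = len(begin_word)
    buckets = defaultdict(list)           # "h*t" -> ["hot", "hit", ...]
    for w in words | {begin_word}:
        for i in range(length):
            buckets[w[:i] + "*" + w[i + 1:]].append(w)
    visited = {begin_word}
    queue = deque([(begin_word, 1)])      # distance counts words, including beginWord
    while queue:
        word, dist = queue.popleft()
        if word == end_word:
            return dist
        for i in range(length):
            pattern = word[:i] + "*" + word[i + 1:]
            for nxt in buckets[pattern]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, dist + 1))
            buckets[pattern] = []         # each bucket needs expanding only once
    return 0
```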

C. Number of Islands: for each unvisited ‘1’, flood fill (BFS or DFS), mark visited, increment count. O(mn) time and space.
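A minimal BFS flood-fill sketch. It tracks visited cells in a separate set rather than overwriting '1's, so it leaves the caller's grid untouched (see the failure mode about modifying input). The function name is illustrative.

```python
from collections import deque
from typing import List


def num_islands(grid: List[List[str]]) -> int:
    if not grid or not grid[0]:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "1" or (r, c) in seen:
                continue
            count += 1                    # new island found; flood it
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == "1" and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return count
```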

Common Failure Modes

A: brute force without memo. TLE on 50×50.
A: incorrect strict vs non-strict check. Using >= instead of > lets equal neighbors form cycles, so the memoized DFS never terminates or returns a wrong answer.
B: built a graph by comparing every pair of words. O(N² · L) — too slow for N = 5000.
B: didn’t notice endWord may not be in the dictionary. Returns a wrong answer if you assume it is.
B: BFS without visited tracking. Infinite loop.
C: modified input grid without permission. Some interviewers care; clarify first.
All: weak follow-up answers. “I’d just use a database” — too vague; doesn’t show understanding.

Passing Bar

Total score: 46/70 (average 3.3)
Optimal complexity reached
Correctness on given examples + 2 edge cases
Hint usage ≤ 1
Time ≤ 45 min
Two follow-ups answered with substance

Follow-up Questions

For A:

Return the path itself, not just the length. → Add parent pointer in DP; reconstruct.
What if path can revisit cells? → No longer a DAG; problem is NP-hard (Hamiltonian-flavored).
Path with diagonal moves allowed? → 8 neighbors instead of 4; same algorithm.
Matrix is sparse (mostly 0). → Algorithm doesn’t change asymptotically; data layout (CSR) matters at scale.
Matrix doesn’t fit in memory. → Chunked processing with overlap; harder boundary handling.

For B:

Return one valid path. → Track BFS parent; reconstruct.
Return ALL shortest paths (Word Ladder II, LC 126). → BFS to build the DAG, then DFS to enumerate.
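Bidirectional BFS for speedup. → Search from both ends and meet in the middle. With branching factor b and distance d, each side explores about b^(d/2) nodes instead of b^d, roughly a square-root reduction.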
Streaming dictionary (words arriving). → Re-bucket on each insert; same algorithm.

For C:

Count distinct island shapes. → Canonicalize the shape (sort relative cell positions, possibly rotate/reflect); hash.
Number of islands II (online: cells added one by one). → Union-Find; amortized O(α(N)) per operation.
Largest island after flipping at most one ‘0’ to ‘1’. → Label each island with size; for each ‘0’, sum sizes of unique neighboring islands + 1.
3D islands. → Same algorithm, 6 neighbors instead of 4.
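A minimal union-find sketch for the online Number of Islands II follow-up, using union by rank and path halving, which together give the amortized O(α(N)) bound. The function name and the (rows, cols, positions) signature are illustrative assumptions modeled on LC 305.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def num_islands_online(rows: int, cols: int, positions: List[List[int]]) -> List[int]:
    parent: Dict[Cell, Cell] = {}
    rank: Dict[Cell, int] = {}

    def find(x: Cell) -> Cell:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: Cell, b: Cell) -> bool:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False                   # already the same island
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra                    # attach shorter tree under taller
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    count, result = 0, []
    for r, c in positions:
        cell = (r, c)
        if cell not in parent:             # re-adding land changes nothing
            parent[cell], rank[cell] = cell, 0
            count += 1
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) in parent:
                    if union(cell, (nr, nc)):
                        count -= 1         # two islands merged into one
        result.append(count)
    return result
```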

Required Tests

All given examples
1×1 matrix / single-letter input
All-same-value matrix (A: answer is 1)
Disconnected components (C: multiple islands)
Long diagonal-like path (A)
Dictionary missing the endWord (B)
Empty grid / empty dictionary (if the constraints allow it)

Required Complexity Explanation

Time, with reasoning
Space, including recursion stack and memoization tables
Worst-case input that triggers the worst-case complexity
For A: explain why memo turns O(4^(mn)) into O(mn)

Self-Evaluation Template

Mock 05 — Big Tech Phone Screen
Date: _______
Problem: _______
Time: ___ / 45 min
Hints used: ___
Follow-ups answered well (out of 2 asked): ___

Scores (1–5):
___ Total /70

Did I narrate continuously? Y/N
Did I identify the optimization signal before coding? Y/N
Did I test 2+ unprompted edges? Y/N

What I would change for the real interview:

What to Do If You Fail

Score 40–45: Re-do with a different problem; pinpoint weak dimension.
Score <40: You’re not ready for FAANG phone screens. Do 10 more mediums + 3 hards, then retry.
Optimization gap: Phase 2 patterns + Phase 4 graphs are the most-tested patterns at FAANG.
Follow-up weakness: This is the #1 thing that distinguishes “hire” from “no-hire” at FAANG phone screens. Treat follow-ups as a primary skill, not an afterthought.
Pass twice in a row before moving to Mock 06.

Mock 06 — Big Tech Onsite

Interview type: FAANG onsite coding round (a single round of the 4–5 onsite rounds)
Target role: FAANG SWE-II / Senior
Time limit: 60 minutes total
Format: ~5 min intro + 50 min coding (TWO problems back-to-back) + 5 min Q&A
Hints policy: One hint per problem is acceptable; more is below bar.
Primary goal: Demonstrate sustained performance across two problems without losing tempo.

What This Mock Tests

FAANG onsites run 4–5 of these rounds per day. Each round expects you to solve 1–2 problems in 50 minutes of coding. This mock packs two problems into 60 minutes deliberately — the time pressure is real.

The signal: are you a consistent solver, not a one-hit wonder? Can you context-switch from problem 1 to problem 2 without resetting?

Scoring weights: all dimensions matter; Time Management is implicit — running out of time on problem 2 is a hard fail signal.

Format

Pick one easy/medium problem (15–20 min) AND one medium/hard problem (30–40 min) from the list below. The interviewer presents one, then immediately the next once you finish. No break.

Problem Set 1 (warm-up: 15–20 min)

A1 — Valid Anagram (LC 242)

Given two strings, determine if one is an anagram of the other.

"anagram", "nagaram"  → True
"rat", "car"          → False

Constraints: 1 ≤ |s|, |t| ≤ 5×10^4. Lowercase English letters (or follow up: Unicode).

A2 — Climbing Stairs (LC 70)

You can climb 1 or 2 steps at a time. How many distinct ways to reach step n?

n = 2 → 2   (1+1, 2)
n = 3 → 3   (1+1+1, 1+2, 2+1)

Constraints: 1 ≤ n ≤ 45.

A3 — Move Zeroes (LC 283)

Given nums, move all 0s to the end while maintaining the relative order of non-zero elements. In-place.

[0,1,0,3,12]  → [1,3,12,0,0]

Problem Set 2 (main: 30–40 min)

B1 — LRU Cache (LC 146)

Design and implement a Least Recently Used cache with get(key) and put(key, value) both in O(1).

LRUCache cache(2)
cache.put(1, 1)
cache.put(2, 2)
cache.get(1)       → 1
cache.put(3, 3)    // evicts key 2
cache.get(2)       → -1
cache.put(4, 4)    // evicts key 1
cache.get(1)       → -1
cache.get(3)       → 3
cache.get(4)       → 4

Constraints: 1 ≤ capacity ≤ 3000. At most 10^5 operations.

B2 — Word Break (LC 139)

Given a string s and a dictionary wordDict, return True if s can be segmented into a sequence of dictionary words.

"leetcode", ["leet", "code"]   → True
"applepenapple", ["apple","pen"] → True
"catsandog", ["cats","dog","sand","and","cat"] → False

Constraints: 1 ≤ |s| ≤ 300. 1 ≤ |wordDict| ≤ 1000.

B3 — Course Schedule II (LC 210)

Given numCourses and prerequisites (pairs [a, b] meaning b must be taken before a), return an order in which to take all courses, or [] if impossible.

2, [[1,0]]                    → [0, 1]
4, [[1,0],[2,0],[3,1],[3,2]]  → [0, 1, 2, 3] or [0, 2, 1, 3]
2, [[1,0],[0,1]]              → []

Constraints: 1 ≤ numCourses ≤ 2000. 0 ≤ |prerequisites| ≤ 5000.

Expected Communication Style

For each problem:

Restate
2–3 clarifying questions (don’t over-ask on the warm-up; do for the main)
Brute force + complexity
Optimal approach + complexity
Code with narration
Test 2–3 cases
Move on to the next problem without dragging

Critical: manage time aggressively. If you blow past 20 min on problem 1, stop and move on. Failing on time is worse than producing partial code on both.

Solution Sketches

A1 Anagram: count chars (Counter or an array of 26), compare. O(N + M).

A2 Climbing Stairs: Fibonacci. DP or closed form. O(N) or O(1).

A3 Move Zeroes: two-pointer; the write pointer advances on each non-zero, then pad the tail with zeros. O(N) time, O(1) space.
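A minimal sketch of the three warm-ups. The function names are illustrative. Counter handles the Unicode follow-up for free; a 26-slot array is the lowercase-only alternative.

```python
from collections import Counter
from typing import List


def is_anagram(s: str, t: str) -> bool:
    return len(s) == len(t) and Counter(s) == Counter(t)


def climb_stairs(n: int) -> int:
    # ways(n) = ways(n - 1) + ways(n - 2), with ways(0) = ways(1) = 1.
    prev, curr = 1, 1
    for _ in range(n - 1):
        prev, curr = curr, prev + curr
    return curr


def move_zeroes(nums: List[int]) -> None:
    write = 0
    for read in range(len(nums)):
        if nums[read] != 0:               # compact non-zeros, preserving order
            nums[write] = nums[read]
            write += 1
    for i in range(write, len(nums)):     # pad the tail with zeros
        nums[i] = 0
```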

B1 LRU Cache: doubly linked list + hashmap. List front = most recent, tail = LRU. get → move node to front; put → if exists move to front and update; if not, insert at front; if over capacity, remove tail and delete from hashmap. O(1) per op.
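A from-scratch sketch of the hashmap + doubly linked list design, the version worth offering when an interviewer declines OrderedDict. Head and tail sentinel nodes remove every null check from the splice operations. Class and helper names are illustrative.

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key: int = 0, value: int = 0):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                              # key -> node
        self.head, self.tail = _Node(), _Node()    # head.next = MRU, tail.prev = LRU
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node: _Node) -> None:
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node: _Node) -> None:
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: int) -> int:
        node = self.map.get(key)
        if node is None:
            return -1
        self._unlink(node)                         # touch: move to MRU position
        self._push_front(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        node = self.map.get(key)
        if node is not None:                       # update existing key, mark as MRU
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)
        if len(self.map) > self.capacity:          # evict from the LRU end
            lru = self.tail.prev
            self._unlink(lru)
            del self.map[lru.key]
```

Storing the key inside each node is what lets eviction delete the hashmap entry in O(1).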

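B2 Word Break: DP. dp[i] = True if s[:i] can be segmented; dp[i] = any(dp[j] and s[j:i] in wordSet) for j < i. Trying every j costs O(N²) set lookups, each O(N) to slice and hash, so O(N³) worst case; limiting j to the last max_word_len positions gives O(N · L²), where L = max word length. Allocate dp = [False] * (n + 1) and set dp[0] = True (the empty prefix); getting either wrong is a frequent off-by-one bug.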
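A minimal sketch of the bottom-up DP with the max-word-length bound on j. The function name is illustrative.

```python
from typing import List


def word_break(s: str, word_dict: List[str]) -> bool:
    words = set(word_dict)
    max_len = max((len(w) for w in words), default=0)
    n = len(s)
    dp = [False] * (n + 1)       # dp[i]: can s[:i] be segmented?
    dp[0] = True                 # the empty prefix is trivially segmentable
    for i in range(1, n + 1):
        # Only the last max_len start positions can begin a dictionary word.
        for j in range(max(0, i - max_len), i):
            if dp[j] and s[j:i] in words:
                dp[i] = True
                break
    return dp[n]
```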

B3 Course Schedule II: topological sort via Kahn’s algorithm (BFS on indegree). If output length < numCourses, there’s a cycle → return []. O(V + E).
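A minimal sketch of Kahn's algorithm for this problem. The function name is illustrative. A pair [a, b] becomes the edge b → a, and any course never reaching indegree 0 lies on, or depends on, a cycle.

```python
from collections import deque
from typing import List


def find_order(num_courses: int, prerequisites: List[List[int]]) -> List[int]:
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:   # [a, b]: b must be taken before a
        graph[prereq].append(course)
        indegree[course] += 1
    queue = deque(c for c in range(num_courses) if indegree[c] == 0)
    order = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for nxt in graph[c]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:         # all prerequisites satisfied
                queue.append(nxt)
    return order if len(order) == num_courses else []
```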

Common Failure Modes

Spent 30 min on the easy. Total fail; you’ll run out for the main problem.
LRU implemented with built-in OrderedDict (Python) without explaining the underlying data structure. Some interviewers accept this; many do not. Always offer to implement from scratch.
Word Break: O(2^N) recursion without memoization. TLE on N=100.
Course Schedule: DFS-based cycle detection but forgot to track three states (unvisited / in-progress / done). Marks node done while in progress → false negatives.
Trying to start problem 2 fresh without acknowledging the time check. Senior signal: “We have 35 min left and this is the main problem; let me dive in.”

Passing Bar

Total score: 49/70 (average 3.5)
BOTH problems implemented correctly
Optimal complexity on the main problem
Time managed: problem 1 ≤ 20 min, problem 2 ≤ 40 min
Hint usage ≤ 1 total
Tests run on both

Follow-up Questions (asked between or after problems)

For B1 (LRU):

Make it thread-safe. → Coarse-grained lock; or read-write lock with care; or lock-free with hazard pointers (advanced).
LFU instead of LRU. → Two-level structure: hashmap to nodes, hashmap to frequency-buckets, each bucket a doubly-linked list.
Distributed LRU across multiple servers. → Consistent hashing + per-shard LRU.
Persist to disk. → Write-behind cache; reconstruct on startup.

For B2 (Word Break):

Word Break II — return all valid sentences. → Backtracking with memoization on the suffix.
Words can be arbitrarily long, dict has 10^6 words. → Trie for prefix lookup during DP, O(N²) using trie traversal.
Streaming version: input arrives one char at a time. → Online DP: update dp[i] as i grows; an Aho-Corasick automaton over the dictionary reports every word ending at position i.

For B3 (Course Schedule):

Find any cycle and return it. → DFS with parent tracking; on back-edge, walk parents.
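Schedule to minimize the number of semesters (parallel courses). → The answer is the number of courses on the longest path in the DAG, which equals the number of BFS layers in Kahn’s algorithm; O(V + E).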
Add weighted edges (course duration). → Critical path method.

Required Tests

All given examples (both problems)
Empty / single-element input for problem 1
Capacity 1 for LRU
Single-word dict and s that exactly equals one word for Word Break
Course graph with cycle for B3
Self-loop course (numCourses=1, prereq=[[0,0]]) — should return []

Required Complexity Explanation

For both problems, state time and space + identify which one is the bottleneck under the actual constraints (N=300 for Word Break ⇒ O(N²) is fine; N=10^5 ops on LRU ⇒ O(1) per op is mandatory).

Self-Evaluation Template

Mock 06 — Big Tech Onsite
Date: _______
Problem 1: _______ — Time: ___ min, Score: ___/70
Problem 2: _______ — Time: ___ min, Score: ___/70
Total time: ___ / 60 min
Hints used: ___ (across both)

Combined avg score:
Both problems complete? Y/N
Tests run on both? Y/N

What went well:
What went poorly:
Time-management notes:
Action item:

What to Do If You Fail

Failed problem 2 due to time: Practice problem 1 against a stricter timer (15 min).
Failed problem 2 due to difficulty: Mock 04 (hard LC) needs more reps.
Hint-heavy on both: Foundational pattern recognition gap; return to Phase 2.
LRU implementation issues: Drill data structure design — Phase 1 lab 04 + 05 (linked lists, stacks/queues).
Pass twice consecutively before Mock 07.

Mock 07 — Senior Engineer

Interview type: Senior SWE coding + design hybrid
Target role: Senior Software Engineer (L5 Google / E5 Meta / SDE3 Amazon)
Time limit: 60 minutes
Format: ONE problem + system extension + explicit tradeoff discussion
Hints policy: Hints on the algorithm lower your score significantly; hints on the extension are acceptable.
Primary goal: Show senior-level reasoning: not just solving, but choosing among solutions with reasoning.

What This Mock Tests

At the senior bar, mere correctness is not enough. The interviewer wants to see:

You consider multiple approaches and articulate why you chose one
You understand the production implications of your choices
You can extend the algorithm into a service-like context
You can answer “what if the input is 1000× larger” with a concrete plan

Scoring weights: Tradeoff Reasoning (#13), Production Awareness (#14), Optimization (#4), Follow-ups (#11) are all critical. A senior who scores 3 on tradeoffs has signaled “mid-level”; needs 4+.

Pick One Problem

Problem A — Design a Rate Limiter

Build a rate limiter supporting allow(user_id, timestamp) → bool. Each user can make at most N requests per W seconds. Discuss the algorithm, then extend to a multi-server / distributed setting.

Initial constraints: in-process, single-thread. N ≤ 1000 reqs/window. W ≤ 60 sec. 1M users.

Problem B — Top K Frequent Elements (LC 347) + Streaming Extension

Phase 1: Given an array nums and integer k, return the k most frequent elements. O(N log K).

[1,1,1,2,2,3], k=2  → [1, 2]
[1], k=1            → [1]

Phase 2 (extension): the input is a never-ending stream; report top-k continuously with bounded memory. Discuss exact vs approximate tradeoffs.

Problem C — Snapshot Array (LC 1146)

Implement SnapshotArray:

SnapshotArray(length) — initialize with length zeros
set(index, val) — set value at index
snap() → snap_id — take a snapshot, return its id
get(index, snap_id) — value at index at the time of snap_id

Discuss the algorithm; extend to a versioned key-value store with garbage collection of old snapshots.

Expected Communication Style

Restate, including the implied requirements (“rate limit must be enforced even if the same user hits multiple instances”).
Ask senior-grade clarifying questions: read-heavy vs write-heavy? Latency targets? Consistency requirements? Failure modes acceptable?
Propose 2+ approaches with explicit tradeoffs. (“Token bucket vs sliding window log vs sliding window counter — I’d pick X because…”)
State the algorithm and complexity.
Code the chosen approach with senior code quality (clear naming, error handling at boundaries, no premature abstraction).
Discuss the production extension without prompting.
Anticipate failure modes — what breaks at 10× scale? 100×?

Solution Sketches

A. Rate Limiter:

Sliding window log: per user, a deque of timestamps. On request, drop entries older than W, check length < N, and append only if allowed. Worst case O(N) per request, amortized O(1), because each timestamp is appended and dropped at most once. Memory: O(users × N).
Token bucket: per user, (tokens, last_refill). On request, refill tokens += (now - last) × rate with rate = N/W, cap at N, then allow and decrement if tokens ≥ 1. O(1) per request. Bursty: a full bucket admits N requests at once.
Sliding window counter: approximate; keeps 2 counters per user (previous and current fixed window) and estimates the rolling count as prev × (1 - fraction of the current window elapsed) + curr. O(1) time, small memory.
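A minimal single-threaded token bucket sketch matching the allow(user_id, timestamp) signature from the problem statement. The class name is illustrative, and starting new users with a full bucket is an assumption (a common choice, not a requirement).

```python
from typing import Dict, Tuple


class TokenBucketLimiter:
    def __init__(self, max_requests: int, window_seconds: float):
        self._capacity = float(max_requests)            # N tokens max
        self._rate = max_requests / window_seconds      # refill rate N / W per second
        self._buckets: Dict[str, Tuple[float, float]] = {}  # user -> (tokens, last_refill)

    def allow(self, user_id: str, timestamp: float) -> bool:
        # New users start with a full bucket (assumption).
        tokens, last = self._buckets.get(user_id, (self._capacity, timestamp))
        tokens = min(self._capacity, tokens + (timestamp - last) * self._rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[user_id] = (tokens, timestamp)
        return allowed
```

Because a full bucket admits N requests immediately and then refills at N/W per second, up to roughly 2N requests can pass within a single W-second span. That is the burstiness tradeoff to name against the sliding window log.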

Distributed extension: per-user state in Redis with atomic Lua script; or consistent-hash users to dedicated instances; or central counter with relaxed accuracy.

B. Top K Frequent:

Static: Counter + min-heap of size K. O(N log K).
Streaming exact: impossible with bounded memory in general (any element could become top-K later).
Streaming approximate: Count-Min Sketch + heap of candidates; or Misra-Gries / SpaceSaving algorithm for ε-approximate. O(1/ε) memory.
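A minimal sketch of the Misra-Gries summary named above. It keeps at most k - 1 counters. Any element whose true frequency exceeds n/k is guaranteed to survive, and each surviving count underestimates the truth by at most n/k, so choosing k ≈ 1/ε gives the O(1/ε) memory bound. The function name and signature are illustrative.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free slot: decrement every counter and drop those that hit zero.
            # Total decrement work is bounded by total increments, so this is amortized O(1).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

To report top-K, keep the candidates with the largest counters. A second pass over the data, if available, can turn these estimates into exact counts.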

C. Snapshot Array:

Naïve: copy entire array per snapshot. O(length) per snap, O(snaps × length) memory.
Better: per-index, store list of (snap_id, value) pairs sorted by snap_id. Lookup with binary search. O(log S) per get, O(1) per set, O(total writes) memory.
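A minimal sketch of the per-index version history, using bisect over parallel lists of snap ids and values. The class matches the LC 1146 method names. Seeding every index with (0, 0) is a simplification that costs O(length) memory up front; a lazy dict would avoid it. Repeated writes before the next snap() overwrite the latest entry, so each index stores at most one value per snapshot.

```python
import bisect
from typing import List


class SnapshotArray:
    def __init__(self, length: int):
        self._snap_id = 0
        # Per index: snap ids (strictly increasing) and the value written in that version.
        self._ids: List[List[int]] = [[0] for _ in range(length)]
        self._vals: List[List[int]] = [[0] for _ in range(length)]

    def set(self, index: int, val: int) -> None:
        ids, vals = self._ids[index], self._vals[index]
        if ids[-1] == self._snap_id:
            vals[-1] = val                 # overwrite within the current version
        else:
            ids.append(self._snap_id)
            vals.append(val)

    def snap(self) -> int:
        self._snap_id += 1
        return self._snap_id - 1

    def get(self, index: int, snap_id: int) -> int:
        # Last write at or before snap_id: O(log S).
        pos = bisect.bisect_right(self._ids[index], snap_id) - 1
        return self._vals[index][pos]
```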

Versioned KV store extension: persistent data structures (Clojure-style); or copy-on-write trees; or LSM-tree with snapshot isolation.

Common Failure Modes

Implemented the first algorithm that came to mind without discussing alternatives. This is the #1 senior-bar failure.
Said “I’d just use Redis” without explaining the algorithm Redis would implement. The interviewer wants the algorithm; the database is a deployment choice.
Top-K streaming: claimed exact algorithm with bounded memory. Impossible in general; signals theoretical weakness.
Snapshot Array: copied the array per snapshot. Acceptable as brute force; bad as final answer for a senior.
No tests beyond the given examples.
Skipped the production extension entirely.

Passing Bar

Total score: 53/70 (average 3.8)
Tradeoff Reasoning #13: ≥ 4
Production Awareness #14: ≥ 4
Optimal or near-optimal algorithm
Extension discussed substantively (not just “I’d shard it”)
Correct, readable code

Follow-up Questions

For A (Rate Limiter):

Latency budget: < 1ms p99. → In-memory store; Redis is borderline (network RTT). Local cache with eventual sync.
Multi-region with strict global limit. → Hard; usually relaxed to per-region limit + occasional reconciliation.
What if Redis goes down? → Fail-open (allow) vs fail-closed (deny); usually fail-open for rate limiters.
Hot user (one user makes 90% of requests). → Dedicated shard; or local fast path before checking shared state.

For B (Top K):

Approximate with ε = 0.01. → Count-Min Sketch sized for it: width ⌈e/ε⌉ (about 272 counters per row) and depth ⌈ln(1/δ)⌉ rows for failure probability δ.
Top K most-improved over the last hour. → Two-window comparison; bigger memory.
Trending detection (top K with sudden growth). → Slope/derivative-based; needs time-windowed counts.
What if K = 1M? → The K-sized heap itself may fit, but exact counts for every distinct element may not; fall back to external merge or sampling.

For C (Snapshot Array):

Snapshot every write (versioned KV). → Same structure; consider compaction.
Memory pressure: drop snapshots older than 1 hour. → Per-index list pruning; tombstones for fully-deleted snaps.
Snapshot isolation in a multi-writer setting. → Multi-version concurrency control; per-transaction snapshot id.
Persist snapshots to disk. → Log-structured store; periodically checkpoint.

Required Tests

Given examples
Empty / boundary input
Heavy churn (many writes to same index for C)
Single user / single key
Burst of requests at the window boundary (A)
K = number of distinct elements for B (no filtering)

Required Complexity + Production Discussion

Cover:

Time per operation, space per user/element
Latency under typical load vs worst case
Memory growth and GC implications
Failure semantics (what happens on partial failure)
Monitoring metrics you’d add (rate limit reject rate, top-K convergence time, snapshot lookup p99)

Self-Evaluation Template

Mock 07 — Senior Engineer
Date: _______
Problem: _______
Time: ___ / 60 min

Scores (1–5):
___ Total /70

Tradeoff Reasoning (#13): ___
Production Awareness (#14): ___
Did I propose 2+ approaches before coding? Y/N
Did I anticipate scale-up failure modes? Y/N

Action item:

What to Do If You Fail

Tradeoff or Production score below 4: Read Phase 8 (practical engineering) deeply; rebuild a small system (rate limiter, cache).
Algorithm score below 3: You haven’t earned the right to do senior interviews yet; back to Mock 04–06.
Code quality issues: Read CODE_QUALITY.md.
Pass twice consecutively before Mock 08.

Mock 08 — Staff Practical

Interview type: Staff/Principal engineer practical coding round
Target role: Staff SWE (L6 Google / E6 Meta / Principal Amazon)
Time limit: 75 minutes
Format: Build a working component (not a LeetCode puzzle) with multiple interacting pieces
Hints policy: Hints affect your score but rarely fail you outright at the staff bar; the bar is judgment, not raw problem-solving.
Primary goal: Demonstrate the ability to build a real thing under time pressure, with monitoring and failure-mode awareness baked in.

What This Mock Tests

Staff interviews shift away from “can you solve this puzzle” toward “can you build something we’d ship.” You’re given a problem statement that resembles a small feature spec. Your job:

Decompose the problem into modules with clean interfaces
Choose data structures that match real production constraints
Implement the core fully + skeletons for the rest
Discuss monitoring, deployment, failure recovery, evolution
Justify every choice you make against alternatives

Scoring weights: Code Quality (#7), Tradeoff Reasoning (#13), Production Awareness (#14) are paramount. Pure algorithmic Optimization is less critical — staff problems rarely have a “trick.”

Pick One Build

Build A — In-Memory Rate Limiter Library

Build a usable Python/Java/Go module that provides:

limiter = RateLimiter(max_requests=100, window_seconds=60)
allowed = limiter.allow(user_id="alice")  # bool, ~10µs p99

Required:

Multiple algorithms behind a uniform interface (token bucket + sliding window log + sliding window counter)
Configurable per-user vs global limits
Thread-safe
A stats() method returning rejection rate per user
A purge() method to evict idle users from memory
Tests covering correctness, thread safety, and the window-edge case (burst at exactly t=W)

Build B — Bounded LRU Cache with TTL and Stats

Build an LRU cache that also supports per-entry TTL:

cache = LRUCache(capacity=10000, default_ttl_seconds=300)
cache.put(key, value, ttl=None)   # uses default
val = cache.get(key)              # returns None if missing/expired
cache.delete(key)
cache.stats()                     # hit rate, eviction rate, expiration rate

Required:

O(1) get/put
Lazy + active TTL expiration
Thread-safe
Memory cap (eviction policy: LRU among non-expired)
Tests: correctness, expiration races, concurrent put/get, capacity overflow

Build C — Job Scheduler (cron-like)

Build a scheduler that runs jobs at specified intervals:

scheduler = Scheduler()
scheduler.add(name="cleanup", interval_sec=300, fn=cleanup_fn)
scheduler.add(name="report", cron="0 9 * * MON", fn=report_fn)  # nice-to-have
scheduler.start()
scheduler.stop()
scheduler.status()  # last run, next run, last error per job

Required:

Multiple jobs running independently
Graceful shutdown (don’t kill mid-job)
Per-job error isolation (one job’s failure doesn’t crash the scheduler)
Catch-up policy on missed runs (skip vs catch-up; configurable)
Tests: timing, overlap, panic in job

Expected Communication Style

Restate with assumptions stated upfront: “I’ll assume single-process, multi-threaded, in-memory; if you want me to extend to distributed, that’s a separate discussion.”
Propose the module decomposition before writing any code. Whiteboard the public interface, the internal modules, the data flow.
Identify the 2–3 critical design decisions and discuss alternatives.
Pick one and code it — favor depth over breadth. Skeleton/stub the rest with comments like # TODO: implement token bucket variant with same interface.
Discuss monitoring without prompting: which metrics, why, what alerts.
Discuss failure modes without prompting: thread starvation, memory blowup, race conditions, partial failures.
Test the critical path. Production-style tests, not just smoke.

Solution Sketches

A. Rate Limiter:

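import threading
import time
from abc import ABC, abstractmethod
from collections import Counter, defaultdict, deque

class RateLimiter(ABC):
    @abstractmethod
    def allow(self, key: str) -> bool: ...
    @abstractmethod
    def stats(self) -> dict: ...
    @abstractmethod
    def purge(self, idle_seconds: float) -> None: ...

class SlidingWindowLogLimiter(RateLimiter):
    def __init__(self, max_requests: int, window_seconds: float):
        self._max = max_requests
        self._window = window_seconds
        self._logs = defaultdict(deque)   # key → deque of accepted-request timestamps
        self._lock = threading.Lock()
        self._rejects = Counter()
        self._accepts = Counter()

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        with self._lock:
            log = self._logs[key]
            # Drop timestamps that have slid out of the window.
            while log and log[0] <= now - self._window:
                log.popleft()
            if len(log) < self._max:
                log.append(now)
                self._accepts[key] += 1
                return True
            self._rejects[key] += 1
            return False

    def stats(self) -> dict:
        # Rejection rate per key: rejects / (accepts + rejects).
        with self._lock:
            keys = self._accepts.keys() | self._rejects.keys()
            return {k: self._rejects[k] / (self._accepts[k] + self._rejects[k]) for k in keys}

    def purge(self, idle_seconds: float) -> None:
        # Evict keys whose most recent accepted request is older than idle_seconds.
        cutoff = time.monotonic() - idle_seconds
        with self._lock:
            for k in [k for k, log in self._logs.items() if not log or log[-1] < cutoff]:
                del self._logs[k]
                self._accepts.pop(k, None)
                self._rejects.pop(k, None)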

Plus token bucket and sliding-window-counter implementations behind the same interface.
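A minimal token bucket sketch, assuming the same allow/stats/purge method names as the RateLimiter interface (written standalone here rather than subclassing it) and a hypothetical injectable `clock` parameter so tests can control time. Tokens refill lazily on each call, so there is no background thread:

```python
import threading
import time
from collections import Counter


class TokenBucketLimiter:
    """Each key holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self._capacity = capacity
        self._rate = rate
        self._clock = clock              # hypothetical: injectable so tests can fake time
        self._buckets = {}               # key -> (tokens, last_refill_time)
        self._lock = threading.Lock()
        self._accepts = Counter()
        self._rejects = Counter()

    def allow(self, key: str) -> bool:
        now = self._clock()
        with self._lock:
            tokens, last = self._buckets.get(key, (self._capacity, now))
            # Lazy refill from elapsed time, clamped to capacity.
            # max(0.0, ...) ignores a clock that moves backward instead of draining tokens.
            tokens = min(self._capacity, tokens + max(0.0, now - last) * self._rate)
            if tokens >= 1:
                self._buckets[key] = (tokens - 1, now)
                self._accepts[key] += 1
                return True
            self._buckets[key] = (tokens, now)
            self._rejects[key] += 1
            return False

    def stats(self) -> dict:
        with self._lock:
            keys = self._accepts.keys() | self._rejects.keys()
            return {k: self._rejects[k] / (self._accepts[k] + self._rejects[k]) for k in keys}

    def purge(self, idle_seconds: float) -> None:
        # Drop buckets (and counters) for keys not seen within idle_seconds.
        cutoff = self._clock() - idle_seconds
        with self._lock:
            for key in [k for k, (_, last) in self._buckets.items() if last < cutoff]:
                del self._buckets[key]
                self._accepts.pop(key, None)
                self._rejects.pop(key, None)
```

Unlike the sliding window log, memory per key is constant (two numbers), at the cost of allowing a burst of up to `capacity` requests at once.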

B. LRU + TTL: doubly linked list + hashmap; each node stores (key, value, expires_at, prev, next). get: check expiry, evict if expired, return; else move to MRU. put: insert/update, evict LRU if over capacity. Background thread (or lazy on every get/put) sweeps expired entries.
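A minimal sketch of that design. It uses `collections.OrderedDict`, which is itself a hashmap over a doubly linked list, so `move_to_end` and `popitem(last=False)` give O(1) MRU promotion and LRU eviction. The `clock` parameter and the 16-entry bounded sweep from the LRU end are illustrative assumptions, one way to get "lazy + bounded active" expiration without a full scan:

```python
import itertools
import threading
import time
from collections import OrderedDict


class LRUCache:
    """LRU + per-entry TTL. OrderedDict is a hashmap over a doubly linked list."""

    def __init__(self, capacity, default_ttl_seconds, clock=time.monotonic):
        self._capacity = capacity
        self._default_ttl = default_ttl_seconds
        self._clock = clock                          # hypothetical: injectable for tests
        self._data = OrderedDict()                   # key -> (value, expires_at); LRU first
        self._lock = threading.Lock()
        self._hits = self._misses = self._evictions = self._expirations = 0

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
            if entry is not None and entry[1] <= self._clock():
                del self._data[key]                  # lazy expiration on read
                self._expirations += 1
                entry = None
            if entry is None:
                self._misses += 1
                return None
            self._data.move_to_end(key)              # now the most recently used
            self._hits += 1
            return entry[0]

    def put(self, key, value, ttl=None):
        ttl = self._default_ttl if ttl is None else ttl
        with self._lock:
            self._data[key] = (value, self._clock() + ttl)
            self._data.move_to_end(key)
            if len(self._data) > self._capacity:
                self._sweep_locked(max_checks=16)    # drop expired entries before live ones
            while len(self._data) > self._capacity:
                self._data.popitem(last=False)       # evict the least recently used entry
                self._evictions += 1

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)

    def sweep(self, max_checks=16):
        """Bounded active expiration; a background thread could call this periodically."""
        with self._lock:
            self._sweep_locked(max_checks)

    def _sweep_locked(self, max_checks):
        now = self._clock()
        # Inspect at most max_checks entries from the LRU end: O(max_checks), never a full scan.
        for key, (_, expires_at) in list(itertools.islice(self._data.items(), max_checks)):
            if expires_at <= now:
                del self._data[key]
                self._expirations += 1

    def stats(self):
        with self._lock:
            lookups = self._hits + self._misses
            return {"hit_rate": self._hits / lookups if lookups else 0.0,
                    "evictions": self._evictions, "expirations": self._expirations}
```

The bounded sweep only approximates "LRU among non-expired": an expired entry deep in the list can survive until it is read or reaches the LRU end.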

C. Job Scheduler: thread pool + priority queue of (next_run_time, job). Main loop: peek queue, sleep until next, run job in pool, re-schedule. Catch exceptions per job; record to status. Graceful shutdown: stop accepting new runs, await running ones with timeout.
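A minimal interval-only sketch of that loop (cron parsing omitted). It assumes a skip policy for missed runs and no overlapping runs of the same job; the `workers` parameter and the dict-based job record are illustrative choices. `concurrent.futures.ThreadPoolExecutor` serves as the pool:

```python
import heapq
import itertools
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Scheduler:
    """Interval-only sketch: one dispatcher thread pops a heap of (next_run, seq, name)."""

    def __init__(self, workers=4):
        self._jobs = {}                  # name -> job record (dict)
        self._heap = []                  # (next_run, seq, name); seq breaks ties
        self._seq = itertools.count()
        self._cond = threading.Condition()
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._stopping = False
        self._dispatcher = threading.Thread(target=self._loop, daemon=True)

    def add(self, name, interval_sec, fn):
        with self._cond:
            now = time.monotonic()
            self._jobs[name] = {"fn": fn, "interval": interval_sec, "running": False,
                                "last_run": None, "next_run": now, "last_error": None}
            heapq.heappush(self._heap, (now, next(self._seq), name))
            self._cond.notify()          # the dispatcher may be sleeping on a later deadline

    def start(self):
        self._dispatcher.start()

    def _loop(self):
        with self._cond:
            while not self._stopping:
                if not self._heap:
                    self._cond.wait()
                    continue
                next_run, _, name = self._heap[0]
                delay = next_run - time.monotonic()
                if delay > 0:
                    self._cond.wait(timeout=delay)   # re-check: add() or stop() may wake us
                    continue
                heapq.heappop(self._heap)
                job = self._jobs[name]
                # Skip policy: next run is computed from now, so missed runs are not replayed.
                job["next_run"] = time.monotonic() + job["interval"]
                heapq.heappush(self._heap, (job["next_run"], next(self._seq), name))
                if not job["running"]:               # never overlap two runs of one job
                    job["running"] = True
                    self._pool.submit(self._run, job)

    def _run(self, job):
        error = None
        try:
            job["fn"]()
        except Exception as exc:                     # a failure stays in this job's record
            error = repr(exc)
        with self._cond:
            job["last_run"] = time.monotonic()
            job["last_error"] = error
            job["running"] = False

    def stop(self):
        with self._cond:
            self._stopping = True                    # no new runs are dispatched
            self._cond.notify_all()
        self._dispatcher.join()
        self._pool.shutdown(wait=True)               # in-flight jobs finish; none are killed

    def status(self):
        with self._cond:
            return {name: {k: job[k] for k in ("last_run", "next_run", "last_error")}
                    for name, job in self._jobs.items()}
```

`ThreadPoolExecutor.shutdown` has no timeout argument, so bounding the wait for running jobs, as the sketch description asks, would need extra bookkeeping (for example, tracking the futures and waiting on them with a deadline).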

Common Failure Modes

Built the algorithm without the interface. Staff interviews care about the API as much as the implementation.
No thread safety. Mentioned in the spec; missed → fail.
No mention of monitoring/observability. Critical staff signal.
Used global state. Hard to test, hard to reason about.
Coded all three rate limiter algorithms in 75 min instead of one well + sketches. Depth > breadth.
TTL implementation does periodic full scan. O(N) sweep per second isn’t acceptable; lazy + bounded active sweep is.
Scheduler: jobs share state and races corrupt it. Job functions need to be treated as untrusted code.

Passing Bar

Total score: 56/70 (average 4.0)
Code Quality #7 ≥ 4
Tradeoff Reasoning #13 ≥ 4
Production Awareness #14 ≥ 4
Working core; documented stubs for the rest
At least 3 tests covering: correctness, concurrency, edge timing
Monitoring + failure modes discussed substantively

Follow-up Questions

For A:

Make it distributed. → Redis with Lua atomic ops; or per-shard local limiter + global reconciliation.
Hot-user problem. → Sharded sub-limiters per user; or local L1 cache.
Add quota burst (allow 2× for 5 sec then throttle). → Token bucket with two-tier refill.

For B:

What’s the GC pressure under high churn? → Allocation per put is the cost; pool nodes if hot path.
Persist across restart. → Periodic snapshot to disk; replay log on startup.
Add a probabilistic admission filter (TinyLFU). → Prevent cache pollution from one-hit-wonders.

For C:

Distribute across N workers. → Leader-elected scheduler that dispatches jobs; or per-shard schedulers.
Persistent jobs (survive restart). → Persist queue to durable storage.
Jobs with dependencies. → DAG scheduler; topological execution.
Job retries with exponential backoff. → Per-job retry state machine.

Required Tests

Correctness on the basic case
Thread safety (concurrent calls; assert no double-count, no race-induced overflow)
Timing edges (window boundary, expiration boundary, scheduling drift)
Failure: what happens if the underlying clock jumps backward?
Resource cleanup: after purge / shutdown, no leaked threads or memory

Required Discussion (production)

Cover, at minimum:

Metrics you’d export (Prometheus-style)
Alert thresholds you’d set
Memory cap behavior
Failure modes and recovery
Deployment story (config, rollout, rollback)
Evolution: how would you add a new rate-limiting algorithm? (Should be drop-in.)

Self-Evaluation Template

Mock 08 — Staff Practical
Date: _______
Build: _______
Time: ___ / 75 min

Scores (1–5):
___ Total /70

Critical dimensions:
  Code Quality (#7): ___
  Tradeoff (#13): ___
  Production (#14): ___

Interface designed before coding? Y/N
Monitoring discussed unprompted? Y/N
Failure modes discussed unprompted? Y/N
Thread safety verified by test? Y/N

What I left unfinished (and what I'd do with another hour):

Action item:

What to Do If You Fail

Production score below 4: Build a real version of one of these systems and run it for a week with metrics. Phase 8 has more.
Code quality below 4: Have a senior do a written code review of your submission; act on it.
Tradeoff below 4: For every decision, force yourself to write down at least 2 alternatives and a rejection reason.
Pass twice consecutively before Mock 09.

Mock 09 — Runtime / Language Deep Dive

Interview type: Mid-coding language/runtime probe (Bloomberg, Stripe, hedge funds, infra-heavy teams)
Target role: Senior / Staff backend or systems
Time limit: 45 minutes
Format: ONE medium problem, interrupted by language/runtime probes during and after coding
Hints policy: Each hint on a probe costs 1 point; hints on the algorithm follow the standard policy.
Primary goal: Demonstrate that you understand your language at a level deeper than syntax.

What This Mock Tests

Some companies will deliberately interrupt your coding with “what does this line cost?” or “what happens if two threads call this concurrently?” or “where does this object get allocated?” The signal: senior engineers don’t just write code; they know what the runtime does with it.

Scoring weights: Language/Runtime (#12) is doubled. Other dimensions normal.

Pick a language you claim to know well. The probes are language-specific.

Pick One Problem (any language)

Problem A — Implement a Concurrent Counter

Build a thread-safe counter with incr(), decr(), read(). Discuss the tradeoffs of locking vs atomic vs sharded.
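A minimal Python sketch of two of the three options. Python's standard library has no atomic integer type, so "atomic" maps to a single lock here; the sharded variant stripes the count across per-shard locks, the same idea as Java's `LongAdder`. Under CPython's GIL the sharded version mainly illustrates the design rather than delivering a speedup. Class names and the `shards` parameter are illustrative:

```python
import itertools
import threading


class LockedCounter:
    """One lock around a plain int: simple and correct, but every call contends on it."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def incr(self, n=1):
        with self._lock:                 # += is a read-modify-write, so it needs the lock
            self._value += n

    def decr(self, n=1):
        self.incr(-n)

    def read(self):
        with self._lock:
            return self._value


class ShardedCounter:
    """Striped counter: each thread updates its own shard; read() sums all shards."""

    def __init__(self, shards=16):
        self._values = [0] * shards
        self._locks = [threading.Lock() for _ in range(shards)]
        self._local = threading.local()
        self._next_shard = itertools.count()

    def _shard(self):
        i = getattr(self._local, "shard", None)
        if i is None:                    # first call on this thread: assign round-robin
            i = self._local.shard = next(self._next_shard) % len(self._values)
        return i

    def incr(self, n=1):
        i = self._shard()
        with self._locks[i]:
            self._values[i] += n

    def decr(self, n=1):
        self.incr(-n)

    def read(self):
        # Sums shard by shard while writers continue: not a point-in-time snapshot.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._values[i]
        return total
```

The tradeoff to state aloud: sharding makes writes cheap and contention-free at the cost of a slower, non-linearizable `read()`.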

Problem B — Producer-Consumer Queue

Implement a bounded blocking queue: put(item) blocks if full; get() blocks if empty.

Problem C — Implement `flatten(nested_list)` Lazily

Given an arbitrarily nested list (e.g., [1, [2, [3, 4]], 5, [[6]]]), return an iterator that yields the flat elements lazily, using O(depth) extra memory rather than materializing the flattened list.
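One way to do it, sketched as a generator over an explicit stack of iterators. Extra memory is one iterator per nesting level (O(depth)), and unlike a recursive generator it cannot hit Python's recursion limit on deeply nested input. It treats only `list` as nested, which is an assumption:

```python
from collections.abc import Iterator


def flatten(nested) -> Iterator:
    """Yield the leaves of an arbitrarily nested list, left to right, lazily."""
    stack = [iter(nested)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                stack.append(iter(item))   # descend; the parent iterator keeps its position
                break
            yield item
        else:
            stack.pop()                    # current level exhausted; resume the parent
```

For example, `list(flatten([1, [2, [3, 4]], 5, [[6]]]))` gives `[1, 2, 3, 4, 5, 6]`.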

Probes by Language (interviewer fires these mid-coding)

Python

“What does list.append(x) cost? When does it resize?”
“How does Python implement dict? What’s the lookup cost in the worst case?”
“What happens to your concurrent counter under the GIL? Is += atomic?”
“What’s reference counting? When does the cyclic GC run?”
“What does with lock: desugar to?”
“Why is multiprocessing different from threading? When would you use which?”
“Difference between asyncio.sleep(0) and time.sleep(0)?”
“What is __slots__, and when does it matter?”
“Generators vs iterators vs async iterators — implement flatten as each.”
“Where does the GIL hurt your code most?”

Java

“What’s the difference between volatile int and AtomicInteger?”
“Explain the Java Memory Model — happens-before relationship.”
“When does synchronized use biased locking / thin lock / fat lock?”
“What does String s = a + b; compile to?”
“How does HashMap resize? What’s the cost?”
“Difference between G1, ZGC, Shenandoah?”
“What’s a MethodHandle?”
“When does escape analysis kick in?”
“What’s Unsafe, why does it exist?”
“Implement the queue using ReentrantLock vs synchronized — what differs?”

Go

“What does a goroutine cost (stack, scheduler)?”
“Explain GMP — goroutines, M (OS thread), P (processor).”
“How does select work under the hood?”
“What’s escape analysis? Show me an example of stack vs heap allocation.”
“Difference between sync.Mutex and sync.RWMutex?”
“What’s the cost of channels? When to prefer mutex?”
“Explain Go’s GC — what generation? What pause time?”
“What does defer cost?”
“Implement the bounded queue using channels vs using a mutex — which and why?”

C++

“Explain RAII.”
“What’s the difference between std::atomic<int> and std::mutex-protected int?”
“Memory orders: relaxed, acquire, release, acq_rel, seq_cst — when to use which?”
“What does std::vector::push_back cost? Amortized vs worst-case?”
“Move semantics — when is the move constructor called?”
“Difference between std::shared_ptr and std::unique_ptr?”
“What’s std::launder?”
“Implement the bounded queue using std::condition_variable.”
“What’s the cost of a virtual call?”

Rust

“Borrow checker rules — one mutable XOR many immutable references.”
“When do you need Arc<Mutex<T>> vs Rc<RefCell<T>>?”
“Explain Send and Sync traits.”
“What does async fn desugar to?”
“Difference between tokio::spawn and tokio::task::spawn_blocking?”
“What’s a Pin<T>? Why does it exist?”
“When does lifetime elision apply?”
“Implement the bounded queue using tokio::sync::mpsc.”

Node.js / JavaScript

“Explain the event loop — phases, microtasks vs macrotasks.”
“What’s the difference between process.nextTick and setImmediate?”
“When is a Promise resolved synchronously vs asynchronously?”
“What’s V8’s hidden class / inline cache?”
“Why can two objects that receive the same properties in a different order (obj.x = 1 then obj.y = 2, versus the reverse) perform differently in hot code?”
“What’s a WeakRef? When is the value collected?”
“Implement the queue using async / await.”

Expected Communication Style

Restate problem.
Ask clarifying questions (including language-specific ones — “should read() return a snapshot or be consistent with concurrent updates?”).
Code the algorithm.
Engage with probes as they come. Don’t say “I’ll come back to that.” Pause coding, answer, resume.
After coding, walk through tests.
End with a senior-level reflection on what could change with different runtime characteristics (“if we moved this to Go, the channel-based design would replace the lock-based one”).

Common Failure Modes

Memorized the algorithm but didn’t know how the language implements its data structures.
“I don’t know” to a basic probe. A senior should know how the language’s main collections perform.
Implemented the concurrent counter with int and += in Python, thinking the GIL makes it safe. The GIL makes individual bytecodes atomic, but x += 1 is a read-modify-write spread across several bytecodes, so updates can be lost. Use threading.Lock (the standard library has no atomic integer type).
In Java, used ++ on volatile int thinking it’s atomic. It’s not — volatile ensures visibility but not atomicity.
In Go, used a channel for a single shared counter. Slower than sync/atomic.AddInt64; mismatched tool.

Passing Bar

Total score: 53/70 (average 3.8)
Language/Runtime (#12): ≥ 4 (mandatory)
Algorithm: correct and at expected complexity
Answered ≥ 4 of 5–6 probes substantively
Code idiomatic for the language

Follow-up Questions (post-coding)

“Now port your solution to [other language]. What changes?”
“Profile this in production — what tool, what metrics?”
“Where would you put the metric instrumentation?”
“If this is the hot path of a service handling 1M qps, what’s the bottleneck?”

Required Tests

Algorithm correctness (basic)
Concurrent stress test (≥ 4 threads/goroutines, hammering)
Boundary timing (empty queue blocks; full queue blocks; producer wakes consumer)
Resource cleanup / shutdown semantics

Required Runtime Discussion

State, for your solution:

What allocations occur per operation (heap vs stack)
Lock granularity and contention behavior
GC implications (Python ref count, Java GC pause, Go STW, etc.)
What the worst-case latency is and why

Self-Evaluation Template

Mock 09 — Runtime / Language
Date: _______
Language: _______
Problem: _______
Time: ___ / 45 min

Scores (1–5):
___ Total /70

Language/Runtime (#12): ___ (need ≥ 4)
Probes asked: ___
Probes answered well: ___

Probes I bombed (write each verbatim, drill before next mock):
1.
2.
3.

Action item:

What to Do If You Fail

Language/Runtime (#12) below 4: Spend the next week in Phase 9, in your language’s directory. Read the language’s runtime/perf docs cover to cover.
Couldn’t answer 50% of probes: You don’t know the language as well as you claim. Pick a different language to claim, OR invest 2+ weeks.
Pass twice consecutively before Mock 10.

Mock 10 — Infrastructure / Backend

Interview type: Backend / Platform / Infrastructure deep-dive coding
Target role: Senior / Staff backend, distributed systems, database, storage
Time limit: 75 minutes
Format: Build a non-trivial backend component (KV store, log-structured index, sharded cache)
Hints policy: Acceptable on the algorithm; failures on storage/concurrency fundamentals are red flags.
Primary goal: Demonstrate that you can build the building blocks of real backend systems, not just consume them.

What This Mock Tests

Companies like Stripe, Snowflake, Databricks, Confluent, Cockroach Labs, and infra teams at FAANG ask coding questions that resemble small slices of their actual products. You’re expected to:

Understand storage primitives (logs, indexes, B-trees, LSM)
Reason about durability, ordering, concurrency
Write code that could be a starting point for a production component
Discuss the gap between what you built and what production would need

Scoring weights: Production Awareness (#14), Code Quality (#7), Correctness (#5), Tradeoff Reasoning (#13) are all critical.

Pick One Build

Build A — In-Memory KV Store with Snapshot Isolation

kv = KVStore()
tx = kv.begin()           # returns a transaction
tx.put("k", "v")
tx.get("k")               # returns "v" (read your writes)
tx2 = kv.begin()
tx2.get("k")              # returns None (snapshot isolation; tx has not committed)
tx.commit()
tx2.get("k")              # still None (tx2's snapshot was taken before commit)

Required:

MVCC (multi-version concurrency control)
Snapshot reads return a consistent view
Concurrent writers
Tests for read-your-writes, isolation between tx, serialization-style conflict detection (optional)

Build B — Log-Structured Index (Mini-LSM)

db = LSMTree(memtable_threshold=1000)
db.put("k1", "v1")
db.put("k2", "v2")
db.get("k1")             # "v1"
db.delete("k1")
db.get("k1")             # None (tombstone)
# After threshold writes, memtable flushes to immutable SSTable
db.range("k1", "k9")     # iterator over keys

Required:

In-memory memtable (sorted, e.g., SortedList or skiplist)
Flush to immutable SSTable when memtable exceeds threshold
get checks memtable, then SSTables in reverse-time order
Tombstones for deletes
Range scan that merges across all levels
Tests covering: writes, reads-after-flush, range correctness, deletes

Build C — Consistent Hash Ring with Replication

ring = HashRing(replication_factor=3, virtual_nodes=128)
ring.add_node("node-A")
ring.add_node("node-B")
ring.add_node("node-C")
nodes = ring.get("user-123")    # 3 nodes responsible
ring.remove_node("node-A")       # 1/3 of keys reassign
nodes_after = ring.get("user-123")

Required:

Virtual nodes for load balancing
Replication factor enforced
Adding/removing a node moves only its share of keys
A test that verifies that adding a node to N existing nodes moves roughly 1/(N+1) of keys, all of them to the new node (with sufficient virtual nodes)

Expected Communication Style

Restate with assumptions stated upfront.
Decompose before coding: which modules, what interfaces, what’s the data flow.
Discuss the storage model:
- For A: how to represent versions per key
- For B: how to lay out SSTables; in this exercise, in-memory simulated
- For C: ring representation; virtual nodes; lookup data structure
Identify the concurrency model and discuss what fails without it.
Code the core end-to-end before tackling optimization.
Discuss production gaps: persistence (we’re in-memory), replication consistency (we’re local), recovery (no WAL), monitoring (none).
Test the invariants, not just the happy path.

Solution Sketches

A. KV with MVCC:

import itertools
import threading

class KVStore:
    def __init__(self):
        self._data = {}            # key → list of (commit_version, value), ascending
        self._next_version = itertools.count()
        self._lock = threading.Lock()

    def begin(self):
        # Taken under the lock, so the snapshot is ordered after every finished commit.
        with self._lock:
            v = next(self._next_version)
        return Transaction(self, v)

class Transaction:
    def __init__(self, store, version):
        self._store = store
        self._snapshot_version = version
        self._writes = {}     # local buffer: invisible to other transactions until commit
        self._committed = False

    def get(self, key):
        if key in self._writes: return self._writes[key]   # read your own writes
        with self._store._lock:
            for v, val in reversed(self._store._data.get(key, [])):
                if v <= self._snapshot_version: return val   # newest version in the snapshot
        return None

    def put(self, key, value):
        if self._committed: raise RuntimeError("transaction already committed")
        self._writes[key] = value

    def commit(self):
        with self._store._lock:
            if self._committed: raise RuntimeError("transaction already committed")
            commit_v = next(self._store._next_version)
            for k, v in self._writes.items():
                self._store._data.setdefault(k, []).append((commit_v, v))
            self._committed = True

B. LSM Tree: SortedList for memtable; on flush, freeze into immutable sorted list (the “SSTable”). get walks memtable + SSTables newest-first. Range scan: heap-merge iterators over all levels. Tombstones represented as special sentinel.
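A stdlib-only sketch of that design, assuming single-threaded use and half-open range scans `[lo, hi)`. A dict plus a `bisect`-maintained sorted key list stands in for SortedList or a skiplist; SSTables are immutable sorted tuples; the range scan merges all levels with `heapq.merge`, tagging each entry with its age so the newest version of a key wins:

```python
import bisect
import heapq

TOMBSTONE = object()                     # sentinel marking a deleted key


class LSMTree:
    def __init__(self, memtable_threshold=1000):
        self._threshold = memtable_threshold
        self._mem = {}                   # key -> value or TOMBSTONE
        self._mem_keys = []              # sorted keys of _mem (stand-in for a skiplist)
        self._sstables = []              # immutable sorted tuples of (key, value), oldest first

    def put(self, key, value):
        if key not in self._mem:
            bisect.insort(self._mem_keys, key)
        self._mem[key] = value
        if len(self._mem) >= self._threshold:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)         # shadows older versions in SSTables

    def _flush(self):
        # Already sorted, so flushing is a linear pass; the SSTable is never modified again.
        self._sstables.append(tuple((k, self._mem[k]) for k in self._mem_keys))
        self._mem, self._mem_keys = {}, []

    def get(self, key):
        if key in self._mem:
            value = self._mem[key]
        else:
            value = None
            for table in reversed(self._sstables):          # newest first
                i = bisect.bisect_left(table, (key,))
                if i < len(table) and table[i][0] == key:
                    value = table[i][1]
                    break
        return None if value is TOMBSTONE else value

    def range(self, lo, hi):
        """Yield (key, value) for lo <= key < hi; newest version wins; tombstones hidden."""
        def source(age, table, start):
            for i in range(start, len(table)):
                k, v = table[i]
                if k >= hi:
                    return
                yield (k, age, v)        # (key, age) is unique, so values are never compared

        mem = [(k, self._mem[k]) for k in self._mem_keys]
        sources = [source(0, mem, bisect.bisect_left(self._mem_keys, lo))]
        for age, table in enumerate(reversed(self._sstables), start=1):
            sources.append(source(age, table, bisect.bisect_left(table, (lo,))))
        last = object()
        for k, _, v in heapq.merge(*sources):
            if k == last:
                continue                 # older version of a key already handled
            last = k
            if v is not TOMBSTONE:
                yield k, v
```

Sorting happens once per flush; reads are binary searches and the range scan is a k-way merge, which is the property the failure-mode list below asks for.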

C. Consistent Hash: sorted list of (hash(virtual_node_id), node) pairs. Lookup: hash key, binary search for next pair, walk forward to collect R distinct nodes. Add/remove: insert/delete the virtual node entries.
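A minimal sketch of that ring. It assumes BLAKE2b (truncated to 64 bits) as the ring hash, because Python's built-in `hash()` of a `str` is randomized per process; the `"node#i"` virtual-node labels are an illustrative choice:

```python
import bisect
import hashlib


def _hash(s):
    # Stable 64-bit hash; built-in hash() of str is randomized per process.
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


class HashRing:
    def __init__(self, replication_factor=3, virtual_nodes=128):
        self._rf = replication_factor
        self._vnodes = virtual_nodes
        self._ring = []                  # sorted (hash, node) pairs, one per virtual node
        self._nodes = set()

    def add_node(self, node):
        self._nodes.add(node)
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._nodes.discard(node)
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        """Walk clockwise from hash(key), collecting distinct physical nodes."""
        want = min(self._rf, len(self._nodes))
        result = []
        i = bisect.bisect_left(self._ring, (_hash(key),))
        while len(result) < want:
            node = self._ring[i % len(self._ring)][1]   # wrap around the end of the ring
            if node not in result:
                result.append(node)
            i += 1
        return result
```

Adding a node only inserts new points on the ring, so a key's primary owner can change only to the new node; that is the invariant the movement test checks.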

Common Failure Modes

A: returned all versions instead of the snapshot-visible one. Snapshot isolation not implemented; just MVCC storage.
A: forgot to use a local write buffer. Transactions are visible to others before commit.
B: re-sorting on every read. Should sort on flush; reads are merge.
B: no tombstone semantics. Deleted key still appears in older SSTable.
C: hash ring without virtual nodes. Load imbalance — one node gets 60% of keys.
C: re-hashing entire keyspace on node change. Defeats the purpose of consistent hashing.
All: no concurrency testing. Build passes single-thread; explodes under load.

Passing Bar

Total score: 56/70 (average 4.0)
Working core implementation
Concurrency correct (or explicit single-threaded contract with rationale)
Production gaps discussed substantively
At least 4 tests covering invariants (not just smoke tests)
Code quality: production-quality

Follow-up Questions

For A:

Add serializable isolation. → Validation phase: on commit, abort if any key read had a newer version. (Optimistic concurrency.)
Garbage-collect old versions. → Track oldest active snapshot; vacuum versions older than that.
Persist to disk. → Write-ahead log per transaction; redo on recovery.
Distributed version: two-phase commit + Paxos for log replication.

For B:

Add bloom filters per SSTable. → Skip SSTable scan if key definitely not present.
Compaction strategy: leveled vs size-tiered. → Tradeoffs in write amplification.
Crash recovery. → WAL replay before opening memtable.
Range scan optimization with min/max key per SSTable.

For C:

Heterogeneous nodes (different capacity). → Virtual nodes proportional to capacity.
Read consistency across replicas. → Quorum reads (R + W > N).
Hinted handoff when a replica is down. → Buffer writes for offline replicas, flush on return.
Anti-entropy / Merkle trees. → Detect divergence between replicas.

Required Tests

Happy path correctness
Boundary case (empty store, single key, max capacity)
Concurrency: ≥ 8 threads, hammer the API for several seconds, verify invariants hold
Recovery semantics (if implemented)
For A: snapshot isolation — test that opens tx1, writes via tx2, commits tx2, reads via tx1 → must return the pre-commit value
For B: write enough to trigger flush; read works across memtable + SSTable
For C: load distribution test (after adding N nodes, the max/min ratio of keys per node is ≤ 1.3)

Required Production Discussion

Persistence strategy and crash recovery
Replication model and consistency tradeoffs
Monitoring: latency p50/p99, throughput, queue depths, GC pauses
Failure modes: node loss, network partition, slow disk
Backpressure: what happens when writes outpace flush

Self-Evaluation Template

Mock 10 — Infrastructure / Backend
Date: _______
Build: _______
Time: ___ / 75 min

Scores (1–5):
___ Total /70

Core working? Y/N
Concurrency tests pass under stress? Y/N
Production gaps discussed unprompted? Y/N (list them: ___)

What I left out (and what it would take):

Action item:

What to Do If You Fail

Storage primitives unclear: Read “Designing Data-Intensive Applications” (Chapters 3 and 5).
Concurrency issues: Phase 9 (language/runtime) concurrency sections.
Code quality: A senior code review of your build.
Pass twice consecutively before Mock 11.

Mock 11 — Concurrency Heavy

Interview type: Concurrency / parallelism coding round
Target role: Backend, systems, infrastructure, embedded, gaming, OS-adjacent
Time limit: 60 minutes
Format: ONE problem requiring real concurrency primitives (locks, condvars, channels, atomics)
Hints policy: A hint on the primitive choice is acceptable; a hint on the race condition is borderline.
Primary goal: Write code that is provably correct under concurrent access.

What This Mock Tests

Concurrency code is notoriously hard. The bar:

Identify what needs synchronization (shared mutable state)
Choose the right primitive (mutex vs RWLock vs channel vs atomic)
Avoid deadlocks, livelocks, starvation
Write a test that would catch the bug if present (not just one that passes by luck)

Scoring weights: Correctness (#5), Code Quality (#7), Language/Runtime (#12) are critical.

Pick One Problem

Problem A — Bounded Blocking Queue (with timeout)

q = BoundedQueue(capacity=10)
q.put(item)                       # blocks if full
q.put(item, timeout=5.0)          # returns False on timeout
item = q.get()                    # blocks if empty
item = q.get(timeout=5.0)         # returns None on timeout
q.close()                         # subsequent put → exception; get drains then returns None

Multiple producers, multiple consumers. Must be FIFO. Must support graceful close.

Problem B — Thread Pool (with shutdown semantics)

pool = ThreadPool(workers=4)
future = pool.submit(fn, arg)
result = future.result(timeout=5.0)   # blocks until done
pool.shutdown(wait=True)              # waits for all queued + running
pool.shutdown(wait=False)             # rejects new, returns immediately

Required: bounded work queue, graceful drain, future-based result delivery, exception propagation.

Problem C — Read-Write Lock (Writer-Preferred)

lock = RWLock()
with lock.read():
    # multiple readers concurrently
    ...
with lock.write():
    # exclusive; blocks new readers waiting for writer
    ...

Required: many readers OR one writer; writers must not starve.

Problem D — Dining Philosophers (Deadlock-Free)

5 philosophers, 5 chopsticks. Each alternates think/eat. Implement so no deadlock, no philosopher starves.

Expected Communication Style

Restate with the explicit concurrency requirements (“multiple producers, multiple consumers, FIFO, graceful close”).
Identify shared mutable state. This is the most important step.
Identify the invariant. (“Queue size never exceeds capacity”; “no two writers active simultaneously”; “no two adjacent philosophers eat simultaneously”.)
Choose primitives with rationale. (“I need to block on full/empty — that means condvar, not just a lock.”)
Code the critical section minimally. Hold the lock only across the shared-state mutation; release before any potentially-blocking call.
Write a concurrency test. Not just “does it work once” — stress with many threads, verify the invariant.
Discuss failure modes: what happens if a producer dies holding the lock? If close is called during a put?

Solution Sketches

A. Bounded Queue:

import threading
import time
from collections import deque

class QueueClosed(Exception):
    pass

class BoundedQueue:
    def __init__(self, capacity):
        self._capacity = capacity
        self._q = deque()
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)    # both conditions share one lock
        self._not_empty = threading.Condition(self._lock)
        self._closed = False

    def put(self, item, timeout=None):
        with self._lock:
            # "is None", not truthiness: timeout=0 means "don't wait", not "wait forever".
            end = None if timeout is None else time.monotonic() + timeout
            while True:
                if self._closed: raise QueueClosed()     # re-checked after every wakeup
                if len(self._q) < self._capacity: break
                remaining = None if end is None else end - time.monotonic()
                if remaining is not None and remaining <= 0: return False
                self._not_full.wait(timeout=remaining)
            self._q.append(item)
            self._not_empty.notify()                     # state updated before notifying
            return True

    def get(self, timeout=None):
        with self._lock:
            end = None if timeout is None else time.monotonic() + timeout
            while not self._q:
                if self._closed: return None             # closed and fully drained
                remaining = None if end is None else end - time.monotonic()
                if remaining is not None and remaining <= 0: return None
                self._not_empty.wait(timeout=remaining)
            item = self._q.popleft()
            self._not_full.notify()
            return item

    def close(self):
        with self._lock:
            self._closed = True
            self._not_full.notify_all()    # blocked producers wake and raise QueueClosed
            self._not_empty.notify_all()   # blocked consumers drain, then return None

B. Thread Pool: N worker threads pulling from a shared bounded queue; each task wraps (fn, args, future). Workers set future’s result/exception. Shutdown sentinels (None) wake workers; wait=True joins all worker threads.
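A minimal sketch of that design. It uses stdlib `queue.Queue(maxsize=...)` as the bounded work queue for brevity; in the interview you might reuse your Problem A BoundedQueue instead. The `Future` class here is a hand-rolled stand-in, not `concurrent.futures.Future`, and the `max_queued` parameter is illustrative:

```python
import queue
import threading


class Future:
    """Minimal future: result() blocks until a worker sets a value or an exception."""

    def __init__(self):
        self._done = threading.Event()
        self._value = None
        self._error = None

    def _set(self, value=None, error=None):
        self._value, self._error = value, error
        self._done.set()                             # publish after both fields are written

    def result(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("task did not finish in time")
        if self._error is not None:
            raise self._error                        # propagate the task's exception
        return self._value


class ThreadPool:
    def __init__(self, workers=4, max_queued=100):
        self._tasks = queue.Queue(maxsize=max_queued)   # bounded work queue
        self._lock = threading.Lock()                   # orders submit() against shutdown()
        self._shutdown = False
        self._threads = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def submit(self, fn, *args):
        with self._lock:
            if self._shutdown:
                raise RuntimeError("pool is shut down")
            fut = Future()
            # put() may block while full; workers never take _lock, so they keep draining.
            self._tasks.put((fn, args, fut))
        return fut

    def shutdown(self, wait=True):
        with self._lock:
            if not self._shutdown:
                self._shutdown = True
                for _ in self._threads:
                    self._tasks.put(None)            # one sentinel per worker, behind real tasks
        if wait:
            for t in self._threads:
                t.join()

    def _worker(self):
        while True:
            task = self._tasks.get()
            if task is None:
                return
            fn, args, fut = task
            try:
                fut._set(value=fn(*args))
            except Exception as exc:
                fut._set(error=exc)
```

Holding `_lock` across `put()` is a deliberate choice: it guarantees no task is enqueued behind the shutdown sentinels (where no worker would ever run it), and it cannot deadlock because workers never acquire `_lock`. With `wait=False`, `shutdown()` can still block briefly if the queue is full when the sentinels are enqueued.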

C. RWLock writer-preferred: Track reader_count, writer_active, waiting_writers. A reader acquires only if no writer is active AND no writers are waiting. A writer first increments waiting_writers (which immediately blocks new readers), then waits until no readers and no writer are active.
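A minimal sketch of that bookkeeping with one `threading.Condition`, exposing `read()` and `write()` as context managers to match the Problem C interface. Method and field names are illustrative:

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Writer-preferred: once a writer is waiting, new readers block until it has run."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0                # readers currently holding the lock
        self._writer_active = False
        self._waiting_writers = 0

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer_active or self._waiting_writers > 0:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()      # a waiting writer may now proceed

    @contextmanager
    def write(self):
        with self._cond:
            self._waiting_writers += 1           # blocks new readers immediately
            while self._writer_active or self._readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer_active = True
        try:
            yield
        finally:
            with self._cond:
                self._writer_active = False
                self._cond.notify_all()          # wake both readers and writers
```

The flip side of writer preference: a continuous stream of writers can starve readers, which is why the fair (FIFO) variant exists as a follow-up.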

D. Philosophers: asymmetric strategy — even-numbered grabs left then right, odd-numbered grabs right then left. Breaks the symmetry that causes circular wait → no deadlock. Alternative: use a hierarchical lock ordering (always grab lower-id chopstick first).
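A minimal sketch of the lock-ordering alternative, with a hypothetical `dine()` driver that counts meals. Locking the lower-numbered chopstick first makes philosopher n-1 reach for chopstick 0 before chopstick n-1, which is exactly the asymmetry that breaks the circular wait. This prevents deadlock; Python's `Lock` makes no fairness guarantee, so the sketch does not by itself prove freedom from starvation:

```python
import threading


def dine(n=5, meals=100, timeout=10.0):
    """Run n >= 2 philosophers; philosopher i needs chopsticks i and (i + 1) % n."""
    chopsticks = [threading.Lock() for _ in range(n)]
    eaten = [0] * n

    def philosopher(i):
        left, right = i, (i + 1) % n
        # Global order: always lock the lower-numbered chopstick first.
        first, second = min(left, right), max(left, right)
        for _ in range(meals):
            # think() would go here
            with chopsticks[first]:
                with chopsticks[second]:
                    eaten[i] += 1        # eat()

    threads = [threading.Thread(target=philosopher, args=(i,), daemon=True)
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
    if any(t.is_alive() for t in threads):
        raise RuntimeError("philosophers did not finish (possible deadlock)")
    return eaten
```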

Common Failure Modes

Used if instead of while around wait(). Spurious wakeups cause invariant violations.
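Mutated the shared state or called notify() outside the lock. This causes lost wakeups: a waiter checks the condition, the state changes and notify() fires before it starts waiting, and it sleeps forever. (In Python, Condition.notify() without holding the lock raises RuntimeError.)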
Held the lock during a callback / blocking I/O. Other threads stall.
Used notify() instead of notify_all() for close. Only wakes one waiter; others hang forever.
RWLock without writer preference → writer starvation.
Philosophers: all grab left → deadlock. Classic.
Test that runs once and passes. Not a concurrency test. Need stress (1000+ iterations across many threads).
Used concurrent.futures.ThreadPoolExecutor for Problem B without implementing the underlying logic. Some interviewers accept this; many want you to build it.

Passing Bar

Total score: 56/70 (average 4.0)
Code correctness verified under stress (≥ 10K operations across ≥ 8 threads)
Invariant explicitly stated and tested
No use of high-level concurrency abstractions that hide the primitive (e.g., queue.Queue for problem A)
Failure modes discussed

Follow-up Questions

For A:

Lock-free version. → Discuss MPMC ring buffer with CAS; show awareness of ABA problem.
Multi-priority. → Multiple internal queues, one per priority.
Persist across restart. → Append-only log; replay on startup.

For B:

Work-stealing pool. → Per-worker deque; idle workers steal from others.
Dynamic resizing. → Grow workers under load; shrink when idle.
Cancellable tasks. → Future.cancel() signals; worker checks flag.

For C:

Fair RWLock (FIFO). → Single waiting queue; reader batching possible but trickier.
Reader-preferred. → Easier to implement; writer starvation risk.
Async version with futures. → Same logic; futures replace condvar.

For D:

Variant where philosophers think for variable time. → Same algorithm.
Generalize to N philosophers and K chopsticks. → A general resource-allocation problem; reason with a resource-allocation graph. A global lock ordering still prevents deadlock.
Distributed version (philosophers on different machines). → Requires distributed deadlock detection.

Required Tests

Single-thread correctness
Multi-thread stress: many producers + consumers; assert no item lost, no item duplicated, no exception
Timeout correctness: put(timeout=0.1) returns False after 100ms when full
Close semantics: close during active put/get unblocks all waiters cleanly
Invariant assertion: assert size ≤ capacity throughout, etc.

Example stress test for A:

import threading

def test_stress():
    q = BoundedQueue(10)
    produced = []
    consumed = []
    def producer(start, count):
        for i in range(start, start + count):
            q.put(i)
            produced.append(i)
    def consumer():
        while True:
            x = q.get()   # returns None only once the queue is closed and drained
            if x is None: break
            consumed.append(x)
    threads = [threading.Thread(target=producer, args=(i*1000, 1000)) for i in range(10)]
    consumers = [threading.Thread(target=consumer) for _ in range(5)]
    for t in threads + consumers: t.start()
    for t in threads: t.join()
    q.close()
    for t in consumers: t.join()
    assert sorted(consumed) == sorted(produced)
    assert len(consumed) == 10_000

Required Discussion

The invariant your code maintains
The lock ordering (if multiple locks)
The worst-case latency (lock contention)
What happens under crash mid-operation
How you’d debug a deadlock if one were reported (jstack, py-spy dump, gdb)

Self-Evaluation Template

Mock 11 — Concurrency
Date: _______
Problem: _______
Time: ___ / 60 min

Scores (1–5):
___ Total /70

Stress test passed (≥ 10K ops)? Y/N
Invariant explicitly stated? Y/N
Failure modes discussed? Y/N
Any race condition found post-hoc? (List:)

Action item:

What to Do If You Fail

Race condition in submitted code: This is the #1 reason to repeat — concurrency bugs in production are catastrophic.
Couldn’t choose the right primitive: Read your language’s concurrency chapter (Phase 9). Understand condvar vs channel vs atomic before next attempt.
Stress test exposed a bug you didn’t anticipate: Lab 05 (stress harness) applied to concurrent code is your training.
Pass twice consecutively before Mock 12.

Mock 12 — Competitive Style

Interview type: Algorithmic puzzle round (Jane Street, Hudson River Trading, Citadel, Two Sigma, ICPC-style firms; some Google L6+ rounds)
Target role: Quant developer, HFT, compiler/optimization, ICPC alumni, top-tier algorithmic teams
Time limit: 90 minutes
Format: ONE hard algorithmic problem (Codeforces Div 2 D / Div 1 B level)
Hints policy: No free hints. A hint is a hard signal of failure.
Primary goal: Reach the algorithmic insight under sustained time pressure.

What This Mock Tests

This mock is not about production engineering. It’s about pure algorithmic depth and the ability to think clearly for 90 minutes on a problem with no obvious path.

The kind of problem chosen:

Has a clever insight that unlocks the optimal complexity
Brute force is far too slow
Standard patterns don’t directly apply — you must combine 2–3
Implementation is non-trivial but not the bottleneck

Scoring weights: Optimization (#4), Correctness (#5), Complexity (#6), Code Quality (#7) are key. Production / tradeoff dimensions are not relevant — these are not asked.

Pick One Problem

(Pick at random for self-mock. With a partner, they choose.)

Problem A — Maximum Subarray with At Most K Replacements

Given an array a of integers and integer k, you may replace at most k elements with any value. Find the maximum possible sum of any contiguous subarray of the resulting array.

Constraints: 1 ≤ |a| ≤ 2×10^5. -10^9 ≤ a[i] ≤ 10^9. 0 ≤ k ≤ |a|.

Examples (self-mock variant):

a = [-3, 4, -2, 5, -1], k = 1 → 9   (replace -2 with 0; the best subarray is [4, 0, 5])

(Note: with unconstrained replacement values the problem as stated is trivial, since replacing one element with a huge value dominates every sum. The interviewer specifies the real variant, e.g., removals, or replacements that must be 0. For self-mock, use “replace at most k elements with 0”, which makes it a genuine DP.)
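A sketch of the DP for the "replace at most k elements with 0" variant; the function name is illustrative. `best[j]` is the best sum of a non-empty subarray ending at the current index using at most j replacements. This runs in O(n·k), which is a fine baseline for small k but too slow when k approaches 2×10^5, so treat it as the starting point, not the final answer:

```python
def max_sum_with_zeroing(a, k):
    """Max sum of a non-empty subarray when at most k of its elements may become 0."""
    NEG = float("-inf")
    best = [NEG] * (k + 1)               # best[j]: ends at current index, <= j replacements
    answer = NEG
    for x in a:
        new = [NEG] * (k + 1)
        for j in range(k + 1):
            keep = max(best[j], 0) + x                  # extend (or start fresh) keeping x
            zero = max(best[j - 1], 0) if j else NEG    # extend (or start fresh), x -> 0
            new[j] = max(keep, zero)
        best = new
        answer = max(answer, best[k])
    return answer
```

On the example above, `max_sum_with_zeroing([-3, 4, -2, 5, -1], 1)` returns 9.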

Problem B — Count Subarrays with Median ≥ X

Given array a (distinct) and threshold X. Count contiguous subarrays whose median is ≥ X. (Median of even-length: take the right-middle.)

Constraints: 1 ≤ |a| ≤ 2×10^5.

Insight: map each element to +1 if ≥ X, else -1. With the right-middle convention, a subarray's median is ≥ X iff its mapped sum is positive (odd length) or ≥ 0 (even length). An odd-length subarray has an odd sum, so both cases collapse to sum ≥ 0: count index pairs i < j of prefix sums with P[i] ≤ P[j]. A Fenwick tree over the shifted prefix values does this in O(N log N).
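A sketch of that reduction, assuming the right-middle convention above. Prefix sums of the ±1 mapping lie in [-n, n], so they are shifted by n + 1 to become 1-based Fenwick indices. The function name is hypothetical.

```python
def count_median_at_least(a, x):
    n = len(a)
    size = 2 * n + 1                 # prefix values -n..n map to indices 1..2n+1
    tree = [0] * (size + 1)

    def add(i):
        while i <= size:
            tree[i] += 1
            i += i & -i

    def count_le(i):                 # number of stored prefixes with index <= i
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    prefix = 0
    add(prefix + n + 1)              # the empty prefix P[0] = 0
    result = 0
    for v in a:
        prefix += 1 if v >= x else -1
        result += count_le(prefix + n + 1)   # earlier P[i] <= current P[j]
        add(prefix + n + 1)
    return result
```

For a = [1, 2, 3] and X = 2 this counts 5 subarrays: every subarray except [1].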

Problem C — Minimum Cost to Make Array Strictly Increasing

Given array a, you may increase any element by 1 at cost 1 (cannot decrease). Find minimum total cost to make a strictly increasing.

Constraints: 1 ≤ |a| ≤ 3000. Values fit in int64.

Insight: a is strictly increasing iff b[i] = a[i] - i is non-decreasing. With increases only, the cheapest fix raises each b[i] to the running maximum of b[0..i], so cost = sum(prefix_max_b[i] - b[i]). Equivalently, walk left to right maintaining prev = max(prev + 1, a[i]) and add prev - a[i] to the cost; the two formulations give the same answer in O(N).
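Both formulations side by side, as a sketch (the function names are illustrative). Agreement between them on random inputs is a cheap self-check before moving on to the follow-ups.

```python
def min_cost_strictly_increasing(a):
    # Greedy: each element becomes max(a[i], previous final value + 1).
    cost, prev = 0, None
    for v in a:
        target = v if prev is None else max(v, prev + 1)
        cost += target - v
        prev = target
    return cost


def min_cost_via_shift(a):
    # Shifted view: b[i] = a[i] - i must be non-decreasing; raise b to its running max.
    cost, running_max = 0, None
    for i, v in enumerate(a):
        b = v - i
        running_max = b if running_max is None else max(running_max, b)
        cost += running_max - b
    return cost
```

For a = [3, 1, 2], both return 6 (the final array is [3, 4, 5]).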

Problem D — Range Sum with Updates and Range Adds (LC 307 + lazy)

Implement: update(i, x) (set a[i] = x), range_add(l, r, x), query_sum(l, r). All in O(log N).

Constraints: 1 ≤ N ≤ 10^5. ≤ 10^5 operations.

Tool: segment tree with lazy propagation (Phase 3 Lab 02).
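A compact recursive sketch, assuming 0-indexed inclusive ranges and that update(i, x) means assignment (as in LC 307). Point assignment is expressed as a range add of the difference, so only one lazy tag (a pending add) is needed; the class name is illustrative.

```python
class LazySegTree:
    def __init__(self, a):
        self.n = len(a)
        self.sums = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)   # pending add for every element under the node
        self._build(1, 0, self.n - 1, a)

    def _build(self, node, lo, hi, a):
        if lo == hi:
            self.sums[node] = a[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, a)
        self._build(2 * node + 1, mid + 1, hi, a)
        self.sums[node] = self.sums[2 * node] + self.sums[2 * node + 1]

    def _apply(self, node, lo, hi, x):
        self.sums[node] += x * (hi - lo + 1)
        self.lazy[node] += x

    def _push(self, node, lo, hi):
        if self.lazy[node]:
            mid = (lo + hi) // 2
            self._apply(2 * node, lo, mid, self.lazy[node])
            self._apply(2 * node + 1, mid + 1, hi, self.lazy[node])
            self.lazy[node] = 0

    def _add(self, node, lo, hi, l, r, x):
        if r < lo or hi < l:
            return
        if l <= lo and hi <= r:
            self._apply(node, lo, hi, x)
            return
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        self._add(2 * node, lo, mid, l, r, x)
        self._add(2 * node + 1, mid + 1, hi, l, r, x)
        self.sums[node] = self.sums[2 * node] + self.sums[2 * node + 1]

    def _sum(self, node, lo, hi, l, r):
        if r < lo or hi < l:
            return 0
        if l <= lo and hi <= r:
            return self.sums[node]
        self._push(node, lo, hi)
        mid = (lo + hi) // 2
        return self._sum(2 * node, lo, mid, l, r) + self._sum(2 * node + 1, mid + 1, hi, l, r)

    def range_add(self, l, r, x):
        self._add(1, 0, self.n - 1, l, r, x)

    def query_sum(self, l, r):
        return self._sum(1, 0, self.n - 1, l, r)

    def update(self, i, x):
        # Assign a[i] = x as a range add of the difference on [i, i].
        self.range_add(i, i, x - self.query_sum(i, i))
```

Adding range assign (the D follow-up) would need a second tag and a rule for composing assign-then-add, which is where most bugs appear.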

Problem E — Maximum XOR of Two Numbers (LC 421)

Given an array a of non-negative integers (LC 421 bounds them below 2^31), find max(a[i] XOR a[j]) over all i < j. O(N · 32) time.

Insight: binary trie of all numbers; for each number, greedily traverse to find the maximally-different other number.
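A sketch of the trie approach, assuming values are non-negative and below 2^31 (the `bits` parameter is an assumption, not part of the problem). Nodes live in a flat list of child pairs, where child index 0 means "absent" because the root is never anyone's child. Each number is queried against the numbers inserted before it, which enforces i < j.

```python
def find_maximum_xor(nums, bits=31):
    children = [[0, 0]]              # children[node] = [child for bit 0, child for bit 1]

    def insert(v):
        node = 0
        for b in range(bits - 1, -1, -1):
            bit = (v >> b) & 1
            if children[node][bit] == 0:
                children.append([0, 0])
                children[node][bit] = len(children) - 1
            node = children[node][bit]

    def best_partner_xor(v):
        # Greedily take the opposite bit whenever some stored number has it.
        node, result = 0, 0
        for b in range(bits - 1, -1, -1):
            bit = (v >> b) & 1
            if children[node][bit ^ 1]:
                result |= 1 << b
                node = children[node][bit ^ 1]
            else:
                node = children[node][bit]
        return result

    best = 0                         # a single element has no pair; 0 by convention here
    insert(nums[0])
    for v in nums[1:]:
        best = max(best, best_partner_xor(v))
        insert(v)
    return best
```

On the LC 421 example [3, 10, 5, 25, 2, 8] this returns 28 (5 XOR 25).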

Expected Communication Style

For competitive mocks, communication is light but precise:

Restate in one sentence.
Ask 1–2 surgical clarifying questions (constraints, distinct/duplicate, output format).
State a brute force with complexity — proves you understand the problem.
Think aloud about reductions or patterns. (“Median question; +1/-1 transformation; subarray sum; Fenwick.”)
State the optimal complexity and key insight before coding.
Code carefully and minimally. Competitive code can sacrifice some readability for brevity; don’t sacrifice correctness.
Test 1–2 cases including a non-obvious one.

There is no “production discussion” in this format. The interviewer cares about the algorithm and the implementation.

Common Failure Modes

Couldn’t reach the insight in 60 min. Submitted the brute force. Pass-ish on correctness only if the brute force is correct; the optimization dimension fails.
Reached the insight but the implementation has bugs that take 20+ min to fix. Need to drill segment tree / Fenwick / DP from Phase 3.
Stuck on the wrong approach for 30+ min. Senior signal: pivot quickly when an approach doesn’t pan out. Articulate the pivot.
Forgot the standard library tool. (Python bisect, C++ lower_bound, etc.) — costs implementation time.
No tests because “I’m confident.” Competitive code is wrong constantly; verify against brute force.

Passing Bar

Total score: 49/70 (average 3.5) — lower than other mocks because some dimensions don’t apply
Optimal complexity reached (or a serious near-optimal attempt with clear gap analysis)
Correct on given examples + at least one boundary case
Time ≤ 90 min
Algorithm articulated with insight

Follow-up Questions

Competitive-style follow-ups are harder algorithm variants:

For A:

Generalize: replace ≤ k elements with values from a given set. → DP becomes more state-heavy.
Output the actual subarray and the replacements. → DP with parent pointers.

For B:

Median strictly > X. → Adjust the +1/-1 mapping for equality.
K-th smallest in every subarray. → Much harder; persistent data structures or offline processing.

For C:

Allow decreases at cost 1 too. → Now O(N log N) using slope trick or O(N²) DP.
Strictly increasing AND in [L, R] for each element. → Constrained version; more careful greedy.

For D:

Add range assign as well as range add. → Two lazy tags; non-trivial composition.
Range mode (most frequent element). → Much harder; Mo’s algorithm.

For E:

Max XOR of any triple. → No simple trie trick; brute force over pairs plus a trie query for the third element gives O(N² · 32).
Max XOR with values ≤ K (subset). → Persistent trie indexed by element index.

Required Tests

All given examples
Boundary: N = 1, N = max
Adversarial: all same values, sorted, reverse-sorted
One stress test against the brute force if time permits (mandatory if you suspect a bug)

Required Complexity Explanation

Time, with reasoning
Space, with reasoning
Bound is tight or improvable?
For N = 2×10^5 at O(N log N): about 2×10^5 × 18 ≈ 3.6×10^6 basic steps, so expected runtime is typically well under 1 sec

Self-Evaluation Template

Mock 12 — Competitive Style
Date: _______
Problem: _______
Time: ___ / 90 min

Scores (1–5):
___ Total /70 (note: Tradeoff/Production are N/A; weighted out)

Time to insight: _____ min
Time to first correct implementation: _____ min
Bugs found and fixed: ___

Did I pivot from a wrong approach? Y/N (at minute ___)

Action item:

What to Do If You Fail

Couldn’t reach the insight: This is a long-term gap, not a one-mock fix. Solve 30+ Codeforces Div 2 D problems (or LC hards tagged “competitive”) over 2–4 weeks.
Reached insight, implementation buggy: Drill Phase 3 (advanced data structures) — your fundamentals leak.
Bombed time management: Practice with stricter timers (45 min for problems you’ve already seen).
Pass twice consecutively, on different problems, to consider this level handled.

After All 12 Mocks

When you have passed all 12 mocks twice consecutively each, return to the READINESS_CHECKLIST to verify the overall pipeline. The mocks alone do not certify readiness; they verify that you can perform under interview conditions. Real interviews additionally test consistency over many rounds and behavioral signals.

Most candidates do not need to pass all 12. Pass the mocks corresponding to your target role:

FAANG SWE-II: mocks 01–06
FAANG Senior: mocks 01–07, plus 09 (language)
FAANG Staff / Principal: mocks 01–10 (mock 12 not required)
Quant / HFT / Compiler: mocks 01–04, 09, 11, 12 (heavy on competitive)
Backend / Platform (Stripe, Snowflake, Confluent): mocks 01–08, 10, 11

Phase 12 — Grandmaster

Read this before doing anything in this phase.

This phase covers topics that are not required for 99% of interviews, including senior and staff roles at top FAANG companies. The content here is for a narrow set of candidates:

ICPC World Finals competitors
Codeforces red / IGM
Quant developers at Jane Street, Hudson River, Two Sigma, Citadel for the most algorithmic roles
Compiler engineers at LLVM, GCC, Intel, NVIDIA
Database engineers at the algorithm-heavy companies (Snowflake, Databricks query optimizer, CockroachDB, TimescaleDB)
Cryptography / coding theory researchers
A few specific roles at Google Research, DeepMind, OpenAI infrastructure

If you are not on this list, skip this phase entirely. Time spent here is time not spent on Phase 10 (testing/debugging) and Phase 11 (mocks), which will show up in your interviews.

When to Use This Phase

Use this phase if all are true:

You have already completed Phases 1–11 and are passing the mocks at your target level.
You are interviewing for one of the roles listed above.
The job description explicitly mentions ICPC, competitive programming, max flow, suffix structures, FFT, or “research-grade algorithms.”
You have at least 3 months before your interview.

If any of these are false, stop. Go back to Phase 10 or 11.

When to Skip This Phase

Skip this phase if any is true:

You are interviewing for SWE-II / E4 / SDE2 or below.
You are interviewing for generic senior or staff backend at FAANG (mocks 06–08 cover what’s actually asked).
You have less than 3 months before your interview.
You are still failing Phase 11 mocks at your level — those are higher leverage.
You are an SRE, mobile engineer, frontend engineer, ML engineer (non-research), or data engineer.

The opportunity cost is real. Each lab here takes a week. That week is better spent on Phase 11 mock attempts for the vast majority of candidates.

What’s In This Phase

The labs cover algorithms and data structures that appear on Codeforces / ICPC / IOI and almost nowhere else:

Lab	Topic	When it appears
01	Max Flow (Dinic)	Quant, compiler, graph-heavy research
02	Bipartite Matching (Hopcroft-Karp)	Assignment problems; some quant
03	Heavy-Light Decomposition	ICPC, very rare in industry
04	Centroid Decomposition	ICPC, rare in industry
05	Suffix Automaton	String-heavy research, bioinformatics
06	Advanced DP Optimization (CHT, Knuth, D&C)	Quant, compiler (loop scheduling)
07	FFT / Polynomial Multiplication	Cryptography, signal processing, some compiler
08	Advanced Geometry (convex hull, intersections)	Geometric computing, games, CAD
09	ICPC Contest Simulation	Competitive programming only
10	Inclusion-Exclusion, Burnside	Combinatorics-heavy research

Each lab has the standard 23-section format plus an extra “When to Skip This Topic” section right after Interview Context, so you can opt out of individual labs.

How to Use This Phase

Read this README in full.
Look at the target job descriptions you’re applying to. Search them for the specific keywords (max flow, suffix automaton, etc.).
If you find a match, do that specific lab. If not, skip.
Doing the whole phase end-to-end is rarely the right call. Cherry-pick.

Realistic Expectations

Even if you do this phase, you may never encounter these topics in an interview. The value is:

Confidence signal — knowing these exist and roughly how they work lets you say “I’m familiar with Dinic’s max flow” if it ever comes up.
Insight transfer — understanding centroid decomposition deepens your tree intuition for problems you will see.
Specific roles — if you’re applying to a quant fund’s algo research team, expect this material.

This phase is intentionally not graded against the same passing bar as other phases. It’s read-only intellectual investment for a small group.

What This Phase Is Not

This phase is not:

A prerequisite for Phase 11.
Required for any FAANG interview.
A signal of seniority.
Going to help you with system design.

If you’re using this phase to procrastinate the harder thing (Phase 10 testing labs, Phase 11 mocks), stop. That’s the actual failure mode, and the only one of consequence.

After This Phase

If you complete the relevant labs, you have what most ICPC mid-rank teams have. You’re prepared for the narrow algorithmic interviews. You’re not more prepared for normal FAANG interviews than someone who did Phase 11 twice.

Return to Phase 11 for mock-12 (competitive style) reps, then to your job search.

Lab 01 — Max Flow (Dinic’s Algorithm)

Goal

Implement Dinic’s algorithm for maximum flow on a directed graph with capacities, achieving O(V²·E) worst case and near-linear in practice. Apply it to a real interview-style problem (Maximum Students Taking Exam, LC 1349) by reducing to max flow / bipartite matching.

Background

Maximum flow is the foundational network flow problem: given a source s and sink t in a directed graph with edge capacities, find the maximum rate at which “flow” can travel from s to t respecting capacities.

Key algorithms:

Ford-Fulkerson (1956): generic augmenting-path framework. Complexity depends on path choice.
Edmonds-Karp (1972): BFS for shortest augmenting path. O(V·E²).
Dinic (1970): level graphs + blocking flows. O(V²·E) general; O(E·√V) for bipartite matching.
Push-relabel (Goldberg-Tarjan, 1986): often faster in practice on dense graphs. O(V³) with FIFO vertex selection; O(V²·√E) with highest-label selection.

Why Dinic dominates in practice: the level graph constraint (only follow edges from level i to level i+1 in BFS layering) prunes the search dramatically. For random graphs, near-linear.

Interview Context

Max flow is asked in:

ICPC regionals/world finals (universal)
Quant developer rounds at funds that care about assignment-style problems
A handful of Google L6+ research interviews
Rare appearances at Snowflake / Databricks query optimizer roles (max-flow underpins some join reordering heuristics)
Never in standard FAANG SWE interviews

If asked, expect to either implement Dinic from scratch OR identify that a problem reduces to max flow and explain the reduction (more common than full implementation).

When to Skip This Topic

Skip if any of these are true:

You are not interviewing for quant / research / ICPC-adjacent roles
You have not memorized the basic Ford-Fulkerson framework yet
You have less than 4 weeks for this phase

The reduction skill (recognizing a problem as max flow) is more valuable than memorizing the implementation. If you have only a few days, study reductions and skip the implementation.

Problem Statement

Maximum Students Taking Exam (LeetCode 1349, Hard).

You are given an m × n classroom matrix where each cell is either ‘.’ (good seat) or ‘#’ (broken). Place students so that no student can cheat: a student can cheat off any immediately adjacent student in the same row, or diagonally in front (one row earlier, column ±1). Maximize the number of students seated.

seats = [["#",".","#","#",".","#"],
         [".","#","#","#","#","."],
         ["#",".","#","#",".","#"]]
output = 4

Constraints

1 ≤ m, n ≤ 8 (small grid — but max-flow approach generalizes)
Up to 64 seats
Wall-clock target: < 100ms

Clarifying Questions

Can a student cheat off the seat directly in front (same column, previous row)? (No — only diagonal-front and same-row-adjacent.)
Are broken seats unavailable for sitting? (Yes — ‘#’ cannot hold a student.)
Is the grid always rectangular? (Yes.)

Examples

Example 1

seats = [[".","#","."],
         ["#",".","#"],
         [".","#","."]]

Conflict graph: every ‘.’ conflicts with diagonal-front + same-row-adjacent. Maximum independent set = 4 (the corners).

Example 2

seats = [["."]]

Trivial: 1.

Example 3 (boundary)

seats = [[".",".",".",".","."]]   # single row, all good (n = 5)

Same-row adjacency forces alternating seats, so the maximum is ⌈n/2⌉ (3 for n = 5).

Brute Force

Try every subset of good seats; check no two are in conflict; track max. O(2^k · k²) where k = number of good seats. For 8×8 = 64, infeasible.

Brute Force Complexity

Time: O(2^k · k²) — fails for k > ~20.
Space: O(k) for current subset.

Optimization Path

Observation 1: this is maximum independent set on a conflict graph, which is NP-hard in general.

Observation 2: but our conflict graph is bipartite! Color seats by column parity (even/odd columns). All conflicts are between an even column and an odd column (same-row-adjacent: differs by 1; diagonal: also differs by 1). So no conflicts within the even-column set or within the odd-column set.

Observation 3: max independent set on a bipartite graph = total vertices − max matching (König’s theorem). So we compute max bipartite matching, which is solvable in polynomial time via max flow.

This is the canonical reduction trick.

Final Expected Approach

Build the bipartite graph: left = good seats in even columns, right = good seats in odd columns. Edge between two if they conflict.
Add source s → all left nodes (cap 1), all right nodes → sink t (cap 1), all conflict edges left→right (cap 1).
Run Dinic to compute max flow = max matching.
Answer = (total good seats) − (max matching).

Data Structures

Adjacency list with edge-index representation (each edge stores to, cap, rev-index for the reverse edge)
BFS level array
DFS iterator per node (incremented across calls to skip dead branches)
Queue for BFS

Correctness Argument

Bipartite: any conflict involves columns differing by 1, hence different parities.
König: in bipartite, |min vertex cover| = |max matching|; |max independent set| = |V| − |min vertex cover|. So |MIS| = |V| − |max matching|.
Dinic correctness: Ford-Fulkerson framework with augmenting paths; terminates when no augmenting path exists; gives optimal flow by max-flow min-cut theorem.
Reduction: max matching via max flow is exact when all edge capacities are 1 and source/sink edges all have capacity 1.

Complexity

Dinic on bipartite (unit-capacity) graphs: O(E·√V) — the Hopcroft-Karp bound.
For LC 1349: V ≤ 64, E ≤ 64 × 4 = 256. Trivial.

Implementation Requirements

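from collections import deque

class Dinic:
    def __init__(self, n):
        self.n = n
        # graph[u] holds edges as [to, residual_cap, index_of_reverse_edge_in_graph[to]]
        self.graph = [[] for _ in range(n)]
    
    def add_edge(self, u, v, cap):
        self.graph[u].append([v, cap, len(self.graph[v])])
        # Reverse edge starts at capacity 0 and gains capacity as flow is pushed on u -> v
        self.graph[v].append([u, 0, len(self.graph[u]) - 1])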
    
    def _bfs(self, s, t):
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, cap, _ in self.graph[u]:
                if cap > 0 and self.level[v] < 0:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] >= 0
    
    def _dfs(self, u, t, pushed):
        if u == t: return pushed
        while self.it[u] < len(self.graph[u]):
            e = self.graph[u][self.it[u]]
            v, cap, rev = e
            if cap > 0 and self.level[v] == self.level[u] + 1:
                d = self._dfs(v, t, min(pushed, cap))
                if d > 0:
                    e[1] -= d
                    self.graph[v][rev][1] += d
                    return d
            self.it[u] += 1
        return 0
    
    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):
            self.it = [0] * self.n
            while True:
                f = self._dfs(s, t, float('inf'))
                if f == 0: break
                flow += f
        return flow

Then build the bipartite graph and call.
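A sketch of the LC 1349 reduction end to end. To keep it self-contained, it swaps in a simple augmenting-path matcher (Kuhn's algorithm) instead of the Dinic class above; for this grid size either works, and feeding the same left/right edges into Dinic with the source/sink construction gives the same matching size. The function name is illustrative.

```python
def max_students(seats):
    m, n = len(seats), len(seats[0])
    good = [(r, c) for r in range(m) for c in range(n) if seats[r][c] == "."]

    # Left side: good seats in even columns. Every conflict partner sits in an
    # adjacent column (same row, row in front, or row behind), so it is on the odd side.
    adj = {}
    for r, c in good:
        if c % 2 == 0:
            adj[(r, c)] = [
                (r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 1)
                if 0 <= r + dr < m and 0 <= c + dc < n and seats[r + dr][c + dc] == "."
            ]

    match = {}  # odd-column seat -> even-column seat it is matched with

    def try_augment(u, seen):
        # DFS for an augmenting path starting at left seat u.
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or try_augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    matching = sum(1 for u in adj if try_augment(u, set()))
    # König: in a bipartite graph, max independent set = |V| - max matching.
    return len(good) - matching
```

On the LC 1349 example above this returns 4: six good seats, and a maximum matching of size 2.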

Tests

LC 1349 given examples
All ‘#’ grid → 0
All ‘.’ grid of size 1×n → ⌈n/2⌉
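8×8 all ‘.’ (max stress) → 32: using every even column of every row is conflict-free, and pairing each even-column seat with its right-hand neighbor is a perfect matching of size 32, so König gives 64 − 32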
Single column m×1 all ‘.’ → m (no same-row conflicts within a column)

Follow-up Questions

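Generalize to weighted seats (different students have different “value”; maximize total value). → Maximum-weight independent set on a bipartite graph: total weight minus a min cut, with s→left and right→t capacities equal to seat weights and infinite-capacity conflict edges.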
Add a constraint that some seats are mandatory. → Remove each mandatory seat's conflicting neighbors (they can never be used), then solve the remaining graph and add the mandatory count.
m, n up to 50. → Same algorithm; check timing.
Stream of conflicts; dynamic max matching. → Active research area.
Prove that the bipartite reduction is exact for any grid size, not only LC 1349's 8 × 8 limit.

Product Extension

Real systems that use max flow / bipartite matching:

Ride-sharing assignment (drivers ↔ requests)
Ad auction allocation (advertisers ↔ slots)
Resource scheduling (tasks ↔ machines)
Compiler register allocation (variables ↔ registers; with constraints)
DNA sequencing assembly

Language/Runtime Follow-ups

Python: recursion depth limit; switch to iterative DFS for large V.
C++: much faster; competitive programmers use this exclusively.
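Java: recursive DFS may overflow the default thread stack; run the solver in a Thread created with a larger stack size, or make the DFS iterative. Go: goroutine stacks grow dynamically, so deep recursion is rarely the problem there.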

Common Bugs

Forgot the reverse edge. Flow networks require residual graph; no reverse = wrong answer.
Not a bug: reverse edges start at cap 0, so DFS skips them until flow is pushed on the forward edge. That is by design.
BFS level updated multiple times. Use the first level reached only.
DFS iterator reset every call to _dfs. Should persist within a phase (the self.it[u] trick).
Bipartite assumption violated: if you add an edge between two left nodes, the reduction breaks. Verify.
Source/sink indices clash with vertex IDs. Use distinct numbering scheme.

Debugging Strategy

Print the level graph after each BFS.
Print augmenting path found in each DFS.
Verify flow conservation at intermediate nodes after termination.
Sanity check: max flow ≤ min(total capacity out of s, total capacity into t); with unit capacities, that is min(deg(s), deg(t)).

Mastery Criteria

Implement Dinic from memory in ≤ 25 min in your primary language
Identify max-flow reductions in problems that don’t mention “flow” or “matching” explicitly
Explain why LC 1349 reduces to bipartite matching, citing König
State Hopcroft-Karp’s complexity advantage on bipartite unit-cap graphs
Estimate runtime for a given V, E
Implement min-cost max flow if asked (separate algorithm — SPFA + Dinic)

Lab 02 — Bipartite Matching (Hopcroft-Karp)

Goal

Implement Hopcroft-Karp for maximum bipartite matching, achieving O(E·√V), and understand when it beats general max flow.

Background

Bipartite matching: given a bipartite graph (vertices split into L and R, edges only between L and R), find the largest set of edges with no shared endpoint.

Naïve augmenting path: O(V·E). For each unmatched left vertex, find an augmenting path via DFS.
Hopcroft-Karp (1973): find multiple vertex-disjoint shortest augmenting paths per phase. O(E·√V).

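The √V comes from this argument: after √V phases, every remaining augmenting path has length > √V. The symmetric difference with a maximum matching contains (|M*| − |M|) vertex-disjoint augmenting paths, each with more than √V vertices, so fewer than √V remain; each later phase adds at least one, so at most √V more phases run.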

Hopcroft-Karp is a special case of Dinic’s algorithm applied to a unit-capacity bipartite flow network. If you have Dinic, you have Hopcroft-Karp.

Interview Context

Bipartite matching shows up in:

ICPC (constant)
Assignment problems (jobs ↔ workers)
Some quant interviews on portfolio matching
Compiler register coalescing
Almost never in standard FAANG interviews

Recognizing that a problem is bipartite matching is the high-leverage skill; the algorithm is well-known.

When to Skip This Topic

Skip if any of these are true:

You’ve already done Lab 01 (Dinic handles bipartite matching as a special case)
You’re not targeting competitive / quant / assignment-heavy roles
You have less than 2 weeks for this phase

The reduction skill is what matters. Skip the algorithm if you can recognize the reduction and use Dinic.

Problem Statement

Maximum Bipartite Matching.

Given a bipartite graph with L left vertices, R right vertices, and M edges, find the maximum matching size.

Variant: Job Assignment. N workers, N jobs. Worker i can do a subset of jobs. Assign each worker to at most one job, each job to at most one worker. Maximize assignments.

Constraints

1 ≤ L, R ≤ 10^5
1 ≤ M ≤ 10^6
Wall-clock: < 1 sec

Clarifying Questions

Are the partitions L and R given, or do I need to detect bipartiteness? (Usually given.)
Are edges weighted? (No — that’s a different problem: Hungarian algorithm or min-cost max flow.)
Output the matching or just the size? (Both versions are common.)

Examples

L = {1, 2, 3}, R = {a, b, c}
Edges: 1-a, 1-b, 2-b, 3-c
Max matching: {1-a, 2-b, 3-c}, size = 3

L = {1, 2}, R = {a, b}
Edges: 1-a, 2-a
Max matching: {1-a} or {2-a}, size = 1

Brute Force

Try all subsets of edges; check that no vertex appears twice; track max. O(2^M · M).

Better naïve: for each left vertex in order, DFS to find an augmenting path. O(V·E).
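The per-vertex augmenting-path method (often called Kuhn's algorithm) as a short sketch, assuming 0-indexed left and right vertices and edges given as (left, right) pairs; the function name is illustrative. Recursion depth can reach the matching size, so for large inputs in Python either raise the limit with sys.setrecursionlimit or convert the DFS to an explicit stack.

```python
def kuhn_matching(left_size, right_size, edges):
    adj = [[] for _ in range(left_size)]
    for u, v in edges:
        adj[u].append(v)
    match_r = [-1] * right_size          # match_r[v]: left partner of right vertex v, or -1

    def try_augment(u, visited):
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                # v is free, or its current partner can be re-routed to another vertex.
                if match_r[v] == -1 or try_augment(match_r[v], visited):
                    match_r[v] = u
                    return True
        return False

    matching = 0
    for u in range(left_size):
        if try_augment(u, [False] * right_size):
            matching += 1
    return matching
```

Each left vertex triggers one O(E) search, which is where the O(V·E) bound comes from.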

Brute Force Complexity

Subsets: O(2^M)
Per-vertex DFS: O(V·E). Acceptable for V·E ≤ ~10^7.

Optimization Path

Hopcroft-Karp:

Phase 1: BFS from all unmatched left vertices, computing layers in the residual graph.
Phase 2: DFS from each unmatched left vertex, finding vertex-disjoint shortest augmenting paths.
Repeat until no augmenting path exists.

The phase count is O(√V), giving total O(E·√V).

Final Expected Approach

from collections import deque

class HopcroftKarp:
    def __init__(self, left_size, right_size):
        self.L = left_size
        self.R = right_size
        self.adj = [[] for _ in range(left_size)]  # adj[u]: right vertices adjacent to left vertex u
        self.NIL = -1
    
    def add_edge(self, u, v):
        self.adj[u].append(v)
    
    def _bfs(self):
        q = deque()
        self.dist = [float('inf')] * self.L
        for u in range(self.L):
            if self.match_L[u] == self.NIL:
                self.dist[u] = 0
                q.append(u)
        found = False
        while q:
            u = q.popleft()
            for v in self.adj[u]:
                pair = self.match_R[v]
                if pair == self.NIL:
                    found = True
                elif self.dist[pair] == float('inf'):
                    self.dist[pair] = self.dist[u] + 1
                    q.append(pair)
        return found
    
    def _dfs(self, u):
        for v in self.adj[u]:
            pair = self.match_R[v]
            if pair == self.NIL or (self.dist[pair] == self.dist[u] + 1 and self._dfs(pair)):
                self.match_L[u] = v
                self.match_R[v] = u
                return True
        self.dist[u] = float('inf')
        return False
    
    def max_matching(self):
        self.match_L = [self.NIL] * self.L
        self.match_R = [self.NIL] * self.R
        matching = 0
        while self._bfs():
            for u in range(self.L):
                if self.match_L[u] == self.NIL and self._dfs(u):
                    matching += 1
        return matching

Data Structures

Adjacency list (left → list of right)
match_L[u], match_R[v]: current partner or NIL
dist[u]: BFS layer of left vertex
Queue for BFS

Correctness Argument

Augmenting path: path alternating unmatched-matched-unmatched… edges, starting and ending at unmatched vertices. Flipping the edges along the path increases matching size by 1.
Berge’s theorem: matching is maximum iff no augmenting path exists.
Hopcroft-Karp: in each phase, finds a maximal set of vertex-disjoint shortest augmenting paths. After √V phases, no short augmenting paths remain; at most √V remaining ones contribute one each.

Complexity

Time: O(E · √V)
Space: O(V + E)

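For V = 10^5, E = 10^6: the worst-case bound is E·√V ≈ 10^6 × 316 ≈ 3·10^8 steps. In practice far fewer than √V phases are usually needed, so a C++ implementation typically runs well under 1 sec.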

Implementation Requirements

Use BFS to detect all unmatched left vertices and compute layers
DFS must respect layer constraint (dist[pair] == dist[u] + 1)
Set dist[u] = infinity on failed DFS to prune subsequent visits
Repeat until BFS finds no augmenting path

Tests

Empty graph → 0
Single edge → 1
Complete bipartite K_{n,n} → n
Star (1 left, n right) → 1
Path 1-a-2-b-3 → 2

Follow-up Questions

Weighted matching (maximize sum of edge weights, not count). → Hungarian algorithm O(V³) or min-cost max flow.
Online matching (right vertices arrive one at a time, each revealing its edges). → Greedy is 1/2-competitive; ranking is (1 − 1/e)-competitive.
Stable matching (Gale-Shapley). → Different problem; preferences instead of binary edges.
Edge-disjoint paths from s to t. → Reduces to max flow with all capacities 1.

Product Extension

Ad-slot allocation (advertisers ↔ impressions)
Ride-sharing dispatch (drivers ↔ riders)
Course allocation (students ↔ classes with capacity)
Resource scheduling

Language/Runtime Follow-ups

C++: competitive programmers use a tight 50-line version
Python: recursion depth and constant factor make this borderline at V = 10^5; use sys.setrecursionlimit or iterative
Rust: ownership makes the in-place matching arrays a small wrestle

Common Bugs

Forgot to reset dist[u] = infinity on DFS failure. Re-explores dead ends; slow.
DFS doesn’t respect the layer constraint. Same as Ford-Fulkerson; loses √V factor.
match_L and match_R out of sync. Update both atomically.
NIL value collision with real vertex 0. Use -1 or a sentinel.

Debugging Strategy

After each phase, print matching size and BFS layer counts
Verify match_L[u] == v iff match_R[v] == u
Augmenting path should alternate matched/unmatched

Mastery Criteria

Implement Hopcroft-Karp in ≤ 30 min from memory
Explain why √V phases suffice (sketch of proof)
Identify when bipartite matching applies to a problem stated in domain terms
State the difference between bipartite matching and Hungarian (weighted)
Estimate runtime for given V, E

Lab 03 — Heavy-Light Decomposition

Goal

Implement Heavy-Light Decomposition (HLD) for answering path queries on a tree in O(log² N) per query (or O(log N) with a segment tree per chain).

Background

HLD partitions tree edges into “heavy” and “light”:

For each non-leaf vertex, the edge to its child with the largest subtree is heavy.
All other edges are light.

Property: any root-to-leaf path uses O(log N) light edges (because each light edge halves the subtree size). Hence any tree path can be decomposed into O(log N) heavy chains.

Each heavy chain is contiguous in a DFS order that visits the heavy child first, so we can maintain a segment tree over the DFS array and do O(log N) work per chain → O(log² N) per path query.

The heavy/light edge idea comes from Sleator and Tarjan’s link-cut tree work (1983); the standalone static decomposition was later popularized through ICPC.

Interview Context

HLD appears in:

ICPC (frequently in the path-query category — QTREE problem on SPOJ is canonical)
Almost never in industry interviews
A handful of database/optimizer roles touch on tree-DP that HLD speeds up
A very rare appearance in compiler dominator-tree manipulation

When to Skip This Topic

Skip if any of these are true:

You are not targeting ICPC
You are not interviewing at a research/algorithms team
You don’t already understand segment trees deeply (Phase 3 Lab 01–02)
You have less than 3 weeks for this phase

HLD is implementation-heavy. Getting it right in interview time requires ≥ 30 hours of practice.

Problem Statement

Path Query (QTREE-style).

Given a tree of N vertices, each edge has a weight. Support two operations:

change(i, w): change edge i’s weight to w
query(u, v): return the maximum edge weight on the path from u to v

Constraints

1 ≤ N ≤ 10^5
1 ≤ Q ≤ 10^5
Edge weights ≤ 10^9

Clarifying Questions

Is the tree rooted or unrooted? (Pick a root; doesn’t matter.)
Queries on edges or on vertices? (Edges here; vertex variant is simpler.)
Multiple components possible? (No — single tree.)

Examples

N = 4
Edges: (1,2,3), (2,3,4), (2,4,5)
query(1, 3): path 1→2→3, max edge = 4
change(1, 10): edge (1,2) now has weight 10
query(1, 4): path 1→2→4, max edge = 10

Brute Force

For each query, find the path via LCA (after preprocessing, each LCA query costs O(log N)), then walk the path and check each edge. O(N) per query → O(NQ) total.

For N=Q=10^5, that’s 10^10 — TLE.

Brute Force Complexity

Time: O(NQ)
Space: O(N) for tree + LCA tables

Optimization Path

The path between u and v decomposes as u → LCA → v. With HLD, each leg traverses O(log N) heavy chains; each chain is a contiguous range in our DFS order; we query/update with a segment tree.

Final Expected Approach

DFS 1 (size/parent/depth): compute subtree size, parent, depth.
DFS 2 (HLD): for each vertex, identify heavy child (largest subtree). Assign heavy[u]. Walk heavy chains, assigning head[u] and a pos[u] = position in DFS-order array.
Build segment tree over the DFS array, indexed by pos[u].
Query(u, v): while head[u] != head[v], raise the deeper one to its head’s parent, querying the segment tree for the chain segment. Then query the segment between u and v on the final shared chain.

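# Assumes parent, depth, head, pos and seg_tree (point update + range max) from the two DFS passes above.
def query_path(u, v):
    res = 0  # identity for max, assuming non-negative weights; use float('-inf') otherwise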
    while head[u] != head[v]:
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u
        res = max(res, seg_tree.query(pos[head[u]], pos[u]))
        u = parent[head[u]]
    if u == v: return res
    if depth[u] > depth[v]: u, v = v, u
    # Edge weights stored at the deeper endpoint; skip u's contribution
    res = max(res, seg_tree.query(pos[u] + 1, pos[v]))
    return res
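The full pipeline as one self-contained sketch, assuming 0-indexed vertices, root 0, and edges given as (u, v, w) where the list index is the edge id used by change. Both passes are iterative, the segment tree is an iterative bottom-up max tree, and -inf is the identity so negative weights also work. This chain-by-chain numbering keeps every heavy chain contiguous but does not keep subtrees contiguous, so it does not cover the subtree-query follow-up as written.

```python
NEG = float("-inf")  # identity for max; also the value at the root, which has no parent edge


class HLD:
    def __init__(self, n, edges, root=0):
        adj = [[] for _ in range(n)]
        for u, v, _ in edges:
            adj[u].append(v)
            adj[v].append(u)
        # Pass 1: parent, depth, and a preorder in which parents precede children.
        parent, depth, order = [-1] * n, [0] * n, []
        seen = [False] * n
        seen[root] = True
        stack = [root]
        while stack:
            u = stack.pop()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    parent[v], depth[v] = u, depth[u] + 1
                    stack.append(v)
        # Subtree sizes bottom-up; heavy child = child with the largest subtree.
        size, heavy = [1] * n, [-1] * n
        for u in reversed(order):
            if parent[u] >= 0:
                size[parent[u]] += size[u]
        for u in order:
            best = 0
            for v in adj[u]:
                if v != parent[u] and size[v] > best:
                    best, heavy[u] = size[v], v
        # Pass 2: walk each heavy chain, giving its vertices consecutive positions.
        head, pos, cur = [0] * n, [0] * n, 0
        stack = [root]
        while stack:
            h = x = stack.pop()                  # h heads a new chain
            while x != -1:
                head[x], pos[x] = h, cur
                cur += 1
                for v in adj[x]:
                    if v != parent[x] and v != heavy[x]:
                        stack.append(v)          # each light child heads its own chain
                x = heavy[x]
        self.parent, self.depth, self.head, self.pos = parent, depth, head, pos
        self.base = 1
        while self.base < n:
            self.base *= 2
        self.tree = [NEG] * (2 * self.base)
        # Edge i is stored at its deeper endpoint.
        self.edge_child = [v if parent[v] == u else u for u, v, _ in edges]
        for i, (_, _, w) in enumerate(edges):
            self._set(pos[self.edge_child[i]], w)

    def _set(self, i, w):
        i += self.base
        self.tree[i] = w
        i //= 2
        while i:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def _query(self, lo, hi):                    # max over positions [lo, hi], inclusive
        res, lo, hi = NEG, lo + self.base, hi + self.base + 1
        while lo < hi:
            if lo & 1:
                res, lo = max(res, self.tree[lo]), lo + 1
            if hi & 1:
                hi -= 1
                res = max(res, self.tree[hi])
            lo, hi = lo // 2, hi // 2
        return res

    def change(self, i, w):
        self._set(self.pos[self.edge_child[i]], w)

    def query(self, u, v):
        head, depth, pos, parent = self.head, self.depth, self.pos, self.parent
        res = NEG
        while head[u] != head[v]:
            if depth[head[u]] < depth[head[v]]:
                u, v = v, u
            res = max(res, self._query(pos[head[u]], pos[u]))
            u = parent[head[u]]
        if u == v:
            return res
        if depth[u] > depth[v]:
            u, v = v, u
        return max(res, self._query(pos[u] + 1, pos[v]))   # skip the LCA's own parent edge
```

With the example above shifted to 0-indexed vertices, HLD(4, [(0, 1, 3), (1, 2, 4), (1, 3, 5)]).query(0, 2) gives 4, and after change(0, 10), query(0, 3) gives 10.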

Data Structures

Tree adjacency
Arrays: parent, depth, size, heavy, head, pos
Segment tree over pos-indexed values
DFS for both passes (iterative for large N to avoid stack overflow)

Correctness Argument

Light edge bound: if (u, v) is a light edge, size(v) ≤ size(u)/2. So any root-to-leaf path crosses O(log N) light edges.
Heavy chain decomposition: path u→v splits at LCA; each leg traverses chains separated by light edges; ≤ O(log N) chains.
Segment tree on chain: chains contiguous in DFS order; standard range query.
Edge-on-vertex convention: store each edge’s weight at its deeper endpoint, so a vertex query at position i returns the parent-edge weight.

Complexity

Preprocessing: O(N)
Per query/update: O(log² N) — segment tree query O(log N), times O(log N) chains
Total: O((N + Q) log² N)

Implementation Requirements

Iterative DFS for N > 10^4 to avoid Python recursion limit (or C++ stack)
Segment tree must support point update + range max query
Careful indexing: edge i stored at vertex deeper_endpoint(i)
Handle u == v case in query (return 0 or identity)

Tests

Linear tree (path graph): every edge is on the chain; ≤ 1 chain transition per query
Balanced binary tree: ≈ log N chain transitions
Star tree (1 center, N leaves): only one heavy edge; every leaf-to-leaf query crosses at most two light edges
Update + query interleaved
Query on same vertex (u == v)
Path including the root

Follow-up Questions

Sum on path instead of max. → Same structure, segment tree stores sums.
Update path (range add) instead of point. → Segment tree with lazy propagation; same chain decomposition.
Subtree query (sum/max in subtree of u). → Even simpler: subtree is contiguous in DFS, single range query.
LCA only. → Tarjan offline O((N+Q)α(N)) or binary lifting O(log N).
Dynamic tree (edges added/removed). → Link-cut trees (Sleator-Tarjan); much harder.

Product Extension

Network routing on hierarchical topologies
File-system path queries (organizations with deep trees)
Phylogenetic tree analysis
Decision tree updates in some ML systems

Language/Runtime Follow-ups

C++: standard ICPC implementation; ~150 lines
Python: slow constant factor; use iterative DFS; can pass for N ≤ 10^4 comfortably
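Java: recursion depth near 10^5 can overflow the default thread stack; use iterative DFS or run the solver in a Thread created with a larger stack size; constant factor OK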

Common Bugs

Recursive DFS for N = 10^5: stack overflow in many languages.
Forgot edge-vertex mapping convention: off-by-one when querying final segment.
Heavy child computed wrong: must be the child with the largest subtree, not the deepest.
head[u] not propagated through the chain. All vertices on the same heavy chain should share head.
Segment tree off-by-one between pos[u] and pos[v].
Update at the root edge. Root has no parent edge; verify boundary handling.

Debugging Strategy

Print the chain decomposition: list all chains as [head, ..., tail]
For a path query, log each chain segment queried
Verify against brute force on N ≤ 20
Visualize: color heavy edges red, light edges black; should see O(log) light edges on long paths

Mastery Criteria

Implement HLD + segment tree from scratch in ≤ 90 min
Explain why O(log N) chains per path
Handle edge-weight vs vertex-weight variants
Combine with lazy propagation for range updates
State complexity precisely: O((N + Q) log² N) or O(log N) with chain-segtrees

Lab 04 — Centroid Decomposition

Goal

Implement centroid decomposition for efficient tree queries — counting / aggregating over all paths in a tree in O(N log N) or O(N log² N).

Background

The centroid of a tree is a vertex whose removal leaves no subtree with more than N/2 vertices. Every tree has a centroid (sometimes two).

Centroid decomposition: recursively decompose the tree:

Find centroid; process all paths passing through it
Remove centroid; recurse on each remaining subtree

Recursion depth: O(log N) (each level halves subtree size). Total work per level: O(N) typically → O(N log N) total.

Originally developed for tree DP and offline path queries. Powerful technique for problems of the form: “count/sum over all pairs (u, v) in a tree with property P on the u-v path.”

Interview Context

Almost exclusively ICPC. Some appearances in:

Quant algo research on tree models
Phylogenetic inference (computational biology)
A handful of compiler dominator-tree analyses

Industry interviews: near-zero.

When to Skip This Topic

Skip if any of these are true:

You are not training for ICPC or competitive contests
You haven’t done Lab 03 (HLD) — these are sibling techniques
You don’t have 2+ weeks for the implementation practice

Centroid decomposition has a high “first implementation” cost. Don’t attempt without serious tree-DP fluency.

Problem Statement

Count Paths in Tree with Length ≤ K.

Given a tree of N vertices, edge weights w_e, and integer K, count the number of unordered pairs (u, v) such that the sum of edge weights on the path from u to v is ≤ K.

Constraints

1 ≤ N ≤ 5×10^4
1 ≤ K ≤ 10^9
1 ≤ w_e ≤ 10^4

Clarifying Questions

Are weights positive? (Yes — required for the standard algorithm.)
Count ordered or unordered pairs? (Unordered, exclude self-pairs.)
Are edge weights integers? (Yes — convenient for sort/binary-search.)

Examples

Tree: 1-2 (w=2), 2-3 (w=1), 2-4 (w=3)
K = 4
Paths and lengths:
  (1,2): 2 ✓
  (1,3): 3 ✓
  (1,4): 5 ✗
  (2,3): 1 ✓
  (2,4): 3 ✓
  (3,4): 4 ✓
Answer: 5

Brute Force

For each unordered pair (u, v), compute path length (LCA + ancestor distances). O(N² log N).

For N = 5×10^4: 2.5×10^9 ops — TLE.
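To stress-test the decomposition, you can use a simpler brute force than the LCA version: run a DFS from every vertex, for O(N²) total. The sketch below assumes adj[u] holds (neighbor, weight) pairs on vertices 0..n-1. count_paths_brute is a hypothetical name. It is only meant for small N (a few thousand at most in Python).

```python
def count_paths_brute(adj, K):
    """Count unordered pairs (u, v), u != v, whose path weight is <= K.

    One iterative DFS per source vertex: O(N^2) total.
    """
    n = len(adj)
    count = 0
    for src in range(n):
        stack = [(src, -1, 0)]          # (vertex, parent, distance from src)
        while stack:
            u, parent, d = stack.pop()
            # count each unordered pair once, from its smaller endpoint
            if u > src and d <= K:
                count += 1
            for v, w in adj[u]:
                if v != parent:
                    stack.append((v, u, d + w))
    return count
```

On the example tree above (re-indexed from 0), count_paths_brute returns 5 for K = 4.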

Brute Force Complexity

Time: O(N² log N) for path length per pair
Space: O(N) plus LCA tables

Optimization Path

Centroid decomposition shines for “paths through centroid” enumeration:

A path between u and v either passes through the centroid c or lies entirely in one subtree (after c is removed)
Paths through c: count by aggregating distances from c to every other vertex
Paths in subtrees: handled recursively

Per centroid:

Traverse from c (BFS or DFS), recording dist(c, v) for every v in c’s connected component
Sort distances per subtree
Count pairs (u, v) with dist(c, u) + dist(c, v) ≤ K using two pointers
Subtract pairs where u and v are in the same subtree: their path does not pass through c, and it is counted later, at a centroid inside that subtree

Final Expected Approach

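import sys
sys.setrecursionlimit(1 << 20)   # calc_size / gather_dists recurse O(N) deep on path-like trees

def centroid_decompose(adj, K, root=0):
    # adj[u] = list of (neighbor, weight) pairs; returns the number of unordered
    # pairs (u, v), u != v, whose path weight is <= K
    n = len(adj)
    removed = [False] * n   # True once a vertex has been used as a centroid
    size = [0] * n          # subtree sizes inside the current component
    total = 0
    
    def calc_size(u, parent):
        size[u] = 1
        for v, _ in adj[u]:
            if v != parent and not removed[v]:
                calc_size(v, u)
                size[u] += size[v]
    
    def find_centroid(u, parent, tree_size):
        # Walk toward any child whose subtree holds more than half the component
        for v, _ in adj[u]:
            if v != parent and not removed[v] and size[v] > tree_size // 2:
                return find_centroid(v, u, tree_size)
        return u
    
    def gather_dists(u, parent, d, out):
        # Collect dist(c, x) for every x in this subtree of the centroid c
        out.append(d)
        for v, w in adj[u]:
            if v != parent and not removed[v]:
                gather_dists(v, u, d + w, out)
    
    def count_pairs(dists, limit):
        # Two pointers over sorted distances: pairs i < j with dists[i] + dists[j] <= limit
        dists.sort()
        i, j = 0, len(dists) - 1
        c = 0
        while i < j:
            if dists[i] + dists[j] <= limit:
                c += j - i
                i += 1
            else:
                j -= 1
        return c
    
    def decompose(u):
        nonlocal total
        calc_size(u, -1)
        c = find_centroid(u, -1, size[u])
        all_dists = [0]                  # the centroid itself, at distance 0
        for v, w in adj[c]:
            if not removed[v]:
                sub = []
                gather_dists(v, c, w, sub)
                # subtract pairs within this subtree: their path does not pass through c
                total -= count_pairs(sub[:], K)
                all_dists.extend(sub)
        total += count_pairs(all_dists, K)
        removed[c] = True
        for v, _ in adj[c]:
            if not removed[v]:
                decompose(v)
    
    decompose(root)
    return total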

Data Structures

Adjacency list (vertex → list of (neighbor, weight))
removed[v]: marks centroids removed from active tree
size[v]: subtree size in current decomposition step
Distance lists per subtree

Correctness Argument

Centroid existence: every tree has a centroid (induction on tree structure).
Recursion depth O(log N): removing centroid leaves subtrees of size ≤ N/2.
Pair counting via subtraction: a path (u, v) is counted exactly once, at the first centroid c chosen on path(u, v). Until c is chosen, u and v stay in the same component; once c is removed, they are in different components. The inclusion-exclusion (add the all-vertices count, subtract each per-subtree count) ensures each path through c is counted once, at c.
Two pointers for sum ≤ K: standard.

Complexity

Time: O(N log² N) — O(log N) levels, O(N log N) per level (sort dominates)
Space: O(N) for tree + O(N) for decomposition state

Implementation Requirements

Iterative or carefully bounded recursive DFS (Python’s default recursion limit of 1000 is far below the depth a path-shaped tree with 5×10^4 vertices needs)
Recompute size[] for each subtree (in the recursive call); critical bug source
Two-pointer pair counting requires sorted distances
The inclusion-exclusion trick is the conceptual core; verify on small cases

Tests

Linear chain (path graph): N(N-1)/2 paths; verify against brute force
Star tree: each pair sum is at most 2*max_weight
Balanced binary tree
N = 1 (no pairs)
K smaller than every edge weight: answer = 0, since self-pairs are excluded
Very large K (all pairs counted)

Follow-up Questions

Count paths with length exactly K. → Use hashmap of distances per subtree; sum complement counts.
Sum of path lengths (rather than count). → Aggregate sums in addition to counts during two-pointer scan.
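XOR of edge weights instead of sum, equals K. → No decomposition needed: xor(u, v) = px[u] ^ px[v], where px is the XOR from the root, so count matching pairs with a hashmap in O(N). For XOR ≤ K or maximum XOR, use an XOR trie over px.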
Online (tree mutating). → Much harder; use top trees or Euler-tour trees.
K-th shortest path. → Different problem; rarely tractable on trees with centroid.
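For the “exactly K” follow-up, you can swap the two-pointer counter for a hashmap counter and keep the same add-all / subtract-per-subtree structure. The sketch below uses the hypothetical name count_pairs_exact. It counts index pairs i < j with dists[i] + dists[j] == K in O(len(dists)) expected time and needs no sorting.

```python
from collections import Counter


def count_pairs_exact(dists, K):
    """Count index pairs i < j with dists[i] + dists[j] == K."""
    seen = Counter()      # distances scanned so far -> multiplicity
    total = 0
    for d in dists:
        total += seen[K - d]   # earlier distances that complete d to exactly K
        seen[d] += 1
    return total
```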

Product Extension

Phylogenetics: counting pairs of species within evolutionary distance K
Network distance queries on hierarchical trees
Distance-based recommendation systems on tree-like ontologies

Language/Runtime Follow-ups

Python: sort + two-pointer per level; constant factor is the killer. C++ recommended for N ≥ 10^4.
C++: standard ICPC implementation; ~100 lines.
Recursive DFS: centroid decomposition depth O(log N), but inner DFS depth O(N) — limit must accommodate.

Common Bugs

Forgot to recompute size[] for each subtree. Sizes from before removal are stale.
Centroid finder doesn’t follow the right child. Must descend toward the largest remaining subtree.
removed[v] check forgotten in DFS: revisits removed centroids.
Off-by-one in pair counting (counting self-pair). Handle separately.
Inclusion-exclusion wrong sign. Add all, subtract per-subtree.
Stack overflow on deep recursion. Convert inner DFS to iterative.

Debugging Strategy

For small N, compare against brute force at each level
Log the centroid chosen at each call
Verify subtree sizes recomputed correctly (print before find_centroid)
For two-pointer: print sorted distances and the (i, j) cursor trajectory

Mastery Criteria

Implement centroid decomposition in ≤ 60 min from memory
Explain the inclusion-exclusion trick for path counting
Identify problems amenable to centroid decomposition (offline path queries on static tree)
Distinguish from HLD: HLD is online with edge updates; centroid is offline/path-counting
State complexity precisely: O(N log² N) typical

Lab 05 — Suffix Automaton

Goal

Build a Suffix Automaton (SAM) for a string, and use it to count the number of distinct substrings in O(N).

Background

A suffix automaton is the smallest DFA that accepts every suffix of a given string. Discovered by Blumer et al. (1985).

Key facts:

At most 2N - 1 states and 3N - 4 transitions for a string of length N ≥ 3, independent of alphabet size
Each state corresponds to an equivalence class of substrings that share the same set of end positions (endpos) in the string
Distinct substring count = sum of len[state] - len[link[state]] over all states (excluding the initial state)

It is one of the most versatile string data structures in competitive programming. It is often compared with suffix arrays + LCP arrays, which solve many of the same problems with different constant factors.

Interview Context

Heavy ICPC / Codeforces presence
Bioinformatics / genome alignment research
Rarely in industry; Bloomberg may ask, but usually accepts suffix array
Cryptography (some sequence-counting problems)

When to Skip This Topic

Skip if any of these are true:

You’re not targeting ICPC or string-research roles
You haven’t learned suffix arrays yet (lower-hanging fruit; more interview-relevant)
You don’t have 2+ weeks to internalize this

SAM is conceptually the deepest topic in this phase. The implementation is short; the understanding is hard. Don’t fake it.

Problem Statement

Count Distinct Substrings.

Given a string s, count the number of distinct non-empty substrings.

Example: s = "abc" → substrings {“a”, “b”, “c”, “ab”, “bc”, “abc”} → 6.

s = "aaa" → substrings {“a”, “aa”, “aaa”} → 3.

Constraints

1 ≤ |s| ≤ 10^6
Lowercase English (or larger alphabet — affects transition storage)
Wall-clock: < 1 sec

Clarifying Questions

Empty substring counts? (Usually no.)
Substrings or distinct substrings? (Distinct; non-distinct is trivial: N(N+1)/2.)
Alphabet size? (26 for English; affects map-vs-array tradeoff.)

Examples

"abc"   → 6   (a, b, c, ab, bc, abc)
"aaa"   → 3   (a, aa, aaa)
"abab"  → 7   (a, b, ab, ba, aba, bab, abab)
""      → 0

Brute Force

Generate all substrings, insert them into a hash set, and return its size. There are O(N²) substrings with total length Θ(N³), so building and hashing them costs O(N³) time (O(N²) expected if each substring is replaced by a rolling hash). For N = 10^6: dead.
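This brute force is a handy reference for the stress test listed under Tests. Here is a one-line sketch (count_distinct_brute is a hypothetical name). It is practical only for short strings.

```python
def count_distinct_brute(s):
    """Distinct non-empty substrings via a set of all slices: O(N^3) time."""
    return len({s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)})
```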

Brute Force Complexity

Time: O(N³); O(N²) expected with rolling hashes
Space: O(N³) characters if the substrings themselves are stored; O(N²) if only hashes are stored

Optimization Path

Build the suffix automaton:

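Each state represents an endpos-equivalence class: the substrings that occur at exactly the same set of end positions
For each state s (except the initial), the class contains the suffixes of its longest member with lengths len[link[s]] + 1 through len[s], so it holds exactly len[s] - len[link[s]] distinct substrings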
Sum these for all states → distinct substring count

Final Expected Approach

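class SuffixAutomaton:
    def __init__(self):
        # State 0 is the initial state (empty string)
        self.size = 1
        self.last = 0          # state of the whole prefix read so far
        self.len = [0]         # len[s]: length of the longest string in state s
        self.link = [-1]       # link[s]: suffix link of state s
        self.next = [{}]       # next[s]: transitions, char -> state
    
    def extend(self, c):
        # New state for (current prefix) + c
        cur = self.size
        self.size += 1
        self.len.append(self.len[self.last] + 1)
        self.link.append(-1)
        self.next.append({})
        # Walk suffix links, adding a c-transition to cur wherever one is missing
        p = self.last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p == -1:
            # c never appeared before: link to the initial state
            self.link[cur] = 0
        else:
            q = self.next[p][c]
            if self.len[p] + 1 == self.len[q]:
                # q's longest string is (p's longest string) + c: link directly
                self.link[cur] = q
            else:
                # Split q: the clone takes q's strings of length <= len[p] + 1
                clone = self.size
                self.size += 1
                self.len.append(self.len[p] + 1)
                self.link.append(self.link[q])
                self.next.append(dict(self.next[q]))   # copy, not alias
                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur
    
    def count_distinct_substrings(self):
        # Each non-initial state contributes len[s] - len[link[s]] distinct substrings
        return sum(self.len[i] - self.len[self.link[i]] for i in range(1, self.size))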

# Usage
sam = SuffixAutomaton()
for c in "abab":
    sam.extend(c)
print(sam.count_distinct_substrings())   # 7

Data Structures

len[]: length of the longest substring represented by each state
link[]: suffix link (analogous to failure link in Aho-Corasick)
next[]: transition map per state (dict or array of 26)

Correctness Argument

The SAM construction is non-trivial. The key invariants:

After processing prefix of length k, the SAM recognizes exactly the suffixes of that prefix
Each state’s len[s] - len[link[s]] counts the distinct substrings whose endpos class is exactly this state
Summing across all non-initial states gives total distinct substrings

The cloning step (when len[p] + 1 != len[q]) splits a state to maintain the equivalence class property — without it, the automaton wouldn’t be canonical.

A rigorous proof is in Maxime Crochemore’s textbook. Accept the construction; verify on small cases.

Complexity

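Construction: O(N) for a constant-size alphabet with array transitions (each state allocates, and each clone copies, a |Σ|-sized array, so O(N · |Σ|) in general); O(N) expected with hash-map (dict) transitions, since there are O(N) transitions in total
Distinct substring count: O(N) after construction
Space: O(N · |Σ|) with array transitions; O(N) with dict transitions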

Implementation Requirements

Use dict per state for arbitrary alphabets, or [None] * 26 for English
Allocate state arrays incrementally (or pre-allocate 2N for safety)
Cloning step is the most error-prone — verify on small cases
Avoid recursion; SAM is naturally iterative

Tests

"a" → 1
"aa" → 2
"ab" → 3
"abc" → 6
"aaa" → 3
"abab" → 7
"abcabc" → 15
Stress: random strings of N=100 against brute force
Performance: N = 10^6 single character → finishes in < 1 sec in C++ (pure Python is closer to the 10^5 range)

Follow-up Questions

Find longest common substring of two strings. → Build SAM of one; walk through the other tracking match length.
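Number of occurrences of each substring. → Set cnt = 1 for every state created as cur (not for clones), process states in decreasing len, and add cnt[s] to cnt[link[s]]; then cnt[s] = |endpos(s)|, the occurrence count of every substring in state s.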
K-th lexicographically smallest substring. → DFS on SAM with character ordering + count of substrings reachable.
Substring matching count for many queries. → Walk the pattern from the initial state; if every character has a transition, the answer is the endpos size (cnt) of the final state, otherwise 0.
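Here is a sketch of the longest-common-substring follow-up. To keep the block self-contained, it restates the construction above as a plain function over parallel lists (build_sam and longest_common_substring are hypothetical names). The walk tracks the current state and the length of the current match. When the next character cannot be matched, it falls back along suffix links.

```python
def build_sam(s):
    """Suffix automaton of s as parallel lists (length, link, trans); dict transitions."""
    length, link, trans = [0], [-1], [{}]
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(length)                 # split q, as in extend() above
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return length, link, trans


def longest_common_substring(s, t):
    """Length of the longest common substring of s and t: O(|s| + |t|) for a fixed alphabet."""
    length, link, trans = build_sam(s)
    state, cur_len, best = 0, 0, 0
    for ch in t:
        # Shorten the current match via suffix links until it can be extended by ch
        while state != 0 and ch not in trans[state]:
            state = link[state]
            cur_len = length[state]
        if ch in trans[state]:
            state = trans[state][ch]
            cur_len += 1
        else:
            state, cur_len = 0, 0
        best = max(best, cur_len)
    return best
```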

Product Extension

Genome assembly: distinct k-mers, longest common substrings across reads
Plagiarism detection
Compression (LZ-family algorithms use suffix structures)
Search engine n-gram indexing (less common; usually suffix array)

Language/Runtime Follow-ups

C++: use int next[][26] for the alphabet — fast.
Python: dict transitions; constant factor allows N ≤ 10^5 comfortably.
Java: TreeMap or HashMap; arrays preferred for fixed alphabet.

Common Bugs

Forgot to clone: when len[p]+1 != len[q], failing to clone breaks the automaton.
Wrong update of last: must always be cur, not clone.
Suffix link of cur set incorrectly: subtle; verify against reference.
Used same dict reference for clone: must dict(self.next[q]) (copy).
Off-by-one in distinct substring sum: start from state 1, not 0 (state 0 is the initial state and represents the empty substring).

Debugging Strategy

Print states with (len, link, transitions) after each extend call
Visualize the suffix-link tree (parent = link[state])
Verify against brute force for N ≤ 20
For the cloning step: log when it triggers and which state is split

Mastery Criteria

Implement SAM construction in ≤ 45 min from memory (it’s short but error-prone)
Explain the role of suffix links and the cloning step
Apply SAM to: distinct substring count, occurrence count, LCS of two strings
State complexity precisely (O(N) states and O(N) transitions; O(N) construction for a constant alphabet; O(N|Σ|) memory with array transitions)
Distinguish SAM from suffix arrays in problem-applicability

Lab 06 — Advanced DP Optimization

Goal

Apply three classical DP optimization techniques — Convex Hull Trick (CHT), Knuth’s optimization, and Divide & Conquer DP — to reduce a polynomial-time DP from O(N²) or O(N³) to O(N log N) or O(N²).

Background

Many DPs have the form dp[i] = min_j (dp[j] + cost(j, i)) for various cost functions. The naive scan is O(N) per i, giving O(N²) total. Three techniques exploit structure in cost:

Convex Hull Trick: when cost(j, i) = a[j] * x[i] + b[j] (linear in x[i]), the transitions form lines; the min is the lower envelope, queryable in O(log N) or O(1) per query.
Knuth’s optimization: when the cost satisfies the quadrangle inequality (cost(a,c) + cost(b,d) ≤ cost(a,d) + cost(b,c) for a ≤ b ≤ c ≤ d) and is monotone on nested intervals (cost(b,c) ≤ cost(a,d)); together these make the optimal split points monotone. Reduces O(N³) to O(N²).
Divide & Conquer DP: when optimal split points are monotone within each layer (opt[k][i] ≤ opt[k][i+1]). Reduces O(KN²) to O(KN log N).

Interview Context

Codeforces / ICPC: regular
Quant: heavy presence in trading-cost optimization, risk allocation
Compiler: loop scheduling sometimes uses CHT
Database: query optimizer cost minimization (rare)

Almost never in standard interviews.

When to Skip This Topic

Skip if any of these are true:

You aren’t already fluent in 1D and 2D DP (Phase 5 prerequisites)
You’re not targeting ICPC / quant / compiler optimization roles
You don’t have 2+ weeks to drill multiple variants

These are families of techniques; each requires several practice problems to internalize.

Problem Statement

Three variants, one for each technique:

Variant A — CHT (Convex Hull Trick)

You drive along a road with N houses. House i is at position x[i] (sorted). You can rent a car at house i for cost c[i] + d[i] * (x[j] - x[i]) if you drive from house i to house j > i. Starting at house 1, what’s the minimum cost to reach house N?

dp[j] = min over i < j of (dp[i] + c[i] + d[i] * (x[j] - x[i]))
      = min over i < j of (d[i] * x[j] + (dp[i] + c[i] - d[i] * x[i]))

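This is linear in x[j], so CHT applies. Because x is sorted, queries arrive in increasing x: O(N) total if the slopes d[i] are also inserted in monotone order (deque hull with a moving pointer); otherwise O(N log N), e.g. with a Li Chao tree.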

Variant B — Knuth’s Optimization

Optimal Binary Search Tree. Given keys with access probabilities, build a BST minimizing expected access cost.

dp[i][j] = min over i ≤ k ≤ j of (dp[i][k-1] + dp[k+1][j]) + sum(p[i..j])

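Naive O(N³). Knuth: O(N²), because the optimal root satisfies opt[i][j-1] ≤ opt[i][j] ≤ opt[i+1][j]; this holds because the weight sum(p[i..j]) satisfies the quadrangle inequality (with equality) and is monotone on nested intervals.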

Variant C — Divide & Conquer DP

Minimum K Partitions. Partition array a[1..N] into exactly K contiguous segments, minimizing sum of “cost” of each segment, where cost(l, r) satisfies the monotonic-opt property.

dp[k][i] = min over j < i of (dp[k-1][j] + cost(j+1, i))

Naive O(KN²). D&C DP: O(KN log N).

Constraints

A: 1 ≤ N ≤ 10^5
B: 1 ≤ N ≤ 5×10^3
C: 1 ≤ N ≤ 5×10^3, 1 ≤ K ≤ N

Clarifying Questions

A: Are x[i] strictly sorted? Are d[i] non-negative?
B: Are probabilities normalized? Are keys distinct?
C: Is cost computable in O(1) after O(N²) prep? Has the quadrangle inequality been verified?

Examples

A (CHT)

positions: [0, 5, 10, 20]
c = [0, 3, 2, _]; d = [1, 1, 2, _]
dp[1] = 0
dp[2] = 0 + 0 + 1*(5-0) = 5
dp[3] = min(0+0+1*10, 5+3+1*5) = min(10, 13) = 10
dp[4] = min(0+0+1*20, 5+3+1*15, 10+2+2*10) = min(20, 23, 32) = 20

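B (Knuth): verify on a small frequency vector (unnormalized probabilities), e.g. frequencies [34, 8, 50] give minimum total cost Σ freq × depth = 142 (root = key 3; its left child is key 1, with key 2 below it).

C (D&C DP): verify on a contrived cost, e.g. cost(l, r) = (a[l] + ... + a[r])² with a = [1, 2, 3, 4] and K = 2 gives 52 ([1, 2, 3] | [4]).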

Brute Force

A: O(N²) DP scan. B: O(N³) standard. C: O(KN²) standard.

Brute Force Complexity

For N = 10^5 in A: O(N²) = 10^10 — TLE. For N = 5×10^3 in B/C: O(N³) = 1.25×10^11 — TLE.

Optimization Path

A (CHT):

Maintain a lower convex hull of lines y = d[i] * x + (dp[i] + c[i] - d[i] * x[i]). For each query x = x[j], find the line with minimum y at that x.

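If slopes are inserted in monotone order and x[j] is monotone: a deque-based hull with a moving pointer gives O(1) amortized per operation → O(N) total.
If slopes are sorted but x[j] is arbitrary: binary search on the hull → O(N log N).
If slopes arrive in arbitrary order: a Li Chao tree gives O(log N) per insert and per query → O(N log N).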
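In Variant A the slopes d[i] arrive in arbitrary order, so a Li Chao tree is the simplest safe choice. The sketch below assumes 0-indexed houses and that every query x is known in advance, which holds here because the queries are the positions x[j]. LiChaoMin and min_rental_cost are hypothetical names. Each insert and each query costs O(log N).

```python
import bisect
import math


class LiChaoMin:
    """Li Chao tree: min over lines y = m*x + b, queried at a fixed set of xs."""

    def __init__(self, xs):
        self.xs = sorted(set(xs))          # the only x values that may be queried
        self.n = len(self.xs)
        self.line = [None] * (4 * self.n)  # line kept at each node, as (m, b)

    @staticmethod
    def _at(line, x):
        return line[0] * x + line[1]

    def add(self, m, b):
        node, lo, hi, new = 1, 0, self.n - 1, (m, b)
        while True:
            cur = self.line[node]
            if cur is None:
                self.line[node] = new
                return
            mid = (lo + hi) // 2
            if self._at(new, self.xs[mid]) < self._at(cur, self.xs[mid]):
                self.line[node], new = new, cur   # node keeps the winner at mid
                cur = self.line[node]
            if lo == hi:
                return
            # Two lines cross at most once, so the loser can only win on one side
            if self._at(new, self.xs[lo]) < self._at(cur, self.xs[lo]):
                node, hi = 2 * node, mid
            elif self._at(new, self.xs[hi]) < self._at(cur, self.xs[hi]):
                node, lo = 2 * node + 1, mid + 1
            else:
                return

    def query(self, x):
        i = bisect.bisect_left(self.xs, x)   # x must be one of the construction xs
        node, lo, hi, best = 1, 0, self.n - 1, math.inf
        while True:
            if self.line[node] is not None:
                best = min(best, self._at(self.line[node], x))
            if lo == hi:
                return best
            mid = (lo + hi) // 2
            if i <= mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid + 1


def min_rental_cost(x, c, d):
    """Variant A, 0-indexed: dp[j] = min over i < j of dp[i] + c[i] + d[i] * (x[j] - x[i])."""
    n = len(x)
    hull = LiChaoMin(x)
    dp = [0] * n
    for j in range(n):
        if j > 0:
            dp[j] = hull.query(x[j])
        if j < n - 1:
            # Renting at house j: slope d[j], intercept dp[j] + c[j] - d[j] * x[j]
            hull.add(d[j], dp[j] + c[j] - d[j] * x[j])
    return dp[n - 1]
```

On the worked example (x = [0, 5, 10, 20], c = [0, 3, 2], d = [1, 1, 2]) it returns 20, matching dp[4] above. Building the tree over the sorted query xs, rather than over a raw coordinate range, keeps its size O(N) regardless of coordinate magnitude.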

B (Knuth):

Compute dp[i][j] for increasing j - i. For each (i, j), only try splits in [opt[i][j-1], opt[i+1][j]]. Amortized O(N²) instead of O(N³).
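Here is a sketch of Variant B with Knuth's restricted split range. It assumes integer access frequencies (unnormalized probabilities) and cost = Σ freq × depth, with the root at depth 1. optimal_bst_cost is a hypothetical name. The inner loop scans only k in [opt[i][j-1], opt[i+1][j]], and these ranges telescope to O(N) work per diagonal.

```python
def optimal_bst_cost(freq):
    """Minimum sum of freq[k] * depth(k) (root at depth 1) over all BSTs on keys 1..n."""
    n = len(freq)
    if n == 0:
        return 0
    prefix = [0] * (n + 1)
    for i, f in enumerate(freq):
        prefix[i + 1] = prefix[i] + f
    # 1-indexed keys; dp[i][i-1] = 0 stands for an empty subtree, hence the n + 2 size
    dp = [[0] * (n + 2) for _ in range(n + 2)]
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        dp[i][i] = freq[i - 1]
        opt[i][i] = i
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best, best_k = None, opt[i][j - 1]
            # Knuth: the optimal root lies between opt[i][j-1] and opt[i+1][j]
            for k in range(opt[i][j - 1], opt[i + 1][j] + 1):
                val = dp[i][k - 1] + dp[k + 1][j]
                if best is None or val < best:
                    best, best_k = val, k
            # every key in i..j sits one level deeper below root k: add weight(i..j)
            dp[i][j] = best + prefix[j] - prefix[i - 1]
            opt[i][j] = best_k
    return dp[1][n]
```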

C (D&C DP):

For layer k, define solve(lo, hi, opt_lo, opt_hi): compute dp[k][lo..hi], knowing optimal split for each is in [opt_lo, opt_hi]. Recurse on midpoint m, then on solve(lo, m-1, opt_lo, opt[m]) and solve(m+1, hi, opt[m], opt_hi). Each level of recursion is O(N) work; depth is O(log N) → O(N log N) per k, O(KN log N) total.
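Here is a sketch of Variant C under one concrete, assumed cost: cost(l, r) = (a[l] + ... + a[r])² with non-negative a. This is a standard example whose optimal split points are monotone. min_k_partition is a hypothetical name. Each layer calls solve once over the whole range, and the recursion depth is O(log N).

```python
import math


def min_k_partition(a, K):
    """Split a into exactly K non-empty contiguous segments minimizing sum of (segment sum)^2.

    Assumes a[i] >= 0 so the optimal split point is monotone within each layer.
    """
    n = len(a)
    prefix = [0] * (n + 1)
    for i, v in enumerate(a):
        prefix[i + 1] = prefix[i] + v

    def cost(j, i):
        # cost of the segment a[j:i] in Python slicing (a[j+1..i] 1-indexed)
        s = prefix[i] - prefix[j]
        return s * s

    def solve(prev, cur, lo, hi, opt_lo, opt_hi):
        # Fill cur[lo..hi], knowing each optimal split lies in [opt_lo, opt_hi]
        if lo > hi:
            return
        mid = (lo + hi) // 2
        best, best_j = math.inf, opt_lo
        for j in range(opt_lo, min(mid - 1, opt_hi) + 1):
            val = prev[j] + cost(j, mid)
            if val < best:
                best, best_j = val, j
        cur[mid] = best
        solve(prev, cur, lo, mid - 1, opt_lo, best_j)
        solve(prev, cur, mid + 1, hi, best_j, opt_hi)

    prev = [0] + [math.inf] * n   # layer 0: only the empty prefix is feasible
    for _ in range(K):
        cur = [math.inf] * (n + 1)
        solve(prev, cur, 1, n, 0, n - 1)
        prev = cur
    return prev[n]
```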

Final Expected Approach

(See solution sketches inline in the variants above. Full implementations are 80–150 lines each in C++.)

Data Structures

A (CHT): deque of lines + intersection-checking helper
B (Knuth): dp[N][N], opt[N][N]
C (D&C): dp[K][N], recursive solver

Correctness Argument

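CHT: for a min query, a line is “dominated” if, at every x in the query range, some other line is at or below it. The lower envelope contains exactly the non-dominated lines; read from left to right, their slopes decrease.
Knuth: the quadrangle inequality, together with monotonicity on nested intervals, implies monotonicity of opt. Proof in TAOCP Vol 3.
D&C DP: if opt[i] ≤ opt[i+1] (monotonicity), the candidate ranges at one recursion level overlap only at their endpoints, so each level scans O(N) candidates in total; with O(log N) levels, one layer costs O(N log N) instead of O(N²).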

Complexity

Variant	Naive	Optimized
A	O(N²)	O(N) or O(N log N)
B	O(N³)	O(N²)
C	O(KN²)	O(KN log N)

Implementation Requirements

CHT: handle slopes carefully (sorted vs not); avoid division for intersections (use cross-multiplication with care for overflow)
Knuth: process diagonals in order of length; verify monotonicity in a debug build
D&C DP: pass index ranges + opt ranges; base case is lo > hi

Tests

Small N where brute force confirms answer
Edge: N = 1 (trivial answer)
All zero costs / probabilities
Monotonically increasing / decreasing costs
Stress: random instances at N = 1000 — compare optimized vs brute force

Follow-up Questions

CHT for max instead of min. → Maintain upper convex hull; symmetric.
Lines added in arbitrary slope order. → Use Li Chao Tree; O(log N) per insert and query.
Knuth not applicable (no quadrangle). → Either D&C DP (if opt is monotonic) or SMAWK (O(N) for totally monotone matrices).
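D&C DP combined with CHT. → Possible. A different tool removes the K layer dimension altogether: the “Aliens trick” (Lagrangian relaxation), valid when the optimal cost is convex in K.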

Product Extension

CHT: dynamic programming in online ad bidding optimization, trading strategy optimization
Knuth: BST construction in language tools (rarely; usually use B-trees or hash maps)
D&C DP: optimal segmentation in time-series anomaly detection, network topology design

Language/Runtime Follow-ups

C++: all three implementable with stdlib. CHT often uses __int128 to avoid overflow in intersection checks.
Python: D&C DP works but with significant constant factor; CHT with Li Chao is feasible.
Java: BigInteger for safety on overflow-prone intersection checks.

Common Bugs

CHT: integer overflow in line intersection. Use long doubles or 128-bit.
CHT: deque pops the wrong end when slopes are descending vs ascending.
Knuth: didn’t verify the quadrangle inequality. Algorithm gives wrong answer silently.
Knuth: opt-range boundaries inclusive vs exclusive — off-by-one.
D&C: passed wrong opt range to recursive calls. Loses the monotonicity benefit.
D&C: base case (lo > hi) doesn’t return.

Debugging Strategy

For each technique, write a brute-force version side by side
Stress test with random small instances and assert equality
For CHT, print the hull after each insertion
For Knuth/D&C, log the chosen split points and verify monotonicity

Mastery Criteria

Recognize when a DP qualifies for each optimization
Implement CHT in ≤ 40 min
Implement Knuth in ≤ 30 min
Implement D&C DP in ≤ 30 min
State the prerequisite condition for each (linearity, quadrangle, monotonic-opt)
Estimate runtime for given N, K

Lab 07 — FFT / Polynomial Multiplication

Goal

Implement the Cooley-Tukey FFT to multiply two polynomials in O(N log N), and apply it to convolution-based problems (large-integer multiplication, string matching with wildcards).

Background

The Discrete Fourier Transform (DFT) of a length-N vector evaluates the corresponding polynomial at N roots of unity. Multiplying two polynomials of degree N-1 via naive convolution is O(N²); via DFT, point-wise multiply, inverse-DFT, it’s O(N log N).

Cooley-Tukey (1965): divide and conquer the DFT. The radix-2 version requires N a power of 2.

Number-Theoretic Transform (NTT): FFT over a prime field; avoids floating-point error; common in competitive programming.

Interview Context

Codeforces / ICPC: regular (NTT version)
Signal processing roles (DSP, audio, image): expected
Cryptography research: standard tool
Quant: large-integer multiplication, time-series convolutions
Standard FAANG: essentially zero

When to Skip This Topic

Skip if any of these are true:

You aren’t targeting signal-processing, cryptography, or ICPC roles
You haven’t implemented divide-and-conquer recursive algorithms confidently
You’re rusty on complex number arithmetic

This is a “you need it or you don’t” topic. Most interview prep should skip.

Problem Statement

Polynomial Multiplication.

Given two polynomials A(x) = sum a[i] * x^i and B(x) = sum b[i] * x^i, compute their product C(x) = A(x) * B(x).

Equivalently: compute the convolution c[k] = sum_{i+j=k} a[i] * b[j].

Constraints

Degrees up to 10^5 or 10^6
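Coefficients fit in int32. Exact products can then exceed 2^53, the largest range in which doubles represent integers exactly, so floating-point FFT needs care (e.g., splitting coefficients into smaller parts); NTT avoids the issue.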
Wall-clock: < 1 sec

Clarifying Questions

Integer coefficients or real-valued? (Integer for NTT, real for FFT.)
Exact answer required? (Yes for NTT; FFT introduces floating error.)
Output as polynomial coefficients or as a value at specific x? (Coefficients.)

Examples

A = [1, 2, 3]   (1 + 2x + 3x²)
B = [4, 5]      (4 + 5x)
C = [4, 13, 22, 15]   (4 + 13x + 22x² + 15x³)

A = [1, 1]      (1 + x)
B = [1, 1]
C = [1, 2, 1]   (1 + x)² = 1 + 2x + x²

Brute Force

Nested loops: c[i+j] += a[i] * b[j]. O(N²). For N = 10^5: 10^10 ops — TLE.

Brute Force Complexity

Time: O(N²)
Space: O(N)

Optimization Path

Pad both A and B to length 2^k ≥ deg(A) + deg(B) + 1.
Compute DFT(A) and DFT(B) using Cooley-Tukey.
Pointwise multiply: F[i] = DFT(A)[i] * DFT(B)[i].
Compute IDFT(F) to recover convolution C.

Final Expected Approach

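import math

def fft(a, invert=False):
    # In-place iterative radix-2 FFT; len(a) must be a power of 2.
    # invert=True uses the conjugate roots and divides by n (inverse transform).
    n = len(a)
    if n == 1: return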
    # bit-reverse permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly
    length = 2
    while length <= n:
        angle = 2 * math.pi / length * (-1 if invert else 1)
        wlen = complex(math.cos(angle), math.sin(angle))
        for i in range(0, n, length):
            w = complex(1)
            for k in range(length // 2):
                u = a[i + k]
                v = a[i + k + length // 2] * w
                a[i + k] = u + v
                a[i + k + length // 2] = u - v
                w *= wlen
        length <<= 1
    if invert:
        for i in range(n):
            a[i] /= n

def multiply(a, b):
    result_size = 1
    while result_size < len(a) + len(b):
        result_size <<= 1
    fa = [complex(x) for x in a] + [complex(0)] * (result_size - len(a))
    fb = [complex(x) for x in b] + [complex(0)] * (result_size - len(b))
    fft(fa)
    fft(fb)
    for i in range(result_size):
        fa[i] *= fb[i]
    fft(fa, invert=True)
    return [round(x.real) for x in fa[:len(a) + len(b) - 1]]

For NTT (exact integer convolution), replace complex roots of unity with primitive roots in F_p for a prime p = c * 2^k + 1 (common: 998244353 with primitive root 3).
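Here is a minimal NTT sketch that mirrors fft above, with modular roots in place of complex ones. It uses p = 998244353 = 119 · 2^23 + 1 with primitive root 3, so transform lengths up to 2^23 are supported. Results are exact only when every true output coefficient is below p. ntt and multiply_mod are hypothetical names.

```python
MOD = 998244353   # 119 * 2**23 + 1, so lengths up to 2**23 are supported
ROOT = 3          # primitive root modulo MOD


def ntt(a, invert=False):
    # In-place iterative NTT over Z/MOD; len(a) must be a power of 2 dividing 2**23
    n = len(a)
    j = 0
    for i in range(1, n):             # bit-reverse permutation, as in fft above
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # ROOT^((MOD-1)/length) has multiplicative order exactly `length`
        wlen = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            wlen = pow(wlen, MOD - 2, MOD)   # inverse root via Fermat's little theorem
        half = length // 2
        for i in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[i + k]
                v = a[i + k + half] * w % MOD
                a[i + k] = (u + v) % MOD
                a[i + k + half] = (u - v) % MOD
                w = w * wlen % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def multiply_mod(a, b):
    # Product coefficients modulo MOD
    size = 1
    while size < len(a) + len(b):
        size <<= 1
    fa = [x % MOD for x in a] + [0] * (size - len(a))
    fb = [x % MOD for x in b] + [0] * (size - len(b))
    ntt(fa)
    ntt(fb)
    for i in range(size):
        fa[i] = fa[i] * fb[i] % MOD
    ntt(fa, invert=True)
    return fa[:len(a) + len(b) - 1]
```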

Data Structures

Array of complex numbers (FFT) or integers mod p (NTT)
Bit-reverse permutation index

Correctness Argument

DFT linearity: DFT(A + B) = DFT(A) + DFT(B).
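Convolution theorem: DFT(A ∗ B) = DFT(A) ⊙ DFT(B) (pointwise), where ∗ is cyclic convolution of length N; zero-padding to N ≥ len(A) + len(B) - 1 makes the cyclic convolution equal the ordinary one.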
Inverse: IDFT(DFT(A)) = A.
Cooley-Tukey: recursive split into even/odd indices; combines via roots of unity.

Complexity

Time: O(N log N)
Space: O(N)

Implementation Requirements

N must be a power of 2; pad with zeros
Bit-reversal permutation correctly implemented
Iterative butterflies (recursive is fine for small N but slow for large)
For floating-point FFT: round to nearest integer at the end; verify error bound
For NTT: pick a prime p larger than every true output coefficient (at most max|a| · max|b| · min(len(A), len(B))); otherwise the results are only correct mod p

Tests

Multiply [1, 1] × [1, 1] = [1, 2, 1]
Multiply degree-3 polynomials by hand-verified product
Stress: random N = 10^4 polynomials vs O(N²) brute force; assert equality
N = 1 (constants only)
All zeros (result all zeros)
Performance: N = 2×10^5 in < 1 sec

Follow-up Questions

Exact integer convolution with large coefficients. → NTT with multi-modulus + CRT, or three NTTs with different primes.
String matching with wildcards. → Reduce to convolution; each char becomes a numeric encoding; wildcard = 0; sum-of-(diff)² = 0 means match.
Multi-dimensional FFT (image convolution). → Apply 1D FFT along each axis.
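Fast multiplication of very large integers. → Schönhage-Strassen uses FFT; Fürer’s algorithm is faster asymptotically.
Bitwise convolutions over masks (XOR, OR/AND, subset convolution). → Walsh-Hadamard transform for XOR; zeta/Möbius transforms for OR/AND and subset convolution; different beast.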

Product Extension

Audio processing (spectrograms, filters)
Image processing (Gaussian blur, edge detection)
Cryptography (large-integer multiplication for RSA, ECC)
Time-series analysis (autocorrelation)
Big-integer libraries (GMP switches to FFT-based multiplication for very large operands, above a platform-tuned threshold)

Language/Runtime Follow-ups

C++: standard.
Python: pure-Python FFT is slow; for N > 10^4 use numpy.fft, which is C-optimized and sometimes acceptable as the “library” answer if the interviewer allows (it is floating-point, so round the results).
Java: Apache Commons Math has FFT.
JavaScript: rare in interviews; libraries exist.

Common Bugs

Bit-reverse permutation wrong: off-by-one in the swap loop.
Forgot to divide by N in inverse: result is N× too large.
N not a power of 2: padding error.
Floating-point error too large: for coefficients near max int32, need careful rounding or NTT.
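NTT primitive root wrong: g must be a primitive root mod p, and N must divide p - 1, so that g^((p-1)/N) is a primitive N-th root of unity.
Result length wrong: the product has len(A) + len(B) - 1 coefficients, but the transform runs over N ≥ that; truncate the output.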

Debugging Strategy

Verify FFT then IFFT recovers the input (within floating tolerance)
Multiply by [1] and verify output equals input
Compare against numpy.fft on small inputs
For NTT: compute small examples by hand using primitive roots

Mastery Criteria

Implement Cooley-Tukey FFT in ≤ 45 min
Implement NTT in ≤ 60 min
State the convolution theorem
Identify problems solvable via FFT/NTT (convolution, large-int mult, string matching with errors)
Reason about numerical precision for FFT vs NTT
Estimate runtime for given N

Lab 08 — Advanced Geometry

Goal

Implement two foundational computational geometry primitives:

Convex Hull via Andrew’s monotone chain (O(N log N))
Segment Intersection (Bentley-Ottmann sweep-line, O((N + K) log N))

Background

Computational geometry interviews are rare but exist at:

Companies doing CAD, CAM, 3D modeling (Autodesk, Adobe)
Games (Unity, Epic) — physics, collision
Robotics (path planning, occupancy grids)
Maps/GIS (Google Maps, ESRI)
Some quant for time-series geometric techniques

The implementation has many sharp edges: floating-point comparison, collinear points, degenerate cases. Robust geometry is a deep subfield.

Interview Context

Codeforces / ICPC: geometry problems appear regularly in problem sets
Game / CAD / robotics roles: foundational
Standard FAANG: almost never

When to Skip This Topic

Skip if any of these are true:

You’re not targeting the specific industries above
You’re uncomfortable with vector/cross product math
You don’t have 2+ weeks to handle the edge cases properly

The first implementation of convex hull will have bugs. Plan for several practice attempts.

Problem Statement

Part A — Convex Hull

Given N points in 2D, return the vertices of their convex hull in counterclockwise order, starting from the leftmost point (lowest y among ties).

Part B — Count Segment Intersections

Given N line segments, return the number of intersection points among them.

Constraints

A: 1 ≤ N ≤ 10^5. Coordinates fit in int32.
B: 1 ≤ N ≤ 10^5. Up to K = O(N²) intersections in pathological cases; for the algorithm to be efficient, K << N².

Clarifying Questions

Should collinear hull points be included or skipped?
Are duplicates possible?

Should overlapping segments count as one intersection or many?
Touching at endpoints?

Examples

A

Points: [(0,0), (1,1), (2,0), (1,-1)]
Hull: [(0,0), (1,-1), (2,0), (1,1)] (all four points are hull vertices; counterclockwise from the leftmost point)

B

Segments: [((0,0),(4,4)), ((0,4),(4,0))]
Intersect at (2,2). Answer: 1.

Brute Force

A: gift wrapping (Jarvis march), O(N · H) where H = hull size; worst case O(N²).
B: check every pair, O(N²).

Brute Force Complexity

A: O(NH) worst-case O(N²)
B: O(N²)

For N = 10^5: 10^10 — TLE.

Optimization Path

A — Andrew’s Monotone Chain

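Sort points lexicographically by (x, y).
Build the lower hull: iterate left to right; pop the top of the chain while the last two points and p fail to make a strict left (counterclockwise) turn, i.e., cross ≤ 0.
Build the upper hull: iterate right to left with the same pop rule.
Concatenate, dropping the last point of each chain (it is the first point of the other).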

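def cross(O, A, B):
    # z-component of (A - O) x (B - O); > 0 means O -> A -> B turns counterclockwise
    return (A[0]-O[0]) * (B[1]-O[1]) - (A[1]-O[1]) * (B[0]-O[0])

def convex_hull(points):
    points = sorted(set(points))  # dedupe, then sort by (x, y)
    if len(points) <= 1:
        return points
    lower = []
    for p in points:  # left to right builds the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()  # drop a clockwise or collinear middle point
        lower.append(p)
    upper = []
    for p in reversed(points):  # right to left builds the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]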

B — Bentley-Ottmann Sweep Line

Event queue: segment endpoints + computed intersections, ordered by x.
Status: balanced BST of active segments, ordered by y at the sweep line.
At a “left endpoint” event: insert into status; check intersection with neighbors above/below.
At a “right endpoint” event: remove from status; check intersection between the new neighbors.
At an “intersection” event: swap the two segments in status; check intersections with new neighbors.

O((N + K) log N) where K = number of intersections.

Final Expected Approach

Convex hull: as above.

Bentley-Ottmann: full implementation is 200+ lines with all edge cases. For most interviews, even discussing the structure is enough — full code is unlikely to be required.

Data Structures

A: sorted list, stack-like list for hull construction
B: priority queue (event queue), balanced BST or sorted list with O(log N) ops (status)

Correctness Argument

A

Sorting by x (then y) ensures we visit points in a consistent order.
“Left turn” (cross product > 0) means the chain bends counterclockwise, i.e., convexly; popping whenever cross ≤ 0 ensures no reflex or collinear point survives in the chain.
Lower and upper hulls cover all hull vertices; concatenation gives full hull in CCW order.

B

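Assuming no degeneracies, two segments that intersect are adjacent in the y-order just before (to the left of) their intersection point.
The sweep maintains adjacency; new adjacencies arise only at endpoint and intersection events, which is exactly where the algorithm tests neighbor pairs.
Each intersection is detected when its two segments become adjacent; a pair can become adjacent more than once, so the event queue must skip intersection events it already holds.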

Complexity

A: O(N log N) for sort + O(N) for chain construction.
B: O((N + K) log N).

Implementation Requirements

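Use integer arithmetic for the cross product when possible (avoids floating-point error); with int32 coordinates the exact value can need about 65 bits, so use __int128 (GCC/Clang) in C++ (Python ints are unbounded)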
Handle collinear hull points consistently (include or exclude as required)
For segment intersection: distinguish proper intersection (interior) from touching (endpoint)
Watch for vertical segments (event ordering edge case)
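A sketch of the pairwise predicate that both the brute force and the sweep's neighbor checks rely on, using exact integer orientation tests. `proper_intersection` accepts interior crossings only; `segments_intersect` also accepts touching endpoints and collinear overlap. The helper names are illustrative, and `count_intersecting_pairs` counts intersecting pairs, which equals the number of intersection points only when no three segments meet at one point and none overlap.

```python
def orient(a, b, c):
    """Sign of (b - a) x (c - a): 1 = counterclockwise, -1 = clockwise, 0 = collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_segment(a, b, p):
    """Assumes p is collinear with a-b; checks that p lies inside their bounding box."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def proper_intersection(s, t):
    """True if the segments cross at a single interior point of both."""
    (a, b), (c, d) = s, t
    return orient(a, b, c) * orient(a, b, d) < 0 and orient(c, d, a) * orient(c, d, b) < 0

def segments_intersect(s, t):
    """True if closed segments s and t share at least one point."""
    (a, b), (c, d) = s, t
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # general case: each segment straddles the other's line
    # Collinear cases: an endpoint of one segment lies on the other
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))

def count_intersecting_pairs(segments):
    """O(N^2) brute force, useful as a reference when testing a sweep-line implementation."""
    n = len(segments)
    return sum(segments_intersect(segments[i], segments[j])
               for i in range(n) for j in range(i + 1, n))
```

Keeping the predicate integer-only means the brute force can serve as ground truth for random tests without epsilon tuning.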

Tests

A

Single point → [point]
Two points → [both]
Three collinear points → [endpoints only] (or all three, depending on convention)
Square (4 corners + 5 interior) → 4 corners
All points on a circle → all on hull
Many duplicates

B

No intersections (parallel lines)
All intersect at one point (n lines through origin)
Random N = 100 vs O(N²) brute force

Follow-up Questions

Convex hull in 3D. → Quickhull or 3D gift wrapping, O(N²) worst case; randomized incremental construction achieves expected O(N log N).
Dynamic convex hull (online insertions). → Overmars-van Leeuwen; O(log² N) per update.
Compute area of hull. → Shoelace formula.
Diameter of hull (farthest pair). → Rotating calipers, O(N).

Report intersections, not just count. → Same structure; collect points.
Line-line intersection in projective coords. → Avoids special-casing parallels.
Segments may overlap. → More complex event handling.
Robust orientation predicate. → Adaptive precision; Shewchuk’s predicates.
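For the area and diameter follow-ups, a sketch that assumes the input is a hull in counterclockwise order with no collinear vertices, such as the monotone chain output with the `<= 0` pop rule. It returns twice the area and the squared diameter so integer inputs stay exact; the function names are illustrative.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def twice_area(hull):
    """Shoelace formula; returns 2 * area so integer coordinates stay exact."""
    h = len(hull)
    return abs(sum(hull[i][0] * hull[(i + 1) % h][1] - hull[(i + 1) % h][0] * hull[i][1]
                   for i in range(h)))

def diameter_sq(hull):
    """Rotating calipers over a CCW hull; returns the squared farthest-pair distance."""
    h = len(hull)
    if h == 1:
        return 0
    if h == 2:
        return dist2(hull[0], hull[1])
    best, j = 0, 1
    for i in range(h):
        ni = (i + 1) % h
        # Advance the antipodal pointer while it gets strictly farther from edge i -> ni
        while cross(hull[i], hull[ni], hull[(j + 1) % h]) > cross(hull[i], hull[ni], hull[j]):
            j = (j + 1) % h
        best = max(best, dist2(hull[i], hull[j]), dist2(hull[ni], hull[j]))
    return best
```

The pointer j only moves forward around the hull, so the whole pass is O(H) after the hull is built.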

Product Extension

Mapping: route simplification (Ramer-Douglas-Peucker)
Games: collision detection (broad phase uses sweep)
CAD: boolean operations on polygons (sweep-based)
Robotics: configuration space construction

Language/Runtime Follow-ups

C++: geometry libraries (CGAL) are massive and correct but complex.
Python: Shapely for production; for interview, implement primitives.
Java: java.awt.geom.* has some primitives.

Common Bugs

Floating-point comparison without epsilon: false negatives on coincident points.
Pop condition cross < 0 vs cross <= 0: keeps vs drops collinear hull points; pick one convention deliberately.
Forgot to dedupe input points before convex hull.
Sweep-line: vertical segments treated incorrectly. Add small perturbation or special-case.
Sweep-line: intersection on the sweep line not detected because status ordering is computed at the wrong x.

Debugging Strategy

Plot the points and hull (matplotlib or similar)
For small cases, enumerate hull by hand and verify
For sweep-line: log every event and the status before/after

Mastery Criteria

Implement Andrew’s monotone chain in ≤ 30 min from memory
Implement basic segment-intersection check in ≤ 15 min
Understand the Bentley-Ottmann structure (even without writing 200 lines)
Apply rotating calipers for hull diameter
Recognize when integer arithmetic suffices vs when floating-point is unavoidable
State complexity precisely

Lab 09 — ICPC Contest Simulation

Goal

Simulate a 5-hour ICPC-style contest: 6–10 problems of varying difficulty, single team, no internet, paper and pen allowed. Practice contest strategy: problem selection, time management, debugging under pressure.

Background

ICPC contests are the gold standard for competitive programming practice:

5 hours
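~10 problems labeled A, B, C, …; unlike Codeforces rounds, the letter order often does not reflect difficulty (World Finals sets, for example, are not sorted by difficulty)
Penalty of 20 minutes per rejected submission, counted only on problems you eventually solve
Final ranking by number of problems solved, then total penalty time (sum of each solved problem’s acceptance time plus its penalties)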

This lab is a meta-lab: rather than teach an algorithm, it builds the candidate’s contest operating system (triage, pacing, and debugging habits under pressure).

Interview Context

ICPC training transfers to:

Quant hiring (Jane Street, Citadel value ICPC experience)
Google L6+ research interviews (sometimes)
General algorithmic confidence under time pressure (transferable)

Direct application of contest mode to industry interviews: low. But the training effect is high.

When to Skip This Topic

Skip if any of these are true:

You’re not interviewing at competitive-friendly firms
You have less than a month for this phase
You haven’t done Phase 11 mocks at your target level — those are higher leverage

If you do this lab, do it only after exhausting Phase 11.

Problem Statement

Run a 5-hour contest. Sources of problem sets:

Codeforces Educational rounds
Codeforces Div 2 (rounds 600+)
ICPC regional sets on UVa Online Judge or DOMjudge replays
AtCoder Beginner Contest (ABC): easier, 100 min
AtCoder Regular Contest (ARC): medium, 120 min
Kattis archive

Required mix for a 5-hour set:

2 trivial problems (sanity / warm-up; should solve in 15 min each)
3 medium problems (1 hour each)
2 hard problems (1.5+ hours, often unsolved)
1 problem-killer (often unsolved by anyone except top teams)

Constraints

Time: 5 hours, single uninterrupted block
No internet (except for the problem statements)
Programming language of your choice (typically C++ for competitive)
Penalty: simulate the 20-minute penalty per WA
No external help / collaboration

Clarifying Questions (to yourself before starting)

Which 2 problems will I attempt first? (Decision in 5 min of reading.)
What’s my “hard problem cutoff” — at what point do I move on?
What’s my time budget for debugging vs writing?
How will I track time per problem?

Examples (suggested contest sets)

Beginner: ABC 250 (8 problems, 100 min, but extend to 3 hours for practice)
Intermediate: Codeforces Round 800, Div 2 (4–5 problems, 2 hours; extend with a Div 1 problem)
Advanced: Any ICPC regional set on Kattis

Brute Force / Naive Strategy

Read problems A → J in order
Attempt in order
Stuck on B → keep grinding

This is the wrong strategy. All experienced contestants read every problem in the first 30 minutes.

Brute Force Complexity

Working strictly in order tends to rank poorly: one hard early problem can consume hours while easier later problems go unread. The optimization is meta: contest strategy.

Optimization Path

Phase 1 (0–30 min): Reading and triage

Read all problems. Categorize each as: trivial (T) / medium (M) / hard (H) / unknown (?)
Note any problem you immediately know how to solve
Estimate time-to-solve for each known problem
Decide which 2 problems to start with (lowest-risk T or known-M)

Phase 2 (30 min – 3 hours): Bulk solving

Solve the T’s first (lock in points)
Tackle M’s one at a time
Time-cap each: 45 min to first attempt; if no insight at 60 min, switch
Submit only when you’ve tested at least 2 cases (penalty hurts)
Track submitted vs accepted on a paper grid

Phase 3 (3–4.5 hours): Hard problem attack

Attempt your best H candidate
If stuck for 30 min, switch to another M or H
Don’t sink the last hour on one H if no progress

Phase 4 (last 30 min): Last-chance and verification

Verify your AC submissions (any bugs you noticed but didn’t fix?)
Submit any remaining solutions you believe are correct (ICPC gives no partial credit, so only complete solutions score)
Hand-test edge cases on solved problems

Final Expected Approach

Run the contest, then write a post-mortem:

Problems solved: ___
Penalty time: ___
Problems unread: ___ (should be 0)
Problems where you knew the approach but couldn’t implement: ___
Problems where you couldn’t find the approach: ___
Wasted time on (problem X): ___
Bug that cost you (problem Y): ___

Data Structures (the contestant uses)

Template file (.h for C++ or snippets): segment tree, DSU, FFT, Dijkstra, BFS, mod arithmetic
Paper grid: problem letter | reading time | first-attempt time | submissions | AC time
Stack-rank: priority order updated as problems are read

Correctness Argument

Strategy correctness is empirical: track your own performance over 5–10 contests. Adjust based on:

Where did I spend too long?
Which problems did I misclassify?
Which problem types do I consistently miss?

Complexity

Contest itself: 5 hours. Post-mortem: 1 hour. Per-week investment: ~1 full contest + a few targeted upsolves = 8–12 hours.

Implementation Requirements

Pre-built template file (start with KACTL or your own)
Local judging setup: compile + run + diff against expected output
Stress-testing harness (Lab 05 from Phase 10 applies directly)
A timer / phone alarm for the 5 hours

Tests

This is the test. The contest is the test.

Self-evaluation rubric:

Number of problems solved
Time per problem
WA-to-AC ratio
Quality of stress-tests during contest

Follow-up Questions (post-mortem)

Which problem could I have solved with 30 more minutes? → Drill that topic.
Which problem did I solve in time that I should have submitted faster? → Bug-hunt training.
Which problem did I skip that I should have attempted? → Misclassification — calibrate.
Did I run out of time or run out of ideas? → Different fixes.

Product Extension

Long-term: ICPC-style training builds engineering reflexes that transfer to:
- Performance debugging under deadline
- Triage of multiple bugs simultaneously
- Estimation of task duration (notoriously hard for engineers)

Language/Runtime Follow-ups

C++: dominant in competitive. Compile-time templates pay off.
Python: acceptable for problems with N ≤ 10^5; risk TLE on tight problems.
Java: middle ground; some teams use it.
Rust: rising, and some teams use it; its standard library is less batteries-included than C++’s for contest use.

Common Bugs (contest-specific)

Submitted without testing. Penalty bug.
Submitted with debugging prints still in code. WA.
Forgot to read a problem. Discovered 2 hours later you had a free solve.
Spent 90 minutes on one problem you couldn’t solve. Sunk-cost trap.
Submitted brute force expecting partial credit. ICPC has no partial; only full AC.
Wrong I/O format. Read the spec carefully — especially expected output line endings.

Debugging Strategy (during contest)

5-min rule: if not finding a bug in 5 min, write a stress test (Lab 05)
Random small inputs vs brute force is a contest superpower
If stuck on WA, re-read the problem statement entirely — often the bug is misunderstanding, not code
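A minimal in-process stress loop in the spirit of the 5-minute rule: compare a fast solution against a brute force on many small random inputs and stop at the first mismatch. The max-subarray pair below is a hypothetical stand-in for whatever problem is on the table; in a real contest the same loop usually drives compiled binaries through files or pipes.

```python
import random

def stress(fast, brute, gen, trials=1000, seed=0):
    """Return the first generated input where fast and brute disagree, or None."""
    rng = random.Random(seed)  # fixed seed so a failing case is reproducible
    for _ in range(trials):
        case = gen(rng)
        if fast(case) != brute(case):
            return case
    return None

# Hypothetical example problem: maximum subarray sum.
def kadane(a):
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def brute_max_subarray(a):
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))

def gen_small_array(rng):
    # Small sizes and values keep any failing case easy to debug by hand
    return [rng.randint(-5, 5) for _ in range(rng.randint(1, 8))]
```

A buggy variant such as `lambda a: max(0, kadane(a))` is caught almost immediately, because the generator regularly produces all-negative arrays.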

Mastery Criteria

Complete 5 contests; track scores
Post-mortem each one and act on findings
Consistently solve at least 3 problems in a Div 2 set run as a 5-hour session
Solve at least 1 D-level (Codeforces) problem in 2 hours
Build a personal template file with ≥ 15 commonly-used snippets
After 10 contests, identify your top 3 weakness topics; drill them

Suggested Contest Schedule (4 weeks)

Week	Contest	Goal
1	Codeforces Educational Round	Solve A, B, C, attempt D
2	ABC (extended to 3 hr)	Solve A through F
3	Codeforces Div 2	Solve A, B, C, D
4	ICPC regional replay	Solve 3–5 of 10

Post-mortem after each. Drill weakness topics between contests.

Lab 10 — Inclusion-Exclusion and Burnside

Goal

Master two combinatorial counting techniques used across competitive programming and discrete math:

Inclusion-Exclusion Principle (PIE) for counting objects satisfying / violating multiple conditions
Burnside’s lemma for counting orbits under group actions (counting distinct configurations modulo symmetry)

Background

Inclusion-Exclusion

For sets A₁, …, Aₙ:

$$|A_1 \cup \cdots \cup A_n| = \sum |A_i| - \sum |A_i \cap A_j| + \sum |A_i \cap A_j \cap A_k| - \cdots$$

In counting form, for properties P₁, …, Pₙ (the term S = ∅ is the total number of objects):

$$\#\{\text{objects with none of the properties}\} = \sum_{S \subseteq \{1, \ldots, n\}} (-1)^{|S|} \cdot \left|\{\text{objects with all properties } P_i,\ i \in S\}\right|$$
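Two classic instances of the counting form, sketched in Python. When the count of objects having all properties in S depends only on |S| = j, the sum collapses to one term per j weighted by C(n, j): for derangements, P_i is “item i is a fixed point” and (n − j)! permutations fix a given j items; for surjections, P_i is “value i is never used” and (k − j)^n functions avoid a given j values.

```python
from math import comb, factorial

def derangements(n):
    """Permutations of n items with no fixed point, by PIE over fixed-point sets."""
    return sum((-1) ** j * comb(n, j) * factorial(n - j) for j in range(n + 1))

def surjections(n, k):
    """Functions from an n-set onto a k-set, by PIE over sets of missed values."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
```

For example, derangements(4) evaluates to 24 − 24 + 12 − 4 + 1 = 9.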

Burnside

For a group G acting on a set X, the number of orbits is:

$$|X/G| = \frac{1}{|G|} \sum_{g \in G} |X^g|$$

where X^g is the set of elements fixed by g.

Interview Context

Codeforces / ICPC: regular (PIE constantly, Burnside in counting problems with symmetry)
Quant: combinatorial pricing models
Cryptography / coding theory: standard tools
Standard interviews: occasional easy PIE problem (e.g., “count integers ≤ N divisible by none of {2, 3, 5}”)

PIE appears more often than people realize; Burnside is rarer.

When to Skip This Topic

Skip if any of these are true:

You’re not targeting competitive / combinatorics-heavy roles
You haven’t done basic combinatorics (Phase 1–2 foundations)
You have less than 1 week for this lab

PIE is high-leverage even outside competitive — learn it. Burnside is optional.

Problem Statement

Part A — Count Coprime to N

Given integers L ≤ R and N, count integers in [L, R] coprime to N.

Part B — Distinct Necklaces

Given k colors and n beads in a circle, count the number of distinct necklaces (two necklaces are equivalent if one is a rotation of the other).

Constraints

A: 1 ≤ L ≤ R ≤ 10^18, 1 ≤ N ≤ 10^9
B: 1 ≤ n ≤ 10^6, 1 ≤ k ≤ 10^9

Clarifying Questions

Coprime means gcd(x, N) = 1?
Do we include endpoints?

Are reflections considered equivalent (bracelets) or only rotations (necklaces)?
Modulo what prime?

Examples

A

L=1, R=10, N=6 → integers in [1, 10] coprime to 6 are {1, 5, 7}. Answer: 3.

B

n=3, k=2  → 4 distinct necklaces: BBB, BBW, BWW, WWW (BBW and BWB and WBB are all rotations of each other).

Brute Force

A: iterate x in [L, R], check gcd. O(R - L). For R - L = 10^18: TLE.

B: generate all k^n colorings, group by rotation equivalence. O(k^n).

Brute Force Complexity

A: O(R - L)
B: O(k^n)

Both fail for given constraints.

Optimization Path

A (Inclusion-Exclusion)

Let p₁, …, pₘ be the distinct prime divisors of N. Then:

$$f(M) = \sum_{S \subseteq \{p_1, \ldots, p_m\}} (-1)^{|S|} \left\lfloor \frac{M}{\prod_{p \in S} p} \right\rfloor$$

where f(M) is the number of integers in [1, M] coprime to N (the empty set contributes M). Answer = f(R) − f(L − 1).

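def prime_divisors(N):
    # Distinct primes dividing N, by trial division up to sqrt(N)
    primes = []
    d = 2
    while d * d <= N:
        if N % d == 0:
            primes.append(d)
            while N % d == 0:
                N //= d
        d += 1
    if N > 1:
        primes.append(N)  # leftover factor is prime
    return primes

def count_coprime(M, N):
    # f(M): integers in [1, M] coprime to N, by PIE over subsets of distinct primes
    primes = prime_divisors(N)
    m = len(primes)
    total = 0
    for mask in range(1 << m):
        prod = 1
        bits = 0
        for i in range(m):
            if mask & (1 << i):
                prod *= primes[i]
                bits += 1
        total += (-1) ** bits * (M // prod)
    return total

answer = count_coprime(R, N) - count_coprime(L - 1, N)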

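Complexity: O(√N) for trial-division factorization plus O(2^m · m) for the subset loop, where m = number of distinct prime factors of N (m ≤ 9 for N ≤ 10^9, since 2·3·5·7·11·13·17·19·23·29 > 10^9).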

B (Burnside)

Group G = cyclic group of n rotations. By Burnside:

$$|\text{necklaces}| = \frac{1}{n} \sum_{d=0}^{n-1} k^{\gcd(n, d)}$$

Group by gcd(n, d) = g: the count of d with this gcd is φ(n/g). So:

$$= \frac{1}{n} \sum_{g | n} \varphi(n/g) \cdot k^g$$

Compute φ on the divisors of n (from n’s prime factorization). There are d(n) divisors and each term needs one fast exponentiation, O(log n), so the sum costs O(d(n) · log n), which is tiny.
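A sketch of the divisor-grouped Burnside formula with modular arithmetic. The modulus 998244353 is a placeholder assumption, since the problem leaves the prime open; division by n uses a Fermat inverse, which requires the modulus to be prime and not divide n.

```python
MOD = 998244353  # placeholder prime; the actual problem would specify the modulus

def distinct_necklaces(n, k, mod=MOD):
    """k-color necklaces of n beads up to rotation: (1/n) * sum over g | n of phi(n/g) * k^g."""
    # Factor n once; phi of every divisor follows from these primes.
    primes = []
    m, p = n, 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)

    def phi(x):
        # x divides n, so its prime factors are among `primes`
        result = x
        for q in primes:
            if x % q == 0:
                result = result // q * (q - 1)
        return result

    total = 0
    g = 1
    while g * g <= n:  # enumerate divisor pairs (g, n // g)
        if n % g == 0:
            total += phi(n // g) * pow(k, g, mod)
            h = n // g
            if h != g:
                total += phi(n // h) * pow(k, h, mod)
        g += 1
    # Divide by n via Fermat's little theorem
    return total % mod * pow(n, mod - 2, mod) % mod
```

For n = 3, k = 2 the sum is φ(3)·2 + φ(1)·8 = 12, and 12 / 3 = 4, matching the example above.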

Complexity

A: O(√N) factorization + O(2^m · m) subset loop; with m ≤ 9 the loop is effectively constant.
B: O(√n) divisor enumeration + O(d(n) · log n) for the powers, where d(n) ≤ 2√n is the number of divisors.

Implementation Requirements

A: efficient prime factorization (trial division up to √N is fine for N ≤ 10^9)
B: divisor enumeration via trial division; Euler totient by formula φ(n) = n · ∏(1 - 1/p)
Modular exponentiation for k^g mod p
Modular inverse for division by n (Fermat: n^(p−2) mod p, valid when p is prime and does not divide n)

Tests

A

L=1, R=10, N=6 → 3
L=1, R=N → φ(N) (Euler totient)
L=R, x coprime to N → 1
L=R, x not coprime → 0
Very large R for performance check

B

n=1, k=2 → 2 (B, W)
n=2, k=2 → 3 (BB, BW, WW)
n=3, k=2 → 4
n=4, k=2 → 6
Compare with brute force for n ≤ 6

Follow-up Questions

For A:

Count coprime to multiple Ns simultaneously. → PIE on union of all prime sets.
Count squarefree numbers in [L, R]. → Möbius function = PIE on prime squares.
Sum of φ(i) for i in [1, N]. → Sieve in O(N log log N), or O(N^{2/3}) with the same sublinear technique used to compute the Mertens function.
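For the squarefree follow-up, one standard formula is Q(M) = Σ_{d ≤ √M} μ(d) · ⌊M / d²⌋, i.e. PIE over the square divisors, with the Möbius function supplying the signs. A sketch with a linear sieve for μ; because the sieve runs up to √M, this is practical in Python for M up to roughly 10^12, not for the full 10^18 range.

```python
import math

def mobius_sieve(limit):
    """mu[0..limit] via a linear sieve (mu[0] is unused and set to 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_composite = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # p^2 divides i * p
                break
            mu[i * p] = -mu[i]
    return mu

def count_squarefree(M):
    """Squarefree integers in [1, M]."""
    if M <= 0:
        return 0
    r = math.isqrt(M)
    mu = mobius_sieve(r)
    return sum(mu[d] * (M // (d * d)) for d in range(1, r + 1))
```

The count over [L, R] is count_squarefree(R) − count_squarefree(L − 1), mirroring the f(R) − f(L − 1) pattern from Part A.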

For B:

Bracelets (rotations + reflections). → Group = dihedral D_n. Add reflection terms.
Necklaces with at most k uses of each color. → Multiset Burnside; harder.
Pólya enumeration with cycle index polynomial. → Generalizes Burnside; gives a generating function.
Distinct binary trees of n nodes. → Catalan numbers; different problem but related counting.

Product Extension

Combinatorial design (DOE, experimental design)
Code generation (counting equivalence classes of programs)
Chemistry (counting molecular isomers, Pólya’s original motivation)
Cryptography (group orbits in elliptic curves)

Language/Runtime Follow-ups

Python: big-int arithmetic for free; ideal for PIE/Burnside with modular arithmetic.
C++: modular arithmetic with manual care for overflow; faster execution.
Java: BigInteger for safety; modular arithmetic mature.

Common Bugs

PIE sign wrong: off-by-one in (-1)^|S|.
PIE on factored N: counted prime factors with multiplicity. Only distinct primes matter.
Burnside: forgot to take modular inverse for division by n.
Euler totient computed via brute force gcd: O(n) per value, way too slow.
Modular exponentiation overflow: use pow(k, g, MOD) in Python; manual loop with mod in C++.

Debugging Strategy

Brute force for small parameters; assert PIE/Burnside match
For PIE: print each subset’s contribution; verify signs alternate
For Burnside: enumerate orbits manually for n ≤ 4

Mastery Criteria

Apply PIE to: count divisible/coprime, derangements, surjections, squarefree
Apply Burnside to: necklaces, bracelets, colored cubes
Compute Euler totient in O(√n)
Compute Möbius function (PIE generalized)
State group-action correctness conditions
Identify a “this needs symmetry handling” problem and reach for Burnside

LeetCode Interview — Extreme Coding