Phase 10 — Testing, Debugging, and Correctness

Target level: Intermediate → Senior Expected duration: 2–3 weeks (assuming Phases 0–9 are complete) Weekly cadence: 4–5 lab hours + apply testing discipline to every problem you solve elsewhere

Why This Phase Exists

Most candidates lose offers not because they couldn’t find an algorithm — they lose because their code was almost right and they never noticed. The interviewer asked “are you sure?”, they said “yes”, and then the interviewer ran one edge case and the screen went red.

Testing and debugging is the dimension where senior candidates separate from juniors. A junior writes code and hopes. A senior writes code and proves it works, then runs three deliberate test cases (one normal, one degenerate, one large), and only then claims “done.”

This phase teaches the discipline. It is short because the mechanics are simple. The habit is what takes weeks to internalize, which is why every later problem in your study should explicitly run the checklist here.

Concepts to Master

Test types

Manual / desk-checked tests — what you trace through on paper during a 45-minute interview
Smoke tests — 1–2 sanity examples to prove the code runs at all
Unit tests — per-function correctness; use these heavily in phase-08-practical-engineering labs
Integration tests — multi-component behavior; relevant when you implement subsystems (cache + invalidator, scheduler + worker)
Property-based tests — hypothesis-style; assert invariants over random inputs (e.g., “sorted output is a permutation of the input”)
Brute-force verifier — known-correct slow solution to validate the fast one on small inputs
Stress testing — random-generation loop that runs the verifier and the fast solution and diffs them; the single best CP debugging tool
Fuzzing (overview) — feed structured random input; useful for parsers, serializers, anything with a grammar
Golden tests — record expected output for canonical inputs; mostly used in compiler/transform code
Mutation testing (overview) — flip operators in your code and check if any test catches the mutation; reveals weak test suites
Coverage analysis — branch and line coverage; necessary but not sufficient

Complexity & performance

Complexity testing — measure runtime at N, 2N, 4N; check the doubling ratio matches your claimed Big-O
Performance profiling — cProfile/py-spy (Python), perf/pprof (Go/C++), async-profiler (Java)
Memory profiling — tracemalloc/memory_profiler (Python), pprof heap (Go), heap dumps (JVM)

Concurrency

Race detection — -race (Go), TSan (C++/Rust), ThreadSanitizer for clang
Deterministic concurrency testing — schedule injection, controlled interleaving, deterministic random
Deadlock detection — lock-order graph analysis

Why Testing Matters in Interviews

Interviewers explicitly score “testing and verification” on the rubric. The signal they’re watching for:

What you do	What it signals
Submit and say “done”	Junior — does not verify own work
Walk through one example manually	Acceptable — minimum bar
Walk through, then deliberately try an edge case	Senior — actively looking for bugs
Find your own bug and fix it without prompt	Strong senior signal
Identify a class of bugs you might have (“integer overflow when the array is large”) and write a test for that specific risk	Staff signal — anticipating failure modes

Candidates who do not test lose offers even when their code is correct, because the interviewer cannot tell whether the correctness was deliberate or accidental.

The Universal Test Checklist

Apply this to every problem you solve, in every phase. Most of these take 10 seconds to consider; even rejecting them out loud earns the signal.

Input shape

Empty input ([], "", 0, None)
Null input (if the language allows)
Single element
Two elements
Maximum-size input (the constraint upper bound)
Minimum-size input (often the constraint lower bound)

Input content

All elements identical (duplicates)
All elements distinct
Already sorted (ascending and descending)
Negative numbers
Zero
Mixed signs
Values at integer boundaries (INT_MAX, INT_MIN, overflow risk in sums/products)
Floating-point precision (when numeric)

Domain-specific

Disconnected graph
Self-loop, multi-edge
Cycle in a graph that “should” be a tree
Empty tree / single-node tree / skewed tree
Linked list with one node, two nodes, with cycle
Strings with unicode, with whitespace, with case differences

Output ambiguity

Multiple valid answers (does the interviewer want any, all, or a canonical one?)
Stable ordering required vs not
Off-by-one in inclusive vs exclusive bounds

Failure modes

Invalid input — does your function crash, return a sentinel, or raise?
Concurrent access (for the practical-engineering labs)
Timeout case — what happens when N is at the constraint limit?

Required Tests Per Lab (Curriculum-Wide Rule)

From Phase 10 forward, every lab you complete (and every lab from Phases 0–9 you re-solve) must include:

3 normal tests — the happy path, what the problem statement examples look like
3 edge tests — chosen from the checklist above; pick the three most relevant to this problem
1 large-input test — N at the constraint upper bound; verifies you didn’t accidentally write an O(N²) loop you thought was O(N log N)
1 randomized test (when a verifier exists) — random input, run brute force and fast solution, assert equal
1 invalid-input test (when applicable) — wrong type, malformed, out of range

Document these as test functions, not “I thought about it.” The act of writing them catches bugs.

Common Mistakes

Testing only the given examples. The examples in the problem statement are almost always the happy path; they never exercise edge cases.
Mental simulation without writing it down. Your brain skips steps. Trace on paper.
Treating “the code compiles” as “the code works.” Compilation is the lowest bar.
Not verifying complexity empirically. A claimed O(N) that runs 30× slower at 2N is actually quadratic.
Adding tests after the bug. Add the test first, watch it fail, then fix; otherwise you don’t know your test would have caught it.
Ignoring “obvious” cases. Empty input bugs are the #1 cause of failed phone screens.
Not testing concurrency under load. A thread-safety bug at 1 thread is invisible; at 1000 threads on 8 cores, it’s a daily incident.

Debugging Checklist (Apply When Stuck)

Reproduce. What is the smallest input that fails?
Read the error. Stack trace, line number, value. Do not skip this.
State the expected output. If you can’t, you don’t understand the problem.
Diff expected vs actual. Is it off by one? Off by a factor? Wrong type?
Binary-search the bug. Print state at midpoint of the algorithm; halve the search space.
Check invariants. What was supposed to be true at this point? Assert it.
Question assumptions. “I’m sure this list is sorted” — prove it.
Read the code aloud. Speech catches what your eye skips.
Rubber-duck explain. Tell an inanimate object what the code does, line by line.
Step away for 60 seconds. Genuinely. The number of bugs solved this way is embarrassing.

Mastery Checklist

You have completed Phase 10 when you can:

Generate the universal test checklist for any new problem in under 90 seconds
Write a brute-force verifier for any problem with N ≤ 20
Build a randomized stress-testing harness in under 10 minutes for a new problem
Diagnose a wrong-answer bug in your own code in under 5 minutes using the debugging checklist
Diagnose a TLE (timeout) bug by measuring the doubling ratio
State the loop invariant for binary search, Kadane’s algorithm, and a simple DP
Profile a Python script and identify the top 3 hot functions in under 5 minutes
Find a race condition in a small Go/Java/C++ program using the language’s race detector
Recognize when a test is too weak (mutation testing thought experiment)

Exit Criteria

Before moving to Phase 11:

Complete all 6 labs in this directory with the full test suite written and passing
Re-solve 3 problems from Phase 2 and 3 problems from Phase 5 applying the universal test checklist; document any bugs caught
Run the stress-testing harness (Lab 5) on at least one problem you previously thought was correct, and report what you found
Profile one of your Phase 8 practical-engineering implementations (e.g., LRU cache, rate limiter) and identify at least one inefficiency

Labs

#	Lab	Focus	Anchor Problem
1	lab-01-edge-case-taxonomy.md	Systematic edge case discovery	Array median
2	lab-02-test-driven-problem-solving.md	Write tests before code	LRU cache
3	lab-03-debugging-under-pressure.md	Systematic debug protocol	Word Break (planted bug)
4	lab-04-correctness-proofs.md	Loop invariants & induction	Binary search + Kadane
5	lab-05-stress-testing-harness.md	Brute-force verifier + random fuzzing	Two-pointer variants
6	lab-06-performance-profiling.md	Empirical complexity + profiling	LIS implementations

Connection to Other Phases

Phase 2/3/4/5 — re-solve a problem from each, applying the universal test checklist
Phase 7 (Competitive) — Lab 5 (stress testing) is the canonical CP debugging tool; use it on every CF problem you fail
Phase 8 (Practical Engineering) — concurrency-aware testing is required for every lab; the rate limiter, LRU cache, and thread pool labs all need race-condition tests
Phase 11 (Mocks) — the testing rubric (dimension 8) is scored on every mock; this phase trains that score

LeetCode Interview — Extreme Coding