Phase 10 — Testing, Debugging, and Correctness
Target level: Intermediate → Senior Expected duration: 2–3 weeks (assuming Phases 0–9 are complete) Weekly cadence: 4–5 lab hours + apply testing discipline to every problem you solve elsewhere
Why This Phase Exists
Most candidates lose offers not because they couldn’t find an algorithm — they lose because their code was almost right and they never noticed. The interviewer asked “are you sure?”, they said “yes”, and then the interviewer ran one edge case and the screen went red.
Testing and debugging is the dimension where senior candidates separate from juniors. A junior writes code and hopes. A senior writes code and proves it works, then runs three deliberate test cases (one normal, one degenerate, one large), and only then claims “done.”
This phase teaches the discipline. It is short because the mechanics are simple. The habit is what takes weeks to internalize, which is why every later problem in your study should explicitly run the checklist here.
Concepts to Master
Test types
- Manual / desk-checked tests — what you trace through on paper during a 45-minute interview
- Smoke tests — 1–2 sanity examples to prove the code runs at all
- Unit tests — per-function correctness; use these heavily in
phase-08-practical-engineeringlabs - Integration tests — multi-component behavior; relevant when you implement subsystems (cache + invalidator, scheduler + worker)
- Property-based tests —
hypothesis-style; assert invariants over random inputs (e.g., “sorted output is a permutation of the input”) - Brute-force verifier — known-correct slow solution to validate the fast one on small inputs
- Stress testing — random-generation loop that runs the verifier and the fast solution and diffs them; the single best CP debugging tool
- Fuzzing (overview) — feed structured random input; useful for parsers, serializers, anything with a grammar
- Golden tests — record expected output for canonical inputs; mostly used in compiler/transform code
- Mutation testing (overview) — flip operators in your code and check if any test catches the mutation; reveals weak test suites
- Coverage analysis — branch and line coverage; necessary but not sufficient
Complexity & performance
- Complexity testing — measure runtime at N, 2N, 4N; check the doubling ratio matches your claimed Big-O
- Performance profiling —
cProfile/py-spy(Python),perf/pprof(Go/C++),async-profiler(Java) - Memory profiling —
tracemalloc/memory_profiler(Python),pprofheap (Go), heap dumps (JVM)
Concurrency
- Race detection —
-race(Go), TSan (C++/Rust),ThreadSanitizerfor clang - Deterministic concurrency testing — schedule injection, controlled interleaving, deterministic random
- Deadlock detection — lock-order graph analysis
Why Testing Matters in Interviews
Interviewers explicitly score “testing and verification” on the rubric. The signal they’re watching for:
| What you do | What it signals |
|---|---|
| Submit and say “done” | Junior — does not verify own work |
| Walk through one example manually | Acceptable — minimum bar |
| Walk through, then deliberately try an edge case | Senior — actively looking for bugs |
| Find your own bug and fix it without prompt | Strong senior signal |
| Identify a class of bugs you might have (“integer overflow when the array is large”) and write a test for that specific risk | Staff signal — anticipating failure modes |
Candidates who do not test lose offers even when their code is correct, because the interviewer cannot tell whether the correctness was deliberate or accidental.
The Universal Test Checklist
Apply this to every problem you solve, in every phase. Most of these take 10 seconds to consider; even rejecting them out loud earns the signal.
Input shape
- Empty input (
[],"",0,None) - Null input (if the language allows)
- Single element
- Two elements
- Maximum-size input (the constraint upper bound)
- Minimum-size input (often the constraint lower bound)
Input content
- All elements identical (duplicates)
- All elements distinct
- Already sorted (ascending and descending)
- Negative numbers
- Zero
- Mixed signs
- Values at integer boundaries (
INT_MAX,INT_MIN, overflow risk in sums/products) - Floating-point precision (when numeric)
Domain-specific
- Disconnected graph
- Self-loop, multi-edge
- Cycle in a graph that “should” be a tree
- Empty tree / single-node tree / skewed tree
- Linked list with one node, two nodes, with cycle
- Strings with unicode, with whitespace, with case differences
Output ambiguity
- Multiple valid answers (does the interviewer want any, all, or a canonical one?)
- Stable ordering required vs not
- Off-by-one in inclusive vs exclusive bounds
Failure modes
- Invalid input — does your function crash, return a sentinel, or raise?
- Concurrent access (for the practical-engineering labs)
- Timeout case — what happens when N is at the constraint limit?
Required Tests Per Lab (Curriculum-Wide Rule)
From Phase 10 forward, every lab you complete (and every lab from Phases 0–9 you re-solve) must include:
- 3 normal tests — the happy path, what the problem statement examples look like
- 3 edge tests — chosen from the checklist above; pick the three most relevant to this problem
- 1 large-input test — N at the constraint upper bound; verifies you didn’t accidentally write an O(N²) loop you thought was O(N log N)
- 1 randomized test (when a verifier exists) — random input, run brute force and fast solution, assert equal
- 1 invalid-input test (when applicable) — wrong type, malformed, out of range
Document these as test functions, not “I thought about it.” The act of writing them catches bugs.
Common Mistakes
- Testing only the given examples. The examples in the problem statement are almost always the happy path; they never exercise edge cases.
- Mental simulation without writing it down. Your brain skips steps. Trace on paper.
- Treating “the code compiles” as “the code works.” Compilation is the lowest bar.
- Not verifying complexity empirically. A claimed O(N) that runs 30× slower at 2N is actually quadratic.
- Adding tests after the bug. Add the test first, watch it fail, then fix; otherwise you don’t know your test would have caught it.
- Ignoring “obvious” cases. Empty input bugs are the #1 cause of failed phone screens.
- Not testing concurrency under load. A thread-safety bug at 1 thread is invisible; at 1000 threads on 8 cores, it’s a daily incident.
Debugging Checklist (Apply When Stuck)
- Reproduce. What is the smallest input that fails?
- Read the error. Stack trace, line number, value. Do not skip this.
- State the expected output. If you can’t, you don’t understand the problem.
- Diff expected vs actual. Is it off by one? Off by a factor? Wrong type?
- Binary-search the bug. Print state at midpoint of the algorithm; halve the search space.
- Check invariants. What was supposed to be true at this point? Assert it.
- Question assumptions. “I’m sure this list is sorted” — prove it.
- Read the code aloud. Speech catches what your eye skips.
- Rubber-duck explain. Tell an inanimate object what the code does, line by line.
- Step away for 60 seconds. Genuinely. The number of bugs solved this way is embarrassing.
Mastery Checklist
You have completed Phase 10 when you can:
- Generate the universal test checklist for any new problem in under 90 seconds
- Write a brute-force verifier for any problem with N ≤ 20
- Build a randomized stress-testing harness in under 10 minutes for a new problem
- Diagnose a wrong-answer bug in your own code in under 5 minutes using the debugging checklist
- Diagnose a TLE (timeout) bug by measuring the doubling ratio
- State the loop invariant for binary search, Kadane’s algorithm, and a simple DP
- Profile a Python script and identify the top 3 hot functions in under 5 minutes
- Find a race condition in a small Go/Java/C++ program using the language’s race detector
- Recognize when a test is too weak (mutation testing thought experiment)
Exit Criteria
Before moving to Phase 11:
- Complete all 6 labs in this directory with the full test suite written and passing
- Re-solve 3 problems from Phase 2 and 3 problems from Phase 5 applying the universal test checklist; document any bugs caught
- Run the stress-testing harness (Lab 5) on at least one problem you previously thought was correct, and report what you found
- Profile one of your Phase 8 practical-engineering implementations (e.g., LRU cache, rate limiter) and identify at least one inefficiency
Labs
| # | Lab | Focus | Anchor Problem |
|---|---|---|---|
| 1 | lab-01-edge-case-taxonomy.md | Systematic edge case discovery | Array median |
| 2 | lab-02-test-driven-problem-solving.md | Write tests before code | LRU cache |
| 3 | lab-03-debugging-under-pressure.md | Systematic debug protocol | Word Break (planted bug) |
| 4 | lab-04-correctness-proofs.md | Loop invariants & induction | Binary search + Kadane |
| 5 | lab-05-stress-testing-harness.md | Brute-force verifier + random fuzzing | Two-pointer variants |
| 6 | lab-06-performance-profiling.md | Empirical complexity + profiling | LIS implementations |
Connection to Other Phases
- Phase 2/3/4/5 — re-solve a problem from each, applying the universal test checklist
- Phase 7 (Competitive) — Lab 5 (stress testing) is the canonical CP debugging tool; use it on every CF problem you fail
- Phase 8 (Practical Engineering) — concurrency-aware testing is required for every lab; the rate limiter, LRU cache, and thread pool labs all need race-condition tests
- Phase 11 (Mocks) — the testing rubric (dimension 8) is scored on every mock; this phase trains that score