Phase 10 — Testing, Debugging, and Correctness

Target level: Intermediate → Senior
Expected duration: 2–3 weeks (assuming Phases 0–9 are complete)
Weekly cadence: 4–5 lab hours + apply testing discipline to every problem you solve elsewhere


Why This Phase Exists

Most candidates who lose offers do not lose them because they couldn’t find an algorithm; they lose them because their code was almost right and they never noticed. The interviewer asked “are you sure?”, they said “yes”, and then the interviewer ran one edge case and the screen went red.

Testing and debugging are where senior candidates separate themselves from juniors. A junior writes code and hopes. A senior writes code and proves it works, then runs three deliberate test cases (one normal, one degenerate, one large), and only then claims “done.”

This phase teaches the discipline. It is short because the mechanics are simple. The habit is what takes weeks to internalize, which is why every later problem in your study should explicitly run the checklist here.


Concepts to Master

Test types

  • Manual / desk-checked tests — what you trace through on paper during a 45-minute interview
  • Smoke tests — 1–2 sanity examples to prove the code runs at all
  • Unit tests — per-function correctness; use these heavily in phase-08-practical-engineering labs
  • Integration tests — multi-component behavior; relevant when you implement subsystems (cache + invalidator, scheduler + worker)
  • Property-based tests — hypothesis-style; assert invariants over random inputs (e.g., “sorted output is a permutation of the input”)
  • Brute-force verifier — known-correct slow solution to validate the fast one on small inputs
  • Stress testing — random-generation loop that runs the verifier and the fast solution and diffs them; the single best CP debugging tool
  • Fuzzing (overview) — feed structured random input; useful for parsers, serializers, anything with a grammar
  • Golden tests — record expected output for canonical inputs; mostly used in compiler/transform code
  • Mutation testing (overview) — flip operators in your code and check if any test catches the mutation; reveals weak test suites
  • Coverage analysis — branch and line coverage; necessary but not sufficient
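
The property-based idea from the list above can be sketched with nothing but the standard library. This is a minimal illustration, not the hypothesis library itself: `insertion_sort` is a hypothetical function under test, and `check_sort_properties` is an assumed helper name. It checks the two invariants a sort must satisfy on many small random inputs.

```python
import random
from collections import Counter


def insertion_sort(items):
    """Hypothetical function under test: returns a new sorted list."""
    result = []
    for x in items:
        # Walk left past every element that is greater than x.
        i = len(result)
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


def check_sort_properties(sort_fn, trials=200, seed=0):
    """Assert two invariants of sorting over many random inputs."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        n = rng.randint(0, 12)  # includes the empty list
        data = [rng.randint(-5, 5) for _ in range(n)]  # narrow range forces duplicates
        out = sort_fn(list(data))
        # Property 1: the output is a permutation of the input.
        assert Counter(out) == Counter(data), (data, out)
        # Property 2: the output is non-decreasing.
        assert all(a <= b for a, b in zip(out, out[1:])), (data, out)
    return trials
```

Calling `check_sort_properties(insertion_sort)` passes silently; a sort that drops duplicates, such as `lambda xs: sorted(set(xs))`, fails property 1 and the assertion message shows the offending input.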

Complexity & performance

  • Complexity testing — measure runtime at N, 2N, 4N; check the doubling ratio matches your claimed Big-O
  • Performance profiling — cProfile/py-spy (Python), perf/pprof (Go/C++), async-profiler (Java)
  • Memory profiling — tracemalloc/memory_profiler (Python), pprof heap (Go), heap dumps (JVM)
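
As a rough illustration of the Python profiling entry above, the following sketch runs a workload under cProfile and returns a text report of the most expensive calls. The names `slow_contains_duplicates`, `workload`, and `profile_top` are hypothetical and exist only for this example.

```python
import cProfile
import io
import pstats


def slow_contains_duplicates(values):
    """Deliberately quadratic helper (hypothetical) so it dominates the profile."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def workload():
    data = list(range(400))  # no duplicates, so the full O(N^2) scan runs
    return slow_contains_duplicates(data)


def profile_top(func, limit=3):
    """Run func under cProfile and return a report of the top `limit` entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so callers of hot code appear near the top.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

Printing `profile_top(workload)` lists the functions by cumulative time; the quadratic helper should appear in the first few rows.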

Concurrency

  • Race detection — -race (Go), ThreadSanitizer (TSan) for C/C++ via clang or gcc and for Rust on nightly
  • Deterministic concurrency testing — schedule injection, controlled interleaving, deterministic random
  • Deadlock detection — lock-order graph analysis
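
One way to picture schedule injection and controlled interleaving is to model each thread as a generator and let the test choose the order of steps. The sketch below is an assumption-laden toy, not a real scheduler: `increment`, `run_schedule`, and `explore` are hypothetical names, and the "threads" are plain generators that pause at a preemption point between the read and the write.

```python
from itertools import permutations


def increment(shared):
    """Non-atomic increment modeled as two steps with a preemption point between."""
    tmp = shared["counter"]       # step 1: read
    yield                         # the scheduler may switch tasks here
    shared["counter"] = tmp + 1   # step 2: write


def run_schedule(schedule, n_tasks=2):
    """Run tasks in the exact order given; each entry advances one task by one step."""
    shared = {"counter": 0}
    tasks = [increment(shared) for _ in range(n_tasks)]
    for idx in schedule:
        try:
            next(tasks[idx])
        except StopIteration:
            pass  # that task has finished its last step
    return shared["counter"]


def explore():
    """Try every distinct interleaving of two tasks with two steps each."""
    schedules = sorted(set(permutations([0, 0, 1, 1])))
    return {schedule: run_schedule(schedule) for schedule in schedules}
```

Because the schedule is an explicit input, a failing interleaving such as `(0, 1, 0, 1)` can be replayed deterministically, which is what makes the lost update debuggable instead of a one-in-a-thousand flake.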

Why Testing Matters in Interviews

Interviewers explicitly score “testing and verification” on the rubric. The signal they’re watching for:

| What you do | What it signals |
|---|---|
| Submit and say “done” | Junior: does not verify own work |
| Walk through one example manually | Acceptable: minimum bar |
| Walk through, then deliberately try an edge case | Senior: actively looking for bugs |
| Find your own bug and fix it without prompting | Strong senior signal |
| Identify a class of bugs you might have (“integer overflow when the array is large”) and write a test for that specific risk | Staff signal: anticipating failure modes |

Candidates who do not test can lose offers even when their code is correct, because the interviewer cannot tell whether the correctness was deliberate or accidental.


The Universal Test Checklist

Apply this to every problem you solve, in every phase. Most of these take 10 seconds to consider; even rejecting them out loud earns the signal.

Input shape

  • Empty input ([], "", 0)
  • Null input (if the language allows)
  • Single element
  • Two elements
  • Maximum-size input (the constraint upper bound)
  • Minimum-size input (often the constraint lower bound)

Input content

  • All elements identical (duplicates)
  • All elements distinct
  • Already sorted (ascending and descending)
  • Negative numbers
  • Zero
  • Mixed signs
  • Values at integer boundaries (INT_MAX, INT_MIN, overflow risk in sums/products)
  • Floating-point precision (when numeric)

Domain-specific

  • Disconnected graph
  • Self-loop, multi-edge
  • Cycle in a graph that “should” be a tree
  • Empty tree / single-node tree / skewed tree
  • Linked list with one node, two nodes, with cycle
  • Strings with unicode, with whitespace, with case differences

Output ambiguity

  • Multiple valid answers (does the interviewer want any, all, or a canonical one?)
  • Stable ordering required vs not
  • Off-by-one in inclusive vs exclusive bounds

Failure modes

  • Invalid input — does your function crash, return a sentinel, or raise?
  • Concurrent access (for the practical-engineering labs)
  • Timeout case — what happens when N is at the constraint limit?

Required Tests Per Lab (Curriculum-Wide Rule)

From Phase 10 forward, every lab you complete (and every lab from Phases 0–9 you re-solve) must include:

  • 3 normal tests — the happy path, what the problem statement examples look like
  • 3 edge tests — chosen from the checklist above; pick the three most relevant to this problem
  • 1 large-input test — N at the constraint upper bound; verifies you didn’t accidentally write an O(N²) loop you thought was O(N log N)
  • 1 randomized test (when a verifier exists) — random input, run brute force and fast solution, assert equal
  • 1 invalid-input test (when applicable) — wrong type, malformed, out of range

Document these as test functions, not “I thought about it.” The act of writing them catches bugs.
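
A minimal sketch of what the required set could look like for one problem, written as plain pytest-style functions. The `median` function is a hypothetical solution under test, and `statistics.median` stands in as the trusted reference for the randomized test; both choices are assumptions made for illustration.

```python
import random
import statistics


def median(values):
    """Hypothetical function under test: median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of empty input is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# 3 normal tests
def test_normal_odd():
    assert median([3, 1, 2]) == 2

def test_normal_even():
    assert median([4, 1, 3, 2]) == 2.5

def test_normal_with_duplicates():
    assert median([5, 1, 5, 2, 9]) == 5


# 3 edge tests, picked from the checklist
def test_edge_single_element():
    assert median([7]) == 7

def test_edge_all_identical():
    assert median([4, 4, 4, 4]) == 4

def test_edge_mixed_signs():
    assert median([-3, -1, 2]) == -1


# 1 large-input test
def test_large_input():
    n = 200_000
    assert median(list(range(n, 0, -1))) == (n + 1) / 2


# 1 randomized test against a reference implementation
def test_randomized_against_reference():
    rng = random.Random(1)
    for _ in range(300):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(1, 15))]
        assert median(data) == statistics.median(data), data


# 1 invalid-input test
def test_invalid_empty_input():
    try:
        median([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Whether empty input should raise, return a sentinel, or be excluded by the constraints is a decision to confirm with the problem statement; the test simply pins down whichever choice was made.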


Common Mistakes

  • Testing only the given examples. The examples in the problem statement are almost always the happy path; they rarely exercise edge cases.
  • Mental simulation without writing it down. Your brain skips steps. Trace on paper.
  • Treating “the code compiles” as “the code works.” Compilation is the lowest bar.
  • Not verifying complexity empirically. A claimed O(N) solution that runs about 4× slower at 2N is actually quadratic; a truly linear one runs about 2× slower.
  • Adding the regression test after the fix. Write the test first, watch it fail, then fix the bug; otherwise you don’t know your test would have caught it.
  • Ignoring “obvious” cases. Empty-input bugs are among the most common causes of failed phone screens.
  • Not testing concurrency under load. A thread-safety bug at 1 thread is invisible; at 1000 threads on 8 cores, it’s a daily incident.
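
The empirical complexity check mentioned in the list above can be sketched as a doubling experiment. To keep the example deterministic it counts basic operations instead of measuring wall-clock time; in practice `time.perf_counter` could replace the counter, at the price of noisier ratios. All function names here are hypothetical.

```python
def count_pair_comparisons(n):
    """Hypothetical quadratic workload: operations done by an all-pairs loop."""
    ops = 0
    for i in range(n):
        for j in range(i + 1, n):
            ops += 1
    return ops


def count_linear_scan(n):
    """Hypothetical linear workload: one operation per element."""
    ops = 0
    for _ in range(n):
        ops += 1
    return ops


def doubling_ratios(cost, start=200, doublings=3):
    """Return cost(2N) / cost(N) for successive doublings of N."""
    sizes = [start * 2**k for k in range(doublings + 1)]
    costs = [cost(n) for n in sizes]
    return [costs[k + 1] / costs[k] for k in range(doublings)]
```

Ratios near 2 suggest linear growth (O(N log N) sits slightly above 2), ratios near 4 suggest quadratic, and ratios near 8 suggest cubic. Here `doubling_ratios(count_linear_scan)` gives ratios of exactly 2, while `doubling_ratios(count_pair_comparisons)` gives ratios just above 4.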

Debugging Checklist (Apply When Stuck)

  1. Reproduce. What is the smallest input that fails?
  2. Read the error. Stack trace, line number, value. Do not skip this.
  3. State the expected output. If you can’t, you don’t understand the problem.
  4. Diff expected vs actual. Is it off by one? Off by a factor? Wrong type?
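  5. Binary-search the bug. Print state at the midpoint of the algorithm; if it is already wrong there, the bug is in the first half, otherwise in the second. Repeat to halve the search space each time.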
  6. Check invariants. What was supposed to be true at this point? Assert it.
  7. Question assumptions. “I’m sure this list is sorted” — prove it.
  8. Read the code aloud. Speech catches what your eye skips.
  9. Rubber-duck explain. Tell an inanimate object what the code does, line by line.
  10. Step away for 60 seconds. Genuinely. The number of bugs solved this way is embarrassing.
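
Steps 6 and 7 of the checklist above can be made concrete by turning assumptions into assertions. The sketch below is one possible way to instrument a binary search while debugging; the assertions are O(N) each, so they are meant for small failing inputs, and Python drops them entirely when run with `-O`.

```python
def binary_search(items, target):
    """Return an index of target in sorted items, or -1 if absent."""
    # Step 7: prove the "list is sorted" assumption instead of trusting it.
    assert all(a <= b for a, b in zip(items, items[1:])), "input is not sorted"

    lo, hi = 0, len(items)  # half-open search range [lo, hi)
    while lo < hi:
        # Step 6: loop invariant. If target is present, its index is in [lo, hi).
        assert all(x != target for x in items[:lo]), "target skipped on the left"
        assert all(x != target for x in items[hi:]), "target skipped on the right"

        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # everything up to and including mid is too small
        else:
            hi = mid      # mid and everything after it is too large
    return -1
```

If an off-by-one is introduced (for example `lo = mid` or `hi = mid - 1` with this half-open range), the loop either fails an invariant or stops terminating on a small input, which points at the faulty update directly.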

Mastery Checklist

You have completed Phase 10 when you can:

  • Generate the universal test checklist for any new problem in under 90 seconds
  • Write a brute-force verifier for any problem with N ≤ 20
  • Build a randomized stress-testing harness in under 10 minutes for a new problem
  • Diagnose a wrong-answer bug in your own code in under 5 minutes using the debugging checklist
  • Diagnose a TLE (timeout) bug by measuring the doubling ratio
  • State the loop invariant for binary search, Kadane’s algorithm, and a simple DP
  • Profile a Python script and identify the top 3 hot functions in under 5 minutes
  • Find a race condition in a small Go/Java/C++ program using an available race detector (for example, Go's -race flag or ThreadSanitizer for C++)
  • Recognize when a test is too weak (mutation testing thought experiment)
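
The mutation-testing thought experiment in the last item can be run by hand on a tiny example. Everything below is hypothetical: `is_adult` is an illustrative function, `is_adult_mutant` is the same function with one operator flipped, and the two suites differ only in whether they probe the boundary.

```python
def is_adult(age):
    """Hypothetical function under test."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: '>=' flipped to '>'."""
    return age > 18


def weak_suite(fn):
    """Checks only values far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Adds the two values on either side of the boundary."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False


def mutant_survives(suite, original, mutant):
    """A mutant survives when the suite passes on both the original and the mutant."""
    assert suite(original), "suite must pass on the original code first"
    return suite(mutant)
```

With these definitions the mutant survives `weak_suite` but is caught by `strong_suite`. A surviving mutant is evidence that no test pins down the behavior that was changed.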

Exit Criteria

Before moving to Phase 11:

  1. Complete all 6 labs in this directory with the full test suite written and passing
  2. Re-solve 3 problems from Phase 2 and 3 problems from Phase 5 applying the universal test checklist; document any bugs caught
  3. Run the stress-testing harness (Lab 5) on at least one problem you previously thought was correct, and report what you found
  4. Profile one of your Phase 8 practical-engineering implementations (e.g., LRU cache, rate limiter) and identify at least one inefficiency
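
For the stress-testing criterion above, a harness can be as small as the following sketch. It is not the Lab 5 harness itself, only an assumed shape for one: maximum subarray sum is used as a stand-in problem, with a brute-force verifier and Kadane's algorithm as the fast solution.

```python
import random


def max_subarray_brute(nums):
    """Slow verifier, O(N^2): try every non-empty contiguous subarray."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best


def max_subarray_fast(nums):
    """Kadane's algorithm, O(N)."""
    best = current = nums[0]
    for x in nums[1:]:
        current = max(x, current + x)  # extend the run or restart at x
        best = max(best, current)
    return best


def stress(fast, brute, trials=1000, seed=0):
    """Return the first random input where fast and brute disagree, else None."""
    rng = random.Random(seed)  # fixed seed so a failure can be replayed
    for _ in range(trials):
        n = rng.randint(1, 8)  # small inputs keep counterexamples easy to trace
        nums = [rng.randint(-10, 10) for _ in range(n)]
        if fast(nums) != brute(nums):
            return nums
    return None
```

`stress(max_subarray_fast, max_subarray_brute)` returns `None` when the two agree on every trial. A common Kadane bug, initializing the running best to 0, would be reported with an all-negative counterexample, since the buggy version returns 0 where the verifier returns the largest single element.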

Labs

| # | Lab | Focus | Anchor Problem |
|---|---|---|---|
| 1 | lab-01-edge-case-taxonomy.md | Systematic edge case discovery | Array median |
| 2 | lab-02-test-driven-problem-solving.md | Write tests before code | LRU cache |
| 3 | lab-03-debugging-under-pressure.md | Systematic debug protocol | Word Break (planted bug) |
| 4 | lab-04-correctness-proofs.md | Loop invariants & induction | Binary search + Kadane |
| 5 | lab-05-stress-testing-harness.md | Brute-force verifier + random fuzzing | Two-pointer variants |
| 6 | lab-06-performance-profiling.md | Empirical complexity + profiling | LIS implementations |

Connection to Other Phases

  • Phase 2/3/4/5 — re-solve a problem from each, applying the universal test checklist
  • Phase 7 (Competitive) — Lab 5 (stress testing) is the canonical competitive-programming debugging tool; use it on every Codeforces problem you fail
  • Phase 8 (Practical Engineering) — concurrency-aware testing is required for every lab; the rate limiter, LRU cache, and thread pool labs all need race-condition tests
  • Phase 11 (Mocks) — the testing rubric (dimension 8) is scored on every mock; this phase trains that score