p01 — Two Sum

Source: LeetCode 1 · Easy · Topics: Array, Hash Table Companies (2024–2025 frequency): Amazon (very high), Google (high), Meta (high), Apple (medium), Microsoft (medium), Bloomberg (very high) Loop position: phone screen warmup, or first 10 min of onsite to calibrate

1. Quick Context

This is the most-asked Easy in Big Tech history. The interviewer is not testing whether you can solve it — they expect you to solve it in <8 minutes. They are testing whether you:

Clarify before coding (duplicates? multiple answers? sorted?)
State the brute force out loud before optimizing
Pick the right optimal (one-pass hash, not two-pass)
Handle the “what about the same element twice” edge case
Communicate cleanly through a problem you’ve obviously seen before

What it looks like it tests: array iteration. What it actually tests: disciplined communication under “I’ve done this a million times” complacency. Senior candidates fail this by going too fast and skipping clarifications. The interviewer is watching for the framework, not the answer.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/two-sum/

STOP. Set a 15-minute timer. Code it cold in a scratch file. Do not read past this section until you have either solved it or the timer expired.

If you’ve solved Two Sum before: do it again anyway, and time yourself. Target: 6 min including narration. If you’re over 8 min, you have a framework/communication gap, not an algorithm gap — and that gap will kill you on harder problems.

3. Prerequisite Concepts

Hash table average O(1) lookup + the assumption that makes it true (phase-01 §3 HashMap Mastery)
“Complement search” pattern: instead of looking for pairs, transform to lookup of (target − x)
In your primary language: what hash collision behavior is, what hash resize cost is — see phase-09

4. How to Approach (FRAMEWORK Steps 1–9 applied)

Step 1 — Restate: “Given an integer array and a target integer, return the indices of two distinct elements that sum to target. Exactly one valid pair exists per problem statement.”

Step 2 — Clarify (ask out loud, do NOT skip even though it’s Easy):

“Can the same element be used twice?” (No — indices must be distinct.)
“If multiple pairs sum to target, which one do I return?” (Problem says exactly one solution — but in real interviews they may relax this; ask.)
“Can the array be empty / size 1?” (Per constraints, N ≥ 2.)
“Are values bounded? Can they be negative?” (Per LC, yes, both negative and positive in int32 range.)
“Is the array sorted?” (No. If it were, you’d use two pointers — explicitly call this out; this is a senior signal.)
“Return indices or values?” (Indices, in any order.)

Step 3 — Constraints: N up to 10^4 in classic statement. O(N²) brute fits, but O(N) is expected. With N up to 10^4, O(N²) = 10^8 — borderline, will get TLE on some servers.

Step 4 — Examples (build your own):

[2,7,11,15], target=9 → [0,1] (the given one)
[3,3], target=6 → [0,1] ← critical: the “same value, different indices” case
[-3,4,3,90], target=0 → [0,2] (negative handling)
[1,2,3,4], target=100 → does not occur per statement, but ask: “if no solution, what do I return?”

Step 5 — Brute Force: Nested loop. For each i, check every j > i. O(N²) time, O(1) space.

Step 6 — Brute Force Complexity: Time O(N²), space O(1). Trivially correct. State this out loud BEFORE optimizing.

Step 7 — Pattern Recognition: “Given an array, find a pair satisfying a property” + “complement is computable” + “order of result doesn’t matter” → HashMap complement search. (Sorted + ordered → two pointers. Unsorted + complement-computable → HashMap.)

Step 8 — Optimize: Walk the array once. For each x at index i, compute complement = target − x. If complement is in the map, return (map[complement], i). Else add x → i to the map. One pass, not two. Two-pass works but signals weaker intuition.

Step 9 — Prove correctness: Loop invariant: after processing index i, the map contains exactly {nums[0..i] : their indices} for every distinct value (or the latest index for duplicates, but since exactly one solution exists, duplicates can only be the answer pair — and at index j, the complement was stored at index i < j, so we find it).

5. Progressive Hints

If you’re stuck for more than 5 minutes, open hints.md and read one hint only. Set another 5-min timer between hints.

6. Deeper Insight — Why It Works

The transformation: “Find two numbers summing to T” is a 2D search (i, j pairs) → O(N²). By computing T − nums[i] for each i, we reduce to a 1D lookup (“does this value exist?”) which is O(1) amortized with a hash table. The hash table is the data structure that turns 2D pair-search into 1D existence-search. This is the master pattern for entire problem families: 3Sum, 4Sum, Subarray-Sum-K, etc., all reduce 1 dimension via complement-hashing.

The single-pass insight: You don’t need to load the full array into the map first. By the time you encounter the second element of the answer pair, the first one has already been stored. So insert AFTER you check — never before, or [3,3] breaks (you’d find 3 → 0 and return [0,0], two equal indices).

Order matters: check first, insert second. Reversing this is the single most common bug. The map state at step i represents “everything seen STRICTLY BEFORE i” — that’s the invariant.

7. Anti-Pattern Analysis

Wrong-but-tempting #1 — Two-pass with enumerate:

d = {x: i for i, x in enumerate(nums)}
for i, x in enumerate(nums):
    j = d.get(target - x)
    if j is not None and j != i:
        return [i, j]

This works but: (a) O(N) extra memory written then read again, (b) the j != i guard signals you didn’t think about why one-pass avoids the issue, (c) for duplicates [3,3], the dict only stores the last index — works only because j != i saves you. Two-pass is a code smell that says “I memorized the pattern but didn’t internalize the invariant.”

Wrong-but-tempting #2 — Sort then two-pointer:

sorted_nums = sorted(enumerate(nums), key=lambda p: p[1])
# ... two-pointer scan

Works, but O(N log N) when O(N) exists. Some candidates do this because two-pointer is their hammer. At Google in particular, choosing N log N when N exists is a signaled-down on the algorithmic complexity rubric.

Wrong-but-tempting #3 — Brute force “to be safe”: The brute force fits N ≤ 10⁴ barely. If the interviewer expands the constraint to 10⁵, brute force times out. They WILL expand the constraint to test you — be ready.

8. Skills & Takeaways

Generalizable pattern: Complement search via HashMap. Any time you see “find pair/triple/group summing or differing or relating to value V”, first ask: can I express the missing piece as a function of what I have? If yes, HashMap that function’s range.

Analogous problems (do these on LC after):

LC 167 — Two Sum II (sorted input — uses two pointers instead; teaches when NOT to use hashmap)
LC 653 — Two Sum IV BST (BFS + set; same complement pattern, different traversal)
LC 1 vs LC 15 (3Sum) — the recursive extension; outer loop fixes one element, reduces to 2Sum
LC 560 — Subarray Sum Equals K (the prefix-sum complement variant — same idea, applied to running sums)
LC 454 — 4Sum II (split into two halves, hash one half, lookup complements of the other — meet-in-the-middle flavor)

When NOT to use this: Sorted input (two pointers is O(1) space, hashmap is O(N)). Streaming input where you can’t afford O(N) memory.

9. When to Move On (binary; must all be YES)

I solved p01 unaided in <8 min including narration on the second attempt
I can state the loop invariant of one-pass Two Sum without looking
I can explain why “check first, insert second” matters, with the [3,3] example
I can name when two-pointer is preferred over hashmap (sorted input, O(1) space requirement)
I implemented this from scratch and my version passes stress_test() in solution.py
I solved LC 167 (Two Sum II) and articulated why the optimal approach changed
I solved LC 560 (Subarray Sum K) and recognized the same complement-hashing pattern

If any unchecked: redo tomorrow before moving to p02.

10. Company Context

Amazon (LP-heavy; coding bar = “are bugs going to ship?”)

The framing: Often given as “given a list of Order objects with a price field, find two orders whose prices sum to the target promo discount.”
Misleading example: They’ll give you [5, 5, 5, 5], target=10. Many candidates lock onto “return the first two indices that work” — but Amazon interviewers want you to ASK whether you should return any valid pair or the first in some defined order. Asking shows Customer Obsession (knowing the requirement).
Adversarial extension: “Now there could be millions of orders streamed one at a time. How does your solution change?” → streaming with a HashSet, return the first matching pair.
What they want to hear: “Let me clarify the requirements before I optimize.” Verbatim phrases like “Let me first state the brute force so we have a baseline” earn rubric points.

Google (algorithmic complexity is a hard rubric line)

The framing: Often the cleanest, just the original problem.
Misleading example: A small-N example where O(N²) clearly fits. Google interviewers do this to see if junior candidates will say “brute force is fine” and stop. The right move: “Brute force fits this size, but I want to do better as a habit — and to handle the extension where N is 10⁶.”
Adversarial extension: “Return all pairs that sum to target.” Now it’s not 1 answer; you must dedupe pairs (3Sum-style logic).
What they want to hear: Explicit complexity for brute and optimal, stated separately. “O(N²) brute → O(N) one-pass hashmap, O(N) extra space.”

Meta (heavy on follow-ups; expect 3–4 in 25 min)

The framing: “Two Sum, then variants in rapid succession.”
Misleading example: They start with the canonical example, then immediately follow up: “Now what if the array is sorted?” If you stayed on hashmap, you scored less than the candidate who pivots to two-pointers.
Adversarial extension: “What if duplicates are allowed in input AND we want all unique pairs?” → 2Sum-with-duplicates → introduces dedup via sort or a counted-set.
What they want to hear: Recognition that the optimal algorithm depends on input properties. The phrase “if the input were sorted, I’d use two pointers” wins them over.

Microsoft (clarity + cleanliness over speed)

The framing: Phone screen warmup.
Misleading example: None — Microsoft tends to be straightforward here.
Adversarial extension: “Now make it work for any K-Sum” (K is a parameter).
What they want to hear: Clean function decomposition, clear variable names, edge-case enumeration. Microsoft phone screens reward boring, correct code.

Bloomberg (financial framing — be ready)

The framing: “Given a list of trade prices and a settlement amount, find two trades that exactly settle to the amount.”
Misleading example: Includes negative prices (refunds). Candidates who hard-code i < j ordering may break on negatives if they switch to a sort approach mid-stream.
What they want to hear: Explicit “I’m assuming negative values are allowed; my hash approach handles them naturally.”

11. Interviewer’s Lens

Phase	Strong signal	Weak signal	Scorecard phrase (strong)
Reading problem	Asks 3+ clarifying questions even though it’s Easy	Says “oh I’ve seen this” and dives in	“Disciplined clarifying behavior — would translate to fewer production bugs”
Pre-coding	States brute force, then states optimal, with complexities for both	Jumps to “I’ll use a hashmap” without justifying	“Communicates derivation, not just memorization”
Coding	Names variable `complement`, comments invariant once	Uses `d`, `m`, no comments	“Code is interview-readable; would pass our internal code review bar”
Edge cases	Tests `[3,3]` before submission	Tests only the given example	“Self-catches bugs before code review — strong production instinct”
Post-coding	Articulates time AND space complexity unprompted	Says only “it’s linear”	“Owns full complexity analysis”

The scorecard line that gets you the offer: “Candidate demonstrated framework discipline on a trivial problem, suggesting it will scale to hard ones.”

The scorecard line that loses you: “Skipped clarifying questions; rushed to known answer; did not test [3,3]; missed senior signal opportunity.”

12. Level Delta

Level	What their answer looks like
Mid	One-pass hashmap solution. States O(N) time, O(N) space. Tests given example. ~10 min.
Senior	All of Mid + clarifies 3+ questions upfront + explicitly states brute force first + tests `[3,3]` + mentions “if sorted, I’d use two pointers” + completes in ~7 min.
Staff	All of Senior + articulates loop invariant before coding + names the complement-search pattern by family + connects to LC 15 (3Sum) and LC 560 (Subarray Sum K) as the same family + mentions hash collision worst-case (O(N²)) as a footnote + completes in ~6 min.
Principal	All of Staff + asks “what’s the production context — are these orders, transactions, ad bids?” + identifies that for very large N you’d shard the hashmap or use a Bloom-filter prefilter + mentions GC pressure from large dict allocation as an Amazon/Google production concern + offers the streaming variant unprompted + completes in ~5 min with time to discuss tradeoffs.

Honest self-assessment: Which level was YOUR answer? If “Mid”, you have 4 sections to add to your toolkit. If “Senior” — good baseline; aim for Staff on the next 5 problems.

13. Follow-up Questions & Full Answers

Follow-up 1: “What if the array is sorted?”

Signal sought: Do you recognize that input properties change the optimal algorithm?

Full answer: “If sorted, I’d switch to two pointers — left=0, right=N-1. If sum > target, decrement right; if sum < target, increment left; if equal, return. O(N) time, O(1) space (better than hashmap’s O(N) space). The correctness comes from monotonicity: incrementing left only increases the sum; decrementing right only decreases. We never miss the answer because at each step, the eliminated half cannot contain the answer.”

Follow-up 2: “What if there are multiple valid pairs and we want all unique ones?”

Signal sought: Can you handle dedup without explosive complexity?

Full answer: “Two approaches. (a) Sort + two pointers + skip duplicates on both sides — O(N log N) time, O(1) extra. (b) Hashmap + use a set of frozensets to dedup result pairs — O(N) time, O(N) extra. Approach (a) is preferred unless we cannot mutate input. The key insight: dedup happens by skipping equal adjacent values after sorting, not by post-filtering.”

Follow-up 3: “Now there are billions of integers streamed one at a time, infinite stream. Detect any sum-pair as fast as possible.”

Signal sought: Streaming / unbounded-input thinking.

Full answer: “Use a HashSet (not Map — we just need existence). For each incoming x: check (target − x) in set; if found, emit the pair; else add x to set. O(1) per element, O(N) total memory grows unboundedly. For unbounded memory: use a Bloom filter as a prefilter (false positives OK; we verify by querying upstream), bounded memory at cost of occasional false alarms. If we need bounded memory AND zero false positives, we accept that we may miss pairs — fundamentally we cannot remember an arbitrary stream. State this tradeoff explicitly.”

Follow-up 4 (Hard): “Distribute across 100 machines. Each holds 1% of the array. Find a pair summing to target.”

Signal sought: Distributed systems thinking, communication cost awareness.

Full answer: “Broadcast the target. Each machine builds a local set of its values. To find cross-machine pairs: each machine emits (target − x, x, machine_id) for each local x, hash-partitioned by (target − x) mod 100 to the responsible machine. That receiving machine checks if it holds the complement. Communication is O(N) total messages, O(N/100) per machine. Latency: 1 shuffle round. Correctness: every valid pair (x, y) where x + y = target is checked because x is hashed by y = target − x to y’s home machine. Caveat: if the array is so large that even local sets don’t fit in RAM, we partition further or stream from disk with external-sort-style processing.”

Follow-up 5 (Senior signal): “How would you test this code?”

Signal sought: Testing discipline, not just unit tests.

Full answer: “Four layers. (1) Unit: given example, [3,3], negatives, two valid pairs choosing one. (2) Edge: minimum N=2, maximum constraint. (3) Property test: random N, random ints, brute-force comparator — assert both find the same pair (modulo order). My solution.py does this via stress_test(). (4) Adversarial fuzz: hash-collision DoS inputs — known pathological inputs that cause O(N²) hashmap behavior. Production code would also include perf regression tests and memory profiling for large N.”

14. Full Solution Walkthrough

See solution.py.

The file has four sections:

two_sum_brute(nums, target) — nested loop, O(N²). This is your correctness oracle.
two_sum(nums, target) — the one-pass hashmap. Note the order: check first, insert after. The comment on line marking the invariant.
stress_test() — generates 1000 random arrays, runs both, asserts results sum to target equally. This is the bar: every flagship problem has a stress test.
__main__ — runs the given example, [3,3], negative case, then the stress test.

Decisions justified in the file:

Why seen.get(complement) instead of if complement in seen: return [seen[complement], i]: one hash lookup instead of two.
Why we return as a list [a, b] not a tuple: matches LC signature exactly.
Why no input validation: per the framework, we validate at system boundaries — interview code assumes valid input per the problem statement.

15. Beyond the Problem — Production Reality

At 10M RPS:

The hashmap allocation per request becomes a GC pressure point. In production, you’d pool the map or use a primitive-keyed map (e.g., Long2IntOpenHashMap in Java’s Eclipse Collections, or a dict allocated once and .clear()’d in Python).
For very large N per request, the O(N) memory dominates. Spilling to off-heap or using a compact open-addressing hashmap would matter.

Real system this is the kernel of:

Ad bid matching: given a target CPM, find two bids that sum to the publisher’s floor. Same algorithm, with bid-objects carrying metadata. Real ad exchanges (Google Ad Exchange, Meta Audience Network) do variants of this billions of times per second.
Promo discount calculator: e-commerce platforms match “if customer buys X and Y, the bundle costs the promo target.”
Settlement matching at exchanges: pair buy and sell orders that exactly clear.

Principal-engineer code review comment: “Why is this a one-off function? In our codebase, complement-search-against-hashmap is a building block. Extract find_pair_by_property(items, key_fn, target) so we can reuse for 3Sum, k-Sum, prefix-sum, and the promo-bundle code path. Also: thread safety? If this map is shared across requests, we have a race.”

LeetCode Interview — Extreme Coding