Week 09 — Heap / Top-K: Streaming Order Statistics
Month 2 (Patterns) · Week 4 of 4 · Difficulty: Medium → Hard
Theme
The heap is the algorithm world’s most underrated data structure. Whenever you hear:
- “find the K-th largest/smallest”
- “top K frequent / closest / cheapest”
- “running median”
- “merge K sorted streams”
- “schedule the next-due / highest-priority task”
…the answer is almost always a heap (often plus a hash map or a second heap).
This week drills the five canonical heap patterns:
- Quickselect vs heap-of-size-K for “K-th element” (p41).
- Bucket sort vs heap-of-size-K for “top K by count” (p42).
- Two-heaps (max-heap of lower half + min-heap of upper half) for running median (p43).
- Heap-over-K-heads for merging K sorted lists (p44).
- Greedy heap scheduling for “no two adjacent equal” (p45).
Why these specific five
Each forces a different heap technique:
| Problem | Pattern | Discriminator |
|---|---|---|
| p41 Kth Largest | Heap of size K OR Quickselect | Two valid answers; senior names both, picks heap for production, quickselect for interview “do better than O(n log n)” |
| p42 Top K Frequent | Heap-of-size-K OR Bucket sort O(n) | Bucket sort surprise-beats heap when count ≤ n |
| p43 Running Median | Two-heap dance | The invariant: |lo| - |hi| ∈ {0, 1}; rebalance after every insert |
| p44 Merge K Lists | Heap over K head pointers | The classic. O(N log K) vs O(NK) naive |
| p45 Reorganize String | Greedy heap (counts, restore after using) | Counter-intuitive: max-heap by count, “use top + stash” |
Problem table
| # | Problem | LC | Difficulty | Core technique |
|---|---|---|---|---|
| p41 | Kth Largest Element in an Array | 215 | Medium | Heap of size K · Quickselect |
| p42 | Top K Frequent Elements | 347 | Medium | Counter + heap OR bucket sort |
| p43 | Find Median from Data Stream | 295 | Hard | Two heaps + rebalance invariant |
| p44 | Merge K Sorted Lists | 23 | Hard | Heap over K heads |
| p45 | Reorganize String | 767 | Medium | Greedy max-heap + delay slot |
Daily Schedule (suggested 5-day pace)
| Day | Problem | Deliverable |
|---|---|---|
| Mon | p41 | Heap-of-size-K + Quickselect; understand expected O(n) |
| Tue | p42 | Bucket sort + heap variants; pick winner for given input shape |
| Wed | p43 | Two-heap invariant; trace 6-element insert sequence by hand |
| Thu | p44 | Heap of nodes; understand why we push only one-per-list at a time |
| Fri | p45 | Greedy with delayed re-insertion; prove correctness |
Readiness Gate
You may proceed to Phase 3 when ALL hold:
-
All 5 stress tests PASS (
python3 solution.pyper folder). - You can write each problem’s canonical solution in ≤ 12 min, bug-free.
- You can articulate WHY a heap is the right structure (vs sorted array / BST / sort) for each.
- You can explain the running-median two-heap invariant without notes.
- You can write Quickselect from scratch and explain its expected O(n).
Key Mental Models Established This Week
heapqis a min-heap. For max-heap, negate values.- Heap of size K is O(n log K) — beats sort O(n log n) when K << n.
- Quickselect is expected O(n), worst O(n²) — randomize pivot.
- Two heaps: max-heap stores SMALLER half (with negation), min-heap stores LARGER half.
- Tie-breaking in heaps: push
(key, counter, item)to avoid comparing items. - Bucket sort beats heap when the value domain is small (e.g., counts ∈ [1, n]).
- Lazy deletion: for heaps that need “remove arbitrary,” push tombstones, skip on pop.