Week 09 — Heap / Top-K: Streaming Order Statistics

Month 2 (Patterns) · Week 4 of 4 · Difficulty: Medium → Hard

Theme

The heap is the algorithm world’s most underrated data structure. Whenever you hear:

  • “find the K-th largest/smallest”
  • “top K frequent / closest / cheapest”
  • “running median”
  • “merge K sorted streams”
  • “schedule the next-due / highest-priority task”

…the answer is almost always a heap (often plus a hash map or a second heap).

This week drills the five canonical heap patterns:

  1. Quickselect vs heap-of-size-K for “K-th element” (p41).
  2. Bucket sort vs heap-of-size-K for “top K by count” (p42).
  3. Two-heaps (max-heap of lower half + min-heap of upper half) for running median (p43).
  4. Heap-over-K-heads for merging K sorted lists (p44).
  5. Greedy heap scheduling for “no two adjacent equal” (p45).

Why these specific five

Each forces a different heap technique:

ProblemPatternDiscriminator
p41 Kth LargestHeap of size K OR QuickselectTwo valid answers; senior names both, picks heap for production, quickselect for interview “do better than O(n log n)”
p42 Top K FrequentHeap-of-size-K OR Bucket sort O(n)Bucket sort surprise-beats heap when count ≤ n
p43 Running MedianTwo-heap danceThe invariant: |lo| - |hi| ∈ {0, 1}; rebalance after every insert
p44 Merge K ListsHeap over K head pointersThe classic. O(N log K) vs O(NK) naive
p45 Reorganize StringGreedy heap (counts, restore after using)Counter-intuitive: max-heap by count, “use top + stash”

Problem table

#ProblemLCDifficultyCore technique
p41Kth Largest Element in an Array215MediumHeap of size K · Quickselect
p42Top K Frequent Elements347MediumCounter + heap OR bucket sort
p43Find Median from Data Stream295HardTwo heaps + rebalance invariant
p44Merge K Sorted Lists23HardHeap over K heads
p45Reorganize String767MediumGreedy max-heap + delay slot

Daily Schedule (suggested 5-day pace)

DayProblemDeliverable
Monp41Heap-of-size-K + Quickselect; understand expected O(n)
Tuep42Bucket sort + heap variants; pick winner for given input shape
Wedp43Two-heap invariant; trace 6-element insert sequence by hand
Thup44Heap of nodes; understand why we push only one-per-list at a time
Frip45Greedy with delayed re-insertion; prove correctness

Readiness Gate

You may proceed to Phase 3 when ALL hold:

  • All 5 stress tests PASS (python3 solution.py per folder).
  • You can write each problem’s canonical solution in ≤ 12 min, bug-free.
  • You can articulate WHY a heap is the right structure (vs sorted array / BST / sort) for each.
  • You can explain the running-median two-heap invariant without notes.
  • You can write Quickselect from scratch and explain its expected O(n).

Key Mental Models Established This Week

  • heapq is a min-heap. For max-heap, negate values.
  • Heap of size K is O(n log K) — beats sort O(n log n) when K << n.
  • Quickselect is expected O(n), worst O(n²) — randomize pivot.
  • Two heaps: max-heap stores SMALLER half (with negation), min-heap stores LARGER half.
  • Tie-breaking in heaps: push (key, counter, item) to avoid comparing items.
  • Bucket sort beats heap when the value domain is small (e.g., counts ∈ [1, n]).
  • Lazy deletion: for heaps that need “remove arbitrary,” push tombstones, skip on pop.