p44 — Merge K Sorted Lists
Source: LeetCode 23 · Hard · Topics: Heap, Linked List, Divide & Conquer Companies (2024–2025 frequency): Google (very high), Meta (high), Amazon (very high), Microsoft (high), Apple (high), Uber (medium), LinkedIn (high) Loop position: Classic Hard heap problem; appears in nearly every senior algorithm round. The “K-way merge” pattern is the canonical lesson.
1. Quick Context
You’re given an array of k sorted linked lists. Merge them into a single sorted linked list and return the head.
Two canonical solutions, both O(N log K) where N = total nodes:
- Heap of K head pointers. Push all heads onto a min-heap; pop the smallest, append to result, push the popped node’s
next(if any). O(N log K) time, O(K) extra space. - Divide and conquer. Pairwise-merge K lists in log K rounds. Same O(N log K), no heap needed.
The naïve approaches both lose:
- Concatenate + sort: O(N log N) — fine in big-O but ignores existing sortedness; awkward with linked lists.
- Sequential merge (merge list 0 with list 1, then with list 2, …): O(N · K) — TLE for large K.
Tie-break gotcha: Python’s heapq compares tuples elementwise; if two nodes have equal value, it tries to compare the ListNode objects → TypeError. Fix: push (val, unique_index, node).
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/merge-k-sorted-lists/
STOP. 30-min timer. Define ListNode first; sketch the heap approach.
3. Prerequisite Concepts
- Linked list traversal + dummy-head pattern.
heapqwith tuple comparison + tie-break index.- Merge-two-sorted-lists subroutine (LC 21).
- Divide and conquer recurrence:
T(K) = 2T(K/2) + O(N)→ O(N log K).
4. How to Approach (FRAMEWORK Steps 1–9)
Step 1 — Restate: “K already-sorted streams; produce one sorted stream.”
Step 2 — Clarify:
- “Can lists be empty? Can the input array be empty?” (Both yes per LC.)
- “Can node values be negative / duplicates?” (Both yes.)
- “Modify in place or new list?” (LC accepts node-reuse.)
- “Stable order on ties?” (Doesn’t matter for value-sorted output.)
Step 3 — Constraints: k ≤ 10^4, total nodes ≤ 10^4, values in [-10^4, 10^4]. O(N log K) comfortable.
Step 4 — Examples: [[1,4,5], [1,3,4], [2,6]] → [1,1,2,3,4,4,5,6].
Step 5 — Brute force: Collect all node values into a list, sort, rebuild a new linked list. O(N log N) time, O(N) space. Loses information that lists are presorted but passes correctness.
Step 6 — Complexity:
- Brute (collect+sort): O(N log N).
- Sequential merge: O(N · K).
- Heap of K: O(N log K) time, O(K) space.
- Divide & conquer: O(N log K) time, O(log K) recursion space (or O(1) iterative pairwise).
Step 7 — Pattern: “K-way merge” — heap-over-streams is the canonical streaming pattern. Same shape: merge-K-sorted-arrays, merge-K-sorted-files, external sort.
Step 8 — Develop: see solution.py.
Step 9 — Test: empty input, single list, all-empty lists, lists with one node, duplicates across lists, all-equal-values (tie-break stress).
5. Progressive Hints
If stuck for 5 min, open hints.md.
6. Deeper Insight — Two equally good answers
6.1 Heap of K
heap = []
for i, head in enumerate(lists):
if head:
heapq.heappush(heap, (head.val, i, head))
dummy = tail = ListNode()
while heap:
val, i, node = heapq.heappop(heap)
tail.next = node
tail = node
if node.next:
heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
Why the index i? heapq compares tuples elementwise. On a tie at val, it falls through to the second element. If you push (val, node), Python tries to compare ListNode objects → TypeError: '<' not supported between instances of 'ListNode' and 'ListNode'. Using a unique integer index sidesteps the issue.
Alternative fixes:
- Reuse the original list index
i(works as long as you never push two values from the same list — true here, since you push.nextonly after popping). - Use
itertools.count()to generate unique indices. - Define
__lt__onListNodeto compare by value (modifies the class; works but feels invasive).
Time: O(N log K). Each node enters and leaves the heap exactly once; heap size ≤ K.
6.2 Divide and conquer
def merge(lists):
if not lists: return None
while len(lists) > 1:
merged = []
for i in range(0, len(lists), 2):
l1 = lists[i]
l2 = lists[i + 1] if i + 1 < len(lists) else None
merged.append(merge_two(l1, l2))
lists = merged
return lists[0]
Why it’s O(N log K)? Each “round” of pairwise merges touches each node once (total work = N per round). There are log K rounds (K halves each time). Total = O(N log K). Identical big-O to the heap.
Recursive variant:
def merge(lists, lo, hi):
if lo == hi: return lists[lo]
mid = (lo + hi) // 2
return merge_two(merge(lists, lo, mid), merge(lists, mid + 1, hi))
Same big-O; O(log K) recursion depth.
6.3 Why NOT sequential merge
result = None
for lst in lists:
result = merge_two(result, lst)
Round i merges a list of size ~ i · N/K with a list of size N/K, costing O(i · N/K). Total: (N/K) · (1 + 2 + ... + K) = O(N · K).
For K = 10^4, N = 10^4: 10^8 ops — TLE on most judges.
Divide-and-conquer keeps merges BALANCED, avoiding the “growing accumulator” problem.
6.4 Iterators vs node references
This problem is canonical with linked lists, but the heap pattern works identically over iterators. heapq.merge(*iterables) (stdlib) does exactly this in O(N log K) and is the right answer for “merge K sorted iterables of any kind.” Mention it as a one-liner — interviewers love that you know stdlib has it built in.
6.5 External sort connection
The heap-of-K-streams pattern IS the merge phase of external sort: K sorted runs on disk, merge them by maintaining a heap of K “current-front” elements, one buffered block per stream. Same code, different I/O model.
6.6 The space-time trade
Heap version uses O(K) auxiliary memory. Divide-and-conquer iterative uses O(K) for the temp merged array per round but pairs of merges are O(1) extra. Recursive divide-and-conquer uses O(log K) stack. For K ≤ 10^4, all are negligible. For K = 10^9 (cluster-scale), heap-of-K becomes the constraint and you need hierarchical merging.
7. Anti-Pattern Analysis
- Sequential merge (accumulator) — O(N · K). TLE.
- Push
(val, node)without index → TypeError on tie. - Build full array → sort → rebuild list — works but wasteful; loses the sortedness.
- Forget to push
.nextafter popping — heap drains, lose nodes. - Forget to handle empty input lists —
lists = []should return None, not crash. - Forget to filter
Noneheads on initial push —head.valon None → AttributeError. - Forget the dummy head — convoluted edge handling for the first node.
- Skip the tie-break index — works only if all values distinct.
- Modify nodes’
.nextduring merge then iterate later — easy to introduce cycles. - Returning a new list of values instead of relinking nodes — passes LC but doubles memory; in production, in-place relink is preferred.
8. Skills & Takeaways
- K-way merge pattern with heap-over-streams.
- Divide and conquer as a no-heap alternative with same big-O.
- Tie-break index for non-comparable payloads.
- Dummy head for linked-list construction.
heapq.mergestdlib one-liner for iterables.- External sort as the production analogue.
9. When to Move On
Move on when:
- You can write heap-of-K from scratch in ≤ 15 min including tie-break.
- You can write divide-and-conquer pairwise merge in ≤ 15 min.
- You explain WHY sequential merge is O(N · K) and why D&C is O(N log K).
- You know
heapq.mergeexists. - Stress test on K up to 50, N up to 1000 passes vs sorted-array oracle.
10. Company Context
Google:
- Universal Hard. L4/L5 must produce both solutions. L6+ asked external-sort follow-up.
- Bar Raiser scorecard: “K-way merge — produced both heap and divide-and-conquer; explained O(N log K) crisply; knew
heapq.merge.”
Meta:
- L5/L6 staple. Often the second Hard in a 4-round loop.
Amazon:
- Phrased as “merge K sorted streams of orders/events.” Connect to Kinesis-like systems.
Microsoft:
- L63+. Often asks the iterator variant first (“merge K sorted iterators”).
Apple:
- Storage / iOS Files teams ask the external-sort variant.
Uber, LinkedIn:
- Common Hard, often paired with a follow-up on stable ordering or duplicates.
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff/Principal |
|---|---|---|---|---|
| First instinct | Concat + sort | Heap-of-K | “Heap or D&C — same big-O” | + mentions heapq.merge, external sort |
| Tie-break | Hits TypeError | Adds index | Explains WHY | + alternative: define __lt__ |
| Empty input | Crashes | Handles | Mentions explicitly | + parameterizes for None tolerance throughout |
| Complexity | “It’s fast” | O(N log K) | Compares vs sequential O(N · K) | + space analysis O(K) vs O(log K) recursion |
| Production | Silent | Silent | “External sort uses this” | + sketches buffered-block I/O, sketches Kafka/Kinesis stream join |
12. Level Delta
Mid (L4):
- Heap-of-K with tie-break index.
- Handles empty lists and empty input array.
- Quotes O(N log K).
Senior (L5):
- Provides BOTH heap and divide-and-conquer.
- Explains tie-break gotcha with the TypeError.
- Compares against sequential merge’s O(N · K).
Staff (L6):
- Mentions
heapq.mergestdlib. - Connects to external sort and stream-merge in production.
- Discusses space trade-off (heap O(K) vs D&C O(log K)).
Principal (L7+):
- Discusses hierarchical merge for K = 10^9 (cluster-scale): per-shard pre-merge, then merge of merges.
- Discusses I/O-aware merging: block-buffered reads, prefetching, parallel decompression.
- Discusses bounded-memory merging when K · block_size > RAM: spill intermediate runs to disk.
- Discusses stream-merge in event systems: watermarks, late-arriving data, out-of-order tolerance.
13. Follow-up Q&A
Q1: K sorted iterators of any type (not linked lists)?
A: heapq.merge(*iterables) — stdlib, one line, lazy. Or write the heap loop with iter()/next().
Q2: K = 10^9 (cluster-scale)? A: Hierarchical merge. Each machine merges ~K/M lists; emit sorted output; coordinator merges M sorted streams (same algorithm, smaller K).
Q3: How would you do this on disk (external merge)? A: K input files, each pre-sorted. Maintain a small buffer (1 block) per file; heap over current-front elements; write merged output to disk. Total I/O is linear in N if you size buffers well. This is the merge phase of external sort.
Q4: K sorted streams, but each stream is infinite — stream the result?
A: Same heap loop, but yield instead of build a list. heapq.merge returns a lazy iterator already. Backpressure / pull semantics matter.
Q5: Stable merge (preserve original list order on ties)?
A: The (val, list_index, node) tuple already gives stability by ascending list_index. If you want stability across ties WITHIN a single list, use (val, list_index, node_index).
Q6: Merge K iterators where each emits (timestamp, payload) with possible out-of-order events?
A: Watermark-based merging: accept events within a tolerance window; buffer slightly out-of-order events; drop or side-channel late ones. Production: Flink, Beam, Spark Structured Streaming.
14. Solution Walkthrough
See solution.py:
merge_k_lists_heap— heap of K head pointers with tie-break index.merge_k_lists_divide_conquer— iterative pairwise merge.merge_two_lists— LC 21 subroutine.merge_k_lists_brute— collect + sort baseline (oracle).- Helpers:
to_list_from_array,to_array_from_listfor testability. edge_cases()+stress_test()— 200 random trials, all three agree.
Run: python3 solution.py.
15. Beyond the Problem
Principal-engineer code review comment:
“Four things. (1) The tie-break index pattern is a low-key signature of ‘developer who reads tracebacks’ — keep it; explicit comment in the code. (2)
heapq.mergeexists; for new code over iterables, use it. We had a 200-line custom merger in our pipeline that was a one-liner. (3) For our streaming-events pipeline, the linked-list framing is wrong — events come as iterators with bounded memory budget per stream. Use bounded queues, not unbounded buffering; or you OOM at 3am. (4) Divide-and-conquer is the right answer for cluster-scale; the heap version assumes you have all K streams in one process. State the model assumption in the docstring.”
Connections forward:
- p41 — Kth Largest (heap-of-size-K).
- p42 — Top K Frequent.
- p43 — Find Median from Data Stream.
- LC 21 — Merge Two Sorted Lists (subroutine).
- LC 88 — Merge Sorted Array (two-pointer variant).
- LC 378 — Kth Smallest in Sorted Matrix (heap-over-rows; same pattern).
- Phase 02 Lab 09 — Heap Top-K.
- Production: external sort merge phase, Kafka/Kinesis stream merge, log aggregation, distributed query plan merge.