p44 — Merge K Sorted Lists

Source: LeetCode 23 · Hard · Topics: Heap, Linked List, Divide & Conquer Companies (2024–2025 frequency): Google (very high), Meta (high), Amazon (very high), Microsoft (high), Apple (high), Uber (medium), LinkedIn (high) Loop position: Classic Hard heap problem; appears in nearly every senior algorithm round. The “K-way merge” pattern is the canonical lesson.

1. Quick Context

You’re given an array of k sorted linked lists. Merge them into a single sorted linked list and return the head.

Two canonical solutions, both O(N log K) where N = total nodes:

  1. Heap of K head pointers. Push all heads onto a min-heap; pop the smallest, append to result, push the popped node’s next (if any). O(N log K) time, O(K) extra space.
  2. Divide and conquer. Pairwise-merge K lists in log K rounds. Same O(N log K), no heap needed.

The naïve approaches both lose:

  • Concatenate + sort: O(N log N) — fine in big-O but ignores existing sortedness; awkward with linked lists.
  • Sequential merge (merge list 0 with list 1, then with list 2, …): O(N · K) — TLE for large K.

Tie-break gotcha: Python’s heapq compares tuples elementwise; if two nodes have equal value, it tries to compare the ListNode objects → TypeError. Fix: push (val, unique_index, node).


🔗 https://leetcode.com/problems/merge-k-sorted-lists/

STOP. 30-min timer. Define ListNode first; sketch the heap approach.


3. Prerequisite Concepts

  • Linked list traversal + dummy-head pattern.
  • heapq with tuple comparison + tie-break index.
  • Merge-two-sorted-lists subroutine (LC 21).
  • Divide and conquer recurrence: T(K) = 2T(K/2) + O(N) → O(N log K).

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “K already-sorted streams; produce one sorted stream.”

Step 2 — Clarify:

  • “Can lists be empty? Can the input array be empty?” (Both yes per LC.)
  • “Can node values be negative / duplicates?” (Both yes.)
  • “Modify in place or new list?” (LC accepts node-reuse.)
  • “Stable order on ties?” (Doesn’t matter for value-sorted output.)

Step 3 — Constraints: k ≤ 10^4, total nodes ≤ 10^4, values in [-10^4, 10^4]. O(N log K) comfortable.

Step 4 — Examples: [[1,4,5], [1,3,4], [2,6]] → [1,1,2,3,4,4,5,6].

Step 5 — Brute force: Collect all node values into a list, sort, rebuild a new linked list. O(N log N) time, O(N) space. Loses information that lists are presorted but passes correctness.

Step 6 — Complexity:

  • Brute (collect+sort): O(N log N).
  • Sequential merge: O(N · K).
  • Heap of K: O(N log K) time, O(K) space.
  • Divide & conquer: O(N log K) time, O(log K) recursion space (or O(1) iterative pairwise).

Step 7 — Pattern: “K-way merge” — heap-over-streams is the canonical streaming pattern. Same shape: merge-K-sorted-arrays, merge-K-sorted-files, external sort.

Step 8 — Develop: see solution.py.

Step 9 — Test: empty input, single list, all-empty lists, lists with one node, duplicates across lists, all-equal-values (tie-break stress).


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — Two equally good answers

6.1 Heap of K

heap = []
for i, head in enumerate(lists):
    if head:
        heapq.heappush(heap, (head.val, i, head))

dummy = tail = ListNode()
while heap:
    val, i, node = heapq.heappop(heap)
    tail.next = node
    tail = node
    if node.next:
        heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next

Why the index i? heapq compares tuples elementwise. On a tie at val, it falls through to the second element. If you push (val, node), Python tries to compare ListNode objects → TypeError: '<' not supported between instances of 'ListNode' and 'ListNode'. Using a unique integer index sidesteps the issue.

Alternative fixes:

  • Reuse the original list index i (works as long as you never push two values from the same list — true here, since you push .next only after popping).
  • Use itertools.count() to generate unique indices.
  • Define __lt__ on ListNode to compare by value (modifies the class; works but feels invasive).

Time: O(N log K). Each node enters and leaves the heap exactly once; heap size ≤ K.

6.2 Divide and conquer

def merge(lists):
    if not lists: return None
    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):
            l1 = lists[i]
            l2 = lists[i + 1] if i + 1 < len(lists) else None
            merged.append(merge_two(l1, l2))
        lists = merged
    return lists[0]

Why it’s O(N log K)? Each “round” of pairwise merges touches each node once (total work = N per round). There are log K rounds (K halves each time). Total = O(N log K). Identical big-O to the heap.

Recursive variant:

def merge(lists, lo, hi):
    if lo == hi: return lists[lo]
    mid = (lo + hi) // 2
    return merge_two(merge(lists, lo, mid), merge(lists, mid + 1, hi))

Same big-O; O(log K) recursion depth.

6.3 Why NOT sequential merge

result = None
for lst in lists:
    result = merge_two(result, lst)

Round i merges a list of size ~ i · N/K with a list of size N/K, costing O(i · N/K). Total: (N/K) · (1 + 2 + ... + K) = O(N · K).

For K = 10^4, N = 10^4: 10^8 ops — TLE on most judges.

Divide-and-conquer keeps merges BALANCED, avoiding the “growing accumulator” problem.

6.4 Iterators vs node references

This problem is canonical with linked lists, but the heap pattern works identically over iterators. heapq.merge(*iterables) (stdlib) does exactly this in O(N log K) and is the right answer for “merge K sorted iterables of any kind.” Mention it as a one-liner — interviewers love that you know stdlib has it built in.

6.5 External sort connection

The heap-of-K-streams pattern IS the merge phase of external sort: K sorted runs on disk, merge them by maintaining a heap of K “current-front” elements, one buffered block per stream. Same code, different I/O model.

6.6 The space-time trade

Heap version uses O(K) auxiliary memory. Divide-and-conquer iterative uses O(K) for the temp merged array per round but pairs of merges are O(1) extra. Recursive divide-and-conquer uses O(log K) stack. For K ≤ 10^4, all are negligible. For K = 10^9 (cluster-scale), heap-of-K becomes the constraint and you need hierarchical merging.


7. Anti-Pattern Analysis

  1. Sequential merge (accumulator) — O(N · K). TLE.
  2. Push (val, node) without index → TypeError on tie.
  3. Build full array → sort → rebuild list — works but wasteful; loses the sortedness.
  4. Forget to push .next after popping — heap drains, lose nodes.
  5. Forget to handle empty input listslists = [] should return None, not crash.
  6. Forget to filter None heads on initial push — head.val on None → AttributeError.
  7. Forget the dummy head — convoluted edge handling for the first node.
  8. Skip the tie-break index — works only if all values distinct.
  9. Modify nodes’ .next during merge then iterate later — easy to introduce cycles.
  10. Returning a new list of values instead of relinking nodes — passes LC but doubles memory; in production, in-place relink is preferred.

8. Skills & Takeaways

  • K-way merge pattern with heap-over-streams.
  • Divide and conquer as a no-heap alternative with same big-O.
  • Tie-break index for non-comparable payloads.
  • Dummy head for linked-list construction.
  • heapq.merge stdlib one-liner for iterables.
  • External sort as the production analogue.

9. When to Move On

Move on when:

  • You can write heap-of-K from scratch in ≤ 15 min including tie-break.
  • You can write divide-and-conquer pairwise merge in ≤ 15 min.
  • You explain WHY sequential merge is O(N · K) and why D&C is O(N log K).
  • You know heapq.merge exists.
  • Stress test on K up to 50, N up to 1000 passes vs sorted-array oracle.

10. Company Context

Google:

  • Universal Hard. L4/L5 must produce both solutions. L6+ asked external-sort follow-up.
  • Bar Raiser scorecard: “K-way merge — produced both heap and divide-and-conquer; explained O(N log K) crisply; knew heapq.merge.”

Meta:

  • L5/L6 staple. Often the second Hard in a 4-round loop.

Amazon:

  • Phrased as “merge K sorted streams of orders/events.” Connect to Kinesis-like systems.

Microsoft:

  • L63+. Often asks the iterator variant first (“merge K sorted iterators”).

Apple:

  • Storage / iOS Files teams ask the external-sort variant.

Uber, LinkedIn:

  • Common Hard, often paired with a follow-up on stable ordering or duplicates.

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
First instinctConcat + sortHeap-of-K“Heap or D&C — same big-O”+ mentions heapq.merge, external sort
Tie-breakHits TypeErrorAdds indexExplains WHY+ alternative: define __lt__
Empty inputCrashesHandlesMentions explicitly+ parameterizes for None tolerance throughout
Complexity“It’s fast”O(N log K)Compares vs sequential O(N · K)+ space analysis O(K) vs O(log K) recursion
ProductionSilentSilent“External sort uses this”+ sketches buffered-block I/O, sketches Kafka/Kinesis stream join

12. Level Delta

Mid (L4):

  • Heap-of-K with tie-break index.
  • Handles empty lists and empty input array.
  • Quotes O(N log K).

Senior (L5):

  • Provides BOTH heap and divide-and-conquer.
  • Explains tie-break gotcha with the TypeError.
  • Compares against sequential merge’s O(N · K).

Staff (L6):

  • Mentions heapq.merge stdlib.
  • Connects to external sort and stream-merge in production.
  • Discusses space trade-off (heap O(K) vs D&C O(log K)).

Principal (L7+):

  • Discusses hierarchical merge for K = 10^9 (cluster-scale): per-shard pre-merge, then merge of merges.
  • Discusses I/O-aware merging: block-buffered reads, prefetching, parallel decompression.
  • Discusses bounded-memory merging when K · block_size > RAM: spill intermediate runs to disk.
  • Discusses stream-merge in event systems: watermarks, late-arriving data, out-of-order tolerance.

13. Follow-up Q&A

Q1: K sorted iterators of any type (not linked lists)? A: heapq.merge(*iterables) — stdlib, one line, lazy. Or write the heap loop with iter()/next().

Q2: K = 10^9 (cluster-scale)? A: Hierarchical merge. Each machine merges ~K/M lists; emit sorted output; coordinator merges M sorted streams (same algorithm, smaller K).

Q3: How would you do this on disk (external merge)? A: K input files, each pre-sorted. Maintain a small buffer (1 block) per file; heap over current-front elements; write merged output to disk. Total I/O is linear in N if you size buffers well. This is the merge phase of external sort.

Q4: K sorted streams, but each stream is infinite — stream the result? A: Same heap loop, but yield instead of build a list. heapq.merge returns a lazy iterator already. Backpressure / pull semantics matter.

Q5: Stable merge (preserve original list order on ties)? A: The (val, list_index, node) tuple already gives stability by ascending list_index. If you want stability across ties WITHIN a single list, use (val, list_index, node_index).

Q6: Merge K iterators where each emits (timestamp, payload) with possible out-of-order events? A: Watermark-based merging: accept events within a tolerance window; buffer slightly out-of-order events; drop or side-channel late ones. Production: Flink, Beam, Spark Structured Streaming.


14. Solution Walkthrough

See solution.py:

  • merge_k_lists_heap — heap of K head pointers with tie-break index.
  • merge_k_lists_divide_conquer — iterative pairwise merge.
  • merge_two_lists — LC 21 subroutine.
  • merge_k_lists_brute — collect + sort baseline (oracle).
  • Helpers: to_list_from_array, to_array_from_list for testability.
  • edge_cases() + stress_test() — 200 random trials, all three agree.

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“Four things. (1) The tie-break index pattern is a low-key signature of ‘developer who reads tracebacks’ — keep it; explicit comment in the code. (2) heapq.merge exists; for new code over iterables, use it. We had a 200-line custom merger in our pipeline that was a one-liner. (3) For our streaming-events pipeline, the linked-list framing is wrong — events come as iterators with bounded memory budget per stream. Use bounded queues, not unbounded buffering; or you OOM at 3am. (4) Divide-and-conquer is the right answer for cluster-scale; the heap version assumes you have all K streams in one process. State the model assumption in the docstring.”

Connections forward:

  • p41 — Kth Largest (heap-of-size-K).
  • p42 — Top K Frequent.
  • p43 — Find Median from Data Stream.
  • LC 21 — Merge Two Sorted Lists (subroutine).
  • LC 88 — Merge Sorted Array (two-pointer variant).
  • LC 378 — Kth Smallest in Sorted Matrix (heap-over-rows; same pattern).
  • Phase 02 Lab 09 — Heap Top-K.
  • Production: external sort merge phase, Kafka/Kinesis stream merge, log aggregation, distributed query plan merge.