p07 — Merge Two Sorted Lists

Source: LeetCode 21 · Easy · Topics: Linked List, Recursion Companies (2024–2025 frequency): Amazon (very high), Microsoft (very high), Apple (high), Google (high), Meta (medium), Bloomberg (high) Loop position: phone screen warmup; the kernel of merge-sort (LC 148), K-way merge (LC 23), and external sort

1. Quick Context

The “merge” of merge sort, on linked lists. The interviewer is testing whether you reach for the dummy/sentinel node pattern without prompting, because dummy nodes are how senior engineers eliminate special cases. The candidate who writes if result is None: result = ...; tail = ... four times has not internalized the pattern. The one who writes dummy = ListNode(); tail = dummy and never special-cases anything has.

What it looks like it tests: merging two sorted lists. What it actually tests: do you use the dummy-node pattern reflexively? Can you stitch the leftover tail in O(1) instead of walking it node-by-node?

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/merge-two-sorted-lists/

STOP. Set a 12-minute timer. Code it cold. Do not read past this section until solved or expired.

If repeat: target 5 min including narration. Iterative + dummy node. If you wrote any if head is None branches inside the loop, you’ve missed the lesson — redo with a sentinel.

3. Prerequisite Concepts

Linked list node basics (phase-01 §4)
Sentinel / dummy node pattern: allocate one throwaway node at the start so the “first node” of the output isn’t a special case. This pattern reappears in every list/tree problem where insertion at head differs from insertion anywhere else.
The “leftover tail” optimization: when one list is exhausted, the other’s remaining suffix is already sorted — just point at it; don’t iterate.

4. How to Approach (FRAMEWORK Steps 1–9 applied)

Step 1 — Restate: “Given two sorted (non-decreasing) singly linked lists, return a single sorted list formed by splicing their nodes.”

Step 2 — Clarify:

“Splice (reuse input nodes) or allocate new nodes?” (Splice is expected — O(1) extra space. Ask anyway.)
“Either list empty? Both empty?” (Yes, both possible.)
“Stable for equal values? Does order within a value matter?” (Usually no; ask if data has meaning beyond val.)
“Sorted ascending or descending?” (Ascending per LC.)

Step 3 — Constraints: N1 + N2 up to 100 per LC, but real production sizes are unbounded. O(N1+N2) is the bar.

Step 4 — Examples:

[1,2,4] + [1,3,4] → [1,1,2,3,4,4]
[] + [] → []
[] + [0] → [0]
[5] + [1] → [1,5] (a < b throughout — leftover tail kicks in immediately)
[1,1,1] + [1,1,1] → [1,1,1,1,1,1] (all equal — tie-breaking)

Step 5 — Brute Force: Walk both lists, push all values into a Python list, sort, rebuild a linked list. O((N1+N2) log (N1+N2)) time, O(N1+N2) space. Throws away the sortedness — explicitly inferior. State this aloud to anchor that you know it’s wrong.

Step 6 — Brute Force Complexity: O(N log N) time, O(N) space. Misses the linear opportunity.

Step 7 — Pattern Recognition: “Two sorted sequences → one merged sorted sequence” → two-pointer merge (the inner loop of merge sort). On linked lists this also requires dummy node for clean output construction.

Step 8 — Optimize: Dummy node + tail pointer. While both l1 and l2 are non-None, append the smaller (l1 if l1.val <= l2.val else l2) to tail.next, advance that list and tail. After the loop, attach whichever is non-None (the “leftover tail” — point, don’t walk). Return dummy.next.

Step 9 — Prove correctness: Loop invariant: after each iteration, dummy.next ... tail is a sorted list containing exactly the consumed nodes from l1 and l2, and both l1 and l2 still point to sorted remainders whose first elements are ≥ tail.val. When one becomes None, the other’s remainder is sorted and ≥ tail — attaching it preserves sortedness.

5. Progressive Hints

If stuck for 5 minutes, open hints.md. One hint, 5-min timer, retry.

6. Deeper Insight — Why It Works

Why dummy node: without it, the first append is a special case (if result is None: result = node; tail = node). With it, the first append is identical to every other (tail.next = node; tail = tail.next). One pattern, zero special cases. This is the entire reason senior code looks cleaner than junior code on linked list problems — the senior reaches for sentinels without thinking.

Why <= not < for stability: if you write <, ties from l1 get placed AFTER ties from l2, breaking stability with respect to input order. For pure numerical merge it doesn’t matter, but for sorted lists of objects (rows, records) with equal keys, stability is a real correctness property. The Pythonic principle: write the version that generalizes.

Why “leftover tail” matters: after one list exhausts, the other is already a sorted linked list. Walking it node-by-node to append each is O(remainder) wasted work AND extra pointer writes. Just doing tail.next = whichever_is_not_none is O(1) and is the move that makes the algorithm exactly O(N1+N2), not “amortized” or “approximately”.

Why this is merge-sort’s inner loop: merge sort on a linked list (LC 148) recursively splits and then calls THIS exact function on each pair. Merging K sorted lists (LC 23) uses this repeatedly inside a min-heap. The kernel is small and gets reused; learn it cleanly.

7. Anti-Pattern Analysis

Wrong-but-tempting #1 — No dummy node, special-case the first append:

result = None; tail = None
while l1 and l2:
    if l1.val <= l2.val:
        if result is None: result = l1; tail = l1
        else: tail.next = l1; tail = l1
        l1 = l1.next
    else:
        if result is None: result = l2; tail = l2
        else: tail.next = l2; tail = l2
        l2 = l2.next
# ... and you need similar special-casing for the leftover

Works but it’s a code smell. The dummy node version is half the lines and zero branches in the loop. Senior interviewers see this and write “didn’t use idiomatic sentinel” on the scorecard.

Wrong-but-tempting #2 — Allocate new nodes:

while l1 and l2:
    if l1.val <= l2.val:
        tail.next = ListNode(l1.val); l1 = l1.next
    else:
        tail.next = ListNode(l2.val); l2 = l2.next
    tail = tail.next

Allocates N new nodes, doubling memory. Splicing existing nodes is O(1) extra space. For interview, this is a “didn’t optimize space” mark. For production with 10M nodes, it’s a memory pressure incident.

Wrong-but-tempting #3 — Walk the leftover tail node by node:

while l1: tail.next = l1; tail = tail.next; l1 = l1.next
while l2: tail.next = l2; tail = tail.next; l2 = l2.next

Works, but the second loop is unnecessary — exactly one of l1, l2 is non-None after the merge loop, and you can attach the whole remainder with tail.next = l1 or l2. The over-engineered version wastes pointer writes and obscures that the leftover is already sorted.

Wrong-but-tempting #4 — Recursive without the splice:

def merge(a, b):
    if not a: return b
    if not b: return a
    if a.val <= b.val:
        a.next = merge(a.next, b); return a
    b.next = merge(a, b.next); return b

This is actually GOOD — clean, elegant, and demonstrates recursion. But: O(N1+N2) stack depth, which stack-overflows in Python around N=1000. Iterative is the production choice; recursive is the “show off elegance, then mention the stack cost” choice. Both should be in your repertoire.

8. Skills & Takeaways

Generalizable patterns:

Dummy/sentinel node: use whenever the “first” element is structurally different from the rest. Reappears in every list mutation problem.
Two-pointer merge: the inner loop of merge sort, of streaming joins in databases, of timeline-merge in log aggregation systems.
Leftover tail attachment: any merge of two already-sorted sequences leaves one as a prepared suffix; attach in O(1).

Analogous problems:

LC 23 — Merge K Sorted Lists (Hard; min-heap of heads, call this kernel K-1 times)
LC 148 — Sort List (merge sort on a linked list; this is the merge step)
LC 88 — Merge Sorted Array (the array version, with the in-place-from-end trick — your p04)
LC 1019 — Next Greater Node In Linked List (different problem, but reuses the dummy-node-for-result pattern)
LC 86 — Partition List (two dummy nodes, splice into two lists, then merge)

When NOT to use this: unsorted inputs (sort first, OR use a different algorithm). Streaming inputs of unbounded size where you can’t hold both heads.

9. When to Move On (binary; must all be YES)

I solved the iterative version in <5 min on the second attempt
I used a dummy node and have ZERO if tail is None checks inside the loop
I attached the leftover tail in O(1) (not by walking)
I can explain why <= matters for stability
I can write the recursive version and state its O(N) stack cost
My stress_test() passes 1000 iterations against a sorted-Python-list oracle
I solved LC 23 (Merge K Sorted Lists) and recognized this kernel called K-1 times

If any unchecked: redo before moving to p08.

10. Company Context

Amazon (warehouse / fulfillment narratives common)

The framing: “You have two sorted shipment-arrival queues from different warehouses; merge them into one timeline.”
Misleading example: Empty input from one queue. Many candidates’ code crashes on None.val.
Adversarial extension: “Now there are M warehouses.” (LC 23 — K-way merge.) Followed by: “And each can be 10TB on disk.” (External merge sort.)
What they want to hear: “Let me handle the empty-list cases first.” Amazon weighs defensive coding heavily.

Microsoft (loves linked-list problems — phone screen battery)

The framing: Often paired with reverse-linked-list as a back-to-back battery.
Misleading example: They give you [1,1,1] and [1,1,1] — testing whether you correctly use <= and don’t introduce bugs around ties.
Adversarial extension: “Sort a linked list using merge sort.” (LC 148.) Now you split the list with slow/fast pointers and call this function on the halves.
What they want to hear: Verbalize the dummy-node decision out loud. “I’ll use a sentinel to avoid special-casing the first node.”

Apple (clean code expected)

The framing: Often the straight problem.
Misleading example: None — Apple tends to be straightforward but watches for code aesthetics.
Adversarial extension: “Merge them in descending order instead.” (Flip the comparator, but watch for the leftover-tail still being attached without re-sorting — it’s already descending if input was descending.)
What they want to hear: Idiomatic, dummy-node, leftover-attached-in-O(1) code. Apple interviewers will literally say “looks like production code” when they see it.

Google (algorithm composition)

The framing: Straight problem, then “What’s the K-way version?”, then “What if K=10^6?”
What they want to hear: Progression from this kernel → min-heap of K heads → external merge with limited memory. The Google bar is articulating the algorithmic family tree.

Meta / Bloomberg (data-processing scenarios)

The framing: “Merge two sorted streams of events / orders / quotes by timestamp.”
What they want to hear: Recognition that this is the inner loop of every streaming-join algorithm. Bloomberg in particular loves merge-by-timestamp scenarios.

11. Interviewer’s Lens

Phase	Strong signal	Weak signal	Scorecard phrase (strong)
Reading problem	Asks “splice or copy?” + “either empty?”	Assumes both	“Clarifies mutation and edge cases — production-mindful”
Pre-coding	Declares dummy node before loop, names `tail`	Builds result with `if result is None` branches	“Reaches for idiomatic sentinel — senior signal”
Coding	Uses `<=` (stable); attaches leftover in O(1)	Uses `<` (silently destabilizes); walks leftover	“Demonstrates awareness of stability and the leftover optimization”
Edge cases	Tests empty + single + all-equal before submit	Tests only the canonical example	“Self-catches edge bugs — strong production instinct”
Follow-up	Names this as merge-sort’s inner loop	Doesn’t connect to broader pattern	“Sees the kernel inside larger algorithms — Staff signal”

The scorecard line that gets you the offer: “Used dummy node naturally, attached leftover tail in O(1) — code reads like internal library code.”

The scorecard line that loses you: “Multiple branches in main loop; allocated new nodes unnecessarily; walked leftover.”

12. Level Delta

Level	What their answer looks like
Mid	Iterative solution that works. Uses dummy node after prompting or recall. Walks the leftover instead of attaching. Handles empty after explicit testing. ~10 min.
Senior	Iterative + dummy node + `<=` for stability + O(1) leftover attachment. Handles all empty cases without prompting. ~6 min. Mentions O(N1+N2) explicitly.
Staff	All of Senior + recursive version with stack-depth caveat + names this as merge-sort’s inner loop + connects to LC 23 (K-way) and LC 148 (sort list) + mentions stability for object-valued lists. ~5 min.
Principal	All of Staff + asks about object value semantics (stability matters for records, not for ints) + offers external-merge variant for disk-resident lists + mentions that timestamp-keyed merges in distributed log systems (Kafka log compaction, Kinesis shard merging) use exactly this kernel + ~4 min with discussion of when to use loser-tree instead of pairwise merge for K-way.

13. Follow-up Questions & Full Answers

Follow-up 1: “Now merge K sorted lists.” (LC 23)

Signal sought: Composition of this kernel.

Full answer: “Two approaches. (a) Min-heap of K heads: push the head of each list into a min-heap keyed by val. Pop the smallest, append to result, push its next if any. O(N log K) time, O(K) extra space. (b) Pairwise merge: merge in K/2 pairs, then K/4, … O(N log K) same complexity but better cache behavior because each merge is sequential. For very large K, (a); for moderate K with cache-sensitive workloads, (b). The kernel I just wrote is what’s called inside both.”

Follow-up 2: “What if the lists are descending?”

Signal sought: Adaptation, comparator awareness.

Full answer: “Flip the comparison: l1.val >= l2.val picks the larger first. The leftover-tail attach still works — it’s still a sorted suffix, just descending. Single character change. If I expected either order at runtime, I’d parameterize with a comparator function: merge(l1, l2, cmp=lambda a,b: a <= b).”

Follow-up 3: “Stability — explain when `<=` vs `<` matters.”

Signal sought: Do you understand the property name?

Full answer: “Stability means equal elements preserve their original relative order. For pure numbers it’s invisible. For objects with payloads (e.g., trading orders with same price but different timestamps), it’s a correctness property — a stable merge keeps earlier-timestamp orders first. Using <= (pick from l1 on ties) achieves stability when l1 is the ‘first’ list by convention. Using < (pick from l2 on ties) reverses tie order. Always state which list is ‘first’ when you claim stability.”

Follow-up 4 (Hard): “Sort a linked list.” (LC 148)

Signal sought: Algorithm composition.

Full answer: “Merge sort. (1) Find the middle with slow/fast pointers; split into two halves (set middle.next = None to terminate the first half). (2) Recursively sort each half. (3) Merge with this kernel. O(N log N) time. Space: O(log N) for recursion stack (vs. O(1) for the merge step itself). Why not quicksort? Linked list random access is O(N), so quicksort’s partition step is awful. Merge sort is the natural fit because its access pattern is sequential.”

Follow-up 5 (Senior signal): “How would you handle this for billion-node lists across multiple machines?”

Signal sought: Distributed thinking.

Full answer: “External merge. Each machine holds a sorted shard; we want one globally sorted output. Use a tournament tree (loser tree) with K = number of shards, each shard streams its head value. The tree picks the global minimum in O(log K) per element. Each machine reads sequentially from disk; the coordinator emits the merged stream. This is exactly how Spark’s sort-merge join works, and how HBase compactions merge sorted SSTables. The two-list kernel I just wrote generalizes via tournament-tree to N-way.”

14. Full Solution Walkthrough

See solution.py.

The file has five sections:

ListNode — minimal class with __repr__.
merge_two_lists(l1, l2) — iterative, dummy-node, splice-in-place. The production version.
merge_two_lists_recursive(l1, l2) — for didactic comparison; bumps recursion limit.
stress_test() — generates 1000 random sorted-list pairs, runs both implementations, compares to a sorted Python list oracle.
edge_cases() — empty+empty, empty+nonempty, single+single, all-equal-values, large.

Decisions justified:

Why <= not <: stability (per Section 13.3).
Why tail.next = l1 or l2 after the loop: Python truthy idiom, both Pythonic and O(1).
Why we splice nodes rather than copy: O(1) extra space, no allocation pressure.

15. Beyond the Problem — Production Reality

At scale:

Database query execution: SORT MERGE JOIN reads two sorted child operators and merges by join key. This is the kernel — Postgres, Oracle, MySQL, BigQuery all have this code path.
Log aggregation: Kafka log compaction, Loki/Tempo segment merging, ClickHouse part-merging — all are timestamp-keyed merges of sorted segments.
LSM trees (RocksDB, LevelDB, Cassandra): compaction merges multiple sorted SSTables into one. Same kernel, lifted to disk-resident files.

Real systems this is the kernel of:

Polyphase merge sort in external sort utilities (sort command on Unix when input doesn’t fit in RAM).
Spark’s sort-merge join: each executor holds a sorted partition; the merge is precisely this kernel.
Time-series database compaction: InfluxDB, Prometheus, TimescaleDB all merge time-ordered chunks.

Principal-engineer code review comment: “I’d add a debug assertion that asserts both inputs are sorted on the way in. We’ve had subtle bugs where a caller passed an unsorted list (a different list type assumed by mistake) and this function silently produced garbage. A assert is_sorted(l1) and is_sorted(l2) under if __debug__ is cheap insurance. Also: please version the comparator — for our timestamp lists, we sometimes pass nanosecond ints and sometimes datetime objects; the lambda makes the contract explicit at the call site.”

LeetCode Interview — Extreme Coding