p35 — Median of Two Sorted Arrays

Source: LeetCode 4 · Hard · Topics: Array, Binary Search, Divide & Conquer Companies (2024–2025 frequency): Google (very high), Meta (high), Apple (high), Amazon (medium), Microsoft (medium) Loop position: algorithm round, hardest slot. This is the canonical “binary-search-on-partition” Hard. A successful O(log(min(m,n))) solution is a strong staff+ signal.

1. Quick Context

Given two sorted arrays nums1 (length m) and nums2 (length n), return the median of the combined m + n elements. The required complexity is O(log(min(m, n))).

The straightforward O(m + n) merge gives the answer easily — but the problem demands log time, which means: binary search on the partition of the smaller array.

The setup:

Total length T = m + n. The median splits the combined sorted array into two equal halves (or, when T is odd, the left half has one more element).
Choose a partition i of nums1 (0..m) and the matching partition j = (T+1)//2 - i of nums2 (so the combined left half has (T+1)//2 elements).
The partition is correct iff maxLeft(A) <= minRight(B) AND maxLeft(B) <= minRight(A).
Binary search on i over [0, m]. Each step adjusts based on which of the two cross-inequalities fails.

The discriminator: writing a correct binary-search-on-partition for the first time in an interview is extremely hard if you haven’t done it before. Most candidates get O(m+n) merge and a Bar Raiser response of “good — now can you do log?” If you fold under that pressure, you’re capped at mid-level. If you produce the O(log) solution with correct partition handling, you’re signaling staff+.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/median-of-two-sorted-arrays/

STOP. 35-min timer for the merge approach AND the partition approach. This problem deserves the full time.

3. Prerequisite Concepts

Convergent-bounds binary search (see p34).
Two-pointer merge (LC 88 — Merge Sorted Array).
The concept of a “partition” of a sorted array — index i says “the first i elements go to the left side.”
Sentinel values (-inf, +inf) for edge partitions.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Two sorted arrays. Find the median of their union. O(log(min(m, n))).”

Step 2 — Clarify:

“Are the arrays each sorted ascending?” (Yes.)
“Can one or both be empty?” (At least one is non-empty. Per LC.)
“Definition of median for even total: average of two middle elements?” (Yes.)
“Return float?” (Yes.)
“Can values be negative?” (Yes.)
“Required complexity?” (O(log(min(m, n))).)

Step 3 — Constraints: m, n ≤ 1000. O(m + n) is fast enough, but the interview requires O(log). The discriminator is whether you produce log.

Step 4 — Examples:

[1,3], [2] → 2.0 (combined [1,2,3], median 2).
[1,2], [3,4] → 2.5 (combined [1,2,3,4], median (2+3)/2).
[], [1] → 1.0.
[2], [] → 2.0.
[1,2,3,4,5], [6,7,8,9,10] → 5.5 (no overlap).
[1,1,1], [1,1,1] → 1.0.

Step 5 — Brute Force:

O((m+n) log(m+n)): concatenate and sort. Trivial.
O(m + n): merge with two pointers, stop at the median index. State and pivot — but also show that you CAN do brute first to set up the contrast.

Step 6 — Complexity: O(log(min(m, n))) time, O(1) space. We binary-search the partition of the SMALLER array — log of its length.

Step 7 — Pattern Recognition:

“Sorted data + need sub-linear” → binary search.
“Combine two sorted things and answer a ‘split’ query” → binary search on partition.
The partition invariant maxLeft(A) <= minRight(B) AND maxLeft(B) <= minRight(A) is the heart of the technique.

Step 8 — Develop: see solution.py. Implement BOTH the merge oracle and the partition binary search.

Step 9 — Test: all the edge cases above, plus interleaved cases, plus stress test against the merge oracle.

5. Progressive Hints

If stuck for 5 min, open hints.md.

6. Deeper Insight — Binary search on partition

6.1 The partition framing

Let A = nums1 (length m), B = nums2 (length n), with m <= n (swap if not). The total length is T = m + n.

A “partition” is a pair (i, j) with 0 <= i <= m and 0 <= j <= n such that:

The left side A[0..i-1] + B[0..j-1] has exactly (T + 1) // 2 elements.
The right side A[i..] + B[j..] has the other T // 2 elements.

So j = (T + 1) // 2 - i — once i is fixed, j is determined.

6.2 The correctness invariant

A partition (i, j) is “correct” (median-revealing) iff: $$\text{maxLeft}(A) \leq \text{minRight}(B) \quad \text{AND} \quad \text{maxLeft}(B) \leq \text{minRight}(A)$$ where maxLeft(A) = A[i-1] (or -inf if i == 0), minRight(A) = A[i] (or +inf if i == m), and similarly for B.

If both inequalities hold, then EVERYTHING on the left side is <= EVERYTHING on the right side. The median is determined by the boundary:

If T is odd, median = max(maxLeft(A), maxLeft(B)).
If T is even, median = (max(maxLeft(A), maxLeft(B)) + min(minRight(A), minRight(B))) / 2.

6.3 Why binary search on `i` converges

If maxLeft(A) > minRight(B) — A’s left is too big → too many A elements on the left → decrease i (search left half of [0, m]).

Otherwise (maxLeft(B) > minRight(A)) — B’s left is too big (we have too FEW A elements on the left) → increase i (search right half of [0, m]).

The predicate “is i too large?” is monotonic — that’s why binary search works. Convergent template, same as p34.

6.4 Why we binary search the SMALLER array

So the search range is [0, m] with m = min(m, n). This gives O(log(min(m, n))). If we accidentally searched the larger one, we’d get O(log(max(m, n))) — still log, but the spec is stricter.

There’s another reason: j = (T + 1) // 2 - i. If i is in [0, m] and m <= n, then j is automatically in [0, n] (provable from (T+1)//2 >= m). If we swapped, we’d have to clamp j.

6.5 Sentinels

The edge cases (i == 0, i == m, j == 0, j == n) make maxLeft or minRight undefined. The clean fix: use float('-inf') and float('inf') as sentinels. This eliminates a tangle of conditional branches.

6.6 Why this is staff-level

The bar isn’t “do you know binary search?” — it’s “do you recognize when to binary-search a derived parameter (the partition index) instead of the data itself?” This is the same intellectual leap as binary-search-on-answer (LC 410, LC 875). Once you see it, you see it everywhere: it’s the spine of LC 4, LC 162, LC 410, LC 875, LC 1011, LC 1482, and many more.

7. Anti-Pattern Analysis

Concatenate-and-sort. O((m+n) log(m+n)). Works but signals you didn’t see the structure.
O(m+n) merge as final answer. Works but capped at mid-level. The interviewer will push for log.
Binary-searching the LARGER array. Gets the wrong complexity bound; j can go out of range.
Forgetting sentinels. A pile of if i == 0/else branches that nobody (including you) can debug live.
Wrong partition formula. Common bug: j = T // 2 - i (drops one element when T is odd). The correct formula is j = (T + 1) // 2 - i, which puts the extra element on the left for odd T (so the median is max(maxLeft(A), maxLeft(B))).
(lo + hi) // 2 integer overflow in C++/Java. (Python is fine.) Use lo + (hi - lo) // 2.
Returning int for even-total medians. The answer is a float (e.g., 2.5). Use float division /, not //.
Reaching for the “k-th smallest of two sorted arrays” approach. Also valid (and a great problem), but more code and trickier edge cases than the partition approach.
Manual case-by-case for tiny m (≤ 2). Looks clever but adds complexity. The general partition handles tiny m correctly.

8. Skills & Takeaways

Binary search on derived parameter (partition index) — a pattern, not a trick.
Partition invariant: maxLeft(A) <= minRight(B) AND maxLeft(B) <= minRight(A). Memorize.
Sentinels (±inf) as edge-case eliminator.
Always search the smaller array.
Parity-aware median formula: odd → max of left; even → average of max-left and min-right.
The conceptual link to LC 410, LC 875 — binary search on answer.

9. When to Move On

Move on when:

You can write the merge oracle in under 5 min.
You can write the partition binary search in under 25 min (this Hard is genuinely Hard).
You can articulate the partition invariant from memory.
Your stress test passes 200 random pairs vs the merge oracle.
You can derive the “k-th smallest of two sorted arrays” extension on demand.

10. Company Context

Google:

One of THE most-asked Hard problems at Google for L5+ in 2024-25. Bar Raisers expect either the partition solution or a strong attempt with correct framing.
The “did the candidate see the binary-search-on-partition pattern” is the central decision point.
Scorecard phrase: “Algorithm — produced O(log) median via binary search on partition. Strong signal.”

Meta:

Less common than at Google, but appears in senior loops. Even reaching the O(m+n) merge cleanly is good; the O(log) is the staff+ signal.
Scorecard phrase: “Strong — O(m+n) merge confidently, attempted O(log) partition.”

Apple:

Asked in algo loops, particularly for media/systems teams. Often paired with a “what if the arrays are streams” follow-up.
Scorecard phrase: “Algorithm — partition binary search, articulated invariant.”

Amazon:

Less common in standard L5 loops but appears in L6+ or Bar Raiser slots.
Scorecard phrase: “Strong — Hard problem solved with correct complexity.”

Microsoft:

Standard “Hard slot” for L63+. Bar Raiser may ask for both the merge and the partition approach.
Scorecard phrase: “Strong — both approaches with clear reasoning.”

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff/Principal
Approach	Concatenate-sort	O(m+n) merge	Attempts partition, may not finish	Partition with sentinels, clean
Sentinels	None	Maybe a few `if` branches	`±inf` sentinels from the start	+ explains why they eliminate edge cases
Which array to search	Either	Whichever is convenient	Always smaller	+ proves why (j-range)
Median parity	Bug on even or odd	Handles after one fail	Correct first try	+ states the `(T+1)//2` choice and why
Follow-up	“Don’t know”	“Use k-th smallest?”	“k-th smallest in two sorted arrays”	+ streaming extension, k-way generalization

12. Level Delta

Mid (L4):

O(m+n) merge solution.
Handles odd and even totals.
Handles empty array.

Senior (L5):

Attempts the partition binary search.
Articulates the partition invariant correctly even if implementation has bugs.
Uses sentinels.
Tests odd, even, empty, all-same, no-overlap.

Staff (L6):

Clean O(log(min(m, n))) partition solution with sentinels.
Connects to “binary search on derived parameter” pattern (LC 410, LC 875).
Discusses k-th smallest extension (LC 4 generalized).
Discusses what changes for K sorted arrays — heap-based merge, log(K) median tracking with two heaps (LC 295 family).

Principal (L7+):

Discusses distributed/streaming variants: maintaining a running median over two streams = LC 295 (two heaps). Maintaining a median across federated sorted shards = log-ranked search via approximate quantile sketches (t-digest, KLL).
Discusses GPU-friendly variant: parallel partition via “split-merge” on a balanced binary tree of subarrays.
Discusses production: financial median-price-over-N-sources, A/B test result median across cohorts, distributed monitoring p50 latency.

13. Follow-up Q&A

Q1: K-th smallest element across two sorted arrays. A: Generalize the partition. Pick a partition (i, j) with i + j = K. The k-th smallest is max(A[i-1], B[j-1]) if the partition is valid. Binary search over i in [max(0, K-n), min(K, m)]. O(log(min(m, n, K))).

Q2: Median of K sorted arrays. A: Two approaches: (1) Heap merge — extract the (T/2)-th element. O(T log K). (2) Generalized partition: binary-search a “pivot” and count how many elements in each array are below the pivot. Binary search on the pivot value, O(K log V log K) where V is the value range. Harder, often overkill.

Q3: Running median of two infinite streams. A: Two heaps (max-heap for lower half, min-heap for upper half). Insert: O(log T). Median: O(1). This is LC 295.

Q4: What if the arrays are NOT sorted but we have access to “sort hints”? A: Sort first (O(T log T)), then apply partition. There’s no shortcut without structure.

Q5: Approximate median for huge unsortable streams. A: Quantile sketches: t-digest, GK, KLL. Sub-linear memory, ε-approximate median. Production-ready (used in Datadog, Prometheus).

14. Solution Walkthrough

See solution.py:

find_median_sorted_arrays(nums1, nums2) — partition binary search. O(log(min(m,n))).
find_median_brute(nums1, nums2) — merge with two pointers (O(m+n)). Oracle.
stress_test() — 200 random pairs (lengths 0..15 each, at least one non-empty), values in [-30, 30].
edge_cases() — empty + non-empty, single elements, no overlap, full overlap, all equal, interleaved, odd and even totals, negatives.

Run: python3 solution.py.

15. Beyond the Problem

Principal-engineer code review comment:

“This solution lives or dies by the partition invariant. Write a clear comment block at the top of the function with the formula, the j = (T+1)//2 - i derivation, and a worked example — your future self (and reviewers) will thank you. Second: the float('inf') sentinels are clever but ALSO subtle — if you ever extend this to integer-only environments (some embedded systems), you’ll need INT_MAX/MIN and have to think about overflow. Document the sentinel choice. Third: if this function is going into a library, expose BOTH the O(m+n) merge variant AND the O(log) partition variant — the merge is more cache-friendly for small arrays, and for typical inputs (say m+n < 64), it’s likely FASTER in wall-clock time. Profile, then pick the right default. Fourth: write at least one test that exercises every cross-check failure case — maxLeft(A) > minRight(B) AND maxLeft(B) > minRight(A) — to make sure the binary search direction is right both ways.”

Connections forward:

p33, p34 — rotated binary search siblings.
LC 295 — Find Median from Data Stream: dynamic-streaming variant via two heaps.
LC 295’s heap pattern → general top-K and quantile structures.
LC 215 — K-th Largest Element: quickselect or heap.
LC 410 — Split Array Largest Sum: binary search on answer (Phase 02 Lab 04).
LC 875 — Koko Eating Bananas: binary search on answer.
Phase 02 Lab 04 — Binary Search on Answer: full template lab.
Phase 02 Lab 09 — Heap Top-K: complementary technique.
Production: distributed median (quantile sketches: t-digest, GK, KLL), monitoring p50 latency, A/B test median computation.

LeetCode Interview — Extreme Coding