p46 — Merge Intervals
Source: LeetCode 56 · Medium · Topics: Array, Sorting Companies (2024–2025 frequency): Meta (very high), Google (very high), Amazon (very high), Microsoft (high), Bloomberg (high), Apple (medium-high), Uber (high), Airbnb (high) Loop position: Top-3 most-asked Medium overall. Phone screen and onsite warm-up. NOT solving this cleanly is a serious red flag.
1. Quick Context
Given a list of intervals [[start_i, end_i], ...] (possibly unsorted, possibly overlapping), merge all overlapping intervals and return the resulting list, sorted by start.
Canonical solution:
- Sort by start.
- Sweep: for each interval, if it overlaps the last appended interval, extend; otherwise append.
intervals.sort(key=lambda x: x[0])
out = []
for s, e in intervals:
if out and s <= out[-1][1]:
out[-1][1] = max(out[-1][1], e)
else:
out.append([s, e])
return out
O(n log n) time (sort), O(n) extra (output, ignoring sort space). The whole algorithm is 5 lines; the interview signal is in HOW you discuss it: comparator choice, endpoint semantics, why one comparison with out[-1] suffices.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/merge-intervals/
STOP. 15-min timer. Write the comparator, the overlap condition, and the merge rule on paper FIRST.
3. Prerequisite Concepts
- Sorting with a key (
list.sort(key=...)). - Mutable list-of-lists vs list-of-tuples (LC uses lists — mutable end).
- Overlap condition
a[0] ≤ b[1] AND b[0] ≤ a[1]; after sorting by start, the second clause is automatic.
4. How to Approach (FRAMEWORK Steps 1–9)
Step 1 — Restate: “Given closed intervals, fuse all that touch or overlap; return the disjoint cover.”
Step 2 — Clarify:
- “Closed intervals — do
[1, 2]and[2, 3]merge?” (LC 56: yes — closed, touching counts.) - “May intervals be empty (
start > end)?” (LC: no.) - “Output sorted by start?” (Implicit yes; the sort gives it for free.)
- “May I mutate the input?” (LC: usually yes.)
- “Range of start/end?” (LC:
[0, 10^4]; can be larger in production.)
Step 3 — Constraints: n ≤ 10^4 → O(n log n) trivially fits.
Step 4 — Examples:
[[1,3],[2,6],[8,10],[15,18]] → [[1,6],[8,10],[15,18]].[[1,4],[4,5]] → [[1,5]](touching merge).[[1,4],[2,3]] → [[1,4]](contained).
Step 5 — Brute force: Repeatedly scan pairs and merge any overlapping; halt when no merges happen. O(n²) per scan, O(n) scans → O(n³). Terrible but illustrates the goal.
Step 6 — Complexity: Sort + sweep O(n log n). Lower bound is Ω(n log n) by reduction from sorting (intervals [i, i] for distinct i reduce to sorting).
Step 7 — Pattern: Sort + linear sweep. Sibling problems: LC 57 Insert Interval (pre-sorted), LC 435 Non-overlapping (sort-by-end), LC 252 Meeting Rooms I (just feasibility check), LC 1288 Remove Covered Intervals.
Step 8 — Develop: see solution.py.
Step 9 — Test: empty input, single interval, all-disjoint, all-overlapping (one big merge), contained interval, touching intervals (closed-interval merge), unsorted input.
5. Progressive Hints
If stuck for 5 min, open hints.md.
6. Deeper Insight — Why one comparison suffices
6.1 The key invariant
Claim: After sorting by start, for the current interval [s, e], the ONLY interval in out that can possibly overlap with it is out[-1].
Proof: Suppose some earlier out[i] (with i < len(out) - 1) overlapped [s, e]. Then out[i].end ≥ s ≥ out[-1].start. But out[i] and out[-1] are both in the output, meaning they were determined to NOT overlap (or out[-1] would have been merged INTO out[i]). Contradiction.
This reduces an O(n²) “compare against everything” check to O(n) “compare against last.”
6.2 The overlap check, formalized
Two intervals A = [a, b] and B = [c, d] overlap iff max(a, c) ≤ min(b, d).
After sorting by start, A.start ≤ B.start, so max(A.start, B.start) = B.start. Overlap reduces to B.start ≤ min(A.end, B.end), i.e., B.start ≤ A.end (since B.start ≤ B.end always).
So the check is simply cur.start ≤ prev.end. Note ≤ (closed) vs < (half-open). Different intervals semantics → different operator.
6.3 The merge rule — the off-by-one trap
out[-1][1] = max(out[-1][1], cur[1]) # CORRECT
out[-1][1] = cur[1] # WRONG — fails when cur is fully contained
Example: [[1, 10], [2, 3]] after sort. Without max, the merged end becomes 3 instead of 10.
max(prev_end, cur_end) is correct ALWAYS; the simpler form is wrong for the “contained” case.
6.4 In-place output and Python list aliasing
out.append([s, e])
out[-1][1] = max(...)
You’re mutating the list in out. If the caller cares about the input not being modified, ensure you out.append([s, e]) rather than out.append(intervals[i]) — otherwise you mutate the input’s nested list.
In production, prefer immutable tuples and rebuild rather than mutate. The cost is one extra allocation; the gain is no aliasing footguns.
6.5 Stability and tie-breaking
What if two intervals have the same start? [[1, 4], [1, 5]] → [[1, 5]]. Either comes first after a stable sort (Python’s sort IS stable). Doesn’t matter for merging because both will merge into one result; max(prev_end, cur_end) handles it.
Adding key=lambda x: (x[0], x[1]) to the sort doesn’t change correctness; mention only if asked.
6.6 Stream variant
What if intervals arrive one at a time and you want to maintain the merged set online?
- Sorted structure (
SortedList/ BBST): find the position by start in O(log n); check neighbors; merge if needed; remove subsumed intervals. Amortized O(log n) per insert. - Interval tree: O(log n) per query/insert; supports range overlap queries.
- Naïve list + linear scan: O(n) per insert; fine for small n.
LC asks for the batch version; mention the online version as a senior+ follow-up.
6.7 Half-open [s, e) semantics
In calendar systems, intervals are typically half-open: a meeting [9, 10) ends BEFORE 10, so [10, 11) can begin at 10 without conflict. The overlap check becomes cur.start < prev.end (strict).
LC 56 uses closed intervals (touch-merge). Real systems usually don’t. ASK FIRST.
7. Anti-Pattern Analysis
- Skip the sort — overlap detection becomes O(n²) per pair. Quadratic.
out[-1][1] = cur[1]withoutmax— fails on contained intervals.- Compare against EVERY element in
out— wasted work; the invariant (Section 6.1) makes one comparison enough. - Sort by
(start, end)— adds nothing; sort by start alone is correct and faster (less comparator work). - Use
cur.start < prev.end(strict) when problem wants closed semantics, or≤when problem wants half-open. ASK. - Mutate input by appending
intervals[i]references. Aliasing bugs. - Use a custom comparator class for sort.
key=is faster and idiomatic in Python. - Concat-and-redo merge in a loop — O(n²) and ugly. The single-pass sweep is right.
- Build a sweep-line of
(point, +1/-1)events for THIS problem — works but overengineered; the sort+sweep is simpler and faster.
8. Skills & Takeaways
- Sort-then-sweep as the canonical pattern for interval problems.
- One-comparison-with-last invariant after sorting.
max(prev_end, cur_end)for the merge rule.- Closed vs half-open endpoint distinction (always ASK).
key=sort idiom in Python; no custom comparator needed.- Aliasing-safe output by appending fresh copies.
9. When to Move On
Move on when:
- You can write the 5-line solution in ≤ 8 min, no lookup.
- You explain WHY comparing only
out[-1]is sufficient (the invariant). - You explain the
max(prev_end, cur_end)trap. - You ASK about closed vs half-open before coding.
- Stress test on 1000 random inputs agrees with brute O(n²) merge.
10. Company Context
Meta:
- Universal phone-screen + onsite warm-up. L4/L5 must produce in ≤ 10 min, including endpoint clarification.
- Bar Raiser scorecard: “Asked about endpoint semantics; one-pass sort+sweep; explained the invariant.”
Google:
- L4/L5. Often paired with Insert Interval as a two-problem set.
- Common follow-up: “Now do this in a streaming setting.”
Amazon:
- L5/L6. Often phrased as “merge overlapping CPU job intervals” or “merge calendar events.”
Microsoft:
- L62+. Variant: “Given two sorted disjoint interval lists, compute their union” (basically merge after concat).
Bloomberg:
- Trading-hour overlap; always with closed-interval semantics.
Apple:
- Calendar app team — half-open semantics; pair with “find first free slot.”
Uber:
- Driver shift overlaps; always with timezone gotchas (ASK).
Airbnb:
- Booking conflict detection.
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff/Principal |
|---|---|---|---|---|
| Clarification | Codes immediately | Asks about ordering | Asks endpoint semantics | + asks streaming/batch, timezone, integer/float |
| Sort key | (start, end) “for safety” | start only | + explains why second key irrelevant | + comparator complexity O(1) matters at scale |
| Merge rule | out[-1][1] = cur[1] (BUG) | max(...) correctly | + explains contained case | + immutable rebuild for aliasing safety |
| Complexity | “It’s fast” | O(n log n) | + notes Ω(n log n) lower bound | + analyzes constant factors / cache |
| Production | Silent | Silent | Mentions streaming variant | + interval tree, R-tree, segment-tree, geospatial generalization |
12. Level Delta
Mid (L4):
- Sort + sweep with
max(prev_end, cur_end). - Handles closed-interval merging (
≤). - Single-pass after sort.
Senior (L5):
- ASKs endpoint semantics before coding.
- Articulates the “compare with last only” invariant.
- Mentions the streaming variant (SortedList).
Staff (L6):
- Discusses interval trees and segment trees for online queries.
- Discusses immutable rebuild vs in-place mutation trade.
- Generalizes to 2D (rectangle union — much harder, related to Klee’s algorithm).
Principal (L7+):
- Discusses R-tree / geospatial indices for high-dimensional interval-like queries.
- Discusses distributed interval merging: partition the timeline, merge per-partition, stitch at boundaries (boundary intervals are the hard part).
- Discusses memory-bounded streaming with bounded out-of-order tolerance (watermarks).
- Discusses lock-free interval data structures for concurrent calendar systems.
13. Follow-up Q&A
Q1: Streaming variant — intervals arrive online?
A: SortedList of intervals keyed by start; on insert, find neighbors via bisect; merge with both if overlapping; remove subsumed. O(log n + k) per insert where k = intervals absorbed. Or maintain an interval tree.
Q2: 2D — merge overlapping RECTANGLES? A: Much harder. Naïve approach: sweep-line on x-axis, maintain set of active y-intervals (1D merge). For union area, use Klee’s algorithm O(n² log n) or coordinate compression + segment tree O(n log n). Mention complexity; full code is out of scope for a 45-min interview.
Q3: Half-open intervals — does the code change?
A: Yes — change <= to < in the overlap check. [1, 2) and [2, 3) no longer merge.
Q4: Find the largest gap between merged intervals?
A: Merge first, then linear scan computing out[i+1].start - out[i].end. O(n log n) total.
Q5: Count of overlapping intervals at the busiest point?
A: That’s Meeting Rooms II (p49). Use sweep-line with (time, +1/-1) events, take running max.
Q6: Remove the smallest set of intervals to make the rest disjoint? A: LC 435 Non-overlapping Intervals (p48). Sort by END, greedy activity-selection.
14. Solution Walkthrough
See solution.py:
merge_intervals— sort + sweep canonical answer.merge_intervals_brute— repeated-pair-merge oracle (O(n³)).merge_intervals_sweep_line—(point, +1/-1)events alternative.edge_cases()+stress_test()— 1000 random inputs; all three agree.
Run: python3 solution.py.
15. Beyond the Problem
Principal-engineer code review comment:
“Two things. (1) The endpoint semantics question is the most important line of code in this function — make it a docstring comment, not a constant in your head. We had a P0 incident because a backend engineer assumed
[start, end]was half-open and a frontend engineer assumed it was closed; meetings vanished from calendars at the minute boundary. (2) The 5-line sweep is correct but FRAGILE when intervals come from different timezones or have non-UTC representations. Always normalize timestamps to UTC integers (epoch seconds or millis) at the API boundary, BEFORE the merge. Type-safety: anInterval[T]with T adatetimeis one bug away from comparing strings.”
Connections forward:
- p47 — Insert Interval (input is pre-sorted).
- p48 — Non-overlapping Intervals (greedy by END).
- p49 — Meeting Rooms II (peak active).
- p50 — Employee Free Time (multi-list merge + invert).
- LC 252 — Meeting Rooms I (just feasibility).
- LC 1288 — Remove Covered Intervals.
- LC 986 — Interval List Intersections.
- Phase 02 Lab 06 — Intervals.
- Production: calendar systems, CPU scheduling, geospatial range queries (interval tree, R-tree), distributed log compaction, event store deduplication.