p46 — Merge Intervals

Source: LeetCode 56 · Medium · Topics: Array, Sorting Companies (2024–2025 frequency): Meta (very high), Google (very high), Amazon (very high), Microsoft (high), Bloomberg (high), Apple (medium-high), Uber (high), Airbnb (high) Loop position: Top-3 most-asked Medium overall. Phone screen and onsite warm-up. NOT solving this cleanly is a serious red flag.

1. Quick Context

Given a list of intervals [[start_i, end_i], ...] (possibly unsorted, possibly overlapping), merge all overlapping intervals and return the resulting list, sorted by start.

Canonical solution:

  1. Sort by start.
  2. Sweep: for each interval, if it overlaps the last appended interval, extend; otherwise append.
intervals.sort(key=lambda x: x[0])
out = []
for s, e in intervals:
    if out and s <= out[-1][1]:
        out[-1][1] = max(out[-1][1], e)
    else:
        out.append([s, e])
return out

O(n log n) time (sort), O(n) extra (output, ignoring sort space). The whole algorithm is 5 lines; the interview signal is in HOW you discuss it: comparator choice, endpoint semantics, why one comparison with out[-1] suffices.


🔗 https://leetcode.com/problems/merge-intervals/

STOP. 15-min timer. Write the comparator, the overlap condition, and the merge rule on paper FIRST.


3. Prerequisite Concepts

  • Sorting with a key (list.sort(key=...)).
  • Mutable list-of-lists vs list-of-tuples (LC uses lists — mutable end).
  • Overlap condition a[0] ≤ b[1] AND b[0] ≤ a[1]; after sorting by start, the second clause is automatic.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Given closed intervals, fuse all that touch or overlap; return the disjoint cover.”

Step 2 — Clarify:

  • “Closed intervals — do [1, 2] and [2, 3] merge?” (LC 56: yes — closed, touching counts.)
  • “May intervals be empty (start > end)?” (LC: no.)
  • “Output sorted by start?” (Implicit yes; the sort gives it for free.)
  • “May I mutate the input?” (LC: usually yes.)
  • “Range of start/end?” (LC: [0, 10^4]; can be larger in production.)

Step 3 — Constraints: n ≤ 10^4 → O(n log n) trivially fits.

Step 4 — Examples:

  • [[1,3],[2,6],[8,10],[15,18]] → [[1,6],[8,10],[15,18]].
  • [[1,4],[4,5]] → [[1,5]] (touching merge).
  • [[1,4],[2,3]] → [[1,4]] (contained).

Step 5 — Brute force: Repeatedly scan pairs and merge any overlapping; halt when no merges happen. O(n²) per scan, O(n) scans → O(n³). Terrible but illustrates the goal.

Step 6 — Complexity: Sort + sweep O(n log n). Lower bound is Ω(n log n) by reduction from sorting (intervals [i, i] for distinct i reduce to sorting).

Step 7 — Pattern: Sort + linear sweep. Sibling problems: LC 57 Insert Interval (pre-sorted), LC 435 Non-overlapping (sort-by-end), LC 252 Meeting Rooms I (just feasibility check), LC 1288 Remove Covered Intervals.

Step 8 — Develop: see solution.py.

Step 9 — Test: empty input, single interval, all-disjoint, all-overlapping (one big merge), contained interval, touching intervals (closed-interval merge), unsorted input.


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — Why one comparison suffices

6.1 The key invariant

Claim: After sorting by start, for the current interval [s, e], the ONLY interval in out that can possibly overlap with it is out[-1].

Proof: Suppose some earlier out[i] (with i < len(out) - 1) overlapped [s, e]. Then out[i].end ≥ s ≥ out[-1].start. But out[i] and out[-1] are both in the output, meaning they were determined to NOT overlap (or out[-1] would have been merged INTO out[i]). Contradiction.

This reduces an O(n²) “compare against everything” check to O(n) “compare against last.”

6.2 The overlap check, formalized

Two intervals A = [a, b] and B = [c, d] overlap iff max(a, c) ≤ min(b, d).

After sorting by start, A.start ≤ B.start, so max(A.start, B.start) = B.start. Overlap reduces to B.start ≤ min(A.end, B.end), i.e., B.start ≤ A.end (since B.start ≤ B.end always).

So the check is simply cur.start ≤ prev.end. Note (closed) vs < (half-open). Different intervals semantics → different operator.

6.3 The merge rule — the off-by-one trap

out[-1][1] = max(out[-1][1], cur[1])   # CORRECT
out[-1][1] = cur[1]                    # WRONG — fails when cur is fully contained

Example: [[1, 10], [2, 3]] after sort. Without max, the merged end becomes 3 instead of 10.

max(prev_end, cur_end) is correct ALWAYS; the simpler form is wrong for the “contained” case.

6.4 In-place output and Python list aliasing

out.append([s, e])
out[-1][1] = max(...)

You’re mutating the list in out. If the caller cares about the input not being modified, ensure you out.append([s, e]) rather than out.append(intervals[i]) — otherwise you mutate the input’s nested list.

In production, prefer immutable tuples and rebuild rather than mutate. The cost is one extra allocation; the gain is no aliasing footguns.

6.5 Stability and tie-breaking

What if two intervals have the same start? [[1, 4], [1, 5]] → [[1, 5]]. Either comes first after a stable sort (Python’s sort IS stable). Doesn’t matter for merging because both will merge into one result; max(prev_end, cur_end) handles it.

Adding key=lambda x: (x[0], x[1]) to the sort doesn’t change correctness; mention only if asked.

6.6 Stream variant

What if intervals arrive one at a time and you want to maintain the merged set online?

  • Sorted structure (SortedList / BBST): find the position by start in O(log n); check neighbors; merge if needed; remove subsumed intervals. Amortized O(log n) per insert.
  • Interval tree: O(log n) per query/insert; supports range overlap queries.
  • Naïve list + linear scan: O(n) per insert; fine for small n.

LC asks for the batch version; mention the online version as a senior+ follow-up.

6.7 Half-open [s, e) semantics

In calendar systems, intervals are typically half-open: a meeting [9, 10) ends BEFORE 10, so [10, 11) can begin at 10 without conflict. The overlap check becomes cur.start < prev.end (strict).

LC 56 uses closed intervals (touch-merge). Real systems usually don’t. ASK FIRST.


7. Anti-Pattern Analysis

  1. Skip the sort — overlap detection becomes O(n²) per pair. Quadratic.
  2. out[-1][1] = cur[1] without max — fails on contained intervals.
  3. Compare against EVERY element in out — wasted work; the invariant (Section 6.1) makes one comparison enough.
  4. Sort by (start, end) — adds nothing; sort by start alone is correct and faster (less comparator work).
  5. Use cur.start < prev.end (strict) when problem wants closed semantics, or when problem wants half-open. ASK.
  6. Mutate input by appending intervals[i] references. Aliasing bugs.
  7. Use a custom comparator class for sort. key= is faster and idiomatic in Python.
  8. Concat-and-redo merge in a loop — O(n²) and ugly. The single-pass sweep is right.
  9. Build a sweep-line of (point, +1/-1) events for THIS problem — works but overengineered; the sort+sweep is simpler and faster.

8. Skills & Takeaways

  • Sort-then-sweep as the canonical pattern for interval problems.
  • One-comparison-with-last invariant after sorting.
  • max(prev_end, cur_end) for the merge rule.
  • Closed vs half-open endpoint distinction (always ASK).
  • key= sort idiom in Python; no custom comparator needed.
  • Aliasing-safe output by appending fresh copies.

9. When to Move On

Move on when:

  • You can write the 5-line solution in ≤ 8 min, no lookup.
  • You explain WHY comparing only out[-1] is sufficient (the invariant).
  • You explain the max(prev_end, cur_end) trap.
  • You ASK about closed vs half-open before coding.
  • Stress test on 1000 random inputs agrees with brute O(n²) merge.

10. Company Context

Meta:

  • Universal phone-screen + onsite warm-up. L4/L5 must produce in ≤ 10 min, including endpoint clarification.
  • Bar Raiser scorecard: “Asked about endpoint semantics; one-pass sort+sweep; explained the invariant.”

Google:

  • L4/L5. Often paired with Insert Interval as a two-problem set.
  • Common follow-up: “Now do this in a streaming setting.”

Amazon:

  • L5/L6. Often phrased as “merge overlapping CPU job intervals” or “merge calendar events.”

Microsoft:

  • L62+. Variant: “Given two sorted disjoint interval lists, compute their union” (basically merge after concat).

Bloomberg:

  • Trading-hour overlap; always with closed-interval semantics.

Apple:

  • Calendar app team — half-open semantics; pair with “find first free slot.”

Uber:

  • Driver shift overlaps; always with timezone gotchas (ASK).

Airbnb:

  • Booking conflict detection.

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
ClarificationCodes immediatelyAsks about orderingAsks endpoint semantics+ asks streaming/batch, timezone, integer/float
Sort key(start, end) “for safety”start only+ explains why second key irrelevant+ comparator complexity O(1) matters at scale
Merge ruleout[-1][1] = cur[1] (BUG)max(...) correctly+ explains contained case+ immutable rebuild for aliasing safety
Complexity“It’s fast”O(n log n)+ notes Ω(n log n) lower bound+ analyzes constant factors / cache
ProductionSilentSilentMentions streaming variant+ interval tree, R-tree, segment-tree, geospatial generalization

12. Level Delta

Mid (L4):

  • Sort + sweep with max(prev_end, cur_end).
  • Handles closed-interval merging ().
  • Single-pass after sort.

Senior (L5):

  • ASKs endpoint semantics before coding.
  • Articulates the “compare with last only” invariant.
  • Mentions the streaming variant (SortedList).

Staff (L6):

  • Discusses interval trees and segment trees for online queries.
  • Discusses immutable rebuild vs in-place mutation trade.
  • Generalizes to 2D (rectangle union — much harder, related to Klee’s algorithm).

Principal (L7+):

  • Discusses R-tree / geospatial indices for high-dimensional interval-like queries.
  • Discusses distributed interval merging: partition the timeline, merge per-partition, stitch at boundaries (boundary intervals are the hard part).
  • Discusses memory-bounded streaming with bounded out-of-order tolerance (watermarks).
  • Discusses lock-free interval data structures for concurrent calendar systems.

13. Follow-up Q&A

Q1: Streaming variant — intervals arrive online? A: SortedList of intervals keyed by start; on insert, find neighbors via bisect; merge with both if overlapping; remove subsumed. O(log n + k) per insert where k = intervals absorbed. Or maintain an interval tree.

Q2: 2D — merge overlapping RECTANGLES? A: Much harder. Naïve approach: sweep-line on x-axis, maintain set of active y-intervals (1D merge). For union area, use Klee’s algorithm O(n² log n) or coordinate compression + segment tree O(n log n). Mention complexity; full code is out of scope for a 45-min interview.

Q3: Half-open intervals — does the code change? A: Yes — change <= to < in the overlap check. [1, 2) and [2, 3) no longer merge.

Q4: Find the largest gap between merged intervals? A: Merge first, then linear scan computing out[i+1].start - out[i].end. O(n log n) total.

Q5: Count of overlapping intervals at the busiest point? A: That’s Meeting Rooms II (p49). Use sweep-line with (time, +1/-1) events, take running max.

Q6: Remove the smallest set of intervals to make the rest disjoint? A: LC 435 Non-overlapping Intervals (p48). Sort by END, greedy activity-selection.


14. Solution Walkthrough

See solution.py:

  • merge_intervals — sort + sweep canonical answer.
  • merge_intervals_brute — repeated-pair-merge oracle (O(n³)).
  • merge_intervals_sweep_line(point, +1/-1) events alternative.
  • edge_cases() + stress_test() — 1000 random inputs; all three agree.

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“Two things. (1) The endpoint semantics question is the most important line of code in this function — make it a docstring comment, not a constant in your head. We had a P0 incident because a backend engineer assumed [start, end] was half-open and a frontend engineer assumed it was closed; meetings vanished from calendars at the minute boundary. (2) The 5-line sweep is correct but FRAGILE when intervals come from different timezones or have non-UTC representations. Always normalize timestamps to UTC integers (epoch seconds or millis) at the API boundary, BEFORE the merge. Type-safety: an Interval[T] with T a datetime is one bug away from comparing strings.”

Connections forward:

  • p47 — Insert Interval (input is pre-sorted).
  • p48 — Non-overlapping Intervals (greedy by END).
  • p49 — Meeting Rooms II (peak active).
  • p50 — Employee Free Time (multi-list merge + invert).
  • LC 252 — Meeting Rooms I (just feasibility).
  • LC 1288 — Remove Covered Intervals.
  • LC 986 — Interval List Intersections.
  • Phase 02 Lab 06 — Intervals.
  • Production: calendar systems, CPU scheduling, geospatial range queries (interval tree, R-tree), distributed log compaction, event store deduplication.