p48 — Non-overlapping Intervals

Source: LeetCode 435 · Medium · Topics: Array, Greedy, Sorting, DP Companies (2024–2025 frequency): Google (high), Meta (medium-high), Amazon (medium), Microsoft (medium), Bloomberg (medium) Loop position: Classic greedy / activity-selection. Tests whether you reach for the OPTIMAL sort key (end, not start) and can justify it.

1. Quick Context

Return the minimum number of intervals to REMOVE so that the remaining intervals are non-overlapping.

Equivalently: find the MAXIMUM number of mutually non-overlapping intervals (the keep-count); answer = n - kept.

This is the classic Activity Selection problem.

The right algorithm:

  1. Sort by END time (not start!).
  2. Greedily keep any interval whose start ≥ prev_end; else discard (count++).
intervals.sort(key=lambda x: x[1])
removed = 0
prev_end = -inf
for s, e in intervals:
    if s >= prev_end:
        prev_end = e
    else:
        removed += 1
return removed

O(n log n) time, O(1) extra. The senior signal is justifying sort-by-end via the exchange argument (Section 6.2).


🔗 https://leetcode.com/problems/non-overlapping-intervals/

STOP. 15-min timer. Before coding: which sort key, and WHY?


3. Prerequisite Concepts

  • Greedy algorithms and exchange-argument proofs.
  • Sorting with a key.
  • Closed vs half-open overlap semantics — for LC 435, intervals are HALF-OPEN: [1, 2] and [2, 3] do NOT overlap.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Find the smallest interval set whose removal leaves a disjoint set, equivalently find the largest disjoint subset.”

Step 2 — Clarify:

  • “Half-open or closed? Does [1, 2] and [2, 3] overlap?” (LC 435: half-open — touching does NOT overlap.)
  • “Are inputs distinct?” (LC: not necessarily; duplicates count as overlapping.)
  • “Empty input → 0?” (Yes.)
  • “May I mutate input?” (Yes.)

Step 3 — Constraints: n ≤ 10^5. O(n log n) is the obvious target.

Step 4 — Examples:

  • [[1,2],[2,3],[3,4],[1,3]] → 1 (remove [1,3]).
  • [[1,2],[1,2],[1,2]] → 2 (keep one).
  • [[1,2],[2,3]] → 0 (touching, no overlap in half-open).
  • [] → 0.

Step 5 — Brute force: Try all 2^n subsets; for each, check disjoint; track largest. O(2^n × n). Exponential.

DP alternative: Sort by start; dp[i] = 1 + max(dp[j] : j < i AND intervals[j].end ≤ intervals[i].start). O(n²) or O(n log n) with binary search. Works but heavier than greedy.

Step 6 — Complexity:

  • Greedy: O(n log n) time, O(1) extra.
  • DP: O(n²) naive, O(n log n) with bisect.
  • Brute: O(2^n × n).

Step 7 — Pattern: Activity Selection / Weighted Interval Scheduling. Classic CLRS chapter on greedy algorithms.

Step 8 — Develop: see solution.py.

Step 9 — Test: empty, single, all-disjoint (→ 0), all-overlapping at one point (→ n-1), one nested interval, duplicates, touching pairs (→ 0).


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — Why sort by END (the exchange argument)

6.1 The greedy choice

Greedy: Among all remaining candidates, repeatedly KEEP the one with the earliest end time that doesn’t conflict with the previously kept. Discard others that conflict.

The single-pass sort-by-end implementation realizes this: after sorting, the first interval is automatically the earliest-ending; subsequent intervals are considered in end-order, and we keep them if compatible.

6.2 The exchange argument — proof of optimality

Claim: Sorting by END and greedily keeping any interval compatible with the previously-kept is OPTIMAL (max keep-count).

Proof: Let G = g₁, g₂, ..., g_k be the greedy’s kept sequence (sorted by end). Let OPT = o₁, o₂, ..., o_m be ANY optimal kept sequence (also sortable by end since OPT is disjoint).

Show k ≥ m by induction on the first index where they differ.

  • Base: g₁ has the smallest end of any interval, so g₁.end ≤ o₁.end. Replace o₁ with g₁ in OPT — still disjoint (because o₂.start ≥ o₁.end ≥ g₁.end), still size m.
  • Inductive step: Suppose g_i.end ≤ o_i.end for i ≤ j. Greedy at step j+1 picks the earliest-ending interval compatible with g_j. Since o_{j+1} is compatible with g_j (because o_{j+1}.start ≥ o_j.end ≥ g_j.end), o_{j+1} is a CANDIDATE for greedy. Greedy picks something with end ≤ o_{j+1}.end. So g_{j+1}.end ≤ o_{j+1}.end.
  • Conclusion: Greedy never falls behind OPT in end-time, so it can fit at least as many intervals.

6.3 Why sort by START fails

Counterexample: [[1, 100], [2, 3], [4, 5]].

  • Sort by start: [1,100], [2,3], [4,5]. Greedy keeps [1,100]; rejects rest. Keep = 1, remove = 2.
  • Sort by end: [2,3], [4,5], [1,100]. Greedy keeps [2,3], [4,5]; rejects [1,100]. Keep = 2, remove = 1.

Sort-by-start is GREEDY-WRONG because the first picked interval can swallow many future opportunities.

6.4 The half-open vs closed distinction matters HERE

For LC 435 (half-open): s >= prev_end is the keep condition. If the problem were closed-interval (touching = overlapping): s > prev_end would be the keep condition.

ASK ALWAYS. The single-character difference flips the answer for touching intervals.

6.5 Weighted variant — DP, not greedy

If each interval has a WEIGHT and you want max-weight disjoint subset (LC 1235 Maximum Profit in Job Scheduling):

  • Greedy by end FAILS — could miss a high-weight long interval.
  • Solution: DP. Sort by end; dp[i] = max(dp[i-1], weight[i] + dp[j]) where j = last interval ending ≤ start[i] (found via bisect). O(n log n).

Mention this if the interviewer asks “what if intervals have weights?”

6.6 Connection to graph coloring

If you build a graph where intervals are nodes and edges connect overlapping intervals, the maximum independent set in this “interval graph” = max non-overlapping subset. For interval graphs (a special graph class), maximum independent set is polynomial-time and equals the greedy answer. In general graphs, max independent set is NP-hard.

6.7 Online variant

If intervals arrive one at a time and you must commit (keep/discard) immediately without seeing future: this is the online interval scheduling problem. No constant-competitive deterministic algorithm exists in general; randomized strategies achieve O(log n) competitive ratio. Offline (full input visible) is the LC problem.


7. Anti-Pattern Analysis

  1. Sort by start — wrong (Section 6.3). Counterexample.
  2. Sort by length (“keep the smallest intervals first”) — wrong. Counterexample: [[1,5],[2,3],[4,5]] — keeping [2,3] first blocks both others.
  3. DP when greedy suffices — works (O(n log n)) but more code and more bugs.
  4. Use > vs >= wrongly — half-open vs closed confusion.
  5. Return kept count instead of removed count — read the problem.
  6. Compare against ALL kept instead of just the latest kept — O(n²) and unnecessary.
  7. Initialize prev_end = 0 — fails for negative-start intervals. Use float('-inf').
  8. Forget that sort is in-place if you need the original order later.
  9. Sort by (end, start) — second key adds nothing; harmless but signals junior.

8. Skills & Takeaways

  • Activity Selection / Weighted Interval Scheduling — canonical greedy problem family.
  • Sort by END for unweighted; DP-sort-by-end + bisect for weighted.
  • Exchange argument as the standard proof technique for greedy correctness.
  • Closed vs half-open flips the comparison operator.
  • Interval graph max independent set = polynomial via this greedy.

9. When to Move On

Move on when:

  • You write the greedy in ≤ 8 min.
  • You can EXPLAIN why sort-by-end is optimal (informal exchange argument).
  • You can give a counterexample to sort-by-start.
  • You know the weighted variant requires DP.
  • Stress test passes on 1000 random inputs vs brute O(2^n) for small n.

10. Company Context

Google:

  • L4/L5 onsite Medium; classic greedy litmus test.
  • Bar Raiser: “Justified sort-by-end with exchange argument; gave counterexample to sort-by-start.”

Meta:

  • L5+. Variant: “Job scheduling on a single machine.”

Amazon:

  • L5. Productized as “warehouse dock scheduling.”

Microsoft:

  • L62+. Often with weighted follow-up (LC 1235).

Bloomberg:

  • Trading-strategy backtesting: “max non-overlapping trades.”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
Sort keystart (BUG)end+ justifies+ exchange argument formally
Counterexample to start“It works”“Hmm”Provides+ proves greedy from base principles
Weighted follow-upTries greedy“Need DP”DP + bisect+ Knuth’s interval algorithms
Online variantSilentSilentMentions+ competitive ratio analysis
Complexity“Fast”O(n log n)+ Ω(n log n) lower bound+ cache effects, parallel scheduling

12. Level Delta

Mid (L4):

  • Greedy with sort-by-end.
  • Correct >= vs > for half-open.
  • Counts removed (not kept).

Senior (L5):

  • Counterexample to sort-by-start.
  • Informal exchange-argument explanation.
  • Mentions weighted variant needs DP.

Staff (L6):

  • Formal exchange argument.
  • Knows weighted variant = LC 1235; sketches DP.
  • Discusses interval graph & max independent set connection.

Principal (L7+):

  • Discusses online competitive algorithms for unseen-future variants.
  • Discusses multi-machine scheduling generalization (becomes NP-hard for many machines; greedy approximation).
  • Discusses integer programming formulation for variants with complex constraints.
  • Discusses real-world OR systems (operating rooms, crew scheduling) and where greedy breaks.

13. Follow-up Q&A

Q1: Each interval has a WEIGHT — maximize sum of kept weights? A: LC 1235. Sort by end; for each interval i, dp[i] = max(dp[i-1], weight[i] + dp[j]) where j = bisect_right(ends, start[i]) - 1. O(n log n).

Q2: K parallel machines — max non-overlapping per machine? A: Hungarian-style assignment; or interval graph k-coloring. For k=2, equivalent to bipartite matching. For general k, harder.

Q3: All intervals MUST be kept, minimize CONFLICTS? A: Different problem — count overlapping pairs. Use sweep-line; O(n log n).

Q4: Approximation when greedy can’t see future? A: Online competitive algorithms; randomized achieves O(log n) competitive.

Q5: Closed intervals — does the code change? A: Change >= to >. One character.

Q6: Maximize NUMBER of intervals such that no point covered by more than k? A: Generalization. Use min-heap of end times (similar to Meeting Rooms II); if heap size > k, must skip current.


14. Solution Walkthrough

See solution.py:

  • erase_overlap_intervals — greedy sort-by-end. Canonical.
  • erase_overlap_intervals_sort_by_start_WRONG — counterexample-failing version (educational).
  • erase_overlap_intervals_dp — sort-by-end + DP (equivalent answer, slower).
  • erase_overlap_intervals_brute — try all 2^n subsets for small n (oracle).
  • edge_cases() + stress_test().

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“I see three patterns junior engineers get wrong with interval problems, and this code review touches all of them. (1) Sorting by start instead of end — the counterexample is so common ([[1, 100], [2, 3], [4, 5]]) that I keep it in our team docs. If your greedy looks suspiciously like sort-by-start, write it on a whiteboard and try that 3-element input. (2) The half-open vs closed flip costs us about one bug per quarter in our calendar service; we now type-tag intervals as Closed[T] or HalfOpen[T] and the operators are method calls (.overlaps(other)) — no operator confusion. (3) When the requirement grows from ‘count’ to ‘weighted sum,’ DON’T try to patch the greedy. Throw it out and write the DP. Greedy doesn’t generalize.”

Connections forward:

  • p46 / p47 — Merge / Insert Intervals (different invariants).
  • p49 — Meeting Rooms II (count peak overlaps).
  • LC 1235 — Maximum Profit Job Scheduling (weighted variant).
  • LC 452 — Minimum arrows to burst balloons (same greedy structure!).
  • LC 646 — Maximum length pair chain (literally the same problem in disguise).
  • CLRS Chapter 16 — Greedy algorithms; activity selection.
  • Production: operating-room scheduling, batch-job placement, ad-slot allocation, OR systems.