p47 — Insert Interval

Source: LeetCode 57 · Medium · Topics: Array Companies (2024–2025 frequency): Google (very high), Meta (high), Amazon (high), Microsoft (medium-high), Bloomberg (medium), LinkedIn (medium) Loop position: Common Medium, frequently paired with p46 Merge Intervals as a two-problem set. Tests whether you exploit pre-sorted input or wastefully re-sort.

1. Quick Context

You’re given a list of non-overlapping, sorted-by-start intervals and a new interval. Insert the new interval and merge if necessary; return the resulting list.

The lazy approach: Append new to the list, re-sort, run Merge Intervals (p46). O(n log n). Works but ignores the pre-sortedness — wasted log-factor.

The right approach: Three-phase linear scan in O(n):

  1. Before: copy all intervals that END strictly before new.start.
  2. Merge: while the current interval OVERLAPS new, expand new to [min(starts), max(ends)] and skip; finally append the expanded new.
  3. After: copy all remaining intervals.
out = []
i, n = 0, len(intervals)
# Phase 1: strictly before
while i < n and intervals[i][1] < new[0]:
    out.append(intervals[i]); i += 1
# Phase 2: merge overlapping
while i < n and intervals[i][0] <= new[1]:
    new = [min(new[0], intervals[i][0]), max(new[1], intervals[i][1])]
    i += 1
out.append(new)
# Phase 3: strictly after
out.extend(intervals[i:])
return out

O(n) time, O(n) output. The senior signal is recognizing that re-sorting is wasteful.


🔗 https://leetcode.com/problems/insert-interval/

STOP. 15-min timer. Draw the three phases on paper before coding.


3. Prerequisite Concepts

  • p46 Merge Intervals (the sub-procedure used by the lazy approach).
  • Linear-scan iteration with index variable.
  • Closed-interval overlap check a.start ≤ b.end AND b.start ≤ a.end.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Given disjoint sorted intervals and one new interval, insert + merge to maintain disjoint-sorted invariant.”

Step 2 — Clarify:

  • “Are existing intervals guaranteed sorted by start AND non-overlapping?” (LC: yes.)
  • “Closed-interval merge?” (LC: yes — [1, 2] and [2, 3] merge.)
  • “May new interval be empty (start > end)?” (LC: no.)
  • “Can I mutate input?” (LC: yes.)

Step 3 — Constraints: n ≤ 10^4. Both O(n) and O(n log n) work. The interview signals which one you reach for.

Step 4 — Examples:

  • intervals = [[1,3],[6,9]], new = [2,5] → [[1,5],[6,9]].
  • intervals = [[1,2],[3,5],[6,7],[8,10],[12,16]], new = [4,8] → [[1,2],[3,10],[12,16]].
  • intervals = [], new = [5,7] → [[5,7]].
  • intervals = [[1,5]], new = [2,3] → [[1,5]] (contained).
  • intervals = [[1,5]], new = [6,8] → [[1,5],[6,8]] (after).

Step 5 — Brute force: Append + re-sort + run merge. O(n log n).

Step 6 — Complexity:

  • Lazy (append + sort + merge): O(n log n).
  • Three-phase scan: O(n).
  • Binary search to find phase boundaries: O(log n) for boundaries + O(k) merge where k = #merged. Worst-case O(n) but typically faster.

Step 7 — Pattern: Pre-sorted-input optimization. Whenever input is sorted, ASK if you can leverage that to drop a log factor.

Step 8 — Develop: see solution.py.

Step 9 — Test: empty list; new before all; new after all; new merges with one; new merges with many; new contained; new contains all.


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — The three phases, formalized

6.1 Phase definitions

Let new = [ns, ne]. Partition the input intervals into three groups:

  • B (Before): i.end < ns — strictly before new; cannot merge.
  • M (Merge): i.start ≤ ne AND i.end ≥ ns — overlaps new; must merge.
  • A (After): i.start > ne — strictly after new; cannot merge.

After sorting by start (input is already sorted), these groups appear in order: B, then M, then A. The scan walks through them in sequence.

Critical detail: the boundaries depend on the overlap check. For closed intervals: i.end < ns means “strictly before”; i.end ≥ ns AND i.start ≤ ne means “overlaps.”

6.2 The merge update

While in phase M, the new interval EXPANDS:

new = [min(new[0], i[0]), max(new[1], i[1])]

Both min and max are required:

  • min because the merged interval starts at the EARLIEST of all merged starts (could be the original new or some intervals[i]).
  • max because the merged interval ends at the LATEST of all merged ends.

After phase M terminates, append new (now expanded) ONCE, then continue with phase A.

6.3 Why one-pass O(n) beats the O(n log n) lazy approach

The lazy approach treats the problem as unsorted: append, re-sort, run merge. The re-sort is n log n work to recover what we already knew. The three-phase scan exploits the existing order: each interval is visited exactly once, and we know its phase from a simple comparison.

For n = 10^4, both are fast. For n = 10^7 (in-memory database with timestamp ranges, say), the log-factor matters: 10^7 vs ~2.3 × 10^8 ops.

6.4 Binary search optimization

The boundary between B and M can be found by binary-searching for the first interval with end ≥ ns. Similarly, the boundary between M and A. Each is O(log n).

If only a few intervals merge (typical case in production), this gives O(log n) total work — a huge win when n is large and updates are dense.

import bisect
# left boundary: first interval whose end >= ns
b_end = bisect.bisect_left([i[1] for i in intervals], ns)
# right boundary: first interval whose start > ne  
a_start = bisect.bisect_right([i[0] for i in intervals], ne)

In practice, building those projection lists costs O(n), so use a SortedList-of-ends or maintain dual indices. Mention this as the senior+ optimization.

6.5 Mutation vs immutability

The three-phase scan can build out as a fresh list (clean) or mutate intervals (savvier). Mutation:

  • Splice out the M intervals, replace with the single expanded new.
  • O(n) due to list shifts in Python; not really faster, just lower memory.

Production: prefer immutable returns. Frees the caller from aliasing concerns and pairs with type-safe interval types.

6.6 Half-open variant

For [start, end) intervals: change < to <= in phase 1 boundary (i.end <= ns means “strictly before”), and <= to < in phase 2 (i.start < ne means “overlaps”). One-liner difference, but ASK first.

6.7 Online insertion into a sorted-interval data structure

If you maintain a long-lived sorted set of intervals and insert is frequent, use a SortedList (or interval tree):

  • Find insertion point with bisect: O(log n).
  • Scan left and right for overlapping neighbors; merge.
  • Remove subsumed; insert expanded new. Amortized O(log n + k) per insert. This is the production answer; LC asks for the batch one-shot.

7. Anti-Pattern Analysis

  1. Append + sort + run Merge Intervals — works, but wastes the pre-sortedness; senior interviewers downgrade.
  2. Skip the min in new[0] = ... — if new started LATER than some merged interval, you’d shrink the merged interval’s start. Wrong.
  3. Forget max — same issue from p46.
  4. Forget to append new before phase A — drops the merged interval entirely.
  5. Loop with for over intervals and break instead of while with index — phase A becomes awkward (need to remember where you stopped).
  6. Treat overlap as i.end <= ns strict-less-than when problem is closed — misses the touch-merge case.
  7. Re-check the same interval against new after new has expanded — already handled by the while advancing i; only a bug if you forget to advance.
  8. Use extend([new]) on outappend(new) is cleaner.
  9. Returning the input list with intervals.insert(...) — O(n) per insert AND breaks the merge logic for multi-overlap.
  10. Computing new lazily then forgetting to push — the “phase 2 exit point” is the most common bug site.

8. Skills & Takeaways

  • Three-phase scan as a pattern for “insert into sorted disjoint structure.”
  • Pre-sorted-input optimization — drop log factor by exploiting order.
  • min(starts), max(ends) as the universal merge formula.
  • Closed vs half-open boundary semantics.
  • Binary search to locate phase boundaries (O(log n) senior-level optimization).
  • SortedList / interval tree as the production data structure for online inserts.

9. When to Move On

Move on when:

  • You write the three-phase scan in ≤ 10 min from memory.
  • You explain WHY both min and max are needed in the merge update.
  • You ASK about endpoint semantics (closed vs half-open).
  • You can sketch the binary-search optimization in 2 min.
  • Stress test on 1000 random inputs agrees with the append+sort+merge oracle.

10. Company Context

Google:

  • L4/L5 onsite Medium; usually paired with p46.
  • Bar Raiser: “Exploited pre-sorted input for O(n); didn’t re-sort.”

Meta:

  • L5+ phone screen. The three-phase scan must be on-the-fly correct.

Amazon:

  • “Insert a new shift into a driver’s existing schedule” — same problem, productized phrasing.

Microsoft:

  • L62+. Variant: “Now do it 1000 times — maintain the structure across many inserts.” (Online variant.)

Bloomberg:

  • “Add a new trading-hour exception to an existing market hours list.”

LinkedIn:

  • Calendar conflict detection.

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
First instinctAppend + sort + mergeThree-phase scan+ explains why pre-sortedness matters+ mentions binary-search boundaries
Merge updateJust max (BUG)min AND max+ explains both+ immutable rebuild for safety
Edge: empty inputCrashesHandles+ explicitly tests+ parameterizes for Optional
Edge: new before all / after allForgets phase AHandles+ names the three phases+ symmetric reasoning
ProductionSilentSilent“SortedList for online”+ interval tree, segment tree, persistent data structures

12. Level Delta

Mid (L4):

  • Three-phase scan working.
  • min and max in merge update.
  • Handles empty input.

Senior (L5):

  • ASKs endpoint semantics.
  • Articulates O(n) vs O(n log n) trade.
  • Mentions binary-search boundaries as further optimization.

Staff (L6):

  • Discusses SortedList / interval tree for online inserts.
  • Discusses 2D extension (rectangle insert) — much harder.
  • Discusses immutable persistent interval structures for transactional systems.

Principal (L7+):

  • Discusses persistent / functional interval trees for time-travel queries.
  • Discusses CRDT-style merge for distributed calendar systems.
  • Discusses online + concurrent insert with lock-free data structures.
  • Discusses bulk insert when many intervals are inserted at once: sort batch first, then sweep through the existing structure once.

13. Follow-up Q&A

Q1: What if many inserts in sequence — maintain state across calls? A: SortedList keyed by start. Per insert: bisect to find left neighbor; scan right while overlapping; remove + replace. Amortized O(log n + k) per insert, k = #merged.

Q2: Insert BULK (many new intervals at once)? A: Sort the new batch; do a merged sweep through (existing ∪ new) using two-pointer; output the union as a fresh merged list. O((n + m) log m) where m = batch size, or O(n + m) if both are pre-sorted.

Q3: Half-open intervals? A: Switch the overlap check from <= to < at the phase boundaries.

Q4: Insert with a CAPACITY constraint (each interval covers up to k items)? A: Becomes a sweep-line accumulation problem. Harder; not a simple insert anymore.

Q5: Insert into a DISK-resident sorted interval store? A: B-tree or LSM-tree of intervals; bulk-merge strategy similar to Q2. Production: LevelDB/RocksDB with interval keys.

Q6: Delete an interval — symmetric problem? A: Find intervals overlapping the target via bisect; for each, either remove (if fully contained) or split into pieces (if it spans the boundary). O(log n + k).


14. Solution Walkthrough

See solution.py:

  • insert_interval — three-phase O(n) scan canonical answer.
  • insert_interval_lazy — append + sort + merge oracle.
  • insert_interval_binary — binary-search boundary optimization.
  • edge_cases() + stress_test() — 1000 random inputs; all three agree.

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“Three things. (1) The three-phase scan reads beautifully because the phases are NAMED in comments. Make the comments # Phase 1: before, # Phase 2: merge, # Phase 3: after. Future-you and reviewers immediately know what each loop does. (2) For our calendar-conflict service, we use a SortedList of (start, end) tuples with online inserts; this batch problem is a unit test against a brute oracle, not production. Don’t conflate the two. (3) If the input intervals are immutable references (shared with other code), the lazy approach mutates them via re-sort. Prefer the three-phase scan precisely because it never sorts the input.”

Connections forward:

  • p46 — Merge Intervals (subroutine for the lazy approach).
  • p48 — Non-overlapping Intervals.
  • p49 — Meeting Rooms II.
  • p50 — Employee Free Time.
  • LC 715 — Range Module (online interval add/remove/query).
  • LC 731 — My Calendar II (double-booking detection).
  • LC 732 — My Calendar III (max active overlap).
  • Production: calendar systems, scheduling, CRDT-based collaborative editing of time ranges.