Lab 01 — Interval Scheduling (Activity Selection)

Goal

Master the canonical greedy problem: maximum non-overlapping interval selection. Internalize the earliest-end-time-first greedy and produce its exchange argument out loud, without help, in under 90 seconds.

Background

Interval scheduling is the prototype greedy problem in every algorithms textbook because it has the cleanest exchange argument and the largest gap between “intuitive but wrong” choices (earliest start, shortest duration, fewest conflicts) and the one correct one (earliest end). Mastering this lab is mastering the discipline of proof-before-code for the rest of Phase 6. The same exchange-argument template recurs in LC 452 (minimum arrows to burst balloons), LC 1353 (maximum events attended), and dozens of variants.

Interview Context

A staff-level interviewer at FAANG-tier companies will not accept “I’ll sort by end time” without a justification. The signal they’re testing for is: can this candidate distinguish between intuition and proof? The exchange argument, delivered cleanly in 60–90 seconds before code, is the strongest possible signal. Conversely, candidates who code first and then mumble “earliest end is correct because… well… it just is” almost always fail the question even when their code passes the tests.

Problem Statement

Given n activities, each with a start time s_i and an end time e_i, select the maximum-cardinality subset of activities that are pairwise non-overlapping (an activity ending at time t does not overlap with one starting at time t — boundary touching is allowed). Return the size of that subset, or the subset itself if requested.

LeetCode reference: LC 435 — Non-overlapping Intervals (return the count of intervals to remove to make the rest non-overlapping; equivalent to n − maxNonOverlapping).

Constraints

  • 1 ≤ n ≤ 10^5
  • −5 · 10^4 ≤ s_i < e_i ≤ 5 · 10^4
  • Boundary contact (an interval ending at t and another starting at t) does NOT count as overlap.
  • Time complexity must be O(n log n); space O(1) extra (sort in place).

Clarifying Questions

  1. “Are intervals open, closed, or half-open?” — LC convention is half-open: [s, e) is a common interpretation; ask the interviewer.
  2. “Is an interval where s == e (zero-duration) allowed?” — usually yes, treat as a single-point interval.
  3. “Should I return the maximum non-overlapping count or the minimum to remove?” — LC 435 asks the latter; the answer is n − maxNonOverlapping.
  4. “Can the intervals be unsorted? Are they ever pre-sorted?” — assume unsorted unless stated; sort is part of the algorithm.
  5. “Are there ties on end time?” — yes; tie-break by start time ascending so the algorithm is deterministic and the proof handles ties cleanly.

Examples

  • [[1,2], [2,3], [3,4], [1,3]] → max non-overlapping = 3 (pick [1,2], [2,3], [3,4]); remove count = 1.
  • [[1,2], [1,2], [1,2]] → max non-overlapping = 1; remove count = 2.
  • [[1,2], [2,3]] → max non-overlapping = 2; remove count = 0 (boundary contact does not overlap).
  • [] → 0.

Initial Brute Force

Try every subset; for each, check pairwise non-overlap; return the size of the largest valid subset.

from itertools import combinations

def max_non_overlap_brute(intervals):
    n = len(intervals)
    best = 0
    for r in range(n + 1):
        for subset in combinations(intervals, r):
            ok = all(subset[i][1] <= subset[j][0]
                     for i in range(len(subset))
                     for j in range(len(subset))
                     if subset[i][0] < subset[j][0])
            if ok:
                best = max(best, len(subset))
    return best

Brute Force Complexity

Time O(2^n · n^2) — every subset checked pairwise. Space O(n) for combinations. Infeasible at n = 25. Useful only as a stress-test oracle for the greedy on n ≤ 12.

Optimization Path

  1. Brute force (above) — establishes correctness baseline.
  2. DP — sort by end time; dp[i] = max non-overlapping using intervals 0..i. Transition: dp[i] = max(dp[i-1], 1 + dp[predecessor(i)]) where predecessor(i) is the latest j < i with e_j ≤ s_i. Time O(n log n) with binary search. Space O(n). This is weighted interval scheduling’s structure when we drop weights.
  3. Greedy — sort by end time, scan once, pick whenever compatible. O(n log n) time, O(1) extra space. Optimality from the exchange argument below.

The DP step is worth deriving even though greedy beats it, because the moment intervals carry weights, greedy fails and DP is the only correct approach. Keep DP in your back pocket.

Final Expected Approach

Sort intervals by end time ascending (tie-break by start ascending). Maintain last_end = -∞. For each interval [s, e] in sorted order, if s ≥ last_end, accept it (count += 1, last_end = e); else skip. Return count.

For LC 435 the answer is n − count.

Data Structures Used

  • A sorted list / array of intervals.
  • One scalar last_end.
  • One counter count.

No heap, no DSU, no DP table. The algorithm is sort-and-scan.

Correctness Argument

This is the section the rest of Phase 6 hangs on. We use the canonical 4-step exchange argument.

Setup. Let G = g_1, g_2, …, g_k be the greedy solution: intervals sorted by end time, picked greedily. Let O = o_1, o_2, …, o_m be any optimal solution, also written in sorted-by-end-time order (we can always re-sort an optimal solution; non-overlap is preserved). Assume for contradiction m > k (O is strictly better than G); we will derive a contradiction by showing we can transform O into G step by step without decreasing its size — implying m ≤ k.

Step 1 — Locate the first divergence. Let i be the smallest index where g_i ≠ o_i. By construction, g_1 = o_1, …, g_{i-1} = o_{i-1}.

Step 2 — Compare end times at the divergence. Greedy picks the interval with the earliest end time among those compatible with g_1, …, g_{i-1} — equivalently, those compatible with o_1, …, o_{i-1}. So g_i.end ≤ o_i.end (with equality possible if there is a tie and o_i happens to be the tied alternative).

Step 3 — Exchange. Replace o_i with g_i in O, producing O' = o_1, …, o_{i-1}, g_i, o_{i+1}, …, o_m. We must verify (a) feasibility and (b) size.

  • Feasibility. g_i is compatible with o_{i-1} = g_{i-1} because greedy ensured this. g_i is compatible with o_{i+1}: o_{i+1}.start ≥ o_i.end ≥ g_i.end, so g_i ends no later than o_i did, and the rest of O was already non-overlapping with o_i’s end. So O' is feasible.
  • Size. |O'| = |O| (we replaced one interval with one interval).

Step 4 — Iterate. O' agrees with G on positions 1..i. Repeat the argument on O' versus G to find the next divergence. After at most min(k, m) exchanges, the resulting solution agrees with G on the first min(k, m) positions.

Conclude. If m > k, after k exchanges the transformed O looks like g_1, …, g_k, o_{k+1}, …, o_m. But greedy stopped at g_k, which means there was no interval compatible with g_1, …, g_k. Therefore o_{k+1} cannot exist — contradiction. So m ≤ k, i.e., G is at least as large as any feasible solution. G is optimal. QED.

Complexity

  • Time O(n log n) for the sort, O(n) for the scan; total O(n log n).
  • Space O(1) extra beyond the input (sort in place); O(log n) for sort recursion in some implementations.

Implementation Requirements

  • Sort key: (end, start) — explicit tie-break.
  • Boundary semantics: s_next ≥ e_prev is “compatible” (touching allowed). The interview’s clarifying question about open/closed determines this; default to half-open.
  • Handle empty input (n = 0 → return 0).
  • Single pass, no nested loop.

Tests

def test_interval_scheduling():
    # canonical
    assert max_non_overlap([[1,2],[2,3],[3,4],[1,3]]) == 3
    # all overlapping
    assert max_non_overlap([[1,2],[1,2],[1,2]]) == 1
    # no overlap
    assert max_non_overlap([[1,2],[2,3]]) == 2
    # empty
    assert max_non_overlap([]) == 0
    # single
    assert max_non_overlap([[5,10]]) == 1
    # tie on end time
    assert max_non_overlap([[1,3],[2,3],[3,4]]) == 2  # one of {[1,3],[2,3]} + [3,4]
    # negative coords
    assert max_non_overlap([[-5,0],[0,5],[5,10]]) == 3
    # nested
    assert max_non_overlap([[1,10],[2,3],[4,5],[6,7]]) == 3

Stress test: generate random n ≤ 12 and compare greedy to brute force on 1000 trials.

Follow-up Questions

  • Weighted version (each interval has weight w_i; maximize total weight of chosen non-overlapping). Greedy fails. Solution: DP with binary search predecessor pointer, O(n log n).
  • Online / streaming version (intervals arrive one at a time, decide accept/reject immediately, no recall). Different problem class — competitive ratio analysis.
  • K-machine extension (k parallel resources; each interval scheduled on any one of them; maximize total scheduled). Greedy with k “last_end” trackers + min-heap.

Product Extension

Calendar systems (Google Calendar’s “find a meeting time” feature, AWS spot-instance scheduling, ad slot allocation): the unweighted version is the prototype, and weighted variants are exactly what production schedulers solve.

Language / Runtime Follow-ups

  • Python: sorted(intervals, key=lambda x: (x[1], x[0])) — stable sort, tuple comparison handles tie-break for free.
  • Java: Arrays.sort(intervals, (a, b) -> a[1] != b[1] ? Integer.compare(a[1], b[1]) : Integer.compare(a[0], b[0])). Use Integer.compare to avoid integer-overflow on a[1] - b[1].
  • Go: sort.Slice(intervals, func(i, j int) bool { if intervals[i][1] != intervals[j][1] { return intervals[i][1] < intervals[j][1] }; return intervals[i][0] < intervals[j][0] }).
  • C++: std::sort with a lambda; prefer std::tie(a[1], a[0]) < std::tie(b[1], b[0]) for clarity.
  • JS/TS: intervals.sort((a, b) => a[1] - b[1] || a[0] - b[0]) — the || 0 falls through to start-comparison on tie.

Common Bugs

  • Sorting by start time instead of end time (intuitive, wrong).
  • Wrong tie-break (e.g., descending start) breaking determinism in the proof.
  • Using > instead of for compatibility, rejecting touching intervals.
  • Forgetting to update last_end after accepting an interval.
  • Returning the count when the question asked for the removal count (LC 435), or vice versa.

Debugging Strategy

If your greedy disagrees with brute force on some n ≤ 12 input:

  1. Print both solutions side by side.
  2. Find the first interval where they differ.
  3. Manually run the exchange argument step on that point — does the swap preserve feasibility? If not, your sort key or tie-break is wrong.
  4. If the exchange preserves feasibility but your code didn’t pick g_i, your scan logic has an off-by-one in the compatibility check.

Mastery Criteria

  • You can write the algorithm in <5 minutes from a blank screen, including the tie-break in the sort key.
  • You can deliver the 4-step exchange argument out loud in <90 seconds, without notes.
  • You can extend to LC 452 (minimum arrows to burst balloons) by recognizing it as the same problem with renaming, in <2 minutes.
  • You can articulate why the weighted version requires DP, in <30 seconds.
  • You correctly reject the wrong sort keys (earliest start, shortest duration, fewest conflicts) by giving a concrete counterexample for each, in <2 minutes total.

← Phase 6 README · Lab 02 — Jump Game II →