Lab 04 — Sweep Line: The Skyline Problem
Goal
Master the sweep-line paradigm by solving the canonical Skyline Problem (LC 218). Process N rectangles by sweeping left-to-right over event points, maintaining a set of active heights, and emitting key points when the maximum height changes. By the end, you can write the full algorithm from blank in <15 minutes.
Background
Sweep line is a meta-technique: convert geometric or interval problems on N objects into a sequence of 2N events (one when an object enters, one when it leaves), sort the events, then process them in order while maintaining a dynamic data structure (multiset, segment tree, BIT) that summarizes the currently active set. The cost shifts from “compare every pair” O(N²) to “sort + log-time updates” O(N log N).
The skyline problem is the canonical sweep-line interview question because it forces every component: event extraction, event sorting (with non-obvious tie-breaking), a dynamic max query, and careful output deduplication. Internalize this and you can derive sweep-line variants for rectangle area unions, interval intersection counting, point-in-rectangle queries, and convex hull (Andrew’s monotone chain).
Interview Context
LC 218 (Hard) — by far the most-asked Hard at FAANG mid-level interviews when the panel wants to test sweep line. The bar is very high: a passing solution must be O(N log N), must handle ties correctly, must deduplicate output, and must not emit phantom key points. A candidate who hand-waves through tie-breaking will be rejected even with otherwise-correct code. Quant interviews use rectangle-union-area, which is the same engine with an integral instead of a max.
Problem Statement
Given N buildings, each described by [left_i, right_i, height_i], return the skyline outline as a list of key points [x, y], where y is the height of the skyline at x and key points appear only when the height changes. The last key point has y = 0 (where the rightmost building ends).
LeetCode reference: LC 218 (The Skyline Problem).
Constraints
1 ≤ N ≤ 10^4(LC), realistically up to10^5for harder variants.0 ≤ left_i < right_i ≤ 2^31 − 1.1 ≤ height_i ≤ 2^31 − 1.- Output is sorted by
x. No two consecutive key points have the samey.
Clarifying Questions
- “Are buildings axis-aligned and non-rotated?” — yes (skyline assumption).
- “Can buildings overlap?” — yes, freely.
- “Is the ground at
y = 0?” — yes; the skyline ends with[x_max, 0]. - “Output format: key points where height changes, or every event point?” — only changes; consecutive duplicates are bugs.
- “Tie-breaking for events at the same
x?” — opens before closes (the building exists at that exactx, contributing its height).
Examples
- Input:
[[2, 9, 10], [3, 7, 15], [5, 12, 12], [15, 20, 10], [19, 24, 8]]. - Output:
[[2, 10], [3, 15], [7, 12], [12, 0], [15, 10], [20, 8], [24, 0]].
Walking through: at x = 2, height jumps from 0 to 10 → emit [2, 10]. At x = 3, height jumps to 15 → emit [3, 15]. At x = 7, building of height 15 ends, max drops to 12 → emit [7, 12]. At x = 12, last “left-cluster” building ends → emit [12, 0]. Then the right cluster starts at 15 ([15, 10]), 19 doesn’t change max (still 10 since 8 < 10), 20 ends 10-building → emit [20, 8], 24 ends 8-building → emit [24, 0].
Initial Brute Force
For each x from 0 to x_max, compute the max height over all buildings covering x. Emit [x, h] whenever h changes.
def skyline_brute(buildings):
x_max = max(b[1] for b in buildings)
out = []
prev_h = 0
for x in range(x_max + 1):
h = max((b[2] for b in buildings if b[0] <= x < b[1]), default=0)
if h != prev_h:
out.append([x, h])
prev_h = h
return out
Brute Force Complexity
O(x_max · N). For coordinates up to 2^31, infeasible. Even on small CP-scale x_max = 10^9, no chance. Useful only as oracle on x_max ≤ 100.
Optimization Path
- Brute (above).
- Event sweep with sorted multiset. Generate
2Nevents:(left, -h, OPEN)and(right, h, CLOSE). Sort. Sweep, maintaining a multiset of active heights. Emit[x, max_active]when max changes.O(N log N). - Event sweep with a max-heap and lazy deletion. Same idea, but the heap doesn’t natively support deletion; instead, store
(height, end_time)pairs and pop stale entries lazily. Sometimes faster constant factor thanmultiset.O(N log N). - Divide and conquer. Split buildings into two halves, solve recursively, merge skylines (similar to merge sort).
O(N log N).
The multiset approach is the cleanest in C++/Java; Python defaults to the heap-with-lazy-deletion style.
Final Expected Approach
Heap with lazy deletion (Python idiom):
import heapq
def get_skyline(buildings):
events = []
for L, R, H in buildings:
events.append((L, -H, R)) # opening: negative height for max-heap via min-heap
events.append((R, 0, 0)) # closing sentinel: process at this x
events.sort()
result = []
heap = [(0, float('inf'))] # (negative height, end_time); ground is height 0 forever
i = 0
n = len(events)
while i < n:
x = events[i][0]
# Process all events at this x: add openings.
while i < n and events[i][0] == x:
L, neg_H, R = events[i]
if neg_H < 0: # an opening
heapq.heappush(heap, (neg_H, R))
i += 1
# Lazy-pop expired buildings.
while heap[0][1] <= x:
heapq.heappop(heap)
cur_max = -heap[0][0]
if not result or result[-1][1] != cur_max:
result.append([x, cur_max])
return result
Data Structures Used
- A list of events, sorted.
- A max-heap (or multiset / sorted set) of active heights with their end times.
- A result list with deduplication on consecutive
y.
Correctness Argument
Sweep correctness. Define H(x) = max heights of buildings covering x. The function H changes value only at event coordinates (the boundaries of buildings). So if we sample H at every event coordinate (in sorted order), we capture every change. The result is the unique sequence of changes in H, which is the skyline.
Tie-breaking at equal x. Multiple events can share an x: an opening, a closing, or both. We process all events at this x together: first add all openings (their buildings exist at this x, contributing their height), then lazy-pop closings (those buildings no longer cover any x ≥ this x). After all events at x are processed, the heap reflects the active set on [x, x+1), and we read cur_max. Emit [x, cur_max] if it differs from the previous emission. This handles “two buildings of different heights both starting at the same x” (emit the taller), “one building closing at the same x another opens” (emit the new max), and “two buildings of equal height with overlapping ranges” (no change at the boundary).
Lazy deletion correctness. We never pop a building from the heap until we’ve passed its end time. The heap’s top might be stale; we pop while heap[0].end ≤ x. Once we stop, the top is the current maximum. Since each building is pushed once and popped at most once, total heap work is O(N log N).
Complexity
O(N log N)time (sort + heap operations).O(N)space (events and heap).- Output size up to
O(N)key points.
Implementation Requirements
- Sort events with the right tie-breaking. The
(x, -h, R)tuple ordering naturally puts openings before closings at the samex(negative height < 0 < zero sentinel). - Process all events at the same
xtogether before emitting. - Deduplicate consecutive key points with equal
y. - Initialize the heap with the ground sentinel
(0, ∞)so it’s never empty.
Tests
def test_skyline():
bs = [[2,9,10],[3,7,15],[5,12,12],[15,20,10],[19,24,8]]
expected = [[2,10],[3,15],[7,12],[12,0],[15,10],[20,8],[24,0]]
assert get_skyline(bs) == expected
# Single building
assert get_skyline([[1, 2, 1]]) == [[1, 1], [2, 0]]
# Two same-x openings, different heights
assert get_skyline([[0, 5, 3], [0, 4, 5]]) == [[0, 5], [4, 3], [5, 0]]
# Stress vs brute on small inputs
import random
for _ in range(200):
N = random.randint(1, 6)
bs = []
for _ in range(N):
L = random.randint(0, 10)
R = random.randint(L + 1, L + 5)
H = random.randint(1, 5)
bs.append([L, R, H])
assert get_skyline(bs) == skyline_brute(bs)
Follow-up Questions
- “Total area covered (rectangle union)?” — same sweep, but accumulate
Δx · max_heightbetween consecutive events.O(N log N). - “Number of overlapping buildings at each
x?” — same events, but track the count of active buildings, not max.O(N log N). - “Online version where buildings stream in?” — segment tree over compressed coordinates, range max query.
O(N log N)total. - “K-th tallest building visible at
x?” — segment tree with “K-th order statistic” support, or a balanced BST.
Product Extension
Logging / event analytics: “concurrent active sessions over time” is the same engine with count instead of max. Cloud autoscaling decision: “what’s the peak demand in this 5-minute window?” Same engine with sum instead of max. Calendar conflict detection: pairs of overlapping events found by sweep + active-set membership. Real-time bidding (RTB): impression eligibility windows with priority-tier counts.
Language / Runtime Follow-ups
- Python:
heapqis min-heap, so use negative heights for max.SortedListfromsortedcontainersisO(log N)insert / delete / max — closest to C++ multiset. - Java:
TreeMap<Integer, Integer>mapping height to count, withlastKey()for max. OrPriorityQueuewith lazy deletion. - Go:
container/heapwith customLess; alternatively asort.IntSliceyou maintain manually. - C++:
multiset<int>with*rbegin()for max,erase(find(...))for single-instance removal. Orpriority_queue+ lazy deletion.
Common Bugs
- Tie-breaking: closing processed before opening at the same
xcauses phantom drops. - Forgetting the ground sentinel
(0, ∞)causes empty-heap crashes when all buildings expire. - Failing to deduplicate consecutive key points: emitting
[5, 10], [7, 10], [9, 10]instead of[5, 10]. - Removing both copies when two buildings have the same height by using
multiset.erase(value)(which erases all). Useerase(find(value)). - Confusing
<=vs<on the lazy-pop condition; off-by-one drops a building one event too early or late. - Sorting events by
xonly and breaking ties arbitrarily — leads to wrong output on dense inputs.
Debugging Strategy
If output has phantom key points or wrong heights:
- Print events after sorting; verify openings appear before closings at the same
x. - Print heap contents after processing each event; verify the top is the true max active height.
- Run on the smallest failing case and compare against brute on
x_max ≤ 30.
If output is missing key points:
- Check the deduplication condition; an off-by-one might filter out real changes.
- Verify all events at the same
xare processed before emitting.
Mastery Criteria
- Write the full skyline algorithm from blank in <15 minutes.
- Articulate the tie-breaking rule and why it’s needed in 30 seconds.
- Adapt the sweep to rectangle-union-area in <5 minutes.
- Recognize “intervals + boundary events + dynamic property” as the sweep-line trigger in <60 seconds.
-
State the
O(N log N)correctness argument in one sentence.
← Lab 03 — Sieve and Factorization · Phase 7 README · Lab 05 — Coordinate Compression →