Lab 04 — Sparse Table for Range Minimum Queries
Goal
Implement a sparse table supporting O(1) range-min queries on a static array, after O(N log N) preprocessing. Internalize the “two overlapping intervals of the largest power-of-two length” trick. After this lab you should be able to write a sparse table from blank in under 10 minutes and instantly choose between sparse table and segment tree based on whether updates are required.
Background Concepts
A sparse table is a preprocessing structure for range queries on immutable arrays where the aggregate is idempotent — combining the same element twice gives the same result. Min, max, gcd, bitwise OR, bitwise AND, and “is-there-a-1-in-this-range” are idempotent. Sum is not (counts twice).
Construction. st[k][i] = min of arr[i .. i + 2^k - 1]. Build by:
st[0][i] = arr[i]for alli.st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)])— the range of length 2^k splits cleanly into two halves of length 2^(k-1).
Query. Given [l, r], let k = floor(log2(r - l + 1)). Then min(st[k][l], st[k][r - 2^k + 1]). The two intervals each have length 2^k, they cover [l, l + 2^k - 1] and [r - 2^k + 1, r], and their union is exactly [l, r] because l + 2^k - 1 ≥ r - 2^k + 1 whenever 2^k ≥ (r - l + 1) / 2, which holds by choice of k. They overlap, but for an idempotent op that’s harmless.
For O(1) per query you also need a precomputed log_floor[len] table — calling math.log2 each query has too much overhead and floating-point trouble.
Interview Context
Sparse tables show up in problems with a read-only array and many range-min/max queries. The signal: “static array, Q queries with Q ≫ N”. Common cousin: range-LCA via Euler tour + sparse table over depth array — Phase 4 territory but rooted here. Asked at: Google occasionally, CP-flavored shops always. Rejecting an O(log N) segment tree in favor of a sparse table when O(1) queries matter (e.g., 10⁷ queries on a 10⁵ array) is a senior-level signal.
Problem Statement
Given a static integer array arr of length N, build a structure that answers query(l, r) = min of arr[l..r] in O(1) per query.
Constraints
- 1 ≤ N ≤ 10⁵
- 1 ≤ Q ≤ 10⁷ queries
- 0 ≤ l ≤ r < N
- −10⁹ ≤ arr[i] ≤ 10⁹
Clarifying Questions
- Is the array static? (Yes — that’s the entire premise.)
- Inclusive endpoints? (Yes.)
- Min, or min and max? (Just min for this lab; max is identical with
min→max.)
Examples
arr = [3, 1, 4, 1, 5, 9, 2, 6]
query(0, 7) → 1
query(2, 5) → 1
query(4, 7) → 2
query(3, 3) → 1
Initial Brute Force
min(arr[l:r+1]) per query — O(N) per call. Total O(N · Q).
Brute Force Complexity
At N=10⁵, Q=10⁷: 10¹² ops. TLE by 6 orders of magnitude.
Optimization Path
Segment tree gives O(log N) per query, O(N) per build. Total O(Q log N) = ~2 × 10⁸ at the limits — borderline TLE in Python, fine in C++.
Sparse table gives O(1) per query, O(N log N) per build. Total O(N log N + Q) = ~2 × 10⁶ + 10⁷ = 1.2 × 10⁷ ops. Comfortable everywhere.
The deciding factor: updates. Sparse table is read-only. If the array mutates between queries, sparse table is wrong; segment tree is required. The interviewer asking “what if I want updates?” is a real follow-up — answer: “Switch to a segment tree; sparse table doesn’t support point updates without an O(N log N) full rebuild.”
Final Expected Approach
- Precompute
log_floor[1..N]vialog_floor[i] = log_floor[i // 2] + 1, base caselog_floor[1] = 0. - Allocate
stas a 2D array of size(K + 1) × NwhereK = log_floor[N]. st[0][i] = arr[i]for all i.- For
k = 1 .. K: fori = 0 .. N - 2^k:st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]). query(l, r):k = log_floor[r - l + 1]; returnmin(st[k][l], st[k][r - 2^k + 1]).
Data Structures Used
- 2D array
st[K+1][N], whereK = floor(log2(N)). - 1D array
log_floor[N+1].
Correctness Argument
By induction on k: st[0][i] = arr[i] (length-1 range, trivially correct). Given st[k-1][·] correct: st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]) covers [i, i + 2^(k-1) - 1] ∪ [i + 2^(k-1), i + 2^k - 1] = [i, i + 2^k - 1]. Min commutes over union.
For query, k = floor(log2(r - l + 1)) ⇒ 2^k ≤ len ≤ 2^(k+1) - 1 ⇒ 2^k ≥ len/2 ⇒ l + 2^k > r - 2^k, so the two intervals [l, l + 2^k - 1] and [r - 2^k + 1, r] overlap (or meet exactly), and their union is [l, r]. Min over the union equals min of the two.
Complexity
| Operation | Time | Space |
|---|---|---|
| Build | O(N log N) | O(N log N) |
| Query | O(1) | — |
At N=10⁵: K ≈ 17, total table cells ≈ 1.7 × 10⁶. At 8 bytes each, ~14 MB.
Implementation Requirements
class SparseTableMin:
def __init__(self, arr):
n = len(arr)
self.log = [0] * (n + 1)
for i in range(2, n + 1):
self.log[i] = self.log[i // 2] + 1
K = self.log[n]
self.st = [list(arr)] + [[0] * n for _ in range(K)]
for k in range(1, K + 1):
half = 1 << (k - 1)
for i in range(n - (1 << k) + 1):
self.st[k][i] = min(self.st[k-1][i], self.st[k-1][i + half])
def query(self, l, r):
k = self.log[r - l + 1]
return min(self.st[k][l], self.st[k][r - (1 << k) + 1])
Tests
- N=1:
query(0, 0) == arr[0]. - All same:
query(l, r) == arr[0]for all valid (l, r). - Sorted ascending:
query(l, r) == arr[l]. - Sorted descending:
query(l, r) == arr[r]. - Random: 10⁴ queries on a length-1000 random array vs brute force.
- Edge:
query(0, n-1)should equalmin(arr). - Power-of-two length and non-power-of-two length both must pass.
Follow-up Questions
- “Now also support range-max.” → second sparse table or pack
(min, max)into each cell. - “Now updates are required.” → switch to segment tree; sparse table cannot support O(log N) updates without rebuild.
- “Now I want range-sum.” → sum is not idempotent. You can still answer in O(log N) by combining
K = log(len)non-overlapping doubling intervals, but at that point segment tree is simpler. - “Range LCA.” → reduce to range-min on Euler tour depth array + sparse table over depths. Lab in Phase 4.
- “Reduce memory at the cost of complexity.” → Fischer–Heun (RMQ ±1) is O(N) preprocessing + O(1) query but conceptually heavy.
Product Extension
Static analytics dashboards (pre-aggregated, refreshed nightly) over time-series metrics: “min latency in this 5-minute window over the last 24 hours, sliding”. Pre-aggregate the time-series as a sparse table at end-of-day; serve queries at the dashboard at single-microsecond latencies. The “static” condition matches because the data is read-only between rebuilds.
Language/Runtime Follow-ups
- Python: list-of-lists is cache-unfriendly; flatten to one big list with manual indexing for ~3x speedup. PyPy if benchmarking.
- Java:
int[][] st = new int[K+1][n]. JIT will hoist invariants. For N=10⁶ allocate carefully. - Go:
[][]intis fine;make([]int, n)inside a loop is idiomatic. - C++:
vector<vector<int>> st(K + 1, vector<int>(n)). With-O2this is the canonical fast implementation. - JS/TS:
Int32Arrayper row beatsArrayfor numeric ops. JS doesn’t have integer log2 —Math.log2is float and slow; use(31 - Math.clz32(x))for 32-bit ints.
Common Bugs
- Computing
log2per query → floating-point rounding errors, e.g.log2(8) → 2.9999..., floored to 2. Always use the precomputedlog_floor[]. - Building
st[k][i]fori + 2^k - 1 ≥ N— out-of-bounds. Loop must end atn - 2^k. - Sizing
stwith too few rows: K = log_floor[N], but allocating K rows misses the K-th. Allocate K+1. - Using sparse table for sum and then puzzling over wrong answers — sum is not idempotent.
- Forgetting that the array must be static. If a query is interleaved with mutation, the structure silently returns stale answers.
- Off-by-one in query:
r - (1 << k) + 1vsr - (1 << k). The interval is[r - 2^k + 1, r]of length2^k— verify by hand on a tiny case.
Debugging Strategy
Print st[k] for small N=8 and verify by hand: st[0] is the array, st[1][i] = min(arr[i], arr[i+1]), st[2][i] = min(arr[i..i+3]), st[3][0] = min(arr[0..7]). If those don’t hold, your build loop is wrong. If queries fail but build is correct, suspect log_floor and/or query indexing — trace the formula on query(2, 5) where len=4, k=2, indices=2 and 5-4+1=2, both pointing at the same precomputed cell.
Mastery Criteria
- Stated the idempotence requirement and gave 3 ops that satisfy it and 1 that doesn’t.
- Wrote sparse table from scratch in <10 minutes.
-
Wrote
log_floortable without usingmath.log2. - Chose sparse table over segment tree for a read-only workload by stating the constants (1 vs log N per query).
- Identified the failure mode “what if updates are needed” and named segment tree as the replacement.
- Solved one classic RMQ problem and one problem reducible to RMQ.