Lab 04 — Sparse Table for Range Minimum Queries

Goal

Implement a sparse table supporting O(1) range-min queries on a static array, after O(N log N) preprocessing. Internalize the “two overlapping intervals of the largest power-of-two length” trick. After this lab you should be able to write a sparse table from blank in under 10 minutes and instantly choose between sparse table and segment tree based on whether updates are required.

Background Concepts

A sparse table is a preprocessing structure for range queries on immutable arrays where the aggregate is idempotent — combining the same element twice gives the same result. Min, max, gcd, bitwise OR, bitwise AND, and “is-there-a-1-in-this-range” are idempotent. Sum is not (counts twice).

Construction. st[k][i] = min of arr[i .. i + 2^k - 1]. Build by:

  • st[0][i] = arr[i] for all i.
  • st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]) — the range of length 2^k splits cleanly into two halves of length 2^(k-1).

Query. Given [l, r], let k = floor(log2(r - l + 1)). Then min(st[k][l], st[k][r - 2^k + 1]). The two intervals each have length 2^k, they cover [l, l + 2^k - 1] and [r - 2^k + 1, r], and their union is exactly [l, r] because l + 2^k - 1 ≥ r - 2^k + 1 whenever 2^k ≥ (r - l + 1) / 2, which holds by choice of k. They overlap, but for an idempotent op that’s harmless.

For O(1) per query you also need a precomputed log_floor[len] table — calling math.log2 each query has too much overhead and floating-point trouble.

Interview Context

Sparse tables show up in problems with a read-only array and many range-min/max queries. The signal: “static array, Q queries with Q ≫ N”. Common cousin: range-LCA via Euler tour + sparse table over depth array — Phase 4 territory but rooted here. Asked at: Google occasionally, CP-flavored shops always. Rejecting an O(log N) segment tree in favor of a sparse table when O(1) queries matter (e.g., 10⁷ queries on a 10⁵ array) is a senior-level signal.

Problem Statement

Given a static integer array arr of length N, build a structure that answers query(l, r) = min of arr[l..r] in O(1) per query.

Constraints

  • 1 ≤ N ≤ 10⁵
  • 1 ≤ Q ≤ 10⁷ queries
  • 0 ≤ l ≤ r < N
  • −10⁹ ≤ arr[i] ≤ 10⁹

Clarifying Questions

  1. Is the array static? (Yes — that’s the entire premise.)
  2. Inclusive endpoints? (Yes.)
  3. Min, or min and max? (Just min for this lab; max is identical with minmax.)

Examples

arr = [3, 1, 4, 1, 5, 9, 2, 6]
query(0, 7) → 1
query(2, 5) → 1
query(4, 7) → 2
query(3, 3) → 1

Initial Brute Force

min(arr[l:r+1]) per query — O(N) per call. Total O(N · Q).

Brute Force Complexity

At N=10⁵, Q=10⁷: 10¹² ops. TLE by 6 orders of magnitude.

Optimization Path

Segment tree gives O(log N) per query, O(N) per build. Total O(Q log N) = ~2 × 10⁸ at the limits — borderline TLE in Python, fine in C++.

Sparse table gives O(1) per query, O(N log N) per build. Total O(N log N + Q) = ~2 × 10⁶ + 10⁷ = 1.2 × 10⁷ ops. Comfortable everywhere.

The deciding factor: updates. Sparse table is read-only. If the array mutates between queries, sparse table is wrong; segment tree is required. The interviewer asking “what if I want updates?” is a real follow-up — answer: “Switch to a segment tree; sparse table doesn’t support point updates without an O(N log N) full rebuild.”

Final Expected Approach

  1. Precompute log_floor[1..N] via log_floor[i] = log_floor[i // 2] + 1, base case log_floor[1] = 0.
  2. Allocate st as a 2D array of size (K + 1) × N where K = log_floor[N].
  3. st[0][i] = arr[i] for all i.
  4. For k = 1 .. K: for i = 0 .. N - 2^k: st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]).
  5. query(l, r): k = log_floor[r - l + 1]; return min(st[k][l], st[k][r - 2^k + 1]).

Data Structures Used

  • 2D array st[K+1][N], where K = floor(log2(N)).
  • 1D array log_floor[N+1].

Correctness Argument

By induction on k: st[0][i] = arr[i] (length-1 range, trivially correct). Given st[k-1][·] correct: st[k][i] = min(st[k-1][i], st[k-1][i + 2^(k-1)]) covers [i, i + 2^(k-1) - 1] ∪ [i + 2^(k-1), i + 2^k - 1] = [i, i + 2^k - 1]. Min commutes over union.

For query, k = floor(log2(r - l + 1))2^k ≤ len ≤ 2^(k+1) - 12^k ≥ len/2l + 2^k > r - 2^k, so the two intervals [l, l + 2^k - 1] and [r - 2^k + 1, r] overlap (or meet exactly), and their union is [l, r]. Min over the union equals min of the two.

Complexity

OperationTimeSpace
BuildO(N log N)O(N log N)
QueryO(1)

At N=10⁵: K ≈ 17, total table cells ≈ 1.7 × 10⁶. At 8 bytes each, ~14 MB.

Implementation Requirements

class SparseTableMin:
    def __init__(self, arr):
        n = len(arr)
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        K = self.log[n]
        self.st = [list(arr)] + [[0] * n for _ in range(K)]
        for k in range(1, K + 1):
            half = 1 << (k - 1)
            for i in range(n - (1 << k) + 1):
                self.st[k][i] = min(self.st[k-1][i], self.st[k-1][i + half])

    def query(self, l, r):
        k = self.log[r - l + 1]
        return min(self.st[k][l], self.st[k][r - (1 << k) + 1])

Tests

  • N=1: query(0, 0) == arr[0].
  • All same: query(l, r) == arr[0] for all valid (l, r).
  • Sorted ascending: query(l, r) == arr[l].
  • Sorted descending: query(l, r) == arr[r].
  • Random: 10⁴ queries on a length-1000 random array vs brute force.
  • Edge: query(0, n-1) should equal min(arr).
  • Power-of-two length and non-power-of-two length both must pass.

Follow-up Questions

  1. “Now also support range-max.” → second sparse table or pack (min, max) into each cell.
  2. “Now updates are required.” → switch to segment tree; sparse table cannot support O(log N) updates without rebuild.
  3. “Now I want range-sum.” → sum is not idempotent. You can still answer in O(log N) by combining K = log(len) non-overlapping doubling intervals, but at that point segment tree is simpler.
  4. “Range LCA.” → reduce to range-min on Euler tour depth array + sparse table over depths. Lab in Phase 4.
  5. “Reduce memory at the cost of complexity.” → Fischer–Heun (RMQ ±1) is O(N) preprocessing + O(1) query but conceptually heavy.

Product Extension

Static analytics dashboards (pre-aggregated, refreshed nightly) over time-series metrics: “min latency in this 5-minute window over the last 24 hours, sliding”. Pre-aggregate the time-series as a sparse table at end-of-day; serve queries at the dashboard at single-microsecond latencies. The “static” condition matches because the data is read-only between rebuilds.

Language/Runtime Follow-ups

  • Python: list-of-lists is cache-unfriendly; flatten to one big list with manual indexing for ~3x speedup. PyPy if benchmarking.
  • Java: int[][] st = new int[K+1][n]. JIT will hoist invariants. For N=10⁶ allocate carefully.
  • Go: [][]int is fine; make([]int, n) inside a loop is idiomatic.
  • C++: vector<vector<int>> st(K + 1, vector<int>(n)). With -O2 this is the canonical fast implementation.
  • JS/TS: Int32Array per row beats Array for numeric ops. JS doesn’t have integer log2 — Math.log2 is float and slow; use (31 - Math.clz32(x)) for 32-bit ints.

Common Bugs

  1. Computing log2 per query → floating-point rounding errors, e.g. log2(8) → 2.9999..., floored to 2. Always use the precomputed log_floor[].
  2. Building st[k][i] for i + 2^k - 1 ≥ N — out-of-bounds. Loop must end at n - 2^k.
  3. Sizing st with too few rows: K = log_floor[N], but allocating K rows misses the K-th. Allocate K+1.
  4. Using sparse table for sum and then puzzling over wrong answers — sum is not idempotent.
  5. Forgetting that the array must be static. If a query is interleaved with mutation, the structure silently returns stale answers.
  6. Off-by-one in query: r - (1 << k) + 1 vs r - (1 << k). The interval is [r - 2^k + 1, r] of length 2^k — verify by hand on a tiny case.

Debugging Strategy

Print st[k] for small N=8 and verify by hand: st[0] is the array, st[1][i] = min(arr[i], arr[i+1]), st[2][i] = min(arr[i..i+3]), st[3][0] = min(arr[0..7]). If those don’t hold, your build loop is wrong. If queries fail but build is correct, suspect log_floor and/or query indexing — trace the formula on query(2, 5) where len=4, k=2, indices=2 and 5-4+1=2, both pointing at the same precomputed cell.

Mastery Criteria

  • Stated the idempotence requirement and gave 3 ops that satisfy it and 1 that doesn’t.
  • Wrote sparse table from scratch in <10 minutes.
  • Wrote log_floor table without using math.log2.
  • Chose sparse table over segment tree for a read-only workload by stating the constants (1 vs log N per query).
  • Identified the failure mode “what if updates are needed” and named segment tree as the replacement.
  • Solved one classic RMQ problem and one problem reducible to RMQ.