Lab 05 — Coordinate Compression and Fenwick Tree: Count of Smaller Numbers After Self
Goal
Master coordinate compression as a preprocessing step, then use a Fenwick tree (BIT) over the compressed indices to count, for each element of an array, how many elements to its right are strictly smaller. Solve LC 315 in O(N log N). By the end, you can compress + scan + BIT-update from blank in <10 minutes.
Background
Many problems on integer arrays don’t actually depend on the values, only on the relative ordering. When values can be huge (10^9) but N is small (10^5), allocating an array indexed by value is impossible. Coordinate compression replaces each value v by its rank in sort(unique(values)), mapping the value space down to [0, N). This lets us index a BIT or segment tree by rank instead of value, swapping O(value_range) space for O(N).
The pairing of “compress, then BIT over ranks, then scan one direction” is one of the most reused patterns in CP: count inversions, count smaller-on-right, count pairs with sum in a range, K-th smallest in a sliding window — all instances of the same engine.
Interview Context
LC 315 (Hard) — Count of Smaller Numbers After Self. A classic FAANG senior-level question; also asked at quant firms for “count inversions” or “count pairs (i, j) with i < j and a_i > a_j”. Brute is O(N²) and obvious. The expected O(N log N) solution requires the candidate to either:
- Compress + BIT (this lab), or
- Modified merge sort (counts inversions during the merge step).
Both are valid; BIT is more general and extends to more variants. Senior candidates are expected to know both.
Problem Statement
Given an integer array nums of length N, return an array counts where counts[i] is the number of indices j > i such that nums[j] < nums[i].
LeetCode reference: LC 315 (Count of Smaller Numbers After Self).
Constraints
1 ≤ N ≤ 10^5.−10^4 ≤ nums[i] ≤ 10^4(LC); generalize to−10^9 ≤ nums[i] ≤ 10^9.- Time limit: 1 second.
Clarifying Questions
- “Strictly smaller, or
≤?” — strictly smaller (per LC). For≤, change query upper bound. - “Are duplicates allowed?” — yes. They must not count themselves as “smaller”.
- “Modify input allowed?” — generally yes; the compression step can sort a copy.
- “Return order?” — same order as input (
counts[i]aligned withnums[i]).
Examples
nums = [5, 2, 6, 1] → counts = [2, 1, 1, 0]. For5: indices 1 and 3 have values < 5. For2: only index 3. For6: only index 3. For1: nothing to the right.nums = [-1] → [0].nums = [-1, -1] → [0, 0]. Equal values don’t count.
Initial Brute Force
def count_smaller_brute(nums):
n = len(nums)
out = [0] * n
for i in range(n):
for j in range(i + 1, n):
if nums[j] < nums[i]:
out[i] += 1
return out
Brute Force Complexity
O(N²). For N = 10^5, that’s 10^10 ops — infeasible.
Optimization Path
- Brute (above).
- Modified merge sort. During merge, when an element from the right half is placed before an element from the left half, increment counters for all unplaced left-half elements.
O(N log N). - Coordinate compression + Fenwick tree. Compress values to ranks
[0, N). Scan right-to-left. For each element, query BIT prefix sum on[0, rank(v) − 1](= count of strictly smaller values already seen on the right), then update BIT atrank(v).O(N log N).
The BIT approach is more flexible: it handles “smaller”, “equal”, “in range”, “K-th smallest” with the same engine. The merge sort approach is more efficient in constant factor for pure inversion counting.
Final Expected Approach
Coordinate compression + Fenwick tree, scan right-to-left.
struct BIT {
vector<int> t;
BIT(int n) : t(n + 1, 0) {}
void update(int i, int v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
int query(int i) { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
};
vector<int> count_smaller(vector<int>& nums) {
int n = nums.size();
vector<int> sorted_nums(nums.begin(), nums.end());
sort(sorted_nums.begin(), sorted_nums.end());
sorted_nums.erase(unique(sorted_nums.begin(), sorted_nums.end()), sorted_nums.end());
auto rank_of = [&](int v) {
return (int)(lower_bound(sorted_nums.begin(), sorted_nums.end(), v) - sorted_nums.begin());
};
BIT bit(sorted_nums.size());
vector<int> result(n);
for (int i = n - 1; i >= 0; --i) {
int r = rank_of(nums[i]);
result[i] = (r > 0) ? bit.query(r - 1) : 0;
bit.update(r, 1);
}
return result;
}
Data Structures Used
- A sorted-uniqued copy of the input for rank lookup (O(N log N) sort).
- A Fenwick tree of size
unique_countfor prefix-sum updates and queries. - A result array of size
N.
Correctness Argument
Compression preserves order. lower_bound returns the first index whose value is ≥ v; for any v in the original array, this is the unique rank. So rank(a) < rank(b) ↔ a < b, and rank(a) == rank(b) ↔ a == b.
Right-to-left scan invariant. When processing index i, the BIT contains exactly the multiset of ranks for indices [i+1, n−1] (each updated by +1). The query bit.query(rank(nums[i]) − 1) returns the count of those ranks strictly less than rank(nums[i]), which equals the count of nums[j] < nums[i] for j > i. After the query, we insert rank(nums[i]) so it’s visible to the next (lower) i.
Edge case rank == 0. If nums[i] is the minimum, no value is strictly smaller, so result[i] = 0. The code guards r > 0 to avoid querying bit.query(-1).
Complexity
- Sort + unique:
O(N log N). - For each of
Nelements: one rank lookup (O(log N)), one BIT query (O(log N)), one BIT update (O(log N)). - Total:
O(N log N)time,O(N)space.
Implementation Requirements
- Use 1-indexed BIT internally (
++ion entry); 0-indexed externally. - After sort, deduplicate before binary search; otherwise rank would skip values for ties.
- Scan right-to-left; left-to-right would count smaller-on-left, a different problem.
- Handle
n = 0(return empty array) andn = 1(return[0]).
Tests
def test_count_smaller():
assert count_smaller([5, 2, 6, 1]) == [2, 1, 1, 0]
assert count_smaller([-1]) == [0]
assert count_smaller([-1, -1]) == [0, 0]
assert count_smaller([]) == []
assert count_smaller([1, 2, 3, 4]) == [0, 0, 0, 0] # already sorted asc
assert count_smaller([4, 3, 2, 1]) == [3, 2, 1, 0] # sorted desc
assert count_smaller([1, 1, 1]) == [0, 0, 0] # all equal
# Stress vs brute
import random
for _ in range(200):
n = random.randint(0, 50)
nums = [random.randint(-100, 100) for _ in range(n)]
assert count_smaller(nums) == count_smaller_brute(nums)
Follow-up Questions
- “Count strictly greater on right?” — change query to
bit.query(maxRank) − bit.query(rank(v)). - “Count of equal values on right?” —
bit.query(rank(v)) − bit.query(rank(v) − 1). - “Count of values in
[lo, hi]on right?” —bit.query(rank(hi)) − bit.query(rank(lo) − 1). - “Total inversions in array?” — sum the result array.
- “Online streaming version?” — same algorithm with a hash-rank assigned on-the-fly via a balanced BST or order-statistics tree (no compression possible until all values seen).
- “Why not segment tree?” — works equally well; BIT has 4× smaller constant and is shorter to code. Use seg tree if you need range max / range update.
Product Extension
Recommendation systems: “for each user’s recently watched item, how many of their next 10 watches were lower-rated?” — same problem on rating arrays. Quant trading: rank-based features (“how many of the next 100 ticks are below this tick’s price?”) computed in batch via this engine. Search ranking: “for each query, count the number of subsequent queries with shorter session length” — feature engineering pipelines.
Language / Runtime Follow-ups
- Python:
bisect.bisect_leftfor rank lookup; BIT as a plain list.sortedcontainers.SortedListis a 1-line alternative (sl.bisect_left(v); sl.add(v)) but ~5× slower than BIT in pure Python. - Java:
Arrays.binarySearchfor rank lookup,int[]BIT. - Go:
sort.SearchInts,[]intBIT. - C++:
lower_bound+vector<int>BIT. Thepbdsorder-statistics tree (tree<>from__gnu_pbds) givesfind_by_orderandorder_of_keydirectly, but is slower than BIT.
Common Bugs
- Forgetting to deduplicate after sort:
lower_boundstill works, but BIT size becomesNeven if values are mostly equal — wasted space, not incorrect. - Using
upper_boundinstead oflower_boundfor rank: gives wrong answer for duplicates. - Scanning left-to-right instead of right-to-left: solves a different problem.
- 1-indexed vs 0-indexed off-by-one in the BIT.
- Querying
rank − 1without checkingrank > 0:bit.query(-1)may return garbage or crash. - Comparing with
≤instead of<(depends on problem statement).
Debugging Strategy
If output is wrong:
- Print compressed ranks alongside original values; check ordering preserved.
- Print BIT state after each insertion (small input). Verify prefix sums equal “count of values ≤ rank”.
- Run on
[5, 2, 6, 1]and trace right-to-left: ati=3, BIT empty, result=0. Insert rank(1). Ati=2, query rank(6)−1 → count of values with rank ≤ rank(6)−1 already in BIT → expect 1. - Compare against brute on
N ≤ 30random inputs.
Mastery Criteria
-
Write coordinate compression (
sort+unique+lower_bound) from blank in <2 minutes. - Write Fenwick tree (update + query) from memory in <3 minutes.
- Articulate the right-to-left scan invariant in one sentence.
- Adapt the engine to “count in range [lo, hi]” or “total inversions” in <5 minutes.
- Recognize “values up to 10^9, N up to 10^5, count-by-rank query” as the compression+BIT trigger in <60 seconds.
← Lab 04 — Sweep Line · Phase 7 README · Lab 06 — Stress Testing →