Lab 05 — Coordinate Compression and Fenwick Tree: Count of Smaller Numbers After Self

Goal

Master coordinate compression as a preprocessing step, then use a Fenwick tree (BIT) over the compressed indices to count, for each element of an array, how many elements to its right are strictly smaller. Solve LC 315 in O(N log N). By the end, you can compress + scan + BIT-update from blank in <10 minutes.

Background

Many problems on integer arrays don’t actually depend on the values, only on the relative ordering. When values can be huge (10^9) but N is small (10^5), allocating an array indexed by value is impossible. Coordinate compression replaces each value v by its rank in sort(unique(values)), mapping the value space down to [0, N). This lets us index a BIT or segment tree by rank instead of value, swapping O(value_range) space for O(N).

The pairing of “compress, then BIT over ranks, then scan one direction” is one of the most reused patterns in CP: count inversions, count smaller-on-right, count pairs with sum in a range, K-th smallest in a sliding window — all instances of the same engine.

Interview Context

LC 315 (Hard) — Count of Smaller Numbers After Self. A classic FAANG senior-level question; also asked at quant firms for “count inversions” or “count pairs (i, j) with i < j and a_i > a_j”. Brute is O(N²) and obvious. The expected O(N log N) solution requires the candidate to either:

Compress + BIT (this lab), or
Modified merge sort (counts inversions during the merge step).

Both are valid; BIT is more general and extends to more variants. Senior candidates are expected to know both.

Problem Statement

Given an integer array nums of length N, return an array counts where counts[i] is the number of indices j > i such that nums[j] < nums[i].

LeetCode reference: LC 315 (Count of Smaller Numbers After Self).

Constraints

1 ≤ N ≤ 10^5.
−10^4 ≤ nums[i] ≤ 10^4 (LC); generalize to −10^9 ≤ nums[i] ≤ 10^9.
Time limit: 1 second.

Clarifying Questions

“Strictly smaller, or ≤?” — strictly smaller (per LC). For ≤, change query upper bound.
“Are duplicates allowed?” — yes. They must not count themselves as “smaller”.
“Modify input allowed?” — generally yes; the compression step can sort a copy.
“Return order?” — same order as input (counts[i] aligned with nums[i]).

Examples

nums = [5, 2, 6, 1] → counts = [2, 1, 1, 0]. For 5: indices 1 and 3 have values < 5. For 2: only index 3. For 6: only index 3. For 1: nothing to the right.
nums = [-1] → [0].
nums = [-1, -1] → [0, 0]. Equal values don’t count.

Initial Brute Force

def count_smaller_brute(nums):
    n = len(nums)
    out = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if nums[j] < nums[i]:
                out[i] += 1
    return out

Brute Force Complexity

O(N²). For N = 10^5, that’s 10^10 ops — infeasible.

Optimization Path

Brute (above).
Modified merge sort. During merge, when an element from the right half is placed before an element from the left half, increment counters for all unplaced left-half elements. O(N log N).
Coordinate compression + Fenwick tree. Compress values to ranks [0, N). Scan right-to-left. For each element, query BIT prefix sum on [0, rank(v) − 1] (= count of strictly smaller values already seen on the right), then update BIT at rank(v). O(N log N).

The BIT approach is more flexible: it handles “smaller”, “equal”, “in range”, “K-th smallest” with the same engine. The merge sort approach is more efficient in constant factor for pure inversion counting.

Final Expected Approach

Coordinate compression + Fenwick tree, scan right-to-left.

struct BIT {
    vector<int> t;
    BIT(int n) : t(n + 1, 0) {}
    void update(int i, int v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
    int query(int i) { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
};

vector<int> count_smaller(vector<int>& nums) {
    int n = nums.size();
    vector<int> sorted_nums(nums.begin(), nums.end());
    sort(sorted_nums.begin(), sorted_nums.end());
    sorted_nums.erase(unique(sorted_nums.begin(), sorted_nums.end()), sorted_nums.end());

    auto rank_of = [&](int v) {
        return (int)(lower_bound(sorted_nums.begin(), sorted_nums.end(), v) - sorted_nums.begin());
    };

    BIT bit(sorted_nums.size());
    vector<int> result(n);
    for (int i = n - 1; i >= 0; --i) {
        int r = rank_of(nums[i]);
        result[i] = (r > 0) ? bit.query(r - 1) : 0;
        bit.update(r, 1);
    }
    return result;
}

Data Structures Used

A sorted-uniqued copy of the input for rank lookup (O(N log N) sort).
A Fenwick tree of size unique_count for prefix-sum updates and queries.
A result array of size N.

Correctness Argument

Compression preserves order. lower_bound returns the first index whose value is ≥ v; for any v in the original array, this is the unique rank. So rank(a) < rank(b) ↔ a < b, and rank(a) == rank(b) ↔ a == b.

Right-to-left scan invariant. When processing index i, the BIT contains exactly the multiset of ranks for indices [i+1, n−1] (each updated by +1). The query bit.query(rank(nums[i]) − 1) returns the count of those ranks strictly less than rank(nums[i]), which equals the count of nums[j] < nums[i] for j > i. After the query, we insert rank(nums[i]) so it’s visible to the next (lower) i.

Edge case rank == 0. If nums[i] is the minimum, no value is strictly smaller, so result[i] = 0. The code guards r > 0 to avoid querying bit.query(-1).

Complexity

Sort + unique: O(N log N).
For each of N elements: one rank lookup (O(log N)), one BIT query (O(log N)), one BIT update (O(log N)).
Total: O(N log N) time, O(N) space.

Implementation Requirements

Use 1-indexed BIT internally (++i on entry); 0-indexed externally.
After sort, deduplicate before binary search; otherwise rank would skip values for ties.
Scan right-to-left; left-to-right would count smaller-on-left, a different problem.
Handle n = 0 (return empty array) and n = 1 (return [0]).

Tests

def test_count_smaller():
    assert count_smaller([5, 2, 6, 1]) == [2, 1, 1, 0]
    assert count_smaller([-1]) == [0]
    assert count_smaller([-1, -1]) == [0, 0]
    assert count_smaller([]) == []
    assert count_smaller([1, 2, 3, 4]) == [0, 0, 0, 0]   # already sorted asc
    assert count_smaller([4, 3, 2, 1]) == [3, 2, 1, 0]   # sorted desc
    assert count_smaller([1, 1, 1]) == [0, 0, 0]         # all equal

    # Stress vs brute
    import random
    for _ in range(200):
        n = random.randint(0, 50)
        nums = [random.randint(-100, 100) for _ in range(n)]
        assert count_smaller(nums) == count_smaller_brute(nums)

Follow-up Questions

“Count strictly greater on right?” — change query to bit.query(maxRank) − bit.query(rank(v)).
“Count of equal values on right?” — bit.query(rank(v)) − bit.query(rank(v) − 1).
“Count of values in [lo, hi] on right?” — bit.query(rank(hi)) − bit.query(rank(lo) − 1).
“Total inversions in array?” — sum the result array.
“Online streaming version?” — same algorithm with a hash-rank assigned on-the-fly via a balanced BST or order-statistics tree (no compression possible until all values seen).
“Why not segment tree?” — works equally well; BIT has 4× smaller constant and is shorter to code. Use seg tree if you need range max / range update.

Product Extension

Recommendation systems: “for each user’s recently watched item, how many of their next 10 watches were lower-rated?” — same problem on rating arrays. Quant trading: rank-based features (“how many of the next 100 ticks are below this tick’s price?”) computed in batch via this engine. Search ranking: “for each query, count the number of subsequent queries with shorter session length” — feature engineering pipelines.

Language / Runtime Follow-ups

Python: bisect.bisect_left for rank lookup; BIT as a plain list. sortedcontainers.SortedList is a 1-line alternative (sl.bisect_left(v); sl.add(v)) but ~5× slower than BIT in pure Python.
Java: Arrays.binarySearch for rank lookup, int[] BIT.
Go: sort.SearchInts, []int BIT.
C++: lower_bound + vector<int> BIT. The pbds order-statistics tree (tree<> from __gnu_pbds) gives find_by_order and order_of_key directly, but is slower than BIT.

Common Bugs

Forgetting to deduplicate after sort: lower_bound still works, but BIT size becomes N even if values are mostly equal — wasted space, not incorrect.
Using upper_bound instead of lower_bound for rank: gives wrong answer for duplicates.
Scanning left-to-right instead of right-to-left: solves a different problem.
1-indexed vs 0-indexed off-by-one in the BIT.
Querying rank − 1 without checking rank > 0: bit.query(-1) may return garbage or crash.
Comparing with ≤ instead of < (depends on problem statement).

Debugging Strategy

If output is wrong:

Print compressed ranks alongside original values; check ordering preserved.
Print BIT state after each insertion (small input). Verify prefix sums equal “count of values ≤ rank”.
Run on [5, 2, 6, 1] and trace right-to-left: at i=3, BIT empty, result=0. Insert rank(1). At i=2, query rank(6)−1 → count of values with rank ≤ rank(6)−1 already in BIT → expect 1.
Compare against brute on N ≤ 30 random inputs.

Mastery Criteria

Write coordinate compression (sort + unique + lower_bound) from blank in <2 minutes.
Write Fenwick tree (update + query) from memory in <3 minutes.
Articulate the right-to-left scan invariant in one sentence.
Adapt the engine to “count in range [lo, hi]” or “total inversions” in <5 minutes.
Recognize “values up to 10^9, N up to 10^5, count-by-rank query” as the compression+BIT trigger in <60 seconds.

← Lab 04 — Sweep Line · Phase 7 README · Lab 06 — Stress Testing →

LeetCode Interview — Extreme Coding