Lab 03 — Fenwick Tree (Binary Indexed Tree)

Goal

Implement a Fenwick tree (BIT) and use it to solve LeetCode 315 — Count of Smaller Numbers After Self. Internalize the bit-tricks (i & -i) and the 1-indexed convention so well that you can write a Fenwick tree in under 8 minutes from a blank page.

Background Concepts

A Fenwick tree is a clever encoding of prefix sums in O(N) space supporting prefix_sum(i) and point_update(i, delta) each in O(log N). The key insight: index i in 1-indexed form is associated with a “responsibility range” of size i & -i (the lowest set bit of i). Index 12 = 1100₂ has responsibility for the 4 values at positions 9..12. Index 8 = 1000₂ for 1..8. Walking up the tree (i += i & -i) accumulates non-overlapping responsibility ranges that span exactly [1, i] for query, and exactly the buckets that contain i for update.

The structure is invertible-only: it stores prefix sums and you derive range_sum(l, r) = prefix(r) - prefix(l - 1). This is fine for sum, xor, count, and “frequency-prefix” aggregates. It does not generalize to min/max because subtraction doesn’t undo a min.

For LC 315, the trick is coordinate compression + Fenwick of frequencies. Process the array right-to-left; for each nums[i], query “how many values strictly less than nums[i] have I seen so far?” by computing prefix(rank(nums[i]) - 1) on the frequency Fenwick; then increment update(rank(nums[i]), 1).

Interview Context

Fenwick trees are asked roughly as often as segment trees but the audience skews more competitive-programming. Stripe, Jane Street, Two Sigma, Bloomberg quant — all reach for them. The signal is “count inversions / count-of-X-after-Y / range-sum-with-updates and the aggregate is invertible”. Faster constants than segment tree, ~5x fewer lines of code; if both work, prefer Fenwick.

Problem Statement

Given an integer array nums, return an array counts where counts[i] is the number of elements to the right of nums[i] that are strictly smaller than nums[i].

Constraints

1 ≤ N ≤ 10⁵
−10⁴ ≤ nums[i] ≤ 10⁴

Clarifying Questions

Strictly smaller, or ≤? (Strictly smaller.)
Return order: same as input order? (Yes — counts[i] aligns with nums[i].)
Are duplicates allowed? (Yes — they don’t count toward “smaller”.)

Examples

nums = [5, 2, 6, 1]   →   counts = [2, 1, 1, 0]
   5: indices 1,3 (vals 2,1) are smaller → 2
   2: index 3 (val 1) is smaller → 1
   6: index 3 (val 1) is smaller → 1
   1: nothing to the right is smaller → 0

Initial Brute Force

For each i, scan j > i and count nums[j] < nums[i]. O(N²).

Brute Force Complexity

O(N²) time. At N=10⁵: 10¹⁰ ops. TLE by 4 orders of magnitude.

Optimization Path

Merge sort with inversion counting. During the merge step, when copying from the right half, every remaining element on the left half is strictly larger and hasn’t yet been placed — for each, increment its inversion count. O(N log N) time and space. Works, idiomatic.

Fenwick of frequencies after coordinate compression. Equally O(N log N), simpler to extend (e.g., to “count of values in [a, b] after self”).

For this lab, Fenwick is the assigned approach because it generalizes farther.

Final Expected Approach

Coordinate compression: build sorted_unique = sorted(set(nums)); map each value v to rank = bisect_left(sorted_unique, v) + 1 (1-indexed for Fenwick).
Right-to-left sweep: for each i from n-1 down to 0:
- r = rank[nums[i]]
- counts[i] = bit.prefix(r - 1) — count of strictly smaller previously-seen.
- bit.update(r, 1).
Return counts.

Data Structures Used

BIT(size) — Fenwick tree of size = number of distinct values.
rank map — value → 1-indexed compressed rank.
counts[] — output array.

Correctness Argument

After processing index i, the BIT contains exactly the multiset of ranks for nums[i+1..n-1] (the elements to the right of i, since we go right-to-left). bit.prefix(r - 1) returns the count of those whose rank is < r — i.e., strictly smaller than nums[i]. Coordinate compression preserves order, so “rank smaller” iff “value smaller”. The update bit.update(r, 1) then registers nums[i] for the next iteration. By induction the invariant “BIT == multiset of ranks of strictly-right-of-current” is preserved.

Complexity

Operation	Time	Space
Coordinate compression	O(N log N)	O(N)
Right-to-left sweep	O(N log N)	O(N) Fenwick
Total	O(N log N)	O(N)

Implementation Requirements

class BIT:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)   # 1-indexed

    def update(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s


from bisect import bisect_left

def countSmaller(nums):
    sorted_unique = sorted(set(nums))
    rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
    bit = BIT(len(sorted_unique))
    counts = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):
        r = rank[nums[i]]
        counts[i] = bit.prefix(r - 1)
        bit.update(r, 1)
    return counts

Tests

[5, 2, 6, 1] → [2, 1, 1, 0].
[1, 2, 3, 4] → [0, 0, 0, 0] (already sorted).
[4, 3, 2, 1] → [3, 2, 1, 0] (reverse sorted — every pair is an inversion).
All same: [5, 5, 5] → [0, 0, 0].
Negatives: [-1, -1, 0, -2] → [1, 1, 1, 0].
Single element: [7] → [0].
Stress: 10⁴ random arrays of size 1000 against the O(N²) brute force.

Follow-up Questions

“Now count strictly larger after self.” → mirror: prefix from rank+1 to n = bit.prefix(n) - bit.prefix(rank).
“Count of values in [a, b] after self.” → bit.prefix(rank[b]) - bit.prefix(rank[a] - 1).
“Reverse pairs (LC 493)”: nums[i] > 2 * nums[j] for i < j. Adapt the rank/query: compute “count of v’s in BIT with v < nums[i] / 2” — careful with integer division.
“Sum of values smaller after self instead of count.” → BIT stores value sums, not counts; update(rank, nums[i]) instead of update(rank, 1).
“Now updates are interleaved with queries on the original problem.” → Fenwick tree of frequencies still works because both ops are O(log N).

Product Extension

A leaderboard service that streams game scores and reports “your rank percentile” as scores arrive. Fenwick of frequencies indexed by score bucket; as a new score arrives, query prefix to know how many scored less, divide by total. Works at millions-of-events-per-second with log-bucket cost per event.

Language/Runtime Follow-ups

Python: integer ops are arbitrary-precision but slow; the BIT loop is hot. PyPy or C-extension if N=10⁶. array.array('q') over list only marginally helps.
Java: int[] for the tree at this N. Math.floorMod not needed (values are positive ranks). Watch for long if you store sums.
Go: idiomatic — tree []int. No surprises.
C++: canonical CP template. vector<int> bit(n + 1, 0). The i & -i lowbit relies on two’s complement, which all modern compilers guarantee for signed int.
JS/TS: Int32Array(n + 1) outperforms regular arrays. The bitwise & and unary - on numbers cast through 32-bit signed int, which works for N ≤ 2³¹.

Common Bugs

0-indexing the BIT. Calling update(0, …) is i & -i = 0, the loop never advances, or it loops forever (depends on language). Always 1-index.
Update walks down, query walks up — got the directions reversed. Mnemonic: update goes up the responsibility tree (so future prefix walks see it); query walks down (collecting predecessor ranges).
Forgetting to add 1 when going from bisect_left rank to BIT index.
Compressing nums but using the original value when querying.
tree array size n not n + 1 for 1-indexed.
Using Fenwick for min/max — it doesn’t work because subtraction is not the inverse of min.

Debugging Strategy

For a length-8 input, print tree[1..8] after each update. Recall: tree[i] stores the sum over [i - lowbit(i) + 1, i]. So tree[8] should equal the sum over [1, 8], tree[12] over [9, 12], etc. Verify by hand for a 3-update sequence.

If the inversion-count is off by 1 at every position, you almost certainly forgot the +1 in rank shifting (1-indexed vs 0-indexed). If it’s off by a lot, your update is going down instead of up, or your prefix is going up instead of down.

Mastery Criteria

Wrote update and prefix with i & -i correctly on first try.
Used 1-indexed throughout without bugs.
Solved LC 315 in <15 minutes from blank slate.
Solved one cousin (LC 327, LC 493) in <30 minutes.
Articulated why Fenwick can’t do range-min.
Stated when Fenwick beats segment tree (smaller code, smaller constants — pick Fenwick when the aggregate is invertible).
Estimated memory at N=10⁶ (~4 MB for int) without prompting.

LeetCode Interview — Extreme Coding