Lab 03 — Fenwick Tree (Binary Indexed Tree)
Goal
Implement a Fenwick tree (BIT) and use it to solve LeetCode 315 — Count of Smaller Numbers After Self. Internalize the bit-tricks (i & -i) and the 1-indexed convention so well that you can write a Fenwick tree in under 8 minutes from a blank page.
Background Concepts
A Fenwick tree is a clever encoding of prefix sums in O(N) space supporting prefix_sum(i) and point_update(i, delta) each in O(log N). The key insight: index i in 1-indexed form is associated with a “responsibility range” of size i & -i (the lowest set bit of i). Index 12 = 1100₂ has responsibility for the 4 values at positions 9..12. Index 8 = 1000₂ for 1..8. Walking up the tree (i += i & -i) accumulates non-overlapping responsibility ranges that span exactly [1, i] for query, and exactly the buckets that contain i for update.
The structure is invertible-only: it stores prefix sums and you derive range_sum(l, r) = prefix(r) - prefix(l - 1). This is fine for sum, xor, count, and “frequency-prefix” aggregates. It does not generalize to min/max because subtraction doesn’t undo a min.
For LC 315, the trick is coordinate compression + Fenwick of frequencies. Process the array right-to-left; for each nums[i], query “how many values strictly less than nums[i] have I seen so far?” by computing prefix(rank(nums[i]) - 1) on the frequency Fenwick; then increment update(rank(nums[i]), 1).
Interview Context
Fenwick trees are asked roughly as often as segment trees but the audience skews more competitive-programming. Stripe, Jane Street, Two Sigma, Bloomberg quant — all reach for them. The signal is “count inversions / count-of-X-after-Y / range-sum-with-updates and the aggregate is invertible”. Faster constants than segment tree, ~5x fewer lines of code; if both work, prefer Fenwick.
Problem Statement
Given an integer array nums, return an array counts where counts[i] is the number of elements to the right of nums[i] that are strictly smaller than nums[i].
Constraints
- 1 ≤ N ≤ 10⁵
- −10⁴ ≤ nums[i] ≤ 10⁴
Clarifying Questions
- Strictly smaller, or ≤? (Strictly smaller.)
- Return order: same as input order? (Yes —
counts[i]aligns withnums[i].) - Are duplicates allowed? (Yes — they don’t count toward “smaller”.)
Examples
nums = [5, 2, 6, 1] → counts = [2, 1, 1, 0]
5: indices 1,3 (vals 2,1) are smaller → 2
2: index 3 (val 1) is smaller → 1
6: index 3 (val 1) is smaller → 1
1: nothing to the right is smaller → 0
Initial Brute Force
For each i, scan j > i and count nums[j] < nums[i]. O(N²).
Brute Force Complexity
O(N²) time. At N=10⁵: 10¹⁰ ops. TLE by 4 orders of magnitude.
Optimization Path
Merge sort with inversion counting. During the merge step, when copying from the right half, every remaining element on the left half is strictly larger and hasn’t yet been placed — for each, increment its inversion count. O(N log N) time and space. Works, idiomatic.
Fenwick of frequencies after coordinate compression. Equally O(N log N), simpler to extend (e.g., to “count of values in [a, b] after self”).
For this lab, Fenwick is the assigned approach because it generalizes farther.
Final Expected Approach
- Coordinate compression: build
sorted_unique = sorted(set(nums)); map each valuevtorank = bisect_left(sorted_unique, v) + 1(1-indexed for Fenwick). - Right-to-left sweep: for each
ifromn-1down to0:r = rank[nums[i]]counts[i] = bit.prefix(r - 1)— count of strictly smaller previously-seen.bit.update(r, 1).
- Return
counts.
Data Structures Used
BIT(size)— Fenwick tree of size = number of distinct values.rankmap — value → 1-indexed compressed rank.counts[]— output array.
Correctness Argument
After processing index i, the BIT contains exactly the multiset of ranks for nums[i+1..n-1] (the elements to the right of i, since we go right-to-left). bit.prefix(r - 1) returns the count of those whose rank is < r — i.e., strictly smaller than nums[i]. Coordinate compression preserves order, so “rank smaller” iff “value smaller”. The update bit.update(r, 1) then registers nums[i] for the next iteration. By induction the invariant “BIT == multiset of ranks of strictly-right-of-current” is preserved.
Complexity
| Operation | Time | Space |
|---|---|---|
| Coordinate compression | O(N log N) | O(N) |
| Right-to-left sweep | O(N log N) | O(N) Fenwick |
| Total | O(N log N) | O(N) |
Implementation Requirements
class BIT:
def __init__(self, n):
self.n = n
self.tree = [0] * (n + 1) # 1-indexed
def update(self, i, delta):
while i <= self.n:
self.tree[i] += delta
i += i & -i
def prefix(self, i):
s = 0
while i > 0:
s += self.tree[i]
i -= i & -i
return s
from bisect import bisect_left
def countSmaller(nums):
sorted_unique = sorted(set(nums))
rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
bit = BIT(len(sorted_unique))
counts = [0] * len(nums)
for i in range(len(nums) - 1, -1, -1):
r = rank[nums[i]]
counts[i] = bit.prefix(r - 1)
bit.update(r, 1)
return counts
Tests
[5, 2, 6, 1] → [2, 1, 1, 0].[1, 2, 3, 4] → [0, 0, 0, 0](already sorted).[4, 3, 2, 1] → [3, 2, 1, 0](reverse sorted — every pair is an inversion).- All same:
[5, 5, 5] → [0, 0, 0]. - Negatives:
[-1, -1, 0, -2] → [1, 1, 1, 0]. - Single element:
[7] → [0]. - Stress: 10⁴ random arrays of size 1000 against the O(N²) brute force.
Follow-up Questions
- “Now count strictly larger after self.” → mirror: prefix from
rank+1ton=bit.prefix(n) - bit.prefix(rank). - “Count of values in
[a, b]after self.” →bit.prefix(rank[b]) - bit.prefix(rank[a] - 1). - “Reverse pairs (LC 493)”:
nums[i] > 2 * nums[j]fori < j. Adapt the rank/query: compute “count of v’s in BIT withv < nums[i] / 2” — careful with integer division. - “Sum of values smaller after self instead of count.” → BIT stores value sums, not counts;
update(rank, nums[i])instead ofupdate(rank, 1). - “Now updates are interleaved with queries on the original problem.” → Fenwick tree of frequencies still works because both ops are O(log N).
Product Extension
A leaderboard service that streams game scores and reports “your rank percentile” as scores arrive. Fenwick of frequencies indexed by score bucket; as a new score arrives, query prefix to know how many scored less, divide by total. Works at millions-of-events-per-second with log-bucket cost per event.
Language/Runtime Follow-ups
- Python: integer ops are arbitrary-precision but slow; the BIT loop is hot. PyPy or C-extension if N=10⁶.
array.array('q')overlistonly marginally helps. - Java:
int[]for the tree at this N.Math.floorModnot needed (values are positive ranks). Watch forlongif you store sums. - Go: idiomatic —
tree []int. No surprises. - C++: canonical CP template.
vector<int> bit(n + 1, 0). Thei & -ilowbit relies on two’s complement, which all modern compilers guarantee for signedint. - JS/TS:
Int32Array(n + 1)outperforms regular arrays. The bitwise&and unary-on numbers cast through 32-bit signed int, which works for N ≤ 2³¹.
Common Bugs
- 0-indexing the BIT. Calling
update(0, …)isi & -i = 0, the loop never advances, or it loops forever (depends on language). Always 1-index. - Update walks down, query walks up — got the directions reversed. Mnemonic: update goes up the responsibility tree (so future prefix walks see it); query walks down (collecting predecessor ranges).
- Forgetting to add 1 when going from
bisect_leftrank to BIT index. - Compressing
numsbut using the original value when querying. treearray sizennotn + 1for 1-indexed.- Using Fenwick for min/max — it doesn’t work because subtraction is not the inverse of min.
Debugging Strategy
For a length-8 input, print tree[1..8] after each update. Recall: tree[i] stores the sum over [i - lowbit(i) + 1, i]. So tree[8] should equal the sum over [1, 8], tree[12] over [9, 12], etc. Verify by hand for a 3-update sequence.
If the inversion-count is off by 1 at every position, you almost certainly forgot the +1 in rank shifting (1-indexed vs 0-indexed). If it’s off by a lot, your update is going down instead of up, or your prefix is going up instead of down.
Mastery Criteria
-
Wrote
updateandprefixwithi & -icorrectly on first try. - Used 1-indexed throughout without bugs.
- Solved LC 315 in <15 minutes from blank slate.
- Solved one cousin (LC 327, LC 493) in <30 minutes.
- Articulated why Fenwick can’t do range-min.
- Stated when Fenwick beats segment tree (smaller code, smaller constants — pick Fenwick when the aggregate is invertible).
-
Estimated memory at N=10⁶ (~4 MB for
int) without prompting.