Lab 04 — Centroid Decomposition
Goal
Implement centroid decomposition for efficient tree queries — counting / aggregating over all paths in a tree in O(N log N) or O(N log² N).
Background
The centroid of a tree is a vertex whose removal leaves no subtree with more than N/2 vertices. Every tree has a centroid (sometimes two).
Centroid decomposition: recursively decompose the tree:
- Find centroid; process all paths passing through it
- Remove centroid; recurse on each remaining subtree
Recursion depth: O(log N) (each level halves subtree size). Total work per level: O(N) typically → O(N log N) total.
Originally developed for tree DP and offline path queries. Powerful technique for problems of the form: “count/sum over all pairs (u, v) in a tree with property P on the u-v path.”
Interview Context
Almost exclusively ICPC. Some appearances in:
- Quant algo research on tree models
- Phylogenetic inference (computational biology)
- A handful of compiler dominator-tree analyses
Industry interviews: near-zero.
When to Skip This Topic
Skip if any of these are true:
- You are not training for ICPC or competitive contests
- You haven’t done Lab 03 (HLD) — these are sibling techniques
- You don’t have 2+ weeks for the implementation practice
Centroid decomposition has a high “first implementation” cost. Don’t attempt without serious tree-DP fluency.
Problem Statement
Count Paths in Tree with Length ≤ K.
Given a tree of N vertices, edge weights w_e, and integer K, count the number of unordered pairs (u, v) such that the sum of edge weights on the path from u to v is ≤ K.
Constraints
- 1 ≤ N ≤ 5×10^4
- 1 ≤ K ≤ 10^9
- 1 ≤ w_e ≤ 10^4
Clarifying Questions
- Are weights positive? (Yes — required for the standard algorithm.)
- Count ordered or unordered pairs? (Unordered, exclude self-pairs.)
- Are edge weights integers? (Yes — convenient for sort/binary-search.)
Examples
Tree: 1-2 (w=2), 2-3 (w=1), 2-4 (w=3)
K = 4
Paths and lengths:
(1,2): 2 ✓
(1,3): 3 ✓
(1,4): 5 ✗
(2,3): 1 ✓
(2,4): 3 ✓
(3,4): 4 ✓
Answer: 5
Brute Force
For each unordered pair (u, v), compute path length (LCA + ancestor distances). O(N² log N).
For N = 5×10^4: 2.5×10^9 ops — TLE.
Brute Force Complexity
- Time: O(N² log N) for path length per pair
- Space: O(N) plus LCA tables
Optimization Path
Centroid decomposition shines for “paths through centroid” enumeration:
- A path between u and v either passes through the centroid c or lies entirely in one subtree (after c is removed)
- Paths through c: count by aggregating distances from c to every other vertex
- Paths in subtrees: handled recursively
Per centroid:
- BFS from c, recording dist(c, v) for every v in c’s connected component
- Sort distances per subtree
- Count pairs (u, v) with dist(c, u) + dist(c, v) ≤ K using two pointers
- Subtract pairs where u and v are in the same subtree (they would have been counted as paths through some other centroid)
Final Expected Approach
def centroid_decompose(root, K):
n = len(adj)
removed = [False] * n
size = [0] * n
total = 0
def calc_size(u, parent):
size[u] = 1
for v, _ in adj[u]:
if v != parent and not removed[v]:
calc_size(v, u)
size[u] += size[v]
def find_centroid(u, parent, tree_size):
for v, _ in adj[u]:
if v != parent and not removed[v] and size[v] > tree_size // 2:
return find_centroid(v, u, tree_size)
return u
def gather_dists(u, parent, d, out):
out.append(d)
for v, w in adj[u]:
if v != parent and not removed[v]:
gather_dists(v, u, d + w, out)
def count_pairs(dists, K):
dists.sort()
i, j = 0, len(dists) - 1
c = 0
while i < j:
if dists[i] + dists[j] <= K:
c += j - i
i += 1
else:
j -= 1
return c
def decompose(u):
nonlocal total
calc_size(u, -1)
c = find_centroid(u, -1, size[u])
all_dists = [0]
for v, w in adj[c]:
if not removed[v]:
sub = []
gather_dists(v, c, w, sub)
# subtract pairs within this subtree
total -= count_pairs(sub[:], K)
all_dists.extend(sub)
total += count_pairs(all_dists, K)
removed[c] = True
for v, _ in adj[c]:
if not removed[v]:
decompose(v)
decompose(root)
return total
Data Structures
- Adjacency list (vertex → list of (neighbor, weight))
removed[v]: marks centroids removed from active treesize[v]: subtree size in current decomposition step- Distance lists per subtree
Correctness Argument
- Centroid existence: every tree has a centroid (induction on tree structure).
- Recursion depth O(log N): removing centroid leaves subtrees of size ≤ N/2.
- Pair counting via subtraction: a path (u, v) is counted exactly once — at the deepest centroid c that lies on path(u, v). The inclusion-exclusion (add all-vertices count, subtract per-subtree count) ensures each path-through-c is counted once.
- Two pointers for sum ≤ K: standard.
Complexity
- Time: O(N log² N) — O(log N) levels, O(N log N) per level (sort dominates)
- Space: O(N) for tree + O(N) for decomposition state
Implementation Requirements
- Iterative or carefully bounded recursive DFS (Python: 5×10^4 may need increased limit)
- Recompute
size[]for each subtree (in the recursive call); critical bug source - Two-pointer pair counting requires sorted distances
- The inclusion-exclusion trick is the conceptual core; verify on small cases
Tests
- Linear chain (path graph): N(N-1)/2 paths; verify against brute force
- Star tree: each pair sum is at most 2*max_weight
- Balanced binary tree
- N = 1 (no pairs)
- K = 0 with positive weights (only self-pairs; answer = 0)
- Very large K (all pairs counted)
Follow-up Questions
- Count paths with length exactly K. → Use hashmap of distances per subtree; sum complement counts.
- Sum of path lengths (rather than count). → Aggregate sums in addition to counts during two-pointer scan.
- XOR of edge weights instead of sum, equals K. → Replace sort/two-pointer with XOR trie.
- Online (tree mutating). → Much harder; use top trees or Euler-tour trees.
- K-th shortest path. → Different problem; rarely tractable on trees with centroid.
Product Extension
- Phylogenetics: counting pairs of species within evolutionary distance K
- Network distance queries on hierarchical trees
- Distance-based recommendation systems on tree-like ontologies
Language/Runtime Follow-ups
- Python: sort + two-pointer per level; constant factor is the killer. C++ recommended for N ≥ 10^4.
- C++: standard ICPC implementation; ~100 lines.
- Recursive DFS: centroid decomposition depth O(log N), but inner DFS depth O(N) — limit must accommodate.
Common Bugs
- Forgot to recompute
size[]for each subtree. Sizes from before removal are stale. - Centroid finder doesn’t follow the right child. Must descend toward the largest remaining subtree.
removed[v]check forgotten in DFS: revisits removed centroids.- Off-by-one in pair counting (counting self-pair). Handle separately.
- Inclusion-exclusion wrong sign. Add all, subtract per-subtree.
- Stack overflow on deep recursion. Convert inner DFS to iterative.
Debugging Strategy
- For small N, compare against brute force at each level
- Log the centroid chosen at each call
- Verify subtree sizes recomputed correctly (print before find_centroid)
- For two-pointer: print sorted distances and the (i, j) cursor trajectory
Mastery Criteria
- Implement centroid decomposition in ≤ 60 min from memory
- Explain the inclusion-exclusion trick for path counting
- Identify problems amenable to centroid decomposition (offline path queries on static tree)
- Distinguish from HLD: HLD is online with edge updates; centroid is offline/path-counting
- State complexity precisely: O(N log² N) typical