Lab 03 — Heavy-Light Decomposition

Goal

Implement Heavy-Light Decomposition (HLD) for answering path queries on a tree in O(log² N) per query (or O(log N) with a segment tree per chain).

Background

HLD partitions tree edges into “heavy” and “light”:

For each non-leaf vertex, the edge to its child with the largest subtree is heavy.
All other edges are light.

Property: any root-to-leaf path uses O(log N) light edges (because each light edge halves the subtree size). Hence any tree path can be decomposed into O(log N) heavy chains.

Each heavy chain is contiguous in a DFS order, so we can maintain a segment tree over the DFS array and do O(log N) work per chain → O(log² N) per path query.

Originally from Sleator and Tarjan’s link-cut tree work (1983); HLD as the standalone offline technique attributed to Sleator & Tarjan / popularized via ICPC.

Interview Context

HLD appears in:

ICPC (frequently in the path-query category — QTREE problem on SPOJ is canonical)
Almost never in industry interviews
A handful of database/optimizer roles touch on tree-DP that HLD speeds up
A very rare appearance in compiler dominator-tree manipulation

When to Skip This Topic

Skip if any of these are true:

You are not targeting ICPC
You are not interviewing at a research/algorithms team
You don’t already understand segment trees deeply (Phase 3 Lab 01–02)
You have less than 3 weeks for this phase

HLD is implementation-heavy. Getting it right in interview time requires ≥ 30 hours of practice.

Problem Statement

Path Query (QTREE-style).

Given a tree of N vertices, each edge has a weight. Support two operations:

change(i, w): change edge i’s weight to w
query(u, v): return the maximum edge weight on the path from u to v

Constraints

1 ≤ N ≤ 10^5
1 ≤ Q ≤ 10^5
Edge weights ≤ 10^9

Clarifying Questions

Is the tree rooted or unrooted? (Pick a root; doesn’t matter.)
Queries on edges or on vertices? (Edges here; vertex variant is simpler.)
Multiple components possible? (No — single tree.)

Examples

N = 4
Edges: (1,2,3), (2,3,4), (2,4,5)
query(1, 3): path 1→2→3, max edge = 4
change(1, 10): edge (1,2) now has weight 10
query(1, 4): path 1→2→4, max edge = 10

Brute Force

For each query, find the path via LCA (precompute LCA in O(log N)), then walk the path and check each edge. O(N) per query → O(NQ) total.

For N=Q=10^5, that’s 10^10 — TLE.

Brute Force Complexity

Time: O(NQ)
Space: O(N) for tree + LCA tables

Optimization Path

The path between u and v decomposes as u → LCA → v. With HLD, each leg traverses O(log N) heavy chains; each chain is a contiguous range in our DFS order; we query/update with a segment tree.

Final Expected Approach

DFS 1 (size/parent/depth): compute subtree size, parent, depth.
DFS 2 (HLD): for each vertex, identify heavy child (largest subtree). Assign heavy[u]. Walk heavy chains, assigning head[u] and a pos[u] = position in DFS-order array.
Build segment tree over the DFS array, indexed by pos[u].
Query(u, v): while head[u] != head[v], raise the deeper one to its head’s parent, querying the segment tree for the chain segment. Then query the segment between u and v on the final shared chain.

def query_path(u, v):
    res = 0
    while head[u] != head[v]:
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u
        res = max(res, seg_tree.query(pos[head[u]], pos[u]))
        u = parent[head[u]]
    if u == v: return res
    if depth[u] > depth[v]: u, v = v, u
    # Edge weights stored at the deeper endpoint; skip u's contribution
    res = max(res, seg_tree.query(pos[u] + 1, pos[v]))
    return res

Data Structures

Tree adjacency
Arrays: parent, depth, size, heavy, head, pos
Segment tree over pos-indexed values
DFS for both passes (iterative for large N to avoid stack overflow)

Correctness Argument

Light edge bound: if (u, v) is a light edge, size(v) ≤ size(u)/2. So any root-to-leaf path crosses O(log N) light edges.
Heavy chain decomposition: path u→v splits at LCA; each leg traverses chains separated by light edges; ≤ O(log N) chains.
Segment tree on chain: chains contiguous in DFS order; standard range query.
Edge-on-vertex convention: store each edge’s weight at its deeper endpoint, so a vertex query at position i returns the parent-edge weight.

Complexity

Preprocessing: O(N)
Per query/update: O(log² N) — segment tree query O(log N), times O(log N) chains
Total: O((N + Q) log² N)

Implementation Requirements

Iterative DFS for N > 10^4 to avoid Python recursion limit (or C++ stack)
Segment tree must support point update + range max query
Careful indexing: edge i stored at vertex deeper_endpoint(i)
Handle u == v case in query (return 0 or identity)

Tests

Linear tree (path graph): every edge is on the chain; ≤ 1 chain transition per query
Balanced binary tree: ≈ log N chain transitions
Star tree (1 center, N leaves): only one heavy edge; all other queries are 1-chain
Update + query interleaved
Query on same vertex (u == v)
Path including the root

Follow-up Questions

Sum on path instead of max. → Same structure, segment tree stores sums.
Update path (range add) instead of point. → Segment tree with lazy propagation; same chain decomposition.
Subtree query (sum/max in subtree of u). → Even simpler: subtree is contiguous in DFS, single range query.
LCA only. → Tarjan offline O((N+Q)α(N)) or binary lifting O(log N).
Dynamic tree (edges added/removed). → Link-cut trees (Sleator-Tarjan); much harder.

Product Extension

Network routing on hierarchical topologies
File-system path queries (organizations with deep trees)
Phylogenetic tree analysis
Decision tree updates in some ML systems

Language/Runtime Follow-ups

C++: standard ICPC implementation; ~150 lines
Python: slow constant factor; use iterative DFS; can pass for N ≤ 10^4 comfortably
Java: stack depth fine for N ≤ 10^5; constant factor OK

Common Bugs

Recursive DFS for N = 10^5: stack overflow in many languages.
Forgot edge-vertex mapping convention: off-by-one when querying final segment.
Heavy child computed wrong: must be the child with the largest subtree, not the deepest.
head[u] not propagated through the chain. All vertices on the same heavy chain should share head.
Segment tree off-by-one between pos[u] and pos[v].
Update at the root edge. Root has no parent edge; verify boundary handling.

Debugging Strategy

Print the chain decomposition: list all chains as [head, ..., tail]
For a path query, log each chain segment queried
Verify against brute force on N ≤ 20
Visualize: color heavy edges red, light edges black; should see O(log) light edges on long paths

Mastery Criteria

Implement HLD + segment tree from scratch in ≤ 90 min
Explain why O(log N) chains per path
Handle edge-weight vs vertex-weight variants
Combine with lazy propagation for range updates
State complexity precisely: O((N + Q) log² N) or O(log N) with chain-segtrees

LeetCode Interview — Extreme Coding