Lab 03 — Heavy-Light Decomposition

Goal

Implement Heavy-Light Decomposition (HLD) for answering path queries on a tree in O(log² N) per query (or O(log N) with a segment tree per chain).

Background

HLD partitions tree edges into “heavy” and “light”:

  • For each non-leaf vertex, the edge to its child with the largest subtree is heavy.
  • All other edges are light.

Property: any root-to-leaf path uses O(log N) light edges (because each light edge halves the subtree size). Hence any tree path can be decomposed into O(log N) heavy chains.

Each heavy chain is contiguous in a DFS order, so we can maintain a segment tree over the DFS array and do O(log N) work per chain → O(log² N) per path query.

Originally from Sleator and Tarjan’s link-cut tree work (1983); HLD as the standalone offline technique attributed to Sleator & Tarjan / popularized via ICPC.

Interview Context

HLD appears in:

  • ICPC (frequently in the path-query category — QTREE problem on SPOJ is canonical)
  • Almost never in industry interviews
  • A handful of database/optimizer roles touch on tree-DP that HLD speeds up
  • A very rare appearance in compiler dominator-tree manipulation

When to Skip This Topic

Skip if any of these are true:

  • You are not targeting ICPC
  • You are not interviewing at a research/algorithms team
  • You don’t already understand segment trees deeply (Phase 3 Lab 01–02)
  • You have less than 3 weeks for this phase

HLD is implementation-heavy. Getting it right in interview time requires ≥ 30 hours of practice.

Problem Statement

Path Query (QTREE-style).

Given a tree of N vertices, each edge has a weight. Support two operations:

  • change(i, w): change edge i’s weight to w
  • query(u, v): return the maximum edge weight on the path from u to v

Constraints

  • 1 ≤ N ≤ 10^5
  • 1 ≤ Q ≤ 10^5
  • Edge weights ≤ 10^9

Clarifying Questions

  1. Is the tree rooted or unrooted? (Pick a root; doesn’t matter.)
  2. Queries on edges or on vertices? (Edges here; vertex variant is simpler.)
  3. Multiple components possible? (No — single tree.)

Examples

N = 4
Edges: (1,2,3), (2,3,4), (2,4,5)
query(1, 3): path 1→2→3, max edge = 4
change(1, 10): edge (1,2) now has weight 10
query(1, 4): path 1→2→4, max edge = 10

Brute Force

For each query, find the path via LCA (precompute LCA in O(log N)), then walk the path and check each edge. O(N) per query → O(NQ) total.

For N=Q=10^5, that’s 10^10 — TLE.

Brute Force Complexity

  • Time: O(NQ)
  • Space: O(N) for tree + LCA tables

Optimization Path

The path between u and v decomposes as u → LCA → v. With HLD, each leg traverses O(log N) heavy chains; each chain is a contiguous range in our DFS order; we query/update with a segment tree.

Final Expected Approach

  1. DFS 1 (size/parent/depth): compute subtree size, parent, depth.
  2. DFS 2 (HLD): for each vertex, identify heavy child (largest subtree). Assign heavy[u]. Walk heavy chains, assigning head[u] and a pos[u] = position in DFS-order array.
  3. Build segment tree over the DFS array, indexed by pos[u].
  4. Query(u, v): while head[u] != head[v], raise the deeper one to its head’s parent, querying the segment tree for the chain segment. Then query the segment between u and v on the final shared chain.
def query_path(u, v):
    res = 0
    while head[u] != head[v]:
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u
        res = max(res, seg_tree.query(pos[head[u]], pos[u]))
        u = parent[head[u]]
    if u == v: return res
    if depth[u] > depth[v]: u, v = v, u
    # Edge weights stored at the deeper endpoint; skip u's contribution
    res = max(res, seg_tree.query(pos[u] + 1, pos[v]))
    return res

Data Structures

  • Tree adjacency
  • Arrays: parent, depth, size, heavy, head, pos
  • Segment tree over pos-indexed values
  • DFS for both passes (iterative for large N to avoid stack overflow)

Correctness Argument

  • Light edge bound: if (u, v) is a light edge, size(v) ≤ size(u)/2. So any root-to-leaf path crosses O(log N) light edges.
  • Heavy chain decomposition: path u→v splits at LCA; each leg traverses chains separated by light edges; ≤ O(log N) chains.
  • Segment tree on chain: chains contiguous in DFS order; standard range query.
  • Edge-on-vertex convention: store each edge’s weight at its deeper endpoint, so a vertex query at position i returns the parent-edge weight.

Complexity

  • Preprocessing: O(N)
  • Per query/update: O(log² N) — segment tree query O(log N), times O(log N) chains
  • Total: O((N + Q) log² N)

Implementation Requirements

  • Iterative DFS for N > 10^4 to avoid Python recursion limit (or C++ stack)
  • Segment tree must support point update + range max query
  • Careful indexing: edge i stored at vertex deeper_endpoint(i)
  • Handle u == v case in query (return 0 or identity)

Tests

  • Linear tree (path graph): every edge is on the chain; ≤ 1 chain transition per query
  • Balanced binary tree: ≈ log N chain transitions
  • Star tree (1 center, N leaves): only one heavy edge; all other queries are 1-chain
  • Update + query interleaved
  • Query on same vertex (u == v)
  • Path including the root

Follow-up Questions

  • Sum on path instead of max. → Same structure, segment tree stores sums.
  • Update path (range add) instead of point. → Segment tree with lazy propagation; same chain decomposition.
  • Subtree query (sum/max in subtree of u). → Even simpler: subtree is contiguous in DFS, single range query.
  • LCA only. → Tarjan offline O((N+Q)α(N)) or binary lifting O(log N).
  • Dynamic tree (edges added/removed). → Link-cut trees (Sleator-Tarjan); much harder.

Product Extension

  • Network routing on hierarchical topologies
  • File-system path queries (organizations with deep trees)
  • Phylogenetic tree analysis
  • Decision tree updates in some ML systems

Language/Runtime Follow-ups

  • C++: standard ICPC implementation; ~150 lines
  • Python: slow constant factor; use iterative DFS; can pass for N ≤ 10^4 comfortably
  • Java: stack depth fine for N ≤ 10^5; constant factor OK

Common Bugs

  1. Recursive DFS for N = 10^5: stack overflow in many languages.
  2. Forgot edge-vertex mapping convention: off-by-one when querying final segment.
  3. Heavy child computed wrong: must be the child with the largest subtree, not the deepest.
  4. head[u] not propagated through the chain. All vertices on the same heavy chain should share head.
  5. Segment tree off-by-one between pos[u] and pos[v].
  6. Update at the root edge. Root has no parent edge; verify boundary handling.

Debugging Strategy

  • Print the chain decomposition: list all chains as [head, ..., tail]
  • For a path query, log each chain segment queried
  • Verify against brute force on N ≤ 20
  • Visualize: color heavy edges red, light edges black; should see O(log) light edges on long paths

Mastery Criteria

  • Implement HLD + segment tree from scratch in ≤ 90 min
  • Explain why O(log N) chains per path
  • Handle edge-weight vs vertex-weight variants
  • Combine with lazy propagation for range updates
  • State complexity precisely: O((N + Q) log² N) or O(log N) with chain-segtrees