p65 — Number of Connected Components in an Undirected Graph

Source: LeetCode 323 · Medium · Topics: Union-Find (DSU), DFS/BFS Companies (2024–2025): Amazon (high), Google (high), Meta (medium-high), Microsoft (medium-high), LinkedIn (medium) Loop position: Union-Find / Disjoint Set Union (DSU) canonical. Tests path compression + union by rank — the two optimizations that make DSU effectively O(α(n)) per op (inverse Ackermann, ~constant).

1. Quick Context

n nodes 0..n-1, undirected edges. Return number of connected components.

DSU approach: start with n singleton components. For each edge (u, v), union u and v. Final count = number of distinct roots.

parent = list(range(n))
rank = [0] * n
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression (halving)
        x = parent[x]
    return x
def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry: return False
    if rank[rx] < rank[ry]: rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]: rank[rx] += 1
    return True
count = n
for u, v in edges:
    if union(u, v): count -= 1
return count

Time: O((V + E) · α(V)) ≈ O(V + E) practically.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/number-of-connected-components-in-an-undirected-graph/ (LC Premium) Equivalent free problem: LeetCode 547 “Number of Provinces” (matrix form).

STOP. 20-min timer.

3. Prerequisite Concepts

DSU data structure: parent array + find + union.
Two key optimizations: path compression (in find) and union by rank/size (in union).
Why α(n) ≤ 4 for all n ≤ 2^65536 (Tarjan).

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Count connected components in undirected graph.

2. Clarify: Self-loops? Duplicate edges? Both fine; DSU handles via find returning same root.

3. Examples:

n=5, edges=[[0,1],[1,2],[3,4]] → 2.
n=5, edges=[[0,1],[1,2],[2,3],[3,4]] → 1.

5. Brute force: DFS / BFS from each unvisited node; count components. O(V + E).

6. Target: DSU with α(V) per op.

7. Pattern: Union-Find.

8. Develop: see solution.py.

9. Test: empty edges; single node; all in one component; all isolated; duplicate edges; self-loops.

6. Deeper Insight

6.1 DSU operations

find(x): return the root (representative) of x’s component. Walk parent chain.

union(x, y): connect the components containing x and y. Find both roots; if different, hang one under the other.

Both ops with optimizations are O(α(n)) amortized.

6.2 Path compression — three variants

Full compression (recursive):

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])
    return parent[x]

Cleanest; risks recursion limit.

Path halving (iterative):

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

No recursion; nearly as good as full compression.

Path splitting: like halving but updates parent[x] to grandparent then moves to OLD parent (not grandparent). Theoretical curiosity.

Both halving and full compression give α(n) bound.

6.3 Union by rank vs union by size

Union by rank: rank ≈ depth upper bound. Hang shorter tree under taller. Rank increases only on ties.

Union by size: track component size; hang smaller under larger.

Both achieve O(log n) tree height without compression, O(α(n)) with compression. Equivalent in practice; rank uses less memory if components are small; size is more intuitive and gives the component size as a side benefit.

6.4 Why α(n) — Tarjan’s theorem

Tarjan (1975, 1984) proved: with both path compression AND union by rank/size, m operations on n elements take O((m + n)·α(n)) time, where α is the inverse Ackermann function. α(n) ≤ 4 for all n ≤ 2^2^2^…^2 (65536 twos). Effectively constant.

Without either optimization, naive DSU is O(n) per op worst case.

6.5 Variants worth knowing

Weighted DSU / Potentials: store offset weight[x] = “x’s value minus root’s.” Used for LC 399 (Evaluate Division), arithmetic constraints, parity checks.
DSU on intervals (offline): track “next unmarked index ≥ i” — classic LC 1488 type problem.
DSU with rollback (persistent): for offline dynamic connectivity; segment-tree-divide-and-conquer over edge lifetimes.
Small-to-large merging: generic “merge two sets” pattern; O(n log² n) without DSU.

6.6 Connection to Kruskal’s MST

Kruskal sorts edges by weight, scans through, and unions endpoints — using DSU. Path-compression + union-by-rank is essential for Kruskal’s O(E log E) bound.

6.7 DFS / BFS alternative

For static graphs (all edges known up front), DFS/BFS from each unvisited node gives O(V+E) component count — simpler than DSU, same asymptotics. Choose DSU when:

Edges arrive in a STREAM (incremental connectivity).
You need to interleave “is u connected to v?” queries with edge additions.
The problem is naturally union-based (Kruskal, image segmentation, ETL).

6.8 Decremental / fully dynamic connectivity

DSU handles INCREMENTAL (edge additions). For DECREMENTAL (deletions) or FULLY DYNAMIC, advanced structures: Holm-Lichtenberg-Thorup O(log² n) per op for fully dynamic, link-cut trees for trees only. Out of LC scope but worth knowing exists.

7. Anti-Pattern Analysis

Path compression WITHOUT union by rank — still works (O(log n) amortized) but loses the α(n) bound.
Union by rank WITHOUT path compression — same: O(log n), not α(n).
Naive find (no compression) on a deep tree — degenerates to O(n) per query in worst case.
Initialize parent = [0] * n — bug, parents should be list(range(n)) (each its own root).
Forget to check if rx == ry in union — does redundant assignment; correctness OK, but in problems where you increment a counter on successful merge, you’d over-count.
Mix find(x) with parent[x] later — after compression, parent[x] is the root, but only because find was called. Don’t assume parent[x] is the root without calling find.

8. Skills & Takeaways

DSU with path compression + union by rank/size — the standard combo.
O(α(n)) ≈ constant per op intuition.
When to choose DSU vs DFS/BFS for component-counting.
Pattern recognition: any “merge groups” problem screams DSU.
Weighted DSU for relational constraints.

9. When to Move On

Implement DSU with both optimizations cold in 10 min.
Solve LC 323 (premium), LC 547 (Provinces), LC 684 (Redundant Connection), LC 200 (Islands via DSU), LC 1319 (Make Network Connected) in 10–15 min each.
Solve LC 399 (Evaluate Division) using weighted DSU.
Solve Kruskal’s MST.
Recite Tarjan’s α(n) result and its intuition.

10. Company Context

Amazon: L5/L6; productized as “cluster detection in service-mesh graphs.”
Google: L5/L6; image-segmentation pipelines; LC 399 a favorite.
Meta: E5; account-merging across signals.
Microsoft: L65+; build-system dependency clustering.
LinkedIn: L5; people-you-may-know cluster discovery.

Scorecard for strong: “Implemented DSU with both optimizations, articulated α(n) bound, mentioned weighted-DSU and Kruskal applications, can solve LC 399.”

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff+
DSU correctness	Buggy	Correct after debug	Cold	+ iterative find for huge inputs
Optimizations	Neither	One of two	Both, explained	+ path-halving variant + lower bounds
α(n) understanding	None	Memorized name	Sketches intuition	+ Tarjan’s proof structure named
Applications	LC 323 only	+ LC 547	+ Kruskal + LC 399	+ decremental connectivity + small-to-large + image seg pipelines

12. Level Delta

Mid (L4): DSU correctness, path compression OR union by rank, returns count.
Senior (L5): + both optimizations + LC 399 weighted DSU + Kruskal MST.
Staff (L6): + iterative DFS path-halving for deep trees + small-to-large merging + decremental connectivity awareness + ETL pipelines using DSU for entity resolution.
Principal (L7+): + Holm-Lichtenberg-Thorup for fully dynamic connectivity + link-cut trees + DSU with rollback for offline problems + image-segmentation production pipelines (graph cuts, watershed).

13. Follow-up Q&A

Q1: Count the SIZE of each component, not just the count. A1: Track size[x] initialized to 1; on union, size[new_root] += size[other_root]. Final sizes are at roots; call find to look up.

Q2: LC 399 Evaluate Division — given division equations, answer division queries. A2: Weighted DSU. weight[x] = value of x / value of parent[x]. On union and find, update weights multiplicatively.

Q3: Edges arrive dynamically; report component count after each addition. A3: Maintain a counter, decrement on each successful union. O(α(n)) per edge.

Q4: What if you must also handle edge DELETIONS? A4: Fully dynamic connectivity. Naive DSU doesn’t support deletion. Use Holm-Lichtenberg-Thorup or offline segment-tree-divide-and-conquer over edge lifetimes.

Q5: How does DSU appear in Kruskal’s MST? A5: Sort edges by weight. Scan in order; for each edge, if its endpoints are in different DSU components, add to MST and union them. Skip otherwise. O(E log E) total.

14. Solution Walkthrough

See solution.py:

count_components_dsu — DSU with path halving + union by rank.
count_components_dfs — DFS over adjacency list (oracle).
count_components_brute — quadratic transitive-closure flood (stress only).

15. Beyond the Problem

Principal code review: “Union-Find is one of the rare data structures that’s both theoretically beautiful (α(n) ≈ constant via Tarjan’s analysis) and constantly practical. Image segmentation in production (think medical imaging, satellite analysis), Kruskal’s MST in network design, entity resolution in data warehouses (‘these three records are the same customer’), percolation simulations in physics, build-system dependency clustering — all DSU. Weighted DSU adds the ability to track relational constraints (parity, ratios, offsets), turning it into a constraint propagator. Persistent DSU (with rollback) enables offline algorithms on dynamic graphs. When you write parent[ry] = rx, you’re invoking 50 years of algorithm-design history and one of the most elegant amortized analyses in computer science. Memorize the seven lines; learn the dozen applications; respect the depth.”

LeetCode Interview — Extreme Coding