p65 — Number of Connected Components in an Undirected Graph
Source: LeetCode 323 · Medium · Topics: Union-Find (DSU), DFS/BFS Companies (2024–2025): Amazon (high), Google (high), Meta (medium-high), Microsoft (medium-high), LinkedIn (medium) Loop position: Union-Find / Disjoint Set Union (DSU) canonical. Tests path compression + union by rank — the two optimizations that make DSU effectively O(α(n)) per op (inverse Ackermann, ~constant).
1. Quick Context
n nodes 0..n-1, undirected edges. Return number of connected components.
DSU approach: start with n singleton components. For each edge (u, v), union u and v. Final count = number of distinct roots.
parent = list(range(n))
rank = [0] * n
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]] # path compression (halving)
x = parent[x]
return x
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry: return False
if rank[rx] < rank[ry]: rx, ry = ry, rx
parent[ry] = rx
if rank[rx] == rank[ry]: rank[rx] += 1
return True
count = n
for u, v in edges:
if union(u, v): count -= 1
return count
Time: O((V + E) · α(V)) ≈ O(V + E) practically.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/number-of-connected-components-in-an-undirected-graph/ (LC Premium) Equivalent free problem: LeetCode 547 “Number of Provinces” (matrix form).
STOP. 20-min timer.
3. Prerequisite Concepts
- DSU data structure:
parentarray +find+union. - Two key optimizations: path compression (in find) and union by rank/size (in union).
- Why α(n) ≤ 4 for all n ≤ 2^65536 (Tarjan).
4. How to Approach (FRAMEWORK 1–9)
1. Restate: Count connected components in undirected graph.
2. Clarify: Self-loops? Duplicate edges? Both fine; DSU handles via find returning same root.
3. Examples:
n=5, edges=[[0,1],[1,2],[3,4]] → 2.n=5, edges=[[0,1],[1,2],[2,3],[3,4]] → 1.
5. Brute force: DFS / BFS from each unvisited node; count components. O(V + E).
6. Target: DSU with α(V) per op.
7. Pattern: Union-Find.
8. Develop: see solution.py.
9. Test: empty edges; single node; all in one component; all isolated; duplicate edges; self-loops.
5. Progressive Hints
See hints.md.
6. Deeper Insight
6.1 DSU operations
find(x): return the root (representative) of x’s component. Walk parent chain.
union(x, y): connect the components containing x and y. Find both roots; if different, hang one under the other.
Both ops with optimizations are O(α(n)) amortized.
6.2 Path compression — three variants
Full compression (recursive):
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
Cleanest; risks recursion limit.
Path halving (iterative):
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
No recursion; nearly as good as full compression.
Path splitting: like halving but updates parent[x] to grandparent then moves to OLD parent (not grandparent). Theoretical curiosity.
Both halving and full compression give α(n) bound.
6.3 Union by rank vs union by size
Union by rank: rank ≈ depth upper bound. Hang shorter tree under taller. Rank increases only on ties.
Union by size: track component size; hang smaller under larger.
Both achieve O(log n) tree height without compression, O(α(n)) with compression. Equivalent in practice; rank uses less memory if components are small; size is more intuitive and gives the component size as a side benefit.
6.4 Why α(n) — Tarjan’s theorem
Tarjan (1975, 1984) proved: with both path compression AND union by rank/size, m operations on n elements take O((m + n)·α(n)) time, where α is the inverse Ackermann function. α(n) ≤ 4 for all n ≤ 2^2^2^…^2 (65536 twos). Effectively constant.
Without either optimization, naive DSU is O(n) per op worst case.
6.5 Variants worth knowing
- Weighted DSU / Potentials: store offset
weight[x]= “x’s value minus root’s.” Used for LC 399 (Evaluate Division), arithmetic constraints, parity checks. - DSU on intervals (offline): track “next unmarked index ≥ i” — classic LC 1488 type problem.
- DSU with rollback (persistent): for offline dynamic connectivity; segment-tree-divide-and-conquer over edge lifetimes.
- Small-to-large merging: generic “merge two sets” pattern; O(n log² n) without DSU.
6.6 Connection to Kruskal’s MST
Kruskal sorts edges by weight, scans through, and unions endpoints — using DSU. Path-compression + union-by-rank is essential for Kruskal’s O(E log E) bound.
6.7 DFS / BFS alternative
For static graphs (all edges known up front), DFS/BFS from each unvisited node gives O(V+E) component count — simpler than DSU, same asymptotics. Choose DSU when:
- Edges arrive in a STREAM (incremental connectivity).
- You need to interleave “is u connected to v?” queries with edge additions.
- The problem is naturally union-based (Kruskal, image segmentation, ETL).
6.8 Decremental / fully dynamic connectivity
DSU handles INCREMENTAL (edge additions). For DECREMENTAL (deletions) or FULLY DYNAMIC, advanced structures: Holm-Lichtenberg-Thorup O(log² n) per op for fully dynamic, link-cut trees for trees only. Out of LC scope but worth knowing exists.
7. Anti-Pattern Analysis
- Path compression WITHOUT union by rank — still works (O(log n) amortized) but loses the α(n) bound.
- Union by rank WITHOUT path compression — same: O(log n), not α(n).
- Naive
find(no compression) on a deep tree — degenerates to O(n) per query in worst case. - Initialize
parent = [0] * n— bug, parents should belist(range(n))(each its own root). - Forget to check
if rx == ryin union — does redundant assignment; correctness OK, but in problems where you increment a counter on successful merge, you’d over-count. - Mix
find(x)withparent[x]later — after compression,parent[x]is the root, but only becausefindwas called. Don’t assumeparent[x]is the root without callingfind.
8. Skills & Takeaways
- DSU with path compression + union by rank/size — the standard combo.
- O(α(n)) ≈ constant per op intuition.
- When to choose DSU vs DFS/BFS for component-counting.
- Pattern recognition: any “merge groups” problem screams DSU.
- Weighted DSU for relational constraints.
9. When to Move On
- Implement DSU with both optimizations cold in 10 min.
- Solve LC 323 (premium), LC 547 (Provinces), LC 684 (Redundant Connection), LC 200 (Islands via DSU), LC 1319 (Make Network Connected) in 10–15 min each.
- Solve LC 399 (Evaluate Division) using weighted DSU.
- Solve Kruskal’s MST.
- Recite Tarjan’s α(n) result and its intuition.
10. Company Context
- Amazon: L5/L6; productized as “cluster detection in service-mesh graphs.”
- Google: L5/L6; image-segmentation pipelines; LC 399 a favorite.
- Meta: E5; account-merging across signals.
- Microsoft: L65+; build-system dependency clustering.
- LinkedIn: L5; people-you-may-know cluster discovery.
Scorecard for strong: “Implemented DSU with both optimizations, articulated α(n) bound, mentioned weighted-DSU and Kruskal applications, can solve LC 399.”
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff+ |
|---|---|---|---|---|
| DSU correctness | Buggy | Correct after debug | Cold | + iterative find for huge inputs |
| Optimizations | Neither | One of two | Both, explained | + path-halving variant + lower bounds |
| α(n) understanding | None | Memorized name | Sketches intuition | + Tarjan’s proof structure named |
| Applications | LC 323 only | + LC 547 | + Kruskal + LC 399 | + decremental connectivity + small-to-large + image seg pipelines |
12. Level Delta
- Mid (L4): DSU correctness, path compression OR union by rank, returns count.
- Senior (L5): + both optimizations + LC 399 weighted DSU + Kruskal MST.
- Staff (L6): + iterative DFS path-halving for deep trees + small-to-large merging + decremental connectivity awareness + ETL pipelines using DSU for entity resolution.
- Principal (L7+): + Holm-Lichtenberg-Thorup for fully dynamic connectivity + link-cut trees + DSU with rollback for offline problems + image-segmentation production pipelines (graph cuts, watershed).
13. Follow-up Q&A
Q1: Count the SIZE of each component, not just the count.
A1: Track size[x] initialized to 1; on union, size[new_root] += size[other_root]. Final sizes are at roots; call find to look up.
Q2: LC 399 Evaluate Division — given division equations, answer division queries.
A2: Weighted DSU. weight[x] = value of x / value of parent[x]. On union and find, update weights multiplicatively.
Q3: Edges arrive dynamically; report component count after each addition. A3: Maintain a counter, decrement on each successful union. O(α(n)) per edge.
Q4: What if you must also handle edge DELETIONS? A4: Fully dynamic connectivity. Naive DSU doesn’t support deletion. Use Holm-Lichtenberg-Thorup or offline segment-tree-divide-and-conquer over edge lifetimes.
Q5: How does DSU appear in Kruskal’s MST? A5: Sort edges by weight. Scan in order; for each edge, if its endpoints are in different DSU components, add to MST and union them. Skip otherwise. O(E log E) total.
14. Solution Walkthrough
See solution.py:
count_components_dsu— DSU with path halving + union by rank.count_components_dfs— DFS over adjacency list (oracle).count_components_brute— quadratic transitive-closure flood (stress only).
15. Beyond the Problem
Principal code review: “Union-Find is one of the rare data structures that’s both theoretically beautiful (α(n) ≈ constant via Tarjan’s analysis) and constantly practical. Image segmentation in production (think medical imaging, satellite analysis), Kruskal’s MST in network design, entity resolution in data warehouses (‘these three records are the same customer’), percolation simulations in physics, build-system dependency clustering — all DSU. Weighted DSU adds the ability to track relational constraints (parity, ratios, offsets), turning it into a constraint propagator. Persistent DSU (with rollback) enables offline algorithms on dynamic graphs. When you write
parent[ry] = rx, you’re invoking 50 years of algorithm-design history and one of the most elegant amortized analyses in computer science. Memorize the seven lines; learn the dozen applications; respect the depth.”