Lab 02 — Bipartite Matching (Hopcroft-Karp)
Goal
Implement Hopcroft-Karp for maximum bipartite matching, achieving O(E·√V), and understand when it beats general max flow.
Background
Bipartite matching: given a bipartite graph (vertices split into L and R, edges only between L and R), find the largest set of edges with no shared endpoint.
- Naïve augmenting path: O(V·E). For each unmatched left vertex, find an augmenting path via DFS.
- Hopcroft-Karp (1973): find multiple vertex-disjoint shortest augmenting paths per phase. O(E·√V).
The √V comes from the fact that after √V phases, all remaining augmenting paths have length > √V, and there can be at most √V such paths.
Hopcroft-Karp is a special case of Dinic’s algorithm applied to a unit-capacity bipartite flow network. If you have Dinic, you have Hopcroft-Karp.
Interview Context
Bipartite matching shows up in:
- ICPC (constant)
- Assignment problems (jobs ↔ workers)
- Some quant interviews on portfolio matching
- Compiler register coalescing
- Almost never in standard FAANG interviews
Recognizing that a problem is bipartite matching is the high-leverage skill; the algorithm is well-known.
When to Skip This Topic
Skip if any of these are true:
- You’ve already done Lab 01 (Dinic handles bipartite matching as a special case)
- You’re not targeting competitive / quant / assignment-heavy roles
- You have less than 2 weeks for this phase
The reduction skill is what matters. Skip the algorithm if you can recognize the reduction and use Dinic.
Problem Statement
Maximum Bipartite Matching.
Given a bipartite graph with L left vertices, R right vertices, and M edges, find the maximum matching size.
Variant: Job Assignment. N workers, N jobs. Worker i can do a subset of jobs. Assign each worker to at most one job, each job to at most one worker. Maximize assignments.
Constraints
- 1 ≤ L, R ≤ 10^5
- 1 ≤ M ≤ 10^6
- Wall-clock: < 1 sec
Clarifying Questions
- Are the partitions L and R given, or do I need to detect bipartiteness? (Usually given.)
- Are edges weighted? (No — that’s a different problem: Hungarian algorithm or min-cost max flow.)
- Output the matching or just the size? (Both versions are common.)
Examples
L = {1, 2, 3}, R = {a, b, c}
Edges: 1-a, 1-b, 2-b, 3-c
Max matching: {1-a, 2-b, 3-c}, size = 3
L = {1, 2}, R = {a, b}
Edges: 1-a, 2-a
Max matching: {1-a} or {2-a}, size = 1
Brute Force
Try all subsets of edges; check that no vertex appears twice; track max. O(2^M · M).
Better naïve: for each left vertex in order, DFS to find an augmenting path. O(V·E).
Brute Force Complexity
- Subsets: O(2^M)
- Per-vertex DFS: O(V·E). Acceptable for V·E ≤ ~10^7.
Optimization Path
Hopcroft-Karp:
- Phase 1: BFS from all unmatched left vertices, computing layers in the residual graph.
- Phase 2: DFS from each unmatched left vertex, finding vertex-disjoint shortest augmenting paths.
- Repeat until no augmenting path exists.
The phase count is O(√V), giving total O(E·√V).
Final Expected Approach
class HopcroftKarp:
def __init__(self, left_size, right_size):
self.L = left_size
self.R = right_size
self.adj = [[] for _ in range(left_size)]
self.NIL = -1
def add_edge(self, u, v):
self.adj[u].append(v)
def _bfs(self):
q = deque()
self.dist = [float('inf')] * self.L
for u in range(self.L):
if self.match_L[u] == self.NIL:
self.dist[u] = 0
q.append(u)
found = False
while q:
u = q.popleft()
for v in self.adj[u]:
pair = self.match_R[v]
if pair == self.NIL:
found = True
elif self.dist[pair] == float('inf'):
self.dist[pair] = self.dist[u] + 1
q.append(pair)
return found
def _dfs(self, u):
for v in self.adj[u]:
pair = self.match_R[v]
if pair == self.NIL or (self.dist[pair] == self.dist[u] + 1 and self._dfs(pair)):
self.match_L[u] = v
self.match_R[v] = u
return True
self.dist[u] = float('inf')
return False
def max_matching(self):
self.match_L = [self.NIL] * self.L
self.match_R = [self.NIL] * self.R
matching = 0
while self._bfs():
for u in range(self.L):
if self.match_L[u] == self.NIL and self._dfs(u):
matching += 1
return matching
Data Structures
- Adjacency list (left → list of right)
match_L[u],match_R[v]: current partner or NILdist[u]: BFS layer of left vertex- Queue for BFS
Correctness Argument
- Augmenting path: path alternating unmatched-matched-unmatched… edges, starting and ending at unmatched vertices. Flipping the edges along the path increases matching size by 1.
- Berge’s theorem: matching is maximum iff no augmenting path exists.
- Hopcroft-Karp: in each phase, finds a maximal set of vertex-disjoint shortest augmenting paths. After √V phases, no short augmenting paths remain; at most √V remaining ones contribute one each.
Complexity
- Time: O(E · √V)
- Space: O(V + E)
For V = 10^5, E = 10^6: roughly 10^7.5 ≈ 3·10^7 ops — well under 1 sec in C++.
Implementation Requirements
- Use BFS to detect all unmatched left vertices and compute layers
- DFS must respect layer constraint (
dist[pair] == dist[u] + 1) - Set
dist[u] = infinityon failed DFS to prune subsequent visits - Repeat until BFS finds no augmenting path
Tests
- Empty graph → 0
- Single edge → 1
- Complete bipartite K_{n,n} → n
- Star (1 left, n right) → 1
- Path 1-a-2-b-3 → 2
Follow-up Questions
- Weighted matching (maximize sum of edge weights, not count). → Hungarian algorithm O(V³) or min-cost max flow.
- Online matching (edges arrive one at a time). → Greedy is 1/2-competitive; ranking is (1 − 1/e)-competitive.
- Stable matching (Gale-Shapley). → Different problem; preferences instead of binary edges.
- Edge-disjoint paths from s to t. → Reduces to max flow with all capacities 1.
Product Extension
- Ad-slot allocation (advertisers ↔ impressions)
- Ride-sharing dispatch (drivers ↔ riders)
- Course allocation (students ↔ classes with capacity)
- Resource scheduling
Language/Runtime Follow-ups
- C++: competitive programmers use a tight 50-line version
- Python: recursion depth and constant factor make this borderline at V = 10^5; use sys.setrecursionlimit or iterative
- Rust: ownership makes the in-place matching arrays a small wrestle
Common Bugs
- Forgot to reset
dist[u] = infinityon DFS failure. Re-explores dead ends; slow. - DFS doesn’t respect the layer constraint. Same as Ford-Fulkerson; loses √V factor.
match_Landmatch_Rout of sync. Update both atomically.- NIL value collision with real vertex 0. Use -1 or a sentinel.
Debugging Strategy
- After each phase, print matching size and BFS layer counts
- Verify
match_L[u] == viffmatch_R[v] == u - Augmenting path should alternate matched/unmatched
Mastery Criteria
- Implement Hopcroft-Karp in ≤ 30 min from memory
- Explain why √V phases suffice (sketch of proof)
- Identify when bipartite matching applies to a problem stated in domain terms
- State the difference between bipartite matching and Hungarian (weighted)
- Estimate runtime for given V, E