Lab 02 — Bipartite Matching (Hopcroft-Karp)

Goal

Implement Hopcroft-Karp for maximum bipartite matching, achieving O(E·√V), and understand when it beats general max flow.

Background

Bipartite matching: given a bipartite graph (vertices split into L and R, edges only between L and R), find the largest set of edges with no shared endpoint.

Naïve augmenting path: O(V·E). For each unmatched left vertex, find an augmenting path via DFS.
Hopcroft-Karp (1973): find multiple vertex-disjoint shortest augmenting paths per phase. O(E·√V).

The √V comes from the fact that after √V phases, all remaining augmenting paths have length > √V, and there can be at most √V such paths.

Hopcroft-Karp is a special case of Dinic’s algorithm applied to a unit-capacity bipartite flow network. If you have Dinic, you have Hopcroft-Karp.

Interview Context

Bipartite matching shows up in:

ICPC (constant)
Assignment problems (jobs ↔ workers)
Some quant interviews on portfolio matching
Compiler register coalescing
Almost never in standard FAANG interviews

Recognizing that a problem is bipartite matching is the high-leverage skill; the algorithm is well-known.

When to Skip This Topic

Skip if any of these are true:

You’ve already done Lab 01 (Dinic handles bipartite matching as a special case)
You’re not targeting competitive / quant / assignment-heavy roles
You have less than 2 weeks for this phase

The reduction skill is what matters. Skip the algorithm if you can recognize the reduction and use Dinic.

Problem Statement

Maximum Bipartite Matching.

Given a bipartite graph with L left vertices, R right vertices, and M edges, find the maximum matching size.

Variant: Job Assignment. N workers, N jobs. Worker i can do a subset of jobs. Assign each worker to at most one job, each job to at most one worker. Maximize assignments.

Constraints

1 ≤ L, R ≤ 10^5
1 ≤ M ≤ 10^6
Wall-clock: < 1 sec

Clarifying Questions

Are the partitions L and R given, or do I need to detect bipartiteness? (Usually given.)
Are edges weighted? (No — that’s a different problem: Hungarian algorithm or min-cost max flow.)
Output the matching or just the size? (Both versions are common.)

Examples

L = {1, 2, 3}, R = {a, b, c}
Edges: 1-a, 1-b, 2-b, 3-c
Max matching: {1-a, 2-b, 3-c}, size = 3

L = {1, 2}, R = {a, b}
Edges: 1-a, 2-a
Max matching: {1-a} or {2-a}, size = 1

Brute Force

Try all subsets of edges; check that no vertex appears twice; track max. O(2^M · M).

Better naïve: for each left vertex in order, DFS to find an augmenting path. O(V·E).

Brute Force Complexity

Subsets: O(2^M)
Per-vertex DFS: O(V·E). Acceptable for V·E ≤ ~10^7.

Optimization Path

Hopcroft-Karp:

Phase 1: BFS from all unmatched left vertices, computing layers in the residual graph.
Phase 2: DFS from each unmatched left vertex, finding vertex-disjoint shortest augmenting paths.
Repeat until no augmenting path exists.

The phase count is O(√V), giving total O(E·√V).

Final Expected Approach

class HopcroftKarp:
    def __init__(self, left_size, right_size):
        self.L = left_size
        self.R = right_size
        self.adj = [[] for _ in range(left_size)]
        self.NIL = -1
    
    def add_edge(self, u, v):
        self.adj[u].append(v)
    
    def _bfs(self):
        q = deque()
        self.dist = [float('inf')] * self.L
        for u in range(self.L):
            if self.match_L[u] == self.NIL:
                self.dist[u] = 0
                q.append(u)
        found = False
        while q:
            u = q.popleft()
            for v in self.adj[u]:
                pair = self.match_R[v]
                if pair == self.NIL:
                    found = True
                elif self.dist[pair] == float('inf'):
                    self.dist[pair] = self.dist[u] + 1
                    q.append(pair)
        return found
    
    def _dfs(self, u):
        for v in self.adj[u]:
            pair = self.match_R[v]
            if pair == self.NIL or (self.dist[pair] == self.dist[u] + 1 and self._dfs(pair)):
                self.match_L[u] = v
                self.match_R[v] = u
                return True
        self.dist[u] = float('inf')
        return False
    
    def max_matching(self):
        self.match_L = [self.NIL] * self.L
        self.match_R = [self.NIL] * self.R
        matching = 0
        while self._bfs():
            for u in range(self.L):
                if self.match_L[u] == self.NIL and self._dfs(u):
                    matching += 1
        return matching

Data Structures

Adjacency list (left → list of right)
match_L[u], match_R[v]: current partner or NIL
dist[u]: BFS layer of left vertex
Queue for BFS

Correctness Argument

Augmenting path: path alternating unmatched-matched-unmatched… edges, starting and ending at unmatched vertices. Flipping the edges along the path increases matching size by 1.
Berge’s theorem: matching is maximum iff no augmenting path exists.
Hopcroft-Karp: in each phase, finds a maximal set of vertex-disjoint shortest augmenting paths. After √V phases, no short augmenting paths remain; at most √V remaining ones contribute one each.

Complexity

Time: O(E · √V)
Space: O(V + E)

For V = 10^5, E = 10^6: roughly 10^7.5 ≈ 3·10^7 ops — well under 1 sec in C++.

Implementation Requirements

Use BFS to detect all unmatched left vertices and compute layers
DFS must respect layer constraint (dist[pair] == dist[u] + 1)
Set dist[u] = infinity on failed DFS to prune subsequent visits
Repeat until BFS finds no augmenting path

Tests

Empty graph → 0
Single edge → 1
Complete bipartite K_{n,n} → n
Star (1 left, n right) → 1
Path 1-a-2-b-3 → 2

Follow-up Questions

Weighted matching (maximize sum of edge weights, not count). → Hungarian algorithm O(V³) or min-cost max flow.
Online matching (edges arrive one at a time). → Greedy is 1/2-competitive; ranking is (1 − 1/e)-competitive.
Stable matching (Gale-Shapley). → Different problem; preferences instead of binary edges.
Edge-disjoint paths from s to t. → Reduces to max flow with all capacities 1.

Product Extension

Ad-slot allocation (advertisers ↔ impressions)
Ride-sharing dispatch (drivers ↔ riders)
Course allocation (students ↔ classes with capacity)
Resource scheduling

Language/Runtime Follow-ups

C++: competitive programmers use a tight 50-line version
Python: recursion depth and constant factor make this borderline at V = 10^5; use sys.setrecursionlimit or iterative
Rust: ownership makes the in-place matching arrays a small wrestle

Common Bugs

Forgot to reset dist[u] = infinity on DFS failure. Re-explores dead ends; slow.
DFS doesn’t respect the layer constraint. Same as Ford-Fulkerson; loses √V factor.
match_L and match_R out of sync. Update both atomically.
NIL value collision with real vertex 0. Use -1 or a sentinel.

Debugging Strategy

After each phase, print matching size and BFS layer counts
Verify match_L[u] == v iff match_R[v] == u
Augmenting path should alternate matched/unmatched

Mastery Criteria

Implement Hopcroft-Karp in ≤ 30 min from memory
Explain why √V phases suffice (sketch of proof)
Identify when bipartite matching applies to a problem stated in domain terms
State the difference between bipartite matching and Hungarian (weighted)
Estimate runtime for given V, E

LeetCode Interview — Extreme Coding