Lab 02 — Bipartite Matching (Hopcroft-Karp)

Goal

Implement Hopcroft-Karp for maximum bipartite matching, achieving O(E·√V), and understand when it beats general max flow.

Background

Bipartite matching: given a bipartite graph (vertices split into L and R, edges only between L and R), find the largest set of edges with no shared endpoint.

  • Naïve augmenting path: O(V·E). For each unmatched left vertex, find an augmenting path via DFS.
  • Hopcroft-Karp (1973): find multiple vertex-disjoint shortest augmenting paths per phase. O(E·√V).

The √V comes from the fact that after √V phases, all remaining augmenting paths have length > √V, and there can be at most √V such paths.

Hopcroft-Karp is a special case of Dinic’s algorithm applied to a unit-capacity bipartite flow network. If you have Dinic, you have Hopcroft-Karp.

Interview Context

Bipartite matching shows up in:

  • ICPC (constant)
  • Assignment problems (jobs ↔ workers)
  • Some quant interviews on portfolio matching
  • Compiler register coalescing
  • Almost never in standard FAANG interviews

Recognizing that a problem is bipartite matching is the high-leverage skill; the algorithm is well-known.

When to Skip This Topic

Skip if any of these are true:

  • You’ve already done Lab 01 (Dinic handles bipartite matching as a special case)
  • You’re not targeting competitive / quant / assignment-heavy roles
  • You have less than 2 weeks for this phase

The reduction skill is what matters. Skip the algorithm if you can recognize the reduction and use Dinic.

Problem Statement

Maximum Bipartite Matching.

Given a bipartite graph with L left vertices, R right vertices, and M edges, find the maximum matching size.

Variant: Job Assignment. N workers, N jobs. Worker i can do a subset of jobs. Assign each worker to at most one job, each job to at most one worker. Maximize assignments.

Constraints

  • 1 ≤ L, R ≤ 10^5
  • 1 ≤ M ≤ 10^6
  • Wall-clock: < 1 sec

Clarifying Questions

  1. Are the partitions L and R given, or do I need to detect bipartiteness? (Usually given.)
  2. Are edges weighted? (No — that’s a different problem: Hungarian algorithm or min-cost max flow.)
  3. Output the matching or just the size? (Both versions are common.)

Examples

L = {1, 2, 3}, R = {a, b, c}
Edges: 1-a, 1-b, 2-b, 3-c
Max matching: {1-a, 2-b, 3-c}, size = 3
L = {1, 2}, R = {a, b}
Edges: 1-a, 2-a
Max matching: {1-a} or {2-a}, size = 1

Brute Force

Try all subsets of edges; check that no vertex appears twice; track max. O(2^M · M).

Better naïve: for each left vertex in order, DFS to find an augmenting path. O(V·E).

Brute Force Complexity

  • Subsets: O(2^M)
  • Per-vertex DFS: O(V·E). Acceptable for V·E ≤ ~10^7.

Optimization Path

Hopcroft-Karp:

  1. Phase 1: BFS from all unmatched left vertices, computing layers in the residual graph.
  2. Phase 2: DFS from each unmatched left vertex, finding vertex-disjoint shortest augmenting paths.
  3. Repeat until no augmenting path exists.

The phase count is O(√V), giving total O(E·√V).

Final Expected Approach

class HopcroftKarp:
    def __init__(self, left_size, right_size):
        self.L = left_size
        self.R = right_size
        self.adj = [[] for _ in range(left_size)]
        self.NIL = -1
    
    def add_edge(self, u, v):
        self.adj[u].append(v)
    
    def _bfs(self):
        q = deque()
        self.dist = [float('inf')] * self.L
        for u in range(self.L):
            if self.match_L[u] == self.NIL:
                self.dist[u] = 0
                q.append(u)
        found = False
        while q:
            u = q.popleft()
            for v in self.adj[u]:
                pair = self.match_R[v]
                if pair == self.NIL:
                    found = True
                elif self.dist[pair] == float('inf'):
                    self.dist[pair] = self.dist[u] + 1
                    q.append(pair)
        return found
    
    def _dfs(self, u):
        for v in self.adj[u]:
            pair = self.match_R[v]
            if pair == self.NIL or (self.dist[pair] == self.dist[u] + 1 and self._dfs(pair)):
                self.match_L[u] = v
                self.match_R[v] = u
                return True
        self.dist[u] = float('inf')
        return False
    
    def max_matching(self):
        self.match_L = [self.NIL] * self.L
        self.match_R = [self.NIL] * self.R
        matching = 0
        while self._bfs():
            for u in range(self.L):
                if self.match_L[u] == self.NIL and self._dfs(u):
                    matching += 1
        return matching

Data Structures

  • Adjacency list (left → list of right)
  • match_L[u], match_R[v]: current partner or NIL
  • dist[u]: BFS layer of left vertex
  • Queue for BFS

Correctness Argument

  • Augmenting path: path alternating unmatched-matched-unmatched… edges, starting and ending at unmatched vertices. Flipping the edges along the path increases matching size by 1.
  • Berge’s theorem: matching is maximum iff no augmenting path exists.
  • Hopcroft-Karp: in each phase, finds a maximal set of vertex-disjoint shortest augmenting paths. After √V phases, no short augmenting paths remain; at most √V remaining ones contribute one each.

Complexity

  • Time: O(E · √V)
  • Space: O(V + E)

For V = 10^5, E = 10^6: roughly 10^7.5 ≈ 3·10^7 ops — well under 1 sec in C++.

Implementation Requirements

  • Use BFS to detect all unmatched left vertices and compute layers
  • DFS must respect layer constraint (dist[pair] == dist[u] + 1)
  • Set dist[u] = infinity on failed DFS to prune subsequent visits
  • Repeat until BFS finds no augmenting path

Tests

  • Empty graph → 0
  • Single edge → 1
  • Complete bipartite K_{n,n} → n
  • Star (1 left, n right) → 1
  • Path 1-a-2-b-3 → 2

Follow-up Questions

  • Weighted matching (maximize sum of edge weights, not count). → Hungarian algorithm O(V³) or min-cost max flow.
  • Online matching (edges arrive one at a time). → Greedy is 1/2-competitive; ranking is (1 − 1/e)-competitive.
  • Stable matching (Gale-Shapley). → Different problem; preferences instead of binary edges.
  • Edge-disjoint paths from s to t. → Reduces to max flow with all capacities 1.

Product Extension

  • Ad-slot allocation (advertisers ↔ impressions)
  • Ride-sharing dispatch (drivers ↔ riders)
  • Course allocation (students ↔ classes with capacity)
  • Resource scheduling

Language/Runtime Follow-ups

  • C++: competitive programmers use a tight 50-line version
  • Python: recursion depth and constant factor make this borderline at V = 10^5; use sys.setrecursionlimit or iterative
  • Rust: ownership makes the in-place matching arrays a small wrestle

Common Bugs

  1. Forgot to reset dist[u] = infinity on DFS failure. Re-explores dead ends; slow.
  2. DFS doesn’t respect the layer constraint. Same as Ford-Fulkerson; loses √V factor.
  3. match_L and match_R out of sync. Update both atomically.
  4. NIL value collision with real vertex 0. Use -1 or a sentinel.

Debugging Strategy

  • After each phase, print matching size and BFS layer counts
  • Verify match_L[u] == v iff match_R[v] == u
  • Augmenting path should alternate matched/unmatched

Mastery Criteria

  • Implement Hopcroft-Karp in ≤ 30 min from memory
  • Explain why √V phases suffice (sketch of proof)
  • Identify when bipartite matching applies to a problem stated in domain terms
  • State the difference between bipartite matching and Hungarian (weighted)
  • Estimate runtime for given V, E