Lab 02 — DFS Connected Components (Number of Islands)
Goal
Implement DFS on a 2D grid to enumerate connected components, both recursively and iteratively. After this lab you should be able to write numIslands from a blank slate in <8 minutes, convert recursive DFS to iterative DFS in <3 minutes, and extend the template to any grid-component problem (max area, perimeter, surrounded regions) by changing 5 lines or fewer.
Background Concepts
A grid graph treats each cell (r, c) as a node and edges as adjacencies between 4-connected (or 8-connected) neighboring cells of compatible type. A connected component is a maximal set of cells reachable from each other. Counting components reduces to: scan all cells; whenever an unvisited “land” cell is found, increment the count and DFS-mark its entire component as visited.
DFS is naturally recursive: enter a cell, mark visited, recurse to each valid neighbor. The recursion depth equals the longest path in the component; for an R × C grid the worst case is R · C — at 300 × 300 that’s 9 × 10^4, exceeding Python’s default 1000 recursion limit. The iterative version uses an explicit stack and avoids this entirely.
Interview Context
Number of Islands (LC 200) is the most-asked grid-DFS problem in interview history — it has appeared at virtually every FAANG company at least once, and at Amazon, Meta, and Google in the last year. It’s a stock phone-screen problem at L3-L4 and a warm-up at L5+. Bombing it is a no-hire signal at any senior level. The interviewer expects you to know it cold, and the value-add comes from how you handle the follow-ups: max area, surrounded regions, online updates (LC 305), grid as adjacency matrix (LC 547).
Problem Statement
Given an m × n 2D binary grid where '1' represents land and '0' represents water, return the number of islands. An island is a maximal group of land cells connected 4-directionally (horizontally or vertically).
Constraints
- 1 ≤ m, n ≤ 300
grid[i][j]is'0'or'1'.- The grid is surrounded implicitly by water on all sides.
Clarifying Questions
- Is diagonal adjacency 4-connected or 8-connected? (4-connected.)
- Are the cell values strings
'0'/'1'or ints? (Strings, per LeetCode.) - Can the input be modified in place? (Usually yes — saves O(R·C) visited memory.)
- Are very tall thin grids possible? (Yes — 1 × 90000 is allowed by the constraint.)
- Is the count required to fit in int32? (Yes; max islands ≈ 4.5 × 10^4.)
Examples
grid = [["1","1","0"],
["1","0","0"],
["0","0","1"]]
→ 2
grid = [["1","1","1"],
["1","1","1"],
["0","0","0"]]
→ 1
Initial Brute Force
For each cell, if it’s land and unvisited, increment count and recursively flood-fill. There is no significantly worse “naive” — DFS is the natural approach.
Brute Force Complexity
O(R · C) — every cell is visited exactly once. Space O(R · C) for the recursion stack worst case.
Optimization Path
There’s no asymptotic improvement; only constant-factor and stack-depth improvements:
- In-place marking (mutate
'1'→'0') — saves O(R · C) memory. - Iterative DFS via stack — avoids Python recursion-limit blow-up at R · C ≥ 10^4.
- DSU as alternative — slightly slower in practice (α factor) but composable for online problems (LC 305).
- BFS variant — same asymptotic, different constant; sometimes preferred in Python because deque-pop has lower per-call cost than function calls.
Final Expected Approach
Iterative DFS with in-place marking:
- Iterate over every cell.
- If
grid[r][c] == '1': increment count, push(r, c)to stack. - While stack non-empty: pop
(r, c); if already water, skip; mark'0'; push 4 neighbors that are land.
Recursive DFS is acceptable for small grids; mention recursion-limit and in-place marking on follow-up.
Data Structures Used
- The input grid itself as the visited bookmark (in-place mutation).
- An explicit stack (list, in Python) for iterative DFS.
Correctness Argument
Component-counting via DFS is correct because: (1) DFS from an unvisited land cell visits exactly the cells in its component (closure under adjacency); (2) marking visited prevents re-counting; (3) the outer loop ensures every cell is examined; (4) we increment count only on the first cell of each component. Termination follows from finite grid size.
Complexity
| Operation | Time | Space |
|---|---|---|
| Whole algorithm | O(R · C) | O(R · C) recursion or O(min(R,C)) for BFS |
Implementation Requirements
def numIslands(grid):
if not grid:
return 0
R, C = len(grid), len(grid[0])
DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
count = 0
for r0 in range(R):
for c0 in range(C):
if grid[r0][c0] != '1':
continue
count += 1
stack = [(r0, c0)]
while stack:
r, c = stack.pop()
if grid[r][c] != '1':
continue
grid[r][c] = '0'
for dr, dc in DIRS:
nr, nc = r + dr, c + dc
if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == '1':
stack.append((nr, nc))
return count
Tests
- All water:
[["0"]]→ 0. - All land:
[["1"]*5 for _ in range(5)]→ 1. - Diagonal 1s only: 4-connected → many islands.
- Single column:
[["1"],["0"],["1"],["0"],["1"]]→ 3. - Single row: same logic.
- Snake pattern: alternating
1rows /0rows → R/2 components. - Large: 300 × 300 random — must complete in <1s.
Follow-up Questions
- “Max area of an island.” (LC 695) → DFS returns area; track max.
- “Number of distinct islands” (LC 694) → record canonical-form path of each DFS; dedupe by hash.
- “Surrounded regions” (LC 130) → DFS from border, mark; flip rest.
- “Add land online and report island count after each addition” (LC 305) → DSU.
- “Are two grids the same modulo rotation?” — open-ended modeling; involves shape signatures.
Product Extension
Connected-component analysis underpins image segmentation (region labeling in computer vision), ACL group expansion in identity systems, and graph-clustering for fraud-ring detection. The grid is just a constrained adjacency; the structure generalizes to any sparse adjacency.
Language/Runtime Follow-ups
- Python: recursion limit at 1000 means recursive DFS fails at large grids. Use iterative or
sys.setrecursionlimit(10**6). List-as-stack is fast;tuplekeys in any auxiliary structures are fine. - Java: write
int[][] gridorchar[][]. UseDeque<int[]>(ArrayDeque) for stack;int[]{r, c}instead ofPairfor performance. Don’t useStack(legacy synchronized class). - Go:
[][]byteor[][]int32. Slice-as-stack:s = s[:len(s)-1]. Two-int struct (type cell struct{ r, c int }) avoids slice allocation per push. - C++:
vector<vector<char>>orvector<string>.stack<pair<int,int>>. Useemplace_backto avoid copies. - JS/TS: arrays as stacks (
push/popare O(1)). For 2D grids prefergrid[r][c]over flat-array indexing for clarity; the speed diff is negligible at N = 9 × 10^4.
Common Bugs
- Bounds check missing on neighbor (negative or out-of-grid index).
- Marking visited after pushing instead of on push — same node enters stack many times via different neighbors.
- Recursive DFS without
sys.setrecursionlimitblowing the stack on 300 × 300 dense islands. - Mutating the input when the caller didn’t allow it — clarify in interviews.
- Treating
'1'as integer1(or vice versa) — equality check fails silently. - Off-by-one on
RandC(using< Rvs<= R). - Forgetting one of the four directions (typo in DIRS).
Debugging Strategy
For a 3 × 3 grid, print the grid after each DFS call to verify in-place marking. Add a print((r, c)) on each pop and verify the 4 neighbors are correctly considered. If count is too high, you’re probably re-counting a visited component (visited check missing). If too low, your DFS is exiting early — check the bounds and the equality on '1' vs 1.
Mastery Criteria
- Recognized “count connected groups in a grid” as DFS/BFS in <30 seconds.
- Wrote both recursive and iterative DFS from cold start in <8 minutes total.
- Handled the recursion-limit trap correctly when asked about 10^4 × 10^4 grids.
- Stated O(R · C) complexity unprompted.
- Solved LC 200 in <10 minutes from cold start.
- Solved LC 695 (Max Area) in <12 minutes by extending the template.
- Solved LC 130 (Surrounded Regions) in <20 minutes by inverting the search.
- Articulated when DSU is preferable to DFS (online updates, no spatial constraint).