Lab 02 — DFS Connected Components (Number of Islands)

Goal

Implement DFS on a 2D grid to enumerate connected components, both recursively and iteratively. After this lab you should be able to write numIslands from a blank slate in <8 minutes, convert recursive DFS to iterative DFS in <3 minutes, and extend the template to any grid-component problem (max area, perimeter, surrounded regions) by changing 5 lines or fewer.

Background Concepts

A grid graph treats each cell (r, c) as a node and edges as adjacencies between 4-connected (or 8-connected) neighboring cells of compatible type. A connected component is a maximal set of cells reachable from each other. Counting components reduces to: scan all cells; whenever an unvisited “land” cell is found, increment the count and DFS-mark its entire component as visited.

DFS is naturally recursive: enter a cell, mark visited, recurse to each valid neighbor. The recursion depth equals the longest path in the component; for an R × C grid the worst case is R · C — at 300 × 300 that’s 9 × 10^4, exceeding Python’s default 1000 recursion limit. The iterative version uses an explicit stack and avoids this entirely.

Interview Context

Number of Islands (LC 200) is the most-asked grid-DFS problem in interview history — it has appeared at virtually every FAANG company at least once, and at Amazon, Meta, and Google in the last year. It’s a stock phone-screen problem at L3-L4 and a warm-up at L5+. Bombing it is a no-hire signal at any senior level. The interviewer expects you to know it cold, and the value-add comes from how you handle the follow-ups: max area, surrounded regions, online updates (LC 305), grid as adjacency matrix (LC 547).

Problem Statement

Given an m × n 2D binary grid where '1' represents land and '0' represents water, return the number of islands. An island is a maximal group of land cells connected 4-directionally (horizontally or vertically).

Constraints

  • 1 ≤ m, n ≤ 300
  • grid[i][j] is '0' or '1'.
  • The grid is surrounded implicitly by water on all sides.

Clarifying Questions

  1. Is diagonal adjacency 4-connected or 8-connected? (4-connected.)
  2. Are the cell values strings '0'/'1' or ints? (Strings, per LeetCode.)
  3. Can the input be modified in place? (Usually yes — saves O(R·C) visited memory.)
  4. Are very tall thin grids possible? (Yes — 1 × 90000 is allowed by the constraint.)
  5. Is the count required to fit in int32? (Yes; max islands ≈ 4.5 × 10^4.)

Examples

grid = [["1","1","0"],
        ["1","0","0"],
        ["0","0","1"]]
→ 2

grid = [["1","1","1"],
        ["1","1","1"],
        ["0","0","0"]]
→ 1

Initial Brute Force

For each cell, if it’s land and unvisited, increment count and recursively flood-fill. There is no significantly worse “naive” — DFS is the natural approach.

Brute Force Complexity

O(R · C) — every cell is visited exactly once. Space O(R · C) for the recursion stack worst case.

Optimization Path

There’s no asymptotic improvement; only constant-factor and stack-depth improvements:

  • In-place marking (mutate '1''0') — saves O(R · C) memory.
  • Iterative DFS via stack — avoids Python recursion-limit blow-up at R · C ≥ 10^4.
  • DSU as alternative — slightly slower in practice (α factor) but composable for online problems (LC 305).
  • BFS variant — same asymptotic, different constant; sometimes preferred in Python because deque-pop has lower per-call cost than function calls.

Final Expected Approach

Iterative DFS with in-place marking:

  1. Iterate over every cell.
  2. If grid[r][c] == '1': increment count, push (r, c) to stack.
  3. While stack non-empty: pop (r, c); if already water, skip; mark '0'; push 4 neighbors that are land.

Recursive DFS is acceptable for small grids; mention recursion-limit and in-place marking on follow-up.

Data Structures Used

  • The input grid itself as the visited bookmark (in-place mutation).
  • An explicit stack (list, in Python) for iterative DFS.

Correctness Argument

Component-counting via DFS is correct because: (1) DFS from an unvisited land cell visits exactly the cells in its component (closure under adjacency); (2) marking visited prevents re-counting; (3) the outer loop ensures every cell is examined; (4) we increment count only on the first cell of each component. Termination follows from finite grid size.

Complexity

OperationTimeSpace
Whole algorithmO(R · C)O(R · C) recursion or O(min(R,C)) for BFS

Implementation Requirements

def numIslands(grid):
    if not grid:
        return 0
    R, C = len(grid), len(grid[0])
    DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    count = 0
    for r0 in range(R):
        for c0 in range(C):
            if grid[r0][c0] != '1':
                continue
            count += 1
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                if grid[r][c] != '1':
                    continue
                grid[r][c] = '0'
                for dr, dc in DIRS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == '1':
                        stack.append((nr, nc))
    return count

Tests

  • All water: [["0"]] → 0.
  • All land: [["1"]*5 for _ in range(5)] → 1.
  • Diagonal 1s only: 4-connected → many islands.
  • Single column: [["1"],["0"],["1"],["0"],["1"]] → 3.
  • Single row: same logic.
  • Snake pattern: alternating 1 rows / 0 rows → R/2 components.
  • Large: 300 × 300 random — must complete in <1s.

Follow-up Questions

  1. “Max area of an island.” (LC 695) → DFS returns area; track max.
  2. “Number of distinct islands” (LC 694) → record canonical-form path of each DFS; dedupe by hash.
  3. “Surrounded regions” (LC 130) → DFS from border, mark; flip rest.
  4. “Add land online and report island count after each addition” (LC 305) → DSU.
  5. “Are two grids the same modulo rotation?” — open-ended modeling; involves shape signatures.

Product Extension

Connected-component analysis underpins image segmentation (region labeling in computer vision), ACL group expansion in identity systems, and graph-clustering for fraud-ring detection. The grid is just a constrained adjacency; the structure generalizes to any sparse adjacency.

Language/Runtime Follow-ups

  • Python: recursion limit at 1000 means recursive DFS fails at large grids. Use iterative or sys.setrecursionlimit(10**6). List-as-stack is fast; tuple keys in any auxiliary structures are fine.
  • Java: write int[][] grid or char[][]. Use Deque<int[]> (ArrayDeque) for stack; int[]{r, c} instead of Pair for performance. Don’t use Stack (legacy synchronized class).
  • Go: [][]byte or [][]int32. Slice-as-stack: s = s[:len(s)-1]. Two-int struct (type cell struct{ r, c int }) avoids slice allocation per push.
  • C++: vector<vector<char>> or vector<string>. stack<pair<int,int>>. Use emplace_back to avoid copies.
  • JS/TS: arrays as stacks (push / pop are O(1)). For 2D grids prefer grid[r][c] over flat-array indexing for clarity; the speed diff is negligible at N = 9 × 10^4.

Common Bugs

  1. Bounds check missing on neighbor (negative or out-of-grid index).
  2. Marking visited after pushing instead of on push — same node enters stack many times via different neighbors.
  3. Recursive DFS without sys.setrecursionlimit blowing the stack on 300 × 300 dense islands.
  4. Mutating the input when the caller didn’t allow it — clarify in interviews.
  5. Treating '1' as integer 1 (or vice versa) — equality check fails silently.
  6. Off-by-one on R and C (using < R vs <= R).
  7. Forgetting one of the four directions (typo in DIRS).

Debugging Strategy

For a 3 × 3 grid, print the grid after each DFS call to verify in-place marking. Add a print((r, c)) on each pop and verify the 4 neighbors are correctly considered. If count is too high, you’re probably re-counting a visited component (visited check missing). If too low, your DFS is exiting early — check the bounds and the equality on '1' vs 1.

Mastery Criteria

  • Recognized “count connected groups in a grid” as DFS/BFS in <30 seconds.
  • Wrote both recursive and iterative DFS from cold start in <8 minutes total.
  • Handled the recursion-limit trap correctly when asked about 10^4 × 10^4 grids.
  • Stated O(R · C) complexity unprompted.
  • Solved LC 200 in <10 minutes from cold start.
  • Solved LC 695 (Max Area) in <12 minutes by extending the template.
  • Solved LC 130 (Surrounded Regions) in <20 minutes by inverting the search.
  • Articulated when DSU is preferable to DFS (online updates, no spatial constraint).