p38 — Subsets
Source: LeetCode 78 · Medium · Topics: Array, Backtracking, Bit Manipulation Companies (2024–2025 frequency): Meta (high), Amazon (high), Google (high), Microsoft (medium), Apple (medium) Loop position: algorithm round; often paired with Combinations (p37) and Permutations (p36) to test family recognition.
1. Quick Context
Given an array of distinct integers, return the powerset — all 2^n subsets. The follow-up (LC 90 — Subsets II, with duplicates) is the staff signal.
Three equally valid approaches — interviewer-level signal comes from knowing all three and the trade-offs:
- Backtracking with start index (variant of p37’s template — yield at EVERY node, not just leaves).
- Include/exclude DFS (binary choice tree, depth = n).
- Iterative powerset growth — for each new element, double the result by including it in every existing subset.
- Bitmask enumeration (n ≤ 32) — iterate masks
0..2^n - 1, decode bits.
Time: O(n · 2^n) — 2^n subsets, O(n) to copy each. Space: O(n) recursion (or O(1) for the iterative/bitmask variants, ignoring output).
The discriminator: writing the include/exclude template cleanly, AND knowing why bitmask is fastest for n ≤ 20.
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/subsets/
STOP. 15-min timer.
3. Prerequisite Concepts
- p37 (Combinations) — subsets are “all sizes” instead of fixed k.
- Pascal’s identity: 2^n = sum over k of C(n, k).
- Bit manipulation basics:
mask >> i & 1.
4. How to Approach (FRAMEWORK Steps 1–9)
Step 1 — Restate: “All 2^n subsets of the input array. Any order.”
Step 2 — Clarify:
- “Distinct elements?” (Yes — that’s the LC 78 spec. LC 90 is the duplicate variant.)
- “Include the empty subset?” (Yes.)
- “Order of subsets in output?” (Free.)
- “Order within each subset?” (Free, but typically input order is convention.)
Step 3 — Constraints: n ≤ 10 → 1024 subsets. n ≤ 20 → ~1M. Tight enough that the algorithm matters but not lifesaving.
Step 4 — Examples:
[1,2,3]→[[],[1],[2],[3],[1,2],[1,3],[2,3],[1,2,3]](8 subsets).[]→[[]].
Step 5 — Brute Force: Bitmask enumeration is already O(n · 2^n). The “backtracking” framing is for pedagogy.
Step 6 — Complexity: O(n · 2^n). Space O(n) recursion or O(1) bitmask.
Step 7 — Pattern: “Powerset / all-subset” → 2^n enumeration. Three equivalent algorithms.
Step 8 — Develop: see solution.py — start-index, include/exclude, iterative, bitmask, and LC 90 (duplicates).
Step 9 — Test: edge cases (empty, single, full overlap), stress vs itertools.chain over combinations.
5. Progressive Hints
If stuck for 5 min, open hints.md.
6. Deeper Insight — Four algorithms, one shape
6.1 Start-index backtracking — “snapshot at every node”
def backtrack(start, path):
result.append(path[:]) # SNAPSHOT here, not at a leaf
for i in range(start, n):
path.append(nums[i])
backtrack(i + 1, path)
path.pop()
The crucial difference from p37: result.append is at the TOP of the function, not gated by len(path) == k. Every node of the recursion tree is itself a valid subset.
The recursion tree has 2^n nodes (each node = one subset).
6.2 Include/exclude DFS
def backtrack(i, path):
if i == n:
result.append(path[:]); return
# exclude nums[i]
backtrack(i + 1, path)
# include nums[i]
path.append(nums[i])
backtrack(i + 1, path)
path.pop()
Depth = n, branching = 2. Generates 2^n leaves; each leaf is a subset. Mirrors the bit-by-bit decision: “is element i in the subset?”
This is the cleanest framing for connecting to DP (e.g., knapsack: include/exclude each item) and to LC 698 (Partition to K Equal Sum Subsets).
6.3 Iterative powerset growth
result = [[]]
for x in nums:
result += [sub + [x] for sub in result]
Starts with [[]]. For each new element, doubles the result by including the element in every existing subset. O(n · 2^n) time, no recursion. The most elegant code.
6.4 Bitmask enumeration
for mask in range(1 << n):
subset = [nums[i] for i in range(n) if mask >> i & 1]
result.append(subset)
O(n · 2^n). Trivially parallelizable (split masks across workers). Fastest in practice for n ≤ 20 (no recursion overhead, cache-friendly).
6.5 LC 90 — Subsets II (duplicates in input)
Same as LC 78 backtracking with the same dedup line as LC 40:
for i in range(start, n):
if i > start and nums[i] == nums[i-1]: continue
...
Note: i > start, NOT i > 0. At the same recursion level (same start), skip equal siblings; the first time at a level, take it. This is the canonical-form pruning principle from p36/p37/p40 — internalize it.
6.6 Why so many algorithms?
Each one teaches a different concept:
- Start-index: the “yield at every node” pattern (also used in tree traversals).
- Include/exclude: bridges backtracking → DP. Knapsack is exactly this.
- Iterative: the “build up” pattern. Cleanest code; hardest to extend.
- Bitmask: bridges to bitmask DP (LC 526, LC 691).
A staff candidate uses include/exclude when extending to DP, bitmask when n is small and speed matters, iterative when readability matters, and start-index when extending to size-restricted variants (LC 77, LC 39).
7. Anti-Pattern Analysis
- Snapshot inside the for-loop instead of at function entry — misses the empty subset and produces duplicates.
- Using
setoffrozensetto dedup for LC 90. Exponentially wasteful; the sort+skip line dedupes at the source. for i in range(n)instead offor i in range(start, n)— generates duplicates{1,2}and{2,1}.result.append(path)instead ofpath[:]— same bug as p36.- Bitmask without
int.bit_length()or pre-sorting when needed — order issues. - For n=30+ using bitmask — 2^30 = 1B subsets. Output is huge regardless of algorithm; the issue is whether you NEED all of them. For “count” or “sum over subsets,” switch to DP.
- For LC 90, dedup line
i > 0instead ofi > start— over-prunes; loses valid subsets.
8. Skills & Takeaways
- Four algorithms, all O(n · 2^n) but with different code/extension properties.
- “Snapshot at every node” vs “snapshot at leaves” — the structural choice.
- LC 90 dedup:
i > start and nums[i] == nums[i-1]. - Bitmask = the bridge to bitmask DP (Phase 03).
- Include/exclude = the bridge to knapsack DP (Phase 05).
9. When to Move On
Move on when:
- You can write all four variants in under 10 min total.
- You can write LC 90 (duplicates) in under 5 min.
- You can articulate which algorithm you’d choose in each context (small n, large n, DP extension, parallel).
- Stress test of 200 random inputs (n=0..8) matches a reference oracle.
10. Company Context
Meta:
- L4/L5 standard. Often paired with LC 90 immediately, then LC 39, then “now count subsets with sum = S” (DP).
- Scorecard phrase: “Algorithm — articulated 3 algorithms, picked the right one for the follow-up.”
Amazon:
- L5 staple. The “iterative powerset growth” version is a frequent expected answer.
Google:
- L5+ uses this as a launching pad to bitmask DP (LC 526, LC 691, LC 1349) — be ready to discuss.
Microsoft:
- L63. Often combined with “how many subsets sum to S” → bridge to DP.
Apple:
- Common in algo loops; data and ML teams use the “all feature combinations” framing.
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff/Principal |
|---|---|---|---|---|
| Approach | One algorithm | Two (backtracking + iterative) | All four, picks the right one | + extends to bitmask DP and knapsack |
| LC 90 dedup | Set-based | i > 0 rule (wrong) | i > start rule (correct) | + proves canonical-form correctness |
| Connections | None | “Like combinations” | + Pascal’s identity | + bridge to subset-sum DP, partition DP, bitmask DP |
12. Level Delta
Mid (L4):
- One correct algorithm (backtracking or iterative).
- Handles empty input.
Senior (L5):
- Two algorithms minimum (start-index + iterative OR include/exclude + bitmask).
- Solves LC 90 with correct dedup.
- Discusses O(n · 2^n) and why output is the limit.
Staff (L6):
- All four algorithms.
- Connects include/exclude to knapsack DP and bitmask to bitmask DP.
- Discusses streaming / generator variant.
- Handles “subset with property X” (count, sum, etc.) — knows when to switch to DP.
Principal (L7+):
- Discusses production: feature-flag combinations (test all on/off combos), formal verification (state-space enumeration for tiny systems), policy enumeration in RL (small action spaces), regulatory compliance (audit all combinations of attributes).
- Discusses subset-sum is NP-hard; powerset enumeration is exponential — when n ≥ 30, switch to DP, branch-and-bound, or approximation.
- Discusses sketching: for “approximately count subsets with property P,” use hashing/sampling.
13. Follow-up Q&A
Q1: Subsets II (LC 90) — duplicates in input.
A: Sort input. In the loop: if i > start and nums[i] == nums[i-1]: continue. Canonical-form pruning — at each recursion level, take the first equal sibling only.
Q2: Count subsets with sum = target.
A: Don’t enumerate. DP: dp[i][s] = dp[i-1][s] + dp[i-1][s - nums[i]]. O(n · S) time, O(S) space with rolling array. Subset-sum DP; LC 416 family.
Q3: K-th subset in lex order. A: Combinatorial number system, or interpret k’s binary representation as a subset-mask (with care about ordering).
Q4: Subsets summing to target with each element usable once. A: LC 39 / LC 40 / LC 416 family — DP or backtracking + pruning.
Q5: Subsets when n = 50. A: 2^50 ≈ 10^15. Can’t enumerate. Use meet-in-the-middle: split into halves of 25, enumerate each (2^25 ≈ 33M each), join. Or DP if there’s structure. Or sampling if approximate is OK. See Phase 03 Lab 09 — Meet-in-the-Middle.
14. Solution Walkthrough
See solution.py:
subsets_backtrack(nums)— start-index, snapshot at every node.subsets_include_exclude(nums)— binary choice tree.subsets_iterative(nums)— powerset growth, no recursion.subsets_bitmask(nums)— enumerate0..2^n - 1.subsets_with_dup(nums)— LC 90, sort + skip-dedup.subsets_brute(nums)— oracle viaitertoolschain over combinations.edge_cases()+stress_test()— verify all five vs oracle.
Run: python3 solution.py.
15. Beyond the Problem
Principal-engineer code review comment:
“Three things. (1) Pick ONE algorithm as the ‘canonical’ for the codebase — most teams should standardize on iterative (
result = [[]]; for x in nums: result += [s+[x] for s in result]) because it’s 5 lines and obviously correct. (2) For LC 90, the dedup line is THE thing that goes wrong; add a unit test specifically for[1,2,2,3]that asserts no duplicate subset appears. (3) If subsets are an INPUT to downstream code (not the final answer), don’t materialize them — write a generator (yield from ...) and let the caller consume on the fly. For n=20 that’s a 100× memory win.”
Connections forward:
- p37 — Combinations: size-restricted subsets.
- p36 — Permutations: ordered variant.
- LC 90 — Subsets II: dedup variant.
- LC 416 — Partition Equal Subset Sum: counting via subset-sum DP.
- LC 698 — Partition to K Equal Sum Subsets: backtracking + pruning.
- Phase 03 Lab 08 — Bitmask DP.
- Phase 03 Lab 09 — Meet-in-the-Middle (for n=40+).
- Phase 05 — DP labs: knapsack, subset-sum, partition.
- Production: feature flag combinations, regulatory audit, formal verification of small systems.