p37 — Combinations

Source: LeetCode 77 · Medium · Topics: Backtracking Companies (2024–2025 frequency): Amazon (high), Google (medium-high), Meta (medium), Microsoft (medium), Apple (medium) Loop position: algorithm round, second slot. Tests whether the candidate distinguishes “permutation” from “combination” — a common confusion.

1. Quick Context

Given integers n and k, return all C(n, k) combinations of k numbers chosen from [1..n]. Order within a combination is non-decreasing by convention (and to deduplicate naturally).

The key distinction from permutations: a start index replaces used[]. Each recursion picks values from [start..n], ensuring monotonically increasing selection — which automatically deduplicates {1,2} and {2,1}.

def backtrack(start, path):
    if len(path) == k:
        result.append(path[:]); return
    for i in range(start, n + 1):
        path.append(i)
        backtrack(i + 1, path)
        path.pop()

Time: O(k · C(n, k)). Space: O(k) recursion.

The discriminator: the pruning optimization — for i in range(start, n - (k - len(path)) + 2) — which skips branches that can’t reach length k. Without it, you still iterate over impossible starts; with it, you cut significant work at large n.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/combinations/

STOP. 15-min timer.

3. Prerequisite Concepts

p36 (Permutations).
The combinatorial identity C(n, k) = C(n-1, k-1) + C(n-1, k) (Pascal’s rule).
Why monotonic order (i < j for path positions i, j) deduplicates.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “All k-subsets of {1, 2, …, n}, as lists.”

Step 2 — Clarify:

“Numbers from 1 to n inclusive?” (Yes.)
“k ≤ n?” (Yes per spec.)
“Order within each combination?” (Non-decreasing — natural for dedup.)

Step 3 — Constraints: n ≤ 20, k ≤ n. C(20, 10) ≈ 184k. Manageable.

Step 4 — Examples:

n=4, k=2 → [[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]].
n=1, k=1 → [[1]].
k=0 → [[]].

Step 5 — Brute Force: Iterate all 2^n subsets, keep those of size k. O(2^n) — vastly worse than O(k · C(n,k)) for small k. State it.

Step 6 — Complexity: O(k · C(n, k)) time. O(k) recursion depth.

Step 7 — Pattern: “Choose k from n without repetition, unordered” → combinations template (start index).

Step 8 — Develop: see solution.py. Both naive and pruned loop bound.

Step 9 — Test: edge cases (k=0, k=n, n=1), stress vs itertools.combinations.

5. Progressive Hints

If stuck for 5 min, open hints.md.

6. Deeper Insight — The pruning that changes everything

6.1 The naive bound vs the tight bound

Naive: for i in range(start, n + 1). Iterates even when it’s IMPOSSIBLE to complete path to length k.

Tight: for i in range(start, n - (k - len(path)) + 2).

Derivation: we need to add (k - len(path)) more elements. The last possible starting value i is one that still leaves (k - len(path) - 1) elements above it in [i+1..n]. That requires n - i >= k - len(path) - 1, i.e., i <= n - (k - len(path)) + 1. So range’s upper bound (exclusive) is n - (k - len(path)) + 2.

For n=20, k=10, the naive variant explores ~3× more nodes than the pruned one. For larger n, the gap widens.

6.2 Why the start index suffices (no `used[]`)

In permutations, we revisit “early” elements at deeper levels (e.g., [2,1,3] after [1,2,3]). In combinations, we never revisit — once we choose i, the next pick is from [i+1..n]. The strictly increasing structure makes {1,2} and {2,1} the SAME thing, and we generate only the canonical (non-decreasing) form.

6.3 LC 39 — Combination Sum (with reuse)

Same template, but backtrack(i, ...) (not i+1) — allows reusing the same element. Plus a target -= candidates[i] accumulator with target == 0 as the success condition and target < 0 as a prune.

6.4 LC 40 — Combination Sum II (with duplicates in input, no reuse)

Same template + the same dedup line as LC 47: if i > start and candidates[i] == candidates[i-1]: continue. Note i > start (not i > 0) — at the same recursion level, skip equal siblings; the first time at a level, take it.

6.5 LC 216 — Combination Sum III

k numbers from [1..9] summing to n. Pure combination of LC 77 + LC 39 logic.

6.6 Connection to Pascal’s triangle

C(n, k) = C(n-1, k-1) [contains n] + C(n-1, k) [doesn’t contain n]. This is the “include/exclude” framing of the same algorithm — the recursion tree has the same shape, just viewed differently. The include/exclude framing dominates subsets (p38).

7. Anti-Pattern Analysis

Using used[] as in permutations. Generates k! × C(n,k) = P(n,k) results with massive duplication. The dedup-at-end is exponentially wasteful.
backtrack(start, ...) then recursing with start instead of i+1. Allows reuse — that’s LC 39, not LC 77.
No pruning bound. Works but slow for large n.
Generating all 2^n subsets and filtering by size k. O(2^n) — vastly slower for small k.
result.append(path) instead of path[:]. Same bug as p36.
Off-by-one on the range — range(start, n) instead of range(start, n + 1) (1-indexed!).
For LC 39 (reuse), forgetting the target < 0 prune — the recursion explodes.

8. Skills & Takeaways

Start-index template = the combination signature.
Pruning bound: n - (k - len(path)) + 2. Memorize.
Reuse variant = recurse with i, not i+1.
Duplicates-in-input variant = sort + if i > start and a[i] == a[i-1]: continue.
The include/exclude view (Pascal’s rule) — bridges to subsets (p38).

9. When to Move On

Move on when:

You can write the basic combinations template in under 90 seconds.
You can add the pruning bound from memory.
You can write LC 39 and LC 40 in under 5 minutes each.
Stress test of 200 random (n, k) matches itertools.combinations.

10. Company Context

Amazon:

L5 staple. Combined with “now find combinations summing to target” (LC 39) is the standard 2-part question.
Scorecard phrase: “Algorithm — used start-index template; pruning bound on second pass.”

Google:

L4/L5. Often paired with LC 78 (Subsets) to test whether you see the family connection.
Scorecard phrase: “Strong — clean template, articulated connection to subsets and permutations.”

Meta:

Standard backtracking warm-up; rare as the sole question.

Microsoft:

L63. Often combined with “now stream them” (generator variant).

Apple:

Common in algo loops, particularly for ML / data teams (feature combinations).

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff/Principal
Approach	Permute-and-dedupe	Start index, naive bound	Start index + pruning bound	+ cites Pascal’s rule and the include/exclude view
Variants	Knows only LC 77	+ LC 39	+ LC 40 (dedup)	+ LC 216 + connection to dynamic programming (knapsack family)
Output	Includes duplicates	Clean	Clean + complexity	+ generator variant for memory

12. Level Delta

Mid (L4):

Start-index template, correct snapshot.
Handles k=0, k=n, n=1.

Senior (L5):

Adds the pruning bound and explains the derivation.
Solves LC 39 (reuse) and LC 40 (duplicates) on demand.

Staff (L6):

Discusses the include/exclude framing and connection to DP / knapsack.
Generator variant for streaming.
Discusses when combinations are the wrong tool — e.g., k-th combination (LC 60 analogue) via combinatorial number system.

Principal (L7+):

Discusses production: feature selection in ML, A/B test bucket enumeration, combinatorial test generation (pairwise / t-wise coverage tools like ACTS), portfolio combinatorial optimization.
Discusses parallel enumeration via prefix-based work splitting.
Discusses lattice / Bratteli diagrams for advanced combinatorial enumeration.

13. Follow-up Q&A

Q1: Combination Sum (LC 39) — same value usable infinitely. A: Recurse with i instead of i + 1. Subtract from target; success when target == 0; prune when target < 0.

Q2: Combination Sum II (LC 40) — duplicates in input, each used once. A: Sort + if i > start and a[i] == a[i-1]: continue + recurse with i + 1.

Q3: K-th combination of [1..n] choose k. A: Combinatorial number system. At each position, the i-th element of the k-th combination is determined by counting how many combinations start with each prefix. O(k · n) — no enumeration needed.

Q4: Generate combinations as a stream. A: Python generator with yield list(path) instead of result.append(...). Same template.

Q5: Why not bitmask? A: For k-combinations from n ≤ 32, enumerate bitmasks with exactly k bits set (Gosper’s hack: t = v | (v - 1); w = (t + 1) | (((~t & -~t) - 1) >> (v.bit_length() of (v & -v)))). O(1) per next-combination. Beautiful for n ≤ 64; backtracking is clearer.

14. Solution Walkthrough

See solution.py:

combine(n, k) — start-index template with pruning bound.
combine_naive(n, k) — without pruning, for comparison.
combine_brute(n, k) — itertools.combinations oracle.
edge_cases() + stress_test() — 200 random (n, k), all variants agree.

Run: python3 solution.py.

15. Beyond the Problem

Principal-engineer code review comment:

“Two notes. (1) The pruning bound n - (k - len(path)) + 2 is correct but reads like noise. Inline a one-line comment showing the derivation, and consider naming a local: max_first = n - need + 1. (2) If this is a hot path, switch to bitmask enumeration with Gosper’s hack — branchless, no recursion, ~10× faster for n ≤ 64. But ONLY for hot paths — for everything else, the recursion is more maintainable. (3) For very large n with sparse k (e.g., n=10000, k=3), the combinatorial number system gives O(k) per next-combination — useful when you want to sample or stream.”

Connections forward:

p38 — Subsets: include/exclude view of the same template.
p36 — Permutations: contrasting “all orderings” vs “unordered choice”.
LC 39 — Combination Sum: reuse variant.
LC 40 — Combination Sum II: duplicate-input variant.
LC 216 — Combination Sum III: bounded-k + target.
LC 90 — Subsets II: dedup version of p38.
Phase 02 Lab 08 — Backtracking lab.
Phase 03 Lab 08 — Bitmask DP: connection to bit-enumeration via Gosper’s hack.
Production: feature selection (ML), A/B-test bucket design, combinatorial test coverage.

LeetCode Interview — Extreme Coding