p37 — Combinations

Source: LeetCode 77 · Medium · Topics: Backtracking Companies (2024–2025 frequency): Amazon (high), Google (medium-high), Meta (medium), Microsoft (medium), Apple (medium) Loop position: algorithm round, second slot. Tests whether the candidate distinguishes “permutation” from “combination” — a common confusion.

1. Quick Context

Given integers n and k, return all C(n, k) combinations of k numbers chosen from [1..n]. Order within a combination is non-decreasing by convention (and to deduplicate naturally).

The key distinction from permutations: a start index replaces used[]. Each recursion picks values from [start..n], ensuring monotonically increasing selection — which automatically deduplicates {1,2} and {2,1}.

def backtrack(start, path):
    if len(path) == k:
        result.append(path[:]); return
    for i in range(start, n + 1):
        path.append(i)
        backtrack(i + 1, path)
        path.pop()

Time: O(k · C(n, k)). Space: O(k) recursion.

The discriminator: the pruning optimizationfor i in range(start, n - (k - len(path)) + 2) — which skips branches that can’t reach length k. Without it, you still iterate over impossible starts; with it, you cut significant work at large n.


🔗 https://leetcode.com/problems/combinations/

STOP. 15-min timer.


3. Prerequisite Concepts

  • p36 (Permutations).
  • The combinatorial identity C(n, k) = C(n-1, k-1) + C(n-1, k) (Pascal’s rule).
  • Why monotonic order (i < j for path positions i, j) deduplicates.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “All k-subsets of {1, 2, …, n}, as lists.”

Step 2 — Clarify:

  • “Numbers from 1 to n inclusive?” (Yes.)
  • “k ≤ n?” (Yes per spec.)
  • “Order within each combination?” (Non-decreasing — natural for dedup.)

Step 3 — Constraints: n ≤ 20, k ≤ n. C(20, 10) ≈ 184k. Manageable.

Step 4 — Examples:

  • n=4, k=2[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]].
  • n=1, k=1[[1]].
  • k=0[[]].

Step 5 — Brute Force: Iterate all 2^n subsets, keep those of size k. O(2^n) — vastly worse than O(k · C(n,k)) for small k. State it.

Step 6 — Complexity: O(k · C(n, k)) time. O(k) recursion depth.

Step 7 — Pattern: “Choose k from n without repetition, unordered” → combinations template (start index).

Step 8 — Develop: see solution.py. Both naive and pruned loop bound.

Step 9 — Test: edge cases (k=0, k=n, n=1), stress vs itertools.combinations.


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — The pruning that changes everything

6.1 The naive bound vs the tight bound

Naive: for i in range(start, n + 1). Iterates even when it’s IMPOSSIBLE to complete path to length k.

Tight: for i in range(start, n - (k - len(path)) + 2).

Derivation: we need to add (k - len(path)) more elements. The last possible starting value i is one that still leaves (k - len(path) - 1) elements above it in [i+1..n]. That requires n - i >= k - len(path) - 1, i.e., i <= n - (k - len(path)) + 1. So range’s upper bound (exclusive) is n - (k - len(path)) + 2.

For n=20, k=10, the naive variant explores ~3× more nodes than the pruned one. For larger n, the gap widens.

6.2 Why the start index suffices (no used[])

In permutations, we revisit “early” elements at deeper levels (e.g., [2,1,3] after [1,2,3]). In combinations, we never revisit — once we choose i, the next pick is from [i+1..n]. The strictly increasing structure makes {1,2} and {2,1} the SAME thing, and we generate only the canonical (non-decreasing) form.

6.3 LC 39 — Combination Sum (with reuse)

Same template, but backtrack(i, ...) (not i+1) — allows reusing the same element. Plus a target -= candidates[i] accumulator with target == 0 as the success condition and target < 0 as a prune.

6.4 LC 40 — Combination Sum II (with duplicates in input, no reuse)

Same template + the same dedup line as LC 47: if i > start and candidates[i] == candidates[i-1]: continue. Note i > start (not i > 0) — at the same recursion level, skip equal siblings; the first time at a level, take it.

6.5 LC 216 — Combination Sum III

k numbers from [1..9] summing to n. Pure combination of LC 77 + LC 39 logic.

6.6 Connection to Pascal’s triangle

C(n, k) = C(n-1, k-1) [contains n] + C(n-1, k) [doesn’t contain n]. This is the “include/exclude” framing of the same algorithm — the recursion tree has the same shape, just viewed differently. The include/exclude framing dominates subsets (p38).


7. Anti-Pattern Analysis

  1. Using used[] as in permutations. Generates k! × C(n,k) = P(n,k) results with massive duplication. The dedup-at-end is exponentially wasteful.
  2. backtrack(start, ...) then recursing with start instead of i+1. Allows reuse — that’s LC 39, not LC 77.
  3. No pruning bound. Works but slow for large n.
  4. Generating all 2^n subsets and filtering by size k. O(2^n) — vastly slower for small k.
  5. result.append(path) instead of path[:]. Same bug as p36.
  6. Off-by-one on the rangerange(start, n) instead of range(start, n + 1) (1-indexed!).
  7. For LC 39 (reuse), forgetting the target < 0 prune — the recursion explodes.

8. Skills & Takeaways

  • Start-index template = the combination signature.
  • Pruning bound: n - (k - len(path)) + 2. Memorize.
  • Reuse variant = recurse with i, not i+1.
  • Duplicates-in-input variant = sort + if i > start and a[i] == a[i-1]: continue.
  • The include/exclude view (Pascal’s rule) — bridges to subsets (p38).

9. When to Move On

Move on when:

  • You can write the basic combinations template in under 90 seconds.
  • You can add the pruning bound from memory.
  • You can write LC 39 and LC 40 in under 5 minutes each.
  • Stress test of 200 random (n, k) matches itertools.combinations.

10. Company Context

Amazon:

  • L5 staple. Combined with “now find combinations summing to target” (LC 39) is the standard 2-part question.
  • Scorecard phrase: “Algorithm — used start-index template; pruning bound on second pass.”

Google:

  • L4/L5. Often paired with LC 78 (Subsets) to test whether you see the family connection.
  • Scorecard phrase: “Strong — clean template, articulated connection to subsets and permutations.”

Meta:

  • Standard backtracking warm-up; rare as the sole question.

Microsoft:

  • L63. Often combined with “now stream them” (generator variant).

Apple:

  • Common in algo loops, particularly for ML / data teams (feature combinations).

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
ApproachPermute-and-dedupeStart index, naive boundStart index + pruning bound+ cites Pascal’s rule and the include/exclude view
VariantsKnows only LC 77+ LC 39+ LC 40 (dedup)+ LC 216 + connection to dynamic programming (knapsack family)
OutputIncludes duplicatesCleanClean + complexity+ generator variant for memory

12. Level Delta

Mid (L4):

  • Start-index template, correct snapshot.
  • Handles k=0, k=n, n=1.

Senior (L5):

  • Adds the pruning bound and explains the derivation.
  • Solves LC 39 (reuse) and LC 40 (duplicates) on demand.

Staff (L6):

  • Discusses the include/exclude framing and connection to DP / knapsack.
  • Generator variant for streaming.
  • Discusses when combinations are the wrong tool — e.g., k-th combination (LC 60 analogue) via combinatorial number system.

Principal (L7+):

  • Discusses production: feature selection in ML, A/B test bucket enumeration, combinatorial test generation (pairwise / t-wise coverage tools like ACTS), portfolio combinatorial optimization.
  • Discusses parallel enumeration via prefix-based work splitting.
  • Discusses lattice / Bratteli diagrams for advanced combinatorial enumeration.

13. Follow-up Q&A

Q1: Combination Sum (LC 39) — same value usable infinitely. A: Recurse with i instead of i + 1. Subtract from target; success when target == 0; prune when target < 0.

Q2: Combination Sum II (LC 40) — duplicates in input, each used once. A: Sort + if i > start and a[i] == a[i-1]: continue + recurse with i + 1.

Q3: K-th combination of [1..n] choose k. A: Combinatorial number system. At each position, the i-th element of the k-th combination is determined by counting how many combinations start with each prefix. O(k · n) — no enumeration needed.

Q4: Generate combinations as a stream. A: Python generator with yield list(path) instead of result.append(...). Same template.

Q5: Why not bitmask? A: For k-combinations from n ≤ 32, enumerate bitmasks with exactly k bits set (Gosper’s hack: t = v | (v - 1); w = (t + 1) | (((~t & -~t) - 1) >> (v.bit_length() of (v & -v)))). O(1) per next-combination. Beautiful for n ≤ 64; backtracking is clearer.


14. Solution Walkthrough

See solution.py:

  • combine(n, k) — start-index template with pruning bound.
  • combine_naive(n, k) — without pruning, for comparison.
  • combine_brute(n, k)itertools.combinations oracle.
  • edge_cases() + stress_test() — 200 random (n, k), all variants agree.

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“Two notes. (1) The pruning bound n - (k - len(path)) + 2 is correct but reads like noise. Inline a one-line comment showing the derivation, and consider naming a local: max_first = n - need + 1. (2) If this is a hot path, switch to bitmask enumeration with Gosper’s hack — branchless, no recursion, ~10× faster for n ≤ 64. But ONLY for hot paths — for everything else, the recursion is more maintainable. (3) For very large n with sparse k (e.g., n=10000, k=3), the combinatorial number system gives O(k) per next-combination — useful when you want to sample or stream.”

Connections forward:

  • p38 — Subsets: include/exclude view of the same template.
  • p36 — Permutations: contrasting “all orderings” vs “unordered choice”.
  • LC 39 — Combination Sum: reuse variant.
  • LC 40 — Combination Sum II: duplicate-input variant.
  • LC 216 — Combination Sum III: bounded-k + target.
  • LC 90 — Subsets II: dedup version of p38.
  • Phase 02 Lab 08 — Backtracking lab.
  • Phase 03 Lab 08 — Bitmask DP: connection to bit-enumeration via Gosper’s hack.
  • Production: feature selection (ML), A/B-test bucket design, combinatorial test coverage.