Lab 01 — Two Pointers: 3Sum (Deep Duplicate Handling)

Goal

Master the opposite-ends two-pointer pattern on a sorted array, with the specific discipline required for correct duplicate suppression. Deliverable solves 3Sum in O(N²) time, O(1) extra space, returning all unique triplets — and you can articulate, line by line, why each duplicate-skip is needed and what bug it prevents.

Background Concepts

Sorting as a precondition for two-pointer; loop invariants on (l, r); duplicate suppression at three loci (the outer i, the inner l, the inner r); the relationship between 3Sum and 2Sum-on-sorted; why hash-set deduplication is the wrong tool here. Review pattern Two Pointers and pattern Hashing.

Interview Context

3Sum is one of the single most asked problems in Big Tech onsite loops — it appears at Meta, Amazon, Bloomberg, and Apple. The signal is whether you can articulate the duplicate-handling logic before you code, not retroactively. Naive candidates produce O(N²) triplets and dedup via a hash set (set(tuple(sorted(triplet)))), which works but signals weak invariant reasoning. Strong candidates handle duplicates inside the two-pointer loop with three skip-clauses and explain each one’s purpose. Elite candidates also address the trade-off vs the hash-based 3Sum (when the input is unsorted and you can’t sort).

Problem Statement

Given an integer array nums, return all unique triplets [nums[i], nums[j], nums[k]] such that i, j, k are distinct indices and nums[i] + nums[j] + nums[k] == 0. The result must not contain duplicate triplets.

Constraints

  • 3 ≤ N ≤ 3000
  • -10^5 ≤ nums[i] ≤ 10^5
  • Output is a list of triplets in any order; each triplet’s internal order doesn’t matter.
  • Triplets are deduplicated by value, not by index: [-1, 0, 1] from indices (0, 2, 4) and from (1, 3, 5) count once.

Clarifying Questions

  1. Should [-1, 0, 1] and [0, 1, -1] count as different triplets? (No — value-set duplicates.)
  2. Are duplicates in the input array allowed? (Yes — many test cases hinge on this.)
  3. Can the input be empty / size < 3? (Per constraints, N ≥ 3, but mention you’d guard.)
  4. Can values exceed 32-bit range when summed? (Per constraints, max |sum| ≤ 3·10^5, safe in 32-bit, but you should mention overflow as a habit.)
  5. Is there a strict no-extra-space requirement, or is the output allocation OK?

Examples

InputOutput
[-1,0,1,2,-1,-4][[-1,-1,2],[-1,0,1]]
[0,0,0,0][[0,0,0]]
[1,2,-2,-1][]
[][]
[0][]

Initial Brute Force

Three nested loops; check nums[i] + nums[j] + nums[k] == 0; dedup with a hash set of sorted triplets.

def three_sum_brute(nums):
    out = set()
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if nums[i] + nums[j] + nums[k] == 0:
                    out.add(tuple(sorted((nums[i], nums[j], nums[k]))))
    return [list(t) for t in out]

Brute Force Complexity

Time O(N³). Space O(N²) worst case for out (number of unique triplets). At N=3000, ~2.7×10^10 operations — far too slow (>30 seconds in any language).

Optimization Path

The sub-problem after fixing nums[i] is 2Sum on the remaining sorted array with target -nums[i]. 2Sum-on-sorted is O(N) via opposite-ends two-pointer. Total: O(N) outer × O(N) inner = O(N²).

Why sort? Sortedness gives 2Sum-on-sorted an O(N) two-pointer algorithm; without sort, 2Sum is O(N) via hash but combining hash-2Sum with outer triplet enumeration makes duplicate-handling much trickier.

Final Expected Approach

def three_sum(nums):
    nums.sort()
    n = len(nums)
    out = []
    for i in range(n - 2):
        if nums[i] > 0: break                              # early exit
        if i > 0 and nums[i] == nums[i - 1]: continue      # skip i-duplicate
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1
            elif s > 0:
                r -= 1
            else:
                out.append([nums[i], nums[l], nums[r]])
                l += 1; r -= 1
                while l < r and nums[l] == nums[l - 1]: l += 1   # skip l-duplicate
                while l < r and nums[r] == nums[r + 1]: r -= 1   # skip r-duplicate
    return out

Data Structures Used

  • The input array, sorted in place.
  • An output list of triplets.
  • Three integer indices (i, l, r).

No hash structures; no auxiliary lists beyond output.

Correctness Argument

After sorting, fix i. The remaining array nums[i+1..n-1] is sorted, and we run the canonical opposite-ends 2Sum: when the sum is too small we advance l, when too large we retreat r, when equal we record and advance both. The standard 2Sum-on-sorted invariant proves that every value pair (nums[l], nums[r]) with l < r and nums[l] + nums[r] == target is found exactly once, in sorted order.

For duplicates: the three skip-clauses ensure that each distinct triplet by value is recorded exactly once.

  1. if i > 0 and nums[i] == nums[i-1]: continue — the previous outer iteration already enumerated all triplets with first element equal to nums[i]. Without this, [-1,-1,0,1,2] would record [-1,-1,2] and [-1,0,1] twice (once for each -1 as the outer pivot).
  2. while l < r and nums[l] == nums[l-1]: l += 1 (after recording) — if multiple nums[l] values exist within (i, r), they all pair with the same nums[r] to give the same triplet. Skip them.
  3. while l < r and nums[r] == nums[r+1]: r -= 1 (symmetric for the right pointer) — same reasoning.

The early exit if nums[i] > 0: break is correct because once the smallest element of the triplet is positive, no triplet sums to zero (the array is sorted).

Complexity

Time O(N²): O(N log N) sort + O(N²) total work in the nested two-pointer (each l and r move monotonically over a window of size O(N), and the outer i runs N times). Space O(1) extra (excluding output and the sort’s O(log N) recursion stack).

Implementation Requirements

  • Sort in place (do not allocate a sorted copy).
  • Skip duplicates inside the loop, not via a hash set on output.
  • Use the early-exit if nums[i] > 0: break (small but real speedup for typical inputs).
  • Move both pointers on a match, then run the skip loops — not before, or you’ll never advance off the matched element.
  • Return a List<List<Integer>> / list[list[int]] — not a set, not tuples.

Tests

  • Smoke: [-1,0,1,2,-1,-4][[-1,-1,2],[-1,0,1]].
  • Unit: all-zeros ([0,0,0,0][[0,0,0]]); no triplets ([1,2,3][]); minimum size ([0,0,0][[0,0,0]]).
  • Edge: size 0 / 1 / 2 → []; all-negative; all-positive (sum > 0 from index 0 → break immediately, return []).
  • Large: N = 3000, values in [-10^5, 10^5]; assert sub-second; verify count against the brute force on a 100-element prefix.
  • Random: generate 50 random inputs of size ≤ 200; compare against the brute force solution as oracle.
  • Adversarial: [0]*3000 (all zeros — exactly one triplet [0,0,0]); long run of duplicates of a single value.
  • Invalid: non-integer / null input — out of scope per constraints, but mention you’d validate at the boundary.

Follow-up Questions

  • “What about kSum?” → recurse: kSum(nums, target, k) calls (k-1)Sum(remaining, target - nums[i]), base case is 2Sum-sorted. Time O(N^(k-1)).
  • “What if the array is unsorted and you cannot sort it?” → 2Sum-with-hash inside an outer enumeration; duplicate handling becomes per-output-set deduplication, more memory.
  • “What if values repeat extremely (e.g., 99% zeros)?” → the duplicate skips handle this in O(N²) worst case, but in practice each outer iteration is O(1) for the duplicate values; you’d see a near-O(N) effective runtime on that adversarial input.
  • “Can you do better than O(N²)?” → not under standard reductions: 3Sum has a known conditional lower bound of Ω(N²) (3SUM-hardness conjecture). Subquadratic 3Sum implies subquadratic many problems in computational geometry.
  • “What about returning closest triplet to a target?” → 3SumClosest variant; same skeleton, track the best |s - target| seen.

Product Extension

Detecting fraud rings of size 3 in a transaction graph where the signed sum of three transactions must net to zero (cancel out), under the constraint that the transactions hashed-distinctly. The same skeleton — fix one transaction, two-pointer the rest by signed amount — works, with the wrinkle that “duplicate” must be defined carefully (transactions are distinct by ID even if amounts are equal, so the duplicate skipping is replaced by a per-amount enumeration that emits all distinct ID combinations summing to zero).

Language/Runtime Follow-ups

  • Python: nums.sort() is in place. Use nums.append([nums[i], nums[l], nums[r]]) not nums.append((..)) if a list-of-lists is required (LC accepts either, but the contract is list).
  • Java: sort with Arrays.sort(nums) — note this is dual-pivot quicksort for primitives, average O(N log N). For Integer[] it’s TimSort. Use List<List<Integer>> and add Arrays.asList(a, b, c) per triplet — Arrays.asList returns a fixed-size list, which is fine because we never mutate it.
  • Go: sort.Ints(nums) sorts in place; the rest is pointer arithmetic over indices.
  • C++: std::sort(nums.begin(), nums.end()). Use std::vector<std::vector<int>> for output. Beware: integer addition nums[i] + nums[l] + nums[r] can overflow 32-bit if value bounds were larger; with the given constraints it’s safe, but as a habit, use long long.
  • JS/TS: Array.prototype.sort() defaults to lexicographic comparison — this is a top-3 JS interview bug. Use nums.sort((a, b) => a - b). Also, a - b can overflow 32-bit if values exceed ±2^30; for very large values use Math.sign(a - b).
  • Adversarial: sorting is the dominant constant; if the input is already sorted (best case for many sorts) this is faster than the brute by another factor.

Common Bugs

  1. Forgetting the nums[i] == nums[i-1] skip on outer — produces duplicate output triplets like [-1,-1,2] repeated.
  2. Forgetting to advance pointers on matchl += 1; r -= 1 must come before the inner duplicate-skip loops; otherwise nums[l] == nums[l-1] is comparing nums[l] to itself and the skip loop runs forever (well, runs until l == r, but produces no progress on the matched value).
  3. Using < vs <= in while l < rl == r would mean the same index appearing as both pointers, which is invalid (i, j, k must be distinct indices).
  4. JS lexicographic sort[-1, -1, 2, -4, 0, 1] after default sort is [-1, -1, -4, 0, 1, 2] (string-sorted). Always pass a numeric comparator.
  5. Missing 32-bit overflow in C++/Java/Go when constraints allow large values. With LC-3Sum’s constraints it doesn’t bite, but the habit costs nothing and saves you on related problems (4Sum, kSum-with-target).
  6. Hash-set deduplication on output — works, but signals you didn’t internalize the invariant. Time still O(N²) but space O(N²) instead of O(1).
  7. Sorting twice by accident (once explicitly, once implicit in a downstream API) — innocuous but signals carelessness.

Debugging Strategy

  • Run on [0,0,0,0] first — should return [[0,0,0]]. If you get [[0,0,0],[0,0,0]], your outer-skip is broken. If you get [], your inner loop never matches (likely l < r typo).
  • Run on [-2,0,0,2,2]. Expected: [[-2,0,2]]. If you get [[-2,0,2],[-2,0,2]], your inner-r skip is broken.
  • Diff against the brute force on 50 random inputs of size 50. The answers should match as sets (order of triplets and inner order doesn’t matter; sort each triplet and the list of triplets to compare).
  • Time a 3000-element random input. Should run in well under a second in Python; under 50ms in C++ / Java / Go.

Mastery Criteria

  • Recognized the signal “sorted, find triplet summing to target” as 3Sum within 30 seconds of reading.
  • Articulated the three skip-clauses before writing them.
  • Wrote a correct implementation in under 8 minutes, no lookup.
  • Passed the all-zeros, single-duplicate, and large-N tests on the first attempt.
  • Stated the conditional Ω(N²) lower bound when asked “can you do better?”.
  • Identified the JS lexicographic-sort trap (or its language equivalent) without prompting.
  • Generalized verbally to kSum and to 3SumClosest.