p31 — Subarray Sum Equals K

Source: LeetCode 560 · Medium · Topics: Array, Hash Table, Prefix Sum Companies (2024–2025 frequency): Meta (very high), Amazon (high), Google (high), Microsoft (medium), Bloomberg (medium) Loop position: algorithm warmup or first problem. The discriminator: do you reach for sliding window (WRONG with negatives) or prefix sum + hashmap (RIGHT)?

1. Quick Context

Given an integer array nums and an integer k, return the total number of contiguous subarrays whose sum equals k. The array may contain negative numbers and zeros.

This is the canonical “prefix sum + hashmap” problem. The setup: $$\text{sum}(i, j) = \text{prefix}[j+1] - \text{prefix}[i]$$ where prefix[0] = 0 and prefix[t] = nums[0] + ... + nums[t-1]. Now sum(i,j) == k iff prefix[j+1] == prefix[i] + k.

So as you sweep right computing prefix[t], count how many previous prefixes equal prefix[t] - k. A hashmap count[prefix_value] → frequency gives O(1) lookup → O(N) total.

The interview trap: the unwary candidate spots “subarray sum” and reaches for sliding window. This works only when all entries are non-negative (so the running sum is monotonic in window growth). With negatives, the monotone property dies — sliding window is silently wrong. Prefix sum + hashmap is the correct general-purpose tool.


🔗 https://leetcode.com/problems/subarray-sum-equals-k/

STOP. 25-min timer. If you go for sliding window, force yourself to test on nums=[-1, -1, 1], k=0 (expected: 1; the answer is [-1,-1,1] itself summing to -1, hmm actually that’s -1 — try nums=[1,-1,0], k=0 → 3 subarrays). Sliding window’s monotone assumption breaks on these.


3. Prerequisite Concepts

  • Prefix sums (1D).
  • collections.defaultdict(int) / dict.get(key, 0).
  • The identity sum(i, j) = prefix[j+1] - prefix[i].

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Given nums (may contain negatives) and integer k, count contiguous subarrays summing exactly to k.”

Step 2 — Clarify:

  • “Contiguous?” (Yes. Always confirm.)
  • “Can nums contain negatives, zeros?” (Yes.)
  • “Can k be 0 or negative?” (Yes.)
  • “Overlapping subarrays counted separately?” (Yes — each contiguous range is its own subarray.)
  • “Return count or list of indices?” (Count.)
  • “Array empty?” (Return 0.)

Step 3 — Constraints: N up to 2×10⁴, values in [-1000, 1000]. O(N²) brute is borderline (4×10⁸); O(N) is the target.

Step 4 — Examples:

  • nums=[1,1,1], k=2 → 2 (subarrays [1,1] at indices 0-1 and 1-2).
  • nums=[1,2,3], k=3 → 2 ([3] and [1,2]).
  • nums=[1,-1,1,-1,1,-1], k=0 → 9 (lots).
  • nums=[0,0,0], k=0 → 6 (all non-empty subarrays sum to 0; C(3,2)+3 = 3+3 = 6).
  • nums=[1], k=0 → 0.
  • nums=[], k=0 → 0.

Step 5 — Brute Force: Enumerate all (i, j), sum nums[i..j] incrementally. O(N²). State and pivot.

Step 6 — Complexity: O(N) time, O(N) space (hashmap).

Step 7 — Pattern Recognition:

  • “Subarray sum equals target” + “negatives allowed” → prefix sum + hashmap. Memorize.
  • The hashmap stores HOW MANY TIMES each prefix value was seen — not just whether it was seen — because we count subarrays, not just detect existence.

Step 8 — Develop: see solution.py.

Step 9 — Test: the all-zeros case ([0,0,0], k=0), the negatives case ([1,-1,...], k=0), the LC canonical, the no-match case.


5. Progressive Hints

If stuck for 5 min, open hints.md.


6. Deeper Insight — Why the count[0] = 1 seed?

The map count[v] records: “how many times has the running prefix-sum equaled v BEFORE the current index?”

Before iteration starts, the running prefix sum is 0 (empty prefix nums[0..-1]). So count[0] should start at 1: the empty prefix is a valid “starting boundary.” This is what enables subarrays that start at index 0 to be counted.

Without the seed: consider nums=[3], k=3. Iteration: prefix=3. We look up count[3 - 3] = count[0]. If count[0]=0, we miss the answer. With the seed count[0]=1, we correctly count 1.

A way to remember it: treat the array as having a phantom “left wall” at index -1 with prefix-sum 0. The hashmap entry count[0]=1 represents that wall. Every subarray needs a left boundary and a right boundary; the hashmap tracks the left boundaries available, and count[0]=1 is the first available (the very start).

Why sliding window fails on negatives

Sliding window relies on the invariant: “expanding right monotonically increases (or keeps the same) the running sum; shrinking left monotonically decreases it.” That lets us find the unique window with sum K by adjusting one endpoint at a time.

With negatives, expanding right can DECREASE the sum (when nums[right] < 0). So we can no longer say “if current sum > K, shrink left.” Multiple windows can have the same sum at any moment, and the shrink decision becomes ambiguous.

Concrete failure: nums=[1, -1, 1, 1], k=1. Sliding window starts at [1] sum=1 → record. Expand to [1,-1] sum=0. Need to grow to reach 1 → [1,-1,1] sum=1 → record. Expand to [1,-1,1,1] sum=2 → shrink to [-1,1,1] sum=1 → record. We get 3. But the actual answer includes [1] at index 0, [1,-1,1] 0-2, [1] at index 2, [1,1] at 2-3, [-1,1,1] 1-3, [1] at 3 = 6 subarrays. Sliding window undercounts because it tries to find one window at a time, not all with a given sum.

The hashmap counts all left-endpoints with the desired prefix value at every right-endpoint, in one pass. That’s the structural difference.

Connection to other prefix-sum problems

  • LC 974 (Subarray Sums Divisible by K): replace prefix - k lookup with prefix mod K lookup. Python negative modulo is well-defined ((-7) % 5 == 3), which is exactly what we want.
  • LC 525 (Contiguous Array): treat 0 as -1, then find longest subarray with sum 0 — hashmap stores FIRST index of each prefix value (for longest), not count.
  • LC 1248 (Count Nice Subarrays): replace nums[i] with nums[i] % 2, same template.

7. Anti-Pattern Analysis

  1. Sliding window with negatives. Silently wrong. See Section 6.
  2. Forgetting count[0] = 1. Off-by-one — misses every subarray starting at index 0.
  3. Storing prefix → bool instead of prefix → count. You’d count distinct prefix values, not subarrays.
  4. Updating the hashmap BEFORE the lookup. On nums=[0,0,0], k=0: at index 0, prefix=0; if you do count[0] += 1 (now count[0]=2) THEN look up count[0 - 0] = 2, you count the current empty subarray as well — overcount. Correct order: look up FIRST, then update.
  5. Building a full prefix array. Works but uses O(N) extra space unnecessarily. Just maintain a running sum integer.
  6. Iterating with index i and recomputing sum(nums[:i+1]) each time. O(N²) and silently slow.
  7. Returning the subarrays themselves instead of count. Re-read the problem.
  8. Using dict() and KeyError handling. defaultdict(int) or dict.get(v, 0) is cleaner.

8. Skills & Takeaways

  • Prefix sum + hashmap as a single coupled pattern — memorize.
  • The count[0]=1 seed — fundamental, recurs in LC 974, LC 525, LC 437 (Path Sum III on trees!), etc.
  • Why sliding window fails on negatives — articulate this clearly; it’s a Senior-level signal.
  • Look-up-before-update ordering in counting problems.
  • defaultdict(int) fluency.

9. When to Move On

Move on when:

  • You can write the solution in under 7 minutes.
  • You can explain in one breath why sliding window doesn’t work here.
  • You can derive count[0]=1 rather than memorize it.
  • Your stress test passes 300 random arrays vs O(N²) brute.
  • You can pivot to LC 974, LC 525, LC 437 using the same template.

10. Company Context

Meta:

  • One of the most-asked “warmup” Mediums at Meta in 2024-25. The discriminator is the negative-numbers case.
  • Bar Raiser will explicitly test: “what if the array has negatives?” If you answer “use sliding window,” it’s a fast no.
  • Scorecard phrase: “Algorithm — prefix sum + hashmap, correctly handled negatives.”

Amazon:

  • Often paired with LC 525 (Contiguous Array) in the same loop.
  • Scorecard phrase: “Strong — recognized prefix-sum-hashmap family, transferred to second problem.”

Google:

  • May extend to LC 974 (Subarrays Sums Divisible by K) — the modular variant.
  • Scorecard phrase: “Algorithm + generalization — modular prefix sum.”

Microsoft:

  • Sometimes asked with an additional twist: return the number of DISTINCT subarrays summing to K, by value (not by position).
  • Scorecard phrase: “Strong — recognized prefix-sum-hashmap, handled distinct-value extension.”

Bloomberg:

  • Variant: given a sequence of P&L deltas per minute, count time windows where cumulative P&L equals zero.
  • Scorecard phrase: “Pragmatic — financial prefix-sum framing.”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff/Principal
First instinctBrute O(N²)Sliding window then debugsPrefix sum + hashmap immediatelySame + cites why sliding window fails
Negatives discussionDoesn’t bring upRealizes mid-solveStates upfrontStates + relates to LC 525 / LC 974 family
count[0]=1Forgets, debugs after empty subarrayAdds after one failing testIncludes from start with verbal justificationIncludes + explains “phantom left wall” intuition
GeneralizationNone“Probably works for similar problems”LC 974, LC 525 by name+ Path Sum III on trees uses same template

12. Level Delta

Mid (L4):

  • Prefix sum + hashmap solution.
  • O(N) time, O(N) space.
  • count[0]=1 seed (possibly after one test failure).
  • Edge cases.

Senior (L5):

  • Solution from the start with count[0]=1 and verbal justification.
  • Explains why sliding window fails on negatives.
  • Tests with negatives and zeros explicitly.

Staff (L6):

  • Connects to LC 974 (mod variant), LC 525 (treat-zero-as-minus-one trick), LC 437 (Path Sum III on trees — DFS + same hashmap).
  • Discusses overflow if nums[i] were big (Python is fine; in Java/C++ use long).
  • Discusses parallelism: prefix sum is the classic parallel-scan target (Blelloch); hashmap counting is a sequential reduce — full parallelization is the work of Guy Blelloch’s NESL papers.

Principal (L7+):

  • Discusses probabilistic count-min sketch for streaming over enormous data when exact count not required.
  • Discusses k-dimensional generalization: count subrectangles with sum K in 2D matrix → fix two rows, collapse to 1D Subarray Sum K, O(R²·C). Apply to Kadane’s-family 2D problems.
  • Discusses production: financial drawdown windows, anomaly windows in metrics, A/B test segment matching.

13. Follow-up Q&A

Q1: What if the array can be huge and we need streaming? A: The hashmap can grow to O(N) distinct prefix values — unbounded for streaming. Bounded approximation: count-min sketch with bounded memory. Exact streaming with sublinear memory is impossible.

Q2: 2D version — number of submatrices summing to K (LC 1074). A: Reduce to 1D. For each pair of rows (r1, r2), compute column sums col[c] = sum(matrix[r1..r2][c]) (use 2D prefix to make this O(1) per column after preprocessing). Then run 1D Subarray Sum K on col. Total O(R²·C).

Q3: Return the actual subarrays (indices), not just count. A: Change the hashmap value from int to List[int] storing all previous indices where each prefix value was seen. On match, emit (idx+1, current_idx) for each. Output size can be O(N²) in the worst case — be aware.

Q4: Find the LONGEST subarray summing to K (LC 325). A: Hashmap stores FIRST index of each prefix value. On prefix - k lookup, length = current - first_index. Track max. The “longest” variant doesn’t need to count occurrences.

Q5: Find the count of subarrays where sum is in range [lower, upper] (LC 327). A: Sort prefix sums and use merge-sort-counting or Fenwick tree (BIT). O(N log N). The hashmap pattern doesn’t directly extend to ranges.


14. Solution Walkthrough

See solution.py:

  • subarray_sum(nums, k) — canonical prefix-sum + hashmap. O(N).
  • subarray_sum_brute(nums, k) — O(N²) double loop. Oracle.
  • stress_test() — 300 random arrays with negatives, length ≤ 25, k ∈ [-10, 10].
  • edge_cases() — empty, single positive/negative/zero match-or-not, all-zero, LC canonical, negatives interleaved, large positive only.

Run: python3 solution.py.


15. Beyond the Problem

Principal-engineer code review comment:

“The count[0] = 1 seed needs a comment — every code reviewer who hasn’t seen this template will ask ‘why?’. Either inline-comment ‘represents the empty prefix as a valid left boundary’ or pull the whole hashmap logic into a class PrefixSumCounter with a docstring. Second: if you’re processing many queries on the same array with varying K, precompute all prefix sums once and reuse — don’t re-iterate per query. Third: if nums is very wide (e.g., 64-bit values), the hashmap key space is enormous; in production C++/Java code, profile vs std::map or HashMap<Long, Integer> and consider radix bucketing for sustained throughput. Fourth: if you’re going to extract this into a library, version the API — adding return_indices=True later is a breaking change if naive callers depend on the integer return type.”

Connections forward:

  • LC 974 — Subarray Sums Divisible by K: same template, hashmap on prefix % K.
  • LC 525 — Contiguous Array: treat 0 as -1, find longest subarray summing to 0.
  • LC 437 — Path Sum III: same hashmap pattern, but during DFS on a binary tree.
  • LC 1074 — Number of Submatrices Sum K: 2D extension via row-pair fixing.
  • LC 327 — Count of Range Sum: prefix sums + Fenwick tree.
  • p32 (this week) — 2D prefix sum: complementary technique.
  • Phase 02 Lab 03 — Prefix Sums: template lab.
  • Production: financial drawdown analysis, anomaly detection on metrics, log-window aggregation, A/B test cohort matching.