p09 — Maximum Subarray (Kadane’s Algorithm)

Source: LeetCode 53 · Medium · Topics: Array, Divide & Conquer, Dynamic Programming Companies (2024–2025 frequency): Amazon (very high), Microsoft (high), Bloomberg (very high), Apple (high), Meta (medium), Google (medium) Loop position: mid-loop coding round; sometimes given as “Easy” but the optimal is genuinely subtle

1. Quick Context

This is your first encounter with a 1-D dynamic programming algorithm that collapses to two scalars. The DP table is conceptually dp[i] = max subarray sum ending exactly at i, but the answer only needs dp[i-1], so you never allocate the array. Senior candidates derive this collapse on the spot. Mid candidates write the O(N) DP array and forget to optimize. Junior candidates write the O(N²) brute force and get stuck.

The interviewer is testing whether you can: (a) identify that “contiguous subarray” rules out two-pointer (which needs monotone structure) and dynamic-programming-with-prefix-sums (which gets the global sum, not the contiguous-piece argmax), (b) derive the recurrence on the spot, (c) collapse it to O(1) space.

What it looks like it tests: find the max-sum contiguous subarray. What it actually tests: can you discover Kadane’s recurrence on your own? Can you handle the all-negative-array trap?

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/maximum-subarray/

STOP. Set a 15-minute timer (this one is genuinely harder than the surrounding Easy problems). Code it cold. Do not read past this section until solved or expired.

If you find Kadane’s immediately because you’ve seen it: target 5 min, AND derive the recurrence aloud as you write. Memorization without derivation is a Mid signal — practice the derivation.

3. Prerequisite Concepts

1D Dynamic Programming intuition: “state at index i depends only on state at index i-1” (phase-05 §1)
Why prefix sums DON’T directly solve this (they solve “sum of any subarray” but the argmax over all O(N²) subarrays still needs O(N²) without a cleverer transform — see Section 6)
The “all-negative” trap: max(0, ...) is wrong when every element is negative; the answer is the maximum single element
Loop invariants in DP: what state does each scalar represent at the top of each iteration?

4. How to Approach (FRAMEWORK Steps 1–9 applied)

Step 1 — Restate: “Given an integer array, return the maximum sum of any contiguous non-empty subarray.”

Step 2 — Clarify:

“Can the array contain negatives?” (Yes — this is the entire point. Pure positives is trivial.)
“Can it be all negatives?” (Yes — the answer is the maximum single element, NOT 0.)
“Must subarray be non-empty?” (Yes per LC. If empty allowed, answer would be max(0, kadane(nums)).)
“Need indices or just the sum?” (Per LC, just sum. Indices are a natural follow-up — track start/end.)
“Subarray (contiguous) or subsequence (non-contiguous)?” (Subarray. If subsequence, answer = sum of positives, trivial.)

Step 3 — Constraints: N up to 10^5 per LC. O(N²) brute (10^10) won’t fit. O(N) is required.

Step 4 — Examples:

[-2,1,-3,4,-1,2,1,-5,4] → 6 (subarray [4,-1,2,1])
[1] → 1
[-1] → -1 (single negative — critical trap)
[-3, -2, -1, -5] → -1 (all negatives — must NOT return 0)
[5, 4, -1, 7, 8] → 23 (mostly positive — sum almost everything)
[1, -100, 2, -100, 3] → 3 (a single big negative resets the running sum)

Step 5 — Brute Force: Two nested loops; for each (i, j) compute sum. O(N²) with the partial-sum optimization (compute running sum as j advances), O(N³) naively. State this aloud.

Step 6 — Brute Force Complexity: O(N²) time with the running-sum trick, O(1) space. Acceptable correctness oracle for small N.

Step 7 — Pattern Recognition: “Max over all contiguous subarrays” + “decision at index i depends only on whether to extend the previous best or start fresh” → Kadane’s algorithm (1D DP, collapses to O(1) space). This is the canonical “DP that doesn’t look like DP” problem.

Step 8 — Optimize: The recurrence: best_ending_here[i] = max(nums[i], best_ending_here[i-1] + nums[i]). Interpretation: at each position, either extend the previous best (if doing so makes the sum positive) or start fresh at nums[i]. The global answer is max(best_ending_here) across all i. Collapse to two scalars: curr = nums[0], best = nums[0]. For i in 1..N-1: curr = max(nums[i], curr + nums[i]); best = max(best, curr). Return best.

Step 9 — Prove correctness: Loop invariant: at the top of iteration i, curr = max sum of any subarray ending exactly at index i-1, and best = max sum of any subarray ending at indices 0..i-1. The recurrence preserves this: a subarray ending at i either includes the optimal one ending at i-1 (then sum = curr + nums[i]) or starts fresh at i (sum = nums[i]). Taking max preserves the optimal-ending-at-i invariant. Taking running max preserves best over all positions.

5. Progressive Hints

If stuck for 5 minutes, open hints.md. One hint, 5-min timer, retry.

6. Deeper Insight — Why It Works

The DP formulation, made explicit: define dp[i] = max sum of contiguous subarray ending at i (inclusive). The recurrence dp[i] = max(nums[i], dp[i-1] + nums[i]) says: the best subarray ending at i either starts at i (skip the past) or extends the best one ending at i-1. The global answer is max(dp[0..N-1]). This is a one-dimensional DP. The trick is recognizing that “ending at i” is the right state — most candidates initially think of dp[i] = “max in nums[0..i]” which leads to a recurrence that requires both “include i” and “don’t include i” cases and doesn’t collapse cleanly.

Why O(1) space: the recurrence only depends on dp[i-1], so a single scalar curr suffices. The global max is also a scalar best. No array allocation needed.

Why “all negatives” is the trap: the recurrence curr = max(nums[i], curr + nums[i]) correctly handles it: when all values are negative, nums[i] is always larger than curr + nums[i] (since curr is negative), so curr = nums[i] each step, and best tracks the maximum single element. But a popular incorrect formulation, curr = max(0, curr + nums[i]), treats “empty subarray” as legal and returns 0 — wrong because the problem requires non-empty.

Why prefix sums alone don’t solve this in O(N): with prefix sums P, the sum of subarray (i, j) is P[j] − P[i-1]. Maximizing P[j] − P[i-1] over i ≤ j IS solvable in O(N): walk left to right, track min P seen so far, take max of P[j] − min_so_far. But this IS Kadane in disguise — it’s the same algorithm rephrased. Both are O(N), one pass, O(1) extra space.

Connection to LC 121 (Buy/Sell Stock): identical algorithm. “Max profit = max over j of (price[j] − min_so_far_price)” is “max subarray sum = max over j of (P[j] − min_so_far_P)” where P is the prefix sum of price differences. The two problems are literally the same algorithm on different inputs.

Divide & conquer alternative: split in half, recursively compute (a) max subarray entirely in left half, (b) max subarray entirely in right half, (c) max subarray crossing the midpoint (linear scan). Take max of the three. O(N log N) — worse than Kadane. Worth knowing because it’s a clean example of D&C and because LC sometimes asks for both.

7. Anti-Pattern Analysis

Wrong-but-tempting #1 — Reset to 0 when sum goes negative:

curr = 0; best = float('-inf')
for x in nums:
    curr += x
    if curr < 0: curr = 0
    best = max(best, curr)
return best

Wrong for all-negative arrays: returns 0 (never updates best with a positive value). The fix is curr = max(x, curr + x) not curr = max(0, curr + x).

Wrong-but-tempting #2 — Brute force with three loops: O(N³). Even the O(N²) running-sum version is wasteful. Acceptable to state aloud as the baseline, then optimize.

Wrong-but-tempting #3 — Sort the array first: Destroys the “contiguous” constraint entirely. After sorting, “contiguous” no longer means anything meaningful about the original array.

Wrong-but-tempting #4 — DP with explicit array:

dp = [0] * len(nums)
dp[0] = nums[0]
for i in range(1, len(nums)):
    dp[i] = max(nums[i], dp[i-1] + nums[i])
return max(dp)

Correct, O(N) time, but O(N) space when O(1) suffices. For interview, this is a “didn’t notice the collapse” mark. For production with billion-element arrays, it’s an unnecessary 8GB of allocation.

Wrong-but-tempting #5 — Maintain “best including” and “best excluding” separately: Some candidates try include[i] = best ending at i and exclude[i] = best not ending at i. Works but more complex than necessary — Kadane’s only needs one DP series, not two.

8. Skills & Takeaways

Generalizable patterns:

1-D DP with state collapse: any DP where dp[i] depends only on dp[i-1] collapses to O(1) space. Reappears in: climbing stairs (p05), house robber (LC 198), best time to buy/sell stock (p03), maximum product subarray (LC 152).
“Extend or restart” decision: many subarray/substring problems ask “extend the current run or start fresh at this index?” — Kadane is the prototypical example.
All-negative / all-zero / empty traps: any “max sum” problem with negatives requires checking whether empty subarray is allowed and handling all-negative correctly.

Analogous problems:

LC 121 — Best Time to Buy/Sell Stock (Kadane on differences — your p03, same algorithm)
LC 152 — Maximum Product Subarray (Kadane with two states: max-and-min ending here, because a negative times negative becomes max — Hard variant)
LC 198 — House Robber (Kadane-style 1D DP with “skip current” choice)
LC 918 — Maximum Sum Circular Subarray (Kadane twice — once for max, once for min, then global - min)
LC 1186 — Maximum Subarray Sum with One Deletion (Kadane with two-state DP: “best ending here without deletion” and “with deletion”)
LC 363 — Max Sum Rectangle ≤ K in Matrix (Kadane lifted to 2D)
LC 53 itself is sometimes asked as “return the actual subarray, not just the sum” — extension: track start/end indices alongside curr/best.

When NOT to use this: non-contiguous subsequences (different problem, trivial: sum of positives or LIS-family). Streaming with multiple deletions (need a more complex deque-based variant).

9. When to Move On (binary; must all be YES)

I derived the recurrence dp[i] = max(nums[i], dp[i-1] + nums[i]) on paper before coding
My code uses TWO scalars (not an array) and handles all-negative correctly
I can explain why max(nums[i], ...) is right and max(0, ...) is wrong
I can articulate the loop invariant of both curr and best
I can sketch the divide-and-conquer O(N log N) alternative and explain why Kadane is better
My stress_test() passes 1000 iterations against the O(N²) brute force
I solved LC 152 (Max Product Subarray) and saw the two-state extension of Kadane
I noticed that p03 (Buy/Sell Stock) is the same algorithm on differences

If any unchecked: redo before moving to p10.

10. Company Context

Amazon (extremely common — every coding round may see this or a variant)

The framing: Often in story form: “Given daily P&L, find the most profitable contiguous trading window.”
Misleading example: They give you [1, 2, 3, 4] (all positive — trivial). Then [-1, -2, -3] (all negative — exposes whether you fell for the max(0, ...) trap).
Adversarial extension: “Now return the actual subarray (start and end indices), not just the sum.” (Track start candidate when you reset; finalize when best updates.) Then: “Now allow ONE element to be deleted.” (LC 1186.)
What they want to hear: Explicit derivation aloud. Verbatim phrase that earns points: “At each index I have two choices — extend or restart — and I pick whichever gives a larger sum ending here.”

Microsoft (loves the DP derivation question)

The framing: Standard problem.
Misleading example: [5, -10, 4, -1, 5] — first element is the temporary best (5), then a big negative pulls curr down, then you have to recognize that starting fresh at 4 was better.
Adversarial extension: “Now return the indices.” Then “Now find the max-sum subarray in a 2D matrix.” (LC 363’s cousin — fix two rows, collapse columns, apply Kadane.)
What they want to hear: Articulation of the DP state and recurrence. Microsoft phone screens explicitly score “derived the recurrence” as a separate rubric line.

Bloomberg (financial — they LOVE this problem)

The framing: “Given a price series, find the optimal long-only trading window with no shorting allowed.”
Misleading example: Includes negative price changes throughout — tests the all-negative handling.
Adversarial extension: “Now with transaction costs of X — subtract X from each profit.” Then “Now what if shorting is allowed?” (Absolute value Kadane.)
What they want to hear: Recognition that this IS Buy/Sell Stock in disguise, with explicit framing: “If I difference the price series, Kadane on the diffs gives the max single trade.” Bloomberg likes seeing the algorithmic family tree.

Apple (clarity + clean code)

The framing: Standard.
What they want to hear: Idiomatic code, clear variable names (curr, best, not c, b), explicit handling of empty / single-element inputs.

Meta (less common; usually appears as the harder LC 152 variant)

The framing: Often LC 152 (Max Product) directly, with this as warmup if you fumble.
What they want to hear: Awareness that product has two states (positive and negative) because negative × negative = max.

Google (medium frequency; high bar)

The framing: Sometimes the divide-and-conquer version is asked explicitly, to test that you can write D&C cleanly.
What they want to hear: Both Kadane (O(N)) and D&C (O(N log N)), with explicit “Kadane wins on this problem but D&C generalizes to 2D and to certain parallel implementations.”

11. Interviewer’s Lens

Phase	Strong signal	Weak signal	Scorecard phrase (strong)
Reading problem	Asks “negatives allowed?” + “all-negative case?” + “non-empty?”	Assumes positive-only	“Recognizes the trap immediately — strong test-case instinct”
Pre-coding	States brute force complexity, then derives Kadane’s recurrence aloud	Says “I’ll use Kadane” without derivation	“Derived recurrence on the spot — not just memorizing”
Coding	Two scalars, `max(nums[i], curr+nums[i])`, initializes from nums[0]	Uses array (no space optimization), `max(0, ...)` (wrong)	“Optimized space; correct all-negative handling”
Edge cases	Tests single-negative, all-negative before submit	Tests only positive examples	“Self-catches the all-negative trap”
Follow-up	Connects to LC 121, LC 152, LC 198	Sees only this problem	“Recognizes Kadane as a family of algorithms”

The scorecard line that gets you the offer: “Derived Kadane’s recurrence on the spot; optimized to O(1) space; recognized this as Buy/Sell Stock in disguise.”

The scorecard line that loses you: “Used max(0, ...) formulation; returned 0 on all-negative input; required interviewer to debug.”

12. Level Delta

Level	What their answer looks like
Mid	DP array version, correct. Doesn’t collapse to O(1). May handle all-negative correctly after prompting. ~12 min.
Senior	O(1) space Kadane derived aloud. Handles all-negative correctly without prompting. States complexity. Recognizes the loop invariant. ~8 min.
Staff	All of Senior + names “Kadane’s algorithm” + connects to LC 121 (same algorithm on differences), LC 152 (two-state extension), LC 198 (different “extend or skip” semantics) + offers D&C alternative + ~6 min.
Principal	All of Staff + asks about real production framing (financial windowing, anomaly detection, signal processing) + mentions that for streaming with sliding windows, deque-based variants apply + discusses parallel Kadane (associative monoid: combine (best_sum, prefix_sum, suffix_sum, total_sum) — enables map-reduce / SIMD parallelization) + ~5 min with full discussion.

13. Follow-up Questions & Full Answers

Follow-up 1: “Return the actual subarray, not just the sum.”

Signal sought: Bookkeeping discipline.

Full answer: “Track candidate start. When we ‘restart’ (i.e., nums[i] > curr + nums[i]), set tmp_start = i. When best updates, snapshot start = tmp_start, end = i. Return nums[start:end+1]. Single pass, O(1) extra state. Edge case: all-negative — tmp_start = end = argmax(nums), single-element subarray.”

Follow-up 2: “Now allow one element deletion.” (LC 1186)

Signal sought: Extending the state.

Full answer: “Two-state DP. with_delete[i] = best subarray ending at i using one deletion; without_delete[i] = best ending at i with no deletion. Recurrences: without[i] = max(nums[i], without[i-1] + nums[i]) (standard Kadane). with[i] = max(without[i-1], with[i-1] + nums[i]) — either delete nums[i] (carrying without[i-1] forward) or keep nums[i] and have already deleted earlier. Answer = max over all i of max(with[i], without[i]). Collapses to two scalars each. O(N) time, O(1) space.”

Follow-up 3: “Maximum sum circular subarray.” (LC 918)

Signal sought: Algorithmic re-application.

Full answer: “The optimal subarray is either (a) a normal subarray (Kadane gives this), or (b) wraps around — equivalently, (total sum) − (some subarray we exclude), so we want to MINIMIZE a subarray (Kadane on negated array). Return max(kadane(nums), total − min_subarray(nums)). Special case: if all elements are negative, the wraparound formula would ‘exclude everything’ = 0, but the problem requires non-empty, so handle separately: if max kadane result is negative, return that. Two Kadane passes. O(N) time, O(1) space.”

Follow-up 4: “Divide and conquer formulation.”

Signal sought: Algorithm versatility.

Full answer: “Recursively: max-subarray(arr) = max of {max-subarray(left half), max-subarray(right half), max-crossing-subarray(mid)}. The crossing case requires a linear scan from mid outward in both directions to find the best left-extending and right-extending pieces. Recurrence: T(N) = 2T(N/2) + O(N) → O(N log N). Worse than Kadane on a single machine but the D&C structure parallelizes cleanly and generalizes to 2D (LC 363’s cousin). Worth knowing both.”

Follow-up 5 (Hard, Principal signal): “Parallel Kadane across K machines.”

Signal sought: Distributed thinking, monoid recognition.

Full answer: “Each machine holds a contiguous shard. Each computes four scalars: (total_sum, max_prefix, max_suffix, max_subarray) over its shard. These compose associatively under a ‘merge’ operation: when merging shard A then shard B, the merged max_subarray is max(A.max_subarray, B.max_subarray, A.max_suffix + B.max_prefix). This is a monoid — the four-tuple under merge is associative with identity (0, 0, 0, -∞). That means we can map-reduce the entire computation: each shard contributes its 4-tuple, reduce them pairwise in any tree order. O(N/K) per machine, O(log K) reduction depth. SIMD implementations use the same trick at the cache-line level. This is a beautiful example of ‘rewrite the algorithm as a monoid to make it parallel.’”

14. Full Solution Walkthrough

See solution.py.

The file has six sections:

max_subarray_brute(nums) — O(N²) with running-sum trick; correctness oracle.
max_subarray(nums) — Kadane in two scalars; INVARIANT comments.
max_subarray_dp_array(nums) — Kadane with the explicit dp[] array; included for didactic comparison.
max_subarray_divide_conquer(nums) — O(N log N) version; useful to know.
stress_test() — generates 1000 random arrays including all-negative, asserts all four implementations agree.
edge_cases() — single positive, single negative, all-negative, all-positive, all-zero, the canonical example, sparse positives among negatives.

Decisions justified:

Why init curr = best = nums[0] (not 0): empty subarray not allowed.
Why max(nums[i], curr + nums[i]): handles all-negative correctly.
Why we include the dp_array version: shows the unoptimized DP for didactic clarity, then we point out the collapse.

15. Beyond the Problem — Production Reality

Real applications of Kadane:

Financial windowing: maximum profit / drawdown / volume window in time series. Trading desks compute “best trading interval” over historical bars; Kadane is the kernel.
Anomaly detection: maximum contiguous deviation from baseline in monitoring data (Datadog, Prometheus).
Genomic analysis: maximum-scoring local alignment (BLAST, Smith-Waterman) — the inner score recurrence is Kadane-like.
Image processing: max-sum rectangle in a 2D image (LC 363) uses Kadane on collapsed column sums; appears in object-detection preprocessing.
Signal processing: detecting bursts in a noisy signal — Kadane on (signal − threshold) finds the strongest burst window.

Production caveat — floating-point Kadane: if nums are floats with both positives and negatives, naive Kadane can accumulate floating-point error in curr. Use Kahan summation if precision matters. Production financial code does this.

Parallelization: the monoid formulation in Section 13.5 is real — Apache Spark MLlib’s vector-window algorithms use this structure to parallelize across executors. The four-scalar representation is the secret.

Principal-engineer code review comment: “Kadane is correct but our 2D variant (LC 363, applied to image kernels) is O(M²·N). We’ve seen people ‘cleverly’ parallelize by chunking rows, then have the chunks lose the cross-chunk optimal. Use the four-scalar monoid form for any parallel/chunked version — it’s the only correct way. Also: please add a unit test for the all-equal-negative array. We’ve shipped two bugs from max(0, ...) formulations.”

LeetCode Interview — Extreme Coding