Lab 04 — Correctness Proofs (Binary Search & Kadane’s Algorithm)
Goal
Prove the correctness of two short, foundational algorithms — binary search and Kadane’s — using loop invariants and induction. By the end you should be able to state the invariant for any loop you write and use it both to prove correctness and to find bugs before they manifest.
Background Concepts
A loop invariant is a statement that is true:
- Initially (before the loop starts)
- Maintained (if true before an iteration, true after)
- Terminating (when the loop exits, it implies the desired post-condition)
This is induction on iterations. The invariant is what your loop promises about its state. If you can state the invariant out loud while coding, off-by-one bugs disappear because you can check each iteration against the promise.
A monovariant is a quantity that strictly decreases (or increases) each iteration and is bounded — it proves termination. For binary search, the monovariant is the search-range width.
An inductive proof for a recursive function: prove the base case correct; assume the recursive call is correct (induction hypothesis); show the combination is correct.
Interview Context
Interviewers rarely demand a formal proof, but they constantly ask “are you sure this works?” or “why does this work?” The candidates who answer with a precise invariant (“at the top of the loop, lo is the smallest index that could be the answer and hi is one past the largest”) look senior. The candidates who say “uh, I think so” look junior even when the code is correct.
For DP problems, “what’s the state, what’s the transition, and why does the order of iteration give you the correct value when you read it?” is the proof — interviewers explicitly ask this at Meta, Google, and Bloomberg.
Problem Statement
Prove correctness of two algorithms:
Part A: Binary search. Given a sorted array a and a target t, return the index of t if present, else -1.
Part B: Kadane’s algorithm. Given an array of integers (positive, negative, mixed), return the maximum sum of any non-empty contiguous subarray.
For each, you must:
- Write the code
- State the loop invariant precisely
- Prove the invariant holds initially, is maintained, and implies correctness on exit
- Identify the monovariant that proves termination
Constraints
- 1 ≤ |a| ≤ 10^5 for both problems
- Values: -10^9 ≤ a[i] ≤ 10^9 (Kadane: watch overflow on long all-positive subarrays)
Clarifying Questions
(Standard problem statements; the lab is about proof, not problem disambiguation.)
Examples
Binary search:
a = [1, 3, 5, 7, 9, 11], t = 7 → 3
a = [1, 3, 5, 7, 9, 11], t = 4 → -1
a = [], t = 1 → -1
Kadane:
[-2, 1, -3, 4, -1, 2, 1, -5, 4] → 6 (subarray [4, -1, 2, 1])
[-3, -1, -2] → -1 (best single element)
[5] → 5
Initial Brute Force
Binary search: linear scan, O(N).
Kadane: triple loop over (i, j, sum), O(N³). With prefix sums, O(N²).
Brute Force Complexity
Linear: O(N). Triple loop: O(N³). Both correct, both slow.
Optimization Path
Binary search: O(log N) by halving the candidate range each step. Kadane: O(N) by maintaining the best subarray ending at index i and the best seen so far.
Final Expected Approach
Part A — Binary Search (with Proof)
def binary_search(a, t):
lo, hi = 0, len(a) # half-open: search range is [lo, hi)
while lo < hi:
mid = lo + (hi - lo) // 2
if a[mid] == t:
return mid
elif a[mid] < t:
lo = mid + 1
else:
hi = mid
return -1
Loop invariant: At the top of every iteration, if t is present in a, then t is at some index in [lo, hi).
Initialization: lo = 0, hi = len(a). If t is present, it’s at some index in [0, len(a)) by definition. Invariant holds.
Maintenance: Assume invariant holds before an iteration. Compute mid.
- If
a[mid] == t, return immediately — correct. - If
a[mid] < t: becauseais sorted, every index≤ midhas value≤ a[mid] < t, sotis not at any of those indices. Iftis ina, it must be in[mid+1, hi). Settinglo = mid + 1preserves the invariant. - If
a[mid] > t: symmetric;tnot at index≥ mid. Settinghi = midpreserves the invariant.
Termination & post-condition: The monovariant is hi - lo, which strictly decreases each iteration (verify: in both update branches mid = lo + (hi-lo)//2, after which lo' > lo or hi' < hi; specifically hi' - lo' < hi - lo always when lo < hi). It’s bounded below by 0, so the loop terminates. When lo == hi, the search range is empty. By the invariant, if t were present, it would be in an empty range — contradiction. So t is absent. Returning -1 is correct.
Critical subtlety — mid = lo + (hi - lo) // 2 vs (lo + hi) // 2: the former avoids integer overflow when lo + hi > INT_MAX. In Python this doesn’t matter (arbitrary precision int), but in Java/C++ it’s a real bug. Famous: the Java SDK had this bug in Arrays.binarySearch for ~9 years (Bloch, 2006).
Part B — Kadane’s Algorithm (with Proof)
def kadane(a):
best_here = best_overall = a[0]
for i in range(1, len(a)):
best_here = max(a[i], best_here + a[i])
best_overall = max(best_overall, best_here)
return best_overall
Loop invariant: At the top of iteration i (1-indexed for clarity):
best_hereis the maximum sum of any contiguous subarray ending at indexi - 1.best_overallis the maximum sum of any contiguous subarray withina[0..i-1]inclusive.
Initialization: Before the loop (i.e., before i = 1), the only subarray of a[0..0] is [a[0]] with sum a[0]. Both best_here and best_overall are set to a[0]. Invariant holds.
Maintenance: Assume the invariant holds at the start of iteration i. Consider all contiguous subarrays ending at index i. Each such subarray is either:
- The singleton
[a[i]], with suma[i], OR - An extension of a subarray ending at
i - 1, with sum(sum of that subarray) + a[i].
The best extension is best_here + a[i] (by the invariant on best_here). So the best subarray ending at i has sum max(a[i], best_here + a[i]), which is exactly the new best_here. Invariant clause 1 maintained.
The best subarray within a[0..i] is either entirely within a[0..i-1] (covered by old best_overall) or ends at i (covered by new best_here). The new best_overall = max(old best_overall, new best_here) is therefore correct. Clause 2 maintained.
Termination & post-condition: The loop runs exactly n - 1 iterations (finite, no monovariant needed). On exit, best_overall is the max contiguous subarray sum within a[0..n-1] = the whole array. Returning it is correct.
Edge case proof: Kadane requires a non-empty (the problem states this); for an all-negative array, the answer is the maximum single element. The invariant handles this because best_here will reset to a[i] whenever best_here + a[i] < a[i], i.e., whenever best_here < 0. This is why the algorithm works for negative-only arrays — a common bug is to initialize best_here = 0, which incorrectly returns 0 for all-negative input.
Data Structures Used
- Plain arrays
- Two integer variables for Kadane
- Two integer indices for binary search
Correctness Argument
See Part A and Part B above.
Complexity
- Binary search: O(log N) time, O(1) space.
- Kadane: O(N) time, O(1) space.
Implementation Requirements
- Use
lo + (hi - lo) // 2for binary search midpoint - Use half-open interval
[lo, hi)for binary search — easier to reason about than closed[lo, hi] - Initialize Kadane’s
best_hereandbest_overalltoa[0], not0, to handle all-negative arrays
Tests
Smoke
assert binary_search([1, 3, 5, 7, 9], 5) == 2
assert binary_search([1, 3, 5, 7, 9], 4) == -1
assert kadane([-2, 1, -3, 4, -1, 2, 1, -5, 4]) == 6
Edge
# Binary search edges
assert binary_search([], 1) == -1
assert binary_search([5], 5) == 0
assert binary_search([5], 4) == -1
assert binary_search([1, 1, 1, 1], 1) in (0, 1, 2, 3) # any valid index
# Kadane edges
assert kadane([5]) == 5
assert kadane([-3, -1, -2]) == -1
assert kadane([1, 2, 3, 4]) == 10 # all positive
assert kadane([-1, -2, -3, -4]) == -1 # all negative
Large
import random
random.seed(0)
big = sorted(random.sample(range(10**6), 10**5))
assert binary_search(big, big[50000]) == 50000
big2 = [random.randint(-10**6, 10**6) for _ in range(10**5)]
result = kadane(big2)
assert isinstance(result, int)
Randomized verifier
def brute_kadane(a):
return max(sum(a[i:j]) for i in range(len(a)) for j in range(i+1, len(a)+1))
for _ in range(200):
a = [random.randint(-50, 50) for _ in range(random.randint(1, 30))]
assert kadane(a) == brute_kadane(a)
Invariant assertions (the proof, in code)
def binary_search_with_assertions(a, t):
lo, hi = 0, len(a)
while lo < hi:
# INVARIANT: if t in a, then t at some index in [lo, hi)
if t in a[:lo]: assert False, "invariant violated (t before lo)"
if lo > 0 and a[lo-1] >= t and t in a: assert False
mid = lo + (hi - lo) // 2
if a[mid] == t: return mid
elif a[mid] < t: lo = mid + 1
else: hi = mid
return -1
Follow-up Questions
- Find leftmost vs rightmost occurrence of
t. Modify binary search; the invariant becomes “the answer is in[lo, hi)” rather than “if t is present, it’s in…”. - Binary search on real numbers. Replace integer halving with floating-point; the invariant is the same but termination uses a precision threshold, not
lo < hi. - Kadane with at most K negative numbers allowed. State expands to
(i, k); DP, O(NK). - Maximum sum circular subarray. Two passes of Kadane + total-sum trick; the invariant for the circular case is more subtle.
- Maximum product subarray. Maintain both max and min products at each index because a negative * negative becomes the largest.
Product Extension
- Database B-tree page searches — binary search within a page; the invariant analysis directly applies.
- Time-series anomaly detection — Kadane variants find the largest cumulative deviation, used in change-point detection.
- Streaming Kadane — given a stream of metrics, find the worst-degradation window. Same algorithm with O(1) memory.
Language/Runtime Follow-ups
- Python:
mid = (lo + hi) // 2is safe (arbitrary precision);mid = lo + (hi - lo) // 2is still preferable for portability. - Java/C++:
(lo + hi) / 2overflows whenlo + hi > 2^31 - 1. Uselo + (hi - lo) / 2or(lo + hi) >>> 1(Java unsigned shift). - Kadane overflow: for
|a[i]| ≤ 10^9and N = 10^5, max sum is 10^14 — exceeds 32-bit int. Uselong/int64in Java/Go/C++. - Floating-point Kadane: accumulation error compounds; use Kahan summation if precision matters.
Common Bugs
(lo + hi) // 2overflow in C++/Java- Closed-interval binary search with
lo <= hiis correct but themidupdates are trickier; pick a convention and stick with it - Kadane initialized to 0 — fails on all-negative arrays
- Forgetting
best_overallupdate — returns the best ending at the last position, not overall - Empty input to Kadane — undefined; problem statement says non-empty, but check the contract
- Binary search infinite loop when
lo = midinstead oflo = mid + 1— the monovariant doesn’t decrease
Debugging Strategy
When binary search loops forever or returns wrong index:
- Print
(lo, mid, hi)each iteration. Iflodoesn’t strictly increase orhidoesn’t strictly decrease (toward each other), you have an off-by-one. - Check the boundary condition: does your invariant include or exclude
hi? - For “find leftmost”, the answer is at
loafter the loop, notmid.
When Kadane returns 0 on all-negative input:
- Check initialization — should be
a[0], not0. - Print
(best_here, best_overall)at each step; trace by hand against the expected.
Mastery Criteria
- Wrote both algorithms correctly without testing first (proof-first coding)
- Stated the loop invariant for each in one sentence
- Identified the monovariant for binary search termination
- Proved correctness by induction (3 steps: init, maintain, terminate)
-
Recognized the
(lo + hi) / 2overflow risk without prompting -
Explained why Kadane’s init must be
a[0]and not0 - Wrote loop invariants as comments in your code for one Phase 5 DP problem