Lab 04 — Correctness Proofs (Binary Search & Kadane’s Algorithm)

Goal

Prove the correctness of two short, foundational algorithms — binary search and Kadane’s — using loop invariants and induction. By the end you should be able to state the invariant for any loop you write and use it both to prove correctness and to find bugs before they manifest.

Background Concepts

A loop invariant is a statement that is true:

  1. Initially (before the loop starts)
  2. Maintained (if true before an iteration, true after)
  3. Terminating (when the loop exits, it implies the desired post-condition)

This is induction on iterations. The invariant is what your loop promises about its state. If you can state the invariant out loud while coding, off-by-one bugs disappear because you can check each iteration against the promise.

A monovariant is a quantity that strictly decreases (or increases) each iteration and is bounded — it proves termination. For binary search, the monovariant is the search-range width.

An inductive proof for a recursive function: prove the base case correct; assume the recursive call is correct (induction hypothesis); show the combination is correct.

Interview Context

Interviewers rarely demand a formal proof, but they constantly ask “are you sure this works?” or “why does this work?” The candidates who answer with a precise invariant (“at the top of the loop, lo is the smallest index that could be the answer and hi is one past the largest”) look senior. The candidates who say “uh, I think so” look junior even when the code is correct.

For DP problems, “what’s the state, what’s the transition, and why does the order of iteration give you the correct value when you read it?” is the proof — interviewers explicitly ask this at Meta, Google, and Bloomberg.

Problem Statement

Prove correctness of two algorithms:

Part A: Binary search. Given a sorted array a and a target t, return the index of t if present, else -1.

Part B: Kadane’s algorithm. Given an array of integers (positive, negative, mixed), return the maximum sum of any non-empty contiguous subarray.

For each, you must:

  1. Write the code
  2. State the loop invariant precisely
  3. Prove the invariant holds initially, is maintained, and implies correctness on exit
  4. Identify the monovariant that proves termination

Constraints

  • 1 ≤ |a| ≤ 10^5 for both problems
  • Values: -10^9 ≤ a[i] ≤ 10^9 (Kadane: watch overflow on long all-positive subarrays)

Clarifying Questions

(Standard problem statements; the lab is about proof, not problem disambiguation.)

Examples

Binary search:

a = [1, 3, 5, 7, 9, 11], t = 7  → 3
a = [1, 3, 5, 7, 9, 11], t = 4  → -1
a = [],                  t = 1  → -1

Kadane:

[-2, 1, -3, 4, -1, 2, 1, -5, 4]   → 6  (subarray [4, -1, 2, 1])
[-3, -1, -2]                       → -1 (best single element)
[5]                                → 5

Initial Brute Force

Binary search: linear scan, O(N). Kadane: triple loop over (i, j, sum), O(N³). With prefix sums, O(N²).

Brute Force Complexity

Linear: O(N). Triple loop: O(N³). Both correct, both slow.

Optimization Path

Binary search: O(log N) by halving the candidate range each step. Kadane: O(N) by maintaining the best subarray ending at index i and the best seen so far.

Final Expected Approach

Part A — Binary Search (with Proof)

def binary_search(a, t):
    lo, hi = 0, len(a)              # half-open: search range is [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if a[mid] == t:
            return mid
        elif a[mid] < t:
            lo = mid + 1
        else:
            hi = mid
    return -1

Loop invariant: At the top of every iteration, if t is present in a, then t is at some index in [lo, hi).

Initialization: lo = 0, hi = len(a). If t is present, it’s at some index in [0, len(a)) by definition. Invariant holds.

Maintenance: Assume invariant holds before an iteration. Compute mid.

  • If a[mid] == t, return immediately — correct.
  • If a[mid] < t: because a is sorted, every index ≤ mid has value ≤ a[mid] < t, so t is not at any of those indices. If t is in a, it must be in [mid+1, hi). Setting lo = mid + 1 preserves the invariant.
  • If a[mid] > t: symmetric; t not at index ≥ mid. Setting hi = mid preserves the invariant.

Termination & post-condition: The monovariant is hi - lo, which strictly decreases each iteration (verify: in both update branches mid = lo + (hi-lo)//2, after which lo' > lo or hi' < hi; specifically hi' - lo' < hi - lo always when lo < hi). It’s bounded below by 0, so the loop terminates. When lo == hi, the search range is empty. By the invariant, if t were present, it would be in an empty range — contradiction. So t is absent. Returning -1 is correct.

Critical subtlety — mid = lo + (hi - lo) // 2 vs (lo + hi) // 2: the former avoids integer overflow when lo + hi > INT_MAX. In Python this doesn’t matter (arbitrary precision int), but in Java/C++ it’s a real bug. Famous: the Java SDK had this bug in Arrays.binarySearch for ~9 years (Bloch, 2006).

Part B — Kadane’s Algorithm (with Proof)

def kadane(a):
    best_here = best_overall = a[0]
    for i in range(1, len(a)):
        best_here = max(a[i], best_here + a[i])
        best_overall = max(best_overall, best_here)
    return best_overall

Loop invariant: At the top of iteration i (1-indexed for clarity):

  1. best_here is the maximum sum of any contiguous subarray ending at index i - 1.
  2. best_overall is the maximum sum of any contiguous subarray within a[0..i-1] inclusive.

Initialization: Before the loop (i.e., before i = 1), the only subarray of a[0..0] is [a[0]] with sum a[0]. Both best_here and best_overall are set to a[0]. Invariant holds.

Maintenance: Assume the invariant holds at the start of iteration i. Consider all contiguous subarrays ending at index i. Each such subarray is either:

  • The singleton [a[i]], with sum a[i], OR
  • An extension of a subarray ending at i - 1, with sum (sum of that subarray) + a[i].

The best extension is best_here + a[i] (by the invariant on best_here). So the best subarray ending at i has sum max(a[i], best_here + a[i]), which is exactly the new best_here. Invariant clause 1 maintained.

The best subarray within a[0..i] is either entirely within a[0..i-1] (covered by old best_overall) or ends at i (covered by new best_here). The new best_overall = max(old best_overall, new best_here) is therefore correct. Clause 2 maintained.

Termination & post-condition: The loop runs exactly n - 1 iterations (finite, no monovariant needed). On exit, best_overall is the max contiguous subarray sum within a[0..n-1] = the whole array. Returning it is correct.

Edge case proof: Kadane requires a non-empty (the problem states this); for an all-negative array, the answer is the maximum single element. The invariant handles this because best_here will reset to a[i] whenever best_here + a[i] < a[i], i.e., whenever best_here < 0. This is why the algorithm works for negative-only arrays — a common bug is to initialize best_here = 0, which incorrectly returns 0 for all-negative input.

Data Structures Used

  • Plain arrays
  • Two integer variables for Kadane
  • Two integer indices for binary search

Correctness Argument

See Part A and Part B above.

Complexity

  • Binary search: O(log N) time, O(1) space.
  • Kadane: O(N) time, O(1) space.

Implementation Requirements

  • Use lo + (hi - lo) // 2 for binary search midpoint
  • Use half-open interval [lo, hi) for binary search — easier to reason about than closed [lo, hi]
  • Initialize Kadane’s best_here and best_overall to a[0], not 0, to handle all-negative arrays

Tests

Smoke

assert binary_search([1, 3, 5, 7, 9], 5) == 2
assert binary_search([1, 3, 5, 7, 9], 4) == -1
assert kadane([-2, 1, -3, 4, -1, 2, 1, -5, 4]) == 6

Edge

# Binary search edges
assert binary_search([], 1) == -1
assert binary_search([5], 5) == 0
assert binary_search([5], 4) == -1
assert binary_search([1, 1, 1, 1], 1) in (0, 1, 2, 3)  # any valid index

# Kadane edges
assert kadane([5]) == 5
assert kadane([-3, -1, -2]) == -1
assert kadane([1, 2, 3, 4]) == 10        # all positive
assert kadane([-1, -2, -3, -4]) == -1    # all negative

Large

import random
random.seed(0)
big = sorted(random.sample(range(10**6), 10**5))
assert binary_search(big, big[50000]) == 50000

big2 = [random.randint(-10**6, 10**6) for _ in range(10**5)]
result = kadane(big2)
assert isinstance(result, int)

Randomized verifier

def brute_kadane(a):
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i+1, len(a)+1))

for _ in range(200):
    a = [random.randint(-50, 50) for _ in range(random.randint(1, 30))]
    assert kadane(a) == brute_kadane(a)

Invariant assertions (the proof, in code)

def binary_search_with_assertions(a, t):
    lo, hi = 0, len(a)
    while lo < hi:
        # INVARIANT: if t in a, then t at some index in [lo, hi)
        if t in a[:lo]: assert False, "invariant violated (t before lo)"
        if lo > 0 and a[lo-1] >= t and t in a: assert False
        mid = lo + (hi - lo) // 2
        if a[mid] == t: return mid
        elif a[mid] < t: lo = mid + 1
        else: hi = mid
    return -1

Follow-up Questions

  1. Find leftmost vs rightmost occurrence of t. Modify binary search; the invariant becomes “the answer is in [lo, hi)” rather than “if t is present, it’s in…”.
  2. Binary search on real numbers. Replace integer halving with floating-point; the invariant is the same but termination uses a precision threshold, not lo < hi.
  3. Kadane with at most K negative numbers allowed. State expands to (i, k); DP, O(NK).
  4. Maximum sum circular subarray. Two passes of Kadane + total-sum trick; the invariant for the circular case is more subtle.
  5. Maximum product subarray. Maintain both max and min products at each index because a negative * negative becomes the largest.

Product Extension

  • Database B-tree page searches — binary search within a page; the invariant analysis directly applies.
  • Time-series anomaly detection — Kadane variants find the largest cumulative deviation, used in change-point detection.
  • Streaming Kadane — given a stream of metrics, find the worst-degradation window. Same algorithm with O(1) memory.

Language/Runtime Follow-ups

  • Python: mid = (lo + hi) // 2 is safe (arbitrary precision); mid = lo + (hi - lo) // 2 is still preferable for portability.
  • Java/C++: (lo + hi) / 2 overflows when lo + hi > 2^31 - 1. Use lo + (hi - lo) / 2 or (lo + hi) >>> 1 (Java unsigned shift).
  • Kadane overflow: for |a[i]| ≤ 10^9 and N = 10^5, max sum is 10^14 — exceeds 32-bit int. Use long/int64 in Java/Go/C++.
  • Floating-point Kadane: accumulation error compounds; use Kahan summation if precision matters.

Common Bugs

  1. (lo + hi) // 2 overflow in C++/Java
  2. Closed-interval binary search with lo <= hi is correct but the mid updates are trickier; pick a convention and stick with it
  3. Kadane initialized to 0 — fails on all-negative arrays
  4. Forgetting best_overall update — returns the best ending at the last position, not overall
  5. Empty input to Kadane — undefined; problem statement says non-empty, but check the contract
  6. Binary search infinite loop when lo = mid instead of lo = mid + 1 — the monovariant doesn’t decrease

Debugging Strategy

When binary search loops forever or returns wrong index:

  1. Print (lo, mid, hi) each iteration. If lo doesn’t strictly increase or hi doesn’t strictly decrease (toward each other), you have an off-by-one.
  2. Check the boundary condition: does your invariant include or exclude hi?
  3. For “find leftmost”, the answer is at lo after the loop, not mid.

When Kadane returns 0 on all-negative input:

  1. Check initialization — should be a[0], not 0.
  2. Print (best_here, best_overall) at each step; trace by hand against the expected.

Mastery Criteria

  • Wrote both algorithms correctly without testing first (proof-first coding)
  • Stated the loop invariant for each in one sentence
  • Identified the monovariant for binary search termination
  • Proved correctness by induction (3 steps: init, maintain, terminate)
  • Recognized the (lo + hi) / 2 overflow risk without prompting
  • Explained why Kadane’s init must be a[0] and not 0
  • Wrote loop invariants as comments in your code for one Phase 5 DP problem