Lab 07 — FFT / Polynomial Multiplication

Goal

Implement the Cooley-Tukey FFT to multiply two polynomials in O(N log N), and apply it to convolution-based problems (large-integer multiplication, string matching with wildcards).

Background

The Discrete Fourier Transform (DFT) of a length-N vector evaluates the corresponding polynomial at N roots of unity. Multiplying two polynomials of degree N-1 via naive convolution is O(N²); via DFT, point-wise multiply, inverse-DFT, it’s O(N log N).

Cooley-Tukey (1965): divide and conquer the DFT. The radix-2 version requires N a power of 2.

Number-Theoretic Transform (NTT): FFT over a prime field; avoids floating-point error; common in competitive programming.

Interview Context

  • Codeforces / ICPC: regular (NTT version)
  • Signal processing roles (DSP, audio, image): expected
  • Cryptography research: standard tool
  • Quant: large-integer multiplication, time-series convolutions
  • Standard FAANG: essentially zero

When to Skip This Topic

Skip if any of these are true:

  • You aren’t targeting signal-processing, cryptography, or ICPC roles
  • You haven’t implemented divide-and-conquer recursive algorithms confidently
  • You’re rusty on complex number arithmetic

This is a “you need it or you don’t” topic. Most interview prep should skip.

Problem Statement

Polynomial Multiplication.

Given two polynomials A(x) = sum a[i] * x^i and B(x) = sum b[i] * x^i, compute their product C(x) = A(x) * B(x).

Equivalently: compute the convolution c[k] = sum_{i+j=k} a[i] * b[j].

Constraints

  • Degrees up to 10^5 or 10^6
  • Coefficients fit in int32 (avoid overflow concerns by using float carefully; or use NTT)
  • Wall-clock: < 1 sec

Clarifying Questions

  1. Integer coefficients or real-valued? (Integer for NTT, real for FFT.)
  2. Exact answer required? (Yes for NTT; FFT introduces floating error.)
  3. Output as polynomial coefficients or as a value at specific x? (Coefficients.)

Examples

A = [1, 2, 3]   (1 + 2x + 3x²)
B = [4, 5]      (4 + 5x)
C = [4, 13, 22, 15]   (4 + 13x + 22x² + 15x³)
A = [1, 1]      (1 + x)
B = [1, 1]
C = [1, 2, 1]   (1 + x)² = 1 + 2x + x²

Brute Force

Nested loops: c[i+j] += a[i] * b[j]. O(N²). For N = 10^5: 10^10 ops — TLE.

Brute Force Complexity

  • Time: O(N²)
  • Space: O(N)

Optimization Path

  1. Pad both A and B to length 2^k ≥ deg(A) + deg(B) + 1.
  2. Compute DFT(A) and DFT(B) using Cooley-Tukey.
  3. Pointwise multiply: F[i] = DFT(A)[i] * DFT(B)[i].
  4. Compute IDFT(F) to recover convolution C.

Final Expected Approach

def fft(a, invert=False):
    n = len(a)
    if n == 1: return
    # bit-reverse permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly
    length = 2
    while length <= n:
        angle = 2 * math.pi / length * (-1 if invert else 1)
        wlen = complex(math.cos(angle), math.sin(angle))
        for i in range(0, n, length):
            w = complex(1)
            for k in range(length // 2):
                u = a[i + k]
                v = a[i + k + length // 2] * w
                a[i + k] = u + v
                a[i + k + length // 2] = u - v
                w *= wlen
        length <<= 1
    if invert:
        for i in range(n):
            a[i] /= n

def multiply(a, b):
    result_size = 1
    while result_size < len(a) + len(b):
        result_size <<= 1
    fa = [complex(x) for x in a] + [complex(0)] * (result_size - len(a))
    fb = [complex(x) for x in b] + [complex(0)] * (result_size - len(b))
    fft(fa)
    fft(fb)
    for i in range(result_size):
        fa[i] *= fb[i]
    fft(fa, invert=True)
    return [round(x.real) for x in fa[:len(a) + len(b) - 1]]

For NTT (exact integer convolution), replace complex roots of unity with primitive roots in F_p for a prime p = c * 2^k + 1 (common: 998244353 with primitive root 3).

Data Structures

  • Array of complex numbers (FFT) or integers mod p (NTT)
  • Bit-reverse permutation index

Correctness Argument

  • DFT linearity: DFT(A + B) = DFT(A) + DFT(B).
  • Convolution theorem: DFT(A * B) = DFT(A) ⊙ DFT(B) (pointwise).
  • Inverse: IDFT(DFT(A)) = A.
  • Cooley-Tukey: recursive split into even/odd indices; combines via roots of unity.

Complexity

  • Time: O(N log N)
  • Space: O(N)

Implementation Requirements

  • N must be a power of 2; pad with zeros
  • Bit-reversal permutation correctly implemented
  • Iterative butterflies (recursive is fine for small N but slow for large)
  • For floating-point FFT: round to nearest integer at the end; verify error bound
  • For NTT: pick a prime large enough for max coefficient × N to avoid overflow

Tests

  • Multiply [1, 1] × [1, 1] = [1, 2, 1]
  • Multiply degree-3 polynomials by hand-verified product
  • Stress: random N = 10^4 polynomials vs O(N²) brute force; assert equality
  • N = 1 (constants only)
  • All zeros (result all zeros)
  • Performance: N = 2×10^5 in < 1 sec

Follow-up Questions

  • Exact integer convolution with large coefficients. → NTT with multi-modulus + CRT, or three NTTs with different primes.
  • String matching with wildcards. → Reduce to convolution; each char becomes a numeric encoding; wildcard = 0; sum-of-(diff)² = 0 means match.
  • Multi-dimensional FFT (image convolution). → Apply 1D FFT along each axis.
  • Fast multiplication of very large integers. → Schönhage-Strassen uses FFT; Furer’s algorithm faster asymptotically.
  • Subset sum convolution. → Walsh-Hadamard transform; different beast.

Product Extension

  • Audio processing (spectrograms, filters)
  • Image processing (Gaussian blur, edge detection)
  • Cryptography (large-integer multiplication for RSA, ECC)
  • Time-series analysis (autocorrelation)
  • Big-integer libraries (GMP uses FFT-based multiplication above ~1000 digits)

Language/Runtime Follow-ups

  • C++: standard. NumPy’s FFT in Python is C-optimized — sometimes acceptable as the “library” answer if the interviewer allows.
  • Python: pure-Python FFT is slow; for N > 10^4 use numpy.fft.
  • Java: Apache Commons Math has FFT.
  • JavaScript: rare in interviews; libraries exist.

Common Bugs

  1. Bit-reverse permutation wrong: off-by-one in the swap loop.
  2. Forgot to divide by N in inverse: result is N× too large.
  3. N not a power of 2: padding error.
  4. Floating-point error too large: for coefficients near max int32, need careful rounding or NTT.
  5. NTT primitive root wrong: for prime p, root g must have order divisible by N.
  6. Result length wrong: should be len(A) + len(B) - 1, but FFT computed over N ≥ that.

Debugging Strategy

  • Verify FFT then IFFT recovers the input (within floating tolerance)
  • Multiply by [1] and verify output equals input
  • Compare against numpy.fft on small inputs
  • For NTT: compute small examples by hand using primitive roots

Mastery Criteria

  • Implement Cooley-Tukey FFT in ≤ 45 min
  • Implement NTT in ≤ 60 min
  • State the convolution theorem
  • Identify problems solvable via FFT/NTT (convolution, large-int mult, string matching with errors)
  • Reason about numerical precision for FFT vs NTT
  • Estimate runtime for given N