Lab 07 — FFT / Polynomial Multiplication

Goal

Implement the Cooley-Tukey FFT to multiply two polynomials in O(N log N), and apply it to convolution-based problems (large-integer multiplication, string matching with wildcards).

The Discrete Fourier Transform (DFT) of a length-N vector evaluates the corresponding polynomial at N roots of unity. Multiplying two polynomials of degree N-1 via naive convolution is O(N²); via DFT, point-wise multiply, inverse-DFT, it’s O(N log N).

Cooley-Tukey (1965): divide and conquer the DFT. The radix-2 version requires N a power of 2.

Number-Theoretic Transform (NTT): FFT over a prime field; avoids floating-point error; common in competitive programming.

Interview Context

Codeforces / ICPC: regular (NTT version)
Signal processing roles (DSP, audio, image): expected
Cryptography research: standard tool
Quant: large-integer multiplication, time-series convolutions
Standard FAANG: essentially zero

When to Skip This Topic

Skip if any of these are true:

You aren’t targeting signal-processing, cryptography, or ICPC roles
You haven’t implemented divide-and-conquer recursive algorithms confidently
You’re rusty on complex number arithmetic

This is a “you need it or you don’t” topic. Most interview prep should skip.

Problem Statement

Polynomial Multiplication.

Given two polynomials A(x) = sum a[i] * x^i and B(x) = sum b[i] * x^i, compute their product C(x) = A(x) * B(x).

Equivalently: compute the convolution c[k] = sum_{i+j=k} a[i] * b[j].

Constraints

Degrees up to 10^5 or 10^6
Coefficients fit in int32 (avoid overflow concerns by using float carefully; or use NTT)
Wall-clock: < 1 sec

Clarifying Questions

Integer coefficients or real-valued? (Integer for NTT, real for FFT.)
Exact answer required? (Yes for NTT; FFT introduces floating error.)
Output as polynomial coefficients or as a value at specific x? (Coefficients.)

Examples

A = [1, 2, 3]   (1 + 2x + 3x²)
B = [4, 5]      (4 + 5x)
C = [4, 13, 22, 15]   (4 + 13x + 22x² + 15x³)

A = [1, 1]      (1 + x)
B = [1, 1]
C = [1, 2, 1]   (1 + x)² = 1 + 2x + x²

Brute Force

Nested loops: c[i+j] += a[i] * b[j]. O(N²). For N = 10^5: 10^10 ops — TLE.

Brute Force Complexity

Time: O(N²)
Space: O(N)

Optimization Path

Pad both A and B to length 2^k ≥ deg(A) + deg(B) + 1.
Compute DFT(A) and DFT(B) using Cooley-Tukey.
Pointwise multiply: F[i] = DFT(A)[i] * DFT(B)[i].
Compute IDFT(F) to recover convolution C.

Final Expected Approach

def fft(a, invert=False):
    n = len(a)
    if n == 1: return
    # bit-reverse permutation
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly
    length = 2
    while length <= n:
        angle = 2 * math.pi / length * (-1 if invert else 1)
        wlen = complex(math.cos(angle), math.sin(angle))
        for i in range(0, n, length):
            w = complex(1)
            for k in range(length // 2):
                u = a[i + k]
                v = a[i + k + length // 2] * w
                a[i + k] = u + v
                a[i + k + length // 2] = u - v
                w *= wlen
        length <<= 1
    if invert:
        for i in range(n):
            a[i] /= n

def multiply(a, b):
    result_size = 1
    while result_size < len(a) + len(b):
        result_size <<= 1
    fa = [complex(x) for x in a] + [complex(0)] * (result_size - len(a))
    fb = [complex(x) for x in b] + [complex(0)] * (result_size - len(b))
    fft(fa)
    fft(fb)
    for i in range(result_size):
        fa[i] *= fb[i]
    fft(fa, invert=True)
    return [round(x.real) for x in fa[:len(a) + len(b) - 1]]

For NTT (exact integer convolution), replace complex roots of unity with primitive roots in F_p for a prime p = c * 2^k + 1 (common: 998244353 with primitive root 3).

Data Structures

Array of complex numbers (FFT) or integers mod p (NTT)
Bit-reverse permutation index

Correctness Argument

DFT linearity: DFT(A + B) = DFT(A) + DFT(B).
Convolution theorem: DFT(A * B) = DFT(A) ⊙ DFT(B) (pointwise).
Inverse: IDFT(DFT(A)) = A.
Cooley-Tukey: recursive split into even/odd indices; combines via roots of unity.

Complexity

Time: O(N log N)
Space: O(N)

Implementation Requirements

N must be a power of 2; pad with zeros
Bit-reversal permutation correctly implemented
Iterative butterflies (recursive is fine for small N but slow for large)
For floating-point FFT: round to nearest integer at the end; verify error bound
For NTT: pick a prime large enough for max coefficient × N to avoid overflow

Tests

Multiply [1, 1] × [1, 1] = [1, 2, 1]
Multiply degree-3 polynomials by hand-verified product
Stress: random N = 10^4 polynomials vs O(N²) brute force; assert equality
N = 1 (constants only)
All zeros (result all zeros)
Performance: N = 2×10^5 in < 1 sec

Follow-up Questions

Exact integer convolution with large coefficients. → NTT with multi-modulus + CRT, or three NTTs with different primes.
String matching with wildcards. → Reduce to convolution; each char becomes a numeric encoding; wildcard = 0; sum-of-(diff)² = 0 means match.
Multi-dimensional FFT (image convolution). → Apply 1D FFT along each axis.
Fast multiplication of very large integers. → Schönhage-Strassen uses FFT; Furer’s algorithm faster asymptotically.
Subset sum convolution. → Walsh-Hadamard transform; different beast.

Product Extension

Audio processing (spectrograms, filters)
Image processing (Gaussian blur, edge detection)
Cryptography (large-integer multiplication for RSA, ECC)
Time-series analysis (autocorrelation)
Big-integer libraries (GMP uses FFT-based multiplication above ~1000 digits)

Language/Runtime Follow-ups

C++: standard. NumPy’s FFT in Python is C-optimized — sometimes acceptable as the “library” answer if the interviewer allows.
Python: pure-Python FFT is slow; for N > 10^4 use numpy.fft.
Java: Apache Commons Math has FFT.
JavaScript: rare in interviews; libraries exist.

Common Bugs

Bit-reverse permutation wrong: off-by-one in the swap loop.
Forgot to divide by N in inverse: result is N× too large.
N not a power of 2: padding error.
Floating-point error too large: for coefficients near max int32, need careful rounding or NTT.
NTT primitive root wrong: for prime p, root g must have order divisible by N.
Result length wrong: should be len(A) + len(B) - 1, but FFT computed over N ≥ that.

Debugging Strategy

Verify FFT then IFFT recovers the input (within floating tolerance)
Multiply by [1] and verify output equals input
Compare against numpy.fft on small inputs
For NTT: compute small examples by hand using primitive roots

Mastery Criteria

Implement Cooley-Tukey FFT in ≤ 45 min
Implement NTT in ≤ 60 min
State the convolution theorem
Identify problems solvable via FFT/NTT (convolution, large-int mult, string matching with errors)
Reason about numerical precision for FFT vs NTT
Estimate runtime for given N

LeetCode Interview — Extreme Coding