Lab 07 — FFT / Polynomial Multiplication
Goal
Implement the Cooley-Tukey FFT to multiply two polynomials in O(N log N), and apply it to convolution-based problems (large-integer multiplication, string matching with wildcards).
Background
The Discrete Fourier Transform (DFT) of a length-N vector evaluates the corresponding polynomial at N roots of unity. Multiplying two polynomials of degree N-1 via naive convolution is O(N²); via DFT, point-wise multiply, inverse-DFT, it’s O(N log N).
Cooley-Tukey (1965): divide and conquer the DFT. The radix-2 version requires N a power of 2.
Number-Theoretic Transform (NTT): FFT over a prime field; avoids floating-point error; common in competitive programming.
Interview Context
- Codeforces / ICPC: regular (NTT version)
- Signal processing roles (DSP, audio, image): expected
- Cryptography research: standard tool
- Quant: large-integer multiplication, time-series convolutions
- Standard FAANG: essentially zero
When to Skip This Topic
Skip if any of these are true:
- You aren’t targeting signal-processing, cryptography, or ICPC roles
- You haven’t implemented divide-and-conquer recursive algorithms confidently
- You’re rusty on complex number arithmetic
This is a “you need it or you don’t” topic. Most interview prep should skip.
Problem Statement
Polynomial Multiplication.
Given two polynomials A(x) = sum a[i] * x^i and B(x) = sum b[i] * x^i, compute their product C(x) = A(x) * B(x).
Equivalently: compute the convolution c[k] = sum_{i+j=k} a[i] * b[j].
Constraints
- Degrees up to 10^5 or 10^6
- Coefficients fit in int32 (avoid overflow concerns by using float carefully; or use NTT)
- Wall-clock: < 1 sec
Clarifying Questions
- Integer coefficients or real-valued? (Integer for NTT, real for FFT.)
- Exact answer required? (Yes for NTT; FFT introduces floating error.)
- Output as polynomial coefficients or as a value at specific x? (Coefficients.)
Examples
A = [1, 2, 3] (1 + 2x + 3x²)
B = [4, 5] (4 + 5x)
C = [4, 13, 22, 15] (4 + 13x + 22x² + 15x³)
A = [1, 1] (1 + x)
B = [1, 1]
C = [1, 2, 1] (1 + x)² = 1 + 2x + x²
Brute Force
Nested loops: c[i+j] += a[i] * b[j]. O(N²). For N = 10^5: 10^10 ops — TLE.
Brute Force Complexity
- Time: O(N²)
- Space: O(N)
Optimization Path
- Pad both A and B to length 2^k ≥ deg(A) + deg(B) + 1.
- Compute DFT(A) and DFT(B) using Cooley-Tukey.
- Pointwise multiply: F[i] = DFT(A)[i] * DFT(B)[i].
- Compute IDFT(F) to recover convolution C.
Final Expected Approach
def fft(a, invert=False):
n = len(a)
if n == 1: return
# bit-reverse permutation
j = 0
for i in range(1, n):
bit = n >> 1
while j & bit:
j ^= bit
bit >>= 1
j ^= bit
if i < j:
a[i], a[j] = a[j], a[i]
# butterfly
length = 2
while length <= n:
angle = 2 * math.pi / length * (-1 if invert else 1)
wlen = complex(math.cos(angle), math.sin(angle))
for i in range(0, n, length):
w = complex(1)
for k in range(length // 2):
u = a[i + k]
v = a[i + k + length // 2] * w
a[i + k] = u + v
a[i + k + length // 2] = u - v
w *= wlen
length <<= 1
if invert:
for i in range(n):
a[i] /= n
def multiply(a, b):
result_size = 1
while result_size < len(a) + len(b):
result_size <<= 1
fa = [complex(x) for x in a] + [complex(0)] * (result_size - len(a))
fb = [complex(x) for x in b] + [complex(0)] * (result_size - len(b))
fft(fa)
fft(fb)
for i in range(result_size):
fa[i] *= fb[i]
fft(fa, invert=True)
return [round(x.real) for x in fa[:len(a) + len(b) - 1]]
For NTT (exact integer convolution), replace complex roots of unity with primitive roots in F_p for a prime p = c * 2^k + 1 (common: 998244353 with primitive root 3).
Data Structures
- Array of complex numbers (FFT) or integers mod p (NTT)
- Bit-reverse permutation index
Correctness Argument
- DFT linearity: DFT(A + B) = DFT(A) + DFT(B).
- Convolution theorem: DFT(A * B) = DFT(A) ⊙ DFT(B) (pointwise).
- Inverse: IDFT(DFT(A)) = A.
- Cooley-Tukey: recursive split into even/odd indices; combines via roots of unity.
Complexity
- Time: O(N log N)
- Space: O(N)
Implementation Requirements
- N must be a power of 2; pad with zeros
- Bit-reversal permutation correctly implemented
- Iterative butterflies (recursive is fine for small N but slow for large)
- For floating-point FFT: round to nearest integer at the end; verify error bound
- For NTT: pick a prime large enough for max coefficient × N to avoid overflow
Tests
- Multiply [1, 1] × [1, 1] = [1, 2, 1]
- Multiply degree-3 polynomials by hand-verified product
- Stress: random N = 10^4 polynomials vs O(N²) brute force; assert equality
- N = 1 (constants only)
- All zeros (result all zeros)
- Performance: N = 2×10^5 in < 1 sec
Follow-up Questions
- Exact integer convolution with large coefficients. → NTT with multi-modulus + CRT, or three NTTs with different primes.
- String matching with wildcards. → Reduce to convolution; each char becomes a numeric encoding; wildcard = 0; sum-of-(diff)² = 0 means match.
- Multi-dimensional FFT (image convolution). → Apply 1D FFT along each axis.
- Fast multiplication of very large integers. → Schönhage-Strassen uses FFT; Furer’s algorithm faster asymptotically.
- Subset sum convolution. → Walsh-Hadamard transform; different beast.
Product Extension
- Audio processing (spectrograms, filters)
- Image processing (Gaussian blur, edge detection)
- Cryptography (large-integer multiplication for RSA, ECC)
- Time-series analysis (autocorrelation)
- Big-integer libraries (GMP uses FFT-based multiplication above ~1000 digits)
Language/Runtime Follow-ups
- C++: standard. NumPy’s FFT in Python is C-optimized — sometimes acceptable as the “library” answer if the interviewer allows.
- Python: pure-Python FFT is slow; for N > 10^4 use
numpy.fft. - Java: Apache Commons Math has FFT.
- JavaScript: rare in interviews; libraries exist.
Common Bugs
- Bit-reverse permutation wrong: off-by-one in the swap loop.
- Forgot to divide by N in inverse: result is N× too large.
- N not a power of 2: padding error.
- Floating-point error too large: for coefficients near max int32, need careful rounding or NTT.
- NTT primitive root wrong: for prime p, root g must have order divisible by N.
- Result length wrong: should be len(A) + len(B) - 1, but FFT computed over N ≥ that.
Debugging Strategy
- Verify FFT then IFFT recovers the input (within floating tolerance)
- Multiply by [1] and verify output equals input
- Compare against numpy.fft on small inputs
- For NTT: compute small examples by hand using primitive roots
Mastery Criteria
- Implement Cooley-Tukey FFT in ≤ 45 min
- Implement NTT in ≤ 60 min
- State the convolution theorem
- Identify problems solvable via FFT/NTT (convolution, large-int mult, string matching with errors)
- Reason about numerical precision for FFT vs NTT
- Estimate runtime for given N