Lab 01 — Modular Arithmetic: nCr mod p With Precomputed Factorials
Goal
Master modular arithmetic and modular inverse by building a nCr mod p engine that answers any (n, r) query in O(1) after O(N) preprocessing, for n up to 10^7 and p = 10^9 + 7. By the end of this lab you can write the engine from blank in under 5 minutes.
Background
Counting problems modulo a prime are the single most common framing in competitive programming. The output line “print the answer modulo 10^9 + 7” appears on roughly 30% of CF Div 2/3 problems with combinatorial flavor. Behind that line is a fixed engine: precompute fact[i] = i! and inv_fact[i] = (i!)^(-1) modulo p, and C(n, r) = fact[n] · inv_fact[r] · inv_fact[n-r] mod p. Once you have the engine, dozens of problems collapse to “set up the formula, plug in, output”. The same machinery underlies probability problems where the answer is a/b mod p (output a · b^(-1) mod p).
Interview Context
Quant/HFT interviews use modular-counting problems as direct filters: “how many distinct length-k increasing sequences over [1, n], modulo 10^9 + 7?” If you can’t write the formula and the inverse-factorial trick fluently, you’ve failed in 10 minutes. FAANG L4 interviews almost never ask this directly, but a good L5+ candidate signals fluency by reaching for inv_fact[r] without explanation when a counting problem’s answer needs to be reduced. The signal interviewers want is “this candidate has CP background”; that signal is delivered by writing modular inverse without flinching.
Problem Statement
Given Q queries, each (n_i, r_i) with 0 ≤ r_i ≤ n_i ≤ N_max, output C(n_i, r_i) mod p for p = 10^9 + 7. The engine must support Q ≥ 10^6 queries in <2 seconds total.
LeetCode reference: LC 62 (Unique Paths) asks for C(m+n-2, m-1) directly (no mod needed). LC 920 (Number of Music Playlists) is a DP that uses modular arithmetic but not nCr directly. The pure CP framing appears on Codeforces (e.g., CF 1342E, CF 1342B).
Constraints
1 ≤ N_max ≤ 10^7(table size).1 ≤ Q ≤ 10^6(queries).0 ≤ r ≤ n ≤ N_max(well-formed query).p = 10^9 + 7(prime).- Time limit: 2 seconds. Memory limit: 256 MB. The factorial tables fit (
10^7 · 8 bytes ≈ 80 MB).
Clarifying Questions
- “Is
palways prime?” — yes; FLT works iffpprime. If composite, fall back to extgcd-based inverse and watch for non-invertible elements. - “Are
nandralways non-negative?” — yes; ifr < 0orr > n, return0by convention. - “Do queries arrive online or can they be batched?” — for this lab, online (one at a time after preprocessing),
O(1)each. - “Is
N_maxknown in advance?” — yes; we precompute up toN_max. - “Should I support Lucas’s theorem for
nlarger thanN_max?” — out of scope for this lab; see follow-up.
Examples
C(5, 2) mod p=10.C(10, 5) mod p=252.C(0, 0) mod p=1.C(1000000, 500000) mod (10^9+7)= a large nonzero value (engine must handle it).C(5, 7) mod p=0(well-formedness allowsr > nonly as boundary; we return 0).
Initial Brute Force
For each query, compute C(n, r) = n! / (r! · (n-r)!) using arbitrary-precision arithmetic (e.g., Python’s int), then mod p.
from math import factorial
def nCr_brute(n, r, p):
if r < 0 or r > n:
return 0
return (factorial(n) // (factorial(r) * factorial(n - r))) % p
Brute Force Complexity
Time O(n log² n) per query in arbitrary precision (factorial is O(n) multiplications of big ints, each O(log n) digits). For n = 10^7 and Q = 10^6, infeasible by ~6 orders of magnitude. Useful only as a stress oracle on n ≤ 30.
Optimization Path
- Brute force (above) — correctness baseline only.
- Precompute factorials, compute inverse on each query.
fact[i]table built inO(N); each query computesinv(fact[r]) · inv(fact[n-r]) · fact[n], withinv()takingO(log p)via FLT. TotalO(N + Q log p). Better but still2·10^7ops forQ = 10^6— passes but wasteful. - Precompute inverse factorials, query in
O(1). Afterfact[], computeinv_fact[N] = power(fact[N], p-2, p)once, theninv_fact[i] = inv_fact[i+1] · (i+1) mod pgoing backward. TotalO(N + Q). The optimal solution.
The going-backward trick is the key insight: (i!)^(-1) = ((i+1)!)^(-1) · (i+1), because i! · (i+1) = (i+1)!. So one expensive power call plus a backward sweep gives all inverse factorials in O(N).
Final Expected Approach
- Precompute
fact[0..N]withfact[0] = 1,fact[i] = fact[i-1] · i mod p. - Compute
inv_fact[N] = power(fact[N], p - 2, p)via binary exponentiation (FLT). - Compute
inv_fact[i] = inv_fact[i+1] · (i+1) mod pforifromN-1down to0. - Each query:
if r < 0 or r > n: return 0; else return fact[n] · inv_fact[r] · inv_fact[n-r] mod p.
Data Structures Used
- Two flat arrays of
long long(orint64):fact[0..N]andinv_fact[0..N]. - One scalar
p. - A
power(a, b, m)helper (binary exponentiation).
No heaps, no maps, no trees. The whole thing is two arrays and a closed-form formula.
Correctness Argument
FLT proof of inverse. Fermat’s Little Theorem states: for prime p and gcd(a, p) = 1, a^(p-1) ≡ 1 (mod p). Multiplying both sides by a^(-1): a^(p-2) ≡ a^(-1) (mod p). So power(a, p-2, p) is the modular inverse of a whenever a ≢ 0 (mod p). For a = fact[i], since fact[i] = i! < p for i < p and the product of nonzero-mod-p values, gcd(fact[i], p) = 1, so the inverse exists.
Backward inverse-factorial recurrence. We have inv_fact[i+1] = ((i+1)!)^(-1) and want inv_fact[i] = (i!)^(-1). Since (i+1)! = i! · (i+1), taking inverses: ((i+1)!)^(-1) = (i!)^(-1) · (i+1)^(-1), equivalently (i!)^(-1) = ((i+1)!)^(-1) · (i+1). So inv_fact[i] = inv_fact[i+1] · (i+1) mod p. Base case at i = N is given by the explicit FLT call.
nCr formula correctness. C(n, r) = n! / (r! · (n-r)!). In Z/pZ, division by x is multiplication by x^(-1). So C(n, r) ≡ fact[n] · inv_fact[r] · inv_fact[n-r] (mod p). ✓
Complexity
- Preprocess:
O(N + log p)time (onepowercall, two linear sweeps),O(N)space. - Each query:
O(1)time,O(1)space. - Total for
N = 10^7andQ = 10^6: ~10^7 + 10^6mults, well under 2 seconds in C++/Java/Go; in Python use PyPy or numpy-accelerated arithmetic.
Implementation Requirements
- Use
long long(C++) /int64(Go) /int(Python; Javalong). Cast operands before multiplying to avoid overflow:(long long)fact[n] * inv_fact[r] % p. powerhelper handlesb = 0returning1andm = 1returning0.- Guard
r < 0 || r > nreturning0. - Mod once after every multiplication, not at the end.
Tests
def test_nCr():
p = 10**9 + 7
eng = NCrEngine(N=20, p=p)
assert eng.nCr(5, 2) == 10
assert eng.nCr(10, 5) == 252
assert eng.nCr(0, 0) == 1
assert eng.nCr(1, 0) == 1
assert eng.nCr(1, 1) == 1
assert eng.nCr(5, 7) == 0 # r > n
assert eng.nCr(5, -1) == 0 # r < 0
# Stress vs brute on small N:
import random
for _ in range(1000):
n = random.randint(0, 20)
r = random.randint(-2, 22)
assert eng.nCr(n, r) == nCr_brute(n, r, p)
Edge cases: n = 0, r = 0, r = n, r > n, r < 0, n = N_max.
Follow-up Questions
- “What if
pis not prime?” — FLT fails. Use extgcd-based inverse, but be aware that not everyais invertible (only those withgcd(a, p) = 1). - “What if
nis up to10^18butpis small (≤10^5)?” — Lucas’s theorem: writen, rin basep, multiplyC(n_i, r_i) mod pdigit-wise. - “What if you need
C(n, r) mod 4?” — neither FLT nor straightforward Lucas applies; use Kummer’s theorem or direct computation.
Product Extension
Probability/statistics services (e.g., AdTech bid pacing, fraud-risk scoring) compute combinatorial denominators on the fly. The factorial-precomputation engine is the production primitive: build it once at service startup, query it O(1) for the lifetime of the server. Same machinery is in CryptoLib’s prime-modulus arithmetic and in numerical libraries (NumPy’s comb calls down to a similar routine for large arguments).
Language / Runtime Follow-ups
- Python: integers are arbitrary precision, so overflow isn’t a concern, but per-mult cost is ~5× C++. Use
pow(a, b, m)(built-inO(log b)modular exponentiation). Precompute asnumpy.int64arrays only ifN ≤ 10^6; otherwise plain lists. - Java: use
longeverywhere; cast to(long)before multiplying.BigInteger.modPowworks but is 10× slower than a hand-rolled loop. - Go:
int64everywhere;math/bigif you need extra safety. Hand-roll the loop for performance. - C++:
long long(int64_t),(long long)a * b % p. The product of two values <p ≈ 10^9fits inlong long(<2^63). With unsigned overflow concerns, useunsigned long longand% pdefensively. - JS/TS:
Numberisdouble-precision and loses integer precision above2^53. UseBigInt, but it’s ~10× slower; avoid for hot loops larger than10^6ops.
Common Bugs
- Forgetting to mod after every multiplication; silent overflow.
- Using
**(exponentiation) where the language doesn’t have a modular form; computes a 5MB number first, then mods. Usepow(a, b, m)(Python) or your own loop. - Computing
inv_fact[i]directly viapower(fact[i], p-2, p)for everyi. Correct butO(N log p); the backward sweep isO(N). - Off-by-one:
inv_facthasN+1entries (inv_fact[0..N]); allocate accordingly. - Returning
1forC(n, -1)orC(n, n+1)instead of0. Always guard.
Debugging Strategy
If nCr(n, r) disagrees with brute on small cases:
- Print
fact[0..n]and confirmfact[i] = i!fori ≤ 10. - Print
inv_fact[0..n]and confirmfact[i] · inv_fact[i] ≡ 1 (mod p)for eachi. - If step 2 fails: the FLT exponent is wrong. Check
power(fact[N], p-2, p), notp-1. - If step 2 passes but
nCris wrong: the formula isfact[n] · inv_fact[r] · inv_fact[n-r]. Check you’re not usingfact[n-r](withoutinv_).
Mastery Criteria
- Write the engine (factorial table + inverse-factorial backward sweep + query) from blank in <5 minutes.
-
Articulate why
power(fact[N], p-2, p)givesinv_fact[N]in one sentence (FLT). -
Articulate why
inv_fact[i] = inv_fact[i+1] · (i+1)in one sentence (telescope). - Recognize “answer mod prime” + “counting / paths / arrangements” as the trigger for this engine within 60 seconds of reading a problem.
-
Switch to extgcd-based inverse if asked “what if
pis not prime?” - State Lucas’s theorem on demand and explain when it’s needed.