Lab 01 — Modular Arithmetic: `nCr mod p` With Precomputed Factorials

Goal

Master modular arithmetic and modular inverse by building a nCr mod p engine that answers any (n, r) query in O(1) after O(N) preprocessing, for n up to 10^7 and p = 10^9 + 7. By the end of this lab you can write the engine from blank in under 5 minutes.

Background

Counting problems modulo a prime are the single most common framing in competitive programming. The output line “print the answer modulo 10^9 + 7” appears on roughly 30% of CF Div 2/3 problems with combinatorial flavor. Behind that line is a fixed engine: precompute fact[i] = i! and inv_fact[i] = (i!)^(-1) modulo p, and C(n, r) = fact[n] · inv_fact[r] · inv_fact[n-r] mod p. Once you have the engine, dozens of problems collapse to “set up the formula, plug in, output”. The same machinery underlies probability problems where the answer is a/b mod p (output a · b^(-1) mod p).

Interview Context

Quant/HFT interviews use modular-counting problems as direct filters: “how many distinct length-k increasing sequences over [1, n], modulo 10^9 + 7?” If you can’t write the formula and the inverse-factorial trick fluently, you’ve failed in 10 minutes. FAANG L4 interviews almost never ask this directly, but a good L5+ candidate signals fluency by reaching for inv_fact[r] without explanation when a counting problem’s answer needs to be reduced. The signal interviewers want is “this candidate has CP background”; that signal is delivered by writing modular inverse without flinching.

Problem Statement

Given Q queries, each (n_i, r_i) with 0 ≤ r_i ≤ n_i ≤ N_max, output C(n_i, r_i) mod p for p = 10^9 + 7. The engine must support Q ≥ 10^6 queries in <2 seconds total.

LeetCode reference: LC 62 (Unique Paths) asks for C(m+n-2, m-1) directly (no mod needed). LC 920 (Number of Music Playlists) is a DP that uses modular arithmetic but not nCr directly. The pure CP framing appears on Codeforces (e.g., CF 1342E, CF 1342B).

Constraints

1 ≤ N_max ≤ 10^7 (table size).
1 ≤ Q ≤ 10^6 (queries).
0 ≤ r ≤ n ≤ N_max (well-formed query).
p = 10^9 + 7 (prime).
Time limit: 2 seconds. Memory limit: 256 MB. The factorial tables fit (10^7 · 8 bytes ≈ 80 MB).

Clarifying Questions

“Is p always prime?” — yes; FLT works iff p prime. If composite, fall back to extgcd-based inverse and watch for non-invertible elements.
“Are n and r always non-negative?” — yes; if r < 0 or r > n, return 0 by convention.
“Do queries arrive online or can they be batched?” — for this lab, online (one at a time after preprocessing), O(1) each.
“Is N_max known in advance?” — yes; we precompute up to N_max.
“Should I support Lucas’s theorem for n larger than N_max?” — out of scope for this lab; see follow-up.

Examples

C(5, 2) mod p = 10.
C(10, 5) mod p = 252.
C(0, 0) mod p = 1.
C(1000000, 500000) mod (10^9+7) = a large nonzero value (engine must handle it).
C(5, 7) mod p = 0 (well-formedness allows r > n only as boundary; we return 0).

Initial Brute Force

For each query, compute C(n, r) = n! / (r! · (n-r)!) using arbitrary-precision arithmetic (e.g., Python’s int), then mod p.

from math import factorial

def nCr_brute(n, r, p):
    if r < 0 or r > n:
        return 0
    return (factorial(n) // (factorial(r) * factorial(n - r))) % p

Brute Force Complexity

Time O(n log² n) per query in arbitrary precision (factorial is O(n) multiplications of big ints, each O(log n) digits). For n = 10^7 and Q = 10^6, infeasible by ~6 orders of magnitude. Useful only as a stress oracle on n ≤ 30.

Optimization Path

Brute force (above) — correctness baseline only.
Precompute factorials, compute inverse on each query. fact[i] table built in O(N); each query computes inv(fact[r]) · inv(fact[n-r]) · fact[n], with inv() taking O(log p) via FLT. Total O(N + Q log p). Better but still 2·10^7 ops for Q = 10^6 — passes but wasteful.
Precompute inverse factorials, query in O(1). After fact[], compute inv_fact[N] = power(fact[N], p-2, p) once, then inv_fact[i] = inv_fact[i+1] · (i+1) mod p going backward. Total O(N + Q). The optimal solution.

The going-backward trick is the key insight: (i!)^(-1) = ((i+1)!)^(-1) · (i+1), because i! · (i+1) = (i+1)!. So one expensive power call plus a backward sweep gives all inverse factorials in O(N).

Final Expected Approach

Precompute fact[0..N] with fact[0] = 1, fact[i] = fact[i-1] · i mod p.
Compute inv_fact[N] = power(fact[N], p - 2, p) via binary exponentiation (FLT).
Compute inv_fact[i] = inv_fact[i+1] · (i+1) mod p for i from N-1 down to 0.
Each query: if r < 0 or r > n: return 0; else return fact[n] · inv_fact[r] · inv_fact[n-r] mod p.

Data Structures Used

Two flat arrays of long long (or int64): fact[0..N] and inv_fact[0..N].
One scalar p.
A power(a, b, m) helper (binary exponentiation).

No heaps, no maps, no trees. The whole thing is two arrays and a closed-form formula.

Correctness Argument

FLT proof of inverse. Fermat’s Little Theorem states: for prime p and gcd(a, p) = 1, a^(p-1) ≡ 1 (mod p). Multiplying both sides by a^(-1): a^(p-2) ≡ a^(-1) (mod p). So power(a, p-2, p) is the modular inverse of a whenever a ≢ 0 (mod p). For a = fact[i], since fact[i] = i! < p for i < p and the product of nonzero-mod-p values, gcd(fact[i], p) = 1, so the inverse exists.

Backward inverse-factorial recurrence. We have inv_fact[i+1] = ((i+1)!)^(-1) and want inv_fact[i] = (i!)^(-1). Since (i+1)! = i! · (i+1), taking inverses: ((i+1)!)^(-1) = (i!)^(-1) · (i+1)^(-1), equivalently (i!)^(-1) = ((i+1)!)^(-1) · (i+1). So inv_fact[i] = inv_fact[i+1] · (i+1) mod p. Base case at i = N is given by the explicit FLT call.

nCr formula correctness. C(n, r) = n! / (r! · (n-r)!). In Z/pZ, division by x is multiplication by x^(-1). So C(n, r) ≡ fact[n] · inv_fact[r] · inv_fact[n-r] (mod p). ✓

Complexity

Preprocess: O(N + log p) time (one power call, two linear sweeps), O(N) space.
Each query: O(1) time, O(1) space.
Total for N = 10^7 and Q = 10^6: ~10^7 + 10^6 mults, well under 2 seconds in C++/Java/Go; in Python use PyPy or numpy-accelerated arithmetic.

Implementation Requirements

Use long long (C++) / int64 (Go) / int (Python; Java long). Cast operands before multiplying to avoid overflow: (long long)fact[n] * inv_fact[r] % p.
power helper handles b = 0 returning 1 and m = 1 returning 0.
Guard r < 0 || r > n returning 0.
Mod once after every multiplication, not at the end.

Tests

def test_nCr():
    p = 10**9 + 7
    eng = NCrEngine(N=20, p=p)
    assert eng.nCr(5, 2) == 10
    assert eng.nCr(10, 5) == 252
    assert eng.nCr(0, 0) == 1
    assert eng.nCr(1, 0) == 1
    assert eng.nCr(1, 1) == 1
    assert eng.nCr(5, 7) == 0       # r > n
    assert eng.nCr(5, -1) == 0      # r < 0
    # Stress vs brute on small N:
    import random
    for _ in range(1000):
        n = random.randint(0, 20)
        r = random.randint(-2, 22)
        assert eng.nCr(n, r) == nCr_brute(n, r, p)

Edge cases: n = 0, r = 0, r = n, r > n, r < 0, n = N_max.

Follow-up Questions

“What if p is not prime?” — FLT fails. Use extgcd-based inverse, but be aware that not every a is invertible (only those with gcd(a, p) = 1).
“What if n is up to 10^18 but p is small (≤ 10^5)?” — Lucas’s theorem: write n, r in base p, multiply C(n_i, r_i) mod p digit-wise.
“What if you need C(n, r) mod 4?” — neither FLT nor straightforward Lucas applies; use Kummer’s theorem or direct computation.

Product Extension

Probability/statistics services (e.g., AdTech bid pacing, fraud-risk scoring) compute combinatorial denominators on the fly. The factorial-precomputation engine is the production primitive: build it once at service startup, query it O(1) for the lifetime of the server. Same machinery is in CryptoLib’s prime-modulus arithmetic and in numerical libraries (NumPy’s comb calls down to a similar routine for large arguments).

Language / Runtime Follow-ups

Python: integers are arbitrary precision, so overflow isn’t a concern, but per-mult cost is ~5× C++. Use pow(a, b, m) (built-in O(log b) modular exponentiation). Precompute as numpy.int64 arrays only if N ≤ 10^6; otherwise plain lists.
Java: use long everywhere; cast to (long) before multiplying. BigInteger.modPow works but is 10× slower than a hand-rolled loop.
Go: int64 everywhere; math/big if you need extra safety. Hand-roll the loop for performance.
C++: long long (int64_t), (long long)a * b % p. The product of two values <p ≈ 10^9 fits in long long (< 2^63). With unsigned overflow concerns, use unsigned long long and % p defensively.
JS/TS: Number is double-precision and loses integer precision above 2^53. Use BigInt, but it’s ~10× slower; avoid for hot loops larger than 10^6 ops.

Common Bugs

Forgetting to mod after every multiplication; silent overflow.
Using ** (exponentiation) where the language doesn’t have a modular form; computes a 5MB number first, then mods. Use pow(a, b, m) (Python) or your own loop.
Computing inv_fact[i] directly via power(fact[i], p-2, p) for every i. Correct but O(N log p); the backward sweep is O(N).
Off-by-one: inv_fact has N+1 entries (inv_fact[0..N]); allocate accordingly.
Returning 1 for C(n, -1) or C(n, n+1) instead of 0. Always guard.

Debugging Strategy

If nCr(n, r) disagrees with brute on small cases:

Print fact[0..n] and confirm fact[i] = i! for i ≤ 10.
Print inv_fact[0..n] and confirm fact[i] · inv_fact[i] ≡ 1 (mod p) for each i.
If step 2 fails: the FLT exponent is wrong. Check power(fact[N], p-2, p), not p-1.
If step 2 passes but nCr is wrong: the formula is fact[n] · inv_fact[r] · inv_fact[n-r]. Check you’re not using fact[n-r] (without inv_).

Mastery Criteria

Write the engine (factorial table + inverse-factorial backward sweep + query) from blank in <5 minutes.
Articulate why power(fact[N], p-2, p) gives inv_fact[N] in one sentence (FLT).
Articulate why inv_fact[i] = inv_fact[i+1] · (i+1) in one sentence (telescope).
Recognize “answer mod prime” + “counting / paths / arrangements” as the trigger for this engine within 60 seconds of reading a problem.
Switch to extgcd-based inverse if asked “what if p is not prime?”
State Lucas’s theorem on demand and explain when it’s needed.

← Phase 7 README · Lab 02 — Binary Exponentiation →

LeetCode Interview — Extreme Coding