Hints — p94 Repeated DNA Sequences
-
Trivial: set of every length-10 substring → repeats. Works but O(n·k) memory.
-
Bit-encoded rolling hash: alphabet is
{A,C,G,T}(4 letters); encode each base in 2 bits. A 10-mer fits in 20 bits. Slide:h = ((h << 2) | code[c]) & ((1 << 20) - 1). -
After feeding the first 10 chars,
huniquely identifies the current window. Maintainseenset of hashes; on collision, add the substring to result. -
General Rabin-Karp: polynomial rolling hash
h = (h * BASE - s[i-k] * BASE^k + s[i]) mod MOD. Use a large MOD (e.g.,(1<<61) - 1) and verify substring on hash match. -
Double-hashing (two independent (BASE, MOD) pairs) reduces collision probability to ~1/MOD² without needing to verify substrings.
If stuck: see solution.py.