Hints — p94 Repeated DNA Sequences

  1. Trivial: set of every length-10 substring → repeats. Works but O(n·k) memory.

  2. Bit-encoded rolling hash: alphabet is {A,C,G,T} (4 letters); encode each base in 2 bits. A 10-mer fits in 20 bits. Slide: h = ((h << 2) | code[c]) & ((1 << 20) - 1).

  3. After feeding the first 10 chars, h uniquely identifies the current window. Maintain seen set of hashes; on collision, add the substring to result.

  4. General Rabin-Karp: polynomial rolling hash h = (h * BASE - s[i-k] * BASE^k + s[i]) mod MOD. Use a large MOD (e.g., (1<<61) - 1) and verify substring on hash match.

  5. Double-hashing (two independent (BASE, MOD) pairs) reduces collision probability to ~1/MOD² without needing to verify substrings.

If stuck: see solution.py.