Lab 02 — Test-Driven Problem Solving (LRU Cache)
Goal
Write tests before writing the implementation. Use LRU cache (LC 146) as the anchor — a problem where ambiguities in the spec (does put of an existing key count as a “use”? what does capacity 0 mean?) are best surfaced by writing test cases first. By the end you should treat tests as a design tool, not a verification afterthought.
Background Concepts
Test-driven design (TDD) in an interview context is not the dogmatic red-green-refactor cycle. It is the discipline of writing 3–5 example calls and their expected results before writing the implementation, because:
- Writing the expected output forces you to confront spec ambiguities (and ask the interviewer).
- The tests double as documentation of your understanding — if the interviewer disagrees, you discover it before you’ve written 50 lines.
- The tests become your verification suite — you don’t have to invent them after the fact under time pressure.
- The act of choosing tests reveals edge cases you would otherwise miss.
The cost is 2–3 minutes up front. The savings are usually 10+ minutes of debugging later.
Interview Context
LRU cache is the most-asked OOD-flavored coding question at FAANG. Google, Meta, Amazon, Bloomberg all ask it in some form. The standard expectation is O(1) get and put using a hashmap + doubly linked list.
The senior signal is to enumerate behavioral test cases before touching code: “get on missing key returns -1 (or what?). put of existing key updates value AND marks as recently used? put over capacity evicts the LRU; what if multiple keys are tied? Does get count as a use?” These are the real questions. Candidates who code first and discover these mid-interview look junior.
Problem Statement
Design a data structure that supports:
LRUCache(int capacity)— initialize with positive capacityint get(int key)— return value if present, else -1; using a key counts as “recently used”void put(int key, int value)— insert or update; on overflow evict the least recently used; updating an existing key also counts as recently used
All operations must be O(1) average.
Constraints
- 1 ≤ capacity ≤ 3000
- 0 ≤ key, value ≤ 10^4
- Up to 2×10^5 calls
Clarifying Questions (Surface These Before Writing Code)
- Does
getmark the key as recently used? (Yes — standard.) - Does
puton an existing key mark as recently used? (Yes — standard. Confirm.) - Capacity of 0 — is that legal? (Constraints say ≥ 1, but worth confirming the contract.)
- Eviction policy when multiple keys are tied for least recently used — can this happen? (In a strict LRU with sequential ops, no — every access updates order. Tie only on initial fill, but at that point the oldest insertion is LRU.)
- Thread safety required? (Almost never in the interview; always ask anyway. If yes, see Phase 8 LRU lab.)
Examples (Written as Tests First)
# Test 1: basic put/get
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
assert cache.get(1) == 1 # returns 1
cache.put(3, 3) # evicts key 2 (LRU)
assert cache.get(2) == -1 # not found
assert cache.get(3) == 3
cache.put(4, 4) # evicts key 1
assert cache.get(1) == -1
assert cache.get(3) == 3
assert cache.get(4) == 4
# Test 2: put on existing key updates value AND recency
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
cache.put(1, 10) # update key 1; now order is 2 (LRU), 1 (MRU)
cache.put(3, 3) # evicts 2, not 1
assert cache.get(2) == -1
assert cache.get(1) == 10 # updated value preserved
# Test 3: get on missing key
cache = LRUCache(1)
assert cache.get(99) == -1 # never inserted
These three tests already locked down 6 design decisions. Now you can write the implementation.
Initial Brute Force
Use a single Python dict and an auxiliary list to track insertion order. get: O(1) dict lookup, but moving to end is O(N) list operation. put: O(1) insert, but O(N) eviction scan. Total: O(N) per op.
Alternatively, use collections.OrderedDict which is already a hash + doubly linked list internally. move_to_end and popitem(last=False) are both O(1). Single-class solution in ~15 lines.
Brute Force Complexity
Naive dict + list: O(N) per op, fails on 2×10^5 calls at large capacity → 6×10^8 ops, TLE.
Optimization Path
The standard answer is hashmap + doubly linked list:
- Hashmap:
key → node - Doubly linked list: nodes in MRU-to-LRU order
get: hashmap lookup → unlink node → relink at head (MRU)put: if key exists, update value + move to head; else create node, insert at head; if size > capacity, remove tail (LRU) and delete from hashmap
All operations are O(1) because both the hashmap and the doubly linked list support O(1) insert/delete with a node reference.
Using OrderedDict in Python is equivalent and acceptable in interviews if you explain why it works (because it’s a hashmap + DLL internally). In Java, use LinkedHashMap with accessOrder=true.
Final Expected Approach
Implement with explicit doubly linked list to demonstrate understanding:
class Node:
__slots__ = ('key', 'val', 'prev', 'next')
def __init__(self, key=0, val=0):
self.key, self.val = key, val
self.prev = self.next = None
class LRUCache:
def __init__(self, capacity: int):
self.cap = capacity
self.cache = {}
# sentinel head/tail to avoid edge cases
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
node.prev.next = node.next
node.next.prev = node.prev
def _add_to_front(self, node):
node.next = self.head.next
node.prev = self.head
self.head.next.prev = node
self.head.next = node
def get(self, key: int) -> int:
if key not in self.cache:
return -1
node = self.cache[key]
self._remove(node)
self._add_to_front(node)
return node.val
def put(self, key: int, value: int) -> None:
if key in self.cache:
node = self.cache[key]
node.val = value
self._remove(node)
self._add_to_front(node)
else:
if len(self.cache) >= self.cap:
lru = self.tail.prev
self._remove(lru)
del self.cache[lru.key]
node = Node(key, value)
self.cache[key] = node
self._add_to_front(node)
Data Structures Used
- Dict — O(1) key → node lookup
- Doubly linked list with sentinels — O(1) insert at head, O(1) remove from tail or middle
Sentinels eliminate if node is None checks at the boundary, the #1 source of LRU bugs.
Correctness Argument
Invariant 1: cache[key] always points to the node currently in the linked list with that key. Maintained because every insertion adds to both, every removal removes from both.
Invariant 2: The linked list is ordered MRU → LRU from head to tail. Maintained because every access (get or put on existing) moves the node to the front, and every new insert goes to the front.
Invariant 3: len(cache) ≤ capacity. Maintained because put evicts the tail (LRU) before adding when at capacity.
Complexity
get: O(1). put: O(1). Space: O(capacity).
Implementation Requirements
- Use sentinels — no null checks at head/tail
- Use
__slots__on the Node class in Python for memory efficiency - Keep
_removeand_add_to_frontas separate helpers — do not inline; the duplication is a bug magnet - Update the hashmap whenever you touch the linked list, never one without the other
Tests
Smoke (3 normal)
The three tests in the Examples section above.
Edge
# Capacity 1 — every put evicts
cache = LRUCache(1)
cache.put(1, 1)
cache.put(2, 2)
assert cache.get(1) == -1
assert cache.get(2) == 2
# Get on a key that was evicted
cache = LRUCache(2)
cache.put(1, 1); cache.put(2, 2); cache.put(3, 3)
assert cache.get(1) == -1
# Repeated put on same key never evicts
cache = LRUCache(2)
for v in range(100):
cache.put(1, v)
assert cache.get(1) == 99
Large
cache = LRUCache(3000)
for i in range(200000):
cache.put(i % 5000, i)
# Verifies O(1) per op; should complete in <1s
Randomized verifier (brute O(N) LRU using list)
class BruteLRU:
def __init__(self, cap):
self.cap = cap
self.order = []
self.vals = {}
def get(self, k):
if k not in self.vals: return -1
self.order.remove(k); self.order.append(k)
return self.vals[k]
def put(self, k, v):
if k in self.vals:
self.order.remove(k)
elif len(self.vals) >= self.cap:
evict = self.order.pop(0)
del self.vals[evict]
self.order.append(k); self.vals[k] = v
import random
random.seed(42)
for trial in range(100):
cap = random.randint(1, 10)
fast = LRUCache(cap); slow = BruteLRU(cap)
for _ in range(200):
if random.random() < 0.5:
k = random.randint(0, 15)
assert fast.get(k) == slow.get(k)
else:
k = random.randint(0, 15); v = random.randint(0, 100)
fast.put(k, v); slow.put(k, v)
Invalid input
# Capacity 0 — undefined by spec; behavior must be documented
try:
LRUCache(0)
except ValueError:
pass
# Or: assert never stores anything if zero is allowed
Follow-up Questions
- Make it thread-safe. Per-op mutex is the simple answer; striped locks for higher throughput. (See Phase 8 lab.)
- LFU instead of LRU. Track frequencies + a min-frequency pointer; harder, see Phase 8 lab-02.
- TTL eviction. Add expiration timestamp per entry; lazy or eager eviction tradeoff.
- Distributed LRU. Consistent hashing across nodes; cache coherence is now a hard problem.
- Approximate LRU. Sample K random entries and evict the oldest among them (Redis approach); O(1) eviction without strict ordering.
Product Extension
- CPU caches. Hardware caches are pseudo-LRU because true LRU’s pointer overhead is too expensive.
- CDN edge caches. Strict LRU loses popular content under cache pollution; LFU + admission filter (TinyLFU) is state of the art.
- Database buffer pools. PostgreSQL uses a clock-sweep approximation; MySQL InnoDB uses LRU with a midpoint insertion point to resist scan pollution.
- OS page replacement. Linux uses two clock lists (active + inactive) to approximate LRU at low cost.
Language/Runtime Follow-ups
- Python:
OrderedDictis implemented in C and uses a doubly linked list internally;move_to_endandpopitem(last=False)are O(1). The__slots__on Node avoids per-instance__dict__, saving ~50% memory per node. - Java:
LinkedHashMapwith(capacity, 0.75f, true)constructor enables access order; overrideremoveEldestEntryfor eviction.ConcurrentHashMapdoes not provide LRU; you’d needCaffeineor a custom striped lock. - C++:
std::list+std::unordered_map<Key, std::list::iterator>; iterators tostd::listremain valid after other inserts/erases, which is essential for this design. - Go: No built-in LRU; use
container/list+ a map. The standard libraryhashicorp/golang-lruis the go-to. - Rust: Borrowing rules make a vanilla doubly linked list hard; use
lrucrate which uses raw pointers internally.
Common Bugs
- Forgot to update recency on
get. Tests pass forputbut fail whengetshould “save” a key from eviction. - Forgot to update recency on
putof existing key. The key that was just updated gets evicted on the nextput. - Hashmap and linked list out of sync. Removed from list but not dict, or vice versa. Always update both.
- No sentinels → null pointer at head/tail. Sentinels eliminate 4 different null checks.
- Evicting before checking if key already exists. Update-of-existing should not trigger eviction.
- Off-by-one on capacity comparison.
len(cache) >= capvslen(cache) > cap— first is correct because you’re about to add one more.
Debugging Strategy
If a test fails:
- Print the linked list (head → tail with keys) after each operation. Verify it matches your expected MRU order.
- Print
len(cache)after each op. It should match the number of inserts minus evictions. - Cross-check: after every op,
set(cache.keys())should equal the set of keys in the linked list. If not, you have a sync bug. - Run the randomized verifier with a small seed; when it fails, print the trace of operations that led to the divergence — it will be 10–20 ops long and obvious.
Mastery Criteria
- Wrote the tests before writing the implementation
- Surfaced at least 3 spec ambiguities through the tests
- Implementation worked on the first run (because tests forced the design to be correct)
- Used sentinels in the linked list
- Wrote the randomized verifier and ran it for 100+ trials with no divergence
- Can explain why updating an existing key must mark it as MRU
- Re-applied TDD to one Phase 8 lab and recorded how many design questions it surfaced