Lab 02 — LFU Cache

Goal

Implement a LFUCache with O(1) get and put. The challenge over LRU is the tie-breaking rule: when multiple keys share the minimum frequency, evict the least-recently-used among them. Expect this to take 30 minutes from blank screen on first attempt; aim to bring it down to 20 with practice.

Background Concepts

LFU evicts the least-frequently-used entry. Frequency means “number of get plus put calls referencing that key since insertion”. A naive frequency-counter approach forces an O(N) eviction scan. The O(1) trick is to bucket nodes by frequency: a hashmap from freq → DoublyLinkedList plus a min_freq cursor. On eviction, pop the LRU entry from freq_map[min_freq]. On a hit, move the node from freq_map[f] to freq_map[f+1]; when freq_map[min_freq] empties, advance min_freq.

This is a textbook example of bucket sort applied to a dynamic counter. Each frequency bucket is itself an LRU list, which handles tie-breaking in O(1). Together you get O(1) for both ops at the cost of more code than LRU.

Interview Context

LFU follows LRU as the second-asked cache problem at senior practical interviews. It is LeetCode 460 verbatim. Where LRU is a 20-minute problem, LFU is a 30-to-45-minute problem and tests whether you can design with two coordinated abstractions (the freq map and the per-bucket LRU lists). Most candidates fail by picking a frequency-only design and then hitting the tie-breaking case.

Problem Statement

Design LFUCache(capacity):

  • get(key) -> value or None — increment frequency on hit.
  • put(key, value) — insert/update; on capacity overflow, evict the LFU entry. Tie-break by LRU within the LFU bucket. Insertion sets frequency to 1.

Both O(1) average.

Constraints

  • 1 ≤ capacity ≤ 10^5
  • 10^7 ops in benchmarks
  • Thread-safety: optional follow-up

Clarifying Questions

  1. Does put of an existing key increment frequency? (Yes by convention; confirm.)
  2. Tie-breaking: LRU or arbitrary? (LRU is the standard expectation; confirm.)
  3. What does get return on miss? (None or sentinel; confirm.)
  4. Should frequencies decay over time (windowed LFU)? (Often a follow-up; default is “no decay”.)

Examples

c = LFUCache(2)
c.put(1, "a")            # freq: {1->1}
c.put(2, "b")            # freq: {1->1, 2->1}
c.get(1)        -> "a"   # freq: {1->2, 2->1}
c.put(3, "c")            # evict 2 (freq=1, LRU); state {1, 3}
c.get(2)        -> None
c.get(3)        -> "c"   # freq: {1->2, 3->2}
c.put(4, "d")            # tie at freq=2; evict 1 (LRU); state {3, 4}
c.get(1)        -> None

Initial Brute Force

dict mapping key → (value, freq, last_access_time). On eviction, scan all entries to find the (min freq, min time) pair. O(N) eviction.

Brute Force Complexity

get is O(1), but put with eviction is O(N). At N=10^5 over 10^7 calls, slow but tolerable for small inputs; will TLE at large N.

Optimization Path

Replace the linear eviction scan with frequency bucketing. Maintain:

  • key_to_node: dict[K, Node] for O(1) lookup
  • freq_to_list: dict[int, DoublyLinkedList] mapping frequency to an LRU list of nodes
  • min_freq: int tracking the current minimum frequency present

On get(k): look up node, remove from freq_to_list[node.freq], increment node.freq, append to freq_to_list[node.freq]. If the old bucket is now empty and it equaled min_freq, increment min_freq.

On put(k, v): if exists, update value and behave like get. Otherwise, evict from freq_to_list[min_freq] if at capacity (pop the front = LRU), then insert with freq=1 and reset min_freq=1.

The reset of min_freq=1 on insertion is the only step that’s not obvious; without it you’d evict the wrong bucket on the very next put.

Final Expected Approach

Three layers: Node, per-frequency DoublyLinkedList (with sentinels), LFUCache orchestrating the two maps. The per-bucket linked list handles LRU tie-breaking automatically — append on insert, pop from front on evict.

Data Structures Used

StructurePurpose
dict[K, Node]O(1) key lookup
dict[int, DLList]O(1) frequency → bucket
Doubly-linked list per bucketLRU within frequency, O(1) splice
min_freq: intO(1) eviction target

Correctness Argument

Invariants maintained after every operation:

  1. key_to_node and the union of all freq_to_list[f] contain the same set of keys.
  2. Every node n lives in freq_to_list[n.freq] and only there.
  3. min_freq is the smallest f such that freq_to_list[f] is non-empty (or any value when the cache is empty).
  4. Within each bucket, the front is the LRU and the back is the MRU.

Each method preserves these by case analysis. The subtle case is on get: incrementing node.freq may empty the old bucket. We check if was_min_freq_bucket and now_empty: min_freq += 1. This is correct because every other key has frequency ≥ old min_freq + 1, since old min_freq was the global minimum and this was the only node at it (we know that because the bucket is now empty, but the bucket contained this node before the increment — so it was the only node).

Complexity

  • get: O(1)
  • put: O(1)
  • Space: O(capacity)

Implementation Requirements

from typing import Hashable, Any, Optional

class _Node:
    __slots__ = ("key", "val", "freq", "prev", "next")
    def __init__(self, key=None, val=None, freq=1):
        self.key, self.val, self.freq = key, val, freq
        self.prev = self.next = None

class _DLList:
    """Doubly linked list with sentinels. Front = LRU, back = MRU."""
    def __init__(self):
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head
        self.size = 0

    def append(self, node: _Node) -> None:           # add to back (MRU)
        prev = self.tail.prev
        prev.next = node; node.prev = prev
        node.next = self.tail; self.tail.prev = node
        self.size += 1

    def remove(self, node: _Node) -> None:
        node.prev.next = node.next; node.next.prev = node.prev
        self.size -= 1

    def pop_front(self) -> _Node:                    # evict LRU
        node = self.head.next
        self.remove(node)
        return node

    def is_empty(self) -> bool:
        return self.size == 0


class LFUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._cap = capacity
        self._key_to_node: dict[Hashable, _Node] = {}
        self._freq_to_list: dict[int, _DLList] = {}
        self._min_freq = 0

    def get(self, key: Hashable) -> Optional[Any]:
        node = self._key_to_node.get(key)
        if node is None:
            return None
        self._bump(node)
        return node.val

    def put(self, key: Hashable, value: Any) -> None:
        if self._cap == 0:
            return
        node = self._key_to_node.get(key)
        if node is not None:
            node.val = value
            self._bump(node)
            return
        if len(self._key_to_node) >= self._cap:
            evicted = self._freq_to_list[self._min_freq].pop_front()
            del self._key_to_node[evicted.key]
        node = _Node(key, value, freq=1)
        self._key_to_node[key] = node
        self._freq_to_list.setdefault(1, _DLList()).append(node)
        self._min_freq = 1

    def _bump(self, node: _Node) -> None:
        old_list = self._freq_to_list[node.freq]
        old_list.remove(node)
        if old_list.is_empty() and node.freq == self._min_freq:
            self._min_freq += 1
        node.freq += 1
        self._freq_to_list.setdefault(node.freq, _DLList()).append(node)

    def __len__(self) -> int:
        return len(self._key_to_node)

Tests

import unittest

class TestLFU(unittest.TestCase):
    def test_basic_eviction(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")
        self.assertEqual(c.get(1), "a")          # 1.freq=2, 2.freq=1
        c.put(3, "c")                            # evict 2
        self.assertIsNone(c.get(2))
        self.assertEqual(c.get(3), "c")          # 3.freq=2

    def test_tie_break_lru(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")             # both freq=1; 1 is LRU
        c.put(3, "c")                            # evict 1 (LRU at freq=1)
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")

    def test_update_increments_freq(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b"); c.put(1, "z")  # 1.freq=2
        c.put(3, "c")                            # evict 2 (freq=1)
        self.assertEqual(c.get(1), "z")
        self.assertIsNone(c.get(2))

    def test_capacity_zero(self):
        c = LFUCache(0)
        c.put(1, "a")
        self.assertIsNone(c.get(1))

    def test_min_freq_advance(self):
        c = LFUCache(2)
        c.put(1, "a"); c.put(2, "b")
        c.get(1); c.get(1); c.get(2); c.get(2)   # both freq=3
        c.put(3, "c")                            # evict 1 (LRU at freq=3)
        self.assertIsNone(c.get(1))
        self.assertEqual(c.get(2), "b")

Follow-up Questions

(1) Thread-safe? A single RLock around get, put, and _bump is correct and the standard production answer. The locked region is O(1), so contention is bounded. Sharding by hash(key) % N into N independent LFU caches is the higher-throughput choice — but each shard has its own LFU bucketing, which is fine because LFU is per-key.

(4) Observe and monitor? Hit rate (hits / (hits + misses)) as a gauge; eviction count as a counter; frequency distribution as a histogram (10th, 50th, 90th, 99th percentile of frequency at eviction time) — this tells you whether the cache is actually being used “frequency-aware” or whether everything has freq=1. Cache size as a gauge.

(9) Eviction policy / cleanup? This is the eviction policy. The catch: long-tail entries can pile up at high frequency and never get evicted even after they go cold (a value popular yesterday is still freq=1000 today). Solutions: windowed LFU (decay frequencies on a timer), LFU-Aging (halve all frequencies periodically), or TinyLFU (admission filter that uses a count-min sketch). State the limitation and pick a mitigation.

(10) Consistency model? Linearizable in a single process under the lock — every get/put appears to take effect instantly at some moment between invocation and return. Replicated LFU is harder; most distributed caches degrade to eventual consistency on the cache and rely on an authoritative store underneath.

(11) Configuration knobs? capacity, optionally decay_interval (for windowed LFU), optionally eviction_callback. Don’t expose internal data-structure tuning.

Product Extension

LFU is the right policy when access patterns are stationary (popular items stay popular). It outperforms LRU on workloads with strong popularity skew (web caches, recommendation systems, query result caches). It performs worse than LRU on scan-heavy workloads (a large one-time scan pollutes LRU with a single bump but pollutes LFU with cold-but-frequent entries until aging kicks in). TinyLFU (used in Caffeine) combines a count-min admission filter with a small LRU window, getting LFU’s hit rate without the staleness problem.

Language/Runtime Follow-ups

  • Python: same approach as shown. collections.Counter for frequencies is tempting but doesn’t give the bucketing we need.
  • Java: build on LinkedHashMap for the per-frequency buckets, HashMap<Integer, LinkedHashSet<K>> for the freq map. Caffeine provides production-grade TinyLFU.
  • Go: container/list per bucket, map[int]*list.List for buckets. The groupcache library uses a different policy (LRU); for LFU, write it yourself or use dgraph-io/ristretto (TinyLFU).
  • C++: std::list<Node> per bucket; std::unordered_map<int, list> for freq map.
  • JS/TS: Map<int, Set<K>> for buckets — Set preserves insertion order, so LRU-within-bucket is free.

Common Bugs

  1. Forgetting to advance min_freq when the LRU bucket becomes empty after a _bump. Subsequent eviction picks an empty bucket and crashes.
  2. Resetting min_freq=1 on insert before the eviction step instead of after, evicting the wrong bucket.
  3. Tie-breaking by MRU instead of LRU — appending to the front of the bucket instead of the back. The bucket is itself an LRU list; respect the convention.
  4. Sharing a single _DLList across freq buckets accidentally (e.g., a class-level default). Use setdefault(freq, _DLList()).
  5. On put of an existing key, decrementing-then-incrementing frequency instead of bumping — produces correct frequency but breaks the bucketing if you forget to move buckets.

Debugging Strategy

Print the bucketing as {freq: [keys]} after every operation. The bug usually shows as a stale entry in min_freq’s bucket or a missing entry in the new bucket. Add an assert _check_invariants() method that walks the buckets and verifies (a) bucket→key relations, (b) min_freq correctness, (c) hashmap-bucket set equality. Run the test suite with assertions on.

Mastery Criteria

  • Implemented LFUCache from blank screen in <30 minutes (target: <25 after second attempt).
  • All five tests pass first run.
  • Stated tie-breaking-via-per-bucket-LRU explicitly without prompting.
  • Articulated the min_freq advancement rule precisely.
  • Answered follow-ups #1, #4, #9 (LFU-Aging / TinyLFU), #10, #11 in <90 seconds each.
  • Compared LFU vs LRU on scan-heavy and popularity-skewed workloads in <60 seconds.