Lab 13 — Hierarchical Timer Wheel
Goal
Implement a hierarchical timer wheel that supports O(1) amortized schedule and cancel operations for up to millions of timers. After this lab you should understand why a min-heap is wrong for high-throughput timer workloads, and be able to implement a single-level and a hierarchical timer wheel from blank screen in under 35 minutes.
Background Concepts
Many systems need to schedule callbacks for a future time: TCP retransmit timers, session timeouts, rate-limit reset, cron-style task scheduling. The classical data structure is a min-heap of (fire_time, callback): O(log N) schedule, O(log N) to fire (pop), O(N) to cancel (without indexing).
For the cases where N is small (thousands) and timers are infrequent, a heap is fine. But TCP at high QPS has millions of pending timers, most of which are cancelled before firing (the data arrives or the connection closes). For that workload, the heap is too slow.
The timer wheel (Varghese & Lauck, 1987) achieves O(1) schedule and O(1) cancel by bucketing timers by their fire time. Imagine a circular array of N slots; each slot holds a list of timers that fire when the wheel cursor reaches that slot. Each tick, advance the cursor and fire all timers in the current slot. Schedule is O(1): slot_index = (now + delay) % N. Cancel is O(1): remove from the slot’s list.
The single-level wheel works for delays up to N · tick_resolution. Beyond that, hierarchical wheels: minute wheel + hour wheel + day wheel, like an analog clock. When the minute hand sweeps past 60, advance the hour hand by one and re-bucket the timers in that hour slot into the minute wheel.
Interview Context
Timer wheel is a senior+ system-design-coding question, asked at networking/infrastructure companies (Cloudflare, Cilium, AWS networking, Datadog APM agents). The bar is high: implement at least the single-level wheel correctly; the hierarchical structure is for top candidates.
Problem Statement
Implement TimerWheel:
schedule(delay_seconds, callback) -> timer_idcancel(timer_id) -> booltick(now)— advance the cursor; fire all timers whose deadline passed.
Then extend to HierarchicalTimerWheel: 4 levels (e.g., 256 slots × 4 levels = 4 GB span at 1 ms tick).
Constraints
- Up to 10^7 active timers
- Tick resolution: 1 ms to 100 ms
- Schedule rate: 10^6 / sec
- Cancel rate: 10^6 / sec (most timers are cancelled before firing)
Clarifying Questions
- Tick resolution — fixed at construction, or adaptive? (Fixed.)
- Time source — caller supplies
now(testable) or wall clock? (Caller supplies; this also lets us simulate.) - Are callbacks fired on the tick thread, or queued? (On tick thread — simpler. For long callbacks, tick is slow; document.)
- What’s the max delay? (Single-level:
slots * tick; hierarchical: enormous.)
Examples
wheel = TimerWheel(slots=60, tick_ms=1000) # 1-second tick, 60-second range
fired = []
wheel.schedule(5, lambda: fired.append("5s"))
wheel.schedule(10, lambda: fired.append("10s"))
# Simulate ticks
for i in range(15):
wheel.tick(start_time + i)
# After 15 seconds, fired == ["5s", "10s"]
Initial Brute Force
A min-heap of (deadline, id, callback). Schedule = O(log N). Tick = O(K log N) for K firings. Cancel = O(N) (or O(log N) with a dict[id, heap_idx] and lazy deletion).
Brute Force Complexity
At 10^6 schedules/sec with N=10^7, schedule cost is log(10^7) ≈ 23 per — feasible but at the edge. The fatal weakness is cancel: O(N) without indexing. With lazy-deletion indexing, the heap grows unboundedly with cancelled-but-not-popped entries — memory leak.
Optimization Path
Single-level wheel: slots = circular array of bucket lists. schedule(delay): compute slot = (cursor + delay // tick) % slots, append to that slot’s list. tick: advance cursor, fire and clear slots[cursor]. Cancel: remove the timer from its slot’s list (O(1) if you store back-pointers).
Hierarchical: when delay > slots, place in higher-level wheel. When the lower wheel completes a full revolution, the next slot of the higher wheel is “cascaded” — its timers are re-bucketed into the lower wheel.
Final Expected Approach
Single-level: doubly-linked lists at each slot for O(1) cancel (timer holds prev/next pointers). Cursor advances on tick; fire all timers in the new slot; clear the slot.
Hierarchical: 4 wheels with slot counts [256, 64, 64, 64] (Linux kernel choice). Lower wheel ticks every 1 ms; carries every 256 ticks → upper level advances by one slot, and we cascade that slot (re-bucket its timers into the lower wheel based on remaining delay).
Data Structures Used
| Structure | Purpose |
|---|---|
| Array of doubly-linked lists | wheel slots |
dict[timer_id, Timer] | O(1) cancel lookup |
cursor: int | current slot |
| Multiple wheels | hierarchical |
Correctness Argument
Single-level firing: a timer scheduled at now + delay is placed in slot (cursor + delay // tick) % slots. tick is called once per tick time unit; after delay // tick calls, the cursor reaches the timer’s slot and fires it. Provided delay < slots * tick, this is exact.
Cancel: O(1) splice in the doubly-linked list, O(1) lookup via dict.
Hierarchical correctness: when the lower wheel completes a revolution (cursor wraps), the next upper-wheel slot is cascaded: each timer in it has its remaining delay computed against the new “now” and is placed in the appropriate lower-wheel slot. Because the cascade happens just before the next revolution, the timer fires at the same wall-clock time it would have in a single-large-wheel implementation.
Complexity
schedule: O(1)cancel: O(1)tick: O(K) for K firings, plus amortized O(1) cascade- Space: O(slots + active timers)
Implementation Requirements
import itertools
from typing import Callable, Optional
class _Timer:
__slots__ = ("tid", "deadline_tick", "callback", "prev", "next", "slot")
def __init__(self, tid, deadline_tick, callback):
self.tid = tid
self.deadline_tick = deadline_tick
self.callback = callback
self.prev = self.next = None
self.slot: Optional[list] = None # back-ref to bucket for O(1) cancel
class _Bucket:
__slots__ = ("head", "tail")
def __init__(self):
self.head = _Timer(None, None, None) # sentinel
self.tail = _Timer(None, None, None)
self.head.next = self.tail; self.tail.prev = self.head
def append(self, t: _Timer) -> None:
t.prev = self.tail.prev; t.next = self.tail
self.tail.prev.next = t; self.tail.prev = t
t.slot = self
def remove(self, t: _Timer) -> None:
t.prev.next = t.next; t.next.prev = t.prev
t.slot = None; t.prev = t.next = None
def drain(self) -> list[_Timer]:
out = []
n = self.head.next
while n is not self.tail:
nxt = n.next
n.prev = n.next = None; n.slot = None
out.append(n); n = nxt
self.head.next = self.tail; self.tail.prev = self.head
return out
class TimerWheel:
def __init__(self, slots: int = 256, tick_seconds: float = 0.001):
self._slots = [_Bucket() for _ in range(slots)]
self._n_slots = slots
self._tick = tick_seconds
self._cursor = 0
self._current_tick = 0
self._timers: dict[int, _Timer] = {}
self._next_id = itertools.count(1)
def schedule(self, delay_seconds: float, callback: Callable) -> int:
ticks = max(1, int(delay_seconds / self._tick))
if ticks >= self._n_slots:
raise ValueError("delay exceeds wheel range; use HierarchicalTimerWheel")
deadline = self._current_tick + ticks
slot = deadline % self._n_slots
t = _Timer(next(self._next_id), deadline, callback)
self._slots[slot].append(t)
self._timers[t.tid] = t
return t.tid
def cancel(self, timer_id: int) -> bool:
t = self._timers.pop(timer_id, None)
if t is None or t.slot is None:
return False
t.slot.remove(t)
return True
def tick(self) -> None:
self._current_tick += 1
self._cursor = (self._cursor + 1) % self._n_slots
bucket = self._slots[self._cursor]
for t in bucket.drain():
self._timers.pop(t.tid, None)
try: t.callback()
except Exception: pass
class HierarchicalTimerWheel:
"""4 levels: each level has 256 slots; tick = lower-level period."""
def __init__(self, levels: int = 4, slots_per_level: int = 256,
tick_seconds: float = 0.001):
self._levels = [
[_Bucket() for _ in range(slots_per_level)] for _ in range(levels)
]
self._cursors = [0] * levels
self._n = slots_per_level
self._tick = tick_seconds
self._current_tick = 0
self._timers: dict[int, tuple[int, int, _Timer]] = {} # id -> (level, slot, t)
self._next_id = itertools.count(1)
def schedule(self, delay_seconds: float, callback: Callable) -> int:
ticks = max(1, int(delay_seconds / self._tick))
deadline = self._current_tick + ticks
return self._place(deadline, callback)
def _place(self, deadline_tick: int, callback: Callable) -> int:
ticks_from_now = deadline_tick - self._current_tick
# Find the lowest level that can hold this delay.
capacity = self._n
level = 0
while ticks_from_now >= capacity and level < len(self._levels) - 1:
level += 1; capacity *= self._n
if ticks_from_now >= capacity:
raise ValueError("delay exceeds hierarchical range")
# Slot at this level:
per_slot = capacity // self._n
slot = (self._cursors[level] + ticks_from_now // per_slot) % self._n
t = _Timer(next(self._next_id), deadline_tick, callback)
self._levels[level][slot].append(t)
self._timers[t.tid] = (level, slot, t)
return t.tid
def cancel(self, timer_id: int) -> bool:
entry = self._timers.pop(timer_id, None)
if entry is None: return False
_, _, t = entry
if t.slot is not None: t.slot.remove(t)
return True
def tick(self) -> None:
self._current_tick += 1
# Advance level 0
self._cursors[0] = (self._cursors[0] + 1) % self._n
# Fire level-0 current slot
bucket = self._levels[0][self._cursors[0]]
for t in bucket.drain():
self._timers.pop(t.tid, None)
try: t.callback()
except Exception: pass
# Cascade if we wrapped
for lvl in range(1, len(self._levels)):
if self._cursors[lvl - 1] != 0:
break
self._cursors[lvl] = (self._cursors[lvl] + 1) % self._n
for t in self._levels[lvl][self._cursors[lvl]].drain():
self._timers.pop(t.tid, None)
self._place(t.deadline_tick, t.callback)
Tests
import unittest
class TestTimerWheel(unittest.TestCase):
def test_basic(self):
w = TimerWheel(slots=10, tick_seconds=1.0)
fired = []
w.schedule(3, lambda: fired.append("a"))
w.schedule(5, lambda: fired.append("b"))
for _ in range(3): w.tick()
self.assertEqual(fired, ["a"])
for _ in range(2): w.tick()
self.assertEqual(fired, ["a", "b"])
def test_cancel(self):
w = TimerWheel(slots=10, tick_seconds=1.0)
fired = []
tid = w.schedule(2, lambda: fired.append("x"))
self.assertTrue(w.cancel(tid))
for _ in range(5): w.tick()
self.assertEqual(fired, [])
def test_hierarchical_long_delay(self):
w = HierarchicalTimerWheel(levels=3, slots_per_level=4, tick_seconds=1.0)
# Range = 4 * 4 * 4 = 64 ticks
fired = []
w.schedule(50, lambda: fired.append("late"))
for _ in range(50): w.tick()
self.assertEqual(fired, ["late"])
Follow-up Questions
(11) Configuration knobs? slots, tick_seconds, levels (for hierarchical). Tick resolution chooses your scheduling granularity vs CPU cost: 1 ms means 1000 ticks/sec which is fine; 100 µs is 10000 ticks/sec which adds CPU. Slots per level: 256 is a good default (1 byte addressable, common L1 cache friendly).
(7) Backpressure? Timer firing is on the tick thread. If callbacks are slow, ticks fall behind real time; subsequent timers fire late. Mitigation: dispatch callbacks to a thread pool from tick. Document the soft real-time semantics (“fires within 1 tick of deadline barring slow callbacks”).
(4) Observe / monitor? Active timer count (gauge), schedule rate (counter), cancel rate (counter), fire rate (counter), tick latency p99 (histogram — should be near zero; spikes mean slow callbacks).
(8) Partial failure? A callback that raises kills the tick if not caught. Always wrap with try/except and log.
(13) Poison pill? A callback that takes 1 second on a 1-ms wheel: all subsequent ticks pile up. Same mitigation as #7: dispatch to thread pool, or set a per-callback timeout.
Product Extension
Linux’s kernel uses a 4-level hierarchical timer wheel for setitimer and TCP retransmits. Netty’s HashedWheelTimer is the canonical Java implementation (single-level wheel; “hashed” means linked-list bucket per slot). The Linux choice of 256/64/64/64 slots covers ~5 days at 1 ms tick — enough for any kernel-level timer.
Language/Runtime Follow-ups
- Python: this implementation. For sub-millisecond, switch to a C extension or use
asyncio’s event loop scheduler (which uses a heap, but for low-N is fine). - Java: Netty’s
HashedWheelTimer(single-level) andJCToolsfor lock-free variants.ScheduledThreadPoolExecutoris heap-based and slower at scale. - Go:
time.AfterFuncuses a heap internally (fine for low-N). For high-N,github.com/RussellLuo/timingwheelis a clean library. - C++: the textbook reference; libuv and Boost both have wheel-based timer implementations.
- JS/TS: Node’s timer subsystem uses a hash-bucket-by-time-and-context structure — not exactly a wheel but similar idea.
Common Bugs
- Forgetting modulo on slot indexing — array out of bounds when delay wraps around.
- Cascade firing on every tick instead of only when wrapping. Catastrophic slowdown.
- Forgetting to drain the bucket before clearing — callbacks lost.
- Holding callbacks in slot lists and in the
_timersdict; failing to remove from one when removing from the other. - Scheduling
delay=0: should fire on next tick, not “now”. Clamp to ≥ 1 tick.
Debugging Strategy
Print the wheel state (occupied slots and their counts) after each tick. For “missed firing” bugs, walk the slot indexing and verify the placement formula. For hierarchical cascade bugs, set very small slot counts (4 × 4 × 4) and hand-trace the wrap.
Mastery Criteria
- Implemented single-level wheel in <20 minutes; hierarchical in <35.
- All three tests pass.
- Stated heap vs wheel tradeoff (heap O(log N), wheel O(1); wheel wins at high-N high-cancel).
- Articulated cascade mechanism for hierarchical wheels.
- Answered follow-ups #4, #7, #11, #13.
- Identified that real-world systems (Linux, Netty) use this exact structure.