Lab 13 — Hierarchical Timer Wheel

Goal

Implement a hierarchical timer wheel that supports O(1) amortized schedule and cancel operations for up to millions of timers. After this lab you should understand why a min-heap is wrong for high-throughput timer workloads, and be able to implement a single-level and a hierarchical timer wheel from blank screen in under 35 minutes.

Background Concepts

Many systems need to schedule callbacks for a future time: TCP retransmit timers, session timeouts, rate-limit reset, cron-style task scheduling. The classical data structure is a min-heap of (fire_time, callback): O(log N) schedule, O(log N) to fire (pop), O(N) to cancel (without indexing).

For the cases where N is small (thousands) and timers are infrequent, a heap is fine. But TCP at high QPS has millions of pending timers, most of which are cancelled before firing (the data arrives or the connection closes). For that workload, the heap is too slow.

The timer wheel (Varghese & Lauck, 1987) achieves O(1) schedule and O(1) cancel by bucketing timers by their fire time. Imagine a circular array of N slots; each slot holds a list of timers that fire when the wheel cursor reaches that slot. Each tick, advance the cursor and fire all timers in the current slot. Schedule is O(1): slot_index = (now + delay) % N. Cancel is O(1): remove from the slot’s list.

The single-level wheel works for delays up to N · tick_resolution. Beyond that, hierarchical wheels: minute wheel + hour wheel + day wheel, like an analog clock. When the minute hand sweeps past 60, advance the hour hand by one and re-bucket the timers in that hour slot into the minute wheel.

Interview Context

Timer wheel is a senior+ system-design-coding question, asked at networking/infrastructure companies (Cloudflare, Cilium, AWS networking, Datadog APM agents). The bar is high: implement at least the single-level wheel correctly; the hierarchical structure is for top candidates.

Problem Statement

Implement TimerWheel:

schedule(delay_seconds, callback) -> timer_id
cancel(timer_id) -> bool
tick(now) — advance the cursor; fire all timers whose deadline passed.

Then extend to HierarchicalTimerWheel: 4 levels (e.g., 256 slots × 4 levels = 4 GB span at 1 ms tick).

Constraints

Up to 10^7 active timers
Tick resolution: 1 ms to 100 ms
Schedule rate: 10^6 / sec
Cancel rate: 10^6 / sec (most timers are cancelled before firing)

Clarifying Questions

Tick resolution — fixed at construction, or adaptive? (Fixed.)
Time source — caller supplies now (testable) or wall clock? (Caller supplies; this also lets us simulate.)
Are callbacks fired on the tick thread, or queued? (On tick thread — simpler. For long callbacks, tick is slow; document.)
What’s the max delay? (Single-level: slots * tick; hierarchical: enormous.)

Examples

wheel = TimerWheel(slots=60, tick_ms=1000)   # 1-second tick, 60-second range
fired = []
wheel.schedule(5, lambda: fired.append("5s"))
wheel.schedule(10, lambda: fired.append("10s"))

# Simulate ticks
for i in range(15):
    wheel.tick(start_time + i)
# After 15 seconds, fired == ["5s", "10s"]

Initial Brute Force

A min-heap of (deadline, id, callback). Schedule = O(log N). Tick = O(K log N) for K firings. Cancel = O(N) (or O(log N) with a dict[id, heap_idx] and lazy deletion).

Brute Force Complexity

At 10^6 schedules/sec with N=10^7, schedule cost is log(10^7) ≈ 23 per — feasible but at the edge. The fatal weakness is cancel: O(N) without indexing. With lazy-deletion indexing, the heap grows unboundedly with cancelled-but-not-popped entries — memory leak.

Optimization Path

Single-level wheel: slots = circular array of bucket lists. schedule(delay): compute slot = (cursor + delay // tick) % slots, append to that slot’s list. tick: advance cursor, fire and clear slots[cursor]. Cancel: remove the timer from its slot’s list (O(1) if you store back-pointers).

Hierarchical: when delay > slots, place in higher-level wheel. When the lower wheel completes a full revolution, the next slot of the higher wheel is “cascaded” — its timers are re-bucketed into the lower wheel.

Final Expected Approach

Single-level: doubly-linked lists at each slot for O(1) cancel (timer holds prev/next pointers). Cursor advances on tick; fire all timers in the new slot; clear the slot.

Hierarchical: 4 wheels with slot counts [256, 64, 64, 64] (Linux kernel choice). Lower wheel ticks every 1 ms; carries every 256 ticks → upper level advances by one slot, and we cascade that slot (re-bucket its timers into the lower wheel based on remaining delay).

Data Structures Used

Structure	Purpose
Array of doubly-linked lists	wheel slots
`dict[timer_id, Timer]`	O(1) cancel lookup
`cursor: int`	current slot
Multiple wheels	hierarchical

Correctness Argument

Single-level firing: a timer scheduled at now + delay is placed in slot (cursor + delay // tick) % slots. tick is called once per tick time unit; after delay // tick calls, the cursor reaches the timer’s slot and fires it. Provided delay < slots * tick, this is exact.

Cancel: O(1) splice in the doubly-linked list, O(1) lookup via dict.

Hierarchical correctness: when the lower wheel completes a revolution (cursor wraps), the next upper-wheel slot is cascaded: each timer in it has its remaining delay computed against the new “now” and is placed in the appropriate lower-wheel slot. Because the cascade happens just before the next revolution, the timer fires at the same wall-clock time it would have in a single-large-wheel implementation.

Complexity

schedule: O(1)
cancel: O(1)
tick: O(K) for K firings, plus amortized O(1) cascade
Space: O(slots + active timers)

Implementation Requirements

import itertools
from typing import Callable, Optional

class _Timer:
    __slots__ = ("tid", "deadline_tick", "callback", "prev", "next", "slot")
    def __init__(self, tid, deadline_tick, callback):
        self.tid = tid
        self.deadline_tick = deadline_tick
        self.callback = callback
        self.prev = self.next = None
        self.slot: Optional[list] = None  # back-ref to bucket for O(1) cancel


class _Bucket:
    __slots__ = ("head", "tail")
    def __init__(self):
        self.head = _Timer(None, None, None)   # sentinel
        self.tail = _Timer(None, None, None)
        self.head.next = self.tail; self.tail.prev = self.head

    def append(self, t: _Timer) -> None:
        t.prev = self.tail.prev; t.next = self.tail
        self.tail.prev.next = t; self.tail.prev = t
        t.slot = self

    def remove(self, t: _Timer) -> None:
        t.prev.next = t.next; t.next.prev = t.prev
        t.slot = None; t.prev = t.next = None

    def drain(self) -> list[_Timer]:
        out = []
        n = self.head.next
        while n is not self.tail:
            nxt = n.next
            n.prev = n.next = None; n.slot = None
            out.append(n); n = nxt
        self.head.next = self.tail; self.tail.prev = self.head
        return out


class TimerWheel:
    def __init__(self, slots: int = 256, tick_seconds: float = 0.001):
        self._slots = [_Bucket() for _ in range(slots)]
        self._n_slots = slots
        self._tick = tick_seconds
        self._cursor = 0
        self._current_tick = 0
        self._timers: dict[int, _Timer] = {}
        self._next_id = itertools.count(1)

    def schedule(self, delay_seconds: float, callback: Callable) -> int:
        ticks = max(1, int(delay_seconds / self._tick))
        if ticks >= self._n_slots:
            raise ValueError("delay exceeds wheel range; use HierarchicalTimerWheel")
        deadline = self._current_tick + ticks
        slot = deadline % self._n_slots
        t = _Timer(next(self._next_id), deadline, callback)
        self._slots[slot].append(t)
        self._timers[t.tid] = t
        return t.tid

    def cancel(self, timer_id: int) -> bool:
        t = self._timers.pop(timer_id, None)
        if t is None or t.slot is None:
            return False
        t.slot.remove(t)
        return True

    def tick(self) -> None:
        self._current_tick += 1
        self._cursor = (self._cursor + 1) % self._n_slots
        bucket = self._slots[self._cursor]
        for t in bucket.drain():
            self._timers.pop(t.tid, None)
            try: t.callback()
            except Exception: pass


class HierarchicalTimerWheel:
    """4 levels: each level has 256 slots; tick = lower-level period."""
    def __init__(self, levels: int = 4, slots_per_level: int = 256,
                 tick_seconds: float = 0.001):
        self._levels = [
            [_Bucket() for _ in range(slots_per_level)] for _ in range(levels)
        ]
        self._cursors = [0] * levels
        self._n = slots_per_level
        self._tick = tick_seconds
        self._current_tick = 0
        self._timers: dict[int, tuple[int, int, _Timer]] = {}  # id -> (level, slot, t)
        self._next_id = itertools.count(1)

    def schedule(self, delay_seconds: float, callback: Callable) -> int:
        ticks = max(1, int(delay_seconds / self._tick))
        deadline = self._current_tick + ticks
        return self._place(deadline, callback)

    def _place(self, deadline_tick: int, callback: Callable) -> int:
        ticks_from_now = deadline_tick - self._current_tick
        # Find the lowest level that can hold this delay.
        capacity = self._n
        level = 0
        while ticks_from_now >= capacity and level < len(self._levels) - 1:
            level += 1; capacity *= self._n
        if ticks_from_now >= capacity:
            raise ValueError("delay exceeds hierarchical range")
        # Slot at this level:
        per_slot = capacity // self._n
        slot = (self._cursors[level] + ticks_from_now // per_slot) % self._n
        t = _Timer(next(self._next_id), deadline_tick, callback)
        self._levels[level][slot].append(t)
        self._timers[t.tid] = (level, slot, t)
        return t.tid

    def cancel(self, timer_id: int) -> bool:
        entry = self._timers.pop(timer_id, None)
        if entry is None: return False
        _, _, t = entry
        if t.slot is not None: t.slot.remove(t)
        return True

    def tick(self) -> None:
        self._current_tick += 1
        # Advance level 0
        self._cursors[0] = (self._cursors[0] + 1) % self._n
        # Fire level-0 current slot
        bucket = self._levels[0][self._cursors[0]]
        for t in bucket.drain():
            self._timers.pop(t.tid, None)
            try: t.callback()
            except Exception: pass
        # Cascade if we wrapped
        for lvl in range(1, len(self._levels)):
            if self._cursors[lvl - 1] != 0:
                break
            self._cursors[lvl] = (self._cursors[lvl] + 1) % self._n
            for t in self._levels[lvl][self._cursors[lvl]].drain():
                self._timers.pop(t.tid, None)
                self._place(t.deadline_tick, t.callback)

Tests

import unittest

class TestTimerWheel(unittest.TestCase):
    def test_basic(self):
        w = TimerWheel(slots=10, tick_seconds=1.0)
        fired = []
        w.schedule(3, lambda: fired.append("a"))
        w.schedule(5, lambda: fired.append("b"))
        for _ in range(3): w.tick()
        self.assertEqual(fired, ["a"])
        for _ in range(2): w.tick()
        self.assertEqual(fired, ["a", "b"])

    def test_cancel(self):
        w = TimerWheel(slots=10, tick_seconds=1.0)
        fired = []
        tid = w.schedule(2, lambda: fired.append("x"))
        self.assertTrue(w.cancel(tid))
        for _ in range(5): w.tick()
        self.assertEqual(fired, [])

    def test_hierarchical_long_delay(self):
        w = HierarchicalTimerWheel(levels=3, slots_per_level=4, tick_seconds=1.0)
        # Range = 4 * 4 * 4 = 64 ticks
        fired = []
        w.schedule(50, lambda: fired.append("late"))
        for _ in range(50): w.tick()
        self.assertEqual(fired, ["late"])

Follow-up Questions

(11) Configuration knobs? slots, tick_seconds, levels (for hierarchical). Tick resolution chooses your scheduling granularity vs CPU cost: 1 ms means 1000 ticks/sec which is fine; 100 µs is 10000 ticks/sec which adds CPU. Slots per level: 256 is a good default (1 byte addressable, common L1 cache friendly).

(7) Backpressure? Timer firing is on the tick thread. If callbacks are slow, ticks fall behind real time; subsequent timers fire late. Mitigation: dispatch callbacks to a thread pool from tick. Document the soft real-time semantics (“fires within 1 tick of deadline barring slow callbacks”).

(4) Observe / monitor? Active timer count (gauge), schedule rate (counter), cancel rate (counter), fire rate (counter), tick latency p99 (histogram — should be near zero; spikes mean slow callbacks).

(8) Partial failure? A callback that raises kills the tick if not caught. Always wrap with try/except and log.

(13) Poison pill? A callback that takes 1 second on a 1-ms wheel: all subsequent ticks pile up. Same mitigation as #7: dispatch to thread pool, or set a per-callback timeout.

Product Extension

Linux’s kernel uses a 4-level hierarchical timer wheel for setitimer and TCP retransmits. Netty’s HashedWheelTimer is the canonical Java implementation (single-level wheel; “hashed” means linked-list bucket per slot). The Linux choice of 256/64/64/64 slots covers ~5 days at 1 ms tick — enough for any kernel-level timer.

Language/Runtime Follow-ups

Python: this implementation. For sub-millisecond, switch to a C extension or use asyncio’s event loop scheduler (which uses a heap, but for low-N is fine).
Java: Netty’s HashedWheelTimer (single-level) and JCTools for lock-free variants. ScheduledThreadPoolExecutor is heap-based and slower at scale.
Go: time.AfterFunc uses a heap internally (fine for low-N). For high-N, github.com/RussellLuo/timingwheel is a clean library.
C++: the textbook reference; libuv and Boost both have wheel-based timer implementations.
JS/TS: Node’s timer subsystem uses a hash-bucket-by-time-and-context structure — not exactly a wheel but similar idea.

Common Bugs

Forgetting modulo on slot indexing — array out of bounds when delay wraps around.
Cascade firing on every tick instead of only when wrapping. Catastrophic slowdown.
Forgetting to drain the bucket before clearing — callbacks lost.
Holding callbacks in slot lists and in the _timers dict; failing to remove from one when removing from the other.
Scheduling delay=0: should fire on next tick, not “now”. Clamp to ≥ 1 tick.

Debugging Strategy

Print the wheel state (occupied slots and their counts) after each tick. For “missed firing” bugs, walk the slot indexing and verify the placement formula. For hierarchical cascade bugs, set very small slot counts (4 × 4 × 4) and hand-trace the wrap.

Mastery Criteria

Implemented single-level wheel in <20 minutes; hierarchical in <35.
All three tests pass.
Stated heap vs wheel tradeoff (heap O(log N), wheel O(1); wheel wins at high-N high-cancel).
Articulated cascade mechanism for hierarchical wheels.
Answered follow-ups #4, #7, #11, #13.
Identified that real-world systems (Linux, Netty) use this exact structure.

LeetCode Interview — Extreme Coding