p102 — Design Hit Counter

Source: LeetCode 362 · Medium · Topics: Design, Queue, Binary Search Companies (2024–2025): Microsoft (high), Amazon (high), Google (medium), Dropbox (medium), Datadog (medium) Loop position: The rate-limiting / metrics primitive. Every API gateway, observability tool, and DDoS shield has this exact data structure at its core.

1. Quick Context

hit(ts) records a hit at timestamp ts (monotonically nondecreasing seconds). getHits(ts) returns the number of hits in the past 300 seconds (i.e., in window [ts-299, ts]).

Two designs, very different scaling:

Deque of timestamps — append on hit; popleft while head < ts - 299. Best when QPS is low; O(n) worst-case hit if a burst tails old.
Circular array of (ts, count) per bucket — 300 buckets indexed ts % 300. hit increments one bucket in O(1). getHits sums 300 buckets in O(300) = O(1).

The bucket design is what production rate limiters use (token bucket variant; Stripe’s, Cloudflare’s, AWS API Gateway’s).

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/design-hit-counter/

STOP. 20-min timer. Sketch both designs before reading further.

3. Prerequisite Concepts

Sliding window over time.
collections.deque O(1) append/popleft.
Circular buffer indexed by (value % capacity).

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Streaming hits with monotonic timestamps; count hits in trailing 300s window.

2. Clarify: Are timestamps guaranteed monotonic? Yes (per LC). Out-of-order? No. Granularity? 1 second. What if the same second is hit 1M times? → the deque blows up; the bucket design handles it in O(1).

3. Examples: hit(1), hit(2), hit(3), getHits(4)=3, getHits(300)=3, getHits(301)=3, getHits(302)=2 (hit at ts=1 dropped).

5. Brute: list[int] of all hits; on getHits, iterate and count. O(n) per query, unbounded memory.

6. Target: O(1) amortized hit; O(1) getHits (bucket design).

7. Pattern: Sliding-window via bucketed time.

8. Develop: see solution.py.

9. Test: burst at same second; bucket-recycling at window boundary; long quiet gap (stale buckets must be ignored, not summed).

5. Progressive Hints

See hints.md.

6. Deeper Insight

6.1 The deque design’s hidden failure mode

hit(ts) is O(1) amortized. But if 1M hits happen at second 100 and then no hits until second 401, the first getHits(401) must popleft all 1M entries in one call → that one query is O(1M). For a latency-sensitive metrics endpoint (P99 matters), this is unacceptable. The bucket design bounds the work per getHits to 300 ops regardless of traffic shape.

6.2 The bucket design’s elegance

Index ts % 300 into a 300-slot array. Each slot stores (stored_timestamp, count). On hit(ts):

If bucket[ts%300].timestamp == ts, increment count.
Else, the bucket is stale → overwrite with (ts, 1).

On getHits(ts):

Sum bucket[i].count for all i where bucket[i].timestamp > ts - 300.

The “stale” check is what makes it correct across the wrap: a bucket from 300+ seconds ago must not contribute.

6.3 Production: from this to a rate limiter

A rate limiter is “hit counter that returns whether to admit”. Same bucket design. Cloudflare’s rate limiter, AWS API Gateway throttling, GitHub’s REST API limits, Stripe’s idempotency-key dedup window — all run this exact data structure (with finer buckets, e.g., 10 buckets of 1s for a 10s window, to balance accuracy vs memory).

6.4 Sub-second / nanosecond hits — out of scope, but…

LC restricts ts to integer seconds. Real systems use leaky bucket or token bucket with floating-point fill rates. Same algorithm shape; the “bucket” is now a counter that decays continuously rather than discretely.

6.5 Family: time-window aggregates

LC 362 Design Hit Counter (this).
LC 933 Number of Recent Calls — deque variant.
LC 1845 Seat Reservation Manager — different concern.
LC 1396 Design Underground System — different (paired check-in/out).

7. Anti-Pattern Analysis

Unbounded list of timestamps — memory leak under load.
getHits uses bisect_left on a sorted list — O(log n) is fine for queries but hit becomes O(n) if you ever need to reclaim memory.
Forget the bucket staleness check — a hit from second 5 still counted at second 405 because bucket[5 % 300] = bucket[5] still has count from the old hit.
Use time.time() instead of the passed ts — fails the test harness.
Cleanup runs during hit but is unbounded — moves the latency from getHits to hit but doesn’t solve it; bucket design bounds both.

8. Skills & Takeaways

Bucketed sliding window — the rate-limiter primitive.
Why “amortized O(1)” can still be unacceptable under bursty load.
Recognizing when memory must be bounded by window_size, not total_events.

9. When to Move On

Implement both deque and bucket variants cold in 15 min.
Articulate the burst-then-quiet failure of the deque design.
Extend to a configurable window (5min, 1hr).

10. Company Context

Microsoft: Azure SDK team; phone screen staple.
Amazon: SDE2; “design rate limiter” round.
Google: L4; “extend to per-user rate limit” follow-up.
Dropbox / Datadog: observability/SRE roles.

Strong: “Bucket array of 300 slots, indexed ts % 300, each storing (stored_ts, count) with staleness check. Both hit and getHits strictly O(window_size).”

Weak: “Deque of timestamps.” → “OK but what about a burst then a long gap?” → if you stall, you’re a mid.

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff+
Bounded memory	After hint	Cold	Cold	Cold
Bucket design	—	After hint	Cold	Cold
Burst failure mode	—	After hint	Articulates	+ cites rate-limiter products
Sub-second extension	—	—	Mentions	+ leaky/token bucket math

12. Level Delta

Mid (L4): Deque solution. Works. Hire.
Senior (L5): Bucket design cold + articulates the burst-then-quiet failure of deque. Strong hire.
Staff (L6): + names Cloudflare / AWS API Gateway / Stripe as users of this exact pattern; + designs per-user rate-limit extension (sharded buckets per (user_id, time_bucket)).
Principal (L7+): + discusses distributed rate limiting: Redis with TTL’d buckets, eventual consistency tradeoffs, Lua scripts for atomicity, and “burstable” tokens (token bucket as a more accurate model than fixed-window). + Mentions request smoothing via leaky bucket and where in the stack this lives (edge vs gateway vs service mesh).

13. Follow-up Q&A

Q1: 1-hour window instead of 5 minutes? A1: 3600 buckets. Same algorithm; 12× memory. Or: coarser buckets (60-second granularity → 60 buckets, ±60s error).

Q2: Per-user counter. A2: dict[user_id, BucketArray]. Memory grows with active-user count; evict idle users (LRU on the dict — see p101).

Q3: Distributed. A3: Redis with INCR + EXPIRE per (user, time_bucket) key. Read = sum of N keys. For atomicity under concurrent ops, use Lua scripts. Cloudflare and AWS WAF do exactly this.

Q4: Out-of-order timestamps. A4: Bucket design still works if ts falls within [now - window, now]. Older = drop or count toward original bucket if still active.

Q5: Why not a count-min sketch? A5: CMS approximates per-key counts, useful for “hits per user” with bounded memory. Overkill for a single counter. CMS shows up if the question becomes “top-K users by request rate”.

14. Solution Walkthrough

See solution.py:

HitCounter — deque-based (idiomatic for LC; passes all tests).
HitCounterBucket — circular array of 300 buckets; production-grade.
HitCounterBrute — list of all hits; O(n) per query oracle.
Stress test: random hit + getHits sequences with monotonic timestamps.

15. Beyond the Problem

Principal code review: “What you’ve built is the fixed-window counter — the simplest of three rate-limiter families. Production systems pick differently per concern: (1) fixed-window (this; simple, burst-on-window-boundary problem — a user can fire 2x limit in one second by hitting the end of one window and the start of the next), (2) sliding-window log (the deque variant; accurate, memory-unbounded), (3) sliding-window counter (interpolate between two fixed windows — Cloudflare’s published design — smooth and bounded), (4) leaky bucket (constant outflow; smooths bursts at the cost of latency), (5) token bucket (constant refill; allows burstiness up to bucket size — AWS API Gateway). The Staff-grade answer in this round names the burst-on-boundary issue with fixed-window and reaches for the sliding-window counter as the production compromise. The Principal-grade answer ties it to the edge vs gateway vs service placement: edge (Cloudflare) does coarse fixed-window for DDoS; gateway (Kong, AWS API GW) does token bucket for fairness; service mesh (Envoy) does adaptive concurrency for back-pressure. Same algorithm family; entirely different role per layer.”

LeetCode Interview — Extreme Coding