p105 — All Oone Data Structure

Source: LeetCode 432 · Hard · Topics: Design, Hash Table, Doubly Linked List Companies (2024–2025): Bloomberg (very high), Meta (medium), Amazon (medium), Two Sigma (medium), Citadel (medium) Loop position: The week’s capstone DLL-of-buckets puzzle. Generalizes the LFU pattern (p101): bucket by count, DLL of buckets, O(1) extremes.

1. Quick Context

Build a key-value store supporting:

  • inc(key) — count of key +1 (insert with count 1 if absent).
  • dec(key) — count of key -1; remove key when count reaches 0. (Guaranteed key exists.)
  • getMaxKey() / getMinKey() — any key with the max/min count; “” if empty.

All four operations must be O(1).

State:

  • key_to_count: dict[key, int].
  • A doubly-linked list of buckets, each bucket = (count, set_of_keys_with_that_count). The buckets are kept sorted ascending by count (low at head, high at tail).
  • count_to_bucket: dict[int, BucketNode] for O(1) bucket lookup.

getMinKey = any key in head.next bucket; getMaxKey = any key in tail.prev bucket. Both O(1).

inc(key): move key from its current bucket to the (count+1) bucket — creating that bucket adjacent if needed. dec(key): move to (count-1) — or drop if count was 1.

🔗 https://leetcode.com/problems/all-oone-data-structure/

STOP. 30-min timer. This is harder than LFU. Sketch the bucket-DLL on paper first.

3. Prerequisite Concepts

  • LFU Cache (p101) — same “bucket by integer attribute, DLL of buckets” shape.
  • Sentinel-head/tail doubly linked list.
  • O(1) add/discard on set for bucket membership.

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Multiset with counts. O(1) inc, dec, min, max.

2. Clarify: dec is only called on existing keys (per LC). getMin/getMax on empty → “”. Multiple keys at the same count? Return any one.

3. Examples: inc(“a”), inc(“b”), inc(“b”). max=b (2), min=a (1). inc(“a”). Now a=b=2; min and max both return “a” or “b”.

5. Brute: dict[key, count] + min(values) / max(values) per query. O(n) per query.

6. Target: O(1) for all four.

7. Pattern: Bucket DLL ordered by count.

8. Develop: see solution.py.

9. Test: sequence that exercises bucket-creation between existing ones; dec to zero (key removal AND bucket removal if last); inc adjacent.

5. Progressive Hints

See hints.md.

6. Deeper Insight

6.1 Why the buckets must be a DLL (not a sorted list)

We need O(1) insert-after / insert-before / remove given a pointer to a node. A sorted array would be O(log n) for insert position. DLL is the answer; same reason as LRU and LFU.

6.2 Bucket creation is the subtle part

inc("foo") where foo’s current count is c needs to move foo to the c+1 bucket. Cases:

  • c+1 bucket exists adjacent → just move foo.
  • c+1 bucket doesn’t exist → create it adjacent to the c bucket, then move.
  • foo was the only key in its old c bucket → after moving, the old bucket is empty and must be deleted.

dec is symmetric with c-1, plus the “if c was 1, just delete the key” branch.

6.3 Why this is harder than LFU

LFU only needs the min (for eviction). This requires both min and max simultaneously O(1). The DLL gives both as endpoints — but the bookkeeping for adjacency on bucket creation is denser. A common bug: creating a new bucket on the wrong side of the existing one.

6.4 Family: “rank by integer attribute” structures

  • LC 460 LFU (p101) — bucket by frequency.
  • LC 432 All O(1) (this) — bucket by count.
  • LC 716 Max Stack — different pattern (LDLL + max-tracking).
  • LC 895 Maximum Frequency Stack — bucket by frequency + stack-per-bucket.

Real systems use this exact bucket-DLL when they need “top-K most-incremented keys over a window with O(1) updates”. Examples: real-time word counts in stream processors (Heron, Flink), live-vote tallies, slow-query trackers. Combined with a sliding-window decay, it’s the core of the Misra-Gries heavy-hitters algorithm’s data structure.

7. Anti-Pattern Analysis

  1. min(counts) / max(counts) on a dict — O(n) per query.
  2. Use heapq with lazy deletion — O(log n) and gets ugly with dec.
  3. Single bucket ordered by count with binary search to find position — O(log n).
  4. Forget to delete empty buckets — bucket DLL grows unboundedly; min/max becomes wrong (they sit at an empty bucket).
  5. Create the new bucket on the wrong sideinc should create (c+1) AFTER c (toward tail); dec should create (c-1) BEFORE c (toward head).
  6. getMinKey / getMaxKey on empty doesn’t return "" — LC test failure.

8. Skills & Takeaways

  • Bucket-DLL pattern (generalizes LFU’s “bucket by frequency” to “bucket by any integer attribute, O(1) extremes”).
  • Adjacent bucket creation/deletion bookkeeping.
  • Why O(log n) is a downlevel signal on this specific problem.

9. When to Move On

  • Implement cold in 30 min, all 4 ops O(1), all buckets correctly created/destroyed.
  • Articulate why a heap fails.
  • Connect to LFU (p101) and heavy-hitters streaming.

10. Company Context

  • Bloomberg: Phone + onsite staple. They love the bucket-DLL.
  • Two Sigma / Citadel: “Real-time vote tally / order book level count” — same data structure.
  • Meta / Amazon: Less common but appears in cache / rate-limit follow-ups.

Strong: “Doubly-linked list of buckets sorted by count. Each bucket holds a set of keys at that count. count_to_bucket map for O(1) lookup. Inc moves a key to the (count+1) bucket (creating it adjacent if needed); dec mirrors. Empty buckets are deleted. All four ops strictly O(1).”

11. Interviewer’s Lens

SignalJuniorMidSeniorStaff+
O(1) achievedAfter hintsColdCold
Bucket-DLL ideaAfter hintsColdCold
Empty bucket cleanupForgetsFixesColdCold
LFU connectionMentionsArticulates
Misra-Gries linkCites

12. Level Delta

  • Mid (L4): Heap with lazy deletion. Works (slow). Senior+ marks it as a downlevel.
  • Senior (L5): Bucket-DLL cold, all four ops O(1), correct empty-bucket cleanup.
  • Staff (L6): + connects to LFU (p101 — same pattern); + names the generalization “bucket by integer attribute, DLL of buckets” and applies it elsewhere; + handles thread-safety strategy.
  • Principal (L7+): + names Misra-Gries and Space-Saving heavy-hitters algorithms as the streaming generalization; + discusses memory cost vs accuracy (exact O(distinct keys) vs sketch O(k)); + production placement (used in Flink for top-K trending, in Kafka Streams for windowed aggregations); + how an order book in a financial exchange uses the same bucket-DLL keyed by price level.

13. Follow-up Q&A

Q1: Make it thread-safe. A1: Coarse lock around all four ops (they’re all short). Fine-grained: per-bucket lock + RW lock on the DLL spine — fragile. Realistic: shard by hash(key) % N and merge results for global min/max (which becomes O(N) for the merge).

Q2: Concurrent inc on the same key. A2: With a single lock, sequenced. Lock-free is very hard here (the bucket moves are multi-step pointer updates).

Q3: getTopK() for K > 1. A3: Walk the tail buckets, collecting keys, until K. Worst case O(n) but typically O(K). For large K, return iterator over buckets.

Q4: count(key) query. A4: key_to_count[key] — O(1) trivially. We already have it.

Q5: Persist across restarts. A5: Snapshot key_to_count only; rebuild buckets lazily on first access. The bucket structure is derived state.

14. Solution Walkthrough

See solution.py:

  • AllOne — bucket-DLL with count_to_bucket map.
  • AllOneBrute — dict + scan oracle.
  • Stress test: random inc/dec/getMin/getMax sequences with the dec-precondition respected.

15. Beyond the Problem

Principal code review: “What you’ve built is the exact data structure behind the order book at every financial exchange (price levels = bucket counts; orders = keys; best bid = getMaxKey, best ask = getMinKey; on every order insert/cancel, O(1) updates AND O(1) BBO read). It is also the data structure behind real-time top-K trending (Twitter trends, YouTube trending, Reddit Hot) when memory permits exact counting — for huge key spaces, Space-Saving (Metwally et al., 2005) and Misra-Gries (1982) are approximations that trade accuracy for O(K) memory regardless of distinct-key count. The Staff/Principal signal is recognizing that this 80-line data structure is the kernel of multiple production systems AND knowing when to swap exact (this) for approximate (Space-Saving) — the threshold is when O(distinct_keys) memory becomes intolerable. The bucket-DLL family — LRU (p16), LFU (p101), All-O(1) (this) — is one of the three most important patterns in systems coding (the others being the segment tree family and the heap-merge family); mastering all three is a precondition for the L6+ design loop.”