p42 — Top K Frequent Elements

Source: LeetCode 347 · Medium · Topics: Array, Hash Table, Heap, Bucket Sort, Quickselect Companies (2024–2025 frequency): Meta (very high), Amazon (very high), Google (high), Microsoft (high), Apple (medium), Uber (high), LinkedIn (high) Loop position: Phone screen and first round. The “bucket sort is faster” curveball — most candidates default to heap, the senior signal is recognizing bucket sort.

1. Quick Context

Given nums, return the k most frequent elements.

Three acceptable solutions:

Approach	Time	Space	When to pick
Counter + sort by count	O(n log n)	O(n)	Baseline; never primary
Counter + min-heap of size K	O(n log K)	O(n + K)	Production default
Counter + bucket sort	O(n)	O(n)	Optimal — beats heap when count ≤ n

The discriminator is recognizing bucket sort applies: counts are bounded by n (an element can appear at most n times), so a bucket array of size n+1 indexed by frequency lets you sweep top-K in linear time. Most candidates miss this.

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/top-k-frequent-elements/

STOP. 20-min timer. Aim to derive bucket sort.

3. Prerequisite Concepts

collections.Counter for O(n) frequency counting.
heapq and min-heap-of-size-K (p41).
The bucket-sort insight: when values are bounded by n, sort by indexing.

4. How to Approach (FRAMEWORK Steps 1–9)

Step 1 — Restate: “Count each element’s frequency; return K elements with highest counts. Order among the K is unspecified.”

Step 2 — Clarify:

“Ties — which to break in favor of?” (LC: any; if asked, specify lex / first-seen / etc.)
“k ≤ #distinct?” (LC: guaranteed.)
“n bound?” (LC: 10^5; both heap and bucket comfortable.)
“Are elements hashable?” (Yes — ints.)
“Return order matters?” (LC: no.)

Step 3 — Constraints: n ≤ 10^5 → counts ≤ 10^5. Any of the three approaches comfortably passes.

Step 4 — Examples:

[1,1,1,2,2,3], k=2 → [1, 2]
[1], k=1 → [1]
[4,4,4,4,4], k=1 → [4]

Step 5 — Brute Force: Counter → list of (count, val) → sort by count → take top K. O(n log n).

Step 6 — Complexity: Heap O(n log K); Bucket O(n).

Step 7 — Pattern: “Top K by frequency” → bucket sort wins when count is bounded by n; heap is general.

Step 8 — Develop: see solution.py.

Step 9 — Test: k=1, k=#distinct (all elements), all same, all distinct (every count = 1).

5. Progressive Hints

If stuck for 5 min, open hints.md.

6. Deeper Insight — Bucket sort is the surprise

6.1 The bucket-sort insight

Element counts are bounded: 1 ≤ count ≤ n. So you can create a buckets array of size n + 1 where buckets[c] is the list of elements with frequency c. Sweep c from n down to 1, collecting elements until you have K.

counter = Counter(nums)
buckets = [[] for _ in range(len(nums) + 1)]
for val, cnt in counter.items():
    buckets[cnt].append(val)

result = []
for c in range(len(buckets) - 1, 0, -1):
    for val in buckets[c]:
        result.append(val)
        if len(result) == k: return result

Time: O(n) for Counter + O(n) for bucket fill + O(n) for sweep = O(n) total.

This is the optimal answer. It beats heap O(n log K) asymptotically. Why isn’t it always used? Memory: bucket array is size n + 1 regardless of K, while heap is size K. For huge n and tiny K, the heap can be more cache-friendly.

6.2 Heap solution — when bucket sort doesn’t apply

For STREAMING input where you don’t know n in advance, bucket sort fails — you can’t pre-size the bucket array. Heap-of-size-K is the streaming-friendly answer.

counter = Counter(nums)
heap = []
for val, cnt in counter.items():
    if len(heap) < k:
        heapq.heappush(heap, (cnt, val))
    elif cnt > heap[0][0]:
        heapq.heappushpop(heap, (cnt, val))
return [val for cnt, val in heap]

Tuple (cnt, val) — min-heap orders by count (first tuple element). On ties, val breaks the tie (deterministic).

6.3 Quickselect on (val, count) pairs — O(n) expected

Use quickselect to find the K largest by count. Same algorithm as p41 but the comparison key is the count, not the value itself.

items = list(counter.items())   # [(val, count), ...]
# Quickselect on items, comparing by count, targeting position len(items) - k
quickselect(items, len(items) - k, key=lambda x: x[1])
return [val for val, _ in items[len(items) - k:]]

O(n) expected; same trade-offs as p41.

6.4 Why ties don’t matter (usually)

LC accepts any K elements with the top-K counts. If two elements tie for K-th place, either is acceptable. BUT: the production version often requires deterministic tie-breaking (lex order, first-seen, etc.) — confirm with the interviewer.

For lex tie-break: heap key becomes (cnt, val) directly works (min-heap → smallest cnt-then-val at root). For “first-seen” tie-break: include an insertion counter in the key.

6.5 LC 692 — Top K Frequent Words (tie-break variant)

Same problem with deterministic tie-break: “if two words have same frequency, the lexicographically smaller comes first.” This makes heap key tricky: we want (LARGER count, SMALLER word) to be at the heap root, but heapq is min-heap.

Trick: invert just the count: heapq.heappush(heap, (-cnt, word)). Then a min-heap pops smallest (-cnt, word) = largest cnt, lex-smallest word on ties. After collecting top K (with size-K heap), pop everything and reverse.

This is the kind of detail that distinguishes a senior from a mid-level candidate.

6.6 Counter is a hash table (and that matters)

collections.Counter is a subclass of dict. Frequency counting is O(n) expected but O(n²) worst case under adversarial hash collisions (theoretical only — Python’s hash is randomized per-process by default). For arbitrary objects (not just ints/strings), ensure __hash__ is well-defined.

7. Anti-Pattern Analysis

Sort + slice as your only answer — O(n log n), misses the bucket-sort O(n) signal.
Heap of size n (push all, pop K) — O(n log n). Wrong reflex: heap-of-size-K is O(n log K).
Push counts, not (count, val) tuples — you lose track of which value the count refers to.
Bucket sort with defaultdict(list) instead of array — slower in practice for dense bucket fill; preallocated list is the right choice.
Bucket sort with size max(counter.values()) + 1 — works but requires a pre-scan; len(nums) + 1 is upper bound and avoids the scan. Tiny win for clarity.
Forgetting LC 692 tie-break — push (-cnt, word) not (cnt, word) for size-K heap with lex tie-break.
Mutating counter.items() during iteration — fortunately we don’t, but a common bug elsewhere.
Returning a heap, not its values — extract with list comprehension.
Not using Counter — manual defaultdict(int) loop works but Counter is idiomatic.

8. Skills & Takeaways

Bucket sort O(n) when count is bounded by n.
Min-heap of size K with (count, val) tuples for streaming-friendly top-K.
Quickselect on items by count key for O(n) expected.
LC 692 trick: (-cnt, word) for deterministic lex tie-break in size-K min-heap.
Counter is the idiomatic frequency-counter; subclass of dict.
Three valid answers, deliberate pick: bucket (optimal), heap (streaming), quickselect (interview flair).

9. When to Move On

Move on when:

You can write bucket-sort version in ≤ 7 min.
You can write heap version in ≤ 5 min.
You can articulate why bucket sort is O(n).
You handle LC 692 tie-break correctly.
Stress test on 500 random arrays agrees with sort baseline.

10. Company Context

Meta:

Phone-screen staple. The bucket-sort recognition is the senior signal.
Bar Raiser asks: “what’s the time complexity? Can you do better than O(n log K)?” — expects bucket sort.
Scorecard phrase: “Algorithm — derived O(n) bucket sort; named heap and quickselect alternatives.”

Amazon:

L5/L6 staple. Often combined with “top K customers by purchase count” framing.
Bar Raiser tests “top K over a 1B-row table” → discusses partitioning + per-shard top-K.

Google:

L4/L5. Often follows up with LC 692 (words + lex tie-break) for senior signal.

Microsoft:

L63/L64. Common in search/ranking team interviews.

Uber/LinkedIn:

“Top K trending hashtags” / “top K skills” framing. Streaming variant common — heap-of-size-K + sliding window.

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff/Principal
First answer	Sort by count	Heap of size K	Bucket sort O(n)	+ names quickselect + streaming caveat
Counter	Manual `dict` loop	`Counter(nums)`	+ explains it’s a dict subclass	+ adversarial hashing caveat
Tie-break	“Doesn’t matter”	“Use tuple key”	“(-cnt, val) for lex order”	+ deterministic-output discussion
Streaming	“Re-sort each time”	“Heap-of-size-K”	+ “bucket sort needs known n”	+ sliding-window Misra-Gries / Space-Saving
LC 692	Brute lex sort	Tuple key (basic)	(-cnt, word) for size-K min-heap, reverse at end	+ count-min sketch for approximate large-stream variant

12. Level Delta

Mid (L4):

Heap solution clean with (cnt, val) tuples.
States O(n log K).
May miss bucket sort.

Senior (L5):

Bucket sort first; names heap as streaming alternative.
Handles LC 692 tie-break correctly.
Mentions Counter internals.

Staff (L6):

Discusses production: hot-key detection, trending topics, real-time leaderboard.
Discusses approximate streaming algorithms (Misra-Gries, Space-Saving).
Discusses sharded top-K for distributed systems.

Principal (L7+):

Discusses count-min sketch + heap for bounded-memory streaming top-K with provable error.
Discusses sliding-window top-K with lazy deletion.
Discusses adversarial input: hash collisions, count overflow, memory exhaustion.
Discusses A/B testing: how top-K rankings drift over time and the need for stable tie-break.

13. Follow-up Q&A

Q1: LC 692 — Top K Frequent Words with lex tie-break. A: Use heapq with key (-cnt, word). Min-heap pops smallest (-cnt, word) = largest cnt, lex-smallest on ties. After collecting K, pop all and reverse to get descending order.

Q2: Streaming over an unbounded input? A: Heap-of-size-K is feasible only if #distinct values fits in memory. For unbounded distinct values, use Space-Saving algorithm (bounded-memory approximate top-K with provable error bounds).

Q3: Top K over a sliding window of W? A: Maintain count map per window position + heap with lazy deletion. Or: bucket of “active counts” per element.

Q4: K = n (return ALL elements by frequency)? A: Sort by count: O(n log n). No win from heap or bucket — must touch everything anyway.

Q5: Top K by SUM-of-values, not count? A: Same template; replace Counter with grouped sum, then bucket / heap / quickselect on sums.

Q6: Top K across multiple shards? A: Per-shard top-K + merge with K · S-heap (S shards). Total O((K · S) log (K · S)).

14. Solution Walkthrough

See solution.py:

top_k_frequent_bucket(nums, k) — O(n) bucket sort.
top_k_frequent_heap(nums, k) — heap of size K with (cnt, val) tuples.
top_k_frequent_sort(nums, k) — baseline.
top_k_frequent_words(words, k) — LC 692 with lex tie-break.
edge_cases() + stress_test() — 500 random arrays; compare as SETS (order unspecified).

Run: python3 solution.py.

15. Beyond the Problem

Principal-engineer code review comment:

“Three things. (1) The bucket sort is asymptotically optimal but memory is O(n) regardless of K — for a service processing top-1 over n=10^9, this matters; use the heap. State the trade-off in the docstring. (2) LC 692’s (-cnt, word) trick is the kind of code that breaks two years later when someone ‘simplifies’ it; add a comment with WHY the negation is there and a test that fails if you remove it. (3) For real top-K systems (trending hashtags, recommendation candidates), you’re almost never doing exact top-K over batched data — you’re doing approximate top-K over a stream with bounded memory. Mention Space-Saving / Misra-Gries / count-min sketch in the design doc; the interview question is the toy version of a much harder real problem.”

Connections forward:

p41 — Kth Largest Element (sibling pattern).
p44 — Merge K Sorted Lists (heap-over-K).
LC 692 — Top K Frequent Words (lex tie-break).
LC 973 — K Closest Points (heap with distance key).
LC 451 — Sort Characters By Frequency (bucket sort revisited).
Phase 02 Lab 09 — Heap Top-K (lab generalization).
Production: trending hashtags, hot-key detection, recommendation candidates, search-suggestion ranking.

LeetCode Interview — Extreme Coding