p103 — Design Twitter
Source: LeetCode 355 · Medium · Topics: Design, Hash Table, Heap, Linked List Companies (2024–2025): Meta (high), Twitter/X (very high — namesake), Snap (medium), LinkedIn (high), Bloomberg (medium) Loop position: The K-way merge feed problem. Every social feed, news aggregator, and activity timeline reduces to “merge K sorted streams, take top N”.
1. Quick Context
Build a tiny social network with:
postTweet(userId, tweetId)getNewsFeed(userId)→ 10 most recent tweets from user + everyone they follow.follow(followerId, followeeId)unfollow(followerId, followeeId)
State:
tweets: dict[userId, list[(time, tweetId)]]— append-only, in post order.follows: dict[userId, set[userId]].time: int— monotonic counter.
getNewsFeed = k-way merge of at most K = (1 + #followed) sorted streams, take top 10. Heap-based: O(K log K + 10 log K) = O(K log K).
2. LeetCode Link + Attempt Gate
🔗 https://leetcode.com/problems/design-twitter/
STOP. 25-min timer.
3. Prerequisite Concepts
- Heap merge of K sorted streams (LC 23 Merge K Sorted Lists, p9-heap-top-k pattern).
- Reverse iteration of per-user tweet lists (newest first).
- Following = adjacency set (self-implicitly followed).
4. How to Approach (FRAMEWORK 1–9)
1. Restate: Mini-Twitter; news feed = top-10 most recent across user + followees.
2. Clarify: Self-follow allowed? No (LC). But user always sees own tweets. unfollow someone you don’t follow? No-op. Re-follow? Idempotent.
3. Examples: user1 posts t5; user2 posts t6; user1 follows user2; user1.feed = [t6, t5].
5. Brute: On getNewsFeed, gather all relevant tweets, sort desc by time, take 10. O(N total tweets log N).
6. Target: O(K log K + 10 log K) where K = followee count + 1.
7. Pattern: Heap-merge K sorted streams.
8. Develop: see solution.py.
9. Test: user follows self (must be no-op or implicit); follow then unfollow then re-follow; getNewsFeed on user with 0 tweets and 0 followees; user with only 3 tweets across feed.
5. Progressive Hints
See hints.md.
6. Deeper Insight
6.1 Why heap-merge, not sort-everything
If a user follows 1M accounts and each has 10K tweets, sort-everything is 10B-element sort. Heap-merge of 1M streams pulling top 10 is (1M heap init) + (10 × log(1M)) = ~30M ops. The interviewer’s “scale” probe is satisfied only by this.
In practice, “heap init” is also dominated by only ever peeking the newest tweet per followee (lazy: push only the newest per stream initially, pop-then-push-next as you consume).
6.2 Fan-out-on-write vs fan-out-on-read
This solution is fan-out-on-read: feed is computed on getNewsFeed. Production tradeoff:
| Model | Post cost | Read cost | Wins when |
|---|---|---|---|
| Fan-out-on-read | O(1) | O(K log K) | Few reads, many follows, viral users |
| Fan-out-on-write | O(F) F=#followers | O(1) | Many reads, average users |
| Hybrid (Twitter prod) | O(F) for normals, O(1) for celebs | O(K_celebs log K_celebs) + 1 | Mix |
Twitter famously uses hybrid: for users with > ~1M followers (celebs), fan-out-on-write would mean writing 1M timelines per tweet — too expensive. Their tweets are pulled at read time. Normal users fan-out-on-write.
6.3 The “self-tweets included” trap
getNewsFeed(uid) returns tweets from uid AND followees. Half of LC submissions forget the user’s own tweets. The clean fix: when building the candidate set, always include uid itself.
6.4 Time ordering — monotonic counter, not wall clock
Wall-clock time.time() can collide for tweets posted in the same microsecond. The monotonic counter (a single self.time int incremented on each post) is the canonical answer. Production: hybrid (wall_clock_ms, sequence) tuple — Twitter’s Snowflake IDs embed this exactly: 41 bits ms-timestamp + 10 bits machine + 12 bits sequence.
6.5 Family: feed merging
- LC 355 Design Twitter (this).
- LC 23 Merge K Sorted Lists — pure heap merge primitive.
- LC 218 Skyline — heap-merge variant.
- LC 692 Top K Frequent Words — heap top-K.
7. Anti-Pattern Analysis
- Sort all relevant tweets — O(N total log N); fails the scale probe.
- Forget user’s own tweets — most common LC submission bug.
- Use wall-clock
time.time()— collisions; nondeterministic across runs. followsas alist— O(F) forunfollow; should beset.unfollowof self — must be no-op (not an error).- Push entire tweet lists onto heap upfront — wasted work if most are dropped; lazy “push next on pop” is the production pattern.
8. Skills & Takeaways
- K-way heap merge of pre-sorted streams.
- Monotonic counter for deterministic ordering.
- Fan-out-on-read vs fan-out-on-write tradeoff vocabulary.
9. When to Move On
- Implement cold in 25 min, heap-based.
- Articulate the celebrity / fan-out-on-write tradeoff.
- Extend to
getRecentMentions(userId)follow-up.
10. Company Context
- Meta / X / Snap / LinkedIn: Feed team interview staple.
- Bloomberg: Real-time feed / news ticker.
- Reddit: Front-page ranking variant.
Strong: “Per-user tweet lists in post order. getNewsFeed is a 10-element heap merge of K = (followees + 1) streams, lazy push-on-pop. Monotonic time counter.”
Weak: “Concatenate all tweets and sort.” → 1M-follow follow-up kills you.
11. Interviewer’s Lens
| Signal | Junior | Mid | Senior | Staff+ |
|---|---|---|---|---|
| Heap merge | After hint | Cold | Cold | Cold |
| Includes self-tweets | Forgets | Fixes | Cold | Cold |
| Fan-out vocabulary | — | — | Mentions | Articulates tradeoff |
| Celebrity hybrid | — | — | — | Cites Twitter prod |
12. Level Delta
- Mid (L4): Sort-everything works. Survives. Doesn’t pass scale probe.
- Senior (L5): Heap merge cold + correct self-tweets + monotonic counter.
- Staff (L6): + lazy “push-next-on-pop” + articulates fan-out-on-read vs write + describes celebrity-user hybrid.
- Principal (L7+): + discusses timeline materialization (Twitter’s “home timeline” stored in Redis per user, capped at ~800 entries); + read-after-write consistency (your own posts must appear instantly even if Redis hasn’t caught up — query merges materialized + recent self-tweets); + multi-DC issues (your follower fan-out crosses regions); + Snowflake IDs as the canonical monotonic time + machine-shard scheme.
13. Follow-up Q&A
Q1: Scale to 1M followees. A1: Lazy heap-merge already handles it: K=1M but we only pop 10. Initial heap-init is O(K), still dominant. To improve further: persist the timeline (fan-out-on-write).
Q2: Hot user (celebrity) with 100M followers. A2: Don’t fan-out-on-write for celebs. At read, merge celebrity tweets in at request time (hybrid model).
Q3: Delete a tweet.
A3: Add deleted: set[tweet_id]; filter in getNewsFeed. Or remove from per-user list (O(N_user)).
Q4: Trending topics. A4: Count-min sketch + heavy-hitters algorithm (Misra-Gries) over a sliding time window. Different problem; same “streaming top-K” family.
Q5: Read-after-write for own posts.
A5: On getNewsFeed(uid), merge user’s own recent tweets at request time even if the materialized timeline is stale. Production: this is exactly what Twitter does for self-views.
14. Solution Walkthrough
See solution.py:
Twitter— heap-merge feed usingheapqwith lazy push-on-pop.TwitterBrute— gather-all + sort oracle.- Stress test: random sequences of post/follow/unfollow/feed.
15. Beyond the Problem
Principal code review: “The textbook answer is fan-out-on-read with heap merge — that’s what your code does, and it’s the right starting point. But Twitter’s actual production architecture (well documented in their engineering blog) is a hybrid based on followee fan-out distribution: for the median user (< ~100 followers), they fan-out-on-write into per-follower Redis timelines (capped at ~800 entries each, ~600 GB RAM total for the active user set). For celebrities (> ~1M followers), they do fan-out-on-read at request time — writing 100M timelines per tweet from @ladygaga would crash the cluster. The hybrid ‘merge celebrity in at read’ adds latency to celebrity-follower reads but bounds write amplification. The Staff/Principal signal in this round is naming this pull vs push asymmetry and identifying the viral fan-in event (e.g., a tweet by a new account that suddenly goes viral) as the hardest case — you can’t pre-compute timelines for an unknown future celebrity. Bonus signal: knowing that the timeline is itself capped (you don’t get tweets from 2014 in your feed), which means the merge is bounded by
min(800, K × per_user_recent), not unbounded history. Snowflake IDs (their open-sourced monotonic ID scheme: 41 bits ms-time + 10 bits machine + 12 bits sequence) replace yourself.timecounter and solve cross-shard ordering without coordination.”