p103 — Design Twitter

Source: LeetCode 355 · Medium · Topics: Design, Hash Table, Heap, Linked List Companies (2024–2025): Meta (high), Twitter/X (very high — namesake), Snap (medium), LinkedIn (high), Bloomberg (medium) Loop position: The K-way merge feed problem. Every social feed, news aggregator, and activity timeline reduces to “merge K sorted streams, take top N”.

1. Quick Context

Build a tiny social network with:

postTweet(userId, tweetId)
getNewsFeed(userId) → 10 most recent tweets from user + everyone they follow.
follow(followerId, followeeId)
unfollow(followerId, followeeId)

State:

tweets: dict[userId, list[(time, tweetId)]] — append-only, in post order.
follows: dict[userId, set[userId]].
time: int — monotonic counter.

getNewsFeed = k-way merge of at most K = (1 + #followed) sorted streams, take top 10. Heap-based: O(K log K + 10 log K) = O(K log K).

2. LeetCode Link + Attempt Gate

🔗 https://leetcode.com/problems/design-twitter/

STOP. 25-min timer.

3. Prerequisite Concepts

Heap merge of K sorted streams (LC 23 Merge K Sorted Lists, p9-heap-top-k pattern).
Reverse iteration of per-user tweet lists (newest first).
Following = adjacency set (self-implicitly followed).

4. How to Approach (FRAMEWORK 1–9)

1. Restate: Mini-Twitter; news feed = top-10 most recent across user + followees.

2. Clarify: Self-follow allowed? No (LC). But user always sees own tweets. unfollow someone you don’t follow? No-op. Re-follow? Idempotent.

3. Examples: user1 posts t5; user2 posts t6; user1 follows user2; user1.feed = [t6, t5].

5. Brute: On getNewsFeed, gather all relevant tweets, sort desc by time, take 10. O(N total tweets log N).

6. Target: O(K log K + 10 log K) where K = followee count + 1.

7. Pattern: Heap-merge K sorted streams.

8. Develop: see solution.py.

9. Test: user follows self (must be no-op or implicit); follow then unfollow then re-follow; getNewsFeed on user with 0 tweets and 0 followees; user with only 3 tweets across feed.

If a user follows 1M accounts and each has 10K tweets, sort-everything is 10B-element sort. Heap-merge of 1M streams pulling top 10 is (1M heap init) + (10 × log(1M)) = ~30M ops. The interviewer’s “scale” probe is satisfied only by this.

In practice, “heap init” is also dominated by only ever peeking the newest tweet per followee (lazy: push only the newest per stream initially, pop-then-push-next as you consume).

6.2 Fan-out-on-write vs fan-out-on-read

This solution is fan-out-on-read: feed is computed on getNewsFeed. Production tradeoff:

Model	Post cost	Read cost	Wins when
Fan-out-on-read	O(1)	O(K log K)	Few reads, many follows, viral users
Fan-out-on-write	O(F) F=#followers	O(1)	Many reads, average users
Hybrid (Twitter prod)	O(F) for normals, O(1) for celebs	O(K_celebs log K_celebs) + 1	Mix

Twitter famously uses hybrid: for users with > ~1M followers (celebs), fan-out-on-write would mean writing 1M timelines per tweet — too expensive. Their tweets are pulled at read time. Normal users fan-out-on-write.

6.3 The “self-tweets included” trap

getNewsFeed(uid) returns tweets from uid AND followees. Half of LC submissions forget the user’s own tweets. The clean fix: when building the candidate set, always include uid itself.

6.4 Time ordering — monotonic counter, not wall clock

Wall-clock time.time() can collide for tweets posted in the same microsecond. The monotonic counter (a single self.time int incremented on each post) is the canonical answer. Production: hybrid (wall_clock_ms, sequence) tuple — Twitter’s Snowflake IDs embed this exactly: 41 bits ms-timestamp + 10 bits machine + 12 bits sequence.

6.5 Family: feed merging

LC 355 Design Twitter (this).
LC 23 Merge K Sorted Lists — pure heap merge primitive.
LC 218 Skyline — heap-merge variant.
LC 692 Top K Frequent Words — heap top-K.

7. Anti-Pattern Analysis

Sort all relevant tweets — O(N total log N); fails the scale probe.
Forget user’s own tweets — most common LC submission bug.
Use wall-clock time.time() — collisions; nondeterministic across runs.
follows as a list — O(F) for unfollow; should be set.
unfollow of self — must be no-op (not an error).
Push entire tweet lists onto heap upfront — wasted work if most are dropped; lazy “push next on pop” is the production pattern.

8. Skills & Takeaways

K-way heap merge of pre-sorted streams.
Monotonic counter for deterministic ordering.
Fan-out-on-read vs fan-out-on-write tradeoff vocabulary.

9. When to Move On

Implement cold in 25 min, heap-based.
Articulate the celebrity / fan-out-on-write tradeoff.
Extend to getRecentMentions(userId) follow-up.

10. Company Context

Meta / X / Snap / LinkedIn: Feed team interview staple.
Bloomberg: Real-time feed / news ticker.
Reddit: Front-page ranking variant.

Strong: “Per-user tweet lists in post order. getNewsFeed is a 10-element heap merge of K = (followees + 1) streams, lazy push-on-pop. Monotonic time counter.”

Weak: “Concatenate all tweets and sort.” → 1M-follow follow-up kills you.

11. Interviewer’s Lens

Signal	Junior	Mid	Senior	Staff+
Heap merge	After hint	Cold	Cold	Cold
Includes self-tweets	Forgets	Fixes	Cold	Cold
Fan-out vocabulary	—	—	Mentions	Articulates tradeoff
Celebrity hybrid	—	—	—	Cites Twitter prod

12. Level Delta

Mid (L4): Sort-everything works. Survives. Doesn’t pass scale probe.
Senior (L5): Heap merge cold + correct self-tweets + monotonic counter.
Staff (L6): + lazy “push-next-on-pop” + articulates fan-out-on-read vs write + describes celebrity-user hybrid.
Principal (L7+): + discusses timeline materialization (Twitter’s “home timeline” stored in Redis per user, capped at ~800 entries); + read-after-write consistency (your own posts must appear instantly even if Redis hasn’t caught up — query merges materialized + recent self-tweets); + multi-DC issues (your follower fan-out crosses regions); + Snowflake IDs as the canonical monotonic time + machine-shard scheme.

13. Follow-up Q&A

Q1: Scale to 1M followees. A1: Lazy heap-merge already handles it: K=1M but we only pop 10. Initial heap-init is O(K), still dominant. To improve further: persist the timeline (fan-out-on-write).

Q2: Hot user (celebrity) with 100M followers. A2: Don’t fan-out-on-write for celebs. At read, merge celebrity tweets in at request time (hybrid model).

Q3: Delete a tweet. A3: Add deleted: set[tweet_id]; filter in getNewsFeed. Or remove from per-user list (O(N_user)).

Q4: Trending topics. A4: Count-min sketch + heavy-hitters algorithm (Misra-Gries) over a sliding time window. Different problem; same “streaming top-K” family.

Q5: Read-after-write for own posts. A5: On getNewsFeed(uid), merge user’s own recent tweets at request time even if the materialized timeline is stale. Production: this is exactly what Twitter does for self-views.

14. Solution Walkthrough

See solution.py:

Twitter — heap-merge feed using heapq with lazy push-on-pop.
TwitterBrute — gather-all + sort oracle.
Stress test: random sequences of post/follow/unfollow/feed.

15. Beyond the Problem

Principal code review: “The textbook answer is fan-out-on-read with heap merge — that’s what your code does, and it’s the right starting point. But Twitter’s actual production architecture (well documented in their engineering blog) is a hybrid based on followee fan-out distribution: for the median user (< ~100 followers), they fan-out-on-write into per-follower Redis timelines (capped at ~800 entries each, ~600 GB RAM total for the active user set). For celebrities (> ~1M followers), they do fan-out-on-read at request time — writing 100M timelines per tweet from @ladygaga would crash the cluster. The hybrid ‘merge celebrity in at read’ adds latency to celebrity-follower reads but bounds write amplification. The Staff/Principal signal in this round is naming this pull vs push asymmetry and identifying the viral fan-in event (e.g., a tweet by a new account that suddenly goes viral) as the hardest case — you can’t pre-compute timelines for an unknown future celebrity. Bonus signal: knowing that the timeline is itself capped (you don’t get tweets from 2014 in your feed), which means the merge is bounded by min(800, K × per_user_recent), not unbounded history. Snowflake IDs (their open-sourced monotonic ID scheme: 41 bits ms-time + 10 bits machine + 12 bits sequence) replace your self.time counter and solve cross-shard ordering without coordination.”

LeetCode Interview — Extreme Coding