Scale Wars #5 — Twitter: The Fan-out Pattern and the Architecture Behind 140 Characters

Published: 2 weeks ago (May 26, 2026 at 05:20 PM EDT)

5 min read

Source: Dev.to

The Problem: Lady Gaga and 50 Million Followers

When a user tweets, that tweet should appear in all their followers’ timelines.

An average user has ~200 followers.
Lady Gaga has 50 million followers.
If Lady Gaga tweets, 50 million timelines need to be updated.

If Twitter created a separate database row for each follower:

-- Naive approach: One row per follower
INSERT INTO timeline (user_id, tweet_id, author_id, created_at)
SELECT follower_id, 12345, 'ladygaga', NOW()
FROM followers
WHERE followee_id = 'ladygaga';
-- 50 million INSERTs — disaster!

This approach is impossible. 50 million INSERTs take minutes and lock the database.

Architectural Decision: Fan‑out‑on‑Write vs. Fan‑out‑on‑Read

Twitter developed two strategies and used them as a hybrid.

Strategy 1: Fan‑out‑on‑Write

When a user tweets, the tweet is written to all followers’ timelines at that moment.

USER A tweeted
     │
     ▼
┌─────────────────┐
│ Timeline Service│
│ (at write time) │
└────────┬────────┘
         │
 ┌───────┴───────┬───────┬───────┬───────┐
 ▼               ▼       ▼       ▼       ▼
Follower1   Follower2  Follower3 …   FollowerN
timeline    timeline   timeline       timeline

Pros

Reads (viewing the timeline) are very fast — just fetch the user’s timeline.
Simple architecture.

Cons

Writing is extremely slow for users with many followers.
Storage explosion: each tweet is stored N times.

Strategy 2: Fan‑out‑on‑Read

When a user opens their timeline, tweets from the people they follow are merged at that moment.

USER opened their timeline
     │
     ▼
┌──────────────────┐
│ Timeline Service│
│ (at read time)   │
└────────┬─────────┘
         │
 ┌───────┴───────┬───────┬───────┐
 ▼               ▼       ▼       ▼
Author1        Author2  Author3  Author4
tweets        tweets   tweets   tweets
         │
         └──> MERGE ──> Show to user

Pros

Writing is very fast — just store the tweet.
Storage efficient — each tweet is stored once.

Cons

Reading is very slow — N queries per timeline view.
The merge operation is expensive.

Twitter’s Hybrid Solution

Twitter splits users into two categories:

Category	Followers	Strategy
Normal users	10 000	Fan‑out‑on‑Read

# Twitter's hybrid approach (pseudo‑code)
def post_tweet(user, tweet_text):
    tweet = create_tweet(user, tweet_text)

    if user.follower_count < 10_000:
        # Normal user: Fan‑out‑on‑Write
        followers = get_followers(user.id)
        for follower in followers:
            redis.zadd(
                f"timeline:{follower.id}",
                tweet.timestamp,
                tweet.id
            )
    else:
        # Celebrity: Only store their own tweet
        # Will be merged when followers open their timeline
        redis.zadd(f"user_tweets:{user.id}", tweet.timestamp, tweet.id)

def get_timeline(user_id):
    # 1. Get the user's pre‑computed timeline
    timeline = redis.zrevrange(f"timeline:{user_id}", 0, 100)

    # 2. Add tweets from followed celebrities
    for celeb in get_followed_celebrities(user_id):
        celeb_tweets = redis.zrevrange(
            f"user_tweets:{celeb.id}", 0, 10
        )
        timeline.extend(celeb_tweets)

    # 3. Sort by time and return the latest 100
    return sorted(timeline, key=lambda t: t.timestamp, reverse=True)[:100]

Manhattan: Twitter’s Own Database

In 2014, Twitter migrated from MySQL to Manhattan, a distributed key‑value store built for its specific needs.

Multi‑datacenter replication: Tweets are replicated across multiple data centers worldwide.
Low latency: < 10 ms read latency at the 99th percentile.
High throughput: Millions of tweets per second.

Snowflake: Twitter’s ID Generation System

Tweet IDs aren’t random. Twitter uses an ID generation system called Snowflake:

┌────────────────────────────────────────────────────────────┐
│ 64‑bit Tweet ID (Snowflake)                                 │
├────────────────────────────────────────────────────────────┤
│ Bit 63:          Sign bit (always 0)                        │
│ Bits 22‑62:      Timestamp (41 bits — ms since custom epoch)│
│ Bits 17‑21:      Datacenter ID (5 bits → 32 datacenters)    │
│ Bits 12‑16:      Worker ID (5 bits → 32 workers per DC)     │
│ Bits 0‑11:       Sequence number (12 bits → 4096 per ms/worker)│
└────────────────────────────────────────────────────────────┘

Why this kind of ID?

Decentralized: Each worker generates its own IDs; no coordination needed.
Time‑ordered: Tweets are naturally sorted chronologically.
Unique: Collisions are impossible (each worker uses a different ID block).
Compact: 64 bits vs. 128 bits for a UUID.

Trade‑offs

✅ Gains

Low latency: Timelines load within milliseconds.
Scale: Billions of tweets and users supported.
Cost efficiency: Storage and write load optimized for celebrity users.

❌ Costs

Architectural complexity introduced by maintaining two fan‑out strategies and the hybrid logic.
Additional operational overhead for synchronising pre‑computed timelines with on‑read merges.
Need for custom infrastructure (Manhattan, Snowflake) and expertise.

Complexity: Managing two different strategies is hard

Inconsistency: Celebrity tweets may appear a few seconds late in followers’ timelines

Threshold management: Determining and dynamically adjusting the “10,000‑follower” threshold is difficult

Takeaways

Twitter showed us there’s no such thing as “one size fits all” — they used a hybrid approach instead of a single strategy.
In feed, timeline, and notification systems, the read‑vs‑write trade‑off will always come up; analyze which operation is more frequent and choose your strategy accordingly.
Centralized ID generation (auto‑increment) becomes a bottleneck in distributed systems; look into Snowflake, ULID, UUID v7 as alternatives.
For frequently read data like timelines, in‑memory caches like Redis are vital.

Next up — Chapter 6: Spotify’s Squad Model and how Golden Paths cut service creation from 2 weeks to 5 minutes. 🎵

Scale Wars #5 — Twitter: The Fan-out Pattern and the Architecture Behind 140 Characters

The Problem: Lady Gaga and 50 Million Followers

Architectural Decision: Fan‑out‑on‑Write vs. Fan‑out‑on‑Read

Strategy 1: Fan‑out‑on‑Write

Strategy 2: Fan‑out‑on‑Read

Twitter’s Hybrid Solution

Manhattan: Twitter’s Own Database

Snowflake: Twitter’s ID Generation System

Trade‑offs

Takeaways

Related posts

Circuit Breakers: The Unsung Heroes of Resilient Microservices

Message Brokers Comparison 2026 — Kafka, RabbitMQ, NATS & Redis Streams: Which One Should You Choose?

I Should Have Put Events in the Same Database as the Aggregate Root—Heres What Happened

Treasure Hunt Engine: How We Blew Up the Docs and Built a System That Actually Works

The Problem: Lady Gaga and 50 Million Followers

Architectural Decision: Fan‑out‑on‑Write vs. Fan‑out‑on‑Read

Strategy 1: Fan‑out‑on‑Write

Strategy 2: Fan‑out‑on‑Read

Twitter’s Hybrid Solution

Manhattan: Twitter’s Own Database

Snowflake: Twitter’s ID Generation System

Trade‑offs

Takeaways

Related posts

Circuit Breakers: The Unsung Heroes of Resilient Microservices

Message Brokers Comparison 2026 — Kafka, RabbitMQ, NATS & Redis Streams: Which One Should You Choose?

I Should Have Put Events in the Same Database as the Aggregate Root—Heres What Happened

Treasure Hunt Engine: How We Blew Up the Docs and Built a System That Actually Works

The Problem: Lady Gaga and 50 Million Followers

Strategy 1: Fan‑out‑on‑Write

Strategy 2: Fan‑out‑on‑Read