Designing Instagram Stories: From Upload to Expiry
Source: Dev.to
Core Requirements
Functional requirements
- Users can post photo or video stories
- Stories expire automatically after 24 hours
- Visibility can be restricted to a chosen audience (followers, close friends, or contacts)
- Viewers can see and react to stories
Non-functional requirements
- Low latency for feed loading
- High availability
- Horizontal scalability
- Eventual consistency is acceptable
High-Level Architecture (HLD)
At a high level, the system is split into independent services, each responsible for a single concern:
- API Gateway – authentication, authorization, routing, rate limiting
- Story Service – story creation and lifecycle management
- Content Service – media, text, and link handling
- Feed Service – story feed generation
- Visibility Service – privacy and audience enforcement
- Expiration Service – 24‑hour TTL handling
- Kafka & Background Workers – asynchronous processing
- Analytics & Notification Services – engagement insights and alerts
Story Creation Flow (Write Path)
When a user posts a story:
- The client sends a request through the API Gateway.
- The Content Service returns a pre‑signed upload URL.
- The client uploads media directly to object storage (e.g., S3).
- The Story Service stores metadata with a 24‑hour expiration timestamp.
Key design decision: separating media upload from metadata storage keeps large payloads off the application servers, so the write path stays fast and scales independently of media size.
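The metadata half of the write path can be sketched as follows. This is a minimal illustration, not the actual Story Service: the `StoryMetadata` fields, the `stories/<author>/<uuid>` key layout, and the function names are assumptions made for the example. The pre-signed URL itself is omitted; the Content Service would sign an upload URL for the generated key.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

STORY_TTL = timedelta(hours=24)  # from the 24-hour expiry requirement


@dataclass(frozen=True)
class StoryMetadata:
    story_id: str
    author_id: str
    media_key: str        # object-storage key the client uploaded to (hypothetical layout)
    created_at: datetime
    expires_at: datetime  # created_at + 24h, stamped once at write time


def new_media_key(author_id: str) -> str:
    """Pick the storage key that the Content Service would sign an upload URL for."""
    return f"stories/{author_id}/{uuid.uuid4().hex}"


def create_story(author_id: str, media_key: str,
                 now: Optional[datetime] = None) -> StoryMetadata:
    """Build the metadata record the Story Service persists once the upload is done."""
    created = now if now is not None else datetime.now(timezone.utc)
    return StoryMetadata(
        story_id=uuid.uuid4().hex,
        author_id=author_id,
        media_key=media_key,
        created_at=created,
        expires_at=created + STORY_TTL,
    )
```

Storing `expires_at` explicitly, rather than recomputing it from `created_at` on every read, lets every downstream service compare against a single field.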
Story Consumption Flow (Read Path)
When a user opens the stories tray:
- Feed Service fetches active stories from followed users.
- Visibility Service filters stories based on privacy rules.
- Expired stories are ignored.
- Media URLs are returned to the client.
- The client streams media directly from the CDN.
This design is optimized for read‑heavy traffic, which dominates story usage.
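The read-path steps above can be sketched as a single filtering pass. This is an in-memory illustration under stated assumptions: the `Story` fields and the `build_tray` signature are hypothetical, and the `can_view` callback stands in for a call to the Visibility Service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Story:
    story_id: str
    author_id: str
    media_url: str        # CDN URL handed back to the client
    expires_at: datetime


def build_tray(viewer_id: str,
               followed: Set[str],
               stories: Iterable[Story],
               can_view: Callable[[str, Story], bool],
               now: datetime) -> List[str]:
    """Return CDN URLs for stories the viewer may see, preserving input order."""
    tray = []
    for story in stories:
        if story.author_id not in followed:
            continue  # only stories from followed users
        if story.expires_at <= now:
            continue  # expired stories are ignored
        if not can_view(viewer_id, story):
            continue  # privacy rules applied by the visibility check
        tray.append(story.media_url)
    return tray
```

The cheap checks (follow set, expiry) run before the visibility callback, which in a real deployment would likely be the most expensive step.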
Visibility and Privacy Rules
Stories support multiple visibility modes:
- Followers only
- Close friends
- Contact‑based visibility (as in WhatsApp Status)
- Blocked users (always excluded, regardless of the mode above)
A dedicated Visibility Service enforces these rules using:
- Followers/contacts graph
- Redis caching for fast permission checks
Isolating visibility logic ensures consistent privacy enforcement and easier evolution.
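A permission check of this kind might look like the sketch below. The `Audience` enum, the `AuthorGraph` container, and `can_view` are illustrative names, not part of any real API, and the sets stand in for graph lookups that the service would cache in Redis.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Audience(Enum):
    FOLLOWERS = "followers"
    CLOSE_FRIENDS = "close_friends"
    CONTACTS = "contacts"


@dataclass
class AuthorGraph:
    """Per-author audience sets (in-memory stand-in for the followers/contacts graph)."""
    followers: Set[str] = field(default_factory=set)
    close_friends: Set[str] = field(default_factory=set)
    contacts: Set[str] = field(default_factory=set)
    blocked: Set[str] = field(default_factory=set)


def can_view(viewer_id: str, audience: Audience, graph: AuthorGraph) -> bool:
    """Blocks are checked first and override every audience mode."""
    if viewer_id in graph.blocked:
        return False
    allowed = {
        Audience.FOLLOWERS: graph.followers,
        Audience.CLOSE_FRIENDS: graph.close_friends,
        Audience.CONTACTS: graph.contacts,
    }[audience]
    return viewer_id in allowed
```

Checking the block list before anything else means a blocked user is rejected even if they still appear in a stale follower or contact set.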
Expiration and Lifecycle Management
Ephemeral content is treated as a first‑class concern:
- Each story has a strict 24‑hour TTL.
- Expiration Service monitors story timestamps.
- Expired stories trigger lifecycle events.
- Expired content is no longer served.
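Because the read path filters on the expiration timestamp, a story stops being served as soon as its TTL elapses, even if asynchronous cleanup lags behind under high traffic.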
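One way an Expiration Service could monitor timestamps is a min-heap ordered by expiry time, polled periodically. The sketch below assumes this approach for illustration only; `ExpirationQueue` and its methods are hypothetical, and a production system might instead rely on database TTL indexes or a scheduled scan.

```python
import heapq
from datetime import datetime
from typing import Dict, List, Tuple


class ExpirationQueue:
    """Tracks stories by expiry time and emits lifecycle events for due ones."""

    def __init__(self) -> None:
        self._heap: List[Tuple[datetime, str]] = []  # (expires_at, story_id)

    def schedule(self, story_id: str, expires_at: datetime) -> None:
        heapq.heappush(self._heap, (expires_at, story_id))

    def pop_due(self, now: datetime) -> List[Dict[str, str]]:
        """Remove every story whose TTL has elapsed and return its event."""
        events = []
        while self._heap and self._heap[0][0] <= now:
            _, story_id = heapq.heappop(self._heap)
            events.append({"type": "story_expired", "story_id": story_id})
        return events
```

Each poll only touches stories that are actually due, so the cost of a sweep grows with the number of expirations rather than with the number of live stories.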
Event‑Driven Cleanup Using Kafka
Kafka decouples lifecycle events from cleanup logic. Typical events include:
- story_created
- story_expired
- story_viewed
- story_reacted
A Media Cleanup Worker consumes expiration events and:
- Deletes media from object storage.
- Removes CDN references, so cached copies are no longer served.
Cleanup happens asynchronously, keeping user‑facing APIs fast.
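The worker's event handling can be sketched without a running broker. In the example below a plain iterable stands in for the Kafka topic, a dict for object storage, and a set for CDN cache entries; the `media_key` event field and the class name are assumptions for illustration. Kafka consumers commonly see the same message more than once, so the handler is written to be idempotent.

```python
from typing import Dict, Iterable, Set


class MediaCleanupWorker:
    """Consumes lifecycle events and removes media for expired stories."""

    def __init__(self, object_store: Dict[str, bytes], cdn_cache: Set[str]) -> None:
        self.object_store = object_store  # stand-in for an S3-style bucket
        self.cdn_cache = cdn_cache        # stand-in for cached CDN entries

    def handle(self, event: dict) -> bool:
        """Process one event; return True if it was an expiration event."""
        if event.get("type") != "story_expired":
            return False  # other event types belong to other consumers
        key = event["media_key"]  # hypothetical field carrying the storage key
        # Both removals succeed silently if the key is already gone,
        # so a redelivered event is harmless.
        self.object_store.pop(key, None)
        self.cdn_cache.discard(key)
        return True

    def run(self, events: Iterable[dict]) -> int:
        """Drain a batch of events and return how many expirations were handled."""
        return sum(self.handle(event) for event in events)
```

Because deletion is idempotent, the worker can commit offsets after processing and tolerate a crash in between without leaving the stores in an inconsistent state.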
Engagement Tracking (Views & Reactions)
User engagement is handled by separate services:
- View Tracking Service – tracks story views.
- Reaction Service – likes, emojis, and replies.
These services:
- Handle extremely high write throughput.
- Are eventually consistent.
- Do not impact feed read performance.
Engagement data is later aggregated for analytics.
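The later aggregation step might look like the following sketch, which folds view and reaction events into per-story summaries. The event fields (`viewer_id`, `reaction`) and the `EngagementAggregator` class are hypothetical; a real pipeline would run this off the event stream rather than in process memory.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Set


class EngagementAggregator:
    """Folds story_viewed / story_reacted events into per-story summaries."""

    def __init__(self) -> None:
        self._viewers: DefaultDict[str, Set[str]] = defaultdict(set)
        self._reactions: DefaultDict[str, Counter] = defaultdict(Counter)

    def apply(self, event: dict) -> None:
        story_id = event["story_id"]
        if event["type"] == "story_viewed":
            # A set deduplicates repeat views by the same user.
            self._viewers[story_id].add(event["viewer_id"])
        elif event["type"] == "story_reacted":
            self._reactions[story_id][event["reaction"]] += 1

    def summary(self, story_id: str) -> Dict[str, object]:
        return {
            "unique_views": len(self._viewers.get(story_id, ())),
            "reactions": dict(self._reactions.get(story_id, {})),
        }
```

Since the aggregator only reads events, it can lag behind the write path without affecting feed reads, which is what makes eventual consistency acceptable here.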
Final Thoughts
Story systems are deceptively complex. By designing explicitly for expiration, visibility, and scale, we can build systems that are resilient, efficient, and easy to evolve. This design follows patterns commonly used for ephemeral content at massive scale, as on platforms like Instagram and WhatsApp. Feedback and alternative approaches are welcome.