A System Design Deep Dive — Question by Question
Source: Dev.to
Introduction
A URL shortener seems deceptively simple — take a long URL, return a short one. But at scale, it hides some of the most fascinating distributed‑systems challenges in software engineering. This post walks through the real complexity, challenge by challenge.
Challenge 1: Scaling Under Heavy Traffic
Interview Question: When millions of users are simultaneously shortening URLs and millions more are clicking short links — how do you ensure the system stays fast and doesn’t become a bottleneck?
The naive approach is a single server handling everything. The moment traffic spikes, you hit a wall. The fix is horizontal scaling — a load balancer distributes incoming requests across multiple application servers.
But this raises an immediate follow‑up: what about the database?
Key Insight: Horizontal scaling solves app‑layer pressure, but the database becomes the next bottleneck if left as a single instance.
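As a minimal sketch of this setup, a single nginx instance can round-robin traffic across several stateless app servers. The hostnames and ports below are placeholders, not part of the original design:

```nginx
# Round-robin load balancing across three stateless app servers.
upstream app_servers {
    server app1.internal:8080;
    server app2.internal:8080;
    server app3.internal:8080;
}

server {
    listen 80;
    location / {
        # Every request is forwarded to one of the upstream servers,
        # so no single app node becomes the bottleneck.
        proxy_pass http://app_servers;
    }
}
```

Because the app servers hold no per-user state, any of them can serve any request; that statelessness is what makes this kind of scaling horizontal.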
Challenge 2: The Read/Write Imbalance
Interview Question: If all application servers point to one single database for both reads and writes — what happens under heavy read traffic? Redirects outnumber URL creation by roughly 100:1.
A URL shortener is an extremely read‑heavy system. For every person shortening a URL, roughly 100 people are clicking it. A single database will buckle under that read pressure.
Solution: Treat reads and writes differently. Most reads are for the same popular URLs repeatedly — which is exactly what caching is built for.
Key Insight: Reads and writes have fundamentally different patterns and must be architected independently. Caching is the most powerful lever for read‑heavy systems.
Challenge 3: Cache Misses and the Cold‑Start Problem
Interview Question: Your Redis cache is cold. You have 500 million unique short URLs — you can’t cache all of them. What stays in cache, and what happens when a miss falls through to the database?
Even with caching, misses happen. Every miss hits the database. The database therefore needs to be horizontally scalable too — which is why NoSQL databases like Cassandra or DynamoDB are popular here. They are designed to scale out across many nodes, handling reads across distributed partitions.
Key Insight: NoSQL provides horizontal scalability at the storage layer, acting as the safety net for cache misses at any scale.
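The read path described above (cache first, datastore as the safety net) can be sketched in a few lines of Python. Plain in-memory dicts stand in for Redis and the NoSQL store, and the short code `abc123` is invented for illustration:

```python
# Read path sketch: cache first, datastore on a miss, then backfill the
# cache. Plain dicts stand in for Redis and the NoSQL store.
cache = {}
db = {"abc123": "https://example.com/very/long/url"}

def resolve(short_code):
    if short_code in cache:
        return cache[short_code]          # hot path: no DB touch at all
    long_url = db.get(short_code)         # miss: the scalable store absorbs it
    if long_url is not None:
        cache[short_code] = long_url      # backfill so later readers hit cache
    return long_url
```

The first lookup for a code pays the database cost; every subsequent lookup is served from memory, which is exactly why the 100:1 read skew works in your favor.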
Challenge 4: Choosing the Right Cache‑Eviction Strategy
Interview Question: Your cache is full. A new URL needs space. Which entry do you evict — and does your algorithm reflect real‑world URL access patterns?
| Strategy | Drawback |
|---|---|
| FIFO | Evicts the oldest entry, ignores popularity and recency entirely |
| LFU | A viral URL from 3 months ago that is now dead stays in cache forever |
| LRU | A URL accessed 1M times but not hit in 2 hours gets evicted in favor of a rarely accessed recent one |
Optimal approach: Combine both frequency and recency — evict the entry that is infrequently accessed and hasn’t been accessed recently. This is the principle behind W‑TinyLFU, the eviction policy used in production by the Caffeine caching library (Redis ships a related approximated‑LFU mode, not W‑TinyLFU itself).
Key Insight: W‑TinyLFU (hybrid LFU + LRU) is the gold standard for cache eviction, combining frequency and recency for smarter decisions.
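To make the frequency-plus-recency idea concrete, here is a deliberately simplified toy policy in Python: it scores each entry by hit count divided by time since last access and evicts the lowest scorer. Real W-TinyLFU is far more sophisticated (count-min sketches, a windowed admission filter), so treat this purely as an illustration of the principle:

```python
import time

# Toy frequency+recency eviction (NOT real W-TinyLFU). Each entry tracks
# (value, hit count, last access time); the eviction victim is the entry
# with the lowest hits/age score, i.e. one that is neither frequently nor
# recently used.
class HybridCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # key -> [value, hits, last_access]

    def get(self, key):
        if key not in self.entries:
            return None
        entry = self.entries[key]
        entry[1] += 1                  # bump frequency
        entry[2] = time.monotonic()    # bump recency
        return entry[0]

    def put(self, key, value):
        if key not in self.entries and len(self.entries) >= self.capacity:
            now = time.monotonic()

            # Score grows with hit count and decays with time since last access.
            def score(k):
                _, hits, last = self.entries[k]
                return hits / (1.0 + (now - last))

            victim = min(self.entries, key=score)
            del self.entries[victim]
        self.entries[key] = [value, 1, time.monotonic()]
```

A viral-but-dead URL loses score as its last access recedes into the past, while a steadily popular one keeps a high score: exactly the two failure modes of pure LFU and pure LRU from the table above.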
Challenge 5: Unique ID Generation Across Distributed Nodes
Interview Question: Multiple application servers generate short codes simultaneously. How do you ensure no two servers generate the same short code for different URLs?
A central auto‑increment counter seems obvious — but it becomes a single point of failure. Master‑slave replication helps with availability, but async replication risks duplicate IDs being issued after a failover.
Follow‑up: Can you design the system so each node generates IDs independently without coordinating on every request?
Solution: Range‑Based ID Allocation.
- A counter service hands each node a range (e.g., Node A gets 1–1000, Node B gets 1001–2000).
- Each node generates IDs independently from its allocated range.
- When a node exhausts its range, it requests a new batch.
The counter service is called infrequently — not in the hot path. If it goes down briefly, nodes keep generating from their existing range. Gaps in the sequence don’t matter; short codes are opaque to users.
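A minimal sketch of range-based allocation, assuming a single in-process counter service. Class and method names here are invented for illustration:

```python
import threading

# Sketch of range-based ID allocation. The counter service hands out
# non-overlapping ranges; each app node draws IDs locally and only calls
# back when its batch is exhausted (rare, and off the request hot path).
class CounterService:
    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.next_start = 1
        self.lock = threading.Lock()

    def allocate_range(self):
        with self.lock:
            start = self.next_start
            self.next_start += self.batch_size
            return start, start + self.batch_size - 1  # inclusive range

class AppNode:
    def __init__(self, counter):
        self.counter = counter
        self.current = 0
        self.end = -1  # forces an allocation on first use

    def next_id(self):
        if self.current > self.end:  # batch exhausted: fetch a new range
            self.current, self.end = self.counter.allocate_range()
        nid = self.current
        self.current += 1
        return nid

counter = CounterService(batch_size=1000)
node_a, node_b = AppNode(counter), AppNode(counter)
print(node_a.next_id(), node_b.next_id())  # → 1 1001: the ranges never overlap
```

Because the ranges are disjoint, two nodes can never mint the same ID, and neither node talks to the counter service more than once per thousand URLs.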
Key Insight: Range‑based ID allocation decentralizes generation while maintaining global uniqueness — Flickr’s ticket servers popularized this pattern, and services like Twitter and Instagram run related decentralized ID schemes (Snowflake, sharded IDs) at scale.
Challenge 6: 301 vs 302 Redirects — A Business Decision
Interview Question: 301 is a permanent redirect — browsers cache it, reducing server load. But what does 301 silently break for businesses using your service?
- Once a browser caches a 301, it never contacts your servers again for that URL.
- Analytics die — you cannot track clicks, geography, device type, or referrer.
- URL updating breaks — if a business wants to change the destination mid‑campaign, users with cached 301s will never see the update.
Using 302 ensures every click hits your servers first. Yes, there is a small overhead, but for a service where analytics and flexibility are core value propositions, 302 is the only sensible choice.
Key Insight: 302 preserves analytics and URL mutability — essential for businesses running campaigns. The slight latency cost is worth the business value.
Challenge 7: Malicious URL Protection
Interview Question: A bad actor shortens a phishing URL. Millions of users click it. How do you protect users — and what about URLs that were clean when shortened but become malicious later?
A purely reactive approach leaves a dangerous time window.
Layered defense:
- Creation‑time check – query a third‑party malicious‑URL database (e.g., Google Safe Browsing API) before accepting the URL.
- Periodic re‑scanning – re‑check existing URLs on a schedule because clean URLs can turn malicious later.
- User reports + manual verification – a final safety net that blocks URLs flagged by the community or internal teams.
Key Insight: Defense in depth shrinks the harmful time window and provides multiple opportunities to block malicious destinations.
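The creation-time gate might look like the sketch below. `lookup_threat_feed` is a hypothetical wrapper around a real reputation service such as Google Safe Browsing, which in practice requires an API key and its own request format; a tiny local blocklist stands in for the API call:

```python
# Creation-time gate sketch. A local blocklist stands in for a real
# reputation API; both URLs here are invented examples.
BLOCKLIST = {"http://phish.example/login"}

def lookup_threat_feed(url):
    # Real implementation: HTTPS call to a threat-intelligence service.
    return url in BLOCKLIST

def shorten(url, generate_code=lambda: "abc123"):
    if lookup_threat_feed(url):
        raise ValueError("rejected: URL flagged as malicious")
    return generate_code()  # only clean URLs receive a short code
```

The other two layers (periodic re-scans and user reports) run out-of-band against the stored URL table, so they never add latency to the shorten or redirect paths.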
Challenge 8: The Thundering Herd / Cache Stampede
Interview Question: A celebrity tweets your short URL to 50 million followers simultaneously. The URL has just been created – the cache is cold. What happens to your database at that exact moment?
This is not a cold‑start problem:
- Cold start = cache empty, traffic arrives gradually → DB warms up slowly.
- Cache stampede = cache empty and millions of requests arrive at the same instant → the database is hammered in one shot.
Follow‑up 1 – How do you make only one request go to the database and make the rest wait for that result?
Solution: Cache Locking
- The first request misses the cache.
- It sets an `IN_PROGRESS` flag in Redis (or any distributed cache).
- It proceeds to query the database.
- All subsequent requests see the `IN_PROGRESS` flag and wait (e.g., poll or block).
- When the DB response returns, the first request:
  - populates the cache,
  - removes the flag,
  - returns the result.
- Waiting requests are then served instantly from the now‑filled cache.
Result: One DB hit instead of one million.
Follow‑up 2 – What if the first request crashes after setting the flag but before populating the cache?
Give the `IN_PROGRESS` flag a TTL (time‑to‑live) that is a little longer than the expected DB response time.

```text
TTL = DB_response_time + buffer

# Example:
DB_response_time ≈ 500 ms
buffer           ≈ 100 ms
TTL              = 600 ms
```
- If the request crashes, the flag expires automatically.
- No manual cleanup, no deadlocks.
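Putting the whole flow together, here is a runnable single-process sketch of cache locking with a TTL'd flag. A dict with expiry timestamps stands in for Redis, where the flag would really be set atomically with `SET key IN_PROGRESS NX PX 600`:

```python
import time

# Single-process sketch of cache locking. In production the IN_PROGRESS
# flag must be set atomically (Redis SET ... NX PX <ttl>) so two first
# requests cannot both claim the lock; this sketch is single-threaded.
cache = {}       # short_code -> (value, expires_at or None)
DB_CALLS = 0     # counts how often we actually hit the "database"

def _get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if expires_at is not None and time.monotonic() > expires_at:
        del cache[key]         # holder crashed: flag expired, treat as a miss
        return None
    return value

def resolve(short_code, db_lookup, ttl=0.6):
    while True:
        value = _get(short_code)
        if value == "IN_PROGRESS":
            time.sleep(0.01)   # another request is already querying the DB
            continue
        if value is not None:
            return value       # normal cache hit
        # First miss: claim the lock with a TTL, then make the one DB trip.
        cache[short_code] = ("IN_PROGRESS", time.monotonic() + ttl)
        result = db_lookup(short_code)
        cache[short_code] = (result, None)  # populate cache, clearing the flag
        return result

def fake_db_lookup(code):
    global DB_CALLS
    DB_CALLS += 1
    return "https://example.com/the/long/url"

resolve("abc123", fake_db_lookup)
resolve("abc123", fake_db_lookup)
print(DB_CALLS)  # → 1: the second call was served from the warm cache
```

If the lock holder dies after claiming the flag, `_get` simply stops returning `IN_PROGRESS` once the TTL passes, and the next request becomes the new holder; no cleanup job is needed.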
Key Insight: Cache locking with a TTL‑based expiry prevents a thundering herd without risking deadlock – a pattern used at Facebook, Twitter, and other large‑scale services.
Summary
| Area | Approach |
|---|---|
| App‑layer scaling | Horizontal scaling via load balancer |
| Database scaling | NoSQL with horizontal sharding |
| Cache eviction | W‑TinyLFU hybrid (frequency + recency) |
| Unique ID generation | Range‑based allocation across nodes |
| Counter availability | Infrequent calls; nodes keep issuing from their current range during outages |
| Redirect strategy | 302 for analytics & URL flexibility |
| Malicious URLs | Third‑party scanning, periodic re‑scanning, user reports |
| Cache stampede | Cache locking with TTL‑based expiry |
Final Thoughts
A URL shortener is one of the most deceptively deep system‑design problems. On the surface it looks like a simple key‑value store, but underneath it forces you to confront:
- Horizontal scaling
- Caching theory
- Distributed ID generation
- HTTP semantics
- Security
- Concurrency
The most important skill isn’t memorising answers; it’s questioning your own assumptions. Every solution uncovers new edge cases. That iterative thinking separates good system design from great system design.
Happy building!