A System Design Deep Dive — Question by Question
Source: Dev.to
Introduction
A URL shortener seems deceptively simple — take a long URL, return a short one. But at scale, it hides some of the most fascinating distributed‑systems challenges in software engineering. This post walks through the real complexity, challenge by challenge.
Challenge 1: Scaling Under Heavy Traffic
Interview Question:
When millions of users are simultaneously shortening URLs and millions more are clicking short links — how do you ensure the system stays fast and doesn’t become a bottleneck?
Naïve Approach
A single server handles everything. The moment traffic spikes, you hit a wall.
Correct Approach
Horizontal scaling – a load balancer distributes incoming requests across multiple application servers.
But this raises an immediate follow‑up: what about the database?
Key Insight
Horizontal scaling solves the application‑layer pressure, but the database becomes the next bottleneck if left as a single instance.
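As a rough illustration of what the load balancer does, here is a minimal round-robin sketch. The class name `RoundRobinBalancer` and the server names are made up for this example; real load balancers also handle health checks and weighting, which are left out here.

```python
import itertools


class RoundRobinBalancer:
    """Cycles through a fixed list of application servers (illustrative only)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        # Each call returns the next server in order, wrapping around at the end.
        return next(self._cycle)


# Hypothetical server names, used only for the demonstration.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(6)]
```

Six requests are spread evenly, two per server. Nothing in this sketch touches the database, which is why the database becomes the next concern.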
Challenge 2 – The Read/Write Imbalance
Interview question
If all application servers point to a single database for both reads and writes, what happens under heavy read traffic?
- In a URL shortener, reads typically far outnumber writes; a ratio of roughly 100:1 is a common working assumption.
- A single database will quickly buckle under that read pressure.
Solution
Treat reads and writes differently.
- Most reads target the same popular URLs repeatedly → perfect for caching.
- Offload those reads to a cache layer (e.g., Redis, Memcached, CDN) and keep the database for writes and cache misses.
Key Insight
Reads and writes exhibit fundamentally different access patterns and should be architected independently.
Caching is the most powerful lever for read‑heavy systems.
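The read path described above is often called cache-aside. The sketch below assumes plain dictionaries as stand-ins for both the cache (Redis or Memcached in practice) and the database; `CacheAsideStore` and its method names are invented for illustration.

```python
class CacheAsideStore:
    """Reads try the cache first; only misses reach the database."""

    def __init__(self, database):
        self.database = database  # stand-in for the real database
        self.cache = {}           # stand-in for Redis or Memcached
        self.db_reads = 0         # counts how often a read fell through

    def shorten(self, short_code, long_url):
        # Writes always go to the database, the source of truth.
        self.database[short_code] = long_url

    def resolve(self, short_code):
        if short_code in self.cache:
            return self.cache[short_code]
        # Cache miss: read from the database and remember the answer.
        self.db_reads += 1
        long_url = self.database.get(short_code)
        if long_url is not None:
            self.cache[short_code] = long_url
        return long_url


store = CacheAsideStore({})
store.shorten("abc123", "https://example.com/a/very/long/path")
first = store.resolve("abc123")   # miss, falls through to the database
second = store.resolve("abc123")  # hit, served from the cache
```

Only the first lookup touches the database; repeated lookups of a popular code stay in the cache.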
Challenge 3 – Cache Misses and the Cold‑Start Problem
Interview Question:
Your Redis cache is cold. You have 500 million unique short URLs—you can’t cache all of them. What stays in the cache, and what happens when a miss falls through to the database?
What Happens on a Miss
- Every cache miss results in a read from the backing store.
- The database must therefore be horizontally scalable to handle the additional load.
Why NoSQL Is a Good Fit
- Horizontal scalability: Systems such as Cassandra or DynamoDB can add nodes to increase capacity without downtime.
- Distributed partitions: Reads and writes are spread across many machines, preventing a single point of contention.
- High availability: Replication and automatic fail‑over keep the service alive even when individual nodes fail.
Key Insight
NoSQL databases act as the safety net for cache misses. They provide the storage‑layer scalability needed to sustain traffic when the cache is cold or when items are evicted.
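To make "distributed partitions" concrete, the sketch below maps a short code to a partition by hashing it. This is a simplified assumption: `partition_for` is a hypothetical helper, and production stores generally use consistent hashing or a similar scheme so that adding a node moves only a fraction of the keys, which plain modulo placement does not achieve.

```python
import hashlib


def partition_for(short_code, num_partitions):
    """Maps a short code to a partition index in [0, num_partitions)."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same code always lands on the same partition.
home = partition_for("abc123", 4)
# Many different codes spread across all partitions.
used = {partition_for(f"code{i}", 4) for i in range(1000)}
```

Because the mapping depends only on the key, any application server can work out which partition holds a code without asking a coordinator.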
Challenge 4 – Choosing the Right Cache‑Eviction Strategy
Interview Question
Your cache is full. A new URL needs space. Which entry do you evict — and does your algorithm reflect real‑world URL access patterns?
| Strategy | Drawback |
|---|---|
| FIFO | Evicts the oldest entry, ignoring both popularity and recency. |
| LFU | A viral URL from 3 months ago that is now dead can stay in the cache forever. |
| LRU | A URL accessed 1 M times but not hit in 2 hours may be evicted in favor of a rarely accessed, recent entry. |
Optimal approach
Combine frequency and recency: evict the entry that is infrequently accessed and hasn’t been accessed recently.
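This is the principle behind W‑TinyLFU, a hybrid policy that pairs a small LRU admission window with a frequency-based filter. It is best known from the Caffeine caching library for Java. Redis does not implement W‑TinyLFU; it offers approximated LRU and LFU policies, and its LFU counters decay over time so that stale popularity fades.
Key Insight
W‑TinyLFU is widely regarded as a strong default for cache eviction because it weighs both how often and how recently items are accessed.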
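The toy cache below shows the idea of combining frequency and recency. It is not W‑TinyLFU: it simply evicts the entry with the fewest hits, breaks ties by oldest access, and halves all counters on demand so that old popularity fades. The class and method names are invented for this sketch.

```python
import itertools


class FrequencyRecencyCache:
    """Toy eviction: fewest hits first, oldest access as the tie-breaker."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._values = {}
        self._hits = {}
        self._last_used = {}
        self._clock = itertools.count()  # logical time, not wall-clock time

    def _touch(self, key):
        self._hits[key] = self._hits.get(key, 0) + 1
        self._last_used[key] = next(self._clock)

    def get(self, key):
        if key not in self._values:
            return None
        self._touch(key)
        return self._values[key]

    def put(self, key, value):
        if key not in self._values and len(self._values) >= self.capacity:
            self._evict()
        self._values[key] = value
        self._touch(key)

    def _evict(self):
        victim = min(
            self._values,
            key=lambda k: (self._hits[k], self._last_used[k]),
        )
        for table in (self._values, self._hits, self._last_used):
            del table[victim]

    def age(self):
        # Halve every counter so a once-viral URL does not stay forever.
        for key in self._hits:
            self._hits[key] //= 2


cache = FrequencyRecencyCache(capacity=2)
cache.put("viral", "https://example.com/viral")
for _ in range(5):
    cache.get("viral")
cache.put("rare", "https://example.com/rare")
cache.put("new", "https://example.com/new")  # evicts "rare", not "viral"
```

Calling `age()` periodically addresses the LFU drawback from the table: once a formerly viral entry stops being read, its count shrinks until fresher entries outrank it.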
Challenge 5: Unique ID Generation Across Distributed Nodes
Interview Question
Multiple application servers generate short codes simultaneously. How do you ensure no two servers generate the same short code for different URLs?
- A central auto‑increment counter seems obvious — but it becomes a single point of failure.
- Master‑slave replication helps with availability, but asynchronous replication risks duplicate IDs being issued after a fail‑over.
Follow‑up
Can you design the system so each node generates IDs independently without coordinating on every request?
Solution: Range‑Based ID Allocation
- Allocate a range – A dedicated counter service hands each node a block of IDs, e.g.:
  - Node A → IDs 1–1000
  - Node B → IDs 1001–2000
- Generate locally – Each node creates short codes from its own range without contacting the counter service.
- Re‑request when exhausted – When a node runs out of IDs, it asks the counter service for a new batch.
Benefits
| Benefit | Explanation |
|---|---|
| No hot‑path dependency | The counter service is called only when a range is depleted, not for every ID request. |
| Resilience | If the counter service is temporarily unavailable, nodes continue using their existing ranges. |
| Scalability | Adding more nodes is just a matter of assigning them new ranges. |
| Acceptable gaps | Gaps in the sequence are harmless because short codes are opaque to users. |
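The allocation steps above can be sketched as follows. `CounterService`, `Node`, and `to_base62` are hypothetical names, the counter lives in memory rather than in a durable store, and base62 is assumed as the encoding for turning a numeric ID into a short code.

```python
import string
import threading

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(number):
    """Encodes a non-negative integer using 62 URL-safe symbols."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class CounterService:
    """Hands out non-overlapping ID blocks (in-memory stand-in)."""

    def __init__(self, block_size=1000):
        self._next_start = 1
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate_block(self):
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return start, start + self._block_size - 1  # inclusive range


class Node:
    """Generates short codes locally from its current block.

    Not thread-safe; a real node would guard next_short_code with a lock.
    """

    def __init__(self, counter_service):
        self._counter_service = counter_service
        self._next_id, self._last_id = counter_service.allocate_block()

    def next_short_code(self):
        if self._next_id > self._last_id:
            # Block exhausted: the only time the counter service is called.
            self._next_id, self._last_id = (
                self._counter_service.allocate_block()
            )
        current = self._next_id
        self._next_id += 1
        return to_base62(current)


service = CounterService(block_size=1000)
node_a = Node(service)  # receives IDs 1 to 1000
node_b = Node(service)  # receives IDs 1001 to 2000
```

If a node crashes with part of its block unused, those IDs are simply never issued, which is the harmless gap mentioned in the table.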
Key Insight
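Range‑based ID allocation decentralizes ID generation while guaranteeing global uniqueness, provided the counter service never hands out the same range twice. Large services often use related coordination-free schemes; Twitter's Snowflake, for example, builds each ID from a timestamp, a worker ID, and a per-worker sequence number rather than from pre-allocated ranges.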
Challenge 6: 301 vs 302 Redirects — A Business Decision
Interview Question
“301 is a permanent redirect — browsers cache it, reducing server load. But what does 301 silently break for businesses using your service?”
Why a 301 can be problematic
- Once a browser caches a 301, it can keep following the redirect from its local cache without contacting your servers again, often until the cache is cleared.
- Analytics die – you lose click counts, geography, device type, and referrer data.
- URL updating breaks – if a business changes the destination mid‑campaign, users with a cached 301 will never see the new URL.
Why a 302 is often the better choice
- Every click reaches your server, preserving real‑time analytics.
- Destination URLs can be changed at any time without worrying about cached redirects.
- The additional round‑trip latency is minimal compared with the business value of accurate tracking and flexibility.
Key Insight
A 302 redirect retains analytics and URL mutability—critical for businesses running campaigns. The slight latency cost is worth the business value.
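A minimal sketch of the 302 path, assuming a dictionary of destinations and a list as the click log. `build_redirect` is a hypothetical helper that returns a status code and headers instead of writing to a socket, and the explicit `Cache-Control: no-store` header is an extra precaution chosen for this example.

```python
def build_redirect(short_code, destinations, click_log, user_agent=""):
    """Returns (status, headers) for a short-link request."""
    long_url = destinations.get(short_code)
    if long_url is None:
        return 404, {}
    # Every click reaches the server, so it can be recorded.
    click_log.append({"code": short_code, "user_agent": user_agent})
    headers = {
        "Location": long_url,
        # Ask clients and intermediaries not to store the response.
        "Cache-Control": "no-store",
    }
    return 302, headers


destinations = {"promo": "https://example.com/spring-sale"}
clicks = []
status, headers = build_redirect("promo", destinations, clicks, "demo-agent")

# The business changes the destination mid-campaign.
destinations["promo"] = "https://example.com/summer-sale"
_, updated_headers = build_redirect("promo", destinations, clicks, "demo-agent")
```

The second call returns the new destination and adds a second entry to the click log, which is exactly what a cached 301 would bypass.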
Challenge 7: Malicious URL Protection
Interview Question: A bad actor shortens a phishing URL. Millions of users click it. How do you protect users — and what about URLs that were clean when shortened but become malicious later?
A purely reactive approach leaves a dangerous time window.
Layered defense
- Creation‑time check – Query a third‑party malicious‑URL database (e.g., Google Safe Browsing API) before accepting the URL.
- Periodic re‑scanning – Re‑check existing URLs on a schedule because clean URLs can turn malicious later.
- User reports + manual verification – A final safety net that blocks URLs flagged by the community or internal teams.
Key Insight: Defense in depth shrinks the harmful time window and provides multiple opportunities to block malicious destinations.
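The three layers can be sketched as one small class. Everything here is illustrative: `LinkGuard` is an invented name, and `is_malicious` stands in for whatever threat-intelligence lookup is used; it does not reproduce the Google Safe Browsing API.

```python
from typing import Callable, Dict, List, Optional, Set


class LinkGuard:
    """Creation-time check, periodic re-scan, and manual blocking."""

    def __init__(self, is_malicious: Callable[[str], bool]):
        self._is_malicious = is_malicious  # hypothetical threat lookup
        self._links: Dict[str, str] = {}
        self._blocked: Set[str] = set()

    def create(self, code: str, url: str) -> bool:
        # Layer 1: refuse URLs that are already known to be malicious.
        if self._is_malicious(url):
            return False
        self._links[code] = url
        return True

    def rescan(self) -> List[str]:
        # Layer 2: re-check stored URLs; run this on a schedule.
        newly_flagged = [
            code
            for code, url in self._links.items()
            if code not in self._blocked and self._is_malicious(url)
        ]
        self._blocked.update(newly_flagged)
        return newly_flagged

    def block_after_review(self, code: str) -> None:
        # Layer 3: a reviewer has confirmed a user report.
        self._blocked.add(code)

    def resolve(self, code: str) -> Optional[str]:
        if code in self._blocked:
            return None
        return self._links.get(code)


threat_feed = {"http://phish.example/login"}
guard = LinkGuard(lambda url: url in threat_feed)
rejected = guard.create("bad1", "http://phish.example/login")
guard.create("ok1", "http://shop.example/sale")

# Later, the previously clean destination turns malicious.
threat_feed.add("http://shop.example/sale")
flagged = guard.rescan()
```

How often `rescan` runs determines how long a newly malicious link stays live, so the schedule is a direct trade-off between scanning cost and exposure.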
Challenge 8 – The Thundering Herd / Cache Stampede
Interview question
A celebrity tweets your short‑URL to 50 million followers at the same time. The URL has just been created, so the cache is cold. What happens to your database at that exact moment?
- This is not a cold‑start problem.
- Cold start = cache empty, traffic arrives gradually → the cache warms up slowly while the DB absorbs a manageable trickle of misses.
- Cache stampede = cache empty and millions hit simultaneously → the DB is hammered in one shot.
Follow‑up 1 – How do you ensure that only one request hits the database while the rest wait for that result?
Solution: Cache locking (a “single‑flight” pattern)
First request
- Misses the cache.
- Sets an IN_PROGRESS flag in a distributed cache (e.g., Redis).
- Queries the database.
Subsequent requests
- See the IN_PROGRESS flag.
- Wait (poll, block, or use a pub/sub notification).
When the DB response returns
- The first request populates the cache with the result.
- Removes the IN_PROGRESS flag.
- Returns the result to the client.
Waiting requests are then served instantly from the now‑filled cache.
Result: One DB hit instead of millions.
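The flow above can be sketched inside a single process with threads standing in for concurrent requests. This is an assumption made to keep the example runnable: a real deployment would keep the flag in a shared store such as Redis, and `SingleFlightCache` is an invented name.

```python
import threading
import time


class SingleFlightCache:
    """Lets one caller load a missing key while the others wait."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._cache = {}
        self._in_progress = {}  # key -> Event, plays the IN_PROGRESS flag
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._in_progress.get(key)
            is_leader = event is None
            if is_leader:
                event = threading.Event()
                self._in_progress[key] = event
        if is_leader:
            try:
                value = self._load_from_db(key)
                with self._lock:
                    self._cache[key] = value
            finally:
                # Clear the flag and wake the waiters, even on failure.
                with self._lock:
                    del self._in_progress[key]
                event.set()
            return value
        event.wait()
        # Served from the cache now, or retried if the leader failed.
        return self.get(key)


db_calls = []


def slow_lookup(key):
    db_calls.append(key)
    time.sleep(0.05)  # simulate database latency
    return "https://example.com/landing"


cache = SingleFlightCache(slow_lookup)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("promo")))
    for _ in range(50)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Fifty concurrent lookups produce one call to `slow_lookup`; the other forty-nine wait on the event and then read the cached value.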
Follow‑up 2 – What if the first request crashes after setting the flag but before populating the cache?
Give the IN_PROGRESS flag a TTL (time‑to‑live) that is slightly longer than the expected DB response time:
TTL = DB_response_time + safety_buffer
Example: DB response ≈ 500 ms, buffer ≈ 100 ms, so TTL = 600 ms.
- If the request crashes, the flag expires automatically, and one of the waiting requests can then set the flag and retry the database query.
- No manual cleanup is required, and no deadlock occurs.
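The expiry behaviour can be sketched with an in-memory flag table and an injectable clock, which makes the timing easy to follow. `ExpiringFlags` is a hypothetical class for illustration; with Redis, the usual equivalent is a single `SET key value NX PX <milliseconds>` command, which sets the flag only if it is absent and attaches the TTL atomically.

```python
import time


class ExpiringFlags:
    """Set-if-absent flags that expire on their own (not thread-safe)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires_at = {}

    def try_acquire(self, key, ttl_seconds):
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # someone else holds a live flag
        self._expires_at[key] = now + ttl_seconds
        return True

    def release(self, key):
        self._expires_at.pop(key, None)


# A fake clock so the example does not need to sleep.
now = [0.0]
flags = ExpiringFlags(clock=lambda: now[0])

ttl = 0.5 + 0.1  # DB response time plus safety buffer, in seconds
first = flags.try_acquire("promo", ttl)   # the first request sets the flag
second = flags.try_acquire("promo", ttl)  # a waiter is refused

# The first request crashes and never calls release(); time passes.
now[0] = 0.7
third = flags.try_acquire("promo", ttl)   # the flag expired, a waiter takes over
```

Choosing the TTL is a trade-off: too short and a slow but healthy query loses its flag while still running, too long and waiters stall after a crash.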
Key insight
Cache locking with a TTL‑based expiry prevents a thundering herd while avoiding deadlocks. Facebook has described a closely related lease mechanism for its memcached deployment.
Summary
| Area | Approach |
|---|---|
| App‑layer scaling | Horizontal scaling via load balancer |
| Database scaling | NoSQL with horizontal sharding |
| Cache eviction | W‑TinyLFU hybrid (frequency + recency) |
| Unique ID generation | Range‑based allocation across nodes |
| Counter availability | Counter service is called only when a range runs out; nodes keep serving from their current range during short outages |
| Redirect strategy | 302 for analytics & URL flexibility |
| Malicious URLs | Third‑party scanning, periodic re‑check, reactive blocking |
| Cache stampede | Cache locking with TTL‑based expiry |
Final Thoughts
A URL shortener is one of the most deceptively deep system‑design problems. On the surface it looks like a simple key‑value store, but underneath it forces you to confront:
- Horizontal scaling
- Caching theory
- Distributed ID generation
- HTTP semantics
- Security
- Concurrency
The most important skill isn’t memorising answers; it’s questioning your own assumptions. Every solution uncovers new edge cases. That iterative thinking separates good system design from great system design.
Happy building!