A System Design Deep Dive — Question by Question

Published: March 10, 2026 at 11:06 PM EDT
8 min read
Source: Dev.to

Introduction

A URL shortener seems deceptively simple — take a long URL, return a short one. But at scale, it hides some of the most fascinating distributed‑systems challenges in software engineering. This post walks through the real complexity, challenge by challenge.


Challenge 1: Scaling Under Heavy Traffic

Interview Question: When millions of users are simultaneously shortening URLs and millions more are clicking short links — how do you ensure the system stays fast and doesn’t become a bottleneck?

The naive approach is a single server handling everything. The moment traffic spikes, you hit a wall. The fix is horizontal scaling — a load balancer distributes incoming requests across multiple application servers.

But this raises an immediate follow‑up: what about the database?

Key Insight: Horizontal scaling solves app‑layer pressure, but the database becomes the next bottleneck if left as a single instance.
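As a toy illustration of the distribution step, here is a minimal round‑robin dispatcher (real deployments use a dedicated load balancer such as nginx or HAProxy; the server names are made up):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy round-robin dispatcher: each request goes to the next
    server in the pool, spreading load evenly."""

    def __init__(self, servers):
        self._pool = cycle(servers)

    def next_server(self):
        return next(self._pool)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.next_server() for _ in range(4)])
# → ['app-1', 'app-2', 'app-3', 'app-1']
```

Production balancers add health checks and weighting, but the core idea is exactly this rotation.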


Challenge 2: The Read/Write Imbalance

Interview Question: If all application servers point to one single database for both reads and writes — what happens under heavy read traffic? Redirects outnumber URL creation by roughly 100:1.

A URL shortener is an extremely read‑heavy system. For every person shortening a URL, roughly 100 people are clicking it. A single database will buckle under that read pressure.

Solution: Treat reads and writes differently. Most reads are for the same popular URLs repeatedly — which is exactly what caching is built for.

Key Insight: Reads and writes have fundamentally different patterns and must be architected independently. Caching is the most powerful lever for read‑heavy systems.
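The read path described above is the classic cache‑aside pattern. A minimal sketch, using plain dicts as stand‑ins for Redis and the URL database (the short code "abc123" is hypothetical):

```python
# Plain dicts stand in for Redis and the URL database.
cache = {}
database = {"abc123": "https://example.com/very/long/url"}

def resolve(short_code):
    url = cache.get(short_code)
    if url is not None:
        return url                     # cache hit: the common, fast path
    url = database.get(short_code)     # cache miss: one database read
    if url is not None:
        cache[short_code] = url        # warm the cache for later clicks
    return url
```

After the first lookup of a popular code, every subsequent click is served from memory and never touches the database.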


Challenge 3: Cache Misses and the Cold‑Start Problem

Interview Question: Your Redis cache is cold. You have 500 million unique short URLs — you can’t cache all of them. What stays in cache, and what happens when a miss falls through to the database?

Even with caching, misses happen. Every miss hits the database. The database therefore needs to be horizontally scalable too — which is why NoSQL databases like Cassandra or DynamoDB are popular here. They are designed to scale out across many nodes, handling reads across distributed partitions.

Key Insight: NoSQL provides horizontal scalability at the storage layer, acting as the safety net for cache misses at any scale.


Challenge 4: Choosing the Right Cache‑Eviction Strategy

Interview Question: Your cache is full. A new URL needs space. Which entry do you evict — and does your algorithm reflect real‑world URL access patterns?

| Strategy | Drawback |
| --- | --- |
| FIFO | Evicts the oldest entry; ignores popularity and recency entirely |
| LFU | A viral URL from 3 months ago that is now dead stays in cache forever |
| LRU | A URL accessed 1M times but not hit in 2 hours gets evicted over a rarely accessed recent one |

Optimal approach: Combine frequency and recency; evict the entry that is infrequently accessed and hasn't been accessed recently. This is the principle behind W‑TinyLFU, the admission policy used in production by the Caffeine caching library (Redis ships a related approximated‑LFU eviction mode).

Key Insight: W‑TinyLFU (hybrid LFU + LRU) is the gold standard for cache eviction, combining frequency and recency for smarter decisions.
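A real W‑TinyLFU keeps approximate counts in a count‑min sketch and a segmented LRU window; the toy cache below only captures the core idea of scoring entries by both hit count and recency:

```python
class HybridCache:
    """Toy cache whose eviction scores entries by hit count first and
    recency second. Exact per-key stats are used here for clarity;
    W-TinyLFU approximates them to save memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}    # key -> value
        self.stats = {}   # key -> (hit_count, last_access_tick)
        self.clock = 0    # logical clock avoids wall-time flakiness

    def _touch(self, key):
        self.clock += 1
        count, _ = self.stats.get(key, (0, 0))
        self.stats[key] = (count + 1, self.clock)

    def get(self, key):
        if key in self.data:
            self._touch(key)
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Victim: fewest hits, ties broken by least recent access.
            victim = min(self.data, key=lambda k: self.stats[k])
            del self.data[victim]
            del self.stats[victim]
        self.data[key] = value
        self._touch(key)
```

A frequently hit entry survives a burst of one‑off keys, while a once‑viral but now‑dead entry eventually loses on recency.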


Challenge 5: Unique ID Generation Across Distributed Nodes

Interview Question: Multiple application servers generate short codes simultaneously. How do you ensure no two servers generate the same short code for different URLs?

A central auto‑increment counter seems obvious — but it becomes a single point of failure. Master‑slave replication helps with availability, but async replication risks duplicate IDs being issued after a failover.

Follow‑up: Can you design the system so each node generates IDs independently without coordinating on every request?

Solution: Range‑Based ID Allocation.

  1. A counter service hands each node a range (e.g., Node A gets 1–1000, Node B gets 1001–2000).
  2. Each node generates IDs independently from its allocated range.
  3. When a node exhausts its range, it requests a new batch.

The counter service is called infrequently — not in the hot path. If it goes down briefly, nodes keep generating from their existing range. Gaps in the sequence don’t matter; short codes are opaque to users.

Key Insight: Range‑based ID allocation decentralizes generation while maintaining global uniqueness. Flickr's ticket servers popularized this pattern; systems like Twitter's Snowflake solve the same problem with timestamp‑based IDs instead.
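The three steps above can be sketched in a single process (in production the counter service would be a separate, replicated component; base‑62 encoding turns the numeric ID into the short code):

```python
import string
import threading

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n):
    """Render a numeric ID as an opaque base-62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

class CounterService:
    """Hands out non-overlapping ranges; contacted only when a node
    exhausts its batch, so it stays off the per-request hot path."""

    def __init__(self, batch_size=1000):
        self.next_start = 1
        self.batch_size = batch_size
        self.lock = threading.Lock()

    def allocate(self):
        with self.lock:
            start = self.next_start
            self.next_start += self.batch_size
            return start, start + self.batch_size - 1

class Node:
    """An application server generating IDs from its private range."""

    def __init__(self, counter):
        self.counter = counter
        self.current, self.end = 1, 0   # empty range: allocate on first use

    def next_id(self):
        if self.current > self.end:
            self.current, self.end = self.counter.allocate()
        nid = self.current
        self.current += 1
        return nid
```

Two nodes sharing one counter never collide, and a node only talks to the counter once per thousand IDs.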


Challenge 6: 301 vs 302 Redirects — A Business Decision

Interview Question: 301 is a permanent redirect — browsers cache it, reducing server load. But what does 301 silently break for businesses using your service?

  • Once a browser caches a 301, it never contacts your servers again for that URL.
  • Analytics die — you cannot track clicks, geography, device type, or referrer.
  • URL updating breaks — if a business wants to change the destination mid‑campaign, users with cached 301s will never see the update.

Using 302 ensures every click hits your servers first. Yes, there is a small overhead, but for a service where analytics and flexibility are core value propositions, 302 is the only sensible choice.

Key Insight: 302 preserves analytics and URL mutability — essential for businesses running campaigns. The slight latency cost is worth the business value.
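A minimal redirect endpoint using only Python's standard library, returning 302 so every click reaches the server (the mapping and port are illustrative):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

SHORT_URLS = {"/abc123": "https://example.com/landing-page"}  # illustrative

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = SHORT_URLS.get(self.path)
        if target is None:
            self.send_error(404)
            return
        # 302 (Found): browsers re-request every time, so each click is
        # visible for analytics and the destination stays changeable.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch

# To run: HTTPServer(("127.0.0.1", 8080), RedirectHandler).serve_forever()
```

Switching the constant 302 to 301 is all it takes to lose the analytics stream, which is why the status code is a product decision, not just an HTTP detail.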


Challenge 7: Malicious URL Protection

Interview Question: A bad actor shortens a phishing URL. Millions of users click it. How do you protect users — and what about URLs that were clean when shortened but become malicious later?

A purely reactive approach leaves a dangerous time window.

Layered defense:

  1. Creation‑time check – query a third‑party malicious‑URL database (e.g., Google Safe Browsing API) before accepting the URL.
  2. Periodic re‑scanning – re‑check existing URLs on a schedule because clean URLs can turn malicious later.
  3. User reports + manual verification – a final safety net that blocks URLs flagged by the community or internal teams.

Key Insight: Defense in depth shrinks the harmful time window and provides multiple opportunities to block malicious destinations.
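The three layers can be sketched as a pipeline feeding one blocklist. `query_safe_browsing` below is a placeholder predicate, not the real Google Safe Browsing API:

```python
BLOCKLIST = set()   # fed by re-scans, user reports, and manual review

def query_safe_browsing(url):
    """Placeholder for a third-party reputation API call."""
    return "phishing" in url

def accept_url(url):
    """Layer 1: refuse known-bad URLs at creation time."""
    return url not in BLOCKLIST and not query_safe_browsing(url)

def rescan(known_urls):
    """Layer 2: periodically re-check, since clean URLs can turn bad."""
    for url in known_urls:
        if query_safe_browsing(url):
            BLOCKLIST.add(url)

def report(url):
    """Layer 3: community flagging feeds the same blocklist."""
    BLOCKLIST.add(url)
```

Each layer writes into the same blocklist that the redirect path consults, so a URL blocked by any layer stops resolving immediately.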

Challenge 8: The Thundering Herd / Cache Stampede

Interview Question: A celebrity tweets your short URL to 50 million followers simultaneously. The URL has just been created – the cache is cold. What happens to your database at that exact moment?

  • This is not a cold‑start problem.
  • Cold start = cache empty, traffic arrives gradually → DB warms up slowly.
  • Cache stampede = cache empty and millions hit at the same instant → the database is hammered in one shot.

Follow‑up 1 – How do you make only one request go to the database and make the rest wait for that result?

Solution: Cache Locking

  1. The first request misses the cache.
  2. It sets an IN_PROGRESS flag in Redis (or any distributed cache).
  3. It proceeds to query the database.
  4. All subsequent requests see the IN_PROGRESS flag and wait (e.g., poll or block).
  5. When the DB response returns, the first request:
    • populates the cache,
    • removes the flag,
    • returns the result.
  6. Waiting requests are then served instantly from the now‑filled cache.

Result: One DB hit instead of one million.


Follow‑up 2 – What if the first request crashes after setting the flag but before populating the cache?

Give the IN_PROGRESS flag a TTL (time‑to‑live) that is a little longer than the expected DB response time.

TTL = DB_response_time + buffer

Example: DB response ≈ 500 ms, buffer ≈ 100 ms, so TTL = 600 ms.
  • If the request crashes, the flag expires automatically.
  • No manual cleanup, no deadlocks.

Key Insight: Cache locking with TTL‑based expiry prevents a thundering herd without risking deadlock – Facebook's memcache "leases" are a well‑documented production example of the same pattern.


Summary

| Area | Approach |
| --- | --- |
| App‑layer scaling | Horizontal scaling via load balancer |
| Database scaling | NoSQL with horizontal sharding |
| Cache eviction | W‑TinyLFU hybrid (frequency + recency) |
| Unique ID generation | Range‑based allocation across nodes |
| Counter availability | Infrequent calls + node breathing room |
| Redirect strategy | 302 for analytics & URL flexibility |
| Malicious URLs | 3rd‑party scanning, periodic re‑check, reactive blocking |
| Cache stampede | Cache locking with TTL‑based expiry |

Final Thoughts

A URL shortener is one of the most deceptively deep system‑design problems. On the surface it looks like a simple key‑value store, but underneath it forces you to confront:

  • Horizontal scaling
  • Caching theory
  • Distributed ID generation
  • HTTP semantics
  • Security
  • Concurrency

The most important skill isn’t memorising answers; it’s questioning your own assumptions. Every solution uncovers new edge cases. That iterative thinking separates good system design from great system design.

Happy building!
