A System Design Deep Dive — Question by Question

Published: 1 month ago (March 10, 2026 at 11:06 PM EDT)

9 min read

Source: Dev.to

Introduction

A URL shortener looks simple: take a long URL, return a short one. At scale, though, it hides some of the most interesting distributed-systems challenges in software engineering. This post walks through that complexity, challenge by challenge.

Challenge 1: Scaling Under Heavy Traffic

Interview Question:
When millions of users are simultaneously shortening URLs and millions more are clicking short links — how do you ensure the system stays fast and doesn’t become a bottleneck?

Naïve Approach

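A single server handles everything. The moment traffic spikes, that one machine's CPU, memory, and connection limits cap your throughput, and it is also a single point of failure.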

Correct Approach

Horizontal scaling – a load balancer distributes incoming requests across multiple application servers.
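As a rough illustration of what the load balancer does, here is a minimal round-robin sketch. The class name and the server labels (`app-1` and so on) are made up for this example, and real load balancers add health checks, weighting, and connection tracking that are left out here.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backend servers in strict rotation (hypothetical helper)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # itertools.cycle repeats the list forever, one item per call to next().
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        # Each incoming request is routed to the next server in the rotation.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(6)]
```

Six requests end up spread evenly across the three servers, which is the property that lets you add servers to absorb more traffic.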

But this raises an immediate follow‑up: what about the database?

Key Insight

Horizontal scaling solves the application‑layer pressure, but the database becomes the next bottleneck if left as a single instance.

Challenge 2 – The Read/Write Imbalance

Interview question
If all application servers point to a single database for both reads and writes, what happens under heavy read traffic?

In a URL‑shortener, reads outnumber writes by roughly 100 : 1.
A single database will quickly buckle under that read pressure.

Solution

Treat reads and writes differently.

Most reads target the same popular URLs repeatedly → perfect for caching.
Offload those reads to a cache layer (e.g., Redis, Memcached, CDN) and keep the database for writes and cache misses.
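The read path described above is often called cache-aside. The sketch below assumes plain dictionaries in place of both Redis and the database, and the `UrlResolver` name is invented for illustration; it only shows the order of operations, not a production client.

```python
class UrlResolver:
    """Cache-aside lookup: try the cache first, fall back to the database."""

    def __init__(self, database):
        self.database = database  # dict standing in for the persistent store
        self.cache = {}           # dict standing in for Redis or Memcached
        self.db_reads = 0         # counts how often we fall through to the database

    def resolve(self, short_code):
        # 1. Cache hit: no database work at all.
        if short_code in self.cache:
            return self.cache[short_code]
        # 2. Cache miss: read from the database.
        self.db_reads += 1
        long_url = self.database.get(short_code)
        # 3. Populate the cache so the next read for this code is a hit.
        if long_url is not None:
            self.cache[short_code] = long_url
        return long_url


resolver = UrlResolver({"abc123": "https://example.com/some/long/path"})
first = resolver.resolve("abc123")   # miss, goes to the database
second = resolver.resolve("abc123")  # hit, served from the cache
```

Only the first lookup touches the database; repeated clicks on a popular link are absorbed by the cache.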

Key Insight

Reads and writes exhibit fundamentally different access patterns and should be architected independently.
Caching is the most powerful lever for read‑heavy systems.

Challenge 3 – Cache Misses and the Cold‑Start Problem

Interview Question:
Your Redis cache is cold. You have 500 million unique short URLs—you can’t cache all of them. What stays in the cache, and what happens when a miss falls through to the database?

What Happens on a Miss

Every cache miss results in a read from the backing store.
The database must therefore be horizontally scalable to handle the additional load.

Why NoSQL Is a Good Fit

Horizontal scalability: Systems such as Cassandra or DynamoDB can add nodes to increase capacity without downtime.
Distributed partitions: Reads and writes are spread across many machines, preventing a single point of contention.
High availability: Replication and automatic fail‑over keep the service alive even when individual nodes fail.
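To see how a short code can be mapped to one of many storage partitions, consider this simplified sketch. It hashes the code and takes the result modulo a partition count. This is an assumption made for illustration: Cassandra and DynamoDB use their own partitioners, and real systems typically prefer consistent hashing so that adding a node moves only a fraction of the keys.

```python
import hashlib


def partition_for(short_code, partition_count):
    """Map a short code to a partition index in [0, partition_count)."""
    # A cryptographic hash spreads similar-looking codes evenly across partitions.
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the count.
    return int.from_bytes(digest[:8], "big") % partition_count


# Every lookup for the same code lands on the same partition.
same_partition = partition_for("abc123", 4) == partition_for("abc123", 4)

# Many different codes spread across all partitions.
used_partitions = {partition_for(f"code{i}", 4) for i in range(1000)}
```

Because the mapping depends only on the key, any application server can compute which partition to ask without a central lookup.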

Key Insight

NoSQL databases act as the safety net for cache misses. They provide the storage‑layer scalability needed to sustain traffic when the cache is cold or when items are evicted.

Challenge 4 – Choosing the Right Cache‑Eviction Strategy

Interview Question
Your cache is full. A new URL needs space. Which entry do you evict — and does your algorithm reflect real‑world URL access patterns?

Strategy	Drawback
FIFO	Evicts the oldest entry, ignoring both popularity and recency.
LFU	A viral URL from 3 months ago that is now dead can stay in the cache forever.
LRU	A URL accessed 1 million times but not hit in the last 2 hours may be evicted in favor of a rarely accessed, recent entry.

Optimal approach

Combine frequency and recency: evict the entry that is infrequently accessed and hasn’t been accessed recently.

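This is the principle behind W-TinyLFU, a hybrid policy that pairs a compact frequency sketch with LRU-style recency windows and is used by the Caffeine caching library for Java. Redis does not implement W-TinyLFU; it offers approximated LRU and LFU policies (for example `allkeys-lru` and `allkeys-lfu`), and its LFU counters decay over time so that once-popular keys can eventually be evicted.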
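To make the "frequency plus recency" idea concrete, here is a deliberately simplified sketch. It is not W-TinyLFU and not what any particular cache ships; the class name, the logical clock, and the `age` method are assumptions chosen to keep the example small. The victim is the entry with the fewest hits, with ties broken by the oldest access, and periodic aging halves the hit counts so that a once-viral URL does not stay forever.

```python
class FrequencyRecencyCache:
    """Toy cache that evicts the least-hit entry, oldest access first on ties."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.hits = {}       # key -> access count (frequency signal)
        self.last_used = {}  # key -> logical timestamp (recency signal)
        self.clock = 0       # logical clock, advanced on every access

    def _touch(self, key):
        self.clock += 1
        self.hits[key] = self.hits.get(key, 0) + 1
        self.last_used[key] = self.clock

    def get(self, key):
        if key not in self.values:
            return None
        self._touch(key)
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            # Lowest hit count loses; among equals, the least recently used loses.
            victim = min(self.values, key=lambda k: (self.hits[k], self.last_used[k]))
            for table in (self.values, self.hits, self.last_used):
                del table[victim]
        self.values[key] = value
        self._touch(key)

    def age(self):
        # Halve every counter so that old popularity fades over time.
        for key in self.hits:
            self.hits[key] //= 2


cache = FrequencyRecencyCache(capacity=2)
cache.put("hot", "https://example.com/hot")
for _ in range(3):
    cache.get("hot")
cache.put("cold", "https://example.com/cold")
cache.put("new", "https://example.com/new")  # cache is full: "cold" is evicted
```

Under plain LRU the entry evicted here would have been "hot", since it was touched before "cold"; the frequency signal keeps it in place.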

Key Insight
W-TinyLFU is widely regarded as a strong general-purpose eviction policy because it weighs both how often and how recently items are accessed.

Challenge 5: Unique ID Generation Across Distributed Nodes

Interview Question

Multiple application servers generate short codes simultaneously. How do you ensure no two servers generate the same short code for different URLs?

A central auto‑increment counter seems obvious — but it becomes a single point of failure.
Master‑slave replication helps with availability, but asynchronous replication risks duplicate IDs being issued after a fail‑over.

Follow‑up

Can you design the system so each node generates IDs independently without coordinating on every request?

Solution: Range‑Based ID Allocation

Allocate a range – A dedicated counter service hands each node a block of IDs, e.g.:
- Node A → IDs 1–1000
- Node B → IDs 1001–2000
Generate locally – Each node creates short codes from its own range without contacting the counter service.
Re‑request when exhausted – When a node runs out of IDs, it asks the counter service for a new batch.
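The three steps above can be sketched as follows. The `CounterService` and `Node` classes are hypothetical stand-ins (a real counter service would persist its position and sit behind a network call), and turning the numeric ID into a short code with base62 is an assumption of this example, since the post does not say how IDs become codes.

```python
import string
import threading

# 62 symbols: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(number):
    """Convert a non-negative integer into a base62 string."""
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


class CounterService:
    """Hands out non-overlapping ID blocks (in-memory stand-in)."""

    def __init__(self, block_size=1000):
        self._next_start = 1
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate_block(self):
        # The lock ensures two nodes never receive the same range.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
            return start, start + self._block_size - 1


class Node:
    """Application server that generates codes from its local range."""

    def __init__(self, counter_service):
        self._counter = counter_service
        self._next_id, self._last_id = counter_service.allocate_block()

    def next_short_code(self):
        # Only contact the counter service when the local range is exhausted.
        if self._next_id > self._last_id:
            self._next_id, self._last_id = self._counter.allocate_block()
        code = encode_base62(self._next_id)
        self._next_id += 1
        return code


service = CounterService(block_size=1000)
node_a = Node(service)  # receives IDs 1-1000
node_b = Node(service)  # receives IDs 1001-2000
codes = [node_a.next_short_code() for _ in range(1500)]
codes += [node_b.next_short_code() for _ in range(1500)]
```

Each node crosses a block boundary once in this run, and every code is still unique, because the only shared state is the counter that assigns ranges.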

Benefits

Benefit	Explanation
No hot‑path dependency	The counter service is called only when a range is depleted, not for every ID request.
Resilience	If the counter service is temporarily unavailable, nodes continue using their existing ranges.
Scalability	Adding more nodes is just a matter of assigning them new ranges.
Acceptable gaps	Gaps in the sequence are harmless because short codes are opaque to users.

Key Insight

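Range-based ID allocation decentralizes ID generation while guaranteeing global uniqueness, as long as the counter service never hands out the same range twice. Other high-throughput services pursue the same goal of coordination-free IDs with different schemes: Twitter's Snowflake, for example, combines a timestamp, a worker ID, and a per-worker sequence number, and Instagram has described a similar timestamp-plus-shard design.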

Challenge 6: 301 vs 302 Redirects — A Business Decision

Interview Question

“301 is a permanent redirect — browsers cache it, reducing server load. But what does 301 silently break for businesses using your service?”

Why a 301 can be problematic

Once a browser caches a 301, it may stop contacting your servers for that URL until the cached entry expires or is cleared.
Analytics die – you lose click counts, geography, device type, and referrer data.
URL updating breaks – if a business changes the destination mid‑campaign, users with a cached 301 will never see the new URL.

Why a 302 is often the better choice

Every click reaches your server, preserving real‑time analytics.
Destination URLs can be changed at any time without worrying about cached redirects.
The additional round‑trip latency is minimal compared with the business value of accurate tracking and flexibility.
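A minimal sketch of a 302-based redirect handler might look like this. The function name, the dictionary used as the URL table, and the list used as a click log are all assumptions for illustration. The `Cache-Control: no-store` header is an extra precaution added in this example, not something the post prescribes.

```python
from http import HTTPStatus


def build_redirect(short_code, url_table, click_log):
    """Return (status, headers) for a short-link request."""
    long_url = url_table.get(short_code)
    if long_url is None:
        return HTTPStatus.NOT_FOUND, {}
    # Because every click reaches the server, it can be recorded for analytics.
    click_log.append(short_code)
    headers = {
        "Location": long_url,
        # Ask clients and intermediaries not to store this response.
        "Cache-Control": "no-store",
    }
    return HTTPStatus.FOUND, headers  # HTTPStatus.FOUND is 302


url_table = {"abc123": "https://example.com/spring-sale"}
click_log = []
status_before, headers_before = build_redirect("abc123", url_table, click_log)

# The business changes the destination mid-campaign.
url_table["abc123"] = "https://example.com/summer-sale"
status_after, headers_after = build_redirect("abc123", url_table, click_log)
```

The second request picks up the new destination immediately, and both clicks are counted.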

Key Insight
A 302 redirect retains analytics and URL mutability—critical for businesses running campaigns. The slight latency cost is worth the business value.

Challenge 7: Malicious URL Protection

Interview Question: A bad actor shortens a phishing URL. Millions of users click it. How do you protect users — and what about URLs that were clean when shortened but become malicious later?

A purely reactive approach leaves a dangerous time window.

Layered defense

Creation‑time check – Query a third‑party malicious‑URL database (e.g., Google Safe Browsing API) before accepting the URL.
Periodic re‑scanning – Re‑check existing URLs on a schedule because clean URLs can turn malicious later.
User reports + manual verification – A final safety net that blocks URLs flagged by the community or internal teams.
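The three layers can share one safety check, as in the sketch below. Everything here is hypothetical: the scanner is a plain function over an in-memory blocklist, standing in for whatever third-party lookup you use, and no real Safe Browsing API calls are shown.

```python
def is_safe(url, scanners, reported_urls):
    """A URL passes only if no one reported it and every scanner accepts it."""
    if url in reported_urls:
        return False
    return all(scanner(url) for scanner in scanners)


def shorten(code, url, table, scanners, reported_urls):
    """Creation-time check: refuse to store a URL that fails the safety check."""
    if not is_safe(url, scanners, reported_urls):
        return False
    table[code] = url
    return True


def rescan(table, scanners, reported_urls):
    """Periodic re-scan: remove and return codes whose destination is now unsafe."""
    blocked = [code for code, url in table.items()
               if not is_safe(url, scanners, reported_urls)]
    for code in blocked:
        del table[code]
    return blocked


blocklist = {"http://phish.example/login"}   # stand-in for a threat feed
scanners = [lambda url: url not in blocklist]
reported = set()                             # filled by user reports after review
table = {}

rejected = shorten("bad1", "http://phish.example/login", table, scanners, reported)
accepted = shorten("ok1", "http://shop.example/deal", table, scanners, reported)

blocklist.add("http://shop.example/deal")    # the destination turns malicious later
newly_blocked = rescan(table, scanners, reported)
```

The same check runs at creation time and on every re-scan, so a URL that was clean when shortened is still caught once the feed or a user report flags it.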

Key Insight: Defense in depth shrinks the harmful time window and provides multiple opportunities to block malicious destinations.

Challenge 8 – The Thundering Herd / Cache Stampede

Interview question
A celebrity tweets your short‑URL to 50 million followers at the same time. The URL has just been created, so the cache is cold. What happens to your database at that exact moment?

This is not the same as the cold-start problem.
Cold start = cache empty, traffic arrives gradually → the cache warms up while the DB absorbs a manageable trickle of misses.
Cache stampede = cache empty and millions of requests for the same key arrive simultaneously → the DB is hammered in one shot.

Follow‑up 1 – How do you ensure that only one request hits the database while the rest wait for that result?

Solution: Cache locking (a “single‑flight” pattern)

First request
- Misses the cache.
- Sets an IN_PROGRESS flag in a distributed cache (e.g., Redis).
- Queries the database.
Subsequent requests
- See the IN_PROGRESS flag.
- Wait (poll, block, or use a pub/sub notification).
When the DB response returns
- The first request populates the cache with the result.
- Removes the IN_PROGRESS flag.
- Returns the result to the client.
Waiting requests are then served instantly from the now‑filled cache.

Result: One DB hit instead of millions.
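The flow above can be demonstrated inside a single process. This sketch swaps the distributed flag for a `threading.Event` per key, so it illustrates the idea rather than a Redis-backed implementation. The class and function names are invented for the example, and the database is simulated by a function that sleeps for 50 ms.

```python
import threading
import time


class SingleFlightCache:
    """Lets one caller per key load from the database while the others wait."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._cache = {}
        self._in_progress = {}  # key -> Event, playing the role of the IN_PROGRESS flag
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._in_progress.get(key)
            is_leader = event is None
            if is_leader:
                # First request for this key: set the flag and do the load.
                event = threading.Event()
                self._in_progress[key] = event

        if not is_leader:
            # Later requests wait for the leader, then read the filled cache.
            # If the leader failed, nothing was cached and None is returned.
            event.wait()
            with self._lock:
                return self._cache.get(key)

        try:
            value = self._load_from_db(key)
            with self._lock:
                self._cache[key] = value
            return value
        finally:
            # Always remove the flag and wake the waiters, even on error.
            with self._lock:
                del self._in_progress[key]
            event.set()


db_calls = []


def slow_db_lookup(key):
    db_calls.append(key)
    time.sleep(0.05)  # simulated database latency
    return "https://example.com/" + key


cache = SingleFlightCache(slow_db_lookup)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("abc123")))
           for _ in range(50)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Fifty concurrent requests produce a single simulated database read; the other forty-nine are served from the cache once it is populated.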

Follow‑up 2 – What if the first request crashes after setting the flag but before populating the cache?

Give the IN_PROGRESS flag a TTL (time‑to‑live) that is slightly longer than the expected DB response time.

TTL = DB_response_time + safety_buffer
# Example:
DB response ≈ 500 ms
buffer      ≈ 100 ms
TTL         = 600 ms

If the request crashes, the flag expires automatically.
No manual cleanup is required, and no deadlock occurs.
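Here is a small sketch of a flag that expires on its own. The `ExpiringFlags` class is a hypothetical in-memory stand-in with an injectable clock so that the behavior can be shown without sleeping. In Redis, the equivalent is typically a single `SET` command with the `NX` and `PX` options, which sets the key only if it is absent and attaches a millisecond TTL.

```python
import time


class ExpiringFlags:
    """In-memory flags that lapse automatically after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expires_at = {}  # key -> time at which the flag stops counting

    def try_set(self, key, ttl_seconds):
        """Set the flag if it is absent or expired; return True on success."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # someone else holds a live flag
        self._expires_at[key] = now + ttl_seconds
        return True

    def clear(self, key):
        # Normal path: the owner removes the flag after filling the cache.
        self._expires_at.pop(key, None)


fake_now = [0.0]  # controllable clock for the demonstration
flags = ExpiringFlags(clock=lambda: fake_now[0])

first_attempt = flags.try_set("abc123", ttl_seconds=0.6)   # first request takes the flag
second_attempt = flags.try_set("abc123", ttl_seconds=0.6)  # concurrent request must wait

fake_now[0] = 0.7  # the first request crashed; 700 ms have passed
retry_after_expiry = flags.try_set("abc123", ttl_seconds=0.6)
```

After the 600 ms TTL has passed, another request can take over the load, so a crashed owner cannot block the key indefinitely.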

Key insight
Cache locking with a TTL-based expiry prevents a thundering herd while avoiding deadlocks. Large-scale caching systems use similar mechanisms; Facebook's memcache deployment, for example, uses leases to limit stampedes.

Summary

Area	Approach
App‑layer scaling	Horizontal scaling via load balancer
Database scaling	NoSQL with horizontal sharding
Cache eviction	W‑TinyLFU hybrid (frequency + recency)
Unique ID generation	Range‑based allocation across nodes
Counter availability	Counter service called only when a range is depleted; nodes keep issuing IDs from their current range during brief outages
Redirect strategy	`302` for analytics & URL flexibility
Malicious URLs	Third‑party scanning, periodic re‑check, reactive blocking
Cache stampede	Cache locking with TTL‑based expiry

Final Thoughts

A URL shortener is one of the most deceptively deep system‑design problems. On the surface it looks like a simple key‑value store, but underneath it forces you to confront:

Horizontal scaling
Caching theory
Distributed ID generation
HTTP semantics
Security
Concurrency

The most important skill isn’t memorising answers; it’s questioning your own assumptions. Every solution uncovers new edge cases. That iterative thinking separates good system design from great system design.

Happy building!
