Autocomplete / Type-ahead System for a Search Box - Part 2
Source: Dev.to
📌 Table of Contents (Part 2)
- Choosing the Core Data Structure
- Optimizing the Trie Structure
- How Top K Suggestions Are Retrieved
- Building and Maintaining Top K Lists
- High‑Level System Architecture
- Storage, Replication, and Availability
Choosing the Core Data Structure
Autocomplete is fundamentally a prefix‑matching problem:
Given a prefix, return the most relevant strings that start with it — fast.
This eliminates naive solutions like scanning a list or a database table. We need a structure that is prefix‑aware by design.
Why a Trie Works Well
A Trie (Prefix Tree) stores strings character by character.
- Each node represents a prefix.
- All descendants share that prefix.
Key Advantages
- Lookup time depends only on prefix length (not on the number of stored strings).
- No scanning of unrelated strings.
- Naturally groups similar queries.
- Perfect fit for prefix search.
Example
Stored queries:
be, bee, bell, bent, belt
Trie traversal for prefix "be" leads to a subtree containing:
bee, bell, bent, belt
This is exactly what autocomplete needs — fast prefix isolation.
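To make this concrete, here is a minimal Python sketch of a plain trie with insert and prefix lookup. The class and method names are illustrative, not from any particular library, and the subtree walk shown here is exactly the naive step we optimize away later.

```python
# Minimal trie sketch: insert full queries, then isolate the subtree for a prefix.

class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_terminal = False  # True if a stored query ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query: str) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.is_terminal = True

    def _find(self, prefix: str):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def starts_with(self, prefix: str) -> list:
        """Collect every stored query under the prefix subtree (naive walk)."""
        node = self._find(prefix)
        results = []

        def dfs(n, path):
            if n.is_terminal:
                results.append(path)
            for ch, child in n.children.items():
                dfs(child, path + ch)

        if node is not None:
            dfs(node, prefix)
        return results

trie = Trie()
for q in ["be", "bee", "bell", "bent", "belt"]:
    trie.insert(q)

# The prefix "be" is itself a stored query here, so it appears in the results too.
print(trie.starts_with("be"))   # ['be', 'bee', 'bell', 'bent', 'belt']
```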
Optimizing the Trie Structure
A naive trie becomes extremely large when storing billions of queries.
Problems with a Naive Trie
- Too many nodes.
- Deep chains with single children.
- High memory overhead.
- Excessive pointer chasing.
Compressed Trie (Radix Tree)
In many cases, nodes have only one child. These chains can be merged into a single edge that stores multiple characters.
Benefits
- Reduced memory usage.
- Fewer pointer traversals.
- Faster, cache‑friendly lookups.
This optimized trie is often called a Radix Tree or Compact Trie.
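A sketch of the compression idea, assuming the TrieNode/Trie sketch from the previous section is in scope: chains of single-child, non-terminal nodes are folded into one multi-character edge label.

```python
# Path-compression sketch: fold single-child chains into one edge label.
# Assumes TrieNode / Trie from the earlier sketch; names are illustrative.

class RadixNode:
    def __init__(self):
        self.children = {}        # edge label (one or more chars) -> RadixNode
        self.is_terminal = False

def compress(trie_node: "TrieNode") -> RadixNode:
    """Convert a plain trie subtree into a compressed (radix) subtree."""
    radix = RadixNode()
    radix.is_terminal = trie_node.is_terminal
    for ch, child in trie_node.children.items():
        label = ch
        # Follow chains of nodes that have exactly one child and end no query.
        while len(child.children) == 1 and not child.is_terminal:
            next_ch, next_child = next(iter(child.children.items()))
            label += next_ch
            child = next_child
        radix.children[label] = compress(child)
    return radix

radix_root = compress(trie.root)    # 'trie' from the earlier sketch
print(list(radix_root.children))    # ['be'] -- the shared prefix became a single edge
```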
Storing Popularity Information
How popular is a full search query?
Each complete query ends at a terminal node in the trie. At that node we store:
- Search frequency, or a weighted score that combines frequency + recency.
Example
User searches:
"apple" → 100 searches
"application"→ 60 searches
"app store" → 200 searches
Trie view:
app
├── apple (freq=100)
├── application (freq=60)
└── app store (freq=200)
At this stage we know the popularity of full queries, but we cannot yet answer prefix queries fast. This layer answers:
“How popular is each query?”
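One common way to turn raw counts into the weighted score mentioned above is an exponential recency decay. A small sketch follows; the 7-day half-life and the formula itself are illustrative assumptions, not values from the article.

```python
import time

HALF_LIFE_DAYS = 7.0   # assumption for illustration; tuned per product in practice

def weighted_score(frequency, last_searched_ts, now=None):
    """Combine search frequency with recency: recent queries keep more of their weight."""
    now = now if now is not None else time.time()
    age_days = (now - last_searched_ts) / 86_400
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)    # 1.0 for "just now", halves every 7 days
    return frequency * decay

# 100 searches last seen a week ago score about the same as 50 searches made today.
print(round(weighted_score(100, time.time() - 7 * 86_400), 1))   # ~50.0
```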
How Top K Suggestions Are Retrieved
Given a prefix, how do we return the best suggestions instantly?
Naïve Approach ❌
For prefix "be":
- Traverse to the prefix node.
- Walk the entire subtree.
- Collect all terminal queries.
- Sort by frequency.
- Return the top K.
This is far too slow for real‑time systems.
Pre‑computed Top K Optimization ✅
To guarantee fast responses:
Each trie node stores the top K suggestions for that prefix.
At query time:
Traverse to prefix node → return stored top‑K
No subtree traversal, no sorting, no computation on the request path.
Storage Optimization
- Store references (pointers/IDs) to terminal nodes rather than full strings.
- Store frequency alongside the reference.
This is a classic space vs latency trade‑off—autocomplete is extremely read‑heavy, so the extra memory is worthwhile.
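A sketch of what the serving-path lookup looks like once every node carries its own top-K list. The node and field names are assumptions; note that the node stores query IDs and resolves them through a separate lookup table, echoing the "store references, not full strings" point.

```python
# Serving-path lookup with precomputed top-K lists: walk len(prefix) nodes, return the list.

class ServingNode:
    def __init__(self):
        self.children = {}    # char -> ServingNode
        self.top_k = []       # [(frequency, query_id), ...] kept sorted high to low

def suggest(root: ServingNode, prefix: str, query_text: dict) -> list:
    node = root
    for ch in prefix:                      # O(prefix length) steps, nothing else
        node = node.children.get(ch)
        if node is None:
            return []                      # unknown prefix -> no suggestions
    # No subtree traversal, no sorting: the answer was computed offline.
    return [query_text[qid] for _, qid in node.top_k]
```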
Building and Maintaining Top K Lists
Top‑K lists are built bottom‑up.
Build Process
- Leaf nodes know their own frequency.
- Parent nodes merge children’s top‑K lists.
- Only the top K entries are retained.
- The process propagates upward to the root.
Result: every prefix node has a pre‑sorted suggestion list.
Lookup time remains O(prefix length).
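A bottom-up build pass over such a trie might look like the sketch below. Field names are illustrative; a production build would typically be iterative and run offline on the aggregation pipeline.

```python
import heapq

K = 5   # illustrative; the real K is a product decision

class BuildNode:
    def __init__(self):
        self.children = {}       # char -> BuildNode
        self.is_terminal = False
        self.frequency = 0       # populated offline from query logs
        self.query_id = None     # reference to the full query string
        self.top_k = []          # filled in by the pass below

def build_top_k(node: BuildNode) -> list:
    """Post-order pass: a node's top-K is the best K of its own terminal entry
    plus its children's already-computed top-K lists."""
    candidates = []
    if node.is_terminal:
        candidates.append((node.frequency, node.query_id))
    for child in node.children.values():
        candidates.extend(build_top_k(child))
    node.top_k = heapq.nlargest(K, candidates)   # retain only K entries per node
    return node.top_k
```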
Two‑Layer Mental Model
| Layer | Purpose |
|---|---|
| 1️⃣ Popularity Tracking (Data Layer) | Tracks how often full queries are searched. Stored at terminal nodes. Updated offline via logs and aggregation. |
| 2️⃣ Fast Retrieval (Serving Layer) | Uses popularity data to compute rankings. Stores only top‑K per prefix. Optimized purely for read latency. |
High‑Level System Architecture
Autocomplete is a read‑heavy, latency‑critical pipeline.
Serve from memory first → fall back to structured storage → never compute on the request path.
Autocomplete is not a compute problem—it’s a data‑access problem.
Logical Components
User
↓
Browser / Mobile Client
↓
Global Load Balancer
↓
Stateless Application Servers
↓
In‑Memory Cache (e.g., Redis)
↓
Distributed Trie Storage
Each layer has a single responsibility, which keeps the system scalable and debuggable.
Step‑by‑Step Request Flow
1️⃣ Client Layer (Browser / Mobile)
- Sends only the current prefix.
- Requests are debounced (e.g., 150 ms) to avoid flooding the backend while typing.
Example request payload:
{
"prefix": "how",
"k": 5,
"locale": "en-IN"
}
2️⃣ Load Balancer
- Routes to the nearest region.
- Distributes traffic evenly.
- Performs health‑checking and TLS termination.
- Detects unhealthy servers via health checks and removes them from rotation automatically.
At scale, load balancing is typically two‑tiered:
- Global LB (GeoDNS / Anycast)
- Regional LBs inside each data center
3️⃣ Application Servers (Stateless)
Act as coordinators, not data holders. They handle:
- Input normalization ("How " → "how")
- Validation
- Cache orchestration
- Fallback logic
Stateless design means:
- Easy horizontal scaling
- No data migration during scale‑up
4️⃣ Redis Cache (Hot Prefix Layer)
First data stop.
Cache Key
autocomplete:{locale}:{prefix}
Example
autocomplete:en:how
Value
[
"how to cook rice",
"how to lose weight",
"how to make tea"
]
Why Redis works well
- A small set of prefixes dominates traffic
- Sub‑millisecond memory access
- TTL removes stale data automatically
Prefixes like "a", "th", and "how" are often served from the cache roughly 99% of the time.
5️⃣ Cache Miss → Trie Lookup
On cache miss:
- Query distributed trie storage.
- Trie node already contains pre‑computed top‑K → no traversal or sorting at runtime.
Important
- Trie servers are read‑only on the serving path.
- Writes and rebuilds happen offline.
6️⃣ Response & Cache Population
- Results returned to client.
- Same response written back to Redis.
- TTL applied (10–30 minutes).
Cold prefixes naturally become hot.
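Putting steps 4-6 together, a read-through cache on the application server might look like this sketch. It uses the redis-py client; `trie_lookup` is a hypothetical stand-in for the distributed trie client, and the connection details and TTL are placeholders.

```python
import json
import redis   # redis-py client

r = redis.Redis(host="localhost", port=6379)    # placeholder connection details
TTL_SECONDS = 20 * 60                           # within the 10-30 minute range above

def trie_lookup(prefix: str, locale: str, k: int) -> list:
    """Hypothetical stand-in for the distributed trie client; returns a node's stored top-K."""
    return []

def autocomplete(prefix: str, locale: str = "en", k: int = 5) -> list:
    key = f"autocomplete:{locale}:{prefix}"
    try:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)                       # cache hit: memory-speed path
    except redis.RedisError:
        pass                                                # Redis down: bypass cache, stay available
    suggestions = trie_lookup(prefix, locale, k)            # precomputed top-K, no sorting here
    try:
        r.setex(key, TTL_SECONDS, json.dumps(suggestions))  # write-back so cold prefixes become hot
    except redis.RedisError:
        pass
    return suggestions
```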
Failure Handling
Redis Down
- App servers bypass cache.
- Read directly from trie.
- Slight latency increase; system remains available.
Trie Shard Down
- Traffic routed to replica.
- Load balancer avoids unhealthy node.
App Server Down
- Instantly removed from rotation.
- No data loss (stateless).
Regional Deployment
User (India)
↓
India Load Balancer
↓
India App Servers
↓
India Redis
↓
India Trie Replica
Each region:
- Serves local users.
- Has an independent cache.
- Shares replicated trie data.
Conceptual Architecture Diagram
+--------+     +---------------+     +--------------------+
| Client | --> | Load Balancer | --> | Application Server |
+--------+     +---------------+     +--------------------+
                                         |            |
                                    Cache Hit     Cache Miss
                                         |            |
                                         v            v
                                  +-----------+   +------------------+
                                  |   Redis   |   | Distributed Trie |
                                  +-----------+   |     Storage      |
                                                  +------------------+
Storage, Replication, and Availability
(Overview; the subsections below cover the details.)
- Primary storage: sharded, replicated radix‑tree nodes stored on SSD‑backed machines.
- Cache layer: hot prefixes cached in a distributed in‑memory store (Redis/Memcached).
- Replication: synchronous replication for hot shards, asynchronous for cold shards to balance consistency and latency.
- Failover: automatic rerouting via the load balancer; read‑only fallback to a stale but available replica if the primary is down.
Distributed Trie Storage
- The trie is logically one structure but physically distributed across many servers.
Partitioning Strategy
Common approach: prefix‑based partitioning
| Prefix Range | Shard |
|---|---|
| a–c | 1 |
| d–f | 2 |
| … | … |
- Hot prefixes (a, s, th) are further subdivided to keep shards balanced and scalable.
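A sketch of prefix-based routing with hot prefixes split out. The ranges and shard numbers below are made up for illustration; the real mapping would live in coordination metadata (see ZooKeeper below).

```python
# Illustrative prefix-to-shard routing; ranges and shard IDs are made up.

RANGE_SHARDS = {            # coarse ranges, as in the table above
    ("a", "c"): 1,
    ("d", "f"): 2,
    ("g", "z"): 3,
}
HOT_PREFIX_SHARDS = {"a": 10, "s": 11, "th": 12}    # hot prefixes carved out further

def shard_for(prefix: str) -> int:
    p = prefix.lower()
    if not p:
        return 0
    for hot, shard in HOT_PREFIX_SHARDS.items():
        if p.startswith(hot):
            return shard
    first = p[0]
    for (lo, hi), shard in RANGE_SHARDS.items():
        if lo <= first <= hi:
            return shard
    return 0    # default shard for digits, symbols, etc.

print(shard_for("the"), shard_for("dog"), shard_for("bell"))    # 12 2 1
```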
Replication Strategy
Each trie partition is replicated across multiple servers.
Why Replication Matters
- High Availability – node failures don’t impact service.
- Fault Tolerance – handles hardware failures and deployments.
- Read Scalability – replicas share massive QPS load.
Storage Technology Choices
Cassandra (Durable Storage)
Best fit when:
- Large trie partitions.
- Very high read throughput.
- Cross‑region replication needed.
Usage pattern
- Trie partitions stored as serialized blocks.
- Loaded into memory on startup.
Cassandra acts as durable backing store.
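A sketch of the startup load path, using the DataStax cassandra-driver. The keyspace, table, and column names, and pickle as the serialization format, are assumptions for illustration; real deployments would pick a safer, versioned serialization format.

```python
import pickle
from cassandra.cluster import Cluster   # DataStax cassandra-driver

cluster = Cluster(["cassandra-node-1"])        # placeholder contact point
session = cluster.connect("autocomplete")      # assumed keyspace

def load_partition(shard_id: int, version: str):
    """Fetch one serialized trie block and deserialize it into memory at startup."""
    row = session.execute(
        "SELECT blob_data FROM trie_partitions WHERE shard_id=%s AND version=%s",
        (shard_id, version),
    ).one()
    return pickle.loads(row.blob_data) if row else None
```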
Strengths
- Horizontal scalability.
- High availability.
- Tunable consistency.
ZooKeeper (Metadata & Coordination)
ZooKeeper is not used to store trie data. It is used for:
- Partition metadata.
- Prefix‑to‑server mappings.
- Leader election (if required).
- Atomic version switching during trie rebuilds.
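Version switching can be as simple as one znode that points at the current build. A sketch using the kazoo client, with a made-up znode path:

```python
from kazoo.client import KazooClient   # kazoo ZooKeeper client

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")   # placeholder hosts
zk.start()

VERSION_ZNODE = "/autocomplete/trie/current_version"   # assumed path

zk.ensure_path(VERSION_ZNODE)

def publish_version(new_version: str) -> None:
    """Offline rebuild pipeline: flip the pointer once the new build is fully loaded."""
    zk.set(VERSION_ZNODE, new_version.encode("utf-8"))

def current_version() -> str:
    """Serving nodes read (or watch) the pointer to decide which trie build to serve."""
    data, _stat = zk.get(VERSION_ZNODE)
    return data.decode("utf-8")
```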
Replication Guarantees
- ✅ High availability
- ✅ Fault tolerance
- ✅ Read scalability
🎯 Final Takeaway (Part 2)
A real‑world autocomplete system succeeds because it:
- Uses tries for prefix isolation.
- Pre‑computes top K suggestions.
- Serves requests from memory first.
- Pushes all heavy computation offline.
- Scales via partitioning and replication.
This is the level of clarity interviewers look for.