Autocomplete / Type-ahead System for a Search Box - Part 2

Published: December 20, 2025 at 08:32 PM EST
6 min read
Source: Dev.to

Choosing the Core Data Structure

Autocomplete is fundamentally a prefix‑matching problem:

Given a prefix, return the most relevant strings that start with it — fast.

This eliminates naive solutions like scanning a list or a database table. We need a structure that is prefix‑aware by design.

Why a Trie Works Well

A Trie (Prefix Tree) stores strings character by character.

  • Each node represents a prefix.
  • All descendants share that prefix.

Key Advantages

  • Lookup time depends only on prefix length (not on the number of stored strings).
  • No scanning of unrelated strings.
  • Naturally groups similar queries.
  • Perfect fit for prefix search.

Example

Stored queries:

be, bee, bell, bent, belt

Trie traversal for prefix "be" leads to a subtree containing:

bee, bell, bent, belt

This is exactly what autocomplete needs — fast prefix isolation.
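
A minimal Python sketch of this idea (the class and method names here are my own illustration, not a canonical implementation):

class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_terminal = False  # True if a stored query ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.is_terminal = True

    def words_with_prefix(self, prefix):
        # Walk to the prefix node: cost depends only on len(prefix)
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect everything in that subtree (naive version; optimized later)
        results, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_terminal:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return results

trie = Trie()
for q in ["be", "bee", "bell", "bent", "belt"]:
    trie.insert(q)
print(trie.words_with_prefix("be"))  # ['be', 'bee', 'bell', 'bent', 'belt'] (order may vary)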

Optimizing the Trie Structure

A naive trie becomes extremely large when storing billions of queries.

Problems with a Naive Trie

  • Too many nodes.
  • Deep chains with single children.
  • High memory overhead.
  • Excessive pointer chasing.

Compressed Trie (Radix Tree)

In many cases, nodes have only one child. These chains can be merged into a single edge that stores multiple characters.

Benefits

  • Reduced memory usage.
  • Fewer pointer traversals.
  • Faster, cache‑friendly lookups.

This optimized trie is often called a Radix Tree or Compact Trie.
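
As a rough illustration, the compression can be sketched on top of a plain dict-based trie (the helper names are mine); note how chains like "ication" collapse into a single multi-character edge:

def build_trie(words):
    """Plain trie: each node is a dict of single-char edges; '$' marks a terminal."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root

def compress(node):
    """Merge single-child chains into multi-character edges (radix / compact trie)."""
    out = {}
    for label, child in node.items():
        # Keep absorbing the child while it has exactly one non-terminal edge
        while len(child) == 1 and "$" not in child:
            (next_label, child), = child.items()
            label += next_label
        out[label] = compress(child)
    return out

plain = build_trie(["apple", "application", "app store"])
print(compress(plain))
# {'app': {'l': {'e': {'$': {}}, 'ication': {'$': {}}}, ' store': {'$': {}}}}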

Storing Popularity Information

How popular is a full search query?

Each complete query ends at a terminal node in the trie. At that node we store:

  • Search frequency, or a weighted score that combines frequency + recency.

Example

User searches:

"apple"      → 100 searches
"application"→ 60 searches
"app store"  → 200 searches

Trie view:

app
 ├── apple       (freq=100)
 ├── application (freq=60)
 └── app store   (freq=200)

At this stage we know the popularity of full queries, but we cannot yet answer prefix queries fast. This layer answers:

“How popular is each query?”
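
A small sketch of this offline aggregation step, reusing the Trie/TrieNode classes from the earlier sketch (the weighting and helper names are illustrative):

from collections import Counter

def aggregate_frequencies(query_log):
    """Offline step: count how often each full query appears in the logs."""
    return Counter(q.strip().lower() for q in query_log)

def attach_frequencies(trie, frequencies):
    """Store the popularity score at the terminal node of each query."""
    for query, freq in frequencies.items():
        node = trie.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.is_terminal = True
        node.frequency = freq  # could also be a blended frequency + recency score

trie = Trie()
log = ["apple"] * 100 + ["application"] * 60 + ["app store"] * 200
attach_frequencies(trie, aggregate_frequencies(log))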

How Top K Suggestions Are Retrieved

Given a prefix, how do we return the best suggestions instantly?

Naïve Approach ❌

For prefix "be":

  1. Traverse to the prefix node.
  2. Walk the entire subtree.
  3. Collect all terminal queries.
  4. Sort by frequency.
  5. Return the top K.

This is far too slow for real‑time systems.

Pre‑computed Top K Optimization ✅

To guarantee fast responses:

Each trie node stores the top K suggestions for that prefix.

At query time:

Traverse to prefix node → return stored top‑K

No subtree traversal, no sorting, no computation on the request path.

Storage Optimization

  • Store references (pointers/IDs) to terminal nodes rather than full strings.
  • Store frequency alongside the reference.

This is a classic space vs latency trade‑off—autocomplete is extremely read‑heavy, so the extra memory is worthwhile.
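
A sketch of how a serving-side node and lookup could look once top-K lists are precomputed; the field names and the query_table ID-to-string map are assumptions for illustration:

class ServingNode:
    def __init__(self):
        self.children = {}    # char -> ServingNode
        self.query_id = None  # set on nodes where a full query ends
        self.frequency = 0    # popularity score of that full query
        self.top_k = []       # [(frequency, query_id), ...] precomputed per prefix

def lookup_top_k(root, prefix, query_table):
    """O(len(prefix)) walk; no subtree traversal or sorting on the request path."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    # Stored entries are references (IDs); resolve them to full strings here
    return [query_table[qid] for _freq, qid in node.top_k]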

Building and Maintaining Top K Lists

Top‑K lists are built bottom‑up.

Build Process

  1. Leaf nodes know their own frequency.
  2. Parent nodes merge children’s top‑K lists.
  3. Only the top K entries are retained.
  4. The process propagates upward to the root.

Result: every prefix node has a pre‑sorted suggestion list.
Lookup time remains O(prefix length).
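
A minimal offline build pass, assuming the ServingNode layout sketched above and using heapq.nlargest for the merge (my illustration of the bottom-up idea, not the article's exact pipeline):

import heapq

K = 5  # suggestions retained per prefix node

def build_top_k(node):
    """Post-order pass: merge the children's lists, keep only the K best."""
    candidates = []
    if node.query_id is not None:               # a full query terminates here
        candidates.append((node.frequency, node.query_id))
    for child in node.children.values():
        build_top_k(child)                      # children first (bottom-up)
        candidates.extend(child.top_k)
    node.top_k = heapq.nlargest(K, candidates)  # pre-sorted, most frequent first

Running this once from the root gives every prefix node the pre-sorted list that the serving-path lookup above simply returns.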

Two‑Layer Mental Model

Layer | Purpose
1️⃣ Popularity Tracking (Data Layer) | Tracks how often full queries are searched. Stored at terminal nodes. Updated offline via logs and aggregation.
2️⃣ Fast Retrieval (Serving Layer) | Uses popularity data to compute rankings. Stores only the top‑K per prefix. Optimized purely for read latency.

High‑Level System Architecture

Autocomplete is a read‑heavy, latency‑critical pipeline.

Serve from memory first → fall back to structured storage → never compute on the request path.

Autocomplete is not a compute problem—it’s a data‑access problem.

Logical Components

User

Browser / Mobile Client

Global Load Balancer

Stateless Application Servers

In‑Memory Cache (e.g., Redis)

Distributed Trie Storage

Each layer has a single responsibility, which keeps the system scalable and debuggable.

Step‑by‑Step Request Flow

1️⃣ Client Layer (Browser / Mobile)

  • Sends only the current prefix.
  • Requests are debounced (e.g., 150 ms) to avoid flooding the backend while typing.

Example request payload:

{
  "prefix": "how",
  "k": 5,
  "locale": "en-IN"
}

2️⃣ Load Balancer

  • Routes to the nearest region.
  • Distributes traffic evenly.
  • Performs health‑checking and TLS termination.

At Scale

  • Global LB (GeoDNS / Anycast) routes users to the nearest region.
  • Regional LBs inside each data center spread traffic across application servers.
  • Unhealthy servers are detected by health checks and removed from rotation.

Storage, Replication, and Availability

(Outline only; the detailed design appears in the storage sections later in this part.)

  • Primary storage: sharded, replicated radix‑tree nodes stored on SSD‑backed machines.
  • Cache layer: hot prefixes cached in a distributed in‑memory store (Redis/Memcached).
  • Replication: synchronous replication for hot shards, asynchronous for cold shards to balance consistency and latency.
  • Failover: automatic rerouting via the load balancer; read‑only fallback to a stale but available replica if the primary is down.

3️⃣ Application Servers (Stateless)

Act as coordinators, not data holders. They handle:

  • Input normalization ("How " → "how"), shown in the sketch below
  • Validation
  • Cache orchestration
  • Fallback logic

Stateless design means:

  • Easy horizontal scaling
  • No data migration during scale‑up
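
A tiny sketch of that normalization and validation step (the length limit and rules here are placeholders, not values from the article):

MAX_PREFIX_LEN = 50  # placeholder limit

def normalize_prefix(raw):
    """'How ' -> 'how': trim, lowercase, collapse internal whitespace."""
    return " ".join(raw.strip().lower().split())

def is_valid_prefix(prefix):
    return 0 < len(prefix) <= MAX_PREFIX_LEN

print(normalize_prefix("How "))  # how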

4️⃣ Redis Cache (Hot Prefix Layer)

First data stop.

Cache Key

autocomplete:{locale}:{prefix}

Example

autocomplete:en:how

Value

[
  "how to cook rice",
  "how to lose weight",
  "how to make tea"
]

Why Redis works well

  • A small set of prefixes dominates traffic
  • Sub‑millisecond memory access
  • TTL removes stale data automatically

Prefixes like "a", "th", and "how" are served from cache roughly 99% of the time.

5️⃣ Cache Miss → Trie Lookup

On cache miss:

  1. Query distributed trie storage.
  2. Trie node already contains pre‑computed top‑K → no traversal or sorting at runtime.

Important

  • Trie servers are read‑only on the serving path.
  • Writes and rebuilds happen offline.

6️⃣ Response & Cache Population

  • Results returned to client.
  • Same response written back to Redis.
  • TTL applied (10–30 minutes).

Cold prefixes naturally become hot.
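
Putting steps 4 to 6 together, here is a hedged read-through sketch using the redis-py client; lookup_trie_top_k() is a hypothetical stand-in for the distributed trie call, and the host, TTL, and key format follow the values above:

import json
import redis

r = redis.Redis(host="cache.internal", port=6379)  # placeholder host
TTL_SECONDS = 20 * 60                               # within the 10–30 minute range above

def lookup_trie_top_k(locale, prefix, k):
    """Hypothetical call into distributed trie storage (returns precomputed top-K)."""
    raise NotImplementedError

def get_suggestions(locale, prefix, k=5):
    key = f"autocomplete:{locale}:{prefix}"
    try:
        cached = r.get(key)
        if cached is not None:                      # cache hit: sub-millisecond path
            return json.loads(cached)
    except redis.RedisError:
        pass                                        # Redis down: bypass cache, stay available

    suggestions = lookup_trie_top_k(locale, prefix, k)
    try:
        r.setex(key, TTL_SECONDS, json.dumps(suggestions))  # cold prefixes become hot
    except redis.RedisError:
        pass
    return suggestions

The try/except around the cache calls is the same "Redis down" fallback described in the next section.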

Failure Handling

Redis Down

  • App servers bypass cache.
  • Read directly from trie.
  • Slight latency increase; system remains available.

Trie Shard Down

  • Traffic routed to replica.
  • Load balancer avoids unhealthy node.

App Server Down

  • Instantly removed from rotation.
  • No data loss (stateless).

Regional Deployment

User (India)

India Load Balancer

India App Servers

India Redis

India Trie Replica

Each region:

  • Serves local users.
  • Has an independent cache.
  • Shares replicated trie data.

Conceptual Architecture Diagram

+---------+      +---------------+      +--------------------+
|  Client | ---> | Load Balancer | ---> | Application Server |
+---------+      +---------------+      +--------------------+
                                           |              |
                                 Cache Hit |              | Cache Miss
                                           v              v
                                    +-----------+    +------------------+
                                    |   Redis   |    | Distributed Trie |
                                    +-----------+    |     Storage      |
                                                     +------------------+

Distributed Trie Storage

  • The trie is logically one structure but physically distributed across many servers.

Partitioning Strategy

Common approach: prefix‑based partitioning

Prefix Range | Shard
a–c          | 1
d–f          | 2

  • Hot prefixes (a, s, th) are further subdivided to keep shards balanced and scalable.
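
A rough sketch of that routing logic; the shard names and the map itself are invented for the example:

# Hot prefixes are carved out of their range and given dedicated shards.
HOT_PREFIX_SHARDS = {"a": "shard-a1", "s": "shard-s1", "th": "shard-th1"}
RANGE_SHARDS = [("a", "c", "shard-1"), ("d", "f", "shard-2"), ("g", "z", "shard-3")]

def shard_for(prefix):
    # Longest hot-prefix match wins
    for length in range(min(len(prefix), 2), 0, -1):
        if prefix[:length] in HOT_PREFIX_SHARDS:
            return HOT_PREFIX_SHARDS[prefix[:length]]
    # Otherwise fall back to the first-character range
    for lo, hi, shard in RANGE_SHARDS:
        if lo <= prefix[0] <= hi:
            return shard
    return "shard-misc"

print(shard_for("the"))   # shard-th1 (hot prefix)
print(shard_for("bell"))  # shard-1   (a–c range)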

Replication Strategy

Each trie partition is replicated across multiple servers.

Why Replication Matters

  • High Availability – node failures don’t impact service.
  • Fault Tolerance – handles hardware failures and deployments.
  • Read Scalability – replicas share massive QPS load.

Storage Technology Choices

Cassandra (Durable Storage)

Best fit when:

  • Large trie partitions.
  • Very high read throughput.
  • Cross‑region replication needed.

Usage pattern

  • Trie partitions stored as serialized blocks.
  • Loaded into memory on startup.

Cassandra acts as durable backing store.
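
A hedged sketch of that usage pattern with the DataStax Python driver; the keyspace, table, and column names (autocomplete.trie_blocks, blob_json) are assumptions for illustration:

import json
from cassandra.cluster import Cluster

def load_partition(shard_id):
    """Read a serialized trie partition from Cassandra and rebuild it in memory on startup."""
    cluster = Cluster(["cassandra.internal"])   # placeholder contact point
    session = cluster.connect("autocomplete")   # assumed keyspace
    row = session.execute(
        "SELECT blob_json FROM trie_blocks WHERE shard_id = %s", (shard_id,)
    ).one()
    cluster.shutdown()
    return json.loads(row.blob_json) if row else None  # in-memory trie for this shard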

Strengths

  • Horizontal scalability.
  • High availability.
  • Tunable consistency.

ZooKeeper (Metadata & Coordination)

ZooKeeper is not used to store trie data. It is used for:

  • Partition metadata.
  • Prefix‑to‑server mappings.
  • Leader election (if required).
  • Atomic version switching during trie rebuilds.
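
For the version-switch use case, a small sketch with the kazoo client; the znode path and payload format are assumptions:

from kazoo.client import KazooClient

ACTIVE_VERSION_PATH = "/autocomplete/active_trie_version"  # assumed znode

def publish_new_version(hosts, version):
    """After an offline rebuild finishes, flip the active version in one atomic write."""
    zk = KazooClient(hosts=hosts)
    zk.start()
    zk.ensure_path(ACTIVE_VERSION_PATH)
    zk.set(ACTIVE_VERSION_PATH, version.encode())  # serving nodes watch this path and reload
    zk.stop()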

Replication Guarantees

  • ✅ High availability
  • ✅ Fault tolerance
  • ✅ Read scalability

🎯 Final Takeaway (Part 2)

A real‑world autocomplete system succeeds because it:

  • Uses tries for prefix isolation.
  • Pre‑computes top K suggestions.
  • Serves requests from memory first.
  • Pushes all heavy computation offline.
  • Scales via partitioning and replication.

This is the level of clarity interviewers look for.
