How to Design a Real-Time Chat Application
Source: Dev.to
Introduction
Designing a real‑time chat application is significantly more complex than building systems like a URL shortener or a notification service.
Main challenges
- Real‑time bidirectional communication
- Handling millions of concurrent connections
- Ensuring low latency
- Managing message persistence and offline delivery
Unlike simple request‑response systems, chat applications require persistent connections and instant delivery at scale.
Functional Requirements
- 1‑to‑1 messaging
- Group messaging
- Message persistence
- Offline message delivery (messages should be delivered when a user comes online)
Non‑Functional Requirements
- Scalable to millions of users
- Low latency (< 500 ms)
- Fault tolerant
- Highly available
- Durable storage
Choosing the Correct Communication Protocol
Because our latency requirement is under 500 ms, traditional short polling and long polling are not ideal: both add unnecessary delay and connection overhead by repeatedly re-establishing requests.
Server‑Sent Events (SSE) are also unsuitable because they support only one‑way communication (server → client), whereas a chat system requires two‑way communication.
We therefore use WebSockets, which provide:
- Persistent connections
- Bidirectional communication
- Low latency
- Reduced network overhead
Modern messaging platforms like WhatsApp use persistent connections to achieve real‑time communication.
High‑Level Architecture

Our system consists of the following components:
1. Client
Maintains a WebSocket connection with the server to send and receive messages.
2. Load Balancer
Distributes incoming WebSocket connections across multiple chat servers to ensure scalability and high availability.
3. Chat Servers
Handle the core business logic:
- Manage WebSocket connections
- Validate messages
- Store messages in the database
- Deliver messages to recipients
4. Redis
Since the load balancer does not know which user is connected to which chat server, we store connection mappings in Redis, e.g.:
userId → serverId / connectionId
This allows any server to determine whether a user is online and where to route the message.
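The mapping above can be sketched with a small in-memory registry. This is a stand-in for Redis, not a real client: in production these operations would be Redis commands (e.g. a hash write with a TTL), and the class and method names here are illustrative assumptions.

```python
# In-memory stand-in for the Redis connection registry described above.
class ConnectionRegistry:
    def __init__(self):
        self._connections = {}  # userId -> (serverId, connectionId)

    def register(self, user_id, server_id, connection_id):
        """Called by a chat server when a user's WebSocket connects."""
        self._connections[user_id] = (server_id, connection_id)

    def unregister(self, user_id):
        """Called on disconnect so the user reads as offline."""
        self._connections.pop(user_id, None)

    def lookup(self, user_id):
        """Return (serverId, connectionId) if the user is online, else None."""
        return self._connections.get(user_id)

registry = ConnectionRegistry()
registry.register("alice", "chat-server-3", "conn-42")
assert registry.lookup("alice") == ("chat-server-3", "conn-42")
registry.unregister("alice")
assert registry.lookup("alice") is None
```

Any chat server can call `lookup` to decide whether to route a message to a peer server or leave it stored for later delivery.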
5. Database
We use a scalable NoSQL database such as Amazon DynamoDB (or any key‑value store) because:
- High write throughput is required
- Strict ACID guarantees are unnecessary
- Horizontal scaling is easier
Message Flows
1‑to‑1 Message Flow
- The sender sends a message via WebSocket.
- The chat server validates and stores the message in the database (for persistence).
- The server checks Redis to determine whether the recipient is online.
- If online: deliver immediately via WebSocket.
- If offline: keep the message stored; deliver when the user reconnects.
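The steps above can be sketched as a single send function. The `database`, `registry`, and `sockets` dictionaries are simple in-memory stand-ins for DynamoDB, Redis, and the per-server WebSocket connections; the names are illustrative.

```python
database = []   # durable message store (stand-in for DynamoDB)
registry = {}   # userId -> serverId (stand-in for Redis)
sockets = {}    # userId -> payloads pushed over an open WebSocket (stand-in)

def send_direct_message(sender_id, recipient_id, text):
    message = {"from": sender_id, "to": recipient_id, "text": text,
               "delivered": False}
    database.append(message)        # 1. persist before attempting delivery
    if recipient_id in registry:    # 2. online check via the registry
        sockets.setdefault(recipient_id, []).append(message)  # 3. push
        message["delivered"] = True
    return message                  # offline: stays stored, marked undelivered

registry["bob"] = "chat-server-1"                 # bob is online
send_direct_message("alice", "bob", "hi")
send_direct_message("alice", "carol", "hello")    # carol offline: stored only
```

Persisting before delivery is the key ordering: if the server crashes between steps, the message is still in durable storage and can be redelivered.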
Group Chat Message Flow
- A user sends a message to a group.
- The message is stored in the database with the group ID.
- The server retrieves the list of group members.
- For each member, check Redis for their connection:
- If online → deliver via WebSocket.
- If offline → deliver when they reconnect.
Challenges
Designing the architecture is only the beginning. The real complexity lies in handling the following challenges at scale.
Scaling Millions of WebSocket Connections
- Each active user maintains a persistent WebSocket connection.
- Each connection consumes memory; a single server can handle only a limited number of concurrent connections.
- Sudden traffic spikes (e.g., during peak hours) can overwhelm servers.
Solution
- Use horizontal scaling (multiple chat servers).
- Keep servers stateless; store connection metadata in a centralized store like Redis.
- Use load balancers to distribute traffic evenly.
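As a toy illustration of the last point, here is a least-connections placement policy. A real load balancer (typically operating at L4 for WebSockets) does this in-network; this sketch only shows the selection logic.

```python
def pick_server(connection_counts):
    """Pick the chat server currently holding the fewest open connections."""
    return min(connection_counts, key=connection_counts.get)

servers = {"chat-1": 120, "chat-2": 95, "chat-3": 110}
target = pick_server(servers)
servers[target] += 1   # the new WebSocket lands on the least-loaded server
assert target == "chat-2"
```

Because the servers are stateless and the user-to-server mapping lives in Redis, it does not matter which server a connection lands on.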
The Fan‑Out Problem in Group Chats
When a user sends a message to a group with 10,000 members, the system must deliver that message to all members, creating massive delivery overhead.
Two common approaches
| Approach | Description | Trade‑off |
|---|---|---|
| Fan‑out on Write | Distribute the message to all members immediately. | Faster reads, heavy write amplification. |
| Fan‑out on Read | Store a single copy; deliver when users fetch/reconnect. | Reduces write load, increases read complexity. |
Large‑scale systems like Slack often use hybrid approaches depending on group size.
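A hybrid can be sketched as a threshold on group size: fan out on write for small groups, fall back to fan-out on read for very large ones. The cutoff value and data structures below are illustrative assumptions, not any particular platform's actual design.

```python
FANOUT_THRESHOLD = 1000  # assumed cutoff between "small" and "large" groups

inboxes = {}     # per-user inbox (target of fan-out on write)
group_log = {}   # per-group log, read lazily (target of fan-out on read)

def post_to_group(group_id, members, message):
    if len(members) <= FANOUT_THRESHOLD:
        # Fan-out on write: copy into every member's inbox immediately.
        for user_id in members:
            inboxes.setdefault(user_id, []).append(message)
        return "write"
    # Fan-out on read: store one copy; members fetch from the log later.
    group_log.setdefault(group_id, []).append(message)
    return "read"

def fetch_group_messages(group_id):
    """What a member of a large group reads on open/reconnect."""
    return list(group_log.get(group_id, []))

assert post_to_group("g-small", ["u1", "u2", "u3"], "hi") == "write"
large = [f"u{i}" for i in range(5000)]
assert post_to_group("g-big", large, "announcement") == "read"
```

The write path stays cheap for huge groups (one stored copy) at the cost of extra work on the read path.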
Message Ordering
- Messages may arrive out of order due to network delays.
- Multiple servers handling requests can cause race conditions.
Solution
- Assign a sequence number per conversation.
- Store timestamps.
- Let clients reorder messages based on sequence IDs.
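The sequence-number approach can be sketched as follows. In a multi-server deployment the per-conversation counter would live in a shared store (for example an atomic increment in Redis); here it is a local dict, which is an assumption for the sake of a runnable example.

```python
import itertools
from collections import defaultdict

_counters = defaultdict(itertools.count)  # conversationId -> counter

def assign_sequence(conversation_id, message):
    """Server side: stamp each message with the next seq for its conversation."""
    message["seq"] = next(_counters[conversation_id])
    return message

def reorder_for_display(messages):
    """Client side: render by sequence number, not network arrival order."""
    return sorted(messages, key=lambda m: m["seq"])

a = assign_sequence("conv-1", {"text": "first"})
b = assign_sequence("conv-1", {"text": "second"})
# Even if `b` arrives before `a` over the network, the client recovers order:
assert [m["text"] for m in reorder_for_display([b, a])] == ["first", "second"]
```

Timestamps alone are not enough because server clocks can skew; a per-conversation counter gives a total order within each conversation.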
Handling Offline Users
Users may disconnect unexpectedly (network issues, app crashes, device shutdown). The system must:
- Store undelivered messages safely.
- Detect when the user reconnects.
- Deliver pending messages reliably.
This requires durable storage (e.g., DynamoDB).
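The reconnect path can be sketched like this: undelivered messages stay in the durable store, and when the user's new WebSocket comes up the server queries for pending messages and pushes them out, oldest first. The list below stands in for a database table.

```python
message_store = []  # durable store (stand-in): dicts with to/text/delivered

def store_message(recipient_id, text):
    message_store.append({"to": recipient_id, "text": text, "delivered": False})

def on_reconnect(user_id):
    """Deliver everything pending for this user, in stored order."""
    pending = [m for m in message_store
               if m["to"] == user_id and not m["delivered"]]
    for m in pending:
        # ...push over the new WebSocket, then mark delivered...
        m["delivered"] = True
    return [m["text"] for m in pending]

store_message("dave", "are you there?")
store_message("dave", "ping")
assert on_reconnect("dave") == ["are you there?", "ping"]
assert on_reconnect("dave") == []   # nothing left after delivery
```

In a real system the delivered flag would only be set after the client acknowledges receipt, which is what leads to the delivery-guarantee question below.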
Delivery Guarantees
Should messages be delivered:
- At most once?
- At least once?
- Exactly once?
Exactly‑once delivery is extremely hard in distributed systems. Most chat systems opt for at‑least‑once delivery:
- Assign a unique message ID.
- Let the client deduplicate if necessary.
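Client-side deduplication under at-least-once delivery can be sketched as a set of seen message IDs: the server may resend a message whose acknowledgement was lost, and the client silently drops the duplicate copy.

```python
import uuid

class ClientInbox:
    def __init__(self):
        self._seen = set()
        self.messages = []

    def receive(self, message):
        """Apply a message once; drop redelivered copies by unique ID."""
        if message["id"] in self._seen:
            return False
        self._seen.add(message["id"])
        self.messages.append(message["text"])
        return True

msg = {"id": str(uuid.uuid4()), "text": "hello"}
inbox = ClientInbox()
assert inbox.receive(msg) is True
assert inbox.receive(msg) is False   # retransmission deduplicated
assert inbox.messages == ["hello"]
```

This combination (retries on the server, idempotent receive on the client) gives the user the appearance of exactly-once delivery without the distributed-systems cost of actually guaranteeing it.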
Summary
Building a real‑time chat system involves more than just wiring up WebSockets. It requires careful consideration of:
- Scalable connection handling
- Efficient fan‑out strategies
- Consistent message ordering
- Reliable offline storage and delivery
- Appropriate delivery guarantees
By combining WebSockets, a stateless server layer, Redis for connection mapping, and a high‑throughput NoSQL database, we can meet the functional and non‑functional requirements of a modern, large‑scale chat application.
Fault Tolerance
What happens if:
- A chat server crashes?
- Redis goes down?
- A database node fails?
Solutions:
- Replicated databases.
- Redis clustering.
- Health checks and auto‑restarts.
- Multi‑availability‑zone deployments.
Large messaging systems like WhatsApp are designed with redundancy at every layer to avoid message loss.
Data Storage & Hot Partitions
If many users are chatting in the same popular group, all writes may hit the same database partition. This creates:
- Hot keys
- Increased latency
- Throttling
Solutions:
- Partition by conversation ID + time bucket.
- Use sharding strategies.
- Distribute load evenly across nodes.
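The first idea can be sketched as a composite partition key: combining the conversation ID with a time bucket means a busy group's writes rotate to a new partition every hour instead of hammering one key forever. The one-hour bucket size is an illustrative choice, not a recommendation.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # one partition per conversation per hour (assumed)

def partition_key(conversation_id, timestamp):
    """Composite key: same conversation + same hour -> same partition."""
    bucket = int(timestamp.timestamp()) // BUCKET_SECONDS
    return f"{conversation_id}#{bucket}"

t1 = datetime(2024, 1, 1, 10, 15, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 10, 45, tzinfo=timezone.utc)
t3 = datetime(2024, 1, 1, 11, 5, tzinfo=timezone.utc)
assert partition_key("conv-9", t1) == partition_key("conv-9", t2)  # same hour
assert partition_key("conv-9", t1) != partition_key("conv-9", t3)  # new bucket
```

The trade-off is that reading a conversation's full history now requires querying several buckets, so bucket size should match the typical query window.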
Conclusion
Designing a real‑time chat application goes far beyond simply sending messages between users. It requires solving complex distributed‑systems problems such as:
- Scaling millions of persistent connections
- Ensuring low latency
- Handling offline users
- Maintaining message ordering
- Guaranteeing fault tolerance
By using WebSockets for bidirectional communication, horizontally scalable chat servers, centralized connection mapping with Redis, and durable storage solutions like Amazon DynamoDB, we can build a system capable of supporting millions of users efficiently.
The real challenge is not just building the architecture — it’s understanding the trade‑offs between scalability, consistency, and reliability.
A well‑designed chat system is a practical example of how distributed‑systems principles are applied in real‑world applications.