How to Design a Real-Time Chat Application
Source: Dev.to
Introduction
Designing a real‑time chat application is significantly more complex than building systems like a URL shortener or a notification service.
Main challenges
- Real‑time bidirectional communication
- Handling millions of concurrent connections
- Ensuring low latency
- Managing message persistence and offline delivery
Unlike simple request‑response systems, chat applications require persistent connections and instant delivery at scale.
Functional Requirements
- 1‑to‑1 messaging
- Group messaging
- Message persistence
- Offline message delivery (messages should be delivered when a user comes online)
Non‑Functional Requirements
- Scalable to millions of users
- Low latency (< 500 ms)
- Fault tolerant
- Highly available
- Durable storage
Choosing the Correct Communication Protocol
Because our latency requirement is under 500 ms, traditional short polling and long polling are not ideal: both add unnecessary delay and connection overhead by repeatedly re-establishing requests.
Server‑Sent Events (SSE) are also unsuitable because they support only one‑way communication (server → client), whereas a chat system requires two‑way communication.
We therefore use WebSockets, which provide:
- Persistent connections
- Bidirectional communication
- Low latency
- Reduced network overhead
Modern messaging platforms like WhatsApp use persistent connections to achieve real‑time communication.
High‑Level Architecture

Our system consists of the following components:
1. Client
Maintains a WebSocket connection with the server to send and receive messages.
2. Load Balancer
Distributes incoming WebSocket connections across multiple chat servers to ensure scalability and high availability.
3. Chat Servers
Handle the core business logic:
- Manage WebSocket connections
- Validate messages
- Store messages in the database
- Deliver messages to recipients
4. Redis
Since the load balancer does not know which user is connected to which chat server, we store connection mappings in Redis, e.g.:
userId → serverId / connectionId
This allows any server to determine whether a user is online and where to route the message.
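The mapping above can be sketched with a small in-memory registry. This is a stand-in for Redis, not a real client: in production these operations would be Redis commands (e.g. a hash write with a TTL), and the class and method names here are illustrative assumptions.

```python
# In-memory stand-in for the Redis connection registry described above.
class ConnectionRegistry:
    def __init__(self):
        self._connections = {}  # userId -> (serverId, connectionId)

    def register(self, user_id, server_id, connection_id):
        """Called by a chat server when a user's WebSocket connects."""
        self._connections[user_id] = (server_id, connection_id)

    def unregister(self, user_id):
        """Called on disconnect so the user reads as offline."""
        self._connections.pop(user_id, None)

    def lookup(self, user_id):
        """Return (serverId, connectionId) if the user is online, else None."""
        return self._connections.get(user_id)

registry = ConnectionRegistry()
registry.register("alice", "chat-server-3", "conn-42")
assert registry.lookup("alice") == ("chat-server-3", "conn-42")
registry.unregister("alice")
assert registry.lookup("alice") is None
```

Any chat server can call `lookup` to decide whether to route a message to a peer server or leave it stored for later delivery.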
5. Database
We use a scalable NoSQL database such as Amazon DynamoDB (or any key‑value store) because:
- High write throughput is required
- Strict ACID guarantees are unnecessary
- Horizontal scaling is easier
Message Flows
1‑to‑1 Message Flow
- The sender sends a message via WebSocket.
- The chat server validates and stores the message in the database (for persistence).
- The server checks Redis to determine whether the recipient is online.
- If online: deliver immediately via WebSocket.
- If offline: keep the message stored; deliver when the user reconnects.
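The steps above can be sketched as a single send function. The `database`, `registry`, and `sockets` dictionaries are simple in-memory stand-ins for DynamoDB, Redis, and the per-server WebSocket connections; the names are illustrative.

```python
database = []   # durable message store (stand-in for DynamoDB)
registry = {}   # userId -> serverId (stand-in for Redis)
sockets = {}    # userId -> payloads pushed over an open WebSocket (stand-in)

def send_direct_message(sender_id, recipient_id, text):
    message = {"from": sender_id, "to": recipient_id, "text": text,
               "delivered": False}
    database.append(message)        # 1. persist before attempting delivery
    if recipient_id in registry:    # 2. online check via the registry
        sockets.setdefault(recipient_id, []).append(message)  # 3. push
        message["delivered"] = True
    return message                  # offline: stays stored, marked undelivered

registry["bob"] = "chat-server-1"                 # bob is online
send_direct_message("alice", "bob", "hi")
send_direct_message("alice", "carol", "hello")    # carol offline: stored only
```

Persisting before delivery is the key ordering: if the server crashes between steps, the message is still in durable storage and can be redelivered.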
Group Chat Message Flow
- A user sends a message to a group.
- The message is stored in the database with the group ID.
- The server retrieves the list of group members.
- For each member, check Redis for their connection:
- If online → deliver via WebSocket.
- If offline → deliver when they reconnect.
Challenges
Designing the architecture is only the beginning. The real complexity lies in handling the following challenges at scale.
Scaling Millions of WebSocket Connections
- Each active user maintains a persistent WebSocket connection.
- Each connection consumes memory; a single server can handle only a limited number of concurrent connections.
- Sudden traffic spikes (e.g., during peak hours) can overwhelm servers.
Solution
- Use horizontal scaling (multiple chat servers).
- Keep servers stateless; store connection metadata in a centralized store like Redis.
- Use load balancers to distribute traffic evenly.
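As a toy illustration of the last point, here is a least-connections placement policy. A real load balancer (typically operating at L4 for WebSockets) does this in-network; this sketch only shows the selection logic.

```python
def pick_server(connection_counts):
    """Pick the chat server currently holding the fewest open connections."""
    return min(connection_counts, key=connection_counts.get)

servers = {"chat-1": 120, "chat-2": 95, "chat-3": 110}
target = pick_server(servers)
servers[target] += 1   # the new WebSocket lands on the least-loaded server
assert target == "chat-2"
```

Because the servers are stateless and the user-to-server mapping lives in Redis, it does not matter which server a connection lands on.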
The Fan‑Out Problem in Group Chats
When a user sends a message to a group with 10,000 members, the system must deliver that message to all members, creating massive delivery overhead.
Two common approaches
| Approach | Description | Trade‑off |
|---|---|---|
| Fan‑out on Write | Distribute the message to all members immediately. | Faster reads, heavy write amplification. |
| Fan‑out on Read | Store a single copy; deliver when users fetch/reconnect. | Reduces write load, increases read complexity. |
Large‑scale systems like Slack often use hybrid approaches depending on group size.
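A hybrid can be sketched as a threshold on group size: fan out on write for small groups, fall back to fan-out on read for very large ones. The cutoff value and data structures below are illustrative assumptions, not any particular platform's actual design.

```python
FANOUT_THRESHOLD = 1000  # assumed cutoff between "small" and "large" groups

inboxes = {}     # per-user inbox (target of fan-out on write)
group_log = {}   # per-group log, read lazily (target of fan-out on read)

def post_to_group(group_id, members, message):
    if len(members) <= FANOUT_THRESHOLD:
        # Fan-out on write: copy into every member's inbox immediately.
        for user_id in members:
            inboxes.setdefault(user_id, []).append(message)
        return "write"
    # Fan-out on read: store one copy; members fetch from the log later.
    group_log.setdefault(group_id, []).append(message)
    return "read"

def fetch_group_messages(group_id):
    """What a member of a large group reads on open/reconnect."""
    return list(group_log.get(group_id, []))

assert post_to_group("g-small", ["u1", "u2", "u3"], "hi") == "write"
large = [f"u{i}" for i in range(5000)]
assert post_to_group("g-big", large, "announcement") == "read"
```

The write path stays cheap for huge groups (one stored copy) at the cost of extra work on the read path.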
Message Ordering
- Messages may arrive out of order due to network delays.
- Multiple servers handling requests can cause race conditions.
Solution
- Assign a sequence number per conversation.
- Store timestamps.
- Let clients reorder messages based on sequence IDs.
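The sequence-number approach can be sketched as follows. In a multi-server deployment the per-conversation counter would live in a shared store (for example an atomic increment in Redis); here it is a local dict, which is an assumption for the sake of a runnable example.

```python
import itertools
from collections import defaultdict

_counters = defaultdict(itertools.count)  # conversationId -> counter

def assign_sequence(conversation_id, message):
    """Server side: stamp each message with the next seq for its conversation."""
    message["seq"] = next(_counters[conversation_id])
    return message

def reorder_for_display(messages):
    """Client side: render by sequence number, not network arrival order."""
    return sorted(messages, key=lambda m: m["seq"])

a = assign_sequence("conv-1", {"text": "first"})
b = assign_sequence("conv-1", {"text": "second"})
# Even if `b` arrives before `a` over the network, the client recovers order:
assert [m["text"] for m in reorder_for_display([b, a])] == ["first", "second"]
```

Timestamps alone are not enough because server clocks can skew; a per-conversation counter gives a total order within each conversation.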
Handling Offline Users
Users may disconnect unexpectedly (network issues, app crashes, device shutdown). The system must:
- Store undelivered messages safely.
- Detect when the user reconnects.
- Deliver pending messages reliably.
This requires durable storage (e.g., DynamoDB).
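The reconnect path can be sketched like this: undelivered messages stay in the durable store, and when the user's new WebSocket comes up the server queries for pending messages and pushes them out, oldest first. The list below stands in for a database table.

```python
message_store = []  # durable store (stand-in): dicts with to/text/delivered

def store_message(recipient_id, text):
    message_store.append({"to": recipient_id, "text": text, "delivered": False})

def on_reconnect(user_id):
    """Deliver everything pending for this user, in stored order."""
    pending = [m for m in message_store
               if m["to"] == user_id and not m["delivered"]]
    for m in pending:
        # ...push over the new WebSocket, then mark delivered...
        m["delivered"] = True
    return [m["text"] for m in pending]

store_message("dave", "are you there?")
store_message("dave", "ping")
assert on_reconnect("dave") == ["are you there?", "ping"]
assert on_reconnect("dave") == []   # nothing left after delivery
```

In a real system the delivered flag would only be set after the client acknowledges receipt, which is what leads to the delivery-guarantee question below.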
Delivery Guarantees
Should messages be delivered:
- At most once?
- At least once?
- Exactly once?
Exactly‑once delivery is extremely hard in distributed systems. Most chat systems opt for at‑least‑once delivery:
- Assign a unique message ID.
- Let the client deduplicate if necessary.
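Client-side deduplication under at-least-once delivery can be sketched as a set of seen message IDs: the server may resend a message whose acknowledgement was lost, and the client silently drops the duplicate copy.

```python
import uuid

class ClientInbox:
    def __init__(self):
        self._seen = set()
        self.messages = []

    def receive(self, message):
        """Apply a message once; drop redelivered copies by unique ID."""
        if message["id"] in self._seen:
            return False
        self._seen.add(message["id"])
        self.messages.append(message["text"])
        return True

msg = {"id": str(uuid.uuid4()), "text": "hello"}
inbox = ClientInbox()
assert inbox.receive(msg) is True
assert inbox.receive(msg) is False   # retransmission deduplicated
assert inbox.messages == ["hello"]
```

This combination (retries on the server, idempotent receive on the client) gives the user the appearance of exactly-once delivery without the distributed-systems cost of actually guaranteeing it.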
Summary
Building a real‑time chat system involves more than just wiring up WebSockets. It requires careful consideration of:
- Scalable connection handling
- Efficient fan‑out strategies
- Consistent message ordering
- Reliable offline storage and delivery
- Appropriate delivery guarantees
By combining WebSockets, a stateless server layer, Redis for connection mapping, and a high‑throughput NoSQL database, we can meet the functional and non‑functional requirements of a modern, large‑scale chat application.
Fault Tolerance
What happens if:
- A chat server crashes?
- Redis goes down?
- A database node fails?
Solutions:
- Replicated databases.
- Redis clustering.
- Health checks and auto‑restarts.
- Multi‑availability‑zone deployments.
Large messaging systems like WhatsApp are designed with redundancy at every layer to avoid message loss.
Data Storage & Hot Partitions
If many users are chatting in the same popular group, all writes may hit the same database partition. This creates:
- Hot keys
- Increased latency
- Throttling
Solutions:
- Partition by conversation ID + time bucket.
- Use sharding strategies.
- Distribute load evenly across nodes.
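The first idea can be sketched as a composite partition key: combining the conversation ID with a time bucket means a busy group's writes rotate to a new partition every hour instead of hammering one key forever. The one-hour bucket size is an illustrative choice, not a recommendation.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # one partition per conversation per hour (assumed)

def partition_key(conversation_id, timestamp):
    """Composite key: same conversation + same hour -> same partition."""
    bucket = int(timestamp.timestamp()) // BUCKET_SECONDS
    return f"{conversation_id}#{bucket}"

t1 = datetime(2024, 1, 1, 10, 15, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 10, 45, tzinfo=timezone.utc)
t3 = datetime(2024, 1, 1, 11, 5, tzinfo=timezone.utc)
assert partition_key("conv-9", t1) == partition_key("conv-9", t2)  # same hour
assert partition_key("conv-9", t1) != partition_key("conv-9", t3)  # new bucket
```

The trade-off is that reading a conversation's full history now requires querying several buckets, so bucket size should match the typical query window.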
Conclusion
Designing a real‑time chat application goes far beyond simply sending messages between users. It requires solving complex distributed‑systems problems such as:
- Scaling millions of persistent connections
- Ensuring low latency
- Handling offline users
- Maintaining message ordering
- Guaranteeing fault tolerance
By using WebSockets for bidirectional communication, horizontally scalable chat servers, centralized connection mapping with Redis, and durable storage solutions like Amazon DynamoDB, we can build a system capable of supporting millions of users efficiently.
The real challenge is not just building the architecture — it’s understanding the trade‑offs between scalability, consistency, and reliability.
A well‑designed chat system is a practical example of how distributed‑systems principles are applied in real‑world applications.