Scaling Horizontally: Kubernetes, Sticky Sessions, and Redis
Source: Dev.to
Introduction
Scaling stateless HTTP applications is a well‑understood problem: spin up more pods, put a load balancer in front, and use a round‑robin algorithm. If one pod fails, the next request is simply routed to another healthy instance. However, real‑time applications using WebSockets—specifically those built with Flask‑SocketIO—break this paradigm fundamentally.
WebSockets rely on long‑lived, stateful TCP connections. Once a client connects to a server process, that specific process holds the socket file descriptor and the in‑memory context (rooms, session data) for that user. If you simply replicate a Flask‑SocketIO container to ten pods in Kubernetes, the system will fail immediately upon deployment.

This failure occurs because the standard horizontal‑scaling model does not account for the dual requirements of the Socket.IO protocol:
- Connection persistence during the handshake
- Distributed event propagation after the connection is established
To scale Flask‑SocketIO effectively, we must move beyond the single‑server mindset and implement a distributed architecture utilizing Kubernetes Ingress for session affinity and Redis as a pub/sub message bus.
The Stateful Problem: Why Round‑Robin Fails
To understand why standard load balancing fails, we must look at the Socket.IO protocol negotiation. Unlike raw WebSockets, Socket.IO does not immediately establish a WebSocket connection. Instead, it typically begins with HTTP long‑polling to ensure compatibility and robust connectivity through restrictive proxies.
The handshake sequence looks like this:
1. Handshake request: `GET /socket.io/?EIO=4&transport=polling`. The server responds with a session ID (`sid`) plus the ping interval and timeout values.
2. Poll request: `GET /socket.io/?EIO=4&transport=polling&sid=...`
3. Upgrade request: the client sends an `Upgrade: websocket` header to switch protocols.
In a round‑robin Kubernetes environment without session affinity, the Handshake Request might route to Pod A, which generates a session ID (e.g., abc-123) and stores it in its local memory. The subsequent Poll Request might be routed by the Service to Pod B. Pod B has no record of session abc-123 in its memory, so it rejects the request with a 400 Bad Request or {"code":1,"message":"Session ID unknown"} error.
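You can reproduce the failure with a few lines of Python (a sketch: the host is hypothetical, and it assumes the Engine.IO v4 open packet, a `0` prefix followed by JSON):

```python
# Sketch: reproduce the split-handshake failure against a round-robin Service.
# Assumes a Socket.IO server reachable at BASE (hypothetical host).
import json
import requests

BASE = "http://socket.example.com/socket.io/"

# Step 1: handshake. The open packet is a '0' followed by a JSON payload.
r1 = requests.get(BASE, params={"EIO": "4", "transport": "polling"})
sid = json.loads(r1.text[1:])["sid"]
print("handshake ok, sid =", sid)

# Step 2: poll with that sid. If round-robin routes this to a different pod,
# it fails with 400 {"code":1,"message":"Session ID unknown"}.
r2 = requests.get(BASE, params={"EIO": "4", "transport": "polling", "sid": sid})
print(r2.status_code, r2.text)
```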
Even if the connection successfully upgrades to WebSocket (which locks the TCP connection to a single pod), the system remains broken for broadcasting. If User A is connected to Pod A and User B is connected to Pod B, and they are both in a chat room room_1, a message sent by User A will only exist inside Pod A’s memory. Pod B will never know it needs to forward that message to User B.
Sticky Sessions: Configuring Ingress‑Nginx
The solution to the handshake failure is session affinity, commonly known as “Sticky Sessions.” This ensures that once a client initiates a handshake with a specific pod, all subsequent requests from that client are routed to the exact same pod.

In Kubernetes, this is typically handled at the Ingress controller level rather than the Service level (which offers sessionAffinity: ClientIP, but this is often unreliable behind NATs). For ingress‑nginx, the standard controller used in many clusters, stickiness is achieved via cookie‑based affinity.
Configuration via Annotations
Add the following annotations to your Ingress resource to inject a routing cookie:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: socketio-ingress
  annotations:
    # Enable cookie-based affinity
    nginx.ingress.kubernetes.io/affinity: "cookie"
    # Name of the cookie sent to the client
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    # Critical: use "persistent" mode to prevent rebalancing active sessions
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    # Hash algorithm (sha1, md5, or index)
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    # Duration (should match your socket.io ping timeout logic)
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
spec:
  rules:
  - host: socket.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: socketio-service
            port:
              number: 5000
```
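Once the manifest is applied, you can confirm that affinity is active by checking for the route cookie on the first response (a minimal sketch, reusing the hypothetical host from the manifest above):

```python
# Sketch: verify that ingress-nginx sets the affinity cookie.
import requests

resp = requests.get(
    "http://socket.example.com/socket.io/",
    params={"EIO": "4", "transport": "polling"},
)
# A non-empty "route" cookie means cookie-based affinity is active.
print(resp.cookies.get("route"))
```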
The “Persistent” vs. “Balanced” Trap
A common configuration mistake is ignoring the affinity-mode. In the default balanced mode, Nginx may redistribute sessions when the number of pods scales up or down in order to balance the load. For stateless apps this is fine, but for WebSockets it breaks the connection. Setting `nginx.ingress.kubernetes.io/affinity-mode: "persistent"` ensures that Nginx honors the cookie even as pod distribution changes, preserving WebSocket connection stability at the cost of potential load imbalance.
The Redis Backplane: Distributed Event Propagation
Sticky sessions solve the handshake problem, but they do not address the need for cross‑pod event broadcasting. To propagate events (e.g., chat messages, notifications) to all connected clients regardless of which pod they are attached to, a shared message bus is required. Redis, used as a pub/sub backplane, fulfills this role:
- Each Flask‑SocketIO instance publishes events to a Redis channel.
- All instances subscribe to the same channel, receiving events from peers.
- The Redis server can be deployed as a StatefulSet with a stable DNS name (e.g., `redis-master.default.svc.cluster.local`) and optionally protected by a password or TLS.
By combining sticky sessions (for connection persistence) with a Redis backplane (for distributed event propagation), you achieve a truly scalable Flask‑SocketIO deployment on Kubernetes.
Sticky Sessions Solve the Connection Problem, but They Create a New Isolation Problem
Users are now siloed on different servers. To allow User A (on Pod 1) to send a message to User B (on Pod 2), we need a backplane—a mechanism to bridge the isolated memory spaces of the Flask processes.

Flask‑SocketIO solves this using a Message Queue, with Redis being the most performant and common choice. This implements the Pub/Sub (Publish/Subscribe) pattern.
How It Works Internally
When you configure Flask‑SocketIO with a Redis message queue:
| Step | Description |
|---|---|
| Subscription | Every Flask‑SocketIO worker establishes a connection to Redis and subscribes to a specific channel (usually flask-socketio). |
| Emission | When code in Pod A executes emit('chat', msg, room='lobby'), it does not loop through its own client list. Instead, it publishes a message to Redis saying “Send chat to lobby”. |
| Distribution | Redis pushes this message to all other subscribed Flask workers (Pod B, Pod C, …). |
| Fan‑Out | Each pod receives the Redis message, checks its own local memory for clients in lobby, and forwards the message to them over their open WebSocket connections. |
This architecture decouples the origin of an event from the delivery of the event.
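The pattern is easy to see with plain redis-py. The sketch below mimics the flow in the table; it is illustrative only, since the actual payload format Flask-SocketIO publishes on its channel is an internal detail of its RedisManager:

```python
# Illustrative sketch of the pub/sub fan-out pattern (not Flask-SocketIO's
# real wire format, which is internal to its RedisManager).
import json
import threading
import time

import redis

r = redis.Redis(host="redis-master", port=6379, db=0)
CHANNEL = "flask-socketio"  # the default channel name

def publish_event(event, data, room):
    # What a pod does on emit(): publish to Redis instead of looping
    # over its local client list.
    r.publish(CHANNEL, json.dumps({"event": event, "data": data, "room": room}))

def listen():
    # What every pod does at startup: subscribe, then fan out each
    # incoming event to its own locally connected clients.
    pubsub = r.pubsub()
    pubsub.subscribe(CHANNEL)
    for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        payload = json.loads(msg["data"])
        # ...here a real pod would look up local members of payload["room"]
        # and write the event to their open WebSocket connections.
        print("deliver", payload["event"], "to local members of", payload["room"])

threading.Thread(target=listen, daemon=True).start()
time.sleep(0.1)  # give the subscriber a moment to register
publish_event("chat", {"text": "hello"}, room="lobby")
time.sleep(0.1)  # let the listener print before the process exits
```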
Installation: Setting Up Redis and Flask‑SocketIO
Implementing this requires installing the Redis server (usually via a Helm chart in Kubernetes, e.g., bitnami/redis) and configuring the Python application to use it.
Dependencies
```bash
pip install flask-socketio redis
```
Note: If you use `eventlet` or `gevent` for async workers, ensure the Redis client is monkey-patch compatible, or use a driver that is. The standard `redis-py` works well with recent versions of `eventlet` when patched correctly.
Application Configuration
Pass the connection string to the SocketIO constructor. This is the only code change required to switch from a single‑node memory store to a distributed Redis store.
```python
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)

# The `message_queue` argument enables the Redis backend.
# In Kubernetes, `redis-master` is typically the service DNS name.
socketio = SocketIO(
    app,
    message_queue='redis://redis-master:6379/0',
    cors_allowed_origins="*"
)

@socketio.on('message')
def handle_message(data):
    # This emit is now broadcast via Redis to all pods
    socketio.emit('response', data)
```
Common Mistake: Do not use the `client_manager` argument manually unless you are customizing the underlying Engine.IO implementation. The `message_queue` argument is the high-level wrapper that configures the `RedisManager` automatically.
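One convenient consequence of the `message_queue` backplane, also described in the Flask-SocketIO documentation, is that a process serving no sockets at all (such as a background worker) can emit to connected clients by constructing a SocketIO instance with only the queue URL:

```python
# Sketch: emitting from an external process (e.g., a background worker).
from flask_socketio import SocketIO

# No Flask app, no server: this instance only writes events to the queue.
external = SocketIO(message_queue='redis://redis-master:6379/0')
external.emit('response', {'text': 'job finished'})
```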
Trade‑offs: Latency and Bottlenecks
While this architecture enables horizontal scaling, it introduces specific engineering trade‑offs that must be monitored.
Latency Overhead
In a single‑node setup, an emit is a direct memory operation. In a distributed setup, every broadcast involves a network round‑trip to Redis.
Client → Pod A → Redis → Pod B → Client
- This adds single‑digit millisecond latency (typically 1–5 ms) per hop.
- Network congestion or Redis overload can increase latency dramatically.
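To sanity-check the per-hop cost in your own cluster, you can time a round-trip to Redis from inside a pod (a sketch, assuming the `redis-master` service name used earlier):

```python
# Sketch: measure the Redis round-trip that every cross-pod broadcast now pays.
import time

import redis

r = redis.Redis(host="redis-master", port=6379, db=0)

start = time.perf_counter()
r.ping()  # one network round-trip to Redis
print(f"Redis RTT: {(time.perf_counter() - start) * 1000:.2f} ms")
```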
Production Checklist
| Layer | Requirement |
|---|---|
| Layer 1 (Ingress) | MUST enable sticky sessions (cookie affinity) to ensure handshake completion. |
| Layer 2 (App) | MUST configure a Redis message queue to bridge isolated worker processes. |
- Ingress configured with `affinity: "cookie"` and `affinity-mode: "persistent"`.
- Redis deployed (preferably with persistence disabled for pure Pub/Sub performance, or enabled if you also need it for storage).
- Flask-SocketIO initialized with `message_queue='redis://...'`.
- Monitoring in place for Redis CPU, network latency, and WebSocket connection health.
- Gevent/Eventlet monkey-patching applied at the very top of the application entry point (see the sketch below).
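For reference, that last checklist item typically looks like the following (a sketch; the `app` module name is hypothetical):

```python
# Entry point sketch: monkey-patching must run before any other import
# touches the socket or threading modules.
import eventlet
eventlet.monkey_patch()

from app import app, socketio  # hypothetical module containing the code above

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)
```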
By implementing this architecture, you transform Flask‑SocketIO from a development toy into a robust, scalable real‑time platform capable of handling tens of thousands of concurrent connections.