Scaling Horizontally: Kubernetes, Sticky Sessions, and Redis
Source: Dev.to
Introduction
Scaling stateless HTTP applications is a well‑understood problem: spin up more pods, put a load balancer in front, and use a round‑robin algorithm. If one pod fails, the next request is simply routed to another healthy instance. However, real‑time applications using WebSockets—specifically those built with Flask‑SocketIO—break this paradigm fundamentally.
WebSockets rely on long‑lived, stateful TCP connections. Once a client connects to a server process, that specific process holds the socket file descriptor and the in‑memory context (rooms, session data) for that user. If you simply replicate a Flask‑SocketIO container to ten pods in Kubernetes, the system will fail immediately upon deployment.

This failure occurs because the standard horizontal‑scaling model does not account for the dual requirements of the Socket.IO protocol:
- Connection persistence during the handshake
- Distributed event propagation after the connection is established
To scale Flask‑SocketIO effectively, we must move beyond the single‑server mindset and implement a distributed architecture utilizing Kubernetes Ingress for session affinity and Redis as a pub/sub message bus.
The Stateful Problem: Why Round‑Robin Fails
To understand why standard load balancing fails, we must look at the Socket.IO protocol negotiation. Unlike raw WebSockets, Socket.IO does not immediately establish a WebSocket connection. Instead, it typically begins with HTTP long‑polling to ensure compatibility and robust connectivity through restrictive proxies.
The handshake sequence looks like this:
1. Handshake request: `GET /socket.io/?EIO=4&transport=polling`. The server responds with a session ID (`sid`) plus the ping interval and timeout values.
2. Poll request: `GET /socket.io/?EIO=4&transport=polling&sid=...`
3. Upgrade request: the client sends an `Upgrade: websocket` header to switch protocols.
In a round‑robin Kubernetes environment without session affinity, the Handshake Request might route to Pod A, which generates a session ID (e.g., abc-123) and stores it in its local memory. The subsequent Poll Request might be routed by the Service to Pod B. Pod B has no record of session abc-123 in its memory, so it rejects the request with a 400 Bad Request or {"code":1,"message":"Session ID unknown"} error.
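You can reproduce the failure with a few lines of Python (a sketch: the host is hypothetical, and it assumes the Engine.IO v4 open packet, a `0` prefix followed by JSON):

```python
# Sketch: reproduce the split-handshake failure against a round-robin Service.
# Assumes a Socket.IO server reachable at BASE (hypothetical host).
import json
import requests

BASE = "http://socket.example.com/socket.io/"

# Step 1: handshake. The open packet is a '0' followed by a JSON payload.
r1 = requests.get(BASE, params={"EIO": "4", "transport": "polling"})
sid = json.loads(r1.text[1:])["sid"]
print("handshake ok, sid =", sid)

# Step 2: poll with that sid. If round-robin routes this to a different pod,
# it fails with 400 {"code":1,"message":"Session ID unknown"}.
r2 = requests.get(BASE, params={"EIO": "4", "transport": "polling", "sid": sid})
print(r2.status_code, r2.text)
```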
Even if the connection successfully upgrades to WebSocket (which locks the TCP connection to a single pod), the system remains broken for broadcasting. If User A is connected to Pod A and User B is connected to Pod B, and they are both in a chat room room_1, a message sent by User A will only exist inside Pod A’s memory. Pod B will never know it needs to forward that message to User B.
Sticky Sessions: Configuring Ingress‑Nginx
The solution to the handshake failure is session affinity, commonly known as “Sticky Sessions.” This ensures that once a client initiates a handshake with a specific pod, all subsequent requests from that client are routed to the exact same pod.

In Kubernetes, this is typically handled at the Ingress controller level rather than the Service level (which offers sessionAffinity: ClientIP, but this is often unreliable behind NATs). For ingress‑nginx, the standard controller used in many clusters, stickiness is achieved via cookie‑based affinity.
Configuration via Annotations
Add the following annotations to your Ingress resource to inject a routing cookie:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: socketio-ingress
  annotations:
    # Enable cookie-based affinity
    nginx.ingress.kubernetes.io/affinity: "cookie"
    # Name of the cookie sent to the client
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    # Critical: use "persistent" mode to prevent rebalancing active sessions
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    # Hash algorithm (sha1, md5, or index)
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    # Duration (should match your socket.io ping timeout logic)
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
spec:
  rules:
  - host: socket.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: socketio-service
            port:
              number: 5000
```
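Once the manifest is applied, you can confirm that affinity is active by checking for the route cookie on the first response (a minimal sketch, reusing the hypothetical host from the manifest above):

```python
# Sketch: verify that ingress-nginx sets the affinity cookie.
import requests

resp = requests.get(
    "http://socket.example.com/socket.io/",
    params={"EIO": "4", "transport": "polling"},
)
# A non-empty "route" cookie means cookie-based affinity is active.
print(resp.cookies.get("route"))
```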
The “Persistent” vs. “Balanced” Trap
A common configuration mistake is ignoring the affinity-mode. In the default balanced mode, Nginx may redistribute sessions when the number of pods scales up or down in order to balance the load. For stateless apps this is fine, but for WebSockets it breaks the connection. Setting `nginx.ingress.kubernetes.io/affinity-mode: "persistent"` ensures that Nginx honors the cookie even as pod distribution changes, preserving WebSocket connection stability at the cost of potential load imbalance.
The Redis Backplane: Distributed Event Propagation
Sticky sessions solve the handshake problem, but they do not address the need for cross‑pod event broadcasting. To propagate events (e.g., chat messages, notifications) to all connected clients regardless of which pod they are attached to, a shared message bus is required. Redis, used as a pub/sub backplane, fulfills this role:
- Each Flask‑SocketIO instance publishes events to a Redis channel.
- All instances subscribe to the same channel, receiving events from peers.
- The Redis server can be deployed as a StatefulSet with a stable DNS name (e.g., `redis-master.default.svc.cluster.local`) and optionally protected by a password or TLS.
By combining sticky sessions (for connection persistence) with a Redis backplane (for distributed event propagation), you achieve a truly scalable Flask‑SocketIO deployment on Kubernetes.
Sticky Sessions Solve the Connection Problem, but They Create a New Isolation Problem
Users are now siloed on different servers. To allow User A (on Pod 1) to send a message to User B (on Pod 2), we need a backplane—a mechanism to bridge the isolated memory spaces of the Flask processes.

Flask‑SocketIO solves this using a Message Queue, with Redis being the most performant and common choice. This implements the Pub/Sub (Publish/Subscribe) pattern.
How It Works Internally
When you configure Flask‑SocketIO with a Redis message queue:
| Step | Description |
|---|---|
| Subscription | Every Flask‑SocketIO worker establishes a connection to Redis and subscribes to a specific channel (usually flask-socketio). |
| Emission | When code in Pod A executes emit('chat', msg, room='lobby'), it does not loop through its own client list. Instead, it publishes a message to Redis saying “Send chat to lobby”. |
| Distribution | Redis pushes this message to all other subscribed Flask workers (Pod B, Pod C, …). |
| Fan‑Out | Each pod receives the Redis message, checks its own local memory for clients in lobby, and forwards the message to them over their open WebSocket connections. |
This architecture decouples the origin of an event from the delivery of the event.
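The pattern is easy to see with plain redis-py. The sketch below mimics the flow in the table; it is illustrative only, since the actual payload format Flask-SocketIO publishes on its channel is an internal detail of its RedisManager:

```python
# Illustrative sketch of the pub/sub fan-out pattern (not Flask-SocketIO's
# real wire format, which is internal to its RedisManager).
import json
import threading
import time

import redis

r = redis.Redis(host="redis-master", port=6379, db=0)
CHANNEL = "flask-socketio"  # the default channel name

def publish_event(event, data, room):
    # What a pod does on emit(): publish to Redis instead of looping
    # over its local client list.
    r.publish(CHANNEL, json.dumps({"event": event, "data": data, "room": room}))

def listen():
    # What every pod does at startup: subscribe, then fan out each
    # incoming event to its own locally connected clients.
    pubsub = r.pubsub()
    pubsub.subscribe(CHANNEL)
    for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        payload = json.loads(msg["data"])
        # ...here a real pod would look up local members of payload["room"]
        # and write the event to their open WebSocket connections.
        print("deliver", payload["event"], "to local members of", payload["room"])

threading.Thread(target=listen, daemon=True).start()
time.sleep(0.1)  # give the subscriber a moment to register
publish_event("chat", {"text": "hello"}, room="lobby")
time.sleep(0.1)  # let the listener print before the process exits
```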
Installation: Setting Up Redis and Flask‑SocketIO
Implementing this requires installing the Redis server (usually via a Helm chart in Kubernetes, e.g., bitnami/redis) and configuring the Python application to use it.
Dependencies
```bash
pip install flask-socketio redis
```
Note: If you use `eventlet` or `gevent` for async workers, ensure the Redis client is monkey-patch compatible, or use a driver that is. The standard `redis-py` works well with recent versions of `eventlet` when patched correctly.
Application Configuration
Pass the connection string to the SocketIO constructor. This is the only code change required to switch from a single‑node memory store to a distributed Redis store.
```python
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)

# The `message_queue` argument enables the Redis backend.
# In Kubernetes, `redis-master` is typically the service DNS name.
socketio = SocketIO(
    app,
    message_queue='redis://redis-master:6379/0',
    cors_allowed_origins="*"
)

@socketio.on('message')
def handle_message(data):
    # This emit is now broadcast via Redis to all pods
    socketio.emit('response', data)
```
Common Mistake: Do not use the `client_manager` argument manually unless you are customizing the underlying Engine.IO implementation. The `message_queue` argument is the high-level wrapper that configures the `RedisManager` automatically.
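One convenient consequence of the `message_queue` backplane, also described in the Flask-SocketIO documentation, is that a process serving no sockets at all (such as a background worker) can emit to connected clients by constructing a SocketIO instance with only the queue URL:

```python
# Sketch: emitting from an external process (e.g., a background worker).
from flask_socketio import SocketIO

# No Flask app, no server: this instance only writes events to the queue.
external = SocketIO(message_queue='redis://redis-master:6379/0')
external.emit('response', {'text': 'job finished'})
```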
Trade‑offs: Latency and Bottlenecks
While this architecture enables horizontal scaling, it introduces specific engineering trade‑offs that must be monitored.
Latency Overhead
In a single‑node setup, an emit is a direct memory operation. In a distributed setup, every broadcast involves a network round‑trip to Redis.
Client → Pod A → Redis → Pod B → Client
- This adds single‑digit millisecond latency (typically 1–5 ms) per hop.
- Network congestion or Redis overload can increase latency dramatically.
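To sanity-check the per-hop cost in your own cluster, you can time a round-trip to Redis from inside a pod (a sketch, assuming the `redis-master` service name used earlier):

```python
# Sketch: measure the Redis round-trip that every cross-pod broadcast now pays.
import time

import redis

r = redis.Redis(host="redis-master", port=6379, db=0)

start = time.perf_counter()
r.ping()  # one network round-trip to Redis
print(f"Redis RTT: {(time.perf_counter() - start) * 1000:.2f} ms")
```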
Production Checklist
| Layer | Requirement |
|---|---|
| Layer 1 (Ingress) | MUST enable sticky sessions (cookie affinity) to ensure handshake completion. |
| Layer 2 (App) | MUST configure a Redis message queue to bridge isolated worker processes. |
- Ingress configured with `affinity: "cookie"` and `affinity-mode: "persistent"`.
- Redis deployed (preferably with persistence disabled for pure Pub/Sub performance, or enabled if you also need it for storage).
- Flask-SocketIO initialized with `message_queue='redis://...'`.
- Monitoring in place for Redis CPU, network latency, and WebSocket connection health.
- Gevent/Eventlet monkey-patching applied at the very top of the application entry point (see the sketch below).
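For reference, that last checklist item typically looks like the following (a sketch; the `app` module name is hypothetical):

```python
# Entry point sketch: monkey-patching must run before any other import
# touches the socket or threading modules.
import eventlet
eventlet.monkey_patch()

from app import app, socketio  # hypothetical module containing the code above

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)
```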
By implementing this architecture, you transform Flask‑SocketIO from a development toy into a robust, scalable real‑time platform capable of handling tens of thousands of concurrent connections.