Performance Tuning: Linux Kernel Optimizations for 10k+ Connections
Source: Dev.to
Introduction
In high‑concurrency real‑time architectures, the performance bottleneck inevitably shifts from the application layer to the operating system. A well‑optimized Flask‑SocketIO application running on Gevent or Eventlet can theoretically handle tens of thousands of concurrent connections. However, in a default Linux environment, such an application will usually crash or stop accepting connections long before CPU or memory resources are saturated.
This plateau occurs because the Linux kernel, out of the box, is tuned for general‑purpose computing, not for acting as a massive termination point for persistent TCP connections. For a WebSocket server—where connections are long‑lived and stateful—resource exhaustion manifests as:
- File descriptor limits
- Ephemeral‑port starvation
- TCP‑stack congestion
The article below outlines the specific kernel‑level tuning required to scale Flask‑SocketIO beyond the 10 000‑connection barrier.
File Descriptors
In Unix‑like operating systems, “everything is a file.” This includes TCP sockets. When a client connects to your server, the kernel allocates a file descriptor (FD) to represent that socket.
- By default, most Linux distributions enforce a strict limit of 1024 open file descriptors per process – a legacy constraint.
- For a WebSocket server, this means that after roughly 1 000 concurrent users (plus a few descriptors for log files and shared libraries), the application will crash or raise
OSError: [Errno 24] Too many open files
The kernel distinguishes between:
- Soft limit – user‑configurable ceiling.
- Hard limit – absolute ceiling set by root.
Verification
ulimit -n
# → 1024
Remediation
System‑wide (/etc/security/limits.conf):
* soft nofile 65535
* hard nofile 65535
systemd Service (/etc/systemd/system/app.service):
Systemd ignores user limits; you must define them explicitly in the unit file:
[Service]
LimitNOFILE=65535
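To confirm from inside the worker process that the new limits actually took effect (limits.conf and systemd drop-ins are easy to get wrong), here is a minimal sketch using Python's standard-library resource module; the soft limit can only be raised up to the hard limit granted by the service configuration:
import resource

# Current soft and hard limits for open file descriptors (RLIMIT_NOFILE)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")

# Optionally raise the soft limit at process start-up; this can never
# exceed the hard limit set via limits.conf or LimitNOFILE.
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))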
Ephemeral Ports
While file descriptors limit incoming connections, ephemeral ports limit outgoing connections. This distinction is critical for Flask‑SocketIO architectures that rely on a message broker like Redis.
When the Flask app connects to Redis (or Nginx connects to your upstream Flask/Gunicorn workers), it opens a TCP socket. The kernel assigns a local port from the ephemeral‑port range.
- The default range is often narrow (e.g., 32768–60999), providing only ~28 000 usable ports.
- In high‑throughput scenarios (e.g., the Flask app publishing aggressively to Redis, or Nginx proxying heavy traffic to its upstreams), the server can run out of available local ports.
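Besides widening the range, the application itself can avoid burning a fresh ephemeral port per operation by reusing a bounded pool of long-lived broker connections. A minimal sketch with redis-py (host, port, and pool size are illustrative values):
import redis

# A shared, bounded pool: at most 50 sockets to Redis, reused across requests
pool = redis.ConnectionPool(host="127.0.0.1", port=6379, max_connections=50)
r = redis.Redis(connection_pool=pool)

def publish_event(channel: str, payload: str) -> None:
    # Reuses an existing pooled connection instead of opening a new socket,
    # keeping ephemeral-port consumption flat under heavy publish rates.
    r.publish(channel, payload)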
Symptoms
- EADDRNOTAVAIL (Cannot assign requested address) errors in the application logs.
- Sudden inability of the Flask app to talk to Redis, despite Redis being healthy.
- Nginx returning 502 Bad Gateway because it cannot open a socket to the upstream.
Tuning
# Check current range
sysctl net.ipv4.ip_local_port_range
Add to /etc/sysctl.conf to expand the range:
net.ipv4.ip_local_port_range = 1024 65535
Apply the change:
sudo sysctl -p
TIME_WAIT State
The most misunderstood aspect of TCP scaling is the TIME_WAIT state. When a TCP connection is closed, the side that initiated the close enters TIME_WAIT for 2 * MSL (Maximum Segment Lifetime), typically 60 seconds. This ensures that delayed packets are handled correctly and not mistaken for a new connection on the same port.
In a high‑churn environment (e.g., clients constantly refreshing pages or reconnecting), the server can accumulate tens of thousands of sockets in TIME_WAIT. These sockets:
- Consume system resources.
- Lock up the 4‑tuple (source IP, source port, dest IP, dest port), preventing new outgoing connections.
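To check whether a host is accumulating sockets in this state, you can count TCP states directly from /proc; a small sketch (in the kernel's TCP state encoding, 01 is ESTABLISHED and 06 is TIME_WAIT):
from pathlib import Path

def count_tcp_state(state_hex: str) -> int:
    # In /proc/net/tcp and /proc/net/tcp6, column 4 ("st") holds the state in hex
    total = 0
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(table)
        if not path.exists():          # tcp6 is absent if IPv6 is disabled
            continue
        for line in path.read_text().splitlines()[1:]:   # skip the header row
            if line.split()[3] == state_hex:
                total += 1
    return total

print("ESTABLISHED:", count_tcp_state("01"))
print("TIME_WAIT:  ", count_tcp_state("06"))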
tcp_tw_recycle – Do NOT use
Older guides suggested enabling net.ipv4.tcp_tw_recycle. It was removed in Linux kernel 4.12 because it breaks connections for users behind NAT by aggressively dropping out‑of‑order packets.
tcp_tw_reuse – Safe alternative
net.ipv4.tcp_tw_reuse allows the kernel to reclaim a TIME_WAIT socket for a new outgoing connection if the new connection’s timestamp is strictly greater than the last packet seen on the old connection. This is safe for most internal infrastructure (e.g., Flask ↔ Redis).
Configuration (/etc/sysctl.conf):
# Allow reuse of sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1
Apply:
sudo sysctl -p
Benchmarking WebSockets
Standard HTTP benchmarking tools like ab (Apache Bench) are useless for WebSockets. They measure requests per second, whereas the primary metric for WebSockets is concurrency (simultaneous open connections) and message latency.
Recommended tools
- Artillery – supports WebSocket scenarios.
- Locust – can be scripted for persistent connections.
Test methodology
- Ramp‑up – Don't connect 10 k users instantly; an instant burst looks like a SYN flood and can trip SYN‑cookie or rate‑limiting defenses instead of exercising steady‑state behavior. Ramp up over several minutes (see the client sketch after this list).
- Sustain – Hold the connections open for an extended period.
- Broadcast – While connections are held, trigger a broadcast event to measure the latency of the Redis back‑plane and the Nginx proxy buffering.
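To make the methodology concrete, here is a minimal client sketch using the python-socketio client library. The server URL, event name, and "sent_at" payload field are assumptions; a real test should distribute clients across processes or machines with Locust or Artillery rather than running thousands in one interpreter.
import time
import socketio   # python-socketio client: pip install "python-socketio[client]"

TARGET = "http://localhost:5000"   # assumed Flask-SocketIO endpoint
NUM_CLIENTS = 1000                 # step toward 10 k across several runs
RAMP_DELAY = 0.05                  # seconds between connects (gradual ramp-up)

clients = []
for _ in range(NUM_CLIENTS):
    sio = socketio.Client(reconnection=False)

    @sio.on("broadcast")
    def on_broadcast(data):
        # Assumes the server stamps each broadcast with a "sent_at" epoch time
        print(f"broadcast latency: {(time.time() - data['sent_at']) * 1000:.1f} ms")

    sio.connect(TARGET, transports=["websocket"])
    clients.append(sio)
    time.sleep(RAMP_DELAY)         # ramp up instead of a thundering herd

print(f"{len(clients)} connections open; sustaining while broadcasts run...")
time.sleep(300)                    # sustain phase
for sio in clients:
    sio.disconnect()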
Interpretation
| Failure point | Likely cause |
|---|---|
| ~1024 users | File‑descriptor limit still in effect |
| ~28 000 users | Ephemeral‑port range exhausted |
| >30 000 TIME_WAIT sockets | Churn problem or missing tcp_tw_reuse |
Observability
Observability is the only way to confirm that kernel tuning is effective. When running high‑concurrency workloads, monitor specific OS‑level metrics.
What to watch
- process_open_fds for the Gunicorn/uWSGI process – if this metric flattens at a specific number (e.g., 1024 or 4096) while CPU stays low, you have hit the process's file‑descriptor limit.
- Socket state counts – ESTABLISHED, TIME_WAIT, etc. ESTABLISHED should roughly match your active user count; TIME_WAIT spiking to 30 k+ indicates a churn problem or a missing tcp_tw_reuse.
- Allocated sockets – the sockstat output (/proc/net/sockstat, also summarized by ss -s).
Example commands:
# Open file descriptors used by the process (replace <pid>)
ls /proc/<pid>/fd | wc -l
# Socket statistics
ss -s
# Detailed socket list (filter by state)
ss -tan state established | wc -l
ss -tan state time-wait | wc -l
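The same numbers can be collected programmatically and exported to your monitoring stack. A small sketch using psutil (an assumption, not part of the original tooling; listing system-wide sockets may require elevated privileges on some systems):
import collections
import psutil   # pip install psutil

WORKER_PID = 12345   # replace with the Gunicorn/uWSGI worker's PID

# Open file descriptors for the worker process (compare against ulimit -n)
print("open fds:", psutil.Process(WORKER_PID).num_fds())

# System-wide TCP socket states (ESTABLISHED, TIME_WAIT, ...)
states = collections.Counter(c.status for c in psutil.net_connections(kind="tcp"))
for state, count in states.most_common():
    print(f"{state:<12} {count}")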
Connection Tracking (nf_conntrack)
If you are using iptables or Docker, the kernel's connection‑tracking table (nf_conntrack) caps how many connections the firewall can track; once the table is full, new packets are dropped.
# Check kernel log for conntrack table overflow
dmesg | grep "nf_conntrack: table full, dropping packet"
Tune (example):
sysctl -w net.netfilter.nf_conntrack_max=131072
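To see how close the table is to overflowing before packets start dropping, a quick sketch reading the /proc counters (these files exist only when the nf_conntrack module is loaded):
from pathlib import Path

CONNTRACK = Path("/proc/sys/net/netfilter")

# Current number of tracked connections vs. the configured ceiling
count = int((CONNTRACK / "nf_conntrack_count").read_text())
limit = int((CONNTRACK / "nf_conntrack_max").read_text())
print(f"conntrack usage: {count}/{limit} ({count / limit:.0%})")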
Risks of Aggressive Kernel Tuning
| Area | Potential Issue | Impact |
|---|---|---|
| Security | Expanding the ephemeral port range makes port scanning slightly easier. | Negligible inside a private VPC. |
| Stability | Setting file‑descriptor limits too high (e.g., millions) removes a safety valve. | A runaway descriptor or memory leak in the application can exhaust the entire server rather than just the one process. |
| Connection Tracking | Increasing nf_conntrack_max consumes kernel memory (RAM). | Ensure the server has enough RAM to store the state of 100 k+ tracked connections. |
Golden Rule:
Never apply sysctl settings blindly. Deploy them via configuration‑management tools (Ansible, Terraform), document why they are needed, and validate with load testing.
Scaling Flask‑SocketIO to 10 000+ Connections
Achieving high concurrency is as much a systems‑engineering problem as a software one. The default Linux configuration is conservative, geared toward desktops or low‑traffic servers. By systematically addressing:
- File‑descriptor limits (ulimit)
- The ephemeral port range (net.ipv4.ip_local_port_range)
- TCP TIME_WAIT reuse (net.ipv4.tcp_tw_reuse)
you unlock the OS’s ability to handle many simultaneous sockets.
Production‑Readiness Checklist
- File‑descriptor limit – ulimit -n ≥ 65535 for the Gunicorn process.
- Ephemeral port range – net.ipv4.ip_local_port_range = 1024 65535.
- TCP TIME_WAIT reuse – net.ipv4.tcp_tw_reuse = 1.
- TCP TIME_WAIT recycle – net.ipv4.tcp_tw_recycle = 0 (or the key absent; it was removed in kernel 4.12).
- Conntrack table – increase nf_conntrack_max if using stateful firewalls.
Example sysctl Configuration
# /etc/sysctl.d/99-flask-socketio.conf
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
# tcp_tw_recycle was removed in kernel 4.12; omit this line on modern kernels
net.ipv4.tcp_tw_recycle = 0
net.netfilter.nf_conntrack_max = 131072
Apply the changes:
sudo sysctl --system
Remember:
- Monitor memory usage after raising nf_conntrack_max.
- Keep an eye on the number of open file descriptors (lsof, ls /proc/<pid>/fd).
- Perform load testing (e.g., with Locust or Artillery) to verify that the system remains stable under the expected traffic.