How I Built an Adaptive 'Immune System' for Cloud Traffic
Source: Dev.to

The Architecture: Under the Hood
This isn’t just a script running in a vacuum. To make it work in a production‑style environment, I deployed a stack that mirrors real‑world DevOps architecture:
The Source: A Nextcloud instance running in Docker.
The Proxy: Nginx, configured to write JSON access logs to a specific path.
The Bridge: A named Docker volume (HNG-nginx-logs) shared between Nginx (writer) and my Python daemon (reader).
The Brain: A multi‑module Python engine that tails these logs in real‑time.
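The “tails these logs in real‑time” step can be sketched as a `tail -f`‑style generator. This is a minimal illustration, not the project’s actual code; the log path and the JSON field names (`remote_addr`, `status`) are assumptions about the Nginx log format.

```python
import json
import time

def tail_json_log(path):
    """Follow an Nginx JSON access log line by line (like `tail -f`)."""
    with open(path) as f:
        f.seek(0, 2)  # start at end of file; only new entries matter
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # no new data yet, poll again
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or malformed lines

# Usage (field names assumed):
# for entry in tail_json_log("/var/log/nginx/access.json"):
#     handle(entry["remote_addr"], entry["status"])
```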
1. The Sliding Window: Beyond Simple Counters
Most beginners use a simple integer counter that resets every minute. That’s a mistake: a burst of 1,000 requests straddling the minute boundary gets split across two counting periods, so the reset hides the peak.
I used a time‑based sliding window with collections.deque:
```python
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 60

# Each IP has its own deque of timestamps
windows = defaultdict(deque)

def process_request(ip):
    now = time.time()
    window = windows[ip]
    window.append(now)  # Add the new hit
    # Eviction logic: keep only the last 60 seconds
    while window and window[0] < now - WINDOW_SECONDS:
        window.popleft()
    return len(window)  # current requests in this IP's window
```
Detection criteria (whichever triggers first flags an anomaly):
- Z‑score > 3.0 → traffic is 3 standard deviations away from the average.
- 5× Rule → current rate exceeds 5 × the baseline mean.
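The two criteria above can be combined into a single check. A minimal sketch, assuming the engine keeps a list of past per‑window rates as its baseline; the function name and signature are hypothetical, not taken from the project:

```python
import statistics

def is_anomalous(current_rate, history, z_threshold=3.0, ratio_threshold=5.0):
    """Flag an anomaly if either the Z-score or the 5x rule triggers.

    `history` is a list of past per-window request rates (the baseline).
    """
    if len(history) < 2:
        return False  # not enough baseline to judge yet
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    # Z-score: how many standard deviations above the baseline mean?
    z = (current_rate - mean) / stdev if stdev > 0 else 0.0
    if z > z_threshold:
        return True
    # 5x rule: absolute multiple of the baseline mean
    return mean > 0 and current_rate > ratio_threshold * mean
```

The `stdev > 0` guard matters: a perfectly steady baseline has zero variance, and without the 5× rule as a fallback, the Z‑score alone would divide by zero.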
4. The “Zero‑Trust” Error Surge
Attackers often leave a trail of 404 Not Found (scanning) or 500 Internal Server Error (crashing).
The engine tracks the error rate per IP. If an IP’s 4xx/5xx errors exceed 3 × the baseline error rate, the detection threshold is tightened from a Z‑score of 3.0 to 1.5, removing the benefit of the doubt.
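That tightening rule is easy to express as a pure function. A sketch under the stated thresholds; the name `effective_z_threshold` is mine, not the engine’s:

```python
def effective_z_threshold(error_rate, baseline_error_rate,
                          normal=3.0, tightened=1.5, surge_factor=3.0):
    """Tighten the anomaly threshold when an IP's error rate surges.

    If an IP's 4xx/5xx error rate exceeds `surge_factor` x the baseline,
    the Z-score threshold drops from 3.0 to 1.5.
    """
    if baseline_error_rate > 0 and error_rate > surge_factor * baseline_error_rate:
        return tightened
    return normal
```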
5. Enforcement & The Lifecycle of a Ban
Detection is useless without action. When a ban is triggered, the engine talks directly to the Linux kernel:
```shell
# Inject a DROP rule into iptables
iptables -I INPUT -s <IP_ADDRESS> -j DROP
```
- Backoff schedule: bans follow a progression — 10 minutes → 30 minutes → 2 hours → permanent.
- Alerting: a Slack notification is sent within 10 seconds, containing the Z‑score, current rate, and baseline.
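The backoff schedule plus the kernel call can be sketched together. This is an illustration, not the engine’s real code: the `dry_run` flag is mine (the real rule insertion requires root), and only the `iptables` command itself comes from the article.

```python
import subprocess

# Escalating ban durations in seconds; None marks a permanent ban
BAN_SCHEDULE = [600, 1800, 7200, None]  # 10 min -> 30 min -> 2 h -> permanent

ban_counts = {}  # ip -> number of previous bans

def ban_ip(ip, dry_run=True):
    """Insert an iptables DROP rule and return this ban's duration."""
    offense = ban_counts.get(ip, 0)
    duration = BAN_SCHEDULE[min(offense, len(BAN_SCHEDULE) - 1)]
    ban_counts[ip] = offense + 1
    cmd = ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]
    if not dry_run:  # requires root; dry_run lets you test the logic safely
        subprocess.run(cmd, check=True)
    return duration
```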
6. Real‑Time Observability
The live metrics UI serves as the control room. Built with Flask and refreshing every 3 seconds, it provides full visibility:
- Global req/s vs. learned effective mean/stddev.
- Banned IPs with “time remaining” countdowns.
- Top 10 source IPs and system health (CPU/Memory).
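A dashboard like this boils down to one JSON endpoint that the UI polls every 3 seconds. A minimal Flask sketch; the route name, payload shape, and hardcoded values are all illustrative assumptions, not the project’s actual API:

```python
from flask import Flask, jsonify

app = Flask(__name__)

def current_metrics():
    # In the real engine these values would come from the detection
    # modules; hardcoded here purely for illustration.
    return {
        "global_rps": 42.0,
        "effective_mean": 10.4,
        "effective_stddev": 1.1,
        "banned_ips": [{"ip": "203.0.113.7", "seconds_remaining": 540}],
        "top_ips": [{"ip": "198.51.100.3", "rps": 5.2}],
    }

@app.route("/metrics")
def metrics():
    # The dashboard front end polls this endpoint every 3 seconds
    return jsonify(current_metrics())

# Run with: app.run(host="0.0.0.0", port=8080)
```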
Lessons Learned
The biggest takeaway: DevOps is about observation, not just maintenance. The hardest part wasn’t the architecture; it was the math. I spent far more time fine‑tuning thresholds so the system could distinguish between a successful product launch and a genuine DDoS attack.
Real‑world quirk: during testing I ended up banning my own Docker gateway (172.18.0.1). Nginx saw internal traffic through the Docker bridge, the engine flagged the gateway as an aggressive attacker, and locked it out. This forced a more robust whitelisting strategy for internal CIDR ranges, proving that even the best math must be grounded in the reality of your specific network.
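The whitelisting fix maps cleanly onto Python’s standard `ipaddress` module. A sketch of the idea; the exact CIDR list is an assumption (172.16.0.0/12 covers Docker’s default bridge range, including the 172.18.0.1 gateway from the story):

```python
import ipaddress

# Internal ranges that must never be banned (assumed list for illustration)
WHITELIST = [ipaddress.ip_network(n)
             for n in ("172.16.0.0/12", "10.0.0.0/8", "127.0.0.0/8")]

def is_whitelisted(ip):
    """Check an IP against internal CIDR ranges before any ban decision."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)
```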