How I Built an Adaptive 'Immune System' for Cloud Traffic
Source: Dev.to

The Architecture: Under the Hood
This isn’t just a script running in a vacuum. To make it work in a production‑style environment, I deployed a stack that mirrors real‑world DevOps architecture:
The Source: A Nextcloud instance running in Docker.
The Proxy: Nginx, configured to write JSON access logs to a specific path.
The Bridge: A named Docker volume (HNG-nginx-logs) shared between Nginx (writer) and my Python daemon (reader).
The Brain: A multi‑module Python engine that tails these logs in real‑time.
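The “tails these logs in real‑time” step can be sketched as a `tail -f`‑style generator. This is a minimal illustration, not the project’s actual code; the log path and the JSON field names (`remote_addr`, `status`) are assumptions about the Nginx log format.

```python
import json
import time

def tail_json_log(path):
    """Follow an Nginx JSON access log line by line (like `tail -f`)."""
    with open(path) as f:
        f.seek(0, 2)  # start at end of file; only new entries matter
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # no new data yet, poll again
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or malformed lines

# Usage (field names assumed):
# for entry in tail_json_log("/var/log/nginx/access.json"):
#     handle(entry["remote_addr"], entry["status"])
```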
1. The Sliding Window: Beyond Simple Counters
Most beginners use a simple integer counter that resets every minute. That’s a mistake: a burst of 1,000 requests straddling the minute boundary gets split across two counting periods, so the reset hides the peak.
I used a time‑based sliding window with collections.deque:
```python
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 60

# Each IP has its own deque of timestamps
windows = defaultdict(deque)

def process_request(ip):
    now = time.time()
    window = windows[ip]
    window.append(now)  # Add the new hit
    # Eviction logic: keep only the last 60 seconds
    while window and window[0] < now - WINDOW_SECONDS:
        window.popleft()
    return len(window)  # current requests in this IP's window
```
Detection criteria (whichever triggers first flags an anomaly):
- Z‑score > 3.0 → traffic is 3 standard deviations away from the average.
- 5× Rule → current rate exceeds 5 × the baseline mean.
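The two criteria above can be combined into a single check. A minimal sketch, assuming the engine keeps a list of past per‑window rates as its baseline; the function name and signature are hypothetical, not taken from the project:

```python
import statistics

def is_anomalous(current_rate, history, z_threshold=3.0, ratio_threshold=5.0):
    """Flag an anomaly if either the Z-score or the 5x rule triggers.

    `history` is a list of past per-window request rates (the baseline).
    """
    if len(history) < 2:
        return False  # not enough baseline to judge yet
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    # Z-score: how many standard deviations above the baseline mean?
    z = (current_rate - mean) / stdev if stdev > 0 else 0.0
    if z > z_threshold:
        return True
    # 5x rule: absolute multiple of the baseline mean
    return mean > 0 and current_rate > ratio_threshold * mean
```

The `stdev > 0` guard matters: a perfectly steady baseline has zero variance, and without the 5× rule as a fallback, the Z‑score alone would divide by zero.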
4. The “Zero‑Trust” Error Surge
Attackers often leave a trail of 404 Not Found (scanning) or 500 Internal Server Error (crashing).
The engine tracks the error rate per IP. If an IP’s 4xx/5xx errors exceed 3 × the baseline error rate, the detection threshold is tightened from a Z‑score of 3.0 to 1.5, removing the benefit of the doubt.
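That tightening rule is easy to express as a pure function. A sketch under the stated thresholds; the name `effective_z_threshold` is mine, not the engine’s:

```python
def effective_z_threshold(error_rate, baseline_error_rate,
                          normal=3.0, tightened=1.5, surge_factor=3.0):
    """Tighten the anomaly threshold when an IP's error rate surges.

    If an IP's 4xx/5xx error rate exceeds `surge_factor` x the baseline,
    the Z-score threshold drops from 3.0 to 1.5.
    """
    if baseline_error_rate > 0 and error_rate > surge_factor * baseline_error_rate:
        return tightened
    return normal
```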
5. Enforcement & The Lifecycle of a Ban
Detection is useless without action. When a ban is triggered, the engine talks directly to the Linux kernel:
```shell
# Inject a DROP rule into iptables
iptables -I INPUT -s <IP_ADDRESS> -j DROP
```
- Backoff schedule: bans follow a progression — 10 minutes → 30 minutes → 2 hours → permanent.
- Alerting: a Slack notification is sent within 10 seconds, containing the Z‑score, current rate, and baseline.
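The backoff schedule plus the kernel call can be sketched together. This is an illustration, not the engine’s real code: the `dry_run` flag is mine (the real rule insertion requires root), and only the `iptables` command itself comes from the article.

```python
import subprocess

# Escalating ban durations in seconds; None marks a permanent ban
BAN_SCHEDULE = [600, 1800, 7200, None]  # 10 min -> 30 min -> 2 h -> permanent

ban_counts = {}  # ip -> number of previous bans

def ban_ip(ip, dry_run=True):
    """Insert an iptables DROP rule and return this ban's duration."""
    offense = ban_counts.get(ip, 0)
    duration = BAN_SCHEDULE[min(offense, len(BAN_SCHEDULE) - 1)]
    ban_counts[ip] = offense + 1
    cmd = ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]
    if not dry_run:  # requires root; dry_run lets you test the logic safely
        subprocess.run(cmd, check=True)
    return duration
```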
6. Real‑Time Observability
The live metrics UI serves as the control room. Built with Flask and refreshing every 3 seconds, it provides full visibility:
- Global req/s vs. learned effective mean/stddev.
- Banned IPs with “time remaining” countdowns.
- Top 10 source IPs and system health (CPU/Memory).
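A dashboard like this boils down to one JSON endpoint that the UI polls every 3 seconds. A minimal Flask sketch; the route name, payload shape, and hardcoded values are all illustrative assumptions, not the project’s actual API:

```python
from flask import Flask, jsonify

app = Flask(__name__)

def current_metrics():
    # In the real engine these values would come from the detection
    # modules; hardcoded here purely for illustration.
    return {
        "global_rps": 42.0,
        "effective_mean": 10.4,
        "effective_stddev": 1.1,
        "banned_ips": [{"ip": "203.0.113.7", "seconds_remaining": 540}],
        "top_ips": [{"ip": "198.51.100.3", "rps": 5.2}],
    }

@app.route("/metrics")
def metrics():
    # The dashboard front end polls this endpoint every 3 seconds
    return jsonify(current_metrics())

# Run with: app.run(host="0.0.0.0", port=8080)
```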
Lessons Learned
The biggest takeaway: DevOps is about observation, not just maintenance. The hardest part wasn’t the architecture; it was the math. I spent far more time fine‑tuning thresholds so the system could distinguish between a successful product launch and a genuine DDoS attack.
Real‑world quirk: during testing I ended up banning my own Docker gateway (172.18.0.1). Nginx saw internal traffic through the Docker bridge, the engine flagged the gateway as an aggressive attacker, and locked it out. This forced a more robust whitelisting strategy for internal CIDR ranges, proving that even the best math must be grounded in the reality of your specific network.
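The whitelisting fix maps cleanly onto Python’s standard `ipaddress` module. A sketch of the idea; the exact CIDR list is an assumption (172.16.0.0/12 covers Docker’s default bridge range, including the 172.18.0.1 gateway from the story):

```python
import ipaddress

# Internal ranges that must never be banned (assumed list for illustration)
WHITELIST = [ipaddress.ip_network(n)
             for n in ("172.16.0.0/12", "10.0.0.0/8", "127.0.0.0/8")]

def is_whitelisted(ip):
    """Check an IP against internal CIDR ranges before any ban decision."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)
```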