I Crashed My Server with Promise.all() - Here's How I Built a 121 RPS Fuzzer Instead
When I started building 404fuzz, I had one goal: make it fast. Really fast. But I quickly learned that speed in Node.js isn’t just about throwing more requests at a server. It’s about understanding how Node.js actually works.
Below is the journey from my first “obvious” solution to a fuzzer that achieves 121 RPS on modest hardware.
Chapter 1: The Promise.all() Trap 🪤
My First Thought
“Easy! I’ll just load all my wordlist paths and fire them all at once with Promise.all()!”
// My first naive approach
const wordlist = ['admin', 'backup', 'config', /* … */]; // 10,000 paths
const promises = wordlist.map(path => fetch(`${target}/${path}`));
await Promise.all(promises); // Fire everything!
The Brutal Reality
My laptop froze, and the target server likely started throttling.
What went wrong?
- Promise.all() is not parallel execution.
- It opens 10,000 simultaneous connections, consuming massive memory and overwhelming both the client and the server.
- Result: system crash, memory exhaustion, or an accidental DDoS.
Understanding Node.js: Concurrent, Not Parallel 🔄
┌─────────────────────────────────────────────────┐
│ Node.js Single Thread │
├─────────────────────────────────────────────────┤
│ Your Code (Asynchronous) │
│ ↓ │
│ Event Demultiplexer (Receives all events) │
│ ↓ │
│ Event Queue [Event1, Event2, Event3, …] │
│ ↓ │
│ Event Loop (while(queue.length > 0)) │
│ ├─ Takes Event1 │
│ ├─ Executes Callback │
│ ├─ Returns immediately (non‑blocking!) │
│ └─ Takes Event2… │
└─────────────────────────────────────────────────┘
Key Insight: Node.js is concurrent (non‑blocking), not parallel (multiple things at once).
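You can see that single thread with two timers. This minimal sketch (paste it into node) shows that “concurrent” callbacks still take turns on one thread:
// Two timers are in flight concurrently, but their callbacks share one thread.
// A busy-wait in the first callback delays the second: there is no second thread.
setTimeout(() => {
  const end = Date.now() + 1000;
  while (Date.now() < end) {} // blocks the event loop for 1 second
  console.log('A done');
}, 0);
setTimeout(() => console.log('B done'), 0); // fires ~1s late, after A releases the thread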
When you call Promise.all() with thousands of requests, you get:
- ❌ No parallel threads
- ✅ 10,000 open sockets
- ❌ Massive memory usage
- ❌ Overload of both client and target
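To see the eager start for yourself, here is a small sketch built on the naive code above; nothing in it is specific to 404fuzz:
let inFlight = 0;
const promises = wordlist.map(async (path) => {
  inFlight++; // each fetch starts the moment map() runs
  try {
    return await fetch(`${target}/${path}`);
  } finally {
    inFlight--;
  }
});
console.log(inFlight); // ~10,000: every request started before we awaited anything
await Promise.all(promises); // this line only waits, it doesn't schedule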
Chapter 2: The Queue Model – Controlled Chaos 🎯
The Better Approach
Use bounded concurrency: limit the number of simultaneous requests and queue the rest.
┌────────────────────────────────────────┐
│ Wordlist (10,000 paths) │
└────────────┬───────────────────────────┘
↓
┌────────────────────────────────────────┐
│ Request Queue │
│ [req1, req2, req3, req4, req5, …] │
└────────────┬───────────────────────────┘
↓
┌────────────────────────────────────────┐
│ Concurrency Limit (e.g., 50) │
│ │
│ [Active1] [Active2] … [Active50] │
│ ↓ ↓ ↓ │
│ Response Response Response │
│ ↓ ↓ ↓ │
│ Next from queue (req51) │
└────────────────────────────────────────┘
Implementation
class RequestQueue {
  constructor(concurrency = 50) {
    this.concurrency = concurrency;
    this.running = 0;
    this.queue = [];
  }

  async add(task) {
    // If we're at the limit, wait for a slot to free up
    if (this.running >= this.concurrency) {
      await new Promise(resolve => this.queue.push(resolve));
    }
    this.running++;
    try {
      return await task();
    } finally {
      this.running--;
      // Release the next queued task
      if (this.queue.length > 0) {
        const resolve = this.queue.shift();
        resolve();
      }
    }
  }
}
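Here is how the queue might be driven. fuzzPath() is a hypothetical stand-in for 404fuzz's real request logic:
const queue = new RequestQueue(50); // -t 50

// Hypothetical request function (the real one lives inside 404fuzz)
async function fuzzPath(path) {
  const res = await fetch(`${target}/${path}`);
  if (res.status !== 404) console.log(`${res.status} /${path}`);
}

// Enqueue everything at once; the queue guarantees at most 50 requests in flight.
// (A plain for…of with `await queue.add(…)` would serialize them to one at a time.)
await Promise.all(wordlist.map(path => queue.add(() => fuzzPath(path))));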
Results
- ✅ Memory stays stable
- ✅ Target server remains responsive
- ✅ Predictable resource usage
- ✅ Concurrency tunable via the -t flag
But I wanted MORE speed. Time for the next level.
Chapter 3: Multi‑Core Power – The Cluster Model 💪
The Problem
Node.js executes your JavaScript on a single thread. On an i5 with 8 cores, I was using only about 12% of the CPU.
Solution: Node.js Cluster Module
┌─────────────────────────────────────────────────┐
│ Primary Process (Master) │
│ - Loads wordlist │
│ - Splits work among workers │
│ - Collects results │
└───────────────┬───────────────────────┬─────────┘
│ │
┌───────┴───────┐ ┌───────┴───────┐
↓ ↓ ↓ ↓
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │ │ Worker 4 │
│ Queue │ │ Queue │ │ Queue │ │ Queue │
│ Model │ │ Model │ │ Model │ │ Model │
│ (-t 10) │ │ (-t 10) │ │ (-t 10) │ │ (-t 10) │
└───────────┘ └───────────┘ └───────────┘ └───────────┘
↓ ↓ ↓ ↓
Target Target Target Target
Implementation
const cluster = require('node:cluster');

if (cluster.isPrimary) {
  // Primary process: split the wordlist and fork workers
  const numWorkers = getCoreCount(options.cores); // -c flag
  const workload = splitWordlist(wordlist, numWorkers);
  for (let i = 0; i < numWorkers; i++) {
    // Fork a worker and hand it its chunk of the wordlist
    const worker = cluster.fork();
    worker.send({ paths: workload[i] });
  }
} else {
  // Worker process: fuzz the received chunk with bounded concurrency
  process.on('message', async ({ paths }) => {
    const queue = new RequestQueue(concurrency); // -t flag
    // The queue caps in-flight requests, so Promise.all() is safe here
    await Promise.all(paths.map(path => queue.add(() => fuzzPath(path))));
  });
}
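splitWordlist() isn't shown above; one simple way to implement it is round-robin chunking. This is an illustrative sketch, not necessarily how 404fuzz splits the work:
// Hypothetical splitWordlist(): deal paths out round-robin so chunks stay even
function splitWordlist(wordlist, numWorkers) {
  const chunks = Array.from({ length: numWorkers }, () => []);
  wordlist.forEach((path, i) => chunks[i % numWorkers].push(path));
  return chunks;
}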
Chapter 4: The Sweet Spot – Balancing Act ⚖️
The Complexity
Now two variables need tuning:
| Flag | Meaning |
|---|---|
| -c | Number of worker processes (clusters) |
| -t | Requests per worker (concurrency) |
What I Discovered
| Configuration | RPS | Why? |
|---|---|---|
| -c 8 -t 2 | ~65 | Too much IPC overhead |
| -c 4 -t 5 | ~95 | Better balance |
| -c 2 -t 10 | ~121 | SWEET SPOT! ⭐ |
| -c 1 -t 20 | ~85 | Bottlenecked by single process |
| -c all -t 20 | ~70 | IPC kills performance |
Pattern: fewer workers with higher per-worker concurrency yield higher throughput, up to the point where a single process becomes the bottleneck (see -c 1 above).
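If you want to reproduce numbers like these, RPS is just completed requests divided by wall-clock seconds. A minimal sketch of that measurement, reusing the Chapter 2 queue and the hypothetical fuzzPath() from earlier:
let completed = 0;
const queue = new RequestQueue(10); // -t 10
const start = Date.now();

await Promise.all(
  wordlist.map(path =>
    queue.add(async () => {
      await fuzzPath(path);
      completed++; // count only finished requests
    })
  )
);

const elapsedSec = (Date.now() - start) / 1000;
console.log(`~${Math.round(completed / elapsedSec)} RPS`);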
Why It Works
Fewer Workers (e.g., -c 2):
┌──────────────┐
│ Worker 1 │───┐
│ -t 10 │ │ Less inter‑process communication
│ (10 reqs) │ ├─> lower overhead
└──────────────┘ │
│
┌──────────────┐ │
│ Worker 2 │───┘
│ -t 10 │
│ (10 reqs) │
└──────────────┘
More Workers (e.g., -c 8):
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│W1 -t2│ │W2 -t2│ │W3 -t2│ │W4 -t2│ …
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
│ │ │ │
└─────────┴─────────┴─────────┘
High IPC overhead!
Chapter 5: Putting It All Together 🎯
Final Architecture
┌───────────────────────────────────────────────┐
│ 404fuzz Primary │
│ - Loads wordlist, parses CLI flags │
│ - Spawns workers (‑c) │
│ - Distributes chunks of paths │
│ - Aggregates results & prints summary │
└───────────────┬───────────────────────┬───────┘
│ │
┌───────▼───────┐ ┌───────▼───────┐
│ Worker 1 │ … │ Worker N │
│ Queue Model │ │ Queue Model │
│ (-t value) │ │ (-t value) │
└───────┬───────┘ └───────┬───────┘
│ │
▼ ▼
Target servers (HTTP endpoints)
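The “aggregates results” step is ordinary cluster IPC. Here is a hedged sketch of what that plumbing might look like; the message shape is my illustration, not 404fuzz's actual protocol:
const cluster = require('node:cluster');

if (cluster.isPrimary) {
  const hits = [];
  // Every worker message funnels through the cluster's 'message' event
  cluster.on('message', (worker, msg) => {
    if (msg.type === 'hit') hits.push(msg);
  });
  // ...once all workers exit, print the summary from `hits`
} else {
  // A worker would call this from fuzzPath() when a response isn't a 404
  const reportHit = (path, status) => process.send({ type: 'hit', path, status });
  reportHit('/admin', 200); // example
}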
With bounded concurrency per worker and the right number of workers (-c 2 -t 10 on my test machine), 404fuzz consistently reaches ~121 RPS while keeping memory stable and the target responsive.
That’s how a naïve Promise.all() crash turned into a fast, reliable fuzzer.