I Crashed My Server with Promise.all() - Here's How I Built a 121 RPS Fuzzer Instead

Published: December 13, 2025 at 11:21 PM EST
5 min read
Source: Dev.to

When I started building 404fuzz, I had one goal: make it fast. Really fast. But I quickly learned that speed in Node.js isn’t just about throwing more requests at a server. It’s about understanding how Node.js actually works.

Below is the journey from my first “obvious” solution to a fuzzer that achieves 121 RPS on modest hardware.

Chapter 1: The Promise.all() Trap 🪤

My First Thought

“Easy! I’ll just load all my wordlist paths and fire them all at once with Promise.all()!”

// My first naive approach
const wordlist = ['admin', 'backup', 'config', /* … */]; // 10,000 paths
const promises = wordlist.map(path => fetch(`${target}/${path}`));
await Promise.all(promises); // Fire everything!

The Brutal Reality

My laptop froze, and the target server likely started throttling.
What went wrong?

  • Promise.all() is not parallel execution; it only waits on promises that have already been started.
  • It opens 10,000 simultaneous connections, consuming massive memory and overwhelming both the client and the server (the sketch below shows just how eagerly .map() fires them).
  • Result: system crash, memory exhaustion, or an accidental DDoS.
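
The footgun is easy to reproduce in miniature. In this sketch, slowTask is a hypothetical stand-in for fetch(); it shows that every task starts the moment .map() runs, and Promise.all() merely waits:

// Minimal sketch: slowTask stands in for fetch().
const slowTask = (i) =>
  new Promise((resolve) => {
    console.log(`task ${i} started`); // prints immediately for every i
    setTimeout(() => resolve(i), 1000);
  });

const promises = [1, 2, 3].map(slowTask); // all three "started" logs appear now
await Promise.all(promises);              // this only waits; it schedules nothing

Scale those three tasks up to 10,000 and you have recreated the crash.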

Understanding Node.js: Concurrent, Not Parallel 🔄

┌─────────────────────────────────────────────────┐
│              Node.js Single Thread              │
├─────────────────────────────────────────────────┤
│  Your Code (Asynchronous)                       │
│         ↓                                       │
│  Event Demultiplexer (receives all events)      │
│         ↓                                       │
│  Event Queue [Event1, Event2, Event3, …]        │
│         ↓                                       │
│  Event Loop (while(queue.length > 0))           │
│    ├─ Takes Event1                              │
│    ├─ Executes Callback                         │
│    ├─ Returns immediately (non-blocking!)       │
│    └─ Takes Event2…                             │
└─────────────────────────────────────────────────┘

Key Insight: Node.js is concurrent (non‑blocking), not parallel (multiple things at once).

When you call Promise.all() with thousands of requests, you get:

  • ❌ No parallel threads
  • ✅ 10,000 open sockets
  • ❌ Massive memory usage
  • ❌ Overload of both client and target
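
A self-contained sketch makes the distinction measurable: I/O waits overlap on the single thread, but CPU work cannot:

// Two 1-second I/O waits overlap: finishes in ~1000 ms.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const t0 = Date.now();
await Promise.all([wait(1000), wait(1000)]);
console.log(`I/O: ${Date.now() - t0} ms`); // ~1000

// Two 1-second CPU burns serialize on the one thread: ~2000 ms.
const busy = () => { const end = Date.now() + 1000; while (Date.now() < end); };
const t1 = Date.now();
await Promise.all([Promise.resolve().then(busy), Promise.resolve().then(busy)]);
console.log(`CPU: ${Date.now() - t1} ms`); // ~2000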

Chapter 2: The Queue Model – Controlled Chaos 🎯

The Better Approach

Use bounded concurrency: limit the number of simultaneous requests and queue the rest.

┌────────────────────────────────────────┐
│        Wordlist (10,000 paths)         │
└────────────────────┬───────────────────┘
                     ↓
┌────────────────────────────────────────┐
│             Request Queue              │
│  [req1, req2, req3, req4, req5, …]     │
└────────────────────┬───────────────────┘
                     ↓
┌────────────────────────────────────────┐
│     Concurrency Limit (e.g., 50)       │
│                                        │
│  [Active1] [Active2] … [Active50]      │
│      ↓          ↓            ↓         │
│   Response   Response    Response      │
│      ↓          ↓            ↓         │
│  Next from queue (req51)               │
└────────────────────────────────────────┘

Implementation

class RequestQueue {
  constructor(concurrency = 50) {
    this.concurrency = concurrency;
    this.running = 0;
    this.queue = [];
  }

  async add(task) {
    // If we’re at the limit, wait
    if (this.running >= this.concurrency) {
      await new Promise(resolve => this.queue.push(resolve));
    }

    this.running++;
    try {
      return await task();
    } finally {
      this.running--;
      // Release next queued task
      if (this.queue.length > 0) {
        const resolve = this.queue.shift();
        resolve();
      }
    }
  }
}
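
Wired into the fuzzer, usage might look like this (target and wordlist as in Chapter 1). Promise.all() is back, but now it only collects results; the queue handles admission control:

// Hypothetical usage: at most 50 requests in flight at any moment.
const queue = new RequestQueue(50);

const results = await Promise.all(
  wordlist.map((path) =>
    queue.add(async () => {
      const res = await fetch(`${target}/${path}`);
      return { path, status: res.status };
    })
  )
);

console.log(results.filter((r) => r.status !== 404)); // the interesting hits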

Results

  • ✅ Memory stays stable
  • ✅ Target server remains responsive
  • ✅ Predictable resource usage
  • ✅ Concurrency tunable via -t flag

But I wanted MORE speed. Time for the next level.

Chapter 3: Multi‑Core Power – The Cluster Model 💪

The Problem

Node.js runs your JavaScript on a single thread. On an i5 with 8 cores, I was using only about 12% of the CPU (one core out of eight).

Solution: Node.js Cluster Module

┌─────────────────────────────────────────────────┐
│            Primary Process (Master)             │
│  - Loads wordlist                               │
│  - Splits work among workers                    │
│  - Collects results                             │
└───────────────┬───────────────────────┬─────────┘
                │                       │
        ┌───────┴───────┐       ┌───────┴───────┐
        ↓               ↓       ↓               ↓
 ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐
 │ Worker 1  │   │ Worker 2  │   │ Worker 3  │   │ Worker 4  │
 │  Queue    │   │  Queue    │   │  Queue    │   │  Queue    │
 │  Model    │   │  Model    │   │  Model    │   │  Model    │
 │ (-t 10)   │   │ (-t 10)   │   │ (-t 10)   │   │ (-t 10)   │
 └───────────┘   └───────────┘   └───────────┘   └───────────┘
        ↓               ↓               ↓               ↓
      Target          Target          Target          Target

Implementation

import cluster from 'node:cluster';

// Primary process
if (cluster.isPrimary) {
  const numWorkers = getCoreCount(options.cores); // -c flag
  const workload = splitWordlist(wordlist, numWorkers);

  for (let i = 0; i < numWorkers; i++) {
    const worker = cluster.fork();
    worker.send({ paths: workload[i] }); // hand each worker its chunk
  }
} else {
  // Worker process: drain its chunk through the bounded queue
  process.on('message', async ({ paths }) => {
    const queue = new RequestQueue(concurrency); // concurrency from the -t flag
    for (const path of paths) {
      await queue.add(() => fuzzPath(path));
    }
  });
}
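
The snippet references getCoreCount and splitWordlist without defining them. Here is a plausible sketch of both; the exact signatures and the handling of "all" are my assumptions, not 404fuzz's actual code:

import os from 'node:os';

// Assumed helper: resolve the -c flag to a worker count ("all" = every core).
function getCoreCount(flag) {
  const max = os.availableParallelism(); // use os.cpus().length on older Node
  return flag === 'all' ? max : Math.min(Number(flag), max);
}

// Assumed helper: deal the wordlist into n roughly equal chunks, round-robin.
function splitWordlist(wordlist, n) {
  const chunks = Array.from({ length: n }, () => []);
  wordlist.forEach((path, i) => chunks[i % n].push(path));
  return chunks;
}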

Chapter 4: The Sweet Spot – Balancing Act ⚖️

The Complexity

Now two variables need tuning:

Variable   Meaning
──────────────────────────────────────────────────
-c         Number of worker processes (clusters)
-t         Requests per worker (concurrency)

What I Discovered

Configuration          RPS   Why?
────────────────────────────────────────────────────
-c 8  -t 2            ~65   Too much IPC overhead
-c 4  -t 5            ~95   Better balance
-c 2  -t 10           ~121  SWEET SPOT! ⭐
-c 1  -t 20           ~85   Bottlenecked by single process
-c all -t 20          ~70   IPC kills performance

Pattern: Fewer workers + higher per‑worker concurrency → higher throughput.
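
The post doesn't show how these numbers were taken; RPS boils down to simple wall-clock math, roughly like this sketch (run() is a placeholder for one full fuzz pass):

// Hypothetical measurement: requests completed per wall-clock second.
const started = process.hrtime.bigint();
await run(); // one full pass over the wordlist with the chosen -c / -t
const seconds = Number(process.hrtime.bigint() - started) / 1e9;
console.log(`~${(wordlist.length / seconds).toFixed(0)} RPS`);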

Why It Works

Fewer Workers (e.g., -c 2):
┌──────────────┐
│ Worker 1     │───┐
│ -t 10        │   │  Less inter‑process communication
│ (10 reqs)    │   ├─> lower overhead
└──────────────┘   │

┌──────────────┐   │
│ Worker 2     │───┘
│ -t 10        │
│ (10 reqs)    │
└──────────────┘

More Workers (e.g., -c 8):
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│W1 -t2 │ │W2 -t2 │ │W3 -t2 │ │W4 -t2 │ …
└───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘
    │         │         │         │
    └─────────┴─────────┴─────────┘
         High IPC overhead!

Chapter 5: Putting It All Together 🎯

Final Architecture

┌───────────────────────────────────────────────┐
│                404fuzz Primary                │
│  - Loads wordlist, parses CLI flags           │
│  - Spawns workers (-c)                        │
│  - Distributes chunks of paths                │
│  - Aggregates results & prints summary        │
└───────────────┬───────────────────────┬───────┘
                │                       │
        ┌───────▼───────┐       ┌───────▼───────┐
        │   Worker 1    │   …   │   Worker N    │
        │  Queue Model  │       │  Queue Model  │
        │  (-t value)   │       │  (-t value)   │
        └───────┬───────┘       └───────┬───────┘
                │                       │
                ▼                       ▼
          Target servers (HTTP endpoints)

With bounded concurrency per worker and an optimal number of workers (-c 2 -t 10 on my test machine), 404fuzz consistently reaches ~121 RPS while keeping memory usage low and avoiding overload of the target.

That’s how a naïve Promise.all() crash turned into a fast, reliable fuzzer.
