How to Benchmark Web Frameworks in a Fair, Isolated Way | Mahdi Shamlou

Published: February 21, 2026 at 03:39 PM EST
4 min read
Source: Dev.to

Hey everyone! Mahdi Shamlou here 👋

I've seen many posts online comparing web frameworks, but most of them are either biased, outdated, or hard to reproduce. So I wanted to share a practical way to benchmark any web framework, keeping everything isolated, fair, and reproducible.

We'll use Docker for isolation, k6 for load testing, and two Python frameworks (FastAPI and Flask) as simple examples. The approach works for Node.js, Go, Java, Rust, or anything else.

Overview

Benchmarking web frameworks can be tricky. Many factors affect results:

  • CPU & memory availability
  • Number of workers / threads
  • Background processes
  • Routing, logging, database, I/O

To make a fair comparison you need:

  • Docker containers with fixed CPU & memory limits
  • Identical routes or endpoints in each framework
  • Controlled load tests using k6 (or similar tools)
  • Results saved for later analysis

Project Structure

mkdir web_framework_benchmarks
cd web_framework_benchmarks
mkdir framework1 framework2 k6-tests results

You can replace framework1 and framework2 with any frameworks you want to compare. For demonstration we use a simple /hello endpoint.

FastAPI (Python)

# app.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/hello")
def hello():
    return {"message": "hello world"}

Flask (Python)

# app.py
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/hello")
def hello():
    return jsonify({"message": "hello world"})

You can implement the same endpoint in Node.js, Go, Java, etc., keeping the functionality identical. Optionally add a sleep route to simulate I/O-heavy work.

Dockerfiles (Fair Comparison)

FastAPI Dockerfile

# Dockerfile (FastAPI)
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install fastapi uvicorn gunicorn
CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "-w", "1", "-b", "0.0.0.0:8000", "app:app"]

Flask Dockerfile

# Dockerfile (Flask)
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install flask gunicorn
CMD ["gunicorn", "-w", "1", "-b", "0.0.0.0:8000", "app:app"]

✅ Both containers now run the same worker count. Note that CPU and memory limits are not set in the Dockerfile itself; they must be applied when the containers are started, so that both get an identical resource baseline.
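The limits are applied with `docker run` flags. A sketch of the build-and-run commands, where the image names, host ports (8001 for framework1, matching the k6 script), and the 1-CPU / 512 MB caps are example values you should adjust to your environment:

```shell
# Build one image per framework directory
docker build -t framework1 ./framework1
docker build -t framework2 ./framework2

# Start each container with identical CPU and memory caps (example values)
docker run -d --name fw1 --cpus="1" --memory="512m" -p 8001:8000 framework1
docker run -d --name fw2 --cpus="1" --memory="512m" -p 8002:8000 framework2
```

Benchmark the containers one at a time (or on separate hosts) so they don't compete for the same cores during a test run.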

Load Testing with k6

k6 Script (JavaScript)

// k6-tests/framework1.js (or framework2.js)
import http from "k6/http";
import { sleep } from "k6";

export const options = {
  stages: [
    { duration: "30s", target: 50 },   // rampโ€‘up
    { duration: "1m", target: 200 },  // hold load
    { duration: "30s", target: 0 },    // rampโ€‘down
  ],
  thresholds: {
    "http_req_duration": ["p(95)<200"], // 95% of requests should be <200โ€ฏms
  },
};

export default function () {
  http.get("http://localhost:8001/hello"); // point framework2.js at framework2's port
  sleep(1);
}

Running the Tests

mkdir -p results
k6 run --out json=results/framework1.json k6-tests/framework1.js
k6 run --out json=results/framework2.json k6-tests/framework2.js

The JSON output can be processed with any language or plotting tool.

Analyzing Results

Below is a minimal Python script that extracts average latency, 95th-percentile latency, request count, and failure rate, then plots the average response time.

# analyze.py
import json
import numpy as np
import matplotlib.pyplot as plt

files = {
    "Framework1": "results/framework1.json",
    "Framework2": "results/framework2.json"
}
summary = {}

for name, file in files.items():
    durations, fails, total = [], 0, 0
    with open(file) as f:
        for line in f:
            obj = json.loads(line)
            if obj.get("type") == "Point":
                metric = obj.get("metric")
                if metric == "http_req_duration":
                    durations.append(obj["data"]["value"])
                if metric == "http_req_failed":
                    fails += obj["data"]["value"]
                    total += 1
    if durations:
        summary[name] = {
            "avg": np.mean(durations),
            "p95": np.percentile(durations, 95),
            "requests": len(durations),
            "fail_rate": fails / total if total else 0,
        }

print(summary)

plt.bar(summary.keys(), [summary[n]["avg"] for n in summary])
plt.title("Average Response Time (ms)")
plt.ylabel("Milliseconds")
plt.show()

Key Takeaways

  • Docker isolation makes the benchmark reproducible.
  • Worker count & CPU limits must match across containers.
  • Simple routes may make Flask look faster; donโ€™t be fooled.
  • Async/I/Oโ€‘heavy routes showcase the strengths of FastAPI (or other async frameworks).
  • Always benchmark your actual workload, not just tiny examples.

Repository

The full example, including Dockerfiles, k6 scripts, and analysis code, is available at:

https://github.com/mahdi-shamlou/web_framework_benchmarks