How to Benchmark Web Frameworks in a Fair, Isolated Way | Mahdi Shamlou

Published: February 21, 2026 at 03:39 PM EST
4 min read
Source: Dev.to

Hey everyone! Mahdi Shamlou here 👋

I've seen many posts online comparing web frameworks, but most of them are either biased, outdated, or hard to reproduce. So I wanted to share a practical way to benchmark any web framework, keeping everything isolated, fair, and reproducible.

We'll use Docker for isolation, k6 for load testing, and two Python frameworks (FastAPI and Flask) as simple examples. The approach works for Node.js, Go, Java, Rust, or anything else.

Overview

Benchmarking web frameworks can be tricky. Many factors affect results:

  • CPU & memory availability
  • Number of workers / threads
  • Background processes
  • Routing, logging, database, I/O

To make a fair comparison you need:

  • Docker containers with fixed CPU & memory limits
  • Identical routes or endpoints in each framework
  • Controlled load tests using k6 (or similar tools)
  • Results saved for later analysis

Project Structure

mkdir web_framework_benchmarks
cd web_framework_benchmarks
mkdir framework1 framework2 k6-tests results

You can replace framework1 and framework2 with any frameworks you want to compare. For demonstration we use a simple /hello endpoint.

FastAPI (Python)

# app.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/hello")
def hello():
    return {"message": "hello world"}

Flask (Python)

# app.py
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/hello")
def hello():
    return jsonify({"message": "hello world"})

You can implement the same endpoint in Node.js, Go, Java, etc., keeping the functionality identical. Optionally add a sleep route to simulate I/O-heavy work.

Dockerfiles (Fair Comparison)

FastAPI Dockerfile

# Dockerfile (FastAPI)
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install fastapi uvicorn gunicorn
CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "-w", "1", "-b", "0.0.0.0:8000", "app:app"]

Flask Dockerfile

# Dockerfile (Flask)
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install flask gunicorn
CMD ["gunicorn", "-w", "1", "-b", "0.0.0.0:8000", "app:app"]

✅ Both containers now run the same worker count. Note that CPU and memory limits are not set in the Dockerfile itself; they must be applied when the containers are started, so that both get an identical resource baseline.
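The limits are applied with `docker run` flags. A sketch of the build-and-run commands, where the image names, host ports (8001 for framework1, matching the k6 script), and the 1-CPU / 512 MB caps are example values you should adjust to your environment:

```shell
# Build one image per framework directory
docker build -t framework1 ./framework1
docker build -t framework2 ./framework2

# Start each container with identical CPU and memory caps (example values)
docker run -d --name fw1 --cpus="1" --memory="512m" -p 8001:8000 framework1
docker run -d --name fw2 --cpus="1" --memory="512m" -p 8002:8000 framework2
```

Benchmark the containers one at a time (or on separate hosts) so they don't compete for the same cores during a test run.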

Load Testing with k6

k6 Script (JavaScript)

// k6-tests/framework1.js (or framework2.js)
import http from "k6/http";
import { sleep } from "k6";

export const options = {
  stages: [
    { duration: "30s", target: 50 },   // rampโ€‘up
    { duration: "1m", target: 200 },  // hold load
    { duration: "30s", target: 0 },    // rampโ€‘down
  ],
  thresholds: {
    "http_req_duration": ["p(95)<200"], // 95% of requests should be <200โ€ฏms
  },
};

export default function () {
  http.get("http://localhost:8001/hello"); // point framework2.js at framework2's port
  sleep(1);
}

Running the Tests

mkdir -p results
k6 run --out json=results/framework1.json k6-tests/framework1.js
k6 run --out json=results/framework2.json k6-tests/framework2.js

The JSON output can be processed with any language or plotting tool.

Analyzing Results

Below is a minimal Python script that extracts average latency, 95th-percentile latency, request count, and failure rate, then plots the average response time.

# analyze.py
import json
import numpy as np
import matplotlib.pyplot as plt

files = {
    "Framework1": "results/framework1.json",
    "Framework2": "results/framework2.json"
}
summary = {}

for name, file in files.items():
    durations, fails, total = [], 0, 0
    with open(file) as f:
        for line in f:
            obj = json.loads(line)
            if obj.get("type") == "Point":
                metric = obj.get("metric")
                if metric == "http_req_duration":
                    durations.append(obj["data"]["value"])
                if metric == "http_req_failed":
                    fails += obj["data"]["value"]
                    total += 1
    if durations:
        summary[name] = {
            "avg": np.mean(durations),
            "p95": np.percentile(durations, 95),
            "requests": len(durations),
            "fail_rate": fails / total if total else 0,
        }

print(summary)

plt.bar(summary.keys(), [summary[n]["avg"] for n in summary])
plt.title("Average Response Time (ms)")
plt.ylabel("Milliseconds")
plt.show()

Key Takeaways

  • Docker isolation makes the benchmark reproducible.
  • Worker count & CPU limits must match across containers.
  • Simple routes may make Flask look faster; donโ€™t be fooled.
  • Async/I/Oโ€‘heavy routes showcase the strengths of FastAPI (or other async frameworks).
  • Always benchmark your actual workload, not just tiny examples.

Repository

The full example, including Dockerfiles, k6 scripts, and analysis code, is available at:

https://github.com/mahdi-shamlou/web_framework_benchmarks