Introducing FlameIQ — Deterministic Performance Regression Detection for Python

Published: March 7, 2026 at 05:06 PM EST

4 min read

Source: Dev.to

Performance regressions are invisible in code review.

A careless refactor that recompiles a regex on every function call.
A new dependency that adds 40 ms to your p95 latency.
A database query that filters on an unindexed column.

None of these show up in a diff. They accumulate silently across hundreds of commits — a 3 ms latency increase here, a 2 % throughput drop there — until they become expensive production incidents.

Type checkers enforce correctness automatically. Linters enforce style automatically. Nothing enforces performance — until now.

Introducing FlameIQ

Today we are releasing FlameIQ v1.0.0 — an open‑source, deterministic, CI‑native performance‑regression engine for Python.

pip install flameiq-core

FlameIQ compares your current benchmark results against a stored baseline and fails your CI pipeline if any metric exceeds its configured threshold — the same way a type checker fails your build on a type error.

Quick Start

Step 1 — Initialise

cd my-project
flameiq init

Step 2 — Run your benchmarks and produce a metrics file

{
  "schema_version": 1,
  "metadata": {
    "commit": "abc123",
    "branch": "main",
    "environment": "ci"
  },
  "metrics": {
    "latency": {
      "mean": 120.5,
      "p95": 180.0,
      "p99": 240.0
    },
    "throughput": 950.2,
    "memory_mb": 512.0
  }
}
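
As a rough illustration of Step 2, the sketch below times a placeholder workload and prints a document in the shape shown above. The helper names (measure, percentile, build_metrics, workload), the nearest-rank percentile, and the reading of throughput as calls per second are all assumptions made for this example; memory_mb is left out, and FlameIQ's schema specification remains the authority on which fields are required.

```python
import json
import statistics
import sys
import time


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct is an integer)."""
    rank = max(1, (len(sorted_samples) * pct + 99) // 100)  # integer ceil(n * pct / 100)
    return sorted_samples[rank - 1]


def measure(func, repeats=200):
    """Call `func` `repeats` times and return each latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def build_metrics(samples_ms, commit="unknown", branch="unknown", environment="local"):
    """Assemble a metrics document shaped like the example above."""
    ordered = sorted(samples_ms)
    total_seconds = sum(ordered) / 1000.0
    return {
        "schema_version": 1,
        "metadata": {"commit": commit, "branch": branch, "environment": environment},
        "metrics": {
            "latency": {
                "mean": statistics.fmean(ordered),
                "p95": percentile(ordered, 95),
                "p99": percentile(ordered, 99),
            },
            # Assumption: throughput means completed calls per second.
            "throughput": len(ordered) / total_seconds if total_seconds > 0 else 0.0,
        },
    }


def workload():
    """Hypothetical stand-in for the code under test."""
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    json.dump(build_metrics(measure(workload)), sys.stdout, indent=2)
```

Saved as a script, this writes the JSON to standard output, so it can be redirected into the file that the later steps pass to --metrics.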

Step 3 — Set a baseline

flameiq baseline set --metrics benchmark.json

Step 4 — Compare on every PR

flameiq compare --metrics current.json --fail-on-regression

Output

  Metric           Baseline    Current      Change   Threshold  Status
  ────────────────────────────────────────────────────────────────────
  latency.p95       2.45 ms     4.51 ms     +84.08%    ±10.0%  REGRESSION
  throughput        412.30      231.50      -43.84%    ±10.0%  REGRESSION

  ✗ REGRESSION — 2 metric(s) exceeded threshold.

Exit code 1. Pipeline fails. Regression caught before merge.
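
The arithmetic behind a table like this can be sketched in a few lines. This is an illustration under stated assumptions, not FlameIQ's documented algorithm: the function names and the HIGHER_IS_WORSE table are hypothetical, and the real threshold rules live in the project's specs.

```python
# Assumption: the direction in which each metric gets worse.
HIGHER_IS_WORSE = {"latency.p95": True, "throughput": False}


def percent_change(baseline: float, current: float) -> float:
    """Signed relative change of `current` against `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def is_regression(metric: str, baseline: float, current: float,
                  threshold_pct: float = 10.0) -> bool:
    """True when the metric moved in its worse direction by more than the threshold."""
    change = percent_change(baseline, current)
    worse_by = change if HIGHER_IS_WORSE[metric] else -change
    return worse_by > threshold_pct


def exit_code(baseline: dict, current: dict, threshold_pct: float = 10.0) -> int:
    """1 if any metric regressed, else 0, mirroring a failing CI step."""
    regressed = [name for name in baseline
                 if is_regression(name, baseline[name], current[name], threshold_pct)]
    return 1 if regressed else 0


if __name__ == "__main__":
    baseline = {"latency.p95": 2.45, "throughput": 412.30}
    current = {"latency.p95": 4.51, "throughput": 231.50}
    for name in baseline:
        print(f"{name}: {percent_change(baseline[name], current[name]):+.2f}%")
    print("exit code:", exit_code(baseline, current))
```

Keeping the sign of the change and flipping it only for lower-is-worse metrics means an improvement, such as higher throughput, never trips the check.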

A Real Example: Catching a Regex Regression

import re

# FAST: original implementation
def clean(text: str) -> str:
    text = re.sub(r"[^\w\s]", "", text)   # re caches the compiled pattern internally
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

# SLOW: regressed implementation
def clean(text: str) -> str:
    punct_re = re.compile(r"[^\w\s]")     # re.compile() now runs on every call
    space_re = re.compile(r"\s+")         # re.compile() now runs on every call
    text = punct_re.sub("", text)
    text = space_re.sub(" ", text).strip()
    return text.lower()

The logic is identical, so the diff looks clean. FlameIQ flags it anyway: p95 latency rises by 84 %, well above the 10 % threshold.
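
To measure the two variants on your own machine, a small timeit harness along the following lines may help. The function names, the SAMPLE string, and the repeat counts are assumptions for this sketch. The size of the gap you observe will depend on the Python version and on how the re module's internal pattern cache behaves, so no expected timings are given here.

```python
import re
import timeit


def clean_inline(text: str) -> str:
    """Variant that passes pattern strings straight to re.sub()."""
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def clean_compile_per_call(text: str) -> str:
    """Variant that calls re.compile() inside the function body."""
    punct_re = re.compile(r"[^\w\s]")
    space_re = re.compile(r"\s+")
    text = punct_re.sub("", text)
    text = space_re.sub(" ", text).strip()
    return text.lower()


# Hypothetical input used only for this measurement.
SAMPLE = "Hello,   World!  This is -- a test... "


def time_ms(func, number=2_000, repeat=5) -> float:
    """Best-of-`repeat` time per call, in milliseconds."""
    best = min(timeit.repeat(lambda: func(SAMPLE), number=number, repeat=repeat))
    return best / number * 1000.0


if __name__ == "__main__":
    # Both variants must agree before their timings are worth comparing.
    assert clean_inline(SAMPLE) == clean_compile_per_call(SAMPLE)
    for func in (clean_inline, clean_compile_per_call):
        print(f"{func.__name__}: {time_ms(func):.5f} ms per call")
```

Taking the minimum over several repeats is a common way to reduce scheduling noise in micro-benchmarks.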

GitHub Actions Integration

- name: Install FlameIQ
  run: pip install flameiq-core

- name: Restore baseline cache
  uses: actions/cache@v4
  with:
    path: .flameiq/
    key: flameiq-${{ github.base_ref }}

- name: Run benchmarks
  run: python run_benchmarks.py > metrics.json

- name: Check for regressions
  run: flameiq compare --metrics metrics.json --fail-on-regression

Key Design Decisions

Deterministic by design – Identical inputs always produce identical outputs. No randomness, network calls, or datetime.now(). Safe for any CI environment, including air‑gapped infrastructure.
No vendor dependency – Baselines are local JSON files. No SaaS account, API keys, or telemetry. Your performance data stays on your infrastructure.
Direction‑aware thresholds – Latency increases are regressions; throughput decreases are regressions. Thresholds are sign‑aware per metric type, so no manual configuration is required for known metrics.
Statistical mode – In noisy benchmark environments FlameIQ can apply the Mann‑Whitney U test alongside threshold comparison. A regression is declared only if both the threshold is exceeded and the result is statistically significant.
Versioned schema – The metrics schema is versioned (currently v1) with a formal specification. The threshold algorithm and statistical methodology are fully documented in /specs.
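
The "both conditions" rule described under Statistical mode can be sketched with the standard library alone. Everything here is an assumption for illustration: the function names, the use of medians for the threshold check, and the normal approximation to the Mann-Whitney U distribution, which is only reasonable for samples of roughly 20 or more per side. FlameIQ's actual methodology is the one documented in its specs.

```python
import math
from statistics import NormalDist, median


def mann_whitney_p(x, y):
    """Two-sided p-value, normal approximation with tie and continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of the 1-based ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0                            # every observation is identical
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def confirmed_regression(baseline, current, threshold_pct=10.0, confidence=0.95):
    """Higher-is-worse metric: regress only if threshold AND significance both hold."""
    change = (median(current) - median(baseline)) / median(baseline) * 100.0
    significant = mann_whitney_p(baseline, current) < 1.0 - confidence
    return change > threshold_pct and significant


if __name__ == "__main__":
    base = [10.0 + 0.1 * i for i in range(20)]
    slow = [v * 1.5 for v in base]
    print(confirmed_regression(base, slow))
```

Requiring both gates means a large but noisy swing in a handful of samples does not fail the build, and neither does a statistically real but tiny shift.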

HTML Reports

flameiq report --metrics current.json --output report.html

Generates a self‑contained HTML report with a full metric‑diff table, regression highlights, and trend analysis. No external assets — works offline.
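
For readers curious what "self-contained" can mean in practice, here is a minimal sketch that renders comparison rows into a single HTML string with inline CSS and no external assets. The render_report function and its row layout are hypothetical and say nothing about how FlameIQ builds its own report; trend analysis is not attempted.

```python
from html import escape


def render_report(rows, title="Performance report"):
    """rows: iterable of (metric, baseline, current, change_pct, status) tuples."""
    body = "\n".join(
        "<tr class='{cls}'><td>{m}</td><td>{b:.2f}</td><td>{c:.2f}</td>"
        "<td>{d:+.2f}%</td><td>{s}</td></tr>".format(
            cls="bad" if status == "REGRESSION" else "ok",
            m=escape(metric), b=baseline, c=current, d=change, s=escape(status),
        )
        for metric, baseline, current, change, status in rows
    )
    # All styling is inline, so the file opens correctly without network access.
    style = (
        "table{border-collapse:collapse}"
        "td,th{padding:4px 12px;border:1px solid #ccc}"
        ".bad{background:#fdd}"
    )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<title>" + escape(title) + "</title>"
        "<style>" + style + "</style></head><body>"
        "<table><tr><th>Metric</th><th>Baseline</th><th>Current</th>"
        "<th>Change</th><th>Status</th></tr>\n" + body + "\n</table></body></html>"
    )


if __name__ == "__main__":
    print(render_report([("latency.p95", 2.45, 4.51, 84.08, "REGRESSION")]))
```

Escaping metric names before interpolation keeps unusual names from breaking the markup.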

Configuration

flameiq.yaml (created by flameiq init):

thresholds:
  latency.p95:   10%    # Allow up to 10% latency increase
  latency.p99:   15%
  throughput:    -5%    # Allow up to 5% throughput decrease
  memory_mb:      8%

baseline:
  strategy: rolling_median
  rolling_window: 5

statistics:
  enabled: false
  confidence: 0.95

provider: json
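
One plausible reading of the rolling_median strategy and the signed thresholds above is sketched here. The helper names are hypothetical, and the interpretation (median of the last rolling_window runs; a negative threshold bounds a decrease, a positive one bounds an increase) is inferred from the comments in the config, not taken from FlameIQ's specification.

```python
from statistics import median


def parse_threshold(value: str) -> float:
    """Turn '10%' or '-5%' into a signed percentage (10.0 or -5.0)."""
    return float(value.strip().rstrip("%"))


def rolling_median_baseline(history, window=5):
    """Median of the most recent `window` recorded values for one metric."""
    if not history:
        raise ValueError("no baseline history recorded")
    return median(history[-window:])


def within_threshold(baseline: float, current: float, threshold_pct: float) -> bool:
    """Check the signed change against a signed limit."""
    change = (current - baseline) / baseline * 100.0
    if threshold_pct < 0:
        return change >= threshold_pct    # negative limit: a drop is the bad direction
    return change <= threshold_pct        # positive limit: a rise is the bad direction


if __name__ == "__main__":
    # Hypothetical p95 history; the 400.0 outlier barely moves the median.
    p95_history = [180.0, 182.0, 400.0, 179.0, 181.0, 183.0]
    baseline = rolling_median_baseline(p95_history, window=5)
    print(baseline, within_threshold(baseline, 195.0, parse_threshold("10%")))
```

A median over a short window is a reasonable choice for baselines because a single outlier run does not shift it the way it would shift a mean.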

Try the Demo

We built a demo project — flameiq-demo — that walks through the full regression‑detection workflow using a real Python library:

👉
