Introducing FlameIQ — Deterministic Performance Regression Detection for Python
Source: Dev.to
Performance regressions are invisible in code review.
- A careless refactor that recompiles a regex on every function call.
- A new dependency that adds 40 ms to your p95 latency.
- A database query that wasn’t indexed.
None of these show up in a diff. They accumulate silently across hundreds of commits — a 3 ms latency increase here, a 2 % throughput drop there — until they become expensive production incidents.
Type checkers enforce correctness automatically. Linters enforce style automatically. Nothing enforces performance — until now.
Introducing FlameIQ
Today we are releasing FlameIQ v1.0.0 — an open‑source, deterministic, CI‑native performance‑regression engine for Python.
pip install flameiq-core
FlameIQ compares your current benchmark results against a stored baseline and fails your CI pipeline if any metric exceeds its configured threshold, the same way a type checker fails your build on a type error.
Quick Start
Step 1 — Initialise
cd my-project
flameiq init
Step 2 — Run your benchmarks and produce a metrics file
{
  "schema_version": 1,
  "metadata": {
    "commit": "abc123",
    "branch": "main",
    "environment": "ci"
  },
  "metrics": {
    "latency": {
      "mean": 120.5,
      "p95": 180.0,
      "p99": 240.0
    },
    "throughput": 950.2,
    "memory_mb": 512.0
  }
}
Step 3 — Set a baseline
flameiq baseline set --metrics benchmark.json
Step 4 — Compare on every PR
flameiq compare --metrics current.json --fail-on-regression
Output
Metric        Baseline   Current    Change     Threshold   Status
────────────────────────────────────────────────────────────────────
latency.p95   2.45 ms    4.51 ms    +84.08%    ±10.0%      REGRESSION
throughput    412.30     231.50     -43.84%    ±10.0%      REGRESSION
✗ REGRESSION — 2 metric(s) exceeded threshold.
Exit code 1. Pipeline fails. Regression caught before merge.
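The comparison behind that table can be sketched in a few lines of Python. This is a minimal illustration, not FlameIQ's actual implementation: the `HIGHER_IS_WORSE` table, the `Result` class, and the `compare` function are hypothetical names, and the 10 % default simply mirrors the threshold shown above. The inputs are the rounded figures from the table, so the last decimal of a computed change can differ from the printed one.

```python
from dataclasses import dataclass

# Hypothetical direction table: True means an increase is the bad direction.
HIGHER_IS_WORSE = {"latency.p95": True, "throughput": False}


@dataclass
class Result:
    metric: str
    change_pct: float
    regressed: bool


def compare(metric: str, baseline: float, current: float,
            threshold_pct: float = 10.0) -> Result:
    """Flag a regression when the metric moves in its bad direction
    by more than threshold_pct percent of the baseline."""
    change_pct = (current - baseline) / baseline * 100.0
    # Flip the sign for metrics where a decrease is the bad direction.
    worse_by = change_pct if HIGHER_IS_WORSE[metric] else -change_pct
    return Result(metric, round(change_pct, 2), worse_by > threshold_pct)


results = [
    compare("latency.p95", 2.45, 4.51),
    compare("throughput", 412.30, 231.50),
]
for r in results:
    status = "REGRESSION" if r.regressed else "PASS"
    print(f"{r.metric:<12} {r.change_pct:+.2f}%  {status}")

# A CI wrapper would pass this value to sys.exit() to fail the pipeline.
exit_code = 1 if any(r.regressed for r in results) else 0
```

Because the check is a pure function of the two numbers and the threshold, the same inputs always give the same verdict, which is the property a CI gate needs.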
A Real Example: Catching a Regex Regression
import re

# FAST: original implementation
def clean(text: str) -> str:
    text = re.sub(r"[^\w\s]", "", text)  # Python caches compiled regex
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

# SLOW: regressed implementation
def clean(text: str) -> str:
    punct_re = re.compile(r"[^\w\s]")  # recompiled on every call!
    space_re = re.compile(r"\s+")  # recompiled on every call!
    text = punct_re.sub("", text)
    text = space_re.sub(" ", text).strip()
    return text.lower()
The logic is identical, so the diff looks clean. FlameIQ catches it as an 84 % increase in p95 latency, well above the 10 % threshold.
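A benchmark script for a function like `clean` might look like the sketch below. It times repeated calls with `time.perf_counter` and prints a document in the schema v1 shape from the Quick Start. The helper names (`percentile`, `benchmark`, `build_metrics`), the nearest-rank percentile method, and the choice to report only latency are assumptions for illustration; whether FlameIQ accepts a file without `throughput` and `memory_mb` is not confirmed here. How large the gap between two implementations is on a given interpreter depends on details such as the `re` module's pattern cache, so it is worth measuring rather than assuming.

```python
import json
import math
import re
import statistics
import time


def clean(text: str) -> str:
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def percentile(sorted_samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def benchmark(func, arg, runs: int = 2000) -> dict:
    """Time `runs` calls of func(arg) and summarise latency in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        func(arg)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    return {
        "mean": statistics.fmean(samples_ms),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


def build_metrics(latency: dict, commit: str = "abc123") -> dict:
    """Wrap latency numbers in the schema v1 layout shown in the Quick Start."""
    return {
        "schema_version": 1,
        "metadata": {"commit": commit, "branch": "main", "environment": "ci"},
        "metrics": {"latency": latency},
    }


if __name__ == "__main__":
    sample = "Hello,   World! This is a test... " * 20
    print(json.dumps(build_metrics(benchmark(clean, sample)), indent=2))
```

Redirecting the script's standard output to a file gives the metrics file that `flameiq compare --metrics` reads.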
GitHub Actions Integration
- name: Install FlameIQ
  run: pip install flameiq-core
- name: Restore baseline cache
  uses: actions/cache@v4
  with:
    path: .flameiq/
    key: flameiq-${{ github.base_ref }}
- name: Run benchmarks
  run: python run_benchmarks.py > metrics.json
- name: Check for regressions
  run: flameiq compare --metrics metrics.json --fail-on-regression
Key Design Decisions
- Deterministic by design – Identical inputs always produce identical outputs. No randomness, network calls, or datetime.now(). Safe for any CI environment, including air‑gapped infrastructure.
- No vendor dependency – Baselines are local JSON files. No SaaS account, API keys, or telemetry. Your performance data stays on your infrastructure.
- Direction‑aware thresholds – Latency increases are regressions; throughput decreases are regressions. Thresholds are sign‑aware per metric type, so no manual configuration is required for known metrics.
- Statistical mode – In noisy benchmark environments, FlameIQ can apply the Mann‑Whitney U test alongside threshold comparison. A regression is declared only if both the threshold is exceeded and the result is statistically significant.
- Versioned schema – The metrics schema is versioned (currently v1) with a formal specification. The threshold algorithm and statistical methodology are fully documented in /specs.
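To make the statistical mode concrete, here is a minimal standard-library sketch of the "threshold and significance" rule for a latency-style metric. It is an illustration under stated assumptions, not FlameIQ's documented method: the function names are hypothetical, the change is measured between medians, and the p-value uses the normal approximation to the Mann-Whitney U statistic without tie or continuity correction. For small samples an exact test, such as SciPy's `scipy.stats.mannwhitneyu`, would be more appropriate.

```python
from statistics import NormalDist, median


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied run i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(a: list[float], b: list[float]) -> float:
    """Two-sided p-value from the normal approximation to U."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_regression(baseline: list[float], current: list[float],
                  threshold_pct: float = 10.0, confidence: float = 0.95) -> bool:
    """Declare a regression only if the median moved past the threshold
    AND the two samples differ significantly (higher is worse)."""
    change_pct = (median(current) - median(baseline)) / median(baseline) * 100
    exceeds = change_pct > threshold_pct
    significant = mann_whitney_p(baseline, current) < 1 - confidence
    return exceeds and significant


stable = [2.4, 2.5, 2.45, 2.6, 2.38, 2.52, 2.47, 2.41]
slow = [4.4, 4.6, 4.5, 4.7, 4.3, 4.55, 4.48, 4.62]
noisy = [2.3, 3.4, 2.5, 3.2]

print(is_regression(stable, slow))        # clear shift in every sample
print(is_regression(stable[:4], noisy))   # median up, but samples overlap
```

In the second call the median rises by more than 10 %, yet the four noisy samples overlap the baseline too much for the test to call the difference significant, so the gate does not fire. That is the behaviour the statistical mode is meant to provide on noisy runners.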
HTML Reports
flameiq report --metrics current.json --output report.html
Generates a self‑contained HTML report with a full metric‑diff table, regression highlights, and trend analysis. No external assets, so it works offline.
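"Self-contained" here means that styling lives inside the file rather than in linked stylesheets or scripts. The sketch below shows the general idea with a hypothetical `render_report` function and a made-up row format; it does not reproduce FlameIQ's actual report layout.

```python
from html import escape


def render_report(rows: list[dict]) -> str:
    """Return one HTML document with inline CSS and no external assets."""
    body = []
    for row in rows:
        css = "bad" if row["status"] == "REGRESSION" else "ok"
        body.append(
            f'<tr class="{css}"><td>{escape(row["metric"])}</td>'
            f'<td>{row["baseline"]:.2f}</td><td>{row["current"]:.2f}</td>'
            f'<td>{row["change_pct"]:+.2f}%</td>'
            f'<td>{escape(row["status"])}</td></tr>'
        )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<style>table{border-collapse:collapse}"
        "td,th{padding:4px 10px;border:1px solid #ccc}"
        ".bad{background:#fdd}</style></head><body><table>"
        "<tr><th>Metric</th><th>Baseline</th><th>Current</th>"
        "<th>Change</th><th>Status</th></tr>"
        + "".join(body)
        + "</table></body></html>"
    )


report = render_report([
    {"metric": "latency.p95", "baseline": 2.45, "current": 4.51,
     "change_pct": 84.08, "status": "REGRESSION"},
])
```

Writing `report` to a file yields a page that opens in a browser without network access.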
Configuration
flameiq.yaml (created by flameiq init):
thresholds:
  latency.p95: 10%   # Allow up to 10% latency increase
  latency.p99: 15%
  throughput: -5%    # Allow up to 5% throughput decrease
  memory_mb: 8%
baseline:
  strategy: rolling_median
  rolling_window: 5
statistics:
  enabled: false
  confidence: 0.95
provider: json
Try the Demo
We built a demo project, flameiq-demo, that walks through the full regression‑detection workflow using a real Python library (see the demo project link under Links).
Links
- PyPI:
- Documentation: docs.io
- Source: https://github.com/flameiq/flameiq-core
- Demo project: https://github.com/flameiq/demo-flameiq
Feedback, issues, and contributions are welcome. If you have caught a regression with FlameIQ or have a use case we haven’t considered, open an issue or start a discussion on GitHub.