Why Accuracy Lies — The Metrics That Actually Matter (Part 4)
Source: Dev.to

Accuracy is the most widely used metric in machine learning.
It’s also the most misleading. In real‑world production ML systems, accuracy can make a bad model look good, hide failures, distort business decisions, and even create the illusion of success before causing catastrophic downstream impact.
Accuracy is a vanity metric. It tells you almost nothing about real ML performance.
The Accuracy Trap
Accuracy formula
Accuracy = Correct predictions / Total predictions
When accuracy breaks
- Classes are imbalanced
- Rare events matter more
- Cost of mistakes is different
- Distribution changes
- Confidence matters
Most real ML use cases suffer from one or more of these issues.
Classic Example: Fraud Detection
- Dataset: 10,000 normal transactions, 12 frauds
- Model: predicts everything as “normal”
Accuracy = 99.88%
The model catches 0 frauds → useless. Accuracy hides the failure.
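To see the trap in running code, here is a minimal sketch of the example above. It assumes scikit-learn (not mentioned in the original post) purely for convenience:

```python
# Minimal sketch of the fraud example above, assuming scikit-learn.
from sklearn.metrics import accuracy_score, recall_score

# 10,000 normal transactions (label 0) and 12 frauds (label 1)
y_true = [0] * 10_000 + [1] * 12
y_pred = [0] * 10_012                  # the model predicts "normal" for everything

print(accuracy_score(y_true, y_pred))  # ~0.9988 -- looks great
print(recall_score(y_true, y_pred))    # 0.0     -- catches zero frauds
```

Accuracy rewards the model for the 10,000 easy negatives; recall exposes that it never catches a single fraud.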
Why Accuracy Fails
| Problem | Why Accuracy Is Useless |
|---|---|
| Class imbalance | Majority class dominates |
| Rare events | Accuracy ignores minority class |
| Cost‑sensitive predictions | Wrong predictions have different penalties |
| Real‑world data shift | Test‑set accuracy stays flat while production failures grow |
| Business KPIs | Accuracy doesn’t measure financial impact |
Accuracy ≠ business value.
Metrics That Actually Matter
1. Precision
Definition: Of all predicted positives, how many were correct?
Use when: False positives are costly (e.g., spam detection, fraud alerts).
Formula
Precision = TP / (TP + FP)
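A quick sketch of the calculation, with made-up labels and scikit-learn assumed:

```python
# Precision = TP / (TP + FP), by hand and via scikit-learn (illustrative labels).
from sklearn.metrics import precision_score

y_true = [1, 1, 1, 0, 0, 0]   # actual labels
y_pred = [1, 0, 0, 1, 0, 0]   # model predictions
# TP = 1 (index 0), FP = 1 (index 3)
print(1 / (1 + 1))                      # 0.5
print(precision_score(y_true, y_pred))  # 0.5
```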
2. Recall
Definition: Of all actual positives, how many did the model identify?
Use when: False negatives are costly (e.g., cancer detection, intrusion detection).
Formula
Recall = TP / (TP + FN)
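Same illustrative labels as in the precision sketch, now scored on recall:

```python
# Recall = TP / (TP + FN), on the same illustrative labels.
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0]
# TP = 1, FN = 2 (the positives at indices 1 and 2 were missed)
print(1 / (1 + 2))                   # 0.333...
print(recall_score(y_true, y_pred))  # 0.333...
```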
3. F1 Score
Definition: Harmonic mean of precision & recall.
Use when: A balance between precision and recall is needed.
Formula
F1 = 2 * (Precision * Recall) / (Precision + Recall)
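Continuing the same sketch, F1 combines the two numbers above into one score:

```python
# F1 = harmonic mean of precision (0.5) and recall (0.333...) from the sketches above.
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0]
precision, recall = 0.5, 1 / 3
print(2 * precision * recall / (precision + recall))  # 0.4
print(f1_score(y_true, y_pred))                       # 0.4
```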
4. ROC‑AUC
Measures how well the model ranks positives above negatives across all classification thresholds. Common in credit scoring and risk ranking. Higher AUC indicates better class separation.
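A minimal sketch, with scikit-learn assumed and made-up scores: ROC‑AUC is computed on ranking scores or probabilities, not on hard labels.

```python
# ROC-AUC works on scores/probabilities; it measures ranking quality.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.2, 0.8, 0.9]   # model's confidence for class 1
print(roc_auc_score(y_true, y_score))        # ~0.89: positives mostly rank above negatives
```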
5. PR‑AUC
More informative than ROC‑AUC for highly imbalanced datasets because it focuses on performance on the rare positive class. Used for fraud, rare defects, and anomaly detection.
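A small synthetic sketch of that difference (scikit-learn and NumPy assumed, numbers invented): on a heavily imbalanced set, ROC‑AUC can look flattering while PR‑AUC, measured here via average precision, stays low.

```python
# ROC-AUC vs PR-AUC on an imbalanced, synthetic fraud-like dataset.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true  = np.array([0] * 10_000 + [1] * 12)
# Negatives score low, frauds score only somewhat higher
y_score = np.concatenate([rng.normal(0.2, 0.1, 10_000),
                          rng.normal(0.5, 0.1, 12)])

print(roc_auc_score(y_true, y_score))            # high: the ranking looks fine
print(average_precision_score(y_true, y_score))  # much lower: precision on the rare class is poor
```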
6. Log Loss (Cross Entropy)
Evaluates the quality of predicted probabilities, penalising confident wrong predictions heavily. Important when confidence matters and probabilities drive decisions.
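A tiny sketch (scikit-learn assumed, probabilities invented) showing how a single confident mistake blows up the loss:

```python
# Log loss punishes confident wrong probabilities far more than mildly wrong ones.
from sklearn.metrics import log_loss

y_true = [1, 1, 0]
print(log_loss(y_true, [0.9, 0.8, 0.1]))   # ~0.14: well-calibrated probabilities
print(log_loss(y_true, [0.9, 0.8, 0.99]))  # ~1.64: one confident mistake dominates
```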
7. Cost‑Based Metrics
Accuracy ignores cost; real ML does not.
Example
- False negative cost = ₹5,000
- False positive cost = ₹50
Formula
Total Cost = (FN * Cost_FN) + (FP * Cost_FP)
Enterprises use such cost‑based calculations to measure real model impact.
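Wiring the example numbers into code (scikit-learn assumed; the counts come from the fraud example earlier in the post):

```python
# Total Cost = (FN * Cost_FN) + (FP * Cost_FP), for the "predict everything normal" model.
from sklearn.metrics import confusion_matrix

y_true = [0] * 10_000 + [1] * 12
y_pred = [0] * 10_012

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
total_cost = fn * 5_000 + fp * 50      # ₹5,000 per missed fraud, ₹50 per false alarm
print(total_cost)                      # 60,000 -- despite 99.88% accuracy
```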
How to Pick the Right Metric — Practical Cheat Sheet
| Use Case | Best Metrics |
|---|---|
| Fraud detection | Recall, F1, PR‑AUC |
| Medical diagnosis | Recall |
| Spam detection | Precision |
| Churn prediction | F1, Recall |
| Credit scoring | ROC‑AUC, KS statistic |
| Product ranking | MAP@k, NDCG |
| NLP classification | F1 |
| Forecasting | RMSE, MAPE |
The Real Lesson
Accuracy is for beginners. Real ML engineers choose metrics that reflect business value.
Accuracy can be high while:
- Profit drops
- Risk increases
- Users churn
- Fraud bypasses detection
- Trust collapses
Metrics must match:
- The domain
- The cost of mistakes
- The real‑world distribution
Key Takeaways
| Insight | Meaning |
|---|---|
| Accuracy is misleading | Never use it alone |
| Choose metric per use case | No universal metric |
| Precision/Recall matter more | Especially for imbalance |
| ROC‑AUC & PR‑AUC give deeper insight | Useful for ranking & rare events |
| Always tie metrics to business | ML is about impact, not just math |
Coming Next — Part 5
Overfitting & Underfitting — Beyond Textbook Definitions
Real symptoms, real debugging, real engineering fixes.