Understanding Errors in Machine Learning: Accuracy, Precision, Recall & F1 Score

Published: December 16, 2025 at 01:59 PM EST
4 min read
Source: Dev.to

Machine Learning models are often judged by numbers, but many beginners (and even practitioners) misunderstand what those numbers actually mean. A model showing 95% accuracy might still be useless in real-world scenarios.

In this post we’ll break down:

  • Types of errors in Machine Learning
  • Confusion matrix
  • Accuracy
  • Precision
  • Recall
  • F1 Score

All explained intuitively, with examples you can confidently use in interviews or projects.

1️⃣ Types of Errors in Machine Learning

In a classification problem, predictions fall into four categories:

|                 | Predicted Positive  | Predicted Negative  |
| --------------- | ------------------- | ------------------- |
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |

🔴 False Positive (Type I Error)

The model predicts Positive, but the actual result is Negative.

Example: An email is marked as Spam but it is actually Not Spam.

🔵 False Negative (Type II Error)

The model predicts Negative, but the actual result is Positive.

Example: A medical test says No Disease but the patient actually has it.

These errors directly impact evaluation metrics.

2️⃣ Confusion Matrix (The Foundation)

A confusion matrix summarizes prediction results:

```
              Predicted
               +     -
Actual +      TP    FN
Actual -      FP    TN
```

All metrics are derived from this table.
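
In practice you rarely count these cells by hand. Here is a minimal sketch (with made-up labels; scikit-learn is an assumed tool choice, not something the table requires) showing how `confusion_matrix` recovers TP, FP, FN, and TN:

```python
from sklearn.metrics import confusion_matrix

# Toy ground truth and predictions (1 = positive, 0 = negative)
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary 0/1 labels sklearn returns [[TN, FP], [FN, TP]],
# so ravel() unpacks the four counts in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")  # TP=3, FP=1, FN=1, TN=3
```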

3️⃣ Accuracy

📌 Definition

Accuracy measures how often the model is correct.

📐 Formula

[ \text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} ]

❗ Problem with Accuracy

Accuracy can be misleading on imbalanced datasets.

Example

  • 99 normal patients
  • 1 patient with disease

If the model predicts No Disease for everyone:

[ \text{Accuracy} = \frac{99}{100} = 99\% ]

The model is dangerous despite the high accuracy. → Accuracy alone is not enough.
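
Here is a short sketch of exactly this scenario (made-up data: 99 healthy patients, 1 sick patient, and a "model" that always predicts No Disease):

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 99 + [1]   # 99 healthy patients, 1 sick patient
y_pred = [0] * 100        # the model predicts "No Disease" for everyone

print(accuracy_score(y_true, y_pred))  # 0.99 -> looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -> every sick patient is missed
```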

4️⃣ Precision

📌 Definition

Of all predicted positives, how many are actually positive?

📐 Formula

[ \text{Precision} = \frac{TP}{TP + FP} ]

🎯 When to focus on Precision?

When False Positives are costly.

Examples

  • Spam detection
  • Fraud detection

You don’t want to wrongly flag legitimate cases.
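
Reusing the toy labels from the confusion-matrix sketch above (an assumption, not data from this post), `precision_score` computes exactly TP / (TP + FP):

```python
from sklearn.metrics import precision_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP = 3, FP = 1  ->  precision = 3 / (3 + 1) = 0.75
print(precision_score(y_true, y_pred))
```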

5️⃣ Recall (Sensitivity)

📌 Definition

Of all actual positives, how many did the model correctly identify?

📐 Formula

[ \text{Recall} = \frac{TP}{TP + FN} ]

🎯 When to focus on Recall?

When False Negatives are dangerous.

Examples

  • Cancer detection
  • Accident detection

Missing a positive case can have severe consequences.
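
With the same assumed toy labels, `recall_score` computes TP / (TP + FN):

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP = 3, FN = 1  ->  recall = 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))
```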

6️⃣ Precision ↔ Recall Trade‑off

Increasing Precision often decreases Recall, and vice versa. Most classifiers output a probability and apply a decision threshold; moving that threshold is what trades one metric for the other, as the sketch below shows.
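
A minimal sketch (made-up scores from a hypothetical classifier) using `precision_recall_curve` to list precision and recall at each candidate threshold:

```python
from sklearn.metrics import precision_recall_curve

# Made-up ground truth and predicted probabilities from some classifier
y_true   = [0, 0, 1, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.5]

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# Raising the threshold generally pushes precision up and recall down.
```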

| Scenario          | Priority  |
| ----------------- | --------- |
| Spam filter       | Precision |
| Disease detection | Recall    |
| Fraud detection   | Recall    |

This trade‑off leads us to the F1 Score.

7️⃣ F1 Score

📌 Definition

The harmonic mean of Precision and Recall.

📐 Formula

[ \text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} ]

✅ Why F1 Score?

  • Balances Precision & Recall
  • Works well for imbalanced datasets
  • Penalises extreme values (if either Precision or Recall is low, F1 drops sharply)
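
scikit-learn exposes this directly as `f1_score`; a quick sketch with the same assumed toy labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)   # 0.75
r = recall_score(y_true, y_pred)      # 0.75
print(f1_score(y_true, y_pred))       # 0.75
print(2 * p * r / (p + r))            # same value, computed from the formula
```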

8️⃣ Summary Table

| Metric    | Best Used When         | Focus                  |
| --------- | ---------------------- | ---------------------- |
| Accuracy  | Balanced data          | Overall correctness    |
| Precision | False Positives costly | Prediction quality     |
| Recall    | False Negatives costly | Detection completeness |
| F1 Score  | Imbalanced data        | Balanced performance   |

9️⃣ Real‑World Case Studies

Understanding metrics becomes clearer when we map them to real‑world problems. Below are some common, interview‑relevant case studies.

🏥 Case Study 1: Disease Detection (Cancer / COVID)

  • Scenario: Model predicts whether a patient has a disease.
  • Critical error: False Negative – predicting Healthy when the patient is actually sick.
  • Why Recall matters more: Missing a sick patient can delay treatment and cost lives. Some false alarms (FPs) are acceptable.

Primary metric: Recall

💳 Case Study 2: Credit‑Card Fraud Detection

  • Scenario: Model identifies fraudulent transactions.
  • Critical error: False Negative – fraud marked as legitimate.
  • Trade‑off: Too many FPs annoy customers; too many FNs cause financial loss.

Best metric: F1 Score (balances FP and FN costs)

📧 Case Study 3: Spam Email Detection

  • Scenario: Classify emails as Spam or Not Spam.
  • Critical error: False Positive – important email marked as spam.
  • Why Precision matters: Users may miss critical emails (job offers, OTPs, invoices).

Primary metric: Precision

🚗 Case Study 4: Autonomous Driving (Pedestrian Detection)

  • Scenario: Detect pedestrians using camera and sensor data.
  • Critical error: False Negative – pedestrian not detected.
  • Why Recall is crucial: Missing even one pedestrian can be fatal.

Primary metric: Recall

🏭 Case Study 5: Manufacturing Defect Detection

  • Scenario: Detect defective products on an assembly line.
  • Critical error depends on context:
    • High FP → waste & increased cost
    • High FN → faulty product reaches the customer
  • Balanced approach: Use both Precision and Recall.

Best metric: F1 Score

🔚 Final Thoughts

Never blindly trust accuracy. Always ask:

  • Which error (FP or FN) is more dangerous?
  • Is my dataset imbalanced?
  • What is the real‑world cost of a false positive vs. a false negative?

Understanding these metrics lets you choose the right evaluation strategy for any problem, and it makes you a better ML engineer, not just a model builder.

If this helped you, feel free to share or comment your favorite ML pitfall!
