Overfitting & Underfitting — Beyond Textbook Definitions (Part 5)
Source: Dev.to
Part 5 of The Hidden Failure Point of ML Models series
Most ML beginners think they understand overfitting and underfitting. In real production systems, overfitting is not just “high variance” and underfitting is not just “high bias.” They are system‑level failures that silently destroy model performance after deployment—especially when data drifts, pipelines change, or features misbehave. This article goes deeper than standard definitions and explains the real engineering meaning behind these problems.
Overfitting vs. Underfitting (textbook)
- Overfitting – Model performs well on training data but poorly on unseen data.
- Underfitting – Model performs poorly on both training and test data.
These definitions are correct but overly simplistic.
Operational Overfitting
Overfitting is not simply “memorization.” It occurs when a model:
- Learns noise instead of true patterns
- Depends on features that are unstable
- Relies on correlations that won’t exist in production
- Fails because training conditions ≠ real‑world conditions
Example (real ML case)
A churn‑prediction model learns:
```
if last_3_days_support_tickets > 0 → user will churn
```
But this feature:
- Is not available at inference time
- Is often missing
- Behaves differently month to month
The model collapses in production.
Operational overfitting = relying on features/patterns that break when the environment changes.
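One practical guard against this failure mode is to compare feature availability in the training set against what actually arrives in inference payloads before shipping. Below is a minimal sketch, assuming two hypothetical pandas DataFrames, `train_df` (offline training data) and `serving_df` (logged inference requests), with matching column names:

```python
# Sketch: flag features that look healthy in training but are missing
# (or absent entirely) at inference time, a common source of
# operational overfitting.
import pandas as pd

def flag_unstable_features(train_df, serving_df, max_missing_gap=0.20):
    """Return columns whose serving-time missingness far exceeds training."""
    unstable = []
    for col in train_df.columns:
        train_missing = train_df[col].isna().mean()
        # A feature that never shows up in serving logs is the clearest signal.
        serve_missing = (serving_df[col].isna().mean()
                         if col in serving_df.columns else 1.0)
        if serve_missing - train_missing > max_missing_gap:
            unstable.append(col)
    return unstable

# Toy usage with the churn feature from the example above:
train_df = pd.DataFrame({"tenure_days": [10, 200, 35],
                         "last_3_days_support_tickets": [0, 2, 1]})
serving_df = pd.DataFrame({"tenure_days": [15, 90, 400],
                           "last_3_days_support_tickets": [None, None, None]})
print(flag_unstable_features(train_df, serving_df))
# -> ['last_3_days_support_tickets']
```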
Operational Underfitting
Underfitting is not simply “too simple a model.” Real underfitting happens when:
- Data quality is poor
- Features don’t represent the true signal
- Wrong sampling hides real patterns
- Domain understanding is missing
- Feature interactions are ignored
Example
A fraud model predicts fraud = 0 almost always because:
- Training data was almost entirely legitimate (non-fraud) transactions
- The model never saw rare fraud patterns
- Sampling wasn’t stratified
This is data underfitting, not an algorithm failure.
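A quick way to surface this kind of data underfitting is to check the class balance of each split and to stratify the split on the label. A minimal sketch using scikit-learn, with a toy imbalanced dataset standing in for real fraud data:

```python
# Sketch: stratified splitting so rare fraud cases appear in every split.
# X and y are synthetic stand-ins for a real, highly imbalanced dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.01).astype(int)   # ~1% "fraud"

# Naive split: the tiny positive class can be badly under- or over-represented.
_, _, _, y_test_naive = train_test_split(X, y, test_size=0.2, random_state=0)

# Stratified split: the fraud rate is preserved in both train and test.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print("overall fraud rate:   ", y.mean())
print("naive test rate:      ", y_test_naive.mean())
print("stratified test rate: ", y_test_strat.mean())
```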
Specific Overfitting Scenarios
- Feature Leakage Overfitting – Model depends on future or hidden variables.
- Pipeline Overfitting – Training pipeline ≠ production pipeline.
- Temporal Overfitting – Model learns patterns that existed only in a specific time period.
- Segment Overfitting – Model overfits to particular user groups or regions (see the sketch below).
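Segment overfitting in particular hides behind a healthy global metric; breaking evaluation down by segment exposes it. A minimal sketch, assuming a hypothetical validation DataFrame with a `region` column, true labels, and model scores:

```python
# Sketch: per-segment evaluation to expose segment overfitting.
# val_df and its column names are illustrative, not from a real system.
import pandas as pd
from sklearn.metrics import roc_auc_score

val_df = pd.DataFrame({
    "region":  ["EU", "EU", "EU", "US", "US", "US", "APAC", "APAC", "APAC", "APAC"],
    "y_true":  [0, 1, 0, 1, 0, 1, 0, 0, 1, 1],
    "y_score": [0.2, 0.9, 0.1, 0.4, 0.6, 0.5, 0.3, 0.2, 0.4, 0.5],
})

print("global AUC:", round(roc_auc_score(val_df["y_true"], val_df["y_score"]), 3))

# A model that only looks good on average will show weak AUC in some segments.
for region, group in val_df.groupby("region"):
    print(region, "AUC:", round(roc_auc_score(group["y_true"], group["y_score"]), 3))
```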
Underfitting has its own recurring causes: weak or noisy features, wrong preprocessing, an inappropriate loss function, underrepresented classes, low model capacity, and poor domain encoding.
How to Detect Overfitting
- Large train–validation performance gap
- Sudden performance drop after deployment
- Time‑based performance decay
- Over‑reliance on a few unstable features
- Frequent drift‑detection alerts
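Two of these signals, the train-validation gap and time-based decay, are cheap to compute from prediction logs. A minimal sketch with synthetic data standing in for logged scores (the `train_auc` value is a placeholder for your offline training metric):

```python
# Sketch: quantify the train/validation gap and month-over-month decay.
# The data below is synthetic; in practice scores come from prediction logs.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2_000
logs = pd.DataFrame({
    "ts": pd.date_range("2024-01-01", periods=n, freq="h"),
    "y_true": rng.integers(0, 2, size=n),
})
# Scores whose signal fades over time, to illustrate temporal decay.
signal = logs["y_true"] * np.linspace(0.6, 0.1, n)
logs["y_score"] = np.clip(rng.random(n) * 0.5 + signal, 0, 1)

train_auc = 0.94   # placeholder for the metric measured on training data
valid_auc = roc_auc_score(logs["y_true"], logs["y_score"])
print(f"train-validation gap: {train_auc - valid_auc:.3f}")

# Metric per calendar month: a steady downward trend suggests the model
# learned patterns tied to a specific time window.
for month, group in logs.groupby(logs["ts"].dt.to_period("M")):
    print(month, round(roc_auc_score(group["y_true"], group["y_score"]), 3))
```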
How to Detect Underfitting
- Poor metrics on all datasets
- No improvement with more data
- High bias (systematic errors)
- Flat learning curves
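Flat learning curves are the most direct of these signals: if adding data does not move either curve, the bottleneck is the features or the model family, not dataset size. A minimal sketch using scikit-learn's `learning_curve` on a toy dataset:

```python
# Sketch: learning curves to distinguish underfitting from a lack of data.
# If both curves plateau at a poor score, more data will not help.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=5_000, n_features=20,
                           n_informative=3, random_state=0)

train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1_000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="roc_auc")

for size, tr, va in zip(train_sizes,
                        train_scores.mean(axis=1),
                        valid_scores.mean(axis=1)):
    # Underfitting: train and validation scores are low and close together
    # at every training-set size.
    print(f"n={size:>5}  train AUC={tr:.3f}  valid AUC={va:.3f}")
```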
How to Fix Overfitting
- Remove noisy or unstable features
- Fix leakage issues
- Add regularization (e.g., L2 penalties, dropout, early stopping)
- Employ time-based validation splits (see the sketch after this list)
- Align training and production pipelines
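A time-ordered validation split plus explicit regularization covers two of these fixes in a few lines. A minimal sketch using scikit-learn's `TimeSeriesSplit` and an L2-penalized model, assuming rows are already sorted by event time:

```python
# Sketch: train on the past, validate on the future, with L2 regularization,
# so evaluation mirrors how the model will actually be used.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(3_000, 10))   # rows assumed to be sorted by time
y = (X[:, 0] + 0.5 * rng.normal(size=3_000) > 0).astype(int)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = LogisticRegression(penalty="l2", C=0.1, max_iter=1_000)
    model.fit(X[train_idx], y[train_idx])
    auc = roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1])
    print(f"fold {fold}: future AUC = {auc:.3f}")
```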
How to Fix Underfitting
- Add richer, domain‑driven features
- Increase model capacity (more layers, deeper trees, more estimators, etc.)
- Oversample rare classes or use class-weighting (see the sketch after this list)
- Tune hyperparameters more aggressively
- Switch to more expressive model families if needed
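For the class-imbalance side of underfitting, class weights are usually the cheapest first step before oversampling. A minimal sketch comparing an unweighted and a class-weighted model on a toy imbalanced dataset:

```python
# Sketch: class weighting so the rare class actually contributes to the loss,
# a common first fix when a model underfits the minority class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=10,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

plain = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1_000,
                              class_weight="balanced").fit(X_tr, y_tr)

print("minority recall, unweighted:", recall_score(y_te, plain.predict(X_te)))
print("minority recall, weighted:  ", recall_score(y_te, weighted.predict(X_te)))
```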
Key Takeaways
| Insight | Meaning |
|---|---|
| Overfitting ≠ memorization | It’s operational fragility caused by unstable dependencies. |
| Underfitting ≠ small model | It’s missing signal due to data or feature issues. |
| Pipeline alignment matters | Most failures stem from mismatches between training and production. |
| Evaluation must be real‑world aware | Use time‑split, segment‑split, and other realistic validation strategies. |
| Monitoring is essential | Models decay over time; continuous monitoring catches drift early. |