Overfitting & Underfitting — Beyond Textbook Definitions (Part 5)
Source: Dev.to
Part 5 of The Hidden Failure Point of ML Models series
Most ML beginners think they understand overfitting and underfitting. In real production systems, overfitting is not just “high variance” and underfitting is not just “high bias.” They are system‑level failures that silently destroy model performance after deployment—especially when data drifts, pipelines change, or features misbehave. This article goes deeper than standard definitions and explains the real engineering meaning behind these problems.
Overfitting vs. Underfitting (textbook)
- Overfitting – Model performs well on training data but poorly on unseen data.
- Underfitting – Model performs poorly on both training and test data.
These definitions are correct but overly simplistic.
Operational Overfitting
Overfitting is not simply “memorization.” It occurs when a model:
- Learns noise instead of true patterns
- Depends on features that are unstable
- Relies on correlations that won’t exist in production
- Fails because training conditions ≠ real‑world conditions
Example (real ML case)
A churn‑prediction model learns:
```
if last_3_days_support_tickets > 0 → user will churn
```
But this feature:
- Is not available at inference time
- Is often missing
- Behaves differently month to month
The model collapses in production.
Operational overfitting = relying on features/patterns that break when the environment changes.
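One practical guard against this failure mode is to compare feature availability in the training set against what actually arrives in inference payloads before shipping. Below is a minimal sketch, assuming two hypothetical pandas DataFrames, `train_df` (offline training data) and `serving_df` (logged inference requests), with matching column names:

```python
# Sketch: flag features that look healthy in training but are missing
# (or absent entirely) at inference time, a common source of
# operational overfitting.
import pandas as pd

def flag_unstable_features(train_df, serving_df, max_missing_gap=0.20):
    """Return columns whose serving-time missingness far exceeds training."""
    unstable = []
    for col in train_df.columns:
        train_missing = train_df[col].isna().mean()
        # A feature that never shows up in serving logs is the clearest signal.
        serve_missing = (serving_df[col].isna().mean()
                         if col in serving_df.columns else 1.0)
        if serve_missing - train_missing > max_missing_gap:
            unstable.append(col)
    return unstable

# Toy usage with the churn feature from the example above:
train_df = pd.DataFrame({"tenure_days": [10, 200, 35],
                         "last_3_days_support_tickets": [0, 2, 1]})
serving_df = pd.DataFrame({"tenure_days": [15, 90, 400],
                           "last_3_days_support_tickets": [None, None, None]})
print(flag_unstable_features(train_df, serving_df))
# -> ['last_3_days_support_tickets']
```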
Operational Underfitting
Underfitting is not simply “too simple a model.” Real underfitting happens when:
- Data quality is poor
- Features don’t represent the true signal
- Wrong sampling hides real patterns
- Domain understanding is missing
- Feature interactions are ignored
Example
A fraud model predicts fraud = 0 almost always because:
- Training data was almost entirely legitimate (non-fraud) transactions
- The model never saw rare fraud patterns
- Sampling wasn’t stratified
This is data underfitting, not an algorithm failure.
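A quick way to surface this kind of data underfitting is to check the class balance of each split and to stratify the split on the label. A minimal sketch using scikit-learn, with a toy imbalanced dataset standing in for real fraud data:

```python
# Sketch: stratified splitting so rare fraud cases appear in every split.
# X and y are synthetic stand-ins for a real, highly imbalanced dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.01).astype(int)   # ~1% "fraud"

# Naive split: the tiny positive class can be badly under- or over-represented.
_, _, _, y_test_naive = train_test_split(X, y, test_size=0.2, random_state=0)

# Stratified split: the fraud rate is preserved in both train and test.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print("overall fraud rate:   ", y.mean())
print("naive test rate:      ", y_test_naive.mean())
print("stratified test rate: ", y_test_strat.mean())
```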
Specific Overfitting Scenarios
- Feature Leakage Overfitting – Model depends on future or hidden variables.
- Pipeline Overfitting – Training pipeline ≠ production pipeline.
- Temporal Overfitting – Model learns patterns that existed only in a specific time period.
- Segment Overfitting – Model overfits to particular user groups or regions (see the sketch below).
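Segment overfitting in particular hides behind a healthy global metric; breaking evaluation down by segment exposes it. A minimal sketch, assuming a hypothetical validation DataFrame with a `region` column, true labels, and model scores:

```python
# Sketch: per-segment evaluation to expose segment overfitting.
# val_df and its column names are illustrative, not from a real system.
import pandas as pd
from sklearn.metrics import roc_auc_score

val_df = pd.DataFrame({
    "region":  ["EU", "EU", "EU", "US", "US", "US", "APAC", "APAC", "APAC", "APAC"],
    "y_true":  [0, 1, 0, 1, 0, 1, 0, 0, 1, 1],
    "y_score": [0.2, 0.9, 0.1, 0.4, 0.6, 0.5, 0.3, 0.2, 0.4, 0.5],
})

print("global AUC:", round(roc_auc_score(val_df["y_true"], val_df["y_score"]), 3))

# A model that only looks good on average will show weak AUC in some segments.
for region, group in val_df.groupby("region"):
    print(region, "AUC:", round(roc_auc_score(group["y_true"], group["y_score"]), 3))
```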
Underfitting has its own recurring causes: weak or noisy features, wrong preprocessing, an inappropriate loss function, underrepresented classes, low model capacity, and poor domain encoding.
How to Detect Overfitting
- Large train–validation performance gap
- Sudden performance drop after deployment
- Time‑based performance decay
- Over‑reliance on a few unstable features
- Frequent drift‑detection alerts
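Two of these signals, the train-validation gap and time-based decay, are cheap to compute from prediction logs. A minimal sketch with synthetic data standing in for logged scores (the `train_auc` value is a placeholder for your offline training metric):

```python
# Sketch: quantify the train/validation gap and month-over-month decay.
# The data below is synthetic; in practice scores come from prediction logs.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2_000
logs = pd.DataFrame({
    "ts": pd.date_range("2024-01-01", periods=n, freq="h"),
    "y_true": rng.integers(0, 2, size=n),
})
# Scores whose signal fades over time, to illustrate temporal decay.
signal = logs["y_true"] * np.linspace(0.6, 0.1, n)
logs["y_score"] = np.clip(rng.random(n) * 0.5 + signal, 0, 1)

train_auc = 0.94   # placeholder for the metric measured on training data
valid_auc = roc_auc_score(logs["y_true"], logs["y_score"])
print(f"train-validation gap: {train_auc - valid_auc:.3f}")

# Metric per calendar month: a steady downward trend suggests the model
# learned patterns tied to a specific time window.
for month, group in logs.groupby(logs["ts"].dt.to_period("M")):
    print(month, round(roc_auc_score(group["y_true"], group["y_score"]), 3))
```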
How to Detect Underfitting
- Poor metrics on all datasets
- No improvement with more data
- High bias (systematic errors)
- Flat learning curves
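Flat learning curves are the most direct of these signals: if adding data does not move either curve, the bottleneck is the features or the model family, not dataset size. A minimal sketch using scikit-learn's `learning_curve` on a toy dataset:

```python
# Sketch: learning curves to distinguish underfitting from a lack of data.
# If both curves plateau at a poor score, more data will not help.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=5_000, n_features=20,
                           n_informative=3, random_state=0)

train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1_000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="roc_auc")

for size, tr, va in zip(train_sizes,
                        train_scores.mean(axis=1),
                        valid_scores.mean(axis=1)):
    # Underfitting: train and validation scores are low and close together
    # at every training-set size.
    print(f"n={size:>5}  train AUC={tr:.3f}  valid AUC={va:.3f}")
```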
How to Fix Overfitting
- Remove noisy or unstable features
- Fix leakage issues
- Add regularization (e.g., L2 penalties, dropout, early stopping)
- Employ time-based validation splits (see the sketch after this list)
- Align training and production pipelines
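A time-ordered validation split plus explicit regularization covers two of these fixes in a few lines. A minimal sketch using scikit-learn's `TimeSeriesSplit` and an L2-penalized model, assuming rows are already sorted by event time:

```python
# Sketch: train on the past, validate on the future, with L2 regularization,
# so evaluation mirrors how the model will actually be used.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(3_000, 10))   # rows assumed to be sorted by time
y = (X[:, 0] + 0.5 * rng.normal(size=3_000) > 0).astype(int)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = LogisticRegression(penalty="l2", C=0.1, max_iter=1_000)
    model.fit(X[train_idx], y[train_idx])
    auc = roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1])
    print(f"fold {fold}: future AUC = {auc:.3f}")
```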
How to Fix Underfitting
- Add richer, domain‑driven features
- Increase model capacity (more layers, deeper trees, more estimators, etc.)
- Oversample rare classes or use class-weighting (see the sketch after this list)
- Tune hyperparameters more aggressively
- Switch to more expressive model families if needed
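For the class-imbalance side of underfitting, class weights are usually the cheapest first step before oversampling. A minimal sketch comparing an unweighted and a class-weighted model on a toy imbalanced dataset:

```python
# Sketch: class weighting so the rare class actually contributes to the loss,
# a common first fix when a model underfits the minority class.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=10,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

plain = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1_000,
                              class_weight="balanced").fit(X_tr, y_tr)

print("minority recall, unweighted:", recall_score(y_te, plain.predict(X_te)))
print("minority recall, weighted:  ", recall_score(y_te, weighted.predict(X_te)))
```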
Key Takeaways
| Insight | Meaning |
|---|---|
| Overfitting ≠ memorization | It’s operational fragility caused by unstable dependencies. |
| Underfitting ≠ small model | It’s missing signal due to data or feature issues. |
| Pipeline alignment matters | Most failures stem from mismatches between training and production. |
| Evaluation must be real‑world aware | Use time‑split, segment‑split, and other realistic validation strategies. |
| Monitoring is essential | Models decay over time; continuous monitoring catches drift early. |