Why Mathematics Is Essential for Machine Learning
Introduction — The Black Box Myth
Machine Learning is often presented as an essentially algorithmic discipline: you load data, choose a model, train it, and “it works.”
This view is partly true, but fundamentally incomplete.
Behind every Machine Learning algorithm lie precise mathematical structures:
- notions of distance
- properties of continuity
- assumptions of convexity
- convergence guarantees and theoretical limits that no model can circumvent
👉 Modern Machine Learning is not an alternative to mathematics; it is a direct application of it.
This article sets the general framework for the series: understanding why mathematical analysis is indispensable for understanding, designing, and mastering Machine Learning algorithms.
1. Machine Learning Is Primarily an Optimization Problem
At a fundamental level, almost all ML algorithms solve the same problem:
Minimize a loss function.
Formally, we search for parameters θ such that
[ \theta^{*} = \arg\min_{\theta} L(\theta) ]
where L(θ) measures the model’s error on the data.
Essential mathematical questions arise immediately:
- What does it mean to minimize?
- Does a minimum exist?
- Is it unique?
- Can it be reached numerically?
- At what speed?
These questions are mathematical, not merely algorithmic.
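To make this concrete, here is a minimal sketch, assuming a simple least-squares loss and plain gradient descent (neither is prescribed above; the data is synthetic, purely for illustration):

```python
import numpy as np

# Minimal sketch: minimize L(theta) = mean((X @ theta - y)^2)
# with plain gradient descent, on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=100)

def loss(theta):
    return np.mean((X @ theta - y) ** 2)

def grad(theta):
    # Gradient of the mean squared error.
    return 2 * X.T @ (X @ theta - y) / len(y)

theta = np.zeros(3)
lr = 0.1  # whether this step size is "safe" is itself a mathematical question
for _ in range(500):
    theta -= lr * grad(theta)

print(loss(theta))  # close to the noise floor if the descent converged
```

Every question above is hiding in these few lines: whether the loop converges, how fast, and whether the answer is unique.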
2. Distance, Norms, and Geometry: Measuring Error Is Not Neutral
Before optimizing anything, we must answer:
How do we measure error?
This leads directly to the notions of distance and norm.
Classic examples:
- MAE (Mean Absolute Error) ↔ L¹ norm
- MSE (Mean Squared Error) ↔ L² norm
- Maximum error ↔ L^∞ norm
These choices are not incidental:
- they change the geometry of the problem
- they affect robustness to outliers
- they influence numerical stability
- they impact gradient descent behavior
👉 Without understanding the geometry induced by a norm, one does not truly understand what the algorithm is optimizing.
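The point is easy to verify numerically. A small sketch (with made-up residuals) shows how the same error vector scores very differently under each norm, and how a single outlier dominates L² and L^∞ far more than L¹:

```python
import numpy as np

# Hypothetical residuals: nine small errors and one outlier.
errors = np.array([0.1] * 9 + [5.0])

mae = np.mean(np.abs(errors))      # L1 flavour: barely moved by the outlier
mse = np.mean(errors ** 2)         # L2 flavour: the outlier dominates
max_err = np.max(np.abs(errors))   # L-infinity: only the outlier matters

print(f"MAE: {mae:.3f}, MSE: {mse:.3f}, Max: {max_err:.3f}")
# MAE: 0.590, MSE: 2.509, Max: 5.000
```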
3. Convergence: When Can We Say an Algorithm Works?
A Machine Learning algorithm is often iterative:
[ \theta_{0} \rightarrow \theta_{1} \rightarrow \theta_{2} \rightarrow \dots ]
This raises a crucial question:
Does this sequence converge? And if so, to what?
The answer depends on concepts from analysis:
- sequences and limits
- Cauchy sequences
- completeness
- continuity
Without these notions, it is impossible to answer practical questions such as:
- why training diverges
- why it oscillates
- why it is slow
- why two implementations produce different results
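A one-dimensional sketch (a quadratic chosen purely for illustration) makes these questions tangible: the same update rule converges, oscillates, or diverges depending only on the step size:

```python
# Gradient descent on f(x) = x^2, whose gradient is 2x.
# The iterates satisfy x_{k+1} = (1 - 2*lr) * x_k, so the behaviour
# of the sequence is governed entirely by the factor (1 - 2*lr).
def iterate(lr, x0=1.0, steps=10):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(iterate(0.1))   # factor  0.8: converges smoothly toward 0
print(iterate(0.9))   # factor -0.8: oscillates, but still converges
print(iterate(1.1))   # factor -1.2: the sequence diverges
```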
4. Continuity, Lipschitz Conditions, and Stability
A Machine Learning model must be stable: a small change in the data or parameters should not cause predictions to explode. This stability is formalized by:
- uniform continuity
- Lipschitz functions
A function f is Lipschitz with constant L ≥ 0 if, for all x and y,
[ |f(x) - f(y)| \le L\,|x - y| ]
This inequality lies at the core of:
- model stability
- learning rate selection
- convergence guarantees for gradient descent
👉 The Lipschitz constant is not a theoretical detail; it directly controls the speed and stability of learning.
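As a sketch (a quadratic whose gradient's Lipschitz constant is the largest eigenvalue of its matrix; the numbers are illustrative), the constant dictates the largest safe step size:

```python
import numpy as np

# For f(theta) = 0.5 * theta^T A theta, the gradient A @ theta is
# Lipschitz with constant L = largest eigenvalue of A.
A = np.diag([1.0, 10.0, 100.0])
L = np.max(np.linalg.eigvalsh(A))   # here L = 100

theta = np.ones(3)
lr = 1.0 / L                        # a classic "safe" step size
for _ in range(2000):
    theta -= lr * A @ theta

print(theta)  # all coordinates driven toward 0; any lr above 2/L diverges
```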
5. Convexity: Why Some Problems Are Easy… and Others Are Not
Convexity is arguably the most important mathematical property in optimization.
For a convex function:
- every local minimum is a global minimum, so there are no traps
- if the function is strictly convex, the minimizer is unique
This is why approaches such as:
- linear regression
- support vector machines
- many regularized formulations (e.g., ridge or Lasso)
benefit from strong theoretical guarantees.
By contrast, deep neural networks are non-convex, yet they work in practice thanks to the particular structure of their loss landscapes and effective training heuristics.
👉 Understanding convexity makes it possible to know when guarantees exist — and when they do not.
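A one-dimensional sketch (functions chosen here only for illustration) shows the practical difference: on a convex function, gradient descent reaches the global minimum from any starting point; on a non-convex one, the starting point decides which minimum it finds:

```python
def descend(grad_fn, x0, lr=0.01, steps=5000):
    # Plain gradient descent in one dimension.
    x = x0
    for _ in range(steps):
        x -= lr * grad_fn(x)
    return x

# Convex: f(x) = x^2. Every start reaches the unique global minimum at 0.
convex_grad = lambda x: 2 * x
print(descend(convex_grad, -3.0), descend(convex_grad, 4.0))

# Non-convex: f(x) = x^4 - 3x^2 + x has two distinct local minima,
# so the answer now depends on where we start.
nonconvex_grad = lambda x: 4 * x**3 - 6 * x + 1
print(descend(nonconvex_grad, -2.0), descend(nonconvex_grad, 2.0))
```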
6. Theory vs. Practice: What Mathematics Guarantees (and What It Does Not)
A crucial point to understand from the outset:
Mathematics guarantees properties, not miraculous performance.
It can tell us:
- whether a solution exists
- whether it is unique
- whether an algorithm converges
- how fast it converges
It cannot guarantee:
- good data
- good generalization
- an unbiased model
Without mathematical insight, we proceed blindly.
Conclusion — Understand Before You Optimize
Modern Machine Learning rests on three fundamental mathematical pillars:
- Geometry (norms, distances)
- Analysis (continuity, convergence, Lipschitz conditions)
- Optimization (convexity, gradient descent)
Ignoring these foundations amounts to:
- applying recipes without understanding their limits
- misdiagnosing failures
- overcomplicating simple problems
👉 Understanding the mathematical analysis of Machine Learning is not theory for theory’s sake; it is about gaining control, robustness, and intuition.