Optimization and Regularization: How Models Learn (and Why Training Fails)

Published: 3 weeks ago (April 12, 2026, 01:50 AM GMT+9)

Source: Dev.to

The Real Problem

Low training loss ≠ good model.
The real goal is generalization: performing well on data the model was not trained on.

Optimization = Learning

Optimization reduces the loss by updating the model's parameters.
Without it, the model stays at its random initialization and its predictions are effectively random.

Gradient Descent (in Practice)

1. Compute the gradient of the loss with respect to the parameters.
2. Update the parameters in the direction opposite the gradient.

Real training uses minibatches, so each update is a noisy estimate of the full gradient.
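The two steps can be sketched on a toy problem. This is a minimal illustration, not a production training loop: the linear model, the mean-squared-error loss, the data, and names such as `minibatch_sgd`, `lr`, and `batch_size` are all assumptions made for the example.

```python
import random

def minibatch_sgd(xs, ys, lr=0.05, batch_size=4, epochs=500, seed=0):
    """Fit y ~ w*x + b by minibatch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # fresh random batches each epoch -> noisy updates
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Step 1: compute the gradient of the batch loss w.r.t. w and b.
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i] / len(batch)
                grad_b += 2 * err / len(batch)
            # Step 2: move the parameters against the gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 10 for i in range(20)]      # toy inputs (assumed data)
ys = [2.0 * x + 1.0 for x in xs]      # noiseless targets from y = 2x + 1
w, b = minibatch_sgd(xs, ys)          # w and b should approach 2 and 1
```

Each batch sees only four points, so individual gradients differ from the full-data gradient, yet the parameters still settle near the true values.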

Optimizers

- Momentum → smoother updates
- RMSProp → adaptive scaling
- Adam → default

If unsure: → use Adam
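To make the three names concrete, here is a rough sketch of their update rules for a single scalar parameter. The formulas follow the commonly used textbook forms; the hyperparameter values and the toy objective `(p - 3)^2` are illustrative assumptions, and real libraries differ in details.

```python
import math

def momentum_step(p, g, state, lr=0.1, beta=0.9):
    # Velocity accumulates past gradients, which smooths the update direction.
    state["v"] = beta * state.get("v", 0.0) + g
    return p - lr * state["v"]

def rmsprop_step(p, g, state, lr=0.01, rho=0.9, eps=1e-8):
    # Each step is scaled by a running average of squared gradients.
    state["s"] = rho * state.get("s", 0.0) + (1 - rho) * g * g
    return p - lr * g / (math.sqrt(state["s"]) + eps)

def adam_step(p, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam combines a momentum-style average with RMSProp-style scaling.
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)  # bias correction for zero-initialized averages
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)

def minimize(step, steps=500, start=5.0):
    """Minimize the toy objective f(p) = (p - 3)^2 with the given update rule."""
    p, state = start, {}
    for _ in range(steps):
        grad = 2 * (p - 3.0)
        p = step(p, grad, state)
    return p

results = {
    "momentum": minimize(momentum_step),
    "rmsprop": minimize(rmsprop_step),
    "adam": minimize(adam_step),
}
```

All three end up close to the minimum at 3 on this easy problem; the differences matter more on noisy, badly scaled losses.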

Learning Rate (Check This First)

Many training failures trace back to the learning rate.

- Too high → exploding loss
- Too low → stuck training

Fix: → tune the learning rate before anything else.
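Both failure modes show up even on the simplest possible loss. The sketch below runs plain gradient descent on `f(p) = p^2`; the function name `final_loss` and the three learning rates are assumptions chosen to make the effect visible.

```python
def final_loss(lr, steps=50, start=1.0):
    """Run plain gradient descent on f(p) = p^2 and return the final loss."""
    p = start
    for _ in range(steps):
        p -= lr * 2 * p  # the gradient of p^2 is 2p
    return p * p

too_high = final_loss(lr=1.5)    # each step multiplies p by -2: loss explodes
too_low = final_loss(lr=1e-4)    # each step shrinks p by 0.02%: barely moves
reasonable = final_loss(lr=0.1)  # each step multiplies p by 0.8: converges
```

Nothing else about the setup changes between the three runs, which is why the learning rate is worth checking before touching the model or the data.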

Overfitting

Classic pattern:

- Train loss ↓
- Validation loss ↑

Regularization

Prevents memorization of the training data.

L1 vs L2

- L1 → sparse weights
- L2 → small, stable weights

Default: → L2
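The difference is easiest to see in how each penalty shrinks a weight vector. In this sketch the L2 step is an ordinary gradient step on `lam * w^2`, while the L1 step uses soft-thresholding (a proximal step), which is one common way to handle the non-smooth `lam * |w|` term and is an assumption here rather than something the article prescribes. The weights and the `lr` and `lam` values are made up.

```python
def l2_shrink(weights, lr, lam):
    # Gradient step on lam * w^2: every weight is scaled toward zero.
    return [w - lr * 2 * lam * w for w in weights]

def l1_shrink(weights, lr, lam):
    # Soft-threshold step on lam * |w|: weights smaller than t become exactly 0.
    t = lr * lam
    def soft(w):
        if w > t:
            return w - t
        if w < -t:
            return w + t
        return 0.0
    return [soft(w) for w in weights]

weights = [2.0, -0.5, 0.03, -0.01]
l2 = l2_shrink(weights, lr=0.1, lam=0.5)  # all weights shrink by 10%, none hit zero
l1 = l1_shrink(weights, lr=0.1, lam=0.5)  # weights below 0.05 in size are zeroed
```

L2 keeps every weight but makes them all smaller; L1 removes the smallest ones outright, which is where the sparsity comes from.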

Early Stopping

Stop when validation loss starts to increase.
Often the simplest fix.
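A minimal sketch of the rule, assuming validation loss is recorded once per epoch. The `patience` parameter (how many non-improving epochs to tolerate before stopping) is an addition not mentioned in the text; it guards against stopping on a single noisy epoch. The loss values are invented.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights should be kept."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss  # new best: remember it
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_at = early_stopping_epoch(val_losses)  # epoch 3, where the loss was 0.50
```

In a real loop you would also save a checkpoint whenever a new best is found, so the weights from `stop_at` can be restored.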

Dropout

Randomly disable neurons during training.

Use when: the model relies too much on specific features.
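As a rough sketch, here is the widely used "inverted dropout" variant on a plain list of activations. The rescaling of surviving units by `1 / (1 - p_drop)` is part of that variant and an assumption beyond what the text states; `p_drop`, `training`, and `rng` are hypothetical parameter names.

```python
import random

def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    if not training or p_drop == 0.0:
        return list(activations)  # dropout is disabled at evaluation time
    keep = 1.0 - p_drop
    # Surviving units are divided by `keep` so the expected value is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, p_drop=0.5, rng=rng)
unchanged = dropout(hidden, training=False)
```

Because a different random subset is silenced on every call, no single unit can be relied on, which pushes the model to spread information across features.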

Debugging Checklist

- Loss exploding → reduce LR
- Loss flat → increase LR
- Train good / val bad → add regularization
- Both bad → model may be too simple
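The checklist can be turned into a small helper that inspects recorded loss curves. This is only a sketch: the function `diagnose` and its thresholds `good_enough` and `flat_tol` are hypothetical and would need tuning for any real loss scale.

```python
import math

def diagnose(train_losses, val_losses, good_enough=0.1, flat_tol=1e-3):
    """Map loss curves to the checklist's suggestions (thresholds are illustrative)."""
    first, last = train_losses[0], train_losses[-1]
    if math.isnan(last) or math.isinf(last) or last > 10 * first:
        return "loss exploding: reduce learning rate"
    if abs(first - last) < flat_tol:
        return "loss flat: increase learning rate"
    if last <= good_enough and val_losses[-1] > good_enough:
        return "train good / val bad: add regularization"
    if last > good_enough and val_losses[-1] > good_enough:
        return "both bad: model may be too simple"
    return "no obvious problem"

overfit = diagnose([1.0, 0.3, 0.05], [1.0, 0.6, 0.8])
diverged = diagnose([1.0, 5.0, 50.0], [1.0, 5.0, 50.0])
```

The checks run in the same order as the list, so a diverging run is reported as a learning-rate problem before anything else is considered.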

Practical Setup

- Optimizer: Adam
- Learning rate: moderate + decay
- Regularization: L2
- Optional: dropout
- Monitor: validation loss
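Written down as a configuration, the setup might look like the sketch below. Every number is a placeholder assumption, since the article gives no concrete values, and the exponential schedule in `lr_at` is just one common way to implement "decay".

```python
config = {
    "optimizer": "adam",
    "base_lr": 1e-3,      # assumed "moderate" starting learning rate
    "lr_decay": 0.95,     # assumed multiplicative decay per epoch
    "l2_strength": 1e-4,  # assumed L2 penalty coefficient
    "dropout": 0.0,       # optional; 0.0 disables dropout
    "monitor": "val_loss",
}

def lr_at(epoch, base_lr=config["base_lr"], decay=config["lr_decay"]):
    """Exponentially decayed learning rate for a given epoch (0-indexed)."""
    return base_lr * decay ** epoch

schedule = [lr_at(e) for e in range(5)]  # starts at base_lr and shrinks each epoch
```

Treat these values as a starting point to tune from, with validation loss as the signal for whether a change helped.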

Key Insight

Optimization + regularization = working model