Understanding Overfitting in Neural Networks (TensorFlow - CNN)

Published: 2 weeks ago (December 12, 2025, 01:17 AM GMT+9)

5 min read

Source: Dev.to

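Overfitting is a fundamental problem in neural network development. A model that performs very well on its training set may still fail to generalise to unseen data, and its performance then degrades in real-world use. This article uses the Fashion‑MNIST dataset to investigate overfitting systematically and evaluates several mitigation strategies: Dropout, L2 regularisation, and Early Stopping.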

Fashion‑MNIST dataset

60,000 training images
10,000 test images
28 × 28 greyscale format
10 output classes

To make overfitting easier to see, only a deliberately small subset of the training data is used.
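The training calls later in the article refer to x_train_small and y_train_small without showing how they are built. The following is a minimal sketch of one way to prepare such a subset. The helper name make_small_subset, the subset size of 2,000 and the seed are assumptions for illustration, not values from the original experiment. Random stand-in arrays with the Fashion‑MNIST shapes are used so that the snippet runs without downloading anything.

```python
import numpy as np


def make_small_subset(x, y, n_samples=2000, seed=42):
    """Pick a random subset, scale pixels to [0, 1], and add a channel axis.

    n_samples and seed are illustrative assumptions, not the article's values.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=n_samples, replace=False)
    x_small = x[idx].astype("float32") / 255.0
    x_small = x_small[..., np.newaxis]  # (n, 28, 28) -> (n, 28, 28, 1)
    return x_small, y[idx]


# Stand-in data shaped like Fashion-MNIST (uint8 images, labels 0..9).
x_demo = np.random.default_rng(0).integers(0, 256, size=(5000, 28, 28), dtype=np.uint8)
y_demo = np.random.default_rng(1).integers(0, 10, size=5000)

x_train_small, y_train_small = make_small_subset(x_demo, y_demo)
print(x_train_small.shape, y_train_small.shape)
```

With TensorFlow installed, the stand-in arrays would be replaced by the real training split returned by keras.datasets.fashion_mnist.load_data().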

CNN architecture

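from tensorflow import keras
from tensorflow.keras import layers, regularizers


def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    """Build and compile the CNN; both regularisers are off by default."""
    model = keras.Sequential([
        # Fashion-MNIST images are 28 x 28 with a single greyscale channel.
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        # A rate of 0.0 makes this layer a no-op.
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])

    # Labels are integer class ids, hence the sparse loss.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model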

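All experiments share this architecture; L2 regularisation and Dropout are switched on selectively through the l2_lambda and dropout_rate arguments.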
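To get a feel for how large this network is relative to a small training subset, the layer shapes and parameter counts can be worked out by hand. The sketch below does that arithmetic in plain Python, assuming a 28 × 28 × 1 input and the Keras defaults used in the model code ('valid' padding, stride 1 for convolutions, pooling stride equal to the pool size). The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


size = 28
size = pool_out(conv_out(size))  # 28 -> 26 -> 13
size = pool_out(conv_out(size))  # 13 -> 11 -> 5
flat_units = size * size * 64    # 5 * 5 * 64

total_params = (
    conv_params(3, 1, 32)
    + conv_params(3, 32, 64)
    + dense_params(flat_units, 64)
    + dense_params(64, 10)
)
print(flat_units, total_params)  # 1600 121930
```

Most of the parameters sit in the first Dense layer, which is also where the model applies Dropout.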

Training history visualisation utility

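import matplotlib.pyplot as plt


def plot_history(history, title_prefix=""):
    """Plot training vs. validation loss and accuracy side by side."""
    hist = history.history  # dict of per-epoch metric lists
    plt.figure(figsize=(12, 5))

    # Left panel: loss curves.
    plt.subplot(1, 2, 1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.xlabel("Epoch")
    plt.legend()

    # Right panel: accuracy curves.
    plt.subplot(1, 2, 2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.xlabel("Epoch")
    plt.legend()

    plt.tight_layout()
    plt.show()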

Baseline model (no regularisation)

baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)
history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_baseline, title_prefix="Baseline (no regularisation)")

Observations

Training accuracy rises steadily.
Validation accuracy peaks early and then declines.
Training loss keeps falling while validation loss rises.

This pattern is a clear sign of overfitting.
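The pattern can also be read off the numbers rather than the plots. The sketch below summarises a history dictionary of the kind returned in history.history: the epoch with the lowest validation loss, how far validation loss has risen since then, and the final train/validation accuracy gap. The function name summarise_overfitting is hypothetical, and the values in made_up_history are invented purely to exercise the code; they are not results from the experiment.

```python
def summarise_overfitting(hist):
    """Summarise how far training and validation metrics have drifted apart."""
    val_loss = hist["val_loss"]
    # Index of the epoch (0-based) with the lowest validation loss.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best_epoch,
        "epochs_after_best": len(val_loss) - 1 - best_epoch,
        "val_loss_rise": val_loss[-1] - val_loss[best_epoch],
        "final_accuracy_gap": hist["accuracy"][-1] - hist["val_accuracy"][-1],
    }


# Invented numbers that mimic the overfitting pattern described above.
made_up_history = {
    "loss": [0.90, 0.60, 0.40, 0.25, 0.15],
    "val_loss": [0.95, 0.70, 0.65, 0.72, 0.85],
    "accuracy": [0.70, 0.80, 0.87, 0.92, 0.96],
    "val_accuracy": [0.68, 0.76, 0.79, 0.78, 0.76],
}

summary = summarise_overfitting(made_up_history)
print(summary)
```

A large positive val_loss_rise together with a growing accuracy gap is the numerical counterpart of the diverging curves.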

Dropout (rate = 0.5)

dropout_model = create_cnn_model(dropout_rate=0.5)
history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_dropout, title_prefix="Dropout (0.5)")

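Observations

Training accuracy rises more slowly, because Dropout randomly disables units during training.
Validation accuracy tracks the training curve more closely.
The gap between training and validation loss shrinks considerably.

Dropout is very effective in this experiment and noticeably improves generalisation.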
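To make the mechanism concrete, here is a minimal NumPy sketch of inverted dropout, the scheme Keras uses: during training each activation is zeroed with probability equal to the rate and the survivors are scaled by 1 / (1 - rate), so the expected activation is unchanged; at inference the layer passes its input through. The function name inverted_dropout is hypothetical, and this is an illustration of the idea rather than the Keras implementation.

```python
import numpy as np


def inverted_dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the rest."""
    if not training or rate == 0.0:
        return activations  # inference: identity
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = keep
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
acts = np.ones((1000, 64), dtype="float32")

dropped = inverted_dropout(acts, rate=0.5, rng=rng)
print("fraction zeroed:", float((dropped == 0).mean()))
print("mean activation:", float(dropped.mean()))
```

With a rate of 0.5 roughly half of the 64 dense units are silenced on each step, which is why the training curve climbs more slowly than in the baseline.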

L2 regularisation (λ = 0.001)

l2_model = create_cnn_model(l2_lambda=0.001)
history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_l2, title_prefix="L2 Regularisation")

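Observations

The weight penalty makes the training loss noticeably higher.
The validation loss trend is more stable than in the baseline.
Validation accuracy improves slightly.

L2 regularisation smooths the training dynamics and mitigates overfitting, but in this setup its effect is smaller than that of Dropout.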
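The higher training loss follows directly from how the penalty is defined: regularizers.l2(λ) adds λ times the sum of squared kernel weights to the loss for each regularised layer. The sketch below reproduces that arithmetic in NumPy. The function name l2_penalty, the randomly initialised kernels and the data_loss value are assumptions for illustration only.

```python
import numpy as np


def l2_penalty(kernels, l2_lambda):
    """Sum of l2_lambda * sum(w ** 2) over all regularised kernels."""
    return l2_lambda * sum(float(np.sum(np.square(k))) for k in kernels)


rng = np.random.default_rng(0)
# Random stand-ins with the kernel shapes of the three regularised layers.
kernels = [
    rng.normal(scale=0.1, size=(3, 3, 1, 32)),
    rng.normal(scale=0.1, size=(3, 3, 32, 64)),
    rng.normal(scale=0.1, size=(1600, 64)),
]

data_loss = 0.30  # hypothetical cross-entropy value
penalty = l2_penalty(kernels, l2_lambda=0.001)
print("penalty:", penalty)
print("reported loss:", data_loss + penalty)
```

Because the penalty grows with the squared weights, gradient descent is pushed towards smaller weights, which is the smoothing effect noted above.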

Early Stopping

earlystop_model = create_cnn_model()
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_early, title_prefix="Early Stopping")

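Observations

Training stops once validation loss has failed to improve for three consecutive epochs (patience=3).
This prevents the late-epoch overfitting seen in the baseline.
It produces the cleanest validation curve of all the models.

Early Stopping is a simple yet effective generalisation technique.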
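The patience rule is easy to trace by hand. The sketch below is a simplified re-implementation of the stopping logic (no min_delta, no baseline), not the Keras source: it walks through a list of validation losses and reports the epoch at which training would stop and the epoch whose weights restore_best_weights=True would bring back. The function name early_stopping_trace and the loss values are made up for illustration.

```python
def early_stopping_trace(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran through every epoch without exhausting patience.
    return len(val_losses) - 1, best_epoch


# Invented validation losses: best at epoch 2, no improvement afterwards.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.61, 0.70, 0.80]
stop_epoch, best_epoch = early_stopping_trace(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Training ends three epochs after the best one, and the weights from the best epoch are the ones kept.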

Model conversion (optional)

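import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
# Assumed intent: without this line the converter emits a float32 model, not a quantised one.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print("Quantised model size (bytes):", len(tflite_model))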

This step illustrates reducing model size for deployment; it is not a regularisation strategy.

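Summary of results

Baseline: clear overfitting.
Dropout: the largest improvement in validation behaviour.
L2 regularisation: helps stabilise the training dynamics.
Early Stopping: prevents late-epoch divergence and improves generalisation.
Combination (Dropout + Early Stopping): gives the most robust performance on the reduced Fashion‑MNIST dataset.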
