Understanding Overfitting in Neural Networks (TensorFlow - CNN)

Published: December 12, 2025, 00:17 (GMT+8)

Source: Dev.to

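Overfitting is a fundamental challenge when developing neural networks. A model that performs extremely well on its training set may fail to generalise to unseen data, leading to poor real-world performance. This article uses the Fashion-MNIST dataset for a structured investigation of overfitting and evaluates several mitigation strategies: Dropout, L2 regularisation, and Early Stopping.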

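The Fashion-MNIST Dataset

60,000 training images
10,000 test images
28 × 28 greyscale format
10 output classes

A significantly smaller training subset was deliberately used to make overfitting behaviour easier to see.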
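The article does not show how the reduced subset was built. The following is a minimal sketch of one way to do it, assuming a plain random draw of 5,000 training images. The subset size, the seed, and the helper names `make_small_subset` and `load_small_fashion_mnist` are assumptions for illustration, not values from the original experiment.

```python
import numpy as np


def make_small_subset(x, y, n_samples=5000, seed=42):
    """Return a random subset of (x, y), scaled to [0, 1] with a channel axis.

    n_samples and seed are illustrative assumptions, not the article's values.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=n_samples, replace=False)
    x_small = x[idx].astype("float32") / 255.0  # scale pixel values to [0, 1]
    x_small = x_small[..., np.newaxis]          # (N, 28, 28) -> (N, 28, 28, 1)
    return x_small, y[idx]


def load_small_fashion_mnist(n_samples=5000, seed=42):
    """Download Fashion-MNIST through Keras and return a reduced training set."""
    from tensorflow import keras  # imported lazily; only needed for the download

    (x_train, y_train), _ = keras.datasets.fashion_mnist.load_data()
    return make_small_subset(x_train, y_train, n_samples, seed)
```

With this helper, the arrays used in the experiments could be obtained with `x_train_small, y_train_small = load_small_fashion_mnist()`.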

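CNN Architecture

import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    # Small CNN; l2_lambda and dropout_rate default to 0.0 (no regularisation).
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),  # 28 x 28 greyscale images, one channel
        layers.Conv2D(32, (3, 3), activation='relu',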
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])

    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model

All experiments use this architecture, with optional L2 regularisation and Dropout.

Utility Function for Plotting Training History

def plot_history(history, title_prefix=""):
    hist = history.history
    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()

Baseline Model (No Regularisation)

baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)
history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_baseline, title_prefix="Baseline (no regularisation)")

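Observations

Training accuracy rises steadily throughout.
Validation accuracy peaks early and then starts to decline.
Training loss falls while validation loss rises.

Together, these patterns indicate clear overfitting.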
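The divergence described above can also be read off the history numerically rather than from the plots. The sketch below is a hypothetical helper (`generalisation_gap` is not part of the original article) that reports the epoch with the lowest validation loss and the final gap between validation and training loss. The example numbers are made up for illustration and are not results from the experiment.

```python
def generalisation_gap(hist):
    """Summarise overfitting from a Keras-style history dict.

    Expects the keys "loss" and "val_loss" (lists with one value per epoch).
    """
    val = hist["val_loss"]
    gaps = [v - t for t, v in zip(hist["loss"], val)]
    best_idx = min(range(len(val)), key=val.__getitem__)
    return {
        "best_epoch": best_idx + 1,      # 1-based, as Keras prints epochs
        "best_val_loss": val[best_idx],
        "final_gap": gaps[-1],           # val_loss - loss at the last epoch
    }


# Made-up numbers for illustration only (not results from the article).
example = {"loss": [0.9, 0.6, 0.4, 0.2], "val_loss": [0.8, 0.6, 0.7, 0.9]}
summary = generalisation_gap(example)
print(summary)
```

On a real run it would be called as `generalisation_gap(history_baseline.history)`. A best epoch well before the last one, combined with a large positive final gap, matches the overfitting pattern listed above.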

Dropout (rate = 0.5)

dropout_model = create_cnn_model(dropout_rate=0.5)
history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_dropout, title_prefix="Dropout (0.5)")

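Observations

Training accuracy rises more slowly (the expected effect of Dropout).
Validation accuracy tracks the training curve more closely.
The gap between training and validation loss narrows substantially.

Dropout was highly effective in this experiment and clearly improved generalisation.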
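To make the slower rise in training accuracy more concrete, here is a small NumPy sketch of what a Dropout layer does during training: each unit is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)` so the expected activation stays the same. This is a simplified illustration, not the Keras implementation, and the function name `inverted_dropout` is an assumption.

```python
import numpy as np


def inverted_dropout(activations, rate, rng):
    """Apply training-time dropout: zero units with probability `rate`,
    then scale the surviving units by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
acts = np.ones((1000, 64))
dropped = inverted_dropout(acts, rate=0.5, rng=rng)
print(np.unique(dropped))  # only 0.0 and 2.0 appear
print(dropped.mean())      # stays close to the original mean of 1.0
```

At inference time the layer passes activations through unchanged, which is why the validation metrics are computed with the full network.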

L2 Regularisation (λ = 0.001)

l2_model = create_cnn_model(l2_lambda=0.001)
history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_l2, title_prefix="L2 Regularisation")

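Observations

Training loss is noticeably higher because of the weight penalty.
The validation loss trend is smoother than in the baseline.
Validation accuracy improves modestly.

L2 regularisation smooths the learning dynamics and reduces overfitting, but its effect is milder than that of Dropout.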
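The weight penalty behind the higher training loss is easy to compute by hand. For each regularised weight tensor, `regularizers.l2(l2_lambda)` contributes `l2_lambda * sum(w ** 2)` to the loss that Keras minimises. The sketch below reproduces that arithmetic in NumPy on a toy kernel; the kernel values and the helper name `l2_penalty` are made up for illustration.

```python
import numpy as np


def l2_penalty(weights, l2_lambda):
    """Penalty for one weight tensor: l2_lambda * sum(w ** 2)."""
    return l2_lambda * float(np.sum(np.square(weights)))


# Toy kernel; the values are made up for illustration.
kernel = np.array([[0.5, -1.0], [2.0, 0.0]])
penalty = l2_penalty(kernel, l2_lambda=0.001)
print(penalty)  # 0.001 * (0.25 + 1.0 + 4.0)
```

Because large weights are penalised quadratically, the optimiser is nudged toward smaller weights, which is consistent with the smoother validation curve noted above.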

Early Stopping

earlystop_model = create_cnn_model()
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_early, title_prefix="Early Stopping")

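Observations

Training stops early once validation loss has not improved for three consecutive epochs (patience=3).
The late-stage overfitting seen in the baseline model is avoided.
It produces one of the cleanest validation curves of all the models.

Early Stopping is a simple and effective technique for improving generalisation.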
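The patience rule can be illustrated without training anything. The sketch below is a simplified stand-in for the stopping logic, not the Keras source: it walks through a list of validation losses and reports the epoch at which training would stop and the epoch whose weights `restore_best_weights=True` would bring back. The function name `early_stopping_epoch` and the loss values are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss by more than `min_delta`.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training runs for every epoch.
    return len(val_losses), best_epoch


# Made-up validation losses for illustration only.
losses = [0.80, 0.60, 0.55, 0.58, 0.61, 0.66, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In this toy run the best loss occurs at epoch 3, three non-improving epochs follow, and training stops at epoch 6 instead of continuing to the end.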

Model Conversion (Optional)

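converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
# Enable the default optimisations (dynamic-range quantisation) so the
# converted model is actually quantised, as the printed label says.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print("Quantised model size (bytes):", len(tflite_model))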

This step demonstrates model size reduction for deployment; it is not a regularisation strategy.

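Summary of Results

Baseline: clear overfitting.
Dropout: the largest improvement in validation performance.
L2 regularisation: helps stabilise training dynamics.
Early Stopping: prevents late-stage divergence and improves generalisation.
Combination (Dropout + Early Stopping): the most robust performance on the reduced Fashion-MNIST dataset.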
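The summary mentions a Dropout + Early Stopping combination, but the code for that run is not shown. One plausible setup, assuming it simply reuses the hyperparameters from the individual experiments (dropout rate 0.5, patience 3, 20 epochs), might look like the sketch below. The helper name `fit_dropout_with_early_stopping` is hypothetical, and `build_model` stands for a factory such as the article's `create_cnn_model`.

```python
def fit_dropout_with_early_stopping(build_model, x, y, dropout_rate=0.5,
                                    patience=3, epochs=20):
    """Train a Dropout model with early stopping on validation loss.

    `build_model` is expected to accept a `dropout_rate` keyword and return
    a compiled Keras model, like the article's create_cnn_model. The default
    hyperparameters are assumptions copied from the separate experiments.
    """
    from tensorflow import keras  # imported lazily; needed only when training

    model = build_model(dropout_rate=dropout_rate)
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=patience,
        restore_best_weights=True,
    )
    history = model.fit(
        x, y,
        validation_split=0.2,
        epochs=epochs,
        callbacks=[early_stop],
    )
    return model, history
```

It would be called as `model, history = fit_dropout_with_early_stopping(create_cnn_model, x_train_small, y_train_small)`, and the returned history can be passed to `plot_history` like the others.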
