Understanding Overfitting in Neural Networks (TensorFlow CNN)
Overfitting is a fundamental challenge when developing neural networks. A model that performs extremely well on the training dataset may fail to generalize to unseen data, leading to poor real‑world performance. This article presents a structured investigation of overfitting using the Fashion‑MNIST dataset and evaluates several mitigation strategies, including Dropout, L2 regularisation, and Early Stopping.
Fashion‑MNIST dataset
- 60,000 training images
- 10,000 test images
- 28 × 28 grayscale format
- 10 output classes
A significantly smaller subset of the training data is intentionally used to make overfitting behaviour more visible.
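The later code snippets refer to x_train_small and y_train_small. A minimal data-preparation sketch is shown below; the article does not fix the exact subset size, so the 5,000-example slice here is an assumption.

import numpy as np
from tensorflow import keras

# Load Fashion-MNIST, scale pixels to [0, 1], and add a channel dimension for Conv2D.
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = x_train.astype("float32")[..., np.newaxis] / 255.0
x_test = x_test.astype("float32")[..., np.newaxis] / 255.0

# Deliberately small training subset to make overfitting easier to observe.
subset_size = 5000  # assumed value; the article only says "significantly smaller"
x_train_small = x_train[:subset_size]
y_train_small = y_train[:subset_size]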
CNN Architecture
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    # Two convolutional blocks followed by a small dense classifier.
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu',
                      input_shape=(28, 28, 1),
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dropout(dropout_rate),  # no-op when dropout_rate=0.0
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
All experiments share this architecture, with optional L2 regularisation and Dropout.
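For reference, any configuration can be instantiated by toggling the two arguments; calling summary() on a fresh instance confirms the layer structure and parameter count (the hyperparameter values below are purely illustrative).

# Illustrative instantiation with both regularisers enabled.
demo_model = create_cnn_model(l2_lambda=0.001, dropout_rate=0.5)
demo_model.summary()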
Utility for Plotting Training History
import matplotlib.pyplot as plt

def plot_history(history, title_prefix=""):
    # Plot training vs. validation loss and accuracy side by side.
    hist = history.history
    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()
Baseline Model (No Regularisation)
baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)

history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)

plot_history(history_baseline, title_prefix="Baseline (no regularisation)")
Observations
- Training accuracy continues to increase steadily.
- Validation accuracy peaks early and then declines.
- Training loss decreases, while validation loss increases.
These patterns indicate clear overfitting.
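To make the gap concrete, the trained baseline can be evaluated on the held-out test set and compared with accuracy on the training subset; the snippet below assumes x_test and y_test have been preprocessed the same way as the training data.

# Quantify the generalisation gap of the baseline model.
train_loss, train_acc = baseline_model.evaluate(x_train_small, y_train_small, verbose=0)
test_loss, test_acc = baseline_model.evaluate(x_test, y_test, verbose=0)
print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")
print(f"Generalisation gap: {train_acc - test_acc:.3f}")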
Dropout (Rate = 0.5)
dropout_model = create_cnn_model(dropout_rate=0.5)

history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)

plot_history(history_dropout, title_prefix="Dropout (0.5)")
Observations
- Training accuracy rises more slowly (expected due to Dropout).
- Validation accuracy tracks the training curve more closely.
- Divergence between training and validation loss is significantly reduced.
Dropout is highly effective in this experiment, producing noticeably improved generalisation.
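One simple way to compare the divergence numerically is the final-epoch gap between validation and training loss in the two history objects (a rough indicator, not a formal metric):

# A larger gap between validation and training loss indicates stronger overfitting.
def final_loss_gap(history):
    h = history.history
    return h["val_loss"][-1] - h["loss"][-1]

print("Baseline loss gap:", round(final_loss_gap(history_baseline), 3))
print("Dropout loss gap: ", round(final_loss_gap(history_dropout), 3))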
L2 Regularisation (λ = 0.001)
l2_model = create_cnn_model(l2_lambda=0.001)

history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)

plot_history(history_l2, title_prefix="L2 Regularisation")
Observations
- Training loss is noticeably higher due to weight penalisation.
- Validation loss trends are more stable compared to the baseline.
- Validation accuracy improves moderately.
L2 regularisation smooths learning dynamics and alleviates overfitting, though its impact is milder than Dropout in this setup.
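The mechanism behind this can be inspected directly: the L2 penalty pushes kernel weights towards smaller magnitudes. A small sketch comparing total kernel norms of the baseline and the L2-regularised model (the helper name is illustrative):

import numpy as np

def total_kernel_norm(model):
    # Sum of the Frobenius norms of all kernel matrices (biases excluded).
    return sum(np.linalg.norm(w)
               for layer in model.layers
               for w in layer.get_weights()[:1])

print("Baseline kernel norm:", round(total_kernel_norm(baseline_model), 2))
print("L2 model kernel norm:", round(total_kernel_norm(l2_model), 2))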
Early Stopping
earlystop_model = create_cnn_model()

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)

plot_history(history_early, title_prefix="Early Stopping")
Observations
- Training terminates after validation loss stops improving.
- Avoids the late‑epoch overfitting seen in the baseline.
- Produces one of the cleanest validation curves among all models.
Early stopping is a simple and effective generalisation technique.
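Because restore_best_weights=True, the returned model keeps the weights from its best validation-loss epoch. How many epochs were actually run can be read from the history object:

# With patience=3, training typically stops well before the 20-epoch limit.
epochs_run = len(history_early.history["loss"])
best_val_loss = min(history_early.history["val_loss"])
print("Epochs actually trained:", epochs_run)
print("Best validation loss:", round(best_val_loss, 4))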
Model Conversion (Optional)
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables dynamic-range quantisation
tflite_model = converter.convert()

print("Quantised model size (bytes):", len(tflite_model))
This step demonstrates model size reduction for deployment purposes; it is not a regularisation strategy.
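For completeness, the converted model can be written to disk for deployment; the file name below is only an example.

# Save the converted FlatBuffer so it can be shipped with an app or device image.
with open("fashion_mnist_baseline.tflite", "wb") as f:
    f.write(tflite_model)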
Summary of Results
- Baseline: Clear overfitting.
- Dropout: Largest improvement in validation behaviour.
- L2 Regularisation: Helps stabilise training dynamics.
- Early Stopping: Prevents late‑epoch divergence and improves generalisation.
- Combination (Dropout + Early Stopping): Yields the most robust performance on the reduced Fashion‑MNIST dataset; a minimal sketch of this run follows below.
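The combined run is not shown in full above; a minimal sketch, reusing the same hyperparameters as the individual experiments, would look like this:

# Combined experiment: Dropout (0.5) plus Early Stopping, same setup as above.
combined_model = create_cnn_model(dropout_rate=0.5)
history_combined = combined_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=3, restore_best_weights=True)]
)
plot_history(history_combined, title_prefix="Dropout + Early Stopping")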