Understanding Gradients: The Engine Behind Neural Network Learning

Published: December 26, 2025 at 04:08 PM EST
2 min read
Source: Dev.to

In the previous article, we explored activation functions and visualized them using Python.
Now, let’s see what gradients are.

What Is a Gradient?

Neural networks use activation functions to transform inputs. When a network produces an incorrect output, it needs a way to know what to adjust—this is where gradients come in.

  • Think of walking on a hill: a steep slope tells you clearly which direction is up or down, while a flat area makes it hard to decide.
  • A gradient is a number that indicates how steep a curve is at a particular point, and in which direction it rises (a quick numerical sketch follows this list).
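
To make the idea concrete, here is a minimal sketch (my own addition, not from the article) that estimates the slope of a simple curve, x², at a few points using a finite difference. The names numerical_slope and hill, and the choice of curve, are made up purely for illustration.

def numerical_slope(f, x, h=1e-5):
    # Approximate the slope of f at x with a central difference.
    return (f(x + h) - f(x - h)) / (2 * h)

def hill(x):
    # A simple curve standing in for the "hill" in the analogy.
    return x ** 2

for point in [-2.0, 0.0, 3.0]:
    print(f"slope at x={point}: {numerical_slope(hill, point):.2f}")
# -4.00 (downhill), 0.00 (flat), 6.00 (uphill)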

Gradients in Neural Networks

  • Gradients tell us how much, and in which direction, each parameter should change.
  • The larger the gradient, the larger the update.
  • If the gradient is 0, learning stops for that parameter.

During training:

  1. Make a prediction.
  2. Calculate the loss (how wrong the prediction is).
  3. Update the weights to reduce the loss.

The weight update depends entirely on the gradients, as the small sketch below shows.
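
As a rough illustration (my own addition, not code from the article), here is a single gradient-descent step on one weight. The toy loss, the learning rate, and the variable names are assumptions chosen only to show the three steps above.

# A single gradient-descent step on one weight (toy example).
w = 2.0                  # current weight
lr = 0.1                 # learning rate
x_in, y_true = 1.5, 4.0  # one training example

y_pred = w * x_in                    # 1. make a prediction
loss = (y_pred - y_true) ** 2        # 2. calculate the loss (squared error)
grad = 2 * (y_pred - y_true) * x_in  # 3. gradient of the loss w.r.t. w
w = w - lr * grad                    #    update the weight against the gradient

print(f"loss={loss:.3f}, grad={grad:.3f}, updated w={w:.3f}")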

Each activation function has:

  • A curve (the activation itself).
  • A gradient curve (the derivative of the activation).

ReLU Gradient

import numpy as np
import matplotlib.pyplot as plt

def relu_grad(x):
    # Derivative of ReLU: 1 where the input is positive, 0 elsewhere.
    return np.where(x > 0, 1, 0)

# Example usage
x = np.linspace(-5, 5, 400)
plt.figure()
plt.plot(x, relu_grad(x), label="ReLU Gradient")
plt.title("Gradient of ReLU")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

Observations

  • The entire negative side has a gradient of 0.
  • The positive side has a constant gradient of 1.

Implications

  • ReLU learns very fast when active.
  • ReLU neurons can “die” if they always receive negative inputs (a quick demo follows below).
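
As a quick demo of that second point (my own addition, reusing the relu_grad function and the numpy import from the block above): all-negative inputs produce all-zero gradients, so any gradient-based weight update is zero.

# All-negative inputs give zero gradients, so the weights stop updating.
neg_inputs = np.array([-3.0, -1.5, -0.2])
grads = relu_grad(neg_inputs)
print(grads)        # [0 0 0] -> no gradient signal
print(0.1 * grads)  # any update scaled by these gradients is also zero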

Softplus Gradient

def softplus_grad(x):
    # Derivative of softplus, log(1 + e^x), which is exactly the sigmoid.
    return 1 / (1 + np.exp(-x))

# Plotting
plt.figure()
plt.plot(x, softplus_grad(x), label="Softplus Gradient")
plt.title("Gradient of Softplus")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

Observations

  • The gradient of softplus is exactly the sigmoid function, so its shape matches the sigmoid curve.
  • It provides a smooth transition: the gradient never drops all the way to zero, so learning never completely stops.

Implications

  • Softplus avoids dying neurons and adds stability, but learning is slower than with ReLU (the comparison plot below shows why).
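
To see the trade-off directly, here is a small comparison plot (my own addition, reusing x, relu_grad, and softplus_grad defined above) that overlays the two gradient curves.

# Overlay the ReLU and Softplus gradient curves for comparison.
plt.figure()
plt.plot(x, relu_grad(x), label="ReLU Gradient")
plt.plot(x, softplus_grad(x), label="Softplus Gradient")
plt.title("ReLU vs Softplus Gradients")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()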

Sigmoid Gradient

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1 - s)

# Plotting
plt.figure()
plt.plot(x, sigmoid_grad(x), label="Sigmoid Gradient")
plt.title("Gradient of Sigmoid")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

Observations

  • The gradient is very small at both extremes (large positive or negative inputs).
  • It is strong only around the middle of the curve, peaking at 0.25 at x = 0 (see the quick check below).
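
A quick numerical check (my own addition, reusing the sigmoid_grad function defined above) makes the saturation visible:

# Sigmoid gradient at a few points: largest in the middle, near zero at the extremes.
for point in [-10, -5, 0, 5, 10]:
    print(f"sigmoid_grad({point:>3}) = {sigmoid_grad(point):.6f}")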

Implications

  • This behavior leads to the vanishing gradient problem, which hampers learning in deep networks. We’ll explore this issue further in the next article.

Try It Yourself

You can experiment with the code snippets above in a Colab notebook.
