Understanding Gradients: The Engine Behind Neural Network Learning

Published: December 26, 2025 at 04:08 PM EST
2 min read
Source: Dev.to

In the previous article, we explored activation functions and visualized them using Python.
Now, let’s see what gradients are.

What Is a Gradient?

Neural networks use activation functions to transform inputs. When a network produces an incorrect output, it needs a way to know what to adjust—this is where gradients come in.

  • Think of walking on a hill: a steep slope tells you clearly which direction is up or down, while a flat area makes it hard to decide.
  • A gradient is a number that indicates how steep a curve is at a particular point, and in which direction it rises (a quick numerical sketch follows this list).
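
To make the idea concrete, here is a minimal sketch (my own addition, not from the article) that estimates the slope of a simple curve, x², at a few points using a finite difference. The names numerical_slope and hill, and the choice of curve, are made up purely for illustration.

def numerical_slope(f, x, h=1e-5):
    # Approximate the slope of f at x with a central difference.
    return (f(x + h) - f(x - h)) / (2 * h)

def hill(x):
    # A simple curve standing in for the "hill" in the analogy.
    return x ** 2

for point in [-2.0, 0.0, 3.0]:
    print(f"slope at x={point}: {numerical_slope(hill, point):.2f}")
# -4.00 (downhill), 0.00 (flat), 6.00 (uphill)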

Gradients in Neural Networks

  • Gradients tell us how much, and in which direction, each parameter should change.
  • The larger the gradient, the larger the update.
  • If the gradient is 0, learning stops for that parameter.

During training:

  1. Make a prediction.
  2. Calculate the loss (how wrong the prediction is).
  3. Update the weights to reduce the loss.

The weight update depends entirely on the gradients, as the small sketch below shows.
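
As a rough illustration (my own addition, not code from the article), here is a single gradient-descent step on one weight. The toy loss, the learning rate, and the variable names are assumptions chosen only to show the three steps above.

# A single gradient-descent step on one weight (toy example).
w = 2.0                  # current weight
lr = 0.1                 # learning rate
x_in, y_true = 1.5, 4.0  # one training example

y_pred = w * x_in                    # 1. make a prediction
loss = (y_pred - y_true) ** 2        # 2. calculate the loss (squared error)
grad = 2 * (y_pred - y_true) * x_in  # 3. gradient of the loss w.r.t. w
w = w - lr * grad                    #    update the weight against the gradient

print(f"loss={loss:.3f}, grad={grad:.3f}, updated w={w:.3f}")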

Each activation function has:

  • A curve (the activation itself).
  • A gradient curve (the derivative of the activation).

ReLU Gradient

import numpy as np
import matplotlib.pyplot as plt

def relu_grad(x):
    # Derivative of ReLU: 1 where the input is positive, 0 elsewhere.
    return np.where(x > 0, 1, 0)

# Example usage
x = np.linspace(-5, 5, 400)
plt.figure()
plt.plot(x, relu_grad(x), label="ReLU Gradient")
plt.title("Gradient of ReLU")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

Observations

  • The entire negative side has a gradient of 0.
  • The positive side has a constant gradient of 1.

Implications

  • ReLU learns very fast when active.
  • ReLU neurons can “die” if they always receive negative inputs (a quick demo follows below).
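
As a quick demo of that second point (my own addition, reusing the relu_grad function and the numpy import from the block above): all-negative inputs produce all-zero gradients, so any gradient-based weight update is zero.

# All-negative inputs give zero gradients, so the weights stop updating.
neg_inputs = np.array([-3.0, -1.5, -0.2])
grads = relu_grad(neg_inputs)
print(grads)        # [0 0 0] -> no gradient signal
print(0.1 * grads)  # any update scaled by these gradients is also zero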

Softplus Gradient

def softplus_grad(x):
    # Derivative of softplus, log(1 + e^x), which is exactly the sigmoid.
    return 1 / (1 + np.exp(-x))

# Plotting
plt.figure()
plt.plot(x, softplus_grad(x), label="Softplus Gradient")
plt.title("Gradient of Softplus")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

Observations

  • The gradient of softplus is exactly the sigmoid function, so its shape matches the sigmoid curve.
  • It provides a smooth transition: the gradient never drops all the way to zero, so learning never completely stops.

Implications

  • Softplus avoids dying neurons and adds stability, but learning is slower than with ReLU (the comparison plot below shows why).
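
To see the trade-off directly, here is a small comparison plot (my own addition, reusing x, relu_grad, and softplus_grad defined above) that overlays the two gradient curves.

# Overlay the ReLU and Softplus gradient curves for comparison.
plt.figure()
plt.plot(x, relu_grad(x), label="ReLU Gradient")
plt.plot(x, softplus_grad(x), label="Softplus Gradient")
plt.title("ReLU vs Softplus Gradients")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()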

Sigmoid Gradient

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1 - s)

# Plotting
plt.figure()
plt.plot(x, sigmoid_grad(x), label="Sigmoid Gradient")
plt.title("Gradient of Sigmoid")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

Observations

  • The gradient is very small at both extremes (large positive or negative inputs).
  • It is strong only around the middle of the curve, peaking at 0.25 at x = 0 (see the quick check below).
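
A quick numerical check (my own addition, reusing the sigmoid_grad function defined above) makes the saturation visible:

# Sigmoid gradient at a few points: largest in the middle, near zero at the extremes.
for point in [-10, -5, 0, 5, 10]:
    print(f"sigmoid_grad({point:>3}) = {sigmoid_grad(point):.6f}")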

Implications

  • This behavior leads to the vanishing gradient problem, which hampers learning in deep networks. We’ll explore this issue further in the next article.

Try It Yourself

You can experiment with the code snippets above in a Colab notebook.
