Understanding Gradients: The Engine Behind Neural Network Learning

Published: December 27, 2025, 05:08 (GMT+8)

3 min read

Source: Dev.to

In the previous article, we explored activation functions and visualized them with Python.
Now, let's look at what gradients are.

What Is a Gradient?

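A neural network uses activation functions to transform its inputs. When the network produces a wrong output, it needs a way to know how to adjust itself, and that is where gradients come in.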

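Imagine walking on a hillside: a steep slope clearly tells you which way is uphill and which is downhill, while a flat area makes it hard to tell which way to go.
A gradient is a number that measures how steep a curve is at a given point, that is, its slope (derivative) there. Its sign tells you which way is uphill, and its size tells you how steep the climb is. With many parameters, the gradient collects one such slope per parameter.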
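To make "slope at a point" concrete, here is a minimal sketch that approximates the derivative numerically with a central finite difference. The helpers `numerical_gradient` and `f`, the curve x², and the step size `h` are hypothetical choices for illustration, not part of the original article.

```python
# Approximate the slope of a curve at a point with a central finite difference.
# numerical_gradient, f, and h are illustrative (hypothetical) choices.

def numerical_gradient(f, x, h=1e-5):
    """Approximate df/dx at x using (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x ** 2  # a simple curve whose exact derivative is 2x

for point in [-2.0, 0.0, 3.0]:
    slope = numerical_gradient(f, point)
    print(f"x = {point:+.1f}  slope ≈ {slope:.4f}")
```

The slope is negative at x = -2 (downhill to the right), zero at the flat bottom x = 0, and positive at x = 3, matching the hillside picture.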

Gradients in Neural Networks

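The gradient tells us in which direction, and by how much, a parameter should change.
The larger the gradient, the larger the update.
If the gradient is 0, the parameter is not updated, and learning stops.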
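As one concrete case, consider plain gradient descent, the most basic update rule (the article does not name a specific optimizer). It moves each weight w against its gradient, scaled by a learning rate η, a symbol introduced here for illustration:

```latex
w_{\text{new}} = w - \eta \, \frac{\partial L}{\partial w}
```

A larger |∂L/∂w| gives a larger step. When ∂L/∂w = 0, the weight stays where it is. The minus sign makes the step go downhill on the loss.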

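During training:

Make a prediction.
Compute the loss (how wrong the prediction is).
Compute the gradient of the loss with respect to each weight, and update the weights in the opposite direction to reduce the loss.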

How much each weight changes, and in which direction, is driven entirely by its gradient.
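The training steps above can be sketched as a small loop. This is a minimal sketch that assumes a one-weight model `y_pred = w * x`, a mean-squared-error loss, plain gradient descent, and toy data following y = 3x. The data, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

# Predict -> loss -> gradient -> update, for a single weight w in y_pred = w * x.
# The data (y = 3x), learning rate, and number of steps are illustrative.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0               # initial weight
learning_rate = 0.01

for step in range(200):
    y_pred = w * x                          # 1. make a prediction
    loss = np.mean((y_pred - y) ** 2)       # 2. mean squared error loss
    grad = np.mean(2 * (y_pred - y) * x)    # 3. dLoss/dw
    w -= learning_rate * grad               # 4. step against the gradient

print(f"learned w = {w:.4f}, loss at last step = {loss:.6f}")
```

Each update is proportional to `grad`, so the steps shrink as w approaches 3 and the gradient approaches 0.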

Every activation function has:

A curve (the activation itself).
A gradient curve (the derivative of the activation).
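The activation's derivative matters because of the chain rule: the gradient for a weight passes through the activation's derivative on its way back. Below is a minimal sketch, not taken from the article, for one neuron y = sigmoid(w·x + b) with an assumed squared-error loss. The analytic chain-rule gradient is checked against a finite difference. All numeric values are illustrative.

```python
import numpy as np

# One neuron: z = w * x + b, y = sigmoid(z), loss L = (y - target)^2.
# Chain rule: dL/dw = dL/dy * dy/dz * dz/dw = 2 * (y - target) * sigmoid'(z) * x

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)

def loss(w, x, b, target):
    return (sigmoid(w * x + b) - target) ** 2

w, b, x, target = 0.5, 0.1, 2.0, 1.0   # illustrative values
z = w * x + b
y = sigmoid(z)

analytic = 2 * (y - target) * sigmoid_grad(z) * x

# Numerical check: central finite difference on w
h = 1e-6
numeric = (loss(w + h, x, b, target) - loss(w - h, x, b, target)) / (2 * h)

print(f"chain rule: {analytic:.6f}, finite difference: {numeric:.6f}")
```

If `sigmoid_grad(z)` were near 0, `analytic` would be near 0 too, whatever the error.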

ReLU Gradient

import numpy as np
import matplotlib.pyplot as plt

def relu_grad(x):
    # Derivative of ReLU: 1 where x > 0, else 0 (the value at exactly x = 0 is a convention)
    return np.where(x > 0, 1, 0)

# Example usage
x = np.linspace(-5, 5, 400)
plt.figure()
plt.plot(x, relu_grad(x), label="ReLU Gradient")
plt.title("Gradient of ReLU")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

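Observations

The gradient is 0 across the entire negative side.
The gradient is a constant 1 across the entire positive side.

Implications

Learning is fast while a ReLU is active, because the gradient passes through it unchanged (multiplied by 1).
If a ReLU neuron always receives negative inputs, its gradient is always 0, its weights stop updating, and the neuron can "die".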
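Here is a minimal sketch of a "dead" ReLU neuron. The large negative bias is an illustrative assumption that pushes z = w·x + b below 0 for every input. The upstream gradient is a stand-in set to 1. Variable names are chosen so they do not overwrite the `x` used by the plotting code.

```python
import numpy as np

# A neuron whose pre-activation z is negative for every input gets zero gradient.

def relu_grad(x):
    return np.where(x > 0, 1, 0)

rng = np.random.default_rng(0)
inputs = rng.uniform(-1, 1, size=100)   # inputs in [-1, 1)
w, b = 0.5, -10.0                       # illustrative: bias keeps every z far below 0

z = w * inputs + b
upstream = np.ones_like(z)              # stand-in for dL/dy from later layers

# Chain rule for the weight: dL/dw = dL/dy * relu'(z) * input
grad_w = np.mean(upstream * relu_grad(z) * inputs)
print("max z:", z.max())
print("gradient for w:", grad_w)        # zero: the weight never gets updated
```

Because every `relu_grad(z)` is 0, no update reaches `w` or `b`. The neuron stays stuck unless something else shifts its inputs.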

Softplus Gradient

def softplus_grad(x):
    # softplus(x) = log(1 + e^x); its derivative is the sigmoid 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

# Plotting
plt.figure()
plt.plot(x, softplus_grad(x), label="Softplus Gradient")
plt.title("Gradient of Softplus")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

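Observations

The gradient has the same shape as the sigmoid activation; in fact, it is exactly the sigmoid function.
It changes smoothly and is never exactly 0, so learning never stops completely, although it becomes very slow for large negative inputs.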
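The claims above can be checked numerically. This is a minimal sketch, assuming softplus(x) = log(1 + eˣ) computed with `np.logaddexp(0, x)`, which evaluates log(e⁰ + eˣ) without overflow. It compares a finite-difference derivative with the sigmoid formula and checks that the gradient is positive on the sampled points.

```python
import numpy as np

# Check that d/dx softplus(x) matches the sigmoid and is positive on a sample grid.

def softplus(x):
    return np.logaddexp(0, x)   # log(1 + e^x), computed stably

def softplus_grad(x):
    return 1 / (1 + np.exp(-x))

points = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (softplus(points + h) - softplus(points - h)) / (2 * h)

print(np.round(numeric, 4))
print(np.round(softplus_grad(points), 4))
print("all positive:", bool(np.all(softplus_grad(points) > 0)))
```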

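Implications

Softplus avoids dead neurons and can improve stability, but it learns more slowly than ReLU because its gradient is always below 1.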
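One simple way to see the speed difference is to compare gradient values side by side at a few sample inputs chosen for illustration. An active ReLU passes a gradient of exactly 1. The softplus gradient is mathematically strictly between 0 and 1, so it never kills the signal but always scales it down somewhat.

```python
import numpy as np

# Compare ReLU and softplus gradients at a few illustrative inputs.

def relu_grad(x):
    return np.where(x > 0, 1, 0)

def softplus_grad(x):
    return 1 / (1 + np.exp(-x))

samples = np.array([-3.0, -1.0, 0.5, 1.0, 3.0])
for s, r, sp in zip(samples, relu_grad(samples), softplus_grad(samples)):
    print(f"input {s:+.1f}: ReLU grad = {r}, softplus grad = {sp:.3f}")
```

On the negative side softplus still lets a small gradient through where ReLU gives 0. On the positive side ReLU's gradient of 1 is always larger.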

Sigmoid Gradient

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1 - s)

# Plotting
plt.figure()
plt.plot(x, sigmoid_grad(x), label="Sigmoid Gradient")
plt.title("Gradient of Sigmoid")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

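Observations

At both ends (large positive or large negative inputs), the gradient is very small.
The gradient is large only near the middle of the curve, and even there it peaks at just 0.25 (at x = 0).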
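To put numbers on this, the minimal sketch below finds the peak of the sigmoid gradient on a grid and prints its value at a few inputs. The grid and sample points are illustrative choices.

```python
import numpy as np

# Locate the peak of the sigmoid gradient and show how fast it falls off.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

grid = np.linspace(-10, 10, 2001)
peak_at = grid[np.argmax(sigmoid_grad(grid))]
print("peak at x =", peak_at, "value =", sigmoid_grad(peak_at))  # the peak is 0.25, at x = 0

for v in [0.0, 2.0, 5.0, 10.0]:
    print(f"sigmoid_grad({v:4.1f}) = {sigmoid_grad(v):.6f}")
```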

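Implications

This behavior causes the vanishing gradient problem, which hinders learning in deep networks. We will explore this problem further in the next article.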
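A rough way to see why depth makes this worse: by the chain rule, the gradient reaching an early layer includes one activation derivative per layer, multiplied together. The sketch below bounds that product using the sigmoid derivative's maximum of 0.25. It deliberately ignores the weight terms in the chain rule, which can also scale the gradient, so it is an illustration rather than a full analysis.

```python
# Upper bound on the product of sigmoid derivatives across `depth` layers,
# ignoring weight terms. Each factor is at most 0.25.

max_sigmoid_grad = 0.25

for depth in [1, 5, 10, 20]:
    upper_bound = max_sigmoid_grad ** depth
    print(f"{depth:2d} layers: product of sigmoid derivatives <= {upper_bound:.2e}")
```

Even in the best case, where every unit sits at the peak, the factor shrinks geometrically with depth.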

Try It Yourself

You can experiment with the code snippets above in a Colab notebook.
