Cross Entropy Derivatives, Part 3: Chain Rule for a Single Output Class
Source: Dev.to

In the [previous article](https://dev.to/rijultp/cross-entropy-derivatives-part-2-setting-up-the-derivative-with-respect-to-a-bias-32gh) we prepared a chain‑rule equation to compute the derivative of cross‑entropy with respect to bias **b₃**.
We will solve that equation step‑by‑step in this article.

---
## 1️⃣ Derivative of the cross‑entropy with respect to the predicted probability for *Setosa*

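For a single sample, the cross‑entropy is \(\text{CE} = -\sum_{c} y_{c}\log \hat{y}_{c}\), so its derivative with respect to the predicted probability of any one class \(c\) is
\[
\frac{\partial \, \text{CE}}{\partial \hat{y}_{c}} = -\frac{y_{c}}{\hat{y}_{c}}
\]
The observed class here is *Setosa*, so \(y_{\text{Setosa}} = 1\) and the formula becomes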
\[
\frac{\partial \, \text{CE}}{\partial \hat{y}_{\text{Setosa}}}
= -\frac{1}{\hat{y}_{\text{Setosa}}}
\]

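As a sanity check, the derivative above can be compared against a finite‑difference estimate. This is only a minimal sketch: the probability `0.57` is a made‑up value, and the function names are illustrative rather than taken from the article.

```python
import math


def cross_entropy(y_hat_observed: float) -> float:
    """Cross-entropy for one sample: -log of the probability
    predicted for the observed class."""
    return -math.log(y_hat_observed)


def d_ce_d_y_hat(y_hat_observed: float) -> float:
    """Analytic derivative from the text: -1 / y_hat."""
    return -1.0 / y_hat_observed


y_hat_setosa = 0.57  # hypothetical predicted probability for Setosa
eps = 1e-6           # step size for the central difference

analytic = d_ce_d_y_hat(y_hat_setosa)
numeric = (cross_entropy(y_hat_setosa + eps)
           - cross_entropy(y_hat_setosa - eps)) / (2 * eps)

# The two values should agree to several decimal places.
print(analytic, numeric)
```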
---
## 2️⃣ Derivative of the predicted probability with respect to the raw output for *Setosa*
First, write the softmax equation for the predicted probability of *Setosa*:
\[
\hat{y}_{\text{Setosa}} =
\frac{e^{z_{\text{Setosa}}}}
{e^{z_{\text{Setosa}}}+e^{z_{\text{Versicolor}}}+e^{z_{\text{Virginica}}}}
\]
Differentiating with respect to the raw output \(z_{\text{Setosa}}\) using the quotient rule, and then recognizing \(\hat{y}_{\text{Setosa}}\) in the result, yields
\[
\frac{\partial \hat{y}_{\text{Setosa}}}{\partial z_{\text{Setosa}}}
= \hat{y}_{\text{Setosa}}\bigl(1-\hat{y}_{\text{Setosa}}\bigr)
\]

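The softmax derivative can be checked the same way. In this sketch the three raw outputs are hypothetical numbers, ordered Setosa, Versicolor, Virginica, and subtracting `max(z)` before exponentiating is a common stability trick that the article itself does not use; it leaves the probabilities unchanged.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Softmax over raw outputs, shifted by max(z) for numerical stability."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs: [Setosa, Versicolor, Virginica]
z = [1.43, -0.40, 0.23]

p_setosa = softmax(z)[0]
analytic = p_setosa * (1.0 - p_setosa)  # formula from the text

# Central difference: nudge only the Setosa raw output.
eps = 1e-6
z_plus = [z[0] + eps, z[1], z[2]]
z_minus = [z[0] - eps, z[1], z[2]]
numeric = (softmax(z_plus)[0] - softmax(z_minus)[0]) / (2 * eps)

print(analytic, numeric)
```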
---
## 3️⃣ Derivative of the raw output for *Setosa* with respect to the bias **b₃**
The raw output for *Setosa* can be expressed as
\[
z_{\text{Setosa}} = w_{3}^{\top}x + b_{3}
\]
Hence
\[
\frac{\partial z_{\text{Setosa}}}{\partial b_{3}} = 1
\]
Term by term, only the bias contributes:
- The blue bent surface (other weights) is independent of \(b_{3}\) → derivative = 0.
- The orange bent surface (other biases) is also independent of \(b_{3}\) → derivative = 0.
- The bias term itself varies linearly with \(b_{3}\) → derivative = 1.

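The same point can be made numerically. In this sketch, `other_terms` is a hypothetical stand‑in for everything in the raw output that does not involve \(b_{3}\); its value is arbitrary and does not affect the result.

```python
def raw_output_setosa(other_terms: float, b3: float) -> float:
    """Raw output = (terms that do not involve b3) + b3."""
    return other_terms + b3


other_terms = 0.9  # hypothetical sum of the terms without b3
b3 = 0.0           # hypothetical current value of the bias
eps = 1e-6

# Central difference with respect to b3 only.
numeric = (raw_output_setosa(other_terms, b3 + eps)
           - raw_output_setosa(other_terms, b3 - eps)) / (2 * eps)

# Should be 1 up to floating-point rounding.
print(numeric)
```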
---
## 4️⃣ Putting it all together – Chain rule
\[
\frac{\partial \,\text{CE}}{\partial b_{3}}
= \frac{\partial \,\text{CE}}{\partial \hat{y}_{\text{Setosa}}}
\times
\frac{\partial \hat{y}_{\text{Setosa}}}{\partial z_{\text{Setosa}}}
\times
\frac{\partial z_{\text{Setosa}}}{\partial b_{3}}
\]
Substituting the three pieces derived above:
\[
\frac{\partial \,\text{CE}}{\partial b_{3}}
= \Bigl(-\frac{1}{\hat{y}_{\text{Setosa}}}\Bigr)
\times
\bigl(\hat{y}_{\text{Setosa}}\bigl(1-\hat{y}_{\text{Setosa}}\bigr)\bigr)
\times
1
= -\bigl(1-\hat{y}_{\text{Setosa}}\bigr)
= \hat{y}_{\text{Setosa}} - 1
\]
Because the observed class is *Setosa* (i.e., the true label \(y_{\text{Setosa}} = 1\)), the final result is
\[
\boxed{\displaystyle \frac{\partial \,\text{CE}}{\partial b_{3}} = \hat{y}_{\text{Setosa}} - 1}
\]
This is the gradient of the cross‑entropy loss with respect to the bias term \(b_{3}\) for the *Setosa* class.

---
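To see the three pieces working together, the sketch below treats the loss as a function of \(b_{3}\) alone and compares a finite‑difference gradient with \(\hat{y}_{\text{Setosa}} - 1\). All numbers are hypothetical, the observed class is assumed to be Setosa, and `setosa_without_bias` is an illustrative name for the part of the Setosa raw output that does not involve \(b_{3}\).

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Softmax over raw outputs, shifted by max(z) for numerical stability."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_setosa(b3: float, setosa_without_bias: float,
                     z_versicolor: float, z_virginica: float) -> float:
    """Predicted probability for Setosa; b3 enters only the Setosa raw output."""
    z_setosa = setosa_without_bias + b3
    return softmax([z_setosa, z_versicolor, z_virginica])[0]


def loss(b3: float, setosa_without_bias: float,
         z_versicolor: float, z_virginica: float) -> float:
    """Cross-entropy when the observed class is Setosa."""
    return -math.log(predicted_setosa(b3, setosa_without_bias,
                                      z_versicolor, z_virginica))


# Hypothetical values
b3 = 0.0
rest = (1.43, -0.40, 0.23)  # setosa_without_bias, z_versicolor, z_virginica

analytic = predicted_setosa(b3, *rest) - 1.0  # result derived above

eps = 1e-6
numeric = (loss(b3 + eps, *rest) - loss(b3 - eps, *rest)) / (2 * eps)

print(analytic, numeric)
```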

So, when the predicted probability for *Setosa* is used to compute the cross‑entropy, the derivative of the cross‑entropy with respect to \(b_{3}\) is \(\hat{y}_{\text{Setosa}} - 1\).

In the next article, we will continue by applying the same process for Virginica.
