Cross Entropy Derivatives, Part 5: Optimizing bias with backpropagation

In the previous article we calculated the derivatives of the cross‑entropy loss.
In this article we begin optimizing the bias term \(b_3\) using back‑propagation.
Setting the initial bias
We start by fixing the bias \(b_3\) to an initial value:

\[ b_3 = -2 \]

To verify that back‑propagation actually improves the model, we first compute the total cross‑entropy over the training data for this value of \(b_3\).
Bias values used

The other biases are held fixed:

\[ b_1 = 1.6, \quad b_2 = 0.7, \quad b_4 = 0, \quad b_5 = 1 \]

while we keep

\[ b_3 = -2 \]
Forward‑pass computation
For a single input example we compute the intermediate values as follows.
Upper node

The upper hidden node combines the two inputs with fixed weights, adds its bias \(b_1\), and applies ReLU, where \(\mathrm{ReLU}(x) = \max(0, x)\):

\[ \text{upper} = \mathrm{ReLU}(-2.5 \cdot \text{petal} + 0.6 \cdot \text{sepal} + b_1) \]

Bottom node

The bottom hidden node works the same way with its own weights and bias \(b_2\):

\[ \text{bottom} = \mathrm{ReLU}(-1.5 \cdot \text{petal} + 0.4 \cdot \text{sepal} + b_2) \]

Raw output values

Each output node takes a weighted sum of the two hidden activations and adds its own bias:

\[ \text{raw}_{\text{setosa}} = -0.1 \cdot \text{upper} + 1.5 \cdot \text{bottom} + b_3 \]
\[ \text{raw}_{\text{versicolor}} = 2.4 \cdot \text{upper} - 5.2 \cdot \text{bottom} + b_4 \]
\[ \text{raw}_{\text{virginica}} = -2.2 \cdot \text{upper} + 3.7 \cdot \text{bottom} + b_5 \]

Softmax probabilities

The raw output values are turned into probabilities with the softmax function:

\[ p_i = \frac{e^{\text{raw}_i}}{\sum_j e^{\text{raw}_j}} \]
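To make this concrete, here is a minimal sketch that pushes the first training sample through these formulas; the only values it assumes are the weights and biases already listed in this article. It should land on the \(p = 0.15\) that appears in the table below.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

petal, sepal = 0.04, 0.42                  # first training sample (a Setosa)
b1, b2, b3, b4, b5 = 1.6, 0.7, -2, 0, 1    # biases, with b3 fixed at -2

# Hidden nodes: weighted sum, bias, then ReLU
upper = relu(petal * -2.5 + sepal * 0.6 + b1)    # 1.752
bottom = relu(petal * -1.5 + sepal * 0.4 + b2)   # 0.808

# Raw output values
raws = np.array([
    upper * -0.1 + bottom * 1.5 + b3,   # Setosa
    upper * 2.4 + bottom * -5.2 + b4,   # Versicolor
    upper * -2.2 + bottom * 3.7 + b5,   # Virginica
])

# Softmax probabilities
probs = np.exp(raws) / np.sum(np.exp(raws))
print(round(probs[0], 2))  # 0.15 -- probability of the true species, Setosa
```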
Cross‑entropy loss per sample

The loss for each sample is the negative log of the probability \(p\) that the network assigns to the true species:

\[ \text{CE} = -\log(p) \]

| Petal | Sepal | Species | \(p\) (true species) | Cross Entropy |
|---|---|---|---|---|
| 0.04 | 0.42 | Setosa | 0.15 | 1.89 |
| 1.00 | 0.54 | Virginica | 0.71 | 0.35 |
| 0.50 | 0.37 | Versicolor | 0.65 | 0.43 |
The total cross‑entropy when \(b_3 = -2\) is

\[ \text{Total CE} = 1.89 + 0.35 + 0.43 = 2.67 \]
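As a quick sanity check, these numbers can be reproduced in a couple of lines. The probabilities below are the rounded values from the table, so the per‑sample losses match it up to rounding:

```python
import numpy as np

# Softmax probability of the true species for each sample (from the table)
probs = [0.15, 0.71, 0.65]
losses = -np.log(probs)

print(np.round(losses, 2))     # [1.9  0.34 0.43] -- matches the table up to rounding
print(round(losses.sum(), 2))  # 2.67
```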
Visualising the loss curve
We can visualise how the total cross‑entropy changes with different values of \(b_3\) by plotting \(b_3\) on the x‑axis and the total cross‑entropy on the y‑axis. Evaluating many values of \(b_3\) yields a smooth pink curve with a clear minimum.
Python code to generate the plot

```python
import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    return np.maximum(0, x)

def softmax(raws):
    exp_vals = np.exp(raws)
    return exp_vals / np.sum(exp_vals)

# Fixed biases (b3 is the one we vary)
b1 = 1.6
b2 = 0.7
b4 = 0
b5 = 1

# Sample training data: (petal, sepal, true_class)
data = [
    (0.04, 0.42, 0),  # Setosa
    (1.00, 0.54, 2),  # Virginica
    (0.50, 0.37, 1),  # Versicolor
]

def total_cross_entropy(b3):
    """Sum the cross-entropy over all training samples for a given b3."""
    total_ce = 0.0
    for petal, sepal, target in data:
        # Hidden nodes
        upper = petal * -2.5 + sepal * 0.6 + b1
        bottom = petal * -1.5 + sepal * 0.4 + b2
        # Raw output values
        raw_setosa = relu(upper) * -0.1 + relu(bottom) * 1.5 + b3
        raw_versi = relu(upper) * 2.4 + relu(bottom) * -5.2 + b4
        raw_virg = relu(upper) * -2.2 + relu(bottom) * 3.7 + b5
        # Softmax probabilities and per-sample loss
        probs = softmax([raw_setosa, raw_versi, raw_virg])
        total_ce += -np.log(probs[target])
    return total_ce

b3_values = np.linspace(-6, 4, 200)
losses = [total_cross_entropy(b3) for b3 in b3_values]

plt.plot(b3_values, losses, color="pink")
plt.xlabel("b₃")
plt.ylabel("Total Cross Entropy")
plt.title("Cross Entropy vs b₃")
plt.show()
```

The plot shows the pink curve; its lowest point corresponds to the value of \(b_3\) that minimises the total cross‑entropy.
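Rather than eyeballing the chart, we can read the minimising value straight from the arrays the script already computed. Note this is only an approximation whose accuracy is limited by the grid spacing of `np.linspace`:

```python
# Continues from the script above: pick the grid point with the lowest loss
best_b3 = b3_values[np.argmin(losses)]
print(f"b3 with the lowest total cross entropy: {best_b3:.2f}")
```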
What’s next?
In the next article we will use back‑propagation to move \(b_3\) toward this minimum, updating it step by step.
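As a preview, the update will take the standard gradient‑descent form, where a learning rate \(\alpha\) scales the derivative we worked out in the earlier parts of this series:

\[ b_3 \leftarrow b_3 - \alpha \cdot \frac{\partial \, \text{Total CE}}{\partial b_3} \]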
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia – a community‑driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:
```bash
ipm install repo-name
```
… and you’re done! 🚀
