Cross Entropy Derivatives, Part 5: Optimizing bias with backpropagation

Published: February 6, 2026 at 03:01 PM EST
3 min read
Source: Dev.to


In the previous article we calculated the derivatives of the cross‑entropy loss.
In this article we begin optimizing the bias term (b_3) using back‑propagation.

Setting the initial bias

We start by fixing the bias (b_3) to an initial value:

[ b_3 = -2 ]

To verify that back‑propagation actually improves the model, we first compute the total cross‑entropy over the training data for this value of (b_3).
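As a reminder, this total is the sum over all training samples of the negative log of the probability the network assigns to the true class:

[ \text{Total CE} = -\sum_{i=1}^{N} \log p_i(\text{true class}_i) ]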

Bias values used

The other biases are held fixed at (b_1 = 1.6), (b_2 = 0.7), (b_4 = 0) and (b_5 = 1), while (b_3 = -2).

Forward‑pass computation

For a single input example we compute the intermediate values as follows.

Upper node

Upper node computation

Bottom node

Bottom node computation

Raw output values

Raw outputs

Softmax probabilities

Softmax probabilities
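The forward pass sketched above can be reproduced in Python for the first training sample (petal = 0.04, sepal = 0.42, true class Setosa), using the weights and biases from the full script later in this article:

```python
import numpy as np

# First training sample: petal = 0.04, sepal = 0.42, true class Setosa
petal, sepal = 0.04, 0.42

# Hidden nodes with ReLU activation (weights and biases from this series)
upper  = max(0.0, petal * -2.5 + sepal * 0.6 + 1.6)
bottom = max(0.0, petal * -1.5 + sepal * 0.4 + 0.7)

# Raw output values, with b3 = -2, b4 = 0, b5 = 1
raws = np.array([
    upper * -0.1 + bottom * 1.5 - 2,    # Setosa
    upper * 2.4  + bottom * -5.2 + 0,   # Versicolor
    upper * -2.2 + bottom * 3.7  + 1,   # Virginica
])

# Softmax probabilities and cross entropy for the true class
probs = np.exp(raws) / np.sum(np.exp(raws))
ce = -np.log(probs[0])

print(round(float(probs[0]), 2))  # Setosa probability ≈ 0.15
print(round(float(ce), 2))        # cross entropy ≈ 1.89
```

These match the Setosa row of the table below.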

Cross‑entropy loss per sample

| Petal | Sepal | Species    | p(true class) | Cross Entropy |
|------:|------:|------------|--------------:|--------------:|
| 0.04  | 0.42  | Setosa     | 0.15          | 1.89          |
| 1.00  | 0.54  | Virginica  | 0.71          | 0.35          |
| 0.50  | 0.37  | Versicolor | 0.65          | 0.43          |

The total cross‑entropy when (b_3 = -2) is

[ \text{Total CE} = 2.67 ]
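As a quick sanity check, summing the per-sample losses from the table reproduces this total:

```python
import numpy as np

# Probabilities assigned to the true class, one per training sample (from the table)
p_true = [0.15, 0.71, 0.65]

# Cross entropy per sample, then the total over the training data
per_sample_ce = [-np.log(p) for p in p_true]
total_ce = sum(per_sample_ce)

print(round(total_ce, 2))  # 2.67
```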

Visualising the loss curve

We can visualise how the total cross‑entropy changes with different values of (b_3) by plotting (b_3) on the x‑axis and the total cross‑entropy on the y‑axis. Evaluating many values of (b_3) yields a smooth pink curve with a clear minimum.

Python code to generate the plot

import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    return np.maximum(0, x)

def softmax(raws):
    exp_vals = np.exp(raws)
    return exp_vals / np.sum(exp_vals)

# Biases fixed in earlier parts of the series (b3 is the one we vary)
b1 = 1.6
b2 = 0.7
b4 = 0
b5 = 1

# Sample training data: (petal, sepal, true_class)
data = [
    (0.04, 0.42, 0),  # Setosa
    (1.00, 0.54, 2),  # Virginica
    (0.50, 0.37, 1),  # Versicolor
]

def total_cross_entropy(b3):
    total_ce = 0.0
    for petal, sepal, target in data:
        upper  = petal * -2.5 + sepal * 0.6 + b1
        bottom = petal * -1.5 + sepal * 0.4 + b2

        raw_setosa = relu(upper) * -0.1 + relu(bottom) * 1.5 + b3
        raw_versi  = relu(upper) * 2.4  + relu(bottom) * -5.2 + b4
        raw_virg   = relu(upper) * -2.2 + relu(bottom) * 3.7 + b5

        probs = softmax([raw_setosa, raw_versi, raw_virg])
        total_ce += -np.log(probs[target])
    return total_ce

b3_values = np.linspace(-6, 4, 200)
losses = [total_cross_entropy(b3) for b3 in b3_values]

plt.plot(b3_values, losses, color="pink")
plt.xlabel("b₃")
plt.ylabel("Total Cross Entropy")
plt.title("Cross Entropy vs b₃")
plt.show()

Loss curve (pink)

The plot shows the pink curve; the lowest point corresponds to the value of (b_3) that minimises the total cross‑entropy.
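To read that lowest point off the curve numerically, we can reuse the same loss function and take the argmin over the sampled grid. This is only a sketch: the result is the best grid point, and its precision depends on the grid resolution.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(raws):
    exp_vals = np.exp(raws)
    return exp_vals / np.sum(exp_vals)

# Fixed biases and training data, as in the plotting script above
b1, b2, b4, b5 = 1.6, 0.7, 0, 1
data = [(0.04, 0.42, 0), (1.00, 0.54, 2), (0.50, 0.37, 1)]

def total_cross_entropy(b3):
    total_ce = 0.0
    for petal, sepal, target in data:
        upper  = petal * -2.5 + sepal * 0.6 + b1
        bottom = petal * -1.5 + sepal * 0.4 + b2
        raws = [
            relu(upper) * -0.1 + relu(bottom) * 1.5 + b3,
            relu(upper) * 2.4  + relu(bottom) * -5.2 + b4,
            relu(upper) * -2.2 + relu(bottom) * 3.7  + b5,
        ]
        total_ce += -np.log(softmax(raws)[target])
    return total_ce

# Evaluate the loss on a grid of b3 values and pick the lowest point
b3_values = np.linspace(-6, 4, 201)
losses = np.array([total_cross_entropy(b3) for b3 in b3_values])
best_b3 = b3_values[np.argmin(losses)]

print(best_b3, losses.min())
```

The grid minimum sits below the loss of 2.67 we computed at (b_3 = -2), which is what back-propagation will exploit in the next part.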

What’s next?

In the next part of the series we will use back-propagation to move (b_3) toward this minimum, updating it step by step.


Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia – a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
