Backprop Finally Made Sense When I Reimplemented It in Rust
Source: Dev.to
Introduction
I never used PyTorch or TensorFlow. My ML background was NumPy and scikit‑learn: I could train models, tune parameters, and get reasonable results, but when it came to explaining why things worked, my understanding was shaky. Backpropagation especially felt like a black box. I knew the steps at a high level, but I didn’t feel them.
So I stopped using ML libraries entirely and rebuilt the core of a neural network from scratch in Rust. That’s when backprop finally made sense.
Why the abstractions hide the learning
The problem wasn’t NumPy or scikit‑learn—they do exactly what they promise. The problem was that they abstract away everything that actually matters for understanding. Once I removed the abstractions (no autograd, just flat buffers, explicit indexing, and hand‑written matrix operations), the mystery disappeared.
Memory layout example
let data = [1, 2, 3, 4, 5, 6];
let shape = (2, 3); // (rows, cols)
// Logical view
// [ 1 2 3 ]
// [ 4 5 6 ]
// Memory view (row‑major)
// [1][2][3][4][5][6]
// 0 1 2 3 4 5
In Rust you can’t “kind of” do a transpose—you have to explain exactly how indices move in memory:
let index = row * cols + col;
That constraint changed everything. You can’t wave at gradients; you have to compute and store them explicitly.
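That one index formula is enough to write a transpose by hand. A minimal sketch (the function name and signature are mine, not the walkthrough's):

```rust
// Transpose a row-major buffer by computing, for every element,
// exactly where it lands in the new (cols, rows) layout.
fn transpose(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    let mut out = vec![0.0; data.len()];
    for row in 0..rows {
        for col in 0..cols {
            // Source index in the (rows, cols) layout...
            let src = row * cols + col;
            // ...maps to this flat index in the (cols, rows) layout.
            let dst = col * rows + row;
            out[dst] = data[src];
        }
    }
    out
}
```

Running it on the 2×3 buffer above yields `[1, 4, 2, 5, 3, 6]`: the same six numbers, reordered so that the 3×2 logical view reads correctly.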
What backprop really is
Backprop stopped being mysterious when I had to implement it myself—not symbolically, but as concrete bookkeeping. The process boils down to three repeated actions:
- Applying the chain rule
- Reusing intermediate values from the forward pass
- Pushing gradients backward through matrix operations
Forward and backward passes
Forward pass:
X → [ Linear ] → [ Activation ] → ŷ → Loss
Backward pass:
∂Loss → [ dActivation ] → [ dLinear ] → ∂W, ∂X
When you write this by hand, a few things become painfully clear:
- Gradients don’t “flow” — they are accumulated.
- Shape alignment is the real constraint, not calculus.
- Most bugs stem from incorrect assumptions about dimensions, not from the math itself.
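To make those three points concrete, here is a sketch of the backward pass through a single linear layer y = W·x for one example, with flat buffers and explicit shapes. The names and shapes are illustrative, not the article's actual code:

```rust
// Backward through y = W·x.
// W is (out_dim, in_dim) row-major, x is (in_dim,), dy is (out_dim,).
fn linear_backward(
    w: &[f64], x: &[f64], dy: &[f64],
    out_dim: usize, in_dim: usize,
) -> (Vec<f64>, Vec<f64>) {
    let mut dw = vec![0.0; out_dim * in_dim]; // same shape as W
    let mut dx = vec![0.0; in_dim];           // same shape as x
    for o in 0..out_dim {
        for i in 0..in_dim {
            // Chain rule: dLoss/dW[o][i] = dy[o] * x[i] (outer product),
            // which reuses x, an intermediate value from the forward pass.
            dw[o * in_dim + i] += dy[o] * x[i];
            // dLoss/dx[i] accumulates a contribution from every output.
            dx[i] += dy[o] * w[o * in_dim + i];
        }
    }
    (dw, dx)
}
```

Note the `+=`: every gradient is summed into a pre-shaped buffer, and the only things that can go wrong are the index arithmetic and the shapes.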
Simple computational graph
     ┌── w1 ──┐
X ──►┤        ├──► (+) ──► Loss
     └── w2 ──┘
Backward:
∂Loss/∂X = ∂Loss/∂path₁ + ∂Loss/∂path₂
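With hypothetical numbers, the accumulation rule is a two-line loop body. If Loss = w1·x + w2·x, the gradient reaching x is summed from both paths, never overwritten:

```rust
// dLoss/dx for Loss = w1*x + w2*x: each path contributes its own term,
// and the contributions are accumulated with +=.
fn grad_wrt_x(w1: f64, w2: f64) -> f64 {
    let mut dx = 0.0;
    dx += w1; // contribution of path 1
    dx += w2; // contribution of path 2
    dx
}
```

Overwriting instead of accumulating would silently keep only the last path's gradient, which is exactly the kind of bug that stays invisible behind an autograd library.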
Backprop felt hard before because I never saw where the numbers actually lived.
Rust’s role in the learning process
Rust isn’t important because it’s fast here; it’s important because it’s unforgiving. It forces you to confront:
- How tensors are laid out in memory
- When data is copied vs. reused
- Which operations allocate new buffers
- Which gradients depend on which forward values
I avoided third‑party crates on purpose and used only the standard library. The goal wasn’t elegance or performance—it was transparency. If something worked, I wanted to be able to explain why it worked at the level of indices and buffers.
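As one small illustration of the copy-vs-reuse point (my own example, not from the guide): a borrowed slice reuses the caller's buffer, while collecting into a `Vec` allocates a fresh one, and the function signatures make the difference visible.

```rust
// Borrows the buffer: no allocation, no copy.
fn sum(buf: &[f64]) -> f64 {
    buf.iter().sum()
}

// Allocates and returns a new buffer; the input is untouched.
fn scaled(buf: &[f64], k: f64) -> Vec<f64> {
    buf.iter().map(|v| v * k).collect()
}
```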
Step‑by‑step implementation
- A tensor type backed by a flat buffer
- Element‑wise operations
- Transpose, reduction, and matrix multiplication
- Linear regression
- Backpropagation and gradient updates
- A small neural network trained end‑to‑end
Nothing is optimized. Everything is explicit. This is not a framework.
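A minimal sketch of what the first two steps might look like, assuming a 2‑D row‑major layout (the guide's actual type may differ):

```rust
// A tensor backed by a flat buffer plus an explicit shape.
struct Tensor {
    data: Vec<f64>,
    rows: usize,
    cols: usize,
}

impl Tensor {
    fn zeros(rows: usize, cols: usize) -> Self {
        Tensor { data: vec![0.0; rows * cols], rows, cols }
    }

    fn get(&self, row: usize, col: usize) -> f64 {
        self.data[row * self.cols + col]
    }

    // Matrix multiplication written out index by index: nothing hidden.
    fn matmul(&self, other: &Tensor) -> Tensor {
        assert_eq!(self.cols, other.rows, "shape mismatch");
        let mut out = Tensor::zeros(self.rows, other.cols);
        for i in 0..self.rows {
            for j in 0..other.cols {
                let mut sum = 0.0;
                for k in 0..self.cols {
                    sum += self.get(i, k) * other.get(k, j);
                }
                out.data[i * out.cols + j] = sum;
            }
        }
        out
    }
}
```

Everything later in the list (linear regression, backprop, the network) is built by composing operations of exactly this shape.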
Who should try this
- Software developers who want to understand neural networks beyond high‑level APIs
- Readers learning Rust who want a demanding, systems‑oriented project
If backprop still feels like something you “accept” rather than understand, rebuilding it once is worth the time.
Further reading
I documented the entire process as a chapter‑style guide, starting from tensors in memory and ending with a working neural network. You can read the full walkthrough here: