[Paper] Training a Predictive Coding Network on ImageNet using Equilibrium Propagation

Published: June 2, 2026 at 08:52 AM EDT

Source: arXiv - 2606.03584v1

Overview

The paper presents the first successful training of a predictive‑coding network (PCN) on the full ImageNet dataset using Equilibrium Propagation (EP), a physics‑inspired alternative to back‑propagation. By combining a centered EP formulation with a new equilibration routine, the authors train a 10‑layer convolutional PCN (VGG10) to a 13.23 % top‑5 error, about one percentage point above the 12.2 % reached by standard back‑propagation on the same architecture. This shows that EP‑based learning can scale to the problem sizes that dominate modern deep‑learning practice.

Key Contributions

EP‑compatible training pipeline for PCNs – introduces a centered EP variant together with a novel equilibration scheme tailored to predictive‑coding dynamics.
Large‑scale demonstration – trains a 10‑layer convolutional PCN on full‑size ImageNet (≈1.3 M images, 224 × 224 pixels).
Competitive performance – achieves 13.23 % top‑5 error, about one percentage point above the 12.2 % of a strong back‑propagation baseline on the same architecture.
Scalability insight – argues that EP’s bottleneck is computational (e.g., equilibrium convergence) rather than a fundamental limitation of the learning rule.
Open‑source reference implementation – provides code and hyper‑parameter details that other researchers and engineers can reuse.

Methodology

Predictive‑Coding Network (PCN) – a hierarchical model where each layer predicts the activity of the layer below and receives an error signal (the prediction error) that drives local updates. The network's dynamics perform gradient descent on an energy function (typically the sum of squared prediction errors), which makes it an energy‑based model.
Equilibrium Propagation (EP) – instead of explicitly computing gradients via back‑prop, EP runs the network twice:
- Free phase: the system settles to an equilibrium under the current parameters and input.
- Nudged phase: a small “nudging” term pushes the output toward the target label, and the system relaxes again.
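  The difference between the energy's parameter gradients at the two equilibria, divided by the nudging strength β, estimates the gradient of the loss; the estimate is exact only in the limit β → 0 and is biased for any finite β.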
Centered EP + Equilibration Scheme – the authors adopt a centered version of EP that reduces bias and variance, and they design a layer‑wise equilibration schedule that speeds up convergence for deep convolutional PCNs. This schedule adaptively controls the number of internal iterations per layer, preventing the costly “run‑to‑steady‑state” loops that have hampered previous EP attempts.
Architecture & Training Details – a VGG‑style 10‑layer convolutional PCN (VGG10) is used, matching the depth and filter layout of a classic VGG network. Training follows standard ImageNet practices (data augmentation, learning‑rate schedule, batch size 256) but replaces the usual back‑prop update with the EP‑derived gradient estimate.
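The ingredients above (PCN energy, free and nudged phases, centered nudging, stopping the relaxation early, and an SGD update driven by the EP estimate) can be tied together in a minimal sketch on a tiny fully connected PCN, not the paper's convolutional VGG10. Several assumptions are ours: the energy is the sum of squared prediction errors with tanh activations; each layer's state is predicted from the layer on its input side (a common convention for discriminative PCNs, which may differ from the paper's parameterization); "centered EP" is taken to mean the symmetric estimator that nudges with +β and −β, as in earlier EP work; and the paper's layer‑wise equilibration schedule is replaced by a simple "stop when updates are tiny" rule. All function names and sizes are hypothetical.

```python
import math
import random

# Hypothetical toy setup: NOT the paper's VGG10 or its equilibration schedule.

def f(v):
    """tanh activation, elementwise."""
    return [math.tanh(a) for a in v]

def df(v):
    """Derivative of tanh, elementwise."""
    return [1.0 - math.tanh(a) ** 2 for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def errors(xs, Ws):
    """Prediction errors eps_l = x_l - W_{l-1} f(x_{l-1}) for l = 1..L (index 0 unused)."""
    return [None] + [[a - m for a, m in zip(xs[l], matvec(Ws[l - 1], f(xs[l - 1])))]
                     for l in range(1, len(xs))]

def feedforward(x0, Ws):
    """Free-phase equilibrium of this toy PCN: the state where every prediction error is zero."""
    xs = [list(x0)]
    for W in Ws:
        xs.append(matvec(W, f(xs[-1])))
    return xs

def relax(xs, Ws, y, beta, lr=0.1, tol=1e-12, max_iters=20_000):
    """Gradient descent on F = E + beta*C over x_1..x_L (input x_0 stays clamped).

    E = 0.5 * sum_l ||eps_l||^2 and C = 0.5 * ||x_L - y||^2. Stops once the largest
    state update falls below tol (an assumed stand-in for an equilibration schedule).
    """
    xs = [list(x) for x in xs]
    L = len(xs) - 1
    for _ in range(max_iters):
        eps = errors(xs, Ws)
        largest = 0.0
        for l in range(1, L + 1):
            grad = list(eps[l])
            if l < L:  # x_l also drives the prediction of x_{l+1}
                back = [sum(Ws[l][i][j] * eps[l + 1][i] for i in range(len(eps[l + 1])))
                        for j in range(len(xs[l]))]
                grad = [g - d * b for g, d, b in zip(grad, df(xs[l]), back)]
            else:  # output layer: the nudging term pulls x_L toward the target y
                grad = [g + beta * (a - t) for g, a, t in zip(grad, xs[l], y)]
            xs[l] = [a - lr * g for a, g in zip(xs[l], grad)]
            largest = max(largest, lr * max(abs(g) for g in grad))
        if largest < tol:
            break
    return xs

def dE_dW(xs, Ws):
    """dE/dW_{l-1}[i][j] = -eps_l[i] * f(x_{l-1})[j]: only locally available quantities."""
    eps = errors(xs, Ws)
    return [[[-eps[l][i] * fx for fx in f(xs[l - 1])] for i in range(len(eps[l]))]
            for l in range(1, len(xs))]

def centered_ep_grad(x0, y, Ws, beta=0.05):
    """Centered EP estimate: nudge with +beta and -beta, take the symmetric difference."""
    free = feedforward(x0, Ws)  # warm start both nudged phases from the free equilibrium
    g_pos = dE_dW(relax(free, Ws, y, +beta), Ws)
    g_neg = dE_dW(relax(free, Ws, y, -beta), Ws)
    return [[[(p - n) / (2 * beta) for p, n in zip(rp, rn)] for rp, rn in zip(mp, mn)]
            for mp, mn in zip(g_pos, g_neg)]

def loss(x0, y, Ws):
    out = feedforward(x0, Ws)[-1]
    return 0.5 * sum((a - t) ** 2 for a, t in zip(out, y))

rng = random.Random(0)
sizes = [3, 4, 2]  # hypothetical input, hidden, output widths
Ws = [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[l])] for _ in range(sizes[l + 1])]
      for l in range(len(sizes) - 1)]
x0, y = [0.5, -0.3, 0.8], [0.2, -0.4]

before = loss(x0, y, Ws)
for _ in range(20):  # plain SGD, with the EP estimate in place of a back-prop gradient
    g = centered_ep_grad(x0, y, Ws)
    Ws = [[[w - 0.5 * gw for w, gw in zip(rw, rg)] for rw, rg in zip(mw, mg)]
          for mw, mg in zip(Ws, g)]
print(before, "->", loss(x0, y, Ws))
```

Because this toy network's free equilibrium is simply the feedforward pass, the EP estimate can be checked against a finite‑difference gradient of the feedforward loss; with a small β the two agree closely, and shrinking β reduces the remaining bias.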

Results & Findings

Metric	EP‑trained PCN (VGG10)	Back‑prop baseline (same net)
Top‑5 error (ImageNet)	13.23 %	12.2 %
Training time (relative to back‑prop)	≈1.4× (extra equilibration steps)	1× (reference)
Memory footprint	Comparable (no need to store full backward graph)	—
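Because both figures are error rates, the "about 1 %" gap is an absolute difference in percentage points; measured relative to the baseline it is larger. Simple arithmetic on the reported numbers:

```latex
\Delta_{\text{abs}} = 13.23\,\% - 12.2\,\% = 1.03 \text{ percentage points}
\qquad
\Delta_{\text{rel}} = \frac{13.23 - 12.2}{12.2} \approx 0.084 \approx 8.4\,\%
```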

Accuracy: The EP‑trained model trails the back‑prop baseline by about one percentage point of top‑5 error (13.23 % vs 12.2 %), a small gap given the historical difficulty of scaling EP beyond toy problems.
Efficiency: While EP incurs extra forward‑only passes to reach equilibrium, it eliminates the need for a separate backward pass and large gradient buffers, which can be advantageous on hardware where memory is at a premium.
Stability: The centered EP formulation yields smoother loss curves with fewer spikes during training, consistent with the expected benefit of a less biased gradient estimate.

Practical Implications

Neuromorphic & Analog Hardware – EP’s reliance on local dynamics and energy minimization aligns naturally with analog circuits (e.g., resistive crossbars, memristor arrays). The demonstrated ImageNet‑scale performance suggests that future chips could train deep vision models without the digital back‑propagation pipeline.
Energy‑Efficient Training – By avoiding the explicit backward pass, EP can reduce the number of memory accesses, a major source of power consumption in GPUs/TPUs. This could translate into lower operational costs for edge devices that need on‑device learning.
Robustness to Gradient Issues – Since EP computes gradients through equilibrium differences rather than chain‑rule multiplication, it may be less susceptible to exploding/vanishing gradients, opening doors for more stable training of very deep or recurrent structures.
Alternative Research Paradigm – The work validates a physics‑based learning rule at a scale that matters to industry, encouraging researchers to explore hybrid models that blend biologically plausible learning with modern deep‑learning performance.

Limitations & Future Work

Computational Overhead – EP still requires multiple relaxation steps per minibatch, making training slower (≈1.4×) than conventional back‑prop on GPUs. Optimizing the equilibration schedule or leveraging specialized hardware will be crucial.
Scaling Beyond 10 Layers – The study stops at a 10‑layer VGG‑style net; deeper architectures (ResNets, Transformers) may pose new equilibrium challenges.
Task Diversity – Only image classification was evaluated. Extending EP‑trained PCNs to detection, segmentation, or language tasks remains an open question.
Theoretical Guarantees – While centered EP reduces bias, a rigorous analysis of convergence rates for large, non‑convex networks is still lacking.
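Why centering reduces bias can be sketched with a Taylor expansion, assuming "centered EP" denotes the symmetric estimator that nudges with +β and −β (as in earlier EP work; the paper's exact definition is not reproduced here). Write s_β for the equilibrium of E + βC and g(β) for ∂E/∂θ evaluated at s_β. EP's key identity is g′(0) = dL/dθ, where L(θ) = C(s_0). Expanding g around β = 0:

```latex
\hat{\nabla}_{\text{one-sided}} = \frac{g(\beta) - g(0)}{\beta}
  = g'(0) + \frac{\beta}{2}\, g''(0) + O(\beta^2),
\qquad
\hat{\nabla}_{\text{centered}} = \frac{g(\beta) - g(-\beta)}{2\beta}
  = g'(0) + \frac{\beta^2}{6}\, g'''(0) + O(\beta^4).
```

For a one‑state toy model with E = ½(s − θx)² and C = ½(s − y)², the exact gradient is (θx − y)x, the one‑sided estimate is (θx − y)x/(1 + β), and the centered estimate is (θx − y)x/(1 − β²). The bias drops from first to second order in β, but this local argument says nothing about convergence rates of training itself.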

Future research directions highlighted by the authors include:

Integrating EP with sparsity‑promoting priors to cut relaxation time.
Implementing the method on emerging neuromorphic platforms.
Exploring hybrid training regimes that combine a few EP steps with occasional back‑prop updates for faster convergence.

Authors

Tugdual Kerjan
Rasmus Høier
Benjamin Scellier

Paper Information

arXiv ID: 2606.03584v1
Categories: cs.LG, cond-mat.dis-nn, cs.NE
Published: June 2, 2026
