[Paper] Equilibrium Propagation for Non-Conservative Systems
Source: arXiv - 2602.03670v1
Overview
The paper “Equilibrium Propagation for Non‑Conservative Systems” expands a biologically‑inspired learning rule—Equilibrium Propagation (EP)—so it can be applied to any dynamical system, even those that don’t stem from a traditional energy function. By fixing a key shortcoming of earlier extensions, the authors deliver a method that computes the exact gradient of a loss while still using the system’s steady‑state behavior for both inference and learning.
Key Contributions
- Generalized EP framework that works for arbitrary non‑conservative dynamics, including standard feed‑forward neural nets with asymmetric weights.
- Exact gradient guarantee: the modified learning dynamics incorporate a corrective term proportional to the non‑reciprocal (antisymmetric) part of the interaction matrix, ensuring the true loss gradient is recovered.
- Variational formulation: the authors derive the learning rule from an energy‑based objective defined on an augmented state space, providing a clean theoretical underpinning.
- Empirical validation on MNIST showing faster convergence and higher accuracy than prior non‑conservative EP attempts.
- Algorithmic simplicity: the method retains EP’s hallmark of using only stationary states—no back‑propagation of error signals through time‑unrolled graphs is required.
Methodology
- Base dynamical system – The network state evolves according to the differential equation
  \[ \dot{s} = -\nabla_{s} E(s) + A s + I, \]
  where \(E(s)\) is a symmetric (conservative) energy term, \(A\) is an antisymmetric matrix capturing non-reciprocal couplings, and \(I\) encodes external inputs.
- Inference phase – With the loss term turned off, the system is allowed to settle to a fixed point \(s^{*}\), which serves as the network's prediction (exactly as in classic EP).
- Learning phase – The loss is nudged into the dynamics by adding a small perturbation \(\beta\,\partial C/\partial s\). To compensate for the antisymmetric part, the authors inject an extra term \(\beta\,A\,\partial C/\partial s\). The resulting dynamics are
  \[ \dot{s} = -\nabla_{s} E(s) + A s + I - \beta\Bigl(\frac{\partial C}{\partial s} + A\,\frac{\partial C}{\partial s}\Bigr). \]
  Running the system to a new steady state \(s^{\beta}\) and measuring the change in the symmetric part of the energy yields the exact gradient \(\partial C/\partial \theta\) for any parameter \(\theta\).
- Variational perspective – By stacking the original state with an auxiliary "dual" state, the authors construct an augmented energy function whose stationarity conditions reproduce the learning dynamics above, linking the approach to classic energy-based learning.
- Implementation – The algorithm requires only forward integration of the ODE (or its discrete analogue) twice per training example: once for inference, once for the nudged phase. No explicit back-propagation or Jacobian-vector products are needed.
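The two-phase procedure above can be sketched on a toy linear system. Everything concrete here is an illustrative assumption rather than the paper's exact setup: the quadratic energy \(E(s) = \tfrac{1}{2}s^{\top}S s\), the squared-error loss, the explicit-Euler integrator, and the outer-product form of the parameter gradient (the classic EP contrastive estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Symmetric (conservative) part: E(s) = 0.5 s^T S s with S positive definite.
M = rng.normal(size=(n, n))
S = M @ M.T / n + np.eye(n)
# Antisymmetric (non-reciprocal) part A, with A^T = -A.
B = rng.normal(size=(n, n))
A = (B - B.T) / 2
I_ext = rng.normal(size=n)    # external input
target = rng.normal(size=n)   # hypothetical supervised target

def grad_C(s):
    """Loss C(s) = 0.5 ||s - target||^2, so dC/ds = s - target."""
    return s - target

def settle(beta, dt=0.05, steps=4000):
    """Forward-integrate s_dot = -S s + A s + I_ext - beta (I + A) dC/ds
    with explicit Euler until the state is (approximately) stationary."""
    s = np.zeros(n)
    for _ in range(steps):
        nudge = beta * (np.eye(n) + A) @ grad_C(s)
        s = s + dt * (-S @ s + A @ s + I_ext - nudge)
    return s

beta = 1e-3
s_free = settle(beta=0.0)     # inference phase
s_nudged = settle(beta=beta)  # learning (nudged) phase

# EP-style parameter gradient: contrast dE/dS between the two equilibria
# (for E = 0.5 s^T S s, dE/dS_ij = 0.5 s_i s_j).
dC_dS = (np.outer(s_nudged, s_nudged) - np.outer(s_free, s_free)) / (2 * beta)
```

Note that both phases reuse the same forward dynamics; only the scalar \(\beta\) changes, which is what makes the scheme attractive for hardware that can only run its own physics forward.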
Results & Findings
| Experiment | Baseline (previous non‑conservative EP) | Proposed Method |
|---|---|---|
| MNIST classification (single‑layer network) | 96.2 % accuracy after 200 epochs | 97.8 % accuracy after 120 epochs |
| Convergence speed (measured by loss drop) | ~0.45 loss reduction per epoch | ~0.72 loss reduction per epoch |
| Gradient error (norm of difference to true gradient) | 0.12 (average) | < 0.01 (average) |
Takeaway: By correcting for the antisymmetric interactions, the new EP variant not only matches the theoretical gradient but also translates into noticeably faster learning and higher final performance on a standard benchmark.
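A gradient-error figure like the one in the table can be sanity-checked with a finite-difference probe of the equilibrium loss. The linear toy model below and the choice to differentiate with respect to the input \(I\) are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
S = 2.0 * np.eye(n)                 # symmetric energy term: E = 0.5 s^T S s
B = rng.normal(size=(n, n))
A = (B - B.T) / 2                   # antisymmetric couplings
I_ext = rng.normal(size=n)
target = rng.normal(size=n)

def fixed_point(S, A, I_ext):
    # For linear dynamics s_dot = -(S - A) s + I_ext the equilibrium is direct.
    return np.linalg.solve(S - A, I_ext)

def loss(S, A, I_ext):
    s = fixed_point(S, A, I_ext)
    return 0.5 * np.sum((s - target) ** 2)

# Central finite difference of the equilibrium loss w.r.t. I_ext[0].
eps = 1e-6
dI = np.zeros(n)
dI[0] = eps
num_grad = (loss(S, A, I_ext + dI) - loss(S, A, I_ext - dI)) / (2 * eps)

# Analytic reference via the chain rule: dC/dI_0 = (s* - target)^T (S - A)^{-1} e_0.
e0 = np.zeros(n)
e0[0] = 1.0
analytic = (fixed_point(S, A, I_ext) - target) @ np.linalg.solve(S - A, e0)
```

The same pattern (perturb a parameter, re-settle, difference the losses) works for any entry of \(S\) or \(A\), at the cost of one extra relaxation per probed parameter.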
Practical Implications
- Hardware‑friendly learning: Because EP relies on settling to equilibrium rather than explicit gradient back‑propagation, it maps naturally onto analog neuromorphic chips, memristor arrays, or other physics‑based substrates that can implement asymmetric couplings.
- Energy‑efficient training: The method eliminates the need for storing intermediate activations for reverse‑mode differentiation, potentially reducing memory bandwidth and power consumption in edge devices.
- Compatibility with existing architectures: Feed‑forward networks, recurrent nets, and even graph neural networks can be expressed in the non‑conservative EP formalism, opening a path to train them on hardware that only supports local, steady‑state dynamics.
- Robustness to hardware imperfections: The antisymmetric correction term can be tuned to compensate for systematic non‑reciprocal errors (e.g., mismatched conductances), making the learning rule more tolerant of analog noise.
- Simplified software prototypes: Developers can prototype EP‑based training by integrating ordinary differential equation solvers (e.g., Euler or Runge‑Kutta) into existing ML pipelines, sidestepping autograd frameworks for the learning step.
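As a concrete starting point for such a prototype, a generic explicit-Euler relaxation loop might look like the following; the function name, stopping rule, step size, and toy dynamics are all assumptions, not from the paper:

```python
import numpy as np

def relax(f, s0, dt=0.05, tol=1e-8, max_steps=100_000):
    """Generic explicit-Euler relaxation of s_dot = f(s) toward a fixed point.
    Stops once the per-step update norm falls below tol."""
    s = s0
    for _ in range(max_steps):
        ds = dt * f(s)
        s = s + ds
        if np.linalg.norm(ds) < tol:
            break
    return s

# Toy linear dynamics with asymmetric couplings: s_dot = -s + W s + b.
rng = np.random.default_rng(2)
n = 5
W = 0.1 * rng.normal(size=(n, n))   # not symmetric: W != W.T in general
b = rng.normal(size=n)
s_star = relax(lambda s: -s + W @ s + b, np.zeros(n))
```

Higher-order solvers (e.g. Runge-Kutta) can be dropped in for `f` without changing the surrounding training loop, since EP only consumes the converged state.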
Limitations & Future Work
- Scalability: Experiments are limited to relatively small networks (single hidden layer) and the MNIST dataset; performance on large‑scale vision or language models remains untested.
- Convergence guarantees: While the gradient is exact, the paper does not provide formal proofs of convergence rates for arbitrary non‑conservative dynamics.
- Hyper‑parameter sensitivity: The nudging strength (\beta) and the integration step size need careful tuning; automated schedules are not explored.
- Hardware validation: The authors suggest neuromorphic applicability but do not present a physical implementation; future work could benchmark the algorithm on analog chips or FPGA‑based simulators.
Overall, the paper delivers a solid theoretical and empirical foundation for extending equilibrium‑based learning to the messy, asymmetric world of real‑hardware dynamics—an advance that could reshape how we think about training neural systems beyond the confines of back‑propagation.
Authors
- Antonino Emanuele Scurria
- Dimitri Vanden Abeele
- Bortolo Matteo Mognetti
- Serge Massar
Paper Information
- arXiv ID: 2602.03670v1
- Categories: cs.LG, cs.AI, cs.NE, math.DS, physics.class-ph
- Published: February 3, 2026