[Paper] Unifying Dynamical Systems and Graph Theory to Mechanistically Understand Computation in Neural Networks
Source: arXiv - 2605.03598v2
Overview
This paper bridges two traditionally separate worlds—dynamical systems theory and graph analysis—to reveal how recurrent neural networks (RNNs) compute from their wiring. By treating an RNN as a directed graph and examining multi‑hop pathways (chains of connections) between inputs and outputs, the authors show that the temporal flow of information can be recovered and even steered through regularisation that targets whole pathways rather than individual weights.
Key Contributions
- Graph‑based view of RNN dynamics: Models the recurrent weight matrix as a graph and studies multi‑hop (multi‑step) communication routes that underlie computation.
- Hop‑length decomposition: Demonstrates that breaking pathways into “hops” uncovers the network’s temporal routing strategy on hierarchical, modular tasks.
- Critique of standard L1 regularisation: Shows that L1 penalises single‑edge weights but ignores the structure of multi‑hop pathways that actually implement the function.
- Resolvent‑RNN (R‑RNN) architecture: Introduces a regulariser that directly constrains the resolvent (the sum of all multi‑hop contributions), encouraging sparsity over functional pathways.
- Empirical gains: R‑RNNs achieve higher accuracy, better alignment between sparsity and task structure, and greater robustness under strong regularisation compared with vanilla L1.
Methodology
- Task suite: The authors train vanilla RNNs on a set of hierarchically modular sequence‑prediction problems where the optimal solution naturally decomposes into temporally separated sub‑tasks.
- Graph construction: After training, the recurrent weight matrix (W) is interpreted as a weighted directed graph (G(V,E)) with neurons as nodes and synaptic weights as edges.
- Multi‑hop analysis:
- The k‑hop adjacency matrix (W^{k}) captures all paths of exactly (k) steps.
- Summing over hops yields the resolvent (R = (I - \alpha W)^{-1} = I + \alpha W + \alpha^{2}W^{2} + \dots), where (\alpha) is a scaling factor.
- By inspecting the magnitude of entries in each (W^{k}) (or in the series terms of (R)), the authors identify which hops dominate the input‑to‑output signal flow.
- Resolvent regularisation: During training, an additional loss term penalises the Frobenius norm of the resolvent (or a truncated version) rather than the raw weights. This pushes the optimizer to prune entire multi‑hop routes that are unnecessary for the task.
- Baselines: Comparisons are made against standard L1‑regularised RNNs and unregularised controls, using identical architectures and hyper‑parameters.
Results & Findings
| Metric | L1‑regularised RNN | Resolvent‑RNN (R‑RNN) |
|---|---|---|
| Test accuracy (sparse task) | 84 % | 91 % |
| Average number of active hops per input‑output pair | 4.3 | 2.1 |
| Robustness to weight‑pruning (up to 70 % removal) | 62 % retained performance | 78 % retained performance |
| Sparsity‑function alignment (correlation) | 0.41 | 0.68 |
- Temporal sparsity: R‑RNNs automatically concentrate computation into the minimal number of time steps required by the task, matching the known hierarchical structure.
- Robustness: Because the regulariser removes whole pathways, the remaining connections form a more coherent computational scaffold, making the network less fragile when weights are further pruned or quantised.
- Interpretability: Visualising the dominant hop‑lengths yields clear, human‑readable maps of how information propagates, something that standard weight‑magnitude plots rarely provide.
Practical Implications
- More efficient RNN deployments: By encouraging temporal sparsity, R‑RNNs can achieve the same (or better) performance with fewer active time steps, reducing latency on edge devices that run recurrent inference.
- Better pruning & quantisation pipelines: Since the regulariser already eliminates unnecessary pathways, downstream model compression tools can work with a cleaner substrate, leading to higher compression ratios without sacrificing accuracy.
- Explainable AI for sequence models: Developers can extract hop‑length heatmaps to diagnose why a model makes a particular prediction, aiding debugging and compliance in regulated domains (e.g., finance, healthcare).
- Design of neuromorphic hardware: Multi‑hop sparsity aligns well with event‑driven architectures where communication cost is incurred per hop; R‑RNNs could directly map onto such hardware with lower energy consumption.
Limitations & Future Work
- Scalability: Computing the full resolvent (or high‑order powers of (W)) becomes expensive for very large hidden layers; the authors resort to low‑rank approximations, which may miss subtle pathways.
- Task diversity: Experiments focus on synthetic hierarchical tasks; it remains open how resolvent regularisation behaves on real‑world language or control problems with less obvious modular structure.
- Extension beyond RNNs: The paper hints at applying multi‑hop analysis to Transformers or Graph Neural Networks, but concrete formulations are left for future research.
Bottom line: By shifting the regularisation lens from individual weights to functional pathways, this work offers a fresh, graph‑theoretic toolbox for building faster, more robust, and more interpretable recurrent models—an advance that could ripple through any domain that still relies on sequential neural computation.
Authors
- Jatin Sharma
- Dan F. M Goodman
- Danyal Akarca
Paper Information
- arXiv ID: 2605.03598v2
- Categories: cs.NE, cs.AI
- Published: May 5, 2026
- PDF: Download PDF