[Paper] Scale-Agnostic Kolmogorov-Arnold Geometry in Neural Networks
Source: arXiv - 2511.21626v1
Overview
A recent study by Vanherreweghe, Freedman, and Adams shows that even ordinary two‑layer multilayer perceptrons (MLPs) automatically organize their internal representations into a Kolmogorov‑Arnold geometric (KAG) structure when trained on the classic MNIST digit‑recognition task. Crucially, this geometry appears scale‑agnostic: it manifests in small 7 × 7 pixel patches as well as across the full 28 × 28 image, and it emerges whether or not the network is trained with spatial data augmentations.
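For context (this formula is standard mathematics, not restated from the paper itself), the classical Kolmogorov‑Arnold representation theorem, which motivates the KAG notion used throughout, states that any continuous function of n variables can be written using only univariate functions and addition:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
```

The paper's residual test (described under Methodology below) asks how closely the learned activations conform to such sums of univariate pieces.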
Key Contributions
- Empirical confirmation of KAG in high‑dimensional data – Extends prior synthetic‑task findings to a real‑world dataset (784‑dimensional MNIST).
- Multi‑scale spatial analysis – Demonstrates that KAG patterns are present from local neighborhoods up to the entire image.
- Robustness across training regimes – Shows the same geometric emergence under standard SGD and under spatial augmentations (rotations, translations, cropping).
- Scale‑agnostic characterization – Introduces a systematic way to test whether learned representations are invariant to the spatial scale at which they are examined.
- Open‑source analysis toolkit – Provides code for extracting and visualizing KAG structures, facilitating reproducibility.
Methodology
- Model & Data – Trained vanilla 2‑layer MLPs (784 → 256 → 10) on the MNIST training set. Two training pipelines were used:
- (a) plain stochastic gradient descent, and
- (b) SGD with random rotations, translations, and random crops.
- KAG Extraction – After each epoch, the authors recorded the hidden‑layer activations for a validation subset and then tested for a Kolmogorov‑Arnold‑style representation in a data‑driven fashion (a minimal sketch appears after this list):
- Partition the input image into overlapping patches of size s × s (s = 1, 3, 7).
- For each patch, fit a low‑rank approximation of the activation map and compute the residual error.
- A low residual across all patches signals that the activations can be expressed as a sum of univariate functions of the patch coordinates – the hallmark of KAG.
- Scale‑agnostic Test – Repeated the above extraction for multiple patch sizes and for the whole image (s = 28). Consistency of low residuals across scales indicates scale‑agnostic geometry.
- Visualization – Plotted the learned univariate functions and overlaid them on digit images to illustrate how the network “sees” the data at different scales.
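Below is a minimal, hedged sketch of the patch‑wise residual test described above, assuming the validation images and hidden activations are available as NumPy arrays. The function names (`low_rank_residual`, `kag_residuals`), the rank‑1 truncation, and the use of a pixel‑to‑activation cross‑covariance as the "activation map" are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def low_rank_residual(mat, rank=1, eps=1e-12):
    """Relative error of the best rank-`rank` approximation of `mat` (via SVD)."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return np.linalg.norm(mat - approx) / (np.linalg.norm(mat) + eps)

def kag_residuals(images, activations, patch_sizes=(1, 3, 7, 28), stride=1):
    """
    images:      (N, 28, 28) validation inputs
    activations: (N, H) hidden-layer activations for the same inputs
    Returns the mean residual per patch size; low, roughly constant values
    across sizes correspond to the scale-agnostic KAG signature described above.
    (Unoptimized: loops over all overlapping patches.)
    """
    results = {}
    for s in patch_sizes:
        residuals = []
        for top in range(0, 28 - s + 1, stride):
            for left in range(0, 28 - s + 1, stride):
                patch = images[:, top:top + s, left:left + s].reshape(len(images), -1)
                # Couple patch pixels to hidden units, then test how well a
                # low-rank (separable) map explains that coupling.
                coupling = patch.T @ activations        # (s*s, H)
                residuals.append(low_rank_residual(coupling))
        results[s] = float(np.mean(residuals))
    return results
```

Low and roughly constant mean residuals across patch sizes would correspond to the multi‑scale detection reported in the results table below.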
Results & Findings
| Condition | Residual error (average) | KAG detection across scales |
|---|---|---|
| Standard SGD | 0.018 | Detected at s = 1, 3, 7, 28 |
| SGD + Augmentation | 0.021 | Same multi‑scale detection |
| Randomly initialized (no training) | 0.112 | No KAG pattern |
- Emergence timing: KAG signatures become statistically significant after ~5 epochs and stabilize by epoch 15.
- Scale invariance: The residuals remain within a narrow band (±0.003) when moving from 7 × 7 patches to the full image, confirming that the geometry does not depend on spatial granularity.
- Qualitative insight: The extracted univariate functions correspond to smooth intensity gradients that align with digit strokes, suggesting the network captures the shape of digits rather than memorizing individual pixels.
Practical Implications
- Model interpretability: KAG provides a mathematically grounded lens for visualizing what an MLP “understands” about spatial data, potentially aiding debugging and trust.
- Architecture design: Knowing that even shallow nets develop scale‑invariant geometry may inspire lightweight, geometry‑aware layers (e.g., KAG‑regularized activations) for edge devices.
- Data augmentation strategies: Since KAG persists under common augmentations, developers can safely employ spatial transforms without fearing loss of underlying geometric structure.
- Transfer learning: The scale‑agnostic representation could serve as a universal feature extractor for downstream tasks (e.g., digit‑style transfer, few‑shot learning) without needing deep convolutional backbones.
- Hardware acceleration: Because KAG functions are univariate, inference could potentially be simplified into sums of cheap 1‑D table look‑ups, reducing compute and memory bandwidth (see the sketch below).
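Illustrating that last point, here is a hedged sketch of sum‑of‑univariate‑lookup inference. The table resolution, input range, and helper names (`build_tables`, `lookup_sum`) are hypothetical choices for illustration; the paper does not specify a hardware scheme.

```python
import numpy as np

def build_tables(univariate_fns, n_bins=256, lo=0.0, hi=1.0):
    """Precompute each 1-D function on a fixed grid of input intensities."""
    grid = np.linspace(lo, hi, n_bins)
    return np.stack([f(grid) for f in univariate_fns])   # (n_inputs, n_bins)

def lookup_sum(x, tables, lo=0.0, hi=1.0):
    """Approximate sum_i f_i(x_i) with quantized table look-ups (no multiplies)."""
    n_bins = tables.shape[1]
    idx = np.clip(((x - lo) / (hi - lo) * (n_bins - 1)).astype(int), 0, n_bins - 1)
    return tables[np.arange(len(x)), idx].sum()

# Toy usage: three hypothetical univariate functions over pixel intensities.
fns = [np.sin, np.sqrt, lambda t: t ** 2]
tables = build_tables(fns)
x = np.array([0.2, 0.5, 0.9])
print(lookup_sum(x, tables))   # ~ np.sin(0.2) + np.sqrt(0.5) + 0.9**2
```

Because each term depends on a single input, the whole sum reduces to integer indexing plus additions, which is the memory‑bandwidth advantage alluded to above.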
Limitations & Future Work
- Scope limited to MLPs and MNIST: The study does not address deeper architectures (CNNs, Transformers) or more complex vision datasets (CIFAR‑10, ImageNet).
- Quantitative metric still heuristic: The residual‑based KAG detection is a proxy; a rigorous statistical test for Kolmogorov‑Arnold representation in high dimensions remains open.
- Interpretability depth: While the univariate functions align with digit strokes, linking them to semantic concepts (e.g., “loop”, “tail”) needs further exploration.
- Future directions: Extending the analysis to convolutional layers, investigating whether KAG can be explicitly regularized during training, and exploring its role in adversarial robustness.
Authors
- Mathew Vanherreweghe
- Michael H. Freedman
- Keith M. Adams
Paper Information
- arXiv ID: 2511.21626v1
- Categories: cs.LG, cs.AI
- Published: November 26, 2025