[Paper] From Data Statistics to Feature Geometry: How Correlations Shape Superposition
Source: arXiv - 2603.09972v1
Overview
This paper challenges the prevailing view of superposition in neural networks: the idea that a network packs many more latent features into its representation space than it has dimensions. By moving beyond idealised sparse-and-uncorrelated settings, the authors show that feature correlations can turn interference from a nuisance into a useful signal. They introduce a synthetic “Bag‑of‑Words Superposition” (BOWS) benchmark and reveal how models trained on realistically correlated data naturally organise features into semantic clusters and cyclic structures.
Key Contributions
- BOWS benchmark: a controllable synthetic task that embeds binary bag‑of‑words representations of internet text into a superposed latent space.
- Empirical evidence that correlated features lead to constructive interference, contrary to the classic “interference‑as‑noise” narrative.
- Demonstration that ReLU non‑linearities still prune false positives while allowing correlated activations to reinforce each other.
- Identification of geometric patterns (semantic clusters, cycles) that emerge when models are trained with weight decay, matching phenomena observed in large language models.
- Open‑source implementation (https://github.com/LucasPrietoAl/correlations-feature-geometry) for reproducibility and further experimentation.
Methodology
- Synthetic Data Generation – The authors build a dataset of binary bag‑of‑words vectors derived from real internet text. Each vector encodes the presence/absence of a set of words (features).
- Superposition Encoding – A shallow feed‑forward network (linear layer + ReLU) is trained to map these high‑dimensional binary vectors into a lower‑dimensional hidden space, deliberately forcing superposition.
- Controlled Correlation Manipulation – By varying the co‑occurrence statistics of words (e.g., grouping synonyms or topic‑related terms), they create regimes of low vs. high feature correlation.
- Geometric Analysis – Hidden activations are visualised using dimensionality‑reduction (t‑SNE, UMAP) and examined for polytope‑like structures, clusters, and cycles.
- Weight‑Decay Experiments – Training runs with and without L2 regularisation are compared to assess how regularisation influences the emergent geometry.
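As a rough illustration of the controlled-correlation step, a toy generator can produce binary bag-of-words vectors whose co-occurrence statistics are tunable. This is a hypothetical sketch, not the authors' implementation (see their repository for the real code); the topic grouping, probabilities, and names below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical BOWS-style generator: vocabulary words are grouped into
# "topics", and words within the same topic tend to co-occur.
n_docs, n_topics, words_per_topic = 5000, 4, 5
vocab_size = n_topics * words_per_topic

# Each document activates one topic; in-topic words appear with high
# probability, all other words with a low (sparse) background probability.
topic_of_doc = rng.integers(0, n_topics, size=n_docs)
X = (rng.random((n_docs, vocab_size)) < 0.02).astype(float)  # sparse background
for d, t in enumerate(topic_of_doc):
    cols = slice(t * words_per_topic, (t + 1) * words_per_topic)
    X[d, cols] = (rng.random(words_per_topic) < 0.6).astype(float)

# Within-topic correlations come out strongly positive; cross-topic
# correlations are near zero or negative.
C = np.corrcoef(X, rowvar=False)
within = C[0, 1]                 # two words from topic 0
across = C[0, words_per_topic]   # word from topic 0 vs. word from topic 1
print(f"within-topic corr {within:.2f}, cross-topic corr {across:.2f}")
```

Varying the in-topic activation probability (0.6 above) sweeps between the low- and high-correlation regimes the paper studies; the resulting vectors would then be fed to the shallow linear+ReLU network described above.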
Results & Findings
- Constructive Interference: When features co‑activate frequently, their superposed representations align, amplifying the shared signal rather than cancelling it.
- ReLU Filtering: The ReLU non‑linearity still effectively suppresses spurious activations, ensuring that only the jointly active correlated features survive.
- Emergent Semantic Clusters: Weight‑decayed models automatically group correlated words (e.g., “dog”, “puppy”, “bark”) into tight clusters in hidden space, mirroring semantic neighbourhoods seen in real LLMs.
- Cyclical Structures: Certain correlated feature sets form ring‑like arrangements, offering a geometric explanation for previously reported “circular” neuron activations in language models.
- Deviation from Polytope Theory: Classic superposition theory predicts regular polytopes (e.g., simplices) for uncorrelated features; the experiments show that realistic, correlated data produce richer, non‑polytope geometries.
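The constructive-interference and ReLU-filtering results can be seen in a minimal numerical sketch. The feature directions below are invented for illustration and are not taken from the paper; the point is only that nearly aligned (correlated) features reinforce each other when co-active, while the ReLU clips the negative interference received by inactive features:

```python
import numpy as np

# Three features packed into two hidden dimensions (hypothetical directions).
# Features 0 and 1 are "correlated" and given nearly aligned vectors;
# feature 2 points elsewhere.
W = np.array([
    [1.0, 0.0],    # feature 0
    [0.9, 0.1],    # feature 1, nearly aligned with feature 0
    [-0.5, 0.8],   # feature 2
])
W = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm feature directions

def reconstruct(x):
    """Superpose active features into 2 dims, then read out through a ReLU."""
    h = W.T @ x                     # superposed hidden state
    return np.maximum(0.0, W @ h)   # ReLU prunes negative interference

# When the two correlated features co-activate, their aligned directions
# add up: both readouts exceed 1.0 (constructive interference).
both = reconstruct(np.array([1.0, 1.0, 0.0]))
print(both[:2])

# The inactive feature 2 receives negative interference, which the ReLU
# clips to exactly zero (no false positive).
print(both[2])
```

With uncorrelated features the same overlap would register purely as noise; here the shared direction amplifies the jointly active signal, matching the paper's "interference as signal" framing.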
Practical Implications
- Model Debugging & Interpretability: Understanding that correlated features can cooperate rather than clash helps developers design better probing tools and attribution methods for LLMs.
- Feature Engineering: When constructing embeddings or auxiliary heads, deliberately encouraging useful correlations (e.g., via contrastive losses) could improve representation efficiency.
- Regularisation Strategies: The finding that weight decay promotes semantically meaningful geometry suggests that modest L2 penalties might aid downstream interpretability without sacrificing performance.
- Sparse Autoencoder Design: Existing dictionary‑learning approaches that assume independence may be sub‑optimal; incorporating correlation‑aware objectives could yield more compact, expressive bases.
- Hardware‑Efficient Deployments: If correlated features can be packed more tightly, it may enable lower‑dimensional hidden layers for a given expressive power, reducing memory and compute footprints on edge devices.
Limitations & Future Work
- Synthetic Scope: BOWS, while controlled, abstracts away many complexities of full‑scale language modelling (e.g., token ordering, contextual dynamics).
- Scale Gap: Experiments are limited to modest hidden dimensions; it remains open how the observed geometry scales to billion‑parameter transformers.
- Non‑ReLU Activations: The study focuses on ReLU; other non‑linearities (GELU, SiLU) may interact differently with correlated interference.
- Dynamic Correlations: Future work could explore how correlations evolve during training and whether they can be steered intentionally.
- Application to Vision: Extending the analysis to vision models, where feature correlations are also strong, could validate the generality of the findings.
Authors
- Lucas Prieto
- Edward Stevinson
- Melih Barsbey
- Tolga Birdal
- Pedro A. M. Mediano
Paper Information
- arXiv ID: 2603.09972v1
- Categories: cs.LG, cs.AI, cs.CV
- Published: March 10, 2026