[Paper] From Data Statistics to Feature Geometry: How Correlations Shape Superposition
Source: arXiv - 2603.09972v1
Overview
This paper challenges the prevailing view of superposition in neural networks: the idea that a network packs many more latent features into its representation space than it has dimensions. By moving beyond idealised sparse-and-uncorrelated settings, the authors show that feature correlations can turn interference from a nuisance into a useful signal. They introduce a synthetic “Bag‑of‑Words Superposition” (BOWS) benchmark and reveal how models trained on realistically correlated data naturally organise features into semantic clusters and cyclic structures.
Key Contributions
- BOWS benchmark: a controllable synthetic task that embeds binary bag‑of‑words representations of internet text into a superposed latent space.
- Empirical evidence that correlated features lead to constructive interference, contrary to the classic “interference‑as‑noise” narrative.
- Demonstration that ReLU non‑linearities still prune false positives while allowing correlated activations to reinforce each other.
- Identification of geometric patterns (semantic clusters, cycles) that emerge when models are trained with weight decay, matching phenomena observed in large language models.
- Open‑source implementation (https://github.com/LucasPrietoAl/correlations-feature-geometry) for reproducibility and further experimentation.
Methodology
- Synthetic Data Generation – The authors build a dataset of binary bag‑of‑words vectors derived from real internet text. Each vector encodes the presence/absence of a set of words (features).
- Superposition Encoding – A shallow feed‑forward network (linear layer + ReLU) is trained to map these high‑dimensional binary vectors into a lower‑dimensional hidden space, deliberately forcing superposition.
- Controlled Correlation Manipulation – By varying the co‑occurrence statistics of words (e.g., grouping synonyms or topic‑related terms), they create regimes of low vs. high feature correlation.
- Geometric Analysis – Hidden activations are visualised using dimensionality‑reduction (t‑SNE, UMAP) and examined for polytope‑like structures, clusters, and cycles.
- Weight‑Decay Experiments – Training runs with and without L2 regularisation are compared to assess how regularisation influences the emergent geometry.
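As a rough illustration of the controlled-correlation step, a toy generator can produce binary bag-of-words vectors whose co-occurrence statistics are tunable. This is a hypothetical sketch, not the authors' implementation (see their repository for the real code); the topic grouping, probabilities, and names below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical BOWS-style generator: vocabulary words are grouped into
# "topics", and words within the same topic tend to co-occur.
n_docs, n_topics, words_per_topic = 5000, 4, 5
vocab_size = n_topics * words_per_topic

# Each document activates one topic; in-topic words appear with high
# probability, all other words with a low (sparse) background probability.
topic_of_doc = rng.integers(0, n_topics, size=n_docs)
X = (rng.random((n_docs, vocab_size)) < 0.02).astype(float)  # sparse background
for d, t in enumerate(topic_of_doc):
    cols = slice(t * words_per_topic, (t + 1) * words_per_topic)
    X[d, cols] = (rng.random(words_per_topic) < 0.6).astype(float)

# Within-topic correlations come out strongly positive; cross-topic
# correlations are near zero or negative.
C = np.corrcoef(X, rowvar=False)
within = C[0, 1]                 # two words from topic 0
across = C[0, words_per_topic]   # word from topic 0 vs. word from topic 1
print(f"within-topic corr {within:.2f}, cross-topic corr {across:.2f}")
```

Varying the in-topic activation probability (0.6 above) sweeps between the low- and high-correlation regimes the paper studies; the resulting vectors would then be fed to the shallow linear+ReLU network described above.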
Results & Findings
- Constructive Interference: When features co‑activate frequently, their superposed representations align, amplifying the shared signal rather than cancelling it.
- ReLU Filtering: The ReLU non‑linearity still effectively suppresses spurious activations, ensuring that only the jointly active correlated features survive.
- Emergent Semantic Clusters: Weight‑decayed models automatically group correlated words (e.g., “dog”, “puppy”, “bark”) into tight clusters in hidden space, mirroring semantic neighbourhoods seen in real LLMs.
- Cyclical Structures: Certain correlated feature sets form ring‑like arrangements, offering a geometric explanation for previously reported “circular” neuron activations in language models.
- Deviation from Polytope Theory: Classic superposition theory predicts regular polytopes (e.g., simplices) for uncorrelated features; the experiments show that realistic, correlated data produce richer, non‑polytope geometries.
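The constructive-interference and ReLU-filtering results can be seen in a minimal numerical sketch. The feature directions below are invented for illustration and are not taken from the paper; the point is only that nearly aligned (correlated) features reinforce each other when co-active, while the ReLU clips the negative interference received by inactive features:

```python
import numpy as np

# Three features packed into two hidden dimensions (hypothetical directions).
# Features 0 and 1 are "correlated" and given nearly aligned vectors;
# feature 2 points elsewhere.
W = np.array([
    [1.0, 0.0],    # feature 0
    [0.9, 0.1],    # feature 1, nearly aligned with feature 0
    [-0.5, 0.8],   # feature 2
])
W = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm feature directions

def reconstruct(x):
    """Superpose active features into 2 dims, then read out through a ReLU."""
    h = W.T @ x                     # superposed hidden state
    return np.maximum(0.0, W @ h)   # ReLU prunes negative interference

# When the two correlated features co-activate, their aligned directions
# add up: both readouts exceed 1.0 (constructive interference).
both = reconstruct(np.array([1.0, 1.0, 0.0]))
print(both[:2])

# The inactive feature 2 receives negative interference, which the ReLU
# clips to exactly zero (no false positive).
print(both[2])
```

With uncorrelated features the same overlap would register purely as noise; here the shared direction amplifies the jointly active signal, matching the paper's "interference as signal" framing.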
Practical Implications
- Model Debugging & Interpretability: Understanding that correlated features can cooperate rather than clash helps developers design better probing tools and attribution methods for LLMs.
- Feature Engineering: When constructing embeddings or auxiliary heads, deliberately encouraging useful correlations (e.g., via contrastive losses) could improve representation efficiency.
- Regularisation Strategies: The finding that weight decay promotes semantically meaningful geometry suggests that modest L2 penalties might aid downstream interpretability without sacrificing performance.
- Sparse Autoencoder Design: Existing dictionary‑learning approaches that assume independence may be sub‑optimal; incorporating correlation‑aware objectives could yield more compact, expressive bases.
- Hardware‑Efficient Deployments: If correlated features can be packed more tightly, it may enable lower‑dimensional hidden layers for a given expressive power, reducing memory and compute footprints on edge devices.
Limitations & Future Work
- Synthetic Scope: BOWS, while controlled, abstracts away many complexities of full‑scale language modelling (e.g., token ordering, contextual dynamics).
- Scale Gap: Experiments are limited to modest hidden dimensions; it remains open how the observed geometry scales to billion‑parameter transformers.
- Non‑ReLU Activations: The study focuses on ReLU; other non‑linearities (GELU, SiLU) may interact differently with correlated interference.
- Dynamic Correlations: Future work could explore how correlations evolve during training and whether they can be steered intentionally.
- Application to Vision: Extending the analysis to vision models, where feature correlations are also strong, could validate the generality of the findings.
Authors
- Lucas Prieto
- Edward Stevinson
- Melih Barsbey
- Tolga Birdal
- Pedro A. M. Mediano
Paper Information
- arXiv ID: 2603.09972v1
- Categories: cs.LG, cs.AI, cs.CV
- Published: March 10, 2026