[Paper] From Data Statistics to Feature Geometry: How Correlations Shape Superposition

Published: March 10, 2026
Source: arXiv

Overview

This paper challenges the prevailing view of superposition in neural networks, the phenomenon in which a model packs more latent features into its representation space than it has dimensions. Moving beyond the idealised setting of sparse, uncorrelated features, the authors show that feature correlations can turn interference from a nuisance into a useful signal. They introduce a synthetic “Bag‑of‑Words Superposition” (BOWS) benchmark and show how models trained on realistic text statistics organise correlated features into semantic clusters and cyclic structures, mirroring phenomena observed in large language models.

Key Contributions

  • BOWS benchmark: a controllable synthetic task that embeds binary bag‑of‑words representations of internet text into a superposed latent space.
  • Empirical evidence that correlated features lead to constructive interference, contrary to the classic “interference‑as‑noise” narrative.
  • Demonstration that ReLU non‑linearities still prune false positives while allowing correlated activations to reinforce each other.
  • Identification of geometric patterns (semantic clusters, cycles) that emerge when models are trained with weight decay, matching phenomena observed in large language models.
  • Open‑source implementation (https://github.com/LucasPrietoAl/correlations-feature-geometry) for reproducibility and further experimentation.

Methodology

  1. Synthetic Data Generation – The authors build a dataset of binary bag‑of‑words vectors derived from real internet text. Each vector encodes the presence/absence of a set of words (features).
  2. Superposition Encoding – A shallow feed‑forward network (linear layer + ReLU) is trained to map these high‑dimensional binary vectors into a lower‑dimensional hidden space, deliberately forcing superposition.
  3. Controlled Correlation Manipulation – By varying the co‑occurrence statistics of words (e.g., grouping synonyms or topic‑related terms), they create regimes of low vs. high feature correlation.
  4. Geometric Analysis – Hidden activations are visualised with dimensionality reduction (t‑SNE, UMAP) and examined for polytope‑like structures, clusters, and cycles.
  5. Weight‑Decay Experiments – Training runs with and without L2 regularisation are compared to assess how regularisation influences the emergent geometry.
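Steps 1 and 3 can be sketched with a toy generator. The snippet below (my own illustration, not the authors' released code; the function name `sample_bows` is hypothetical) samples topic-grouped binary bag-of-words vectors and checks that within-topic features end up far more correlated than cross-topic ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, words_per_topic = 3, 4
n_features = n_topics * words_per_topic
topic_of = np.repeat(np.arange(n_topics), words_per_topic)  # topic index per word

def sample_bows(n_samples, p_in=0.7, p_out=0.05):
    """Binary bag-of-words vectors with topic-induced correlations:
    each sample activates one topic; that topic's words fire with
    probability p_in, all other words with probability p_out."""
    topics = rng.integers(n_topics, size=n_samples)
    p = np.where(topic_of[None, :] == topics[:, None], p_in, p_out)
    return (rng.random((n_samples, n_features)) < p).astype(float)

X = sample_bows(20_000)
corr = np.corrcoef(X.T)  # feature-feature Pearson correlation matrix
same = topic_of[:, None] == topic_of[None, :]
off_diag = ~np.eye(n_features, dtype=bool)
within = corr[same & off_diag].mean()  # high-correlation regime
cross = corr[~same].mean()             # anti-correlated across topics
print(f"within-topic corr = {within:.2f}, cross-topic corr = {cross:.2f}")
```

Lowering `p_in` toward the marginal activation rate recovers the near-independent regime assumed by classic superposition analyses.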

Results & Findings

  • Constructive Interference: When features co‑activate frequently, their superposed representations align, amplifying the shared signal rather than cancelling it.
  • ReLU Filtering: The ReLU non‑linearity still effectively suppresses spurious activations, ensuring that only the jointly active correlated features survive.
  • Emergent Semantic Clusters: Weight‑decayed models automatically group correlated words (e.g., “dog”, “puppy”, “bark”) into tight clusters in hidden space, mirroring semantic neighbourhoods seen in real LLMs.
  • Cyclical Structures: Certain correlated feature sets form ring‑like arrangements, offering a geometric explanation for previously reported “circular” neuron activations in language models.
  • Deviation from Polytope Theory: Classic superposition theory predicts regular polytopes (e.g., simplices) for uncorrelated features; the experiments show that realistic, correlated data produce richer, non‑polytope geometries.
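The first two findings can be made concrete with a hand-built example (my own illustration with hand-picked weights, not the paper's trained model): three unit-norm feature directions squeezed into two hidden dimensions, where two correlated features share an acute angle and a ReLU read-out with a negative bias prunes the third.

```python
import numpy as np

# Three feature directions in 2 hidden dimensions (superposition).
# w_a and w_b are deliberately aligned (cosine 0.8), as for correlated
# features; w_c points away from both.
W = np.array([[1.0, 0.0],    # feature a
              [0.8, 0.6],    # feature b (correlated with a)
              [-0.6, 0.8]])  # feature c
b = -0.8                     # negative bias: ReLU prunes weak interference

def reconstruct(x):
    """Toy superposition read-out: encode x, then decode with ReLU(W h + b)."""
    h = W.T @ x              # superposed hidden representation
    return np.maximum(W @ h + b, 0.0)

both = reconstruct(np.array([1.0, 1.0, 0.0]))   # a and b co-activate
alone = reconstruct(np.array([1.0, 0.0, 0.0]))  # a fires alone
print(both)   # -> [1. 1. 0.]: interference from the aligned partner
              #    exactly offsets the bias, so both are fully recovered
print(alone)  # -> [0.2 0. 0.]: solo activation is attenuated, and the
              #    ReLU suppresses any false positive for b or c
```

With these weights, co-activation is constructive (each correlated feature gains 0.8 from its partner's aligned direction), while the same bias that enables this filters out spurious activations.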

Practical Implications

  • Model Debugging & Interpretability: Understanding that correlated features can cooperate rather than clash helps developers design better probing tools and attribution methods for LLMs.
  • Feature Engineering: When constructing embeddings or auxiliary heads, deliberately encouraging useful correlations (e.g., via contrastive losses) could improve representation efficiency.
  • Regularisation Strategies: The finding that weight decay promotes semantically meaningful geometry suggests that modest L2 penalties might aid downstream interpretability without sacrificing performance.
  • Sparse Autoencoder Design: Existing dictionary‑learning approaches that assume independence may be sub‑optimal; incorporating correlation‑aware objectives could yield more compact, expressive bases.
  • Hardware‑Efficient Deployments: If correlated features can be packed more tightly, it may enable lower‑dimensional hidden layers for a given expressive power, reducing memory and compute footprints on edge devices.
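As one hedged illustration of what a correlation-aware dictionary objective might look like (this is my sketch under my own assumptions, not a method proposed in the paper): penalise the mismatch between the dictionary's cosine Gram matrix and the empirical feature correlation matrix, so that correlated features are encouraged to share geometry rather than being forced toward orthogonality.

```python
import numpy as np

def correlation_aware_penalty(D, C):
    """Mean squared mismatch between dictionary geometry and feature
    correlations. D: (n_features, n_dims) dictionary rows; C:
    (n_features, n_features) correlation matrix. Hypothetical loss term."""
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm rows
    gram = Dn @ Dn.T                                   # pairwise cosines
    return float(np.mean((gram - C) ** 2))

rng = np.random.default_rng(0)
C = np.eye(4)                     # pretend the features are uncorrelated
D_orth = np.eye(4)                # orthogonal dictionary matches C exactly
D_rand = rng.normal(size=(4, 4))  # a random dictionary does not
penalty_orth = correlation_aware_penalty(D_orth, C)
penalty_rand = correlation_aware_penalty(D_rand, C)
```

In practice such a term would be added to a sparse autoencoder's reconstruction-plus-sparsity loss; the sketch only shows that the penalty vanishes when dictionary geometry matches the target correlations.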

Limitations & Future Work

  • Synthetic Scope: BOWS, while controlled, abstracts away many complexities of full‑scale language modeling (e.g., token ordering, contextual dynamics).
  • Scale Gap: Experiments are limited to modest hidden dimensions; it remains open how the observed geometry scales to billion‑parameter transformers.
  • Non‑ReLU Activations: The study focuses on ReLU; other non‑linearities (GELU, SiLU) may interact differently with correlated interference.
  • Dynamic Correlations: Future work could explore how correlations evolve during training and whether they can be steered intentionally.
  • Application to Vision: Extending the analysis to vision models, where feature correlations are also strong, could validate the generality of the findings.

Authors

  • Lucas Prieto
  • Edward Stevinson
  • Melih Barsbey
  • Tolga Birdal
  • Pedro A. M. Mediano

Paper Information

  • arXiv ID: 2603.09972v1
  • Categories: cs.LG, cs.AI, cs.CV
  • Published: March 10, 2026