[Paper] A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations
Source: arXiv - 2603.00824v1
Overview
This paper proposes a new way to think about how large language models (LLMs) blend multiple meanings—what the authors call superposition. Instead of assuming a single, global “dictionary” of concepts, the authors model an LLM’s knowledge as a collection of overlapping local maps (charts) stitched together like a geographic atlas. By borrowing ideas from gauge theory and sheaf theory, they expose hidden sources of interference that can limit the interpretability and reliability of LLMs.
Key Contributions
- Sheaf‑theoretic representation: Introduces a discrete gauge‑theoretic framework that treats semantic contexts as a stratified complex of local charts, each equipped with its own feature space and Fisher‑information metric.
- Three measurable obstructions: Defines and quantifies (1) local jamming (when a chart’s active load exceeds its Fisher bandwidth), (2) proxy shearing (misalignment between geometric transport and a fixed correspondence proxy), and (3) nontrivial holonomy (path‑dependent transport around loops).
- Gauge‑invariant holonomy computation: Shows that after fixing a gauge on a spanning tree of the context graph, the residual on any chord equals the holonomy of its fundamental cycle, making holonomy a tractable, gauge‑independent statistic.
- Certified interference bounds: Provides non‑vacuous, data‑driven certificates for jamming/interference that hold across random seeds and hyper‑parameter settings, with zero observed violations.
- Empirical validation on Llama 3.2 3B: Demonstrates the framework on a frozen Llama 3.2 3B Instruct model using three large corpora (WikiText‑103, a C4‑derived web‑text subset, and the‑stack‑smol).
- Stability analysis: Shows that estimates of shearing and holonomy converge quickly with bootstrap and sample‑size experiments, especially on well‑conditioned subsystems.
Methodology
- Context Complex Construction – The authors cluster token windows (contexts) from a corpus into a graph where nodes are charts (local semantic neighborhoods) and edges indicate overlap. This yields a stratified context complex.
- Local Feature Spaces & Fisher Metric – Within each chart, they extract a low‑dimensional feature representation (e.g., via PCA on hidden states) and endow it with a Fisher‑information or Gauss‑Newton metric. This metric captures how sensitive the model’s predictions are to perturbations in each direction.
- Gauge Theory Formalism – A gauge assigns a coordinate frame to each chart. Transporting a feature vector from one chart to another follows the connection defined by the Fisher metric. The holonomy around a loop measures the cumulative mis‑alignment after returning to the start.
- Obstruction Quantification – Three diagnostics are computed, per chart or per cycle:
  - Jamming: compare the norm of the active feature load to the chart’s Fisher bandwidth.
  - Shearing: compute the discrepancy between the true geometric transport and a fixed correspondence proxy (e.g., a simple linear alignment).
  - Holonomy: compose the connection (transport) matrices around cycles; deviation from the identity indicates path dependence.
- Gauge Fixing & Computation – By selecting a spanning tree of the graph, the authors set a reference gauge (zero connection on tree edges). Residuals on chords become direct measurements of holonomy, making the computation efficient and gauge‑invariant.
- Empirical Evaluation – They run the pipeline on a frozen Llama 3.2 3B model, measuring the three obstructions across the three corpora, and validate the bounds with bootstrap resampling.
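The spanning-tree gauge-fixing step above can be sketched numerically. The toy graph, rotation transports, and variable names below are illustrative assumptions rather than the paper’s released code: after choosing chart frames that trivialize transport on tree edges, the residual on the remaining chord reproduces the holonomy of its fundamental cycle.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix, a toy stand-in for an edge transport."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Four charts on a cycle; T[(i, j)] transports chart-i coordinates to chart-j.
angles = {(0, 1): 0.3, (1, 2): 0.5, (2, 3): -0.2, (3, 0): 0.1}
T = {e: rot(a) for e, a in angles.items()}
T.update({(j, i): T[(i, j)].T for (i, j) in list(T)})  # orthogonal inverse

# Gauge fix on a spanning tree rooted at chart 0: pick frames g_i so every
# tree-edge transport becomes the identity (g_j = T_ij @ g_i).
tree_edges = [(0, 1), (1, 2), (2, 3)]   # leaves one chord: (3, 0)
g = {0: np.eye(2)}
for i, j in tree_edges:
    g[j] = T[(i, j)] @ g[i]

# In the fixed gauge, the residual on the chord equals the holonomy of the
# fundamental cycle 0 -> 1 -> 2 -> 3 -> 0.
chord_residual = g[0].T @ T[(3, 0)] @ g[3]

# Direct check: compose the transports around the loop in the original gauge.
loop_holonomy = T[(3, 0)] @ T[(2, 3)] @ T[(1, 2)] @ T[(0, 1)]
```

For planar rotations the holonomy angle is just the sum of the edge angles (here 0.7 rad), and `chord_residual` matches `loop_holonomy` exactly, which is the gauge-invariance claim in miniature.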
Results & Findings
| Obstruction | What the Paper Shows | Practical Meaning |
|---|---|---|
| Local Jamming | Certified upper bounds on the interference energy for each chart; zero violations across seeds. | Certain contexts are “overloaded” and can cause unpredictable token predictions. |
| Proxy Shearing | Shearing energy $D_{\text{shear}}$ lower‑bounds a data‑dependent transfer mismatch; it cannot be eliminated by better alignment alone. | Even with perfect training, some mismatch between learned representations and downstream tasks is inevitable. |
| Holonomy | After gauge fixing, chord residuals equal holonomy of fundamental cycles; holonomy estimates are stable under bootstrapping. | The model’s internal representation is path‑dependent—different reasoning routes can lead to different outputs for the same input. |
| Stability | Sample‑size experiments reveal rapid concentration of $D_{\text{shear}}$ and $D_{\text{hol}}$ on well‑conditioned subgraphs, indicating reliable estimation with modest data. | Practitioners can compute these metrics without needing massive corpora. |
Overall, the authors demonstrate that the three obstructions are measurable, non‑trivial, and persist even in a well‑trained, frozen LLM.
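The jamming diagnostic in the table can be illustrated with a minimal sketch. The eigenvalue-based “bandwidth” and the threshold of 1.0 below are assumptions standing in for the paper’s certified bound; the function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def fisher_metric(hidden_states):
    """Gauss-Newton-style proxy metric: empirical second moment of features."""
    return hidden_states.T @ hidden_states / len(hidden_states)

def jamming_ratio(load, metric, floor=1e-6):
    """Active load energy relative to the capacity the metric supports.

    Bandwidth is taken here as the sum of eigenvalues above `floor`
    (an illustrative choice, not the paper's exact certificate)."""
    eigvals = np.linalg.eigvalsh(metric)
    bandwidth = eigvals[eigvals > floor].sum()
    return float(load @ metric @ load) / bandwidth

H = rng.normal(size=(256, 8))            # hidden states for one toy chart
G = fisher_metric(H)
light_load = 0.1 * rng.normal(size=8)    # few weakly active features
heavy_load = 10.0 * rng.normal(size=8)   # overloaded chart

within_budget = jamming_ratio(light_load, G) < 1.0   # not jammed
jammed = jamming_ratio(heavy_load, G) > 1.0          # exceeds bandwidth
```

A ratio above 1 flags a chart whose active load exceeds what its Fisher geometry can carry, which is the “overloaded contexts” failure mode described above.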
Practical Implications
- Debugging Model Failures – By pinpointing charts that jam or exhibit high holonomy, engineers can localize where a model is likely to hallucinate or produce inconsistent answers.
- Curriculum & Data Selection – The Fisher‑bandwidth metric can guide the construction of training curricula that avoid overloading specific semantic regions, potentially reducing catastrophic forgetting.
- Model Editing & Safety – When applying model‑editing techniques (e.g., weight surgery or prompt‑based interventions), the gauge‑theoretic view offers a principled way to assess whether an edit will cause unintended shearing elsewhere.
- Interpretability Tools – The atlas of local charts can be visualized as a semantic map, giving developers a “geographic” intuition of where concepts live and how they interact.
- Transfer Learning Guarantees – The lower bound on shearing provides a theoretical floor for transfer‑learning performance loss, helping set realistic expectations when fine‑tuning LLMs on niche domains.
- Efficient Monitoring – Because the metrics are computable on frozen models and converge with relatively small samples, they can be integrated into CI pipelines to continuously monitor representation health as data pipelines evolve.
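The monitoring claim rests on bootstrap stability of the shearing estimate. The sketch below uses a least-squares linear alignment residual as an assumed proxy for $D_{\text{shear}}$ (not the paper’s exact definition) and checks that bootstrap resamples of paired chart features concentrate:

```python
import numpy as np

rng = np.random.default_rng(1)

def shear_estimate(X, Y):
    """Residual energy after the best least-squares linear map X -> Y."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.mean((X @ W - Y) ** 2))

# Synthetic paired features from two overlapping charts: a linear map
# plus noise of standard deviation 0.3 (so true residual energy ~ 0.09).
n, d = 2000, 6
X = rng.normal(size=(n, d))
Y = X @ rng.normal(size=(d, d)) + 0.3 * rng.normal(size=(n, d))

boots = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)      # resample rows with replacement
    boots.append(shear_estimate(X[idx], Y[idx]))

spread = float(np.std(boots))             # small spread => stable estimate
```

A tight bootstrap spread at this modest sample size is what makes a cheap, frozen-model CI check plausible.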
Limitations & Future Work
- Discrete Approximation – The framework relies on a discretized context graph; finer granularity may improve fidelity but at higher computational cost.
- Frozen Model Assumption – All experiments use a non‑fine‑tuned Llama 3.2 3B; it remains unclear how the obstructions evolve during continued training or reinforcement‑learning from human feedback (RLHF).
- Scalability to Larger Models – While the authors argue the method scales, empirical validation on models >10 B parameters is pending.
- Proxy Choice – The shearing metric depends on a chosen correspondence proxy; alternative proxies could yield different bounds, and selecting an optimal proxy is an open problem.
- User‑Facing Tools – Translating the mathematical constructs into developer‑friendly dashboards or APIs will require additional engineering effort.
Future research directions include extending the gauge‑theoretic atlas to multimodal models, exploring dynamic gauge fixing during training, and integrating holonomy‑aware regularizers to mitigate path‑dependent inconsistencies.
Authors
- Hossein Javidnia
Paper Information
- arXiv ID: 2603.00824v1
- Categories: cs.LG, cs.AI, cs.CL, cs.NE
- Published: February 28, 2026