[Paper] The Representational Geometry of Number
Source: arXiv - 2602.06843v1
Overview
The paper The Representational Geometry of Number investigates how large language models (LLMs) internally organize numeric concepts across tasks. By treating numbers as points in a high‑dimensional space, the authors show that while task‑specific embeddings occupy distinct subspaces, the relationships between numbers (e.g., “larger than”, “even vs. odd”) remain remarkably consistent. This finding bridges two competing ideas in cognitive science, shared conceptual manifolds versus orthogonal task spaces, and offers a concrete, mechanistic explanation for how LLMs can both generalize and specialize.
Key Contributions
- Relational‑first perspective: Proposes that shared structure lives in the relations among concepts rather than in the concepts themselves.
- Empirical evidence with numbers: Demonstrates that magnitude and parity are encoded along stable linear directions across multiple downstream tasks (e.g., arithmetic, classification, reasoning).
- Subspace decomposition: Shows that each task’s number embeddings reside in a distinct, low‑dimensional subspace, yet these subspaces are linearly transformable into one another.
- Linear mapping analysis: Introduces a simple linear‑regression framework to quantify how well one task’s subspace can be mapped onto another, revealing high fidelity (R² > 0.9 in most cases).
- Mechanistic account: Provides a unified explanation for how LLMs balance shared relational knowledge with task‑specific flexibility, offering a new lens for interpretability research.
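The subspace-mapping contribution can be sketched on synthetic data. The snippet below builds two hypothetical "task" embeddings that share a common numeric structure but differ by a linear transform, fits a linear map between them with ordinary least squares, and scores it with R². All names, shapes, and noise levels are illustrative assumptions, not the paper's code:

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)

# Toy setup: 20 "numbers", each embedded in two hypothetical task spaces
# that differ by a random linear transform plus a little noise.
numbers = np.arange(20, dtype=float)
base = np.stack([numbers, numbers % 2, rng.normal(size=20)], axis=1)  # shared structure
task_a = base @ rng.normal(size=(3, 3))
task_b = base @ rng.normal(size=(3, 3)) + 0.01 * rng.normal(size=(20, 3))

# Fit a linear map W so that task_a @ W ~= task_b, then score it with R^2.
W, *_ = lstsq(task_a, task_b, rcond=None)
pred = task_a @ W
ss_res = np.sum((task_b - pred) ** 2)
ss_tot = np.sum((task_b - task_b.mean(axis=0)) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 of cross-task linear map: {r2:.3f}")
```

Because the two toy spaces really are linear images of one shared structure, the fitted map recovers nearly all of the variance, mirroring the R² > 0.9 the authors report.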
Methodology
- Model selection: The authors fine‑tuned several popular transformer‑based language models (e.g., GPT‑2, LLaMA) on a suite of number‑centric tasks: magnitude comparison, parity detection, arithmetic word problems, and numeric reasoning.
- Embedding extraction: For each task, they collected the hidden‑state vectors corresponding to numeric tokens (e.g., “seven”, “42”) from the final transformer layer.
- Subspace identification: Using Principal Component Analysis (PCA) on each task’s embeddings, they isolated the top few components that captured > 95 % of variance, defining a task‑specific subspace.
- Relational probing: Linear probes were trained to predict scalar magnitude and binary parity from the embeddings. The direction vectors (weights) of these probes served as “relational axes.”
- Cross‑task mapping: Pairwise linear regression models were fitted to map embeddings from one task’s subspace to another’s. Mapping quality was assessed via reconstruction error and cosine similarity of relational axes.
- Visualization: t‑SNE and 2‑D PCA plots illustrated that numbers form consistent geometric patterns (e.g., a monotonic line for magnitude) even when the absolute coordinates differ across tasks.
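The probing step can be illustrated with a small sketch: fit a closed-form ridge probe (a standard probing choice; the paper does not specify its regularizer) to two simulated tasks that share a magnitude direction, then compare the recovered relational axes by cosine similarity. The embedding generator and all constants here are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16
u = rng.normal(size=dim)
u /= np.linalg.norm(u)  # shared "magnitude" direction (a toy assumption)

numbers = np.arange(1, 51, dtype=float)
target = (numbers - numbers.mean()) / numbers.std()  # standardized magnitude

def embed_task(noise=0.05):
    """Toy task embedding: magnitude along u plus task-specific noise."""
    return np.outer(target, u) + noise * rng.normal(size=(len(numbers), dim))

def ridge_probe(X, y, lam=10.0):
    """Closed-form ridge linear probe; its weight vector is the relational axis."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return w / np.linalg.norm(w)

axis_a = ridge_probe(embed_task(), target)
axis_b = ridge_probe(embed_task(), target)
cos = abs(axis_a @ axis_b)
print(f"cosine similarity of magnitude axes across tasks: {cos:.3f}")
```

Even though each task adds its own noise, the probe directions land close to the shared axis, which is the kind of alignment the paper's cross-task comparison measures.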
Results & Findings
- Stable relational axes: The probe weights for magnitude and parity were nearly identical (cosine similarity > 0.98) across all tasks, indicating a shared relational geometry.
- Distinct subspaces: Each task’s embeddings occupied a unique subspace, with minimal overlap in the raw coordinate space (average subspace angle ≈ 45°).
- High‑fidelity linear transforms: Mapping one task’s subspace to another recovered > 90 % of variance, and the relational axes remained aligned after transformation.
- Task‑specific nuances: While magnitude and parity were universal, more complex relations (e.g., “multiple of three”) showed weaker but still linear‑transformable patterns, suggesting a hierarchy of relational stability.
- Model‑agnostic behavior: The phenomena held across model sizes (from 124 M to 7 B parameters) and across both encoder‑only and decoder‑only architectures.
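The "distinct subspaces" finding can be made concrete by computing principal angles between two column spaces. Below is a minimal numpy-only sketch, assuming two toy 3-dimensional subspaces that share exactly one direction (the shared relational axis) and are otherwise random:

```python
import numpy as np

rng = np.random.default_rng(2)

def principal_angles_deg(A, B):
    """Principal angles between the column spaces of A and B, in degrees."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(s, -1.0, 1.0)))

dim, k = 32, 3
# Two hypothetical task subspaces sharing one direction but otherwise distinct.
shared = rng.normal(size=(dim, 1))
sub_a = np.hstack([shared, rng.normal(size=(dim, k - 1))])
sub_b = np.hstack([shared, rng.normal(size=(dim, k - 1))])

angles = principal_angles_deg(sub_a, sub_b)  # ascending: shared axis first
print("principal angles (deg):", np.round(angles, 1))
```

The shared direction yields a near-zero angle while the task-specific directions stay nearly orthogonal, echoing the paper's picture of aligned relational axes inside otherwise distinct subspaces.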
Practical Implications
- Transfer learning shortcuts: Because relational structure is preserved, developers can fine‑tune a model on a cheap proxy task (e.g., parity detection) and then reuse the learned embeddings for more expensive numeric‑reasoning tasks by applying a learned linear map.
- Debugging & interpretability tools: Linear probes for magnitude/parity can serve as lightweight sanity checks when deploying LLMs in finance, scientific computing, or education platforms.
- Modular system design: Engineers can build “task adapters”—small linear layers that re‑orient the shared relational space to the needs of a downstream application, reducing the need for full model re‑training.
- Safety & bias mitigation: Since the relational geometry is stable, systematic errors (e.g., mis‑ranking large numbers) can be identified and corrected at the relational level rather than hunting through task‑specific weights.
- Cross‑modal extensions: The same geometric principles could be applied to non‑numeric concepts (e.g., dates, units, code tokens), opening avenues for more coherent multi‑task LLM deployments.
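A minimal sketch of the "learn once, remap linearly" adapter idea, assuming frozen proxy-task and target-task embeddings that share an underlying numeric structure. The adapter is just a least-squares linear map fitted on a few anchor numbers; every name and dimension here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

d = 12
numbers = np.arange(1, 41, dtype=float)
# Shared relational structure: magnitude and parity, embedded in d dims.
shared = np.stack([numbers, numbers % 2], axis=1) @ rng.normal(size=(2, d))

proxy = shared @ rng.normal(size=(d, d))   # frozen proxy-task embeddings
target = shared @ rng.normal(size=(d, d))  # frozen target-task embeddings

# Fit the adapter on the first 20 anchor numbers, apply it to the rest.
train = slice(0, 20)
adapter, *_ = np.linalg.lstsq(proxy[train], target[train], rcond=None)

err = np.linalg.norm(proxy[20:] @ adapter - target[20:]) / np.linalg.norm(target[20:])
print(f"relative error on held-out numbers: {err:.2e}")
```

Because both embedding sets are linear images of the same low-rank structure, the adapter generalizes to numbers it never saw, which is the property that would make such task adapters cheap to train.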
Limitations & Future Work
- Scope limited to numbers: While numbers provide a clean testbed, it remains unclear how well the relational‑geometry hypothesis scales to abstract or highly contextual concepts.
- Linear assumption: The analysis hinges on linear mappings; non‑linear transformations might be required for more complex relational structures.
- Static probing: Probes were trained post‑hoc; integrating relational constraints directly into the training objective could yield stronger guarantees.
- Dataset bias: The tasks used relatively simple, synthetic prompts; real‑world numeric language (e.g., financial reports) may introduce noise that disrupts the clean geometry.
- Future directions: Extending the framework to multimodal models (vision‑language), exploring hierarchical relational axes, and designing training regimes that explicitly preserve relational geometry across tasks.
Authors
- Zhimin Hu
- Lanhao Niu
- Sashank Varma
Paper Information
- arXiv ID: 2602.06843v1
- Categories: cs.CL, cs.AI
- Published: February 6, 2026