[Paper] Neuron Populations Exhibit Divergent Selectivity with Scale
Source: arXiv - 2606.03990v1
Overview
The paper investigates how individual neurons in large language and vision models change as the models get bigger. By focusing on “Rosetta Neurons”—a set of neurons that behave consistently across independently trained models—the authors uncover a scaling law that governs both the quantity of these interpretable neurons and their selectivity. The findings suggest that as models grow, a shrinking‑but‑still‑significant fraction of neurons become highly specialized, while the majority remain more generic.
Key Contributions
- Discovery of a sub‑linear power‑law relationship between model size and the absolute number of Rosetta Neurons, so their count grows more slowly than the total neuron count.
- Neuron Polarization Effect: Rosetta Neurons become increasingly monosemantic (single‑concept) with scale, while the rest of the network stays less selective.
- Analytical capacity‑utility model that explains why the scaling law and polarization emerge from a trade‑off between feature usefulness and limited neuron capacity.
- Empirical validation on language models up to 30 B parameters and vision models up to 5 B parameters.
- Domain‑specialization case study showing how Rosetta Neurons can be leveraged to filter data for continued pre‑training, improving efficiency.
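The link between a sub‑linear count and a shrinking fraction can be made explicit. As a small worked step, assume the power law has a constant prefactor \(c\) and an exponent \(\alpha\) with \(0 < \alpha < 1\) (both symbols are introduced here for illustration and are not taken from the paper):

$$
N_{\text{Rosetta}} = c\,N_{\text{total}}^{\alpha}
\quad\Longrightarrow\quad
\frac{N_{\text{Rosetta}}}{N_{\text{total}}} = c\,N_{\text{total}}^{\alpha - 1}
$$

Because \(\alpha - 1 < 0\), the fraction of Rosetta Neurons falls as the model grows even though their absolute number rises.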
Methodology
- Identify Rosetta Neurons: The authors reuse a previously defined set of neurons whose activation patterns are highly reproducible across different training runs of the same architecture.
- Scale Sweep: They train or fine‑tune families of transformer‑based language models (≈ 100 M → 30 B parameters) and convolutional or transformer‑based vision models (≈ 50 M → 5 B parameters).
- Counting & Fraction Analysis: For each model, they count how many neurons belong to the Rosetta set and compute the fraction relative to the total hidden units.
- Selectivity Measurement: Using probing tasks (e.g., concept activation, feature attribution) they quantify how “monosemantic” each neuron is—i.e., how strongly it responds to a single interpretable concept.
- Analytical Model: They formulate a simple optimization problem that balances the utility of a feature (how much it reduces loss) against the cost of allocating a neuron to represent it, deriving a power‑law solution.
- Domain‑Specialization Demo: By filtering a pre‑training corpus with the strongest‑activating Rosetta Neurons, they show that continued training on the filtered data yields comparable performance with fewer tokens.
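The summary above does not spell out how reproducible neurons are matched across training runs. One plausible approach, sketched below, is to correlate activations on a shared set of probe inputs and keep pairs that are each other's best match. The Pearson criterion, the mutual-best-match rule, the 0.8 threshold, and the names `pearson` and `mutual_best_matches` are all assumptions made for illustration, not the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mutual_best_matches(acts_a, acts_b, threshold=0.8):
    """Return (i, j, r) for neuron pairs that are each other's best match.

    acts_a[i] and acts_b[j] hold the activations of neuron i (run A) and
    neuron j (run B) on the same ordered list of probe inputs.
    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    corr = [[pearson(a, b) for b in acts_b] for a in acts_a]
    # Best partner in run B for every neuron in run A, and vice versa.
    best_for_a = [max(range(len(acts_b)), key=lambda j: row[j]) for row in corr]
    best_for_b = [
        max(range(len(acts_a)), key=lambda i: corr[i][j])
        for j in range(len(acts_b))
    ]
    matches = []
    for i, j in enumerate(best_for_a):
        if best_for_b[j] == i and corr[i][j] >= threshold:
            matches.append((i, j, corr[i][j]))
    return matches


# Toy activations: three neurons per run, four probe inputs.
run_a = [[0, 1, 2, 3], [1, 0, 1, 0], [3, 1, 4, 1]]
run_b = [[2, 0, 2, 0], [0, 2, 4, 6], [5, 5, 5, 5]]
matches = mutual_best_matches(run_a, run_b)
```

In this toy data, neuron 2 of run A correlates strongly with neuron 0 of run B, but it is not that neuron's best partner, so the mutual rule rejects it. Requiring agreement in both directions is one simple way to avoid counting one-sided matches.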
Results & Findings
| Model family | Parameters (B) | Rosetta neurons (count) | Rosetta fraction of all neurons | Selectivity trend |
|---|---|---|---|---|
| Language | 0.1 → 30 | ~200 → ~1,200 (↑) | 0.5 % → 0.1 % (↓) | ↑ (monosemantic) |
| Vision | 0.05 → 5 | ~80 → ~450 (↑) | 0.4 % → 0.08 % (↓) | ↑ (feature‑specific) |
- Sublinear scaling: Rosetta neuron count follows \(N_{\text{Rosetta}} \propto N_{\text{total}}^{0.6}\), a power law with exponent below 1.
- Polarization: The average selectivity score of Rosetta neurons grows ~3× across the scale sweep, while non‑Rosetta neurons show negligible change.
- Analytical fit: The capacity‑utility model predicts the observed exponent (≈ 0.55–0.65) and the widening gap between the two neuron populations.
- Domain‑specialization: Filtering pre‑training data with the top‑10 Rosetta neurons reduces required token count by ~15 % while preserving downstream accuracy on GLUE and ImageNet benchmarks.
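As a rough sanity check on the sub‑linear trend, the rounded endpoints in the table can be turned into a two-point exponent estimate. The sketch below recovers the implied total neuron count from each count and fraction, then takes the slope on log-log axes. The helper names are hypothetical, and a two-point estimate from rounded values is much coarser than a fit over a full scale sweep, so it need not match the reported exponent exactly.

```python
import math


def implied_total(rosetta_count, rosetta_fraction):
    """Total neuron count implied by a Rosetta count and its fraction of all neurons."""
    return rosetta_count / rosetta_fraction


def two_point_exponent(total_small, total_large, count_small, count_large):
    """Slope of log(count) against log(total) between two model sizes."""
    return math.log(count_large / count_small) / math.log(total_large / total_small)


# Rounded endpoints taken from the results table (fractions written as decimals).
language_exponent = two_point_exponent(
    implied_total(200, 0.005), implied_total(1200, 0.001), 200, 1200
)
vision_exponent = two_point_exponent(
    implied_total(80, 0.004), implied_total(450, 0.0008), 80, 450
)

print(f"language: {language_exponent:.2f}, vision: {vision_exponent:.2f}")
```

Both estimates land a little above 0.5, clearly below 1 and somewhat under the reported 0.55–0.65 range, a gap that may simply reflect the rounding of the table entries.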
Practical Implications
- Model debugging & interpretability: Knowing that a predictable, shrinking core of highly selective neurons exists lets engineers focus attribution tools on a manageable subset, speeding up root‑cause analysis.
- Efficient fine‑tuning: Rosetta neurons can serve as “semantic anchors” to guide low‑resource adaptation—e.g., by aligning them with domain‑specific concepts, developers can achieve better performance with fewer training steps.
- Data curation: The case study demonstrates a concrete workflow: run a quick Rosetta‑neuron probe on a large corpus, filter out low‑relevance examples, and continue pre‑training on the curated set, saving compute and storage.
- Hardware allocation: Since the majority of neurons remain generic, hardware designers could consider heterogeneous architectures, such as dedicated “interpretability cores”, that handle the Rosetta subset differently, for example with higher precision or more monitoring.
- Model compression: The polarization effect suggests that pruning strategies could treat Rosetta and non‑Rosetta neurons differently—preserving the highly selective ones while aggressively compressing the bulk.
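The data-curation workflow described above can be sketched in a few lines. This is a minimal illustration that assumes the activations of the selected Rosetta neurons have already been computed for every document. The function name `filter_corpus`, the use of the maximum activation as a relevance score, and the `keep_fraction` parameter are all hypothetical choices, not details from the paper.

```python
import math


def filter_corpus(docs, neuron_scores, keep_fraction):
    """Keep the most relevant fraction of documents, preserving corpus order.

    neuron_scores[d] lists the activations of the selected neurons on docs[d].
    Relevance is taken to be the strongest single activation (an assumption).
    """
    if len(docs) != len(neuron_scores):
        raise ValueError("docs and neuron_scores must have the same length")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    relevance = [max(scores) for scores in neuron_scores]
    n_keep = max(1, math.ceil(keep_fraction * len(docs)))
    # Rank document indices from most to least relevant, then keep the top slice.
    ranked = sorted(range(len(docs)), key=lambda d: relevance[d], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [docs[d] for d in kept]


# Toy corpus with two selected neurons per document.
corpus = ["doc a", "doc b", "doc c", "doc d"]
scores = [[0.1, 0.2], [0.9, 0.0], [0.3, 0.8], [0.0, 0.05]]
curated = filter_corpus(corpus, scores, keep_fraction=0.5)
```

Ranking by a fraction rather than a fixed activation threshold keeps the size of the curated set predictable, which makes it easier to compare token budgets between filtered and unfiltered training runs.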
Limitations & Future Work
- Model families: The study focuses on transformer‑style language models and a specific class of vision models; scaling behavior may differ for diffusion, graph, or reinforcement‑learning architectures.
- Rosetta definition dependence: The set of Rosetta neurons is derived from reproducibility across runs; alternative definitions of “interpretable” neurons could yield different scaling trends.
- Task coverage: Selectivity is measured mainly on probing tasks; it remains unclear how these neurons affect generation quality, robustness, or out‑of‑distribution performance.
- Analytical simplifications: The capacity‑utility model abstracts away many training dynamics (e.g., optimizer effects, regularization), so its predictive power for novel architectures is untested.
- Future directions: Extending the analysis to trillion‑parameter models, exploring how Rosetta neurons evolve during continual learning, and integrating the findings into automated model‑design pipelines are promising next steps.
Authors
- Amil Dravid
- Yasaman Bahri
- Alexei A. Efros
- Yossi Gandelsman
Paper Information
- arXiv ID: 2606.03990v1
- Categories: cs.LG, cs.CL, cs.CV
- Published: June 2, 2026