[Paper] Activation-Based Active Learning for In-Context Learning: Challenges and Insights
Source: arXiv - 2606.05134v1
Overview
The paper investigates whether internal activations of large language models (LLMs) can be used to pick better in‑context examples for active learning. By probing the MLP (feed‑forward) outputs of Llama‑3.2‑3B and Qwen2.5‑3B, the authors test whether summary statistics of those activations, such as unusually large magnitudes, correlate with higher downstream performance. The answer is negative: activation‑based sampling does not reliably improve in‑context learning.
Key Contributions
- Comprehensive empirical study of MLP‑activation‑driven active learning across a wide range of classification and generative tasks.
- Comparison of attention‑masking strategies (full‑context vs. masked) to understand how they affect activation signals.
- Quantitative analysis of correlation between activation statistics (magnitude, variance, skewness, kurtosis) and example quality, showing a maximum Spearman ρ of 0.33.
- Negative result: activation‑based sampling shows no reliable gain over random selection, so the authors advise against using it for in‑context learning with current models.
- Insightful hypothesis linking the failure to the phenomenon of superposition in transformer representations, and a suggestion to explore Sparse Autoencoders (SAEs) as a next step.
Methodology
- Models & Datasets – Experiments run on two 3‑billion‑parameter LLMs (Llama‑3.2‑3B, Qwen2.5‑3B) covering several benchmark classification (e.g., SST‑2, AGNews) and generative (e.g., story continuation) datasets.
- Activation Extraction – For each candidate in‑context example, the authors record the output of every MLP block (the feed‑forward layer after attention). They compute four statistical moments: mean (the magnitude, or “massive activation”, signal), variance, skewness, and kurtosis.
- Active‑Learning Strategies:
  - Random baseline – uniformly sample examples.
  - Activation‑based sampling – rank candidates by each moment (or a combination) and select the top‑k.
  - Masking variants – test whether masking future tokens during activation capture changes the signal.
- Evaluation – In‑context performance measured by accuracy (classification) or BLEU/ROUGE (generation). Correlation between activation scores and resulting performance is computed using Spearman’s ρ.
The pipeline is deliberately lightweight: no fine‑tuning, only forward passes to collect activations, making it reproducible for other practitioners.
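The moment computation and top‑k ranking described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each candidate's MLP outputs have already been flattened into one list of floats, and the names `activation_moments`, `select_top_k`, and `toy_pool` are hypothetical. How the paper pools activations across tokens and layers is not specified in this summary.

```python
import math
from typing import Dict, List, Sequence


def activation_moments(values: Sequence[float]) -> Dict[str, float]:
    """Mean, population variance, skewness, and excess kurtosis of a flat vector."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one activation value")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        # Constant vector: standardized moments are undefined, so report 0.0.
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    skewness = sum(c ** 3 for c in centered) / (n * variance ** 1.5)
    kurtosis = sum(c ** 4 for c in centered) / (n * variance ** 2) - 3.0
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def select_top_k(
    candidates: Dict[str, Sequence[float]], moment: str, k: int
) -> List[str]:
    """Rank candidates by one moment (highest first) and keep the top k names."""
    scores = {name: activation_moments(acts)[moment] for name, acts in candidates.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical toy "activations"; real ones would come from model forward passes.
toy_pool = {
    "example_a": [0.0, 0.0, 0.0, 8.0],
    "example_b": [1.0, 1.0, 1.0, 1.0],
    "example_c": [3.0, 3.0, 3.0, 3.0],
}
top_by_mean = select_top_k(toy_pool, "mean", k=2)
top_by_variance = select_top_k(toy_pool, "variance", k=1)
```

With real models, the per‑example vectors would typically be captured with forward hooks on each MLP module; that step is omitted here to keep the sketch dependency‑free.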
Results & Findings
| Metric | Random Sampling | Best Activation‑Based Sampling |
|---|---|---|
| Classification accuracy (average) | 78.4 % | 78.9 % |
| Generation BLEU (average) | 21.3 | 21.5 |
| Max Spearman ρ (any moment) | n/a | 0.33 |
Key takeaways
- No meaningful gain: Even the best‑performing activation‑based selector barely edges out random sampling, and the improvement is not statistically significant.
- Weak correlation: The strongest monotonic relationship between any activation statistic and downstream performance is a modest ρ ≈ 0.33, indicating that large activations do not reliably flag “good” examples.
- Masking effect negligible: Changing attention masks does not materially alter the correlation, suggesting the issue is intrinsic to the representation rather than the context window.
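For readers unfamiliar with Spearman's ρ, the correlation cited above is the Pearson correlation of the two rank vectors, so it measures monotonic rather than linear association. The sketch below is a plain‑Python illustration with tie handling by average ranks; the function names are hypothetical and the inputs are toy numbers, not data from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0.0 or ss_y == 0.0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Toy check: a monotonic but nonlinear relationship still yields rho = 1.
rho_monotonic = spearman_rho([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 100.0])
rho_reversed = spearman_rho([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
```

In practice, `scipy.stats.spearmanr` computes the same statistic and also returns a p‑value.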
Practical Implications
- Skip activation‑based heuristics: For developers building retrieval‑augmented generation or few‑shot pipelines, relying on raw MLP activation magnitudes is unlikely to yield better prompts.
- Stick to proven selectors: Simpler methods—semantic similarity, diversity sampling, or even random selection—remain competitive and are far easier to implement.
- Resource budgeting: Since extracting activations adds compute overhead without payoff, teams can allocate those cycles to more promising strategies (e.g., embedding‑based nearest‑neighbor retrieval).
- Future tooling direction: The paper points toward Sparse Autoencoders as a way to disentangle superposed features. If SAEs can surface interpretable latent factors, they might become the next “activation‑based” signal worth engineering.
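As a point of comparison, the embedding‑based nearest‑neighbor retrieval mentioned above can be as simple as ranking the candidate pool by cosine similarity to the query. The sketch below assumes embeddings are already available as lists of floats; the function names and the two‑dimensional toy vectors are hypothetical, and a real pipeline would obtain embeddings from a sentence‑embedding model.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_examples(
    query: Sequence[float], pool: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Return the names of the k pool entries most similar to the query."""
    ranked = sorted(
        pool.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical 2-D embeddings, chosen only to make the ranking easy to follow.
toy_embeddings = {
    "pos_review": [1.0, 0.0],
    "neg_review": [0.0, 1.0],
    "mixed_review": [1.0, 1.0],
}
selected = nearest_examples([0.9, 0.1], toy_embeddings, k=2)
```

Unlike activation extraction, this requires only one embedding per candidate, which can be computed once and cached.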
Limitations & Future Work
- Model scale: Experiments are limited to 3B‑parameter models; larger LLMs might exhibit different activation dynamics.
- Feature scope: Only MLP outputs were examined; attention heads, token‑level embeddings, or cross‑layer interactions could carry richer signals.
- Superposition hypothesis: The claim that superposition blurs activation relevance is plausible but not formally validated; dedicated probing studies are needed.
- Alternative compressions: The authors suggest exploring Sparse Autoencoders, but concrete experiments are left for future work.
Overall, the study serves as a valuable reality check for anyone hoping to “hack” in‑context learning by peeking inside transformer activations. It reminds us that not every internal signal translates into a practical utility—yet it also opens a promising research avenue toward more structured, disentangled representations.
Authors
- Yaseen M. Osman
- Geoff V. Merrett
- Stuart E. Middleton
Paper Information
- arXiv ID: 2606.05134v1
- Categories: cs.CL, cs.LG
- Published: June 3, 2026