[Paper] Activation-Based Active Learning for In-Context Learning: Challenges and Insights

Published: June 3, 2026 at 01:39 PM EDT

Source: arXiv - 2606.05134v1

Overview

The paper investigates whether the internal activations of large language models (LLMs) can be used to pick better in-context examples for active learning. By probing the MLP (feed-forward) outputs of Llama-3.2-3B and Qwen2.5-3B, the authors test whether large or unusually distributed activation patterns correlate with higher downstream performance. The answer is no: activation-based sampling does not reliably improve in-context learning.

Key Contributions

Comprehensive empirical study of MLP‑activation‑driven active learning across a wide range of classification and generative tasks.
Comparison of attention‑masking strategies (full‑context vs. masked) to understand how they affect activation signals.
Quantitative analysis of correlation between activation statistics (magnitude, variance, skewness, kurtosis) and example quality, showing a maximum Spearman ρ of 0.33.
Negative result: activation-based sampling did not reliably outperform random sampling for in-context learning on the models tested.
Insightful hypothesis linking the failure to the phenomenon of superposition in transformer representations, and a suggestion to explore Sparse Autoencoders (SAEs) as a next step.

Methodology

Models & Datasets – Experiments use two 3-billion-parameter LLMs (Llama-3.2-3B, Qwen2.5-3B) on several classification benchmarks (e.g., SST-2, AGNews) and generative tasks (e.g., story continuation).
Activation Extraction – For each candidate in-context example, the authors record the output of every MLP block (the feed-forward layer after attention). They compute four statistics of these activations: mean (as the measure of activation magnitude, i.e. massive activations), variance, skewness, and kurtosis.
Active‑Learning Strategies –
- Random baseline – uniformly sample examples.
- Activation‑based sampling – rank candidates by each moment (or a combination) and select the top‑k.
- Masking variants – test whether masking future tokens during activation capture changes the signal.
Evaluation – In‑context performance measured by accuracy (classification) or BLEU/ROUGE (generation). Correlation between activation scores and resulting performance is computed using Spearman’s ρ.

The pipeline is deliberately lightweight: no fine‑tuning, only forward passes to collect activations, making it reproducible for other practitioners.
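The scoring and selection steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: it assumes each candidate has already been reduced to a flat list of MLP activation values (how the paper pools across tokens and layers is not stated in this summary), it uses population moments with excess kurtosis, and the names `moments`, `select_top_k`, and `select_random` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def moments(values: Sequence[float]) -> Dict[str, float]:
    """Mean, population variance, skewness, and excess kurtosis of one activation vector."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        # Constant vector: higher standardized moments are undefined, report 0.0.
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    std = math.sqrt(variance)
    skewness = sum((c / std) ** 3 for c in centered) / n
    kurtosis = sum((c / std) ** 4 for c in centered) / n - 3.0  # excess kurtosis
    return {"mean": mean, "variance": variance, "skewness": skewness, "kurtosis": kurtosis}


def select_top_k(candidates: Dict[str, Sequence[float]], stat: str, k: int) -> List[str]:
    """Activation-based sampling: rank candidates by one statistic, keep the k highest."""
    scores = {name: moments(acts)[stat] for name, acts in candidates.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def select_random(candidates: Dict[str, Sequence[float]], k: int, seed: int = 0) -> List[str]:
    """Random baseline: uniformly sample k candidates without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), k)


# Toy activations (made-up numbers, purely illustrative).
toy = {
    "ex_a": [0.1, 0.2, 0.1, 0.2],
    "ex_b": [0.0, 0.0, 0.0, 4.0],
    "ex_c": [1.0, 1.0, 1.0, 1.0],
}
by_variance = select_top_k(toy, "variance", k=2)
baseline = select_random(toy, k=2)
```

In this toy data `ex_b` has one large outlier, so it ranks first by variance. The paper's finding is that rankings of this kind do not translate into better in-context examples than the random baseline.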

Results & Findings

Metric	Random Sampling	Best Activation-Based Sampling
Classification accuracy (average)	78.4%	78.9%
Generation BLEU (average)	21.3	21.5
Max Spearman ρ between any activation moment and performance	n/a	0.33

Key takeaways

No meaningful gain: Even the best‑performing activation‑based selector barely edges out random sampling, and the improvement is not statistically significant.
Weak correlation: The strongest monotonic relationship between any activation statistic and downstream performance is a modest ρ ≈ 0.33, indicating that large activations do not reliably flag “good” examples.
Masking effect negligible: Changing attention masks does not materially alter the correlation, suggesting the issue is intrinsic to the representation rather than the context window.
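For readers unfamiliar with the statistic behind the ρ ≈ 0.33 figure, Spearman's ρ is the Pearson correlation computed on ranks rather than raw values, so it measures how monotonic a relationship is. The sketch below is a plain-Python illustration with hypothetical function names (`average_ranks`, `spearman_rho`); it is not taken from the paper, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(scores: Sequence[float], performance: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(scores) != len(performance) or len(scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(scores), average_ranks(performance))


# Made-up activation scores and downstream accuracies, purely illustrative.
activation_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
downstream_accuracy = [0.62, 0.60, 0.71, 0.66, 0.74]
rho = spearman_rho(activation_scores, downstream_accuracy)
```

A value of 1 means the activation score orders the examples exactly as their downstream performance does, 0 means no monotonic relationship, and a value around 0.33 leaves most of the ordering unexplained.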

Practical Implications

Skip activation‑based heuristics: For developers building retrieval‑augmented generation or few‑shot pipelines, relying on raw MLP activation magnitudes is unlikely to yield better prompts.
Stick to proven selectors: Simpler methods (semantic similarity, diversity sampling, or even random selection) remain competitive and are far easier to implement.
Resource budgeting: Since extracting activations adds compute overhead without payoff, teams can allocate those cycles to more promising strategies (e.g., embedding‑based nearest‑neighbor retrieval).
Future tooling direction: The paper points toward Sparse Autoencoders as a way to disentangle superposed features. If SAEs can surface interpretable latent factors, they might become the next “activation‑based” signal worth engineering.

Limitations & Future Work

Model scale: Experiments are limited to 3B-parameter models; larger LLMs might exhibit different activation dynamics.
Feature scope: Only MLP outputs were examined; attention heads, token‑level embeddings, or cross‑layer interactions could carry richer signals.
Superposition hypothesis: The claim that superposition blurs activation relevance is plausible but not formally validated; dedicated probing studies are needed.
Alternative compressions: The authors suggest exploring Sparse Autoencoders, but concrete experiments are left for future work.

Overall, the study serves as a valuable reality check for anyone hoping to “hack” in-context learning by peeking inside transformer activations. It is a reminder that not every internal signal translates into practical utility, while also pointing to a promising research avenue: more structured, disentangled representations.

Authors

Yaseen M. Osman
Geoff V. Merrett
Stuart E. Middleton

Paper Information

arXiv ID: 2606.05134v1
Categories: cs.CL, cs.LG
Published: June 3, 2026
