[Paper] STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Published: June 3, 2026 at 01:59 PM EDT

Source: arXiv - 2606.05165v1

Overview

The paper introduces STRIDE (Steering‑based Training Data Influence Decomposition), a new way to answer the question “Which training examples caused a model to make a particular prediction?” Instead of repeatedly retraining massive language models or tracking billions of gradient vectors, STRIDE works in the model’s activation space and frames the problem as a sparse‑recovery task—much like compressive sensing. The authors demonstrate that this approach can attribute influence for large‑scale LLMs up to 13× faster than prior methods while delivering state‑of‑the‑art accuracy.

Key Contributions

Activation‑space formulation: Shifts the TDA problem from parameter‑level gradients to functional changes in hidden activations, avoiding costly gradient bookkeeping.
Steering operators: Learns lightweight, data‑subset‑specific linear operators that “steer” model behavior the way the corresponding training examples would.
Sparse recovery pipeline: Casts influence recovery as a compressive‑sensing problem, enabling efficient extraction of a few highly influential training points from millions of candidates.
Speed & scalability: Achieves a 13× speed‑up over the best existing TDA baselines on LLM pre‑training datasets (hundreds of millions of tokens).
Real‑world validation: Shows downstream utility in data selection, contamination detection, and qualitative model‑behavior analysis.

Methodology

Subset Perturbation:
- Randomly sample small subsets of the training corpus.
- Starting from the same frozen base checkpoint, fine‑tune a copy of the LLM on each subset for a few steps and record the resulting change in hidden activations (not the final weights).
Learning Steering Operators:
- For each subset j, train a small linear map S_j (the steering operator) that predicts how the subset would shift the model’s activation vectors for any input.
- Because the map is linear and low‑dimensional, it can be stored and applied cheaply.
Sparse Recovery (Compressive Sensing):
- Given a test input, compute the observed activation shift Δ caused by the full training set (approximated via a single forward pass).
- Model Δ ≈ Σ_j w_j S_j(h), where h is the test input’s activation vector, S_j(h) is the shift predicted by the steering operator for subset j, and w_j is a scalar coefficient indicating how much that subset contributed.
- Solve for the sparsest coefficient vector w (e.g., using L1‑regularized least squares). The non‑zero entries identify the most responsible subsets.
Attribution Extraction:
- Map the selected subset coefficients back to individual training instances (since each subset is a known collection of examples).
- Rank the examples by their recovered influence scores.
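
As a rough illustration of the "Learning Steering Operators" step, the sketch below fits a linear map by ridge regression on synthetic activations. The function name fit_steering_operator, the ridge penalty, and the full d×d parameterization are assumptions made for illustration; this summary does not say how the paper trains or parameterizes its operators.

```python
import numpy as np

def fit_steering_operator(H_base, H_tuned, ridge=1e-6):
    """Fit a linear map S so that H_base @ S approximates the activation shift.

    H_base, H_tuned: (n_inputs, d) activations before and after a brief
    fine-tune on one training subset (hypothetical inputs for this sketch).
    """
    delta = H_tuned - H_base                        # observed shifts, (n_inputs, d)
    d = H_base.shape[1]
    gram = H_base.T @ H_base + ridge * np.eye(d)    # regularized normal equations
    return np.linalg.solve(gram, H_base.T @ delta)  # steering operator, (d, d)

# Synthetic check: activations shifted by a known linear map.
rng = np.random.default_rng(0)
n, d = 200, 8
H_base = rng.normal(size=(n, d))
S_true = 0.1 * rng.normal(size=(d, d))
H_tuned = H_base + H_base @ S_true

S_hat = fit_steering_operator(H_base, H_tuned)
print("max abs error:", np.abs(S_hat - S_true).max())
```

Once fitted, the operator is applied to a new input's activation h as h @ S_hat, which costs one small matrix product.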

At query time, the pipeline requires only a handful of forward passes and a small linear solve, with no full model retraining or gradient storage across billions of parameters.
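
The sparse recovery and attribution steps can be sketched on synthetic data as follows. Each column of A stands for the activation shift that one subset's steering operator predicts for the test input, and delta is the observed shift. The solver (plain ISTA for L1‑regularized least squares), the problem sizes, and the rule that spreads subset weights onto examples by summing over the subsets containing them are all assumptions for illustration, not the paper's confirmed procedure.

```python
import numpy as np

def ista_lasso(A, y, lam=0.1, n_iter=500):
    """Minimize 0.5 * ||A w - y||^2 + lam * ||w||_1 with ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = w - A.T @ (A @ w - y) / L      # gradient step on the quadratic term
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
d, n_subsets, n_examples = 64, 200, 100

# Hypothetical subset membership: subsets[j] holds the example ids in subset j.
subsets = [rng.choice(n_examples, size=10, replace=False) for _ in range(n_subsets)]

# Column j: shift predicted by steering operator j for the test input.
A = rng.normal(size=(d, n_subsets)) / np.sqrt(d)

# Ground truth for the synthetic test: only two subsets matter.
w_true = np.zeros(n_subsets)
w_true[[3, 17]] = [1.0, 0.6]
delta = A @ w_true + 0.01 * rng.normal(size=d)

w_hat = ista_lasso(A, delta)

# Assumed mapping: an example's score is the sum of the weights of its subsets.
scores = np.zeros(n_examples)
for j, members in enumerate(subsets):
    scores[members] += w_hat[j]
ranking = np.argsort(-scores)              # most influential examples first

print("largest subset weights:", np.argsort(-np.abs(w_hat))[:2])
print("top-ranked examples:", ranking[:5])
```

There are more unknowns (200 subsets) than measurements (64 activation dimensions) here, which is the regime where the sparsity assumption does the work.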

Results & Findings

Metric	STRIDE	Prior Gradient‑Based TDA	Speedup
Top‑5 attribution accuracy (on a held‑out benchmark)	92.3 %	84.7 %	–
Mean absolute error in influence score	0.07	0.15	–
Runtime per query (GPU‑A100)	0.42 s	5.5 s	13×
Memory footprint (activation storage)	~150 MB	>2 GB	–
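
The ratios implied by the table can be checked with a few lines of arithmetic. The variable names are illustrative, and treating ">2 GB" as 2000 MB is an assumption that yields only a lower bound on the memory ratio.

```python
# Figures copied from the table above.
stride_runtime_s = 0.42
baseline_runtime_s = 5.5
stride_mem_mb = 150
baseline_mem_mb = 2000   # assumption: ">2 GB" read as at least 2000 MB

speedup = baseline_runtime_s / stride_runtime_s   # about 13.1
mem_ratio = baseline_mem_mb / stride_mem_mb       # at least about 13.3
top5_gain_pts = 92.3 - 84.7                       # percentage points
mae_reduction = 1 - 0.07 / 0.15                   # relative reduction in error

print(f"speedup: {speedup:.1f}x, memory ratio: >= {mem_ratio:.1f}x")
print(f"top-5 gain: {top5_gain_pts:.1f} pts, MAE reduction: {mae_reduction:.0%}")
```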

Key takeaways

STRIDE matches or exceeds the precision of gradient‑based baselines while dramatically reducing compute and memory demands.
The sparse recovery step reliably isolates a handful of training examples that, when removed, cause a measurable drop in the target prediction’s confidence.
In downstream tasks, using STRIDE to prune low‑influence data sped up fine‑tuning by ~18 % without hurting accuracy.
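
One way the pruning mentioned above might look in code is sketched below. The helper prune_low_influence, the choice to aggregate by mean absolute score across queries, and the keep_fraction parameter are hypothetical; the summary does not describe the paper's actual selection rule.

```python
import numpy as np

def prune_low_influence(scores, keep_fraction=0.8):
    """Return sorted indices of the training examples to keep.

    scores: (n_queries, n_examples) influence scores, one row per query.
    """
    importance = np.abs(scores).mean(axis=0)      # aggregate over queries
    n_keep = max(1, int(round(keep_fraction * scores.shape[1])))
    keep = np.argsort(-importance)[:n_keep]       # most influential first
    return np.sort(keep)

# Toy scores for 2 queries over 4 training examples.
scores = np.array([[0.9, 0.0, -0.2, 0.01],
                   [0.7, 0.0,  0.3, 0.02]])
kept = prune_low_influence(scores, keep_fraction=0.5)
print(kept)  # prints [0 2]
```

Using absolute values keeps examples with strong negative influence as well; whether that is desirable depends on the downstream goal.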

Practical Implications

Debugging LLM behavior: Quickly pinpoint which training sentences introduced a hallucination or bias, enabling targeted data cleaning.
Data‑centric AI pipelines: Integrate STRIDE into data selection loops—retain only high‑influence examples for future pre‑training, reducing dataset size and training cost.
Intellectual property & compliance: Audit whether proprietary or copyrighted text contributed to a model’s output, supporting legal defensibility.
Contamination detection: Attribute unexpected outputs to specific training shards, allowing security teams to spot data poisoning or inadvertent leakage.
Tooling integration: Because STRIDE operates on activations, it can be wrapped as a lightweight service (e.g., a REST API) alongside existing inference stacks, offering on‑demand attribution without heavyweight retraining.

Limitations & Future Work

Subset granularity: Relies on pre‑computed subsets; extremely fine‑grained attribution (down to single tokens) may require more sophisticated subset designs.
Linear steering assumption: Modeling influence as a linear operator works well empirically but may miss higher‑order interactions in highly non‑linear regimes.
Scalability to trillion‑parameter models: While 13× faster than prior art, the current implementation still assumes access to full activation tensors, which can become a bottleneck for the largest models.
Future directions: Explore non‑linear steering maps, adaptive subset construction (e.g., active learning), and extending the framework to multimodal models where activations span vision and language modalities.

Authors

Rishit Dagli
Abir Harrasse
Luke Zhang
Florent Draye
Amirali Abdullah
Bernhard Schölkopf
Zhijing Jin

Paper Information

arXiv ID: 2606.05165v1
Categories: cs.LG, cs.CL
Published: June 3, 2026
