[Paper] STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations
Source: arXiv - 2606.05165v1
Overview
The paper introduces STRIDE (Steering‑based Training Data Influence Decomposition), a method for answering the question “Which training examples caused a model to make a particular prediction?” Instead of repeatedly retraining large language models or tracking billions of gradient vectors, STRIDE works in the model’s activation space and frames the problem as a sparse‑recovery task, much like compressive sensing. The authors report that this approach attributes influence for LLMs up to 13× faster than prior methods while delivering state‑of‑the‑art accuracy.
Key Contributions
- Activation‑space formulation: Shifts the TDA problem from parameter‑level gradients to functional changes in hidden activations, avoiding costly gradient bookkeeping.
- Steering operators: Learns lightweight, data‑subset‑specific linear operators that “steer” model behavior the way the corresponding training examples would.
- Sparse recovery pipeline: Casts influence recovery as a compressive‑sensing problem, enabling efficient extraction of a few highly influential training points from millions of candidates.
- Speed & scalability: Achieves a 13× speed‑up over the best existing TDA baselines on LLM pre‑training datasets (hundreds of millions of tokens).
- Real‑world validation: Shows downstream utility in data selection, contamination detection, and qualitative model‑behavior analysis.
Methodology
- Subset Perturbation:
  - Randomly sample small subsets of the training corpus.
  - Fine‑tune the LLM on each subset for a few steps and record the resulting change in hidden activations (not the final weights).
- Learning Steering Operators:
  - For each subset j, train a tiny linear map S_j (the steering operator) that predicts how the subset would shift the model’s activation vectors for any input.
  - Because the map is linear and low‑dimensional, it can be stored and applied cheaply.
- Sparse Recovery (Compressive Sensing):
  - Given a test input with activation vector h, compute the observed activation shift Δ caused by the full training set (approximated via a single forward pass).
  - Model Δ ≈ Σ_j w_j S_j h, where S_j h is the shift that steering operator S_j predicts for this input and w_j is a scalar coefficient indicating how much subset j contributed.
  - Solve for the sparsest coefficient vector w (e.g., using L1‑regularized least squares). The non‑zero entries point to the most responsible subsets.
- Attribution Extraction:
  - Map the selected subset coefficients back to individual training instances (since each subset is a known collection of examples).
  - Rank the examples by their recovered influence scores.
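As a rough illustration of the steering-operator step, the sketch below fits one linear map from base-model activations to the activation shifts observed after a brief fine-tune on one subset. Everything here is an assumption made for illustration: the function name `fit_steering_operator`, the closed-form ridge fit, and the synthetic activations are hypothetical, and the paper's operators may be parameterized or trained differently (for example, in low-rank form).

```python
import numpy as np

def fit_steering_operator(base_acts, shifted_acts, ridge=1e-3):
    """Fit a linear map S so that S @ h approximates the activation shift for h.

    base_acts:    (n_inputs, d) base-model activations on some probe inputs
    shifted_acts: (n_inputs, d) activations after fine-tuning on one subset
    ridge:        small L2 penalty (an assumption, keeps the solve stable)
    """
    H = np.asarray(base_acts, dtype=float)
    D = np.asarray(shifted_acts, dtype=float) - H  # observed activation shifts
    d = H.shape[1]
    # Closed-form ridge regression: S^T = (H^T H + ridge * I)^{-1} H^T D
    S_T = np.linalg.solve(H.T @ H + ridge * np.eye(d), H.T @ D)
    return S_T.T

# Synthetic check: the shifts are generated by a known linear map.
rng = np.random.default_rng(0)
d, n = 8, 200
true_S = 0.1 * rng.normal(size=(d, d))
H = rng.normal(size=(n, d))
shifted = H + H @ true_S.T          # row i is h_i + true_S @ h_i
S_hat = fit_steering_operator(H, shifted, ridge=1e-8)
max_err = np.abs(S_hat - true_S).max()
print(f"max abs error in recovered operator: {max_err:.2e}")
```

On noiseless synthetic data the fitted map should match the generating map closely; real activation shifts are only approximately linear, so the fit would be correspondingly looser.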
The whole pipeline requires only a handful of forward passes and a small linear solve—no full model retraining or gradient storage across billions of parameters.
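The sparse-recovery and attribution steps can be sketched on synthetic data as follows. This is a minimal illustration, not the authors' implementation: the solver (plain ISTA for L1-regularized least squares), the penalty `lam`, the random dictionary standing in for the predicted shifts S_j h, and the rule of summing subset coefficients to score individual examples are all assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_lasso(A, delta, lam=0.01, n_iter=3000):
    """Minimize 0.5 * ||A w - delta||^2 + lam * ||w||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - delta)
        w = soft_threshold(w - step * grad, step * lam)
    return w

def example_scores(w, subsets, n_examples):
    """Score each example by summing the coefficients of the subsets containing it."""
    scores = np.zeros(n_examples)
    for coef, members in zip(w, subsets):
        np.add.at(scores, members, coef)
    return scores

rng = np.random.default_rng(0)
d, m = 80, 200                      # activation dimension, number of subsets
# Column j stands in for S_j @ h, the shift predicted by subset j's operator.
A = rng.normal(size=(d, m)) / np.sqrt(d)
w_true = np.zeros(m)
w_true[[10, 11, 50]] = [1.0, 0.8, 0.6]   # only three subsets actually matter
delta = A @ w_true                  # observed activation shift

w_hat = ista_lasso(A, delta)
top_subsets = np.argsort(-np.abs(w_hat))[:3]

# Hypothetical subset layout: subset j contains examples j and j + 1.
subsets = [[j, j + 1] for j in range(m)]
scores = example_scores(w_hat, subsets, n_examples=m + 1)
ranking = np.argsort(-scores)
print("top subsets:", sorted(top_subsets.tolist()))
print("top example:", int(ranking[0]))
```

Because there are fewer measurements (80) than unknowns (200), ordinary least squares is underdetermined here; the L1 penalty is what lets the three active subsets be picked out. Example 11 ranks first in this toy layout because it belongs to two of the active subsets.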
Results & Findings
| Metric | STRIDE | Prior Gradient‑Based TDA | Speedup |
|---|---|---|---|
| Top‑5 attribution accuracy (on a held‑out benchmark) | 92.3 % | 84.7 % | – |
| Mean absolute error in influence score | 0.07 | 0.15 | – |
| Runtime per query (A100 GPU) | 0.42 s | 5.5 s | 13× |
| Memory footprint (activation storage) | ~150 MB | >2 GB | – |
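The ratios implied by the table can be checked with a few lines of arithmetic. The values are copied from the rows above; the memory ratio is only a lower bound because the baseline is reported as ">2 GB", and the conversion assumes 1 GB = 1024 MB.

```python
stride_runtime_s = 0.42
baseline_runtime_s = 5.5
speedup = baseline_runtime_s / stride_runtime_s      # runtime ratio per query

accuracy_gain_pts = 92.3 - 84.7                      # top-5 accuracy, percentage points
mae_ratio = 0.15 / 0.07                              # baseline error relative to STRIDE
memory_ratio_lower_bound = (2 * 1024) / 150          # >2 GB vs ~150 MB

print(f"speedup: {speedup:.1f}x")
print(f"accuracy gain: {accuracy_gain_pts:.1f} points")
print(f"MAE ratio: {mae_ratio:.1f}x")
print(f"memory ratio: more than {memory_ratio_lower_bound:.1f}x")
```

The runtime ratio works out to roughly 13.1, which is consistent with the 13× figure in the Speedup column.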
Key takeaways
- STRIDE matches or exceeds the precision of gradient‑based baselines while dramatically reducing compute and memory demands.
- The sparse recovery step reliably isolates a handful of training examples that, when removed, cause a measurable drop in the target prediction’s confidence.
- In downstream tasks, using STRIDE to prune low‑influence data improved fine‑tuning speed by ~18 % without hurting accuracy.
Practical Implications
- Debugging LLM behavior: Quickly pinpoint which training sentences introduced a hallucination or bias, enabling targeted data cleaning.
- Data‑centric AI pipelines: Integrate STRIDE into data selection loops—retain only high‑influence examples for future pre‑training, reducing dataset size and training cost.
- Intellectual property & compliance: Audit whether proprietary or copyrighted text contributed to a model’s output, supporting legal defensibility.
- Contamination detection: Attribute unexpected outputs to specific training shards, allowing security teams to spot data poisoning or inadvertent leakage.
- Tooling integration: Because STRIDE operates on activations, it can be wrapped as a lightweight service (e.g., a REST API) alongside existing inference stacks, offering on‑demand attribution without heavyweight retraining.
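The data-selection idea above can be sketched as a simple filter over per-example influence scores. This is a hypothetical helper, not part of STRIDE: the name `select_high_influence`, the `keep_fraction` parameter, and the assumption that scores have already been aggregated across many queries into one number per example are all illustrative.

```python
import numpy as np

def select_high_influence(scores, keep_fraction=0.5):
    """Return the indices of the highest-scoring examples, in dataset order."""
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    order = np.argsort(-scores, kind="stable")   # highest score first
    return np.sort(order[:n_keep])

# Made-up aggregated influence scores for six training examples.
scores = [0.02, 0.91, 0.00, 0.45, 0.10, 0.33]
kept = select_high_influence(scores, keep_fraction=0.5)
print(kept.tolist())
```

A fixed keep fraction is only one possible policy; a score threshold or a per-domain quota would fit the same interface.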
Limitations & Future Work
- Subset granularity: Relies on pre‑computed subsets; extremely fine‑grained attribution (down to single tokens) may require more sophisticated subset designs.
- Linear steering assumption: Modeling influence as a linear operator works well empirically but may miss higher‑order interactions in highly non‑linear regimes.
- Scalability to trillion‑parameter models: While 13× faster than prior art, the current implementation still assumes access to full activation tensors, which can become a bottleneck for the largest models.
- Future directions: Explore non‑linear steering maps, adaptive subset construction (e.g., active learning), and extending the framework to multimodal models where activations span vision and language modalities.
Authors
- Rishit Dagli
- Abir Harrasse
- Luke Zhang
- Florent Draye
- Amirali Abdullah
- Bernhard Schölkopf
- Zhijing Jin
Paper Information
- arXiv ID: 2606.05165v1
- Categories: cs.LG, cs.CL
- Published: June 3, 2026