[Paper] STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Published: June 3, 2026 at 01:59 PM EDT

Source: arXiv - 2606.05165v1

Overview

The paper introduces STRIDE (Steering‑based Training Data Influence Decomposition), a new way to answer the question “Which training examples caused a model to make a particular prediction?” Instead of repeatedly retraining massive language models or tracking billions of gradient vectors, STRIDE works in the model’s activation space and frames the problem as a sparse‑recovery task—much like compressive sensing. The authors demonstrate that this approach can attribute influence for large‑scale LLMs up to 13× faster than prior methods while delivering state‑of‑the‑art accuracy.

Key Contributions

  • Activation‑space formulation: Shifts the TDA problem from parameter‑level gradients to functional changes in hidden activations, avoiding costly gradient bookkeeping.
  • Steering operators: Learns lightweight, data‑subset‑specific linear operators that “steer” model behavior the way the corresponding training examples would.
  • Sparse recovery pipeline: Casts influence recovery as a compressive‑sensing problem, enabling efficient extraction of a few highly influential training points from millions of candidates.
  • Speed & scalability: Achieves a 13× speed‑up over the best existing TDA baselines on LLM pre‑training datasets (hundreds of millions of tokens).
  • Real‑world validation: Shows downstream utility in data selection, contamination detection, and qualitative model‑behavior analysis.

Methodology

  1. Subset Perturbation:

    • Randomly sample small subsets of the training corpus.
    • Fine‑tune a copy of the LLM on each subset for a few steps, keeping the base checkpoint fixed as the reference, and record the resulting change in hidden activations (not the final weights).
  2. Learning Steering Operators:

    • For each subset j, train a tiny linear map S_j (the steering operator) that predicts how that subset would shift the model’s activation vectors for any input.
    • Because the map is linear and low‑dimensional, it can be stored and applied cheaply.
  3. Sparse Recovery (Compressive Sensing):

    • Given a test input, compute the observed activation shift Δ caused by the full training set (approximated via a single forward pass).
    • Model Δ ≈ Σ_j w_j S_j(x), where S_j(x) is the activation shift that steering operator S_j predicts for the test input x and w_j is a scalar coefficient indicating how much subset j contributed.
    • Solve for the sparsest coefficient vector w (e.g., using L1‑regularized least squares). The non‑zero entries identify the subsets that contain the most responsible training examples.
  4. Attribution Extraction:

    • Map the selected subset coefficients back to individual training instances (since each subset is a known collection of examples).
    • Rank the examples by their recovered influence scores.
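
Step 2 can be made concrete with a minimal sketch. It assumes the steering operator is a plain d×d matrix fit by ridge regression from base activations to the activation shifts recorded for one subset. The function name `fit_steering_operator`, the ridge penalty, and the synthetic data are all hypothetical; the paper's actual parameterization and training objective may differ.

```python
import numpy as np

def fit_steering_operator(base_acts, shifts, ridge=1e-3):
    """Fit a matrix S such that base_acts @ S.T approximates shifts.

    Hypothetical illustration, not the paper's training procedure.
    base_acts: (n, d) base-model activations on n probe inputs.
    shifts:    (n, d) activation changes observed after fine-tuning on one subset.
    """
    d = base_acts.shape[1]
    # Closed-form ridge solution: S.T = (H^T H + ridge * I)^{-1} H^T Delta
    gram = base_acts.T @ base_acts + ridge * np.eye(d)
    return np.linalg.solve(gram, base_acts.T @ shifts).T

# Synthetic check: plant a known operator and see whether the fit recovers it.
rng = np.random.default_rng(0)
n, d = 200, 16
H = rng.normal(size=(n, d))                  # stand-in for base activations
S_true = 0.1 * rng.normal(size=(d, d))       # planted steering operator
Delta = H @ S_true.T + 0.01 * rng.normal(size=(n, d))  # noisy observed shifts

S_hat = fit_steering_operator(H, Delta)
rel_err = np.linalg.norm(S_hat - S_true) / np.linalg.norm(S_true)
```

Once fitted, the predicted shift for a new activation vector `h` is simply `S_hat @ h`, which is why such an operator is cheap to store and apply.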

Once the steering operators are learned, each query requires only a handful of forward passes and a small linear solve, with no full model retraining or gradient storage across billions of parameters.
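
Steps 3 and 4 can be sketched as a small L1‑regularized least‑squares problem, minimizing 0.5·‖A w − Δ‖² + λ‖w‖₁, where column j of A stands in for the shift predicted by steering operator S_j on the test input. The sketch below is illustrative only: ISTA is one standard solver chosen for brevity (the paper may use another), the data are synthetic, and scoring an example by summing |w_j| over the subsets that contain it is an assumed aggregation rule, not necessarily the authors'.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, y, lam, n_iter=3000):
    """Minimize 0.5 * ||A w - y||^2 + lam * ||w||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
d, m, n_examples, subset_size = 80, 200, 1000, 20

# Column j is a synthetic stand-in for S_j applied to the test input.
A = rng.normal(size=(d, m)) / np.sqrt(d)

# Plant three influential subsets and simulate the observed shift.
w_true = np.zeros(m)
support = np.array([7, 42, 133])
w_true[support] = [2.0, -1.5, 1.5]
delta = A @ w_true + 0.01 * rng.normal(size=d)

# Step 3: sparse recovery of the subset coefficients.
w_hat = ista_lasso(A, delta, lam=0.02)
top_subsets = np.argsort(-np.abs(w_hat))[:3]

# Step 4 (assumed rule): each subset is a known set of examples, so an
# example's score is the sum of |w_j| over the subsets that contain it.
membership = np.zeros((m, n_examples))
for j in range(m):
    members = rng.choice(n_examples, size=subset_size, replace=False)
    membership[j, members] = 1.0

example_scores = np.abs(w_hat) @ membership
ranking = np.argsort(-example_scores)  # most influential example first
```

Note that there are fewer measurements (d = 80) than unknowns (m = 200); the problem is only solvable because w is assumed sparse, which is the compressive‑sensing premise the method relies on.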

Results & Findings

| Metric | STRIDE | Prior Gradient‑Based TDA | Speedup |
| --- | --- | --- | --- |
| Top‑5 attribution accuracy (on a held‑out benchmark) | 92.3 % | 84.7 % | |
| Mean absolute error in influence score | 0.07 | 0.15 | |
| Runtime per query (A100 GPU) | 0.42 s | 5.5 s | 13× |
| Memory footprint (activation storage) | ~150 MB | >2 GB | |
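
As a quick sanity check, the ratios implied by the reported figures can be recomputed directly. The variable names below are illustrative, and the inputs are just the numbers quoted above.

```python
# Figures as quoted in the results above.
stride_runtime_s, baseline_runtime_s = 0.42, 5.5
stride_acc, baseline_acc = 92.3, 84.7
stride_mae, baseline_mae = 0.07, 0.15

speedup = baseline_runtime_s / stride_runtime_s   # about 13.1, matching the 13x claim
acc_gain_points = stride_acc - baseline_acc       # about 7.6 percentage points
mae_reduction = 1 - stride_mae / baseline_mae     # about 0.53, i.e. roughly half the error

print(f"speedup: {speedup:.1f}x")
print(f"accuracy gain: {acc_gain_points:.1f} points")
print(f"MAE reduction: {mae_reduction:.0%}")
```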

Key takeaways

  • STRIDE matches or exceeds the precision of gradient‑based baselines while dramatically reducing compute and memory demands.
  • The sparse recovery step reliably isolates a handful of training examples that, when removed, cause a measurable drop in the target prediction’s confidence.
  • Using STRIDE to prune low‑influence data sped up fine‑tuning by ~18 % without hurting downstream accuracy.

Practical Implications

  • Debugging LLM behavior: Quickly pinpoint which training sentences introduced a hallucination or bias, enabling targeted data cleaning.
  • Data‑centric AI pipelines: Integrate STRIDE into data selection loops—retain only high‑influence examples for future pre‑training, reducing dataset size and training cost.
  • Intellectual property & compliance: Audit whether proprietary or copyrighted text contributed to a model’s output, supporting legal defensibility.
  • Contamination detection: Attribute unexpected outputs to specific training shards, allowing security teams to spot data poisoning or inadvertent leakage.
  • Tooling integration: Because it operates on activations, STRIDE can be wrapped as a lightweight service (e.g., a REST API) alongside existing inference stacks, offering on‑demand attribution without heavyweight retraining.
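
The data‑selection use case reduces to a top‑k filter once per‑example influence scores exist. The following is a minimal sketch under that assumption; `select_high_influence`, the `keep_fraction` parameter, and the toy scores are hypothetical and not part of any published STRIDE interface.

```python
import numpy as np

def select_high_influence(scores, keep_fraction):
    """Return sorted indices of the highest-scoring examples (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(keep_fraction * len(scores))))
    # argpartition finds the k largest scores without sorting the whole array.
    top_k = np.argpartition(-scores, k - 1)[:k]
    return np.sort(top_k)

# Toy influence scores for six training examples.
scores = [0.02, 0.91, 0.00, 0.35, 0.10, 0.77]
kept = select_high_influence(scores, keep_fraction=0.5)
print(kept.tolist())  # [1, 3, 5]
```

In a real pipeline the scores would be aggregated over many queries before pruning, since an example that is irrelevant to one prediction may matter for another.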

Limitations & Future Work

  • Subset granularity: Relies on pre‑computed subsets; extremely fine‑grained attribution (down to single tokens) may require more sophisticated subset designs.
  • Linear steering assumption: Modeling influence as a linear operator works well empirically but may miss higher‑order interactions in highly non‑linear regimes.
  • Scalability to trillion‑parameter models: While 13× faster than prior art, the current implementation still assumes access to full activation tensors, which can become a bottleneck for the largest models.
  • Future directions: Explore non‑linear steering maps, adaptive subset construction (e.g., active learning), and extending the framework to multimodal models where activations span vision and language modalities.

Authors

  • Rishit Dagli
  • Abir Harrasse
  • Luke Zhang
  • Florent Draye
  • Amirali Abdullah
  • Bernhard Schölkopf
  • Zhijing Jin

Paper Information

  • arXiv ID: 2606.05165v1
  • Categories: cs.LG, cs.CL
  • Published: June 3, 2026