[Paper] Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification

Published: February 27, 2026 at 01:35 PM EST
4 min read
Source: arXiv

Overview

The paper proposes a new way to uncover high‑level causal explanations hidden inside trained neural networks without costly retraining or exhaustive intervention experiments. By treating pruning as a search for an approximate causal abstraction, the authors derive a principled, fast method that extracts a sparse, intervention‑faithful structural causal model (SCM) from any deterministic network.

Key Contributions

  • Reframing abstraction discovery as a structured pruning problem, linking model compression to causal analysis.
  • Derivation of an Interventional Risk objective that quantifies how well a pruned network preserves the effects of interventions.
  • Closed‑form second‑order expansion that yields simple criteria for (a) fixing a unit to a constant and (b) merging a unit into its neighbors.
  • Demonstration that, under uniform curvature, the score collapses to activation variance, providing a theoretical justification (and limitation) for variance‑based pruning.
  • An efficient algorithm that extracts sparse, intervention‑faithful abstractions from pretrained models, validated with interchange‑intervention experiments.
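The interchange interventions used for validation can be sketched in a few lines: run the network on a base input, but overwrite one intermediate unit's activation with its value from a second ("source") input, then compare outputs. The two-stage model and unit indexing below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage model: `encoder` produces intermediate
# activations, `head` maps them to outputs (names are illustrative).
encoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 4)

def interchange_intervention(base_x, source_x, unit):
    """Run the model on `base_x`, but replace the activation of one
    intermediate `unit` with its value computed from `source_x`."""
    with torch.no_grad():
        base_h = encoder(base_x)
        source_h = encoder(source_x)
        patched = base_h.clone()
        patched[..., unit] = source_h[..., unit]  # the intervention
        return head(patched)

base = torch.randn(1, 8)
source = torch.randn(1, 8)
out = interchange_intervention(base, source, unit=3)
```

Interventional fidelity is then measured by applying the same patch to both the original network and the candidate abstraction and checking that their outputs agree.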

Methodology

  1. Treat the trained network as a deterministic SCM – each neuron is a variable, and the forward pass defines functional relationships.
  2. Define Interventional Risk: the expected discrepancy between the original network’s output under an intervention and the output of a candidate abstracted model under the same intervention.
  3. Second‑order Taylor expansion of this risk yields a tractable expression involving the curvature (second derivatives) of the network’s functions.
  4. Pruning decisions:
    • Constant replacement: a unit can be set to a fixed value if its contribution to the risk (a function of its activation variance and curvature) is low.
    • Folding: a unit can be merged into a neighboring unit if the combined risk remains small.
  5. Uniform curvature assumption simplifies the score to activation variance, connecting to classic magnitude‑based pruning.
  6. Iterative search: repeatedly apply the above criteria to produce a sparse abstraction, stopping when a user‑specified risk budget is reached.
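The iterative search in steps 4-6 can be sketched under the uniform-curvature simplification, where the per-unit risk score collapses to activation variance. The greedy ordering, the additive risk accounting, and the `risk_budget` parameter are illustrative assumptions, not the paper's exact formulation, and only the constant-replacement criterion (not folding) is shown.

```python
import numpy as np

def sparsify(activations, risk_budget):
    """Greedily fix low-variance units to their mean activation.

    `activations`: (n_samples, n_units) array of hidden activations
    collected from the trained network. Under uniform curvature the
    per-unit risk score reduces to activation variance, so the
    lowest-variance units are pruned first until the budget is spent.
    """
    variances = activations.var(axis=0)      # per-unit risk score
    order = np.argsort(variances)            # cheapest prunes first
    fixed = {}                               # unit -> constant value
    spent = 0.0
    for unit in order:
        if spent + variances[unit] > risk_budget:
            break                            # budget exhausted: stop
        fixed[unit] = activations[:, unit].mean()  # constant replacement
        spent += variances[unit]
    return fixed

rng = np.random.default_rng(0)
# Six units: three near-constant, three high-variance.
acts = rng.normal(size=(100, 6)) * np.array([0.01, 1.0, 0.02, 1.0, 0.01, 1.0])
pruned = sparsify(acts, risk_budget=0.01)    # fixes only the near-constant units
```

A curvature-aware variant would weight each variance by an estimate of the downstream second derivative instead of treating all units uniformly.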

Results & Findings

  • On standard vision benchmarks (e.g., CIFAR‑10/100) the method reduces network size by 70‑90 % while keeping interventional fidelity above 95 % (measured via interchange interventions).
  • Compared to brute‑force interchange‑intervention search, the proposed approach achieves orders‑of‑magnitude speedups (minutes vs. hours).
  • When curvature is non‑uniform, variance‑only pruning fails to preserve causal behavior, whereas the curvature‑aware score maintains fidelity, confirming the theoretical analysis.
  • The extracted abstractions often align with human‑interpretable concepts (e.g., edge detectors, texture filters), suggesting the method surfaces meaningful causal mechanisms.

Practical Implications

  • Model debugging & safety: Developers can quickly obtain a causal map of a network to understand how interventions (e.g., feature masking) affect predictions, aiding in root‑cause analysis of failures.
  • Efficient deployment: The sparse abstractions can serve as lightweight surrogates for inference in resource‑constrained environments while guaranteeing that key causal relationships remain intact.
  • Explainable AI tooling: The algorithm can be integrated into existing ML pipelines to generate post‑hoc explanations that are faithful under counterfactual queries, a step beyond gradient‑based saliency.
  • Transfer learning: Abstracted causal modules can be reused across tasks, potentially reducing the data and compute needed for fine‑tuning.

Limitations & Future Work

  • The current theory assumes deterministic networks; stochastic layers (e.g., dropout, Bayesian nets) are not directly handled.
  • The uniform curvature simplification may not hold for highly non‑linear architectures (e.g., transformers), limiting the variance‑only pruning shortcut.
  • Experiments focus on image classification; extending validation to NLP or reinforcement learning domains remains open.
  • Future research could explore adaptive curvature estimation, incorporate causal discovery from data (instead of a given network), and investigate interactive tools for developers to query and edit the extracted abstractions.

Authors

  • Amir Asiaee

Paper Information

  • arXiv ID: 2602.24266v1
  • Categories: cs.LG, cs.AI
  • Published: February 27, 2026