[Paper] Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification
Source: arXiv - 2602.24266v1
Overview
The paper proposes a new way to uncover high‑level causal explanations hidden inside trained neural networks without costly retraining or exhaustive intervention experiments. By treating pruning as a search for an approximate causal abstraction, the authors derive a principled, fast method that extracts a sparse, intervention‑faithful structural causal model (SCM) from any deterministic network.
Key Contributions
- Reframing abstraction discovery as a structured pruning problem, linking model compression to causal analysis.
- Derivation of an Interventional Risk objective that quantifies how well a pruned network preserves the effects of interventions.
- Closed‑form second‑order expansion that yields simple criteria for (a) fixing a unit to a constant and (b) merging a unit into its neighbors.
- Demonstration that, under uniform curvature, the score collapses to activation variance, which both justifies variance-based pruning and exposes its limits (see the sketch after this list).
- An efficient algorithm that extracts sparse, intervention‑faithful abstractions from pretrained models, validated with interchange‑intervention experiments.
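To make the second-order machinery concrete, here is a rough sketch in notation invented for this summary (the paper's exact symbols and derivation may differ): write f for the network output, a_i for the activation of unit i, and consider fixing a_i to a constant c.

```latex
% Interventional Risk of fixing unit i to the constant c (illustrative notation):
R_i(c) \;=\; \mathbb{E}_x\Big[\big(f(a) - f(a \,;\, a_i := c)\big)^2\Big]
\;\approx\; \mathbb{E}_x\Big[\big(g_i\,(a_i - c) + \tfrac{1}{2}\,h_i\,(a_i - c)^2\big)^2\Big],
\qquad g_i = \frac{\partial f}{\partial a_i}, \quad h_i = \frac{\partial^2 f}{\partial a_i^2}.
```

Taking c = E[a_i], the dominant contributions scale with Var[a_i] weighted by the unit's sensitivity g_i and curvature h_i; assuming those weights are uniform across units, the ranking reduces to plain activation variance, which is both the justification and the limitation referred to above.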
Methodology
- Treat the trained network as a deterministic SCM: each neuron is a variable, and the forward pass defines the functional relationships between them.
- Define Interventional Risk: the expected discrepancy between the original network’s output under an intervention and the output of a candidate abstracted model under the same intervention.
- Second‑order Taylor expansion of this risk yields a tractable expression involving the curvature (second derivatives) of the network’s functions.
- Pruning decisions:
  - Constant replacement: a unit can be set to a fixed value if its contribution to the risk (a function of its activation variance and local curvature) is low.
  - Folding: a unit can be merged into a neighboring unit if the combined risk remains small.
- Uniform curvature assumption simplifies the score to activation variance, connecting to classic magnitude‑based pruning.
- Iterative search: repeatedly apply the above criteria to produce a sparse abstraction, stopping when a user-specified risk budget is reached (a minimal code sketch follows this list).
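A minimal sketch of this search, assuming a toy two-layer network and constant replacement only (folding is omitted); the helper names `forward`, `interventional_risk`, `sparsify`, and the `budget` value are invented here for illustration. It estimates the risk by Monte Carlo over a sample batch rather than with the paper's closed-form second-order score, so it shows the search structure, not the fast criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic network: x -> h = tanh(x @ W1.T) -> y = h @ W2.T.
# Viewed as an SCM, each hidden unit h_j is a variable whose mechanism
# is defined by the forward pass.
W1 = rng.normal(size=(16, 8)) / np.sqrt(8)
W2 = rng.normal(size=(4, 16)) / np.sqrt(16)

def forward(x, fixed=None):
    """Forward pass; `fixed` maps hidden-unit index -> constant,
    i.e. a do(h_j := c) intervention implementing constant replacement."""
    h = np.tanh(x @ W1.T)
    for j, c in (fixed or {}).items():
        h[:, j] = c
    return h @ W2.T

def interventional_risk(x, fixed):
    """Monte Carlo estimate of the expected squared output discrepancy
    between the full network and the candidate abstraction."""
    return float(np.mean((forward(x) - forward(x, fixed)) ** 2))

def sparsify(x, budget=1e-2):
    """Greedily fix the unit whose replacement by its mean activation
    keeps total risk lowest; stop before the budget would be exceeded."""
    means = np.tanh(x @ W1.T).mean(axis=0)
    fixed = {}
    while len(fixed) < W1.shape[0]:
        scores = {j: interventional_risk(x, {**fixed, j: means[j]})
                  for j in range(W1.shape[0]) if j not in fixed}
        best = min(scores, key=scores.get)
        if scores[best] > budget:
            break  # the cheapest remaining removal exceeds the risk budget
        fixed[best] = means[best]
    return fixed

x = rng.normal(size=(256, 8))  # samples standing in for the data distribution
abstraction = sparsify(x)
print(f"fixed {len(abstraction)}/{W1.shape[0]} units, "
      f"residual risk = {interventional_risk(x, abstraction):.2e}")
```

In the paper's setting the per-unit scores come from the closed-form expansion (curvature-weighted variance) rather than repeated forward passes, which is what buys the speedup over search loops like the one above.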
Results & Findings
- On standard vision benchmarks (e.g., CIFAR-10/100), the method reduces network size by 70-90% while keeping interventional fidelity above 95%, measured via interchange interventions (see the sketch after this list).
- Compared to brute‑force interchange‑intervention search, the proposed approach achieves orders‑of‑magnitude speedups (minutes vs. hours).
- When curvature is non‑uniform, variance‑only pruning fails to preserve causal behavior, whereas the curvature‑aware score maintains fidelity, confirming the theoretical analysis.
- The extracted abstractions often align with human-interpretable concepts (e.g., edge detectors, texture filters), suggesting the method surfaces meaningful causal mechanisms.
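For readers unfamiliar with the fidelity metric: an interchange intervention copies a unit's activation recorded on a source input into a forward pass on a base input, applies the same patch to both the original network and the abstraction, and checks whether their outputs still agree. A minimal sketch under assumed details (the toy network, tolerance `tol`, and agreement criterion are illustrative choices, not the paper's protocol):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(16, 8)) / np.sqrt(8)
W2 = rng.normal(size=(4, 16)) / np.sqrt(16)

def forward(x, patch=None):
    """Forward pass; `patch` maps hidden-unit index -> replacement activations."""
    h = np.tanh(x @ W1.T)
    for j, v in (patch or {}).items():
        h[:, j] = v
    return h @ W2.T

def interchange_fidelity(abstr_forward, base, source, units, tol=1e-2):
    """Fraction of interchange interventions under which the abstraction's
    output stays within `tol` of the original network's output."""
    h_src = np.tanh(source @ W1.T)  # activations recorded on the source inputs
    hits = 0
    for j in units:
        patch = {j: h_src[:, j]}             # splice unit j's source activation
        y_full = forward(base, patch)        # original network under the patch
        y_abst = abstr_forward(base, patch)  # abstraction under the same patch
        hits += np.mean(np.abs(y_full - y_abst)) < tol
    return hits / len(units)

# Hypothetical abstraction: units 0 and 5 held at constants; an interchange
# patch on a fixed unit overrides the constant (interventions take precedence).
fixed = {0: 0.0, 5: 0.0}
abstr = lambda x, patch=None: forward(x, {**fixed, **(patch or {})})

base, source = rng.normal(size=(32, 8)), rng.normal(size=(32, 8))
print(f"fidelity = {interchange_fidelity(abstr, base, source, range(16)):.2f}")
```

A faithful abstraction scores near 1.0 on this metric; the random toy network above only demonstrates how the measurement is taken.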
Practical Implications
- Model debugging & safety: Developers can quickly obtain a causal map of a network to understand how interventions (e.g., feature masking) affect predictions, aiding in root‑cause analysis of failures.
- Efficient deployment: The sparse abstractions can serve as lightweight surrogates for inference in resource-constrained environments while keeping key causal relationships intact up to the chosen risk budget.
- Explainable AI tooling: The algorithm can be integrated into existing ML pipelines to generate post‑hoc explanations that are faithful under counterfactual queries, a step beyond gradient‑based saliency.
- Transfer learning: Abstracted causal modules can be reused across tasks, potentially reducing the data and compute needed for fine‑tuning.
Limitations & Future Work
- The current theory assumes deterministic networks; stochastic layers (e.g., dropout, Bayesian neural networks) are not directly handled.
- The uniform curvature simplification may not hold for highly non‑linear architectures (e.g., transformers), limiting the variance‑only pruning shortcut.
- Experiments focus on image classification; extending validation to NLP or reinforcement learning domains remains open.
- Future research could explore adaptive curvature estimation, incorporate causal discovery from data (instead of a given network), and investigate interactive tools for developers to query and edit the extracted abstractions.
Authors
- Amir Asiaee
Paper Information
- arXiv ID: 2602.24266v1
- Categories: cs.LG, cs.AI
- Published: February 27, 2026