[Paper] Explainable AI for Jet Tagging: A Comparative Study of GNNExplainer, GNNShap, and GradCAM for Jet Tagging in the Lund Jet Plane

Published: April 28, 2026 at 01:28 PM EDT

Source: arXiv - 2604.25885v1

Overview

The paper investigates why cutting‑edge graph‑based neural networks (e.g., ParticleNet, LundNet) can so accurately identify the origin of particle jets at the Large Hadron Collider (LHC). By adapting three popular explainability techniques—GNNExplainer, GNNShap, and Grad‑CAM—to the Lund‑plane graph representation of jets, the authors provide a systematic way to peek inside these “black‑box” models and connect their decisions to well‑understood physics observables.

Key Contributions

  • Adaptation of XAI tools: Tailors perturbation‑based (GNNExplainer), Shapley‑value‑based (GNNShap), and gradient‑based (Grad‑CAM) explainers to work on Lund‑plane graphs, where each node corresponds to a physically meaningful parton splitting.
  • Physics‑aware evaluation framework: Introduces Monte‑Carlo truth masks and novel metrics that go beyond generic fidelity scores, allowing a direct comparison between model explanations and ground‑truth jet substructure.
  • Cross‑regime analysis: Benchmarks explanation quality across three transverse‑momentum (pₜ) bins, highlighting how interpretability shifts between non‑perturbative and perturbative QCD regimes.
  • Correlation with classic observables: Quantifies how explainer‑assigned node importance aligns with traditional jet‑substructure variables such as τ₂₁, τ₃₂, and energy‑correlation functions.
  • Open‑source implementation: Releases a reproducible codebase for explainability studies on graph‑based jet taggers, encouraging community adoption.
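The contributions above hinge on the fact that each Lund-plane node is a physically meaningful splitting. As a minimal sketch (not the paper's code), each Cambridge/Aachen declustering step can be mapped to the standard Lund coordinates (ln(1/Δ), ln(kₜ)), where Δ is the angular separation of the two subjets and kₜ = pₜ,soft · Δ; the splitting kinematics below are hypothetical:

```python
import math

def lund_coordinates(pt_hard, pt_soft, delta):
    """Map one declustering step to Lund-plane coordinates (ln(1/Delta), ln(kt))."""
    kt = pt_soft * delta  # transverse momentum of the softer emission
    return math.log(1.0 / delta), math.log(kt)

# Hypothetical declustering sequence: (pT harder branch, pT softer branch, Delta)
splittings = [(450.0, 50.0, 0.4), (430.0, 20.0, 0.1), (400.0, 30.0, 0.02)]

# One graph node per splitting; these coordinates are the node features
# that the explainers assign importance scores to.
nodes = [lund_coordinates(*s) for s in splittings]
```

Hard, perturbative splittings sit at large ln(kₜ), which is why explanation quality can differ between the pₜ regimes discussed below.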

Methodology

  1. Data & Model: Uses simulated LHC jet events (quark‑ vs. gluon‑initiated, top‑quark vs. QCD background) processed by LundNet, which encodes each jet as a directed graph on the Lund plane.
  2. Explainability Adaptation:
    • GNNExplainer: Perturbs individual nodes/edges and measures impact on the model’s output to infer importance scores.
    • GNNShap: Computes Shapley values for nodes, offering a game‑theoretic attribution that fairly distributes credit among all graph components.
    • Grad‑CAM: Propagates gradients from the final classification layer back to the graph nodes, producing a heat‑map of salient splittings.
  3. Ground‑Truth Masks: Generates “truth” importance masks from Monte‑Carlo labels by marking nodes that belong to the hard parton splitting hierarchy.
  4. Evaluation Metrics: Goes beyond standard fidelity (e.g., deletion/insertion) by measuring overlap with truth masks, node‑level precision/recall, and correlation with analytic jet observables.
  5. Phase‑Space Binning: Repeats the whole pipeline in three pₜ intervals—[500, 700] GeV, [800, 1000] GeV, and the inclusive [500, 1000] GeV—to capture regime‑dependent behavior.
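The Shapley-value step (2) can be illustrated with the standard permutation-sampling estimator. This is a hedged sketch: `score` is a toy additive surrogate for a tagger's output, not LundNet, and the node weights are invented for illustration:

```python
import random

def score(active_nodes):
    # Toy surrogate for the classifier output on a masked graph:
    # each active node contributes a fixed "hard-splitting" weight.
    weights = {0: 0.6, 1: 0.3, 2: 0.1, 3: 0.0}
    return sum(weights[n] for n in active_nodes)

def shapley_values(nodes, score_fn, n_samples=2000, seed=0):
    """Monte Carlo Shapley estimate: average marginal contribution of each
    node over random orderings in which nodes are switched on."""
    rng = random.Random(seed)
    phi = {n: 0.0 for n in nodes}
    for _ in range(n_samples):
        perm = nodes[:]
        rng.shuffle(perm)
        active = set()
        prev = score_fn(active)
        for n in perm:
            active.add(n)
            cur = score_fn(active)
            phi[n] += cur - prev  # marginal contribution of node n
            prev = cur
    return {n: v / n_samples for n, v in phi.items()}

phi = shapley_values([0, 1, 2, 3], score)
```

For an additive score like this toy one, the estimate recovers each node's weight exactly; for a real tagger the samples trade compute for accuracy, which is the ≈5× overhead reported for GNNShap below.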
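Steps 3 and 4 can be sketched as a comparison between explainer scores and a Monte-Carlo truth mask. The node scores, threshold, and mask below are illustrative placeholders, not values from the paper:

```python
import math

def precision_recall(importance, truth_mask, threshold=0.5):
    """Node-level precision/recall of a thresholded importance mask
    against the MC truth mask of hard splittings."""
    pred = [s >= threshold for s in importance]
    tp = sum(p and t for p, t in zip(pred, truth_mask))
    fp = sum(p and not t for p, t in zip(pred, truth_mask))
    fn = sum((not p) and t for p, t in zip(pred, truth_mask))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pearson(x, y):
    """Pearson correlation, e.g. between node importance and an observable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

importance = [0.9, 0.7, 0.2, 0.1]        # explainer scores per node (illustrative)
truth = [True, True, False, False]       # MC truth mask of hard splittings
p, r = precision_recall(importance, truth)
```

The same `pearson` helper, applied per jet to importance scores versus observables such as τ₂₁, yields the kind of correlation coefficients quoted in the results.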

Results & Findings

  • Explanation Quality Varies with pₜ: In the lower‑pₜ bin (non‑perturbative regime), GNNShap produced the highest overlap with the truth masks, while Grad‑CAM excelled at higher pₜ, where perturbative splittings dominate.
  • Consistent Correlation with Classical Observables: All three explainers showed a positive Pearson correlation (≈0.3–0.5) between node importance scores and τ₂₁/τ₃₂ ratios, confirming that the network implicitly learns known QCD substructure features.
  • Method‑Specific Strengths:
    • GNNExplainer offered the most interpretable, sparse masks but suffered from higher variance across runs.
    • GNNShap delivered stable, theoretically grounded attributions but required significantly more compute (≈5× inference time).
    • Grad‑CAM was the fastest and produced smooth importance maps, yet sometimes highlighted peripheral nodes with little physical relevance.
  • Overall Insight: The trained graph neural network does not operate as a pure “black box”; its decision surface aligns with established jet‑physics quantities, albeit with regime‑dependent emphasis.

Practical Implications

  • Model Debugging & Trust: Developers can now audit graph‑based jet taggers, spotting failure modes (e.g., over‑reliance on soft radiation) before deploying them in real‑time trigger systems.
  • Feature Engineering: Correlations with τ₂₁, τ₃₂, and energy‑correlation functions suggest that hybrid models—combining learned embeddings with handcrafted observables—could achieve better performance with fewer parameters.
  • Transfer to Other Domains: The adaptation pipeline (graph representation → XAI) is directly applicable to any problem where data naturally form hierarchical graphs (e.g., molecular property prediction, network traffic analysis).
  • Accelerated R&D: Open‑source tools lower the barrier for experimental collaborations to conduct systematic explainability studies, speeding up the iteration cycle for new tagging algorithms.
  • Regulatory & Safety Contexts: In high‑stakes environments (e.g., autonomous systems, medical diagnostics), having a physics‑aware explanation framework can satisfy audit requirements and improve stakeholder confidence.

Limitations & Future Work

  • Computational Overhead: Shapley‑value estimation (GNNShap) remains expensive, limiting its use in large‑scale production pipelines.
  • Monte‑Carlo Truth Approximation: The ground‑truth masks rely on simulated parton histories, which may not capture all detector effects present in real data.
  • Scope of Models: The study focuses on LundNet; extending the analysis to transformer‑based point‑cloud models (e.g., ParticleTransformer) could reveal different attribution patterns.
  • Dynamic Graphs: Future work could explore explainability for graphs that evolve during training (e.g., adaptive edge construction) to see how attribution stability changes.
  • User‑Centric Evaluation: Conducting user studies with physicists and engineers would help refine the explanation visualizations for practical decision‑making.

Authors

  • Pahal D. Patel
  • Sanmay Ganguly

Paper Information

  • arXiv ID: 2604.25885v1
  • Categories: hep‑ph, cs.LG, hep‑ex
  • Published: April 28, 2026