[Paper] Analysis of Dirichlet Energies as Over-smoothing Measures
Source: arXiv - 2512.09890v1
Overview
The paper Analysis of Dirichlet Energies as Over‑smoothing Measures investigates why two popular ways of measuring over‑smoothing in graph neural networks (GNNs) – the Dirichlet energy based on the unnormalized Laplacian and the one based on the normalized Laplacian – behave differently. By grounding the discussion in a formal “node‑similarity” axiom set, the authors show that the normalized version actually violates these axioms, which has direct consequences for how practitioners should monitor and control over‑smoothing in real‑world GNN pipelines.
Key Contributions
- Axiomatic critique: Proves that the Dirichlet energy derived from the normalized Laplacian does not satisfy the node‑similarity axioms introduced by Rusch et al. (2022).
- Spectral comparison: Provides a clean, side‑by‑side spectral analysis of the unnormalized vs. normalized Laplacian Dirichlet energies, exposing the root cause of their divergent behavior.
- Guidelines for metric selection: Offers concrete criteria for choosing the “spectrally compatible” energy measure that aligns with a given GNN architecture (e.g., GCN, GraphSAGE, GAT).
- Resolution of ambiguity: Clarifies why previous empirical studies sometimes reported contradictory over‑smoothing trends when switching between the two energies.
Methodology
- Formal definition of node‑similarity – The authors adopt the axioms (non‑negativity, symmetry, identity of indiscernibles, and monotonicity under graph diffusion) from Rusch et al. to set a benchmark for any over‑smoothing metric.
- Spectral decomposition – Both Laplacians are expressed in terms of their eigenvalues/eigenvectors. The Dirichlet energy of a node feature matrix $X$ is written as
  $$E_{\mathcal{L}}(X) = \operatorname{tr}\left(X^\top \mathcal{L}\, X\right).$$
- Axiom testing – By plugging the normalized Laplacian $\mathcal{L}_{\text{norm}} = I - D^{-1/2} A D^{-1/2}$ into the axioms, the authors construct counter‑examples (e.g., graphs with highly heterogeneous degree distributions) that break the monotonicity axiom.
- Comparative experiments – Small synthetic graphs and standard benchmark datasets (Cora, PubMed, ogbn‑arxiv) are used to illustrate how the two energies evolve across layers of popular GNNs.
The approach stays high‑level enough for developers: think of it as “checking whether a smoothness score behaves like a proper distance” by looking at the eigen‑spectrum of the graph operator.
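For concreteness, here is a minimal sketch (not the authors' code) of both energies computed straight from the definition $E_{\mathcal{L}}(X) = \operatorname{tr}(X^\top \mathcal{L} X)$. The toy path graph, the feature dimension, and the helper name `dirichlet_energy` are illustrative choices:

```python
# Minimal sketch: Dirichlet energies of a feature matrix X under the
# unnormalized and the normalized graph Laplacian (NumPy only).
import numpy as np

def dirichlet_energy(X: np.ndarray, L: np.ndarray) -> float:
    """E_L(X) = tr(X^T L X) for a given graph operator L."""
    return float(np.trace(X.T @ L @ X))

# Toy undirected graph: a path on 4 nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
L_unnorm = np.diag(deg) - A                              # L = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_norm = np.eye(len(deg)) - D_inv_sqrt @ A @ D_inv_sqrt  # I - D^{-1/2} A D^{-1/2}

X = np.random.default_rng(0).normal(size=(4, 2))         # random node features

print("unnormalized energy:", dirichlet_energy(X, L_unnorm))
print("normalized energy:  ", dirichlet_energy(X, L_norm))
```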
Results & Findings
| Metric | Satisfies node‑similarity axioms? | Typical behavior in GNN layers |
|---|---|---|
| Unnormalized Dirichlet energy ($\mathcal{L} = D - A$) | ✅ Yes | Decreases monotonically as layers increase, cleanly reflecting over‑smoothing. |
| Normalized Dirichlet energy ($\mathcal{L}_{\text{norm}}$) | ❌ No (fails monotonicity) | Can increase after a few layers on irregular graphs, misleading developers about the true smoothness. |
Key takeaways
- The unnormalized energy is a reliable “over‑smoothing alarm” across a wide range of graph topologies.
- Normalized energy’s dependence on degree scaling makes it sensitive to heterogeneity, causing spurious spikes that do not correspond to actual loss of discriminative power.
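To see the degree-sensitivity point in action, the sketch below (a toy illustration, not the paper's benchmark code) repeatedly applies GCN-style propagation $\hat{D}^{-1/2}(A+I)\hat{D}^{-1/2}X$ on a star graph, a deliberately degree-heterogeneous example, and prints both energies per layer. The graph size, feature dimension, and layer count are arbitrary assumptions; the printed numbers are not the paper's results.

```python
# Toy illustration: track both Dirichlet energies while features are
# repeatedly smoothed by a GCN-style propagation operator on a star graph.
import numpy as np

def dirichlet_energy(X, L):
    return float(np.trace(X.T @ L @ X))

n = 8                                       # star: node 0 linked to nodes 1..7
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1.0
deg = A.sum(axis=1)
L_unnorm = np.diag(deg) - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_norm = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

# GCN-style propagation with self-loops: P = D_hat^{-1/2} (A + I) D_hat^{-1/2}
A_hat = A + np.eye(n)
d_hat = A_hat.sum(axis=1)
P = np.diag(d_hat ** -0.5) @ A_hat @ np.diag(d_hat ** -0.5)

X = np.random.default_rng(1).normal(size=(n, 4))
for layer in range(1, 11):
    X = P @ X                               # one round of feature smoothing
    print(f"layer {layer:2d}  "
          f"E_unnorm = {dirichlet_energy(X, L_unnorm):8.4f}  "
          f"E_norm = {dirichlet_energy(X, L_norm):8.4f}")
```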
Practical Implications
- Model debugging: A sudden rise in the normalized Dirichlet energy during training may be a spurious effect of degree scaling rather than a genuine recovery of node distinctiveness. Switch to the unnormalized version for a trustworthy signal.
- Architecture‑aware regularization: Many GNN regularizers (e.g., Laplacian smoothing, DropEdge) are derived from the unnormalized Laplacian. Aligning your over‑smoothing metric with the same operator avoids mismatched objectives.
- Hyper‑parameter tuning: Early‑stopping criteria based on Dirichlet energy can now be implemented with confidence, using the unnormalized version to decide when a model has become too smooth.
- Tooling: Libraries such as PyG or DGL could expose a simple utility that computes the unnormalized Dirichlet energy $\operatorname{tr}(X^\top L X)$ from node features and the graph structure, making it a one-liner in training loops (a sketch follows below).
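Here is a hedged sketch of what such a monitoring utility could look like in a PyTorch training loop. The edge-based formula below equals $\operatorname{tr}(X^\top L X)$ when each undirected edge is listed once; the function name, the $2 \times E$ `edge_index` layout, and the `energy_threshold` are illustrative assumptions, not an existing PyG/DGL API.

```python
# Sketch of an over-smoothing monitor based on the unnormalized Dirichlet energy.
import torch

def unnorm_dirichlet_energy(X: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
    """Sum of ||x_i - x_j||^2 over the edges in edge_index (shape [2, E]).

    Equals tr(X^T (D - A) X) when each undirected edge appears once;
    halve the result if edges are stored in both directions (PyG convention).
    """
    src, dst = edge_index
    return ((X[src] - X[dst]) ** 2).sum()

# Illustrative use inside a training loop (model outputs, data, and the
# threshold are assumed to exist in the surrounding code):
#
#     energy = unnorm_dirichlet_energy(hidden, data.edge_index)
#     if energy < energy_threshold:   # embeddings have collapsed toward each other
#         print("over-smoothing alarm: consider early stopping or fewer layers")
```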
Limitations & Future Work
- The analysis is primarily theoretical and validated on relatively small benchmark graphs; scaling to massive, dynamic graphs (e.g., billions of nodes) may reveal additional nuances.
- Only static GNN architectures are considered; extensions to temporal GNNs or attention‑based diffusion operators remain open.
- The authors suggest exploring adaptive Laplacian choices (e.g., learned degree normalizations) that could combine the stability of the unnormalized energy with the scale‑invariance benefits of the normalized version.
Bottom line for developers: If you need a reliable, mathematically sound gauge of over‑smoothing in your GNN pipelines, stick with the Dirichlet energy built from the unnormalized graph Laplacian. It respects the core similarity axioms, behaves predictably across layers, and integrates seamlessly with existing regularization tricks.
Authors
- Anna Bison
- Alessandro Sperduti
Paper Information
- arXiv ID: 2512.09890v1
- Categories: cs.LG
- Published: December 10, 2025
- PDF: https://arxiv.org/pdf/2512.09890v1