[Paper] From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures

Published: February 4, 2026 at 01:50 PM EST

Source: arXiv - 2602.04861v1

Overview

The paper presents a new benchmark, the Bond Smoothness Characterization Test (BSCT), that quickly detects when a machine‑learning interatomic potential (MLIP) produces an unphysical, “jagged” potential energy surface (PES). By flagging discontinuities, spurious minima, and erroneous forces far from equilibrium, BSCT helps researchers design more reliable MLIPs without the heavy cost of full molecular‑dynamics (MD) runs.

Key Contributions

BSCT benchmark: An inexpensive, automated test that deforms individual bonds to probe PES smoothness and detect hidden artifacts.
Correlation evidence: Demonstrates that BSCT scores strongly predict MD stability, while the benchmark itself runs in a fraction of the time an MD test takes.
Design loop: Shows how BSCT can be used in‑the‑loop to iteratively improve MLIP architectures.
Transformer‑based MLIP case study: Introduces a differentiable k‑nearest‑neighbors (k‑NN) layer and temperature‑controlled attention, both guided by BSCT feedback, yielding a final model that improves on the baseline in energy RMSE (4.2 to 3.7 meV/atom), force RMSE (85 to 65 meV/Å), and MD stability (0.62 to 0.96 of runs stable).
Open‑source tooling: Provides reference implementations for BSCT and the differentiable k‑NN module, facilitating adoption by the community.

Methodology

Bond‑wise deformation: For each bond type in a training set, the authors generate a 1‑D scan of inter‑atomic distances (both compression and stretching) while keeping the rest of the system fixed.
Smoothness metrics: They compute three simple diagnostics on the MLIP’s predicted energy and forces along each scan:
- Energy curvature continuity (second‑derivative smoothness)
- Force consistency (force = –∇E)
- Absence of artificial minima (detecting unexpected low‑energy wells)
The three metrics are aggregated into a single BSCT score.
Benchmarking: BSCT scores are compared against standard MD stability tests (e.g., energy drift, atom loss) across several benchmark materials (silicon, copper, water).
Iterative model redesign: Starting from a vanilla Transformer MLIP, the authors modify the architecture in two steps: a differentiable k‑NN layer to enforce locality, and a temperature‑scaled attention mechanism to smooth out high‑frequency attention spikes. After each change they retrain the model and re‑evaluate it with BSCT.

All steps are implemented in PyTorch and run on a single GPU, making the whole pipeline practical for everyday development cycles.
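
To make the bond scan and the three diagnostics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not give the exact BSCT formulas, so the function names, the Morse curve used as a stand-in for an MLIP, the grid, and the way each diagnostic is computed are all assumptions chosen for illustration.

```python
import math

def morse_energy(r, depth=1.0, width=1.5, r_eq=1.0):
    """Smooth stand-in for an MLIP's energy along one bond (Morse form)."""
    return depth * (1.0 - math.exp(-width * (r - r_eq))) ** 2

def morse_force(r, depth=1.0, width=1.5, r_eq=1.0):
    """Analytic force, F = -dE/dr, for the stand-in potential."""
    decay = math.exp(-width * (r - r_eq))
    return -2.0 * depth * width * (1.0 - decay) * decay

def jagged_energy(r):
    """Same curve with an artificial downward step (hypothetical PES artifact)."""
    return morse_energy(r) - (0.05 if r > 1.405 else 0.0)

def scan_bond(energy_fn, r_min=0.7, r_max=2.5, n_points=181):
    """Sample energies on a uniform 1-D grid covering compression and stretching."""
    step = (r_max - r_min) / (n_points - 1)
    distances = [r_min + i * step for i in range(n_points)]
    return distances, [energy_fn(r) for r in distances], step

def smoothness_diagnostics(distances, energies, step, force_fn):
    """Return (max curvature jump, max force error, number of local minima)."""
    n = len(energies)
    # Second derivative by central differences; a large jump between
    # neighbouring grid points signals a curvature discontinuity.
    curvature = [(energies[i - 1] - 2 * energies[i] + energies[i + 1]) / step**2
                 for i in range(1, n - 1)]
    max_curvature_jump = max(abs(b - a) for a, b in zip(curvature, curvature[1:]))
    # Force consistency: the reported force should equal -dE/dr from the scan,
    # so force + dE/dr should be close to zero.
    max_force_error = max(
        abs(force_fn(distances[i]) + (energies[i + 1] - energies[i - 1]) / (2 * step))
        for i in range(1, n - 1))
    # Interior local minima; a single bond scan should show exactly one.
    n_minima = sum(1 for i in range(1, n - 1)
                   if energies[i] < energies[i - 1] and energies[i] < energies[i + 1])
    return max_curvature_jump, max_force_error, n_minima

for name, fn in [("smooth", morse_energy), ("jagged", jagged_energy)]:
    d, e, h = scan_bond(fn)
    print(name, smoothness_diagnostics(d, e, h, morse_force))
```

On the smooth curve all three diagnostics stay small and there is a single minimum. The artificial step raises the curvature jump and the force error by orders of magnitude and adds a second, spurious minimum. How the real benchmark weights and combines such quantities into one score is not specified in this summary.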

Results & Findings

Metric	Baseline Transformer	+Diff‑kNN	+Temp‑Attention	Final Model
Energy RMSE (meV/atom)	4.2	3.9	3.8	3.7
Force RMSE (meV/Å)	85	71	68	65
BSCT score (lower = smoother)	1.12	0.78	0.65	0.48
MD stability (fraction of 10 ps runs that stay stable)	0.62	0.84	0.91	0.96

Evaluation cost: BSCT takes roughly 0.1× the wall‑clock time of the MD stability test.

Strong correlation: Pearson r ≈ 0.88 between BSCT score and MD stability across all tested potentials.
Design impact: Each architectural tweak that improved BSCT also reduced conventional regression errors, showing that smoothness and accuracy are not mutually exclusive.
Generalization: The final model maintained low errors on unseen crystal structures and reproduced elastic constants within experimental uncertainty.
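
As a small worked example of the correlation claim, the sketch below computes a Pearson coefficient from the four model variants in the table above. This is only an illustration: the reported r ≈ 0.88 covers all tested potentials, not just these four, and the helper name `pearson_r` is ours. Because a lower BSCT score means a smoother PES, the coefficient against the stable fraction comes out strongly negative here; the summary does not state which sign convention the reported value uses.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Values copied from the results table (baseline to final model).
bsct_scores = [1.12, 0.78, 0.65, 0.48]        # lower = smoother
stable_fraction = [0.62, 0.84, 0.91, 0.96]    # fraction of stable 10 ps runs

r = pearson_r(bsct_scores, stable_fraction)
print(f"Pearson r over the four table columns: {r:.2f}")
```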

Practical Implications

Faster development cycles: Because BSCT takes roughly a tenth of the wall‑clock time of MD validation, developers can run it in CI pipelines to catch PES artifacts before committing to expensive MD simulations.
More robust simulations: Production‑level MLIPs built with BSCT guidance are less likely to crash or generate unphysical trajectories, which is critical for high‑throughput materials screening and reactive MD.
Architecture insights: The differentiable k‑NN and temperature‑controlled attention modules are reusable components that can be dropped into existing graph‑based or equivariant networks to enforce locality and smooth attention distributions.
Tooling integration: Because BSCT works on a per‑bond basis, it can be combined with existing datasets (e.g., OpenKIM, Materials Project) without extra data collection, making it easy to adopt in current MLIP workflows.
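
The two architectural ideas mentioned above can be illustrated with a short plain-Python sketch. The summary does not describe the paper's actual formulations, so both functions are assumptions: `temperature_softmax` is the standard softmax with scores divided by a temperature, and `soft_knn_weights` is one hypothetical continuous relaxation of k‑NN selection (a sigmoid centred on the k‑th neighbour distance), not the authors' module.

```python
import math

def temperature_softmax(scores, temperature=1.0):
    """Softmax over scores / temperature; higher temperature gives flatter weights."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_knn_weights(distances, k, sharpness=10.0):
    """Hypothetical soft neighbour weights: near 1 inside the k-th neighbour
    distance, near 0 outside, and varying continuously as atoms move."""
    kth = sorted(distances)[k - 1]
    # 0.5 * (1 - tanh(x / 2)) equals 1 / (1 + exp(x)) but cannot overflow.
    return [0.5 * (1.0 - math.tanh(0.5 * sharpness * (d - kth))) for d in distances]

scores = [2.0, 1.0, 0.0]
print("T=0.5:", temperature_softmax(scores, temperature=0.5))
print("T=4.0:", temperature_softmax(scores, temperature=4.0))
print("soft 2-NN:", soft_knn_weights([1.0, 1.1, 2.5, 3.0], k=2))
```

The intent in both cases is the same: a hard choice (which key dominates, which atoms count as neighbours) is replaced by a weight that changes gradually, so small atomic displacements cannot cause abrupt changes in the predicted energy.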

Bottom line: BSCT offers a pragmatic, low‑cost “smell test” for MLIP smoothness that can be baked into the model‑design loop, helping developers ship more reliable interatomic potentials without waiting for costly MD validation.

Authors

Ryan Liu
Eric Qu
Tobias Kreiman
Samuel M. Blau
Aditi S. Krishnapriyan

Paper Information

arXiv ID: 2602.04861v1
Categories: cs.LG, cond-mat.mtrl-sci, cs.AI, physics.chem-ph
Published: February 4, 2026