[Paper] From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures

Published: February 4, 2026 at 01:50 PM EST

Source: arXiv - 2602.04861v1

Overview

The paper presents a new benchmark, the Bond Smoothness Characterization Test (BSCT), that quickly spots when a machine‑learning interatomic potential (MLIP) produces an unphysical, “jagged” potential energy surface (PES). By flagging discontinuities, spurious minima, and errant forces far from equilibrium, BSCT helps researchers design more reliable MLIPs without the heavy cost of full molecular‑dynamics (MD) runs.

Key Contributions

  • BSCT benchmark: An inexpensive, automated test that deforms individual bonds to probe PES smoothness and detect hidden artifacts.
  • Correlation evidence: Demonstrates that BSCT scores strongly predict MD stability, while the test itself runs in a fraction of the time.
  • Design loop: Shows how BSCT can be used in‑the‑loop to iteratively improve MLIP architectures.
  • Transformer‑based MLIP case study: Introduces a differentiable k‑nearest‑neighbors (k‑NN) layer and temperature‑controlled attention, both guided by BSCT feedback, yielding a model that excels on traditional regression metrics and MD stability.
  • Open‑source tooling: Provides reference implementations for BSCT and the differentiable k‑NN module, facilitating adoption by the community.
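To illustrate why a differentiable k‑NN layer matters for smoothness, here is a minimal sketch contrasting a hard k‑nearest‑neighbor mask with one possible smooth relaxation. The sigmoid weighting, the midpoint threshold, and the names `hard_knn_mask`, `soft_knn_weights`, and `tau` are assumptions made for this example; the summary does not describe the paper's actual formulation.

```python
import math

def hard_knn_mask(distances, k):
    """Standard k-NN: weight 1 for the k closest neighbors, 0 otherwise."""
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    return [1.0 if i in nearest else 0.0 for i in range(len(distances))]

def soft_knn_weights(distances, k, tau=0.2):
    """Hypothetical smooth relaxation of k-NN (not the paper's method).

    The threshold sits midway between the k-th and (k+1)-th smallest
    distances, and each neighbor receives a sigmoid weight around it.
    Sorted distances vary continuously with atomic positions, so the
    weights do too. Requires len(distances) > k.
    """
    ordered = sorted(distances)
    threshold = 0.5 * (ordered[k - 1] + ordered[k])
    return [1.0 / (1.0 + math.exp((d - threshold) / tau)) for d in distances]

# Neighbor 3 moves from 3.09 to 3.11 and crosses neighbor 4 at 3.10.
before = [1.0, 1.1, 1.5, 3.09, 3.10, 4.0]
after = [1.0, 1.1, 1.5, 3.11, 3.10, 4.0]

hard_before = hard_knn_mask(before, k=4)
hard_after = hard_knn_mask(after, k=4)
soft_before = soft_knn_weights(before, k=4)
soft_after = soft_knn_weights(after, k=4)

print("hard weight of neighbor 3:", hard_before[3], "->", hard_after[3])
print("soft weight of neighbor 3:", soft_before[3], "->", soft_after[3])
```

With the hard mask, a 0.02 displacement switches the neighbor's contribution off entirely, which is the kind of discontinuity a bond scan would expose. The relaxed weights change only slightly over the same displacement.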

Methodology

  1. Bond‑wise deformation: For each bond type in a training set, the authors generate a 1‑D scan of inter‑atomic distances (both compression and stretching) while keeping the rest of the system fixed.
  2. Smoothness metrics: They compute three simple diagnostics on the MLIP’s predicted energy and forces along each scan:
    • Energy curvature continuity (second‑derivative smoothness)
    • Force consistency (force = –∇E)
    • Absence of artificial minima (detecting unexpected low‑energy wells)
     The metrics are aggregated into a single BSCT score.
  3. Benchmarking: BSCT scores are compared against standard MD stability tests (e.g., energy drift, atom loss) across several benchmark materials (silicon, copper, water).
  4. Iterative model redesign: Starting from a vanilla Transformer MLIP, the authors make two architectural changes: a differentiable k‑NN layer to enforce locality, and a temperature‑scaled attention mechanism to smooth out high‑frequency attention spikes. They retrain after each change and re‑evaluate with BSCT.

All steps are implemented in PyTorch and run on a single GPU, making the whole pipeline practical for everyday development cycles.
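
Steps 1 and 2 above can be sketched on a toy problem. The example below scans a single bond length and computes three diagnostics in the spirit of the ones listed. Everything here is an assumption made for illustration: a Morse pair potential stands in for an MLIP, the diagnostics use simple finite differences, and the function names (`scan_bond`, `diagnostics`) are hypothetical. The paper's exact metric definitions and aggregation into a BSCT score are not given in this summary.

```python
import math

def morse_energy(r, depth=1.0, width=1.5, r_eq=1.0):
    """Toy stand-in for an MLIP: Morse pair energy at bond length r."""
    return depth * (1.0 - math.exp(-width * (r - r_eq))) ** 2

def morse_force(r, depth=1.0, width=1.5, r_eq=1.0):
    """Analytic force along the bond, F = -dE/dr, for the toy potential."""
    e = math.exp(-width * (r - r_eq))
    return -2.0 * depth * width * (1.0 - e) * e

def jagged_energy(r):
    """Same potential with an artificial downward step to mimic a jagged PES."""
    return morse_energy(r) - (0.05 if r > 1.605 else 0.0)

def scan_bond(energy_fn, force_fn, r_min=0.7, r_max=2.5, n=181):
    """Evaluate energy and force on a uniform 1-D grid of bond lengths."""
    step = (r_max - r_min) / (n - 1)
    rs = [r_min + i * step for i in range(n)]
    return rs, [energy_fn(r) for r in rs], [force_fn(r) for r in rs]

def diagnostics(rs, energies, forces):
    """Three illustrative smoothness diagnostics along one scan."""
    h = rs[1] - rs[0]
    interior = range(1, len(rs) - 1)
    # Second derivative by central differences; a large jump between
    # adjacent grid points indicates a kink or step in the energy.
    d2 = [(energies[i - 1] - 2 * energies[i] + energies[i + 1]) / h**2
          for i in interior]
    curvature_jump = max(abs(b - a) for a, b in zip(d2, d2[1:]))
    # Compare the reported force with -dE/dr from finite differences.
    force_mismatch = max(
        abs(forces[i] + (energies[i + 1] - energies[i - 1]) / (2 * h))
        for i in interior)
    # Count interior local minima; a single-well bond should have one.
    n_minima = sum(
        1 for i in interior
        if energies[i] < energies[i - 1] and energies[i] < energies[i + 1])
    return {"curvature_jump": curvature_jump,
            "force_mismatch": force_mismatch,
            "n_minima": n_minima}

smooth = diagnostics(*scan_bond(morse_energy, morse_force))
# The jagged scan keeps the smooth forces, so energy and force disagree.
jagged = diagnostics(*scan_bond(jagged_energy, morse_force))
print("smooth:", smooth)
print("jagged:", jagged)
```

The jagged scan trips all three diagnostics at once: the curvature jumps at the step, the forces no longer match the energy gradient, and a second, spurious minimum appears just past the step.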

Results & Findings

| Metric | Baseline Transformer | +Diff‑kNN | +Temp‑Attention | Final Model |
|---|---|---|---|---|
| Energy RMSE (meV/atom) | 4.2 | 3.9 | 3.8 | 3.7 |
| Force RMSE (meV/Å) | 85 | 71 | 68 | 65 |
| BSCT score (lower = smoother) | 1.12 | 0.78 | 0.65 | 0.48 |
| MD stability (fraction of 10 ps runs that stay stable) | 0.62 | 0.84 | 0.91 | 0.96 |

Wall‑clock time for evaluation: BSCT ≈ 0.1 × MD

  • Strong correlation: Pearson r ≈ 0.88 between BSCT score and MD stability across all tested potentials.
  • Design impact: Each architectural tweak that improved BSCT also reduced conventional regression errors, showing that smoothness and accuracy are not mutually exclusive.
  • Generalization: The final model maintained low errors on unseen crystal structures and reproduced elastic constants within experimental uncertainty.
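
As a worked example of the correlation check, the sketch below computes a Pearson coefficient from the four model variants in the results table. These four rows are not the full set of potentials behind the reported r ≈ 0.88, so the number will differ. Because a lower BSCT score means a smoother surface, the coefficient on these rows comes out negative. The reported positive value presumably refers to the magnitude or to a differently oriented score, which is a reading on my part rather than something the source states.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Values copied from the results table, one entry per model variant.
bsct_score = [1.12, 0.78, 0.65, 0.48]
md_stable_fraction = [0.62, 0.84, 0.91, 0.96]

r = pearson(bsct_score, md_stable_fraction)
print(f"Pearson r on the four table rows: {r:.3f}")
```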

Practical Implications

  • Faster development cycles: Developers can run BSCT as part of CI pipelines to catch “hidden bugs” before launching expensive MD simulations. BSCT is reported to take roughly a tenth of the wall‑clock time of MD.
  • More robust simulations: Production‑level MLIPs built with BSCT guidance are less likely to crash or generate unphysical trajectories, which is critical for high‑throughput materials screening and reactive MD.
  • Architecture insights: The differentiable k‑NN and temperature‑controlled attention modules are reusable components that can be dropped into existing graph‑based or equivariant networks to enforce locality and smooth attention distributions.
  • Tooling integration: Because BSCT works on a per‑bond basis, it can be combined with existing datasets (e.g., OpenKIM, Materials Project) without extra data collection, making it easy to adopt in current MLIP workflows.
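
The temperature‑controlled attention mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which attention logits are divided by a temperature before the softmax. The source does not spell out the paper's exact mechanism, so the function `attention_weights` and its `temperature` parameter are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys, temperature=1.0):
    """Scaled dot-product attention weights with an extra temperature.

    Assumed form: softmax(q . k / (sqrt(d) * temperature)). A higher
    temperature flattens the distribution; a lower one sharpens it.
    """
    d = len(query)
    scale = math.sqrt(d) * temperature
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

query = [1.0, 0.5]
keys = [[2.0, 1.0], [0.5, 0.2], [-1.0, 0.3]]

cold = attention_weights(query, keys, temperature=0.5)
hot = attention_weights(query, keys, temperature=2.0)
print("T=0.5:", cold, "entropy:", entropy(cold))
print("T=2.0:", hot, "entropy:", entropy(hot))
```

Raising the temperature spreads the attention mass across neighbors, so a small change in one key moves the output less abruptly. That matches the stated goal of smoothing attention spikes.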

Bottom line: BSCT offers a pragmatic, low‑cost “smell test” for MLIP smoothness that can be baked into the model‑design loop, helping developers ship more reliable interatomic potentials without waiting for costly MD validation.

Authors

  • Ryan Liu
  • Eric Qu
  • Tobias Kreiman
  • Samuel M. Blau
  • Aditi S. Krishnapriyan

Paper Information

  • arXiv ID: 2602.04861v1
  • Categories: cs.LG, cond-mat.mtrl-sci, cs.AI, physics.chem-ph
  • Published: February 4, 2026