[Paper] Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

Published: April 17, 2026 at 04:20 AM EDT
Source: arXiv - 2604.15821v1

Overview

The paper presents MatRIS‑MoE, a billion‑parameter Mixture‑of‑Experts (MoE) model for universal machine‑learning interatomic potentials (uMLIPs), together with Janus, a novel high‑dimensional distributed training system that makes it possible to train such massive models on exascale supercomputers. By overcoming the steep computational and communication costs of second‑order derivative training, the authors shrink the training time from weeks to just a few hours, opening the door to rapid, quantum‑accurate simulations across the entire periodic table.

Key Contributions

  • MatRIS‑MoE architecture: a scalable, invariant MoE design tailored for interatomic potentials, supporting billions of parameters while preserving physical symmetries.
  • Janus training framework: the first high‑dimensional distributed system that efficiently handles second‑order derivative calculations required by uMLIPs, with hardware‑aware optimizations for both compute and communication.
  • Exascale performance: demonstrated on two world‑class supercomputers, achieving up to 1.2 EFLOPS (single‑precision) with >90 % parallel efficiency, corresponding to 24 %–35 % of the machines’ theoretical peaks.
  • Training speedup: reduced the wall‑clock time for a billion‑parameter uMLIP from weeks to a few hours, a >10× acceleration over prior state‑of‑the‑art pipelines.
  • Open‑source infrastructure: the authors release code and scripts, providing a reusable foundation for future AI‑for‑Science (AI4S) models.
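The invariant‑MoE idea above can be pictured as a gating layer that routes each atom embedding to a few specialized experts. The sketch below is illustrative only: the top‑k softmax gating, the embedding width, and the expert count are common MoE conventions assumed here, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_route(atom_feats, gate_w, k=2):
    """Pick the top-k experts per atom and softmax-normalise their gate scores."""
    logits = atom_feats @ gate_w                      # (n_atoms, n_experts)
    top = np.argsort(logits, axis=1)[:, -k:]          # indices of the k largest logits
    sel = np.take_along_axis(logits, top, axis=1)     # their raw gate scores
    w = np.exp(sel - sel.max(axis=1, keepdims=True))  # numerically stable softmax
    return top, w / w.sum(axis=1, keepdims=True)

atom_feats = rng.standard_normal((5, 16))   # 5 atoms, 16-dim embeddings
gate_w = rng.standard_normal((16, 8))       # gate over 8 experts
experts, weights = top_k_route(atom_feats, gate_w)
```

Because each atom touches only k of the experts, total parameters can grow with the expert count while per‑sample compute stays roughly constant, which is the scaling property the contribution describes.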

Methodology

  1. Invariant MoE Design – The model builds on a symmetry‑preserving backbone (rotational, translational, and permutational invariance) and augments it with a Mixture‑of‑Experts routing layer. Each expert learns a specialized region of the chemical space, allowing the overall parameter count to scale without a proportional increase in compute per sample.
  2. Second‑Order Derivative Training – uMLIPs must predict forces and stress tensors, which are first derivatives of the energy with respect to atomic positions; fitting them therefore requires differentiating those derivatives again with respect to the model parameters. The authors implement an efficient automatic‑differentiation pipeline that computes these second‑order terms in a distributed fashion.
  3. Janus Distributed Engine – Janus partitions the model across tensor, pipeline, and data parallel dimensions simultaneously. It employs:
    • Hybrid communication (NVLink, InfiniBand, and custom collective kernels) to hide latency.
    • Dynamic expert placement to balance load across nodes.
    • Memory‑aware scheduling that swaps inactive experts to host memory, keeping GPU utilization high.
  4. Training Dataset – A curated, multi‑domain dataset covering inorganic crystals, organic molecules, and alloys, spanning all elements, provides the universal pre‑training signal.
  5. Evaluation – After pre‑training, the model is fine‑tuned on downstream tasks (e.g., defect formation energies, reaction pathways) to assess transferability.
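Step 2 above is the crux: forces come from one backward pass, and fitting them needs a second. A minimal PyTorch sketch makes the double differentiation concrete; the toy quadratic energy stands in for the real model, and `create_graph=True` is what keeps the force computation differentiable with respect to the parameter.

```python
import torch

# Toy energy model E(x; theta) = theta * sum(x_i^2); theta plays the
# role of the network parameters, x the atomic coordinates.
theta = torch.tensor(1.5, requires_grad=True)
x = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

energy = theta * (x ** 2).sum()

# Forces are -dE/dx; create_graph=True keeps this first backward pass
# differentiable so the force loss can be backpropagated to theta.
(dE_dx,) = torch.autograd.grad(energy, x, create_graph=True)
forces = -dE_dx

target = torch.tensor([-1.0, 2.0, -4.0])   # reference forces (here, theta = 1)
loss = ((forces - target) ** 2).sum()
loss.backward()                            # second backward: d(loss)/d(theta)

# theta.grad now contains the mixed second derivative d^2E/(dx dtheta)
# contracted with the force error -- exactly the term Janus distributes.
```

At billion‑parameter scale this double backward dominates both compute and communication, which is why the paper treats it as the training barrier to break.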

Results & Findings

| Metric | Achieved | Prior art |
| --- | --- | --- |
| Peak single‑precision performance | 1.2 EFLOPS (24 % of peak) | ~0.3 EFLOPS |
| Parallel efficiency (≥ 4 k nodes) | >90 % | 60–70 % |
| Training wall‑clock (billion‑parameter model) | ≈ 4 hours | ≈ 2 weeks |
| Energy prediction MAE (benchmark set) | ≈ 3 meV/atom | 5–7 meV/atom |
| Force prediction MAE | ≈ 0.04 eV/Å | 0.07 eV/Å |

These numbers show that MatRIS‑MoE not only scales computationally but also delivers state‑of‑the‑art accuracy across a broad chemical space. The high parallel efficiency demonstrates that the Janus framework successfully mitigates the usual bottlenecks of second‑order derivative training.
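Treating the table's figures as representative, the headline claims follow from simple arithmetic. Note the implied machine peak below is derived from the reported numbers, not a figure the paper states directly.

```python
# Back-of-envelope numbers derived from the results table above.
prior_hours = 14 * 24        # "about 2 weeks" of wall-clock training
new_hours = 4                # "about 4 hours" with Janus
speedup = prior_hours / new_hours            # comfortably above the quoted >10x

achieved_eflops = 1.2
peak_fraction = 0.24                         # "24 % of peak"
implied_peak = achieved_eflops / peak_fraction

print(f"wall-clock speedup ~ {speedup:.0f}x")
print(f"implied single-precision machine peak ~ {implied_peak:.1f} EFLOPS")
```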

Practical Implications

  • Accelerated Materials Discovery – Researchers can now pre‑train a universal potential once and fine‑tune it for specific compounds in hours, dramatically shortening the design‑loop for catalysts, batteries, and semiconductors.
  • Integration into Existing Workflows – The released API works with popular simulation packages (LAMMPS, ASE), allowing developers to swap classical force fields for a quantum‑accurate ML potential without rewriting code.
  • Cost‑Effective Exascale Utilization – By achieving >90 % efficiency, the approach makes better use of expensive supercomputer allocations, reducing the carbon footprint per simulation.
  • Foundation for Future AI4S Models – The MoE + Janus paradigm can be adapted to other scientific domains that need high‑order derivatives, such as fluid dynamics (Navier‑Stokes) or electronic structure (DFT‑based surrogates).
  • Edge‑Case Coverage – Because the model is trained across the entire periodic table, its predictions remain usable for exotic alloys or high‑pressure phases that are poorly represented in traditional force fields, though coverage of extreme regimes is still thinner than for common chemistries (see Limitations below).

Limitations & Future Work

  • Hardware Dependency – The current speedups rely on cutting‑edge GPU interconnects (NVLink, HDR InfiniBand). Porting Janus to more modest clusters will require additional algorithmic tweaks.
  • Memory Footprint – Even with expert swapping, a single training run consumes several terabytes of GPU memory, limiting the size of batches that can be processed.
  • Generalization to Extreme Conditions – While the dataset is diverse, extreme temperature/pressure regimes still have sparse coverage; targeted data augmentation is needed.
  • Explainability – MoE routing decisions are opaque; future work could add interpretability layers to understand which chemical motifs activate specific experts.
  • Fine‑Tuning Overheads – Although pre‑training is fast, fine‑tuning on niche domains sometimes still requires days of compute; research into few‑shot adaptation methods is ongoing.

Bottom line: By marrying a physics‑aware MoE architecture with a purpose‑built exascale training engine, the authors have turned the once‑prohibitive task of training billion‑parameter universal interatomic potentials into a practical tool for developers and scientists alike. This breakthrough paves the way for rapid, high‑fidelity simulations that can accelerate the next generation of materials and chemical innovations.

Authors

  • Yuanchang Zhou
  • Hongyu Wang
  • Yiming Du
  • Yan Wang
  • Mingzhen Li
  • Siyu Hu
  • Xiangyu Zhang
  • Weijian Liu
  • Chen Wang
  • Zhuoqiang Guo
  • Long Wang
  • Jingde Bu
  • Yutong Lu
  • Guangming Tan
  • Weile Jia

Paper Information

  • arXiv ID: 2604.15821v1
  • Categories: cs.DC, cs.LG
  • Published: April 17, 2026