[Paper] High-Order Spectral Element Methods for Wave Propagation on ARM Multicore CPU with SME: Optimizations and Implications

Published: (June 10, 2026 at 11:30 PM EDT)
Source: arXiv

Overview

Wave propagation based on the spectral element method (SEM) is a representative HPC workload, but existing SEM implementations are not well matched to emerging ARM multicore CPUs with Scalable Matrix Extension (SME). We present an SME-enabled optimization of SPECFEM3D on the emerging LX2 processor that combines an SME-aware batched small-matrix kernel for SEM tensor-product operators, a memory-aware hybrid MPI+OpenMP execution scheme for limited-HBM systems, and a dispersion-based iso-accuracy study of the (h, p) tradeoff. At fixed polynomial order, the optimized implementation improves full-application performance by 4-6× over the original code and delivers clear gains over optimized non-SME CPU baselines. Beyond these implementation-level gains, our results suggest that SME shifts the performance-favorable operating point toward higher polynomial orders along the dispersion-based iso-accuracy frontier, further reducing time-to-solution and working-set size. These results indicate that SME affects not only kernel efficiency, but also the practical discretization tradeoff for SEM on modern ARM multicore platforms.
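
To make the phrase "batched small-matrix kernel for SEM tensor-product operators" concrete, here is a minimal pure-Python sketch of the underlying arithmetic. It is an illustration only, not the paper's kernel and not SPECFEM3D code: the function names (`matmul`, `apply_along_axes`, `apply_batched`) and the data layout are assumptions made for this example, and no SME instructions are involved. It shows how applying a 1D operator `D` along each axis of an n × n × n element reduces to small dense matrix products, which is the shape of work a matrix unit is designed for.

```python
# Illustrative sketch (hypothetical names, not the paper's implementation):
# apply a 1D operator D along each axis of n x n x n element fields,
# expressed as small dense matrix products.

def matmul(a, b):
    """Dense product of a (m x k) and b (k x n), both lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def apply_along_axes(D, u):
    """Return (d0, d1, d2) for one element field u[i][j][k], where
    d0[i][j][k] = sum_l D[i][l] * u[l][j][k]   (operator along axis 0)
    d1[i][j][k] = sum_l D[j][l] * u[i][l][k]   (operator along axis 1)
    d2[i][j][k] = sum_l D[k][l] * u[i][j][l]   (operator along axis 2)
    """
    n = len(D)
    # Axis 0: flatten (j, k) into one index, giving a single n x n^2 product.
    flat = [[u[l][j][k] for j in range(n) for k in range(n)] for l in range(n)]
    r0 = matmul(D, flat)
    d0 = [[[r0[i][j * n + k] for k in range(n)] for j in range(n)]
          for i in range(n)]
    # Axis 1: one n x n product per i-slab, D @ u[i].
    d1 = [matmul(D, u[i]) for i in range(n)]
    # Axis 2: one n x n product per i-slab, u[i] @ D^T.
    Dt = transpose(D)
    d2 = [matmul(u[i], Dt) for i in range(n)]
    return d0, d1, d2


def apply_batched(D, batch):
    """Apply the same small operator to every element in the batch."""
    return [apply_along_axes(D, u) for u in batch]


# Usage: two nodes per direction at -1 and +1, linear field x + 2y + 3z.
# D is the exact derivative matrix for linear interpolation on those nodes.
nodes = [-1.0, 1.0]
D = [[-0.5, 0.5], [-0.5, 0.5]]
u = [[[x + 2 * y + 3 * z for z in nodes] for y in nodes] for x in nodes]
d0, d1, d2 = apply_along_axes(D, u)
# Every entry of d0 is 1.0, of d1 is 2.0, and of d2 is 3.0.
```

Because every element shares the same small `D`, the per-element products are independent and uniform in shape, which is what makes batching them attractive; how the paper maps that batch onto SME tiles is described in the full text, not here.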

Key Contributions

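According to the abstract, the paper (arXiv category cs.DC) makes three contributions:

  • An SME-aware batched small-matrix kernel for SEM tensor-product operators.
  • A memory-aware hybrid MPI+OpenMP execution scheme for limited-HBM systems.
  • A dispersion-based iso-accuracy study of the (h, p) tradeoff.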

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

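The abstract points to two practical takeaways. At fixed polynomial order, the optimized SPECFEM3D runs 4-6× faster than the original code on the LX2 processor. SME also appears to shift the performance-favorable operating point toward higher polynomial orders at a given dispersion accuracy, which further reduces time-to-solution and working-set size.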

Authors

  • Yinuo Wang
  • Lin Gan
  • Tianqi Mao
  • Wubing Wan
  • Zekun Yin
  • Wenqiang Wang
  • Wei Xue
  • Guangwen Yang

Paper Information

  • arXiv ID: 2606.12850v1
  • Categories: cs.DC
  • Published: June 11, 2026