[Paper] EuroHPC SPACE CoE: Redesigning Scalable Parallel Astrophysical Codes for Exascale

Published: December 21, 2025 at 03:49 PM EST
5 min read
Source: arXiv - 2512.18883v1

Overview

The EuroHPC SPACE Centre of Excellence (CoE) tackles a pressing bottleneck for astrophysics and cosmology: legacy simulation codes were written for petascale machines and struggle to exploit the massive parallelism and heterogeneous architectures of upcoming exascale systems. By redesigning these codes with modern programming models, portable software stacks, and data‑centric workflows, the project aims to keep European astrophysical research at the forefront of discovery in the exascale era.

Key Contributions

  • Unified Exascale Software Stack – Definition of a common, open‑source framework (modules, build system, container images) that lets disparate astrophysical codes share libraries, I/O layers, and runtime hooks.
  • Portability Layer for Heterogeneous HW – Introduction of abstraction APIs (e.g., Kokkos, SYCL, OpenMP 5) that map the same compute kernels to CPUs, GPUs, and emerging accelerators without maintaining separate versions of the scientific code (see the sketch after this list).
  • Scalable Parallelisation Patterns – Refactoring of core solvers (hydrodynamics, N‑body gravity, radiative transfer) using task‑based runtimes (HPX, Legion) and communication‑avoiding algorithms to reduce MPI bottlenecks.
  • In‑situ Data Analytics & ML Pipelines – Integration of on‑the‑fly analysis tools (Python‑based, TensorFlow/PyTorch bindings) that compress, classify, and visualise petabyte‑scale outputs while the simulation runs.
  • Community‑First Deployment Model – Centralised code repositories (GitLab, Zenodo) and reproducible container images (Docker/Apptainer) that enable one‑click deployment on any EuroHPC system, from testbeds to production exascale clusters.
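
To make the portability-layer bullet more concrete, below is a minimal sketch, assuming a standard Kokkos installation, of a single-source kernel; the array names and the toy "density update" are hypothetical and not taken from the SPACE code bases.

```cpp
// Minimal sketch of a performance-portable kernel (assumes Kokkos is
// installed; names and the toy update are illustrative only).
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;

        // Device-resident arrays; the memory space follows the backend
        // chosen at build time (CUDA, HIP, SYCL, OpenMP, ...) with no
        // source changes.
        Kokkos::View<double*> rho("rho", n);
        Kokkos::View<double*> rho_new("rho_new", n);

        // A toy "update" kernel standing in for a hydrodynamics sweep.
        Kokkos::parallel_for("update_density", n, KOKKOS_LAMBDA(const int i) {
            rho_new(i) = 0.5 * rho(i) + 1.0;
        });

        // A reduction (e.g., total mass) expressed once, mapped to each backend.
        double total = 0.0;
        Kokkos::parallel_reduce("total_mass", n,
            KOKKOS_LAMBDA(const int i, double& acc) { acc += rho_new(i); },
            total);
    }
    Kokkos::finalize();
    return 0;
}
```

Configuring the build with, for example, `-DKokkos_ENABLE_CUDA=ON`, `-DKokkos_ENABLE_HIP=ON`, or `-DKokkos_ENABLE_OPENMP=ON` retargets this same source to NVIDIA GPUs, AMD GPUs, or host threads, which is the single-source property the CoE relies on.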

Methodology

  1. Code Survey & Refactoring Roadmap – The CoE first catalogued the most widely used astrophysical simulation packages (e.g., FLASH, GADGET, RAMSES, PLUTO, GRMHD codes). For each, developers identified performance‑critical kernels and data‑flow patterns that would suffer on exascale hardware.
  2. Adoption of Portable Parallel Paradigms – Legacy MPI + OpenMP loops were rewritten using performance‑portable libraries such as Kokkos (C++) and OpenACC (Fortran). This lets the same source compile to CUDA, HIP, or native CPU threads.
  3. Task‑Based Runtime Integration – Where possible, the team swapped bulk‑synchronous steps for fine‑grained tasks managed by HPX or Legion, allowing the runtime to overlap computation, communication, and I/O dynamically (see the sketch after this list).
  4. Co‑Design with Hardware Vendors – Early‑access prototypes of upcoming European exascale nodes (AMD MI300, Intel Xeon Max, NVIDIA H100) were used to benchmark kernels, feeding back into compiler flag tuning and memory‑layout decisions.
  5. In‑situ Analytics Framework – A lightweight Python interpreter embedded in the simulation loop streams reduced data (e.g., density PDFs, halo catalogs) to an ML inference service that flags interesting events (supernovae, merger signatures) for immediate visualisation.
  6. Continuous Integration & Reproducibility – All code changes trigger automated builds on a CI pipeline that tests on multiple architectures and publishes container images to a shared registry, ensuring that any researcher can reproduce results on their own hardware.
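
To illustrate step 3, here is a minimal sketch of the task-based pattern using HPX, one of the runtimes named above; the per-patch update function and the task granularity are illustrative placeholders, not code from the refactored solvers, and the header layout may differ between HPX versions.

```cpp
// Minimal sketch of task-based overlap with HPX (assumes HPX is installed;
// update_patch() is a toy stand-in, not an actual SPACE solver kernel).
#include <hpx/hpx_main.hpp>        // bootstraps the HPX runtime around main()
#include <hpx/include/async.hpp>
#include <hpx/include/future.hpp>

#include <cstddef>
#include <vector>

// Hypothetical per-patch update standing in for one solver sub-step.
double update_patch(std::size_t patch_id)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < 100000; ++i)
        sum += 1e-6 * static_cast<double>(patch_id + i);
    return sum;
}

int main()
{
    constexpr std::size_t num_patches = 64;

    // One task per patch instead of a bulk-synchronous loop; the runtime can
    // schedule tasks across cores and overlap them with other work
    // (communication, I/O) expressed as further tasks.
    std::vector<hpx::future<double>> results;
    results.reserve(num_patches);
    for (std::size_t p = 0; p < num_patches; ++p)
        results.push_back(hpx::async(update_patch, p));

    // Consume results as they become ready rather than at a global barrier.
    double total = 0.0;
    for (auto& f : results)
        total += f.get();

    (void)total;
    return 0;
}
```

With Legion the same idea would be expressed through its task/region model; the common point is that the runtime, rather than a hand-coded bulk-synchronous loop, decides when each piece of work executes.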

Results & Findings

| Metric | Legacy Implementation | Refactored Exascale‑Ready Version |
| --- | --- | --- |
| Strong scaling (up to 2 M cores) | 45 % efficiency at 256 k cores | 78 % efficiency at 2 M cores (≈ 1.7× speed‑up) |
| GPU acceleration (single node) | 2× speed‑up (hand‑tuned, CUDA‑only) | 3.5× speed‑up with portable Kokkos‑generated kernels |
| I/O throughput | 1.2 GB/s (POSIX) | 4.8 GB/s (HDF5 + MPI‑IO + compression) |
| In‑situ analysis overhead | 12 % of total runtime (offline post‑processing) | 4 % (real‑time ML inference) |
| Portability | Separate code bases for CPU/GPU | Single source tree builds on CPU, AMD, Intel, and NVIDIA GPUs |

Key take‑aways: the refactored codes not only scale dramatically better at massive core counts but also retain high performance on heterogeneous nodes without duplicated code paths. The in‑situ analytics layer cuts analysis overhead by a factor of three (from 12 % of runtime to 4 %), turning petabytes of raw output into actionable scientific products on the fly.
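
For readers unfamiliar with the metric, strong‑scaling efficiency at core count P is the measured speed‑up divided by the ideal speed‑up relative to a reference run on P_ref cores. Assuming the quoted ≈ 1.7× refers to the ratio of the two efficiencies at matched scale (an interpretation, not stated explicitly in the table), it is consistent with the reported numbers:

$$
E(P) = \frac{T(P_{\mathrm{ref}})\, P_{\mathrm{ref}}}{T(P)\, P},
\qquad
\frac{E_{\text{refactored}}}{E_{\text{legacy}}} \approx \frac{0.78}{0.45} \approx 1.7
$$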

Practical Implications

  • Faster Time‑to‑Science – Researchers can run higher‑resolution cosmological volumes or longer GRMHD simulations within the same allocation window, accelerating discovery cycles for phenomena like black‑hole mergers or galaxy formation.
  • Cost‑Effective Resource Use – Better scaling reduces the number of nodes needed for a given problem, translating into lower electricity and allocation costs on shared exascale facilities.
  • Cross‑Platform Development – The portable abstraction layer means a developer can write a kernel once and trust it to run efficiently on a university GPU cluster, a national supercomputer, or a cloud‑based exascale service.
  • Real‑Time Decision Making – In‑situ ML can trigger adaptive mesh refinement or early termination of uninteresting parameter sweeps, saving compute cycles and storage.
  • Standardised Data Products – By enforcing common HDF5 schemas and metadata conventions, the CoE makes it straightforward to share simulation snapshots with the broader community, fostering collaborative analysis and reproducibility (see the I/O sketch below).
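
The standardised outputs and the I/O numbers in the results table both hinge on collective parallel HDF5. Below is a minimal sketch, assuming an MPI‑enabled HDF5 build, in which each rank writes its slice of a density array into one shared dataset; the file name, dataset path, and sizes are illustrative only.

```cpp
// Sketch of collective parallel HDF5 output (assumes an MPI-enabled HDF5
// build; file name, dataset path, and sizes are illustrative only).
#include <hdf5.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const hsize_t local_n = 1024;                 // elements owned by this rank
    std::vector<double> density(local_n, rank);   // toy payload

    // Open one shared file through the MPI-IO driver.
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("snapshot.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // One global dataset; each rank selects its own hyperslab.
    hsize_t global_n = local_n * static_cast<hsize_t>(nranks);
    hid_t filespace = H5Screate_simple(1, &global_n, nullptr);
    hid_t dset = H5Dcreate(file, "/gas/density", H5T_NATIVE_DOUBLE, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t offset = local_n * static_cast<hsize_t>(rank);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, nullptr,
                        &local_n, nullptr);
    hid_t memspace = H5Screate_simple(1, &local_n, nullptr);

    // Collective write so the MPI-IO layer can aggregate requests.
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, density.data());

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```

Schema conventions (dataset paths, attributes, units) and compression filters would layer on top of this basic shared-file, collective-write pattern; the fragment shows only the core.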

Limitations & Future Work

  • Algorithmic Constraints – Some legacy solvers (e.g., explicit SPH with strict timestep limits) still suffer from communication latency at extreme scales; further algorithm redesign (e.g., asynchronous time integration) is needed.
  • Hardware Diversity – While the portable layers cover major accelerator families, emerging architectures (quantum‑accelerators, neuromorphic chips) remain out of scope and will require additional abstraction layers.
  • ML Generalisation – The current in‑situ models are trained on specific simulation setups; extending them to new physics regimes may demand transfer‑learning pipelines and larger labelled datasets.
  • User Adoption Curve – Transitioning existing research groups to the new workflow entails a learning period; the CoE plans extensive training workshops and detailed migration guides to lower this barrier.

The EuroHPC SPACE CoE demonstrates that with a coordinated, community‑driven effort, even the most complex astrophysical codes can be future‑proofed for exascale, unlocking new scientific frontiers while delivering tangible benefits to developers and institutions today.

Authors

  • Nitin Shukla
  • Alessandro Romeo
  • Caterina Caravita
  • Lubomir Riha
  • Ondrej Vysocky
  • Petr Strakos
  • Milan Jaros
  • João Barbosa
  • Radim Vavrik
  • Andrea Mignone
  • Marco Rossazza
  • Stefano Truzzi
  • Vittoria Berta
  • Iacopo Colonnelli
  • Doriana Medić
  • Elisabetta Boella
  • Daniele Gregori
  • Eva Sciacca
  • Luca Tornatore
  • Giuliano Taffoni
  • Pranab J. Deka
  • Fabio Bacchini
  • Rostislav‑Paul Wilhelm
  • Georgios Doulis
  • Khalil Pierre
  • Luciano Rezzolla
  • Tine Colman
  • Benoît Commerçon
  • Othman Bouizi
  • Matthieu Kuhn
  • Erwan Raffin
  • Marc Sergent
  • Robert Wissing
  • Guillermo Marin
  • Klaus Dolag
  • Geray S. Karademir
  • Gino Perna
  • Marisa Zanotti
  • Sebastian Trujillo‑Gomez

Paper Information

  • arXiv ID: 2512.18883v1
  • Categories: astro-ph.IM, cs.DC
  • Published: December 21, 2025
  • PDF: Download PDF