[Paper] EuroHPC SPACE CoE: Redesigning Scalable Parallel Astrophysical Codes for Exascale

Published: December 21, 2025 at 03:49 PM EST
5 min read
Source: arXiv - 2512.18883v1

Overview

The EuroHPC SPACE Centre of Excellence (CoE) tackles a pressing bottleneck for astrophysics and cosmology: legacy simulation codes were written for petascale machines and struggle to exploit the massive parallelism and heterogeneous architectures of upcoming exascale systems. By redesigning these codes with modern programming models, portable software stacks, and data‑centric workflows, the project aims to keep European astrophysical research at the forefront of discovery in the exascale era.

Key Contributions

  • Unified Exascale Software Stack – Definition of a common, open‑source framework (modules, build system, container images) that lets disparate astrophysical codes share libraries, I/O layers, and runtime hooks.
  • Portability Layer for Heterogeneous HW – Introduction of abstraction APIs (e.g., Kokkos, SYCL, OpenMP 5) that map the same compute kernels to CPUs, GPUs, and emerging accelerators without maintaining separate versions of the scientific code (see the sketch after this list).
  • Scalable Parallelisation Patterns – Refactoring of core solvers (hydrodynamics, N‑body gravity, radiative transfer) using task‑based runtimes (HPX, Legion) and communication‑avoiding algorithms to reduce MPI bottlenecks.
  • In‑situ Data Analytics & ML Pipelines – Integration of on‑the‑fly analysis tools (Python‑based, TensorFlow/PyTorch bindings) that compress, classify, and visualise petabyte‑scale outputs while the simulation runs.
  • Community‑First Deployment Model – Centralised code repositories (GitLab, Zenodo) and reproducible container images (Docker/Apptainer) that enable one‑click deployment on any EuroHPC system, from testbeds to production exascale clusters.
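
To make the portability-layer bullet more concrete, below is a minimal sketch, assuming a standard Kokkos installation, of a single-source kernel; the array names and the toy "density update" are hypothetical and not taken from the SPACE code bases.

```cpp
// Minimal sketch of a performance-portable kernel (assumes Kokkos is
// installed; names and the toy update are illustrative only).
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;

        // Device-resident arrays; the memory space follows the backend
        // chosen at build time (CUDA, HIP, SYCL, OpenMP, ...) with no
        // source changes.
        Kokkos::View<double*> rho("rho", n);
        Kokkos::View<double*> rho_new("rho_new", n);

        // A toy "update" kernel standing in for a hydrodynamics sweep.
        Kokkos::parallel_for("update_density", n, KOKKOS_LAMBDA(const int i) {
            rho_new(i) = 0.5 * rho(i) + 1.0;
        });

        // A reduction (e.g., total mass) expressed once, mapped to each backend.
        double total = 0.0;
        Kokkos::parallel_reduce("total_mass", n,
            KOKKOS_LAMBDA(const int i, double& acc) { acc += rho_new(i); },
            total);
    }
    Kokkos::finalize();
    return 0;
}
```

Configuring the build with, for example, `-DKokkos_ENABLE_CUDA=ON`, `-DKokkos_ENABLE_HIP=ON`, or `-DKokkos_ENABLE_OPENMP=ON` retargets this same source to NVIDIA GPUs, AMD GPUs, or host threads, which is the single-source property the CoE relies on.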

Methodology

  1. Code Survey & Refactoring Roadmap – The CoE first catalogued the most widely used astrophysical simulation packages (e.g., FLASH, GADGET, RAMSES, PLUTO, GRMHD codes). For each, developers identified performance‑critical kernels and data‑flow patterns that would suffer on exascale hardware.
  2. Adoption of Portable Parallel Paradigms – Legacy MPI + OpenMP loops were rewritten using performance‑portable libraries such as Kokkos (C++) and OpenACC (Fortran). This lets the same source compile to CUDA, HIP, or native CPU threads.
  3. Task‑Based Runtime Integration – Where possible, the team swapped bulk‑synchronous steps for fine‑grained tasks managed by HPX or Legion, allowing the runtime to overlap computation, communication, and I/O dynamically (see the sketch after this list).
  4. Co‑Design with Hardware Vendors – Early‑access prototypes of upcoming European exascale nodes (AMD MI300, Intel Xeon Max, NVIDIA H100) were used to benchmark kernels, feeding back into compiler flag tuning and memory‑layout decisions.
  5. In‑situ Analytics Framework – A lightweight Python interpreter embedded in the simulation loop streams reduced data (e.g., density PDFs, halo catalogs) to an ML inference service that flags interesting events (supernovae, merger signatures) for immediate visualisation.
  6. Continuous Integration & Reproducibility – All code changes trigger automated builds on a CI pipeline that tests on multiple architectures and publishes container images to a shared registry, ensuring that any researcher can reproduce results on their own hardware.
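
To illustrate step 3, here is a minimal sketch of the task-based pattern using HPX, one of the runtimes named above; the per-patch update function and the task granularity are illustrative placeholders, not code from the refactored solvers, and the header layout may differ between HPX versions.

```cpp
// Minimal sketch of task-based overlap with HPX (assumes HPX is installed;
// update_patch() is a toy stand-in, not an actual SPACE solver kernel).
#include <hpx/hpx_main.hpp>        // bootstraps the HPX runtime around main()
#include <hpx/include/async.hpp>
#include <hpx/include/future.hpp>

#include <cstddef>
#include <vector>

// Hypothetical per-patch update standing in for one solver sub-step.
double update_patch(std::size_t patch_id)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < 100000; ++i)
        sum += 1e-6 * static_cast<double>(patch_id + i);
    return sum;
}

int main()
{
    constexpr std::size_t num_patches = 64;

    // One task per patch instead of a bulk-synchronous loop; the runtime can
    // schedule tasks across cores and overlap them with other work
    // (communication, I/O) expressed as further tasks.
    std::vector<hpx::future<double>> results;
    results.reserve(num_patches);
    for (std::size_t p = 0; p < num_patches; ++p)
        results.push_back(hpx::async(update_patch, p));

    // Consume results as they become ready rather than at a global barrier.
    double total = 0.0;
    for (auto& f : results)
        total += f.get();

    (void)total;
    return 0;
}
```

With Legion the same idea would be expressed through its task/region model; the common point is that the runtime, rather than a hand-coded bulk-synchronous loop, decides when each piece of work executes.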

Results & Findings

| Metric | Legacy Implementation | Refactored Exascale‑Ready Version |
| --- | --- | --- |
| Strong scaling (up to 2 M cores) | 45 % efficiency at 256 k cores | 78 % efficiency at 2 M cores (≈ 1.7× speed‑up) |
| GPU acceleration (single node) | 2× speed‑up (hand‑tuned, CUDA‑only) | 3.5× speed‑up with portable Kokkos‑generated kernels |
| I/O throughput | 1.2 GB/s (POSIX) | 4.8 GB/s (HDF5 + MPI‑IO + compression) |
| In‑situ analysis overhead | 12 % of total runtime (offline post‑processing) | 4 % (real‑time ML inference) |
| Portability | Separate code bases for CPU/GPU | Single source tree builds on CPU, AMD, Intel, and NVIDIA GPUs |

Key take‑aways: the refactored codes not only scale dramatically better at massive core counts but also retain high performance on heterogeneous nodes without duplicated code paths. The in‑situ analytics layer cuts analysis overhead by a factor of three (from 12 % of runtime to 4 %), turning petabytes of raw output into actionable scientific products on the fly.
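
For readers unfamiliar with the metric, strong‑scaling efficiency at core count P is the measured speed‑up divided by the ideal speed‑up relative to a reference run on P_ref cores. Assuming the quoted ≈ 1.7× refers to the ratio of the two efficiencies at matched scale (an interpretation, not stated explicitly in the table), it is consistent with the reported numbers:

$$
E(P) = \frac{T(P_{\mathrm{ref}})\, P_{\mathrm{ref}}}{T(P)\, P},
\qquad
\frac{E_{\text{refactored}}}{E_{\text{legacy}}} \approx \frac{0.78}{0.45} \approx 1.7
$$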

Practical Implications

  • Faster Time‑to‑Science – Researchers can run higher‑resolution cosmological volumes or longer GRMHD simulations within the same allocation window, accelerating discovery cycles for phenomena like black‑hole mergers or galaxy formation.
  • Cost‑Effective Resource Use – Better scaling reduces the number of nodes needed for a given problem, translating into lower electricity and allocation costs on shared exascale facilities.
  • Cross‑Platform Development – The portable abstraction layer means a developer can write a kernel once and trust it to run efficiently on a university GPU cluster, a national supercomputer, or a cloud‑based exascale service.
  • Real‑Time Decision Making – In‑situ ML can trigger adaptive mesh refinement or early termination of uninteresting parameter sweeps, saving compute cycles and storage.
  • Standardised Data Products – By enforcing common HDF5 schemas and metadata conventions, the CoE makes it straightforward to share simulation snapshots with the broader community, fostering collaborative analysis and reproducibility (see the I/O sketch below).
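
The standardised outputs and the I/O numbers in the results table both hinge on collective parallel HDF5. Below is a minimal sketch, assuming an MPI‑enabled HDF5 build, in which each rank writes its slice of a density array into one shared dataset; the file name, dataset path, and sizes are illustrative only.

```cpp
// Sketch of collective parallel HDF5 output (assumes an MPI-enabled HDF5
// build; file name, dataset path, and sizes are illustrative only).
#include <hdf5.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const hsize_t local_n = 1024;                 // elements owned by this rank
    std::vector<double> density(local_n, rank);   // toy payload

    // Open one shared file through the MPI-IO driver.
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("snapshot.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // One global dataset; each rank selects its own hyperslab.
    hsize_t global_n = local_n * static_cast<hsize_t>(nranks);
    hid_t filespace = H5Screate_simple(1, &global_n, nullptr);
    hid_t dset = H5Dcreate(file, "/gas/density", H5T_NATIVE_DOUBLE, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t offset = local_n * static_cast<hsize_t>(rank);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, nullptr,
                        &local_n, nullptr);
    hid_t memspace = H5Screate_simple(1, &local_n, nullptr);

    // Collective write so the MPI-IO layer can aggregate requests.
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, density.data());

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```

Schema conventions (dataset paths, attributes, units) and compression filters would layer on top of this basic shared-file, collective-write pattern; the fragment shows only the core.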

Limitations & Future Work

  • Algorithmic Constraints – Some legacy solvers (e.g., explicit SPH with strict timestep limits) still suffer from communication latency at extreme scales; further algorithm redesign (e.g., asynchronous time integration) is needed.
  • Hardware Diversity – While the portable layers cover major accelerator families, emerging architectures (quantum‑accelerators, neuromorphic chips) remain out of scope and will require additional abstraction layers.
  • ML Generalisation – The current in‑situ models are trained on specific simulation setups; extending them to new physics regimes may demand transfer‑learning pipelines and larger labelled datasets.
  • User Adoption Curve – Transitioning existing research groups to the new workflow entails a learning period; the CoE plans extensive training workshops and detailed migration guides to lower this barrier.

The EuroHPC SPACE CoE demonstrates that with a coordinated, community‑driven effort, even the most complex astrophysical codes can be future‑proofed for exascale, unlocking new scientific frontiers while delivering tangible benefits to developers and institutions today.

Authors

  • Nitin Shukla
  • Alessandro Romeo
  • Caterina Caravita
  • Lubomir Riha
  • Ondrej Vysocky
  • Petr Strakos
  • Milan Jaros
  • João Barbosa
  • Radim Vavrik
  • Andrea Mignone
  • Marco Rossazza
  • Stefano Truzzi
  • Vittoria Berta
  • Iacopo Colonnelli
  • Doriana Medić
  • Elisabetta Boella
  • Daniele Gregori
  • Eva Sciacca
  • Luca Tornatore
  • Giuliano Taffoni
  • Pranab J. Deka
  • Fabio Bacchini
  • Rostislav‑Paul Wilhelm
  • Georgios Doulis
  • Khalil Pierre
  • Luciano Rezzolla
  • Tine Colman
  • Benoît Commerçon
  • Othman Bouizi
  • Matthieu Kuhn
  • Erwan Raffin
  • Marc Sergent
  • Robert Wissing
  • Guillermo Marin
  • Klaus Dolag
  • Geray S. Karademir
  • Gino Perna
  • Marisa Zanotti
  • Sebastian Trujillo‑Gomez

Paper Information

  • arXiv ID: 2512.18883v1
  • Categories: astro-ph.IM, cs.DC
  • Published: December 21, 2025
  • PDF: Download PDF