[Paper] RHAPSODY: Execution of Hybrid AI-HPC Workflows at Scale

Published: December 23, 2025 at 04:42 PM EST
4 min read
Source: arXiv - 2512.20795v1

Overview

The paper introduces RHAPSODY, a middleware layer that lets developers run highly heterogeneous AI‑HPC pipelines—mixing large‑scale simulations, deep‑learning training, high‑throughput inference, and tightly‑coupled agent‑driven control—inside a single job on leadership‑class supercomputers. By orchestrating existing runtimes rather than replacing them, RHAPSODY bridges the gap between traditional MPI‑based scientific codes and modern AI services, enabling these disparate components to scale together efficiently.

Key Contributions

  • Unified abstraction layer for tasks, services, resources, and execution policies that works across MPI, containerized AI services, and fine‑grained task runtimes (a minimal sketch of these objects follows this list).
  • Composable multi‑runtime architecture that coordinates existing runtimes (e.g., RADICAL‑Pilot, Dask, Ray, vLLM) instead of reinventing them.
  • Low‑overhead orchestration demonstrated on multiple leadership‑class systems, showing near‑linear scaling for high‑throughput inference and efficient AI‑HPC coupling.
  • Real‑world validation with two representative workloads: (1) Dragon (a scientific simulation) + vLLM inference at scale, and (2) an agentic workflow that tightly couples simulation steps with AI decisions.
  • Extensible policy engine that lets users specify placement, priority, and data‑movement strategies for heterogeneous components in a single job allocation.
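
The unified abstraction can be illustrated with a minimal sketch. The class names and fields below are hypothetical, chosen only to convey how batch‑style MPI work and persistent AI services might be described with one vocabulary; they are not RHAPSODY’s actual API.

```python
# Hypothetical sketch of a Task / Service / Resource / Policy vocabulary.
# Names and fields are illustrative only, not RHAPSODY's actual API.
from dataclasses import dataclass, field


@dataclass
class Resource:
    nodes: int                           # nodes requested out of the shared allocation
    gpus_per_node: int = 0


@dataclass
class Task:
    """Batch-style unit of work, e.g. one MPI simulation run."""
    executable: str
    arguments: list[str] = field(default_factory=list)
    resource: Resource = field(default_factory=lambda: Resource(nodes=1))


@dataclass
class Service:
    """Persistent component, e.g. a containerized vLLM inference server."""
    image: str                           # container image providing the service
    command: str                         # command that starts the server
    resource: Resource = field(default_factory=lambda: Resource(nodes=1, gpus_per_node=4))


@dataclass
class Policy:
    """Placement and scheduling hints attached to tasks and services."""
    priority: int = 0
    colocate_with: str | None = None     # name of a component to place nearby
    latency_target_ms: float | None = None
```

With objects like these, a hybrid pipeline becomes a list of tasks and services plus a policy, which a coordinator can map onto a single job allocation.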

Methodology

  1. Abstraction Design – The authors defined a set of generic objects (Task, Service, Resource, Policy) that capture the essential semantics of both batch‑style MPI jobs and persistent AI services.
  2. Runtime Composition – RHAPSODY launches each required runtime (e.g., an MPI job via srun, a containerized inference server via singularity, a task queue via Dask) inside the same allocation. A lightweight coordinator mediates communication and resource sharing among them.
  3. Policy‑Driven Scheduling – Users provide a JSON/YAML policy describing how many nodes to allocate to each runtime, data locality constraints, and latency targets. The coordinator enforces these policies at launch and dynamically during execution (a hypothetical policy and launcher sketch follows this list).
  4. Benchmarking – Experiments were run on three HPC systems (Summit, Perlmutter, and Theta) using:
    • High‑throughput inference: thousands of concurrent vLLM requests feeding a Dragon simulation.
    • Agentic workflow: a loop where a simulation step triggers an AI model that decides the next simulation parameters, requiring sub‑second round‑trip latency (a minimal loop sketch also follows this list).
  5. Metrics Collected – Runtime overhead, scaling efficiency, end‑to‑end latency, and network I/O were measured and compared against baseline runs where each component was executed in isolation.
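
The paper specifies policies in JSON/YAML, but the schema is not reproduced in this summary, so the following is only a hypothetical illustration of steps 2 and 3: a policy expressed as a Python dictionary and a coordinator‑like launcher that starts each runtime inside the same Slurm allocation. The policy keys, the launcher function, and the exact command lines are assumptions; srun, Singularity, and vLLM’s OpenAI‑compatible server module are real tools, but RHAPSODY’s actual invocation of them may differ.

```python
# Hypothetical policy + launcher sketch (not RHAPSODY's actual interface).
# Assumes it runs inside an existing Slurm allocation on a GPU system.
import subprocess

# Illustrative policy: node split, a placement hint, and a latency target.
policy = {
    "simulation": {"nodes": 96},
    "inference":  {"nodes": 32, "image": "vllm.sif", "colocate_with": "simulation"},
    "latency_target_ms": 100,
}


def launch_components(policy: dict) -> list[subprocess.Popen]:
    """Start the MPI simulation and the containerized inference service
    side by side in the shared allocation (illustrative commands only)."""
    procs = []

    # MPI simulation on its share of the nodes.
    sim = policy["simulation"]
    procs.append(subprocess.Popen(
        ["srun", "-N", str(sim["nodes"]), "./simulate"]))

    # Containerized vLLM server on the remaining nodes ("my-model" is a placeholder).
    inf = policy["inference"]
    procs.append(subprocess.Popen(
        ["srun", "-N", str(inf["nodes"]),
         "singularity", "exec", "--nv", inf["image"],
         "python", "-m", "vllm.entrypoints.openai.api_server", "--model", "my-model"]))

    return procs


if __name__ == "__main__":
    for proc in launch_components(policy):
        proc.wait()
```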
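
The agentic benchmark pattern can likewise be sketched as a simple control loop: run one simulation step, send its observables to an in‑allocation inference endpoint, and use the reply to set the next parameters. The endpoint URL, model name, prompt, and simulation stub below are assumptions; only the OpenAI‑compatible /v1/chat/completions route is a documented vLLM feature.

```python
# Minimal sketch of a simulation <-> LLM agentic loop.
# Endpoint URL, model name, prompt, and simulation stub are illustrative.
import json
import requests

VLLM_URL = "http://localhost:8000/v1/chat/completions"   # hypothetical in-allocation server


def simulation_step(params: dict) -> dict:
    """Stand-in for one step of the coupled simulation; returns observables."""
    return {"energy": params.get("temperature", 300.0) * 0.01}


def choose_next_params(observables: dict) -> dict:
    """Ask the inference service for the next parameter set within a sub-second budget."""
    prompt = (f"Observables: {json.dumps(observables)}. "
              "Reply with the next simulation parameters as a JSON object.")
    resp = requests.post(
        VLLM_URL,
        json={"model": "my-model",
              "messages": [{"role": "user", "content": prompt}],
              "max_tokens": 64},
        timeout=1.0,                                      # keep the round trip under a second
    )
    return json.loads(resp.json()["choices"][0]["message"]["content"])


params = {"temperature": 300.0}
for _ in range(10):                                       # tightly coupled control loop
    observables = simulation_step(params)
    params = choose_next_params(observables)
```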

Results & Findings

  • High‑throughput inference (vLLM + Dragon) – Scaling behavior: near‑linear up to 4 k nodes (≈ 98 % efficiency). Overhead: < 5 % extra compared to native vLLM. Key insight: RHAPSODY’s scheduler can keep inference workers saturated while the simulation runs concurrently.
  • Agentic AI‑HPC loop – Scaling behavior: sustained sub‑100 ms round‑trip latency across 1 k nodes. Overhead: ~3 % runtime overhead. Key insight: tight coupling is achievable without sacrificing the performance of the underlying MPI simulation.
  • Mixed workloads (MPI + container services) – Scaling behavior: balanced resource utilization with no starvation of either side. Overhead: minimal coordination cost (≈ 2 % of total wall‑time). Key insight: the policy engine successfully enforces fairness and respects user‑specified priorities.

Overall, RHAPSODY adds only a few percent of runtime overhead while enabling heterogeneous workloads to co‑exist and scale on the same allocation—something most existing HPC schedulers cannot do.
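
For context on the efficiency figures, the conventional definition of scaling efficiency for a throughput‑oriented workload is the measured throughput divided by the ideal linear projection from a baseline run; the helper below is a generic illustration of that arithmetic, not the paper’s measurement code.

```python
def scaling_efficiency(throughput_n: float, throughput_base: float,
                       n_nodes: int, base_nodes: int = 1) -> float:
    """Measured throughput relative to perfect linear scaling from a baseline run."""
    ideal = throughput_base * (n_nodes / base_nodes)
    return throughput_n / ideal

# ~98% efficiency at 4,096 nodes means the aggregate throughput is roughly
# 0.98 * 4096x the single-node baseline, e.g.:
print(scaling_efficiency(throughput_n=4014.0, throughput_base=1.0, n_nodes=4096))  # ≈ 0.98
```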

Practical Implications

  • One‑job deployments: Developers can bundle a climate model, a deep‑learning surrogate, and a reinforcement‑learning controller into a single sbatch script, simplifying job management and reducing queue wait times.
  • Cost‑effective resource usage: By sharing nodes between MPI and AI services, organizations can achieve higher utilization on expensive leadership‑class systems, lowering total compute spend.
  • Rapid prototyping of AI‑augmented simulations: Researchers can iterate on agentic workflows locally and then scale them without rewriting orchestration code, thanks to RHAPSODY’s portable policy files.
  • Vendor‑agnostic integration: Since RHAPSODY composes existing runtimes, teams can keep using familiar tools (e.g., PyTorch, TensorFlow, OpenFOAM) while gaining the benefits of a unified scheduler.
  • Future‑proofing: As AI models become larger and more interactive, RHAPSODY’s low‑latency coupling will be essential for emerging domains like digital twins, autonomous scientific experiments, and real‑time data assimilation.

Limitations & Future Work

  • Dependency on underlying runtimes: RHAPSODY’s performance is bounded by the capabilities of the composed runtimes (e.g., MPI launch latency, container start‑up time).
  • Policy complexity: Crafting optimal resource‑allocation policies for very large, multi‑tenant jobs can be non‑trivial and may require automated tuning tools.
  • Fault tolerance: Current implementation assumes a relatively stable allocation; handling node failures or dynamic scaling of services is left for future extensions.
  • Broader hardware support: The authors plan to integrate GPU‑direct communication libraries and explore support for emerging accelerator architectures (e.g., Habana, Graphcore).

In summary, RHAPSODY demonstrates that a carefully designed middleware can unlock the full potential of hybrid AI‑HPC workflows, offering developers a practical path to run complex, data‑intensive pipelines at scale without sacrificing performance.

Authors

  • Aymen Alsaadi
  • Mason Hooten
  • Mariya Goliyad
  • Andre Merzky
  • Andrew Shao
  • Mikhail Titov
  • Tianle Wang
  • Yian Chen
  • Maria Kalantzi
  • Kent Lee
  • Andrew Park
  • Indira Pimpalkhare
  • Nick Radcliffe
  • Colin Wahl
  • Pete Mendygral
  • Matteo Turilli
  • Shantenu Jha

Paper Information

  • arXiv ID: 2512.20795v1
  • Categories: cs.DC
  • Published: December 23, 2025