[Paper] RHAPSODY: Execution of Hybrid AI-HPC Workflows at Scale

Published: December 23, 2025 at 04:42 PM EST
4 min read
Source: arXiv - 2512.20795v1

Overview

The paper introduces RHAPSODY, a middleware layer that lets developers run highly heterogeneous AI‑HPC pipelines—mixing large‑scale simulations, deep‑learning training, high‑throughput inference, and tightly‑coupled agent‑driven control—inside a single job on leadership‑class supercomputers. By orchestrating existing runtimes rather than replacing them, RHAPSODY bridges the gap between traditional MPI‑based scientific codes and modern AI services, enabling these disparate components to scale together efficiently.

Key Contributions

  • Unified abstraction layer for tasks, services, resources, and execution policies that works across MPI, containerized AI services, and fine‑grained task runtimes (a minimal sketch of these objects follows this list).
  • Composable multi‑runtime architecture that coordinates existing runtimes (e.g., RADICAL‑Pilot, Dask, Ray, vLLM) instead of reinventing them.
  • Low‑overhead orchestration demonstrated on multiple leadership‑class systems, showing near‑linear scaling for high‑throughput inference and efficient AI‑HPC coupling.
  • Real‑world validation with two representative workloads: (1) Dragon (a scientific simulation) + vLLM inference at scale, and (2) an agentic workflow that tightly couples simulation steps with AI decisions.
  • Extensible policy engine that lets users specify placement, priority, and data‑movement strategies for heterogeneous components in a single job allocation.
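
The unified abstraction can be illustrated with a minimal sketch. The class names and fields below are hypothetical, chosen only to convey how batch‑style MPI work and persistent AI services might be described with one vocabulary; they are not RHAPSODY’s actual API.

```python
# Hypothetical sketch of a Task / Service / Resource / Policy vocabulary.
# Names and fields are illustrative only, not RHAPSODY's actual API.
from dataclasses import dataclass, field


@dataclass
class Resource:
    nodes: int                           # nodes requested out of the shared allocation
    gpus_per_node: int = 0


@dataclass
class Task:
    """Batch-style unit of work, e.g. one MPI simulation run."""
    executable: str
    arguments: list[str] = field(default_factory=list)
    resource: Resource = field(default_factory=lambda: Resource(nodes=1))


@dataclass
class Service:
    """Persistent component, e.g. a containerized vLLM inference server."""
    image: str                           # container image providing the service
    command: str                         # command that starts the server
    resource: Resource = field(default_factory=lambda: Resource(nodes=1, gpus_per_node=4))


@dataclass
class Policy:
    """Placement and scheduling hints attached to tasks and services."""
    priority: int = 0
    colocate_with: str | None = None     # name of a component to place nearby
    latency_target_ms: float | None = None
```

With objects like these, a hybrid pipeline becomes a list of tasks and services plus a policy, which a coordinator can map onto a single job allocation.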

Methodology

  1. Abstraction Design – The authors defined a set of generic objects (Task, Service, Resource, Policy) that capture the essential semantics of both batch‑style MPI jobs and persistent AI services.
  2. Runtime Composition – RHAPSODY launches each required runtime (e.g., an MPI job via srun, a containerized inference server via singularity, a task queue via Dask) inside the same allocation. A lightweight coordinator mediates communication and resource sharing among them.
  3. Policy‑Driven Scheduling – Users provide a JSON/YAML policy describing how many nodes to allocate to each runtime, data locality constraints, and latency targets. The coordinator enforces these policies at launch and dynamically during execution (a hypothetical policy and launcher sketch follows this list).
  4. Benchmarking – Experiments were run on three HPC systems (Summit, Perlmutter, and Theta) using:
    • High‑throughput inference: thousands of concurrent vLLM requests feeding a Dragon simulation.
    • Agentic workflow: a loop where a simulation step triggers an AI model that decides the next simulation parameters, requiring sub‑second round‑trip latency (a minimal loop sketch also follows this list).
  5. Metrics Collected – Runtime overhead, scaling efficiency, end‑to‑end latency, and network I/O were measured and compared against baseline runs where each component was executed in isolation.
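
The paper specifies policies in JSON/YAML, but the schema is not reproduced in this summary, so the following is only a hypothetical illustration of steps 2 and 3: a policy expressed as a Python dictionary and a coordinator‑like launcher that starts each runtime inside the same Slurm allocation. The policy keys, the launcher function, and the exact command lines are assumptions; srun, Singularity, and vLLM’s OpenAI‑compatible server module are real tools, but RHAPSODY’s actual invocation of them may differ.

```python
# Hypothetical policy + launcher sketch (not RHAPSODY's actual interface).
# Assumes it runs inside an existing Slurm allocation on a GPU system.
import subprocess

# Illustrative policy: node split, a placement hint, and a latency target.
policy = {
    "simulation": {"nodes": 96},
    "inference":  {"nodes": 32, "image": "vllm.sif", "colocate_with": "simulation"},
    "latency_target_ms": 100,
}


def launch_components(policy: dict) -> list[subprocess.Popen]:
    """Start the MPI simulation and the containerized inference service
    side by side in the shared allocation (illustrative commands only)."""
    procs = []

    # MPI simulation on its share of the nodes.
    sim = policy["simulation"]
    procs.append(subprocess.Popen(
        ["srun", "-N", str(sim["nodes"]), "./simulate"]))

    # Containerized vLLM server on the remaining nodes ("my-model" is a placeholder).
    inf = policy["inference"]
    procs.append(subprocess.Popen(
        ["srun", "-N", str(inf["nodes"]),
         "singularity", "exec", "--nv", inf["image"],
         "python", "-m", "vllm.entrypoints.openai.api_server", "--model", "my-model"]))

    return procs


if __name__ == "__main__":
    for proc in launch_components(policy):
        proc.wait()
```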
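
The agentic benchmark pattern can likewise be sketched as a simple control loop: run one simulation step, send its observables to an in‑allocation inference endpoint, and use the reply to set the next parameters. The endpoint URL, model name, prompt, and simulation stub below are assumptions; only the OpenAI‑compatible /v1/chat/completions route is a documented vLLM feature.

```python
# Minimal sketch of a simulation <-> LLM agentic loop.
# Endpoint URL, model name, prompt, and simulation stub are illustrative.
import json
import requests

VLLM_URL = "http://localhost:8000/v1/chat/completions"   # hypothetical in-allocation server


def simulation_step(params: dict) -> dict:
    """Stand-in for one step of the coupled simulation; returns observables."""
    return {"energy": params.get("temperature", 300.0) * 0.01}


def choose_next_params(observables: dict) -> dict:
    """Ask the inference service for the next parameter set within a sub-second budget."""
    prompt = (f"Observables: {json.dumps(observables)}. "
              "Reply with the next simulation parameters as a JSON object.")
    resp = requests.post(
        VLLM_URL,
        json={"model": "my-model",
              "messages": [{"role": "user", "content": prompt}],
              "max_tokens": 64},
        timeout=1.0,                                      # keep the round trip under a second
    )
    return json.loads(resp.json()["choices"][0]["message"]["content"])


params = {"temperature": 300.0}
for _ in range(10):                                       # tightly coupled control loop
    observables = simulation_step(params)
    params = choose_next_params(observables)
```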

Results & Findings

  • High‑throughput inference (vLLM + Dragon) – Scaling behavior: near‑linear up to 4 k nodes (≈ 98 % efficiency). Overhead: < 5 % extra compared to native vLLM. Key insight: RHAPSODY’s scheduler can keep inference workers saturated while the simulation runs concurrently.
  • Agentic AI‑HPC loop – Scaling behavior: sustained sub‑100 ms round‑trip latency across 1 k nodes. Overhead: ~3 % runtime overhead. Key insight: tight coupling is achievable without sacrificing the performance of the underlying MPI simulation.
  • Mixed workloads (MPI + container services) – Scaling behavior: balanced resource utilization with no starvation of either side. Overhead: minimal coordination cost (≈ 2 % of total wall‑time). Key insight: the policy engine successfully enforces fairness and respects user‑specified priorities.

Overall, RHAPSODY adds only a few percent of runtime overhead while enabling heterogeneous workloads to co‑exist and scale on the same allocation—something most existing HPC schedulers cannot do.
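
For context on the efficiency figures, the conventional definition of scaling efficiency for a throughput‑oriented workload is the measured throughput divided by the ideal linear projection from a baseline run; the helper below is a generic illustration of that arithmetic, not the paper’s measurement code.

```python
def scaling_efficiency(throughput_n: float, throughput_base: float,
                       n_nodes: int, base_nodes: int = 1) -> float:
    """Measured throughput relative to perfect linear scaling from a baseline run."""
    ideal = throughput_base * (n_nodes / base_nodes)
    return throughput_n / ideal

# ~98% efficiency at 4,096 nodes means the aggregate throughput is roughly
# 0.98 * 4096x the single-node baseline, e.g.:
print(scaling_efficiency(throughput_n=4014.0, throughput_base=1.0, n_nodes=4096))  # ≈ 0.98
```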

Practical Implications

  • One‑job deployments: Developers can bundle a climate model, a deep‑learning surrogate, and a reinforcement‑learning controller into a single sbatch script, simplifying job management and reducing queue wait times.
  • Cost‑effective resource usage: By sharing nodes between MPI and AI services, organizations can achieve higher utilization on expensive leadership‑class systems, lowering total compute spend.
  • Rapid prototyping of AI‑augmented simulations: Researchers can iterate on agentic workflows locally and then scale them without rewriting orchestration code, thanks to RHAPSODY’s portable policy files.
  • Vendor‑agnostic integration: Since RHAPSODY composes existing runtimes, teams can keep using familiar tools (e.g., PyTorch, TensorFlow, OpenFOAM) while gaining the benefits of a unified scheduler.
  • Future‑proofing: As AI models become larger and more interactive, RHAPSODY’s low‑latency coupling will be essential for emerging domains like digital twins, autonomous scientific experiments, and real‑time data assimilation.

Limitations & Future Work

  • Dependency on underlying runtimes: RHAPSODY’s performance is bounded by the capabilities of the composed runtimes (e.g., MPI launch latency, container start‑up time).
  • Policy complexity: Crafting optimal resource‑allocation policies for very large, multi‑tenant jobs can be non‑trivial and may require automated tuning tools.
  • Fault tolerance: Current implementation assumes a relatively stable allocation; handling node failures or dynamic scaling of services is left for future extensions.
  • Broader hardware support: The authors plan to integrate GPU‑direct communication libraries and explore support for emerging accelerator architectures (e.g., Habana, Graphcore).

In summary, RHAPSODY demonstrates that a carefully designed middleware can unlock the full potential of hybrid AI‑HPC workflows, offering developers a practical path to run complex, data‑intensive pipelines at scale without sacrificing performance.

Authors

  • Aymen Alsaadi
  • Mason Hooten
  • Mariya Goliyad
  • Andre Merzky
  • Andrew Shao
  • Mikhail Titov
  • Tianle Wang
  • Yian Chen
  • Maria Kalantzi
  • Kent Lee
  • Andrew Park
  • Indira Pimpalkhare
  • Nick Radcliffe
  • Colin Wahl
  • Pete Mendygral
  • Matteo Turilli
  • Shantenu Jha

Paper Information

  • arXiv ID: 2512.20795v1
  • Categories: cs.DC
  • Published: December 23, 2025