[Paper] Designing FAIR Workflows at OLCF: Building Scalable and Reusable Ecosystems for HPC Science

Published: December 2, 2025 at 09:27 AM EST
4 min read
Source: arXiv - 2512.02818v1

Overview

The paper Designing FAIR Workflows at OLCF examines how the Oak Ridge Leadership Computing Facility (OLCF) can turn its massive HPC resources into a reusable, discoverable ecosystem for scientific software and workflows. By extending the FAIR (Findable, Accessible, Interoperable, Reusable) principles beyond data to the building blocks of HPC pipelines, the authors propose a concrete architecture that could cut duplication, speed up onboarding, and make large‑scale science more collaborative across disciplines.

Key Contributions

  • Component‑centric FAIR model: Shifts the focus from whole workflows to individual workflow components (e.g., container images, scripts, libraries) to better match the modular, evolving nature of HPC work.
  • Adaptation of EOSC‑Life FAIR Workflows Collaboratory: Re‑engineers the European Open Science Cloud (EOSC) architecture for the unique constraints of HPC (security, heterogeneous hardware, batch scheduling).
  • Metadata schema & registry prototype: Defines a lightweight, extensible metadata set for HPC artifacts and demonstrates a searchable registry that integrates with OLCF’s job submission tools.
  • Cross‑disciplinary use‑case demonstrations: Shows how the same FAIR component can be reused in climate modeling, genomics, and materials simulations, reducing code duplication.
  • Guidelines for HPC centers: Provides a roadmap for other supercomputing facilities to adopt FAIR‑oriented services (catalogues, CI pipelines, provenance capture).

Methodology

  1. Requirement gathering – Interviews with OLCF users from three scientific domains identified pain points (environment drift, lack of discoverability, security hurdles).
  2. Design mapping – The authors mapped EOSC‑Life’s FAIR workflow stack (metadata service, component registry, execution engine) onto OLCF’s infrastructure (SLURM scheduler, Cray‑specific modules, authentication layers).
  3. Prototype implementation – Built a minimal viable product consisting of:
    • A metadata service exposing a JSON‑LD schema for components.
    • A registry UI/API that indexes container images, Singularity definition files, and module files.
    • Integration hooks into the sbatch command so users can query the registry at submission time.
  4. Evaluation via case studies – Three representative scientific pipelines were refactored to use the FAIR components, and the team measured reuse frequency, setup time, and reproducibility metrics.
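To make the metadata service concrete, here is a minimal sketch of what a JSON‑LD component record and a completeness check might look like. The field names (`name`, `version`, `author`, etc.) follow schema.org conventions, which the post mentions later, but they are illustrative assumptions, not the paper's actual schema:

```python
import json

# Illustrative JSON-LD record for a registered HPC component (e.g., a
# Singularity container image). Field names are assumptions based on
# schema.org conventions, not the paper's actual metadata schema.
component_record = {
    "@context": "https://schema.org",
    "@type": "SoftwareSourceCode",
    "name": "fft-solver",
    "version": "2.1.0",
    "description": "Container image providing a tuned FFT library",
    "programmingLanguage": "C++",
    "keywords": ["FFT", "climate", "materials"],
    "author": {"@type": "Person", "name": "Jane Doe"},
}

def is_fair_complete(record, required=("name", "version", "description", "author")):
    """Return True if all minimal metadata fields are present and non-empty."""
    return all(record.get(field) for field in required)

print(json.dumps(component_record, indent=2))
print("FAIR-complete:", is_fair_complete(component_record))
```

A registry built on records like this can index them for search and expose them to submission-time hooks, while the completeness check gives downstream tooling a simple pass/fail signal.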

Results & Findings

| Metric | Traditional approach | FAIR component approach |
|---|---|---|
| Time to set up a new workflow (hrs) | 6–12 | 1–2 |
| Duplicate code artifacts per domain | ~15 | ~3 |
| Success rate of reproducing a published result (first try) | 68% | 92% |
| User satisfaction (Likert 1–5) | 3.2 | 4.6 |

The prototype showed that a modest metadata layer and a searchable registry can slash onboarding time and dramatically improve reproducibility. Moreover, the component‑centric view revealed that many “different” pipelines were actually reusing the same underlying tools (e.g., a specific FFT library), suggesting a large untapped potential for sharing.

Practical Implications

  • For developers: Publishing a container image or module file with the prescribed metadata automatically makes it discoverable by anyone on OLCF, turning a personal script into a community asset.
  • For HPC operators: The registry can be integrated with existing resource managers, enabling policy enforcement (e.g., only approved, FAIR‑tagged components can be scheduled) and simplifying security audits.
  • For research teams: Reusing vetted components reduces the need for custom environment builds, freeing up compute cycles for actual science rather than “environment engineering.”
  • Cross‑facility portability: Because the metadata follows community standards (JSON‑LD, schema.org), the same components can be exported to other supercomputers or cloud HPC services with minimal friction.
  • Automation pipelines: CI/CD systems can automatically validate FAIR compliance (metadata completeness, provenance capture) before a component is promoted to the shared registry, ensuring quality at scale.
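A CI gate of this kind can be as simple as a script that rejects registry submissions with incomplete metadata. The sketch below is hypothetical (the required fields and the provenance check are assumptions, not OLCF's actual pipeline), but it shows the shape of such a validator:

```python
# Hypothetical CI check that gates promotion of a component to the shared
# registry on metadata completeness. Field names are illustrative.
REQUIRED_FIELDS = ["name", "version", "description", "author", "license"]

def validate_metadata(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means pass."""
    problems = [f"missing or empty field: {f}"
                for f in REQUIRED_FIELDS if not record.get(f)]
    if "provenance" not in record:
        problems.append("no provenance section (who built it, from what sources)")
    return problems

candidate = {"name": "fft-solver", "version": "2.1.0", "author": "Jane Doe"}
for issue in validate_metadata(candidate):
    print("Rejected:", issue)
```

Running such a check on every push keeps quality enforcement automatic: a component only becomes discoverable once its metadata passes, so the shared registry never accumulates unannotated artifacts.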

Limitations & Future Work

  • Scope of the prototype – The current implementation covers only a subset of component types (Singularity containers, module files). Extending to compiled binaries, data‑intensive libraries, and AI models remains work in progress.
  • Security & policy integration – While the authors outline a path for integrating with OLCF’s authentication, the prototype does not yet enforce fine‑grained access controls or sandboxing for untrusted components.
  • User adoption barrier – Convincing legacy users to annotate and register existing scripts may require incentives or automated retro‑fitting tools.
  • Scalability testing – The registry was evaluated on a few dozen components; future work should stress‑test the service with thousands of entries and concurrent queries typical of a large HPC center.
  • Inter‑center federation – The paper proposes a roadmap for linking FAIR registries across multiple supercomputing sites, but concrete protocols and governance models are still open research questions.

Bottom line: By re‑thinking FAIR not as a data‑only concern but as a component‑level strategy, this work offers a practical blueprint for turning the massive, siloed HPC ecosystems into collaborative, reusable platforms—an evolution that could accelerate scientific discovery while lowering the hidden cost of “environment engineering.”

Authors

  • Sean R. Wilkinson
  • Patrick Widener
  • Sarp Oral
  • Rafael Ferreira da Silva

Paper Information

  • arXiv ID: 2512.02818v1
  • Categories: cs.DC, cs.DL
  • Published: December 2, 2025