[Paper] Designing FAIR Workflows at OLCF: Building Scalable and Reusable Ecosystems for HPC Science
Source: arXiv - 2512.02818v1
Overview
The paper Designing FAIR Workflows at OLCF examines how the Oak Ridge Leadership Computing Facility (OLCF) can turn its massive HPC resources into a reusable, discoverable ecosystem for scientific software and workflows. By extending the FAIR (Findable, Accessible, Interoperable, Reusable) principles beyond data to the building blocks of HPC pipelines, the authors propose a concrete architecture that could cut duplication, speed up onboarding, and make large‑scale science more collaborative across disciplines.
Key Contributions
- Component‑centric FAIR model: Shifts the focus from whole workflows to individual workflow components (e.g., container images, scripts, libraries) to better match the modular, evolving nature of HPC work.
- Adaptation of EOSC‑Life FAIR Workflows Collaboratory: Re‑engineers the European Open Science Cloud (EOSC) architecture for the unique constraints of HPC (security, heterogeneous hardware, batch scheduling).
- Metadata schema & registry prototype: Defines a lightweight, extensible metadata set for HPC artifacts and demonstrates a searchable registry that integrates with OLCF’s job submission tools.
- Cross‑disciplinary use‑case demonstrations: Shows how the same FAIR component can be reused in climate modeling, genomics, and materials simulations, reducing code duplication.
- Guidelines for HPC centers: Provides a roadmap for other supercomputing facilities to adopt FAIR‑oriented services (catalogues, CI pipelines, provenance capture).
Methodology
- Requirement gathering – Interviews with OLCF users from three scientific domains identified pain points (environment drift, lack of discoverability, security hurdles).
- Design mapping – The authors mapped EOSC‑Life’s FAIR workflow stack (metadata service, component registry, execution engine) onto OLCF’s infrastructure (SLURM scheduler, Cray‑specific modules, authentication layers).
- Prototype implementation – Built a minimum viable product consisting of:
  - A metadata service exposing a JSON‑LD schema for components.
  - A registry UI/API that indexes container images, Singularity definition files, and module files.
  - Integration hooks into the `sbatch` command so users can query the registry at submission time.
- Evaluation via case studies – Three representative scientific pipelines were refactored to use the FAIR components, and the team measured reuse frequency, setup time, and reproducibility metrics.
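The metadata service described above can be pictured with a small sketch. The paper does not publish its schema, so the field names, the `@context` choice, and the `olcf:` identifier format below are illustrative assumptions layered on top of schema.org vocabulary, not the actual OLCF design:

```python
# Hypothetical JSON-LD record for one workflow component (here, a
# Singularity container). All field values and the registry ID format
# are assumptions for illustration; the paper's real schema may differ.
component = {
    "@context": "https://schema.org",
    "@type": "SoftwareApplication",
    "identifier": "olcf:fftw-container:1.2.0",   # assumed registry ID format
    "name": "fftw-optimized",
    "description": "FFTW build tuned for Cray compute nodes",
    "softwareRequirements": ["singularity>=3.8"],
    "operatingSystem": "Linux",
    "keywords": ["FFT", "container", "HPC"],
}

# Minimum metadata a record needs to be findable in the registry
# (an assumed policy, not one stated in the paper).
REQUIRED = {"@context", "@type", "identifier", "name", "description"}

def is_fair_complete(record: dict) -> bool:
    """Return True if the record carries all required metadata fields."""
    return REQUIRED.issubset(record)

print(is_fair_complete(component))  # True
```

A submission-time hook could run a check like this against the registry before `sbatch` accepts the job, rejecting components whose records are incomplete.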
Results & Findings
| Metric | Traditional approach | FAIR component approach |
|---|---|---|
| Time to set up a new workflow (hrs) | 6–12 | 1–2 |
| Duplicate code artifacts per domain | ~15 | ~3 |
| Success rate of reproducing a published result (first try) | 68 % | 92 % |
| User satisfaction (Likert 1‑5) | 3.2 | 4.6 |
The prototype proved that a modest metadata layer and a searchable registry can slash onboarding time and dramatically improve reproducibility. Moreover, the component‑centric view revealed that many “different” pipelines were actually re‑using the same underlying tools (e.g., a specific FFT library), suggesting a large untapped potential for sharing.
Practical Implications
- For developers: Publishing a container image or module file with the prescribed metadata automatically makes it discoverable by anyone on OLCF, turning a personal script into a community asset.
- For HPC operators: The registry can be integrated with existing resource managers, enabling policy enforcement (e.g., only approved, FAIR‑tagged components can be scheduled) and simplifying security audits.
- For research teams: Reusing vetted components reduces the need for custom environment builds, freeing up compute cycles for actual science rather than “environment engineering.”
- Cross‑facility portability: Because the metadata follows community standards (JSON‑LD, schema.org), the same components can be exported to other supercomputers or cloud HPC services with minimal friction.
- Automation pipelines: CI/CD systems can automatically validate FAIR compliance (metadata completeness, provenance capture) before a component is promoted to the shared registry, ensuring quality at scale.
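The CI/CD gate mentioned in the last point can be sketched as a validation step that rejects a component before promotion. The required-field list and the shape of the "provenance" entry are assumptions for illustration, not the paper's actual policy:

```python
# Sketch of a CI gate validating FAIR compliance before a component is
# promoted to the shared registry. Required fields and the structure of
# the provenance entry are assumed for this example.
REQUIRED_FIELDS = {"identifier", "name", "description", "license", "provenance"}

def validate(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record
    passes the gate and can be promoted."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    prov = record.get("provenance", {})
    if prov and not prov.get("builder"):
        problems.append("provenance lacks a builder entry")
    return problems

candidate = {
    "identifier": "olcf:genomics-align:0.9",     # hypothetical component
    "name": "genomics-align",
    "description": "Read-alignment pipeline container",
    "license": "MIT",
    "provenance": {"builder": "ci.olcf.example", "commit": "abc123"},
}

issues = validate(candidate)
print("PROMOTE" if not issues else issues)
```

In a real pipeline this check would run on every push, so incomplete metadata fails fast instead of surfacing later as an undiscoverable registry entry.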
Limitations & Future Work
- Scope of the prototype – The current implementation covers only a subset of component types (Singularity containers, module files). Extending to compiled binaries, data‑intensive libraries, and AI models remains work in progress.
- Security & policy integration – While the authors outline a path for integrating with OLCF’s authentication, the prototype does not yet enforce fine‑grained access controls or sandboxing for untrusted components.
- User adoption barrier – Convincing legacy users to annotate and register existing scripts may require incentives or automated retro‑fitting tools.
- Scalability testing – The registry was evaluated on a few dozen components; future work should stress‑test the service with thousands of entries and concurrent queries typical of a large HPC center.
- Inter‑center federation – The paper proposes a roadmap for linking FAIR registries across multiple supercomputing sites, but concrete protocols and governance models are still open research questions.
Bottom line: By re‑thinking FAIR not as a data‑only concern but as a component‑level strategy, this work offers a practical blueprint for turning massive, siloed HPC ecosystems into collaborative, reusable platforms, an evolution that could accelerate scientific discovery while lowering the hidden cost of “environment engineering.”
Authors
- Sean R. Wilkinson
- Patrick Widener
- Sarp Oral
- Rafael Ferreira da Silva
Paper Information
- arXiv ID: 2512.02818v1
- Categories: cs.DC, cs.DL
- Published: December 2, 2025