[Paper] Exploring Novel Data Storage Approaches for Large-Scale Numerical Weather Prediction
Source: arXiv - 2602.17610v1
Overview
The paper evaluates two modern object‑storage systems—DAOS and Ceph—as alternatives to the traditional POSIX‑based Lustre file system for large‑scale Numerical Weather Prediction (NWP) workloads. By building adapters that let ECMWF’s operational NWP code talk to these object stores, the author shows how they perform on real HPC hardware and what that could mean for the broader HPC and AI community.
Key Contributions
- Adapter Development: Created software‑level bridges that let a production‑grade NWP model read/write data directly to DAOS and Ceph without rewriting the core application.
- Comprehensive Benchmarking: Ran a suite of I/O benchmarks (including real NWP I/O patterns) on multiple HPC clusters, comparing DAOS, Ceph, and Lustre side‑by‑side on identical hardware.
- Scalability Analysis: Demonstrated that DAOS scales more linearly with node count and I/O concurrency than both Ceph and Lustre, especially for large, bursty writes typical of NWP.
- Practical Guidance: Documented the challenges of porting POSIX‑centric code to object storage (e.g., handling metadata, consistency semantics) and provided best‑practice recommendations.
- Broader Insight: Provided domain‑agnostic performance data that can help I/O engineers evaluate object storage for other HPC and AI workloads.
Methodology
- Adapter Layer: Implemented thin wrappers around the ECMWF NWP I/O library (IOAPI) that translate POSIX calls into DAOS/Ceph APIs. The wrappers preserve the original data layout to avoid any scientific‑model changes.
- Testbeds: Used three distinct HPC platforms (two with Lustre + NVMe SSDs, one with a DAOS‑ready architecture). Each platform had comparable compute nodes, network fabrics, and storage capacity.
- Workloads: Executed both synthetic micro‑benchmarks (e.g., sequential/parallel reads, writes, metadata ops) and the full NWP workflow (≈10 TB of forecast data per run).
- Metrics Collected: Throughput (GB/s), I/O latency, scalability (throughput vs. node count), and resource utilization (CPU overhead, network traffic).
- Analysis: Compared results across the three storage back‑ends, focusing on where object storage diverged from Lustre and why.
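The adapter-layer idea above can be sketched in miniature: the application keeps calling a POSIX-like open/write/read interface while the backend maps each "file" path onto a key in a flat object namespace. This is a hypothetical illustration, not the paper's actual adapter code; an in-memory dict stands in for the DAOS/Ceph client, and all class and method names here are invented for the sketch.

```python
class ObjectStoreBackend:
    """Stand-in for a DAOS/Ceph client: flat namespace, put/get by key."""
    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class PosixLikeFile:
    """File handle that buffers writes and flushes to one object on close."""
    def __init__(self, backend, key, mode):
        self._backend, self._key, self._mode = backend, key, mode
        self._buffer = bytearray()

    def write(self, data: bytes) -> int:
        self._buffer.extend(data)
        return len(data)

    def read(self) -> bytes:
        return self._backend.get(self._key)

    def close(self) -> None:
        # One object per "file": the whole buffer becomes the object value.
        if "w" in self._mode:
            self._backend.put(self._key, bytes(self._buffer))


class Adapter:
    """Translates POSIX-style paths into flat object keys."""
    def __init__(self, backend):
        self._backend = backend

    def open(self, path: str, mode: str = "r") -> PosixLikeFile:
        # Flatten the directory hierarchy into a single key string.
        key = path.lstrip("/").replace("/", ":")
        return PosixLikeFile(self._backend, key, mode)


store = ObjectStoreBackend()
fs = Adapter(store)
f = fs.open("/forecasts/run42/field.grib", "w")
f.write(b"GRIB...")
f.close()
assert fs.open("/forecasts/run42/field.grib").read() == b"GRIB..."
```

The key design point mirrors the paper's approach: because the translation happens below the application's I/O interface, the scientific model and its data layout stay untouched.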
Results & Findings
- Throughput: DAOS achieved up to 2.5× higher aggregate write throughput than Lustre and 1.8× higher than Ceph on the largest node count (256 nodes).
- Scalability: DAOS maintained near‑linear scaling up to the tested limit, while Lustre’s performance plateaued after ~128 nodes due to metadata bottlenecks.
- Latency: Average write latency for small, random I/O was ~30 µs for DAOS vs. ~70 µs for Lustre, making DAOS more suitable for bursty, checkpoint‑style writes.
- CPU Overhead: The adapter added < 5 % extra CPU usage, indicating that the object‑store APIs are lightweight enough for production workloads.
- Ceph: Performed competitively with Lustre for sequential workloads but lagged behind DAOS on random, high‑concurrency patterns.
- Porting Effort: The main hurdles were handling POSIX semantics (e.g., file locking) and tuning object‑store parameters (e.g., pool sizing). Once addressed, the code required minimal changes.
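A toy version of the aggregate-write-throughput measurement behind numbers like those above can be sketched as follows: N concurrent writers each emit a fixed payload, and throughput is total bytes divided by wall time. The worker counts, payload sizes, and use of local temporary files are illustrative assumptions, not the paper's benchmark configuration.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_payload(path: str, payload: bytes) -> int:
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force data to storage, as I/O benchmarks do
    return len(payload)


def aggregate_write_throughput(n_workers: int = 4, size_mb: int = 8) -> float:
    """Return aggregate write throughput in GB/s across n_workers writers."""
    payload = b"\0" * (size_mb * 1024 * 1024)
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, f"chunk_{i}.bin") for i in range(n_workers)]
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            total = sum(pool.map(lambda p: write_payload(p, payload), paths))
        elapsed = time.perf_counter() - start
    return total / elapsed / 1e9


print(f"{aggregate_write_throughput():.2f} GB/s")
```

Scaling `n_workers` while watching whether GB/s grows proportionally is exactly the kind of concurrency sweep that exposed Lustre's metadata plateau versus DAOS's near-linear behavior.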
Practical Implications
- For HPC Centers: DAOS offers a compelling upgrade path for systems that already rely on Lustre but need higher I/O scalability for data‑intensive simulations (climate, astrophysics, AI training).
- For Developers: The adapter approach shows that existing POSIX‑based applications can be retrofitted to object storage without a full rewrite, lowering adoption risk.
- AI & Data‑Intensive Pipelines: Object storage’s flat namespace and high concurrency align well with distributed training workloads that generate many small files (e.g., model checkpoints).
- Cost & Architecture: DAOS leverages NVMe‑over‑Fabric and can be deployed on commodity servers, potentially reducing the need for expensive parallel file‑system appliances.
- Future HPC Stack: As exascale systems roll out, the shift toward object‑oriented I/O may become a standard part of the software stack, complementing rather than replacing POSIX interfaces.
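The checkpoint pattern mentioned above can be sketched concretely: each training rank writes its shard as one object in a flat namespace, keyed by run, step, and rank, so no directory creation, renames, or POSIX file locking are involved. The key scheme and helper names are hypothetical, and an in-memory dict again stands in for an object-store client.

```python
def checkpoint_key(run_id: str, step: int, rank: int) -> str:
    # Flat, self-describing key: no mkdir, no rename, no locking needed.
    return f"{run_id}/ckpt-{step:08d}/rank-{rank:04d}"


store: dict[str, bytes] = {}  # stand-in for an object-store client


def save_shard(run_id: str, step: int, rank: int, shard: bytes) -> None:
    store[checkpoint_key(run_id, step, rank)] = shard


def load_step(run_id: str, step: int) -> list[bytes]:
    # Listing by key prefix replaces directory traversal.
    prefix = f"{run_id}/ckpt-{step:08d}/"
    return [store[k] for k in sorted(store) if k.startswith(prefix)]


for rank in range(4):
    save_shard("exp1", 1000, rank, bytes([rank]) * 16)
assert len(load_step("exp1", 1000)) == 4
```

Because every rank targets a distinct key, writes are naturally contention-free, which is why high-concurrency, many-small-object workloads map well onto stores like DAOS.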
Limitations & Future Work
- Hardware Dependency: DAOS performance gains were most pronounced on systems with high‑speed NVMe fabrics; results may differ on older or slower interconnects.
- Software Maturity: Both DAOS and Ceph are still evolving; stability and feature completeness (e.g., advanced security, multi‑tenant isolation) need further validation for production use.
- Benchmark Scope: The study focused on a single NWP model and a limited set of HPC platforms; broader workload diversity (e.g., genomics, fluid dynamics) would strengthen the conclusions.
- Future Directions: The author suggests deeper integration of object‑storage semantics into scientific I/O libraries (e.g., HDF5, NetCDF), automated tuning tools for object‑store parameters, and long‑term reliability studies under continuous production loads.
Authors
- Nicolau Manubens Gil
Paper Information
- arXiv ID: 2602.17610v1
- Categories: cs.DC, cs.DB
- Published: February 19, 2026