[Paper] STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems
Source: arXiv - 2602.23220v1
Overview
The paper introduces STELLAR, an autonomous tuning engine that leverages large language models (LLMs) to optimize the configuration of high‑performance parallel file systems. By turning the traditionally manual, trial‑and‑error process of I/O tuning into a fast, data‑driven loop, STELLAR can find near‑optimal settings in just a handful of runs, making storage performance tuning practical for scientists and engineers who lack deep systems expertise.
Key Contributions
- LLM‑driven end‑to‑end tuning pipeline that extracts tunable parameters from documentation, interprets I/O traces, and iteratively refines configurations.
- Retrieval‑augmented generation (RAG) + tool‑use architecture that grounds LLM reasoning in real system data, dramatically reducing hallucinations.
- Multi‑agent design that stabilizes decision‑making by having separate agents specialize in extraction, analysis, and strategy selection.
- Empirical evidence that STELLAR reaches near‑optimal performance within the first five tuning attempts on unseen workloads, compared with traditional autotuners that may need thousands of iterations.
- Knowledge‑base feedback loop that captures successful tuning patterns for reuse on future applications, turning each run into a learning experience for the system.
Methodology
1. Parameter Extraction – An LLM reads the parallel file system’s manual (e.g., Lustre, GPFS) and builds a structured list of all configurable knobs (stripe size, I/O scheduler, cache policies, etc.).
2. Trace Analysis – The application’s I/O trace log is fed to the LLM, which identifies workload characteristics (read/write mix, access patterns, concurrency level).
3. Initial Strategy Selection – Using the extracted parameters and trace insights, the LLM proposes a small set of promising configurations (often just one or two).
4. Execution & Feedback – The system runs the application with the chosen settings on a real cluster, measures throughput/latency, and records the results.
5. Iterative Refinement – The LLM reasons over the performance feedback, adjusts the configuration, and repeats steps 3–4.
6. Knowledge Consolidation – After convergence, the system summarizes the tuning journey into a reusable knowledge entry (e.g., “for write‑heavy, small‑file workloads, stripe size = 64 KB works best”).
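The propose → execute → refine cycle above can be sketched as a short feedback loop. This is an illustrative toy, not the paper's implementation: `propose_config` stands in for the LLM planner (here a trivial hill-climbing heuristic on stripe size), and `run_benchmark` stands in for a real cluster run (here a synthetic throughput model with a peak at 256 KB stripes so the loop has something to converge toward). All function names and numbers are hypothetical.

```python
def propose_config(params, trace_summary, history):
    """Stand-in for the LLM planner: pick the next configuration from
    the feedback history. A real planner would reason over traces and
    manual snippets; this toy just doubles the stripe size while
    throughput keeps improving and backs off otherwise."""
    if not history:
        return {"stripe_size_kb": 64, "scheduler": "deadline"}
    last_cfg, last_tput = history[-1]
    cfg = dict(last_cfg)
    if len(history) < 2 or last_tput >= history[-2][1]:
        cfg["stripe_size_kb"] *= 2   # keep moving in the improving direction
    else:
        cfg["stripe_size_kb"] = max(64, cfg["stripe_size_kb"] // 2)
    return cfg

def run_benchmark(cfg):
    """Stand-in for executing the workload on a real cluster: a synthetic
    throughput model peaking at a 256 KB stripe size."""
    return 1000 - abs(cfg["stripe_size_kb"] - 256)

def tune(max_iters=5):
    """Run the iterative refinement loop for a few evaluations and
    return the best configuration seen."""
    params = ["stripe_size_kb", "scheduler"]      # from parameter extraction
    trace_summary = "write-heavy, small files"    # from trace analysis
    history = []
    for _ in range(max_iters):
        cfg = propose_config(params, trace_summary, history)
        tput = run_benchmark(cfg)
        history.append((cfg, tput))
    return max(history, key=lambda h: h[1])
```

With this synthetic model the loop reaches the 256 KB peak within five evaluations, mirroring the paper's small-iteration-budget setting.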
The pipeline is orchestrated by a multi‑agent framework:
- Extractor Agent handles documentation parsing.
- Analyzer Agent interprets traces.
- Planner Agent proposes configurations.
- Executor Agent runs the workload and reports metrics.
RAG is used throughout to pull relevant snippets from manuals or prior tuning logs, keeping the LLM’s reasoning anchored to factual data.
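The grounding step can be illustrated with a minimal retrieval sketch. This is an assumption-laden stand-in, not the paper's RAG pipeline: it ranks manual snippets by token overlap with the query (a real system would use embedding similarity) and prepends the top hits to the planner's prompt. The snippet texts and helper names are hypothetical.

```python
# Hypothetical snippets of a Lustre manual used as the retrieval corpus.
MANUAL_SNIPPETS = [
    "lfs setstripe -S sets the stripe size for new files in a directory.",
    "Stripe count controls how many OSTs a file is spread across.",
    "Client-side caching can be tuned via llite max_cached_mb.",
]

def retrieve(query, snippets, k=2):
    """Rank snippets by token overlap with the query; a crude stand-in
    for the embedding-based retriever a real RAG pipeline would use."""
    q = set(query.lower().split())
    scored = sorted(snippets,
                    key=lambda s: len(q & set(s.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query):
    """Anchor the LLM's reasoning by placing retrieved manual text
    ahead of the question."""
    context = "\n".join(retrieve(query, MANUAL_SNIPPETS))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Keeping retrieved snippets in the prompt is what ties each agent's suggestions back to documented parameters rather than free-form generation.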
Results & Findings
- Speed of convergence: In 90% of 30 benchmark applications, STELLAR identified a configuration within 5 iterations that was within ±3% of the globally optimal throughput (as determined by exhaustive search).
- Search‑space reduction: The LLM‑guided approach cut the effective search space by >99% compared with naïve grid or random search.
- Robustness to unseen workloads: Even for applications not represented in the training data, the system’s reasoning based on trace patterns generalized well.
- Ablation study: Removing RAG or the multi‑agent coordination increased the average number of iterations needed from 5 to 27, confirming the importance of grounding and specialization.
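The search-space reduction claim is easy to sanity-check with back-of-the-envelope arithmetic. The knob counts below are hypothetical (the paper does not enumerate them here); the point is that even a modest grid explodes combinatorially, so a 5-evaluation budget trivially clears the >99% reduction bar.

```python
from math import prod

# Hypothetical value counts per tunable knob, for illustration only.
knob_values = {
    "stripe_size": 8,    # e.g., 64 KB .. 8 MB in powers of two
    "stripe_count": 8,
    "io_scheduler": 4,
    "cache_policy": 3,
}

full_grid = prod(knob_values.values())   # every combination a grid search visits
llm_guided = 5                           # evaluations in STELLAR's budget
reduction = 1 - llm_guided / full_grid
print(f"grid: {full_grid} configs, reduction: {reduction:.1%}")
```

With these assumed counts the grid holds 768 configurations, so evaluating only 5 already reduces the search by roughly 99.3%; real deployments with more knobs would reduce it further.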
Practical Implications
- For system administrators: STELLAR can be deployed as a “plug‑and‑play” service that continuously optimizes storage settings as new jobs arrive, reducing the need for manual tuning expertise.
- For developers of data‑intensive pipelines: Teams can focus on algorithmic work rather than low‑level I/O knobs; the tuner automatically adapts to changing data sizes or access patterns.
- For cloud and HPC providers: Embedding STELLAR into job‑submission portals could improve overall cluster utilization and lower the cost per compute hour by squeezing extra I/O performance without hardware upgrades.
- For other optimization domains: The paper’s architecture (LLM + RAG + multi‑agent loop) is reusable for tuning compilers, network stacks, or even hyper‑parameter selection in machine‑learning pipelines where each evaluation is expensive.
Limitations & Future Work
- Dependence on high‑quality documentation: If the manual is sparse or outdated, parameter extraction may miss critical knobs.
- Scalability of real‑system runs: While the iteration count is low, each iteration still requires a full application execution, which can be costly for very long jobs.
- Hallucination risk: Although mitigated by RAG and multi‑agent checks, occasional incorrect LLM suggestions were observed, especially for obscure parameters.
- Future directions include:
  - Integrating simulation‑based proxies to evaluate configurations faster.
  - Extending the knowledge base to cross‑cluster environments.
  - Exploring fine‑tuned domain‑specific LLMs to further reduce hallucinations.
Authors
- Chris Egersdoerfer
- Philip Carns
- Shane Snyder
- Robert Ross
- Dong Dai
Paper Information
- arXiv ID: 2602.23220v1
- Categories: cs.DC
- Published: February 26, 2026