[Paper] MOOSEnger -- a Domain-Specific AI Agent for the MOOSE Ecosystem

Published: March 4, 2026 at 10:06 PM EST

Source: arXiv - 2603.04756v1

Overview

The paper introduces MOOSEnger, a conversational AI agent built specifically for the Multiphysics Object‑Oriented Simulation Environment (MOOSE). By turning natural‑language requests into valid MOOSE input files, it dramatically speeds up the traditionally tedious setup and debugging phases of multiphysics simulations.

Key Contributions

  • Domain‑specific AI agent that couples Retrieval‑Augmented Generation (RAG) with deterministic, MOOSE‑aware parsing and validation.
  • Core‑plus‑plugin architecture separating reusable agent infrastructure from a lightweight MOOSE plugin (HIT‑file parsing, syntax‑preserving ingestion, repair utilities).
  • Input pre‑check pipeline that automatically cleans hidden formatting artifacts, fixes malformed HIT structures, and resolves unknown object types via similarity search against a curated syntax registry.
  • Closed‑loop execution backend that runs the generated input through the actual MOOSE runtime (via MCP) and feeds solver diagnostics back into the conversation for iterative correction.
  • Comprehensive evaluation suite reporting RAG metrics (faithfulness, relevance, context precision/recall) and end‑to‑end execution success on a 125‑prompt benchmark covering five major physics domains.
  • Performance boost: 93 % of generated inputs run successfully, compared with only 8 % for a vanilla LLM baseline.
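The input pre-check described above can be illustrated with a short sketch. The paper does not publish MOOSEnger's implementation, so the function names, the set of hidden characters, and the bracket check below are assumptions meant only to convey the idea of cleaning formatting artifacts and catching malformed HIT block structure.

```python
def clean_hidden_artifacts(text: str) -> str:
    """Drop zero-width characters and normalize non-breaking spaces.

    Illustrative only: the real pre-check likely handles a broader set
    of artifacts.
    """
    out = []
    for ch in text:
        if ch in ("\u200b", "\u200c", "\u200d", "\ufeff"):
            continue  # zero-width characters: remove outright
        out.append(" " if ch == "\u00a0" else ch)  # NBSP -> plain space
    return "".join(out)


def brackets_balanced(text: str) -> bool:
    """Check that HIT-style block delimiters ([Mesh] ... []) pair up."""
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:  # closing bracket with no opener
                return False
    return depth == 0
```

A stray zero-width space pasted from a web page, for example, would survive a visual inspection of the `.i` file but fail MOOSE's parser; `clean_hidden_artifacts` removes it before parsing is attempted.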

Methodology

  1. Retrieval‑Augmented Generation – When a user asks a question (e.g., “Set up a diffusion problem with a 10 mm slab”), the system first pulls the most relevant snippets from a curated MOOSE documentation and example repository.
  2. Deterministic Parsing – The retrieved text is fed to a HIT‑aware parser that respects MOOSE’s strict input syntax (the “.i” files). The parser builds an abstract representation of the simulation case.
  3. Pre‑check & Repair – A grammar‑constrained loop scans the representation for hidden characters, mismatched braces, or unknown object names. Unknown names are resolved by a similarity search against an application‑syntax registry, effectively “guessing” the intended MOOSE object.
  4. Validation & Smoke‑Testing – The repaired input is validated against MOOSE’s schema and optionally executed on a lightweight runtime (MCP). Solver messages (errors, warnings, convergence info) are captured.
  5. Iterative Feedback – Diagnostic messages are transformed into natural‑language hints and sent back to the LLM, which then refines the input. This loop repeats until the simulation passes the execution check.
  6. Evaluation – The authors log RAG quality metrics and the final pass/fail status for each prompt, enabling a transparent comparison with a baseline LLM that lacks the domain‑specific tooling.
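Steps 3 and 5 can be sketched together: resolve an unknown object name by similarity search against a syntax registry, and retry execution until the check passes. The registry contents, the cycle budget, and the `run_check`/`repair` callables are hypothetical stand-ins for MOOSEnger's actual application-syntax registry and MCP-backed runtime.

```python
from difflib import get_close_matches

# Hypothetical excerpt of an application-syntax registry.
REGISTRY = ["GeneratedMesh", "Diffusion", "DirichletBC", "Transient", "Steady"]


def resolve_object_type(name, registry=REGISTRY, cutoff=0.6):
    """Return the closest registered object name, or None if nothing is near."""
    matches = get_close_matches(name, registry, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def iterate_until_pass(input_text, run_check, repair, max_cycles=5):
    """Run the execution check; on failure, apply a repair and retry.

    run_check: callable returning (passed: bool, diagnostics: str).
    repair: callable turning (input_text, diagnostics) into a new attempt.
    """
    for cycle in range(1, max_cycles + 1):
        passed, diagnostics = run_check(input_text)
        if passed:
            return input_text, cycle
        input_text = repair(input_text, diagnostics)
    raise RuntimeError("no passing input within the cycle budget")
```

With a misspelled `Difusion` kernel, `resolve_object_type` recovers `Diffusion`, and the loop converges on the second cycle once the repair is applied; in the real system the diagnostics come from MOOSE's solver output rather than a stub.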

Results & Findings

| Metric | MOOSEnger | LLM-only baseline |
|---|---|---|
| Execution pass rate | 0.93 (110/118) | 0.08 (≈9/118) |
| Faithfulness (RAG) | 0.96 | 0.71 |
| Context precision | 0.94 | 0.62 |
| Context recall | 0.92 | 0.58 |
| Average correction cycles per prompt | 1.3 | 4.7 |

Interpretation: The deterministic parsing and execution‑in‑the‑loop feedback are the primary drivers of the success gap. Even when the LLM produces syntactically plausible text, without the pre‑check and runtime validation it frequently generates inputs that MOOSE cannot parse or that violate physics constraints.

Practical Implications

  • Faster onboarding – New users can spin up complex multiphysics cases by simply describing what they need, cutting weeks of manual input file authoring down to minutes.
  • Reduced debugging time – The automatic pre‑check catches hidden formatting bugs (e.g., stray Unicode characters) that often cause cryptic MOOSE errors, saving developers from tedious trial‑and‑error cycles.
  • Continuous integration – MOOSEnger can be embedded in CI pipelines to auto‑generate and validate simulation inputs whenever a new physics module is added, ensuring regressions are caught early.
  • Extensibility to other DSLs – The core‑plus‑plugin design shows a clear path for building similar agents for other domain‑specific languages (e.g., OpenFOAM dictionaries, Abaqus input files).
  • Improved reproducibility – By storing the conversational transcript alongside the generated .i file, teams gain a provenance trail that explains why a particular configuration was chosen.
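The provenance idea in the last bullet can be sketched as a sidecar file written next to the generated input. The file layout, field names, and hashing scheme below are invented for illustration; the paper does not specify how MOOSEnger stores transcripts.

```python
import hashlib
import json
from pathlib import Path


def save_with_provenance(workdir: Path, case: str, input_text: str, transcript):
    """Write `<case>.i` plus a `<case>.provenance.json` sidecar.

    transcript: list of {"role": ..., "content": ...} messages
    (hypothetical schema).
    """
    workdir.mkdir(parents=True, exist_ok=True)
    input_path = workdir / f"{case}.i"
    input_path.write_text(input_text)
    sidecar = {
        "input_file": input_path.name,
        # Content hash ties the transcript to the exact input it produced.
        "sha256": hashlib.sha256(input_text.encode()).hexdigest(),
        "transcript": transcript,
    }
    (workdir / f"{case}.provenance.json").write_text(json.dumps(sidecar, indent=2))
    return input_path
```

Keyed by content hash, the sidecar lets a reviewer confirm that a checked-in `.i` file is the one the recorded conversation actually generated.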

Limitations & Future Work

  • Scope of physics covered – The benchmark, while diverse, still represents a subset of MOOSE’s full capability set; exotic modules may need additional retrieval sources.
  • Dependency on curated docs – Retrieval quality hinges on the completeness and up‑to‑date nature of the documentation corpus; stale examples can mislead the agent.
  • Runtime cost – The smoke‑testing loop requires a local or remote MOOSE execution environment, which may be heavyweight for very large models.
  • Generalization – The similarity‑search repair mechanism works well for misspelled object names but may struggle with entirely novel user intents that lack a close example.

Future directions include expanding the retrieval corpus with community‑contributed notebooks, integrating lightweight surrogate solvers for faster feedback, and exposing a REST API so that IDE plugins or web front‑ends can leverage MOOSEnger directly.

Authors

  • Mengnan Li
  • Jason Miller
  • Zachary Prince
  • Alexander Lindsay
  • Cody Permann

Paper Information

  • arXiv ID: 2603.04756v1
  • Categories: cs.AI, cs.CE, cs.SE
  • Published: March 5, 2026