[Paper] SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
Source: arXiv - 2601.05187v1
Overview
The paper presents SimuAgent, a large‑language‑model (LLM)‑driven assistant that helps engineers create and simulate Simulink models. By swapping Simulink’s bulky XML files for a compact, dictionary‑style Python representation, SimuAgent slashes token usage, speeds up in‑process simulation, and makes the model‑generation pipeline far more developer‑friendly.
Key Contributions
- Compact Python DSL for Simulink – replaces verbose XML with a lightweight, human‑readable dictionary format, cutting token counts by an order of magnitude.
- Two‑stage plan‑execute training – first teaches low‑level Simulink API skills, then high‑level design reasoning, yielding a more robust agent.
- Reflection‑GRPO (ReGRPO) – a novel reinforcement‑learning algorithm that injects self‑reflection traces as intermediate rewards, tackling sparse‑reward problems in long‑horizon modeling tasks.
- SimuBench – a new benchmark suite of 5,300 multi‑domain Simulink modeling problems for systematic evaluation.
- On‑premise, privacy‑preserving deployment – the entire training and inference pipeline runs on modest hardware, avoiding cloud‑based data exposure and high API costs.
Methodology
- Representation Layer – SimuAgent translates a Simulink diagram into a compact Python dictionary, e.g. `{ "blocks": [...], "connections": [...] }`. This representation is both token‑efficient for the LLM and directly executable via Simulink's Python API.
- Plan‑Execute Architecture
  - Planning: The LLM generates a high‑level design plan (which blocks to add, parameter choices, connection strategy).
  - Execution: A thin runtime engine consumes the plan, calls Simulink's API to build the model, runs a quick simulation, and returns diagnostics.
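  A hypothetical sketch of one plan‑execute round; `llm_plan` and the `runtime` wrapper are stand‑ins for the paper's components, not a published API:

  ```python
  def build_and_check(requirement, llm_plan, runtime):
      """Plan with the LLM, execute against Simulink, return diagnostics."""
      plan = llm_plan(requirement)          # dict: {"blocks": [...], "connections": [...]}
      runtime.add_blocks(plan["blocks"])    # create each block via the Simulink API
      runtime.connect(plan["connections"])  # wire source ports to destination ports
      return runtime.simulate()             # quick run; diagnostics feed back to the agent
  ```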
- Two‑Stage Curriculum
  - Stage 1: Fine‑tune the LLM on low‑level API calls and basic block‑creation tasks.
  - Stage 2: Expose the model to full design problems from SimuBench, encouraging hierarchical reasoning.
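  In outline, the curriculum amounts to two sequential fine‑tuning passes; a sketch assuming a generic `fine_tune` trainer and placeholder datasets:

  ```python
  def train_two_stage(model, fine_tune, api_tasks, design_tasks):
      # Stage 1: low-level Simulink API calls and basic block creation.
      model = fine_tune(model, api_tasks)
      # Stage 2: full SimuBench design problems, reusing Stage 1 skills.
      return fine_tune(model, design_tasks)
  ```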
- ReGRPO RL Loop
  - The agent interacts with SimuBench tasks, receiving a sparse final reward (model correctness).
  - After each episode, the LLM generates a self‑reflection trace (what worked, what failed, why).
  - These traces are treated as dense intermediate rewards and fed into Group Relative Policy Optimization (GRPO), accelerating policy updates and stabilizing learning.
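  A minimal sketch of the reward‑shaping idea, assuming a hypothetical `score_reflection` scorer and an illustrative 0.1 blend weight (neither is from the paper):

  ```python
  import statistics

  def regrpo_advantages(group, score_reflection, w_reflect=0.1):
      """group: list of (final_reward, reflection_trace) pairs for one task."""
      # Blend the dense reflection signal into the sparse final reward.
      shaped = [r + w_reflect * score_reflection(trace) for r, trace in group]
      mu = statistics.mean(shaped)
      sigma = statistics.pstdev(shaped) or 1.0  # guard against zero spread
      # GRPO normalizes each rollout's reward against its group's statistics.
      return [(s - mu) / sigma for s in shaped]
  ```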
Results & Findings
- Training Efficiency – The Qwen2.5‑7B model fine‑tuned with SimuAgent converged in ~30% fewer RL steps than vanilla GRPO and PPO baselines.
- Modeling Accuracy – On SimuBench, SimuAgent achieved 84% correct model generation (within tolerance), versus 71% for the best baseline and 78% for few‑shot GPT‑4o prompting.
- Token Savings – The Python DSL reduced average token length from ~12k (XML) to ~1.1k, enabling larger context windows and cheaper inference.
- Ablation Insights – Removing the two‑stage curriculum dropped accuracy by ~6 points; omitting the abstract‑reconstruct data augmentation (randomly shuffling block order) reduced generalization to unseen domains by ~4 points.
- Hardware Footprint – Training completed on a single 8‑GPU node (A100 40 GB) with under 150 GB of RAM, and inference runs in under 2 seconds per model on a consumer‑grade RTX 4090.
Practical Implications
- Faster Prototyping – Engineers can describe system requirements in natural language and receive a ready‑to‑run Simulink model in seconds, cutting weeks of manual block wiring.
- Cost‑Effective AI – By staying on‑premise and using a compact DSL, companies avoid expensive cloud LLM API fees and protect proprietary design data.
- Integration Friendly – The Python dictionary format plugs directly into existing CI pipelines; automated regression testing can be added as a post‑generation step (see the sketch after this list).
- Domain Extension – Because the approach is model‑agnostic, similar agents could be built for other graphical tools (e.g., LabVIEW, Modelica), opening a path to AI‑assisted model‑driven engineering across industries.
- Educational Use – Instructors of control‑systems or signal‑processing courses can use SimuAgent to auto‑generate example models, letting students focus on analysis rather than tedious diagramming.
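As a concrete example of the post‑generation regression step mentioned above, a CI job could run a structural check over the dictionary format; this sketch reuses the hypothetical schema fields from the Methodology section:

```python
def validate_model(model):
    """Return a list of structural problems; an empty list means pass."""
    errors = []
    names = {b["name"] for b in model.get("blocks", [])}
    if not names:
        errors.append("model has no blocks")
    for src, dst in model.get("connections", []):
        for endpoint in (src, dst):
            block = endpoint.split("/")[0]   # "gain/1" -> "gain"
            if block not in names:
                errors.append(f"connection references unknown block '{block}'")
    return errors
```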
Limitations & Future Work
- Benchmark Bias – SimuBench, while extensive, is still synthetic; real‑world industrial models may contain custom blocks or legacy components not covered.
- Long‑Term Consistency – The current plan‑execute loop handles single‑run tasks; extending to multi‑iteration design cycles (e.g., iterative tuning) requires more sophisticated state tracking.
- Model Size – Larger LLMs (e.g., 70 B) could further improve reasoning but would increase hardware demands, challenging the “modest‑hardware” claim.
- Explainability – While self‑reflection traces help training, exposing those traces to end users for debugging remains an open UX question.
Authors
- Yanchang Liang
- Xiaowei Zhao
Paper Information
- arXiv ID: 2601.05187v1
- Categories: cs.AI
- Published: January 8, 2026