[Paper] SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
Source: arXiv - 2601.05187v1
Overview
The paper presents SimuAgent, a large‑language‑model (LLM)‑driven assistant that helps engineers create and simulate Simulink models. By swapping Simulink’s bulky XML files for a compact, dictionary‑style Python representation, SimuAgent slashes token usage, speeds up in‑process simulation, and makes the model‑generation pipeline far more developer‑friendly.
Key Contributions
- Compact Python DSL for Simulink – replaces verbose XML with a lightweight, human‑readable dictionary format, cutting token counts by an order of magnitude.
- Two‑stage plan‑execute training – first teaches low‑level Simulink API skills, then high‑level design reasoning, yielding a more robust agent.
- Reflection‑GRPO (ReGRPO) – a novel reinforcement‑learning algorithm that injects self‑reflection traces as intermediate rewards, tackling sparse‑reward problems in long‑horizon modeling tasks.
- SimuBench – a new benchmark suite of 5,300 multi‑domain Simulink modeling problems for systematic evaluation.
- On‑premise, privacy‑preserving deployment – the entire training and inference pipeline runs on modest hardware, avoiding cloud‑based data exposure and high API costs.
Methodology
- Representation Layer – SimuAgent translates a Simulink diagram into a compact Python dictionary, e.g. `{ "blocks": [...], "connections": [...] }`. This representation is both token‑efficient for the LLM and directly executable via Simulink's Python API.
- Plan‑Execute Architecture
  - Planning: The LLM generates a high‑level design plan (which blocks to add, parameter choices, connection strategy).
  - Execution: A thin runtime engine consumes the plan, calls Simulink's API to build the model, runs a quick simulation, and returns diagnostics.
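  A hypothetical sketch of one plan‑execute round; `llm_plan` and the `runtime` wrapper are stand‑ins for the paper's components, not a published API:

  ```python
  def build_and_check(requirement, llm_plan, runtime):
      """Plan with the LLM, execute against Simulink, return diagnostics."""
      plan = llm_plan(requirement)          # dict: {"blocks": [...], "connections": [...]}
      runtime.add_blocks(plan["blocks"])    # create each block via the Simulink API
      runtime.connect(plan["connections"])  # wire source ports to destination ports
      return runtime.simulate()             # quick run; diagnostics feed back to the agent
  ```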
- Two‑Stage Curriculum
  - Stage 1: Fine‑tune the LLM on low‑level API calls and basic block‑creation tasks.
  - Stage 2: Expose the model to full design problems from SimuBench, encouraging hierarchical reasoning.
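  In outline, the curriculum amounts to two sequential fine‑tuning passes; a sketch assuming a generic `fine_tune` trainer and placeholder datasets:

  ```python
  def train_two_stage(model, fine_tune, api_tasks, design_tasks):
      # Stage 1: low-level Simulink API calls and basic block creation.
      model = fine_tune(model, api_tasks)
      # Stage 2: full SimuBench design problems, reusing Stage 1 skills.
      return fine_tune(model, design_tasks)
  ```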
- ReGRPO RL Loop
  - The agent interacts with SimuBench tasks, receiving a sparse final reward (model correctness).
  - After each episode, the LLM generates a self‑reflection trace (what worked, what failed, why).
  - These traces are treated as dense intermediate rewards and fed into Group Relative Policy Optimization (GRPO), accelerating policy updates and stabilizing learning.
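  A minimal sketch of the reward‑shaping idea, assuming a hypothetical `score_reflection` scorer and an illustrative 0.1 blend weight (neither is from the paper):

  ```python
  import statistics

  def regrpo_advantages(group, score_reflection, w_reflect=0.1):
      """group: list of (final_reward, reflection_trace) pairs for one task."""
      # Blend the dense reflection signal into the sparse final reward.
      shaped = [r + w_reflect * score_reflection(trace) for r, trace in group]
      mu = statistics.mean(shaped)
      sigma = statistics.pstdev(shaped) or 1.0  # guard against zero spread
      # GRPO normalizes each rollout's reward against its group's statistics.
      return [(s - mu) / sigma for s in shaped]
  ```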
Results & Findings
- Training Efficiency – The Qwen2.5‑7B model fine‑tuned with SimuAgent converged in ~30% fewer RL steps than vanilla GRPO and PPO baselines.
- Modeling Accuracy – On SimuBench, SimuAgent achieved 84% correct model generation (within tolerance), versus 71% for the best baseline and 78% for few‑shot GPT‑4o prompting.
- Token Savings – The Python DSL reduced average token length from ~12k (XML) to ~1.1k, enabling larger context windows and cheaper inference.
- Ablation Insights – Removing the two‑stage curriculum dropped accuracy by ~6 points; omitting the abstract‑reconstruct data augmentation (randomly shuffling block order) reduced generalization to unseen domains by ~4 points.
- Hardware Footprint – Training completed on a single 8‑GPU node (A100 40 GB) with under 150 GB of RAM, and inference runs in under 2 seconds per model on a consumer‑grade RTX 4090.
Practical Implications
- Faster Prototyping – Engineers can describe system requirements in natural language and receive a ready‑to‑run Simulink model in seconds, cutting weeks of manual block wiring.
- Cost‑Effective AI – By staying on‑premise and using a compact DSL, companies avoid expensive cloud LLM API fees and protect proprietary design data.
- Integration Friendly – The Python dictionary format plugs directly into existing CI pipelines; automated regression testing can be added as a post‑generation step (see the sketch after this list).
- Domain Extension – Because the approach is model‑agnostic, similar agents could be built for other graphical tools (e.g., LabVIEW, Modelica), opening a path to AI‑assisted model‑driven engineering across industries.
- Educational Use – Instructors of control‑systems or signal‑processing courses can use SimuAgent to auto‑generate example models, letting students focus on analysis rather than tedious diagramming.
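As a concrete example of the post‑generation regression step mentioned above, a CI job could run a structural check over the dictionary format; this sketch reuses the hypothetical schema fields from the Methodology section:

```python
def validate_model(model):
    """Return a list of structural problems; an empty list means pass."""
    errors = []
    names = {b["name"] for b in model.get("blocks", [])}
    if not names:
        errors.append("model has no blocks")
    for src, dst in model.get("connections", []):
        for endpoint in (src, dst):
            block = endpoint.split("/")[0]   # "gain/1" -> "gain"
            if block not in names:
                errors.append(f"connection references unknown block '{block}'")
    return errors
```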
Limitations & Future Work
- Benchmark Bias – SimuBench, while extensive, is still synthetic; real‑world industrial models may contain custom blocks or legacy components not covered.
- Long‑Term Consistency – The current plan‑execute loop handles single‑run tasks; extending to multi‑iteration design cycles (e.g., iterative tuning) requires more sophisticated state tracking.
- Model Size – Larger LLMs (e.g., 70 B) could further improve reasoning but would increase hardware demands, challenging the “modest‑hardware” claim.
- Explainability – While self‑reflection traces help training, exposing those traces to end users for debugging remains an open UX question.
Authors
- Yanchang Liang
- Xiaowei Zhao
Paper Information
- arXiv ID: 2601.05187v1
- Categories: cs.AI
- Published: January 8, 2026