[Paper] SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning

Published: January 8, 2026 at 01:10 PM EST
4 min read
Source: arXiv


Overview

The paper presents SimuAgent, a large‑language‑model (LLM)‑driven assistant that helps engineers create and simulate Simulink models. By swapping Simulink’s bulky XML files for a compact, dictionary‑style Python representation, SimuAgent slashes token usage, speeds up in‑process simulation, and makes the model‑generation pipeline far more developer‑friendly.

Key Contributions

  • Compact Python DSL for Simulink – replaces verbose XML with a lightweight, human‑readable dictionary format, cutting token counts by an order of magnitude.
  • Two‑stage plan‑execute training – first teaches low‑level Simulink API skills, then high‑level design reasoning, yielding a more robust agent.
  • Reflection‑GRPO (ReGRPO) – a novel reinforcement‑learning algorithm that injects self‑reflection traces as intermediate rewards, tackling sparse‑reward problems in long‑horizon modeling tasks.
  • SimuBench – a new benchmark suite of 5,300 multi‑domain Simulink modeling problems for systematic evaluation.
  • On‑premise, privacy‑preserving deployment – the entire training and inference pipeline runs on modest hardware, avoiding cloud‑based data exposure and high API costs.

Methodology

  1. Representation Layer – SimuAgent translates a Simulink diagram into a Python dictionary, e.g.:

    {
        "blocks": [...],
        "connections": [...]
    }

    This representation is both token‑efficient for the LLM and directly executable via Simulink’s Python API.

  2. Plan‑Execute Architecture

    • Planning: The LLM generates a high‑level design plan (which blocks to add, parameter choices, connection strategy).
    • Execution: A thin runtime engine consumes the plan, calls Simulink’s API to build the model, runs a quick simulation, and returns diagnostics.
  3. Two‑Stage Curriculum

    • Stage 1: Fine‑tune the LLM on low‑level API calls and basic block creation tasks.
    • Stage 2: Expose the model to full design problems from SimuBench, encouraging hierarchical reasoning.
  4. ReGRPO RL Loop

    • The agent interacts with SimuBench tasks, receiving a sparse final reward (model correctness).
    • After each episode, the LLM generates a self‑reflection trace (what worked, what failed, why).
    • These traces are treated as dense intermediate rewards and fed into Group Relative Policy Optimization (GRPO), accelerating policy updates and stabilizing learning.
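The dictionary representation and the runtime engine's quick sanity check (steps 1 and 2 above) can be sketched as follows. The block types, parameter names, and validation logic here are illustrative assumptions, not the paper's actual DSL or API:

```python
# Illustrative sketch of the compact dictionary representation described
# in step 1, plus the kind of diagnostic pass the runtime engine might
# run before building the model. All names are assumptions.

model = {
    "blocks": [
        {"name": "Step1",  "type": "Step",  "params": {"step_time": 1.0}},
        {"name": "Gain1",  "type": "Gain",  "params": {"gain": 2.5}},
        {"name": "Scope1", "type": "Scope", "params": {}},
    ],
    "connections": [
        ("Step1", "Gain1"),   # source block -> destination block
        ("Gain1", "Scope1"),
    ],
}

def validate(model: dict) -> list[str]:
    """Return diagnostics, mimicking the quick check the execution
    engine could run before calling the Simulink API."""
    names = {b["name"] for b in model["blocks"]}
    errors = []
    for src, dst in model["connections"]:
        for endpoint in (src, dst):
            if endpoint not in names:
                errors.append(f"connection references unknown block: {endpoint}")
    return errors

print(validate(model))  # [] -> the sketch model wires up consistently
```

A structured check like this is what makes the format "integration friendly": the same dictionary that the LLM emits can be linted, diffed, and version-controlled like any other artifact.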
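One way to picture the ReGRPO loop in step 4 is as reward shaping on top of GRPO's group-relative normalization: sparse final rewards are blended with a dense score derived from the reflection trace, then advantages are normalized within the sampled group. The scoring function and the blend weight `beta` below are assumptions for illustration; the paper defines its own reflection-to-reward mapping:

```python
# Minimal sketch of ReGRPO-style reward shaping (assumed form):
# blend sparse task rewards with reflection-derived scores, then
# normalize within the rollout group as GRPO does.
from statistics import mean, pstdev

def regrpo_advantages(final_rewards, reflection_scores, beta=0.3):
    """Group-relative advantages from shaped rewards."""
    shaped = [r + beta * s for r, s in zip(final_rewards, reflection_scores)]
    mu, sigma = mean(shaped), pstdev(shaped)
    if sigma == 0:
        return [0.0] * len(shaped)  # identical rollouts carry no signal
    return [(x - mu) / sigma for x in shaped]

# Four rollouts of one task: only the third produced a correct model
# (reward 1), but reflection scores give the near-misses gradient
# signal too, which is the point of the dense intermediate reward.
adv = regrpo_advantages([0, 0, 1, 0], [0.2, 0.6, 0.9, 0.1])
```

Without the reflection term, the three failed rollouts would be indistinguishable and the group advantage would carry far less information per episode, which is the sparse-reward problem the paper targets.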

Results & Findings

  • Training Efficiency – The Qwen2.5‑7B model fine‑tuned with SimuAgent converged in ~30 % fewer RL steps compared with vanilla GRPO and PPO baselines.
  • Modeling Accuracy – On SimuBench, SimuAgent achieved 84 % correct model generation (within tolerance) versus 71 % for the best baseline and 78 % for few‑shot GPT‑4o prompting.
  • Token Savings – The Python DSL reduced average token length from ~12 k (XML) to ~1.1 k, enabling larger context windows and cheaper inference.
  • Ablation Insights – Removing the two‑stage curriculum dropped accuracy by ~6 pts; omitting the abstract‑reconstruct data augmentation (randomly shuffling block order) reduced generalization to unseen domains by ~4 pts.
  • Hardware Footprint – Training completed on a single 8‑GPU node (A100 40 GB) with < 150 GB RAM, and inference runs in < 2 seconds per model on a consumer‑grade RTX 4090.

Practical Implications

  • Faster Prototyping – Engineers can describe system requirements in natural language and receive a ready‑to‑run Simulink model in seconds, cutting weeks of manual block wiring.
  • Cost‑Effective AI – By staying on‑premise and using a compact DSL, companies avoid expensive cloud LLM API fees and protect proprietary design data.
  • Integration Friendly – The Python dictionary format plugs directly into existing CI pipelines; automated regression testing can be added as a post‑generation step.
  • Domain Extension – Because the approach is model‑agnostic, similar agents could be built for other graphical tools (e.g., LabVIEW, Modelica), opening a path to AI‑assisted model‑driven engineering across industries.
  • Educational Use – Instructors teaching control‑systems or signal‑processing courses can leverage SimuAgent to auto‑generate example models, letting students focus on analysis rather than tedious diagramming.

Limitations & Future Work

  • Benchmark Bias – SimuBench, while extensive, is still synthetic; real‑world industrial models may contain custom blocks or legacy components not covered.
  • Long‑Term Consistency – The current plan‑execute loop handles single‑run tasks; extending to multi‑iteration design cycles (e.g., iterative tuning) requires more sophisticated state tracking.
  • Model Size – Larger LLMs (e.g., 70 B) could further improve reasoning but would increase hardware demands, challenging the “modest‑hardware” claim.
  • Explainability – While self‑reflection traces help training, exposing those traces to end users for debugging remains an open UX question.

Authors

  • Yanchang Liang
  • Xiaowei Zhao

Paper Information

  • arXiv ID: 2601.05187v1
  • Categories: cs.AI
  • Published: January 8, 2026