[Paper] SolAgent: A Specialized Multi-Agent Framework for Solidity Code Generation
Source: arXiv - 2601.23009v1
Overview
Smart contracts power today’s decentralized applications, but writing them without bugs or security flaws is notoriously hard. The paper introduces SolAgent, a multi‑agent system that couples large language models with real Solidity tooling (Forge for compilation and the Slither static analyzer) to automatically generate, test, and harden contract code. The authors report that this “human‑like” workflow reaches a Pass@1 of 64.39 % and cuts vulnerability rates by up to 39.77 % relative to human‑written baselines, well ahead of existing LLM‑based code generators.
Key Contributions
- Dual‑loop refinement architecture: an inner loop that compiles generated code with Forge for functional correctness, and an outer loop that runs Slither to prune security vulnerabilities.
- File‑system aware agents: the system can navigate multi‑file Solidity projects, resolve imports, and manage project‑level dependencies automatically.
- SolEval+ benchmark: a new, rigorously curated suite of real‑world contracts used to evaluate generation quality and security.
- State‑of‑the‑art performance: SolAgent reaches a Pass@1 of 64.39 %, more than double that of the best public LLMs and AI IDEs, while cutting vulnerability rates by up to 39.77 % versus human‑written baselines.
- Model distillation pipeline: high‑quality generation traces are released so smaller, open‑source models can be fine‑tuned to inherit SolAgent’s security‑aware behavior.
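As a rough illustration of what a released generation trace might look like, the sketch below serializes trajectories to JSON Lines. The schema (`Trajectory`, `RefinementStep`, and their field names) is hypothetical and chosen for illustration only; this summary does not state the paper's actual trace format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RefinementStep:
    """One feedback round (hypothetical schema)."""
    tool: str          # which tool produced the feedback, e.g. "forge" or "slither"
    feedback: str      # tool output shown to the model
    revised_code: str  # the model's rewrite after seeing the feedback


@dataclass
class Trajectory:
    """Hypothetical trace: prompt -> initial code -> refinements."""
    prompt: str
    initial_code: str
    steps: list = field(default_factory=list)
    succeeded: bool = False


def to_jsonl(trajectories):
    """Serialize only successful trajectories, one JSON object per line."""
    lines = [json.dumps(asdict(t)) for t in trajectories if t.succeeded]
    return "\n".join(lines)
```

Keeping one record per line makes it easy to stream the traces into a fine-tuning job and to filter out failed runs before training.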
Methodology
- Prompting & Initial Generation – A primary LLM agent receives a natural‑language specification (e.g., “ERC‑20 token with a timelock”) and emits an initial Solidity file set.
- Inner Loop (Functional Check) – The generated files are fed to the Forge compiler. Compilation errors trigger a feedback message, which the LLM uses to rewrite the offending parts until the contract compiles cleanly.
- Outer Loop (Security Check) – Once compilation succeeds, the Slither static analyzer scans the contract suite. Detected issues (re‑entrancy, uninitialized storage, etc.) are packaged into a concise report that the LLM uses to patch the code.
- File‑System Agent – A secondary agent monitors the project directory, adds missing imports, creates auxiliary libraries, and ensures that the overall project structure matches Solidity’s expectations.
- Iterative Convergence – The inner and outer loops alternate until both compilation and security checks pass or a maximum iteration budget is reached.
- Data Collection & Distillation – Every successful trajectory (prompt → code → refinements) is logged. These logs are later used to fine‑tune smaller models, democratizing the approach.
The whole pipeline runs autonomously, requiring only the original specification as input.
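The control flow of the steps above can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the four callables (`generate`, `compile_check`, `security_check`, `patch`) are hypothetical stand-ins for the LLM agent, Forge, and Slither, and the budget parameter `max_patches` is an assumed name.

```python
def refine(spec, generate, compile_check, security_check, patch, max_patches=10):
    """Alternate compile and security checks until both pass or the budget runs out.

    Hypothetical callables (assumptions, not SolAgent's API):
      generate(spec)        -> initial code
      compile_check(code)   -> list of compiler errors (empty means it compiles)
      security_check(code)  -> list of analyzer findings (empty means clean)
      patch(code, feedback) -> revised code

    Returns (code, success, patches_used).
    """
    code = generate(spec)
    patches_used = 0
    while True:
        # Inner loop: the code has to compile before anything else is checked.
        feedback = compile_check(code)
        if not feedback:
            # Outer loop: security analysis runs only on code that compiles.
            feedback = security_check(code)
            if not feedback:
                return code, True, patches_used
        if patches_used >= max_patches:
            return code, False, patches_used
        code = patch(code, feedback)
        patches_used += 1
```

Because every patch is followed by a fresh compile check, a security fix that breaks compilation is caught before the analyzer runs again.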
Results & Findings
| Metric | SolAgent | Best Public LLM | GitHub Copilot | Human Baseline |
|---|---|---|---|---|
| Pass@1 (functional correctness) | 64.39 % | ~25 % | ~30 % | 70 % (manual) |
| Change in vulnerability rate (vs. human baseline) | ‑39.77 % | +12 % (more bugs) | +8 % | 0 % |
| Average refinement cycles | 3.2 | 5.8 | 5.1 | N/A |
Key takeaways
- The dual‑loop design yields a ~2.5× lift in functional pass rate over raw LLM generation.
- Security‑oriented feedback cuts common Solidity bugs dramatically, outperforming even careful human authors on the benchmark.
- The system scales: even with a modest 2‑GPU setup, the full SolEval+ suite (≈1,200 contracts) is processed in under 6 hours.
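For readers unfamiliar with the Pass@1 metric, the sketch below shows the unbiased pass@k estimator commonly used for code-generation benchmarks (introduced with the Codex evaluation by Chen et al., 2021). Whether the paper computes its Pass@1 exactly this way is an assumption; the function names are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one task: n samples drawn, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over tasks; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to the fraction of correct samples per task, so a Pass@1 of 64.39 % means that a single generated contract passes its tests for roughly two out of three tasks on average.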
Practical Implications
- Faster prototyping – Developers can feed a high‑level spec to SolAgent and receive a compile‑ready, security‑vetted contract in minutes, which could shorten later audit cycles.
- Integrated CI/CD – The inner/outer loop can be wrapped as a pre‑commit hook or GitHub Action, automatically rejecting PRs that fail compilation or introduce Slither warnings.
- Lower audit costs – By catching many low‑level bugs early, firms can focus formal audits on business‑logic correctness rather than basic safety checks.
- Open‑source democratization – The released trajectories enable startups to fine‑tune lightweight models (e.g., LLaMA‑7B) for internal use without licensing expensive proprietary LLMs.
- Education & onboarding – Coding bootcamps can use SolAgent as a teaching assistant that instantly points out why a contract won’t compile or is vulnerable, accelerating learning.
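The CI/CD idea above could look something like the following gate, which rejects a change when compilation failed or when the analyzer report contains findings at or above a chosen severity. This is a sketch under assumptions: the report layout (`results.detectors[]` entries with `check` and `impact` fields) follows what Slither's JSON output is understood to look like and should be verified against the installed version, and the function names and default threshold are hypothetical.

```python
import json

# Impact levels ordered from least to most severe (assumed to match the report).
SEVERITY_ORDER = ["Optimization", "Informational", "Low", "Medium", "High"]


def blocking_findings(report_json, min_impact="Medium"):
    """Return (check, impact) pairs whose impact is at or above min_impact."""
    report = json.loads(report_json)
    detectors = (report.get("results") or {}).get("detectors") or []
    threshold = SEVERITY_ORDER.index(min_impact)
    blocked = []
    for finding in detectors:
        impact = finding.get("impact", "Informational")
        # Unrecognized impact labels are ranked lowest in this sketch.
        rank = SEVERITY_ORDER.index(impact) if impact in SEVERITY_ORDER else 0
        if rank >= threshold:
            blocked.append((finding.get("check", "unknown"), impact))
    return blocked


def gate(compiled_ok, report_json, min_impact="Medium"):
    """Return a process exit code: 0 accepts the change, 1 rejects it."""
    if not compiled_ok:
        return 1
    return 1 if blocking_findings(report_json, min_impact) else 0
```

In a real workflow, `compiled_ok` would come from the exit status of the build step and `report_json` from the analyzer's JSON output; the exact invocations are left out here because they depend on the project setup.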
Limitations & Future Work
- Toolchain dependency – SolAgent’s success hinges on Forge and Slither; for contracts that rely on newer Solidity features not yet supported by these tools, errors and vulnerabilities may go undetected.
- Scalability to massive codebases – The current implementation handles typical token contracts and modest libraries; scaling to multi‑megabyte DeFi suites may require smarter dependency caching.
- Security analysis depth – Slither is a static analyzer; it cannot detect runtime‑only issues like gas‑limit attacks or complex cross‑contract invariants. Future versions could integrate symbolic execution or formal verification tools.
- Prompt sensitivity – The quality of the initial specification still influences outcomes; ambiguous prompts can lead to divergent implementations. Improving prompt engineering guidance is an open research direction.
Overall, SolAgent demonstrates that marrying LLM creativity with domain‑specific tooling can bridge the gap between rapid code generation and production‑grade security, a promising blueprint for other safety‑critical software domains.
Authors
- Wei Chen
- Zhiyuan Peng
- Xin Yin
- Chao Ni
- Chenhao Ying
- Bang Xie
- Yuan Luo
Paper Information
- arXiv ID: 2601.23009v1
- Categories: cs.SE
- Published: January 30, 2026