[Paper] FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation
Source: arXiv - 2602.03798v1
Overview
FullStack‑Agent is a new LLM‑driven system that goes beyond generating pretty front‑ends and actually builds complete, production‑grade web applications—frontend, backend, and database. By combining a multi‑agent coding framework, a self‑learning data pipeline, and a dedicated benchmark, the authors show that large language models can reliably handle the full stack, opening the door to automated web development for non‑experts.
Key Contributions
- FullStack‑Dev: A multi‑agent architecture that integrates planning, code editing, repository navigation, and bug localization to manage end‑to‑end web development tasks.
- FullStack‑Learn: A data‑scaling/self‑improvement loop that back‑translates crawled and synthetically generated web repositories, fine‑tuning the underlying LLM without human annotation.
- FullStack‑Bench: The first systematic benchmark that evaluates generated sites on frontend rendering, backend API correctness, and database operations.
- Performance gains: FullStack‑Dev improves over the previous state‑of‑the‑art by 8.7 % (frontend), 38.2 % (backend), and 15.9 % (database). FullStack‑Learn further lifts a 30B model by 9.7 %, 9.5 %, and 2.8 % on the same metrics.
- Open‑source release: All code, data, and evaluation scripts are publicly available, encouraging reproducibility and community extensions.
Methodology
1. **Multi‑Agent Planning & Execution**
- A Planner LLM sketches the overall architecture (routing, data models, UI components).
- Editor agents iteratively write or modify code files, guided by a Navigator that can query the repository tree and retrieve relevant snippets.
- A Debugger agent runs unit/integration tests, pinpoints failing lines, and asks the Editor to apply patches.
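The plan → edit → debug cycle above can be sketched as a simple orchestration loop. This is an illustrative sketch only: the agent interfaces (`planner`, `editor`, `navigator`, `debugger`) and the `Repo` container are hypothetical names, not the paper's actual API.

```python
# Hypothetical sketch of the plan -> edit -> debug loop.
# All agent interfaces below are assumptions; the paper's code may differ.
from dataclasses import dataclass, field

@dataclass
class Repo:
    files: dict = field(default_factory=dict)  # path -> source text

def run_dev_loop(task, planner, editor, navigator, debugger, max_rounds=5):
    repo = Repo()
    plan = planner(task)                      # routing, data models, UI components
    for step in plan:
        context = navigator(repo, step)       # retrieve relevant snippets
        for path, code in editor(step, context).items():
            repo.files[path] = code
    for _ in range(max_rounds):
        failures = debugger(repo)             # run tests, localize failing lines
        if not failures:
            break
        for failure in failures:
            context = navigator(repo, failure)
            for path, code in editor(failure, context).items():
                repo.files[path] = code
    return repo
```

The key design point is that the Editor never sees the whole repository: the Navigator narrows the context per step, which is what lets the loop scale beyond single-file generation.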
2. **Development‑Oriented Testing**
- For each generated project, the system automatically spins up a containerized environment, runs a suite of frontend (Selenium‑style), backend (API), and database (SQL) tests, and records pass/fail signals used by the Debugger.
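A minimal shape for such a three-layer harness might look as follows. The specific test commands (Playwright for UI, pytest suites for API and SQL checks) are assumptions for illustration; the paper does not specify its tooling.

```python
# Illustrative three-layer test harness producing pass/fail signals
# for the Debugger. Commands below are assumptions, not the paper's setup.
import subprocess

def run_layer(cmd, cwd, runner=subprocess.run):
    """Run one test layer inside the project directory; True means it passed."""
    result = runner(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0

def collect_signals(project_dir, runner=subprocess.run):
    layers = {
        "frontend": ["npx", "playwright", "test"],  # Selenium-style UI checks
        "backend":  ["pytest", "tests/api"],        # HTTP API correctness
        "database": ["pytest", "tests/db"],         # SQL schema & queries
    }
    return {name: run_layer(cmd, project_dir, runner)
            for name, cmd in layers.items()}
```

The `runner` parameter is injected so the harness itself is testable without a container; in production it would be `subprocess.run` against the containerized environment.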
3. **Self‑Improvement via Back‑Translation**
- The authors crawl thousands of open‑source web repos, then reverse‑engineer them: the agents attempt to recreate the repo from a high‑level description, compare the result to the original, and generate correction data.
- This synthetic “error‑corrected” dataset is used to fine‑tune the backbone LLM (30B parameter model) in a continual learning loop, improving its ability to reason about full‑stack code.
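The back-translation loop can be summarized schematically: describe a real repo, regenerate it from the description, diff the attempt against the original, and keep each mismatch as a correction pair. The helper names (`describe`, `generate`, `diff_repos`) and the record format are placeholders, not the paper's implementation.

```python
# Schematic of the back-translation data pipeline. Helper functions and
# the training-record format are hypothetical.

def back_translate(repos, describe, generate, diff_repos):
    training_data = []
    for original in repos:
        spec = describe(original)      # high-level natural-language description
        attempt = generate(spec)       # agent recreates the repo from the spec
        for path, (wrong, right) in diff_repos(attempt, original).items():
            # Each mismatch becomes an "error-corrected" fine-tuning pair.
            training_data.append({"spec": spec, "path": path,
                                  "generated": wrong, "corrected": right})
    return training_data
```

Because the original repository acts as ground truth, the pipeline needs no human annotation: the diff itself supplies the supervision signal.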
4. **Benchmark Construction**
- FullStack‑Bench contains balanced test cases across three dimensions (frontend UI, backend logic, database schema & queries) with hidden ground truth, enabling fair comparison of different agents.
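Scoring such a benchmark reduces to a per-dimension pass rate over hidden test outcomes. The result-record format below is an assumption for illustration:

```python
# Per-dimension pass-rate computation for a FullStack-Bench-style benchmark.
# The outcome-record shape is assumed, not taken from the paper.

def pass_rates(results):
    """results: list of {"dimension": str, "passed": bool} test outcomes."""
    totals, passed = {}, {}
    for r in results:
        d = r["dimension"]
        totals[d] = totals.get(d, 0) + 1
        passed[d] = passed.get(d, 0) + int(r["passed"])
    return {d: passed[d] / totals[d] for d in totals}
```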
Results & Findings
| Metric | FullStack‑Dev (improvement vs. prior SOTA) | FullStack‑Learn (additional gain, 30B model) |
|---|---|---|
| Frontend pass rate | +8.7 % | +9.7 % |
| Backend pass rate | +38.2 % | +9.5 % |
| Database pass rate | +15.9 % | +2.8 % |
- Backend leap: The 38 % boost shows the planner’s ability to correctly wire APIs, authentication, and data validation—areas where earlier agents usually stumble.
- Self‑learning impact: Even a modest 30B model gains up to roughly 10 % (9.7 % frontend, 9.5 % backend) after a single back‑translation round, confirming that the synthetic data is high‑quality and directly relevant.
- Robustness: Across 500+ generated sites, the Debugger reduced the average number of failing tests from 4.3 to 0.9, demonstrating effective automated bug localization.
Practical Implications
- Rapid prototyping for startups: Developers can describe a product idea in natural language and receive a ready‑to‑deploy full‑stack scaffold, cutting weeks of boilerplate work.
- Low‑code platforms: FullStack‑Agent can serve as the AI “engine” behind visual builders, automatically handling the hidden server‑side code that most low‑code tools omit.
- Automated migration & modernization: By feeding legacy codebases into the back‑translation pipeline, organizations could generate updated stacks (e.g., moving from monolith to micro‑services) with minimal manual effort.
- Education & onboarding: New engineers can experiment with end‑to‑end web projects without needing deep knowledge of each layer, accelerating learning curves.
- Continuous integration: The built‑in testing and debugging loop can be plugged into CI pipelines to auto‑repair failing builds in large codebases.
Limitations & Future Work
- Scalability to large codebases: The current system is evaluated on medium‑size demo projects; handling enterprise‑scale monoliths may require hierarchical planning and more sophisticated dependency analysis.
- Security & compliance: Generated code inherits the same security risks as any LLM output (e.g., injection vulnerabilities); a dedicated security audit module is still needed.
- Domain‑specific extensions: While the benchmark covers generic CRUD apps, specialized domains (e.g., real‑time streaming, ML inference services) are not yet addressed.
- Human‑in‑the‑loop refinement: The authors note that occasional manual guidance (e.g., clarifying ambiguous requirements) can dramatically improve outcomes, suggesting future work on seamless human‑AI collaboration interfaces.
FullStack‑Agent demonstrates that with the right orchestration of planning, testing, and self‑learning, LLMs can move from “pretty UI generators” to true full‑stack developers—an exciting step toward AI‑augmented software engineering.
Authors
- Zimu Lu
- Houxing Ren
- Yunqiao Yang
- Ke Wang
- Zhuofan Zong
- Mingjie Zhan
- Hongsheng Li
Paper Information
- arXiv ID: 2602.03798v1
- Categories: cs.SE, cs.CL, cs.CV
- Published: February 3, 2026