[Paper] DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

Published: June 9, 2026 at 07:37 AM EDT

Source: arXiv - 2606.10728v1

Overview

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, we introduce DeNovoSWE, a large-scale dataset for whole-repository generation. DeNovoSWE comprises 4,818 high-quality instances, each of which requires generating a complete repository from documentation. The dataset is constructed automatically through a carefully designed sandboxed agentic workflow, enabling scalable curation without human annotation, and its construction follows a “divide and conquer” and critic-repair philosophy. To balance data quality and diversity, we further introduce a difficulty-aware trajectory filtering strategy. Fine-tuning Qwen3-30B-A3B on DeNovoSWE substantially improves long-horizon SWE performance, raising its score on the challenging BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.
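The abstract does not say how the difficulty-aware trajectory filtering works. As a rough illustration only, the minimal Python sketch below shows one common way to make such filtering difficulty-aware. It assumes that each instance has several sampled agent rollouts and that each rollout has a pass/fail test verdict. The `Trajectory` record, the `filter_trajectories` function, and its `base_keep`/`hard_bonus` parameters are hypothetical names. The pass-rate heuristic is a stand-in chosen for illustration, not the paper's method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record of one agent rollout on one instance.
    instance_id: str
    passed: bool  # assumption: True if the generated repo passed the instance's tests
    steps: int    # assumption: number of agent turns in the rollout


def filter_trajectories(trajs, base_keep=1, hard_bonus=2):
    """Hypothetical difficulty-aware filter; NOT the paper's published algorithm.

    Difficulty is estimated from the failure rate of the rollouts sampled for
    each instance. Only passing (verified) rollouts are kept, and harder
    instances receive a larger per-instance quota.
    """
    by_instance = defaultdict(list)
    for t in trajs:
        by_instance[t.instance_id].append(t)

    kept = []
    for group in by_instance.values():
        passing = [t for t in group if t.passed]
        if not passing:
            continue  # no verified solution, so nothing usable for fine-tuning
        n = len(group)
        failures = n - len(passing)
        # Integer ceil(hard_bonus * failures / n): no extra quota if every rollout passed.
        quota = base_keep + (hard_bonus * failures + n - 1) // n
        # Prefer shorter successful rollouts within each instance.
        passing.sort(key=lambda t: t.steps)
        kept.extend(passing[:quota])
    return kept


# Toy usage: "a" is easy, "b" and "c" are harder, "d" is never solved.
rollouts = [
    Trajectory("a", True, 10), Trajectory("a", True, 5),
    Trajectory("a", True, 7), Trajectory("a", True, 3),
    Trajectory("b", True, 20), Trajectory("b", False, 30),
    Trajectory("b", False, 25), Trajectory("b", False, 40),
    Trajectory("c", True, 8), Trajectory("c", True, 6),
    Trajectory("c", False, 12), Trajectory("c", False, 9),
    Trajectory("d", False, 50), Trajectory("d", False, 45),
    Trajectory("d", False, 60),
]
kept = filter_trajectories(rollouts)
print([(t.instance_id, t.steps) for t in kept])
# -> [('a', 3), ('b', 20), ('c', 6), ('c', 8)]
```

In this sketch, keeping only verified rollouts addresses quality. Scaling the per-instance quota with the failure rate keeps hard instances from being crowded out by easy ones. This is one plausible reading of "balancing quality and diversity," not a description of DeNovoSWE's actual pipeline.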

Key Contributions

This paper presents research in the following area:

cs.SE (Software Engineering)

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

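This research contributes to software engineering (cs.SE) by providing scalable, automatically verified training data for agents that generate entire repositories from documentation.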

Authors

Jiale Zhao
Guoxin Chen
Fanzhe Meng
Wayne Xin Zhao
Ruihua Song
Ji-Rong Wen
Kai Jia

Paper Information

arXiv ID: 2606.10728v1
Categories: cs.SE
Published: June 9, 2026
PDF: Download PDF
