[Paper] RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades
Coding agents are increasingly deployed in real software development, where a single version iteration requires months of coordinated work across many files. Ho...