[Paper] ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer
Source: arXiv - 2606.05548v1
Overview
The rapid proliferation of Agent Development Kits (ADKs), SDK-level frameworks for building LLM-powered autonomous agents, has outpaced any empirical understanding of how framework choice affects agent performance. We propose LLM-as-a-Developer, a methodology that replaces human developers with an LLM coding agent that learns each framework's API from documentation, writes agent code, and iteratively repairs it through a validate-and-feedback loop until tests pass. By holding the developer constant and varying only the framework, generation effort becomes a quantitative proxy for API usability, and the resulting agents provide a controlled measure of framework effectiveness. We implement this in ADK Arena, a fully automated pipeline with per-framework Docker isolation, a three-level validation pipeline, and benchmark adapters for SWE-bench, τ²-bench, Terminal-Bench, and MCP-Atlas. Evaluating all 51 popular Python ADK frameworks (204 agent-benchmark pairs), we find that: (1) generation succeeds for 57% of runs, and its cost varies 5.6× across frameworks ($0.6 to $3.4 per agent), a quantitative proxy for API complexity, though cost alone does not predict success; (2) no single framework dominates: the best single-benchmark ADK agents resolve up to 80% of tasks and can even beat general-purpose frontier coding agents at a fraction of the cost, yet the median framework resolves only 32%; (3) across information-source ablations, genuine framework usage stays within a narrow 28-40% band (highest with raw source access and still 33% with no reference material at all), indicating that documentation, source code, and parametric knowledge are largely substitutable rather than any one being a hard bottleneck.
Key Contributions
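According to the abstract, the paper contributes:
- LLM-as-a-Developer, a methodology that holds the developer constant (an LLM coding agent) and varies only the framework, so that generation effort serves as a proxy for API usability.
- ADK Arena, a fully automated pipeline with per-framework Docker isolation, a three-level validation pipeline, and adapters for SWE-bench, τ²-bench, Terminal-Bench, and MCP-Atlas.
- An empirical study of 51 Python ADK frameworks (204 agent-benchmark pairs), including information-source ablations.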
Methodology
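At a high level, an LLM coding agent reads each framework's documentation, writes agent code, and repairs it through a validate-and-feedback loop until tests pass, with each framework isolated in its own Docker environment. Please refer to the full paper for the detailed methodology.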
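The validate-and-feedback loop named in the overview can be illustrated with a minimal sketch. This is not the ADK Arena implementation: `generate_with_repair`, `ValidationResult`, `GenerationOutcome`, the two callables, and the `max_attempts` cap are all hypothetical names introduced here, and the attempt count only stands in for the generation-effort signal that the paper measures as cost.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    """Outcome of one validation pass (hypothetical shape)."""
    passed: bool
    feedback: str = ""


@dataclass
class GenerationOutcome:
    """What the loop reports back for one framework/benchmark pair."""
    code: Optional[str]
    succeeded: bool
    attempts: int  # stand-in for "generation effort"
    history: List[str] = field(default_factory=list)


def generate_with_repair(
    write_code: Callable[[str, Optional[str], str], str],
    validate: Callable[[str], ValidationResult],
    docs: str,
    max_attempts: int = 5,  # assumed cap; the source does not state one
) -> GenerationOutcome:
    """Write code from docs, validate it, and feed failures back until it passes."""
    code: Optional[str] = None
    feedback = ""
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # The "developer" sees the docs, its previous draft, and the last feedback.
        code = write_code(docs, code, feedback)
        result = validate(code)
        if result.passed:
            return GenerationOutcome(code, True, attempt, history)
        feedback = result.feedback
        history.append(feedback)
    # Budget exhausted without a passing draft.
    return GenerationOutcome(None, False, max_attempts, history)
```

In a real pipeline, `write_code` would wrap a call to the LLM coding agent and `validate` would run the generated agent inside the framework's isolated environment. Both are passed in as plain callables here so the control flow can be read and tested without either dependency.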
Practical Implications
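For practitioners choosing an ADK, the reported results suggest that framework choice matters: no single framework dominates, the best agents resolve up to 80% of tasks on a single benchmark while the median framework resolves only 32%, and generation cost alone does not predict success. The ablations further suggest that documentation, source code, and the model's parametric knowledge are largely substitutable as information sources.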
Authors
- Jintao Huang
- Xiaomin Li
- Gaurav Mittal
- Yu Hu
Paper Information
- arXiv ID: 2606.05548v1
- Categories: cs.SE, cs.AI
- Published: June 4, 2026