[Paper] ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer

Published: June 3, 2026 at 09:00 PM EDT
Source: arXiv - 2606.05548v1

Overview

The rapid proliferation of Agent Development Kits (ADKs), SDK-level frameworks for building LLM-powered autonomous agents, has outpaced any empirical understanding of how framework choice affects agent performance. We propose LLM-as-a-Developer, a methodology that replaces human developers with an LLM coding agent that learns each framework’s API from documentation, writes agent code, and iteratively repairs it through a validate-and-feedback loop until tests pass. By holding the developer constant and varying only the framework, generation effort becomes a quantitative proxy for API usability, and the resulting agents provide a controlled measure of framework effectiveness.

We implement this in ADK Arena, a fully automated pipeline with per-framework Docker isolation, a three-level validation pipeline, and benchmark adapters for SWE-bench, τ²-bench, Terminal-Bench, and MCP-Atlas. Evaluating all 51 popular Python ADK frameworks (204 agent-benchmark pairs), we find that:

(1) generation succeeds for 57% of runs, and its cost varies 5.6× across frameworks ($0.6 to $3.4 per agent), a quantitative proxy for API complexity, though cost alone does not predict success;

(2) no single framework dominates: the best single-benchmark ADK agents resolve up to 80% of tasks and can even beat general-purpose frontier coding agents at a fraction of the cost, yet the median framework resolves only 32%;

(3) across information-source ablations, genuine framework usage stays within a narrow 28% to 40% band (highest with raw source access and still 33% with no reference material at all), indicating that documentation, source code, and parametric knowledge are largely substitutable rather than any one being a hard bottleneck.
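
As a quick sanity check on the headline numbers, the short Python sketch below recomputes the pair count and the cost spread from the figures quoted in the abstract. The variable names are illustrative only. The dollar endpoints are the rounded values given above, so the recomputed ratio only approximates the reported 5.6×, which was presumably derived from unrounded costs (an assumption, not stated in the abstract).

```python
# Figures quoted in the abstract; variable names are illustrative.
frameworks = 51
benchmarks = ["SWE-bench", "tau2-bench", "Terminal-Bench", "MCP-Atlas"]

# One generated agent per (framework, benchmark) combination.
pairs = frameworks * len(benchmarks)

# Rounded per-agent generation cost range in USD, as reported.
cost_low, cost_high = 0.6, 3.4
ratio = cost_high / cost_low

print(f"agent-benchmark pairs: {pairs}")
print(f"cost spread from rounded endpoints: {ratio:.2f}x")
```

The pair count matches the 204 reported, and the spread computed from the rounded endpoints lands close to the stated 5.6×.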

Key Contributions

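As summarized in the abstract, the paper contributes:

  • LLM-as-a-Developer, a methodology in which an LLM coding agent stands in for human developers, so that the framework is the only variable.
  • ADK Arena, a fully automated pipeline with per-framework Docker isolation, three-level validation, and adapters for four benchmarks.
  • An evaluation of 51 Python ADK frameworks across 204 agent-benchmark pairs, including information-source ablations.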

Methodology

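In brief, the same LLM coding agent is pointed at each framework in turn: it reads the documentation, writes agent code, and repairs that code from validation feedback until the tests pass, with each framework isolated in its own Docker environment. Please refer to the full paper for the detailed methodology.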
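
The validate-and-feedback loop can be pictured with a minimal Python sketch. This is not the paper's implementation: `develop_agent`, `generate`, `validate`, and the `max_rounds` cap are all hypothetical names and assumptions introduced here for illustration, and the stub functions stand in for the LLM coding agent and the validation pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class GenerationResult:
    code: Optional[str]  # final agent code, or None if no attempt passed
    rounds: int          # generate/validate rounds consumed
    success: bool


def develop_agent(
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str], Tuple[bool, str]],
    docs: str,
    max_rounds: int = 5,  # assumed budget cap; not taken from the paper
) -> GenerationResult:
    """Generate code, validate it, and feed failures back until it passes."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        code = generate(docs, feedback)   # hypothetical LLM call
        ok, message = validate(code)      # hypothetical test harness
        if ok:
            return GenerationResult(code, round_no, True)
        feedback.append(message)          # the next attempt sees this failure
    return GenerationResult(None, max_rounds, False)


# Stubs standing in for the LLM and the validator: the third attempt passes.
def fake_generate(docs: str, feedback: List[str]) -> str:
    return f"agent_v{len(feedback) + 1}"


def fake_validate(code: str) -> Tuple[bool, str]:
    return code == "agent_v3", f"tests failed for {code}"


result = develop_agent(fake_generate, fake_validate, docs="framework docs")
print(result)
```

In a setup like this, the number of rounds consumed is one way generation effort could be measured per framework, since the developer (the LLM) is held constant.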

Practical Implications

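For teams choosing an ADK, the reported findings suggest that framework choice deserves task-specific evaluation rather than reliance on a general ranking: no single framework dominates, generation cost alone does not predict success, and the median framework resolves only 32% of tasks.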

Authors

  • Jintao Huang
  • Xiaomin Li
  • Gaurav Mittal
  • Yu Hu

Paper Information

  • arXiv ID: 2606.05548v1
  • Categories: cs.SE, cs.AI
  • Published: June 4, 2026