AWS re:Invent 2025 - Code completion to agents: The evolution of development (DVT405)

Published: December 6, 2025 at 07:49 AM EST
3 min read
Source: Dev.to

Introduction

Giovanni Zappella and Laurent Callot (Principal Scientists, Amazon Q Developer) give a brief overview of the evolution of coding agents, from simple code completion to autonomous systems such as the Taxy Code Agent (38% on SWE-bench Verified) and the Logos Agent (51% with sandboxed execution). They also present Huang, a supervisor/sub-agent architecture for handling complex tasks.

Key take‑aways

  • Optimizing agents for specific use cases
  • Defining relevant metrics (e.g., the PolyBench benchmark)
  • Building reliable systems with the Strands Agents SDK and Amazon Bedrock AgentCore
  • Maintaining flexibility as models and customer needs evolve

From Code Completion to Autonomous Agents

Early IDEs

  • Basic syntax highlighting and simple autocompletion.
  • Developers still needed to manage details like semicolons and manual compilation.

Modern IDEs

  • Advanced suggestions for variable names, method signatures, and larger code snippets.
  • Still primarily a productivity aid—speeding up typing rather than writing code autonomously.

First Autonomous Agents

  • Example: Amazon Q Developer CLI.
  • Takes a natural‑language problem description, interacts with the file system, identifies files to modify, and applies changes to achieve a goal.
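
To make that flow concrete, here is a minimal, illustrative agent loop in Python. It is not the Amazon Q Developer CLI's actual implementation; `call_model` is a placeholder for whatever LLM client is used.

```python
import pathlib

def list_files(root: str = ".") -> list[str]:
    """Enumerate source files the agent is allowed to inspect."""
    return [str(p) for p in pathlib.Path(root).rglob("*.py")]

def read_file(path: str) -> str:
    """Return file contents for the model to reason over."""
    return pathlib.Path(path).read_text()

def apply_edit(path: str, new_content: str) -> None:
    """Write the model-proposed revision back to disk."""
    pathlib.Path(path).write_text(new_content)

def run_agent(task: str, call_model) -> str:
    """Toy loop: describe the repo, let the model pick a file to change,
    then let it rewrite that file to satisfy the task."""
    files = list_files()
    target = call_model(
        f"Task: {task}\nFiles: {files}\nReply with the single file to modify."
    ).strip()
    revised = call_model(
        f"Task: {task}\nCurrent content of {target}:\n{read_file(target)}\n"
        "Return the complete revised file."
    )
    apply_edit(target, revised)
    return target
```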

Two Families of Coding Agents

1. Synchronous (Companion) Agents

  • Operate interactively, acting as a developer’s assistant.
  • Help accelerate task completion while the developer remains in control.

2. Asynchronous (Autonomous) Agents

  • Developers delegate tasks; agents work independently, possibly in parallel.
  • Can handle long‑running or short tasks without continuous human supervision.

Typical touch points

  1. Task definition – the developer specifies the goal.
  2. Autonomous execution – the agent performs work (e.g., creates a pull request).
  3. Human review – the developer reviews and merges the changes.
  4. Iterative refinement – the agent may iterate until the code is ready for shipping.
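
A rough sketch of that loop is below; the names (`PullRequest`, `review`) are hypothetical placeholders, not an AWS API. The point is only that the human touches the process at task definition and review time, while the agent iterates in between.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PullRequest:
    description: str
    diff: str

def delegate(task: str,
             agent: Callable[[str], str],
             review: Callable[[PullRequest], str]) -> PullRequest:
    """Execute -> human review -> refine, until the reviewer approves."""
    feedback: Optional[str] = None
    while True:
        prompt = task if feedback is None else f"{task}\nReviewer feedback: {feedback}"
        pr = PullRequest(description=task, diff=agent(prompt))  # autonomous execution
        verdict = review(pr)                                    # human review
        if verdict.lower() == "approve":
            return pr                                           # ready to merge
        feedback = verdict                                      # iterative refinement
```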

Benchmarking with SWE‑bench

  • SWE‑bench is a benchmark derived from real GitHub issues.

Process

  1. Withhold the human‑written fix; the agent sees only the issue description and the repository as it was before the fix.
  2. Let an agent generate the missing code.
  3. Run the original unit tests; passing tests indicate a correct solution.
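
A stripped-down version of that evaluation loop might look like the following; `checkout_issue_repo` and `generate_patch` are hypothetical stand-ins, since the real SWE-bench harness ships its own tooling.

```python
import subprocess

def evaluate(instances, checkout_issue_repo, generate_patch) -> float:
    """Fraction of issues whose original unit tests pass after the agent's patch."""
    resolved = 0
    for inst in instances:                    # each inst: problem text + test command
        repo_dir = checkout_issue_repo(inst)  # repo state *before* the human fix
        patch = generate_patch(inst["problem_statement"], repo_dir)
        applied = subprocess.run(["git", "apply", "-"], input=patch,
                                 text=True, cwd=repo_dir)
        if applied.returncode != 0:
            continue                          # an unapplicable patch counts as a failure
        tests = subprocess.run(inst["test_command"], shell=True, cwd=repo_dir)
        resolved += int(tests.returncode == 0)
    return resolved / len(instances)
```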

Limitations

  • Unit tests may not cover every edge case.
  • Human reviewers may miss defects.

Despite these limits, the benchmark correlates well with code quality. The authors used SWE‑bench to track progress over time, retroactively computing results for earlier agent versions to illustrate evolution.

Architectural Evolution of Coding Agents

Early Attempts

  • RAG‑based (Retrieval‑Augmented Generation) solutions with modest performance.

Taxy Code Agent

  • Achieved 38% on SWE-bench Verified.
  • Introduced more sophisticated retrieval and reasoning components.

Logos Agent

  • Reached 51% on SWE-bench Verified with sandboxed execution (see the sketch below).
  • Added safety checks and tighter integration with execution environments.
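
The sandbox idea can be illustrated generically (this is not the Logos Agent's actual sandbox): copy the project into a throwaway directory, apply the candidate patch there, and only accept it if the tests pass.

```python
import shutil
import subprocess
import tempfile

def passes_in_sandbox(project_dir: str, patch: str,
                      test_cmd: str = "python -m pytest -q") -> bool:
    """Run a candidate patch's tests in an isolated copy of the project,
    so a bad patch never touches the real checkout."""
    with tempfile.TemporaryDirectory() as sandbox:
        work = shutil.copytree(project_dir, f"{sandbox}/repo")
        applied = subprocess.run(["git", "apply", "-"], input=patch,
                                 text=True, cwd=work)
        if applied.returncode != 0:
            return False
        try:
            tests = subprocess.run(test_cmd, shell=True, cwd=work, timeout=600)
        except subprocess.TimeoutExpired:
            return False
        return tests.returncode == 0
```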

Huang (Supervisor‑Sub‑Agent Architecture)

  • Hierarchical design where a supervisor delegates subtasks to specialized sub‑agents.
  • Enables handling of complex, multi‑step development tasks.
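
As a minimal illustration of the pattern (the talk's architecture details are not reproduced here), a supervisor can be just a planner that routes subtasks to named specialists:

```python
from typing import Callable

Plan = list[tuple[str, str]]  # (specialist name, subtask)

def make_supervisor(plan: Callable[[str], Plan],
                    sub_agents: dict[str, Callable[[str], str]]) -> Callable[[str], list[str]]:
    """Supervisor: decompose the task, delegate each piece, collect results."""
    def run(task: str) -> list[str]:
        return [sub_agents[name](subtask) for name, subtask in plan(task)]
    return run

# Usage sketch (all components hypothetical):
# supervisor = make_supervisor(plan=planner,
#                              sub_agents={"code": code_editor, "tests": test_writer})
# supervisor("Add pagination to the /orders endpoint")
```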

Building Reliable Agents

  • Strands Agents SDK: Provides abstractions for building, testing, and deploying agents.
  • Amazon Bedrock AgentCore: Offers managed infrastructure, model access, and security controls.
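
For flavor, here is a quickstart-style snippet assuming the open-source Strands Agents SDK (`pip install strands-agents`) and its `Agent` and `@tool` primitives; treat it as an assumption about the SDK referenced in the talk, not a transcript of the demo.

```python
from strands import Agent, tool  # assumes the open-source Strands Agents SDK

@tool
def count_todos(source: str) -> int:
    """Count TODO markers in a block of source code."""
    return source.count("TODO")

# The SDK runs the model's tool-use loop; by default it targets a model on
# Amazon Bedrock, so AWS credentials are assumed to be configured.
agent = Agent(tools=[count_todos])
agent("How many TODO markers are in this snippet?\n# TODO: refactor\nprint('hi')")
```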

Best practices

  1. Use case‑driven optimization – tailor the agent’s capabilities to the target workflow.
  2. Create custom metrics – such as PolyBench – to measure performance beyond generic benchmarks.
  3. Maintain modularity – allow swapping models or components as technology evolves.
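
The custom-metric and modularity points can be sketched together: keep the agent behind a narrow model interface so backends can be swapped, and score it with a small domain-specific check rather than only a generic benchmark. All names below are illustrative.

```python
from typing import Callable, Iterable, Protocol

class ModelClient(Protocol):
    """Narrow interface: any backend that can complete a prompt plugs in here."""
    def complete(self, prompt: str) -> str: ...

def build_agent(model: ModelClient) -> Callable[[str], str]:
    """Swapping models means passing a different ModelClient; nothing else changes."""
    return lambda task: model.complete(f"Solve this development task:\n{task}")

def custom_metric(tasks: Iterable[str],
                  agent: Callable[[str], str],
                  check: Callable[[str, str], bool]) -> float:
    """Domain-specific score: fraction of tasks whose output passes `check`
    (compiles, lints, matches a spec) instead of a one-size-fits-all benchmark."""
    tasks = list(tasks)
    return sum(check(t, agent(t)) for t in tasks) / len(tasks)
```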

Lessons Learned

  • Flexibility is essential: Models, APIs, and developer expectations change rapidly.
  • Metrics matter: Standard benchmarks help, but domain‑specific metrics give clearer insight.
  • Iterative development: Continuous evaluation against real‑world tasks drives meaningful improvements.

This article is auto‑generated from the original presentation content and may contain minor typographical errors.
