Building a Multi-Agent Deep Research Tool with Google ADK, A2A, & Cloud Run

Published: December 29, 2025 at 10:29 PM EST
5 min read
Source: Dev.to

Introduction

Research is a loaded word. It’s not just Googling a keyword. It’s reading papers, verifying facts, finding that one perfect diagram, and synthesizing it all into something coherent.

Asking a single AI agent to do all of that sequentially is not very efficient. It will hallucinate, it will get stuck, and it will definitely be slow.

Deep Researcher Tool

TL;DR: Want the code? Check out the Deep Research Agent code on GitHub.

I wanted a system that could take a topic—say, “The History of Recurrent Neural Networks”—and produce a comprehensive, illustrated report. Additionally, I wanted to learn how to build a Deep Research Tool from scratch.

The first attempt? A single loop. It researched, then it looked for images, then it checked its work. It took forever.

So I asked:

Can I make this faster?

In this post we’ll build a Parallel Research Squad. Instead of one agent doing everything, we’ll spin up three specialized agents that run simultaneously, coordinated by a central Orchestrator. We’ll use:

  • Google ADK – the agent framework and its orchestration primitives.
  • A2A Protocol – standardized agent‑to‑agent communication.
  • Cloud Run – serverless deployment for each agent service.
  • Next.js – the real‑time frontend.

Architecture Diagram

Part 1 – Agentic Design Patterns

We’re no longer just writing prompts; we’re doing system engineering. To build a robust system we leverage three key design patterns:

1. The Orchestrator Pattern

Instead of a “God Agent” that decides everything, we have a central Orchestrator—think of it as the editor‑in‑chief. It doesn’t write the articles; it assigns stories to reporters, manages state, handles errors, and ensures the final product meets the deadline.

2. Parallelization

This is our speed hack. Most agent frameworks run sequentially (Step A → Step B → Step C). But “Reading ArXiv papers” and “Searching for images” are independent tasks. By running them in parallel we reduce total latency to the duration of the slowest task, not the sum of all tasks.

3. The Evaluator‑Optimizer

We don’t trust the first draft. Our system includes a Judge agent. The Orchestrator sends the research to the Judge, which returns a strict Pass/Fail grade with feedback. If it fails, the Orchestrator loops back (Optimizer) to fix the gaps.
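The control flow is easy to see in plain Python. The helpers below (`judge_draft`, `revise`) are hypothetical stand‑ins for the Judge and the Optimizer step, not the actual ADK API — only the loop structure mirrors the pattern:

```python
# Sketch of the Evaluator-Optimizer loop (hypothetical helpers, not ADK code).

def judge_draft(draft: str) -> tuple[bool, str]:
    """Stand-in for the Judge agent: returns (passed, feedback)."""
    passed = "sources" in draft
    feedback = "" if passed else "Add citations to primary sources."
    return passed, feedback

def revise(draft: str, feedback: str) -> str:
    """Stand-in for the Optimizer: folds the Judge's feedback into the draft."""
    return f"{draft}\n[revised per feedback: {feedback}] sources: ..."

def evaluator_optimizer(draft: str, max_rounds: int = 3) -> str:
    """Loop until the Judge passes the draft or the retry budget runs out."""
    for _ in range(max_rounds):
        passed, feedback = judge_draft(draft)
        if passed:
            return draft
        draft = revise(draft, feedback)
    return draft  # best effort after max_rounds
```

The retry budget (`max_rounds`) matters in practice: without it, a Judge that never passes would loop forever and burn tokens.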

Sequential vs Parallel Processing

Part 2 – The Need for Speed (Parallel Execution)

The biggest bottleneck in AI agents is latency. Waiting for a model to “think” and browse the web takes time.

With ADK we implement a ParallelAgent. This isn’t just a concept; it’s a primitive in the framework that handles the async complexity for us. A ParallelAgent runs its sub‑agents concurrently, and the Orchestrator waits for all of them to finish before moving on. This is a simple way to parallelize agents that don’t depend on each other.

# orchestrator/app/agent.py
from google.adk.agents import ParallelAgent

# The "Squad" runs together
research_squad = ParallelAgent(
    name="research_squad",
    description=(
        "Runs the researcher, academic scholar, and asset gatherer in parallel."
    ),
    sub_agents=[researcher, academic_scholar, asset_gatherer],
)

This one change cut our total processing time by 60%. While the Scholar is reading a dense PDF, the Asset Gatherer is already validating image URLs.
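The latency math is easy to demonstrate outside ADK with plain asyncio — the three coroutines below are illustrative stand‑ins for our agents, not framework code:

```python
import asyncio
import time

async def agent(name: str, seconds: float) -> str:
    # Stand-in for an agent doing slow I/O (web search, PDF parsing, ...).
    await asyncio.sleep(seconds)
    return f"{name} done"

async def run_squad() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Like ParallelAgent: total latency ~= the slowest task, not the sum.
    results = await asyncio.gather(
        agent("researcher", 0.10),
        agent("academic_scholar", 0.15),
        agent("asset_gatherer", 0.05),
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(run_squad())
```

Run sequentially these tasks would take 0.30 s; gathered, the wall time is roughly that of the slowest one (0.15 s).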

A2A Handshake

Part 3 – The Universal Language (A2A Protocol)

How do these agents talk? They are separate microservices. The Researcher might run on a high‑memory instance, while the Orchestrator lives on a tiny one.

We use the Agent‑to‑Agent (A2A) Protocol, a standardized API for AI agents built on top of JSON‑RPC.

Why A2A?

  • Decoupling – The Orchestrator doesn’t need to know how the Researcher works, only where it is.
  • Interoperability – You could write the Researcher in Python and the Judge in Go. As long as they speak A2A, they can collaborate.
  • Service Discovery – In development we map agents to localhost ports; in production we map them to Cloud Run URLs (or any other service mesh).
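A minimal way to implement that discovery mapping is an environment‑variable lookup with localhost defaults. The variable names and ports below are illustrative, not taken from the repo:

```python
import os

# Dev defaults; in production, set e.g. SCHOLAR_SERVICE_URL to the Cloud Run URL.
DEFAULTS = {
    "SCHOLAR_SERVICE_URL": "http://localhost:8001",
    "RESEARCHER_SERVICE_URL": "http://localhost:8002",
    "ASSET_GATHERER_SERVICE_URL": "http://localhost:8003",
}

def agent_card_url(env_var: str) -> str:
    """Resolve a service base URL, then point at its A2A agent card."""
    base = os.environ.get(env_var, DEFAULTS[env_var])
    return f"{base.rstrip('/')}/.well-known/agent.json"
```

Swapping localhost for Cloud Run is then purely a configuration change; the Orchestrator code never hard‑codes a host.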

TL;DR Implementation Checklist

  1. Define sub‑agents (researcher, academic_scholar, asset_gatherer, judge).
  2. Wrap them in a ParallelAgent (research_squad).
  3. Create an Orchestrator that:
    • Sends the topic to research_squad.
    • Receives the combined output.
    • Passes it to judge.
    • If the judge fails, triggers the Optimizer loop.
  4. Deploy each agent as an independent Cloud Run service exposing the A2A endpoint.
  5. Configure service discovery (environment variables or a service registry).

With these patterns you can turn a slow, monolithic “research bot” into a fast, scalable Parallel Research Squad that reliably produces high‑quality, illustrated reports.

# orchestrator/app/agent.py
from google.adk.agents.remote_a2a_agent import RemoteA2aAgent

# The Orchestrator calls the remote Scholar service
academic_scholar = RemoteA2aAgent(
    name="academic_scholar",
    # In prod, this is an internal Cloud Run URL
    agent_card="http://scholar-service:8000/.well-known/agent.json",
    description="Searches for academic papers."
)
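The `agent_card` URL points at a small JSON document the agent serves about itself. The payload below is a sketch of what the Orchestrator consumes — the field names follow the A2A agent card format as I understand it, so treat them as illustrative rather than a schema reference:

```python
import json

# Illustrative agent card, as it might be served by the scholar service
# at /.well-known/agent.json.
CARD_JSON = """
{
  "name": "academic_scholar",
  "description": "Searches for academic papers.",
  "url": "http://scholar-service:8000",
  "version": "1.0.0"
}
"""

card = json.loads(CARD_JSON)
endpoint = card["url"]  # where the Orchestrator sends A2A (JSON-RPC) requests
```

This is the decoupling in action: the Orchestrator learns everything it needs from the card, without importing any of the Scholar’s code.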

Scaling Graph

Part 4 – Infrastructure as a Superpower (Cloud Run)

We deploy this system on Google Cloud Run. This gives us the “Grocery Store” scaling model.

The “Grocery Store” Model

Imagine a grocery store with one checkout lane. If 50 people show up, the line goes out the door.

In our system, each agent is a checkout lane.

  • Monolith: One lane. 50 requests = 50× wait time.
  • Microservices on Cloud Run: 50 requests = Cloud Run instantly spins up 50 instances of the Researcher. Everyone gets checked out at once.

Scale to Zero

When no one is using the app, we have 0 instances running. We pay $0. This is crucial for cost‑effective AI applications.

Note: When a Cloud Run service is not in use, it automatically scales to zero, which means the next request incurs a cold start. You can keep a service warm by setting a minimum instance count, at the cost of paying for the idle instances.
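A deploy sketch for one agent service; the flags are standard gcloud options, but the service name, project, region, and image path are placeholders:

```shell
# Deploy the scholar agent as its own Cloud Run service.
# --min-instances=0 gives scale-to-zero; raise it to avoid cold starts.
gcloud run deploy scholar-service \
  --image=us-docker.pkg.dev/PROJECT_ID/agents/scholar:latest \
  --region=us-central1 \
  --min-instances=0 \
  --max-instances=50
```

Each agent gets its own deploy command, so the Researcher can get more memory than the Orchestrator without either one knowing.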

Part 5 – The Frontend (Next.js + Real‑Time)

We didn’t want a CLI tool; we wanted a product.

We built a Next.js frontend that connects to the Orchestrator. Because we know the architecture, we can visualize it. When the research_squad starts, our frontend shows three pulsing indicators side‑by‑side. You actually see the parallelism happening.

It creates a sense of “liveness” and transparency that builds user trust.

Conclusion

By breaking our monolith into a Parallel Research Squad, we built a system that is:

  • Faster: Parallel execution cuts wait times by >50%.
  • Better: Specialized agents (Scholar, Gatherer) do deeper work than one generalist.
  • Scalable: Microservices on Cloud Run scale out automatically with demand.

Want to build this yourself? Grab the Deep Research Agent code on GitHub linked above.
