How Sakana trained a 7B model to orchestrate GPT-5, Claude Sonnet 4 and Gemini 2.5 Pro

Published: 3 days ago (May 7, 2026 at 05:23 PM EDT)

7 min read

Source: VentureBeat

The Problem with Hard‑Coded LangChain Pipelines

“Every LangChain pipeline your team hardcodes starts breaking the moment the query distribution shifts — and it always shifts.”

That bottleneck is exactly what Sakana AI set out to eliminate.

Introducing the RL Conductor

Sakana AI’s researchers have built a small language model trained via reinforcement learning (RL) to automatically orchestrate a diverse pool of worker LLMs.

Dynamic analysis of each input
Labor distribution across multiple workers
Coordination among agents

The result: state‑of‑the‑art performance on hard reasoning and coding benchmarks, outperforming frontier models such as GPT‑5 and Claude Sonnet 4, as well as expensive human‑designed multi‑agent pipelines. All of this is achieved at a fraction of the cost and with fewer API calls than competing solutions.

The RL Conductor is the backbone of Fugu, Sakana AI’s commercial multi‑agent orchestration service.

The Limitations of Manual Agentic Frameworks

Large language models possess strong latent capabilities, but tapping those capabilities fully remains a major challenge. Current commercial AI products rely heavily on manually designed agentic workflows, which suffer from several fundamental issues:

Rigidity & Constrained Design
- Hard‑coded pipelines (e.g., LangChain, Mixture‑of‑Agents) work for narrow use‑cases but break in production when user demands become heterogeneous.
Quote from the Authors

“While using frameworks with hard‑coded pipelines like LangChain and Mixture‑of‑Agents can work well for specific use cases … In production, an inherent bottleneck arises when targeting domains with large user bases with very heterogeneous demands.”
— Yujin Tang, co‑author (VentureBeat interview)

Tang adds: “Real‑world generalization in such heterogeneous applications inherently necessitates going beyond human‑hardcoded designs.”
No Single Model Is Optimal for All Tasks
- Different models excel in different domains (scientific reasoning, code generation, mathematical logic, high‑level planning, etc.).
- Manually predicting and hard‑coding the ideal model combination for every query is practically impossible.

An optimal agentic framework should:

Analyze a problem automatically.
Delegate subtasks to the most suitable expert in the pool.

Conducting an Orchestra of Agents

The RL Conductor was built to overcome the above limitations. As its name suggests, it conducts an orchestra of agents by:

Dividing challenging problems into subtasks.
Delegating each subtask to a targeted worker LLM.
Designing communication topologies (who sees which prior outputs).

How It Works

Natural‑language workflow generation: For each step, the Conductor emits a plain‑English instruction, assigns an agent, and creates an access list that specifies which previous subtasks and responses are included in that agent’s context.
Flexible structures:
- Simple sequential chains
- Parallel tree structures
- Recursive loops (when needed)

All of this is learned via reinforcement learning rather than hand‑crafted rules:

Training Signal	What It Optimizes
Task + worker pool	Correct answer & proper output format
Reward (binary/graded)	Maximizing task success

Through trial‑and‑error, the Conductor discovers advanced orchestration strategies such as:

Targeted prompt engineering
Iterative refinement
Meta‑prompt optimization

Thus, the model dynamically adjusts its strategy and leverages each worker’s strengths without any human‑coded routing logic.

Conductor in Action

Experimental Setup

Base model: 7‑billion‑parameter Qwen2.5‑7B fine‑tuned with the RL Conductor framework.
Worker pool (7 models):
- Closed‑source giants: Gemini 2.5 Pro, Claude‑Sonnet‑4, GPT‑5
- Open‑source models: DeepSeek‑R1‑Distill‑Qwen‑32B, Gemma3‑27B, Qwen3‑32B, plus one additional model.

The Conductor was tasked with designing agentic workflows of up to five steps.

Benchmarks & Results

Benchmark	Score (Conductor)	Comparison
Overall average	77.27 %	New state‑of‑the‑art
AIME25 (math)	93.3 %	Highest reported
GPQA‑Diamond	87.5 %	—
LiveCodeBench	83.93 %	—

Efficiency

Tokens per question:
- Baseline MoA: 11,203 tokens
- RL Conductor: 1,820 tokens (≈ 6× fewer)
Average workflow steps: 3

Why It Works

Task‑difficulty awareness:
- Simple factual queries → single‑step or two‑agent workflows.
- Complex coding problems → up to four agents (planning, implementation, verification, etc.).
Model‑strength exploitation:
- The Conductor learns that frontier models have complementary strengths and routes subtasks accordingly (e.g., using Claude‑Sonnet‑4 for reasoning, Gemini 2.5 Pro for code synthesis).

Takeaways

Hard‑coded pipelines are brittle in the face of shifting query distributions.
RL Conductor demonstrates that a small, RL‑trained model can dynamically orchestrate a heterogeneous pool of LLMs, achieving superior accuracy and dramatically lower token usage.
The approach paves the way for scalable, cost‑effective multi‑agent services like Fugu, moving beyond the limits of manual agentic designs.

Conductor‑Driven Benchmark Success

To achieve record scores on coding benchmarks, the Conductor frequently assigned Gemini 2.5 Pro and Claude Sonnet 4 to act as high‑level planners, bringing in GPT‑5 only at the very end to write the final optimized code.

In a particularly clever display of adaptability, the Conductor would sometimes abdicate its own role entirely, handing the entire planning process over to Gemini 2.5 Pro and allowing it to dictate the subtasks for the rest of the model pool.

Beyond Benchmarks – Enterprise Utility

“We have been using our Fugu models—built on Conductor technology—internally for a range of practical enterprise applications: software development, deep research, strategy development, and even visual tasks like slide generation,”
— Yujin Tang

Bringing Orchestration to the Enterprise: Sakana Fugu

The 7B model described in the research paper was an exploratory blueprint and is not publicly available.
Sakana AI has productized the Conductor framework into its flagship commercial AI product, Sakana Fugu.

Current Status

Beta phase
Serves as a multi‑agent orchestration system accessible through a standard OpenAI‑compatible API.

Target Market

“Fugu targets the large market of industries where AI adoption has yet to bring large productivity gains due to the generalization limitations of current hard‑coded pipelines, such as finance and defense.”
— Tang

Benefits for Enterprise Developers

Seamless integration into existing applications without managing multiple API keys or manually routing tasks across different vendors.
Behind the API, Fugu automates complex collaboration topologies and role assignments across a pool of models.

Product Variants

Variant	Purpose	Key Characteristics
Fugu Mini	Low‑latency operations	Optimized for speed, suitable for real‑time use cases
Fugu Ultra	Maximum performance on demanding workloads	Scales to heavy computational loads, best for large‑scale tasks

Governance & Interpretability

Tang notes that interpretability risks are functionally similar to the hidden reasoning traces of current top‑tier closed APIs.
The system is managed with established guardrails to minimize hallucinations.

When to Use RL‑Orchestration vs. Traditional Routing

“The absolute sweet spot comes whenever users and their teams feel they are spending a disproportionate amount of time guiding their underlying agents,”
— Tang

Caution: The framework isn’t necessary for every scenario.
Economic note: “It’s hard to beat the economic proposition of a local model running directly on the user’s machine for simple queries.”

Looking Ahead

As the diversity of specialized open‑ and closed‑source AI models continues to grow, static hard‑coded pipelines will become obsolete.
Dynamic orchestration is expected to extend beyond text and code.

“There is indeed a large potential to fill this gap with cross‑modal Conductor frameworks becoming the foundation for more autonomous, self‑coordinating physical AI systems.”
— Tang

How Sakana trained a 7B model to orchestrate GPT-5, Claude Sonnet 4 and Gemini 2.5 Pro

The Problem with Hard‑Coded LangChain Pipelines

Introducing the RL Conductor

The Limitations of Manual Agentic Frameworks

Conducting an Orchestra of Agents

How It Works

Conductor in Action

Experimental Setup

Benchmarks & Results

Why It Works

Takeaways

Conductor‑Driven Benchmark Success

Beyond Benchmarks – Enterprise Utility

Bringing Orchestration to the Enterprise: Sakana Fugu

Current Status

Target Market

Benefits for Enterprise Developers

Product Variants

Governance & Interpretability

When to Use RL‑Orchestration vs. Traditional Routing

Looking Ahead

Related posts

AI tool poisoning exposes a major flaw in enterprise agent security

Intent-based chaos testing is designed for when AI behaves confidently — and wrongly

OpenAI brings GPT-5-class reasoning to real-time voice — and it changes what voice agents can actually orchestrate

Governance, not gatekeeping: How SAP brings enterprise‑grade safety to AI connectivity

The Problem with Hard‑Coded LangChain Pipelines

Introducing the RL Conductor

The Limitations of Manual Agentic Frameworks

Conducting an Orchestra of Agents

How It Works

Conductor in Action

Experimental Setup

Benchmarks & Results

Why It Works

Takeaways

Conductor‑Driven Benchmark Success

Beyond Benchmarks – Enterprise Utility

Bringing Orchestration to the Enterprise: Sakana Fugu

Current Status

Target Market

Benefits for Enterprise Developers

Product Variants

Governance & Interpretability

When to Use RL‑Orchestration vs. Traditional Routing

Looking Ahead

Related posts

AI tool poisoning exposes a major flaw in enterprise agent security

Intent-based chaos testing is designed for when AI behaves confidently — and wrongly

OpenAI brings GPT-5-class reasoning to real-time voice — and it changes what voice agents can actually orchestrate

Governance, not gatekeeping: How SAP brings enterprise‑grade safety to AI connectivity

Introducing the RL Conductor

Bringing Orchestration to the Enterprise: Sakana Fugu